No Arabic abstract
Customers make a lot of reviews on online shopping websites every day, e.g., Amazon and Taobao. Reviews affect the buying decisions of customers, meanwhile, attract lots of spammers aiming at misleading buyers. Xianyu, the largest second-hand goods app in China, suffering from spam reviews. The anti-spam system of Xianyu faces two major challenges: scalability of the data and adversarial actions taken by spammers. In this paper, we present our technical solutions to address these challenges. We propose a large-scale anti-spam method based on graph convolutional networks (GCN) for detecting spam advertisements at Xianyu, named GCN-based Anti-Spam (GAS) model. In this model, a heterogeneous graph and a homogeneous graph are integrated to capture the local context and global context of a comment. Offline experiments show that the proposed method is superior to our baseline model in which the information of reviews, features of users and items being reviewed are utilized. Furthermore, we deploy our system to process million-scale data daily at Xianyu. The online performance also demonstrates the effectiveness of the proposed method.
These years much effort has been devoted to improving the accuracy or relevance of the recommendation system. Diversity, a crucial factor which measures the dissimilarity among the recommended items, received rather little scrutiny. Directly related to user satisfaction, diversification is usually taken into consideration after generating the candidate items. However, this decoupled design of diversification and candidate generation makes the whole system suboptimal. In this paper, we aim at pushing the diversification to the upstream candidate generation stage, with the help of Graph Convolutional Networks (GCN). Although GCN based recommendation algorithms have shown great power in modeling complex collaborative filtering effect to improve the accuracy of recommendation, how diversity changes is ignored in those advanced works. We propose to perform rebalanced neighbor discovering, category-boosted negative sampling and adversarial learning on top of GCN. We conduct extensive experiments on real-world datasets. Experimental results verify the effectiveness of our proposed method on diversification. Further ablation studies validate that our proposed method significantly alleviates the accuracy-diversity dilemma.
Last year, IEEE 802.11 Extremely High Throughput Study Group (EHT Study Group) was established to initiate discussions on new IEEE 802.11 features. Coordinated control methods of the access points (APs) in the wireless local area networks (WLANs) are discussed in EHT Study Group. The present study proposes a deep reinforcement learning-based channel allocation scheme using graph convolutional networks (GCNs). As a deep reinforcement learning method, we use a well-known method double deep Q-network. In densely deployed WLANs, the number of the available topologies of APs is extremely high, and thus we extract the features of the topological structures based on GCNs. We apply GCNs to a contention graph where APs within their carrier sensing ranges are connected to extract the features of carrier sensing relationships. Additionally, to improve the learning speed especially in an early stage of learning, we employ a game theory-based method to collect the training data independently of the neural network model. The simulation results indicate that the proposed method can appropriately control the channels when compared to extant methods.
Spreadsheet table detection is the task of detecting all tables on a given sheet and locating their respective ranges. Automatic table detection is a key enabling technique and an initial step in spreadsheet data intelligence. However, the detection task is challenged by the diversity of table structures and table layouts on the spreadsheet. Considering the analogy between a cell matrix as spreadsheet and a pixel matrix as image, and encouraged by the successful application of Convolutional Neural Networks (CNN) in computer vision, we have developed TableSense, a novel end-to-end framework for spreadsheet table detection. First, we devise an effective cell featurization scheme to better leverage the rich information in each cell; second, we develop an enhanced convolutional neural network model for table detection to meet the domain-specific requirement on precise table boundary detection; third, we propose an effective uncertainty metric to guide an active learning based smart sampling algorithm, which enables the efficient build-up of a training dataset with 22,176 tables on 10,220 sheets with broad coverage of diverse table structures and layouts. Our evaluation shows that TableSense is highly effective with 91.3% recall and 86.5% precision in EoB-2 metric, a significant improvement over both the current detection algorithm that are used in commodity spreadsheet tools and state-of-the-art convolutional neural networks in computer vision.
Nowadays, with the rise of Internet access and mobile devices around the globe, more people are using social networks for collaboration and receiving real-time information. Twitter, the microblogging that is becoming a critical source of communication and news propagation, has grabbed the attention of spammers to distract users. So far, researchers have introduced various defense techniques to detect spams and combat spammer activities on Twitter. To overcome this problem, in recent years, many novel techniques have been offered by researchers, which have greatly enhanced the spam detection performance. Therefore, it raises a motivation to conduct a systematic review about different approaches of spam detection on Twitter. This review focuses on comparing the existing research techniques on Twitter spam detection systematically. Literature review analysis reveals that most of the existing methods rely on Machine Learning-based algorithms. Among these Machine Learning algorithms, the major differences are related to various feature selection methods. Hence, we propose a taxonomy based on different feature selection methods and analyses, namely content analysis, user analysis, tweet analysis, network analysis, and hybrid analysis. Then, we present numerical analyses and comparative studies on current approaches, coming up with open challenges that help researchers develop solutions in this topic.
We devise an autoencoder based strategy to facilitate anomaly detection for boosted jets, employing Graph Neural Networks (GNNs) to do so. To overcome known limitations of GNN autoencoders, we design a symmetric decoder capable of simultaneously reconstructing edge features and node features. Focusing on latent space based discriminators, we find that such setups provide a promising avenue to isolate new physics and competing SM signatures from sensitivity-limiting QCD jet contributions. We demonstrate the flexibility and broad applicability of this approach using examples of $W$ bosons, top quarks, and exotic hadronically-decaying exotic scalar bosons.