No Arabic abstract
Attributed networks are ubiquitous since a network often comes with auxiliary attribute information e.g. a social network with user profiles. Attributed Network Embedding (ANE) has recently attracted considerable attention, which aims to learn unified low dimensional node embeddings while preserving both structural and attribute information. The resulting node embeddings can then facilitate various network downstream tasks e.g. link prediction. Although there are several ANE methods, most of them cannot deal with incomplete attributed networks with missing links and/or missing node attributes, which often occur in real-world scenarios. To address this issue, we propose a robust ANE method, the general idea of which is to reconstruct a unified denser network by fusing two sources of information for information enhancement, and then employ a random walks based network embedding method for learning node embeddings. The experiments of link prediction, node classification, visualization, and parameter sensitivity analysis on six real-world datasets validate the effectiveness of our method to incomplete attributed networks.
Deep generative models (DGMs) have achieved remarkable advances. Semi-supervised variational auto-encoders (SVAE) as a classical DGM offer a principled framework to effectively generalize from small labelled data to large unlabelled ones, but it is difficult to incorporate rich unstructured relationships within the multiple heterogeneous entities. In this paper, to deal with the problem, we present a semi-supervised co-embedding model for attributed networks (SCAN) based on the generalized SVAE for heterogeneous data, which collaboratively learns low-dimensional vector representations of both nodes and attributes for partially labelled attributed networks semi-supervisedly. The node and attribute embeddings obtained in a unified manner by our SCAN can benefit for capturing not only the proximities between nodes but also the affinities between nodes and attributes. Moreover, our model also trains a discriminative network to learn the label predictive distribution of nodes. Experimental results on real-world networks demonstrate that our model yields excellent performance in a number of applications such as attribute inference, user profiling and node classification compared to the state-of-the-art baselines.
In this paper, we propose a novel framework, called Semi-supervised Embedding in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector representation that systematically captures the topological proximity, attribute affinity and label similarity of vertices in a partially labeled attributed network (PLAN). Our method is designed to work in both transductive and inductive settings while explicitly alleviating noise effects from outliers. Experimental results on various datasets drawn from the web, text and image domains demonstrate the advantages of SEANO over state-of-the-art methods in semi-supervised classification under transductive as well as inductive settings. We also show that a subset of parameters in SEANO is interpretable as outlier score and can significantly outperform baseline methods when applied for detecting network outliers. Finally, we present the use of SEANO in a challenging real-world setting -- flood mapping of satellite images and show that it is able to outperform modern remote sensing algorithms for this task.
Network embedding has recently emerged as a promising technique to embed nodes of a network into low-dimensional vectors. While fairly successful, most existing works focus on the embedding techniques for static networks. But in practice, there are many networks that are evolving over time and hence are dynamic, e.g., the social networks. To address this issue, a high-order spatio-temporal embedding model is developed to track the evolutions of dynamic networks. Specifically, an activeness-aware neighborhood embedding method is first proposed to extract the high-order neighborhood information at each given timestamp. Then, an embedding prediction framework is further developed to capture the temporal correlations, in which the attention mechanism is employed instead of recurrent neural networks (RNNs) for its efficiency in computing and flexibility in modeling. Extensive experiments are conducted on four real-world datasets from three different areas. It is shown that the proposed method outperforms all the baselines by a substantial margin for the tasks of dynamic link prediction and node classification, which demonstrates the effectiveness of the proposed methods on tracking the evolutions of dynamic networks.
Mining tasks over sequential data, such as clickstreams and gene sequences, require a careful design of embeddings usable by learning algorithms. Recent research in feature learning has been extended to sequential data, where each instance consists of a sequence of heterogeneous items with a variable length. However, many real-world applications often involve attributed sequences, where each instance is composed of both a sequence of categorical items and a set of attributes. In this paper, we study this new problem of attributed sequence embedding, where the goal is to learn the representations of attributed sequences in an unsupervised fashion. This problem is core to many important data mining tasks ranging from user behavior analysis to the clustering of gene sequences. This problem is challenging due to the dependencies between sequences and their associated attributes. We propose a deep multimodal learning framework, called NAS, to produce embeddings of attributed sequences. The embeddings are task independent and can be used on various mining tasks of attributed sequences. We demonstrate the effectiveness of our embeddings of attributed sequences in various unsupervised learning tasks on real-world datasets.
Recent years have witnessed an upsurge of interest in the problem of anomaly detection on attributed networks due to its importance in both research and practice. Although various approaches have been proposed to solve this problem, two major limitations exist: (1) unsupervised approaches usually work much less efficiently due to the lack of supervisory signal, and (2) existing anomaly detection methods only use local contextual information to detect anomalous nodes, e.g., one- or two-hop information, but ignore the global contextual information. Since anomalous nodes differ from normal nodes in structures and attributes, it is intuitive that the distance between anomalous nodes and their neighbors should be larger than that between normal nodes and their neighbors if we remove the edges connecting anomalous and normal nodes. Thus, hop counts based on both global and local contextual information can be served as the indicators of anomaly. Motivated by this intuition, we propose a hop-count based model (HCM) to detect anomalies by modeling both local and global contextual information. To make better use of hop counts for anomaly identification, we propose to use hop counts prediction as a self-supervised task. We design two anomaly scores based on the hop counts prediction via HCM model to identify anomalies. Besides, we employ Bayesian learning to train HCM model for capturing uncertainty in learned parameters and avoiding overfitting. Extensive experiments on real-world attributed networks demonstrate that our proposed model is effective in anomaly detection.