ترغب بنشر مسار تعليمي؟ اضغط هنا

Multi-scale Anomaly Detection on Attributed Networks

105   0   0.0 ( 0 )
 نشر من قبل Leonardo Gutierrez
 تاريخ النشر 2019
والبحث باللغة English




اسأل ChatGPT حول البحث

Many social and economic systems can be represented as attributed networks encoding the relations between entities who are themselves described by different node attributes. Finding anomalies in these systems is crucial for detecting abuses such as credit card frauds, web spams or network intrusions. Intuitively, anomalous nodes are defined as nodes whose attributes differ starkly from the attributes of a certain set of nodes of reference, called the context of the anomaly. While some methods have proposed to spot anomalies locally, globally or within a community context, the problem remain challenging due to the multi-scale composition of real networks and the heterogeneity of node metadata. Here, we propose a principled way to uncover outlier nodes simultaneously with the context with respect to which they are anomalous, at all relevant scales of the network. We characterize anomalous nodes in terms of the concentration retained for each node after smoothing specific signals localized on the vertices of the graph. Besides, we introduce a graph signal processing formulation of the Markov stability framework used in community detection, in order to find the context of anomalies. The performance of our method is assessed on synthetic and real-world attributed networks and shows superior results concerning state of the art algorithms. Finally, we show the scalability of our approach in large networks employing Chebychev polynomial approximations.



قيم البحث

اقرأ أيضاً

Given a network with attributed edges, how can we identify anomalous behavior? Networks with edge attributes are commonplace in the real world. For example, edges in e-commerce networks often indicate how users rated products and services in terms of number of stars, and edges in online social and phonecall networks contain temporal information about when friendships were formed and when users communicated with each other -- in such cases, edge attributes capture information about how the adjacent nodes interact with other entities in the network. In this paper, we aim to utilize exactly this information to discern suspicious from typical node behavior. Our work has a number of notable contributions, including (a) formulation: while most other graph-based anomaly detection works use structural graph connectivity or node information, we focus on the new problem of leveraging edge information, (b) methodology: we introduce EdgeCentric, an intuitive and scalable compression-based approach for detecting edge-attributed graph anomalies, and (c) practicality: we show that EdgeCentric successfully spots numerous such anomalies in several large, edge-attributed real-world graphs, including the Flipkart e-commerce graph with over 3 million product reviews between 1.1 million users and 545 thousand products, where it achieved 0.87 precision over the top 100 results.
Recent years have witnessed an upsurge of interest in the problem of anomaly detection on attributed networks due to its importance in both research and practice. Although various approaches have been proposed to solve this problem, two major limitat ions exist: (1) unsupervised approaches usually work much less efficiently due to the lack of supervisory signal, and (2) existing anomaly detection methods only use local contextual information to detect anomalous nodes, e.g., one- or two-hop information, but ignore the global contextual information. Since anomalous nodes differ from normal nodes in structures and attributes, it is intuitive that the distance between anomalous nodes and their neighbors should be larger than that between normal nodes and their neighbors if we remove the edges connecting anomalous and normal nodes. Thus, hop counts based on both global and local contextual information can be served as the indicators of anomaly. Motivated by this intuition, we propose a hop-count based model (HCM) to detect anomalies by modeling both local and global contextual information. To make better use of hop counts for anomaly identification, we propose to use hop counts prediction as a self-supervised task. We design two anomaly scores based on the hop counts prediction via HCM model to identify anomalies. Besides, we employ Bayesian learning to train HCM model for capturing uncertainty in learned parameters and avoiding overfitting. Extensive experiments on real-world attributed networks demonstrate that our proposed model is effective in anomaly detection.
Dynamic networks, also called network streams, are an important data representation that applies to many real-world domains. Many sets of network data such as e-mail networks, social networks, or internet traffic networks are best represented by a dy namic network due to the temporal component of the data. One important application in the domain of dynamic network analysis is anomaly detection. Here the task is to identify points in time where the network exhibits behavior radically different from a typical time, either due to some event (like the failure of machines in a computer network) or a shift in the network properties. This problem is made more difficult by the fluid nature of what is considered normal network behavior. The volume of traffic on a network, for example, can change over the course of a month or even vary based on the time of the day without being considered unusual. Anomaly detection tests using traditional network statistics have difficulty in these scenarios due to their Density Dependence: as the volume of edges changes the value of the statistics changes as well making it difficult to determine if the change in signal is due to the traffic volume or due to some fundamental shift in the behavior of the network. To more accurately detect anomalies in dynamic networks, we introduce the concept of Density-Consistent network statistics. On synthetically generated graphs anomaly detectors using these statistics show a a 20-400% improvement in the recall when distinguishing graphs drawn from different distributions. When applied to several real datasets Density-Consistent statistics recover multiple network events which standard statistics failed to find.
The maximization of generalized modularity performs well on networks in which the members of all communities are statistically indistinguishable from each other. However, there is no theory bounding the maximization performance in more realistic netw orks where edges are heterogeneously distributed within and between communities. Using the random graph properties, we establish asymptotic theoretical bounds on the resolution parameter for which the generalized modularity maximization performs well. From this new perspective on random graph model, we find the resolution limit of modularity maximization can be explained in a surprisingly simple and straightforward way. Given a network produced by the stochastic block models, the communities for which the resolution parameter is larger than their densities are likely to be spread among multiple clusters, while communities for which the resolution parameter is smaller than their background inter-community edge density will be merged into one large component. Therefore, no suitable resolution parameter exits when the intra-community edge density in a subgraph is lower than the inter-community edge density in some other subgraph. For such networks, we propose a progressive agglomerative heuristic algorithm to detect practically significant communities at multiple scales.
There has been a surge of interest in community detection in homogeneous single-relational networks which contain only one type of nodes and edges. However, many real-world systems are naturally described as heterogeneous multi-relational networks wh ich contain multiple types of nodes and edges. In this paper, we propose a new method for detecting communities in such networks. Our method is based on optimizing the composite modularity, which is a new modularity proposed for evaluating partitions of a heterogeneous multi-relational network into communities. Our method is parameter-free, scalable, and suitable for various networks with general structure. We demonstrate that it outperforms the state-of-the-art techniques in detecting pre-planted communities in synthetic networks. Applied to a real-world Digg network, it successfully detects meaningful communities.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا