
Accelerating Community Detection by Using K-core Subgraphs

Published by Chengbin Peng
Publication date: 2014
Language: English





Community detection is expensive, and the cost generally depends at least linearly on the number of vertices in the graph. We propose working with a reduced graph that has many fewer nodes but nonetheless captures key community structure. The K-core of a graph is the largest subgraph within which each node has at least K connections. We propose a framework that accelerates community detection by applying an expensive algorithm (modularity optimization, the Louvain method, spectral clustering, etc.) to the K-core and then using an inexpensive heuristic (such as local modularity maximization) to infer community labels for the remaining nodes. Our experiments demonstrate that the proposed framework can reduce the running time by more than 80% while preserving the quality of the solutions. Recent theoretical investigations provide support for using the K-core as a reduced representation.
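The three steps described in the abstract (extract the K-core, run a costly detector on it, then cheaply label the remaining nodes) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the "expensive" detector is replaced by a trivial connected-components placeholder, and the extension step uses a neighbour majority vote rather than the local modularity maximization mentioned in the abstract.

```python
from collections import Counter, deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_core(adj, k):
    """Return the node set of the k-core by repeatedly peeling low-degree nodes."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    queue = deque(v for v, d in degree.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    return set(adj) - removed


def component_detector(adj, nodes):
    """Placeholder for the expensive algorithm (assumption): one label per
    connected component of the subgraph induced on `nodes`."""
    labels = {}
    next_label = 0
    for start in sorted(nodes):
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            v = stack.pop()
            for u in adj[v]:
                if u in nodes and u not in labels:
                    labels[u] = next_label
                    stack.append(u)
        next_label += 1
    return labels


def extend_labels(adj, core_labels):
    """Cheap heuristic (assumption: majority vote): give each unlabelled node
    the most common label among its already-labelled neighbours."""
    labels = dict(core_labels)
    pending = set(adj) - set(labels)
    progress = True
    while pending and progress:
        progress = False
        for v in sorted(pending):
            votes = Counter(labels[u] for u in adj[v] if u in labels)
            if votes:
                labels[v] = votes.most_common(1)[0][0]
                pending.discard(v)
                progress = True
    # Nodes with no path to the core stay unlabelled.
    return labels


def accelerated_detection(adj, k, detector):
    """Run `detector` on the k-core only, then extend labels outward."""
    core = k_core(adj, k)
    return extend_labels(adj, detector(adj, core))


# Toy graph: two 4-cliques, a low-degree connector (8), and two pendants (9, 10).
clique_a = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
clique_b = [(4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7)]
edges = clique_a + clique_b + [(3, 8), (8, 4), (0, 9), (7, 10)]
adj = build_adjacency(edges)
labels = accelerated_detection(adj, 3, component_detector)
```

In this toy graph the 3-core contains only the two cliques, so the detector sees 8 of the 11 nodes; the pendants inherit the label of the clique they attach to, and the connector node receives whichever of the two labels wins its tie. In practice `component_detector` would be swapped for a real method such as Louvain or spectral clustering.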




Read also

The organisation of a network in a maximal set of nodes having at least $k$ neighbours within the set, known as $k$-core decomposition, has been used for studying various phenomena. It has been shown that nodes in the innermost $k$-shells play a crucial role in contagion processes, emergence of consensus, and resilience of the system. It is known that the $k$-core decomposition of many empirical networks cannot be explained by the degree of each node alone or, equivalently, by random graph models that preserve the degree of each node (i.e., the configuration model). Here we study the $k$-core decomposition of some empirical networks as well as that of some randomised counterparts, and examine the extent to which the $k$-shell structure of the networks can be accounted for by the community structure. We find that preserving the community structure in the randomisation process is crucial for generating networks whose $k$-core decomposition is close to the empirical one. We also highlight the existence, in some networks, of a concentration of the nodes in the innermost $k$-shells into a small number of communities.
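To make the $k$-core decomposition concrete, here is a minimal sketch that assigns each node its core number (the largest $k$ for which the node belongs to the $k$-core) and groups nodes into $k$-shells. It uses the standard peeling idea of repeatedly removing a minimum-degree node; the function names and the toy graph are illustrative assumptions, and the simple minimum search makes it quadratic, so it is suited to small examples only.

```python
from collections import defaultdict


def core_numbers(adj):
    """Return {node: core number} for an undirected graph given as
    {node: set(neighbours)}, by peeling minimum-degree nodes."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Pick a node of minimum remaining degree (linear scan, for clarity).
        v = min(remaining, key=degree.__getitem__)
        # The core number never decreases as peeling proceeds.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def k_shells(core):
    """Group nodes by core number: shell k holds nodes with core number k."""
    shells = defaultdict(set)
    for node, k in core.items():
        shells[k].add(node)
    return dict(shells)


# Toy graph: a 4-clique {0, 1, 2, 3} with a two-node tail 3 - 4 - 5.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
toy_adj = {}
for a, b in toy_edges:
    toy_adj.setdefault(a, set()).add(b)
    toy_adj.setdefault(b, set()).add(a)

toy_core = core_numbers(toy_adj)
toy_shells = k_shells(toy_core)
```

Here the clique forms the innermost shell ($k = 3$) and the tail nodes fall in the 1-shell, even though node 4 has degree 2. This is a small instance of the point made above that degree alone does not determine shell membership.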
Grouping objects into clusters based on similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message passing algorithms and spectral algorithms proposed for the unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to the Potts model at the critical temperature of the spin glass transition and applying belief propagation to solve for the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks and use extensive numerical experiments to illustrate the advantage of our method over existing algorithms. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem where the data are generated by mixture models in the sparse regime, we show that our method works up to the theoretical limit of detectability and gives accuracy very close to that of optimal Bayesian inference. In the semi-supervised clustering problem, our method needs only several labels to work perfectly on classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations, which greatly reduce the computational complexity in dense networks but give almost the same performance as belief propagation.
Research into the detection of dense communities has recently attracted increasing attention within network science, and various metrics for detecting such communities have been proposed. The most popular metric, Modularity, is based on the rule that links within communities are denser than external links among communities, and it has become the default. However, this default metric suffers from ambiguity and, worse, all augmentations of modularity are based on a narrow intuition of what it means to form a community. We argue that in specific, but quite common, systems, links within a community are not necessarily more common than links between communities. Instead, we propose that the defining characteristic of a community is that links are more predictable within a community than between communities. In this paper, based on the effect of communities on link prediction, we propose a novel metric for community detection based directly on this feature. We find that our metric is more robust than traditional modularity. Consequently, we can evaluate the stability of the same detection algorithm across different networks. Our metric can also directly uncover false community detections and infer more statistical characteristics of detection algorithms.
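For reference, the modularity that this abstract takes as the default metric is commonly written, for an unweighted undirected graph with $m$ edges, as $Q = \sum_c \left[ L_c / m - (d_c / 2m)^2 \right]$, where $L_c$ is the number of edges inside community $c$ and $d_c$ is the total degree of its nodes. The sketch below computes that standard quantity only; it does not implement the link-prediction metric proposed in the abstract, whose details are not given here. The function name and the toy graph are illustrative assumptions.

```python
from collections import Counter


def modularity(edges, labels):
    """Newman-Girvan modularity of a partition of a simple undirected graph.

    edges  : list of (u, v) pairs, each undirected edge listed once
    labels : {node: community label}
    """
    m = len(edges)
    internal = Counter()    # L_c: edges with both endpoints in community c
    degree_sum = Counter()  # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
tri_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in range(6)}

q_split = modularity(tri_edges, split)
q_merged = modularity(tri_edges, merged)
```

Splitting the graph into its two triangles scores higher than placing every node in one community, which by construction scores zero. This is the "denser inside than outside" intuition that the abstract argues is too narrow for some systems.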
Embedding a network in hyperbolic space can reveal interesting features for the network structure, especially in terms of self-similar characteristics. The hidden metric space, which can be thought of as the underlying structure of the network, is able to preserve some interesting features generally observed in real-world networks such as heterogeneity in the degree distribution, high clustering coefficient, and small-world effect. Moreover, the angular distribution of the nodes in the hyperbolic plane reveals a community structure of the embedded network. It is worth noting that, while a large body of literature compares well-known community detection algorithms, there is still no consensus on what defines an ideal community partition on a network. Moreover, heuristics for communities found on networks embedded in the hyperbolic space have been investigated here for the first time. We compare the partitions found on embedded networks to the partitions obtained before the embedding step, both for a synthetic network and for two real-world networks. The second part of this paper presents the application of our pipeline to a network of retweets in the context of the Italian elections. Our results uncover a community structure reflective of the political spectrum, encouraging further research on the application of community detection heuristics to graphs mapped onto hyperbolic planes.
Many systems exhibit complex temporal dynamics due to the presence of different processes taking place simultaneously. Temporal networks provide a framework to describe the time-resolved interactions between components of a system. An important task when investigating such systems is to extract a simplified view of the temporal network, which can be done via dynamic community detection or clustering. Several works have generalized existing community detection methods for static networks to temporal networks, but they usually rely on temporal aggregation over time windows, the assumption of an underlying stationary process, or sequences of different stationary epochs. Here, we derive a method based on a dynamical process evolving on the temporal network and restricted by its activation pattern, which allows us to consider the full temporal information of the system. Our method allows dynamics that do not necessarily reach a steady state or follow a sequence of stationary states. Our framework encompasses several well-known heuristics as special cases. We show that our method provides a natural way to disentangle the different dynamical scales present in a system. We demonstrate our method's abilities on synthetic and real-world examples.