No Arabic abstract
Grouping objects into clusters based on similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message passing algorithms and spectral algorithms proposed for unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to Potts model at the critical temperature of spin glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks and use extensive numerical experiments to illustrate the advantage of our method over existing algorithms. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem when the data was generated by mixture models in the sparse regime we show that our method works to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which reduce heavily the computation complexity in dense-networks but gives almost the same performance as belief propagation.
Message-passing methods provide a powerful approach for calculating the expected size of cascades either on random networks (e.g., drawn from a configuration-model ensemble or its generalizations) asymptotically as the number $N$ of nodes becomes infinite or on specific finite-size networks. We review the message-passing approach and show how to derive it for configuration-model networks using the methods of (Dhar et al., 1997) and (Gleeson, 2008). Using this approach, we explain for such networks how to determine an analytical expression for a cascade condition, which determines whether a global cascade will occur. We extend this approach to the message-passing methods for specific finite-size networks (Shrestha and Moore, 2014; Lokhov et al., 2015), and we derive a generalized cascade condition. Throughout this chapter, we illustrate these ideas using the Watts threshold model.
In this article we identify social communities among gang members in the Hollenbeck policing district in Los Angeles, based on sparse observations of a combination of social interactions and geographic locations of the individuals. This information, coming from LAPD Field Interview cards, is used to construct a similarity graph for the individuals. We use spectral clustering to identify clusters in the graph, corresponding to communities in Hollenbeck, and compare these with the LAPDs knowledge of the individuals gang membership. We discuss different ways of encoding the geosocial information using a graph structure and the influence on the resulting clusterings. Finally we analyze the robustness of this technique with respect to noisy and incomplete data, thereby providing suggestions about the relative importance of quantity versus quality of collected data.
Community detection is expensive, and the cost generally depends at least linearly on the number of vertices in the graph. We propose working with a reduced graph that has many fewer nodes but nonetheless captures key community structure. The K-core of a graph is the largest subgraph within which each node has at least K connections. We propose a framework that accelerates community detection by applying an expensive algorithm (modularity optimization, the Louvain method, spectral clustering, etc.) to the K-core and then using an inexpensive heuristic (such as local modularity maximization) to infer community labels for the remaining nodes. Our experiments demonstrate that the proposed framework can reduce the running time by more than 80% while preserving the quality of the solutions. Recent theoretical investigations provide support for using the K-core as a reduced representation.
Research into detection of dense communities has recently attracted increasing attention within network science, various metrics for detection of such communities have been proposed. The most popular metric -- Modularity -- is based on the so-called rule that the links within communities are denser than external links among communities, has become the default. However, this default metric suffers from ambiguity, and worse, all augmentations of modularity and based on a narrow intuition of what it means to form a community. We argue that in specific, but quite common systems, links within a community are not necessarily more common than links between communities. Instead we propose that the defining characteristic of a community is that links are more predictable within a community rather than between communities. In this paper, based on the effect of communities on link prediction, we propose a novel metric for the community detection based directly on this feature. We find that our metric is more robustness than traditional modularity. Consequently, we can achieve an evaluation of algorithm stability for the same detection algorithm in different networks. Our metric also can directly uncover the false community detection, and infer more statistical characteristics for detection algorithms.
Embedding a network in hyperbolic space can reveal interesting features for the network structure, especially in terms of self-similar characteristics. The hidden metric space, which can be thought of as the underlying structure of the network, is able to preserve some interesting features generally observed in real-world networks such as heterogeneity in the degree distribution, high clustering coefficient, and small-world effect. Moreover, the angular distribution of the nodes in the hyperbolic plane reveals a community structure of the embedded network. It is worth noting that, while a large body of literature compares well-known community detection algorithms, there is still no consensus on what defines an ideal community partition on a network. Moreover, heuristics for communities found on networks embedded in the hyperbolic space have been investigated here for the first time. We compare the partitions found on embedded networks to the partitions obtained before the embedding step, both for a synthetic network and for two real-world networks. The second part of this paper presents the application of our pipeline to a network of retweets in the context of the Italian elections. Our results uncover a community structure reflective of the political spectrum, encouraging further research on the application of community detection heuristics to graphs mapped onto hyperbolic planes.