Community detection has been well studied in recent years, but the more realistic case of mixed membership community detection remains a challenge. Here, we develop an efficient spectral algorithm, Mixed-ISC, for estimating community memberships under the degree-corrected mixed membership (DCMM) model; given K communities, it uses more than K eigenvectors for clustering. We show that the algorithm is asymptotically consistent. Numerical experiments on both simulated networks and many empirical networks demonstrate that Mixed-ISC performs well compared with a number of benchmark methods for mixed membership community detection. In particular, Mixed-ISC performs satisfactorily on weak signal networks.
Mixed-SCORE is a recent approach for mixed membership community detection proposed by Jin et al. (2017) as an extension of SCORE (Jin, 2015). In the note Jin et al. (2018), the authors propose SCORE+ as an improvement of SCORE for handling weak signal networks. In this paper, we propose a method called Mixed-SCORE+, which builds on Mixed-SCORE and SCORE+ and therefore inherits the nice properties of both. To handle weak signal networks, the proposed method uses K+1 eigenvectors when there are K communities. We also construct vertex hunting and membership reconstruction steps to solve the problem of mixed membership community detection. Compared with several benchmark methods, numerical results show that Mixed-SCORE+ provides a significant improvement on the Polblogs network and on two weak signal networks, Simmons and Caltech, with error rates of 54/1222, 125/1137 and 94/590, respectively. Furthermore, Mixed-SCORE+ performs very well on the SNAP ego-networks.
Spectral clustering is a widely used method for detecting communities in networks. In this paper, we propose an improved spectral clustering (ISC) approach under the degree-corrected stochastic block model (DCSBM). ISC applies the k-means clustering algorithm to the leading K + 1 eigenvectors of a regularized Laplacian matrix, each weighted by its corresponding eigenvalue. Theoretical analysis shows that, under mild conditions, ISC yields stable and consistent community detection. Numerical results show that ISC outperforms classical spectral clustering methods for community detection on both simulated networks and eight empirical networks. In particular, ISC provides a significant improvement on two weak signal networks, Simmons and Caltech, with error rates of 121/1137 and 96/590, respectively.
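The pipeline described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the exact form of the regularized Laplacian, so the form D_tau^{-1/2} A D_tau^{-1/2} with D_tau = D + tau*I is an assumption, as are the default tau (the average degree), the row-normalization step, and the deterministic farthest-point k-means initialization. The names `isc_sketch`, `_kmeans`, and `two_clique_graph` are hypothetical.

```python
import numpy as np


def _kmeans(X, K, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, K):
        # squared distance from every point to its nearest chosen center
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    labels = None
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels


def isc_sketch(A, K, tau=None):
    """Cluster nodes using K + 1 eigenvalue-weighted eigenvectors (sketch)."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    if tau is None:
        tau = deg.mean()  # assumption: average degree as the regularizer
    d = 1.0 / np.sqrt(deg + tau)
    L = d[:, None] * A * d[None, :]  # assumed form of the regularized Laplacian
    vals, vecs = np.linalg.eigh(L)  # L is symmetric
    top = np.argsort(-np.abs(vals))[: K + 1]  # K + 1 leading by magnitude
    X = vecs[:, top] * vals[top]  # weight each eigenvector by its eigenvalue
    # assumption: normalize rows to unit length to reduce degree effects
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    return _kmeans(X, K)


def two_clique_graph(m):
    """Hypothetical toy graph: two m-cliques joined by a single edge."""
    A = np.zeros((2 * m, 2 * m))
    A[:m, :m] = 1.0
    A[m:, m:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[m - 1, m] = A[m, m - 1] = 1.0
    return A


labels = isc_sketch(two_clique_graph(6), K=2)
print(labels)
```

On the toy graph, the two cliques should receive different labels. The label values themselves are arbitrary, since k-means only identifies groups up to a permutation.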
Community detection has recently become an active research area in network analysis. Here, under the degree-corrected mixed membership (DCMM) model, we propose an efficient approach called mixed regularized spectral clustering (Mixed-RSC for short) based on the regularized Laplacian matrix. Mixed-RSC is built on an ideal cone structure in a variant of the eigen-decomposition of the population regularized Laplacian matrix. We show that the algorithm is asymptotically consistent under mild conditions by providing error bounds for the inferred membership vector of each node. As a byproduct of our bound, we provide the theoretically optimal choice of the regularization parameter τ. To demonstrate the performance of our method, we compare it with existing benchmark methods on both simulated and real-world networks. To our knowledge, this is the first work to design a spectral clustering algorithm for the mixed membership community detection problem under the DCMM model based on the regularized Laplacian matrix.
Because of its theoretical and practical benefits, the problem of partitioning networks into community structures has attracted significant research attention in scientific and engineering disciplines. In the literature, Newman's modularity measure is routinely applied to quantify the quality of a given partition, so maximizing the measure provides a principled way of detecting communities in networks. Unfortunately, exact optimization of the measure is NP-hard and is feasible only for very small networks, so approximation approaches are needed to scale to large networks. To address this computational issue, we propose a new method to determine the partition decisions. Coupled with an iterative rounding strategy and a fast constrained power method, our work achieves tight and effective spectral relaxations. The proposed method was evaluated thoroughly on both real and synthetic networks. Compared with state-of-the-art approaches, the method obtained partitions of comparable, if not better, quality. It is also highly suitable for parallel execution, showing a nearly linear speedup as the number of computing nodes increases, which makes it a practical tool for partitioning very large networks.
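For reference, the quantity being maximized can be computed directly. The sketch below evaluates Newman's modularity for an undirected, unweighted graph using the standard per-community form Q = sum over communities c of [e_c / m - (d_c / (2m))^2], where m is the number of edges, e_c the number of edges inside c, and d_c the total degree of c. It only scores a given partition; it does not implement the spectral relaxation, rounding strategy, or constrained power method mentioned in the abstract. The function name `modularity` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman's modularity of a partition.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    internal = defaultdict(int)    # e_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Two triangles joined by one edge, split into the two triangles.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, partition))
```

For this example, m = 7 and each triangle has 3 internal edges and total degree 7, so Q = 2 * (3/7 - 1/4) = 5/14. Placing all nodes in a single community gives Q = 0.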
Community detection in large-scale real-world networks has become increasingly popular in social analytics. In particular, analyzing dynamically growing networks is important for finding long-term trends and detecting anomalies. Analyzing such networks requires taking many snapshots and applying the same analytic methods to each of them. However, it is inefficient to extract communities from scratch from each newly generated network when consecutive snapshots differ only slightly, and doing so makes it impossible to follow network growth in real time. We propose an incremental community detection algorithm for high-volume graph streams. It is built on top of a well-known batch-oriented algorithm named DEMON [1]. We evaluated the performance and precision of the proposed incremental algorithm on large real-world networks with up to 410,236 vertices and 2,439,437 edges. It detects communities incrementally in less than one second and runs up to 107 times faster than the original algorithm without sacrificing accuracy.