No Arabic abstract
Based on the classical Degree Corrected Stochastic Blockmodel (DCSBM) model for network community detection problem, we propose two novel approaches: principal component clustering (PCC) and normalized principal component clustering (NPCC). Without any parameters to be estimated, the PCC method is simple to be implemented. Under mild conditions, we show that PCC yields consistent community detection. NPCC is designed based on the combination of the PCC and the RSC method (Qin & Rohe 2013). Population analysis for NPCC shows that NPCC returns perfect clustering for the ideal case under DCSBM. PCC and NPCC is illustrated through synthetic and real-world datasets. Numerical results show that NPCC provides a significant improvement compare with PCC and RSC. Moreover, NPCC inherits nice properties of PCC and RSC such that NPCC is insensitive to the number of eigenvectors to be clustered and the choosing of the tuning parameter. When dealing with two weak signal networks Simmons and Caltech, by considering one more eigenvectors for clustering, we provide two refinements PCC+ and NPCC+ of PCC and NPCC, respectively. Both two refinements algorithms provide improvement performances compared with their original algorithms. Especially, NPCC+ provides satisfactory performances on Simmons and Caltech, with error rates of 121/1137 and 96/590, respectively.
Spectral clustering methods are widely used for detecting clusters in networks for community detection, while a small change on the graph Laplacian matrix could bring a dramatic improvement. In this paper, we propose a dual regularized graph Laplacian matrix and then employ it to three classical spectral clustering approaches under the degree-corrected stochastic block model. If the number of communities is known as $K$, we consider more than $K$ leading eigenvectors and weight them by their corresponding eigenvalues in the spectral clustering procedure to improve the performance. Three improved spectral clustering methods are dual regularized spectral clustering (DRSC) method, dual regularized spectral clustering on Ratios-of-eigenvectors (DRSCORE) method, and dual regularized symmetrized Laplacian inverse matrix (DRSLIM) method. Theoretical analysis of DRSC and DRSLIM show that under mild conditions DRSC and DRSLIM yield stable consistent community detection, moreover, DRSCORE returns perfect clustering under the ideal case. We compare the performances of DRSC, DRSCORE and DRSLIM with several spectral methods by substantial simulated networks and eight real-world networks.
The community detection problem requires to cluster the nodes of a network into a small number of well-connected communities. There has been substantial recent progress in characterizing the fundamental statistical limits of community detection under simple stochastic block models. However, in real-world applications, the network structure is typically dynamic, with nodes that join over time. In this setting, we would like a detection algorithm to perform only a limited number of updates at each node arrival. While standard voting approaches satisfy this constraint, it is unclear whether they exploit the network information optimally. We introduce a simple model for networks growing over time which we refer to as streaming stochastic block model (StSBM). Within this model, we prove that voting algorithms have fundamental limitations. We also develop a streaming belief-propagation (StreamBP) approach, for which we prove optimality in certain regimes. We validate our theoretical findings on synthetic and real data.
In the presence of heterogeneous data, where randomly rotated objects fall into multiple underlying categories, it is challenging to simultaneously classify them into clusters and synchronize them based on pairwise relations. This gives rise to the joint problem of community detection and synchronization. We propose a series of semidefinite relaxations, and prove their exact recovery when extending the celebrated stochastic block model to this new setting where both rotations and cluster identities are to be determined. Numerical experiments demonstrate the efficacy of our proposed algorithms and confirm our theoretical result which indicates a sharp phase transition for exact recovery.
For community detection problem, spectral clustering is a widely used method for detecting clusters in networks. In this paper, we propose an improved spectral clustering (ISC) approach under the degree corrected stochastic block model (DCSBM). ISC is designed based on the k-means clustering algorithm on the weighted leading K + 1 eigenvectors of a regularized Laplacian matrix where the weights are their corresponding eigenvalues. Theoretical analysis of ISC shows that under mild conditions the ISC yields stable consistent community detection. Numerical results show that ISC outperforms classical spectral clustering methods for community detection on both simulated and eight empirical networks. Especially, ISC provides a significant improvement on two weak signal networks Simmons and Caltech, with error rates of 121/1137 and 96/590, respectively.
We propose a novel graph clustering method guided by additional information on the underlying structure of the clusters (or communities). The problem is formulated as the matching of a graph to a template with smaller dimension, hence matching $n$ vertices of the observed graph (to be clustered) to the $k$ vertices of a template graph, using its edges as support information, and relaxed on the set of orthonormal matrices in order to find a $k$ dimensional embedding. With relevant priors that encode the density of the clusters and their relationships, our method outperforms classical methods, especially for challenging cases.