ﻻ يوجد ملخص باللغة العربية
This paper explores and analyzes the unsupervised clustering of large partially observed graphs. We propose a scalable and provable randomized framework for clustering graphs generated from the stochastic block model. The clustering is first applied to a sub-matrix of the graphs adjacency matrix associated with a reduced graph sketch constructed using random sampling. Then, the clusters of the full graph are inferred based on the clusters extracted from the sketch using a correlation-based retrieval step. Uniform random node sampling is shown to improve the computational complexity over clustering of the full graph when the cluster sizes are balanced. A new random degree-based node sampling algorithm is presented which significantly improves upon the performance of the clustering algorithm even when clusters are unbalanced. This algorithm improves the phase transitions for matrix-decomposition-based clustering with regard to computational complexity and minimum cluster size, which are shown to be nearly dimension-free in the low inter-cluster connectivity regime. A third sampling technique is shown to improve balance by randomly sampling nodes based on spatial distribution. We provide analysis and numerical results using a convex clustering algorithm based on matrix completion.
A common goal in network modeling is to uncover the latent community structure present among nodes. For many real-world networks, observed connections consist of events arriving as streams, which are then aggregated to form edges, ignoring the tempor
Fitting large Bayesian network models quickly become computationally infeasible when the number of nodes grows into the hundreds of thousands and millions. In particular, the mixed membership stochastic blockmodel (MMSB) is a popular Bayesian network
A distinguishing property of communities in networks is that cycles are more prevalent within communities than across communities. Thus, the detection of these communities may be aided through the incorporation of measures of the local richness of th
We study the community detection problem on a Gaussian mixture model, in which vertices are divided into $kgeq 2$ distinct communities. The major difference in our model is that the intensities for Gaussian perturbations are different for different e
Community detection is a key task to further understand the function and the structure of complex networks. Therefore, a strategy used to assess this task must be able to avoid biased and incorrect results that might invalidate further analyses or ap