Information-theoretic thresholds for community detection in sparse networks


Abstract

We give upper and lower bounds on the information-theoretic threshold for community detection in the stochastic block model. Specifically, let $k$ be the number of groups, $d$ be the average degree, the probability of edges between vertices within and between groups be $c_\mathrm{in}/n$ and $c_\mathrm{out}/n$ respectively, and let $\lambda = (c_\mathrm{in}-c_\mathrm{out})/(kd)$. We show that, when $k$ is large and $\lambda = O(1/k)$, the critical value of $d$ at which community detection becomes possible -- in physical terms, the condensation threshold -- is
\[ d_c = \Theta\!\left( \frac{\log k}{k \lambda^2} \right), \]
with tighter results in certain regimes. Above this threshold, we show that any partition of the nodes into $k$ groups that is as good as the ground truth, in terms of the number of within-group edges, is correlated with it; this gives an exponential-time algorithm that performs better than chance -- in particular, detection is possible for $k \ge 5$ in the disassortative case $\lambda < 0$ and for $k \ge 11$ in the assortative case $\lambda > 0$. (Similar upper bounds were obtained independently by Abbe and Sandon.) Below this threshold, we use recent results of Neeman and Netrapalli (who generalized arguments of Mossel, Neeman, and Sly) to show that no algorithm can label the vertices better than chance, or even distinguish the block model from an Erdős-Rényi random graph with high probability. We also rely on bounds on certain functions of doubly stochastic matrices due to Achlioptas and Naor; indeed, our lower bound on $d_c$ is the second moment lower bound on the $k$-colorability threshold for random graphs with a certain effective degree.
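As a rough illustration of how the abstract's parameters fit together (not part of the paper), the following Python sketch computes the average degree $d$, the signal strength $\lambda$, and the order-of-magnitude scale $\log k / (k \lambda^2)$ that $d_c$ is $\Theta$ of. The expression for $d$ assumes $k$ equal-sized groups, and the sample values of $c_\mathrm{in}$, $c_\mathrm{out}$, and $k$ are arbitrary; the $\Theta$ bound leaves constant factors unspecified.

```python
import math


def sbm_threshold_scale(c_in: float, c_out: float, k: int):
    """Given within/between edge rates c_in, c_out (edge probabilities c/n)
    and k groups, return the average degree d, the signal strength
    lambda = (c_in - c_out) / (k d), and the scale log(k) / (k lambda^2)
    that d_c is Theta of (constant factors unspecified)."""
    d = (c_in + (k - 1) * c_out) / k   # average degree, assuming k equal-sized groups
    lam = (c_in - c_out) / (k * d)     # lambda as defined in the abstract
    return d, lam, math.log(k) / (k * lam ** 2)


# Illustrative values (not from the paper): k = 20 assortative groups.
d, lam, scale = sbm_threshold_scale(c_in=30.0, c_out=20.0, k=20)
print(f"d = {d:.2f}, lambda = {lam:.4f}, log(k)/(k*lambda^2) = {scale:.1f}")
```

Comparing the printed average degree $d$ against the printed scale gives a back-of-the-envelope sense of whether a given parameter setting sits above or below the regime where the theorem applies.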
