Weighted Community Detection and Data Clustering Using Message Passing


Abstract in English

Grouping objects into clusters based on similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message passing algorithms and spectral algorithms proposed for unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to Potts model at the critical temperature of spin glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks and use extensive numerical experiments to illustrate the advantage of our method over existing algorithms. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem when the data was generated by mixture models in the sparse regime we show that our method works to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which reduce heavily the computation complexity in dense-networks but gives almost the same performance as belief propagation.

Download