Semidefinite Programs for Exact Recovery of a Hidden Community

Added by Jiaming Xu

Publication date 2016

fields Mathematical Statistics Informatics Engineering

Research language: English

Authors: Bruce Hajek, Yihong Wu, Jiaming Xu

Machine Learning Information Theory Social and Information Networks

We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality $K$ from an $n \times n$ symmetric data matrix $A$, where for distinct indices $i,j$, $A_{ij} \sim P$ if $i, j$ are both in the community and $A_{ij} \sim Q$ otherwise, for two known probability distributions $P$ and $Q$. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case ($P={\rm Bern}(p)$ and $Q={\rm Bern}(q)$ with $p>q$) and the Gaussian case ($P=\mathcal{N}(\mu,1)$ and $Q=\mathcal{N}(0,1)$ with $\mu>0$), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If $K=\omega(n/\log n)$, SDP attains the information-theoretic recovery limits with sharp constants; (2) If $K=\Theta(n/\log n)$, SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If $K=o(n/\log n)$ and $K \to \infty$, SDP is order-wise suboptimal. The same critical scaling for $K$ is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of $n$ vertices partitioned into multiple communities of equal size $K$. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.
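The data model in the abstract can be made concrete with a small sampler for the Bernoulli case. This is a minimal sketch under stated assumptions, not code from the paper: the function name `sample_hidden_community`, the zero diagonal, and the uniformly random choice of the community are illustrative choices, and only the Python standard library is used.

```python
import random

def sample_hidden_community(n, K, p, q, seed=None):
    """Sample a symmetric 0/1 matrix A with a planted community of size K.

    For distinct i, j: A[i][j] ~ Bern(p) if both i and j are in the
    community, and A[i][j] ~ Bern(q) otherwise. The diagonal is left at 0
    (an assumption; the model only specifies off-diagonal entries).
    """
    rng = random.Random(seed)
    # Assumption: the community is a uniformly random K-subset of {0, ..., n-1}.
    community = set(rng.sample(range(n), K))
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if (i in community and j in community) else q
            # Fill both triangles at once so that A stays symmetric.
            A[i][j] = A[j][i] = 1 if rng.random() < prob else 0
    return A, community
```

Calling `sample_hidden_community(200, 20, 0.6, 0.1, seed=1)` returns a 200 x 200 adjacency matrix together with the planted index set, which can serve as input to any recovery method one wants to experiment with.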

Information Limits for Recovering a Hidden Community

Bruce Hajek, Yihong Wu, Jiaming Xu (2015)

We study the problem of recovering a hidden community of cardinality $K$ from an $n \times n$ symmetric data matrix $A$, where for distinct indices $i,j$, $A_{ij} \sim P$ if $i, j$ both belong to the community and $A_{ij} \sim Q$ otherwise, for two known probability distributions $P$ and $Q$ depending on $n$. If $P={\rm Bern}(p)$ and $Q={\rm Bern}(q)$ with $p>q$, it reduces to the problem of finding a densely-connected $K$-subgraph planted in a large Erdős-Rényi graph; if $P=\mathcal{N}(\mu,1)$ and $Q=\mathcal{N}(0,1)$ with $\mu>0$, it corresponds to the problem of locating a $K \times K$ principal submatrix of elevated means in a large Gaussian random matrix. We focus on two types of asymptotic recovery guarantees as $n \to \infty$: (1) weak recovery: expected number of classification errors is $o(K)$; (2) exact recovery: probability of classifying all indices correctly converges to one. Under mild assumptions on $P$ and $Q$, and allowing the community size to scale sublinearly with $n$, we derive a set of sufficient conditions and a set of necessary conditions for recovery, which are asymptotically tight with sharp constants. The results hold in particular for the Gaussian case, and for the case of bounded log likelihood ratio, including the Bernoulli case whenever $\frac{p}{q}$ and $\frac{1-p}{1-q}$ are bounded away from zero and infinity. An important algorithmic implication is that, whenever exact recovery is information theoretically possible, any algorithm that provides weak recovery when the community size is concentrated near $K$ can be upgraded to achieve exact recovery in linear additional time by a simple voting procedure.
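The abstract does not spell out the voting procedure, so the following is only a simplified illustration of the idea for the Bernoulli case, not the authors' algorithm. It assumes a 0/1 symmetric matrix `A` and an initial estimate that is mostly correct; each index is scored by its number of edges into the estimate, and the `K` highest-scoring indices are kept. The name `vote_cleanup` is hypothetical, and any additional steps the paper may use to make the analysis rigorous are omitted.

```python
def vote_cleanup(A, estimate, K):
    """Refine a rough community estimate by one round of voting.

    A        : symmetric 0/1 matrix as a list of lists
    estimate : iterable of indices believed to be in the community
    K        : target community size
    Returns the K indices with the most edges into the estimate.
    """
    n = len(A)
    members = list(estimate)
    # Each index collects one "vote" per edge to a member of the estimate.
    scores = [sum(A[i][j] for j in members if j != i) for i in range(n)]
    # Stable sort: ties are broken in favor of the smaller index.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return set(ranked[:K])
```

With a general pair $P$, $Q$ one would presumably replace the edge count by a sum of log-likelihood ratios; that variant is likewise an assumption and is not shown here.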

Machine Learning Information Theory Information Theory

Achieving Exact Cluster Recovery Threshold via Semidefinite Programming: Extensions

Bruce Hajek, Yihong Wu, Jiaming Xu (2015)

Resolving a conjecture of Abbe, Bandeira and Hall, the authors have recently shown that the semidefinite programming (SDP) relaxation of the maximum likelihood estimator achieves the sharp threshold for exactly recovering the community structure under the binary stochastic block model of two equal-sized clusters. The same was shown for the case of a single cluster and outliers. Extending the proof techniques, in this paper it is shown that SDP relaxations also achieve the sharp recovery threshold in the following cases: (1) Binary stochastic block model with two clusters of sizes proportional to network size but not necessarily equal; (2) Stochastic block model with a fixed number of equal-sized clusters; (3) Binary censored block model with the background graph being Erdős-Rényi. Furthermore, a sufficient condition is given for an SDP procedure to achieve exact recovery for the general case of a fixed number of clusters plus outliers. These results demonstrate the versatility of SDP relaxation as a simple, general purpose, computationally feasible methodology for community detection.

Machine Learning Social and Information Networks Probability

Joint Community Detection and Rotational Synchronization via Semidefinite Programming

Yifeng Fan, Yuehaw Khoo, Zhizhen Zhao (2021)

In the presence of heterogeneous data, where randomly rotated objects fall into multiple underlying categories, it is challenging to simultaneously classify them into clusters and synchronize them based on pairwise relations. This gives rise to the joint problem of community detection and synchronization. We propose a series of semidefinite relaxations, and prove their exact recovery when extending the celebrated stochastic block model to this new setting where both rotations and cluster identities are to be determined. Numerical experiments demonstrate the efficacy of our proposed algorithms and confirm our theoretical result which indicates a sharp phase transition for exact recovery.

Machine Learning Machine Learning Social and Information Networks

Information Theoretic Limits of Exact Recovery in Sub-hypergraph Models for Community Detection

Jiajun Liang, Chuyang Ke, Jean Honorio (2021)

In this paper, we study the information theoretic bounds for exact recovery in sub-hypergraph models for community detection. We define a general model called the $m$-uniform sub-hypergraph stochastic block model ($m$-ShSBM). Under the $m$-ShSBM, we use Fano's inequality to identify the region of model parameters where any algorithm fails to exactly recover the planted communities with a large probability. We also identify the region where a Maximum Likelihood Estimation (MLE) algorithm succeeds in exactly recovering the communities with high probability. Our bounds are tight and pertain to the community detection problems in various models such as the planted hypergraph stochastic block model, the planted densest sub-hypergraph model, and the planted multipartite hypergraph model.

Machine Learning Machine Learning

Recovering a Hidden Community Beyond the Kesten-Stigum Threshold in $O(|E| \log^*|V|)$ Time

Bruce Hajek, Yihong Wu, Jiaming Xu (2015)

Community detection is considered for a stochastic block model graph of $n$ vertices, with $K$ vertices in the planted community, edge probability $p$ for pairs of vertices both in the community, and edge probability $q$ for other pairs of vertices. The main focus of the paper is on weak recovery of the community based on the graph $G$, with $o(K)$ misclassified vertices on average, in the sublinear regime $n^{1-o(1)} \leq K \leq o(n)$. A critical parameter is the effective signal-to-noise ratio $\lambda=K^2(p-q)^2/((n-K)q)$, with $\lambda=1$ corresponding to the Kesten-Stigum threshold. We show that a belief propagation algorithm achieves weak recovery if $\lambda>1/e$, beyond the Kesten-Stigum threshold by a factor of $1/e$. The belief propagation algorithm only needs to run for $\log^\ast n+O(1)$ iterations, with the total time complexity $O(|E| \log^* n)$, where $\log^* n$ is the iterated logarithm of $n$. Conversely, if $\lambda \leq 1/e$, no local algorithm can asymptotically outperform trivial random guessing. Furthermore, a linear message-passing algorithm that corresponds to applying power iteration to the non-backtracking matrix of the graph is shown to attain weak recovery if and only if $\lambda>1$. In addition, the belief propagation algorithm can be combined with a linear-time voting procedure to achieve the information limit of exact recovery (correctly classify all vertices with high probability) for all $K \ge \frac{n}{\log n} \left( \rho_{\rm BP} +o(1) \right)$, where $\rho_{\rm BP}$ is a function of $p/q$.
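The effective signal-to-noise ratio and the two thresholds quoted above can be evaluated directly. The sketch below is illustrative only: the function names and the regime labels are assumptions, and the thresholds are asymptotic statements for the sublinear regime described in the abstract, so a finite-$n$ evaluation is a rough guide at best.

```python
import math

def effective_snr(n, K, p, q):
    """lambda = K^2 (p - q)^2 / ((n - K) q), as defined in the abstract."""
    return K ** 2 * (p - q) ** 2 / ((n - K) * q)

def regime(lam):
    """Map lambda to the weak-recovery regime stated in the abstract."""
    if lam > 1:
        # Above the Kesten-Stigum threshold: linear message passing works too.
        return "BP and linear message passing"
    if lam > 1 / math.e:
        # Between 1/e and 1: belief propagation achieves weak recovery.
        return "BP only"
    # At or below 1/e: no local algorithm beats random guessing.
    return "no local algorithm"
```

For instance, `effective_snr(10000, 100, 0.2, 0.1)` is approximately 0.101, which falls below $1/e \approx 0.368$, so `regime` reports that no local algorithm is expected to succeed at these parameters.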

Machine Learning Computational Complexity Social and Information Networks
