ترغب بنشر مسار تعليمي؟ اضغط هنا

Phase Transitions in Sparse PCA

183   0   0.0 ( 0 )
 نشر من قبل Thibault Lesieur
 تاريخ النشر 2015
والبحث باللغة English




اسأل ChatGPT حول البحث

We study optimal estimation for sparse principal component analysis when the number of non-zero elements is small but on the same order as the dimension of the data. We employ approximate message passing (AMP) algorithm and its state evolution to analyze what is the information theoretically minimal mean-squared error and the one achieved by AMP in the limit of large sizes. For a special case of rank one and large enough density of non-zeros Deshpande and Montanari [1] proved that AMP is asymptotically optimal. We show that both for low density and for large rank the problem undergoes a series of phase transitions suggesting existence of a region of parameters where estimation is information theoretically possible, but AMP (and presumably every other polynomial algorithm) fails. The analysis of the large rank limit is particularly instructive.



قيم البحث

اقرأ أيضاً

We study the problem of detecting a structured, low-rank signal matrix corrupted with additive Gaussian noise. This includes clustering in a Gaussian mixture model, sparse PCA, and submatrix localization. Each of these problems is conjectured to exhi bit a sharp information-theoretic threshold, below which the signal is too weak for any algorithm to detect. We derive upper and lower bounds on these thresholds by applying the first and second moment methods to the likelihood ratio between these planted models and null models where the signal matrix is zero. Our bounds differ by at most a factor of root two when the rank is large (in the clustering and submatrix localization problems, when the number of clusters or blocks is large) or the signal matrix is very sparse. Moreover, our upper bounds show that for each of these problems there is a significant regime where reliable detection is information- theoretically possible but where known algorithms such as PCA fail completely, since the spectrum of the observed matrix is uninformative. This regime is analogous to the conjectured hard but detectable regime for community detection in sparse graphs.
Sparse Principal Component Analysis (PCA) is a dimensionality reduction technique wherein one seeks a low-rank representation of a data matrix with additional sparsity constraints on the obtained representation. We consider two probabilistic formulat ions of sparse PCA: a spiked Wigner and spiked Wishart (or spiked covariance) model. We analyze an Approximate Message Passing (AMP) algorithm to estimate the underlying signal and show, in the high dimensional limit, that the AMP estimates are information-theoretically optimal. As an immediate corollary, our results demonstrate that the posterior expectation of the underlying signal, which is often intractable to compute, can be obtained using a polynomial-time scheme. Our results also effectively provide a single-letter characterization of the sparse PCA problem.
We study the statistical problem of estimating a rank-one sparse tensor corrupted by additive Gaussian noise, a model also known as sparse tensor PCA. We show that for Bernoulli and Bernoulli-Rademacher distributed signals and emph{for all} sparsity levels which are sublinear in the dimension of the signal, the sparse tensor PCA model exhibits a phase transition called the emph{all-or-nothing phenomenon}. This is the property that for some signal-to-noise ratio (SNR) $mathrm{SNR_c}$ and any fixed $epsilon>0$, if the SNR of the model is below $left(1-epsilonright)mathrm{SNR_c}$, then it is impossible to achieve any arbitrarily small constant correlation with the hidden signal, while if the SNR is above $left(1+epsilon right)mathrm{SNR_c}$, then it is possible to achieve almost perfect correlation with the hidden signal. The all-or-nothing phenomenon was initially established in the context of sparse linear regression, and over the last year also in the context of sparse 2-tensor (matrix) PCA, Bernoulli group testing, and generalized linear models. Our results follow from a more general result showing that for any Gaussian additive model with a discrete uniform prior, the all-or-nothing phenomenon follows as a direct outcome of an appropriately defined near-orthogonality property of the support of the prior distribution.
250 - Zhongxing Sun , Wei Cui , 2021
This paper is concerned with the problem of recovering a structured signal from a relatively small number of corrupted random measurements. Sharp phase transitions have been numerically observed in practice when different convex programming procedure s are used to solve this problem. This paper is devoted to presenting theoretical explanations for these phenomenons by employing some basic tools from Gaussian process theory. Specifically, we identify the precise locations of the phase transitions for both constrained and penalized recovery procedures. Our theoretical results show that these phase transitions are determined by some geometric measures of structure, e.g., the spherical Gaussian width of a tangent cone and the Gaussian (squared) distance to a scaled subdifferential. By utilizing the established phase transition theory, we further investigate the relationship between these two kinds of recovery procedures, which also reveals an optimal strategy (in the sense of Lagrange theory) for choosing the tradeoff parameter in the penalized recovery procedure. Numerical experiments are provided to verify our theoretical results.
We consider the problem of decoding a discrete signal of categorical variables from the observation of several histograms of pooled subsets of it. We present an Approximate Message Passing (AMP) algorithm for recovering the signal in the random dense setting where each observed histogram involves a random subset of entries of size proportional to n. We characterize the performance of the algorithm in the asymptotic regime where the number of observations $m$ tends to infinity proportionally to n, by deriving the corresponding State Evolution (SE) equations and studying their dynamics. We initiate the analysis of the multi-dimensional SE dynamics by proving their convergence to a fixed point, along with some further properties of the iterates. The analysis reveals sharp phase transition phenomena where the behavior of AMP changes from exact recovery to weak correlation with the signal as m/n crosses a threshold. We derive formulae for the threshold in some special cases and show that they accurately match experimental behavior.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا