Free Energy Wells and Overlap Gap Property in Sparse PCA

98 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Alexander Wein

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Gerard Ben Arous - Alexander S. Wein - Ilias Zadik

الاحتمالات بنى وهياكل البيانات والخوارزميات التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We study a variant of the sparse PCA (principal component analysis) problem in the hard regime, where the inference task is possible yet no polynomial-time algorithm is known to exist. Prior work, based on the low-degree likelihood ratio, has conjectured a precise expression for the best possible (sub-exponential) runtime throughout the hard regime. Following instead a statistical physics inspired point of view, we show bounds on the depth of free energy wells for various Gibbs measures naturally associated to the problem. These free energy wells imply hitting time lower bounds that corroborate the low-degree conjecture: we show that a class of natural MCMC (Markov chain Monte Carlo) methods (with worst-case initialization) cannot solve sparse PCA with less than the conjectured runtime. These lower bounds apply to a wide range of values for two tuning parameters: temperature and sparsity misparametrization. Finally, we prove that the Overlap Gap Property (OGP), a structural property that implies failure of certain local search algorithms, holds in a significant part of the hard regime.

قيم البحث

108 - David Gamarnik , Aukosh Jagannath , Subhabrata Sen 2019

We study support recovery for a $k times k$ principal submatrix with elevated mean $lambda/N$, hidden in an $Ntimes N$ symmetric mean zero Gaussian matrix. Here $lambda>0$ is a universal constant, and we assume $k = N rho$ for some constant $rho in ( 0,1)$. We establish that {there exists a constant $C>0$ such that} the MLE recovers a constant proportion of the hidden submatrix if $lambda {geq C} sqrt{frac{1}{rho} log frac{1}{rho}}$, {while such recovery is information theoretically impossible if $lambda = o( sqrt{frac{1}{rho} log frac{1}{rho}} )$}. The MLE is computationally intractable in general, and in fact, for $rho>0$ sufficiently small, this problem is conjectured to exhibit a emph{statistical-computational gap}. To provide rigorous evidence for this, we study the likelihood landscape for this problem, and establish that for some $varepsilon>0$ and $sqrt{frac{1}{rho} log frac{1}{rho} } ll lambda ll frac{1}{rho^{1/2 + varepsilon}}$, the problem exhibits a variant of the emph{Overlap-Gap-Property (OGP)}. As a direct consequence, we establish that a family of local MCMC based algorithms do not achieve optimal recovery. Finally, we establish that for $lambda > 1/rho$, a simple spectral method recovers a constant proportion of the hidden submatrix.

الاحتمالات التعقيد الحسابي نظرية الإحصاء

Sparse PCA via Bipartite Matchings

101 - Megasthenis Asteris , Dimitris Papailiopoulos , Anastasios Kyrillidis 2015

We consider the following multi-component sparse PCA problem: given a set of data points, we seek to extract a small number of sparse components with disjoint supports that jointly capture the maximum possible variance. These components can be comput ed one by one, repeatedly solving the single-component problem and deflating the input data matrix, but as we show this greedy procedure is suboptimal. We present a novel algorithm for sparse PCA that jointly optimizes multiple disjoint components. The extracted features capture variance that lies within a multiplicative factor arbitrarily close to 1 from the optimal. Our algorithm is combinatorial and computes the desired components by solving multiple instances of the bipartite maximum weight matching problem. Its complexity grows as a low order polynomial in the ambient dimension of the input data matrix, but exponentially in its rank. However, it can be effectively applied on a low-dimensional sketch of the data; this allows us to obtain polynomial-time approximation guarantees via spectral bounds. We evaluate our algorithm on real data-sets and empirically demonstrate that in many cases it outperforms existing, deflation-based approaches.

التعلم الالي بنى وهياكل البيانات والخوارزميات التعلم الآلي

Sparse PCA: Algorithms, Adversarial Perturbations and Certificates

125 - Tommaso dOrsi , Pravesh K. Kothari , Gleb Novikov 2020

We study efficient algorithms for Sparse PCA in standard statistical models (spiked covariance in its Wishart form). Our goal is to achieve optimal recovery guarantees while being resilient to small perturbations. Despite a long history of prior work s, including explicit studies of perturbation resilience, the best known algorithmic guarantees for Sparse PCA are fragile and break down under small adversarial perturbations. We observe a basic connection between perturbation resilience and emph{certifying algorithms} that are based on certificates of upper bounds on sparse eigenvalues of random matrices. In contrast to other techniques, such certifying algorithms, including the brute-force maximum likelihood estimator, are automatically robust against small adversarial perturbation. We use this connection to obtain the first polynomial-time algorithms for this problem that are resilient against additive adversarial perturbations by obtaining new efficient certificates for upper bounds on sparse eigenvalues of random matrices. Our algorithms are based either on basic semidefinite programming or on its low-degree sum-of-squares strengthening depending on the parameter regimes. Their guarantees either match or approach the best known guarantees of emph{fragile} algorithms in terms of sparsity of the unknown vector, number of samples and the ambient dimension. To complement our algorithmic results, we prove rigorous lower bounds matching the gap between fragile and robust polynomial-time algorithms in a natural computational model based on low-degree polynomials (closely related to the pseudo-calibration technique for sum-of-squares lower bounds) that is known to capture the best known guarantees for related statistical estimation problems. The combination of these results provides formal evidence of an inherent price to pay to achieve robustness.

التعلم الآلي بنى وهياكل البيانات والخوارزميات

Phase Transitions in Sparse PCA

478 - Thibault Lesieur , Florent Krzakala , Lenka Zdeborova 2015

We study optimal estimation for sparse principal component analysis when the number of non-zero elements is small but on the same order as the dimension of the data. We employ approximate message passing (AMP) algorithm and its state evolution to ana lyze what is the information theoretically minimal mean-squared error and the one achieved by AMP in the limit of large sizes. For a special case of rank one and large enough density of non-zeros Deshpande and Montanari [1] proved that AMP is asymptotically optimal. We show that both for low density and for large rank the problem undergoes a series of phase transitions suggesting existence of a region of parameters where estimation is information theoretically possible, but AMP (and presumably every other polynomial algorithm) fails. The analysis of the large rank limit is particularly instructive.

نظرية المعلومات الميكانيكا الإحصائية نظرية المعلومات

ODE-Inspired Analysis for the Biological Version of Ojas Rule in Solving Streaming PCA

86 - Chi-Ning Chou , Mien Brabeeba Wang 2019

Ojas rule [Oja, Journal of mathematical biology 1982] is a well-known biologically-plausible algorithm using a Hebbian-type synaptic update rule to solve streaming principal component analysis (PCA). Computational neuroscientists have known that this biological version of Ojas rule converges to the top eigenvector of the covariance matrix of the input in the limit. However, prior to this work, it was open to prove any convergence rate guarantee. In this work, we give the first convergence rate analysis for the biological version of Ojas rule in solving streaming PCA. Moreover, our convergence rate matches the information theoretical lower bound up to logarithmic factors and outperforms the state-of-the-art upper bound for streaming PCA. Furthermore, we develop a novel framework inspired by ordinary differential equations (ODE) to analyze general stochastic dynamics. The framework abandons the traditional step-by-step analysis and instead analyzes a stochastic dynamic in one-shot by giving a closed-form solution to the entire dynamic. The one-shot framework allows us to apply stopping time and martingale techniques to have a flexible and precise control on the dynamic. We believe that this general framework is powerful and should lead to effective yet simple analysis for a large class of problems with stochastic dynamics.

الخلايا العصبية والإدراك بنى وهياكل البيانات والخوارزميات التعلم الآلي