Sharp Restricted Isometry Bounds for the Inexistence of Spurious Local Minima in Nonconvex Matrix Recovery

192 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Richard Zhang

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Richard Y. Zhang - Somayeh Sojoudi - Javad Lavaei

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Nonconvex matrix recovery is known to contain no spurious local minima under a restricted isometry property (RIP) with a sufficiently small RIP constant $delta$. If $delta$ is too large, however, then counterexamples containing spurious local minima are known to exist. In this paper, we introduce a proof technique that is capable of establishing sharp thresholds on $delta$ to guarantee the inexistence of spurious local minima. Using the technique, we prove that in the case of a rank-1 ground truth, an RIP constant of $delta<1/2$ is both necessary and sufficient for exact recovery from any arbitrary initial point (such as a random point). We also prove a local recovery result: given an initial point $x_{0}$ satisfying $f(x_{0})le(1-delta)^{2}f(0)$, any descent algorithm that converges to second-order optimality guarantees exact recovery.

قيم البحث

196 - Richard Y. Zhang , Cedric Josz , Somayeh Sojoudi 2018

When the linear measurements of an instance of low-rank matrix recovery satisfy a restricted isometry property (RIP)---i.e. they are approximately norm-preserving---the problem is known to contain no spurious local minima, so exact recovery is guaran teed. In this paper, we show that moderate RIP is not enough to eliminate spurious local minima, so existing results can only hold for near-perfect RIP. In fact, counterexamples are ubiquitous: we prove that every x is the spurious local minimum of a rank-1 instance of matrix recovery that satisfies RIP. One specific counterexample has RIP constant $delta=1/2$, but causes randomly initialized stochastic gradient descent (SGD) to fail 12% of the time. SGD is frequently able to avoid and escape spurious local minima, but this empirical result shows that it can occasionally be defeated by their existence. Hence, while exact recovery guarantees will likely require a proof of no spurious local minima, arguments based solely on norm preservation will only be applicable to a narrow set of nearly-isotropic instances.

التعلم الآلي التحسين والتحكم التعلم الالي

Sharp Global Guarantees for Nonconvex Low-Rank Matrix Recovery in the Overparameterized Regime

80 - Richard Y. Zhang 2021

We prove that it is possible for nonconvex low-rank matrix recovery to contain no spurious local minima when the rank of the unknown ground truth $r^{star}<r$ is strictly less than the search rank $r$, and yet for the claim to be false when $r^{star} =r$. Under the restricted isometry property (RIP), we prove, for the general overparameterized regime with $r^{star}le r$, that an RIP constant of $delta<1/(1+sqrt{r^{star}/r})$ is sufficient for the inexistence of spurious local minima, and that $delta<1/(1+1/sqrt{r-r^{star}+1})$ is necessary due to existence of counterexamples. Without an explicit control over $r^{star}le r$, an RIP constant of $delta<1/2$ is both necessary and sufficient for the exact recovery of a rank-$r$ ground truth. But if the ground truth is known a priori to have $r^{star}=1$, then the sharp RIP threshold for exact recovery is improved to $delta<1/(1+1/sqrt{r})$.

التحسين والتحكم التعلم الآلي التعلم الالي

Passed & Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models

150 - Stefano Sarao Mannelli , Florent Krzakala , Pierfrancesco Urbani 2019

In this work we analyse quantitatively the interplay between the loss landscape and performance of descent algorithms in a prototypical inference problem, the spiked matrix-tensor model. We study a loss function that is the negative log-likelihood of the model. We analyse the number of local minima at a fixed distance from the signal/spike with the Kac-Rice formula, and locate trivialization of the landscape at large signal-to-noise ratios. We evaluate in a closed form the performance of a gradient flow algorithm using integro-differential PDEs as developed in physics of disordered systems for the Langevin dynamics. We analyze the performance of an approximate message passing algorithm estimating the maximum likelihood configuration via its state evolution. We conclude by comparing the above results: while we observe a drastic slow down of the gradient flow dynamics even in the region where the landscape is trivial, both the analyzed algorithms are shown to perform well even in the part of the region of parameters where spurious local minima are present.

التعلم الآلي الأنظمة المضطربة والشبكات العصبية نظرية الإحصاء

Matrix Completion via Nonconvex Regularization: Convergence of the Proximal Gradient Algorithm

148 - Fei Wen , Rendong Ying , Peilin Liu 2019

Matrix completion has attracted much interest in the past decade in machine learning and computer vision. For low-rank promotion in matrix completion, the nuclear norm penalty is convenient due to its convexity but has a bias problem. Recently, vario us algorithms using nonconvex penalties have been proposed, among which the proximal gradient descent (PGD) algorithm is one of the most efficient and effective. For the nonconvex PGD algorithm, whether it converges to a local minimizer and its convergence rate are still unclear. This work provides a nontrivial analysis on the PGD algorithm in the nonconvex case. Besides the convergence to a stationary point for a generalized nonconvex penalty, we provide more deep analysis on a popular and important class of nonconvex penalties which have discontinuous thresholding functions. For such penalties, we establish the finite rank convergence, convergence to restricted strictly local minimizer and eventually linear convergence rate of the PGD algorithm. Meanwhile, convergence to a local minimizer has been proved for the hard-thresholding penalty. Our result is the first shows that, nonconvex regularized matrix completion only has restricted strictly local minimizers, and the PGD algorithm can converge to such minimizers with eventually linear rate under certain conditions. Illustration of the PGD algorithm via experiments has also been provided. Code is available at https://github.com/FWen/nmc.

التعلم الآلي التحسين والتحكم التعلم الالي

Restricted Isometry Property under High Correlations

119 - Shiva Prasad Kasiviswanathan , Mark Rudelson 2019

Matrices satisfying the Restricted Isometry Property (RIP) play an important role in the areas of compressed sensing and statistical learning. RIP matrices with optimal parameters are mainly obtained via probabilistic arguments, as explicit construct ions seem hard. It is therefore interesting to ask whether a fixed matrix can be incorporated into a construction of restricted isometries. In this paper, we construct a new broad ensemble of random matrices with dependent entries that satisfy the restricted isometry property. Our construction starts with a fixed (deterministic) matrix $X$ satisfying some simple stable rank condition, and we show that the matrix $XR$, where $R$ is a random matrix drawn from various popular probabilistic models (including, subgaussian, sparse, low-randomness, satisfying convex concentration property), satisfies the RIP with high probability. These theorems have various applications in signal recovery, random matrix theory, dimensionality reduction, etc. Additionally, motivated by an application for understanding the effectiveness of word vector embeddings popular in natural language processing and machine learning applications, we investigate the RIP of the matrix $XR^{(l)}$ where $R^{(l)}$ is formed by taking all possible (disregarding order) $l$-way entrywise products of the columns of a random matrix $R$.

التعلم الآلي بنى وهياكل البيانات والخوارزميات التعلم الالي