Nonconvex Factorization and Manifold Formulations are Almost Equivalent in Low-rank Matrix Optimization


Published by: Yuetian Luo

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Yuetian Luo - Xudong Li - Anru R. Zhang

Optimization and Control, Information Theory, Machine Learning


Abstract

In this paper, we consider the geometric landscape connection of the widely studied manifold and factorization formulations in low-rank positive semidefinite (PSD) and general matrix optimization. We establish an equivalence on the set of first-order stationary points (FOSPs) and second-order stationary points (SOSPs) between the manifold and the factorization formulations. We further give a sandwich inequality on the spectrum of Riemannian and Euclidean Hessians at FOSPs, which can be used to transfer more geometric properties from one formulation to another. Similarities and differences in the landscape connection under the PSD case and the general case are discussed. To the best of our knowledge, this is the first geometric landscape connection between the manifold and the factorization formulations for handling rank constraints. In general low-rank matrix optimization, the landscape connection of two factorization formulations (unregularized and regularized ones) is also provided. By applying these geometric landscape connections, we are able to solve unanswered questions in the literature and establish stronger results in applications to the geometric analysis of phase retrieval, well-conditioned low-rank matrix optimization, and the role of regularization in factorization arising from machine learning and signal processing.
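The FOSP equivalence in the PSD case can be illustrated numerically. The sketch below is not taken from the paper: it assumes a specific objective f(X) = ½‖X − M‖_F², the embedded geometry on the set of rank-r PSD matrices, and the factorization g(Y) = f(YYᵀ). All function and variable names are hypothetical. It builds one point from eigenpairs of M, where both gradients should vanish, and one random point, where neither should.

```python
import numpy as np

def euclid_grad_f(X, M):
    """Euclidean gradient of the assumed objective f(X) = 0.5 * ||X - M||_F^2."""
    return X - M

def riemannian_grad(X, M, r):
    """Tangent-space projection of the Euclidean gradient at a rank-r PSD X
    (embedded geometry; an assumption of this sketch)."""
    _, V = np.linalg.eigh(X)        # eigenvalues are returned in ascending order
    U = V[:, -r:]                   # orthonormal basis of the column space of X
    P = U @ U.T                     # projector onto col(X)
    Q = np.eye(X.shape[0]) - P      # projector onto the orthogonal complement
    G = euclid_grad_f(X, M)
    return P @ G @ P + Q @ G @ P + P @ G @ Q

def factor_grad(Y, M):
    """Euclidean gradient of g(Y) = f(Y Y^T), valid because grad f is symmetric."""
    return 2.0 * euclid_grad_f(Y @ Y.T, M) @ Y

rng = np.random.default_rng(0)
n, r = 6, 2
A = rng.standard_normal((n, n))
M = A @ A.T                          # a PSD target matrix (illustrative data)

# Candidate stationary point: keep the 1st and 3rd largest eigenpairs of M.
w, V = np.linalg.eigh(M)
idx = [n - 1, n - 3]
Y_fosp = V[:, idx] * np.sqrt(w[idx])
X_fosp = Y_fosp @ Y_fosp.T

# Generic point for comparison.
Y_rand = rng.standard_normal((n, r))
X_rand = Y_rand @ Y_rand.T

rg_fosp = np.linalg.norm(riemannian_grad(X_fosp, M, r))
fg_fosp = np.linalg.norm(factor_grad(Y_fosp, M))
rg_rand = np.linalg.norm(riemannian_grad(X_rand, M, r))
fg_rand = np.linalg.norm(factor_grad(Y_rand, M))

print("eigenpair point:", rg_fosp, fg_fosp)
print("random point:   ", rg_rand, fg_rand)
```

The eigenpair point deliberately skips the second eigenvalue, so it is a stationary point that is not the best rank-2 approximation. This is the kind of point whose second-order behavior the sandwich inequality on the Hessians is meant to compare across the two formulations.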


Thomas Y. Hou, Zhenzhen Li, 2021

We show that the Riemannian gradient descent algorithm on the low-rank matrix manifold almost surely escapes some spurious critical points on the boundary of the manifold. Given that the low-rank matrix manifold is an incomplete set, this result is the first to overcome this difficulty and partially justify the global use of the Riemannian gradient descent on the manifold. The spurious critical points are some rank-deficient matrices that capture only part of the SVD components of the ground truth. They exhibit very singular behavior and evade the classical analysis of strict saddle points. We show that using the dynamical low-rank approximation and a rescaled gradient flow, some of the spurious critical points can be converted to classical strict saddle points, which leads to the desired result. Numerical experiments are provided to support our theoretical findings.

Optimization and Control, Information Theory, Machine Learning

Global Optimality in Distributed Low-rank Matrix Factorization

Zhihui Zhu, Qiuwei Li, Xinshuo Yang, 2018

We study the convergence of a variant of distributed gradient descent (DGD) on a distributed low-rank matrix approximation problem wherein some optimization variables are used for consensus (as in classical DGD) and some optimization variables appear only locally at a single node in the network. We term the resulting algorithm DGD+LOCAL. Using algorithmic connections to gradient descent and geometric connections to the well-behaved landscape of the centralized low-rank matrix approximation problem, we identify sufficient conditions where DGD+LOCAL is guaranteed to converge with exact consensus to a global minimizer of the original centralized problem. For the distributed low-rank matrix approximation problem, these guarantees are stronger, in terms of consensus and optimality, than what appears in the literature for classical DGD and more general problems.

Optimization and Control, Machine Learning

Low-rank optimization for distance matrix completion

B. Mishra, G. Meyer, R. Sepulchre, 2013

This paper addresses the problem of low-rank distance matrix completion. This problem amounts to recovering the missing entries of a distance matrix when the dimension of the data embedding space is possibly unknown but small compared to the number of considered data points. The focus is on high-dimensional problems. We recast the considered problem into an optimization problem over the set of low-rank positive semidefinite matrices and propose two efficient algorithms for low-rank distance matrix completion. In addition, we propose a strategy to determine the dimension of the embedding space. The resulting algorithms scale to high-dimensional problems and monotonically converge to a global solution of the problem. Finally, numerical experiments illustrate the good performance of the proposed algorithms on benchmarks.

Optimization and Control, Machine Learning

Sharp Global Guarantees for Nonconvex Low-Rank Matrix Recovery in the Overparameterized Regime

Richard Y. Zhang, 2021

We prove that it is possible for nonconvex low-rank matrix recovery to contain no spurious local minima when the rank of the unknown ground truth $r^{\star}<r$ is strictly less than the search rank $r$, and yet for the claim to be false when $r^{\star}=r$. Under the restricted isometry property (RIP), we prove, for the general overparameterized regime with $r^{\star}\le r$, that an RIP constant of $\delta<1/(1+\sqrt{r^{\star}/r})$ is sufficient for the nonexistence of spurious local minima, and that $\delta<1/(1+1/\sqrt{r-r^{\star}+1})$ is necessary due to the existence of counterexamples. Without an explicit control over $r^{\star}\le r$, an RIP constant of $\delta<1/2$ is both necessary and sufficient for the exact recovery of a rank-$r$ ground truth. But if the ground truth is known a priori to have $r^{\star}=1$, then the sharp RIP threshold for exact recovery is improved to $\delta<1/(1+1/\sqrt{r})$.

Optimization and Control, Machine Learning
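The RIP thresholds quoted in the abstract by Richard Y. Zhang can be checked with a few lines of arithmetic. The sketch below only evaluates the two formulas as they appear in the abstract; the function names are hypothetical and nothing here reproduces the paper's proofs. It shows that the sufficient and necessary bounds coincide at 1/2 when the true rank equals the search rank, and at 1/(1+1/√r) when the true rank is 1.

```python
import math

def sufficient_rip_bound(r_star, r):
    """Upper bound on delta that the abstract states is sufficient:
    delta < 1 / (1 + sqrt(r_star / r))."""
    return 1.0 / (1.0 + math.sqrt(r_star / r))

def necessary_rip_bound(r_star, r):
    """Upper bound on delta that the abstract states is necessary:
    delta < 1 / (1 + 1 / sqrt(r - r_star + 1))."""
    return 1.0 / (1.0 + 1.0 / math.sqrt(r - r_star + 1))

# Tabulate both bounds for a fixed search rank r and every true rank r_star <= r.
r = 4
for r_star in range(1, r + 1):
    s = sufficient_rip_bound(r_star, r)
    nec = necessary_rip_bound(r_star, r)
    print(f"r*={r_star}, r={r}: sufficient={s:.4f}, necessary={nec:.4f}")
```

For intermediate ranks (1 < r* < r) the two values differ, which is the gap between the sufficient and necessary conditions left open by the formulas in the abstract.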

Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization

Tian Ye, Simon S. Du, 2021

We study the asymmetric low-rank factorization problem: \[\min_{\mathbf{U} \in \mathbb{R}^{m \times d}, \mathbf{V} \in \mathbb{R}^{n \times d}} \frac{1}{2}\|\mathbf{U}\mathbf{V}^\top - \mathbf{\Sigma}\|_F^2\] where $\mathbf{\Sigma}$ is a given matrix of size $m \times n$ and rank $d$. This is a canonical problem that admits two difficulties in optimization: 1) non-convexity and 2) non-smoothness (due to unbalancedness of $\mathbf{U}$ and $\mathbf{V}$). This is also a prototype for more complex problems such as asymmetric matrix sensing and matrix completion. Despite being non-convex and non-smooth, it has been observed empirically that the randomly initialized gradient descent algorithm can solve this problem in polynomial time. Existing theories to explain this phenomenon all require artificial modifications of the algorithm, such as adding noise in each iteration and adding a balancing regularizer to balance $\mathbf{U}$ and $\mathbf{V}$. This paper presents the first proof that shows randomly initialized gradient descent converges to a global minimum of the asymmetric low-rank factorization problem with a polynomial rate. For the proof, we develop 1) a new symmetrization technique to capture the magnitudes of the symmetry and asymmetry, and 2) a quantitative perturbation analysis to approximate matrix derivatives. We believe both are useful for other related non-convex problems.

Optimization and Control, Machine Learning
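The plain gradient descent iteration discussed in the abstract by Tian Ye and Simon S. Du can be sketched as follows. This is a minimal illustration, not the authors' setup: the step size, initialization scale, iteration count, problem dimensions, and the normalization of Σ are all assumptions chosen for a small demo, and the function name is hypothetical. No noise injection and no balancing regularizer are used.

```python
import numpy as np

def gd_asymmetric_factorization(Sigma, d, step=0.1, iters=2000,
                                init_scale=0.1, seed=0):
    """Randomly initialized gradient descent on 0.5 * ||U V^T - Sigma||_F^2.
    All hyperparameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    m, n = Sigma.shape
    U = init_scale * rng.standard_normal((m, d))
    V = init_scale * rng.standard_normal((n, d))
    losses = []
    for _ in range(iters):
        R = U @ V.T - Sigma                        # residual
        losses.append(0.5 * np.linalg.norm(R, "fro") ** 2)
        # Simultaneous update: both gradients use the current (U, V).
        U, V = U - step * (R @ V), V - step * (R.T @ U)
    R = U @ V.T - Sigma
    losses.append(0.5 * np.linalg.norm(R, "fro") ** 2)
    return U, V, losses

# Small synthetic rank-d target, scaled to unit spectral norm (an assumption
# made so that the fixed step size above is reasonable).
rng = np.random.default_rng(1)
m, n, d = 8, 6, 2
Sigma = rng.standard_normal((m, d)) @ rng.standard_normal((n, d)).T
Sigma /= np.linalg.norm(Sigma, 2)

U, V, losses = gd_asymmetric_factorization(Sigma, d)
print("initial loss:", losses[0])
print("final loss:  ", losses[-1])
```

Tracking the norms of U and V alongside the loss is a simple way to observe the unbalancedness that the abstract identifies as the source of non-smoothness.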