
Global Optimality in Distributed Low-rank Matrix Factorization

Posted by: Michael Wakin
Publication date: 2018
Research field: Informatics Engineering
Paper language: English





We study the convergence of a variant of distributed gradient descent (DGD) on a distributed low-rank matrix approximation problem wherein some optimization variables are used for consensus (as in classical DGD) and some optimization variables appear only locally at a single node in the network. We term the resulting algorithm DGD+LOCAL. Using algorithmic connections to gradient descent and geometric connections to the well-behaved landscape of the centralized low-rank matrix approximation problem, we identify sufficient conditions under which DGD+LOCAL is guaranteed to converge with exact consensus to a global minimizer of the original centralized problem. For the distributed low-rank matrix approximation problem, these guarantees are stronger---in terms of consensus and optimality---than what appears in the literature for classical DGD and more general problems.
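To make the setup concrete, here is a small numerical sketch of a DGD+LOCAL-style iteration on a toy distributed low-rank approximation problem: each node holds one column block of the data, keeps a purely local factor, and mixes its copy of the shared factor with its neighbors through a doubly stochastic weight matrix. The network topology, sizes, step size, and synthetic data are illustrative assumptions, not the paper's experimental setup.

    import numpy as np

    # Toy DGD+LOCAL sketch (illustrative only, not the paper's experiments).
    # Node j holds the column block Y_j of a global matrix and fits Y_j ~ U V_j^T,
    # where U is the shared (consensus) factor and V_j exists only at node j.
    rng = np.random.default_rng(0)
    m, n_j, r, J = 20, 15, 3, 4                      # sizes are arbitrary assumptions
    U_true = rng.standard_normal((m, r))
    Y = [U_true @ rng.standard_normal((r, n_j)) for _ in range(J)]

    # Doubly stochastic mixing matrix for a ring of J nodes.
    W = np.zeros((J, J))
    for j in range(J):
        W[j, j] = 0.5
        W[j, (j - 1) % J] += 0.25
        W[j, (j + 1) % J] += 0.25

    U = [0.1 * rng.standard_normal((m, r)) for _ in range(J)]   # per-node copies of U
    V = [0.1 * rng.standard_normal((n_j, r)) for _ in range(J)] # purely local factors
    alpha = 0.01                                                # step size (assumption)

    for _ in range(3000):
        grad_U = [(U[j] @ V[j].T - Y[j]) @ V[j] for j in range(J)]
        grad_V = [(U[j] @ V[j].T - Y[j]).T @ U[j] for j in range(J)]
        # consensus (mixing) step on the shared factor, plain gradient step on each V_j
        U = [sum(W[j, k] * U[k] for k in range(J)) - alpha * grad_U[j] for j in range(J)]
        V = [V[j] - alpha * grad_V[j] for j in range(J)]

    print("residual:", sum(np.linalg.norm(U[j] @ V[j].T - Y[j], "fro") for j in range(J)))
    print("consensus gap:", max(np.linalg.norm(U[j] - U[0], "fro") for j in range(1, J)))

The key structural point is that only the copies of U are mixed across the network; the V_j factors never leave their nodes, which is what distinguishes DGD+LOCAL from running classical DGD on all variables.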




Read also

97 - Tian Ye, Simon S. Du 2021
We study the asymmetric low-rank factorization problem: \[\min_{\mathbf{U} \in \mathbb{R}^{m \times d},\ \mathbf{V} \in \mathbb{R}^{n \times d}} \frac{1}{2}\|\mathbf{U}\mathbf{V}^\top - \mathbf{\Sigma}\|_F^2\] where $\mathbf{\Sigma}$ is a given matrix of size $m \times n$ and rank $d$. This is a canonical problem that admits two difficulties in optimization: 1) non-convexity and 2) non-smoothness (due to unbalancedness of $\mathbf{U}$ and $\mathbf{V}$). This is also a prototype for more complex problems such as asymmetric matrix sensing and matrix completion. Despite being non-convex and non-smooth, it has been observed empirically that the randomly initialized gradient descent algorithm can solve this problem in polynomial time. Existing theories to explain this phenomenon all require artificial modifications of the algorithm, such as adding noise in each iteration and adding a balancing regularizer to balance $\mathbf{U}$ and $\mathbf{V}$. This paper presents the first proof that randomly initialized gradient descent converges to a global minimum of the asymmetric low-rank factorization problem at a polynomial rate. For the proof, we develop 1) a new symmetrization technique to capture the magnitudes of the symmetry and asymmetry, and 2) a quantitative perturbation analysis to approximate matrix derivatives. We believe both are useful for other related non-convex problems.
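As a point of reference for the setting described above, the following sketch runs plain gradient descent from a small random initialization on $\frac{1}{2}\|\mathbf{U}\mathbf{V}^\top - \mathbf{\Sigma}\|_F^2$, with no added noise and no balancing regularizer. The dimensions, step size, and iteration count are assumptions.

    import numpy as np

    # Plain gradient descent on f(U, V) = 0.5 * ||U V^T - Sigma||_F^2 from a small
    # random initialization, with no added noise and no balancing regularizer
    # (the unmodified algorithm analyzed above).  Sizes, step size, and iteration
    # count are illustrative assumptions.
    rng = np.random.default_rng(1)
    m, n, d = 30, 25, 4
    Sigma = rng.standard_normal((m, d)) @ rng.standard_normal((d, n))  # rank-d target

    U = 0.01 * rng.standard_normal((m, d))
    V = 0.01 * rng.standard_normal((n, d))
    eta = 0.005

    for _ in range(20000):
        R = U @ V.T - Sigma                          # residual
        U, V = U - eta * R @ V, V - eta * R.T @ U    # simultaneous gradient step
    print("final loss:", 0.5 * np.linalg.norm(U @ V.T - Sigma, "fro") ** 2)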
80 - Richard Y. Zhang 2021
We prove that it is possible for nonconvex low-rank matrix recovery to contain no spurious local minima when the rank of the unknown ground truth $r^{\star} < r$ is strictly less than the search rank $r$, and yet for the claim to be false when $r^{\star} = r$. Under the restricted isometry property (RIP), we prove, for the general overparameterized regime with $r^{\star} \le r$, that an RIP constant of $\delta < 1/(1+\sqrt{r^{\star}/r})$ is sufficient for the inexistence of spurious local minima, and that $\delta < 1/(1+1/\sqrt{r-r^{\star}+1})$ is necessary due to the existence of counterexamples. Without explicit control over $r^{\star} \le r$, an RIP constant of $\delta < 1/2$ is both necessary and sufficient for the exact recovery of a rank-$r$ ground truth. But if the ground truth is known a priori to have $r^{\star} = 1$, then the sharp RIP threshold for exact recovery improves to $\delta < 1/(1+1/\sqrt{r})$.
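For concreteness, the two RIP bounds quoted above are easy to evaluate numerically; note that for $r^{\star} = 1$ they coincide at $1/(1+1/\sqrt{r})$, matching the sharp threshold stated for rank-one ground truths. The snippet below is plain arithmetic on those formulas.

    import math

    # Numerical check of the two RIP bounds quoted above, for search rank r and
    # true rank r_star <= r.
    def sufficient_bound(r_star, r):
        # delta below this value: no spurious local minima
        return 1.0 / (1.0 + math.sqrt(r_star / r))

    def necessary_bound(r_star, r):
        # counterexamples with spurious local minima exist once delta reaches this value
        return 1.0 / (1.0 + 1.0 / math.sqrt(r - r_star + 1))

    r = 5
    for r_star in (1, 2, 5):
        print(r_star, round(sufficient_bound(r_star, r), 3), round(necessary_bound(r_star, r), 3))
    # For r_star = 1 both bounds equal 1 / (1 + 1/sqrt(r)), the sharp threshold above.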
This paper addresses the problem of low-rank distance matrix completion. This problem amounts to recovering the missing entries of a distance matrix when the dimension of the data embedding space is possibly unknown but small compared to the number of considered data points. The focus is on high-dimensional problems. We recast the considered problem into an optimization problem over the set of low-rank positive semidefinite matrices and propose two efficient algorithms for low-rank distance matrix completion. In addition, we propose a strategy to determine the dimension of the embedding space. The resulting algorithms scale to high-dimensional problems and monotonically converge to a global solution of the problem. Finally, numerical experiments illustrate the good performance of the proposed algorithms on benchmarks.
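A generic way to set up such a factorized formulation (a sketch under stated assumptions, not necessarily the algorithms proposed in that work) is to parameterize the Gram matrix as $YY^\top$ with a thin factor $Y$, write squared distances as $d_{ij}(Y) = \|y_i - y_j\|^2$, and run gradient descent on the squared residuals over the observed entries.

    import numpy as np

    # Sketch: complete a Euclidean distance matrix by optimizing over a thin factor
    # Y, so that the Gram matrix Y Y^T is positive semidefinite with rank <= p.
    # Squared distances are d_ij(Y) = ||y_i - y_j||^2.  Generic gradient descent;
    # sizes, sampling rate, and step size are illustrative assumptions.
    rng = np.random.default_rng(2)
    n, p = 40, 3
    X_true = rng.standard_normal((n, p))                         # hidden embedding
    D = ((X_true[:, None, :] - X_true[None, :, :]) ** 2).sum(-1) # full distance matrix

    upper = np.triu(rng.random((n, n)) < 0.4, 1)                 # observed pairs (i < j)
    obs = list(zip(*np.nonzero(upper)))

    Y = rng.standard_normal((n, p))
    step = 1e-4
    for _ in range(5000):
        grad = np.zeros_like(Y)
        for i, j in obs:
            diff = Y[i] - Y[j]
            resid = diff @ diff - D[i, j]                        # d_ij(Y) - D_ij
            grad[i] += 4.0 * resid * diff
            grad[j] -= 4.0 * resid * diff
        Y -= step * grad
    d_fit = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    print("RMSE on unobserved pairs:", np.sqrt(((d_fit - D)[~(upper | upper.T)] ** 2).mean()))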
Low-rank matrix recovery is a fundamental problem in signal processing and machine learning. A recent very popular approach to recovering a low-rank matrix X is to factorize it as a product of two smaller matrices, i.e., X = UV^T, and then optimize over U, V instead of X. Despite the resulting non-convexity, recent results have shown that many factorized objective functions actually have benign global geometry---with no spurious local minima and satisfying the so-called strict saddle property---ensuring convergence to a global minimum for many local-search algorithms. Such results hold whenever the original objective function is restricted strongly convex and smooth. However, most of these results actually consider a modified cost function that includes a balancing regularizer. While useful for deriving theory, this balancing regularizer does not appear to be necessary in practice. In this work, we close this theory-practice gap by proving that the unaltered factorized non-convex problem, without the balancing regularizer, also has similar benign global geometry. Moreover, we also extend our theoretical results to the field of distributed optimization.
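To make the "balancing regularizer" concrete: prior analyses typically study the modified objective $f(UV^\top) + \frac{\mu}{4}\|U^\top U - V^\top V\|_F^2$, whereas the result described here covers the unaltered objective with $\mu = 0$. Below is a minimal sketch of the two objectives and their gradients, using a least-squares fit as a stand-in for a general restricted strongly convex and smooth $f$; the names and $\mu$ are assumptions.

    import numpy as np

    # The factorized objective with and without the balancing regularizer used in
    # much prior theory; the result described above covers the plain mu = 0 case.
    def objective(U, V, X_target, mu=0.0):
        fit = 0.5 * np.linalg.norm(U @ V.T - X_target, "fro") ** 2
        balance = 0.25 * mu * np.linalg.norm(U.T @ U - V.T @ V, "fro") ** 2
        return fit + balance

    def gradients(U, V, X_target, mu=0.0):
        R = U @ V.T - X_target                # fit residual
        B = U.T @ U - V.T @ V                 # balancing term
        return R @ V + mu * U @ B, R.T @ U - mu * V @ B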
With the recent success of representation learning methods, which includes deep learning as a special case, there has been considerable interest in developing representation learning techniques that can incorporate known physical constraints into the learned representation. As one example, in many applications that involve a signal propagating through physical media (e.g., optics, acoustics, fluid dynamics, etc), it is known that the dynamics of the signal must satisfy constraints imposed by the wave equation. Here we propose a matrix factorization technique that decomposes such signals into a sum of components, where each component is regularized to ensure that it satisfies wave equation constraints. Although our proposed formulation is non-convex, we prove that our model can be efficiently solved to global optimality in polynomial time. We demonstrate the benefits of our work by applications in structural health monitoring, where prior work has attempted to solve this problem using sparse dictionary learning approaches that do not come with any theoretical guarantees regarding convergence to global optimality and employ heuristics to capture desired physical constraints.
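As a rough illustration of how a wave-equation prior can be attached to each factorized component, the sketch below penalizes every rank-one component by a finite-difference residual of the 1-D wave equation $u_{tt} = c^2 u_{xx}$. This is a generic construction under stated assumptions (discretization, rank-one components, penalty weight), not the paper's exact model or its globally optimal solver.

    import numpy as np

    # Schematic loss: factorize a space-time signal Y (n_x x n_t) into a sum of
    # rank-one components a_k b_k^T and penalize each component by a finite-
    # difference residual of the 1-D wave equation u_tt = c^2 u_xx.  The
    # discretization, rank-one parameterization, and weights are assumptions.
    def second_diff(n):
        # n x n second-difference matrix (interior rows only, for simplicity)
        D = np.zeros((n, n))
        for i in range(1, n - 1):
            D[i, i - 1], D[i, i], D[i, i + 1] = 1.0, -2.0, 1.0
        return D

    def wave_penalty(S, c, dx, dt):
        # ||S_tt - c^2 S_xx||_F^2 for one component S of shape (n_x, n_t)
        S_tt = S @ second_diff(S.shape[1]).T / dt ** 2
        S_xx = second_diff(S.shape[0]) @ S / dx ** 2
        return np.linalg.norm(S_tt - c ** 2 * S_xx, "fro") ** 2

    def loss(Y, A, B, c, dx, dt, lam):
        # data fit plus wave-equation penalty on each rank-one component
        comps = [np.outer(A[:, k], B[:, k]) for k in range(A.shape[1])]
        return (0.5 * np.linalg.norm(Y - sum(comps), "fro") ** 2
                + lam * sum(wave_penalty(S, c, dx, dt) for S in comps))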
