Low-rank optimization with trace norm penalty


Published by Bamdev Mishra

Publication date: 2011

Research field: Informatics Engineering

Paper language: English

Authors: B. Mishra, G. Meyer, F. Bach

Optimization and Control, Machine Learning


The paper addresses the problem of low-rank trace norm minimization. We propose an algorithm that alternates between fixed-rank optimization and rank-one updates. The fixed-rank optimization is characterized by an efficient factorization that makes the trace norm differentiable in the search space and the computation of the duality gap numerically tractable. The search space is nonlinear but is equipped with a particular Riemannian structure that leads to efficient computations. We present a second-order trust-region algorithm with a guaranteed quadratic rate of convergence. Overall, the proposed optimization scheme converges superlinearly to the global solution while maintaining complexity that is linear in the number of rows and columns of the matrix. To compute a set of solutions efficiently for a grid of regularization parameters, we propose a predictor-corrector approach that outperforms the naive warm-restart approach on the fixed-rank quotient manifold. The performance of the proposed algorithm is illustrated on problems of low-rank matrix completion and multivariate linear regression.
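As a rough illustration of why a factorization can make the trace norm differentiable, consider one possible form, $X = U B V^T$, where $U$ and $V$ have orthonormal columns and $B$ is symmetric positive definite. The abstract does not spell out the factorization, so this form is an assumption made here for illustration. Under it, the singular values of $X$ are the eigenvalues of $B$, so the trace norm of $X$ equals $\mathrm{trace}(B)$, which is a smooth function of the factors. The NumPy sketch below checks this numerically; all function and variable names are illustrative.

```python
import numpy as np

def random_orthonormal(n, r, rng):
    """Return an n x r matrix with orthonormal columns (reduced QR)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return q

def trace_norm(X):
    """Trace (nuclear) norm: the sum of the singular values of X."""
    return np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(0)
n, m, r = 8, 6, 3                      # illustrative sizes, rank r

U = random_orthonormal(n, r, rng)      # assumed factor with U^T U = I
V = random_orthonormal(m, r, rng)      # assumed factor with V^T V = I
A = rng.standard_normal((r, r))
B = A @ A.T + np.eye(r)                # symmetric positive definite

X = U @ B @ V.T                        # rank-r matrix of size n x m

# With this factorization the trace norm needs no SVD of X:
print(np.isclose(trace_norm(X), np.trace(B)))
```

The point of the sketch is only that the penalty becomes a linear function of $B$ on the factorized search space, whereas the trace norm is non-smooth as a function of $X$ itself.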


B. Mishra, G. Meyer, R. Sepulchre (2013)

This paper addresses the problem of low-rank distance matrix completion. This problem amounts to recovering the missing entries of a distance matrix when the dimension of the data embedding space is possibly unknown but small compared to the number of considered data points. The focus is on high-dimensional problems. We recast the considered problem into an optimization problem over the set of low-rank positive semidefinite matrices and propose two efficient algorithms for low-rank distance matrix completion. In addition, we propose a strategy to determine the dimension of the embedding space. The resulting algorithms scale to high-dimensional problems and monotonically converge to a global solution of the problem. Finally, numerical experiments illustrate the good performance of the proposed algorithms on benchmarks.

Optimization and Control, Machine Learning
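The link between distance matrices and low-rank positive semidefinite matrices in the abstract above can be sketched with a standard identity: if the data points are the rows of an $n \times p$ matrix $Y$, the Gram matrix $X = Y Y^T$ is positive semidefinite with rank at most $p$, and the squared distances are $D_{ij} = X_{ii} + X_{jj} - 2 X_{ij}$. The snippet below only illustrates this correspondence. It is not the authors' completion algorithm, and the names are hypothetical.

```python
import numpy as np

def edm_from_gram(X):
    """Squared Euclidean distance matrix from a Gram matrix X = Y Y^T."""
    d = np.diag(X)
    return d[:, None] + d[None, :] - 2.0 * X

rng = np.random.default_rng(0)
n, p = 6, 2                            # n points in a p-dimensional embedding
Y = rng.standard_normal((n, p))        # rows are the data points
X = Y @ Y.T                            # PSD Gram matrix, rank at most p

D = edm_from_gram(X)

# Reference computation directly from pairwise differences of the points.
D_direct = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)

print(np.allclose(D, D_direct))
print(np.linalg.matrix_rank(X) <= p)
```

Because $D$ depends on the points only through $X$, searching over low-rank positive semidefinite $X$ is one way to express a search over low-dimensional embeddings.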

On the Treatment of Optimization Problems with L1 Penalty Terms via Multiobjective Continuation

Katharina Bieker, Bennet Gebken, Sebastian Peitz (2020)

We present a novel algorithm that allows us to gain detailed insight into the effects of sparsity in linear and nonlinear optimization, which is of great importance in many scientific areas such as image and signal processing, medical imaging, compressed sensing, and machine learning (e.g., for the training of neural networks). Sparsity is an important feature to ensure robustness against noisy data, but also to find models that are interpretable and easy to analyze due to the small number of relevant terms. It is common practice to enforce sparsity by adding the $\ell_1$-norm as a weighted penalty term. In order to gain a better understanding and to allow for an informed model selection, we directly solve the corresponding multiobjective optimization problem (MOP) that arises when we minimize the main objective and the $\ell_1$-norm simultaneously. As this MOP is in general non-convex for nonlinear objectives, the weighting method will fail to provide all optimal compromises. To avoid this issue, we present a continuation method which is specifically tailored to MOPs with two objective functions, one of which is the $\ell_1$-norm. Our method can be seen as a generalization of well-known homotopy methods for linear regression problems to the nonlinear case. Several numerical examples, including neural network training, demonstrate our theoretical findings and the additional insight that can be gained by this multiobjective approach.

Optimization and Control, Machine Learning
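The "common practice" of adding the $\ell_1$-norm as a weighted penalty, mentioned in the abstract above, can be sketched for the linear least-squares case with a proximal gradient (ISTA) loop. This is the standard weighting approach that the abstract contrasts with its continuation method, not the continuation method itself. The problem data, the weights, and the function names are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, lam, n_iter=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)       # gradient of the smooth term
        x = soft_threshold(x - grad / L, lam / L)
    return x

def objective(A, b, lam, x):
    """Weighted sum of the data-fit term and the l1 penalty."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [2.0, -3.0]           # sparse ground truth (illustrative)
b = A @ x_true

# Sweep the penalty weight and report how many entries stay nonzero.
for lam in (0.1, 1.0, 10.0):
    x = ista(A, b, lam)
    print(lam, np.count_nonzero(x))
```

Each value of `lam` yields one compromise between data fit and sparsity. For this convex example a sweep over the weight is adequate, whereas the abstract notes that for nonlinear objectives the weighting method can miss some optimal compromises.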

Automatic differentiation for Riemannian optimization on low-rank matrix and tensor-train manifolds

Alexander Novikov, Maxim Rakhuba, Ivan Oseledets (2021)

In scientific computing and machine learning applications, matrices and more general multidimensional arrays (tensors) can often be approximated with the help of low-rank decompositions. Since matrices and tensors of fixed rank form smooth Riemannian manifolds, one of the popular tools for finding low-rank approximations is Riemannian optimization. Nevertheless, efficient implementation of Riemannian gradients and Hessians, required in Riemannian optimization algorithms, can be a nontrivial task in practice. Moreover, in some cases, analytic formulas are not even available. In this paper, we build upon automatic differentiation and propose a method that, given an implementation of the function to be minimized, efficiently computes Riemannian gradients and matrix-by-vector products between an approximate Riemannian Hessian and a given vector.

Optimization and Control, Machine Learning, Mathematical Software

Low-rank optimization for semidefinite convex problems

M. Journee, F. Bach, P.-A. Absil (2008)

We propose an algorithm for solving nonlinear convex programs defined in terms of a symmetric positive semidefinite matrix variable $X$. This algorithm rests on the factorization $X=Y Y^T$, where the number of columns of $Y$ fixes the rank of $X$. It is thus very effective for solving programs that have a low-rank solution. The factorization $X=Y Y^T$ evokes a reformulation of the original problem as an optimization on a particular quotient manifold. The present paper discusses the geometry of that manifold and derives a second-order optimization method. It furthermore provides some conditions on the rank of the factorization to ensure equivalence with the original problem. The efficiency of the proposed algorithm is illustrated on two applications: the maximal cut of a graph and the sparse principal component analysis problem.

Optimization and Control
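Two properties of the factorization $X = Y Y^T$ from the abstract above can be checked numerically: the number of columns of $Y$ bounds the rank of $X$, and $Y$ is determined only up to an orthogonal transformation, since $(YQ)(YQ)^T = Y Y^T$ for any orthogonal $Q$. That non-uniqueness is one way to see why a quotient structure arises. The sketch below uses illustrative sizes and names and does not implement the paper's optimization method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 7, 2                            # illustrative size and target rank

Y = rng.standard_normal((n, r))        # factor with r columns
X = Y @ Y.T                            # symmetric PSD, rank at most r

# A random r x r orthogonal matrix obtained from a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
X_rot = (Y @ Q) @ (Y @ Q).T            # same X from a different factor

print(np.linalg.matrix_rank(X))                 # bounded by r
print(np.allclose(X, X_rot))                    # invariance under Y -> Y Q
print(np.linalg.eigvalsh(X).min() >= -1e-10)    # PSD up to round-off
```

Working with $Y$ instead of $X$ reduces the number of unknowns from order $n^2$ to $n r$, which is why the approach suits problems with a low-rank solution.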

Global Optimality in Distributed Low-rank Matrix Factorization

Zhihui Zhu, Qiuwei Li, Xinshuo Yang (2018)

We study the convergence of a variant of distributed gradient descent (DGD) on a distributed low-rank matrix approximation problem wherein some optimization variables are used for consensus (as in classical DGD) and some optimization variables appear only locally at a single node in the network. We term the resulting algorithm DGD+LOCAL. Using algorithmic connections to gradient descent and geometric connections to the well-behaved landscape of the centralized low-rank matrix approximation problem, we identify sufficient conditions where DGD+LOCAL is guaranteed to converge with exact consensus to a global minimizer of the original centralized problem. For the distributed low-rank matrix approximation problem, these guarantees are stronger, in terms of consensus and optimality, than those that appear in the literature for classical DGD and more general problems.

Optimization and Control, Machine Learning