Implicit Regularization in Matrix Sensing via Mirror Descent

117 0 0.0 ( 0 )

Download Cite

Added by Fan Wu

Publication date 2021

fields Mathematical Statistics Informatics Engineering

and research's language is English

Authors Fan Wu - Patrick Rebeschini

Machine Learning Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We study discrete-time mirror descent applied to the unregularized empirical risk in matrix sensing. In both the general case of rectangular matrices and the particular case of positive semidefinite matrices, a simple potential-based analysis in terms of the Bregman divergence allows us to establish convergence of mirror descent -- with different choices of the mirror maps -- to a matrix that, among all global minimizers of the empirical risk, minimizes a quantity explicitly related to the nuclear norm, the Frobenius norm, and the von Neumann entropy. In both cases, this characterization implies that mirror descent, a first-order algorithm minimizing the unregularized empirical risk, recovers low-rank matrices under the same set of assumptions that are sufficient to guarantee recovery for nuclear-norm minimization. When the sensing matrices are symmetric and commute, we show that gradient descent with full-rank factorized parametrization is a first-order approximation to mirror descent, in which case we obtain an explicit characterization of the implicit bias of gradient flow as a by-product.

rate research

Implicit regularization and solution uniqueness in over-parameterized matrix sensing

100 - Kelly Geyer , Anastasios Kyrillidis , Amir Kalev 2018

We consider whether algorithmic choices in over-parameterized linear matrix factorization introduce implicit regularization. We focus on noiseless matrix sensing over rank-$r$ positive semi-definite (PSD) matrices in $mathbb{R}^{n times n}$, with a sensing mechanism that satisfies restricted isometry properties (RIP). The algorithm we study is emph{factored gradient descent}, where we model the low-rankness and PSD constraints with the factorization $UU^top$, for $U in mathbb{R}^{n times r}$. Surprisingly, recent work argues that the choice of $r leq n$ is not pivotal: even setting $U in mathbb{R}^{n times n}$ is sufficient for factored gradient descent to find the rank-$r$ solution, which suggests that operating over the factors leads to an implicit regularization. In this contribution, we provide a different perspective to the problem of implicit regularization. We show that under certain conditions, the PSD constraint by itself is sufficient to lead to a unique rank-$r$ matrix recovery, without implicit or explicit low-rank regularization. emph{I.e.}, under assumptions, the set of PSD matrices, that are consistent with the observed data, is a singleton, regardless of the algorithm used.

Machine Learning Machine Learning Optimization and Control

Active Model Aggregation via Stochastic Mirror Descent

625 - Ravi Ganti 2015

We consider the problem of learning convex aggregation of models, that is as good as the best convex aggregation, for the binary classification problem. Working in the stream based active learning setting, where the active learner has to make a decision on-the-fly, if it wants to query for the label of the point currently seen in the stream, we propose a stochastic-mirror descent algorithm, called SMD-AMA, with entropy regularization. We establish an excess risk bounds for the loss of the convex aggregate returned by SMD-AMA to be of the order of $Oleft(sqrt{frac{log(M)}{{T^{1-mu}}}}right)$, where $muin [0,1)$ is an algorithm dependent parameter, that trades-off the number of labels queried, and excess risk.

Machine Learning Artificial Intelligence Machine Learning

Implicit Regularization for Optimal Sparse Recovery

175 - Tomas Vav{s}keviv{c}ius , Varun Kanade , Patrick Rebeschini 2019

We investigate implicit regularization schemes for gradient descent methods applied to unpenalized least squares regression to solve the problem of reconstructing a sparse signal from an underdetermined system of linear measurements under the restricted isometry assumption. For a given parametrization yielding a non-convex optimization problem, we show that prescribed choices of initialization, step size and stopping time yield a statistically and computationally optimal algorithm that achieves the minimax rate with the same cost required to read the data up to poly-logarithmic factors. Beyond minimax optimality, we show that our algorithm adapts to instance difficulty and yields a dimension-independent rate when the signal-to-noise ratio is high enough. Key to the computational efficiency of our method is an increasing step size scheme that adapts to refined estimates of the true solution. We validate our findings with numerical experiments and compare our algorithm against explicit $ell_{1}$ penalization. Going from hard instances to easy ones, our algorithm is seen to undergo a phase transition, eventually matching least squares with an oracle knowledge of the true support.

Machine Learning Machine Learning Signal Processing

The Statistical Complexity of Early-Stopped Mirror Descent

303 - Tomas Vav{s}keviv{c}ius , Varun Kanade , Patrick Rebeschini 2020

Recently there has been a surge of interest in understanding implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms applied to the unregularized empirical risk with the squared loss for linear models and kernel methods. By completing an inequality that characterizes convexity for the squared loss, we identify an intrinsic link between offset Rademacher complexities and potential-based convergence analysis of mirror descent methods. Our observation immediately yields excess risk guarantees for the path traced by the iterates of mirror descent in terms of offset complexities of certain function classes depending only on the choice of the mirror map, initialization point, step-size, and the number of iterations. We apply our theory to recover, in a clean and elegant manner via rather short proofs, some of the recent results in the implicit regularization literature, while also showing how to improve upon them in some settings.

Machine Learning Machine Learning

Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data

104 - Yan Li , Caleb Ju , Ethan X. Fang 2021

Bregman proximal point algorithm (BPPA), as one of the centerpieces in the optimization toolbox, has been witnessing emerging applications. With simple and easy to implement update rule, the algorithm bears several compelling intuitions for empirical successes, yet rigorous justifications are still largely unexplored. We study the computational properties of BPPA through classification tasks with separable data, and demonstrate provable algorithmic regularization effects associated with BPPA. We show that BPPA attains non-trivial margin, which closely depends on the condition number of the distance generating function inducing the Bregman divergence. We further demonstrate that the dependence on the condition number is tight for a class of problems, thus showing the importance of divergence in affecting the quality of the obtained solutions. In addition, we extend our findings to mirror descent (MD), for which we establish similar connections between the margin and Bregman divergence. We demonstrate through a concrete example, and show BPPA/MD converges in direction to the maximal margin solution with respect to the Mahalanobis distance. Our theoretical findings are among the first to demonstrate the benign learning properties BPPA/MD, and also provide corroborations for a careful choice of divergence in the algorithmic design.

Machine Learning

Implicit Regularization in Matrix Sensing via Mirror Descent

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions