
Convergence rates and approximation results for SGD and its continuous-time counterpart

Added by: Xavier Fontaine
Publication date: 2020
Language: English





This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes. First, we show that the recursion defining SGD can be provably approximated by solutions of a time-inhomogeneous Stochastic Differential Equation (SDE) using an appropriate coupling. In the specific case of batch noise we refine our results using recent advances in Stein's method. Then, motivated by recent analyses of deterministic and stochastic optimization methods through their continuous counterparts, we study the long-time behavior of the continuous processes at hand and establish non-asymptotic bounds. To that end, we develop new comparison techniques which are of independent interest. Adapting these techniques to the discrete setting, we show that the same results hold for the corresponding SGD sequences. In particular, we improve the non-asymptotic bounds for SGD in the convex setting under weaker assumptions than those considered in previous works. Finally, we establish finite-time convergence results under various conditions, including relaxations of the famous Łojasiewicz inequality, which can be applied to a class of non-convex functions.
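To make the object of study concrete, here is a minimal Python sketch (our illustration, not the authors' construction): on a toy quadratic objective, SGD with steps gamma_n = gamma_0 / n^alpha is viewed as a coarse Euler-Maruyama scheme for the time-inhomogeneous SDE dX_t = -h'(X_t) dt + sqrt(gamma(t)) sigma dB_t on the rescaled clock t_n = gamma_1 + ... + gamma_n, and compared against a finer discretization of the same SDE. The objective, noise level, and step schedule are all placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    grad = lambda x: x  # gradient of the toy objective h(x) = x**2 / 2

    gamma0, alpha, sigma, n_steps = 0.5, 0.75, 1.0, 2000

    x = X = 5.0          # shared starting point
    m = 20               # Euler-Maruyama substeps per SGD step
    for n in range(1, n_steps + 1):
        g = gamma0 / n ** alpha  # non-increasing step size gamma_n
        # SGD step: a coarse discretization of the SDE with step gamma_n.
        x -= g * (grad(x) + sigma * rng.standard_normal())
        # Finer Euler-Maruyama scheme over the same time interval of length g.
        dt = g / m
        for _ in range(m):
            X += -grad(X) * dt + np.sqrt(g) * sigma * np.sqrt(dt) * rng.standard_normal()

    print(f"SGD iterate: {x:+.4f}   SDE endpoint: {X:+.4f}")  # both near 0

Both trajectories drift toward the minimizer at 0; the paper's results quantify how close the discrete and continuous laws stay under an appropriate coupling.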



Related research

Several problems in machine learning and inverse problems require generating discrete data, as if sampled from a model probability distribution. A common way to do so relies on the construction of a uniform probability distribution over a set of $N$ points which minimizes the Wasserstein distance to the model distribution. This minimization problem, where the unknowns are the positions of the atoms, is non-convex. Yet, in most cases, a suitably adjusted version of Lloyd's algorithm -- in which Voronoi cells are replaced by Power cells -- leads to configurations with small Wasserstein error. This is surprising given the non-convex nature of the problem and the existence of spurious critical points. We provide explicit upper bounds for the convergence speed of this Lloyd-type algorithm, starting from a cloud of points sufficiently far from each other. The bounds hold already after one step of the iteration procedure, and similar bounds can be deduced for the corresponding gradient descent. These bounds naturally lead to a modified Polyak-Łojasiewicz inequality for the Wasserstein distance cost, with an error term depending on the distances between Dirac masses in the discrete distribution.
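As a point of reference, the following is a minimal Python sketch of the classical Lloyd iteration with plain Voronoi cells against an empirical stand-in for the model distribution; the Power-cell variant discussed in the abstract additionally reweights cells so that every atom carries exactly mass 1/N, a step omitted here. All sizes are placeholders.

    import numpy as np

    rng = np.random.default_rng(1)
    samples = rng.normal(size=(20_000, 2))       # stand-in for the model distribution
    N = 50
    atoms = rng.uniform(-3.0, 3.0, size=(N, 2))  # initial cloud of points

    for _ in range(30):
        # Assign each sample to its nearest atom (its Voronoi cell).
        d2 = ((samples[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=-1)
        cell = d2.argmin(axis=1)
        # Lloyd step: move each atom to the barycenter of its cell.
        for i in range(N):
            pts = samples[cell == i]
            if len(pts):
                atoms[i] = pts.mean(axis=0)

The uniform distribution over the resulting atoms typically has small quantization (Wasserstein) error, even though cell masses are only approximately uniform without the Power-cell reweighting.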
81 - Yuning Yang 2019
The epsilon alternating least squares ($\epsilon$-ALS) method is developed and analyzed for the canonical polyadic decomposition (approximation) of a higher-order tensor where one or more of the factor matrices are assumed to be columnwise orthonormal. It is shown that the algorithm globally converges to a KKT point for all tensors without any assumption. For the original ALS, by further studying the properties of the polar decomposition, we also establish its global convergence under a reality assumption not stronger than those in the literature. These results completely address a question concerning global convergence raised in [L. Wang, M. T. Chu and B. Yu, \emph{SIAM J. Matrix Anal. Appl.}, 36 (2015), pp. 1--19]. In addition, an initialization procedure is proposed, which possesses a provable lower bound when the number of columnwise orthonormal factors is one. Armed with this initialization procedure, numerical experiments show that the $\epsilon$-ALS exhibits promising performance in terms of efficiency and effectiveness.
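The following sketch illustrates the orthonormally constrained ALS idea in Python/NumPy for a third-order tensor with one columnwise orthonormal factor: the constrained factor is updated by a polar decomposition (an orthogonal Procrustes step), the free factors by regularized least squares. Where exactly the epsilon regularization enters in $\epsilon$-ALS is our guess; dimensions, rank, and iteration counts are placeholders.

    import numpy as np

    def polar(M):
        # Orthonormal polar factor of M: the maximizer of <M, A> over A^T A = I.
        U, _, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ Vt

    rng = np.random.default_rng(2)
    I, J, K, R = 10, 9, 8, 3
    eps = 1e-8  # regularization; its exact role in eps-ALS is our assumption

    # Synthetic rank-R tensor whose first factor is columnwise orthonormal.
    A0 = polar(rng.standard_normal((I, R)))
    B0, C0 = rng.standard_normal((J, R)), rng.standard_normal((K, R))
    T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)

    A = polar(rng.standard_normal((I, R)))
    B, C = rng.standard_normal((J, R)), rng.standard_normal((K, R))
    for _ in range(200):
        # Constrained factor: orthogonal Procrustes step via polar decomposition.
        A = polar(np.einsum('ijk,jr,kr->ir', T, B, C))
        # Free factors: (regularized) least-squares ALS updates.
        G = (A.T @ A) * (C.T @ C)
        B = np.einsum('ijk,ir,kr->jr', T, A, C) @ np.linalg.inv(G + eps * np.eye(R))
        G = (A.T @ A) * (B.T @ B)
        C = np.einsum('ijk,ir,jr->kr', T, A, B) @ np.linalg.inv(G + eps * np.eye(R))

    err = np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(T)
    print(f"relative fit error: {err:.2e}")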
118 - D. Leventhal, A.S. Lewis 2008
We study randomized variants of two classical algorithms: coordinate descent for systems of linear equations and iterated projections for systems of linear inequalities. Expanding on a recent randomized iterated projection algorithm of Strohmer and Vershynin for systems of linear equations, we show that, under appropriate probability distributions, the linear rates of convergence (in expectation) can be bounded in terms of natural linear-algebraic condition numbers for the problems. We relate these condition measures to distances to ill-posedness, and discuss generalizations to convex systems under metric regularity assumptions.
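A minimal sketch of the Strohmer-Vershynin randomized iterated projection (randomized Kaczmarz) scheme that the abstract builds on: rows are sampled with probability proportional to their squared norms, and the iterate is projected onto the corresponding hyperplane. Problem sizes and iteration counts are placeholders.

    import numpy as np

    def randomized_kaczmarz(A, b, n_iters=5000, seed=0):
        # Pick row i with probability ||a_i||^2 / ||A||_F^2, then project the
        # current iterate onto the hyperplane {x : <a_i, x> = b_i}.
        rng = np.random.default_rng(seed)
        row_norms2 = (A ** 2).sum(axis=1)
        probs = row_norms2 / row_norms2.sum()
        x = np.zeros(A.shape[1])
        for _ in range(n_iters):
            i = rng.choice(A.shape[0], p=probs)
            x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]
        return x

    rng = np.random.default_rng(3)
    A = rng.standard_normal((200, 20))
    x_true = rng.standard_normal(20)
    x = randomized_kaczmarz(A, A @ x_true)  # consistent system
    print(np.linalg.norm(x - x_true))       # error contracts linearly in expectation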
In this work, we approach the minimization of a continuously differentiable convex function under linear equality constraints by a second-order dynamical system with an asymptotically vanishing damping term. The system is formulated in terms of the augmented Lagrangian associated with the minimization problem. We show fast convergence of the primal-dual gap, the feasibility measure, and the objective function value along the generated trajectories. In case the objective function has a Lipschitz continuous gradient, we show that the primal-dual trajectory asymptotically converges weakly to a primal-dual optimal solution of the underlying minimization problem. To the best of our knowledge, this is the first result which guarantees the convergence of the trajectory generated by a primal-dual dynamical system with asymptotically vanishing damping. Moreover, in the case of the unconstrained minimization of a convex differentiable function with Lipschitz continuous gradient, we rediscover all convergence statements obtained in the literature for Nesterov's accelerated gradient method.
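Since the unconstrained case recovers the guarantees of Nesterov's accelerated gradient method, the following is a minimal sketch of that method on a toy quadratic; the momentum coefficient (k-1)/(k+2) mirrors the asymptotically vanishing damping of the dynamical system. The objective and step count are placeholders.

    import numpy as np

    rng = np.random.default_rng(4)
    M = rng.standard_normal((20, 20))
    Q = M.T @ M / 20                 # f(x) = 0.5 * x^T Q x, convex and smooth
    L = np.linalg.eigvalsh(Q).max()  # Lipschitz constant of the gradient

    x = y = rng.standard_normal(20)
    for k in range(1, 500):
        x_next = y - (1.0 / L) * (Q @ y)               # gradient step at extrapolated point
        y = x_next + (k - 1) / (k + 2) * (x_next - x)  # vanishing-damping momentum
        x = x_next

    print(0.5 * x @ Q @ x)  # objective gap decays like O(1/k^2)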
124 - Feng Xue 2021
In this paper, we study the nonexpansive properties of the metric resolvent and present a convergence rate analysis for the associated fixed-point iterations (of Banach-Picard and Krasnoselskii-Mann type). Equipped with a variable metric, we develop global ergodic and non-ergodic iteration-complexity bounds in terms of both solution distance and objective value. A byproduct of our exposition also extends the proximity operator and Moreau's decomposition identity to an arbitrary variable metric. It is further shown that many classes of first-order operator splitting algorithms, including the alternating direction method of multipliers, primal-dual hybrid gradient and Bregman iterations, can be expressed as fixed-point iterations of a simple metric resolvent, and thus the convergence can be analyzed within this unified framework.
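To illustrate the fixed-point viewpoint, here is a minimal sketch (with the plain Euclidean metric rather than a variable one) of a Krasnoselskii-Mann iteration of a forward-backward map for the LASSO problem; the backward half is a resolvent, namely the proximity operator of the l1 norm. Problem data and the relaxation parameter lam are placeholders.

    import numpy as np

    def prox_l1(v, t):
        # Resolvent (proximity operator) of t * ||.||_1: soft-thresholding.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    rng = np.random.default_rng(5)
    A = rng.standard_normal((50, 100))
    b = A @ (rng.standard_normal(100) * (rng.random(100) < 0.1))
    mu = 0.1
    tau = 1.0 / np.linalg.norm(A, 2) ** 2  # step making the forward map nonexpansive
    lam = 0.8                              # Krasnoselskii-Mann relaxation parameter

    x = np.zeros(100)
    for _ in range(2000):
        # Forward-backward map T(x) = prox_{tau*mu*||.||_1}(x - tau * A^T (Ax - b)).
        Tx = prox_l1(x - tau * A.T @ (A @ x - b), tau * mu)
        x = (1 - lam) * x + lam * Tx       # averaged (KM) fixed-point iteration

    print(0.5 * np.linalg.norm(A @ x - b) ** 2 + mu * np.abs(x).sum())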