A Unified Convergence Analysis of First Order Convex Optimization Methods via Strong Lyapunov Functions

314 0 0.0 ( 0 )

Download Cite

Added by Hao Luo

Publication date 2021

fields

and research's language is English

Authors Long Chen - Hao Luo

Optimization and Control

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We present a unified convergence analysis for first order convex optimization methods using the concept of strong Lyapunov conditions. Combining this with suitable time scaling factors, we are able to handle both convex and strong convex cases, and establish contraction properties of Lyapunov functions for many existing ordinary differential equation models. Then we derive prevailing first order optimization algorithms, such as proximal gradient methods, heavy ball methods (also known as momentum methods), Nesterov accelerated gradient methods, and accelerated proximal gradient methods from numerical discretizations of corresponding dynamical systems. We also apply strong Lyapunov conditions to the discrete level and provide a more systematical analysis framework. Another contribution is a novel second order dynamical system called Hessian-driven Nesterov accelerated gradient flow which can be used to design and analyze accelerated first order methods for smooth and non-smooth convex optimizations.

rate research

On The Convergence of First Order Methods for Quasar-Convex Optimization

100 - Jikai Jin 2020

In recent years, the success of deep learning has inspired many researchers to study the optimization of general smooth non-convex functions. However, recent works have established pessimistic worst-case complexities for this class functions, which is in stark contrast with their superior performance in real-world applications (e.g. training deep neural networks). On the other hand, it is found that many popular non-convex optimization problems enjoy certain structured properties which bear some similarities to convexity. In this paper, we study the class of textit{quasar-convex functions} to close the gap between theory and practice. We study the convergence of first order methods in a variety of different settings and under different optimality criterions. We prove complexity upper bounds that are similar to standard results established for convex functions and much better that state-of-the-art convergence rates of non-convex functions. Overall, this paper suggests that textit{quasar-convexity} allows efficient optimization procedures, and we are looking forward to seeing more problems that demonstrate similar properties in practice.

Optimization and Control Machine Learning Machine Learning

A Unified Analysis of First-Order Methods for Smooth Games via Integral Quadratic Constraints

193 - Guodong Zhang , Xuchan Bao , Laurent Lessard 2020

The theory of integral quadratic constraints (IQCs) allows the certification of exponential convergence of interconnected systems containing nonlinear or uncertain elements. In this work, we adapt the IQC theory to study first-order methods for smooth and strongly-monotone games and show how to design tailored quadratic constraints to get tight upper bounds of convergence rates. Using this framework, we recover the existing bound for the gradient method~(GD), derive sharper bounds for the proximal point method~(PPM) and optimistic gradient method~(OG), and provide emph{for the first time} a global convergence rate for the negative momentum method~(NM) with an iteration complexity $mathcal{O}(kappa^{1.5})$, which matches its known lower bound. In addition, for time-varying systems, we prove that the gradient method with optimal step size achieves the fastest provable worst-case convergence rate with quadratic Lyapunov functions. Finally, we further extend our analysis to stochastic games and study the impact of multiplicative noise on different algorithms. We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch (in contrast with the stochastic strongly-convex optimization setting, where such acceleration has been demonstrated). However, we exhibit an algorithm which achieves acceleration with two gradient queries per batch.

Optimization and Control Machine Learning Machine Learning

Relatively-Smooth Convex Optimization by First-Order Methods, and Applications

92 - Haihao Lu , Robert M. Freund , Yurii Nesterov 2016

The usual approach to developing and analyzing first-order methods for smooth convex optimization assumes that the gradient of the objective function is uniformly smooth with some Lipschitz constant $L$. However, in many settings the differentiable convex function $f(cdot)$ is not uniformly smooth -- for example in $D$-optimal design where $f(x):=-ln det(HXH^T)$, or even the univariate setting with $f(x) := -ln(x) + x^2$. Herein we develop a notion of relative smoothness and relative strong convexity that is determined relative to a user-specified reference function $h(cdot)$ (that should be computationally tractable for algorithms), and we show that many differentiable convex functions are relatively smooth with respect to a correspondingly fairly-simple reference function $h(cdot)$. We extend two standard algorithms -- the primal gradient scheme and the dual averaging scheme -- to our new setting, with associated computational guarantees. We apply our new approach to develop a new first-order method for the $D$-optimal design problem, with associated computational complexity analysis. Some of our results have a certain overlap with the recent work cite{bbt}.

Optimization and Control

A Unified Convergence Analysis for Shuffling-Type Gradient Methods

141 - Lam M. Nguyen , Quoc Tran-Dinh , Dzung T. Phan 2020

In this paper, we provide a unified convergence analysis for a class of shuffling-type gradient methods for solving a well-known finite-sum minimization problem commonly used in machine learning. This algorithm covers various variants such as randomized reshuffling, single shuffling, and cyclic/incremental gradient schemes. We consider two different settings: strongly convex and non-convex problems. Our main contribution consists of new non-asymptotic and asymptotic convergence rates for a general class of shuffling-type gradient methods to solve both non-convex and strongly convex problems. While our rate in the non-convex problem is new (i.e. not known yet under standard assumptions), the rate on the strongly convex case matches (up to a constant) the best-known results. However, unlike existing works in this direction, we only use standard assumptions such as smoothness and strong convexity. Finally, we empirically illustrate the effect of learning rates via a non-convex logistic regression and neural network examples.

Optimization and Control Machine Learning Machine Learning

New Computational Guarantees for Solving Convex Optimization Problems with First Order Methods, via a Function Growth Condition Measure

338 - Robert M. Freund , Haihao Lu 2015

Motivated by recent work of Renegar, we present new computational methods and associated computational guarantees for solving convex optimization problems using first-order methods. Our problem of interest is the general convex optimization problem $f^* = min_{x in Q} f(x)$, where we presume knowledge of a strict lower bound $f_{mathrm{slb}} < f^*$. [Indeed, $f_{mathrm{slb}}$ is naturally known when optimizing many loss functions in statistics and machine learning (least-squares, logistic loss, exponential loss, total variation loss, etc.) as well as in Renegars transformed version of the standard conic optimization problem; in all these cases one has $f_{mathrm{slb}} = 0 < f^*$.] We introduce a new functional measure called the growth constant $G$ for $f(cdot)$, that measures how quickly the level sets of $f(cdot)$ grow relative to the function value, and that plays a fundamental role in the complexity analysis. When $f(cdot)$ is non-smooth, we present new computational guarantees for the Subgradient Descent Method and for smoothing methods, that can improve existing computational guarantees in several ways, most notably when the initial iterate $x^0$ is far from the optimal solution set. When $f(cdot)$ is smooth, we present a scheme for periodically restarting the Accelerated Gradient Method that can also improve existing computational guarantees when $x^0$ is far from the optimal solution set, and in the presence of added structure we present a scheme using parametrically increased smoothing that further improves the associated computational guarantees.

Optimization and Control