Factor-$sqrt{2}$ Acceleration of Accelerated Gradient Methods

82 0 0.0 ( 0 )

Download Cite

Added by Chanwoo Park

Publication date 2021

fields

and research's language is English

Authors Chanwoo Park - Jisun Park - Ernest K. Ryu

Optimization and Control

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The optimized gradient method (OGM) provides a factor-$sqrt{2}$ speedup upon Nesterovs celebrated accelerated gradient method in the convex (but non-strongly convex) setup. However, this improved acceleration mechanism has not been well understood; prior analyses of OGM relied on a computer-assisted proof methodology, so the proofs were opaque for humans despite being verifiable and correct. In this work, we present a new analysis of OGM based on a Lyapunov function and linear coupling. These analyses are developed and presented without the assistance of computers and are understandable by humans. Furthermore, we generalize OGMs acceleration mechanism and obtain a factor-$sqrt{2}$ speedup in other setups: acceleration with a simpler rational stepsize, the strongly convex setup, and the mirror descent setup.

rate research

Stopping rules for accelerated gradient methods with additive noise in gradient

101 - Vasin Artem , Alexander Gasnikov , Vladimir Spokoiny 2021

In this article, we consider an accelerated first-order method, namely, the method of similar triangles, which is optimal in the class of convex (strongly convex) problems with a gradient. The paper considers a model of additive noise in a gradient and a Euclidean prox-structure. Convergence estimates are obtained in the case of strong convexity and its absence, and a stopping criterion is proposed for not strongly convex problems.

Optimization and Control

Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks

98 - Alireza Fallah , Mert Gurbuzbalaban , Asuman Ozdaglar 2019

We study distributed stochastic gradient (D-SG) method and its accelerated variant (D-ASG) for solving decentralized strongly convex stochastic optimization problems where the objective function is distributed over several computational units, lying on a fixed but arbitrary connected communication graph, subject to local communication constraints where noisy estimates of the gradients are available. We develop a framework which allows to choose the stepsize and the momentum parameters of these algorithms in a way to optimize performance by systematically trading off the bias, variance, robustness to gradient noise and dependence to network effects. When gradients do not contain noise, we also prove that distributed accelerated methods can emph{achieve acceleration}, requiring $mathcal{O}(kappa log(1/varepsilon))$ gradient evaluations and $mathcal{O}(kappa log(1/varepsilon))$ communications to converge to the same fixed point with the non-accelerated variant where $kappa$ is the condition number and $varepsilon$ is the target accuracy. To our knowledge, this is the first acceleration result where the iteration complexity scales with the square root of the condition number in the context of emph{primal} distributed inexact first-order methods. For quadratic functions, we also provide finer performance bounds that are tight with respect to bias and variance terms. Finally, we study a multistage version of D-ASG with parameters carefully varied over stages to ensure exact $mathcal{O}(-k/sqrt{kappa})$ linear decay in the bias term as well as optimal $mathcal{O}(sigma^2/k)$ in the variance term. We illustrate through numerical experiments that our approach results in practical algorithms that are robust to gradient noise and that can outperform existing methods.

Optimization and Control Machine Learning Machine Learning

Acceleration Methods

61 - Alexandre dAspremont , Damien Scieur , Adrien Taylor 2021

This monograph covers some recent advances on a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, momentum and nested optimization schemes, which coincide in the quadratic case to form the Chebyshev method whose complexity is analyzed using Chebyshev polynomials. We discuss momentum methods in detail, starting with the seminal work of Nesterov (1983) and structure convergence proofs using a few master templates, such as that of emph{optimized gradient methods} which have the key benefit of showing how momentum methods maximize convergence rates. We further cover proximal acceleration techniques, at the heart of the emph{Catalyst} and emph{Accelerated Hybrid Proximal Extragradient} frameworks, using similar algorithmic patterns. Common acceleration techniques directly rely on the knowledge of some regularity parameters of the problem at hand, and we conclude by discussing emph{restart} schemes, a set of simple techniques to reach nearly optimal convergence rates while adapting to unobserved regularity parameters.

Optimization and Control Machine Learning Numerical Analysis

Stochastic Bias-Reduced Gradient Methods

153 - Hilal Asi , Yair Carmon , Arun Jambulapati 2021

We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_star$ of any Lipschitz strongly-convex function. In particular, we use a multilevel Monte-Carlo approach due to Blanchet and Glynn to turn any optimal stochastic gradient method into an estimator of $x_star$ with bias $delta$, variance $O(log(1/delta))$, and an expected sampling cost of $O(log(1/delta))$ stochastic gradient evaluations. As an immediate consequence, we obtain cheap and nearly unbiased gradient estimators for the Moreau-Yoshida envelope of any Lipschitz convex function, allowing us to perform dimension-free randomized smoothing. We demonstrate the potential of our estimator through four applications. First, we develop a method for minimizing the maximum of $N$ functions, improving on recent results and matching a lower bound up logarithmic factors. Second and third, we recover state-of-the-art rates for projection-efficient and gradient-efficient optimization using simple algorithms with a transparent analysis. Finally, we show that an improved version of our estimator would yield a nearly linear-time, optimal-utility, differentially-private non-smooth stochastic optimization method.

Optimization and Control Data Structures and Algorithms Machine Learning

Exact and Heuristic Methods with Warm-start for Embedded Mixed-Integer Quadratic Programming Based on Accelerated Dual Gradient Projection

153 - Vihangkumar V. Naik , Alberto Bemporad 2021

Small-scale Mixed-Integer Quadratic Programming (MIQP) problems often arise in embedded control and estimation applications. Driven by the need for algorithmic simplicity to target computing platforms with limited memory and computing resources, this paper proposes a few approaches to solving MIQPs, either to optimality or suboptimally. We specialize an existing Accelerated Dual Gradient Projection (GPAD) algorithm to effectively solve the Quadratic Programming (QP) relaxation that arise during Branch and Bound (B&B) and propose a generic framework to warm-start the binary variables which reduces the number of QP relaxations. Moreover, in order to find an integer feasible combination of the binary variables upfront, two heuristic approaches are presented: ($i$) without using B&B, and ($ii$) using B&B with a significantly reduced number of QP relaxations. Both heuristic approaches return an integer feasible solution that may be suboptimal but involve a much reduced computation effort. Such a feasible solution can be either implemented directly or used to set an initial upper bound on the optimal cost in B&B. Through different hybrid control and estimation examples involving binary decision variables, we show that the performance of the proposed methods, although very simple to code, is comparable to that of state-of-the-art MIQP solvers.

Optimization and Control Systems and Control Systems and Control