New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Nonlinear Acceleration of CNNs

96 0 0.0 ( 0 )

Download Cite

Added by Damien Scieur

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Damien Scieur - Edouard Oyallon - Alexandre dAspremont

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descend, SAGA or SVRG. Until now, its analysis is limited to convex problems, but empirical observations shows that RNA may be extended to wider settings. In this paper, we investigate further the benefits of RNA when applied to neural networks, in particular for the task of image recognition on CIFAR10 and ImageNet. With very few modifications of exiting frameworks, RNA improves slightly the optimization process of CNNs, after training.

rate research

Generalized Framework for Nonlinear Acceleration

74 - Damien Scieur 2019

Nonlinear acceleration algorithms improve the performance of iterative methods, such as gradient descent, using the information contained in past iterates. However, their efficiency is still not entirely understood even in the quadratic case. In this paper, we clarify the convergence analysis by giving general properties that share several classes of nonlinear acceleration: Anderson acceleration (and variants), quasi-Newton methods (such as Broyden Type-I or Type-II, SR1, DFP, and BFGS) and Krylov methods (Conjugate Gradient, MINRES, GMRES). In particular, we propose a generic family of algorithms that contains all the previous methods and prove its optimal rate of convergence when minimizing quadratic functions. We also propose multi-secants updates for the quasi-Newton methods listed above. We provide a Matlab code implementing the algorithm.

Optimization and Control

A Generic Acceleration Framework for Stochastic Composite Optimization

81 - Andrei Kulunchakov 2019

In this paper, we introduce various mechanisms to obtain accelerated first-order stochastic optimization algorithms when the objective function is convex or strongly convex. Specifically, we extend the Catalyst approach originally designed for deterministic objectives to the stochastic setting. Given an optimization method with mild convergence guarantees for strongly convex problems, the challenge is to accelerate convergence to a noise-dominated region, and then achieve convergence with an optimal worst-case complexity depending on the noise variance of the gradients. A side contribution of our work is also a generic analysis that can handle inexact proximal operators, providing new insights about the robustness of stochastic algorithms when the proximal operator cannot be exactly computed.

Optimization and Control Machine Learning Machine Learning

Nonlinear Acceleration of Momentum and Primal-Dual Algorithms

88 - Raghu Bollapragada , Damien Scieur , Alexandre dAspremont 2018

We describe convergence acceleration schemes for multistep optimization algorithms. The extrapolated solution is written as a nonlinear average of the iterates produced by the original optimization method. Our analysis does not need the underlying fixed-point operator to be symmetric, hence handles e.g. algorithms with momentum terms such as Nesterovs accelerated method, or primal-dual methods. The weights are computed via a simple linear system and we analyze performance in both online and offline modes. We use Crouzeixs conjecture to show that acceleration performance is controlled by the solution of a Chebyshev problem on the numerical range of a non-symmetric operator modeling the behavior of iterates near the optimum. Numerical experiments are detailed on logistic regression problems.

Optimization and Control

Limitations on Variance-Reduction and Acceleration Schemes for Finite Sum Optimization

112 - Yossi Arjevani 2017

We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes on finite sum optimization problems. First, we show that, perhaps surprisingly, the finite sum structure by itself, is not sufficient for obtaining a complexity bound of $tilde{cO}((n+L/mu)ln(1/epsilon))$ for $L$-smooth and $mu$-strongly convex individual functions - one must also know which individual function is being referred to by the oracle at each iteration. Next, we show that for a broad class of first-order and coordinate-descent finite sum algorithms (including, e.g., SDCA, SVRG, SAG), it is not possible to get an `accelerated complexity bound of $tilde{cO}((n+sqrt{n L/mu})ln(1/epsilon))$, unless the strong convexity parameter is given explicitly. Lastly, we show that when this class of algorithms is used for minimizing $L$-smooth and convex finite sums, the optimal complexity bound is $tilde{cO}(n+L/epsilon)$, assuming that (on average) the same update rule is used in every iteration, and $tilde{cO}(n+sqrt{nL/epsilon})$, otherwise.

Optimization and Control Machine Learning Machine Learning

Acceleration Methods

61 - Alexandre dAspremont , Damien Scieur , Adrien Taylor 2021

This monograph covers some recent advances on a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, momentum and nested optimization schemes, which coincide in the quadratic case to form the Chebyshev method whose complexity is analyzed using Chebyshev polynomials. We discuss momentum methods in detail, starting with the seminal work of Nesterov (1983) and structure convergence proofs using a few master templates, such as that of emph{optimized gradient methods} which have the key benefit of showing how momentum methods maximize convergence rates. We further cover proximal acceleration techniques, at the heart of the emph{Catalyst} and emph{Accelerated Hybrid Proximal Extragradient} frameworks, using similar algorithmic patterns. Common acceleration techniques directly rely on the knowledge of some regularity parameters of the problem at hand, and we conclude by discussing emph{restart} schemes, a set of simple techniques to reach nearly optimal convergence rates while adapting to unobserved regularity parameters.

Optimization and Control Machine Learning Numerical Analysis

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Nonlinear Acceleration of CNNs

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions