Acceleration in First Order Quasi-strongly Convex Optimization by ODE Discretization

70 0 0.0 ( 0 )

Download Cite

Added by Jingzhao Zhang

Publication date 2019

fields

and research's language is English

Authors Jingzhao Zhang - Suvrit Sra - Ali Jadbabaie

Optimization and Control

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We study gradient-based optimization methods obtained by direct Runge-Kutta discretization of the ordinary differential equation (ODE) describing the movement of a heavy-ball under constant friction coefficient. When the function is high order smooth and strongly convex, we show that directly simulating the ODE with known numerical integrators achieve acceleration in a nontrivial neighborhood of the optimal solution. In particular, the neighborhood can grow larger as the condition number of the function increases. Furthermore, our results also hold for nonconvex but quasi-strongly convex objectives. We provide numerical experiments that verify the theoretical rates predicted by our results.

rate research

Achieving Acceleration in Distributed Optimization via Direct Discretization of the Heavy-Ball ODE

71 - Jingzhao Zhang , Cesar A. Uribe , Aryan Mokhtari 2018

We develop a distributed algorithm for convex Empirical Risk Minimization, the problem of minimizing large but finite sum of convex functions over networks. The proposed algorithm is derived from directly discretizing the second-order heavy-ball differential equation and results in an accelerated convergence rate, i.e, faster than distributed gradient descent-based methods for strongly convex objectives that may not be smooth. Notably, we achieve acceleration without resorting to the well-known Nesterovs momentum approach. We provide numerical experiments and contrast the proposed method with recently proposed optimal distributed optimization algorithms.

Optimization and Control

Robustness of accelerated first-order algorithms for strongly convex optimization problems

128 - Hesameddin Mohammadi , Meisam Razaviyayn , Mihailo R. Jovanovic 2019

We study the robustness of accelerated first-order algorithms to stochastic uncertainties in gradient evaluation. Specifically, for unconstrained, smooth, strongly convex optimization problems, we examine the mean-squared error in the optimization variable when the iterates are perturbed by additive white noise. This type of uncertainty may arise in situations where an approximation of the gradient is sought through measurements of a real system or in a distributed computation over a network. Even though the underlying dynamics of first-order algorithms for this class of problems are nonlinear, we establish upper bounds on the mean-squared deviation from the optimal solution that are tight up to constant factors. Our analysis quantifies fundamental trade-offs between noise amplification and convergence rates obtained via any acceleration scheme similar to Nesterovs or heavy-ball methods. To gain additional analytical insight, for strongly convex quadratic problems, we explicitly evaluate the steady-state variance of the optimization variable in terms of the eigenvalues of the Hessian of the objective function. We demonstrate that the entire spectrum of the Hessian, rather than just the extreme eigenvalues, influence robustness of noisy algorithms. We specialize this result to the problem of distributed averaging over undirected networks and examine the role of network size and topology on the robustness of noisy accelerated algorithms.

Optimization and Control Artificial Intelligence Machine Learning

Relatively-Smooth Convex Optimization by First-Order Methods, and Applications

92 - Haihao Lu , Robert M. Freund , Yurii Nesterov 2016

The usual approach to developing and analyzing first-order methods for smooth convex optimization assumes that the gradient of the objective function is uniformly smooth with some Lipschitz constant $L$. However, in many settings the differentiable convex function $f(cdot)$ is not uniformly smooth -- for example in $D$-optimal design where $f(x):=-ln det(HXH^T)$, or even the univariate setting with $f(x) := -ln(x) + x^2$. Herein we develop a notion of relative smoothness and relative strong convexity that is determined relative to a user-specified reference function $h(cdot)$ (that should be computationally tractable for algorithms), and we show that many differentiable convex functions are relatively smooth with respect to a correspondingly fairly-simple reference function $h(cdot)$. We extend two standard algorithms -- the primal gradient scheme and the dual averaging scheme -- to our new setting, with associated computational guarantees. We apply our new approach to develop a new first-order method for the $D$-optimal design problem, with associated computational complexity analysis. Some of our results have a certain overlap with the recent work cite{bbt}.

Optimization and Control

On The Convergence of First Order Methods for Quasar-Convex Optimization

100 - Jikai Jin 2020

In recent years, the success of deep learning has inspired many researchers to study the optimization of general smooth non-convex functions. However, recent works have established pessimistic worst-case complexities for this class functions, which is in stark contrast with their superior performance in real-world applications (e.g. training deep neural networks). On the other hand, it is found that many popular non-convex optimization problems enjoy certain structured properties which bear some similarities to convexity. In this paper, we study the class of textit{quasar-convex functions} to close the gap between theory and practice. We study the convergence of first order methods in a variety of different settings and under different optimality criterions. We prove complexity upper bounds that are similar to standard results established for convex functions and much better that state-of-the-art convergence rates of non-convex functions. Overall, this paper suggests that textit{quasar-convexity} allows efficient optimization procedures, and we are looking forward to seeing more problems that demonstrate similar properties in practice.

Optimization and Control Machine Learning Machine Learning

Katyusha Acceleration for Convex Finite-Sum Compositional Optimization

91 - Yibo Xu , Yangyang Xu 2019

Structured problems arise in many applications. To solve these problems, it is important to leverage the structure information. This paper focuses on convex problems with a finite-sum compositional structure. Finite-sum problems appear as the sample average approximation of a stochastic optimization problem and also arise in machine learning with a huge amount of training data. One popularly used numerical approach for finite-sum problems is the stochastic gradient method (SGM). However, the additional compositional structure prohibits easy access to unbiased stochastic approximation of the gradient, so directly applying the SGM to a finite-sum compositional optimization problem (COP) is often inefficient. We design new algorithms for solving strongly-convex and also convex two-level finite-sum COPs. Our design incorporates the Katyusha acceleration technique and adopts the mini-batch sampling from both outer-level and inner-level finite-sum. We first analyze the algorithm for strongly-convex finite-sum COPs. Similar to a few existing works, we obtain linear convergence rate in terms of the expected objective error, and from the convergence rate result, we then establish complexity results of the algorithm to produce an $varepsilon$-solution. Our complexity results have the same dependence on the number of component functions as existing works. However, due to the use of Katyusha acceleration, our results have better dependence on the condition number $kappa$ and improve to $kappa^{2.5}$ from the best-known $kappa^3$. Finally, we analyze the algorithm for convex finite-sum COPs, which uses as a subroutine the algorithm for strongly-convex finite-sum COPs. Again, we obtain better complexity results than existing works in terms of the dependence on $varepsilon$, improving to $varepsilon^{-2.5}$ from the best-known $varepsilon^{-3}$.

Optimization and Control Machine Learning Numerical Analysis