Heavy-ball Algorithms Always Escape Saddle Points


Published by Tao Sun

Publication date 2019

Research field Informatics Engineering

Paper language English

Authors Tao Sun - Dongsheng Li - Zhe Quan


Abstract

Nonconvex optimization algorithms with random initialization have attracted increasing attention recently. It has been shown that many first-order methods always avoid saddle points when started from random points. In this paper, we answer a question: can the nonconvex heavy-ball algorithms with random initialization avoid saddle points? The answer is yes! Directly applying the existing proof technique to the heavy-ball algorithms is hard because each iteration of the heavy-ball algorithm involves both the current and the previous point, so the algorithm cannot be written as an iteration of the form $x^{k+1} = g(x^k)$ for some mapping $g$. To this end, we design a new mapping on a new space. After a suitable transformation, the heavy-ball algorithm can be interpreted as iterations of this mapping. Theoretically, we prove that heavy-ball gradient descent admits a larger step size than gradient descent while still escaping saddle points. The heavy-ball proximal point algorithm is also considered; we prove that it, too, always escapes saddle points.
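The idea of rewriting heavy-ball as a single mapping can be illustrated with a minimal Python sketch. It is not the authors' construction: the lifted state z = (x^k, x^{k-1}), the toy objective f(x, y) = 0.5x^2 + 0.25y^4 - 0.5y^2 (strict saddle at the origin, minima at (0, 1) and (0, -1)), and the parameter values gamma = 0.1 and beta = 0.5 are all assumptions chosen for illustration.

```python
import random

def grad(x, y):
    # Gradient of the assumed toy objective
    # f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2,
    # which has a strict saddle at (0, 0) and minima at (0, 1), (0, -1).
    return (x, y**3 - y)

def heavy_ball_map(z, gamma=0.1, beta=0.5):
    """One heavy-ball step as a map on the lifted state z = (current, previous).

    gamma (step size) and beta (momentum) are illustrative values.
    """
    (x, y), (px, py) = z
    gx, gy = grad(x, y)
    nx = x - gamma * gx + beta * (x - px)
    ny = y - gamma * gy + beta * (y - py)
    # The new state pairs the new point with the old current point.
    return ((nx, ny), (x, y))

def run(seed=0, steps=2000):
    """Iterate the map from a random starting point and return the final point."""
    rng = random.Random(seed)
    x0 = (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))
    z = (x0, x0)  # zero initial velocity: previous point equals the start
    for _ in range(steps):
        z = heavy_ball_map(z)
    return z[0]

if __name__ == "__main__":
    print(run(seed=0))
```

Because each step depends only on z, the iteration has the form z^{k+1} = G(z^k) on the doubled space. On this toy problem, a random start should end near one of the two minima rather than at the saddle.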


62 - Lampros Flokas , Emmanouil-Vasileios Vlatakis-Gkaragkounis , Georgios Piliouras 2019

We consider the case of derivative-free algorithms for non-convex optimization, also known as zero order algorithms, that use only function evaluations rather than gradients. For a wide variety of gradient approximators based on finite differences, we establish asymptotic convergence to second order stationary points using a carefully tailored application of the Stable Manifold Theorem. Regarding efficiency, we introduce a noisy zero-order method that converges to second order stationary points, i.e., avoids saddle points. Our algorithm uses only $\tilde{\mathcal{O}}(1 / \epsilon^2)$ approximate gradient calculations and, thus, it matches the convergence rate guarantees of its exact gradient counterparts up to constants. In contrast to previous work, our convergence rate analysis avoids imposing additional dimension dependent slowdowns in the number of iterations required for non-convex zero order optimization.
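As a rough illustration of a finite-difference gradient approximator, the following Python sketch estimates a gradient from function values only and plugs it into plain descent. It is a deterministic toy, not the noisy method of the paper: the central-difference scheme, the test function `f`, the difference width `h`, and the step size are all assumptions made for the example.

```python
def fd_gradient(f, x, h=1e-5):
    """Central-difference gradient estimate of f at x (function values only)."""
    g = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def f(x):
    # Assumed toy objective with a strict saddle at the origin
    # and minima at (0, 1) and (0, -1).
    return 0.5 * x[0] ** 2 + 0.25 * x[1] ** 4 - 0.5 * x[1] ** 2

def zero_order_descent(f, x0, step=0.1, iters=500):
    """Descent that replaces the exact gradient by the finite-difference estimate."""
    x = list(x0)
    for _ in range(iters):
        g = fd_gradient(f, x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

if __name__ == "__main__":
    print(fd_gradient(f, [1.0, 2.0]))
    print(zero_order_descent(f, [0.3, 0.2]))
```

Each gradient estimate costs two function evaluations per coordinate, which is the kind of dimension dependence that an analysis of zero-order methods has to account for.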

Optimization and Control, Machine Learning

Escaping strict saddle points of the Moreau envelope in nonsmooth optimization

67 - Damek Davis , Mateo Diaz , Dmitriy Drusvyatskiy 2021

Recent work has shown that stochastically perturbed gradient methods can efficiently escape strict saddle points of smooth functions. We extend this body of work to nonsmooth optimization, by analyzing an inexact analogue of a stochastically perturbed gradient method applied to the Moreau envelope. The main conclusion is that a variety of algorithms for nonsmooth optimization can escape strict saddle points of the Moreau envelope at a controlled rate. The main technical insight is that typical algorithms applied to the proximal subproblem yield directions that approximate the gradient of the Moreau envelope in relative terms.
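For readers unfamiliar with the Moreau envelope, the sketch below shows the standard relationship between the proximal subproblem and the envelope's gradient on the simplest nonsmooth example, f(u) = |u|. The choice of function and the helper names are assumptions for illustration; the example is convex and therefore has no saddle points, so it only shows how a proximal solution yields the gradient of the envelope.

```python
def prox_abs(x, lam):
    """Proximal map of f(u) = |u| with parameter lam (soft-thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def moreau_env_abs(x, lam):
    """Moreau envelope: min over u of |u| + (x - u)**2 / (2*lam)."""
    p = prox_abs(x, lam)
    return abs(p) + (x - p) ** 2 / (2 * lam)

def moreau_grad_abs(x, lam):
    """Gradient of the envelope, obtained from the proximal point."""
    return (x - prox_abs(x, lam)) / lam

if __name__ == "__main__":
    for x in (3.0, 0.5, -2.0):
        print(x, moreau_env_abs(x, 1.0), moreau_grad_abs(x, 1.0))
```

The envelope is smooth even though |u| is not, and its gradient is available whenever the proximal subproblem can be solved. An inexact solve of that subproblem gives an approximate gradient, which is the setting the abstract describes.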

Optimization and Control, Machine Learning

Non-ergodic Convergence Analysis of Heavy-Ball Algorithms

162 - Tao Sun , Penghang Yin , Dongsheng Li 2018

In this paper, we revisit the convergence of the Heavy-ball method, and present improved convergence complexity results in the convex setting. We provide the first non-ergodic O(1/k) rate result of the Heavy-ball algorithm with constant step size for coercive objective functions. For objective functions satisfying a relaxed strongly convex condition, the linear convergence is established under weaker assumptions on the step size and inertial parameter than made in the existing literature. We extend our results to multi-block version of the algorithm with both the cyclic and stochastic update rules. In addition, our results can also be extended to decentralized optimization, where the ergodic analysis is not applicable.

Optimization and Control, Machine Learning

Exit Time Analysis for Approximations of Gradient Descent Trajectories Around Saddle Points

60 - Rishabh Dixit , Waheed U. Bajwa 2020

This paper considers the problem of understanding the exit time for trajectories of gradient-related first-order methods from saddle neighborhoods under some initial boundary conditions. Given the "flat" geometry around saddle points, first-order methods can struggle in escaping these regions in a fast manner due to the small magnitudes of gradients encountered. In particular, while it is known that gradient-related first-order methods escape strict-saddle neighborhoods, existing literature does not explicitly leverage the local geometry around saddle points in order to control behavior of gradient trajectories. It is in this context that this paper puts forth a rigorous geometric analysis of the gradient-descent method around strict-saddle neighborhoods using matrix perturbation theory. In doing so, it provides a key result that can be used to generate an approximate gradient trajectory for any given initial conditions. In addition, the analysis leads to a linear exit-time solution for gradient-descent method under certain necessary initial conditions for a class of strict-saddle functions.

Optimization and Control, Machine Learning, Systems and Control

A fast and simple modification of Newton's method helping to avoid saddle points

64 - Tuyen Trung Truong , Tat Dat To , Tuan Hang Nguyen 2020

We propose in this paper New Q-Newton's method. The update rule is conceptually very simple: $x_{n+1}=x_n-w_n$, where $w_n=pr_{A_n,+}(v_n)-pr_{A_n,-}(v_n)$, with $A_n=\nabla^2 f(x_n)+\delta_n\|\nabla f(x_n)\|^2 \cdot Id$ and $v_n=A_n^{-1}\nabla f(x_n)$. Here $\delta_n$ is an appropriate real number so that $A_n$ is invertible, and $pr_{A_n,\pm}$ are projections to the vector subspaces generated by eigenvectors of positive (correspondingly negative) eigenvalues of $A_n$. The main result of this paper roughly says that if $f$ is $C^3$ (and may be unbounded from below) and a sequence $\{x_n\}$, constructed by New Q-Newton's method from a random initial point $x_0$, converges, then the limit point is a critical point and is not a saddle point, and the convergence rate is the same as that of Newton's method. The first author has recently succeeded in incorporating Backtracking line search into New Q-Newton's method, thus resolving the convergence guarantee issue observed for some (non-smooth) cost functions. An application to quickly finding zeros of a univariate meromorphic function will be discussed. Various experiments are performed against well-known algorithms such as BFGS and Adaptive Cubic Regularization.

Optimization and Control, Machine Learning, Numerical Analysis