Lower Bounds for Finding Stationary Points II: First-Order Methods

Published by Yair Carmon

Publication date: 2017

Paper language: English

Authors: Yair Carmon, John C. Duchi, Oliver Hinder

Optimization and Control

We establish lower bounds on the complexity of finding $\epsilon$-stationary points of smooth, non-convex high-dimensional functions using first-order methods. We prove that deterministic first-order methods, even applied to arbitrarily smooth functions, cannot achieve convergence rates in $\epsilon$ better than $\epsilon^{-8/5}$, which is within $\epsilon^{-1/15}\log\frac{1}{\epsilon}$ of the best known rate for such methods. Moreover, for functions with Lipschitz first and second derivatives, we prove no deterministic first-order method can achieve convergence rates better than $\epsilon^{-12/7}$, while $\epsilon^{-2}$ is a lower bound for functions with only Lipschitz gradient. For convex functions with Lipschitz gradient, accelerated gradient descent achieves the rate $\epsilon^{-1}\log\frac{1}{\epsilon}$, showing that finding stationary points is easier given convexity.
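
As a quick check of the exponents quoted above (a worked comparison derived only from the rates stated in the abstract, not an additional result), multiplying the $\epsilon^{-8/5}$ lower bound by the stated gap gives the upper bound it is being compared against:

$$
\epsilon^{-8/5} \cdot \epsilon^{-1/15} \log\frac{1}{\epsilon}
= \epsilon^{-24/15 - 1/15} \log\frac{1}{\epsilon}
= \epsilon^{-5/3} \log\frac{1}{\epsilon}.
$$

The three lower-bound exponents are also ordered as one would expect, with weaker smoothness assumptions giving larger exponents:

$$
\frac{8}{5} = 1.6 \;<\; \frac{12}{7} \approx 1.714 \;<\; 2.
$$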

Yair Carmon, John C. Duchi, Oliver Hinder (2017)

We prove lower bounds on the complexity of finding $\epsilon$-stationary points (points $x$ such that $\|\nabla f(x)\| \le \epsilon$) of smooth, high-dimensional, and potentially non-convex functions $f$. We consider oracle-based complexity measures, where an algorithm is given access to the value and all derivatives of $f$ at a query point $x$. We show that for any (potentially randomized) algorithm $\mathsf{A}$, there exists a function $f$ with Lipschitz $p$th order derivatives such that $\mathsf{A}$ requires at least $\epsilon^{-(p+1)/p}$ queries to find an $\epsilon$-stationary point. Our lower bounds are sharp to within constants, and they show that gradient descent, cubic-regularized Newton's method, and generalized $p$th order regularization are worst-case optimal within their natural function classes.
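
For $p = 1$ the bound above reads $\epsilon^{-2}$, which matches the textbook guarantee for gradient descent: with step size $1/L$ on a function with $L$-Lipschitz gradient, at most $2L\Delta/\epsilon^{2}$ iterates can have gradient norm above $\epsilon$, where $\Delta = f(x_0) - \inf f$. The sketch below is only an illustration of that upper bound on a toy problem. The test function $f(x) = \sum_i \cos(x_i)$, the starting point, and all function names are assumptions chosen for the example and do not come from the paper.

```python
import math


def gradient_descent_to_stationarity(grad, x0, lipschitz, eps, max_queries=1_000_000):
    """Run gradient descent with step 1/L until the gradient norm is <= eps.

    Returns the final point and the number of gradient-oracle queries used.
    """
    x = list(x0)
    for queries in range(1, max_queries + 1):
        g = grad(x)  # one first-order oracle query
        if math.sqrt(sum(gi * gi for gi in g)) <= eps:
            return x, queries
        x = [xi - gi / lipschitz for xi, gi in zip(x, g)]
    raise RuntimeError("no eps-stationary point found within max_queries")


# Hypothetical toy objective: non-convex, gradient is 1-Lipschitz, inf f = -d.
def f(x):
    return sum(math.cos(xi) for xi in x)


def grad_f(x):
    return [-math.sin(xi) for xi in x]


x0 = [1.0, 2.0, 0.5]
eps = 1e-3
L = 1.0

x, queries = gradient_descent_to_stationarity(grad_f, x0, L, eps)

# Worst-case guarantee: at most 2 * L * Delta / eps^2 non-stationary iterates.
delta = f(x0) - (-len(x0))
bound = 2 * L * delta / eps**2

print(f"queries used: {queries}, worst-case bound: {bound:.3g}")
```

On this easy instance the method stops far below the worst-case bound. The lower bound in the abstract concerns specially constructed hard functions, not typical ones.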

Optimization and Control

Lower complexity bounds of first-order methods for convex-concave bilinear saddle-point problems

Yuyuan Ouyang, Yangyang Xu (2018)

Many works have studied the complexity of first-order methods for solving convex-concave bilinear saddle-point problems (SPPs). These results all concern upper complexity bounds, which determine how much effort at most would guarantee a solution of desired accuracy. In this paper, we pursue the opposite direction by deriving lower complexity bounds of first-order methods on large-scale SPPs. Our results apply to methods whose iterates are in the linear span of past first-order information, as well as to more general methods that produce their iterates in an arbitrary manner based on first-order information. We first work on affinely constrained smooth convex optimization, which is a special case of SPP. Unlike gradient methods on unconstrained problems, we show that first-order methods on affinely constrained problems generally cannot be accelerated from the known convergence rate $O(1/t)$ to $O(1/t^2)$, and in addition, $O(1/t)$ is optimal for convex problems. Moreover, we prove that for strongly convex problems, $O(1/t^2)$ is the best possible convergence rate, while it is known that gradient methods can have linear convergence on unconstrained problems. Then we extend these results to general SPPs. It turns out that our lower complexity bounds match several established upper complexity bounds in the literature, and thus they are tight and indicate the optimality of several existing first-order methods.

Optimization and Control

Practical Schemes for Finding Near-Stationary Points of Convex Finite-Sums

Kaiwen Zhou, Lai Tian, Anthony Man-Cho So (2021)

The problem of finding near-stationary points in convex optimization has not been adequately studied yet, unlike other optimality measures such as minimizing function value. Even in the deterministic case, the optimal method (OGM-G, due to Kim and Fessler (2021)) has just been discovered recently. In this work, we conduct a systematic study of the algorithmic techniques in finding near-stationary points of convex finite-sums. Our main contributions are several algorithmic discoveries: (1) we discover a memory-saving variant of OGM-G based on the performance estimation problem approach (Drori and Teboulle, 2014); (2) we design a new accelerated SVRG variant that can simultaneously achieve fast rates for both minimizing gradient norm and function value; (3) we propose an adaptively regularized accelerated SVRG variant, which does not require the knowledge of some unknown initial constants and achieves near-optimal complexities. We put an emphasis on the simplicity and practicality of the new schemes, which could facilitate future developments.

Optimization and Control, Machine Learning

Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions

Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka (2020)

We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains examples such as ReLU neural networks and others with non-differentiable activation functions. We first show that finding an $\epsilon$-stationary point with first-order methods is impossible in finite time. We then introduce the notion of $(\delta, \epsilon)$-stationarity, which allows for an $\epsilon$-approximate gradient to be the convex combination of generalized gradients evaluated at points within distance $\delta$ to the solution. We propose a series of randomized first-order methods and analyze their complexity of finding a $(\delta, \epsilon)$-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on $\delta$. Empirically, our methods perform well for training ReLU neural networks.

Optimization and Control, Machine Learning

Interior-point methods for second-order stationary points of nonlinear semidefinite optimization problems using negative curvature

Shun Arahata, Takayuki Okuno, Akiko Takeda (2021)

We propose a primal-dual interior-point method (IPM) with convergence to second-order stationary points (SOSPs) of nonlinear semidefinite optimization problems, abbreviated as NSDPs. As far as we know, the current algorithms for NSDPs only ensure convergence to first-order stationary points such as Karush-Kuhn-Tucker points. The proposed method generates a sequence approximating SOSPs while minimizing a primal-dual merit function for NSDPs by using scaled gradient directions and directions of negative curvature. Under some assumptions, the generated sequence accumulates at an SOSP with a worst-case iteration complexity. This result is also obtained for a primal IPM with slight modification. Finally, our numerical experiments show the benefits of using directions of negative curvature in the proposed method.

Optimization and Control
