ترغب بنشر مسار تعليمي؟ اضغط هنا

On The Convergence of First Order Methods for Quasar-Convex Optimization

101   0   0.0 ( 0 )
 نشر من قبل Jikai Jin
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English
 تأليف Jikai Jin




اسأل ChatGPT حول البحث

In recent years, the success of deep learning has inspired many researchers to study the optimization of general smooth non-convex functions. However, recent works have established pessimistic worst-case complexities for this class functions, which is in stark contrast with their superior performance in real-world applications (e.g. training deep neural networks). On the other hand, it is found that many popular non-convex optimization problems enjoy certain structured properties which bear some similarities to convexity. In this paper, we study the class of textit{quasar-convex functions} to close the gap between theory and practice. We study the convergence of first order methods in a variety of different settings and under different optimality criterions. We prove complexity upper bounds that are similar to standard results established for convex functions and much better that state-of-the-art convergence rates of non-convex functions. Overall, this paper suggests that textit{quasar-convexity} allows efficient optimization procedures, and we are looking forward to seeing more problems that demonstrate similar properties in practice.



قيم البحث

اقرأ أيضاً

313 - Long Chen , Hao Luo 2021
We present a unified convergence analysis for first order convex optimization methods using the concept of strong Lyapunov conditions. Combining this with suitable time scaling factors, we are able to handle both convex and strong convex cases, and e stablish contraction properties of Lyapunov functions for many existing ordinary differential equation models. Then we derive prevailing first order optimization algorithms, such as proximal gradient methods, heavy ball methods (also known as momentum methods), Nesterov accelerated gradient methods, and accelerated proximal gradient methods from numerical discretizations of corresponding dynamical systems. We also apply strong Lyapunov conditions to the discrete level and provide a more systematical analysis framework. Another contribution is a novel second order dynamical system called Hessian-driven Nesterov accelerated gradient flow which can be used to design and analyze accelerated first order methods for smooth and non-smooth convex optimizations.
We provide improved convergence rates for constrained convex-concave min-max problems and monotone variational inequalities with higher-order smoothness. In min-max settings where the $p^{th}$-order derivatives are Lipschitz continuous, we give an al gorithm HigherOrderMirrorProx that achieves an iteration complexity of $O(1/T^{frac{p+1}{2}})$ when given access to an oracle for finding a fixed point of a $p^{th}$-order equation. We give analogous rates for the weak monotone variational inequality problem. For $p>2$, our results improve upon the iteration complexity of the first-order Mirror Prox method of Nemirovski [2004] and the second-order method of Monteiro and Svaiter [2012]. We further instantiate our entire algorithm in the unconstrained $p=2$ case.
The usual approach to developing and analyzing first-order methods for smooth convex optimization assumes that the gradient of the objective function is uniformly smooth with some Lipschitz constant $L$. However, in many settings the differentiable c onvex function $f(cdot)$ is not uniformly smooth -- for example in $D$-optimal design where $f(x):=-ln det(HXH^T)$, or even the univariate setting with $f(x) := -ln(x) + x^2$. Herein we develop a notion of relative smoothness and relative strong convexity that is determined relative to a user-specified reference function $h(cdot)$ (that should be computationally tractable for algorithms), and we show that many differentiable convex functions are relatively smooth with respect to a correspondingly fairly-simple reference function $h(cdot)$. We extend two standard algorithms -- the primal gradient scheme and the dual averaging scheme -- to our new setting, with associated computational guarantees. We apply our new approach to develop a new first-order method for the $D$-optimal design problem, with associated computational complexity analysis. Some of our results have a certain overlap with the recent work cite{bbt}.
We provide improved convergence rates for various emph{non-smooth} optimization problems via higher-order accelerated methods. In the case of $ell_infty$ regression, we achieves an $O(epsilon^{-4/5})$ iteration complexity, breaking the $O(epsilon^{-1 })$ barrier so far present for previous methods. We arrive at a similar rate for the problem of $ell_1$-SVM, going beyond what is attainable by first-order methods with prox-oracle access for non-smooth non-strongly convex problems. We further show how to achieve even faster rates by introducing higher-order regularization. Our results rely on recent advances in near-optimal accelerated methods for higher-order smooth convex optimization. In particular, we extend Nesterovs smoothing technique to show that the standard softmax approximation is not only smooth in the usual sense, but also emph{higher-order} smooth. With this observation in hand, we provide the first example of higher-order acceleration techniques yielding faster rates for emph{non-smooth} optimization, to the best of our knowledge.
We propose first-order methods based on a level-set technique for convex constrained optimization that satisfies an error bound condition with unknown growth parameters. The proposed approach solves the original problem by solving a sequence of uncon strained subproblems defined with different level parameters. Different from the existing level-set methods where the subproblems are solved sequentially, our method applies a first-order method to solve each subproblem independently and simultaneously, which can be implemented with either a single or multiple processors. Once the objective value of one subproblem is reduced by a constant factor, a sequential restart is performed to update the level parameters and restart the first-order methods. When the problem is non-smooth, our method finds an $epsilon$-optimal and $epsilon$-feasible solution by computing at most $O(frac{G^{2/d}}{epsilon^{2-2/d}}ln^3(frac{1}{epsilon}))$ subgradients where $G>0$ and $dgeq 1$ are the growth rate and the exponent, respectively, in the error bound condition. When the problem is smooth, the complexity is improved to $O(frac{G^{1/d}}{epsilon^{1-1/d}}ln^3(frac{1}{epsilon}))$. Our methods do not require knowing $G$, $d$ and any problem dependent parameters.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا