
No quantum speedup over gradient descent for non-smooth convex optimization

Published by Robin Kothari
Publication date: 2020
Research field: Informatics Engineering
Language: English





We study the first-order convex optimization problem, where we have black-box access to a (not necessarily smooth) function $f:\mathbb{R}^n \to \mathbb{R}$ and its (sub)gradient. Our goal is to find an $\epsilon$-approximate minimum of $f$ starting from a point that is distance at most $R$ from the true minimum. If $f$ is $G$-Lipschitz, then the classic gradient descent algorithm solves this problem with $O((GR/\epsilon)^{2})$ queries. Importantly, the number of queries is independent of the dimension $n$ and gradient descent is optimal in this regard: No deterministic or randomized algorithm can achieve better complexity that is still independent of the dimension $n$. In this paper we reprove the randomized lower bound of $\Omega((GR/\epsilon)^{2})$ using a simpler argument than previous lower bounds. We then show that although the function family used in the lower bound is hard for randomized algorithms, it can be solved using $O(GR/\epsilon)$ quantum queries. We then show an improved lower bound against quantum algorithms using a different set of instances and establish our main result that in general even quantum algorithms need $\Omega((GR/\epsilon)^2)$ queries to solve the problem. Hence there is no quantum speedup over gradient descent for black-box first-order convex optimization without further assumptions on the function family.
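The $O((GR/\epsilon)^{2})$ query count for gradient descent can be illustrated with a minimal sketch of the textbook fixed-step subgradient method. The abstract does not spell out the algorithm's parameters, so the step size $\eta = R/(G\sqrt{T})$, the budget $T = \lceil (GR/\epsilon)^2 \rceil$, and the use of the averaged iterate are standard choices assumed here. The names `subgradient_descent` and `make_max_abs`, and the test objective $f(x) = G \max_i |x_i - c_i|$, are hypothetical and chosen only for illustration; this objective is not the hard instance constructed in the paper.

```python
import math

def subgradient_descent(subgrad, x0, G, R, eps):
    """Fixed-step subgradient method; returns the averaged iterate and query count.

    Assumes the objective is convex and G-Lipschitz, and that x0 lies within
    Euclidean distance R of a minimizer.
    """
    T = math.ceil((G * R / eps) ** 2)   # query budget matching the (GR/eps)^2 bound
    eta = R / (G * math.sqrt(T))        # textbook fixed step size (an assumption)
    x = list(x0)
    avg = [0.0] * len(x)
    for _ in range(T):
        # Accumulate the running average of the iterates x_0, ..., x_{T-1}.
        avg = [a + xi / T for a, xi in zip(avg, x)]
        g = subgrad(x)                  # one first-order (subgradient) query
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return avg, T

def make_max_abs(c, G):
    """Illustrative non-smooth objective f(x) = G * max_i |x_i - c_i|."""
    def f(x):
        return G * max(abs(xi - ci) for xi, ci in zip(x, c))

    def subgrad(x):
        diffs = [xi - ci for xi, ci in zip(x, c)]
        i = max(range(len(x)), key=lambda j: abs(diffs[j]))
        g = [0.0] * len(x)
        if diffs[i] != 0:
            # A subgradient: G * sign(x_i - c_i) on a maximizing coordinate.
            g[i] = G * math.copysign(1.0, diffs[i])
        return g

    return f, subgrad

# The minimizer c has Euclidean norm 1, so starting from the origin gives R = 1.
c = [0.6, -0.8, 0.0]
f, subgrad = make_max_abs(c, G=1.0)
x_avg, queries = subgradient_descent(subgrad, [0.0, 0.0, 0.0], G=1.0, R=1.0, eps=0.1)
print(queries, f(x_avg))
```

With these choices the standard analysis gives $f(\bar{x}) - f^* \le GR/\sqrt{T} \le \epsilon$ for the averaged iterate $\bar{x}$, and neither the step size nor the query budget depends on the dimension $n$.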




Read also

Optimization algorithms for solving nonconvex inverse problems have attracted significant interest recently. However, existing methods require the nonconvex regularization to be smooth or simple to ensure convergence. In this paper, we propose a novel gradient descent type algorithm, by leveraging the idea of residual learning and Nesterov's smoothing technique, to solve inverse problems consisting of general nonconvex and nonsmooth regularization with provable convergence. Moreover, we develop a neural network architecture imitating this algorithm to learn the nonlinear sparsity transformation adaptively from training data, which also inherits the convergence to accommodate the general nonconvex structure of this learned transformation. Numerical results demonstrate that the proposed network outperforms the state-of-the-art methods on a variety of different image reconstruction problems in terms of efficiency and accuracy.
In this paper we propose several adaptive gradient methods for stochastic optimization. Unlike AdaGrad-type methods, our algorithms are based on Armijo-type line search and they simultaneously adapt to the unknown Lipschitz constant of the gradient and the variance of the stochastic approximation for the gradient. We consider accelerated and non-accelerated gradient descent for convex problems and gradient descent for non-convex problems. In the experiments we demonstrate the superiority of our methods over existing adaptive methods, e.g. AdaGrad and Adam.
We consider the problem of finding local minimizers in non-convex and non-smooth optimization. Under the assumption of strict saddle points, positive results have been derived for first-order methods. We present the first known results for the non-smooth case, which requires different analysis and a different algorithm.
In this work, we analyze the global convergence property of coordinate gradient descent with random choice of coordinates and stepsizes for non-convex optimization problems. Under generic assumptions, we prove that the algorithm iterate will almost surely escape strict saddle points of the objective function. As a result, the algorithm is guaranteed to converge to local minima if all saddle points are strict. Our proof is based on viewing the coordinate descent algorithm as a nonlinear random dynamical system and a quantitative finite block analysis of its linearization around saddle points.
We study adversary-resilient stochastic distributed optimization, in which $m$ machines can independently compute stochastic gradients, and cooperate to jointly optimize over their local objective functions. However, an $\alpha$-fraction of the machines are \textit{Byzantine}, in that they may behave in arbitrary, adversarial ways. We consider a variant of this procedure in the challenging \textit{non-convex} case. Our main result is a new algorithm SafeguardSGD which can provably escape saddle points and find approximate local minima of the non-convex objective. The algorithm is based on a new concentration filtering technique, and its sample and time complexity bounds match the best known theoretical bounds in the stochastic, distributed setting when no Byzantine machines are present. Our algorithm is very practical: it improves upon the performance of all prior methods when training deep neural networks, it is relatively lightweight, and it is the first method to withstand two recently-proposed Byzantine attacks.