Lower Bounds for Non-Convex Stochastic Optimization

Submitted by Yair Carmon

Publication date: 2019

Research field: Informatics Engineering

Paper language: English

Authors: Yossi Arjevani, Yair Carmon, John C. Duchi

Optimization and Control; Information Theory; Machine Learning

Abstract

We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least $\epsilon^{-4}$ queries to find an $\epsilon$-stationary point. The lower bound is tight and establishes that stochastic gradient descent is minimax optimal in this model. In a more restrictive model where the noisy gradient estimates satisfy a mean-squared smoothness property, we prove a lower bound of $\epsilon^{-3}$ queries, establishing the optimality of recently proposed variance reduction techniques.
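The two oracle models in the abstract can be contrasted with a small simulation. In the bounded-variance model the oracle returns $g(x;\xi)$ with $\mathbb{E}[g(x;\xi)] = \nabla F(x)$ and $\mathbb{E}\|g(x;\xi) - \nabla F(x)\|^2 \le \sigma^2$; mean-squared smoothness additionally asks that $\mathbb{E}\|g(x;\xi) - g(y;\xi)\|^2 \le \bar L^2 \|x - y\|^2$ when the same sample $\xi$ is queried at two points. The sketch below is illustrative only and rests on assumptions not taken from the paper: a hypothetical toy objective $F(x) = \sum_i (x_i^2/2 + \cos 2x_i)$, additive Gaussian noise (so $g(x;\xi) - g(y;\xi) = \nabla F(x) - \nabla F(y)$ and both models hold), and a STORM-style recursive-momentum estimator standing in for the unnamed "variance reduction techniques". Step sizes and query budgets are arbitrary, not the tuned choices from the analysis.

```python
import numpy as np

# Hypothetical toy objective (not from the paper): F(x) = sum_i (x_i**2 / 2 + cos(2 x_i)).
# F'' = 1 - 4 cos(2x) lies in [-3, 5]: smooth (L = 5) and non-convex.
def grad_F(x):
    return x - 2.0 * np.sin(2.0 * x)

def sgd(x0, sigma, step, n_queries, rng):
    """Plain SGD: one oracle query per step; the noise has variance sigma**2 per coordinate."""
    x = x0.copy()
    for _ in range(n_queries):
        g = grad_F(x) + sigma * rng.standard_normal(x.shape)  # unbiased noisy gradient
        x -= step * g
    return x

def storm(x0, sigma, step, beta, n_queries, rng):
    """STORM-style estimator: d_t = g(x_t; xi_t) + (1 - beta) * (d_{t-1} - g(x_{t-1}; xi_t)).
    The same sample xi_t is used at x_t and x_{t-1} (what mean-squared smoothness controls);
    each such pair is counted as two oracle queries."""
    x_prev = x0.copy()
    d = grad_F(x_prev) + sigma * rng.standard_normal(x0.shape)  # 1 query
    x = x_prev - step * d
    for _ in range((n_queries - 1) // 2):
        xi = sigma * rng.standard_normal(x.shape)  # shared sample for both evaluations
        g_new = grad_F(x) + xi       # g(x_t; xi_t)
        g_old = grad_F(x_prev) + xi  # g(x_{t-1}; xi_t)
        d = g_new + (1.0 - beta) * (d - g_old)
        x_prev, x = x, x - step * d
    return x

rng = np.random.default_rng(0)
x0 = np.full(10, 3.0)
x_sgd = sgd(x0, sigma=1.0, step=0.01, n_queries=2000, rng=rng)
x_storm = storm(x0, sigma=1.0, step=0.01, beta=0.1, n_queries=2000, rng=rng)
print("SGD   ||grad F||:", np.linalg.norm(grad_F(x_sgd)))
print("STORM ||grad F||:", np.linalg.norm(grad_F(x_storm)))
```

With additive noise the STORM error $d_t - \nabla F(x_t)$ follows $(1-\beta)e_{t-1} + \beta\xi_t$, so its stationary variance is about $\beta\sigma^2/(2-\beta)$ per coordinate instead of $\sigma^2$; this smaller estimation error is the effect that the $\epsilon^{-3}$ rate exploits, though a single toy run does not demonstrate either rate.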

Related research

Ran Xin, Usman A. Khan, Soummya Kar (2021)

This paper considers decentralized stochastic optimization over a network of $n$ nodes, where each node possesses a smooth non-convex local cost function and the goal of the networked nodes is to find an $\epsilon$-accurate first-order stationary point of the sum of the local costs. We focus on an online setting, where each node accesses its local cost only by means of a stochastic first-order oracle that returns a noisy version of the exact gradient. In this context, we propose a novel single-loop decentralized hybrid variance-reduced stochastic gradient method, called GT-HSGD, that outperforms the existing approaches in terms of both the oracle complexity and practical implementation. The GT-HSGD algorithm implements specialized local hybrid stochastic gradient estimators that are fused over the network to track the global gradient. Remarkably, GT-HSGD achieves a network topology-independent oracle complexity of $O(n^{-1}\epsilon^{-3})$ when the required error tolerance $\epsilon$ is small enough, leading to a linear speedup with respect to the centralized optimal online variance-reduced approaches that operate on a single node. Numerical experiments are provided to illustrate our main technical results.

Optimization and Control; Distributed, Parallel, and Cluster Computing; Machine Learning
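The GT-HSGD abstract combines two ingredients: a hybrid (recursive-momentum) local gradient estimator at each node and gradient tracking over the network. The sketch below wires these together in NumPy to show how they interact; it is written in the spirit of GT-HSGD and is not the authors' algorithm as specified. The ring topology with weights 1/3, the toy local costs $f_i(x) = \sum_k ((x_k - c_{ik})^2/2 + \cos 2x_k)$, the additive Gaussian noise, the initialisation, and the parameters `alpha` and `beta` are all hypothetical choices.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 nodes (weights 1/3)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def local_grads(X, C):
    """Row i: exact gradient of the hypothetical local cost
    f_i(x) = sum_k ((x_k - C[i, k])**2 / 2 + cos(2 x_k)) at node i's iterate X[i]."""
    return (X - C) - 2.0 * np.sin(2.0 * X)

def gt_hybrid_sketch(C, W, alpha, beta, sigma, n_iters, rng, x_init=3.0):
    n, d = C.shape
    X = np.full((n, d), x_init)
    V = local_grads(X, C) + sigma * rng.standard_normal((n, d))  # initial local estimates
    Y = V.copy()  # gradient trackers start at the local estimates
    for _ in range(n_iters):
        # 1) Step along the tracker, then average with neighbours.
        X_new = W @ (X - alpha * Y)
        # 2) Hybrid local estimator: one shared sample per node, evaluated at new and old iterate.
        xi = sigma * rng.standard_normal((n, d))
        V_new = (local_grads(X_new, C) + xi) + (1.0 - beta) * (V - (local_grads(X, C) + xi))
        # 3) Gradient tracking: mix the trackers and add the change in local estimates.
        Y = W @ (Y + V_new - V)
        X, V = X_new, V_new
    return X, Y, V

rng = np.random.default_rng(1)
n_nodes, dim = 8, 5
C = 0.5 * rng.standard_normal((n_nodes, dim))
W = ring_mixing_matrix(n_nodes)
X, Y, V = gt_hybrid_sketch(C, W, alpha=0.02, beta=0.1, sigma=0.5, n_iters=2000, rng=rng)

x_bar, c_bar = X.mean(axis=0), C.mean(axis=0)
global_grad = (x_bar - c_bar) - 2.0 * np.sin(2.0 * x_bar)  # gradient of (1/n) * sum_i f_i
print("||grad F(x_bar)||:", np.linalg.norm(global_grad))
print("consensus error:", np.abs(X - x_bar).max())
```

Because `W` is doubly stochastic and `Y` starts equal to `V`, the node average of `Y` equals the node average of `V` after every step; this invariant is what lets each node's tracker follow the global gradient rather than its own local one.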

Adaptive Gradient Descent for Convex and Non-Convex Stochastic Optimization

Darina Dvinskikh, Aleksandr Ogaltsov, Alexander Gasnikov (2019)

In this paper we propose several adaptive gradient methods for stochastic optimization. Unlike AdaGrad-type methods, our algorithms are based on an Armijo-type line search, and they simultaneously adapt to the unknown Lipschitz constant of the gradient and to the variance of the stochastic approximation of the gradient. We consider accelerated and non-accelerated gradient descent for convex problems and gradient descent for non-convex problems. In the experiments we demonstrate the superiority of our methods over existing adaptive methods, e.g. AdaGrad and Adam.

Optimization and Control
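The abstract does not spell out its line search, so the sketch below only shows the general idea of an Armijo-type rule that adapts an estimate $L$ of the unknown gradient Lipschitz constant on each mini-batch. It assumes a synthetic least-squares problem, a halve-then-double schedule for $L$, and the sufficient-decrease test $f(x - g/L) \le f(x) - \|g\|^2/(2L)$ evaluated on the same mini-batch; these are common choices assumed here, not the authors' exact procedure, and unlike their methods this sketch does not adapt the mini-batch size to the gradient variance.

```python
import numpy as np

def batch_loss_grad(x, A, b):
    """Mini-batch least-squares loss 0.5 * mean((A x - b)**2) and its gradient."""
    r = A @ x - b
    return 0.5 * np.mean(r ** 2), A.T @ r / len(b)

def sgd_armijo(A, b, x0, batch_size, n_iters, rng, L0=1.0):
    """SGD whose step 1/L is chosen by Armijo-type backtracking on each mini-batch.
    L is a running estimate of the (unknown) Lipschitz constant of the gradient."""
    x, L = x0.copy(), L0
    for _ in range(n_iters):
        idx = rng.choice(len(b), size=batch_size, replace=False)
        A_b, b_b = A[idx], b[idx]
        f, g = batch_loss_grad(x, A_b, b_b)
        L /= 2.0  # optimistic: first try a step twice as long as last time
        while True:
            x_try = x - g / L
            f_try, _ = batch_loss_grad(x_try, A_b, b_b)
            # Sufficient-decrease (Armijo-type) test on the same mini-batch.
            if f_try <= f - (g @ g) / (2.0 * L):
                break
            L *= 2.0  # step too long: shrink it and retry
        x = x_try
    return x, L

rng = np.random.default_rng(2)
n, d = 1000, 5
A = rng.standard_normal((n, d))
x_star = np.array([1.0, -2.0, 0.5, 3.0, -1.0])  # hypothetical ground truth
b = A @ x_star + 0.1 * rng.standard_normal(n)

x_hat, L_hat = sgd_armijo(A, b, np.zeros(d), batch_size=32, n_iters=200, rng=rng)
full_loss = lambda x: 0.5 * np.mean((A @ x - b) ** 2)
print("loss at start:", full_loss(np.zeros(d)), "loss at end:", full_loss(x_hat), "L estimate:", L_hat)
```

For a quadratic mini-batch loss the test passes as soon as $L$ exceeds the curvature along $g$, so the inner loop always terminates; halving $L$ before each search lets the step grow back when the local curvature is smaller than previously estimated.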

Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

Blake Woodworth, Jialei Wang, Adam Smith (2018)

We suggest a general oracle-based framework that captures different parallel stochastic optimization settings described by a dependency graph, and derive generic lower bounds in terms of this graph. We then use the framework and derive lower bounds for several specific parallel optimization settings, including delayed updates and parallel processing with intermittent communication. We highlight gaps between lower and upper bounds on the oracle complexity, and cases where the natural algorithms are not known to be optimal.

Optimization and Control; Machine Learning

On Stochastic Moving-Average Estimators for Non-Convex Optimization

Zhishuai Guo, Yi Xu, Wotao Yin (2021)

In this paper, we demonstrate the power of a widely used stochastic estimator based on moving average (SEMA) on a range of stochastic non-convex optimization problems, which only requires a general unbiased stochastic oracle. We analyze various stochastic methods (existing or newly proposed) based on the variance recursion property of SEMA for three families of non-convex optimization, namely standard stochastic non-convex minimization, stochastic non-convex strongly-concave min-max optimization, and stochastic bilevel optimization. Our contributions include: (i) for standard stochastic non-convex minimization, we present a simple and intuitive proof of convergence for a family of Adam-style methods (including Adam) with an increasing or large momentum parameter for the first-order moment, which gives an alternative yet more natural way to guarantee Adam's convergence; (ii) for stochastic non-convex strongly-concave min-max optimization, we present a single-loop stochastic gradient descent ascent method based on the moving average estimators and establish its oracle complexity of $O(1/\epsilon^4)$ without using a large mini-batch size, addressing a gap in the literature; (iii) for stochastic bilevel optimization, we present a single-loop stochastic method based on the moving average estimators and establish its oracle complexity of $\widetilde O(1/\epsilon^4)$ without computing the inverse or SVD of the Hessian matrix, improving state-of-the-art results. For all these problems, we also establish a variance diminishing result for the used stochastic gradient estimators.

Optimization and Control; Machine Learning
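The moving-average estimator in the SEMA abstract is the update $z_t = (1-\beta) z_{t-1} + \beta\, g(x_t;\xi_t)$. Holding $x$ fixed, its variance obeys $\mathrm{Var}(z_t) = (1-\beta)^2 \mathrm{Var}(z_{t-1}) + \beta^2\sigma^2$, whose fixed point is $\beta\sigma^2/(2-\beta)$. The minimal sketch below checks that fixed-point value empirically and then runs plain descent along $z_t$. It assumes additive Gaussian noise and a hypothetical toy objective $F(x) = \sum_i (x_i^2/2 + \cos 2x_i)$; it illustrates only the fixed-$x$ case, whereas the paper's variance recursion also accounts for the drift of $x_t$, and it omits the Adam-style second-moment scaling.

```python
import numpy as np

def grad_F(x):
    """Gradient of a hypothetical smooth non-convex objective F(x) = sum(x_i**2 / 2 + cos(2 x_i))."""
    return x - 2.0 * np.sin(2.0 * x)

def sema_descent(x0, sigma, step, beta, n_iters, rng):
    """Descent along the moving average z_t = (1 - beta) * z_{t-1} + beta * g(x_t; xi_t),
    where g is an unbiased noisy gradient (additive Gaussian noise, an assumption)."""
    x = x0.copy()
    z = grad_F(x) + sigma * rng.standard_normal(x.shape)  # initialise with one sample
    for _ in range(n_iters):
        g = grad_F(x) + sigma * rng.standard_normal(x.shape)
        z = (1.0 - beta) * z + beta * g
        x = x - step * z
    return x

def sema_variance_ratio(sigma, beta, n_samples, rng):
    """Hold x fixed and compare the variance of the moving average with that of raw samples."""
    g = sigma * rng.standard_normal(n_samples)  # noise part of raw gradient samples
    z = np.empty(n_samples)
    z[0] = g[0]
    for t in range(1, n_samples):
        z[t] = (1.0 - beta) * z[t - 1] + beta * g[t]
    burn_in = int(10 / beta)  # drop the transient caused by z[0]
    return z[burn_in:].var() / g.var()

rng = np.random.default_rng(3)
beta = 0.1
ratio = sema_variance_ratio(sigma=1.0, beta=beta, n_samples=20000, rng=rng)
print("empirical variance ratio:", ratio, " predicted beta/(2-beta):", beta / (2.0 - beta))

x0 = np.full(10, 3.0)
x_final = sema_descent(x0, sigma=1.0, step=0.01, beta=beta, n_iters=2000, rng=rng)
print("||grad F|| at start:", np.linalg.norm(grad_F(x0)), " at end:", np.linalg.norm(grad_F(x_final)))
```

A smaller $\beta$ (a larger momentum weight $1-\beta$ on the first-order moment) lowers the stationary variance roughly in proportion to $\beta$, at the cost of a longer memory of stale gradients.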

Quantum algorithms and lower bounds for convex optimization

Shouvanik Chakrabarti, Andrew M. Childs, Tongyang Li (2018)

While recent work suggests that quantum computers can speed up the solution of semidefinite programs, little is known about the quantum complexity of more general convex optimization. We present a quantum algorithm that can optimize a convex function over an $n$-dimensional convex body using $\tilde{O}(n)$ queries to oracles that evaluate the objective function and determine membership in the convex body. This represents a quadratic improvement over the best-known classical algorithm. We also study limitations on the power of quantum computers for general convex optimization, showing that it requires $\tilde{\Omega}(\sqrt{n})$ evaluation queries and $\Omega(\sqrt{n})$ membership queries.

Quantum Physics; Data Structures and Algorithms; Optimization and Control