Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

183 0 0.0 ( 0 )

Download Cite

Added by Blake Woodworth

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Blake Woodworth - Jialei Wang - Adam Smith

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We suggest a general oracle-based framework that captures different parallel stochastic optimization settings described by a dependency graph, and derive generic lower bounds in terms of this graph. We then use the framework and derive lower bounds for several specific parallel optimization settings, including delayed updates and parallel processing with intermittent communication. We highlight gaps between lower and upper bounds on the oracle complexity, and cases where the natural algorithms are not known to be optimal.

rate research

Lower Bounds for Non-Convex Stochastic Optimization

389 - Yossi Arjevani , Yair Carmon , John C. Duchi 2019

We lower bound the complexity of finding $epsilon$-stationary points (with gradient norm at most $epsilon$) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least $epsilon^{-4}$ queries to find an $epsilon$ stationary point. The lower bound is tight, and establishes that stochastic gradient descent is minimax optimal in this model. In a more restrictive model where the noisy gradient estimates satisfy a mean-squared smoothness property, we prove a lower bound of $epsilon^{-3}$ queries, establishing the optimality of recently proposed variance reduction techniques.

Optimization and Control Information Theory Machine Learning

Complexity Lower Bounds for Nonconvex-Strongly-Concave Min-Max Optimization

74 - Haochuan Li , Yi Tian , Jingzhao Zhang 2021

We provide a first-order oracle complexity lower bound for finding stationary points of min-max optimization problems where the objective function is smooth, nonconvex in the minimization variable, and strongly concave in the maximization variable. We establish a lower bound of $Omegaleft(sqrt{kappa}epsilon^{-2}right)$ for deterministic oracles, where $epsilon$ defines the level of approximate stationarity and $kappa$ is the condition number. Our analysis shows that the upper bound achieved in (Lin et al., 2020b) is optimal in the $epsilon$ and $kappa$ dependence up to logarithmic factors. For stochastic oracles, we provide a lower bound of $Omegaleft(sqrt{kappa}epsilon^{-2} + kappa^{1/3}epsilon^{-4}right)$. It suggests that there is a significant gap between the upper bound $mathcal{O}(kappa^3 epsilon^{-4})$ in (Lin et al., 2020a) and our lower bound in the condition number dependence.

Optimization and Control Machine Learning Machine Learning

Accelerated, Optimal, and Parallel: Some Results on Model-Based Stochastic Optimization

135 - Karan Chadha , Gary Cheng , John C. Duchi 2021

We extend the Approximate-Proximal Point (aProx) family of model-based methods for solving stochastic convex optimization problems, including stochastic subgradient, proximal point, and bundle methods, to the minibatch and accelerated setting. To do so, we propose specific model-based algorithms and an acceleration scheme for which we provide non-asymptotic convergence guarantees, which are order-optimal in all problem-dependent constants and provide linear speedup in minibatch size, while maintaining the desirable robustness traits (e.g. to stepsize) of the aProx family. Additionally, we show improved convergence rates and matching lower bounds identifying new fundamental constants for interpolation problems, whose importance in statistical machine learning is growing; this, for example, gives a parallelization strategy for alternating projections. We corroborate our theoretical results with empirical testing to demonstrate the gains accurate modeling, acceleration, and minibatching provide.

Optimization and Control Machine Learning Machine Learning

Improved Oracle Complexity of Variance Reduced Methods for Nonsmooth Convex Stochastic Composition Optimization

132 - Tianyi Lin , Chenyou Fan , Mengdi Wang 2018

We consider the nonsmooth convex composition optimization problem where the objective is a composition of two finite-sum functions and analyze stochastic compositional variance reduced gradient (SCVRG) methods for them. SCVRG and its variants have recently drawn much attention given their edge over stochastic compositional gradient descent (SCGD); but the theoretical analysis exclusively assumes strong convexity of the objective, which excludes several important examples such as Lasso, logistic regression, principle component analysis and deep neural nets. In contrast, we prove non-asymptotic incremental first-order oracle (IFO) complexity of SCVRG or its novel variants for nonsmooth convex composition optimization and show that they are provably faster than SCGD and gradient descent. More specifically, our method achieves the total IFO complexity of $Oleft((m+n)logleft(1/epsilonright)+1/epsilon^3right)$ which improves that of $Oleft(1/epsilon^{3.5}right)$ and $Oleft((m+n)/sqrt{epsilon}right)$ obtained by SCGD and accelerated gradient descent (AGD) respectively. Experimental results confirm that our methods outperform several existing methods, e.g., SCGD and AGD, on sparse mean-variance optimization problem.

Optimization and Control Machine Learning

Biased Stochastic Gradient Descent for Conditional Stochastic Optimization

148 - Yifan Hu , Siqi Zhang , Xin Chen 2020

Conditional Stochastic Optimization (CSO) covers a variety of applications ranging from meta-learning and causal inference to invariant learning. However, constructing unbiased gradient estimates in CSO is challenging due to the composition structure. As an alternative, we propose a biased stochastic gradient descent (BSGD) algorithm and study the bias-variance tradeoff under different structural assumptions. We establish the sample complexities of BSGD for strongly convex, convex, and weakly convex objectives, under smooth and non-smooth conditions. We also provide matching lower bounds of BSGD for convex CSO objectives. Extensive numerical experiments are conducted to illustrate the performance of BSGD on robust logistic regression, model-agnostic meta-learning (MAML), and instrumental variable regression (IV).

Optimization and Control Machine Learning Machine Learning

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions