
Variance Reduction for Matrix Games

Added by Yair Carmon
Publication date: 2019
Research language: English





We present a randomized primal-dual algorithm that solves the problem $\min_{x} \max_{y} y^\top A x$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)\,n}/\epsilon$, for matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries. This improves on the best known exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/n}$ and is faster than fully stochastic gradient methods in the accurate and/or sparse regime $\epsilon \le \sqrt{n/\mathrm{nnz}(A)}$. Our results hold for $x, y$ in the simplex (matrix games, linear programming) and for $x$ in an $\ell_2$ ball and $y$ in the simplex (perceptron / SVM, minimum enclosing ball). Our algorithm combines Nemirovski's conceptual prox-method and a novel reduced-variance gradient estimator based on sampling from the difference between the current iterate and a reference point.
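As a concrete illustration of the estimator described above, here is a minimal NumPy sketch of an unbiased estimate of $Ax$ built by sampling from the difference between the current iterate and a reference point. The function name, sampling distribution, and dimensions are illustrative assumptions rather than the paper's exact construction, which couples such estimators with Nemirovski's prox-method and carefully chosen sampling probabilities.

```python
import numpy as np

def vr_gradient_estimate(A, x, x0, Ax0, rng):
    """Unbiased estimate of A @ x built around a reference point x0.

    The exact product A @ x0 is precomputed once (Ax0); the correction
    A @ (x - x0) is estimated from a single column of A, sampled with
    probability proportional to |x_i - x0_i|. This sampling distribution
    is illustrative, not necessarily the one used in the paper.
    """
    diff = x - x0
    total = np.abs(diff).sum()
    if total == 0.0:                     # x == x0: the correction vanishes
        return Ax0.copy()
    p = np.abs(diff) / total             # coordinate sampling probabilities
    i = rng.choice(len(x), p=p)
    # Importance weighting makes the single-column correction unbiased:
    # E[A[:, i] * diff[i] / p[i]] = A @ (x - x0).
    return Ax0 + A[:, i] * (diff[i] / p[i])

# Tiny usage check (all numbers illustrative).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
x0 = np.full(4, 0.25)                    # reference point on the simplex
x = np.array([0.4, 0.3, 0.2, 0.1])       # current iterate
Ax0 = A @ x0                             # computed once per reference point
samples = [vr_gradient_estimate(A, x, x0, Ax0, rng) for _ in range(20000)]
print(np.allclose(np.mean(samples, axis=0), A @ x, atol=0.05))  # True: averages to A @ x
```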



Related research


We investigate sublinear classical and quantum algorithms for matrix games, a fundamental problem in optimization and machine learning, with provable guarantees. Given a matrix $A\in\mathbb{R}^{n\times d}$, sublinear algorithms for the matrix game $\min_{x\in\mathcal{X}}\max_{y\in\mathcal{Y}} y^{\top} Ax$ were previously known only for two special cases: (1) $\mathcal{Y}$ being the $\ell_{1}$-norm unit ball, and (2) $\mathcal{X}$ being either the $\ell_{1}$- or the $\ell_{2}$-norm unit ball. We give a sublinear classical algorithm that can interpolate smoothly between these two cases: for any fixed $q\in (1,2]$, we solve the matrix game where $\mathcal{X}$ is an $\ell_{q}$-norm unit ball within additive error $\epsilon$ in time $\tilde{O}((n+d)/\epsilon^{2})$. We also provide a corresponding sublinear quantum algorithm that solves the same task in time $\tilde{O}((\sqrt{n}+\sqrt{d})\,\mathrm{poly}(1/\epsilon))$, with a quadratic improvement in both $n$ and $d$. Both our classical and quantum algorithms are optimal in the dimension parameters $n$ and $d$ up to poly-logarithmic factors. Finally, we propose sublinear classical and quantum algorithms for the approximate Carathéodory problem and the $\ell_{q}$-margin support vector machines as applications.
We propose stochastic variance reduced algorithms for solving convex-concave saddle point problems, monotone variational inequalities, and monotone inclusions. Our framework applies to extragradient, forward-backward-forward, and forward-reflected-backward methods both in Euclidean and Bregman setups. All proposed methods converge in exactly the same setting as their deterministic counterparts and they either match or improve the best-known complexities for solving structured min-max problems. Our results reinforce the correspondence between variance reduction in variational inequalities and minimization. We also illustrate the improvements of our approach with numerical evaluations on matrix games.
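For context, the deterministic extragradient (mirror-prox) template that these methods build on can be sketched for a matrix game in the entropic/Bregman setup as follows. Step size, iteration count, and function names are illustrative assumptions; the variance-reduced methods above replace the exact matrix-vector products with stochastic estimates.

```python
import numpy as np

def simplex_mirror_step(z, grad, eta):
    """One entropic (multiplicative-weights) mirror step on the probability simplex."""
    w = z * np.exp(-eta * grad)
    return w / w.sum()

def extragradient_matrix_game(A, steps=500, eta=0.1):
    """Illustrative deterministic extragradient loop for min_x max_y y^T A x
    with x and y on probability simplices; returns the averaged iterates."""
    n, d = A.shape
    x, y = np.full(d, 1.0 / d), np.full(n, 1.0 / n)
    x_avg, y_avg = np.zeros(d), np.zeros(n)
    for _ in range(steps):
        # Extrapolation step using gradients at the current point.
        x_half = simplex_mirror_step(x, A.T @ y, eta)       # x descends on y^T A x
        y_half = simplex_mirror_step(y, -(A @ x), eta)      # y ascends on y^T A x
        # Update step using gradients at the extrapolated point.
        x = simplex_mirror_step(x, A.T @ y_half, eta)
        y = simplex_mirror_step(y, -(A @ x_half), eta)
        x_avg += x
        y_avg += y
    return x_avg / steps, y_avg / steps

# A 2x2 game whose unique equilibrium is the uniform strategy for both players.
x_bar, y_bar = extragradient_matrix_game(np.array([[0.0, 1.0], [1.0, 0.0]]))
print(x_bar, y_bar)   # both close to [0.5, 0.5]
```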
Yossi Arjevani (2017)
We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes on finite-sum optimization problems. First, we show that, perhaps surprisingly, the finite-sum structure by itself is not sufficient for obtaining a complexity bound of $\tilde{\mathcal{O}}((n+L/\mu)\ln(1/\epsilon))$ for $L$-smooth and $\mu$-strongly convex individual functions: one must also know which individual function is being referred to by the oracle at each iteration. Next, we show that for a broad class of first-order and coordinate-descent finite-sum algorithms (including, e.g., SDCA, SVRG, SAG), it is not possible to obtain an "accelerated" complexity bound of $\tilde{\mathcal{O}}((n+\sqrt{nL/\mu})\ln(1/\epsilon))$ unless the strong convexity parameter is given explicitly. Lastly, we show that when this class of algorithms is used for minimizing $L$-smooth and convex finite sums, the optimal complexity bound is $\tilde{\mathcal{O}}(n+L/\epsilon)$, assuming that (on average) the same update rule is used in every iteration, and $\tilde{\mathcal{O}}(n+\sqrt{nL/\epsilon})$ otherwise.
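The key point, that the estimator must know which component $f_i$ the oracle evaluated, is visible in a standard SVRG-style estimator. The sketch below uses an illustrative least-squares finite sum and is not any of the specific algorithms analyzed above.

```python
import numpy as np

def svrg_estimate(grad_fi, x, x_ref, full_grad_ref, i):
    """SVRG-style variance-reduced gradient for (1/n) * sum_i f_i(x).

    The construction needs the index i of the component the oracle evaluated,
    so that grad f_i at the reference point can be subtracted; without that
    identity the correction term cannot be formed.
    """
    return grad_fi(x, i) - grad_fi(x_ref, i) + full_grad_ref

# Illustrative finite sum: f_i(x) = 0.5 * (a[i] @ x - b[i])**2.
rng = np.random.default_rng(1)
a, b = rng.standard_normal((10, 3)), rng.standard_normal(10)
grad_fi = lambda x, i: a[i] * (a[i] @ x - b[i])
x_ref = np.zeros(3)
full_grad_ref = np.mean([grad_fi(x_ref, j) for j in range(10)], axis=0)
x = rng.standard_normal(3)
g = svrg_estimate(grad_fi, x, x_ref, full_grad_ref, rng.integers(10))  # unbiased for the full gradient at x
```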
In this work, we propose a distributed algorithm for stochastic non-convex optimization. We consider a worker-server architecture where a set of $K$ worker nodes (WNs), in collaboration with a server node (SN), jointly aim to minimize a global, potentially non-convex objective function. The objective function is assumed to be the sum of local objective functions available at each WN, with each node having access only to stochastic samples of its local objective function. In contrast to existing approaches, we employ a momentum-based single-loop distributed algorithm which eliminates the need to compute large-batch gradients to achieve variance reduction. We propose two algorithms, one with adaptive and the other with non-adaptive learning rates. We show that the proposed algorithms achieve the optimal computational complexity while attaining linear speedup with the number of WNs. Specifically, the algorithms reach an $\epsilon$-stationary point $x_a$ with $\mathbb{E}\|\nabla f(x_a)\| \leq \tilde{O}(K^{-1/3}T^{-1/2} + K^{-1/3}T^{-1/3})$ in $T$ iterations, thereby requiring $\tilde{O}(K^{-1}\epsilon^{-3})$ gradient computations at each WN. Moreover, our approach does not assume identical data distributions across WNs, making the approach general enough for federated learning applications.
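A minimal sketch of a momentum-based (STORM-style) recursive estimator shows how variance reduction can be obtained without large batches; the update rule and names below are assumptions in that spirit, not necessarily the exact estimator used by the proposed distributed algorithms.

```python
import numpy as np

def momentum_vr_estimate(stoch_grad, x, x_prev, d_prev, a, sample):
    """Momentum-based recursive gradient estimate (STORM-style).

    The same stochastic sample is evaluated at x and x_prev, so the
    correction term is cheap and no large batches are required.
    """
    return stoch_grad(x, sample) + (1.0 - a) * (d_prev - stoch_grad(x_prev, sample))

# Toy usage with a noisy gradient of 0.5 * ||x||^2 (all choices illustrative).
rng = np.random.default_rng(2)
stoch_grad = lambda x, xi: x + xi
x_prev = np.ones(3)
d = stoch_grad(x_prev, rng.standard_normal(3))      # initial estimate
x = x_prev - 0.1 * d                                # one gradient-style step
d = momentum_vr_estimate(stoch_grad, x, x_prev, d, a=0.3, sample=rng.standard_normal(3))
```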
We show how to sketch semidefinite programs (SDPs) using positive maps in order to reduce their dimension. More precisely, we use Johnson-Lindenstrauss transforms to produce a smaller SDP whose solution preserves feasibility or approximates the value of the original problem with high probability. These techniques allow us to improve both complexity and storage-space requirements. They apply to problems in which the Schatten 1-norm of the matrices specifying the SDP, and also of a solution to the problem, is constant in the problem size. Furthermore, we provide some results which clarify the limitations of positive, linear sketches in this setting.
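The Johnson-Lindenstrauss idea can be sketched as compressing each $n\times n$ matrix in the SDP to a $k\times k$ matrix via a random map $M \mapsto S M S^{\top}$. The Gaussian construction and the trace check below are illustrative assumptions; the paper works with positive maps and Schatten-1-norm bounds more generally.

```python
import numpy as np

def jl_sketch_matrix(M, k, rng):
    """Compress a symmetric n x n matrix M to k x k via S @ M @ S.T,
    where S is a random Gaussian (Johnson-Lindenstrauss) map."""
    n = M.shape[0]
    S = rng.standard_normal((k, n)) / np.sqrt(k)   # scaled so that E[S.T @ S] = I
    return S @ M @ S.T

# The trace (Schatten 1-norm for PSD matrices) is preserved in expectation
# and approximately preserved with high probability.
rng = np.random.default_rng(3)
B = rng.standard_normal((200, 20))
M = B @ B.T / 200                                  # PSD test matrix
print(np.trace(M), np.trace(jl_sketch_matrix(M, 50, rng)))   # roughly equal
```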
