Variance Reduction for Matrix Games


Published by: Yair Carmon

Publication date: 2019

Research field: Informatics Engineering

Research language: English

Authors: Yair Carmon - Yujia Jin - Aaron Sidford

Optimization and Control - Data Structures and Algorithms - Machine Learning


We present a randomized primal-dual algorithm that solves the problem $\min_{x} \max_{y} y^\top A x$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)\,n}/\epsilon$, for matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries. This improves the best known exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/n}$ and is faster than fully stochastic gradient methods in the accurate and/or sparse regime $\epsilon \le \sqrt{n/\mathrm{nnz}(A)}$. Our results hold for $x,y$ in the simplex (matrix games, linear programming) and for $x$ in an $\ell_2$ ball and $y$ in the simplex (perceptron / SVM, minimum enclosing ball). Our algorithm combines Nemirovski's conceptual prox-method and a novel reduced-variance gradient estimator based on sampling from the difference between the current iterate and a reference point.
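The "sampling from the difference" idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, arguments, and the choice of sampling probabilities proportional to $|x_j - x_{0,j}|$ are assumptions made for illustration, and the sketch covers only a one-sample estimate of $Ax$ around a reference point $x_0$ (with $Ax_0$ precomputed), not the full prox-method.

```python
import numpy as np


def sample_from_difference(A, x, x0, Ax0, rng):
    """One-sample unbiased estimate of A @ x around a reference point x0.

    Hypothetical helper for illustration: a column index j is drawn with
    probability proportional to |x_j - x0_j|, and the precomputed product
    Ax0 = A @ x0 is corrected by a single rescaled column of A.
    """
    diff = x - x0
    l1 = np.abs(diff).sum()
    if l1 == 0.0:
        # x coincides with the reference point, so A @ x0 is already exact.
        return Ax0.copy()
    p = np.abs(diff) / l1            # sampling distribution over columns
    j = rng.choice(len(x), p=p)      # only indices with diff[j] != 0 can be drawn
    # E[A[:, j] * diff[j] / p[j]] = A @ diff, hence the estimate is unbiased.
    return Ax0 + A[:, j] * (diff[j] / p[j])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 6))
    x0 = np.full(6, 1.0 / 6.0)       # reference point: uniform distribution
    x = rng.dirichlet(np.ones(6))    # current iterate in the simplex
    estimate = sample_from_difference(A, x, x0, A @ x0, rng)
    print("one-sample estimate:", estimate)
    print("exact product:      ", A @ x)
```

Each call touches a single column of $A$, and the correction term scales with $\|x - x_0\|_1$, so the estimate gets more accurate as the iterate approaches the reference point. That shrinking error is the intuition behind the variance reduction.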


Tongyang Li, Chunhao Wang, Shouvanik Chakrabarti (2020)

We investigate sublinear classical and quantum algorithms for matrix games, a fundamental problem in optimization and machine learning, with provable guarantees. Given a matrix $A\in\mathbb{R}^{n\times d}$, sublinear algorithms for the matrix game $\min_{x\in\mathcal{X}}\max_{y\in\mathcal{Y}} y^{\top} Ax$ were previously known only for two special cases: (1) $\mathcal{Y}$ being the $\ell_{1}$-norm unit ball, and (2) $\mathcal{X}$ being either the $\ell_{1}$- or the $\ell_{2}$-norm unit ball. We give a sublinear classical algorithm that can interpolate smoothly between these two cases: for any fixed $q\in (1,2]$, we solve the matrix game where $\mathcal{X}$ is an $\ell_{q}$-norm unit ball within additive error $\epsilon$ in time $\tilde{O}((n+d)/\epsilon^{2})$. We also provide a corresponding sublinear quantum algorithm that solves the same task in time $\tilde{O}((\sqrt{n}+\sqrt{d})\,\mathrm{poly}(1/\epsilon))$ with a quadratic improvement in both $n$ and $d$. Both our classical and quantum algorithms are optimal in the dimension parameters $n$ and $d$ up to poly-logarithmic factors. Finally, we propose sublinear classical and quantum algorithms for the approximate Carathéodory problem and the $\ell_{q}$-margin support vector machines as applications.

Quantum Physics - Data Structures and Algorithms - Machine Learning

Stochastic Variance Reduction for Variational Inequality Methods

Ahmet Alacaoglu, Yura Malitsky (2021)

We propose stochastic variance reduced algorithms for solving convex-concave saddle point problems, monotone variational inequalities, and monotone inclusions. Our framework applies to extragradient, forward-backward-forward, and forward-reflected-backward methods both in Euclidean and Bregman setups. All proposed methods converge in exactly the same setting as their deterministic counterparts and they either match or improve the best-known complexities for solving structured min-max problems. Our results reinforce the correspondence between variance reduction in variational inequalities and minimization. We also illustrate the improvements of our approach with numerical evaluations on matrix games.

Optimization and Control - Machine Learning

Limitations on Variance-Reduction and Acceleration Schemes for Finite Sum Optimization

Yossi Arjevani (2017)

We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes on finite sum optimization problems. First, we show that, perhaps surprisingly, the finite sum structure by itself is not sufficient for obtaining a complexity bound of $\tilde{\mathcal{O}}((n+L/\mu)\ln(1/\epsilon))$ for $L$-smooth and $\mu$-strongly convex individual functions: one must also know which individual function is being referred to by the oracle at each iteration. Next, we show that for a broad class of first-order and coordinate-descent finite sum algorithms (including, e.g., SDCA, SVRG, SAG), it is not possible to get an "accelerated" complexity bound of $\tilde{\mathcal{O}}((n+\sqrt{nL/\mu})\ln(1/\epsilon))$ unless the strong convexity parameter is given explicitly. Lastly, we show that when this class of algorithms is used for minimizing $L$-smooth and convex finite sums, the optimal complexity bound is $\tilde{\mathcal{O}}(n+L/\epsilon)$, assuming that (on average) the same update rule is used in every iteration, and $\tilde{\mathcal{O}}(n+\sqrt{nL/\epsilon})$ otherwise.

Optimization and Control - Machine Learning

Distributed Stochastic Non-Convex Optimization: Momentum-Based Variance Reduction

Prashant Khanduri, Pranay Sharma, Swatantra Kafle (2020)

In this work, we propose a distributed algorithm for stochastic non-convex optimization. We consider a worker-server architecture where a set of $K$ worker nodes (WNs), in collaboration with a server node (SN), jointly aim to minimize a global, potentially non-convex objective function. The objective function is assumed to be the sum of local objective functions available at each WN, with each node having access only to stochastic samples of its local objective function. In contrast to existing approaches, we employ a momentum-based single-loop distributed algorithm, which eliminates the need to compute large-batch gradients to achieve variance reduction. We propose two algorithms, one with adaptive and the other with non-adaptive learning rates. We show that the proposed algorithms achieve the optimal computational complexity while attaining linear speedup with the number of WNs. Specifically, the algorithms reach an $\epsilon$-stationary point $x_a$ with $\mathbb{E}\|\nabla f(x_a)\| \leq \tilde{O}(K^{-1/3}T^{-1/2} + K^{-1/3}T^{-1/3})$ in $T$ iterations, thereby requiring $\tilde{O}(K^{-1} \epsilon^{-3})$ gradient computations at each WN. Moreover, our approach does not assume identical data distributions across WNs, making the approach general enough for federated learning applications.

Optimization and Control - Distributed, Parallel, and Cluster Computing

Dimensionality reduction of SDPs through sketching

Andreas Bluhm, Daniel Stilck Franca (2017)

We show how to sketch semidefinite programs (SDPs) using positive maps in order to reduce their dimension. More precisely, we use Johnson-Lindenstrauss transforms to produce a smaller SDP whose solution preserves feasibility or approximates the value of the original problem with high probability. These techniques make it possible to improve both complexity and storage space requirements. They apply to problems in which the Schatten 1-norm of the matrices specifying the SDP and also of a solution to the problem is constant in the problem size. Furthermore, we provide some results which clarify the limitations of positive, linear sketches in this setting.

Optimization and Control - Data Structures and Algorithms