No Arabic abstract
We present a randomized primal-dual algorithm that solves the problem $min_{x} max_{y} y^top A x$ to additive error $epsilon$ in time $mathrm{nnz}(A) + sqrt{mathrm{nnz}(A)n}/epsilon$, for matrix $A$ with larger dimension $n$ and $mathrm{nnz}(A)$ nonzero entries. This improves the best known exact gradient methods by a factor of $sqrt{mathrm{nnz}(A)/n}$ and is faster than fully stochastic gradient methods in the accurate and/or sparse regime $epsilon le sqrt{n/mathrm{nnz}(A)}$. Our results hold for $x,y$ in the simplex (matrix games, linear programming) and for $x$ in an $ell_2$ ball and $y$ in the simplex (perceptron / SVM, minimum enclosing ball). Our algorithm combines Nemirovskis conceptual prox-method and a novel reduced-variance gradient estimator based on sampling from the difference between the current iterate and a reference point.
We investigate sublinear classical and quantum algorithms for matrix games, a fundamental problem in optimization and machine learning, with provable guarantees. Given a matrix $Ainmathbb{R}^{ntimes d}$, sublinear algorithms for the matrix game $min_{xinmathcal{X}}max_{yinmathcal{Y}} y^{top} Ax$ were previously known only for two special cases: (1) $mathcal{Y}$ being the $ell_{1}$-norm unit ball, and (2) $mathcal{X}$ being either the $ell_{1}$- or the $ell_{2}$-norm unit ball. We give a sublinear classical algorithm that can interpolate smoothly between these two cases: for any fixed $qin (1,2]$, we solve the matrix game where $mathcal{X}$ is a $ell_{q}$-norm unit ball within additive error $epsilon$ in time $tilde{O}((n+d)/{epsilon^{2}})$. We also provide a corresponding sublinear quantum algorithm that solves the same task in time $tilde{O}((sqrt{n}+sqrt{d})textrm{poly}(1/epsilon))$ with a quadratic improvement in both $n$ and $d$. Both our classical and quantum algorithms are optimal in the dimension parameters $n$ and $d$ up to poly-logarithmic factors. Finally, we propose sublinear classical and quantum algorithms for the approximate Caratheodory problem and the $ell_{q}$-margin support vector machines as applications.
We propose stochastic variance reduced algorithms for solving convex-concave saddle point problems, monotone variational inequalities, and monotone inclusions. Our framework applies to extragradient, forward-backward-forward, and forward-reflected-backward methods both in Euclidean and Bregman setups. All proposed methods converge in exactly the same setting as their deterministic counterparts and they either match or improve the best-known complexities for solving structured min-max problems. Our results reinforce the correspondence between variance reduction in variational inequalities and minimization. We also illustrate the improvements of our approach with numerical evaluations on matrix games.
We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes on finite sum optimization problems. First, we show that, perhaps surprisingly, the finite sum structure by itself, is not sufficient for obtaining a complexity bound of $tilde{cO}((n+L/mu)ln(1/epsilon))$ for $L$-smooth and $mu$-strongly convex individual functions - one must also know which individual function is being referred to by the oracle at each iteration. Next, we show that for a broad class of first-order and coordinate-descent finite sum algorithms (including, e.g., SDCA, SVRG, SAG), it is not possible to get an `accelerated complexity bound of $tilde{cO}((n+sqrt{n L/mu})ln(1/epsilon))$, unless the strong convexity parameter is given explicitly. Lastly, we show that when this class of algorithms is used for minimizing $L$-smooth and convex finite sums, the optimal complexity bound is $tilde{cO}(n+L/epsilon)$, assuming that (on average) the same update rule is used in every iteration, and $tilde{cO}(n+sqrt{nL/epsilon})$, otherwise.
In this work, we propose a distributed algorithm for stochastic non-convex optimization. We consider a worker-server architecture where a set of $K$ worker nodes (WNs) in collaboration with a server node (SN) jointly aim to minimize a global, potentially non-convex objective function. The objective function is assumed to be the sum of local objective functions available at each WN, with each node having access to only the stochastic samples of its local objective function. In contrast to the existing approaches, we employ a momentum based single loop distributed algorithm which eliminates the need of computing large batch size gradients to achieve variance reduction. We propose two algorithms one with adaptive and the other with non-adaptive learning rates. We show that the proposed algorithms achieve the optimal computational complexity while attaining linear speedup with the number of WNs. Specifically, the algorithms reach an $epsilon$-stationary point $x_a$ with $mathbb{E}| abla f(x_a) | leq tilde{O}(K^{-1/3}T^{-1/2} + K^{-1/3}T^{-1/3})$ in $T$ iterations, thereby requiring $tilde{O}(K^{-1} epsilon^{-3})$ gradient computations at each WN. Moreover, our approach does not assume identical data distributions across WNs making the approach general enough for federated learning applications.
We show how to sketch semidefinite programs (SDPs) using positive maps in order to reduce their dimension. More precisely, we use Johnsonhyp{}Lindenstrauss transforms to produce a smaller SDP whose solution preserves feasibility or approximates the value of the original problem with high probability. These techniques allow to improve both complexity and storage space requirements. They apply to problems in which the Schatten 1-norm of the matrices specifying the SDP and also of a solution to the problem is constant in the problem size. Furthermore, we provide some results which clarify the limitations of positive, linear sketches in this setting.