Lower complexity bounds of first-order methods for convex-concave bilinear saddle-point problems


Added by Yangyang Xu

Publication date 2018


Authors Yuyuan Ouyang - Yangyang Xu

Optimization and Control


Many works have studied the complexity of first-order methods for solving convex-concave bilinear saddle-point problems (SPPs). These results all concern upper complexity bounds, which determine at most how much effort is needed to guarantee a solution of desired accuracy. In this paper, we pursue the opposite direction by deriving lower complexity bounds of first-order methods on large-scale SPPs. Our results apply to methods whose iterates lie in the linear span of past first-order information, as well as to more general methods that produce their iterates in an arbitrary manner based on first-order information. We first consider affinely constrained smooth convex optimization, which is a special case of SPP. We show that, unlike gradient methods on unconstrained problems, first-order methods on affinely constrained problems generally cannot be accelerated from the known convergence rate $O(1/t)$ to $O(1/t^2)$; in addition, $O(1/t)$ is optimal for convex problems. Moreover, we prove that for strongly convex problems, $O(1/t^2)$ is the best possible convergence rate, whereas gradient methods are known to converge linearly on unconstrained problems. We then extend these results to general SPPs. Our lower complexity bounds match several established upper complexity bounds in the literature; they are therefore tight and indicate the optimality of several existing first-order methods.
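As a rough illustration of the linear-span condition mentioned in the abstract, the following Python sketch runs plain gradient descent from zero on a quadratic built from a tridiagonal "chain" matrix. This is a common textbook-style device and is an assumption here, not the construction used in the paper; the function names, the matrix, the objective, and the step size are all hypothetical choices. Since each gradient can reach only one new coordinate, the iterate after t steps has at most t leading nonzero coordinates, which is the kind of structural restriction that lower-bound arguments for span-based methods typically exploit.

```python
import numpy as np

def chain_matrix(n):
    """Tridiagonal matrix: 2 on the diagonal, -1 on both off-diagonals (illustrative choice)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def gradient_descent_supports(n=10, steps=5, step_size=0.25):
    """Run gradient descent on f(x) = 0.5 * x^T A x - x[0], starting from x = 0.

    Returns, for each iterate, the count of leading coordinates up to and
    including the last nonzero one (0 if the iterate is all zeros).
    """
    A = chain_matrix(n)
    e1 = np.zeros(n)
    e1[0] = 1.0
    x = np.zeros(n)
    supports = []
    for _ in range(steps):
        grad = A @ x - e1            # first-order information at the current iterate
        x = x - step_size * grad     # new iterate stays in the span of past gradients
        nonzero = np.flatnonzero(x)
        supports.append(int(nonzero.max()) + 1 if nonzero.size else 0)
    return supports

print(gradient_descent_supports())
```

With these assumed settings the support grows by one coordinate per step, so however the step size is tuned, the method cannot "see" coordinates beyond index t after t gradient evaluations.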


Generalization Bounds for Stochastic Saddle Point Problems

Junyu Zhang, Mingyi Hong, Mengdi Wang, 2020

This paper studies the generalization bounds for the empirical saddle point (ESP) solution to stochastic saddle point (SSP) problems. For SSP with Lipschitz continuous and strongly convex-strongly concave objective functions, we establish an $\mathcal{O}(1/n)$ generalization bound by using a uniform stability argument. We also provide generalization bounds under a variety of assumptions, including the cases without strong convexity and without bounded domains. We illustrate our results in two examples: batch policy learning in Markov decision processes, and mixed strategy Nash equilibrium estimation for stochastic games. In each of these examples, we show that a regularized ESP solution enjoys a near-optimal sample complexity. To the best of our knowledge, this is the first set of results on the generalization theory of ESP.

Optimization and Control

Lower Bounds for Finding Stationary Points II: First-Order Methods

Yair Carmon, John C. Duchi, Oliver Hinder, 2017

We establish lower bounds on the complexity of finding $\epsilon$-stationary points of smooth, non-convex high-dimensional functions using first-order methods. We prove that deterministic first-order methods, even applied to arbitrarily smooth functions, cannot achieve convergence rates in $\epsilon$ better than $\epsilon^{-8/5}$, which is within $\epsilon^{-1/15}\log\frac{1}{\epsilon}$ of the best known rate for such methods. Moreover, for functions with Lipschitz first and second derivatives, we prove no deterministic first-order method can achieve convergence rates better than $\epsilon^{-12/7}$, while $\epsilon^{-2}$ is a lower bound for functions with only Lipschitz gradient. For convex functions with Lipschitz gradient, accelerated gradient descent achieves the rate $\epsilon^{-1}\log\frac{1}{\epsilon}$, showing that finding stationary points is easier given convexity.

Optimization and Control

Complexity Lower Bounds for Nonconvex-Strongly-Concave Min-Max Optimization

Haochuan Li, Yi Tian, Jingzhao Zhang, 2021

We provide a first-order oracle complexity lower bound for finding stationary points of min-max optimization problems where the objective function is smooth, nonconvex in the minimization variable, and strongly concave in the maximization variable. We establish a lower bound of $\Omega\left(\sqrt{\kappa}\epsilon^{-2}\right)$ for deterministic oracles, where $\epsilon$ defines the level of approximate stationarity and $\kappa$ is the condition number. Our analysis shows that the upper bound achieved in (Lin et al., 2020b) is optimal in the $\epsilon$ and $\kappa$ dependence up to logarithmic factors. For stochastic oracles, we provide a lower bound of $\Omega\left(\sqrt{\kappa}\epsilon^{-2} + \kappa^{1/3}\epsilon^{-4}\right)$. It suggests that there is a significant gap between the upper bound $\mathcal{O}(\kappa^3 \epsilon^{-4})$ in (Lin et al., 2020a) and our lower bound in the condition number dependence.

Optimization and Control Machine Learning

Iteration-complexity of first-order augmented Lagrangian methods for convex conic programming

Zhaosong Lu, Zirui Zhou, 2018

In this paper we consider a class of convex conic programming problems. In particular, we propose an inexact augmented Lagrangian (I-AL) method for solving this problem, in which the augmented Lagrangian subproblems are solved approximately by a variant of Nesterov's optimal first-order method. We show that the total number of first-order iterations of the proposed I-AL method for computing an $\epsilon$-KKT solution is at most $\mathcal{O}(\epsilon^{-7/4})$. We also propose a modified I-AL method and show that it has an improved iteration-complexity $\mathcal{O}(\epsilon^{-1}\log\epsilon^{-1})$, which is so far the lowest complexity bound among all first-order I-AL type methods for computing an $\epsilon$-KKT solution. Our complexity analysis of the I-AL methods is mainly based on an analysis of the inexact proximal point algorithm (PPA) and the link between the I-AL methods and inexact PPA. It is substantially different from the existing complexity analyses of the first-order I-AL methods in the literature, which typically regard the I-AL methods as an inexact dual gradient method. Compared to the most closely related I-AL methods \cite{Lan16}, our modified I-AL method is more practically efficient and also applicable to a broader class of problems.

Optimization and Control Computational Complexity Numerical Analysis

Oracle Complexity of Second-Order Methods for Finite-Sum Problems

Yossi Arjevani, Ohad Shamir, 2016

Finite-sum optimization problems are ubiquitous in machine learning, and are commonly solved using first-order methods which rely on gradient computations. Recently, there has been growing interest in \emph{second-order} methods, which rely on both gradients and Hessians. In principle, second-order methods can require far fewer iterations than first-order methods, and hold the promise of more efficient algorithms. Although computing and manipulating Hessians is prohibitive for high-dimensional problems in general, the Hessians of individual functions in finite-sum problems can often be efficiently computed, e.g. because they possess a low-rank structure. Can second-order information indeed be used to solve such problems more efficiently? In this paper, we provide evidence that the answer -- perhaps surprisingly -- is negative, at least in terms of worst-case guarantees. However, we also discuss what additional assumptions and algorithmic approaches might potentially circumvent this negative result.
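To make the low-rank remark concrete, here is a minimal Python sketch under an assumed setting: squared-loss components $f_i(w) = \frac{1}{2}(a_i^\top w - b_i)^2$. This choice of loss and the helper names are illustrative assumptions, not taken from the paper. For such a component the Hessian is the rank-one matrix $a_i a_i^\top$, so a Hessian-vector product can be formed in $O(d)$ time without ever building the $d \times d$ matrix.

```python
import numpy as np

def component_hessian(a):
    """Dense Hessian of f_i(w) = 0.5 * (a @ w - b)**2, which equals a a^T."""
    return np.outer(a, a)

def component_hvp(a, v):
    """Hessian-vector product (a a^T) v, computed in O(d) as a * (a @ v)."""
    return a * (a @ v)

rng = np.random.default_rng(0)
d = 6
a = rng.standard_normal(d)   # hypothetical data vector for one component
v = rng.standard_normal(d)   # arbitrary direction

H = component_hessian(a)
print(np.linalg.matrix_rank(H))                  # rank of the component Hessian
print(np.allclose(H @ v, component_hvp(a, v)))   # cheap product agrees with the dense one
```

The sketch shows only why per-component second-order information can be cheap to access; it says nothing about whether using it improves worst-case complexity, which is the question the abstract addresses.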

Optimization and Control Machine Learning