Do you want to publish a course? Click here

Lower complexity bounds of first-order methods for convex-concave bilinear saddle-point problems

76   0   0.0 ( 0 )
 Added by Yangyang Xu
 Publication date 2018
  fields
and research's language is English




Ask ChatGPT about the research

On solving a convex-concave bilinear saddle-point problem (SPP), there have been many works studying the complexity results of first-order methods. These results are all about upper complexity bounds, which can determine at most how many efforts would guarantee a solution of desired accuracy. In this paper, we pursue the opposite direction by deriving lower complexity bounds of first-order methods on large-scale SPPs. Our results apply to the methods whose iterates are in the linear span of past first-order information, as well as more general methods that produce their iterates in an arbitrary manner based on first-order information. We first work on the affinely constrained smooth convex optimization that is a special case of SPP. Different from gradient method on unconstrained problems, we show that first-order methods on affinely constrained problems generally cannot be accelerated from the known convergence rate $O(1/t)$ to $O(1/t^2)$, and in addition, $O(1/t)$ is optimal for convex problems. Moreover, we prove that for strongly convex problems, $O(1/t^2)$ is the best possible convergence rate, while it is known that gradient methods can have linear convergence on unconstrained problems. Then we extend these results to general SPPs. It turns out that our lower complexity bounds match with several established upper complexity bounds in the literature, and thus they are tight and indicate the optimality of several existing first-order methods.



rate research

Read More

This paper studies the generalization bounds for the empirical saddle point (ESP) solution to stochastic saddle point (SSP) problems. For SSP with Lipschitz continuous and strongly convex-strongly concave objective functions, we establish an $mathcal{O}(1/n)$ generalization bound by using a uniform stability argument. We also provide generalization bounds under a variety of assumptions, including the cases without strong convexity and without bounded domains. We illustrate our results in two examples: batch policy learning in Markov decision process, and mixed strategy Nash equilibrium estimation for stochastic games. In each of these examples, we show that a regularized ESP solution enjoys a near-optimal sample complexity. To the best of our knowledge, this is the first set of results on the generalization theory of ESP.
We establish lower bounds on the complexity of finding $epsilon$-stationary points of smooth, non-convex high-dimensional functions using first-order methods. We prove that deterministic first-order methods, even applied to arbitrarily smooth functions, cannot achieve convergence rates in $epsilon$ better than $epsilon^{-8/5}$, which is within $epsilon^{-1/15}logfrac{1}{epsilon}$ of the best known rate for such methods. Moreover, for functions with Lipschitz first and second derivatives, we prove no deterministic first-order method can achieve convergence rates better than $epsilon^{-12/7}$, while $epsilon^{-2}$ is a lower bound for functions with only Lipschitz gradient. For convex functions with Lipschitz gradient, accelerated gradient descent achieves the rate $epsilon^{-1}logfrac{1}{epsilon}$, showing that finding stationary points is easier given convexity.
We provide a first-order oracle complexity lower bound for finding stationary points of min-max optimization problems where the objective function is smooth, nonconvex in the minimization variable, and strongly concave in the maximization variable. We establish a lower bound of $Omegaleft(sqrt{kappa}epsilon^{-2}right)$ for deterministic oracles, where $epsilon$ defines the level of approximate stationarity and $kappa$ is the condition number. Our analysis shows that the upper bound achieved in (Lin et al., 2020b) is optimal in the $epsilon$ and $kappa$ dependence up to logarithmic factors. For stochastic oracles, we provide a lower bound of $Omegaleft(sqrt{kappa}epsilon^{-2} + kappa^{1/3}epsilon^{-4}right)$. It suggests that there is a significant gap between the upper bound $mathcal{O}(kappa^3 epsilon^{-4})$ in (Lin et al., 2020a) and our lower bound in the condition number dependence.
84 - Zhaosong Lu , Zirui Zhou 2018
In this paper we consider a class of convex conic programming. In particular, we propose an inexact augmented Lagrangian (I-AL) method for solving this problem, in which the augmented Lagrangian subproblems are solved approximately by a variant of Nesterovs optimal first-order method. We show that the total number of first-order iterations of the proposed I-AL method for computing an $epsilon$-KKT solution is at most $mathcal{O}(epsilon^{-7/4})$. We also propose a modified I-AL method and show that it has an improved iteration-complexity $mathcal{O}(epsilon^{-1}logepsilon^{-1})$, which is so far the lowest complexity bound among all first-order I-AL type of methods for computing an $epsilon$-KKT solution. Our complexity analysis of the I-AL methods is mainly based on an analysis on inexact proximal point algorithm (PPA) and the link between the I-AL methods and inexact PPA. It is substantially different from the existing complexity analyses of the first-order I-AL methods in the literature, which typically regard the I-AL methods as an inexact dual gradient method. Compared to the mostly related I-AL methods cite{Lan16}, our modified I-AL method is more practically efficient and also applicable to a broader class of problems.
Finite-sum optimization problems are ubiquitous in machine learning, and are commonly solved using first-order methods which rely on gradient computations. Recently, there has been growing interest in emph{second-order} methods, which rely on both gradients and Hessians. In principle, second-order methods can require much fewer iterations than first-order methods, and hold the promise for more efficient algorithms. Although computing and manipulating Hessians is prohibitive for high-dimensional problems in general, the Hessians of individual functions in finite-sum problems can often be efficiently computed, e.g. because they possess a low-rank structure. Can second-order information indeed be used to solve such problems more efficiently? In this paper, we provide evidence that the answer -- perhaps surprisingly -- is negative, at least in terms of worst-case guarantees. However, we also discuss what additional assumptions and algorithmic approaches might potentially circumvent this negative result.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا