On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach

94 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Guodong Zhang

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yuanhao Wang - Guodong Zhang - Jimmy Ba

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Many tasks in modern machine learning can be formulated as finding equilibria in emph{sequential} games. In particular, two-player zero-sum sequential games, also known as minimax optimization, have received growing interest. It is tempting to apply gradient descent to solve minimax optimization given its popularity and success in supervised learning. However, it has been noted that naive application of gradient descent fails to find some local minimax and can converge to non-local-minimax points. In this paper, we propose emph{Follow-the-Ridge} (FR), a novel algorithm that provably converges to and only converges to local minimax. We show theoretically that the algorithm addresses the notorious rotational behaviour of gradient dynamics, and is compatible with preconditioning and emph{positive} momentum. Empirically, FR solves toy minimax problems and improves the convergence of GAN training compared to the recent minimax optimization algorithms.

قيم البحث

68 - Tor Lattimore 2021

We analyse adversarial bandit convex optimisation with an adversary that is restricted to playing functions of the form $f_t(x) = g_t(langle x, thetarangle)$ for convex $g_t : mathbb R to mathbb R$ and unknown $theta in mathbb R^d$ that is homogeneou s over time. We provide a short information-theoretic proof that the minimax regret is at most $O(d sqrt{n} log(n operatorname{diam}(mathcal K)))$ where $n$ is the number of interactions, $d$ the dimension and $operatorname{diam}(mathcal K)$ is the diameter of the constraint set.

التعلم الآلي التحسين والتحكم

Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization

148 - Guodong Zhang , Yuanhao Wang , Laurent Lessard 2021

Smooth minimax games often proceed by simultaneous or alternating gradient updates. Although algorithms with alternating updates are commonly used in practice for many applications (e.g., GAN training), the majority of existing theoretical analyses f ocus on simultaneous algorithms for convenience of analysis. In this paper, we study alternating gradient descent-ascent (Alt-GDA) in minimax games and show that Alt-GDA is superior to its simultaneous counterpart (Sim-GDA) in many settings. In particular, we prove that Alt-GDA achieves a near-optimal local convergence rate for strongly convex-strongly concave (SCSC) problems while Sim-GDA converges at a much slower rate. To our knowledge, this is the emph{first} result of any setting showing that Alt-GDA converges faster than Sim-GDA by more than a constant. We further prove that the acceleration effect of alternating updates remains when the minimax problem has only strong concavity in the dual variables. Lastly, we adapt the theory of integral quadratic constraints and show that Alt-GDA attains the same rate emph{globally} for a class of SCSC minimax problems. Numerical experiments on quadratic minimax games validate our claims. Empirically, we demonstrate that alternating updates speed up GAN training significantly and the use of optimism only helps for simultaneous algorithms.

التعلم الآلي التحسين والتحكم التعلم الالي

Efficient Online-Bandit Strategies for Minimax Learning Problems

89 - Christophe Roux , Elias Wirth , Sebastian Pokutta 2021

Several learning problems involve solving min-max problems, e.g., empirical distributional robust learning or learning with non-standard aggregated losses. More specifically, these problems are convex-linear problems where the minimization is carried out over the model parameters $winmathcal{W}$ and the maximization over the empirical distribution $pinmathcal{K}$ of the training set indexes, where $mathcal{K}$ is the simplex or a subset of it. To design efficient methods, we let an online learning algorithm play against a (combinatorial) bandit algorithm. We argue that the efficiency of such approaches critically depends on the structure of $mathcal{K}$ and propose two properties of $mathcal{K}$ that facilitate designing efficient algorithms. We focus on a specific family of sets $mathcal{S}_{n,k}$ encompassing various learning applications and provide high-probability convergence guarantees to the minimax values.

التعلم الآلي التحسين والتحكم التعلم الالي

Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

142 - Jiafan He , Dongruo Zhou , Quanquan Gu 2020

We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) under the tabular setting. We propose a model-based algorithm named UCBVI-$gamma$, which is based on the emph{optimism in the face of uncertainty principle} a nd the Bernstein-type bonus. We show that UCBVI-$gamma$ achieves an $tilde{O}big({sqrt{SAT}}/{(1-gamma)^{1.5}}big)$ regret, where $S$ is the number of states, $A$ is the number of actions, $gamma$ is the discount factor and $T$ is the number of steps. In addition, we construct a class of hard MDPs and show that for any algorithm, the expected regret is at least $tilde{Omega}big({sqrt{SAT}}/{(1-gamma)^{1.5}}big)$. Our upper bound matches the minimax lower bound up to logarithmic factors, which suggests that UCBVI-$gamma$ is nearly minimax optimal for discounted MDPs.

التعلم الآلي التحسين والتحكم التعلم الالي

Train simultaneously, generalize better: Stability of gradient-based minimax learners

130 - Farzan Farnia , Asuman Ozdaglar 2020

The success of minimax learning problems of generative adversarial networks (GANs) has been observed to depend on the minimax optimization algorithm used for their training. This dependence is commonly attributed to the convergence speed and robustne ss properties of the underlying optimization algorithm. In this paper, we show that the optimization algorithm also plays a key role in the generalization performance of the trained minimax model. To this end, we analyze the generalization properties of standard gradient descent ascent (GDA) and proximal point method (PPM) algorithms through the lens of algorithmic stability under both convex concave and non-convex non-concave minimax settings. While the GDA algorithm is not guaranteed to have a vanishing excess risk in convex concave problems, we show the PPM algorithm enjoys a bounded excess risk in the same setup. For non-convex non-concave problems, we compare the generalization performance of stochastic GDA and GDmax algorithms where the latter fully solves the maximization subproblem at every iteration. Our generalization analysis suggests the superiority of GDA provided that the minimization and maximization subproblems are solved simultaneously with similar learning rates. We discuss several numerical results indicating the role of optimization algorithms in the generalization of the learned minimax models.

التعلم الآلي التحسين والتحكم التعلم الالي