Limiting Behaviors of Nonconvex-Nonconcave Minimax Optimization via Continuous-Time Systems


Published by: Benjamin Grimmer

Publication date: 2020

Research field: Informatics Engineering

Language: English

Authors: Benjamin Grimmer, Haihao Lu, Pratik Worah

Optimization and Control, Machine Learning


Abstract

Unlike nonconvex optimization, where gradient descent is guaranteed to converge to a local optimizer, algorithms for nonconvex-nonconcave minimax optimization can have topologically different solution paths: sometimes converging to a solution, sometimes never converging and instead following a limit cycle, and sometimes diverging. In this paper, we study the limiting behaviors of three classic minimax algorithms: gradient descent ascent (GDA), alternating gradient descent ascent (AGDA), and the extragradient method (EGM). Numerically, we observe that all of these limiting behaviors can arise in Generative Adversarial Network (GAN) training and are easily demonstrated for a range of GAN problems. To explain these different behaviors, we study the high-order resolution continuous-time dynamics that correspond to each algorithm, which yields sufficient (and almost necessary) conditions for the local convergence of each method. Moreover, this ODE perspective allows us to characterize the phase transitions between these limiting behaviors, caused by introducing regularization, as Hopf bifurcations.
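To make the three update rules concrete, here is a minimal Python sketch on the toy bilinear objective f(x, y) = xy. The toy problem, the step size, the starting point, and all function names are illustrative assumptions and are not taken from the paper. The problem is convex-concave rather than nonconvex-nonconcave, but it already separates three qualitative behaviors: GDA spirals outward, AGDA stays on a bounded closed orbit, and EGM contracts toward the saddle point at the origin.

```python
import math

# Toy objective f(x, y) = x * y (an illustrative assumption, not from the paper).
# x is the minimizing variable, y is the maximizing variable.
def grad(x, y):
    """Return (df/dx, df/dy) for f(x, y) = x * y."""
    return y, x

def gda_step(x, y, s):
    """Simultaneous gradient descent ascent: both players use the old point."""
    gx, gy = grad(x, y)
    return x - s * gx, y + s * gy

def agda_step(x, y, s):
    """Alternating GDA: y reacts to the already-updated x."""
    gx, _ = grad(x, y)
    x_new = x - s * gx
    _, gy = grad(x_new, y)
    return x_new, y + s * gy

def egm_step(x, y, s):
    """Extragradient: take a trial step, then step using the trial gradient."""
    gx, gy = grad(x, y)
    x_half, y_half = x - s * gx, y + s * gy
    gx_half, gy_half = grad(x_half, y_half)
    return x - s * gx_half, y + s * gy_half

def run(step, x=1.0, y=1.0, s=0.1, iters=1000):
    """Iterate `step` and return the final distance to the saddle point (0, 0)."""
    for _ in range(iters):
        x, y = step(x, y, s)
    return math.hypot(x, y)

if __name__ == "__main__":
    for name, step in [("GDA", gda_step), ("AGDA", agda_step), ("EGM", egm_step)]:
        print(f"{name}: distance to saddle after 1000 steps = {run(step):.4g}")
```

On this toy problem the distance to the origin grows without bound for GDA, stays of order one for AGDA, and shrinks toward zero for EGM, provided the step size is below 1.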

Benjamin Grimmer, Haihao Lu, Pratik Worah (2020)

Minimax optimization has become a central tool in machine learning with applications in robust optimization, reinforcement learning, GANs, etc. These applications are often nonconvex-nonconcave, but the existing theory is unable to identify and deal with the fundamental difficulties this poses. In this paper, we study the classic proximal point method (PPM) applied to nonconvex-nonconcave minimax problems. We find that a classic generalization of the Moreau envelope by Attouch and Wets provides key insights. Critically, we show this envelope not only smooths the objective but can convexify and concavify it based on the level of interaction present between the minimizing and maximizing variables. From this, we identify three distinct regions of nonconvex-nonconcave problems. When the interaction is sufficiently strong, we derive global linear convergence guarantees. Conversely, when the interaction is fairly weak, we derive local linear convergence guarantees with a proper initialization. Between these two settings, we show that PPM may diverge or converge to a limit cycle.
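As a rough illustration of how the interaction term can decide the fate of PPM, the sketch below applies one proximal point step in closed form to the toy quadratic f(x, y) = -(rho/2) x^2 + b x y + (rho/2) y^2, which is nonconvex in x and nonconcave in y when rho > 0. The objective, the parameter names `rho`, `b`, `eta`, and the function names are assumptions chosen for illustration and are not taken from the paper. The toy does not reproduce the paper's three regimes; it only shows a large interaction coefficient `b` turning divergence into linear convergence.

```python
import math

def ppm_step(x, y, rho, b, eta):
    """One proximal point step for f(x, y) = -(rho/2) x^2 + b x y + (rho/2) y^2.

    Solves the regularized saddle subproblem
        min_u max_v  f(u, v) + (eta/2)(u - x)^2 - (eta/2)(v - y)^2
    in closed form. Requires eta > rho so the subproblem is strongly
    convex in u and strongly concave in v.
    """
    if eta <= rho:
        raise ValueError("eta must exceed rho for a well-posed subproblem")
    c = eta - rho
    det = c * c + b * b
    # Stationarity gives  c*u + b*v = eta*x  and  -b*u + c*v = eta*y.
    u = eta * (c * x - b * y) / det
    v = eta * (b * x + c * y) / det
    return u, v

def run_ppm(b, rho=1.0, eta=2.0, x=1.0, y=1.0, iters=50):
    """Iterate PPM and return the final distance to the stationary point (0, 0)."""
    for _ in range(iters):
        x, y = ppm_step(x, y, rho, b, eta)
    return math.hypot(x, y)

if __name__ == "__main__":
    # Each step scales the distance by eta / sqrt((eta - rho)^2 + b^2).
    print("strong interaction (b = 3.0):", run_ppm(b=3.0))
    print("weak interaction   (b = 0.5):", run_ppm(b=0.5))
```

In this toy each step multiplies the distance to the origin by eta / sqrt((eta - rho)^2 + b^2), so the iterates contract exactly when b^2 > eta^2 - (eta - rho)^2.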

Optimization and Control, Machine Learning

Global Convergence and Variance-Reduced Optimization for a Class of Nonconvex-Nonconcave Minimax Problems

Junchi Yang, Negar Kiyavash, Niao He (2020)

Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as gradient descent ascent (GDA) are common practice for solving these nonconvex games and enjoy considerable empirical success. Yet, it is known that these vanilla GDA algorithms with constant step size can potentially diverge even in the convex setting. In this work, we show that for a subclass of nonconvex-nonconcave objectives satisfying a so-called two-sided Polyak-Łojasiewicz inequality, the alternating gradient descent ascent (AGDA) algorithm converges globally at a linear rate and the stochastic AGDA achieves a sublinear rate. We further develop a variance-reduced algorithm that attains a provably faster rate than AGDA when the problem has a finite-sum structure.
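For orientation, the two-sided Polyak-Łojasiewicz condition is commonly stated as follows. The abstract itself does not spell out the inequality, so the constants $\mu_1, \mu_2 > 0$ and the exact normalization below are notation assumed here; the paper should be consulted for its precise definition.

```latex
\|\nabla_x f(x, y)\|^2 \;\ge\; 2\mu_1 \Bigl[ f(x, y) - \min_{x'} f(x', y) \Bigr],
\qquad
\|\nabla_y f(x, y)\|^2 \;\ge\; 2\mu_2 \Bigl[ \max_{y'} f(x, y') - f(x, y) \Bigr]
\quad \text{for all } x, y.
```

Each inequality is the usual one-variable PL condition applied to one player with the other held fixed, which is why neither convexity in $x$ nor concavity in $y$ is required.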

Optimization and Control, Machine Learning

The Complexity of Nonconvex-Strongly-Concave Minimax Optimization

Siqi Zhang, Junchi Yang, Cristobal Guzman (2021)

This paper studies the complexity for finding approximate stationary points of nonconvex-strongly-concave (NC-SC) smooth minimax problems, in both general and averaged smooth finite-sum settings. We establish nontrivial lower complexity bounds of $\Omega(\sqrt{\kappa}\Delta L\epsilon^{-2})$ and $\Omega(n+\sqrt{n\kappa}\Delta L\epsilon^{-2})$ for the two settings, respectively, where $\kappa$ is the condition number, $L$ is the smoothness constant, and $\Delta$ is the initial gap. Our result reveals substantial gaps between these limits and best-known upper bounds in the literature. To close these gaps, we introduce a generic acceleration scheme that deploys existing gradient-based methods to solve a sequence of crafted strongly-convex-strongly-concave subproblems. In the general setting, the complexity of our proposed algorithm nearly matches the lower bound; in particular, it removes an additional poly-logarithmic dependence on accuracy present in previous works. In the averaged smooth finite-sum setting, our proposed algorithm improves over previous algorithms by providing a nearly-tight dependence on the condition number.

Optimization and Control, Machine Learning

The Minimax Complexity of Distributed Optimization

Blake Woodworth (2021)

In this thesis, I study the minimax oracle complexity of distributed stochastic optimization. First, I present the graph oracle model, an extension of the classic oracle complexity framework that can be applied to study distributed optimization algorithms. Next, I describe a general approach to proving optimization lower bounds for arbitrary randomized algorithms (as opposed to more restricted classes of algorithms, e.g., deterministic or zero-respecting algorithms), which is used extensively throughout the thesis. For the remainder of the thesis, I focus on the specific case of the intermittent communication setting, where multiple computing devices work in parallel with limited communication amongst themselves. In this setting, I analyze the theoretical properties of the popular Local Stochastic Gradient Descent (SGD) algorithm in the convex setting, both for homogeneous and heterogeneous objectives. I provide the first guarantees for Local SGD that improve over simple baseline methods, but show that Local SGD is not optimal in general. In pursuit of optimal methods in the intermittent communication setting, I then show matching upper and lower bounds for the intermittent communication setting with homogeneous convex, heterogeneous convex, and homogeneous non-convex objectives. These upper bounds are attained by simple variants of SGD which are therefore optimal. Finally, I discuss several additional assumptions about the objective or more powerful oracles that might be exploitable in order to develop intermittent communication algorithms with better guarantees than our lower bounds allow.
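The intermittent communication pattern behind Local SGD can be sketched in a few lines of Python. The example below is a single-process simulation under assumed settings: a homogeneous one-dimensional quadratic objective 0.5 (w - target)^2, Gaussian gradient noise, and hypothetical parameter names. It is meant only to show the structure (several local steps per machine, then one averaging step per round) and is not the thesis's analysis or experimental setup.

```python
import random

def local_sgd(num_machines=4, local_steps=5, rounds=50, lr=0.1,
              noise_std=0.5, target=3.0, seed=0):
    """Simulate Local SGD on f(w) = 0.5 * (w - target)^2 with noisy gradients.

    Every machine starts each round from the shared model, runs
    `local_steps` SGD steps without communicating, and the resulting
    local models are averaged once per round.
    """
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    w = 0.0                    # shared model at the start of round 0
    for _ in range(rounds):
        local_models = []
        for _ in range(num_machines):
            w_local = w
            for _ in range(local_steps):
                # Stochastic gradient: exact gradient plus Gaussian noise.
                g = (w_local - target) + rng.gauss(0.0, noise_std)
                w_local -= lr * g
            local_models.append(w_local)
        # The only communication in the round: average the local models.
        w = sum(local_models) / num_machines
    return w

if __name__ == "__main__":
    print("final averaged model:", local_sgd())
```

Setting `local_steps=1` recovers minibatch-style SGD with one communication per gradient step, which is the kind of simple baseline the abstract compares Local SGD against.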

Optimization and Control, Machine Learning

Dependable Distributed Nonconvex Optimization via Polynomial Approximation

Zhiyu He, Jianping He, Cailian Chen (2021)

There has been work on exploiting polynomial approximation to solve distributed nonconvex optimization problems involving univariate objectives. This idea facilitates arbitrarily precise global optimization without requiring local evaluations of gradients at every iteration. Nonetheless, there remains a gap between existing theoretical guarantees and diverse practical requirements for dependability, notably privacy preservation and robustness to network imperfections (e.g., time-varying directed communication and asynchrony). To fill this gap and keep the above strengths, we propose a Dependable Chebyshev-Proxy-based distributed Optimization Algorithm (D-CPOA). Specifically, to ensure both accuracy of solutions and privacy of local objectives, a new privacy-preserving mechanism is designed. This mechanism leverages the randomness in blockwise insertions of perturbed vector states and hence provides an improved privacy guarantee compared to the literature in terms of ($\alpha,\beta$)-data-privacy. Furthermore, to gain robustness to various network imperfections, we use the push-sum consensus protocol as a backbone, discuss its specific enhancements, and evaluate the performance of the proposed algorithm accordingly. Thanks to the linear consensus-based structure of iterations, we avoid the privacy-accuracy trade-off and the bother of selecting appropriate step-sizes in different settings. We provide rigorous analysis of the accuracy, dependability and complexity. It is shown that the advantages brought by the idea of polynomial approximation are maintained when all the above requirements exist. Simulations demonstrate the effectiveness of the developed algorithm.

Optimization and Control