ترغب بنشر مسار تعليمي؟ اضغط هنا

The Complexity of Nonconvex-Strongly-Concave Minimax Optimization

130   0   0.0 ( 0 )
 نشر من قبل Siqi Zhang
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

This paper studies the complexity for finding approximate stationary points of nonconvex-strongly-concave (NC-SC) smooth minimax problems, in both general and averaged smooth finite-sum settings. We establish nontrivial lower complexity bounds of $Omega(sqrt{kappa}Delta Lepsilon^{-2})$ and $Omega(n+sqrt{nkappa}Delta Lepsilon^{-2})$ for the two settings, respectively, where $kappa$ is the condition number, $L$ is the smoothness constant, and $Delta$ is the initial gap. Our result reveals substantial gaps between these limits and best-known upper bounds in the literature. To close these gaps, we introduce a generic acceleration scheme that deploys existing gradient-based methods to solve a sequence of crafted strongly-convex-strongly-concave subproblems. In the general setting, the complexity of our proposed algorithm nearly matches the lower bound; in particular, it removes an additional poly-logarithmic dependence on accuracy present in previous works. In the averaged smooth finite-sum setting, our proposed algorithm improves over previous algorithms by providing a nearly-tight dependence on the condition number.



قيم البحث

اقرأ أيضاً

We provide a first-order oracle complexity lower bound for finding stationary points of min-max optimization problems where the objective function is smooth, nonconvex in the minimization variable, and strongly concave in the maximization variable. W e establish a lower bound of $Omegaleft(sqrt{kappa}epsilon^{-2}right)$ for deterministic oracles, where $epsilon$ defines the level of approximate stationarity and $kappa$ is the condition number. Our analysis shows that the upper bound achieved in (Lin et al., 2020b) is optimal in the $epsilon$ and $kappa$ dependence up to logarithmic factors. For stochastic oracles, we provide a lower bound of $Omegaleft(sqrt{kappa}epsilon^{-2} + kappa^{1/3}epsilon^{-4}right)$. It suggests that there is a significant gap between the upper bound $mathcal{O}(kappa^3 epsilon^{-4})$ in (Lin et al., 2020a) and our lower bound in the condition number dependence.
Minimax optimization has become a central tool in machine learning with applications in robust optimization, reinforcement learning, GANs, etc. These applications are often nonconvex-nonconcave, but the existing theory is unable to identify and deal with the fundamental difficulties this poses. In this paper, we study the classic proximal point method (PPM) applied to nonconvex-nonconcave minimax problems. We find that a classic generalization of the Moreau envelope by Attouch and Wets provides key insights. Critically, we show this envelope not only smooths the objective but can convexify and concavify it based on the level of interaction present between the minimizing and maximizing variables. From this, we identify three distinct regions of nonconvex-nonconcave problems. When interaction is sufficiently strong, we derive global linear convergence guarantees. Conversely when the interaction is fairly weak, we derive local linear convergence guarantees with a proper initialization. Between these two settings, we show that PPM may diverge or converge to a limit cycle.
211 - Blake Woodworth 2021
In this thesis, I study the minimax oracle complexity of distributed stochastic optimization. First, I present the graph oracle model, an extension of the classic oracle complexity framework that can be applied to study distributed optimization algor ithms. Next, I describe a general approach to proving optimization lower bounds for arbitrary randomized algorithms (as opposed to more restricted classes of algorithms, e.g., deterministic or zero-respecting algorithms), which is used extensively throughout the thesis. For the remainder of the thesis, I focus on the specific case of the intermittent communication setting, where multiple computing devices work in parallel with limited communication amongst themselves. In this setting, I analyze the theoretical properties of the popular Local Stochastic Gradient Descent (SGD) algorithm in convex setting, both for homogeneous and heterogeneous objectives. I provide the first guarantees for Local SGD that improve over simple baseline methods, but show that Local SGD is not optimal in general. In pursuit of optimal methods in the intermittent communication setting, I then show matching upper and lower bounds for the intermittent communication setting with homogeneous convex, heterogeneous convex, and homogeneous non-convex objectives. These upper bounds are attained by simple variants of SGD which are therefore optimal. Finally, I discuss several additional assumptions about the objective or more powerful oracles that might be exploitable in order to develop better intermittent communication algorithms with better guarantees than our lower bounds allow.
Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as the gradient descent ascent (GDA) are the common practice for solving these nonconvex games and receive lots of empirical success. Yet, it is known that these vanilla GDA algorithms with constant step size can potentially diverge even in the convex setting. In this work, we show that for a subclass of nonconvex-nonconcave objectives satisfying a so-called two-sided Polyak-{L}ojasiewicz inequality, the alternating gradient descent ascent (AGDA) algorithm converges globally at a linear rate and the stochastic AGDA achieves a sublinear rate. We further develop a variance reduced algorithm that attains a provably faster rate than AGDA when the problem has the finite-sum structure.
Unlike nonconvex optimization, where gradient descent is guaranteed to converge to a local optimizer, algorithms for nonconvex-nonconcave minimax optimization can have topologically different solution paths: sometimes converging to a solution, someti mes never converging and instead following a limit cycle, and sometimes diverging. In this paper, we study the limiting behaviors of three classic minimax algorithms: gradient descent ascent (GDA), alternating gradient descent ascent (AGDA), and the extragradient method (EGM). Numerically, we observe that all of these limiting behaviors can arise in Generative Adversarial Networks (GAN) training and are easily demonstrated for a range of GAN problems. To explain these different behaviors, we study the high-order resolution continuous-time dynamics that correspond to each algorithm, which results in the sufficient (and almost necessary) conditions for the local convergence by each method. Moreover, this ODE perspective allows us to characterize the phase transition between these different limiting behaviors caused by introducing regularization as Hopf Bifurcations.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا