
Fast Objective & Duality Gap Convergence for Nonconvex-Strongly-Concave Min-Max Problems

Published by Zhishuai Guo
Publication date: 2020
Research field: Information Engineering
Language: English





This paper focuses on stochastic methods for solving smooth nonconvex-strongly-concave min-max problems, which have received increasing attention due to their potential applications in deep learning (e.g., deep AUC maximization). However, most existing algorithms are slow in practice, and their analysis revolves around convergence to a nearly stationary point. We consider leveraging the Polyak-Łojasiewicz (PL) condition to design faster stochastic algorithms with stronger convergence guarantees. Although the PL condition has been utilized in designing many stochastic minimization algorithms, its application to nonconvex min-max optimization remains rare. In this paper, we propose and analyze proximal epoch-based methods and establish fast convergence in terms of both the primal objective gap and the duality gap. Our analysis is of interest in three respects: (i) it is based on a novel Lyapunov function that consists of the primal objective gap and the duality gap of a regularized function; (ii) establishing convergence of the primal objective requires only a weaker PL condition than the one required for duality gap convergence; (iii) it yields the optimal dependence on the accuracy level $\epsilon$, i.e., $O(1/\epsilon)$. We also make explicit the dependence on the problem parameters and explore regions of the weak convexity parameter that lead to improved dependence on condition numbers. Experiments on deep AUC maximization demonstrate the effectiveness of our methods. Our method (MaxAUC) achieved an AUC of 0.922 on the private testing set of the CheXpert competition.
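The proximal epoch-based idea can be illustrated on a one-dimensional toy problem. The sketch below is not the authors' algorithm: the abstract does not give their step sizes, epoch lengths, or output rules, so the toy objective $f(x, y) = x^2/2 + 3\sin^2 x + xy - y^2/2$, the simulated Gaussian gradient noise, the halve-the-step/double-the-epoch schedule, and all function names are assumptions. Its primal function $P(x) = \max_y f(x, y) = x^2 + 3\sin^2 x$ is a standard example of a nonconvex function that satisfies the PL condition.

```python
import math
import random


def grad_f(x, y):
    """Partial gradients of the toy objective f(x, y) = x**2/2 + 3*sin(x)**2 + x*y - y**2/2,
    which is nonconvex in x (d2f/dx2 = 1 + 6*cos(2x) >= -5) and 1-strongly concave in y."""
    gx = x + 3.0 * math.sin(2.0 * x) + y  # d/dx [3*sin(x)**2] = 3*sin(2x)
    gy = x - y
    return gx, gy


def primal(x):
    """P(x) = max_y f(x, y) = x**2 + 3*sin(x)**2, attained at y = x; min P = 0 at x = 0."""
    return x * x + 3.0 * math.sin(x) ** 2


def proximal_epoch_sgda(x0, y0, gamma=10.0, eta0=0.05, t0=200, epochs=8, sigma=0.1, seed=0):
    """Epoch-based proximal stochastic GDA (illustrative sketch).

    Each epoch runs stochastic GDA on f(x, y) + gamma/2 * (x - x_ref)**2, which is
    (gamma - 5)-strongly convex in x because f(., y) is 5-weakly convex. The
    averaged iterates become the next reference/starting point; the step size
    halves and the epoch length doubles from one epoch to the next.
    """
    rng = random.Random(seed)
    x_ref, y = x0, y0
    eta, steps = eta0, t0
    for _ in range(epochs):
        x = x_ref
        sum_x = sum_y = 0.0
        for _ in range(steps):
            gx, gy = grad_f(x, y)
            # Simulated stochastic gradients: exact gradient plus Gaussian noise.
            gx += gamma * (x - x_ref) + rng.gauss(0.0, sigma)
            gy += rng.gauss(0.0, sigma)
            x, y = x - eta * gx, y + eta * gy  # simultaneous descent in x, ascent in y
            sum_x += x
            sum_y += y
        x_ref, y = sum_x / steps, sum_y / steps
        eta, steps = eta / 2.0, steps * 2
    return x_ref, y


x_out, y_out = proximal_epoch_sgda(1.0, 0.0)
print(primal(1.0), primal(x_out))
```

Each epoch solves a regularized subproblem that is strongly convex in $x$ and strongly concave in $y$, which is what keeps a fixed-step inner loop well behaved even though $f$ itself is nonconvex in $x$.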




Read also

We provide a first-order oracle complexity lower bound for finding stationary points of min-max optimization problems where the objective function is smooth, nonconvex in the minimization variable, and strongly concave in the maximization variable. We establish a lower bound of $\Omega\left(\sqrt{\kappa}\epsilon^{-2}\right)$ for deterministic oracles, where $\epsilon$ defines the level of approximate stationarity and $\kappa$ is the condition number. Our analysis shows that the upper bound achieved in (Lin et al., 2020b) is optimal in the $\epsilon$ and $\kappa$ dependence up to logarithmic factors. For stochastic oracles, we provide a lower bound of $\Omega\left(\sqrt{\kappa}\epsilon^{-2} + \kappa^{1/3}\epsilon^{-4}\right)$. This suggests a significant gap in the condition-number dependence between the upper bound $\mathcal{O}(\kappa^3 \epsilon^{-4})$ in (Lin et al., 2020a) and our lower bound.
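To make the size of the stochastic gap concrete: for small $\epsilon$ the $\epsilon^{-4}$ term dominates both bounds, and comparing their $\kappa$ factors gives the ratio below (the value $\kappa = 10^3$ is only an illustrative choice).

```latex
\frac{\kappa^{3}\,\epsilon^{-4}}{\kappa^{1/3}\,\epsilon^{-4}} = \kappa^{8/3},
\qquad \kappa = 10^{3} \;\Longrightarrow\; \kappa^{8/3} = 10^{8}.
```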
Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications -- reflecting the need for classifiers and predictive models that are robust to the distribution shifts that arise from phenomena such as selection bias or nonstationarity. Existing algorithms for solving Wasserstein DRSL -- one of the most popular DRSL frameworks, based on robustness to perturbations in the Wasserstein distance -- involve solving complex subproblems or fail to make use of stochastic gradients, limiting their use in large-scale machine learning problems. We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable stochastic extra-gradient algorithms that provably achieve faster convergence rates than existing approaches. We demonstrate their effectiveness on synthetic and real data in comparison with existing DRSL approaches. Key to our results is the use of variance reduction and random reshuffling to accelerate stochastic min-max optimization, the analysis of which may be of independent interest.
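The extragradient and random-reshuffling ingredients can be sketched on a small finite-sum saddle problem. This is a generic illustration rather than the authors' Wasserstein DRSL method: the quadratic components, the step size, and all names are assumptions, and the variance-reduction part mentioned in the abstract is omitted.

```python
import random


def make_components(n, seed=0):
    """Hypothetical finite-sum problem f = (1/n) * sum_i f_i with
    f_i(x, y) = a_i/2 * x**2 + b_i * x * y - c_i/2 * y**2, a_i, c_i in [1, 2], b_i in [-1, 1].
    Every f_i is strongly convex in x and strongly concave in y, with saddle point (0, 0)."""
    rng = random.Random(seed)
    return [(rng.uniform(1, 2), rng.uniform(-1, 1), rng.uniform(1, 2)) for _ in range(n)]


def operator(comp, x, y):
    """Saddle operator (df_i/dx, -df_i/dy) of one component; moving against it
    is gradient descent in x and gradient ascent in y."""
    a, b, c = comp
    return a * x + b * y, c * y - b * x


def seg_rr(components, x, y, eta=0.05, epochs=100, seed=1):
    """Stochastic extragradient with random reshuffling (no variance reduction)."""
    rng = random.Random(seed)
    order = list(range(len(components)))
    for _ in range(epochs):
        rng.shuffle(order)  # each epoch visits every component once, in a fresh order
        for i in order:
            gx, gy = operator(components[i], x, y)
            x_half, y_half = x - eta * gx, y - eta * gy  # extrapolation step
            gx, gy = operator(components[i], x_half, y_half)  # same component, new point
            x, y = x - eta * gx, y - eta * gy  # update taken from the original point
    return x, y


comps = make_components(20)
x_out, y_out = seg_rr(comps, 1.0, -1.0)
print(x_out, y_out)
```

Reusing the same component for the extrapolation and the update is one common variant of stochastic extragradient; reshuffling means every component is used exactly once per epoch, i.e., sampling without replacement.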
We propose an efficient algorithm for finding first-order Nash equilibria in min-max problems of the form $\min_{x \in X}\max_{y \in Y} F(x,y)$, where the objective function is smooth in both variables and concave with respect to $y$; the sets $X$ and $Y$ are convex and projection-friendly, and $Y$ is compact. Our goal is to find an $(\varepsilon_x,\varepsilon_y)$-first-order Nash equilibrium with respect to a stationarity criterion that is stronger than the commonly used proximal gradient norm. The proposed approach is fairly simple: we perform approximate proximal-point iterations on the primal function, with an inexact oracle provided by Nesterov's algorithm run on the regularized function $F(x_t,\cdot)$, $x_t$ being the current primal iterate. The resulting iteration complexity is $O(\varepsilon_x^{-2}\varepsilon_y^{-1/2})$ up to a logarithmic factor. As a byproduct, the choice $\varepsilon_y = O(\varepsilon_x^{2})$ yields an $O(\varepsilon_x^{-3})$ complexity for finding an $\varepsilon_x$-stationary point of the standard Moreau envelope of the primal function. Moreover, when the objective is strongly concave with respect to $y$, the complexity estimate for our algorithm improves to $O(\varepsilon_x^{-2}\kappa_y^{1/2})$ up to a logarithmic factor, where $\kappa_y$ is the condition number appropriately adjusted for coupling. In both scenarios, the complexity estimates are the best known so far, and previously they were known only for the (weaker) proximal gradient norm criterion. Meanwhile, our approach is user-friendly: (i) the algorithm is built upon running a variant of Nesterov's accelerated algorithm as a subroutine and avoids extragradient steps; (ii) the convergence analysis recycles well-known results on accelerated methods with inexact oracles. Finally, we extend the approach to non-Euclidean proximal geometries.
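The inexact-oracle idea can be sketched as follows: Nesterov's accelerated gradient method approximately maximizes $F(x_t, \cdot)$, and by Danskin's theorem $\nabla_x F(x_t, y)$ at the approximate maximizer approximates the gradient of the primal function $\Phi(x) = \max_y F(x, y)$. This is a simplification under stated assumptions, not the authors' algorithm: the toy objective is strongly concave in an unconstrained $y \in \mathbb{R}^3$ (the abstract assumes a compact, projection-friendly $Y$), so no extra regularization in $y$ is added, and plain gradient steps on $x$ replace the approximate proximal-point iterations. The coefficients `B`, `C` and all function names are hypothetical.

```python
import math

# Hypothetical toy problem: F(x, y) = 3*sin(x)**2 + x * <B, y> - sum_i C[i] * y[i]**2 / 2.
# F is nonconvex in x and strongly concave in y (modulus min(C), smoothness max(C)).
B = [1.0, 1.0, 1.0]
C = [1.0, 2.0, 4.0]


def grad_x(x, y):
    return 3.0 * math.sin(2.0 * x) + sum(b * yi for b, yi in zip(B, y))


def grad_y(x, y):
    return [b * x - c * yi for b, c, yi in zip(B, C, y)]


def primal(x):
    """Phi(x) = max_y F(x, y), attained at y_i = B[i] * x / C[i]."""
    return 3.0 * math.sin(x) ** 2 + x * x * sum(b * b / (2.0 * c) for b, c in zip(B, C))


def nesterov_ascent(x, y0, iters=30):
    """Nesterov's accelerated gradient ascent on the strongly concave map y -> F(x, y),
    with step 1/L and constant momentum (sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu))."""
    L, mu = max(C), min(C)
    beta = (math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))
    y_prev, w = list(y0), list(y0)
    for _ in range(iters):
        g = grad_y(x, w)
        y_new = [wi + gi / L for wi, gi in zip(w, g)]                 # gradient step at w
        w = [yn + beta * (yn - yp) for yn, yp in zip(y_new, y_prev)]  # momentum extrapolation
        y_prev = y_new
    return y_prev


def inexact_primal_descent(x, outer=200, step=0.1):
    y = [0.0] * len(B)
    for _ in range(outer):
        y = nesterov_ascent(x, y)   # warm-started approximate maximizer of F(x, .)
        x -= step * grad_x(x, y)    # Danskin: grad Phi(x) = dF/dx evaluated at y*(x)
    return x, y


x_out, y_out = inexact_primal_descent(1.0)
print(primal(x_out), y_out)
```

Warm-starting the inner solver from the previous $y$ keeps the oracle cheap, because consecutive primal iterates, and hence their maximizers, are close.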
Matrix completion has attracted much interest in the past decade in machine learning and computer vision. For low-rank promotion in matrix completion, the nuclear norm penalty is convenient due to its convexity but suffers from a bias problem. Recently, various algorithms using nonconvex penalties have been proposed, among which the proximal gradient descent (PGD) algorithm is one of the most efficient and effective. For the nonconvex PGD algorithm, whether it converges to a local minimizer, and at what rate, is still unclear. This work provides a nontrivial analysis of the PGD algorithm in the nonconvex case. Besides convergence to a stationary point for a generalized nonconvex penalty, we provide a deeper analysis of a popular and important class of nonconvex penalties that have discontinuous thresholding functions. For such penalties, we establish finite rank convergence, convergence to a restricted strictly local minimizer, and an eventually linear convergence rate of the PGD algorithm. Meanwhile, convergence to a local minimizer is proved for the hard-thresholding penalty. Our result is the first to show that nonconvex regularized matrix completion has only restricted strictly local minimizers, and that the PGD algorithm can converge to such minimizers with an eventually linear rate under certain conditions. Experiments illustrating the PGD algorithm are also provided. Code is available at https://github.com/FWen/nmc.
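A minimal sketch of PGD for matrix completion with the rank penalty (the $\ell_0$ norm of the singular values), $\frac{1}{2}\lVert P_\Omega(X - M)\rVert_F^2 + \lambda\,\mathrm{rank}(X)$, whose proximal operator hard-thresholds the singular values at $\sqrt{2\eta\lambda}$. It is not the authors' released code; the toy rank-1 data, $\lambda$, and all function names are assumptions, and a small pure-Python Jacobi eigensolver stands in for a library SVD so the example has no dependencies. With step size $\eta = 1$ (the data term's gradient is 1-Lipschitz), the objective recorded in `history` is non-increasing.

```python
import math
import random


def matmul(A, B):
    """Product of nested-list matrices A (m x k) and B (k x n)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def jacobi_eigh(S, max_sweeps=50):
    """Cyclic Jacobi eigen-decomposition of a symmetric matrix S.
    Returns (eigenvalues, V) with the eigenvectors in the columns of V."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        total = sum(a * a for row in A for a in row)
        off = sum(A[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off <= 1e-24 * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q:  A <- A J
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q:  A <- J^T A
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors:  V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def hard_threshold(Z, tau):
    """Prox of the rank penalty: keep singular components of Z with sigma > tau.
    Because Z v_i = sigma_i u_i, the kept part equals Z V_k V_k^T."""
    evals, V = jacobi_eigh(matmul(transpose(Z), Z))  # eigenvalues of Z^T Z are sigma_i**2
    keep = [j for j, ev in enumerate(evals) if ev > tau * tau]
    if not keep:
        return [[0.0] * len(Z[0]) for _ in Z], 0
    Vk = [[row[j] for j in keep] for row in V]
    return matmul(matmul(Z, Vk), transpose(Vk)), len(keep)


def objective(X, M, mask, lam, rank):
    """0.5 * ||P_Omega(X - M)||_F^2 + lam * rank(X)."""
    loss = sum(w * (x - m) ** 2
               for Xr, Mr, Wr in zip(X, M, mask) for x, m, w in zip(Xr, Mr, Wr))
    return 0.5 * loss + lam * rank


def pgd_complete(M, mask, lam, eta=1.0, iters=100):
    """PGD on the objective above. The data term has a 1-Lipschitz gradient,
    so a step eta <= 1 keeps the objective non-increasing."""
    tau = math.sqrt(2.0 * eta * lam)  # prox of eta*lam*rank thresholds at sqrt(2*eta*lam)
    X = [[0.0] * len(M[0]) for _ in M]
    history = [objective(X, M, mask, lam, 0)]
    for _ in range(iters):
        # Gradient step on the observed entries only: X - eta * P_Omega(X - M).
        Z = [[x - eta * w * (x - m) for x, m, w in zip(Xr, Mr, Wr)]
             for Xr, Mr, Wr in zip(X, M, mask)]
        X, rank = hard_threshold(Z, tau)
        history.append(objective(X, M, mask, lam, rank))
    return X, history


rng = random.Random(0)
u = [rng.uniform(1, 2) for _ in range(6)]
v = [rng.uniform(1, 2) for _ in range(5)]
M = [[ui * vj for vj in v] for ui in u]                          # rank-1 ground truth
mask = [[1 if rng.random() < 0.7 else 0 for _ in v] for _ in u]  # about 70% observed
X_hat, history = pgd_complete(M, mask, lam=1.0)
print(history[0], history[-1])
```

The hard-thresholding prox is discontinuous in $Z$ (a singular value just below the threshold is dropped entirely), which is the kind of penalty whose finite-rank and eventually linear convergence the abstract analyzes.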
This paper studies the complexity of finding approximate stationary points of nonconvex-strongly-concave (NC-SC) smooth minimax problems, in both general and averaged smooth finite-sum settings. We establish nontrivial lower complexity bounds of $\Omega(\sqrt{\kappa}\Delta L\epsilon^{-2})$ and $\Omega(n+\sqrt{n\kappa}\Delta L\epsilon^{-2})$ for the two settings, respectively, where $\kappa$ is the condition number, $L$ is the smoothness constant, and $\Delta$ is the initial gap. Our results reveal substantial gaps between these limits and the best-known upper bounds in the literature. To close these gaps, we introduce a generic acceleration scheme that deploys existing gradient-based methods to solve a sequence of crafted strongly-convex-strongly-concave subproblems. In the general setting, the complexity of our proposed algorithm nearly matches the lower bound; in particular, it removes an additional poly-logarithmic dependence on accuracy present in previous works. In the averaged smooth finite-sum setting, our proposed algorithm improves over previous algorithms by providing a nearly tight dependence on the condition number.
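One standard way to craft such subproblems, stated here as an assumption because the abstract does not spell out the construction, is to add a proximal term in $x$ around the current outer iterate $x_t$, with $\tau$ a hypothetical regularization weight and $\mu$ the strong-concavity modulus:

```latex
f_t(x, y) = f(x, y) + \frac{\tau}{2}\lVert x - x_t \rVert^2,
\qquad
\tau > L \;\Longrightarrow\;
\begin{cases}
x \mapsto f_t(x, y) & \text{is } (\tau - L)\text{-strongly convex},\\
y \mapsto f_t(x, y) & \text{is } \mu\text{-strongly concave}.
\end{cases}
```

The first line holds because an $L$-smooth $f(\cdot, y)$ makes $f(\cdot, y) + \frac{L}{2}\lVert \cdot \rVert^2$ convex; the second because the added term does not depend on $y$. An existing gradient-based SC-SC solver can then return an approximate saddle point of $f_t$, whose $x$-component serves as the next outer iterate.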
