Cubic Regularized Newton Method for Saddle Point Models: a Global and Local Convergence Analysis

65 0 0.0 ( 0 )

Download Cite

Added by Kevin Huang

Publication date 2020

fields

and research's language is English

Authors Kevin Huang - Junyu Zhang - Shuzhong Zhang

Optimization and Control

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In this paper, we propose a cubic regularized Newton (CRN) method for solving convex-concave saddle point problems (SPP). At each iteration, a cubic regularized saddle point subproblem is constructed and solved, which provides a search direction for the iterate. With properly chosen stepsizes, the method is shown to converge to the saddle point with global linear and local superlinear convergence rates, if the saddle point function is gradient Lipschitz and strongly-convex-strongly-concave. In the case that the function is merely convex-concave, we propose a homotopy continuation (or path-following) method. Under a Lipschitz-type error bound condition, we present an iteration complexity bound of $mathcal{O}left(ln left(1/epsilonright)right)$ to reach an $epsilon$-solution through a homotopy continuation approach, and the iteration complexity bound becomes $mathcal{O}left(left(1/epsilonright)^{frac{1-theta}{theta^2}}right)$ under a H{o}lderian-type error bound condition involving a parameter $theta$ ($0<theta<1$).

rate research

A Decentralized Proximal Point-type Method for Saddle Point Problems

104 - Weijie Liu , Aryan Mokhtari , Asuman Ozdaglar 2019

In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network. Specifically, we assume that each node has access to a summand of a global objective function and nodes are allowed to exchange information only with their neighboring nodes. We propose a decentralized variant of the proximal point method for solving this problem. We show that when the objective function is $rho$-weakly convex-weakly concave the iterates converge to approximate stationarity with a rate of $mathcal{O}(1/sqrt{T})$ where the approximation error depends linearly on $sqrt{rho}$. We further show that when the objective function satisfies the Minty VI condition (which generalizes the convex-concave case) we obtain convergence to stationarity with a rate of $mathcal{O}(1/sqrt{T})$. To the best of our knowledge, our proposed method is the first decentralized algorithm with theoretical guarantees for solving a non-convex non-concave decentralized saddle point problem. Our numerical results for training a general adversarial network (GAN) in a decentralized manner match our theoretical guarantees.

Optimization and Control Machine Learning Machine Learning

Subgradient methods near active manifolds: saddle point avoidance, local convergence, and asymptotic normality

127 - Damek Davis , Dmitriy Drusvyatskiy , Liwei Jiang 2021

Nonsmooth optimization problems arising in practice tend to exhibit beneficial smooth substructure: their domains stratify into active manifolds of smooth variation, which common proximal algorithms identify in finite time. Identification then entails a transition to smooth dynamics, and accommodates second-order acceleration techniques. While identification is clearly useful algorithmically, empirical evidence suggests that even those algorithms that do not identify the active manifold in finite time -- notably the subgradient method -- are nonetheless affected by it. This work seeks to explain this phenomenon, asking: how do active manifolds impact the subgradient method in nonsmooth optimization? In this work, we answer this question by introducing two algorithmically useful properties -- aiming and subgradient approximation -- that fully expose the smooth substructure of the problem. We show that these properties imply that the shadow of the (stochastic) subgradient method along the active manifold is precisely an inexact Riemannian gradient method with an implicit retraction. We prove that these properties hold for a wide class of problems, including cone reducible/decomposable functions and generic semialgebraic problems. Moreover, we develop a thorough calculus, proving such properties are preserved under smooth deformations and spectral lifts. This viewpoint then leads to several algorithmic consequences that parallel results in smooth optimization, despite the nonsmoothness of the problem: local rates of convergence, asymptotic normality, and saddle point avoidance. The asymptotic normality results appear to be new even in the most classical setting of stochastic nonlinear programming. The results culminate in the following observation: the perturbed subgradient method on generic, Clarke regular semialgebraic problems, converges only to local minimizers.

Optimization and Control Machine Learning

Convergence Analysis of the Approximate Newton Method for Markov Decision Processes

352 - Thomas Furmston , Guy Lever 2013

Recently two approximate Newton methods were proposed for the optimisation of Markov Decision Processes. While these methods were shown to have desirable properties, such as a guarantee that the preconditioner is negative-semidefinite when the policy is $log$-concave with respect to the policy parameters, and were demonstrated to have strong empirical performance in challenging domains, such as the game of Tetris, no convergence analysis was provided. The purpose of this paper is to provide such an analysis. We start by providing a detailed analysis of the Hessian of a Markov Decision Process, which is formed of a negative-semidefinite component, a positive-semidefinite component and a remainder term. The first part of our analysis details how the negative-semidefinite and positive-semidefinite components relate to each other, and how these two terms contribute to the Hessian. The next part of our analysis shows that under certain conditions, relating to the richness of the policy class, the remainder term in the Hessian vanishes in the vicinity of a local optimum. Finally, we bound the behaviour of this remainder term in terms of the mixing time of the Markov chain induced by the policy parameters, where this part of the analysis is applicable over the entire parameter space. Given this analysis of the Hessian we then provide our local convergence analysis of the approximate Newton framework.

Optimization and Control

A Regularized Semi-Smooth Newton Method With Projection Steps for Composite Convex Programs

157 - Xiantao Xiao , Yongfeng Li , Zaiwen Wen 2016

The goal of this paper is to study approaches to bridge the gap between first-order and second-order type methods for composite convex programs. Our key observations are: i) Many well-known operator splitting methods, such as forward-backward splitting (FBS) and Douglas-Rachford splitting (DRS), actually define a fixed-point mapping; ii) The optimal solutions of the composite convex program and the solutions of a system of nonlinear equations derived from the fixed-point mapping are equivalent. Solving this kind of system of nonlinear equations enables us to develop second-order type methods. Although these nonlinear equations may be non-differentiable, they are often semi-smooth and their generalized Jacobian matrix is positive semidefinite due to monotonicity. By combining with a regularization approach and a known hyperplane projection technique, we propose an adaptive semi-smooth Newton method and establish its convergence to global optimality. Preliminary numerical results on $ell_1$-minimization problems demonstrate that our second-order type algorithms are able to achieve superlinear or quadratic convergence.

Optimization and Control

Generalization Bounds for Stochastic Saddle Point Problems

67 - Junyu Zhang , Mingyi Hong , Mengdi Wang 2020

This paper studies the generalization bounds for the empirical saddle point (ESP) solution to stochastic saddle point (SSP) problems. For SSP with Lipschitz continuous and strongly convex-strongly concave objective functions, we establish an $mathcal{O}(1/n)$ generalization bound by using a uniform stability argument. We also provide generalization bounds under a variety of assumptions, including the cases without strong convexity and without bounded domains. We illustrate our results in two examples: batch policy learning in Markov decision process, and mixed strategy Nash equilibrium estimation for stochastic games. In each of these examples, we show that a regularized ESP solution enjoys a near-optimal sample complexity. To the best of our knowledge, this is the first set of results on the generalization theory of ESP.

Optimization and Control

comments

Fetching comments

Al-Etihad University

Additional details More universities

Cubic Regularized Newton Method for Saddle Point Models: a Global and Local Convergence Analysis

Ask ChatGPT about the research

No Arabic abstract

Read More