Quadratic Convergence of Smoothing Newtons Method for 0/1 Loss Optimization

112 0 0.0 ( 0 )

Download Cite

Added by Shenglong Zhou

Publication date 2021

fields

and research's language is English

Authors Shenglong Zhou - Lili Pan - Naihua Xiu

Optimization and Control

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

It has been widely recognized that the 0/1 loss function is one of the most natural choices for modelling classification errors, and it has a wide range of applications including support vector machines and 1-bit compressed sensing. Due to the combinatorial nature of the 0/1 loss function, methods based on convex relaxations or smoothing approximations have dominated the existing research and are often able to provide approximate solutions of good quality. However, those methods are not optimizing the 0/1 loss function directly and hence no optimality has been established for the original problem. This paper aims to study the optimality conditions of the 0/1 function minimization, and for the first time to develop Newtons method that directly optimizes the 0/1 function with a local quadratic convergence under reasonable conditions. Extensive numerical experiments demonstrate its superior performance as one would expect from Newton-type methods.ions. Extensive numerical experiments demonstrate its superior performance as one would expect from Newton-type methods.

rate research

New Q-Newtons method meets Backtracking line search: good convergence guarantee, saddle points avoidance, quadratic rate of convergence, and easy implementation

112 - Tuyen Trung Truong 2021

In a recent joint work, the author has developed a modification of Newtons method, named New Q-Newtons method, which can avoid saddle points and has quadratic rate of convergence. While good theoretical convergence guarantee has not been established for this method, experiments on small scale problems show that the method works very competitively against other well known modifications of Newtons method such as Adaptive Cubic Regularization and BFGS, as well as first order methods such as Unbounded Two-way Backtracking Gradient Descent. In this paper, we resolve the convergence guarantee issue by proposing a modification of New Q-Newtons method, named New Q-Newtons method Backtracking, which incorporates a more sophisticated use of hyperparameters and a Backtracking line search. This new method has very good theoretical guarantees, which for a {bf Morse function} yields the following (which is unknown for New Q-Newtons method): {bf Theorem.} Let $f:mathbb{R}^mrightarrow mathbb{R}$ be a Morse function, that is all its critical points have invertible Hessian. Then for a sequence ${x_n}$ constructed by New Q-Newtons method Backtracking from a random initial point $x_0$, we have the following two alternatives: i) $lim_{nrightarrowinfty}||x_n||=infty$, or ii) ${x_n}$ converges to a point $x_{infty}$ which is a {bf local minimum} of $f$, and the rate of convergence is {bf quadratic}. Moreover, if $f$ has compact sublevels, then only case ii) happens. As far as we know, for Morse functions, this is the best theoretical guarantee for iterative optimization algorithms so far in the literature. We have tested in experiments on small scale, with some further simplifie

Optimization and Control Machine Learning Numerical Analysis

Newtons Method for M-Tensor Equations

300 - Dong-Hui Li Jie-Feng Xu , Hong-Bo Guan 2021

We are concerned with the tensor equations whose coefficient tensor is an M-tensor. We first propose a Newton method for solving the equation with a positive constant term and establish its global and quadratic convergence. Then we extend the method to solve the equation with a nonnegative constant term and establish its convergence. At last, we do numerical experiments to test the proposed methods. The results show that the proposed method is quite efficient.

Optimization and Control

Mini-batch stochastic Nesterovs smoothing method for constrained convex stochastic composite optimization

246 - Ruyu Wang , Chao Zhang , Lichun Wang 2021

This paper considers a class of constrained convex stochastic composite optimization problems whose objective function is given by the summation of a differentiable convex component, together with a nonsmooth but convex component. The nonsmooth component has an explicit max structure that may not easy to compute its proximal mapping. In order to solve these problems, we propose a mini-batch stochastic Nesterovs smoothing (MSNS) method. Convergence and the optimal iteration complexity of the method are established. Numerical results are provided to illustrate the efficiency of the proposed MSNS method for a support vector machine (SVM) model.

Optimization and Control

Convergence Thresholds of Newtons Method for Monotone Polynomial Equations

465 - Javier Esparza , Stefan Kiefer , Michael Luttenberger 2008

Monotone systems of polynomial equations (MSPEs) are systems of fixed-point equations $X_1 = f_1(X_1, ..., X_n),$ $..., X_n = f_n(X_1, ..., X_n)$ where each $f_i$ is a polynomial with positive real coefficients. The question of computing the least non-negative solution of a given MSPE $vec X = vec f(vec X)$ arises naturally in the analysis of stochastic models such as stochastic context-free grammars, probabilistic pushdown automata, and back-button processes. Etessami and Yannakakis have recently adapted Newtons iterative method to MSPEs. In a previous paper we have proved the existence of a threshold $k_{vec f}$ for strongly connected MSPEs, such that after $k_{vec f}$ iterations of Newtons method each new iteration computes at least 1 new bit of the solution. However, the proof was purely existential. In this paper we give an upper bound for $k_{vec f}$ as a function of the minimal component of the least fixed-point $muvec f$ of $vec f(vec X)$. Using this result we show that $k_{vec f}$ is at most single exponential resp. linear for strongly connected MSPEs derived from probabilistic pushdown automata resp. from back-button processes. Further, we prove the existence of a threshold for arbitrary MSPEs after which each new iteration computes at least $1/w2^h$ new bits of the solution, where $w$ and $h$ are the width and height of the DAG of strongly connected components.

Data Structures and Algorithms Information Theory Numerical Analysis

Adaptive Newton Sketch: Linear-time Optimization with Quadratic Convergence and Effective Hessian Dimensionality

53 - Jonathan Lacotte , Yifei Wang , Mert Pilanci 2021

We propose a randomized algorithm with quadratic convergence rate for convex optimization problems with a self-concordant, composite, strongly convex objective function. Our method is based on performing an approximate Newton step using a random projection of the Hessian. Our first contribution is to show that, at each iteration, the embedding dimension (or sketch size) can be as small as the effective dimension of the Hessian matrix. Leveraging this novel fundamental result, we design an algorithm with a sketch size proportional to the effective dimension and which exhibits a quadratic rate of convergence. This result dramatically improves on the classical linear-quadratic convergence rates of state-of-the-art sub-sampled Newton methods. However, in most practical cases, the effective dimension is not known beforehand, and this raises the question of how to pick a sketch size as small as the effective dimension while preserving a quadratic convergence rate. Our second and main contribution is thus to propose an adaptive sketch size algorithm with quadratic convergence rate and which does not require prior knowledge or estimation of the effective dimension: at each iteration, it starts with a small sketch size, and increases it until quadratic progress is achieved. Importantly, we show that the embedding dimension remains proportional to the effective dimension throughout the entire path and that our method achieves state-of-the-art computational complexity for solving convex optimization programs with a strongly convex component.

Optimization and Control Machine Learning