A proximal MM method for the zero-norm regularized PLQ composite optimization problem

Added by Dongdong Zhang

Publication date 2020

Language English

Authors Dongdong Zhang - Shaohua Pan - Shujun Bi

Fields Optimization and Control

This paper is concerned with a class of zero-norm regularized piecewise linear-quadratic (PLQ) composite minimization problems, which covers the zero-norm regularized $\ell_1$-loss minimization problem as a special case. For this class of nonconvex nonsmooth problems, we show that its equivalent MPEC reformulation is partially calm on the set of global optima and make use of this property to derive a family of equivalent DC surrogates. Then, we propose a proximal majorization-minimization (MM) method, a convex relaxation approach not in the DC algorithm framework, for solving one of the DC surrogates, which is a semiconvex PLQ minimization problem involving three nonsmooth terms. For this method, we establish its global convergence and linear rate of convergence, and under suitable conditions show that the limit of the generated sequence is not only a local optimum but also a good critical point in a statistical sense. Numerical experiments are conducted with synthetic and real data for the proximal MM method, with the subproblems solved by a dual semismooth Newton method, to confirm our theoretical findings. Numerical comparisons with a convergent indefinite-proximal ADMM for the partially smoothed DC surrogate verify its superiority in the quality of solutions and computing time.
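
To make the MPEC reformulation mentioned in the abstract concrete, the special case of zero-norm regularized $\ell_1$-loss minimization can be written out with one standard variational characterization of the zero-norm. The symbols $A$, $b$, $n$, $p$, $\nu$, $w$ and the all-ones vector $e$ are assumptions introduced here for illustration, as is the $1/n$ scaling; the paper itself may use a more general family of characterizations.

$$
\min_{x \in \mathbb{R}^p} \; \frac{1}{n}\|Ax - b\|_1 + \nu \|x\|_0,
\qquad
\|x\|_0 = \min_{w \in [0,1]^p} \Big\{ \langle e, w \rangle \;:\; \langle e - w, |x| \rangle = 0 \Big\}.
$$

Because $e - w \ge 0$ and $|x| \ge 0$, the constraint forces $w_i = 1$ whenever $x_i \ne 0$, while $w_i = 0$ is optimal whenever $x_i = 0$, so the minimum counts the nonzero entries of $x$. Substituting this expression into the objective gives a problem in $(x, w)$ with a complementarity-type constraint, which is the sense in which the reformulation is an MPEC.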

Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization

Tianyi Chen, Tianyu Ding, Bo Ji 2020

Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method -- Orthant Based Proximal Stochastic Gradient Method (OBProx-SG) -- to solve perhaps the most popular instance, i.e., the $\ell_1$-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to state-of-the-art methods, e.g., Prox-SG, RDA and Prox-SVRG, OBProx-SG not only converges to globally optimal solutions (in the convex scenario) or stationary points (in the non-convex scenario), but also substantially promotes the sparsity of the solutions. In particular, on a large number of convex problems, OBProx-SG outperforms the existing methods in both sparsity exploration and objective values. Moreover, experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority by achieving solutions of much higher sparsity without sacrificing generalization accuracy.
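
The two steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the fixed step size, and the exact form of the orthant step (a gradient step on the smooth surrogate $f(x) + \lambda\,\mathrm{sign}(x)^\top x$, followed by zeroing any coordinate that leaves its current orthant) are assumptions made here.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.|, applied componentwise."""
    return [max(abs(vi) - tau, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def prox_sg_step(x, grad, step, lam):
    """Proximal (stochastic) gradient step for f(x) + lam * ||x||_1.

    `grad` is a (stochastic) gradient estimate of the smooth part f at x.
    """
    forward = [xi - step * gi for xi, gi in zip(x, grad)]
    return soft_threshold(forward, step * lam)


def orthant_step(x, grad, step, lam):
    """Assumed form of an orthant step.

    Zero coordinates stay zero. Nonzero coordinates take a gradient step
    on f(x) + lam * sign(x_i) * x_i and are set to zero if the trial
    point crosses into a different orthant (projection onto the face).
    """
    out = []
    for xi, gi in zip(x, grad):
        if xi == 0.0:
            out.append(0.0)
            continue
        s = 1.0 if xi > 0 else -1.0
        trial = xi - step * (gi + lam * s)
        out.append(trial if trial * s > 0 else 0.0)
    return out
```

In this sketch the proximal step can both create and remove nonzeros, whereas the orthant step can only remove them, which is one way to read the "aggressively enhance the sparsity level" description.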

Optimization and Control Machine Learning

A proximal method for composite minimization

A.S. Lewis, S.J. Wright 2015

We consider minimization of functions that are compositions of convex or prox-regular functions (possibly extended-valued) with smooth vector functions. A wide variety of important optimization problems fall into this framework. We describe an algorithmic framework based on a subproblem constructed from a linearized approximation to the objective and a regularization term. Properties of local solutions of this subproblem underlie both a global convergence result and an identification property of the active manifold containing the solution of the original problem. Preliminary computational results on both convex and nonconvex examples are promising.
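
A one-dimensional sketch may help illustrate the kind of subproblem described above. It minimizes $h(c(x))$ with $h = |\cdot|$ by repeatedly solving $\min_d |c(x) + c'(x)d| + \frac{\mu}{2} d^2$, which has a closed-form solution via soft-thresholding. The choice of $h$, the test function $c(x) = x^2 - 1$, the fixed $\mu$, and all function names are assumptions for illustration; the paper's framework is more general and adjusts the regularization adaptively.

```python
def shrink(v, tau):
    """Scalar soft-thresholding: argmin_t |t| + (t - v)^2 / (2 * tau)."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def prox_linear_1d(c, dc, x0, mu=1.0, iters=20):
    """Minimize |c(x)| in one variable with linearized, regularized steps.

    Each step solves min_d |c(x) + dc(x) * d| + (mu / 2) * d^2 exactly.
    With t = c(x) + dc(x) * d, the solution is t = shrink(c(x), dc(x)^2 / mu).
    """
    x = x0
    for _ in range(iters):
        cx, a = c(x), dc(x)
        if a == 0.0:
            break  # the subproblem then returns d = 0
        t = shrink(cx, a * a / mu)
        x += (t - cx) / a
    return x


# Hypothetical example: c(x) = x^2 - 1, started from x0 = 2.
root = prox_linear_1d(lambda x: x * x - 1.0, lambda x: 2.0 * x, x0=2.0)
```

Whenever $|c(x)| \le c'(x)^2/\mu$ the step coincides with a Newton step on $c(x) = 0$, so in this example the iterates approach $x = 1$ quickly.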

Optimization and Control Numerical Analysis

A proximal dual semismooth Newton method for computing zero-norm penalized QR estimator

Dongdong Zhang, Shaohua Pan, Shujun Bi 2019

This paper is concerned with the computation of the high-dimensional zero-norm penalized quantile regression estimator, defined as a global minimizer of the zero-norm penalized check loss function. To seek a desirable approximation to the estimator, we reformulate this NP-hard problem as an equivalent augmented Lipschitz optimization problem, and exploit its coupled structure to propose a multi-stage convex relaxation approach (MSCRA_PPA), each step of which inexactly solves a weighted $\ell_1$-regularized check loss minimization problem with a proximal dual semismooth Newton method. Under a restricted strong convexity condition, we provide a theoretical guarantee for MSCRA_PPA by establishing the error bound of each iterate to the true estimator and the rate of linear convergence in a statistical sense. Numerical comparisons on some synthetic and real data show that MSCRA_PPA not only has comparable or even better estimation performance, but also requires much less CPU time.
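
For readers unfamiliar with the objective, the zero-norm penalized check loss can be evaluated as in the sketch below. The $1/n$ scaling, the parameter names `tau` and `lam`, and the function names are assumptions for illustration; this only evaluates the objective and does not implement MSCRA_PPA.

```python
def check_loss(residuals, tau):
    """Sum of quantile check losses rho_tau(r) = max(tau * r, (tau - 1) * r)."""
    return sum(max(tau * r, (tau - 1.0) * r) for r in residuals)


def zero_norm(beta):
    """Number of nonzero entries of beta."""
    return sum(1 for b in beta if b != 0.0)


def penalized_check_objective(X, y, beta, tau, lam):
    """(1/n) * sum_i rho_tau(y_i - x_i^T beta) + lam * ||beta||_0.

    X is a list of rows; the 1/n scaling is an assumption of this sketch.
    """
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(row, beta))
        for row, yi in zip(X, y)
    ]
    return check_loss(residuals, tau) / len(y) + lam * zero_norm(beta)
```

With `tau = 0.5` the check loss is half the absolute loss, so median regression is recovered as a special case.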

Optimization and Control

Half-Space Proximal Stochastic Gradient Method for Group-Sparsity Regularized Problem

Tianyi Chen, Guanyi Wang, Tianyu Ding 2020

Optimizing with group sparsity is significant in enhancing model interpretability in machine learning applications, e.g., feature selection, compressed sensing and model compression. However, for large-scale stochastic training problems, effective group sparsity exploration is typically hard to achieve. In particular, state-of-the-art stochastic optimization algorithms usually generate merely dense solutions. To overcome this shortcoming, we propose a stochastic method -- the Half-space Stochastic Projected Gradient (HSPG) method -- to search for solutions of high group sparsity while maintaining convergence. Initialized by a simple Prox-SG Step, the HSPG method relies on a novel Half-Space Step to substantially boost the sparsity level. Numerically, HSPG demonstrates its superiority in deep neural networks, e.g., VGG16, ResNet18 and MobileNetV1, by computing solutions of higher group sparsity, competitive objective values and generalization accuracy.
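
The Prox-SG Step used for initialization can be sketched as below, assuming the regularizer is the common mixed norm $\lambda \sum_g \|x_g\|_2$ over disjoint index groups; its proximal operator is group (block) soft-thresholding. The function names and the group representation are assumptions, and the Half-Space Step itself is not shown because the abstract does not specify it.

```python
import math


def group_soft_threshold(x, groups, tau):
    """Proximal operator of tau * sum_g ||x_g||_2 for disjoint index groups.

    Each group is shrunk toward zero by tau in Euclidean norm and set
    exactly to zero when its norm is at most tau.
    """
    out = list(x)
    for g in groups:
        norm = math.sqrt(sum(x[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0.0 else 0.0
        for i in g:
            out[i] = scale * x[i]
    return out


def prox_sg_group_step(x, grad, step, lam, groups):
    """One Prox-SG step: gradient step on the loss, then the group prox."""
    forward = [xi - step * gi for xi, gi in zip(x, grad)]
    return group_soft_threshold(forward, groups, step * lam)
```

A whole group is zeroed only when the norm of the forward step falls below `step * lam`, which becomes unlikely for small step sizes and noisy gradients; this is consistent with the abstract's remark that existing stochastic methods tend to return dense solutions.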

Optimization and Control

A Homotopy Coordinate Descent Optimization Method for $l_0$-Norm Regularized Least Square Problem

Zhenzhen Sun, Yuanlong Yu 2020

This paper proposes a homotopy coordinate descent (HCD) method to solve the $l_0$-norm regularized least square ($l_0$-LS) problem for compressed sensing, which combines the homotopy technique with a variant of the coordinate descent method. Unlike classical coordinate descent algorithms, HCD provides three strategies to speed up convergence: warm start initialization, active set updating, and a strong rule for active set initialization. The active set is pre-selected using a strong rule; then the coordinates in the active set are updated while those in the inactive set are left unchanged. The homotopy strategy provides a set of warm start initial solutions for a sequence of decreasing values of the regularization factor, which ensures that all iterates along the homotopy solution path are sparse. Computational experiments on simulated signals and natural signals demonstrate the effectiveness of the proposed algorithm in accurately and efficiently reconstructing sparse solutions of the $l_0$-LS problem, whether or not the observation is noisy.
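
The combination of coordinate descent with warm-started homotopy can be sketched for $\min_x \frac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_0$ as follows. This is a plain cyclic variant under assumed names and an assumed objective scaling; it omits the strong rule and active set updating described in the abstract.

```python
def l0_coordinate_descent(A, b, lam, x0=None, sweeps=50):
    """Cyclic coordinate descent for 0.5 * ||Ax - b||^2 + lam * ||x||_0.

    A is a list of rows. Each coordinate is minimized exactly: the
    least-squares value z is kept only if it lowers the objective by
    more than lam, i.e. if 0.5 * ||a_j||^2 * z^2 > lam.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = list(x0) if x0 is not None else [0.0] * n_cols
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n_cols)) for i in range(n_rows)]
    for _ in range(sweeps):
        for j in range(n_cols):
            col = [A[i][j] for i in range(n_rows)]
            norm_sq = sum(c * c for c in col)
            if norm_sq == 0.0:
                continue
            # Least-squares value of x_j with the other coordinates fixed.
            z = x[j] + sum(c * ri for c, ri in zip(col, r)) / norm_sq
            new = z if 0.5 * norm_sq * z * z > lam else 0.0
            delta = new - x[j]
            if delta != 0.0:
                r = [ri - c * delta for ri, c in zip(r, col)]
                x[j] = new
    return x


def homotopy_path(A, b, lams):
    """Solve for a decreasing sequence of lam values with warm starts."""
    x, path = None, []
    for lam in lams:
        x = l0_coordinate_descent(A, b, lam, x0=x)
        path.append((lam, x))
    return path
```

Starting from a large `lam` yields the all-zero solution, and each subsequent, smaller `lam` is initialized from the previous sparse solution, which mirrors the warm start idea in the abstract.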

Machine Learning