A Homotopy Coordinate Descent Optimization Method for $l_0$-Norm Regularized Least Square Problem


Added by Zhenzhen Sun

Publication date: 2020

Field: Informatics Engineering

Research language: English

Authors: Zhenzhen Sun, Yuanlong Yu

Machine Learning


This paper proposes a homotopy coordinate descent (HCD) method to solve the $l_0$-norm regularized least square ($l_0$-LS) problem for compressed sensing. The method combines the homotopy technique with a variant of the coordinate descent method. Unlike classical coordinate descent algorithms, HCD provides three strategies to speed up convergence: warm start initialization, active set updating, and a strong rule for active set initialization. The active set is pre-selected using the strong rule; the coordinates in the active set are then updated while those in the inactive set are left unchanged. The homotopy strategy provides a set of warm start initial solutions for a sequence of decreasing values of the regularization factor, which ensures that all iterates along the homotopy solution path are sparse. Computational experiments on simulated signals and natural signals demonstrate that the proposed algorithm reconstructs sparse solutions of the $l_0$-LS problem accurately and efficiently, whether or not the observation is noisy.
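The ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the objective $\frac{1}{2}\|Ax-b\|_2^2 + \lambda\|x\|_0$, a geometric schedule for decreasing $\lambda$, and a simple correlation-based screening step that stands in for the paper's strong rule. The function name `hcd_l0` and the parameters `lam_decay` and `sweeps` are hypothetical.

```python
def hcd_l0(A, b, lam_target, lam_decay=0.5, sweeps=50, tol=1e-12):
    """Sketch: homotopy + coordinate descent for 0.5*||Ax-b||^2 + lam*||x||_0."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    sq = [sum(v * v for v in c) for c in cols]  # squared column norms

    def corr(j, r):
        return sum(c * ri for c, ri in zip(cols[j], r))

    x = [0.0] * n
    r = list(b)  # residual b - A x for x = 0
    # Assumed starting point: smallest lam for which x = 0 is coordinate-wise optimal.
    lam = max((corr(j, r) ** 2 / (2.0 * sq[j]) for j in range(n) if sq[j] > 0),
              default=lam_target)
    lam = max(lam, lam_target)
    while True:
        lam = max(lam * lam_decay, lam_target)  # homotopy step, x is the warm start
        # Simplified screening (assumption, not the paper's strong rule): keep the
        # current support plus zero coordinates whose update would be nonzero.
        active = [j for j in range(n) if sq[j] > 0 and
                  (x[j] != 0.0 or corr(j, r) ** 2 > 2.0 * lam * sq[j])]
        for _ in range(sweeps):
            changed = False
            for j in active:
                z = x[j] + corr(j, r) / sq[j]  # unpenalized 1-D minimizer
                # Hard threshold: keep z only if it lowers the objective by more than lam.
                new = z if 0.5 * sq[j] * z * z > lam else 0.0
                d = new - x[j]
                if abs(d) > tol:
                    for i in range(m):
                        r[i] -= d * cols[j][i]
                    x[j] = new
                    changed = True
            if not changed:
                break
        if lam <= lam_target:
            return x


x_hat = hcd_l0([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
               [3.0, 0.1, -2.0], lam_target=0.5)
```

With an identity measurement matrix the result reduces to hard thresholding of `b`, so the small entry `0.1` is set to zero while the two large entries are kept.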


Parallel Coordinate Descent for L1-Regularized Loss Minimization

Joseph K. Bradley, Aapo Kyrola, Danny Bickson (2011)

We propose Shotgun, a parallel coordinate descent algorithm for minimizing L1-regularized losses. Though coordinate descent seems inherently sequential, we prove convergence bounds for Shotgun which predict linear speedups, up to a problem-dependent limit. We present a comprehensive empirical study of Shotgun for Lasso and sparse logistic regression. Our theoretical predictions on the potential for parallelism closely match behavior on real data. Shotgun outperforms other published solvers on a range of large problems, proving to be one of the most scalable algorithms for L1.
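The idea of updating several coordinates from the same stale residual can be sketched as follows for the Lasso objective $\frac{1}{2}\|Ax-b\|_2^2 + \lambda\|x\|_1$. This is a sequential simulation under stated assumptions, not the Shotgun code: the function name `shotgun_lasso`, the parameters `parallel`, `rounds` and `seed`, and the uniform sampling of coordinates are all hypothetical choices made for illustration.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t*|.|, the usual Lasso shrinkage."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def shotgun_lasso(A, b, lam, parallel=2, rounds=200, seed=0):
    """Sketch: each round updates `parallel` coordinates from one shared residual."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    sq = [sum(v * v for v in c) for c in cols]  # squared column norms
    x = [0.0] * n
    r = list(b)  # residual b - A x
    for _ in range(rounds):
        chosen = rng.sample(range(n), min(parallel, n))
        deltas = {}
        # Simulated parallel phase: every delta is computed from the same residual.
        for j in chosen:
            if sq[j] == 0.0:
                continue
            z = x[j] + sum(c * ri for c, ri in zip(cols[j], r)) / sq[j]
            deltas[j] = soft_threshold(z, lam / sq[j]) - x[j]
        # Apply all updates together, then refresh the residual.
        for j, d in deltas.items():
            x[j] += d
            for i in range(m):
                r[i] -= d * cols[j][i]
    return x


x_hat = shotgun_lasso([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                      [3.0, 0.1, -2.0], lam=0.5)
```

When the columns of `A` are strongly correlated, simultaneous updates can interfere with each other, which is why the abstract speaks of a problem-dependent limit on the number of parallel updates.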

Machine Learning, Information Theory

A proximal MM method for the zero-norm regularized PLQ composite optimization problem

Dongdong Zhang, Shaohua Pan, Shujun Bi (2020)

This paper is concerned with a class of zero-norm regularized piecewise linear-quadratic (PLQ) composite minimization problems, which covers the zero-norm regularized $\ell_1$-loss minimization problem as a special case. For this class of nonconvex nonsmooth problems, we show that its equivalent MPEC reformulation is partially calm on the set of global optima and make use of this property to derive a family of equivalent DC surrogates. Then, we propose a proximal majorization-minimization (MM) method, a convex relaxation approach not in the DC algorithm framework, for solving one of the DC surrogates which is a semiconvex PLQ minimization problem involving three nonsmooth terms. For this method, we establish its global convergence and linear rate of convergence, and under suitable conditions show that the limit of the generated sequence is not only a local optimum but also a good critical point in a statistical sense. Numerical experiments are conducted with synthetic and real data for the proximal MM method with the subproblems solved by a dual semismooth Newton method to confirm our theoretical findings, and numerical comparisons with a convergent indefinite-proximal ADMM for the partially smoothed DC surrogate verify its superiority in the quality of solutions and computing time.

Optimization and Control

A Flexible Coordinate Descent Method

Kimon Fountoulakis, Rachael Tappenden (2015)

We present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature) information, so that the algorithm performance is more robust when applied to highly nonseparable or ill-conditioned problems. We call the method Flexible Coordinate Descent (FCD). At each iteration of FCD, a block of coordinates is sampled randomly, a quadratic model is formed about that block and the model is minimized approximately/inexactly to determine the search direction. An inexpensive line search is then employed to ensure a monotonic decrease in the objective function and acceptance of large step sizes. We present several high probability iteration complexity results to show that convergence of FCD is guaranteed theoretically. Finally, we present numerical results on large-scale problems to demonstrate the practical performance of the method.
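The iteration pattern described above (sample a block, build a quadratic model on it, take a direction, backtrack) can be sketched on a Lasso objective. This is a simplified illustration, not FCD itself: it assumes a diagonal curvature approximation, for which the block model has a closed-form minimizer, whereas the abstract allows richer curvature and inexact solves. The names `block_cd_line_search`, `block_size`, `sigma` and `seed` are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t*|.|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_objective(A, b, x, lam):
    res = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return 0.5 * sum(v * v for v in res) + lam * sum(abs(v) for v in x)


def block_cd_line_search(A, b, lam, block_size=2, iters=200, seed=0, sigma=1e-4):
    """Sketch: random block, diagonal quadratic model, backtracking line search."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    h = [sum(v * v for v in c) for c in cols]  # assumed diagonal curvature
    x = [0.0] * n
    for _ in range(iters):
        block = rng.sample(range(n), min(block_size, n))
        res = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
        d = {}
        for j in block:
            if h[j] == 0.0:
                continue
            g = sum(c * ri for c, ri in zip(cols[j], res))  # partial gradient
            # Minimizer of the separable quadratic model plus the l1 term.
            d[j] = soft_threshold(x[j] - g / h[j], lam / h[j]) - x[j]
        curv = sum(h[j] * dj * dj for j, dj in d.items())
        if curv == 0.0:
            continue  # zero direction on this block
        f_old = lasso_objective(A, b, x, lam)
        t = 1.0
        for _ in range(30):  # backtracking enforces a monotone decrease
            trial = list(x)
            for j, dj in d.items():
                trial[j] += t * dj
            if lasso_objective(A, b, trial, lam) <= f_old - sigma * t * curv:
                x = trial
                break
            t *= 0.5
    return x


x_hat = block_cd_line_search([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                             [3.0, 0.1, -2.0], lam=0.5)
```

The line search starts from a full step, so large steps are accepted whenever they pass the decrease test, and the step is halved only when the coordinates in the block interact strongly.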

Optimization and Control

A regularized weighted least gradient problem for conductivity imaging

Alexandru Tamasan, Alexander Timonov (2018)

We propose and study a regularization method for recovering an approximate electrical conductivity solely from the magnitude of one interior current density field. Without some minimal knowledge of the boundary voltage potential, the problem has been recently shown to have nonunique solutions, thus recovering the exact conductivity is impossible. The method is based on solving a weighted least gradient problem in the subspace of functions of bounded variations with square integrable traces. The computational effectiveness of this method is demonstrated in numerical experiments.

Analysis of PDEs

Beyond Gradient Descent for Regularized Segmentation Losses

Dmitrii Marin, Meng Tang, Ismail Ben Ayed (2018)

The simplicity of gradient descent (GD) made it the default method for training ever-deeper and complex neural networks. Both loss functions and architectures are often explicitly tuned to be amenable to this basic local optimization. In the context of weakly-supervised CNN segmentation, we demonstrate a well-motivated loss function where an alternative optimizer (ADM) achieves the state-of-the-art while GD performs poorly. Interestingly, GD obtains its best result for a smoother tuning of the loss function. The results are consistent across different network architectures. Our loss is motivated by well-understood MRF/CRF regularization models in shallow segmentation and their known global solvers. Our work suggests that network design/training should pay more attention to optimization methods.

Machine Learning
