
The Trimmed Lasso: Sparse Recovery Guarantees and Practical Optimization by the Generalized Soft-Min Penalty

Added by Tal Amir
Publication date: 2020
Language: English





We present a new approach to solve the sparse approximation or best subset selection problem, namely, to find a $k$-sparse vector ${\bf x}\in\mathbb{R}^d$ that minimizes the $\ell_2$ residual $\lVert A{\bf x}-{\bf y} \rVert_2$. We consider a regularized approach, whereby this residual is penalized by the non-convex $\textit{trimmed lasso}$, defined as the $\ell_1$-norm of ${\bf x}$ excluding its $k$ largest-magnitude entries. We prove that the trimmed lasso has several appealing theoretical properties, and in particular derive sparse recovery guarantees assuming successful optimization of the penalized objective. Next, we show empirically that directly optimizing this objective can be quite challenging. Instead, we propose a surrogate for the trimmed lasso, called the $\textit{generalized soft-min}$. This penalty smoothly interpolates between the classical lasso and the trimmed lasso, while taking into account all possible $k$-sparse patterns. The generalized soft-min penalty involves summation over $\binom{d}{k}$ terms, yet we derive a polynomial-time algorithm to compute it. This, in turn, yields a practical method for the original sparse approximation problem. Via simulations, we demonstrate its competitive performance compared to the current state of the art.
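To make the penalty concrete, here is a minimal NumPy sketch of the trimmed lasso and of a penalized objective of the form $\lVert A{\bf x}-{\bf y}\rVert_2 + \lambda\,\tau_k({\bf x})$. The weighting $\lambda$ and the unsquared residual are illustrative assumptions, not necessarily the exact objective analyzed in the paper.

```python
import numpy as np

def trimmed_lasso(x, k):
    """Trimmed lasso tau_k(x): the l1-norm of x excluding its k largest-magnitude entries."""
    mags = np.sort(np.abs(x))                # magnitudes in ascending order
    return mags[:max(len(x) - k, 0)].sum()   # sum all but the k largest

def penalized_objective(A, y, x, k, lam):
    """Illustrative objective ||Ax - y||_2 + lam * tau_k(x); the weighting lam is an assumption."""
    return np.linalg.norm(A @ x - y) + lam * trimmed_lasso(x, k)
```

Note that $\tau_k({\bf x})$ vanishes exactly when ${\bf x}$ has at most $k$ nonzero entries, which is what makes it a natural sparsity surrogate.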



Related research

We study high-dimensional estimators with the trimmed $\ell_1$ penalty, which leaves the $h$ largest parameter entries penalty-free. While optimization techniques for this nonconvex penalty have been studied, its statistical properties have not yet been analyzed. We present the first statistical analyses for $M$-estimation and characterize the support recovery, $\ell_\infty$ and $\ell_2$ error of the trimmed $\ell_1$ estimates as a function of the trimming parameter $h$. Our results show different regimes based on how $h$ compares to the true support size. Our second contribution is a new algorithm for the trimmed regularization problem, which has the same theoretical convergence rate as difference-of-convex (DC) algorithms, but in practice is faster and finds lower objective values. Empirical evaluation of $\ell_1$ trimming for sparse linear regression and graphical model estimation indicates that trimmed $\ell_1$ can outperform vanilla $\ell_1$ and non-convex alternatives. Our last contribution is to show that the trimmed penalty is beneficial beyond $M$-estimation, yielding promising results for two deep learning tasks: input structure recovery and network sparsification.
Akshay Mehra, Jihun Hamm (2019)
Bilevel optimization problems are at the center of several important machine learning problems such as hyperparameter tuning, data denoising, meta- and few-shot learning, and training-data poisoning. Unlike simultaneous or multi-objective optimization, the steepest descent direction for minimizing the upper-level cost requires the inverse of the Hessian of the lower-level cost. In this paper, we propose a new method for solving bilevel optimization problems using the classical penalty function approach, which avoids computing the inverse and can also handle additional constraints easily. We prove the convergence of the method under mild conditions and show that the exact hypergradient is obtained asymptotically. Our method's simplicity and small space and time complexities enable us to effectively solve large-scale bilevel problems involving deep neural networks. We present results on data denoising, few-shot learning, and training-data poisoning problems in a large-scale setting and show that our method outperforms or is comparable to previously proposed methods based on automatic differentiation and approximate inversion in terms of accuracy, run-time, and convergence speed.
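As a rough illustration of the penalty-function idea (a generic sketch under assumed notation, not necessarily the authors' exact formulation), the lower-level optimality condition can be replaced by a quadratic penalty on the lower-level gradient, so that no Hessian inverse appears:

```python
import numpy as np

def penalized_bilevel_objective(theta, w, upper_cost, lower_grad_w, gamma):
    """Generic penalty-based surrogate for a bilevel problem (illustrative only):

        f(theta, w) + (gamma / 2) * ||grad_w g(theta, w)||^2

    upper_cost(theta, w) evaluates the upper-level cost f; lower_grad_w(theta, w)
    returns the gradient of the lower-level cost g with respect to w.
    """
    g = np.asarray(lower_grad_w(theta, w))
    return upper_cost(theta, w) + 0.5 * gamma * float(g @ g)
```

Minimizing such a surrogate jointly over both variable blocks, while gradually increasing gamma, is the classical penalty-method recipe.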
Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications -- reflecting the need for classifiers and predictive models that are robust to the distribution shifts that arise from phenomena such as selection bias or nonstationarity. Existing algorithms for solving Wasserstein DRSL -- one of the most popular DRSL frameworks based around robustness to perturbations in the Wasserstein distance -- involve solving complex subproblems or fail to make use of stochastic gradients, limiting their use in large-scale machine learning problems. We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable stochastic extra-gradient algorithms which provably achieve faster convergence rates than existing approaches. We demonstrate their effectiveness on synthetic and real data when compared to existing DRSL approaches. Key to our results is the use of variance reduction and random reshuffling to accelerate stochastic min-max optimization, the analysis of which may be of independent interest.
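For readers unfamiliar with the extra-gradient family that this work builds on, below is a minimal deterministic extra-gradient update for a smooth min-max problem; the stochastic, variance-reduced, and reshuffled variants studied in the paper elaborate on this basic step (the callable names here are illustrative assumptions).

```python
import numpy as np

def extragradient_step(x, y, grad_x, grad_y, eta):
    """One extra-gradient update for min_x max_y f(x, y).

    grad_x(x, y) and grad_y(x, y) are user-supplied callables returning the
    partial gradients of f; eta is the step size.
    """
    # Extrapolation (look-ahead) step
    x_half = x - eta * grad_x(x, y)
    y_half = y + eta * grad_y(x, y)
    # Update using gradients evaluated at the extrapolated point
    x_new = x - eta * grad_x(x_half, y_half)
    y_new = y + eta * grad_y(x_half, y_half)
    return x_new, y_new
```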
Tianyi Chen, Bo Ji, Yixin Shi (2020)
The compression of deep neural networks (DNNs) to reduce inference cost is becoming increasingly important for meeting the realistic deployment requirements of various applications. There has been a significant amount of work on network compression, but most of it is heuristic and rule-based, or not easily incorporated into varying scenarios. On the other hand, sparse optimization, which naturally yields sparse solutions, fits the compression requirement; however, due to the limited study of sparse optimization in stochastic learning, its extension and application to model compression has rarely been well explored. In this work, we propose a model compression framework based on recent progress in sparse stochastic optimization. Compared to existing model compression techniques, our method is effective, requires less extra engineering effort to incorporate with varying applications, and has been numerically demonstrated on benchmark compression tasks. In particular, we achieve up to 7.2x and 2.9x FLOPs reduction, at the same level of evaluation accuracy, on VGG16 for CIFAR10 and ResNet50 for ImageNet, respectively, compared to the baseline heavy models.
Alexander Jung (2019)
Many applications generate data with an intrinsic network structure such as time series data, image data or social network data. The network Lasso (nLasso) has been proposed recently as a method for joint clustering and optimization of machine learning models for networked data. The nLasso extends the Lasso from sparse linear models to clustered graph signals. This paper explores the duality of nLasso and network flow optimization. We show that, in a very precise sense, nLasso is equivalent to a minimum-cost flow problem on the data network structure. Our main technical result is a concise characterization of nLasso solutions via existence of certain network flows. The main conceptual result is a useful link between nLasso methods and basic graph algorithms such as clustering or maximum flow.
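For context, a common form of the nLasso objective couples per-node losses with a weighted total-variation term over the graph edges. The sketch below uses squared-loss linear models per node, which is one standard instantiation and an assumption here rather than the exact setup of this paper.

```python
import numpy as np

def nlasso_objective(W, X, y, edges, edge_weights, lam):
    """One common network Lasso objective (conventions vary across papers):

        sum_i (x_i^T w_i - y_i)^2  +  lam * sum_{(i,j)} A_ij * ||w_i - w_j||_2

    W            : (n_nodes, d) local model weights, one row per node
    X, y         : per-node features (n_nodes, d) and targets (n_nodes,)
    edges        : list of (i, j) node-index pairs
    edge_weights : nonnegative weights A_ij, one per edge
    """
    local_loss = np.sum((np.einsum('nd,nd->n', X, W) - y) ** 2)
    tv_term = sum(a * np.linalg.norm(W[i] - W[j])
                  for (i, j), a in zip(edges, edge_weights))
    return local_loss + lam * tv_term
```

The edge term encourages neighboring nodes to share the same local model, which is what produces the clustered graph signals mentioned above.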
