
M-estimation with the Trimmed l1 Penalty

 Added by Peng Zheng
Publication date: 2018
Research language: English





We study high-dimensional estimators with the trimmed $\ell_1$ penalty, which leaves the $h$ largest parameter entries penalty-free. While optimization techniques for this nonconvex penalty have been studied, its statistical properties have not yet been analyzed. We present the first statistical analyses for $M$-estimation and characterize support recovery, and the $\ell_\infty$ and $\ell_2$ error of the trimmed $\ell_1$ estimates, as a function of the trimming parameter $h$. Our results show different regimes depending on how $h$ compares to the true support size. Our second contribution is a new algorithm for the trimmed regularization problem, which has the same theoretical convergence rate as difference-of-convex (DC) algorithms, but in practice is faster and finds lower objective values. Empirical evaluation of $\ell_1$ trimming for sparse linear regression and graphical model estimation indicates that trimmed $\ell_1$ can outperform vanilla $\ell_1$ and nonconvex alternatives. Our last contribution is to show that the trimmed penalty is beneficial beyond $M$-estimation, yielding promising results for two deep learning tasks: input structure recovery and network sparsification.
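For concreteness, below is a minimal NumPy sketch (not code from the paper) of how the trimmed $\ell_1$ penalty defined above can be evaluated: the $h$ largest-magnitude entries are left penalty-free and the $\ell_1$ norm is taken over the rest. The function name and example values are illustrative.

```python
import numpy as np

def trimmed_l1(theta, h):
    """Trimmed l1 penalty: sum of |theta_i| over all but the h
    largest-magnitude entries (those h entries are penalty-free)."""
    mags = np.sort(np.abs(theta))       # magnitudes in ascending order
    return mags[:len(mags) - h].sum()   # drop the h largest, sum the rest

# Example: with h = 2, the two largest-magnitude entries (5.0 and -3.0)
# are excluded from the penalty.
theta = np.array([5.0, -3.0, 0.5, -0.2, 0.0])
print(trimmed_l1(theta, h=2))           # 0.7
```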


We present a new approach to solve the sparse approximation or best subset selection problem, namely find a $k$-sparse vector ${\bf x}\in\mathbb{R}^d$ that minimizes the $\ell_2$ residual $\lVert A{\bf x}-{\bf y} \rVert_2$. We consider a regularized approach, whereby this residual is penalized by the non-convex $\textit{trimmed lasso}$, defined as the $\ell_1$-norm of ${\bf x}$ excluding its $k$ largest-magnitude entries. We prove that the trimmed lasso has several appealing theoretical properties, and in particular derive sparse recovery guarantees assuming successful optimization of the penalized objective. Next, we show empirically that directly optimizing this objective can be quite challenging. Instead, we propose a surrogate for the trimmed lasso, called the $\textit{generalized soft-min}$. This penalty smoothly interpolates between the classical lasso and the trimmed lasso, while taking into account all possible $k$-sparse patterns. The generalized soft-min penalty involves summation over $\binom{d}{k}$ terms, yet we derive a polynomial-time algorithm to compute it. This, in turn, yields a practical method for the original sparse approximation problem. Via simulations, we demonstrate its competitive performance compared to current state of the art.
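As a hedged illustration (not code from the paper above), the penalized sparse-approximation objective it describes, the $\ell_2$ residual plus a trimmed-lasso term, can be written down directly. All data, names, and parameter values here are hypothetical.

```python
import numpy as np

def trimmed_lasso(x, k):
    """l1 norm of x excluding its k largest-magnitude entries."""
    mags = np.sort(np.abs(x))
    return mags[:len(mags) - k].sum()

def penalized_objective(A, y, x, k, lam):
    """l2 residual ||Ax - y||_2 plus lam times the trimmed lasso penalty."""
    return np.linalg.norm(A @ x - y, 2) + lam * trimmed_lasso(x, k)

# Toy usage: x_true is 3-sparse, so with k = 3 its penalty term is zero.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.5, 1.0]
y = A @ x_true
print(penalized_objective(A, y, x_true, k=3, lam=0.1))   # 0.0
```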
This paper aims to build an estimate of an unknown density from data with measurement error as a linear combination of functions from a dictionary. Inspired by the penalization approach, we propose a weighted Elastic-net penalized minimal $\ell_2$-distance method for sparse coefficient estimation, where the adaptive weights come from sharp concentration inequalities. The optimal weighted tuning parameters are obtained from first-order conditions that hold with high probability. Under local coherence or minimal eigenvalue assumptions, non-asymptotic oracle inequalities are derived. These theoretical results are then transposed to obtain support recovery with high probability. Numerical experiments for discrete and continuous distributions confirm the significant improvement obtained by our procedure when compared with other conventional approaches. Finally, the method is applied to a meteorological data set, where it proves effective and superior at detecting the shape of multi-modal densities compared with conventional approaches.
Suvra Pal, Souvik Roy (2019)
In this paper, we propose a new estimation methodology based on a projected non-linear conjugate gradient (PNCG) algorithm with an efficient line search technique. We develop a general PNCG algorithm for a survival model incorporating a cure proportion under a competing risks setup, where the initial competing risks are exposed to elimination after an initial treatment (known as destruction). In the literature, the expectation maximization (EM) algorithm has been widely used to estimate the parameters of such a model. Through an extensive Monte Carlo simulation study, we compare the performance of the proposed PNCG algorithm with that of the EM algorithm and show the advantages of our method. Through simulation, we also show the advantages of our methodology over other optimization algorithms (including other conjugate gradient type methods) readily available as R software packages. To obtain these results, we assume the initial number of competing risks follows a negative binomial distribution, although our general algorithm allows one to work with any competing risks distribution. Finally, we apply the proposed algorithm to analyze a well-known melanoma data set.
Optimal design theory for nonlinear regression studies local optimality on a given design space. We identify designs for the Bradley--Terry paired comparison model with small undirected graphs and prove that every saturated D-optimal design is represented by a path. We discuss the case of four alternatives in detail and derive explicit polynomial inequality descriptions for optimality regions in parameter space. Using these regions, for each point in parameter space we can prescribe a D-optimal design.
We undertake a precise study of the non-asymptotic properties of vanilla generative adversarial networks (GANs) and derive theoretical guarantees in the problem of estimating an unknown $d$-dimensional density $p^*$ under a proper choice of the class of generators and discriminators. We prove that the resulting density estimate converges to $p^*$ in terms of Jensen-Shannon (JS) divergence at the rate $(\log n/n)^{2\beta/(2\beta+d)}$, where $n$ is the sample size and $\beta$ determines the smoothness of $p^*$. This is the first result in the literature on density estimation using vanilla GANs with JS rates faster than $n^{-1/2}$ in the regime $\beta > d/2$.
