
M-estimation with the Trimmed l1 Penalty

 Added by Peng Zheng
Publication date: 2018
Research language: English





We study high-dimensional estimators with the trimmed $\ell_1$ penalty, which leaves the $h$ largest parameter entries penalty-free. While optimization techniques for this nonconvex penalty have been studied, its statistical properties have not yet been analyzed. We present the first statistical analyses for $M$-estimation and characterize support recovery, and the $\ell_\infty$ and $\ell_2$ error of the trimmed $\ell_1$ estimates, as a function of the trimming parameter $h$. Our results show different regimes depending on how $h$ compares to the true support size. Our second contribution is a new algorithm for the trimmed regularization problem, which has the same theoretical convergence rate as difference-of-convex (DC) algorithms, but in practice is faster and finds lower objective values. Empirical evaluation of $\ell_1$ trimming for sparse linear regression and graphical model estimation indicates that trimmed $\ell_1$ can outperform vanilla $\ell_1$ and nonconvex alternatives. Our last contribution is to show that the trimmed penalty is beneficial beyond $M$-estimation, yielding promising results for two deep learning tasks: input structure recovery and network sparsification.
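For concreteness, below is a minimal NumPy sketch (not code from the paper) of how the trimmed $\ell_1$ penalty defined above can be evaluated: the $h$ largest-magnitude entries are left penalty-free and the $\ell_1$ norm is taken over the rest. The function name and example values are illustrative.

```python
import numpy as np

def trimmed_l1(theta, h):
    """Trimmed l1 penalty: sum of |theta_i| over all but the h
    largest-magnitude entries (those h entries are penalty-free)."""
    mags = np.sort(np.abs(theta))       # magnitudes in ascending order
    return mags[:len(mags) - h].sum()   # drop the h largest, sum the rest

# Example: with h = 2, the two largest-magnitude entries (5.0 and -3.0)
# are excluded from the penalty.
theta = np.array([5.0, -3.0, 0.5, -0.2, 0.0])
print(trimmed_l1(theta, h=2))           # 0.7
```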


We present a new approach to solve the sparse approximation or best subset selection problem, namely find a $k$-sparse vector ${\bf x}\in\mathbb{R}^d$ that minimizes the $\ell_2$ residual $\lVert A{\bf x}-{\bf y} \rVert_2$. We consider a regularized approach, whereby this residual is penalized by the non-convex $\textit{trimmed lasso}$, defined as the $\ell_1$-norm of ${\bf x}$ excluding its $k$ largest-magnitude entries. We prove that the trimmed lasso has several appealing theoretical properties, and in particular derive sparse recovery guarantees assuming successful optimization of the penalized objective. Next, we show empirically that directly optimizing this objective can be quite challenging. Instead, we propose a surrogate for the trimmed lasso, called the $\textit{generalized soft-min}$. This penalty smoothly interpolates between the classical lasso and the trimmed lasso, while taking into account all possible $k$-sparse patterns. The generalized soft-min penalty involves summation over $\binom{d}{k}$ terms, yet we derive a polynomial-time algorithm to compute it. This, in turn, yields a practical method for the original sparse approximation problem. Via simulations, we demonstrate its competitive performance compared to current state of the art.
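As a hedged illustration (not code from the paper above), the penalized sparse-approximation objective it describes, the $\ell_2$ residual plus a trimmed-lasso term, can be written down directly. All data, names, and parameter values here are hypothetical.

```python
import numpy as np

def trimmed_lasso(x, k):
    """l1 norm of x excluding its k largest-magnitude entries."""
    mags = np.sort(np.abs(x))
    return mags[:len(mags) - k].sum()

def penalized_objective(A, y, x, k, lam):
    """l2 residual ||Ax - y||_2 plus lam times the trimmed lasso penalty."""
    return np.linalg.norm(A @ x - y, 2) + lam * trimmed_lasso(x, k)

# Toy usage: x_true is 3-sparse, so with k = 3 its penalty term is zero.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.5, 1.0]
y = A @ x_true
print(penalized_objective(A, y, x_true, k=3, lam=0.1))   # 0.0
```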
This paper aims to build an estimate of an unknown density from data with measurement error as a linear combination of functions from a dictionary. Inspired by the penalization approach, we propose a weighted Elastic-net penalized minimal $\ell_2$-distance method for sparse coefficient estimation, where the adaptive weights come from sharp concentration inequalities. The optimal weighted tuning parameters are obtained from first-order conditions that hold with high probability. Under local coherence or minimal eigenvalue assumptions, non-asymptotic oracle inequalities are derived. These theoretical results are then transposed to obtain support recovery with high probability. Numerical experiments for discrete and continuous distributions confirm the significant improvement obtained by our procedure when compared with other conventional approaches. Finally, the method is applied to a meteorological data set, where it proves effective and superior at detecting the shape of multi-modal densities compared with conventional approaches.
Suvra Pal, Souvik Roy (2019)
In this paper, we propose a new estimation methodology based on a projected non-linear conjugate gradient (PNCG) algorithm with an efficient line search technique. We develop a general PNCG algorithm for a survival model incorporating a cure proportion under a competing risks setup, where the initial competing risks are exposed to elimination after an initial treatment (known as destruction). In the literature, the expectation maximization (EM) algorithm has been widely used to estimate the parameters of such a model. Through an extensive Monte Carlo simulation study, we compare the performance of the proposed PNCG algorithm with that of the EM algorithm and show the advantages of our method. Through simulation, we also show the advantages of our methodology over other optimization algorithms (including other conjugate gradient type methods) readily available as R software packages. To obtain these results, we assume the initial number of competing risks follows a negative binomial distribution, although our general algorithm allows one to work with any competing risks distribution. Finally, we apply the proposed algorithm to analyze a well-known melanoma data set.
Optimal design theory for nonlinear regression studies local optimality on a given design space. We identify designs for the Bradley--Terry paired comparison model with small undirected graphs and prove that every saturated D-optimal design is represented by a path. We discuss the case of four alternatives in detail and derive explicit polynomial inequality descriptions for optimality regions in parameter space. Using these regions, for each point in parameter space we can prescribe a D-optimal design.
We undertake a precise study of the non-asymptotic properties of vanilla generative adversarial networks (GANs) and derive theoretical guarantees in the problem of estimating an unknown $d$-dimensional density $p^*$ under a proper choice of the class of generators and discriminators. We prove that the resulting density estimate converges to $p^*$ in terms of Jensen-Shannon (JS) divergence at the rate $(\log n/n)^{2\beta/(2\beta+d)}$, where $n$ is the sample size and $\beta$ determines the smoothness of $p^*$. This is the first result in the literature on density estimation using vanilla GANs with JS rates faster than $n^{-1/2}$ in the regime $\beta > d/2$.
