Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference

 Added by Anru Zhang
 Publication date 2019
 Research language: English




In this paper, we study sparse group Lasso for high-dimensional double sparse linear regression, where the parameter of interest is simultaneously element-wise and group-wise sparse. This problem is an important instance of the simultaneously structured model -- an actively studied topic in statistics and machine learning. In the noiseless case, we provide matching upper and lower bounds on sample complexity for the exact recovery of sparse vectors and for stable estimation of approximately sparse vectors, respectively. In the noisy case, we develop upper and matching minimax lower bounds for estimation error. We also consider the debiased sparse group Lasso and investigate its asymptotic property for the purpose of statistical inference. Finally, numerical studies are provided to support the theoretical results.
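As a rough illustration of the "simultaneously element-wise and group-wise sparse" structure, the following is a minimal sketch of one common form of the sparse group Lasso penalty, together with its proximal operator for non-overlapping groups. The abstract does not state the exact penalty or weights used in the paper, so the function names `sgl_penalty` and `sgl_prox`, the two tuning parameters `lam_l1` and `lam_group`, and the unweighted group norm are all assumptions made for illustration.

```python
import numpy as np

def sgl_penalty(beta, groups, lam_l1, lam_group):
    """Assumed penalty: lam_l1 * ||beta||_1 + lam_group * sum_g ||beta_g||_2."""
    beta = np.asarray(beta, dtype=float)
    l1_part = np.abs(beta).sum()
    group_part = sum(np.linalg.norm(beta[idx]) for idx in groups)
    return lam_l1 * l1_part + lam_group * group_part

def sgl_prox(v, groups, lam_l1, lam_group):
    """Proximal operator of the penalty above for non-overlapping groups.

    Each group is first soft-thresholded element-wise (element-wise sparsity),
    then shrunk as a block toward zero (group-wise sparsity).
    """
    v = np.asarray(v, dtype=float)
    out = np.zeros_like(v)
    for idx in groups:
        # Element-wise soft-thresholding at level lam_l1.
        s = np.sign(v[idx]) * np.maximum(np.abs(v[idx]) - lam_l1, 0.0)
        norm = np.linalg.norm(s)
        # Block shrinkage: the whole group is zeroed if its norm is small.
        if norm > lam_group:
            out[idx] = (1.0 - lam_group / norm) * s
    return out

# Hypothetical toy input: two groups of two coordinates each.
v = np.array([3.0, -0.5, 0.2, 0.1])
groups = [[0, 1], [2, 3]]
shrunk = sgl_prox(v, groups, lam_l1=0.5, lam_group=1.0)
```

In this toy input the second group is removed entirely, while within the first group only the large coordinate survives (shrunk to 1.5), which is the double sparsity pattern the abstract refers to.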



Read More

Kan Chen, Zhiqi Bu, Shiyun Xu 2021
Sparse Group LASSO (SGL) is a regularized model for high-dimensional linear regression problems with grouped covariates. SGL applies $\ell_1$ and $\ell_2$ penalties on the individual predictors and group predictors, respectively, to guarantee sparse effects both on the inter-group and within-group levels. In this paper, we apply the approximate message passing (AMP) algorithm to efficiently solve the SGL problem under Gaussian random designs. We further use the recently developed state evolution analysis of AMP to derive an asymptotically exact characterization of the SGL solution. This allows us to conduct multiple fine-grained statistical analyses of SGL, through which we investigate the effects of the group information and $\gamma$ (the proportion of the $\ell_1$ penalty). Through the lens of various performance measures, we show that SGL with small $\gamma$ benefits significantly from the group information and can outperform other SGL (including LASSO) or regularized models which do not exploit the group information, in terms of the recovery rate of signal, false discovery rate and mean squared error.
When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose an online debiasing estimator to correct these distributional anomalies in least squares estimation. Our proposed method takes advantage of the covariance structure present in the dataset and provides sharper estimates in directions for which more information has accrued. We establish an asymptotic normality property for our proposed online debiasing estimator under mild conditions on the data collection process, and provide asymptotically exact confidence intervals. We additionally prove a minimax lower bound for the adaptive linear regression problem, thereby providing a baseline by which to compare estimators. There are various conditions under which our proposed estimator achieves the minimax lower bound up to logarithmic factors. We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
In this paper we discuss the estimation of a nonparametric component $f_1$ of a nonparametric additive model $Y=f_1(X_1) + \dots + f_q(X_q) + \epsilon$. We allow the number $q$ of additive components to grow to infinity and we make sparsity assumptions about the number of nonzero additive components. We compare this estimation problem with that of estimating $f_1$ in the oracle model $Z= f_1(X_1) + \epsilon$, for which the additive components $f_2,\dots,f_q$ are known. We construct a two-step presmoothing-and-resmoothing estimator of $f_1$ and state finite-sample bounds for the difference between our estimator and some smoothing estimators $\hat f_1^{\text{(oracle)}}$ in the oracle model. In an asymptotic setting these bounds can be used to show asymptotic equivalence of our estimator and the oracle estimators; the paper thus shows that, asymptotically, under strong enough sparsity conditions, knowledge of $f_2,\dots,f_q$ has no effect on estimation accuracy. Our first step is to estimate $f_1$ with an undersmoothed estimator based on near-orthogonal projections with a group Lasso bias correction. We then construct pseudo responses $\hat Y$ by evaluating a debiased modification of our undersmoothed estimator of $f_1$ at the design points. In the second step the smoothing method of the oracle estimator $\hat f_1^{\text{(oracle)}}$ is applied to a nonparametric regression problem with responses $\hat Y$ and covariates $X_1$. Our mathematical exposition centers primarily on establishing properties of the presmoothing estimator. We present simulation results demonstrating close-to-oracle performance of our estimator in practical applications.
This paper gives a review of concentration inequalities which are widely employed in non-asymptotic analyses in mathematical statistics in a wide range of settings, from distribution-free to distribution-dependent, from sub-Gaussian to sub-exponential, sub-Gamma, and sub-Weibull random variables, and from the mean to the maximum concentration. This review provides results in these settings, including some new results. Given the increasing popularity of high-dimensional data and inference, results in the context of high-dimensional linear and Poisson regressions are also provided. We aim to illustrate the concentration inequalities with known constants and to improve existing bounds with sharper constants.
Gaussian processes are distributions over functions that are versatile and mathematically convenient priors in Bayesian modelling. However, their use is often impeded for data with large numbers of observations, $N$, due to the cubic (in $N$) cost of matrix operations used in exact inference. Many solutions have been proposed that rely on $M \ll N$ inducing variables to form an approximation at a cost of $\mathcal{O}(NM^2)$. While the computational cost appears linear in $N$, the true complexity depends on how $M$ must scale with $N$ to ensure a certain quality of the approximation. In this work, we investigate upper and lower bounds on how $M$ needs to grow with $N$ to ensure high quality approximations. We show that we can make the KL-divergence between the approximate model and the exact posterior arbitrarily small for a Gaussian-noise regression model with $M \ll N$. Specifically, for the popular squared exponential kernel and $D$-dimensional Gaussian distributed covariates, $M=\mathcal{O}((\log N)^D)$ suffice and a method with an overall computational cost of $\mathcal{O}(N(\log N)^{2D}(\log\log N)^2)$ can be used to perform inference.
