
The Lasso for High-Dimensional Regression with a Possible Change-Point

Publication date: 2012
Language: English





We consider a high-dimensional regression model with a possible change-point due to a covariate threshold and develop the Lasso estimator of regression coefficients as well as the threshold parameter. Our Lasso estimator not only selects covariates but also selects a model between linear and threshold regression models. Under a sparsity assumption, we derive non-asymptotic oracle inequalities for both the prediction risk and the $\ell_1$ estimation loss for regression coefficients. Since the Lasso estimator selects variables simultaneously, we show that oracle inequalities can be established without pretesting the existence of the threshold effect. Furthermore, we establish conditions under which the estimation error of the unknown threshold parameter can be bounded by a nearly $n^{-1}$ factor even when the number of regressors can be much larger than the sample size ($n$). We illustrate the usefulness of our proposed estimation method via Monte Carlo simulations and an application to real data.
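To make the joint selection of covariates, threshold, and model concrete, here is a minimal sketch. It assumes the threshold model $y_i = x_i^{\top}\beta + x_i^{\top}\delta\,1\{q_i < \tau\} + \varepsilon_i$, a plain unweighted $\ell_1$ penalty on $(\beta, \delta)$, and a grid of candidate $\tau$ values. The function names, the penalty level `lam`, and the simulated data are hypothetical, and the paper's own penalty weights and tuning rule are not reproduced. A fitted $\hat\delta = 0$ corresponds to selecting the linear model.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(Z, y, lam, n_iter=1000, tol=1e-8):
    """Coordinate descent for (1/(2n)) * ||y - Z theta||^2 + lam * ||theta||_1."""
    n, p = Z.shape
    theta = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n      # (1/n) * ||z_j||^2 for each column
    r = np.array(y, dtype=float)           # residual y - Z theta (theta = 0)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:           # e.g. an all-zero threshold column
                continue
            old = theta[j]
            # (1/n) z_j^T (partial residual that excludes coordinate j)
            rho = Z[:, j] @ r / n + col_sq[j] * old
            theta[j] = soft_threshold(rho, lam) / col_sq[j]
            change = theta[j] - old
            if change != 0.0:
                r -= change * Z[:, j]
                max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    return theta


def threshold_lasso(X, q, y, lam, taus):
    """Hypothetical grid-search threshold Lasso.

    For each candidate tau, fit the Lasso on Z(tau) = [X, X * 1{q < tau}]
    and keep the tau with the smallest penalized objective.
    """
    n, p = X.shape
    best = None
    for tau in taus:
        Z = np.hstack([X, X * (q < tau)[:, None]])
        theta = lasso_cd(Z, y, lam)
        resid = y - Z @ theta
        obj = resid @ resid / (2 * n) + lam * np.abs(theta).sum()
        if best is None or obj < best[0]:
            best = (obj, tau, theta[:p], theta[p:])
    _, tau_hat, beta_hat, delta_hat = best
    return tau_hat, beta_hat, delta_hat


# Simulated example (hypothetical data): true threshold at 0.5.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
q = rng.uniform(size=n)
beta = np.zeros(p)
beta[:3] = [1.0, -1.0, 0.5]
delta = np.zeros(p)
delta[0] = 2.0
y = X @ beta + (X @ delta) * (q < 0.5) + 0.5 * rng.standard_normal(n)
taus = np.quantile(q, np.linspace(0.15, 0.85, 29))  # keep both regimes non-empty
tau_hat, beta_hat, delta_hat = threshold_lasso(X, q, y, lam=0.05, taus=taus)
```

Restricting the grid to interior quantiles of $q$ is a design choice of this sketch so that both regimes always contain observations.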




Emmanuel Pilliat, 2020
This manuscript makes two contributions to the field of change-point detection. In a general change-point setting, we provide a generic algorithm for aggregating local homogeneity tests into an estimator of change-points in a time series. Interestingly, we establish that the error rates of the collection of tests directly translate into detection properties of the change-point estimator. This generic scheme is then applied to the possibly sparse multivariate mean change-point detection setting. When the noise is Gaussian, we derive minimax optimal rates that are adaptive to the unknown sparsity and to the distance between change-points. For sub-Gaussian noise, we introduce a variant that is optimal in almost all sparsity regimes.
Cun-Hui Zhang, Jian Huang, 2008
Meinshausen and Bühlmann [Ann. Statist. 34 (2006) 1436-1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent, even when the number of variables is of greater order than the sample size. Zhao and Yu [(2006) J. Machine Learning Research 7 2541-2567] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition. That paper showed that under this condition, the LASSO selects exactly the set of nonzero regression coefficients, provided that these coefficients are bounded away from zero at a certain rate. In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of design variables, we prove that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the LASSO in model selection, it is proved that the sum of error squares for the mean response and the $\ell_{\alpha}$-loss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs.
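The abstract invokes the sparse Riesz condition without stating it. A minimal sketch, assuming the usual form of the condition: there are constants $0 < c_* \le c^* < \infty$ with $c_* \le \|X_A v\|_2^2 / (n\|v\|_2^2) \le c^*$ for every column subset $A$ with $|A| \le q$ and every nonzero $v$. The function `sparse_riesz_bounds` is hypothetical and brute-forces all subsets, so it is exponential in $p$ and only meant to make the definition concrete on toy designs.

```python
import itertools

import numpy as np


def sparse_riesz_bounds(X, q):
    """Brute-force the tightest c_* and c^* with
    c_* <= ||X_A v||^2 / (n ||v||^2) <= c^*  for every |A| <= q.

    By Cauchy interlacing, the extreme eigenvalues over |A| <= q are
    attained on subsets of size exactly q, so only those are enumerated.
    """
    n, p = X.shape
    gram = X.T @ X / n
    c_lower, c_upper = np.inf, 0.0
    for A in itertools.combinations(range(p), q):
        eig = np.linalg.eigvalsh(gram[np.ix_(A, A)])  # ascending order
        c_lower = min(c_lower, eig[0])
        c_upper = max(c_upper, eig[-1])
    return c_lower, c_upper
```

For an orthonormal-type design with $X^{\top}X/n = I$ both bounds equal 1, while two identical columns force $c_* = 0$ as soon as $q \ge 2$, which is the kind of collinearity the condition rules out.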
While there is considerable work on change-point analysis in univariate time series, more and more of the data being collected comes from high-dimensional multivariate settings. This paper introduces the asymptotic concept of high dimensional efficiency, which quantifies the detection power of different statistics in such situations. Although related to classical asymptotic relative efficiency, it differs in that it gives the rate at which the change can shrink with the dimension while still being detectable. This also allows comparisons of methods with different null asymptotics, as is, for example, the case in high-dimensional change-point settings. Based on this new concept, we investigate change-point detection procedures using projections and develop asymptotic theory for how full-panel (multivariate) tests compare with both oracle and random projections. Furthermore, for each given projection we can quantify a cone such that the corresponding projection statistic has better power if the true change direction lies within this cone. We also investigate the effect of misspecifying the covariance on the power of the tests, because in many high-dimensional situations estimating the full dependency (covariance) between the multivariate observations in the panel is computationally or even theoretically infeasible. It turns out that the projection statistic is much more robust in this respect in terms of size and somewhat more robust in terms of power. The theoretical results are accompanied by simulations that confirm the (asymptotic) findings for surprisingly small samples. In particular, this shows that the concept of high dimensional efficiency is indeed suitable for describing small-sample power, which is also demonstrated on a multivariate example of market index data.
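To illustrate what a projection statistic does, here is a minimal sketch of a projected CUSUM, assuming independent observations, unit noise variance along the projection, and at most one mean change. The direction is supplied by the user (the usage below plugs in the true change direction, i.e. an oracle projection), and the function name `projected_cusum` is hypothetical rather than the paper's code.

```python
import numpy as np


def projected_cusum(X, direction):
    """Project an (n, p) panel onto `direction` and run a univariate CUSUM.

    CUSUM_k = (S_k - (k/n) * S_n) / sqrt(n) for k = 1, ..., n-1, where S_k
    are partial sums of the projected series. Returns (k_hat, max |CUSUM|);
    k_hat estimates the number of observations before the mean change.
    """
    n = X.shape[0]
    a = direction / np.linalg.norm(direction)   # unit-length projection vector
    z = X @ a                                    # projected univariate series
    s = np.cumsum(z)                             # partial sums S_1, ..., S_n
    k = np.arange(1, n)                          # candidate split points
    cusum = (s[:-1] - k / n * s[-1]) / np.sqrt(n)
    k_hat = int(k[np.argmax(np.abs(cusum))])
    return k_hat, float(np.max(np.abs(cusum)))


# Oracle projection on a noisy panel with a mean shift after t = 50.
rng = np.random.default_rng(0)
panel = rng.standard_normal((100, 5))
panel[50:] += 2.0
k_hat, stat = projected_cusum(panel, np.ones(5))
```

The projection reduces the panel to one series, so only the variance along `direction` is needed rather than the full $p \times p$ covariance, which is the robustness trade-off the abstract discusses.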
In high-dimensional regression, we attempt to estimate a parameter vector $\boldsymbol{\beta}_0 \in \mathbb{R}^p$ from $n \lesssim p$ observations $\{(y_i, \boldsymbol{x}_i)\}_{i \le n}$, where $\boldsymbol{x}_i \in \mathbb{R}^p$ is a vector of predictors and $y_i$ is a response variable. A well-established approach uses convex regularizers to promote specific structures (e.g. sparsity) of the estimate $\widehat{\boldsymbol{\beta}}$, while allowing for practical algorithms. Theoretical analysis implies that convex penalization schemes have nearly optimal estimation properties in certain settings. However, in general the gaps between statistically optimal estimation (with unbounded computational resources) and convex methods are poorly understood. We show that, in general, a large gap exists between the best performance achieved by any convex regularizer and the optimal statistical error. Remarkably, we demonstrate that this gap is generic as soon as we try to incorporate very simple structural information about the empirical distribution of the entries of $\boldsymbol{\beta}_0$. Our results follow from a detailed study of standard Gaussian designs, a setting that is normally considered particularly friendly to convex regularization schemes such as the Lasso. We prove a lower bound on the estimation error achieved by any convex regularizer which is invariant under permutations of the coordinates of its argument. This bound is expected to be generally tight, and indeed we prove tightness under certain conditions. Further, it implies a gap with respect to Bayes-optimal estimation that can be precisely quantified and persists if the prior distribution of the signal $\boldsymbol{\beta}_0$ is known to the statistician. Our results provide rigorous evidence towards a broad conjecture regarding computational-statistical gaps in high-dimensional estimation.
Pierre Alquier, 2011
We focus on the high-dimensional linear regression model $Y \sim \mathcal{N}(X\beta^{*}, \sigma^{2} I_{n})$, where $\beta^{*} \in \mathbb{R}^{p}$ is the parameter of interest. In this setting, several estimators such as the LASSO and the Dantzig Selector are known to satisfy interesting properties whenever the vector $\beta^{*}$ is sparse. Interestingly, both the LASSO and the Dantzig Selector can be seen as orthogonal projections of 0 onto $\mathcal{DC}(s) = \{\beta \in \mathbb{R}^{p} : \|X^{\top}(Y - X\beta)\|_{\infty} \leq s\}$, using an $\ell_{1}$ distance for the Dantzig Selector and $\ell_{2}$ for the LASSO. For a well-chosen $s > 0$, this set is actually a confidence region for $\beta^{*}$. In this paper, we investigate the properties of estimators defined as projections onto $\mathcal{DC}(s)$ using general distances. We prove that the resulting estimators satisfy oracle properties close to those of the LASSO and the Dantzig Selector. On top of that, it turns out that these estimators can be tuned to exploit a different sparsity and/or slightly different estimation objectives.
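The link between the LASSO and $\mathcal{DC}(s)$ can be checked numerically: for the objective $\tfrac12\|Y - X\beta\|_2^2 + \lambda\|\beta\|_1$, the optimality (KKT) conditions give $\|X^{\top}(Y - X\hat\beta)\|_\infty \le \lambda$, so the LASSO solution lies in $\mathcal{DC}(\lambda)$. The sketch below assumes this unscaled objective and a plain coordinate-descent solver; `lasso_cd` and `in_dc_set` are hypothetical helpers, and the Dantzig Selector itself (a linear program) is not implemented here.

```python
import numpy as np


def lasso_cd(X, Y, lam, n_iter=10_000, tol=1e-12):
    """Coordinate descent for 0.5 * ||Y - X beta||_2^2 + lam * ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)           # ||x_j||^2 for each column
    r = np.array(Y, dtype=float)            # residual Y - X beta (beta = 0)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            rho = X[:, j] @ r + col_sq[j] * old  # x_j^T (partial residual)
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            change = beta[j] - old
            if change != 0.0:
                r -= change * X[:, j]
                max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    return beta


def in_dc_set(X, Y, beta, s):
    """Membership test for DC(s) = {beta : ||X^T (Y - X beta)||_inf <= s}."""
    return bool(np.max(np.abs(X.T @ (Y - X @ beta))) <= s)


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 10))
beta_star = np.zeros(10)
beta_star[:2] = [3.0, -2.0]
Y = X @ beta_star + rng.standard_normal(50)
lam = 10.0
beta_hat = lasso_cd(X, Y, lam)
```

Because coordinate descent stops at a finite tolerance, membership is best checked with a small relative slack on $s$; the point $\beta = 0$ belongs to $\mathcal{DC}(s)$ exactly when $s \ge \|X^{\top}Y\|_\infty$.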