No Arabic abstract
We focus on the high dimensional linear regression $Ysimmathcal{N}(Xbeta^{*},sigma^{2}I_{n})$, where $beta^{*}inmathds{R}^{p}$ is the parameter of interest. In this setting, several estimators such as the LASSO and the Dantzig Selector are known to satisfy interesting properties whenever the vector $beta^{*}$ is sparse. Interestingly both of the LASSO and the Dantzig Selector can be seen as orthogonal projections of 0 into $mathcal{DC}(s)={betainmathds{R}^{p},|X(Y-Xbeta)|_{infty}leq s}$ - using an $ell_{1}$ distance for the Dantzig Selector and $ell_{2}$ for the LASSO. For a well chosen $s>0$, this set is actually a confidence region for $beta^{*}$. In this paper, we investigate the properties of estimators defined as projections on $mathcal{DC}(s)$ using general distances. We prove that the obtained estimators satisfy oracle properties close to the one of the LASSO and Dantzig Selector. On top of that, it turns out that these estimators can be tuned to exploit a different sparsity or/and slightly different estimation objectives.
We consider a high-dimensional regression model with a possible change-point due to a covariate threshold and develop the Lasso estimator of regression coefficients as well as the threshold parameter. Our Lasso estimator not only selects covariates but also selects a model between linear and threshold regression models. Under a sparsity assumption, we derive non-asymptotic oracle inequalities for both the prediction risk and the $ell_1$ estimation loss for regression coefficients. Since the Lasso estimator selects variables simultaneously, we show that oracle inequalities can be established without pretesting the existence of the threshold effect. Furthermore, we establish conditions under which the estimation error of the unknown threshold parameter can be bounded by a nearly $n^{-1}$ factor even when the number of regressors can be much larger than the sample size ($n$). We illustrate the usefulness of our proposed estimation method via Monte Carlo simulations and an application to real data.
We study the problem of high-dimensional variable selection via some two-step procedures. First we show that given some good initial estimator which is $ell_{infty}$-consistent but not necessarily variable selection consistent, we can apply the nonnegative Garrote, adaptive Lasso or hard-thresholding procedure to obtain a final estimator that is both estimation and variable selection consistent. Unlike the Lasso, our results do not require the irrepresentable condition which could fail easily even for moderate $p_n$ (Zhao and Yu, 2007) and it also allows $p_n$ to grow almost as fast as $exp(n)$ (for hard-thresholding there is no restriction on $p_n$). We also study the conditions under which the Ridge regression can be used as an initial estimator. We show that under a relaxed identifiable condition, the Ridge estimator is $ell_{infty}$-consistent. Such a condition is usually satisfied when $p_nle n$ and does not require the partial orthogonality between relevant and irrelevant covariates which is needed for the univariate regression in (Huang et al., 2008). Our numerical studies show that when using the Lasso or Ridge as initial estimator, the two-step procedures have a higher sparsity recovery rate than the Lasso or adaptive Lasso with univariate regression used in (Huang et al., 2008).
In high-dimensional regression, we attempt to estimate a parameter vector ${boldsymbol beta}_0in{mathbb R}^p$ from $nlesssim p$ observations ${(y_i,{boldsymbol x}_i)}_{ile n}$ where ${boldsymbol x}_iin{mathbb R}^p$ is a vector of predictors and $y_i$ is a response variable. A well-estabilished approach uses convex regularizers to promote specific structures (e.g. sparsity) of the estimate $widehat{boldsymbol beta}$, while allowing for practical algorithms. Theoretical analysis implies that convex penalization schemes have nearly optimal estimation properties in certain settings. However, in general the gaps between statistically optimal estimation (with unbounded computational resources) and convex methods are poorly understood. We show that, in general, a large gap exists between the best performance achieved by emph{any convex regularizer} and the optimal statistical error. Remarkably, we demonstrate that this gap is generic as soon as we try to incorporate very simple structural information about the empirical distribution of the entries of ${boldsymbol beta}_0$. Our results follow from a detailed study of standard Gaussian designs, a setting that is normally considered particularly friendly to convex regularization schemes such as the Lasso. We prove a lower bound on the estimation error achieved by any convex regularizer which is invariant under permutations of the coordinates of its argument. This bound is expected to be generally tight, and indeed we prove tightness under certain conditions. Further, it implies a gap with respect to Bayes-optimal estimation that can be precisely quantified and persists if the prior distribution of the signal ${boldsymbol beta}_0$ is known to the statistician. Our results provide rigorous evidence towards a broad conjecture regarding computational-statistical gaps in high-dimensional estimation.
We study the asymptotic properties of bridge estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase to infinity with the sample size. We are particularly interested in the use of bridge estimators to distinguish between covariates whose coefficients are zero and covariates whose coefficients are nonzero. We show that under appropriate conditions, bridge estimators correctly select covariates with nonzero coefficients with probability converging to one and that the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, bridge estimators have an oracle property in the sense of Fan and Li [J. Amer. Statist. Assoc. 96 (2001) 1348--1360] and Fan and Peng [Ann. Statist. 32 (2004) 928--961]. In general, the oracle property holds only if the number of covariates is smaller than the sample size. However, under a partial orthogonality condition in which the covariates of the zero coefficients are uncorrelated or weakly correlated with the covariates of nonzero coefficients, we show that marginal bridge estimators can correctly distinguish between covariates with nonzero and zero coefficients with probability converging to one even when the number of covariates is greater than the sample size.
This was a revision of arXiv:1105.2454v1 from 2012. It considers a variation on the STIV estimator where, instead of one conic constraint, there are as many conic constraints as moments (instruments) allowing to use more directly moderate deviations for self-normalized sums. The idea first appeared in formula (6.5) in arXiv:1105.2454v1 when some instruments can be endogenous. For reference and to avoid confusion with the STIV estimator, this estimator should be called C-STIV.