We consider an $l_1$-penalization procedure in the non-parametric Gaussian regression model. In many concrete examples, the dimension $d$ of the input variable $X$ is very large (sometimes depending on the number of observations). Estimation of a $\beta$-regular regression function $f$ cannot be faster than the slow rate $n^{-2\beta/(2\beta+d)}$. Fortunately, in some situations, $f$ depends on only a few of the coordinates of $X$. In this paper, we construct two procedures. The first one selects these coordinates with high probability. Then, using this subset selection method, we run a local polynomial estimator (on the set of relevant coordinates) to estimate the regression function at the rate $n^{-2\beta/(2\beta+d^*)}$, where $d^*$, the real dimension of the problem (the exact number of variables on which $f$ depends), has replaced the dimension $d$ of the design. To achieve this result, we use an $l_1$-penalization method in this non-parametric setup.
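To give a feel for the gap between the two rates quoted above, the following minimal sketch evaluates $n^{-2\beta/(2\beta+d)}$ for a large ambient dimension and for a small effective dimension. The function name `rate` and the numerical values of `n`, `beta`, `d` and `d_star` are illustrative assumptions, not taken from the paper.

```python
def rate(n, beta, d):
    """Evaluate the rate n^{-2*beta/(2*beta+d)} for a beta-regular function in dimension d."""
    return n ** (-2.0 * beta / (2.0 * beta + d))


# Hypothetical values chosen only for illustration.
n, beta = 10_000, 2.0
d, d_star = 50, 3

slow = rate(n, beta, d)       # rate governed by the ambient dimension d
fast = rate(n, beta, d_star)  # rate governed by the effective dimension d*

print(f"rate with d={d}: {slow:.4f}")
print(f"rate with d*={d_star}: {fast:.4f}")
```

Under these assumed values the exponent moves from $4/54$ to $4/7$, which is why selecting the relevant coordinates first matters so much when $d$ is large.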
This was a revision of arXiv:1105.2454v1 from 2012. It considers a variation on the STIV estimator in which, instead of one conic constraint, there are as many conic constraints as moments (instruments), which allows moderate deviations for self-normalized sums to be used more directly. The idea first appeared in formula (6.5) in arXiv:1105.2454v1 when some instruments can be endogenous. For reference and to avoid confusion with the STIV estimator, this estimator should be called C-STIV.
In this paper, we consider regression models with a Hilbert-space-valued predictor and a scalar response, where the response depends on the predictor only through a finite number of projections. The linear subspace spanned by these projections is called the effective dimension reduction (EDR) space. To determine the dimensionality of the EDR space, we focus on the leading principal component scores of the predictor, and propose two sequential $\chi^2$ testing procedures under the assumption that the predictor has an elliptically contoured distribution. We further extend these procedures and introduce a test that simultaneously takes into account a large number of principal component scores. The proposed procedures are supported by theory, validated by simulation studies, and illustrated by a real-data example. Our methods and theory are applicable to functional data and high-dimensional multivariate data.
Meinshausen and Bühlmann [Ann. Statist. 34 (2006) 1436--1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent, even when the number of variables is of greater order than the sample size. Zhao and Yu [(2006) J. Machine Learning Research 7 2541--2567] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition. That paper showed that under this condition, the LASSO selects exactly the set of nonzero regression coefficients, provided that these coefficients are bounded away from zero at a certain rate. In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of design variables, we prove that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the LASSO in model selection, it is proved that the sum of error squares for the mean response and the $\ell_{\alpha}$-loss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs.
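As a rough illustration of LASSO model selection in linear regression, here is a minimal coordinate-descent sketch for the objective $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$. It is not the procedure analysed in the paper: the helper names `soft_threshold` and `lasso_cd`, the simulated independent Gaussian design, the penalty `lam` and the sizes `n`, `p` are all assumptions made for the example, and the example uses $p < n$ purely for simplicity.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual y - X @ beta, kept up to date
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]   # partial residual without coordinate j
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_scale[j]
            resid -= X[:, j] * beta[j]   # restore the full residual
    return beta


# Simulated data (hypothetical): three large coefficients, the rest zero.
rng = np.random.default_rng(1)
n, p = 100, 20
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -2.0, 2.0]
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam=0.1)
selected = np.flatnonzero(beta_hat)
print("selected variables:", selected.tolist())
```

The selected set is read off from the exact zeros that soft-thresholding produces; the shrinkage visible in the nonzero estimates is the threshold bias the abstract refers to.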
Suppose that $Y$ is a scalar and $X$ is a second-order stochastic process, where $Y$ and $X$ are conditionally independent given the random variables $\xi_1,\ldots,\xi_p$ which belong to the closed span $L_X^2$ of $X$. This paper investigates a unified framework for the inverse regression dimension-reduction problem. It is found that the identification of $L_X^2$ with the reproducing kernel Hilbert space of $X$ provides a platform for a seamless extension from the finite- to infinite-dimensional settings. It also facilitates convenient computational algorithms that can be applied to a variety of models.
We study the problem of high-dimensional variable selection via two-step procedures. First, we show that given a good initial estimator that is $\ell_{\infty}$-consistent but not necessarily variable selection consistent, we can apply the nonnegative garrote, adaptive Lasso or hard-thresholding procedure to obtain a final estimator that is both estimation and variable selection consistent. Unlike the Lasso, our results do not require the irrepresentable condition, which can fail easily even for moderate $p_n$ (Zhao and Yu, 2007), and they allow $p_n$ to grow almost as fast as $\exp(n)$ (for hard-thresholding there is no restriction on $p_n$). We also study the conditions under which Ridge regression can be used as an initial estimator. We show that under a relaxed identifiable condition, the Ridge estimator is $\ell_{\infty}$-consistent. Such a condition is usually satisfied when $p_n\le n$ and does not require the partial orthogonality between relevant and irrelevant covariates that is needed for the univariate regression in (Huang et al., 2008). Our numerical studies show that when the Lasso or Ridge is used as the initial estimator, the two-step procedures have a higher sparsity recovery rate than the Lasso or the adaptive Lasso with univariate regression used in (Huang et al., 2008).
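The two-step idea described above can be sketched, under simplifying assumptions, as a Ridge initial estimator followed by hard-thresholding. The function names `ridge` and `hard_threshold`, the penalty `lam`, the threshold `tau` and the simulated data are hypothetical choices for illustration; the abstract does not specify how the paper tunes these quantities.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge initial estimator: solve (X'X + lam*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def hard_threshold(beta_init, tau):
    """Second step: keep coefficients larger than tau in absolute value, zero the rest."""
    return np.where(np.abs(beta_init) > tau, beta_init, 0.0)


# Simulated data (hypothetical): p <= n, two relevant covariates.
rng = np.random.default_rng(0)
n, p = 200, 10
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -2.0]
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_init = ridge(X, y, lam=1.0)                  # step 1: initial estimate
beta_final = hard_threshold(beta_init, tau=0.5)   # step 2: variable selection
support = np.flatnonzero(beta_final)
print("selected variables:", support.tolist())
```

The second step only needs the initial estimate to be uniformly close to the truth, which is the role $\ell_{\infty}$-consistency plays in the abstract: any threshold between the estimation error and the smallest relevant coefficient separates the two groups.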