Selection of variables and dimension reduction in high-dimensional non-parametric regression


Abstract in English

We consider a $l_1$-penalization procedure in the non-parametric Gaussian regression model. In many concrete examples, the dimension $d$ of the input variable $X$ is very large (sometimes depending on the number of observations). Estimation of a $beta$-regular regression function $f$ cannot be faster than the slow rate $n^{-2beta/(2beta+d)}$. Hopefully, in some situations, $f$ depends only on a few numbers of the coordinates of $X$. In this paper, we construct two procedures. The first one selects, with high probability, these coordinates. Then, using this subset selection method, we run a local polynomial estimator (on the set of interesting coordinates) to estimate the regression function at the rate $n^{-2beta/(2beta+d^*)}$, where $d^*$, the real dimension of the problem (exact number of variables whom $f$ depends on), has replaced the dimension $d$ of the design. To achieve this result, we used a $l_1$ penalization method in this non-parametric setup.

Download