We study uniqueness in the generalized lasso problem, where the penalty is the $\ell_1$ norm of a matrix $D$ times the coefficient vector. We derive a broad result on uniqueness that places weak assumptions on the predictor matrix $X$ and penalty matrix $D$; the implication is that, if $D$ is fixed and its null space is not too large (the dimension of its null space is at most the number of samples), and $X$ and response vector $y$ jointly follow an absolutely continuous distribution, then the generalized lasso problem has a unique solution almost surely, regardless of the number of predictors relative to the number of samples. This effectively generalizes previous uniqueness results for the lasso problem (which corresponds to the special case $D=I$). Further, we extend our study to the case in which the loss is given by the negative log-likelihood from a generalized linear model. In addition to uniqueness results, we derive results on the local stability of generalized lasso solutions that might be of interest in their own right.
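For reference, the problem analyzed here is the generalized lasso; in standard notation, with a tuning parameter $\lambda \ge 0$ that the abstract does not introduce explicitly,
$$\hat{\beta} \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2}\,\|y - X\beta\|_2^2 + \lambda \|D\beta\|_1,$$
so the ordinary lasso is recovered by taking $D = I$, and the generalized-linear-model extension replaces the squared-error term with the corresponding negative log-likelihood.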
We study the absolute penalized maximum partial likelihood estimator in sparse, high-dimensional Cox proportional hazards regression models where the number of time-dependent covariates can be larger than the sample size. We establish oracle inequalities based on natural extensions of the compatibility and cone invertibility factors of the Hessian matrix at the true regression coefficients. Similar results based on an extension of the restricted eigenvalue can also be proved by our method. However, the presented oracle inequalities are sharper since the compatibility and cone invertibility factors are always greater than the corresponding restricted eigenvalue. In the Cox regression model, the Hessian matrix is based on time-dependent covariates in censored risk sets, so that the compatibility and cone invertibility factors, and the restricted eigenvalue as well, are random variables even when they are evaluated for the Hessian at the true regression coefficients. Under mild conditions, we prove that these quantities are bounded from below by positive constants for time-dependent covariates, including cases where the number of covariates is of greater order than the sample size. Consequently, the compatibility and cone invertibility factors can be treated as positive constants in our oracle inequalities.
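As a reminder of the estimator under study, in conventional Cox-model notation that the abstract does not spell out (event indicators $\delta_i$, event times $t_i$, risk sets $R(t_i)$, possibly time-dependent covariates $Z_i(\cdot)$, and up to normalization conventions), the $\ell_1$-penalized partial likelihood estimator solves
$$\hat{\beta} \in \operatorname*{arg\,min}_{\beta}\Big\{ -\sum_{i:\,\delta_i = 1}\Big[\beta^{\top} Z_i(t_i) - \log \sum_{j \in R(t_i)} \exp\big(\beta^{\top} Z_j(t_i)\big)\Big] + \lambda \|\beta\|_1 \Big\},$$
and the compatibility, cone invertibility, and restricted eigenvalue quantities discussed above are evaluated for the Hessian of the unpenalized term at the true coefficients.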
Meinshausen and Bühlmann [Ann. Statist. 34 (2006) 1436--1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent even when the number of variables is of greater order than the sample size. Zhao and Yu [J. Machine Learning Research 7 (2006) 2541--2567] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition. That paper showed that, under this condition, the LASSO selects exactly the set of nonzero regression coefficients, provided that these coefficients are bounded away from zero at a certain rate. In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of design variables, we prove that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the LASSO in model selection, it is proved that the sum of squared errors for the mean response and the $\ell_{\alpha}$-loss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs.
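For orientation, the LASSO estimator referred to throughout is, up to the scaling of the penalty level,
$$\hat{\beta} \in \operatorname*{arg\,min}_{\beta} \; \tfrac{1}{2}\,\|y - X\beta\|_2^2 + \lambda \|\beta\|_1,$$
and the sparse Riesz condition requires, roughly, that there exist constants $0 < c_* \le c^* < \infty$ and a rank $q$ with $c_* \le \|X_A v\|_2^2 / (n\|v\|_2^2) \le c^*$ for all index sets $A$ of size at most $q$ and all nonzero $v$; the exact constants and indexing follow the paper's notation rather than this sketch.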
We consider a high-dimensional regression model with a possible change-point due to a covariate threshold and develop the Lasso estimator of regression coefficients as well as the threshold parameter. Our Lasso estimator not only selects covariates but also selects a model between the linear and threshold regression models. Under a sparsity assumption, we derive non-asymptotic oracle inequalities for both the prediction risk and the $\ell_1$ estimation loss for regression coefficients. Since the Lasso estimator selects variables and the model simultaneously, we show that oracle inequalities can be established without pretesting the existence of the threshold effect. Furthermore, we establish conditions under which the estimation error of the unknown threshold parameter can be bounded by a nearly $n^{-1}$ factor even when the number of regressors is much larger than the sample size ($n$). We illustrate the usefulness of our proposed estimation method via Monte Carlo simulations and an application to real data.
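A sketch of the working model may help fix ideas (the notation here is illustrative and not taken from the abstract): with a threshold covariate $q_i$ and unknown threshold $\tau$,
$$y_i = x_i^{\top}\beta + x_i^{\top}\delta\, 1\{q_i < \tau\} + \varepsilon_i,$$
so that $\delta = 0$ collapses the model to ordinary linear regression; the Lasso penalty is applied to the coefficient vector $(\beta, \delta)$ while $\tau$ is estimated jointly, which is what allows variable selection and model selection to happen simultaneously.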
Consider the case in which we observe $n$ independent and identically distributed copies of a random variable with a probability distribution known to be an element of a specified statistical model. We are interested in estimating an infinite-dimensional target parameter that minimizes the expectation of a specified loss function. In \cite{generally_efficient_TMLE} we defined an estimator that minimizes the empirical risk over all multivariate real-valued cadlag functions in the parameter space with variation norm bounded by some constant $M$, and selects $M$ by cross-validation. We referred to this estimator as the Highly Adaptive Lasso (HAL) estimator because the constraint can be formulated as a bound $M$ on the sum of the absolute values of the coefficients of a linear combination of a very large number of basis functions. In particular, when the target parameter is a conditional mean, the estimator can be implemented with the standard LASSO regression estimator. In \cite{generally_efficient_TMLE} we proved that the HAL estimator is consistent w.r.t. the (quadratic) loss-based dissimilarity at a rate faster than $n^{-1/2}$ (i.e., faster than $n^{-1/4}$ w.r.t. a norm), even when the parameter space is completely nonparametric. The only assumption required for this rate is that the true parameter function has a finite variation norm. The loss-based dissimilarity is often equivalent to the square of an $L^2(P_0)$-type norm. In this article, we establish that, under a weak continuity condition, the HAL estimator is also uniformly consistent.
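Concretely, in the conditional-mean case mentioned above, the HAL fit can be written as a lasso over a large dictionary of indicator basis functions; a minimal sketch, with knots $u_j$ and coordinate subsets $s_j$ that are implementation details not given in the abstract, is
$$\hat{f}(x) = \hat{\beta}_0 + \sum_{j} \hat{\beta}_j\, 1\{x_{s_j} \ge u_j\} \qquad \text{subject to } \sum_j |\hat{\beta}_j| \le M,$$
with the variation-norm bound $M$ selected by cross-validation as described.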
The lasso procedure is ubiquitous in the statistical and signal processing literature and, as such, is the target of substantial theoretical and applied research. While much of this research focuses on the desirable properties that the lasso possesses---predictive risk consistency, sign consistency, correct model selection---all of it assumes that the tuning parameter is chosen in an oracle fashion. Yet this is impossible in practice. Instead, data analysts must use the data twice: once to choose the tuning parameter and again to estimate the model. But only heuristics have ever justified such a procedure. To this end, we give the first definitive answer about the risk consistency of the lasso when the tuning parameter is chosen via cross-validation. We show that, under some restrictions on the design matrix, the lasso estimator is still risk consistent with an empirically chosen tuning parameter.
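The two-stage use of the data described above can be summarized as follows (the fold notation is ours, not the abstract's): with folds $I_1,\dots,I_V$ and lasso fits $\hat{\beta}^{(-v)}_{\lambda}$ computed with fold $I_v$ held out, cross-validation selects
$$\hat{\lambda} \in \operatorname*{arg\,min}_{\lambda > 0}\; \sum_{v=1}^{V} \sum_{i \in I_v} \big(y_i - x_i^{\top}\hat{\beta}^{(-v)}_{\lambda}\big)^2,$$
after which the lasso is refit on the full data at $\hat{\lambda}$; the risk-consistency result concerns the predictive risk of this refit estimator.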