We investigate two important properties of M-estimators, namely robustness and tractability, in the linear regression setting when the observations are contaminated by arbitrary outliers. Specifically, robustness is the statistical property that the estimator should always be close to the underlying true parameters regardless of the distribution of the outliers, and tractability is the computational property that the estimator can be computed efficiently even if the objective function of the M-estimator is non-convex. In this article, by studying the landscape of the empirical risk, we show that under mild conditions many M-estimators enjoy good robustness and tractability properties simultaneously when the percentage of outliers is small. We further extend our analysis to the high-dimensional setting, where the number of parameters exceeds the number of samples ($p \gg n$), and prove that when the proportion of outliers is small, penalized M-estimators with an $L_1$ penalty enjoy robustness and tractability simultaneously. Our research provides an analytic approach for examining the effects of outliers and tuning parameters on the robustness and tractability of some families of M-estimators. Simulations and a case study illustrate the usefulness of our theoretical results for M-estimators under Welsch's exponential squared loss.
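To make the loss concrete, here is a minimal sketch of an M-estimator under Welsch's exponential squared loss, assuming the parameterization $\rho_\gamma(t) = 1 - \exp(-t^2/\gamma)$ and a plain proximal gradient solver. The solver is our own illustrative choice, not necessarily the algorithm analyzed in the article. Setting `lam = 0` gives the unpenalized estimator; `lam > 0` adds the $L_1$ penalty used in the high-dimensional case (the demo data here has $p < n$). The names `welsch_m_estimate`, `gamma` and `lam` are hypothetical, and the data are synthetic.

```python
import numpy as np


def welsch_psi(r, gamma):
    # Derivative of rho(t) = 1 - exp(-t^2 / gamma). It redescends to 0,
    # so gross outliers contribute almost nothing to the gradient.
    return (2.0 * r / gamma) * np.exp(-r**2 / gamma)


def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def welsch_m_estimate(X, y, gamma=4.0, lam=0.0, n_iter=500):
    """Proximal gradient on (1/n) * sum_i rho(y_i - x_i^T b) + lam * ||b||_1."""
    n, p = X.shape
    # |rho''| <= 2 / gamma, so the smooth part has an L-Lipschitz gradient.
    L = (2.0 / gamma) * np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / L
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ welsch_psi(y - X @ beta, gamma) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


# Illustrative synthetic data: 10% of the responses are shifted by +20.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)
y[:20] += 20.0

beta_hat = welsch_m_estimate(X, y)          # unpenalized M-estimator
beta_l1 = welsch_m_estimate(X, y, lam=0.2)  # L1-penalized version
print(np.round(beta_hat, 2))
```

Although the objective is non-convex, the redescending derivative means the shifted responses barely affect the iterates. This mirrors the kind of behavior the abstract describes when the outlier fraction is small.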
The divide-and-conquer method is a common strategy for handling massive data. In this article, we study the divide-and-conquer method for cubic-rate estimators, that is, M-estimators with a cube-root convergence rate, under the massive data framework. We develop a general theory for establishing the asymptotic distribution of the aggregated M-estimators obtained by simple averaging. Under a certain condition on the growth rate of the number of subgroups, the resulting aggregated estimators are shown to have a faster convergence rate and an asymptotically normal distribution, which makes them more tractable in both computation and inference than the original M-estimators based on the pooled data. Our theory applies to a wide class of M-estimators with a cube-root convergence rate, including the location estimator, the maximum score estimator and the value search estimator. The empirical performance in simulations also validates our theoretical findings.
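The aggregation step itself is simple. The sketch below assumes Chernoff's mode estimator (the centre of the fixed-width window containing the most observations) as a stand-in cube-root location estimator; the article's location estimator may be defined differently. The data are split into equal-size subgroups and the subgroup estimates are averaged. The names `chernoff_mode` and `divide_and_conquer` are hypothetical, and the article's condition on how fast the number of subgroups may grow is not checked here.

```python
import bisect
import random
import statistics


def chernoff_mode(xs, h=1.0):
    """Centre of the window [c - h, c + h] containing the most points (cube-root rate)."""
    s = sorted(xs)
    best_count, best_centre = -1, s[0] + h
    for i, left in enumerate(s):
        # Number of sorted points in [left, left + 2h].
        j = bisect.bisect_right(s, left + 2.0 * h)
        if j - i > best_count:
            best_count, best_centre = j - i, left + h
    return best_centre


def divide_and_conquer(xs, n_groups, estimator):
    """Split xs into n_groups equal blocks, estimate on each, return the simple average."""
    m = len(xs) // n_groups
    estimates = [estimator(xs[k * m:(k + 1) * m]) for k in range(n_groups)]
    return statistics.fmean(estimates)


# Synthetic N(0, 1) data, whose mode is 0.
random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(20_000)]
pooled = chernoff_mode(data)
aggregated = divide_and_conquer(data, 20, chernoff_mode)
print(round(pooled, 3), round(aggregated, 3))
```

Each subgroup estimate is computed independently, so the loop over subgroups parallelizes trivially. The averaging is what yields an approximately normal aggregated estimator rather than the non-normal cube-root limit.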
Classical least squares estimators are well known to be robust with respect to moment assumptions on the error distribution in a wide variety of finite-dimensional statistical problems; generally, only a second moment assumption is required for least squares estimators to maintain the same rate of convergence that they would satisfy if the errors were Gaussian. In this paper, we give a geometric characterization of the robustness of shape-restricted least squares estimators (LSEs) to error distributions with an $L_{2,1}$ moment, in terms of the 'localized envelopes' of the model. This envelope perspective gives a systematic approach to proving oracle inequalities for the LSEs in shape-restricted regression problems in the random design setting, under a minimal $L_{2,1}$ moment assumption on the errors. The canonical isotonic and convex regression models, as well as a more challenging additive regression model with shape constraints, are studied in detail. Strikingly, in the additive model both the adaptation and robustness properties of the LSE can be preserved, up to error distributions with an $L_{2,1}$ moment, for estimating the shape-constrained proxy of the marginal $L_2$ projection of the true regression function. This holds essentially regardless of whether the additive model structure is correctly specified. The new envelope perspective goes beyond shape-constrained models. Indeed, at a general level, the localized envelopes give a sharp characterization of the convergence rate of the $L_2$ loss of the LSE between the worst-case rate suggested by the recent work of the authors [25] and the best possible parametric rate.
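For the canonical isotonic model, the LSE over non-decreasing sequences can be computed exactly by the classical pool-adjacent-violators algorithm (PAVA). The following is a minimal pure-Python sketch of that computation only. It illustrates what the estimator is and does not reproduce the article's envelope analysis. The function name `isotonic_lse` is hypothetical.

```python
def isotonic_lse(y, w=None):
    """Weighted least squares fit of a non-decreasing sequence via PAVA."""
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while adjacent blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand each block's mean back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


print(isotonic_lse([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

Each violation pools two neighbouring blocks into their weighted mean. This is exactly the projection of the data onto the cone of monotone sequences in the least squares sense.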
We consider robust estimators of large covariance matrices, comprising regularized (linear shrinkage) modifications of Maronna's classical M-estimators. These estimators are robust to outliers while remaining well defined when the number of samples does not exceed the number of variables. Using tools from random matrix theory, we characterize the asymptotic performance of such estimators when the numbers of samples and variables grow large together. In particular, our results show that, in the absence of outliers, many estimators of the regularized-Maronna type share the same asymptotic performance, and for these estimators we present a data-driven method for choosing the asymptotically optimal regularization parameter with respect to a quadratic loss. Robustness in the presence of outliers is then studied: in the non-regularized case, a large-dimensional robustness metric is proposed and explicitly computed for two particular types of estimators, revealing interesting differences depending on the underlying contamination model. The impact of outliers on regularized estimators is then studied, revealing notable differences from the non-regularized case and leading to new practical insights into the choice of particular estimators.
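A regularized Maronna-type estimator is usually defined implicitly as the solution of a fixed-point equation. The sketch below assumes one common form from the random-matrix literature: $\hat C = (1-\rho)\frac{1}{n}\sum_i u\left(\frac{1}{N} x_i^\top \hat C^{-1} x_i\right) x_i x_i^\top + \rho I_N$, where the weight function is $u(x) = (1+t)/(t+x)$ with $t > 0$. Both the exact parameterization and the choice of `u` are assumptions and may differ from the article. The data-driven choice of $\rho$ is not implemented. The function name `regularized_maronna` is hypothetical.

```python
import numpy as np


def regularized_maronna(X, rho, u, n_iter=1000, tol=1e-10):
    """Fixed-point iteration for a shrinkage Maronna-type covariance estimator.

    X is n x N with observations in rows; rho in (0, 1] is the shrinkage weight.
    """
    n, N = X.shape
    C = np.eye(N)
    for _ in range(n_iter):
        # Normalized quadratic forms d_i = x_i^T C^{-1} x_i / N via a linear solve.
        d = np.einsum("ij,ji->i", X, np.linalg.solve(C, X.T)) / N
        # Weighted sample covariance sum_i u(d_i) x_i x_i^T / n, shrunk toward I.
        C_new = (1 - rho) * (X.T * u(d)) @ X / n + rho * np.eye(N)
        if np.linalg.norm(C_new - C) < tol * np.linalg.norm(C):
            return C_new
        C = C_new
    return C


# Synthetic data with a few gross outliers (illustrative contamination).
rng = np.random.default_rng(0)
n, N = 200, 20
X = rng.standard_normal((n, N))
X[:10] *= 10.0
t = 0.1
u = lambda x: (1 + t) / (t + x)  # assumed Maronna-type weight function
C_hat = regularized_maronna(X, rho=0.2, u=u)
print(np.round(np.linalg.eigvalsh(C_hat)[[0, -1]], 3))
```

Because $u$ decreases in $d_i$, observations with large Mahalanobis-type norms are downweighted. The $\rho I_N$ term keeps the estimate invertible even when $n \le N$.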
In this article, we develop a modern perspective on Akaike's Information Criterion and Mallows' Cp for model selection. Despite the differences in their respective motivations, they are equivalent in the special case of Gaussian linear regression. In this case they are also equivalent to a third criterion, an unbiased estimator of the quadratic prediction loss, derived from loss estimation theory. Our first contribution is an explicit link between loss estimation and model selection through a new oracle inequality. We then show that the form of the unbiased estimator of the quadratic prediction loss under a Gaussian assumption still holds under a more general distributional assumption, namely the family of spherically symmetric distributions. A feature of our results is that our criterion does not rely on the specific form of the distribution, only on its spherical symmetry. Moreover, this family of distributions allows some dependence between the observations, a case that is not often studied.
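In the Gaussian linear regression case, the unbiased loss estimator and Mallows' Cp have simple closed forms. For a submodel $S$ with $k$ columns, $\hat\delta_S = \mathrm{RSS}_S + (2k - n)\sigma^2$ estimates $\|X_S\hat\beta_S - X\beta\|^2$ without bias, and $C_p = \hat\delta_S/\sigma^2$. The sketch below plugs in $\hat\sigma^2 = \mathrm{RSS}_{\text{full}}/(n-p)$ from the full model, which is a common convention and our assumption here. It covers only the Gaussian case, not the spherically symmetric extension. The helper names and the synthetic data are hypothetical.

```python
import numpy as np


def rss(X, y):
    # Residual sum of squares of the least squares fit of y on X.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def loss_estimate(X, y, subset, sigma2):
    # Unbiased estimate of the quadratic prediction loss of submodel `subset`.
    return rss(X[:, subset], y) + (2 * len(subset) - len(y)) * sigma2


def mallows_cp(X, y, subset, sigma2):
    # Mallows' Cp = RSS_S / sigma^2 - n + 2k, i.e. loss_estimate / sigma^2.
    return rss(X[:, subset], y) / sigma2 - len(y) + 2 * len(subset)


rng = np.random.default_rng(0)
n, p = 100, 4
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, -1.0, 0.0, 0.0]) + rng.standard_normal(n)
full = list(range(p))
sigma2_hat = rss(X, y) / (n - p)  # variance estimated from the full model
scores = {tuple(S): mallows_cp(X, y, list(S), sigma2_hat)
          for S in ([0], [0, 1], [0, 1, 2], full)}
print({k: round(v, 2) for k, v in scores.items()})
```

With $\hat\sigma^2$ taken from the full model, the full model always gets $C_p = p$ exactly. The two criteria rank submodels identically because they differ only by the factor $\hat\sigma^2$.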
In the Gaussian white noise model, we study the estimation of an unknown multidimensional function $f$ in the uniform norm by kernel methods. The performance of the procedures is measured from the maxiset point of view: we determine the set of functions that are well estimated (at a prescribed rate) by each procedure. In this paper, we determine the maxisets associated with kernel estimators and with the Lepski procedure for rates of convergence of the form $(\log n/n)^{\beta/(2\beta+d)}$. We characterize these maxisets in terms of Besov and Hölder spaces of regularity $\beta$.
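The Lepski procedure chooses a bandwidth by comparing kernel estimates across a grid. The sketch below is a pointwise, one-dimensional version on a discretized regression model, used here as a stand-in for the white noise model; the article works in sup-norm and in dimension $d$. It selects the largest bandwidth $h$ whose estimate stays within a noise threshold of every estimate with a smaller bandwidth. The box kernel, the threshold $\kappa\sigma\sqrt{2\log n / m_g}$ (with $m_g$ the number of points in the window) and the names `box_kernel_estimate` and `lepski_bandwidth` are all illustrative assumptions.

```python
import math
import random


def box_kernel_estimate(xs, ys, x0, h):
    # Local average of responses with |x_i - x0| <= h, plus the window size.
    window = [y for x, y in zip(xs, ys) if abs(x - x0) <= h]
    return sum(window) / len(window), len(window)


def lepski_bandwidth(xs, ys, x0, bandwidths, sigma, kappa=1.0):
    """Largest h whose estimate agrees with every smaller-h estimate up to noise."""
    hs = sorted(bandwidths)
    est = {h: box_kernel_estimate(xs, ys, x0, h) for h in hs}
    # Noise level of each local average, inflated by sqrt(2 log n); this extra
    # log factor mirrors the log n appearing in the rate.
    thr = {h: kappa * sigma * math.sqrt(2.0 * math.log(len(xs)) / est[h][1])
           for h in hs}
    admissible = [
        h for h in hs
        if all(abs(est[h][0] - est[g][0]) <= thr[g] for g in hs if g < h)
    ]
    return max(admissible)  # the smallest h is always admissible


random.seed(0)
n = 1000
xs = [i / n for i in range(n)]
sigma = 0.1
ys = [abs(x - 0.5) + random.gauss(0.0, sigma) for x in xs]  # kink at x0 = 0.5
hs = [0.01, 0.02, 0.05, 0.1, 0.2, 0.4]
h_hat = lepski_bandwidth(xs, ys, 0.5, hs, sigma)
print(h_hat)
```

If $f$ is smooth near $x_0$, larger bandwidths pass every comparison and are kept, reducing variance. Near a kink, the bias of large windows pushes the estimate outside the thresholds and a smaller $h$ is selected.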