Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

A Massive Data Framework for M-Estimators with Cubic-Rate

85 0 0.0 ( 0 )

Download Cite

Added by Wenbin Lu

Publication date 2016

fields Mathematical Statistics

and research's language is English

Authors Chengchun Shi - Wenbin Lu - Rui Song

Statistics Theory Statistics Theory

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The divide and conquer method is a common strategy for handling massive data. In this article, we study the divide and conquer method for cubic-rate estimators under the massive data framework. We develop a general theory for establishing the asymptotic distribution of the aggregated M-estimators using a simple average. Under certain condition on the growing rate of the number of subgroups, the resulting aggregated estimators are shown to have faster convergence rate and asymptotic normal distribution, which are more tractable in both computation and inference than the original M-estimators based on pooled data. Our theory applies to a wide class of M-estimators with cube root convergence rate, including the location estimator, maximum score estimator and value search estimator. Empirical performance via simulations also validate our theoretical findings.

rate research

Robustness and Tractability for Non-convex M-estimators

77 - Ruizhi Zhang , Yajun Mei , Jianjun Shi 2019

We investigate two important properties of M-estimator, namely, robustness and tractability, in linear regression setting, when the observations are contaminated by some arbitrary outliers. Specifically, robustness means the statistical property that the estimator should always be close to the underlying true parameters {em regardless of the distribution of the outliers}, and tractability indicates the computational property that the estimator can be computed efficiently, even if the objective function of the M-estimator is {em non-convex}. In this article, by learning the landscape of the empirical risk, we show that under mild conditions, many M-estimators enjoy nice robustness and tractability properties simultaneously, when the percentage of outliers is small. We further extend our analysis to the high-dimensional setting, where the number of parameters is greater than the number of samples, $p gg n$, and prove that when the proportion of outliers is small, the penalized M-estimators with {em $L_1$} penalty will enjoy robustness and tractability simultaneously. Our research provides an analytic approach to see the effects of outliers and tuning parameters on the robustness and tractability for some families of M-estimators. Simulation and case study are presented to illustrate the usefulness of our theoretical results for M-estimators under Welschs exponential squared loss.

Statistics Theory Statistics Theory

Large-dimensional behavior of regularized Maronnas M-estimators of covariance matrices

105 - Nicolas Auguin , David Morales-Jimenez , Matthew R. McKay 2018

Robust estimators of large covariance matrices are considered, comprising regularized (linear shrinkage) modifications of Maronnas classical M-estimators. These estimators provide robustness to outliers, while simultaneously being well-defined when the number of samples does not exceed the number of variables. By applying tools from random matrix theory, we characterize the asymptotic performance of such estimators when the numbers of samples and variables grow large together. In particular, our results show that, when outliers are absent, many estimators of the regularized-Maronna type share the same asymptotic performance, and for these estimators we present a data-driven method for choosing the asymptotically optimal regularization parameter with respect to a quadratic loss. Robustness in the presence of outliers is then studied: in the non-regularized case, a large-dimensional robustness metric is proposed, and explicitly computed for two particular types of estimators, exhibiting interesting differences depending on the underlying contamination model. The impact of outliers in regularized estimators is then studied, with interesting differences with respect to the non-regularized case, leading to new practical insights on the choice of particular estimators.

Statistics Theory Statistics Theory

Distributed Statistical Inference for Massive Data

229 - Song Xi Chen , Liuhua Peng 2018

This paper considers distributed statistical inference for general symmetric statistics %that encompasses the U-statistics and the M-estimators in the context of massive data where the data can be stored at multiple platforms in different locations. In order to facilitate effective computation and to avoid expensive communication among different platforms, we formulate distributed statistics which can be conducted over smaller data blocks. The statistical properties of the distributed statistics are investigated in terms of the mean square error of estimation and asymptotic distributions with respect to the number of data blocks. In addition, we propose two distributed bootstrap algorithms which are computationally effective and are able to capture the underlying distribution of the distributed statistics. Numerical simulation and real data applications of the proposed approaches are provided to demonstrate the empirical performance.

Statistics Theory Statistics Theory

Maxiset in sup-norm for kernel estimators

104 - Karine Bertin , Vincent Rivoirard 2007

In the Gaussian white noise model, we study the estimation of an unknown multidimensional function $f$ in the uniform norm by using kernel methods. The performances of procedures are measured by using the maxiset point of view: we determine the set of functions which are well estimated (at a prescribed rate) by each procedure. So, in this paper, we determine the maxisets associated to kernel estimators and to the Lepski procedure for the rate of convergence of the form $(log n/n)^{be/(2be+d)}$. We characterize the maxisets in terms of Besov and Holder spaces of regularity $beta$.

Statistics Theory Statistics Theory

Sub-Gaussian mean estimators

70 - Luc Devroye , Matthieu Lerasle , Gabor Lugosi 2015

We discuss the possibilities and limitations of estimating the mean of a real-valued random variable from independent and identically distributed observations from a non-asymptotic point of view. In particular, we define estimators with a sub-Gaussian behavior even for certain heavy-tailed distributions. We also prove various impossibility results for mean estimators.

Statistics Theory Statistics Theory

comments

Fetching comments

Oran 1 University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

A Massive Data Framework for M-Estimators with Cubic-Rate

Ask ChatGPT about the research

No Arabic abstract

Read More