
A Massive Data Framework for M-Estimators with Cubic-Rate

Posted by Wenbin Lu
Publication date: 2016
Research field: Mathematical Statistics
Paper language: English





The divide and conquer method is a common strategy for handling massive data. In this article, we study the divide and conquer method for cubic-rate estimators under the massive data framework. We develop a general theory for establishing the asymptotic distribution of aggregated M-estimators obtained by simple averaging. Under a condition on the growth rate of the number of subgroups, the resulting aggregated estimators are shown to have a faster convergence rate and an asymptotically normal distribution, which makes them more tractable in both computation and inference than the original M-estimators based on the pooled data. Our theory applies to a wide class of M-estimators with cube-root convergence rate, including the location estimator, the maximum score estimator, and the value search estimator. Simulation studies also validate our theoretical findings.
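To make the aggregation scheme concrete, here is a minimal Python sketch of the divide-and-conquer average for one cube-root estimator, a Chernoff-type mode estimator of location (the maximizer of the number of observations in a fixed window). The window half-width, the grid search over data points, and the block count are illustrative choices for this sketch, not specifications from the paper.

import numpy as np

def mode_estimator(x, half_width=1.0):
    # Chernoff-type location M-estimator: the data point maximizing the
    # number of observations within +/- half_width; cube-root convergence.
    s = np.sort(x)
    counts = (np.searchsorted(s, s + half_width, side="right")
              - np.searchsorted(s, s - half_width, side="left"))
    return s[np.argmax(counts)]

def dc_estimator(x, n_blocks):
    # Divide and conquer: compute the M-estimator on each subgroup,
    # then aggregate by a simple average, as in the abstract.
    blocks = np.array_split(x, n_blocks)
    return float(np.mean([mode_estimator(b) for b in blocks]))

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50_000)
print(dc_estimator(x, n_blocks=50))  # close to the true location 2.0

The averaging step is where the improvement over the cube-root rate comes from: each subgroup estimator carries a cube-root error, but averaging across independent subgroups reduces the variance, provided (as the abstract states) the number of subgroups does not grow too fast.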


Read also

We investigate two important properties of M-estimators, namely robustness and tractability, in the linear regression setting when the observations are contaminated by arbitrary outliers. Specifically, robustness is the statistical property that the estimator should always be close to the underlying true parameters regardless of the distribution of the outliers, and tractability is the computational property that the estimator can be computed efficiently even if the objective function of the M-estimator is non-convex. In this article, by studying the landscape of the empirical risk, we show that under mild conditions many M-estimators enjoy both robustness and tractability simultaneously when the percentage of outliers is small. We further extend our analysis to the high-dimensional setting, where the number of parameters is greater than the number of samples, $p \gg n$, and prove that when the proportion of outliers is small, penalized M-estimators with an $L_1$ penalty enjoy robustness and tractability simultaneously. Our research provides an analytic approach for assessing the effects of outliers and tuning parameters on the robustness and tractability of some families of M-estimators. A simulation and a case study illustrate the usefulness of our theoretical results for M-estimators under Welsch's exponential squared loss.
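As a concrete illustration of the tractability claim, the following sketch fits a robust linear regression under Welsch's exponential squared loss by plain gradient descent from an ordinary least squares warm start. The loss constant, step size, and iteration count are assumptions for this example, not values from the article.

import numpy as np

def welsch_grad(beta, X, y, c=2.0):
    # Welsch loss per residual r: (c**2 / 2) * (1 - exp(-(r/c)**2)).
    # Its derivative r * exp(-(r/c)**2) downweights large residuals.
    r = y - X @ beta
    w = np.exp(-(r / c) ** 2)
    return -(X.T @ (r * w)) / len(y)

def welsch_regression(X, y, c=2.0, lr=0.5, n_iter=500):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS warm start
    for _ in range(n_iter):
        beta -= lr * welsch_grad(beta, X, y, c)
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=500)
y[:25] += 20.0  # 5% arbitrary outliers
print(welsch_regression(X, y))  # close to beta_true despite contamination

Although the Welsch risk is non-convex, the exponential weight suppresses the gradient contribution of gross outliers, which is the intuition behind the benign landscape the article analyzes.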
Robust estimators of large covariance matrices are considered, comprising regularized (linear shrinkage) modifications of Maronna's classical M-estimators. These estimators provide robustness to outliers while simultaneously being well defined when the number of samples does not exceed the number of variables. By applying tools from random matrix theory, we characterize the asymptotic performance of such estimators when the numbers of samples and variables grow large together. In particular, our results show that, when outliers are absent, many estimators of the regularized-Maronna type share the same asymptotic performance, and for these estimators we present a data-driven method for choosing the asymptotically optimal regularization parameter with respect to a quadratic loss. Robustness in the presence of outliers is then studied: in the non-regularized case, a large-dimensional robustness metric is proposed and explicitly computed for two particular types of estimators, exhibiting interesting differences depending on the underlying contamination model. The impact of outliers on regularized estimators is then studied, with interesting differences with respect to the non-regularized case, leading to new practical insights on the choice of particular estimators.
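For intuition, the next sketch iterates the fixed-point equation of a regularized (shrinkage) M-estimator of scatter, using the Tyler-type weight u(t) = p/t as one concrete choice of Maronna's weight function; the weight function, shrinkage level, and iteration count are illustrative assumptions rather than the exact estimators studied in the paper.

import numpy as np

def regularized_scatter(X, rho, n_iter=50):
    # Fixed-point iteration for a shrinkage M-estimator of scatter:
    #   C = (1 - rho) * (1/n) sum_i u(x_i' C^{-1} x_i) x_i x_i' + rho * I,
    # with Tyler-type weights u(t) = p / t (an assumed example choice).
    n, p = X.shape
    C = np.eye(p)
    for _ in range(n_iter):
        inv_C = np.linalg.inv(C)
        t = np.einsum("ij,jk,ik->i", X, inv_C, X)  # x_i' C^{-1} x_i
        S = (X * (p / t)[:, None]).T @ X / n       # reweighted scatter
        C = (1 - rho) * S + rho * np.eye(p)
    return C

rng = np.random.default_rng(2)
X = rng.standard_t(df=3, size=(200, 400))  # heavy tails, more variables than samples
C_hat = regularized_scatter(X, rho=0.3)

The shrinkage term rho * I is what keeps the estimator well defined when the number of variables exceeds the number of samples, exactly the regime the abstract emphasizes.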
Song Xi Chen, Liuhua Peng (2018)
This paper considers distributed statistical inference for general symmetric statistics, which encompass U-statistics and M-estimators, in the context of massive data where the data can be stored at multiple platforms in different locations. In order to facilitate effective computation and to avoid expensive communication among different platforms, we formulate distributed statistics that can be computed over smaller data blocks. The statistical properties of the distributed statistics are investigated in terms of the mean square error of estimation and their asymptotic distributions with respect to the number of data blocks. In addition, we propose two distributed bootstrap algorithms that are computationally effective and are able to capture the underlying distribution of the distributed statistics. Numerical simulations and real data applications of the proposed approaches are provided to demonstrate their empirical performance.
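One generic pattern consistent with this setup (though not necessarily the paper's specific algorithms) is to evaluate the statistic independently on each data block, average the block-level values, and bootstrap by resampling those block-level values so that no raw data has to move between platforms. The statistic, block count, and confidence level below are assumptions for the example.

import numpy as np

def distributed_statistic(blocks, stat):
    # Evaluate the symmetric statistic on each block and average.
    return float(np.mean([stat(b) for b in blocks]))

def block_bootstrap_ci(blocks, stat, n_boot=2000, alpha=0.05, seed=0):
    # Resample block-level statistics (cheap and communication-free)
    # to approximate the distribution of the aggregated statistic.
    rng = np.random.default_rng(seed)
    vals = np.array([stat(b) for b in blocks])
    boots = [np.mean(rng.choice(vals, size=len(vals), replace=True))
             for _ in range(n_boot)]
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(3)
blocks = np.array_split(rng.normal(size=20_000), 20)
print(distributed_statistic(blocks, np.var))
print(block_bootstrap_ci(blocks, np.var))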
In the Gaussian white noise model, we study the estimation of an unknown multidimensional function $f$ in the uniform norm using kernel methods. The performance of a procedure is measured from the maxiset point of view: we determine the set of functions that are well estimated (at a prescribed rate) by each procedure. In this paper, we determine the maxisets associated with kernel estimators and with the Lepski procedure for rates of convergence of the form $(\log n/n)^{\beta/(2\beta+d)}$. We characterize the maxisets in terms of Besov and Hölder spaces of regularity $\beta$.
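The Lepski procedure can be sketched on a discretized version of the model: compare the kernel estimate at a large bandwidth against the estimates at all smaller bandwidths, and keep the largest bandwidth whose estimate stays within a noise-level threshold of each of them. The Gaussian kernel, the bandwidth grid, and the threshold constant below are illustrative assumptions, not the paper's exact calibration.

import numpy as np

def kernel_smooth(y, grid, h):
    # Gaussian-kernel estimator on an equispaced design.
    K = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / h) ** 2)
    return (K @ y) / K.sum(axis=1)

def lepski_bandwidth(y, grid, bandwidths, sigma, C=1.2):
    # Largest h whose fit differs from every smaller-bandwidth fit
    # by no more than the stochastic error at the smaller bandwidth.
    n = len(y)
    hs = np.sort(np.asarray(bandwidths))[::-1]  # large to small
    fits = {h: kernel_smooth(y, grid, h) for h in hs}
    for i, h in enumerate(hs):
        if all(np.max(np.abs(fits[h] - fits[h2]))
               <= C * sigma * np.sqrt(np.log(n) / (n * h2))
               for h2 in hs[i + 1:]):
            return h
    return hs[-1]

The maxiset analysis then identifies exactly which functions (in Besov and Hölder balls) this kind of rule estimates at the rate $(\log n/n)^{\beta/(2\beta+d)}$.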
We discuss the possibilities and limitations of estimating the mean of a real-valued random variable from independent and identically distributed observations from a non-asymptotic point of view. In particular, we define estimators with sub-Gaussian behavior even for certain heavy-tailed distributions. We also prove various impossibility results for mean estimators.
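One classical construction achieving the sub-Gaussian behavior mentioned here, for any distribution with finite variance, is the median-of-means estimator: split the sample into blocks, average within blocks, and take the median of the block means. The block count below is an illustrative assumption (it governs the confidence level).

import numpy as np

def median_of_means(x, n_blocks, seed=0):
    # Split into blocks, average each block, return the median of the means.
    # The median is insensitive to the few blocks ruined by heavy tails,
    # giving sub-Gaussian deviations under only a finite-variance assumption.
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(x), n_blocks)
    return float(np.median([b.mean() for b in blocks]))

rng = np.random.default_rng(4)
x = rng.standard_t(df=2.5, size=10_000)  # heavy-tailed, true mean 0
print(median_of_means(x, n_blocks=30))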