
Outlier Robust Mean Estimation with Subgaussian Rates via Stability

Posted by Ankit Pensia
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





We study the problem of outlier robust high-dimensional mean estimation under a finite covariance assumption, and more broadly under finite low-degree moment assumptions. We consider a standard stability condition from the recent robust statistics literature and prove that, except with exponentially small failure probability, there exists a large fraction of the inliers satisfying this condition. As a corollary, it follows that a number of recently developed algorithms for robust mean estimation, including iterative filtering and non-convex gradient descent, give optimal error estimators with (near-)subgaussian rates. Previous analyses of these algorithms gave significantly suboptimal rates. As a corollary of our approach, we obtain the first computationally efficient algorithm with subgaussian rate for outlier-robust mean estimation in the strong contamination model under a finite covariance assumption.
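To make the filtering idea concrete, the sketch below implements one common form of iterative (hard) filtering for $\epsilon$-corrupted samples whose inliers have near-identity covariance: repeatedly find the direction of largest empirical variance and discard the points that project farthest along it, until the covariance is spectrally bounded. The function name, the threshold constant, and the quantile-based removal rule are illustrative assumptions, not the exact procedure analyzed in the paper.

```python
# A minimal sketch of iterative filtering for robust mean estimation.
# Assumes inliers have covariance close to the identity; the threshold
# and the quantile-based removal rule are illustrative choices.
import numpy as np

def filter_mean(X, eps, thresh=9.0):
    """Return a robust mean estimate for an eps-corrupted sample X (n x d)."""
    X = X.copy()
    while True:
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        vals, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
        lam, v = vals[-1], vecs[:, -1]          # largest-variance direction
        if lam <= thresh:                       # covariance spectrally bounded:
            return mu                           # stability-style condition holds
        scores = ((X - mu) @ v) ** 2            # squared projections on v
        cutoff = np.quantile(scores, 1 - eps)   # drop the most extreme eps-fraction
        keep = scores < cutoff
        if keep.all() or not keep.any():        # nothing left to remove
            return mu
        X = X[keep]

# Example: 10% of points shifted far from the true mean (zero).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(900, 20)), rng.normal(8.0, 1.0, size=(100, 20))])
print(filter_mean(X, eps=0.1))                  # close to the zero vector
```

The intuition matches the stability result above: since a large fraction of the inliers is stable, any direction of abnormally large empirical variance must be driven by outliers, so thresholding along that direction tends to remove more outlier mass than inlier mass.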




Read also

Xinjia Chen (2008)
In this paper, we develop a computational approach for estimating the mean value of a quantity in the presence of uncertainty. We demonstrate that, under some mild assumptions, the upper and lower bounds of the mean value are efficiently computable via a sample reuse technique, whose computational complexity is shown to follow a Poisson distribution.
We develop efficient algorithms for estimating low-degree moments of unknown distributions in the presence of adversarial outliers. The guarantees of our algorithms improve in many cases significantly over the best previous ones, obtained in recent works of Diakonikolas et al., Lai et al., and Charikar et al. We also show that the guarantees of our algorithms match information-theoretic lower bounds for the class of distributions we consider. These improved guarantees allow us to give improved algorithms for independent component analysis and learning mixtures of Gaussians in the presence of outliers. Our algorithms are based on a standard sum-of-squares relaxation of the following conceptually simple optimization problem: among all distributions whose moments are bounded in the same way as for the unknown distribution, find the one that is closest in statistical distance to the empirical distribution of the adversarially corrupted sample.
We revisit the problem of estimating the mean of a real-valued distribution, presenting a novel estimator with sub-Gaussian convergence: intuitively, our estimator, on any distribution, is as accurate as the sample mean is for the Gaussian distribution of matching variance. Crucially, in contrast to prior works, our estimator does not require prior knowledge of the variance, and works across the entire gamut of distributions with bounded variance, including those without any higher moments. Parameterized by the sample size $n$, the failure probability $\delta$, and the variance $\sigma^2$, our estimator is accurate to within $\sigma\cdot(1+o(1))\sqrt{\frac{2\log\frac{1}{\delta}}{n}}$, tight up to the $1+o(1)$ factor. Our estimator construction and analysis give a framework generalizable to other problems, tightly analyzing a sum of dependent random variables by viewing the sum implicitly as a 2-parameter $\psi$-estimator, and constructing bounds using mathematical programming and duality techniques.
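The $\psi$-estimator construction above is not reproduced here; as a familiar point of comparison, the sketch below implements the classical median-of-means estimator, which also achieves sub-Gaussian-style deviations for any bounded-variance distribution, but only up to constant factors rather than the sharp $1+o(1)$ constant claimed in the abstract. The block-count rule and all names are our assumptions.

```python
# Median-of-means: a standard sub-Gaussian baseline (constants not sharp).
import numpy as np

def median_of_means(x, delta=0.05, seed=0):
    """Split the sample into O(log(1/delta)) blocks and return the
    median of the block means."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = min(n, max(1, int(np.ceil(8.0 * np.log(1.0 / delta)))))
    perm = np.random.default_rng(seed).permutation(n)   # shuffle before splitting
    return float(np.median([b.mean() for b in np.array_split(x[perm], k)]))
```

Shuffling before splitting guards against adversarial orderings of the input; the constant 8 in the block count is a conventional choice, not an optimized one.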
We study the problem of \emph{list-decodable mean estimation} for bounded covariance distributions. Specifically, we are given a set $T$ of points in $\mathbb{R}^d$ with the promise that an unknown $\alpha$-fraction of points in $T$, where $0 < \alpha < 1/2$, are drawn from an unknown mean and bounded covariance distribution $D$, and no assumptions are made on the remaining points. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the mean of $D$. We give the first practically viable estimator for this problem. In more detail, our algorithm is sample and computationally efficient, and achieves information-theoretically near-optimal error. While the only prior algorithm for this setting inherently relied on the ellipsoid method, our algorithm is iterative and only uses spectral techniques. Our main technical innovation is the design of a soft outlier removal procedure for high-dimensional heavy-tailed datasets with a majority of outliers.
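Read this way, the soft outlier removal step can be pictured as maintaining a weight per point and multiplicatively shrinking the weights of points that score highly along the top eigenvector of the weighted covariance, rather than deleting points outright. The sketch below shows one such downweighting round under that reading; it omits the machinery that produces the list of hypotheses, and the names and exact update rule are illustrative.

```python
# One round of soft outlier downweighting (schematic only; the full
# list-decodable algorithm producing multiple hypotheses is omitted).
import numpy as np

def soft_filter_step(X, w):
    """Shrink the weights of points with large spectral scores."""
    w = w / w.sum()                      # normalize weights
    mu = w @ X                           # weighted mean
    Y = X - mu
    cov = (Y * w[:, None]).T @ Y         # weighted covariance
    _, vecs = np.linalg.eigh(cov)
    v = vecs[:, -1]                      # direction of largest weighted variance
    tau = (Y @ v) ** 2                   # spectral outlier scores
    return w * (1.0 - tau / tau.max())   # worst-scoring point is zeroed out
```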
We undertake a precise study of the non-asymptotic properties of vanilla generative adversarial networks (GANs) and derive theoretical guarantees in the problem of estimating an unknown $d$-dimensional density $p^*$ under a proper choice of the class of generators and discriminators. We prove that the resulting density estimate converges to $p^*$ in terms of Jensen-Shannon (JS) divergence at the rate $(\log n/n)^{2\beta/(2\beta+d)}$, where $n$ is the sample size and $\beta$ determines the smoothness of $p^*$. This is the first result in the literature on density estimation using vanilla GANs with JS rates faster than $n^{-1/2}$ in the regime $\beta > d/2$.
