Outlier Robust Mean Estimation with Subgaussian Rates via Stability

Added by Ankit Pensia

Publication date 2020

Fields Informatics Engineering

Research language English

Authors Ilias Diakonikolas - Daniel M. Kane - Ankit Pensia

We study the problem of outlier robust high-dimensional mean estimation under a finite covariance assumption, and more broadly under finite low-degree moment assumptions. We consider a standard stability condition from the recent robust statistics literature and prove that, except with exponentially small failure probability, there exists a large fraction of the inliers satisfying this condition. As a corollary, it follows that a number of recently developed algorithms for robust mean estimation, including iterative filtering and non-convex gradient descent, give optimal error estimators with (near-)subgaussian rates. Previous analyses of these algorithms gave significantly suboptimal rates. As a corollary of our approach, we obtain the first computationally efficient algorithm with subgaussian rate for outlier-robust mean estimation in the strong contamination model under a finite covariance assumption.
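The abstract names iterative filtering as one of the algorithms covered by the stability analysis. As a rough illustration only, here is a minimal sketch of one common soft-filtering variant: down-weight points along the top eigenvector of the weighted covariance until its largest eigenvalue is small. The function name `filter_mean`, the parameter `threshold_factor`, and its default value are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def filter_mean(X, sigma2=1.0, threshold_factor=2.0):
    """Soft-filtering sketch for robust mean estimation.

    Assumes the inliers have covariance bounded by sigma2 * I.
    threshold_factor is a hypothetical stopping constant, not a tuned value.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    w = np.ones(n)  # one nonnegative weight per sample
    for _ in range(n):
        p = w / w.sum()                  # normalized weights
        mu = p @ X                       # weighted mean
        Xc = X - mu
        cov = (Xc * p[:, None]).T @ Xc   # weighted covariance
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        lam, v = eigvals[-1], eigvecs[:, -1]
        if lam <= threshold_factor * sigma2:
            return mu                    # covariance is small: accept the mean
        tau = (Xc @ v) ** 2              # outlier scores along the top direction
        tau_max = tau[w > 0].max()
        # Down-weight in proportion to the score; the top-scoring point gets weight 0.
        new_w = w * np.clip(1.0 - tau / tau_max, 0.0, None)
        if new_w.sum() <= 0:
            return mu                    # degenerate case: nothing left to reweight
        w = new_w
    return (w / w.sum()) @ X
```

For example, with 90% standard Gaussian samples and 10% of the points placed at a single far-away location, this sketch should return an estimate much closer to the origin than the plain sample mean.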

Robust Estimation of Mean Values

329 - Xinjia Chen 2008

In this paper, we develop a computational approach for estimating the mean value of a quantity in the presence of uncertainty. We demonstrate that, under some mild assumptions, the upper and lower bounds of the mean value are efficiently computable via a sample reuse technique, of which the computational complexity is shown to possess a Poisson distribution.

Statistics Theory Systems and Control Probability

Outlier-robust moment-estimation via sum-of-squares

77 - Pravesh K. Kothari , David Steurer 2017

We develop efficient algorithms for estimating low-degree moments of unknown distributions in the presence of adversarial outliers. The guarantees of our algorithms improve in many cases significantly over the best previous ones, obtained in recent works of Diakonikolas et al, Lai et al, and Charikar et al. We also show that the guarantees of our algorithms match information-theoretic lower-bounds for the class of distributions we consider. These improved guarantees allow us to give improved algorithms for independent component analysis and learning mixtures of Gaussians in the presence of outliers. Our algorithms are based on a standard sum-of-squares relaxation of the following conceptually-simple optimization problem: Among all distributions whose moments are bounded in the same way as for the unknown distribution, find the one that is closest in statistical distance to the empirical distribution of the adversarially-corrupted sample.

Data Structures and Algorithms Machine Learning

Optimal Sub-Gaussian Mean Estimation in $\mathbb{R}$

392 - Jasper C.H. Lee , Paul Valiant 2020

We revisit the problem of estimating the mean of a real-valued distribution, presenting a novel estimator with sub-Gaussian convergence: intuitively, our estimator, on any distribution, is as accurate as the sample mean is for the Gaussian distribution of matching variance. Crucially, in contrast to prior works, our estimator does not require prior knowledge of the variance, and works across the entire gamut of distributions with bounded variance, including those without any higher moments. Parameterized by the sample size $n$, the failure probability $\delta$, and the variance $\sigma^2$, our estimator is accurate to within $\sigma\cdot(1+o(1))\sqrt{\frac{2\log\frac{1}{\delta}}{n}}$, tight up to the $1+o(1)$ factor. Our estimator construction and analysis gives a framework generalizable to other problems, tightly analyzing a sum of dependent random variables by viewing the sum implicitly as a 2-parameter $\psi$-estimator, and constructing bounds using mathematical programming and duality techniques.
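To get a feel for the size of the stated error bound, the leading term (the square root of $2\log(1/\delta)/n$, scaled by the standard deviation) can be evaluated numerically. The sketch below drops the $1+o(1)$ factor, so it gives only the leading-order value; the function name `subgaussian_radius` is an assumption made for this example.

```python
import math

def subgaussian_radius(n, delta, sigma):
    """Leading-order sub-Gaussian error radius sigma * sqrt(2 * log(1/delta) / n).

    The (1 + o(1)) factor from the abstract is omitted here.
    """
    if n <= 0 or not (0.0 < delta < 1.0) or sigma < 0.0:
        raise ValueError("need n > 0, 0 < delta < 1, sigma >= 0")
    return sigma * math.sqrt(2.0 * math.log(1.0 / delta) / n)

# With unit variance, 10,000 samples, and failure probability 1%,
# the leading-order radius is roughly 0.03.
radius = subgaussian_radius(n=10_000, delta=0.01, sigma=1.0)
```

Quadrupling the sample size halves this radius, while shrinking the failure probability only enters through the logarithm.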

Statistics Theory Data Structures and Algorithms Information Theory

List-Decodable Mean Estimation via Iterative Multi-Filtering

63 - Ilias Diakonikolas , Daniel M. Kane , Daniel Kongsgaard 2020

We study the problem of list-decodable mean estimation for bounded covariance distributions. Specifically, we are given a set $T$ of points in $\mathbb{R}^d$ with the promise that an unknown $\alpha$-fraction of points in $T$, where $0 < \alpha < 1/2$, are drawn from an unknown mean and bounded covariance distribution $D$, and no assumptions are made on the remaining points. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the mean of $D$. We give the first practically viable estimator for this problem. In more detail, our algorithm is sample and computationally efficient, and achieves information-theoretically near-optimal error. While the only prior algorithm for this setting inherently relied on the ellipsoid method, our algorithm is iterative and only uses spectral techniques. Our main technical innovation is the design of a soft outlier removal procedure for high-dimensional heavy-tailed datasets with a majority of outliers.

Machine Learning Data Structures and Algorithms Statistics Theory

Rates of convergence for density estimation with GANs

86 - Denis Belomestny , Eric Moulines , Alexey Naumov 2021

We undertake a precise study of the non-asymptotic properties of vanilla generative adversarial networks (GANs) and derive theoretical guarantees in the problem of estimating an unknown $d$-dimensional density $p^*$ under a proper choice of the class of generators and discriminators. We prove that the resulting density estimate converges to $p^*$ in terms of Jensen-Shannon (JS) divergence at the rate $(\log n/n)^{2\beta/(2\beta+d)}$ where $n$ is the sample size and $\beta$ determines the smoothness of $p^*$. This is the first result in the literature on density estimation using vanilla GANs with JS rates faster than $n^{-1/2}$ in the regime $\beta > d/2$.
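The comparison with the $n^{-1/2}$ rate comes down to the exponent: the exponent in the stated rate exceeds $1/2$ exactly when the smoothness parameter is larger than half the dimension. A small numerical sketch, with function names (`js_rate_exponent`, `js_rate`) chosen for this example only:

```python
import math

def js_rate_exponent(beta, d):
    """Exponent 2*beta / (2*beta + d) from the stated JS-divergence rate."""
    return 2.0 * beta / (2.0 * beta + d)

def js_rate(n, beta, d):
    """Evaluate (log n / n) ** (2*beta / (2*beta + d)), ignoring constants."""
    return (math.log(n) / n) ** js_rate_exponent(beta, d)

n = 10**6
smooth = js_rate(n, beta=4, d=4)   # beta > d/2: exponent is 2/3
rough = js_rate(n, beta=1, d=4)    # beta < d/2: exponent is 1/3
parametric = n ** -0.5             # reference rate n^{-1/2}
```

At this sample size the smooth case falls below the $n^{-1/2}$ reference value and the rough case stays above it, matching the regime described in the abstract.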

Statistics Theory Machine Learning
