
Outlier Robust Mean Estimation with Subgaussian Rates via Stability

 Published by Ankit Pensia
 Publication date: 2020
 Research field: Informatics Engineering
 Language: English





We study the problem of outlier robust high-dimensional mean estimation under a finite covariance assumption, and more broadly under finite low-degree moment assumptions. We consider a standard stability condition from the recent robust statistics literature and prove that, except with exponentially small failure probability, there exists a large fraction of the inliers satisfying this condition. As a corollary, it follows that a number of recently developed algorithms for robust mean estimation, including iterative filtering and non-convex gradient descent, give optimal error estimators with (near-)subgaussian rates. Previous analyses of these algorithms gave significantly suboptimal rates. As a corollary of our approach, we obtain the first computationally efficient algorithm with subgaussian rate for outlier-robust mean estimation in the strong contamination model under a finite covariance assumption.
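
To give a rough sense of what iterative filtering looks like, the following is a minimal sketch, not the algorithm or analysis from the paper. It assumes the inliers have covariance bounded by roughly the identity, and it uses a deliberately simplified rule: while the empirical covariance has a direction of variance above a threshold, drop the single point that deviates most along that direction. The function names, the `threshold` constant, and the removal budget of `2 * eps * n` points are all hypothetical choices made for illustration.

```python
import math
import random


def sample_mean(points):
    """Coordinate-wise empirical mean of a list of d-dimensional points."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def top_variance_direction(points, mu, iters=50, seed=0):
    """Approximate the top eigenpair of the empirical covariance by power iteration."""
    n, d = len(points), len(mu)
    centered = [[p[j] - mu[j] for j in range(d)] for p in points]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        # w = Cov @ v, computed as (1/n) * sum_i <x_i - mu, v> (x_i - mu)
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) / n for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # every point coincides with the mean
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient: empirical variance along the final direction v
    proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return sum(t * t for t in proj) / n, v


def filtered_mean(points, eps, threshold=3.0):
    """Simplified filter (illustrative assumption): remove the most deviant point
    along the top-variance direction until that variance is at most `threshold`."""
    pts = list(points)
    d = len(pts[0])
    max_removals = int(2 * eps * len(pts))  # hypothetical removal budget
    for _ in range(max_removals):
        mu = sample_mean(pts)
        lam, v = top_variance_direction(pts, mu)
        if lam <= threshold:
            return mu
        scores = [sum((p[j] - mu[j]) * v[j] for j in range(d)) ** 2 for p in pts]
        worst = max(range(len(pts)), key=scores.__getitem__)
        pts.pop(worst)
    return sample_mean(pts)


def l2_norm(x):
    return math.sqrt(sum(t * t for t in x))


# Toy data: 180 standard Gaussian inliers with true mean 0, plus 20 planted outliers.
rng = random.Random(1)
inliers = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(180)]
outliers = [[10.0] * 5 for _ in range(20)]
data = inliers + outliers

naive_error = l2_norm(sample_mean(data))
filtered_error = l2_norm(filtered_mean(data, eps=0.1))
print("naive error:", naive_error)
print("filtered error:", filtered_error)
```

On this toy input the planted outliers dominate the top-variance direction, so they are removed before any inlier. Published filtering algorithms typically use randomized or weighted removal rather than the one-point-at-a-time rule shown here.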




Read also

Xinjia Chen, 2008
In this paper, we develop a computational approach for estimating the mean value of a quantity in the presence of uncertainty. We demonstrate that, under some mild assumptions, the upper and lower bounds of the mean value are efficiently computable via a sample reuse technique, of which the computational complexity is shown to possess a Poisson distribution.
We develop efficient algorithms for estimating low-degree moments of unknown distributions in the presence of adversarial outliers. The guarantees of our algorithms improve in many cases significantly over the best previous ones, obtained in recent works of Diakonikolas et al., Lai et al., and Charikar et al. We also show that the guarantees of our algorithms match information-theoretic lower-bounds for the class of distributions we consider. These improved guarantees allow us to give improved algorithms for independent component analysis and learning mixtures of Gaussians in the presence of outliers. Our algorithms are based on a standard sum-of-squares relaxation of the following conceptually-simple optimization problem: Among all distributions whose moments are bounded in the same way as for the unknown distribution, find the one that is closest in statistical distance to the empirical distribution of the adversarially-corrupted sample.
We revisit the problem of estimating the mean of a real-valued distribution, presenting a novel estimator with sub-Gaussian convergence: intuitively, our estimator, on any distribution, is as accurate as the sample mean is for the Gaussian distribution of matching variance. Crucially, in contrast to prior works, our estimator does not require prior knowledge of the variance, and works across the entire gamut of distributions with bounded variance, including those without any higher moments. Parameterized by the sample size $n$, the failure probability $\delta$, and the variance $\sigma^2$, our estimator is accurate to within $\sigma\cdot(1+o(1))\sqrt{\frac{2\log\frac{1}{\delta}}{n}}$, tight up to the $1+o(1)$ factor. Our estimator construction and analysis gives a framework generalizable to other problems, tightly analyzing a sum of dependent random variables by viewing the sum implicitly as a 2-parameter $\psi$-estimator, and constructing bounds using mathematical programming and duality techniques.
We study the problem of list-decodable mean estimation for bounded covariance distributions. Specifically, we are given a set $T$ of points in $\mathbb{R}^d$ with the promise that an unknown $\alpha$-fraction of points in $T$, where $0 < \alpha < 1/2$, are drawn from an unknown mean and bounded covariance distribution $D$, and no assumptions are made on the remaining points. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the mean of $D$. We give the first practically viable estimator for this problem. In more detail, our algorithm is sample and computationally efficient, and achieves information-theoretically near-optimal error. While the only prior algorithm for this setting inherently relied on the ellipsoid method, our algorithm is iterative and only uses spectral techniques. Our main technical innovation is the design of a soft outlier removal procedure for high-dimensional heavy-tailed datasets with a majority of outliers.
We undertake a precise study of the non-asymptotic properties of vanilla generative adversarial networks (GANs) and derive theoretical guarantees in the problem of estimating an unknown $d$-dimensional density $p^*$ under a proper choice of the class of generators and discriminators. We prove that the resulting density estimate converges to $p^*$ in terms of Jensen-Shannon (JS) divergence at the rate $(\log n/n)^{2\beta/(2\beta+d)}$, where $n$ is the sample size and $\beta$ determines the smoothness of $p^*$. This is the first result in the literature on density estimation using vanilla GANs with JS rates faster than $n^{-1/2}$ in the regime $\beta > d/2$.
