List-Decodable Mean Estimation via Iterative Multi-Filtering

64 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ilias Diakonikolas

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Ilias Diakonikolas - Daniel M. Kane - Daniel Kongsgaard

التعلم الآلي بنى وهياكل البيانات والخوارزميات نظرية الإحصاء

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We study the problem of {em list-decodable mean estimation} for bounded covariance distributions. Specifically, we are given a set $T$ of points in $mathbb{R}^d$ with the promise that an unknown $alpha$-fraction of points in $T$, where $0< alpha < 1/2$, are drawn from an unknown mean and bounded covariance distribution $D$, and no assumptions are made on the remaining points. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the mean of $D$. We give the first practically viable estimator for this problem. In more detail, our algorithm is sample and computationally efficient, and achieves information-theoretically near-optimal error. While the only prior algorithm for this setting inherently relied on the ellipsoid method, our algorithm is iterative and only uses spectral techniques. Our main technical innovation is the design of a soft outlier removal procedure for high-dimensional heavy-tailed datasets with a majority of outliers.

قيم البحث

اقرأ أيضاً

List-Decodable Linear Regression

77 - Sushrut Karmalkar , Adam R. Klivans , Pravesh K. Kothari 2019

We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than $1/2$ fraction of examples. For any $alpha < 1$, our algorithm takes as input a sample ${(x_i,y_i)}_{i leq n}$ of $n$ linear equations where $alpha n$ of the equations satisfy $y_i = langle x_i,ell^*rangle +zeta$ for some small noise $zeta$ and $(1-alpha)n$ of the equations are {em arbitrarily} chosen. It outputs a list $L$ of size $O(1/alpha)$ - a fixed constant - that contains an $ell$ that is close to $ell^*$. Our algorithm succeeds whenever the inliers are chosen from a emph{certifiably} anti-concentrated distribution $D$. In particular, this gives a $(d/alpha)^{O(1/alpha^8)}$ time algorithm to find a $O(1/alpha)$ size list when the inlier distribution is standard Gaussian. For discrete product distributions that are anti-concentrated only in emph{regular} directions, we give an algorithm that achieves similar guarantee under the promise that $ell^*$ has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. Our algorithm is based on a new framework for list-decodable learning that strengthens the `identifiability to algorithms paradigm based on the sum-of-squares method. In an independent and concurrent work, Raghavendra and Yau also used the Sum-of-Squares method to give a similar result for list-decodable regression.

بنى وهياكل البيانات والخوارزميات التعلم الآلي التعلم الالي

Learning-based Support Estimation in Sublinear Time

70 - Talya Eden , Piotr Indyk , Shyam Narayanan 2021

We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements. The problem occurs in many applications, including biology, genomics, computer systems and linguistics. A line of research spanning the last decade resulted in algorithms that estimate the support up to $ pm varepsilon n$ from a sample of size $O(log^2(1/varepsilon) cdot n/log n)$, where $n$ is the data set size. Unfortunately, this bound is known to be tight, limiting further improvements to the complexity of this problem. In this paper we consider estimation algorithms augmented with a machine-learning-based predictor that, given any element, returns an estimation of its frequency. We show that if the predictor is correct up to a constant approximation factor, then the sample complexity can be reduced significantly, to [ log (1/varepsilon) cdot n^{1-Theta(1/log(1/varepsilon))}. ] We evaluate the proposed algorithms on a collection of data sets, using the neural-network based estimators from {Hsu et al, ICLR19} as predictors. Our experiments demonstrate substantial (up to 3x) improvements in the estimation accuracy compared to the state of the art algorithm.

التعلم الآلي بنى وهياكل البيانات والخوارزميات نظرية الإحصاء

List-decodable Codes and Covering Codes

275 - Hao Chen 2021

The list-decodable code has been an active topic in theoretical computer science since the seminal papers of M. Sudan and V. Guruswami in 1997-1998. There are general result about the Johnson radius and the list-decoding capacity theorem for random c odes. However few results about general constraints on rates, list-decodable radius and list sizes for list-decodable codes have been obtained. In this paper we show that rates, list-decodable radius and list sizes are closely related to the classical topic of covering codes. We prove new simple but strong upper bounds for list-decodable codes based on various covering codes. Then any good upper bound on the covering radius imply a good upper bound on the size of list-decodable codes. Hence the list-decodablity of codes is a strong constraint from the view of covering codes. Our covering code upper bounds for $(d,1)$ list decodable codes give highly non-trivial upper bounds on the sizes of codes with the given minimum Hamming distances. Our results give exponential improvements on the recent generalized Singleton upper bound of Shangguan and Tamo in STOC 2020, when the code lengths are very large. The asymptotic forms of covering code bounds can partially recover the list-decoding capacity theorem, the Blinovsky bound and the combinatorial bound of Guruswami-H{aa}stad-Sudan-Zuckerman. We also suggest to study the combinatorial covering list-decodable codes as a natural generalization of combinatorial list-decodable codes.

نظرية المعلومات نظرية المعلومات

Outlier Robust Mean Estimation with Subgaussian Rates via Stability

79 - Ilias Diakonikolas , Daniel M. Kane , Ankit Pensia 2020

We study the problem of outlier robust high-dimensional mean estimation under a finite covariance assumption, and more broadly under finite low-degree moment assumptions. We consider a standard stability condition from the recent robust statistics li terature and prove that, except with exponentially small failure probability, there exists a large fraction of the inliers satisfying this condition. As a corollary, it follows that a number of recently developed algorithms for robust mean estimation, including iterative filtering and non-convex gradient descent, give optimal error estimators with (near-)subgaussian rates. Previous analyses of these algorithms gave significantly suboptimal rates. As a corollary of our approach, we obtain the first computationally efficient algorithm with subgaussian rate for outlier-robust mean estimation in the strong contamination model under a finite covariance assumption.

نظرية الإحصاء بنى وهياكل البيانات والخوارزميات التعلم الآلي

List Learning with Attribute Noise

117 - Mahdi Cheraghchi , Elena Grigorescu , Brendan Juba 2020

We introduce and study the model of list learning with attribute noise. Learning with attribute noise was introduced by Shackelford and Volper (COLT 1988) as a variant of PAC learning, in which the algorithm has access to noisy examples and uncorrupt ed labels, and the goal is to recover an accurate hypothesis. Sloan (COLT 1988) and Goldman and Sloan (Algorithmica 1995) discovered information-theoretic limits to learning in this model, which have impeded further progress. In this article we extend the model to that of list learning, drawing inspiration from the list-decoding model in coding theory, and its recent variant studied in the context of learning. On the positive side, we show that sparse conjunctions can be efficiently list learned under some assumptions on the underlying ground-truth distribution. On the negative side, our results show that even in the list-learning model, efficient learning of parities and majorities is not possible regardless of the representation used.

التعلم الآلي بنى وهياكل البيانات والخوارزميات التعلم الالي