No Arabic abstract
We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than $1/2$ fraction of examples. For any $alpha < 1$, our algorithm takes as input a sample ${(x_i,y_i)}_{i leq n}$ of $n$ linear equations where $alpha n$ of the equations satisfy $y_i = langle x_i,ell^*rangle +zeta$ for some small noise $zeta$ and $(1-alpha)n$ of the equations are {em arbitrarily} chosen. It outputs a list $L$ of size $O(1/alpha)$ - a fixed constant - that contains an $ell$ that is close to $ell^*$. Our algorithm succeeds whenever the inliers are chosen from a emph{certifiably} anti-concentrated distribution $D$. In particular, this gives a $(d/alpha)^{O(1/alpha^8)}$ time algorithm to find a $O(1/alpha)$ size list when the inlier distribution is standard Gaussian. For discrete product distributions that are anti-concentrated only in emph{regular} directions, we give an algorithm that achieves similar guarantee under the promise that $ell^*$ has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. Our algorithm is based on a new framework for list-decodable learning that strengthens the `identifiability to algorithms paradigm based on the sum-of-squares method. In an independent and concurrent work, Raghavendra and Yau also used the Sum-of-Squares method to give a similar result for list-decodable regression.
In list-decodable subspace recovery, the input is a collection of $n$ points $alpha n$ (for some $alpha ll 1/2$) of which are drawn i.i.d. from a distribution $mathcal{D}$ with a isotropic rank $r$ covariance $Pi_*$ (the emph{inliers}) and the rest are arbitrary, potential adversarial outliers. The goal is to recover a $O(1/alpha)$ size list of candidate covariances that contains a $hat{Pi}$ close to $Pi_*$. Two recent independent works (Raghavendra-Yau, Bakshi-Kothari 2020) gave the first efficient algorithm for this problem. These results, however, obtain an error that grows with the dimension (linearly in [RY] and logarithmically in BK) at the cost of quasi-polynomial running time) and rely on emph{certifiable anti-concentration} - a relatively strict condition satisfied essentially only by the Gaussian distribution. In this work, we improve on these results on all three fronts: emph{dimension-independent} error via a faster fixed-polynomial running time under less restrictive distributional assumptions. Specifically, we give a $poly(1/alpha) d^{O(1)}$ time algorithm that outputs a list containing a $hat{Pi}$ satisfying $|hat{Pi} -Pi_*|_F leq O(1/alpha)$. Our result only needs $mathcal{D}$ to have emph{certifiably hypercontractive} degree 2 polynomials. As a result, in addition to Gaussians, our algorithm applies to the uniform distribution on the hypercube and $q$-ary cubes and arbitrary product distributions with subgaussian marginals. Prior work (Raghavendra and Yau, 2020) had identified such distributions as potential hard examples as such distributions do not exhibit strong enough anti-concentration. When $mathcal{D}$ satisfies certifiable anti-concentration, we obtain a stronger error guarantee of $|hat{Pi}-Pi_*|_F leq eta$ for any arbitrary $eta > 0$ in $d^{O(poly(1/alpha) + log (1/eta))}$ time.
The list-decodable code has been an active topic in theoretical computer science since the seminal papers of M. Sudan and V. Guruswami in 1997-1998. There are general result about the Johnson radius and the list-decoding capacity theorem for random codes. However few results about general constraints on rates, list-decodable radius and list sizes for list-decodable codes have been obtained. In this paper we show that rates, list-decodable radius and list sizes are closely related to the classical topic of covering codes. We prove new simple but strong upper bounds for list-decodable codes based on various covering codes. Then any good upper bound on the covering radius imply a good upper bound on the size of list-decodable codes. Hence the list-decodablity of codes is a strong constraint from the view of covering codes. Our covering code upper bounds for $(d,1)$ list decodable codes give highly non-trivial upper bounds on the sizes of codes with the given minimum Hamming distances. Our results give exponential improvements on the recent generalized Singleton upper bound of Shangguan and Tamo in STOC 2020, when the code lengths are very large. The asymptotic forms of covering code bounds can partially recover the list-decoding capacity theorem, the Blinovsky bound and the combinatorial bound of Guruswami-H{aa}stad-Sudan-Zuckerman. We also suggest to study the combinatorial covering list-decodable codes as a natural generalization of combinatorial list-decodable codes.
Linear regression in $ell_p$-norm is a canonical optimization problem that arises in several applications, including sparse recovery, semi-supervised learning, and signal processing. Generic convex optimization algorithms for solving $ell_p$-regression are slow in practice. Iteratively Reweighted Least Squares (IRLS) is an easy to implement family of algorithms for solving these problems that has been studied for over 50 years. However, these algorithms often diverge for p > 3, and since the work of Osborne (1985), it has been an open problem whether there is an IRLS algorithm that is guaranteed to converge rapidly for p > 3. We propose p-IRLS, the first IRLS algorithm that provably converges geometrically for any $p in [2,infty).$ Our algorithm is simple to implement and is guaranteed to find a $(1+varepsilon)$-approximate solution in $O(p^{3.5} m^{frac{p-2}{2(p-1)}} log frac{m}{varepsilon}) le O_p(sqrt{m} log frac{m}{varepsilon} )$ iterations. Our experiments demonstrate that it performs even better than our theoretical bounds, beats the standard Matlab/CVX implementation for solving these problems by 10--50x, and is the fastest among available implementations in the high-accuracy regime.
Coresets are one of the central methods to facilitate the analysis of large data sets. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show a negative result, namely, that no strongly sublinear sized coresets exist for logistic regression. To deal with intractable worst-case instances we introduce a complexity measure $mu(X)$, which quantifies the hardness of compressing a data set for logistic regression. $mu(X)$ has an intuitive statistical interpretation that may be of independent interest. For data sets with bounded $mu(X)$-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear $(1pmvarepsilon)$-coreset. We illustrate the performance of our method by comparing to uniform sampling as well as to state of the art methods in the area. The experiments are conducted on real world benchmark data for logistic regression.
We study the problem of {em list-decodable mean estimation} for bounded covariance distributions. Specifically, we are given a set $T$ of points in $mathbb{R}^d$ with the promise that an unknown $alpha$-fraction of points in $T$, where $0< alpha < 1/2$, are drawn from an unknown mean and bounded covariance distribution $D$, and no assumptions are made on the remaining points. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the mean of $D$. We give the first practically viable estimator for this problem. In more detail, our algorithm is sample and computationally efficient, and achieves information-theoretically near-optimal error. While the only prior algorithm for this setting inherently relied on the ellipsoid method, our algorithm is iterative and only uses spectral techniques. Our main technical innovation is the design of a soft outlier removal procedure for high-dimensional heavy-tailed datasets with a majority of outliers.