Memory-Sample Tradeoffs for Linear Regression with Small Error

نشر في Vatsal Sharan بتاريخ 2019 في مجال الهندسة المعلوماتية والبحث باللغة English تحميل البحث

الملخص بالإنكليزية

We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples $(a_1,b_1), (a_2,b_2)ldots,$ with $a_i$ drawn independently from a $d$-dimensional isotropic Gaussian, and where $b_i = langle a_i, xrangle + eta_i,$ for a fixed $x in mathbb{R}^d$ with $|x|_2 = 1$ and with independent noise $eta_i$ drawn uniformly from the interval $[-2^{-d/5},2^{-d/5}].$ We show that any algorithm with at most $d^2/4$ bits of memory requires at least $Omega(d log log frac{1}{epsilon})$ samples to approximate $x$ to $ell_2$ error $epsilon$ with probability of success at least $2/3$, for $epsilon$ sufficiently small as a function of $d$. In contrast, for such $epsilon$, $x$ can be recovered to error $epsilon$ with probability $1-o(1)$ with memory $Oleft(d^2 log(1/epsilon)right)$ using $d$ examples. This represents the first nontrivial lower bounds for regression with super-linear memory, and may open the door for strong memory/sample tradeoffs for continuous optimization.

تحميل البحث