
Memory-Sample Lower Bounds for Learning Parity with Noise

Posted by: Sumegha Garg
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





In this work, we show, for the well-studied problem of learning parity under noise, where a learner tries to learn $x=(x_1,\ldots,x_n) \in \{0,1\}^n$ from a stream of random linear equations over $\mathrm{F}_2$ that are correct with probability $\frac{1}{2}+\varepsilon$ and flipped with probability $\frac{1}{2}-\varepsilon$, that any learning algorithm requires either a memory of size $\Omega(n^2/\varepsilon)$ or an exponential number of samples. In fact, we study memory-sample lower bounds for a large class of learning problems, as characterized by [GRT18], when the samples are noisy. A matrix $M: A \times X \rightarrow \{-1,1\}$ corresponds to the following learning problem with error parameter $\varepsilon$: an unknown element $x \in X$ is chosen uniformly at random. A learner tries to learn $x$ from a stream of samples, $(a_1, b_1), (a_2, b_2), \ldots$, where for every $i$, $a_i \in A$ is chosen uniformly at random and $b_i = M(a_i,x)$ with probability $1/2+\varepsilon$ and $b_i = -M(a_i,x)$ with probability $1/2-\varepsilon$ ($0<\varepsilon<\frac{1}{2}$). Assume that $k,\ell,r$ are such that any submatrix of $M$ of at least $2^{-k} \cdot |A|$ rows and at least $2^{-\ell} \cdot |X|$ columns has a bias of at most $2^{-r}$. We show that any learning algorithm for the learning problem corresponding to $M$, with error, requires either a memory of size at least $\Omega\left(\frac{k \cdot \ell}{\varepsilon}\right)$, or at least $2^{\Omega(r)}$ samples. In particular, this shows that for a large class of learning problems, the same as those in [GRT18], any learning algorithm requires either a memory of size at least $\Omega\left(\frac{(\log |X|) \cdot (\log |A|)}{\varepsilon}\right)$ or an exponential number of noisy samples. Our proof is based on adapting the arguments in [Raz17, GRT18] to the noisy case.
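To make the setup concrete, here is a minimal Python sketch (illustrative only, not taken from the paper; the function and variable names are ours) of the noisy sample stream in the parity case, where $A = X = \{0,1\}^n$ and $M(a,x) = (-1)^{\langle a,x\rangle}$:

```python
import numpy as np

# Illustrative sketch (not from the paper): the noisy sample stream for
# learning parity with noise, viewed through the matrix framework above.
# Here M(a, x) = (-1)^{<a, x> mod 2}, so a sample (a_i, b_i) with
# b_i = M(a_i, x) w.p. 1/2 + eps encodes a noisy linear equation over F_2.

rng = np.random.default_rng(0)

def sample_stream(x, eps, num_samples):
    """Yield noisy samples (a_i, b_i) in {-1, +1} for the parity matrix M."""
    n = len(x)
    for _ in range(num_samples):
        a = rng.integers(0, 2, size=n)        # a_i uniform in {0,1}^n
        m = (-1) ** (int(a @ x) % 2)          # M(a_i, x) = (-1)^{<a_i, x>}
        flip = rng.random() < 0.5 - eps       # label flipped w.p. 1/2 - eps
        yield a, (-m if flip else m)

# Toy usage: n = 8, eps = 0.1. The lower bound says any learner needs
# memory Omega(n^2 / eps) bits or 2^{Omega(n)} such samples.
x = rng.integers(0, 2, size=8)
for a, b in sample_stream(x, eps=0.1, num_samples=3):
    print(a, b)
```

Roughly speaking, for this parity matrix any submatrix with at least $2^{-k}|A|$ rows and $2^{-\ell}|X|$ columns with $k = \ell = n/4$ has bias $2^{-\Omega(n)}$ (e.g., via a Lindsey-lemma-type bound), so the general memory-$\Omega(k\ell/\varepsilon)$-or-$2^{\Omega(r)}$-samples tradeoff specializes to the stated memory $\Omega(n^2/\varepsilon)$ or $2^{\Omega(n)}$ samples.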




Read also

In this work, we initiate a formal study of probably approximately correct (PAC) learning under evasion attacks, where the adversary's goal is to \emph{misclassify} the adversarially perturbed sample point $\widetilde{x}$, i.e., $h(\widetilde{x}) \neq c(\widetilde{x})$, where $c$ is the ground truth concept and $h$ is the learned hypothesis. Previous work on PAC learning of adversarial examples has all modeled adversarial examples as corrupted inputs, in which the goal of the adversary is to achieve $h(\widetilde{x}) \neq c(x)$, where $x$ is the original untampered instance. These two definitions of adversarial risk coincide for many natural distributions, such as images, but are incomparable in general. We first prove that for many theoretically natural input spaces of high dimension $n$ (e.g., isotropic Gaussian in dimension $n$ under $\ell_2$ perturbations), if the adversary is allowed to apply up to a sublinear $o(\|x\|)$ amount of perturbations on the test instances, PAC learning requires sample complexity that is exponential in $n$. This is in contrast with results proved using the corrupted-input framework, in which the sample complexity of robust learning is only polynomially larger. We then formalize hybrid attacks in which the evasion attack is preceded by a poisoning attack. This is perhaps reminiscent of trapdoor attacks in which a poisoning phase is involved as well, but the evasion phase here uses the error-region definition of risk that aims at misclassifying the perturbed instances. In this case, we show PAC learning is sometimes impossible altogether, even when it is possible without the attack (e.g., due to the bounded VC dimension).
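The gap between the two risk definitions is easy to see even in one dimension. The toy Python sketch below is our own illustration, not an experiment from the paper: the threshold concepts, the test distribution, and the perturbation budget are made up, and it simply estimates both adversarial risks for a shifted-threshold hypothesis.

```python
import numpy as np

# Toy 1-D illustration (ours, not from the paper) of the two notions of
# adversarial risk. Ground truth c(x) = sign(x); the learned hypothesis
# h(x) = sign(x - 0.3) has a shifted threshold; the adversary may move a
# test point by at most r.

def c(x):                      # ground-truth concept
    return np.sign(x)

def h(x):                      # learned hypothesis (shifted threshold)
    return np.sign(x - 0.3)

def attack_succeeds(x, r, risk):
    """Is there a perturbation x_t with |x_t - x| <= r that fools h?"""
    for x_t in np.linspace(x - r, x + r, 101):
        if risk == "error-region" and h(x_t) != c(x_t):    # h(x~) != c(x~)
            return True
        if risk == "corrupted-input" and h(x_t) != c(x):   # h(x~) != c(x)
            return True
    return False

rng = np.random.default_rng(1)
xs = rng.normal(size=2000)     # test points from a standard Gaussian
for risk in ("error-region", "corrupted-input"):
    rate = np.mean([attack_succeeds(x, r=0.2, risk=risk) for x in xs])
    print(f"{risk:15s} adversarial risk ~ {rate:.3f}")
```

On this toy distribution the error-region risk comes out strictly larger, since perturbations that land inside the region where $h$ and $c$ disagree count as successes even when the original label would be preserved.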
We study the problem of high-dimensional linear regression in a robust model where an $\epsilon$-fraction of the samples can be adversarially corrupted. We focus on the fundamental setting where the covariates of the uncorrupted samples are drawn from a Gaussian distribution $\mathcal{N}(0, \Sigma)$ on $\mathbb{R}^d$. We give nearly tight upper bounds and computational lower bounds for this problem. Specifically, our main contributions are as follows: For the case that the covariance matrix is known to be the identity, we give a sample near-optimal and computationally efficient algorithm that outputs a candidate hypothesis vector $\widehat{\beta}$ which approximates the unknown regression vector $\beta$ within $\ell_2$-norm $O(\epsilon \log(1/\epsilon) \sigma)$, where $\sigma$ is the standard deviation of the random observation noise. An error of $\Omega(\epsilon \sigma)$ is information-theoretically necessary, even with infinite sample size. Prior work gave an algorithm for this problem with sample complexity $\tilde{\Omega}(d^2/\epsilon^2)$ whose error guarantee scales with the $\ell_2$-norm of $\beta$. For the case of unknown covariance, we show that we can efficiently achieve the same error guarantee as in the known covariance case using an additional $\tilde{O}(d^2/\epsilon^2)$ unlabeled examples. On the other hand, an error of $O(\epsilon \sigma)$ can be information-theoretically attained with $O(d/\epsilon^2)$ samples. We prove a Statistical Query (SQ) lower bound providing evidence that this quadratic tradeoff in the sample size is inherent. More specifically, we show that any polynomial time SQ learning algorithm for robust linear regression (in Huber's contamination model) with estimation complexity $O(d^{2-c})$, where $c>0$ is an arbitrarily small constant, must incur an error of $\Omega(\sqrt{\epsilon} \sigma)$.
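As a reminder of why robustness is needed at all, the short Python sketch below (our illustration only; it shows the contamination model and the failure of ordinary least squares, not the paper's robust estimator) generates $\epsilon$-corrupted Gaussian-design regression data and compares OLS on clean versus corrupted samples.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): the eps-corrupted linear
# regression model. Covariates are N(0, I_d), responses are <x, beta> + noise,
# and an adversary replaces an eps-fraction of the samples. Ordinary least
# squares has no robustness guarantee, which is the failure mode that robust
# estimators are designed to avoid.

rng = np.random.default_rng(0)
n, d, eps, sigma = 5000, 20, 0.05, 1.0

beta = rng.normal(size=d)
X = rng.normal(size=(n, d))                   # covariates ~ N(0, I_d)
y = X @ beta + sigma * rng.normal(size=n)     # clean responses

# Adversarial corruption of an eps-fraction of the samples.
k = int(eps * n)
X[:k] = rng.normal(size=(k, d)) * 50          # high-leverage outlier covariates
y[:k] = -50.0                                 # arbitrary outlier responses

for name, (Xc, yc) in [("clean", (X[k:], y[k:])), ("corrupted", (X, y))]:
    beta_hat = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    print(name, "OLS l2 error:", round(float(np.linalg.norm(beta_hat - beta)), 3))
```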
Function inversion is the problem where, given a random function $f: [M] \to [N]$, we want to find a pre-image of any image $y$, i.e., some element of $f^{-1}(y)$, in time $T$. In this work, we revisit this problem under the preprocessing model, where we can compute some auxiliary information or advice of size $S$ that depends only on $f$ but not on $y$. It is a well-studied problem in the classical setting; however, it is not clear how quantum algorithms can solve this task any better besides invoking Grover's algorithm, which does not leverage the power of preprocessing. Nayebi et al. proved a lower bound $ST^2 \ge \tilde{\Omega}(N)$ for quantum algorithms inverting permutations; however, they only consider algorithms with classical advice. Hhan et al. subsequently extended this lower bound to fully quantum algorithms for inverting permutations. In this work, we give the same asymptotic lower bound for fully quantum algorithms inverting functions in the regime where $M = O(N)$. In order to prove these bounds, we generalize the notion of a quantum random access code, originally introduced by Ambainis et al., to the setting where we are given a list of (not necessarily independent) random variables, and we wish to compress them into a variable-length encoding such that we can retrieve a random element just using the encoding with high probability. As our main technical contribution, we give a nearly tight lower bound (for a wide parameter range) for this generalized notion of quantum random access codes, which may be of independent interest.
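For intuition about the preprocessing model itself (advice of size $S$ computed offline from $f$, plus an online budget of $T$ evaluations), here is a naive classical Python sketch; it is our illustration of the model only, not the quantum algorithms or the $ST^2$ bounds discussed above.

```python
import random

# Naive classical illustration (not the paper's quantum setting) of the
# preprocessing model for function inversion: an offline phase computes
# advice of size S that depends only on f, and an online phase uses the
# advice plus up to T evaluations of f to invert a given image y.

N = 10_000
random.seed(0)
f_table = [random.randrange(N) for _ in range(N)]   # a random function [N] -> [N]

def f(x):
    return f_table[x]

def preprocess(S):
    """Advice: remember S random (f(x), x) pairs."""
    xs = random.sample(range(N), S)
    return {f(x): x for x in xs}

def invert(advice, y, T):
    """Online phase: check the advice, then spend up to T queries to f."""
    if y in advice:
        return advice[y]
    for x in random.sample(range(N), T):            # T online queries
        if f(x) == y:
            return x
    return None                                     # failed within the budget

advice = preprocess(S=1000)
y = f(random.randrange(N))                          # image guaranteed to have a pre-image
x = invert(advice, y, T=1000)
print("recovered:", x, "valid:", x is not None and f(x) == y)
```

Larger advice or more online queries raise the success probability, which is the informal tradeoff that the $ST^2$-type lower bounds make precise in the quantum setting.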
Harry Buhrman, 1998
We prove lower bounds on the error probability of a quantum algorithm for searching through an unordered list of $N$ items, as a function of the number $T$ of queries it makes. In particular, if $T=O(\sqrt{N})$ then the error is lower bounded by a constant. If we want error $< 1/2^N$ then we need $T=\Omega(N)$ queries. We apply this to show that a quantum computer cannot do much better than a classical computer when amplifying the success probability of an RP-machine. A classical computer can achieve error $\le 1/2^k$ using $k$ applications of the RP-machine; a quantum computer still needs at least $ck$ applications for this (when treating the machine as a black box), where $c>0$ is a constant independent of $k$. Furthermore, we prove a lower bound of $\Omega(\sqrt{\log N}/\log\log N)$ queries for quantum bounded-error search of an ordered list of $N$ items.
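The classical amplification fact quoted above is easy to check empirically. The Python sketch below (our toy simulation, not part of the paper) models an RP-style machine with one-sided error and verifies that $k$ repetitions drive the error down to about $1/2^k$.

```python
import random

# Toy illustration (not from the paper): classical amplification of a
# one-sided-error (RP-style) machine. If the machine accepts a yes-instance
# with probability >= 1/2 and never accepts a no-instance, then k independent
# runs, accepting iff any run accepts, have error <= 1/2^k. The quoted result
# says a quantum computer treating the machine as a black box still needs
# Omega(k) applications for the same error.

random.seed(0)

def rp_machine(is_yes_instance):
    """One-sided error: accepts a yes-instance w.p. 1/2, never a no-instance."""
    return is_yes_instance and random.random() < 0.5

def amplified(is_yes_instance, k):
    return any(rp_machine(is_yes_instance) for _ in range(k))

k, trials = 10, 100_000
errors = sum(not amplified(True, k) for _ in range(trials))
print(f"empirical error {errors / trials:.5f}  vs  bound 1/2^{k} = {2**-k:.5f}")
```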
Differentially private (DP) machine learning allows us to train models on private data while limiting data leakage. DP formalizes this data leakage through a cryptographic game, where an adversary must predict if a model was trained on a dataset D, or a dataset D' that differs in just one example. If observing the training algorithm does not meaningfully increase the adversary's odds of successfully guessing which dataset the model was trained on, then the algorithm is said to be differentially private. Hence, the purpose of privacy analysis is to upper bound the probability that any adversary could successfully guess which dataset the model was trained on. In our paper, we instantiate this hypothetical adversary in order to establish lower bounds on the probability that this distinguishing game can be won. We use this adversary to evaluate the importance of the adversary capabilities allowed in the privacy analysis of DP training algorithms. For DP-SGD, the most common method for training neural networks with differential privacy, our lower bounds are tight and match the theoretical upper bound. This implies that in order to prove better upper bounds, it will be necessary to make use of additional assumptions. Fortunately, we find that our attacks are significantly weaker when additional (realistic) restrictions are put in place on the adversary's capabilities. Thus, in the practical setting common to many real-world deployments, there is a gap between our lower bounds and the upper bounds provided by the analysis: differential privacy is conservative and adversaries may not be able to leak as much information as suggested by the theoretical bound.
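The distinguishing game itself is simple to state in code. The Python sketch below is our toy version with a Laplace mechanism on a sum query rather than DP-SGD (the datasets, the $\varepsilon$ value, and the adversary's decision rule are all illustrative); the adversary's empirical win rate is a lower bound on leakage, to be compared with the $e^\varepsilon/(1+e^\varepsilon)$ accuracy ceiling implied by pure $\varepsilon$-DP.

```python
import numpy as np

# Toy version (not the paper's DP-SGD experiments) of the DP distinguishing
# game: a mechanism releases a noisy sum of a dataset, the adversary knows two
# neighbouring datasets D and D' (differing in one example) and must guess
# which one was used. The empirical success rate of a likelihood-ratio
# adversary lower-bounds the leakage of the mechanism.

rng = np.random.default_rng(0)
eps = 1.0                        # target epsilon for the Laplace mechanism
D  = np.array([0.2, 0.7, 0.5])   # records in [0, 1], so the sum has sensitivity 1
Dp = np.array([0.2, 0.7, 1.0])   # neighbouring dataset: one record changed

def mechanism(data):
    """eps-DP Laplace mechanism for the (sensitivity-1) sum query."""
    return data.sum() + rng.laplace(scale=1.0 / eps)

wins, trials = 0, 200_000
for _ in range(trials):
    use_Dp = rng.random() < 0.5
    out = mechanism(Dp if use_Dp else D)
    # Guess the dataset whose sum is closer to the output: this is the
    # likelihood-ratio test for Laplace noise under a uniform prior.
    guess_Dp = abs(out - Dp.sum()) < abs(out - D.sum())
    wins += guess_Dp == use_Dp

print(f"adversary wins {wins / trials:.3f} of games "
      f"(pure eps-DP accuracy ceiling: {np.e**eps / (1 + np.e**eps):.3f})")
```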
