
On the Power of Localized Perceptron for Label-Optimal Learning of Halfspaces with Adversarial Noise

Posted by: Jie Shen
Publication date: 2020
Research field: Informatics Engineering
Language: English
Author: Jie Shen





We study \emph{online} active learning of homogeneous halfspaces in $\mathbb{R}^d$ with adversarial noise, where the overall probability of a noisy label is constrained to be at most $\nu$. Our main contribution is a Perceptron-like online active learning algorithm that runs in polynomial time, and under the conditions that the marginal distribution is isotropic log-concave and $\nu = \Omega(\epsilon)$, where $\epsilon \in (0, 1)$ is the target error rate, our algorithm PAC learns the underlying halfspace with near-optimal label complexity of $\tilde{O}\big(d \cdot \mathrm{polylog}(\frac{1}{\epsilon})\big)$ and sample complexity of $\tilde{O}\big(\frac{d}{\epsilon}\big)$. Prior to this work, existing online algorithms designed to tolerate adversarial noise suffer from either label complexity polynomial in $\frac{1}{\epsilon}$, suboptimal noise tolerance, or restrictive marginal distributions. With the additional prior knowledge that the underlying halfspace is $s$-sparse, we obtain an attribute-efficient label complexity of $\tilde{O}\big(s \cdot \mathrm{polylog}(d, \frac{1}{\epsilon})\big)$ and sample complexity of $\tilde{O}\big(\frac{s}{\epsilon} \cdot \mathrm{polylog}(d)\big)$. As an immediate corollary, we show that under the agnostic model, where no assumption is made on the noise rate $\nu$, our active learner achieves an error rate of $O(\mathrm{OPT}) + \epsilon$ with the same running time and label and sample complexity, where $\mathrm{OPT}$ is the best possible error rate achievable by any homogeneous halfspace.
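To make the localization idea concrete, the following is a minimal Python sketch of a margin-based, Perceptron-style active learner: labels are requested only for points that fall inside a band around the current decision boundary, and the band shrinks between rounds. The sampling interface, band schedule, and per-round budget are illustrative assumptions, not the paper's exact algorithm or parameters.

```python
import numpy as np

def localized_perceptron(sample_x, query_label, d, epsilon, per_round=None, seed=0):
    """Margin-based active Perceptron sketch (illustrative; not the paper's exact algorithm).

    sample_x    : callable () -> unlabeled point x in R^d (marginal assumed roughly isotropic)
    query_label : callable (x) -> label y in {-1, +1}; each call spends one label
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    band = 1.0                                        # width of the localization band
    rounds = max(1, int(np.ceil(np.log2(1.0 / epsilon))))
    per_round = per_round or 50 * d                   # illustrative per-round budget
    for _ in range(rounds):
        for _ in range(per_round):
            x = sample_x()
            margin = x @ w
            # Localization: points far from the current boundary are skipped,
            # so no label is spent on them.
            if abs(margin) > band:
                continue
            y = query_label(x)                        # label query
            if y * margin <= 0:                       # queried point is misclassified
                w = w + y * x                         # Perceptron-style update
                w /= np.linalg.norm(w)                # stay on the unit sphere (homogeneous halfspace)
        band /= 2.0                                   # shrink the band between rounds
    return w
```

Intuitively, querying only inside a shrinking band is what allows the label cost to avoid the $\frac{1}{\epsilon}$ dependence that non-localized schemes incur under a log-concave marginal.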




Read also

Jie Shen (2021)
We study efficient PAC learning of homogeneous halfspaces in $\mathbb{R}^d$ in the presence of the malicious noise of Valiant (1985). This is a challenging noise model, and only recently has a near-optimal noise tolerance bound been established, under the mild condition that the unlabeled data distribution is isotropic log-concave. However, it remains unsettled how to obtain the optimal sample complexity simultaneously. In this work, we present a new analysis of the algorithm of Awasthi et al. (2017) and show that it essentially achieves the near-optimal sample complexity bound of $\tilde{O}(d)$, improving the best known result of $\tilde{O}(d^2)$. Our main ingredient is a novel incorporation of a matrix Chernoff-type inequality to bound the spectrum of an empirical covariance matrix for well-behaved distributions, in conjunction with a careful exploration of the localization schemes of Awasthi et al. (2017). We further extend the algorithm and analysis to the more general and stronger nasty noise model of Bshouty et al. (2002), showing that it is still possible to achieve near-optimal noise tolerance and sample complexity in polynomial time.
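As a quick numerical illustration of the spectral phenomenon that a matrix Chernoff-type argument quantifies (a toy check, not the paper's analysis), the empirical covariance of roughly $\tilde{O}(d)$ isotropic samples already has all eigenvalues close to 1:

```python
import numpy as np

# Toy check: with an O~(d) sample size from an isotropic (here Gaussian, hence
# log-concave) distribution, the empirical covariance matrix has its spectrum
# tightly concentrated around 1, i.e. close to the identity.
rng = np.random.default_rng(0)
d = 200
n = int(10 * d * np.log(d))          # illustrative O~(d) sample size
X = rng.standard_normal((n, d))      # isotropic samples
emp_cov = X.T @ X / n
eigvals = np.linalg.eigvalsh(emp_cov)
print(f"min/max eigenvalue of the empirical covariance: {eigvals.min():.3f} / {eigvals.max():.3f}")
```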
Jie Shen, Chicheng Zhang (2020)
This paper is concerned with computationally efficient learning of homogeneous sparse halfspaces in $\mathbb{R}^d$ under noise. Though recent works have established attribute-efficient learning algorithms under various types of label noise (e.g. bounded noise), it remains an open question when and how $s$-sparse halfspaces can be efficiently learned under the challenging malicious noise model, where an adversary may corrupt both the unlabeled examples and the labels. We answer this question in the affirmative by designing a computationally efficient active learning algorithm with near-optimal label complexity of $\tilde{O}\big(s \log^4 \frac{d}{\epsilon}\big)$ and noise tolerance $\eta = \Omega(\epsilon)$, where $\epsilon \in (0, 1)$ is the target error rate, under the assumption that the distribution over (uncorrupted) unlabeled examples is isotropic log-concave. Our algorithm can be straightforwardly tailored to the passive learning setting, and we show that the sample complexity is $\tilde{O}\big(\frac{1}{\epsilon} s^2 \log^5 d\big)$, which also enjoys attribute efficiency. Our main techniques include attribute-efficient paradigms for instance reweighting and for empirical risk minimization, and a new analysis of uniform concentration for unbounded data -- all of them crucially take the structure of the underlying halfspace into account.
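The abstract does not spell out the update rule; one standard attribute-efficient device (shown here as an assumption, not necessarily the authors' exact paradigm) is to project each iterate back onto $s$-sparse vectors by hard thresholding:

```python
import numpy as np

def hard_threshold(w, s):
    """Project w onto s-sparse vectors by keeping its s largest-magnitude coordinates."""
    out = np.zeros_like(w)
    top = np.argsort(np.abs(w))[-s:]
    out[top] = w[top]
    return out

# In an iterative learner for s-sparse halfspaces, each update would be followed by
# this projection, e.g.  w = hard_threshold(w + y * x, s),  so that the label and
# sample costs can scale with s rather than the ambient dimension d.
```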
We analyze the properties of adversarial training for learning adversarially robust halfspaces in the presence of agnostic label noise. Denoting $\mathsf{OPT}_{p,r}$ as the best robust classification error achieved by a halfspace that is robust to perturbations of $\ell_{p}$ balls of radius $r$, we show that adversarial training on the standard binary cross-entropy loss yields adversarially robust halfspaces up to (robust) classification error $\tilde{O}(\sqrt{\mathsf{OPT}_{2,r}})$ for $p=2$, and $\tilde{O}(d^{1/4} \sqrt{\mathsf{OPT}_{\infty, r}} + d^{1/2} \mathsf{OPT}_{\infty,r})$ when $p=\infty$. Our results hold for distributions satisfying anti-concentration properties enjoyed by log-concave isotropic distributions, among others. We additionally show that if one instead uses a nonconvex sigmoidal loss, adversarial training yields halfspaces with an improved robust classification error of $O(\mathsf{OPT}_{2,r})$ for $p=2$, and $O(d^{1/4}\mathsf{OPT}_{\infty, r})$ when $p=\infty$. To the best of our knowledge, this is the first work to show that adversarial training provably yields robust classifiers in the presence of noise.
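For a linear predictor, the worst-case $\ell_2$ perturbation of radius $r$ has a closed form, so the adversarial training procedure analyzed here can be sketched in a few lines. This is a simplified illustration with arbitrarily chosen learning rate and epoch count, not the paper's exact setup:

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adv_train_halfspace(X, y, r, lr=0.1, epochs=200, seed=0):
    """Sketch of adversarial training for a homogeneous halfspace under l2 perturbations.

    X : (n, d) examples, y : (n,) labels in {-1, +1}, r : perturbation radius.
    For a linear predictor w, the worst-case l2 attack shifts each point against its
    label, x_adv = x - y * r * w / ||w||, so the robust cross-entropy loss can be
    minimized by plain gradient descent on the attacked points.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d) / np.sqrt(d)
    for _ in range(epochs):
        w_unit = w / max(np.linalg.norm(w), 1e-12)
        X_adv = X - (y[:, None] * r) * w_unit        # closed-form worst-case l2 attack
        margins = y * (X_adv @ w)
        # Gradient of the logistic loss log(1 + exp(-margin)), averaged over the batch.
        grad = -(X_adv * (y * _sigmoid(-margins))[:, None]).mean(axis=0)
        w -= lr * grad
    return w
```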
We study the problem of \emph{properly} learning large margin halfspaces in the agnostic PAC model. In more detail, we study the complexity of properly learning $d$-dimensional halfspaces on the unit ball within misclassification error $\alpha \cdot \mathrm{OPT}_{\gamma} + \epsilon$, where $\mathrm{OPT}_{\gamma}$ is the optimal $\gamma$-margin error rate and $\alpha \geq 1$ is the approximation ratio. We give learning algorithms and computational hardness results for this problem, for all values of the approximation ratio $\alpha \geq 1$, that are nearly-matching for a range of parameters. Specifically, for the natural setting that $\alpha$ is any constant bigger than one, we provide an essentially tight complexity characterization. On the positive side, we give an $\alpha = 1.01$-approximate proper learner that uses $O(1/(\epsilon^2\gamma^2))$ samples (which is optimal) and runs in time $\mathrm{poly}(d/\epsilon) \cdot 2^{\tilde{O}(1/\gamma^2)}$. On the negative side, we show that \emph{any} constant factor approximate proper learner has runtime $\mathrm{poly}(d/\epsilon) \cdot 2^{(1/\gamma)^{2-o(1)}}$, assuming the Exponential Time Hypothesis.
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals. In the former problem, given labeled examples $(\mathbf{x}, y)$ from an unknown distribution on $\mathbb{R}^d \times \{\pm 1\}$, whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1 loss $\mathrm{OPT}+\epsilon$, where $\mathrm{OPT}$ is the 0-1 loss of the best-fitting halfspace. In the latter problem, given labeled examples $(\mathbf{x}, y)$ from an unknown distribution on $\mathbb{R}^d \times \mathbb{R}$, whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with square loss $\mathrm{OPT}+\epsilon$, where $\mathrm{OPT}$ is the square loss of the best-fitting ReLU. We prove Statistical Query (SQ) lower bounds of $d^{\mathrm{poly}(1/\epsilon)}$ for both of these problems. Our SQ lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
