ترغب بنشر مسار تعليمي؟ اضغط هنا

Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals

77   0   0.0 ( 0 )
 نشر من قبل Nikos Zarifis
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals. In the former problem, given labeled examples $(mathbf{x}, y)$ from an unknown distribution on $mathbb{R}^d times { pm 1}$, whose marginal distribution on $mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1 loss $mathrm{OPT}+epsilon$, where $mathrm{OPT}$ is the 0-1 loss of the best-fitting halfspace. In the latter problem, given labeled examples $(mathbf{x}, y)$ from an unknown distribution on $mathbb{R}^d times mathbb{R}$, whose marginal distribution on $mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with square loss $mathrm{OPT}+epsilon$, where $mathrm{OPT}$ is the square loss of the best-fitting ReLU. We prove Statistical Query (SQ) lower bounds of $d^{mathrm{poly}(1/epsilon)}$ for both of these problems. Our SQ lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.



قيم البحث

اقرأ أيضاً

We study the problem of PAC learning one-hidden-layer ReLU networks with $k$ hidden units on $mathbb{R}^d$ under Gaussian marginals in the presence of additive label noise. For the case of positive coefficients, we give the first polynomial-time algo rithm for this learning problem for $k$ up to $tilde{O}(sqrt{log d})$. Previously, no polynomial time algorithm was known, even for $k=3$. This answers an open question posed by~cite{Kliv17}. Importantly, our algorithm does not require any assumptions about the rank of the weight matrix and its complexity is independent of its condition number. On the negative side, for the more general task of PAC learning one-hidden-layer ReLU networks with arbitrary real coefficients, we prove a Statistical Query lower bound of $d^{Omega(k)}$. Thus, we provide a separation between the two classes in terms of efficient learnability. Our upper and lower bounds are general, extending to broader families of activation functions.
Gaussian Graphical Models (GGMs) have wide-ranging applications in machine learning and the natural and social sciences. In most of the settings in which they are applied, the number of observed samples is much smaller than the dimension and they are assumed to be sparse. While there are a variety of algorithms (e.g. Graphical Lasso, CLIME) that provably recover the graph structure with a logarithmic number of samples, they assume various conditions that require the precision matrix to be in some sense well-conditioned. Here we give the first polynomial-time algorithms for learning attractive GGMs and walk-summable GGMs with a logarithmic number of samples without any such assumptions. In particular, our algorithms can tolerate strong dependencies among the variables. Our result for structure recovery in walk-summable GGMs is derived from a more general result for efficient sparse linear regression in walk-summable models without any norm dependencies. We complement our results with experiments showing that many existing algorithms fail even in some simple settings where there are long dependency chains, whereas ours do not.
We study the problem of {em properly} learning large margin halfspaces in the agnostic PAC model. In more detail, we study the complexity of properly learning $d$-dimensional halfspaces on the unit ball within misclassification error $alpha cdot math rm{OPT}_{gamma} + epsilon$, where $mathrm{OPT}_{gamma}$ is the optimal $gamma$-margin error rate and $alpha geq 1$ is the approximation ratio. We give learning algorithms and computational hardness results for this problem, for all values of the approximation ratio $alpha geq 1$, that are nearly-matching for a range of parameters. Specifically, for the natural setting that $alpha$ is any constant bigger than one, we provide an essentially tight complexity characterization. On the positive side, we give an $alpha = 1.01$-approximate proper learner that uses $O(1/(epsilon^2gamma^2))$ samples (which is optimal) and runs in time $mathrm{poly}(d/epsilon) cdot 2^{tilde{O}(1/gamma^2)}$. On the negative side, we show that {em any} constant factor approximate proper learner has runtime $mathrm{poly}(d/epsilon) cdot 2^{(1/gamma)^{2-o(1)}}$, assuming the Exponential Time Hypothesis.
83 - Jie Shen 2021
We study efficient PAC learning of homogeneous halfspaces in $mathbb{R}^d$ in the presence of malicious noise of Valiant~(1985). This is a challenging noise model and only until recently has near-optimal noise tolerance bound been established under t he mild condition that the unlabeled data distribution is isotropic log-concave. However, it remains unsettled how to obtain the optimal sample complexity simultaneously. In this work, we present a new analysis for the algorithm of Awasthi~et~al.~(2017) and show that it essentially achieves the near-optimal sample complexity bound of $tilde{O}(d)$, improving the best known result of $tilde{O}(d^2)$. Our main ingredient is a novel incorporation of a matrix Chernoff-type inequality to bound the spectrum of an empirical covariance matrix for well-behaved distributions, in conjunction with a careful exploration of the localization schemes of Awasthi~et~al.~(2017). We further extend the algorithm and analysis to the more general and stronger nasty noise model of Bshouty~et~al.~(2002), showing that it is still possible to achieve near-optimal noise tolerance and sample complexity in polynomial time.
135 - Jie Shen 2020
We study {em online} active learning of homogeneous halfspaces in $mathbb{R}^d$ with adversarial noise where the overall probability of a noisy label is constrained to be at most $ u$. Our main contribution is a Perceptron-like online active learning algorithm that runs in polynomial time, and under the conditions that the marginal distribution is isotropic log-concave and $ u = Omega(epsilon)$, where $epsilon in (0, 1)$ is the target error rate, our algorithm PAC learns the underlying halfspace with near-optimal label complexity of $tilde{O}big(d cdot polylog(frac{1}{epsilon})big)$ and sample complexity of $tilde{O}big(frac{d}{epsilon} big)$. Prior to this work, existing online algorithms designed for tolerating the adversarial noise are subject to either label complexity polynomial in $frac{1}{epsilon}$, or suboptimal noise tolerance, or restrictive marginal distributions. With the additional prior knowledge that the underlying halfspace is $s$-sparse, we obtain attribute-efficient label complexity of $tilde{O}big( s cdot polylog(d, frac{1}{epsilon}) big)$ and sample complexity of $tilde{O}big(frac{s}{epsilon} cdot polylog(d) big)$. As an immediate corollary, we show that under the agnostic model where no assumption is made on the noise rate $ u$, our active learner achieves an error rate of $O(OPT) + epsilon$ with the same running time and label and sample complexity, where $OPT$ is the best possible error rate achievable by any homogeneous halfspace.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا