
Adversarial VC-dimension and Sample Complexity of Neural Networks

Posted by Zetong Qi
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





Adversarial attacks during the testing phase of neural networks pose a challenge for the deployment of neural networks in security-critical settings. These attacks can be performed by adding noise that is imperceptible to humans to the original data. By doing so, an attacker can create an adversarial sample that causes neural networks to misclassify. In this paper, we seek to understand the theoretical limits of what can be learned by neural networks in the presence of an adversary. We first define the hypothesis space of a neural network and show the relationship between the growth number of the entire network and the growth number of each neuron. Combining this with the adversarial Vapnik-Chervonenkis (VC) dimension of halfspace classifiers, we derive the adversarial VC-dimension of neural networks with sign activation functions.
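
As a reading aid, here is a sketch of the growth-number step the abstract refers to. The product bound and the Sauer-lemma estimate below are the standard ones from VC theory, stated as background assumptions rather than as the paper's exact lemmas.

% Standard background (assumed, not quoted from the paper): for a
% network of N sign-activation neurons f_1, ..., f_N, the growth number
% of the whole network is at most the product of the per-neuron growth numbers,
\[
  \Pi_{\mathcal{F}}(m) \;\le\; \prod_{i=1}^{N} \Pi_{f_i}(m),
\]
% and Sauer's lemma bounds each halfspace factor by
% \Pi_{f_i}(m) \le (e m / d_i)^{d_i}, where d_i is the VC-dimension of
% the i-th neuron's halfspace class.  Taking logarithms and combining
% with an adversarial VC-dimension bound for halfspace classifiers then
% bounds the adversarial VC-dimension of the whole network.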


Read also

We consider functions defined by deep neural networks as definable objects in an o-minimal expansion of the real field, and derive an almost linear (in the number of weights) bound on the sample complexity of such networks.
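
For context, a hedged reminder of how a VC-dimension bound translates into a sample-complexity bound; the form below is the standard agnostic PAC bound, included as background and not taken from the paper.

% Standard agnostic PAC bound (not specific to this paper): if a
% hypothesis class \mathcal{H} has VC-dimension d, then
\[
  m(\varepsilon, \delta) \;=\; O\!\left(\frac{d + \log(1/\delta)}{\varepsilon^{2}}\right)
\]
% samples suffice to learn \mathcal{H} to excess risk \varepsilon with
% probability 1 - \delta.  An almost-linear-in-weights bound on d thus
% yields an almost-linear sample-complexity bound.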
We provide theoretical convergence guarantees on training Generative Adversarial Networks (GANs) via SGD. We consider learning a target distribution modeled by a 1-layer Generator network with a non-linear activation function $\phi(\cdot)$ parametrized by a $d \times d$ weight matrix $\mathbf{W}_*$, i.e., $f_*(\mathbf{x}) = \phi(\mathbf{W}_* \mathbf{x})$. Our main result is that training the Generator together with a Discriminator according to the Stochastic Gradient Descent-Ascent iteration proposed by Goodfellow et al. yields a Generator distribution that approaches the target distribution of $f_*$. Specifically, we can learn the target distribution within total-variation distance $\epsilon$ using $\tilde{O}(d^2/\epsilon^2)$ samples, which is (near-)information-theoretically optimal. Our results apply to a broad class of non-linear activation functions $\phi$, including ReLUs, and are enabled by a connection with truncated statistics and an appropriate design of the Discriminator network. Our approach relies on a bilevel optimization framework to show that vanilla SGDA works.
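
A minimal JAX sketch of the stochastic gradient descent-ascent iteration described in this abstract. The linear discriminator, ReLU activation, step size, and sample sizes here are illustrative assumptions, not the construction analyzed in the paper.

import jax
import jax.numpy as jnp

d = 4
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)

W_star = jax.random.normal(k1, (d, d))       # hidden target weights W_*
W = 0.1 * jax.random.normal(k2, (d, d))      # generator parameters
v = 0.1 * jax.random.normal(k3, (d,))        # toy linear discriminator parameters

phi = jax.nn.relu                            # illustrative choice of activation

def generator(W, x):
    return phi(W @ x)

def discriminator(v, y):
    return jax.nn.sigmoid(v @ y)             # D(y) in (0, 1)

def gan_objective(W, v, x_real, x_noise):
    # Minimax objective: the discriminator ascends, the generator descends.
    real = generator(W_star, x_real)         # sample from the target f_*
    fake = generator(W, x_noise)             # sample from the current generator
    return jnp.log(discriminator(v, real)) + jnp.log(1.0 - discriminator(v, fake))

grad_W = jax.jit(jax.grad(gan_objective, argnums=0))
grad_v = jax.jit(jax.grad(gan_objective, argnums=1))

eta = 1e-2
for _ in range(2000):
    key, ka, kb = jax.random.split(key, 3)
    x_real = jax.random.normal(ka, (d,))
    x_noise = jax.random.normal(kb, (d,))
    v = v + eta * grad_v(W, v, x_real, x_noise)   # ascent step on the discriminator
    W = W - eta * grad_W(W, v, x_real, x_noise)   # descent step on the generator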
This paper studies how well generative adversarial networks (GANs) learn probability distributions from finite samples. Our main results establish the convergence rates of GANs under a collection of integral probability metrics defined through Hölder classes, including the Wasserstein distance as a special case. We also show that GANs are able to adaptively learn data distributions with low-dimensional structures or Hölder densities, when the network architectures are chosen properly. In particular, for distributions concentrated around a low-dimensional set, we show that the learning rates of GANs do not depend on the high ambient dimension, but on the lower intrinsic dimension. Our analysis is based on a new oracle inequality decomposing the estimation error into the generator and discriminator approximation errors and the statistical error, which may be of independent interest.
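
For reference, the integral probability metric (IPM) family this abstract refers to has the standard form below; the choice of function class (e.g., 1-Lipschitz functions for the Wasserstein-1 distance) is the textbook one, included here as background rather than taken from the paper.

% Integral probability metric between distributions \mu and \nu over a
% discriminator class \mathcal{F} (standard definition):
\[
  d_{\mathcal{F}}(\mu, \nu) \;=\; \sup_{f \in \mathcal{F}}
    \Big| \mathbb{E}_{X \sim \mu}[f(X)] - \mathbb{E}_{Y \sim \nu}[f(Y)] \Big|.
\]
% Taking \mathcal{F} to be the 1-Lipschitz functions recovers the
% Wasserstein-1 distance; H\"older balls give the metrics studied above.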
In this work we study the quantitative relation between the recursive teaching dimension (RTD) and the VC dimension (VCD) of concept classes of finite sizes. The RTD of a concept class $\mathcal{C} \subseteq \{0,1\}^n$, introduced by Zilles et al. (2011), is a combinatorial complexity measure characterized by the worst-case number of examples necessary to identify a concept in $\mathcal{C}$ according to the recursive teaching model. For any finite concept class $\mathcal{C} \subseteq \{0,1\}^n$ with $\mathrm{VCD}(\mathcal{C})=d$, Simon & Zilles (2015) posed an open problem: is $\mathrm{RTD}(\mathcal{C}) = O(d)$, i.e., is RTD linearly upper bounded by VCD? Previously, the best known result was an exponential upper bound $\mathrm{RTD}(\mathcal{C}) = O(d \cdot 2^d)$, due to Chen et al. (2016). In this paper, we show a quadratic upper bound: $\mathrm{RTD}(\mathcal{C}) = O(d^2)$, much closer to an answer to the open problem. We also discuss the challenges in fully solving the problem.
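
A small worked example (my own illustration, not from the paper) of why the recursive teaching model can be much cheaper than classical teaching:

% Take the class of singletons plus the empty concept over n points,
%   \mathcal{C} = \{\emptyset\} \cup \{\{x_1\}, \dots, \{x_n\}\}.
% Teaching \emptyset directly requires all n negative examples, but in
% the recursive model each singleton \{x_i\} is identified by the single
% positive example (x_i, +); once the singletons are peeled off,
% \emptyset is the unique remaining concept and needs no examples.
\[
  \mathrm{TD}(\mathcal{C}) = n, \qquad
  \mathrm{RTD}(\mathcal{C}) = 1, \qquad
  \mathrm{VCD}(\mathcal{C}) = 1 .
\]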
Monte Carlo (MC) dropout is one of the state-of-the-art approaches for uncertainty estimation in neural networks (NNs). It has been interpreted as approximately performing Bayesian inference. Based on previous work on the approximation of Gaussian processes by wide and deep neural networks with random weights, we study the limiting distribution of wide untrained NNs under dropout more rigorously and prove that they, too, converge to Gaussian processes for fixed sets of weights and biases. We sketch an argument that this property might also hold for infinitely wide feed-forward networks that are trained with (full-batch) gradient descent. The theory is contrasted with an empirical analysis in which we find correlations and non-Gaussian behaviour for the pre-activations of finite-width NNs. We therefore investigate how (strongly) correlated pre-activations can induce non-Gaussian behavior in NNs with strongly correlated weights.
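
A minimal JAX sketch of MC-dropout uncertainty estimation as described in this abstract: dropout stays active at prediction time, and uncertainty is read off the spread of stochastic forward passes. The toy two-layer untrained network, dropout rate, and number of passes are illustrative assumptions.

import jax
import jax.numpy as jnp

width, p_drop, n_passes = 256, 0.1, 100
key = jax.random.PRNGKey(0)
k1, k2, kx = jax.random.split(key, 3)

# Random (untrained) two-layer network, as in the wide-network setting above.
W1 = jax.random.normal(k1, (width, 8)) / jnp.sqrt(8.0)
W2 = jax.random.normal(k2, (1, width)) / jnp.sqrt(float(width))

def forward(x, key):
    h = jax.nn.relu(W1 @ x)
    # Bernoulli dropout on the hidden layer, kept active at test time.
    mask = jax.random.bernoulli(key, 1.0 - p_drop, h.shape)
    h = jnp.where(mask, h / (1.0 - p_drop), 0.0)
    return (W2 @ h)[0]

x = jax.random.normal(kx, (8,))
keys = jax.random.split(jax.random.PRNGKey(1), n_passes)
samples = jax.vmap(lambda k: forward(x, k))(keys)

# Predictive mean and an uncertainty estimate from the MC samples.
print("mean:", samples.mean(), "std:", samples.std())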


