We consider shallow (single-hidden-layer) neural networks and characterize their performance when trained with stochastic gradient descent as the number of hidden units $N$ and the number of gradient descent steps grow to infinity. In particular, we investigate the effect of different scaling schemes, which lead to different normalizations of the neural network, on the network's statistical output, closing the gap between the $1/\sqrt{N}$ and the mean-field $1/N$ normalizations. We develop an asymptotic expansion for the neural network's statistical output, pointwise with respect to the scaling parameter, as the number of hidden units grows to infinity. Based on this expansion, we demonstrate mathematically that to leading order in $N$ there is no bias-variance trade-off: both bias and variance (each explicitly characterized) decrease as the number of hidden units increases and time grows. In addition, we show that to leading order in $N$, the variance of the network's statistical output decays as the normalization implied by the scaling parameter approaches the mean-field normalization. Numerical studies on the MNIST and CIFAR10 datasets show that test and train accuracy improve monotonically as the network's normalization approaches the mean-field normalization.
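As a concrete illustration of the scaling schemes in question, the following is a minimal sketch, assuming a network of the form $g_N(x) = N^{-\gamma} \sum_{i=1}^N c_i \sigma(w_i \cdot x)$ with scaling parameter $\gamma \in [1/2, 1]$, so that $\gamma = 1/2$ gives the $1/\sqrt{N}$ normalization and $\gamma = 1$ the mean-field one. The learning-rate scaling $N^{2\gamma - 1}$ used below is one common choice for keeping the dynamics comparable across $\gamma$; the paper's exact scheme may differ.

```python
import numpy as np

# Minimal sketch, not the paper's code: a shallow network
#   g_N(x) = N^{-gamma} * sum_i c_i * sigma(w_i . x)
# trained by SGD on the squared loss. gamma = 0.5 recovers the
# 1/sqrt(N) normalization, gamma = 1.0 the mean-field 1/N one.
# The learning-rate factor N^(2*gamma - 1) is an assumption made
# here to keep the effective dynamics comparable across gamma.

rng = np.random.default_rng(0)
N, d = 1000, 5                     # hidden units, input dimension
gamma = 0.75                       # scaling parameter in [0.5, 1.0]
W = rng.normal(size=(N, d))        # hidden-layer weights w_i
c = rng.normal(size=N)             # output weights c_i
lr = 0.01 * N ** (2 * gamma - 1)   # gamma-dependent learning rate

def g(x):
    """Network output with the 1/N^gamma normalization."""
    return (c @ np.tanh(W @ x)) / N ** gamma

def sgd_step(x, y):
    """One SGD step on the loss (g(x) - y)^2 / 2."""
    global W, c
    h = np.tanh(W @ x)
    err = (c @ h) / N ** gamma - y
    grad_c = err * h / N ** gamma
    grad_W = err * np.outer(c * (1 - h ** 2), x) / N ** gamma
    c -= lr * grad_c
    W -= lr * grad_W

for _ in range(2000):              # toy teacher: y = sin(x_1)
    x = rng.normal(size=d)
    sgd_step(x, np.sin(x[0]))
```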
We consider the approximation rates of shallow neural networks with respect to the variation norm. Upper bounds on these rates have been established for sigmoidal and ReLU activation functions, but it has remained an important open problem whether th…
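For orientation, the known upper bound referred to here is of Maurey type (going back to Jones and Barron); in generic form, with a constant depending on the dictionary and norms that may be stated differently in the paper,
$$\inf_{f_N \in \Sigma_N(\mathbb{D})} \|f - f_N\|_{L^2(\Omega)} \;\le\; C_{\mathbb{D}}\, \|f\|_{\mathcal{K}_1(\mathbb{D})}\, N^{-1/2},$$
where $\Sigma_N(\mathbb{D})$ denotes linear combinations of $N$ dictionary elements (here, $N$ neurons) and $\|\cdot\|_{\mathcal{K}_1(\mathbb{D})}$ is the variation norm.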
Dropout is a regularisation technique in neural network training where unit activations are randomly set to zero with a given probability \emph{independently}. In this work, we propose a generalisation of dropout and other multiplicative noise injection…
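Standard dropout, the baseline being generalised here, is compact to state in code. Below is a minimal sketch; the inverted-dropout rescaling and the Gaussian multiplicative-noise variant shown are standard constructions (Srivastava et al.), not the scheme this work proposes.

```python
import numpy as np

# Sketch of standard (Bernoulli) dropout: each activation is zeroed
# independently with probability p, and kept activations are rescaled
# by 1/(1-p) ("inverted dropout") so the expected activation is
# unchanged. A multiplicative-noise generalisation replaces the
# Bernoulli mask with another mean-one noise, e.g. Gaussian dropout.

rng = np.random.default_rng(0)

def dropout(h, p=0.5, train=True):
    """Independent Bernoulli dropout with inverted rescaling."""
    if not train or p == 0.0:
        return h
    mask = rng.random(h.shape) >= p          # keep with prob 1 - p
    return h * mask / (1.0 - p)

def gaussian_dropout(h, p=0.5):
    """Multiplicative Gaussian noise, variance matched to Bernoulli."""
    sigma = np.sqrt(p / (1.0 - p))
    return h * rng.normal(1.0, sigma, size=h.shape)

h = rng.normal(size=(4, 8))                  # a batch of activations
print(dropout(h, p=0.3).round(2))
print(gaussian_dropout(h, p=0.3).round(2))
```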
We consider the teacher-student setting of learning shallow neural networks with quadratic activations and planted weight matrix $W^* \in \mathbb{R}^{m \times d}$, where $m$ is the width of the hidden layer and $d \le m$ is the data dimension. We study the o…
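A minimal sketch of this planted setup follows, assuming Gaussian inputs and the usual identification $y = \|W^* x\|^2$ for a width-$m$ quadratic-activation network; the student, step size, and iteration count are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

# Teacher-student sketch with quadratic activations: labels come
# from a teacher with planted weights W_star,
#   y = sum_j (w_star_j . x)^2 = ||W_star x||^2,
# and a same-architecture student is fit by gradient descent on the
# empirical squared loss. Hyperparameters are loosely tuned toys.

rng = np.random.default_rng(0)
m, d, n = 20, 10, 2000                     # width, dimension (d <= m), samples
W_star = rng.normal(size=(m, d))           # planted teacher weights

X = rng.normal(size=(n, d))                # isotropic Gaussian inputs
y = np.sum((X @ W_star.T) ** 2, axis=1)    # teacher labels

W = rng.normal(size=(m, d))                # student initialization
lr = 1e-5
for _ in range(2000):
    A = X @ W.T                            # A[i, j] = w_j . x_i
    err = np.sum(A ** 2, axis=1) - y       # residuals, shape (n,)
    # gradient of the mean squared error w.r.t. w_j:
    #   (4/n) * sum_i err_i * (w_j . x_i) * x_i
    grad = 4 * (A * err[:, None]).T @ X / n
    W -= lr * grad

print("final train MSE:", np.mean((np.sum((X @ W.T) ** 2, axis=1) - y) ** 2))
```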
We consider the variation space corresponding to a dictionary of functions in $L^2(\Omega)$ and present the basic theory of approximation in these spaces. Specifically, we compare the definition based on integral representations with the definition in…
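In one standard notation (the paper's precise setup may differ), the two definitions being compared are the convex-hull form
$$\|f\|_{\mathcal{K}_1(\mathbb{D})} \;=\; \inf\bigl\{\, t > 0 \;:\; f \in t\, \overline{\operatorname{conv}}(\pm\mathbb{D}) \,\bigr\},$$
and the integral-representation form
$$\|f\|_{\mathcal{K}_1(\mathbb{D})} \;=\; \inf\Bigl\{\, \|\mu\|_{TV} \;:\; f = \int_{\mathbb{D}} g \, d\mu(g) \,\Bigr\},$$
with the infimum over signed measures $\mu$ on the dictionary $\mathbb{D}$; the two agree under mild compactness conditions on $\mathbb{D}$.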
Generative adversarial networks (GANs) are highly effective unsupervised learning frameworks that can generate very sharp data, even for data such as images with complex, highly multimodal distributions. However, GANs are known to be very hard to train…
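For reference, the difficulty alluded to stems from the saddle-point structure of the standard GAN objective (Goodfellow et al., 2014),
$$\min_G \max_D \;\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr] + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr],$$
a min-max problem for which simultaneous gradient updates on $G$ and $D$ need not converge to an equilibrium.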