We analyze the joint probability distribution of the lengths of the vectors of hidden variables in different layers of a fully connected deep network, when the weights and biases are chosen randomly according to Gaussian distributions and the input is in $\{-1,1\}^N$. We show that, if the activation function $\phi$ satisfies a minimal set of assumptions, met by every activation function that we know to be used in practice, then, as the width of the network gets large, the length process converges in probability to a length map that is determined as a simple function of the variances of the random weights and biases and of the activation function $\phi$. We also show that this convergence may fail for $\phi$ that violate our assumptions.
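The abstract does not display the map itself, but a common form of such a length map in the mean-field literature, assuming weights of variance $\sigma_w^2/N$ and biases of variance $\sigma_b^2$ (a normalization we are supplying here, not one stated above), is
$$
q^{1} \;=\; \sigma_b^2 + \sigma_w^2, \qquad q^{\ell+1} \;=\; \sigma_b^2 + \sigma_w^2\, \mathbb{E}_{z \sim \mathcal{N}(0,\, q^{\ell})}\!\left[\phi(z)^2\right],
$$
where $q^{\ell}$ is the limiting normalized squared length $\|h^{\ell}\|^2/N$ of layer $\ell$'s hidden vector, and the value of $q^1$ uses $\|x\|^2/N = 1$ for $x \in \{-1,1\}^N$. Under assumptions like those above on $\phi$, the empirical squared lengths concentrate around this deterministic sequence as the widths grow.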
Recent work on the representation of functions on sets has considered the use of summation in a latent space to enforce permutation invariance. In particular, it has been conjectured that the dimension of this latent space may remain fixed as the cardinality …
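For concreteness, here is a minimal sketch of the sum-decomposition idea this line of work studies: each set element is embedded by a map $\phi$, the embeddings are summed, and a decoder $\rho$ reads out the result, so the output cannot depend on the order of the elements. The functions, names, and latent dimension below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Illustrative sketch of a sum-decomposition f(X) = rho(sum_x phi(x)).
def phi(x, W):
    return np.tanh(W @ x)  # embed one set element into the latent space

def rho(z, v):
    return v @ z  # decode the pooled latent vector

def f(X, W, v):
    # Summing the embeddings makes f invariant to the order of rows of X.
    return rho(sum(phi(x, W) for x in X), v)

rng = np.random.default_rng(0)
W, v = rng.normal(size=(8, 3)), rng.normal(size=8)  # latent dimension 8
X = rng.normal(size=(5, 3))                         # a "set" of 5 elements
print(np.isclose(f(X, W, v), f(X[rng.permutation(5)], W, v)))  # True
```

The conjecture in question concerns whether the latent dimension (8 in this toy sketch) can remain fixed as the set size grows.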
It is well known that overparametrized neural networks trained using gradient-based methods quickly achieve small training error with appropriate hyperparameter settings. Recent papers have proved this statement theoretically for highly overparametrized …
The fine-grained relationship between form and function with respect to deep neural network architecture design and hardware-specific acceleration is one area that is not well studied in the research literature, with form often dictated by accuracy a…
Dropout is a simple but effective technique for learning in neural networks and other settings. A sound theoretical understanding of dropout is needed to determine when dropout should be applied and how to use it most effectively. In this paper we co…
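As a point of reference for the technique being analyzed, here is a minimal sketch of the standard "inverted" dropout rule; the $1/(1-p)$ rescaling is one common convention, and the setting studied in the paper may differ.

```python
import numpy as np

def dropout(h, p, rng, train=True):
    """Zero each unit independently with probability p during training,
    rescaling survivors by 1/(1-p) so the expected activation is unchanged;
    act as the identity at test time."""
    if not train or p == 0.0:
        return h
    mask = rng.random(h.shape) >= p  # keep a unit with probability 1 - p
    return h * mask / (1.0 - p)

rng = np.random.default_rng(0)
print(dropout(np.ones(8), 0.5, rng))  # surviving units are scaled to 2.0
```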
Much research effort has been devoted to developing methods for reconstructing the links of a network from the dynamics of its nodes. Many current methods require that the measurements of the dynamics of all the nodes be known. In real-world problems, it is c…