Doubly infinite residual networks: a diffusion process approach


الملخص بالإنكليزية

When neural networks parameters are initialized as i.i.d., neural networks exhibit undesirable forward and backward properties as the number of layers increases, e.g., vanishing dependency on the input, and perfectly correlated outputs for any two inputs. To overcome these drawbacks Peluchetti and Favaro (2020) considered fully connected residual networks (ResNets) with parameters distributions that shrink as the number of layers increases. In particular, they established an interplay between infinitely deep ResNets and solutions to stochastic differential equations, i.e. diffusion processes, showing that infinitely deep ResNets does not suffer from undesirable forward properties. In this paper, we review the forward-propagation results of Peluchetti and Favaro (2020), extending them to the setting of convolutional ResNets. Then, we study analogous backward-propagation results, which directly relate to the problem of training deep ResNets. Finally, we extend our study to the doubly infinite regime where both networks width and depth grow unboundedly. Within this novel regime the dynamics of quantities of interest converge, at initialization, to deterministic limits. This allow us to provide analytical expressions for inference, both in the case of weakly trained and fully trained networks. These results point to a limited expressive power of doubly infinite ResNets when the unscaled parameters are i.i.d, and residual blocks are shallow.

تحميل البحث