We present a novel methodology, based on a Taylor expansion of the network output, for obtaining analytical expressions for the expected value of the network weights and output under stochastic training. Using these expressions, we study how the hyperparameters and the noise variance of the optimization algorithm affect the performance of the deep neural network. In the early phases of training with a small noise coefficient, the output is equivalent to that of a linear model. In this case the network can generalize better because the noise prevents the output from fully converging on the training data; however, the noise does not result in any explicit regularization. In the later training stages, when higher-order approximations are required, the impact of the noise becomes more significant: in a model that is non-linear in the weights, noise can regularize the output function, resulting in better generalization, as witnessed by its influence on the weight Hessian, a commonly used metric for generalization capabilities.
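As an illustration of the first-order regime described above, the following is a minimal sketch of a Taylor expansion of a network output in its weights. It is not the authors' derivation: the tiny one-hidden-layer tanh network, the helper names `net`, `grad_w`, and `linearized`, and the random weights and input are all assumptions made for this example. It only shows that, close to the expansion point, the output is well approximated by a model that is linear in the weights, and that the approximation error shrinks as the weights move less.

```python
import numpy as np

def net(w, x):
    """Hypothetical toy network: 2 inputs, 3 tanh hidden units, 1 output.
    The flat vector w packs the hidden weights (first 6) and output weights (last 3)."""
    W1 = w[:6].reshape(3, 2)
    w2 = w[6:]
    return w2 @ np.tanh(W1 @ x)

def grad_w(w, x):
    """Analytical gradient of the scalar output with respect to the flat weight vector."""
    W1 = w[:6].reshape(3, 2)
    w2 = w[6:]
    h = np.tanh(W1 @ x)
    dW1 = np.outer(w2 * (1.0 - h**2), x)  # derivative w.r.t. hidden weights
    return np.concatenate([dW1.ravel(), h])  # derivative w.r.t. output weights is h

def linearized(w, w0, x):
    """First-order Taylor expansion of the output around w0 (linear in w)."""
    return net(w0, x) + grad_w(w0, x) @ (w - w0)

rng = np.random.default_rng(0)
w0 = rng.normal(size=9)        # expansion point (assumed, e.g. the initial weights)
x = rng.normal(size=2)         # a single input
d = rng.normal(size=9)
d /= np.linalg.norm(d)         # unit-norm direction of weight change

# Linearization error for a larger and a smaller step away from w0.
errors = [abs(net(w0 + eps * d, x) - linearized(w0 + eps * d, w0, x))
          for eps in (1e-1, 1e-2)]
print(errors)
```

The error for the smaller step should be markedly lower, since the remainder of a first-order expansion is second order in the weight displacement. Once the weights have moved far enough that this remainder matters, higher-order terms are needed, which is the regime in which the abstract says the noise has a more significant effect.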
We propose a new point of view for regularizing deep neural networks by using the norm of a reproducing kernel Hilbert space (RKHS). Even though this norm cannot be computed, it admits upper and lower approximations leading to various practical strat
Large-scale numerical simulations are used across many scientific disciplines to facilitate experimental development and provide insights into underlying physical processes, but they come with a significant computational cost. Deep neural networks (D
It has been empirically observed that the flatness of minima obtained from training deep networks seems to correlate with better generalization. However, for deep networks with positively homogeneous activations, most measures of sharpness/flatness a
Forecasting high-dimensional time series plays a crucial role in many applications such as demand forecasting and financial predictions. Modern datasets can have millions of correlated time series that evolve together, i.e., they are extremely high dim