ﻻ يوجد ملخص باللغة العربية
In this paper, we present a spectral-based approach to study the linear approximation of two-layer neural networks. We first consider the case of single neuron and show that the linear approximability, quantified by the Kolmogorov width, is controlled by the eigenvalue decay of an associate kernel. Then, we show that similar results also hold for two-layer neural networks. This spectral-based approach allows us to obtain upper bounds, lower bounds, and explicit hard examples in a united manner. In particular, these bounds imply that for networks activated by smooth functions, restricting the norms of inner-layer weights may significantly impair the expressiveness. By contrast, for non-smooth activation functions, such as ReLU, the network expressiveness is independent of the inner-layer weight norms. In addition, we prove that for a family of non-smooth activation functions, including ReLU, approximating any single neuron with random features suffers from the emph{curse of dimensionality}. This provides an explicit separation of expressiveness between neural networks and random feature models.
Recently, several studies have proven the global convergence and generalization abilities of the gradient descent method for two-layer ReLU networks. Most studies especially focused on the regression problems with the squared loss function, except fo
There has been a recent surge of interest in understanding the convergence of gradient descent (GD) and stochastic gradient descent (SGD) in overparameterized neural networks. Most previous works assume that the training data is provided a priori in
Compression techniques for deep neural network models are becoming very important for the efficient execution of high-performance deep learning systems on edge-computing devices. The concept of model compression is also important for analyzing the ge
Discrete Fourier transforms provide a significant speedup in the computation of convolutions in deep learning. In this work, we demonstrate that, beyond its advantages for efficient computation, the spectral domain also provides a powerful representa