Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves that gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show that the Gram matrix is stable throughout the training process, and this stability implies the global optimality of the gradient descent algorithm. We further extend our analysis to deep residual convolutional neural networks and obtain a similar convergence result.
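To make the Gram-matrix argument concrete, the following is a minimal JAX sketch, not the paper's construction: the toy fully connected ResNet, its scaling, and the learning rate are illustrative assumptions. It forms the matrix H(t) with entries H_ij(t) = <grad_theta f(theta_t, x_i), grad_theta f(theta_t, x_j)> and tracks its smallest eigenvalue along plain gradient descent on the squared loss; the stability referred to above corresponds to this eigenvalue staying bounded away from zero, which is what drives the training loss to zero.

import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

# Toy problem sizes (illustrative only, far from the paper's over-parameterization requirements).
n, d, m, depth = 8, 5, 256, 4
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (n, d))
y = jax.random.normal(jax.random.PRNGKey(1), (n,))

def init_params(key):
    keys = jax.random.split(key, depth + 2)
    return {
        "in":  jax.random.normal(keys[0], (d, m)) / jnp.sqrt(d),
        "res": [jax.random.normal(k, (m, m)) / jnp.sqrt(m) for k in keys[1:-1]],
        "out": jax.random.normal(keys[-1], (m,)) / jnp.sqrt(m),
    }

def f(params, x):
    # Fully connected ResNet: each block adds a scaled nonlinear update to the hidden state.
    h = jax.nn.relu(x @ params["in"])
    for W in params["res"]:
        h = h + jax.nn.relu(h @ W) / depth
    return h @ params["out"]

def loss(params):
    preds = jax.vmap(lambda x: f(params, x))(X)
    return 0.5 * jnp.sum((preds - y) ** 2)

def gram_min_eig(params):
    # H_ij = <grad f(x_i), grad f(x_j)>, gradients taken with respect to all parameters.
    flat, unravel = ravel_pytree(params)
    jac = jax.vmap(lambda x: jax.grad(lambda p: f(unravel(p), x))(flat))(X)
    H = jac @ jac.T
    return jnp.linalg.eigvalsh(H)[0]

params = init_params(key)
lr = 1e-4
for t in range(1001):
    if t % 200 == 0:
        print(f"step {t:4d}  loss {float(loss(params)):.4f}  min_eig(H) {float(gram_min_eig(params)):.4f}")
    grads = jax.grad(loss)(params)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)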
In this paper, we theoretically prove that gradient descent can find a global minimum of the non-convex optimization of all layers for nonlinear deep neural networks of sizes commonly encountered in practice.
Existing global convergence guarantees of (stochastic) gradient descent do not apply to practical deep networks in the practical regime of deep learning beyond the neural tangent kernel (NTK) regime. This paper proposes an algorithm that is guaranteed to converge globally in this practical regime.
Representations are fundamental to artificial intelligence. The performance of a learning system depends on the type of representation used to represent the data. Typically, these representations are hand-engineered using domain knowledge.
Spiking Neural Networks (SNNs) have attracted great attention due to their biological plausibility and high energy efficiency on neuromorphic chips. As these chips are usually resource-constrained, the compression of SNNs is crucial for their practical use.
The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests that they generalize better than sharp minima.