We analyze the convergence of averaged stochastic gradient descent for overparameterized two-layer neural networks on regression problems. It was recently found that the neural tangent kernel (NTK) plays an important role in establishing the global convergence of gradient-based methods under the NTK regime, where the learning dynamics of overparameterized neural networks are almost completely characterized by those of the associated reproducing kernel Hilbert space (RKHS). However, there is still room for improvement in the convergence rate analysis under the NTK regime. In this study, we show that averaged stochastic gradient descent achieves the minimax optimal convergence rate, with a global convergence guarantee, by exploiting the complexities of the target function and of the RKHS associated with the NTK. Moreover, we show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate through a smooth approximation of the ReLU network under certain conditions.
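As a rough illustration of this setting, the following sketch (a minimal example, not the paper's exact algorithm or analysis) runs one-pass SGD with iterate averaging on a width-m two-layer ReLU network in the NTK parameterization, with 1/sqrt(m) output scaling and a fixed second layer; the synthetic data, width, and step size are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression stream (hypothetical target; not from the paper).
d, n, m = 5, 2000, 1000                               # input dim, samples, hidden width
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)         # inputs on the unit sphere
y = np.sin(X @ rng.standard_normal(d)) + 0.1 * rng.standard_normal(n)

# Two-layer network in the NTK scaling: f(x) = (1/sqrt(m)) * a^T relu(W x).
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)                   # second layer kept fixed

def forward(W, x):
    h = np.maximum(W @ x, 0.0)                        # ReLU hidden units
    return (a @ h) / np.sqrt(m), h

eta = 0.5
W_avg = np.zeros_like(W)

# One pass of SGD over the stream, with a running (Polyak) average of the iterates.
for t in range(n):
    x, target = X[t], y[t]
    pred, h = forward(W, x)
    grad_out = pred - target                          # squared-loss gradient at the output
    # Gradient w.r.t. the first layer only (a common NTK-regime setup).
    dW = grad_out * (a[:, None] * (h > 0)[:, None] * x[None, :]) / np.sqrt(m)
    W -= eta * dW
    W_avg += (W - W_avg) / (t + 1)                    # incremental average of iterates

test_pred = np.array([forward(W_avg, x)[0] for x in X[:200]])
print("MSE of averaged iterate on first 200 points:", np.mean((test_pred - y[:200]) ** 2))
```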
We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In the traditional analysis using a consistency property of loss functions, it is known that the expected clas
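For intuition, here is a minimal sketch of one-pass kernel SGD with the logistic loss together with an averaged predictor; the Gaussian kernel, bandwidth, step size, and toy data stream are assumptions, and the closed-form reweighting used to form the averaged iterate relies on the unregularized, single-pass setting of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary classification stream (hypothetical data, not from the paper).
d, n = 2, 1500
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] * X[:, 1])                 # a nonlinear decision boundary

def gaussian_kernel(A, b, gamma=1.0):
    return np.exp(-gamma * np.sum((A - b) ** 2, axis=1))

eta = 1.0
coef = np.zeros(n)                             # c_s: coefficient attached to x_s

# One-pass kernel SGD with the logistic loss:
#   f_{t+1} = f_t - eta * l'(y_t f_t(x_t)) * y_t * k(x_t, .)
for t in range(n):
    f_xt = coef[:t] @ gaussian_kernel(X[:t], X[t]) if t > 0 else 0.0
    margin = y[t] * f_xt
    # derivative of log(1 + exp(-margin)) w.r.t. f(x_t), clipped for numerical safety
    dloss = -y[t] / (1.0 + np.exp(np.clip(margin, -30.0, 30.0)))
    coef[t] = -eta * dloss

# Averaged iterate: the coefficient of x_s in (1/n) * sum_t f_t is c_s * (n - s) / n,
# valid here because, without regularization, past coefficients never change.
coef_avg = coef * (n - np.arange(n)) / n

def predict(x, c):
    return np.sign(c @ gaussian_kernel(X, x))

acc = np.mean([predict(X[i], coef_avg) == y[i] for i in range(300)])
print("training accuracy of averaged predictor on first 300 points:", acc)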
We present a novel neural network Maximum Mean Discrepancy (MMD) statistic by identifying a connection between the neural tangent kernel (NTK) and the MMD statistic. This connection enables us to develop a computationally efficient and memory-efficient appro
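The paper's efficiency claims concern its specific procedure; as a plain illustration of the underlying quantity only, the sketch below computes a (biased) MMD^2 estimate using the empirical NTK of a small finite-width two-layer ReLU network, with gradients taken with respect to the first layer only. The sample sizes, width, and the mean shift in the second sample are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two small samples to compare (hypothetical data; the second sample is shifted).
d, n = 3, 100
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, d)) + 0.5

# Empirical NTK of a finite-width two-layer ReLU network at initialization:
# k(x, x') = <grad_W f(x), grad_W f(x')> for f(z) = a^T relu(W z), first layer only.
m = 512
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def ntk_features(Z):
    """Per-example gradient of f(z) = a^T relu(W z) w.r.t. W, flattened."""
    H = Z @ W.T                                   # (n, m) pre-activations
    G = (H > 0) * a                               # (n, m) relu'(W z) * a
    # gradient w.r.t. W is outer(G_i, z_i); flatten to one feature vector per example
    return (G[:, :, None] * Z[:, None, :]).reshape(len(Z), -1)

FX, FY = ntk_features(X), ntk_features(Y)
Kxx, Kyy, Kxy = FX @ FX.T, FY @ FY.T, FX @ FY.T

# Biased V-statistic estimate of MMD^2 with the empirical NTK as the kernel.
mmd2 = Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()
print("NTK-MMD^2 estimate:", mmd2)
```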
The superior performance of ensemble methods with infinite models is well known. Most of these methods are based on optimization problems in infinite-dimensional spaces with some regularization, for instance, boosting methods and convex neural netwo
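As a toy illustration of optimizing over an (in principle infinite) family of weak learners, the sketch below runs boosting as functional gradient descent on the squared loss with decision stumps, using shrinkage as a crude stand-in for explicit regularization; the 1-D data and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# 1-D regression toy problem (hypothetical; chosen only to illustrate the idea).
n = 400
x = rng.uniform(-2, 2, size=n)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(n)

def fit_stump(x, r):
    """Least-squares decision stump: one weak learner from an (infinite) family."""
    best = None
    for thr in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left = x <= thr
        cl, cr = r[left].mean(), r[~left].mean()
        err = np.sum((r - np.where(left, cl, cr)) ** 2)
        if best is None or err < best[0]:
            best = (err, thr, cl, cr)
    _, thr, cl, cr = best
    return lambda z: np.where(z <= thr, cl, cr)

# Boosting as functional gradient descent on the squared loss, with shrinkage nu.
nu, T = 0.1, 200
F = np.zeros(n)
ensemble = []
for _ in range(T):
    residual = y - F                      # negative functional gradient of squared loss
    h = fit_stump(x, residual)
    ensemble.append(h)
    F += nu * h(x)

def predict(z):
    return nu * sum(h(z) for h in ensemble)

print("training MSE after boosting:", np.mean((y - predict(x)) ** 2))
```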
Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well. An e
There has been a recent surge of interest in understanding the convergence of gradient descent (GD) and stochastic gradient descent (SGD) in overparameterized neural networks. Most previous works assume that the training data is provided a priori in