Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $\bar z_{\ell+1}=\bar z_\ell + \mathrm{Re}\sum_{k=1}^K\bar b_{\ell k}e^{\mathrm{i}\omega_{\ell k}\bar z_\ell}+ \mathrm{Re}\sum_{k=1}^K\bar c_{\ell k}e^{\mathrm{i}\omega'_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k},\omega'_{\ell k})$ of the random Fourier features $e^{\mathrm{i}\omega_{\ell k}\bar z_\ell}$ and $e^{\mathrm{i}\omega'_{\ell k}\cdot x}$ is derived. This derivation is based on the corresponding generalization error for the approximation of the function values $f(x)$. The generalization error turns out to be smaller than the estimate $\|\hat f\|^2_{L^1(\mathbb{R}^d)}/(KL)$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case that the $L^\infty$-norm of $f$ is much less than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network. Promising performance of the proposed new algorithm is demonstrated in computational experiments.
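For concreteness, the following is a minimal NumPy sketch of one forward pass through such residual random Fourier features layers; the array names, shapes, and the zero initialization of $\bar z_0$ are illustrative assumptions rather than the authors' implementation.

import numpy as np

def residual_rff_forward(x, omega, omega_prime, b, c):
    # x: input in R^d
    # omega[l, k]: scalar frequencies multiplying \bar z_l      (shape L x K)
    # omega_prime[l, k, :]: frequency vectors paired with x     (shape L x K x d)
    # b[l, k], c[l, k]: complex amplitudes                      (shape L x K)
    L, K = b.shape
    z = 0.0  # \bar z_0, assumed to be initialized at zero
    for l in range(L):
        z = z + np.real(np.sum(b[l] * np.exp(1j * omega[l] * z))) \
              + np.real(np.sum(c[l] * np.exp(1j * (omega_prime[l] @ x))))
    return z  # \bar z_L, the network's approximation of f(x)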
Artificial neural networks (ANNs) have become a very powerful tool in the approximation of high-dimensional functions. In particular, deep ANNs, consisting of a large number of hidden layers, have been used very successfully in a range of practically relevant computational problems involving high-dimensional input data, ranging from classification tasks in supervised learning to optimal decision problems in reinforcement learning. There are also a number of mathematical results in the scientific literature which study the approximation capacities of ANNs in the context of high-dimensional target functions. In particular, a series of such results shows that sufficiently deep ANNs have the capacity to overcome the curse of dimensionality in the approximation of certain target function classes, in the sense that the number of parameters of the approximating ANNs grows at most polynomially in the dimension $d \in \mathbb{N}$ of the target functions under consideration. In the proofs of several such high-dimensional approximation results it is crucial that the ANNs involved are sufficiently deep and consist of a sufficiently large number of hidden layers, a number which grows with the dimension of the considered target functions. The topic of this work is to look in more detail at the depth of the ANNs involved in the approximation of high-dimensional target functions. In particular, the main result of this work proves that there exists a concretely specified sequence of functions which can be approximated without the curse of dimensionality by sufficiently deep ANNs, but which cannot be approximated without the curse of dimensionality if the ANNs involved are shallow or not deep enough.
This paper analyzes the generalization error of two-layer neural networks for computing the ground state of the Schrödinger operator on a $d$-dimensional hypercube. We prove that the convergence rate of the generalization error is independent of the dimension $d$, under the a priori assumption that the ground state lies in a spectral Barron space. We verify this assumption by proving a new regularity estimate for the ground state in the spectral Barron space. The latter is achieved by a fixed point argument based on the Krein-Rutman theorem.
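As a hedged illustration (not taken verbatim from the paper), the ground state of a Schrödinger operator $-\Delta + V$ on a domain $\Omega$ admits the standard variational characterization via the Rayleigh quotient, which a two-layer network ansatz can be trained to minimize:
$$ \lambda_0 \;=\; \min_{u \neq 0} \frac{\int_{\Omega} \big( |\nabla u(x)|^2 + V(x)\, u(x)^2 \big)\, dx}{\int_{\Omega} u(x)^2\, dx}, \qquad u_\theta(x) \;=\; \sum_{k=1}^{m} a_k\, \sigma(w_k \cdot x + b_k), $$
where the activation $\sigma$, the width $m$, and the parameters $\theta=(a_k,w_k,b_k)$ are generic placeholders and the minimization is restricted to functions of the form $u_\theta$.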
This article investigates residual a posteriori error estimates and adaptive mesh refinements for time-dependent boundary element methods for the wave equation. We obtain reliable estimates for Dirichlet and acoustic boundary conditions which hold for a large class of discretizations. Efficiency of the error estimate is shown for a natural discretization of low order. Numerical examples confirm the theoretical results. The resulting adaptive mesh refinement procedures in 3d recover the adaptive convergence rates known for elliptic problems.
We study gradient-based regularization methods for neural networks. We mainly focus on two regularization methods: total variation and Tikhonov regularization. Applying these methods is equivalent to using neural networks to solve certain partial differential equations, which in practical applications are mostly high dimensional. In this work, we introduce a general framework to analyze the generalization error of regularized networks. The error estimate relies on two assumptions, on the approximation error and on the quadrature error. Moreover, we conduct experiments on image classification tasks to show that gradient-based methods can significantly improve the generalization ability and adversarial robustness of neural networks. A graphical extension of the gradient-based methods is also considered in the experiments.
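As a minimal sketch of how such a penalty can be implemented, assuming a PyTorch-style classification setup with generic names (model, inputs, labels) not taken from the paper, one can add the norm of the input gradient of the network output to the training loss; the Tikhonov variant uses a squared $L^2$ norm and the total-variation variant an $L^1$ norm.

import torch
import torch.nn.functional as F

def regularized_loss(model, inputs, labels, lam=1e-3, penalty="tikhonov"):
    # Sketch of a gradient-based penalty: differentiate the network output
    # with respect to its input and penalize that gradient.  Taking the
    # gradient of the summed logits is a simplification for illustration.
    inputs = inputs.clone().requires_grad_(True)
    outputs = model(inputs)
    data_loss = F.cross_entropy(outputs, labels)
    grads, = torch.autograd.grad(outputs.sum(), inputs, create_graph=True)
    non_batch_dims = tuple(range(1, grads.dim()))
    if penalty == "tikhonov":
        reg = grads.pow(2).sum(dim=non_batch_dims).mean()   # squared L2 norm
    else:
        reg = grads.abs().sum(dim=non_batch_dims).mean()    # total-variation style L1 norm
    return data_loss + lam * reg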
We consider the optimization problem associated with fitting two-layer ReLU networks with $k$ hidden neurons, where labels are assumed to be generated by a (teacher) neural network. We leverage the rich symmetry exhibited by such models to identify various families of critical points and express them as power series in $k^{-\frac{1}{2}}$. These expressions are then used to derive estimates for several related quantities which imply that not all spurious minima are alike. In particular, we show that while the loss function at certain types of spurious minima decays to zero like $k^{-1}$, in other cases the loss converges to a strictly positive constant. The methods used depend on symmetry, the geometry of group actions, bifurcation, and Artin's implicit function theorem.
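For orientation, one common formulation of this teacher-student setting (assuming standard Gaussian inputs and unit second-layer weights; the paper's exact setup may differ) reads
$$ \min_{w_1,\dots,w_k \in \mathbb{R}^d} \; \mathbb{E}_{x \sim \mathcal{N}(0, I_d)} \Big[ \Big( \sum_{i=1}^{k} \max(w_i^\top x, 0) - \sum_{j=1}^{k} \max(v_j^\top x, 0) \Big)^{2} \Big], $$
where the teacher weights $v_1,\dots,v_k$ are fixed and the critical points of this loss are the objects expanded in powers of $k^{-\frac{1}{2}}$.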