Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $\bar z_{\ell+1}=\bar z_\ell + \mathrm{Re}\sum_{k=1}^K\bar b_{\ell k}e^{\mathrm{i}\omega_{\ell k}\bar z_\ell}+ \mathrm{Re}\sum_{k=1}^K\bar c_{\ell k}e^{\mathrm{i}\omega'_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k},\omega'_{\ell k})$ of the random Fourier features $e^{\mathrm{i}\omega_{\ell k}\bar z_\ell}$ and $e^{\mathrm{i}\omega'_{\ell k}\cdot x}$ is derived. This derivation is based on the corresponding generalization error for the approximation of the function values $f(x)$. The generalization error turns out to be smaller than the estimate $\|\hat f\|^2_{L^1(\mathbb{R}^d)}/(KL)$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case that the $L^\infty$-norm of $f$ is much less than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network. Promising performance of the proposed new algorithm is demonstrated in computational experiments.
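For concreteness, the following is a minimal NumPy sketch of one forward pass through such residual random Fourier features layers; the array names, shapes, and the zero initialization of $\bar z_0$ are illustrative assumptions rather than the authors' implementation.

import numpy as np

def residual_rff_forward(x, omega, omega_prime, b, c):
    # x: input in R^d
    # omega[l, k]: scalar frequencies multiplying \bar z_l      (shape L x K)
    # omega_prime[l, k, :]: frequency vectors paired with x     (shape L x K x d)
    # b[l, k], c[l, k]: complex amplitudes                      (shape L x K)
    L, K = b.shape
    z = 0.0  # \bar z_0, assumed to be initialized at zero
    for l in range(L):
        z = z + np.real(np.sum(b[l] * np.exp(1j * omega[l] * z))) \
              + np.real(np.sum(c[l] * np.exp(1j * (omega_prime[l] @ x))))
    return z  # \bar z_L, the network's approximation of f(x)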
Artificial neural networks (ANNs) have become a very powerful tool in the approximation of high-dimensional functions. In particular, deep ANNs, consisting of a large number of hidden layers, have been used very successfully in a range of practically relevant computational problems involving high-dimensional input data, ranging from classification tasks in supervised learning to optimal decision problems in reinforcement learning. There are also a number of mathematical results in the scientific literature which study the approximation capacities of ANNs in the context of high-dimensional target functions. In particular, a series of such results shows that sufficiently deep ANNs have the capacity to overcome the curse of dimensionality in the approximation of certain target function classes, in the sense that the number of parameters of the approximating ANNs grows at most polynomially in the dimension $d \in \mathbb{N}$ of the target functions under consideration. In the proofs of several such high-dimensional approximation results it is crucial that the ANNs involved are sufficiently deep and consist of a sufficiently large number of hidden layers, a number which grows with the dimension of the considered target functions. The topic of this work is to look in more detail at the depth of the ANNs involved in the approximation of high-dimensional target functions. In particular, the main result of this work proves that there exists a concretely specified sequence of functions which can be approximated without the curse of dimensionality by sufficiently deep ANNs, but which cannot be approximated without the curse of dimensionality if the ANNs involved are shallow or not deep enough.
This paper analyzes the generalization error of two-layer neural networks for computing the ground state of the Schrödinger operator on a $d$-dimensional hypercube. We prove that the convergence rate of the generalization error is independent of the dimension $d$, under the a priori assumption that the ground state lies in a spectral Barron space. We verify this assumption by proving a new regularity estimate for the ground state in the spectral Barron space. The latter is achieved by a fixed point argument based on the Krein-Rutman theorem.
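As a hedged illustration (not taken verbatim from the paper), the ground state of a Schrödinger operator $-\Delta + V$ on a domain $\Omega$ admits the standard variational characterization via the Rayleigh quotient, which a two-layer network ansatz can be trained to minimize:
$$ \lambda_0 \;=\; \min_{u \neq 0} \frac{\int_{\Omega} \big( |\nabla u(x)|^2 + V(x)\, u(x)^2 \big)\, dx}{\int_{\Omega} u(x)^2\, dx}, \qquad u_\theta(x) \;=\; \sum_{k=1}^{m} a_k\, \sigma(w_k \cdot x + b_k), $$
where the activation $\sigma$, the width $m$, and the parameters $\theta=(a_k,w_k,b_k)$ are generic placeholders and the minimization is restricted to functions of the form $u_\theta$.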
This article investigates residual a posteriori error estimates and adaptive mesh refinements for time-dependent boundary element methods for the wave equation. We obtain reliable estimates for Dirichlet and acoustic boundary conditions which hold for a large class of discretizations. Efficiency of the error estimate is shown for a natural discretization of low order. Numerical examples confirm the theoretical results. The resulting adaptive mesh refinement procedures in 3d recover the adaptive convergence rates known for elliptic problems.
We study gradient-based regularization methods for neural networks. We mainly focus on two regularization methods: total variation and Tikhonov regularization. Applying these methods is equivalent to using neural networks to solve certain partial differential equations, which in practical applications are mostly high dimensional. In this work, we introduce a general framework to analyze the generalization error of regularized networks. The error estimate relies on two assumptions, on the approximation error and on the quadrature error. Moreover, we conduct experiments on image classification tasks to show that gradient-based methods can significantly improve the generalization ability and adversarial robustness of neural networks. A graphical extension of the gradient-based methods is also considered in the experiments.
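As a minimal sketch of how such a penalty can be implemented, assuming a PyTorch-style classification setup with generic names (model, inputs, labels) not taken from the paper, one can add the norm of the input gradient of the network output to the training loss; the Tikhonov variant uses a squared $L^2$ norm and the total-variation variant an $L^1$ norm.

import torch
import torch.nn.functional as F

def regularized_loss(model, inputs, labels, lam=1e-3, penalty="tikhonov"):
    # Sketch of a gradient-based penalty: differentiate the network output
    # with respect to its input and penalize that gradient.  Taking the
    # gradient of the summed logits is a simplification for illustration.
    inputs = inputs.clone().requires_grad_(True)
    outputs = model(inputs)
    data_loss = F.cross_entropy(outputs, labels)
    grads, = torch.autograd.grad(outputs.sum(), inputs, create_graph=True)
    non_batch_dims = tuple(range(1, grads.dim()))
    if penalty == "tikhonov":
        reg = grads.pow(2).sum(dim=non_batch_dims).mean()   # squared L2 norm
    else:
        reg = grads.abs().sum(dim=non_batch_dims).mean()    # total-variation style L1 norm
    return data_loss + lam * reg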
We consider the optimization problem associated with fitting two-layer ReLU networks with $k$ hidden neurons, where labels are assumed to be generated by a (teacher) neural network. We leverage the rich symmetry exhibited by such models to identify various families of critical points and express them as power series in $k^{-\frac{1}{2}}$. These expressions are then used to derive estimates for several related quantities which imply that not all spurious minima are alike. In particular, we show that while the loss function at certain types of spurious minima decays to zero like $k^{-1}$, in other cases the loss converges to a strictly positive constant. The methods used depend on symmetry, the geometry of group actions, bifurcation, and Artin's implicit function theorem.
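For orientation, one common formulation of this teacher-student setting (assuming standard Gaussian inputs and unit second-layer weights; the paper's exact setup may differ) reads
$$ \min_{w_1,\dots,w_k \in \mathbb{R}^d} \; \mathbb{E}_{x \sim \mathcal{N}(0, I_d)} \Big[ \Big( \sum_{i=1}^{k} \max(w_i^\top x, 0) - \sum_{j=1}^{k} \max(v_j^\top x, 0) \Big)^{2} \Big], $$
where the teacher weights $v_1,\dots,v_k$ are fixed and the critical points of this loss are the objects expanded in powers of $k^{-\frac{1}{2}}$.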