مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Smaller generalization error derived for a deep residual neural network compared to shallow networks

195 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Mattias Sandberg

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Aku Kammonen - Jonas Kiessling - Petr Plechav{c}

التحليل العددي التحليل العددي التحسين والتحكم

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $bar z_{ell+1}=bar z_ell + mathrm{Re}sum_{k=1}^Kbar b_{ell k}e^{mathrm{i}omega_{ell k}bar z_ell}+ mathrm{Re}sum_{k=1}^Kbar c_{ell k}e^{mathrm{i}omega_{ell k}cdot x}$. An optimal distribution for the frequencies $(omega_{ell k},omega_{ell k})$ of the random Fourier features $e^{mathrm{i}omega_{ell k}bar z_ell}$ and $e^{mathrm{i}omega_{ell k}cdot x}$ is derived. This derivation is based on the corresponding generalization error for the approximation of the function values $f(x)$. The generalization error turns out to be smaller than the estimate ${|hat f|^2_{L^1(mathbb{R}^d)}}/{(KL)}$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case the $L^infty$-norm of $f$ is much less than the $L^1$-norm of its Fourier transform $hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network. Promising performance of the proposed new algorithm is demonstrated in computational experiments.

قيم البحث

212 - Philipp Grohs , Shokhrukh Ibragimov , Arnulf Jentzen 2021

Artificial neural networks (ANNs) have become a very powerful tool in the approximation of high-dimensional functions. Especially, deep ANNs, consisting of a large number of hidden layers, have been very successfully used in a series of practical rel evant computational problems involving high-dimensional input data ranging from classification tasks in supervised learning to optimal decision problems in reinforcement learning. There are also a number of mathematical results in the scientific literature which study the approximation capacities of ANNs in the context of high-dimensional target functions. In particular, there are a series of mathematical results in the scientific literature which show that sufficiently deep ANNs have the capacity to overcome the curse of dimensionality in the approximation of certain target function classes in the sense that the number of parameters of the approximating ANNs grows at most polynomially in the dimension $d in mathbb{N}$ of the target functions under considerations. In the proofs of several of such high-dimensional approximation results it is crucial that the involved ANNs are sufficiently deep and consist a sufficiently large number of hidden layers which grows in the dimension of the considered target functions. It is the topic of this work to look a bit more detailed to the deepness of the involved ANNs in the approximation of high-dimensional target functions. In particular, the main result of this work proves that there exists a concretely specified sequence of functions which can be approximated without the curse of dimensionality by sufficiently deep ANNs but which cannot be approximated without the curse of dimensionality if the involved ANNs are shallow or not deep enough.

التحليل العددي التحليل العددي

A Priori Generalization Error Analysis of Two-Layer Neural Networks for Solving High Dimensional Schrodinger Eigenvalue Problems

118 - Jianfeng Lu , Yulong Lu 2021

This paper analyzes the generalization error of two-layer neural networks for computing the ground state of the Schrodinger operator on a $d$-dimensional hypercube. We prove that the convergence rate of the generalization error is independent of the dimension $d$, under the a priori assumption that the ground state lies in a spectral Barron space. We verify such assumption by proving a new regularity estimate for the ground state in the spectral Barron space. The later is achieved by a fixed point argument based on the Krein-Rutman theorem.

التحليل العددي التحليل العددي الفيزياء الرياضية

A residual a posteriori error estimate for the time-domain boundary element method

120 - Heiko Gimperlein , Ceyhun Oezdemir , David Stark 2020

This article investigates residual a posteriori error estimates and adaptive mesh refinements for time-dependent boundary element methods for the wave equation. We obtain reliable estimates for Dirichlet and acoustic boundary conditions which hold fo r a large class of discretizations. Efficiency of the error estimate is shown for a natural discretization of low order. Numerical examples confirm the theoretical results. The resulting adaptive mesh refinement procedures in 3d recover the adaptive convergence rates known for elliptic problems.

التحليل العددي التحليل العددي تحليل PDES

Generalization Error Analysis of Neural networks with Gradient Based Regularization

167 - Lingfeng Li , Xue-Cheng Tai , Jiang Yang 2021

We study gradient-based regularization methods for neural networks. We mainly focus on two regularization methods: the total variation and the Tikhonov regularization. Applying these methods is equivalent to using neural networks to solve some partia l differential equations, mostly in high dimensions in practical applications. In this work, we introduce a general framework to analyze the generalization error of regularized networks. The error estimate relies on two assumptions on the approximation error and the quadrature error. Moreover, we conduct some experiments on the image classification tasks to show that gradient-based methods can significantly improve the generalization ability and adversarial robustness of neural networks. A graphical extension of the gradient-based methods are also considered in the experiments.

التعلم الآلي التحليل العددي التحليل العددي

Symmetry & critical points for a model shallow neural network

146 - Yossi Arjevani , Michael Field 2020

We consider the optimization problem associated with fitting two-layer ReLU networks with $k$ hidden neurons, where labels are assumed to be generated by a (teacher) neural network. We leverage the rich symmetry exhibited by such models to identify v arious families of critical points and express them as power series in $k^{-frac{1}{2}}$. These expressions are then used to derive estimates for several related quantities which imply that not all spurious minima are alike. In particular, we show that while the loss function at certain types of spurious minima decays to zero like $k^{-1}$, in other cases the loss converges to a strictly positive constant. The methods used depend on symmetry, the geometry of group actions, bifurcation, and Artins implicit function theorem.

التعلم الآلي النظم الديناميكية التحسين والتحكم

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

معهد تكنولوجيا المعلومات ITI

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Smaller generalization error derived for a deep residual neural network compared to shallow networks

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً