Generalization Error Analysis of Neural networks with Gradient Based Regularization

168 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Lingfeng Li

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Lingfeng Li - Xue-Cheng Tai - Jiang Yang

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We study gradient-based regularization methods for neural networks. We mainly focus on two regularization methods: the total variation and the Tikhonov regularization. Applying these methods is equivalent to using neural networks to solve some partial differential equations, mostly in high dimensions in practical applications. In this work, we introduce a general framework to analyze the generalization error of regularized networks. The error estimate relies on two assumptions on the approximation error and the quadrature error. Moreover, we conduct some experiments on the image classification tasks to show that gradient-based methods can significantly improve the generalization ability and adversarial robustness of neural networks. A graphical extension of the gradient-based methods are also considered in the experiments.

قيم البحث

101 - Jakob Heiss , Josef Teichmann , Hanna Wutte 2019

Today, various forms of neural networks are trained to perform approximation tasks in many fields. However, the estimates obtained are not fully understood on function space. Empirical results suggest that typical training algorithms favor regularize d solutions. These observations motivate us to analyze properties of the neural networks found by gradient descent initialized close to zero, that is frequently employed to perform the training task. As a starting point, we consider one dimensional (shallow) ReLU neural networks in which weights are chosen randomly and only the terminal layer is trained. First, we rigorously show that for such networks ridge regularized regression corresponds in function space to regularizing the estimates second derivative for fairly general loss functionals. For least squares regression, we show that the trained network converges to the smooth spline interpolation of the training data as the number of hidden nodes tends to infinity. Moreover, we derive a correspondence between the early stopped gradient descent and the smoothing spline regression. Our analysis might give valuable insight on the properties of the solutions obtained using gradient descent methods in general settings.

التعلم الآلي التحليل العددي التحليل العددي

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization

254 - Difan Zou , Yuan Cao , Yuanzhi Li 2021

Adaptive gradient methods such as Adam have gained increasing popularity in deep learning optimization. However, it has been observed that compared with (stochastic) gradient descent, Adam can converge to a different solution with a significantly wor se test error in many deep learning applications such as image classification, even with a fine-tuned regularization. In this paper, we provide a theoretical explanation for this phenomenon: we show that in the nonconvex setting of learning over-parameterized two-layer convolutional neural networks starting from the same random initialization, for a class of data distributions (inspired from image data), Adam and gradient descent (GD) can converge to different global solutions of the training objective with provably different generalization errors, even with weight decay regularization. In contrast, we show that if the training objective is convex, and the weight decay regularization is employed, any optimization algorithms including Adam and GD will converge to the same solution if the training is successful. This suggests that the inferior generalization performance of Adam is fundamentally tied to the nonconvex landscape of deep learning optimization.

التعلم الآلي التحسين والتحكم التعلم الالي

A Weight Initialization Based on the Linear Product Structure for Neural Networks

122 - Qipin Chen , Wenrui Hao , Juncai He 2021

Weight initialization plays an important role in training neural networks and also affects tremendous deep learning applications. Various weight initialization strategies have already been developed for different activation functions with different n eural networks. These initialization algorithms are based on minimizing the variance of the parameters between layers and might still fail when neural networks are deep, e.g., dying ReLU. To address this challenge, we study neural networks from a nonlinear computation point of view and propose a novel weight initialization strategy that is based on the linear product structure (LPS) of neural networks. The proposed strategy is derived from the polynomial approximation of activation functions by using theories of numerical algebraic geometry to guarantee to find all the local minima. We also provide a theoretical analysis that the LPS initialization has a lower probability of dying ReLU comparing to other existing initialization strategies. Finally, we test the LPS initialization algorithm on both fully connected neural networks and convolutional neural networks to show its feasibility, efficiency, and robustness on public datasets.

التعلم الآلي التحليل العددي التحليل العددي

A Priori Generalization Error Analysis of Two-Layer Neural Networks for Solving High Dimensional Schrodinger Eigenvalue Problems

118 - Jianfeng Lu , Yulong Lu 2021

This paper analyzes the generalization error of two-layer neural networks for computing the ground state of the Schrodinger operator on a $d$-dimensional hypercube. We prove that the convergence rate of the generalization error is independent of the dimension $d$, under the a priori assumption that the ground state lies in a spectral Barron space. We verify such assumption by proving a new regularity estimate for the ground state in the spectral Barron space. The later is achieved by a fixed point argument based on the Krein-Rutman theorem.

التحليل العددي التحليل العددي الفيزياء الرياضية

Approximation in shift-invariant spaces with deep ReLU neural networks

91 - Yunfei Yang , Zhen Li , Yang Wang 2020

We study the expressive power of deep ReLU neural networks for approximating functions in dilated shift-invariant spaces, which are widely used in signal processing, image processing, communications and so on. Approximation error bounds are estimated with respect to the width and depth of neural networks. The network construction is based on the bit extraction and data-fitting capacity of deep neural networks. As applications of our main results, the approximation rates of classical function spaces such as Sobolev spaces and Besov spaces are obtained. We also give lower bounds of the $L^p (1le p le infty)$ approximation error for Sobolev spaces, which show that our construction of neural network is asymptotically optimal up to a logarithmic factor.

التعلم الآلي التحليل العددي التحليل العددي