
Overparameterization of deep ResNet: zero loss and mean-field analysis

Added by Zhiyan Ding
Publication date: 2021
Language: English





Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, yet a basic first-order optimization method (gradient descent) finds a global solution with perfect fit in many practical situations. We examine this phenomenon for Residual Neural Networks (ResNets) with smooth activation functions in a limiting regime in which both the number of layers (depth) and the number of neurons in each layer (width) go to infinity. First, we use a mean-field-limit argument to prove that, in the large-NN limit, gradient descent on the parameters becomes a gradient flow for a probability distribution, characterized by a partial differential equation (PDE). Next, we show that the solution of the PDE converges, as training time goes to infinity, to a zero-loss solution. Together, these results imply that training the ResNet also yields near-zero loss if the ResNet is large enough. We give estimates of the depth and width needed to reduce the loss below a given threshold with high probability.
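As a rough illustration (a generic form, not necessarily the paper's exact equation), a gradient flow of a loss/energy functional $E[\rho]$ over probability distributions $\rho$ on parameter space takes the Wasserstein gradient-flow form

$$ \partial_t \rho_t(\theta) \;=\; \nabla_\theta \cdot \Big( \rho_t(\theta)\, \nabla_\theta \frac{\delta E}{\delta \rho}[\rho_t](\theta) \Big), $$

along which $E[\rho_t]$ is non-increasing; the zero-loss statement corresponds to $E[\rho_t] \to 0$ as the training time $t \to \infty$.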



Related research

Liang Chen, Lesley Tan, 2021
In this paper, we investigate data-driven parameterized modeling of insertion loss for transmission lines with respect to design parameters. We first show that direct application of neural networks can lead to non-physical models with negative insertion loss. To mitigate this problem, we propose two deep learning solutions. The first adds a regularization term, representing the passivity condition, to the final loss function to penalize negative insertion loss. In the second method, a third-order polynomial expression, which ensures positivity, is first defined to approximate the insertion loss, and the DeepONet neural network structure, recently proposed for function and system modeling, is employed to predict the coefficients of the polynomial. Experimental results on an open-sourced SI/PI database of a PCB design show that both methods ensure positivity of the insertion loss. Furthermore, both methods achieve similar prediction accuracy, while the polynomial-based DeepONet method trains faster than the plain DeepONet-based method.
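As a hedged sketch of the first solution, a passivity-style penalty can be appended to the data-fitting loss; the function name, squared-hinge form, and weighting below are illustrative, not the paper's exact formulation:

    import torch

    def passivity_penalty(pred_insertion_loss, weight=1.0):
        # Hypothetical regularization term: penalize negative predicted insertion
        # loss, which the paper identifies as non-physical for a passive line.
        violation = torch.relu(-pred_insertion_loss)
        return weight * violation.pow(2).mean()

    # Usage sketch (names are placeholders):
    #   pred = model(design_params)
    #   total = torch.nn.functional.mse_loss(pred, target) + passivity_penalty(pred)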
Sampling algorithms based on discretizations of Stochastic Differential Equations (SDEs) compose a rich and popular subset of MCMC methods. This work provides a general framework for the non-asymptotic analysis of sampling error in 2-Wasserstein distance, which also leads to a bound on mixing time. The method applies to any consistent discretization of contractive SDEs. When applied to the Langevin Monte Carlo algorithm, it establishes $\tilde{\mathcal{O}}\left(\frac{\sqrt{d}}{\epsilon}\right)$ mixing time, without warm start, under the common log-smooth and log-strongly-convex conditions, plus a growth condition on the third-order derivative of the potential of the target measure at infinity. This bound improves the best previously known $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon}\right)$ result and is optimal (in terms of order) in both dimension $d$ and accuracy tolerance $\epsilon$ for target measures satisfying the aforementioned assumptions. Our theoretical analysis is further validated by numerical experiments.
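For concreteness, here is a minimal sketch of the Langevin Monte Carlo discretization the bound concerns; the step size, potential, and iteration count are illustrative choices, not the tuned values from the analysis:

    import numpy as np

    def langevin_monte_carlo(grad_U, x0, step, n_steps, rng=None):
        # Euler-Maruyama discretization of dX_t = -grad U(X_t) dt + sqrt(2) dW_t,
        # whose continuous-time invariant measure is proportional to exp(-U).
        rng = np.random.default_rng() if rng is None else rng
        x = np.asarray(x0, dtype=float).copy()
        samples = np.empty((n_steps, x.size))
        for k in range(n_steps):
            x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
            samples[k] = x
        return samples

    # Example: standard Gaussian target, U(x) = |x|^2 / 2, so grad_U(x) = x.
    # chain = langevin_monte_carlo(lambda x: x, np.zeros(2), step=0.05, n_steps=1000)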
In this paper, we present a deep autoencoder based energy method (DAEM) for the bending, vibration, and buckling analysis of Kirchhoff plates. The DAEM exploits higher-order continuity and integrates a deep autoencoder with the minimum total potential energy principle in one framework, yielding an unsupervised feature-learning method. The DAEM is a specific type of feedforward deep neural network (DNN) and can also serve as a function approximator. With robust feature-extraction capacity, the DAEM can more efficiently identify patterns behind the whole energy system, such as the field variables, natural frequency, and critical buckling load factor studied in this paper. The objective is to minimize the total potential energy. The DAEM performs unsupervised learning on randomly generated points inside the physical domain so that the total potential energy is minimized at all points. For vibration and buckling analysis, the loss function is constructed from Rayleigh's principle, and the fundamental frequency and critical buckling load are extracted. A scaled hyperbolic tangent activation function for the underlying mechanical model is presented, which meets the continuity requirement and alleviates the vanishing/exploding gradient problems in bending analysis. The DAEM can be easily implemented; we employed the PyTorch library and the L-BFGS optimizer. A comprehensive study of the DAEM configuration is performed for several numerical examples with various geometries, load conditions, and boundary conditions.
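A minimal sketch of the unsupervised energy-minimization loop described above, under strong simplifying assumptions: the network below is a plain smooth feedforward surrogate rather than the paper's autoencoder, and the placeholder energy stands in for the Kirchhoff plate potential energy:

    import torch

    class SmoothFieldNet(torch.nn.Module):
        # Simplified stand-in for the DAEM: a small smooth network mapping plate
        # coordinates (x, y) to a scalar deflection field.
        def __init__(self, width=32):
            super().__init__()
            self.net = torch.nn.Sequential(
                torch.nn.Linear(2, width), torch.nn.Tanh(),
                torch.nn.Linear(width, width), torch.nn.Tanh(),
                torch.nn.Linear(width, 1),
            )

        def forward(self, xy):
            return self.net(xy)

    def total_potential_energy(model, xy):
        # Placeholder energy built from first derivatives; the actual Kirchhoff
        # plate bending energy involves second derivatives of the deflection.
        w = model(xy)
        grads = torch.autograd.grad(w.sum(), xy, create_graph=True)[0]
        return 0.5 * (grads ** 2).sum(dim=1).mean()

    model = SmoothFieldNet()
    xy = torch.rand(1024, 2, requires_grad=True)   # random collocation points
    opt = torch.optim.LBFGS(model.parameters(), max_iter=50)

    def closure():
        opt.zero_grad()
        energy = total_potential_energy(model, xy)
        energy.backward()
        return energy

    opt.step(closure)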
Batch Normalization (BatchNorm) is an extremely useful component of modern neural network architectures, enabling optimization using higher learning rates and achieving faster convergence. In this paper, we use mean-field theory to analytically quantify the impact of BatchNorm on the geometry of the loss landscape for multi-layer networks consisting of fully-connected and convolutional layers. We show that it has a flattening effect on the loss landscape, as quantified by the maximum eigenvalue of the Fisher Information Matrix. These findings are then used to justify the use of larger learning rates for networks that use BatchNorm, and we provide quantitative characterization of the maximal allowable learning rate to ensure convergence. Experiments support our theoretically predicted maximum learning rate, and furthermore suggest that networks with smaller values of the BatchNorm parameter achieve lower loss after the same number of epochs of training.
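The maximal-learning-rate characterization is stated in terms of the top eigenvalue of the Fisher Information Matrix; the exact bound is in the paper, but that kind of curvature scale can be estimated numerically, for example with power iteration on Hessian-vector products, as in this rough proxy sketch:

    import torch

    def top_curvature_eigenvalue(loss_fn, params, n_iters=20):
        # Power iteration with Hessian-vector products (double backprop) to
        # estimate the largest curvature eigenvalue. The paper's quantity is the
        # top eigenvalue of the Fisher Information Matrix; this Hessian-based
        # proxy is only an illustrative stand-in.
        params = list(params)
        v = [torch.randn_like(p) for p in params]
        eig = 0.0
        for _ in range(n_iters):
            loss = loss_fn()
            grads = torch.autograd.grad(loss, params, create_graph=True)
            gv = sum((g * u).sum() for g, u in zip(grads, v))
            hv = torch.autograd.grad(gv, params)
            norm = torch.sqrt(sum((h * h).sum() for h in hv)) + 1e-12
            eig = norm.item()
            v = [h / norm for h in hv]
        return eig

    # A learning rate would then be chosen small relative to 1 / eig.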
Zhiyan Ding, Qin Li, 2019
Ensemble Kalman Sampler (EKS) is a method for drawing approximately i.i.d. samples from a target distribution. As of today, why the algorithm works and how it converges is mostly unknown. The continuous version of the algorithm is a set of coupled stochastic differential equations (SDEs). In this paper, we prove the well-posedness of the SDE system and justify that its mean-field limit is a Fokker-Planck equation whose long-time equilibrium is the target distribution. We further demonstrate that the convergence rate is near-optimal ($J^{-1/2}$, with $J$ being the number of particles). These results, combined with the convergence in time of the Fokker-Planck equation to its equilibrium, justify the validity of EKS and provide its convergence rate as a sampling method.
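To make the particle system concrete, here is a simplified Euler-Maruyama sketch of one step of a covariance-preconditioned interacting-particle system of the EKS type; the exact drift in the EKS literature (and in the paper's setting, which may involve a forward model) can differ, so treat this as a structural illustration only:

    import numpy as np

    def eks_step(X, grad_potential, dt, rng):
        # X: (J, d) ensemble. One Euler-Maruyama step of the coupled SDE system
        #   dX_j = -C(X) grad Phi(X_j) dt + sqrt(2 C(X)) dW_j,
        # where C(X) is the empirical covariance of the ensemble (the coupling).
        J, d = X.shape
        mean = X.mean(axis=0)
        C = (X - mean).T @ (X - mean) / J
        drift = -np.stack([C @ grad_potential(x) for x in X])
        L = np.linalg.cholesky(C + 1e-8 * np.eye(d))   # jitter for stability
        noise = rng.standard_normal((J, d)) @ L.T
        return X + dt * drift + np.sqrt(2.0 * dt) * noise

    # With many particles J and a small step dt, the ensemble follows the mean-field
    # Fokker-Planck dynamics whose long-time equilibrium is the target distribution.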

