In this paper, we develop a new neural network family based on power series expansion (PSE), which is proved to achieve better approximation accuracy than existing neural networks. This new family embeds the PSE into the network structure, so that representation ability can be improved at comparable computational cost by increasing the degree of the PSE instead of increasing the depth or width. Both theoretical approximation results and numerical experiments show the advantages of this new neural network.
Neural Networks (NNs) are the method of choice for building learning algorithms. Their popularity stems from their empirical success on several challenging learning problems. However, most scholars agree that a convincing theoretical explanation for this success is still lacking. This article surveys the known approximation properties of the outputs of NNs with the aim of uncovering the properties that are not present in the more traditional methods of approximation used in numerical analysis. Comparisons are made with traditional approximation methods from the viewpoint of rate distortion. Another major component in the analysis of numerical approximation is the computational time needed to construct the approximation and this in turn is intimately connected with the stability of the approximation algorithm. So the stability of numerical approximation using NNs is a large part of the analysis put forward. The survey, for the most part, is concerned with NNs using the popular ReLU activation function. In this case, the outputs of the NNs are piecewise linear functions on rather complicated partitions of the domain of $f$ into cells that are convex polytopes. When the architecture of the NN is fixed and the parameters are allowed to vary, the set of output functions of the NN is a parameterized nonlinear manifold. It is shown that this manifold has certain space filling properties leading to an increased ability to approximate (better rate distortion) but at the expense of numerical stability. The space filling creates a challenge to the numerical method in finding best or good parameter choices when trying to approximate.
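The piecewise linear structure of ReLU network outputs described above can be seen in a small one-dimensional sketch. The weights below are hypothetical values chosen only for illustration; in one dimension the cells of the partition are simply the intervals between breakpoints.

```python
def relu(z):
    return max(0.0, z)

# Hypothetical parameters of a 1-D network with one hidden layer of three ReLU units.
W = [1.0, -2.0, 0.5]   # hidden-layer weights
B = [0.0, 1.0, -1.0]   # hidden-layer biases
C = [1.0, 0.5, -3.0]   # output-layer weights

def network(x):
    # Output of the network: a weighted sum of ReLU units.
    return sum(c * relu(w * x + b) for w, b, c in zip(W, B, C))

def breakpoints():
    # Each unit switches between its zero and linear regime where w * x + b = 0.
    return sorted(-b / w for w, b in zip(W, B))

def second_difference(x, h=0.1):
    # Vanishes (up to rounding) wherever the function is linear on [x - h, x + h].
    return network(x - h) - 2.0 * network(x) + network(x + h)
```

Evaluating `second_difference` strictly between two breakpoints gives zero up to rounding, while evaluating it at a breakpoint gives a nonzero value, which is the kink separating two linear pieces.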
Deep learning is a powerful tool for solving nonlinear differential equations, but usually only the solution corresponding to the flattest local minimizer can be found, due to the implicit regularization of stochastic gradient descent. This paper proposes a network-based structure probing deflation method to make deep learning capable of identifying multiple solutions, which are ubiquitous and important in nonlinear physical models. First, we introduce deflation operators built with known solutions so that known solutions are no longer local minimizers of the optimization energy landscape. Second, to facilitate convergence to the desired local minimizer, a structure probing technique is proposed to obtain an initial guess close to the desired local minimizer. Together with neural network structures carefully designed in this paper, the new regularized optimization can converge to new solutions efficiently. Due to the mesh-free nature of deep learning, the proposed method is capable of solving high-dimensional problems on complicated domains with multiple solutions, while existing methods focus on merely one- or two-dimensional regular domains and are more expensive in operation counts. Numerical experiments also demonstrate that the proposed method can find more solutions than existing methods.
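The deflation idea can be illustrated on a scalar toy problem. The sketch below is not the network-based method of the abstract: the equation $x^2-1=0$, the deflation factor $1/|x-x_1|^p+\alpha$ (a commonly used form, assumed here), and all parameter values are illustrative choices. Multiplying the loss by this factor means the known solution $x_1=1$ is no longer a minimizer, so gradient descent started next to it moves to the other solution.

```python
def residual(x):
    # Toy nonlinear equation x^2 - 1 = 0 with two solutions, x = 1 and x = -1.
    return x * x - 1.0

def plain_loss(x):
    return residual(x) ** 2

def deflated_loss(x, known=1.0, power=2, shift=1.0):
    # Assumed deflation factor: it grows without bound near the known solution,
    # which cancels the zero of the plain loss there.
    factor = 1.0 / abs(x - known) ** power + shift
    return factor * plain_loss(x)

def gradient_descent(loss, x0, lr=0.05, steps=500, h=1e-6):
    # Plain gradient descent with a central finite-difference gradient.
    x = x0
    for _ in range(steps):
        grad = (loss(x + h) - loss(x - h)) / (2.0 * h)
        x -= lr * grad
    return x

# Same starting point, with and without deflation of the known solution x = 1.
x_plain = gradient_descent(plain_loss, 1.2)
x_deflated = gradient_descent(deflated_loss, 1.2)
```

With these settings `x_plain` ends up at the known solution, while `x_deflated` ends up at the second solution, since for `power=2` the deflated loss equals $(x+1)^2+(x^2-1)^2$ away from $x=1$ and has its only minimizer at $x=-1$.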
We establish in this work approximation results of deep neural networks for smooth functions measured in Sobolev norms, motivated by recent development of numerical solvers for partial differential equations using deep neural networks. The error bounds are explicitly characterized in terms of both the width and depth of the networks simultaneously. Namely, for $f\in C^s([0,1]^d)$, we show that deep ReLU networks of width $\mathcal{O}(N\log{N})$ and of depth $\mathcal{O}(L\log{L})$ can achieve a non-asymptotic approximation rate of $\mathcal{O}(N^{-2(s-1)/d}L^{-2(s-1)/d})$ with respect to the $\mathcal{W}^{1,p}([0,1]^d)$ norm for $p\in[1,\infty)$. If either the ReLU function or its square is applied as activation functions to construct deep neural networks of width $\mathcal{O}(N\log{N})$ and of depth $\mathcal{O}(L\log{L})$ to approximate $f\in C^s([0,1]^d)$, the non-asymptotic approximation rate is $\mathcal{O}(N^{-2(s-n)/d}L^{-2(s-n)/d})$ with respect to the $\mathcal{W}^{n,p}([0,1]^d)$ norm for $p\in[1,\infty)$.
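As a worked instance of the rates stated above, the exponents can be evaluated for concrete values. The choice $s=3$, $d=2$ is purely illustrative and does not come from the paper.

```latex
% First statement (ReLU, W^{1,p} norm) with s = 3, d = 2:
\frac{2(s-1)}{d} = \frac{2\cdot 2}{2} = 2
\quad\Longrightarrow\quad
\mathcal{O}\!\left(N^{-2}L^{-2}\right).

% Second statement (ReLU or its square, W^{n,p} norm) with s = 3, d = 2, n = 2:
\frac{2(s-n)}{d} = \frac{2\cdot 1}{2} = 1
\quad\Longrightarrow\quad
\mathcal{O}\!\left(N^{-1}L^{-1}\right).
```

Setting $n=1$ in the second exponent gives $2(s-1)/d$, the same exponent as in the first statement, and each additional derivative measured by the norm lowers the exponent by $2/d$.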
The physics-informed neural network (PINN) is a data-driven approach to solving equations. It is successful in many applications; however, its accuracy is not satisfactory when it is used to solve multiscale equations. Homogenization is a way of approximating a multiscale equation by a homogenized equation without the multiscale property; it involves solving cell problems and the homogenized equation. The cell problems are periodic, and we propose an oversampling strategy which greatly improves the PINN accuracy on periodic problems. The homogenized equation has constant or slowly varying coefficients and can also be solved accurately by PINN. We hence propose a three-step method to improve the PINN accuracy for solving multiscale problems with the help of homogenization. We apply our method to three equations which represent three different types of homogenization. The results show that the proposed method greatly improves the PINN accuracy. In addition, we find that PINN-aided homogenization may achieve better accuracy than homogenization driven by conventional numerical methods; PINN is hence a potential alternative for implementing homogenization.
This paper proposes a plane wave activation based neural network (PWNN) for solving the Helmholtz equation, the basic partial differential equation representing wave propagation, e.g. acoustic, electromagnetic, and seismic waves. Unlike the traditional activation based neural network (TANN) or the $\sin$ activation based neural network (SIREN) used for solving general partial differential equations, we instead introduce a complex activation function $e^{\mathbf{i}x}$, the plane wave, which is the basic component of the solution of the Helmholtz equation. By a simple derivation, we further find that PWNN is actually a generalization of the plane wave partition of unity method (PWPUM), additionally imposing a learned basis with both amplitude and direction to better characterize the potential solution. We first investigate the performance on a problem whose solution is an integral of plane waves over all known directions. The experiments demonstrate that PWNN works much better than TANN and SIREN across varying architectures and numbers of training samples, which means that the plane wave activation indeed enhances the representation ability of the neural network toward the solution of the Helmholtz equation, and that PWNN is competitive with PWPUM, e.g. it has the same convergence order but smaller relative error. Furthermore, we focus on a more practical problem, whose solution integrates only plane waves with some unknown directions. We find that PWNN works much better than PWPUM in this case. Unlike PWPUM, which uses a plane wave basis with fixed directions, PWNN can learn a group of optimized plane wave bases which better predict the unknown directions of the solution. The proposed approach may provide some new insights into applying deep learning to the Helmholtz equation.
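A minimal sketch of why the plane wave activation suits the Helmholtz equation $\Delta u + k^2 u = 0$ is given below. It is not the architecture of the paper: it assumes a single hidden layer in two dimensions whose hidden weights are constrained to $k(\cos\theta_j, \sin\theta_j)$, with complex output weights as amplitudes, and all numerical values are hypothetical. Under that assumption every unit is itself a solution, so the network output satisfies the equation for any amplitudes and directions.

```python
import cmath
import math

def plane_wave_net(x, y, amplitudes, angles, k):
    # u(x, y) = sum_j a_j * exp(i * k * (cos(theta_j) * x + sin(theta_j) * y))
    total = 0j
    for a, theta in zip(amplitudes, angles):
        phase = k * (math.cos(theta) * x + math.sin(theta) * y)
        total += a * cmath.exp(1j * phase)
    return total

def helmholtz_residual(u, x, y, k, h=1e-3):
    # Five-point finite-difference Laplacian of u at (x, y), plus k^2 * u.
    center = u(x, y)
    lap = (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h) - 4.0 * center) / (h * h)
    return lap + k * k * center

# Hypothetical wavenumber, amplitudes, and directions (illustration only).
K = 5.0
AMPS = [1.0, 0.5 - 0.25j, -0.8]
ANGLES = [0.0, math.pi / 3.0, 2.1]

def u(x, y):
    return plane_wave_net(x, y, AMPS, ANGLES, K)

# Residual of the Helmholtz equation at an arbitrary point.
res = helmholtz_residual(u, 0.3, -0.7, K)
```

The residual `res` is small compared with $k^2|u|$ and is limited only by the finite-difference error. In this sketch, learning the angles as well as the amplitudes corresponds to the learned directions mentioned in the abstract, whereas keeping the angles fixed corresponds to a plane wave basis with fixed directions.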