In recent years, physics-informed neural networks (PINNs) have been shown empirically to be a powerful tool for solving PDEs. However, a numerical analysis of PINNs is still missing. In this paper, we prove the convergence rate of PINNs for second-order elliptic equations with Dirichlet boundary conditions by establishing upper bounds on the number of training samples and on the depth and width of the deep neural networks required to achieve a desired accuracy. The error of PINNs is decomposed into an approximation error and a statistical error: the approximation error is bounded in the $C^2$ norm for $\mathrm{ReLU}^{3}$ networks, and the statistical error is estimated via Rademacher complexity. We derive a bound on the Rademacher complexity of the non-Lipschitz composition of the gradient norm with a $\mathrm{ReLU}^{3}$ network, which is of independent interest.
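A minimal sketch of an empirical PINN objective may help fix ideas. It assumes, for illustration only, the Poisson problem $-\Delta u = f$ on $(0,1)^2$ with Dirichlet data $u = g$, uniform collocation points, and an explicit trial function in place of a trained $\mathrm{ReLU}^{3}$ network; a central finite-difference Laplacian stands in for automatic differentiation. The names `pinn_loss`, `laplacian_fd`, `sample_square_boundary`, and the boundary weight `lam` are hypothetical, not taken from the paper.

```python
import numpy as np

# Illustrative sketch (assumption): Poisson model problem on the unit square,
# hypothetical helper names, explicit trial functions instead of a network.

def laplacian_fd(u, x, h=1e-3):
    """Central finite-difference Laplacian of u at the rows of x (shape (n, d))."""
    n, d = x.shape
    lap = np.zeros(n)
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        lap += (u(x + e) - 2.0 * u(x) + u(x - e)) / h**2
    return lap

def sample_square_boundary(rng, n):
    """Uniform points on the boundary of [0, 1]^2: pick a side, then a position on it."""
    t = rng.uniform(0.0, 1.0, n)
    side = rng.integers(0, 4, n)  # 0: x=0, 1: x=1, 2: y=0, 3: y=1
    xb = np.empty((n, 2))
    xb[:, 0] = np.where(side == 0, 0.0, np.where(side == 1, 1.0, t))
    xb[:, 1] = np.where(side == 2, 0.0, np.where(side == 3, 1.0, t))
    return xb

def pinn_loss(u, f, g, n_int=10_000, n_bdry=2_000, lam=1.0, seed=0):
    """Empirical PINN loss: mean squared PDE residual plus weighted boundary misfit."""
    rng = np.random.default_rng(seed)
    x_int = rng.uniform(0.0, 1.0, size=(n_int, 2))
    x_bdry = sample_square_boundary(rng, n_bdry)
    residual = -laplacian_fd(u, x_int) - f(x_int)  # -lap(u) - f at collocation points
    misfit = u(x_bdry) - g(x_bdry)                  # u - g on the boundary
    return np.mean(residual**2) + lam * np.mean(misfit**2)

# Manufactured solution: u = sin(pi x) sin(pi y) solves -lap(u) = 2 pi^2 u, u = 0 on the boundary.
u_exact = lambda x: np.sin(np.pi * x[:, 0]) * np.sin(np.pi * x[:, 1])
f = lambda x: 2.0 * np.pi**2 * u_exact(x)
g = lambda x: np.zeros(len(x))
print(pinn_loss(u_exact, f, g))                     # close to zero
print(pinn_loss(lambda x: u_exact(x) + 0.1, f, g))  # about 0.01, from the boundary term
```

Shifting the exact solution by a constant leaves the interior residual unchanged but raises the boundary term to about $0.1^2 = 0.01$, so the loss separates the two constraints as intended.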
Using deep neural networks to solve PDEs has attracted a lot of attention recently. However, the understanding of why deep learning methods work lags far behind their empirical success. In this paper, we provide a rigorous numerical analysis of the deep Ritz method (DRM) \cite{Weinan2017The} for second-order elliptic equations with Dirichlet, Neumann, and Robin boundary conditions, respectively. We establish the first nonasymptotic convergence rate in the $H^1$ norm for DRM using deep networks with smooth activation functions, including the logistic and hyperbolic tangent functions. Our results show how to set the depth and width hyperparameters to achieve the desired convergence rate in terms of the number of training samples.
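The deep Ritz idea, minimizing a Monte Carlo estimate of an energy functional rather than a PDE residual, can be sketched on a one-dimensional model problem. Assumptions for illustration: $-u'' = f$ on $(0,1)$ with homogeneous Dirichlet data imposed through a penalty term with a hypothetical weight `lam`, and explicit trial functions with hand-coded derivatives instead of a logistic or tanh network. `ritz_energy_1d` is a hypothetical name.

```python
import numpy as np

# Illustrative sketch (assumption): 1D Dirichlet model problem, penalty-enforced
# boundary data, explicit trial functions in place of a network.

def ritz_energy_1d(u, du, f, lam=500.0, n=200_000, seed=0):
    """Monte Carlo Ritz energy for -u'' = f on (0, 1) with u(0) = u(1) = 0.

    Interior term: E_X[0.5 * u'(X)^2 - f(X) u(X)], X ~ Uniform(0, 1) (|Omega| = 1).
    Dirichlet data enter through the penalty lam * (u(0)^2 + u(1)^2).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    interior = np.mean(0.5 * du(x) ** 2 - f(x) * u(x))
    penalty = lam * (u(0.0) ** 2 + u(1.0) ** 2)
    return interior + penalty

# Exact solution u = sin(pi x) for f = pi^2 sin(pi x); its energy is -pi^2 / 4.
u = lambda x: np.sin(np.pi * x)
du = lambda x: np.pi * np.cos(np.pi * x)
f = lambda x: np.pi**2 * np.sin(np.pi * x)

# A perturbation that still vanishes at x = 0 and x = 1.
v = lambda x: 0.1 * np.sin(2 * np.pi * x)
dv = lambda x: 0.2 * np.pi * np.cos(2 * np.pi * x)

e_exact = ritz_energy_1d(u, du, f)
e_pert = ritz_energy_1d(lambda x: u(x) + v(x), lambda x: du(x) + dv(x), f)
print(e_exact, -np.pi**2 / 4)  # Monte Carlo estimate vs. exact value
print(e_pert > e_exact)        # the exact solution has lower energy
```

For this problem, adding any perturbation $v$ with $v(0)=v(1)=0$ raises the energy by exactly $\tfrac12\int_0^1 v'(x)^2\,\mathrm{d}x$ (here $0.01\pi^2$), which is why a minimizer of the estimated energy approximates the PDE solution.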
In this paper, we study the properties of robust nonparametric estimation using deep neural networks for regression models with heavy-tailed error distributions. We establish non-asymptotic error bounds for a class of robust nonparametric regression estimators using deep neural networks with ReLU activation, under suitable smoothness conditions on the regression function and mild conditions on the error term. In particular, we only assume that the error distribution has a finite p-th moment for some p greater than one. We also show that the deep robust regression estimators are able to circumvent the curse of dimensionality when the distribution of the predictor is supported on an approximate lower-dimensional set. An important feature of our error bound is that, for ReLU neural networks with network width and network size (number of parameters) no more than the order of the square of the dimensionality d of the predictor, our excess risk bounds depend sub-linearly on d. Our assumption relaxes the exact manifold support assumption, which could be restrictive and unrealistic in practice. We also relax several crucial assumptions on the data distribution, the target regression function, and the neural networks required in the recent literature. Our simulation studies demonstrate that the robust methods can significantly outperform the least squares method when the errors have heavy-tailed distributions, and they illustrate that the choice of loss function is important in the context of deep nonparametric regression.
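Why the loss function matters under heavy tails can be seen in the simplest possible model, a constant fit. The sketch below compares the squared loss with the Huber loss, a standard robust loss chosen here as an assumption (the abstract does not say which robust losses the paper uses); a single gross outlier drags the least squares fit far away, while the Huber fit barely moves. The function names are hypothetical.

```python
import numpy as np

# Illustrative sketch (assumption): Huber loss as a representative robust loss,
# constant model in place of a deep network, grid search in place of training.

def squared_loss(r):
    return 0.5 * r**2

def huber_loss(r, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond, so large residuals have bounded influence."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def fit_constant(y, loss, grid):
    """Grid-search minimizer c of the empirical risk mean_i loss(y_i - c)."""
    risks = loss(y[None, :] - grid[:, None]).mean(axis=1)
    return grid[np.argmin(risks)]

y = np.array([0.0, 0.1, -0.1, 0.2, 100.0])  # four clean responses and one gross outlier
grid = np.linspace(-5.0, 105.0, 110_001)     # step 0.001
c_ls = fit_constant(y, squared_loss, grid)
c_huber = fit_constant(y, huber_loss, grid)
print(c_ls)     # near the sample mean 20.04
print(c_huber)  # near 0.3
```

The Huber minimizer solves $\sum_i \psi(y_i - c) = 0$ with $\psi$ clipped at $\pm\delta$, so the outlier contributes at most $\delta = 1$ and the fit settles at $c = 0.3$.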
This paper considers the problem of nonparametric quantile regression under the assumption that the target conditional quantile function is a composition of a sequence of low-dimensional functions. We study the nonparametric quantile regression estimator using deep neural networks to approximate the target conditional quantile function. For convenience, we refer to such an estimator as a deep quantile regression (DQR) estimator. We show that the DQR estimator achieves the nonparametric optimal convergence rate up to a logarithmic factor determined by the intrinsic dimension of the underlying compositional structure of the conditional quantile function, not the ambient dimension of the predictor. Therefore, DQR is able to mitigate the curse of dimensionality under the assumption that the conditional quantile function has a compositional structure. To establish these results, we analyze the approximation error of a composite function by neural networks and show that the error rate only depends on the dimensions of the component functions. We apply our general results to several important statistical models often used in mitigating the curse of dimensionality, including the single index, the additive, the projection pursuit, the univariate composite, and the generalized hierarchical interaction models. We explicitly describe the prefactors in the error bounds in terms of the dimensionality of the data and show that the prefactors depend on the dimensionality linearly or quadratically in these models. We also conduct extensive numerical experiments to evaluate the effectiveness of DQR and demonstrate that it outperforms a kernel-based method for nonparametric quantile regression.
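Nonparametric quantile regression is usually fitted by minimizing the check (pinball) loss; we assume that is the empirical risk behind the DQR estimator, with a deep network as the model class. The sketch below uses a constant model so the result can be checked by hand: minimizing the empirical check risk recovers an empirical quantile. `fit_constant_quantile` is a hypothetical name.

```python
import numpy as np

# Illustrative sketch (assumption): check loss with a constant model in place of a deep network.

def check_loss(r, tau):
    """Quantile (check) loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (r < 0).astype(float))

def fit_constant_quantile(y, tau):
    """Minimize the empirical check risk over constants.

    The risk is convex and piecewise linear with kinks at the data points,
    so searching over the observed values is enough.
    """
    risks = np.array([np.mean(check_loss(y - c, tau)) for c in y])
    return y[np.argmin(risks)]

y = np.arange(1.0, 101.0)      # 1, 2, ..., 100
q = fit_constant_quantile(y, 0.9)
print(q)                       # an empirical 0.9-quantile, between 90 and 91
print(np.quantile(y, 0.9))     # NumPy's interpolated quantile, about 90.1
```

The asymmetric slopes $\tau$ and $1-\tau$ are what make the minimizer a quantile rather than a mean: at the optimum, a fraction $\tau$ of the points lies below the fit.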
The Schrödinger-Föllmer sampler (SFS) is a novel and efficient approach for sampling from possibly unnormalized distributions without requiring ergodicity. SFS is based on the Euler-Maruyama discretization of the Schrödinger-Föllmer diffusion process $$\mathrm{d} X_{t}=-\nabla U\left(X_t, t\right) \mathrm{d} t+\mathrm{d} B_{t}, \quad t \in[0,1],\quad X_0=0$$ on the unit interval, which transports the degenerate distribution at time zero to the target distribution at time one. In \cite{sfs21}, the consistency of SFS is established under the restrictive assumption that the potential $U(x,t)$ is uniformly (in $t$) strongly convex (in $x$). In this paper, we provide a nonasymptotic error bound for SFS in the Wasserstein distance under smoothness and boundedness conditions on the density ratio of the target distribution with respect to the standard normal distribution, without requiring strong convexity of the potential.
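The Euler-Maruyama discretization described above can be sketched in one dimension. As we understand the construction (an assumption here, not spelled out in the abstract), the drift is $-\nabla U(x,t) = \nabla \log Q_{1-t} f(x)$, where $f$ is the density ratio of the target over $N(0,1)$ and $Q_s f(x) = \mathbb{E}[f(x + \sqrt{s}\,Z)]$ with $Z \sim N(0,1)$. For the hypothetical target $N(0,\sigma^2)$ we have $f(x) \propto \exp(c x^2/2)$ with $c = 1 - 1/\sigma^2$, and a Gaussian integral gives the closed-form drift $c x / (1 - c(1-t))$. `sfs_gaussian_1d` is a hypothetical name.

```python
import numpy as np

# Illustrative sketch (assumption): Gaussian target with a closed-form drift,
# so only the Euler-Maruyama discretization itself is being exercised.

def sfs_gaussian_1d(sigma, n_particles=200_000, n_steps=100, seed=0):
    """Euler-Maruyama for dX = b(X, t) dt + dB on [0, 1], X_0 = 0, target N(0, sigma^2)."""
    c = 1.0 - 1.0 / sigma**2

    def drift(x, t):
        # grad log Q_{1-t} f(x) for f(x) proportional to exp(c x^2 / 2)
        return c * x / (1.0 - c * (1.0 - t))

    rng = np.random.default_rng(seed)
    h = 1.0 / n_steps
    x = np.zeros(n_particles)  # degenerate initial distribution at 0
    for k in range(n_steps):
        t = k * h
        x = x + h * drift(x, t) + np.sqrt(h) * rng.standard_normal(n_particles)
    return x

samples = sfs_gaussian_1d(sigma=2.0)
print(samples.mean(), samples.var())  # should be near 0 and sigma^2 = 4
```

Because the drift is exact in this toy case, the printed moments reflect only discretization and Monte Carlo error; for a general target the drift has to be estimated as well.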
Sampling from probability distributions is an important problem in statistics and machine learning, especially in Bayesian inference, when integration with respect to the posterior distribution is intractable and sampling from the posterior is the only viable option for inference. In this paper, we propose the Schrödinger-Föllmer sampler (SFS), a novel approach for sampling from possibly unnormalized distributions. The proposed SFS is based on the Schrödinger-Föllmer diffusion process on the unit interval with a time-dependent drift term, which transports the degenerate distribution at time zero to the target distribution at time one. Unlike existing Markov chain Monte Carlo samplers, which require ergodicity, SFS has no such requirement. Computationally, SFS can be easily implemented using the Euler-Maruyama discretization. Theoretically, we establish non-asymptotic error bounds for the sampling distribution of SFS in the Wasserstein distance under suitable conditions. We conduct numerical experiments to evaluate the performance of SFS and demonstrate that it is able to generate samples of better quality than several existing methods.
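Sampling from an unnormalized target requires evaluating the time-dependent drift. Assuming the drift has the form $b(x,t) = \nabla \log Q_{1-t} f(x)$ with $Q_s f(x) = \mathbb{E}[f(x + \sqrt{s}\,Z)]$, $Z \sim N(0,1)$, and $f$ the (unnormalized) density ratio of the target over $N(0,1)$, one simple Monte Carlo estimator uses Gaussian integration by parts so that only values of $f$ are needed. This is a sketch under those assumptions, not necessarily the estimator used in the paper; `sfs_drift_mc` is a hypothetical name.

```python
import numpy as np

# Illustrative sketch (assumption): drift b(x, t) = grad log Q_{1-t} f(x), estimated by
# Monte Carlo with the identity E[f'(x + sqrt(s) Z)] = E[Z f(x + sqrt(s) Z)] / sqrt(s).

def sfs_drift_mc(f, x, t, n_mc=200_000, seed=0):
    """Monte Carlo estimate of the drift at (x, t) in 1D, for 0 <= t < 1.

    Any constant factor in f cancels in the ratio, so f may be unnormalized.
    """
    rng = np.random.default_rng(seed)
    s = 1.0 - t
    z = rng.standard_normal(n_mc)
    w = f(x + np.sqrt(s) * z)
    return np.mean(z * w) / (np.sqrt(s) * np.mean(w))

# Check case: the ratio N(0, 4) / N(0, 1) is proportional to exp(c y^2 / 2) with c = 0.75,
# for which the drift has the closed form c x / (1 - c (1 - t)).
c = 0.75
f = lambda y: 123.0 * np.exp(0.5 * c * y**2)  # arbitrary constant 123: normalization not needed
x, t = 1.0, 0.8
print(sfs_drift_mc(f, x, t))          # Monte Carlo estimate
print(c * x / (1.0 - c * (1.0 - t)))  # closed form, about 0.882
```

Plugging such an estimate into each Euler-Maruyama step gives a fully sample-based sampler; the cost is one batch of evaluations of $f$ per particle per step.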
Gefei Wang, Yuling Jiao, Qian Xu, 2021
We propose to learn a generative model via entropy interpolation with a Schrödinger Bridge. The generative learning task can be formulated as interpolating between a reference distribution and a target distribution based on the Kullback-Leibler divergence. At the population level, this entropy interpolation is characterized via an SDE on $[0,1]$ with a time-varying drift term. At the sample level, we derive our Schrödinger Bridge algorithm by plugging the drift term, estimated by a deep score estimator and a deep density ratio estimator, into the Euler-Maruyama method. Under mild smoothness assumptions on the target distribution, we prove the consistency of both the score estimator and the density ratio estimator, and then establish the consistency of the proposed Schrödinger Bridge approach. Our theoretical results guarantee that the distribution learned by our approach converges to the target distribution. Experimental results on multimodal synthetic data and benchmark data support our theoretical findings and indicate that the generative model via Schrödinger Bridge is comparable with state-of-the-art GANs, suggesting a new formulation of generative learning. We demonstrate its usefulness in image interpolation and image inpainting.
This paper studies how well generative adversarial networks (GANs) learn probability distributions from finite samples. Our main results establish the convergence rates of GANs under a collection of integral probability metrics defined through Hölder classes, including the Wasserstein distance as a special case. We also show that GANs are able to adaptively learn data distributions that have low-dimensional structures or Hölder densities, when the network architectures are chosen properly. In particular, for distributions concentrated around a low-dimensional set, we show that the learning rates of GANs depend not on the high ambient dimension but on the lower intrinsic dimension. Our analysis is based on a new oracle inequality that decomposes the estimation error into the generator and discriminator approximation errors and the statistical error, which may be of independent interest.
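The Wasserstein-1 distance, the special case mentioned above, is the integral probability metric over 1-Lipschitz test functions (Kantorovich-Rubinstein duality). On the real line it has a closed form between empirical measures, which makes a convenient sanity check when comparing generated and real samples. This is an illustration of the metric only, not of the paper's analysis; `wasserstein1_1d` is a hypothetical name and equal sample sizes are assumed.

```python
import numpy as np

# Illustrative sketch (assumption): 1D empirical W1 with equal sample sizes.

def wasserstein1_1d(x, y):
    """W1 distance between two empirical measures on R with the same number of atoms.

    In one dimension the optimal coupling pairs order statistics, so
    W1 = mean_i |x_(i) - y_(i)|.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return np.mean(np.abs(x - y))

rng = np.random.default_rng(0)
real = rng.standard_normal(10_000)
fake = real + 1.0                   # every atom shifted by 1
print(wasserstein1_1d(real, fake))  # 1.0 up to rounding
```

In higher dimensions no such sorting shortcut exists, which is one reason the dimension-dependence of GAN learning rates is a nontrivial question.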
Using deep neural networks to solve PDEs has attracted a lot of attention recently. However, the understanding of why deep learning methods work lags far behind their empirical success. In this paper, we provide a rigorous numerical analysis of the deep Ritz method (DRM) \cite{wan11} for second-order elliptic equations with Neumann boundary conditions. We establish the first nonasymptotic convergence rate in the $H^1$ norm for DRM using deep networks with $\mathrm{ReLU}^2$ activation functions. In addition to providing a theoretical justification of DRM, our study also sheds light on how to set the depth and width hyperparameters to achieve the desired convergence rate in terms of the number of training samples. Technically, we derive bounds on the approximation error of deep $\mathrm{ReLU}^2$ networks in the $H^1$ norm and on the Rademacher complexity of the non-Lipschitz composition of the gradient norm and a $\mathrm{ReLU}^2$ network, both of which are of independent interest.
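For orientation, a standard Ritz energy for a Neumann problem and its Monte Carlo version can be written out. We assume (this is not stated in the abstract) the model problem $-\Delta u + w u = f$ in $\Omega$, $\partial u/\partial n = g$ on $\partial\Omega$, with $w \ge c_0 > 0$, uniform samples $X_i \in \Omega$ and $Y_j \in \partial\Omega$; the paper's exact operator and coefficients may differ. The term $|\nabla u(X_i)|^2$ evaluated on a $\mathrm{ReLU}^2$ network is the non-Lipschitz composition whose Rademacher complexity the abstract refers to.

```latex
\mathcal{E}(u) = \int_{\Omega} \Big( \tfrac12 |\nabla u|^2 + \tfrac12 w\,u^2 - f\,u \Big)\,\mathrm{d}x
                 - \int_{\partial\Omega} g\,u\,\mathrm{d}s,
\qquad
\widehat{\mathcal{E}}(u) = \frac{|\Omega|}{N}\sum_{i=1}^{N}\Big( \tfrac12 |\nabla u(X_i)|^2 + \tfrac12 w(X_i)\,u(X_i)^2 - f(X_i)\,u(X_i) \Big)
                 - \frac{|\partial\Omega|}{M}\sum_{j=1}^{M} g(Y_j)\,u(Y_j).
```

Under these assumptions the Neumann condition is natural for the energy, so, unlike the Dirichlet case, no boundary penalty is needed: DRM simply minimizes $\widehat{\mathcal{E}}$ over the network class.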
In this paper, we construct neural networks with ReLU, sine, and $2^x$ as activation functions. For a general continuous $f$ defined on $[0,1]^d$ with modulus of continuity $\omega_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $\mathcal{O}\left(\omega_f(\sqrt{d})\cdot 2^{-M}+\omega_{f}\left(\frac{\sqrt{d}}{N}\right)\right)$, where $M,N\in \mathbb{N}^{+}$ denote the hyperparameters related to the widths of the networks. As a consequence, we can construct a ReLU-sine-$2^x$ network with depth $5$ and width $\max\left\{\left\lceil 2d^{3/2}\left(\frac{3\mu}{\epsilon}\right)^{1/\alpha}\right\rceil,\ 2\left\lceil\log_2\frac{3\mu d^{\alpha/2}}{2\epsilon}\right\rceil+2\right\}$ that approximates $f\in \mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ within a given tolerance $\epsilon >0$ measured in the $L^p$ norm, $p\in[1,\infty)$, where $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ denotes the class of Hölder continuous functions on $[0,1]^d$ with order $\alpha \in (0,1]$ and constant $\mu > 0$. Therefore, ReLU-sine-$2^x$ networks overcome the curse of dimensionality on $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$. In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, which makes it possible to train them with SGD.
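The width bound stated above is explicit, so it can be evaluated directly to see how it scales with the input dimension $d$. The sketch below simply transcribes the formula; the function name and the sample parameter values are hypothetical.

```python
import math

# Transcription of the width formula above; d^{3/2} is computed as d * sqrt(d).

def relu_sine_2x_width(d, mu, alpha, eps):
    """Width of the depth-5 ReLU-sine-2^x network approximating f in H_mu^alpha([0, 1]^d)
    to tolerance eps in L^p:
    max{ ceil(2 d^{3/2} (3 mu / eps)^{1/alpha}), 2 ceil(log2(3 mu d^{alpha/2} / (2 eps))) + 2 }.
    """
    term1 = math.ceil(2 * d * math.sqrt(d) * (3 * mu / eps) ** (1 / alpha))
    term2 = 2 * math.ceil(math.log2(3 * mu * d ** (alpha / 2) / (2 * eps))) + 2
    return max(term1, term2)

print(relu_sine_2x_width(d=4, mu=1.0, alpha=1.0, eps=0.5))  # 96
for d in (100, 10_000):
    print(d, relu_sine_2x_width(d, mu=1.0, alpha=1.0, eps=0.5))
```

The dominant first term grows like $d^{3/2}$, polynomially rather than exponentially in $d$, which is the sense in which the construction avoids the curse of dimensionality.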