ترغب بنشر مسار تعليمي؟ اضغط هنا

Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo

79   0   0.0 ( 0 )
 نشر من قبل Molei Tao
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Sampling algorithms based on discretizations of Stochastic Differential Equations (SDEs) compose a rich and popular subset of MCMC methods. This work provides a general framework for the non-asymptotic analysis of sampling error in 2-Wasserstein distance, which also leads to a bound of mixing time. The method applies to any consistent discretization of contractive SDEs. When applied to Langevin Monte Carlo algorithm, it establishes $tilde{mathcal{O}}left( frac{sqrt{d}}{epsilon} right)$ mixing time, without warm start, under the common log-smooth and log-strongly-convex conditions, plus a growth condition on the 3rd-order derivative of the potential of target measures at infinity. This bound improves the best previously known $tilde{mathcal{O}}left( frac{d}{epsilon} right)$ result and is optimal (in terms of order) in both dimension $d$ and accuracy tolerance $epsilon$ for target measures satisfying the aforementioned assumptions. Our theoretical analysis is further validated by numerical experiments.



قيم البحث

اقرأ أيضاً

Generative adversarial networks (GANs) have shown promising results when applied on partial differential equations and financial time series generation. We investigate if GANs can also be used to approximate one-dimensional Ito stochastic differentia l equations (SDEs). We propose a scheme that approximates the path-wise conditional distribution of SDEs for large time steps. Standard GANs are only able to approximate processes in distribution, yielding a weak approximation to the SDE. A conditional GAN architecture is proposed that enables strong approximation. We inform the discriminator of this GAN with the map between the prior input to the generator and the corresponding output samples, i.e. we introduce a `supervised GAN. We compare the input-output map obtained with the standard GAN and supervised GAN and show experimentally that the standard GAN may fail to provide a path-wise approximation. The GAN is trained on a dataset obtained with exact simulation. The architecture was tested on geometric Brownian motion (GBM) and the Cox-Ingersoll-Ross (CIR) process. The supervised GAN outperformed the Euler and Milstein schemes in strong error on a discretisation with large time steps. It also outperformed the standard conditional GAN when approximating the conditional distribution. We also demonstrate how standard GANs may give rise to non-parsimonious input-output maps that are sensitive to perturbations, which motivates the need for constraints and regularisation on GAN generators.
Practitioners wishing to experience the efficiency gains from using low discrepancy sequences need correct, well-written software. This article, based on our MCQMC 2020 tutorial, describes some of the better quasi-Monte Carlo (QMC) software available . We highlight the key software components required to approximate multivariate integrals or expectations of functions of vector random variables by QMC. We have combined these components in QMCPy, a Python open source library, which we hope will draw the support of the QMC community. Here we introduce QMCPy.
We introduce an ensemble Markov chain Monte Carlo approach to sampling from a probability density with known likelihood. This method upgrades an underlying Markov chain by allowing an ensemble of such chains to interact via a process in which one cha ins state is cloned as anothers is deleted. This effective teleportation of states can overcome issues of metastability in the underlying chain, as the scheme enjoys rapid mixing once the modes of the target density have been populated. We derive a mean-field limit for the evolution of the ensemble. We analyze the global and local convergence of this mean-field limit, showing asymptotic convergence independent of the spectral gap of the underlying Markov chain, and moreover we interpret the limiting evolution as a gradient flow. We explain how interaction can be applied selectively to a subset of state variables in order to maintain advantage on very high-dimensional problems. Finally we present the application of our methodology to Bayesian hyperparameter estimation for Gaussian process regression.
257 - Zhiyan Ding , Shi Chen , Qin Li 2021
Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, but a basic first-order optimization method (gradient descent) finds a global solution with perfect fit in many practical situations. We exam ine this phenomenon for the case of Residual Neural Networks (ResNet) with smooth activation functions in a limiting regime in which both the number of layers (depth) and the number of neurons in each layer (width) go to infinity. First, we use a mean-field-limit argument to prove that the gradient descent for parameter training becomes a partial differential equation (PDE) that characterizes gradient flow for a probability distribution in the large-NN limit. Next, we show that the solution to the PDE converges in the training time to a zero-loss solution. Together, these results imply that training of the ResNet also gives a near-zero loss if the Resnet is large enough. We give estimates of the depth and width needed to reduce the loss below a given threshold, with high probability.
70 - Kinjal Basu , Art B. Owen 2016
Quasi-Monte Carlo methods are designed for integrands of bounded variation, and this excludes singular integrands. Several methods are known for integrands that become singular on the boundary of the unit cube $[0,1]^d$ or at isolated possibly unknow n points within $[0,1]^d$. Here we consider functions on the square $[0,1]^2$ that may become singular as the point approaches the diagonal line $x_1=x_2$, and we study three quadrature methods. The first method splits the square into two triangles separated by a region around the line of singularity, and applies recently developed triangle QMC rules to the two triangular parts. For functions with a singularity `no worse than $|x_1-x_2|^{-A}$ for $0<A<1$ that method yields an error of $O( (log(n)/n)^{(1-A)/2})$. We also consider methods extending the integrand into a region containing the singularity and show that method will not improve up on using two triangles. Finally, we consider transforming the integrand to have a more QMC-friendly singularity along the boundary of the square. This then leads to error rates of $O(n^{-1+epsilon+A})$ when combined with some corner-avoiding Halton points or with randomized QMC, but it requires some stronger assumptions on the original singular integrand.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا