High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm


Published by Wenlong Mou

Publication date: 2019

Research field: Mathematical statistics, Informatics engineering

Paper language: English

Authors: Wenlong Mou, Yi-An Ma, Martin J. Wainwright

Machine learning, Data structures and algorithms


Abstract

We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities. The higher-order dynamics allow for more flexible discretization schemes, and we develop a specific method that combines splitting with more accurate integration. For a broad class of $d$-dimensional distributions arising from generalized linear models, we prove that the resulting third-order algorithm produces samples from a distribution that is at most $\varepsilon > 0$ in Wasserstein distance from the target distribution in $O\left(\frac{d^{1/4}}{\varepsilon^{1/2}}\right)$ steps. This result requires only Lipschitz conditions on the gradient. For general strongly convex potentials with $\alpha$-th order smoothness, we prove that the mixing time scales as $O\left(\frac{d^{1/4}}{\varepsilon^{1/2}} + \frac{d^{1/2}}{\varepsilon^{1/(\alpha - 1)}}\right)$.
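For orientation, the following is a minimal sketch of the standard first-order unadjusted Langevin algorithm, the simpler baseline that higher-order schemes aim to improve on. It is not the third-order method of the abstract. The function names, the standard Gaussian target, and the step size are all illustrative assumptions.

```python
import math
import random


def ula_sample(grad_f, x0, step, n_steps, rng):
    """First-order unadjusted Langevin algorithm (baseline sketch only).

    Iterates x <- x - step * grad_f(x) + sqrt(2 * step) * N(0, I),
    which approximately samples from a density proportional to exp(-f).
    """
    x = list(x0)
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        g = grad_f(x)
        x = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
    return x


def grad_quadratic(x):
    # Assumed example target: f(x) = ||x||^2 / 2, i.e. a standard Gaussian.
    return list(x)


rng = random.Random(0)
# Run independent chains from a far-away start and keep the first coordinate.
finals = [ula_sample(grad_quadratic, [3.0, -3.0], 0.05, 150, rng)[0]
          for _ in range(1000)]
mean = sum(finals) / len(finals)
var = sum((v - mean) ** 2 for v in finals) / len(finals)
```

For this target the empirical mean should be near 0 and the variance slightly above 1. The small excess is the discretization bias of the first-order scheme, which is the kind of error that more accurate integration is meant to reduce.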


Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, 2019

We consider the problem of sampling from a density of the form $p(x) \propto \exp(-f(x) - g(x))$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth and strongly convex function and $g: \mathbb{R}^d \rightarrow \mathbb{R}$ is a convex and Lipschitz function. We propose a new algorithm based on the Metropolis-Hastings framework, and prove that it mixes to within TV distance $\varepsilon$ of the target density in at most $O(d \log (d/\varepsilon))$ iterations. This guarantee extends previous results on sampling from distributions with smooth log densities ($g = 0$) to the more general composite non-smooth case, with the same mixing time up to a multiple of the condition number. Our method is based on a novel proximal-based proposal distribution that can be efficiently computed for a large class of non-smooth functions $g$.

Machine learning, Data structures and algorithms
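As an illustration of what "efficiently computed" can mean for a non-smooth $g$, the sketch below evaluates the proximal map of $g(x) = \lambda \Vert x \Vert_1$, which has a closed form (coordinate-wise soft-thresholding). The choice of $g$ and the names `prox_l1`, `lam`, and `step` are assumptions made for this example. It shows only the proximal map, not the proposal distribution of the paper.

```python
def soft_threshold(v, t):
    """Shrink the scalar v toward zero by t, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def prox_l1(x, lam, step):
    """Proximal map of g(x) = lam * ||x||_1 with step size `step`.

    Solves argmin_y  lam * ||y||_1 + ||y - x||^2 / (2 * step),
    which separates across coordinates into soft-thresholding
    at level lam * step.
    """
    t = lam * step
    return [soft_threshold(xi, t) for xi in x]


shrunk = prox_l1([3.0, -0.2, 0.5, -2.0], lam=1.0, step=0.5)
```

Each coordinate costs a constant number of operations, so the whole map is linear in the dimension $d$.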

Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing

Wenlong Mou, Nhat Ho, Martin J. Wainwright, 2019

We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior. This power posterior is known to be non-log-concave and multi-modal, which leads to exponential mixing times for some standard MCMC algorithms. We introduce and study the Reflected Metropolis-Hastings Random Walk (RMRW) algorithm for sampling. For symmetric two-component Gaussian mixtures, we prove that its mixing time is bounded as $d^{1.5}(d + \Vert \theta_{0} \Vert^2)^{4.5}$ as long as the sample size $n$ is of the order $d (d + \Vert \theta_{0} \Vert^2)$. Notably, this result requires no conditions on the separation of the two means. En route to proving this bound, we establish some new results of possible independent interest that allow for combining Poincaré inequalities for conditional and marginal densities.

Machine learning, Data structures and algorithms

Differential Privacy Dynamics of Langevin Diffusion and Noisy Gradient Descent

Rishav Chourasia, Jiayuan Ye, Reza Shokri, 2021

What is the information leakage of an iterative learning algorithm about its training data, when the internal state of the algorithm is not observable? How much is the contribution of each specific training epoch to the final leakage? We study this problem for noisy gradient descent algorithms, and model the dynamics of Rényi differential privacy loss throughout the training process. Our analysis traces a provably tight bound on the Rényi divergence between the pair of probability distributions over parameters of models with neighboring datasets. We prove that the privacy loss converges exponentially fast, for smooth and strongly convex loss functions, which is a significant improvement over composition theorems. For Lipschitz, smooth, and strongly convex loss functions, we prove optimal utility for differential privacy algorithms with a small gradient complexity.

Machine learning, Cryptography and security

Higher Order Langevin Monte Carlo Algorithm

Sotirios Sabanis, Ying Zhang, 2018

A new (unadjusted) Langevin Monte Carlo (LMC) algorithm with improved rates in total variation and in Wasserstein distance is presented. All these are obtained in the context of sampling from a target distribution $\pi$ that has a density $\hat{\pi}$ on $\mathbb{R}^d$ known up to a normalizing constant. Moreover, $-\log \hat{\pi}$ is assumed to have a locally Lipschitz gradient and its third derivative is locally Hölder continuous with exponent $\beta \in (0,1]$. Non-asymptotic bounds are obtained for the convergence to stationarity of the new sampling method with convergence rate $1 + \beta/2$ in Wasserstein distance, while it is shown that the rate is 1 in total variation even in the absence of convexity. Finally, in the case where $-\log \hat{\pi}$ is strongly convex and its gradient is Lipschitz continuous, explicit constants are provided.

Statistics theory

An explicit vector algorithm for high-girth MaxCut

Jessica K. Thompson, Ojas Parekh, Kunal Marwaha, 2021

We give an approximation algorithm for MaxCut and provide guarantees on the average fraction of edges cut on $d$-regular graphs of girth $\geq 2k$. For every $d \geq 3$ and $k \geq 4$, our approximation guarantees are better than those of all other classical and quantum algorithms known to the authors. Our algorithm constructs an explicit vector solution to the standard semidefinite relaxation of MaxCut and applies hyperplane rounding. It may be viewed as a simplification of the previously best known technique, which approximates Gaussian wave processes on the infinite $d$-regular tree.

Quantum physics, Data structures and algorithms
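The hyperplane rounding step mentioned above can be sketched as follows: each vertex carries a unit vector, a random Gaussian direction is drawn, and vertices are split by the sign of their inner product with it. The vectors below are hand-picked for a 4-cycle purely for illustration and are not the explicit construction of the paper. The function name `hyperplane_round` is likewise an assumption.

```python
import random


def hyperplane_round(vectors, edges, rng):
    """Round a vector solution to a cut using one random hyperplane.

    vectors: list of per-vertex vectors (all of the same dimension).
    edges:   list of (i, j) vertex index pairs.
    Returns the side (+1 or -1) of each vertex and the number of cut edges.
    """
    dim = len(vectors[0])
    # Normal vector of a uniformly random hyperplane through the origin.
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    side = [1 if sum(vi * ri for vi, ri in zip(v, r)) >= 0 else -1
            for v in vectors]
    cut = sum(1 for i, j in edges if side[i] != side[j])
    return side, cut


# Assumed toy instance: a 4-cycle with antipodal vectors on adjacent vertices.
vectors = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
side, cut = hyperplane_round(vectors, edges, random.Random(1))
```

Because adjacent vertices hold antipodal vectors here, every hyperplane separates them and all four edges are cut. For general vector solutions an edge is cut with probability proportional to the angle between its endpoint vectors.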