
High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm

Published by Wenlong Mou
Publication date: 2019
Language: English





We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities. The higher-order dynamics allow for more flexible discretization schemes, and we develop a specific method that combines splitting with more accurate integration. For a broad class of $d$-dimensional distributions arising from generalized linear models, we prove that the resulting third-order algorithm produces samples from a distribution that is at most $\varepsilon > 0$ in Wasserstein distance from the target distribution in $O\left(\frac{d^{1/4}}{\varepsilon^{1/2}}\right)$ steps. This result requires only Lipschitz conditions on the gradient. For general strongly convex potentials with $\alpha$-th order smoothness, we prove that the mixing time scales as $O\left(\frac{d^{1/4}}{\varepsilon^{1/2}} + \frac{d^{1/2}}{\varepsilon^{1/(\alpha - 1)}}\right)$.
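To get a feel for the two rates quoted in the abstract, the short Python sketch below evaluates them for concrete values of $d$, $\varepsilon$ and $\alpha$. It is only an order-of-magnitude illustration: the function name `step_scaling` is hypothetical, and all constants and any dependence on other problem parameters hidden by the $O(\cdot)$ notation are dropped.

```python
from typing import Optional


def step_scaling(d: float, eps: float, alpha: Optional[float] = None) -> float:
    """Evaluate the abstract's step-count rates with all constants dropped.

    With alpha=None this returns d^(1/4) / eps^(1/2), the rate stated for the
    generalized-linear-model class. With alpha given, it adds the second term
    d^(1/2) / eps^(1/(alpha - 1)) from the general strongly convex case.
    """
    if d <= 0 or eps <= 0:
        raise ValueError("d and eps must be positive")
    first_term = d ** 0.25 / eps ** 0.5
    if alpha is None:
        return first_term
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for the exponent 1/(alpha - 1)")
    second_term = d ** 0.5 / eps ** (1.0 / (alpha - 1.0))
    return first_term + second_term


# Illustrative values only: d = 16, eps = 0.01, alpha = 3.
glm_rate = step_scaling(16, 0.01)
general_rate = step_scaling(16, 0.01, alpha=3)
```

For large $\alpha$ the exponent $1/(\alpha - 1)$ shrinks, so the second term grows more slowly as $\varepsilon \to 0$ and the first term becomes the one that matters for the accuracy dependence.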


Read also

We consider the problem of sampling from a density of the form $p(x) \propto \exp(-f(x) - g(x))$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth and strongly convex function and $g: \mathbb{R}^d \rightarrow \mathbb{R}$ is a convex and Lipschitz function. We propose a new algorithm based on the Metropolis-Hastings framework, and prove that it mixes to within TV distance $\varepsilon$ of the target density in at most $O(d \log(d/\varepsilon))$ iterations. This guarantee extends previous results on sampling from distributions with smooth log densities ($g = 0$) to the more general composite non-smooth case, with the same mixing time up to a multiple of the condition number. Our method is based on a novel proximal-based proposal distribution that can be efficiently computed for a large class of non-smooth functions $g$.
We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior. This power posterior is known to be non-log-concave and multi-modal, which leads to exponential mixing times for some standard MCMC algorithms. We introduce and study the Reflected Metropolis-Hastings Random Walk (RMRW) algorithm for sampling. For symmetric two-component Gaussian mixtures, we prove that its mixing time is bounded as $d^{1.5}(d + \Vert \theta_{0} \Vert^2)^{4.5}$ as long as the sample size $n$ is of the order $d (d + \Vert \theta_{0} \Vert^2)$. Notably, this result requires no conditions on the separation of the two means. En route to proving this bound, we establish some new results of possible independent interest that allow for combining Poincaré inequalities for conditional and marginal densities.
What is the information leakage of an iterative learning algorithm about its training data, when the internal state of the algorithm is not observable? How much is the contribution of each specific training epoch to the final leakage? We study this problem for noisy gradient descent algorithms, and model the dynamics of Rényi differential privacy loss throughout the training process. Our analysis traces a provably tight bound on the Rényi divergence between the pair of probability distributions over parameters of models with neighboring datasets. We prove that the privacy loss converges exponentially fast, for smooth and strongly convex loss functions, which is a significant improvement over composition theorems. For Lipschitz, smooth, and strongly convex loss functions, we prove optimal utility for differential privacy algorithms with a small gradient complexity.
A new (unadjusted) Langevin Monte Carlo (LMC) algorithm with improved rates in total variation and in Wasserstein distance is presented. All these are obtained in the context of sampling from a target distribution $\pi$ that has a density $\hat{\pi}$ on $\mathbb{R}^d$ known up to a normalizing constant. Moreover, $-\log \hat{\pi}$ is assumed to have a locally Lipschitz gradient and its third derivative is locally Hölder continuous with exponent $\beta \in (0,1]$. Non-asymptotic bounds are obtained for the convergence to stationarity of the new sampling method with convergence rate $1 + \beta/2$ in Wasserstein distance, while it is shown that the rate is 1 in total variation even in the absence of convexity. Finally, in the case where $-\log \hat{\pi}$ is strongly convex and its gradient is Lipschitz continuous, explicit constants are provided.
We give an approximation algorithm for MaxCut and provide guarantees on the average fraction of edges cut on $d$-regular graphs of girth $\geq 2k$. For every $d \geq 3$ and $k \geq 4$, our approximation guarantees are better than those of all other classical and quantum algorithms known to the authors. Our algorithm constructs an explicit vector solution to the standard semidefinite relaxation of MaxCut and applies hyperplane rounding. It may be viewed as a simplification of the previously best known technique, which approximates Gaussian wave processes on the infinite $d$-regular tree.
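Several of the abstracts above concern Langevin-type samplers. For orientation, the classical first-order unadjusted Langevin iteration $x_{k+1} = x_k - h \nabla U(x_k) + \sqrt{2h}\,\xi_k$ with $\xi_k \sim N(0, I)$ can be sketched in a few lines of Python. This is the textbook baseline only, not the third-order method or the improved LMC scheme described in the abstracts; the names `ula_chain` and `grad_standard_gaussian`, the step size and the Gaussian target are all assumptions chosen for illustration.

```python
import math
import random


def ula_chain(grad_U, x0, step, n_steps, seed=0):
    """Run the first-order unadjusted Langevin iteration (no accept/reject step).

    grad_U maps a list of floats to the gradient of the potential U = -log density.
    Returns the list of visited states, one list of floats per step.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step)
    x = list(x0)
    chain = []
    for _ in range(n_steps):
        g = grad_U(x)
        # Gradient step on U plus Gaussian noise scaled by sqrt(2 * step).
        x = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
        chain.append(x)
    return chain


def grad_standard_gaussian(x):
    """Gradient of U(x) = ||x||^2 / 2, i.e. a standard Gaussian target (assumed)."""
    return list(x)


# Illustrative run: 2-dimensional standard Gaussian target, hypothetical step size.
chain = ula_chain(grad_standard_gaussian, [0.0, 0.0], step=0.05, n_steps=20000)
kept = [v for state in chain[2000:] for v in state]  # discard an initial burn-in
mean = sum(kept) / len(kept)
var = sum((v - mean) ** 2 for v in kept) / len(kept)
```

Because there is no Metropolis correction, the chain targets a slightly biased distribution whose error depends on the step size, which is why step-count guarantees for such schemes are stated in terms of a target accuracy $\varepsilon$.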
