On the convergence of the Metropolis algorithm with fixed-order updates for multivariate binary probability distributions

93 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Christian Igel

تاريخ النشر 2020

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Kai Brugge - Asja Fischer - Christian Igel

التعلم الالي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The Metropolis algorithm is arguably the most fundamental Markov chain Monte Carlo (MCMC) method. But the algorithm is not guaranteed to converge to the desired distribution in the case of multivariate binary distributions (e.g., Ising models or stochastic neural networks such as Boltzmann machines) if the variables (sites or neurons) are updated in a fixed order, a setting commonly used in practice. The reason is that the corresponding Markov chain may not be irreducible. We propose a modified Metropolis transition operator that behaves almost always identically to the standard Metropolis operator and prove that it ensures irreducibility and convergence to the limiting distribution in the multivariate binary case with fixed-order updates. The result provides an explanation for the behaviour of Metropolis MCMC in that setting and closes a long-standing theoretical gap. We experimentally studied the standard and modified Metropolis operator for models were they actually behave differently. If the standard algorithm also converges, the modified operator exhibits similar (if not better) performance in terms of convergence speed.

قيم البحث

200 - Soheil Kolouri , Kimia Nadjahi , Umut Simsekli 2020

Probability metrics have become an indispensable part of modern statistics and machine learning, and they play a quintessential role in various applications, including statistical hypothesis testing and generative modeling. However, in a practical se tting, the convergence behavior of the algorithms built upon these distances have not been well established, except for a few specific cases. In this paper, we introduce a broad family of probability metrics, coined as Generalized Sliced Probability Metrics (GSPMs), that are deeply rooted in the generalized Radon transform. We first verify that GSPMs are metrics. Then, we identify a subset of GSPMs that are equivalent to maximum mean discrepancy (MMD) with novel positive definite kernels, which come with a unique geometric interpretation. Finally, by exploiting this connection, we consider GSPM-based gradient flows for generative modeling applications and show that under mild assumptions, the gradient flow converges to the global optimum. We illustrate the utility of our approach on both real and synthetic problems.

التعلم الالي التعلم الآلي

Task Agnostic Continual Learning Using Online Variational Bayes with Fixed-Point Updates

89 - Chen Zeno , Itay Golan , Elad Hoffer 2020

التعلم الالي التعلم الآلي

On the Convergence of Prior-Guided Zeroth-Order Optimization Algorithms

161 - Shuyu Cheng , Guoqiang Wu , Jun Zhu 2021

Zeroth-order (ZO) optimization is widely used to handle challenging tasks, such as query-based black-box adversarial attacks and reinforcement learning. Various attempts have been made to integrate prior information into the gradient estimation proce dure based on finite differences, with promising empirical results. However, their convergence properties are not well understood. This paper makes an attempt to fill this gap by analyzing the convergence of prior-guided ZO algorithms under a greedy descent framework with various gradient estimators. We provide a convergence guarantee for the prior-guided random gradient-free (PRGF) algorithms. Moreover, to further accelerate over greedy descent methods, we present a new accelerated random search (ARS) algorithm that incorporates prior information, together with a convergence analysis. Finally, our theoretical results are confirmed by experiments on several numerical benchmarks as well as adversarial attacks.

التعلم الالي التعلم الآلي التحسين والتحكم

Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets

95 - Robert Cornish , Paul Vanetti , Alexandre Bouchard-C^ote 2019

Bayesian inference via standard Markov Chain Monte Carlo (MCMC) methods is too computationally intensive to handle large datasets, since the cost per step usually scales like $Theta(n)$ in the number of data points $n$. We propose the Scalable Metrop olis-Hastings (SMH) kernel that exploits Gaussian concentration of the posterior to require processing on average only $O(1)$ or even $O(1/sqrt{n})$ data points per step. This scheme is based on a combination of factorized acceptance probabilities, procedures for fast simulation of Bernoulli processes, and control variate ideas. Contrary to many MCMC subsampling schemes such as fixed step-size Stochastic Gradient Langevin Dynamics, our approach is exact insofar as the invariant distribution is the true posterior and not an approximation to it. We characterise the performance of our algorithm theoretically, and give realistic and verifiable conditions under which it is geometrically ergodic. This theory is borne out by empirical results that demonstrate overall performance benefits over standard Metropolis-Hastings and various subsampling algorithms.

التعلم الالي التعلم الآلي

Super-Linear Convergence of Dual Augmented-Lagrangian Algorithm for Sparsity Regularized Estimation

781 - Ryota Tomioka , Taiji Suzuki , Masashi Sugiyama 2009

We analyze the convergence behaviour of a recently proposed algorithm for regularized estimation called Dual Augmented Lagrangian (DAL). Our analysis is based on a new interpretation of DAL as a proximal minimization algorithm. We theoretically show under some conditions that DAL converges super-linearly in a non-asymptotic and global sense. Due to a special modelling of sparse estimation problems in the context of machine learning, the assumptions we make are milder and more natural than those made in conventional analysis of augmented Lagrangian algorithms. In addition, the new interpretation enables us to generalize DAL to wide varieties of sparse estimation problems. We experimentally confirm our analysis in a large scale $ell_1$-regularized logistic regression problem and extensively compare the efficiency of DAL algorithm to previously proposed algorithms on both synthetic and benchmark datasets.

التعلم الالي التعلم الآلي المنهجية