No Arabic abstract
Computational couplings of Markov chains provide a practical route to unbiased Monte Carlo estimation that can utilize parallel computation. However, these approaches depend crucially on chains meeting after a small number of transitions. For models that assign data into groups, e.g. mixture models, the obvious approaches to couple Gibbs samplers fail to meet quickly. This failure owes to the so-called label-switching problem; semantically equivalent relabelings of the groups contribute well-separated posterior modes that impede fast mixing and cause large meeting times. We here demonstrate how to avoid label switching by considering chains as exploring the space of partitions rather than labelings. Using a metric on this space, we employ an optimal transport coupling of the Gibbs conditionals. This coupling outperforms alternative couplings that rely on labelings and, on a real dataset, provides estimates more precise than usual ergodic averages in the limited time regime. Code is available at github.com/tinnguyen96/coupling-Gibbs-partition.
Comment on ``On Random Scan Gibbs Samplers [arXiv:0808.3852]
Logistic linear mixed model (LLMM) is one of the most widely used statistical models. Generally, Markov chain Monte Carlo algorithms are used to explore the posterior densities associated with Bayesian LLMMs. Polson, Scott and Windles (2013) Polya-Gamma data augmentation (DA) technique can be used to construct full Gibbs (FG) samplers for LLMMs. Here, we develop efficient block Gibbs (BG) samplers for Bayesian LLMMs using the Polya-Gamma DA method. We compare the FG and BG samplers in the context of simulated and real data examples as the correlation between the fixed and random effects changes as well as when the dimensions of the design matrices vary. These numerical examples demonstrate superior performance of the BG samplers over the FG samplers. We also derive conditions guaranteeing geometric ergodicity of the BG Markov chain when the popular improper uniform prior is assigned on the regression coefficients and proper or improper priors are placed on the variance parameters of the random effects. This theoretical result has important practical implications as it justifies the use of asymptotically valid Monte Carlo standard errors for Markov chain based estimates of posterior quantities.
We consider the problem of estimating a parameter associated to a Bayesian inverse problem. Treating the unknown initial condition as a nuisance parameter, typically one must resort to a numerical approximation of gradient of the log-likelihood and also adopt a discretization of the problem in space and/or time. We develop a new methodology to unbiasedly estimate the gradient of the log-likelihood with respect to the unknown parameter, i.e. the expectation of the estimate has no discretization bias. Such a property is not only useful for estimation in terms of the original stochastic model of interest, but can be used in stochastic gradient algorithms which benefit from unbiased estimates. Under appropriate assumptions, we prove that our estimator is not only unbiased but of finite variance. In addition, when implemented on a single processor, we show that the cost to achieve a given level of error is comparable to multilevel Monte Carlo methods, both practically and theoretically. However, the new algorithm provides the possibility for parallel computation on arbitrarily many processors without any loss of efficiency, asymptotically. In practice, this means any precision can be achieved in a fixed, finite constant time, provided that enough processors are available.
The emergence of big data has led to a growing interest in so-called convergence complexity analysis, which is the study of how the convergence rate of a Monte Carlo Markov chain (for an intractable Bayesian posterior distribution) scales as the underlying data set grows in size. Convergence complexity analysis of practical Monte Carlo Markov chains on continuous state spaces is quite challenging, and there have been very few successful analyses of such chains. One fruitful analysis was recently presented by Qin and Hobert (2021b), who studied a Gibbs sampler for a simple Bayesian random effects model. These authors showed that, under regularity conditions, the geometric convergence rate of this Gibbs sampler converges to zero as the data set grows in size. It is shown herein that similar behavior is exhibited by Gibbs samplers for more general Bayesian models that possess both random effects and traditional continuous covariates, the so-called mixed models. The analysis employs the Wasserstein-based techniques introduced by Qin and Hobert (2021b).
Hamiltonian Monte Carlo (HMC) is a popular sampling method in Bayesian inference. Recently, Heng & Jacob (2019) studied Metropolis HMC with couplings for unbiased Monte Carlo estimation, establishing a generic parallelizable scheme for HMC. However, in practice a different HMC method, multinomial HMC, is considered as the go-to method, e.g. as part of the no-U-turn sampler. In multinomial HMC, proposed states are not limited to end-points as in Metropolis HMC; instead points along the entire trajectory can be proposed. In this paper, we establish couplings for multinomial HMC, based on optimal transport for multinomial sampling in its transition. We prove an upper bound for the meeting time - the time it takes for the coupled chains to meet - based on the notion of local contractivity. We evaluate our methods using three targets: 1,000 dimensional Gaussians, logistic regression and log-Gaussian Cox point processes. Compared to Heng & Jacob (2019), coupled multinomial HMC generally attains a smaller meeting time, and is more robust to choices of step sizes and trajectory lengths, which allows re-use of existing adaptation methods for HMC. These improvements together paves the way for a wider and more practical use of coupled HMC methods.