The logistic linear mixed model (LLMM) is one of the most widely used statistical models. Generally, Markov chain Monte Carlo algorithms are used to explore the posterior densities associated with Bayesian LLMMs. Polson, Scott and Windle's (2013) Polya-Gamma data augmentation (DA) technique can be used to construct full Gibbs (FG) samplers for LLMMs. Here, we develop efficient block Gibbs (BG) samplers for Bayesian LLMMs using the Polya-Gamma DA method. We compare the FG and BG samplers on simulated and real data examples, both as the correlation between the fixed and random effects changes and as the dimensions of the design matrices vary. These numerical examples demonstrate the superior performance of the BG samplers over the FG samplers. We also derive conditions guaranteeing geometric ergodicity of the BG Markov chain when the popular improper uniform prior is assigned to the regression coefficients and proper or improper priors are placed on the variance parameters of the random effects. This theoretical result has important practical implications, as it justifies the use of asymptotically valid Monte Carlo standard errors for Markov chain based estimates of posterior quantities.
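For readers unfamiliar with the Polya-Gamma DA technique, the following is a minimal sketch of an FG-style sampler for plain Bayesian logistic regression with a flat prior on the coefficients; it is not the blocked mixed-model sampler developed in the paper, and the PG(1, c) draw uses a truncated version of the infinite sum-of-gammas representation (function names and the truncation level are illustrative choices, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pg_approx(c, n_terms=200):
    """Approximate PG(1, c) draws via the (truncated) sum-of-gammas
    representation of Polson, Scott and Windle (2013); truncation
    biases draws slightly low, and exact samplers exist."""
    c = np.atleast_1d(c)
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(1.0, 1.0, size=(len(c), n_terms))
    denom = (k - 0.5) ** 2 + (c[:, None] / (2.0 * np.pi)) ** 2
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)

def pg_gibbs_logistic(X, y, n_iter=1000):
    """Two-step Polya-Gamma Gibbs sampler for logistic regression
    with an (improper) flat prior on beta."""
    n, p = X.shape
    beta = np.zeros(p)
    kappa = y - 0.5
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # Step 1: omega_i | beta ~ PG(1, x_i' beta), independently.
        omega = sample_pg_approx(X @ beta)
        # Step 2: beta | omega ~ N(V X' kappa, V), V = (X' Omega X)^{-1}.
        V = np.linalg.inv(X.T @ (omega[:, None] * X))
        m = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(m, V)
        draws[t] = beta
    return draws
```

A blocked version along the lines the abstract describes would update groups of (fixed, random) effect coefficients jointly in Step 2 rather than conditioning on one another.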
The emergence of big data has led to a growing interest in so-called convergence complexity analysis, which is the study of how the convergence rate of a Monte Carlo Markov chain (for an intractable Bayesian posterior distribution) scales as the underlying data set grows in size. Convergence complexity analysis of practical Monte Carlo Markov chains on continuous state spaces is quite challenging, and there have been very few successful analyses of such chains. One fruitful analysis was recently presented by Qin and Hobert (2021b), who studied a Gibbs sampler for a simple Bayesian random effects model. These authors showed that, under regularity conditions, the geometric convergence rate of this Gibbs sampler converges to zero as the data set grows in size. It is shown herein that similar behavior is exhibited by Gibbs samplers for more general Bayesian models that possess both random effects and traditional continuous covariates, the so-called mixed models. The analysis employs the Wasserstein-based techniques introduced by Qin and Hobert (2021b).
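For concreteness, the following LaTeX snippet gives one standard formalization of the Wasserstein geometric convergence rate that statements like "converges to zero as the data set grows" refer to; the precise definitions in Qin and Hobert (2021b) may differ in technical details:

```latex
\[
  d_W\!\left(\nu P^n, \Pi\right) \;\le\; C(\nu)\,\rho^n,
  \qquad n = 1, 2, \ldots,
\]
% Here $P$ is the transition kernel, $\Pi$ the stationary (posterior)
% distribution, $\nu$ the initial distribution, and $d_W$ the
% $L^1$-Wasserstein distance. The geometric convergence rate is the
% infimum of the $\rho \in [0,1)$ for which such a bound holds;
% convergence complexity analysis studies how this rate behaves as
% the data set grows (here, it tends to zero).
```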
Comment on ``On Random Scan Gibbs Samplers'' [arXiv:0808.3852]
The Bayesian probit regression model (Albert and Chib, 1993) is popular and widely used for binary regression. While the improper flat prior for the regression coefficients is an appropriate choice in the absence of any prior information, a proper normal prior is desirable when prior information is available or in modern high-dimensional settings where the number of coefficients ($p$) is greater than the sample size ($n$). For both choices of priors, the resulting posterior density is intractable and a Data Augmentation (DA) Markov chain is used to generate approximate samples from the posterior distribution. Establishing geometric ergodicity for this DA Markov chain is important, as it provides theoretical guarantees for constructing standard errors for Markov chain based estimates of posterior quantities. In this paper, we first show that in the case of proper normal priors, the DA Markov chain is geometrically ergodic *for all* choices of the design matrix $X$, $n$ and $p$ (unlike the improper prior case, where $n \geq p$ and another condition on $X$ are required for posterior propriety itself). We also derive sufficient conditions under which the DA Markov chain is trace-class, i.e., the eigenvalues of the corresponding operator are summable. In particular, this allows us to conclude that the Haar PX-DA sandwich algorithm (obtained by inserting an inexpensive extra step between the two steps of the DA algorithm) is strictly better than the DA algorithm in an appropriate sense.
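A minimal sketch of the two-step Albert-Chib DA chain being analyzed, in the proper-prior case with a N(mu0, Q^{-1}) prior on beta (the names `Q` and `mu0` and the function name are illustrative, not from the source):

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

def probit_da(X, y, Q, mu0, n_iter=1000):
    """Albert-Chib DA chain for Bayesian probit regression with a
    proper N(mu0, Q^{-1}) prior on beta."""
    n, p = X.shape
    beta = np.zeros(p)
    # Posterior covariance of beta | z is fixed across iterations.
    V = np.linalg.inv(X.T @ X + Q)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # Step 1: z_i | beta, y_i ~ N(x_i' beta, 1), truncated to
        # (0, inf) if y_i = 1 and to (-inf, 0) if y_i = 0.
        mu = X @ beta
        lo = np.where(y == 1, -mu, -np.inf)   # standardized bounds
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        # Step 2: beta | z ~ N(V (X'z + Q mu0), V).
        m = V @ (X.T @ z + Q @ mu0)
        beta = rng.multivariate_normal(m, V)
        draws[t] = beta
    return draws
```

The Haar PX-DA sandwich version mentioned in the abstract would insert an inexpensive rescaling move on z between Steps 1 and 2.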
Computational couplings of Markov chains provide a practical route to unbiased Monte Carlo estimation that can utilize parallel computation. However, these approaches depend crucially on the chains meeting after a small number of transitions. For models that assign data into groups, e.g., mixture models, the obvious approaches to coupling Gibbs samplers fail to meet quickly. This failure owes to the so-called label-switching problem: semantically equivalent relabelings of the groups contribute well-separated posterior modes that impede fast mixing and cause large meeting times. Here we demonstrate how to avoid label switching by considering chains as exploring the space of partitions rather than labelings. Using a metric on this space, we employ an optimal transport coupling of the Gibbs conditionals. This coupling outperforms alternative couplings that rely on labelings and, on a real dataset, provides estimates more precise than the usual ergodic averages in the limited-time regime. Code is available at github.com/tinnguyen96/coupling-Gibbs-partition.
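A minimal sketch of the core primitive: given the two chains' full conditionals over one item's cluster assignment (categorical distributions p and q) and a cost matrix derived from a metric on joint choices, draw a coupled pair from the optimal transport plan. This assumes the POT package (`ot.emd`); the 0/1 cost matrix and function name are placeholder choices, not the paper's partition metric:

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

rng = np.random.default_rng(2)

def ot_coupled_draw(p, q, cost):
    """Draw a coupled pair (i, j) from the optimal transport coupling
    of categorical distributions p and q under the given cost.

    The plan's row and column marginals equal p and q, so each chain's
    draw is marginally a correct Gibbs update, while joint mass is
    concentrated on low-cost (agreeing) pairs."""
    plan = ot.emd(p, q, cost)          # optimal transport plan (matrix)
    flat = plan.ravel()
    flat = flat / flat.sum()           # guard against numerical drift
    idx = rng.choice(flat.size, p=flat)
    return np.unravel_index(idx, plan.shape)

# Toy usage: two conditionals over 3 clusters, 0/1 disagreement cost.
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.5, 0.2, 0.3])
cost = 1.0 - np.eye(3)                 # penalize differing assignments
print(ot_coupled_draw(p, q, cost))
```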
Drawing independent samples from high-dimensional probability distributions represents a major computational bottleneck for modern algorithms, including powerful machine learning frameworks such as deep learning. The quest for discovering larger families of distributions for which sampling can be efficiently realized has inspired exploration beyond established computing methods, turning to novel physical devices that leverage the principles of quantum computation. Quantum annealing embodies a promising computational paradigm that is intimately related to the complexity of energy landscapes in Gibbs distributions, which relate the probabilities of system states to the energies of these states. Here, we study the sampling properties of physical realizations of quantum annealers implemented through programmable lattices of superconducting flux qubits. Comprehensive statistical analysis of the data produced by these quantum machines shows that quantum annealers behave as samplers that generate independent configurations from low-temperature noisy Gibbs distributions. We show that the structure of the output distribution probes intrinsic physical properties of the quantum device, such as the effective temperature of individual qubits and the magnitude of local qubit noise, which result in a non-linear response function and spurious interactions that are absent in the hardware implementation. We anticipate that our methodology will find widespread use in the characterization of future generations of quantum annealers and other emerging analog computing devices.
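One simple instance of this kind of statistical characterization, offered as an illustration only (the paper's methodology is more comprehensive): if a device's outputs are approximately Gibbs-distributed, log p(s) = -beta E(s) + const, so regressing log empirical state frequencies on the programmed Ising energies estimates an effective inverse temperature. All names below are illustrative:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)

def ising_energy(s, h, J):
    """Energy of a +/-1 spin configuration s under fields h and
    (upper-triangular) couplings J, so each pair is counted once."""
    return h @ s + s @ J @ s

def estimate_beta_eff(samples, h, J):
    """Least-squares fit of log frequency vs. energy over the
    observed states; the slope is -beta_eff."""
    counts = Counter(map(tuple, samples))
    E = np.array([ising_energy(np.array(s), h, J) for s in counts])
    logf = np.log(np.array(list(counts.values())) / len(samples))
    slope, _ = np.polyfit(E, logf, 1)  # log f ~ -beta * E + const
    return -slope

# Toy check on a 4-spin system sampled exactly from a Gibbs distribution.
n = 4
h = rng.normal(size=n)
J = np.triu(rng.normal(size=(n, n)), 1) * 0.5
states = np.array([[1 if b == "1" else -1 for b in f"{i:04b}"]
                   for i in range(2 ** n)])
E = np.array([ising_energy(s, h, J) for s in states])
probs = np.exp(-0.7 * E)
probs /= probs.sum()
samples = states[rng.choice(len(states), size=20000, p=probs)]
print(estimate_beta_eff(samples, h, J))  # should be close to 0.7
```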