Parallel tempering (PT) methods are a popular class of Markov chain Monte Carlo schemes used to sample complex high-dimensional probability distributions. They rely on a collection of $N$ interacting auxiliary chains targeting temper
Parallel tempering (PT) is a class of Markov chain Monte Carlo algorithms that constructs a path of distributions annealing between a tractable reference and an intractable target, and then interchanges states along the path to improve mixing in the target. The performance of PT depends on how quickly a sample from the reference distribution makes its way to the target, which in turn depends on the particular path of annealing distributions. However, past work on PT has used only simple paths constructed from convex combinations of the reference and target log-densities. This paper begins by demonstrating that this path performs poorly in the setting where the reference and target are nearly mutually singular. To address this issue, we expand the framework of PT to general families of paths, formulate the choice of path as an optimization problem that admits tractable gradient estimates, and propose a flexible new family of spline interpolation paths for use in practice. Theoretical and empirical results both demonstrate that our proposed methodology breaks previously-established upper performance limits for traditional paths.
Monte Carlo (MC) methods are widely used for Bayesian inference and optimization in statistics, signal processing and machine learning. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better exploration of the state space, specially in high-dimensional applications, several schemes employing multiple parallel MCMC chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called {it orthogonal MCMC} (O-MCMC), where a set of vertical parallel MCMC chains share information using some horizontal MCMC techniques working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations of O-MCMC, novel schemes in order to reduce the overall computational cost of parallel multiple try Metropolis (MTM) chains are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated annealing (SA) algorithms. Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the estimation, as well as robustness in terms of independence with respect to initial values and the choice of the parameters.
In this article, we derive a novel non-reversible, continuous-time Markov chain Monte Carlo (MCMC) sampler, called Coordinate Sampler, based on a piecewise deterministic Markov process (PDMP), which can be seen as a variant of the Zigzag sampler. In addition to proving a theoretical validation for this new sampling algorithm, we show that the Markov chain it induces exhibits geometrical ergodicity convergence, for distributions whose tails decay at least as fast as an exponential distribution and at most as fast as a Gaussian distribution. Several numerical examples highlight that our coordinate sampler is more efficient than the Zigzag sampler, in terms of effective sample size.
We review several parallel tempering schemes and examine their main ingredients for accuracy and efficiency. The present study covers two selection methods of temperatures and several choices for the exchange of replicas, including a recent novel all-pair exchange method. We compare the resulting schemes and measure specific heat errors and efficiency using the two-dimensional (2D) Ising model. Our tests suggest that, an earlier proposal for using numbers of local moves related to the canonical correlation times is one of the key ingredients for increasing efficiency, and protocols using cluster algorithms are found to be very effective. Some of the protocols are also tested for efficiency and ground state production in 3D spin glass models where we find that, a simple nearest-neighbor approach using a local n-fold way algorithm is the most effective. Finally, we present evidence that the asymptotic limits of the ground state energy for the isotropic case and that of an anisotropic case of the 3D spin-glass model are very close and may even coincide.