This paper discusses the challenges that tall data pose for Bayesian classification (specifically binary classification) and the existing methods for handling them. Current methods include parallelizing the likelihood, subsampling, and consensus Monte Carlo. A new method based on the two-stage Metropolis-Hastings algorithm is also proposed. The purpose of this algorithm is to reduce the cost of exact likelihood computation in the tall data setting. In the first stage, a new proposal is screened using a model based on an approximate likelihood. The posterior computation based on the full likelihood is carried out only if the proposal passes this first-stage screening. Furthermore, the method can be incorporated into the consensus Monte Carlo framework. The two-stage method is applied to logistic regression, hierarchical logistic regression, and Bayesian multivariate adaptive regression splines.
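The two-stage screening described above can be illustrated with a minimal sketch for logistic regression. Everything specific here is an assumption made for illustration and is not taken from the paper: the surrogate is a rescaled log-likelihood on a fixed random subsample, the prior is N(0, 10^2 I), the proposal is a symmetric Gaussian random walk, and the names `log_post` and `two_stage_mh` are hypothetical. The second-stage ratio follows the standard delayed-acceptance correction, which keeps the full posterior as the stationary distribution.

```python
import numpy as np

def log_post(theta, X, y, scale=1.0):
    """Log posterior (up to a constant) for logistic regression.

    Assumes a N(0, 10^2 I) prior. `scale` reweights the log-likelihood,
    which is how the subsample surrogate mimics the full data set.
    """
    z = X @ theta
    loglik = np.sum(y * z - np.logaddexp(0.0, z))
    logprior = -0.5 * np.sum(theta ** 2) / 100.0
    return scale * loglik + logprior

def two_stage_mh(X, y, n_iter=2000, sub_size=200, step=0.05, seed=0):
    """Two-stage random-walk Metropolis-Hastings (illustrative sketch).

    Returns the chain and the number of full-likelihood evaluations
    made after initialisation.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumption: a fixed subsample defines the cheap approximate posterior.
    idx = rng.choice(n, size=sub_size, replace=False)
    Xs, ys = X[idx], y[idx]

    def approx(t):
        return log_post(t, Xs, ys, scale=n / sub_size)

    def full(t):
        return log_post(t, X, y)

    theta = np.zeros(d)
    la, lf = approx(theta), full(theta)
    chain = np.empty((n_iter, d))
    n_full = 0
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(d)
        la_prop = approx(prop)
        # Stage 1: screen the proposal with the approximate posterior only.
        if np.log(rng.uniform()) < la_prop - la:
            lf_prop = full(prop)
            n_full += 1
            # Stage 2: correct with the full posterior; the surrogate
            # ratio enters inverted so the exact target is preserved.
            if np.log(rng.uniform()) < (lf_prop - lf) - (la_prop - la):
                theta, la, lf = prop, la_prop, lf_prop
        chain[i] = theta
    return chain, n_full
```

Proposals rejected in the first stage never touch the full data set, so the count of full-likelihood evaluations returned by the sketch is at most the number of iterations and is typically well below it.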
We propose a new kernel for Metropolis-Hastings, called Directional Metropolis-Hastings (DMH), with multivariate updates in which the proposal kernel has a state-dependent covariance matrix. We use the derivative of the target distribution at the current state to change the orientation of the proposal distribution, thereby producing more plausible proposals. We study the conditions for geometric ergodicity of our algorithm and provide necessary and sufficient conditions for convergence. We also suggest a scheme for adaptively updating the variance parameter and study the conditions for ergodicity of the adaptive algorithm. We demonstrate the performance of our algorithm on a Bayesian generalized linear model problem.
We consider a pseudo-marginal Metropolis--Hastings kernel $P_m$ that is constructed using an average of $m$ exchangeable random variables, as well as an analogous kernel $P_s$ that averages $s<m$ of these same random variables. Using an embedding technique to facilitate comparisons, we show that the asymptotic variances of ergodic averages associated with $P_m$ are lower bounded in terms of those associated with $P_s$. We show that the bound provided is tight and disprove a conjecture that when the random variables to be averaged are independent, the asymptotic variance under $P_m$ is never less than $s/m$ times the variance under $P_s$. The conjecture does, however, hold when considering continuous-time Markov chains. These results imply that if the computational cost of the algorithm is proportional to $m$, it is often better to set $m=1$. We provide intuition as to why these findings differ so markedly from recent results for pseudo-marginal kernels employing particle filter approximations. Our results are exemplified through two simulation studies; in the first the computational cost is effectively proportional to $m$ and in the second there is a considerable start-up cost at each iteration.
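A minimal sketch of a pseudo-marginal chain that averages $m$ random variables may help make the role of $m$ concrete. The toy setup is an assumption for illustration only and is not one of the paper's simulation studies: the target is a standard normal, the noisy estimate multiplies the exact density by the mean of $m$ independent log-normal weights with unit expectation, and the names `log_estimate` and `pseudo_marginal_mh` are hypothetical.

```python
import numpy as np

def log_target(theta):
    """Log density of a standard normal target, up to a constant (toy choice)."""
    return -0.5 * theta ** 2

def log_estimate(theta, m, rng, sigma=1.0):
    """Log of an unbiased estimate of the target density.

    The estimate is pi(theta) * mean(W_1, ..., W_m), where the W_i are
    i.i.d. log-normal with E[W_i] = 1 (an assumed noise model).
    """
    w = np.exp(sigma * rng.standard_normal(m) - 0.5 * sigma ** 2)
    return log_target(theta) + np.log(np.mean(w))

def pseudo_marginal_mh(m, n_iter=5000, step=1.0, seed=0):
    """Random-walk pseudo-marginal Metropolis-Hastings (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    log_hat = log_estimate(theta, m, rng)
    chain = np.empty(n_iter)
    accepted = 0
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()
        log_hat_prop = log_estimate(prop, m, rng)
        # The estimate at the current state is reused, never refreshed;
        # this is what keeps the exact target invariant.
        if np.log(rng.uniform()) < log_hat_prop - log_hat:
            theta, log_hat = prop, log_hat_prop
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_iter
```

Running the sketch with different values of `m` at a fixed budget of weight draws (so that `n_iter` shrinks as `m` grows) is one way to explore the cost trade-off the abstract describes.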
This paper develops a Bayesian computational platform at the interface between posterior sampling and optimization in models whose marginal likelihoods are difficult to evaluate. Inspired by adversarial optimization, namely Generative Adversarial Networks (GANs), we reframe the likelihood function estimation problem as a classification problem. Pitting a Generator, who simulates fake data, against a Classifier, who tries to distinguish them from the real data, one obtains likelihood (ratio) estimators which can be plugged into the Metropolis-Hastings algorithm. The resulting Markov chains generate, at steady state, samples from an approximate posterior whose asymptotic properties we characterize. Drawing upon connections with empirical Bayes and Bayesian mis-specification, we quantify the convergence rate in terms of the contraction speed of the actual posterior and the convergence rate of the Classifier. Asymptotic normality results are also provided, which justify the inferential potential of our approach. We illustrate the usefulness of our approach on examples that have posed a challenge for existing Bayesian likelihood-free approaches.
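The link between classification and likelihood ratios rests on a standard identity, sketched here in notation that is assumed for illustration and not taken from the paper: $p_0$ is the density of the real data, $p_\theta$ the density of data simulated at parameter $\theta$, and $D^\ast$ the Bayes-optimal classifier's probability that $x$ is real when the two classes are equally represented.

```latex
D^\ast(x) = \frac{p_0(x)}{p_0(x) + p_\theta(x)}
\quad\Longrightarrow\quad
\frac{p_\theta(x)}{p_0(x)} = \frac{1 - D^\ast(x)}{D^\ast(x)} .
```

Replacing $D^\ast$ with a trained classifier gives an estimate of this ratio, so the quality of the resulting estimator depends on how closely the classifier approaches $D^\ast$.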
Statistical Data Assimilation (SDA) is the transfer of information from field or laboratory observations to a user-selected model of the dynamical system producing those observations. The data are noisy and the model has errors; the information transfer addresses properties of the conditional probability distribution of the states of the model conditioned on the observations. The quantities of interest in SDA are the conditional expected values of functions of the model state, and these require the approximate evaluation of high-dimensional integrals. We introduce a conditional probability distribution and use the Laplace method with annealing to identify the maxima of the conditional probability distribution. The annealing method slowly increases the precision term of the model as it enters the Laplace method. In this paper, we extend the idea of precision annealing (PA) to Monte Carlo calculations of conditional expected values using Metropolis-Hastings methods.
We present a detailed circuit implementation of Szegedy's quantization of the Metropolis-Hastings walk. This quantum walk is usually defined with respect to an oracle. We find that a direct implementation of this oracle requires costly arithmetic operations, and we therefore reformulate the quantum walk in a way that circumvents the implementation of that specific oracle and closely follows the classical Metropolis-Hastings walk. We also present heuristic quantum algorithms that use the quantum walk in the context of discrete optimization problems and numerically study their performance. Our numerical results indicate polynomial quantum speedups in heuristic settings.