For large-scale online inference problems, the update strategy is critical for performance. We derive an adaptive-scan Gibbs sampler that optimizes the update frequency by selecting an optimal mini-batch size. We demonstrate the performance of our adaptive batch-size Gibbs sampler by comparing it against the collapsed Gibbs sampler for the Bayesian Lasso, Dirichlet Process Mixture Models (DPMM), and Latent Dirichlet Allocation (LDA) graphical models.
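The abstract does not spell out the adaptation rule, so the minimal Python sketch below only illustrates the mechanics of a Gibbs scan that updates a random mini-batch of coordinates per iteration, for a multivariate Gaussian target with known precision matrix; the function name `minibatch_gibbs_gaussian`, the `batch_size` knob, and all other names are illustrative rather than taken from the paper.

```python
import numpy as np

def minibatch_gibbs_gaussian(Q, mu, n_iter=5000, batch_size=2, rng=None):
    """Gibbs sampler for N(mu, Q^{-1}) that updates only a random
    mini-batch of coordinates per scan.  `batch_size` is the knob an
    adaptive-scan scheme would tune; the adaptation rule itself is not
    given in the abstract, so it is left as a fixed parameter here."""
    rng = np.random.default_rng() if rng is None else rng
    d = len(mu)
    x = mu.copy()
    samples = np.empty((n_iter, d))
    for t in range(n_iter):
        idx = rng.choice(d, size=batch_size, replace=False)
        for i in idx:
            # Full conditional of x_i given the rest (standard Gaussian identity).
            cond_var = 1.0 / Q[i, i]
            cond_mean = mu[i] - cond_var * (Q[i, :] @ (x - mu) - Q[i, i] * (x[i] - mu[i]))
            x[i] = rng.normal(cond_mean, np.sqrt(cond_var))
        samples[t] = x
    return samples

# Example: a correlated 5-dimensional Gaussian target.
d = 5
A = np.random.default_rng(0).normal(size=(d, d))
Q = A @ A.T + d * np.eye(d)          # a valid precision matrix
mu = np.zeros(d)
draws = minibatch_gibbs_gaussian(Q, mu, batch_size=2)
print(draws.mean(axis=0))            # should be close to zero
```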
Estimators computed from adaptively collected data do not behave like their non-adaptive counterparts. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite-data limit. We develop a general method -- $\mathbf{W}$-decorrelation -- for transforming the bias of adaptive linear regression estimators into variance. The method uses only coarse-grained information about the data collection policy and does not need access to propensity scores or exact knowledge of the policy. We bound the finite-sample bias and variance of the $\mathbf{W}$-estimator and develop asymptotically correct confidence intervals based on a novel martingale central limit theorem. We then demonstrate the empirical benefits of the generic $\mathbf{W}$-decorrelation procedure in two different adaptive data settings: multi-armed bandits and autoregressive time series.
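The abstract does not state the estimator explicitly, but decorrelation estimators of this kind are commonly written as a one-step correction of a base fit; the identity below is plain algebra with the matrix $\mathbf{W}$ left abstract and is meant only to illustrate how such a correction trades bias for variance. For the linear model $y = X\beta + \varepsilon$ and a base estimator $\hat\beta$,
\[
\hat\beta^{W} = \hat\beta + W\,(y - X\hat\beta)
\quad\Longrightarrow\quad
\hat\beta^{W} - \beta = (I - WX)\,(\hat\beta - \beta) + W\varepsilon ,
\]
so the residual bias is governed by how close $WX$ is to the identity, while the term $W\varepsilon$ contributes variance. The paper's specific construction of $W$ from coarse information about the collection policy, and the accompanying martingale central limit theorem for the $W\varepsilon$ term, are not reproduced here.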
In this article, we derive a novel non-reversible, continuous-time Markov chain Monte Carlo (MCMC) sampler, called the Coordinate Sampler, based on a piecewise deterministic Markov process (PDMP), which can be seen as a variant of the Zigzag sampler. In addition to establishing the theoretical validity of this new sampling algorithm, we show that the Markov chain it induces is geometrically ergodic for distributions whose tails decay at least as fast as an exponential distribution and at most as fast as a Gaussian distribution. Several numerical examples highlight that our coordinate sampler is more efficient than the Zigzag sampler in terms of effective sample size.
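The Coordinate Sampler's own generator is not given in the abstract; as a point of reference, here is a minimal Python sketch of the Zigzag sampler it is compared against, for a one-dimensional standard Gaussian target where the event rate $\lambda(x,v)=\max(0, vx)$ can be inverted in closed form. The function name `zigzag_1d_gaussian` and the discretization step are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def zigzag_1d_gaussian(n_events=10_000, x0=0.0, rng=None):
    """One-dimensional Zigzag sampler targeting a standard Gaussian.
    Returns the piecewise-linear skeleton (event times, positions, velocities)."""
    rng = np.random.default_rng() if rng is None else rng
    x, v, t = x0, 1.0, 0.0
    times, xs, vs = [t], [x], [v]
    for _ in range(n_events):
        # Next event time: invert int_0^T max(0, v*x + s) ds = Exp(1) draw.
        e = rng.exponential()
        a = max(v * x, 0.0)
        tau = -v * x + np.sqrt(a * a + 2.0 * e)
        t += tau
        x += v * tau        # deterministic linear motion between events
        v = -v              # Zigzag: flip the velocity at the event
        times.append(t); xs.append(x); vs.append(v)
    return np.array(times), np.array(xs), np.array(vs)

# Turn the continuous trajectory into discrete samples on a time grid.
times, xs, vs = zigzag_1d_gaussian()
grid = np.linspace(0.0, times[-1], 5000)
idx = np.searchsorted(times, grid, side="right") - 1
samples = xs[idx] + vs[idx] * (grid - times[idx])
print(samples.mean(), samples.var())   # should be close to 0 and 1
```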
As well as primary fluctuations, CMB temperature maps contain a wealth of additional information in the form of secondary anisotropies. Secondary effects that can be identified with individual objects, such as the thermal and kinetic Sunyaev-Zeldovich (SZ) effects due to galaxy clusters, are, however, difficult to disentangle unambiguously from foreground contamination and the primary CMB. We develop a Bayesian formalism for rigorously characterising anisotropies that are localised on the sky, taking the TSZ and KSZ effects as an example. Using a Gibbs sampling scheme, we are able to efficiently sample from the joint posterior distribution for a multi-component model of the sky with many thousands of correlated physical parameters. The posterior can then be exactly marginalised to estimate properties of the secondary anisotropies, fully taking into account degeneracies with the other signals in the CMB map. We show that this method is computationally tractable using a simple implementation based on the existing Commander component separation code, and also discuss how other types of secondary anisotropy can be accommodated within our framework.
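The actual sky model involves many thousands of correlated physical parameters and the Commander machinery; the toy Python sketch below only illustrates the conditional-Gaussian block structure that such a Gibbs scheme exploits, alternating draws of two additive component amplitudes given the data and each other. The model, dimensions, and all names are placeholders, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_gaussian_conditional(A, Ninv, Sinv, resid, rng):
    """Draw one amplitude vector from its Gaussian full conditional
    N(Sigma A^T Ninv resid, Sigma), with Sigma = (Sinv + A^T Ninv A)^{-1}."""
    Sigma = np.linalg.inv(Sinv + A.T @ Ninv @ A)
    Sigma = (Sigma + Sigma.T) / 2.0        # symmetrize against round-off
    mu = Sigma @ (A.T @ Ninv @ resid)
    return rng.multivariate_normal(mu, Sigma)

# Toy additive model: data = A1 s1 + A2 s2 + noise.
n_pix, k = 50, 5
A1 = rng.normal(size=(n_pix, k))
A2 = rng.normal(size=(n_pix, k))
s1_true, s2_true = rng.normal(size=k), rng.normal(size=k)
noise_var = 0.1
y = A1 @ s1_true + A2 @ s2_true + rng.normal(scale=np.sqrt(noise_var), size=n_pix)

Ninv = np.eye(n_pix) / noise_var     # noise precision
Sinv = np.eye(k)                     # unit Gaussian priors on both components

# Block Gibbs: alternate the two amplitude conditionals.
s1, s2 = np.zeros(k), np.zeros(k)
draws = []
for _ in range(2000):
    s1 = sample_gaussian_conditional(A1, Ninv, Sinv, y - A2 @ s2, rng)
    s2 = sample_gaussian_conditional(A2, Ninv, Sinv, y - A1 @ s1, rng)
    draws.append(np.concatenate([s1, s2]))
print(np.mean(draws, axis=0))        # posterior means of both amplitude blocks
```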
In many fields, researchers are interested in discovering features with a substantial effect on the response from among a large number of features, while controlling the proportion of false discoveries. By incorporating the knockoff procedure into the Bayesian framework, we develop the Bayesian knockoff filter (BKF) for selecting features that have an important effect on the response. In contrast to the fixed knockoff variables in frequentist procedures, we allow the knockoff variables to be updated continuously within the Markov chain Monte Carlo sampler. Based on the posterior samples and carefully designed greedy selection procedures, our method can identify the truly important features while controlling the Bayesian false discovery rate at a desired level. Numerical experiments on both synthetic and real data demonstrate the advantages of our method over existing knockoff methods and Bayesian variable selection approaches: the BKF possesses higher power and yields a lower false discovery rate.
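The paper's greedy selection procedures are not described in the abstract; the sketch below shows only the standard Bayesian false-discovery-rate thresholding that such selection rules build on, applied to posterior inclusion probabilities estimated from MCMC output. Function and variable names are illustrative.

```python
import numpy as np

def bayesian_fdr_select(inclusion_probs, q=0.10):
    """Select features from posterior inclusion probabilities so that the
    estimated Bayesian false discovery rate stays below q: add features in
    decreasing order of probability while the running mean of (1 - prob),
    i.e. the expected proportion of false discoveries, remains <= q."""
    probs = np.asarray(inclusion_probs)
    order = np.argsort(-probs)                          # most probable first
    local_fdr = 1.0 - probs[order]
    running_fdr = np.cumsum(local_fdr) / np.arange(1, len(probs) + 1)
    n_keep = int(np.sum(running_fdr <= q))              # running_fdr is nondecreasing
    return np.sort(order[:n_keep])

# Example: inclusion probabilities as they might be estimated from the
# posterior frequency of indicator variables in an MCMC run.
pip = np.array([0.99, 0.97, 0.95, 0.80, 0.60, 0.20, 0.05, 0.01])
print(bayesian_fdr_select(pip, q=0.10))   # selects the first four features
```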
Discrete choice models describe the choices made by decision makers among alternatives and play an important role in transportation planning, marketing research and other applications. The mixed multinomial logit (MMNL) model is a popular discrete choice model that captures heterogeneity in the preferences of decision makers through random coefficients. While Markov chain Monte Carlo methods provide the Bayesian analogue to classical procedures for estimating MMNL models, computations can be prohibitively expensive for large datasets. Approximate inference can be obtained using variational methods at a lower computational cost with competitive accuracy. In this paper, we develop variational methods for estimating MMNL models that allow random coefficients to be correlated in the posterior and can be extended easily to large-scale datasets. We explore three alternatives: (1) Laplace variational inference, (2) nonconjugate variational message passing and (3) stochastic linear regression. Their performance is compared on real and simulated data. To accelerate convergence for large datasets, we develop stochastic variational inference for MMNL models using each of the above alternatives. Stochastic variational inference allows data to be processed in minibatches by optimizing global variational parameters using stochastic gradient approximation. A novel strategy for increasing minibatch sizes adaptively within stochastic variational inference is proposed.
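The paper's adaptive minibatch schedule is not given in the abstract; the Python sketch below only shows the mechanics of stochastic variational inference on a toy conjugate model, with the likelihood gradient rescaled by $N/|B|$ and a minibatch size that grows over iterations as an illustrative stand-in for the adaptive rule. All names and the doubling schedule are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: x_i ~ N(theta, 1), prior theta ~ N(0, 1),
# variational family q(theta) = N(m, v) with v held fixed for simplicity.
N = 100_000
x = rng.normal(2.0, 1.0, size=N)          # synthetic data with true mean 2

def elbo_grad_m(m, batch):
    """Unbiased minibatch estimate of d(ELBO)/dm: the likelihood term is
    rescaled by N/|B| so its expectation matches the full-data gradient."""
    return (N / len(batch)) * np.sum(batch - m) - m

m = 0.0
batch_size = 10
for t in range(2000):
    batch = x[rng.choice(N, size=batch_size, replace=False)]
    lr = 1.0 / (N * (10 + t) ** 0.7)      # Robbins-Monro style step size
    m += lr * elbo_grad_m(m, batch)
    # Illustrative schedule: enlarge the minibatch as the iterate stabilises,
    # trading per-step cost for lower gradient variance.
    if t % 200 == 199:
        batch_size = min(2 * batch_size, N)

print(m)   # should be close to the posterior mean N * x.mean() / (N + 1)
```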