In many fields, researchers are interested in discovering, from among a large number of features, those with a substantial effect on the response, while controlling the proportion of false discoveries. By incorporating the knockoff procedure into the Bayesian framework, we develop the Bayesian knockoff filter (BKF) for selecting features that have an important effect on the response. In contrast to the fixed knockoff variables of the frequentist procedures, we allow the knockoff variables to be continuously updated within the Markov chain Monte Carlo. Based on the posterior samples and an elaborated greedy selection procedure, our method can distinguish the truly important features while controlling the Bayesian false discovery rate at a desirable level. Numerical experiments on both synthetic and real data demonstrate the advantages of our method over existing knockoff methods and Bayesian variable selection approaches, namely that the BKF possesses higher power and yields a lower false discovery rate.
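As a concrete illustration of the final selection step, the following minimal Python sketch applies a standard greedy Bayesian FDR rule to posterior inclusion probabilities; the function name `bayesian_fdr_select` and the use of inclusion probabilities are our own assumptions, and the paper's actual feature statistics and greedy procedure may differ.

```python
import numpy as np

def bayesian_fdr_select(pip, q=0.1):
    """Greedy selection controlling the Bayesian FDR at level q (sketch).

    pip : posterior inclusion probabilities, one per feature
          (illustrative; the paper's own statistic may differ).
    Features are added in decreasing order of pip while the running
    average of (1 - pip), the estimated Bayesian FDR, stays <= q.
    """
    order = np.argsort(-pip)                       # most promising first
    fdr = np.cumsum(1.0 - pip[order]) / np.arange(1, pip.size + 1)
    passing = np.where(fdr <= q)[0]                # fdr is nondecreasing
    k = passing[-1] + 1 if passing.size else 0
    return np.sort(order[:k])                      # indices of selected features

# toy usage with made-up posterior inclusion probabilities
rng = np.random.default_rng(0)
pip = rng.beta(0.5, 0.5, size=20)
print(bayesian_fdr_select(pip, q=0.1))
```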
We study the convergence properties of a collapsed Gibbs sampler for Bayesian vector autoregressions with predictors, or exogenous variables. The Markov chain generated by our algorithm is shown to be geometrically ergodic regardless of whether the number of observations in the underlying vector autoregression is small or large in comparison to its order and dimension. In a convergence complexity analysis, we also give conditions under which the geometric ergodicity is asymptotically stable as the number of observations tends to infinity. Specifically, the geometric convergence rate is shown to be bounded away from unity asymptotically, either almost surely or with probability tending to one, depending on what is assumed about the data generating process. This result is one of the first of its kind for practically relevant Markov chain Monte Carlo algorithms. Our convergence results hold under essentially arbitrary model misspecification.
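For reference, geometric ergodicity and the asymptotic stability of the rate can be stated as follows; these are the standard definitions, written in our own notation rather than the paper's.

```latex
% Geometric ergodicity of the Markov transition kernel P with
% invariant distribution \Pi: there exist \rho \in [0,1) and a
% function M such that, for \Pi-almost every starting point x,
\[
  \|P^{t}(x,\cdot) - \Pi(\cdot)\|_{\mathrm{TV}} \le M(x)\,\rho^{t},
  \qquad t = 1, 2, \dots
\]
% Asymptotically stable geometric ergodicity: writing \rho_n for the
% rate when the vector autoregression has n observations,
\[
  \limsup_{n\to\infty} \rho_n < 1
  \quad \text{almost surely, or with probability tending to one.}
\]
```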
In this article, we derive a novel non-reversible, continuous-time Markov chain Monte Carlo (MCMC) sampler, called the Coordinate Sampler, based on a piecewise deterministic Markov process (PDMP); it can be seen as a variant of the Zigzag sampler. In addition to establishing the theoretical validity of this new sampling algorithm, we show that the Markov chain it induces is geometrically ergodic for distributions whose tails decay at least as fast as an exponential distribution and at most as fast as a Gaussian distribution. Several numerical examples highlight that our Coordinate Sampler is more efficient than the Zigzag sampler in terms of effective sample size.
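To make the construction concrete, here is a minimal Python sketch of a coordinate-sampler-style PDMP for a standard Gaussian target, with event times simulated by thinning over short windows. The event rate and velocity-switching kernel follow our reading of the general PDMP literature, and the names (`lam_ref`, `delta`, etc.) are ours; the paper's exact specification may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_U(x):            # standard Gaussian target: U(x) = ||x||^2 / 2
    return x

def coordinate_sampler(d=2, T=200.0, lam_ref=1.0, delta=0.1):
    """Sketch of a coordinate-sampler-style PDMP (not the paper's code).

    The state moves along one coordinate at unit speed; events occur with
    rate max(0, v . grad_U(x)) + lam_ref, simulated by thinning.  At an
    event, the new velocity w is drawn from {+-e_1, ..., +-e_d} with
    probability proportional to max(0, -w . grad_U(x)) + lam_ref.
    """
    x = np.zeros(d)
    i, s = 0, 1.0                         # active coordinate and its sign
    t, samples = 0.0, []
    while t < T:
        # the rate max(0, s*x[i] + u) + lam_ref is nondecreasing in u,
        # so its value at u = delta bounds it on the whole window
        bound = max(0.0, s * x[i] + delta) + lam_ref
        tau = rng.exponential(1.0 / bound)
        step = min(tau, delta)
        x[i] += s * step
        t += step
        samples.append(x.copy())
        if tau <= delta:                  # proposed event inside the window
            rate = max(0.0, s * x[i]) + lam_ref
            if rng.random() < rate / bound:   # accept: switch velocity
                g = grad_U(x)
                w = np.concatenate([np.maximum(0.0, -g),
                                    np.maximum(0.0, g)]) + lam_ref
                k = rng.choice(2 * d, p=w / w.sum())
                i, s = k % d, (1.0 if k < d else -1.0)
    return np.array(samples)

print(coordinate_sampler()[-5:])          # last few recorded positions
```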
The logistic regression model is the most popular model for analyzing binary data. In the absence of any prior information, an improper flat prior is often used for the regression coefficients in Bayesian logistic regression models. The resulting intractable posterior density can be explored by running Polson et al.'s (2013) data augmentation (DA) algorithm. In this paper, we establish that the Markov chain underlying Polson et al.'s (2013) DA algorithm is geometrically ergodic. Proving this theoretical result is practically important, as it ensures the existence of central limit theorems (CLTs) for sample averages under a finite second moment condition. The CLT in turn allows users of the DA algorithm to calculate standard errors for posterior estimates.
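The DA algorithm alternates between drawing Pólya-Gamma latent variables and drawing the coefficients from a Gaussian conditional. Below is a minimal Python sketch of this two-block Gibbs sampler under a flat prior, assuming the third-party `polyagamma` package for PG(1, z) draws; the function name `pg_logistic_gibbs` is ours.

```python
import numpy as np
from polyagamma import random_polyagamma  # assumed third-party PG sampler

def pg_logistic_gibbs(X, y, n_iter=2000, rng=None):
    """Polson-Scott-Windle data augmentation for Bayesian logistic
    regression with a flat prior on beta (sketch).

    Alternates
        omega_i | beta ~ PG(1, x_i' beta)
        beta | omega   ~ N(V X' kappa, V),  V = (X' Omega X)^{-1},
    where kappa = y - 1/2 and Omega = diag(omega).
    """
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    kappa = y - 0.5
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        omega = random_polyagamma(z=X @ beta, random_state=rng)
        V = np.linalg.inv(X.T @ (omega[:, None] * X))
        m = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(m, V)
        draws[t] = beta
    return draws
```

Because the chain is geometrically ergodic, averages of these draws satisfy a CLT under a finite second moment condition, so standard errors for posterior estimates can be obtained by, for example, batch means.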
As well as primary fluctuations, CMB temperature maps contain a wealth of additional information in the form of secondary anisotropies. However, secondary effects that can be identified with individual objects, such as the thermal and kinetic Sunyaev-Zeldovich (TSZ and KSZ) effects due to galaxy clusters, are difficult to disentangle unambiguously from foreground contamination and the primary CMB. We develop a Bayesian formalism for rigorously characterising anisotropies that are localised on the sky, taking the TSZ and KSZ effects as an example. Using a Gibbs sampling scheme, we are able to efficiently sample from the joint posterior distribution of a multi-component model of the sky with many thousands of correlated physical parameters. The posterior can then be exactly marginalised to estimate properties of the secondary anisotropies, fully taking into account degeneracies with the other signals in the CMB map. We show that this method is computationally tractable using a simple implementation based on the existing Commander component separation code, and we also discuss how other types of secondary anisotropy can be accommodated within our framework.
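As a schematic of one such Gibbs step, the sketch below draws component amplitudes from their Gaussian conditional in a generic linear sky model d = A a + n with Gaussian noise and prior. This is our own toy illustration, not the Commander implementation, and all names are ours.

```python
import numpy as np

def gibbs_amplitude_step(d, A, N_inv, S_inv, rng):
    """One Gibbs step for component amplitudes in a linear sky model
    d = A a + n (toy sketch; not the Commander implementation).

    With Gaussian noise (precision N_inv) and a Gaussian prior
    (precision S_inv), the conditional a | d is Gaussian with
    precision Q = A' N_inv A + S_inv and mean Q^{-1} A' N_inv d.
    """
    Q = A.T @ N_inv @ A + S_inv
    L = np.linalg.cholesky(Q)
    mean = np.linalg.solve(Q, A.T @ N_inv @ d)
    # draw a ~ N(mean, Q^{-1}) using the Cholesky factor of the precision:
    # L'^{-1} z has covariance (L L')^{-1} = Q^{-1} for z ~ N(0, I)
    return mean + np.linalg.solve(L.T, rng.standard_normal(len(mean)))
```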
For large-scale online inference problems, the update strategy is critical for performance. We derive an adaptive scan Gibbs sampler that optimizes the update frequency by selecting an optimal mini-batch size. We demonstrate the performance of our adaptive batch-size Gibbs sampler by comparing it against the collapsed Gibbs sampler for the Bayesian Lasso, Dirichlet Process Mixture Model (DPMM), and Latent Dirichlet Allocation (LDA) graphical models.
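One generic way to pick a mini-batch size is to measure effective samples per second for each candidate and keep the best; the sketch below illustrates this heuristic in Python. It is our own stand-in criterion, not the paper's adaptive rule, and `gibbs_step` is a hypothetical user-supplied update.

```python
import time
import numpy as np

def choose_batch_size(gibbs_step, candidates, n_trial=50, rng=None):
    """Pick a mini-batch size by effective samples per second (heuristic).

    gibbs_step(batch_size, rng) should run one mini-batch update and
    return a scalar trace value used to estimate the ESS.
    """
    rng = rng or np.random.default_rng(0)
    best, best_rate = None, -np.inf
    for b in candidates:
        start = time.perf_counter()
        trace = np.array([gibbs_step(b, rng) for _ in range(n_trial)])
        elapsed = time.perf_counter() - start
        # crude ESS from the lag-1 autocorrelation of the trace
        x = trace - trace.mean()
        rho1 = (x[:-1] @ x[1:]) / (x @ x + 1e-12)
        ess = n_trial * (1 - rho1) / (1 + rho1)
        if ess / elapsed > best_rate:
            best, best_rate = b, ess / elapsed
    return best
```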