No Arabic abstract
We address the problem of providing inference from a Bayesian perspective for parameters selected after viewing the data. We present a Bayesian framework for providing inference for selected parameters, based on the observation that providing Bayesian inference for selected parameters is a truncated data problem. We show that if the prior for the parameter is non-informative, or if the parameter is a fixed unknown constant, then it is necessary to adjust the Bayesian inference for selection. Our second contribution is the introduction of Bayesian False Discovery Rate controlling methodology,which generalizes existing Bayesian FDR methods that are only defined in the two-group mixture model.We illustrate our results by applying them to simulated data and data froma microarray experiment.
This survey covers state-of-the-art Bayesian techniques for the estimation of mixtures. It complements the earlier Marin, Mengersen and Robert (2005) by studying new types of distributions, the multinomial, latent class and t distributions. It also exhibits closed form solutions for Bayesian inference in some discrete setups. Lastly, it sheds a new light on the computation of Bayes factors via the approximation of Chib (1995).
We study the class of state-space models and perform maximum likelihood estimation for the model parameters. We consider a stochastic approximation expectation-maximization (SAEM) algorithm to maximize the likelihood function with the novelty of using approximate Bayesian computation (ABC) within SAEM. The task is to provide each iteration of SAEM with a filtered state of the system, and this is achieved using an ABC sampler for the hidden state, based on sequential Monte Carlo (SMC) methodology. It is shown that the resulting SAEM-ABC algorithm can be calibrated to return accurate inference, and in some situations it can outperform a version of SAEM incorporating the bootstrap filter. Two simulation studies are presented, first a nonlinear Gaussian state-space model then a state-space model having dynamics expressed by a stochastic differential equation. Comparisons with iterated filtering for maximum likelihood inference, and Gibbs sampling and particle marginal methods for Bayesian inference are presented.
Bayesian inference for nonlinear diffusions, observed at discrete times, is a challenging task that has prompted the development of a number of algorithms, mainly within the computational statistics community. We propose a new direction, and accompanying methodology, borrowing ideas from statistical physics and computational chemistry, for inferring the posterior distribution of latent diffusion paths and model parameters, given observations of the process. Joint configurations of the underlying process noise and of parameters, mapping onto diffusion paths consistent with observations, form an implicitly defined manifold. Then, by making use of a constrained Hamiltonian Monte Carlo algorithm on the embedded manifold, we are able to perform computationally efficient inference for an extensive class of discretely observed diffusion models. Critically, in contrast with other approaches proposed in the literature, our methodology is highly automated, requiring minimal user intervention and applying alike in a range of settings, including: elliptic or hypo-elliptic systems; observations with or without noise; linear or non-linear observation operators. Exploiting Markovianity, we propose a variant of the method with complexity that scales linearly in the resolution of path discretisation and the number of observation times.
Approximate Bayesian computation (ABC) is computationally intensive for complex model simulators. To exploit expensive simulations, data-resampling via bootstrapping can be employed to obtain many artificial datasets at little cost. However, when using this approach within ABC, the posterior variance is inflated, thus resulting in biased posterior inference. Here we use stratified Monte Carlo to considerably reduce the bias induced by data resampling. We also show empirically that it is possible to obtain reliable inference using a larger than usual ABC threshold. Finally, we show that with stratified Monte Carlo we obtain a less variable ABC likelihood. Ultimately we show how our approach improves the computational efficiency of the ABC samplers. We construct several ABC samplers employing our methodology, such as rejection and importance ABC samplers, and ABC-MCMC samplers. We consider simulation studies for static (Gaussian, g-and-k distribution, Ising model, astronomical model) and dynamic models (Lotka-Volterra). We compare against state-of-art sequential Monte Carlo ABC samplers, synthetic likelihoods, and likelihood-free Bayesian optimization. For a computationally expensive Lotka-Volterra case study, we found that our strategy leads to a more than 10-fold computational saving, compared to a sampler that does not use our novel approach.
Dynamically rescaled Hamiltonian Monte Carlo (DRHMC) is introduced as a computationally fast and easily implemented method for performing full Bayesian analysis in hierarchical statistical models. The method relies on introducing a modified parameterisation so that the re-parameterised target distribution has close to constant scaling properties, and thus is easily sampled using standard (Euclidian metric) Hamiltonian Monte Carlo. Provided that the parameterisations of the conditional distributions specifying the hierarchical model are constant information parameterisations (CIP), the relation between the modified- and original parameterisation is bijective, explicitly computed and admit exploitation of sparsity in the numerical linear algebra involved. CIPs for a large catalogue of statistical models are presented, and from the catalogue, it is clear that many CIPs are currently routinely used in statistical computing. A relation between the proposed methodology and a class of explicitly integrated Riemann manifold Hamiltonian Monte Carlo methods is discussed. The methodology is illustrated on several example models, including a model for inflation rates with multiple levels of non-linearly dependent latent variables.