ترغب بنشر مسار تعليمي؟ اضغط هنا

Sequentially guided MCMC proposals for synthetic likelihoods and correlated synthetic likelihoods

157   0   0.0 ( 0 )
 نشر من قبل Umberto Picchini
 تاريخ النشر 2020
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

Synthetic likelihood (SL) is a strategy for parameter inference when the likelihood function is analytically or computationally intractable. In SL, the likelihood function of the data is replaced by a multivariate Gaussian density over summary statistics of the data. SL requires simulation of many replicate datasets at every parameter value considered by a sampling algorithm, such as MCMC, making the method computationally-intensive. We propose two strategies to alleviate the computational burden imposed by SL algorithms. We first introduce a novel MCMC algorithm for SL where the proposal distribution is sequentially tuned and is also made conditional to data, thus it rapidly guides the proposed parameters towards high posterior probability regions. Second, we exploit strategies borrowed from the correlated pseudo-marginal MCMC literature, to improve the chains mixing in a SL framework. Our methods enable inference for challenging case studies when the chain is initialised in low posterior probability regions of the parameter space, where standard samplers failed. Our guided sampler can also be potentially used with MCMC samplers for approximate Bayesian computation (ABC). Our goal is to provide ways to make the best out of each expensive MCMC iteration, which will broaden the scope of likelihood-free inference for models with costly simulators. To illustrate the advantages stemming from our framework we consider four benchmark examples, including estimation of parameters for a cosmological model and a stochastic model with highly non-Gaussian summary statistics.

قيم البحث

اقرأ أيضاً

The challenges posed by complex stochastic models used in computational ecology, biology and genetics have stimulated the development of approximate approaches to statistical inference. Here we focus on Synthetic Likelihood (SL), a procedure that red uces the observed and simulated data to a set of summary statistics, and quantifies the discrepancy between them through a synthetic likelihood function. SL requires little tuning, but it relies on the approximate normality of the summary statistics. We relax this assumption by proposing a novel, more flexible, density estimator: the Extended Empirical Saddlepoint approximation. In addition to proving the consistency of SL, under either the new or the Gaussian density estimator, we illustrate the method using two examples. One of these is a complex individual-based forest model for which SL offers one of the few practical possibilities for statistical inference. The examples show that the new density estimator is able to capture large departures from normality, while being scalable to high dimensions, and this in turn leads to more accurate parameter estimates, relative to the Gaussian alternative. The new density estimator is implemented by the esaddle R package, which can be found on the Comprehensive R Archive Network (CRAN).
A large number of statistical models are doubly-intractable: the likelihood normalising term, which is a function of the model parameters, is intractable, as well as the marginal likelihood (model evidence). This means that standard inference techniq ues to sample from the posterior, such as Markov chain Monte Carlo (MCMC), cannot be used. Examples include, but are not confined to, massive Gaussian Markov random fields, autologistic models and Exponential random graph models. A number of approximate schemes based on MCMC techniques, Approximate Bayesian computation (ABC) or analytic approximations to the posterior have been suggested, and these are reviewed here. Exact MCMC schemes, which can be applied to a subset of doubly-intractable distributions, have also been developed and are described in this paper. As yet, no general method exists which can be applied to all classes of models with doubly-intractable posteriors. In addition, taking inspiration from the Physics literature, we study an alternative method based on representing the intractable likelihood as an infinite series. Unbiased estimates of the likelihood can then be obtained by finite time stochastic truncation of the series via Russian Roulette sampling, although the estimates are not necessarily positive. Results from the Quantum Chromodynamics literature are exploited to allow the use of possibly negative estimates in a pseudo-marginal MCMC scheme such that expectations with respect to the posterior distribution are preserved. The methodology is reviewed on well-known examples such as the parameters in Ising models, the posterior for Fisher-Bingham distributions on the $d$-Sphere and a large-scale Gaussian Markov Random Field model describing the Ozone Column data. This leads to a critical assessment of the strengths and weaknesses of the methodology with pointers to ongoing research.
This article surveys computational methods for posterior inference with intractable likelihoods, that is where the likelihood function is unavailable in closed form, or where evaluation of the likelihood is infeasible. We review recent developments i n pseudo-marginal methods, approximate Bayesian computation (ABC), the exchange algorithm, thermodynamic integration, and composite likelihood, paying particular attention to advancements in scalability for large datasets. We also mention R and MATLAB source code for implementations of these algorithms, where they are available.
We investigate the use of data-driven likelihoods to bypass a key assumption made in many scientific analyses, which is that the true likelihood of the data is Gaussian. In particular, we suggest using the optimization targets of flow-based generativ e models, a class of models that can capture complex distributions by transforming a simple base distribution through layers of nonlinearities. We call these flow-based likelihoods (FBL). We analyze the accuracy and precision of the reconstructed likelihoods on mock Gaussian data, and show that simply gauging the quality of samples drawn from the trained model is not a sufficient indicator that the true likelihood has been learned. We nevertheless demonstrate that the likelihood can be reconstructed to a precision equal to that of sampling error due to a finite sample size. We then apply FBLs to mock weak lensing convergence power spectra, a cosmological observable that is significantly non-Gaussian (NG). We find that the FBL captures the NG signatures in the data extremely well, while other commonly used data-driven likelihoods, such as Gaussian mixture models and independent component analysis, fail to do so. This suggests that works that have found small posterior shifts in NG data with data-driven likelihoods such as these could be underestimating the impact of non-Gaussianity in parameter constraints. By introducing a suite of tests that can capture different levels of NG in the data, we show that the success or failure of traditional data-driven likelihoods can be tied back to the structure of the NG in the data. Unlike other methods, the flexibility of the FBL makes it successful at tackling different types of NG simultaneously. Because of this, and consequently their likely applicability across datasets and domains, we encourage their use for inference when sufficient mock data are available for training.
We propose a novel Markov chain Monte-Carlo (MCMC) method for reverse engineering the topological structure of stochastic reaction networks, a notoriously challenging problem that is relevant in many modern areas of research, like discovering gene re gulatory networks or analyzing epidemic spread. The method relies on projecting the original time series trajectories onto information rich summary statistics and constructing the appropriate synthetic likelihood function to estimate reaction rates. The resulting estimates are consistent in the large volume limit and are obtained without employing complicated tuning strategies and expensive resampling as typically used by likelihood-free MCMC and approximate Bayesian methods. To illustrate run time improvements that can be achieved with our approach, we present a simulation study on inferring rates in a stochastic dynamical system arising from a density dependent Markov jump process. We then apply the method to two real data examples: the RNA-seq data from zebrafish experiment and the incidence data from 1665 plague outbreak at Eyam, England.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا