
Model selection and parameter inference in phylogenetics using Nested Sampling

Published by Patricio Maturana
Publication date: 2017
Research language: English





Bayesian inference methods rely on numerical algorithms for both model selection and parameter inference. In general, these algorithms require a high computational effort to yield reliable estimates. One of the major challenges in phylogenetics is the estimation of the marginal likelihood. This quantity is commonly used for comparing different evolutionary models, but its calculation, even for simple models, incurs a high computational cost. Another challenge is the estimation of the posterior distribution. Long Markov chains are often required to obtain enough samples for parameter inference, especially for tree distributions. In general, these two problems are addressed separately, using different procedures. Nested sampling (NS) is a Bayesian computation algorithm that estimates marginal likelihoods together with their uncertainties, and yields samples from the posterior distribution at no extra cost. The methods currently used in phylogenetics for marginal likelihood estimation lack practicality because they depend on many tuning parameters and because most implementations provide no direct way to calculate the uncertainties associated with the estimates. To address these issues, we introduce NS to phylogenetics. Its performance is assessed under different scenarios and compared to established methods. We conclude that NS is a competitive and attractive algorithm for phylogenetic inference. An implementation is available as a package for BEAST 2 under the LGPL licence, accessible at https://github.com/BEAST2-Dev/nested-sampling.
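
To make the idea in the abstract concrete, the following is a minimal sketch of nested sampling on a toy one-dimensional problem. It is not the BEAST 2 package's implementation: the Gaussian likelihood, the Uniform(0, 1) prior, the function names, and all default settings are assumptions chosen for illustration, and the constrained replacement step is a simple random-walk MCMC. The sketch returns an estimate of the log marginal likelihood, an approximate uncertainty of sqrt(H / N) computed from the estimated information H and the number of live points N, and a posterior mean obtained by reweighting the same points.

```python
import math
import random


def log_likelihood(theta, mu=0.5, sigma=0.1):
    """Toy Gaussian log-likelihood (hypothetical stand-in for a tree likelihood)."""
    return -0.5 * ((theta - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_add(a, b):
    """Return log(exp(a) + exp(b)) computed stably in log space."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def nested_sampling(n_live=100, n_iter=1000, mcmc_steps=20, seed=1):
    rng = random.Random(seed)
    # Live points drawn from the assumed Uniform(0, 1) prior.
    live = [rng.random() for _ in range(n_live)]
    live_logl = [log_likelihood(t) for t in live]

    log_z = -math.inf   # running log marginal likelihood
    points = []         # (theta, log L, log weight) for every discarded point
    log_x_prev = 0.0    # log of the enclosed prior mass, starting at log(1)
    # Each step sheds a fraction 1 - exp(-1/N) of the remaining prior mass.
    log_shrink = math.log1p(-math.exp(-1.0 / n_live))

    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_min = live_logl[worst]
        log_w = logl_min + log_x_prev + log_shrink
        log_z = log_add(log_z, log_w)
        points.append((live[worst], logl_min, log_w))
        log_x_prev = -i / n_live  # expected log prior mass after i steps

        # Replace the worst point: start from another live point and take a
        # short random walk that only accepts moves with L > L_min.
        scale = max(live) - min(live)
        start = rng.choice([j for j in range(n_live) if j != worst])
        theta, logl = live[start], live_logl[start]
        for _ in range(mcmc_steps):
            prop = theta + rng.uniform(-scale, scale)
            if 0.0 <= prop <= 1.0:
                prop_logl = log_likelihood(prop)
                if prop_logl > logl_min:
                    theta, logl = prop, prop_logl
        live[worst], live_logl[worst] = theta, logl

    # Spread the remaining prior mass evenly over the final live points.
    for theta, logl in zip(live, live_logl):
        log_w = logl + log_x_prev - math.log(n_live)
        log_z = log_add(log_z, log_w)
        points.append((theta, logl, log_w))

    # Posterior weights come from the same run, so no extra sampling is needed.
    post = [(theta, logl, math.exp(log_w - log_z)) for theta, logl, log_w in points]
    post_mean = sum(p * theta for theta, _, p in post)
    info = sum(p * logl for _, logl, p in post) - log_z  # information H
    log_z_err = math.sqrt(max(info, 0.0) / n_live)
    return log_z, log_z_err, post_mean


if __name__ == "__main__":
    log_z, log_z_err, post_mean = nested_sampling()
    print(f"log Z = {log_z:.3f} +/- {log_z_err:.3f}, posterior mean = {post_mean:.3f}")
```

For this toy problem the true marginal likelihood is very close to 1 (log Z close to 0), which gives a simple sanity check. In a phylogenetic setting the parameter would be replaced by a tree and its substitution model parameters, and the random walk by suitable tree proposals; those details are outside the scope of this sketch.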




Read also

To support and guide extensive experimental research into the systems biology of signaling pathways, increasingly more mechanistic models are being developed in the hope of gaining further insight into biological processes. In order to analyse these models, computational and statistical techniques are needed to estimate the unknown kinetic parameters. This chapter reviews methods from frequentist and Bayesian statistics for estimation of parameters and for choosing which model is best for modeling the underlying system. Approximate Bayesian Computation (ABC) techniques are introduced and employed to explore different hypotheses about the JAK-STAT signaling pathway.
Computer simulations have become an important tool across the biomedical sciences and beyond. For many important problems several different models or hypotheses exist and choosing which one best describes reality or observed data is not straightforward. We therefore require suitable statistical tools that allow us to choose rationally between different mechanistic models of e.g. signal transduction or gene regulation networks. This is particularly challenging in systems biology where only a small number of molecular species can be assayed at any given time and all measurements are subject to measurement uncertainty. Here we develop such a model selection framework based on approximate Bayesian computation and employing sequential Monte Carlo sampling. We show that our approach can be applied across a wide range of biological scenarios, and we illustrate its use on real data describing influenza dynamics and the JAK-STAT signalling pathway. Bayesian model selection strikes a balance between the complexity of the simulation models and their ability to describe observed data. The present approach enables us to employ the whole formal apparatus to any system that can be (efficiently) simulated, even when exact likelihoods are computationally intractable.
We consider the problem of selecting deterministic or stochastic models for a biological, ecological, or environmental dynamical process. In most cases, one prefers either deterministic or stochastic models as candidate models based on experience or subjective judgment. Due to the complex or intractable likelihood in most dynamical models, likelihood-based approaches for model selection are not suitable. We use approximate Bayesian computation for parameter estimation and model selection to gain further understanding of the dynamics of two epidemics of chronic wasting disease in mule deer. The main novel contribution of this work is that under a hierarchical model framework we compare three types of dynamical models: ordinary differential equation, continuous time Markov chain, and stochastic differential equation models. To our knowledge model selection between these types of models has not appeared previously. Since the practice of incorporating dynamical models into data models is becoming more common, the proposed approach may be very useful in a variety of applications.
Sampling errors in nested sampling parameter estimation differ from those in Bayesian evidence calculation, but have been little studied in the literature. This paper provides the first explanation of the two main sources of sampling errors in nested sampling parameter estimation, and presents a new diagrammatic representation for the process. We find no current method can accurately measure the parameter estimation errors of a single nested sampling run, and propose a method for doing so using a new algorithm for dividing nested sampling runs. We empirically verify our conclusions and the accuracy of our new method.
Assessing the quality of parameter estimates for models describing the motion of single molecules in cellular environments is an important problem in fluorescence microscopy. We consider the fundamental data model, where molecules emit photons at random times and the photons arrive at random locations on the detector according to complex point spread functions (PSFs). The random, non-Gaussian PSF of the detection process and random trajectory of the molecule make inference challenging. Moreover, the presence of other nearby molecules causes further uncertainty in the origin of the measurements, which impacts the statistical precision of estimates. We quantify the limits of accuracy of model parameter estimates and separation distance between closely spaced molecules (known as the resolution problem) by computing the Cramer-Rao lower bound (CRLB), or equivalently the inverse of the Fisher information matrix (FIM), for the variance of estimates. This fundamental CRLB is crucial, as it provides a lower bound for more practical scenarios. While analytic expressions for the FIM can be derived for static molecules, the analytical tools to evaluate it for molecules whose trajectories follow SDEs are still mostly missing. We address this by presenting a general SMC based methodology for both parameter inference and computing the desired accuracy limits for non-static molecules and a non-Gaussian fundamental detection model. For the first time, we are able to estimate the FIM for stochastically moving molecules observed through the Airy and Born & Wolf PSF. This is achieved by estimating the score and observed information matrix via SMC. We sum up the outcome of our numerical work by summarising the qualitative behaviours for the accuracy limits as functions of e.g. collected photon count, molecule diffusion, etc. We also verify that we can recover known results from the static molecule case.