
Quantifying and mitigating the effect of preferential sampling on phylodynamic inference

Published by Vladimir Minin
Publication date: 2015
Language of the paper: English





Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task formulates an observed sequence data likelihood by exploiting a coalescent model for the sampled individuals' genealogy and then integrating over all possible genealogies via Monte Carlo or, less efficiently, by conditioning on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that in the presence of preferential sampling our new model not only reduces bias, but also improves estimation precision. Finally, we compare the performance of the currently used phylodynamic methods with our proposed model through clinically relevant, seasonal human influenza examples.
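To make the preferential sampling idea concrete, the following is a minimal sketch of drawing sampling times from an inhomogeneous Poisson process whose intensity depends on effective population size. The abstract only states that such a dependence exists; the specific link `lambda(t) = scale * N_e(t) ** beta`, the seasonal trajectory `effective_pop_size`, and all parameter names are assumptions made here for illustration, not the paper's implementation. The simulation uses the standard thinning (rejection) method.

```python
import math
import random


def effective_pop_size(t):
    """Hypothetical seasonal trajectory N_e(t); chosen for illustration only."""
    return 10.0 + 9.0 * math.sin(2.0 * math.pi * t)


def sampling_intensity(t, scale=1.0, beta=1.0):
    """Assumed link between sampling intensity and N_e(t).

    beta = 0 gives sampling that ignores population size;
    beta > 0 gives preferential sampling (more samples when N_e is large).
    """
    return scale * effective_pop_size(t) ** beta


def simulate_sampling_times(t_end, scale=1.0, beta=1.0, grid=1000, rng=None):
    """Draw sampling times on [0, t_end] by thinning a homogeneous process."""
    rng = rng or random.Random()
    # Approximate upper bound on the intensity, found on a grid and padded
    # by 5% so that it dominates the intensity between grid points.
    lam_max = 1.05 * max(
        sampling_intensity(i * t_end / grid, scale, beta)
        for i in range(grid + 1)
    )
    times = []
    t = 0.0
    while True:
        # Candidate event from a homogeneous process with rate lam_max.
        t += rng.expovariate(lam_max)
        if t > t_end:
            break
        # Keep the candidate with probability intensity(t) / lam_max.
        if rng.random() * lam_max <= sampling_intensity(t, scale, beta):
            times.append(t)
    return times


if __name__ == "__main__":
    times = simulate_sampling_times(2.0, scale=5.0, beta=1.0,
                                    rng=random.Random(1))
    in_peak = sum(1 for s in times if math.sin(2.0 * math.pi * s) > 0)
    print(len(times), "sampling times;", in_peak, "fall in high-N_e periods")
```

Under this assumed link, sampling times cluster where the population is large, which is the situation in which methods that treat sampling times as fixed or independent of population size may be biased. Setting `beta=0` recovers sampling that carries no information about population size.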




Read also

Epidemiological forecasts are beset by uncertainties about the underlying epidemiological processes, and the surveillance process through which data are acquired. We present a Bayesian inference methodology that quantifies these uncertainties, for epidemics that are modelled by (possibly) non-stationary, continuous-time, Markov population processes. The efficiency of the method derives from a functional central limit theorem approximation of the likelihood, valid for large populations. We demonstrate the methodology by analysing the early stages of the COVID-19 pandemic in the UK, based on age-structured data for the number of deaths. This includes maximum a posteriori estimates, MCMC sampling of the posterior, computation of the model evidence, and the determination of parameter sensitivities via the Fisher information matrix. Our methodology is implemented in PyRoss, an open-source platform for analysis of epidemiological compartment models.
The problem of preferential sampling in geostatistics arises when the choice of locations to be sampled is made with information about the phenomenon under study. The geostatistical model under preferential sampling deals with this problem, but parameter estimation is challenging because the likelihood function has no closed form. We developed an MCEM and an SAEM algorithm for finding the maximum likelihood estimators of the model parameters and compared our methodology with the existing ones: Monte Carlo likelihood approximation and Laplace approximation. Simulation studies were conducted to assess the quality of the proposed methods and showed good parameter estimation and prediction under preferential sampling. Finally, we illustrate our findings on the well-known moss data from Galicia.
This paper presents a general model framework for detecting the preferential sampling of environmental monitors recording an environmental process across space and/or time. This is achieved by considering the joint distribution of an environmental process with a site-selection process that considers where and when sites are placed to measure the process. The environmental process may be spatial, temporal or spatio-temporal in nature. By sharing random effects between the two processes, the joint model is able to establish whether site placement was stochastically dependent on the environmental process under study. The embedding into a spatio-temporal framework also allows for the modelling of the dynamic site-selection process itself. Real-world factors affecting both the size and location of the network can be easily modelled and quantified. Depending upon the choice of population of locations to consider for selection across space and time under the site-selection process, different insights about the precise nature of preferential sampling can be obtained. The general framework developed in the paper is designed to be easily and quickly fit using the R-INLA package. We apply this framework to a case study involving particulate air pollution over the UK where a major reduction in the size of a monitoring network through time occurred. It is demonstrated that a significant response-biased reduction in the air quality monitoring network occurred. We also show that the network was consistently unrepresentative of the levels of particulate matter seen across much of GB throughout the operating life of the network. Finally, we show that this may have led to a severe over-reporting of the population-average exposure levels experienced across GB. This could have great impacts on estimates of the health effects of black smoke levels.
We provide an overview of the methods that can be used for prediction under uncertainty and data fitting of dynamical systems, and of the fundamental challenges that arise in this context. The focus is on SIR-like models, which are commonly used when attempting to predict the trend of the COVID-19 pandemic. In particular, we raise a warning flag about identifiability of the parameters of SIR-like models; often, it might be hard to infer the correct values of the parameters from data, even for very simple models, making it non-trivial to use these models for meaningful predictions. Most of the points that we touch upon are valid for inverse problems in more general setups.
We introduce phylodyn, an R package for phylodynamic analysis based on gene genealogies. The package's main functionality is Bayesian nonparametric estimation of effective population size fluctuations over time. Our implementation includes several Markov chain Monte Carlo-based methods and an integrated nested Laplace approximation-based approach for phylodynamic inference that have been developed in recent years. Genealogical data describe the timed ancestral relationships of individuals sampled from a population of interest. Here, individuals are assumed to be sampled at the same point in time (isochronous sampling) or at different points in time (heterochronous sampling); in addition, sampling events can be modeled with preferential sampling, which means that the intensity of sampling events is allowed to depend on the effective population size trajectory. We assume the coalescent and the sequentially Markov coalescent processes as generative models of genealogies. We include several coalescent simulation functions that are useful for testing our phylodynamics methods via simulation studies. We compare the performance and outputs of various methods implemented in phylodyn and outline their strengths and weaknesses. R package phylodyn is available at https://github.com/mdkarcher/phylodyn.