
Reconstructing transmission trees for communicable diseases using densely sampled genetic data

Published by Colin Worby
Publication date: 2014
Language: English





Whole genome sequencing of pathogens from multiple hosts in an epidemic offers the potential to investigate who infected whom with unparalleled resolution, potentially yielding important insights into disease dynamics and the impact of control measures. We considered disease outbreaks in a setting with dense genomic sampling, and formulated stochastic epidemic models to investigate person-to-person transmission, based on observed genomic and epidemiological data. We constructed models in which the genetic distance between sampled genotypes depends on the epidemiological relationship between the hosts. A data augmented Markov chain Monte Carlo algorithm was used to sample over the transmission trees, providing a posterior probability for any given transmission route. We investigated the predictive performance of our methodology using simulated data, demonstrating high sensitivity and specificity, particularly for rapidly mutating pathogens with low transmissibility. We then analyzed data collected during an outbreak of methicillin-resistant Staphylococcus aureus in a hospital, identifying probable transmission routes and estimating epidemiological parameters. Our approach overcomes limitations of previous methods, providing a framework with the flexibility to allow for unobserved infection times, multiple independent introductions of the pathogen, and within-host genetic diversity, as well as allowing forward simulation.
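The idea of sampling over transmission trees and reading off a posterior probability for each transmission route can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes infection times are known and distinct (so no data augmentation is needed), a single introduction, a uniform prior over trees, and a Poisson distribution with a hypothetical rate `mu` for the genetic distance between an infector and the host it infected. The function and parameter names are illustrative only.

```python
import math
import random


def log_poisson(k, rate):
    """Log of the Poisson pmf; a toy model for the genetic distance on one edge."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def log_likelihood(tree, dist, mu):
    """Sum the log-probability of the observed distance along every edge.

    `tree` maps each infected host to its infector.
    """
    return sum(log_poisson(dist[src][dst], mu) for dst, src in tree.items())


def sample_trees(times, dist, mu=1.0, n_iter=20000, seed=0):
    """Metropolis sampler over transmission trees (toy assumptions, see lead-in).

    Returns a dict mapping (infector, infectee) to the fraction of sampled
    trees that contain that edge.
    """
    rng = random.Random(seed)
    hosts = sorted(range(len(times)), key=lambda h: times[h])
    index_case = hosts[0]
    # A host can only be infected by someone infected strictly earlier.
    candidates = {
        h: [g for g in hosts if times[g] < times[h]]
        for h in hosts
        if h != index_case
    }
    movable = list(candidates)
    # Start from the tree in which the index case infected everyone.
    tree = {h: index_case for h in movable}
    current = log_likelihood(tree, dist, mu)
    counts = {}
    for _ in range(n_iter):
        # Symmetric proposal: pick a host, then a new infector uniformly.
        h = rng.choice(movable)
        proposal = dict(tree)
        proposal[h] = rng.choice(candidates[h])
        proposed = log_likelihood(proposal, dist, mu)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            tree, current = proposal, proposed
        for dst, src in tree.items():
            counts[(src, dst)] = counts.get((src, dst), 0) + 1
    return {edge: c / n_iter for edge, c in counts.items()}


if __name__ == "__main__":
    # Three hosts; host 2 is genetically close to host 1 and far from host 0.
    times = [0.0, 1.0, 2.0]
    dist = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
    for edge, prob in sorted(sample_trees(times, dist).items()):
        print(edge, round(prob, 3))
```

In this toy setting the posterior factorizes by host and could be computed exactly; the sampler is shown only because the same accept/reject structure carries over once infection times are unobserved and must be proposed alongside the tree.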




Read also

Near real-time monitoring of outbreak transmission dynamics and evaluation of public health interventions are critical for interrupting the spread of the novel coronavirus (SARS-CoV-2) and mitigating morbidity and mortality caused by coronavirus disease (COVID-19). Formulating a regional mechanistic model of SARS-CoV-2 transmission dynamics and frequently estimating parameters of this model using streaming surveillance data offers one way to accomplish data-driven decision making. For example, to detect an increase in new SARS-CoV-2 infections due to relaxation of previously implemented mitigation measures, one can monitor estimates of the basic and effective reproductive numbers. However, parameter estimation can be imprecise, and sometimes even impossible, because surveillance data are noisy and not informative about all aspects of the mechanistic model, even for reasonably parsimonious epidemic models. To overcome this obstacle, at least partially, we propose a Bayesian modeling framework that integrates multiple surveillance data streams. Our model uses both COVID-19 incidence and mortality time series to estimate our model parameters. Importantly, our data generating model for incidence data takes into account changes in the total number of tests performed. We apply our Bayesian data integration method to COVID-19 surveillance data collected in Orange County, California. Our results suggest that the California Department of Public Health stay-at-home order, issued on March 19, 2020, lowered the SARS-CoV-2 effective reproductive number $R_{e}$ in Orange County below 1.0, which means that the order was successful in suppressing SARS-CoV-2 infections. However, subsequent re-opening steps took place when thousands of infectious individuals remained in Orange County, so $R_{e}$ increased to approximately 1.0 by mid-June and above 1.0 by mid-July.
RNA-Seq technology allows for studying the transcriptional state of the cell at an unprecedented level of detail. Beyond quantification of whole-gene expression, it is now possible to disentangle the abundance of individual alternatively spliced transcript isoforms of a gene. A central question is to understand the regulatory processes that lead to differences in relative abundance variation due to external and genetic factors. Here, we present a mixed model approach that allows for (i) joint analysis and genetic mapping of multiple transcript isoforms and (ii) mapping of isoform-specific effects. Central to our approach is to comprehensively model the causes of variation and correlation between transcript isoforms, including the genomic background and technical quantification uncertainty. As a result, our method allows accurate testing for shared as well as transcript-specific genetic regulation of transcript isoforms and achieves substantially improved calibration of these statistical tests. Experiments on genotype and RNA-Seq data from 126 human HapMap individuals demonstrate that our model can help to obtain a more fine-grained picture of the genetic basis of gene expression variation.
Influenza and respiratory syncytial virus (RSV) are the leading etiological agents of seasonal acute respiratory infections (ARI) around the world. Medical doctors typically base the diagnosis of ARI on patients' symptoms alone and do not always conduct the virological tests necessary to identify individual viruses, which limits the ability to study the interaction between multiple pathogens and make public health recommendations. We consider a stochastic kinetic model (SKM) for two interacting ARI pathogens circulating in a large population and an empirically motivated background process for infections with other pathogens causing similar symptoms. An extended marginal sampling approach based on the Linear Noise Approximation to the SKM integrates multiple data sources and additional model components. We infer the parameters defining the pathogens' dynamics and interaction within a Bayesian hierarchical model and explore the posterior trajectories of infections for each illness based on aggregate infection reports from six epidemic seasons collected by the state health department, and a subset of virological tests from a sentinel program at a general hospital in San Luis Potosi, Mexico. We interpret the results based on real and simulated data and make recommendations for future data collection strategies. Supplementary materials and software are provided online.
In the case of SARS-CoV-2 pandemic management, wastewater-based epidemiology aims to derive information on the infection dynamics by monitoring virus concentrations in the wastewater. However, due to the intrinsic random fluctuations of the viral signal in the wastewater (due to, e.g., dilution; transport and fate processes in the sewer system; variation in the number of persons discharging; variations in virus excretion and water consumption per day), the subsequent prevalence analysis may result in misleading conclusions. It is thus helpful to apply data filtering techniques to reduce the noise in the signal. In this paper we investigate 13 smoothing algorithms applied to the virus signals monitored in four wastewater treatment plants in Austria. The parameters of the algorithms have been defined by an optimization procedure aiming for performance metrics. The results are further investigated by means of a cluster analysis. While all algorithms are in principle applicable, SPLINE, Generalized Additive Model and Friedman Super Smoother are recognized as superior methods in this context (with the latter two having a tendency to over-smooth). A first analysis of the resulting datasets indicates the influence of catchment size for wastewater-based epidemiology, as smaller communities both reveal a signal threshold before any relation with infection dynamics is visible and also a higher sensitivity towards infection clusters.
Recent technological advances in Next Generation Sequencing tools have led to increasing speeds of DNA sample collection, preparation, and sequencing. One instrument can produce over 600 Gb of genetic sequence data in a single run. This creates new opportunities to efficiently handle the increasing workload. We propose a new method of fast genetic sequence analysis using the Dynamic Distributed Dimensional Data Model (D4M), an associative array environment for MATLAB developed at MIT Lincoln Laboratory. Based on mathematical and statistical properties, the method leverages big data techniques and the implementation of an Apache Accumulo database to accelerate computations one-hundredfold over other methods. Comparisons of the D4M method with the current gold standard for sequence analysis, BLAST, show the two are comparable in the alignments they find. This paper will present an overview of the D4M genetic sequence algorithm and statistical comparisons with BLAST.