Modeling population structure under hierarchical Dirichlet processes

Published by Stefano Favaro
Publication date: 2015
Research field: Mathematical statistics
Language: English

We propose a Bayesian nonparametric model to infer population admixture, extending the Hierarchical Dirichlet Process to allow for correlation between loci due to Linkage Disequilibrium. Given multilocus genotype data from a sample of individuals, the model allows classifying individuals as unadmixed or admixed, and inferring both the number of subpopulations ancestral to an admixed population and the population of origin of chromosomal regions. Our model does not assume any specific mutation process and can be applied to most of the commonly used genetic markers. We present an MCMC algorithm to perform posterior inference from the model and discuss methods to summarise the MCMC output for the analysis of population admixture. We demonstrate the performance of the proposed model in simulations and in a real application, using genetic data from the EDAR gene, which is considered to be ancestry-informative due to well-known variations in allele frequency as well as phenotypic effects across ancestry. The structure analysis of this dataset leads to the identification of a rare haplotype in Europeans.
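
To make the hierarchical structure concrete, the following is a minimal sketch of a truncated stick-breaking approximation to a plain Hierarchical Dirichlet Process prior over ancestry. It is not the authors' model: it treats loci as independent (so it omits the Linkage Disequilibrium extension described above), it generates no alleles, and the truncation level `K`, the concentration parameters `gamma` and `alpha`, and all function names are assumptions chosen for illustration.

```python
import random


def stick_breaking(gamma, K, rng):
    """Truncated stick-breaking weights for K global subpopulations (sums to 1)."""
    weights = []
    remaining = 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, gamma)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last component takes the leftover mass
    return weights


def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_admixture(n_individuals, n_loci, gamma=1.0, alpha=1.0, K=10, seed=0):
    """Sample shared subpopulation weights, per-individual admixture
    proportions, and per-locus ancestry labels (loci independent here)."""
    rng = random.Random(seed)
    beta = stick_breaking(gamma, K, rng)  # weights shared by all individuals
    proportions, assignments = [], []
    for _ in range(n_individuals):
        # Finite approximation: pi_j ~ Dirichlet(alpha * beta).
        # The small floor keeps every Gamma shape parameter strictly positive.
        pi = dirichlet([max(alpha * b, 1e-12) for b in beta], rng)
        # Ancestry label of each locus, drawn independently given pi_j.
        z = rng.choices(range(K), weights=pi, k=n_loci)
        proportions.append(pi)
        assignments.append(z)
    return beta, proportions, assignments


if __name__ == "__main__":
    beta, proportions, assignments = sample_admixture(n_individuals=3, n_loci=8)
    for pi, z in zip(proportions, assignments):
        print("ancestry labels:", z, "largest proportion:", round(max(pi), 3))
```

In this sketch an individual whose proportions concentrate on a single component corresponds to an unadmixed individual, while mass spread over several components corresponds to admixture. Because the global weights are shared, the number of subpopulations actually used is not fixed in advance, up to the truncation level.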


Read also

We propose a hierarchical Bayesian model to estimate the proportional contribution of source populations to a newly founded colony. Samples are derived from the first generation offspring in the colony, but mating may occur preferentially among migrants from the same source population. Genotypes of the newly founded colony and source populations are used to estimate the mixture proportions, and the mixture proportions are related to environmental and demographic factors that might affect the colonizing process. We estimate an assortative mating coefficient, mixture proportions, and regression relationships between environmental factors and the mixture proportions in a single hierarchical model. The first-stage likelihood for genotypes in the newly founded colony is a mixture multinomial distribution reflecting the colonizing process. The environmental and demographic data are incorporated into the model through a hierarchical prior structure. A simulation study is conducted to investigate the performance of the model by using different levels of population divergence and number of genetic markers included in the analysis. We use Markov chain Monte Carlo (MCMC) simulation to conduct inference for the posterior distributions of model parameters. We apply the model to a data set derived from grey seals in the Orkney Islands, Scotland. We compare our model with a similar model previously used to analyze these data. The results from both the simulation and application to real data indicate that our model provides better estimates for the covariate effects.
We propose a general Bayesian approach to modeling epidemics such as COVID-19. The approach grew out of specific analyses conducted during the pandemic, in particular an analysis concerning the effects of non-pharmaceutical interventions (NPIs) in reducing COVID-19 transmission in 11 European countries. The model parameterizes the time-varying reproduction number $R_t$ through a regression framework in which covariates can, e.g., be governmental interventions or changes in mobility patterns. This allows a joint fit across regions and partial pooling to share strength. This innovation was critical to our timely estimates of the impact of lockdown and other NPIs in the European epidemics, whose validity was borne out by the subsequent course of the epidemic. Our framework provides a fully generative model for latent infections and observations deriving from them, including deaths, cases, hospitalizations, ICU admissions and seroprevalence surveys. One issue surrounding our model's use during the COVID-19 pandemic is the confounded nature of NPIs and mobility. We use our framework to explore this issue. We have open-sourced an R package, epidemia, implementing our approach in Stan. Versions of the model are used by New York State, Tennessee and Scotland to estimate the current situation and make policy decisions.
Studying the determinants of adverse pregnancy outcomes like stillbirth and preterm birth is of considerable interest in epidemiology. Understanding the role of both individual and community risk factors for these outcomes is crucial for planning appropriate clinical and public health interventions. With this goal, we develop geospatial mixed effects logistic regression models for adverse pregnancy outcomes. Our models account for both spatial autocorrelation and heterogeneity between neighborhoods. To mitigate the low incidence of stillbirth and preterm births in our data, we explore using class rebalancing techniques to improve predictive power. To assess the informative value of the covariates in our models, we use posterior distributions of their coefficients to gauge how well they can be distinguished from zero. As a case study, we model stillbirth and preterm birth in the city of Philadelphia, incorporating both patient-level data from electronic health records (EHR) and publicly available neighborhood data at the census tract level. We find that patient-level features like self-identified race and ethnicity were highly informative for both outcomes. Neighborhood-level factors were also informative, with poverty important for stillbirth and crime important for preterm birth. Finally, we identify the neighborhoods in Philadelphia at highest risk of stillbirth and preterm birth.
Akisato Suzuki, 2020
How should social scientists understand and communicate the uncertainty of statistically estimated causal effects? It is well-known that the conventional significance-vs.-insignificance approach is associated with misunderstandings and misuses. Behavioral research suggests people understand uncertainty more appropriately in a numerical, continuous scale than in a verbal, discrete scale. Motivated by this background, I propose presenting the probabilities of different effect sizes. Probability is an intuitive continuous measure of uncertainty. It allows researchers to better understand and communicate the uncertainty of statistically estimated effects. In addition, my approach needs no decision threshold for an uncertainty measure or an effect size, unlike the conventional approaches, allowing researchers to be agnostic about a decision threshold such as p<5% and a justification for that. I apply my approach to a previous social scientific study, showing it enables richer inference than the significance-vs.-insignificance approach taken by the original study. The accompanying R package makes my approach easy to implement.
When a latent shoeprint is discovered at a crime scene, forensic analysts inspect it for distinctive patterns of wear such as scratches and holes (known as accidentals) on the source shoe's sole. If its accidentals correspond to those of a suspect's shoe, the print can be used as forensic evidence to place the suspect at the crime scene. The strength of this evidence depends on the random match probability: the chance that a shoe chosen at random would match the crime scene print's accidentals. Evaluating random match probabilities requires an accurate model for the spatial distribution of accidentals on shoe soles. A recent report by the President's Council of Advisors on Science and Technology criticized existing models in the literature, calling for new empirically validated techniques. We respond to this request with a new spatial point process model for accidental locations, developed within a hierarchical Bayesian framework. We treat the tread pattern of each shoe as a covariate, allowing us to pool information across large heterogeneous databases of shoes. Existing models ignore this information; our results show that including it leads to significantly better model fit. We demonstrate this by fitting our model to one such database.