No Arabic abstract
We propose a hierarchical Bayesian model to estimate the proportional contribution of source populations to a newly founded colony. Samples are derived from the first generation offspring in the colony, but mating may occur preferentially among migrants from the same source population. Genotypes of the newly founded colony and source populations are used to estimate the mixture proportions, and the mixture proportions are related to environmental and demographic factors that might affect the colonizing process. We estimate an assortative mating coefficient, mixture proportions, and regression relationships between environmental factors and the mixture proportions in a single hierarchical model. The first-stage likelihood for genotypes in the newly founded colony is a mixture multinomial distribution reflecting the colonizing process. The environmental and demographic data are incorporated into the model through a hierarchical prior structure. A simulation study is conducted to investigate the performance of the model by using different levels of population divergence and number of genetic markers included in the analysis. We use Markov chain Monte Carlo (MCMC) simulation to conduct inference for the posterior distributions of model parameters. We apply the model to a data set derived from grey seals in the Orkney Islands, Scotland. We compare our model with a similar model previously used to analyze these data. The results from both the simulation and application to real data indicate that our model provides better estimates for the covariate effects.
When a latent shoeprint is discovered at a crime scene, forensic analysts inspect it for distinctive patterns of wear such as scratches and holes (known as accidentals) on the source shoes sole. If its accidentals correspond to those of a suspects shoe, the print can be used as forensic evidence to place the suspect at the crime scene. The strength of this evidence depends on the random match probability---the chance that a shoe chosen at random would match the crime scene prints accidentals. Evaluating random match probabilities requires an accurate model for the spatial distribution of accidentals on shoe soles. A recent report by the Presidents Council of Advisors in Science and Technology criticized existing models in the literature, calling for new empirically validated techniques. We respond to this request with a new spatial point process model for accidental locations, developed within a hierarchical Bayesian framework. We treat the tread pattern of each shoe as a covariate, allowing us to pool information across large heterogeneous databases of shoes. Existing models ignore this information; our results show that including it leads to significantly better model fit. We demonstrate this by fitting our model to one such database.
Studying the determinants of adverse pregnancy outcomes like stillbirth and preterm birth is of considerable interest in epidemiology. Understanding the role of both individual and community risk factors for these outcomes is crucial for planning appropriate clinical and public health interventions. With this goal, we develop geospatial mixed effects logistic regression models for adverse pregnancy outcomes. Our models account for both spatial autocorrelation and heterogeneity between neighborhoods. To mitigate the low incidence of stillbirth and preterm births in our data, we explore using class rebalancing techniques to improve predictive power. To assess the informative value of the covariates in our models, we use posterior distributions of their coefficients to gauge how well they can be distinguished from zero. As a case study, we model stillbirth and preterm birth in the city of Philadelphia, incorporating both patient-level data from electronic health records (EHR) data and publicly available neighborhood data at the census tract level. We find that patient-level features like self-identified race and ethnicity were highly informative for both outcomes. Neighborhood-level factors were also informative, with poverty important for stillbirth and crime important for preterm birth. Finally, we identify the neighborhoods in Philadelphia at highest risk of stillbirth and preterm birth.
Studying the neurological, genetic and evolutionary basis of human vocal communication mechanisms is an important field of neuroscience. In the absence of high quality data on humans, mouse vocalization experiments in laboratory settings have been proven to be useful in providing valuable insights into mammalian vocal development and evolution, including especially the impact of certain genetic mutations. Data sets from mouse vocalization experiments usually consist of categorical syllable sequences along with continuous inter-syllable interval times for mice of different genotypes vocalizing under various contexts. Few statistical models have considered the inference for both transition probabilities and inter-state intervals. The latter is of particular importance as increased inter-state intervals can be an indication of possible vocal impairment. In this paper, we propose a class of novel Markov renewal mixed models that capture the stochastic dynamics of both state transitions and inter-state interval times. Specifically, we model the transition dynamics and the inter-state intervals using Dirichlet and gamma mixtures, respectively, allowing the mixture probabilities in both cases to vary flexibly with fixed covariate effects as well as random individual-specific effects. We apply our model to analyze the impact of a mutation in the Foxp2 gene on mouse vocal behavior. We find that genotypes and social contexts significantly affect the inter-state interval times but, compared to previous analyses, the influences of genotype and social context on the syllable transition dynamics are weaker.
We propose a Bayesian nonparametric model to infer population admixture, extending the Hierarchical Dirichlet Process to allow for correlation between loci due to Linkage Disequilibrium. Given multilocus genotype data from a sample of individuals, the model allows inferring classifying individuals as unadmixed or admixed, inferring the number of subpopulations ancestral to an admixed population and the population of origin of chromosomal regions. Our model does not assume any specific mutation process and can be applied to most of the commonly used genetic markers. We present a MCMC algorithm to perform posterior inference from the model and discuss methods to summarise the MCMC output for the analysis of population admixture. We demonstrate the performance of the proposed model in simulations and in a real application, using genetic data from the EDAR gene, which is considered to be ancestry-informative due to well-known variations in allele frequency as well as phenotypic effects across ancestry. The structure analysis of this dataset leads to the identification of a rare haplotype in Europeans.
Reliable mortality estimates at the subnational level are essential in the study of health inequalities within a country. One of the difficulties in producing such estimates is the presence of small populations, where the stochastic variation in death counts is relatively high, and so the underlying mortality levels are unclear. We present a Bayesian hierarchical model to estimate mortality at the subnational level. The model builds on characteristic age patterns in mortality curves, which are constructed using principal components from a set of reference mortality curves. Information on mortality rates are pooled across geographic space and smoothed over time. Testing of the model shows reasonable estimates and uncertainty levels when the model is applied to both simulated data which mimic US counties, and real data for French departments. The estimates produced by the model have direct applications to the study of subregional health patterns and disparities.