While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. Here we develop two composite likelihood ratio tests for detecting balancing selection. Using simulations, we show that these methods outperform competing methods under a variety of assumptions and demographic models. We apply the new methods to whole-genome human data and find a number of previously identified loci with strong evidence of balancing selection, including several HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Thus, our methods reproduce many previously hypothesized signals of balancing selection and also discover interesting novel candidates.
We consider a population constituted by two types of individuals; each of them can produce offspring in two different islands (as a particular case the islands can be interpreted as active or dormant individuals). We model the evolution of the popula
We propose a method that uses genetic data to test for the occurrence of a recent range expansion and to infer the location of the origin of the expansion. We introduce a statistic for pairs of populations $\psi$ (the directionality index) that detects asymmetries in the two-dimensional allele frequency spectrum caused by the series of founder events that happen during an expansion. Such asymmetry arises because low-frequency alleles tend to be lost during founder events, thus creating clines in the frequencies of surviving low-frequency alleles. Using simulations, we further show that $\psi$ is more powerful for detecting range expansions than both $F_{ST}$ and clines in heterozygosity. We illustrate the utility of $\psi$ by applying it to a data set from modern humans and show how we can include more complicated scenarios such as multiple expansion origins or barriers to migration in the model.
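The abstract does not give the formula for $\psi$, so the following is only an illustrative sketch of how an asymmetry in a two-dimensional allele frequency spectrum could be summarized. It assumes (this is an assumption, not a definition taken from the text) that the index is the mean difference in derived-allele frequency between the two populations, averaged over SNPs whose derived allele is present in both samples. The function name `directionality_index` and the input layout `sfs[i][j]` are hypothetical.

```python
def directionality_index(sfs):
    """Asymmetry of a 2D allele frequency spectrum (illustrative sketch).

    sfs[i][j] is the number of SNPs whose derived allele appears i times
    in the sample from population 1 and j times in the sample from
    population 2. Assumption: only SNPs with the derived allele present
    in both samples (i >= 1 and j >= 1) contribute.
    """
    n1 = len(sfs) - 1      # sample size (allele copies) in population 1
    n2 = len(sfs[0]) - 1   # sample size (allele copies) in population 2
    weighted_diff = 0.0
    n_snps = 0
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            count = sfs[i][j]
            # Difference in derived-allele frequency for this SFS cell.
            weighted_diff += count * (i / n1 - j / n2)
            n_snps += count
    if n_snps == 0:
        raise ValueError("no SNPs with the derived allele in both samples")
    return weighted_diff / n_snps


# A spectrum that is symmetric about its diagonal has no asymmetry.
symmetric = [[0, 0, 0],
             [0, 1, 2],
             [0, 2, 1]]
# Here every shared allele is at higher frequency in population 1.
skewed = [[0, 0, 0],
          [0, 0, 0],
          [0, 4, 0]]
psi_symmetric = directionality_index(symmetric)
psi_skewed = directionality_index(skewed)
```

Under this assumed definition, a value near zero means shared alleles sit at similar frequencies in both populations, while a value far from zero means they are systematically more frequent in one of them.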
We investigate a continuous time, probability measure-valued dynamical system that describes the process of mutation-selection balance in a context where the population is infinite, there may be infinitely many loci, and there are weak assumptions on selective costs. Our model arises when we incorporate very general recombination mechanisms into a previous model of mutation and selection from Steinsaltz, Evans and Wachter (2005) and take the relative strength of mutation and selection to be sufficiently small. The resulting dynamical system is a flow of measures on the space of loci. Each such measure is the intensity measure of a Poisson random measure on the space of loci: the points of a realization of the random measure record the set of loci at which the genotype of a uniformly chosen individual differs from a reference wild type due to an accumulation of ancestral mutations. Our motivation for working in such a general setting is to provide a basis for understanding mutation-driven changes in age-specific demographic schedules that arise from the complex interaction of many genes, and hence to develop a framework for understanding the evolution of aging. We establish the existence and uniqueness of the dynamical system, provide conditions for the existence and stability of equilibrium states, and prove that our continuous-time dynamical system is the limit of a sequence of discrete-time infinite population mutation-selection-recombination models in the standard asymptotic regime where selection and mutation are weak relative to recombination and both scale at the same infinitesimal rate in the limit.
RNA-Seq technology allows the transcriptional state of the cell to be studied at an unprecedented level of detail. Beyond quantification of whole-gene expression, it is now possible to disentangle the abundance of individual alternatively spliced transcript isoforms of a gene. A central goal is to understand the regulatory processes that cause the relative abundance of isoforms to vary with external and genetic factors. Here, we present a mixed model approach that allows for (i) joint analysis and genetic mapping of multiple transcript isoforms and (ii) mapping of isoform-specific effects. Central to our approach is comprehensive modeling of the causes of variation and correlation between transcript isoforms, including the genomic background and technical quantification uncertainty. As a result, our method accurately tests for shared as well as transcript-specific genetic regulation of transcript isoforms and achieves substantially improved calibration of these statistical tests. Experiments on genotype and RNA-Seq data from 126 human HapMap individuals demonstrate that our model can help to obtain a more fine-grained picture of the genetic basis of gene expression variation.
Understanding the dynamics of an outbreak such as that of COVID-19 is important for designing effective control measures. This study develops an agent-based model that compares changes in infection progression as different parameters are varied in a synthetic population. Model inputs include the characteristics of each individual, such as age, sex, and working status, along with other factors influencing disease dynamics. Given the number of epicentres of infection, the location of primary cases, the sensitivity of identifying infection, the proportion of asymptomatic cases, and the frequency or duration of lockdown, the simulator tracks every individual, and hence the progression of infection through the community over time. In a closed community of 10,000 people without any lockdown, the number of cases peaks around the 6th week and wanes around the 15th week. If the primary case is located inside a dense population cluster such as a slum, cases peak early and wane slowly. With the introduction of lockdown, cases peak more slowly. If the sensitivity of identifying infection decreases, cases and deaths increase. The number of cases declines as the proportion of asymptomatic cases increases. The model is robust and provides reproducible estimates with realistic parameter values. It also helps identify measures to control an outbreak in a community. It is flexible in accommodating parameters such as infectivity period, yield of testing, socio-economic strata, daily travel, awareness level, population density, social distancing, and lockdown, and can be tailored to study other infections with a similar transmission pattern.
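To make the idea of tracking every individual concrete, here is a deliberately minimal agent-based sketch. It is not the simulator described in the abstract: it assumes homogeneous random mixing, a fixed infectious period, and a lockdown that only scales down daily contacts, and it omits age, sex, working status, asymptomatic cases, testing sensitivity, and spatial clustering. All names and default values (`simulate_outbreak`, `contacts_per_day`, `p_transmit`, `lockdown_factor`, and so on) are hypothetical.

```python
import random

SUSCEPTIBLE, INFECTIOUS, RECOVERED = 0, 1, 2


def simulate_outbreak(n_people=1000, n_seed=1, contacts_per_day=8,
                      p_transmit=0.03, infectious_days=7,
                      lockdown=None, lockdown_factor=0.25,
                      days=120, seed=0):
    """Return (daily_new_cases, final_states) for a toy agent-based model.

    lockdown is an optional (start_day, end_day) pair; during that window
    each infectious person makes fewer contacts (an assumed mechanism).
    """
    rng = random.Random(seed)  # fixed seed gives reproducible runs
    state = [SUSCEPTIBLE] * n_people
    days_left = [0] * n_people
    for person in rng.sample(range(n_people), n_seed):
        state[person] = INFECTIOUS
        days_left[person] = infectious_days

    daily_new = []
    for day in range(days):
        contacts = contacts_per_day
        if lockdown is not None and lockdown[0] <= day < lockdown[1]:
            contacts = max(1, round(contacts_per_day * lockdown_factor))

        infectious = [p for p in range(n_people) if state[p] == INFECTIOUS]
        newly_infected = set()
        for person in infectious:
            for _ in range(contacts):
                other = rng.randrange(n_people)  # random mixing assumption
                if state[other] == SUSCEPTIBLE and rng.random() < p_transmit:
                    newly_infected.add(other)

        # Progress existing infections, then apply today's new infections.
        for person in infectious:
            days_left[person] -= 1
            if days_left[person] == 0:
                state[person] = RECOVERED
        for person in newly_infected:
            state[person] = INFECTIOUS
            days_left[person] = infectious_days

        daily_new.append(len(newly_infected))
    return daily_new, state


no_lockdown, _ = simulate_outbreak()
with_lockdown, _ = simulate_outbreak(lockdown=(14, 60))
```

Comparing `no_lockdown` and `with_lockdown` gives a rough feel for the kind of scenario comparison the abstract describes. The numbers from this toy model should not be read as estimates of the published results.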