We consider a model of a population of fixed size $N$ in which each individual gets replaced at rate one and each individual experiences a mutation at rate $\mu$. We calculate the asymptotic distribution of the time that it takes before there is an individual in the population with $m$ mutations. Several different behaviors are possible, depending on how $\mu$ changes with $N$. These results have applications to the problem of determining the waiting time for regulatory sequences to appear and to models of cancer development.
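The model above can be explored numerically. The following is a minimal simulation sketch, not the asymptotic analysis of the abstract. It assumes Moran-type dynamics: a replaced individual is overwritten by a copy of a uniformly chosen individual, which is an assumption because the abstract does not specify the replacement rule. The function name `waiting_time_for_m_mutations` and its parameters are hypothetical.

```python
import random


def waiting_time_for_m_mutations(n, mu, m, rng):
    """Return the first time at which some individual carries m mutations.

    Assumed dynamics (Moran-type, an assumption of this sketch):
    each of the n individuals is replaced at rate 1 by a copy of a
    uniformly chosen individual, and acquires a mutation at rate mu.
    """
    if m <= 0:
        return 0.0
    counts = [0] * n               # number of mutations carried by each individual
    t = 0.0
    total_rate = n * (1.0 + mu)    # total event rate over the whole population
    p_mutation = mu / (1.0 + mu)   # probability that a given event is a mutation
    while True:
        t += rng.expovariate(total_rate)   # exponential waiting time to next event
        i = rng.randrange(n)               # individual affected by the event
        if rng.random() < p_mutation:
            counts[i] += 1
            if counts[i] >= m:
                return t
        else:
            # Replacement: individual i becomes a copy of a random individual.
            counts[i] = counts[rng.randrange(n)]


if __name__ == "__main__":
    rng = random.Random(2024)
    times = [waiting_time_for_m_mutations(100, 0.01, 2, rng) for _ in range(200)]
    print("mean waiting time for m=2:", sum(times) / len(times))
```

For $m=1$ the waiting time in this sketch is exponential with rate $N\mu$, which gives a simple sanity check before trying larger $m$.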
We consider a (sub)critical Galton-Watson process with neutral mutations (infinite alleles model), and decompose the entire population into clusters of individuals carrying the same allele. We specify the law of this allelic partition in terms of the distribution of the number of clone-children and the number of mutant-children of a typical individual. The approach combines an extension of the Harris representation of Galton-Watson processes and a version of the ballot theorem. Some limit theorems related to the distribution of the allelic partition are also given.
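To make the allelic partition concrete, here is a small simulation sketch. It only samples cluster sizes and does not use the Harris representation or the ballot theorem. The names `allelic_partition` and `binomial_offspring`, and the offspring law with its parameters, are illustrative assumptions rather than anything taken from the paper.

```python
import random
from collections import Counter, deque


def binomial_offspring(rng, trials=2, p_child=0.4, p_mutant=0.3):
    """Hypothetical offspring law: Binomial(trials, p_child) children,
    each independently a mutant with probability p_mutant.
    Mean offspring is 0.8 with the defaults, so the process is subcritical."""
    clones = mutants = 0
    for _ in range(trials):
        if rng.random() < p_child:
            if rng.random() < p_mutant:
                mutants += 1
            else:
                clones += 1
    return clones, mutants


def allelic_partition(offspring, rng, max_individuals=10**6):
    """Run the process from one ancestor until extinction and return the
    sizes of the allele clusters, largest first.

    offspring(rng) must return (clone_children, mutant_children).
    Each mutant child founds a new allele (infinite alleles model)."""
    sizes = Counter()
    queue = deque([0])      # allele labels of individuals that have not reproduced yet
    next_allele = 1
    processed = 0
    while queue:
        allele = queue.popleft()
        sizes[allele] += 1
        processed += 1
        if processed > max_individuals:
            raise RuntimeError("population too large; is the process supercritical?")
        clones, mutants = offspring(rng)
        queue.extend([allele] * clones)       # clones inherit the parent's allele
        for _ in range(mutants):
            queue.append(next_allele)         # each mutant carries a brand-new allele
            next_allele += 1
    return sorted(sizes.values(), reverse=True)


if __name__ == "__main__":
    print(allelic_partition(binomial_offspring, random.Random(5)))
```

The cap `max_individuals` is a safeguard for this sketch only; a (sub)critical process dies out almost surely, so the loop terminates without it.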
We consider the population genetics problem: how long does it take before some member of the population has $m$ specified mutations? The case $m=2$ is relevant to the onset of cancer due to the inactivation of both copies of a tumor suppressor gene. Models for larger $m$ are needed for colon cancer and other diseases where a sequence of mutations leads to cells with uncontrolled growth.
It is known (see, e.g., Weibull (1995)) that an evolutionarily stable strategy (ESS) is not robust against multiple mutations. In this article, we introduce robustness against multiple mutations and study some equivalent formulations and consequences.
We consider inference about the history of a sample of DNA sequences, conditional upon the haplotype counts and the number of segregating sites observed at the present time. After deriving some theoretical results in the coalescent setting, we implement rejection sampling and importance sampling schemes to perform the inference. The importance sampling scheme addresses an extension of the Ewens Sampling Formula for a configuration of haplotypes and the number of segregating sites in the sample. The implementations include both constant and variable population size models. The methods are illustrated by two human Y chromosome data sets.
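For orientation, the classical Ewens Sampling Formula, which the importance sampling scheme extends, can be evaluated directly. The sketch below computes only the standard formula for an allele-frequency spectrum, not the extension that also conditions on the number of segregating sites. The function name `ewens_probability` and the argument layout are assumptions made for illustration.

```python
from math import factorial, prod


def ewens_probability(a, theta):
    """Classical Ewens Sampling Formula.

    a[j-1] is the number of haplotypes observed exactly j times in the
    sample; theta is the scaled mutation rate. The sample size is
    n = sum_j j * a[j-1].
    """
    n = sum(j * aj for j, aj in enumerate(a, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = prod(theta + i for i in range(n))
    weight = prod(
        theta ** aj / (j ** aj * factorial(aj))
        for j, aj in enumerate(a, start=1)
    )
    return factorial(n) / rising * weight


if __name__ == "__main__":
    # Sample of size 3: three singletons, one singleton plus one pair, one triple.
    for config in ([3, 0, 0], [1, 1, 0], [0, 0, 1]):
        print(config, ewens_probability(config, theta=1.0))
```

The probabilities over all configurations of a given sample size sum to one, which is a convenient check on any implementation.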
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been mutating since it was first sequenced in early January 2020. The genetic variants have developed into a few distinct clusters with different properties. Because the United States (US) has the highest number of infected patients globally, it is essential to understand the US SARS-CoV-2. Using genotyping, sequence alignment, time evolution, $k$-means clustering, protein-folding stability, algebraic topology, and network theory, we reveal that the US SARS-CoV-2 has four substrains and that the five top US SARS-CoV-2 mutations were first detected in China (2 cases), Singapore (2 cases), and the United Kingdom (1 case). The next three top US SARS-CoV-2 mutations were first detected in the US. These eight top mutations belong to two disconnected groups. The first group, consisting of five concurrent mutations, is prevailing, while the other group, with three concurrent mutations, is gradually fading out. Our analysis suggests that female immune systems are more active than those of males in responding to SARS-CoV-2 infections. We find that one of the top mutations, 27964C$>$T-(S24L) on ORF8, has an unusually strong gender dependence. Based on the analysis of all mutations on the spike protein, we further find that three of the four US SARS-CoV-2 substrains have become more infectious. Our study calls for effective viral control and containment strategies in the US.