It is known (see e.g. Weibull (1995)) that an evolutionarily stable strategy (ESS) is not robust against multiple mutations. In this article, we introduce robustness against multiple mutations and study some equivalent formulations and consequences.
We consider a model of a population of fixed size N in which each individual gets replaced at rate one and each individual experiences a mutation at rate mu. We calculate the asymptotic distribution of the time that it takes before there is an individual in the population with m mutations. Several different behaviors are possible, depending on how mu changes with N. These results have applications to the problem of determining the waiting time for regulatory sequences to appear and to models of cancer development.
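The model above can be explored numerically. The following is a minimal simulation sketch, not the asymptotic analysis itself. It assumes Moran-type dynamics, in which a replaced individual becomes a copy of a uniformly chosen member of the population; the abstract does not state the replacement rule, so this is an assumption. The function name `waiting_time_for_m_mutations` and its parameters are illustrative.

```python
import random


def waiting_time_for_m_mutations(n, mu, m, rng):
    """Simulate one run and return the first time some individual has m mutations.

    n   : fixed population size (N in the text)
    mu  : per-individual mutation rate
    m   : target number of mutations in a single individual
    rng : a random.Random instance

    Assumption (not stated in the abstract): a replaced individual copies the
    mutation count of a uniformly chosen individual, as in a Moran model.
    """
    if m <= 0:
        return 0.0
    counts = [0] * n                  # mutation count carried by each individual
    t = 0.0
    total_rate = n * (1.0 + mu)       # each individual: replacement at rate 1, mutation at rate mu
    p_mutation = mu / (1.0 + mu)      # probability that the next event is a mutation
    while True:
        t += rng.expovariate(total_rate)   # exponential waiting time to the next event
        i = rng.randrange(n)               # individual affected by the event
        if rng.random() < p_mutation:
            counts[i] += 1
            if counts[i] >= m:
                return t
        else:
            # Replacement cannot raise the maximum count, so no check is needed here.
            counts[i] = counts[rng.randrange(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    runs = [waiting_time_for_m_mutations(100, 0.01, 2, rng) for _ in range(200)]
    print("mean waiting time for m = 2:", sum(runs) / len(runs))
```

For m = 1 the waiting time is exponential with rate N*mu, which gives a quick sanity check of the simulation: the sample mean should be close to 1/(N*mu).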
We consider inference about the history of a sample of DNA sequences, conditional upon the haplotype counts and the number of segregating sites observed at the present time. After deriving some theoretical results in the coalescent setting, we implement rejection sampling and importance sampling schemes to perform the inference. The importance sampling scheme addresses an extension of the Ewens Sampling Formula for a configuration of haplotypes and the number of segregating sites in the sample. The implementations include both constant and variable population size models. The methods are illustrated by two human Y chromosome data sets.
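As background for the sampling formula mentioned above, the sketch below evaluates the classical Ewens Sampling Formula, P(a_1, ..., a_n) = n! / (theta (theta+1) ... (theta+n-1)) * prod_j theta^{a_j} / (j^{a_j} a_j!), where a_j is the number of haplotypes observed exactly j times. It covers only the standard formula, not the extension to segregating sites that the abstract describes, and the function name `ewens_probability` is illustrative.

```python
from math import factorial


def ewens_probability(allele_counts, theta):
    """Probability of a haplotype configuration under the classical Ewens formula.

    allele_counts[j - 1] is a_j, the number of distinct haplotypes that appear
    exactly j times in the sample; theta is the scaled mutation rate.
    """
    # Sample size n = sum_j j * a_j
    n = sum(j * a for j, a in enumerate(allele_counts, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1)
    rising = 1.0
    for i in range(n):
        rising *= theta + i
    prob = factorial(n) / rising
    for j, a in enumerate(allele_counts, start=1):
        prob *= theta ** a / (j ** a * factorial(a))
    return prob


if __name__ == "__main__":
    theta = 1.5
    # The three possible configurations for a sample of size 3
    configs = {"three singletons": [3, 0, 0],
               "one singleton, one pair": [1, 1, 0],
               "one triple": [0, 0, 1]}
    for name, a in configs.items():
        print(name, ewens_probability(a, theta))
```

The probabilities over all configurations of a fixed sample size sum to one, which is a convenient check when adapting the function.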
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been mutating since it was first sequenced in early January 2020. The genetic variants have developed into a few distinct clusters with different properties. Since the United States (US) has the highest number of infected patients globally, it is essential to understand the US SARS-CoV-2. Using genotyping, sequence alignment, time evolution, $k$-means clustering, protein-folding stability, algebraic topology, and network theory, we reveal that the US SARS-CoV-2 has four substrains. The five top US SARS-CoV-2 mutations were first detected in China (2 cases), Singapore (2 cases), and the United Kingdom (1 case). The next three top US SARS-CoV-2 mutations were first detected in the US. These eight top mutations belong to two disconnected groups. The first group, consisting of five concurrent mutations, is prevailing, while the other group, with three concurrent mutations, is gradually fading out. Our analysis suggests that female immune systems are more active than those of males in responding to SARS-CoV-2 infections. We identify that one of the top mutations, 27964C$>$T-(S24L) on ORF8, has an unusually strong gender dependence. Based on the analysis of all mutations on the spike protein, we further uncover that three of the four US SARS-CoV-2 substrains have become more infectious. Our study calls for effective viral control and containment strategies in the US.
High-throughput sequencing allows the detection and quantification of frequencies of somatic single nucleotide variants (SNV) in heterogeneous tumor cell populations. In some cases, the evolutionary history and population frequency of the subclonal lineages of tumor cells present in the sample can be reconstructed from these SNV frequency measurements. However, automated methods to do this reconstruction are not available and the conditions under which reconstruction is possible have not been described. We describe the conditions under which the evolutionary history can be uniquely reconstructed from SNV frequencies from single or multiple samples from the tumor population and we introduce a new statistical model, PhyloSub, that infers the phylogeny and genotype of the major subclonal lineages represented in the population of cancer cells. It uses a Bayesian nonparametric prior over trees that groups SNVs into major subclonal lineages and automatically estimates the number of lineages and their ancestry. We sample from the joint posterior distribution over trees to identify evolutionary histories and cell population frequencies that have the highest probability of generating the observed SNV frequency data. When multiple phylogenies are consistent with a given set of SNV frequencies, PhyloSub represents the uncertainty in the tumor phylogeny using a partial order plot. Experiments on a simulated dataset and two real datasets comprising tumor samples from acute myeloid leukemia and chronic lymphocytic leukemia patients demonstrate that PhyloSub can infer both linear (or chain) and branching lineages and its inferences are in good agreement with ground truth, where it is available.
Recurrent infectious diseases and their increasing impact on society have promoted the study of strategies to slow down epidemic spreading. In this review we outline the applications of percolation theory to describe strategies against epidemic spreading on complex networks. We give a general outlook of the relation between link percolation and the susceptible-infected-recovered model, and introduce the node void percolation process to describe the dilution of the network composed of healthy individuals, i.e., the network that sustains the functionality of a society. Then, we survey two strategies: the quenched disorder strategy, where a heterogeneous distribution of contact intensities is induced in society, and the intermittent social distancing strategy, where healthy individuals are persuaded to avoid contact with their neighbors for intermittent periods of time. Using percolation tools, we show that both strategies may halt the epidemic spreading. Finally, we discuss the role of the transmissibility, i.e., the effective probability of transmitting a disease, in the performance of the strategies to slow down the epidemic spreading.
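The relation between link percolation and the susceptible-infected-recovered model can be illustrated with a small simulation. The sketch below makes two assumptions that are not taken from the abstract: a discrete-time SIR model with per-step infection probability `beta` and fixed recovery time `t_r`, for which the transmissibility is T = 1 - (1 - beta)^{t_r}; and an Erdős–Rényi contact network, chosen only for simplicity. All function names are illustrative.

```python
import random
from collections import Counter


def transmissibility(beta, t_r):
    """Effective transmission probability across one link, assuming a
    discrete-time SIR model with infection probability beta per time step
    and a fixed recovery time of t_r steps."""
    return 1.0 - (1.0 - beta) ** t_r


def largest_cluster_fraction(n, mean_degree, T, rng):
    """Link percolation on an Erdos-Renyi graph with n nodes.

    Each pair of nodes is linked with probability mean_degree / (n - 1) and each
    link is occupied with probability T; both steps are combined into a single
    draw with probability p_link * T. Returns the fraction of nodes in the
    largest cluster of occupied links.
    """
    parent = list(range(n))           # union-find forest over the nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    p_occupied = mean_degree / (n - 1) * T
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p_occupied:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
    sizes = Counter(find(x) for x in range(n))
    return max(sizes.values()) / n


if __name__ == "__main__":
    rng = random.Random(7)
    for beta in (0.02, 0.05, 0.1, 0.2):
        T = transmissibility(beta, t_r=5)
        frac = largest_cluster_fraction(1000, 4.0, T, rng)
        print(f"beta={beta:.2f}  T={T:.3f}  largest cluster fraction={frac:.3f}")
```

On an Erdős–Rényi network the percolation threshold is T_c = 1 / mean_degree, so with mean degree 4 the largest cluster stays small for T well below 0.25 and spans a finite fraction of the network above it. Strategies that lower the effective transmissibility below this threshold correspond to halting the epidemic in this simplified picture.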