How natural selection acts to limit the proliferation of transposable elements (TEs) in genomes has been of interest to evolutionary biologists for many years. To describe TE dynamics in populations, many previous studies have used models of transposition-selection equilibrium that rely on the assumption of a constant rate of transposition. However, since TE invasions are known to happen in bursts through time, this assumption may not be reasonable in natural populations. Here we propose a test of neutrality for TE insertions that does not rely on the assumption of a constant transposition rate. We consider the case of TE insertions that have been ascertained from a single haploid reference genome sequence and have subsequently had their allele frequency estimated in a population sample. By conditioning on the age of an individual TE insertion (using information contained in the number of substitutions that have occurred within the TE sequence since insertion), we determine the probability distribution for the insertion allele frequency in a population sample under neutrality. Taking models of varying population size into account, we then evaluate predictions of our model against allele frequency data from 190 retrotransposon insertions sampled from North American and African populations of Drosophila melanogaster. Using this non-equilibrium model, we are able to explain about 80% of the variance in TE insertion allele frequencies based on age alone. Controlling both for nonequilibrium dynamics of transposition and host demography, we provide evidence for negative selection acting against most TEs as well as for positive selection acting on a small subset of TEs. Our work establishes a new framework for the analysis of the evolutionary forces governing large insertion mutations like TEs, gene duplications or other copy number variants.
Our models for detecting the effect of adaptation on population genomic diversity are often predicated on a single newly arisen mutation sweeping rapidly to fixation. However, a population can also adapt to a new situation by multiple mutations of similar phenotypic effect that arise in parallel. These mutations can each quickly reach intermediate frequency, preventing any single one from rapidly sweeping to fixation globally (a soft sweep). Here we study models of parallel mutation in a geographically spread population adapting to a global selection pressure. The slow geographic spread of a selected allele can allow other selected alleles to arise and spread elsewhere in the species range. When these different selected alleles meet, their spread can slow dramatically, and so form a geographic patchwork which could be mistaken for a signal of local adaptation. This random spatial tessellation will dissipate over time due to mixing by migration, leaving a set of partial sweeps within the global population. We show that the spatial tessellation initially formed by mutational types is closely connected to Poisson process models of crystallization, which we extend. We find that the probability of parallel mutation and the spatial scale on which parallel mutation occurs is captured by a single characteristic length that reflects the expected distance a spreading allele travels before it encounters a different spreading allele. This characteristic length depends on the mutation rate, the dispersal parameter, the effective local density of individuals, and to a much lesser extent the strength of selection. We argue that even in widely dispersing species, such parallel geographic sweeps may be surprisingly common. Thus, we predict, as more data becomes available, many more examples of intra-species parallel adaptation will be uncovered.
Both external environmental selection and internal lower-level evolution are essential for an integral picture of evolution. This paper proposes that the division of internal evolution into DNA/RNA pattern formation (genotype) and protein functional action (phenotype) resolves a universal conflict between fitness and evolvability. Specifically, this paper explains how this universal conflict drove the emergence of genotype-phenotype division, why this labor division is responsible for the extraordinary complexity of life, and how the specific ways of genotype-phenotype mapping in the labor division determine the paths and forms of evolution and development.
There is a near-consensus view that SARS-CoV-2 has a natural zoonotic origin; however, several characteristics of SARS-CoV-2 taken together are not easily explained by a natural zoonotic origin hypothesis. These include: a low rate of evolution in the early phase of transmission; the lack of evidence of recombination events; a high pre-existing binding to human ACE2; a novel furin cleavage site insert; a flat glycan binding domain of the spike protein, which conflicts with host evasion survival patterns exhibited by other coronaviruses; and high human and mouse peptide mimicry. Initial assumptions against a laboratory origin, by contrast, have remained unsubstantiated. Furthermore, over a year after the initial outbreak in Wuhan, there is still no clear evidence of zoonotic transfer from a bat or intermediate species. Given the immense social and economic impact of this pandemic, identifying the true origin of SARS-CoV-2 is fundamental to preventing future outbreaks. The search for SARS-CoV-2's origin should include an open and unbiased inquiry into a possible laboratory origin.
A forward diffusion equation describing the evolution of the allele frequency spectrum is presented. The influx of mutations is accounted for by imposing a suitable boundary condition. For a Wright-Fisher diffusion with or without selection and varying population size, the boundary condition is $\lim_{x \downarrow 0} x f(x,t) = \theta \rho(t)$, where $f(\cdot,t)$ is the frequency spectrum of derived alleles at independent loci at time $t$ and $\rho(t)$ is the relative population size at time $t$. When population size and selection intensity are independent of time, the forward equation is equivalent to the backwards diffusion usually used to derive the frequency spectrum, but the forward equation allows computation of the time dependence of the spectrum both before an equilibrium is attained and when population size and selection intensity vary with time. From the diffusion equation, we derive a set of ordinary differential equations for the moments of $f(\cdot,t)$ and express the expected spectrum of a finite sample in terms of those moments. We illustrate the use of the forward equation by considering neutral and selected alleles in a highly simplified model of human history. For example, we show that approximately 30% of the expected heterozygosity of neutral loci is attributable to mutations that arose since the onset of population growth in roughly the last 150,000 years.
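As a rough illustration of how a frequency spectrum maps to the expected spectrum of a finite sample, the sketch below treats only the simplest special case: neutrality and constant population size, so that $\rho(t) = 1$. It assumes the standard stationary density $f(x) = \theta / x$, which satisfies $\lim_{x \downarrow 0} x f(x) = \theta$, and integrates it against binomial sampling probabilities. The function names, the midpoint integration rule, and the scaling of $\theta$ are choices made for this example, not taken from the paper, and the time-dependent forward equation itself is not solved here.

```python
import math


def neutral_density(x, theta):
    """Assumed stationary neutral spectrum f(x) = theta / x (constant size, rho = 1)."""
    return theta / x


def expected_sample_spectrum(n, theta, steps=20000):
    """Expected number of sites with i derived copies in a sample of size n, i = 1..n-1.

    Integrates C(n, i) x^i (1 - x)^(n - i) f(x) over (0, 1) with a midpoint rule.
    """
    dx = 1.0 / steps
    spectrum = []
    for i in range(1, n):
        coeff = math.comb(n, i)
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) * dx  # midpoint of the k-th sub-interval
            total += coeff * x**i * (1.0 - x) ** (n - i) * neutral_density(x, theta) * dx
        spectrum.append(total)
    return spectrum


if __name__ == "__main__":
    theta = 2.0
    for i, value in enumerate(expected_sample_spectrum(10, theta), start=1):
        # Under these assumptions each entry should be close to theta / i.
        print(i, round(value, 4), round(theta / i, 4))
```

Under these assumptions the numerical result should agree with the familiar neutral expectation $\theta / i$ for the class with $i$ derived copies, which gives a simple sanity check before moving to time-varying $\rho(t)$ or selection.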
Demographic change in human populations is a central question in the study of the human past. To identify major population expansions related to male lineages, we sequenced 78 East Asian Y chromosomes at 3.9 Mbp of the non-recombining region (NRY), discovered >4,000 new SNPs, and identified many new clades. The relative divergence dates can be estimated much more precisely using a molecular clock. We found that all the Paleolithic divergences were binary; however, three strong star-like Neolithic expansions at ~6 kya (thousand years ago) (assuming a constant substitution rate of 1e-9/bp/year) indicate that ~40% of modern Chinese are patrilineal descendants of only three super-grandfathers at that time. This observation suggests that the main patrilineal expansion in China occurred in the Neolithic Era and might be related to the development of agriculture.
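The molecular-clock arithmetic behind dates such as ~6 kya can be sketched as follows. This is only a minimal illustration that assumes a strict clock, using the substitution rate (1e-9/bp/year) and the sequenced length (3.9 Mbp) quoted above. The function names and the example mutation count of 23 are hypothetical and are not data from the study, which may have used a more elaborate dating method.

```python
def expected_mutations(rate_per_bp_per_year, sites, years):
    """Expected substitutions accumulated along one lineage under a strict clock."""
    return rate_per_bp_per_year * sites * years


def time_from_mutations(mutations, rate_per_bp_per_year, sites):
    """Point estimate of elapsed time (years) from a mutation count on one lineage."""
    return mutations / (rate_per_bp_per_year * sites)


if __name__ == "__main__":
    RATE = 1e-9    # substitutions per bp per year (value quoted in the abstract)
    SITES = 3.9e6  # sequenced NRY length in bp (value quoted in the abstract)

    # Expected substitutions on a single lineage over 6,000 years.
    print(expected_mutations(RATE, SITES, 6000))

    # Hypothetical count of 23 private mutations converted back to years.
    print(time_from_mutations(23, RATE, SITES))
```

With these inputs a lineage is expected to accumulate only a few dozen substitutions over 6,000 years, so sequencing several megabases per sample is what makes the relative dating of recent clades feasible.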