Our models for detecting the effect of adaptation on population genomic diversity are often predicated on a single newly arisen mutation sweeping rapidly to fixation. However, a population can also adapt to a new situation by multiple mutations of similar phenotypic effect that arise in parallel. These mutations can each quickly reach intermediate frequency, preventing any single one from rapidly sweeping to fixation globally (a soft sweep). Here we study models of parallel mutation in a geographically spread population adapting to a global selection pressure. The slow geographic spread of a selected allele can allow other selected alleles to arise and spread elsewhere in the species range. When these different selected alleles meet, their spread can slow dramatically, and so form a geographic patchwork which could be mistaken for a signal of local adaptation. This random spatial tessellation will dissipate over time due to mixing by migration, leaving a set of partial sweeps within the global population. We show that the spatial tessellation initially formed by mutational types is closely connected to Poisson process models of crystallization, which we extend. We find that the probability of parallel mutation and the spatial scale on which parallel mutation occurs is captured by a single characteristic length that reflects the expected distance a spreading allele travels before it encounters a different spreading allele. This characteristic length depends on the mutation rate, the dispersal parameter, the effective local density of individuals, and to a much lesser extent the strength of selection. We argue that even in widely dispersing species, such parallel geographic sweeps may be surprisingly common. Thus, we predict, as more data becomes available, many more examples of intra-species parallel adaptation will be uncovered.
A forward diffusion equation describing the evolution of the allele frequency spectrum is presented. The influx of mutations is accounted for by imposing a suitable boundary condition. For a Wright-Fisher diffusion with or without selection and varying population size, the boundary condition is $\lim_{x \downarrow 0} x f(x,t)=\theta \rho(t)$, where $f(\cdot,t)$ is the frequency spectrum of derived alleles at independent loci at time $t$ and $\rho(t)$ is the relative population size at time $t$. When population size and selection intensity are independent of time, the forward equation is equivalent to the backwards diffusion usually used to derive the frequency spectrum, but the forward equation allows computation of the time dependence of the spectrum both before an equilibrium is attained and when population size and selection intensity vary with time. From the diffusion equation, we derive a set of ordinary differential equations for the moments of $f(\cdot,t)$ and express the expected spectrum of a finite sample in terms of those moments. We illustrate the use of the forward equation by considering neutral and selected alleles in a highly simplified model of human history. For example, we show that approximately 30% of the expected heterozygosity of neutral loci is attributable to mutations that arose since the onset of population growth in roughly the last 150,000 years.
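As a small illustration of how a frequency spectrum translates into the expected spectrum of a finite sample, the sketch below assumes the standard neutral, constant-size equilibrium spectrum $f(x)=\theta/x$, which satisfies the boundary condition above with $\rho(t)=1$. It is not the time-dependent moment method of the abstract; the function names and the midpoint quadrature are illustrative assumptions.

```python
from math import comb


def neutral_spectrum(x, theta):
    """Assumed equilibrium neutral spectrum f(x) = theta / x (constant size)."""
    return theta / x


def expected_sample_spectrum(n, theta, grid=2000):
    """Expected number of sites with i derived copies in a sample of size n.

    Integrates the binomial sampling probability against f(x) on (0, 1)
    with a simple midpoint rule; returns a list for i = 1, ..., n - 1.
    """
    h = 1.0 / grid
    spectrum = []
    for i in range(1, n):
        total = 0.0
        for k in range(grid):
            x = (k + 0.5) * h  # midpoint of the k-th subinterval
            binom = comb(n, i) * x**i * (1.0 - x) ** (n - i)
            total += binom * neutral_spectrum(x, theta) * h
        spectrum.append(total)
    return spectrum


if __name__ == "__main__":
    theta = 2.0
    for i, value in enumerate(expected_sample_spectrum(6, theta), start=1):
        # Under these assumptions each entry should be close to theta / i.
        print(i, round(value, 4), round(theta / i, 4))
```

Under these assumptions the integral can be done in closed form and gives $\theta/i$ for the $i$-th entry, so the numerical values serve mainly as a check on the sampling formula.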
How natural selection acts to limit the proliferation of transposable elements (TEs) in genomes has been of interest to evolutionary biologists for many years. To describe TE dynamics in populations, many previous studies have used models of transposition-selection equilibrium that rely on the assumption of a constant rate of transposition. However, since TE invasions are known to happen in bursts through time, this assumption may not be reasonable in natural populations. Here we propose a test of neutrality for TE insertions that does not rely on the assumption of a constant transposition rate. We consider the case of TE insertions that have been ascertained from a single haploid reference genome sequence and have subsequently had their allele frequency estimated in a population sample. By conditioning on the age of an individual TE insertion (using information contained in the number of substitutions that have occurred within the TE sequence since insertion), we determine the probability distribution for the insertion allele frequency in a population sample under neutrality. Taking models of varying population size into account, we then evaluate predictions of our model against allele frequency data from 190 retrotransposon insertions sampled from North American and African populations of Drosophila melanogaster. Using this non-equilibrium model, we are able to explain about 80% of the variance in TE insertion allele frequencies based on age alone. Controlling both for nonequilibrium dynamics of transposition and host demography, we provide evidence for negative selection acting against most TEs as well as for positive selection acting on a small subset of TEs. Our work establishes a new framework for the analysis of the evolutionary forces governing large insertion mutations like TEs, gene duplications or other copy number variants.
In this paper we investigate the spread of advantageous genes in two variants of the F-KPP model with dormancy. The first variant, in which dormant individuals do not move in space and instead form localized seed banks, has recently been introduced in Blath, Hammer and Nie (2020). However, there, only a relatively crude upper bound for the critical speed of potential travelling wave solutions has been provided. The second model variant is new and describes a situation in which the dormant forms of individuals are subject to motion, while the active individuals remain spatially static instead. This can be motivated, for example, by spore dispersal of fungi, where the dormant spores are distributed by wind, water or insects, while the active fungi are locally fixed. For both models, we establish the existence of monotone travelling wave solutions, determine the corresponding critical wave-speed in terms of the model parameters, and characterize aspects of the asymptotic shape of the waves depending on the decay properties of the initial condition. Interestingly, the slow-down effect of dormancy on the speed of propagation of beneficial alleles is often more serious in model variant II (the spore model) than in variant I (the seed bank model), and this can be understood mathematically via probabilistic representations of solutions in terms of (two variants of) on/off branching Brownian motions. Our proofs make rather heavy use of probabilistic tools in the tradition of McKean (1975), Bramson (1978), Neveu (1987), Lalley and Sellke (1987), Champneys et al. (1995) and others. However, the two-compartment nature of the model and the special forms of dormancy also pose obstacles to the classical formalism, giving rise to a variety of open research questions that we briefly discuss at the end of the paper.
The advent of accessible ancient DNA technology now allows the direct ascertainment of allele frequencies in ancestral populations, thereby enabling the use of allele frequency time series to detect and estimate natural selection. Such direct observations of allele frequency dynamics are expected to be more powerful than inferences made using patterns of linked neutral variation obtained from modern individuals. We develop a Bayesian method to make use of allele frequency time series data and infer the parameters of general diploid selection, along with allele age, in non-equilibrium populations. We introduce a novel path augmentation approach, in which we use Markov chain Monte Carlo to integrate over the space of allele frequency trajectories consistent with the observed data. Using simulations, we show that this approach has good power to estimate selection coefficients and allele age. Moreover, when applying our approach to data on horse coat color, we find that ignoring a relevant demographic history can significantly bias the results of inference. Our approach is made available in a C++ software package.
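To make the notion of an allele frequency trajectory under general diploid selection concrete, the following is a minimal forward-simulation sketch. It assumes a discrete Wright-Fisher model with genotype fitnesses 1, 1 + s_het and 1 + s_hom and a constant population size; it is not the path augmentation MCMC of the abstract, and every name and parameter here is hypothetical.

```python
import random


def next_expected_freq(x, s_het, s_hom):
    """Expected allele frequency after one generation of diploid selection.

    Assumed genotype fitnesses: 1 (ancestral homozygote), 1 + s_het
    (heterozygote), 1 + s_hom (derived homozygote).
    """
    w_bar = x * x * (1 + s_hom) + 2 * x * (1 - x) * (1 + s_het) + (1 - x) ** 2
    return (x * x * (1 + s_hom) + x * (1 - x) * (1 + s_het)) / w_bar


def simulate_trajectory(x0, s_het, s_hom, pop_size, generations, seed=0):
    """Simulate a Wright-Fisher trajectory with binomial sampling of 2N copies."""
    rng = random.Random(seed)
    n_copies = 2 * pop_size
    x = x0
    trajectory = [x]
    for _ in range(generations):
        p = next_expected_freq(x, s_het, s_hom)
        # Binomial draw written as a sum of Bernoulli trials (stdlib only).
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        x = count / n_copies
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    traj = simulate_trajectory(0.1, s_het=0.02, s_hom=0.04,
                               pop_size=500, generations=100)
    print(traj[0], traj[-1])
```

A time-series method of the kind described would treat such a trajectory as unobserved and compare it with allele counts sampled at a few time points; this sketch only generates the trajectory itself.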
The growth of a population divided among spatial sites, with migration between the sites, is sometimes modelled by a product of random matrices, with the diagonal elements representing the growth rates in a given time period, and the off-diagonal elements the migration rates. If the sites are reinterpreted as age classes, the same model may apply to a single population with age-dependent mortality and reproduction. We consider the case where the off-diagonal elements are small, representing a situation where there is little migration or, alternatively, where a deterministic life-history has been slightly disrupted, for example by introducing a rare delay in development. We examine the asymptotic behavior of the long-term growth rate. We show that when the highest growth rate is attained at two different sites in the absence of migration (which is always the case when modelling a single age-structured population) the increase in stochastic growth rate due to a migration rate $\epsilon$ is like $(\log \epsilon^{-1})^{-1}$ as $\epsilon \downarrow 0$, under fairly generic conditions. When there is a single site with the highest growth rate the behavior is more delicate, depending on the tails of the growth rates. For the case when the log growth rates have Gaussian-like tails we show that the behavior near zero is like a power of $\epsilon$, and derive upper and lower bounds for the power in terms of the difference in the growth rates and the distance between the sites.
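The long-term growth rate of such a matrix product can be estimated numerically. The sketch below is a minimal two-site illustration under assumptions not taken from the abstract: log growth rates are independent Gaussians with means `mu` and common standard deviation `sigma`, and migration is a fixed symmetric off-diagonal entry `eps`. The function name and all parameters are hypothetical.

```python
import math
import random


def stochastic_growth_rate(mu, sigma, eps, steps=20000, seed=0):
    """Estimate the stochastic growth rate of a product of random 2x2 matrices.

    Each step applies M_t = [[d0, eps], [eps, d1]] with d_i = exp(N(mu[i], sigma)),
    renormalises the population vector, and accumulates the log of its total.
    The average log total per step approximates the long-term growth rate.
    """
    rng = random.Random(seed)
    v = [0.5, 0.5]  # population shares at the two sites
    log_growth = 0.0
    for _ in range(steps):
        d0 = math.exp(rng.gauss(mu[0], sigma))
        d1 = math.exp(rng.gauss(mu[1], sigma))
        w0 = d0 * v[0] + eps * v[1]
        w1 = eps * v[0] + d1 * v[1]
        total = w0 + w1
        log_growth += math.log(total)
        v = [w0 / total, w1 / total]
    return log_growth / steps


if __name__ == "__main__":
    # Two sites with equal mean log growth: compare no migration with small migration.
    for eps in (0.0, 1e-6, 1e-3):
        print(eps, stochastic_growth_rate((0.0, 0.0), 0.3, eps))
```

Reusing the same seed for different values of `eps` keeps the random growth rates identical across runs, so the difference between estimates reflects migration alone rather than sampling noise.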