This paper develops a formulation of the quasispecies equations appropriate for polysomic, semiconservatively replicating genomes. This paper is an extension of previous work on the subject, which considered the case of haploid genomes. Here, we develop a more general formulation of the quasispecies equations that is applicable to diploid and even polyploid genomes. Interestingly, with an appropriate classification of population fractions, we obtain a system of equations that is formally identical to the haploid case. As with the work for haploid genomes, we consider both random and immortal DNA strand chromosome segregation mechanisms. However, in contrast to the haploid case, we have found that an analytical solution for the mean fitness is considerably more difficult to obtain for the polyploid case. Accordingly, whereas for the haploid case we obtained expressions for the mean fitness for the case of an analogue of the single-fitness-peak landscape for arbitrary lesion repair probabilities (thereby allowing for non-complementary genomes), here we solve for the mean fitness for the restricted case of perfect lesion repair.
This paper develops a quasispecies model that incorporates the SOS response. We consider a unicellular, asexually replicating population of organisms, whose genomes consist of a single, double-stranded DNA molecule, i.e. one chromosome. We assume that repair of post-replication mismatched base-pairs occurs with probability $\lambda$, and that the SOS response is triggered when the total number of mismatched base-pairs exceeds $l_S$. We further assume that the per-mismatch SOS elimination rate is characterized by a first-order rate constant $\kappa_{SOS}$. For a single fitness peak landscape where the master genome can sustain up to $l$ mismatches and remain viable, this model is analytically solvable in the limit of infinite sequence length. The results, which are confirmed by stochastic simulations, indicate that the SOS response does indeed confer a fitness advantage to a population, provided that it is only activated when DNA damage is so extensive that a cell will die if it does not attempt to repair its DNA.
We investigate the error threshold for the emergence of quasispecies in the Eigen model. By mapping to an effective Hamiltonian governed by the imaginary-time Schrödinger equation, a variational ansatz is proposed and applied to calculate various quantities associated with the quasispecies. The variational ansatz gives correct predictions for the survival population of the wild-type sequence and also reveals an unexpected universal scaling behavior near the error threshold. We check the validity of the variational ansatz by numerical methods and find excellent agreement. Though the emergence of the scaling behavior is not yet fully understood, it is remarkable that the universal scaling function holds even for relatively short genome lengths such as L=16. Further investigations may reveal the mechanism of the universal scaling and extract the essential ingredients for the emergence of the quasispecies in molecular evolution.
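The error threshold discussed in the abstract above can be illustrated numerically with a minimal sketch. It does not implement the variational ansatz or the Hamiltonian mapping. Instead it assumes binary sequences, a single-peak fitness landscape (master fitness `A`, all other sequences fitness 1), and a uniform per-site error rate `mu`, none of which are stated in the abstract. Sequences are grouped by Hamming distance from the master, and the stationary distribution is found by power iteration. The function names are hypothetical.

```python
from math import comb

def class_mutation_matrix(L, mu):
    """Q[i][j]: probability that copying a sequence at Hamming distance j
    from the master yields a sequence at distance i (per-site error rate mu)."""
    Q = [[0.0] * (L + 1) for _ in range(L + 1)]
    for j in range(L + 1):
        for i in range(L + 1):
            total = 0.0
            # k = mutant sites that revert, m = correct sites that mutate,
            # so the new distance is i = j - k + m.
            for k in range(j + 1):
                m = i - j + k
                if 0 <= m <= L - j:
                    flips = k + m
                    total += (comb(j, k) * comb(L - j, m)
                              * mu ** flips * (1 - mu) ** (L - flips))
            Q[i][j] = total
    return Q

def quasispecies(L, mu, A, iterations=500):
    """Stationary error-class fractions for a single-peak landscape
    (assumed: master fitness A, every other class fitness 1)."""
    Q = class_mutation_matrix(L, mu)
    fitness = [A] + [1.0] * L
    x = [1.0 / (L + 1)] * (L + 1)
    for _ in range(iterations):
        # One replication-mutation step, then renormalise to fractions.
        y = [sum(Q[i][j] * fitness[j] * x[j] for j in range(L + 1))
             for i in range(L + 1)]
        norm = sum(y)
        x = [v / norm for v in y]
    return x

if __name__ == "__main__":
    for mu in (0.01, 0.05, 0.10, 0.20, 0.50):
        x0 = quasispecies(16, mu, 10.0)[0]
        print(f"mu={mu:.2f}  master fraction={x0:.6f}")
```

Scanning `mu` for L=16 shows the master fraction falling from a large value towards the uniform value 2^-16 as the error rate grows, which is the qualitative transition the abstract refers to.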
By performing a comprehensive study on 1832 segments of 1212 complete genomes of viruses, we show that in viral genomes the hairpin structures of thermodynamically predicted RNA secondary structures are more abundant than expected under a simple random null hypothesis. The detected hairpin structures of RNA secondary structures are present both in coding and in noncoding regions for the four groups of viruses categorized as dsDNA, dsRNA, ssDNA and ssRNA. For all groups, hairpin structures of RNA secondary structures are detected more frequently than expected for a random null hypothesis in noncoding rather than in coding regions. However, potential RNA secondary structures are also present in coding regions of the dsDNA group. In fact, we detect evolutionarily conserved RNA secondary structures in conserved coding and noncoding regions of a large set of complete genomes of dsDNA herpesviruses.
Least squares trees, multi-dimensional scaling (MDS) and Neighbor Nets are all different and popular ways of visualizing multi-dimensional data. The method of flexi-Weighted Least Squares (fWLS) is a powerful method of fitting phylogenetic trees when the exact form of errors is unknown. Here, both polynomial and exponential weights are used to model errors. The exact same models are implemented for multi-dimensional scaling to yield flexi-Weighted MDS, including as special cases methods such as the Sammon Stress function. Here we apply all these methods to population genetic data looking at the relationships of Abraham's Children, encompassing Arabs and now widely dispersed populations of Jews, in relation to an African outgroup and a variety of European populations. Trees, MDS and Neighbor Nets of these data are compared within a common likelihood framework and the strengths and weaknesses of each method are explored. Because the errors in this type of data can be complex, for example, due to unexpected genetic transfer, we use a residual resampling method to assess the robustness of trees and the Neighbor Net. Despite the Neighbor Net fitting best by all criteria except BIC, its structure is ill-defined following residual resampling. In contrast, fWLS trees are favored by BIC and retain considerably strong internal structure following residual resampling. This structure clearly separates various European and Middle Eastern populations, yet it is clear all of the models have errors much larger than expected by sampling variance alone.
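To make the idea of polynomial and exponential weights concrete, here is a minimal sketch of a weighted sum-of-squares criterion. The abstract does not give the exact weight formulas, so this assumes that the error variance of an observed distance d scales as d^P (polynomial) or exp(P·d) (exponential). The function names are hypothetical. The Sammon stress appears as the polynomial case P=1, divided by the sum of observed distances.

```python
from math import exp

def weighted_ss(observed, fitted, P=0.0, kind="polynomial"):
    """Weighted sum of squared residuals between observed and fitted distances.

    Assumed weight forms (not taken from the abstract):
      polynomial:  variance proportional to d_obs ** P
      exponential: variance proportional to exp(P * d_obs)
    P = 0 reduces both forms to ordinary least squares.
    """
    total = 0.0
    for d_obs, d_fit in zip(observed, fitted):
        if kind == "polynomial":
            variance = d_obs ** P
        elif kind == "exponential":
            variance = exp(P * d_obs)
        else:
            raise ValueError(f"unknown weight kind: {kind}")
        total += (d_obs - d_fit) ** 2 / variance
    return total

def sammon_stress(observed, fitted):
    """Sammon stress: polynomial weights with P = 1, normalised by sum(observed)."""
    return weighted_ss(observed, fitted, P=1.0) / sum(observed)

if __name__ == "__main__":
    obs = [1.0, 2.0, 4.0]   # toy pairwise distances, not real data
    fit = [1.5, 2.0, 3.0]   # toy fitted tree or MDS distances
    for P in (0.0, 1.0, 2.0):
        print(P, weighted_ss(obs, fit, P))
    print("Sammon:", sammon_stress(obs, fit))
```

Larger values of P down-weight residuals on long distances. Treating P as a free parameter is what lets the fit adapt when the error structure is not known in advance.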
How natural selection acts to limit the proliferation of transposable elements (TEs) in genomes has been of interest to evolutionary biologists for many years. To describe TE dynamics in populations, many previous studies have used models of transposition-selection equilibrium that rely on the assumption of a constant rate of transposition. However, since TE invasions are known to happen in bursts through time, this assumption may not be reasonable in natural populations. Here we propose a test of neutrality for TE insertions that does not rely on the assumption of a constant transposition rate. We consider the case of TE insertions that have been ascertained from a single haploid reference genome sequence and have subsequently had their allele frequency estimated in a population sample. By conditioning on the age of an individual TE insertion (using information contained in the number of substitutions that have occurred within the TE sequence since insertion), we determine the probability distribution for the insertion allele frequency in a population sample under neutrality. Taking models of varying population size into account, we then evaluate predictions of our model against allele frequency data from 190 retrotransposon insertions sampled from North American and African populations of Drosophila melanogaster. Using this non-equilibrium model, we are able to explain about 80% of the variance in TE insertion allele frequencies based on age alone. Controlling both for nonequilibrium dynamics of transposition and host demography, we provide evidence for negative selection acting against most TEs as well as for positive selection acting on a small subset of TEs. Our work establishes a new framework for the analysis of the evolutionary forces governing large insertion mutations like TEs, gene duplications or other copy number variants.
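The idea of conditioning on insertion age can be illustrated with a small Monte Carlo sketch. It is not the analytic distribution derived in the paper. It assumes a haploid Wright-Fisher population of constant size `N`, a known age in generations, and no ascertainment from a reference genome, all of which are simplifications introduced here. The function name is hypothetical.

```python
import random

def neutral_frequency_given_age(N, age, n_survivors, rng):
    """Population frequencies of neutral insertions that are `age` generations
    old, conditional on not having been lost.

    Each run starts from a single copy in a haploid Wright-Fisher population
    of constant size N (an assumption of this sketch); runs in which the
    insertion is lost are discarded and retried.
    """
    freqs = []
    while len(freqs) < n_survivors:
        copies = 1
        for _generation in range(age):
            p = copies / N
            # Binomial(N, p) resampling of the next generation.
            copies = sum(1 for _individual in range(N) if rng.random() < p)
            if copies == 0 or copies == N:
                break  # loss and fixation are both absorbing states
        if copies > 0:
            freqs.append(copies / N)
    return freqs

if __name__ == "__main__":
    rng = random.Random(1)
    for age in (5, 20, 40):
        f = neutral_frequency_given_age(50, age, 200, rng)
        print(f"age={age:3d}  mean frequency among survivors={sum(f) / len(f):.3f}")
```

Under neutrality, older surviving insertions tend to sit at higher frequencies. An insertion that is much rarer than expected for its age is therefore a candidate for negative selection, and one that is much more common is a candidate for positive selection.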