Genetic studies of human traits have revolutionized our understanding of the variation between individuals, and opened the door for numerous breakthroughs in biology, medicine and other scientific fields. And yet, the ultimate promise of this area of research is still not fully realized. In this review, we highlight the major open problems that need to be solved to improve our understanding of the genetic variation underlying human traits, and by discussing these challenges provide a primer to the field. Our focus is on concrete analytical problems, both conceptual and technical in nature. We cover general issues in genetic studies such as population structure, epistasis and gene-environment interactions, data-related issues such as ethnic diversity and rare genetic variants, and specific challenges related to heritability estimates, genetic association studies and polygenic risk scores. We emphasize the interconnectedness of these open problems and suggest promising avenues to address them.
The human microbiome is the ensemble of genes in the microbes that live inside and on the surface of humans. Because microbial sequencing information is now much easier to come by than phenotypic information, there has been an explosion of sequencing and genetic analysis of microbiome samples. Much of the analytical work for these sequences involves phylogenetics, at least indirectly, but methodology has developed in a somewhat different direction than for other applications of phylogenetics. In this paper I review the field and its methods from the perspective of a phylogeneticist, and describe current challenges for phylogenetics arising from this type of work.
The role of positive selection in human evolution remains controversial. On the one hand, scans for positive selection have identified hundreds of candidate loci, and the genome-wide patterns of polymorphism show signatures consistent with frequent positive selection. On the other hand, recent studies have argued that many of the candidate loci are false positives and that most apparent genome-wide signatures of adaptation are in fact due to reduction of neutral diversity by linked recurrent deleterious mutations, known as background selection. Here we analyze human polymorphism data from the 1000 Genomes Project (Abecasis et al. 2012) and detect signatures of pervasive positive selection once we correct for the effects of background selection. We show that levels of neutral polymorphism are lower near amino acid substitutions, with the strongest reduction observed specifically near functionally consequential amino acid substitutions. Furthermore, amino acid substitutions are associated with signatures of recent adaptation that should not be generated by background selection, such as the presence of unusually long and frequent haplotypes and specific distortions in the site frequency spectrum. We use forward simulations to show that the observed signatures require a high rate of strongly adaptive substitutions in the vicinity of the amino acid changes. We further demonstrate that the observed signatures of positive selection correlate more strongly with the presence of regulatory sequences, as predicted by ENCODE (Gerstein et al. 2012), than with the positions of amino acid substitutions. Our results establish that adaptation was frequent in human evolution and provide support for the hypothesis of King and Wilson (1975) that adaptive divergence is primarily driven by regulatory changes.
Probability modelling for DNA sequence evolution is well established and provides a rich framework for understanding genetic variation between samples of individuals from one or more populations. We show that both classical and more recent models for coalescence (with or without recombination) can be described in terms of so-called phase-type theory, where complicated and tedious calculations are circumvented by the use of matrices. The application of phase-type theory consists of describing the stochastic model as a Markov model by appropriately setting up a state space and calculating the corresponding intensity and reward matrices. Formulae of interest are then expressed in terms of these matrices. We illustrate this with a few examples, calculating the mean, variance and even higher order moments of the site frequency spectrum in multiple merger coalescent models, and analysing the mean and variance of the number of segregating sites for multiple samples in the two-locus ancestral recombination graph. We believe that phase-type theory has great potential as a tool for analysing probability models in population genetics. The compact matrix notation is useful for clarifying current models, in particular their formal manipulation (calculation), but also for further development or extensions.
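The matrix recipe described above can be illustrated on a much simpler case than the ones treated in the abstract. The following is a minimal sketch, assuming the standard Kingman coalescent for n samples (time in coalescent units), where the state is the number of ancestral lineages and the rate out of state k is k(k-1)/2. It uses the standard phase-type moment formulas, mean = alpha U r and second moment = 2 alpha U diag(r) U r with U = (-S)^(-1). The function names (`kingman_subintensity`, `solve`, `phase_type_mean_var`) are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def kingman_subintensity(n):
    """Sub-intensity matrix S over the transient states n, n-1, ..., 2 lineages.

    Assumption: standard Kingman coalescent, rate k(k-1)/2 from k to k-1 lineages.
    """
    size = n - 1
    S = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        k = n - i                          # number of lineages in state i
        rate = Fraction(k * (k - 1), 2)
        S[i][i] = -rate
        if i + 1 < size:
            S[i][i + 1] = rate             # coalescence event: k -> k-1
    return S


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination on Fractions."""
    m = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]


def phase_type_mean_var(alpha, S, reward=None):
    """Mean and variance of the accumulated reward until absorption.

    With reward = None every state has reward 1, so the result describes
    the absorption time itself.
    """
    size = len(S)
    r = reward if reward is not None else [Fraction(1)] * size
    neg_S = [[-v for v in row] for row in S]
    x1 = solve(neg_S, r)                                   # U r
    x2 = solve(neg_S, [ri * xi for ri, xi in zip(r, x1)])  # U diag(r) U r
    mean = sum(a * x for a, x in zip(alpha, x1))
    second = 2 * sum(a * x for a, x in zip(alpha, x2))
    return mean, second - mean * mean


if __name__ == "__main__":
    n = 4
    S = kingman_subintensity(n)
    alpha = [Fraction(1)] + [Fraction(0)] * (n - 2)        # start with n lineages
    # Time to the most recent common ancestor (reward 1 in every state).
    print(phase_type_mean_var(alpha, S))
    # Total branch length (reward k while there are k lineages).
    branch_reward = [Fraction(n - i) for i in range(n - 1)]
    print(phase_type_mean_var(alpha, S, branch_reward))
```

For n = 4 this gives mean 3/2 and variance 41/36 for the time to the most recent common ancestor, and mean 11/3 and variance 49/9 for the total branch length, matching the classical closed-form results. Exact rational arithmetic is used here only to make that comparison easy; a floating-point linear algebra library would be the more usual choice for larger state spaces.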
There is a near-consensus view that SARS-CoV-2 has a natural zoonotic origin; however, several characteristics of SARS-CoV-2 taken together are not easily explained by a natural zoonotic origin hypothesis. These include: a low rate of evolution in the early phase of transmission; the lack of evidence of recombination events; a high pre-existing binding to human ACE2; a novel furin cleavage site insert; a flat glycan binding domain of the spike protein, which conflicts with host evasion survival patterns exhibited by other coronaviruses; and high human and mouse peptide mimicry. Initial assumptions against a laboratory origin, by contrast, have remained unsubstantiated. Furthermore, over a year after the initial outbreak in Wuhan, there is still no clear evidence of zoonotic transfer from a bat or intermediate species. Given the immense social and economic impact of this pandemic, identifying the true origin of SARS-CoV-2 is fundamental to preventing future outbreaks. The search for SARS-CoV-2's origin should include an open and unbiased inquiry into a possible laboratory origin.
Spatial constraints such as rigid barriers affect the dynamics of cell populations, potentially altering the course of natural evolution. In this paper, we study the population genetics of Escherichia coli proliferating in microchannels with open ends. Our experiments reveal that competition between two fluorescently labeled E. coli strains growing in a microchannel generates a stripe pattern aligned with the axial direction of the channel. To account for this observation, we study a lattice population model in which reproducing cells push entire lanes of cells towards the open ends of the channel. By combining mathematical theory, numerical simulations, and experiments, we find that the fixation dynamics is extremely fast along the axial direction, with a logarithmic dependence on the number of cells per lane. In contrast, competition among lanes is a much slower process. We also demonstrate that random mutations that appear in the middle and at the boundaries of the channel are highly likely to reach fixation. By theoretically studying competition between strains of different fitness, we find that the population structure in such a spatially confined system strongly suppresses selection.
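The pushing mechanism within a single lane can be sketched with a toy simulation. This is only an illustration under stated assumptions, not the model analysed in the paper: one lane of fixed length with both ends open, two neutral strains labeled 0 and 1 in an alternating initial pattern, a uniformly chosen cell dividing at each step, and the daughter pushing its neighbours towards the left or right end with equal probability so that the cell at that end is expelled. The function name `simulate_lane_fixation` and all parameters are hypothetical.

```python
import random


def simulate_lane_fixation(n_cells, seed=None, max_steps=1_000_000):
    """Simulate one lane until a single strain remains.

    Returns (winning_label, number_of_division_events), or (None, max_steps)
    if fixation is not reached within max_steps.
    """
    rng = random.Random(seed)
    lane = [i % 2 for i in range(n_cells)]       # alternating strains 0, 1, 0, ...
    for step in range(1, max_steps + 1):
        i = rng.randrange(n_cells)               # cell chosen to divide
        if rng.random() < 0.5:
            lane.insert(i + 1, lane[i])          # daughter placed to the right
            lane.pop()                           # rightmost cell leaves the channel
        else:
            lane.insert(i, lane[i])              # daughter placed to the left
            lane.pop(0)                          # leftmost cell leaves the channel
        if len(set(lane)) == 1:                  # only one strain left in the lane
            return lane[0], step
    return None, max_steps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        winner, steps = simulate_lane_fixation(n, seed=42)
        # steps / n is a rough measure of elapsed generations
        print(n, winner, steps, steps / n)
```

Dividing the number of division events by the lane length gives a rough count of generations, which can be compared across lane lengths. The sketch covers a single lane only and says nothing about the slower competition among lanes or about strains of different fitness.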