The human microbiome is the ensemble of genes in the microbes that live inside and on the surface of humans. Because microbial sequencing information is now much easier to come by than phenotypic information, there has been an explosion of sequencing and genetic analysis of microbiome samples. Much of the analytical work for these sequences involves phylogenetics, at least indirectly, but the methodology has developed in a somewhat different direction than for other applications of phylogenetics. In this paper I review the field and its methods from the perspective of a phylogeneticist, and describe current challenges for phylogenetics arising from this type of work.
Genetic studies of human traits have revolutionized our understanding of the variation between individuals, and opened the door for numerous breakthroughs in biology, medicine and other scientific fields. And yet, the ultimate promise of this area of research is still not fully realized. In this review, we highlight the major open problems that need to be solved to improve our understanding of the genetic variation underlying human traits, and by discussing these challenges provide a primer to the field. Our focus is on concrete analytical problems, both conceptual and technical in nature. We cover general issues in genetic studies such as population structure, epistasis and gene-environment interactions, data-related issues such as ethnic diversity and rare genetic variants, and specific challenges related to heritability estimates, genetic association studies and polygenic risk scores. We emphasize the interconnectedness of these open problems and suggest promising avenues to address them.
The role of positive selection in human evolution remains controversial. On the one hand, scans for positive selection have identified hundreds of candidate loci and the genome-wide patterns of polymorphism show signatures consistent with frequent positive selection. On the other hand, recent studies have argued that many of the candidate loci are false positives and that most apparent genome-wide signatures of adaptation are in fact due to reduction of neutral diversity by linked recurrent deleterious mutations, known as background selection. Here we analyze human polymorphism data from the 1000 Genomes Project (Abecasis et al. 2012) and detect signatures of pervasive positive selection once we correct for the effects of background selection. We show that levels of neutral polymorphism are lower near amino acid substitutions, with the strongest reduction observed specifically near functionally consequential amino acid substitutions. Furthermore, amino acid substitutions are associated with signatures of recent adaptation that should not be generated by background selection, such as the presence of unusually long and frequent haplotypes and specific distortions in the site frequency spectrum. We use forward simulations to show that the observed signatures require a high rate of strongly adaptive substitutions in the vicinity of the amino acid changes. We further demonstrate that the observed signatures of positive selection correlate more strongly with the presence of regulatory sequences, as predicted by ENCODE (Gerstein et al. 2012), than with the positions of amino acid substitutions. Our results establish that adaptation was frequent in human evolution and provide support for the hypothesis of King and Wilson (1975) that adaptive divergence is primarily driven by regulatory changes.
Despite numerous mass extinctions in the Phanerozoic eon, the overall trend of biodiversity evolution was never halted and life was never wiped out. Almost every conceivable catastrophic event (large igneous provinces, asteroid impacts, climate change, regression and transgression, anoxia, acidification, sudden release of methane clathrate, multiple causes, etc.) has been proposed to explain the mass extinctions. However, we should first clarify at what timescale and at what levels the mass extinctions should be explained. Even though the mass extinctions occurred on short timescales and at the species level, we show that their cause should be explained in a broader context, at the tectonic timescale and at both the molecular level and the species level. The main result of this paper is an explanation of Phanerozoic biodiversity evolution obtained by reconstructing the Sepkoski curve from climatic, eustatic and genomic data. Consequently, we point out that the P-Tr extinction was caused by tectonically originated climate instability. We also clarify that the overall trend of biodiversification originated from the underlying genome size evolution, and that the fluctuations of biodiversity originated from the interactions among the Earth's spheres. Evolution at the molecular level played a significant role in the survival of life through environmental disasters.
Covarion models of character evolution describe inhomogeneities in substitution processes through time. In phylogenetics, such models are used to describe changing functional constraints or selection regimes during the evolution of biological sequences. In this work the identifiability of such models for generic parameters on a known phylogenetic tree is established, provided the number of covarion classes does not exceed the size of the observable state space. "Generic parameters" as used here means all parameters except possibly those in a set of measure zero within the parameter space. Combined with earlier results, this implies both the tree and generic numerical parameters are identifiable if the number of classes is strictly smaller than the number of observable states.
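To make the state space of such a model concrete, the following is a minimal sketch of how a covarion-style rate matrix can be assembled on pairs (hidden class, observable state). It assumes one common formulation, in which each class has its own substitution rate matrix and a separate rate matrix governs switching between classes while the observable state stays fixed; the abstract does not spell out the exact parameterization, and the function names and example rates are hypothetical.

```python
import numpy as np


def jukes_cantor(n, rate):
    """Rate matrix with equal off-diagonal rates; every row sums to zero."""
    return rate * (np.ones((n, n)) - n * np.eye(n))


def covarion_rate_matrix(class_rates, switch_rates):
    """Assemble a rate matrix on (class, state) pairs.

    class_rates: list of k (n x n) rate matrices, one per hidden class.
    switch_rates: (k x k) rate matrix for changing class while the
        observable state is unchanged.
    Index class_index * n + state_index addresses the pair (class, state).
    """
    n = class_rates[0].shape[0]
    # Class switching: block (i, j) is switch_rates[i, j] times the identity.
    Q = np.kron(np.asarray(switch_rates, dtype=float), np.eye(n))
    # Within-class substitution: add each class's matrix to its diagonal block.
    for i, R in enumerate(class_rates):
        Q[i * n:(i + 1) * n, i * n:(i + 1) * n] += R
    return Q


# Assumed example: 4 observable states and 2 classes ("on" evolves,
# "off" is frozen), so the number of classes does not exceed the
# number of observable states.
n_states = 4
rates = [jukes_cantor(n_states, 1.0), jukes_cantor(n_states, 0.0)]
switch = np.array([[-0.3, 0.3],
                   [0.2, -0.2]])
Q = covarion_rate_matrix(rates, switch)
print(Q.shape)
```

Only the state index (position modulo the number of observable states) is seen at the leaves of the tree; the class index is hidden, which is why identifiability of the parameters is a nontrivial question.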
Human-associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics and an important goal of microbial ecology is to infer the interactions between species from sequence data. Any algorithm for inferring species interactions must overcome three obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in time-series models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions. Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with bootstrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct keystone species, Bacteroides fragilis and Bacteroides stercoris, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human gut microbiome.
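The idea of combining sparse regression with bootstrap aggregation for a discrete-time Lotka-Volterra model can be illustrated with a simplified sketch. This is not the authors' implementation: it assumes absolute rather than relative abundances, a Ricker-type update in which the log growth ratio is linear in the abundances, greedy forward selection as the sparse regression step, and a median over random train/test splits as the aggregation step. All function names, the 3% improvement threshold, and the synthetic interaction matrix are hypothetical.

```python
import numpy as np


def simulate(C, u, steps, rng, noise=0.1):
    """Ricker-type discrete Lotka-Volterra dynamics with process noise."""
    X = np.empty((steps, len(u)))
    X[0] = u
    for t in range(steps - 1):
        growth = C @ (X[t] - u) + noise * rng.standard_normal(len(u))
        X[t + 1] = X[t] * np.exp(growth)
    return X


def _fit(X_tr, y_tr, X_te, y_te, cols):
    """Least squares on the chosen columns plus intercept: (coefs, test MSE)."""
    A_tr = np.column_stack([X_tr[:, cols], np.ones(len(X_tr))])
    beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    A_te = np.column_stack([X_te[:, cols], np.ones(len(X_te))])
    return beta[:-1], np.mean((A_te @ beta - y_te) ** 2)


def forward_stepwise(X_tr, y_tr, X_te, y_te, self_idx, min_gain=0.03):
    """Add predictors greedily while held-out error drops by at least min_gain."""
    n = X_tr.shape[1]
    active = [self_idx]  # the self-interaction is always included
    coef, err = _fit(X_tr, y_tr, X_te, y_te, active)
    while len(active) < n:
        trials = [(j, *_fit(X_tr, y_tr, X_te, y_te, active + [j]))
                  for j in range(n) if j not in active]
        j, new_coef, new_err = min(trials, key=lambda t: t[2])
        if err - new_err <= min_gain * err:
            break
        active, coef, err = active + [j], new_coef, new_err
    row = np.zeros(n)
    row[active] = coef
    return row


def infer_interactions(X, n_boot=30, seed=0):
    """Median of sparse fits over random train/test splits, one row per species."""
    rng = np.random.default_rng(seed)
    preds = X[:-1]
    resp = np.log(X[1:]) - np.log(X[:-1])  # log growth ratio per time step
    T, n = preds.shape
    C_hat = np.zeros((n, n))
    for i in range(n):
        rows = []
        for _ in range(n_boot):
            perm = rng.permutation(T)
            tr, te = perm[:T // 2], perm[T // 2:]
            rows.append(forward_stepwise(preds[tr], resp[tr, i],
                                         preds[te], resp[te, i], i))
        C_hat[i] = np.median(rows, axis=0)
    return C_hat


# Synthetic check with an assumed sparse interaction matrix.
C_true = np.array([[-0.5, 0.0, 0.4],
                   [0.0, -0.5, 0.0],
                   [-0.4, 0.0, -0.5]])
X = simulate(C_true, u=np.ones(3), steps=400, rng=np.random.default_rng(1))
print(np.round(infer_interactions(X), 2))
```

Taking the median across splits means an interaction is reported as nonzero only if it is selected in more than half of the splits, which is one way to suppress spurious links that arise from correlation alone.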