To analyze data from cancer genome sequencing projects, we need to be able to distinguish causative, or driver, mutations from passenger mutations that have no selective effect. Toward this end, we prove results concerning the frequency of neutral mutations in exponentially growing multitype branching processes that have been widely used in cancer modeling. Our results yield a simple new population genetics result for the site frequency spectrum of a sample from an exponentially growing population.
Spatial constraints such as rigid barriers affect the dynamics of cell populations, potentially altering the course of natural evolution. In this paper, we study the population genetics of Escherichia coli proliferating in microchannels with open ends. Our experiments reveal that competition between two fluorescently labeled E. coli strains growing in a microchannel generates a stripe pattern aligned with the axial direction of the channel. To account for this observation, we study a lattice population model in which reproducing cells push entire lanes of cells towards the open ends of the channel. By combining mathematical theory, numerical simulations, and experiments, we find that the fixation dynamics is extremely fast along the axial direction, with a logarithmic dependence on the number of cells per lane. In contrast, competition among lanes is a much slower process. We also demonstrate that random mutations that appear in the middle and at the boundaries of the channel are highly likely to reach fixation. By theoretically studying competition between strains of different fitness, we find that the population structure in such a spatially confined system strongly suppresses selection.
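The lane-pushing mechanism described above can be illustrated with a toy simulation of a single lane. This is only a sketch under stated assumptions, not the model analysed in the paper: the function name `simulate_lane`, the uniform choice of the dividing cell, and the rule that each division pushes cells towards a uniformly random open end are all hypothetical simplifications.

```python
import random


def simulate_lane(lane, rng, max_steps=1_000_000):
    """Toy single-lane model (assumed rules, see lead-in).

    At each step a uniformly chosen cell divides. Its daughter is placed
    next to it, the cells between the daughter and one open end shift
    outwards by one site, and the cell at that end leaves the channel.
    Returns the final lane and the number of division events.
    """
    lane = list(lane)
    n = len(lane)
    steps = 0
    while len(set(lane)) > 1 and steps < max_steps:
        i = rng.randrange(n)
        if rng.random() < 0.5:
            # Push towards the left end: daughter goes to the left of the mother.
            lane.insert(i, lane[i])
            lane.pop(0)
        else:
            # Push towards the right end: daughter goes to the right of the mother.
            lane.insert(i + 1, lane[i])
            lane.pop()
        steps += 1
    return lane, steps


if __name__ == "__main__":
    rng = random.Random(1)
    final_lane, n_steps = simulate_lane(["A", "B"] * 10, rng)
    print(final_lane[0], n_steps)
```

Running the function for increasing lane lengths and averaging the step count per cell gives a rough feel for how fixation time along a lane scales in this toy version; the scaling reported in the abstract refers to the authors' own model.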
Probability modelling for DNA sequence evolution is well established and provides a rich framework for understanding genetic variation between samples of individuals from one or more populations. We show that both classical and more recent models for coalescence (with or without recombination) can be described in terms of the so-called phase-type theory, where complicated and tedious calculations are circumvented by the use of matrices. The application of phase-type theory consists of describing the stochastic model as a Markov model by appropriately setting up a state space and calculating the corresponding intensity and reward matrices. Formulae of interest are then expressed in terms of these aforementioned matrices. We illustrate this by a few examples calculating the mean, variance and even higher order moments of the site frequency spectrum in the multiple merger coalescent models, and by analysing the mean and variance for the number of segregating sites for multiple samples in the two-locus ancestral recombination graph. We believe that phase-type theory has great potential as a tool for analysing probability models in population genetics. The compact matrix notation is useful for clarification of current models, in particular their formal manipulation (calculation), but also for further development or extensions.
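As a minimal sketch of the matrix recipe described above, the following example uses the standard Kingman coalescent rather than the multiple merger or recombination models treated in the paper. It builds the sub-intensity matrix S over the states "k ancestral lineages", forms the Green matrix U = (-S)^{-1} implicitly by back substitution, and uses the general phase-type identities E[Y] = αUr and E[Y^2] = 2αUΔ(r)Ur for a reward vector r. All function names are hypothetical, and time is measured in the usual coalescent units in which each pair of lineages merges at rate 1.

```python
from fractions import Fraction


def kingman_subintensity(n):
    """Sub-intensity matrix over transient states n, n-1, ..., 2 lineages."""
    states = list(range(n, 1, -1))
    m = len(states)
    S = [[Fraction(0)] * m for _ in range(m)]
    for i, k in enumerate(states):
        rate = Fraction(k * (k - 1), 2)  # total coalescence rate with k lineages
        S[i][i] = -rate
        if i + 1 < m:
            S[i][i + 1] = rate  # jump from k to k-1 lineages
    return states, S


def green_times(S, v):
    """Return U v with U = (-S)^{-1}, using back substitution (S is upper triangular)."""
    m = len(v)
    x = [Fraction(0)] * m
    for i in range(m - 1, -1, -1):
        tail = sum(-S[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (v[i] - tail) / (-S[i][i])
    return x


def phase_type_mean_var(n, reward=lambda k: 1):
    """Mean and variance of the reward-weighted absorption time.

    reward(k) = 1 gives the time to the most recent common ancestor;
    reward(k) = k gives the total branch length of the tree.
    """
    states, S = kingman_subintensity(n)
    r = [Fraction(reward(k)) for k in states]
    w1 = green_times(S, r)                                   # U r
    w2 = green_times(S, [ri * wi for ri, wi in zip(r, w1)])  # U diag(r) U r
    mean = w1[0]            # initial distribution puts all mass on state n
    second_moment = 2 * w2[0]
    return mean, second_moment - mean ** 2


if __name__ == "__main__":
    print(phase_type_mean_var(4))                        # height of the tree
    print(phase_type_mean_var(4, reward=lambda k: k))    # total branch length
```

Exact rational arithmetic is used so that the output can be compared directly with the classical closed-form expressions; for larger state spaces, such as those arising with recombination, a general linear solver would replace the back substitution.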
The key findings of classical population genetics are derived within an information-theoretic framework based on the entropies of the allele frequency distribution. The common results for drift, mutation, selection, and gene flow will be rewritten in terms of information-theoretic measures and used to draw the classic conclusions for balance conditions and common features of one-locus dynamics. Linkage disequilibrium will also be discussed, including the relationship between mutual information and r^2 and a simple model of hitchhiking.
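To make the quantities above concrete, here is a small sketch that computes the Shannon entropy of an allele frequency distribution, and the mutual information and r^2 between two biallelic loci from their four haplotype frequencies. The function names are hypothetical. The comparison with r^2/2 relies on the standard second-order expansion of mutual information (in nats) for weak association; it is offered as an illustration and is not necessarily the exact relationship derived in the paper.

```python
import math


def entropy(freqs):
    """Shannon entropy (in nats) of a frequency distribution."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, r^2, mutual information in nats) for two biallelic loci."""
    p_A = p_AB + p_Ab          # frequency of allele A at the first locus
    p_B = p_AB + p_aB          # frequency of allele B at the second locus
    D = p_AB - p_A * p_B       # coefficient of linkage disequilibrium
    r2 = D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    cells = [
        (p_AB, p_A, p_B),
        (p_Ab, p_A, 1 - p_B),
        (p_aB, 1 - p_A, p_B),
        (p_ab, 1 - p_A, 1 - p_B),
    ]
    # Mutual information: sum of p_ij * log(p_ij / (p_i * q_j)) over haplotypes.
    mi = sum(p * math.log(p / (pi * qj)) for p, pi, qj in cells if p > 0)
    return D, r2, mi


if __name__ == "__main__":
    D, r2, mi = ld_measures(0.26, 0.24, 0.24, 0.26)
    print(f"D = {D:.4f}, r^2 = {r2:.6f}, I = {mi:.6f}, r^2/2 = {r2 / 2:.6f}")
```

For weak linkage disequilibrium the printed mutual information and r^2/2 agree closely, while at linkage equilibrium both vanish.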
Microbial metabolic networks perform the basic function of harvesting energy from nutrients to generate the work and free energy required for survival, growth and replication. The robust physiological outcomes they generate across vastly different organisms in spite of major environmental and genetic differences represent an especially remarkable trait. Most notably, it suggests that metabolic activity in bacteria may follow universal principles, the search for which is a long-standing issue. Most theoretical approaches to modeling metabolism assume that cells optimize specific evolutionarily-motivated objective functions (like their growth rate) under general physico-chemical or regulatory constraints. While conceptually and practically useful in many situations, the idea that certain objectives are optimized is hard to validate in data. Moreover, it is not clear how optimality can be reconciled with the degree of single-cell variability observed within microbial populations. To shed light on these issues, we propose here an inverse modeling framework that connects fitness to variability through the Maximum-Entropy guided inference of metabolic flux distributions from data. While no clear optimization emerges, we find that, as the medium gets richer, Escherichia coli populations slowly approach the theoretically optimal performance defined by minimal reduction of phenotypic variability at given mean growth rate. Inferred flux distributions provide multiple biological insights, including on metabolic reactions that are experimentally inaccessible. These results suggest that bacterial metabolism is crucially shaped by a population-level trade-off between fitness and cell-to-cell heterogeneity.
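The link between mean fitness and variability can be illustrated with a toy Maximum-Entropy calculation. This sketch assumes a small discrete set of phenotypes with known growth rates, which is a strong simplification of inferring distributions over a space of metabolic fluxes; the function names and the example growth rates are hypothetical. Under a constraint on the mean growth rate alone, the maximum-entropy distribution has the Boltzmann form p_i ∝ exp(β λ_i), and β can be found by bisection because the mean growth rate increases monotonically with β.

```python
import math


def maxent_distribution(growth_rates, beta):
    """Boltzmann-form distribution p_i proportional to exp(beta * lambda_i)."""
    shift = max(beta * g for g in growth_rates)  # subtract max for numerical stability
    weights = [math.exp(beta * g - shift) for g in growth_rates]
    z = sum(weights)
    return [w / z for w in weights]


def mean_growth(growth_rates, beta):
    p = maxent_distribution(growth_rates, beta)
    return sum(pi * g for pi, g in zip(p, growth_rates))


def fit_beta(growth_rates, target_mean, lo=-200.0, hi=200.0, iters=200):
    """Find beta so that the MaxEnt distribution has the requested mean growth rate."""
    if not min(growth_rates) < target_mean < max(growth_rates):
        raise ValueError("target mean must lie strictly between min and max growth rate")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_growth(growth_rates, mid) < target_mean:
            lo = mid   # mean too small: increase beta
        else:
            hi = mid
    return 0.5 * (lo + hi)


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


if __name__ == "__main__":
    rates = [0.1, 0.4, 0.7, 1.0]   # hypothetical growth rates (arbitrary units)
    beta = fit_beta(rates, target_mean=0.8)
    p = maxent_distribution(rates, beta)
    print(beta, p, entropy(p))
```

In this toy setting, raising the target mean growth rate forces a larger β and a lower entropy, which is the kind of trade-off between fitness and heterogeneity that the abstract refers to.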
Many questions that we have about the history and dynamics of organisms have a geographical component: How many are there, and where do they live? How do they move and interbreed across the landscape? How were they moving a thousand years ago, and where were the ancestors of a particular individual alive today? Answers to these questions can have profound consequences for our understanding of history, ecology, and the evolutionary process. In this review, we discuss how geographic aspects of the distribution, movement, and reproduction of organisms are reflected in their pedigree across space and time. Because the structure of the pedigree is what determines patterns of relatedness in modern genetic variation, our aim is thus to provide intuition for how these processes leave an imprint in genetic data. We also highlight some current methods and gaps in the statistical toolbox of spatial population genetics.