Possibilities for using geometry and topology to analyze statistical problems in biology raise a host of novel questions in geometry, probability, algebra, and combinatorics that demonstrate the power of biology to influence the future of pure mathematics. This expository article is a tour through some biological explorations and their mathematical ramifications. The article starts with the evolution of novel topological features in the wing veins of fruit flies, which are quantified using the algebraic structure of multiparameter persistent homology. The statistical issues involved highlight the mathematical implications of sampling from moduli spaces. These lead to geometric probability on stratified spaces, including the sticky phenomenon for Fréchet means and the origin of this mathematical area in the reconstruction of phylogenetic trees.
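The sticky phenomenon can be seen on the simplest stratified space: a "spider" of several half-lines (legs) glued at a common origin. For three legs this is the shape of the space of rooted trees on three leaves when only the interior edge length is tracked. Two points on the same leg at distances r and s from the origin are |r - s| apart; points on different legs are r + s apart. On leg j, the sum of squared distances to a point at distance t is minimized at the "folded" mean, where points on leg j count as +r and all others as -r. At most one leg can have a positive folded mean; if none does, the Fréchet mean sits at the origin. The sketch below is our own illustration of this fact, not code from the article, and `spider_frechet_mean` is a hypothetical helper name.

```python
def spider_frechet_mean(points, n_legs):
    """Fréchet mean of points on a spider: n_legs half-lines glued at an origin.

    Each point is (leg, r) with 0 <= leg < n_legs and r >= 0 its distance
    from the origin. Returns (leg, t) for the mean, or (None, 0.0) if the
    mean sits at the origin.
    """
    if not points:
        raise ValueError("need at least one point")
    for p_leg, r in points:
        if not (0 <= p_leg < n_legs) or r < 0:
            raise ValueError(f"invalid point {(p_leg, r)}")
    n = len(points)
    for leg in range(n_legs):
        # Restricted to this leg, the sum of squared distances is minimized at
        # the folded mean: +r for points on the leg, -r for points elsewhere.
        folded = sum(r if p_leg == leg else -r for p_leg, r in points) / n
        if folded > 0:
            # At most one leg can have a positive folded mean.
            return (leg, folded)
    # No leg pulls the mean off the origin: it "sticks" there.
    return (None, 0.0)


balanced = [(0, 1.0), (1, 1.0), (2, 1.0)]
nudged = [(0, 1.5), (1, 1.0), (2, 1.0)]   # move one point outward a little
pulled = [(0, 3.0), (1, 1.0), (2, 1.0)]   # move it much farther out
for sample in (balanced, nudged, pulled):
    print(spider_frechet_mean(sample, n_legs=3))
```

The first two samples both give `(None, 0.0)`: perturbing one point does not move the mean at all, which is what "sticky" means. Only the third sample, where one leg carries more than half of the total distance, pulls the mean onto leg 0, at distance 1/3.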
Antimicrobial resistance is an emerging global health crisis that is undermining advances in modern medicine and, if unmitigated, threatens to kill 10 million people per year worldwide by 2050. Research over the last decade has demonstrated that differences between genetically identical cells in the same environment can lead to drug resistance. Fluctuations in gene expression, modulated by gene regulatory networks, can produce non-genetic heterogeneity that results in the fractional killing of microbial populations, causing drug therapies to fail; this non-genetic drug resistance can also increase the probability of acquiring genetic drug resistance mutations. Mathematical models of gene networks can elucidate general principles underlying drug resistance, predict the evolution of resistance, and guide drug resistance experiments in the laboratory. Cells genetically engineered to carry synthetic gene networks that regulate drug resistance genes allow controlled, quantitative experiments on the role of non-genetic heterogeneity in the development of drug resistance. In this perspective article, we emphasize the roles that mathematical, computational, and synthetic gene network models play in advancing our understanding of antimicrobial resistance, with the goal of discovering effective therapies against drug-resistant infections.
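How noise alone can produce fractional killing is easy to see in a toy model. This is not a model from the article: we assume a drug-resistance protein expressed constitutively (a birth-death process with production rate k and per-molecule decay rate gamma), and we assume, hypothetically, that a cell survives the drug only if its copy number is at least some threshold. Every cell runs the same program, yet the stochastic simulation (Gillespie's algorithm) gives each cell a different copy number, so only a fraction survives. For this process the stationary copy-number distribution is Poisson with mean k/gamma, which gives an analytic check. All parameter values are illustrative.

```python
import math
import random


def gillespie_birth_death(k, gamma, t_end, rng):
    """Exact stochastic simulation (Gillespie SSA) of one cell's protein count.

    Reactions: 0 -> X at rate k (production), X -> 0 at rate gamma * x (decay).
    Returns the copy number x at time t_end, starting from x = 0.
    """
    t, x = 0.0, 0
    while True:
        a_birth = k
        a_death = gamma * x
        a_total = a_birth + a_death
        t += rng.expovariate(a_total)  # waiting time to the next reaction
        if t > t_end:
            return x
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1


def poisson_tail(lam, threshold):
    """P(X >= threshold) for X ~ Poisson(lam), the stationary law of this model."""
    term = math.exp(-lam)  # P(X = 0)
    below = 0.0
    for i in range(threshold):
        below += term
        term *= lam / (i + 1)  # P(X = i + 1)
    return 1.0 - below


def surviving_fraction(n_cells, k, gamma, t_end, threshold, seed=0):
    """Fraction of identical cells whose copy number is >= threshold at t_end."""
    rng = random.Random(seed)
    copies = [gillespie_birth_death(k, gamma, t_end, rng) for _ in range(n_cells)]
    return sum(x >= threshold for x in copies) / n_cells


# Hypothetical parameters: mean copy number k / gamma = 20, kill threshold 25.
simulated = surviving_fraction(1000, k=2.0, gamma=0.1, t_end=100.0, threshold=25)
print(f"simulated survivors: {simulated:.3f}, "
      f"stationary prediction: {poisson_tail(20.0, 25):.3f}")
```

With these assumed values only a minority of the genetically identical population survives, and the simulated fraction agrees with the Poisson tail; t_end is chosen much longer than 1/gamma so that cells have reached the stationary distribution.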
Computer simulations have become an important tool across the biomedical sciences and beyond. For many important problems, several different models or hypotheses exist, and choosing which one best describes reality or observed data is not straightforward. We therefore require suitable statistical tools that allow us to choose rationally between different mechanistic models of, e.g., signal transduction or gene regulation networks. This is particularly challenging in systems biology, where only a small number of molecular species can be assayed at any given time and all measurements are subject to uncertainty. Here we develop such a model selection framework based on approximate Bayesian computation (ABC) and employing sequential Monte Carlo (SMC) sampling. We show that our approach can be applied across a wide range of biological scenarios, and we illustrate its use on real data describing influenza dynamics and the JAK-STAT signalling pathway. Bayesian model selection strikes a balance between the complexity of the simulation models and their ability to describe observed data. The present approach enables us to apply the full formal apparatus to any system that can be (efficiently) simulated, even when exact likelihoods are computationally intractable.
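The framework described above uses approximate Bayesian computation with sequential Monte Carlo sampling. The sketch below shows only the simpler rejection version of ABC model selection, which shares the core idea: draw a model index from its prior, draw that model's parameters from their prior, simulate, and accept the draw if the simulated data lie within a tolerance epsilon of the observed data. The fraction of accepted draws belonging to each model approximates its posterior probability, and no likelihood is ever evaluated. The two toy decay models, their uniform priors, the noise level, and epsilon are illustrative assumptions, not the influenza or JAK-STAT models.

```python
import math
import random

TIMES = list(range(11))
X0 = 10.0        # assumed known initial value
NOISE_SD = 0.1   # assumed Gaussian measurement noise


def simulate_exp(k, rng):
    """Toy model 1: exponential decay x0 * exp(-k t) plus measurement noise."""
    return [X0 * math.exp(-k * t) + rng.gauss(0.0, NOISE_SD) for t in TIMES]


def simulate_lin(a, rng):
    """Toy model 2: linear decay max(0, x0 - a t) plus measurement noise."""
    return [max(0.0, X0 - a * t) + rng.gauss(0.0, NOISE_SD) for t in TIMES]


def distance(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Each model: (simulator, parameter prior sampler). Uniform priors are assumptions.
MODELS = {
    "exp_decay": (simulate_exp, lambda rng: rng.uniform(0.0, 1.0)),
    "linear_decay": (simulate_lin, lambda rng: rng.uniform(0.0, 2.0)),
}


def abc_model_selection(observed, epsilon, n_draws, seed=0):
    """ABC rejection: approximate posterior model probabilities."""
    rng = random.Random(seed)
    names = list(MODELS)
    accepted = {name: 0 for name in names}
    for _ in range(n_draws):
        name = rng.choice(names)              # uniform prior over models
        simulate, sample_prior = MODELS[name]
        theta = sample_prior(rng)             # parameter from that model's prior
        if distance(simulate(theta, rng), observed) <= epsilon:
            accepted[name] += 1
    total = sum(accepted.values())
    if total == 0:
        raise RuntimeError("no acceptances; increase epsilon or n_draws")
    return {name: count / total for name, count in accepted.items()}


# Synthetic "observed" data generated from the exponential model with k = 0.3.
observed = simulate_exp(0.3, random.Random(42))
print(abc_model_selection(observed, epsilon=1.0, n_draws=20000))
```

Shrinking epsilon makes the approximation more faithful but lowers the acceptance rate; the SMC scheme addresses exactly this by moving through a decreasing sequence of tolerances instead of sampling every draw from the prior.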
Synthetic biology is the engineering of cellular networks. It combines principles of engineering and knowledge of biological networks to program the behavior of cells. Computational modeling techniques, in conjunction with molecular biology techniques, have been used successfully to construct biological devices such as switches, oscillators, and gates. The ambition of synthetic biology is to construct complex systems from such fundamental devices, much as electronic circuits are built from basic parts. As this ambition becomes a reality, engineering concepts such as interchangeable parts and encapsulation will find their way into biology, and computational tools are needed to support them. To meet this need, we have developed the software Athena, which allows biological models to be constructed as modules. Modules can be connected to one another without altering the modules themselves. In addition, Athena houses various tools useful for designing synthetic networks, including tools to perform simulations, automatically derive transcription rate expressions, and view and edit synthetic DNA sequences. New tools can be incorporated into Athena without modifying the existing program via a plugin interface, IronPython scripts, Systems Biology Workbench interfacing, and the R statistical language. The program currently runs on Windows operating systems, and the source code for Athena is made freely available through CodePlex, www.codeplex.com/athena.
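The idea of connecting modules without altering them can be sketched as follows. This is a hypothetical illustration, not Athena's API or model format: a module exposes some species as ports, and connecting modules only renames ports to shared names while namespacing everything else, so the module definitions themselves are never modified. The classes `Reaction` and `Module`, the function `instantiate`, and the `inverter` rate law are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    reactants: tuple[str, ...]
    products: tuple[str, ...]
    rate: str  # rate-law template; species and parameters written as {name}


@dataclass(frozen=True)  # frozen: a module cannot be altered after creation
class Module:
    name: str
    species: tuple[str, ...]
    parameters: tuple[str, ...]
    ports: tuple[str, ...]  # species that may be connected to other modules
    reactions: tuple[Reaction, ...]


def instantiate(module, wiring):
    """Return renamed copies of a module's reactions.

    Ports listed in `wiring` take the shared name given there; every other
    species and parameter becomes '<module>.<name>' so modules cannot collide.
    """
    unknown = set(wiring) - set(module.ports)
    if unknown:
        raise ValueError(f"not ports of {module.name}: {sorted(unknown)}")
    names = {s: f"{module.name}.{s}" for s in module.species + module.parameters}
    names.update(wiring)
    return [
        Reaction(
            tuple(names[s] for s in r.reactants),
            tuple(names[s] for s in r.products),
            r.rate.format(**names),
        )
        for r in module.reactions
    ]


def inverter(name):
    """Hypothetical repressor module: 'inp' represses production of 'out'."""
    return Module(
        name=name,
        species=("inp", "out"),
        parameters=("k", "d"),
        ports=("inp", "out"),
        reactions=(
            Reaction((), ("out",), "{k}/(1+{inp}**2)"),  # repressed production
            Reaction(("out",), (), "{d}*{out}"),        # degradation
        ),
    )


# Wire two inverters into a toggle switch: each output represses the other.
a, b = inverter("A"), inverter("B")
toggle = instantiate(a, {"inp": "v", "out": "u"}) + instantiate(b, {"inp": "u", "out": "v"})
for r in toggle:
    print(r.reactants, "->", r.products, "rate:", r.rate)
```

Because connection happens through renaming at instantiation time, the same `inverter` definition can be reused in any circuit, which is the encapsulation property the paragraph above describes.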
Data science has emerged from the proliferation of digital data, coupled with advances in algorithms, software and hardware (e.g., GPU computing). Innovations in structural biology have been driven by similar factors, spurring us to ask: can these two fields impact one another in deep and hitherto unforeseen ways? We posit that the answer is yes. New biological knowledge lies in the relationships between sequence, structure, function and disease, all of which play out on the stage of evolution, and data science enables us to elucidate these relationships at scale. Here, we consider this question through the five key pillars of data science: acquisition, engineering, analytics, visualization and policy, with an emphasis on machine learning as the premier analytics approach.
Cellular heterogeneity is an immanent property of biological systems that spans very different aspects of life, including genetic diversity, cell-to-cell variability driven by stochastic molecular interactions, and noise-induced cell differentiation. Here, we review recent developments in characterizing cellular heterogeneity by distributions and argue that understanding multicellular life requires the analysis of heterogeneity dynamics at single-cell resolution by integrative approaches that combine methods from non-equilibrium statistical physics, information theory and omics biology.
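Characterizing heterogeneity by distributions, rather than by population averages, can be illustrated with a few standard descriptors of single-cell data: the Fano factor (variance over mean), the squared coefficient of variation (CV², variance over mean squared), and the Shannon entropy of the empirical distribution. The choice of these three descriptors and the helper name `heterogeneity_summary` are our own assumptions for illustration, not methods taken from the review. The plug-in entropy used here is biased downward for small samples.

```python
import math
from collections import Counter


def heterogeneity_summary(counts):
    """Distribution-level summaries of single-cell counts (one value per cell)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two cells")
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)  # sample variance
    freqs = Counter(counts)
    # Plug-in Shannon entropy of the empirical distribution, in bits.
    entropy = -sum((m / n) * math.log2(m / n) for m in freqs.values())
    return {
        "mean": mean,
        "variance": var,
        "fano": var / mean if mean > 0 else float("nan"),
        "cv2": var / mean ** 2 if mean > 0 else float("nan"),
        "entropy_bits": entropy,
    }


# Two hypothetical populations with the same mean (10) but different spread.
narrow = [9, 10, 10, 11, 10, 9, 11, 10]
broad = [0, 20, 5, 15, 10, 2, 18, 10]
for label, cells in (("narrow", narrow), ("broad", broad)):
    print(label, heterogeneity_summary(cells))
```

Both populations report the same mean, so an averaged measurement cannot tell them apart, while the variance-based descriptors and the entropy separate them clearly.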