Motivated by applications in systems biology, we seek a probabilistic framework based on Markov processes to represent intracellular processes. We review the formal relationships between different stochastic models referred to in the systems biology literature. As part of this review, we present a novel derivation of the differential Chapman-Kolmogorov equation for a general multidimensional Markov process made up of both continuous and jump processes. We start with the definition of a time-derivative for a probability density but place no restrictions on the probability distribution; in particular, we do not assume it to be confined to a region that has a surface (on which the probability is zero). In our derivation, the master equation gives the jump part of the Markov process while the Fokker-Planck equation gives the continuous part. We thereby sketch a "family tree" for stochastic models in systems biology, providing explicit derivations of their formal relationships and clarifying the assumptions involved.
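For orientation, the differential Chapman-Kolmogorov equation is usually written in the following textbook form. The notation is an assumption on our part and is not taken from the abstract: $A_i$ denotes the drift vector, $B_{ij}$ the diffusion matrix, and $W(z \mid x, t)$ the jump rate from $x$ to $z$.

```latex
\frac{\partial}{\partial t}\, p(z,t \mid y,t')
  = -\sum_i \frac{\partial}{\partial z_i}
      \bigl[ A_i(z,t)\, p(z,t \mid y,t') \bigr]
    + \frac{1}{2} \sum_{i,j} \frac{\partial^2}{\partial z_i\, \partial z_j}
      \bigl[ B_{ij}(z,t)\, p(z,t \mid y,t') \bigr]
    + \int \mathrm{d}x\, \bigl[ W(z \mid x,t)\, p(x,t \mid y,t')
      - W(x \mid z,t)\, p(z,t \mid y,t') \bigr]
```

Setting $W = 0$ leaves the Fokker-Planck equation (the continuous part), while setting $A = 0$ and $B = 0$ leaves the master equation (the jump part).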
Reproducibility and reusability of the results of data-based modeling studies are essential. Yet, so far, there has been no broadly supported format for the specification of parameter estimation problems in systems biology. Here, we introduce PEtab, a format which facilitates the specification of parameter estimation problems using Systems Biology Markup Language (SBML) models and a set of tab-separated value files describing the observation model and experimental data as well as the parameters to be estimated. We have already implemented PEtab support in eight well-established model simulation and parameter estimation toolboxes with hundreds of users in total. We provide a Python library for validation and modification of a PEtab problem, and currently 20 example parameter estimation problems based on recent studies. Specifications of PEtab, the PEtab Python library, as well as links to examples and all supporting software tools are available at https://github.com/PEtab-dev/PEtab; a snapshot is available at https://doi.org/10.5281/zenodo.3732958. All original content is available under permissive licenses.
Computer simulations have become an important tool across the biomedical sciences and beyond. For many important problems, several different models or hypotheses exist, and choosing which one best describes reality or observed data is not straightforward. We therefore require suitable statistical tools that allow us to choose rationally between different mechanistic models of, e.g., signal transduction or gene regulation networks. This is particularly challenging in systems biology, where only a small number of molecular species can be assayed at any given time and all measurements are subject to measurement uncertainty. Here we develop such a model selection framework based on approximate Bayesian computation and employing sequential Monte Carlo sampling. We show that our approach can be applied across a wide range of biological scenarios, and we illustrate its use on real data describing influenza dynamics and the JAK-STAT signalling pathway. Bayesian model selection strikes a balance between the complexity of the simulation models and their ability to describe observed data. The present approach enables us to apply the whole formal apparatus of Bayesian model selection to any system that can be (efficiently) simulated, even when exact likelihoods are computationally intractable.
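The idea of likelihood-free model selection can be illustrated with a minimal sketch. Note that this uses plain rejection sampling rather than the sequential Monte Carlo scheme named in the abstract, and that the two toy models, the priors, the summary statistics, and all function names are hypothetical choices made only for illustration.

```python
import random
import statistics


def simulate(model, theta, n, rng):
    # Two hypothetical toy models (assumptions, not from the abstract):
    # both draw Gaussian data centred on theta, with noise sd 1.0 (model 0)
    # or 3.0 (model 1).
    sd = 1.0 if model == 0 else 3.0
    return [rng.gauss(theta, sd) for _ in range(n)]


def summary(data):
    # Summary statistics used to compare simulated and observed data.
    return (statistics.fmean(data), statistics.pstdev(data))


def distance(s1, s2):
    # Largest absolute difference between corresponding summaries.
    return max(abs(a - b) for a, b in zip(s1, s2))


def abc_model_selection(observed, n_draws, eps, seed=0):
    """Rejection ABC over the joint space of (model, parameter).

    Returns approximate posterior model probabilities, or None if no
    simulation fell within eps of the observed summaries.
    """
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = [0, 0]
    for _ in range(n_draws):
        model = rng.randrange(2)        # uniform prior over the two models
        theta = rng.uniform(-5.0, 5.0)  # uniform prior on the parameter
        s_sim = summary(simulate(model, theta, len(observed), rng))
        if distance(s_sim, s_obs) <= eps:
            accepted[model] += 1
    total = sum(accepted)
    if total == 0:
        return None
    return [count / total for count in accepted]
```

The fraction of accepted draws belonging to each model approximates its posterior probability. No likelihood is evaluated at any point; only forward simulation is required.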
Models of biological systems often have many unknown parameters that must be determined in order for model behavior to match experimental observations. Commonly used methods for parameter estimation that return point estimates of the best-fit parameters are insufficient when models are high dimensional and under-constrained. As a result, Bayesian methods, which treat model parameters as random variables and attempt to estimate their probability distributions given data, have become popular in systems biology. Bayesian parameter estimation often relies on Markov Chain Monte Carlo (MCMC) methods to sample model parameter distributions, but the slow convergence of MCMC sampling can be a major bottleneck. One approach to improving performance is parallel tempering (PT), a physics-based method that uses swapping between multiple Markov chains run in parallel at different temperatures to accelerate sampling. The temperature of a Markov chain determines the probability of accepting an unfavorable move, so swapping with higher-temperature chains enables the sampling chain to escape from local minima. In this work we compared the MCMC performance of PT and the commonly used Metropolis-Hastings (MH) algorithm on six biological models of varying complexity. We found that for simpler models PT accelerated convergence and sampling, and that for more complex models PT often converged in cases where MH became trapped in non-optimal local minima. We also developed a freely available MATLAB package for Bayesian parameter estimation called PTempEst (http://github.com/RuleWorld/ptempest), which is closely integrated with the popular BioNetGen software for rule-based modeling of biological systems.
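The swapping mechanism described above can be sketched in a few lines. This is an illustrative Python toy, not the PTempEst implementation (which is a MATLAB package); the bimodal target, the temperature ladder, and all function names are assumptions introduced here.

```python
import math
import random


def log_post(x):
    # Hypothetical bimodal target: equal mixture of two narrow Gaussians
    # centred at -3 and +3 (unnormalised log density, log-sum-exp form).
    a = -0.5 * ((x + 3.0) / 0.5) ** 2
    b = -0.5 * ((x - 3.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def swap_log_ratio(logp_i, logp_j, temp_i, temp_j):
    # Log acceptance ratio for exchanging the states of chains i and j,
    # where chain k samples the tempered density p(x)^(1/T_k).
    return (1.0 / temp_i - 1.0 / temp_j) * (logp_j - logp_i)


def parallel_tempering(temps, n_steps, step=0.5, seed=0):
    rng = random.Random(seed)
    states = [-3.0 for _ in temps]  # every chain starts in the left mode
    samples = []
    for _ in range(n_steps):
        # Metropolis update within each chain; a higher temperature
        # flattens the target, so unfavourable moves are accepted more often.
        for k, temp in enumerate(temps):
            proposal = states[k] + rng.gauss(0.0, step)
            log_acc = (log_post(proposal) - log_post(states[k])) / temp
            if math.log(1.0 - rng.random()) < log_acc:
                states[k] = proposal
        # Propose exchanging the states of one random adjacent pair.
        k = rng.randrange(len(temps) - 1)
        log_r = swap_log_ratio(log_post(states[k]), log_post(states[k + 1]),
                               temps[k], temps[k + 1])
        if math.log(1.0 - rng.random()) < log_r:
            states[k], states[k + 1] = states[k + 1], states[k]
        samples.append(states[0])  # record only the T = 1 chain
    return samples
```

A call such as `parallel_tempering([1.0, 2.0, 4.0, 8.0, 16.0], 5000)` returns samples from the lowest-temperature chain. With a single temperature (`[1.0]` is not valid here, since a swap needs at least two chains) the scheme would reduce to ordinary Metropolis sampling, which tends to stay in the mode where it started.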
Although reproducibility is a core tenet of the scientific method, it remains challenging to reproduce many results. Surprisingly, this also holds true for computational results in domains such as systems biology where there have been extensive standardization efforts. For example, Tiwari et al. recently found that they could only repeat 50% of published simulation results in systems biology. Toward improving the reproducibility of computational systems research, we identified several resources that investigators can leverage to make their research more accessible, executable, and comprehensible by others. In particular, we identified several domain standards and curation services, as well as powerful approaches pioneered by the software engineering industry that we believe many investigators could adopt. Together, we believe these approaches could substantially enhance the reproducibility of systems biology research. In turn, we believe enhanced reproducibility would accelerate the development of more sophisticated models that could inform precision medicine and synthetic biology.
1. Movement is the primary means by which animals obtain resources and avoid hazards. Most movement exhibits directional bias that is related to environmental features (taxis), such as the location of food patches, predators, ocean currents, or wind. Numerous behaviours with directional bias can be characterized by maintaining orientation at an angle relative to the environmental stimuli (menotaxis), including navigation relative to sunlight or magnetic fields and energy-conserving flight across wind. However, no statistical methods exist to flexibly classify and characterize such directional bias. 2. We propose a biased correlated random walk model that can identify menotactic behaviours by predicting turning angle as a trade-off between directional persistence and directional bias relative to environmental stimuli, without making a priori assumptions about the angle of bias. We apply the model within the framework of a multi-state hidden Markov model (HMM) and describe methods to remedy information loss associated with coarse environmental data to improve the classification and parameterization of directional bias. 3. Using simulation studies, we illustrate how our method classifies behavioural states more accurately than conventional correlated random walk HMMs that do not incorporate directional bias. We illustrate the application of these methods by identifying crosswind olfactory foraging and drifting behaviour mediated by wind-driven sea ice drift in polar bears (Ursus maritimus) from movement data collected by satellite telemetry. 4. The extensions we propose can be readily applied to movement data to identify and characterize behaviours with directional bias toward any angle, and open up new avenues to investigate more mechanistic relationships between animal movement and the environment.