For many stochastic models of interest in systems biology, such as those describing biochemical reaction networks, exact quantification of parameter uncertainty through statistical inference is intractable. Likelihood-free computational inference techniques enable parameter inference when the likelihood function for the model is intractable but the generation of many sample paths is feasible through stochastic simulation of the forward problem. The most common likelihood-free method in systems biology is approximate Bayesian computation, which accepts parameters that result in a low discrepancy between stochastic simulations and measured data. However, it can be difficult to assess how the accuracy of the resulting inferences is affected by the choice of acceptance threshold and discrepancy function. The pseudo-marginal approach is an alternative likelihood-free inference method that utilises a Monte Carlo estimate of the likelihood function. This approach has several advantages, particularly in the context of the noisy, partially observed, time-course data typical of biochemical reaction network studies. Specifically, the pseudo-marginal approach facilitates exact inference and uncertainty quantification, and may be efficiently combined with particle filters for low-variance, high-accuracy likelihood estimation. In this review, we provide a practical introduction to the pseudo-marginal approach using inference for biochemical reaction networks as a series of case studies. Implementations of key algorithms and examples are provided using the Julia programming language, a high-performance, open-source language for scientific computing.
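The key idea in the abstract above is that an unbiased Monte Carlo estimate of the likelihood can be substituted into a Metropolis-Hastings acceptance ratio without changing the target posterior. The following is a minimal Python sketch of this pseudo-marginal construction (the review itself uses Julia). The toy model, the noise level, and the flat prior are all illustrative assumptions, not taken from the paper: we use data with a known Gaussian likelihood and deliberately replace it with a noisy but unbiased estimator to show that the chain still targets the correct posterior.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y_i ~ Normal(theta_true = 2, 1). The exact likelihood is known here,
# but we pretend it is intractable and use only a noisy unbiased estimate.
y = rng.normal(2.0, 1.0, size=50)

def log_lik(theta):
    # Exact Gaussian log-likelihood (stands in for an intractable quantity).
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * len(y) * np.log(2 * np.pi)

def log_lik_hat(theta, noise_sd=0.5):
    # Unbiased likelihood estimator: L_hat = L * exp(e - sd^2/2) with
    # e ~ Normal(0, sd^2), so E[L_hat] = L (log-normal correction term).
    e = rng.normal(0.0, noise_sd)
    return log_lik(theta) + e - 0.5 * noise_sd ** 2

def pseudo_marginal_mh(n_iters=5000, step=0.3):
    # Random-walk Metropolis-Hastings with the estimated likelihood.
    # A flat (improper) prior on theta is assumed for simplicity.
    theta, ll = 0.0, log_lik_hat(0.0)
    samples = []
    for _ in range(n_iters):
        prop = theta + step * rng.normal()
        ll_prop = log_lik_hat(prop)  # fresh estimate for the proposal only
        if np.log(rng.uniform()) < ll_prop - ll:
            # Crucially, the accepted estimate is stored with the state and
            # reused; re-estimating it each step would break exactness.
            theta, ll = prop, ll_prop
        samples.append(theta)
    return np.array(samples)

samples = pseudo_marginal_mh()
```

After discarding burn-in, the sample mean should sit near the true value of 2.0. The design point worth noting is the reuse of the stored likelihood estimate for the current state: this is what makes the algorithm "exact" despite the noise in each individual estimate.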
Computer simulations have become an important tool across the biomedical sciences and beyond. For many important problems, several different models or hypotheses exist, and choosing which one best describes reality or observed data is not straightforward. We therefore require suitable statistical tools that allow us to choose rationally between different mechanistic models of, for example, signal transduction or gene regulation networks. This is particularly challenging in systems biology, where only a small number of molecular species can be assayed at any given time and all measurements are subject to measurement uncertainty. Here we develop such a model selection framework based on approximate Bayesian computation and employing sequential Monte Carlo sampling. We show that our approach can be applied across a wide range of biological scenarios, and we illustrate its use on real data describing influenza dynamics and the JAK-STAT signalling pathway. Bayesian model selection strikes a balance between the complexity of the simulation models and their ability to describe observed data. The present approach enables us to apply the full formal apparatus of Bayesian model selection to any system that can be (efficiently) simulated, even when exact likelihoods are computationally intractable.
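The simplest instance of the framework described above is ABC model selection by rejection sampling: draw a model index from its prior, draw parameters from that model's prior, simulate, and accept when the simulation lies within a tolerance of the data; posterior model probabilities are then estimated by acceptance frequencies. The following Python sketch illustrates this under invented assumptions (two toy simulators, uniform priors, a sample-mean summary statistic); the paper itself uses the more efficient sequential Monte Carlo variant, which this sketch does not implement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed summary statistic (a sample mean) -- illustrative data only.
y_obs = 1.0

def simulate_m1(theta):
    # Model 1: thirty draws from Normal(theta, 1).
    return rng.normal(theta, 1.0, 30).mean()

def simulate_m2(theta):
    # Model 2: thirty draws from Exponential with rate theta.
    return rng.exponential(1.0 / theta, 30).mean()

def abc_model_selection(n=20000, eps=0.05):
    counts = {1: 0, 2: 0}
    for _ in range(n):
        m = int(rng.integers(1, 3))          # uniform prior over the two models
        if m == 1:
            theta = rng.uniform(-3.0, 3.0)   # assumed prior for model 1
            y_sim = simulate_m1(theta)
        else:
            theta = rng.uniform(0.1, 3.0)    # assumed prior for model 2
            y_sim = simulate_m2(theta)
        if abs(y_sim - y_obs) < eps:         # accept if simulation matches data
            counts[m] += 1
    total = counts[1] + counts[2]
    # Acceptance frequencies approximate posterior model probabilities.
    return {m: c / total for m, c in counts.items()}

probs = abc_model_selection()
```

The estimated probabilities depend on the tolerance `eps` and the choice of summary statistic, which is exactly the sensitivity that motivates the sequential, adaptive-tolerance scheme in the work described above.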
In this paper we suggest that, under suitable conditions, supervised learning can provide the basis for formulating quantitative questions about the phenotype structure of multicellular organisms at the microscopic level. The problem of explaining the robustness of the phenotype structure is rephrased as a real geometrical problem on a fixed domain. We further suggest a generalization of path integrals that reduces the problem of deciding whether a given molecular network can generate specific phenotypes to a numerical property of a robustness function with complex output, for which we give a heuristic justification. Finally, we use our formalism to interpret a distinctly quantitative problem in developmental biology concerning the allowed number of pairs of legs in centipedes.
We present a new experimental-computational technology for inferring network models that predict the response of cells to perturbations, and that may be useful in the design of combinatorial therapy against cancer. The experiments are systematic series of perturbations of cancer cell lines by targeted drugs, singly or in combination. The response to perturbation is measured in terms of levels of proteins and phospho-proteins and of cellular phenotypes such as viability. Computational network models are derived de novo, i.e., without prior knowledge of signaling pathways, and are based on simple nonlinear differential equations. The prohibitively large solution space of all possible network models is explored efficiently using a probabilistic algorithm, belief propagation, which is three orders of magnitude more efficient than Monte Carlo methods. Explicit executable models are derived for a set of perturbation experiments in Skmel-133 melanoma cell lines, which are resistant to the therapeutically important inhibition of Raf kinase. The resulting network models reproduce and extend known pathway biology. They can be applied to discover new molecular interactions and to predict the effect of novel drug perturbations, one of which is verified experimentally. The technology is suitable for application to larger systems in diverse areas of molecular biology.
The stochastic simulation of large-scale biochemical reaction networks is of great importance for systems biology, since it enables the study of inherently stochastic biological mechanisms at the whole-cell scale. Stochastic simulation algorithms (SSAs) allow us to simulate the dynamic behavior of complex kinetic models, but their high computational cost makes them very slow for many problems of realistic size. We present a pilot service, named WebStoch, developed in the context of our StochSoCs research project, which allows life scientists with no high-performance-computing expertise to perform stochastic simulations of large-scale biological network models, described in the SBML standard format, over the internet. Biomodels submitted to the service are parsed automatically and then placed for parallel execution on distributed worker nodes. The workers are implemented using multi-core and many-core processors, or FPGA accelerators that can handle the simulation of thousands of stochastic repetitions of complex biomodels with possibly thousands of reactions and interacting species. Using benchmark LCSE biomodels, whose workload can be scaled on demand, we demonstrate linear speedup and more than two orders of magnitude higher throughput than existing serial simulators.
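The SSA referred to above is, in its classic direct-method form, a simple loop: sum the reaction propensities, draw an exponential waiting time, and pick which reaction fires in proportion to its propensity. A minimal Python sketch for a birth-death network (0 → X at rate k_birth, X → 0 at rate k_death·x) is shown below; the network, rate constants, and function names are illustrative choices, not taken from the WebStoch service, and a serial loop like this is precisely what the parallel service accelerates across thousands of repetitions.

```python
import numpy as np

rng = np.random.default_rng(42)

def gillespie_birth_death(k_birth=10.0, k_death=0.1, x0=0, t_end=100.0):
    """Direct-method SSA for the birth-death network:
    0 -> X with rate k_birth, X -> 0 with rate k_death * x."""
    t, x = 0.0, x0
    times, states = [t], [x]
    while t < t_end:
        a1 = k_birth                    # propensity of the birth reaction
        a2 = k_death * x                # propensity of the death reaction
        a0 = a1 + a2                    # total propensity (> 0 since k_birth > 0)
        t += rng.exponential(1.0 / a0)  # exponential time to the next reaction
        if rng.uniform() * a0 < a1:     # choose which reaction fires
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)

times, states = gillespie_birth_death()
# For this network the stationary copy-number distribution is
# Poisson with mean k_birth / k_death = 100.
```

Each trajectory is an exact sample path of the underlying continuous-time Markov chain, which is why many independent repetitions (the workload the service distributes) are needed to estimate distributions of molecular copy numbers.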
Although reproducibility is a core tenet of the scientific method, it remains challenging to reproduce many results. Surprisingly, this also holds true for computational results in domains such as systems biology, where there have been extensive standardization efforts. For example, Tiwari et al. recently found that they could only repeat 50% of published simulation results in systems biology. Toward improving the reproducibility of computational systems biology research, we identified several resources that investigators can leverage to make their research more accessible, executable, and comprehensible by others. In particular, we identified several domain standards and curation services, as well as powerful approaches pioneered by the software engineering industry that we believe many investigators could adopt. Together, these approaches could substantially enhance the reproducibility of systems biology research, which in turn would accelerate the development of more sophisticated models that could inform precision medicine and synthetic biology.