Ordinary differential equations (ODEs) are widely used for modeling in Systems Biology. Because usually only some of the kinetic parameters are measurable or precisely known, parameter estimation techniques are applied to fit the model to experimental data. A main challenge for parameter estimation is the complexity of the parameter space, especially its high dimensionality and local minima. Parameter estimation techniques consist of an objective function, which measures how well a given parameter set describes the experimental data, and an optimization algorithm that optimizes this objective function. Much effort has been spent on developing highly sophisticated optimization algorithms to cope with the complexity of the parameter space, but surprisingly few articles address the influence of the objective function on the computational complexity of finding global optima. We extend a recently developed multiple shooting for stochastic systems (MSS) objective function, originally designed for parameter estimation of stochastic models, and apply it to parameter estimation of ODE models. This MSS objective function treats the intervals between measurement points separately. This separate treatment allows the ODE trajectory to stay closer to the data, and we show that it reduces the complexity of the parameter space. We use examples from Systems Biology, namely a Lotka-Volterra model, a FitzHugh-Nagumo oscillator and a Calcium oscillation model, to demonstrate the power of the MSS approach for reducing the complexity and the number of local minima in the parameter space. The approach is fully implemented in the COPASI software package and is therefore easily accessible to a wide community of researchers.
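The interval-wise idea described above can be illustrated with a minimal sketch. This is not the COPASI implementation or the published MSS objective. It assumes that every state variable is observed at every measurement time, uses a plain least-squares score, and integrates with a fixed-step RK4 scheme. All function names, parameter values, and the synthetic data are hypothetical and chosen only for illustration.

```python
def lotka_volterra(state, params):
    """Right-hand side of the Lotka-Volterra predator-prey ODE."""
    prey, predator = state
    growth, predation, death, conversion = params
    return (growth * prey - predation * prey * predator,
            -death * predator + conversion * prey * predator)


def integrate(state, params, duration, n_steps=100):
    """Advance the state by `duration` using fixed-step classical RK4."""
    h = duration / n_steps
    s = tuple(state)
    for _ in range(n_steps):
        k1 = lotka_volterra(s, params)
        k2 = lotka_volterra(tuple(x + 0.5 * h * k for x, k in zip(s, k1)), params)
        k3 = lotka_volterra(tuple(x + 0.5 * h * k for x, k in zip(s, k2)), params)
        k4 = lotka_volterra(tuple(x + h * k for x, k in zip(s, k3)), params)
        s = tuple(x + h / 6.0 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(s, k1, k2, k3, k4))
    return s


def squared_error(predicted, observed):
    """Sum of squared differences over all state variables."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def single_shooting_objective(params, times, data):
    """Score one trajectory started at the first observation."""
    state, total = data[0], 0.0
    for i in range(1, len(times)):
        state = integrate(state, params, times[i] - times[i - 1])
        total += squared_error(state, data[i])
    return total


def interval_objective(params, times, data):
    """Restart the ODE at each observation and score only the next point."""
    total = 0.0
    for i in range(1, len(times)):
        predicted = integrate(data[i - 1], params, times[i] - times[i - 1])
        total += squared_error(predicted, data[i])
    return total


def simulate(params, initial, times):
    """Generate noise-free synthetic observations (illustration only)."""
    data = [tuple(initial)]
    for i in range(1, len(times)):
        data.append(integrate(data[-1], params, times[i] - times[i - 1]))
    return data


times = [0.5 * i for i in range(21)]
true_params = (1.0, 0.5, 1.0, 0.2)          # hypothetical values
data = simulate(true_params, (5.0, 2.0), times)
for guess in (true_params, (1.5, 0.5, 1.0, 0.2)):
    print(guess,
          single_shooting_objective(guess, times, data),
          interval_objective(guess, times, data))
```

Because each interval restarts from a measured state, errors from a poor parameter guess cannot accumulate along the whole trajectory, which is the intuition behind the trajectory staying closer to the data.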
Models of biological systems often have many unknown parameters that must be determined in order for model behavior to match experimental observations. Commonly used methods for parameter estimation that return point estimates of the best-fit parameters are insufficient when models are high dimensional and under-constrained. As a result, Bayesian methods, which treat model parameters as random variables and attempt to estimate their probability distributions given data, have become popular in systems biology. Bayesian parameter estimation often relies on Markov Chain Monte Carlo (MCMC) methods to sample model parameter distributions, but the slow convergence of MCMC sampling can be a major bottleneck. One approach to improving performance is parallel tempering (PT), a physics-based method that uses swapping between multiple Markov chains run in parallel at different temperatures to accelerate sampling. The temperature of a Markov chain determines the probability of accepting an unfavorable move, so swapping with higher-temperature chains enables the sampling chain to escape from local minima. In this work we compared the MCMC performance of PT and the commonly used Metropolis-Hastings (MH) algorithm on six biological models of varying complexity. We found that for simpler models PT accelerated convergence and sampling, and that for more complex models PT often converged in cases where MH became trapped in non-optimal local minima. We also developed a freely available MATLAB package for Bayesian parameter estimation called PTempEst (http://github.com/RuleWorld/ptempest), which is closely integrated with the popular BioNetGen software for rule-based modeling of biological systems.
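The swapping mechanism described above can be sketched in a few lines. This is a generic textbook form of parallel tempering, not the PTempEst code. The bimodal one-dimensional target, the temperature ladder, the step size, and all function names are assumptions made for illustration.

```python
import math
import random


def log_posterior(x):
    """Illustrative unnormalized bimodal target with modes near -3 and +3."""
    a = -((x - 3.0) ** 2) / (2 * 0.25)
    b = -((x + 3.0) ** 2) / (2 * 0.25)
    m = max(a, b)  # log-sum-exp trick to avoid log(0) far from the modes
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_swap_ratio(temp_i, temp_j, logp_i, logp_j):
    """Log acceptance ratio for exchanging the states of chains i and j."""
    return (1.0 / temp_i - 1.0 / temp_j) * (logp_j - logp_i)


def parallel_tempering(n_iter, temperatures, step=1.0, swap_every=10, seed=0):
    """Return samples from the first chain; expects at least two temperatures."""
    rng = random.Random(seed)
    states = [-3.0] * len(temperatures)
    logps = [log_posterior(x) for x in states]
    cold_samples = []
    for it in range(n_iter):
        # Metropolis update within each chain; a higher temperature
        # flattens the target and accepts unfavorable moves more often.
        for k, temp in enumerate(temperatures):
            proposal = states[k] + rng.gauss(0.0, step)
            lp = log_posterior(proposal)
            if math.log(1.0 - rng.random()) < (lp - logps[k]) / temp:
                states[k], logps[k] = proposal, lp
        # Periodically propose a swap between a random adjacent pair.
        if (it + 1) % swap_every == 0:
            k = rng.randrange(len(temperatures) - 1)
            ratio = log_swap_ratio(temperatures[k], temperatures[k + 1],
                                   logps[k], logps[k + 1])
            if math.log(1.0 - rng.random()) < ratio:
                states[k], states[k + 1] = states[k + 1], states[k]
                logps[k], logps[k + 1] = logps[k + 1], logps[k]
        cold_samples.append(states[0])
    return cold_samples


samples = parallel_tempering(5000, temperatures=(1.0, 4.0, 16.0))
print(len(samples), min(samples), max(samples))
```

Only the chain at temperature 1 samples the actual posterior. The hotter chains exist to cross barriers between modes and pass those states down through swaps.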
Biological fitness arises from interactions between molecules, genes, and organisms. To discover the causative mechanisms of this complexity, we must differentiate the significant interactions from a large number of possibilities. Epistasis is the standard way to identify interactions in fitness landscapes. However, this intuitive approach breaks down in higher dimensions, for example because the sign of epistasis takes on an arbitrary meaning and the false discovery rate becomes high. These limitations make it difficult to evaluate the role of epistasis in higher dimensions. Here we develop epistatic filtrations, a dimensionally normalized approach to define fitness landscape topography for higher-dimensional spaces. We apply the method to higher-dimensional datasets from genetics and the gut microbiome. This reveals a sparse higher-order structure that often arises from lower-order interactions. Despite their sparsity, these higher-order effects have a significant impact on biological fitness and are consequential for ecology and evolution.
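For orientation, the classical two-locus case that the abstract generalizes can be written down directly. The sketch below computes the standard additive pairwise epistasis, f(00) + f(11) - f(01) - f(10). It does not implement epistatic filtrations. The genotype labels and fitness values are hypothetical.

```python
def pairwise_epistasis(fitness):
    """Additive two-locus epistasis from a dict keyed by genotype string.

    Keys "00", "01", "10", "11" are a hypothetical encoding in which each
    character marks the absence (0) or presence (1) of a mutation.
    """
    return fitness["00"] + fitness["11"] - fitness["01"] - fitness["10"]


# Hypothetical fitness values: the double mutant is fitter than the
# sum of the single-mutant effects would predict.
example = {"00": 1.0, "01": 2.0, "10": 3.0, "11": 6.0}
print(pairwise_epistasis(example))
```

A value of zero means the two mutations act additively. In two dimensions the sign distinguishes positive from negative interaction, an interpretation that, as the abstract notes, no longer carries over cleanly to higher dimensions.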
Reproducibility and reusability of the results of data-based modeling studies are essential. Yet so far there has been no broadly supported format for the specification of parameter estimation problems in systems biology. Here, we introduce PEtab, a format which facilitates the specification of parameter estimation problems using Systems Biology Markup Language (SBML) models and a set of tab-separated value files describing the observation model and experimental data as well as the parameters to be estimated. We have already implemented PEtab support in eight well-established model simulation and parameter estimation toolboxes with hundreds of users in total. We provide a Python library for validation and modification of a PEtab problem, and currently 20 example parameter estimation problems based on recent studies. Specifications of PEtab, the PEtab Python library, links to examples, and all supporting software tools are available at https://github.com/PEtab-dev/PEtab; a snapshot is available at https://doi.org/10.5281/zenodo.3732958. All original content is available under permissive licenses.
In this paper, a new stochastic framework for parameter estimation and uncertainty quantification in colon cancer-induced angiogenesis, using patient data, is presented. The dynamics of colon cancer are described by a stochastic process that captures the inherent randomness in the system. The stochastic framework is based on the Fokker-Planck equation, which represents the evolution of the probability density function corresponding to the stochastic process. An optimization problem is formulated that takes as input individual patient data containing randomness, and it is solved to obtain the unknown parameters corresponding to the individual tumor characteristics. Furthermore, sensitivity analysis of the optimal parameter set is performed to determine the parameters that need to be controlled, thus providing information on the type of drugs that can be used for treatment.
We consider the problem of estimating local sensor parameters, where the local parameters and sensor observations are related through linear stochastic models. Sensors exchange messages and cooperate with each other to estimate their own local parameters iteratively. We study the Gaussian Sum-Product Algorithm over a Wireless Network (gSPAWN) procedure, which is based on belief propagation but uses fixed-size broadcast messages at each sensor instead. Compared with the popular diffusion strategies for performing network parameter estimation, whose communication cost at each sensor increases with network density, the gSPAWN algorithm allows sensors to broadcast a message whose size does not depend on the network size or density, making it more suitable for applications in wireless sensor networks. We show that the gSPAWN algorithm converges in mean and has mean-square stability under some technical sufficient conditions, and we describe an application of the gSPAWN algorithm to a network localization problem in non-line-of-sight environments. Numerical results suggest that gSPAWN generally converges much faster than the diffusion method and has lower communication costs, with comparable root mean square errors.