A wide range of applications and research has been done with genome-scale metabolic models. In this work we describe a methodology for comparing metabolic networks constructed from genome-scale metabolic models and how to apply this comparison in order to infer evolutionary distances between different organisms. Our methodology allows a quantification of the metabolic differences between different species from a broad range of families and even kingdoms. This quantification is then applied in order to reconstruct phylogenetic trees for sets of various organisms.
Background: Nowadays, the reconstruction of genome scale metabolic models is a non-automatized and interactive process based on decision taking. This lengthy process usually requires a full year of one persons work in order to satisfactory collect, a
nalyze and validate the list of all metabolic reactions present in a specific organism. In order to write this list, one manually has to go through a huge amount of genomic, metabolomic and physiological information. Currently, there is no optimal algorithm that allows one to automatically go through all this information and generate the models taking into account probabilistic criteria of unicity and completeness that a biologist would consider. Results: This work presents the automation of a methodology for the reconstruction of genome scale metabolic models for any organism. The methodology that follows is the automatized version of the steps implemented manually for the reconstruction of the genome scale metabolic model of a photosynthetic organism, {it Synechocystis sp. PCC6803}. The steps for the reconstruction are implemented in a computational platform (COPABI) that generates the models from the probabilistic algorithms that have been developed. Conclusions: For validation of the developed algorithm robustness, the metabolic models of several organisms generated by the platform have been studied together with published models that have been manually curated. Network properties of the models like connectivity and average shortest mean path of the different models have been compared and analyzed.
We present a model for continuous cell culture coupling intra-cellular metabolism to extracellular variables describing the state of the bioreactor, taking into account the growth capacity of the cell and the impact of toxic byproduct accumulation. W
e provide a method to determine the steady states of this system that is tractable for metabolic networks of arbitrary complexity. We demonstrate our approach in a toy model first, and then in a genome-scale metabolic network of the Chinese hamster ovary cell line, obtaining results that are in qualitative agreement with experimental observations. More importantly, we derive a number of consequences from the model that are independent of parameter values. First, that the ratio between cell density and dilution rate is an ideal control parameter to fix a steady state with desired metabolic properties invariant across perfusion systems. This conclusion is robust even in the presence of multi-stability, which is explained in our model by the negative feedback loop on cell growth due to toxic byproduct accumulation. Moreover, a complex landscape of steady states in continuous cell culture emerges from our simulations, including multiple metabolic switches, which also explain why cell-line and media benchmarks carried out in batch culture cannot be extrapolated to perfusion. On the other hand, we predict invariance laws between continuous cell cultures with different parameters. A practical consequence is that the chemostat is an ideal experimental model for large-scale high-density perfusion cultures, where the complex landscape of metabolic transitions is faithfully reproduced. Thus, in order to actually reflect the expected behavior in perfusion, performance benchmarks of cell-lines and culture media should be carried out in a chemostat.
Because biological processes can make different loci have different evolutionary histories, species tree estimation requires multiple loci from across the genome. While many processes can result in discord between gene trees and species trees, incomp
lete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called summary methods. Because summary methods are generally fast, they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate on biologically realistic conditions. Mirarab et al. (Science 2014) presented the statistical binning technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple statistical test for combinability and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomics pipeline does not have the desirable property of being statistically consistent. We show that weighting the recalculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, weighted statistical binning enables highly accurate genome-scale species tree estimation, and is also statistical consistent under the multi-species coalescent model.
A key step in the origin of life is the emergence of a primitive metabolism. This requires the formation of a subset of chemical reactions that is both self-sustaining and collectively autocatalytic. A generic theory to study such processes (called R
AF theory) has provided a precise and computationally effective way to address these questions, both on simulated data and in laboratory studies. One of the classic applications of this theory (arising from Stuart Kauffmans pioneering work in the 1980s) involves networks of polymers under cleavage and ligation reactions; in the first part of this paper, we provide the first exact description of the number of such reactions under various model assumptions. Conclusions from earlier studies relied on either approximations or asymptotic counting, and we show that the exact counts lead to similar (though not always identical) asymptotic results. In the second part of the paper, we solve some questions posed in more recent papers concerning the computational complexity of some key questions in RAF theory. In particular, although there is a fast algorithm to determine whether or not a catalytic reaction network contains a subset that is both self-sustaining and autocatalytic (and, if so, find one), determining whether or not sets exist that satisfy certain additional constraints exist turns out to be NP-complete.
Background: The study of genome-scale metabolic models and their underlying networks is one of the most important fields in systems biology. The complexity of these models and their description makes the use of computational tools an essential elemen
t in their research. Therefore there is a strong need of efficient and versatile computational tools for the research in this area. Results: In this manuscript we present PyNetMet, a Python library of tools to work with networks and metabolic models. These are open-source free tools for use in a Python platform, which adds considerably versatility to them when compared with their desktop software similars. On the other hand these tools allow one to work with different standards of metabolic models (OptGene and SBML) and the fact that they are programmed in Python opens the possibility of efficient integration with any other already existing Python tool. Conclusions: PyNetMet is, therefore, a collection of computational tools that will facilitate the research work with metabolic models and networks.