No Arabic abstract
The drive for reproducibility in the computational sciences has provoked discussion and effort across a broad range of perspectives: technological, legislative/policy, education, and publishing. Discussion on these topics is not new, but the need to adopt standards for reproducibility of claims made based on computational results is now clear to researchers, publishers and policymakers alike. Many technologies exist to support and promote reproduction of computational results: containerisation tools like Docker, literate programming approaches such as Sweave, knitr, iPython or cloud environments like Amazon Web Services. But these technologies are tied to specific programming languages (e.g. Sweave/knitr to R; iPython to Python) or to platforms (e.g. Docker for 64-bit Linux environments only). To date, no single approach is able to span the broad range of technologies and platforms represented in computational biology and biotechnology. To enable reproducibility across computational biology, we demonstrate an approach and provide a set of tools that is suitable for all computational work and is not tied to a particular programming language or platform. We present published examples from a series of papers in different areas of computational biology, spanning the major languages and technologies in the field (Python/R/MATLAB/Fortran/C/Java). Our approach produces a transparent and flexible process for replication and recomputation of results. Ultimately, its most valuable aspect is the decoupling of methods in computational biology from their implementation. Separating the how (method) of a publication from the where (implementation) promotes genuinely open science and benefits the scientific community as a whole.
Although reproducibility is a core tenet of the scientific method, it remains challenging to reproduce many results. Surprisingly, this also holds true for computational results in domains such as systems biology where there have been extensive standardization efforts. For example, Tiwari et al. recently found that they could only repeat 50% of published simulation results in systems biology. Toward improving the reproducibility of computational systems research, we identified several resources that investigators can leverage to make their research more accessible, executable, and comprehensible by others. In particular, we identified several domain standards and curation services, as well as powerful approaches pioneered by the software engineering industry that we believe many investigators could adopt. Together, we believe these approaches could substantially enhance the reproducibility of systems biology research. In turn, we believe enhanced reproducibility would accelerate the development of more sophisticated models that could inform precision medicine and synthetic biology.
We present the spectrum of the (normalized) graph Laplacian as a systematic tool for the investigation of networks, and we describe basic properties of eigenvalues and eigenfunctions. Processes of graph formation like motif joining or duplication leave characteristic traces in the spectrum. This can suggest hypotheses about the evolution of a graph representing biological data. To this data, we analyze several biological networks in terms of rough qualitative data of their spectra.
Untargeted metabolomic studies are revealing large numbers of naturally occurring metabolites that cannot be characterized because their chemical structures and MS/MS spectra are not available in databases. Here we present iMet, a computational tool based on experimental tandem mass spectrometry that could potentially allow the annotation of metabolites not discovered previously. iMet uses MS/MS spectra to identify metabolites structurally similar to an unknown metabolite, and gives a net atomic addition or removal that converts the known metabolite into the unknown one. We validate the algorithm with 148 metabolites, and show that for 89% of them at least one of the top four matches identified by iMet enables the proper annotation of the unknown metabolite. iMet is freely available at http://imet.seeslab.net.
Computational intelligence is broadly defined as biologically-inspired computing. Usually, inspiration is drawn from neural systems. This article shows how to analyze neural systems using information theory to obtain constraints that help identify the algorithms run by such systems and the information they represent. Algorithms and representations identified information-theoretically may then guide the design of biologically inspired computing systems (BICS). The material covered includes the necessary introduction to information theory and the estimation of information theoretic quantities from neural data. We then show how to analyze the information encoded in a system about its environment, and also discuss recent methodological developments on the question of how much information each agent carries about the environment either uniquely, or redundantly or synergistically together with others. Last, we introduce the framework of local information dynamics, where information processing is decomposed into component processes of information storage, transfer, and modification -- locally in space and time. We close by discussing example applications of these measures to neural data and other complex systems.
Synthetic biology brings together concepts and techniques from engineering and biology. In this field, computer-aided design (CAD) is necessary in order to bridge the gap between computational modeling and biological data. An application named TinkerCell has been created in order to serve as a CAD tool for synthetic biology. TinkerCell is a visual modeling tool that supports a hierarchy of biological parts. Each part in this hierarchy consists of a set of attributes that define the part, such as sequence or rate constants. Models that are constructed using these parts can be analyzed using various C and Python programs that are hosted by TinkerCell via an extensive C and Python API. TinkerCell supports the notion of a module, which are networks with interfaces. Such modules can be connected to each other, forming larger modular networks. Because TinkerCell associates parameters and equations in a model with their respective part, parts can be loaded from databases along with their parameters and rate equations. The modular network design can be used to exchange modules as well as test the concept of modularity in biological systems. The flexible modeling framework along with the C and Python API allows TinkerCell to serve as a host to numerous third-party algorithms. TinkerCell is a free and open-source project under the Berkeley Software Distribution license. Downloads, documentation, and tutorials are available at www.tinkercell.com.