Algebraic Model Selection and Experimental Design in Biological Data Science

Added by Brandilyn Stigler

Publication date 2021

Fields Biology

Research language English

Authors Anyu Zhang - Jingzhen Hu - Qingzhong Liang

Algebraic Geometry Quantitative Methods

Design of experiments and model selection, though essential steps in data science, are usually viewed as unrelated processes in the study and analysis of biological networks. Not accounting for their inter-relatedness has the potential to introduce bias and increase the risk of missing salient features in the modeling process. We propose a data-driven computational framework to unify experimental design and model selection for discrete data sets and minimal polynomial models. We use a special affine transformation, called a linear shift, to provide both the data sets and the polynomial terms that form a basis for a model. This framework enables us to address two important questions that arise in biological data science research: finding the data which identify a set of known interactions and finding identifiable interactions given a set of data. We present the theoretical foundation for a web-accessible database. As an example, we apply this methodology to a previously constructed pharmacodynamic model of epidermal derived growth factor receptor (EGFR) signaling.
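As a rough illustration of the second question in the abstract (which interactions are identifiable from a given data set), the sketch below checks whether the coefficients of a chosen set of monomials are uniquely determined by a set of discrete data points. It assumes the data take values in a prime field F_p and uses the standard linear-algebra criterion that the monomial evaluation matrix must have full column rank. It does not implement the linear shift from the paper; the function names and the toy data are hypothetical.

```python
def monomial_value(point, exponents, p):
    """Evaluate x1^a1 * ... * xn^an at `point`, with arithmetic mod p."""
    value = 1
    for x, a in zip(point, exponents):
        value = (value * pow(x, a, p)) % p
    return value


def rank_mod_p(matrix, p):
    """Rank of an integer matrix over F_p (p prime), by Gaussian elimination."""
    rows = [[entry % p for entry in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse, Python 3.8+
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def is_identifiable(points, monomials, p):
    """True if the monomial coefficients are uniquely determined by the points."""
    matrix = [[monomial_value(pt, m, p) for m in monomials] for pt in points]
    return rank_mod_p(matrix, p) == len(monomials)


# Hypothetical toy example over F_3 with two variables.
# Exponent vectors: (0, 0) is the constant 1, (1, 0) is x1, (0, 1) is x2.
monomials = [(0, 0), (1, 0), (0, 1)]
spread_points = [(0, 0), (1, 0), (0, 1)]    # x1 and x2 vary independently
diagonal_points = [(0, 0), (1, 1), (2, 2)]  # x1 == x2 at every point

print(is_identifiable(spread_points, monomials, 3))
print(is_identifiable(diagonal_points, monomials, 3))
```

In the second data set the two variables always take the same value, so the columns for x1 and x2 coincide and their separate effects cannot be told apart.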

Data-driven modelling of biological multi-scale processes

Jan Hasenauer, Nick Jagiella, Sabrina Hross (2015)

Biological processes involve a variety of spatial and temporal scales. A holistic understanding of many biological processes therefore requires multi-scale models which capture the relevant properties on all these scales. In this manuscript we review mathematical modelling approaches used to describe the individual spatial scales and how they are integrated into holistic models. We discuss the relation between spatial and temporal scales and its implications for multi-scale modelling. Based upon this overview of state-of-the-art modelling approaches, we formulate key challenges in mathematical and computational modelling of biological multi-scale and multi-physics processes. In particular, we consider the availability of analysis tools for multi-scale models and model-based multi-scale data integration. We provide a compact review of methods for model-based data integration and model-based hypothesis testing. Furthermore, novel approaches and recent trends are discussed, including computation time reduction using reduced order and surrogate models, which contribute to the solution of inference problems. We conclude the manuscript by providing a few ideas for the development of tailored multi-scale inference methods.

Molecular Networks Quantitative Methods

Statistical model selection methods applied to biological networks

M.P.H. Stumpf, P.J. Ingram, I. Nouvel (2005)

Many biological networks have been labelled scale-free as their degree distribution can be approximately described by a power-law distribution. While the degree distribution does not summarize all aspects of a network, it has often been suggested that its functional form contains important clues as to the underlying evolutionary processes that have shaped the network. Generally, the functional form of the degree distribution has been fitted in an ad hoc fashion. Here we apply formal statistical model selection methods to determine which functional form best describes degree distributions of protein interaction and metabolic networks. We interpret the degree distribution as belonging to a class of probability models and determine which of these models provides the best description for the empirical data using maximum likelihood inference, composite likelihood methods, the Akaike information criterion and goodness-of-fit tests. The whole data set is used in order to determine the parameter that best explains the data under a given model (e.g. scale-free or random graph). As we will show, present protein interaction and metabolic network data from different organisms suggest that simple scale-free models do not provide an adequate description of real network data.

Molecular Networks Other Quantitative Biology
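The model-selection step described in the abstract above can be sketched with the Akaike information criterion, AIC = 2k - 2 ln L, where k is the number of fitted parameters and L the maximized likelihood. The example below is a minimal sketch, not the authors' procedure: it compares a Poisson model (a stand-in for a random graph) with a truncated discrete power law fitted by grid search, on made-up degree data. The data, the grid of exponents, and the function names are all assumptions, and the toy result says nothing about real networks.

```python
import math


def poisson_loglik(degrees):
    """Maximized Poisson log-likelihood; the MLE of the rate is the sample mean."""
    lam = sum(degrees) / len(degrees)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in degrees)


def power_law_loglik(degrees, k_max, gammas):
    """Maximized log-likelihood of P(k) proportional to k**(-gamma) on 1..k_max.

    The exponent gamma is chosen by a simple grid search over `gammas`.
    """
    sum_log = sum(math.log(k) for k in degrees)
    best = -math.inf
    for g in gammas:
        log_norm = math.log(sum(j ** -g for j in range(1, k_max + 1)))
        best = max(best, -g * sum_log - len(degrees) * log_norm)
    return best


def aic(loglik, n_params):
    """Akaike information criterion; lower values indicate a preferred model."""
    return 2 * n_params - 2 * loglik


# Hypothetical heavy-tailed degree sequence (not real network data).
degrees = [1] * 60 + [2] * 15 + [3] * 7 + [4] * 4 + [5] * 3 + [8] * 2 + [12, 20, 35]
gammas = [1.0 + 0.01 * i for i in range(1, 301)]  # candidate exponents 1.01..4.00

aic_poisson = aic(poisson_loglik(degrees), n_params=1)
aic_power_law = aic(power_law_loglik(degrees, max(degrees), gammas), n_params=1)
print("Poisson AIC:  ", round(aic_poisson, 1))
print("Power-law AIC:", round(aic_power_law, 1))
```

Both candidate models have one free parameter here, so the comparison reduces to the maximized likelihoods; with models of different size the 2k term penalizes the larger one.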

Optimal Experimental Design for Mathematical Models of Hematopoiesis

Luis Martinez Lomeli, Abdon Iniguez, Babak Shahbaba (2020)

The hematopoietic system has a highly regulated and complex structure in which cells are organized to successfully create and maintain new blood cells. Feedback regulation is crucial to tightly control this system, but the specific mechanisms by which control is exerted are not completely understood. In this work, we aim to uncover the underlying mechanisms in hematopoiesis by conducting perturbation experiments, where animal subjects are exposed to an external agent in order to observe the system response and evolution. Developing a proper experimental design for these studies is an extremely challenging task. To address this issue, we have developed a novel Bayesian framework for optimal design of perturbation experiments. We model the numbers of hematopoietic stem and progenitor cells in mice that are exposed to a low dose of radiation. We use a differential equations model that accounts for feedback and feedforward regulation. A significant obstacle is that the experimental data are not longitudinal; rather, each data point corresponds to a different animal. This model is embedded in a hierarchical framework with latent variables that capture unobserved cellular population levels. We select the optimum design based on the amount of information gain, measured by the Kullback-Leibler divergence between the probability distributions before and after observing the data. We evaluate our approach using synthetic and experimental data. We show that a proper design can lead to better estimates of model parameters even with relatively few subjects. Additionally, we demonstrate that the model parameters show a wide range of sensitivities to design options. Our method should allow scientists to find the optimal design by focusing on their specific parameters of interest and provide insight into hematopoiesis. Our approach can be extended to more complex models where latent components are used.

Methodology Quantitative Methods Applications
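The information-gain criterion mentioned in the hematopoiesis abstract above can be illustrated on a small discrete example. The sketch below assumes a parameter restricted to four candidate values with a uniform prior, and compares two hypothetical designs by the Kullback-Leibler divergence from prior to posterior after a single observed outcome. A full Bayesian design procedure would average this gain over possible outcomes; the likelihood values and function names here are invented for illustration.

```python
import math


def posterior_from(prior, likelihood):
    """Bayes update on a discrete grid: posterior is prior * likelihood, normalized."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def kl_divergence(posterior, prior):
    """KL(posterior || prior) in nats; terms with zero posterior mass contribute 0."""
    return sum(q * math.log(q / p) for q, p in zip(posterior, prior) if q > 0)


# Uniform prior over four candidate parameter values (an assumption).
prior = [0.25, 0.25, 0.25, 0.25]

# Hypothetical likelihoods of the observed outcome under each candidate value.
likelihood_design_a = [0.90, 0.05, 0.03, 0.02]  # outcome discriminates sharply
likelihood_design_b = [0.30, 0.30, 0.20, 0.20]  # outcome is barely informative

gain_a = kl_divergence(posterior_from(prior, likelihood_design_a), prior)
gain_b = kl_divergence(posterior_from(prior, likelihood_design_b), prior)
print("Information gain, design A:", round(gain_a, 3))
print("Information gain, design B:", round(gain_b, 3))
```

The design whose data move the posterior further from the prior yields the larger divergence and would be preferred under this criterion.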

Autofocused oracles for model-based design

Clara Fannjiang, Jennifer Listgarten (2020)

Data-driven design is making headway into a number of application areas, including protein, small-molecule, and materials engineering. The design goal is to construct an object with desired properties, such as a protein that binds to a therapeutic target, or a superconducting material with a higher critical temperature than previously observed. To that end, costly experimental measurements are being replaced with calls to high-capacity regression models trained on labeled data, which can be leveraged in an in silico search for design candidates. However, the design goal necessitates moving into regions of the design space beyond where such models were trained. Therefore, one can ask: should the regression model be altered as the design algorithm explores the design space, in the absence of new data? Herein, we answer this question in the affirmative. In particular, we (i) formalize the data-driven design problem as a non-zero-sum game, (ii) develop a principled strategy for retraining the regression model as the design algorithm proceeds, which we refer to as autofocusing, and (iii) demonstrate the promise of autofocusing empirically.

Machine Learning Quantitative Methods Machine Learning

Modeling biological systems with delays in Bio-PEPA

Giulio Caravagna (2010)

Delays in biological systems may be used to model events for which the underlying dynamics cannot be precisely observed, or to provide an abstraction of some behavior of the system, resulting in more compact models. In this paper we enrich the stochastic process algebra Bio-PEPA with the possibility of assigning delays to actions, yielding a new non-Markovian process algebra: Bio-PEPAd. This is a conservative extension, meaning that the original syntax of Bio-PEPA is retained and the delay specification, which can now be associated with actions, may be added to existing Bio-PEPA models. The semantics of the firing of actions with delays follows the delay-as-duration approach, presented earlier in papers on the stochastic simulation of biological systems with delays. The semantics of the algebra are given in the Starting-Terminating style, meaning that the start and the completion of an action are observed as two separate events, as required by delays. Furthermore, we outline how to perform stochastic simulation of Bio-PEPAd systems and how to automatically translate a Bio-PEPAd system into a set of Delay Differential Equations, the deterministic framework for modeling biological systems with delays. We end the paper with two example models of biological systems with delays to illustrate the approach.

Computational Engineering Quantitative Methods
