Algebraic Model Selection and Experimental Design in Biological Data Science


Abstract in English

Design of experiments and model selection, though essential steps in data science, are usually viewed as unrelated processes in the study and analysis of biological networks. Not accounting for their inter-relatedness has the potential to introduce bias and increase the risk of missing salient features in the modeling process. We propose a data-driven computational framework to unify experimental design and model selection for discrete data sets and minimal polynomial models. We use a special affine transformation, called a linear shift, to provide both the data sets and the polynomial terms that form a basis for a model. This framework enables us to address two important questions that arise in biological data science research: finding the data which identify a set of known interactions and finding identifiable interactions given a set of data. We present the theoretical foundation for a web-accessible database. As an example, we apply this methodology to a previously constructed pharmacodynamic model of epidermal derived growth factor receptor (EGFR) signaling.

Download