pyBioSig: optimizing group discrimination using genetic algorithms for biosignature discovery


Abstract in English

In medical sciences, a biomarker is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. Molecular experiments are providing rapid and systematic approaches to search for biomarkers, but because single-molecule biomarkers have shown a disappointing lack of robustness for clinical diagnosis, researchers have begun searching for distinctive sets of molecules, called biosignatures. However, the most popular statistics are not appropriate for their identification, and the number of possible biosignatures to be tested is frequently intractable. In the present work, we developed a multivariate filter using genetic algorithms (GA) as a feature (gene) selector to optimize a measure of intra-group cohesion and inter-group dispersion. This method was implemented using Python and R (pyBioSig, available at https://github.com/fredgca/pybiosig under LGPL) and can be manipulated via graphical interface or Python scripts. Using it, we were able to identify putative biosignatures composed by just a few genes and capable of recovering multiple groups simultaneously in a hierarchical clustering, even ones that were not recovered using the whole transcriptome, within a feasible length of time using a personal computer. Our results allowed us to conclude that using GA to optimize our new intra-group cohesion and inter-group dispersion measure is a clear, effective, and computationally feasible strategy for the identification of putative omical biosignatures that could support discrimination among multiple groups simultaneously.

Download