Learning interpretable models of phenotypes from whole genome sequences with the Set Covering Machine

Published by Alexandre Drouin in 2014. Field: Biology. Language: English.

Abstract

The increased affordability of whole genome sequencing has motivated its use for phenotypic studies. We address the problem of learning interpretable models of discrete phenotypes from whole genomes. We propose a general approach that relies on the Set Covering Machine and a k-mer representation of the genomes. We show results for the problem of predicting the resistance of Pseudomonas aeruginosa, an important human pathogen, to four antibiotics. Our results demonstrate that this approach can learn extremely sparse models that are biologically relevant.
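To make the combination of a k-mer representation and the Set Covering Machine concrete, here is a minimal sketch, not the authors' implementation. It assumes each genome is reduced to the set of k-mers it contains, that candidate rules test the presence or absence of a single k-mer, and that the model is a conjunction built greedily: at each step the rule that rejects the most remaining negative examples, minus a penalty `p` for each positive example it wrongly rejects, is added. The function names, the toy sequences, and the parameters `p` and `max_rules` are all hypothetical and chosen for illustration only.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train_scm(genomes, labels, k=3, p=1.0, max_rules=3):
    """Greedy conjunction over k-mer presence/absence rules (illustrative sketch).

    labels: 1 = positive phenotype (e.g. resistant), 0 = negative.
    Returns a list of (kmer, must_be_present) rules.
    """
    profiles = [kmers(g, k) for g in genomes]
    vocabulary = sorted(set().union(*profiles))
    # Each candidate rule requires a k-mer to be present (True) or absent (False).
    rules = [(km, present) for km in vocabulary for present in (True, False)]

    positives = {i for i, y in enumerate(labels) if y == 1}
    negatives = {i for i, y in enumerate(labels) if y == 0}
    model = []

    while negatives and len(model) < max_rules:
        best, best_utility, best_rejected = None, 0.0, set()
        for km, present in rules:
            # Remaining examples that this rule rejects (rule evaluates to False).
            rejected = {i for i in positives | negatives
                        if (km in profiles[i]) != present}
            # Reward rejected negatives, penalise rejected positives.
            utility = len(rejected & negatives) - p * len(rejected & positives)
            if utility > best_utility:
                best, best_utility, best_rejected = (km, present), utility, rejected
        if best is None:  # no rule has positive utility; stop early
            break
        model.append(best)
        negatives -= best_rejected
        positives -= best_rejected
    return model


def predict(model, genome, k=3):
    """Predict 1 only if the genome satisfies every rule in the conjunction."""
    profile = kmers(genome, k)
    return int(all((km in profile) == present for km, present in model))


# Toy data (made up): the two positive genomes share the k-mer "GGG".
toy_genomes = ["ACGGGT", "TTGGGA", "ACGTAC", "TTACGA"]
toy_labels = [1, 1, 0, 0]
model = train_scm(toy_genomes, toy_labels, k=3)
print(model)
```

Because the learned model is a short list of k-mer rules, each rule can be inspected directly and mapped back to a genomic region, which is the sense in which such models are sparse and interpretable. Real genomes would need much longer k-mers and a more memory-efficient representation than Python sets of strings.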
