ﻻ يوجد ملخص باللغة العربية
The Set Covering Machine (SCM) is a greedy learning algorithm that produces sparse classifiers. We extend the SCM for datasets that contain a huge number of features. The whole genetic material of living organisms is an example of such a case, where the number of feature exceeds 10^7. Three human pathogens were used to evaluate the performance of the SCM at predicting antimicrobial resistance. Our results show that the SCM compares favorably in terms of sparsity and accuracy against L1 and L2 regularized Support Vector Machines and CART decision trees. Moreover, the SCM was the only algorithm that could consider the full feature space. For all other algorithms, the latter had to be filtered as a preprocessing step.
Antimicrobial resistance is an important public health concern that has implications in the practice of medicine worldwide. Accurately predicting resistance phenotypes from genome sequences shows great promise in promoting better use of antimicrobial
We calculate the mutual information function for each of the 24 chromosomes in the human genome. The same correlation pattern is observed regardless the individual functional features of each chromosome. Moreover, correlations of different scale leng
Data on the number of Open Reading Frames (ORFs) coded by genomes from the 3 domains of Life show some notable general features including essential differences between the Prokaryotes and Eukaryotes, with the number of ORFs growing linearly with tota
Engineering the entire genome of an organism enables large-scale changes in organization, function, and external interactions, with significant implications for industry, medicine, and the environment. Improvements to DNA synthesis and organism engin
Being able to store and transmit human genome sequences is an important part in genomic research and industrial applications. The complete human genome has 3.1 billion base pairs (haploid), and storing the entire genome naively takes about 3 GB, whic