Learning interpretable models of phenotypes from whole genome sequences with the Set Covering Machine

Added by Alexandre Drouin

Publication date 2014

fields Biology Informatics Engineering

Research language: English

Authors Alexandre Drouin - Sébastien Giguère - Vladana Sagatovich

Genomics Computational Engineering Machine Learning

The increased affordability of whole genome sequencing has motivated its use for phenotypic studies. We address the problem of learning interpretable models for discrete phenotypes from whole genomes. We propose a general approach that relies on the Set Covering Machine and a k-mer representation of the genomes. We show results for the problem of predicting the resistance of Pseudomonas aeruginosa, an important human pathogen, against four antibiotics. Our results demonstrate that extremely sparse, biologically relevant models can be learnt using this approach.
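To make the idea concrete, the following is a minimal sketch of how a k-mer representation can be combined with a greedy set-covering learner. It is not the authors' implementation. It assumes that each rule tests the presence or absence of one k-mer, that the model is a conjunction of such rules, and that the greedy utility is "negatives rejected minus p times positives rejected". The function names, the trade-off parameter p, and the toy sequences are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train_conjunction(genomes, labels, k, p=1.0, max_rules=3):
    """Greedily build a conjunction of k-mer presence/absence rules.

    Assumed utility of a rule: (# negatives it rejects) - p * (# positives it rejects).
    A rule (kmer, expect_present) is true on a genome when
    (kmer in genome) == expect_present.
    """
    profiles = [kmers(g, k) for g in genomes]
    vocab = sorted(set().union(*profiles))
    rules = [(km, present) for km in vocab for present in (True, False)]
    negatives = {i for i, y in enumerate(labels) if y == 0}
    positives = {i for i, y in enumerate(labels) if y == 1}
    model = []
    while negatives and len(model) < max_rules:
        best, best_utility, best_rejected = None, float("-inf"), set()
        for km, present in rules:
            # Examples on which this rule is false, i.e. examples it rejects.
            rejected = {i for i in negatives | positives
                        if (km in profiles[i]) != present}
            utility = len(rejected & negatives) - p * len(rejected & positives)
            if utility > best_utility:
                best, best_utility, best_rejected = (km, present), utility, rejected
        if best is None or not (best_rejected & negatives):
            break  # no remaining rule rejects any uncovered negative
        model.append(best)
        negatives -= best_rejected
        positives -= best_rejected
    return model


def predict(model, genome, k):
    """Predict 1 only if every rule in the conjunction is true on the genome."""
    profile = kmers(genome, k)
    return int(all((km in profile) == present for km, present in model))


# Toy data (made up): label 1 = resistant, label 0 = susceptible.
genomes = ["ACGGGT", "TTGGGA", "ACGTAC", "TTACGA"]
labels = [1, 1, 0, 0]
model = train_conjunction(genomes, labels, k=3)
predictions = [predict(model, g, 3) for g in genomes]
```

On this toy set a single rule is enough to separate the two classes, which illustrates the kind of sparsity the abstract refers to: the learnt model is a short list of k-mers that can be inspected directly.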

Predicting Molecular Phenotypes with Single Cell RNA Sequencing Data: an Assessment of Unsupervised Machine Learning Models

434 - Anastasia Dunca, Frederick R. Adler 2021

According to the National Cancer Institute, there were 9.5 million cancer-related deaths in 2018. A challenge in improving treatment is resistance in genetically unstable cells. The purpose of this study is to evaluate unsupervised machine learning for classifying treatment-resistant phenotypes in heterogeneous tumors through analysis of single cell RNA sequencing (scRNAseq) data with a pipeline and evaluation metrics. scRNAseq quantifies mRNA in cells and characterizes cell phenotypes. One scRNAseq dataset was analyzed (tumor/non-tumor cells of different molecular subtypes and patient identifications). The pipeline consisted of data filtering, dimensionality reduction with Principal Component Analysis, projection with Uniform Manifold Approximation and Projection, clustering with nine approaches (Ward, BIRCH, Gaussian Mixture Model, DBSCAN, Spectral, Affinity Propagation, Agglomerative Clustering, Mean Shift, and K-Means), and evaluation. Seven models separated tumor from non-tumor cells and distinguished molecular subtypes, while six models classified patient identifications (13 of which were present in the dataset); K-Means, Ward, and BIRCH often ranked highest, with ~80% accuracy on the tumor versus non-tumor task and ~60% for molecular subtype and patient ID. An optimized classification pipeline using the K-Means, Ward, and BIRCH models was evaluated to be most effective for further analysis. In clinical research, where there is currently no standard protocol for scRNAseq analysis, clusters generated from this pipeline can be used to understand cancer cell behavior and malignant growth, directly affecting the success of treatment.

Genomics Machine Learning Cell Behavior

Nonlinear excitations in DNA: Aperiodic models vs actual genome sequences

176 - Sara Cuenda, Angel Sanchez 2004

We study the effects of the sequence on the propagation of nonlinear excitations in simple models of DNA in which we incorporate actual DNA sequences obtained from human genome data. We show that kink propagation requires forces over a certain threshold, a phenomenon already found for aperiodic sequences [F. Domínguez-Adame et al., Phys. Rev. E 52, 2183 (1995)]. For forces below threshold, the final stop positions are highly dependent on the specific sequence. The results of our model are consistent with the stick-slip dynamics of the unzipping process observed in experiments. We also show that the effective potential, a collective coordinate formalism introduced by Salerno and Kivshar [Phys. Lett. A 193, 263 (1994)], is a useful tool to identify key regions in DNA that control the dynamical behavior of large segments. Additionally, our results lead to further insights into the phenomenology observed in aperiodic systems.

Genomics Soft Condensed Matter Mathematical Physics

Long-range correlation in the whole human genome

110 - R. Mansilla, N. Del Castillo, T. Govezensky 2004

We calculate the mutual information function for each of the 24 chromosomes in the human genome. The same correlation pattern is observed regardless of the individual functional features of each chromosome. Moreover, correlations at different length scales are detected, depicting a multifractal scenario. This fact suggests a unique mechanism of structural evolution. We propose that such a mechanism could be an expansion-modification dynamical system.

Genomics Adaptation and Self-Organizing Systems

Whole genome single nucleotide polymorphism genotyping of Staphylococcus aureus

87 - Changchuan Yin, Stephen S.-T. Yau 2018

Next-generation sequencing technology enables routine detection of bacterial pathogens for clinical diagnostics and genetic research. Whole genome sequencing has been of importance in the epidemiologic analysis of bacterial pathogens. However, few whole genome sequencing-based genotyping pipelines are available for practical applications. Here, we present a whole genome sequencing-based single nucleotide polymorphism (SNP) genotyping method and apply it to the evolutionary analysis of methicillin-resistant Staphylococcus aureus. The SNP genotyping method calls genome variants using next-generation sequencing reads of whole genomes and calculates the pair-wise Jaccard distances of the genome variants. The method may reveal the high-resolution whole genome SNP profiles and the structural variants of different isolates of methicillin-resistant S. aureus (MRSA) and methicillin-susceptible S. aureus (MSSA) strains. The phylogenetic analysis of whole genomes and particular regions may monitor and track the evolution and the transmission dynamics of bacterial pathogens. The computer programs of the whole genome sequencing-based SNP genotyping method are available to the public at https://github.com/cyinbox/NGS.
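The pair-wise Jaccard distance mentioned above can be illustrated with a short sketch. This is not the code from the linked repository. It assumes that each isolate's variant calls are available as a set of (position, reference base, alternate base) tuples; the isolate names and variants below are made up.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |A intersect B| / |A union B| for two collections of variants."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty variant sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(variants_by_isolate):
    """Map each unordered pair of isolate names to its Jaccard distance."""
    names = sorted(variants_by_isolate)
    return {
        (x, y): jaccard_distance(variants_by_isolate[x], variants_by_isolate[y])
        for x, y in combinations(names, 2)
    }


# Hypothetical variant calls: (position, reference base, alternate base).
variants = {
    "iso_A": {(100, "A", "G"), (250, "C", "T"), (900, "G", "A")},
    "iso_B": {(100, "A", "G"), (250, "C", "T")},
    "iso_C": {(400, "T", "C")},
}
distances = pairwise_distances(variants)
```

The resulting distance table could then be passed to a tree-building or clustering step; isolates that share most of their variants end up close together, and isolates with no variants in common are at distance 1.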

Genomics

easyGWAS: An integrated interspecies platform for performing genome-wide association studies

451 - Dominik Grimm, Bastian Greshake, Stefan Kleeberger 2012

Motivation: The rapid growth in genome-wide association studies (GWAS) in plants and animals has brought about the need for a central resource that facilitates i) performing GWAS, ii) accessing data and results of other GWAS, and iii) enabling all users regardless of their background to exploit the latest statistical techniques without having to manage complex software and computing resources. Results: We present easyGWAS, a web platform that provides methods, tools and dynamic visualizations to perform and analyze GWAS. In addition, easyGWAS makes it simple to reproduce results of others, validate findings, and access larger sample sizes through merging of public datasets. Availability: Detailed method and data descriptions as well as tutorials are available in the supplementary materials. easyGWAS is available at http://easygwas.tuebingen.mpg.de/. Contact: [email protected]

Genomics Computational Engineering Digital Libraries
