We present two algorithms designed to learn a pattern of correspondence between two data sets in situations where it is desirable to match elements that exhibit an affine relationship. In the motivating case study, the challenge is to better understand micro-RNA (miRNA) regulation in the striatum of Huntington's disease (HD) model mice. The two data sets contain miRNA and messenger-RNA (mRNA) data, respectively, each data point consisting of a multi-dimensional profile. The biological hypothesis is that if a miRNA induces the degradation of a target mRNA, blocks its translation into proteins, or both, then the profile of the former should be similar to minus the profile of the latter (a particular form of affine relationship). The algorithms unfold in two stages. During the first stage, an optimal transport plan P and an optimal affine transformation are learned, using the Sinkhorn-Knopp algorithm and a mini-batch gradient descent. During the second stage, P is exploited to derive either several co-clusters or several sets of matched elements. A simulation study illustrates how the algorithms work and perform. A brief summary of the real-data application in the motivating case study further illustrates the applicability and interest of the algorithms.
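To make the first stage concrete, the following is a minimal NumPy sketch of the Sinkhorn-Knopp iterations producing an entropic-regularized transport plan P; the function name, uniform marginals, and the negation-based cost are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sinkhorn_plan(X, Y, eps=0.1, n_iter=200):
    """Entropic-regularized optimal transport plan between rows of X and Y.

    The cost reflects the hypothesised affine relationship: a miRNA profile x
    should match minus the mRNA profile y, so we use ||x - (-y)||^2.
    (Illustrative sketch; the paper also learns an affine transformation.)
    """
    C = ((X[:, None, :] - (-Y[None, :, :])) ** 2).sum(-1)  # pairwise costs
    K = np.exp(-C / eps)                                   # Gibbs kernel
    a = np.full(X.shape[0], 1.0 / X.shape[0])              # uniform marginal on X
    b = np.full(Y.shape[0], 1.0 / Y.shape[0])              # uniform marginal on Y
    u = np.ones_like(a)
    for _ in range(n_iter):                                # Sinkhorn-Knopp updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]                     # transport plan P

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
P = sinkhorn_plan(X, -X)  # each profile best matches its own negation
```

With Y = -X the cost is zero on the diagonal, so the learned plan concentrates its mass on the matched pairs, which is the behaviour the matching stage exploits.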
Combining different modalities of data from human tissues has been critical in advancing biomedical research and personalised medical care. In this study, we leverage a graph embedding model (a variational graph auto-encoder, VGAE) to perform link prediction on tissue-specific Gene-Gene Interaction (GGI) networks. Through ablation experiments, we show that combining multiple biological modalities (i.e., multi-omics) leads to powerful embeddings and better link prediction performance. Our evaluation shows that the integration of gene methylation profiles and RNA-sequencing data significantly improves link prediction performance: combining the two leads to a link prediction accuracy of 71% on GGI networks. By harnessing graph representation learning on multi-omics data, our work brings novel insights to the current literature on multi-omics integration in bioinformatics.
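For readers unfamiliar with (V)GAE-style link prediction, the standard inner-product decoder can be sketched in a few lines; the function name is an assumption, and in the actual model the latent matrix Z is produced by an encoder trained on the multi-omics node features.

```python
import numpy as np

def vgae_decode(Z):
    """Inner-product decoder of a (V)GAE: the predicted probability of an
    edge between nodes i and j is sigmoid(z_i . z_j), where z_i is the
    latent embedding of node i (here supplied directly for illustration)."""
    logits = Z @ Z.T
    return 1.0 / (1.0 + np.exp(-logits))

Z = np.array([[1.0, 0.0], [0.0, 1.0]])  # toy latent embeddings for two genes
probs = vgae_decode(Z)                  # symmetric edge-probability matrix
```

Link prediction accuracy is then measured by thresholding these probabilities against held-out positive and negative gene pairs.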
Motivation: Omics data, such as transcriptomics or phosphoproteomics, are broadly used to get a snapshot of the molecular status of cells. In particular, changes in omics data can be used to estimate the activity of pathways, transcription factors and kinases based on known regulated targets, which we call footprints. The molecular paths driving these activities can then be estimated using causal reasoning on large signaling networks. Results: We have developed FUNKI, a FUNctional toolKIt for footprint analysis. It provides a user-friendly interface for easy and fast analysis of several types of omics data, from either bulk or single-cell experiments. FUNKI also features different options to visualise the results and run post-analyses, and is mirrored as a scripted version in R. Availability: FUNKI is a free and open-source application built on R and Shiny, available on GitHub at https://github.com/saezlab/ShinyFUNKI under the GNU v3.0 license, and also accessible at https://saezlab.shinyapps.io/funki/ Contact:
[email protected] Supplementary information: We provide data examples within the app, as well as extensive information about the different variables to select, the results, and the different plots in the help page.
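The footprint idea, estimating a regulator's activity from the behaviour of its known targets, can be caricatured as a weighted-mean score; this is a toy sketch of the general principle only, and FUNKI relies on dedicated footprint methods whose actual scoring differs.

```python
import numpy as np

def footprint_activity(target_expr, target_weights):
    """Toy footprint score: a regulator's activity estimated as the
    weighted mean of its known targets' expression changes. Weights
    encode the sign/strength of regulation (hypothetical scheme)."""
    w = np.asarray(target_weights, dtype=float)
    x = np.asarray(target_expr, dtype=float)
    return float(np.dot(x, w) / np.abs(w).sum())

# A repressed target moving down (expr -1, weight -1) counts as evidence
# that the regulator is active, just like an activated target moving up.
score = footprint_activity([1.0, -1.0, 2.0], [1.0, -1.0, 1.0])
```

Here a positive score suggests the regulator is active, since its targets move in the directions its regulation predicts.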
Genomics, especially multi-omics, has made precision medicine feasible. Complete and publicly accessible multi-omics resources with clinical outcomes, such as The Cancer Genome Atlas (TCGA), are a great test bed for developing computational methods that integrate multi-omics data to predict patient cancer phenotypes. We have been utilizing TCGA multi-omics data to predict cancer patient survival using a variety of approaches, including prior biological knowledge (such as pathways) and, more recently, deep-learning methods. Over time, we have developed methods such as Cox-nnet, DeepProg, and two-stage Cox-nnet to address the challenges posed by multi-omics and multi-modality. Despite the limited sample size (hundreds to thousands) of the training datasets and the heterogeneous nature of human populations, these methods have proven significant and robust at predicting patient survival in independent population cohorts. In the following, we describe in detail these methodologies, the modeling results, and important biological insights revealed by these methods.
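Survival networks of this kind typically minimise the negative log partial likelihood of the Cox proportional hazards model; the following NumPy sketch shows that loss (the function name and normalisation are assumptions, not the exact Cox-nnet objective, and tied event times are ignored).

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative log partial likelihood of the Cox proportional hazards model.

    risk  : predicted log-hazard per patient (e.g. a network's output)
    time  : observed follow-up times
    event : 1 if the event (death) was observed, 0 if censored
    """
    order = np.argsort(-time)                  # sort by decreasing time
    risk, event = risk[order], event[order]
    log_risk_sets = np.log(np.cumsum(np.exp(risk)))  # log-sum over each risk set
    # Only uncensored patients contribute terms to the partial likelihood.
    return -np.sum((risk - log_risk_sets) * event) / max(event.sum(), 1)

loss = cox_neg_log_partial_likelihood(
    risk=np.zeros(3),
    time=np.array([3.0, 2.0, 1.0]),
    event=np.ones(3),
)
```

With equal risks and all events observed, the loss reduces to the average of log k over risk-set sizes k, a useful sanity check when wiring this into a training loop.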
Linear discrimination, from the point of view of numerical linear algebra, can be treated as solving an ill-posed system of linear equations. In order to generate a solution that is robust in the presence of noise, these problems require regularization. Here, we examine the ill-posedness involved in the linear discrimination of cancer gene expression data with respect to outcome and tumor subclasses. We show that a filter factor representation, based upon Singular Value Decomposition, yields insight into the numerical ill-posedness of the hyperplane-based separation when applied to gene expression data. We also show that this representation yields useful diagnostic tools for guiding the selection of classifier parameters, thus leading to improved performance.
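The filter-factor representation can be made concrete for Tikhonov (ridge) regularization: each singular direction of the data matrix is damped by a factor f_i = s_i^2 / (s_i^2 + lambda), so small singular values, the source of the ill-posedness, contribute little to the separating hyperplane. This is a hedged sketch of that standard construction, not the paper's exact diagnostic tooling.

```python
import numpy as np

def filter_factor_solution(A, b, lam):
    """Regularized least-squares weights via SVD filter factors.

    Solves min ||A w - b||^2 + lam ||w||^2 by damping each singular
    direction with the Tikhonov filter factor f_i = s_i^2 / (s_i^2 + lam);
    inspecting f exposes how many directions survive the regularization.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    f = s**2 / (s**2 + lam)              # filter factors in (0, 1]
    return Vt.T @ (f * (U.T @ b) / s)    # damped pseudo-inverse solution

rng = np.random.default_rng(1)
A = rng.normal(size=(10, 4))             # stand-in for an expression matrix
b = rng.normal(size=10)                  # stand-in for class labels
w_svd = filter_factor_solution(A, b, lam=0.5)
w_ridge = np.linalg.solve(A.T @ A + 0.5 * np.eye(4), A.T @ b)  # direct ridge
```

The two solutions agree, which is precisely why the filter factors are a faithful lens on the regularized classifier and can guide parameter selection.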
Stratifying cancer patients based on their gene expression levels can improve diagnosis, survival analysis and treatment planning. However, such data are extremely high-dimensional, containing expression values for over 20,000 genes per patient, while the number of samples in the datasets is low. To deal with such settings, we propose to incorporate prior biological knowledge about genes from ontologies into the machine learning system for the task of classifying patients given their gene expression data. We use ontology embeddings that capture the semantic similarities between the genes to direct a Graph Convolutional Network, and thereby sparsify the network connections. We show that this approach provides an advantage for predicting clinical targets from high-dimensional, low-sample data.
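One hypothetical way to turn ontology embeddings into a sparse connectivity pattern for a GCN is to connect only genes whose embeddings are sufficiently similar; the thresholding scheme below is an illustrative assumption in the spirit of the approach, not the paper's exact construction.

```python
import numpy as np

def sparsify_by_embedding(E, threshold=0.8):
    """Sparse gene-gene adjacency from ontology embeddings: keep an edge
    between genes whose embedding cosine similarity exceeds a threshold
    (hypothetical scheme illustrating embedding-guided sparsification)."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalise rows
    S = En @ En.T                                      # cosine similarities
    A = (S > threshold).astype(float)                  # threshold to 0/1 edges
    np.fill_diagonal(A, 1.0)  # self-loops, as in standard GCN propagation
    return A

# Two semantically close genes and one unrelated gene (toy embeddings):
A = sparsify_by_embedding(np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]))
```

The resulting adjacency then replaces the dense gene-gene connectivity inside the GCN, cutting parameters where the ontology sees no relationship.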