Using ontology embeddings for structural inductive bias in gene expression data analysis


Published by Maja Trębacz

Publication date: 2020

Research field: Biology, Informatics Engineering

Paper language: English

Authors: Maja Trębacz, Zohreh Shams, Mateja Jamnik

Genomics, Machine Learning


Stratifying cancer patients based on their gene expression levels can improve diagnosis, survival analysis and treatment planning. However, such data is extremely high-dimensional, containing expression values for over 20,000 genes per patient, while the number of samples in the datasets is low. To deal with such settings, we propose to incorporate prior biological knowledge about genes from ontologies into the machine learning system for the task of patient classification given their gene expression data. We use ontology embeddings that capture the semantic similarities between the genes to direct a Graph Convolutional Network, and thereby sparsify the network connections. We show that this approach provides an advantage for predicting clinical targets from high-dimensional low-sample data.
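The idea of letting embedding similarity decide which gene-to-gene connections exist can be illustrated with a small sketch. This is not the authors' architecture: the embeddings and expression values are made up, the function names (`knn_adjacency`, `propagate`) are hypothetical, and plain mean aggregation stands in for a graph convolution with learned weights. It only shows how a k-nearest-neighbour rule over embeddings yields a sparse graph and how one propagation step mixes values along it.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_adjacency(embeddings, k):
    """Symmetric 0/1 adjacency linking each gene to its k most similar genes."""
    n = len(embeddings)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [(cosine(embeddings[i], embeddings[j]), j)
                for j in range(n) if j != i]
        sims.sort(reverse=True)
        for _, j in sims[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj

def propagate(adj, x):
    """One mean-aggregation step over each gene's neighbourhood (self included)."""
    n = len(x)
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j] or j == i]
        out.append(sum(x[j] for j in neigh) / len(neigh))
    return out

# Toy, made-up embeddings for four hypothetical genes (assumption, not real data).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
adj = knn_adjacency(embeddings, k=1)

# Made-up expression values for one patient.
expression = [2.0, 4.0, 10.0, 20.0]
smoothed = propagate(adj, expression)
print(adj)
print(smoothed)
```

With `k=1`, only two of the six possible gene pairs are connected, so each gene's value is mixed only with that of its semantically closest neighbour. In a trained network the aggregation would be followed by learned weights and a nonlinearity.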

89 - Erik Andries 2006

Linear discrimination, from the point of view of numerical linear algebra, can be treated as solving an ill-posed system of linear equations. In order to generate a solution that is robust in the presence of noise, these problems require regularization. Here, we examine the ill-posedness involved in the linear discrimination of cancer gene expression data with respect to outcome and tumor subclasses. We show that a filter factor representation, based upon Singular Value Decomposition, yields insight into the numerical ill-posedness of the hyperplane-based separation when applied to gene expression data. We also show that this representation yields useful diagnostic tools for guiding the selection of classifier parameters, thus leading to improved performance.
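For orientation, the standard filter factor form of a regularized least-squares solution can be written as follows. The notation is generic and assumed here rather than taken from the paper: $A$ is the data matrix with singular triplets $(\sigma_i, u_i, v_i)$ and rank $r$, $b$ is the label vector, and $\lambda$ is a regularization parameter. Tikhonov regularization is shown as one common choice of filter; the abstract does not say which filters the paper uses.

```latex
A = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{T}, \qquad
x_{\mathrm{reg}} = \sum_{i=1}^{r} f_i \, \frac{u_i^{T} b}{\sigma_i} \, v_i, \qquad
f_i = \frac{\sigma_i^{2}}{\sigma_i^{2} + \lambda^{2}}
```

Here $f_i \approx 1$ when $\sigma_i \gg \lambda$ and $f_i \approx 0$ when $\sigma_i \ll \lambda$, so components associated with small singular values, where division by $\sigma_i$ would amplify noise, are damped.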

Genomics

Prediction of gene expression time series and structural analysis of gene regulatory networks using recurrent neural networks

131 - Michele Monti, Jonathan Fiorentino, Edoardo Milanetti 2021

Methods for time series prediction and classification of gene regulatory networks (GRNs) from gene expression data have been treated separately so far. The recent emergence of attention-based recurrent neural network (RNN) models boosted the interpretability of RNN parameters, making them appealing for the understanding of gene interactions. In this work, we generated synthetic time series gene expression data from a range of archetypal GRNs and we relied on a dual attention RNN to predict the gene temporal dynamics. We show that the prediction is extremely accurate for GRNs with different architectures. Next, we focused on the attention mechanism of the RNN and, using tools from graph theory, we found that its graph properties make it possible to hierarchically distinguish different architectures of the GRN. We show that the GRNs respond differently to the addition of noise in the prediction by the RNN and we relate the noise response to the analysis of the attention mechanism. In conclusion, this work provides a way to understand and exploit the attention mechanism of RNNs and it paves the way to RNN-based methods for time series prediction and inference of GRNs from gene expression data.

Biological Physics; Machine Learning; Data Analysis, Statistics and Probability

Predicting Toxicity from Gene Expression with Neural Networks

156 - Peter Eastman, Vijay S. Pande 2019

We train a neural network to predict chemical toxicity based on gene expression data. The input to the network is a full expression profile collected either in vitro from cultured cells or in vivo from live animals. The output is a set of fine-grained predictions for the presence of a variety of pathological effects in treated animals. When trained on the Open TG-GATEs database it produces good results, outperforming classical models trained on the same data. This is a promising approach for efficiently screening chemicals for toxic effects, and for more accurately evaluating drug candidates based on preclinical data.

Genomics

Predicting Gene Expression Between Species with Neural Networks

121 - Peter Eastman, Vijay S. Pande 2019

We train a neural network to predict human gene expression levels based on experimental data for rat cells. The network is trained with paired human/rat samples from the Open TG-GATEs database, where paired samples were treated with the same compound at the same dose. When evaluated on a test set of held-out compounds, the network successfully predicts human expression levels. On the majority of the test compounds, the list of differentially expressed genes determined from predicted expression levels agrees well with the list of differentially expressed genes determined from actual human experimental data.

Genomics

Two distinct logical types of network control in gene expression profiles

700 - Carsten Marr, Marcel Geertz, Marc-Thorsten Huett 2007

In unicellular organisms such as bacteria the same acquired mutations beneficial in one environment can be restrictive in another. However, evolving Escherichia coli populations demonstrate remarkable flexibility in adaptation. The mechanisms sustaining genetic flexibility remain unclear. In E. coli the transcriptional regulation of gene expression involves both dedicated regulators binding specific DNA sites with high affinity and also global regulators - abundant DNA architectural proteins of the bacterial chromoid binding multiple low affinity sites and thus modulating the superhelical density of DNA. The first form of transcriptional regulation is dominantly pairwise and specific, representing digital control, while the second form is (in strength and distribution) continuous, representing analog control. Here we look at the properties of effective networks derived from significant gene expression changes under variation of the two forms of control and find that upon limitations of one type of control (caused e.g. by mutation of a global DNA architectural factor) the other type can compensate for compromised regulation. Mutations of global regulators significantly enhance the digital control; in the presence of global DNA architectural proteins regulation is mostly of the analog type, coupling spatially neighboring genomic loci; together our data suggest that two logically distinct types of control are balancing each other. By revealing two distinct logical types of control, our approach provides basic insights into both the organizational principles of transcriptional regulation and the mechanisms buffering genetic flexibility. We anticipate that the general concept of distinguishing logical types of control will apply to many complex biological networks.

Genomics, Molecular Networks