Incorporating network based protein complex discovery into automated model construction


Submitted by Paul Scherer

Publication date: 2020

Research field: Biology, Informatics Engineering

Paper language: English

Authors: Paul Scherer, Maja Trębacz, Nikola Simidjievski


Abstract

We propose a method for gene-expression-based analysis of cancer phenotypes that incorporates network biology knowledge through the unsupervised construction of computational graphs. The structure of these computational graphs is determined by topological clustering algorithms, applied to protein-protein interaction networks, that incorporate inductive biases stemming from network biology research on protein complex discovery. This structurally constrains the hypothesis space of possible computational graph factorisations, whose parameters can then be learned in supervised or unsupervised task settings. The sparse construction of the computational graph enables differential protein complex activity analysis while also allowing interpretation of the individual contributions of the genes/proteins involved in each protein complex. In experiments analysing a variety of cancer phenotypes, we show that the proposed methods outperform SVMs, fully connected MLPs, and randomly connected MLPs in all tasks. Our work introduces a scalable method for incorporating large interaction networks as prior knowledge to drive the construction of powerful computational models amenable to introspective study.
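The following is a rough illustration of how clusters found in a protein-protein interaction network can constrain a model's connectivity. The sketch builds a binary mask in which each hidden unit (one per putative protein complex) receives input only from its member genes. The gene list, the cluster assignments, the `tanh` activation, and the single masked linear layer are all hypothetical choices made for this example. The paper's own clustering algorithms and architecture are not reproduced here.

```python
import numpy as np

def complex_mask(genes, complexes):
    """Binary mask of shape (n_complexes, n_genes): mask[c, g] = 1 iff gene g belongs to complex c."""
    index = {g: i for i, g in enumerate(genes)}
    mask = np.zeros((len(complexes), len(genes)))
    for c, members in enumerate(complexes):
        for g in members:
            mask[c, index[g]] = 1.0
    return mask

class MaskedLinear:
    """Hypothetical linear layer whose weights are forced to zero outside the membership mask."""

    def __init__(self, mask, seed=0):
        rng = np.random.default_rng(seed)
        self.mask = mask
        self.weight = rng.normal(scale=0.1, size=mask.shape) * mask
        self.bias = np.zeros(mask.shape[0])

    def forward(self, x):
        # x: (n_samples, n_genes) expression matrix -> (n_samples, n_complexes) "complex activities".
        # Re-applying the mask keeps non-member connections at zero even if weights are updated.
        return np.tanh(x @ (self.weight * self.mask).T + self.bias)

genes = ["TP53", "MDM2", "BRCA1", "BARD1", "RAD51"]
complexes = [["TP53", "MDM2"], ["BRCA1", "BARD1", "RAD51"]]  # hypothetical clusters
mask = complex_mask(genes, complexes)
layer = MaskedLinear(mask)
x = np.random.default_rng(1).normal(size=(4, len(genes)))
h = layer.forward(x)
print(h.shape)  # (4, 2)
```

Each weight in row `c` of `layer.weight` can be read as the contribution of one gene to one complex. This is the kind of per-gene, per-complex introspection that sparsity makes possible.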


Jie Fang, Jianwu Lin, Shutao Xia (2020)

Instead of constructing factors manually based on traditional and behavioural finance analysis, academic researchers and quantitative investment managers have in recent years leveraged Genetic Programming (GP) as an automatic feature construction tool, which builds reverse Polish mathematical expressions from trading data into new factors. However, with the development of deep learning, more powerful feature extraction tools are available. This paper proposes Neural Network-based Automatic Factor Construction (NNAFC), a tailored neural network framework that can automatically construct diversified financial factors based on financial domain knowledge and a variety of neural network structures. The experimental results show that NNAFC can construct more informative and diversified factors than GP, effectively enriching the current factor pool. For the current market, both fully connected and recurrent neural network structures are better at extracting information from financial time series than convolutional neural network structures. Moreover, new factors constructed by NNAFC always improve the return, Sharpe ratio, and maximum drawdown of a multi-factor quantitative investment strategy, because they introduce more information and diversification into the existing factor pool.

Statistical Finance, Machine Learning
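The GP baseline described above represents a factor as a reverse Polish expression over trading data. This minimal sketch evaluates such an expression element-wise over aligned price/volume series using a stack. The field names (`open`, `close`, `volume`), the operator set, and the example factor are hypothetical. The sketch illustrates only the expression format, not NNAFC itself.

```python
def eval_rpn(tokens, data):
    """Evaluate a reverse Polish factor expression element-wise over aligned series.

    tokens: field names (keys of `data`) or binary operators, in postfix order.
    data:   dict mapping field names to equal-length lists of floats.
    """
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b if b != 0 else float("nan"),  # guard against zero division
    }
    stack = []
    for tok in tokens:
        if tok in ops:
            right = stack.pop()  # operands come off the stack in reverse order
            left = stack.pop()
            stack.append([ops[tok](a, b) for a, b in zip(left, right)])
        else:
            stack.append(list(data[tok]))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

bars = {"open": [10.0, 11.0], "close": [11.0, 10.5], "volume": [100.0, 200.0]}
# Hypothetical factor (close - open) / volume, written in postfix form.
factor = eval_rpn(["close", "open", "-", "volume", "/"], bars)
print(factor)  # [0.01, -0.0025]
```

Postfix notation needs no parentheses, which makes it convenient for GP: subtrees can be swapped or mutated and the result is still a valid expression.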

Network Enhancement: a general method to denoise weighted biological networks

Bo Wang, Armin Pourshafeie, Marinka Zitnik (2018)

Networks are ubiquitous in biology, where they encode connectivity patterns at all scales of organization, from the molecular to the biome. However, biological networks are noisy due to the limitations of measurement technology and inherent natural variation, which can hamper the discovery of network patterns and dynamics. We propose Network Enhancement (NE), a method for improving the signal-to-noise ratio of undirected, weighted networks. NE uses a doubly stochastic matrix operator that induces sparsity and provides a closed-form solution that increases the spectral eigengap of the input network. As a result, NE removes weak edges, enhances real connections, and leads to better downstream performance. Experiments show that NE improves gene function prediction by denoising tissue-specific interaction networks, eases the interpretation of noisy Hi-C contact maps from the human genome, and boosts the accuracy of fine-grained species identification. Our results indicate that NE is widely applicable for denoising biological networks.

Molecular Networks, Machine Learning, Social and Information Networks
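The following is a simplified sketch of an NE-style diffusion, assuming this three-step formulation:

1. Sparsify the input to each node's `k` strongest neighbours and row-normalise, giving `P`.
2. Form `T_ij = sum_k P_ik P_jk / sum_v P_vk`. By construction, `T` is symmetric and doubly stochastic.
3. Iterate `W <- alpha * T W T + (1 - alpha) * T`.

The parameter names `k`, `alpha`, and `n_iter` are assumptions. The pre- and post-processing steps of the published method are omitted, so treat this as an illustration of the doubly stochastic operator rather than a reference implementation.

```python
import numpy as np

def network_enhancement(W, k=2, alpha=0.9, n_iter=50):
    """Sketch of an NE-style diffusion on a symmetric, non-negative weighted adjacency matrix W.

    Assumes every node has at least one positive-weight neighbour.
    """
    n = W.shape[0]
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    # 1. Keep each node's k strongest neighbours and row-normalise -> P.
    P = np.zeros_like(W)
    for i in range(n):
        nbrs = np.argsort(W[i])[::-1][:k]
        P[i, nbrs] = W[i, nbrs]
        P[i] /= P[i].sum()
    # 2. T_ij = sum_k P_ik P_jk / sum_v P_vk; rows of T sum to 1 and T is symmetric.
    col = P.sum(axis=0)
    col[col == 0] = 1.0  # columns never selected are all-zero, so any divisor works
    T = (P / col) @ P.T
    # 3. Diffusion: W_{t+1} = alpha * T W_t T + (1 - alpha) * T, starting from W_0 = T.
    Wt = T.copy()
    for _ in range(n_iter):
        Wt = alpha * T @ Wt @ T + (1 - alpha) * T
    return T, Wt

rng = np.random.default_rng(0)
A = rng.random((6, 6))
A = (A + A.T) / 2  # hypothetical symmetric "noisy" weighted network
T, W_enh = network_enhancement(A, k=3)
print(np.allclose(T.sum(axis=1), 1.0), np.allclose(T, T.T))  # True True
```

Under this formulation, the iteration converges to the closed form `(1 - alpha) T (I - alpha T^2)^(-1)`. Each eigenvalue `lambda` of `T` is mapped to `(1 - alpha) lambda / (1 - alpha lambda^2)`, which shrinks small eigenvalues relative to those near 1. This is one way to see how such an operator can widen the eigengap.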

ComPPI, a cellular compartment-specific database for protein-protein interaction network analysis

Daniel V. Veres, David M. Gyurko, Benedek Thaler (2014)

Here we present ComPPI, a cellular-compartment-specific database of proteins and their interactions that enables extensive, compartmentalized protein-protein interaction network analysis (http://ComPPI.LinkGroup.hu). ComPPI enables the user to filter out biologically unlikely interactions, in which the two interacting proteins have no common subcellular localization, and to predict novel properties, such as compartment-specific biological functions. ComPPI is an integrated database covering four species (S. cerevisiae, C. elegans, D. melanogaster and H. sapiens). The compilation of nine protein-protein interaction and eight subcellular localization data sets involved four curation steps, including a manually built, comprehensive hierarchical structure of more than 1600 subcellular localizations. ComPPI provides confidence scores for protein subcellular localizations and protein-protein interactions. ComPPI has user-friendly search options for individual proteins, returning their subcellular localizations, their interactions, and the likelihood of those interactions given the subcellular localizations of their interacting partners. Search results, whole proteomes, organelle-specific interactomes and subcellular localization data can be downloaded from its website. Owing to these novel features, ComPPI is useful for analysing experimental results in biochemistry and molecular biology, as well as for proteome-wide studies in bioinformatics and network science, supporting cellular biology, medicine and drug design.

Molecular Networks, Biological Physics
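The localization-based filter described in the ComPPI abstract keeps an interaction only when the two proteins share a subcellular compartment. This minimal sketch implements that rule as a set intersection. The protein identifiers and compartment names are hypothetical, and the sketch ignores ComPPI's confidence scores and hierarchical localization structure.

```python
def filter_by_localization(interactions, localizations):
    """Split interactions by whether the two proteins share at least one subcellular compartment.

    interactions:  iterable of (protein_a, protein_b) pairs.
    localizations: dict mapping protein -> set of compartment names.
    Proteins with no known localization are treated as sharing nothing.
    """
    kept, removed = [], []
    for a, b in interactions:
        shared = localizations.get(a, set()) & localizations.get(b, set())
        if shared:
            kept.append((a, b, sorted(shared)))
        else:
            removed.append((a, b, []))
    return kept, removed

locs = {  # hypothetical localization annotations
    "P1": {"nucleus", "cytosol"},
    "P2": {"cytosol"},
    "P3": {"mitochondrion"},
}
kept, removed = filter_by_localization([("P1", "P2"), ("P2", "P3")], locs)
print(kept)     # [('P1', 'P2', ['cytosol'])]
print(removed)  # [('P2', 'P3', [])]
```

Treating unannotated proteins as non-overlapping is a conservative choice. A real pipeline might instead keep such interactions and flag them as unscored.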

A comprehensive statistical study of metabolic and protein-protein interaction network properties

D. Gamermann, J. Triana, R. Jaime (2017)

Understanding the mathematical properties of graphs underlying biological systems could give hints about the evolutionary mechanisms behind these structures. In this article we perform a complete statistical analysis of thousands of graphs representing metabolic and protein-protein interaction (PPI) networks. First, we investigate the quality of fits of the nodes' degree distributions to power-law functions. This analysis suggests that a power-law distribution poorly describes the data, except for the far right tail in the case of PPI networks. Next, we obtain descriptive statistics for the main graph parameters and try to identify the properties that deviate from the values expected had the networks been built by randomly linking nodes with the same degree distribution. This survey identifies the properties of biological networks that are not solely the result of their degree distribution, but emerge from as yet unidentified mechanisms other than those that drive these distributions. The findings suggest that, while PPI networks have properties that differ from their expected values in their randomiz…

Molecular Networks, Statistics Applications
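The study above compares observed graph statistics with those expected for networks "built by randomly linking nodes with the same degree distribution". A common way to generate such null networks is degree-preserving double-edge swapping. The sketch below uses that generic technique, which is not necessarily the authors' exact procedure, to compare a triangle count against randomised counterparts. The example edge list is hypothetical.

```python
import random

def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomise an undirected simple graph by double-edge swaps that keep every node's degree."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # allow both possible rewirings of the pair
        # Rewire a-b, c-d into a-d, c-b; reject self-loops and duplicate edges.
        if len({a, b, c, d}) < 4:
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def degrees(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def triangle_count(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is counted once for each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

original = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2),
            (4, 5), (5, 6), (6, 7), (7, 4), (4, 6)]
null = degree_preserving_shuffle(original, n_swaps=20)
print(degrees(null) == degrees(original))  # True
print(triangle_count(original))  # 4
# Distribution of the statistic over many degree-matched random networks.
null_counts = [triangle_count(degree_preserving_shuffle(original, 20, seed=s)) for s in range(100)]
print(sum(null_counts) / len(null_counts))
```

A statistic that falls far outside its null distribution, such as this triangle count, is one that the degree sequence alone does not explain.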

Autonomous Discovery of Unknown Reaction Pathways from Data by Chemical Reaction Neural Network

Weiqi Ji, Sili Deng (2020)

Chemical reactions occur in energy, environmental, biological, and many other natural systems, and inferring the reaction networks is essential to understanding and designing chemical processes in engineering and the life sciences. Yet revealing the reaction pathways of complex systems and processes remains challenging due to the lack of knowledge of the species and reactions involved. Here, we present a neural network approach that autonomously discovers reaction pathways from time-resolved species concentration data. The proposed Chemical Reaction Neural Network (CRNN), by design, satisfies fundamental physical laws, including the Law of Mass Action and the Arrhenius Law. Consequently, the CRNN is physically interpretable: the reaction pathways can be interpreted, and the kinetic parameters quantified, simultaneously from the weights of the neural network. The chemical pathways are inferred by training the CRNN on species concentration data via stochastic gradient descent. We demonstrate successful implementations and the robustness of the approach in elucidating the chemical reaction pathways of several chemical engineering and biochemical systems. Autonomous inference with the CRNN removes the need for expert knowledge in proposing candidate networks and addresses the curse of dimensionality in complex systems. Its physical interpretability also makes the CRNN capable not only of fitting the data for a given system but also of developing knowledge of unknown pathways that could be generalized to similar chemical systems.

Molecular Networks, Machine Learning, Chemical Physics
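The CRNN abstract says that the Law of Mass Action and the Arrhenius Law are built into the network. This sketch shows one way a forward pass can encode both, assuming the following parameterization:

- Inputs are log-concentrations.
- Input weights act as reaction orders.
- Each hidden unit's bias is an Arrhenius-style `ln k`.
- Output weights act as stoichiometric coefficients.

The function name, argument names, and this exact parameterization are assumptions for illustration. Training by stochastic gradient descent is not shown.

```python
import math

GAS_CONSTANT = 8.314  # J / (mol K)

def crnn_rates(conc, T, orders, stoich, lnA, Ea, b):
    """Hypothetical CRNN-style forward pass returning dC/dt for each species.

    conc:   species concentrations (all > 0), length S
    T:      temperature in K
    orders: per-reaction reaction orders (input-layer weights), shape R x S
    stoich: per-reaction stoichiometric coefficients (output-layer weights), shape R x S
    lnA, Ea, b: per-reaction Arrhenius parameters, ln k = lnA + b ln T - Ea / (R T)
    """
    log_c = [math.log(c) for c in conc]
    dcdt = [0.0] * len(conc)
    for r in range(len(orders)):
        ln_k = lnA[r] + b[r] * math.log(T) - Ea[r] / (GAS_CONSTANT * T)
        # Mass action in log space: rate = k * prod_s c_s ** order_s.
        rate = math.exp(ln_k + sum(o * lc for o, lc in zip(orders[r], log_c)))
        for s, nu in enumerate(stoich[r]):
            dcdt[s] += nu * rate
    return dcdt

# Hypothetical single reaction A -> B, first order in A, with k = 2 (Ea = 0, b = 0).
dcdt = crnn_rates(conc=[0.5, 0.1], T=300.0, orders=[[1.0, 0.0]], stoich=[[-1.0, 1.0]],
                  lnA=[math.log(2.0)], Ea=[0.0], b=[0.0])
print(dcdt)  # approximately [-1.0, 1.0]
```

Because every parameter has a physical meaning, a fitted network under this parameterization can be read back as a list of reactions with orders, stoichiometry, and rate constants.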