Functional Protein Structure Annotation Using a Deep Convolutional Generative Adversarial Network


Published by Ethan Moyer

Publication date: 2021

Research field: Biology, Informatics Engineering

Paper language: English

Authors: Ethan Moyer, Jeff Winchell, Isamu Isozaki

Biomolecules, Machine Learning


Identifying novel functional protein structures is at the heart of molecular engineering and molecular biology, and it often requires a computationally exhaustive search. We introduce the use of a Deep Convolutional Generative Adversarial Network (DCGAN) to classify protein structures based on their functionality by encoding each sample in a grid object structure using three features in each object: the generic atom type, the position atom type, and its occupancy relative to a given atom. We train the DCGAN on 3-dimensional (3D) decoy and native protein structures in order to generate and discriminate 3D protein structures. At the end of our training, the loss converges to a local minimum and our DCGAN can annotate functional proteins robustly against adversarial protein samples. In the future, we hope to extend the novel structures produced by the generator in our DCGAN with more samples, in order to explore functionality at a finer granularity across varying functions. We hope that our effort will advance the field of protein structure prediction.
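The grid encoding described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the function name `encode_grid`, the grid size, the spacing, and the integer codes used for atom types are all assumptions made for illustration. It only shows how three per-atom features could be stored in a 3D grid centered on a chosen atom.

```python
import numpy as np

def encode_grid(coords, generic_type, position_type, occupancy,
                center, size=16, spacing=1.0):
    """Place atoms near `center` into a size^3 grid with 3 feature channels.

    All names and defaults here are hypothetical; the abstract only states
    that each grid object holds three features.
    """
    grid = np.zeros((size, size, size, 3), dtype=np.float32)
    # Shift coordinates so that `center` maps to the middle cell of the grid.
    offsets = (np.asarray(coords, dtype=float) - np.asarray(center, dtype=float)) / spacing
    idx = np.floor(offsets).astype(int) + size // 2
    for (i, j, k), g, p, o in zip(idx, generic_type, position_type, occupancy):
        # Atoms that fall outside the grid are skipped.
        if 0 <= i < size and 0 <= j < size and 0 <= k < size:
            grid[i, j, k] = (g, p, o)
    return grid

# Three atoms; the last one lies far outside the grid and is dropped.
coords = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [100.0, 0.0, 0.0]]
grid = encode_grid(coords, generic_type=[1, 2, 1], position_type=[2, 1, 3],
                   occupancy=[1.0, 0.5, 1.0], center=[0.0, 0.0, 0.0], size=8)
```

A tensor of this shape is the kind of input a 3D convolutional discriminator could consume, although the actual network layout is not given in the abstract.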


Sheng Wang, Jian Peng, Jianzhu Ma (2015)

Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as the input feature, the best current predictors obtain ~80% Q3 accuracy, a figure that has not improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationships through a deep hierarchical architecture, but also the interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
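For readers unfamiliar with the metric, Q3 accuracy is the fraction of residues whose predicted three-state label matches the true one. The sketch below assumes the three states are written as the letters H, E, and C in equal-length strings; the function name `q3_accuracy` is made up for this example and is not part of DeepCNF.

```python
def q3_accuracy(true_ss, pred_ss):
    """Fraction of positions where the predicted 3-state label is correct."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("label strings must have the same length")
    if not true_ss:
        raise ValueError("label strings must not be empty")
    matches = sum(t == p for t, p in zip(true_ss, pred_ss))
    return matches / len(true_ss)

# One of nine residues is mislabeled (E predicted as C).
score = q3_accuracy("HHHEEECCC", "HHHEECCCC")
```

Q8 accuracy is computed the same way over an eight-state alphabet.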

Biomolecules, Machine Learning, Quantitative Methods

Functional annotation of creeping bentgrass protein sequences based on convolutional neural network

Han-Yu Jiang, Jun He (2021)

Background: Creeping bentgrass (Agrostis stolonifera) is a perennial cool-season turfgrass of the Gramineae family, but it has shallow adventitious roots and poor disease resistance. Little is known about the ISR mechanism of turfgrass or the signal transduction involved in inducing disease resistance; in particular, the functions of a large number of disease-resistance-related proteins urgently need to be explored. Results: In this work, the protein sequences of creeping bentgrass were measured and annotated by a functional prediction model based on a convolutional neural network. Creeping bentgrass seedlings were grown with BDO treatment, and the ISR response was induced by infection with Rhizoctonia solani. We performed transcriptome analysis by Illumina sequencing and obtained high-quality unigenes. A minority of the assembled unigenes were functionally annotated by database alignment, while a large part of the obtained amino acid sequences was left unannotated. To treat the unannotated sequences, a prediction model was established by training on a data set from GO families in three domains to achieve good performance, especially a higher false positive control rate. With this model, we analyzed the unannotated protein sequences of the creeping bentgrass transcriptome and annotated the proteins related to disease-resistance response and signal transduction. Conclusions: The results provide good candidates for proteins with specific functions. With these results, waste of creeping bentgrass transcriptome sequencing data can be avoided, and research time and labor for analyzing the ISR characteristics of creeping bentgrass will be saved in further research. The work also provides a reference for sequence analysis in turfgrass disease-resistance research.

Genomics

Deep Generative Model Driven Protein Folding Simulation

Heng Ma, Debsindhu Bhowmik, Hyungro Lee (2019)

Significant progress in computer hardware and software has enabled molecular dynamics (MD) simulations to model complex biological phenomena such as protein folding. However, enabling MD simulations to access biologically relevant timescales (e.g., beyond milliseconds) remains challenging. The limitations include (1) quantifying which set of states has already been (sufficiently) sampled in an ensemble of MD runs, and (2) identifying novel states from which simulations can be initiated to sample rare events (e.g., folding events). With the recent success of deep learning and artificial intelligence techniques in analyzing large datasets, we posit that these techniques can also be used to adaptively guide MD simulations to model such complex biological phenomena. Leveraging our recently developed unsupervised deep learning technique to cluster protein folding trajectories into partially folded intermediates, we build an iterative workflow that enables our generative model to be coupled with all-atom MD simulations to fold small protein systems on emerging high performance computing platforms. We demonstrate our approach in folding Fs-peptide and the $\beta\beta\alpha$ (BBA) fold, FSD-EY. Our adaptive workflow enables us to achieve an overall root-mean-squared deviation (RMSD) to the native state of 1.6 Å and 4.4 Å for Fs-peptide and FSD-EY, respectively. We also highlight some emerging challenges in the context of designing scalable workflows when data-intensive deep learning techniques are coupled to compute-intensive MD simulations.
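The RMSD figures quoted above measure how far a simulated conformation is from the native state. As a rough illustration, the sketch below computes the RMSD between two sets of matched atom coordinates. It assumes the two structures have already been superimposed and skips the optimal rotation step that a full analysis would normally include; the function name `rmsd` is chosen for this example only.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Root-mean-squared deviation between two (N, 3) coordinate arrays.

    Assumes atoms are matched row by row and the structures are already
    superimposed; no alignment is performed here.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("coordinate arrays must have the same shape")
    squared_dist = np.sum((a - b) ** 2, axis=1)  # per-atom squared distance
    return float(np.sqrt(np.mean(squared_dist)))

native = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
model = [[0.0, 0.0, 0.0], [1.0, 0.0, 2.0]]
value = rmsd(native, model)  # squared distances are 0 and 4, so mean is 2
```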

Biomolecules

Protein-RNA interaction prediction with deep learning: Structure matters

Junkang Wei, Siyuan Chen, Licheng Zong (2021)

Protein-RNA interactions are of vital importance to a variety of cellular activities. Both experimental and computational techniques have been developed to study these interactions. Because of the limitations of earlier databases, especially the lack of protein structure data, most existing computational methods rely heavily on sequence data, and only a small portion of them use structural information. Recently, AlphaFold has revolutionized the entire protein and biology field. Protein-RNA interaction prediction can be expected to advance significantly in the coming years as a result. In this work, we give a thorough review of this field, surveying both the binding site and binding preference prediction problems and covering the commonly used datasets, features, and models. We also point out the potential challenges and opportunities in this field. This survey summarizes the past development of the RBP-RNA interaction field and anticipates its future development in the post-AlphaFold era.

Biomolecules, Machine Learning, Neural and Evolutionary Computing

TALI: Protein Structure Alignment Using Backbone Torsion Angles

Xijiang Miao, Michael G. Bryson, Homayoun Valafar (2020)

This article introduces a novel protein structure alignment method (named TALI) based on the protein backbone torsion angles instead of the more traditional distance matrix. Because the structural alignment of two proteins is based on the comparison of two sequences of numbers (backbone torsion angles), we can take advantage of a large number of well-developed methods such as Smith-Waterman or Needleman-Wunsch. Here we report the results of TALI in comparison to other structure alignment methods such as DALI, CE, and SSM, as well as sequence alignment based on PSI-BLAST. TALI outperformed all of the other methods when applied to challenging proteins, and it was more successful in recognizing remote structural homology. TALI also demonstrated an ability to identify structural homology between two proteins where the structural difference was due to a rotation of internal domains by nearly 180$^\circ$.
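The idea of aligning two sequences of torsion angles with a standard dynamic-programming method can be sketched as follows. This is not TALI's actual scoring scheme, which the abstract does not specify: the similarity function `torsion_score`, the 30-degree tolerance, and the linear gap penalty are assumptions made for illustration. The recurrence itself is the usual Smith-Waterman local alignment.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def torsion_score(res_a, res_b, tolerance=30.0):
    """Hypothetical similarity of two (phi, psi) pairs.

    Returns 1 for identical angles, 0 at `tolerance`, negative beyond it.
    """
    mean_diff = (angle_diff(res_a[0], res_b[0]) + angle_diff(res_a[1], res_b[1])) / 2.0
    return 1.0 - mean_diff / tolerance

def smith_waterman(seq_a, seq_b, gap=1.0):
    """Best local alignment score between two sequences of (phi, psi) pairs."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    h = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0.0,                                                      # start a new local alignment
                h[i - 1][j - 1] + torsion_score(seq_a[i - 1], seq_b[j - 1]),  # align two residues
                h[i - 1][j] - gap,                                        # gap in seq_b
                h[i][j - 1] - gap,                                        # gap in seq_a
            )
            best = max(best, h[i][j])
    return best

# Idealized helix-like and strand-like (phi, psi) values in degrees.
helix = [(-60.0, -45.0)] * 5
strand = [(-120.0, 130.0)] * 5
mixed = strand[:2] + helix[:3]
```

Here `smith_waterman(helix, mixed)` rewards the three helix-like residues shared by both inputs, while `smith_waterman(helix, strand)` finds no similar stretch at all.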

Biomolecules, Machine Learning