Transfer Learning for Protein Structure Classification at Low Resolution

117 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Alexander Hudson

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Alexander Hudson - Shaogang Gong

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Structure determination is key to understanding protein function at a molecular level. Whilst significant advances have been made in predicting structure and function from amino acid sequence, researchers must still rely on expensive, time-consuming analytical methods to visualise detailed protein conformation. In this study, we demonstrate that it is possible to make accurate ($geq$80%) predictions of protein class and architecture from structures determined at low ($>$3A) resolution, using a deep convolutional neural network trained on high-resolution ($leq$3A) structures represented as 2D matrices. Thus, we provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function. We investigate the impact of the input representation on classification performance, showing that side-chain information may not be necessary for fine-grained structure predictions. Finally, we confirm that high-resolution, low-resolution and NMR-determined structures inhabit a common feature space, and thus provide a theoretical foundation for boosting with single-image super-resolution.

قيم البحث

358 - Hongyu Guo , Khalique Newaz , Scott Emrich 2019

As proteins with similar structures often have similar functions, analysis of protein structures can help predict protein functions and is thus important. We consider the problem of protein structure classification, which computationally classifies t he structures of proteins into pre-defined groups. We develop a weighted network that depicts the protein structures, and more importantly, we propose the first graphlet-based measure that applies to weighted networks. Further, we develop a deep neural network (DNN) composed of both convolutional and recurrent layers to use this measure for classification. Put together, our approach shows dramatic improvements in performance over existing graphlet-based approaches on 36 real datasets. Even comparing with the state-of-the-art approach, it almost halves the classification error. In addition to protein structure networks, our weighted-graphlet measure and DNN classifier can potentially be applied to classification of other weighted networks in computational biology as well as in other domains.

التعلم الالي التعلم الآلي الجزيئات الحيوية

Adaptive machine learning for protein engineering

113 - Brian L. Hie , Kevin K. Yang 2021

Machine-learning models that learn from data to predict how protein sequence encodes function are emerging as a useful protein engineering tool. However, when using these models to suggest new protein designs, one must deal with the vast combinatoria l complexity of protein sequences. Here, we review how to use a sequence-to-function machine-learning surrogate model to select sequences for experimental measurement. First, we discuss how to select sequences through a single round of machine-learning optimization. Then, we discuss sequential optimization, where the goal is to discover optimized sequences and improve the model across multiple rounds of training, optimization, and experimental measurement.

الأساليب الكمية التعلم الآلي الجزيئات الحيوية

Machine Learning for Classification of Protein Helix Capping Motifs

271 - Sean Mullane , Ruoyan Chen , Sri Vaishnavi Vemulapalli 2019

The biological function of a protein stems from its 3-dimensional structure, which is thermodynamically determined by the energetics of interatomic forces between its amino acid building blocks (the order of amino acids, known as the sequence, define s a protein). Given the costs (time, money, human resources) of determining protein structures via experimental means such as X-ray crystallography, can we better describe and compare protein 3D structures in a robust and efficient manner, so as to gain meaningful biological insights? We begin by considering a relatively simple problem, limiting ourselves to just protein secondary structural elements. Historically, many computational methods have been devised to classify amino acid residues in a protein chain into one of several discrete secondary structures, of which the most well-characterized are the geometrically regular $alpha$-helix and $beta$-sheet; irregular structural patterns, such as turns and loops, are less understood. Here, we present a study of Deep Learning techniques to classify the loop-like end cap structures which delimit $alpha$-helices. Previous work used highly empirical and heuristic methods to manually classify helix capping motifs. Instead, we use structural data directly--including (i) backbone torsion angles computed from 3D structures, (ii) macromolecular feature sets (e.g., physicochemical properties), and (iii) helix cap classification data (from CAPS-DB)--as the ground truth to train a bidirectional long short-term memory (BiLSTM) model to classify helix cap residues. We tried different network architectures and scanned hyperparameters in order to train and assess several models; we also trained a Support Vector Classifier (SVC) to use as a baseline. Ultimately, we achieved 85% class-balanced accuracy with a deep BiLSTM model.

الجزيئات الحيوية التعلم الآلي الأساليب الكمية

Multi-resolution Outlier Pooling for Sorghum Classification

286 - Chao Ren , Justin Dulay , Gregory Rolwes 2021

Automated high throughput plant phenotyping involves leveraging sensors, such as RGB, thermal and hyperspectral cameras (among others), to make large scale and rapid measurements of the physical properties of plants for the purpose of better understa nding the difference between crops and facilitating rapid plant breeding programs. One of the most basic phenotyping tasks is to determine the cultivar, or species, in a particular sensor product. This simple phenotype can be used to detect errors in planting and to learn the most differentiating features between cultivars. It is also a challenging visual recognition task, as a large number of highly related crops are grown simultaneously, leading to a classification problem with low inter-class variance. In this paper, we introduce the Sorghum-100 dataset, a large dataset of RGB imagery of sorghum captured by a state-of-the-art gantry system, a multi-resolution network architecture that learns both global and fine-grained features on the crops, and a new global pooling strategy called Dynamic Outlier Pooling which outperforms standard global pooling strategies on this task.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Deep Learning Classification of Lake Zooplankton

70 - S. P. Kyathanahally , T. Hardeman , E. Merz 2021

Plankton are effective indicators of environmental change and ecosystem health in freshwater habitats, but collection of plankton data using manual microscopic methods is extremely labor-intensive and expensive. Automated plankton imaging offers a pr omising way forward to monitor plankton communities with high frequency and accuracy in real-time. Yet, manual annotation of millions of images proposes a serious challenge to taxonomists. Deep learning classifiers have been successfully applied in various fields and provided encouraging results when used to categorize marine plankton images. Here, we present a set of deep learning models developed for the identification of lake plankton, and study several strategies to obtain optimal performances,which lead to operational prescriptions for users. To this aim, we annotated into 35 classes over 17900 images of zooplankton and large phytoplankton colonies, detected in Lake Greifensee (Switzerland) with the Dual Scripps Plankton Camera. Our best models were based on transfer learning and ensembling, which classified plankton images with 98% accuracy and 93% F1 score. When tested on freely available plankton datasets produced by other automated imaging tools (ZooScan, FlowCytobot and ISIIS), our models performed better than previously used models. Our annotated data, code and classification models are freely available online.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي السكان والتطور