Adaptive machine learning for protein engineering

Published by: Kevin Yang

Publication date: 2021

Research field: Biology, Informatics Engineering

Language: English

Authors: Brian L. Hie, Kevin K. Yang

Quantitative Methods, Machine Learning, Biomolecules

Abstract

Machine-learning models that learn from data to predict how protein sequence encodes function are emerging as a useful protein engineering tool. However, when using these models to suggest new protein designs, one must deal with the vast combinatorial complexity of protein sequences. Here, we review how to use a sequence-to-function machine-learning surrogate model to select sequences for experimental measurement. First, we discuss how to select sequences through a single round of machine-learning optimization. Then, we discuss sequential optimization, where the goal is to discover optimized sequences and improve the model across multiple rounds of training, optimization, and experimental measurement.
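The sequential setting described above can be sketched as a simple loop: fit a surrogate on the sequences measured so far, rank the unmeasured candidates, measure the top batch, and repeat. The following is a minimal illustration only, not the method reviewed in the paper. The four-letter alphabet, the `toy_assay` function (a stand-in for a wet-lab measurement), the ridge-regression surrogate, and the greedy top-batch selection are all assumptions chosen to keep the example small.

```python
import itertools
import numpy as np

ALPHABET = "ACDE"  # toy alphabet (assumption; real proteins use 20 amino acids)
LENGTH = 4         # toy sequence length (assumption)


def one_hot(seq):
    """Encode a sequence as a flat one-hot feature vector."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, aa in enumerate(seq):
        x[i, ALPHABET.index(aa)] = 1.0
    return x.ravel()


def toy_assay(seq):
    """Hypothetical stand-in for an experiment: matches to a hidden target."""
    target = "ACDA"
    return float(sum(a == b for a, b in zip(seq, target)))


def fit_ridge(X, y, lam=1.0):
    """Fit a ridge-regression surrogate and return its weight vector."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def sequential_optimization(n_rounds=3, batch_size=8, seed=0):
    """Alternate between training the surrogate and measuring its top picks."""
    rng = np.random.default_rng(seed)
    pool = ["".join(p) for p in itertools.product(ALPHABET, repeat=LENGTH)]

    # Round 0: measure a random starting batch.
    idx = rng.choice(len(pool), size=batch_size, replace=False)
    measured = {pool[i]: toy_assay(pool[i]) for i in idx}

    for _ in range(n_rounds):
        # Train the surrogate on everything measured so far.
        X = np.array([one_hot(s) for s in measured])
        y = np.array(list(measured.values()))
        w = fit_ridge(X, y)

        # Score unmeasured candidates and greedily take the top batch.
        candidates = [s for s in pool if s not in measured]
        scores = np.array([one_hot(s) @ w for s in candidates])
        top = np.argsort(scores)[::-1][:batch_size]

        # "Measure" the selected sequences and add them to the training set.
        for i in top:
            measured[candidates[i]] = toy_assay(candidates[i])

    return measured


if __name__ == "__main__":
    results = sequential_optimization()
    best = max(results, key=results.get)
    print(best, results[best])
```

Greedy selection on the predicted mean is the simplest possible choice; an acquisition rule that also accounts for model uncertainty would slot in where the candidates are ranked.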

Anna Paola Muntoni, Andrea Pagnani, Martin Weigt (2021)

Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionarily related protein and RNA families. They are parametrized in terms of local biases accounting for residue conservation, and pairwise terms to model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, the accuracy of these models has also been assessed in terms of their ability to predict mutational effects and to generate functional sequences in silico. Our adaptive implementation of Boltzmann machine learning, adabmDCA, can be applied to both protein and RNA families and supports several learning set-ups, depending on the complexity of the input data and on the user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA. As an example, we have performed the learning of three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and the TPP-riboswitch RNA domain. The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques for this task, in terms of the quality of both the inferred contact map and the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning, which allows for accurate and lossless training when equilibrium learning is prohibitive in terms of computational time, and allows for pruning irrelevant parameters using an information-based criterion.
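For orientation, a Boltzmann machine with local biases and pairwise terms, as described above, is commonly written as a Potts-type distribution over aligned sequences $(a_1, \dots, a_L)$. The symbols $h_i$, $J_{ij}$, $Z$ and $L$ follow a widespread convention and are assumptions here; the abstract itself does not fix a notation.

$$
P(a_1, \dots, a_L) = \frac{1}{Z} \exp\Big( \sum_{i=1}^{L} h_i(a_i) + \sum_{1 \le i < j \le L} J_{ij}(a_i, a_j) \Big)
$$

In this form, $h_i$ plays the role of the local bias at position $i$, $J_{ij}$ is the pairwise coupling between positions $i$ and $j$, and $Z$ is the normalizing constant summed over all sequences of length $L$.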

Quantitative Methods, Disordered Systems and Neural Networks, Biomolecules

Prediction and optimization of NaV1.7 inhibitors based on machine learning methods

Weikaixin Kong, Xinyu Tu, Zhengwei Xie (2019)

We used machine learning methods to predict NaV1.7 inhibitors and found that the RF-CDK model performed best on the imbalanced dataset. Using the RF-CDK model to screen drugs, we obtained the effective compound K1, which we verified with the cell patch-clamp method. However, because the model evaluation method in this article is not comprehensive enough, a lot of research work remains to be done, such as comparison with other existing methods. The target protein has multiple active sites and requires further research. We need more detailed models to account for this biological process and to compare with the current results; the lack of such models is an error in this article. We therefore want to withdraw this article.

Quantitative Methods, Machine Learning, Biomolecules

Machine Learning for Classification of Protein Helix Capping Motifs

Sean Mullane, Ruoyan Chen, Sri Vaishnavi Vemulapalli (2019)

The biological function of a protein stems from its 3-dimensional structure, which is thermodynamically determined by the energetics of interatomic forces between its amino acid building blocks (the order of amino acids, known as the sequence, defines a protein). Given the costs (time, money, human resources) of determining protein structures via experimental means such as X-ray crystallography, can we better describe and compare protein 3D structures in a robust and efficient manner, so as to gain meaningful biological insights? We begin by considering a relatively simple problem, limiting ourselves to just protein secondary structural elements. Historically, many computational methods have been devised to classify amino acid residues in a protein chain into one of several discrete secondary structures, of which the most well-characterized are the geometrically regular $\alpha$-helix and $\beta$-sheet; irregular structural patterns, such as turns and loops, are less understood. Here, we present a study of Deep Learning techniques to classify the loop-like end cap structures which delimit $\alpha$-helices. Previous work used highly empirical and heuristic methods to manually classify helix capping motifs. Instead, we use structural data directly--including (i) backbone torsion angles computed from 3D structures, (ii) macromolecular feature sets (e.g., physicochemical properties), and (iii) helix cap classification data (from CAPS-DB)--as the ground truth to train a bidirectional long short-term memory (BiLSTM) model to classify helix cap residues. We tried different network architectures and scanned hyperparameters in order to train and assess several models; we also trained a Support Vector Classifier (SVC) to use as a baseline. Ultimately, we achieved 85% class-balanced accuracy with a deep BiLSTM model.

Biomolecules, Machine Learning, Quantitative Methods

Quantitative Protein Dynamics from Dominant Folding Pathways

M. Sega, P. Faccioli, F. Pederiva (2007)

We develop a theoretical approach to the protein folding problem based on out-of-equilibrium stochastic dynamics. Within this framework, the computational difficulties related to the existence of large time scale gaps in the protein folding problem are removed, and simulating the entire reaction in atomistic detail using existing computers becomes feasible. In addition, this formalism provides a natural framework to investigate the relationships between thermodynamic and kinetic aspects of folding. For example, it is possible to show that, in order to have a large probability of remaining unchanged under Langevin diffusion, the native state has to be characterized by a small conformational entropy. We discuss how to determine the most probable folding pathway, to identify configurations representative of the transition state, and to compute the most probable transition time. We perform an illustrative application of these ideas, studying the conformational evolution of alanine dipeptide within an all-atom model based on the empirical GROMOS96 force field.

Quantitative Methods, Soft Condensed Matter, Biomolecules

Transfer Learning for Protein Structure Classification at Low Resolution

Alexander Hudson, Shaogang Gong (2020)

Structure determination is key to understanding protein function at a molecular level. Whilst significant advances have been made in predicting structure and function from amino acid sequence, researchers must still rely on expensive, time-consuming analytical methods to visualise detailed protein conformation. In this study, we demonstrate that it is possible to make accurate ($\geq$80%) predictions of protein class and architecture from structures determined at low ($>$3 Å) resolution, using a deep convolutional neural network trained on high-resolution ($\leq$3 Å) structures represented as 2D matrices. Thus, we provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function. We investigate the impact of the input representation on classification performance, showing that side-chain information may not be necessary for fine-grained structure predictions. Finally, we confirm that high-resolution, low-resolution and NMR-determined structures inhabit a common feature space, and thus provide a theoretical foundation for boosting with single-image super-resolution.

Computer Vision and Pattern Recognition, Machine Learning, Biomolecules