Combating small molecule aggregation with machine learning

73 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Tiago Rodrigues

تاريخ النشر 2021

مجال البحث علم الأحياء الهندسة المعلوماتية

والبحث باللغة English

تأليف Kuan Lee - Ann Yang - Yen-Chu Lin

الأساليب الكمية التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Biological screens are plagued by false positive hits resulting from aggregation. Thus, methods to triage small colloidally aggregating molecules (SCAMs) are in high demand. Herein, we disclose a bespoke machine-learning tool to confidently and intelligibly flag such entities. Our data demonstrate an unprecedented utility of machine learning for predicting SCAMs, achieving 80% of correct predictions in a challenging out-of-sample validation. The tool outperformed a panel of expert chemists, who correctly predicted 61 +/- 7% of the same test molecules in a Turing-like test. Further, the computational routine provided insight into molecular features governing aggregation that had remained hidden to expert intuition. Leveraging our tool, we quantify that up to 15-20% of ligands in publicly available chemogenomic databases have the high potential to aggregate at typical screening concentrations, imposing caution in systems biology and drug design programs. Our approach provides a means to augment human intuition, mitigate attrition and a pathway to accelerate future molecular medicine.

قيم البحث

113 - Brian L. Hie , Kevin K. Yang 2021

Machine-learning models that learn from data to predict how protein sequence encodes function are emerging as a useful protein engineering tool. However, when using these models to suggest new protein designs, one must deal with the vast combinatoria l complexity of protein sequences. Here, we review how to use a sequence-to-function machine-learning surrogate model to select sequences for experimental measurement. First, we discuss how to select sequences through a single round of machine-learning optimization. Then, we discuss sequential optimization, where the goal is to discover optimized sequences and improve the model across multiple rounds of training, optimization, and experimental measurement.

الأساليب الكمية التعلم الآلي الجزيئات الحيوية

Machine Learning and Glioblastoma: Treatment Response Monitoring Biomarkers in 2021

126 - Thomas Booth , Bernice Akpinar , Andrei Roman 2021

The aim of the systematic review was to assess recently published studies on diagnostic test accuracy of glioblastoma treatment response monitoring biomarkers in adults, developed through machine learning (ML). Articles were searched for using MEDLIN E, EMBASE, and the Cochrane Register. Included study participants were adult patients with high grade glioma who had undergone standard treatment (maximal resection, radiotherapy with concomitant and adjuvant temozolomide) and subsequently underwent follow-up imaging to determine treatment response status. Risk of bias and applicability was assessed with QUADAS 2 methodology. Contingency tables were created for hold-out test sets and recall, specificity, precision, F1-score, balanced accuracy calculated. Fifteen studies were included with 1038 patients in training sets and 233 in test sets. To determine whether there was progression or a mimic, the reference standard combination of follow-up imaging and histopathology at re-operation was applied in 67% of studies. The small numbers of patient included in studies, the high risk of bias and concerns of applicability in the study designs (particularly in relation to the reference standard and patient selection due to confounding), and the low level of evidence, suggest that limited conclusions can be drawn from the data. There is likely good diagnostic performance of machine learning models that use MRI features to distinguish between progression and mimics. The diagnostic performance of ML using implicit features did not appear to be superior to ML using explicit features. There are a range of ML-based solutions poised to become treatment response monitoring biomarkers for glioblastoma. To achieve this, the development and validation of ML models require large, well-annotated datasets where the potential for confounding in the study design has been carefully considered.

الأساليب الكمية التعلم الآلي

Dual-view Molecule Pre-training

155 - Jinhua Zhu , Yingce Xia , Tao Qin 2021

Inspired by its success in natural language processing and computer vision, pre-training has attracted substantial attention in cheminformatics and bioinformatics, especially for molecule based tasks. A molecule can be represented by either a graph ( where atoms are connected by bonds) or a SMILES sequence (where depth-first-search is applied to the molecular graph with specific rules). Existing works on molecule pre-training use either graph representations only or SMILES representations only. In this work, we propose to leverage both the representations and design a new pre-training algorithm, dual-view molecule pre-training (briefly, DMP), that can effectively combine the strengths of both types of molecule representations. The model of DMP consists of two branches: a Transformer branch that takes the SMILES sequence of a molecule as input, and a GNN branch that takes a molecular graph as input. The training of DMP contains three tasks: (1) predicting masked tokens in a SMILES sequence by the Transformer branch, (2) predicting masked atoms in a molecular graph by the GNN branch, and (3) maximizing the consistency between the two high-level representations output by the Transformer and GNN branches separately. After pre-training, we can use either the Transformer branch (this one is recommended according to empirical results), the GNN branch, or both for downstream tasks. DMP is tested on nine molecular property prediction tasks and achieves state-of-the-art performances on seven of them. Furthermore, we test DMP on three retrosynthesis tasks and achieve state-of-the-result on the USPTO-full dataset. Our code will be released soon.

الأساليب الكمية التعلم الآلي

Prediction and optimization of NaV1.7 inhibitors based on machine learning methods

119 - Weikaixin Kong , Xinyu Tu , Zhengwei Xie 2019

We used machine learning methods to predict NaV1.7 inhibitors and found the model RF-CDK that performed best on the imbalanced dataset. Using the RF-CDK model for screening drugs, we got effective compounds K1. We use the cell patch clamp method to v erify K1. However, because the model evaluation method in this article is not comprehensive enough, there is still a lot of research work to be performed, such as comparison with other existing methods. The target protein has multiple active sites and requires our further research. We need more detailed models to consider this biological process and compare it with the current results, which is an error in this article. So we want to withdraw this article.

الأساليب الكمية التعلم الآلي الجزيئات الحيوية

Molecular modeling with machine-learned universal potential functions

126 - Ke Liu , Zekun Ni , Zhenyu Zhou 2021

Molecular modeling is an important topic in drug discovery. Decades of research have led to the development of high quality scalable molecular force fields. In this paper, we show that neural networks can be used to train a universal approximator for energy potential functions. By incorporating a fully automated training process we have been able to train smooth, differentiable, and predictive potential functions on large-scale crystal structures. A variety of tests have also been performed to show the superiority and versatility of the machine-learned model.

الأساليب الكمية التعلم الآلي