
Predicting inhibitors for SARS-CoV-2 RNA-dependent RNA polymerase using machine learning and virtual screening

Published by Nazim Medzhidov
Publication date: 2020
Research field: Biology
Language: English
Authored by Romeo Cozac Elix





The global coronavirus disease (COVID-19) pandemic, caused by the newly identified SARS-CoV-2 coronavirus, continues to claim the lives of thousands of people worldwide. The unavailability of specific medications to treat COVID-19 has led to drug repositioning efforts using various approaches, including computational analyses. Such analyses mostly rely on molecular docking and require the 3D structure of the target protein to be available. In this study, we utilized a set of machine learning algorithms and trained them on a dataset of RNA-dependent RNA polymerase (RdRp) inhibitors to run inference analyses on antiviral and anti-inflammatory drugs based solely on the ligand information. We also performed virtual screening analysis of the drug candidates predicted by the machine learning models and docked them against the active site of SARS-CoV-2 RdRp, a key component of the virus replication machinery. Based on the ligand information of RdRp inhibitors, the machine learning models were able to identify candidates such as remdesivir and baloxavir marboxil, molecules with documented activity against RdRp of the novel coronavirus. Among the other identified drug candidates were beclabuvir, a non-nucleoside inhibitor of the hepatitis C virus (HCV) RdRp enzyme, and the HCV protease inhibitors paritaprevir and faldaprevir. Further analysis of these candidates using molecular docking against the SARS-CoV-2 RdRp revealed low binding energies against the enzyme active site. Our approach also identified the anti-inflammatory drugs lupeol, lifitegrast, antrafenine, betulinic acid, and ursolic acid as having potential activity against SARS-CoV-2 RdRp. We propose that the results of this study be considered for further validation as potential therapeutic options against COVID-19.
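To make "based solely on the ligand information" concrete, the sketch below ranks candidate molecules by their similarity to known inhibitors without using any protein structure. It is not the authors' method: the abstract does not name the algorithms or features used, so a plain Tanimoto nearest-neighbor ranking is swapped in here as the simplest ligand-based baseline. The fingerprints are toy sets of bit indices, and every name (`tanimoto`, `rank_candidates`, `candidate_A`, and so on) is hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modeled as the set of "on" bit indices (assumption:
# in practice these would come from a cheminformatics toolkit).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_candidates(
    known_actives: List[Fingerprint],
    candidates: Dict[str, Fingerprint],
) -> List[Tuple[str, float]]:
    """Score each candidate by its best similarity to any known active,
    then sort from most to least similar."""
    scored = []
    for name, fp in candidates.items():
        best = max(tanimoto(fp, ref) for ref in known_actives)
        scored.append((name, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data only: these bit sets do not describe any real molecule.
known_actives = [frozenset({1, 2, 3, 5, 8}), frozenset({2, 3, 5, 13})]
candidates = {
    "candidate_A": frozenset({1, 2, 3, 5, 9}),
    "candidate_B": frozenset({20, 21, 22}),
    "candidate_C": frozenset({2, 3, 5, 13, 21}),
}

ranking = rank_candidates(known_actives, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a workflow like the one the abstract describes, the top-ranked candidates would then be passed to molecular docking against the target's active site as a second, structure-based check.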




Read also

Jingwei Liu 2021
COVID-19 genetics analysis is critical to determine virus type and virus variant and to evaluate vaccines. In this paper, SARS-CoV-2 RNA sequence analysis relative to region or territory is investigated. A uniform framework of sequence SVM models with various genetics lengths, from short to long and with mixed bases, is developed by projecting the SARS-CoV-2 RNA sequence to spaces of different dimensions, then scoring it according to the output probability of pre-trained SVM models to explore the territory or origin information of SARS-CoV-2. Different sample size ratios of training set to test set are also discussed in the data analysis. Two SARS-CoV-2 RNA classification tasks are constructed based on the GISAID database: one for mainland China, Hong Kong, and Taiwan, and the other a 6-class classification task (Africa, Asia, Europe, North America, South America & Central America, Oceania) of 7 continents. For the 3-class classification of China, the Top-1 accuracy rate can reach 82.45% (train 60%, test 40%). For the 2-class classification of China, the Top-1 accuracy rate can reach 97.35% (train 80%, test 20%). For the 6-class classification task of the world, when the ratio of training set to test set is 20% : 80%, the Top-1 accuracy rate can reach 30.30%. Some Top-N results are also given.
Structure-based Deep Fusion models were recently shown to outperform several physics- and machine learning-based protein-ligand binding affinity prediction methods. As part of a multi-institutional COVID-19 pandemic response, over 500 million small molecules were computationally screened against four protein structures from the novel coronavirus (SARS-CoV-2), which causes COVID-19. Three enhancements to Deep Fusion were made in order to evaluate more than 5 billion docked poses on SARS-CoV-2 protein targets. First, the Deep Fusion concept was refined by formulating the architecture as one, coherently backpropagated model (Coherent Fusion) to improve binding-affinity prediction accuracy. Secondly, the model was trained using a distributed, genetic hyper-parameter optimization. Finally, a scalable, high-throughput screening capability was developed to maximize the number of ligands evaluated and expedite the path to experimental evaluation. In this work, we present both the methods developed for machine learning-based high-throughput screening and results from using our computational pipeline to find SARS-CoV-2 inhibitors.
Zixuan Cang, Lin Mu, Guowei Wei 2017
This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test, respectively, the scoring power and the virtual screening power of the proposed topological approaches. It is demonstrated that the present approaches outperform modern machine learning-based methods in protein-ligand binding affinity predictions and ligand-decoy discrimination.
Single-nucleotide-resolution chemical mapping for structured RNA is being rapidly advanced by new chemistries, faster readouts, and coupling to computational algorithms. Recent tests have shown that selective 2′-hydroxyl acylation by primer extension (SHAPE) can give near-zero error rates (0-2%) in modeling the helices of RNA secondary structure. Here, we benchmark the method using six molecules for which crystallographic data are available: tRNA(phe) and 5S rRNA from Escherichia coli, the P4-P6 domain of the Tetrahymena group I ribozyme, and ligand-bound domains from riboswitches for adenine, cyclic di-GMP, and glycine. SHAPE-directed modeling of these highly structured RNAs gave an overall false negative rate (FNR) of 17% and a false discovery rate (FDR) of 21%, with at least one helix prediction error in five of the six cases. Extensive variations of data processing, normalization, and modeling parameters did not significantly mitigate modeling errors. Only one variation, filtering out data collected with deoxyinosine triphosphate during primer extension, gave a modest improvement (FNR = 12% and FDR = 14%). The residual structure modeling errors are explained by the insufficient information content of these RNAs' SHAPE data, as evaluated by a nonparametric bootstrapping analysis. Beyond these benchmark cases, bootstrapping suggests a low level of confidence (<50%) in the majority of helices in a previously proposed SHAPE-directed model for the HIV-1 RNA genome. Thus, SHAPE-directed RNA modeling is not always unambiguous, and helix-by-helix confidence estimates, as described herein, may be critical for interpreting results from this powerful methodology.
We propose a benchmark to study surrogate model accuracy for protein-ligand docking. We share a dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million in-stock molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. Our work shows surrogate docking models have six orders of magnitude more throughput than standard docking protocols on the same supercomputer node types. We demonstrate the power of high-speed surrogate models by running each target against 1 billion molecules in under a day (50k predictions per GPU second). We showcase a workflow for docking utilizing surrogate ML models as a pre-filter. Our workflow is ten times faster at screening a library of compounds than the standard technique, with an error rate of less than 0.01% in detecting the underlying best-scoring 0.1% of compounds. Our analysis of the speedup explains that to screen more molecules under a docking paradigm, another order of magnitude speedup must come from model accuracy rather than computing speed (which, if increased, will no longer alter our throughput to screen molecules). We believe this is strong evidence for the community to begin focusing on improving the accuracy of surrogate models to improve the ability to screen massive compound libraries 100x or even 1000x faster than current techniques.
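The surrogate pre-filter workflow mentioned in the last abstract above can be sketched roughly as follows: a cheap model scores the whole library, and the expensive docking step runs only on the best-scoring fraction. This is a minimal illustration under stated assumptions, not the benchmark's actual pipeline. The function name `prefilter_then_dock`, the `keep_fraction` parameter, and both toy scoring functions are hypothetical, and lower scores are assumed to mean better binding.

```python
from typing import Callable, List, Sequence, Tuple


def prefilter_then_dock(
    molecules: Sequence[int],
    surrogate_score: Callable[[int], float],
    dock_score: Callable[[int], float],
    keep_fraction: float,
) -> List[Tuple[int, float]]:
    """Rank all molecules with the cheap surrogate, dock only the top
    fraction, and return (molecule, docking score) sorted best first.
    Assumption: lower score means better predicted binding."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(molecules, key=surrogate_score)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    shortlist = ranked[:n_keep]
    docked = [(mol, dock_score(mol)) for mol in shortlist]
    return sorted(docked, key=lambda pair: pair[1])


# Toy setup: molecules are integer IDs, and the "true" docking score is -ID.
dock_calls = []


def toy_dock(mol: int) -> float:
    dock_calls.append(mol)  # record how often the expensive step runs
    return -float(mol)


def toy_surrogate(mol: int) -> float:
    # An imperfect approximation of the docking score (fixed offset error).
    return -float(mol) + (0.3 if mol % 2 == 0 else -0.3)


library = list(range(10))
results = prefilter_then_dock(library, toy_surrogate, toy_dock, keep_fraction=0.5)
print(results[0], len(dock_calls))
```

In this toy run only half of the library reaches the docking step. The abstract's point is that once the surrogate is this cheap, further gains depend on how reliably it keeps the truly best compounds in the shortlist, which is a question of surrogate accuracy rather than speed.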