
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

Published by Zixuan Cang
Publication date: 2017
Research field: Biology
Language: English





This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology, and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive power for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test the scoring power and the virtual screening power, respectively, of the proposed topological approaches. It is demonstrated that the present approaches outperform modern machine-learning-based methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
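To make the pairing of persistence diagrams with the Wasserstein distance and k-nearest neighbors more concrete, here is a minimal sketch. It is not the authors' implementation: the function names `wasserstein_distance` and `knn_predict`, the L-infinity ground metric, the default `p=2`, and the restriction to finite bars are all assumptions made for illustration. Each diagram is a list of (birth, death) pairs, and unmatched points are allowed to pair with the diagonal.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def wasserstein_distance(diag_a, diag_b, p=2):
    """p-Wasserstein distance between two persistence diagrams.

    Assumptions (not taken from the paper): finite bars only,
    L-infinity ground metric, diagrams given as (birth, death) pairs.
    """
    a = np.asarray(diag_a, dtype=float).reshape(-1, 2)
    b = np.asarray(diag_b, dtype=float).reshape(-1, 2)
    n, m = len(a), len(b)
    if n + m == 0:
        return 0.0

    # Square cost matrix: real points plus one diagonal slot per point
    # of the other diagram, so every point may be left unmatched.
    cost = np.zeros((n + m, n + m))
    if n and m:
        # Point-to-point cost (L-infinity norm).
        cost[:n, :m] = np.abs(a[:, None, :] - b[None, :, :]).max(axis=2)
    # Cost of sending a point to the diagonal: half its persistence.
    cost[:n, m:] = ((a[:, 1] - a[:, 0]) / 2.0)[:, None]
    cost[n:, :m] = ((b[:, 1] - b[:, 0]) / 2.0)[None, :]
    # Diagonal-to-diagonal block stays at zero cost.

    rows, cols = linear_sum_assignment(cost ** p)
    return float((cost[rows, cols] ** p).sum() ** (1.0 / p))


def knn_predict(query, train_diagrams, train_values, k=3, p=2):
    """Average the target values of the k training diagrams nearest to query."""
    dists = np.array([wasserstein_distance(query, d, p) for d in train_diagrams])
    nearest = np.argsort(dists)[:k]
    return float(np.mean(np.asarray(train_values, dtype=float)[nearest]))


if __name__ == "__main__":
    # Toy diagrams and target values, invented purely for illustration.
    train = [[(0.0, 1.0)], [(0.0, 1.1)], [(0.0, 5.0)]]
    values = [1.0, 2.0, 10.0]
    print(knn_predict([(0.0, 1.05)], train, values, k=2))
```

The assignment is solved exactly with the Hungarian-style solver in SciPy, which is adequate for small diagrams; for large diagrams a dedicated topological data analysis library would likely be a better choice.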


Read also

103 - Romeo Cozac Elix 2020
The global coronavirus disease (COVID-19) pandemic, caused by the newly identified SARS-CoV-2 coronavirus, continues to claim the lives of thousands of people worldwide. The unavailability of specific medications to treat COVID-19 has led to drug repositioning efforts using various approaches, including computational analyses. Such analyses mostly rely on molecular docking and require the 3D structure of the target protein to be available. In this study, we utilized a set of machine learning algorithms and trained them on a dataset of RNA-dependent RNA polymerase (RdRp) inhibitors to run inference analyses on antiviral and anti-inflammatory drugs solely based on the ligand information. We also performed virtual screening analysis of the drug candidates predicted by machine learning models and docked them against the active site of SARS-CoV-2 RdRp, a key component of the virus replication machinery. Based on the ligand information of RdRp inhibitors, the machine learning models were able to identify candidates such as remdesivir and baloxavir marboxil, molecules with documented activity against RdRp of the novel coronavirus. Among the other identified drug candidates were beclabuvir, a non-nucleoside inhibitor of the hepatitis C virus (HCV) RdRp enzyme, and HCV protease inhibitors paritaprevir and faldaprevir. Further analysis of these candidates using molecular docking against the SARS-CoV-2 RdRp revealed low binding energies against the enzyme active site. Our approach also identified anti-inflammatory drugs lupeol, lifitegrast, antrafenine, betulinic acid, and ursolic acid to have potential activity against SARS-CoV-2 RdRp. We propose that the results of this study be considered for further validation as potential therapeutic options against COVID-19.
In this work we build a stack of machine learning models aimed at composing a state-of-the-art credit rating and default prediction system, obtaining excellent out-of-sample performance. Our approach is an excursion through the most recent ML / AI concepts, starting from natural language processing (NLP) applied to (textual) descriptions of economic sectors using embeddings and autoencoders (AE), going through the classification of defaultable firms on the basis of a wide range of economic features using gradient boosting machines (GBM), and calibrating their probabilities while paying due attention to the treatment of unbalanced samples. Finally, we assign credit ratings through genetic algorithms (differential evolution, DE). Model interpretability is achieved by implementing recent techniques such as SHAP and LIME, which explain predictions locally in feature space.
Measuring similarity between molecules is an important part of virtual screening (VS) experiments deployed during the early stages of drug discovery. The most widely used methods for evaluating the similarity of molecules use molecular fingerprints to encode structural information. While similarity methods using fingerprint encodings are efficient, they do not consider all the relevant aspects of molecular structure. In this paper, we describe a quantum-inspired graph-based molecular similarity (GMS) method for ligand-based VS. The GMS method is formulated as a quadratic unconstrained binary optimization problem that can be solved using a quantum annealer, providing the opportunity to take advantage of this nascent and potentially groundbreaking technology. In this study, we consider various features relevant to ligand-based VS, such as pharmacophore features and three-dimensional atomic coordinates, and include them in the GMS method. We evaluate this approach on various datasets from the DUD_LIB_VS_1.0 library. Our results show that using three-dimensional atomic coordinates as features for comparison yields higher early enrichment values. In addition, we evaluate the performance of the GMS method against conventional fingerprint approaches. The results demonstrate that the GMS method outperforms fingerprint methods for most of the datasets, presenting a new alternative in ligand-based VS with the potential for future enhancement.
For several decades optical tweezers have proven to be an invaluable tool in the study and analysis of myriad biological responses and applications. However, like every tool, they can have undesirable or damaging effects upon the very sample they are helping to study. In this review the main negative effects of optical tweezers upon biostructures and living systems will be presented. The review focuses on three main areas: linear optical excitation within the tweezers, non-linear photonic effects, and thermal load upon the sampled volume. Additional information is provided on negative mechanical effects of optical traps on biological structures. Strategies to avoid or, at the least, minimize these negative effects will be introduced. Finally, all these effects, undesirable for the most part, can have positive applications under the right conditions. Some hints in this direction will also be discussed.
Fingerprint-based models for protein-ligand binding have demonstrated outstanding success on benchmark datasets; however, these models may not learn the correct binding rules. To assess this concern, we use in silico datasets with known binding rules to develop a general framework for evaluating model attribution. This framework identifies fragments that a model considers necessary to achieve a particular score, sidestepping the need for a model to be differentiable. Our results confirm that high-performing models may not learn the correct binding rule, and suggest concrete steps that can remedy this situation. We show that adding fragment-matched inactive molecules (decoys) to the data reduces attribution false negatives, while attribution false positives largely arise from the background correlation structure of molecular data. Normalizing for these background correlations helps to reveal the true binding logic. Our work highlights the danger of trusting attributions from high-performing models and suggests that a closer examination of fingerprint correlation structure and better decoy selection may help reduce misattributions.