Attribution Methods Reveal Flaws in Fingerprint-Based Virtual Screening

Published by: Vikram Sundar
Publication date: 2020
Research field: Biology
Language: English

Fingerprint-based models for protein-ligand binding have demonstrated outstanding success on benchmark datasets; however, these models may not learn the correct binding rules. To assess this concern, we use in silico datasets with known binding rules to develop a general framework for evaluating model attribution. This framework identifies fragments that a model considers necessary to achieve a particular score, sidestepping the need for a model to be differentiable. Our results confirm that high-performing models may not learn the correct binding rule, and suggest concrete steps that can remedy this situation. We show that adding fragment-matched inactive molecules (decoys) to the data reduces attribution false negatives, while attribution false positives largely arise from the background correlation structure of molecular data. Normalizing for these background correlations helps to reveal the true binding logic. Our work highlights the danger of trusting attributions from high-performing models and suggests that a closer examination of fingerprint correlation structure and better decoy selection may help reduce misattributions.
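The idea of finding fragments that a model treats as necessary for a score, without requiring gradients, can be illustrated with a small ablation sketch. This is not the authors' framework, only a minimal stand-in under stated assumptions: a molecule is represented as a set of fragment names, the model is any black-box scoring function, and a fragment counts as "necessary" if removing it alone drops the score below a chosen threshold. The names `necessary_fragments`, `correct_model`, and `flawed_model`, and the fragment labels, are all hypothetical.

```python
from typing import Callable, FrozenSet, List

# Assumption: a fingerprint is modeled as the set of fragment names present.
Fingerprint = FrozenSet[str]


def necessary_fragments(
    score: Callable[[Fingerprint], float],
    fingerprint: Fingerprint,
    threshold: float,
) -> List[str]:
    """Return fragments whose individual removal drops the score below threshold.

    Only calls to `score` are needed, so the model may be non-differentiable.
    """
    if score(fingerprint) < threshold:
        return []  # the molecule is not scored as active, nothing to attribute
    necessary = []
    for fragment in sorted(fingerprint):
        ablated = fingerprint - {fragment}
        if score(ablated) < threshold:
            necessary.append(fragment)
    return necessary


def correct_model(fp: Fingerprint) -> float:
    # Toy model that has learned the (hypothetical) true rule: amide AND benzene.
    return 1.0 if {"amide", "benzene"} <= fp else 0.0


def flawed_model(fp: Fingerprint) -> float:
    # Toy model that relies on a correlated but irrelevant fragment (fluorine)
    # instead of the amide, yet still scores this molecule as active.
    return 1.0 if {"benzene", "fluorine"} <= fp else 0.0


molecule = frozenset({"amide", "benzene", "fluorine", "methyl"})

print(necessary_fragments(correct_model, molecule, 0.5))  # ['amide', 'benzene']
print(necessary_fragments(flawed_model, molecule, 0.5))   # ['benzene', 'fluorine']
```

Both toy models score the molecule as active, but the second attributes the activity to fluorine (an attribution false positive) and misses the amide (an attribution false negative), which is the kind of discrepancy the abstract describes.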



Read also

Hao Tian, Peng Tao, 2020
The novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a major worldwide public health emergency that has infected over 1.5 million people. The partially open state of S1 subunit in spike glycoprotein is considered vital for its infection with host cell and is represented as a key target for neutralizing antibodies. However, the mechanism elucidating the transition from the closed state to the partially open state still remains unclear. Here, we applied a combination of Markov state model, transition path theory and random forest to analyze the S1 motion. Our results explored a promising complete conformational movement of receptor-binding domain, from buried, partially open, to detached states. We also numerically confirmed the transition probability between those states. Based on the asymmetry in both the dynamics behavior and backbone C$\alpha$ importance, we further suggested a relation between chains in the trimer spike protein, which may help in the vaccine design and antibody neutralization.
Zixuan Cang, Lin Mu, Guowei Wei, 2017
This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and near 100,000 ligands and decoys in the DUD database are performed to test respectively the scoring power and the virtual screening power of the proposed topological approaches. It is demonstrated that the present approaches outperform the modern machine learning based methods in protein-ligand binding affinity predictions and ligand-decoy discrimination.
Normal mode analysis offers an efficient way of modeling the conformational flexibility of protein structures. Simple models defined by contact topology, known as elastic network models, have been used to model a variety of systems, but the validation is typically limited to individual modes for a single protein. We use anisotropic displacement parameters from crystallography to test the quality of prediction of both the magnitude and directionality of conformational variance. Normal modes from four simple elastic network model potentials and from the CHARMM forcefield are calculated for a data set of 83 diverse, ultrahigh resolution crystal structures. While all five potentials provide good predictions of the magnitude of flexibility, the methods that consider all atoms have a clear edge at prediction of directionality, and the CHARMM potential produces the best agreement. The low-frequency modes from different potentials are similar, but those computed from the CHARMM potential show the greatest difference from the elastic network models. This was illustrated by computing the dynamic correlation matrices from different potentials for a PDZ domain structure. Comparison of normal mode results with anisotropic temperature factors opens the possibility of using ultrahigh resolution crystallographic data as a quantitative measure of molecular flexibility. The comprehensive evaluation demonstrates the costs and benefits of using normal mode potentials of varying complexity. Comparison of the dynamic correlation matrices suggests that a combination of topological and chemical potentials may help identify residues in which chemical forces make large contributions to intramolecular coupling.
The tertiary structures of functional RNA molecules remain difficult to decipher. A new generation of automated RNA structure prediction methods may help address these challenges but have not yet been experimentally validated. Here we apply four prediction tools to a remarkable class of double glycine riboswitches that exhibit ligand-binding cooperativity. A novel method (BPPalign), RMdetect, JAR3D, and Rosetta 3D modeling give consistent predictions for a new stem P0 and kink-turn motif. These elements structure the linker between the RNA's double aptamers. Chemical mapping on the F. nucleatum riboswitch with SHAPE, DMS, and CMCT probing, mutate-and-map studies, and mutation/rescue experiments all provide strong evidence for the structured linker. Under solution conditions that separate two glycine binding transitions, disrupting this helix-junction-helix structure gives 120-fold and 6- to 30-fold poorer association constants for the two transitions, corresponding to an overall energetic impact of 4.3 ± 0.5 kcal/mol. Prior biochemical and crystallography studies from several labs did not include this critical element due to over-truncation of the RNA. We argue that several further undiscovered elements are likely to exist in the flanking regions of this and other RNA switches, and automated prediction tools can now play a powerful role in their detection and dissection.
Wipapat Kladwang, Justine Hum, 2012
Chemical purity of RNA samples is critical for high-precision studies of RNA folding and catalytic behavior, but such purity may be compromised by photodamage accrued during ultraviolet (UV) visualization of gel-purified samples. Here, we quantitatively assess the breadth and extent of such damage by using reverse transcription followed by single-nucleotide-resolution capillary electrophoresis. We detected UV-induced lesions across a dozen natural and artificial RNAs including riboswitch domains, other non-coding RNAs, and artificial sequences; across multiple sequence contexts, dominantly at but not limited to pyrimidine doublets; and from multiple lamps that are recommended for UV shadowing in the literature. Most strikingly, irradiation time-courses reveal detectable damage within a few seconds of exposure, and these data can be quantitatively fit to a skin effect model that accounts for the increased exposure of molecules near the top of irradiated gel slices. The results indicate that 200-nucleotide RNAs subjected to 20 seconds or less of UV shadowing can incur damage to 20% of molecules, and the molecule-by-molecule distribution of these lesions is more heterogeneous than a Poisson distribution. Photodamage from UV shadowing is thus likely a widespread but unappreciated cause of artifactual heterogeneity in quantitative and single-molecule-resolution RNA biophysical measurements.