No Arabic abstract
The knowledge of potentially druggable binding sites on proteins is an important preliminary step towards the discovery of novel drugs. The computational prediction of such areas can be boosted by following the recent major advances in the deep learning field and by exploiting the increasing availability of proper data. In this paper, a novel computational method for the prediction of potential binding sites is proposed, called DeepSurf. DeepSurf combines a surface-based representation, where a number of 3D voxelized grids are placed on the proteins surface, with state-of-the-art deep learning architectures. After being trained on the large database of scPDB, DeepSurf demonstrates superior results on three diverse testing datasets, by surpassing all its main deep learning-based competitors, while attaining competitive performance to a set of traditional non-data-driven approaches.
There is great interest to develop artificial intelligence-based protein-ligand affinity models due to their immense applications in drug discovery. In this paper, PointNet and PointTransformer, two pointwise multi-layer perceptrons have been applied for protein-ligand affinity prediction for the first time. Three-dimensional point clouds could be rapidly generated from the data sets in PDBbind-2016, which contain 3 772 and 11 327 individual point clouds derived from the refined or/and general sets, respectively. These point clouds were used to train PointNet or PointTransformer, resulting in protein-ligand affinity prediction models with Pearson correlation coefficients R = 0.831 or 0.859 from the larger point clouds respectively, based on the CASF-2016 benchmark test. The analysis of the parameters suggests that the two deep learning models were capable to learn many interactions between proteins and their ligands, and these key atoms for the interaction could be visualized in point clouds. The protein-ligand interaction features learned by PointTransformer could be further adapted for the XGBoost-based machine learning algorithm, resulting in prediction models with an average Rp of 0.831, which is on par with the state-of-the-art machine learning models based on PDBbind database. These results suggest that point clouds derived from the PDBbind datasets are useful to evaluate the performance of 3D point clouds-centered deep learning algorithms, which could learn critical protein-ligand interactions from natural evolution or medicinal chemistry and have wide applications in studying protein-ligand interactions.
Motivation: Protein-ligand affinity prediction is an important part of structure-based drug design. It includes molecular docking and affinity prediction. Although molecular dynamics can predict affinity with high accuracy at present, it is not suitable for large-scale virtual screening. The existing affinity prediction and evaluation functions based on deep learning mostly rely on experimentally-determined conformations. Results: We build a predictive model of protein-ligand affinity through the ResNet neural network with added attention mechanism. The resulting ResAtom-Score model achieves Pearsons correlation coefficient R = 0.833 on the CASF-2016 benchmark test set. At the same time, we evaluated the performance of a variety of existing scoring functions in combination with ResAtom-Score in the absence of experimentally-determined conformations. The results show that the use of {Delta}VinaRF20 in combination with ResAtom-Score can achieve affinity prediction close to scoring functions in the presence of experimentally-determined conformations. These results suggest that ResAtom system may be used for in silico screening of small molecule ligands with target proteins in the future. Availability: https://github.com/wyji001/ResAtom
Comprehensive knowledge of protein-ligand interactions should provide a useful basis for annotating protein functions, studying protein evolution, engineering enzymatic activity, and designing drugs. To investigate the diversity and universality of ligand binding sites in protein structures, we conducted the all-against-all atomic-level structural comparison of over 180,000 ligand binding sites found in all the known structures in the Protein Data Bank by using a recently developed database search and alignment algorithm. By applying a hybrid top-down-bottom-up clustering analysis to the comparison results, we determined approximately 3000 well-defined structural motifs of ligand binding sites. Apart from a handful of exceptions, most structural motifs were found to be confined within single families or superfamilies, and to be associated with particular ligands. Furthermore, we analyzed the components of the similarity network and enumerated more than 4000 pairs of ligand binding sites that were shared across different protein folds.
The cornerstone of computational drug design is the calculation of binding affinity between two biological counterparts, especially a chemical compound, i.e., a ligand, and a protein. Predicting the strength of protein-ligand binding with reasonable accuracy is critical for drug discovery. In this paper, we propose a data-driven framework named DeepAtom to accurately predict the protein-ligand binding affinity. With 3D Convolutional Neural Network (3D-CNN) architecture, DeepAtom could automatically extract binding related atomic interaction patterns from the voxelized complex structure. Compared with the other CNN based approaches, our light-weight model design effectively improves the model representational capacity, even with the limited available training data. With validation experiments on the PDBbind v.2016 benchmark and the independent Astex Diverse Set, we demonstrate that the less feature engineering dependent DeepAtom approach consistently outperforms the other state-of-the-art scoring methods. We also compile and propose a new benchmark dataset to further improve the model performances. With the new dataset as training input, DeepAtom achieves Pearsons R=0.83 and RMSE=1.23 pK units on the PDBbind v.2016 core set. The promising results demonstrate that DeepAtom models can be potentially adopted in computational drug development protocols such as molecular docking and virtual screening.
Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs which represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures.