No Arabic abstract
Computational drug discovery provides an efficient tool helping large scale lead molecules screening. One of the major tasks of lead discovery is identifying molecules with promising binding affinities towards a target, a protein in general. The accuracies of current scoring functions which are used to predict the binding affinity are not satisfactory enough. Thus, machine learning (ML) or deep learning (DL) based methods have been developed recently to improve the scoring functions. In this study, a deep convolutional neural network (CNN) model (called OnionNet) is introduced and the features are based on rotation-free element-pair specific contacts between ligands and protein atoms, and the contacts were further grouped in different distance ranges to cover both the local and non-local interaction information between the ligand and the protein. The prediction power of the model is evaluated and compared with other scoring functions using the comparative assessment of scoring functions (CASF-2013) benchmark and the v2016 core set of PDBbind database. When compared to a previous CNN-based scoring function, our model shows improvements of 0.08 and 0.16 in the correlations (R) and standard deviations (SD) of regression, respectively, between the predicted binding affinities and the experimental measured binding affinities. The robustness of the model is further explored by predicting the binding affinities of the complexes generated from docking simulations instead of experimentally determined PDB structures.
One key task in virtual screening is to accurately predict the binding affinity ($triangle$$G$) of protein-ligand complexes. Recently, deep learning (DL) has significantly increased the predicting accuracy of scoring functions due to the extraordinary ability of DL to extract useful features from raw data. Nevertheless, more efforts still need to be paid in many aspects, for the aim of increasing prediction accuracy and decreasing computational cost. In this study, we proposed a simple scoring function (called OnionNet-2) based on convolutional neural network to predict $triangle$$G$. The protein-ligand interactions are characterized by the number of contacts between protein residues and ligand atoms in multiple distance shells. Compared to published models, the efficacy of OnionNet-2 is demonstrated to be the best for two widely used datasets CASF-2016 and CASF-2013 benchmarks. The OnionNet-2 model was further verified by non-experimental decoy structures from docking program and the CSAR NRC-HiQ data set (a high-quality data set provided by CSAR), which showed great success. Thus, our study provides a simple but efficient scoring function for predicting protein-ligand binding free energy.
Empirical scoring functions based on either molecular force fields or cheminformatics descriptors are widely used, in conjunction with molecular docking, during the early stages of drug discovery to predict potency and binding affinity of a drug-like molecule to a given target. These models require expert-level knowledge of physical chemistry and biology to be encoded as hand-tuned parameters or features rather than allowing the underlying model to select features in a data-driven procedure. Here, we develop a general 3-dimensional spatial convolution operation for learning atomic-level chemical interactions directly from atomic coordinates and demonstrate its application to structure-based bioactivity prediction. The atomic convolutional neural network is trained to predict the experimentally determined binding affinity of a protein-ligand complex by direct calculation of the energy associated with the complex, protein, and ligand given the crystal structure of the binding pose. Non-covalent interactions present in the complex that are absent in the protein-ligand sub-structures are identified and the model learns the interaction strength associated with these features. We test our model by predicting the binding free energy of a subset of protein-ligand complexes found in the PDBBind dataset and compare with state-of-the-art cheminformatics and machine learning-based approaches. We find that all methods achieve experimental accuracy and that atomic convolutional networks either outperform or perform competitively with the cheminformatics based methods. Unlike all previous protein-ligand prediction systems, atomic convolutional networks are end-to-end and fully-differentiable. They represent a new data-driven, physics-based deep learning model paradigm that offers a strong foundation for future improvements in structure-based bioactivity prediction.
The cornerstone of computational drug design is the calculation of binding affinity between two biological counterparts, especially a chemical compound, i.e., a ligand, and a protein. Predicting the strength of protein-ligand binding with reasonable accuracy is critical for drug discovery. In this paper, we propose a data-driven framework named DeepAtom to accurately predict the protein-ligand binding affinity. With 3D Convolutional Neural Network (3D-CNN) architecture, DeepAtom could automatically extract binding related atomic interaction patterns from the voxelized complex structure. Compared with the other CNN based approaches, our light-weight model design effectively improves the model representational capacity, even with the limited available training data. With validation experiments on the PDBbind v.2016 benchmark and the independent Astex Diverse Set, we demonstrate that the less feature engineering dependent DeepAtom approach consistently outperforms the other state-of-the-art scoring methods. We also compile and propose a new benchmark dataset to further improve the model performances. With the new dataset as training input, DeepAtom achieves Pearsons R=0.83 and RMSE=1.23 pK units on the PDBbind v.2016 core set. The promising results demonstrate that DeepAtom models can be potentially adopted in computational drug development protocols such as molecular docking and virtual screening.
There is great interest to develop artificial intelligence-based protein-ligand affinity models due to their immense applications in drug discovery. In this paper, PointNet and PointTransformer, two pointwise multi-layer perceptrons have been applied for protein-ligand affinity prediction for the first time. Three-dimensional point clouds could be rapidly generated from the data sets in PDBbind-2016, which contain 3 772 and 11 327 individual point clouds derived from the refined or/and general sets, respectively. These point clouds were used to train PointNet or PointTransformer, resulting in protein-ligand affinity prediction models with Pearson correlation coefficients R = 0.831 or 0.859 from the larger point clouds respectively, based on the CASF-2016 benchmark test. The analysis of the parameters suggests that the two deep learning models were capable to learn many interactions between proteins and their ligands, and these key atoms for the interaction could be visualized in point clouds. The protein-ligand interaction features learned by PointTransformer could be further adapted for the XGBoost-based machine learning algorithm, resulting in prediction models with an average Rp of 0.831, which is on par with the state-of-the-art machine learning models based on PDBbind database. These results suggest that point clouds derived from the PDBbind datasets are useful to evaluate the performance of 3D point clouds-centered deep learning algorithms, which could learn critical protein-ligand interactions from natural evolution or medicinal chemistry and have wide applications in studying protein-ligand interactions.
Drug discovery often relies on the successful prediction of protein-ligand binding affinity. Recent advances have shown great promise in applying graph neural networks (GNNs) for better affinity prediction by learning the representations of protein-ligand complexes. However, existing solutions usually treat protein-ligand complexes as topological graph data, thus the biomolecular structural information is not fully utilized. The essential long-range interactions among atoms are also neglected in GNN models. To this end, we propose a structure-aware interactive graph neural network (SIGN) which consists of two components: polar-inspired graph attention layers (PGAL) and pairwise interactive pooling (PiPool). Specifically, PGAL iteratively performs the node-edge aggregation process to update embeddings of nodes and edges while preserving the distance and angle information among atoms. Then, PiPool is adopted to gather interactive edges with a subsequent reconstruction loss to reflect the global interactions. Exhaustive experimental study on two benchmarks verifies the superiority of SIGN.