Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands, including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs, which represent context-dependent combinations of elementary motifs. We demonstrate that functional similarity can be better inferred from composite motif similarity than from the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs, each of which is regarded as a time-independent diagrammatic representation of a biological process. We show that meta-composite motifs provide richer annotations of biological processes than sequence clusters do. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures.
Comprehensive knowledge of protein-ligand interactions should provide a useful basis for annotating protein functions, studying protein evolution, engineering enzymatic activity, and designing drugs. To investigate the diversity and universality of ligand binding sites in protein structures, we conducted an all-against-all atomic-level structural comparison of over 180,000 ligand binding sites found in all known structures in the Protein Data Bank, using a recently developed database search and alignment algorithm. By applying a hybrid top-down-bottom-up clustering analysis to the comparison results, we determined approximately 3000 well-defined structural motifs of ligand binding sites. Apart from a handful of exceptions, most structural motifs were found to be confined within single families or superfamilies, and to be associated with particular ligands. Furthermore, we analyzed the components of the similarity network and enumerated more than 4000 pairs of ligand binding sites that were shared across different protein folds.
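The component analysis of a similarity network mentioned above can be illustrated with a minimal sketch. It assumes that pairwise comparison scores are already available as `(site_a, site_b, score)` tuples and that an edge is drawn when the score reaches a cutoff. The function name `connected_components`, the score scale, and the `threshold` parameter are hypothetical and are not taken from the abstract; the sketch uses a plain union-find and does not reproduce the hybrid top-down-bottom-up clustering described there.

```python
from collections import defaultdict


def connected_components(n_sites, scored_pairs, threshold):
    """Group binding-site indices into connected components.

    n_sites      -- number of binding sites, indexed 0..n_sites-1
    scored_pairs -- iterable of (a, b, score) tuples (hypothetical format)
    threshold    -- assumed cutoff: pairs scoring >= threshold are linked
    """
    parent = list(range(n_sites))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, score in scored_pairs:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(list)
    for i in range(n_sites):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Toy input: sites 0, 1 and 2 are linked; the 3-4 pair scores below the cutoff.
pairs = [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.2)]
components = connected_components(5, pairs, threshold=0.5)
print(components)
```

Union-find is chosen here because it handles a long list of pairs in a single pass without building the full adjacency structure; any graph library's component routine would serve equally well.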
Knowledge of potentially druggable binding sites on proteins is an important preliminary step towards the discovery of novel drugs. The computational prediction of such sites can be improved by following the recent major advances in the deep learning field and by exploiting the increasing availability of suitable data. In this paper, a novel computational method for the prediction of potential binding sites, called DeepSurf, is proposed. DeepSurf combines a surface-based representation, in which a number of 3D voxelized grids are placed on the protein's surface, with state-of-the-art deep learning architectures. After being trained on the large scPDB database, DeepSurf demonstrates superior results on three diverse testing datasets, surpassing all its main deep learning-based competitors while attaining performance competitive with a set of traditional non-data-driven approaches.
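To make the idea of a voxelized grid placed at a surface point concrete, the following is a minimal sketch of an axis-aligned occupancy grid. It is an illustration under stated assumptions, not DeepSurf's implementation: the function name `voxelize`, the grid dimensions, and the use of plain atom counts as the only feature are all hypothetical, and any orientation of the grid relative to the surface is omitted.

```python
import math


def voxelize(atoms, center, n_voxels=4, voxel_size=1.0):
    """Count atoms per voxel in a cubic grid centered on `center`.

    atoms      -- iterable of (x, y, z) coordinates
    center     -- (x, y, z) of the grid center, e.g. a surface point
    n_voxels   -- voxels per edge (assumed value, for illustration only)
    voxel_size -- voxel edge length, in the same units as the coordinates
    """
    half = n_voxels * voxel_size / 2.0
    grid = [[[0] * n_voxels for _ in range(n_voxels)] for _ in range(n_voxels)]
    for atom in atoms:
        index = []
        for coord, origin in zip(atom, center):
            k = math.floor((coord - origin + half) / voxel_size)
            if not 0 <= k < n_voxels:
                break  # the atom lies outside the grid along this axis
            index.append(k)
        else:
            i, j, k = index
            grid[i][j][k] += 1
    return grid


# Toy input: two atoms fall inside the 4 x 4 x 4 grid, one lies outside it.
atoms = [(0.1, 0.1, 0.1), (5.0, 5.0, 5.0), (-1.5, -1.5, -1.5)]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
total = sum(v for plane in grid for row in plane for v in row)
print(total)
```

A learned model would normally take richer per-voxel features than raw counts; the nested-list grid is used only to keep the sketch free of third-party dependencies.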
We outline recent developments in artificial intelligence (AI) and machine learning (ML) techniques for integrative structural biology of intrinsically disordered protein (IDP) ensembles. IDPs challenge the traditional protein structure-function paradigm by adapting their conformations in response to specific binding partners, which allows them to mediate diverse and often complex cellular functions such as biological signaling, self-organization and compartmentalization. Obtaining mechanistic insights into their function can therefore be challenging for traditional structure determination techniques. Often, scientists have to rely on piecemeal evidence drawn from diverse experimental techniques to characterize their functional mechanisms. Multiscale simulations can help bridge critical knowledge gaps about IDP structure-function relationships; however, these techniques also face challenges in resolving emergent phenomena within IDP conformational ensembles. We posit that scalable statistical inference techniques can effectively integrate information gleaned from multiple experimental techniques as well as from simulations, thus providing access to atomistic details of these emergent phenomena.
The prion protein (PrP) binds Cu2+ ions in the octarepeat domain of the N-terminal tail up to full occupancy at pH 7.4. Recent experiments show that the HGGG octarepeat subdomain is responsible for holding the metal bound in a square planar coordination. Using first-principles molecular dynamics simulations of the Car-Parrinello type, we investigate the Cu coordination mode at the binding sites of the PrP octarepeat region. Simulations are carried out for a number of structured binding sites. Results for the complexes Cu(HGGGW)+(wat), Cu(HGGG) and the 2[Cu(HGGG)] dimer are presented. While the presence of a Trp residue and an H2O molecule does not seem to affect the nature of the Cu coordination, the high stability of the bond between Cu and the amide nitrogens of deprotonated Gly residues is confirmed in the case of the Cu(HGGG) system. For the more interesting 2[Cu(HGGG)] dimer, a dynamically entangled arrangement of the two monomers, with intertwined N-Cu bonds, emerges. This observation is consistent with the highly packed structure seen in experiments at full Cu occupancy.
Sm proteins were discovered nearly 20 years ago as a group of small antigenic proteins ($\approx$ 90-120 residues). Since then, extensive biochemical and genetic data have illuminated the crucial roles of these proteins in forming ribonucleoprotein (RNP) complexes that are used in RNA processing, e.g., spliceosomal removal of introns from pre-mRNAs. Spliceosomes are large macromolecular machines that are comparable to ribosomes in size and complexity, and are composed of uridine-rich small nuclear RNPs (U snRNPs). Various sets of seven different Sm proteins form the cores of most snRNPs. Despite their importance, very little is known about the atomic-resolution structure of snRNPs or their Sm cores. As a first step towards a high-resolution image of snRNPs and their hierarchical assembly, we have determined the crystal structures of archaeal homologs of Sm proteins, which we term Sm-like archaeal proteins (SmAPs).