The computational study of conformational transitions in nucleic acids still faces many challenges. For example, in the case of single-stranded RNA tetranucleotides, agreement between simulations and experiments is not satisfactory due to inaccuracies in the force fields commonly used in molecular dynamics simulations. Here we use experimental data collected from high-resolution X-ray structures to attempt an improvement of the latest version of the AMBER force field. A modified metadynamics algorithm is used to calculate correcting potentials designed to enforce experimental distributions of backbone torsion angles. Replica-exchange simulations of tetranucleotides including these correcting potentials show significantly better agreement with independent solution experiments for the oligonucleotides containing pyrimidine bases. Although the proposed corrections do not seem to be portable to generic RNA systems, the simulations revealed the importance of the alpha and beta backbone angles in modulating the RNA conformational ensemble. The correction protocol presented here suggests a systematic procedure for force-field refinement.
Elastic network models (ENMs) are valuable and efficient tools for characterizing the collective internal dynamics of proteins based on the knowledge of their native structures. The increasing evidence that the biological functionality of RNAs is often linked to their innate internal motions poses the question of whether ENM approaches can be successfully extended to this class of biomolecules. This issue is tackled here by considering various families of elastic networks of increasing complexity applied to a representative set of RNAs. The fluctuations predicted by the alternative ENMs are stringently validated by comparison against extensive molecular dynamics simulations and SHAPE experiments. We find that simulations and experimental data are systematically best reproduced by either an all-atom or a three-beads-per-nucleotide representation (sugar-base-phosphate), with the latter arguably providing the best balance of accuracy and computational complexity.
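As a hedged illustration of the elastic network idea described above, the following sketch implements a Gaussian network model (GNM), one of the simplest elastic networks. It is not necessarily any of the ENM variants compared in the study. The function name `gnm_fluctuations`, the 10 Å cutoff, and the toy five-bead chain are assumptions made for illustration only; in practice the beads would be taken from a native RNA structure (for example one bead each for sugar, base, and phosphate).

```python
import numpy as np

def gnm_fluctuations(coords, cutoff=10.0):
    """Relative mean-square fluctuation of each bead in a Gaussian network model.

    coords: (N, 3) array of bead positions (hypothetical input, in angstroms).
    cutoff: beads closer than this distance are joined by a unit spring (assumed value).
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise distances between all beads.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)
    # Kirchhoff (graph Laplacian) matrix: -1 for contacts, degree on the diagonal.
    kirchhoff = -contact.astype(float)
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))
    # Fluctuations are the diagonal of the pseudo-inverse; zero modes are skipped.
    evals, evecs = np.linalg.eigh(kirchhoff)
    keep = evals > 1e-8
    return (evecs[:, keep] ** 2 / evals[keep]).sum(axis=1)

# Toy example: a straight chain of 5 beads spaced 6 angstroms apart,
# so that only consecutive beads fall within the cutoff.
chain = np.array([[6.0 * i, 0.0, 0.0] for i in range(5)])
msf = gnm_fluctuations(chain, cutoff=10.0)
```

In this toy chain the terminal beads come out as the most mobile and the central bead as the least. The values are in arbitrary units, since the spring constant and temperature are not set; only the relative profile is meaningful when comparing with simulations or experiments.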
Recent computational efforts have shown that the current potential energy models used in molecular dynamics are not accurate enough to describe the conformational ensemble of RNA oligomers and suggest that molecular dynamics should be complemented with experimental data. Here we propose a scheme based on the maximum entropy principle to combine simulations with bulk experiments. In the proposed scheme, the noise arising from both the measurements and the forward models used to back-calculate the experimental observables is explicitly taken into account. The method is tested on RNA nucleosides and is then used to construct chemically consistent corrections to the Amber RNA force field that allow a large set of experimental data on nucleosides and dinucleosides to be correctly reproduced. The transferability of these corrections is assessed against independent data on tetranucleotides and displays a previously unreported agreement with experiments. This procedure can be applied to enforce multiple experimental data on multiple systems in a self-consistent framework, thus suggesting a new paradigm for force field refinement.
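The maximum entropy idea above can be sketched for a single observable as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `maxent_weights`, the Gaussian error model with a known standard deviation `sigma`, the Newton solver, and the synthetic data are all assumptions. Frames receive weights proportional to exp(-λ s_i), and λ is chosen so that the reweighted average equals the experimental target shifted by λσ², which is how a Gaussian error softens the restraint.

```python
import numpy as np

def _weights(obs, lam):
    # Normalized weights w_i proportional to exp(-lam * s_i), computed stably.
    logw = -lam * obs
    w = np.exp(logw - logw.max())
    return w / w.sum()

def maxent_weights(obs, target, sigma=0.0, n_iter=100, tol=1e-10):
    """Reweight simulation frames so that <obs> matches an experimental target.

    obs:    observable back-calculated on each frame (hypothetical input).
    target: experimental average.
    sigma:  assumed Gaussian error on experiment plus forward model.
    """
    obs = np.asarray(obs, dtype=float)
    lam = 0.0
    for _ in range(n_iter):
        w = _weights(obs, lam)
        mean = np.dot(w, obs)
        var = np.dot(w, (obs - mean) ** 2)
        # Gradient and curvature of the convex function minimized over lam.
        grad = target + lam * sigma**2 - mean
        if abs(grad) < tol:
            break
        lam -= grad / (var + sigma**2)
    return _weights(obs, lam), lam

# Synthetic example: frames whose observable averages near 0, target of 0.3.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, 5000)
w_exact, lam_exact = maxent_weights(samples, target=0.3, sigma=0.0)
w_noisy, lam_noisy = maxent_weights(samples, target=0.3, sigma=0.5)
```

With `sigma=0` the reweighted average reproduces the target exactly. With a nonzero `sigma` the reweighted average lands between the original simulation average and the target, reflecting reduced trust in a noisy measurement.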
The ongoing effort to detect and characterize physical entanglement in biopolymers has so far established that knots are present in many globular proteins and also abound in viral DNA packaged inside bacteriophages. RNA molecules, on the other hand, have not yet been systematically screened for the occurrence of physical knots. We have accordingly undertaken the systematic profiling of the ~6,000 RNA structures present in the Protein Data Bank. The search identified no more than three deeply knotted RNA molecules. These are ribosomal RNAs solved by cryo-EM and consist of about 3,000 nucleotides. Compared to the case of proteins and viral DNA, the observed incidence of RNA knots is therefore practically negligible. This suggests that either evolutionary selection or thermodynamic and kinetic folding mechanisms act towards minimizing the entanglement of RNA to an extent that is unparalleled by other types of biomolecules. The properties of the three observed RNA knotting patterns provide valuable clues for designing RNA sequences capable of self-tying in a twist-knot fold.
Determining which proteins interact together is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have made it possible to identify interaction partners among paralogous proteins from sequence data. This success of DCA at predicting protein-protein interactions could be mainly based on its known ability to identify pairs of residues that are in contact in the three-dimensional structure of protein complexes and that coevolve to remain physicochemically complementary. However, interacting proteins possess similar evolutionary histories. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that only involve phylogeny and no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that share evolutionary history. While phylogenetic correlations confound the identification of contacting residues by DCA, they are thus useful to predict interacting partners among paralogs. We find that DCA performs as well as phylogenetic methods to this end, and slightly better with large and accurate training sets. Employing DCA or phylogenetic methods within an Iterative Pairing Algorithm (IPA) makes it possible to predict pairs of evolutionary partners without a training set. We demonstrate the ability of these various methods to correctly predict pairings among real paralogous proteins with genome proximity but no known physical interaction, illustrating the importance of phylogenetic correlations in natural data. However, for physically interacting and strongly coevolving proteins, DCA and mutual information outperform phylogenetic methods. We discuss how to distinguish physically interacting proteins from those only sharing evolutionary history.
Many non-coding RNAs are known to play a role in the cell directly linked to their structure. Structure prediction based on sequence alone is, however, a challenging task. On the other hand, thanks to the low cost of sequencing technologies, a very large number of homologous sequences are becoming available for many RNA families. In the protein community, the idea of exploiting the covariance of mutations within a family to predict the protein structure using the direct-coupling-analysis (DCA) method has emerged over the last decade. The application of DCA to RNA systems has been limited so far. Here we perform an assessment of the DCA method on 17 riboswitch families, comparing it with the commonly used mutual information analysis and with the state-of-the-art R-scape covariance method. We also compare different flavors of DCA, including mean-field, pseudo-likelihood, and a proposed stochastic procedure (Boltzmann learning) for solving exactly the DCA inverse problem. Boltzmann learning outperforms the other methods in predicting contacts observed in high-resolution crystal structures.
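The mutual information analysis mentioned above as a baseline can be sketched as follows. This is a minimal illustration only: the function name `column_mutual_information` and the four-sequence toy alignment are assumptions, and the sketch omits the sequence reweighting, pseudocounts, and corrections that practical covariance analyses typically apply. It covers only the baseline score, not DCA, which additionally tries to separate direct from indirect couplings.

```python
from collections import Counter
from math import log

def column_mutual_information(seqs, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    seqs: list of equal-length aligned sequences (hypothetical toy input).
    """
    n = len(seqs)
    count_i = Counter(s[i] for s in seqs)
    count_j = Counter(s[j] for s in seqs)
    count_ij = Counter((s[i], s[j]) for s in seqs)
    # MI = sum over observed pairs of p_ab * log(p_ab / (p_a * p_b)).
    return sum(
        (c / n) * log(c * n / (count_i[a] * count_j[b]))
        for (a, b), c in count_ij.items()
    )

# Toy alignment: columns 0 and 3 covary as Watson-Crick pairs,
# column 1 is fully conserved, column 2 varies partially with column 0.
alignment = ["GAAC", "CAAG", "AAUU", "UAUA"]
mi_paired = column_mutual_information(alignment, 0, 3)
mi_conserved = column_mutual_information(alignment, 0, 1)
mi_partial = column_mutual_information(alignment, 0, 2)
```

In this toy alignment the covarying pair (columns 0 and 3) scores highest, the conserved column carries no information about its partner, and the partially covarying column falls in between. Ranking all column pairs by such a score is the basic way a covariance analysis proposes candidate contacts.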