Many non-coding RNAs are known to play a role in the cell that is directly linked to their structure. Structure prediction based on sequence alone is, however, a challenging task. On the other hand, thanks to the low cost of sequencing technologies, a very large number of homologous sequences are becoming available for many RNA families. In the protein community, the idea has emerged in the last decade of exploiting the covariance of mutations within a family to predict protein structure using the direct-coupling-analysis (DCA) method. The application of DCA to RNA systems has been limited so far. Here we assess the DCA method on 17 riboswitch families, comparing it with the commonly used mutual information analysis and with the state-of-the-art R-scape covariance method. We also compare different flavors of DCA, including mean-field, pseudo-likelihood, and a proposed stochastic procedure (Boltzmann learning) for solving the DCA inverse problem exactly. Boltzmann learning outperforms the other methods in predicting contacts observed in high-resolution crystal structures.
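As a point of reference for the mutual information baseline mentioned above, the following is a minimal sketch of column-pair mutual information on a toy alignment. It is not the pipeline used in the study: the function name `column_mutual_information` and the four-sequence alignment are hypothetical, and the sketch omits sequence reweighting, pseudocounts, and gap handling, which a practical analysis would likely need.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(alignment):
    """Return {(i, j): MI in nats} for every pair of alignment columns i < j.

    `alignment` is a list of equal-length strings (a toy stand-in for an MSA).
    """
    n_seq = len(alignment)
    length = len(alignment[0])
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    scores = {}
    for i, j in combinations(range(length), 2):
        f_i = Counter(columns[i])                      # single-column counts
        f_j = Counter(columns[j])
        f_ij = Counter(zip(columns[i], columns[j]))    # joint counts
        total = 0.0
        for (a, b), count in f_ij.items():
            p_ab = count / n_seq
            p_a = f_i[a] / n_seq
            p_b = f_j[b] / n_seq
            total += p_ab * math.log(p_ab / (p_a * p_b))
        scores[(i, j)] = total
    return scores


# Hypothetical alignment: columns 0 and 3 covary like a base pair
# (G-C, C-G, A-U, U-A), while columns 1 and 2 are fully conserved.
toy_alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]
scores = column_mutual_information(toy_alignment)
best_pair = max(scores, key=scores.get)
```

In this toy case the covarying pair (0, 3) receives the highest score, while every pair involving a conserved column scores zero. Mutual information of this kind cannot separate direct from indirect correlations, which is the limitation that DCA is designed to address.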
Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant patterns of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated on a few sites, which we find to be in close contact in the three-dimensional protein fold.
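To illustrate what a localized low-eigenvalue mode looks like, the sketch below diagonalizes a small correlation matrix and scores each eigenmode with the inverse participation ratio (IPR). The choice of IPR as the localization measure, the function name `mode_localization`, and the 4-site matrix are assumptions made for illustration; the abstract does not specify a measure, and this is not the Hopfield-Potts inference itself.

```python
import numpy as np


def mode_localization(corr):
    """Return eigenvalues (ascending) and the IPR of each eigenmode.

    For a normalized eigenvector v, IPR = sum_i v_i**4, so 1 / IPR estimates
    the number of sites that carry the mode.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(corr)   # columns are eigenvectors
    ipr = np.sum(eigenvectors ** 4, axis=0)
    return eigenvalues, ipr


# Hypothetical correlation matrix: sites 0 and 1 are strongly correlated,
# sites 2 and 3 are independent of everything else.
toy_corr = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
eigenvalues, ipr = mode_localization(toy_corr)
sites_in_lowest_mode = 1.0 / ipr[0]
```

Here the lowest-eigenvalue mode (eigenvalue 0.1) is supported on sites 0 and 1 only, so `sites_in_lowest_mode` comes out at 2: the mode is concentrated on the two coupled sites.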
The computational study of conformational transitions in nucleic acids still faces many challenges. For example, in the case of single-stranded RNA tetranucleotides, agreement between simulations and experiments is not satisfactory due to inaccuracies in the force fields commonly used in molecular dynamics simulations. Here we use experimental data collected from high-resolution X-ray structures to attempt an improvement of the latest version of the AMBER force field. A modified metadynamics algorithm is used to calculate correcting potentials designed to enforce experimental distributions of backbone torsion angles. Replica-exchange simulations of tetranucleotides including these correcting potentials show significantly better agreement with independent solution experiments for the oligonucleotides containing pyrimidine bases. Although the proposed corrections do not seem to be portable to generic RNA systems, the simulations revealed the importance of the alpha and beta backbone angles in modulating the RNA conformational ensemble. The correction protocol presented here suggests a systematic procedure for force-field refinement.
DNA replication is an essential process in biology and its timing must be robust so that cells can divide properly. Random fluctuations in the formation of replication starting points, called origins, and the subsequent activation of proteins lead to variations in the replication time. We analyse these stochastic properties of DNA and derive the positions of origins corresponding to the minimum replication time. We show that under some conditions the minimization of replication time leads to the grouping of origins, and relate this to experimental data in a number of species showing origin grouping.
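The dependence of replication time on origin positions and firing times can be made concrete with a small sketch. It assumes a deliberately simplified picture that is not taken from the paper: a linear genome, forks that move in both directions at constant speed from each origin, and a firing time supplied for every origin. The function name `replication_time` and all parameter values are hypothetical.

```python
from itertools import combinations


def replication_time(origins, firing_times, length, speed):
    """Time at which the last point of a linear genome [0, length] is replicated.

    A point x is replicated when the first fork reaches it, i.e. at
    min_i (t_i + |x - x_i| / speed). The latest such time occurs either at a
    genome end or where forks from two origins meet, so only those candidate
    points need to be checked.
    """
    def replicated_at(x):
        return min(t + abs(x - o) / speed for o, t in zip(origins, firing_times))

    candidates = [0.0, float(length)]
    for (xi, ti), (xj, tj) in combinations(sorted(zip(origins, firing_times)), 2):
        # Point where the rightward fork from i meets the leftward fork from j.
        meet = (xi + xj + speed * (tj - ti)) / 2.0
        candidates.append(min(max(meet, 0.0), float(length)))
    return max(replicated_at(x) for x in candidates)


# Two evenly spaced origins firing together, then the same layout with one
# origin firing so late that its region is replicated passively.
t_even = replication_time([25, 75], [0, 0], length=100, speed=1)
t_late = replication_time([25, 75], [0, 100], length=100, speed=1)
```

With both origins firing at time zero the genome completes at time 25, whereas a late second origin pushes completion to 75. Drawing `firing_times` from a random distribution and averaging over many draws would give a stochastic version of the same calculation, which is the kind of quantity whose minimization over origin positions the abstract describes.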
This work is about statistical genetics, an interdisciplinary topic between Statistical Physics and Population Biology. Our focus is on the phase of Quasi-Linkage Equilibrium (QLE), which has many similarities to equilibrium statistical mechanics, and on how the stability of that phase is lost. The QLE phenomenon was discovered by Motoo Kimura and was extended and generalized to the global genome scale by Neher & Shraiman (2011). What we will refer to as the Kimura-Neher-Shraiman (KNS) theory describes a population evolving under mutation, recombination, genetic drift, and natural selection (pairwise epistatic fitness). The main conclusion of KNS is that the QLE phase exists at sufficiently high recombination rate ($r$) with respect to the variability in selection strength (fitness). Combining these results with the techniques of Direct Coupling Analysis (DCA), we show that in QLE epistatic fitness can be inferred from the knowledge of the (dynamical) distribution of genotypes in a population. Building on our earlier work, Zeng & Aurell (2020), we here present an extension to high mutation and recombination rates. We further consider the evolution of a population at higher selection strength with respect to the recombination and mutation parameters ($r$ and $\mu$). We identify a new bi-stable phase, which we call the Non-Random Coexistence (NRC) phase, where genomic mutations persist in the population without either fixating or disappearing. We also identify an intermediate region in parameter space where a finite population jumps stochastically between a QLE-like state and NRC-like behaviour. The existence of the NRC phase demonstrates that even if statistical genetics at high recombination closely mirrors equilibrium statistical physics, a more apt analogy is non-equilibrium statistical physics with broken detailed balance, where self-sustained dynamical phenomena are ubiquitous.
Elastic network models (ENMs) are valuable and efficient tools for characterizing the collective internal dynamics of proteins based on the knowledge of their native structures. The increasing evidence that the biological functionality of RNAs is often linked to their innate internal motions raises the question of whether ENM approaches can be successfully extended to this class of biomolecules. This issue is tackled here by considering various families of elastic networks of increasing complexity applied to a representative set of RNAs. The fluctuations predicted by the alternative ENMs are stringently validated by comparison against extensive molecular dynamics simulations and SHAPE experiments. We find that simulations and experimental data are systematically best reproduced by either an all-atom or a three-beads-per-nucleotide representation (sugar-base-phosphate), with the latter arguably providing the best balance of accuracy and computational complexity.
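For readers unfamiliar with elastic networks, the following sketch shows one of the simplest variants, a Gaussian network model, in which beads closer than a cutoff are joined by identical springs and relative fluctuations are read off the pseudo-inverse of the connectivity matrix. This is an illustrative assumption rather than any of the specific models compared in the study: the function name `gnm_fluctuations`, the cutoff, and the five-bead chain are all hypothetical, and the output is in arbitrary units.

```python
import numpy as np


def gnm_fluctuations(coords, cutoff):
    """Relative mean-square fluctuation of each bead in a Gaussian network model.

    coords: (N, 3) array-like of bead positions; cutoff: contact distance.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Beads are in contact if closer than the cutoff (self-contacts excluded).
    contact = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)
    # Kirchhoff matrix: contact counts on the diagonal, -1 for each contact.
    kirchhoff = np.diag(contact.sum(axis=1)) - contact.astype(float)
    # The pseudo-inverse discards the zero mode (rigid-body translation).
    return np.diag(np.linalg.pinv(kirchhoff))


# Hypothetical structure: five beads on a line with unit spacing, so only
# nearest neighbours fall within the cutoff.
toy_coords = [[float(i), 0.0, 0.0] for i in range(5)]
msf = gnm_fluctuations(toy_coords, cutoff=1.5)
```

In this toy chain the terminal beads fluctuate more than the central one, as expected for weakly connected ends. For an RNA, `coords` could hold one or several beads per nucleotide, which is the level of detail that distinguishes the representations compared in the abstract.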