Inferring DNA sequences from mechanical unzipping data: the large-bandwidth case

Added by Remi Monasson

Publication date 2007

Fields: Biology, Physics

Research language: English

Authors Valentina Baldazzi

Biomolecules Statistical Mechanics

The complementary strands of DNA molecules can be separated when stretched apart by a force; the unzipping signal is correlated to the base content of the sequence but is affected by thermal and instrumental noise. We consider here the ideal case where opening events are known to a very good time resolution (very large bandwidth), and study how the sequence can be reconstructed from the unzipping data. Our approach relies on statistical Bayesian inference and the Viterbi decoding algorithm. Performance is studied numerically on Monte Carlo-generated data, and analytically. We show how multiple unzippings of the same molecule may be exploited to improve the quality of the prediction, and calculate analytically the number of required unzippings as a function of the bandwidth, the sequence content, and the elasticity parameters of the unzipped strands.
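To make the decoding step concrete, here is a minimal sketch of generic Viterbi decoding on a toy hidden Markov model. It is not the model used in the paper, which the abstract does not specify. The two hidden states ("W" for a weak pair, "S" for a strong pair), the two observation symbols ("short" and "long" opening times), and all probabilities are assumptions chosen purely for illustration.

```python
import math

def viterbi(observations, states, log_prior, log_trans, log_emit):
    """Return the most probable hidden-state path and its log-probability."""
    # best[t][s]: log-probability of the best path ending in state s at step t
    best = [{s: log_prior[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t-1][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy model (all numbers are hypothetical, for illustration only)
states = ("W", "S")
log_prior = {"W": math.log(0.5), "S": math.log(0.5)}
log_trans = {
    "W": {"W": math.log(0.6), "S": math.log(0.4)},
    "S": {"W": math.log(0.4), "S": math.log(0.6)},
}
log_emit = {
    "W": {"short": math.log(0.8), "long": math.log(0.2)},
    "S": {"short": math.log(0.3), "long": math.log(0.7)},
}

observations = ["short", "short", "long", "long", "short"]
path, log_prob = viterbi(observations, states, log_prior, log_trans, log_emit)
print(path)  # ['W', 'W', 'S', 'S', 'W']
```

Working in log-probabilities avoids numerical underflow on long sequences, which matters once the chain is thousands of bases long.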

Inferring interaction partners from protein sequences using mutual information

84 - Anne-Florence Bitbol 2018

Functional protein-protein interactions are crucial in most cellular processes. They enable multi-protein complexes to assemble and to remain stable, and they allow signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interacting partners, and thus in correlations between their sequences. Pairwise maximum-entropy based models have enabled successful inference of pairs of amino-acid residues that are in contact in the three-dimensional structure of multi-protein complexes, starting from the correlations in the sequence data of known interaction partners. Recently, algorithms inspired by these methods have been developed to identify which proteins are functional interaction partners among the paralogous proteins of two families, starting from sequence data alone. Here, we demonstrate that a slightly higher performance for partner identification can be reached by an approximate maximization of the mutual information between the sequence alignments of the two protein families. Our mutual information-based method also provides signatures of the existence of interactions between protein families. These results stand in contrast with structure prediction of proteins and of multi-protein complexes from sequence data, where pairwise maximum-entropy based global statistical models substantially improve performance compared to mutual information. Our findings entail that the statistical dependences allowing interaction partner prediction from sequence data are not restricted to the residue pairs that are in direct contact at the interface between the partner proteins.
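As a rough illustration of the quantity involved, the sketch below computes the plug-in (empirical) mutual information between two alignment columns. It is only an assumed, simplified stand-in: the method described above approximately maximizes mutual information between whole alignments over candidate pairings, which this snippet does not attempt. The function name and the example columns are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two equal-length columns."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed pairs of p(a,b) * log( p(a,b) / (p(a) p(b)) )
    return sum(
        (c / n) * math.log((c * n) / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

# Perfectly covarying columns carry log(2) nats; independent ones carry 0.
mi_coupled = mutual_information("AAGG", "CCTT")
mi_independent = mutual_information("AAGG", "CTCT")
```

With few sequences the plug-in estimate is biased upward, so in practice pseudocounts or a null-model comparison would usually be needed.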

Biomolecules Biological Physics

Statistical properties of metastable intermediates in DNA unzipping

362 - J. M. Huguet , N. Forns , F. Ritort 2010

We unzip DNA molecules using optical tweezers and determine the sizes of the cooperatively unzipping and zipping regions separating consecutive metastable intermediates along the unzipping pathway. Sizes are found to be distributed following a power law, ranging from one base pair up to more than a hundred base pairs. We find that a large fraction of unzipping regions smaller than 10 bp are seldom detected because of the high compliance of the released single stranded DNA. We show how the compliance of a single nucleotide sets a limit value around 0.1 N/m for the stiffness of any local force probe aiming to discriminate one base pair at a time in DNA unzipping experiments.

Biological Physics Statistical Mechanics

Phylogenetic correlations can suffice to infer protein partners from sequences

92 - Guillaume Marmier , Martin Weigt , Anne-Florence Bitbol 2019

Determining which proteins interact together is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have made it possible to identify interaction partners among paralogous proteins from sequence data. This success of DCA at predicting protein-protein interactions could be mainly based on its known ability to identify pairs of residues that are in contact in the three-dimensional structure of protein complexes and that coevolve to remain physicochemically complementary. However, interacting proteins possess similar evolutionary histories. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that only involve phylogeny and no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that share evolutionary history. While phylogenetic correlations confound the identification of contacting residues by DCA, they are thus useful to predict interacting partners among paralogs. We find that DCA performs as well as phylogenetic methods to this end, and slightly better than them with large and accurate training sets. Employing DCA or phylogenetic methods within an Iterative Pairing Algorithm (IPA) makes it possible to predict pairs of evolutionary partners without a training set. We demonstrate the ability of these various methods to correctly predict pairings among real paralogous proteins with genome proximity but no known physical interaction, illustrating the importance of phylogenetic correlations in natural data. However, for physically interacting and strongly coevolving proteins, DCA and mutual information outperform phylogenetic methods. We discuss how to distinguish physically interacting proteins from those only sharing evolutionary history.

Biomolecules Statistical Mechanics Biological Physics

Algebraic statistics of Poincare recurrences in DNA molecule

80 - Alexey K. Mazur , D. L. Shepelyansky 2015

Statistics of Poincare recurrences is studied for the base-pair breathing dynamics of an all-atom DNA molecule in realistic aqueous environment with thousands of degrees of freedom. It is found that at least over five decades in time the decay of recurrences is described by an algebraic law with the Poincare exponent close to $\beta=1.2$. This value is directly related to the correlation decay exponent $\nu = \beta - 1$, which is close to $\nu \approx 0.15$ observed in the time resolved Stokes shift experiments. By applying the virial theorem we analyse the chaotic dynamics in polynomial potentials and demonstrate analytically that exponent $\beta=1.2$ is obtained assuming the dominance of dipole-dipole interactions in the relevant DNA dynamics. Molecular dynamics simulations also reveal the presence of strong low frequency noise with the exponent $\eta=1.6$. We trace parallels with the chaotic dynamics of symplectic maps with a few degrees of freedom characterized by the Poincare exponent $\beta \sim 1.5$.

Biomolecules Statistical Mechanics

Inferring epistasis from genomic data with comparable mutation and outcrossing rate

372 - Hong-Li Zeng , Eugenio Mauri , Vito Dichio 2020

We consider a population evolving due to mutation, selection and recombination, where selection includes single-locus terms (additive fitness) and two-loci terms (pairwise epistatic fitness). We further consider the problem of inferring fitness in the evolutionary dynamics from one or several snapshots of the distribution of genotypes in the population. In the recent literature this has been done by applying the Quasi-Linkage Equilibrium (QLE) regime first obtained by Kimura in the limit of high recombination. Here we show that the approach also works in the interesting regime where the effects of mutations are comparable to or larger than recombination. This leads to a modified main epistatic fitness inference formula where the rates of mutation and recombination occur together. We also derive this formula using a previously developed Gaussian closure that formally remains valid when recombination is absent. The findings are validated through numerical simulations.

Populations and Evolution Statistical Mechanics
