ترغب بنشر مسار تعليمي؟ اضغط هنا

Inferring DNA sequences from mechanical unzipping data: the large-bandwidth case

137   0   0.0 ( 0 )
 نشر من قبل Remi Monasson
 تاريخ النشر 2007
  مجال البحث علم الأحياء فيزياء
والبحث باللغة English




اسأل ChatGPT حول البحث

The complementary strands of DNA molecules can be separated when stretched apart by a force; the unzipping signal is correlated to the base content of the sequence but is affected by thermal and instrumental noise. We consider here the ideal case where opening events are known to a very good time resolution (very large bandwidth), and study how the sequence can be reconstructed from the unzipping data. Our approach relies on the use of statistical Bayesian inference and of Viterbi decoding algorithm. Performances are studied numerically on Monte Carlo generated data, and analytically. We show how multiple unzippings of the same molecule may be exploited to improve the quality of the prediction, and calculate analytically the number of required unzippings as a function of the bandwidth, the sequence content, the elasticity parameters of the unzipped strands.

قيم البحث

اقرأ أيضاً

Functional protein-protein interactions are crucial in most cellular processes. They enable multi-protein complexes to assemble and to remain stable, and they allow signal transduction in various pathways. Functional interactions between proteins res ult in coevolution between the interacting partners, and thus in correlations between their sequences. Pairwise maximum-entropy based models have enabled successful inference of pairs of amino-acid residues that are in contact in the three-dimensional structure of multi-protein complexes, starting from the correlations in the sequence data of known interaction partners. Recently, algorithms inspired by these methods have been developed to identify which proteins are functional interaction partners among the paralogous proteins of two families, starting from sequence data alone. Here, we demonstrate that a slightly higher performance for partner identification can be reached by an approximate maximization of the mutual information between the sequence alignments of the two protein families. Our mutual information-based method also provides signatures of the existence of interactions between protein families. These results stand in contrast with structure prediction of proteins and of multi-protein complexes from sequence data, where pairwise maximum-entropy based global statistical models substantially improve performance compared to mutual information. Our findings entail that the statistical dependences allowing interaction partner prediction from sequence data are not restricted to the residue pairs that are in direct contact at the interface between the partner proteins.
We unzip DNA molecules using optical tweezers and determine the sizes of the cooperatively unzipping and zipping regions separating consecutive metastable intermediates along the unzipping pathway. Sizes are found to be distributed following a power law, ranging from one base pair up to more than a hundred base pairs. We find that a large fraction of unzipping regions smaller than 10 bp are seldom detected because of the high compliance of the released single stranded DNA. We show how the compliance of a single nucleotide sets a limit value around 0.1 N/m for the stiffness of any local force probe aiming to discriminate one base pair at a time in DNA unzipping experiments.
Determining which proteins interact together is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have allowed to identify interaction partners among par alogous proteins from sequence data. This success of DCA at predicting protein-protein interactions could be mainly based on its known ability to identify pairs of residues that are in contact in the three-dimensional structure of protein complexes and that coevolve to remain physicochemically complementary. However, interacting proteins possess similar evolutionary histories. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that only involve phylogeny and no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that share evolutionary history. While phylogenetic correlations confound the identification of contacting residues by DCA, they are thus useful to predict interacting partners among paralogs. We find that DCA performs as well as phylogenetic methods to this end, and slightly better than them with large and accurate training sets. Employing DCA or phylogenetic methods within an Iterative Pairing Algorithm (IPA) allows to predict pairs of evolutionary partners without a training set. We demonstrate the ability of these various methods to correctly predict pairings among real paralogous proteins with genome proximity but no known physical interaction, illustrating the importance of phylogenetic correlations in natural data. However, for physically interacting and strongly coevolving proteins, DCA and mutual information outperform phylogenetic methods. We discuss how to distinguish physically interacting proteins from those only sharing evolutionary history.
Statistics of Poincare recurrences is studied for the base-pair breathing dynamics of an all-atom DNA molecule in realistic aqueous environment with thousands of degrees of freedom. It is found that at least over five decades in time the decay of rec urrences is described by an algebraic law with the Poincare exponent close to $beta=1.2$. This value is directly related to the correlation decay exponent $ u = beta -1$, which is close to $ uapprox 0.15$ observed in the time resolved Stokes shift experiments. By applying the virial theorem we analyse the chaotic dynamics in polynomial potentials and demonstrate analytically that exponent $beta=1.2$ is obtained assuming the dominance of dipole-dipole interactions in the relevant DNA dynamics. Molecular dynamics simulations also reveal the presence of strong low frequency noise with the exponent $eta=1.6$. We trace parallels with the chaotic dynamics of symplectic maps with a few degrees of freedom characterized by the Poincare exponent $beta sim 1.5$.
We consider a population evolving due to mutation, selection and recombination, where selection includes single-locus terms (additive fitness) and two-loci terms (pairwise epistatic fitness). We further consider the problem of inferring fitness in th e evolutionary dynamics from one or several snap-shots of the distribution of genotypes in the population. In the recent literature this has been done by applying the Quasi-Linkage Equilibrium (QLE) regime first obtained by Kimura in the limit of high recombination. Here we show that the approach also works in the interesting regime where the effects of mutations are comparable to or larger than recombination. This leads to a modified main epistatic fitness inference formula where the rates of mutation and recombination occur together. We also derive this formula using by a previously developed Gaussian closure that formally remains valid when recombination is absent. The findings are validated through numerical simulations.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا