
From principal component to direct coupling analysis of coevolution in proteins: Low-eigenvalue modes are needed for structure prediction

Added by Remi Monasson
Publication date: 2012
Fields: Biology, Physics
Language: English





Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant patterns of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated on a few sites, which we find to be in close contact in the three-dimensional protein fold.
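To make the spectral picture concrete, here is a minimal numpy sketch (not the authors' implementation) of the first step: eigen-decomposing the residue-residue correlation matrix of a one-hot encoded alignment and ranking patterns by how far their eigenvalues deviate from 1, in either direction. The function names and the selection rule are illustrative assumptions.

```python
import numpy as np

def correlation_spectrum(msa_onehot):
    """Eigen-decompose the residue-residue correlation matrix of a one-hot
    encoded alignment (shape: n_sequences x n_features). Returns eigenvalues
    in descending order and the matching eigenvectors ("patterns")."""
    n_seq, _ = msa_onehot.shape
    centered = msa_onehot - msa_onehot.mean(axis=0)   # subtract single-site frequencies
    cov = centered.T @ centered / n_seq
    std = np.sqrt(np.clip(np.diag(cov), 1e-8, None))
    corr = cov / np.outer(std, std)                   # correlation matrix; eigenvalues scatter around 1
    evals, evecs = np.linalg.eigh(corr)               # ascending order
    return evals[::-1], evecs[:, ::-1]

def select_patterns(evals, evecs, n_keep=20):
    """Keep the modes whose eigenvalues deviate most from 1 in *either*
    direction: PCA would keep only the top of the spectrum, whereas the
    low-eigenvalue, highly localized patterns also carry contact information."""
    deviation = np.abs(np.log(np.clip(evals, 1e-8, None)))
    order = np.argsort(deviation)[::-1]
    return evecs[:, order[:n_keep]]
```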



Related research

Many non-coding RNAs are known to play roles in the cell that are directly linked to their structure. Structure prediction from the sequence alone is, however, a challenging task. On the other hand, thanks to the low cost of sequencing technologies, a very large number of homologous sequences are becoming available for many RNA families. In the protein community, the idea of exploiting the covariance of mutations within a family to predict protein structure using the direct-coupling-analysis (DCA) method has emerged over the last decade. The application of DCA to RNA systems has been limited so far. Here we assess the DCA method on 17 riboswitch families, comparing it with the commonly used mutual information analysis and with the state-of-the-art R-scape covariance method. We also compare different flavors of DCA, including mean-field, pseudo-likelihood, and a proposed stochastic procedure (Boltzmann learning) for solving the DCA inverse problem exactly. Boltzmann learning outperforms the other methods in predicting contacts observed in high-resolution crystal structures.
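As an illustration of the simplest DCA flavor mentioned above, the sketch below implements a toy mean-field step: couplings are taken from the inverse of a regularized covariance matrix and scored with an average-product correction. The pseudo-likelihood and Boltzmann-learning variants require iterative optimization and are not shown; all function names and parameter values here are assumptions, not the study's code.

```python
import numpy as np

def mean_field_dca_scores(msa, q=5, lam=0.1):
    """Toy mean-field DCA for an RNA alignment.
    msa: integer array (n_sequences x L) with symbols 0..q-1 (4 bases + gap).
    Returns an L x L matrix of APC-corrected contact scores."""
    n_seq, L = msa.shape
    # one-hot encode, dropping the last state to keep the covariance invertible
    onehot = np.zeros((n_seq, L, q - 1))
    for a in range(q - 1):
        onehot[:, :, a] = (msa == a)
    X = onehot.reshape(n_seq, L * (q - 1))
    C = np.cov(X, rowvar=False) + lam * np.eye(L * (q - 1))  # regularized covariance
    J = -np.linalg.inv(C)                                     # mean-field couplings
    J = J.reshape(L, q - 1, L, q - 1)
    # Frobenius norm of each coupling block, then average-product correction (APC)
    F = np.sqrt((J ** 2).sum(axis=(1, 3)))
    np.fill_diagonal(F, 0.0)
    apc = np.outer(F.mean(axis=0), F.mean(axis=1)) / F.mean()
    return F - apc
```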
The Protein Data Bank (PDB) contains the atomic structures of over 10^5 biomolecules with better than 2.8 Å resolution. The listing of the identities and coordinates of the atoms comprising each macromolecule permits an analysis of the slow-time vibrational response of these large systems to minor perturbations. 3D video animations of individual modes of oscillation demonstrate how regions interdigitate to create cohesive collective motions, providing a comprehensive framework for, and familiarity with, the overall 3D architecture. Furthermore, the isolation and representation of the softest, slowest deformation coordinates provide opportunities for the development of mechanical models of enzyme function. The eigenvector decomposition must therefore be accurate and reliable, as well as rapid, to be generally reported upon. We obtain the eigenmodes of a 1.2 Å, 34 kDa PDB entry using either exclusively heavy atoms or partly or fully reduced atomic sets; Cartesian or internal coordinates; interatomic force fields derived from a full Cartesian potential, a reduced atomic potential, or a Gaussian distance-dependent potential; and independently developed software. These varied technologies are similar in that each maintains proper stereochemistry, either by use of dihedral degrees of freedom, which freezes bond lengths and bond angles, or by use of a full atomic potential that includes realistic bond-length and bond-angle restraints. We find that the shapes of the slowest eigenvectors are nearly identical, not merely similar.
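The abstract compares several atomic-level technologies; as a much-simplified, hedged illustration of the Gaussian distance-dependent flavor, here is a coarse-grained Gaussian network model sketch in numpy. The 7 Å cutoff and C-alpha coarse-graining are assumptions for the example, not the study's setup.

```python
import numpy as np

def gnm_modes(coords, cutoff=7.0):
    """Gaussian network model: build the Kirchhoff (connectivity) matrix from
    C-alpha coordinates (N x 3) using a distance cutoff in Angstrom, then
    eigen-decompose it. The slowest non-zero modes are the soft collective
    deformation coordinates discussed above."""
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    kirchhoff = -(dists < cutoff).astype(float)   # -1 for each contact pair
    np.fill_diagonal(kirchhoff, 0.0)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))   # diagonal = contact degree
    evals, evecs = np.linalg.eigh(kirchhoff)              # ascending order
    return evals[1:], evecs[:, 1:]                        # drop the uniform zero mode
```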
We study the space of all compact structures on a two-dimensional square lattice of size $N = 6\times 6$. Each structure is mapped onto a vector in $N$ dimensions according to a hydrophobic model. Previous work has shown that the designabilities of structures are closely related to the distribution of the structure vectors in the $N$-dimensional space, with highly designable structures predominantly found in low-density regions. We use principal component analysis to probe and characterize the distribution of structure vectors, and find a non-uniform density with a single peak. Interestingly, the principal axes of this peak are almost aligned with Fourier eigenvectors, and the corresponding Fourier eigenvalues go to zero continuously at the wave-number for alternating patterns ($q = \pi$). These observations provide a stepping stone for an analytic description of the distribution of structural points, and open the possibility of estimating designabilities of realistic structures by simply Fourier transforming the hydrophobicities of the corresponding sequences.
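As a small, hedged illustration of the final observation only, the snippet below computes the magnitude of the $q = \pi$ (alternating-pattern) Fourier component of a hydrophobicity sequence. How this component maps onto an actual designability estimate is the paper's contribution and is not reproduced here; the helper name and example values are assumptions.

```python
import numpy as np

def alternating_component(hydrophobicity):
    """Magnitude of the q = pi Fourier component of a hydrophobicity sequence
    (e.g. +1 for hydrophobic, -1 for polar residues). For an even-length
    sequence, this is the last bin of the real FFT."""
    h = np.asarray(hydrophobicity, dtype=float)
    h = h - h.mean()                    # remove the q = 0 (mean) component
    spectrum = np.fft.rfft(h)
    return np.abs(spectrum[-1])         # Nyquist bin corresponds to q = pi

# example: a perfectly alternating pattern maximizes the q = pi component
print(alternating_component([+1, -1] * 18))   # 36 sites, like the 6x6 lattice
```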
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only sequence (profile) information is used as the input feature, the current best predictors obtain ~80% Q3 accuracy, a figure that has not improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a deep-learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only the complex sequence-structure relationship through a deep hierarchical architecture, but also the interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF obtains ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can also be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
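DeepCNF itself is a deep, trained model; the sketch below shows only the structural idea the abstract describes, namely a convolutional scorer over the sequence profile combined with CRF-style decoding over adjacent labels. All shapes, parameter values, and function names are illustrative assumptions, and the parameters are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_scores(profile, weights, bias):
    """Shallow 'neural' layer: a 1D convolution over the sequence profile
    (L x 20) producing per-position scores for q SS labels (L x q)."""
    L, d = profile.shape
    win = weights.shape[0]                 # window size (odd)
    pad = win // 2
    padded = np.vstack([np.zeros((pad, d)), profile, np.zeros((pad, d))])
    windows = np.stack([padded[i:i + win].ravel() for i in range(L)])
    return np.tanh(windows @ weights.reshape(win * d, -1) + bias)

def viterbi(emis, trans):
    """CRF-style decoding: best label path under per-position scores plus
    label-label transition scores (models interdependency of adjacent labels)."""
    L, q = emis.shape
    dp = emis[0].copy()
    back = np.zeros((L, q), dtype=int)
    for t in range(1, L):
        cand = dp[:, None] + trans + emis[t][None, :]
        back[t] = cand.argmax(axis=0)
        dp = cand.max(axis=0)
    path = [int(dp.argmax())]
    for t in range(L - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# toy usage with random parameters (an actual DeepCNF stacks many such layers
# and learns all parameters; this is only the structural idea)
L, d, q, win = 30, 20, 3, 11
profile = rng.random((L, d))
W = rng.normal(scale=0.1, size=(win, d, q))
b = np.zeros(q)
T = rng.normal(scale=0.1, size=(q, q))
labels = viterbi(conv1d_scores(profile, W, b), T)
```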
Michael E. Wall (2002)
This chapter describes gene expression analysis by Singular Value Decomposition (SVD), emphasizing initial characterization of the data. We describe SVD methods for visualization of gene expression data, representation of the data using a smaller number of variables, and detection of patterns in noisy gene expression data. In addition, we describe the precise relation between SVD analysis and Principal Component Analysis (PCA) when PCA is calculated using the covariance matrix, enabling our descriptions to apply equally well to either method. Our aim is to provide definitions, interpretations, examples, and references that will serve as resources for understanding and extending the application of SVD and PCA to gene expression analysis.
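The SVD/PCA correspondence described here can be checked directly; the following hedged numpy snippet uses a random toy matrix (not expression data) to verify that the right singular vectors of the centered data matrix coincide with the eigenvectors of the covariance matrix, with squared singular values matching the PCA eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((50, 8))                  # toy matrix: 50 genes x 8 conditions

# PCA via the covariance matrix of the conditions
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (X.shape[0] - 1)
pca_evals, pca_evecs = np.linalg.eigh(cov)
pca_evals, pca_evecs = pca_evals[::-1], pca_evecs[:, ::-1]   # descending order

# SVD of the centered data matrix: right singular vectors = principal axes,
# squared singular values / (n - 1) = PCA eigenvalues
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(s ** 2 / (X.shape[0] - 1), pca_evals))     # True
print(np.allclose(np.abs(Vt), np.abs(pca_evecs.T)))          # True (up to sign)
```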
