In structure-based models of proteins, one often assumes that folding is accomplished when all contacts are established. This assumption frequently leads to a conceptual problem: folding takes place in a temperature region of very low thermodynamic stability, especially when the contact map used is too sparse. We consider six different structure-based models and show that allowing a small, but model-dependent, percentage of the native contacts to remain unestablished boosts the folding temperature substantially while affecting the time scales of folding only in a minor way. We also compare other properties of the six models. We show that the choice of the description of the backbone stiffness has a substantial effect on the values of characteristic temperatures that relate to both equilibrium and kinetic properties. Models without any backbone stiffness (like the self-organized polymer) are found to perform similarly to those with stiffness, including in studies of stretching.
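The relaxed folding criterion described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in any of the six models: the function names, the `(i, j, native_distance)` contact format, and the rule that a contact counts as established when the two residues are within 1.5 times their native distance are all assumptions made for the example.

```python
import math


def fraction_native(coords, native_contacts, tolerance=1.5):
    """Return the fraction of native contacts that are currently established.

    coords          -- list of (x, y, z) bead positions, one per residue
    native_contacts -- list of (i, j, native_distance) tuples (assumed format)
    tolerance       -- a contact counts as established when the current
                       distance is at most tolerance * native_distance
                       (the value 1.5 is an assumption of this sketch)
    """
    if not native_contacts:
        raise ValueError("the native contact map is empty")
    formed = 0
    for i, j, native_distance in native_contacts:
        if math.dist(coords[i], coords[j]) <= tolerance * native_distance:
            formed += 1
    return formed / len(native_contacts)


def is_folded(coords, native_contacts, min_fraction=1.0, tolerance=1.5):
    """Folding criterion: at least min_fraction of native contacts are present.

    min_fraction=1.0 is the strict "all contacts" criterion; a slightly
    smaller value tolerates a few missing contacts.
    """
    return fraction_native(coords, native_contacts, tolerance) >= min_fraction
```

With `min_fraction=1.0` the check reproduces the strict all-contacts assumption; lowering it slightly lets a conformation with a few missing contacts count as folded, which is the relaxation the abstract argues for.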
Exploring and understanding the protein-folding problem has been a long-standing challenge in molecular biology. Here, using molecular dynamics simulation, we reveal how the parallel-distributed adjacent planar peptide groups of unfolded proteins fold reproducibly in aqueous environments, following explicit physical folding codes, as a result of electrostatic attractions. Superfast folding of proteins is found to be powered by the formation of hydrogen bonds. Temperature-induced torsional waves propagating along unfolded proteins break the parallel-distributed state of specific amino acids, which we infer to be the beginning of folding. Differences in electric charge and rotational resistance among neighboring side chains are used to decipher the physical folding codes by which precise secondary structures develop. We present a powerful method of decoding amino acid sequences to predict the native structures of proteins. The method is verified by comparing its results with experimental results available in the literature.
Computational elucidation of membrane protein (MP) structures is challenging, partly because too few solved structures are available for homology modeling. Here we describe a high-throughput deep transfer learning method that first predicts MP contacts by learning from non-membrane proteins (non-MPs) and then predicts three-dimensional structure models using the predicted contacts as distance restraints. Tested on 510 non-redundant MPs, our method has contact prediction accuracy at least 0.18 better than existing methods, predicts correct folds for 218 MPs (TM-score at least 0.6), and generates three-dimensional models with RMSD less than 4 Å and 5 Å for 57 and 108 MPs, respectively. A rigorous blind test in the Continuous Automated Model Evaluation (CAMEO) project shows that our method predicted high-resolution three-dimensional models for two recent test MPs of 210 residues with RMSD close to 2 Å. We estimate that our method could predict correct folds for between 1,345 and 1,871 reviewed human multi-pass MPs, including a few hundred new folds, which should facilitate the discovery of drugs targeting membrane proteins.
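RMSD values such as those quoted above are conventionally computed after optimally superimposing the model on the reference structure. The sketch below uses the Kabsch algorithm for that superposition. It assumes two equally sized arrays of matched atom coordinates (for example C-alpha atoms) and the NumPy library; the function name is hypothetical, and the code is not claimed to be the evaluation protocol used by the authors or by CAMEO.

```python
import numpy as np


def kabsch_rmsd(model, reference):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    p = np.asarray(model, dtype=float)
    q = np.asarray(reference, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("both inputs must have the same shape (N, 3)")

    # Remove the translation by centring both sets on their centroids.
    p0 = p - p.mean(axis=0)
    q0 = q - q.mean(axis=0)

    # The SVD of the covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p0.T @ q0)

    # Guard against an improper rotation (a reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate the model onto the reference and measure the residual deviation.
    p_rot = p0 @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_rot - q0) ** 2, axis=1))))
```

Because reflections are excluded, a mirror image of the reference does not score as a perfect match, which matters when models are built from contacts or distances alone.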
Stochastic simulations of coarse-grained protein models are used to investigate the propensity to form knots in early stages of protein folding. The study is carried out comparatively for two homologous carbamoyltransferases, a natively knotted N-acetylornithine carbamoyltransferase (AOTCase) and an unknotted ornithine carbamoyltransferase (OTCase). In addition, two different sets of pairwise amino acid interactions are considered: one promoting exclusively native interactions, and the other additionally including non-native quasi-chemical and electrostatic interactions. With the former model, neither protein shows a propensity to form knots. With the additional non-native interactions, knotting propensity remains negligible for the natively unknotted OTCase, while for AOTCase it is much enhanced. Analysis of the trajectories suggests that the different entanglement of the two transcarbamylases follows from the tendency of the C-terminus to point away from (for OTCase) or approach and eventually thread (for AOTCase) other regions of the partly folded protein. The analysis of the OTCase/AOTCase pair clarifies that natively knotted proteins can spontaneously knot during early folding stages and that non-native sequence-dependent interactions are important for promoting or disfavoring early knotting events.
Contact-assisted protein folding has made very good progress, but two challenges remain. One is accurate contact prediction for proteins that lack many sequence homologs, and the other is that time-consuming folding simulation is often needed to predict good 3D models from predicted contacts. We show that protein distance matrices can be predicted well by deep learning and then used directly to construct 3D models without any folding simulation. Using distance geometry to construct 3D models from our predicted distance matrices, we successfully folded 21 of the 37 CASP12 hard targets, which have a median family size of 58 effective sequence homologs, within 4 hours on a Linux computer with 20 CPUs. In contrast, contacts predicted by direct coupling analysis (DCA) cannot fold any of them in the absence of folding simulation, and the best CASP12 group folded 11 of them by integrating predicted contacts into complex, fragment-based folding simulation. Rigorous experimental validation on 15 CASP13 targets shows that, among the 3 hardest targets with new folds, our distance-based folding servers successfully folded 2 large ones with <150 sequence homologs while the other servers failed on all three, and that our ab initio folding server also predicted the best, high-quality 3D model for a large homology modeling target. Further experimental validation in CAMEO shows that our ab initio folding server predicted the correct fold for a membrane protein of new fold with 200 residues and 229 sequence homologs, while all the other servers failed. These results imply that deep learning offers an efficient and accurate solution for ab initio folding on a personal computer.
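The idea of building 3D coordinates directly from a distance matrix, with no folding simulation, can be illustrated with classical multidimensional scaling, one standard distance-geometry technique. The abstract does not say which distance-geometry algorithm the authors use, so this is a swapped-in sketch under stated assumptions: a complete, symmetric matrix of pairwise distances, the NumPy library, and a hypothetical function name.

```python
import numpy as np


def coords_from_distances(dist, dim=3):
    """Embed points in `dim` dimensions from a full pairwise distance matrix.

    Classical multidimensional scaling: double-centre the squared distances
    to obtain a Gram matrix, then keep its leading eigenvectors.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    if d2.shape != (n, n):
        raise ValueError("dist must be a square matrix")

    # Double centring turns squared distances into inner products.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering

    # eigh returns eigenvalues in ascending order; select the largest `dim`.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]

    # Predicted (noisy) distances can give small negative eigenvalues; clip them.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale
```

The coordinates are recovered only up to rotation, translation, and reflection, so a real pipeline needs an additional step to choose the correct handedness. Predicted distance matrices are also noisy and often incomplete, which this sketch does not address.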
We perform theoretical studies of the stretching of 20 knotted proteins within a coarse-grained model. The knot ends are found to jump to well-defined sequential locations that are associated with sharp turns, whereas in homopolymers they diffuse around and eventually slide off. The waiting times of the jumps are increasingly stochastic as the temperature is raised. Larger knots do not return to their native locations when a protein is released after stretching.