We develop a theoretical approach to the protein folding problem based on out-of-equilibrium stochastic dynamics. Within this framework, the computational difficulties related to the existence of large time scale gaps in the protein folding problem are removed, and simulating the entire reaction in atomistic detail using existing computers becomes feasible. In addition, this formalism provides a natural framework to investigate the relationships between thermodynamic and kinetic aspects of folding. For example, it is possible to show that, in order to have a large probability of remaining unchanged under Langevin diffusion, the native state has to be characterized by a small conformational entropy. We discuss how to determine the most probable folding pathway, to identify configurations representative of the transition state, and to compute the most probable transition time. We perform an illustrative application of these ideas, studying the conformational evolution of alanine dipeptide within an all-atom model based on the empirical GROMOS96 force field.
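As a rough illustration of the Langevin diffusion mentioned above, the following minimal sketch integrates overdamped Langevin dynamics on a one-dimensional double-well potential and evaluates the discretized Onsager-Machlup action, whose minimum over paths defines a most probable path. The potential, the parameter values, and the function names are assumptions chosen for illustration; this is a toy model and not the authors' all-atom procedure.

```python
import numpy as np

def double_well_force(x):
    """Force -dU/dx for the toy potential U(x) = (x**2 - 1)**2 (an assumption)."""
    return -4.0 * x * (x**2 - 1.0)

def overdamped_langevin(x0, n_steps, dt=1e-3, kT=0.2, gamma=1.0, seed=0):
    """Euler-Maruyama integration of dx = F(x)/gamma dt + sqrt(2 kT/gamma) dW."""
    rng = np.random.default_rng(seed)
    noise_scale = np.sqrt(2.0 * kT * dt / gamma)
    traj = np.empty(n_steps + 1)
    traj[0] = x0
    for i in range(n_steps):
        drift = double_well_force(traj[i]) / gamma * dt
        traj[i + 1] = traj[i] + drift + noise_scale * rng.standard_normal()
    return traj

def onsager_machlup_action(path, dt, kT=0.2, gamma=1.0):
    """Discretized action in the Ito (Euler-Maruyama) convention.

    The probability of a discretized path is proportional to exp(-action),
    so paths with a smaller action are more probable.
    """
    diffusion = kT / gamma
    dx = np.diff(path)
    drift = double_well_force(path[:-1]) / gamma * dt
    return np.sum((dx - drift) ** 2) / (4.0 * diffusion * dt)

# Start in the left well and record a stochastic trajectory.
traj = overdamped_langevin(x0=-1.0, n_steps=20000)
action = onsager_machlup_action(traj, dt=1e-3)
```

A path that follows the deterministic drift exactly has zero action in this convention, which is why relaxation into a well is "free" while barrier crossing carries a cost.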
Resistance to drugs targeting HIV-1 Protease involves the accumulation of multiple mutations in the protein. Here we investigate the role of these mutations by using molecular dynamics simulations which exploit the influence of the native-state topology on the folding process. Our calculations show that sites contributing to phenotypic resistance to FDA-approved drugs are among the most sensitive positions for the stability of partially folded states and should play a relevant role in the folding process. Furthermore, associations between amino acid sites mutating under drug treatment are shown to be statistically correlated. The striking correlation between clinical data and our calculations suggests a novel approach to the design of drugs tailored to bind regions crucial not only for protein function but also for folding.
The folding pathway and rate coefficients for the folding of a knotted protein are calculated for a potential energy function with minimal energetic frustration. A kinetic transition network is constructed using the discrete path sampling approach, and the resulting potential energy surface is visualized by constructing disconnectivity graphs. Owing to topological constraints, the low-lying portion of the landscape consists of three distinct regions, corresponding to the native knotted state and to configurations where either the N- or C-terminus is not yet folded into the knot. The fastest folding pathways from denatured states exhibit early formation of the N-terminus portion of the knot and a rate-determining step where the C-terminus is incorporated. The low-lying minima with the N-terminus knotted and the C-terminus free therefore constitute an off-pathway intermediate for this model. The insertion of both the N- and C-termini into the knot occurs late in the folding process, creating large energy barriers that are the rate-limiting steps in the folding process. When compared to other proteins of a similar length, this system folds over six orders of magnitude more slowly.
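To make the idea of extracting a rate from a kinetic transition network concrete, the sketch below computes mean first passage times on a small continuous-time Markov network by solving a linear system. The three generic states and their rate constants are invented for illustration; this is not the discrete path sampling procedure itself and not a model of the knotted protein.

```python
import numpy as np

def mean_first_passage_times(rates, target):
    """Mean first passage time to `target` from every state.

    rates[i, j] is the rate constant for the jump i -> j (i != j).
    Solves sum_j Q[i, j] * tau[j] = -1 for all i != target, with tau[target] = 0,
    where Q is the generator (off-diagonal rates, rows summing to zero).
    """
    n = rates.shape[0]
    generator = rates.astype(float)
    np.fill_diagonal(generator, 0.0)
    np.fill_diagonal(generator, -generator.sum(axis=1))
    keep = [i for i in range(n) if i != target]
    sub = generator[np.ix_(keep, keep)]
    tau = np.linalg.solve(sub, -np.ones(len(keep)))
    mfpt = np.zeros(n)
    mfpt[keep] = tau
    return mfpt

# Hypothetical linear network: state 0 <-> state 1 -> state 2 (target).
rates = np.array([
    [0.0, 1.0, 0.0],   # 0 -> 1 is fast
    [0.5, 0.0, 0.01],  # 1 -> 0 is fast, 1 -> 2 is the slow step
    [0.0, 0.0, 0.0],   # target treated as absorbing
])
mfpt = mean_first_passage_times(rates, target=2)
rate_estimate = 1.0 / mfpt[0]  # effective rate constant from state 0
```

In this toy network the slow 1 -> 2 step dominates the passage time, which mirrors how a single high barrier can act as the rate-determining step.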
Machine-learning models that learn from data to predict how protein sequence encodes function are emerging as a useful protein engineering tool. However, when using these models to suggest new protein designs, one must deal with the vast combinatorial complexity of protein sequences. Here, we review how to use a sequence-to-function machine-learning surrogate model to select sequences for experimental measurement. First, we discuss how to select sequences through a single round of machine-learning optimization. Then, we discuss sequential optimization, where the goal is to discover optimized sequences and improve the model across multiple rounds of training, optimization, and experimental measurement.
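The sequential optimization loop described above can be sketched in a few lines. The example below assumes a reduced four-letter alphabet, a ridge-regression surrogate on one-hot features, a greedy top-scoring acquisition rule, and a made-up `toy_assay` function standing in for the experiment; all of these are illustrative choices, not methods prescribed by the review.

```python
import itertools
import numpy as np

ALPHABET = "ACDE"  # hypothetical reduced alphabet

def one_hot(seq):
    """Encode a sequence as a flat one-hot vector (position-major)."""
    vec = np.zeros(len(seq) * len(ALPHABET))
    for pos, aa in enumerate(seq):
        vec[pos * len(ALPHABET) + ALPHABET.index(aa)] = 1.0
    return vec

def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression: solve (X^T X + lam I) w = X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def toy_assay(seq):
    """Hypothetical stand-in for an experimental measurement."""
    return sum(1.0 for aa in seq if aa == "A") + 0.5 * (seq[0] == "D")

def propose(weights, candidates, measured, batch_size):
    """Greedy acquisition: pick the top-scoring sequences not yet measured."""
    unmeasured = [s for s in candidates if s not in measured]
    scores = np.array([one_hot(s) @ weights for s in unmeasured])
    order = np.argsort(scores)[::-1][:batch_size]
    return [unmeasured[i] for i in order]

candidates = ["".join(p) for p in itertools.product(ALPHABET, repeat=3)]
rng = np.random.default_rng(0)
initial = rng.choice(candidates, size=8, replace=False)
measured = {str(s): toy_assay(str(s)) for s in initial}

# Three rounds of train -> optimize -> measure.
for _ in range(3):
    X = np.array([one_hot(s) for s in measured])
    y = np.array(list(measured.values()))
    weights = fit_ridge(X, y)
    for s in propose(weights, candidates, measured, batch_size=4):
        measured[s] = toy_assay(s)

best = max(measured, key=measured.get)
```

A purely greedy rule like this one exploits the current model; in practice one might add an exploration term so that later rounds also improve the model in poorly sampled regions of sequence space.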
We introduce a powerful iterative algorithm to compute protein folding pathways with realistic all-atom force fields. Using the path integral formalism, we explicitly derive a modified Langevin equation which directly samples the ensemble of reactive pathways, exponentially reducing the cost of simulating thermally activated transitions. The algorithm also yields a rigorous stochastic estimate of the reaction coordinate. After illustrating this approach on a simple toy model, we successfully validate it against the results of ultra-long plain MD protein folding simulations for a fast-folding protein (Fip35), which were performed on the Anton supercomputer. Using our algorithm, computing a folding trajectory for this protein requires only 1000 core hours, a computational load which could even be carried out on a desktop workstation.
An exactly solvable model based on the topology of a protein native state is applied to identify bottlenecks and key sites for the folding of HIV-1 Protease. The predicted sites are found to correlate well with clinical data on resistance to FDA-approved drugs. It has been observed that drug therapy induces multiple mutations in the protease. The sites where such mutations occur correlate well with those involved in folding bottlenecks identified through the deterministic procedure proposed in this study. The high statistical significance of the observed correlations suggests that the approach is a promising complement to traditional techniques for identifying candidate locations for drug attack.