Computing accurate reaction rates is a central challenge in computational chemistry and biology because of the high cost of free energy estimation with unbiased molecular dynamics. In this work, a data-driven machine learning algorithm is devised to learn collective variables with a multitask neural network, where a common upstream part reduces the high dimensionality of atomic configurations to a low-dimensional latent space, and separate downstream parts map the latent space to predictions of basin class labels and potential energies. The resulting latent space is shown to be an effective low-dimensional representation, capturing the reaction progress and guiding effective umbrella sampling to obtain accurate free energy landscapes. This approach is successfully applied to model systems including a 5D Müller-Brown model, a 5D three-well model, and alanine dipeptide in vacuum. This approach enables automated dimensionality reduction for energy-controlled reactions in complex systems, offers a unified framework that can be trained with limited data, and outperforms single-task learning approaches, including autoencoders.
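The multitask architecture described above can be illustrated with a minimal NumPy sketch: one shared encoder produces the latent collective variable, and two separate heads predict basin labels and potential energies from it. The layer sizes, the `tanh` activations, the loss weight, and all function names below are assumptions chosen for illustration; they are not taken from the work itself, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Random weights and zero biases for one dense layer (hypothetical helper)."""
    return rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out)

def dense(x, layer):
    w, b = layer
    return x @ w + b

# Assumed sizes: 5 input coordinates, a 1D latent space, 2 basins.
n_features, n_latent, n_classes = 5, 1, 2
params = {
    "enc1": init_layer(n_features, 16),  # shared upstream part
    "enc2": init_layer(16, n_latent),
    "cls": init_layer(n_latent, n_classes),  # downstream head 1: basin label
    "ene1": init_layer(n_latent, 16),        # downstream head 2: potential energy
    "ene2": init_layer(16, 1),
}

def forward(x, params):
    """Return latent coordinates, class logits, and predicted energies."""
    h = np.tanh(dense(x, params["enc1"]))
    z = dense(h, params["enc2"])  # latent space used as the collective variable
    logits = dense(z, params["cls"])
    energy = dense(np.tanh(dense(z, params["ene1"])), params["ene2"])[:, 0]
    return z, logits, energy

def multitask_loss(x, labels, energies, params, weight=1.0):
    """Cross-entropy on basin labels plus weighted mean squared error on energies."""
    _, logits, e_pred = forward(x, params)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    cross_entropy = -log_probs[np.arange(len(labels)), labels].mean()
    mse = ((e_pred - energies) ** 2).mean()
    return cross_entropy + weight * mse

# Synthetic inputs, for shape checking only (not real simulation data).
x = rng.normal(size=(8, n_features))
labels = rng.integers(0, n_classes, size=8)
energies = rng.normal(size=8)
z, logits, e_pred = forward(x, params)
loss = multitask_loss(x, labels, energies, params)
```

Because both heads read only the latent coordinates `z`, minimizing the combined loss would push the encoder toward a representation that carries both the basin identity and the energy, which is the idea behind sharing the upstream part.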
The development of enhanced sampling methods has greatly extended the scope of atomistic simulations, allowing long-time phenomena to be studied with accessible computational resources. Many such methods rely on the identification of an appropriate set of collective variables. These are meant to describe the system's modes that most slowly approach equilibrium. Once identified, the equilibration of these modes is accelerated by the enhanced sampling method of choice. An attractive way of determining the collective variables is to relate them to the eigenfunctions and eigenvalues of the transfer operator. Unfortunately, this requires knowledge of the long-term dynamics of the system beforehand, which is generally not available. However, we have recently shown that it is indeed possible to determine efficient collective variables starting from biased simulations. In this paper, we bring the power of machine learning and the efficiency of the recently developed on-the-fly probability enhanced sampling method to bear on this approach. The result is a powerful and robust algorithm that, given an initial enhanced sampling simulation performed with trial collective variables or generalized ensembles, extracts transfer operator eigenfunctions using a neural network ansatz and then accelerates them to promote sampling of rare events. To illustrate the generality of this approach, we apply it to several systems, ranging from the conformational transition of a small molecule to the folding of a mini-protein and the study of materials crystallization.
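The link between slow modes and transfer operator eigenfunctions can be illustrated with a deliberately simplified sketch. It replaces the neural network ansatz with a linear one (time-lagged independent component analysis) and assumes an unbiased trajectory, so it omits the reweighting that biased simulations would require. The synthetic two-component process, the lag time, and all names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic trajectory: component 0 relaxes slowly, component 1 quickly.
n_steps, lag = 20000, 10
decay = np.array([0.99, 0.5])
noise = rng.normal(size=(n_steps, 2))
x = np.zeros((n_steps, 2))
for t in range(1, n_steps):
    x[t] = decay * x[t - 1] + noise[t]
x -= x.mean(axis=0)

# Symmetrized instantaneous and time-lagged covariance matrices.
a, b = x[:-lag], x[lag:]
n = len(a)
c0 = (a.T @ a + b.T @ b) / (2 * n)
ct = (a.T @ b + b.T @ a) / (2 * n)

# Solve the generalized eigenproblem ct v = lambda c0 v by whitening with c0^(-1/2).
s, u = np.linalg.eigh(c0)
whiten = u @ np.diag(s ** -0.5) @ u.T
eigvals, eigvecs = np.linalg.eigh(whiten @ ct @ whiten)
order = np.argsort(eigvals)[::-1]  # largest eigenvalue = slowest mode
eigvals = eigvals[order]
modes = whiten @ eigvecs[:, order]

# Projection on the slowest mode: a linear estimate of the leading eigenfunction.
slow_cv = x @ modes[:, 0]
```

In this toy case the leading eigenvalue is close to the lag-time autocorrelation of the slow component, and `slow_cv` is essentially that component up to sign and scale.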
A popular way to accelerate the sampling of rare events in molecular dynamics simulations is to introduce a potential that increases the fluctuations of selected collective variables. For this strategy to be successful, it is critical to choose appropriate variables. Here we review some recent developments in the data-driven design of collective variables, with a focus on the combination of Fisher's discriminant analysis and neural networks. This approach makes it possible to compress the fluctuations of metastable states into a low-dimensional representation. We illustrate through several examples the effectiveness of this method in accelerating the sampling, while also identifying the physical descriptors that undergo the most significant changes in the process.
Designing an appropriate set of collective variables is crucial to the success of several enhanced sampling methods. Here we focus on how to obtain such variables from information limited to the metastable states. We characterize these states by a large set of descriptors and employ neural networks to compress this information into a lower-dimensional space, using Fisher's linear discriminant as an objective function to maximize the discriminative power of the network. We test this method on alanine dipeptide, using the non-linearly separable dataset composed of atomic distances. We then study an intermolecular aldol reaction characterized by a concerted mechanism. The resulting variables are able to promote sampling by drawing non-linear paths in the physical space connecting the fluctuations between metastable basins. Lastly, we interpret the behavior of the neural network by studying its relation to the physical variables. Through the identification of its most relevant features, we are able to gain chemical insight into the process.
We introduce a method to obtain one-dimensional collective variables for studying rarely occurring transitions between two metastable states separated by a high free energy barrier. No previous information, not even approximate, on the path followed during the transition is needed. The only requirement is to know the fluctuations of the system while in the two metastable states. With this information in hand we build the collective variable using a modified version of Fisher's linear discriminant analysis. The usefulness of this approach is tested on the metadynamics simulation of two representative systems. The first is the freezing of silver iodide into the superionic $\alpha$-phase, the second is the study of a classical Diels-Alder reaction. The collective variable works very well in these two diverse cases.
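As a point of reference for the construction above, the following sketch computes the standard two-class Fisher linear discriminant from samples collected in two metastable states. It is the textbook form, not the modified version used in the work, whose details are not given here. The synthetic three-descriptor data and the function name are assumptions made for illustration.

```python
import numpy as np

def fisher_lda_direction(x_a, x_b):
    """Return the unit vector w that maximizes Fisher's ratio for two classes."""
    mu_a = x_a.mean(axis=0)
    mu_b = x_b.mean(axis=0)
    # Within-class scatter: sum of the covariance matrices of the two states.
    s_w = np.cov(x_a, rowvar=False) + np.cov(x_b, rowvar=False)
    # Closed-form two-class solution: w is proportional to S_w^{-1} (mu_a - mu_b).
    w = np.linalg.solve(s_w, mu_a - mu_b)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
# Synthetic descriptors sampled in each state (illustrative data only).
# Descriptor 1 has a large mean shift and small fluctuations, so it should dominate.
state_a = rng.normal(loc=[0.0, 0.0, 0.0], scale=[1.0, 0.2, 1.0], size=(500, 3))
state_b = rng.normal(loc=[0.5, 1.0, 0.0], scale=[1.0, 0.2, 1.0], size=(500, 3))

w = fisher_lda_direction(state_a, state_b)
cv_a = state_a @ w  # one-dimensional collective variable evaluated in state A
cv_b = state_b @ w  # and in state B
```

Only fluctuations within each state enter the calculation, which matches the requirement stated above that no information about the transition path is needed.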
We present a new method for sampling rare and large fluctuations in a non-equilibrium system governed by a stochastic partial differential equation (SPDE) with additive forcing. To this end, we deploy the so-called instanton formalism that corresponds to a saddle-point approximation of the action in the path integral formulation of the underlying SPDE. The crucial step in our approach is the formulation of an alternative SPDE that incorporates knowledge of the instanton solution such that we are able to constrain the dynamical evolution to the vicinity of extreme flow configurations only. Finally, a reweighting procedure based on the Girsanov theorem is applied to recover the full distribution function of the original system. The entire procedure is demonstrated using the one-dimensional Burgers equation as an example. Furthermore, we compare our method to conventional direct numerical simulations as well as to Hybrid Monte Carlo methods. We show that the instanton-based sampling method outperforms both approaches and allows for an accurate quantification of the whole probability density function of velocity gradients from the core to the very far tails.