We present a method to probe rare molecular dynamics trajectories directly using reinforcement learning. We consider trajectories that are conditioned to transition between regions of configuration space in finite time, like those relevant in the study of reactive events, as well as trajectories exhibiting rare fluctuations of time-integrated quantities in the long time limit, like those relevant in the calculation of large deviation functions. In both cases, reinforcement learning techniques are used to optimize an added force that minimizes the Kullback-Leibler divergence between the conditioned trajectory ensemble and a driven one. Under the optimized added force, the system evolves the rare fluctuation as a typical one, affording a variational estimate of its likelihood in the original trajectory ensemble. Low-variance gradient estimators employing value functions are proposed to accelerate convergence to the optimal force. The method we develop using these estimators leads to efficient and accurate estimates of both the optimal force and the likelihood of the rare event for a variety of model systems.
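A toy illustration of this variational idea (not the paper's implementation): for a ±1 random walk, one can learn a driving parameter so that a tilted rare fluctuation becomes typical, using a REINFORCE-style gradient with a running-mean baseline as a crude stand-in for a value function. The tilt `s`, horizon `T` and learning rate are illustrative choices.

```python
import numpy as np

# Learn a driven up-step probability q = sigmoid(theta) so that the tilted
# ensemble of a fair +/-1 walk becomes typical. The per-trajectory return
# R = s*S + ln(dP/dQ) lower-bounds the cumulant generating function; the
# running-mean baseline reduces the variance of the gradient estimate.
rng = np.random.default_rng(0)
T, s, lr = 50, 1.0, 0.05
theta, baseline = 0.0, 0.0
for _ in range(4000):
    q = 1.0 / (1.0 + np.exp(-theta))
    k = rng.binomial(T, q)             # up-steps in one driven trajectory
    S = 2 * k - T                      # total displacement
    logratio = k * np.log(0.5 / q) + (T - k) * np.log(0.5 / (1 - q))
    R = s * S + logratio               # variational return (lower bound on ln Z)
    score = k - T * q                  # d/dtheta of trajectory log-likelihood
    theta += (lr / T) * (R - baseline) * score
    baseline += 0.05 * (R - baseline)  # running-mean baseline
print(theta)         # converges near 2*s, the exactly optimal tilt
print(baseline / T)  # approaches ln cosh(s), the scaled CGF of the fair walk
```

At the optimum the return becomes deterministic along each trajectory, so the baselined gradient estimate has vanishing variance, which is the mechanism the abstract alludes to.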
We study the time until first occurrence, the first-passage time, of rare density fluctuations in diffusive systems. We approach the problem using a model consisting of many independent random walkers on a lattice. The existence of spatial correlations makes this problem analytically intractable. However, for a mean-field approximation in which the walkers can jump anywhere in the system, we obtain a simple asymptotic form for the mean first-passage time to have a given number k of particles at a distinguished site. We show numerically, and argue heuristically, that for large enough k, the mean-field results give a good approximation for first-passage times for systems with nearest-neighbour dynamics, especially for two and higher spatial dimensions. Finally, we show how the results change when density fluctuations anywhere in the system, rather than at a specific distinguished site, are considered.
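The mean-field model described above is straightforward to simulate: each move relocates a randomly chosen walker to a uniformly random site, and the first-passage time (counted in single-walker moves) is the first instant the distinguished site holds k particles. A minimal sketch with illustrative parameter values:

```python
import numpy as np

def mean_field_fpt(n_walkers=50, n_sites=25, k=4, seed=0, max_steps=10**6):
    """First-passage time until site 0 holds k particles under mean-field
    dynamics, where a move sends a random walker to a uniformly random site."""
    rng = np.random.default_rng(seed)
    pos = rng.integers(n_sites, size=n_walkers)  # uniform initial condition
    occ0 = int(np.count_nonzero(pos == 0))       # occupancy of site 0
    if occ0 >= k:
        return 0
    for t in range(1, max_steps + 1):
        w = rng.integers(n_walkers)              # pick a walker at random
        if pos[w] == 0:
            occ0 -= 1
        pos[w] = rng.integers(n_sites)           # mean-field: jump anywhere
        if pos[w] == 0:
            occ0 += 1
        if occ0 >= k:
            return t
    return max_steps                             # not reached within the budget

# estimate the mean first-passage time by averaging independent realisations
samples = [mean_field_fpt(seed=seed) for seed in range(20)]
print(np.mean(samples))
```

Replacing the uniform jump by a nearest-neighbour move turns this into the spatially correlated model whose first-passage times the mean-field result is argued to approximate.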
Very often when studying non-equilibrium systems one is interested in analysing dynamical behaviour that occurs with very low probability, so-called rare events. In practice, since rare events are by definition atypical, they are often difficult to access in a statistically significant way. What is needed are strategies to make rare events typical so that they can be generated on demand. Here we present such a general approach for adaptively constructing a dynamics that efficiently samples atypical events. We do so by exploiting the methods of reinforcement learning (RL), which refers to the set of machine-learning techniques aimed at finding the optimal behaviour to maximise a reward associated with the dynamics. We consider the general perspective of dynamical trajectory ensembles, whereby rare events are described in terms of ensemble reweighting. By minimising the distance between a reweighted ensemble and that of a suitably parametrised controlled dynamics we arrive at a set of methods similar to those of RL that numerically approximate the optimal dynamics realising the rare behaviour of interest. As simple illustrations we consider in detail the problem of excursions of a random walker, for the case of rare events with a finite time horizon, and the problem of the current statistics of a particle hopping in a ring geometry, for the case of an infinite time horizon. We discuss natural extensions of the ideas presented here, including to continuous-time Markov systems, first-passage-time problems and non-Markovian dynamics.
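The reweighting perspective can be made concrete in a hedged toy example with a finite time horizon: under the fair ±1 walk, a mean displacement of a = 0.4 over T = 200 steps is rare, but tilting the up-step probability to q = (1+a)/2 makes it typical, and reweighting the driven trajectories recovers an unbiased estimate of its probability in the original ensemble. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, a = 200, 0.4
q = (1 + a) / 2                      # driven dynamics: tilted up-step probability
n_traj = 20000
up = rng.random((n_traj, T)) < q     # driven trajectories (True = up-step)
k = up.sum(axis=1)                   # up-steps per trajectory
# likelihood ratio (fair walk / driven walk), per trajectory
logw = k * np.log(0.5 / q) + (T - k) * np.log(0.5 / (1 - q))
hit = (2 * k - T) >= a * T           # displacement of at least a*T
p_rare = np.mean(np.exp(logw) * hit) # importance-sampling estimate
# leading-order Cramer rate function I(a) of the fair +/-1 walk
rate = 0.5 * ((1 + a) * np.log(1 + a) + (1 - a) * np.log(1 - a))
print(p_rare)                        # agrees with exp(-T*I(a)) up to a
print(np.exp(-T * rate))             # subexponential prefactor
```

Direct sampling of the fair walk would need on the order of 1/p_rare trajectories to see this event once; the driven walk produces it in roughly half of all trajectories.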
We show how to calculate the likelihood of dynamical large deviations using evolutionary reinforcement learning. An agent, a stochastic model, propagates a continuous-time Monte Carlo trajectory and receives a reward conditioned upon the values of certain path-extensive quantities. Evolution produces progressively fitter agents, eventually allowing the calculation of a piece of a large-deviation rate function for a particular model and path-extensive quantity. For models with small state spaces the evolutionary process acts directly on rates, and for models with large state spaces the process acts on the weights of a neural network that parameterizes the model's rates. This approach shows how path-extensive physics problems can be considered within a framework widely used in machine learning.
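A deliberately small caricature of this evolutionary loop (illustrative, not the paper's models): an "agent" is a pair of escape rates for a two-state continuous-time Markov chain, a trajectory is generated by Gillespie-style sampling, and the reward favours a target dynamical activity (jumps per unit time) that is rare under the initial rates. Mutation plus greedy selection yields progressively fitter agents.

```python
import numpy as np

rng = np.random.default_rng(1)

def activity(rates, t_max=200.0):
    """Jumps per unit time of a two-state chain with the given escape rates."""
    t, state, jumps = 0.0, 0, 0
    while t < t_max:
        t += rng.exponential(1.0 / rates[state])  # exponential waiting time
        state = 1 - state                         # hop to the other state
        jumps += 1
    return jumps / t_max

target = 3.0                       # desired activity (typical value is ~0.5)
agent = np.array([0.5, 0.5])       # initial escape rates
best = -abs(activity(agent) - target)
for _ in range(200):
    # mutate the rates multiplicatively, keeping them positive
    child = np.clip(agent * np.exp(0.1 * rng.normal(size=2)), 1e-3, None)
    reward = -abs(activity(child) - target)
    if reward > best:              # greedy selection: keep the fitter agent
        agent, best = child, reward
print(agent, best)                 # rates evolve until the target is typical
```

Replacing the rate pair by the weights of a network that outputs rates as a function of state gives the large-state-space variant the abstract describes.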
The probability of trajectories of weakly diffusive processes to remain in the tubular neighbourhood of a smooth path is given by the Freidlin-Wentzell-Graham theory of large deviations. The most probable path between two states (the instanton) and the leading term in the logarithm of the process transition density (the quasipotential) are obtained from the minimum of the Freidlin-Wentzell action functional. Here we present a Ritz method that searches for the minimum in a space of paths constructed from a global basis of Chebyshev polynomials. The action is thereby reduced to a multivariate function of the basis coefficients, whose minimum can be found by nonlinear optimization. For minimisation regardless of path duration, this procedure is most effective when applied to a reparametrisation-invariant on-shell action, which is obtained by exploiting a Noether symmetry and is a generalisation of the scalar work [Olender and Elber, 1997] for gradient dynamics and the geometric action [Heymann and Vanden-Eijnden, 2008] for non-gradient dynamics. Our approach provides an alternative to chain-of-states methods for minimum energy paths and saddle points of complex energy landscapes and to Hamilton-Jacobi methods for the stationary quasipotential of circulatory fields. We demonstrate spectral convergence for three benchmark problems involving the Müller-Brown potential, the Maier-Stein force field and the Egger weather model.
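A hedged one-dimensional sketch of the Ritz idea at fixed duration (the paper's benchmarks are two-dimensional and include the duration-free on-shell action): minimise S[x] = (1/2) ∫ (dx/dt - b(x))² dt for the gradient drift b(x) = x - x³, i.e. V(x) = (x² - 1)²/4, with the path expanded in a global Chebyshev basis. The endpoints x(0) = -1, x(T) = +1 are built into the ansatz; the duration, basis size and quadrature are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from numpy.polynomial import chebyshev as C

Tdur, n_basis, n_quad = 10.0, 12, 200
s = np.linspace(-1.0, 1.0, n_quad)     # scaled time s = 2t/T - 1
dt = Tdur / (n_quad - 1)
w = np.full(n_quad, dt)                # trapezoidal quadrature weights
w[0] = w[-1] = dt / 2

def action(c):
    corr = C.chebval(s, c)             # Chebyshev correction to straight line
    dcorr = C.chebval(s, C.chebder(c))
    x = s + (1 - s**2) * corr          # satisfies x(-1) = -1, x(+1) = +1
    dxds = 1 + (1 - s**2) * dcorr - 2 * s * corr
    v = dxds * (2.0 / Tdur)            # chain rule: ds/dt = 2/T
    b = x - x**3                       # double-well drift b = -V'
    return 0.5 * np.dot(w, (v - b) ** 2)

# the action is now a function of the basis coefficients only
res = minimize(action, np.zeros(n_basis), method="BFGS")
print(res.fun)  # approaches 2*DeltaV = 0.5 as T and the basis size grow
```

For this gradient system the uphill segment of the instanton costs twice the barrier height and the downhill segment is free, which gives the 0.5 reference value for checking spectral convergence.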
We show that neural networks trained by evolutionary reinforcement learning can enact efficient molecular self-assembly protocols. Presented with molecular simulation trajectories, networks learn to change temperature and chemical potential in order to promote the assembly of desired structures or to choose between competing polymorphs. In the first case, networks reproduce in a qualitative sense the results of previously known protocols, but faster and with higher fidelity; in the second case they identify previously unknown strategies from which we can extract physical insight. Networks that take as input the elapsed time of the simulation or microscopic information from the system are both effective, the latter more so. The evolutionary scheme we have used is simple to implement and can be applied to a broad range of examples of experimental self-assembly, whether or not one can monitor the experiment as it proceeds. Our results have been achieved with no human input beyond the specification of which order parameter to promote, pointing the way to the design of synthesis protocols by artificial intelligence.