Combining reinforcement learning (RL) and molecular dynamics (MD) simulations, we propose a machine-learning approach (RL$^\ddag$) to automatically unravel chemical reaction mechanisms. In RL$^\ddag$, locating the transition state of a chemical reaction is formulated as a game in which a virtual player is trained to shoot simulation trajectories connecting the reactant and product. The player utilizes two functions, one for value estimation and one for policy making, to iteratively improve its chance of winning this game. The reaction mechanism can be interpreted directly from the value function, while the policy function enables efficient sampling of transition paths, which can be further used to analyze the reaction dynamics and kinetics. Through multiple experiments, we show that RL$^\ddag$ can be trained tabula rasa and hence reveals chemical reaction mechanisms with minimal subjective bias.
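As a toy illustration of the shooting game described above, the sketch below estimates a committor-style value function for a one-dimensional double well by repeatedly shooting short Langevin trajectories. The potential, dynamics, and all names (`force`, `shoot`, `p_B`) are illustrative assumptions, not the RL$^\ddag$ implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def force(x):
    # Gradient force of the double well V(x) = (x^2 - 1)^2:
    # reactant basin at x = -1, product basin at x = +1, barrier at x = 0.
    return -4.0 * x * (x**2 - 1.0)

def shoot(x0, steps=1000, dt=1e-3, beta=3.0):
    # Shoot one overdamped Langevin trajectory from x0 and report whether
    # it ends up on the product side.
    x = x0
    for _ in range(steps):
        x += force(x) * dt + np.sqrt(2.0 * dt / beta) * rng.normal()
        if abs(x) > 1.2:          # committed to a basin, stop early
            break
    return x > 0.0

# "Value function": a tabular committor estimate p_B(x) from repeated shots.
grid = np.linspace(-1.1, 1.1, 23)
p_B = np.array([np.mean([shoot(x) for _ in range(50)]) for x in grid])

# The transition state is where the shots are 50/50, i.e. p_B crosses 1/2.
ts = grid[np.argmin(np.abs(p_B - 0.5))]
print(f"estimated transition state near x = {ts:.2f}")   # ~0 for this potential
```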
Some problems in physics can be handled only after a suitable \textit{ansatz} solution has been guessed. Such a method resists generalization and is therefore of limited scope. The coherent transport by adiabatic passage of a quantum state through an array of semiconductor quantum dots provides a par excellence example of such an approach, requiring its so-called counter-intuitive \textit{ansatz} pulse sequence for the control gates. Deep reinforcement learning, in contrast, has proven able to solve very complex sequential decision-making problems involving competition between short-term and long-term rewards, despite a lack of prior knowledge. We show that for the above problem deep reinforcement learning discovers control sequences outperforming the counter-intuitive \textit{ansatz} sequence. Even more interestingly, it discovers novel strategies when realistic disturbances affect the ideal system, achieving better speed and fidelity when energy detuning between the ground states of the quantum dots or dephasing is added to the master equation, and also mitigating the effects of losses. This method enables online updating of realistic systems, as policy convergence is boosted by exploiting prior knowledge when available. Deep reinforcement learning proves effective at controlling the dynamics of quantum states and, more generally, applies whenever an \textit{ansatz} solution is unknown or insufficient to treat the problem effectively.
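For concreteness, the following sketch sets up the kind of control environment such an agent would interact with: a three-level CTAP chain where piecewise-constant couplings (Omega_12, Omega_23) are chosen per time slice and the reward is the final population in the third dot. The discretization, amplitudes, and Gaussian ansatz pulses are assumptions for illustration, not the paper's parameters.

```python
import numpy as np
from scipy.linalg import expm

N_STEPS, DT, OMEGA_MAX = 20, 0.5, 1.0

def episode(pulses):
    # pulses: (N_STEPS, 2) array with couplings (Omega_12, Omega_23) per slice.
    psi = np.array([1.0, 0.0, 0.0], dtype=complex)      # electron starts in dot 1
    for w12, w23 in pulses:
        H = np.array([[0.0, w12, 0.0],
                      [w12, 0.0, w23],
                      [0.0, w23, 0.0]], dtype=complex)  # three-dot tunneling Hamiltonian
        psi = expm(-1j * H * DT) @ psi                  # evolve one time slice
    return abs(psi[2]) ** 2                             # reward: population in dot 3

# Counter-intuitive (ansatz) ordering: the Omega_23 pulse precedes Omega_12.
t = np.linspace(0.0, 1.0, N_STEPS)
w23 = OMEGA_MAX * np.exp(-((t - 0.4) / 0.2) ** 2)
w12 = OMEGA_MAX * np.exp(-((t - 0.6) / 0.2) ** 2)
print(f"ansatz transfer fidelity: {episode(np.stack([w12, w23], axis=1)):.3f}")

# A DRL agent instead picks (w12, w23) freely at each slice, with the terminal
# fidelity as reward, and needs no pulse-shape prior.
```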
We present a deep reinforcement learning framework where a machine agent is trained to search for a policy to generate a ground state for the square ice model by exploring the physical environment. After training, the agent is capable of proposing a sequence of local moves to achieve the goal. Analysis of the trained policy and the state value function indicates that the ice rule and the loop-closing condition are learned without prior knowledge. We test the trained policy as a sampler in Markov chain Monte Carlo and benchmark it against the baseline loop algorithm. This framework can be generalized to other models with topological constraints, where the generation of constraint-preserving states is difficult.
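The constraint at play can be made concrete: the sketch below encodes the ice rule (two-in, two-out at every vertex) for arrows on a periodic square lattice and shows that a winding loop flip preserves it, which is what the trained policy must learn to exploit. The array encoding is an illustrative choice, not the paper's representation.

```python
import numpy as np

L = 4
# h[y, x] = +1 if the horizontal arrow east of vertex (x, y) points east;
# v[y, x] = +1 if the vertical arrow north of vertex (x, y) points north.
h = np.ones((L, L), dtype=int)        # all horizontal arrows point east
v = -np.ones((L, L), dtype=int)       # all vertical arrows point south

def ice_rule_ok(h, v):
    # Net arrow flux out of every vertex must vanish; with four +-1 bonds,
    # zero flux is exactly the two-in, two-out ice rule.
    flux = (h - np.roll(h, 1, axis=1)) + (v - np.roll(v, 1, axis=0))
    return bool(np.all(flux == 0))

def flip_row_loop(h, row):
    # Reverse every horizontal arrow along one winding row: each vertex on the
    # row swaps a horizontal in-arrow for an out-arrow, preserving the rule.
    h = h.copy()
    h[row] *= -1
    return h

print(ice_rule_ok(h, v))                             # True: a valid ice state
print(ice_rule_ok(flip_row_loop(h, 0), v))           # True: loop moves stay valid
print(ice_rule_ok(2 * np.eye(L, dtype=int) - 1, v))  # False: a broken state
```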
Deep reinforcement learning has been recognized as an efficient technique for designing optimal strategies for complex systems without prior knowledge of the control landscape. To achieve fast and precise control of quantum systems, we propose a novel deep reinforcement learning approach that constructs a curriculum consisting of a set of intermediate tasks defined by fidelity thresholds. Tasks within a curriculum can be statically determined using empirical knowledge or adaptively generated during the learning process. By transferring knowledge between successive tasks and sequencing tasks according to their difficulty, the proposed curriculum-based deep reinforcement learning (CDRL) method enables the agent to focus on easy tasks in the early stage, then move on to difficult tasks, and eventually approach the final task. Numerical simulations on closed and open quantum systems demonstrate that the proposed method exhibits improved control performance for quantum systems and also provides an efficient way to identify optimal strategies with fewer control pulses.
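A minimal sketch of the curriculum mechanism follows, with a toy single-pulse "quantum system" and a crude hill-climbing learner standing in for the DRL agent; the fidelity model, thresholds, and function names are assumptions for illustration, not the CDRL implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def fidelity(amplitude, duration=1.0):
    # Population transfer of a resonant Rabi pulse; perfect when
    # amplitude * duration = pi.
    return np.sin(amplitude * duration / 2.0) ** 2

def train_until(amplitude, f_min, step=0.5):
    # Crude hill climbing standing in for policy optimization on one task:
    # keep proposing perturbed controls until the task's threshold is met.
    while fidelity(amplitude) < f_min:
        candidate = amplitude + step * rng.normal()
        if fidelity(candidate) > fidelity(amplitude):
            amplitude = candidate
    return amplitude

amp = 0.1                                    # tabula rasa initial "policy"
for f_min in (0.90, 0.99, 0.999):            # static curriculum of sub-tasks
    amp = train_until(amp, f_min)            # transfer: warm start from last task
    print(f"threshold {f_min}: amplitude {amp:.3f}, fidelity {fidelity(amp):.4f}")
```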
Sepsis is a leading cause of mortality in intensive care units and costs hospitals billions annually. Treating a septic patient is highly challenging, because individual patients respond very differently to medical interventions and there is no universally agreed-upon treatment for sepsis. In this work, we propose an approach to deduce treatment policies for septic patients by using continuous state-space models and deep reinforcement learning. Our model learns clinically interpretable treatment policies, similar in important aspects to the treatment policies of physicians. The learned policies could be used to aid intensive care clinicians in medical decision making and improve the likelihood of patient survival.
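To make the setup concrete, the sketch below shows how a learned Q-function over a continuous patient state could map to a discrete treatment recommendation. The feature count, random stand-in weights, and the 5x5 dose grid are illustrative assumptions, not the trained model from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
N_FEATURES, N_ACTIONS = 48, 25   # e.g. vitals/labs; 5 IV-fluid x 5 vasopressor bins

# Stand-in for a trained Q-network: one hidden layer with random weights here.
W1, b1 = rng.normal(size=(64, N_FEATURES)) * 0.1, np.zeros(64)
W2, b2 = rng.normal(size=(N_ACTIONS, 64)) * 0.1, np.zeros(N_ACTIONS)

def q_values(state):
    hidden = np.maximum(0.0, W1 @ state + b1)   # ReLU hidden layer
    return W2 @ hidden + b2                     # one Q-value per dose combination

def recommend(state):
    # Greedy policy: the dose combination with the highest estimated Q-value.
    a = int(np.argmax(q_values(state)))
    return divmod(a, 5)                         # (IV-fluid bin, vasopressor bin)

patient = rng.normal(size=N_FEATURES)           # normalized patient state vector
print("recommended (fluid bin, pressor bin):", recommend(patient))
```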
Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the other hand, are more stable and require less hyperparameter tuning, yet substantial feature engineering is necessary to achieve good results. In this work we propose a hybrid approach, the Least Squares Deep Q-Network (LS-DQN), which combines the rich feature representations learned by a DRL algorithm with the stability of a linear least squares method. We do this by periodically re-training the last hidden layer of a DRL network with a batch least squares update. Key to our approach is a Bayesian regularization term in the least squares update, which prevents over-fitting to the most recent data. We test LS-DQN on five Atari games and demonstrate significant improvements over vanilla DQN and Double-DQN. We also investigate the reasons for our method's superior performance and, interestingly, find that the improvement can be attributed to the large batch size used by the LS method when optimizing the last layer.
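The last-layer update at the core of this abstract can be written in a few lines: a regularized least squares solve whose prior is centered at the current weights rather than at zero. The shapes, targets, and regularization strength below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 10_000, 512                  # large batch of transitions, feature width

phi = rng.normal(size=(n, d))       # last-hidden-layer features from the DQN
y = rng.normal(size=n)              # regression targets r + gamma * max_a' Q_target
w0 = rng.normal(size=d) * 0.01      # current last-layer weights (SGD solution)
lam = 1.0                           # strength of the prior centered at w0

# Regularized normal equations for
#   w = argmin ||phi @ w - y||^2 + lam * ||w - w0||^2,
# i.e. a Bayesian regularizer shrinking toward w0 instead of zero.
A = phi.T @ phi + lam * np.eye(d)
b = phi.T @ y + lam * w0
w = np.linalg.solve(A, b)

# w replaces the last layer; the prior keeps it close to w0, so the
# large-batch fit does not over-write what recent SGD steps learned.
```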