Non-Markovian Reinforcement Learning using Fractional Dynamics


Published by Gaurav Gupta

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Gaurav Gupta - Chenzhong Yin - Jyotirmoy V. Deshmukh

Machine Learning



Reinforcement learning (RL) is a technique to learn the control policy for an agent that interacts with a stochastic environment. In any given state, the agent takes some action, and the environment determines the probability distribution over the next state as well as gives the agent some reward. Most RL algorithms typically assume that the environment satisfies Markov assumptions (i.e. the probability distribution over the next state depends only on the current state). In this paper, we propose a model-based RL technique for a system that has non-Markovian dynamics. Such environments are common in many real-world applications such as in human physiology, biological systems, material science, and population dynamics. Model-based RL (MBRL) techniques typically try to simultaneously learn a model of the environment from the data, as well as try to identify an optimal policy for the learned model. We propose a technique where the non-Markovianity of the system is modeled through a fractional dynamical system. We show that we can quantify the difference in the performance of an MBRL algorithm that uses bounded horizon model predictive control from the optimal policy. Finally, we demonstrate our proposed framework on a pharmacokinetic model of human blood glucose dynamics and show that our fractional models can capture distant correlations on real-world datasets.
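The abstract does not state the model equations, so the following is only a minimal sketch of how a fractional-order system carries memory. It assumes a scalar discrete-time model built on the standard Grünwald-Letnikov fractional difference; the names `gl_weights`, `simulate`, `alpha`, `a`, and `b` are illustrative and are not taken from the paper.

```python
def gl_weights(alpha, n):
    """Grünwald-Letnikov coefficients psi_0..psi_n for fractional order alpha.

    psi_0 = 1 and psi_j = psi_{j-1} * (j - 1 - alpha) / j.
    """
    w = [1.0]
    for j in range(1, n + 1):
        w.append(w[-1] * (j - 1 - alpha) / j)
    return w


def simulate(alpha, a, b, x0, inputs):
    """Simulate the assumed scalar model  Delta^alpha x[k+1] = a*x[k] + b*u[k].

    Expanding the fractional difference gives
        x[k+1] = a*x[k] + b*u[k] - sum_{j=1..k+1} psi_j * x[k+1-j],
    so every past state contributes to the next one (non-Markovian)
    unless alpha is an integer.
    """
    xs = [x0]
    w = gl_weights(alpha, len(inputs))
    for k, u in enumerate(inputs):
        # Weighted sum over the whole state history x[k], x[k-1], ..., x[0].
        memory = sum(w[j] * xs[k + 1 - j] for j in range(1, k + 2))
        xs.append(a * xs[k] + b * u - memory)
    return xs


if __name__ == "__main__":
    # alpha = 1 collapses to an ordinary first-order (Markovian) update.
    print(gl_weights(1.0, 4))
    # alpha = 0.5 keeps slowly decaying weights on all past states.
    print(gl_weights(0.5, 4))
    print(simulate(0.5, a=-0.5, b=1.0, x0=1.0, inputs=[0.0] * 5))
```

With `alpha = 1` all weights beyond the first lag vanish and the update depends only on the current state; for a non-integer `alpha` the weights decay slowly, which is one way such a model can represent the distant correlations the abstract mentions.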


360 - I. A. Luchnikov, S. V. Vintskevich, D. A. Grigoriev 2019

Machine learning methods have proved to be useful for the recognition of patterns in statistical data. The measurement outcomes are intrinsically random in quantum physics; however, they do have a pattern when the measurements are performed successively on an open quantum system. This pattern is due to the system-environment interaction and contains information about the relaxation rates as well as non-Markovian memory effects. Here we develop a method to extract the information about the unknown environment from a series of projective single-shot measurements on the system (without resorting to process tomography). The method is based on embedding the non-Markovian system dynamics into the Markovian dynamics of the system and an effective reservoir of finite dimension. The generator of the Markovian embedding is learned by maximum likelihood estimation. We verify the method by comparing its prediction with an exactly solvable non-Markovian dynamics. The developed algorithm to learn unknown quantum environments enables one to efficiently control and manipulate quantum systems.

Quantum Physics

A Multiagent Reinforcement Learning Algorithm with Non-linear Dynamics

93 - Sherief Abdallah, Victor Lesser 2014

Several multiagent reinforcement learning (MARL) algorithms have been proposed to optimize agents' decisions. Due to the complexity of the problem, the majority of the previously developed MARL algorithms assumed agents either had some knowledge of the underlying game (such as Nash equilibria) and/or observed other agents' actions and the rewards they received. We introduce a new MARL algorithm called the Weighted Policy Learner (WPL), which allows agents to reach a Nash Equilibrium (NE) in benchmark 2-player-2-action games with minimum knowledge. Using WPL, the only feedback an agent needs is its own local reward (the agent does not observe other agents' actions or rewards). Furthermore, WPL does not assume that agents know the underlying game or the corresponding Nash Equilibrium a priori. We experimentally show that our algorithm converges in benchmark two-player-two-action games. We also show that our algorithm converges in the challenging Shapley's game where previous MARL algorithms failed to converge without knowing the underlying game or the NE. Furthermore, we show that WPL outperforms the state-of-the-art algorithms in a more realistic setting of 100 agents interacting and learning concurrently. An important aspect of understanding the behavior of a MARL algorithm is analyzing the dynamics of the algorithm: how the policies of multiple learning agents evolve over time as agents interact with one another. Such an analysis not only verifies whether agents using a given MARL algorithm will eventually converge, but also reveals the behavior of the MARL algorithm prior to convergence. We analyze our algorithm in two-player-two-action games and show that symbolically proving WPL's convergence is difficult, because of the non-linear nature of WPL's dynamics, unlike previous MARL algorithms that had either linear or piece-wise-linear dynamics. Instead, we numerically solve WPL's dynamics differential equations and compare the solution to the dynamics of previous MARL algorithms.

Machine Learning, Multiagent Systems

Fractional Transfer Learning for Deep Model-Based Reinforcement Learning

159 - Remo Sasso, Matthia Sabatelli, Marco A. Wiering 2021

Reinforcement learning (RL) is well known for requiring large amounts of data in order for RL agents to learn to perform complex tasks. Recent progress in model-based RL allows agents to be much more data-efficient, as it enables them to learn behaviors of visual environments in imagination by leveraging an internal World Model of the environment. Improved sample efficiency can also be achieved by reusing knowledge from previously learned tasks, but transfer learning is still a challenging topic in RL. Parameter-based transfer learning is generally done using an all-or-nothing approach, where the network's parameters are either fully transferred or randomly initialized. In this work we present a simple alternative approach: fractional transfer learning. The idea is to transfer fractions of knowledge, as opposed to discarding potentially useful knowledge as is commonly done with random initialization. Using the World Model-based Dreamer algorithm, we identify which type of components this approach is applicable to, and perform experiments in a new multi-source transfer learning setting. The results show that fractional transfer learning often leads to substantially improved performance and faster learning compared to learning from scratch and random initialization.
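The abstract contrasts full transfer and random initialization with transferring only a fraction of the source knowledge. One plausible reading, sketched below, is to add a scaled copy of the source parameters to a fresh random initialization. The mixing rule, the fraction `omega`, the function name `fractional_transfer`, and the Gaussian initializer are all assumptions made for illustration; the abstract does not confirm them.

```python
import random


def fractional_transfer(source_weights, omega, init_scale=0.1, seed=0):
    """Initialize target weights as random_init + omega * source (assumed rule).

    omega = 0.0 is plain random initialization; omega = 1.0 adds the
    full source weights on top of the random initialization.
    """
    rng = random.Random(seed)  # seeded so the random part is reproducible
    return [rng.gauss(0.0, init_scale) + omega * w for w in source_weights]


if __name__ == "__main__":
    source = [0.8, -1.2, 0.3]  # hypothetical parameters from a source task
    for omega in (0.0, 0.2, 1.0):
        print(omega, fractional_transfer(source, omega))
```

Under this reading, `omega` interpolates between discarding the source parameters and reusing them in full, so a small value keeps some prior knowledge while leaving the layer free to adapt.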

Machine Learning, Artificial Intelligence

Assessing non-Markovian dynamics

245 - M.M. Wolf, J. Eisert, T.S. Cubitt 2008

We investigate what a snapshot of a quantum evolution - a quantum channel reflecting open system dynamics - reveals about the underlying continuous time evolution. Remarkably, from such a snapshot, and without imposing additional assumptions, it can be decided whether or not a channel is consistent with a time (in)dependent Markovian evolution, for which we provide computable necessary and sufficient criteria. Based on these, a computable measure of 'Markovianity' is introduced. We discuss how the consistency with Markovian dynamics can be checked in quantum process tomography. The results also clarify the geometry of the set of quantum channels with respect to being solutions of time (in)dependent master equations.

Quantum Physics, Mathematical Physics

Reinforcement Learning using Guided Observability

98 - Stephan Weigand, Pascal Klink, Jan Peters 2021

Due to recent breakthroughs, reinforcement learning (RL) has demonstrated impressive performance in challenging sequential decision-making problems. However, an open question is how to make RL cope with partial observability, which is prevalent in many real-world problems. Contrary to contemporary RL approaches, which focus mostly on improved memory representations or strong assumptions about the type of partial observability, we propose a simple but efficient approach that can be applied together with a wide variety of RL methods. Our main insight is that smoothly transitioning from full observability to partial observability during the training process yields a high-performance policy. The approach, called partially observable guided reinforcement learning (PO-GRL), allows full state information to be utilized during policy optimization without compromising the optimality of the final policy. A comprehensive evaluation in discrete partially observable Markov decision process (POMDP) benchmark problems and continuous partially observable MuJoCo and OpenAI gym tasks shows that PO-GRL improves performance. Finally, we demonstrate PO-GRL in the ball-in-the-cup task on a real Barrett WAM robot under partial observability.

Machine Learning