
Mungojerrie: Reinforcement Learning of Linear-Time Objectives

Added by Mateo Perez
Publication date: 2021
Research language: English





Reinforcement learning synthesizes controllers without prior knowledge of the system. At each timestep, a reward is given. The controllers optimize the discounted sum of these rewards. Applying this class of algorithms requires designing a reward scheme, which is typically done manually. The designer must ensure that their intent is accurately captured. This may not be trivial, and is prone to error. An alternative to this manual programming, akin to programming directly in assembly, is to specify the objective in a formal language and have it compiled to a reward scheme. Mungojerrie (https://plv.colorado.edu/mungojerrie/) is a tool for testing reward schemes for $\omega$-regular objectives on finite models. The tool contains reinforcement learning algorithms and a probabilistic model checker. Mungojerrie supports models specified in PRISM and $\omega$-automata specified in HOA.
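
For illustration of the kind of learner such a tool contains, here is a minimal, generic sketch of tabular Q-learning maximizing a discounted sum of rewards on a made-up two-state MDP. This is not Mungojerrie's code, and the transition and reward tables are hypothetical, not a compiled reward scheme for an omega-regular objective.

# Minimal tabular Q-learning sketch on a hypothetical 2-state, 2-action MDP.
# Generic illustration of "optimize the discounted sum of rewards";
# not Mungojerrie's implementation.
import random

P = {  # P[state][action] = list of (probability, next_state, reward)
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 1.0)], 1: [(1.0, 0, 0.0)]},
}
gamma, alpha, eps = 0.95, 0.1, 0.1          # discount, step size, exploration
Q = {(s, a): 0.0 for s in P for a in P[s]}  # tabular action values

def step(s, a):
    r, acc = random.random(), 0.0
    for p, s2, rew in P[s][a]:
        acc += p
        if r <= acc:
            return s2, rew
    return P[s][a][-1][1], P[s][a][-1][2]

s = 0
for _ in range(50000):
    a = random.choice(list(P[s])) if random.random() < eps else max(P[s], key=lambda a: Q[(s, a)])
    s2, rew = step(s, a)
    target = rew + gamma * max(Q[(s2, a2)] for a2 in P[s2])
    Q[(s, a)] += alpha * (target - Q[(s, a)])   # TD update toward the discounted return
    s = s2

print({k: round(v, 2) for k, v in Q.items()})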




Related research

We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.
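
As a rough illustration of the underlying model in the abstract above (a hypothetical branching process, not the paper's algorithm), the sketch below simulates one generation of a two-type branching Markov chain: each entity generates a payoff and spawns offspring according to a type-dependent distribution. A BMDP would additionally let a controller choose among several such distributions per type.

# One-generation simulation of a hypothetical 2-type branching Markov chain.
import random
from collections import Counter

PAYOFF = {0: 1.0, 1: 2.5}          # hypothetical payoff per entity type
SPAWN = {                          # SPAWN[type] = list of (probability, offspring multiset)
    0: [(0.5, (0,)), (0.3, (0, 1)), (0.2, ())],
    1: [(0.6, (1,)), (0.4, (0, 0))],
}

def spawn(entity_type):
    r, acc = random.random(), 0.0
    for p, children in SPAWN[entity_type]:
        acc += p
        if r <= acc:
            return children
    return SPAWN[entity_type][-1][1]

def step(population):
    """Advance one generation; return (new population, payoff collected)."""
    payoff, new_pop = 0.0, Counter()
    for entity_type, count in population.items():
        for _ in range(count):
            payoff += PAYOFF[entity_type]
            new_pop.update(spawn(entity_type))
    return new_pop, payoff

pop = Counter({0: 3, 1: 1})
for gen in range(5):
    pop, gain = step(pop)
    print(gen, dict(pop), gain)
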
Biological evolution has distilled the experiences of many learners into the general learning algorithms of humans. Our novel meta reinforcement learning algorithm MetaGenRL is inspired by this process. MetaGenRL distills the experiences of many complex agents to meta-learn a low-complexity neural objective function that decides how future individuals will learn. Unlike recent meta-RL algorithms, MetaGenRL can generalize to new environments that are entirely different from those used for meta-training. In some cases, it even outperforms human-engineered RL algorithms. MetaGenRL uses off-policy second-order gradients during meta-training that greatly increase its sample efficiency.
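
The "second-order gradients" mentioned above can be illustrated on a toy problem (unrelated to MetaGenRL's actual objective function or training setup): differentiating an outer loss through one gradient step on an inner, parameterized objective requires a mixed second derivative of that inner objective. Everything below, including the loss functions, is made up for illustration.

# Toy second-order meta-gradient: the inner objective L_in(theta; phi) = phi*(theta-1)^2
# is "learned" via phi; theta takes one gradient step on L_in; the outer loss
# L_out = (theta' - 2)^2 is evaluated at the updated theta. Differentiating L_out
# w.r.t. phi goes through the inner gradient, hence a second derivative of L_in.

def meta_gradient(theta, phi, lr=0.1):
    grad_in = 2.0 * phi * (theta - 1.0)          # dL_in/dtheta
    theta_new = theta - lr * grad_in             # one inner update
    grad_out = 2.0 * (theta_new - 2.0)           # dL_out/dtheta'
    dtheta_new_dphi = -lr * 2.0 * (theta - 1.0)  # -lr * d^2 L_in / (dtheta dphi)
    return grad_out * dtheta_new_dphi            # chain rule: dL_out/dphi

def outer_loss(theta, phi, lr=0.1):
    theta_new = theta - lr * 2.0 * phi * (theta - 1.0)
    return (theta_new - 2.0) ** 2

theta, phi, eps = 0.0, 0.5, 1e-6
analytic = meta_gradient(theta, phi)
numeric = (outer_loss(theta, phi + eps) - outer_loss(theta, phi - eps)) / (2 * eps)
print(analytic, numeric)   # the two values should agree closely
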
Many reinforcement learning (RL) environments in practice feature enormous state spaces that may be described compactly by a factored structure and modeled by Factored Markov Decision Processes (FMDPs). We present the first polynomial-time algorithm for RL with FMDPs that does not rely on an oracle planner, and instead of requiring a linear transition model, only requires a linear value function with a suitable local basis with respect to the factorization. With this assumption, we can solve FMDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work, we do not assume that the transitions on various factors are independent.
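
As a small illustration of the representational assumption in the abstract above (with made-up factors, scopes, and weights, not the paper's algorithm), a linear value function with a local basis scores each state by summing local terms, each of which reads only a subset of the state factors.

# Hypothetical factored state with 3 binary factors and a linear value function
# built from local basis terms, each depending on a small "scope" of factors.
import itertools

SCOPES = [(0,), (1, 2)]                      # which factors each local term reads
WEIGHTS = [                                  # one weight per local assignment (made up)
    {(0,): 0.2, (1,): 1.0},                                  # term over factor 0
    {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 1.2},    # term over factors 1, 2
]

def value(state):
    """Linear value: sum of local basis terms, each reading only its scope."""
    total = 0.0
    for scope, table in zip(SCOPES, WEIGHTS):
        local_assignment = tuple(state[i] for i in scope)
        total += table[local_assignment]
    return total

for state in itertools.product([0, 1], repeat=3):
    print(state, value(state))
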
This paper presents a method for learning logical task specifications and cost functions from demonstrations. Linear temporal logic (LTL) formulas are widely used to express complex objectives and constraints for autonomous systems. Yet, such specifications may be challenging to construct by hand. Instead, we consider demonstrated task executions, whose temporal logic structure and transition costs need to be inferred by an autonomous agent. We employ a spectral learning approach to extract a weighted finite automaton (WFA), approximating the unknown logic structure of the task. Thereafter, we define a product between the WFA for high-level task guidance and a Labeled Markov Decision Process (L-MDP) for low-level control and optimize a cost function that matches the demonstrator's behavior. We demonstrate that our method is capable of generalizing the execution of the inferred task specification to new environment configurations.
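
For context on the object being learned above: a weighted finite automaton assigns a word x1...xn the weight alpha^T A_{x1} ... A_{xn} beta, where alpha and beta are initial and final weight vectors and each symbol has a transition matrix. The sketch below evaluates this with made-up matrices; a spectral method would estimate such parameters from demonstrations, and the exact setup in the paper may differ.

# Weight of a word under a hypothetical 2-state weighted finite automaton (WFA):
# f(x1...xn) = alpha^T * A[x1] * ... * A[xn] * beta.
import numpy as np

alpha = np.array([1.0, 0.0])                 # initial weights
beta = np.array([0.0, 1.0])                  # final weights
A = {                                        # per-symbol transition matrices (made up)
    "a": np.array([[0.5, 0.5], [0.0, 1.0]]),
    "b": np.array([[1.0, 0.0], [0.2, 0.8]]),
}

def wfa_weight(word):
    v = alpha
    for symbol in word:
        v = v @ A[symbol]                    # propagate state weights symbol by symbol
    return float(v @ beta)

print(wfa_weight("ab"), wfa_weight("ba"), wfa_weight("abb"))
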
Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al.'s (2018) analysis of the C51 algorithm in terms of the Cramér distance, but their results only apply to the tabular setting and ignore C51's use of a softmax to produce normalized distributions. In this paper we adapt the Cramér distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramér-based and can be combined with linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model's prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramér-based distributional methods may perform worse than directly approximating the value function.
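
For reference on the distance used above: on a fixed, evenly spaced support, the squared Cramér distance between two discrete distributions is the spacing times the sum of squared differences of their cumulative sums (their step CDFs). The same computation can be evaluated on arbitrary, unnormalized vectors, which is the direction the abstract describes, though the paper's exact definition may differ. The supports and masses below are made up.

# Squared Cramer distance between two discrete distributions on a common,
# evenly spaced support: spacing * sum of squared differences of the CDFs.
import numpy as np

support = np.linspace(0.0, 3.0, 4)           # z_1 .. z_K, spacing 1.0
spacing = support[1] - support[0]
p = np.array([0.1, 0.4, 0.4, 0.1])           # hypothetical distribution P
q = np.array([0.25, 0.25, 0.25, 0.25])       # hypothetical distribution Q

def cramer_sq(p, q, spacing):
    """Squared Cramer (l2) distance between the step CDFs of p and q."""
    cdf_diff = np.cumsum(p) - np.cumsum(q)   # last entry is 0 for normalized p, q
    return spacing * float(np.sum(cdf_diff ** 2))

print(cramer_sq(p, q, spacing))
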
