New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Omega-Regular Objectives in Model-Free Reinforcement Learning

352 0 0.0 ( 0 )

Download Cite

Added by Ashutosh Trivedi

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Ernst Moritz Hahn - Mateo Perez - Sven Schewe

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We provide the first solution for model-free reinforcement learning of {omega}-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of {omega}-regular objectives to an almost- sure reachability problem and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. A key feature of our technique is the compilation of {omega}-regular properties into limit- deterministic Buechi automata instead of the traditional Rabin automata; this choice sidesteps difficulties that have marred previous proposals. Our approach allows us to apply model-free, off-the-shelf reinforcement learning algorithms to compute optimal strategies from the observations of the MDP. We present an experimental evaluation of our technique on benchmark learning problems.

rate research

Reward Shaping for Reinforcement Learning with Omega-Regular Objectives

83 - E. M. Hahn , M. Perez , S. Schewe 2020

Recently, successful approaches have been made to exploit good-for-MDPs automata (Buchi automata with a restricted form of nondeterminism) for model free reinforcement learning, a class of automata that subsumes good for games automata and the most widespread class of limit deterministic automata. The foundation of using these Buchi automata is that the Buchi condition can, for good-for-MDP automata, be translated to reachability. The drawback of this translation is that the rewards are, on average, reaped very late, which requires long episodes during the learning process. We devise a new reward shaping approach that overcomes this issue. We show that the resulting model is equivalent to a discounted payoff objective with a biased discount that simplifies and improves on prior work in this direction.

Logic in Computer Science Machine Learning

Tree games with regular objectives

170 - Marcin Przyby{l}ko 2014

We study tree games developed recently by Matteo Mio as a game interpretation of the probabilistic $mu$-calculus. With expressive power comes complexity. Mio showed that tree games are able to encode Blackwell games and, consequently, are not determined under deterministic strategies. We show that non-stochastic tree games with objectives recognisable by so-called game automata are determined under deterministic, finite memory strategies. Moreover, we give an elementary algorithmic procedure which, for an arbitrary regular language L and a finite non-stochastic tree game with a winning objective L decides if the game is determined under deterministic strategies.

Logic in Computer Science Formal Languages and Automata Theory Computer Science and Game Theory

Robotic Table Tennis with Model-Free Reinforcement Learning

110 - Wenbo Gao , Laura Graesser , Krzysztof Choromanski 2020

We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100Hz. We demonstrate that evolutionary search (ES) methods acting on CNN-based policy architectures for non-visual inputs and convolving across time learn compact controllers leading to smooth motions. Furthermore, we show that with appropriately tuned curriculum learning on the task and rewards, policies are capable of developing multi-modal styles, specifically forehand and backhand stroke, whilst achieving 80% return rate on a wide range of ball throws. We observe that multi-modality does not require any architectural priors, such as multi-head architectures or hierarchical policies.

Machine Learning Robotics Machine Learning

Topology and Ambiguity in Omega Context Free Languages

246 - Olivier Finkel 2008

We study the links between the topological complexity of an omega context free language and its degree of ambiguity. In particular, using known facts from classical descriptive set theory, we prove that non Borel omega context free languages which are recognized by Buchi pushdown automata have a maximum degree of ambiguity. This result implies that degrees of ambiguity are really not preserved by the operation of taking the omega power of a finitary context free language. We prove also that taking the adherence or the delta-limit of a finitary language preserves neither unambiguity nor inherent ambiguity. On the other side we show that methods used in the study of omega context free languages can also be applied to study the notion of ambiguity in infinitary rational relations accepted by Buchi 2-tape automata and we get first results in that direction.

Logic in Computer Science Logic

Learning Human Objectives by Evaluating Hypothetical Behavior

113 - Siddharth Reddy , Anca D. Dragan , Sergey Levine 2019

We seek to align agent behavior with a users objectives in a reinforcement learning setting with unknown dynamics, an unknown reward function, and unknown unsafe states. The user knows the rewards and unsafe states, but querying the user is expensive. To address this challenge, we propose an algorithm that safely and interactively learns a model of the users reward function. We start with a generative model of initial states and a forward dynamics model trained on off-policy data. Our method uses these models to synthesize hypothetical behaviors, asks the user to label the behaviors with rewards, and trains a neural network to predict the rewards. The key idea is to actively synthesize the hypothetical behaviors from scratch by maximizing tractable proxies for the value of information, without interacting with the environment. We call this method reward query synthesis via trajectory optimization (ReQueST). We evaluate ReQueST with simulated users on a state-based 2D navigation task and the image-based Car Racing video game. The results show that ReQueST significantly outperforms prior methods in learning reward models that transfer to new environments with different initial state distributions. Moreover, ReQueST safely trains the reward model to detect unsafe states, and corrects reward hacking before deploying the agent.

Computers and Society Machine Learning Machine Learning

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Omega-Regular Objectives in Model-Free Reinforcement Learning

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions