This paper presents a method for learning logical task specifications and cost functions from demonstrations. Linear temporal logic (LTL) formulas are widely used to express complex objectives and constraints for autonomous systems, yet such specifications can be challenging to construct by hand. Instead, we consider demonstrated task executions, whose temporal logic structure and transition costs need to be inferred by an autonomous agent. We employ a spectral learning approach to extract a weighted finite automaton (WFA) that approximates the unknown logic structure of the task. We then define a product between the WFA, which provides high-level task guidance, and a labeled Markov decision process (L-MDP), which handles low-level control, and optimize a cost function that matches the demonstrator's behavior. We demonstrate that our method generalizes the execution of the inferred task specification to new environment configurations.
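To make the notion of a WFA concrete, the following is a minimal sketch of how a weighted finite automaton assigns a score to a sequence of labels, computed as an initial weight vector multiplied by one transition matrix per symbol and then by a final weight vector. It is an illustration only and not the spectral learning procedure of the paper: the two-state automaton, the labels "a" and "n", and the names `wfa_score`, `ALPHA`, `BETA`, and `TRANSITIONS` are all hypothetical, chosen so that the score is 1 exactly when the label "a" eventually occurs in the sequence.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def vec_mat(v: Sequence[float], m: Matrix) -> List[float]:
    """Multiply a row vector by a matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def wfa_score(
    alpha: Sequence[float],
    transitions: Dict[str, Matrix],
    beta: Sequence[float],
    word: Sequence[str],
) -> float:
    """Score of `word` under a WFA: alpha^T * A_{w1} * ... * A_{wn} * beta."""
    state = list(alpha)
    for symbol in word:
        # Each symbol applies its own transition matrix to the weight vector.
        state = vec_mat(state, transitions[symbol])
    return sum(s * b for s, b in zip(state, beta))


# Hypothetical two-state WFA (an assumption for illustration, not from the paper).
# State 0: label "a" not yet seen; state 1: label "a" has been seen.
ALPHA = [1.0, 0.0]            # start in state 0
BETA = [0.0, 1.0]             # only state 1 carries final weight
TRANSITIONS = {
    "n": [[1.0, 0.0], [0.0, 1.0]],  # "n" leaves the state unchanged
    "a": [[0.0, 1.0], [0.0, 1.0]],  # "a" moves to (or stays in) state 1
}

if __name__ == "__main__":
    for word in ["nna", "nn", "an", ""]:
        print(repr(word), wfa_score(ALPHA, TRANSITIONS, BETA, word))
```

With these 0/1 weights the automaton behaves like a deterministic acceptor for "eventually a"; a learned WFA would generally have real-valued weights, so its scores only approximate such a specification.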
The goal of the inverse reinforcement learning (IRL) problem is to recover the reward functions from expert demonstrations. However, the IRL problem, like any ill-posed inverse problem, suffers from the congenital defect that the policy may be optimal for m
Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the inverse reinforcement learning problem. Unfortunately, current methods generally do not scale well beyond the small tabular setting due to the need for an inn
Inverse reinforcement learning attempts to reconstruct the reward function in a Markov decision problem, using observations of agent actions. As already observed by Russell, the problem is ill-posed, and the reward function is not identifiable, even u
Reinforcement learning synthesizes controllers without prior knowledge of the system. At each timestep, a reward is given. The controllers optimize the discounted sum of these rewards. Applying this class of algorithms requires designing a reward sch
We consider the transfer of experience samples (i.e., tuples <s, a, s', r>) in reinforcement learning (RL), collected from a set of source tasks to improve the learning process in a given target task. Most of the related approaches focus on selectin