ﻻ يوجد ملخص باللغة العربية
This paper is dedicated to designing provably efficient adversarial imitation learning (AIL) algorithms that directly optimize policies from expert demonstrations. Firstly, we develop a transition-aware AIL algorithm named TAIL with an expert sample complexity of $tilde{O}(H^{3/2} |S|/varepsilon)$ under the known transition setting, where $H$ is the planning horizon, $|S|$ is the state space size and $varepsilon$ is desired policy value gap. This improves upon the previous best bound of $tilde{O}(H^2 |S| / varepsilon^2)$ for AIL methods and matches the lower bound of $tilde{Omega} (H^{3/2} |S|/varepsilon)$ in [Rajaraman et al., 2021] up to a logarithmic factor. The key ingredient of TAIL is a fine-grained estimator for expert state-action distribution, which explicitly utilizes the transition function information. Secondly, considering practical settings where the transition functions are usually unknown but environment interaction is allowed, we accordingly develop a model-based transition-aware AIL algorithm named MB-TAIL. In particular, MB-TAIL builds an empirical transition model by interacting with the environment and performs imitation under the recovered empirical model. The interaction complexity of MB-TAIL is $tilde{O} (H^3 |S|^2 |A| / varepsilon^2)$, which improves the best known result of $tilde{O} (H^4 |S|^2 |A| / varepsilon^2)$ in [Shani et al., 2021]. Finally, our theoretical results are supported by numerical evaluation and detailed analysis on two challenging MDPs.
We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) under the tabular setting. We propose a model-based algorithm named UCBVI-$gamma$, which is based on the emph{optimism in the face of uncertainty principle} a
This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via generative adversarial imitation learning framew
We show that a critical vulnerability in adversarial imitation is the tendency of discriminator networks to learn spurious associations between visual features and expert labels. When the discriminator focuses on task-irrelevant features, it does not
We present the ADaptive Adversarial Imitation Learning (ADAIL) algorithm for learning adaptive policies that can be transferred between environments of varying dynamics, by imitating a small number of demonstrations collected from a single source dom
We study risk-sensitive imitation learning where the agents goal is to perform at least as well as the expert in terms of a risk profile. We first formulate our risk-sensitive imitation learning setting. We consider the generative adversarial approac