Nearly Minimax Optimal Adversarial Imitation Learning with Known and Unknown Transitions

Added by Yang Yu

Publication date: 2021

Fields: Informatics Engineering

Research language: English

Authors: Tian Xu, Ziniu Li, Yang Yu

Machine Learning Artificial Intelligence

This paper is dedicated to designing provably efficient adversarial imitation learning (AIL) algorithms that directly optimize policies from expert demonstrations. Firstly, we develop a transition-aware AIL algorithm named TAIL with an expert sample complexity of $\tilde{O}(H^{3/2} |S| / \varepsilon)$ under the known transition setting, where $H$ is the planning horizon, $|S|$ is the state space size and $\varepsilon$ is the desired policy value gap. This improves upon the previous best bound of $\tilde{O}(H^2 |S| / \varepsilon^2)$ for AIL methods and matches the lower bound of $\tilde{\Omega}(H^{3/2} |S| / \varepsilon)$ in [Rajaraman et al., 2021] up to a logarithmic factor. The key ingredient of TAIL is a fine-grained estimator for the expert state-action distribution, which explicitly utilizes the transition function information. Secondly, considering practical settings where the transition functions are usually unknown but environment interaction is allowed, we accordingly develop a model-based transition-aware AIL algorithm named MB-TAIL. In particular, MB-TAIL builds an empirical transition model by interacting with the environment and performs imitation under the recovered empirical model. The interaction complexity of MB-TAIL is $\tilde{O}(H^3 |S|^2 |A| / \varepsilon^2)$, which improves the best known result of $\tilde{O}(H^4 |S|^2 |A| / \varepsilon^2)$ in [Shani et al., 2021]. Finally, our theoretical results are supported by numerical evaluation and detailed analysis on two challenging MDPs.
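To get a feel for the size of these improvements, the short Python sketch below evaluates the leading terms of the four bounds quoted in the abstract. It is only an illustration: constants and logarithmic factors hidden by the $\tilde{O}$ notation are ignored, the function names are made up for this example, and the values chosen for $H$, $|S|$, $|A|$ and $\varepsilon$ are arbitrary assumptions rather than settings from the paper.

```python
import math

# Leading terms only: constants and log factors hidden by O-tilde are dropped.
# All function names here are hypothetical, chosen for this illustration.

def tail_expert_samples(H, S, eps):
    """Leading term H^{3/2} |S| / eps (TAIL, known transitions)."""
    return H ** 1.5 * S / eps

def prior_ail_expert_samples(H, S, eps):
    """Leading term H^2 |S| / eps^2 (previous best AIL bound)."""
    return H ** 2 * S / eps ** 2

def mbtail_interactions(H, S, A, eps):
    """Leading term H^3 |S|^2 |A| / eps^2 (MB-TAIL interaction complexity)."""
    return H ** 3 * S ** 2 * A / eps ** 2

def prior_interactions(H, S, A, eps):
    """Leading term H^4 |S|^2 |A| / eps^2 (previous best interaction bound)."""
    return H ** 4 * S ** 2 * A / eps ** 2

if __name__ == "__main__":
    # Arbitrary example values (assumptions, not taken from the paper).
    H, S, A, eps = 100, 10, 4, 0.1

    expert_ratio = prior_ail_expert_samples(H, S, eps) / tail_expert_samples(H, S, eps)
    interaction_ratio = prior_interactions(H, S, A, eps) / mbtail_interactions(H, S, A, eps)

    print(f"expert-sample improvement factor: {expert_ratio:.1f}")
    print(f"interaction improvement factor:   {interaction_ratio:.1f}")
```

Dividing the leading terms shows that the expert-sample bound improves by a factor of $\sqrt{H}/\varepsilon$ and the interaction bound by a factor of $H$, so the gains grow with longer horizons and tighter accuracy targets.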

Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

Jiafan He, Dongruo Zhou, Quanquan Gu (2020)

We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) under the tabular setting. We propose a model-based algorithm named UCBVI-$\gamma$, which is based on the optimism in the face of uncertainty principle and the Bernstein-type bonus. We show that UCBVI-$\gamma$ achieves an $\tilde{O}\big(\sqrt{SAT}/(1-\gamma)^{1.5}\big)$ regret, where $S$ is the number of states, $A$ is the number of actions, $\gamma$ is the discount factor and $T$ is the number of steps. In addition, we construct a class of hard MDPs and show that for any algorithm, the expected regret is at least $\tilde{\Omega}\big(\sqrt{SAT}/(1-\gamma)^{1.5}\big)$. Our upper bound matches the minimax lower bound up to logarithmic factors, which suggests that UCBVI-$\gamma$ is nearly minimax optimal for discounted MDPs.

Machine Learning Optimization and Control Machine Learning

Generative Adversarial Self-Imitation Learning

Yijie Guo, Junhyuk Oh, Satinder Singh (2018)

This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via the generative adversarial imitation learning framework. Instead of directly maximizing rewards, GASIL focuses on reproducing past good trajectories, which can potentially make long-term credit assignment easier when rewards are sparse and delayed. GASIL can be easily combined with any policy gradient objective by using GASIL as a learned shaped reward function. Our experimental results show that GASIL improves the performance of proximal policy optimization on 2D Point Mass and MuJoCo environments with delayed reward and stochastic dynamics.

Machine Learning Artificial Intelligence Machine Learning

Task-Relevant Adversarial Imitation Learning

Konrad Zolna, Scott Reed, Alexander Novikov (2019)

We show that a critical vulnerability in adversarial imitation is the tendency of discriminator networks to learn spurious associations between visual features and expert labels. When the discriminator focuses on task-irrelevant features, it does not provide an informative reward signal, leading to poor task performance. We analyze this problem in detail and propose a solution that outperforms standard Generative Adversarial Imitation Learning (GAIL). Our proposed method, Task-Relevant Adversarial Imitation Learning (TRAIL), uses constrained discriminator optimization to learn informative rewards. In comprehensive experiments, we show that TRAIL can solve challenging robotic manipulation tasks from pixels by imitating human operators without access to any task rewards, and clearly outperforms comparable baseline imitation agents, including those trained via behaviour cloning and conventional GAIL.

Machine Learning Artificial Intelligence Robotics

ADAIL: Adaptive Adversarial Imitation Learning

Yiren Lu, Jonathan Tompson (2020)

We present the ADaptive Adversarial Imitation Learning (ADAIL) algorithm for learning adaptive policies that can be transferred between environments of varying dynamics, by imitating a small number of demonstrations collected from a single source domain. This is an important problem in robotic learning because in real-world scenarios 1) reward functions are hard to obtain, 2) learned policies from one domain are difficult to deploy in another due to varying source-to-target domain statistics, and 3) collecting expert demonstrations in multiple environments where the dynamics are known and controlled is often infeasible. We address these constraints by building upon recent advances in adversarial imitation learning; we condition our policy on a learned dynamics embedding and we employ a domain-adversarial loss to learn a dynamics-invariant discriminator. The effectiveness of our method is demonstrated on simulated control tasks with varying environment dynamics, and the learned adaptive agent outperforms several recent baselines.

Machine Learning Artificial Intelligence

Risk-Sensitive Generative Adversarial Imitation Learning

Jonathan Lacotte, Mohammad Ghavamzadeh, Yinlam Chow (2018)

We study risk-sensitive imitation learning where the agent's goal is to perform at least as well as the expert in terms of a risk profile. We first formulate our risk-sensitive imitation learning setting. We consider the generative adversarial approach to imitation learning (GAIL) and derive an optimization problem for our formulation, which we call risk-sensitive GAIL (RS-GAIL). We then derive two differe

Machine Learning Artificial Intelligence Machine Learning
