Generative Adversarial Self-Imitation Learning

167 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yijie Guo

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yijie Guo - Junhyuk Oh - Satinder Singh

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via generative adversarial imitation learning framework. Instead of directly maximizing rewards, GASIL focuses on reproducing past good trajectories, which can potentially make long-term credit assignment easier when rewards are sparse and delayed. GASIL can be easily combined with any policy gradient objective by using GASIL as a learned shaped reward function. Our experimental results show that GASIL improves the performance of proximal policy optimization on 2D Point Mass and MuJoCo environments with delayed reward and stochastic dynamics.

قيم البحث

86 - Jonathan Lacotte , Mohammad Ghavamzadeh , Yinlam Chow 2018

We study risk-sensitive imitation learning where the agents goal is to perform at least as well as the expert in terms of a risk profile. We first formulate our risk-sensitive imitation learning setting. We consider the generative adversarial approac h to imitation learning (GAIL) and derive an optimization problem for our formulation, which we call it risk-sensitive GAIL (RS-GAIL). We then derive two differe

التعلم الآلي الذكاء الاصطناعي التعلم الالي

$f$-GAIL: Learning $f$-Divergence for Generative Adversarial Imitation Learning

204 - Xin Zhang , Yanhua Li , Ziming Zhang 2020

Imitation learning (IL) aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner and expert behaviors. Various imitation learning algorithms have been proposed with different pre-determined divergences to q uantify the discrepancy. This naturally gives rise to the following question: Given a set of expert demonstrations, which divergence can recover the expert policy more accurately with higher data efficiency? In this work, we propose $f$-GAIL, a new generative adversarial imitation learning (GAIL) model, that automatically learns a discrepancy measure from the $f$-divergence family as well as a policy capable of producing expert-like behaviors. Compared with IL baselines with various predefined divergence measures, $f$-GAIL learns better policies with higher data efficiency in six physics-based control tasks.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Self-Imitation Learning

150 - Junhyuk Oh , Yijie Guo , Satinder Singh 2018

This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agents past good decisions. This algorithm is designed to verify our hypothesis that exploiting past good experiences can indir ectly drive deep exploration. Our empirical results show that SIL significantly improves advantage actor-critic (A2C) on several hard exploration Atari games and is competitive to the state-of-the-art count-based exploration methods. We also show that SIL improves proximal policy optimization (PPO) on MuJoCo tasks.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Dyna-AIL : Adversarial Imitation Learning by Planning

78 - Vaibhav Saxena , Srinivasan Sivanandan , Pulkit Mathur 2019

Adversarial methods for imitation learning have been shown to perform well on various control tasks. However, they require a large number of environment interactions for convergence. In this paper, we propose an end-to-end differentiable adversarial imitation learning algorithm in a Dyna-like framework for switching between model-based planning and model-free learning from expert data. Our results on both discrete and continuous environments show that our approach of using model-based planning along with model-free learning converges to an optimal policy with fewer number of environment interactions in comparison to the state-of-the-art learning methods.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Combating False Negatives in Adversarial Imitation Learning

152 - Konrad Zolna , Chitwan Saharia , Leonard Boussioux 2020

In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones p roduced by the agent) become increasingly similar to expert ones. Despite the fact that the task is successfully accomplished in some of the agents trajectories, the discriminator is trained to output low values for them. We hypothesize that this inconsistent training signal for the discriminator can impede its learning, and consequently leads to worse overall performance of the agent. We show experimental evidence for this hypothesis and that the False Negatives (i.e. successful agent episodes) significantly hinder adversarial imitation learning, which is the first contribution of this paper. Then, we propose a method to alleviate the impact of false negatives and test it on the BabyAI environment. This method consistently improves sample efficiency over the baselines by at least an order of magnitude.

التعلم الآلي الذكاء الاصطناعي التعلم الالي