Self-Imitation Learning

151 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Junhyuk Oh

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Junhyuk Oh - Yijie Guo - Satinder Singh

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agents past good decisions. This algorithm is designed to verify our hypothesis that exploiting past good experiences can indirectly drive deep exploration. Our empirical results show that SIL significantly improves advantage actor-critic (A2C) on several hard exploration Atari games and is competitive to the state-of-the-art count-based exploration methods. We also show that SIL improves proximal policy optimization (PPO) on MuJoCo tasks.

قيم البحث

166 - Yijie Guo , Junhyuk Oh , Satinder Singh 2018

This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via generative adversarial imitation learning framew ork. Instead of directly maximizing rewards, GASIL focuses on reproducing past good trajectories, which can potentially make long-term credit assignment easier when rewards are sparse and delayed. GASIL can be easily combined with any policy gradient objective by using GASIL as a learned shaped reward function. Our experimental results show that GASIL improves the performance of proximal policy optimization on 2D Point Mass and MuJoCo environments with delayed reward and stochastic dynamics.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Risk-Sensitive Generative Adversarial Imitation Learning

86 - Jonathan Lacotte , Mohammad Ghavamzadeh , Yinlam Chow 2018

We study risk-sensitive imitation learning where the agents goal is to perform at least as well as the expert in terms of a risk profile. We first formulate our risk-sensitive imitation learning setting. We consider the generative adversarial approac h to imitation learning (GAIL) and derive an optimization problem for our formulation, which we call it risk-sensitive GAIL (RS-GAIL). We then derive two differe

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Dyna-AIL : Adversarial Imitation Learning by Planning

78 - Vaibhav Saxena , Srinivasan Sivanandan , Pulkit Mathur 2019

Adversarial methods for imitation learning have been shown to perform well on various control tasks. However, they require a large number of environment interactions for convergence. In this paper, we propose an end-to-end differentiable adversarial imitation learning algorithm in a Dyna-like framework for switching between model-based planning and model-free learning from expert data. Our results on both discrete and continuous environments show that our approach of using model-based planning along with model-free learning converges to an optimal policy with fewer number of environment interactions in comparison to the state-of-the-art learning methods.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Combating False Negatives in Adversarial Imitation Learning

152 - Konrad Zolna , Chitwan Saharia , Leonard Boussioux 2020

In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones p roduced by the agent) become increasingly similar to expert ones. Despite the fact that the task is successfully accomplished in some of the agents trajectories, the discriminator is trained to output low values for them. We hypothesize that this inconsistent training signal for the discriminator can impede its learning, and consequently leads to worse overall performance of the agent. We show experimental evidence for this hypothesis and that the False Negatives (i.e. successful agent episodes) significantly hinder adversarial imitation learning, which is the first contribution of this paper. Then, we propose a method to alleviate the impact of false negatives and test it on the BabyAI environment. This method consistently improves sample efficiency over the baselines by at least an order of magnitude.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Provable Representation Learning for Imitation Learning via Bi-level Optimization

140 - Sanjeev Arora , Simon S. Du , Sham Kakade 2020

A common strategy in modern learning systems is to learn a representation that is useful for many tasks, a.k.a. representation learning. We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where multiple expe rts trajectories are available. We formulate representation learning as a bi-level optimization problem where the outer optimization tries to learn the joint representation and the inner optimization encodes the imitation learning setup and tries to learn task-specific parameters. We instantiate this framework for the imitation learning settings of behavior cloning and observation-alone. Theoretically, we show using our framework that representation learning can provide sample complexity benefits for imitation learning in both settings. We also provide proof-of-concept experiments to verify our theory.

التعلم الآلي الذكاء الاصطناعي التعلم الالي