
Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference

Published by: Xiaocong Chen
Publication date: 2021
Research field: Informatics Engineering
Paper language: English

Recent advances in reinforcement learning have inspired increasing interest in learning user models adaptively through dynamic interactions, e.g., in reinforcement learning based recommender systems. The reward function is crucial for most reinforcement learning applications, as it guides the optimization. However, current reinforcement-learning-based methods rely on manually defined reward functions, which cannot adapt to dynamic and noisy environments; moreover, they generally use task-specific reward functions that sacrifice generalization ability. To address these issues, we propose a generative inverse reinforcement learning approach for modelling users' behavioral preferences. Instead of using predefined reward functions, our model automatically learns rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN. Our model provides a general way of characterizing and explaining underlying behavioral tendencies, and our experiments show that it outperforms state-of-the-art methods in a variety of scenarios, namely traffic signal control, online recommender systems, and scanpath prediction.
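The abstract does not include an implementation, but the core mechanism can be pictured as a WGAN-style critic whose score of a (state, action) pair is reused as the learned reward for an actor-critic agent. The Python sketch below illustrates only that idea; the class name `RewardDiscriminator`, the toy dimensions, the network sizes, and the clipping constant are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch, assuming PyTorch and toy dimensions: a WGAN-style critic scores
# (state, action) pairs and its score is reused as the learned reward.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 4  # hypothetical sizes

class RewardDiscriminator(nn.Module):
    """Scores (state, action) pairs; higher means more expert-like."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

disc = RewardDiscriminator()
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

def discriminator_step(expert_s, expert_a, policy_s, policy_a):
    """One Wasserstein-style update: raise expert scores, lower policy scores."""
    loss = disc(policy_s, policy_a).mean() - disc(expert_s, expert_a).mean()
    opt_d.zero_grad()
    loss.backward()
    opt_d.step()
    with torch.no_grad():  # weight clipping as in the original WGAN formulation
        for p in disc.parameters():
            p.clamp_(-0.01, 0.01)
    return loss.item()

def learned_reward(state, action):
    """Reward handed to an actor-critic learner instead of a hand-crafted reward."""
    with torch.no_grad():
        return disc(state, action)

# Toy usage with random stand-in batches (real batches would come from logged behaviour).
expert_s, expert_a = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)
policy_s, policy_a = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)
discriminator_step(expert_s, expert_a, policy_s, policy_a)
print(learned_reward(policy_s, policy_a).shape)  # torch.Size([32])
```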


Read also

Imitation learning in a high-dimensional environment is challenging. Most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator in such high-dimensional environments, e.g., the Atari domain. To address this challenge, we propose a novel reward learning module that generates intrinsic reward signals via a generative model. Our generative method performs better forward state transition and backward action encoding, which improves the module's dynamics modeling ability in the environment. Thus, our module provides the imitation agent with both the intrinsic intention of the demonstrator and a better exploration ability, which is critical for the agent to outperform the demonstrator. Empirical results show that our method outperforms state-of-the-art IRL methods on multiple Atari games, even with a one-life demonstration. Remarkably, our method achieves performance up to 5 times that of the demonstration.
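As a rough illustration of intrinsic rewards from a generative dynamics model, the sketch below fits a forward state-transition model and a backward action encoder on demonstration tuples, then uses the forward prediction error as the intrinsic reward. All dimensions, names, and loss choices are assumptions for illustration and are not the paper's actual module.

```python
# Sketch under assumptions: intrinsic reward from the prediction error of a learned
# forward-dynamics model, paired with a backward action encoder. Toy dimensions.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 16, 6  # hypothetical sizes

forward_model = nn.Sequential(   # predicts next state from (state, action)
    nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
    nn.Linear(128, STATE_DIM),
)
backward_model = nn.Sequential(  # recovers the action from (state, next_state)
    nn.Linear(2 * STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, ACTION_DIM),
)
opt = torch.optim.Adam(
    list(forward_model.parameters()) + list(backward_model.parameters()), lr=3e-4
)

def train_step(s, a, s_next):
    """Fit forward transition and backward action encoding on demonstration tuples."""
    pred_next = forward_model(torch.cat([s, a], dim=-1))
    pred_a = backward_model(torch.cat([s, s_next], dim=-1))
    loss = ((pred_next - s_next) ** 2).mean() + ((pred_a - a) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def intrinsic_reward(s, a, s_next):
    """Higher when the transition is poorly predicted, encouraging exploration."""
    with torch.no_grad():
        pred_next = forward_model(torch.cat([s, a], dim=-1))
        return ((pred_next - s_next) ** 2).mean(dim=-1)
```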
This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via the generative adversarial imitation learning framework. Instead of directly maximizing rewards, GASIL focuses on reproducing past good trajectories, which can potentially make long-term credit assignment easier when rewards are sparse and delayed. GASIL can be easily combined with any policy gradient objective by using GASIL as a learned shaped reward function. Our experimental results show that GASIL improves the performance of proximal policy optimization on 2D Point Mass and MuJoCo environments with delayed rewards and stochastic dynamics.
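A minimal sketch of the shaping idea, assuming a buffer of the highest-return trajectories and a discriminator trained against them whose log-score is added to the environment reward before a policy-gradient update. `TOP_K`, `SHAPING_WEIGHT`, and the network are illustrative choices, not GASIL's published hyperparameters.

```python
# Sketch under assumptions: a small buffer of the highest-return trajectories and a
# discriminator whose log-score shapes the environment reward. Illustrative constants.
import heapq
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2
TOP_K = 10            # how many good trajectories to keep (illustrative)
SHAPING_WEIGHT = 0.1  # weight of the imitation bonus (illustrative)

good_trajectories = []  # min-heap of (return, id, (states, actions)) tuples

def store_trajectory(traj_return, traj_id, states, actions):
    """Keep only the TOP_K highest-return trajectories seen so far."""
    heapq.heappush(good_trajectories, (traj_return, traj_id, (states, actions)))
    if len(good_trajectories) > TOP_K:
        heapq.heappop(good_trajectories)

disc = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

def shaped_reward(env_reward, state, action):
    """Environment reward plus a bonus for resembling past good behaviour."""
    with torch.no_grad():
        score = disc(torch.cat([state, action], dim=-1)).squeeze(-1)
    return env_reward + SHAPING_WEIGHT * torch.log(score + 1e-8)
```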
Imitation learning (IL) aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner and expert behaviors. Various imitation learning algorithms have been proposed with different pre-determined divergences to quantify the discrepancy. This naturally gives rise to the following question: given a set of expert demonstrations, which divergence can recover the expert policy more accurately with higher data efficiency? In this work, we propose $f$-GAIL, a new generative adversarial imitation learning (GAIL) model that automatically learns a discrepancy measure from the $f$-divergence family as well as a policy capable of producing expert-like behaviors. Compared with IL baselines using various predefined divergence measures, $f$-GAIL learns better policies with higher data efficiency in six physics-based control tasks.
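For context, the sketch below shows how a pre-chosen divergence typically enters a GAIL-style discriminator loss, using a Jensen-Shannon-style logistic loss and a Wasserstein-style score difference as two fixed examples; $f$-GAIL's contribution is to learn this choice from the $f$-divergence family rather than fix it in advance. The 10-dimensional input and the two instances shown are assumptions for illustration.

```python
# Sketch under assumptions: two fixed divergence choices and how each shows up in a
# GAIL-style discriminator loss; f-GAIL instead learns the choice from data.
import torch
import torch.nn as nn
import torch.nn.functional as F

disc = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))  # toy input size

def gail_js_loss(expert_x, policy_x):
    """Classic GAIL: Jensen-Shannon-style binary logistic loss."""
    real = F.binary_cross_entropy_with_logits(disc(expert_x), torch.ones(expert_x.size(0), 1))
    fake = F.binary_cross_entropy_with_logits(disc(policy_x), torch.zeros(policy_x.size(0), 1))
    return real + fake

def gail_w1_loss(expert_x, policy_x):
    """Wasserstein-style variant: plain score difference (needs a Lipschitz constraint)."""
    return disc(policy_x).mean() - disc(expert_x).mean()

# Toy usage with random stand-ins for expert and policy (state, action) features.
expert_x, policy_x = torch.randn(64, 10), torch.randn(64, 10)
print(gail_js_loss(expert_x, policy_x).item(), gail_w1_loss(expert_x, policy_x).item())
```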
We study risk-sensitive imitation learning, where the agent's goal is to perform at least as well as the expert in terms of a risk profile. We first formulate our risk-sensitive imitation learning setting. We consider the generative adversarial approach to imitation learning (GAIL) and derive an optimization problem for our formulation, which we call risk-sensitive GAIL (RS-GAIL). We then derive two different …
Developments in deep generative models have allowed for tractable learning of high-dimensional data distributions. While the employed learning procedures typically assume that training data is drawn i.i.d. from the distribution of interest, it may be desirable to model distinct distributions which are observed sequentially, such as when different classes are encountered over time. Although conditional variations of deep generative models permit multiple distributions to be modeled by a single network in a disentangled fashion, they are susceptible to catastrophic forgetting when the distributions are encountered sequentially. In this paper, we adapt recent work in reducing catastrophic forgetting to the task of training generative adversarial networks on a sequence of distinct distributions, enabling continual generative modeling.
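The abstract does not name the specific forgetting-reduction technique being adapted; purely as an illustration, the sketch below adds an elastic-weight-consolidation-style quadratic penalty to a generator's adversarial loss so that parameters important for earlier distributions are anchored. The placeholder importance estimate and the strength constant are assumptions, not the paper's method.

```python
# Sketch under assumptions: the abstract does not name the forgetting-reduction method;
# an EWC-style quadratic penalty is used here purely as an illustration.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

# After training on distribution k, snapshot the weights and an importance estimate.
anchor_params = [p.detach().clone() for p in generator.parameters()]
importance = [torch.ones_like(p) for p in generator.parameters()]  # placeholder Fisher

def penalized_generator_loss(adv_loss, strength=100.0):
    """Adversarial loss plus a penalty for drifting from parameters fit to earlier tasks."""
    penalty = sum(
        (imp * (p - anchor) ** 2).sum()
        for p, anchor, imp in zip(generator.parameters(), anchor_params, importance)
    )
    return adv_loss + strength * penalty
```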
