
Discovering Diverse Solutions in Deep Reinforcement Learning

Posted by Takayuki Osa
Publication date: 2021
Research language: English





Reinforcement learning (RL) algorithms are typically limited to learning a single solution of a specified task, even though there often exist diverse solutions to a given task. Compared with learning a single solution, learning a set of diverse solutions is beneficial because diverse solutions enable robust few-shot adaptation and allow the user to select a preferred solution. Although previous studies have shown that diverse behaviors can be modeled with a policy conditioned on latent variables, an approach for modeling an infinite set of diverse solutions with continuous latent variables has not been investigated. In this study, we propose an RL method that can learn infinitely many solutions by training a policy conditioned on a continuous or discrete low-dimensional latent variable. Through experiments on continuous control tasks, we demonstrate that our method can learn diverse solutions in a data-efficient manner and that the solutions can be used for few-shot adaptation to solve unseen tasks.
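To make the core idea concrete, below is a minimal PyTorch sketch of a policy conditioned on a continuous low-dimensional latent variable; the network sizes, the Gaussian action head, and the uniform latent sampling are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LatentConditionedPolicy(nn.Module):
    """Policy pi(a | s, z): the latent z selects which solution is executed."""
    def __init__(self, state_dim, latent_dim, action_dim, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, state, z):
        h = self.body(torch.cat([state, z], dim=-1))
        return self.mean(h), self.log_std(h).clamp(-5, 2)

# Draw z once per episode, then condition every action on it; different z's
# yield different solutions to the same task.
policy = LatentConditionedPolicy(state_dim=17, latent_dim=2, action_dim=6)
z = torch.rand(1, 2) * 2 - 1        # continuous latent sampled from [-1, 1]^2
state = torch.zeros(1, 17)          # placeholder observation
mean, log_std = policy(state, z)
action = torch.normal(mean, log_std.exp())
```

Few-shot adaptation then reduces to searching over z, rather than over all network weights, for the latent whose induced behavior best fits the unseen task.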




Read also

Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components: a reward predictor and a successor map. The successor map represents the expected future state occupancy from any given state, and the reward predictor maps states to scalar rewards. The value function of a state can be computed as the inner product between the successor map and the reward weights. In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework. DSR has several appealing properties, including increased sensitivity to distal reward changes due to factorization of reward and world dynamics, and the ability to extract bottleneck states (subgoals) given successor maps trained under a random policy. We show the efficacy of our approach on two diverse environments given raw pixel observations: simple grid-world domains (MazeBase) and the Doom game engine.
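The decomposition is easy to verify in the tabular case, where the successor map under a fixed policy has the closed form Psi = (I - gamma * P)^{-1}. Below is a small numpy sketch on a hypothetical four-state chain; DSR's contribution is to replace this tabular Psi and the reward weights with networks learned end-to-end from pixels.

```python
import numpy as np

gamma = 0.9
# Fixed-policy transition matrix of a hypothetical 4-state chain; state 3 absorbs.
P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])
w = np.array([0.0, 0.0, 0.0, 1.0])   # reward weights: reward only in state 3

# Successor map: expected discounted future occupancy of each state.
Psi = np.linalg.inv(np.eye(4) - gamma * P)

V_sr = Psi @ w                                         # value via SR: inner product
V_bellman = np.linalg.solve(np.eye(4) - gamma * P, w)  # direct Bellman solve
assert np.allclose(V_sr, V_bellman)
```

Because the reward weights w enter only through the final inner product, a change in distal rewards updates the value function immediately, without relearning the dynamics captured by Psi.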
Nick Erickson, Qi Zhao (2017)
This paper introduces Dex, a reinforcement learning environment toolkit specialized for the training and evaluation of continual learning methods as well as general reinforcement learning problems. We also present the novel continual learning method of incremental learning, in which a challenging environment is solved using optimal weight initialization learned from first solving a similar, easier environment. We show that incremental learning can produce results vastly superior to those of standard methods, providing a strong baseline across ten Dex environments. Finally, we develop a saliency method for qualitative analysis of reinforcement learning, which shows the impact incremental learning has on network attention.
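A minimal sketch of the weight-initialization transfer behind incremental learning, assuming the easy and hard tasks share one network architecture; the environment ids and the train/make_env helpers here are hypothetical placeholders, not Dex's actual API.

```python
import torch

def transfer_init(easy_net: torch.nn.Module, hard_net: torch.nn.Module) -> None:
    """Initialize the hard-task network from weights learned on the easier task."""
    hard_net.load_state_dict(easy_net.state_dict())

# Hypothetical usage:
#   easy_net = train(make_env("dex-easy"), PolicyNet())   # solve the easier env first
#   hard_net = PolicyNet()
#   transfer_init(easy_net, hard_net)                     # warm start
#   train(make_env("dex-hard"), hard_net)                 # fine-tune on the harder env
```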
Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly suited to sequential decision problems. Other methods, such as bootstrap sampling, have no mechanism for uncertainty that does not come from the observed data. We highlight why this can be a crucial shortcoming and propose a simple remedy through the addition of a randomized untrainable `prior' network to each ensemble member. We prove that this approach is efficient with linear representations, provide simple illustrations of its efficacy with nonlinear representations, and show that this approach scales to large-scale problems far better than previous attempts.
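The remedy can be sketched in a few lines: each ensemble member is the sum of a trainable network and a frozen, randomly initialized prior network, so members keep disagreeing in regions where no data has been observed. The layer sizes and the prior scale beta below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NetWithPrior(nn.Module):
    """Trainable network plus an untrainable, randomly initialized 'prior' network."""
    def __init__(self, in_dim, out_dim, beta=3.0, hidden=64):
        super().__init__()
        mlp = lambda: nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, out_dim))
        self.trainable = mlp()
        self.prior = mlp()
        for p in self.prior.parameters():   # freeze: the prior is never trained
            p.requires_grad_(False)
        self.beta = beta                    # scales the prior's influence

    def forward(self, x):
        return self.trainable(x) + self.beta * self.prior(x)

# Ensemble disagreement away from the data now reflects the random priors,
# giving a source of uncertainty that does not come from the observations alone.
ensemble = [NetWithPrior(in_dim=4, out_dim=2) for _ in range(10)]
x = torch.randn(1, 4)
preds = torch.stack([member(x) for member in ensemble])
print(preds.std(dim=0))   # spread across members ~ epistemic uncertainty
```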
Reinforcement learning (RL) algorithms update an agent's parameters according to one of several possible rules, discovered manually through years of research. Automating the discovery of update rules from data could lead to more efficient algorithms, or algorithms that are better adapted to specific environments. Although there have been prior attempts at addressing this significant scientific challenge, it remains an open question whether it is feasible to discover alternatives to fundamental concepts of RL such as value functions and temporal-difference learning. This paper introduces a new meta-learning approach that discovers an entire update rule, which includes both what to predict (e.g. value functions) and how to learn from it (e.g. bootstrapping), by interacting with a set of environments. The output of this method is an RL algorithm that we call Learned Policy Gradient (LPG). Empirical results show that our method discovers its own alternative to the concept of value functions. Furthermore, it discovers a bootstrapping mechanism to maintain and use its predictions. Surprisingly, when trained solely on toy environments, LPG generalises effectively to complex Atari games and achieves non-trivial performance. This shows the potential to discover general RL algorithms from data.
In this paper we explore methods to exploit symmetries for ensuring sample efficiency in reinforcement learning (RL), a problem that deserves ever-increasing attention with the recent advances in the use of deep networks for complex RL tasks, which require large amounts of training data. We introduce a novel method to detect symmetries using reward trails observed during episodic experience and prove its completeness. We also provide a framework to incorporate the discovered symmetries for function approximation. Finally, we show that the use of potential-based reward shaping is especially effective for our symmetry exploitation mechanism. Experiments on various classical problems show that our method improves learning performance significantly by utilizing symmetry information.
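Potential-based shaping replaces the reward r with r + gamma * Phi(s') - Phi(s), a transformation known to preserve optimal policies (Ng et al., 1999). A minimal sketch follows, with a hypothetical distance-to-goal potential standing in for one derived from the discovered symmetries.

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s)."""
    return r + gamma * phi(s_next) - phi(s)

# Hypothetical potential on a 1-D chain: negative distance to a goal state.
goal = 10
phi = lambda s: -abs(goal - s)

print(shaped_reward(r=0.0, s=3, s_next=4, phi=phi))   # step toward goal: small bonus
```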
