
Dex: Incremental Learning for Complex Environments in Deep Reinforcement Learning

Published by Nick Erickson
Publication date: 2017
Paper language: English





This paper introduces Dex, a reinforcement learning environment toolkit specialized for training and evaluation of continual learning methods as well as general reinforcement learning problems. We also present the novel continual learning method of incremental learning, where a challenging environment is solved using optimal weight initialization learned from first solving a similar easier environment. We show that incremental learning can produce results vastly superior to those of standard methods by providing a strong baseline method across ten Dex environments. Finally, we develop a saliency method for qualitative analysis of reinforcement learning, which shows the impact incremental learning has on network attention.
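The warm-start idea in the abstract can be illustrated on a toy problem. The sketch below is not Dex and not the paper's training code: it swaps the reinforcement learning setting for linear regression trained by gradient descent, and every name in it (`train`, `mse`, `w_easy`, `w_hard`, `budget`) is hypothetical. It only shows that weights learned on an easier, similar task can be a better starting point than a fresh initialization when the training budget is small.

```python
import numpy as np

def train(w_init, X, y, steps=50, lr=0.1):
    """Plain gradient descent on mean squared error; returns the final weights."""
    w = w_init.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def mse(w, X, y):
    """Mean squared error of a linear model with weights w."""
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Assumption: the "hard" task is a small perturbation of the "easy" one.
w_easy = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
w_hard = w_easy + 0.1 * rng.normal(size=5)
y_easy, y_hard = X @ w_easy, X @ w_hard

# Stage 1: solve the easy task from scratch.
w_from_easy = train(np.zeros(5), X, y_easy)

# Stage 2: solve the hard task under a small step budget, with and without warm start.
budget = 5
cold = train(np.zeros(5), X, y_hard, steps=budget)      # fresh initialization
warm = train(w_from_easy, X, y_hard, steps=budget)      # initialized from easy-task weights

print("cold-start loss:", mse(cold, X, y_hard))
print("warm-start loss:", mse(warm, X, y_hard))
```

In this toy setting the benefit depends entirely on how close the two tasks are; the abstract's claim concerns deep reinforcement learning agents across ten Dex environments, which this sketch does not reproduce.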


Read also

Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components -- a reward predictor and a successor map. The successor map represents the expected future state occupancy from any given state and the reward predictor maps states to scalar rewards. The value function of a state can be computed as the inner product between the successor map and the reward weights. In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework. DSR has several appealing properties including: increased sensitivity to distal reward changes due to factorization of reward and world dynamics, and the ability to extract bottleneck states (subgoals) given successor maps trained under a random policy. We show the efficacy of our approach on two diverse environments given raw pixel observations -- simple grid-world domains (MazeBase) and the Doom game engine.
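The decomposition described above can be checked in the tabular case, where the successor map under a fixed policy has the closed form M = (I - γP)^(-1) and the value function is V = M r. The sketch below is a minimal tabular illustration, not the deep DSR architecture; the three-state chain, the discount factor, and the reward vectors are made-up values.

```python
import numpy as np

gamma = 0.9

# Hypothetical 3-state chain under a fixed policy (each row sums to 1).
# State 2 is absorbing.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])

# Reward weights: one scalar reward per state.
r = np.array([0.0, 0.0, 1.0])

# Successor map: expected discounted future occupancy of each state.
M = np.linalg.inv(np.eye(3) - gamma * P)

# Value of each state is the inner product of its successor row with the rewards.
V = M @ r

# Changing the reward only requires a new product; M is reused unchanged.
r_new = np.array([1.0, 0.0, 0.0])
V_new = M @ r_new

print(V, V_new)
```

Because the world dynamics live in `M` and the rewards in `r`, a change in reward does not require relearning the dynamics, which is the factorization the abstract credits for sensitivity to distal reward changes.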
Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly suited to sequential decision problems. Other methods, such as bootstrap sampling, have no mechanism for uncertainty that does not come from the observed data. We highlight why this can be a crucial shortcoming and propose a simple remedy through addition of a randomized untrainable "prior" network to each ensemble member. We prove that this approach is efficient with linear representations, provide simple illustrations of its efficacy with nonlinear representations and show that this approach scales to large-scale problems far better than previous attempts.
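A minimal linear sketch of the idea: each ensemble member predicts with a trainable part plus a frozen, randomly drawn prior, and only the trainable part is fitted. The data, the scale `beta`, and the helper names `fit_member` and `predict` are assumptions for illustration; the abstract does not specify them. With fewer observations than features, all members agree on the training points but disagree elsewhere, and that disagreement comes from the priors rather than from the data.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(2, 4))     # 2 observations, 4 features (underdetermined)
y_train = np.array([1.0, -1.0])
x_new = rng.normal(size=4)            # an input far from being pinned down by the data
beta = 1.0                            # prior scale (assumed value)

def fit_member(prior_w):
    """Fit trainable weights to whatever the frozen prior leaves unexplained."""
    residual = y_train - beta * X_train @ prior_w
    # For an underdetermined system, lstsq returns the minimum-norm exact solution.
    theta, *_ = np.linalg.lstsq(X_train, residual, rcond=None)
    return theta

def predict(x, theta, prior_w):
    """Member output = trainable part + scaled untrainable prior."""
    return x @ theta + beta * (x @ prior_w)

priors = [rng.normal(size=4) for _ in range(5)]   # drawn once, never updated
thetas = [fit_member(p) for p in priors]

train_preds = np.array([[predict(x, th, p) for x in X_train]
                        for th, p in zip(thetas, priors)])
new_preds = np.array([predict(x_new, th, p) for th, p in zip(thetas, priors)])

print("spread on training inputs:", train_preds.std(axis=0))
print("spread on unseen input:  ", new_preds.std())
```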
Reinforcement learning (RL) algorithms are typically limited to learning a single solution of a specified task, even though there often exist diverse solutions to a given task. Compared with learning a single solution, learning a set of diverse solutions is beneficial because diverse solutions enable robust few-shot adaptation and allow the user to select a preferred solution. Although previous studies have shown that diverse behaviors can be modeled with a policy conditioned on latent variables, an approach for modeling an infinite set of diverse solutions with continuous latent variables has not been investigated. In this study, we propose an RL method that can learn infinitely many solutions by training a policy conditioned on a continuous or discrete low-dimensional latent variable. Through continuous control tasks, we demonstrate that our method can learn diverse solutions in a data-efficient manner and that the solutions can be used for few-shot adaptation to solve unseen tasks.
In this paper we explore methods to exploit symmetries for ensuring sample efficiency in reinforcement learning (RL). This problem deserves ever-increasing attention given the recent advances in the use of deep networks for complex RL tasks, which require large amounts of training data. We introduce a novel method to detect symmetries using reward trails observed during episodic experience and prove its completeness. We also provide a framework to incorporate the discovered symmetries for functional approximation. Finally, we show that the use of potential-based reward shaping is especially effective for our symmetry exploitation mechanism. Experiments on various classical problems show that our method improves the learning performance significantly by utilizing symmetry information.
Evolution strategies (ES), as a family of black-box optimization algorithms, have recently emerged as a scalable alternative to reinforcement learning (RL) approaches such as Q-learning or policy gradient, and are much faster when many central processing units (CPUs) are available due to better parallelization. In this paper, we propose a systematic incremental learning method for ES in dynamic environments. The goal is to adjust the previously learned policy to a new one incrementally whenever the environment changes. We incorporate an instance weighting mechanism with ES to facilitate its learning adaptation, while retaining the scalability of ES. During parameter updating, higher weights are assigned to instances that contain more new knowledge, thus encouraging the search distribution to move towards new promising areas of parameter space. We propose two easy-to-implement metrics to calculate the weights: instance novelty and instance quality. Instance novelty measures an instance's difference from the previous optimum in the original environment, while instance quality corresponds to how well an instance performs in the new environment. The resulting algorithm, Instance Weighted Incremental Evolution Strategies (IW-IES), is verified to achieve significantly improved performance on a suite of robot navigation tasks. This paper thus introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.
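A rough sketch of an instance-weighted ES update follows. The abstract names the two metrics but gives no formulas, so everything concrete here is an assumption and should not be read as the IW-IES algorithm itself: novelty is taken as Euclidean distance from the previous optimum, quality as fitness in the new environment, the two are min-max scaled and simply added, and the toy quadratic fitness, `minmax`, and `weighted_es_step` are hypothetical.

```python
import numpy as np

def minmax(v):
    """Scale a vector to [0, 1]; a constant vector maps to all zeros."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def weighted_es_step(theta, fitness_fn, prev_optimum, rng, n=50, sigma=0.1, lr=0.05):
    """One ES update in which each sampled instance carries its own weight."""
    eps = rng.normal(size=(n, theta.size))           # search-distribution noise
    candidates = theta + sigma * eps                 # sampled instances
    fitness = np.array([fitness_fn(c) for c in candidates])
    adv = (fitness - fitness.mean()) / (fitness.std() + 1e-8)

    # Assumed weighting: novelty (distance from old optimum) plus quality (new fitness).
    novelty = np.linalg.norm(candidates - prev_optimum, axis=1)
    weights = minmax(novelty) + minmax(fitness) + 1e-3
    weights /= weights.sum()

    grad = (weights * adv) @ eps / sigma             # weighted search-gradient estimate
    return theta + lr * grad

# Toy "environment change": the optimum moves from the origin to `target`.
target = np.array([3.0, 3.0, 3.0])

def fitness_fn(c):
    return -float(np.sum((c - target) ** 2))

rng = np.random.default_rng(0)
prev_optimum = np.zeros(3)
theta = prev_optimum.copy()
for _ in range(8):
    theta = weighted_es_step(theta, fitness_fn, prev_optimum, rng)

print("fitness before:", fitness_fn(prev_optimum), "after:", fitness_fn(theta))
```

With uniform weights the update reduces to a standard ES step, so the weighting is the only part that differs; how the published method combines novelty and quality may be quite different from the sum used here.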
