Despite the recent success of deep network-based Reinforcement Learning (RL), achieving human-level efficiency in learning novel tasks remains elusive. While previous efforts attempt to address this challenge using meta-learning strategies, they typically suffer from sample inefficiency with on-policy RL algorithms or from meta-overfitting with off-policy learning. In this work, we propose a novel meta-RL strategy to address those limitations. In particular, we decompose the meta-RL problem into three sub-tasks: task-exploration, task-inference, and task-fulfillment, instantiated with two deep network agents and a task encoder. During meta-training, our method learns a task-conditioned actor network for task-fulfillment, an explorer network with self-supervised reward shaping that encourages task-informative experiences during task-exploration, and a context-aware graph-based task encoder for task inference. We validate our approach with extensive experiments on several public benchmarks; the results show that our algorithm effectively performs exploration for task inference, improves sample efficiency during both training and testing, and mitigates the meta-overfitting problem.
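As a sketch of the three-part decomposition described in this abstract, the snippet below wires a permutation-invariant task encoder to a task-conditioned actor in PyTorch. All class names, layer sizes, and the mean-pooled MLP encoder are illustrative stand-ins, not the authors' implementation; in particular, the paper's graph-based encoder and the explorer's self-supervised reward shaping are only noted in comments:

```python
import torch
import torch.nn as nn

class TaskEncoder(nn.Module):
    """Task inference: pool a context of transitions (s, a, r, s') into z.
    The paper uses a context-aware graph-based encoder; a mean-pooled MLP
    stands in for it here."""
    def __init__(self, transition_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(transition_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))

    def forward(self, context):               # context: (num_transitions, transition_dim)
        return self.net(context).mean(dim=0)  # permutation-invariant pooling -> z

class Actor(nn.Module):
    """Task fulfillment: a task-conditioned policy pi(a | s, z)."""
    def __init__(self, state_dim, latent_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, action_dim))

    def forward(self, state, z):
        return torch.tanh(self.net(torch.cat([state, z], dim=-1)))

# Task exploration: an explorer with the same structure as the actor, trained
# with a shaped reward that adds a bonus for transitions informative about z
# (the self-supervised shaping described in the abstract).
explorer = Actor(state_dim=8, latent_dim=5, action_dim=2)
```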
Deep reinforcement learning algorithms require large amounts of experience to learn an individual task. While in principle meta-reinforcement learning (meta-RL) algorithms enable agents to learn new skills from small amounts of experience, several major challenges preclude their practicality.
Context, the embedding of previously collected trajectories, is a powerful construct for Meta-Reinforcement Learning (Meta-RL) algorithms. By conditioning on an effective context, Meta-RL policies can easily generalize to new tasks within a few adaptation steps.
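To make the role of context concrete, here is a minimal test-time adaptation loop, assuming a Gymnasium-style environment and a pre-trained encoder and policy (all names are hypothetical). Each new transition is folded into the context, so the policy adapts within a few episodes without any gradient updates:

```python
import torch

def run_adaptation_episode(env, encoder, policy, context, latent_dim):
    """Roll out one episode while conditioning the policy on the running
    context; `context` is a list of flattened (s, a, r, s') tensors."""
    state, _ = env.reset()
    done = False
    while not done:
        # Embed everything collected so far; start from a zero embedding.
        z = encoder(torch.stack(context)) if context else torch.zeros(latent_dim)
        action = policy(torch.as_tensor(state, dtype=torch.float32), z)
        next_state, reward, terminated, truncated, _ = env.step(action.detach().numpy())
        done = terminated or truncated
        context.append(torch.as_tensor([*state, *action.tolist(), reward, *next_state],
                                       dtype=torch.float32))
        state = next_state
    return context
```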
Model-agnostic meta-learners aim to acquire meta-learned parameters from similar tasks to adapt to novel tasks from the same distribution with few gradient updates. With the flexibility in the choice of models, these frameworks demonstrate appealing performance on a variety of domains.
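A minimal sketch of the few-gradient-update adaptation these meta-learners rely on, in the style of a MAML inner loop; the model, loss, and support data are placeholders, and `create_graph=True` is what lets an outer loop differentiate through the adaptation:

```python
import torch

def adapt(model, loss_fn, support_x, support_y, inner_lr=0.01, steps=3):
    """Return task-adapted parameters after a few gradient updates,
    starting from the meta-learned parameters of `model`."""
    # Assumes `model` has no buffers; otherwise merge them into the dict.
    params = {name: p.clone() for name, p in model.named_parameters()}
    for _ in range(steps):
        preds = torch.func.functional_call(model, params, (support_x,))
        loss = loss_fn(preds, support_y)
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        params = {name: p - inner_lr * g
                  for (name, p), g in zip(params.items(), grads)}
    return params
```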
The goal of reinforcement learning algorithms is to estimate and/or optimise the value function. However, unlike supervised learning, no teacher or oracle is available to provide the true value function. Instead, the majority of reinforcement learning algorithms estimate and/or optimise a proxy for the value function: a sampled and bootstrapped approximation to the true value, known as the return.
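As a worked example of such a proxy, the sketch below computes the standard n-step bootstrapped return, where sampled rewards are combined with a value estimate at the truncation point (the numbers are illustrative):

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """G_t = r_t + gamma * r_{t+1} + ... + gamma^n * V(s_{t+n})."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three sampled rewards, bootstrapping from a value estimate V(s_{t+3}) = 0.5.
print(n_step_return([1.0, 0.0, 1.0], bootstrap_value=0.5))  # ≈ 2.465
```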
There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks.