ﻻ يوجد ملخص باللغة العربية
Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while being dramatically smaller and more efficient. Furthermore, the same method can be used to consolidate multiple task-specific policies into a single policy. We demonstrate these claims using the Atari domain and show that the multi-task distilled agent outperforms the single-task teachers as well as a jointly-trained DQN agent.
The transfer of knowledge from one policy to another is an important tool in Deep Reinforcement Learning. This process, referred to as distillation, has been used to great success, for example, by enhancing the optimisation of agents, leading to stro
Policy distillation, which transfers a teacher policy to a student policy has achieved great success in challenging tasks of deep reinforcement learning. This teacher-student framework requires a well-trained teacher model which is computationally ex
While deep reinforcement learning has achieved promising results in challenging decision-making tasks, the main bones of its success --- deep neural networks are mostly black-boxes. A feasible way to gain insight into a black-box model is to distill
Multiagent reinforcement learning algorithms (MARL) have been demonstrated on complex tasks that require the coordination of a team of multiple agents to complete. Existing works have focused on sharing information between agents via centralized crit
The control of nonlinear dynamical systems remains a major challenge for autonomous agents. Current trends in reinforcement learning (RL) focus on complex representations of dynamics and policies, which have yielded impressive results in solving a va