Despite the close connection between exploration and sample efficiency, most state-of-the-art reinforcement learning algorithms include no considerations for exploration beyond maximizing the entropy of the policy. In this work, we address this apparent missed opportunity. We observe that the most common formulation of directed exploration in deep RL, known as bonus-based exploration (BBE), suffers from bias and slow coverage in the few-sample regime. This causes BBE to be actively detrimental to policy learning in many control tasks. We show that by decoupling the task policy from the exploration policy, directed exploration can be highly effective for sample-efficient continuous control. Our method, Decoupled Exploration and Exploitation Policies (DEEP), can be combined with any off-policy RL algorithm without modification. When used in conjunction with soft actor-critic, DEEP incurs no performance penalty in dense-reward environments. In sparse-reward environments, DEEP gives a several-fold improvement in data efficiency due to better exploration.
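To make the decoupling concrete, below is a minimal sketch, not taken from the paper, of how an exploration policy and a task policy might share a single off-policy replay buffer. The `ChainEnv` toy environment, the tabular `QLearner`, and the count-based novelty bonus are illustrative assumptions standing in for the paper's actual setup, which pairs DEEP with soft actor-critic on continuous control.

```python
# Hypothetical sketch only: a toy chain environment, a count-based novelty
# bonus, and tabular Q-learners standing in for the off-policy algorithm
# (the paper pairs DEEP with soft actor-critic on continuous control).
import random
from collections import defaultdict

class ChainEnv:
    """Sparse-reward chain: reward 1 only at the rightmost state."""
    def __init__(self, length=20):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: 0 = left, 1 = right
        self.state = max(0, min(self.length - 1, self.state + (1 if action else -1)))
        reward = 1.0 if self.state == self.length - 1 else 0.0
        return self.state, reward, reward > 0.0

class QLearner:
    """Minimal tabular off-policy learner (stand-in for SAC)."""
    def __init__(self, n_actions=2, lr=0.5, gamma=0.99, eps=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.lr, self.gamma, self.eps, self.n_actions = lr, gamma, eps, n_actions

    def act(self, s):
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[s][a])

    def update(self, s, a, r, s2, done):
        target = r + (0.0 if done else self.gamma * max(self.q[s2]))
        self.q[s][a] += self.lr * (target - self.q[s][a])

env = ChainEnv()
explorer = QLearner(eps=0.3)      # exploration policy: trained on extrinsic reward + bonus
task_policy = QLearner(eps=0.0)   # task policy: trained on extrinsic reward only
replay, visit_counts = [], defaultdict(int)

for episode in range(200):
    s = env.reset()
    for t in range(100):
        a = explorer.act(s)                  # only the explorer interacts with the env
        s2, r, done = env.step(a)
        visit_counts[s2] += 1
        replay.append((s, a, r, s2, done))   # shared off-policy replay buffer
        s = s2
        if done:
            break
    # Both policies learn off-policy from the same buffer; the exploration
    # bonus never contaminates the task policy's reward signal.
    for (s0, a0, r0, s1, d0) in random.sample(replay, min(256, len(replay))):
        bonus = 1.0 / (visit_counts[s1] ** 0.5)
        explorer.update(s0, a0, r0 + bonus, s1, d0)
        task_policy.update(s0, a0, r0, s1, d0)
```

The key design choice the sketch illustrates is that the bonus-shaped reward is seen only by the exploration policy, while the task policy is trained purely on environment reward from the shared buffer, which is what allows directed exploration without biasing the learned task behavior.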
Model-based deep reinforcement learning has achieved success in various domains that require high sample efficiency, such as Go and robotics. However, some issues remain, such as planning efficient exploration to learn more accurate dynamics models.
The goal of meta-reinforcement learning (meta-RL) is to build agents that can quickly learn new tasks by leveraging prior experience on related tasks. Learning a new task often requires both exploring to gather task-relevant information and exploiting this information to solve the task.
Sparse-reward domains are challenging for reinforcement learning algorithms since significant exploration is needed before encountering reward for the first time. Hierarchical reinforcement learning can facilitate exploration by reducing the number of decisions necessary before obtaining a reward.
A fundamental issue in reinforcement learning algorithms is the balance between exploration of the environment and exploitation of information already obtained by the agent. In particular, exploration has played a critical role in both the efficiency and the effectiveness of learning.
Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward, as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity.