Goal-conditioned hierarchical reinforcement learning (HRL) is a successful approach to solving complex, temporally extended tasks. Recently, its success has been extended to more general settings by concurrently learning hierarchical policies and subgoal representations. However, online subgoal representation learning exacerbates the non-stationarity of HRL and poses challenges for exploration in high-level policy learning. In this paper, we propose a state-specific regularization that stabilizes subgoal embeddings in well-explored areas while allowing representation updates in less explored state regions. Building on this stable representation, we design novelty and potential measures for subgoals, and develop an efficient hierarchical exploration strategy that actively seeks out new promising subgoals and states. Experimental results show that our method significantly outperforms state-of-the-art baselines on continuous control tasks with sparse rewards, and further demonstrate that the stability and efficiency of our subgoal representation learning promote superior policy learning.
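As a rough illustration of two of the ingredients named above, the sketch below shows (i) a state-specific regularizer that penalizes drift of subgoal embeddings in proportion to how well-explored a state is, and (ii) a simple count-based novelty measure over the learned subgoal space. The encoder architecture, the log-count weighting, and the grid discretization used for counting are illustrative assumptions, not the exact definitions used in this work.

```python
import numpy as np
import torch
import torch.nn as nn

# Minimal sketch, assuming a small MLP encoder and count-based novelty;
# these are stand-ins for the paper's actual components.

class SubgoalEncoder(nn.Module):
    """Maps raw states to low-dimensional subgoal embeddings (hypothetical architecture)."""
    def __init__(self, state_dim: int, subgoal_dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, subgoal_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def state_specific_regularizer(encoder: SubgoalEncoder,
                               old_encoder: SubgoalEncoder,
                               states: torch.Tensor,
                               visit_counts: torch.Tensor) -> torch.Tensor:
    """Penalize embedding drift, weighted by how well-explored each state is.

    Well-visited states receive larger weights, so their embeddings stay stable,
    while rarely visited states remain free to move as the representation updates.
    The log-count weighting is an assumed choice for illustration.
    """
    with torch.no_grad():
        old_z = old_encoder(states)           # embeddings under the previous encoder
    new_z = encoder(states)
    weights = torch.log1p(visit_counts)       # larger for well-explored states
    drift = ((new_z - old_z) ** 2).sum(dim=-1)
    return (weights * drift).mean()

def subgoal_novelty(z: np.ndarray, counts: dict, cell_size: float = 0.5) -> float:
    """Count-based novelty of a subgoal embedding: 1/sqrt(N) over a discretized grid."""
    cell = tuple(np.floor(z / cell_size).astype(int))
    n = counts.get(cell, 0)
    counts[cell] = n + 1
    return 1.0 / np.sqrt(n + 1)
```

In such a setup, `old_encoder` would be a frozen copy of the encoder from the previous representation update, and the novelty score could be used as an intrinsic signal when the high-level policy selects subgoals; both roles here are assumptions for the sake of the sketch.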