Transfer learning (TL) is a promising way to improve the sample efficiency of reinforcement learning. However, efficient knowledge transfer across tasks with different state-action spaces has only been investigated at an early stage. Most previous studies address only the inconsistency between different state spaces by learning a common feature space, without considering that similar actions in the action spaces of related tasks share similar semantics. In this paper, we propose a method for learning action embeddings that leverages this idea, and a framework that learns both state embeddings and action embeddings to transfer policies across tasks with different state and action spaces. Our experimental results on various tasks show that the proposed method not only learns informative action embeddings but also accelerates policy learning.
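To make the idea concrete, below is a minimal sketch (my own illustration, not the paper's implementation) of a policy that acts through learned action embeddings: each discrete action gets a learnable embedding, the state encoder predicts a point in the same embedding space, and actions are scored by their similarity to that point, so semantically similar actions naturally receive similar probabilities. All names (`ActionEmbeddingPolicy`, `embed_dim`, layer sizes) are illustrative assumptions.

```python
# Minimal sketch of acting through learned action embeddings (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ActionEmbeddingPolicy(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, embed_dim: int = 16):
        super().__init__()
        # One learnable embedding per discrete action; semantically similar
        # actions can end up close together in this space during training.
        self.action_embed = nn.Embedding(num_actions, embed_dim)
        # State encoder that outputs a point ("proto-action") in the same space.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        proto = self.encoder(state)                    # (batch, embed_dim)
        # Score each discrete action by similarity to the proto-action.
        logits = proto @ self.action_embed.weight.t()  # (batch, num_actions)
        return F.log_softmax(logits, dim=-1)


if __name__ == "__main__":
    policy = ActionEmbeddingPolicy(state_dim=8, num_actions=5)
    log_probs = policy(torch.randn(4, 8))
    actions = torch.distributions.Categorical(logits=log_probs).sample()
    print(actions.shape)  # torch.Size([4])
```

Because the policy only interacts with actions through their embeddings, a pretrained embedding table (or one shared across related tasks) can in principle be reused when the concrete action set changes, which is the kind of transfer the abstract describes.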
Reinforcement learning (RL) in discrete action spaces is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension, making it challenging to apply existing on-policy gradient-based deep RL algorithms …
Deep reinforcement learning (DRL) algorithms have been demonstrated to be effective in a wide range of challenging decision-making and control tasks. However, these methods typically suffer from severe action oscillations, particularly in discrete action spaces …
In multi-agent reinforcement learning, the inherent non-stationarity of the environment caused by other agents' actions poses significant difficulties for an agent trying to learn a good policy independently. One way to deal with non-stationarity is agent modeling …
It has been arduous to assess the progress of policy learning algorithms in the domain of hierarchical tasks with high-dimensional action spaces due to the lack of a commonly accepted benchmark. In this work, we propose a new lightweight benchmark task …
A discrete-continuous hybrid action space is a natural setting in many practical problems, such as robot control and game AI. However, most previous Reinforcement Learning (RL) works only demonstrate success in controlling with either discrete or continuous action spaces …
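As an illustration of what such a hybrid (parameterized) action space looks like in practice, here is a small sketch, under my own assumptions, of a policy head that first samples a discrete action from a categorical distribution and then samples the continuous parameters attached to that action from a Gaussian. `HybridPolicy` and all dimensions are hypothetical and not taken from any of the papers above.

```python
# Minimal sketch of a discrete-continuous hybrid action policy head (illustrative only).
import torch
import torch.nn as nn


class HybridPolicy(nn.Module):
    def __init__(self, state_dim: int, num_discrete: int, param_dim: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.discrete_head = nn.Linear(64, num_discrete)  # logits over discrete actions
        self.param_mean = nn.Linear(64, num_discrete * param_dim)
        self.param_log_std = nn.Parameter(torch.zeros(num_discrete, param_dim))
        self.num_discrete, self.param_dim = num_discrete, param_dim

    def act(self, state: torch.Tensor):
        h = self.trunk(state)
        # Sample the discrete action from a categorical distribution.
        dist_d = torch.distributions.Categorical(logits=self.discrete_head(h))
        a_d = dist_d.sample()
        # Sample the continuous parameters attached to that discrete action.
        means = self.param_mean(h).view(-1, self.num_discrete, self.param_dim)
        mean = means[torch.arange(means.size(0)), a_d]
        dist_c = torch.distributions.Normal(mean, self.param_log_std[a_d].exp())
        a_c = dist_c.sample()
        return a_d, a_c


if __name__ == "__main__":
    policy = HybridPolicy(state_dim=10, num_discrete=3, param_dim=2)
    a_d, a_c = policy.act(torch.randn(5, 10))
    print(a_d.shape, a_c.shape)  # torch.Size([5]) torch.Size([5, 2])
```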