A critical and challenging problem in reinforcement learning is how to learn the state-action value function from the experience replay buffer while maintaining sample efficiency and fast convergence to a high-quality solution. In prior works, transitions are sampled uniformly at random from the replay buffer or sampled according to a priority measured by the temporal-difference (TD) error. However, these approaches do not fully account for the intrinsic characteristics of the transition distribution in the state space and can result in redundant, unnecessary TD updates, slowing down the convergence of the learning procedure. To overcome this problem, we propose a novel state distribution-aware sampling method that balances the replay frequency of transitions with a skewed distribution, taking into account both the occurrence frequencies of transitions and the uncertainty of the state-action values. Consequently, our approach reduces unnecessary TD updates and devotes more TD updates to state-action values with higher uncertainty, making experience replay more effective and efficient. Extensive experiments are conducted on both classic control tasks and Atari 2600 games on the OpenAI Gym platform, and the results demonstrate the effectiveness of our approach in comparison with the standard DQN approach.
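The abstract does not give the exact priority rule, so the following Python sketch is only one plausible reading of the idea: a transition's sampling weight grows with its absolute TD error (used here as a stand-in for value uncertainty) and shrinks with how many stored transitions fall into the same discretized state bucket, so over-represented regions of the state space are replayed less often. All names (`DistributionAwareReplayBuffer`, `state_key_fn`, and so on) are illustrative and not taken from the paper.

```python
from collections import defaultdict

import numpy as np


class DistributionAwareReplayBuffer:
    """Illustrative replay buffer: a transition's sampling weight grows with
    its |TD error| (uncertainty proxy) and shrinks with how often similar
    states occur in the buffer, so frequent states are replayed less."""

    def __init__(self, capacity, state_key_fn, eps=1e-2):
        self.capacity = capacity
        self.state_key_fn = state_key_fn      # maps a state to a discrete bucket
        self.eps = eps                        # keeps every priority strictly positive
        self.storage = []                     # (state, action, reward, next_state, done)
        self.td_errors = []                   # last |TD error| per stored transition
        self.visit_counts = defaultdict(int)  # stored transitions per state bucket

    def add(self, transition, td_error):
        self.visit_counts[self.state_key_fn(transition[0])] += 1
        if len(self.storage) >= self.capacity:
            old = self.storage.pop(0)
            self.td_errors.pop(0)
            self.visit_counts[self.state_key_fn(old[0])] -= 1
        self.storage.append(transition)
        self.td_errors.append(abs(td_error))

    def sample(self, batch_size):
        # Priority: uncertainty (|TD error|) divided by state-occurrence count,
        # so well-estimated, over-represented states get fewer redundant updates.
        counts = np.array([self.visit_counts[self.state_key_fn(t[0])]
                           for t in self.storage], dtype=np.float64)
        priorities = (np.array(self.td_errors) + self.eps) / counts
        probs = priorities / priorities.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=probs)
        return [self.storage[i] for i in idx], idx

    def update_td_errors(self, idx, new_errors):
        for i, e in zip(idx, new_errors):
            self.td_errors[i] = abs(e)
```

In a DQN training loop, `add` would be called with the TD error computed at insertion time and `update_td_errors` after each minibatch, mirroring how proportional prioritized replay keeps its priorities fresh.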
Thompson sampling is a well-known approach for balancing exploration and exploitation in reinforcement learning. It requires the posterior distribution of action-value functions to be maintained; this is generally intractable for tasks that have a high-dimensional […]
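For contrast with the value-function case this abstract describes, posterior maintenance is simple when the posterior is over a handful of arm means rather than a deep Q-network. The Bernoulli-bandit sketch below (Beta posteriors, with a hypothetical `pull` callback standing in for the environment) shows the sample-then-act loop that becomes intractable in high-dimensional settings.

```python
import numpy as np


def thompson_sampling(pull, n_arms, n_rounds, rng=np.random.default_rng(0)):
    """Thompson sampling for Bernoulli arms with Beta(1, 1) priors."""
    successes = np.ones(n_arms)   # Beta alpha parameters
    failures = np.ones(n_arms)    # Beta beta parameters
    total_reward = 0.0
    for _ in range(n_rounds):
        # Sample one plausible mean-reward vector from the posterior,
        # then act greedily with respect to that sample.
        theta = rng.beta(successes, failures)
        arm = int(np.argmax(theta))
        reward = pull(arm)        # environment returns 0 or 1
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward


# Example usage with hypothetical true arm probabilities:
# true_p = [0.2, 0.5, 0.7]
# pull = lambda a: float(np.random.random() < true_p[a])
# thompson_sampling(pull, n_arms=3, n_rounds=1000)
```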
The fundamental assumption of reinforcement learning in Markov decision processes (MDPs) is that the relevant decision process is, in fact, Markov. However, when MDPs have rich observations, agents typically learn by way of an abstract state representation […]
Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias […]
Deep Q-Learning is an important reinforcement learning algorithm, which involves training a deep neural network, called a Deep Q-Network (DQN), to approximate the well-known Q-function. Although wildly successful under laboratory conditions, serious gaps […]
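For reference, the Q-function approximation described here is typically trained by regressing the network's prediction onto a bootstrapped target. The PyTorch-style sketch below shows one such update step; the function name `dqn_update` and all tensor shapes are assumptions made for illustration rather than details from the paper.

```python
import torch
import torch.nn as nn


def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient step on the standard DQN regression loss.

    batch: tuple of tensors (states, actions, rewards, next_states, dones)
    with shapes [B, obs_dim], [B], [B], [B, obs_dim], [B].
    """
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) for the actions actually taken.
    q_values = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)

    # Bootstrapped target: r + gamma * max_a' Q_target(s', a') for non-terminal s'.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones.float()) * next_q

    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```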
Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNNs). Despite the success, deep RL algorithms are known to be sample-inefficient, often requiring many rounds of interaction […]