ﻻ يوجد ملخص باللغة العربية
This paper makes one step forward towards characterizing a new family of textit{model-free} Deep Reinforcement Learning (DRL) algorithms. The aim of these algorithms is to jointly learn an approximation of the state-value function ($V$), alongside an approximation of the state-action value function ($Q$). Our analysis starts with a thorough study of the Deep Quality-Value Learning (DQV) algorithm, a DRL algorithm which has been shown to outperform popular techniques such as Deep-Q-Learning (DQN) and Double-Deep-Q-Learning (DDQN) cite{sabatelli2018deep}. Intending to investigate why DQVs learning dynamics allow this algorithm to perform so well, we formulate a set of research questions which help us characterize a new family of DRL algorithms. Among our results, we present some specific cases in which DQVs performance can get harmed and introduce a novel textit{off-policy} DRL algorithm, called DQV-Max, which can outperform DQV. We then study the behavior of the $V$ and $Q$ functions that are learned by DQV and DQV-Max and show that both algorithms might perform so well on several DRL test-beds because they are less prone to suffer from the overestimation bias of the $Q$ function.
This paper introduces four new algorithms that can be used for tackling multi-agent reinforcement learning (MARL) problems occurring in cooperative settings. All algorithms are based on the Deep Quality-Value (DQV) family of algorithms, a set of tech
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods.
Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. But, before we overcome interference we must understand it better. In this work, we provide a definition of interference for contr
Deep Reinforcement Learning (DRL) methods have performed well in an increasing numbering of high-dimensional visual decision making domains. Among all such visual decision making problems, those with discrete action spaces often tend to have underlyi
Reinforcement Learning (RL) is a key technique to address sequential decision-making problems and is crucial to realize advanced artificial intelligence. Recent years have witnessed remarkable progress in RL by virtue of the fast development of deep