The temporal-difference methods TD($\lambda$) and Sarsa($\lambda$) form a core part of modern reinforcement learning. Their appeal comes from their good performance, low computational cost, and the simple interpretation given by their forward view. …
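For reference, below is a minimal sketch of the classic backward-view TD($\lambda$) update with accumulating eligibility traces and linear function approximation; the names (td_lambda_update, theta, phi) are illustrative, not drawn from the paper above.

    import numpy as np

    def td_lambda_update(theta, e, phi, phi_next, reward, alpha, gamma, lam):
        """One backward-view TD(lambda) step with accumulating traces.

        theta    : weight vector; the value estimate of state s is theta @ phi(s)
        e        : eligibility trace vector, same shape as theta
        phi      : feature vector of the current state
        phi_next : feature vector of the next state (zeros if terminal)
        """
        delta = reward + gamma * theta @ phi_next - theta @ phi  # TD error
        e = gamma * lam * e + phi                                # accumulating trace
        theta = theta + alpha * delta * e                        # trace-weighted update
        return theta, e

    # Example step: theta, e = td_lambda_update(theta, e, phi, phi_next, r, 0.1, 0.99, 0.9)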
Emphatic Temporal-Difference (ETD) learning has recently been proposed as a convergent off-policy learning method, mainly to address the convergence issues of conventional Temporal-Difference (TD) learning under off-policy training. …
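As a sketch of the core mechanism, the simplest emphatic variant, ETD(0), scales each off-policy TD update by an emphasis carried forward by a scalar followon trace (following Sutton, Mahmood and White, 2016); all names here are illustrative.

    import numpy as np

    def etd0_update(theta, F, phi, phi_next, reward, rho, rho_prev,
                    alpha, gamma, interest=1.0):
        """One emphatic TD(0) step under off-policy training.

        rho, rho_prev : importance-sampling ratios pi(a|s)/mu(a|s) for the
                        current and previous time steps
        F             : scalar followon trace carrying emphasis forward
        interest      : user-specified interest in the current state
        """
        F = rho_prev * gamma * F + interest                      # followon trace
        delta = reward + gamma * theta @ phi_next - theta @ phi  # TD error
        theta = theta + alpha * F * rho * delta * phi            # emphasized update
        return theta, F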
The true online TD($\lambda$) algorithm has recently been proposed (van Seijen and Sutton, 2014) as a universal replacement for the popular TD($\lambda$) algorithm in temporal-difference learning and reinforcement learning. …
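Below is a minimal sketch of one true online TD($\lambda$) step, using the dutch trace and the extra correction term that distinguish it from conventional TD($\lambda$) (following the update in van Seijen and Sutton, 2014); variable names are illustrative.

    import numpy as np

    def true_online_td_step(theta, e, v_old, phi, phi_next, reward,
                            alpha, gamma, lam):
        """One true online TD(lambda) step with linear function approximation.
        The caller feeds the returned v_next back in as v_old on the next step."""
        v = theta @ phi
        v_next = theta @ phi_next
        delta = reward + gamma * v_next - v
        # Dutch trace in place of the usual accumulating trace:
        e = gamma * lam * e + (1.0 - alpha * gamma * lam * (e @ phi)) * phi
        theta = theta + alpha * (delta + v - v_old) * e - alpha * (v - v_old) * phi
        return theta, e, v_next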
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy, which can yield near-optimal exploration strategies in tabular settings. However, in non-tabular settings that involve function …
Multi-step temporal-difference (TD) learning, where the update targets contain information from multiple time steps ahead, is one of the most popular forms of TD learning for linear function approximation. The reason is that multi-step methods often …
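To make the multi-step target concrete, here is a sketch of the n-step return that such methods bootstrap from; the names (n_step_return, v_boot) are illustrative.

    def n_step_return(rewards, v_boot, gamma):
        """Multi-step TD target: the discounted sum of the next n rewards
        plus a bootstrapped value estimate of the state reached n steps ahead.

        rewards : sequence [R_{t+1}, ..., R_{t+n}]
        v_boot  : current value estimate of S_{t+n} (use 0.0 if terminal)
        """
        G = v_boot
        for r in reversed(rewards):
            G = r + gamma * G
        return G

    # Tabular update toward the multi-step target:
    # V[s_t] += alpha * (n_step_return(rewards, V[s_t_plus_n], gamma) - V[s_t])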
This paper has been withdrawn by the author. The draft was withdrawn because of its poor English, produced when the author was just starting out in research. See the ICML version instead: http://icml2008.cs.helsinki.fi/papers/111.pdf