An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task


الملخص بالإنكليزية

Off-policy prediction -- learning the value function for one policy from data generated while following another policy -- is one of the most challenging subproblems in reinforcement learning. This paper presents empirical results with eleven prominent off-policy learning algorithms that use linear function approximation: five Gradient-TD methods, two Emphatic-TD methods, Off-policy TD($lambda$), Vtrace, a

تحميل البحث