Reproducibility in reinforcement learning is challenging: uncontrolled stochasticity from many sources, such as the learning algorithm, the learned policy, and the environment itself, has led researchers to report the performance of learned agents using aggregate metrics over multiple random seeds for a single environment. Unfortunately, there are still pernicious sources of variability in reinforcement learning agents that make common summary statistics an unsound measure of performance. Our experiments demonstrate the variability of common agents from the popular OpenAI Baselines repository. We make the case for reporting post-training agent performance as a distribution, rather than a point estimate.
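To make the contrast between a point estimate and a distribution concrete, here is a minimal sketch in Python. It is an illustration, not the authors' evaluation code: the function name `summarize_returns` and the numbers in `illustrative_returns` are hypothetical, chosen only to show how a mean can land on a value that no individual rollout comes near.

```python
import statistics


def summarize_returns(returns):
    """Summarize episode returns as a point estimate plus a distribution.

    `returns` is assumed to hold one return per post-training rollout
    (or per seed); this helper is hypothetical, not from the paper.
    """
    ordered = sorted(returns)
    # Quartile cut points using the standard library's default method.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "mean": statistics.mean(ordered),    # the usual point estimate
        "stdev": statistics.stdev(ordered),  # sample standard deviation
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


# Made-up returns for illustration only: half the rollouts score near 20,
# the other half near 200. These are NOT results from the paper.
illustrative_returns = [20.0, 21.0, 19.0, 20.0, 200.0, 198.0, 201.0, 199.0]

summary = summarize_returns(illustrative_returns)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

On this made-up bimodal sample the mean falls between the two clusters, while the quartiles and the min/max range show that rollouts are either near 20 or near 200. A histogram or empirical CDF of the same returns would convey the same information graphically.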
It is a widely accepted principle that software without tests has bugs. Testing reinforcement learning agents is especially difficult because of the stochastic nature of both agents and environments, the complexity of state-of-the-art models, and the
Every living organism struggles against disruptive environmental forces to carve out and maintain an orderly niche. We propose that such a struggle to achieve and preserve order might offer a principle for the emergence of useful behaviors in artific
We propose a new approach to visualize saliency maps for deep neural network models and apply it to deep reinforcement learning agents trained on Atari environments. Our method adds an attention module that we call FLS (Free Lunch Saliency) to the fe
We revisit residual algorithms in both model-free and model-based reinforcement learning settings. We propose the bidirectional target network technique to stabilize residual algorithms, yielding a residual version of DDPG that significantly outperfo
Reinforcement Learning (RL) is a key technique to address sequential decision-making problems and is crucial to realize advanced artificial intelligence. Recent years have witnessed remarkable progress in RL by virtue of the fast development of deep