We introduce a new virtual environment for simulating the card game Big 2. This is a four-player game of imperfect information with a relatively complicated action space: players may play combinations of 1, 2, 3, 4, or 5 cards from a starting hand of 13 cards. As such, it poses a challenge for many current reinforcement learning methods. We then use the recently proposed Proximal Policy Optimization algorithm to train a deep neural network to play the game, learning purely via self-play, and find that it reaches a level which outperforms amateur human players after only a relatively short amount of training time, without needing to search a tree of future game states.
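As a rough illustration of the approach described above (not the paper's actual code), the sketch below shows the two ingredients the abstract mentions: a self-play rollout in which all four seats are controlled by the same policy, and PPO's clipped surrogate objective from Schulman et al. (2017). BigTwoEnv, its reset/step/current_player/legal_actions interface, and the policy object are hypothetical stand-ins for the paper's environment and network.

import numpy as np

def ppo_clip_loss(ratios, advantages, clip_eps=0.2):
    # Clipped surrogate objective of Proximal Policy Optimization.
    # ratios:     pi_new(a|s) / pi_old(a|s) for each sampled action
    # advantages: advantage estimates for the same state-action pairs
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximises the expected minimum of the two terms; the loss is its negative.
    return -np.mean(np.minimum(unclipped, clipped))

def self_play_episode(env, policy):
    # Collect one episode of Big 2 in which all four players share `policy`.
    # Returns one trajectory per seat; attributing the per-step reward to the
    # acting player is a simplification, since rewards in Big 2 arrive mainly
    # at the end of the game.
    trajectories = {seat: [] for seat in range(4)}
    obs = env.reset()                    # observation for the player to move
    done = False
    while not done:
        seat = env.current_player        # whose turn it is (0..3)
        legal = env.legal_actions()      # 1-5 card combinations, or a pass
        action = policy.act(obs, legal)  # sample from the shared network
        next_obs, reward, done, _ = env.step(action)
        trajectories[seat].append((obs, action, reward))
        obs = next_obs
    return trajectories

Because every seat is played by the same network, each episode contributes four trajectories to the same PPO update, which is what lets the agent improve purely through self-play.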
Competitive Self-Play (CSP) based Multi-Agent Reinforcement Learning (MARL) has achieved phenomenal breakthroughs recently. Strong AIs have been developed for several benchmarks, including Dota 2, Glory of Kings, Quake III, and StarCraft II. Despite
Reproducibility in reinforcement learning is challenging: uncontrolled stochasticity from many sources, such as the learning algorithm, the learned policy, and the environment itself, has led researchers to report the performance of learned agents us
Curriculum reinforcement learning (CRL) improves the learning speed and stability of an agent by exposing it to a tailored series of tasks throughout learning. Despite empirical successes, an open question in CRL is how to automatically generate a curriculum
Using privileged information during training can improve the sample efficiency and performance of machine learning systems. This paradigm has been applied to reinforcement learning (RL), primarily in the form of distillation or auxiliary tasks, and l
Reinforcement learning agents that operate in diverse and complex environments can benefit from the structured decomposition of their behavior. Often, this is addressed in the context of hierarchical reinforcement learning, where the aim is to decomp