An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective

104 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yaodong Yang Mr.

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yaodong Yang - Jun Wang

أنظمة متعددة العملاء الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Following the remarkable success of the AlphaGO series, 2019 was a booming year that witnessed significant advances in multi-agent reinforcement learning (MARL) techniques. MARL corresponds to the learning problem in a multi-agent system in which multiple agents learn simultaneously. It is an interdisciplinary domain with a long history that includes game theory, machine learning, stochastic control, psychology, and optimisation. Although MARL has achieved considerable empirical success in solving real-world games, there is a lack of a self-contained overview in the literature that elaborates the game theoretical foundations of modern MARL methods and summarises the recent advances. In fact, the majority of existing surveys are outdated and do not fully cover the recent developments since 2010. In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier. The goal of our monograph is to provide a self-contained assessment of the current state-of-the-art MARL techniques from a game theoretical perspective. We expect this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing domain and existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.

قيم البحث

193 - Christian A. Schroeder de Witt , Jakob N. Foerster , Gregory Farquhar 2018

Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can reconstruct parts of each others observations. Since agents an independently agree on their common knowledge, they can execute complex coordinated policies that condition on this knowledge in a fully decentralised fashion. We propose multi-agent common knowledge reinforcement learning (MACKRL), a novel stochastic actor-critic algorithm that learns a hierarchical policy tree. Higher levels in the hierarchy coordinate groups of agents by conditioning on their common knowledge, or delegate to lower levels with smaller subgroups but potentially richer common knowledge. The entire policy tree can be executed in a fully decentralised fashion. As the lowest policy tree level consists of independent policies for each agent, MACKRL reduces to independently learnt decentralised policies as a special case. We demonstrate that our method can exploit common knowledge for superior performance on complex decentralised coordination tasks, including a stochastic matrix game and challenging problems in StarCraft II unit micromanagement.

أنظمة متعددة العملاء الذكاء الاصطناعي علوم الكمبيوتر ونظرية الألعاب

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

90 - Joel Z. Leibo , Vinicius Zambaldi , Marc Lanctot 2017

Matrix games like Prisoners Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Coope rativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: 1. a fruit Gathering game and 2. a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real world social dilemmas affects cooperation.

أنظمة متعددة العملاء الذكاء الاصطناعي علوم الكمبيوتر ونظرية الألعاب

Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot

122 - Joel Z. Leibo , Edgar Due~nez-Guzman , Alexander Sasha Vezhnevets 2021

Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks). Our contribution, Melting Pot, is a MARL evaluation suite th at fills this gap, and uses reinforcement learning to reduce the human labor required to create novel test scenarios. This works because one agents behavior constitutes (part of) another agents environment. To demonstrate scalability, we have created over 80 unique test scenarios covering a broad range of research topics such as social dilemmas, reciprocity, resource sharing, and task partitioning. We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.

أنظمة متعددة العملاء الذكاء الاصطناعي

Signal Instructed Coordination in Cooperative Multi-agent Reinforcement Learning

517 - Liheng Chen , Hongyi Guo , Yali Du 2019

In many real-world problems, a team of agents need to collaborate to maximize the common reward. Although existing works formulate this problem into a centralized learning with decentralized execution framework, which avoids the non-stationary proble m in training, their decentralized execution paradigm limits the agents capability to coordinate. Inspired by the concept of correlated equilibrium, we propose to introduce a coordination signal to address this limitation, and theoretically show that following mild conditions, decentralized agents with the coordination signal can coordinate their individual policies as manipulated by a centralized controller. The idea of introducing coordination signal is to encapsulate coordinated strategies into the signals, and use the signals to instruct the collaboration in decentralized execution. To encourage agents to learn to exploit the coordination signal, we propose Signal Instructed Coordination (SIC), a novel coordination module that can be integrated with most existing MARL frameworks. SIC casts a common signal sampled from a pre-defined distribution to all agents, and introduces an information-theoretic regularization to facilitate the consistency between the observed signal and agents policies. Our experiments show that SIC consistently improves performance over well-recognized MARL models in both matrix games and a predator-prey game with high-dimensional strategy space.

أنظمة متعددة العملاء الذكاء الاصطناعي

Hierarchical Reinforcement Learning for Multi-agent MOBA Game

67 - Zhijian Zhang , Haozheng Li , Luo Zhang 2019

Real Time Strategy (RTS) games require macro strategies as well as micro strategies to obtain satisfactory performance since it has large state space, action space, and hidden information. This paper presents a novel hierarchical reinforcement learni ng model for mastering Multiplayer Online Battle Arena (MOBA) games, a sub-genre of RTS games. The novelty of this work are: (1) proposing a hierarchical framework, where agents execute macro strategies by imitation learning and carry out micromanipulations through reinforcement learning, (2) developing a simple self-learning method to get better sample efficiency for training, and (3) designing a dense reward function for multi-agent cooperation in the absence of game engine or Application Programming Interface (API). Finally, various experiments have been performed to validate the superior performance of the proposed method over other state-of-the-art reinforcement learning algorithms. Agent successfully learns to combat and defeat bronze-level built-in AI with 100% win rate, and experiments show that our method can create a competitive multi-agent for a kind of mobile MOBA game {it King of Glory} in 5v5 mode.

التعلم الآلي الذكاء الاصطناعي