Navigating the Landscape of Multiplayer Games

Posted by Shayegan Omidshafiei

Publication date: 2020

Research field: Information Engineering

Paper language: English

Authors: Shayegan Omidshafiei, Karl Tuyls, Wojciech M. Czarnecki

Subjects: Artificial Intelligence; Multiagent Systems

Abstract

Multiplayer games have long been used as testbeds in artificial intelligence research, aptly referred to as the Drosophila of artificial intelligence. Traditionally, researchers have focused on using well-known games to build strong agents. This progress, however, can be better informed by characterizing games and their topological landscape. Tackling this latter question can facilitate understanding of agents and help determine what game an agent should target next as part of its training. Here, we show how network measures applied to response graphs of large-scale games enable the creation of a landscape of games, quantifying relationships between games of varying sizes and characteristics. We illustrate our findings in domains ranging from canonical games to complex empirical games capturing the performance of trained agents pitted against one another. Our results culminate in a demonstration leveraging this information to generate new and interesting games, including mixtures of empirical games synthesized from real-world games.
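
The abstract leaves the construction of a response graph implicit. The sketch below assumes one common construction for a two-player normal-form game: nodes are pure strategy profiles, and an edge points from a profile to any profile reachable through one player's strictly improving unilateral deviation. As an example of a network measure, it finds the sink strongly connected components, i.e. sets of profiles that better-response dynamics can enter but never leave. The function names and the choice of measure are illustrative assumptions, not the authors' exact pipeline.

```python
from itertools import product

def response_graph(payoffs):
    """Better-response graph of a two-player normal-form game (illustrative).

    payoffs[p][i][j] is player p's payoff when player 0 plays i and player 1
    plays j. Nodes are pure strategy profiles (i, j); an edge s -> t means t
    differs from s in one player's strategy and that player strictly gains.
    """
    n0, n1 = len(payoffs[0]), len(payoffs[0][0])
    edges = {s: [] for s in product(range(n0), range(n1))}
    for i, j in edges:
        for k in range(n0):  # unilateral deviations by player 0
            if payoffs[0][k][j] > payoffs[0][i][j]:
                edges[(i, j)].append((k, j))
        for k in range(n1):  # unilateral deviations by player 1
            if payoffs[1][i][k] > payoffs[1][i][j]:
                edges[(i, j)].append((i, k))
    return edges

def strongly_connected_components(edges):
    """Kosaraju's algorithm; returns a list of sets of nodes."""
    order, seen = [], set()

    def visit(u):  # recursive DFS is fine for the small graphs used here
        seen.add(u)
        for v in edges[u]:
            if v not in seen:
                visit(v)
        order.append(u)  # record u once all its descendants are finished

    for u in edges:
        if u not in seen:
            visit(u)
    reverse = {u: [] for u in edges}
    for u, targets in edges.items():
        for v in targets:
            reverse[v].append(u)
    components, assigned = [], set()
    for u in reversed(order):  # decreasing finish time
        if u in assigned:
            continue
        component, stack = set(), [u]
        assigned.add(u)
        while stack:  # collect unassigned nodes that can reach u
            x = stack.pop()
            component.add(x)
            for y in reverse[x]:
                if y not in assigned:
                    assigned.add(y)
                    stack.append(y)
        components.append(component)
    return components

def sink_components(edges):
    """Components with no edge leaving them: once entered, never left."""
    return [c for c in strongly_connected_components(edges)
            if all(v in c for u in c for v in edges[u])]
```

For rock-paper-scissors, all nine profiles form a single cyclic sink component. A 2x2 coordination game instead has two singleton sinks, one per pure Nash equilibrium. This is the kind of structural difference a graph-based landscape of games can quantify.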

Related papers

Zhengxing Chen, Truong-Huy D Nguyen, Yuyu Xu (2018)

Multiplayer Online Battle Arena (MOBA) games have become increasingly popular in recent years. In a match of such a game, players compete in two teams of five, each player controlling an in-game avatar, known as a hero, selected from a roster of more than 100. The selection of heroes, also known as the pick or draft, takes place before the match starts and alternates between the two teams until each player has selected one hero. Heroes are designed with different strengths and weaknesses to promote team cooperation. Intuitively, heroes in a strong team should complement each other's strengths and suppress those of their opponents. Hero drafting is therefore a challenging problem because of the complex hero-to-hero relationships involved. In this paper, we propose a novel hero recommendation system that suggests heroes to add to an existing team while maximizing the team's prospect of victory. To that end, we model the drafting between two teams as a combinatorial game and use Monte Carlo Tree Search (MCTS) to estimate the values of hero combinations. Our empirical evaluation shows that teams drafted by our recommendation algorithm have a significantly higher win rate against teams constructed by other baseline and state-of-the-art strategies.

Subjects: Artificial Intelligence; Human-Computer Interaction; Social and Information Networks
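
The hero-drafting abstract uses MCTS to estimate the value of hero combinations. At every tree node, UCT-style MCTS picks the child that maximizes the UCB1 score, mean reward + c * sqrt(ln N / n). The sketch below shows only that selection rule, applied at the root as a bandit over candidate heroes, so it is not the authors' system. `recommend_hero`, the `simulate` callback (which would randomly complete the rest of the draft and score the outcome), and the constant c = sqrt(2) are hypothetical choices.

```python
import math
import random

def ucb1_pick(stats, total_pulls, c=math.sqrt(2)):
    """Pick the arm with the highest UCB1 score: mean + c * sqrt(ln N / n)."""
    for arm, (pulls, _) in stats.items():
        if pulls == 0:
            return arm  # every arm is tried once before the formula is used
    return max(
        stats,
        key=lambda a: stats[a][1] / stats[a][0]
        + c * math.sqrt(math.log(total_pulls) / stats[a][0]),
    )

def recommend_hero(candidates, simulate, iterations=2000, seed=0):
    """Root-only UCB1 search over candidate heroes (hypothetical helper).

    simulate(hero, rng) must return 1 for a simulated win and 0 for a loss,
    e.g. by randomly completing the rest of the draft and scoring it.
    """
    rng = random.Random(seed)
    stats = {hero: (0, 0.0) for hero in candidates}  # hero -> (visits, total reward)
    for t in range(iterations):
        hero = ucb1_pick(stats, t)  # t = number of simulations run so far
        visits, reward = stats[hero]
        stats[hero] = (visits + 1, reward + simulate(hero, rng))
    # Recommend the most-visited candidate, a common MCTS final-move rule.
    return max(stats, key=lambda hero: stats[hero][0])
```

For example, with three hypothetical heroes whose simulated win probabilities are 0.1, 0.9, and 0.4, most simulations concentrate on the strongest candidate. In a full MCTS, the same rule would be applied again at every level of the draft tree.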

Multiplayer AlphaZero

Nick Petosa, Tucker Balch (2019)

The AlphaZero algorithm has achieved superhuman performance in two-player, deterministic, zero-sum games where perfect information of the game state is available. This success has been demonstrated in Chess, Shogi, and Go where learning occurs solely through self-play. Many real-world applications (e.g., equity trading) require the consideration of a multiplayer environment. In this work, we suggest novel modifications of the AlphaZero algorithm to support multiplayer environments, and evaluate the approach in two simple 3-player games. Our experiments show that multiplayer AlphaZero learns successfully and consistently outperforms a competing approach: Monte Carlo tree search. These results suggest that our modified AlphaZero can learn effective strategies in multiplayer game scenarios. Our work supports the use of AlphaZero in multiplayer games and suggests future research for more complex environments.

Subjects: Artificial Intelligence
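
The abstract does not detail its modifications to AlphaZero. One standard way to generalize two-player search to n players, assumed here for illustration, is the max^n rule: back up a vector of values (one per player) and let the player to move pick the child that is best for itself. The sketch applies that rule to an explicit toy game tree. In an AlphaZero-style search, the vector would presumably come from a value network and visit statistics instead. Whether this matches the authors' exact changes is an assumption.

```python
def maxn(node, to_move, num_players):
    """Max^n backup over an explicit game tree (illustrative).

    A leaf is a tuple of payoffs, one entry per player; an internal node is a
    list of children. Players move in turn order, and the player to move picks
    the child whose backed-up vector is best for that player.
    """
    if isinstance(node, tuple):
        return node
    next_player = (to_move + 1) % num_players
    values = [maxn(child, next_player, num_players) for child in node]
    return max(values, key=lambda v: v[to_move])
```

With two players and payoffs of the form (v, -v), the same function reduces to ordinary minimax.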

Improving Policies via Search in Cooperative Partially Observable Games

Adam Lerer, Hengyuan Hu, Jakob Foerster (2019)

Recent superhuman results in games have largely been achieved in a variety of zero-sum settings, such as Go and Poker, in which agents need to compete against others. However, just like humans, real-world AI systems have to coordinate and communicate with other agents in cooperative partially observable environments as well. These settings commonly require participants both to interpret the actions of others and to act in a way that is informative when interpreted. Those abilities are typically summarized as theory of mind and are seen as crucial for social interactions. In this paper we propose two different search techniques that can be applied to improve an arbitrary agreed-upon policy in a cooperative partially observable game. The first one, single-agent search, effectively converts the problem into a single-agent setting by making all but one of the agents play according to the agreed-upon policy. In contrast, in multi-agent search all agents carry out the same common-knowledge search procedure whenever doing so is computationally feasible, and fall back to playing according to the agreed-upon policy otherwise. We prove that these search procedures are theoretically guaranteed to at least maintain the original performance of the agreed-upon policy (up to a bounded approximation error). In the benchmark challenge problem of Hanabi, our search technique greatly improves the performance of every agent we tested and, when applied to a policy trained using RL, achieves a new state-of-the-art score of 24.61 / 25 in the game, compared to a previous best of 24.08 / 25.

Subjects: Artificial Intelligence; Multiagent Systems
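
In single-agent search, as described in the abstract above, every other agent follows the agreed-upon (blueprint) policy, and one agent chooses its action by evaluating each option under its belief over hidden states. The following is a minimal one-step sketch. It assumes exact expected values are available through a hypothetical `rollout_value(state, action)` callback; in practice these would be estimated, for example by Monte Carlo rollouts of the blueprint. It also uses a hypothetical `threshold` below which the agent keeps the blueprint action.

```python
def single_agent_search(belief, actions, rollout_value, blueprint_action, threshold=0.0):
    """One-step single-agent search against a fixed blueprint (illustrative).

    belief: dict mapping each hidden state to its probability.
    rollout_value(state, action): expected return if this agent plays `action`
    now and every agent, this one included, follows the blueprint afterwards.
    """
    def expected_value(action):
        return sum(p * rollout_value(state, action) for state, p in belief.items())

    best = max(actions, key=expected_value)
    # Deviate only if the estimated gain over the blueprint clears the threshold.
    if expected_value(best) - expected_value(blueprint_action) > threshold:
        return best
    return blueprint_action
```

With exact values, the chosen action is never rated below the blueprint action. In this one-step setting, that echoes the abstract's claim that search at least maintains the blueprint's performance. The threshold only matters when the values are noisy estimates.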

Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games

Xidong Feng, Oliver Slumbers, Yaodong Yang (2021)

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of who to compete with (i.e., the opponent mixture) and how to beat them (i.e., finding best responses) are underpinned by manually developed game-theoretic principles such as fictitious play and Double Oracle. In this paper we introduce a framework, LMAC, based on meta-gradient descent that automates the discovery of the update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the best-response module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve performance competitive with, or even better than, the state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that LMAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work points to a promising future direction: discovering general MARL algorithms solely from data.

Subjects: Artificial Intelligence; Multiagent Systems
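
The LMAC abstract contrasts learned update rules with hand-designed ones such as fictitious play. In fictitious play, "who to compete with" is the empirical average of the opponent's past best responses, and "how to beat them" is an exact best response. The sketch below is plain fictitious play on a two-player zero-sum matrix game in which the row player maximizes. It is not LMAC or PSRO. It measures exploitability as the sum of both players' best-response gains (often called NashConv), which is zero exactly at a Nash equilibrium. Function names are illustrative.

```python
def row_values(A, y):
    """Expected payoff of each row strategy against column mixture y."""
    return [sum(a * q for a, q in zip(row, y)) for row in A]

def col_values(A, x):
    """Expected row-player payoff of each column strategy against row mixture x."""
    return [sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def exploitability(A, x, y):
    """Sum of both players' best-response gains (NashConv); zero exactly at a Nash equilibrium."""
    return max(row_values(A, y)) - min(col_values(A, x))

def normalise(counts):
    total = sum(counts)
    return [c / total for c in counts]

def fictitious_play(A, iterations):
    """Each player best-responds to the opponent's empirical average strategy."""
    row_counts = [0] * len(A)
    col_counts = [0] * len(A[0])
    row_counts[0] = col_counts[0] = 1  # arbitrary initial pure strategies
    for _ in range(iterations):
        x, y = normalise(row_counts), normalise(col_counts)
        rv, cv = row_values(A, y), col_values(A, x)
        row_counts[rv.index(max(rv))] += 1  # row player maximises
        col_counts[cv.index(min(cv))] += 1  # column player minimises row's payoff
    return normalise(row_counts), normalise(col_counts)
```

On rock-paper-scissors, the empirical averages approach the uniform equilibrium, so exploitability shrinks as iterations grow. Fictitious play is nevertheless known to converge slowly in general, which is part of the motivation for better population-update rules.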

Multiplayer parallel repetition for expander games

Irit Dinur, Prahladh Harsha, Rakesh Venkat (2016)

We investigate the value of parallel repetition of one-round games with any number of players $k \ge 2$. It has been an open question whether an analogue of Raz's Parallel Repetition Theorem holds for games with more than two players, i.e., whether the value of the repeated game decays exponentially with the number of repetitions. Verbitsky has shown, via a reduction to the density Hales-Jewett theorem, that the value of the repeated game must approach zero as the number of repetitions increases. However, the rate of decay obtained in this way is extremely slow, and it is an open question whether the true rate is exponential, as is the case for all two-player games. Exponential decay bounds are known for several special cases of multiplayer games, e.g., free games and anchored games. In this work, we identify a certain expansion property of the base game and show that all games with this property satisfy an exponential decay parallel repetition bound. Free games and anchored games satisfy this expansion property, and thus our parallel repetition theorem reproduces all earlier exponential decay bounds for multiplayer games. More generally, our parallel repetition bound applies to all multiplayer games that are connected in a certain sense. We also describe a very simple game, called the GHZ game, that does not satisfy this connectivity property, and for which we do not know an exponential decay bound. We suspect that progress on bounding the value of the parallel repetition of the GHZ game will lead to further progress on the general question.

Subjects: Computational Complexity
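
The parallel-repetition abstract names the GHZ game without defining it. This sketch assumes the standard definition, which is not taken from the abstract. Three players receive bits r, s, t drawn uniformly from the triples with r ^ s ^ t == 0. They answer bits a, b, c without communicating and win iff a ^ b ^ c == (r or s or t). The code brute-forces the best deterministic classical strategy for a single copy of the game. Shared randomness cannot beat the best deterministic strategy, so the result is the classical value of one round. The code does not attempt the repeated game, whose strategy space grows far too quickly for brute force.

```python
from fractions import Fraction
from itertools import product

# Standard GHZ game (assumed definition): questions (r, s, t) are drawn
# uniformly from the triples with r ^ s ^ t == 0.
QUESTIONS = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def wins(r, s, t, a, b, c):
    """The players win iff the XOR of their answers equals r OR s OR t."""
    return (a ^ b ^ c) == (r | s | t)

def classical_value():
    """Best winning probability over deterministic classical strategies."""
    # A deterministic strategy maps the input bit to an answer: (f(0), f(1)).
    strategies = list(product((0, 1), repeat=2))
    best = Fraction(0)
    for f, g, h in product(strategies, repeat=3):
        won = sum(wins(r, s, t, f[r], g[s], h[t]) for r, s, t in QUESTIONS)
        best = max(best, Fraction(won, len(QUESTIONS)))
    return best
```

`classical_value()` evaluates to `Fraction(3, 4)`. Every deterministic strategy must fail on at least one of the four questions. XOR-ing the four winning conditions cancels every answer bit, because each appears twice, and leaves 0 = 1.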