Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent


Published by Ian Gemp

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Ian Gemp - Rahul Savani - Marc Lanctot

Computer Science and Game Theory


Nash equilibrium is a central concept in game theory. Several Nash solvers exist, yet none scale to normal-form games with many actions and many players, especially those with payoff tensors too big to be stored in memory. In this work, we propose an approach that iteratively improves an approximation to a Nash equilibrium through joint play. It accomplishes this by tracing a previously established homotopy which connects instances of the game defined with decaying levels of entropy regularization. To encourage iterates to remain near this path, we efficiently minimize average deviation incentive via stochastic gradient descent, intelligently sampling entries in the payoff tensor as needed. This process can also be viewed as constructing and reacting to a polymatrix approximation to the game. In these ways, our proposed approach, average deviation incentive descent with adaptive sampling (ADIDAS), is most similar to three classical approaches, namely homotopy-type, Lyapunov, and iterative polymatrix solvers. We demonstrate through experiments the ability of this approach to approximate a Nash equilibrium in normal-form games with as many as seven players and twenty-one actions (over one trillion outcomes) that are orders of magnitude larger than those possible with prior algorithms.
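To make the quantity being minimized more concrete, here is a minimal sketch of an average deviation incentive for a two-player game. It is a simplification under stated assumptions, not the ADIDAS procedure: it uses no entropy regularization, reads the full payoff matrices instead of sampling them, and handles only two players. The function name and arguments are hypothetical.

```python
def average_deviation_incentive(A, B, x, y):
    """Mean over both players of the best gain from a unilateral deviation.

    A[i][j] and B[i][j] are the row and column player's payoffs when row
    plays action i and column plays action j; x and y are mixed strategies.
    Assumption: unregularized, exact evaluation (no sampling).
    """
    n_rows, n_cols = len(x), len(y)
    # Row player's expected payoff for each pure action against y.
    row_values = [sum(y[j] * A[i][j] for j in range(n_cols)) for i in range(n_rows)]
    # Column player's expected payoff for each pure action against x.
    col_values = [sum(x[i] * B[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Gain from switching to a best response, relative to the current strategy.
    row_gain = max(row_values) - sum(x[i] * row_values[i] for i in range(n_rows))
    col_gain = max(col_values) - sum(y[j] * col_values[j] for j in range(n_cols))
    return (row_gain + col_gain) / 2.0


# Matching pennies: the uniform profile is the Nash equilibrium.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
adi_uniform = average_deviation_incentive(A, B, [0.5, 0.5], [0.5, 0.5])
adi_pure = average_deviation_incentive(A, B, [1.0, 0.0], [1.0, 0.0])
```

In this sketch the value is zero exactly at a Nash equilibrium and positive elsewhere, which is what makes it usable as a loss. At the pure profile above only the column player can gain by deviating, so the average is positive.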


Edward Hughes, Thomas W. Anthony, Tom Eccles, 2020

Zero-sum games have long guided artificial intelligence research, since they possess both a rich strategy space of best-responses and a clear evaluation metric. What's more, competition is a vital mechanism in many real-world multi-agent systems capable of generating intelligent innovations: Darwinian evolution, the market economy and the AlphaZero algorithm, to name a few. In two-player zero-sum games, the challenge is usually viewed as finding Nash equilibrium strategies, safeguarding against exploitation regardless of the opponent. While this captures the intricacies of chess or Go, it avoids the notion of cooperation with co-players, a hallmark of the major transitions leading from unicellular organisms to human civilization. Beyond two players, alliance formation often confers an advantage; however, this requires trust, namely the promise of mutual cooperation in the face of incentives to defect. Successful play therefore requires adaptation to co-players rather than the pursuit of non-exploitability. Here we argue that a systematic study of many-player zero-sum games is a crucial element of artificial intelligence research. Using symmetric zero-sum matrix games, we demonstrate formally that alliance formation may be seen as a social dilemma, and empirically that naive multi-agent reinforcement learning therefore fails to form alliances. We introduce a toy model of economic competition, and show how reinforcement learning may be augmented with a peer-to-peer contract mechanism to discover and enforce alliances. Finally, we generalize our agent model to incorporate temporally-extended contracts, presenting opportunities for further work.

Computer Science and Game Theory, Machine Learning, Multiagent Systems

Smoothed Complexity of 2-player Nash Equilibria

Shant Boodaghians, Joshua Brakensiek, Samuel B. Hopkins and Aviad Rubinstein, 2020

We prove that computing a Nash equilibrium of a two-player ($n \times n$) game with payoffs in $[-1,1]$ is PPAD-hard (under randomized reductions) even in the smoothed analysis setting, smoothing with noise of constant magnitude. This gives a strong negative answer to conjectures of Spielman and Teng [ST06] and Cheng, Deng, and Teng [CDT09]. In contrast to prior work proving PPAD-hardness after smoothing by noise of magnitude $1/\operatorname{poly}(n)$ [CDT09], our smoothed complexity result is not proved via hardness of approximation for Nash equilibria. This is by necessity, since Nash equilibria can be approximated to constant error in quasi-polynomial time [LMM03]. Our results therefore separate smoothed complexity and hardness of approximation for Nash equilibria in two-player games. The key ingredient in our reduction is the use of a random zero-sum game as a gadget to produce two-player games which remain hard even after smoothing. Our analysis crucially shows that all Nash equilibria of random zero-sum games are far from pure (with high probability), and that this remains true even after smoothing.

Computer Science and Game Theory, Computational Complexity

Logarithmic Query Complexity for Approximate Nash Computation in Large Games

Paul W. Goldberg, Francisco J. Marmolejo-Cossio, Zhiwei Steven Wu, 2016

We investigate the problem of equilibrium computation for large $n$-player games. Large games have a Lipschitz-type property that no single player's utility is greatly affected by any other individual player's actions. In this paper, we mostly focus on the case where any change of strategy by a player causes other players' payoffs to change by at most $\frac{1}{n}$. We study algorithms having query access to the game's payoff function, aiming to find $\epsilon$-Nash equilibria. We seek algorithms that obtain $\epsilon$ as small as possible, in time polynomial in $n$. Our main result is a randomised algorithm that achieves $\epsilon$ approaching $\frac{1}{8}$ for 2-strategy games in a completely uncoupled setting, where each player observes her own payoff to a query, and adjusts her behaviour independently of other players' payoffs/actions. $O(\log n)$ rounds/queries are required. We also show how to obtain a slight improvement over $\frac{1}{8}$, by introducing a small amount of communication between the players. Finally, we give extensions of our results to large games with more than two strategies per player, and alternative largeness parameters.
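The two conditions named in this abstract can be written out as follows. This is a hedged formalization from the wording above; the symbols $u_i$ (player $i$'s payoff), $a_j$ (an action of player $j$), $a_{-j}$ (the other players' actions) and $x$ (a mixed-strategy profile) are notation assumed here, not taken from the paper.

```latex
% Largeness (Lipschitz-type) condition: when one player j changes action,
% any other player's payoff moves by at most 1/n.
\[
  \bigl\lvert u_i(a_j, a_{-j}) - u_i(a'_j, a_{-j}) \bigr\rvert \le \frac{1}{n}
  \quad \text{for all } i \ne j \text{ and all } a_j,\ a'_j,\ a_{-j}.
\]
% epsilon-Nash condition: no player gains more than epsilon in expectation
% by deviating unilaterally to any pure action.
\[
  u_i(x_i, x_{-i}) \ge u_i(a_i, x_{-i}) - \epsilon
  \quad \text{for all players } i \text{ and all actions } a_i.
\]
```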

Computer Science and Game Theory, Data Structures and Algorithms

A Direct Reduction from k-Player to 2-Player Approximate Nash Equilibrium

Uriel Feige, Inbal Talgam-Cohen, 2010

We present a direct reduction from k-player games to 2-player games that preserves approximate Nash equilibrium. Previously, the computational equivalence of computing approximate Nash equilibrium in k-player and 2-player games was established via an indirect reduction. This included a sequence of works defining the complexity class PPAD, identifying complete problems for this class, showing that computing approximate Nash equilibrium for k-player games is in PPAD, and reducing a PPAD-complete problem to computing approximate Nash equilibrium for 2-player games. Our direct reduction makes no use of the concept of PPAD, thus eliminating some of the difficulties involved in following the known indirect reduction.

Computer Science and Game Theory

Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games

Ben Hambly, Renyuan Xu, Huining Yang, 2021

We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. In order to prove the convergence of the method, we require a certain amount of noise in the system. We give a condition, essentially a lower bound on the covariance of the noise in terms of the model parameters, in order to guarantee convergence. We illustrate our results with numerical experiments to show that even in situations where the policy gradient method may not converge in the deterministic setting, the addition of noise leads to convergence.

Optimization and Control, Computer Science and Game Theory, Machine Learning
