Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

Published by Dustin Morrill

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Dustin Morrill, Ryan D'Orazio, Marc Lanctot

Computer Science and Game Theory, Artificial Intelligence

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of deviations that respect the structure of extensive-form games. Integrating the idea of time selection into counterfactual regret minimization (CFR), we introduce the extensive-form regret minimization (EFR) algorithm that achieves hindsight rationality for any given set of behavioral deviations with computation that scales closely with the complexity of the set. We identify behavioral deviation subsets, the partial sequence deviation types, that subsume previously studied types and lead to efficient EFR instances in games with moderate lengths. In addition, we present a thorough empirical analysis of EFR instantiated with different deviation types in benchmark games, where we find that stronger types typically induce better performance.
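
To make the phrase "no-regret with respect to a set of deviations" concrete, the following is a minimal sketch in the much simpler normal-form (single-decision) setting, not the behavioral deviations or the EFR algorithm described in the abstract. The function names and the `history` format are assumptions made for illustration. Hindsight regret is measured against two classic deviation sets: external deviations (always play one fixed action) and internal deviations (replace one action with another whenever it was played).

```python
def external_regret(history, actions):
    """Best gain from switching to one fixed action in every round.

    history: list of (played_action, payoffs) pairs, where payoffs maps
    each action to the payoff it would have earned in that round
    (hypothetical format chosen for this sketch).
    """
    realized = sum(payoffs[played] for played, payoffs in history)
    return max(
        sum(payoffs[b] for _, payoffs in history) - realized
        for b in actions
    )


def internal_regret(history, actions):
    """Best gain from swapping a single action a for b whenever a was played."""
    best = 0.0  # the identity swap (a == b) always yields zero gain
    for a in actions:
        for b in actions:
            gain = sum(
                payoffs[b] - payoffs[a]
                for played, payoffs in history
                if played == a
            )
            best = max(best, gain)
    return best


# Three rounds of play over two actions.
history = [
    ("x", {"x": 0, "y": 2}),
    ("y", {"x": 0, "y": 1}),
    ("x", {"x": 1, "y": 0}),
]
ext = external_regret(history, ["x", "y"])
internal = internal_regret(history, ["x", "y"])
```

A learner is hindsight rational for a deviation set when this kind of regret, divided by the number of rounds, vanishes as play continues; richer deviation sets give stronger guarantees at a higher computational cost.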

Marc Lanctot, Richard Gibson, Neil Burch (2012)

Counterfactual Regret Minimization (CFR) is an efficient no-regret learning algorithm for decision problems modeled as extensive games. CFR's regret bounds depend on the requirement of perfect recall: players always remember information that was revealed to them and the order in which it was revealed. In games without perfect recall, however, CFR's guarantees do not apply. In this paper, we present the first regret bound for CFR when applied to a general class of games with imperfect recall. In addition, we show that CFR applied to any abstraction belonging to our general class results in a regret bound not just for the abstract game, but for the full game as well. We verify our theory and show how imperfect recall can be used to trade a small increase in regret for a significant reduction in memory in three domains: die-roll poker, phantom tic-tac-toe, and Bluff.
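
As a rough illustration of the kind of local no-regret learner that CFR typically runs at each information set, here is a minimal sketch of regret matching. It is not the authors' implementation; the function names are hypothetical, and the counterfactual weighting and tree traversal that CFR adds on top are omitted.

```python
def regret_matching(cumulative_regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    n = len(cumulative_regrets)
    if total <= 0.0:
        # No action has positive regret: fall back to the uniform strategy.
        return [1.0 / n] * n
    return [p / total for p in positive]


def update_regrets(cumulative_regrets, strategy, action_values):
    """Add each action's regret against the strategy's expected value."""
    expected = sum(p * v for p, v in zip(strategy, action_values))
    return [r + (v - expected) for r, v in zip(cumulative_regrets, action_values)]


# One learning step at a single decision point with two actions.
regrets = [0.0, 0.0]
strategy = regret_matching(regrets)                      # uniform at the start
regrets = update_regrets(regrets, strategy, [1.0, 0.0])  # action 0 did better
next_strategy = regret_matching(regrets)                 # shifts toward action 0
```

In full CFR, `action_values` would be counterfactual values computed by traversing the game tree, and the average of the strategies over iterations is what carries the regret guarantee.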

Computer Science and Game Theory, Artificial Intelligence

Dependent Types for Extensive Games

Pierre Lescanne (2016)

Extensive games are tools largely used in economics to describe decision processes of a community of agents. In this paper we propose a formal presentation based on the proof assistant COQ which focuses mostly on infinite extensive games and their characteristics. COQ offers a feature called dependent types, which means that the type of an object may depend on the type of its components. For instance, the set of choices or the set of utilities of an agent may depend on the agent herself. Using dependent types, we describe formally a very general class of games and strategy profiles, which corresponds somewhat to what game theorists are used to. We also discuss the notions of infiniteness in game theory and how this can be precisely described.

Computer Science and Game Theory, Logic

Hindsight and Sequential Rationality of Correlated Play

Dustin Morrill, Ryan D'Orazio, Reca Sarfati (2020)

Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.

Computer Science and Game Theory, Artificial Intelligence

Timeability of Extensive-Form Games

Sune K. Jakobsen, Troels B. Sørensen, Vincent Conitzer (2015)

Extensive-form games constitute the standard representation scheme for games with a temporal component. But do all extensive-form games correspond to protocols that we can implement in the real world? We often rule out games with imperfect recall, which prescribe that an agent forget something that she knew before. In this paper, we show that even some games with perfect recall can be problematic to implement. Specifically, we show that if the agents have a sense of time passing (say, access to a clock), then some extensive-form games can no longer be implemented; no matter how we attempt to time the game, some information will leak to the agents that they are not supposed to have. We say such a game is not exactly timeable. We provide easy-to-check necessary and sufficient conditions for a game to be exactly timeable. Most of the technical depth of the paper concerns how to approximately time games, which we show can always be done, though it may require large amounts of time. Specifically, we show that for some games the time required to approximately implement the game grows as a power tower of height proportional to the number of players and with a parameter that measures the precision of the approximation at the top of the power tower. In practice, that makes the games untimeable. Besides the conceptual contribution to game theory, we believe our methodology can have applications to preventing information leakage in security protocols.

Computer Science and Game Theory

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Andrea Celli, Alberto Marchesi, Gabriele Farina (2020)

The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium. However, it was previously unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this paper, we give the first uncoupled no-regret dynamics that converge to the set of EFCEs in $n$-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is close to an EFCE. Then, we give an efficient no-trigger-regret algorithm. Our algorithm decomposes trigger regret into local subproblems at each decision point for the player, and constructs a global strategy of the player from the local solutions at each decision point.

Computer Science and Game Theory, Artificial Intelligence, Machine Learning