ترغب بنشر مسار تعليمي؟ اضغط هنا

Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games

124   0   0.0 ( 0 )
 نشر من قبل Gabriele Farina
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game. Team members are not allowed to communicate during play but can coordinate before the game. In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game. In this paper, we first provide new modeling results about computing such an optimal distribution by drawing a connection to a different literature on extensive-form correlation. Second, we provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile. We can also cap the number of such profiles we allow in the solution. This begets an anytime algorithm by increasing the cap. We find that often a handful of well-chosen such profiles suffices to reach optimal utility for the team. This enables team members to reach coordination through a relatively simple and understandable plan. Finally, inspired by this observation and leveraging theoretical concepts that we introduce, we develop an efficient column-generation algorithm for finding an optimal distribution for the team. We evaluate it on a suite of common benchmark games. It is three orders of magnitude faster than the prior state of the art on games that the latter can solve and it can also solve several games that were previously unsolvable.

قيم البحث

اقرأ أيضاً

Despite the many recent practical and theoretical breakthroughs in computational game theory, equilibrium finding in extensive-form team games remains a significant challenge. While NP-hard in the worst case, there are provably efficient algorithms f or certain families of team game. In particular, if the game has common external information, also known as A-loss recall -- informally, actions played by non-team members (i.e., the opposing team or nature) are either unknown to the entire team, or common knowledge within the team -- then polynomial-time algorithms exist (Kaneko and Kline, 1995). In this paper, we devise a completely new algorithm for solving team games. It uses a tree decomposition of the constraint system representing each teams strategy to reduce the number and degree of constraints required for correctness (tightness of the mathematical program). Our algorithm reduces the problem of solving team games to a linear program with at most $NW^{w+O(1)}$ nonzero entries in the constraint matrix, where $N$ is the size of the game tree, $w$ is a parameter that depends on the amount of uncommon external information, and $W$ is the treewidth of the tree decomposition. In public-action games, our program size is bounded by the tighter $tilde O(3^t 2^{t(n-1)}NW)$ for teams of $n$ players with $t$ types each. Since our algorithm describes the polytope of correlated strategies directly, we get equilibrium finding in correlated strategies for free -- instead of, say, having to run a double oracle algorithm. We show via experiments on a standard suite of games that our algorithm achieves state-of-the-art performance on all benchmark game classes except one. We also present, to our knowledge, the first experiments for this setting where more than one team has more than one member.
Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediat ed equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of deviations that respect the structure of extensive-form games. Integrating the idea of time selection into counterfactual regret minimization (CFR), we introduce the extensive-form regret minimization (EFR) algorithm that achieves hindsight rationality for any given set of behavioral deviations with computation that scales closely with the complexity of the set. We identify behavioral deviation subsets, the partial sequence deviation types, that subsume previously studied types and lead to efficient EFR instances in games with moderate lengths. In addition, we present a thorough empirical analysis of EFR instantiated with different deviation types in benchmark games, where we find that stronger types typically induce better performance.
Counterfactual Regret Minimization (CFR) is an efficient no-regret learning algorithm for decision problems modeled as extensive games. CFRs regret bounds depend on the requirement of perfect recall: players always remember information that was revea led to them and the order in which it was revealed. In games without perfect recall, however, CFRs guarantees do not apply. In this paper, we present the first regret bound for CFR when applied to a general class of games with imperfect recall. In addition, we show that CFR applied to any abstraction belonging to our general class results in a regret bound not just for the abstract game, but for the full game as well. We verify our theory and show how imperfect recall can be used to trade a small increase in regret for a significant reduction in memory in three domains: die-roll poker, phantom tic-tac-toe, and Bluff.
Extensive-form games constitute the standard representation scheme for games with a temporal component. But do all extensive-form games correspond to protocols that we can implement in the real world? We often rule out games with imperfect recall, wh ich prescribe that an agent forget something that she knew before. In this paper, we show that even some games with perfect recall can be problematic to implement. Specifically, we show that if the agents have a sense of time passing (say, access to a clock), then some extensive-form games can no longer be implemented; no matter how we attempt to time the game, some information will leak to the agents that they are not supposed to have. We say such a game is not exactly timeable. We provide easy-to-check necessary and sufficient conditions for a game to be exactly timeable. Most of the technical depth of the paper concerns how to approximately time games, which we show can always be done, though it may require large amounts of time. Specifically, we show that for some games the time required to approximately implement the game grows as a power tower of height proportional to the number of players and with a parameter that measures the precision of the approximation at the top of the power tower. In practice, that makes the games untimeable. Besides the conceptual contribution to game theory, we believe our methodology can have applications to preventing information leakage in security protocols.
Bayesian persuasion is the study of information sharing policies among strategic agents. A prime example is signaling in online ad auctions: what information should a platform signal to an advertiser regarding a user when selling the opportunity to a dvertise to her? Practical considerations such as preventing discrimination, protecting privacy or acknowledging limited attention of the information receiver impose constraints on information sharing. In this work, we propose and analyze a simple way to mathematically model such constraints as restrictions on Receivers admissible posterior beliefs. We consider two families of constraints - ex ante and ex post, where the latter limits each instance of Sender-Receiver communication, while the former more general family can also pose restrictions in expectation. For the ex ante family, Doval and Skreta establish the existence of an optimal signaling scheme with a small number of signals - at most the number of constraints plus the number of states of nature; we show this result is tight and provide an alternative proof for it. For the ex post family, we tighten a bound of V{o}lund, showing that the required number of signals is at most the number of states of nature, as in the original Kamenica-Gentzkow setting. As our main algorithmic result, we provide an additive bi-criteria FPTAS for an optimal constrained signaling scheme assuming a constant number of states; we improve the approximation to single-criteria under a Slater-like regularity condition. The FPTAS holds under standard assumptions; relaxed assumptions yield a PTAS. Finally, we bound the ratio between Senders optimal utility under convex ex ante constraints and the corresponding ex post constraints. This bound applies to finding an approximately welfare-maximizing constrained signaling scheme in ad auctions.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا