ﻻ يوجد ملخص باللغة العربية
In imperfect-information games, subgame solving is significantly more challenging than in perfect-information games, but in the last few years, such techniques have been developed. They were the key ingredient to the milestone of superhuman play in no-limit Texas holdem poker. Current subgame-solving techniques analyze the entire common-knowledge closure of the players current information set, that is, the smallest set of nodes within which it is common knowledge that the current node lies. However, this set is too large to handle in many games. We introduce an approach that overcomes this obstacle, by instead working with only low-order knowledge. Our approach allows an agent, upon arriving at an infoset, to basically prune any node that is no longer reachable, thereby massively reducing the game tree size relative to the common-knowledge subgame. We prove that, as is, our approach can increase exploitability compared to the blueprint strategy. However, we develop three avenues by which safety can be guaranteed. First, safety is guaranteed if the results of subgame solves are incorporated back into the blueprint. Second, we provide a method where safety is achieved by limiting the infosets at which subgame solving is performed. Third, we prove that our approach, when applied at every infoset reached during play, achieves a weaker notion of equilibrium, which we coin affine equilibrium, and which may be of independent interest. We show that affine equilibria cannot be exploited by any Nash strategy of the opponent, so an opponent who wishes to exploit must open herself to counter-exploitation. Even without the safety-guaranteeing additions, experiments on medium-sized games show that our approach always reduced exploitability even when applied at every infoset, and a depth-limited version of it led to--to our knowledge--the first strong AI for the massive challenge problem dark chess.
A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision. Recently, deep reinforcement learning algorithms combined with self-play have achieved superhuman proficiency in Go, C
We discuss some old common knowledge puzzles and introduce a lot of new common knowledge puzzles.
We propose a relaxation of common belief called factional belief that is suitable for the analysis of strategic coordination on social networks. We show how this definition can be used to analyze revolt games on general graphs, including by giving an
Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised
For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is difficult -- computing even an epsilon-optimal jo