No Arabic abstract
It is known that there are uncoupled learning heuristics leading to Nash equilibrium in all finite games. Why should players use such learning heuristics and where could they come from? We show that there is no uncoupled learning heuristic leading to Nash equilibrium in all finite games that a player has an incentive to adopt, that would be evolutionary stable or that could learn itself. Rather, a player has an incentive to strategically teach such a learning opponent in order secure at least the Stackelberg leader payoff. The impossibility result remains intact when restricted to the classes of generic games, two-player games, potential games, games with strategic complements or 2x2 games, in which learning is known to be nice. More generally, it also applies to uncoupled learning heuristics leading to correlated equilibria, rationalizable outcomes, iterated admissible outcomes, or minimal curb sets. A possibility result restricted to strategically trivial games fails if some generic games outside this class are considered as well.
In this paper we describe an approach to resolve strategic games in which players can assume different types along the game. Our goal is to infer which type the opponent is adopting at each moment so that we can increase the players odds. To achieve that we use Markov games combined with hidden Markov model. We discuss a hypothetical example of a tennis game whose solution can be applied to any game with similar characteristics.
The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas holdem poker, while using far less domain knowledge than any prior poker AI.
Candogan et al. (2011) provide an orthogonal direct-sum decomposition of finite games into potential, harmonic and nonstrategic components. In this paper we study the issue of decomposing games that are strategically equivalent from a game-theoretical point of view, for instance games obtained via transformations such as duplications of strategies or positive affine mappings of of payoffs. We show the need to define classes of decompositions to achieve commutativity of game transformations and decompositions.
Large-scale screening for potential threats with limited resources and capacity for screening is a problem of interest at airports, seaports, and other ports of entry. Adversaries can observe screening procedures and arrive at a time when there will be gaps in screening due to limited resource capacities. To capture this game between ports and adversaries, this problem has been previously represented as a Stackelberg game, referred to as a Threat Screening Game (TSG). Given the significant complexity associated with solving TSGs and uncertainty in arrivals of customers, existing work has assumed that screenees arrive and are allocated security resources at the beginning of the time window. In practice, screenees such as airport passengers arrive in bursts correlated with flight time and are not bound by fixed time windows. To address this, we propose an online threat screening model in which screening strategy is determined adaptively as a passenger arrives while satisfying a hard bound on acceptable risk of not screening a threat. To solve the online problem with a hard bound on risk, we formulate it as a Reinforcement Learning (RL) problem with constraints on the action space (hard bound on risk). We provide a novel way to efficiently enforce linear inequality constraints on the action output in Deep Reinforcement Learning. We show that our solution allows us to significantly reduce screenee wait time while guaranteeing a bound on risk.
We provide an epistemic analysis of arbitrary strategic games based on possibility correspondences. We first establish a generic result that links true common beliefs (and, respectively, common knowledge) of players rationality defined by means of `monotonic properties, with the iterated elimination of strategies that do not satisfy these properties. It allows us to deduce the customary results concerned with true common beliefs of rationality and iterated elimination of strictly dominated strategies as simple corollaries. This approach relies on Tarskis Fixpoint Theorem. We also provide an axiomatic presentation of this generic result. This allows us to clarify the proof-theoretic principles assumed in players reasoning. Finally, we provide an alternative characterization of the iterated elimination of strategies based on the concept of a public announcement. It applies to `global properties. Both classes of properties include the notions of rationalizability and the iterated elimination of strictly dominated strategies.