Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Fictitious play in zero-sum stochastic games

355 0 0.0 ( 0 )

Download Cite

Added by Muhammed Omer Sayin

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Muhammed O. Sayin - Francesca Parise - Asuman Ozdaglar

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We present fictitious play dynamics for stochastic games and analyze its convergence properties in zero-sum stochastic games. Our dynamics involves players forming beliefs on opponent strategy and their own continuation payoff (Q-function), and playing a greedy best response using estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that update of the beliefs on Q-functions occurs at a slower timescale than update of the beliefs on strategies. We show both in the model-based and model-free cases (without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.

rate research

Decentralized Q-Learning in Zero-sum Markov Games

392 - Muhammed O. Sayin , Kaiqing Zhang , David S. Leslie 2021

We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum Markov games. We focus on the practical but challenging setting of decentralized MARL, where agents make decisions without coordination by a centralized controller, but only based on their own payoffs and local actions executed. The agents need not observe the opponents actions or payoffs, possibly being even oblivious to the presence of the opponent, nor be aware of the zero-sum structure of the underlying game, a setting also referred to as radically uncoupled in the literature of learning in games. In this paper, we develop for the first time a radically uncoupled Q-learning dynamics that is both rational and convergent: the learning dynamics converges to the best response to the opponents strategy when the opponent follows an asymptotically stationary strategy; the value function estimates converge to the payoffs at a Nash equilibrium when both agents adopt the dynamics. The key challenge in this decentralized setting is the non-stationarity of the learning environment from an agents perspective, since both her own payoffs and the system evolution depend on the actions of other agents, and each agent adapts their policies simultaneously and independently. To address this issue, we develop a two-timescale learning dynamics where each agent updates her local Q-function and value function estimates concurrently, with the latter happening at a slower timescale.

Computer Science and Game Theory Machine Learning Multiagent Systems

Towards General Function Approximation in Zero-Sum Markov Games

100 - Baihe Huang , Jason D. Lee , Zhaoran Wang 2021

This paper considers two-player zero-sum finite-horizon Markov games with simultaneous moves. The study focuses on the challenging settings where the value function or the model is parameterized by general function classes. Provably efficient algorithms for both decoupled and {coordinated} settings are developed. In the {decoupled} setting where the agent controls a single player and plays against an arbitrary opponent, we propose a new model-free algorithm. The sample complexity is governed by the Minimax Eluder dimension -- a new dimension of the function class in Markov games. As a special case, this method improves the state-of-the-art algorithm by a $sqrt{d}$ factor in the regret when the reward function and transition kernel are parameterized with $d$-dimensional linear features. In the {coordinated} setting where both players are controlled by the agent, we propose a model-based algorithm and a model-free algorithm. In the model-based algorithm, we prove that sample complexity can be bounded by a generalization of Witness rank to Markov games. The model-free algorithm enjoys a $sqrt{K}$-regret upper bound where $K$ is the number of episodes. Our algorithms are based on new techniques of alternate optimism.

Computer Science and Game Theory Machine Learning Machine Learning

Convergence of Deep Fictitious Play for Stochastic Differential Games

135 - Jiequn Han , Ruimeng Hu , Jihao Long 2020

Stochastic differential games have been used extensively to model agents competitions in Finance, for instance, in P2P lending platforms from the Fintech industry, the banking system for systemic risk, and insurance markets. The recently proposed machine learning algorithm, deep fictitious play, provides a novel efficient tool for finding Markovian Nash equilibrium of large $N$-player asymmetric stochastic differential games [J. Han and R. Hu, Mathematical and Scientific Machine Learning Conference, pages 221-245, PMLR, 2020]. By incorporating the idea of fictitious play, the algorithm decouples the game into $N$ sub-optimization problems, and identifies each players optimal strategy with the deep backward stochastic differential equation (BSDE) method parallelly and repeatedly. In this paper, we prove the convergence of deep fictitious play (DFP) to the true Nash equilibrium. We can also show that the strategy based on DFP forms an $eps$-Nash equilibrium. We generalize the algorithm by proposing a new approach to decouple the games, and present numerical results of large population games showing the empirical convergence of the algorithm beyond the technical assumptions in the theorems.

Optimization and Control Computer Science and Game Theory Machine Learning

Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games

109 - Edward Hughes , Thomas W. Anthony , Tom Eccles 2020

Zero-sum games have long guided artificial intelligence research, since they possess both a rich strategy space of best-responses and a clear evaluation metric. Whats more, competition is a vital mechanism in many real-world multi-agent systems capable of generating intelligent innovations: Darwinian evolution, the market economy and the AlphaZero algorithm, to name a few. In two-player zero-sum games, the challenge is usually viewed as finding Nash equilibrium strategies, safeguarding against exploitation regardless of the opponent. While this captures the intricacies of chess or Go, it avoids the notion of cooperation with co-players, a hallmark of the major transitions leading from unicellular organisms to human civilization. Beyond two players, alliance formation often confers an advantage; however this requires trust, namely the promise of mutual cooperation in the face of incentives to defect. Successful play therefore requires adaptation to co-players rather than the pursuit of non-exploitability. Here we argue that a systematic study of many-player zero-sum games is a crucial element of artificial intelligence research. Using symmetric zero-sum matrix games, we demonstrate formally that alliance formation may be seen as a social dilemma, and empirically that naive multi-agent reinforcement learning therefore fails to form alliances. We introduce a toy model of economic competition, and show how reinforcement learning may be augmented with a peer-to-peer contract mechanism to discover and enforce alliances. Finally, we generalize our agent model to incorporate temporally-extended contracts, presenting opportunities for further work.

Computer Science and Game Theory Machine Learning Multiagent Systems

(Local) Non-Asymptotic Analysis of Logistic Fictitious Play for Two-Player Zero-Sum Games and Its Deterministic Variant

97 - Renbo Zhao , Qiuyun Zhu 2021

We conduct a local non-asymptotic analysis of the logistic fictitious play (LFP) algorithm, and show that with high probability, this algorithm converges locally at rate $O(1/t)$. To achieve this, we first develop a global non-asymptotic analysis of the deterministic variant of LFP, which we call DLFP, and derive a class of convergence rates based on different step-sizes. We then incorporate a particular form of stochastic noise to the analysis of DLFP, and obtain the local convergence rate of LFP. As a result of independent interest, we extend DLFP to solve a class of strongly convex composite optimization problems. We show that although the resulting algorithm is a simple variant of the generalized Frank-Wolfe method in Nesterov [1,Section 5], somewhat surprisingly, it enjoys significantly improved convergence rate.

Optimization and Control Probability

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Fictitious play in zero-sum stochastic games

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions