ﻻ يوجد ملخص باللغة العربية
We present fictitious play dynamics for stochastic games and analyze its convergence properties in zero-sum stochastic games. Our dynamics involves players forming beliefs on opponent strategy and their own continuation payoff (Q-function), and playing a greedy best response using estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that update of the beliefs on Q-functions occurs at a slower timescale than update of the beliefs on strategies. We show both in the model-based and model-free cases (without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum Markov games. We focus on the practical but challenging setting of decentralized MARL, where agents make decisions without coordination by a centralized contro
This paper considers two-player zero-sum finite-horizon Markov games with simultaneous moves. The study focuses on the challenging settings where the value function or the model is parameterized by general function classes. Provably efficient algorit
Stochastic differential games have been used extensively to model agents competitions in Finance, for instance, in P2P lending platforms from the Fintech industry, the banking system for systemic risk, and insurance markets. The recently proposed mac
Zero-sum games have long guided artificial intelligence research, since they possess both a rich strategy space of best-responses and a clear evaluation metric. Whats more, competition is a vital mechanism in many real-world multi-agent systems capab
We conduct a local non-asymptotic analysis of the logistic fictitious play (LFP) algorithm, and show that with high probability, this algorithm converges locally at rate $O(1/t)$. To achieve this, we first develop a global non-asymptotic analysis of