
Evolutionary Dynamics and $\Phi$-Regret Minimization in Games

Published by Shayegan Omidshafiei
Publication date: 2021
Research field: Informatics engineering
Paper language: English





Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games. Regret quantifies the difference between a learner's performance and that of a baseline in hindsight. It is well-known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit deviations to deterministic actions or strategies. In this paper, we revisit our understanding of regret from the perspective of deviations over partitions of the full \emph{mixed} strategy space (i.e., probability distributions over pure strategies), under the lens of the previously-established $\Phi$-regret framework, which provides a continuum of stronger regret measures. Importantly, $\Phi$-regret enables learning agents to consider deviations from and to mixed strategies, generalizing several existing notions of regret such as external, internal, and swap regret, and thus broadening the insights gained from regret-based analysis of learning algorithms. We prove here that the well-studied evolutionary learning algorithm of replicator dynamics (RD) seamlessly minimizes the strongest possible form of $\Phi$-regret in generic $2 \times 2$ games, without any modification of the underlying algorithm itself. We subsequently conduct experiments validating our theoretical results in a suite of 144 $2 \times 2$ games wherein RD exhibits a diverse set of behaviors. We conclude by providing empirical evidence of $\Phi$-regret minimization by RD in some larger games, hinting at further opportunity for $\Phi$-regret-based study of such algorithms from both a theoretical and empirical perspective.
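As a rough illustration of the dynamics discussed above, the following sketch (not the authors' code; the payoff matrices, step size, and horizon are illustrative assumptions) integrates the replicator equations in a $2 \times 2$ zero-sum game and tracks the running external regret, the weakest member of the $\Phi$-regret family.

```python
# Minimal sketch (illustrative only): replicator dynamics on a 2x2 game,
# with a running external-regret estimate for the row player.
import numpy as np

# Payoff matrices of a 2x2 zero-sum game (Matching Pennies).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # row player's payoffs
B = -A                                      # column player's payoffs

x = np.array([0.7, 0.3])    # row player's mixed strategy
y = np.array([0.4, 0.6])    # column player's mixed strategy
dt, T = 0.01, 20000

payoffs, best_fixed = [], np.zeros(2)
for _ in range(T):
    # Expected payoff of each pure strategy against the opponent's current mix.
    u_x, u_y = A @ y, B.T @ x
    # Euler step of the replicator ODE: x_i' = x_i * (u_i - x . u).
    x = x + dt * x * (u_x - x @ u_x)
    y = y + dt * y * (u_y - y @ u_y)
    x, y = np.clip(x, 1e-12, None), np.clip(y, 1e-12, None)
    x, y = x / x.sum(), y / y.sum()
    # Track realized payoff and the cumulative payoff of each fixed pure strategy.
    payoffs.append(x @ A @ y)
    best_fixed += dt * u_x

# External regret: best fixed action in hindsight vs. what RD actually earned.
ext_regret = best_fixed.max() - dt * sum(payoffs)
print("time-averaged external regret:", ext_regret / (dt * T))
```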


Read also

The existence of simple uncoupled no-regret learning dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form games generalize normal-form games by modeling both sequential and simultaneous moves, as well as imperfect information. Because of the sequential nature and presence of private information in the game, correlation in extensive-form games possesses significantly different properties than its counterpart in normal-form games, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to the classical notion of correlated equilibrium in normal-form games. Compared to the latter, the constraints that define the set of EFCEs are significantly more complex, as the correlation device must take into account the evolution of beliefs of each player as they make observations throughout the game. Due to that significant added complexity, the existence of uncoupled learning dynamics leading to an EFCE has remained a challenging open research question for a long time. In this article, we settle that question by giving the first uncoupled no-regret dynamics that converge to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. We show that each iterate can be computed in time polynomial in the size of the game tree, and that, when all players play repeatedly according to our learning dynamics, the empirical frequency of play is an $O(T^{-0.5})$-approximate EFCE with high probability after T game repetitions, and an EFCE almost surely in the limit.
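For intuition about the classical normal-form result cited above, here is a minimal sketch of internal-regret (conditional-regret) matching in the style of Hart and Mas-Colell; the random game, the constant mu, and the number of rounds are illustrative assumptions, and this is not the article's extensive-form procedure.

```python
# Minimal sketch (illustrative only): internal-regret matching in a repeated
# 3x3 two-player normal-form game. With many rounds, the empirical joint
# distribution of play approximates a correlated equilibrium.
import numpy as np

rng = np.random.default_rng(0)
n_actions, T = 3, 50000

# U[p][own_action, other_action] is player p's payoff, drawn in [0, 1].
U = [rng.uniform(0.0, 1.0, (n_actions, n_actions)) for _ in range(2)]
mu = 2.0 * n_actions                 # normalising constant (> 2*M*(m-1) for payoffs in [0,1])

# Cumulative conditional regrets R[p][j, k]: "when I played j, how much better
# would the alternative k have been against what the opponent actually did?"
R = [np.zeros((n_actions, n_actions)) for _ in range(2)]
last = [int(rng.integers(n_actions)) for _ in range(2)]
joint_counts = np.zeros((n_actions, n_actions))

for t in range(1, T + 1):
    acts = []
    for p in range(2):
        j = last[p]
        probs = np.maximum(R[p][j], 0.0) / (t * mu)   # time-averaged regrets, scaled by mu
        probs[j] = 0.0
        probs[j] = max(0.0, 1.0 - probs.sum())        # otherwise keep the previous action
        acts.append(int(rng.choice(n_actions, p=probs / probs.sum())))
    for p in range(2):
        j, other = acts[p], acts[1 - p]
        R[p][j] += U[p][:, other] - U[p][j, other]    # update regrets toward every alternative
        last[p] = j
    joint_counts[acts[0], acts[1]] += 1

print("empirical joint distribution of play:\n", joint_counts / T)
```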
Yuanyuan Shi, Baosen Zhang (2019)
This paper examines the convergence of no-regret learning in Cournot games with continuous actions. Cournot games are the essential model for many socio-economic systems, where players compete by strategically setting their output quantity. We assume that players do not have full information of the game and thus cannot pre-compute a Nash equilibrium. Two types of feedback are considered: one is bandit feedback and the other is gradient feedback. To study the convergence of the induced sequence of play, we introduce the notion of convergence in measure, and show that the players' actual sequence of actions converges to the unique Nash equilibrium. In addition, our results naturally extend the no-regret learning algorithms' time-average regret bounds to final-iteration convergence rates. Together, our work presents significantly sharper convergence results for learning in games without strong assumptions on game properties (e.g., monotonicity) and shows how exploiting the game's information feedback can influence the convergence rates.
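A toy illustration of gradient-feedback learning in a Cournot game (a hedged sketch, not the paper's algorithm; the linear demand, unit cost, step size, and horizon are assumptions): projected gradient play in a duopoly approaches the closed-form Cournot-Nash output.

```python
# Minimal sketch (illustrative only): projected gradient play with gradient
# feedback in a linear Cournot duopoly.
import numpy as np

a, b, c = 10.0, 1.0, 1.0           # inverse demand P(Q) = a - b*Q, unit cost c
q = np.array([0.5, 5.0])           # initial outputs of the two firms
eta, T = 0.05, 2000

def profit_grad(q):
    # d/dq_i of q_i * (a - b*(q_i + q_j) - c), computed for both firms at once.
    return a - c - 2.0 * b * q - b * q[::-1]

for _ in range(T):
    q = np.maximum(q + eta * profit_grad(q), 0.0)   # gradient step, projected to q >= 0

q_nash = (a - c) / (3.0 * b)        # closed-form symmetric Cournot-Nash output
print("learned outputs:", q, "| Nash output per firm:", q_nash)
```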
The recent COVID-19 pandemic has led to an increasing interest in the modeling and analysis of infectious diseases. The pandemic has made a significant impact on the way we behave and interact in our daily life. The past year has witnessed a strong interplay between human behaviors and epidemic spreading. In this paper, we propose an evolutionary game-theoretic framework to study the coupled evolutions of herd behaviors and epidemics. Our framework extends the classical degree-based mean-field epidemic model over complex networks by coupling it with the evolutionary game dynamics. The statistically equivalent individuals in a population choose their social activity intensities based on the fitness or the payoffs that depend on the state of the epidemics. Meanwhile, the spreading of the infectious disease over the complex network is reciprocally influenced by the players' social activities. We analyze the coupled dynamics by studying the stationary properties of the epidemic for a given herd behavior and the structural properties of the game for a given epidemic process. The decisions of the herd turn out to be strategic substitutes. We formulate an equivalent finite-player game and an equivalent network to represent the interactions among the finite populations. We develop structure-preserving approximation techniques to study time-dependent properties of the joint evolution of the behavioral and epidemic dynamics. The resemblance between the simulated coupled dynamics and the real COVID-19 statistics in the numerical experiments indicates the predictive power of our framework.
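A much-simplified caricature of this kind of behavior-epidemic coupling (homogeneous mixing rather than the paper's degree-based mean-field model; all parameters are illustrative assumptions): an SIS infection level and the fraction of agents choosing high social activity co-evolve, with activity raising both payoff and infection risk.

```python
# Toy sketch (illustrative only): coupled SIS epidemic and activity-choice game.
beta_hi, beta_lo, gamma = 0.6, 0.1, 0.2   # contact rates for high/low activity, recovery rate
b_hi, b_lo, risk = 1.0, 0.3, 3.0          # activity benefits and infection-cost weight
I, x, dt, T = 0.05, 0.5, 0.05, 4000       # infection level, high-activity share, step, steps

for _ in range(T):
    beta_eff = x * beta_hi + (1 - x) * beta_lo
    # SIS mean-field update: dI/dt = beta_eff * I * (1 - I) - gamma * I
    I += dt * (beta_eff * I * (1 - I) - gamma * I)
    # Payoffs: benefit of the chosen activity level minus expected infection cost.
    f_hi = b_hi - risk * beta_hi * I
    f_lo = b_lo - risk * beta_lo * I
    # Replicator-style update of the share of high-activity agents.
    x += dt * x * (1 - x) * (f_hi - f_lo)
    x = min(max(x, 0.0), 1.0)

print(f"stationary infection level ~ {I:.3f}, high-activity share ~ {x:.3f}")
```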
We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum Markov games. We focus on the practical but challenging setting of decentralized MARL, where agents make decisions without coordination by a centralized controller, but only based on their own payoffs and local actions executed. The agents need not observe the opponent's actions or payoffs, possibly being even oblivious to the presence of the opponent, nor be aware of the zero-sum structure of the underlying game, a setting also referred to as radically uncoupled in the literature of learning in games. In this paper, we develop for the first time a radically uncoupled Q-learning dynamics that is both rational and convergent: the learning dynamics converges to the best response to the opponent's strategy when the opponent follows an asymptotically stationary strategy; the value function estimates converge to the payoffs at a Nash equilibrium when both agents adopt the dynamics. The key challenge in this decentralized setting is the non-stationarity of the learning environment from an agent's perspective, since both her own payoffs and the system evolution depend on the actions of other agents, and each agent adapts her policy simultaneously and independently. To address this issue, we develop a two-timescale learning dynamics where each agent updates her local Q-function and value function estimates concurrently, with the latter happening at a slower timescale.
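A schematic sketch of a two-timescale, radically uncoupled update of the kind described above (one plausible instantiation, not necessarily the authors' exact rules; the step-size exponents, softmax temperature, and discount factor are assumptions), run on repeated Matching Pennies treated as a single-state zero-sum Markov game:

```python
# Schematic sketch (illustrative only): decentralized two-timescale Q-learning.
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, -1.0], [-1.0, 1.0]])        # row player's payoff; column player gets -A
gamma, tau, T = 0.9, 0.5, 100000

q = [np.zeros(2), np.zeros(2)]                  # each agent's local Q over its *own* actions only
v = [0.0, 0.0]                                  # each agent's local value estimate

def policy(qi):
    z = np.exp((qi - qi.max()) / tau)           # numerically stable softmax (smoothed best response)
    return z / z.sum()

for t in range(1, T + 1):
    alpha, beta = 1.0 / t**0.6, 1.0 / t**0.9    # fast (Q) and slow (value) step sizes
    a = [rng.choice(2, p=policy(q[i])) for i in range(2)]
    r = [A[a[0], a[1]], -A[a[0], a[1]]]         # each agent observes only its own payoff
    for i in range(2):
        q[i][a[i]] += alpha * (r[i] + gamma * v[i] - q[i][a[i]])   # fast timescale: local Q update
        v[i] += beta * (policy(q[i]) @ q[i] - v[i])                # slow timescale: value update

# In Matching Pennies the equilibrium strategy is (1/2, 1/2) and the Nash value is 0.
print("policies:", [policy(q[i]).round(3) for i in range(2)],
      "| values:", [round(x, 3) for x in v])
```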
Computing Nash equilibria in bimatrix games is PPAD-hard, and many works have focused on approximate solutions. When games are generated from a fixed unknown distribution, learning a Nash predictor via data-driven approaches can be preferable. In this paper, we study the learnability of approximate Nash equilibria in bimatrix games. We prove that the Lipschitz function class is agnostically Probably Approximately Correct (PAC) learnable with respect to the Nash approximation loss. Additionally, to demonstrate the advantages of learning a Nash predictor, we develop a model that can efficiently approximate solutions for games under the same distribution. We show by experiments that the solutions from our Nash predictor can serve as effective initializing points for other Nash solvers.
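For concreteness, one common proxy for the Nash approximation loss of a strategy profile in a bimatrix game is its exploitability, sketched below (the paper's exact loss may differ; the example game and function name are assumptions).

```python
# Minimal sketch (illustrative only): exploitability of a profile (x, y) in a bimatrix game (A, B).
import numpy as np

def nash_approx_loss(A, B, x, y):
    # How much each player could gain by unilaterally deviating to a best response.
    gain_row = (A @ y).max() - x @ A @ y
    gain_col = (x @ B).max() - x @ B @ y
    return max(gain_row, gain_col)   # equals 0 iff (x, y) is an exact Nash equilibrium

# Example: Prisoner's Dilemma; mutual defection (second action) is the Nash profile.
A = np.array([[3.0, 0.0], [5.0, 1.0]])
B = A.T
print(nash_approx_loss(A, B, np.array([0.0, 1.0]), np.array([0.0, 1.0])))   # -> 0.0
print(nash_approx_loss(A, B, np.array([1.0, 0.0]), np.array([1.0, 0.0])))   # -> 2.0
```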
