ﻻ يوجد ملخص باللغة العربية
This paper examines the convergence of no-regret learning in Cournot games with continuous actions. Cournot games are the essential model for many socio-economic systems, where players compete by strategically setting their output quantity. We assume that players do not have full information of the game and thus cannot pre-compute a Nash equilibrium. Two types of feedback are considered: one is bandit feedback and the other is gradient feedback. To study the convergence of the induced sequence of play, we introduce the notion of convergence in measure, and show that the players actual sequence of action converges to the unique Nash equilibrium. In addition, our results naturally extend the no-regret learning algorithms time-average regret bounds to obtain the final-iteration convergence rates. Together, our work presents significantly sharper convergence results for learning in games without strong assumptions on game property (e.g., monotonicity) and shows how exploiting the game information feedback can influence the convergence rates.
Counterfactual Regret Minimization (CFR) is an efficient no-regret learning algorithm for decision problems modeled as extensive games. CFRs regret bounds depend on the requirement of perfect recall: players always remember information that was revea
We study an optimal targeting problem for super-modular games with binary actions and finitely many players. The considered problem consists in the selection of a subset of players of minimum size such that, when the actions of these players are forc
This paper considers repeated games in which one player has more information about the game than the other players. In particular, we investigate repeated two-player zero-sum games where only the column player knows the payoff matrix A of the game. S
We revisit the problem of solving two-player zero-sum games in the decentralized setting. We propose a simple algorithmic framework that simultaneously achieves the best rates for honest regret as well as adversarial regret, and in addition resolves
In this paper, we consider a distributed learning problem in a subnetwork zero-sum game, where agents are competing in different subnetworks. These agents are connected through time-varying graphs where each agent has its own cost function and can re