ﻻ يوجد ملخص باللغة العربية
In this paper, we consider a distributed learning problem in a subnetwork zero-sum game, where agents are competing in different subnetworks. These agents are connected through time-varying graphs where each agent has its own cost function and can receive information from its neighbors. We propose a distributed mirror descent algorithm for computing a Nash equilibrium and establish a sublinear regret bound on the sequence of iterates when the graphs are uniformly strongly connected and the cost functions are convex-concave. Moreover, we prove its convergence with suitably selected diminishing stepsizes for a strictly convex-concave cost function. We also consider a constant step-size variant of the algorithm and establish an asymptotic error bound between the cost function values of running average actions and a Nash equilibrium. In addition, we apply the algorithm to compute a mixed-strategy Nash equilibrium in subnetwork zero-sum finite-strategy games, which have merely convex-concave (to be specific, multilinear) cost functions, and obtain a final-iteration convergence result and an ergodic convergence result, respectively, under different assumptions.
We revisit the problem of solving two-player zero-sum games in the decentralized setting. We propose a simple algorithmic framework that simultaneously achieves the best rates for honest regret as well as adversarial regret, and in addition resolves
This paper examines the convergence of no-regret learning in Cournot games with continuous actions. Cournot games are the essential model for many socio-economic systems, where players compete by strategically setting their output quantity. We assume
We consider two-player zero-sum differential games (ZSDGs), where the state process (dynamical system) depends on the random initial condition and the state processs distribution, and the objective functional includes the state processs distribution
Counterfactual Regret Minimization (CFR) is an efficient no-regret learning algorithm for decision problems modeled as extensive games. CFRs regret bounds depend on the requirement of perfect recall: players always remember information that was revea
The paper studies the open-loop saddle point and the open-loop lower and upper values, as well as their relationship for two-person zero-sum stochastic linear-quadratic (LQ, for short) differential games with deterministic coefficients. It derives a