In this paper we deal with the problem of existence of a smooth solution of the Hamilton-Jacobi-Bellman-Isaacs (HJBI for short) system of equations associated with nonzero-sum stochastic differential games. We consider the problem in unbounded domains either in the case of continuous generators or for discontinuous ones. In each case we show the existence of a smooth solution of the system. As a consequence, we show that the game has smooth Nash payoffs which are given by means of the solution of the HJBI system and the stochastic process which governs the dynamic of the controlled system.
In this paper we consider non zero-sum games where multiple players control the drift of a process, and their payoffs depend on its ergodic behaviour. We establish their connection with systems of Ergodic BSDEs, and prove the existence of a Nash equilibrium under the generalised Isaacs conditions. We also study the case of interacting players of different type.
We study a two-player nonzero-sum stochastic differential game where one player controls the state variable via additive impulses while the other player can stop the game at any time. The main goal of this work is characterize Nash equilibria through a verification theorem, which identifies a new system of quasi-variational inequalities whose solution gives equilibrium payoffs with the correspondent strategies. Moreover, we apply the verification theorem to a game with a one-dimensional state variable, evolving as a scaled Brownian motion, and with linear payoff and costs for both players. Two types of Nash equilibrium are fully characterized, i.e. semi-explicit expressions for the equilibrium strategies and associated payoffs are provided. Both equilibria are of threshold type: in one equilibrium players intervention are not simultaneous, while in the other one the first player induces her competitor to stop the game. Finally, we provide some numerical results describing the qualitative properties of both types of equilibrium.
Multi-agent reinforcement learning (MARL) has become effective in tackling discrete cooperative game scenarios. However, MARL has yet to penetrate settings beyond those modelled by team and zero-sum games, confining it to a small subset of multi-agent systems. In this paper, we introduce a new generation of MARL learners that can handle nonzero-sum payoff structures and continuous settings. In particular, we study the MARL problem in a class of games known as stochastic potential games (SPGs) with continuous state-action spaces. Unlike cooperative games, in which all agents share a common reward, SPGs are capable of modelling real-world scenarios where agents seek to fulfil their individual goals. We prove theoretically our learning method, SPot-AC, enables independent agents to learn Nash equilibrium strategies in polynomial time. We demonstrate our framework tackles previously unsolvable tasks such as Coordination Navigation and large selfish routing games and that it outperforms the state of the art MARL baselines such as MADDPG and COMIX in such scenarios.
We study zero-sum stochastic differential games where the state dynamics of the two players is governed by a generalized McKean-Vlasov (or mean-field) stochastic differential equation in which the distribution of both state and controls of each player appears in the drift and diffusion coefficients, as well as in the running and terminal payoff functions. We prove the dynamic programming principle (DPP) in this general setting, which also includes the control case with only one player, where it is the first time that DPP is proved for open-loop controls. We also show that the upper and lower value functions are viscosity solutions to a corresponding upper and lower Master Bellman-Isaacs equation. Our results extend the seminal work of Fleming and Souganidis [15] to the McKean-Vlasov setting.
The paper studies the open-loop saddle point and the open-loop lower and upper values, as well as their relationship for two-person zero-sum stochastic linear-quadratic (LQ, for short) differential games with deterministic coefficients. It derives a necessary condition for the finiteness of the open-loop lower and upper values and a sufficient condition for the existence of an open-loop saddle point. It turns out that under the sufficient condition, a strongly regular solution to the associated Riccati equation uniquely exists, in terms of which a closed-loop representation is further established for the open-loop saddle point. Examples are presented to show that the finiteness of the open-loop lower and upper values does not ensure the existence of an open-loop saddle point in general. But for the classical deterministic LQ game, these two issues are equivalent and both imply the solvability of the Riccati equation, for which an explicit representation of the solution is obtained.