No Arabic abstract
We study a general class of entropy-regularized multi-variate LQG mean field games (MFGs) in continuous time with $K$ distinct sub-population of agents. We extend the notion of actions to action distributions (exploratory actions), and explicitly derive the optimal action distributions for individual agents in the limiting MFG. We demonstrate that the optimal set of action distributions yields an $epsilon$-Nash equilibrium for the finite-population entropy-regularized MFG. Furthermore, we compare the resulting solutions with those of classical LQG MFGs and establish the equivalence of their existence.
Entropy regularization has been extensively adopted to improve the efficiency, the stability, and the convergence of algorithms in reinforcement learning. This paper analyzes both quantitatively and qualitatively the impact of entropy regularization for Mean Field Game (MFG) with learning in a finite time horizon. Our study provides a theoretical justification that entropy regularization yields time-dependent policies and, furthermore, helps stabilizing and accelerating convergence to the game equilibrium. In addition, this study leads to a policy-gradient algorithm for exploration in MFG. Under this algorithm, agents are able to learn the optimal exploration scheduling, with stable and fast convergence to the game equilibrium.
This paper aims to answer the research question as to optimal design of decision-making processes for autonomous vehicles (AVs), including dynamical selection of driving velocity and route choices on a transportation network. Dynamic traffic assignment (DTA) has been widely used to model travelerss route choice or/and departure-time choice and predict dynamic traffic flow evolution in the short term. However, the existing DTA models do not explicitly describe ones selection of driving velocity on a road link. Driving velocity choice may not be crucial for modeling the movement of human drivers but it is a must-have control to maneuver AVs. In this paper, we aim to develop a game-theoretic model to solve for AVss optimal driving strategies of velocity control in the interior of a road link and route choice at a junction node. To this end, we will first reinterpret the DTA problem as an N-car differential game and show that this game can be tackled with a general mean field game-theoretic framework. The developed mean field game is challenging to solve because of the forward and backward structure for velocity control and the complementarity conditions for route choice. An efficient algorithm is developed to address these challenges. The model and the algorithm are illustrated on the Braess network and the OW network with a single destination. On the Braess network, we first compare the LWR based DTA model with the proposed game and find that the driving and routing control navigates AVs with overall lower costs. We then compare the total travel cost without and with the middle link and find that the Braess paradox may still arise under certain conditions. We also test our proposed model and solution algorithm on the OW network.
We consider a non-zero-sum linear quadratic Gaussian (LQG) dynamic game with asymmetric information. Each player observes privately a noisy version of a (hidden) state of the world $V$, resulting in dependent private observations. We study perfect Bayesian equilibria (PBE) for this game with equilibrium strategies that are linear in players private estimates of $V$. The main difficulty arises from the fact that players need to construct estimates on other players estimate on $V$, which in turn would imply that an infinite hierarchy of estimates on estimates needs to be constructed, rendering the problem unsolvable. We show that this is not the case: each players estimate on other players estimates on $V$ can be summarized into her own estimate on $V$ and some appropriately defined public information. Based on this finding we characterize the PBE through a backward/forward algorithm akin to dynamic programming for the standard LQG control problem. Unlike the standard LQG problem, however, Kalman filter covariance matrices, as well as some other required quantities, are observation-dependent and thus cannot be evaluated off-line through a forward recursion.
Mean field games are concerned with the limit of large-population stochastic differential games where the agents interact through their empirical distribution. In the classical setting, the number of players is large but fixed throughout the game. However, in various applications, such as population dynamics or economic growth, the number of players can vary across time which may lead to different Nash equilibria. For this reason, we introduce a branching mechanism in the population of agents and obtain a variation on the mean field game problem. As a first step, we study a simple model using a PDE approach to illustrate the main differences with the classical setting. We prove existence of a solution and show that it provides an approximate Nash-equilibrium for large population games. We also present a numerical example for a linear--quadratic model. Then we study the problem in a general setting by a probabilistic approach. It is based upon the relaxed formulation of stochastic control problems which allows us to obtain a general existence result.
This paper investigates the problem of computing the equilibrium of competitive games, which is often modeled as a constrained saddle-point optimization problem with probability simplex constraints. Despite recent efforts in understanding the last-iterate convergence of extragradient methods in the unconstrained setting, the theoretical underpinnings of these methods in the constrained settings, especially those using multiplicative updates, remain highly inadequate, even when the objective function is bilinear. Motivated by the algorithmic role of entropy regularization in single-agent reinforcement learning and game theory, we develop provably efficient extragradient methods to find the quantal response equilibrium (QRE) -- which are solutions to zero-sum two-player matrix games with entropy regularization -- at a linear rate. The proposed algorithms can be implemented in a decentralized manner, where each player executes symmetric and multiplicative updates iteratively using its own payoff without observing the opponents actions directly. In addition, by controlling the knob of entropy regularization, the proposed algorithms can locate an approximate Nash equilibrium of the unregularized matrix game at a sublinear rate without assuming the Nash equilibrium to be unique. Our methods also lead to efficient policy extragradient algorithms for solving entropy-regularized zero-sum Markov games at a linear rate. All of our convergence rates are nearly dimension-free, which are independent of the size of the state and action spaces up to logarithm factors, highlighting the positive role of entropy regularization for accelerating convergence.