Interactions among individuals in natural populations often occur in a dynamically changing environment. Understanding the role of environmental variation in population dynamics has long been a central topic in theoretical ecology and population biology. However, the key question of how individuals caught in challenging social dilemmas (e.g., the tragedy of the commons) modulate their behavior to adapt to environmental fluctuations has not yet been addressed satisfactorily. Drawing on evolutionary game theory and stochastic games, we develop a game-theoretical framework that incorporates the adaptive mechanism of reinforcement learning to investigate whether cooperative behavior can evolve in an ever-changing group-interaction environment. When the action choices of players are only slightly influenced by past reinforcements, we derive an analytical condition that determines whether cooperation is favored over defection. Intuitively, this condition reveals why and how the environment can mediate cooperative dilemmas. Within our model architecture, we also compare this learning mechanism with two non-learning decision rules, and we find that learning significantly improves the propensity for cooperation in weak social dilemmas but, in sharp contrast, hinders cooperation in strong social dilemmas. Our results suggest that in complex social-ecological dilemmas, learning enables individuals to adapt to varying environments.
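As a rough illustration of the kind of reinforcement-learning adaptation described above, the following is a minimal sketch of a Bush-Mosteller-style aspiration update in a repeated public goods game. The paper's actual learning rule, payoff structure, and fluctuating environmental states are not reproduced here, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: group size, synergy factor, contribution cost,
# aspiration level, and learning sensitivity.
N, r, c = 5, 3.0, 1.0
aspiration, alpha = 1.0, 0.1

p_coop = np.full(N, 0.5)    # each player's probability of cooperating

def public_goods_payoffs(actions):
    """One-shot public goods game: contributions are multiplied by r and
    shared equally; cooperators additionally pay the cost c."""
    share = r * c * actions.sum() / N
    return share - c * actions

for t in range(5000):
    actions = (rng.random(N) < p_coop).astype(int)          # 1 = cooperate
    stimuli = np.tanh(alpha * (public_goods_payoffs(actions) - aspiration))
    for i, (a, s) in enumerate(zip(actions, stimuli)):
        # Bush-Mosteller: reinforce the chosen action if the payoff exceeded
        # the aspiration level, inhibit it otherwise.
        if a == 1:
            p_coop[i] += (1 - p_coop[i]) * s if s > 0 else p_coop[i] * s
        else:
            p_coop[i] -= p_coop[i] * s if s > 0 else (1 - p_coop[i]) * s
    np.clip(p_coop, 0.01, 0.99, out=p_coop)

print("mean cooperation probability:", p_coop.mean())
```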
Animal behavior and evolution can often be described by game-theoretic models. Although in many situations the number of players is very large, their strategic interactions are usually decomposed into a sum of two-player games. Only recently have evolutionarily stable strategies been defined for multi-player games and their properties analyzed (Broom et al., 1997). Here we study the long-run behavior of stochastic dynamics in populations of randomly matched individuals playing symmetric three-player games. We analyze the stochastic stability of equilibria in games with multiple evolutionarily stable strategies. We also show that in some games a population may not evolve in the long run to an evolutionarily stable equilibrium.
We provide a classification of symmetric three-player games with two strategies and investigate evolutionary and asymptotic stability (in the replicator dynamics) of their Nash equilibria. We discuss similarities and differences between two-player and multi-player games. In particular, we construct examples which exhibit a novel behavior not found in two-player games.
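For concreteness, the sketch below integrates the replicator dynamics of a symmetric three-player game with two strategies. The payoff entries a[k] and b[k] (payoff to strategy A or B when k of the two co-players play A) are hypothetical, chosen only so that two interior equilibria appear; they are not taken from the paper.

```python
import numpy as np
from math import comb

# Hypothetical payoff entries: a[k] (resp. b[k]) is the payoff to a player
# using strategy A (resp. B) when k of its two co-players use A.
a = np.array([0.0, 3.0, 0.0])
b = np.array([1.0, 1.0, 1.0])

def replicator_rhs(x, a, b, m=2):
    """Right-hand side xdot = x(1-x)(f_A - f_B) of the replicator equation
    for a symmetric (m+1)-player game with two strategies."""
    k = np.arange(m + 1)
    w = np.array([comb(m, int(j)) for j in k]) * x**k * (1 - x)**(m - k)
    return x * (1 - x) * (w @ a - w @ b)

# Euler integration from several initial frequencies of strategy A; with these
# payoffs f_A - f_B is quadratic in x, so two interior equilibria coexist,
# something impossible in two-strategy two-player games.
for x0 in (0.1, 0.5, 0.9):
    x = x0
    for _ in range(20000):
        x += 0.01 * replicator_rhs(x, a, b)
    print(f"x0 = {x0:.1f} -> x* = {x:.3f}")
```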
We study the performance of the gradient play algorithm for multi-agent tabular Markov decision processes (MDPs), also known as stochastic games (SGs), in which each agent tries to maximize its own total discounted reward by making decisions independently based on the current state, which is shared among agents. Policies are directly parameterized by the probability of choosing a certain action at a given state. We show that Nash equilibria (NEs) and first-order stationary policies are equivalent in this setting, and give a non-asymptotic global convergence rate analysis to an $\epsilon$-NE for a subclass of multi-agent MDPs called Markov potential games, which includes the cooperative setting with identical rewards among agents as an important special case. Our result shows that the number of iterations to reach an $\epsilon$-NE scales linearly, instead of exponentially, with the number of agents. Local geometry and local stability are also considered. For Markov potential games, we prove that strict NEs are local maxima of the total potential function and that fully mixed NEs are saddle points. We also give a local convergence rate around strict NEs for more general settings.
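A minimal sketch of gradient play with directly parameterized policies follows, reduced to a single-state game with identical rewards (the simplest special case of a Markov potential game). The reward matrix, step size, and initial policies are hypothetical, and no multi-state dynamics or convergence-rate results from the paper are reproduced.

```python
import numpy as np

# Hypothetical single-state cooperative game: both agents receive R[a1, a2].
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1 - css) / idx > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

# Direct parameterization: each policy is a probability vector over actions.
x = np.array([0.6, 0.4])   # agent 1
y = np.array([0.7, 0.3])   # agent 2
eta = 0.1                  # step size

for _ in range(500):
    # Independent projected gradient steps on each agent's expected reward x^T R y.
    gx, gy = R @ y, R.T @ x
    x, y = project_simplex(x + eta * gx), project_simplex(y + eta * gy)

# Should approach a deterministic joint policy, i.e. a strict NE of the game.
print("agent 1 policy:", x, "agent 2 policy:", y)
```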
We consider a scenario in which two reinforcement learning agents repeatedly play a matrix game against each other and update their parameters after each round. The agents' decision-making is transparent to each other, which allows each agent to predict how their opponent will play against them. To prevent an infinite regress of both agents recursively predicting each other, each agent is required to give an opponent-independent response with probability at least epsilon. Transparency also allows each agent to anticipate and shape the other agent's gradient step, i.e., to move to regions of parameter space in which the opponent's gradient points in a direction favourable to them. We study the resulting dynamics experimentally, using two algorithms from the previous literature (LOLA and SOS) for opponent-aware learning. We find that the combination of mutually transparent decision-making and opponent-aware learning robustly leads to mutual cooperation in a single-shot prisoner's dilemma. In a game of chicken, in which both agents try to manoeuvre their opponent towards their preferred equilibrium, converging to a mutually beneficial outcome turns out to be much harder, and opponent-aware learning can even lead to worst-case outcomes for both agents. This highlights the need to develop opponent-aware learning algorithms that achieve acceptable outcomes in social dilemmas involving an equilibrium selection problem.
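The sketch below illustrates the basic "shape the opponent's gradient step" idea with a first-order LOLA-style update for two single-parameter learners in a one-shot prisoner's dilemma. It does not implement the mutual-transparency mechanism or the SOS algorithm studied in the paper, and the payoff and learning-rate values are hypothetical; without transparency, such learners typically still end up defecting.

```python
import numpy as np

# Hypothetical one-shot prisoner's dilemma payoffs (R, S, T, P).
R, S, T, P = 3.0, 0.0, 5.0, 1.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lola_step(th1, th2, lr=0.1, shaping=0.3):
    """First-order LOLA-style update for two policy-gradient learners whose
    single parameter is the logit of their cooperation probability."""
    p1, p2 = sigmoid(th1), sigmoid(th2)
    g1, g2 = p1 * (1 - p1), p2 * (1 - p2)        # d p / d theta

    # Partial derivatives of the expected payoffs w.r.t. the probabilities.
    dV1_dp1 = p2 * (R - T) + (1 - p2) * (S - P)
    dV1_dp2 = p1 * (R - S) + (1 - p1) * (T - P)
    dV2_dp2 = p1 * (R - T) + (1 - p1) * (S - P)
    dV2_dp1 = p2 * (R - S) + (1 - p2) * (T - P)
    cross = R - S - T + P                         # d^2 V_i / (dp1 dp2)

    # Naive gradient plus the LOLA shaping term, which differentiates the
    # opponent's anticipated gradient step with respect to one's own parameter.
    d_th1 = g1 * dV1_dp1 + shaping * (g2 * dV1_dp2) * (g1 * g2 * cross)
    d_th2 = g2 * dV2_dp2 + shaping * (g1 * dV2_dp1) * (g1 * g2 * cross)
    return th1 + lr * d_th1, th2 + lr * d_th2

th1 = th2 = 0.0        # both start at 50% cooperation
for _ in range(2000):
    th1, th2 = lola_step(th1, th2)
print("cooperation probabilities:", sigmoid(th1), sigmoid(th2))
```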
The promotion of cooperation on spatial lattices is an important issue in evolutionary game theory. The effect clearly depends on the update rule: it diminishes with stochastic imitative rules whereas it increases with unconditional imitation. To study the transition between these two regimes, we propose a new evolutionary rule, which stochastically combines unconditional imitation with another imitative rule. We find that, surprisingly, in many social dilemmas this rule yields higher levels of cooperation than either of the two original ones. This nontrivial effect occurs because the basic rules induce a separation of timescales in the microscopic processes at cluster interfaces. The result is robust in the space of 2x2 symmetric games, on regular lattices and on scale-free networks.
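For illustration, the sketch below stochastically mixes unconditional imitation with a Fermi-type pairwise comparison rule on a square lattice for a weak prisoner's dilemma. The choice of the stochastic component, the payoff values, the lattice size, and the mixing probability are assumptions made for this example, not the exact construction of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

L, b = 30, 1.6       # lattice side and temptation payoff (weak PD: R=1, S=P=0, T=b)
q, K = 0.5, 0.1      # probability of unconditional imitation, Fermi noise (hypothetical)
strategy = rng.integers(0, 2, size=(L, L))   # 1 = cooperate, 0 = defect

def payoffs(s):
    """Accumulated payoff of each site against its four von Neumann neighbours."""
    total = np.zeros_like(s, dtype=float)
    for shift in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        n = np.roll(s, shift, axis=(0, 1))
        total += s * n * 1.0 + (1 - s) * n * b   # C vs C -> 1, D vs C -> b, else 0
    return total

for step in range(100):
    pay = payoffs(strategy)
    new = strategy.copy()
    for i in range(L):
        for j in range(L):
            neigh = [((i + di) % L, (j + dj) % L)
                     for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            if rng.random() < q:
                # Unconditional imitation: copy the best-scoring site among self and neighbours.
                best = max(neigh + [(i, j)], key=lambda idx: pay[idx])
                new[i, j] = strategy[best]
            else:
                # Stochastic imitation (Fermi rule) of one randomly chosen neighbour.
                k = neigh[rng.integers(4)]
                if rng.random() < 1.0 / (1.0 + np.exp((pay[i, j] - pay[k]) / K)):
                    new[i, j] = strategy[k]
    strategy = new

print("final cooperation level:", strategy.mean())
```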