In the last few decades, numerous experiments have shown that humans do not always behave so as to maximize their material payoff. Cooperative behavior when non-cooperation is a dominant strategy (with respect to the material payoffs) is particularly puzzling. Here we propose a novel approach to explain cooperation, assuming what Halpern and Pass call translucent players. Typically, players are assumed to be opaque, in the sense that a deviation by one player in a normal-form game does not affect the strategies used by other players. But a player may believe that if he switches from one strategy to another, the fact that he chooses to switch may be visible to the other players. For example, if he chooses to defect in the Prisoner's Dilemma, the other player may sense his guilt. We show that by assuming translucent players, we can recover many of the regularities observed in human behavior in well-studied games such as the Prisoner's Dilemma, the Traveler's Dilemma, Bertrand Competition, and the Public Goods game.
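As a toy illustration of the mechanism, suppose a player in the Prisoner's Dilemma believes that a switch from cooperation to defection would be detected with probability alpha, and that a detected switch would be answered with defection. The payoff values and detection probabilities in the sketch below are illustrative assumptions, not figures from the paper.

```python
# Toy Prisoner's Dilemma with a translucent player.
# Payoffs (illustrative): R = mutual cooperation, T = temptation,
# S = sucker's payoff, P = mutual defection, with T > R > P > S.
R, T, S, P = 3, 5, 0, 1

def expected_payoffs(alpha):
    """Expected payoffs for a player who currently intends to cooperate
    against a cooperating opponent, where a switch to defection is
    detected (and answered with defection) with probability alpha."""
    stay_cooperate = R                              # opponent keeps cooperating
    switch_defect = (1 - alpha) * T + alpha * P     # detected switch triggers defection
    return stay_cooperate, switch_defect

for alpha in (0.0, 0.25, 0.5, 0.75):
    c, d = expected_payoffs(alpha)
    print(f"alpha={alpha:.2f}: cooperate={c}, defect={d:.2f}, "
          f"cooperation rational: {c >= d}")
```

With these payoffs, cooperation becomes rational once $\alpha \geq (T - R)/(T - P) = 0.5$; translucency thus turns cooperation into a best response for sufficiently detectable deviations.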
A traditional assumption in game theory is that players are opaque to one another -- if a player changes strategies, then this change does not affect the other players' choices of strategies. In many situations this is an unrealistic assumption. We develop a framework for reasoning about games where the players may be translucent to one another; in particular, a player may believe that if she were to change strategies, then the other players would also change strategies. Translucent players may achieve significantly more efficient outcomes than opaque ones. Our main result is a characterization of strategies consistent with appropriate analogues of common belief of rationality. Common Counterfactual Belief of Rationality (CCBR) holds if (1) everyone is rational, (2) everyone counterfactually believes that everyone else is rational (i.e., all players $i$ believe that everyone else would still be rational even if $i$ were to switch strategies), (3) everyone counterfactually believes that everyone else counterfactually believes that everyone else is rational, and so on. CCBR characterizes the set of strategies surviving iterated removal of minimax dominated strategies: a strategy $\sigma_i$ is minimax dominated for $i$ if there exists a strategy $\sigma_i'$ for $i$ such that $\min_{\mu_{-i}} u_i(\sigma_i', \mu_{-i}) > \max_{\mu_{-i}} u_i(\sigma_i, \mu_{-i})$.
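A minimal sketch of iterated removal of minimax dominated strategies for two-player games follows, directly from the definition above; the bimatrix representation, the restriction to pure strategies, and the Prisoner's Dilemma example are assumptions for illustration (mixed strategies $\mu_{-i}$ are not enumerated, though the min/max over pure opponent strategies coincides with the min/max over mixtures).

```python
import numpy as np

def iterated_minimax_removal(U1, U2):
    """Iteratively remove minimax dominated pure strategies.
    U1[i, j], U2[i, j]: payoffs to players 1 and 2 when player 1
    plays row i and player 2 plays column j.  A strategy s is minimax
    dominated if some other strategy s' has min-over-opponents payoff
    strictly above s's max-over-opponents payoff."""
    rows = list(range(U1.shape[0]))
    cols = list(range(U1.shape[1]))
    changed = True
    while changed:
        changed = False
        # Player 1: compare rows on the surviving columns.
        for s in rows[:]:
            row_max = U1[s, cols].max()
            if any(U1[t, cols].min() > row_max for t in rows if t != s):
                rows.remove(s)
                changed = True
        # Player 2: compare columns on the surviving rows.
        for s in cols[:]:
            col_max = U2[rows, s].max()
            if any(U2[rows, t].min() > col_max for t in cols if t != s):
                cols.remove(s)
                changed = True
    return rows, cols

# Prisoner's Dilemma: row/column 0 = cooperate, 1 = defect.
U1 = np.array([[3, 0], [5, 1]])
U2 = U1.T
print(iterated_minimax_removal(U1, U2))  # -> ([0, 1], [0, 1]): both survive
```

On the Prisoner's Dilemma payoffs shown, nothing is removed: both cooperation and defection survive, which is how CCBR leaves room for cooperative play.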
Matrix games like the Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: (1) a fruit Gathering game and (2) a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.
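The learning setup can be miniaturized as follows: each agent runs its own independent learner and updates only from its own reward. The tabular Q-learners and the iterated Prisoner's Dilemma (with the previous joint action as state) below are a stand-in sketch; the paper itself uses deep Q-networks on the gridworld Gathering and Wolfpack games.

```python
import random

R, T, S, P = 3, 5, 0, 1
PAYOFF = {('C', 'C'): (R, R), ('C', 'D'): (S, T),
          ('D', 'C'): (T, S), ('D', 'D'): (P, P)}
ACTIONS = ['C', 'D']
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def make_q():
    # State = previous joint action (or 'start'); one Q-table per agent,
    # a tabular stand-in for each agent's independent deep Q-network.
    states = ['start'] + [a + b for a in ACTIONS for b in ACTIONS]
    return {s: {a: 0.0 for a in ACTIONS} for s in states}

def act(q, s):
    # Epsilon-greedy action selection.
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(q[s], key=q[s].get)

q1, q2 = make_q(), make_q()
state = 'start'
for _ in range(50000):
    a1, a2 = act(q1, state), act(q2, state)
    r1, r2 = PAYOFF[(a1, a2)]
    nxt = a1 + a2
    # Independent learning: each agent updates from its own reward only.
    for q, a, r in ((q1, a1, r1), (q2, a2, r2)):
        best = max(q[nxt].values())
        q[state][a] += ALPHA * (r + GAMMA * best - q[state][a])
    state = nxt
```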
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
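The social preference in question is the Fehr-Schmidt inequity-aversion utility, which penalizes both disadvantageous inequity (envy) and advantageous inequity (guilt). The sketch below shows the stateless form with illustrative coefficients; the paper's Markov-game extension applies it to temporally smoothed rewards, which is omitted here for brevity.

```python
def inequity_averse_utility(rewards, i, alpha=5.0, beta=0.05):
    """Fehr-Schmidt utility for agent i given all agents' rewards.
    alpha penalizes disadvantageous inequity (others earn more),
    beta penalizes advantageous inequity (agent i earns more).
    The coefficient values here are illustrative."""
    n = len(rewards)
    r_i = rewards[i]
    envy = sum(max(r_j - r_i, 0) for j, r_j in enumerate(rewards) if j != i)
    guilt = sum(max(r_i - r_j, 0) for j, r_j in enumerate(rewards) if j != i)
    return r_i - (alpha / (n - 1)) * envy - (beta / (n - 1)) * guilt

# Example: agent 0 earns 10 while the other two earn 2 each;
# guilt lowers its subjective reward: 10 - (0.05/2)*16 = 9.6.
print(inequity_averse_utility([10.0, 2.0, 2.0], i=0))
```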
Theoretical models suggest that social networks influence the evolution of cooperation, but to date there have been few experimental studies. Observational data suggest that a wide variety of behaviors may spread in human social networks, but subjects in such studies can choose to befriend people with similar behaviors, posing difficulty for causal inference. Here, we exploit a seminal set of laboratory experiments that originally showed that voluntary costly punishment can help sustain cooperation. In these experiments, subjects were randomly assigned to a sequence of different groups in order to play a series of single-shot public goods games with strangers; this feature allowed us to draw networks of interactions to explore how cooperative and uncooperative behavior spreads from person to person to person. We show that, in both an ordinary public goods game and in a public goods game with punishment, focal individuals are influenced by fellow group members' contribution behavior in future interactions with other individuals who were not a party to the initial interaction. Furthermore, this influence persists for multiple periods and spreads up to three degrees of separation (from person to person to person to person). The results suggest that each additional contribution a subject makes to the public good in the first period is tripled over the course of the experiment by other subjects who are directly or indirectly influenced to contribute more as a consequence. These are the first results to show experimentally that cooperative behavior cascades in human social networks.
Exploring the possible consequences of spatial reciprocity for the evolution of cooperation is an intensively studied research avenue. Related works have assumed a certain interaction graph of competing players and studied how particular topologies may influence the dynamical behavior. In this paper we apply a numerically more demanding off-lattice population approach, which could be especially relevant in microbiological environments. As expected, the results are conceptually similar to those obtained for lattice-type interaction graphs, but some spectacular differences can also be revealed. On the one hand, in off-lattice populations spatial reciprocity may work more efficiently than in a lattice-based system. On the other hand, competing strategies may separate from each other in continuous space, which gives cooperators a chance to survive even at relatively high temptation values. Furthermore, the lack of a strict neighborhood structure results in soft borders between competing patches, which jeopardizes the long-term stability of homogeneous domains. We survey the major social dilemma games based on pair interactions of players and reveal all analogies and differences compared to on-lattice simulations.
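A minimal sketch of an off-lattice setup of this kind: players sit at random positions in a continuous box, interact with everyone within a fixed radius, and occasionally imitate a more successful neighbor. The weak Prisoner's Dilemma parametrization and the deterministic imitation rule are common modeling choices assumed here for illustration, not necessarily the exact protocol of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, RADIUS, B = 400, 20.0, 1.0, 1.4   # players, box size, interaction radius, temptation

pos = rng.random((N, 2)) * L            # continuous positions, no lattice
coop = rng.random(N) < 0.5              # True = cooperator, False = defector

def payoff(i, neighbors):
    # Weak Prisoner's Dilemma: R=1, S=0, T=B, P=0, summed over pairwise games.
    total = 0.0
    for j in neighbors:
        if coop[i]:
            total += 1.0 if coop[j] else 0.0
        else:
            total += B if coop[j] else 0.0
    return total

for step in range(200):
    # Neighborhoods are defined by metric distance, not lattice adjacency.
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    nbrs = [np.flatnonzero((d[i] < RADIUS) & (d[i] > 0)) for i in range(N)]
    pay = np.array([payoff(i, nbrs[i]) for i in range(N)])
    i = rng.integers(N)
    if len(nbrs[i]):
        j = rng.choice(nbrs[i])
        if pay[j] > pay[i]:             # imitate a more successful neighbor
            coop[i] = coop[j]
print(f"cooperator fraction: {coop.mean():.2f}")
```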