We present insights and empirical results from an extensive numerical study of the evolutionary dynamics of the iterated prisoner's dilemma. Fixation probabilities for Moran processes are obtained for all pairs of 164 different strategies, including classics such as TitForTat, zero-determinant strategies, and many more sophisticated strategies. Players with long memories and sophisticated behaviours outperform many strategies that perform well in a two-player setting. Moreover, we introduce several strategies trained with evolutionary algorithms to excel at the Moran process. These strategies are excellent invaders and resistors of invasion, and in some cases naturally evolve handshaking mechanisms to resist invasion. The best invaders were those trained to maximize total payoff, while the best resistors invoke handshake mechanisms. This suggests that while maximizing individual payoff can lead to the evolution of cooperation through invasion, payoff-maximizing strategies resist invasion relatively weakly and are not as evolutionarily stable as strategies employing handshake mechanisms.
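As a rough illustration of what a fixation probability is, the sketch below evaluates the standard birth-death formula for a Moran process with frequency-dependent fitness, where a single mutant of type A appears in a population of N - 1 residents of type B. This is not necessarily how the study obtained its values. The function name `fixation_probability` and the payoff values are hypothetical, payoffs are assumed to be positive integers (or `Fraction` instances), and fitness is assumed to equal the average payoff against the rest of the population.

```python
from fractions import Fraction


def fixation_probability(a, b, c, d, N):
    """Probability that one mutant A takes over N - 1 residents B.

    a: payoff of A against A, b: A against B,
    c: payoff of B against A, d: B against B (all assumed positive).
    """
    total = Fraction(1)
    product = Fraction(1)
    for i in range(1, N):
        # Average payoffs when i mutants are present, excluding self-play.
        f = Fraction(a * (i - 1) + b * (N - i), N - 1)  # mutant fitness
        g = Fraction(c * i + d * (N - i - 1), N - 1)    # resident fitness
        # Ratio of "lose one mutant" to "gain one mutant" transition rates.
        product *= g / f
        total += product
    return 1 / total


# Neutral drift: every payoff equal, so the result is exactly 1/N.
print(fixation_probability(1, 1, 1, 1, 5))

# Illustrative totals for a TitForTat mutant among Defectors over 200 turns
# with payoffs (R, S, T, P) = (3, 0, 5, 1); these numbers are an assumption.
print(float(fixation_probability(600, 199, 204, 200, 10)))
```

Exact rational arithmetic keeps the neutral case exactly equal to 1/N, which is a convenient sanity check before plugging in payoffs measured from actual matches.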
The Axelrod library is an open source Python package that allows for reproducible game theoretic research into the Iterated Prisoner's Dilemma. This area of research began in the 1980s but suffers from a lack of documentation and test code. The goal of the library is to provide such a resource, with facilities for the design of new strategies and interactions between them, as well as conducting tournaments and ecological simulations for populations of strategies. With a growing collection of 139 strategies, the library is also a platform for an original tournament that, in itself, is of interest to the game theoretic community. This paper describes the Iterated Prisoner's Dilemma, the Axelrod library and its development, and insights gained from some novel research.
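To make the notion of a tournament concrete, here is a minimal round-robin sketch in plain Python. It deliberately does not use the Axelrod library's API: the helper names (`play_match`, `round_robin`) and the convention that a strategy is a function of the two move histories are assumptions made for illustration only. The payoffs are the conventional (R, S, T, P) = (3, 0, 5, 1).

```python
from itertools import combinations

# Payoff to the first player, keyed by (own move, opponent move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def cooperator(own, opp):
    return "C"


def defector(own, opp):
    return "D"


def tit_for_tat(own, opp):
    # Cooperate first, then copy the opponent's previous move.
    return opp[-1] if opp else "C"


def play_match(s1, s2, turns):
    """Play one iterated match and return the two total scores."""
    h1, h2 = [], []
    score1 = score2 = 0
    for _ in range(turns):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        h1.append(m1)
        h2.append(m2)
        score1 += PAYOFF[(m1, m2)]
        score2 += PAYOFF[(m2, m1)]
    return score1, score2


def round_robin(strategies, turns=10):
    """Every strategy plays every other strategy once; return total scores."""
    totals = {name: 0 for name in strategies}
    for n1, n2 in combinations(strategies, 2):
        s1, s2 = play_match(strategies[n1], strategies[n2], turns)
        totals[n1] += s1
        totals[n2] += s2
    return totals


players = {"Cooperator": cooperator, "Defector": defector, "TitForTat": tit_for_tat}
print(round_robin(players, turns=10))
```

With only three players the defector benefits from the unconditional cooperator; richer populations, such as the library's collection, are what make tournament rankings informative.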
In the evolutionary Prisoner's Dilemma (PD) game, agents play with each other and update their strategies in every generation according to some microscopic dynamical rule. In its spatial version, agents do not play with every other agent but, instead, interact only with their neighbors, thus mimicking the existence of a social or contact network that defines who interacts with whom. In this work, we explore evolutionary, spatial PD systems consisting of two types of agents, each with a certain update (reproduction, learning) rule. We investigate two different scenarios: in the first case, update rules remain fixed for the entire evolution of the system; in the second case, agents update both strategy and update rule in every generation. We show that in a well-mixed population the evolutionary outcome is always full defection. We subsequently focus on two-strategy competition with nearest-neighbor interactions on the contact network and synchronized update of strategies. Our results show that, for an important range of the parameters of the game, the final state of the system is largely different from that arising from the usual setup of a single, fixed dynamical rule. Furthermore, the results also differ markedly depending on whether update rules are fixed or evolve with the strategies. In this respect, we have studied representative update rules, finding that some of them may become extinct while others prevail. We describe the new and rich variety of final outcomes that arise from this co-evolutionary dynamics. We include examples of other neighborhoods and asynchronous updating that confirm the robustness of our conclusions. Our results pave the way to an evolutionary rationale for modelling social interactions through game theory with a preferred set of update rules.
We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.
Since the introduction of zero-determinant strategies, extortionate strategies have received considerable interest. While an interesting class of strategies, the definitions of extortionate strategies are algebraically rigid, apply only to memory-one strategies, and require complete knowledge of a strategy (memory-one cooperation probabilities). We describe a method to detect extortionate behaviour from the history of play of a strategy. When applied to a corpus of 204 strategies, this method detects extortionate behaviour in well-known extortionate strategies as well as others that do not fit the algebraic definition. The highest performing strategies in this corpus are able to exhibit selectively extortionate behaviour, cooperating with strong strategies while exploiting weaker strategies, which no memory-one strategy can do. These strategies emerged from an evolutionary selection process, and their existence contradicts widely repeated folklore in the evolutionary game theory literature: in fact, complex strategies can be extraordinarily effective, zero-determinant strategies can be outperformed by non-zero-determinant strategies, and longer memory strategies are able to outperform short memory strategies. Moreover, while resistance to extortion is critical for the evolution of cooperation, the extortion of weak opponents need not prevent cooperation between stronger opponents, and this adaptability may be crucial to maintaining cooperation in the long run.
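One plausible first step when working from a history of play, sketched below, is to estimate the four memory-one cooperation probabilities empirically, that is, how often a player cooperated after each of the previous-round outcomes CC, CD, DC and DD. This is only an assumed illustration, not necessarily the detection method described above; the function name `estimate_memory_one` is hypothetical.

```python
from collections import Counter


def estimate_memory_one(own, opp):
    """Estimate P(cooperate | previous round) from two move histories.

    own, opp: equal-length sequences of "C"/"D" moves.
    Returns a dict keyed by (own previous move, opponent previous move);
    states that never occurred map to None rather than a guessed value.
    """
    states = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]
    seen = Counter()
    cooperated = Counter()
    for t in range(1, len(own)):
        state = (own[t - 1], opp[t - 1])
        seen[state] += 1
        if own[t] == "C":
            cooperated[state] += 1
    return {s: (cooperated[s] / seen[s] if seen[s] else None) for s in states}


# Tit-for-tat's moves against an opponent that alternates C and D.
opponent = list("CDCDCD")
player = list("CCDCDC")
print(estimate_memory_one(player, opponent))
```

A short history leaves some states unobserved, which is why the sketch returns `None` for them; any downstream comparison against an algebraic definition would have to account for that missing information.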
We study the problem of the emergence of cooperation in the spatial Prisoner's Dilemma. The pioneering work by Nowak and May showed that large initial populations of cooperators can survive and sustain cooperation in a square lattice with imitate-the-best evolutionary dynamics. We revisit this problem in a cost-benefit formulation suitable for a number of biological applications. We show that if a fixed-amount reward is established for cooperators to share, a single cooperator can invade a population of defectors and form structures that are resilient to re-invasion even if the reward mechanism is turned off. We discuss analytically the case of the invasion by a single cooperator and present agent-based simulations for small initial fractions of cooperators. Large cooperation levels, in the sustainability range, are found. In the conclusions, we discuss possible applications of this model as well as its connections with other mechanisms proposed to promote the emergence of cooperation.
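The imitate-the-best dynamics on a square lattice can be sketched as follows. The details are assumptions made for illustration, not the model's exact specification: a periodic lattice of at least 3 x 3 sites, the eight surrounding sites as neighbours, a donation-game payoff in which a cooperator pays a cost `c` per neighbour and each neighbour of a cooperator receives a benefit `b`, ties resolved in favour of keeping one's own strategy, and no shared reward term. The function name `step` is hypothetical.

```python
def step(grid, b, c):
    """One synchronous imitate-the-best update on a periodic square lattice.

    grid[r][k] is 1 for a cooperator and 0 for a defector.
    """
    n, m = len(grid), len(grid[0])
    offsets = [(dr, dk) for dr in (-1, 0, 1) for dk in (-1, 0, 1)
               if (dr, dk) != (0, 0)]

    def neighbours(r, k):
        return [((r + dr) % n, (k + dk) % m) for dr, dk in offsets]

    # Payoff: b from each cooperating neighbour; cooperators pay c per neighbour.
    payoff = [[0.0] * m for _ in range(n)]
    for r in range(n):
        for k in range(m):
            nbrs = neighbours(r, k)
            coop_nbrs = sum(grid[x][y] for x, y in nbrs)
            payoff[r][k] = b * coop_nbrs - c * len(nbrs) * grid[r][k]

    # Each site copies the best-scoring site among itself and its neighbours;
    # listing the site itself first means it keeps its strategy on a tie.
    new = [[0] * m for _ in range(n)]
    for r in range(n):
        for k in range(m):
            candidates = [(r, k)] + neighbours(r, k)
            best = max(candidates, key=lambda s: payoff[s[0]][s[1]])
            new[r][k] = grid[best[0]][best[1]]
    return new


# A lone cooperator in a 5 x 5 population of defectors.
lattice = [[0] * 5 for _ in range(5)]
lattice[2][2] = 1
after = step(lattice, b=2, c=1)
print(sum(map(sum, after)))  # number of cooperators after one update
```

Under these assumed payoffs the lone cooperator earns less than every one of its neighbours and is replaced after a single update, which gives a sense of why an additional mechanism, such as the shared reward discussed above, is needed for a single cooperator to invade.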