# From extortion to generosity, evolution in the Iterated Prisoner’s Dilemma

Edited* by William H. Press, University of Texas at Austin, Austin, TX, and approved July 25, 2013 (received for review April 3, 2013)

## Significance

Cooperative behavior seems at odds with the Darwinian principle of survival of the fittest, yet cooperation is abundant in nature. Scientists have used the Prisoner’s Dilemma game, in which players must choose to cooperate or defect, to study the emergence and stability of cooperation. Recent work has uncovered a remarkable class of extortion strategies that provide one player a disproportionate payoff when facing an unwitting opponent. Extortion strategies perform very well in head-to-head competitions, but they fare poorly in large, evolving populations. Instead, we identify a closely related set of generous strategies, which cooperate with others and forgive defection, that replace extortionists and dominate in large populations. Our results help to explain the evolution of cooperation.

## Abstract

Recent work has revealed a new class of “zero-determinant” (*ZD*) strategies for iterated, two-player games. *ZD* strategies allow a player to unilaterally enforce a linear relationship between her score and her opponent’s score, and thus to achieve an unusual degree of control over both players’ long-term payoffs. Although originally conceived in the context of classical two-player game theory, *ZD* strategies also have consequences in evolving populations of players. Here, we explore the evolutionary prospects for *ZD* strategies in the Iterated Prisoner’s Dilemma (IPD). Several recent studies have focused on the evolution of “extortion strategies,” a subset of *ZD* strategies, and have found them to be unsuccessful in populations. Nevertheless, we identify a different subset of *ZD* strategies, called “generous *ZD* strategies,” that forgive defecting opponents but nonetheless dominate in evolving populations. For all but the smallest population sizes, generous *ZD* strategies are not only robust to being replaced by other strategies but can selectively replace any noncooperative *ZD* strategy. Generous strategies can be generalized beyond the space of *ZD* strategies, and they remain robust to invasion. When evolution occurs on the full set of all IPD strategies, selection disproportionately favors these generous strategies. In some regimes, generous strategies outperform even the most successful of the well-known IPD strategies, including win-stay-lose-shift.

Press and Dyson (1) recently revealed a remarkable class of strategies, called “zero-determinant” (*ZD*) strategies, for iterated two-player games. *ZD* strategies are of particular interest in the Iterated Prisoner’s Dilemma (IPD), the canonical game used to study the emergence of cooperation among rational individuals (2–9). By allowing a player to unilaterally enforce a linear relationship between her payoff and her opponent’s payoff, Press and Dyson (1) argue, *ZD* strategies provide a sentient player unprecedented control over the long-term outcome of IPD games. In particular, Press and Dyson (1) highlighted a subset of *ZD* strategies, called “extortion strategies,” that grants the extorting player a disproportionately high payoff when employed against a naive opponent who blindly adjusts his strategy to maximize his own payoff.

A natural response to Press and Dyson (1) is to ask: What are the implications of *ZD* strategies for an evolving population of players (10)? Although several recent studies have begun to explore this question (11, 12), they have focused almost exclusively on extortion strategies. Extortion strategies are not successful in evolving populations unless the population size is very small. Like all strategies that prefer to defect rather than to cooperate, extortion strategies are vulnerable to strategies that reward cooperation but punish defection. However, there is more to *ZD* strategies than just extortion, and recent work has uncovered some *ZD* strategies that promote cooperation in two-player games (10, 13). Here, we consider the full range of *ZD* strategies in a population setting and show that when it comes to evolutionary success, it is generosity, not extortion, that rules.

We begin our analysis by considering populations restricted to the space of *ZD* strategies. We show that evolution within *ZD* always leads to a special subset of strategies, which we call “generous” *ZD*. Generous *ZD* strategies reward cooperation but punish defection only mildly, and they tend to score lower payoffs than those of defecting opponents. Next, we build on recent work by Akin (13), who identified generous strategies beyond those contained within *ZD*. We demonstrate that a large proportion of these generous strategies are robust to replacement in an evolving population. At worst, the robust generous strategies can be replaced neutrally. Conversely, we demonstrate that most generous strategies can readily replace resident nongenerous strategies in a population. As a result, generous strategies are as successful as, and sometimes more successful than, the most successful of the well-known IPD strategies in evolving populations. Finally, we show that populations evolving on the full set of IPD strategies spend a disproportionate amount of time near generous strategies, indicating that they are favored by evolution.

## Methods and Results

In the Prisoner’s Dilemma, two players, *X* and *Y*, must simultaneously choose whether to cooperate (*c*) or defect (*d*). If both players cooperate (*cc*), they each receive payoff *R*. If *X* cooperates and *Y* defects (*cd*), *X* loses out and receives the smallest possible payoff, *S*, whereas *Y* receives the largest possible payoff, *T*. If both players defect (*dd*), both players receive payoff *P*. Payoffs are specified so that the reward for mutual defection is less than the reward for mutual cooperation (i.e., *P* < *R*). It is typically assumed that 2*R* > *T* + *S*, so that it is not possible for the total payoff received by both players to exceed 2*R*. In what follows, we will consider the payoffs *T* = *B*, *R* = *B* − *C*, *P* = 0, and *S* = −*C*, with benefit *B* and cost *C* (*B* > *C* > 0), which comprise the so-called “donation game” (12).

The IPD consists of infinitely many successive rounds of the Prisoner’s Dilemma. Press and Dyson (1) showed that it is sufficient to consider only the space of memory-1 strategies (i.e., strategies that specify the probability of a player cooperating in each round in terms of the payoff she received in the previous round). Memory-1 strategies consist of four probabilities, **p** = (*p*_{cc}, *p*_{cd}, *p*_{dc}, *p*_{dd}), one for each possible outcome of the previous round. In particular, Press and Dyson (1) showed that the long-term payoff to a memory-1 player pitted against an arbitrary opponent is the same as her payoff would be against some other memory-1 opponent. Thus, we limit our analysis to memory-1 players without loss of generality (*Materials and Methods*).
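The long-term payoffs of two memory-1 players can be computed numerically from the four-state Markov chain over round outcomes. The following is a minimal sketch, not the authors' code. It assumes donation-game payoffs with illustrative values `B = 3` and `C = 1`, orders outcomes as (cc, cd, dc, dd) from the focal player's point of view, and approximates the stationary distribution by iterating a "lazy" version of the chain. All function names are hypothetical.

```python
def stationary_distribution(p, q, n_iter=5000):
    """Approximate stationary distribution over (cc, cd, dc, dd), seen from the p-player.

    p and q are memory-1 strategies: cooperation probabilities after each
    outcome, each indexed from its own owner's point of view.
    """
    # The q-player sees the p-player's "cd" as "dc", so swap those two entries.
    q_seen = (q[0], q[2], q[1], q[3])
    # Row i: probabilities of the next outcome given the current outcome i.
    M = [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
         for a, b in zip(p, q_seen)]
    v = [0.25] * 4  # start from the uniform distribution
    for _ in range(n_iter):
        step = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
        # Averaging with the previous vector ("lazy" chain) avoids oscillation
        # for periodic chains while leaving stationary distributions unchanged.
        v = [0.5 * (old + new) for old, new in zip(v, step)]
    return v


def long_term_payoffs(p, q, B=3.0, C=1.0):
    """Return (payoff to p, payoff to q) in the donation game (assumed B, C)."""
    v = stationary_distribution(p, q)
    pay_p = (B - C, -C, B, 0.0)  # R, S, T, P for the p-player
    pay_q = (B - C, B, -C, 0.0)  # the same outcomes from the q-player's side
    s_pq = sum(x * y for x, y in zip(v, pay_p))
    s_qp = sum(x * y for x, y in zip(v, pay_q))
    return s_pq, s_qp
```

For example, `long_term_payoffs((1, 1, 1, 1), (0, 0, 0, 0))` pits an unconditional cooperator against an unconditional defector. For pairs of deterministic strategies the chain may have several stationary distributions, in which case the result depends on the uniform starting point chosen here.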

### Evolutionary Game Theory.

In the context of evolutionary game theory, we consider a population of *N* individuals who are each characterized by a strategy **p**. We say strategy **p** receives long-term payoff *s*_{pq} against an opponent with strategy **q**. The success of a strategy depends on its payoff when pitted against all individuals in the population (14–17). Traditionally, the evolutionary outcome in such a population has been understood in terms of evolutionarily stable strategies (ESSs). A strategy **p** is an ESS if its long-term payoffs satisfy *s*_{pp} > *s*_{qp}, or *s*_{pp} = *s*_{qp} and *s*_{pq} > *s*_{qq}, for all opponents **q** ≠ **p**.

The ESS condition provides a useful notion of stability in the context of an infinite population. However, in a finite population, the concept must be generalized to consider whether selection favors both invasion and replacement of a resident strategy by a mutant strategy (18, 19). In a finite, homogeneous population of size *N*, a newly introduced neutral mutation (i.e., a mutation that does not change the payoff to either player) will eventually replace the entire population with probability 1/*N*. A deleterious mutation, which is opposed by selection, will fix with probability *ρ* < 1/*N*, whereas an advantageous mutation, which is favored by selection, will fix with probability *ρ* > 1/*N*. We say that a resident strategy **p** in a finite population of size *N* is “evolutionary robust” against a mutant strategy **q** if the probability of replacement satisfies *ρ* ≤ 1/*N*; in other words, the robust strategy cannot be selectively replaced by the mutant strategy. In the limit of infinite population size, *N* → ∞, the condition reduces to the ESS condition.

When selection is weak (*Materials and Methods*), we can write down an explicit criterion for robustness: A resident *Y* is evolutionary robust against a mutant *X* if and only if the condition given as Eq. **1** holds, where we denote the long-term payoff of player *X* against player *Y* by *s*_{xy}. We restrict our analysis to memory-1 players. In the two-player setting, this restriction does not sacrifice generality because, as per Press and Dyson (1), the payoff received by a memory-1 strategy *Y* can be determined independent of an opponent’s memory. However, in an evolutionary setting, *Y*’s success depends also on the payoff her opponent receives against himself. Nonetheless, we will show that our results for generous strategies hold against all opponents, no matter how long their memories, provided the standard IPD assumption 2*R* > *T* + *S* holds.

### Zero-Determinant Strategies, Extortion, and Generosity.

Among the space of all memory-1 IPD strategies, Press and Dyson (1) identified a subspace of *ZD* strategies that ensure a fixed, linear relationship between two players’ long-term payoffs. If player *Y* facing player *X* employs a *ZD* strategy of the form given in Eq. **2**, their payoffs will satisfy a linear relationship. The parameters *χ* and *κ* must lie in the range and to produce a feasible strategy. Eq. **2** defines the full space of *ZD* strategies introduced by Press and Dyson (1). Within this space, two particular subsets are of special interest: the extortion strategies, described by Press and Dyson (1), for which and , and the generous strategies, described in our commentary (10), for which and .
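To make the idea of a unilaterally enforced linear relationship concrete, the sketch below builds a memory-1 strategy from three parameters (*χ*, *κ*, *φ*) and checks numerically that the two long-term payoffs fall on a line. The parameterization is an assumption: it is reconstructed from the Press and Dyson construction under the convention that *Y*'s outcomes are ordered (cc, cd, dc, dd) from her own point of view and that the enforced line is s_xy − κ = χ(s_yx − κ). It is not copied from Eq. **2**, whose sign and labeling conventions may differ. The donation-game values `B = 3`, `C = 1` and all function names are illustrative.

```python
def zd_strategy(chi, kappa, phi, B=3.0, C=1.0):
    """Memory-1 strategy (p_cc, p_cd, p_dc, p_dd) assumed to enforce
    s_xy - kappa = chi * (s_yx - kappa) against any opponent X."""
    R, S, T, P = B - C, -C, B, 0.0
    p = (
        1 - phi * (1 - chi) * (R - kappa),
        1 - phi * (chi * (kappa - S) + (T - kappa)),
        phi * (chi * (T - kappa) + (kappa - S)),
        phi * (1 - chi) * (kappa - P),
    )
    if not all(0.0 <= x <= 1.0 for x in p):
        raise ValueError("parameters do not give probabilities in [0, 1]")
    return p


def long_term_payoffs(p, q, B=3.0, C=1.0, n_iter=5000):
    """Return (payoff to p, payoff to q) from the stationary outcome distribution."""
    q_seen = (q[0], q[2], q[1], q[3])  # q indexes outcomes from its own side
    M = [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
         for a, b in zip(p, q_seen)]
    v = [0.25] * 4
    for _ in range(n_iter):
        step = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
        v = [0.5 * (old + new) for old, new in zip(v, step)]  # lazy iteration
    s_pq = v[0] * (B - C) - v[1] * C + v[2] * B
    s_qp = v[0] * (B - C) + v[1] * B - v[2] * C
    return s_pq, s_qp


def line_residual(chi, kappa, phi, opponent):
    """Deviation from the assumed line; close to zero if the relation holds."""
    y = zd_strategy(chi, kappa, phi)
    s_yx, s_xy = long_term_payoffs(y, opponent)
    return (s_xy - kappa) - chi * (s_yx - kappa)
```

Under these assumptions, `line_residual(0.5, 2.0, 0.1, (0.7, 0.2, 0.6, 0.3))` tests a candidate generous strategy (*κ* = *R* = 2), and `line_residual(0.5, 0.0, 0.1, (0.7, 0.2, 0.6, 0.3))` a candidate extortion strategy (*κ* = *P* = 0), against the same arbitrary opponent.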

Extortion strategies ensure that either the extortioner *Y* receives a higher long-term payoff than her opponent *X*, *s*_{yx} > *s*_{xy}, or that both players otherwise receive the payoff for mutual defection, *s*_{yx} = *s*_{xy} = *P*. In contrast, generous strategies ensure that both players receive the payoff for mutual cooperation, *s*_{yx} = *s*_{xy} = *R*, or that the generous player *Y* otherwise receives a lower payoff than her opponent, *s*_{yx} < *s*_{xy}.

Recent work has focused on the evolutionary prospects of extortioners (11, 12) and has found that such strategies are unsuccessful, except in very small populations. In fact, as we will show below, selection favors replacement of extortioners by generous strategies, and generous strategies are robust to replacement by extortioners. Moreover, the success of generous strategies persists when evolution proceeds within the full space of IPD strategies.

### Evolution of Generosity Within *ZD* Strategies.

We start by identifying the subset of *ZD* strategies that is evolutionary robust against all IPD strategies in a population of size *N*. Substituting Eq. **2** into Eq. **1** shows that a resident *ZD* strategy *Y* with is robust against any mutant IPD strategy *X* if and only if (*Materials and Methods*). Conversely, provided that , any resident *ZD* strategy *Y* with can be selectively replaced by another strategy, namely, by a *ZD* strategy with and (*Materials and Methods*). Hence, those *ZD* strategies with and are precisely the *ZD* strategies that are evolutionary robust against all IPD strategies. We denote this set of robust *ZD* strategies as *ZD*_{R}:

Here, *φ* is left unconstrained, but it must lie in the range required to produce a feasible strategy, .

The robust *ZD* strategies are what we call “cooperative,” meaning they satisfy *p*_{cc} = 1 (they always cooperate after mutual cooperation). Any cooperative player will agree to mutual cooperation when facing another cooperative player, and so they each receive payoff *R*. If a cooperative strategy further satisfies the condition , we say that the strategy is generous, meaning that any deviation from mutual cooperation causes the generous player’s payoff to decline more than that of her opponent. The robust *ZD* strategies are all generous.

We now consider evolution in a population of players restricted to the space of *ZD* strategies. Because selection favors replacement of any noncooperative *ZD* strategy by some member of *ZD*_{R}, we expect evolution within the space of *ZD* strategies to tend towards generous strategies, and thereafter to remain at generous strategies, because *ZD*_{R} is robust. This expectation is confirmed by Monte Carlo simulations of well-mixed populations of IPD players (Fig. 1). Following Hilbe et al. (12) and Traulsen et al. (20), we modeled evolution as a process in which individuals copy successful strategies with a probability that depends on their relative payoffs (*Materials and Methods*). As Fig. 1 shows, evolution within the set of *ZD* strategies proceeds from extortion ( and ) to generosity ( and ). In fact, even populations initiated with evolve to generosity (Fig. S1).

### Good Strategies.

The generous *ZD* strategies identified above are best understood by comparison with the space of “good” strategies recently introduced by Akin (13). A good strategy stabilizes cooperative behavior in the two-player IPD: By definition, if both players adopt good strategies, each receives payoff *R* and neither player can gain by unilaterally changing strategy. All good strategies are cooperative (i.e., they have *p*_{cc} = 1). Moreover, the generous *ZD* strategies described above are precisely the intersection of good strategies (13) and *ZD* strategies (1) (Fig. 2).

We can identify the space of memory-1 good strategies as those of the form

where and are required to produce a feasible strategy. Sufficient conditions for the set *G* of good strategies are (*SI Text*):

where the parameter *ϕ* is left unconstrained except that it must produce a feasible strategy. Numerics indicate these sufficient conditions are also necessary (*SI Text*). Note that the good strategies with correspond precisely to the generous *ZD* strategies.

It is interesting to note that in addition to tit-for-tat and generous tit-for-tat, which are *ZD*, the set of good strategies contains win-stay-lose-shift, which is widely known as one of the most evolutionarily successful IPD strategies (7). Nonetheless, even though win-stay-lose-shift is good, it is not generous (it has ; Fig. 3). Because it lacks generosity, win-stay-lose-shift can, in fact, be outcompeted in evolving populations, as we shall see below.

### Evolution of Generosity Within Good Strategies.

In this section, we ask which good strategies are evolutionary robust, and we find that the robust good strategies are always generous (i.e., have , regardless of *λ*). In the case of *ZD*, the conditions for evolutionary robustness do not depend on the parameter *ϕ*. Similarly, we will derive conditions for the robustness of good strategies that hold regardless of *ϕ*.

Application of Eq. **1** allows us to derive the conditions for a good strategy to be evolutionary robust against all IPD strategies in a population of size *N* (*SI Text*). The resulting set, *G*_{R}, of evolutionary robust good strategies satisfies

Here, *ϕ* is left unconstrained, except that it must produce a feasible strategy. These analytical conditions for robustness are confirmed by Monte Carlo simulations (Fig. S2). Setting in the equation above recovers the conditions we previously derived for the robustness of *ZD* strategies. As in the case of *ZD*, the robust good strategies are exclusively limited to generous strategies (i.e., strategies with and ; Fig. 3).

Interestingly, the strategy win-stay-lose-shift does not lie within the region of robust good strategies (Fig. 3). As a concrete demonstration of this result, we have identified a specific strategy that selectively replaces win-stay-lose-shift in a finite population (*SI Text* and Fig. S3). Furthermore, even under strong selection, and under increased mutation rates, win-stay-lose-shift can be dominated by some strategies (Fig. S3).

### Evolutionary Success of Generosity.

We have shown that generous strategies are evolutionarily robust, and eventually dominate in a population, when players are confined to the space of *ZD* strategies. We have also shown that among the good strategies, which stabilize cooperative behavior, the evolutionary robust strategies, *G*_{R}, are also generous. To complement these results, we now systematically query the evolutionary success of generous strategies in general by allowing a population to explore the full set of memory-1 strategies and quantifying how much time the population spends near generosity.

Following Hilbe et al. (12) and Imhof and Nowak (21), we performed simulations in the regime of weak mutation, so that the population is monomorphic for a single strategy at all times. Mutant strategies, drawn uniformly from the space [0, 1]^{4}, are proposed at rate *μ*. A proposed mutant either immediately fixes or is immediately lost from the population, according to its fixation probability calculated relative to the current strategy in the population (12, 20). Over the course of this simulation, we quantified how much time the population spends in a *δ*-neighborhood of *ZD*, *ZD*_{R}, *G*, and *G*_{R} strategies, as well as extortion strategies (Fig. 4). The *δ*-neighborhood of a strategy set is defined as those strategies within Euclidean distance *δ* of it, among the space of all memory-1 strategies. If the proportion of time spent in the *δ*-neighborhood is greater than would be expected by random chance (which is proportional to the volume of the *δ*-neighborhood), evolution is said to favor that set of strategies.
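The weak-mutation procedure described above can be sketched as a random walk over resident strategies. This is a minimal illustration under stated assumptions, not the authors' simulation code. It assumes a pairwise-comparison (Fermi) imitation rule for the fixation probability, average payoffs that exclude self-interaction, donation-game payoffs with illustrative values `B = 3` and `C = 1`, and time measured in mutant proposals. It records the resident strategy after each proposal and does not compute distances to the sets *ZD*_{R} or *G*_{R}. All names and parameter values are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def long_term_payoffs(p, q, B=3.0, C=1.0):
    """Return (payoff to p, payoff to q); assumes an ergodic outcome chain."""
    q_seen = (q[0], q[2], q[1], q[3])  # q indexes outcomes from its own side
    M = [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
         for a, b in zip(p, q_seen)]
    # Stationarity v M = v (three independent equations) plus normalization.
    A = [[M[i][j] - (1.0 if i == j else 0.0) for i in range(4)]
         for j in range(3)]
    A.append([1.0] * 4)
    v = solve(A, [0.0, 0.0, 0.0, 1.0])
    s_pq = v[0] * (B - C) - v[1] * C + v[2] * B
    s_qp = v[0] * (B - C) + v[1] * B - v[2] * C
    return s_pq, s_qp


def fixation_probability(s_xx, s_xy, s_yx, s_yy, N, sigma):
    """Fixation probability of a single X mutant among N - 1 residents Y."""
    total, product = 1.0, 1.0
    for j in range(1, N):  # j = current number of X players
        pi_x = ((j - 1) * s_xx + (N - j) * s_xy) / (N - 1)
        pi_y = (j * s_yx + (N - j - 1) * s_yy) / (N - 1)
        product *= math.exp(-sigma * (pi_x - pi_y))
        total += product
    return 1.0 / total


def weak_mutation_walk(steps, N, sigma, seed=0):
    """Resident strategy after each mutant proposal (monomorphic population)."""
    rng = random.Random(seed)
    resident = [rng.random() for _ in range(4)]
    s_yy, _ = long_term_payoffs(resident, resident)
    history = [tuple(resident)]
    for _ in range(steps):
        mutant = [rng.random() for _ in range(4)]
        s_xy, s_yx = long_term_payoffs(mutant, resident)
        s_xx, _ = long_term_payoffs(mutant, mutant)
        rho = fixation_probability(s_xx, s_xy, s_yx, s_yy, N, sigma)
        if rng.random() < rho:  # the mutant fixes; otherwise it is lost
            resident, s_yy = mutant, s_xx
        history.append(tuple(resident))
    return history
```

A call such as `weak_mutation_walk(steps=200, N=20, sigma=1.0, seed=1)` returns 201 resident strategies. Estimating time spent near a strategy set would additionally require a distance function to that set, which is left out of this sketch.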

It is already known that except for very small populations, a population spends far less time near extortion strategies than expected by random chance and that the same is true for the set of all *ZD* strategies (11, 12). Thus, in general, extortion and *ZD* strategies are disfavored by evolution in populations. This has led to the view that *ZD* strategies are of importance only in the setting of classical two-player game theory, and not in evolving populations (11, 22). In Fig. 4, we repeat this analysis but additionally report the *δ*-neighborhoods of *ZD*_{R}, *G*, and *G*_{R} strategies. We find that, except in very small populations, selection strongly favors *G*, *G*_{R}, and especially *ZD*_{R} strategies. In particular, the population spends more than 100-fold longer in the neighborhood of *ZD*_{R} strategies than expected by random chance. Thus, *ZD* contains a subset of strategies that is remarkably successful in evolving populations, in contrast to the claims of Adami and Hintze (11).

We also analyzed the time spent near each individual good strategy, under both weak and strong selection. We found that the strategies most strongly favored by selection are virtually all generous (Fig. S4). The remaining good strategies are typically moderately favored by selection, with the exception of those near win-stay-lose-shift, which are also strongly favored.

### Success of Generous Strategies Against Classic IPD Strategies.

To complement the weak-mutation studies described above, we also compared the performance of generous *ZD* strategies against several classic IPD strategies, in a finite population of players (12, 18–20), assuming either strong or weak mutation (i.e., high or low mutation rates). We performed Monte Carlo simulations of populations constrained to different subsets of strategies, similar to those of Hilbe et al. (12). In these simulations, a pair of individuals is chosen from the population at each time step, and the first individual copies the strategy of the second with a probability that depends on their respective payoffs (Table S1), as above. Mutations also occur, with probability *μ*, so that the mutated individual randomly adopts another strategy from the set of strategies being considered. We ran simulations at a variety of population sizes, ranging from to .

At very small population sizes, defector strategies tend to dominate (Fig. S5), reflecting the fact that extortion pays in the classic two-player setting (1). However, as the population size increases, good strategies, such as win-stay-lose-shift, and generous *ZD* quickly begin to dominate (Fig. S5). Which strategy does best depends on the population size, the mutation rate, and the set of available strategies (Figs. S5 and S6). In some regimes, generous *ZD* strategies even outperform win-stay-lose-shift (Fig. S5).

## Discussion

We have shown that generous strategies tend to dominate in evolving populations of IPD players. This is a surprising result because, when faced with a defector strategy, generous strategies must, by definition, suffer a greater reduction in payoff than their opponent suffers. One might expect such strategies to be vulnerable to replacement by defector strategies, whereas, in fact, we have shown that the reverse is true. Likewise, one might expect generous strategies to be unsuccessful at displacing resident strategies in a population. However, simulations reveal (Figs. S7 and S8) that most generous strategies can selectively replace almost all other IPD strategies.

How can we account for the remarkable evolutionary success of generosity? First, it is important to note that the most successful generous strategies are not too generous. For example, in a large population, evolutionary robust *ZD* strategies must have ; that is, they must reduce their payoff when faced with a defector opponent but not by too much. Second, although generous strategies score less than defector strategies in head-to-head matches, they are able to limit the difference between their own payoff and their opponent’s payoff (*Materials and Methods*). As a result, they tend to have a consistent probability of replacing a diversity of resident IPD strategies (Fig. S8), allowing them to succeed in an evolutionary setting.

We found that generous *ZD* strategies are particularly successful when mutations arise at an appreciable rate. Under such circumstances, *ZD*_{R} strategies can dominate even win-stay-lose-shift, a perennial favorite in evolving populations (7, 11, 23, 24). Overall, selection strongly favors generous *ZD* strategies when evolution proceeds in the full space of memory-1 strategies. These results strongly contravene the view that *ZD* strategies are of little evolutionary importance (11, 22). In fact, we have shown that a subset of *ZD* strategies, the generous ones, is strongly favored in the evolutionary setting.

The discovery and elegant definition of *ZD* strategies remains a remarkable achievement, especially in light of decades’ worth of prior research on the Prisoner’s Dilemma in both the two-player and evolutionary settings. *ZD* strategies comprise a variety of new ways to play the IPD, and Akin’s generalization of cooperative *ZD* to good strategies (13) provides novel insight into how cooperation between two rational players can be stabilized. However, in an evolutionary setting, among both *ZD* and good strategies, it is the generous ones that are most successful.

## Materials and Methods

### Notation.

For ease of analysis, the parameter *χ* we use throughout is the inverse of that used by Press and Dyson (1). In addition, to avoid confusion with *δ*-neighborhoods, we use *λ* in place of Akin’s *δ* (13).

### Evolutionary Simulations.

We simulated a well-mixed population in which selection follows an “imitation” process (12, 20). At each discrete time step, a pair of individuals is chosen at random. *X* switches its strategy to imitate *Y* with probability :

where and denote the average IPD payoffs of players *X* and *Y* against the entire population and *σ* denotes the strength of selection. When a mutant strategy *X* is introduced to a population otherwise consisting of a resident strategy *Y*, its probability of fixation, *ρ*, is given by

Taylor expansion to first order about *σ* = 0 gives Eq. **1**, the condition for selective replacement of *Y* by *X* under weak selection.
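The relationship between the fixation probability and its weak-selection expansion can be checked numerically. The sketch below is a hedged illustration rather than the paper's exact formulas: it assumes a Fermi imitation function, with *X* copying *Y* with probability 1/(1 + exp[*σ*(π_x − π_y)]), and average payoffs computed over the other *N* − 1 players. Under those assumptions, expanding the fixation probability to first order in *σ* gives the linear margin computed by `weak_selection_margin`; whether this matches Eq. **1** term for term depends on the paper's own conventions. All function names are hypothetical.

```python
import math


def imitation_probability(pi_x, pi_y, sigma):
    """Assumed Fermi rule: probability that X switches to Y's strategy."""
    return 1.0 / (1.0 + math.exp(sigma * (pi_x - pi_y)))


def fixation_probability(s_xx, s_xy, s_yx, s_yy, N, sigma):
    """Fixation probability of a single X mutant among N - 1 residents Y."""
    total, product = 1.0, 1.0
    for j in range(1, N):  # j = current number of X players
        pi_x = ((j - 1) * s_xx + (N - j) * s_xy) / (N - 1)
        pi_y = (j * s_yx + (N - j - 1) * s_yy) / (N - 1)
        # Ratio of the backward to the forward transition probability at state j.
        product *= math.exp(-sigma * (pi_x - pi_y))
        total += product
    return 1.0 / total


def weak_selection_margin(s_xx, s_xy, s_yx, s_yy, N):
    """Positive when the first-order expansion in sigma gives rho > 1/N."""
    return ((N - 2) * s_xx + (2 * N - 1) * s_xy
            - (N + 1) * s_yx - (2 * N - 4) * s_yy)
```

With this margin, a resident *Y* would count as robust against *X* under weak selection when `weak_selection_margin(...) <= 0`. For small `sigma`, the sign of `fixation_probability(...) - 1 / N` should agree with the sign of the margin.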

### Evolutionary Robustness of Cooperative *ZD* Strategies.

Suppose that a resident strategy *Y* is cooperative and *ZD*. We will show that *Y* is evolutionary robust if and only if . From Eq. **1**, we deduce that *Y* is robust against any mutant IPD strategy *X* if their payoffs satisfy

Using Eq. **2** to substitute for yields the equivalent condition

Furthermore, we know that for any mutant *X* (because, otherwise, Eq. **2** would imply that both and exceed , which contradicts the assumption ). Therefore, the cooperative *ZD* strategy *Y* is robust if and only if .
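As a worked sketch of this argument, suppose (as assumptions, since the displayed conditions are not reproduced here) that weak-selection robustness takes the standard pairwise-comparison form and that a cooperative *ZD* resident *Y* enforces $s_{xy} - R = \chi\,(s_{yx} - R)$ with $0 < \chi \le 1$ and $s_{yy} = R$. Then:

```latex
\begin{aligned}
&\text{Assumed robustness condition:}\quad
(N-2)\,s_{xx} + (2N-1)\,s_{xy} \;\le\; (N+1)\,s_{yx} + (2N-4)\,R, \\[4pt]
&\text{Substituting } s_{yx} = R + \tfrac{1}{\chi}\,(s_{xy} - R): \\[2pt]
&\qquad (N-2)\,(s_{xx} - R) \;+\; \Bigl[(2N-1) - \tfrac{N+1}{\chi}\Bigr]\,(s_{xy} - R) \;\le\; 0 .
\end{aligned}
```

Because $s_{xx} \le R$ and $s_{xy} \le R$ when $2R > T + S$, both terms are nonpositive whenever the bracket is nonnegative, that is, whenever $\chi \ge (N+1)/(2N-1)$, a bound that tends to $1/2$ as $N$ grows. This shows only that a lower bound on *χ* suffices under the stated assumptions; the exact threshold and the necessity argument are those of the original derivation.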

### Noncooperative *ZD* Strategies Can Be Selectively Replaced.

Here, we show that a resident *ZD* strategy *Y* with is selectively replaced by a *ZD* strategy *X* with and . Because both players are *ZD*, their payoffs satisfy the equations

which result in the payoff matrix

Substituting these payoffs into Eq. **1** shows that *X* can selectively replace *Y* if

By our assumptions on *X* and *Y*, and . If , inequality **5** is satisfied, and so *X* is selected to replace *Y*. If , to determine the conditions for which *X* is selected to replace *Y*, we make the coordinate transformations and , so that and . The inequality **5** is then satisfied provided

Rearranging this gives

which is hardest to satisfy when is at its minimum (i.e., ). This results in the inequality

as a sufficient condition for *X* to replace *Y* selectively. This sufficient condition is met by our assumption on *X*. Thus, noncooperative *ZD* strategies can always be selectively replaced, provided .

### Generous Strategies Limit the Difference Between Their Payoff and Their Opponent’s Payoff.

Consider Eq. **2** for a generous *ZD* strategy *Y* facing an arbitrary opponent *X*:

Rearranging this expression gives the difference in the players’ payoffs:

Increasing *χ* reduces the difference between two players’ payoffs, regardless of the opponent’s strategy. This is also true for generous good strategies, which satisfy

where denotes the equilibrium rate of the play and the equilibrium rate of the play (13). The ability of generous strategies to limit the difference in payoffs with arbitrary opponents accounts for their remarkable consistency as invaders, as exemplified in Fig. S8. On the other hand, a nongenerous strategy, such as win-stay-lose-shift, is subject to larger differences between one player’s payoff and her opponent’s payoff, leading to less consistent success as an invader (Fig. S8).
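A worked version of the rearrangement for the *ZD* case can be sketched as follows, assuming the generous relation takes the form $s_{xy} - R = \chi\,(s_{yx} - R)$ with $0 < \chi \le 1$. This convention is an assumption, because the displayed equations are not reproduced here.

```latex
\begin{aligned}
s_{xy} - R &= \chi\,(s_{yx} - R) \\
\Longrightarrow\quad s_{xy} - s_{yx} &= (1-\chi)\,(R - s_{yx}) \;\ge\; 0 .
\end{aligned}
```

Under this assumption, for a given shortfall $R - s_{yx}$ of the generous player, the gap between the two payoffs shrinks linearly as *χ* increases and vanishes at $\chi = 1$.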

### Long-Memory Strategies.

Our results for the evolutionary success of generous strategies in a finite population also hold against longer memory opponents. As per Press and Dyson (1), from the perspective of a memory-1 player, a long-memory opponent is equivalent to a memory-1 opponent. Thus, the payoff can be determined by considering only the set of memory-1 strategies. However, the payoff a long-memory opponent receives against itself, *s*_{xx}, may depend on its memory capacity. Nonetheless, under the standard IPD assumption 2*R* > *T* + *S*, the highest total payoff for any pair of players in the IPD is 2*R*; thus, *s*_{xx} ≤ *R*. This condition on *s*_{xx} is the only condition required to derive our results on the robustness of *ZD* and good strategies (*SI Text*), and so our results continue to hold even against long-memory invaders.

## Acknowledgments

We thank William Press, Freeman Dyson, Karl Sigmund, Christian Hilbe, and two anonymous referees for constructive feedback. We gratefully acknowledge support from the Burroughs Wellcome Fund, the David and Lucile Packard Foundation, the James S. McDonnell Foundation, the Alfred P. Sloan Foundation, the Foundational Questions in Evolutionary Biology Fund (Grant RFP-12-16), the US Army Research Office (Grant W911NF-12-1-0552), and Grant D12AP00025 from the US Department of the Interior.

## Footnotes

^{1}To whom correspondence should be addressed. E-mail: jplotkin{at}sas.upenn.edu.

Author contributions: A.J.S. and J.B.P. designed research, performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1306246110/-/DCSupplemental.

Freely available online through the PNAS open access option.

## References

1. Press WH, Dyson FJ
2. Rapoport A, Chammah AM
3. Axelrod R, Hamilton WD
4. Axelrod R
5.
6. Fudenberg D, Maskin E
7.
8.
9.
10. Stewart AJ, Plotkin JB
11. Adami C, Hintze A (2012) Winning isn’t everything: Evolutionary stability of zero determinant strategies. *Nature Communications 4*, 10.1038/ncomms3193.
12. Hilbe C, Nowak MA, Sigmund K
13. Akin E (2012) Stable cooperative solutions for the iterated prisoner’s dilemma. *arXiv*:1211.0969.
14.
15. Maynard Smith J
16. Hofbauer J, Sigmund K
17. Boyd R, Gintis H, Bowles S
18.
19. Nowak MA
20.
21. Imhof LA, Nowak MA
22. Ball P
23.
24.

## Article Classifications

- Biological Sciences
- Evolution

- Social Sciences
- Psychological and Cognitive Sciences