A synergy of costly punishment and commitment in cooperation dilemmas

Abstract

To ensure cooperation in the Prisoner’s Dilemma, individuals may require prior commitments from others, subject to compensations when agreements to cooperate are violated. Alternatively, individuals may prefer to behave reactively, without arranging prior commitments, by simply punishing those who misbehave. Both mechanisms have been shown to promote the emergence of cooperation, yet they do so in complementary ways. Each mechanism has specific limitations, which the other can overcome. On one hand, costly punishment requires an excessive effect-to-cost ratio to be successful, and this ratio can be significantly reduced by arranging a prior commitment with a more limited compensation. On the other hand, commitment-proposing strategies can be suppressed by free-riding strategies that commit only when someone else pays the cost of arranging the deal, and these free-riders can in turn be dealt with more effectively by reactive punishers. Using methods from Evolutionary Game Theory, we present here an analytical model showing that there is a wide range of settings for which the combined strategy outperforms either strategy by itself, leading to significantly higher levels of cooperation. Interestingly, the improvement is most significant when the cost of arranging commitments is sufficiently high and the penalty reaches a certain threshold, thereby overcoming the weaknesses of both mechanisms.

Keywords

Commitment; costly punishment; cooperation; prisoner’s dilemma; evolutionary game theory

1 Introduction

The problem of explaining the evolution of cooperative behaviour has been actively investigated in many fields, from Evolutionary Biology, Ecology and Computer Science to Economics and Social Science. Several mechanisms responsible for the evolution of cooperation have been proposed, including kin and group selection, direct and indirect reciprocity, structured populations, and punishment (Nowak, 2006a, 2006b; Perc, Gómez-Gardeñes, Szolnoki, Floría, & Moreno, 2013; Sigmund, 2010; West, Griffin, & Gardner, 2007). An extensive body of theoretical and experimental evidence has shown that arranging prior commitments with posterior compensations promotes the evolution of cooperation (Cherry & McEvoy, 2013; Gerber & Wichardt, 2009; Han, Moniz Pereira, & Lenaerts, 2015; Han, Pereira, Santos, & Lenaerts, 2013a; Han, Santos, Lenaerts, & Pereira, 2015; Miettinen, 2013; Nesse, 2001; Ostrom, 1990), even when the interaction is one-shot, i.e. not repeated. Arranging prior commitments, such as through enforceable contracts or pledges (X.-P. Chen & Komorita, 1994), deposit-refund schemes (Cherry & McEvoy, 2013; Gerber & Wichardt, 2009; Sasaki, Okada, Uchida, & Chen, 2015) or even emotional or reputation-based commitment devices (Frank, 1988; Nesse, 2001), compels others to cooperate, as it requires them to reveal their preferences or intentions (X.-P. Chen & Komorita, 1994; Han, 2013; Han, Pereira, & Santos, 2011, 2012a; Sterelny, 2012). This behaviour is ubiquitous in various human activities, ranging from personal to group and international relationships (X.-P. Chen & Komorita, 1994; Cherry & McEvoy, 2013; Frank, 1988; Nesse, 2001). For instance, contracts are a popular kind of commitment, playing a key role in enforcing cooperation in modern societies (Nesse, 2001; Skyrms, 1996).
Commitments are also widely studied and utilised in multi-agent and autonomous agent systems, in order to ensure high levels of cooperation among agents (Castelfranchi & Falcone, 2010; Schelling, 1990; Winikoff, 2007; Wooldridge & Jennings, 1999).

Another important mechanism that promotes cooperation in one-shot interactions is costly punishment, where a punisher pays a cost to punish another player who misbehaves (Boyd, Gintis, Bowles, & Richerson, 2003; Egas & Riedl, 2008; Fehr & Gächter, 2000; Fehr & Gachter, 2002; Guala, 2012; Hauert, Traulsen, Brandt, Nowak, & Sigmund, 2007; Henrich et al., 2006; Traulsen, Röhl, & Milinski, 2012; Wu et al., 2009). Unlike commitment, this strategy does not request a prior agreement from the co-player before the interaction. Instead, players reactively punish those who misbehave (provided the wrongdoers can be identified) once the interaction has taken place. Several theoretical and experimental studies have shown that this kind of costly punishment can evolve in well-mixed populations only if it is cost-effective, i.e. when the punished agent suffers a sufficiently larger cost than the punisher pays (Anderson & Putterman, 2006; Boyd et al., 2003; Carpenter, 2007; Egas & Riedl, 2008; Nikiforakis & Normann, 2008; Sigmund, Hauert, & Nowak, 2001; Wu et al., 2009).¹ The evolution of costly punishment has also been shown to be facilitated in structured populations as a result of strategy segregation (Brandt, Hauert, & Sigmund, 2003; Nakamaru & Iwasa, 2005; Perc & Szolnoki, 2012; Szolnoki & Perc, 2013): spatial interactions may enable costly punishers to avoid being exploited by second-order free-riders and to fight defectors more effectively (Helbing, Szolnoki, Perc, & Szabó, 2010a, 2010b).

These two strategies, i.e. commitments and costly punishment, compel others to cooperate in complementary ways. Commitment proposers force participants in a game to reveal their intentions or preferences (X.-P. Chen & Komorita, 1994; Han, Lenaerts, Santos, & Pereira, 2015; Han, Santos, et al., 2015). Yet, even co-players who accept the commitment and behave appropriately may decide not to initiate such agreements themselves, as doing so is costly, and defect when no agreement is established. Especially when the commitment cost is high, this kind of free-rider, which benefits directly from the efforts of commitment-proposing strategies, can dominate (Cherry & McEvoy, 2013; Han et al., 2013a), leading to the collapse of cooperation and social welfare. Punishing strategies do not suffer from this problem: they can effectively deal with the different types of players who free-ride on the investments of commitment proposers, in particular those who defect when no agreement has been established. Yet, as mentioned earlier, in well-mixed populations costly punishment only thrives when there is an excessive effect-to-cost ratio (Anderson & Putterman, 2006; Carpenter, 2007; Egas & Riedl, 2008; Han et al., 2013a; Nikiforakis & Normann, 2008; Wu et al., 2009), which is not the case for commitment-proposing strategies. Arranging a prior agreement on the posterior compensation does not require that compensation to be excessively large relative to the cost of setting up the agreement: as shown in Han, Moniz Pereira, and Lenaerts (2015) and Han et al. (2013a), a significantly smaller fine than that needed for punishment suffices to induce an effect on the level of cooperation. Given this complementarity, we hypothesise that a weighted combination of the two strategies should cope better with free-riding behaviours and, as a consequence, better promote the evolution of cooperation.

Resorting to Evolutionary Game Theory (EGT) methods (Hofbauer & Sigmund, 1998; Maynard-Smith, 1982; Sigmund, 2010), we study the conditions under which a weighted combination of these two strategies may lead to a more favourable outcome for cooperative behaviour in the one-shot Prisoner’s Dilemma (PD) (Sigmund, 2010; Trivers, 1971). In this game, rational choice dictates that defection is the better option for each player, even though both would be better off if both cooperated (see the detailed description in the next section). Consequently, evolutionary game dynamics predicts that under these conditions cooperation disappears (Hardin, 1968; Nowak, 2006a, 2006b; Sigmund, 2010). Here, the synergy of the two strategies, arranging prior commitments and costly punishment, is characterised by a single parameter: the probability with which the combined strategy arranges a commitment rather than relying on punishment in a given interaction. We study, both analytically and through numerical simulations, the range of values of this parameter that allows the combined strategy to overcome the weaknesses of both strategies, thereby leading to a higher level of cooperation than either strategy achieves by itself. Our results show that there is always a wide range of values for this parameter where the synergistic strategy performs better than both strategies independently.

The remainder of this paper is structured as follows. The next section describes the model combining commitment and punishment strategies in the one-shot Prisoner’s Dilemma, and the EGT methods used to analyse the model. Then, in the results section, analytical and numerical results are presented. The paper ends with a discussion and some directions for future work. We also provide a separate Supporting Information (SI) text that contains additional results and supporting analysis, which is referred to in the paper where relevant.

2 Models and methods

We first recall the definition of the Prisoner’s Dilemma game and its extensions with strategies that arrange prior commitments (Han, Pereira, & Santos, 2012b; Han et al., 2013a) or use posterior costly punishment (Boyd et al., 2003; Sigmund et al., 2001). We then describe our model for the synergy of prior commitments and posterior punishment.

2.1 One-shot Prisoner’s Dilemma (PD)

As usual, for the one-shot Prisoner’s Dilemma (PD) game the four possible outcomes resulting from the action choices of the two players can be written down as a symmetric payoff matrix (Sigmund, 2010; Trivers, 1971)

\begin{array}{c|cc} & C & D \\ \hline C & R, R & S, T \\ D & T, S & P, P \end{array}

with the entries of this matrix satisfying the ordering $T > R > P > S$ (Coombs, 1973). The interpretation of these entries is explained as follows: Once the interaction is established and both players have decided to play C (D), both players receive the same reward R (penalty P) for mutual cooperation (mutual defection). Unilateral cooperation provides the sucker’s payoff S for the cooperative player and the temptation to defect T for the defective one. Changing the ordering of the matrix entries will result in different kinds of social dilemmas with their specific Nash Equilibria. For the sake of mathematical simplicity, the Donor game (Sigmund, 2010), a special case of the PD, is sometimes used: $T = b, R = b - c, P = 0, S = - c$ , where b and c correspond to the benefit and cost of cooperation, respectively.
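As a small illustration of the Donor-game parameterisation, the Python sketch below maps a benefit b and a cost c onto (T, R, P, S) and checks the PD ordering. The helper names `donor_game` and `is_prisoners_dilemma` are hypothetical, introduced here only for illustration. With b = 2 and c = 1 it yields the values T = 2, R = 1, P = 0, S = −1 used in the figures.

```python
def donor_game(b, c):
    """Return the PD payoffs (T, R, P, S) of the Donor game with benefit b and cost c."""
    return b, b - c, 0.0, -c


def is_prisoners_dilemma(T, R, P, S):
    """Check the PD ordering T > R > P > S."""
    return T > R > P > S


# b = 2, c = 1 gives T = 2, R = 1, P = 0, S = -1, the payoffs used in Figures 1-3.
T, R, P, S = donor_game(2.0, 1.0)
print((T, R, P, S), is_prisoners_dilemma(T, R, P, S))
```

Any b > c > 0 satisfies the ordering, which is why the Donor game is a convenient special case of the PD.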

2.2 Commitment and costly punishment in the Prisoner’s Dilemma

Before playing the PD, a commitment-proposing player (strategy COMP) proposes to her co-player to commit to the game and cooperate. As arranging agreements or exposing the intentions of others may be costly, the proposer has to pay an arrangement cost $ϵ_{1}$. If the co-player refuses the deal, the game does not take place and both players obtain a payoff of 0. If the co-player agrees, COMP assumes that the opponent will cooperate, yet there is no guarantee that this will actually be the case: if the opponent accepts the commitment but later does not cooperate, she has to compensate the non-defaulting player at a personal cost $δ_{1}$.

In addition to the traditional unconditional cooperators (C, who always commit when a commitment deal is proposed, cooperate whenever the PD is played, but do not propose commitment themselves) and unconditional defectors (D, who do not accept commitment, defect when the PD takes place, and do not propose commitment), we consider two commitment free-riding strategies, which have been shown to become dominant under certain conditions in the pairwise PD (Han et al., 2013a).

Fake committers (FAKE), who accept a commitment proposal yet defect when the PD takes place. These players assume that they can exploit the commitment-proposing players without suffering the consequences.

Commitment free-riders (FREE), who defect unless they are proposed a commitment, in which case they accept it and subsequently cooperate in the PD. In other words, these players are willing to cooperate when a commitment is proposed but are not prepared to pay the cost of setting it up.

In the following sections, we consider well-mixed, finite populations of constant size N composed of these five strategies, i.e. COMP, C, FREE, D and FAKE. We have previously shown, both analytically and numerically, that COMP dominates when the cost of arranging commitment $ϵ_{1}$ is justified with respect to the cost of cooperation c and the compensation $δ_{1}$ is sufficiently high, leading to a substantial level of cooperation (Han et al., 2013a) (see also SI, Figure S1). If these conditions are not satisfied, either FREE or FAKE players dominate the population, and at higher arrangement costs D players can also dominate commitment proposers. It is important to note that the cost of setting up the agreement ($ϵ_{1}$) is the essential factor, since once the compensation ($δ_{1}$) reaches a certain threshold, increasing it further does not lead to any noticeable improvement in the level of cooperation (Han et al., 2013a).

Combining all five strategies, we obtain the following payoff matrix for the pairwise PD (Han et al., 2013a), where each entry is the payoff of the row strategy when interacting with the column strategy

M_{1} = \begin{array}{c|ccccc} & \text{COMP} & \text{C} & \text{D} & \text{FAKE} & \text{FREE} \\ \hline \text{COMP} & R - \frac{ϵ_{1}}{2} & R - ϵ_{1} & 0 & S + δ_{1} - ϵ_{1} & R - ϵ_{1} \\ \text{C} & R & R & S & S & S \\ \text{D} & 0 & T & P & P & P \\ \text{FAKE} & T - δ_{1} & T & P & P & P \\ \text{FREE} & R & T & P & P & P \end{array}

(1)
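Equation (1) can be encoded directly as a nested list, as in the minimal Python sketch below. The strategy order follows the matrix, and the helper name `commitment_matrix` is a hypothetical choice made for illustration; entry `[i][j]` is the payoff of the row strategy against the column strategy.

```python
STRATEGIES = ["COMP", "C", "D", "FAKE", "FREE"]


def commitment_matrix(T, R, P, S, eps1, delta1):
    """Payoff matrix M1 of equation (1): entry [i][j] is the payoff of STRATEGIES[i]
    when it interacts with STRATEGIES[j]."""
    return [
        # COMP pays eps1 to arrange a deal (the cost is split when both propose);
        # no game is played with D, who refuses the commitment.
        [R - eps1 / 2, R - eps1, 0.0, S + delta1 - eps1, R - eps1],
        # C accepts commitments and cooperates, but never proposes one.
        [R, R, S, S, S],
        # D refuses commitments and always defects.
        [0.0, T, P, P, P],
        # FAKE accepts, then defects and has to pay the compensation delta1 to COMP.
        [T - delta1, T, P, P, P],
        # FREE cooperates only when someone else proposes (and pays for) a commitment.
        [R, T, P, P, P],
    ]


M1 = commitment_matrix(2.0, 1.0, 0.0, -1.0, eps1=0.75, delta1=5.0)
```

For instance, with T = 2, R = 1, P = 0, S = −1, ϵ1 = 0.75 and δ1 = 5, the COMP-versus-FAKE entry is S + δ1 − ϵ1 = 3.25.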

We now turn to the costly punishment strategy (CP) (Boyd et al., 2003; Hauert et al., 2007; Sigmund et al., 2001). It behaves as a standard C player in the PD but, unlike a C player, it punishes its co-player if she defects in the game. Punishment consists of paying a personal cost $ϵ_{2}$ to reduce the defector’s payoff by $δ_{2}$. Since CP never proposes a commitment, FAKE and FREE players defect against it, just as D players do. Replacing COMP with CP in the previous payoff matrix yields the following payoff matrix

M_{2} = \begin{array}{c|ccccc} & \text{CP} & \text{C} & \text{D} & \text{FAKE} & \text{FREE} \\ \hline \text{CP} & R & R & S - ϵ_{2} & S - ϵ_{2} & S - ϵ_{2} \\ \text{C} & R & R & S & S & S \\ \text{D} & T - δ_{2} & T & P & P & P \\ \text{FAKE} & T - δ_{2} & T & P & P & P \\ \text{FREE} & T - δ_{2} & T & P & P & P \end{array}

(2)
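Under the same conventions, equation (2) can be sketched as follows; `punishment_matrix` is again a hypothetical helper name, and the strategy order is CP, C, D, FAKE, FREE.

```python
def punishment_matrix(T, R, P, S, eps2, delta2):
    """Payoff matrix M2 of equation (2); strategy order CP, C, D, FAKE, FREE."""
    return [
        # CP cooperates like C and pays eps2 to punish any co-player who defects.
        [R, R, S - eps2, S - eps2, S - eps2],
        [R, R, S, S, S],
        # No commitment is ever proposed, so D, FAKE and FREE all defect
        # and each loses delta2 when facing CP.
        [T - delta2, T, P, P, P],
        [T - delta2, T, P, P, P],
        [T - delta2, T, P, P, P],
    ]


M2 = punishment_matrix(2.0, 1.0, 0.0, -1.0, eps2=0.75, delta2=5.0)
```

The three identical defector rows make explicit that CP cannot distinguish D, FAKE and FREE, whereas COMP can.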

As for COMP, the lower the cost $ϵ_{2}$, the higher the frequency of CP. In contrast to COMP, however, for a fixed value of $ϵ_{2}$ the frequency of CP keeps increasing with $δ_{2}$ (see SI, Figure S1). Nevertheless, when the cost is sufficiently small, CP requires a much more severe punishment than the compensation needed by COMP to reach the same level of cooperation at an equivalent cost, especially when the PD is less harsh. In this case, commitment thus reduces the effect-to-cost ratio needed for cooperation to emerge, which is typically excessively high for CP in well-mixed populations (Anderson & Putterman, 2006; Egas & Riedl, 2008; Han et al., 2013a).

2.3 The synergy of commitment and punishment strategies

We now introduce a new strategy, denoted by CPP, that combines COMP and CP as follows: with probability q, a CPP player uses strategy COMP, and otherwise (i.e. with probability $1 - q$) it uses CP. Identifying CPP with the COMP row and column of $M_{1}$ and with the CP row and column of $M_{2}$, the payoff matrix in the case of CPP reads as follows, with the exception of the payoff when a CPP player encounters another CPP player:

M_{CPP} = q \times M_{1} + (1 - q) \times M_{2}

(3)

The average payoff of a CPP player when playing with another CPP player is

(1 - q)^{2} R + (1 - q) q R + q (1 - q) (R - ϵ_{1}) + q^{2} \left(R - \frac{ϵ_{1}}{2}\right) = R - ϵ_{1} q + \frac{q^{2} ϵ_{1}}{2}

In this expression, each of the four terms is the payoff obtained when both players cooperate, weighted by the probability that: (i) neither player proposes the commitment; (ii) the focal player does not propose and the other does, so that the co-player bears the cost $ϵ_{1}$; (iii) the focal player proposes and the other does not, so that the focal player bears $ϵ_{1}$; and (iv) both players propose and share the cost.
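The CPP-versus-CPP payoff can be checked by enumerating the four cases explicitly, as in the Python sketch below (hypothetical helper names). It also shows why equation (3) needs this exception: a plain entrywise mix of the corner entries of $M_{1}$ and $M_{2}$ would give $R - qϵ_{1}/2$, which implicitly treats the two players' choices as perfectly correlated.

```python
def cpp_self_payoff(q, R, eps1):
    """Average payoff of a CPP player against another CPP player, enumerating who
    proposes a commitment (each player independently with probability q)."""
    cases = [
        ((1 - q) * (1 - q), R),     # (i) nobody proposes: plain mutual cooperation
        ((1 - q) * q, R),           # (ii) only the co-player proposes and pays eps1
        (q * (1 - q), R - eps1),    # (iii) only the focal player proposes and pays eps1
        (q * q, R - eps1 / 2),      # (iv) both propose and share the cost
    ]
    return sum(prob * payoff for prob, payoff in cases)


def cpp_self_payoff_closed_form(q, R, eps1):
    """Closed form R - eps1*q + q^2*eps1/2 given in the text."""
    return R - eps1 * q + q * q * eps1 / 2


def cpp_matrix(M1, M2, q, R, eps1):
    """Equation (3): mix M1 and M2 entrywise, then overwrite the CPP-vs-CPP entry [0][0]."""
    n = len(M1)
    M = [[q * M1[i][j] + (1 - q) * M2[i][j] for j in range(n)] for i in range(n)]
    M[0][0] = cpp_self_payoff(q, R, eps1)
    return M
```

The enumeration and the closed form agree for every q in [0, 1]; the enumeration is simply easier to audit against the four cases listed above.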

2.4 Evolutionary dynamics in finite populations

Both the analytical and numerical results obtained here use Evolutionary Game Theory (EGT) methods for finite populations (Imhof, Fudenberg, & Nowak, 2005; Nowak, 2006a; Nowak, Sasaki, Taylor, & Fudenberg, 2004; Sigmund, 2010). In such a setting, agents’ payoffs represent their fitness or social success, and the evolutionary dynamics is shaped by social learning (Hofbauer & Sigmund, 1998; Sigmund, 2010), whereby the most successful agents tend to be imitated more often by the other agents. In the current work, social learning is modelled using the so-called pairwise comparison rule (Traulsen, Nowak, & Pacheco, 2006): an agent A with fitness $f_{A}$ adopts the strategy of another agent B with fitness $f_{B}$ with probability p given by the Fermi function (Perc & Szolnoki, 2010; Sigmund, 2010; Szabó & Tőke, 1998; Traulsen et al., 2006)

p_{A, B} = {(1 + e^{- β (f_{B} - f_{A})})}^{- 1}

The parameter $β$ represents the ‘imitation strength’ or ‘intensity of selection’, i.e. how strongly agents base their decision to imitate on the fitness difference between themselves and the agent they compare themselves with. For $β = 0$, we obtain the limit of neutral drift: the imitation decision is random. For large $β$, imitation becomes increasingly deterministic.
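A direct Python transcription of the Fermi rule is sketched below; the function name is hypothetical. It branches on the sign of the exponent so that `math.exp` is only ever called on a non-positive argument, which avoids `OverflowError` for large β or large fitness differences.

```python
import math


def imitation_probability(f_A, f_B, beta):
    """Pairwise comparison (Fermi) rule: probability that agent A adopts B's strategy."""
    x = beta * (f_B - f_A)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so this cannot overflow
    return z / (1.0 + z)
```

With β = 0 the result is always 1/2 (neutral drift); as β grows it approaches 1 when B is fitter and 0 when A is fitter.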

In the absence of mutations or exploration, the end states of evolution are inevitably monomorphic: once such a state is reached, it cannot be escaped through imitation. We thus further assume that, with a certain mutation probability, an agent switches randomly to a different strategy without imitating another agent. In the limit of small mutation rates, the population contains at most two strategies at any time, so that the behavioural dynamics can be conveniently described by a Markov Chain in which each state represents a monomorphic population and the transition probabilities are given by the fixation probabilities of single mutants (Fudenberg & Imhof, 2005; Imhof et al., 2005; Nowak et al., 2004). The resulting Markov Chain has a stationary distribution, which characterises the relative time the population spends in each of these monomorphic end states.

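Let N be the size of the population. Suppose there are at most two strategies in the population, say k agents using strategy A ($0 \leq k \leq N$) and $N - k$ agents using strategy B, and let $π_{X, Y}$ denote the payoff of an agent using strategy X when interacting with an agent using strategy Y (the corresponding entry of the payoff matrix). The average payoffs of an agent using A and of an agent using B are then, respectively,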

\begin{matrix} Π_{A} (k) = \frac{(k - 1) π_{A, A} + (N - k) π_{A, B}}{N - 1} \\ Π_{B} (k) = \frac{k π_{B, A} + (N - k - 1) π_{B, B}}{N - 1} \end{matrix}

(4)

Now, the probability of changing the number k of agents using strategy A by ± one in each time step can be written as (Traulsen et al., 2006)

T^{\pm} (k) = \frac{N - k}{N} \frac{k}{N} {[1 + e^{\mp β [Π_{A} (k) - Π_{B} (k)]}]}^{- 1}

(5)

The fixation probability of a single mutant with a strategy A in a population of $(N - 1)$ agents using B is given by (Karlin & Taylor, 1975; Nowak et al., 2004; Traulsen et al., 2006)

ρ_{B, A} = \left(1 + \sum_{i = 1}^{N - 1} \prod_{j = 1}^{i} \frac{T^{-} (j)}{T^{+} (j)}\right)^{- 1}

(6)
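Equations (4)-(6) combine into a short loop, sketched below in Python (hypothetical function name). The sketch relies on one algebraic step not spelled out above: the prefactors $\frac{N - k}{N}\frac{k}{N}$ of $T^{+}$ and $T^{-}$ cancel, so $T^{-}(j)/T^{+}(j) = e^{-β[Π_{A}(j) - Π_{B}(j)]}$.

```python
import math


def fixation_probability(pi_AA, pi_AB, pi_BA, pi_BB, N, beta):
    """rho_{B,A}: probability that a single A mutant takes over a population of
    N - 1 B agents under the pairwise comparison rule (equations (4)-(6))."""
    total, product = 1.0, 1.0
    for j in range(1, N):
        # Equation (4): average payoffs when j agents use A and N - j use B.
        Pi_A = ((j - 1) * pi_AA + (N - j) * pi_AB) / (N - 1)
        Pi_B = (j * pi_BA + (N - j - 1) * pi_BB) / (N - 1)
        # Equation (5): T^-(j) / T^+(j) = exp(-beta * (Pi_A - Pi_B)).
        product *= math.exp(-beta * (Pi_A - Pi_B))
        # Equation (6): accumulate the running products inside the sum.
        total += product
    return 1.0 / total
```

For β = 0 every ratio equals 1, the sum equals N − 1, and the function returns exactly 1/N, the neutral benchmark against which advantageous and disadvantageous mutants can be compared.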

In the limit of neutral selection (i.e. $β = 0$), $ρ_{B, A}$ equals the inverse of the population size, $1 / N$. Considering a set $\{1, \dots, s\}$ of different strategies, these fixation probabilities determine the transition matrix $M = \{T_{ij}\}_{i, j = 1}^{s}$ of a Markov Chain, where $T_{ij}$ is the probability of a transition from the monomorphic state i to the monomorphic state j: $T_{ij} = ρ_{i, j} / (s - 1)$ for $j \neq i$ (a single j mutant fixating in a population of i agents), and $T_{ii} = 1 - \sum_{j = 1, j \neq i}^{s} T_{ij}$. The normalised eigenvector associated with the eigenvalue 1 of the transpose of M provides the stationary distribution described above (Fudenberg & Imhof, 2005; Imhof et al., 2005; Karlin & Taylor, 1975), i.e. the relative time the population spends adopting each of the strategies.
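A minimal NumPy sketch of this construction is given below; it assumes the fixation probabilities have already been computed and are passed in as a matrix `rho` (a hypothetical input layout), with `rho[i][j]` the probability that a single j mutant fixates in a population of i agents. It uses `numpy.linalg.eig` on the transpose of the row-stochastic matrix, as described in the text.

```python
import numpy as np


def stationary_distribution(rho):
    """Stationary distribution of the small-mutation Markov chain.

    rho[i][j] is the fixation probability of a single j mutant in a monomorphic
    population of i agents; diagonal entries are ignored.
    """
    rho = np.asarray(rho, dtype=float)
    s = rho.shape[0]
    M = rho / (s - 1)                          # T_ij for j != i
    np.fill_diagonal(M, 0.0)
    np.fill_diagonal(M, 1.0 - M.sum(axis=1))   # T_ii = 1 - sum_{j != i} T_ij
    # Eigenvector of M^T for the eigenvalue closest to 1, normalised to sum to 1.
    eigenvalues, eigenvectors = np.linalg.eig(M.T)
    v = np.real(eigenvectors[:, np.argmin(np.abs(eigenvalues - 1.0))])
    return v / v.sum()
```

For two strategies with `rho = [[0, 0.001], [0.5, 0]]`, leaving state 0 is 500 times less likely than entering it, so the population spends a fraction 500/501 of the time in state 0.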

Risk-dominance. An important measure for comparing two strategies A and B is the direction in which the transition is stronger, i.e. more probable: an A mutant fixating in a population of agents using B ($ρ_{B, A}$), or a B mutant fixating in a population of agents using A ($ρ_{A, B}$). It can be shown that, in the limit of large N, the former is stronger if (Kandori, Mailath, & Rob, 1993; Sigmund, 2010)

π_{A, A} + π_{A, B} - π_{B, A} - π_{B, B} > 0

(7)
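Condition (7) is a one-line test. The Python sketch below (hypothetical function name) applies it to unconditional C against D in the Donor game, where the left-hand side equals $(b - c) - c - b - 0 = -2c < 0$, so C is never risk-dominant over D.

```python
def risk_dominates(pi_AA, pi_AB, pi_BA, pi_BB):
    """Equation (7): True if A mutants fixate among B more easily than vice versa (large N)."""
    return pi_AA + pi_AB - pi_BA - pi_BB > 0


# Donor game with b = 2, c = 1: unconditional C (as A) against D (as B).
b, c = 2.0, 1.0
pi_CC, pi_CD, pi_DC, pi_DD = b - c, -c, b, 0.0
print(risk_dominates(pi_CC, pi_CD, pi_DC, pi_DD))  # False: the sum is -2c < 0
```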

3 Results

3.1 Conditions for the viability of CPP

We derive analytical conditions under which CPP can be a viable strategy for the emergence of cooperation in the Donor game (see Section 2.1). In other words, we wish to determine, as a function of q, when CPP is successful against the defective and free-riding strategies. Using the inequality in equation (7), we obtain the conditions under which CPP is risk-dominant against the three strategies D, FAKE and FREE, respectively

δ_{2} (1 - q) + q (b + c - ϵ_{1} + ϵ_{2}) + \frac{q^{2} ϵ_{1}}{2} - 2 c - ϵ_{2} > 0

(8)

δ_{2} (1 - q) + 2 q δ_{1} + q (ϵ_{2} - 2 ϵ_{1}) + \frac{q^{2} ϵ_{1}}{2} - 2 c - ϵ_{2} > 0

(9)

δ_{2} (1 - q) + q (b + c) + q (ϵ_{2} - 2 ϵ_{1}) + \frac{q^{2} ϵ_{1}}{2} - 2 c - ϵ_{2} > 0

(10)

The left-hand side of equation (10) differs from that of equation (8) by $-q ϵ_{1} \leq 0$, so it is never larger. Thus, satisfying the inequality in equation (10) implies satisfying the inequality in equation (8). Hence, for CPP to be risk-dominant against the three defective strategies, we only need to guarantee that the two inequalities in equations (9) and (10) hold. By solving the system of these two inequalities, we can analytically derive the range of q for which CPP is risk-dominant against all free-riders (see SI for details); this range is corroborated by the numerical simulation results in the next section.
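Equations (8)-(10) can be cross-checked numerically by evaluating condition (7) on the mixed payoffs of equation (3), reading the COMP entries off $M_{1}$ and the CP entries off $M_{2}$ in the Donor game. The Python sketch below does this; the helper names and the parameter ordering `(q, b, c, eps1, eps2, delta1, delta2)` are illustrative choices, not part of the model.

```python
def cpp_pairwise_payoffs(opponent, q, b, c, eps1, eps2, delta1, delta2):
    """Donor-game payoffs (pi_CPP_CPP, pi_CPP_X, pi_X_CPP, pi_X_X) for X in {"D", "FAKE", "FREE"}.

    CPP entries mix the COMP entries of M1 (weight q) with the CP entries of M2 (weight 1 - q).
    """
    T, R, P, S = b, b - c, 0.0, -c
    pi_self = R - eps1 * q + q * q * eps1 / 2  # CPP against CPP
    # (payoff of COMP against X, payoff of X against COMP), read off M1
    comp = {"D": (0.0, 0.0), "FAKE": (S + delta1 - eps1, T - delta1), "FREE": (R - eps1, R)}[opponent]
    # CP against D, FAKE or FREE: all of them defect and are punished (read off M2)
    cp = (S - eps2, T - delta2)
    cpp_vs_x = q * comp[0] + (1 - q) * cp[0]
    x_vs_cpp = q * comp[1] + (1 - q) * cp[1]
    return pi_self, cpp_vs_x, x_vs_cpp, P  # X against X always yields P


def risk_dominance_lhs(opponent, *params):
    """Left-hand side of equation (7) with A = CPP and B = opponent."""
    pi_aa, pi_ab, pi_ba, pi_bb = cpp_pairwise_payoffs(opponent, *params)
    return pi_aa + pi_ab - pi_ba - pi_bb


def closed_form_lhs(opponent, q, b, c, eps1, eps2, delta1, delta2):
    """Left-hand sides of equations (8), (9) and (10) for D, FAKE and FREE."""
    common = delta2 * (1 - q) + q * q * eps1 / 2 - 2 * c - eps2
    if opponent == "D":
        return common + q * (b + c - eps1 + eps2)
    if opponent == "FAKE":
        return common + 2 * q * delta1 + q * (eps2 - 2 * eps1)
    return common + q * (b + c) + q * (eps2 - 2 * eps1)  # opponent == "FREE"
```

Comparing `risk_dominance_lhs` with `closed_form_lhs` over a few parameter sets is a cheap sanity check of the algebra, and it also confirms that the FREE expression never exceeds the D expression.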

We now derive some properties of these inequalities for varying q and for some special cases. First, whenever punishment is used at least to some degree (i.e. $q < 1$), both conditions are satisfied for sufficiently large $δ_{2}$, regardless of the values of the other costs, since $δ_{2}$ enters both left-hand sides with the positive coefficient $1 - q$ (see a proof in SI). This is in contrast to COMP, which is not risk-dominant against the commitment free-riders FREE, however large $δ_{1}$ is, if the cost of arranging commitment $ϵ_{1}$ exceeds a certain limit. This can be seen explicitly by setting $q = 1$ in equation (10), which gives (Han et al., 2013a)

b - c - \frac{3 ϵ_{1}}{2} > 0

(11)

To obtain a more meaningful comparison of CPP with COMP and CP, we assume that prior commitment and costly punishment have the same set-up cost and the same effect of compensation or punishment when the co-player misbehaves (i.e. defects in the PD after having agreed to cooperate in the former case, and defects in the PD in the latter). That is, we assume $ϵ_{1} = ϵ_{2} = ϵ$ and $δ_{1} = δ_{2} = δ$. Then, equation (9) can be simplified to

δ > ϵ + \frac{2 c}{1 + q} - \frac{q^{2} ϵ}{2 (1 + q)}

(12)

The right-hand side of equation (12) is a decreasing function of q (see SI), implying that the larger q is, the easier it is to satisfy this condition. In other words, the larger q is, the better FAKE players can be restrained by CPP players (as can also be seen in the numerical results shown in Figure 1).
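The monotonicity of the threshold in equation (12) is easy to check on a grid, as in the Python sketch below (hypothetical helper name, with c = 1 and ϵ = 0.75 as in Figure 1). Its endpoints reproduce the CP condition $ϵ + 2c$ at q = 0 and the COMP bound $3ϵ/4 + c$ at q = 1.

```python
def fake_threshold(q, c, eps):
    """Right-hand side of equation (12): CPP risk-dominates FAKE iff delta exceeds this value."""
    return eps + 2 * c / (1 + q) - q * q * eps / (2 * (1 + q))


# Scan q on a grid for c = 1 and eps = 0.75 (the setting of Figure 1).
grid = [k / 10 for k in range(11)]
thresholds = [fake_threshold(q, c=1.0, eps=0.75) for q in grid]
print([round(t, 3) for t in thresholds])
```

Here the threshold falls from 2.75 at q = 0 to 1.5625 at q = 1.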

Figure 1.

Frequency of each strategy as a function of q, in a population of five strategies CPP, C, D, FAKE and FREE. For a large range of q, CPP has a higher frequency than both the commitment proposing strategy COMP (i.e. CPP with $q = 1$) and the costly punishment strategy CP (i.e. CPP with $q = 0$). The payoffs used are $T = 2, R = 1, P = 0, S = - 1$; imitation strength, $β = 0.1$; population size, $N = 100$; $ϵ_{1} = ϵ_{2} = ϵ = 0.75$; $δ_{1} = δ_{2} = δ$, where $δ = 5$ in panel (a), $δ = 10$ in panel (b) and $δ = 15$ in panel (c).

When $q = 0$, CPP reduces to CP. Hence, CP is risk-dominant against the three defective strategies when

δ > ϵ + 2 c

(13)

Similarly, when $q = 1$, CPP reduces to COMP, which is risk-dominant against all the defective strategies when

\begin{matrix} ϵ < \frac{2 (b - c)}{3} \\ δ > \frac{3 ϵ}{4} + c \end{matrix}

(14)

Comparing the conditions in equations (13) and (14), it follows that when $ϵ$ is sufficiently small (namely, below $2 (b - c) / 3$), a smaller $δ$ is required for the risk-dominance of COMP against all defective strategies than is required for CP, since $3 ϵ / 4 + c < ϵ + 2 c$ (see also SI, Figure S1).
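Conditions (13) and (14) can be compared side by side, as in the Python sketch below (hypothetical helper names, Donor game with b = 2 and c = 1, so the cost bound for COMP is 2/3). A cheap commitment lets COMP succeed with a moderate δ that is insufficient for CP, whereas above the cost bound no δ rescues COMP, while CP only needs a larger δ.

```python
def cp_risk_dominant(delta, eps, c):
    """Equation (13): CP (q = 0) is risk-dominant against D, FAKE and FREE."""
    return delta > eps + 2 * c


def comp_risk_dominant(delta, eps, b, c):
    """Equation (14): COMP (q = 1) is risk-dominant against D, FAKE and FREE."""
    return eps < 2 * (b - c) / 3 and delta > 3 * eps / 4 + c


# eps = 0.5: COMP needs delta > 1.375, CP needs delta > 2.5.
# eps = 0.75: above the COMP bound 2/3, so only CP can succeed (here with delta = 100).
for eps, delta in [(0.5, 2.0), (0.75, 100.0)]:
    print(eps, delta, comp_risk_dominant(delta, eps, 2.0, 1.0), cp_risk_dominant(delta, eps, 1.0))
```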

Now, equation (10) can be simplified to

δ - 2 c - ϵ + q (b + c - δ - ϵ (1 - \frac{q}{2})) > 0

(15)

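When $δ$ is sufficiently large compared to the benefit and cost of cooperation, the left-hand side is a decreasing function of q (see SI); for instance, $δ > b + c$ suffices, since its derivative with respect to q equals $b + c - δ - ϵ (1 - q) \leq b + c - δ$. Hence, it becomes more difficult for CPP to be risk-dominant against FREE players as q increases (see also Figure 1).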
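Combining equations (12) and (15) gives the set of q for which CPP is risk-dominant against D, FAKE and FREE in the symmetric case. The Python sketch below scans q on a grid of 101 points; the grid resolution and helper names are illustrative assumptions, and risk-dominance is a pairwise, large-N criterion rather than a direct prediction of the stationary frequencies plotted in Figure 1.

```python
def cpp_risk_dominant(q, b, c, eps, delta):
    """Equations (12) and (15), with eps1 = eps2 = eps and delta1 = delta2 = delta."""
    beats_fake = delta > eps + 2 * c / (1 + q) - q * q * eps / (2 * (1 + q))
    beats_free = delta - 2 * c - eps + q * (b + c - delta - eps * (1 - q / 2)) > 0
    # Risk-dominance against D (equation (8)) is implied by the FREE condition.
    return beats_fake and beats_free


def risk_dominant_q_values(b, c, eps, delta, steps=100):
    """Grid points q = k / steps in [0, 1] at which CPP risk-dominates D, FAKE and FREE."""
    return [k / steps for k in range(steps + 1) if cpp_risk_dominant(k / steps, b, c, eps, delta)]


# Donor game with b = 2, c = 1 and eps = 0.75, for the three values of delta of Figure 1.
for delta in (5.0, 10.0, 15.0):
    qs = risk_dominant_q_values(2.0, 1.0, 0.75, delta)
    print(delta, (min(qs), max(qs)) if qs else None)
```

Note that q = 1 is never included for these parameters: ϵ = 0.75 exceeds the bound $2 (b - c) / 3 = 2/3$ of equation (14), so pure commitment cannot risk-dominate FREE however large δ is.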

3.2 Varying usage of prior commitments and punishment can cope better with various types of free-riders

The analytical observations above are illustrated in Figure 1, where we plot the frequencies of the five strategies CPP, C, D, FAKE and FREE (see Methods) as a function of q, for different values of $δ$ and a fixed $ϵ$. Once $δ$ is large enough to sufficiently restrain FAKE players, increasing it further does not enhance the frequency or performance of COMP (i.e. CPP with $q = 1$). This is not the case for CP, since the frequency of CP (i.e. CPP with $q = 0$) against D, FAKE and FREE always increases with $δ$ (see also Figure S1 in SI). In general, we observe that FAKE players are restrained better as q increases, i.e. when commitment is used more often. On the other hand, FREE players are coped with better for smaller q, i.e. when punishment is used more often, especially for larger $δ$ (compare the three panels). This suggests that a balance between arranging prior commitments and using reactive costly punishment may provide a strategy that performs better than either strategy by itself.

Indeed, in all panels of Figure 1 there is a wide range of q where CPP is better than both COMP and CP. These ranges are in close accordance with the analytical results (see SI, Section 3). Furthermore, in Figure 2 we plot the range of q in which CPP is better than both COMP and CP, for varying $δ$ and for different values of $ϵ$. In general, CPP is more frequent for a wide range of q, and this range is larger for smaller $ϵ$. The range is largest when $δ$ reaches a certain threshold, as beyond it commitment does not provide a further advantage for the combined strategy. However, when $δ$ is small, COMP is crucial for reducing the required impact-to-cost ratio, especially for smaller $ϵ$, as can be seen from the fact that the upper bound of the range is close to 1 (i.e. 100% use of commitment). Furthermore, both the lower and upper bounds of this range decrease as $δ$ increases, because punishment then becomes more beneficial.

Figure 2.

The range of q in which CPP is more frequent than both COMP and CP (the light grey area), as a function of $δ$, for (a) $ϵ = 0.1$, (b) $ϵ = 0.5$ and (c) $ϵ = 1$. In general, CPP is more frequent for a wide range of q, which is larger for smaller $ϵ$. The range is largest when $δ$ is sufficiently high (but not too high). Both the lower and upper bounds of the range are non-increasing in $δ$. The payoffs used are $T = 2, R = 1, P = 0, S = - 1$; imitation strength, $β = 0.1$; population size, $N = 100$.

As we have seen, there is always a wide range of parameter values in which CPP is better than both COMP and CP individually. But what is the actual improvement, e.g. in terms of the level of cooperation, that one may obtain with the combined strategy? For varying $ϵ$ and $δ$, we search for the optimal value of q, at which CPP has the highest frequency (Figures 3a and 3c). We see that CPP retains the important property of CP, i.e. its frequency increases with $δ$ (Figure 3a). Furthermore, we observe a significant improvement compared to the highest frequency of COMP and CP (Figure 3b). As expected, the improvement is most significant (more than 100%) when $δ$ reaches a certain threshold. More interestingly, it is also most significant when the cost $ϵ$ is sufficiently large, because in that case the performance of COMP is severely degraded. These observations are robust to variations of the benefit-to-cost ratio (see SI, Figure S2).
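To make the search for the optimal q concrete, the sketch below chains equation (3), equations (4)-(6) and the stationary distribution of the small-mutation Markov chain into one Python/NumPy pipeline that returns the stationary frequency of CPP and grid-searches q. It is a sketch under stated assumptions, not the authors' code: ϵ1 = ϵ2 = ϵ, δ1 = δ2 = δ, the Donor game with b = 2 and c = 1, N = 100 and β = 0.1 as in Figures 1-3, and a coarse, hypothetical grid of q values; its numbers should not be read as the published results.

```python
import math

import numpy as np


def payoff_matrix(q, T, R, P, S, eps1, eps2, delta1, delta2):
    """M_CPP of equation (3); strategy order CPP, C, D, FAKE, FREE (row player's payoff)."""
    M1 = np.array([
        [R - eps1 / 2, R - eps1, 0.0, S + delta1 - eps1, R - eps1],
        [R, R, S, S, S],
        [0.0, T, P, P, P],
        [T - delta1, T, P, P, P],
        [R, T, P, P, P],
    ])
    M2 = np.array([
        [R, R, S - eps2, S - eps2, S - eps2],
        [R, R, S, S, S],
        [T - delta2, T, P, P, P],
        [T - delta2, T, P, P, P],
        [T - delta2, T, P, P, P],
    ])
    M = q * M1 + (1 - q) * M2
    M[0, 0] = R - eps1 * q + q * q * eps1 / 2  # CPP against CPP is not a plain mix
    return M


def fixation(M, a, b, N, beta):
    """rho_{b,a}: fixation probability of one a mutant among N - 1 b residents (eqs (4)-(6))."""
    total, product = 1.0, 1.0
    for j in range(1, N):
        Pi_a = ((j - 1) * M[a, a] + (N - j) * M[a, b]) / (N - 1)
        Pi_b = (j * M[b, a] + (N - j - 1) * M[b, b]) / (N - 1)
        product *= math.exp(-beta * (Pi_a - Pi_b))
        total += product
    return 1.0 / total


def stationary(M, N, beta):
    """Stationary distribution over the monomorphic states in the small-mutation limit."""
    s = M.shape[0]
    Tm = np.zeros((s, s))
    for i in range(s):
        for j in range(s):
            if i != j:
                Tm[i, j] = fixation(M, j, i, N, beta) / (s - 1)  # i -> j: j mutant fixates
        Tm[i, i] = 1.0 - Tm[i].sum()
    eigenvalues, eigenvectors = np.linalg.eig(Tm.T)
    v = np.real(eigenvectors[:, np.argmin(np.abs(eigenvalues - 1.0))])
    return v / v.sum()


def cpp_frequency(q, eps, delta, b=2.0, c=1.0, N=100, beta=0.1):
    """Stationary frequency of CPP with eps1 = eps2 = eps and delta1 = delta2 = delta."""
    M = payoff_matrix(q, b, b - c, 0.0, -c, eps, eps, delta, delta)
    return stationary(M, N, beta)[0]


def optimal_q(eps, delta, steps=20):
    """Grid search for the q in [0, 1] that maximises the stationary frequency of CPP."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda q: cpp_frequency(q, eps, delta))


for q in (0.0, 0.5, 1.0):
    print(q, round(float(cpp_frequency(q, eps=0.75, delta=5.0)), 3))
print("optimal q:", optimal_q(eps=0.75, delta=5.0))
```

A finer grid or a bounded scalar optimiser could replace the grid search; the grid keeps the sketch dependency-free beyond NumPy.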

Figure 3.

(a) Frequency of CPP at the optimal value of q; (b) the percentage improvement of CPP in comparison to the maximum of COMP and CP; and (c) the optimal value of $q \in [0, 1]$, where CPP has the highest frequency, as a function of $ϵ$ and $δ$ (where $ϵ_{1} = ϵ_{2} = ϵ$ and $δ_{1} = δ_{2} = δ$). The frequency of CPP increases with $δ$ and exhibits a significant improvement in comparison to the better of CP and COMP. Furthermore, the larger $δ$, the lower the optimal value of q. The payoffs used are $T = 2, R = 1, P = 0, S = - 1$; imitation strength, $β = 0.1$; population size, $N = 100$.

Moreover, to gain further understanding of when CPP performs best, in Figure 3c we plot, as a function of $ϵ$ and $δ$, the optimal value of q at which CPP reaches its highest frequency. We observe that the optimal value of q is a decreasing function of $δ$ and reaches its minimum for intermediate values of $ϵ$. That is, it is more advantageous to use punishment more often when its effect-to-cost ratio is sufficiently high. This observation is robust for various configurations of the PD (see SI, Figure S3). We also observe that when the PD becomes less harsh, using commitments more often is more beneficial for reaching high levels of cooperation.

4 Discussion

Both costly punishment and arranging prior commitment have been shown to provide important pathways for the evolution of cooperation (Boyd et al., 2003; X.-P. Chen & Komorita, 1994; Han, 2016; Han, Moniz Pereira, & Lenaerts, 2015; Han et al., 2013a; Han, Santos, et al., 2015; Hauert et al., 2007; Hilbe & Traulsen, 2012), and both are widely used in diverse human activities to enforce cooperation and regulate collective behaviour (Fehr & Gächter, 2000; Fehr & Gachter, 2002; Henrich et al., 2006; Nesse, 2001; Sterelny, 2012). Interestingly, we have shown in this paper that a simple synergy of the two mechanisms yields a strategy that promotes a higher level of cooperation than either mechanism by itself, for a wide range of parameter values. Each mechanism has its own weakness, which the combined strategy can overcome. On one hand, arranging prior commitment reduces the effect-to-cost ratio required by costly punishment to perform efficiently, particularly when the cost of arrangement is sufficiently low. On the other hand, costly punishment makes it possible to deal with commitment free-riders, who escape sanctioning when interacting with the commitment strategy. In addition, one important feature of costly punishment is that its efficiency always increases with the effect-to-cost ratio of punishment, a property that the commitment strategy does not possess, even when the cost of arrangement is low. Our results show that the combined strategy retains this important property of costly punishment. Furthermore, the improvement achieved by the combined strategy in terms of frequency (compared to the better of the two strategies separately) is most significant when the cost is sufficiently large and the impact of punishment reaches a threshold. This is a notable observation, since the performance of the commitment strategy is severely degraded in the former case and that of costly punishment is reduced in the latter.
As such, our results show that the combined strategy can overcome the weaknesses of both strategies. As an implication, our results provide novel insights for the design of self-organised or distributed multi-agent and autonomous agent systems (Bonabeau, Dorigo, & Theraulaz, 1999) that require cooperation among agents in a competitive environment.

In addition, our results suggest that free-riders of various types might be better dealt with by simply varying the use of two complementary mechanisms that can efficiently deal with the same kind of strategic situation (i.e. the one-shot PD herein). Ideally, one would identify what kind of free-rider one is dealing with and use the most appropriate mechanism, but this may be difficult when the information available about co-players is insufficient for decision making, which is particularly the case in non-repeated interactions. By varying the use of the available mechanisms (in our case, costly punishment and commitment), we can suppress more types of free-riders. Related to this, we envisage that our combined strategy could be improved by using additional cognitive skills, e.g. intention recognition (Han et al., 2011; Han, Santos, et al., 2015), to recognise the type of free-rider one is dealing with and then apply the best mechanism for that type.

Our model is closely related to the models of Sigmund et al. (2010) and Szolnoki et al. (2011), in which peer and pool punishment are studied as two separate strategies present at the same time in either a well-mixed or a structured (network) population. These models were designed to investigate which of the two sanctioning mechanisms is preferred when both are available to a society, a question that has also been studied experimentally by Putterman, Tyran, and Kamei (2011) and Traulsen et al. (2012). Although pool punishment and prior commitment are similar in that both require prior arrangements for sanctioning defectors at a later stage, our work differs in that commitment and peer punishment are combined into one single strategy, aiming to show that these two mechanisms can support each other in dealing with various types of free-riders. Nonetheless, as shown in the SI (Section 6, Figure S4), where we analyse various types of CPP strategies co-present in the population (including CP and COMP as the extreme types of CPP, with $q = 0$ and 1, respectively), a similar observation regarding the emergence of a weighted combination of the two mechanisms is obtained. Namely, there is a wide range of parameters for which CPP with an intermediate value of $q$ is the most abundant strategy in the population: neither CP nor COMP can dominate when used separately. That said, it would be interesting to see, in the context of the models of Sigmund et al. (2010) and Szolnoki, Szabó, and Czakó (2011), whether a weighted combination of pool and peer punishment can outperform either of them when used separately.
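The role of the mixing parameter $q$ can be made concrete with a small sketch. It assumes, as an illustrative simplification that is not necessarily the exact CPP definition used in the SI, that a CPP player decides independently in each interaction whether to arrange a prior commitment (with probability $q$) or to rely on costly punishment (with probability $1 - q$), so that $q = 0$ reduces to CP and $q = 1$ to COMP. The function names `cpp_mechanism` and `mechanism_frequencies` are hypothetical, and the payoffs attached to each mechanism are deliberately not modelled.

```python
import random
from collections import Counter


def cpp_mechanism(q: float, rng: random.Random) -> str:
    """Hypothetical CPP decision rule (illustrative assumption only).

    With probability q the player arranges a prior commitment (COMP-like);
    otherwise it relies on costly punishment (CP-like).
    q = 0 recovers CP and q = 1 recovers COMP.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so "< q" fires with probability q.
    return "commit" if rng.random() < q else "punish"


def mechanism_frequencies(q: float, interactions: int, seed: int = 0) -> dict:
    """Fraction of interactions handled by each mechanism for a given q."""
    rng = random.Random(seed)
    counts = Counter(cpp_mechanism(q, rng) for _ in range(interactions))
    return {m: counts[m] / interactions for m in ("commit", "punish")}


if __name__ == "__main__":
    for q in (0.0, 0.3, 0.7, 1.0):
        print(q, mechanism_frequencies(q, 10_000))
```

For intermediate $q$ the empirical fraction of commitment-based interactions approaches $q$; whether a given $q$ is favoured by selection depends on the payoff parameters analysed in the model, which this sketch does not reproduce.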

Similarities can also be found with the model that analyses probabilistic sharing of punishment duty in the context of the Public Goods Game (PGG) (X. Chen, Szolnoki, & Perc, 2014). In a PGG group interaction, punishers share the duty of punishing defectors: when facing defectors, a punisher only punishes with a certain probability, and otherwise cooperates without punishing. A similar approach is conditional punishment, wherein punishment duty is proportional to the number of punishers in the group (Szolnoki & Perc, 2013). Both approaches have been shown to promote cooperation in structured populations more efficiently than mere punishment, as they enable costly punishers to deter defectors more efficiently, especially when punishment is expensive (i.e. has a low effect-to-cost ratio). We go beyond those results by showing that the synergy between costly punishment and commitment provides a more efficient solution than mere punishment even in well-mixed populations, where cooperation is harder to establish (Santos, Pacheco, & Lenaerts, 2006). Moreover, the efficiency of the model in X. Chen et al. (2014) rests on punishers being able to share punishment duty, which would clearly be less efficient, if not inapplicable, in a two-player game such as ours. By contrast, our model can be readily extended to group interactions typically found in the PGG (Han, Moniz Pereira, & Lenaerts, 2015).

Several extensions of the current model are worth pursuing. First, in this work we have not taken into account the fact that punishment might be antisocial, i.e. defectors may also punish cooperators. Antisocial punishment is widespread across human societies (Herrmann, Thöni, & Gächter, 2008) and has been shown to be detrimental to the emergence of cooperation (Han, 2016; Hilbe & Traulsen, 2012; Rand & Nowak, 2011). By contrast, arranging a prior commitment ensures that this kind of antisocial behaviour does not occur, because only those who agreed to commit can be punished for misbehaving (Han et al., 2013a). Hence, it would be interesting to see whether our combined strategy can overcome this weakness of costly punishment when antisocial punishment is possible. Another setting where the combination of commitment and costly punishment might be of interest is repeated interaction, where both commitment and costly punishment have been used in support of apology (Han, Pereira, Santos, & Lenaerts, 2013b; Martinez-Vaquero, Han, Pereira, & Lenaerts, 2015; Okamoto & Matsumura, 2000). It would be interesting to see whether and how the combined strategy can support apology more efficiently, leading to a better outcome for cooperation. Last but not least, as discussed above, the cost-efficiency problem of costly punishment can be mitigated in structured populations (Brandt et al., 2003; Helbing et al., 2010a, 2010b; Nakamaru & Iwasa, 2005). Thus, it would be interesting to study how the synergistic effect of commitment and costly punishment changes in such a setting.

In short, our results show that, although commitment and costly punishment can each promote the evolution of cooperation in the one-shot interaction setting, they complement each other and can be assembled into a better combined solution that ensures a more favourable outcome for cooperative behaviour. By varying the use of the available mechanisms, one can suppress a wider variety of free-riders, even without determining which mechanism is best suited to the situation at hand.

Footnotes

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The Anh Han was supported by Teesside URF funding (11200174). Tom Lenaerts was supported by FRS - FNRS Belgium (grant number 2.4614.12) and FWO Belgium (grant number G.0391.13N).

Notes

About the authors

The Anh Han (Bac Giang, Vietnam, 1983) obtained his Bachelor's degree in computer science in 2007 at St. Petersburg State University (Russia), and a double master's degree in computer science in 2009 at the Technical University of Dresden (Germany) and the New University of Lisbon (Portugal). He then earned his Ph.D. in computer science at the New University of Lisbon in 2012. After two years as a postdoctoral research fellow at Vrije Universiteit Brussel (supported by FWO Belgium), he is currently a Lecturer (Assistant Professor) at the School of Computing and the Future Digital Institute of Teesside University (UK). His current research interests span a wide range of topics within Artificial Intelligence and multidisciplinary research, including dynamics of human cooperation, AI cognitive modelling, evolution of cognition, evolutionary game theory, agent-based modelling, behavioural economics, intention recognition, and knowledge representation and reasoning. School of Computing, Teesside University, Borough Road, Middlesbrough, TS1 3BA, UK. Email: T.Han@tees.ac.uk

Tom Lenaerts (Brasschaat, Belgium, 1972) studied Computer Science at the Vrije Universiteit Brussel (Belgium), where he also obtained his Ph.D. in 2003. Afterwards he held several postdoctoral positions, performing research on topics ranging from evolutionary game theory and language evolution to computational biology. Since 2008, he has been an associate professor (tenured in 2011) at the Université Libre de Bruxelles (Belgium), where he co-heads the Machine Learning group in the Department of Computer Science (Faculty of Sciences). In 2007 he also became officially affiliated with the Artificial Intelligence lab at the Vrije Universiteit Brussel. He is a founding member of the recent Interuniversity Institute for Bioinformatics in Brussels (IB2) as well as the new Brussels Experimental Economics Laboratory (BEEL). His research has always been interdisciplinary, currently covering the following topics: evolution of cooperation, public goods, group formation, cognitive aspects in evolution, trust, behavioral experiments, cancer dynamics, protein communication, signal transduction and the analysis of oligogenic diseases. Machine Learning Group, Université Libre de Bruxelles, Boulevard du Triomphe CP 212, 1050 Brussels, Belgium and Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium. Email: Tom.Lenaerts@ulb.ac.be

References

Anderson, C. M., & Putterman (2006). Do non-strategic sanctions obey the law of demand? The demand for punishment in the voluntary contribution mechanism. Games and Economic Behavior, 54(1), 1–24.

Bonabeau, Dorigo, & Theraulaz (1999). Swarm intelligence: From natural to artificial systems. New York, NY, USA: Oxford University Press.

Boyd, Gintis, & Bowles (2010). Coordinated punishment of defectors sustains cooperation and can proliferate when rare. Science, 328(5978), 617–620.

Boyd, Gintis, Bowles, & Richerson, P. J. (2003, March). The evolution of altruistic punishment. Proceedings of the National Academy of Sciences of the United States of America, 100(6), 3531–3535.

Brandt, Hauert, & Sigmund (2003). Punishment and reputation in spatial public goods games. Proceedings of the Royal Society of London B: Biological Sciences, 270(1519), 1099–1104.

Carpenter, J. P. (2007). The demand for punishment. Journal of Economic Behavior & Organization, 62(4), 522–542.

Castelfranchi & Falcone (2010). Trust Theory: A Socio-Cognitive and Computational Model (Wiley Series in Agent Technology). West Sussex, UK: Wiley.

Chen, Szolnoki, & Perc (2014). Probabilistic sharing solves the problem of costly punishment. New Journal of Physics, 16(8), 083016.

Chen, X.-P., & Komorita, S. S. (1994). The effects of communication and commitment in a public goods social dilemma. Organizational Behavior and Human Decision Processes, 60(3), 367–386.

Cherry, T. L., & McEvoy, D. M. (2013). Enforcing compliance with environmental agreements in the absence of strong institutions: An experimental analysis. Environmental and Resource Economics, 54(1), 63–77.

Coombs, C. H. (1973). A reparameterization of the prisoner’s dilemma game. Behavioral Science, 18(6), 424–428.

Egas & Riedl (2008). The economics of altruistic punishment and the maintenance of cooperation. Proceedings of the Royal Society B: Biological Sciences, 275(1637), 871–878.

Fehr & Gächter (2000). Cooperation and punishment in public goods experiments. American Economic Review, 90(4), 980–994.

Fehr & Gächter (2002). Altruistic punishment in humans. Nature, 415, 137–140.

Frank, R. H. (1988). Passions Within Reason: The Strategic Role of the Emotions. New York, NY, USA: W. W. Norton and Company.

Fudenberg & Imhof, L. A. (2005). Imitation processes with small mutations. Journal of Economic Theory, 131, 251–262.

Gerber & Wichardt, P. C. (2009). Providing public goods in the absence of strong institutions. Journal of Public Economics, 93(3), 429–439.

Guala (2012). Reciprocity: Weak or strong? What punishment experiments do (and do not) demonstrate. Behavioral and Brain Sciences, 35(1), 1.

Han, T. A. (2013). Intention recognition, commitments and their roles in the evolution of cooperation: From artificial intelligence techniques to evolutionary game theory models (Vol. 9). Berlin, Heidelberg: Springer SAPERE series.

Han, T. A. (2016). Emergence of social punishment and cooperation through prior commitments. In Proceedings of the conference of the American association of artificial intelligence (AAAI’2016) (pp. 2494–2500). Phoenix, Arizona, USA: AAAI Press.

Han, T. A., Lenaerts, Santos, F. C., & Pereira, L. M. (2015). Emergence of cooperation via intention recognition, commitment and apology–a research summary. AI Communications, 28(4), 709–715.

Han, T. A., Moniz Pereira, & Lenaerts (2015). Avoiding or Restricting Defectors in Public Goods Games? Journal of the Royal Society Interface, 12(103), 20141203.

Han, T. A., Pereira, L. M., & Santos, F. C. (2011). Intention recognition promotes the emergence of cooperation. Adaptive Behavior, 19(3), 264–279.

Han, T. A., Pereira, L. M., & Santos, F. C. (2012a). Corpus-based intention recognition in cooperation dilemmas. Artificial Life, 18(4), 365–383.

Han, T. A., Pereira, L. M., & Santos, F. C. (2012b). The emergence of commitments and cooperation. In Proceedings of the 11th international conference on autonomous agents and multiagent systems (AAMAS’2012) (pp. 559–566). Richland, SC: ACM.

Han, T. A., Pereira, L. M., Santos, F. C., & Lenaerts (2013a). Good agreements make good friends. Scientific Reports, 3(2695).

Han, T. A., Pereira, L. M., Santos, F. C., & Lenaerts (2013b). Why Is It So Hard to Say Sorry: The Evolution of Apology with Commitments in the Iterated Prisoner’s Dilemma. In Proceedings of the 23rd international joint conference on artificial intelligence (IJCAI’2013) (pp. 177–183). California, USA: AAAI Press.

Han, T. A., Santos, F. C., Lenaerts, & Pereira, L. M. (2015). Synergy between intention recognition and commitments in cooperation dilemmas. Scientific Reports, 5(9312).

Hardin (1968). The tragedy of the commons. Science, 162, 1243–1248.

Hauert, Traulsen, Brandt, Nowak, M. A., & Sigmund (2007). Via freedom to coercion: The emergence of costly punishment. Science, 316, 1905–1907.

Helbing, Szolnoki, Perc, & Szabó (2010a). Evolutionary establishment of moral and double moral standards through spatial interactions. PLoS Comput Biol, 6(4), e1000758.

Helbing, Szolnoki, Perc, & Szabó (2010b). Punish, but not too hard: How costly punishment spreads in the spatial public goods game. New Journal of Physics, 12(8), 083005.

Henrich, McElreath, Barr, Ensminger, Barrett, Bolyanatz, . . . Ziker (2006). Costly punishment across human societies. Science, 312(5781), 1767–1770.

Herrmann, Thöni, & Gächter (2008, March). Antisocial Punishment Across Societies. Science, 319(5868), 1362–1367.

Hilbe & Traulsen (2012). Emergence of responsible sanctions without second order free riders, antisocial punishment or spite. Scientific Reports, 2.

Hilbe, Traulsen, Röhl, & Milinski (2014). Democratic decisions establish stable authorities that overcome the paradox of second-order punishment. Proceedings of the National Academy of Sciences of the United States of America, 111(2), 752–756.

Hofbauer & Sigmund (1998). Evolutionary games and population dynamics. Cambridge, UK: Cambridge University Press.

Imhof, L. A., Fudenberg, & Nowak, M. A. (2005). Evolutionary cycles of cooperation and defection. Proceedings of the National Academy of Sciences of the United States of America, 102, 10797–10800.

Kandori, Mailath, G. J., & Rob (1993). Learning, mutation, and long run equilibria in games. Econometrica, 61, 29–56.

Karlin & Taylor, H. E. (1975). A first course in stochastic processes. New York: Academic Press.

Martinez-Vaquero, L. A., Han, T. A., Pereira, L. M., & Lenaerts (2015). Apology and forgiveness evolve to resolve failures in cooperative agreements. Scientific Reports, 5(10639).

Maynard-Smith (1982). Evolution and the theory of games. Cambridge: Cambridge University Press.

Miettinen (2013). Promises and conventions–an approach to pre-play agreements. Games and Economic Behavior, 80, 68–84.

Nakamaru & Iwasa (2005). The evolution of altruism by costly punishment in lattice-structured populations: Score-dependent viability versus score-dependent fertility. Evolutionary Ecology Research, 7(6), 853–870.

Nesse, R. M. (2001). Evolution and the capacity for commitment. New York, USA: Russell Sage.

Nikiforakis & Normann, H.-T. (2008). A comparative statics analysis of punishment in public-good experiments. Experimental Economics, 11(4), 358–369.

Nowak, M. A. (2006a). Evolutionary dynamics: Exploring the equations of life. Cambridge, MA: Harvard University Press.

Nowak, M. A. (2006b). Five rules for the evolution of cooperation. Science, 314(5805), 1560.

Nowak, M. A., Sasaki, Taylor, & Fudenberg (2004). Emergence of cooperation and evolutionary stability in finite populations. Nature, 428, 646–650.

Okamoto & Matsumura (2000). The evolution of punishment and apology: An iterated prisoner’s dilemma model. Evolutionary Ecology, 14(8), 703–720.

Ostrom (1990). Governing the commons: The evolution of institutions for collective action. Cambridge, UK: Cambridge University Press.

Perc, Gómez-Gardeñes, Szolnoki, Floría, L. M., & Moreno (2013). Evolutionary dynamics of group interactions on structured populations: A review. Journal of The Royal Society Interface, 10(80), 20120997.

Perc & Szolnoki (2010). Coevolutionary games–a mini review. BioSystems, 99(2), 109–125.

Perc & Szolnoki (2012). Self-organization of punishment in structured populations. New Journal of Physics, 14(4), 043013.

Putterman, Tyran, J.-R., & Kamei (2011). Public goods and voting on formal sanction schemes. Journal of Public Economics, 95(9), 1213–1222.

Rand, D. G., Armao, J. J., IV, Nakamaru, & Ohtsuki (2010). Anti-social punishment can prevent the coevolution of punishment and cooperation. Journal of Theoretical Biology, 265(4), 624–632.

Rand, D. G., & Nowak, M. A. (2011). The evolution of antisocial punishment in optional public goods games. Nature Communications, 2, 434.

Santos, F. C., Pacheco, J. M., & Lenaerts (2006). Evolutionary dynamics of social dilemmas in structured heterogeneous populations. Proceedings of the National Academy of Sciences of the United States of America, 103, 3490–3494.

Sasaki, Brännström, Å., Dieckmann, & Sigmund (2012). The take-it-or-leave-it option allows small penalties to overcome social dilemmas. Proceedings of the National Academy of Sciences of the United States of America, 109(4), 1165–1169.

Sasaki, Okada, Uchida, & Chen (2015). Commitment to cooperation and peer punishment: Its evolution. Games, 6(4), 574–587.

Schelling, T. C. (1990). The strategy of conflict. London: Oxford University Press.

Schoenmakers, Hilbe, Blasius, & Traulsen (2014). Sanctions as honest signals–the evolution of pool punishment by public sanctioning institutions. Journal of Theoretical Biology, 356, 36–46.

Sigmund (2010). The calculus of selfishness. Princeton, US: Princeton University Press.

Sigmund, Hauert, & Nowak (2001). Reward and punishment. Proceedings of the National Academy of Sciences of the United States of America, 98(19), 10757–10762.

Sigmund, Silva, H. D., Traulsen, & Hauert (2010). Social learning promotes institutions for governing the commons. Nature, 466(7308).

Skyrms (1996). Evolution of the social contract. Cambridge, UK: Cambridge University Press.

Sterelny (2012). The evolved apprentice. MIT Press.

Szabó & Tőke (1998). Evolutionary prisoner’s dilemma game on a square lattice. Physical Review E, 58(1), 69.

Szolnoki & Perc (2013). Effectiveness of conditional punishment for the evolution of public cooperation. Journal of Theoretical Biology, 325, 34–41.

Szolnoki, Szabó, & Czakó (2011). Competition of individual and institutional punishments in spatial public goods games. Physical Review E, 84(4), 046106.

Szolnoki, Szabó, & Perc (2011). Phase diagrams for the spatial public goods game with pool punishment. Physical Review E, 83(3), 036101.

Traulsen, Nowak, M. A., & Pacheco, J. M. (2006). Stochastic dynamics of invasion and fixation. Physical Review E, 74, 11909.

Traulsen, Röhl, & Milinski (2012, September 22). An economic experiment reveals that humans prefer pool punishment to maintain the commons. Proceedings of the Royal Society B: Biological Sciences, 279(1743), 3716–3721.

Trivers, R. L. (1971). The evolution of reciprocal altruism. Quarterly Review of Biology, 46, 35–57.

West, Griffin, & Gardner (2007). Evolutionary explanations for cooperation. Current Biology, 17, R661–R672.

Winikoff (2007). Implementing commitment-based interactions. In Proceedings of the 6th international joint conference on autonomous agents and multiagent systems (pp. 868–875). New York, NY: ACM.

Wooldridge & Jennings, N. R. (1999). The cooperative problem-solving process. Journal of Logic and Computation, 9(4), 403–417.

J. J., Zhang, B. Y., Zhou, Z. X., Q. Q., Zheng, X. D., Cressman, & Tao (2009). Costly punishment does not always increase cooperation. Proceedings of the National Academy of Sciences of the United States of America, 106(41), 17448–17451.