Abstract
To ensure cooperation in the Prisoner’s Dilemma, individuals may require prior commitments from others, subject to compensations when agreements to cooperate are violated. Alternatively, individuals may prefer to behave reactively, without arranging prior commitments, by simply punishing those who misbehave. These two mechanisms have been shown to promote the emergence of cooperation, yet are complementary in the way they aim to promote cooperation. Although both mechanisms have their specific limitations, either one of them can overcome the problems of the other. On one hand, costly punishment requires an excessive effect-to-cost ratio to be successful, and this ratio can be significantly reduced by arranging a prior commitment with a more limited compensation. On the other hand, commitment-proposing strategies can be suppressed by free-riding strategies that commit only when someone else is paying the cost to arrange the deal, who in turn can be dealt with more effectively by reactive punishers. Using methods from Evolutionary Game Theory, we present here an analytical model showing that there is a wide range of settings for which the combined strategy outperforms either strategy by itself, leading to significantly higher levels of cooperation. Interestingly, the improvement is most significant when the cost of arranging commitments is sufficiently high and the penalty reaches a certain threshold, thereby overcoming the weaknesses of both mechanisms.
1 Introduction
The problem of explaining the evolution of cooperative behaviour has been actively investigated in many fields, from Evolutionary Biology, Ecology and Computer Science to Economics and Social Science. Several mechanisms responsible for the evolution of cooperation have been proposed, from kin and group selection to direct and indirect reciprocity, to structured populations, and to punishment (Nowak, 2006a, 2006b; Perc, Gómez-Gardeñes, Szolnoki, Floría, & Moreno, 2013; Sigmund, 2010; West, Griffin, & Gardner, 2007). An extensive body of theoretical and experimental evidence has shown that arranging prior commitments with posterior compensations promotes the evolution of cooperation (Cherry & McEvoy, 2013; Gerber & Wichardt, 2009; Han, Moniz Pereira, & Lenaerts, 2015; Han, Pereira, Santos, & Lenaerts, 2013a; Han, Santos, Lenaerts, & Pereira, 2015; Miettinen, 2013; Nesse, 2001; Ostrom, 1990), even when the interaction is one-shot, i.e. not repeated. Arranging prior commitments, such as through enforceable contracts or pledges (X.-P. Chen & Komorita, 1994), deposit-refund schemes (Cherry & McEvoy, 2013; Gerber & Wichardt, 2009; Sasaki, Okada, Uchida, & Chen, 2015) or even emotional or reputation-based commitment devices (Frank, 1988; Nesse, 2001), compels others to cooperate, as it requires them to reveal their preferences or intentions (X.-P. Chen & Komorita, 1994; Han, 2013; Han, Pereira, & Santos, 2011, 2012a; Sterelny, 2012). This behaviour is ubiquitous in various human activities, ranging from personal to group and international relationships (X.-P. Chen & Komorita, 1994; Cherry & McEvoy, 2013; Frank, 1988; Nesse, 2001). For instance, contracts are a popular kind of commitment, playing a key role in enforcing cooperation in modern societies (Nesse, 2001; Skyrms, 1996).
Commitments are also widely studied and utilised in multi-agent and autonomous agent systems, in order to ensure high levels of cooperation among agents (Castelfranchi & Falcone, 2010; Schelling, 1990; Winikoff, 2007; Wooldridge & Jennings, 1999).
Another important mechanism that promotes cooperation in one-shot interactions is costly punishment, where a punisher pays a cost to punish another player who misbehaves (Boyd, Gintis, Bowles, & Richerson, 2003; Egas & Riedl, 2008; Fehr & Gächter, 2000; Fehr & Gachter, 2002; Guala, 2012; Hauert, Traulsen, Brandt, Nowak, & Sigmund, 2007; Henrich et al., 2006; Traulsen, Röhl, & Milinski, 2012; Wu et al., 2009). Unlike commitment, this strategy does not request a prior agreement from the co-player before the interaction. Instead, players will reactively punish those who misbehave (if they can be identified) once the interaction has taken place. Several theoretical and experimental studies have shown that this kind of costly punishment can evolve in well-mixed populations only if it is cost-effective, i.e. when the punished agent suffers a sufficiently higher cost than the punisher (Anderson & Putterman, 2006; Boyd et al., 2003; Carpenter, 2007; Egas & Riedl, 2008; Nikiforakis & Normann, 2008; Sigmund, Hauert, & Nowak, 2001; Wu et al., 2009). This issue has also been shown to be alleviated in structured populations as a result of strategy segregation (Brandt, Hauert, & Sigmund, 2003; Nakamaru & Iwasa, 2005; Perc & Szolnoki, 2012; Szolnoki & Perc, 2013); spatial interactions may make it possible for costly punishers to avoid being exploited by second-order free-riders and fight against defectors more effectively (Helbing, Szolnoki, Perc, & Szabó, 2010a, 2010b).
These two strategies, i.e. commitments and costly punishment, compel others to cooperate in a complementary manner. Commitment proposers force participants in a game to reveal their intentions or preferences (X.-P. Chen & Komorita, 1994; Han, Lenaerts, Santos, & Pereira, 2015; Han, Santos, et al., 2015). Yet, even when co-players accept the commitment and behave appropriately, they can still decide not to initiate such agreements themselves as this is costly, and defect when no agreement is established. Especially when the commitment cost is high, this kind of free-rider, which benefits directly from the efforts of commitment-proposing strategies, can dominate (Cherry & McEvoy, 2013; Han et al., 2013a), leading to the destruction of cooperation and social welfare. Punishing strategies do not experience this problem. They can effectively deal with different types of players free-riding on the investments of commitment proposers, especially since they defect when no agreement was established. Yet, within the context of well-mixed populations as was mentioned earlier, costly punishment only thrives when there is an excessive effect-to-cost ratio (Anderson & Putterman, 2006; Carpenter, 2007; Egas & Riedl, 2008; Han et al., 2013a; Nikiforakis & Normann, 2008; Wu et al., 2009), which is not the case for commitment-proposing strategies. Arranging a prior agreement regarding the posterior compensation does not require that compensation to scale with the cost of setting up the agreement, as was shown in Han, Moniz Pereira, and Lenaerts (2015); Han et al. (2013a), reducing the punishment fine significantly was necessary to induce an effect on the level of cooperation. Given this complementarity, we hypothesise that a weighted combination of these two mutually complementary strategies should lead to a better solution in coping with free-riding behaviours and as a consequence with the evolution of cooperation.
Resorting to Evolutionary Game Theory (EGT) methods (Hofbauer & Sigmund, 1998; Maynard-Smith, 1982; Sigmund, 2010), we study the conditions under which a weighted combination of these two strategies may lead to a more favourable outcome for cooperative behaviour in the one-shot Prisoner’s Dilemma (PD) (Sigmund, 2010; Trivers, 1971). In this game, rational choice determines that it is better for each player not to cooperate, even though both would be better off cooperating (see detailed description in the next section). Consequently, evolutionary game dynamics predicts that under those conditions cooperation disappears (Hardin, 1968; Nowak, 2006a, 2006b; Sigmund, 2010). Here, the synergy of the two strategies, arranging prior commitments and costly punishment, will be characterised by a single parameter, which describes the probability that either strategy is used in the PD. We study, both analytically and through numerical simulations, the range of this value that will allow the combined strategy to overcome the weaknesses of both strategies, thereby leading to a higher level of cooperation than the one achieved by either strategy by itself. Our results show that there is always a wide range of values for this parameter where the synergistic strategy performs better than both strategies independently.
The remainder of this paper is structured as follows. The next section describes the model combining commitment and punishment strategies in the one-shot Prisoner’s Dilemma, and the EGT methods used to analyse the model. Then, in the results section, analytical and numerical results are presented. The paper ends with a discussion and some directions for future work. We also provide a separate Supporting Information (SI) text that contains additional results and supporting analysis, which is referred to in the paper where relevant.
2 Models and methods
We first recall the definition of the Prisoner’s Dilemma game and its extension with strategies that arrange prior commitments (Han, Pereira, & Santos, 2012b; Han et al., 2013a) or use posterior costly punishment (Boyd et al., 2003; Sigmund et al., 2001). We then describe our model for the synergy of prior commitments and posterior punishment.
2.1 One-shot Prisoner’s Dilemma (PD)
As usual, for the one-shot Prisoner’s Dilemma (PD) game the four possible outcomes resulting from the action choices of the two players can be written down as a symmetric payoff matrix (Sigmund, 2010; Trivers, 1971)
with the entries of this matrix satisfying the ordering
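For reference, the conventional form of the one-shot PD can be sketched as below. The symbols R, S, T and P (reward, sucker's payoff, temptation, punishment) and the Donation game parameters b and c are the standard textbook notation and are an assumption here, since the matrix itself is not shown in the text above. Each cell lists the row player's payoff first.

```latex
% Conventional PD payoffs (notation assumed, not taken from the text above)
\[
\begin{array}{c|cc}
    & C       & D       \\ \hline
  C & R,\,R   & S,\,T   \\
  D & T,\,S   & P,\,P
\end{array}
\qquad\text{with}\qquad T > R > P > S .
\]
% Donation game as a special case: cooperating costs c and gives b to the co-player
\[
T = b, \quad R = b - c, \quad P = 0, \quad S = -c, \qquad b > c > 0 .
\]
```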
2.2 Commitment and costly punishment in the Prisoner’s Dilemma
Before playing the PD, a player using the commitment strategy (denoted by COMP) proposes to her co-player to commit to the game and cooperate. As arranging agreements or exposing the intention of others may be costly, the proposer has to pay an arrangement cost
Next to the traditional unconditional cooperators (C, who always commit when a commitment deal is proposed, cooperate whenever the PD is played, but do not propose commitment themselves) and unconditional defectors (D, who do not accept commitment, defect when the PD takes place, and do not propose commitment), we consider two commitment free-riding strategies, which have been shown to become dominant under certain conditions in the pair-wise PD situation (Han et al., 2013a).
Fake committers (FAKE), who accept a commitment proposal yet do not cooperate whenever the PD takes place. These players assume that they can exploit the commitment proposing players without suffering the consequences.
Commitment free-riders (FREE), who defect unless a commitment is proposed to them, in which case they accept it and subsequently cooperate in the PD. In other words, these players are willing to cooperate when a commitment is proposed but are not prepared to pay the cost of setting it up.
In the following sections, we consider well-mixed, finite populations of a constant size N, composed of those five strategies, i.e. COMP, C, FREE, D, and FAKE. We have shown, both analytically and numerically, that COMP dominates when the cost of arranging commitment
Adding all five strategies, the following payoff matrix for the pairwise PD is obtained (Han et al., 2013a)
Let us in turn determine the dominance conditions for the costly punishment strategy (CP) (Boyd et al., 2003; Hauert et al., 2007; Sigmund et al., 2001). It behaves as a standard C player in the PD, but unlike a C player, it punishes its co-player if she defects in the game. That punishment consists of paying a personal cost
Similar to COMP, the lower the cost
2.3 The synergy of commitment and punishment strategies
We now introduce a new strategy, denoted by CPP, that combines COMP and CP in the following manner. With probability q, CPP uses strategy COMP, and CP otherwise (i.e. with probability
The average payoff of a CPP player when playing with another CPP player is
In this expression, the four terms correspond to the payoff they get for playing both C, taking into account the probabilities that: (i) both players do not propose the commitment; (ii) the focal player proposes and the other does not; (iii) the focal player does not propose and the other does; and (iv) both players propose.
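The four-term structure described above can be sketched numerically. This is only an illustration under stated assumptions: the per-case payoffs are not given in the text, so the sketch assumes that mutual cooperation yields a reward `reward`, that a lone proposer bears the full arrangement cost `epsilon`, and that two proposers split it equally. The function name and parameter names are hypothetical.

```python
def cpp_vs_cpp_payoff(q, reward, epsilon):
    """Average payoff of a CPP player against another CPP player.

    Assumptions of this sketch (not stated in the text above):
    both players end up cooperating in every case, a lone proposer
    pays the whole arrangement cost epsilon, and two proposers
    share it equally (epsilon / 2 each).
    """
    neither_proposes = (1 - q) ** 2 * reward            # case (i)
    focal_only = q * (1 - q) * (reward - epsilon)       # case (ii)
    other_only = (1 - q) * q * reward                   # case (iii)
    both_propose = q ** 2 * (reward - epsilon / 2)      # case (iv)
    return neither_proposes + focal_only + other_only + both_propose
```

Under these assumptions the sum collapses to `reward - q * epsilon * (1 - q / 2)`, so the expected arrangement cost grows with q but less than linearly, because two proposers share it.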
2.4 Evolutionary dynamics in finite populations
Both the analytical and numerical results obtained here use Evolutionary Game Theory (EGT) methods for finite populations (Imhof, Fudenberg, & Nowak, 2005; Nowak, 2006a; Nowak, Sasaki, Taylor, & Fudenberg, 2004; Sigmund, 2010). In such a setting, agents’ payoff represents their fitness or social success, and evolutionary dynamics is shaped by social learning (Hofbauer & Sigmund, 1998; Sigmund, 2010), whereby the most successful agents will tend to be imitated more often by the other agents. In the current work, social learning is modelled using the so-called pairwise comparison rule (Traulsen, Nowak, & Pacheco, 2006), assuming that an agent A with fitness
The parameter
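The pairwise comparison rule of Traulsen et al. (2006) is commonly written as a Fermi function of the fitness difference. A minimal sketch, assuming that form; the function name is hypothetical and `beta` stands for the intensity of selection:

```python
import math


def imitation_probability(fitness_a, fitness_b, beta):
    """Probability that agent A adopts the strategy of agent B.

    Fermi function 1 / (1 + exp(-beta * (fitness_b - fitness_a))),
    evaluated in a numerically stable way to avoid overflow in exp.
    """
    x = beta * (fitness_b - fitness_a)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

With `beta = 0` the result is 0.5 whatever the fitness values (neutral drift); for large `beta` imitation becomes nearly deterministic in favour of the fitter agent.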
In the absence of mutations or exploration, the end states of evolution are inevitably monomorphic: once such a state is reached, it cannot be escaped through imitation. We thus further assume that, with a certain mutation probability, an agent switches randomly to a different strategy without imitating another agent. In the limit of small mutation rates, the dynamics will proceed with, at most, two strategies in the population, such that the behavioural dynamics can be conveniently described by a Markov Chain, where each state represents a monomorphic population, whereas the transition probabilities are given by the fixation probability of a single mutant (Fudenberg & Imhof, 2005; Imhof et al., 2005; Nowak et al., 2004). The resulting Markov Chain has a stationary distribution, which characterises the average time the population spends in each of these monomorphic end states.
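The stationary distribution of such a small-mutation Markov chain can be computed from the pairwise fixation probabilities alone. The following is a minimal sketch, not the authors' implementation. It assumes that `rho[i][j]` holds the fixation probability of a single mutant of strategy j in a resident population of strategy i, and that each of the other strategies is equally likely to appear as the mutant, so the transition probability from i to j is `rho[i][j] / (n - 1)`. The function name and the use of plain power iteration are choices made for this illustration.

```python
def stationary_distribution(rho, iterations=10000):
    """Stationary distribution over monomorphic states.

    rho[i][j]: fixation probability of one j-mutant among i-residents
    (diagonal entries are ignored). Uses simple power iteration.
    """
    n = len(rho)
    # Build the row-stochastic transition matrix between monomorphic states.
    transition = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                transition[i][j] = rho[i][j] / (n - 1)
        # Probability of staying in state i (diagonal is still 0.0 here).
        transition[i][i] = 1.0 - sum(transition[i])
    # Start from the uniform distribution and iterate dist <- dist * T.
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * transition[i][j] for i in range(n))
                for j in range(n)]
    return dist
```

For two strategies with `rho = [[0, 0.1], [0.02, 0]]`, the population spends five times longer in the second state than in the first, since it is invaded five times less easily.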
Let N be the size of the population. Suppose there are at most two strategies in the population, say, k agents using strategy A (
Now, the probability of changing the number k of agents using strategy A by ± one in each time step can be written as (Traulsen et al., 2006)
The fixation probability of a single mutant with a strategy A in a population of
In the limit of neutral selection (i.e.
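The fixation probability can be evaluated numerically as sketched below, assuming the Fermi form of the pairwise comparison rule, for which the ratio of backward to forward transition probabilities at k agents of type A is `exp(-beta * (payoff_A(k) - payoff_B(k)))`. The function and argument names are hypothetical; `pi_xy` denotes the payoff of strategy X against strategy Y, and self-interaction is excluded when averaging payoffs.

```python
import math


def average_payoffs(k, n, pi_aa, pi_ab, pi_ba, pi_bb):
    """Average payoffs of A and B players when k of the n agents use A."""
    payoff_a = ((k - 1) * pi_aa + (n - k) * pi_ab) / (n - 1)
    payoff_b = (k * pi_ba + (n - k - 1) * pi_bb) / (n - 1)
    return payoff_a, payoff_b


def fixation_probability(n, beta, pi_aa, pi_ab, pi_ba, pi_bb):
    """Probability that a single A mutant takes over n - 1 B residents."""
    total = 1.0
    log_product = 0.0  # log of prod_{j<=i} T^-(j) / T^+(j)
    for k in range(1, n):
        payoff_a, payoff_b = average_payoffs(k, n, pi_aa, pi_ab, pi_ba, pi_bb)
        log_product -= beta * (payoff_a - payoff_b)
        # Clamp the exponent so math.exp cannot overflow; the result is then ~0.
        total += math.exp(min(log_product, 700.0))
    return 1.0 / total
```

As a sanity check, `beta = 0` returns exactly `1 / n`, the neutral-selection value, and a defector mutant in a population of cooperators fixates with probability above `1 / n` for any positive `beta`.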
Risk-dominance. An important measure for comparing the two strategies A and B is the direction in which the transition is stronger or more probable, an A mutant fixating in a population of agents using B,
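For the Fermi form of the pairwise comparison rule, this comparison reduces to a simple payoff inequality. The derivation below is a sketch in assumed notation: \(\rho_A\) is the fixation probability of a single A mutant among B residents, \(\rho_B\) the reverse, \(\pi_{X,Y}\) the payoff of X against Y, and \(\Pi_A(k)\), \(\Pi_B(k)\) the average payoffs when k agents use A.

```latex
\[
\frac{\rho_A}{\rho_B}
  = \prod_{k=1}^{N-1} \frac{T^{+}(k)}{T^{-}(k)}
  = \exp\!\Big( \beta \sum_{k=1}^{N-1} \big[ \Pi_A(k) - \Pi_B(k) \big] \Big)
  = \exp\!\Big( \tfrac{\beta}{2} \big[ (N-2)\,\pi_{A,A} + N\,\pi_{A,B}
        - N\,\pi_{B,A} - (N-2)\,\pi_{B,B} \big] \Big).
\]
% Hence A is risk-dominant against B (rho_A > rho_B) if and only if
\[
(N-2)\,\pi_{A,A} + N\,\pi_{A,B} \;>\; N\,\pi_{B,A} + (N-2)\,\pi_{B,B},
\]
% which for large N simplifies to
\[
\pi_{A,A} + \pi_{A,B} \;>\; \pi_{B,A} + \pi_{B,B}.
\]
```

The second equality uses \(\Pi_A(k) = [(k-1)\pi_{A,A} + (N-k)\pi_{A,B}]/(N-1)\) and \(\Pi_B(k) = [k\,\pi_{B,A} + (N-k-1)\pi_{B,B}]/(N-1)\), summed over k from 1 to N-1.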
3 Results
3.1 Conditions for the viability of CPP
We derive analytical conditions for which CPP can be a viable strategy for the emergence of cooperation in the Donation game (see Methods). In other words, we wish to determine when CPP is successful against the defective and free-riding strategies, and this is relative to q. Using the inequality in equation (7), we obtain the conditions under which CPP is risk-dominant against the three strategies D, FAKE and FREE, respectively
We observe that the left hand side of equation (10) is smaller than or equal to the left hand side of equation (8). Thus, satisfying the inequality in equation (10) implies satisfying the inequality in equation (8). Hence, in order for CPP to be risk-dominant against the three defective strategies, we only need to guarantee that the two inequalities in equations (9) and (10) hold. By solving the system of these two inequalities, we can analytically derive the range of q for which CPP is risk-dominant against all free-riders (see SI for details), which is corroborated by numerical simulation results in the next section.
We now derive some properties of these inequalities for varying q, and for some special cases. First, we observe that whenever punishment is carried out, at least to some degree (i.e.
In order to obtain further meaningful comparison of CPP to COMP and CP, we consider that prior commitment and costly punishment have the same cost of setting up as well as the same effect of compensation/punishment when the co-player misbehaves (i.e. defects in the PD after having agreed to cooperate in the former case, and defects in the PD in the latter). That is, we assume
The right hand side of equation (12) is a decreasing function of q (see SI), implying that the larger q is, the easier it is to satisfy this condition. In other words, the larger q is, the better FAKE players can be restrained by CPP players (as can already be inferred from the numerical results shown in Figure 1).

Figure 1. Frequency of each strategy as a function of q, in a population of five strategies CPP, C, D, FAKE and FREE. For a large range of q, CPP has a higher frequency than both the commitment-proposing strategy COMP (i.e. CPP with
When
Similarly, when
Comparing conditions specified in equations (13) and (14), it follows that when
Now, equation (10) can be simplified to
When
3.2 Varying usage of prior commitments and punishment can cope better with various types of free-riders
The analytical observations above are clarified by looking at Figure 1, where we plot the frequencies of the five strategies CPP, C, D, FAKE and FREE (see Methods) as a function of q, for different values of
Indeed, we observe that there is a wide range of q where CPP is better than both COMP and CP, in all the panels of Figure 1. The ranges are in close accordance with the analytical results (see SI, Section 3). Furthermore, in Figure 2 we plot the range of q in which CPP is better than both COMP and CP, for varying

Figure 2. The range of q in which CPP is more frequent than both COMP and CP (the light grey area), as a function of
As we have seen, there is always a wide range of parameter values in which CPP is better than both COMP and CP individually. But what is the actual improvement, e.g. in terms of the improved level of cooperation, that one may obtain with the combined strategy? We search for the optimal value of q, at which CPP has the highest frequency (Figures 3a and 3c), for varying both

Figure 3. (a) Frequency of CPP at the optimal value of q; (b) the improvement in percentage of CPP in comparison to the maximum of COMP and CP; and (c) optimal value of
Moreover, to gain further understanding about when CPP performs best, in Figure 3c we plot, as a function of
4 Discussion
Both costly punishment and arranging prior commitment have been shown to provide important pathways for the evolution of cooperation (Boyd et al., 2003; X.-P. Chen & Komorita, 1994; Han, 2016; Han, Moniz Pereira, & Lenaerts, 2015; Han et al., 2013a; Han, Santos, et al., 2015; Hauert et al., 2007; Hilbe & Traulsen, 2012), and both are widely used in diverse human activities to enforce cooperation and regulate collective behaviour (Fehr & Gächter, 2000; Fehr & Gachter, 2002; Henrich et al., 2006; Nesse, 2001; Sterelny, 2012). Interestingly, we have shown in this paper that a simple synergy of the two mechanisms can lead to an even better one that promotes a higher level of cooperation than either strategy by itself, for a wide range of parameter values. Each strategy has its own weakness, which the combined strategy can overcome. On one hand, arranging prior commitment reduces the effect-to-cost ratio required by costly punishment to perform efficiently, particularly when the cost of arrangement is sufficiently low. On the other hand, costly punishment can enable one to deal with commitment free-riders, who can escape sanctioning when interacting with the commitment strategy. In addition, one important feature of costly punishment is that its efficiency always increases with the effect-to-cost ratio of punishment, which is not possessed by the commitment strategy, even when the cost of arrangement is low. Our results show that the combined strategy retains this important property of costly punishment. Furthermore, the improvement that can be achieved through the combined strategy, in terms of frequency (compared to the best of the two strategies separately), is most significant when the cost is sufficiently large and the impact of punishment reaches a threshold. This is a notable observation since the performance of the commitment strategy is demolished in the former case and the performance of costly punishment is reduced in the latter one. 
As such, our results have shown that the combined strategy can overcome the weaknesses of both strategies. Hence, as an implication, our results provide novel insights for the design of self-organised or distributed multi-agent and autonomous agent systems (Bonabeau, Dorigo, & Theraulaz, 1999) that require cooperation among agents in a competitive environment.
In addition, our results suggest that free-riders of various types might be better dealt with by simply varying the use of two complementary mechanisms that can efficiently deal with the same kind of strategic situations (i.e. the one-shot PD herein). It would be optimal to be able to identify what kind of free-riders one is dealing with and use the most appropriate mechanism, but it might be difficult to do so when the information available about the co-players is insufficient for decision making, which is particularly the case in non-repeated interactions. By varying the use of the available mechanisms (in our case, costly punishment and commitment), we can suppress more types of free-riders. Related to this, we envisage that our combined strategy can be improved through using additional cognitive skills, e.g. intention recognition (Han et al., 2011; Han, Santos, et al., 2015), to recognise the type of free-riders one is dealing with and then use the best mechanism to deal with that type.
Our model is closely related to the models in Sigmund et al. (2010); Szolnoki et al. (2011) where peer and pool punishments are studied when present as two separate strategies at the same time in either a well-mixed or structured (network) population. These models were designed to investigate which of the two sanctioning mechanisms is preferred when both of them are available to a society, a question which has also been studied experimentally in Putterman, Tyran, and Kamei (2011); Traulsen et al. (2012). Although pool punishment and prior commitment are similar in the sense that both require prior arrangements for the sanctioning of defectors at a later stage, our work differs in that commitment and peer punishment are combined into one single strategy, aiming to show that these two mechanisms can support each other in dealing with various types of free-riders. Nonetheless, as shown in SI (Section 6, Figure S4) where we analyse various types of CPP strategies in co-presence in the population (including CP and COMP as the extreme types of CPP, with
Similarities can also be found with the model that analyses the probabilistic sharing of punishment duty in the context of the Public Goods Game (PGG) (X. Chen, Szolnoki, & Perc, 2014). In a PGG group interaction, punishers share the duty of punishing defectors. That is, when facing defectors, a punisher only punishes with a certain probability, and cooperates otherwise. A similar approach is conditional punishment, wherein punishment duty is proportional to the number of punishers in the group (Szolnoki & Perc, 2013). Both approaches have been shown to be more efficient for promoting cooperation in structured populations than mere punishment as they enable costly punishers to deter defectors more efficiently, especially when punishment is expensive (i.e. having a low effect-to-cost ratio). We go beyond those results by showing that the synergy between costly punishment and commitment provides a more efficient solution than mere punishment even in well-mixed populations, wherein cooperation emerges less readily (Santos, Pacheco, & Lenaerts, 2006). Moreover, the efficiency of the model in X. Chen et al. (2014) is based on the fact that punishers can share punishment duty, which would clearly become less efficient, if not inapplicable, for a two-player game as in our setting. In contrast, our model can be readily extended to group interactions typically found in the PGG (Han, Moniz Pereira, & Lenaerts, 2015).
Various extensions to the current model can be described. First, in this work we have not taken into account the fact that punishment might be antisocial, whereby defectors can also punish cooperators. Antisocial punishment is widespread in nature (Herrmann, Thöni, & Gächter, 2008) and has been shown to be detrimental for the emergence of cooperation (Han, 2016; Hilbe & Traulsen, 2012; Rand & Nowak, 2011). In contrast, arranging a prior commitment ensures that this kind of antisocial behaviour does not occur, because only those who agreed to commit can be punished for misbehaving (Han et al., 2013a). Hence, it would be interesting to see whether our combined strategy can overcome this weakness of costly punishment when antisocial punishment is possible. Another setting where the combination of commitment and costly punishment might be of interest is that of the repeated interaction setting, where both commitment and costly punishment have been used in support of apology (Han, Pereira, Santos, & Lenaerts, 2013b; Martinez-Vaquero, Han, Pereira, & Lenaerts, 2015; Okamoto & Matsumura, 2000). Hence, it would be interesting to see whether and how the combined strategy can support apology more efficiently, leading to a better outcome for cooperation. Last but not least, as discussed above, the cost efficiency issue of costly punishment can be efficiently dealt with in structured populations (Brandt et al., 2003; Helbing et al., 2010a, 2010b; Nakamaru & Iwasa, 2005). Thus, it would be interesting to study how the synergistic effect of commitment and costly punishment would change in such a setting.
In short, our results have shown that, although both commitment and costly punishment might promote the evolution of cooperation in the one-shot interaction setting, they can actually complement each other to assemble a better combined solution that ensures a more favourable outcome for cooperative behaviour. By varying the use of the available mechanisms, one can suppress more types of free-riders, even without looking at which mechanism is best for the situation at hand.
Footnotes
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The Anh Han was supported by Teesside URF funding (11200174). Tom Lenaerts was supported by FRS - FNRS Belgium (grant number 2.4614.12) and FWO Belgium (grant number G.0391.13N).
