Abstract
Using data from the 2011-2012 season of the Premier League, we study empirically and theoretically the impact of soccer suspension rules on the behavior of players and referees. For players facing a potential one-game suspension, being one versus two yellow cards away from the suspension limit results in an approximate 12% reduction in fouling, while for those facing a potential two-game suspension, the reduction is approximately 23%. The probability such players receive a yellow card is also reduced. In addition, we find some evidence of slight referee bias for the home team in the dispensing of penalty cards, but not in the calling of fouls. Finally, we develop a theoretical framework for investigating the effects of suspension rules on the number of fouls committed. Within this framework, we investigate how policy instruments such as referees’ propensity to give out yellow cards or their consistency in giving them out affect the impact of suspension rules.
Introduction
In this article, we quantitatively evaluate the effect of suspension rules on players’ behavior in soccer. Soccer, like many other forms of football, features situations when two players collide in an attempt to control a ball. Such collisions are potentially dangerous to players. In extreme cases, they can result in injuries so serious that they threaten the players’ future careers. In order to mitigate such outcomes and maintain the integrity of the game, various tournament and league organizers introduce punishment schemes to deter players from dangerous play. In soccer, a dangerous play is typically punished by showing the player a yellow card, which goes on the player’s record (he is “booked”). Accumulation of a certain number of yellow cards in a season results in the player being suspended for one or more games. Our goal in this article is to evaluate the impact of the threat of such suspensions on players’ behavior.
Specifically, we analyze a unique, player-game-level data set from the English Premier League to address the following three questions: (1) Under the current suspension rules, what affects the number of fouls committed? (2) Under the current suspension rules, what affects the probability of getting a penalty card? and (3) How would the number of fouls committed differ if the suspension rules changed?
First, we estimate the effect that the threat of missing a game has on a player’s propensity to commit a foul. In order to do so, we use a Poisson regression where we estimate the expected number of fouls committed in a game, as a function of the number of yellow cards the player can still receive before being suspended, as well as the player’s unobservable characteristics (fixed effects), and a large number of control variables. Our results suggest that if a player is only one yellow card away from suspension instead of two, the expected number of fouls he would commit in the game drops by 12% if he faces a one-game suspension and 23% if he faces a two-game suspension.
Second, we use a logistic regression to estimate the probability p of a player receiving a penalty card. Our results show that if a player is only one yellow card away from suspension instead of two, his odds (p divided by 1 − p) of getting a penalty card are reduced by 50% if he faces a one-game suspension and by 70% if he faces a two-game suspension.
Finally, we develop a dynamic model of a player’s optimal behavior and show that our empirical results are in line with the predictions of the economic theory. We calibrate model parameters (as much as possible) and use the model to analyze the effect of suspension rules on player behavior—not only in the games just before possible suspension, but even early in the season. This framework allows us to investigate the impact of changing the suspension rules, something impossible to do using the data alone (Lucas, 1976).
We want to point out that it is not obvious a priori what the effects of the suspension rules on player behavior might be. The direct effect is that the threat of suspension makes the player more cautious. There can be, however, an opposing strategic effect. In 2010, two Real Madrid players deliberately drew yellow cards for delaying a game so that they would miss the upcoming final game of the group stage and enter the more important knockout stage, for which their team had already qualified, with clean records. Another aspect to consider is that whether to book a player is ultimately the referee’s decision on the field. It is quite plausible that a referee, knowing a player is facing a possible one-game suspension, may alter his decision about booking in a nonobvious situation (when a foul is borderline dangerous).
The outline of the article is as follows: The second section discusses the relevant literature, while the third section contains relevant background information about the game of soccer and the English Premier League. The fourth section describes the data set utilized in this study. The Poisson and logistic regression models are explained and discussed in the fifth section, while the dynamic modeling framework is set forth in the sixth section. The last section gives some concluding remarks and avenues for future research.
Relation to the Literature
Several authors have studied the impact of a red card on the outcome of the match (e.g., Bar-Eli, Tenenbaum, & Geister, 2006; Ridder, Cramer, & Hopstaken, 1994; Vecer, Kopriva, & Ichiba, 2009). Dobson, Goddard, and Staehler (2014) used a Tullock contest model to predict levels of effort in the English Premier League, with data on fouls and yellow and red cards used to reflect the effort of teams. Team-level data were used for matches played between the 2001-2002 and 2006-2007 seasons. Snyder (2013) uses individual-level Premier League data from the 2011-2012 season to predict game outcomes in the context of sports betting. We use these same data to answer a different question: How much, if at all, does the Premier League’s suspension rule curb fouling? A similar question was asked by del Corral, Prieto-Rodriguez, and Simmons (2010), who studied whether Spain’s 1995 rule change awarding 3 points for a win instead of 2 affected the team-wide probability of a red card. Garicano and Palacios-Huerta (2005) studied the same question as del Corral et al. (2010) but used data from a different year. Stride, Patterson, and Thomas (2011) gathered data on fouls in the 2010 World Cup tournament, both those caught by referees and those that should have been. They modeled foul counts with the Poisson distribution and utilized a dispersion parameter to account for non-Poisson variance. They found that position, international experience, and the stage of the tournament were important predictors of fouls. The World Cup, however, has different suspension rules than the Premier League, and it does not seem that the effect of these rules on fouls was considered (Stride et al., 2011, is unpublished at the time of this writing).
Our article is closely related to the economic literature on crime and deterrence. McCormick and Tollison (1984) evaluate the effect of adding a third referee in basketball on the number of fouls in the National Basketball Association. They find the additional referee matters: the number of fouls by players dropped. A similar point is made by Tella and Schargrodsky (2004) who show that an increase in the number of police reduces the number of nonviolent crimes committed. Both studies suggest that unwanted behavior declines if the probability of being punished increases. Our result confirms that observation: The more yellow cards the player has accumulated (i.e., the more likely the player is to be suspended), the fewer fouls he commits. Finally, Kessler and Levitt (1999) address whether jail sentences reduce crime through deterrence, that is, by affecting potential criminals’ behavior. Their findings suggest that the threat of punishment mitigates unwanted behavior. In our article, we confirm that effect and quantify it in the context of punishment cards and suspension rules in soccer.
A special subset of the crime literature concerns the effects of “three-strikes” legislation enacted in California and elsewhere. Such laws impose harsh punishments on frequent offenders. Early analyses of the legislation’s effects (Greenwood et al., 1994; Zimring, Kamin, & Hawkins, 1999) considered only partial deterrence, the deterrence of offenders who had already received two strikes. A more sophisticated analysis by Shepherd (2002) shows that this underestimates the true effect of the laws, since three-strikes laws also deter individuals contemplating their first offense. We likewise expect that the effects of suspension rules in the Premier League will extend beyond players who are just one card away from an accumulation limit. Helland and Tabarrok (2007), taking advantage of the randomization of trial outcomes, found that California’s three-strikes laws reduce felony arrest rates of criminals with two strikes by approximately 17–20%. Similarly, as part of our analysis, we estimate the effect, given the suspension rules, of being one versus two yellow cards away from the accumulation limit. In his comprehensive review of the evidence regarding deterrence, Nagin (2013) suggests that certainty of punishment may have a greater impact on behavior than severity of punishment. In the structural model of the “Theoretical Model” section, in addition to incorporating the severity of punishment, we include a parameter that captures the certainty of punishment.
Of course, whether a player gets called for a foul or receives a penalty card depends not only on his behavior but also on the referee’s. Morgulev, Azar, Lidor, Sabag, and Bar-Eli (2014) show that basketball players fall intentionally in order to deceive referees that they are being fouled. Such deception in soccer weakens the link between a player’s actual aggressiveness and the probability he would receive a yellow card. A play that is not dangerous may be called as one, and a truly aggressive play may be misinterpreted as “diving”—an intentional fall by the other player. We take these problems into consideration in our structural model of optimal yellow card accumulation.
Another important factor in referees’ behavior is the possibility of favoring a home team. Indeed, Dawson, Dobson, Goddard, and Wilson (2007) found that the tendency for home teams to incur fewer penalty cards than away teams was best explained by referee bias. Home-field bias was also found by Sutter and Kocher (2004) in a study of the German Bundesliga. In addition, Sutter and Kocher (2004) confirmed the finding by Garicano, Palacios-Huerta, and Prendergast (2005) that extra time was awarded in a way that favored the home team. The evidence for referee bias is not limited to what has been discovered by statistical analysis of match data, however compelling this evidence is; it has also been confirmed by experimentation. Using video clips from English Premier League games, Nevill, Balmer, and Williams (2002) conducted an experiment in which 40 referees were asked to classify tackles as regular or irregular. Half of the referees were shown the clips with the audience muted while the other half were able to hear the audience. Referees who could hear the audience were more reluctant to classify home team tackles as irregular and their decisions correlated better with the “game-time” decisions than did those by referees who watched the muted clips. Taken together, these studies constitute rather strong evidence that referees tend to favor the home team when dispensing penalty cards. We therefore take venue into account in all empirical models presented herein.
Background Information
Since we study the impact of certain types of punishment rules on player behavior, we will now briefly outline the different types of offenses and individual punishments in soccer. We will also describe in detail the Premier League suspension rules.
Yellow Cards Rules and Offenses Punishable
The individual punishment we focus on is a yellow card. A yellow card is used by a referee to officially caution a player who commits a certain offense. It goes on the player’s official record (the player is booked). Two yellow cards in a game turn into a red card, which expels the player for the rest of the game, and his team plays with one man down. The player must also sit out the next game. The International Federation of Association Football (FIFA) specifies the offenses that are punishable with a yellow card. These are (1) unsporting behavior; (2) dissent by word or action; (3) persistent infringement of the Laws of the Game; (4) delaying the restart of play; (5) failure to respect the required distance when play is restarted with a corner kick, throw-in, or free kick; (6) entering or reentering the field of play without the referee’s permission; and (7) deliberately leaving the field of play without the referee’s permission. Unsporting behavior includes dangerous fouls, removing a jersey after a goal celebration, or simulating actions in order to deceive a referee (e.g., diving in the penalty box in order to be given a penalty kick).
By far, the most common offense punishable by a yellow card is dangerous play. The use of yellow cards to punish dangerous play has evolved over the last 30 years to make the game safer for players involved. It includes codification of fouls that automatically result in a yellow card (e.g., an incidental tackle from behind). In addition to that, different organizations introduce additional game suspensions for players who have a record of dangerous play in multiple games. For instance, in the FIFA World Cup and in the Union of European Football Associations Champions League, two yellow cards received in two different games of the same stage result in a suspension of a player for one match.
Suspension Rules in the Premier League for 2011-2012 Season
Like any other organization, the Premier League has its own suspension rules. The season starts in August and lasts until May, with each of the 20 teams playing every other team twice (once at home, once away) for a total of 38 games. If a player accumulates five yellow cards before December 31, he is suspended for one game. If he accumulates 10 yellow cards before the second Sunday in April, he is suspended for two games. Finally, if he accumulates 15 yellow cards before the end of the season, he is suspended for three games (or the rest of the season if there are fewer than three games remaining). All suspensions take place immediately following the game.
We had to pay special attention to red cards. In our data (see “Data” section), if a player receives two yellow cards in a game, it is coded as one red card and zero yellow cards. If a player receives a straight red card, it is coded as one red card as well and the player may or may not have a yellow card as well. In the Premier League, if a red card is a result of two yellows, then the player leaves the field, misses the very next game (because of the red card), but he is still booked for two yellow cards which go on his record toward the possible future suspension.
Incentives Caused by Suspension Rules
These suspension rules create a complex set of incentives for a player. First of all, as the three-strikes literature suggests, the very possibility of future suspension should curb card-worthy behavior for all players, regardless of how close they are to the cutoff (Shepherd, 2002). We expect, however, that as a player nears the accumulation limit, he will be increasingly likely not to foul. Being one or two yellows away from the limit may constitute special situations since the player is now “one strike” away in the sense that one bad act could bring him to the limit.
All else being equal, we expect the incentives against fouling to be stronger when facing a longer suspension. In practice, however, all else isn’t equal: When accumulating yellows toward a shorter suspension, players are also accumulating yellows toward the longer one as well. Thus, the measured effect of facing a one-game suspension includes the effect of a looming possible two-game suspension. In addition, there should be some temporal effect, since a player with three yellows to give and one game before the deadline will tend to be more aggressive than he would be if he had eight games to go. In the “Regression Models” section, we introduce regression models which we feel do an adequate job of teasing out the effects of these incentives, while the structural theoretical model presented in the “Theoretical Model” section, by explicitly incorporating the dynamic aspect of the incentives, allows one to explore what would happen if the suspension rules were changed.
Data
To estimate the effect of the Premier League’s cumulative penalty card suspension rules, we utilize a very rich data set containing individual player data from each game of the 2011-2012 season. The data set was released in the summer of 2012 by Manchester City Football Club in collaboration with the data gathering company Opta in order to encourage soccer analytics. For each player in each game, frequency counts for some 200 different events, ranging from successful passes to duels won to goals scored, are recorded. (A more detailed description of the individual variables we included in our analysis is available in the Appendix. For a detailed description of the variables not included, see Opta, 2012.) These data have been analyzed on various Internet blogs (e.g., Bime, 2012; Brown, 2012; Ramineni, 2012), but no analysis has focused on estimating the effect of the suspension rules on player or referee behavior. Snyder (2013) used the data to predict game outcomes (home loss, tie, or home win) in the context of sports betting. Table 1 gives summary statistics by position for the most important variables. Since each row of the data set corresponds to a player–game combination, these can be thought of as the league-wide per game averages. The data set included one row for every unique player–game combination for a total of 10,369 rows. There were 539 unique players from 20 teams, each player playing anywhere from 1 to 38 games. Every team played every other team twice, once at home and once away.
Table 1. Summary Statistics by Position for the Most Important Variables.
Regression Models
We model the effect of the suspension policy on two outcomes of interest: the number of fouls committed and the probability of receiving a penalty card (yellow or red). We suspect that players close to the card limit will behave less aggressively and therefore commit fewer fouls and incur fewer penalties. Because number of fouls committed takes on integer values, we use Poisson regression instead of the more common linear regression. (Technically, we use quasi-Poisson regression, which allows for more flexibility in the variation of observations about their mean than does regular Poisson regression.) For modeling the probability of receiving a penalty card, we use logistic regression.
As a result of their position and/or style of play, some players are inherently more aggressive than others. We account for this by including a different intercept for each player in both models. We assume that this latent level of aggressiveness remains constant across games, but believe that there may also be game-specific increases or decreases in aggression for various reasons. For example, a player who typically is not aggressive may be agitated by someone else committing fouls against him, leading to more fouls committed than normal. Or perhaps a player gets the ball stolen, which could increase the probability that he will foul in retaliation. We account for these circumstantial, game-specific changes in aggression by including appropriate covariates in both models.
Model for Number of Fouls (Quasi-Poisson Regression)
In Poisson regression, the expected number of fouls committed by player i in game j is
$$\mu_{ij} = \exp\left(\alpha_i + x_{ij}^{\top}\beta\right),$$
where $\alpha_i$ is the fixed effect (inherent level of aggression) for player i, $x_{ij}$ are the covariates for player i in game j, and $\beta$ is a vector of regression coefficients common to all players. The number of fouls $Y_{ij}$ committed by player i in game j is assumed to follow a $\mathrm{Pois}(\mu_{ij})$ distribution. This type of model is sometimes called a Poisson log-linear model because $\log(\mu_{ij})$ is linear in the model parameters $\beta$ (Agresti, 2002, pp. 125–126). Poisson regression is more appropriate than, say, ordinary least squares for modeling foul counts because the Poisson distribution has the nonnegative integers as its support, whereas the normal distribution covers the whole real line.
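The log-linear form implies that each coefficient acts multiplicatively on the expected number of fouls. The following minimal sketch illustrates this; the function name and all numerical values are hypothetical and are not estimates from this study.

```python
import math

def expected_fouls(alpha_i, x_ij, beta):
    """Poisson log-linear mean: mu_ij = exp(alpha_i + x_ij . beta)."""
    linear_predictor = alpha_i + sum(x * b for x, b in zip(x_ij, beta))
    return math.exp(linear_predictor)

# Hypothetical values, for illustration only (not estimates from the paper).
alpha_i = -0.2          # player fixed effect
beta = [0.10, -0.05]    # coefficients shared by all players
x_base = [1.0, 2.0]     # covariates for one player-game
x_plus = [2.0, 2.0]     # same player-game, first covariate raised by one unit

ratio = expected_fouls(alpha_i, x_plus, beta) / expected_fouls(alpha_i, x_base, beta)

# A one-unit increase in covariate k multiplies the mean by exp(beta_k),
# whatever the player's fixed effect or the other covariates are.
print(ratio, math.exp(beta[0]))
```

Because the fixed effect cancels in the ratio, percentage effects of this kind can be compared across players with very different baseline levels of aggression.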
A special property of the Poisson distribution is that its mean and variance are equal. In practice, however, count data do not always exhibit this property. One solution that retains many of the advantages of Poisson regression is to introduce a dispersion parameter $\phi$ and require
$$E(Y_{ij}) = \mu_{ij} \quad\text{and}\quad \mathrm{Var}(Y_{ij}) = \phi\,\mu_{ij}.$$
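When the variance is taken to be a multiple φ of the mean, one common estimator of φ divides the Pearson chi-square statistic by the residual degrees of freedom. The sketch below assumes fitted means are already available; the function name and the toy data are hypothetical, and we do not claim this is the exact estimator used for the results reported here.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Estimate phi as sum((y - mu)^2 / mu) / (n - p)."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / dof

# Toy data, for illustration only.
fouls = [0, 1, 2, 1, 0, 3]
fitted_means = [0.5, 1.0, 1.5, 1.0, 0.5, 2.0]
phi_hat = pearson_dispersion(fouls, fitted_means, n_params=2)
print(round(phi_hat, 4))  # values below 1 indicate less variation than Poisson
```

Under this convention, standard errors from an ordinary Poisson fit are rescaled by the square root of the estimated dispersion.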
We now turn our attention to the selection of covariates to be included in our model. We created three new variables and combined these in various ways in order to tease out effects of the Premier League’s cumulative yellow suspension rules. The new variables are Yellows.To.Give, Suspension.Duration, and Weeks.Until.Cutoff.
Yellows.To.Give refers to the number of yellow cards a player may incur (at the start of the game in question) before a suspension is warranted. Thus, if the date is on or before December 31, 2011, and a player has received four yellow cards, then Yellows.To.Give = 1, because once he receives one more yellow card, he has earned a suspension. If the date is on or before April 8, 2012, and the player has received seven yellow cards, then Yellows.To.Give = 3. When Yellows.To.Give is low, we believe a player will be more careful not to foul. If, in the course of a match, a player receives a number of yellow cards exceeding Yellows.To.Give, then he will be suspended for a number of games equal to Suspension.Duration, and Yellows.To.Give will be reset to the appropriate quantity at the start of the next game in which he participates. We expect that as the severity of the potential suspension increases, the propensity to foul should decrease.
We believe that Yellows.To.Give equaling 1 or 2 could constitute special situations (since one bad play results in suspension), the effects of which we would like to estimate separately for each suspension severity level. We constructed four additional variables to capture the effects of these special situations: Y1S1, Y1S2, Y2S1, and Y2S2, where YiSj takes the value 1 if Yellows.To.Give equals i and Suspension.Duration equals j and the value 0 otherwise. When Yellows.To.Give exceeds 2, we require Yellows.To.Give and Suspension.Duration to affect the (log of) expected number of fouls linearly and independently by including Yellows.To.Give.Linear = Yellows.To.Give × I(Yellows.To.Give > 2) and Suspension.Duration.Linear = Suspension.Duration × I(Yellows.To.Give > 2) in the model.
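The construction of these six covariates can be sketched as follows. The function name is hypothetical; the variable names and definitions follow the text above.

```python
def suspension_covariates(yellows_to_give, suspension_duration):
    """Build the indicator and linear terms for one player-game."""
    covariates = {}
    # YiSj = 1 if Yellows.To.Give == i and Suspension.Duration == j, else 0.
    for i in (1, 2):
        for j in (1, 2):
            covariates[f"Y{i}S{j}"] = int(
                yellows_to_give == i and suspension_duration == j
            )
    # Linear terms are switched on only when Yellows.To.Give exceeds 2.
    far_from_limit = int(yellows_to_give > 2)
    covariates["Yellows.To.Give.Linear"] = yellows_to_give * far_from_limit
    covariates["Suspension.Duration.Linear"] = suspension_duration * far_from_limit
    return covariates

print(suspension_covariates(1, 2))  # only Y1S2 is switched on
print(suspension_covariates(4, 1))  # only the two linear terms are nonzero
```

With this coding, a player-game contributes either through exactly one indicator (when Yellows.To.Give is 1 or 2) or through the two linear terms, never both.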
Weeks.Until.Cutoff measures the number of weeks until the cutoff date for the next potential suspension. For example, suppose a player incurs his fifth cumulative yellow card on November 13, 2011, and is subsequently suspended for one game on November 20. For this player, Weeks.Until.Cutoff on November 13 is the number of weeks until December 31, but Weeks.Until.Cutoff on November 27 is the number of weeks until April 8. We expect players with more games to play before the suspension cutoff will tend to foul less, resulting in a negative coefficient.
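As a rough sketch of how the three constructed variables might be derived for a given player-game, the code below walks through the accumulation limits in order and picks the first one that is still reachable. Several details are assumptions rather than statements from the text: the function and constant names are hypothetical, the season-end date is assumed to be May 13, 2012, weeks are measured as fractional calendar weeks, and a player who has reached a limit (or passed its cutoff) is assumed to move on to the next limit.

```python
from datetime import date

# (accumulation limit, cutoff date, suspension length in games), 2011-2012.
# The first two cutoff dates appear in the text; the third is an assumption.
THRESHOLDS = [
    (5, date(2011, 12, 31), 1),
    (10, date(2012, 4, 8), 2),
    (15, date(2012, 5, 13), 3),
]

def suspension_state(game_date, yellows_so_far):
    """Return (Yellows.To.Give, Suspension.Duration, Weeks.Until.Cutoff).

    yellows_so_far is the cumulative count at the start of the game.
    Returns None if no further accumulation suspension is reachable.
    """
    for limit, cutoff, duration in THRESHOLDS:
        if game_date <= cutoff and yellows_so_far < limit:
            weeks = (cutoff - game_date).days / 7
            return limit - yellows_so_far, duration, weeks
    return None

# Four yellows before December 31: one card away from a one-game suspension.
print(suspension_state(date(2011, 11, 13), 4))
# After the fifth yellow, the next target is 10 yellows by April 8.
print(suspension_state(date(2011, 11, 27), 5))
```

The two calls mirror the November 13 and November 27 example above: the cutoff switches from December 31 to April 8 once the first limit has been reached.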
There are, of course, other ways to incorporate these constructed variables into a model. For instance, one could assume that the (log of) expected number of fouls grows linearly in Yellows.To.Give, giving no special treatment to cases where a player is one card away from suspension. Or, instead of including a linear effect for Weeks.Until.Cutoff as we have done, one could use Yellows.To.Give/Weeks.Until.Cutoff to try to account for the fact that being three yellows away from a suspension means something different to someone one game away from the cutoff than to someone, say, eight games away. Having fit a number of similar models to this data set, we can say that the main conclusions we draw are robust to these choices. In short, we recognize that all models are approximations of reality and contend that ours is no less reasonable an approximation than any other.
Decisions to include specific control covariates were based on amateur knowledge of the game of soccer and the practical and statistical significance of the corresponding coefficients. We note that the estimates for the coefficients of interest were robust with regard to choices at the margins, due, we suspect, to the lack of multicollinearity among the covariates. Although previous literature suggested inclusion of player position, this was deemed unnecessary in light of the fact that an intercept was included for each player. The same holds true for amount of international experience as measured by number of “caps,” that is, appearances in an international game, which was found by Stride et al. (2011) to be a good predictor of number of fouls committed in the 2010 World Cup. Stride et al. suggest that fouls increase as stage increases, which they took as a proxy for the importance of the match increasing, but the Premier League season is a double round–robin tournament, not an elimination tournament, so it is not clear what the analogous variable to include would be. We tried including days since start of season with the thought that later games might be more important, but this turned out to be neither practically nor statistically significant, so it was excluded. We also tried but excluded the difference in the two teams’ end-of-season rankings for the same reason. (Weekly updated mid-season rankings would arguably have been more appropriate, but these were unavailable and would correlate quite strongly with end-of-season rankings anyhow.) Venue (Home/Away) was retained in the model despite its negligible effect because it was of secondary interest to determine whether this impacted the number of fouls called. Match.Differential, which is the score of the player’s team minus the score of the opposing team, was likewise retained in the model despite its negligible effect.
Model for Probability of Penalty Card (Logistic Regression)
We used the same set of covariates to model the probability of receiving a penalty card (yellow or red), except that the number of total fouls committed was also included as a predictor. We would like to discern whether referees treat players who are close to being suspended differently, but from these data alone we regrettably cannot disentangle changes in a player’s propensity to commit card-worthy offenses from changes in the referee’s propensity to book the player. However, under the assumption that players are no less aggressive at home than away (or that any such differences are captured by the control variables), a significant negative coefficient for Venue would confirm the home-field bias in refereeing that has been noted in the literature (see “Relation to the Literature” section herein).
Results
The results for Model 1 (fouls committed) can be seen in Table 2. Standard errors were computed using a dispersion parameter equal to .724 (see below for some discussion of this quantity). The most important finding is that when Yellows.To.Give decreases, propensity to foul decreases. For instance, when a player is one yellow card away from a one-game suspension, his expected number of fouls is about 0.76 times what it would be were he three yellow cards away from a one-game suspension (Table 3).
Table 2. Results for the Quasi-Poisson Regression Model (Dependent Variable: Number of Fouls Committed).
Note. Approximate p values for testing whether the true coefficient is equal to 0 are in the rightmost column and are rounded to the nearest ten thousandth.
Table 3. Expected Number of Fouls Divided by Expectation When Yellows.To.Give = 3 and Suspension.Duration = 1.
Note. The expected number of fouls increases in Yellows.To.Give and decreases in Suspension.Duration. No data were recorded for Yellows.To.Give ≤ 2 and Suspension.Duration = 3.
Several of the control variables’ effects are large enough to warrant mention. Losing an additional duel increases the expected number of fouls by 27% (for reference, strikers have the highest mean of 4.7 duels lost per game). Losing an additional challenge decreases expected fouls by 16% (midfielders average .65). Having an additional unsuccessful dribble decreases expected fouls by 20% (strikers average .85). The signs on the coefficients for Challenge.Lost and Unsuccessful.Dribbles are the opposite of what we expected. Our only explanation is that these are conditional effects, estimated with the other covariates held fixed, and conditional effects can be harder to anticipate than marginal ones. (It turns out that the marginal effects of unsuccessful dribbles and challenges lost on total fouls committed are both positive.) Home-field advantage, which was of secondary importance to our analysis, has very little effect on the number of fouls called.
That the data are well modeled by a Poisson distribution can be established by plotting the estimated means against the empirical variation about the means (Ver Hoef & Boveng, 2007). Because each $\mu_{ij}$ has only one observation associated with it, we follow the recommendation of Ver Hoef and Boveng (2007) to bin the estimated means and average the squared residuals within each bin. The results can be seen in Figure 1, wherein the linear trend (excepting the rightmost point, which constituted less than 2% of all observations) is consistent with a variance proportional to the mean, as the quasi-Poisson model assumes.

Figure 1. Plot of the estimated means by the empirical variability about the means.
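The binning diagnostic behind this figure can be sketched as follows. The equal-size bins and the function name are assumptions made for illustration; the actual bin boundaries used for Figure 1 are not stated here, and the data below are toy values.

```python
def binned_mean_variance(fitted, observed, n_bins):
    """Sort player-games by fitted mean, split into equal-size bins, and
    return (average fitted mean, average squared residual) for each bin."""
    pairs = sorted(zip(fitted, observed))
    size = len(pairs) // n_bins
    if size == 0:
        raise ValueError("more bins than observations")
    summary = []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs any leftover observations.
        stop = len(pairs) if b == n_bins - 1 else start + size
        chunk = pairs[start:stop]
        mean_mu = sum(mu for mu, _ in chunk) / len(chunk)
        mean_sq_resid = sum((y - mu) ** 2 for mu, y in chunk) / len(chunk)
        summary.append((mean_mu, mean_sq_resid))
    return summary

# Toy data, for illustration only.
fitted_means = [0.5, 0.5, 1.0, 1.0, 2.0, 2.0]
fouls = [0, 1, 0, 2, 1, 3]
points = binned_mean_variance(fitted_means, fouls, n_bins=3)
print(points)
```

Plotting the second coordinate against the first and checking for a roughly straight line through the origin is the visual check described above; the slope of that line plays the role of the dispersion parameter.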
The results for the logistic model can be found in Table 4. As expected, the probability of a penalty card decreases as Yellows.To.Give decreases or Suspension.Duration increases. For example, suppose a player who is two yellow cards away from a two-game suspension has a 10% probability (and hence 1/9 odds) of getting a penalty card in a given game. Were this player to have instead just one yellow card to give, this would multiply his odds of receiving a card by roughly 0.30 (the 70% reduction noted in the Introduction), giving odds of about 1/30 and a probability of about 3%.
Table 4. Results for the Logistic Regression Model (Dependent Variable: Indicator of Whether Any Penalty Card Was Received).
Note. Approximate p values for testing whether the true coefficient is equal to 0 are in the rightmost column and are rounded to the nearest ten thousandth.
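Because logistic regression coefficients act on the odds rather than on the probability, converting an odds multiplier back into a probability takes one extra step. The sketch below uses a multiplier of 0.30, which corresponds to the roughly 70% reduction in odds reported in the Introduction for a two-game suspension; it is a rounded figure rather than the exact coefficient estimate, and the function name is hypothetical.

```python
def apply_odds_ratio(probability, odds_ratio):
    """Scale the odds p / (1 - p) by odds_ratio and convert back to a probability."""
    odds = probability / (1.0 - probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)

baseline = 0.10  # two yellow cards away from a two-game suspension
one_card_away = apply_odds_ratio(baseline, 0.30)
print(round(one_card_away, 4))  # about 0.0323, i.e., odds of 1/30
```

Note that a 70% reduction in odds is not a 70% reduction in probability in general, although the two are close when the baseline probability is small.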
Theoretical Model
The rule that suspends a player who has accumulated a certain number of yellow cards over the course of more than one game creates a dynamic problem for a player since a yellow card received early in the season affects the probability of being suspended later on. We analyze this dynamic problem using a quantitative theoretical model. We discuss what economic theory predicts about the evolution of the optimal level of aggressiveness over time and how it depends on some factors the player has control over and some he does not. We use standard dynamic programming techniques.
The state variable for the player is the number of yellow cards away from suspension. We denote it with y, and it can take values of
The effort has two effects. First, it affects the probability the player’s team wins, potentially increases the player’s market value, his happiness from playing well, and so on. All these factors are captured by a strictly increasing, strictly concave, current period payoff function w. Second, the more effort the player puts in, the higher his perceived aggressiveness tends to be. The perceived aggressiveness of the player is denoted with z, and we assume that z is the sum of the player’s actual effort x, and a random variable ε:
$$z = x + \varepsilon.$$
A player will get a yellow card if
Suppose a player starts a game at y yellow cards away from suspension. He chooses effort level x which yields current payoff w(x). If the realization of ε is such that
The dynamic programming problem of the player can be written as follows:
where β is the discount factor,
In order to simplify the analysis, we abstract from the fact that the season ends and consider an infinite-horizon problem. We also assume there is no three-game suspension, which dampens the estimated effect of suspension rules. Finally, we assume all players who have not received a one-game suspension remain eligible for it throughout the season; that is, we ignore the fact that for such players, on January 1, the suspension duration switches to 2 and there is an automatic increase in y. This simplification inflates the estimated effect of suspension rules. Relaxing these assumptions is an avenue for future research.
Characterization
At the beginning of the game, before realization of ε, the player chooses his effort level x. Let
and
where φ denotes the probability distribution function of a standard normal distribution. Since both V1 and V2 are endogenous objects, we do not provide analytical results. Instead, we characterize the model numerically. We solve the model by iterating on the value function.
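Value function iteration for a problem of this type can be sketched as follows. This is a minimal illustration under assumptions of ours, not the authors' implementation. The recursion in the code (a booked player moves from y to y − 1, and a player booked at y = 1 sits out one game at utility cost κ before y is reset) is our reading of the verbal description. The payoff coefficients A and B, the booking threshold Z_BAR, the standard deviation SIGMA, the reset value Y_MAX, and the effort grid are all hypothetical. Only β = .86 and κ = 13.79 are values reported in the article.

```python
import math

BETA, KAPPA = 0.86, 13.79   # calibrated values reported in the article
A, B = -1.0, 4.0            # HYPOTHETICAL quadratic payoff w(x) = A*x^2 + B*x
Z_BAR, SIGMA = 3.0, 1.0     # HYPOTHETICAL booking threshold and std. dev. of epsilon
Y_MAX = 5                   # ASSUMED number of cards away from suspension after a reset
GRID = [i * 4.0 / 200 for i in range(201)]  # candidate effort levels on [0, 4]


def w(x):
    """Current-period payoff (hypothetical quadratic form)."""
    return A * x * x + B * x


def p_card(x):
    """P(x + eps > Z_BAR) for eps ~ N(0, SIGMA^2)."""
    return 0.5 * math.erfc((Z_BAR - x) / (SIGMA * math.sqrt(2.0)))


W = [w(x) for x in GRID]
P = [p_card(x) for x in GRID]


def solve(beta=BETA, kappa=KAPPA, tol=1e-8, max_iter=10_000):
    """Iterate on the value function until successive iterates differ by < tol."""
    V = [0.0] * (Y_MAX + 1)          # V[y] for y = 1..Y_MAX; V[0] is unused
    policy = [None] * (Y_MAX + 1)
    for _ in range(max_iter):
        # ASSUMED value of sitting out one game: pay kappa, then restart at Y_MAX.
        sit_out = -kappa + beta * V[Y_MAX]
        new_V = V[:]
        for y in range(1, Y_MAX + 1):
            after_card = V[y - 1] if y > 1 else sit_out
            value, i = max(
                (W[i] + beta * ((1.0 - P[i]) * V[y] + P[i] * after_card), i)
                for i in range(len(GRID))
            )
            new_V[y], policy[y] = value, GRID[i]
        gap = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if gap < tol:
            break
    return V, policy


if __name__ == "__main__":
    V, policy = solve()
    for y in range(1, Y_MAX + 1):
        print(y, round(policy[y], 2), round(V[y], 2))
```

With β = 0 the sketch returns the static optimum −B/(2A) in every state. The grid search over effort is used only for transparency; a continuous optimizer would serve equally well.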
Numerical Analysis
In this section, we analyze how the expected number of fouls committed depends on various model parameters. We also analyze how it evolves as the state variable y changes (which we analyze in the data).
Functional forms, parameter values, and calibration
Since our model serves illustrative purposes only, we provide an ad hoc parameterization with parameters that we are able (with a few exceptions) to calibrate to match Premier League statistics. The period payoff function w(x) is assumed to be quadratic:
The utility cost κ and the discount factor β are jointly calibrated to match the effort profile of a player before a one-game suspension. Specifically, we target two numbers from Table 3: the expected numbers of fouls for players at one and at five yellow cards away from a one-game suspension, relative to a player at three cards away from a one-game suspension (0.76 and 1.14, respectively). Our calibrated values are β = .86 and κ = 13.79. The model fit is presented in Table 5. Overall, our very stylized model matches the empirical estimates remarkably well, with the exception of the cases of four and five yellow cards away from a two-game suspension, where we overshoot the expected number of fouls.
Model Fit.
aValues targeted in calibration.
Fouling as a function of number of yellow cards
We first show that our model’s predictions are consistent with the empirical estimates presented in the “Regression Models” section. Figure 2 shows how expected number of fouls evolves as the player accumulates more yellow cards. In that figure, we normalized the expected number of fouls in the very first game of the season to be one. The blue line shows how this number evolves as the player gets closer to a one-game suspension. The dashed red line shows the expected number of fouls for a player facing a possible two-game suspension. While the model was parameterized to match the end point and the midpoint of the blue line, the red dashed line is an endogenous outcome of the model. If the player is only one card away from a two-game suspension, he fouls less by 53% relative to his very first game of the season

Fouls drop as we get closer to suspension.
Fouling as a function of parameters of interest
Some important model parameters were chosen arbitrarily, because we did not have data to properly calibrate them. We now analyze how the model’s predictions depend on some parameters of interest, namely,
We start with

Effect of
Figure 4 shows the effect of σ², the variance of ε, the random element of “booking.” As the variance increases, player aggressiveness eventually increases. In the limit, as the variance tends to infinity, the probability of getting booked depends less and less on the player’s behavior. Accordingly, he will foul as much as w permits. The effect of a decrease in σ² is more complex. As σ² declines, the number of fouls first drops, and then starts to increase. The initial drop is intuitive: Smaller variance of the random component means that more aggressive play is more likely to be detected, while play that is not aggressive is less likely to be mistakenly punished with a yellow card.

Effect of σ² on fouls committed.
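The large-variance limit described above can be illustrated numerically. The sketch below assumes, as in the model, that perceived aggressiveness is z = x + ε with normally distributed ε, and that a card is shown when z exceeds a threshold. The threshold value Z_BAR and the function names are hypothetical choices of ours.

```python
import math

Z_BAR = 3.0  # HYPOTHETICAL booking threshold


def card_probability(x, sigma):
    """P(x + eps > Z_BAR) for eps ~ N(0, sigma^2)."""
    return 0.5 * math.erfc((Z_BAR - x) / (sigma * math.sqrt(2.0)))


def marginal_card_risk(x, sigma):
    """Derivative of card_probability with respect to effort x."""
    u = (Z_BAR - x) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    for sigma in (0.25, 1.0, 4.0, 16.0, 64.0):
        print(sigma, card_probability(2.0, sigma), marginal_card_risk(2.0, sigma))
```

As σ grows, the card probability approaches one half regardless of effort and the marginal risk of additional effort shrinks toward zero, so the booking rule stops disciplining behavior. In this sketch the marginal risk is also small for very small σ when effort sits strictly below the threshold, which is in the spirit of the nonmonotonic pattern discussed in the text, although the sketch does not solve the player's optimization problem.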
The subsequent rise is more subtle. For a given level of σ, the player faces a certain probability of not getting a yellow card. As σ declines, that probability gets smaller, inducing him to play less aggressively (i.e., lowering x closer to
Figure 5 shows the impact of the discount factor β on fouls committed. Of course, when β = 0, players do not care about future games, and effort is chosen so as to maximize w. Next, consider the effect of an increase in β. Quite understandably, a player in the first game of the season tends to foul less as he values his future more (dot-dashed red curve). What is much more interesting is how increasing β affects a player who is one card away from suspension (dashed blue curve). At first, the slope is negative, for the same reason that the dot-dashed red curve has a negative slope. But then, around β = .6, the player starts to foul more. We believe this is because the player values his future enough that the benefit of getting y reset to

Effect of β on fouls committed.
The careful reader will notice that the dashed blue curve goes down again once β > .9. We attribute this to the fact that for players who are particularly forward thinking, not only does the benefit of being able to play more aggressively in future games matter, but the cost of sitting out future games matters as well. This type of behavior was hinted at in the discussion of incentives faced by players in this environment (see “Incentives Caused by Suspension Rules” subsection). It is exciting to see it borne out in the dynamic model.
Figure 6 shows the effect of

Effect of
Impact of Suspension Rules on Fouling
Finally, we can use the model to evaluate the impact of suspension rules on fouling. The advantage of using a structural model is that we can run a counterfactual experiment by removing the suspension rule and simulating the average number of fouls players commit during the whole season. To do so, we run the following simulation. We draw 11 × 20 × 38 values of ε (11 starting players on each of 20 teams, multiplied by 38 games played by each team). Then, for each player, we track the evolution of his yellow cards and compute his effort using the solution to the dynamic programming problem. When the player sits out, we assume his replacement only cares about the current game, so x* = −b/2a. The rationale is that the replacement player plays so infrequently that he will never accumulate five yellow cards. We run the simulation 100 times, calculate the average number of fouls per player per game, and compare it to x* = −b/2a, which would be the number of fouls per game had the suspension rule not been in place.
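The simulation procedure described above can be sketched as follows. This is a simplified illustration, not the authors' code: it covers only a one-game suspension, the policy (effort as a function of y) is passed in as a table of hypothetical numbers rather than computed from the dynamic program, and the payoff coefficients, booking threshold, and standard deviation are placeholders of ours.

```python
import random
import statistics

A, B = -1.0, 4.0             # HYPOTHETICAL payoff coefficients, w(x) = A*x^2 + B*x
X_STAR = -B / (2 * A)        # static optimum x* = -b/2a, as in the text
Z_BAR, SIGMA = 3.0, 1.0      # HYPOTHETICAL booking threshold and std. dev. of epsilon
Y_MAX = 5                    # ASSUMED cards away from suspension at a clean record
N_PLAYERS, N_GAMES = 11 * 20, 38


def simulate_season(policy, rng):
    """Return average effort per player per game, relative to X_STAR."""
    total = 0.0
    for _ in range(N_PLAYERS):
        y, suspended = Y_MAX, False
        for _ in range(N_GAMES):
            if suspended:
                total += X_STAR          # replacement cares only about this game
                suspended = False
                continue
            x = policy[y]
            total += x
            if x + rng.gauss(0.0, SIGMA) > Z_BAR:   # player is booked
                y -= 1
                if y == 0:               # limit reached: sit out next game, reset y
                    suspended, y = True, Y_MAX
    return total / (N_PLAYERS * N_GAMES) / X_STAR


def run(policy, reps=100, seed=0):
    """Replicate the season and return (mean ratio, standard error of the mean)."""
    rng = random.Random(seed)
    ratios = [simulate_season(policy, rng) for _ in range(reps)]
    return statistics.mean(ratios), statistics.stdev(ratios) / reps ** 0.5


if __name__ == "__main__":
    # HYPOTHETICAL policy: effort falls as the player nears suspension.
    policy = {1: 1.44, 2: 1.70, 3: 1.86, 4: 1.94, 5: 1.98}
    print(run(policy))
```

A ratio below one indicates that average effort under the suspension rule is lower than in the counterfactual without it. Passing a policy equal to X_STAR in every state returns a ratio of exactly one, which serves as a basic check of the bookkeeping.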
Notice that by using the structural model (rather than econometric estimates) in our counterfactual analysis, we take into account one very important factor: Players foul less even in the very early stages, because they fear getting closer to the situation when y = 1. This effect is a known feature of the three-strikes laws intended to curb criminal behavior (Shepherd, 2002), and soccer players should also fear getting even the first strike. In other words, our econometric estimates were obtained in an environment with suspension rules in place: We cannot use our data to estimate how aggressively players would play at the beginning of the season had there not been a suspension rule. In the macroeconomic literature, this is often referred to as the Lucas (1976) critique: In general, the econometric estimates of relationships between different variables change when a policy changes.
Since our parameterization in this article was somewhat arbitrary, we perform this counterfactual experiment for a variety of different parameter values. That way we evaluate what factors (in theory) make the suspension rules more effective in reducing player aggressiveness. The results are presented in Table 6.
Quantitative Impact of Suspension Rules on Average Number of Fouls—Model Simulations.
Note. Top row: Expected number of fouls relative to environment without suspension rules. Bottom row: Standard error based on 100 replications of the model. Benchmark values: β = .86;
Our results suggest that the existence of suspension rules can have a large impact on the level of aggressiveness. The average number of fouls is 34% lower than in an environment where no suspensions ever take place. That drop would have been 46% had suspension taken place after three yellow cards were accumulated.
It is also interesting to see how the random component affects the effectiveness of suspension rules. First, consider the effect of
Finally, we can also see that a small amount of randomness in booking is necessary for the suspension rules to impact player behavior. Increasing σ² from 1 to 2 reduces the effectiveness of suspension rules. Similarly, reducing it from 1 to 0.25 makes the suspension rules more effective. However, when σ² becomes very small, the impact of suspension rules on number of fouls becomes smaller and disappears in the limit as σ² → 0. We find this to be potentially a very interesting result, posing questions about the effectiveness of the certainty of punishment (Nagin, 2013).
The estimates in Table 6 represent a first pass at evaluating the impact of suspension rules on aggressive behavior. Many features of the real world are not accounted for in our simple model. First, we are not able (with our data set) to pin down the value of
Conclusion
The main goals of punishment rules in soccer are to protect the health of players and maintain the integrity of the game. Yellow cards were introduced in the 1960s and tested during the 1970 World Cup. In the late 1980s and early 1990s, different leagues and tournaments introduced additional punishment: suspension for one or more games of players who had accumulated too many yellow cards. Using Premier League data from the 2011-2012 season, we found that additional punishment matters: Players foul less.
In the theoretical section, we provided some guidance for showing what factors may have the biggest impact on player aggressiveness. We think this is a good first step toward the design of optimal punishment rules. An optimal punishment rule is different from the rule that completely removes potentially dangerous play, however. Things such as slide tackles or accidental collisions between players are intrinsic parts of soccer: They provide enjoyment for fans that turns into income for players and sponsors. When thinking about optimal punishment rules, one must weigh the benefits of such enjoyment, and the costs resulting from possible injuries. This requires a very careful and thorough analysis that we think is a fascinating avenue for further research.
Our analysis can be fine-tuned by using a more refined data set which would allow for the estimation of the threshold level of aggressiveness that results in a yellow card. Such a data set would have to include measures of the actual level of aggressiveness (possibly based on postgame evaluation by experts) combined with data on which behaviors resulted in a penalty card. It would also need to distinguish between penalty cards awarded for aggressive play and those awarded for other types of unwanted behavior.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
