Fear of the Draw, Consumption and Mistaken Heuristics: Profit Opportunities in the Football Betting Market

Abstract

This study examines inefficiencies in football betting markets by analyzing the gap between bookmaker odds and empirically estimated probabilities. Using English Premier League data and a combination of Markov processes and an ordered logit model, the paper constructs a betting strategy that does not rely on superior predictive power but instead exploits market mispricing. The strategy consistently yields abnormal returns, particularly on draws and away teams, challenging the Efficient Market Hypothesis (EMH). The analysis points to potential behavioral biases, such as emotions and reliance on heuristics, that distort market prices. Robustness checks with data from other major European leagues support these findings. The results have broader implications for financial markets, as they highlight how behavioral patterns can create exploitable inefficiencies. We conclude that sports betting serves as a simplified, data-rich setting for testing theories of market behavior, offering insights into how information, psychology and pricing may interact in both betting and financial domains.

Keywords

football betting market; market inefficiency; behavioral bias; betting strategy; Markov process

Introduction

For most people, sporting events are primarily a form of amusement and an important part of their countries' culture. In South America and Europe, football is the most popular sport. However, the importance of football is not limited to entertainment, since the sport also sustains a large betting market in which individuals test both their skill and their luck in order to obtain profits. In this setting, where objectivity should prevail, both informational and behavioral elements may affect preferences and distort the profits that could be obtained from betting. This research aims to show that it is possible to beat sports betting markets simply by following a betting strategy based on rational principles, without needing to make better predictions than the market.

These markets contain a large number of assets (bets) whose values depend on different states of nature and reflect relevant, publicly available information. In essence, therefore, the football betting market, like any other sports betting market, can be considered a financial market. Moreover, sports betting markets, and football markets in particular, receive information about team performance very frequently, which in theory should make them more efficient and eliminate any opportunity to generate abnormal returns, thus satisfying the Efficient Market Hypothesis (EMH) proposed by Fama (1970), at least in its semi-strong form.

In traditional financial markets, assets are valued on the basis of future expectations, which materialize in prices resulting from the interaction between agents, although there is no certainty that these prices reliably reflect the intrinsic value of the assets. Sports betting markets, in contrast, have the additional property that each bet (asset) has a defined moment at which its intrinsic value becomes known to the market: once the match has finished, bettors know with certainty what their bet is worth. This makes these markets attractive to study, because realized results can be compared directly with bettors' expectations (Ottaviani & Sørensen, 2009; Thaler & Ziemba, 1988; Williams, 1999).

Punters therefore face betting (investment) options, and, under the assumption that individuals are rational, their decisions should in theory be based on the information they periodically receive about the teams' performance. If this were the case, then, as in financial markets, it would not be possible to obtain abnormal returns consistently over time (Jensen, 1978). Nevertheless, sports betting markets include not only agents whose main purpose is to profit from their investments but also an important “consumption” component (Allais, 1953; Conlisk, 1993; Humphreys et al., 2013), which is why people are not always indifferent between betting on one team or another, even under similar expected returns. Sports betting markets thus also have a component inherent to punters, related to their personal preferences, beliefs or judgment (Tversky & Kahneman, 1973, 1974). Individuals therefore do not necessarily incorporate information adequately, even when it is available, which implies that punters may adopt suboptimal strategies when they try to beat the market. This would lead them to underestimate or overestimate certain outcomes for specific matches across the season, opening opportunities for significant returns.

Most studies of football (soccer) betting have emphasized the predictive character of their models, that is, directly predicting the final outcome of matches in order to obtain positive returns in the sports betting market (Boshnakov et al., 2017; Cain et al., 2000; Constantinou et al., 2017; Deschamps & Gergaud, 2007; Dixon & Coles, 1997; Dixon & Pope, 2004; Goddard & Asimakopoulos, 2004; Hausch et al., 1981; Kuypers, 2000; Maher, 1982; Pope & Peel, 1989). Other studies have focused on empirically testing the so-called Longshot Bias (or Improbable Favorability Bias), the tendency of individuals to prefer less probable outcomes, which can alter the expected return of bets (Cain et al., 2000; Henery, 1985; Williams & Paton, 1997; Woodland & Woodland, 1994).

Since the return on a bet depends on the probability that one of the possible states of a match occurs (home win (Home), home loss (Away) or draw; hereafter HAD), we used information publicly available to all market agents (punters) to first build a Markov model of a discrete stochastic process and then an ordered multiple choice model, from which we estimated the probability of each outcome in each match. On this basis, we built a betting strategy that simulates the behavior of a rational bettor: it takes the estimated probabilities as input, compares them with those implied by the market and thereby generates a set of bets that separates high from low expected-return bets, leaving room for abnormal profits. The results show that the proposed betting strategy does yield positive abnormal returns. This suggests, on the one hand, that the EMH does not hold in sports betting markets and, on the other, that there are biases, meaning that market agents are not cognitively incorporating the available information correctly into their betting strategies.
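The comparison between estimated and market probabilities can be illustrated with a minimal Python sketch. It assumes a simple expected-value rule, which is not necessarily the exact rule used in this study: a unit bet on outcome i is placed only when p_i * C_i - 1 > 0, where p_i is the model probability and C_i the decimal odds. The helper names (`expected_return`, `select_bets`), the `min_edge` threshold and the match figures are hypothetical.

```python
from typing import Dict

def expected_return(model_prob: float, decimal_odds: float) -> float:
    """Expected profit per unit staked: win (odds - 1) with probability p, lose 1 otherwise."""
    return model_prob * decimal_odds - 1.0

def select_bets(model_probs: Dict[str, float],
                odds: Dict[str, float],
                min_edge: float = 0.0) -> Dict[str, float]:
    """Keep the HAD outcomes whose model-based expected return exceeds min_edge.

    Hypothetical rule: back outcome i only if p_i * C_i - 1 > min_edge.
    """
    selected = {}
    for outcome, p in model_probs.items():
        er = expected_return(p, odds[outcome])
        if er > min_edge:
            selected[outcome] = er
    return selected

# Illustrative match in which the model rates the draw above the market.
model = {"H": 0.45, "A": 0.25, "D": 0.30}
market_odds = {"H": 2.00, "A": 4.00, "D": 3.40}
print(select_bets(model, market_odds))
```

With these illustrative numbers only the draw is selected: its expected return is about 0.02 per unit staked, the away bet exactly breaks even and the home bet has a negative expectation. The rule requires no better prediction of the realized outcome, only a disagreement between model and market probabilities.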

In summary, this study finds that market odds consistently deviate from estimated probabilities derived from available information, particularly in the case of draws and away teams. The proposed strategy, which does not require superior prediction, yields statistically significant abnormal returns, challenging the assumption of efficiency in betting markets. These findings align with insights from behavioral research and suggest that financial markets, like football betting markets, may be influenced by cognitive biases and information-processing limitations. This analogy reinforces the idea that understanding investor behavior and mispricing mechanisms in financial contexts can benefit from empirical evidence observed in controlled, data-rich settings such as sports betting.

Literature Review

Financial Perspective on Sport Betting Markets

Market efficiency has been widely discussed in the financial literature over the years. In one of the first related works, Fama (1970) explains the concept through the “Fair Game”, “Submartingale” and “Random Walk” models. In this work, Fama set out the EMH, according to which markets are efficient to the extent that prices incorporate available information. It follows that no market agent could consistently obtain abnormal returns in the long term, because returns are random: they depend on the arrival of information that is unpredictable for market participants.

While most subsequent empirical evidence supported the EMH (Jensen, 1978), Ball (1978) and Watts (1978) report evidence inconsistent with the theory, observing abnormal returns. However, according to Ball (1978), those abnormal returns are not evidence of market inefficiency but rather of deficiencies in the capital asset pricing model. Another line of research argues that the anomalies may be explained by investors' judgment biases, which make them overreact or underreact under certain circumstances (Barberis et al., 1998). This idea rests on evidence from cognitive psychology presented by Tversky and Kahneman (1973, 1974), suggesting that individuals give excessive weight to recent patterns in the data and too little to the properties of the population that generates the data, and on the notion of conservatism posited by Edwards (1968), whereby new evidence is incorporated only slowly into cognitive models. Nonetheless, Fama (1998) argues that although there is evidence of significant abnormal returns that appear to be explained by behavioral patterns (over- or underreaction), these anomalies are fragile and tend to disappear under reasonable changes in the way returns are measured, suggesting that the EMH does hold in financial markets.

In traditional financial markets, agents use available information to value assets; to the extent that agents, who are assumed to be predominantly rational, incorporate this information, it should be reflected in asset prices. Sports betting markets can thus be considered financial markets, with a large number of investors (punters), broad access to relevant information and assets (bets) whose values depend on the performance of the firms (teams). They also have the additional property that each asset (bet) has a defined point in time at which its fundamental value is realized and known by the entire market, namely the end of the match (Ottaviani & Sørensen, 2009; Thaler & Ziemba, 1988; Williams, 1999). This is an advantage over traditional financial markets, where the fundamental or intrinsic value of assets is never known with certainty, so any analysis relies only on estimates and expectations of that value (Thaler & Ziemba, 1988).

Behavioral Nature of Betting Markets

Just as in traditional financial markets, sports betting markets may be influenced by behavioral patterns. In fact, there are reasons to think that these patterns should be stronger in sports betting markets, to the point that in certain situations their effect may outweigh that of the greater availability of information in these markets. Tversky and Kahneman (1974) explain that, when facing uncertain events, individuals assess probabilities by relying on a limited number of heuristic principles. This subjective assessment is based on cues or indicators that can often lead to biased results.

A well-documented behavioral phenomenon in sports betting markets is the so-called Longshot Bias, or Favorite-Longshot Bias (Cain et al., 2000; Henery, 1985; Williams & Paton, 1997; Woodland & Woodland, 1994). It arises because punters prefer to back outcomes that, despite their low probability of occurrence, pay a high return if they occur. The bookmaker therefore assigns a larger share of the margin¹ to the less probable outcomes, so these outcomes pay less than their probability of occurrence would warrant, and their expected return is lower than that of the more probable outcomes.
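A worked example with hypothetical numbers makes the consequence of the longshot bias explicit. The expected return of a unit bet on outcome $i$ with true probability $p_i$ and decimal odds $C_i$ is

$$E[R_i] = p_i\,C_i - 1.$$

Suppose a favorite wins with probability 0.70 and is offered at odds of 1.40, while a longshot wins with probability 0.10 and is offered at 7.00:

$$E[R_{\text{fav}}] = 0.70 \times 1.40 - 1 = -0.02, \qquad E[R_{\text{long}}] = 0.10 \times 7.00 - 1 = -0.30.$$

The implied probabilities, $1/1.40 \approx 0.714$ and $1/7.00 \approx 0.143$, overstate the true probabilities by about 2% and 43% in relative terms, so in this illustration the longshot carries a much larger share of the margin and a much lower expected return.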

In the particular case of the soccer betting market, other biases in probability assessment may stem from a team's history or from the players it has, which can act as heuristics that mislead the judgment of the team's actual performance. This is consistent with the idea of conservatism (Edwards, 1968) and with the Halo Effect proposed by Thorndike (1920), who found empirical evidence that the initial impression (positive or negative) of a characteristic of a person, situation or entity may influence the later assessment of its real characteristics, or even of other, unrelated characteristics. For instance, this bias has been found to be significant in financial markets when investors face similar investments from firms with higher or lower reputation (Jang et al., 2016). In line with these biases, punters may therefore overweight the likelihood of success of teams with a long history or more renowned players, regardless of their actual performance.

Another aspect to consider is the so-called Home Advantage, according to which teams playing at home have a higher probability of winning, owing to factors such as field conditions, travel fatigue (for the away team) and fan support for the home team, among others. The home advantage was first observed as an empirical regularity and has since been tested extensively in the literature (Carron et al., 2005; Clarke & Norman, 1995; Fischer & Haucap, 2021; Pollard, 1986; Waters & Lovell, 2002; Winkelmann et al., 2021). The belief that home teams are favored by external conditions can also act as a heuristic when punters form their expectations about the odds. For instance, Winkelmann et al. (2021) found that the ban on spectators in stadiums during the COVID-19 pandemic eliminated the home advantage in the German Bundesliga, and that bookmakers had difficulty adjusting the betting odds to its disappearance. These results are consistent with those of Fischer and Haucap (2021), who analyzed the same league during the pandemic.

Moreover, a punter's evaluation of team performance may be based on the information most accessible in memory rather than on all available information. Tversky and Kahneman (1973) posit that people judge likelihoods on the basis of information that comes to mind most easily, such as recent events, a bias they called the availability heuristic. In football, punters are unlikely to gather all the information on every team when placing their bets; instead, they often rely on mental shortcuts, estimating the likelihood of events by how easily they can recall their frequency. Thus, less frequent events, such as draws or away wins (given the home advantage), may be systematically underestimated by punters and bookmakers. This is consistent with the betting strategy proposed by Archontakis and Osborne (2007), under which punters obtain better returns by betting on draws in the World Cup, and with the results of Snowberg and Wolfers (2010), who show smaller losses from backing draws than from backing other outcomes in English football. These mistaken heuristics can influence punters' assessment of probabilities.

For his part, Allais (1953), in his well-known critique of the axioms of Expected Utility Theory, argues that one aspect to take into account in betting is the pleasure (utility) that gambling itself gives the gambler. More recent studies of sports betting markets have confirmed the importance of this consumption perspective for individual decisions (Humphreys et al., 2013; Mao et al., 2015; Stetzka & Winter, 2023). Stetzka and Winter (2023) argue that although gamblers do take monetary outcomes into account, consumers of gambling services also view gambling as a commodity that offers entertainment and excitement, implying that football betting markets are neither purely rational nor fully irrational.

Indeed, Humphreys et al. (2013) found empirical evidence in the context of NCAA basketball games that the behavior of bettors resembles that of sports fans more than that of wealth-maximizing investors, meaning that punters overall prefer backing one of the teams to the detriment of draws. This is in line with the results of Franck et al. (2011), who argue that sentiment has a strong influence in these markets and find that more favorable odds are offered on bets on more popular clubs, since bookmakers set lower prices for bets with comparatively stronger demand. Plenty of research has also found that people prefer sporting contests with higher scoring rates (Jane, 2014; Paul & Weinbach, 2007; Salaga & Tainsky, 2015). In this respect, Vlastakis et al. (2009) found that draws have, on average, the lowest total scoring rate across European football competitions, so draws may be less appealing to punters from a consumption point of view.

Along the lines of sentiment and emotion, some research suggests that feelings such as anxiety and fear play an important role in how individuals assess risk and uncertainty. For instance, Hsee et al. (2001) argue that the assessment of risk is not a purely cognitive evaluation but is heavily influenced by feelings, which may even stem from that same evaluation. The authors propose that subjective probabilities result from a joint process involving both the cognition of likelihood and emotions about the outcome itself. Thus, emotions such as fear or anxiety generated by potential outcomes would make people more risk averse than a purely cognitive evaluation would suggest. In the football betting market, individuals can reassess the probability of each outcome in real time while the match is being played, which adds another dimension to the model proposed by Hsee et al. (2001): not only the outcome but also the match itself can generate fear or anxiety. Consider a punter who has backed team A. There are many situations in which this punter can be almost certain that the bet will win, for example when team A leads by two, three or more goals, even if team A then makes a serious mistake. A draw is different, because any small mistake can change it, so punters who bet on draws are very unlikely to be free from anxiety or fear about their bets until the very last minute of the match. Therefore, when punters consider a draw, they know that there is no scenario in which they can watch the game confident that their bet will win, which makes them more reluctant to back draws than a purely cognitive evaluation would justify.

There are thus at least four effects, besides the longshot bias, that can make bookmakers' odds deviate from the true probabilities: the Halo Effect, the Availability Heuristic, the Consumption Effect and the Emotions/Affect effect. Their expected influence on the odds is summarized in Table 1. These four effects are not mutually exclusive, so they can all be present at the same time, and all of them can give rise to inefficiencies that more rational decision makers may eventually exploit.

Table 1.

Possible Behavioral Biases in Sport Betting Markets.

Bias Driver	Description	Expected Effect
Halo Effect	Punters should be prone to prefer teams with a better reputation	The likelihood offered by bookmakers for bigger, historically famous teams should be higher than the intrinsic likelihood
Mistaken Heuristics Effect	Apparently “odd” events should be underestimated by punters, while apparently “common” events are overestimated	The likelihood offered by bookmakers for draws and away wins should be smaller, and for home wins larger, than the realized frequencies
Consumption Effect	Punters should be influenced by their feelings and personal preferences	The likelihood offered by bookmakers for draws should be smaller than the intrinsic likelihood
Emotions and Affect	Punters' assessment of risk is influenced by emotions and affect, such as fear and anxiety, making them more reluctant to back draws	The likelihood offered by bookmakers for draws should be smaller than the intrinsic likelihood

Source: Own elaboration.

Prior Testing of Sport Betting Markets

Since the seminal works of Hausch et al. (1981) and Maher (1982), several authors have attempted to show that there are inefficiencies in the sports betting market by using stochastic and probabilistic models. Dixon and Coles (1997) estimated a model based on the Poisson distribution and found exploitable inefficiencies in the betting market, using data from the First Division of the English Football League in the 1995–1996 season. Later, Dixon and Pope (2004) applied the same methodology, incorporating information available before each match, and obtained similar results. Likewise, using an Ordered Logit model, Hvattum (2013) tests the informational efficiency of football betting markets with data from the five major European leagues and concludes that there are informational inefficiencies that allow better returns than blind betting. Boshnakov et al. (2017) propose an alternative approach based on a Weibull count model for the goals scored in a match, obtaining a better fit and confirming the existence of inefficiencies. Constantinou et al. (2017) use a Bayesian network model based on three main factors: team strength, form, and fatigue with motivation. In this way the authors combine information available in the market with personal judgment to estimate the probability of HAD (Home, Away, Draw), and their results confirm the possibility of obtaining abnormal returns. Other works that also suggest inefficiencies in sports betting markets include Pope and Peel (1989), Forrest and Simmons (2000a, 2000b), Kuypers (2000), Cain et al. (2000), Goddard and Asimakopoulos (2004), and Deschamps and Gergaud (2007).
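An ordered logit of the kind used by Hvattum (2013) maps a single linear index into the three HAD probabilities by treating the outcomes as ordered (away win < draw < home win). The Python sketch below is a minimal illustration under that assumption; the coefficient values and cut points are hypothetical, and the covariate names merely mirror the variables in Table 4.

```python
import math
from typing import Tuple

def logistic(z: float) -> float:
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_had(xb: float, cut_low: float, cut_high: float) -> Tuple[float, float, float]:
    """Return (P_home, P_away, P_draw) for outcomes ordered Away < Draw < Home.

    xb is the linear index x'beta (larger values favour the home team);
    cut_low < cut_high are the threshold (cut point) parameters.
    """
    if not cut_low < cut_high:
        raise ValueError("cut points must be strictly increasing")
    p_away = logistic(cut_low - xb)             # P(y <= Away)
    p_draw = logistic(cut_high - xb) - p_away   # P(y <= Draw) - P(y <= Away)
    p_home = 1.0 - logistic(cut_high - xb)      # P(y > Draw)
    return p_home, p_away, p_draw

# Hypothetical coefficients on home-minus-away differences.
beta = {"diff_points": 0.05, "diff_goals": 0.03, "winning_streak": 0.10}
match = {"diff_points": 4, "diff_goals": 6, "winning_streak": 1}
xb = sum(beta[k] * match[k] for k in beta)
print(ordered_logit_had(xb, cut_low=-0.6, cut_high=0.6))
```

In an actual application the coefficients and cut points would be estimated by maximum likelihood on the training matches; here they are fixed only to show how the three probabilities follow from one index.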

Within the framework of this quest to generate models that can explain or predict the results of matches, several variables have been considered. Naturally, the outcome of a match is expected to depend, to a large extent, on the performance or strength of the teams facing each other. Maher (1982), Dixon and Coles (1997), and Rue and Salvesen (2000) have translated this into two aspects, called Attacking Ability and Defensive Ability. In this sense, a team with better performance (measured in terms of offense and defense) should have a better chance of winning.
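Models in the tradition of Maher (1982) translate attacking and defensive ability into expected goals. The Python sketch below assumes independent Poisson goal counts with a multiplicative attack × defence structure and a home-advantage factor in the spirit of Dixon and Coles (1997), but without their low-score correction, and shows how such parameters yield HAD probabilities. All parameter values are hypothetical, and this is not the model estimated in this study.

```python
import math
from typing import Tuple

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when goals follow a Poisson(lam) law."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_had(lambda_home: float, lambda_away: float,
                max_goals: int = 10) -> Tuple[float, float, float]:
    """Return (P_home, P_away, P_draw) for independent Poisson home and away goals.

    Scores above max_goals per team are ignored, so the three
    probabilities sum to slightly less than one.
    """
    p_home = p_away = p_draw = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lambda_home) * poisson_pmf(a, lambda_away)
            if h > a:
                p_home += p
            elif h < a:
                p_away += p
            else:
                p_draw += p
    return p_home, p_away, p_draw

# Hypothetical parameters: expected goals = own attack x opponent's defence.
# A defence value below 1 means conceding fewer goals than average.
home_attack, home_defence = 1.3, 0.8
away_attack, away_defence = 1.0, 0.9
home_factor = 1.25  # hypothetical home-advantage multiplier
lam_h = home_attack * away_defence * home_factor
lam_a = away_attack * home_defence
print(poisson_had(lam_h, lam_a))
```

Truncating the score grid at ten goals per team leaves a negligible probability mass outside the grid for typical football goal rates.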

In line with the above, Dixon and Coles (1997) considered that attacking ability is well represented by the goals a team scores, and defending ability by the goals it concedes. Dixon and Coles (1997) also posit that a team's most recent results should be a better measure of its future performance. Meanwhile, Chumacero (2009) considered previously accumulated points as a possible predictor of the outcome of a match. Along the same lines, Lasek et al. (2013) and Kausel et al. (2019) consider the difference between two teams' league table positions to be a measure of skill, based on the teams' previous performance.
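One way to turn the predictors discussed above (accumulated points, goals scored and conceded, recent form) into pre-match covariates is sketched below in Python. The definitions are assumptions made for illustration: each variable is taken as the home team's value minus the away team's, using only matches played before the fixture, and a winning streak counts consecutive wins and resets after a draw or defeat. The exact construction of the variables in Table 4 may differ, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TeamRecord:
    """Running totals for one team, updated after each match it plays."""
    points: int = 0
    goals_for: int = 0
    goals_against: int = 0
    win_streak: int = 0  # consecutive wins up to the latest match

    def update(self, scored: int, conceded: int) -> None:
        self.goals_for += scored
        self.goals_against += conceded
        if scored > conceded:
            self.points += 3
            self.win_streak += 1
        else:
            self.points += 1 if scored == conceded else 0
            self.win_streak = 0  # a draw or a defeat ends the streak

def pre_match_features(home: TeamRecord, away: TeamRecord) -> Tuple[int, int, int]:
    """Home-minus-away differences in points, goal difference and winning streak."""
    home_gd = home.goals_for - home.goals_against
    away_gd = away.goals_for - away.goals_against
    return (home.points - away.points,
            home_gd - away_gd,
            home.win_streak - away.win_streak)

# Two hypothetical teams after two matches each.
a, b = TeamRecord(), TeamRecord()
a.update(2, 0)
a.update(1, 1)
b.update(0, 1)
b.update(3, 1)
print(pre_match_features(a, b))
```

Updating the records only after a fixture has been used to compute its covariates keeps the predictors restricted to information available before kick-off.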

All these studies have focused mostly on obtaining abnormal returns by predicting match outcomes better than the market on average. However, this research has overlooked the fact that, if behavioral patterns are present in the market, the resulting biases should primarily affect the probabilities the market offers (represented by the odds), and not necessarily the expected outcome (HAD). Consequently, by trying to obtain abnormal returns through more accurate outcome prediction than the market, researchers do not fully capture the size of the bias, which implies that even higher expected abnormal returns may be available without being more accurate than the market on average in predicting HAD.

The Football Betting Market

Sportradar AG, which provides services to bookmakers and monitors potential fraud, reports that the sports betting market moves around 1.5 billion euros a year worldwide. According to Sportradar AG, the United Kingdom has the largest legal sports betting market in Europe. In its most traditional format, the football betting market consists of staking an amount of money on one of three possible states of nature: the home team wins, the home team loses or the match ends in a draw (HAD). The market assigns each state a number, the odds, that represents the payout granted if the bet wins; if the bet loses, the stake remains with the bookmaker. The odds are inversely related to the probability of occurrence of each state, so an unlikely state must offer a greater reward, given the greater risk it implies for punters.

There are several formats for representing odds in betting markets, two of the most popular being fractional (British) odds and decimal (European) odds. Under the decimal format, the probability the market assigns to each state of nature HAD is obtained by dividing 1 by the odds offered by the bookmaker (equation (1)). For example, if the bookmaker offers odds of 1.1 on a particular team winning, the probability the market associates with that state is 1/1.1 ≈ 90.9%. Bookmakers secure a return for themselves through an additional premium embedded in the odds, known as the margin (Margin = Prob. H + Prob. A + Prob. D − 1). Because of the margin, the implied probabilities sum to more than 1, so they must be adjusted to prevent calculations based on them from being biased.

The margin poses a problem, particularly because the way in which bookmakers distribute it across the three states of nature HAD is private information and therefore not directly identifiable. Sauer (2005) proposes a way to deal with this problem: distribute the margin proportionally to the probability of each state. The estimated probability for each state of nature is then given by equation (2).

$$C_{i} = \frac{1}{P_{i}} \tag{1}$$

$$\hat{P}^{*}_{i} = \frac{P_{i}}{1 + \phi}, \qquad i \in \{H, A, D\} \tag{2}$$

where $C_{i}$ is the decimal odds the bookmaker offers for outcome $i \in \{H, A, D\}$, $P_{i}$ is the probability implied by those odds and $\phi = P_{H} + P_{A} + P_{D} - 1$ is the size of the margin.
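Equations (1) and (2), together with the margin definition, can be sketched in Python as follows. The proportional normalization follows the scheme attributed to Sauer (2005) above; the function names and the example odds are hypothetical.

```python
from typing import Dict, Tuple

def implied_probabilities(odds: Dict[str, float]) -> Dict[str, float]:
    """Equation (1) solved for P_i: the raw implied probability is 1 / C_i."""
    return {outcome: 1.0 / c for outcome, c in odds.items()}

def normalize_proportional(odds: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """Equation (2): remove the margin proportionally, P*_i = P_i / (1 + phi)."""
    raw = implied_probabilities(odds)
    phi = sum(raw.values()) - 1.0  # margin = P_H + P_A + P_D - 1
    return {outcome: p / (1.0 + phi) for outcome, p in raw.items()}, phi

# Hypothetical decimal odds for one match.
probs, margin = normalize_proportional({"H": 2.10, "A": 3.60, "D": 3.40})
print(probs, margin)
```

For these odds the implied probabilities sum to about 1.048, a margin of roughly 4.8%, and dividing each by that sum yields probabilities that add up to one. The 1.1 example in the text corresponds to `implied_probabilities({"H": 1.1})`, roughly 0.909.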

Data

To evaluate the efficiency of the football betting market, we work with historical data from the English Premier League (EPL) for all seven seasons between 2017 and 2024 (2017–2018 to 2023–2024). We chose the EPL because it is the most popular football league in the world when measured by total revenue: it generated aggregate revenue of €5,297 million in the 2016–2017 season, the highest of any league, followed by the Spanish league (LaLiga). Given its popularity, the market is expected to have a high level of information and knowledge about the teams. We work with each season separately, since teams usually sign players and coaches at the start of each season in the summer transfer window, which makes it difficult to compare a team's performance from one season to the next: the squad is likely to have changed, and even if it has not, performance may change significantly from one year to another.

Historical odds for each of the 380 matches of every season between 2017 and 2024 were obtained from the betting odds comparison website OddsPortal². Data on goals scored, points and other variables were extracted from the official EPL website³. Although each full season comprises 380 matches, we excluded the first two matchweeks (20 matches), because by the third matchweek every team had played at least once at home and once away. In addition, teams are assumed to test players and strategies in their first matches, so these are not necessarily an accurate representation of their long-term play. The remaining sample was then split into two sets. The training set, with 250 observations covering matchweeks 3 to 27, is used to train the model and estimate its parameters. The test set, covering the last 11 matchweeks of the tournament (110 matches), is used to evaluate the model's predictive performance and, above all, the returns obtained from the “rational” betting strategy.
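The sample design amounts to a simple mapping from matchweek to subset. This Python sketch assumes ten matches per matchweek and that each match keeps its original matchweek label (rescheduled fixtures would need a rule of their own); the function name is hypothetical.

```python
from collections import Counter

def sample_split(matchweek: int) -> str:
    """Assign a matchweek of a 38-round EPL season to the excluded, training or test set."""
    if not 1 <= matchweek <= 38:
        raise ValueError("an EPL season has 38 matchweeks")
    if matchweek <= 2:
        return "excluded"  # first two matchweeks: 20 matches dropped
    if matchweek <= 27:
        return "train"     # matchweeks 3-27: 25 rounds x 10 matches = 250
    return "test"          # matchweeks 28-38: the last 11 rounds

# Ten matches per matchweek, each labelled with its original round.
counts = Counter(sample_split(mw) for mw in range(1, 39) for _ in range(10))
print(counts)
```

Splitting by matchweek rather than at random keeps the test matches strictly later in the season than the training matches, so no future information leaks into the estimated parameters.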

Regarding the distribution of the states of nature (HAD), Tables 2 and 3 describe the odds for the 2017–2018 season. The average odds for an away win are higher than those for a home win or a draw, which is consistent with the home advantage, and there are substantial differences between the odds offered for the three possible states. As for the volatility of the odds, the standard deviation for draws is considerably lower than for the other two outcomes. Finally, in terms of the risk-return relationship, the Sharpe Ratio (Sharpe, 1966) of the odds is highest for draws.
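The ratios reported in Table 2 can be reproduced, to two decimals, by dividing the average odds by their standard deviation, with no risk-free rate subtracted. The Python sketch below assumes that definition (the function name is hypothetical) and checks it against the 2017–2018 summary statistics.

```python
import statistics
from typing import Sequence

def odds_sharpe_ratio(odds: Sequence[float]) -> float:
    """Mean of the decimal odds divided by their sample standard deviation."""
    return statistics.mean(odds) / statistics.stdev(odds)

# Reported means and standard deviations for 2017-2018 (Table 2).
table2 = {"Home": (3.10, 2.44), "Away": (5.46, 5.20), "Draw": (4.41, 1.70)}
ratios = {outcome: round(mean / sd, 2) for outcome, (mean, sd) in table2.items()}
print(ratios)
```

Given a match-level list of odds, `odds_sharpe_ratio` applies the same definition; whether the sample or the population standard deviation was used is not stated, so the choice of `statistics.stdev` is itself an assumption.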

Table 2.

Descriptive Statistics of Betting Market Odds.

	Home	Away	Draw
Relative Freq.	0.46	0.28	0.27
Absolute Freq.	164	100	96
Average	3.10	5.46	4.41
Std. Dev.	2.44	5.20	1.70
Sharpe Ratio	1.27	1.05	2.59

Source: Own elaboration.

Table 3.

Descriptive Statistics of Conditional Odds.

	Home	Away	Draw
Relative Freq.	0.46	0.28	0.27
Absolute Freq.	164	100	96
Average	2.23	2.82	3.96
Std. Dev.	1.50	2.05	1.43
Sharpe Ratio	1.48	1.37	2.77

Source: Own elaboration based on Premier League.

Moreover, if we take only the odds that were actually paid given the realized HAD outcome, which we define as conditional odds, their distribution (see Table 3) is similar to that of the unconditional odds. However, comparing the average odds in Tables 2 and 3, the odds paid on draws fall proportionally less than those paid on home and away wins. This suggests that, for draws, fewer high odds go unpaid; that is, a large number of high-odds draws actually occurred. In other words, many events with a low probability of occurrence (according to the market) end up happening.
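The comparison between Tables 2 and 3 amounts to computing the relative change in average odds. The Python sketch below uses only the averages reported in those tables.

```python
# Average odds from Table 2 (all matches) and Table 3 (odds actually paid) for 2017-2018.
unconditional = {"Home": 3.10, "Away": 5.46, "Draw": 4.41}
conditional = {"Home": 2.23, "Away": 2.82, "Draw": 3.96}

# Relative change from the unconditional to the conditional average.
relative_change = {
    outcome: (conditional[outcome] - unconditional[outcome]) / unconditional[outcome]
    for outcome in unconditional
}
for outcome, change in relative_change.items():
    print(f"{outcome}: {change:+.1%}")
```

This gives roughly −28% for home wins, −48% for away wins and −10% for draws, so the average odds paid on draws fall far less than those on the other two outcomes.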

Table 4 summarizes, for each season, the variables used in the model for the 250 training matches, which run from the 3rd to the 27th matchweek. Table 5 gives the same summary for the test matchweeks, i.e. the last 11 rounds of the tournament (matchweeks 28 to 38).

Table 4.

Descriptive Statistics for the Training Sample Across Seasons.

Season	Statistic	Home			Away			Draw
		Diff. Points	Diff. Goals	Winning streak	Diff. Points	Diff. Goals	Winning streak	Diff. Points	Diff. Goals	Winning streak
	Mean	4.632	6.965	1.202	0.015	−1.200	−0.354	2.732	3.507	0.085
	SD	7.704	12.604	2.472	6.051	8.222	1.807	5.794	8.584	1.834
	Median	4	4	0	0	−1	0	3	4	0
2017–2018	Max	27	42	12	16	23	5	19	27	7
	Min	−13	−24	−3	−17	−22	−8	−18	−24	−9
	Range	40	66	15	33	45	13	37	51	16
	Obs	114	114	114	65	65	65	71	71	71
	Mean	4.518	7.035	0.719	−1.482	−2.059	−0.047	2.314	2.980	0.549
	SD	6.370	9.578	1.660	5.325	7.395	1.360	4.904	6.868	1.352
	Median	4	5	0	−2	−3	0	2	2	0
2018–2019	Max	24	40	8	20	33	9	15	26	7
	Min	−19	−24	−5	−18	−23	−4	−12	−17	−3
	Range	43	64	13	38	56	13	27	43	10
	Obs	114	114	114	85	85	85	51	51	51
	Mean	4.225	5.919	0.847	0.392	−0.684	−0.253	2.600	3.267	0.483
	SD	5.341	8.068	2.243	5.087	7.467	1.441	4.482	6.788	1.160
	Median	3	5	0	1	0	0	2	2	0
2019–2020	Max	27	36	13	13	24	3	18	24	7
	Min	−7	−20	−4	−22	−26	−6	−7	−15	−4
	Range	34	56	17	35	50	9	25	39	11
	Obs	111	111	111	79	79	79	60	60	60
	Mean	1.054	1.848	0.359	−2.280	−3.032	−0.355	−1.923	−2.631	−0.123
	SD	4.476	6.664	1.380	5.296	7.028	1.154	5.456	8.011	1.303
	Median	1	1	0	−2	−2	0	−1	−1	0
2020–2021	Max	19	26	6	17	23	2	19	22	7
	Min	−16	−23	−3	−21	−30	−5	−21	−36	−6
	Range	35	49	9	38	53	7	40	58	13
	Obs	92	92	92	93	93	93	65	65	65
	Mean	3.816	7.316	0.592	−1.136	−1.034	−0.295	1.328	3.141	0.047
	SD	5.247	9.297	1.574	5.698	8.816	1.054	5.157	7.458	1.040
	Median	3	5	0	−1	0	0	1	3	0
2021–2022	Max	20	40	9	14	30	2	18	23	3
	Min	−8	−16	−3	−20	−33	−4	−13	−16	−6
	Range	28	56	12	34	63	6	31	39	9
	Obs	98	98	98	88	88	88	64	64	64
	Mean	5.627	9.814	0.712	1.608	3.392	0.189	4.034	6.466	0.397
	SD	5.750	9.349	1.364	5.127	7.348	1.323	5.654	8.688	1.224
	Median	5	7	0	1	3	0	3	5	0
2022–2023	Max	22	39	7	17	31	7	17	29	5
	Min	−17	−19	−4	−16	−18	−3	−9	−14	−3
	Range	39	58	11	33	49	10	26	43	8
	Obs	118	118	118	74	74	74	58	58	58
	Mean	4.536	7.196	0.768	−0.793	−0.529	−0.149	2.137	5.196	0.490
	SD	5.721	8.575	1.553	6.159	8.859	1.461	5.115	8.185	1.455
	Median	4	5	0	0	0	0	1	3	0
2023–2024	Max	23	33	6	23	30	7	23	38	8
	Min	−9	−13	−3	−18	−41	−4	−9	−10	−3
	Range	32	46	9	41	71	11	32	48	11
	Obs	112	112	112	87	87	87	51	51	51

Source: Own elaboration based on Premier League.

Table 5.

Descriptive Statistics for the Test Sample Across Seasons.

Season	Statistic	Home			Away			Draw
		Diff. Points	Diff. Goals	Winning streak	Diff. Points	Diff. Goals	Winning streak	Diff. Points	Diff. Goals	Winning streak
	Mean	13.360	15.021	1.420	1.000	−0.171	0.343	9.160	13.200	0.480
	SD	11.111	17.909	2.990	12.182	17.889	2.722	11.863	20.384	1.873
	Median	12	15	1	1	1	0	9	13	0
2017–2018	Max	38	62	15	34	45	13	36	68	4
	Min	−19	−24	−3	−28	−36	−4	−17	−21	−4
	Range	57	86	18	62	81	17	53	89	8
	Obs	50	50	50	35	35	35	25	25	25
	Mean	11.263	17.877	1.596	0.083	−1.472	0.000	7.941	12.059	0.235
	SD	8.612	13.677	1.577	6.983	10.418	1.402	6.991	9.828	0.419
	Median	9	12	0	6	7	1	7	8	0
2018–2019	Max	41	66	9	25	38	9	32	42	3
	Min	−15	−16	−2	−18	−33	−6	−21	−24	−1
	Range	56	82	11	43	71	15	53	66	4
	Obs	57	57	57	36	36	36	17	17	17
	Mean	9.189	14.019	1.679	0.469	0.375	0.063	7.120	8.200	1.040
	SD	8.575	13.245	2.244	4.542	6.010	0.618	5.983	8.691	1.786
	Median	4	7	0	9	16	1	3	5	0
2019–2020	Max	39	65	16	21	26	4	29	44	15
	Min	−23	−32	−8	−17	−20	−2	−12	−16	−2
	Range	62	97	24	38	46	6	41	60	17
	Obs	53	53	53	32	32	32	25	25	25
	Mean	3.378	8.333	−0.111	−3.277	−3.979	−0.660	−3.111	−1.667	−0.167
	SD	5.875	8.899	1.380	5.845	8.900	1.919	4.586	5.274	0.551
	Median	−2	−2	0	0	3	0	−5	−5	0
2020–2021	Max	23	34	7	19	27	8	13	18	3
	Min	−27	−27	−12	−23	−33	−11	−16	−14	−2
	Range	50	61	19	42	60	19	29	32	5
	Obs	45	45	45	47	47	47	18	18	18
	Mean	8.925	15.302	0.792	−2.778	−4.528	0.000	3.714	6.429	0.381
	SD	7.417	12.271	1.613	6.112	10.310	0.902	5.310	8.384	1.548
	Median	6	11	0	4	10	0	3	6	0
2021–2022	Max	36	56	11	22	29	6	21	33	12
	Min	−13	−25	−5	−24	−41	−3	−17	−23	−3
	Range	49	81	16	46	70	9	38	56	15
	Obs	53	53	53	36	36	36	21	21	21
	Mean	12.649	20.070	1.649	2.548	4.419	−0.258	6.682	10.682	0.318
	SD	8.323	13.297	1.519	5.273	8.234	0.840	6.704	10.493	1.070
	Median	8	14	0	10	11	1	6	9	0
2022–2023	Max	34	58	8	19	37	4	25	38	4
	Min	−12	−16	−2	−20	−28	−4	−14	−21	−5
	Range	46	74	10	39	65	8	39	59	9
	Obs	57	57	57	31	31	31	22	22	22
	Mean	11.712	18.788	0.923	1.400	−1.133	−0.033	4.286	5.714	0.071
	SD	8.094	12.120	0.900	4.977	8.696	0.804	5.773	9.580	0.716
	Median	7	11	0	6	7	0	6	12	0
2023–2024	Max	33	48	4	28	44	5	24	37	3
	Min	−13	−15	−1	−15	−34	−3	−9	−29	−3
	Range	46	63	5	43	78	8	33	66	6
	Obs	52	52	52	30	30	30	28	28	28

Source: Own elaboration based on Premier League.

The means of the variables differ substantially between the time window used in Table 4 and the one used in Table 5. This difference arises because the variables are differences in cumulative totals between teams. As more rounds are played, the best teams keep winning and accumulating points while the worst teams remain at the bottom, so the differences grow as the championship progresses. The differences could also reflect changes in team performance over the course of the championship.

There is also a mid-season break during which teams stop playing. It is natural for teams to modify or improve their game strategies during this break, and they may do so in different ways. This changes the composition of the data, possibly to the point that the estimates cannot adequately capture these effects. For example, a team's winning streak should not, by its nature, suffer from the accumulation problem that affects the points difference (column Diff. Points). Nevertheless, its mean also differs between Tables 4 and 5, which suggests that team behavior does change as the end of the season approaches.

Preliminary Analysis: Markov Process

The football betting market can evolve as the teams' strategies evolve over the tournament, and descriptive data cannot reveal this. Under these conditions, it is reasonable to think that the result of a given match depends less on past results than on the current state of the teams. This situation can be modeled as a Markov process on the outcome probabilities. From a transition matrix, we estimated the steady-state probability of each scenario. This probability is independent of the initial state of any particular match and therefore represents a long-term probability (Datta & McCormick, 1993; Kulperger & Prakasa Rao, 1989). Thus, as the season progresses, team performance should follow a long-term behavior that is independent of any initial HAD state the team has had.
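A steady-state vector π satisfies π = πP and sums to one, where P is the row-stochastic transition matrix. The sketch below solves these equations with NumPy for an illustrative 3×3 matrix. The numbers are hypothetical and were not estimated from Premier League data. The sketch also checks that high powers of P forget the initial state, which is the property the paragraph above relies on.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi = pi P with sum(pi) = 1 for a row-stochastic transition matrix P."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the normalisation 1' pi = 1
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical transition matrix: rows = previous outcome, columns = next outcome,
# both ordered as (Home, Away, Draw). Illustrative numbers only.
P = np.array([
    [0.50, 0.25, 0.25],
    [0.40, 0.30, 0.30],
    [0.45, 0.25, 0.30],
])
pi = stationary_distribution(P)
print(pi)
# Starting from any state, every row of P^n converges to pi
print(np.linalg.matrix_power(P, 50)[0])
```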

To determine whether the steady-state probabilities differ from those assigned by the market, we used the statistical resampling method known as the bootstrap (Efron, 1979). The first step in estimating the Markov process is to build a transition matrix. Each entry is the probability of a scenario conditional on the outcome of the immediately preceding match. To estimate the steady-state probabilities with the bootstrap, we computed 10,000 estimates from randomly selected subsamples of 70% of the total sample, which yields a distribution for the probability of each scenario.

The mean of the observed probabilities represents the actual results for each state. It is obtained by dividing the absolute frequency of each scenario by the total number of matches, as shown in the first data column of Table 6. Likewise, the mean of the market probability is the simple average of the probability that the market (bookmaker) assigns to each scenario (see details in Annex II). The mean of the steady-state probabilities is the average of the bootstrap distributions and is shown in the last columns of Table 6. The confidence intervals are given by the 5th and 95th percentiles (90% CI), the 2.5th and 97.5th percentiles (95% CI) and the 0.5th and 99.5th percentiles (99% CI).
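The bootstrap procedure above can be sketched as follows. Several details are assumptions not stated in the text. The season's results are treated as one chronological sequence. The 70% subsamples are drawn without replacement from the consecutive (previous, next) outcome pairs. A state never observed as "previous" in a subsample gets a uniform row. The season used in the usage lines is simulated, not real data, and all function names are hypothetical.

```python
import numpy as np

STATES = ("H", "A", "D")  # home win, away win, draw

def stationary(P):
    """Steady state pi of a row-stochastic matrix P (pi = pi P, sum(pi) = 1)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    return np.linalg.lstsq(A, b, rcond=None)[0]

def transition_matrix(pairs):
    """Row-normalised counts of (previous outcome, next outcome) pairs."""
    counts = np.zeros((3, 3))
    np.add.at(counts, (pairs[:, 0], pairs[:, 1]), 1.0)
    rows = counts.sum(axis=1, keepdims=True)
    # Assumption: a state never seen as "previous" gets a uniform row
    return np.divide(counts, rows, out=np.full((3, 3), 1.0 / 3.0), where=rows > 0)

def bootstrap_steady_state(outcomes, n_boot=10_000, frac=0.7, seed=0):
    """Steady-state probabilities from n_boot random subsamples of 70% of the transitions."""
    code = {s: k for k, s in enumerate(STATES)}
    seq = np.array([code[o] for o in outcomes])
    pairs = np.column_stack([seq[:-1], seq[1:]])
    size = int(frac * len(pairs))
    rng = np.random.default_rng(seed)
    draws = np.empty((n_boot, 3))
    for b in range(n_boot):
        sub = pairs[rng.choice(len(pairs), size=size, replace=False)]
        draws[b] = stationary(transition_matrix(sub))
    return draws

def percentile_ci(draws, level):
    """Percentile interval per outcome, e.g. level=90 -> 5th and 95th percentiles."""
    tail = (100.0 - level) / 2.0
    return np.percentile(draws, [tail, 100.0 - tail], axis=0)

# Hypothetical season: 360 simulated outcomes, not real match data
rng = np.random.default_rng(1)
season = rng.choice(STATES, size=360, p=[0.46, 0.28, 0.26])
draws = bootstrap_steady_state(season, n_boot=2_000)
print("mean:", draws.mean(axis=0))
for level in (90, 95, 99):
    print(level, percentile_ci(draws, level))
```

Fewer replications (2,000) are used in the usage lines only to keep the example fast. The text reports 10,000.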

Table 6.

Probability Comparison for the English Premier League by Season.

Season	Outcome	Observed Probability	Market Probability	Steady-State Probability
		Mean	Mean	Mean	90% CI	95% CI	99% CI
2017–2018	Home	0.456	0.445	0.459	0.425–0.490	0.409–0.510	0.394–0.527
	Away	0.278	0.311	0.256	0.233–0.280	0.217–0.293	0.205–0.303
	Draw	0.267	0.241	0.285	0.257–0.312	0.242–0.327	0.229–0.340
2018–2019	Home	0.456	0.446	0.441	0.413–0.469	0.404–0.477	0.389–0.492
	Away	0.204	0.238	0.242	0.219–0.264	0.211–0.271	0.199–0.286
	Draw	0.340	0.316	0.317	0.291–0.344	0.282–0.351	0.269–0.364
2019–2020	Home	0.444	0.452	0.370	0.344–0.394	0.338–0.402	0.324–0.415
	Away	0.240	0.235	0.257	0.233–0.281	0.226–0.288	0.213–0.304
	Draw	0.316	0.313	0.373	0.347–0.398	0.339–0.406	0.328–0.419
2020–2021	Home	0.368	0.417	0.389	0.364–0.414	0.357–0.423	0.345–0.436
	Away	0.260	0.244	0.257	0.232–0.280	0.225–0.287	0.213–0.300
	Draw	0.372	0.339	0.354	0.324–0.382	0.316–0.390	0.302–0.405
2021–2022	Home	0.392	0.470	0.389	0.364–0.414	0.357–0.423	0.345–0.436
	Away	0.256	0.234	0.257	0.232–0.280	0.225–0.287	0.213–0.300
	Draw	0.352	0.297	0.354	0.324–0.382	0.316–0.390	0.302–0.405
2022–2023	Home	0.472	0.433	0.449	0.423–0.475	0.415–0.482	0.401–0.497
	Away	0.232	0.243	0.205	0.181–0.230	0.174–0.237	0.162–0.250
	Draw	0.296	0.324	0.345	0.320–0.371	0.312–0.377	0.298–0.390
2023–2024	Home	0.448	0.443	0.449	0.422–0.476	0.414–0.484	0.401–0.497
	Away	0.204	0.230	0.206	0.180–0.231	0.174–0.239	0.161–0.251
	Draw	0.348	0.328	0.344	0.319–0.370	0.311–0.378	0.296–0.390

Source: Own elaboration.

Most observed probabilities (16 of the 21) fall within the 90% confidence intervals of the steady-state probabilities (see Table 6). In general, therefore, there is no significant difference between the observed and the steady-state probabilities, and the steady-state probabilities are a good estimate of the observed ones.

The market probabilities behave differently. The market mean for the "Away" scenario lies above the upper bound of the 90% confidence interval in 2017–2018 and 2022–2023, and very close to one of its bounds in 2019–2020, 2021–2022 and 2023–2024. In the "Draw" scenario, the market probability lies below the 90% confidence interval in several seasons (2017–2018, 2019–2020 and 2021–2022) and very close to its lower bound in others (2022–2023 and 2023–2024). We can therefore infer that the probabilities assigned by the market differ significantly from the steady-state probabilities, with the market misvaluing the probability of a draw and of an away win.

Interestingly, in three consecutive seasons, from 2019–2020 to 2021–2022, the market probability for the "Home" scenario exceeds the upper bound of the 90% confidence interval. This can arguably be related to the distortion of home advantage during the COVID-19 pandemic, when the ban on public gatherings in stadiums removed part of the advantage that home teams generally enjoy (Fischer & Haucap, 2021; Winkelmann et al., 2021).
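The comparison above is a simple interval check. The sketch below applies it to a few rows copied from Table 6, with each market probability checked against the corresponding steady-state 90% interval. The helper name `position` is purely illustrative.

```python
# Market probability and steady-state 90% CI, copied from Table 6
TABLE6_ROWS = [
    ("2017-2018", "Away", 0.311, (0.233, 0.280)),
    ("2017-2018", "Draw", 0.241, (0.257, 0.312)),
    ("2019-2020", "Home", 0.452, (0.344, 0.394)),
    ("2019-2020", "Draw", 0.313, (0.347, 0.398)),
    ("2022-2023", "Away", 0.243, (0.181, 0.230)),
    ("2023-2024", "Away", 0.230, (0.180, 0.231)),
]

def position(p, ci):
    """Where a market probability lies relative to a confidence interval."""
    low, high = ci
    if p > high:
        return "above"
    if p < low:
        return "below"
    return "inside"

for season, outcome, p_market, ci90 in TABLE6_ROWS:
    print(season, outcome, position(p_market, ci90))
```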

Empirical Strategy

HAD outcomes are ordered, mutually exclusive events that depend on the relative performance of the two teams in a given match. On average, the outcome of a match is therefore determined by the performance differential between the two contenders. The larger the performance gap in favor of the home team, the more likely a home win. As this gap narrows and disappears, the most probable outcome becomes a draw. If the gap reverses in favor of the away team, the most likely result shifts toward an away win.

This reasoning underlies the proposed model and explains why an ordered multiple-choice model is appropriate. Similar models have been used in the study of sports betting markets to assess informational inefficiencies and returns (Forrest & Simmons, 2000a, 2000b; Goddard & Asimakopoulos, 2004; Hvattum, 2013; Kuypers, 2000). Unlike previous work, we focus on a betting format in which the aim is to predict one of the three possible outcomes (HAD) rather than the goals scored by each team (exact score). Accordingly, we estimated the probabilities of each match with an Ordered Logit model.

If the market is efficient, bookmakers set the HAD odds correctly. In that case, no one should be able to obtain positive and statistically significant returns consistently over time. The odds for each match are set in advance by bookmakers, so the expected return of a bet on a specific HAD event in a given match and matchweek takes the form:

E(R_{i}) = P_{i}^{*} \cdot C_{i} - 1

(3)

E(R_{i}) = P_{i}^{*} \cdot \frac{1}{P_{i}} - 1

(4)

Where $E(R_{i})$ is the expected return of bet $i \in \{\text{Home} = 1, \text{Away} = 2, \text{Draw} = 3\}$, $P_{i}^{*}$ is the true unbiased probability of event i, which the market cannot observe directly, and $C_{i}$ is the odds the market pays if event i occurs. $P_{i}$ is the probability that the bookmaker assigns to event i, so that $C_{i} = 1/P_{i}$. If $P_{i} = P_{i}^{*}$, the expected return on a bet is $E(R_{i}) = 0$. If $P_{i} < P_{i}^{*}$, then $P_{i} + ε_{i} = P_{i}^{*}$ with $ε_{i} > 0$, and therefore $E(R_{i}) = ε_{i} \cdot C_{i} > 0$. Thus, whenever the market misestimates the probability of an event i, the expected return for that event differs from zero. In particular, if the bookmaker underestimates the probability of an event, the expected return of betting on that event is greater than zero.
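A worked instance of equations (3) and (4) makes the result $E(R_{i}) = ε_{i} \cdot C_{i}$ concrete. The sketch below assumes a hypothetical draw priced at $P_{i} = 0.25$ with margin-free odds $C_{i} = 4$ and a hypothetical underestimation $ε_{i} = 0.05$. None of these numbers come from the data.

```python
def expected_return(p_true, odds):
    """Equation (3): expected return of a unit bet paying `odds` with true probability p_true."""
    return p_true * odds - 1.0

# Hypothetical draw priced by the bookmaker at P = 0.25 (fair odds C = 1/P = 4.0, no margin)
p_book = 0.25
odds = 1.0 / p_book
eps = 0.05  # assumed underestimation, so P* = P + eps = 0.30

fair = expected_return(p_book, odds)        # equation (4) with P* = P: zero expected return
edge = expected_return(p_book + eps, odds)  # approximately eps * C = 0.05 * 4 = 0.2
print(fair, edge)
```

The second value differs from 0.2 only by floating-point rounding.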

It should therefore be possible to obtain positive and significant returns with a betting strategy that identifies undervalued outcomes in certain matches. This holds as long as the market, and therefore the bookmakers, underestimate certain events in specific matches. On this basis, we build a betting strategy on estimated probabilities, computed only from information available to all punters on the date each match is played. With this strategy, we expect to obtain significant (positive) abnormal returns, such that:

R_{i j h a} = C_{i j h a} - 1

(5)

Where $R_{i j h a}$ is the return of event i in matchweek j in the match between home team h and away team a. $C_{i j h a} | k$ is the odds paid for event i in the same match, conditional on the occurrence of the random event k ∈ {1, 2, 3}. It must then hold that $C_{i j h a} > 1$ if $i = k$ and $C_{i j h a} = 0$ otherwise. A winning bet therefore returns $C_{i j h a} - 1$ and a losing bet returns −1.

We therefore need to estimate the probabilities of each event, for which we specify the following ordered multiple-choice model:

\Pr(Y_{i j h a} = k \mid X) = Φ(β_{0} + β_{1} Δ S_{i j h a} + β_{2} Δ G_{i j h a} + β_{3} Δ W_{i j h a}) \quad \forall k \in \{1, 2, 3\}

(6)

Where Y represents the occurrence of the event i (outcome of the match), so that it will take value 1 if the home team is defeated, value 2 if there is a draw and value 3 when the home team is victorious. Since in each match one must necessarily be the home team and the other the away team, we will take the differences of the explanatory variables on the performance of the teams (Chumacero, 2009; Konning, 2000). Thus $Δ S_{i j h a}$ is the difference in cumulative points between the home team playing in home condition, and away team playing in away condition before the matchweek j, $Δ G_{i j h a}$ represents the difference in cumulative goals between the home team playing in home condition and away team in away condition, before the matchweek j, and finally $Δ W_{i j h a}$ is the difference in cumulative winning streaks between the home team playing in home condition and away team playing in away condition, before the matchweek j. Another possible approach could be to use the difference in the standings as a variable measuring the past performance of the teams, as mentioned in Lasek et al. (2013) and Kausel et al. (2019). However, the use of the previously mentioned variables is considered sufficient to capture the effect of performance and obtain well-determined probabilities.
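Equation (6) is written compactly. In the standard ordered logit parameterization, which is assumed here, the latent index η = β₁ΔS + β₂ΔG + β₃ΔW is compared with two cutpoints μ₁ < μ₂ that absorb the intercept β₀. This gives Pr(Y=1) = Λ(μ₁ − η), Pr(Y=2) = Λ(μ₂ − η) − Λ(μ₁ − η) and Pr(Y=3) = 1 − Λ(μ₂ − η), where Λ is the logistic CDF. The sketch below evaluates these probabilities with the outcome coding of the text. The coefficients and cutpoints are hypothetical, not the paper's estimates. In practice they would be fitted by maximum likelihood, for example with the `OrderedModel` class in statsmodels.

```python
import numpy as np

def logistic_cdf(z):
    return 1.0 / (1.0 + np.exp(-z))

def ordered_logit_probs(x, beta, cutpoints):
    """
    x: (dS, dG, dW) home-minus-away differences before the matchweek.
    Returns [P(Y=1) home defeat, P(Y=2) draw, P(Y=3) home win], coded as in the text.
    """
    eta = float(np.dot(x, beta))   # latent performance index
    mu1, mu2 = cutpoints           # mu1 < mu2 separate the three ordered outcomes
    f1 = logistic_cdf(mu1 - eta)
    f2 = logistic_cdf(mu2 - eta)
    return np.array([f1, f2 - f1, 1.0 - f2])

# Hypothetical parameters, for illustration only
BETA = np.array([0.05, 0.03, 0.10])
CUTPOINTS = (-0.8, 0.6)

even = ordered_logit_probs([0, 0, 0], BETA, CUTPOINTS)
home_favoured = ordered_logit_probs([10, 15, 2], BETA, CUTPOINTS)
print(even.round(3), home_favoured.round(3))
```

A larger positive gap in favor of the home team shifts probability mass from home defeat toward home win, which mirrors the ordering argument of the Empirical Strategy section.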

Once the probabilities $P^{*}$ have been estimated with the ordered Logit model, we denote them ${\hat{P}}^{*}$. We define the parameters $0 < α < 1$ and $0 < γ_{i} < 1$, and denote by $σ_{j h a}$ the standard deviation of the three estimated HAD probabilities for the match between home team h and away team a in matchweek j. A bet is considered only if the following first condition holds:

Condition 1.

If $σ_{j h a} > α$, the match between home team h and away team a in matchweek j shows a clear preference for one team, so it can be bet on; otherwise, no bet is placed on the match.

If the above condition is met and the bet is placed, then the strategy is governed by the following conditions.

Condition 2.

If ${\hat{P}}_{1}^{*} > γ_{1}$, an amount $ω$ is bet on event 1 (home team wins) in matchweek j between the home team h and the away team a.

Condition 3.

If ${\hat{P}}_{2}^{*} > γ_{2}$, an amount $ω$ is bet on event 2 (home team loses) in matchweek j between the home team h and the away team a.

Condition 4.

If ${\hat{P}}_{3}^{*} > γ_{3}$, an amount $ω$ is bet on event 3 (draw) in matchweek j between the home team h and the away team a.

Conditions 2 to 4 ensure that extremely risky bets are not made. Finally, Condition 5 decides which event to bet on, following the reasoning behind equation (4).

Condition 5.

The strategy of the $i$ -bet will be subject to:

{\hat{P}}_{i}^{*} - P_{i} = \max\{{\hat{P}}_{1}^{*} - P_{1}, {\hat{P}}_{2}^{*} - P_{2}, {\hat{P}}_{3}^{*} - P_{3}\}

(7)
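Conditions 1 to 5 can be sketched as a single selection function. This is one reading of the rules, under several assumptions. Probabilities are ordered by event index i (home, away, draw). σ is the population standard deviation of the three model probabilities. Condition 5 chooses among the events that pass Conditions 2 to 4, so at most one bet is placed per match. The function name `select_bet` and the example numbers are hypothetical.

```python
import numpy as np

def select_bet(p_model, p_market, alpha, gammas):
    """
    p_model, p_market: probabilities ordered as (home, away, draw), i.e. events i = 1, 2, 3.
    Returns the index (0 = home, 1 = away, 2 = draw) of the event to bet on, or None.
    """
    p_model = np.asarray(p_model, dtype=float)
    p_market = np.asarray(p_market, dtype=float)
    # Condition 1: the model probabilities must be dispersed enough (sigma > alpha)
    if np.std(p_model) <= alpha:
        return None
    # Conditions 2-4: only events whose model probability exceeds their gamma are eligible
    eligible = p_model > np.asarray(gammas, dtype=float)
    if not eligible.any():
        return None
    # Condition 5 (equation (7)): among eligible events, take the largest edge over the market
    edge = np.where(eligible, p_model - p_market, -np.inf)
    return int(np.argmax(edge))

# Hypothetical match: the model sees more draw probability than the bookmaker does
choice = select_bet((0.50, 0.20, 0.30), (0.47, 0.30, 0.23), alpha=0.01, gammas=(0.30, 0.30, 0.25))
print(choice)
```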

These conditions ensure that we bet on the event whose estimated probability ${\hat{P}}^{*}$ exceeds the bookmaker probability P by the largest (positive) margin. To generate a betting strategy that simulates a rational punter, we first define the average return of the set of bets, which serves as the basis of the objective function and is given in equation (8).

E (R) = θ_{R} = \frac{1}{n} \sum_{i}^{I} \sum_{j}^{J} \sum_{h}^{H} \sum_{a}^{A} R_{i j h a} \cdot ϕ_{i j h a}

(8)

Equation (8) depicts the expected return of the rational betting strategy, whereby $ϕ_{i j h a}$ is a parameter that takes the value 1 if a bet is placed on the event i, in matchweek j, for the home team h and the away team a, and it takes the value 0 if no bet is placed, while n is the total number of bets.

Having defined the parameters ($α$ and $γ_{i}$) and the return function of a set of bets, we need to find values for those parameters. We assume, as in finance, that a rational punter wants not merely high expected returns but positive and statistically significant ones, so as to beat the betting market consistently. I therefore defined the t-statistic of the returns as the objective function that the rational punter seeks to maximize (equation (9)). The parameters $α$ and $γ_{i}$ are obtained by maximizing the t statistic for the average return of the betting strategy:

t = \frac{θ_{R}}{\frac{σ}{\sqrt{n}}}

(9)

Where $σ$ is the estimated standard deviation of the returns obtained with the betting strategy up to matchweek 27. If $t > τ$, where $τ$ is the critical value from Student's t distribution, the returns of the simulated strategy are positive and significant. Because we are simulating a bettor, the average return and its t statistic are computed for the maximization using only data up to matchweek 27. That is the information available to an individual who wishes to bet from round 28 onwards.
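Equations (5), (8) and (9) combine into a few lines of NumPy. In the sketch below, a winning bet returns the odds minus one, a losing bet returns −1, and t is the mean return divided by its standard error. The odds and outcomes are hypothetical, and using the sample standard deviation (ddof=1) for σ is an assumption. The critical value τ could be obtained, for example, with `scipy.stats.t.ppf(0.95, n - 1)` for a one-sided 5% test.

```python
import numpy as np

def bet_returns(odds_taken, won):
    """Equation (5): a winning bet returns C - 1, a losing bet returns -1 (C = 0)."""
    odds_taken = np.asarray(odds_taken, dtype=float)
    won = np.asarray(won, dtype=bool)
    return np.where(won, odds_taken - 1.0, -1.0)

def strategy_t_stat(returns):
    """Equations (8) and (9): average return theta_R and its t statistic."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    theta = r.mean()          # equation (8), averaging only over placed bets
    sigma = r.std(ddof=1)     # sample standard deviation (assumption)
    return theta, theta / (sigma / np.sqrt(n))  # equation (9)

returns = bet_returns([2.0, 3.5, 4.0], [True, False, True])  # hypothetical bets
theta, t = strategy_t_stat(returns)
print(theta, t)
```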

Finally, the values of the parameters $α$ and $γ_{i}$ were obtained by maximizing t (equation (9)). Given the nonlinear nature of the problem, I used the Newton-Raphson method. It does not necessarily find a global optimum, but any solution that yields significant returns is sufficient for our purpose. The parameter values by season are shown in Table 7.

Table 7.

Estimated Parameters by Season.

Parameter	2017–2018	2018–2019	2019–2020	2020–2021	2021–2022	2022–2023	2023–2024
$α$	0.0505	0.0274	0.0403	0.0020	0.0093	0.0026	0.0386
$γ_{1}$	0.5268	0.0337	0.1402	0.3477	0.3164	0.3841	0.4335
$γ_{2}$	0.4106	0.3455	0.3111	0.3701	0.3639	0.4763	0.4796
$γ_{3}$	0.6358	0.3421	0.4155	0.7267	0.6401	0.7823	0.1750

Source: Own elaboration.
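The text states that the thresholds were found with Newton-Raphson. The set of bets, and hence t, changes only in discrete jumps as α and γ_i move, so a derivative-free search is a natural alternative. The sketch below uses a plain random search over training matches. This is a swapped-in technique, not the procedure used in the paper. The data are simulated, the margin-free odds are an assumption, and all function names are hypothetical.

```python
import numpy as np

def t_stat(r):
    """Equation (9); returns -inf when t is undefined (fewer than two bets or zero spread)."""
    r = np.asarray(r, dtype=float)
    if r.size < 2 or r.std(ddof=1) == 0.0:
        return -np.inf
    return r.mean() / (r.std(ddof=1) / np.sqrt(r.size))

def strategy_returns(p_model, p_market, odds, outcome, alpha, gammas):
    """Apply Conditions 1-5 to every match; outcome[m] is the index of the realised event."""
    returns = []
    for pm, pk, c, k in zip(p_model, p_market, odds, outcome):
        if pm.std() <= alpha:                                     # Condition 1
            continue
        eligible = pm > gammas                                    # Conditions 2-4
        if not eligible.any():
            continue
        i = int(np.argmax(np.where(eligible, pm - pk, -np.inf)))  # Condition 5
        returns.append(c[i] - 1.0 if i == k else -1.0)            # Equation (5)
    return np.array(returns)

def random_search(p_model, p_market, odds, outcome, n_iter=2000, seed=0):
    """Keep the (alpha, gammas) draw with the highest t on the training matches."""
    rng = np.random.default_rng(seed)
    best_t, best_alpha, best_gammas = -np.inf, None, None
    for _ in range(n_iter):
        alpha = rng.uniform(0.0, 0.2)
        gammas = rng.uniform(0.0, 1.0, size=3)
        t = t_stat(strategy_returns(p_model, p_market, odds, outcome, alpha, gammas))
        if t > best_t:
            best_t, best_alpha, best_gammas = t, alpha, gammas
    return best_t, best_alpha, best_gammas

# Simulated training window (e.g. 250 matches up to matchweek 27); not real data
rng = np.random.default_rng(42)
p_model = rng.dirichlet([4.0, 2.0, 3.0], size=250)
p_market = rng.dirichlet([4.0, 2.0, 3.0], size=250)
odds = 1.0 / p_market  # margin-free odds (assumption)
outcome = np.array([rng.choice(3, p=p) for p in p_model])

best_t, best_alpha, best_gammas = random_search(p_model, p_market, odds, outcome, n_iter=200)
print(best_t, best_alpha, best_gammas)
```

The thresholds selected on the training window would then be applied unchanged to matchweeks 28 to 38, as described in the text.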

Results

From the Ordered Logit estimates of equation (6), I obtained the HAD probabilities that the punter should use according to the objective function. These can be contrasted with the probabilities implied by the market (the bookmaker's odds). Figure 1 illustrates both for the 2017–2018 season. For home wins, the market tends to assign more extreme probabilities (close to 10% and 90%) than the model, whose estimates cluster around an average of about 35%. For home losses (away wins), the market probabilities are also more extreme than the model's, and their distribution lies further to the right, i.e., the market assigns higher probabilities to home defeats than the model estimates. For draws, the market assigns low probabilities (below 30%) more often than the model, whose probabilities are concentrated in the 30% to 40% range, with some even above 45%. This is consistent with the results of the Markov process described above.

Figure 1.

Estimated probabilities vs. market probabilities. Source: Own elaboration.

For the betting strategy, I assigned a home win, away win or draw bet to each match according to the five conditions defined above. The resulting performance is shown in Table 8. In 2017–2018, the model placed no bet on 17 of the 110 possible matches (last 11 matchweeks), because the five conditions deemed them too risky. As for performance, the model predicts home wins with a precision ranging from 63.6% (2017–2018) to 94.4% (2018–2019). For home losses (away wins), precision ranges from 0% to 55.6% (2017–2018). Precision is lowest for draws, ranging from 0% to at most 50%, which is no better than tossing a coin. The percentage correctly classified is far from impressive, ranging from 50.9% to 58.1%.

Table 8.

Performance Predictive Model by Season.

Season	Class	Specificity	Precision	Sensitivity	F1-Score
2017–2018	Home	61.9%	63.6%	54.9%	58.9%
	Away	76.50%	55.60%	35.70%	43.50%
	Draw	79.6%	50.0%	28.2%	36.1%
	Correctly Classified		58.1%
2018–2019	Home	81.3%	94.4%	57.3%	71.3%
	Away	79.00%	0.00%	0.00%	-
	Draw	70.8%	38.2%	39.4%	38.8%
	Correctly Classified		51.2%
2019–2020	Home	72.7%	94.1%	54.5%	69.1%
	Away	71.80%	0.00%	0.00%	-
	Draw	72.7%	30.8%	24.2%	27.1%
	Correctly Classified		56.6%
2020–2021	Home	64.4%	64.4%	45.3%	53.2%
	Away	83.30%	44.40%	16.30%	23.90%
	Draw	59.7%	45.7%	44.7%	45.2%
	Correctly Classified		53.2%
2021–2022	Home	66.7%	86.8%	53.5%	66.2%
	Away	76.70%	19.00%	11.80%	14.50%
	Draw	68.5%	30.3%	29.4%	29.9%
	Correctly Classified		56.1%
2022–2023	Home	60.0%	93.0%	56.4%	70.2%
	Away	77.10%	23.80%	14.70%	18.20%
	Draw	69.9%	3.8%	4.8%	4.3%
	Correctly Classified		56.7%
2023–2024	Home	59.1%	82.7%	48.9%	61.4%
	Away	74.10%	46.40%	25.00%	32.50%
	Draw	65.1%	0.0%	0.0%	-
	Correctly Classified		50.9%

Source: Own elaboration.
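As a consistency check on Table 8, note that the F1-score is the harmonic mean of precision and sensitivity. It is undefined when both are zero, which the table shows as "-". The sketch below recomputes it from a few rows of the table. The helper name `f1_score` is illustrative.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity; None when both are zero ("-" in Table 8)."""
    if precision + sensitivity == 0:
        return None
    return 2 * precision * sensitivity / (precision + sensitivity)

# (precision, sensitivity, reported F1) for a few rows of Table 8
ROWS = {
    "2017-2018 Home": (0.636, 0.549, 0.589),
    "2017-2018 Away": (0.556, 0.357, 0.435),
    "2018-2019 Home": (0.944, 0.573, 0.713),
    "2018-2019 Away": (0.0, 0.0, None),
}

for label, (prec, sens, reported) in ROWS.items():
    print(label, f1_score(prec, sens), reported)
```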

Based on these results, we compute the returns obtained by following the betting strategy defined above (Table 9). The cumulative average returns are positive in every season and time frame considered. Over the full test window (matchweeks 28–38), they are significant at the 10% level in every season except 2023–2024. Some shorter windows are not significant, notably the early windows of 2020–2021 and most windows of 2022–2023. In several seasons (2017–2018, 2018–2019, 2021–2022 and 2023–2024), significance tends to decrease as the end of the season approaches. Despite the relatively poor predictive power of the model, the strategy still generates significant returns. These results suggest that this ability lies in identifying matches whose odds are poorly set by the market, in line with equation (4). In short, the significant returns do not stem from correctly predicting more matches, but from predicting more wisely than the market.

Table 9.

Cumulative Expected Return for Time Frames and Seasons.

Season	Matchweek	N° Bets	SD	Return	SE	t-Stat	P-Value
2017–2018	28–32	41	1.343	64.15%	0.210	3.059	0.002
	28–33	50	1.454	62.78%	0.206	3.052	0.002
	28–34	58	1.446	57.69%	0.190	3.038	0.002
	28–35	67	1.433	45.09%	0.175	2.575	0.006
	28–36	76	1.393	34.92%	0.160	2.186	0.016
	28–37	84	1.383	31.17%	0.151	2.065	0.021
	28–38	93	1.348	24.56%	0.140	1.758	0.041
2018–2019	28–32	47	1.053	38.7%	0.154	2.522	0.008
	28–33	57	1.155	39.6%	0.153	2.586	0.006
	28–34	67	1.118	40.7%	0.137	2.982	0.002
	28–35	76	1.131	31.4%	0.130	2.424	0.009
	28–36	86	1.141	23.4%	0.123	1.901	0.030
	28–37	95	1.150	24.3%	0.118	2.063	0.021
	28–38	105	1.152	18.3%	0.112	1.629	0.053
2019–2020	28–32	44	1.715	36.5%	0.259	1.413	0.082
	28–33	54	1.871	41.2%	0.255	1.617	0.056
	28–34	63	1.792	36.2%	0.226	1.601	0.057
	28–35	72	1.852	46.9%	0.218	2.147	0.018
	28–36	81	1.792	45.2%	0.199	2.271	0.013
	28–37	91	1.728	42.6%	0.181	2.351	0.010
	28–38	99	1.687	35.9%	0.170	2.120	0.018
2020–2021	28–32	49	1.239	16.9%	0.177	0.954	0.173
	28–33	59	1.336	22.3%	0.174	1.283	0.102
	28–34	69	1.342	25.4%	0.162	1.571	0.060
	28–35	79	1.395	33.6%	0.157	2.142	0.018
	28–36	89	1.376	31.2%	0.146	2.137	0.018
	28–37	99	1.355	21.5%	0.136	1.581	0.059
	28–38	109	1.338	23.0%	0.128	1.798	0.038
2021–2022	28–32	49	1.333	45.5%	0.190	2.391	0.010
	28–33	59	1.297	35.4%	0.169	2.097	0.020
	28–34	69	1.287	30.8%	0.155	1.986	0.026
	28–35	79	1.258	23.1%	0.142	1.628	0.054
	28–36	88	1.229	17.3%	0.131	1.317	0.096
	28–37	97	1.322	23.6%	0.134	1.757	0.041
	28–38	107	1.293	21.1%	0.125	1.686	0.047
2022–2023	28–32	47	1.446	11.9%	0.211	0.566	0.287
	28–33	56	1.370	5.4%	0.183	0.297	0.384
	28–34	66	1.285	7.3%	0.158	0.464	0.322
	28–35	76	1.334	18.1%	0.153	1.182	0.121
	28–36	86	1.299	12.7%	0.140	0.908	0.183
	28–37	96	1.379	21.2%	0.141	1.508	0.067
	28–38	104	1.376	26.9%	0.135	1.991	0.025
2023–2024	28–32	50	1.549	55.2%	0.219	2.520	0.008
	28–33	60	1.502	38.7%	0.194	1.994	0.025
	28–34	70	1.512	35.6%	0.181	1.970	0.026
	28–35	80	1.479	35.5%	0.165	2.146	0.017
	28–36	90	1.435	26.7%	0.151	1.764	0.041
	28–37	100	1.409	19.9%	0.141	1.414	0.080
	28–38	110	1.371	13.5%	0.131	1.030	0.153

Source: Own elaboration.
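The columns of Table 9 are linked: SE = SD/√n and t = Return/SE. The P-values correspond to a one-sided test with n − 1 degrees of freedom and can be obtained, for instance, with `scipy.stats.t.sf(t, n - 1)`. The sketch below recomputes SE and t for three rows of the table using only the standard library.

```python
import math

# (season, window, number of bets, SD, mean return, reported SE, reported t) from Table 9
ROWS = [
    ("2017-2018", "28-32", 41, 1.343, 0.6415, 0.210, 3.059),
    ("2018-2019", "28-38", 105, 1.152, 0.183, 0.112, 1.629),
    ("2023-2024", "28-38", 110, 1.371, 0.135, 0.131, 1.030),
]

def se_and_t(n_bets, sd, mean_return):
    """Standard error of the mean return and the t statistic of equation (9)."""
    se = sd / math.sqrt(n_bets)
    return se, mean_return / se

for season, window, n, sd, ret, se_rep, t_rep in ROWS:
    se, t = se_and_t(n, sd, ret)
    print(f"{season} {window}: SE={se:.3f} (table {se_rep}), t={t:.3f} (table {t_rep})")
```

The small differences in t come from the rounding of the published returns.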

The decrease in the significance of the expected return observed in several seasons may be because the model becomes less accurate in the last matchweeks of the season. This may be explained by factors beyond the scope of the model that emerge in the last few rounds. For example, while several rounds remain, a mid-table team may still be fighting to qualify for European competition or to avoid relegation. As the end of the season approaches, however, fewer points are in contention. Some teams then already have enough points to avoid relegation but not enough to qualify for European competition, and they have little motivation in the last few rounds. In contrast, teams at the extremes of the table have greater incentives to play. Those at the bottom seek to avoid relegation to the second division, while those at the top seek European competition places (Champions League or UEFA Europa League) or even the title. Bettors (investors) may also form emotional attachments (faith) to teams. These attachments generate hunches that may run against the teams' actual strategies and cause discrepancies with the facts. This may lead the model to misestimate the probabilities of success, undervaluing teams at the extremes of the table and overvaluing those in the middle that have no strong incentive to play.

Our findings suggest that punters do not estimate match probabilities adequately. The abnormal returns show that it is possible to beat the market using only public information (historical data), so the market does not appear to satisfy the EMH in its semi-strong form. This is all the more surprising because the estimates and results presented correspond only to the English league. It is by far the deepest (most popular) betting market, so its prices should in theory be set more accurately than in other betting markets.

Another interesting finding is that the model is less accurate in predicting draws (see Table 8), yet among the bets produced by the strategy, draws deliver the highest returns in several seasons. This is especially clear when all bets (matchweeks 28–38) are considered (see Table 10). This apparent draw bias is consistent with the results of Pope and Peel (1989) and, more recently, Deschamps and Gergaud (2007), who suggest that the market undervalues the probability of a draw. It suggests that individuals may be subject to irrational (hunch) biases. Punters may consider draws “boring” and prefer to bet on a win by one of the two teams even when this leads to suboptimal bets (fear of the draw), or they may use mistaken heuristics to assess the odds. The same applies to away wins, which also deliver positive and significant returns in several seasons. Not surprisingly, betting on home teams delivers the highest return in only one season, 2019–2020. This is in line with previous studies finding that the market had problems adjusting home team odds because home advantage was distorted during the COVID-19 outbreak (Fischer & Haucap, 2021; Winkelmann et al., 2021).

Table 10.

Cumulative Expected Return and Standard Deviation by Event and Season.

		Home		Away		Draw
Season	Matchweek	Return	SD	Return	SD	Return	SD
2017–2018	28–32	66.88%	1.079	79.89%	0.846	52.56%	1.798
	28–33	48.50%	1.090	80.00%	0.798	68.45%	1.981
	28–34	41.65%	1.144	55.07%	0.934	77.00%	1.971
	28–35	25.79%	1.126	44.73%	0.985	67.83%	1.928
	28–36	21.87%	1.092	28.84%	0.981	54.92%	1.905
	28–37	19.53%	1.127	22.40%	0.997	54.92%	1.905
	28–38	20.85%	1.086	18.23%	0.987	34.27%	1.848
2018–2019	28–32	42.79%	1.061	21.56%	1.059	-	-
	28–33	41.98%	1.175	30.50%	1.120	-	-
	28–34	40.81%	1.151	40.40%	1.030	-	-
	28–35	29.00%	1.171	40.63%	0.995	-	-
	28–36	21.19%	1.175	32.35%	1.022	-	-
	28–37	24.19%	1.181	25.00%	1.039	-	-
	28–38	14.92%	1.171	33.68%	1.078	-	-
2019–2020	28–32	46.1%	1.778	−24.17%	1.180	-	-
	28–33	49.3%	1.933	−24.17%	1.180	-	-
	28–34	47.7%	1.889	−25.20%	1.007	-	-
	28–35	58.5%	1.935	−25.20%	1.007	-	-
	28–36	57.7%	1.877	−26.33%	0.956	-	-
	28–37	52.7%	1.829	−4.88%	1.057	-	-
	28–38	43.8%	1.777	−4.88%	1.057	-	-
2020–2021	28–32	−3.13%	0.863	24.50%	1.320	56.67%	1.869
	28–33	−11.96%	0.859	41.22%	1.376	70.00%	1.927
	28–34	−11.81%	0.839	47.26%	1.406	68.67%	1.884
	28–35	−8.34%	0.835	72.04%	1.545	58.13%	1.868
	28–36	−6.37%	0.839	69.26%	1.503	43.50%	1.817
	28–37	−8.95%	0.846	63.97%	1.508	19.58%	1.739
	28–38	−3.44%	0.818	73.74%	1.515	6.30%	1.680
2021–2022	28–32	35.72%	1.233	74.50%	1.253	66.67%	2.887
	28–33	30.55%	1.192	45.42%	1.321	66.67%	2.887
	28–34	23.02%	1.172	54.00%	1.265	40.00%	2.227
	28–35	15.95%	1.144	43.73%	1.282	40.00%	2.227
	28–36	16.24%	1.113	34.75%	1.289	−6.67%	1.895
	28–37	15.16%	1.168	38.29%	1.257	53.64%	2.187
	28–38	15.78%	1.144	30.61%	1.262	40.83%	2.132
2022–2023	28–32	5.60%	1.390	85.00%	2.616	75.00%	2.475
	28–33	3.68%	1.323	85.00%	2.616	−12.50%	1.750
	28–34	6.07%	1.235	85.00%	2.616	−12.50%	1.750
	28–35	13.54%	1.254	85.00%	2.616	54.00%	2.123
	28–36	11.08%	1.225	85.00%	2.616	10.00%	1.889
	28–37	14.41%	1.276	85.00%	2.616	71.44%	2.054
	28–38	18.83%	1.269	85.00%	2.616	89.30%	2.017
2023–2024	28–32	5.61%	0.999	-	-	151.47%	1.964
	28–33	1.13%	0.992	-	-	113.75%	2.024
	28–34	0.34%	0.987	-	-	107.61%	2.079
	28–35	3.69%	0.993	-	-	105.40%	2.059
	28–36	2.72%	0.967	-	-	77.07%	2.038
	28–37	−0.62%	0.977	-	-	65.65%	2.018
	28–38	−2.07%	0.953	-	-	46.71%	1.969

Source: Own elaboration.
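The preceding discussion claims that draws lead in most seasons and that home bets lead only in 2019–2020 over the full period. As a quick consistency check, the minimal Python sketch below transcribes the matchweek 28–38 rows of Table 10 and counts which event delivers the highest cumulative return in each season. The dictionary layout and function names are illustrative and are not part of the original analysis.

```python
# Cumulative expected returns (%) for matchweeks 28-38, transcribed from Table 10.
# None marks events on which the strategy placed no bets ("-" in the table).
FULL_PERIOD_RETURNS = {
    "2017-2018": {"Home": 20.85, "Away": 18.23, "Draw": 34.27},
    "2018-2019": {"Home": 14.92, "Away": 33.68, "Draw": None},
    "2019-2020": {"Home": 43.8, "Away": -4.88, "Draw": None},
    "2020-2021": {"Home": -3.44, "Away": 73.74, "Draw": 6.30},
    "2021-2022": {"Home": 15.78, "Away": 30.61, "Draw": 40.83},
    "2022-2023": {"Home": 18.83, "Away": 85.00, "Draw": 89.30},
    "2023-2024": {"Home": -2.07, "Away": None, "Draw": 46.71},
}


def best_event(returns: dict) -> str:
    """Event with the highest cumulative return, ignoring events with no bets."""
    available = {event: r for event, r in returns.items() if r is not None}
    return max(available, key=available.get)


def count_leaders(table: dict) -> dict:
    """Number of seasons in which each event delivers the highest return."""
    counts = {"Home": 0, "Away": 0, "Draw": 0}
    for returns in table.values():
        counts[best_event(returns)] += 1
    return counts


if __name__ == "__main__":
    for season, returns in FULL_PERIOD_RETURNS.items():
        print(season, best_event(returns))
    print(count_leaders(FULL_PERIOD_RETURNS))
```

Running it shows draws leading in four of the seven seasons, away bets in two, and home bets only in 2019–2020.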

Robustness Check

To verify that the results above are not specific to a single tournament, a robustness check is necessary. To this end, data from the 2024–2025 season (the most recent available) were collected not only for the EPL but for the four major football leagues in the world (Hvattum, 2013): the EPL, LaLiga (Spain), Serie A (Italy), and the Bundesliga (Germany). The underlying idea is to replicate the results obtained for the EPL (2017–2018): if the findings are consistent over time and across leagues, they provide more solid evidence of biases in football betting markets. Accordingly, exactly the same procedure described above was applied to these four leagues for the 2024–2025 season. First, the Bayesian analysis was re-run to identify deviations by comparing the probabilities implied by bookmaker odds with the steady-state probabilities obtained from the Markov process analysis.
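The steady-state probabilities referred to above are the stationary distribution π of a Markov chain over the three outcomes, that is, the vector satisfying π = πP with entries summing to one. The Python sketch below computes π by power iteration. The three-state layout (H, A, D) and the transition matrix `P` are hypothetical, and the paper's own estimation of the chain may be set up differently.

```python
# Minimal sketch: stationary distribution pi of a 3-state Markov chain over
# match outcomes, i.e. the vector with pi = pi P and entries summing to one.
STATES = ("H", "A", "D")  # home win, away win, draw

# Hypothetical transition matrix: P[i][j] = Pr(next outcome j | current outcome i).
P = [
    [0.45, 0.30, 0.25],
    [0.40, 0.35, 0.25],
    [0.42, 0.33, 0.25],
]


def steady_state(P, tol=1e-12, max_iter=10_000):
    """Power iteration: apply pi <- pi P until successive vectors agree within tol."""
    n = len(P)
    pi = [1.0 / n] * n  # uniform starting distribution
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


if __name__ == "__main__":
    for state, prob in zip(STATES, steady_state(P)):
        print(f"{state}: {prob:.3f}")
```

Power iteration is used here because it needs no linear-algebra library. For a chain this small, solving the linear system πP = π directly would work equally well.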

The results in Table 11 again show that, for the observed probabilities (the realized outcome frequencies), none of the probabilities associated with any state (Home, Away, Draw) falls outside the confidence intervals. There is therefore no significant difference between the observed and the steady-state probabilities, which implies that the steady-state probabilities are a good estimate of the observed probabilities. For the market (bookmaker) probabilities, however, the mean for the "Home" outcome is higher than the steady-state probability in 3 out of 4 competitions, exceeding the upper bound of the confidence interval at least at the 10% significance level (that is, it lies outside the 90% interval). The only competition where this does not hold is LaLiga. This suggests that the market consistently overrates the likelihood of home wins and, since the three probabilities must sum to one, that it undervalues the likelihood of draws, away wins, or both.
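One way to obtain confidence intervals around steady-state probabilities is a parametric Markov-chain bootstrap in the spirit of Kulperger and Prakasa Rao (1989). The steps are: estimate the transition matrix from an observed outcome sequence, simulate new sequences of the same length from it, recompute the steady state each time, and take percentile intervals. The Python sketch below implements that procedure. It is an assumption about how such intervals can be built, not a statement of the paper's exact procedure. The outcome sequence, function names, and replication count are hypothetical.

```python
import random

STATES = ("H", "A", "D")  # home win, away win, draw


def estimate_transitions(seq):
    """Maximum-likelihood transition matrix from an observed sequence of outcomes."""
    idx = {s: k for k, s in enumerate(STATES)}
    counts = [[0, 0, 0] for _ in STATES]
    for prev, nxt in zip(seq, seq[1:]):
        counts[idx[prev]][idx[nxt]] += 1
    # A state that is never left gets a uniform row, to avoid dividing by zero.
    return [[c / sum(row) if sum(row) else 1 / 3 for c in row] for row in counts]


def steady_state(P, iters=200):
    """Stationary distribution by power iteration, starting from the uniform vector."""
    pi = [1 / 3] * 3
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
    return pi


def simulate(P, start, length, rng):
    """Draw a synthetic outcome sequence of the given length from transition matrix P."""
    state = STATES.index(start)
    seq = [start]
    for _ in range(length - 1):
        state = rng.choices(range(3), weights=P[state])[0]
        seq.append(STATES[state])
    return seq


def bootstrap_ci(seq, level=0.90, reps=500, seed=0):
    """Percentile interval for each steady-state probability (parametric Markov bootstrap)."""
    rng = random.Random(seed)
    P_hat = estimate_transitions(seq)
    boot = [
        steady_state(estimate_transitions(simulate(P_hat, seq[0], len(seq), rng)))
        for _ in range(reps)
    ]
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    intervals = []
    for j in range(3):
        values = sorted(b[j] for b in boot)
        intervals.append((values[int(lo_q * (reps - 1))], values[int(hi_q * (reps - 1))]))
    return intervals


if __name__ == "__main__":
    # Hypothetical season of 360 outcomes; replace with real match results.
    demo_rng = random.Random(1)
    outcomes = demo_rng.choices(STATES, weights=[0.45, 0.30, 0.25], k=360)
    for state, (lo, hi) in zip(STATES, bootstrap_ci(outcomes)):
        print(f"{state}: 90% CI {lo:.3f}-{hi:.3f}")
```

A market probability would then be flagged as significantly different when it falls outside the interval for the corresponding state.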

Table 11.

Probability Comparison for the Four Major League Competitions.

Competition	Event	Observed Probability	Market Probability	Steady-State Probability
		Mean	Mean	Mean	90% CI	95% CI	99% CI
EPL	Home	0.396	0.444	0.393	0.367–0.419	0.360–0.427	0.349–0.440
	Away	0.352	0.324	0.353	0.328–0.380	0.320–0.387	0.307–0.400
	Draw	0.252	0.233	0.254	0.230–0.277	0.224–0.284	0.211–0.296
LaLiga	Home	0.452	0.440	0.454	0.427–0.480	0.418–0.488	0.404–0.503
	Away	0.294	0.294	0.281	0.257–0.306	0.249–0.314	0.236–0.326
	Draw	0.266	0.266	0.265	0.240–0.290	0.232–0.297	0.219–0.309
Serie A	Home	0.388	0.422	0.389	0.362–0.417	0.355–0.425	0.339–0.441
	Away	0.316	0.313	0.318	0.290–0.347	0.282–0.354	0.268–0.371
	Draw	0.296	0.266	0.292	0.266–0.318	0.260–0.325	0.246–0.341
Bundesliga	Home	0.413	0.455	0.416	0.388–0.444	0.380–0.451	0.364–0.465
	Away	0.347	0.311	0.344	0.315–0.373	0.306–0.382	0.290–0.398
	Draw	0.240	0.234	0.240	0.214–0.266	0.207–0.272	0.194–0.286

Source: Own elaboration.
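The market probabilities in Table 11 come from bookmaker odds, which include a margin (overround), so the reciprocals of the decimal odds sum to more than one. A common way to remove the margin is to divide each reciprocal by their sum. The Python sketch below assumes this proportional method, since the section does not restate which normalization was used. The odds and function names are hypothetical.

```python
def implied_probabilities(home_odds, away_odds, draw_odds):
    """Decimal odds -> probabilities (Home, Away, Draw), margin removed proportionally.

    Returns the normalized probabilities and the bookmaker margin (overround - 1).
    """
    raw = [1.0 / home_odds, 1.0 / away_odds, 1.0 / draw_odds]
    overround = sum(raw)  # > 1 whenever the bookmaker builds in a margin
    return [r / overround for r in raw], overround - 1.0


def mean_market_probabilities(odds_rows):
    """Average normalized probabilities over many matches, as in a 'Market Probability' mean."""
    probs = [implied_probabilities(*row)[0] for row in odds_rows]
    return [sum(p[k] for p in probs) / len(probs) for k in range(3)]


if __name__ == "__main__":
    # Hypothetical decimal odds (home, away, draw) for three matches.
    odds = [(1.90, 4.20, 3.60), (2.50, 2.90, 3.30), (3.10, 2.30, 3.40)]
    print(mean_market_probabilities(odds))
```

Other margin-removal schemes exist, such as ones that load more of the margin onto longshots. The choice can shift the implied draw and away probabilities slightly.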

In this respect, Table 11 shows that in the EPL the market mean for the Away outcome is significantly smaller than the steady-state probability, since it lies below the lower bound of the 90% confidence interval. The Draw outcome follows a similar pattern, although its market mean is slightly above the lower bound of the 90% confidence interval. For Serie A, the market mean for the Draw outcome (0.266) lies at the lower bound of the 90% confidence interval at the reported precision, so the difference is only marginal. For the Bundesliga, the market mean for the Away outcome lies below the lower bound of the 90% interval. Regardless of the confidence level, the market means for away wins and draws are consistently smaller than the steady-state probabilities in these three competitions (LaLiga being the exception), which confirms that the market underrates the likelihood of these outcomes.
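The comparisons above can be reproduced mechanically from Table 11. The Python sketch below transcribes the rows discussed and reports the highest confidence level whose steady-state interval excludes the market mean (a level of 0.90 corresponds to significance at 10%). At the reported three-decimal precision, the Serie A draw mean (0.266) coincides with the lower bound of its 90% interval, so a strict comparison does not flag it. The data structure and function name are illustrative.

```python
# Market means and steady-state confidence intervals transcribed from Table 11.
# Format: (competition, event) -> (market mean, {level: (lower, upper)})
TABLE_11 = {
    ("EPL", "Home"): (0.444, {0.90: (0.367, 0.419), 0.95: (0.360, 0.427), 0.99: (0.349, 0.440)}),
    ("EPL", "Away"): (0.324, {0.90: (0.328, 0.380), 0.95: (0.320, 0.387), 0.99: (0.307, 0.400)}),
    ("EPL", "Draw"): (0.233, {0.90: (0.230, 0.277), 0.95: (0.224, 0.284), 0.99: (0.211, 0.296)}),
    ("LaLiga", "Home"): (0.440, {0.90: (0.427, 0.480), 0.95: (0.418, 0.488), 0.99: (0.404, 0.503)}),
    ("Serie A", "Home"): (0.422, {0.90: (0.362, 0.417), 0.95: (0.355, 0.425), 0.99: (0.339, 0.441)}),
    ("Serie A", "Draw"): (0.266, {0.90: (0.266, 0.318), 0.95: (0.260, 0.325), 0.99: (0.246, 0.341)}),
    ("Bundesliga", "Home"): (0.455, {0.90: (0.388, 0.444), 0.95: (0.380, 0.451), 0.99: (0.364, 0.465)}),
    ("Bundesliga", "Away"): (0.311, {0.90: (0.315, 0.373), 0.95: (0.306, 0.382), 0.99: (0.290, 0.398)}),
}


def strongest_level(market, intervals):
    """Highest confidence level whose interval excludes the market mean (None if all include it)."""
    outside = [level for level, (lo, hi) in intervals.items() if market < lo or market > hi]
    return max(outside) if outside else None


if __name__ == "__main__":
    for (league, event), (market, intervals) in TABLE_11.items():
        level = strongest_level(market, intervals)
        verdict = f"outside the {level:.0%} CI" if level else "inside all CIs"
        print(f"{league:<10} {event:<5} market={market:.3f} -> {verdict}")
```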

Next, the parameters of the betting model were estimated by repeating the optimization process described in Section 6. The parameters for each league (2024–2025 season) are reported in Table 12. Using these parameters, predictions were made for the final matchweeks of each league, starting from matchweek 28. For the EPL, LaLiga, and Serie A this covers the eleven matchweeks from 28 to 38. For the Bundesliga it covers only the seven matchweeks from 28 to 34, because this league has 18 teams instead of 20 and its season therefore ends at matchweek 34. The predictive performance for each league is shown in Table 13.
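The following sketch shows, under stated assumptions, how an ordered logit can turn a latent strength index into home, draw, and away probabilities. It also flags a bet when its expected return p × odds − 1 is positive. The score, cutpoints, odds, and this selection rule are hypothetical illustrations. They do not reproduce the paper's parameterization with the α and γ parameters reported in Table 12.

```python
import math


def logistic(x):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logit_probs(score, cut_low, cut_high):
    """Three-outcome ordered logit with ordering Away < Draw < Home.

    `score` is a hypothetical latent home-minus-away strength index and
    cut_low < cut_high are hypothetical cutpoints.
    """
    if cut_low >= cut_high:
        raise ValueError("cutpoints must be strictly increasing")
    p_away = logistic(cut_low - score)        # P(latent outcome below first cutpoint)
    p_not_home = logistic(cut_high - score)   # P(latent outcome below second cutpoint)
    return {"Home": 1.0 - p_not_home, "Away": p_away, "Draw": p_not_home - p_away}


def value_bets(probs, odds):
    """Expected net return per unit stake, p * decimal_odds - 1, kept only when positive."""
    expected = {event: probs[event] * odds[event] - 1.0 for event in probs}
    return {event: ev for event, ev in expected.items() if ev > 0}


if __name__ == "__main__":
    probs = ordered_logit_probs(score=0.0, cut_low=-0.6, cut_high=0.6)
    odds = {"Home": 2.5, "Away": 2.6, "Draw": 3.6}  # hypothetical decimal odds
    print(probs)
    print(value_bets(probs, odds))  # only the draw has a positive expected return here
```

With these illustrative numbers, the model gives a draw probability of about 0.29. The draw is then the only outcome whose odds imply a positive expected return.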

Table 12.

Estimated Parameters for Each Competition.

Parameter	EPL	LaLiga	Serie A	Bundesliga
$α$	0.105	0.001	0.000	0.071
$γ_{1}$	0.499	0.654	0.567	0.742
$γ_{2}$	0.842	0.232	0.332	0.390
$γ_{3}$	0.190	0.011	0.026	0.026

Source: Own elaboration.

Table 13.

Predictive Model Performance.

Competition	Class	Specificity	Precision	Sensitivity	F1-Score
EPL	Home	61.1%	61.1%	44.0%	51.2%
	Away	64.5%	15.4%	16.7%	16.0%
	Draw	81.3%	75.0%	33.3%	46.2%
	Correctly Classified		34.9%
LaLiga	Home	55.3%	22.7%	29.4%	25.6%
	Away	76.5%	90.7%	41.9%	57.4%
	Draw	71.0%	13.0%	7.3%	9.4%
	Correctly Classified		40.0%
Serie A	Home	68.6%	61.9%	44.1%	51.5%
	Away	76.1%	70.3%	40.6%	51.5%
	Draw	70.3%	29.0%	25.0%	26.9%
	Correctly Classified		46.9%
Bundesliga	Home	65.0%	-	-	-
	Away	58.3%	54.5%	40.0%	46.2%
	Draw	75.0%	77.8%	36.8%	50.0%
	Correctly Classified		21.5%

Source: Own elaboration.

Overall, the results in Table 13 show that the percentage of correctly classified matches is far from impressive compared with random guessing, which would be correct $33.\bar{3}\%$ of the time (because there are three possible outcomes). Only LaLiga (40.0%) and Serie A (46.9%) reach 40%, and the Bundesliga, at 21.5%, falls even below $33.\bar{3}\%$. Note also that the strategy placed no bets on home teams in the Bundesliga during the period analyzed, which explains why most performance indicators are undefined for this category.
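The indicators in Table 13 are standard one-vs-rest classification metrics, and the reported F1-scores are the harmonic mean of precision and sensitivity. For EPL home bets, for example, 2 × 0.611 × 0.440 / (0.611 + 0.440) ≈ 0.512, matching the reported 51.2%. The Python sketch below computes these metrics from a 3×3 confusion matrix. The matrix is hypothetical. Treating a class that is never predicted as having undefined precision (as for Bundesliga home bets) is an assumption about how the blank cells arise.

```python
CLASSES = ("Home", "Away", "Draw")


def per_class_metrics(cm):
    """One-vs-rest metrics from a confusion matrix cm[actual][predicted]."""
    total = sum(sum(row) for row in cm)
    out = {}
    for k, name in enumerate(CLASSES):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # actual k, predicted otherwise
        fp = sum(cm[i][k] for i in range(3)) - tp  # predicted k, actually otherwise
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else None  # undefined if k is never predicted
        sensitivity = tp / (tp + fn) if tp + fn else None
        specificity = tn / (tn + fp) if tn + fp else None
        if precision is None or sensitivity is None:
            f1 = None
        elif precision + sensitivity == 0:
            f1 = 0.0
        else:
            f1 = 2 * precision * sensitivity / (precision + sensitivity)
        out[name] = {"precision": precision, "sensitivity": sensitivity,
                     "specificity": specificity, "f1": f1}
    accuracy = sum(cm[k][k] for k in range(3)) / total  # share correctly classified
    return out, accuracy


if __name__ == "__main__":
    # Hypothetical confusion matrix: rows = actual outcome, columns = predicted outcome.
    cm = [[20, 10, 5], [8, 15, 7], [6, 5, 9]]
    metrics, acc = per_class_metrics(cm)
    for name, m in metrics.items():
        print(name, {key: round(v, 3) for key, v in m.items() if v is not None})
    print("correctly classified:", round(acc, 3))
```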

The results in Table 14 show that average cumulative returns from the betting strategy are positive for all time frames and competitions. The Bundesliga is the most profitable or, in financial terms, the most inefficient market, with a cumulative return of 76.9% by the end of the season. In all cases, the cumulative returns are statistically significant at least at the 10% level, and at the 5% level in most cases. These findings confirm the results initially found only for the EPL in the 2017–2018 season. Furthermore, the results in Table 15 confirm the existence of biases in the probabilities assigned to draws and away wins. The average cumulative return for away bets is the highest in the EPL and marginally the highest in the Bundesliga, whereas draws deliver the highest average cumulative return in LaLiga and Serie A.
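The statistics in Table 14 can be recovered from the reported columns as SE = SD/√N and t = Return/SE, with a one-sided p-value from a Student t distribution with N − 1 degrees of freedom. The Python sketch below assumes that this is the test behind the table; the figures for EPL matchweeks 28–32 are consistent with it. To stay dependency-free, it integrates the t density numerically with Simpson's rule instead of calling a statistics library. The function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Student t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t), integrating the density on [0, |t|] by Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def one_sided_test(mean_return, sd, n_bets):
    """t-test of H0: mean return per bet = 0 against H1: mean return > 0."""
    se = sd / math.sqrt(n_bets)
    t = mean_return / se
    return se, t, t_sf(t, n_bets - 1)


if __name__ == "__main__":
    # EPL, matchweeks 28-32 (Table 14a): Return 72.4%, SD 2.055, 20 bets.
    se, t, p = one_sided_test(0.724, 2.055, 20)
    print(f"SE={se:.3f} t={t:.3f} p={p:.3f}")
```

Small differences in the last digit of the t-statistic come from rounding of the published inputs.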

Table 14.

(a) Cumulative Expected Return Strategy for Different Time Frames (EPL).

Matchweek	N°Bets	SD	Return	SE	t-Stat	P-Value
28–32	20	2.055	72.4%	0.460	1.574	0.066
28–33	23	2.053	68.5%	0.428	1.600	0.062
28–34	30	1.976	53.7%	0.361	1.490	0.074
28–35	33	1.960	56.7%	0.341	1.661	0.053
28–36	36	2.152	64.3%	0.359	1.792	0.041
28–37	39	2.098	55.3%	0.336	1.647	0.054
28–38	43	2.440	68.2%	0.372	1.833	0.037

Source: Own elaboration.

Table 14.

(b) Cumulative Expected Return Strategy for Different Time Frames (LaLiga).

Matchweek	N°Bets	SD	Return	SE	t-Stat	P-Value
28–32	50	2.081	82.6%	0.294	2.806	0.004
28–33	60	1.987	59.0%	0.257	2.299	0.013
28–34	70	1.910	45.8%	0.228	2.007	0.024
28–35	80	1.842	35.2%	0.206	1.710	0.046
28–36	90	1.789	30.4%	0.189	1.612	0.055
28–37	100	1.768	34.2%	0.177	1.934	0.028
28–38	110	1.717	28.7%	0.164	1.752	0.041

Source: Own elaboration.

Table 14.

(c) Cumulative Expected Return Strategy for Different Time Frames (Serie A).

Matchweek	N°Bets	SD	Return	SE	t-Stat	P-Value
28–32	50	1.396	39.0%	0.197	1.977	0.027
28–33	60	1.359	34.2%	0.175	1.949	0.028
28–34	70	1.332	31.8%	0.159	1.998	0.025
28–35	80	1.310	28.3%	0.146	1.929	0.029
28–36	90	1.290	17.9%	0.136	1.314	0.096
28–37	100	1.340	23.4%	0.134	1.745	0.042
28–38	110	1.347	25.4%	0.128	1.975	0.025

Source: Own elaboration.

Table 14.

(d) Cumulative Expected Return Strategy for Different Time Frames (Bundesliga).

Matchweek	N°Bets	SD	Return	SE	t-Stat	P-Value
28–32	39	2.082	99.3%	0.333	2.978	0.003
28–33	46	2.047	83.0%	0.302	2.749	0.004
28–34	54	1.990	76.9%	0.271	2.838	0.003

Source: Own elaboration.

Table 15.

(a) Cumulative Expected Return and Standard Deviation by Event (EPL).

Matchweek	Home		Away		Draw
Matchweek	Return	SD	Return	SD	Return	SD
28–32	9.0%	0.496	318.0%	-	88.8%	2.516
28–33	9.0%	0.496	318.0%	-	79.6%	2.425
28–34	13.8%	0.445	109.0%	2.956	66.8%	2.363
28–35	15.7%	0.424	109.0%	2.956	71.2%	2.345
28–36	5.2%	0.532	109.0%	2.956	88.7%	2.566
28–37	8.3%	0.519	109.0%	2.956	73.6%	2.512
28–38	2.4%	0.565	386.7%	5.244	66.9%	2.485

Source: Own elaboration.

Table 15.

(b) Cumulative Expected Return and Standard Deviation by Event (LaLiga).

Matchweek	Home		Away		Draw
Matchweek	Return	SD	Return	SD	Return	SD
28–32	0.0%	0.568	82.3%	2.188	188.5%	1.964
28–33	6.7%	0.479	59.1%	2.108	130.8%	2.135
28–34	11.3%	0.462	45.8%	2.024	92.3%	2.129
28–35	21.2%	0.476	34.1%	1.964	64.9%	2.075
28–36	21.2%	0.476	28.4%	1.885	64.9%	2.075
28–37	10.2%	0.581	34.8%	1.856	64.9%	2.075
28–38	12.2%	0.558	28.1%	1.798	64.9%	2.075

Source: Own elaboration.

Table 15.

(c) Cumulative Expected Return and Standard Deviation by Event (Serie A).

Matchweek	Home		Away		Draw
Matchweek	Return	SD	Return	SD	Return	SD
28–32	10.0%	0.809	12.2%	1.406	150.5%	1.783
28–33	12.4%	0.794	17.8%	1.354	108.8%	1.884
28–34	7.1%	0.812	23.6%	1.382	108.8%	1.884
28–35	3.6%	0.816	33.7%	1.369	67.0%	1.881
28–36	−0.4%	0.828	17.9%	1.339	56.6%	1.864
28–37	11.1%	0.908	13.7%	1.318	66.9%	1.925
28–38	3.6%	0.899	27.1%	1.374	66.9%	1.925

Source: Own elaboration.

Table 15.

(d) Cumulative Expected Return and Standard Deviation by Event (Bundesliga).

Matchweek	Home		Away		Draw
Matchweek	Return	SD	Return	SD	Return	SD
28–32	-	-	75.2%	2.025	112.7%	2.142
28–33	-	-	57.1%	1.921	98.1%	2.136
28–34	-	-	81.2%	1.849	74.1%	2.103

Source: Own elaboration.

Discussion

Overall, the results from the analysis of the EPL (2017–2018 to 2023–2024) and its replication for the four major European competitions in the 2024–2025 season confirm the existence of biases that create profit opportunities in these markets. In principle, betting markets should share many properties of financial markets. They are based on publicly available information on team performance (Ottaviani & Sørensen, 2009; Thaler & Ziemba, 1988; Williams, 1999), and the huge number of transactions should, in theory, rule out any long-term profit opportunity.

That draws and away bets are, on average, more profitable for a punter indicates the possible presence of the biases mentioned above. Specifically, the profitability of away bets in all four leagues suggests that punters use the home condition as a mistaken heuristic (Tversky & Kahneman, 1973) when setting probabilities. In the 2017–2018 season, however, away bets were the least profitable, meaning that in that case punters used the away condition as the mistaken heuristic. This shift may be explained by a change in the competition itself: in the 2017–2018 season there were 100 away wins out of 360 games, whereas in the 2024–2025 season there were 124. This suggests that in the first case the market overestimated the away condition, which acted as the mistaken heuristic. In the most recent season, given the change in the competition, the home condition played that role (home wins fell from 164 in 2017–2018 to 146 in 2024–2025).
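As a worked illustration, the counts just cited can be expressed as shares of the 360 games considered in each season:

$$
\text{Away wins: } \frac{100}{360} \approx 27.8\% \;\to\; \frac{124}{360} \approx 34.4\%, \qquad
\text{Home wins: } \frac{164}{360} \approx 45.6\% \;\to\; \frac{146}{360} \approx 40.6\%
$$

The away share thus rises by about 6.7 percentage points, while the home share falls by 5.0 points.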

Draws, for their part, have been consistently more profitable than home bets and more profitable than away bets in half of the cases. This suggests that punters are reluctant to bet on draws because they find them unappealing from a consumption point of view. This is consistent with the idea that bettors act partly as consumers seeking entertainment (Humphreys et al., 2013; Mao et al., 2015; Stetzka & Winter, 2023), and it should be particularly true of more casual bettors. On the other hand, punters interested in generating profit may also avoid draws because such bets generate more negative emotions on average over the course of the game (Hsee et al., 2001), since a draw can be overturned at any moment. Mistaken heuristics may also lead punters to underestimate the probability of a draw, because it is the least common result across all competitions.

The results of the model indicate that it is feasible to systematically achieve positive returns within the football betting market. Nevertheless, beyond the behavioral biases that shape punters’ decisions, even those who attempt to wager according to ostensibly “rational” principles face inherent cognitive constraints that limit their ability to process all relevant information (bounded rationality) (Simon, 1955). Although certain “expert” punters may possess extensive knowledge regarding the performance of numerous teams, it remains improbable that they can sustain a sufficiently comprehensive understanding of enough teams each week to effectively diversify their bets and mitigate the stochastic nature of football outcomes. However, because individuals can account for a variety of unobservable factors, such as playing style, psychological dynamics, and team weaknesses, they may be better positioned to detect isolated mispriced matches with greater accuracy than algorithmic models. Accordingly, the optimal betting framework would integrate both computer-based predictions, which provide statistical regularity and diversification through the law of large numbers, and human expert judgment, which introduces contextual insight and nuanced interpretation of variables that are difficult to operationalize within computational models.
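The diversification argument can be illustrated with a small simulation under purely hypothetical assumptions. Each bet is independent, wins with a fixed probability, and pays fixed decimal odds that give a modest positive expected return (0.30 × 3.6 − 1 = 0.08 per unit stake). The spread of the average return across simulated seasons shrinks roughly as 1/√N as the number of bets N grows, which is the law-of-large-numbers effect the paragraph refers to. None of the numbers come from the paper.

```python
import random
import statistics


def average_return(n_bets, p_win, odds, rng):
    """Average net return per unit stake over n independent bets at fixed decimal odds."""
    total = sum((odds - 1.0) if rng.random() < p_win else -1.0 for _ in range(n_bets))
    return total / n_bets


def spread_of_average(n_bets, p_win=0.30, odds=3.6, seasons=2000, seed=42):
    """Mean and standard deviation of the average return across simulated seasons."""
    rng = random.Random(seed)
    sims = [average_return(n_bets, p_win, odds, rng) for _ in range(seasons)]
    return statistics.mean(sims), statistics.stdev(sims)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        mean, sd = spread_of_average(n)
        print(f"{n:>5} bets: mean {mean:+.3f}, SD {sd:.3f}")
```

Multiplying the number of bets by ten divides the standard deviation by roughly √10 ≈ 3.2. This is why a model that can cover many matches each week benefits from diversification in a way an individual expert usually cannot.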

In financial markets, biases such as the consumption effect are unlikely to be present, since these markets do not offer entertainment and are used purely for investment. Nonetheless, some biases may arise from heuristics or emotional influence. For instance, Chu et al. (2014) studied the influence of loss affect on investors and how the ability of expert investors to cope with the negative emotions of losses allows them to achieve better financial performance.

In this respect, these biases may play a role in financial markets in the short run and may explain, at least in part, otherwise unexplained phenomena such as market overreactions or the underpricing widely documented in IPOs (Allen & Faulhaber, 1989; Ibbotson, 1975). For instance, Lezana et al. (2024) found that unicorns tend to show higher underpricing than non-unicorn firms when they go public. In this scenario, the unicorn label may act in the same way as the fame and reputation of some teams in betting markets. At the same time, unicorn firms, whose value relies heavily on intangible assets by nature, are also riskier for investors, which can trigger emotions such as fear. This behavioral explanation is consistent with research arguing that some financial phenomena are psychological in nature (Adams et al., 2008; Chen, 2021).

Conclusions

Although a large literature studies inefficiencies in sports markets, which in essence should follow the same principles as financial markets, few studies have focused on the possible biases of agents in these markets when they use the available information to make decisions under uncertainty. Such biases would be associated with hunches (favoritism), heuristics, or perceptions of the utility generated by the bet itself. Sports betting markets, unlike financial markets, may therefore be significantly influenced by agents' beliefs and emotions.

After analyzing the available information, we found evidence of bias on the part of punters (investors) in the football (soccer) betting market. This bias appears to be associated with heuristic evaluations and influenced by emotional (not very rational) grounds rather than by a logical analysis of the available information. When betting is modeled as a Markov process, conditional on the results of previous matchweeks, punters could obtain better returns than those implied by market prices. The mispricing is concentrated in away and draw outcomes, which produces inefficiency in this market. This may also be related to the departures from expected-utility theory discussed by Allais (1953): betting on a draw would be a more "boring" option than betting on either team to win, causing the aggregate market to undervalue the probability of a draw and overvalue the probability of a home win.

This research also shows that, by using available information with procedures such as betting strategies and well-known estimation models, it is possible not only to match the market but also to obtain time-consistent positive abnormal returns (beating the bookmaker). The findings also suggest that, in the aggregate, draw and away bets potentially yield the highest returns but also exhibit the highest volatility. We therefore believe that our evidence may have important implications for the study of financial markets and behavioral finance. Although the EMH has generally been shown to hold, it remains unclear whether investors act on their hunches (emotions and irrationality) under certain abnormal conditions. Because financial markets have no end of season but operate as a continuous process, such biases may be attenuated and thus less visible in the estimates.

Beyond the realm of sports betting, the implications of this research extend to financial markets, where pricing anomalies and persistent inefficiencies continue to puzzle scholars and practitioners alike. The observed behavioral distortions, such as the longshot bias, emotional aversion, and the impact of salient outcomes, mirror known deviations from rationality in investor behavior. By showing that profitable strategies can be built without informational advantage, this study contributes to the broader discussion on how psychological and structural factors shape market outcomes. These parallels underscore the value of using betting markets as simplified laboratories to test theories of market behavior and to uncover universal principles that govern price formation and decision-making under uncertainty.

Finally, the limitations of the present work and of the proposed model must be emphasized. First, the model is static: its parameters remain constant over time and are common to all teams in the league. The model also assumes that a team's performance at home is independent of its performance away, an assumption used in works such as Maher (1982) and Dixon and Coles (1997). Moreover, the methodology controls for the possible effect of information asymmetries only in two ways: by choosing a league that is as robust as possible in terms of depth and number of punters, and by using only information available to all individuals before each matchweek. Nevertheless, we expect the results to approximate reality closely enough to reflect the biases present in the market. Future work could develop more dynamic models that better match the probabilities the market should assign given the available information, and could incorporate additional variables that control for other possible biases.

Footnotes

ORCID iD

Bruce Lezana

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Author Biography

Bruce Lezana is a PhD student in Entrepreneurship at the University of Massachusetts Lowell. His research focuses on risk aversion, uncertainty, and economic inequality, particularly how these factors intersect with entrepreneurship.

References

Adams

Thornton

Hall

(2008). IPO pricing phenomena: Empirical evidence of behavioral biases. Journal of Business & Economics Research (JBER), 6(4), 67–74. https://doi.org/10.19030/jber.v6i4.2412

Allais

(1953). Le Comportement de l'Homme Rationnel devant le Risque: Critique des Postulats et Axiomes de l'Ecole Americaine. Econometrica, 21, 503–546. https://doi.org/10.2307/1907921

Allen

Faulhaber

G. R.

(1989). Signalling by underpricing in the IPO market. Journal of Financial Economics, 23(2), 303–323. https://doi.org/10.1016/0304-405X(89)90060-3

Archontakis

Osborne

(2007). Playing it safe? A Fibonacci strategy for soccer betting. Journal of Sports Economics, 8(3), 295–308. https://doi.org/10.1177/1527002506286775

Ball

(1978). Anomalies in relationships between securities’ yields and yield-surrogates. Journal of Financial Economics, 6, 103–126. https://doi.org/10.1016/0304-405X(78)90026-0

Barberis

Shleifer

Vishny

(1998). A model of investor sentiment. Journal of Financial Economics, 49(3), 307–343. https://doi.org/10.1016/S0304-405X(98)00027-0

Boshnakov

Kharrat

McHale

I. G.

(2017). A bivariate Weibull count model for forecasting association football scores. International Journal of Forecasting, 33(2), 458–466. https://doi.org/10.1016/j.ijforecast.2016.11.006

Cain

Law

Peel

(2000). The favorite-longshot bias and market efficiency in UK football betting. Scottish Journal of Political Economy, 47(1), 25–36. https://doi.org/10.1111/1467-9485.00151

Carron

Loughhead

Bray

(2005). The home advantage in sport competitions: Courneya and Carron's (1992) conceptual framework a decade later. Journal of Sports Sciences, 23(4), 395–407. https://doi.org/10.1080/02640410400021542

10.

Chen

(2021). Dynamic survival bias in optimal stopping problems. Journal of Economic Theory, 196, 105286. https://doi.org/10.1016/j.jet.2021.105286

11.

Chu

Lee

E. J.

(2014). Investor expertise as mastery over mind: Regulating loss affect for superior investment performance. Psychology & Marketing, 31(5), 321–334. https://doi.org/10.1002/mar.20697

12.

Chumacero

R. A.

(2009). Altitude or hot air? Journal of Sports Economics, 10(6), 619–638. https://doi.org/10.1177/1527002509336217

13.

Clarke

Norman

(1995). Home ground advantage of individual clubs in English soccer. Journal of the Royal Statistical Society, 44(4), 509–521. https://doi.org/10.2307/2348899

14.

Conlisk

(1993). The utility of gambling. Journal of Risk and Uncertainty, 6, 255–275. https://doi.org/10.1007/BF01072614

15.

Constantinou

Fenton

N. E.

Neil

(2017). Profiting from an inefficient association football gambling market: Prediction, risk and uncertainty using Bayesian networks. Knowledge-Based Systems, 50, 60–86. https://doi.org/10.1016/j.knosys.2013.05.008

16.

Datta

McCormick

W. P.

(1993). Regeneration-based bootstrap for Markov chains. Canadian Journal of Statistics, 21, 181–193. https://doi.org/10.2307/3315810

17.

Deschamps

Gergaud

(2007). Efficiency in betting markets: Evidence from English football. The Journal of Prediction Markets, 1, 61–73. https://doi.org/10.5750/jpm.v1i1.420

18.

Dixon

Coles

(1997). Modeling association football scores and inefficiencies in the football betting market. Statistica Neerlandica, 46(2), 265–280. https://doi.org/10.1111/1467-9876.00065

19.

Dixon

Pope

(2004). The value of statistical forecasts in the UK association football betting market. International Journal of Forecasting, 20, 697–711. https://doi.org/10.1016/j.ijforecast.2003.12.007

20.

Edwards

(1968). Conservatism in human information processing. In Kleinmuntz

(Ed.), Formal representation of human judgment (pp. 17–52). Wiley.

21.

Efron

(1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1–26. https://doi.org/10.1214/aos/1176344552

22.

Fama

(1970). Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25, 383–417. https://doi.org/10.2307/2325486

23.

Fama

(1998). Market efficiency, long-term returns, and behavioral finance. Journal of Financial Economics, 49, 283–306. https://doi.org/10.1016/S0304-405X(98)00026-9

24.

Fischer

Haucap

(2021). Does crowd support drive the home advantage in professional football? Evidence from German ghost games during the COVID-19 pandemic. Journal of Sports Economics, 22(8), 982–1008. https://doi.org/10.1177/15270025211026552

25.

Forrest

Simmons

(2000a). Forecasting sport: The behavior and performance of football tipsters. International Journal of Forecasting, 16, 317–331. https://doi.org/10.1016/S0169-2070(00)00050-9

26.

Forrest

Simmons

(2000b). Making up the results: The work of the football pools panel, 1963–1997. Journal of the Royal Statistical Society, 49(2), 253–260. https://doi.org/10.1111/1467-9884.00235

27.

Franck

Verbeek

Nüesch

(2011). Sentimental preferences and the organizational regime of betting markets. Southern Economic Journal, 78(2), 502–518. https://doi.org/10.4284/0038-4038-78.2.502

28.

Goddard

Asimakopoulos

(2004). Forecasting football match results and the efficiency of fixed-odds betting. Journal of Forecasting, 23, 51–66. https://doi.org/10.1002/for.877

29.

Hausch

Ziemba

W. T.

Rubinstein

(1981). Efficiency of the market for racetrack betting. Management Science, 27, 1435–1452. https://doi.org/10.1287/mnsc.27.12.1435

30.

Henery

(1985). On the average probability of losing bets on horses with given starting price odds. Journal of the Royal Statistical Society, 148, 342–349. https://doi.org/10.2307/2981894

31.

Hsee

C. K.

Loewenstein

G. F.

Weber

E. U.

Welch

(2001). Risk as feelings. Psychological Bulletin, 127(2), 267–286. https://doi.org/10.1037/0033-2909.127.2.267

32.

Humphreys

B. R.

Paul

R. J.

Weinbach

A. P.

(2013). Consumption benefits and gambling: Evidence from the NCAA basketball betting market. Journal of Economic Psychology, 39, 376–386. https://doi.org/10.1016/j.joep.2013.05.010

33.

Hvattum

L. M.

(2013). Analyzing information efficiency in the betting market for association football league winners. The Journal of Prediction Markets, 7, 55–70. https://doi.org/10.5750/jpm.v7i2.614

34.

Ibbotson

R. G.

(1975). Price performance of common stock new issues. Journal of Financial Economics, 2(3), 235–272. https://doi.org/10.1016/0304-405X(75)90015-X

35.

Jane

W. J.

(2014). The relationship between outcome uncertainties and match attendance: New evidence in the National Basketball Association. Review of Industrial Organization, 45, 177–200. https://doi.org/10.1007/s11151-014-9436-x

36.

Jang

W. Y.

Lee

J. H.

H. C.

(2016). Halo, horn, or dark horse biases: Corporate reputation and the earnings announcement puzzle. Journal of Empirical Finance, 38, 272–289. https://doi.org/10.1016/j.jempfin.2016.07.005

37.

Jensen

(1978). Some anomalous evidence regarding market efficiency. Journal of Financial Economics, 6, 95–101. https://doi.org/10.1016/0304-405X(78)90025-9

38.

Kausel

E. E.

Venture

Rodríguez

(2019). Outcome bias in subjective ratings of performance: Evidence from the (football) field. Journal of Economic Psychology, 75, 102132. https://doi.org/10.1016/j.joep.2018.12.006

39.

Koning

(2000). Balance in competition in Dutch soccer. Journal of the Royal Statistical Society, 49(3), 419–431. https://doi.org/10.1111/1467-9884.00244

40.

Kulperger

R. J.

Prakasa Rao

B. L. S.

(1989). Bootstrapping a finite state Markov chain. Sankhya. Series A. (2008), 51(2), 178–191. https://www.jstor.org/stable/25050735

41.

Kuypers

(2000). Information and efficiency: An empirical study of a fixed odds betting market. Applied Economics, 32, 1353–1363. https://doi.org/10.1080/00036840050151449

42.

Lasek

Szlávik

Bhulai

(2013). The predictive power of ranking systems in association football. International Journal of Applied Pattern Recognition, 1(1), 27–46. https://doi.org/10.1504/IJAPR.2013.052339

43.

Lezana

Guede

Cancino

C. A.

(2024). The real return of unicorns: What do we know?. Technology Analysis & Strategic Management, 37(11), 1–14. https://doi.org/10.1080/09537325.2024.2328150

44.

Maher

(1982). Modelling association football scores. Statistica Neerlandica, 36, 109–118. https://doi.org/10.1111/j.1467-9574.1982.tb00782.x

45.

Mao

L. L.

Zhang

J. J.

Connaughton

D. P.

(2015). Sports gambling as consumption: Evidence from demand for sports lottery. Sport Management Review, 18(3), 436–447. https://doi.org/10.1016/j.smr.2014.11.006

46.

Ottaviani

Sørenson

P. N.

(2009). Noise, information, and the favorite-longshot bias in parimutuel predictions. American Economic Journal: Microeconomics, 2(1), 58–85. https://doi.org/10.1257/mic.2.1.58

47.

Paul

R. J.

Weinbach

A. P.

(2007). The uncertainty of outcome and scoring effects on Nielsen ratings for Monday Night Football. Journal of Economics and Business, 59(3), 199–211. https://doi.org/10.1016/j.jeconbus.2006.05.001

48. Pollard (1986). Home advantage in soccer: A retrospective analysis. Journal of Sports Sciences, 4, 237–248. https://doi.org/10.1080/02640418608732122

49. Pope & Peel (1989). Information, prices and efficiency in a fixed-odds betting market. Economica, 56, 323–341. https://doi.org/10.2307/2554281

50. Rue & Salvesen (2000). Prediction and retrospective analysis of soccer matches in a league. Journal of the Royal Statistical Society: Series D (The Statistician), 49(3), 399–418. https://doi.org/10.1111/1467-9884.00243

51. Salaga & Tainsky (2015). Betting lines and college football television ratings. Economics Letters, 132, 112–116. https://doi.org/10.1016/j.econlet.2015.04.032

52. Sauer, R. D. (2005). The state of research on markets for sports betting and suggested future directions. Journal of Economics and Finance, 29(3), 416–426. https://doi.org/10.1007/BF02761586

53. Sharpe (1966). Mutual fund performance. The Journal of Business, 39, 119–138. https://doi.org/10.1086/294846

54. Simon, H. A. (1955). A behavioral model of rational choice. The Quarterly Journal of Economics, 69(1), 99–118. https://doi.org/10.2307/1884852

55. Snowberg & Wolfers (2010). Explaining the favorite–long shot bias: Is it risk-love or misperceptions? Journal of Political Economy, 118(4), 723–746. https://doi.org/10.1086/655844

56. Stetzka, R. M., & Winter (2023). How rational is gambling? Journal of Economic Surveys, 37(4), 1432–1488. https://doi.org/10.1111/joes.12473

57. Thaler, R. H., & Ziemba, W. T. (1988). Anomalies: Parimutuel betting markets, racetracks and lotteries. Journal of Economic Perspectives, 2(2), 161–174. https://doi.org/10.1257/jep.2.2.161

58. Thorndike (1920). A constant error in psychological ratings. Journal of Applied Psychology, 4(1), 25–29. https://doi.org/10.1037/h0071663

59. Tversky & Kahneman (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5(2), 207–232. https://doi.org/10.1016/0010-0285(73)90033-9

60. Tversky & Kahneman (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131. https://doi.org/10.1126/science.185.4157.1124

61. Vlastakis, Dotsis, & Markellos, R. N. (2009). How efficient is the European football betting market? Evidence from arbitrage and trading strategies. Journal of Forecasting, 28(5), 426–444. https://doi.org/10.1002/for.1085

62. Waters & Lovell (2002). An examination of the homefield advantage in a professional English soccer team from a psychological standpoint. Football Studies, 5(1), 46–59.

63. Watts (1978). Systematic 'abnormal' returns after quarterly earnings announcements. Journal of Financial Economics, 6, 127–150. https://doi.org/10.1016/0304-405X(78)90027-2

64. Williams (1999). Information efficiency in betting markets: A survey. Bulletin of Economic Research, 51, 1–39. https://doi.org/10.1111/1467-8586.00069

65. Williams & Paton (1997). Why is there a favorite-longshot bias in British racetrack betting markets? The Economic Journal, 107, 150–158. https://doi.org/10.1111/1468-0297.00147

66. Winkelmann, Deutscher, & Ötting (2021). Bookmakers’ mispricing of the disappeared home advantage in the German Bundesliga after the COVID-19 break. Applied Economics, 53(26), 3054–3064. https://doi.org/10.1080/00036846.2021.1873234

67. Woodland (1994). Market efficiency and the favorite-longshot bias: The baseball betting market. The Journal of Finance, 49(1), 269–279. https://doi.org/10.1111/j.1540-6261.1994.tb04429.x