Path-dependency and coordination in multi-candidate elections with behavioral voters

Abstract

We consider a behavioral model of voting in multi-candidate elections under plurality rule. In the case of a positive impression of the campaign leader, voters increase their propensity to vote for that candidate, while in the case of a negative impression voters decrease their propensity. The formation of positive or negative impressions depends on an endogenous aspiration level. We show that in multi-candidate elections, in any stationary distribution, the winner receives a share of 50% of votes. Our results suggest that achieving coordination is ‘path-dependent’: whether voters manage to coordinate on the majority-preferred candidate critically depends on the initial state. We then identify conditions that make the election of the majority-preferred candidate more likely. However, even if the majority candidate is elected for sure, voting behavior is only partially coordinated.

Keywords

Behavioral model; multi-candidate elections; voter coordination

1. Introduction

Game-theoretic models of voting have demonstrated the strategic complexity of multi-candidate elections (e.g., Cox, 1994, 1997; Fey, 1997; Myatt, 2007; Myatt and Fisher, 2002a,b; Myerson and Weber, 1993; Palfrey, 1989). In the canonical model, candidates have fixed policy positions, while voters are assumed to maximize their expected policy payoff based on the election outcome. Despite its apparent simplicity, the model is difficult to analyze. This is because voters who prefer a badly trailing candidate have an incentive to abandon their first choice who is likely to lose, and switch their vote to a more competitive candidate. This ‘no-wasted-votes’ argument requires that voters form expectations about the likelihood that a candidate will win and then vote for the candidate who maximizes their expected utility. However, these expectations themselves depend on the expected vote choices of the entire electorate, which depend on the electorate’s expectations about who is leading, and so forth. In equilibrium, voters are assumed to calculate the probability of being pivotal among low-probability events and to base their decisions on relative pivot probability ratios.

A long tradition in political science has questioned the empirical validity of such models of strategic voting.¹ According to this literature, not only do voters lack basic information or coherent policy positions, but their reasoning processes are heavily biased and bear no resemblance to the rational cognitive processes presumed by game-theoretic models (e.g., Achen and Bartels, 2002, 2016; Berelson et al., 1954; Campbell et al., 1960; Cole et al., 2012; Healy and Lenz, 2014; Healy et al., 2010; Stimson, 2004; Wolfers, 2002). Following these findings, some scholars have suggested that voters essentially vote how they ‘feel’ and that this lack of sophistication can have dire consequences for democratic governance, as elections amount to little else than random selection devices driven by biases and politically irrelevant factors (e.g., Achen and Bartels, 2004, 2016). Such sweeping normative conclusions, in turn, have led to rebuttals from rational choice theorists. Counter-arguments have questioned the empirical validity of behavioral criticisms of rational choice models or defended the explanatory power of existing rational choice models (e.g., Ashworth, 2012; Ashworth and de Mesquita, 2014).

In this paper, we take a different approach. We set aside the empirical debate on the extent of voter rationality, information, or competence. Instead, we formally explore the performance of electoral institutions if voters indeed act as described by, e.g., Achen and Bartels (2016) and vote ‘as they feel.’ Specifically, we assume that voting behavior is consistent with one of the fundamental principles of learning theory, the law of effect (Hilgard and Bower, 1966; Thorndike, 1898): actions that are associated with favorable impressions are more likely to occur; those that are associated with negative impressions are less likely to occur. According to this view, rather than using best-response strategies based on rationally formed beliefs, voters adapt their voting propensities in response to positive and negative impressions, which are compared with an aspiration level based on past impressions (Bendor et al., 2003, 2011; Karandikar et al., 1998). In the case of a positive impression of a candidate, i.e., an impression above their aspiration level, voters increase their propensity to vote for the candidate, while in the case of a negative impression of a candidate, i.e., an impression below their aspiration level, voters decrease their propensity to vote for the candidate and shift their voting propensity to the other candidates. Aspiration levels are not fixed but incorporate past impressions. This behavioral rule requires far less cognitive effort, attention, and ability than the classical game-theoretic approach, in which voters make decisions based on complex pivot events that occur with very small probabilities. Moreover, voters are not assumed to know the details of their environment, such as the action sets of other voters, the details of opinion polls, or even the fact that voters adjust their propensities based on impressions.

We then apply this model to the problem of candidate selection in multi-candidate elections. That is, voters elect a public representative from a finite list of candidates with fixed policy positions under plurality rule. In contrast to the negative assessment of Achen and Bartels (2004), we find that, even if voters vote ‘as they feel,’ elections perform quite well. In a two-candidate case, the majority-preferred candidate always wins. We then consider the case of multiple candidates. We first show that the voting process is, in general, path-dependent: whether voters are able to coordinate on a majority-preferred candidate (and on which of the candidates) depends on the initial state.² We then show that a majority-preferred candidate wins under a broad set of circumstances even if voters simply rely on fleeting impressions of candidates.

While the main focus of the paper is on analyzing the normative properties of elections with behavioral voters, our paper also makes a methodological contribution. Aspiration-based behavioral models are often difficult to solve analytically. In their analysis of the prisoners’ dilemma, Karandikar et al. (1998), for example, are only able to analyze $2 \times 2$ games. In their analysis of elections, Bendor et al. (2003, 2011) frequently have to rely on computational approaches. These models have assumed a finite number of agents. By instead modeling the electorate as a continuum of voters, we can analytically characterize the stationary distributions of the model’s dynamic process. As we explain later (in endnote 8), where we compare the predictions of our continuous population model and the computational results of the finite population counterpart (Bendor et al., 2011), the qualitative properties of the two models are the same. This suggests that our modeling approach could be used in other applications where the existing finite population setup has encountered difficulties.

2. The model

There are n candidates with fixed policy positions. Elections are conducted by plurality rule. The set of all candidates will be denoted by $I = \{1, 2, \dots, n\}$ with i denoting a generic candidate. We will refer to voters as ‘she’ and candidates as ‘he.’ There is a continuum of voters of measure 1. Voting is based on a voter’s impressions of a candidate. Impressions are the sum of two factors. The first is a fixed, systematic factor. It captures the typical response of a voter toward a candidate and may be based on the voter’s ideological stance, party identification, or prior experience with the candidate. Everything else being equal, a conservative voter, for example, will be more likely to feel positively toward a Republican candidate than toward a Democrat (Kelley, 1983; Zaller, 1992). Voters are grouped into types based on their systematic factors $v_{θ}$. Types are indexed by the Greek letter $θ$. Intuitively, a type corresponds to a ranking of the n candidates in the absence of other factors. The set of all types will be denoted $Θ = \{θ = (i_{1} i_{2} \dots i_{n}) \mid (i_{1} i_{2} \dots i_{n}) \text{ is a permutation of } (1 2 \dots n)\}$. Type $θ = (i_{1} i_{2} \dots i_{n})$ is characterized by the n-dimensional vector $v_{θ} = (v_{θ}^{(i_{1})}, v_{θ}^{(i_{2})}, \dots, v_{θ}^{(i_{n})})$, where $v_{θ}^{(i_{1})}$ is type $θ$’s systematic factor associated with candidate $i_{1}$, $v_{θ}^{(i_{2})}$ is type $θ$’s systematic factor associated with candidate $i_{2}$, etc.³ The share of the type-$θ$ voters in the total population is denoted $s_{θ}$, with $0 \leq s_{θ} \leq 1$ and $\sum_{θ \in Θ} s_{θ} = 1$.

The second factor, $ϵ$, represents the fleeting, idiosyncratic factors that influence a voter’s impression of a given candidate, reflecting events like gaffes, campaign commercials, and media reports. It may also capture idiosyncratic and unrelated events, such as the weather or the fate of local sports teams, that affect the mood of a voter, as suggested by, e.g., Achen and Bartels (2002), Cole et al. (2012), and Healy et al. (2010). We will assume that the factor $ϵ$ is modeled as a random variable with a normal distribution $N(0, σ^{2})$, where $Φ(\cdot)$ is its distribution function. Then $π = v_{θ}^{(i)} + ϵ$ will reflect a voter’s current impression of candidate i during a campaign.⁴

The intended domain of our model is an electoral campaign. In each period t ($t = 0, 1, 2, \dots$), voters express voting propensities for the n candidates. That is, each voter, at each time t, is characterized by a probability distribution $\{p^{t}(i)\}_{i \in I}$, with the interpretation that $p^{t}(i)$ represents the probability or inclination of the respective voter to vote for candidate i at time t. We will assume that $p^{t}(i) \geq 0$ for each $i \in I$ and $\sum_{i \in I} p^{t}(i) = 1$, i.e., there are no abstentions. The distribution of vote shares will be denoted by $(S_{i})_{i \in I}$, where $S_{i}$ denotes the vote share that goes to candidate i. The candidate with the most votes is elected. Ties are decided by a fair coin toss.

In each period t, the voting propensities of the electorate determine who is the leading candidate, the ‘campaign leader,’ and who are the ‘campaign laggards.’ This can be done through a poll or media reports. Voters base their propensities solely on who is the campaign leader. They do not pay any attention to other features of the race, such as expected vote shares in an election poll and do not use Bayes’ rule to form beliefs about the likely outcome of the race, in contrast with the approaches proposed in Diermeier and Van Mieghem (2008) and Fey (1997). This formally captures the notion of inattentive, uninformed, and uninterested voters, a feature often pointed out by the behaviorally oriented political science literature on electoral campaigns (e.g., Stimson, 2004). The details of the adjustment process are presented in the next subsection.

2.1. The behavioral adjustment process

The adjustment process is a generalization of the model proposed in Bendor et al. (2003) and Karandikar et al. (1998). As a convention, we will use superscripts to denote time and subscripts to denote voters and voter types. In addition, we use brackets to refer to candidates. As an example, $p_{θ}^{t} (i)$ refers to the probability that a type- $θ$ voter will vote for candidate i were the election to be held at time t. Whenever the superscript is omitted, the notation refers to a stationary state.

In each period t, each voter forms an impression about the campaign leader given by $π^{t} = v_{θ}^{(i)} + ϵ^{t}$, where index i corresponds to the campaign leader, and $θ$ corresponds to the type of the voter. A given impression $π^{t}$ is classified as a positive or negative impression, based on the aspiration level $a^{t}$ (Bendor et al., 2003). If $π^{t} \geq a^{t}$, the voter has a positive impression and will increase her propensity to vote for that candidate. If $π^{t} < a^{t}$, then the corresponding impression is negative, and the voter will decrease her propensity to vote for that candidate.

As in Bendor et al. (2003) and Karandikar et al. (1998), the aspiration level $a^{t}$ endogenously adjusts to past impressions. A voter who repeatedly has negative impressions of a candidate will lower her threshold for having positive impressions in the future, while a voter who receives a string of positive impressions of a candidate will increase her threshold for future positive impressions.

The specific rules used to adjust voting probabilities and aspirations are as follows.

2.1.1. Voting probabilities adjustment

Voting probabilities evolve according to the Bush–Mosteller rule (Bendor et al., 2003, 2011; Bush and Mosteller, 1955; Karandikar et al., 1998). Specifically:

Suppose candidate i, with $i \in I$ , is the campaign leader in period t and $π^{t} \geq a^{t}$ , i.e., the voter’s impression of the candidate is positive, then

p^{t + 1} (i) = (1 - λ_{p}) p^{t} (i) + λ_{p}

Suppose candidate i, with $i \in I$ , is the campaign leader in period t and $π^{t} < a^{t}$ , i.e., the voter’s impression of the candidate is negative, then

p^{t + 1} (i) = (1 - λ_{p}) p^{t} (i)

$λ_{p} \in (0, 1)$ is constant across time and voters and represents the rate of voting probability adjustment.⁵

Under the Bush–Mosteller rule, the new propensity $p^{t + 1} (i)$ is a convex combination of the current propensity $p^{t} (i)$ and an indicator function, taking the value 1 when impressions are positive and 0 when negative. It is straightforward to see that the law of effect is satisfied: the propensity to vote for the campaign leader increases if the voter’s impression is positive and decreases otherwise.
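The update for the campaign leader can be written as a one-line function. The following is a minimal sketch, not the authors' implementation: the function name `bush_mosteller_update` and the numerical values are illustrative assumptions, and the rule for campaign laggards is deliberately left out because it is not specified at this point in the text.

```python
def bush_mosteller_update(p_leader: float, positive: bool, lam_p: float) -> float:
    """Next-period propensity to vote for the campaign leader.

    p_leader: current propensity p^t(i) for the leader i
    positive: True if pi^t >= a^t (positive impression), else False
    lam_p:    adjustment rate lambda_p in (0, 1)
    """
    if not 0.0 < lam_p < 1.0:
        raise ValueError("lam_p must lie in (0, 1)")
    # Convex combination of the current propensity and the 0/1 indicator.
    target = 1.0 if positive else 0.0
    return (1.0 - lam_p) * p_leader + lam_p * target


# Illustrative values (assumed, not taken from the paper).
p_up = bush_mosteller_update(0.4, positive=True, lam_p=0.1)
p_down = bush_mosteller_update(0.4, positive=False, lam_p=0.1)
```

With a positive impression the propensity moves a fraction $λ_{p}$ of the way toward 1, and with a negative impression the same fraction of the way toward 0, which is the law of effect in this setting.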

2.1.2. Aspiration adjustment

Aspirations evolve according to the Cyert–March rule (Bendor et al., 2003, 2011; Cyert and March, 1963; Karandikar et al., 1998), defined as

a^{t + 1} = (1 - λ_{a}) a^{t} + λ_{a} π^{t}

where $λ_{a} \in (0, 1)$ is a constant and common parameter that controls the level of adjustment.

Under the Cyert–March rule, the aspiration, $a^{t + 1}$ , is a convex combination of present aspiration, $a^{t}$ , and the impression, $π^{t}$ . Note that if the impression $π^{t}$ is positive, the aspiration level increases, while negative impressions cause the aspiration level to drop.
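A short sketch may help to see how the aspiration level tracks impressions over time. It assumes, purely for illustration, a voter whose impression is held constant at its systematic value (the noise term $ϵ$ is switched off); the function name and parameter values are hypothetical.

```python
def cyert_march_update(a: float, pi: float, lam_a: float) -> float:
    """Next-period aspiration a^{t+1} = (1 - lam_a) * a^t + lam_a * pi^t."""
    if not 0.0 < lam_a < 1.0:
        raise ValueError("lam_a must lie in (0, 1)")
    return (1.0 - lam_a) * a + lam_a * pi


# A single step: the aspiration rises after a positive impression (pi >= a)
# and falls after a negative one (pi < a).
a_after_positive = cyert_march_update(0.5, pi=0.8, lam_a=0.1)
a_after_negative = cyert_march_update(0.5, pi=0.2, lam_a=0.1)

# Repeated steps with a noise-free impression of 1 (an assumed simplification):
# the aspiration converges toward the impression itself.
a = 0.0
for _ in range(200):
    a = cyert_march_update(a, pi=1.0, lam_a=0.1)
```

In this noise-free case the aspiration approaches the systematic factor, which is the mechanism behind aspirations being centered on a type's systematic factor for the campaign leader in a stationary distribution.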

3. Stationary distributions and normative implications

The basic model and the behavioral process described in Section 2.1 jointly define a dynamic process. Our goal is to characterize the stationary distributions of that adjustment process and their properties. Specifically, we focus on a two-candidate election and a well-studied three-candidate election, called ‘beat the incumbent’ (e.g., Fey, 1997; Myatt, 2007; Myatt and Fisher, 2002b; Myerson and Weber, 1993).

3.1. Two-candidate elections

We first consider the case of n=2 candidates. In rational choice models of voting, this is a trivial case as voters simply vote for their preferred candidate. In a behavioral model where voters are inattentive and vote based on their impressions, this may not be the case. Specifically, we wish to explore the claim of Achen and Bartels (2004) that if voters vote ‘how they feel,’ elections amount to little more than a ‘random oligarchy.’ The case of n=2 will also illustrate the logic of impression-based voting in general. Later, we will extend this analysis to multi-candidate elections.

Suppose there are two candidates, candidate A and candidate B, and two types of voter: $Θ := \{AB, BA\}$. AB voters rank A higher than B, and BA voters rank B higher than A. Specifically, we assume for AB voters: $v_{AB} = (v_{AB}^{(A)}, v_{AB}^{(B)}) = (1, 0)$ and for BA voters: $v_{BA} = (v_{BA}^{(A)}, v_{BA}^{(B)}) = (0, 1)$. Let $s_{AB}$ be the fraction of AB voters, and $s_{BA}$ be the fraction of BA voters, and suppose that $s_{AB} > s_{BA}$.

The analysis proceeds as follows. Consider a stationary distribution in which, say, candidate A is the campaign leader. In this case, the voter’s impression $π$ of candidate A is distributed as $π_{AB} = 1 + ϵ$ for AB voters and $π_{BA} = 0 + ϵ$ for BA voters, where $ϵ \sim N(0, σ^{2})$. Because aspirations adjust according to the Cyert–March rule and the distribution is stationary, this implies that the aspirations of the two types of voter will be distributed normally, centered around 1 for AB voters, and around 0 for BA voters. But this implies that 50% of the population of each type will have a positive impression of candidate A, while the remaining 50% will have a negative impression of candidate A. Given this 50%–50% split of voters, and since the Bush–Mosteller rule is symmetric around 0.5, we must have that the stationary shares of the two candidates will be 50% each. That is, there is a perfect tie between the two candidates. What happens next depends on how the tie is broken. If candidate A is the expected winner, then the next-period share of votes will follow the same distribution of 50%–50%. If B wins, the average AB voter’s impression $π_{AB}$ of the campaign leader (now candidate B) will drop from 1 to 0, and a large fraction of AB voters will have a negative impression of candidate B and thus become more likely to support candidate A. At the same time, the average BA voter’s impression $π_{BA}$ of the expected winner (i.e., candidate B) will go up from 0 to 1, and a large fraction of BA voters will have a positive impression of candidate B and thus become more likely to support candidate B. But since AB voters are the larger group, this implies that the aggregate probability of voting for A will increase. Hence, the share candidate A receives shoots up, with the increase being dependent on the share of AB voters, i.e., the larger $s_{AB}$, the larger the increase in the share of votes A receives.

A similar argument applies at the other stationary distribution, in which candidate B is the expected winner. The predicted shares of the two candidates will be 50% each. But, whenever A is expected to win, given the larger share of AB voters, the share of votes that A receives will shoot above 50%, again with the increase being dependent on the share of AB voters. It therefore follows that since AB voters are more numerous than BA voters, the share of votes that candidate A is expected to receive can never go below 50%, and, on average, will exceed 50%, with its level being dependent on the size of its supporters in the electorate.

We summarize this result in the following proposition. The proof is straightforward and thus omitted.

Proposition 1. For n=2 candidates, the candidate with the most supporters wins, with an average share of votes that is proportional to the size of his supporters.

Note that the vote share received by A is proportional, but in general not equal, to the size of the AB faction. More specifically, the voting behavior is such that a majority of AB voters will vote for A, and similarly, a majority of BA voters will vote for B; while, on aggregate, A will be the expected winner of the election as desired by a majority. Thus, voters typically do not ‘vote sincerely’ or ‘expressively’ (Schuessler, 2000), but the dynamics of the adjustment process ensure that the majority-preferred candidate wins.

3.2. The ‘beat the incumbent’ election

The ‘beat the incumbent’ model (e.g., Fey, 1997; Myatt, 2007; Myatt and Fisher, 2002b; Myerson and Weber, 1993) consists of three candidates A, B, and C, and three types of voter: AB, BA, and C. The labels indicate the top preferences of each type; for example, an AB type prefers A over B, and B over C. AB and BA voters both rank candidate C third and thus have a common interest in making sure that C does not get elected, though their preferences diverge on who they rank first. The fixed factors $v_{θ}$ of each type are described in Table 1. There, v is a parameter with $v \in [0, 1]$ that can be interpreted as the benefit of coordinating among AB and BA voters. The share of each type is denoted by $s_{AB}$, $s_{BA}$, and $s_{C}$, respectively. In the model proposed by Fey (1997) or Myerson and Weber (1993), $s_{AB} = s_{BA} = 30\%$ and $s_{C} = 40\%$, so that AB and BA voters constitute a majority, while C voters are in the minority. This means that C is a ‘Condorcet loser’ candidate, i.e., he would lose against either A or B in pair-wise competition. This gives rise to the normative question of which conditions would prevent the election of a Condorcet loser. Note that the majority group of voters (i.e., AB and BA voters) can be sure of a majority candidate win only if they sufficiently coordinate their support on either A or B. Otherwise, if they fail to coordinate their votes, candidate C wins.

Table 1.

Voters’ fixed factors ( $v_{θ}$ ).

	Candidate A	Candidate B	Candidate C	Proportion
AB type	1	v	0	$s_{AB}$
BA type	v	1	0	$s_{BA}$
C type	0	0	1	$s_{C}$

In the ‘beat the incumbent’ election, with three candidates, the process allows for multiple stationary distributions. Yet, all attainable stationary distributions can be characterized in the following proposition.⁶

Proposition 2. In the ‘beat the incumbent’ election, the adjustment process has three stationary distributions, one distribution for each candidate, where that candidate receives a share of 50% of the votes.

Proof. In the appendix. □

The intuition for the result is as follows. Consider a stationary distribution and suppose that candidate A is the sole winner. In this case, the distribution of impressions is $π_{AB}^{A} = 1 + ϵ$, $π_{BA}^{A} = v + ϵ$, and $π_{C}^{A} = 0 + ϵ$, where $ϵ \sim N(0, σ^{2})$. Because aspirations adjust by the Cyert–March rule and the distribution is stationary, they must also be normally distributed and centered around 1 for AB type voters, around v for BA type voters, and around 0 for C type voters. But this then implies that 50% of the population will have a positive impression of candidate A, while the remaining 50% will have a negative impression.

To see why the winner must receive a share of 50%, suppose, by way of contradiction, that his share is below 50%. First we note that, since the population is continuous, the winner’s share (in this case candidate A’s) is exactly the aggregate probability of voting for candidate A across the entire electorate. Thus, a share of less than 50% for candidate A is equivalent to an aggregate probability of voting for candidate A below 0.5. Second, since the (aggregate) likelihood of voting for candidate A is less than 0.5, the Bush–Mosteller rule implies that, on average, the probability increase of voters with a positive impression of candidate A will be larger than the probability decrease of voters with a negative impression. As voters with positive and negative impressions are split 50%–50%, the probability of voting for candidate A will, on average, increase. Thus, the share of candidate A will increase, which contradicts the stationarity assumption. A similar reasoning shows that if candidate A’s share is above 50% then, on average, the probability of voting for candidate A decreases. Equivalently, the share of candidate A decreases, which, again, is contrary to the stationarity assumption. Therefore, the only candidate for a stationary distribution is a state in which candidate A receives 50% of the votes. Indeed, at an average voting inclination of 0.5 (which is equivalent to a share of 50% of the votes for candidate A) the adjustments of voters with positive and negative impressions of candidate A will compensate each other and the distribution will be stationary.
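The contradiction argument can be illustrated numerically. Averaging the Bush–Mosteller rule over the electorate gives a recursion for the leader's aggregate share: the new average is $(1 - λ_{p})$ times the old average plus $λ_{p}$ times the fraction of voters with a positive impression. The sketch below iterates that recursion under two assumptions that are ours rather than the paper's: the fraction with a positive impression is held fixed at 0.5 (as in the stationary distribution), and candidate A remains the campaign leader throughout. The function name and starting values are illustrative.

```python
def aggregate_leader_share(p_bar0: float, lam_p: float,
                           frac_positive: float = 0.5,
                           periods: int = 200) -> list:
    """Path of the leader's aggregate vote share under the averaged rule.

    Each period: p_bar <- (1 - lam_p) * p_bar + lam_p * frac_positive.
    """
    path = [p_bar0]
    for _ in range(periods):
        path.append((1.0 - lam_p) * path[-1] + lam_p * frac_positive)
    return path


# Start below and above 50% (illustrative starting shares).
low = aggregate_leader_share(0.35, lam_p=0.1)
high = aggregate_leader_share(0.80, lam_p=0.1)
```

A share below 50% rises and a share above 50% falls, so under these assumptions the only rest point of the recursion is a share of exactly 50%.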

Note that Proposition 2 holds at the aggregate level, i.e., at the level of types, not at the individual voter level. Specifically, voters attach voting probability 0.5 to voting for the winning candidate only on average. In contrast, individual voting propensities will vary across voters, i.e., some voters will be more likely to vote for the winning candidate, others less likely.

Note also that the prediction of a 50% vote share should not be interpreted too literally. It is a consequence of our mathematical assumption of a continuum of voters. In any finite electorate, we would have a distribution over vote shares centered around 50%. Proposition 2 is thus consistent with the electoral data from the 1987–1997 parliamentary elections in England analyzed by Myatt and Fisher (2002a,b): the distribution of vote shares of the winning candidates is single-peaked and centered around 50%.⁷

The model allows for multiple stationary distributions, one for each candidate. All stationary distributions allocate a vote share of 50% to the winning candidate and the adjustment process uniquely selects one of the stationary distributions, but which one depends on the process’ starting state. That is, the result is properly viewed as a form of ‘path-dependence.’⁸ This naturally raises the question under what circumstances the plurality-preferred candidate will be elected. We will discuss this question in Section 3.2.1.

3.2.1. Path-dependency and coordination success

Proposition 2 implies that the adaptive process is path-dependent. That is, it is possible for each candidate to be the election winner, including the Condorcet loser. Proposition 2, however, does not tell us how likely these outcomes are. Note also that Proposition 2 does not impose any restrictions on how the voting weights are adjusted for the campaign laggards. To study the path-dependence of the process, we need to be more specific and make assumptions on the way voting weights adjust for all candidates and not only for the campaign leader. We will focus on a class of adjustment rules that satisfy a ‘no-leap-frog’ condition.⁹

Assumption 1. Suppose that candidate i ($i \in \{A, B, C\}$) leads in period t, i.e., $S_{i}^{t} > S_{i'}^{t}$ for each $i' \neq i$, $i' \in \{A, B, C\}$. If $S_{i}^{t + 1} > S_{i}^{t}$, i.e., candidate i’s share increases at $t + 1$, then i must again lead in period $t + 1$.

Intuitively, Assumption 1 requires that any redistribution of vote shares must be balanced, i.e., no campaign laggard is arbitrarily favored. To understand the impact of the condition, consider the following example. Suppose that in the current period, A is the campaign leader, i.e., candidate A has the largest share among the three candidates. Further, suppose that after voters adjust their voting weights, A’s share increases in the next period. Then, we require that A continue to lead.
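The condition can be stated as a check on a single transition between two vote-share profiles. The sketch below is one possible reading of Assumption 1; the function name, the dictionary representation of shares, and the example numbers are all hypothetical.

```python
def satisfies_no_leap_frog(shares_t: dict, shares_next: dict) -> bool:
    """Check Assumption 1 for one transition between vote-share profiles."""
    leader = max(shares_t, key=shares_t.get)
    others = [c for c in shares_t if c != leader]
    # The assumption only applies when period t has a strict leader.
    if any(shares_t[c] >= shares_t[leader] for c in others):
        return True
    # It only binds when the leader's share strictly increases.
    if shares_next[leader] <= shares_t[leader]:
        return True
    # In that case the leader must strictly lead again in period t + 1.
    return all(shares_next[leader] > shares_next[c] for c in others)


# Illustrative profiles: A leads at t and gains share in both continuations.
before = {"A": 0.40, "B": 0.35, "C": 0.25}
balanced = {"A": 0.45, "B": 0.32, "C": 0.23}
leap_frog = {"A": 0.41, "B": 0.45, "C": 0.14}

ok = satisfies_no_leap_frog(before, balanced)
violated = not satisfies_no_leap_frog(before, leap_frog)
```

In the second continuation A's share rises, yet B overtakes A because C's losses flow almost entirely to B; this is the kind of unbalanced redistribution the assumption rules out.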

We now investigate under which conditions a majority is successful in coordinating, i.e., in electing one of the majority-preferred candidates. Here, we focus on the case where $v \leq a^{0}$ or $v \geq 2 a^{0}$, i.e., the benefit from coordination for the majority factions is either ‘small’ or ‘large.’ Suppose that voters initially vote for their highest-ranked candidate with probability one, i.e., AB types vote for A and BA types vote for B, while all C types vote for C.¹⁰ In this case, the candidate who has the largest support among voters wins the election (with a vote share of 50%).

Proposition 3. Suppose that, initially, propensities are distributed as $p_{AB}^{0}(A) = 1$, $p_{BA}^{0}(B) = 1$, and $p_{C}^{0}(C) = 1$, while aspirations are distributed as $a^{0} = \frac{1}{3}$ for all voters. If $s_{AB} > \max\{s_{BA}, s_{C}\}$, the process converges to the stationary distribution in which A wins with a share of 50%. Similar results hold for the other two cases: if $s_{BA} > \max\{s_{AB}, s_{C}\}$, the process converges to the stationary distribution in which B wins with a share of 50%, while, if $s_{C} > \max\{s_{AB}, s_{BA}\}$, the process converges to the stationary distribution in which C wins with a share of 50%.

Proof. In the appendix. □

In other words, if voters initially vote for their preferred choice and aspirations are in a medium range around $1 ∕ 3$, the plurality-preferred candidate will win. Note, however, that in the case where the candidate is the first choice of a plurality, but not a majority, the vote share of the winning candidate will be larger than the share of the population that ranks that candidate first. Proposition 3 thus points out that the adjustment mechanism can amplify a plurality into a majority. This also means that a Condorcet loser can be elected, provided he is the plurality winner, as in the case studied by Fey (1997) and Myerson and Weber (1993), i.e., $s_{AB} = s_{BA} = 30\%$ and $s_{C} = 40\%$. Conversely, if $s_{AB} = 34\%$, $s_{BA} = 33\%$, and $s_{C} = 33\%$, voters will coordinate on A as the winner, with a vote share of 50%, which selects a majority-preferred candidate, but also overstates the candidate’s true support in the electorate.

It is important to understand two factors that matter in Proposition 3. First, a candidate must be the initial campaign leader. In Proposition 3, the plurality-preferred candidate will be the leader because, by assumption, all voters initially choose their most preferred candidate. Second, that initial success must be amplified. Various factors play a role here. One of them is the fact that initial aspirations are sufficiently low; another that the systematic support in the voting population must be sufficiently high.

We can further investigate these factors by considering alternative initial conditions. For example, consider the case where there is no initial propensity for the preferred candidate. That is, suppose initial voting probabilities are all equal, i.e., $p^{0} (A) = p^{0} (B) = p^{0} (C) = 1 ∕ 3$ for all voters. Again, assume that initial aspirations are $1 ∕ 3$ for all voters. We then have the following proposition.

Proposition 4. Suppose that, initially, propensities are distributed as $p^{0} (A) = p^{0} (B) = p^{0} (C) = 1 ∕ 3$ , while aspirations are distributed as $a^{0} = 1 ∕ 3$ for all voters. If candidate A leads at t=0 and $s_{AB}$ is sufficiently large, the process converges to the stationary distribution in which A wins a share of 50%. A similar statement holds for the cases of B or C being the initial leaders, provided their support is sufficiently large.

Proof. In the appendix. □

Under these assumptions, a candidate with minority support can win the election, provided he is the initial campaign leader and his support is not too small. Intuitively, the condition ‘$s_{AB}$ sufficiently large’ ensures that the initial advantage ‘sticks.’ That is, it ensures that the leading position of a given candidate, say A, leads to a positive impression of the candidate, which generates an increase in propensity. This ensures that, in each cycle, the share of A either increases or is at least 50%. Therefore, A will be the leading candidate in each cycle, which implies that A will win in the limit, with a share of 50% of the votes.

We can make this threshold value precise. Specifically, for candidate A’s support to be sufficiently large, $s_{AB}$ needs to be larger than a critical value $s^{*}$ ; we must have

(*) s_{AB} > s^{*} := \frac{Φ(\frac{a^{0}}{σ}) - \frac{2}{3}}{Φ(\frac{1 - a^{0}}{σ}) + Φ(\frac{a^{0}}{σ}) - 1}

Observe that if $σ$ is large, $s^{*} < 0$ . In this case, whoever is the leader in the first round will be assured of winning the election, independent of his support among voters. Intuitively, this will be the case if impressions are characterized by substantial noise. Observe, also, that, for the given starting state, $s^{*} < 1 ∕ 3$ for any $σ > 0$ . So, it is always possible for a Condorcet loser to win the election if initial propensities and aspirations are equal among candidates. Similarly, note that a low $a^{0}$ favors the initial campaign leader, as voters are more likely to have a positive impression of that candidate in the initial periods and are thus more likely to support the initial leader. So low expectations may make it easier for an initial advantage to carry the day.
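The comparative statics of the threshold in condition (*) are easy to check numerically. The sketch below evaluates $s^{*}$ for several noise levels at $a^{0} = 1 ∕ 3$. It assumes that $Φ$ in (*) is the standard normal distribution function, which is how we read the arguments scaled by $σ$; the function names and the grid of $σ$ values are our own illustrative choices.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal distribution function (assumed meaning of Phi)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def critical_share(a0: float, sigma: float) -> float:
    """Threshold s* from condition (*) for initial aspiration a0 and noise sigma."""
    numerator = std_normal_cdf(a0 / sigma) - 2.0 / 3.0
    denominator = (std_normal_cdf((1.0 - a0) / sigma)
                   + std_normal_cdf(a0 / sigma) - 1.0)
    return numerator / denominator


# Threshold for an assumed grid of noise levels, with a0 = 1/3.
thresholds = {sigma: critical_share(1.0 / 3.0, sigma)
              for sigma in (0.1, 0.2, 0.5, 1.0, 5.0)}
```

On this grid the threshold stays below $1 ∕ 3$ throughout and turns negative once $σ$ is large, in line with the two observations above.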

In the next proposition, we state conditions that ensure the success of a majority-preferred candidate, i.e., A or B.¹¹ This will depend on the benefit of coordinating, v. That is, the condition $s_{AB} > s^{*}$ (or $s_{BA} > s^{*}$) can be relaxed if v is positive.

Proposition 5. Suppose that, initially, propensities are distributed as $p^{0} (A) = p^{0} (B) = p^{0} (C) = 1 ∕ 3$ while aspirations are distributed as $a^{0} = 1 ∕ 3$ for all voters, and suppose that A leads at t=0. If (i) $v > 0$ and (ii) $s_{AB} [Φ (\frac{1 - a^{0}}{σ}) + Φ (\frac{a^{0}}{σ}) - 1] + s_{BA} [Φ (\frac{v - a^{0}}{σ}) + Φ (\frac{a^{0}}{σ}) - 1] > Φ (\frac{a^{0}}{σ}) - 2 ∕ 3$ , the process converges to a stationary distribution in which candidate A wins with a share of 50% of votes. A similar statement holds for the case of B being the leader at t=0.

This result indicates that coordination among AB and BA types is more likely if the benefit of coordinating, i.e., parameter v, is large. The intuitive reason is that if v is large, AB voters (or BA voters) will have a positive impression of B (or A) with a higher probability, because the impression $π = v + ϵ$ is increasing in v. Therefore, if either A or B is the campaign leader, AB and BA types will have a higher propensity to vote for that candidate, and so coordination among AB and BA types will be more likely.¹²
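To see how a positive v relaxes the support requirement, conditions (i) and (ii) of Proposition 5 can be evaluated directly. The sketch below is illustrative only: the function names and the numerical values are our own assumptions, and Φ is again taken to be the standard normal distribution function.

```python
from statistics import NormalDist

# Standard normal CDF, written Φ in the text (assumed interpretation).
PHI = NormalDist().cdf


def condition_margin(s_ab, s_ba, v, a0, sigma):
    """Left-hand side minus right-hand side of condition (ii) in Proposition 5."""
    own_term = PHI((1.0 - a0) / sigma) + PHI(a0 / sigma) - 1.0
    cross_term = PHI((v - a0) / sigma) + PHI(a0 / sigma) - 1.0
    return s_ab * own_term + s_ba * cross_term - (PHI(a0 / sigma) - 2.0 / 3.0)


def prop5_holds(s_ab, s_ba, v, a0, sigma):
    """True if both (i) v > 0 and (ii) hold for the given parameters."""
    return v > 0 and condition_margin(s_ab, s_ba, v, a0, sigma) > 0


if __name__ == "__main__":
    a0, sigma = 1.0 / 3.0, 0.5
    # Hypothetical electorate: s_AB = 0.10 is below s* for these parameters.
    for v in (0.01, 0.25, 0.5):
        print(f"v = {v}: condition holds -> {prop5_holds(0.10, 0.30, v, a0, sigma)}")
```

In this hypothetical configuration, the support of AB types alone would not satisfy (*), yet condition (ii) holds once v is large enough for BA types to contribute.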

3.2.2. Duvergerian equilibria and Duverger’s law

We will now compare our analysis of the ‘beat the incumbent’ game with its rational voter analog. Various papers (see, e.g., Fey, 1997; Myerson and Weber, 1993; Palfrey, 1989) show that with rational voters there are three equilibria that fit into two categories. The first category, referred to as ‘Duvergerian,’ in reference to Duverger’s law (Duverger, 1954), includes two equilibria in which the groups of AB and BA voters fully coordinate their votes on either A or B, while the C voters cast their support for C. In this case, only two candidates get votes and A or B wins the election. The second category, referred to as ‘non-Duvergerian,’ includes one equilibrium in which each group of voters casts its support for its top choice, i.e., AB voters vote for A, BA voters vote for B, and C voters vote for C. In this case, the groups of AB and BA voters do not coordinate and all three candidates get votes.

From a normative point of view, Duvergerian equilibria are desirable, since the majority faction successfully coordinates on A or B. However, a unique selection of Duvergerian equilibria (and hence a strict derivation of Duverger’s law from a game-theoretic model) has proven to be difficult, since both types of equilibria (Duvergerian and non-Duvergerian) are consistent with the game’s incentives, without any prediction about their relative frequency. Myerson and Weber (1993, p. 106) state this as: ‘Duverger’s law cannot be derived exclusively from analysis of voting equilibria. [ $\dots$ ] Any derivation of Duverger’s law would seem to require some additional assumption of dynamic stability or persistence.’

To resolve these difficulties, Fey (1997) proposed a model of myopic best-response dynamics in which voters use Bayes’ rule to infer each candidate’s chances of winning based on current opinion polls. Fey’s adjustment model generically converges to a Duvergerian equilibrium for any initial state of the process. Fey thus concludes that the non-Duvergerian equilibrium is ‘unstable.’

In contrast, our model is based on very different behavioral assumptions. In our model, voters do not use best-response strategies; rather they use an adjustment process based on whether their impression of a candidate was positive given their aspiration level. Voters also do not need to know the details of polls or be able to calculate a candidate’s chances of winning using Bayes’ rule. All they need to know is who the leading candidate is. They also do not need to know the details of their environment, such as the presence or action sets of other voters, or the process of impression formation and voting.

Proposition 2 implies that the behavioral adjustment process has three stationary distributions: in two of them, one of the majority candidates (A or B) wins with a share of 50%, while in the third stationary distribution the minority candidate (candidate C) wins, again, with a share of 50%. These predictions may seem similar to those of the game-theoretic model previously discussed; however, there are major differences between the two models, in regard to both the dynamics of the adjustment process and the voting behavior associated with these outcomes. Importantly, there are no ‘unstable’ states in our approach. In particular, as we have seen, the distribution in which the minority candidate C wins can be reached from a large range of starting states. Thus, in our model, the case in which C wins does not constitute an ‘exceptional’ case (as suggested by Palfrey (1989)).

More generally, our model is incompatible with equilibrium behavior in a game-theoretic model.

Remark 1. The Duvergerian equilibrium in which all AB and BA types vote for A (or B) and C types vote for C is not a stationary distribution of the behavioral adjustment process.

Likewise, the non-Duvergerian equilibrium, in which all three candidates receive positive shares of votes, is not stationary either.

Remark 2. The non-Duvergerian equilibrium in which AB types vote for A, BA types vote for B, and C types vote for C is not a stationary distribution of the behavioral adjustment process.

These results follow directly from Proposition 2, which states that in any stationary distribution the winner receives a share of 50% of total votes; moreover, as shown in the proof, each group of voters will vote for the winning candidate with probability 0.5. This immediately rules out the two types of equilibria as potential candidates for a stationary distribution. For example, in the Duvergerian equilibrium in which either A or B wins, all C types attach weight zero to voting for A or B, contradicting the voting behavior implied by Proposition 2. Similarly, in the non-Duvergerian equilibrium in which C wins, all AB types put zero probability mass on voting for C. This is inconsistent with Proposition 2.

More generally, no outcome in which one of the candidates receives exactly $0\%$ of the total votes (as in the Duvergerian equilibrium) is stationary in our model. That is, in our model, coordination is always partial. Consider, for example, the parameter configuration proposed by Fey (1997) and Myerson and Weber (1993): $s_{AB} = 30\%$, $s_{BA} = 30\%$, and $s_{C} = 40\%$. With an adjustment process that favors the candidate among the campaign laggards that is better liked by a majority, the behavioral model predicts the reallocation of votes toward the winning candidate. That is, there are two types of stationary distribution: one in which A (or, symmetrically, B) wins, with vote shares $(S_{A} = 50\%, S_{B} = 30\%, S_{C} = 20\%)$ when A is the winner, and one in which candidate C wins, with vote shares $(S_{A} = 25\%, S_{B} = 25\%, S_{C} = 50\%)$.¹³ Thus, even though the electoral support for the three candidates is balanced, i.e., $30\%$, $30\%$, $40\%$, the voting behavior favors one of the majority candidates over the other. In this sense, our model is consistent with Duverger’s law, understood as a ‘tendency’ toward two-candidate competition (Duverger, 1954). Note that the mechanism for this tendency is different from that proposed in the game-theoretic models. The choice by voters to abandon their preferred choice if that candidate is trailing badly plays no role in our model. Rather, what matters is the amplification of support for the leading candidate, and type-driven propensities for campaign laggards in the case of negative impressions of the campaign leader.
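The vote shares in this example can be reproduced with a short calculation. The sketch below rests on a reallocation rule that we assume for illustration, chosen because it matches the figures in the text: every type votes for the winner with probability 0.5 and puts the remaining 0.5 on its most-preferred laggard, with C types splitting evenly between A and B when C is the winner. All names in the code are hypothetical.

```python
# Type shares from the example in the text.
TYPE_SHARES = {"AB": 0.30, "BA": 0.30, "C": 0.40}

# Preference rankings; C types are treated as indifferent between A and B (assumption).
RANKINGS = {"AB": ["A", "B", "C"], "BA": ["B", "A", "C"], "C": ["C"]}
CANDIDATES = ("A", "B", "C")


def type_propensities(voter_type, winner):
    """Propensities of one type: 0.5 on the winner, 0.5 on the preferred laggard."""
    p = {c: 0.0 for c in CANDIDATES}
    p[winner] = 0.5
    laggards = [c for c in CANDIDATES if c != winner]
    ranked = [c for c in RANKINGS[voter_type] if c in laggards]
    if ranked:
        p[ranked[0]] += 0.5
    else:
        # No ranked laggard: split the remaining mass evenly (assumption).
        for c in laggards:
            p[c] += 0.5 / len(laggards)
    return p


def vote_shares(winner):
    """Aggregate shares S_i as the type-weighted average of propensities."""
    shares = {c: 0.0 for c in CANDIDATES}
    for voter_type, weight in TYPE_SHARES.items():
        for c, prob in type_propensities(voter_type, winner).items():
            shares[c] += weight * prob
    return shares


if __name__ == "__main__":
    for winner in CANDIDATES:
        print(winner, {c: round(s, 4) for c, s in vote_shares(winner).items()})
```

Under this assumed rule, the aggregation step is simply the type-weighted average of propensities, so other type shares can be substituted in `TYPE_SHARES` to explore how the laggards' shares change.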

Whether these differences from the game-theoretic models constitute a vice or a virtue depends on the empirical adequacy of the model. Most theoretical research has identified Duverger’s law with the selection of Duvergerian equilibria. The focus on selecting Duvergerian equilibria, however, has been questioned in recent empirical and theoretical analyses. Cox (1997) and especially Myatt and Fisher (2002a,b) have argued that third-party candidates receive consistently more votes than predicted by Duvergerian equilibria. According to these findings, coordination under plurality rule is partial, as in our model, but in contrast with the properties of Duvergerian equilibria.¹⁴ More empirical research is needed to investigate these issues in detail.

4. Conclusions

A long tradition of research on voting behavior in political science has suggested that voters are heavily influenced by how they currently ‘feel’ about a candidate and do not engage in rational belief formation or decision-making. We formally capture this view with a model of voting where actions are based on the comparison between voters’ current impression of a candidate and an aspiration level, leading to reinforcement-based behavior of a kind that has been widely supported by psychological research.

The model defines a dynamic process, and we solve for the stationary distributions of the process. All stationary distributions are of the same form, with one of the candidates winning a 50% share of votes. In a two-candidate competition, the majority-preferred candidate always wins. In the multi-candidate case, the process exhibits path-dependence as, under certain conditions, the initial campaign leader may become the eventual winner.

The dynamics of our model are fundamentally different from the equilibria identified by game-theoretic models or best-response-based adjustment models, as in Fey (1997). Winning an election depends on initial success plus amplification of support, which depends on the attitudes of the electorate. It is possible for a minority-preferred candidate to be elected, provided he leads in the first round and has sufficient support in the electorate. But the likelihood that a majority candidate will be elected will increase if the candidate commands a larger support in the electorate and the benefits of coordination among the majority factions are large. Moreover, low initial aspirations make it easier for a majority candidate to win, provided he has significant initial support.

In our model, a substantial segment of the voting population does not vote according to its preferences. That is, these voters do not vote ‘sincerely.’ This concentration of support, however, is not due to ‘strategic voting,’ if by that term we mean a conscious calculation based on relative chances of winning. Rather, voters adjust their action propensities based on impressions. This adaptive process amplifies initial success.

From a normative point of view, our results indicate that even an electorate that largely votes ‘as it feels’ will often succeed in electing a candidate that is preferred by a majority. Candidate selection is not random, contrary to the argument in Achen and Bartels (2004). Rather, behavioral adjustment processes lead to an outcome that, in many cases, avoids electing a Condorcet loser. In other words, compared with purely random selection, impression-based voting tilts the outcome toward the majority-preferred candidate. That said, the dynamic process of coordination on majority-preferred candidates is complex. A detailed analysis requires a more fully specified adjustment process. The critical issue is how voters reallocate voting propensities among the campaign laggards. If they reallocate their propensity according to their preferences over candidates, as indicated by some experimental evidence (Forsythe et al., 1993), our model is consistent with Duverger’s law. But empirical analyses of the adjustment processes used by voters during a campaign are lacking. Thus, one implication of our model is that we need to learn more about the empirical regularities of impression-based voting to analyze the normative properties of mass elections.

Appendix A: Proofs

The following generic observation is used throughout the appendix.

Observation 1. Suppose that the distribution of voting weights across the entire electorate is ${(p (i))}_{i \in I}$ , while the corresponding type-level distribution is ${(p_{θ} (i))}_{i \in I}$ for each $θ \in Θ$ . Then an application of the law of large numbers implies that the share of votes that a candidate $i \in I$ obtains is the average propensity to vote for candidate i across the entire electorate: $S_{i} = E [p (i)] = \sum_{θ \in Θ} s_{θ} E [p_{θ} (i)]$ .
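The law-of-large-numbers argument in Observation 1 can be illustrated by simulation. The following is a rough sketch under assumptions of our own: voters are drawn independently by type, each casts one vote according to fixed type-level propensities, and the type shares and propensities shown are hypothetical values rather than results from the proofs.

```python
import random

# Hypothetical type shares and type-level propensities (for illustration only).
TYPE_SHARES = {"AB": 0.30, "BA": 0.30, "C": 0.40}
PROPENSITIES = {
    "AB": {"A": 0.5, "B": 0.5, "C": 0.0},
    "BA": {"A": 0.5, "B": 0.5, "C": 0.0},
    "C": {"A": 0.5, "B": 0.0, "C": 0.5},
}


def expected_shares(type_shares, propensities):
    """S_i as the type-weighted average propensity, as in Observation 1."""
    candidates = list(next(iter(propensities.values())))
    return {
        c: sum(type_shares[t] * propensities[t][c] for t in type_shares)
        for c in candidates
    }


def simulate_shares(type_shares, propensities, n_voters, seed=0):
    """Draw n_voters independent voters and return the realized vote shares."""
    rng = random.Random(seed)
    types = list(type_shares)
    type_weights = [type_shares[t] for t in types]
    candidates = list(next(iter(propensities.values())))
    counts = {c: 0 for c in candidates}
    for _ in range(n_voters):
        t = rng.choices(types, weights=type_weights)[0]
        p = propensities[t]
        vote = rng.choices(candidates, weights=[p[c] for c in candidates])[0]
        counts[vote] += 1
    return {c: counts[c] / n_voters for c in candidates}


if __name__ == "__main__":
    print("expected :", expected_shares(TYPE_SHARES, PROPENSITIES))
    print("simulated:", simulate_shares(TYPE_SHARES, PROPENSITIES, 100_000))
```

With a large simulated electorate, the realized shares should lie close to the weighted averages, which is the sense in which vote shares are treated as deterministic in the analysis.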

Acknowledgements

Earlier versions of this paper were circulated under the titles ‘A behavioral model of multi-candidate elections’ and ‘Path-dependency and coordination in multi-candidate elections.’ We would like to thank the participants at the MPSA meetings 2010 and APSA meetings 2011 for their useful comments. All remaining errors are our own.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Achen and Bartels (2002) Blind retrospection: Electoral responses to droughts, flu, and shark attacks. In: Annual Meeting of the APSA, Boston, August 26–September 1, 2002.

Achen and Bartels (2004) Musical chairs: Pocketbook voting and the limits of democratic accountability. In: Annual Meeting of the APSA, Chicago, September 2–September 5, 2004.

Achen and Bartels (2016) Democracy for Realists: Why Elections Do Not Produce Responsive Government. Princeton, NJ: Princeton University Press.

Ashworth (2012) Electoral accountability: Recent theoretical and empirical work. Annual Review of Political Science 15: 183–201.

Ashworth and de Mesquita (2014) Is voter competence good for voters? Information, rationality, and democratic performance. American Political Science Review 108(3): 565–587.

Bendor (2010) Bounded Rationality and Politics. Berkeley: University of California Press.

Bendor, Diermeier, Siegel et al. (2011) A Behavioral Theory of Elections. Princeton: Princeton University Press.

Bendor, Diermeier and Ting (2003) A behavioral model of turnout. American Political Science Review 97: 261–280.

Berelson, Lazarsfeld and McPhee (1954) Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bush and Mosteller (1955) Stochastic Models of Learning. New York: Wiley.

Campbell, Converse, Miller et al. (1960) The American Voter. New York: Wiley.

Cole, Healy and Werker (2012) Do voters demand responsive governments? Evidence from Indian disaster relief. Journal of Development Economics 97: 167–181.

Cox (1994) Strategic voting equilibria under the single non-transferable vote. American Political Science Review 88: 608–621.

Cox (1997) Making Votes Count. Cambridge: Cambridge University Press.

Cyert and March (1963) A Behavioral Theory of the Firm. Englewood Cliffs, NJ: Prentice Hall.

Diermeier and Van Mieghem (2008) Coordination and turnout in large elections. Mathematical and Computer Modelling 48: 1478–1496.

Duverger (1954) Political Parties: Their Organization and Activity in the Modern State. New York: Wiley.

Fey (1997) Stability and coordination in Duverger’s law: A formal model of pre-election polls and strategic voting. American Political Science Review 91: 135–147.

Forsythe, Myerson, Rietz et al. (1993) An experiment on coordination in multi-candidate elections: The importance of polls and election histories. Social Choice and Welfare 10: 223–247.

Healy and Lenz (2014) Substituting the end for the whole: Why voters respond primarily to the election-year economy. American Journal of Political Science 58(1): 31–47.

Healy and Malhotra (2010) Irrelevant events affect voters’ evaluations of government performance. Proceedings of the National Academy of Sciences 107(28): 12506–12511.

Hilgard and Bower (1966) Theories of Learning. New York: Appleton-Century-Crofts.

Karandikar, Mookherjee, Ray et al. (1998) Evolving aspirations and cooperation. Journal of Economic Theory 80: 292–331.

Kelley (1983) Interpreting Elections. Princeton: Princeton University Press.

Myatt (2007) On the theory of strategic voting. Review of Economic Studies 74: 255–281.

Myatt and Fisher (2002a) Everything is uncertain and uncertainty is everything: Strategic voting in simple plurality elections. University of Oxford, UK, Discussion Paper Series, Number 115.

Myatt and Fisher (2002b) Tactical coordination in plurality electoral systems. Oxford Review of Economic Policy 18(4): 504–522.

Myerson and Weber (1993) A theory of voting equilibria. American Political Science Review 87: 102–114.

Palfrey (1989) A mathematical proof of Duverger’s law. In: Ordeshook (ed.) Models of Strategic Choice in Politics. Ann Arbor, MI: University of Michigan Press, pp. 69–92.

Palomino and Vega-Redondo (1999) Convergence of aspirations and (partial) cooperation in the prisoner’s dilemma. International Journal of Game Theory 28: 465–488.

Schuessler (2000) Expressive voting. Rationality and Society 12(1): 87–119.

Stimson (2004) Tides of Consent: How Public Opinion Shapes American Politics. Cambridge: Cambridge University Press.

Thorndike (1898) Animal intelligence: An experimental study of the associative processes in animals. Psychological Review (Monograph Supplement) 2(4): i–109.

Wolfers (2002) Are voters rational? Evidence from gubernatorial elections. Stanford Graduate School of Business Working Paper no. 1730, Stanford, CA.

Zaller (1992) The Nature and Origins of Mass Opinion. Cambridge: Cambridge University Press.