Abstract
The long-debated Politicized Departure Hypothesis (PDH) asserts that federal judges tend to arrange to retire under presidents of the same political party as the president who first appointed them, thereby giving that party the right to nominate their successor. PDH is important for asserting political party agency by judges, who receive no consequent personal benefit, and for explaining the long-term political party orientation of courts. PDH studies inevitably suffer from absence of data on known and unknown determinants of retirement timing. To avoid these and other problems, we apply 11 sharp regression discontinuity (SRD) analyses to voluntary judicial departures before and after five elections that replace Republican presidents with Democrats, and six that replace Democrats with Republicans, 1920 to 2018. Results of difference tests, difference-in-differences tests, and others are as predicted by PDH, for 10 of 11 analyses, for pre-election and post-inauguration observation periods of 270 days. Unexpectedly, we find stronger PDH effects for Republican appointees than for Democratic appointees. We offer a novel explanation of PDH based on normative reciprocity rather than ideology.
Much-debated hypotheses claim that, as judges appointed under Article III of the U.S. Constitution end their courtroom careers, they seek replacement by others who share their political party orientation as Republicans or Democrats (Spriggs and Wahlbeck 1995; Stolzenberg and Lindgren 2010; Yoon 2006, 2017; Zorn and Van Winkle 2000). 1 Delivery of appointment rights is straightforward, but sometimes difficult to execute: if judges retire, resign, or accede to senior status early in the four-year term of an incumbent president, or whenever the sitting president’s party controls the Senate, that president has the right to nominate a successor judge, and there is time for confirmation by a cooperative Senate. By adjusting their dates of departure, judges can control assignment of the right to appoint their replacements, barring their sudden death, poor presidential election outcome prediction, or unforeseen personal exigencies (Campbell 2008; Chabot 2019; Erikson and Wlezien 2008).
In one version of this politicized departure hypothesis (hereafter, PDH), presidents seek to mold legal decisions by nominating judges who share their political ideology, values, attitudes, and opinions. Conversely, when judges leave full-time court service, they seek to have their replacements named by presidents who share their judicial ideology, values, attitudes, or at least opinions. Presidents choose among many candidate judges, but judges choose only between departure under a Democratic or Republican president, and they can only trust that presidents’ ideologies, attitudes, values, and opinions correlate with their political party affiliation (Chabot 2019; Stolzenberg and Lindgren 2010). We call this hypothesis the “enduring ideology” version of PDH, because it relies on judges’ maintenance of the political and legal attitudes and values that led to their first federal bench appointment. 2
We propose a second version of PDH in which the norm of reciprocity, not enduring ideology, motivates judges to return rights of appointment to presidents of the same party as the presidents who first appointed them. Reciprocity norms are mainstays of culture-focused social science (in sociology, see Gouldner 1960; Molm, Collett, and Schaefer 2007; in political science, Lubell and Scholz 2001; in social psychology, Whatley et al. 1999; in history, Kloppenberg 2016; in anthropology, Graeber 2001; and in economics, Fehr and Schmidt 2006; Malmendier, te Velde, and Weber 2014). We call this variant of PDH the “reciprocity norm” version.
Both versions of PDH assert that when age, health, family matters, occupational fatigue, or anything else induces judges to end full-time judicial service, they tend to delay or accelerate their retirement, thereby delivering the right to nominate their successor to presidents of the same party as the president who first appointed them. Neither version of PDH excludes the other.
If true, PDH is important for its far-reaching implications. Although party politics may be unavoidable for judicial aspirants, PDH suggests judges themselves act politically, without financial or career advancement incentives, as they end their courtroom careers. If objective data indicate that judges tend to act politically at career end, then those data provide evidence that party politics influences their behavior throughout their judicial careers, when evidence of influence is less available.
More abstractly, PDH is important because it describes a self-replicating system shaped by societal norms; supported by judges’ values, attitudes, and behavior; facilitated by judicial, presidential, and senatorial organizational structures, practices, and procedures; replete with the influence of previous judges on the nominations of current jurists; and, barring unforeseen changes in judicial selection, full of promise that current judges will have opportunities to choose the party whose president will nominate their future replacements. If it is indeed institutionalized, as just described, then politicized departure is likely to be durable, with diffuse effects extending beyond the careers and decisions of individual judges, and past the tenures of individual presidents.
Finally, PDH is important because it implies that party politics influence U.S. social stratification through the courts as well as through elected officials of the legislative and executive branches of government. Competition for market advantage, indicia of social status, and political power (i.e., the entire Weberian stratification paradigm; see Collins 1986) in the United States is governed by federal laws and refereed by federal judges. When disputes over resources, privileges, and competition reach those courts, “Article III judges” interpret laws and admissibility of facts, instruct juries, decide sentences, make monetary awards, sometimes reach verdicts themselves, and issue injunctions to halt prohibited behaviors. Thus, judges are arbiters of competition and disputes involving labor and product markets, public accommodations, schools, housing, voting rights, civil rights, and intragovernmental conflict. If correct, PDH provides a concise explanation of previous findings of correlation between the political parties of presidents and the decisions of judges they appoint (see, e.g., Kang and Shepherd 2011; Kastellec 2011; Shepherd 2009; Spitzer and Talley 2013; Sunstein et al. 2006). A proper empirical test of PDH is thus broadly important for theoretical and policy-related understanding of significant social, political, economic, and legal issues.
Previous studies have used judicial career data to consider PDH (for a review through 2010, see Stolzenberg and Lindgren 2010; see also Bailey and Yoon 2011; Barrow and Zuk 1990; Choi, Gulati, and Posner 2013; Hansford, Savchak, and Songer 2010; Nixon and Haskin 2000; Spriggs and Wahlbeck 1995; Van Tassel 1993; Yoon 2006, 2017; Zorn and Van Winkle 2000). Those studies focus on subsets that constitute a minority of Article III judges (usually Supreme Court justices; occasionally Circuit Courts of Appeals judges), and therefore heighten the current value of testing PDH in the entire Article III judiciary.
New PDH tests are also motivated by labor force studies that suggest the need to control for health, family circumstances, work attitudes, long-term career plans, and other career characteristics that are difficult to ascertain for living judges and simply unavailable for most or all of the dead (on the importance of these measures in the general population, see Munnell, Sanzenbacher, and Rutledge 2018 and Stolzenberg 1988; regarding difficulties in obtaining such information from judges, see Greenhouse 1984; on the relationship between retirement and mortality risk of judges, see Stolzenberg 2011). These data and analysis difficulties are analogous to problems found in studies of class size effects on school learning and minimum wage legislation effects on demand for labor (Angrist and Lavy 1999; Card and Krueger 1994). We propose that the same methods used to address those problems in school and employment data, such as regression discontinuity and sharp regression discontinuity (SRD), can be applied to tests of PDH. Application of SRD to PDH is novel, but SRD is both old and now widely applied to social science data (see, e.g., Cunningham 2021; Holland 1986; Morgan and Winship 2014; Thistlethwaite and Campbell 1960; Wasserman 2003:251).
In short, this article describes SRD tests of PDH for judges who were appointed under Article III of the U.S. Constitution, and who terminated full-time judicial service from 1919, when employment terms of these judges first approximated their current form, to 2018, when we began this research.
Previous Research
Stolzenberg and Lindgren (2010: Table 1) list and briefly describe some 20 previous analyses of departures from the Supreme Court of the United States (hereafter, SCOTUS). Some of these studies examine only the statistical distribution of SCOTUS vacancies (Callen and Leidecker 1971; Ulmer 1982; Wallis 1936). Other judicial career research is not probative of PDH. For example, King (1987) and Hagle (1993) combine death-in-office with retirement, resignation, and partial retirement (“senior status”), although death-in-office is an involuntary biological consequence of failure to leave the bench before death, whereas senior status accession, resignation, and retirement are voluntary actions reserved for the living.
Table 1. Symbols and Definitions
Labor force–wide studies find that the probability of voluntary employment termination varies inversely with workers’ health or “vitality” (Bound 1991; Dwyer and Mitchell 1999; French 2005; Parsons 1982). Virtually all previous historical narrative studies of SCOTUS voluntary terminations consider the retirement effects of declining vitality (Goff 1960; Schmidhauser 1962), or, in Garrow’s (2000) sensational wording, “decrepitude.” In statistical analyses, Squire (1988) includes a measure of poor health, which is criticized by Hagle (1993:35) and Zorn and Van Winkle (2000:162). For dead justices, Stolzenberg and Lindgren (2010) use years-left-to-live at a time before death to indicate health at that time. However, remaining lifetime is more reliable for measuring population average health than individual health. Zorn and Van Winkle (2000) use justices’ written opinion production to measure physical health, but the many determinants of productivity raise questions about the validity of this measure (see Green and Baker 1991). Finally, we suggest judges’ career and employment decisions may be less affected by actual health and future longevity than by their unobservable perceptions of those things. Sick judges may refuse retirement if they think themselves healthy; healthy judges may be more likely to retire if they think themselves ill. Moreover, Hagle (1993:46) asserts that SCOTUS justices are flagrantly dishonest and willfully misleading about their health. Thus, controlling for health in judicial career studies requires methods that do not rely on direct health measurement or candid self-reporting by judges.
Conceptually, differences between ideology and party are stark, because parties are organizations of people, and ideologies are complexes of values, attitudes, ideas, and perceptions (see Martin 2015). Conceptual differences notwithstanding, empirical observations of ideologies and party affiliations of individuals can be correlated empirically, even to the point that effects of one are difficult or impossible to distinguish from effects of the other. In the general population, party identification and ideology of individuals are regularly measured by survey questions. For judges, party identification is conveniently defined and observed as the party of the president who first appointed them to the Article III bench. But judicial custom and ethics make measurement of ideology more involved. Pinello (1999:219) reviews and exhaustively meta-analyzes 84 prior studies, then concludes, “party is a dependable yardstick for ideology.” Thus, Pinello implies that ideology and reciprocity versions of PDH are empirically indistinguishable, even if conceptually dissimilar.
Judicial ideology measurement has grown considerably since Pinello (1999). Martin and Quinn (2002) (hereafter MQ) show that ideology can be measured without reference to party, by Item Response Theory (IRT) scaling of SCOTUS justices’ votes in court decisions. In a computational tour de force, MQ calculate annual ideal point IRT ideology scores for SCOTUS justices, starting in 1937, 3 based on voting in case decisions. Whatever their advantages, MQ methods cannot be applied to district court judges who do not vote on panels, as do SCOTUS and appellate court judges. Judicial Common Space (JCS) scales combine MQ scores with other data for SCOTUS justices. For Circuit Courts of Appeals judges, JCS confounds party and ideology, which JCS infers from the political parties of the appointing president and senators from a judge’s home state. In a novel, indirect measurement strategy applied to judges of all Article III courts, Bonica and colleagues (2019) use political donations of money by law clerks of Article III judges to indicate political ideologies of the judges for whom they work. 4
In short, techniques for measuring judicial ideology have developed considerably since Pinello’s analysis, reducing confidence in the current validity of his 1999 claim that empirical measures of party identity and judicial ideology are generally indistinguishable. To update Pinello’s analyses, we examine the empirical congruence of party and ideology measures by principal components factor analysis of data on the 31 SCOTUS justices who served at any time from 1960 to 2018. We focus on SCOTUS justices because they are the only judges for whom there exist ideology measures that are not at least partially based on party identity (i.e., MQ scores). We focus on 1960 to 2018 to include other ideology scores that are available only after 1960. We end observations in 2018, because that is the year we began the research reported here. Factor analyzed variables include lifetime averages of Bailey, MQ, and JCS scores, plus Rep (= 1 for justices first appointed to the Article III judiciary by a Republican president; = 0 else). Data and analysis details are given in the Appendix.
Using data just described, principal components factor analysis finds only one factor with an eigenvalue greater than 1, and it explains 99.37 percent of the variance among Rep, Bailey, MQ, and JCS scores. Factor loadings all exceed .65 and average .90. Although the small N for the analysis, and its restriction to SCOTUS justices from 1960 to 2018, suggest restraint, findings are bolstered by consistency with Pinello’s summary of previous studies. Thus, despite new methods and resurgent interest in distinguishing ideology effects on PDH from party identity effects, the factor analysis suggests that for SCOTUS justices, party identity and the ideology scales analyzed here are all indicators of the same underlying factor.5,6
For the present purpose of testing PDH in the entire Article III judiciary, the implications of past research and the factor analysis results just presented can be summarized briefly in three points. First, testing PDH in the entire Article III judiciary remains important; it is the focus of the present analyses because a disproportionate share of prior PDH research focuses on SCOTUS justices, who are a small segment of the Article III judiciary. Second, as a practical matter, PDH effects of ideology are not distinguishable from PDH effects of party identity, except, possibly, for some judges of some courts. Nonetheless, even without an empirical distinction between ideology and party identity effects, politicized departure remains an important hypothesis of long-standing importance. Third, general labor force retirement studies suggest judges’ unobservable personal characteristics and circumstances affect their ability to adjust the timing of their retirements and resignations from full-time judicial service.
Analytic Strategy
We restate the PDH as follows: When judges are ready to end their full-time federal judicial service, those who were first appointed by a Republican president are more likely to end full-time service when the incumbent president is a Republican than when the president is a Democrat, all else equal. Similarly, when Democratic appointees decide to end their full-time judicial service, they are more likely to do so when the incumbent president is a Democrat, all else equal.
We test these hypotheses by selecting pairs of time periods in which all determinants of termination probability, except the political party of the incumbent president, may be regarded as identical, or nearly so, for every judge who terminates full-time service in either period. If pre-election and post-inauguration periods are adjacent and sufficiently short, judges’ attitudes, values, health, family characteristics, finances, and other retirement-related characteristics can be considered the same in both periods, leaving the political party of the sitting president as the only retirement-related characteristic that changes with the inauguration of a new president. Consequently, any difference between the termination probability after inauguration and the probability before the election is attributed to the change in presidential party.
Circumstances just described occur naturally but irregularly, shortly before “regime-changing” elections (here defined as elections and inaugurations that replace Democratic presidents with Republicans, or vice versa) and shortly after the inaugurations that follow them. For example, consider the 270 days (about nine months) before the presidential election of 2008 and the equal period after the inauguration in 2009. The 2008 pre-election president was Republican; the 2009 post-inauguration president was Democratic. We assume retirement-related characteristics of judges do not differ meaningfully between adjacent pre-election and post-inauguration periods. If this assumption is tenable, then the average treatment effect of a Democratic president on departures from full-time judicial service of Democratically-appointed judges is the difference between the proportion of Democratically-appointed judges who retire in the 2009 post-inauguration period and the proportion of Democratically-appointed judges who retire in the 2008 pre-election period. PDH can be expressed as a positive after–before difference in the number of terminations, a positive after–before difference in the rate of terminations, an after–before ratio greater than one, or an after–before odds-ratio greater than one, depending on statistical preferences. 7
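The four ways of stating the hypothesis can be written side by side. The block below is an illustrative sketch rather than notation taken from the analysis itself: n_b and n_a are hypothetical symbols for the numbers of trigger actions by treated judges in the pre-election and post-inauguration periods, and N is the number of treated judges under observation, assumed to be the same in both periods.

```latex
\begin{align*}
\text{difference in counts:} \quad & n_a - n_b > 0 \\
\text{difference in rates:}  \quad & \frac{n_a}{N} - \frac{n_b}{N} > 0 \\
\text{ratio of rates:}       \quad & \frac{n_a / N}{n_b / N} > 1 \\
\text{odds ratio:}           \quad & \frac{n_a / (N - n_a)}{n_b / (N - n_b)} > 1
\end{align*}
```

Under these assumptions (a common N, n_b > 0, and n_a < N), the four conditions hold or fail together, because each reduces to n_a > n_b. They differ in the scale on which the effect is reported, not in its direction.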
Regime-changing elections and inaugurations occur 11 times from 1920 to 2017 (i.e., elections of 1920, 1932, 1952, 1960, 1968, 1976, 1980, 1992, 2000, 2008, and 2016). By starting these analyses in 1920, we evade statistical consequences of a small judiciary in earlier years (the entire Article III judiciary does not exceed 200 active-duty judges consistently until 1919), and we avoid problems of comparing terminations of full-time judicial service before and after the 1919 modifications of Article III judicial employment regulations, which created the option of senior service for long-serving, sub-SCOTUS judges. Senior service accession facilitates terminations from full-time service by permitting judges a reduced caseload, or no cases at all, without loss of honorific status, income, or other perquisites.
As an additional control for confounding and spuriousness due to unobserved variables, we also calculate the same after–before voluntary termination probability difference for judges first appointed by a president of the same party as the recent presidential election loser, and subtract it from the difference obtained from judges appointed by presidents of the same party as the election winner. This is the difference-in-differences (hereafter, DiD) statistic. Again, depending on statistical preferences, DiD can be expressed as a difference between rates, a ratio, or an odds ratio. PDH predicts a positive value for DiD if it is based on differences between rates, or a value greater than unity if it is based on ratios or odds ratios.
To observe and control effects of historical peculiarities, such as time elapsed between regime-changing elections or the political balance of the Senate, we replicate analyses at each of the 11 regime-changing presidential elections from 1920 to 2016. For example, Eisenhower’s 1952 election was the first regime-changing election after 1932. Perhaps World War II, the Great Depression, or the unusually long, 20-year interval between these regime changes altered career dynamics for politically-influenced federal judges during F. D. Roosevelt’s presidential tenure. Similarly, to distinguish political party effects from PDH effects, we stratify analyses by the party of the winner of the regime-changing election—six Republican and five Democratic regime-changing victories from 1920 to 2016.
We perform all analyses separately for pre-election and post-inauguration enumeration periods of 180, 270, 365, 547, and 730 days, or approximately 6, 9, 12, 18, and 24 months before the regime-changing election, and after the subsequent inauguration. 8 Thus, we stratify analyses by length of the pre-election and post-inauguration enumeration periods, to determine if the treatment effect is strongest at the beginnings of presidential terms in office, when incumbent presidents tend to be most popular, have their greatest Senate support, and have the maximum time available to negotiate Senate confirmation of nominees.
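As a minimal sketch of how such enumeration periods might be laid out on the calendar, the Python below builds the pre-election and post-inauguration windows for a given length. The function name and the boundary convention are assumptions made for illustration: the pre-election window is taken to end the day before election day, and the post-inauguration window to begin on inauguration day. The text does not specify how boundary days are handled.

```python
from datetime import date, timedelta

# Enumeration period lengths in days, as listed in the text.
WINDOW_LENGTHS = (180, 270, 365, 547, 730)


def enumeration_windows(election_day, inauguration_day, length_days):
    """Return (pre_start, pre_end, post_start, post_end), all inclusive.

    Assumed convention (not stated in the text): the pre-election window
    covers the `length_days` days ending the day before the election; the
    post-inauguration window covers `length_days` days starting on
    inauguration day.
    """
    span = timedelta(days=length_days)
    one_day = timedelta(days=1)
    pre_start = election_day - span
    pre_end = election_day - one_day
    post_start = inauguration_day
    post_end = inauguration_day + span - one_day
    return pre_start, pre_end, post_start, post_end


# Example: the 2008 election (November 4, 2008) and the 2009 inauguration
# (January 20, 2009), for every window length.
for days in WINDOW_LENGTHS:
    print(days, enumeration_windows(date(2008, 11, 4), date(2009, 1, 20), days))
```

Under this convention the two windows always have equal length, and the interval between election and inauguration belongs to neither.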
Finally, we emphasize that the hypothesized PDH effect on judicial full-time service departures is probabilistic and incomplete (thus neither necessary nor sufficient). For example, judges’ voluntary terminations from full-time judicial employment may coincide randomly with White House occupancy by presidents of the same party as the presidents who first appointed them to the federal bench, or fail to coincide despite efforts by judges to arrange the contrary. Also, judges’ desires to comply with norms of reciprocity and enduring ideology may be overwhelmed by their inaccurate predictions of future presidential election outcomes, or by unexpected personal exigencies. As Justice Ginsburg illustrates, inaccurate election predictions and personal exigencies can defeat intentions for politicized departure, thus reducing the number of politicized departures, biasing Diff and other measures downward, and thereby making tests of PDH more stringent than their significance levels imply. But good luck and accurate predictions neither compel nor motivate politicized departure, and so do not undermine PDH tests used here.
Research Design and Data
The process just described appears to be a previously unnoticed, naturally occurring example of the sharp regression discontinuity (SRD) research design, with 11 replications (Cattaneo and Vazquez-Bare 2016; Imbens and Lemieux 2008; Lee and Lemieux 2010; Thistlethwaite and Campbell 1960). The hallmark of SRD is abrupt, exogenous change in the state or value of a treatment. 9 We now describe the design of this research in the language of experimentation, focusing on subjects, outcomes, and treatments.
Subjects
The units of analysis (the subjects) in analyses presented here are persons who were employed full-time as Article III federal judges for at least 730 days (about two years) prior to a regime-changing presidential election between 1920 and 2016. 10 For brevity, we call retirements, resignations, and accessions to senior status “trigger actions,” because they trigger new presidential nominations to the bench. Requiring prior service of at least 730 days excludes judges who hold only a recent posting to a new job rather than a minimal claim to a federal judicial career. We also require that judges survive at least a year after the inauguration, which avoids the need to distinguish judges who take a trigger action in that period from those who might have done so, had they endured. Judges are excluded from analysis if they leave office involuntarily due to death, abolition of their appointed court, or Congressional impeachment and conviction.
Treatment
Treatment occurs during enumeration periods shortly before regime-changing elections, and shortly after inaugurations that follow them. For each judge, treatment consists of changing the party of the incumbent president from “different from” to “the same as” the political party of the president who first appointed them to the federal judiciary. Characteristics of judges are assumed to not change meaningfully from the start of the pre-election enumeration period to the end of the next post-inauguration period. These characteristics include judges’ perceptions of their own health, personal finances, job satisfaction, and desire to retire.
Outcomes
For any of the 11 regime-changing elections considered here, three outcomes are possible: judges can take no trigger action; they can take a trigger action in the pre-election period; or they can take a trigger action in the post-inauguration period.
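The three outcomes can be illustrated with a short classification routine. This is a sketch under stated assumptions, not the coding procedure used in the analyses: the function name, the outcome labels, and the window boundaries in the usage lines are hypothetical, and a trigger action that falls outside both enumeration periods (for example, between election and inauguration) is assumed here to count as no trigger action for that election.

```python
from datetime import date
from typing import Optional


def classify_outcome(trigger_date: Optional[date],
                     pre_start: date, pre_end: date,
                     post_start: date, post_end: date) -> str:
    """Assign one of three outcomes for a single regime-changing election.

    `trigger_date` is the date of retirement, resignation, or accession to
    senior status, or None if the judge took no trigger action.
    All window bounds are treated as inclusive (an assumption).
    """
    if trigger_date is None:
        return "no_trigger"
    if pre_start <= trigger_date <= pre_end:
        return "pre_election"
    if post_start <= trigger_date <= post_end:
        return "post_inauguration"
    # Assumption: dates outside both windows are not counted in either period.
    return "no_trigger"


# Illustrative 270-day windows around the 2008 election and 2009 inauguration.
windows = (date(2008, 2, 8), date(2008, 11, 3),
           date(2009, 1, 20), date(2009, 10, 16))
print(classify_outcome(date(2008, 6, 1), *windows))
print(classify_outcome(date(2009, 3, 15), *windows))
print(classify_outcome(None, *windows))
```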
Effect Measures
PDH predicts that, if treated judges terminate full-time service about the time of a regime-changing election, they are more likely to do so post-inauguration than pre-election. Thus, for any particular regime-changing election, the treatment effect is the difference between the number of treated judges who terminate full-time service in the post-inauguration period and the number of treated judges who terminate full-time service in the pre-election period. Growth of the federal judiciary from 1920 to 2018 would affect these numbers, so results are expressed as proportions, odds, and odds ratios, per common statistical practice (Agresti 1990). Counts and proportions can be recovered from n’s, odds, and odds ratios.
An Example
Figure 1 provides a schematic diagram of the analysis design, the hypotheses it tests, and treatment effect measures for a single election–inauguration (2008 to 2009; won by the Democratic candidate) and enumeration periods of 270 days before the election and after inauguration. Symbols and terms are defined in Table 1.

Figure 1. Simplified Nonparametric Regression Discontinuity Design for Analyses of Judicial Trigger Actions, When Republican President Is Incumbent before Election and Democrat Is Inaugurated after Election
Row labels on the left side of Figure 1 distinguish untreated (Republican) appointees in the bottom row from treated (Democratic) appointees above them. Across the top, column labels distinguish pre-election periods on the left from post-inauguration periods to the right. Judges in Group A were first appointed by Democratic presidents. After the election, those Democratic-appointee judges appear in “Group B.” Hypothesis 1 asserts that the number of trigger actions by judges in Group B after the election (∑dYaRD) exceeds the number of trigger actions by those same judges before the election (∑dYbRD) when they constitute Group A. Without loss of information, the numbers of triggers in Group A and Group B can be divided by the number of Democratic-appointee judges Nd to obtain proportions, and the hypothesis becomes HA: (∑dYaRD/Nd) – (∑dYbRD/Nd) > 0. Re-scaling proportions to odds and comparing them by division instead of subtraction yields the odds ratio, Diff = dOaRD/dObRD.
We also compute difference-in-differences (DiD), which is the ratio of Diff for judges appointed by presidents of the same party as the winner of the most recent presidential election to the same ratio for judges appointed by presidents of the same party as the loser of the most recent presidential election. DiD controls for the possibility that some unrecognized agent appeared in the form of a secular trend or a random shock to increase trigger actions after inauguration by all judges, regardless of the party of the president who first appointed them to the federal bench.
We also consider a measure we call Directional Diff in Diff (hereafter, DDD), which compares Diff to the end-of-term odds-ratio measure of the effect of the pre-election president’s political party on terminations by judges first appointed by presidents of that party. DDD is useful in addressing the secondary hypothesis that political influence on trigger-action timing declines as the presidential term of office approaches expiration.
For the 2008 election and 2009 inauguration shown in Figure 1, there are 755 judges appointed by Republican presidents and 499 appointed by Democrats. In the 270 days preceding the 2008 presidential election, 13 Republican appointees and 12 Democratic appointees took trigger actions. In the 270 days following the 2009 inauguration, 15 Republican appointees and 26 Democratic appointees took trigger actions. Odds and odds ratios are computed with the usual continuity correction of .5 (Agresti 1990:68), yielding the following results:
The odds ratio, Diff, equals 2.18, indicating that, as the political influence hypothesis predicts, the odds that Democratic appointees take a trigger action in the post-inauguration period are more than twice the odds they do so in the pre-election period.
The value of DiD, the ratio of Diff for Democratic appointees to the same odds ratio for Republican-appointed judges in the same period, is 1.90, indicating that, even if a secular trend or aberrant influence increased post-inauguration departures from full-time judging, the increased odds ratio for Democratic appointees predicted by the political influence hypothesis remains 1.90 times the size of the odds ratio for Republican appointees.
The value of 2.51 for DDD indicates that the boost in odds of trigger actions by Democratic appointees during the first 270 days of this regime-changing Democratic presidency is about two and one half times as large as the disparity between Republican appointee odds of trigger action during the last 270 days before the election, when the president was Republican. This result for DDD is consistent with the hypothesis that political influence effects decline as the end of the presidential term approaches.
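The arithmetic behind these three values can be checked from the counts reported for the 2008 to 2009 example. The Python below is a sketch under two assumptions that are not spelled out in the text: the continuity correction adds .5 to both the number of trigger actions and the number of judges who take none, and DDD divides Diff by the before-to-after odds ratio for appointees of the outgoing president's party. Both readings were chosen because they reproduce the reported figures; the function names are hypothetical.

```python
def corrected_odds(events, n, correction=0.5):
    """Odds of a trigger action, with .5 added to each cell (assumed form)."""
    return (events + correction) / (n - events + correction)


def effect_measures(win_pre, win_post, n_win, lose_pre, lose_post, n_lose):
    """Return (Diff, DiD, DDD) as odds ratios.

    'win' refers to appointees of the election winner's party (treated);
    'lose' refers to appointees of the election loser's party (untreated).
    """
    # Diff: after/before odds ratio for the winner's appointees.
    diff = corrected_odds(win_post, n_win) / corrected_odds(win_pre, n_win)
    # Same after/before odds ratio for the loser's appointees.
    lose_after_before = (corrected_odds(lose_post, n_lose)
                         / corrected_odds(lose_pre, n_lose))
    # DiD: ratio of the two after/before odds ratios.
    did = diff / lose_after_before
    # DDD (assumed reading): Diff relative to the loser's appointees'
    # before/after odds ratio, i.e. the end-of-term same-party effect.
    ddd = diff / (1.0 / lose_after_before)
    return diff, did, ddd


# Counts from the text: 499 Democratic appointees (12 pre, 26 post) and
# 755 Republican appointees (13 pre, 15 post), 270-day periods.
diff, did, ddd = effect_measures(12, 26, 499, 13, 15, 755)
print(f"Diff = {diff:.2f}, DiD = {did:.2f}, DDD = {ddd:.2f}")
# prints: Diff = 2.18, DiD = 1.90, DDD = 2.51
```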
Identification of effect measures in these analyses is explicated formally by Hahn, Todd, and Van der Klaauw (2001) (see also Cattaneo and Escanciano 2017; Cattaneo, Titiunik, and Vazquez-Bare 2017; Imbens and Lemieux 2008; Lee and Lemieux 2010). Informally, identification is apparent from several design features of this research. First, there is no self-selection for treatment: assignment to control and treatment groups is determined by the outcome of a presidential election, and therefore beyond control by any individual judge. 11 Second, temporal ordering and close conjunction of treatment and outcome are ensured by strictly-defined periods in which the outcome is measured and the treatment is either entirely present or completely absent. Even if subjects’ unspecified characteristics affect outcomes, their effects are cancelled by division in calculation of Diff. And, third, effects are measured by comparisons of treated individuals to themselves when not treated, thereby permitting the assumption that unobserved characteristics of treated and untreated subjects do not differ. Formally, this last comparison is stratification on retirement/resignation/accession to senior status (retirement): everyone in the analysis is leaving full-time judging during an interval that straddles an election and inauguration. The estimand of interest compares the probability of actual departure during the term of the incoming president with the probability of actual departure during the term of the outgoing president. As described by Frangakis and Rubin (2002), this stratification on retirement renders retirement invariant in the analyses and therefore without effect on the estimand (Diff), obviating any need to specify an instrument for retirement. For a comparison to instrumental variables estimation, see Angrist, Imbens, and Rubin (1996).
Replication and Stratification
We apply the regression discontinuity design method just explicated to federal judicial trigger actions immediately before and after each of the 11 regime-changing presidential elections from 1920 to 2016, using data from 1919 through 2018. Because six of these regime-changing elections were won by Republicans, and five were won by Democrats, the replication also stratifies the analysis by the party of the presidential election winner.
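The count of regime-changing elections can be checked against the public record of winners' parties. The sketch below is illustrative only and is not the authors' code: the `WINNER_PARTY` table and the function name `regime_changing` are introduced here for the example, and the 1916 entry is included solely to supply the incumbent party going into the 1920 election.

```python
# Party of the presidential election winner, 1916-2016 ("D" or "R").
# 1916 is included only to establish the incumbent party going into 1920.
WINNER_PARTY = {
    1916: "D", 1920: "R", 1924: "R", 1928: "R", 1932: "D", 1936: "D",
    1940: "D", 1944: "D", 1948: "D", 1952: "R", 1956: "R", 1960: "D",
    1964: "D", 1968: "R", 1972: "R", 1976: "D", 1980: "R", 1984: "R",
    1988: "R", 1992: "D", 1996: "D", 2000: "R", 2004: "R", 2008: "D",
    2012: "D", 2016: "R",
}

def regime_changing(winners, first_year=1920):
    """Return {year: winning party} for elections that changed the president's party.

    Assumption: the incumbent party at each election is the party that won
    four years earlier (mid-term successions in this span kept the same party).
    """
    changes = {}
    for year in sorted(winners):
        if year < first_year:
            continue
        if winners[year] != winners[year - 4]:
            changes[year] = winners[year]
    return changes

changes = regime_changing(WINNER_PARTY)
republican_wins = [y for y, p in changes.items() if p == "R"]
democratic_wins = [y for y, p in changes.items() if p == "D"]
print(sorted(changes))
print(len(changes), len(republican_wins), len(democratic_wins))  # 11 6 5
```

Under these assumptions the sketch yields 11 regime-changing elections, six won by Republicans and five by Democrats, matching the stratification described above.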
Significance Tests
We perform separate, disjoint tests of PDH, one for each regime-changing presidential election from 1920 to 2018. Absent any PDH effect, and other things equal, probabilities of retirement before and after the election would be equal, so that dYaRD = dYbRD. Following Agresti (1990:352), the null hypothesis of no presidential party effect on voluntary terminations is
H0: Pr(dYaRD – dYbRD > 0) = .5,
and dYaRD – dYbRD > 0 is distributed as Bernoulli (binomial) trials with p = .5 and n = 11. The probability of 8 or more successes is .113, which is the test significance level. For 9, 10, or 11 successes, significance levels are .033, .006, and .0005, respectively. In six analyses of Republican appointees, probabilities of five or more, or four or more, successes are .109 and .344, respectively. For n = 5 analyses of Democratic appointees, the probability of four or more successes is .188, and the probability of three or more is .500. These tests do not address compound null hypotheses.
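The significance levels quoted above are upper-tail binomial probabilities with p = .5. A minimal sketch of the calculation follows; the helper name `upper_tail` is ours and is used only for illustration.

```python
from math import comb

def upper_tail(successes, n, p=0.5):
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))

# (n, thresholds) pairs mirror the cases discussed in the text.
for n, thresholds in [(11, (8, 9, 10, 11)), (6, (5, 4)), (5, (4, 3))]:
    for k in thresholds:
        print(f"n={n:2d}, successes >= {k:2d}: {upper_tail(k, n):.4f}")
```

Rounded to three decimals, these tail probabilities agree with the values reported in the text (for example, .113 for 8 or more successes in 11 trials and .006 for 10 or more).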
Data
Primary data examined here were produced by extensive checks, corrections, and re-codes of data downloaded from the Federal Judicial Center (n.d.a.) on April 28, 2018. Most corrections are based on consistency checking and comparison with records and online biographies from the Federal Judicial Center (n.d.b.) and Abraham (1999), resulting in a file of 86,316 judge-year records for all 3,516 individuals who were nominated by presidents to Article III judicial positions, confirmed by the Senate, and commissioned in office, from 1789 to April 2018.
Results
Results for Diff in Nine-Month Observation Periods
Table 2 reports values of Diff in column (3) for analyses in which pre-election and post-inauguration periods are both 270 days, for all regime-changing elections from 1920 through 2016. Per column (3), Diff exceeds one in 10 of 11 analyses, which is consistent with the first hypothesis at a significance level of .006. Consistent with PDH, the mean of Diff is 3.23: on average, the odds of a trigger action in the post-inaugural period are 3.23 times the odds of a trigger action in the immediately preceding pre-election period.
Table 2. Analyses of Trigger Actions 270 Days before 11 Regime-Changing Presidential Elections and 270 Days after Subsequent Inaugurations
Source: Computed by authors from data downloaded from the Federal Judicial Center (n.d.a.) on April 28, 2018, and subsequently corrected by authors.
Note: In columns (4), (6), and (8), “+” indicates the odds ratio is greater than 1; a blank indicates the relevant odds ratio does not exceed 1. In columns (3), (5), and (7), winner’s odds are the odds of a trigger action by judges first appointed by a president of the same party as the presidential election winner.
Figure 2 plots Diff for 270-day enumeration periods, from 1919 to 2018, with a line fitted by Cleveland’s (1979) “robust locally weighted regression” method. The main finding, seen in the solid line in Figure 2 and in column (3) of Table 2, is that temporal variation in Diff largely reflects atypically large values at the elections of 1952 and 1960, while the overall pattern remains consistent with PDH.

Figure 2. Diff, DiD, and DDD, by Election Year, for 270-Day Enumeration Periods, with Values Smoothed by Cleveland’s Robust Locally Weighted Regression (Bandwidth = .9)
DiD Results for Nine-Month Observation Periods
Consistent with PDH, column (5) of Table 2 shows the mean of DiD as 4.19. On average, Diff is 4.19 times as large for judges first appointed by presidents of the same party as the newly inaugurated president (concordant party judges) as for those first appointed by presidents of the other party (discordant party judges). Also consistent with PDH, DiD exceeds one in 10 of 11 analyses, for a binomial test significance level of .006.
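To make the relationship between Diff and DiD concrete, the following sketch computes both from counts for a single election. It follows the verbal descriptions in this section (Diff as the ratio of post-inauguration odds to pre-election odds, DiD as the ratio of Diff for concordant judges to Diff for discordant judges). The counts are hypothetical and are not taken from Table 2, and the use of simple "at risk" denominators is an assumption made for illustration rather than the authors' documented procedure.

```python
def odds(events, at_risk):
    """Odds of a trigger action: p / (1 - p), with p = events / at_risk."""
    p = events / at_risk
    return p / (1 - p)

def diff(pre_events, pre_at_risk, post_events, post_at_risk):
    """Post-inauguration odds divided by pre-election odds."""
    return odds(post_events, post_at_risk) / odds(pre_events, pre_at_risk)

# Hypothetical counts for one election (NOT from Table 2).
concordant = dict(pre_events=5, pre_at_risk=200, post_events=15, post_at_risk=195)
discordant = dict(pre_events=10, pre_at_risk=220, post_events=8, post_at_risk=210)

diff_concordant = diff(**concordant)   # same party as the election winner
diff_discordant = diff(**discordant)   # other party
did = diff_concordant / diff_discordant

print(round(diff_concordant, 2), round(diff_discordant, 2), round(did, 2))
```

With these invented counts, Diff exceeds one for concordant judges and falls below one for discordant judges, so DiD exceeds one, which is the pattern PDH predicts.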
DDD Results for Nine-Month Observation Periods
Directional Diff in Diff (DDD) compares beginning-of-presidential term PDH effects to end-of-term PDH effects. The mean of DDD in column (7) of Table 2 is 3.30, indicating that the effect of party concordance is more than three times as large at the start of a president’s term as at the end. DDD exceeds unity in 8 of 11 election–inauguration sequences, with a significance level of .113.
Party Differences
Rows 2 and 3 of Table 3 compare values of Diff, DiD, and DDD for all 11 regime-changing presidential election–inauguration sequences from 1920 to 2018, separately for the six elections won by Republicans and the five elections won by Democrats. At every observation period length, Diff is larger, on average, when Republicans win than when Democrats win. Indeed, for 14 of these 15 comparisons of row (2) to row (3) of Table 3, the average value of Diff, DiD, or DDD obtained under Republican presidents exceeds the average value obtained under Democrats. These results are consistent with the claim that exit timing of Republican appointees is more influenced by the political party of the newly elected president than exit timing of Democratic appointees. We did not hypothesize party differences, and we know of no previously published hypotheses of party differences in PDH effects, so we only note them, and wait for future research to properly test for and explain their existence.
Table 3. Results of 55 Election- and Inauguration-Specific Analyses of Trigger Actions, by Parties of Appointing President and Election Winner
Note: For 2016 election, 547- and 730-day post-inauguration enumeration periods are truncated to 536 days.
Enumeration Period Length Effects
Results presented so far pertain to 270-day observation periods (about nine months) before regime-changing elections and after regime-changing inaugurations. Row 1 of Table 3 summarizes results for periods of 180, 270, 365, 547, and 730 days—about 6, 9, 12, 18, and 24 months—for Diff, DiD, and DDD. 12 As observation periods lengthen, Table 3 shows that average values of Diff and DDD decline strictly monotonically. DiD declines similarly, although its value in one-year observation periods is larger than for the nine-month periods. These patterns are consistent with the assertion that judges who wish to leave full-time service honor principles of enduring ideology, party reciprocity, or both, but only up to a point. For some, that point seems to be based on the time they must linger in full-time jobs they wish to leave, although others cling to their posts as long as they live.
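The enumeration periods of different lengths can be pictured as paired date windows around each election and inauguration. The sketch below is an assumption-laden illustration, not the authors' procedure: the function name `enumeration_windows` is hypothetical, and the half-open endpoint convention (whether election day and inauguration day themselves are counted) is our choice, since this section does not specify it.

```python
from datetime import date, timedelta

PERIOD_LENGTHS = (180, 270, 365, 547, 730)  # days, as listed in the text

def enumeration_windows(election_day, inauguration_day, length_days):
    """Return (pre_start, pre_end, post_start, post_end).

    Assumed convention: the pre-election window is [election - L, election)
    and the post-inauguration window is [inauguration, inauguration + L).
    """
    span = timedelta(days=length_days)
    return (election_day - span, election_day,
            inauguration_day, inauguration_day + span)

# Example: the 2008 election (4 Nov 2008) and inauguration (20 Jan 2009).
for length in PERIOD_LENGTHS:
    pre_start, pre_end, post_start, post_end = enumeration_windows(
        date(2008, 11, 4), date(2009, 1, 20), length)
    print(length, pre_start, pre_end, post_start, post_end)
```

The interval between election and inauguration falls in neither window, which keeps the pre-election period entirely under the outgoing president and the post-inauguration period entirely under the incoming one.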
Amalgamated Results
Table 4 retabulates voluntary terminations in 270-day enumeration periods, by concordance of the party of the presidential election winner with the party of the appointing president, for all 11 regime-changing election–inauguration periods from 1920 to 2017.
Table 4. Consolidated Counts of Trigger Actions, 1920 to 2018, by Concordance of Party of Appointing President and Party of Election Winner, 270 Days before and after 11 Regime-Changing Elections
Note: The number of judges varies over the years 1920 to 2017. See text for explanation of use of counts in this table rather than odds, probabilities, and proportions.
Although not a proper test of PDH, Table 4 is consistent with it: 225 judges appointed by presidents of the same party as the recently elected president resigned or took senior status in these enumeration periods, triggering new presidential appointments. Of these, 36.0 percent did so in the pre-election period, and, consistent with PDH, 1.8 times as many (64.0 percent) did so in the post-inauguration period, a difference of 28.0 percentage points. For judges appointed by presidents of the election-losing party, the corresponding difference is −7.6 percentage points, and the difference between these differences is 35.6 percentage points, which is consistent with PDH.
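The arithmetic behind these comparisons can be retraced directly from the percentages quoted in the text. The variable names below are ours; no figures beyond those stated above are used.

```python
# Shares of concordant-party judges' trigger actions, as reported in the text.
pre_share, post_share = 36.0, 64.0

concordant_gap = post_share - pre_share        # difference, in percentage points
discordant_gap = -7.6                          # reported for election-losing party
gap_of_gaps = concordant_gap - discordant_gap  # difference between differences
ratio = post_share / pre_share                 # "1.8 times as many"

print(concordant_gap, round(gap_of_gaps, 1), round(ratio, 1))  # 28.0 35.6 1.8
```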
Party Control of the Senate
Are results affected by variation in party control of the Senate? In short, no: we find no covariation between party control of the Senate and consistency of results with PDH, so the results provide no evidence that party control of the Senate explains support for PDH. In particular, the party of the winning presidential candidate also held post-inaugural control of the Senate after all regime-changing elections from 1920 to 2016, except after the 1968 election (U.S. Senate n.d.). In Table 2, the values of Diff, DiD, and DDD for 1968 all exceed one and are consistent with PDH. In 2000, Republicans took the presidency, but on June 6, 2001, they lost control of the Senate (U.S. Senate n.d.). Nonetheless, Table 2 shows that values of Diff, DiD, and DDD for that election–inauguration cycle are as predicted by PDH.
Discussion and Conclusions
This article considers the politicized departure hypothesis, a venerable but still controversial assertion that as Article III judges approach the ends of their careers, they tend to adjust the timing of their departures so that the rights to nominate their replacements are given to presidents of the same political party as the president who first appointed them to the federal bench. Whether or not judging is biased is a question of enduring interest (Harris and Sen 2019) and previous research on politicized departure is abundant, but questions remain. To wit, prior research gives little attention to judges of Article III courts below the Circuit Courts of Appeals; and judicial ethics and custom discourage judges from providing information about their health, family circumstances, job attitudes, work satisfaction, and similar things that have been shown to affect voluntary job termination and retirement in the general population.
To escape the problems of unmeasured and unknown omitted variables, and to expand coverage to all Article III judges, we apply sharp regression discontinuity methods, with and without the difference-in-differences estimator, to the entire Article III judiciary. To apply SRD, we examine situations in which the political party of the sitting U.S. president changes abruptly over a span of time that is too short for retirement-related characteristics of judges to change much, if at all. We observe that such situations occur repeatedly, shortly before regime-changing presidential elections and shortly after the presidential inaugurations that follow. Our application of regression discontinuity methods to the politicized departure hypothesis appears to be novel, even if neither regression discontinuity nor potential outcomes methods are new (see Haavelmo 1943, 1944; Holland 1986; Holland and Rubin 1988; Sobel 2000; Thistlethwaite and Campbell 1960). As we compare periods just before regime-changing elections to periods of equal length immediately after the inaugurations that follow, we find, consistent with the politicized departure hypothesis, that Article III judges are more likely to retire when their party’s candidate has won the election and sits in the White House than immediately earlier, in the pre-election period, when the president is of the other party.
SRD, like other potential outcomes research designs, gains much of its power by a strategy that is characteristic of scientific experiments and anomalous in survey research: it focuses on times and conditions in which treatment effects are apparent—even if those circumstances are atypical—and ignores other circumstances altogether. When regime-changing presidential elections occur, the politicized departure hypothesis predicts more retirements in the post-inauguration period than in the pre-election period before it, for judges who were first appointed by presidents of the same party as the recently elected president. We report that difference as Diff, as well as a difference-in-differences (DiD) estimator, and related quantities. This SRD pseudo experiment is replicated 11 times between 1920 and 2018. For pre-election and post-inauguration observation periods of 270 days, we find values of Diff and DiD that are consistent with PDH in 10 of these 11 replications. Treating these 11 analyses as binomial trials leads to rejection of the null hypothesis of no PDH effects. Less formally, results lend credence to the PDH.
The clarity of SRD is valuable but not costless. In particular, in the 98 years from 1920 to 2018, there were 25 presidential elections, of which only 11 were regime-changing and suitable for the regression discontinuity method we apply. Similarly, the data and method used here do not allow much partitioning of judges into subsets based on organizational, demographic, or political characteristics, so little can be said about, for example, differences or nondifferences among SCOTUS justices, judges of the Circuit Courts of Appeals, and district court judges. Potential outcomes analyses specific to the SCOTUS and the Circuit Courts of Appeals would require methods more suited to small n’s than those we apply here. It appears that some invention would be needed to create those methods.
Although we did not hypothesize party differences before undertaking this research, we observe stronger average gross PDH effects for Republican appointees than for Democratic appointees. These effects and differences are gross, rather than adjusted, insofar as results for Republican and Democratic appointees are measured at different times, and therefore, perhaps under different conditions. Like any results not hypothesized in advance of their detection, these differences are harder to distinguish from statistical noise than if they were predicted a priori. Indeed, one could as easily conjure a post hoc expectation of this finding as its opposite, or a finding of no difference at all. For that reason, examination of party differences might require a different method or different data than we use here. For example, future research might consider the hypothesis that Republican presidents are more likely than Democrats to appoint party stalwarts, such as individuals who have run for public office as party candidates. Or one might hypothesize that party differences in this judicial behavior are the result of party differences in grooming and systematic persuasion after judges take office. To wit, Teles (2008) offers a model of judicial influence in which presidential nomination is a mere first step in a diffuse, ongoing, career-long, and fully institutionalized pattern of effort by ideologues and commercial interests to influence the perceptions and decisions of federal judges and the law professors who teach them, long before they reach the bench.
Finally, it seems important to revisit the fundamental question that motivates tests of PDH: Are Article III judges influenced by politics while in office? The politicized beginnings of Article III judicial careers are apparent from the nominations of these judges by the politicians who serve as elected presidents, and their confirmations by politicians who serve as elected Senators. But PDH suggests judges themselves tend to behave politically, even at the final moment of their full-time courtroom careers, without discernible incentives, financial or otherwise, long after their confirmation hearings. If it is apparent that judicial careers are politically vetted at their start, and if sharp regression discontinuity analysis of objective data indicates that judges tend to act politically at career end, then we think there is reason to believe that politics has been an active influence on many of these judges in the interim.
Appendix: Factor Analysis of Ideology and Political Measures for 31 Supreme Court Justices, 1960 to 2018
Lifetime Average Ideology Scores and Party of First Appointing President of Supreme Court Justices, 1960 to 2018
| Last Name | Bailey Score | JCS Score | MQ Score | Republican Appointee |
|---|---|---|---|---|
| Alito | 1.13 | .58 | 1.82 | 1 |
| Black | –1.61 | –.42 | –1.76 | 0 |
| Blackmun | –.07 | .07 | –.03 | 1 |
| Brennan | –1.13 | –.44 | –1.78 | 1 |
| Breyer | –.65 | –.33 | –1.23 | 0 |
| Burger | .94 | .59 | 1.89 | 1 |
| Clark | .23 | .26 | .46 | 0 |
| Douglas | –1.90 | –.71 | –4.72 | 0 |
| Fortas | –1.09 | –.37 | –1.33 | 0 |
| Frankfurter | .34 | .26 | .52 | 0 |
| Ginsburg | –.93 | –.43 | –1.73 | 0 |
| Goldberg | –.97 | –.29 | –1.08 | 0 |
| Gorsuch | 1.02 | .42 | .98 | 1 |
| Harlan | .77 | .53 | 1.62 | 1 |
| Kagan | –.72 | –.43 | –1.58 | 0 |
| Kavanaugh | .77 | .30 | .54 | 1 |
| Kennedy | .42 | .33 | .68 | 1 |
| Marshall | –1.47 | –.58 | –2.83 | 0 |
| O’Connor | .56 | .41 | 1.01 | 1 |
| Powell | .49 | .42 | .97 | 1 |
| Rehnquist | 1.37 | .68 | 2.97 | 1 |
| Roberts | .65 | .39 | .93 | 1 |
| Scalia | 1.18 | .66 | 2.51 | 1 |
| Sotomayor | –1.17 | –.61 | –2.68 | 1 |
| Souter | –.52 | –.17 | –.77 | 1 |
| Stevens | –1.02 | –.39 | –1.81 | 1 |
| Stewart | .15 | .25 | .40 | 1 |
| Thomas | 1.45 | .75 | 3.60 | 1 |
| Warren | –.85 | –.34 | –1.26 | 1 |
| White | .16 | .25 | .44 | 0 |
| Whittaker | .59 | .46 | 1.17 | 1 |
Data sources: JCS Scores: Epstein 2021. Bailey Scores: Bailey 2021. MQ Scores: University of Michigan 2021. Biographical information, dates of service, and party of first appointing president come from https://www.supremecourt.gov/about/members_text.aspx.
Acknowledgements
For useful advice and comments on earlier versions, we thank Ronald Burt, Lis Clemens, Alberto Palloni, Michael Sobel, and, especially, Donald Treiman, as well as the editors and anonymous reviewers. Responsibility for errors is the authors’ alone. Stolzenberg thanks New York University for its support as a visiting scholar in 2018–19. Statistical analyses in this article were conceived, executed, and described herein by the corresponding author.
