The Impact of United States v. Booker and Gall/Kimbrough v. United States on Sentence Severity

Abstract

In the wake of United States v. Booker and Gall/Kimbrough v. United States, sentencing researchers and legal scholars conducted research designed to identify their impact on the federal sentencing process, with a focus on determining whether the decisions increased unwarranted disparity. In this article, we extend this body of research. Using 10 years of data from the U.S. Sentencing Commission and data from other sources, we assess whether and how these decisions influenced sentence severity. Results indicate that sentence severity declined following Booker and, especially, Gall/Kimbrough, but that the decisions’ effects on sentence severity varied significantly across U.S. District Courts. Most importantly, the impact of Gall/Kimbrough on sentence severity was conditioned by districts’ percent Black population, level of socioeconomic disadvantage, and degree of political conservatism; each of these factors moderated the decisions’ effects on the harshness of the sentences imposed by the districts’ judges.

Keywords

federal sentencing, sentence severity, social context

Introduction

Recently, the federal sentencing process was reshaped by a series of rulings from the U.S. Supreme Court—most salient, United States v. Booker (543 U.S. 220, 2005, hereinafter Booker) and Gall/Kimbrough v. United States (552 U.S. 38, 2007, 552 U.S. 85, 2007, respectively, hereinafter Gall)—which rendered and subsequently reaffirmed the U.S. Sentencing Guidelines as “effectively advisory,” thus reviving judicial discretion. In the wake of these decisions, sentencing scholars (Engen, 2009; Hofer, 2007; Spohn, 2011) called on researchers to “identify and quantify the effects of this change and to learn whatever lessons this natural experiment might tell us about the federal sentencing system” (Hofer, 2007, p. 437). Indeed, crime and justice scholars have amassed a fairly large body of research evaluating post-Booker and post-Gall sentence outcomes. And with few exceptions, these studies reveal that these legal contingencies in U.S. District Courts have not substantially altered judicial decision making and that unwarranted disparity has not increased (Fischman & Schanzenbach, 2012; Scott, 2010; Starr & Rehavi, 2012; Tiede, 2009; Ulmer & Light, 2010, 2011; Ulmer, Light, & Kramer, 2011a, 2011b; but see U.S. Sentencing Commission [USSC], 2010, 2012).

Despite the theoretical and empirical scholarship on sentencing that has emerged in the post-Booker/Gall era, two critical questions remain unanswered. First, the existing research has centered on whether unwarranted disparity increased or decreased when the guidelines were rendered advisory but has not systematically addressed whether the absolute level of sentence severity has changed over time, particularly under the expanded sentencing discretion of the post-Gall period (Bowman, 2012; Starr & Rehavi, 2012). This concern was voiced by Engen (2009), who argued that, “We must expand research on prosecutorial and judicial discretion beyond the question of unwarranted disparity that has dominated the research agenda for the last 25-plus years,” and who called for research that includes “an explicit focus on . . . the severity of punishment related to the structure and content of sentencing laws” (pp. 332-333).

A second limitation of the extant research on the impact of Booker and Gall is that a limited number of studies have addressed whether the effects of the Court’s rulings vary depending upon the social contexts in which individual judges process cases (but see Farrell & Ward, 2011; Ulmer et al., 2011a). It is well established that courts operate within a localized legal culture in which legal actors share values and norms (Eisenstein, Flemming, & Nardulli, 1988; Eisenstein & Jacob, 1977; Myers & Talarico, 1987; Ulmer, 1997). Two recent studies found that the association between judicial characteristics—namely, gender, race, and political orientation—and sentence severity is more pronounced under the less-rigid, advisory sentencing guidelines (Fischman & Schanzenbach, 2011; Scott, 2010). Largely unconsidered by the literature, however, is whether judicial decision making under the different legal structures following Booker and Gall is shaped by court jurisdictions’ social context, for instance, federal districts’ case composition and racial and ethnic minority population.

Our goal in this article is to build on and extend research on federal sentencing in the post-Booker/Gall era, with a focus on the overall effects of the decisions on sentence severity. Of particular relevance is whether the effects of Booker and Gall on sentence severity are conditioned by district-level characteristics. To this end, drawing from 10 years of data from the USSC and data from a number of other sources and using a multilevel modeling strategy, we assess whether there is variability in sentence outcomes across sentencing reforms and between U.S. District Courts. Specifically, this study assesses (a) whether the Booker and Gall rulings have produced changes in sentence severity, (b) whether these changes vary across district courts, and, if so, (c) whether districts’ social and political characteristics account for these variations. The broader purpose of this study is to investigate how district court judges have responded to Supreme Court decisions that enhanced their discretion and that therefore have the potential to change the punishment to which defendants are subjected.

Theoretical Perspectives and Hypotheses

Judicial Discretion and Sentence Severity in Federal Sentencing Context

Our first research question asks whether there have been changes in the absolute level of sentence severity in the wake of the Booker/Gall decisions. Theoretical propositions on judicial decision making have focused primarily on the relationship between discretion and sentencing disparity, especially unwarranted sentencing disparity (Albonetti, 1991; Steffensmeier, Kramer, & Ulmer, 1998), rather than addressing the mechanisms by which judicial discretion affects sentence severity. Given the lack of theoretical attention to this issue, we draw from prior research on judicial decision making to construct two competing explanations regarding potential changes in sentence severity following the Booker and Gall decisions.

The first proposition, the “stability hypothesis,” posits that sentence severity did not change in response to Booker and Gall. This hypothesis is premised upon an assumption that the level of judicial discretion following Booker/Gall is not fundamentally different from the level of discretion available to judges before the Booker decision. At least three explanations for why this may be the case exist. First, although the U.S. Sentencing Guidelines are now advisory rather than mandatory, judges are nonetheless required to refer to the guidelines in calculating the recommended sentence and must provide a “statement of reasons” for sentencing outside the guideline range. These mechanisms, which are designed to heighten transparency and accountability, may reduce the likelihood of significant changes in judicial decision making. Second, sentencing practices across U.S. District Courts are based on a uniform and consistent criminal code, and this uniformity is reinforced by national-level training and discourse built into individual judges’ sentencing norms. Third, approximately two thirds of federal judges were appointed subsequent to the implementation of the federal guidelines. Thus, judges’ sentencing decisions may reflect a legal culture of conformity to the guidelines, the only sentencing structure most have ever known. These considerations, then, suggest that sentence severity will not change significantly in response to changes in sentencing policy (Berman, 2005; Schanzenbach & Tiller, 2007; Ulmer et al., 2011a, 2011b).

A second proposition is that sentences became more or less severe as judges exercised their discretion to sentence outside the guideline range. We contend that there is little, if any, theoretical rationale for arguing that the expansion of judicial discretion would lead judges to impose harsher sentences.¹ Judicial criticism of the guidelines has focused on the fact that they are overly harsh and rigid, and not on the fact that they result in sentences that are too lenient (Stith & Cabranes, 1998; USSC, 2004). There are, on the other hand, reasons for suggesting that sentences became less severe in the post-Booker/Gall era. This hypothesis—the “decreased punitiveness hypothesis”—is premised on the notion that discretion is a tool that can be used for “better-reasoned” decision making (Gottfredson & Gottfredson, 1987; Hofer, 2006; Tonry, 1995). The widespread criticism of the U.S. Sentencing Guidelines as inappropriately punitive (Nagel & Schulhofer, 1992; Tonry, 1996), coupled with the fact that judges are not mandated to follow the guidelines, may lead judges to depart from the guidelines to ameliorate sentences that they consider to be unduly severe (Starr & Rehavi, 2012). Partial empirical support for this proposition is indirectly offered by the considerable increase in downward departures (USSC, 2010; Ulmer & Light, 2011) observed following the Booker and Gall decisions. This perspective therefore implies that judges, whose discretion has been enhanced and whose sentences are not likely to be subjected to appellate review, will hand down more lenient sentences. It is our contention that the more likely outcome of Booker and Gall is a reduction in the severity of sentences imposed by federal judges. We believe that—the requirement that judges consider the guideline range in determining the appropriate sentence notwithstanding—there has been an expansion of judicial discretion. 
We also believe that judges, the majority of whom believe that sentences imposed on some types of offenders are overly harsh (USSC, 2004), will use their discretion to reduce the punishment to which offenders are exposed. Therefore, our first hypothesis is as follows:

Hypothesis 1: Sentences imposed by judges in U.S. District Courts are less severe following the Booker and the Gall decisions.

Social Context of Judicial Decision Making

Our second and third research questions focus on interdistrict variations in responses to Booker and Gall. We ask whether district courts responded in unique ways to these decisions and, if so, whether these differences can be attributed to differences in the social and political contexts in which the courts operate. There is a rich body of theoretical and empirical work in sentencing that reveals that judicial decision making is shaped by a variety of environmental factors above and beyond individual judges’ sentencing philosophies (Savelsberg, 1992; Ulmer & Kramer, 1998). To illustrate, according to the courts as communities perspective (Eisenstein et al., 1988), local legal culture shapes the way courts operate and the outcomes they produce. Similarly, the rationalized justice (Heydebrand & Seron, 1990) and social worlds (Ulmer, 1997) perspectives posit that environmental, structural, workgroup, and individual factors all influence court outcomes. In a similar vein, there is also a body of studies that highlight the important effects of broader social contexts on judicial decision making. For example, the minority threat perspective (Blalock, 1967) posits that the level of the African American population in a court’s jurisdiction is associated with harsher sentences (e.g., Britt, 2000; Johnson, Ulmer, & Kramer, 2008). Similarly, there is evidence that a jurisdiction’s economic ecology is an important determinant of sentence outcomes (Chambliss & Seidman, 1971), as elites or majority groups use it to maintain power over the “surplus labor force” (see Fearn, 2005; Wang & Mears, 2010). Furthermore, scholars have also stressed that the punishment process is fundamentally political (Garland, 1990), and there is a body of research showing that the political contexts in which courts are embedded influence sentencing outcomes (e.g., Helms & Jacobs, 2002). 
Overall, the theoretical perspectives suggest that the external court environment affects judicial decision making; more specifically, they suggest that sentences will be harsher in jurisdictions with larger Black populations, with more socioeconomic disadvantage, and in more politically conservative jurisdictions.

Yet, what is glaringly missing from this body of research in relation to our second and third research questions is why and how this sentencing pattern would be modified following Booker and Gall. Our second research question addresses the most fundamental aspect of the impact of Booker and Gall in this regard. That is, has the sentencing pattern across different federal jurisdictions diverged post Booker and Gall? Given the argument that more discretion would lead to more disparity (Frankel, 1972), we expect that a higher level of judicial discretion resulting from United States v. Booker and Gall v. United States would lead to more variance, as there is more room for external social pressures to intrude on judges’ sentencing decisions. Thus, our second hypothesis is as follows:

Hypothesis 2: The reductions in sentence severity following the Booker and Gall decisions will vary significantly across federal district courts.

Our third research question concerns how the sentencing pattern would change in the wake of Booker and Gall. As noted, there is little theoretical guidance on this issue. For that reason, we attempt to derive our framework from the uncertainty avoidance perspective (Albonetti, 1991; see also Cano & Spohn, 2012). Because the sudden increase in judicial discretion following Booker and Gall could be characterized by heightened uncertainty in judicial decision making (Fischman & Schanzenbach, 2012), judges are likely to rely on environmental cues as a mechanism for minimizing the level of uncertainty. To illustrate, following Booker and Gall, a judge in a court with a get-tough orientation on crime control would be likely to mete out more severe punishments when faced with uncertainty. In contrast, a judge in a district characterized by a higher level of political liberalism may be likely to hand down shorter sentences. It could be further inferred that the degree to which the impact of the legal changes would be moderated by social context would be greater in the post-Gall period, relative to the post-Booker period, as there is unarguably more uncertainty involving the binding power of federal guidelines following Gall. Accordingly, we test the following three hypotheses:

Hypothesis 3a: The reductions in sentence severity following the Booker and Gall decisions will vary by the percentage of the population in the judicial district that is African American; the reductions will be smaller in jurisdictions with larger populations of African Americans.

Hypothesis 3b: The reductions in sentence severity following the Booker and Gall decisions will vary by the percentage of the population in the judicial district that is economically disadvantaged; the reductions will be smaller in jurisdictions with a higher level of disadvantage.

Hypothesis 3c: The reductions in sentence severity following the Booker and Gall decisions will vary by the level of political conservatism in the judicial district; the reductions will be smaller in jurisdictions with a higher level of political conservatism.

Data and Method

Data

This study employs the USSC standard research data set on offenders convicted in U.S. District Courts from FY2001 to FY2010, in combination with data derived from the 2000 U.S. Census, Uniform Crime Reports (UCR) data on crime rates from 2001 to 2010, biographical data on federal judges from the Federal Judicial Center, and the County Characteristics data (ICPSR 20660). The original USSC data file included 724,297 offenders in 94 district courts, but some cases were excluded. First, all immigration cases were removed from the analysis due to the complexities inherent in these offenses (n = 181,735). Second, cases adjudicated in Puerto Rico, Guam, the Virgin Islands, the District of Columbia, and the Northern Mariana Islands were also removed (n = 11,779). Finally, we deleted cases sentenced in the first quarter of FY2001 (n = 11,768) to take into account a temporal order issue with the crime rate variable. With these cases excluded, the final data file includes 519,015 offenders who were sentenced in 89 district courts.
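The sample construction can be checked with simple arithmetic. The sketch below only restates the counts reported above; it assumes the three exclusions were applied to non-overlapping sets of cases, which is what the reported totals imply. The variable names are illustrative.

```python
# Case counts as reported in the Data section.
original_cases = 724_297
exclusions = {
    "immigration cases": 181_735,
    "territories and District of Columbia": 11_779,
    "first quarter of FY2001": 11_768,
}

# Assumption: the exclusion counts do not overlap, so they can be summed.
final_cases = original_cases - sum(exclusions.values())

original_districts = 94
excluded_jurisdictions = 5  # the five jurisdictions named in the text
final_districts = original_districts - excluded_jurisdictions

print(final_cases)      # 519015
print(final_districts)  # 89
```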

Measures

The dependent variable is the length of the prison sentence captured in months. Following prior studies which included probation cases (with a 0-month sentence) in the analyses (Albonetti, 1997; Bushway & Piehl, 2001; Fischman & Schanzenbach, 2011; Starr & Rehavi, 2012; USSC, 2010), we keep the probation cases in the sample.² We made two adjustments to this variable: Sentence length was capped at 470 months; it then was log transformed because of its skewed distribution, with a constant of 1 added. The adjusted sentence length variable has a mean of 3.09 and a standard deviation of 1.72.
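A minimal sketch of the two adjustments described above (capping, then log transformation with a constant of 1). It assumes the natural logarithm, which is consistent with the maximum of 6.15 reported in Table 1; the function name is hypothetical.

```python
import math

CAP_MONTHS = 470  # cap stated in the text


def log_sentence(months: float) -> float:
    """Cap the sentence at 470 months, add 1, and take the natural log."""
    capped = min(months, CAP_MONTHS)
    return math.log(capped + 1)


print(round(log_sentence(0), 2))    # probation case: 0.0
print(round(log_sentence(470), 2))  # capped maximum: 6.15
print(log_sentence(900) == log_sentence(470))  # True, values above the cap collapse
```

Adding the constant keeps probation cases (0 months) in the sample, because the log of zero is undefined.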

The key independent variables comprised a series of binary indicators representing the Booker and Gall Court decisions. The Booker variable measures whether the case was sentenced after the Booker decision, coded as 1 if the offender was sentenced from the time the Booker decision was announced to the time the Gall decision was handed down and as 0 otherwise. Likewise, the Gall variable is coded as 1 if the offender was sentenced after the Gall ruling and as 0 otherwise. With regard to the reference category, we follow the approach of Ulmer and colleagues (2011a, 2011b), demarcating the effect of the Protect Act from the pre-Protect era. Thus, there are four binary time indicators: pre-Protect (reference category), post-Protect, Booker, and Gall. We also use three contextual-level independent variables to model the between-district variations in the effects of Booker and Gall. First, data from the 2000 decennial census are used to compile a variable for racial composition, which measures a district’s percent Black in the population. Second, also using data from the 2000 census, a measure for disadvantage is computed using a standardized factor score deriving from the following four items: percentage female-headed families with children, male unemployment rate, poverty rate, and the percentage of people without a high school diploma or equivalent (Eigenvalue = 2.867, Factor loadings: minimum = .722 and maximum = .929, α = .85). Finally, a measure for political conservatism, generated from county-level data (ICPSR 20660), reflects the percentage of votes for George W. Bush in the 2004 presidential election. All of the district-level variables were standardized with a mean of 0 and a standard deviation of 1.
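The four-period coding can be sketched as follows. The cutoff dates are the publicly known dates of the Protect Act and the two sets of decisions; they are not stated in the text, and the treatment of cases sentenced on a cutoff date itself (counted here in the later period) is an assumption. The function and key names are illustrative.

```python
from datetime import date

# Publicly known dates, not taken from the article.
PROTECT_ACT = date(2003, 4, 30)  # Protect Act signed into law
BOOKER = date(2005, 1, 12)       # United States v. Booker decided
GALL = date(2007, 12, 10)        # Gall and Kimbrough decided


def period_dummies(sentenced: date) -> dict:
    """Return the three non-reference indicators; all zeros means pre-Protect."""
    return {
        "post_protect": int(PROTECT_ACT <= sentenced < BOOKER),
        "booker": int(BOOKER <= sentenced < GALL),
        "gall": int(sentenced >= GALL),
    }


print(period_dummies(date(2002, 6, 1)))   # all zeros: reference category
print(period_dummies(date(2006, 3, 15)))  # booker = 1
print(period_dummies(date(2009, 1, 5)))   # gall = 1
```

Because the Booker indicator is switched off once Gall is decided, each coefficient is read as a contrast with the pre-Protect reference period rather than as a cumulative effect.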

Control variables at the individual level include the offender’s race/ethnicity (Black, Hispanic, or Other, with White offenders serving as the reference category), gender (female = 1; male = 0), education (some college or above = 1; no college = 0), and citizenship (noncitizen = 1; citizen = 0). Age is controlled as a continuous variable in addition to an age squared term. We control for the offense type, which is measured using five categories entered as dummy variables (drug, fraud, firearm, and others, with violent offenses serving as the reference category). We also control for the presumptive sentence, which is adjusted in two ways. Consistent with the treatment of the dependent variable, it is capped at 470 months and is log transformed with a constant of 1 added. We control for earlier case processing outcomes, including the offender’s pretrial status (in custody = 1; released = 0) and the type of disposition in the case (guilty plea = 1; trial = 0). Finally, we control for whether the offender was facing a mandatory minimum penalty; a case is coded 1 if an offender is facing a mandatory minimum and fails to receive either a substantial assistance departure or a safety valve departure. For ease of interpretation of the intercept, all variables are grand mean centered.³

At the district level, we use four time-varying covariates for each district to rule out the possibility that any concurrent factors, other than Booker and Gall, would influence the changes in sentence severity over time. First, we control for judicial characteristics across U.S. District Courts by employing annual measures for percent Republican presidential appointees, percent male judges, and percent White judges as proxies for judicial ideology in punishment (Fischman & Schanzenbach, 2011). Second, premised on an organizational efficiency argument (Dixon, 1995), districts’ caseload is used to control for the possibility that any changes in caseload over time may exhibit an effect on sentence severity. Caseload is measured as the average number of cases processed in a district court for a given quarter, divided by the number of active judges, subsequently divided by 10 for ease of interpretation (Johnson, 2005). Third, a measure of the district’s crime rate is used to control for the possibility that changes in crime rates may influence judicial decision making (Britt, 2000). Using UCR data, we measure each district’s crime rate as the total number of index crimes per 1,000 of the population.⁴ Fourth, we use a time variable—measured as the month of sentencing, including a time squared term—to account for any unmeasured and gradual time trends in sentencing outcomes (Starr & Rehavi, 2012). Finally, we control for two additional time-invariant, court-level variables. Court size is measured by the number of authorized judgeships in each federal district. Because of caseload pressure from immigration cases, we also control for whether a district is situated on the U.S./Mexico border (Districts of Southern California, Arizona, New Mexico, Southern and Western Texas; 1 = yes, 0 = no).⁵ All variables were grand mean centered. Table 1 provides the means and standard deviations for all the study variables.
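As an illustration of the caseload measure defined above, the sketch below divides a district's quarterly case count by its number of active judges and rescales by 10. The input figures are hypothetical and the function name is illustrative.

```python
def caseload_measure(cases_in_quarter: float, active_judges: int) -> float:
    """Cases per active judge in a quarter, rescaled to units of 10 cases."""
    if active_judges <= 0:
        raise ValueError("a district needs at least one active judge")
    return cases_in_quarter / active_judges / 10


# Hypothetical district-quarter: 393 cases handled by 10 active judges.
print(round(caseload_measure(393, 10), 2))
```

On this scale, a one-unit difference corresponds to 10 additional cases per judge per quarter.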

Table 1.

Descriptive Statistics.

	M	SD	Minimum	Maximum	% Missing
Dependent variable
Sentence length (logged)	3.09	1.72	0	6.15	1.54
Independent and control variables
Level 1 variables
White (reference group)	0.37		0	1	5.16
African American	0.30		0	1	5.16
Hispanic	0.28		0	1	5.16
Others	0.05		0	1	5.16
Female	0.15		0	1	1.87
Age	35.12	10.79	16	102	1.68
Some college	0.26		0	1	7.21
Noncitizen	0.21		0	1	4.94
Plea	0.94		0	1	0.29
Detained	0.65		0	1	6.21
Violent (reference group)	0.05		0	1	0.29
Drug	0.47		0	1	0.29
Fraud	0.20		0	1	0.29
Firearms	0.14		0	1	0.29
Others	0.14		0	1	0.29
Final criminal history	2.33	1.74	1	6	3.74
Mandatory minimum	0.32		0	1	3.96
Presumptive sentence (logged, GLMIN)	3.48	1.49	0	6.15	4.12
Presumptive sentence (logged, XMINSOR)	3.44	1.46	0	6.15	4.12
Level 2 variables (time-variant)
Pre-Protect (reference group)	0.23		0	1	0.00
Post-Protect	0.17		0	1	0.00
Booker	0.31		0	1	0.00
Gall	0.29		0	1	0.00
% Republican appointees	60.83	16.90	0	100	0.00
% Male judges	79.11	12.10	50	100	0.00
% White judges	80.96	14.56	47.05	100	0.00
Caseload	3.93	3.26	.27	17.7	0.00
Crime rates	25.29	13.74	.76	83.18	0.00
Time	58.72	33.55	0	116	0.00
Level 3 variables (time-invariant)
Court size	10.54	6.62	1	28	0.00
Border district	0.19		0	1	0.00
Racial context (z)	0	1	−0.89	3.49	0.00
Conservatism (z)	0	1	−2.57	2.33	0.00
Disadvantage (z)	0	1	−1.97	2.33	0.00

Note. All variables are presented in their original forms, unless otherwise specified.

Analytic Strategy

Following the lead of DiPrete and Grusky (1990) and Xie, Lauritsen, and Heimer (2012), we use a three-level, hierarchical linear modeling strategy, in which individual cases are nested within time and districts. The three-level model takes into account the multiple-nesting structure of the data to produce conservative estimates of the effects of our independent variables on our outcome measure. Because we use repeated cross-sectional data that extend across a long period of time and across different locations, it is possible that defendants sentenced at similar time points share similar sentencing patterns and that defendants sentenced in the same district are treated more similarly to one another than to defendants sentenced in other districts. To address our research questions, we estimate the following model as our baseline model:

Level 1:
\[ Y_{itj} = \beta_{0tj} + \beta_{1} X_{itj} + e_{itj} \tag{1} \]

Level 2:
\[ \beta_{0tj} = \pi_{0j} + \pi_{1j}(\text{Booker/Gall}_{tj}) + \pi_{2j}(Z_{tj}) + \pi_{3j}(\text{Time}_{tj}/\text{Time}_{tj}^{2}) + r_{tj} \tag{2} \]

Level 3:
\[ \pi_{0j} = \gamma_{00} + \gamma_{01}(W_{j}) + u_{0j} \tag{3} \]

At Level 1, Y_itj is the observed logged sentence length for case i, at time t, in district j. X_itj denotes a vector of the characteristics of an individual defendant and case processing variables measured at time t in district j. At Level 2, the Booker and Gall dummy variables are our main predictors of interest. Z_tj represents a vector of time-varying covariates at the district level, such as crime rate, caseload, and aggregated judicial characteristics measured at time t in district j. Time_tj is measured as the month when cases were sentenced. At Level 3, W_j denotes a vector of time-invariant covariates in district j, such as our district-level independent variables, court size, and border district. Finally, r_tj and u_0j are random effects for the time and the district, which are assumed to be normally distributed with means of zero and variances of σ²_µ and σ²_v, respectively. To answer our first research question, our main interest lies in estimating the parameter π_1j.

Our extended model, which tests our second and third hypotheses, is specified as follows. To test our second hypothesis, we allow the impact of Booker and Gall to vary randomly across districts; this variation is captured by u_1j in Equation 4. Once we establish that there is a statistically significant random effect to be explained, we test our third set of hypotheses by adding to Equation 5 a group of W_j measures that capture percent Black, disadvantage, and conservatism.

Level 3:
\[ \pi_{1j} = \gamma_{10} + u_{1j} \tag{4} \]

Level 3:
\[ \pi_{1j} = \gamma_{10} + \gamma_{11}(W_{j}) + u_{1j} \tag{5} \]
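For readers who prefer a single equation, the level-specific equations can be combined by substitution. The following is a sketch derived from Equations 1, 2, 3, and 5 as written, not an equation reported by the authors; the time terms are kept in the shorthand used in Equation 2.

```latex
\[
\begin{aligned}
Y_{itj} ={}& \gamma_{00} + \gamma_{01} W_{j}
  + \gamma_{10}\,\text{Booker/Gall}_{tj}
  + \gamma_{11}\,\bigl(W_{j} \times \text{Booker/Gall}_{tj}\bigr) \\
 &+ \pi_{2j} Z_{tj}
  + \pi_{3j}\bigl(\text{Time}_{tj}/\text{Time}_{tj}^{2}\bigr)
  + \beta_{1} X_{itj} \\
 &+ u_{0j} + u_{1j}\,\text{Booker/Gall}_{tj} + r_{tj} + e_{itj}
\end{aligned}
\]
```

In this form, the coefficient \gamma_{11} is a cross-level interaction: it indicates how much the effect of a decision on logged sentence length shifts with a one-unit (one standard deviation) difference in a district characteristic W_j, which is the quantity tested by Hypotheses 3a through 3c.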

Recent discourse on sentencing research has centered on the appropriate methodological strategies for capturing whether Booker and Gall affected sentencing discretion (for competing analytic strategies, see Starr & Rehavi, 2012; Ulmer et al., 2011a, 2011b). Critics charge that studies should not control for sentencing departures and mandatory minimum status, as these variables are endogenous to the Booker and Gall decisions (Fischman & Schanzenbach, 2012; Engen, 2011). Reflecting this debate, our analyses proceed in the following manner. For each hypothesis, the first set of models captures the application of a mandatory minimum and uses the presumptive sentence variable (“GLMIN”), which encompasses mandatory minimum status. Conversely, the second set of models does not capture the application of a mandatory minimum and thus employs a presumptive sentence variable (“XMINSOR”) that does not reflect defendants’ mandatory minimum status. The only marked difference between the two sets of models is the control measure for mandatory minimum status, as we do not control for the sentencing departure decision in either set (see Note 3). Estimating both sets of models can help gauge potential anomalies introduced by changes in prosecutors’ charging behavior in relation to the application of mandatory minimums following the Booker and Gall decisions.

Results

The first step in the analyses was to fit an unconditional model to determine whether there are significant variance components at Levels 2 and 3; if so, estimating a multilevel model would be justified. The results (not presented here) indicate that the grand mean for logged sentence length is 3.138. We find statistically significant random variations; the intraclass correlation coefficients (ICCs) indicate that about 4.91% of the total variance in sentence length lies between districts (Level 3), whereas approximately 2% lies across time points (Level 2). Judged against prior research, the variation at Level 2 is relatively small, which may suggest that clustering the data at Level 2 is unnecessary. We therefore performed a likelihood ratio test to determine whether a two-level model or the current three-level model fit the data better. The result suggests that the three-level model is the better fit (LR χ² = 3,247.86, p < .001). Therefore, we use a three-level model as our primary data analysis technique.
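For reference, ICCs in a three-level unconditional model are conventionally computed as shares of the total variance. The notation below is illustrative and not taken from the article: \sigma^{2} denotes the Level 1 (case) variance, \tau_{2} the Level 2 (time) variance, and \tau_{3} the Level 3 (district) variance.

```latex
\[
\mathrm{ICC}_{\text{district}} = \frac{\tau_{3}}{\tau_{3} + \tau_{2} + \sigma^{2}},
\qquad
\mathrm{ICC}_{\text{time}} = \frac{\tau_{2}}{\tau_{3} + \tau_{2} + \sigma^{2}}
\]
```

Under this convention, the values reported above correspond to \mathrm{ICC}_{\text{district}} \approx 0.0491 and \mathrm{ICC}_{\text{time}} \approx 0.02.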

To test the first hypothesis, we estimated a random coefficient model in which we allowed the impact of Protect, Booker, and Gall to vary across district courts. Our overall results in Table 2 suggest that most of the variation in sentence length is explained by legally relevant predictors. With respect to extralegal predictors, only the offender’s sex exerts a substantial effect, while all other predictors show modest effects. With the exception of the district’s size and percent Black population, the effects of the district-level covariates are largely in line with our expectations. The indicator for time shows a downward trend, which is curvilinear. With regard to our first hypothesis, we predicted that sentence severity would decline in the wake of Booker and Gall. The results in Model 1 show that both Booker and Gall exhibited statistically significant, negative effects on sentence severity. Specifically, offenders adjudicated after Booker received sentences that were, on average, about 7.1% shorter than those of offenders adjudicated in the pre-Protect period. The effect of Gall on sentence severity is greater than that of Booker, as evidenced by the Wald test (χ² = 6.40, p = .011), in that sentences following Gall were about 9.4% shorter than sentences imposed in the pre-Protect period. The results presented in Model 2, which does not control for the application of a mandatory minimum, uncover a similar pattern. That is, the results remain relatively similar, but the magnitudes of the effects of both Booker and Gall decrease modestly from Model 1 to Model 2 (Booker: −0.071→−0.063, Gall: −0.094→−0.076). Considered together, both sets of models (Model 1 and Model 2) offer support for our predictions, as we found moderate, negative effects for Booker and Gall on sentence severity.⁶
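The percentages above read the log-scale coefficients directly as proportional changes, which is a common approximation for small coefficients. A minimal sketch of the exact conversion, using the Model 1 coefficients from Table 2, is shown below; the function name is hypothetical. Because the outcome is the log of months plus 1, the conversion strictly applies to (months + 1) rather than to months.

```python
import math


def pct_change(coef: float) -> float:
    """Exact percent change in the outcome implied by a log-scale coefficient."""
    return (math.exp(coef) - 1) * 100


# Model 1 coefficients reported in Table 2.
for label, coef in [("Booker", -0.071), ("Gall", -0.094)]:
    print(label, round(pct_change(coef), 2))
```

The exact figures are slightly smaller in magnitude than the coefficient-based approximations (roughly 6.9% and 9.0% rather than 7.1% and 9.4%), which does not alter the substantive conclusion.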

Table 2.

Full Models.

Fixed effects	Model 1 (with minimum)			Model 2 (without minimum)
Fixed effects	Coefficient	RSE	p	Coefficient	RSE	p
Level 1
Intercept	3.184***	.022	.000	3.170***	.022	.000
Black	.046***	.006	.000	.082***	.006	.000
Hispanic	.053***	.008	.000	.063***	.008	.000
Others	.046*	.023	.049	.031^†	.023	.089
Female	−.233***	.014	.000	−.252***	.014	.000
Age	.015***	.001	.000	.014***	.001	.000
Age²	−.000***	.000	.000	−.000***	.000	.000
Education	−.044***	.005	.000	−.053***	.005	.000
Noncitizen	.094***	.012	.000	.076***	.012	.000
Drug	−.285***	.019	.000	−.143***	.019	.000
Fraud	−.299***	.023	.000	−.278***	.023	.000
Firearms	−.197***	.017	.000	−.089***	.017	.000
Others	−.248***	.018	.000	−.232***	.018	.000
Detained	.538***	.027	.000	.594***	.027	.000
Plea	−.342***	.020	.000	−.393***	.020	.000
Presumptive sentence	.802***	.013	.000	.833***	.013	.000
Criminal history score	.057***	.003	.000	.043***	.003	.000
Mandatory minimum	.309***	.014	.000	—	—	—
Level 2
Protect	−.010	.011	.366	−.011	.011	.366
Booker	−.071***	.017	.000	−.063***	.017	.000
Gall	−.094***	.022	.000	−.076***	.022	.000
% Republican judges	−.000	.000	.779	−.001	.000	.779
% Male judges	−.000	.001	.857	−.000	.001	.857
% White judges	−.000	.001	.909	.000	.001	.909
Caseload	.006	.006	.326	.003	.006	.326
Crime rates	.000	.000	.407	.000	.000	.407
Time	.003***	.000	.000	.003***	.000	.000
Time²	−.000***	.000	.000	−.000***	.000	.000
Level 3
Size	−.005^†	.003	.099	−.004^†	.003	.099
Border	−.160	.106	.137	−.178	.106	.112
% Black residents	.008	.017	.630	.011	.017	.630
Disadvantage	.039^†	.021	.062	.038^†	.021	.062
Conservatism	.062***	.015	.000	.061***	.015	.000
Random effects	SD	SE	p	SD	SE	p
Level 3
Intercept	.138***	.006	.000	.130***	.010	.000
Protect	.065***	.004	.000	.054***	.004	.000
Booker	.082***	.005	.000	.068***	.005	.000
Gall	.116***	.008	.000	.106***	.008	.000
Level 2	.066***	.000	.000	.070***	.001	.000
Level 1	.780***	.000	.000	.816***	.000	.000

†p < .10. *p < .05. **p < .01. ***p < .001.

Turning to our second hypothesis, results for the random effects are presented in the bottom half of Table 2. The magnitude of the between-district variation differs across the three sentencing policy changes. Results in Model 1 show that the impact of the Protect Act was relatively homogeneous across districts (SD = 0.065) compared with the effect of Booker (SD = 0.082), and the variation in the effect of Booker across districts was in turn smaller than the variation in the effect of Gall (SD = 0.116). To demonstrate the variability of the impact of Gall, we calculated the 95% plausible range of the Gall effect. The results show that the effect lies between −0.321 and 0.133 in logged sentence length across federal districts (−0.094 ± 1.96 × 0.116, or −0.321 to 0.133). These results support our second hypothesis, as the random variance components are statistically significant. Although the population-mean slopes are not substantial, the standard deviations of the variance components for Booker (0.082, p < .001) and Gall (0.116, p < .001) indicate a considerable amount of variation. In particular, the random variation of Gall is larger than the variation of Booker, as shown by the Wald test (χ² = 19.29, p < .001). In line with our findings on the first hypothesis, we observed a similar pattern of variance for the effects of Booker and Gall in Model 2, which is also presented in the bottom half of Table 2. The variability of the effects of Booker and Gall is smaller in the second model, which does not control for the application of a mandatory minimum (Booker: 0.082→0.068, Gall: 0.116→0.106).
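The plausible-range calculation above can be reproduced with a few lines of arithmetic. The sketch below assumes, as is conventional for random coefficient models, that district-specific slopes are approximately normally distributed around the population-mean slope; the function name `plausible_range` is hypothetical and used only for illustration.

```python
from typing import Tuple


def plausible_range(mean_slope: float, sd: float, z: float = 1.96) -> Tuple[float, float]:
    """Return the interval expected to contain about 95% of district-specific
    slopes, assuming they are normally distributed around the mean slope."""
    half_width = z * sd
    return mean_slope - half_width, mean_slope + half_width


# Gall effect in Model 1: mean slope and Level 3 standard deviation (Table 2).
lower, upper = plausible_range(-0.094, 0.116)
print(f"Gall: {lower:.3f} to {upper:.3f}")
```

Because the upper bound is positive, the calculation implies that in some districts sentences may have become longer, not shorter, after Gall.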

The results discussed thus far demonstrate that the Supreme Court’s rulings in Booker and Gall had a significant, negative effect on sentence severity, and that the effects of these decisions varied considerably across U.S. District Courts. We now turn to our final research question: whether and to what extent court-level contextual predictors explain the random variation that was uncovered. Results for our final set of hypotheses are displayed in Table 3. Hypothesis 3a predicts that the reductions in sentence severity following Booker and Gall would be smaller in districts with a higher percentage of Black residents. Results from Model 3 offer partial support. Although the effect of Booker was not conditioned by percent Black in the population, the effect of Gall was moderated by racial composition, in that a one standard deviation increase in districts’ percent Black population reduced the negative impact of Gall on sentence severity by 2.6%. Conversely, offenders adjudicated in districts in which percent Black population was one standard deviation below the mean received a larger sentence reduction in the wake of Gall, which amounted to about a 12% reduction in sentence length. We observe a similar pattern in our alternative model, which does not control for mandatory minimum status (Model 6). Consistent with our previous findings, there was a slight reduction in the interaction effect between Gall and percent Black population (Gall × % Black: 0.026→0.022).

Table 3.

Cross-Level Interactions (Abridged).

Fixed effects	With minimum			Without minimum
Fixed effects	Model 3	Model 4	Model 5	Model 6	Model 7	Model 8
Intercept	3.185***	3.184***	3.184***	3.173***	3.172***	3.172***
Level 2
Booker	−.071***	−.071***	−.071***	−.061***	−.061***	−.062***
Gall	−.094***	−.092***	−.094***	−.074***	−.072***	−.075***
Booker × % Black	.003	—	—	−.004	—	—
Gall × % Black	.026**	—	—	.022*	—	—
Booker × Disadvantage	—	.000	—	—	.000	—
Gall × Disadvantage	—	.021**	—	—	.019^†	—
Booker × Conservatism	—	—	.011	—	—	.013^†
Gall × Conservatism	—	—	.013*	—	—	.012
Level 3
% Black	.001	.008	.008	.008	.008	.008
Disadvantage	.039^†	.034	.039^†	.039^†	.038^†	.039^†
Conservatism	.062***	.062***	.056**	.062***	.062***	.060**
Random effects	SD	SD	SD	SD	SD	SD
Level 3
Intercept	.138***	.138***	.138***	.128***	.128***	.128***
Booker	.082***	.082***	.082***	.061***	.061***	.060***
Gall	.101***	.101***	.101***	.099***	.100***	.102***

Note. All the Level 1, Level 2, and Level 3 variables are controlled, but not shown.

†p < .10. *p < .05. **p < .01. ***p < .001.

With regard to Hypothesis 3b, we anticipated that the effects of Booker and Gall would vary by the district’s level of socioeconomic disadvantage, and we predicted that sentence reductions would be smaller in districts with a higher level of socioeconomic disadvantage. The results presented in Model 4 provide partial support for this hypothesis. Consistent with the findings for Hypothesis 3a, we find that the effect of Gall, but not the effect of Booker, was conditioned by the district’s level of socioeconomic disadvantage. A one standard deviation increase in the level of disadvantage reduces the main effect of Gall by 2.1%, thus reducing the overall Gall effect to a 7.1% reduction in sentence length. The results from our alternative model, which omits the control for mandatory minimum status, exhibited a similar pattern (Model 7), although the interaction effect between Gall and the disadvantage predictor was slightly smaller and significant only at the p < .10 level (Gall × Disadvantage: 0.021→0.019).
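The conditional effects reported for percent Black population and socioeconomic disadvantage follow from adding the cross-level interaction to the Gall main effect. The sketch below works through that arithmetic using the Table 3 coefficients; it assumes the district-level moderators are standardized, so that one unit equals one standard deviation, and the function name `conditional_effect` is hypothetical.

```python
def conditional_effect(main: float, interaction: float, z: float) -> float:
    """Effect of the policy change in a district whose moderator lies
    z standard deviations from the mean (moderator assumed standardized)."""
    return main + interaction * z


# (Gall main effect, Gall x moderator interaction), taken from Table 3.
models = {
    "Model 3, % Black": (-0.094, 0.026),
    "Model 4, Disadvantage": (-0.092, 0.021),
}

for label, (main, interaction) in models.items():
    for z in (-1, 0, 1):
        effect = conditional_effect(main, interaction, z)
        print(f"{label}: z = {z:+d}, Gall effect = {effect:+.3f}")
```

At one standard deviation below the mean on percent Black population the Gall effect is −0.120 (about a 12% reduction), and at one standard deviation above the mean on disadvantage it is −0.071, matching the figures in the text.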

Our final research proposition, Hypothesis 3c, suggested that a district’s level of political conservatism would moderate the effects of Booker and Gall on sentence length. Consistent with the results for the previous two hypotheses, we found that, whereas the effect of Booker is not significantly moderated by political conservatism, the impact of Gall is conditioned by political conservatism, although the magnitude of the cross-level interaction is minimal (Model 5). In our alternative model (Model 8), however, the modest interaction between Gall and political conservatism was reduced to statistical nonsignificance, whereas the interaction between Booker and political conservatism became significant at the p < .10 level.

Discussion and Conclusion

Social scientists and legal scholars argue that the decisions handed down by the U.S. Supreme Court in Booker and Gall reshaped the landscape of federal sentencing. However, unanswered questions remain regarding the degree to which these decisions changed the sentences judges impose and, if so, whether the changes were uniform across offenders and across courts. We address these questions in this study. We hypothesized that the Booker and Gall decisions would reduce sentence severity. This hypothesis was confirmed, as we found that offenders sentenced after both Booker and Gall received sentences that were significantly shorter than the sentences imposed on offenders sentenced in the pre-Protect era. We hypothesized that the impact of Booker and Gall would vary across the U.S. district courts and that this variation would be explained in part by the social, economic, and political contexts in which the court was situated. We found support for both hypotheses. There was significant interdistrict variation in the effects of both decisions on sentence severity. More importantly, we found that the effects of Gall, but not the effects of Booker, were conditioned by the contextual factors.

Several of the findings merit comment. The fact that the Booker and Gall decisions led to significant reductions in sentence severity suggests that the Court’s rulings that the guidelines were advisory rather than mandatory were not simply symbolic pronouncements about the nature of the federal sentencing process. By allowing judges to impose sentences outside the guideline range (Booker) and by stating that a judge’s determination of a reasonable sentence was to be framed by “an individualized assessment based on the facts presented” (Gall), the Court freed judges to consider, not only the severity of the offense and the seriousness of the offender’s criminal history, but also the offender’s past and current circumstances, motivation for the crime, degree of remorse, and so on. Although we can only speculate, it thus appears that the decisions gave district court judges discretion to consider what Tonry (1996) has referred to as the “commonsense bases for distinguishing among offenders” (p. 77).

We also found that the presumptive sentence was the strongest predictor of sentence severity during the 10-year period we examined, which included 5 years of sentences imposed in the post-Booker era. This suggests that the Supreme Court decisions, which rendered the federal sentencing guidelines advisory, did not make them irrelevant. Judges, most of whom have known no other sentencing regime than the guidelines, continue to impose sentences that are based to a considerable degree on the presumptive sentence. Also of interest is the fact that the negative effects of both Booker and Gall varied substantially across courts and that these variations in reactions to the Gall decision were explained in part by the district’s social, economic, and, to a lesser degree, political characteristics. One of the most noteworthy findings of the study is the weak and null interaction effects in relation to the political conservatism variable. In contrast to racial composition and socioeconomic disadvantage, we did not observe the robust interaction effects of political conservatism on sentence severity. Because of the salient role political context plays in sentencing decisions (Helms & Jacobs, 2002), it is interesting that relatively little change was observed in districts where the level of political conservatism is low. One could speculate that liberal judges were already thwarting the guidelines to some degree, prior to Booker, as reflected in the strong main effect of the political conservatism predictor; thus, any changes following Booker or Gall may have been minimal.⁷ Considered together, however, these results add to a growing body of research (Kautt, 2002; Spohn, 2005; Ulmer & Johnson, 2004) demonstrating that sentencing policies are interpreted and implemented in different ways in different court systems and highlight the importance of court contextual factors.

An important caveat in this study stems from the fact that official USSC data files do not provide information on prosecutorial charging decisions. For example, in cases processed subsequent to Booker and Gall, prosecutors may charge offenders with more severe crimes and more frequently invoke mandatory minimums to counteract the rise in judicial discretion. In the present study, we partially addressed the issue by estimating models that do not control for defendants’ mandatory minimum status. Accordingly, any changes stemming from prosecutors’ decisions that are linked to a mandatory minimum are reflected in those results. Although two recent studies suggest that Booker and Gall affected prosecutors’ behavior related to the application of a mandatory minimum, not necessarily charging reductions (Fischman & Schanzenbach, 2012; Starr & Rehavi, 2012), failure to control for any changes in charge reductions is clearly a limitation.

Our research is certainly not the last word on the impacts of Booker and Gall. We included only 5 years of data post-Booker and 3 years of data post-Gall, finding significant reductions in sentence severity over this time period. Future research should continue to monitor the decisions’ effects on the sentences judges impose, focusing, as we did, on whether the effects are uniform across districts and, as others have done (Ulmer et al., 2011a; USSC, 2010), on the decisions’ effects on unwarranted disparity. Future research should also attempt to tease out whether the effects of the decisions vary across crime types and whether the factors that affect sentence outcomes have changed over time (Wooldredge, 2009).

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes

Author Biographies

Byungbae Kim is a doctoral student in the School of Criminology and Criminal Justice at Arizona State University. His main research interests are sentencing, corrections, and quantitative data analysis.

Mario V. Cano is an assistant professor in the Department of Sociology, Anthropology, and Social Work at Kansas State University. His research interests include criminological theory and courts and sentencing.

KiDeuk Kim is a senior research associate at the Urban Institute and a visiting fellow at the Bureau of Justice Statistics. His current research interests include sentencing, evidence-based decision making, and criminal justice policy evaluation.

Cassia Spohn is foundation professor of criminology and criminal justice at Arizona State University, where she also directs the doctoral program. Her research focuses on prosecutorial and judicial decision making, case attrition in sexual assault cases, and the effects of race, ethnicity, and gender on criminal court outcomes.

References

Albonetti, C. A. (1991). An integration of theories to explain judicial discretion. Social Problems, 38, 247-266.

Albonetti, C. A. (1997). Sentencing under the federal sentencing guidelines: Effects of defendant characteristics, guilty pleas, and departures on sentence outcomes for drug offenses, 1991–1992. Law & Society Review, 31, 789-822.

Berman, D. A. (2005). Assessing federal sentencing after Booker. Federal Sentencing Reporter, 17, 291-294.

Blalock, H. M. (1967). Toward a theory of minority-group relations. New York, NY: Wiley.

Bowman, F. O., III. (2012). Nothing is not enough: Fix the absurd post-Booker federal sentencing system. Federal Sentencing Reporter, 24, 356-368.

Britt, C. L. (2000). Social context and racial disparities in punishment decisions. Justice Quarterly, 17, 707-732.

Bushway, & Piehl. (2001). Judging judicial discretion: Legal factors and racial discrimination in sentencing. Law & Society Review, 35, 733-764.

Cano, M. V., & Spohn. (2012). Circumventing the penalty for offenders facing mandatory minimums: Revisiting the dynamics of “sympathetic” and “salvageable” offenders. Criminal Justice and Behavior, 39, 308-332.

Chambliss, W. J., & Seidman, R. B. (1971). Law, order, and power. Reading, MA: Addison-Wesley.

DiPrete, T. A., & Grusky, D. B. (1990). The multilevel analysis of trends with repeated cross-sectional data. Sociological Methods, 20, 337-368.

Dixon. (1995). The organizational context of criminal sentencing. American Journal of Sociology, 100, 1157-1198.

Eisenstein, Flemming, R. B., & Nardulli. (1988). The contours of justice: Communities and their courts. Boston, MA: Little, Brown.

Eisenstein, & Jacob. (1977). Felony justice: An organizational analysis of criminal courts. Boston, MA: Little, Brown.

Engen, R. L. (2009). Assessing determinate and presumptive sentencing: Making research relevant. Criminology & Public Policy, 8, 323-335.

Engen, R. L. (2011). Racial disparity in the wake of Booker/Fanfan making sense of “messy” results and other challenges for sentencing research. Criminology & Public Policy, 10, 1139-1149.

Farrell, & Ward. (2011). Examining district variation in sentencing in the post-Booker period. Federal Sentencing Reporter, 23, 318-325.

Fearn, N. E. (2005). A multilevel analysis of community effects on criminal sentencing. Justice Quarterly, 22, 452-487.

Fischman, J. B., & Schanzenbach, M. M. (2011). Do standards of review matter? The case of federal criminal sentencing. The Journal of Legal Studies, 40, 405-437.

Fischman, J. B., & Schanzenbach, M. M. (2012). Racial disparities under the federal sentencing guidelines: The role of judicial discretion and mandatory minimums. Journal of Empirical Legal Studies, 9, 729-764.

Frankel. (1972). Lawlessness in sentencing. University of Cincinnati Law Review, 41, 1-54.

Freeborn, B. A., & Hartmann, M. E. (2010). Judicial discretion and sentencing behavior: Did the Feeney amendment rein in district judges? Journal of Empirical Legal Studies, 7, 355-378.

Garland. (1990). Punishment and modern society. Oxford, UK: Clarendon Press.

Gottfredson, M. R., & Gottfredson, D. M. (1987). Decision making in criminal justice (Vol. 3). New York, NY: Plenum Press.

Helms, & Jacobs. (2002). The political context of sentencing: An analysis of community and individual determinants. Social Forces, 81, 577-604.

Heydebrand, W. V., & Seron. (1990). Rationalizing justice: The political economy of federal district courts. New York: State University of New York Press.

Hofer, P. J. (2006). Immediate and long-term effects of United States v. Booker: More discretion, more disparity, or better reasoned sentences? Arizona State Law Journal, 38, 425-468.

Hofer, P. J. (2007). United States v. Booker as a natural experiment: Using empirical research to inform the federal sentencing policy debate. Criminology & Public Policy, 6, 433-460.

Johnson, B. D. (2005). Contextual disparities in guidelines departures: Courtroom social contexts, guidelines compliance, and extralegal disparities in criminal sentencing. Criminology, 43, 761-796.

Johnson, B. D., Ulmer, J. T., & Kramer, J. H. (2008). The social context of guideline circumvention: The case of federal district courts. Criminology, 46, 711-783.

Kautt, P. M. (2002). Location, location, location: Interdistrict and intercircuit variation in sentencing outcomes for federal drug-trafficking offenses. Justice Quarterly, 19, 633-671.

Kimbrough v. United States. 2007. 552 U.S. __.

Myers, M. A., & Talarico, S. M. (1987). The social contexts of criminal sentencing. New York, NY: Springer-Verlag.

Nagel, I. H., & Schulhofer, S. J. (1992). A tale of three cities: An empirical study of charging and bargaining practices under the federal sentencing guidelines. Southern California Law Review, 66, 501-566.

Paternoster. (2011). Racial disparity under the federal sentencing guidelines pre- and post-Booker. Criminology & Public Policy, 10, 1063-1072.

Savelsberg, J. J. (1992). Law that does not fit society: Sentencing guidelines as a neoclassical reaction to the dilemmas of substantivized law. American Journal of Sociology, 97, 1346-1381.

Schanzenbach, M. M., & Tiller, E. H. (2007). Strategic judging under the US sentencing guidelines: Positive political theory and evidence. Journal of Law, Economics, & Organization, 23, 24-56.

Scott. (2010). Inter-judge sentencing disparity after Booker: A first look. Stanford Law Review, 63, 1-66.

Spohn. (2005). Sentencing decisions in three US district courts: Testing the assumption of uniformity in the federal sentencing process. Justice Research and Policy, 7, 1-28.

Spohn. (2011). Unwarranted disparity in the wake of the Booker/Fanfan decision. Criminology & Public Policy, 10, 1119-1127.

Starr, S. B., & Rehavi, M. M. (2012, November 1). Racial disparity in the criminal justice process: Prosecutors, judges, and the effects of United States v. Booker (U of Michigan Law & Econ Research Paper No. 12-021).

Steffensmeier, Ulmer, J. T., & Kramer, J. H. (1998). The interaction of race, gender, and age in criminal sentencing: The punishment cost of being young, black, and male. Criminology, 36, 763-798.

Stith, & Cabranes, J. A. (1998). Fear of judging: Sentencing guidelines in the federal courts. Chicago, IL: University of Chicago Press.

Tiede, L. B. (2009). The impact of the federal sentencing guidelines and reform: A comparative analysis. Justice System Journal, 30, 34-49.

Tonry, M. H. (1995). Malign neglect. New York, NY: Oxford University Press.

Tonry, M. H. (1996). Sentencing matters. New York, NY: Oxford University Press.

Ulmer, J. T. (1997). Social worlds of sentencing: Court communities under sentencing guidelines. Albany: State University of New York Press.

Ulmer, J. T., & Johnson. (2004). Sentencing in context: A multilevel analysis. Criminology, 42, 137-177.

Ulmer, J. T., & Kramer, J. H. (1998). The use and transformation of formal decision making criteria: Sentencing guidelines, organizational context, and case processing strategies. Social Problems, 45, 248-267.

Ulmer, J. T., & Light, M. T. (2010). The stability of case processing and sentencing post-Booker. Gender, Race, and Ethnicity, 143, 143-177.

Ulmer, J. T., & Light, M. T. (2011). Disparity: Changes in federal sentencing after Booker and Gall? Federal Sentencing Reporter, 23, 333-341.

Ulmer, J. T., Light, M. T., & Kramer, J. H. (2011a). The “liberation” of federal judges’ discretion in the wake of the Booker/Fanfan decision: Is there increased disparity and divergence between courts? Justice Quarterly, 28, 799-837.

Ulmer, J. T., Light, M. T., & Kramer, J. H. (2011b). Racial disparity in the wake of the Booker/Fanfan decision: An alternative analysis to the USSC’s 2010 report. Criminology & Public Policy, 10, 1077-1118.

United States v. Booker and United States v. Fanfan. 2005. 125 S. Ct. 738.

U.S. Sentencing Commission. (2004). Fifteen years of guideline sentencing: An assessment of how well the federal criminal justice system is achieving the goals of sentencing reform. Washington, DC: Author.

U.S. Sentencing Commission. (2006). Report on the impact of United States v. Booker on federal sentencing. Washington, DC: Author.

U.S. Sentencing Commission. (2010). Demographic differences in federal sentencing practices: An update of the Booker report’s multivariate regression analysis. Washington, DC: Author.

U.S. Sentencing Commission. (2012). Report on the continuing impact of United States v. Booker on federal sentencing. Washington, DC: Author.

Wang, & Mears, D. P. (2010). A multilevel test for minority threat effects on sentencing. Journal of Quantitative Criminology, 26, 191-215.

Wooldredge. (2009). Short- versus long-term effects of Ohio’s switch to more structured sentencing on extralegal disparities in prison sentences in an urban court. Criminology & Public Policy, 8, 285-312.

Xie, Lauristen, J. L., & Heimer. (2012). Intimate partner violence in U.S. Metropolitan areas: The contextual influences of police and social services. Criminology, 50, 961-992.