The Gap-Closing Estimand: A Causal Approach to Study Interventions That Close Disparities Across Social Categories

Abstract

Disparities across race, gender, and class are important targets of descriptive research. But rather than only describe disparities, research would ideally inform interventions to close those gaps. The gap-closing estimand quantifies how much a gap (e.g., incomes by race) would close if we intervened to equalize a treatment (e.g., access to college). Drawing on causal decomposition analyses, this type of research question yields several benefits. First, gap-closing estimands place categories like race in a causal framework without making them play the role of the treatment (which is philosophically fraught for non-manipulable variables). Second, gap-closing estimands empower researchers to study disparities using new statistical and machine learning estimators designed for causal effects. Third, gap-closing estimands can directly inform policy: if we sampled from the population and actually changed treatment assignments, how much could we close gaps in outcomes? I provide open-source software (the R package gapclosing) to support these methods.

Keywords

causal inference, disparities, stratification, race, class, gender

Introduction

Gaps in socioeconomic outcomes are among the strongest indictments of inequality. For example, the net worth of the median black household in America in 2016 was 10 cents for every dollar held by the median white household (McIntosh et al., 2020). Among those employed full-time in 2014, the median woman earned 79 cents for every dollar earned by the median man (Blau and Kahn, 2017). People raised by high-income parents have higher incomes themselves as adults (Chetty et al., 2014). These disparities are compelling in part because they are simple: the difference in a summary statistic across populations. Yet having done the important work of describing a gap, one might turn to a further object of research: how we could intervene to close that gap.

Learning how to close a gap requires one to think about causal counterfactuals. What racial wealth gap would remain if access to homeownership were equalized? What sex gap in pay would remain if men and women were assigned equitably to occupations? What income mobility would be observed if access to college were equalized? Each of these questions involves a disparity across categories (race, sex, family origin) that would persist under counterfactual assignment of some treatment (homeownership, occupation, college). For some treatments (e.g., occupation), gap-closing estimands speak to scientific understanding. For instance, we can assess the degree to which the sex gap in pay is a causal consequence of occupational segregation. Other treatments (e.g., access to college) may be directly amenable to policy manipulation. Evidence about these treatments would directly inform interventions to promote equality.

A gap-closing estimand places categories of race, class, and gender within a causal framework. It begins with a set of units labeled by categories (e.g., race). Then it posits that we could take a sample of those units from the population and conduct a counterfactual intervention to a treatment variable (e.g., attaining a college degree). The gap-closing estimand is the expected disparity in an outcome (e.g., income) across categories of units in that sample who receive that counterfactual intervention to the treatment (Figure 1). This way of thinking about the problem is useful for three reasons. First, we never have to posit a counterfactual manipulation of the category itself (e.g., race), thereby avoiding the pitfalls of estimating the causal effect of seemingly immutable characteristics (Holland, 1986; Kohler-Hausmann, 2018). Second, by focusing on what would happen if the treatment were applied to a sample, we avoid the interference problems that would arise from giving treatment to the full population. To know what would happen in a sample, we do not need formal models of interference and can directly apply standard statistical and machine learning procedures. Third, this framing is useful for policy: we cannot change people’s race, but we can send them to college.

Figure 1.

The gap-closing estimand uses observational data to emulate a hypothetical experiment. Social categories (e.g., class origin) are denoted by $X$ and are conceptualized as collections of units: sets of observations over which we seek to estimate a disparity. We never consider the outcome that a unit of one category (e.g., a circle) would realize if they were counterfactually of another category (e.g., a diamond). Instead, we consider sampling from the population and intervening to set a manipulable treatment variable $T$ to some value $t$ . We would then observe the potential outcomes $y_{i} (t)$ and $y_{j} (t)$ for individuals in each category and the disparity across categories. The gap-closing estimand is the expected disparity over hypothetical samples $S$ . The target trial clarifies the intervention at the core of the claim and the scope of the intervention—applied to a sample $S$ rather than the full population $P$ . The notion of formalizing causal claims with respect to a target trial comes from Hernán and Robins (2016).

The gap-closing estimand complements other research approaches. For example, assessments of discrimination necessarily involve the causal effect of the category: when someone perceived as black is not hired for a job, would they have been hired if the prospective employer had perceived them as white? Audit studies assess discrimination by experimentally manipulating signals of a category like race while holding all else constant (Pager, 2003; Bertrand and Mullainathan, 2004). Doing so identifies the causal effect of one signal of that category, such as the name at the top of one’s resume (Kaufman, 2008; Greiner and Rubin, 2011; Sen and Wasow, 2016). But audit studies do not capture other aspects of race, such as how racism produces barriers to educational opportunity that create inequality in the content of resumes across racial categories. A similar argument would hold when the category is gender. In order to make a social category manipulable, audit studies reduce the operationalization to one small slice of the broader social and historical construct (Omi and Winant, 1994; Sen and Wasow, 2016; Kohler-Hausmann, 2018). Doing so is worthwhile: causal evidence of discrimination is extremely important. Likewise, descriptive research about disparities is extremely important. Gap-closing estimands complement each of those existing approaches: what disparity would persist under an intervention to a treatment variable?

Gap-closing estimands are rooted in causal decomposition analyses in epidemiology (VanderWeele and Robinson, 2014; Jackson and VanderWeele, 2018) and are related to causal perspectives on fairness (Zhang and Bareinboim, 2018). But they apply in a much wider range of settings. The paper proceeds in several sections. First, I argue that gap-closing estimands formalize a widespread problem. Second, I define gap-closing estimands in the potential outcomes framework and discuss two important considerations: the scope of the claim and the credibility of the intervention. Third, I present causal assumptions. Fourth, I present estimation strategies. Fifth, I illustrate the method by extending previous work that described gaps in pay by class origins (Laurison and Friedman, 2016). After detailing the method and how it can be applied, I present more detailed connections to previous research. I then conclude with implications for social science practice: gap-closing estimands would free researchers to make explicitly causal claims about interventions that could close gaps across social categories of race, gender, and class, thereby promoting transparency about research goals and assumptions as well as improving the relevance of social science to policy.

Gap-Closing Estimands Address Questions of Widespread Interest

Researchers frequently consider disparities across social categories. They also routinely consider how those disparities would change in some counterfactual scenario. Gap-closing estimands are therefore already implicit in the goals of existing research, especially in the study of race, class, and gender.

Those who study race frequently explore the degree to which racial inequality would be different under alternative institutional arrangements. Western (2006) studies the role of incarceration in the racial earnings gap among men. Although incarceration substantially harms earnings, it is rare enough that “the difference in earnings between blacks and whites would be reduced only by about 3 percent if the incarceration rate were zero,” (Western 2006:127). This is a gap-closing estimand because it involves descriptive components (the initial racial gap and the proportion incarcerated) but turns crucially on a causal effect (the effect of incarceration on the earnings of black and white men). Ciocca Eller and DiPrete (2018) examine the black-white gap in college degree completion. They conclude “if black students matched to colleges in the same way as white students with similar backgrounds, their dropout rate would decrease from 50.4 to 47.5 percent, and the BA attainment rate would increase from about 19 to 20 percent,” (Ciocca Eller and DiPrete 2018:1194). Some studies make explicit the fact that the estimand of interest involves the causal effect of some treatment. Killewald and Bryan (2016:123) identify the effect of home ownership on wealth and then conduct a simulation to show that “altering their [blacks’] homeownership experiences to be comparable to those of whites would substantially narrow race gaps in midlife wealth.” Each interpretation above appeals to the notion that gaps would change if an intervention occurred to some policy-amenable variable.

Studies of social class likewise routinely involve questions about the gap that would persist across class origins (some operationalization of one’s family background) if we intervened to help people attain some treatment that could potentially break them free from the constraints of birth. In the example to which this paper returns repeatedly, Laurison and Friedman (2016) study the earnings of British workers in higher managerial and professional occupations. Within this category, the authors examine the class gap in pay between the intergenerationally stable (those whose parents were in the professional class) and the upwardly mobile (those whose parents were in the working class). The authors control for a series of variables that might be consequences of class origins (parents’ occupational class) but causes of class destinations (own occupational class). One could interpret that research goal causally as a gap-closing estimand: the difference in pay between those of professional- and working-class backgrounds, if we intervened to lift them personally to the professional class. Similarly, studies about the role of education in social mobility often involve implicit claims about the gap-closing estimand between those of different family origins if we intervened to send them to college (Hout, 1988; Torche, 2011; Zhou, 2019).

Gender inequality likewise begets claims about how gender gaps would be different under alternative conditions. For instance, the gender wage gap within jobs is very small. Petersen and Morgan (1995:338) motivate this estimand in language that closely resembles a gap-closing estimand: “Suppose sex segregation—by occupation, establishment, or occupation-establishment—were abolished; what then would the remaining gender relative wages be?” The gender wage gap is purely a descriptive quantity, but the hypothetical intervention of equalizing the distribution across occupations and establishments invokes a causal effect of occupations and establishments on wages. Given persistent gender segregation across occupations with unequal pay (Blau and Kahn, 2017), this gap-closing estimand remains central to the study of gender wage inequality today.

Many of the examples above explicitly state that the research goal is descriptive rather than causal. On one hand, that caveat is correct: absent causal assumptions, observational evidence is necessarily descriptive. Yet one reason these claims are compelling is that the descriptive evidence points toward a possible causal claim: if we intervened on the treatment, the gap might close. For researchers who want to make that causal claim, the gap-closing estimand provides a method to do so.

Define the Goal: Gap-Closing Estimands Provide a Causal Framework for Social Categories

A gap-closing estimand explicitly appeals to a counterfactual world in which a treatment variable was reassigned. Because it is a world that does not exist, great care is needed to define that world. This section begins by defining gap-closing estimands with fixed treatment assignments (assign treatment value $t$ ) before turning to stochastic treatment assignments (assign a treatment value $t$ selected at random from a set $T$ ). Then, I discuss two important considerations when defining the goal: choice of a credible scope and choice of a realistic intervention. In the framework of Lundberg et al. (2021), this section defines the theoretical estimand and presents some considerations for choosing a theoretical estimand which is empirically tractable.

To make the gap-closing estimand precise, it is helpful to follow the advice of Hernán and Robins (2016) and specify a target trial: the hypothetical experiment which we hope to approximate by analyzing observational data. The motivation for a target trial comes from experimental settings, where the protocol for assigning the treatment makes the research goal unambiguous. Randomization is not possible in observational settings, yet we can still gain clarity about the research goal by specifying the procedure we would like to apply if it were possible. Figure 1 presents a target trial for the gap-closing estimand. Suppose you draw a sample $S$ from a population $P$ . Then, you intervene to assign each sampled unit $i$ to treatment value $T_{i} = t$ and observe the outcome $y_{i} (t)$ under that treatment. You then calculate a disparity ${\bar{y}}_{S, x^{'}} (t) - {\bar{y}}_{S, x} (t)$ in the mean outcomes of units in sample $S$ from the categories $X = x^{'}$ and $X = x$ , under exposure to treatment value $t$ . That disparity is a random variable because it averages over a random sample $S$ from the population $P$ . The gap-closing estimand is the expected disparity over hypothetical samples $S$ .

$$τ_{x^{'}, x} (t) = E \left( {\bar{y}}_{S, x^{'}} (t) - {\bar{y}}_{S, x} (t) \right) \tag{1}$$

The right column of Figure 1 makes this concrete. Suppose we begin with those raised in professional and working-class families, defined by the occupation of one’s father figure ( $X = x^{'}$ and $X = x$ ). Then we take a sample from the population and assign people in that sample to personally hold a professional occupation as an adult ( $T = t$ ), thus cutting the intergenerational transmission of class attainment. The gap-closing estimand is the expected pay disparity by class origin in the sample that would result from this hypothetical procedure.
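The target trial can be mimicked in a toy simulation. The sketch below is only illustrative: the population, the pay levels, and the treatment effects are invented, and the simulation has access to every potential outcome, which real data never provide. Under those assumptions, it draws repeated samples, assigns everyone in each sample to $T = 1$, and averages the resulting disparity across samples.

```python
import random
from statistics import mean

random.seed(0)  # reproducible illustration

def make_unit():
    """One hypothetical unit: a class origin x and potential pay y[t] for t in {0, 1}."""
    x = random.choice(["working", "professional"])
    base = 30.0 if x == "working" else 40.0
    gain = 15.0 if x == "working" else 10.0  # assumed effect of the treatment
    return {"x": x, "y": {0: base + random.gauss(0, 5),
                          1: base + gain + random.gauss(0, 5)}}

population = [make_unit() for _ in range(20000)]

def sample_disparity(pop, t, n, x_prime="professional", x="working"):
    """Draw a sample S, set T = t for everyone in S, and compare category means."""
    sample = random.sample(pop, n)
    mean_x_prime = mean(u["y"][t] for u in sample if u["x"] == x_prime)
    mean_x = mean(u["y"][t] for u in sample if u["x"] == x)
    return mean_x_prime - mean_x

# The estimand is the expectation of the sample disparity over repeated samples S.
tau_hat = mean(sample_disparity(population, t=1, n=200) for _ in range(500))
```

By construction, the disparity in this invented population is 10 without the intervention and 5 with it, so `tau_hat` should land near 5. Each sample is small relative to the population, in keeping with the scope of the target trial.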

Sometimes our theoretical question might not involve assigning everyone to a single treatment condition. Instead, we might want to know what would happen if we shuffled the treatment assignments (possibly as a function of covariates) while keeping their relative prevalence fixed. The gap-closing estimand extends to this type of stochastic treatment assignment rule. In the target trial, a unit $i$ would be selected as part of the sample, the researcher would observe their covariate values ${\vec{L}}_{i} = \vec{ℓ}$ (e.g., education is a college degree), and then the researcher would assign them to a treatment value $t$ (e.g., an occupational class) probabilistically from a set of many possible treatment values $T$ (e.g., professional and working-class) according to a set of probabilities $\{π (t, \vec{ℓ})\}_{t \in T}$ . In this notation, there is one treatment probability $π (t, \vec{ℓ})$ for each treatment value $t$ among all candidate treatment values $T$ , and those probabilities could be functions of the observed covariate values $\vec{ℓ}$ . For instance, perhaps we would counterfactually assign professional occupations with higher probabilities to those with college degrees. Each unit $i$ has an expected outcome of this experiment: the average of the set of potential outcomes $\{y_{i} (t)\}_{t \in T}$ , where each $y_{i} (t)$ is the outcome that unit $i$ would realize under treatment value $t$ , with each outcome weighted by the counterfactual probability $π (t, {\vec{L}}_{i})$ of assigning that treatment value. I use ${\bar{y}}_{i} (π)$ to denote this average for unit $i$ under treatment assignment rule $π$ .

$${\bar{y}}_{i} (π) = \sum_{t \in T} π (t, {\vec{L}}_{i}) \, y_{i} (t) \tag{2}$$

Given the unit-specific expected outcome ${\bar{y}}_{i} (π)$ , let ${\bar{y}}_{S, x} (π)$ denote the mean of that unit-specific expected outcome among units $i$ in the sample $S$ for whom $X_{i} = x$ . The gap-closing estimand under a stochastic assignment rule, $τ_{x^{'}, x} (π)$ , is then analogous to equation (1) but with the assignment rule $π$ replacing the fixed treatment value $t$ everywhere it appears. Instead of equalizing at a single treatment value, we are equalizing the distribution from which a treatment is drawn. This idea appears in VanderWeele and Robinson (2014) and is discussed in greater depth by Jackson (2018).
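A small numeric sketch may help fix ideas about the stochastic rule. Everything in it is hypothetical: the four units, their potential outcomes, and the rule `pi` (which assigns professional occupations with probability 0.8 to college graduates and 0.2 to others) are invented for illustration. The code computes each unit's expected outcome as a probability-weighted average of its potential outcomes and then compares category means within one sample.

```python
from statistics import mean

def expected_outcome(potential, probs):
    """Average of the potential outcomes y_i(t), weighted by pi(t, l_i)."""
    assert abs(sum(probs.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(probs[t] * potential[t] for t in probs)

def pi(l):
    """Hypothetical rule: professional jobs go more often to college graduates."""
    p = 0.8 if l == "college" else 0.2
    return {"professional": p, "working": 1.0 - p}

# Hypothetical sample S: category x, covariate l, and both potential outcomes y
sample = [
    {"x": "prof_origin", "l": "college",
     "y": {"professional": 50.0, "working": 30.0}},
    {"x": "prof_origin", "l": "no_college",
     "y": {"professional": 40.0, "working": 25.0}},
    {"x": "work_origin", "l": "college",
     "y": {"professional": 45.0, "working": 28.0}},
    {"x": "work_origin", "l": "no_college",
     "y": {"professional": 35.0, "working": 20.0}},
]

def group_mean(sample, x):
    """Mean of the unit-specific expected outcomes among units with X = x."""
    return mean(expected_outcome(u["y"], pi(u["l"])) for u in sample if u["x"] == x)

disparity = group_mean(sample, "prof_origin") - group_mean(sample, "work_origin")
```

Here the first unit's expected outcome is 0.8 × 50 + 0.2 × 30 = 46, and the disparity in this one sample is 37 − 32.3 = 4.7. The estimand is the expectation of that disparity over repeated samples, which this single-sample sketch does not compute.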

Regardless of whether treatments are hypothetically assigned by a fixed rule (assign one treatment value $t$ ) or a stochastic rule (assign a treatment value $t$ probabilistically from the set $T$ ), the target trial clarifies two important considerations. The first is the scope of the intervention: we are not giving treatment to the entire population $P$ , but only to a sample $S$ . The scope is most credible if $S$ is a very small fraction of $P$ , because then the treatment assigned to one unit in $S$ is unlikely to affect the outcome of another unit in $S$ since they are dispersed over a much larger population $P$ and thus less likely to interact. The second is the degree to which the intervention is realistic: we have to think carefully about whether each unit can realistically be assigned to each treatment $T = t$ .

Consideration 1: Make a Claim With Credible Scope

It can be tempting to interpret a gap-closing estimand in terms of a global claim: what would happen if every unit in the entire population received treatment value $t$ (Figure 2). The examples highlighted previously dealt with global interventions to set the incarceration rate to zero (Western 2006: 127), to abolish occupational sex segregation (Petersen and Morgan 1995: 338), or to create “a utopian world of ‘college for all’,” (Zhou 2019: 466). All of these claims involve interventions that simultaneously alter the treatment assignments of vast swaths of the population. It is difficult to provide empirical support for these global claims because the required assumptions are particularly stringent.

Figure 2.

Clarify the scope of the intervention. A gap-closing estimand invokes a counterfactual world where treatment is reassigned for a sample $S$ from a population $P$ (Figure 1). Inference about a world where treatment is reassigned for the entire population ( $S = P$ ) is often only speculative. That global extreme is not empirically tractable unless one has strong theory about interference among units that can be encoded in a formal model. In practice, we often lack this theory. Thankfully, the policy-relevant claim is much more local: it is often infeasible for a policymaker to change treatment for the entire population at once. A claim relevant to policy is thus one about what would happen if treatment is reassigned for a very small sample drawn from the population. In the local extreme, that sample might be only one unit from each category. Empirical evidence speaks to the local extreme, and interpretations become increasingly speculative as one moves toward the global extreme.

To take one example, only 35 percent of the U.S. population ages 25 and older in 2018 held a bachelor’s degree or higher (author’s tabulation from Table 3 in U.S. Census Bureau, 2019). A world in which everyone attended college would be radically different from the world in which we live. A college degree would no longer have the same meaning. Past studies of education expansion have shown that elites find ways to maintain their advantages (Raftery and Hout, 1993; Lucas, 2001) and that the benefits of being in the pool of college graduates may decline as the size of that pool grows (Horowitz, 2018). Thus, the outcome realized under a college degree depends on how many other people are also assigned to a college degree. There is a serious problem of interference: intervening to change the treatment value of unit $i$ might change the effectiveness of the treatment for another unit $j$ . We might credibly be able to infer what would happen if treatment were expanded to a few units sampled from a much larger population, so that they would not interfere with each other. But to predict what would happen if everyone received the treatment would require formal theory about interference, which might border on speculation. Often, it would be better to reduce the scope of the claim.

The interference concerns that arise with global claims involve violations of an assumption which is often overlooked: the consistency assumption (Hernán and Robins, 2020). The consistency assumption defines the potential outcomes $y_{i} (t)$ . One setting which violates consistency is interference, where the potential outcome $y_{i} (t)$ is defined as a function of the treatment value $t$ assigned to unit $i$ but in fact unit $i$ responds to the treatments assigned to other units. Sometimes this is termed a violation of the Stable Unit Treatment Value Assumption (Imbens and Rubin, 2015), and it is often emphasized in other substantive contexts, as when an experiment is carried out on units embedded in social networks (e.g., Aronow et al., 2017). Yet interference is also in play when social interactions are less explicit. This is especially true in sociological settings where giving everyone the treatment would change the effectiveness of the treatment. Population-level interpretations of gap-closing estimands lean heavily on an individualistic definition of potential outcomes in a setting where that definition is particularly imperfect.

Researchers concerned with global claims have at least three options. One option is to theorize how the system would change under the intervention and encode that theory into a formal model like an agent-based simulation (Jackson and Arah, 2020). Yet, doing so requires strong social theory about how units interfere with each other, which may not be available in some substantive settings. A second option is to select an estimand for which the interactions among units may be less severe. For example, one of the most severe threats to validity arises when the effectiveness of the treatment is a function of the proportion of the population that receives the treatment. This is the case, for example, when college helps you secure access to an occupation for which the number of available positions is limited. This particularly severe threat can be averted by shifting to a stochastic estimand that keeps the marginal distribution of the treatment at its observed distribution. We might be more willing to speculate about a world where college degrees were randomly shuffled among the population (assignment rule $π (t, \vec{ℓ}) = P (T = t)$ ), thus changing who holds the degree but not changing the prevalence of the degree. There is still a leap to global inference: the data only speak to the local claim. But with stochastic interventions a researcher may be more willing to make that leap because the global counterfactual is more like the observed world, at least in the marginal distribution. Global claims are always speculative, but stochastic interventions may help to improve credibility.

In many settings, a third option is most promising: make a local (rather than global) version of the claim. Following this path, a gap-closing estimand is the expected result of a hypothetical experiment: if we sample a small fraction of the population and assign them to the treatment, what disparity would we expect to observe in that sample? The local extreme of Figure 2 is a conceptually helpful edge case: suppose we sample one unit from each category and carry out the intervention on only those two units. In that limiting case, interference problems are unlikely—if the treatment is a college degree, assigning two people to receive college degrees would not change the meaning of the treatment. Moving back from the limiting case, we may often be able to credibly infer what would happen if treatment were provided to a small sample from the population: if that small sample is randomly spread throughout the much larger population, the risk of interference is reduced. That empirically tractable target is also relevant to certain types of policies. Policymakers generally cannot intervene on the entire population at once, so what would happen if treatment is provided to a sample may be what the policymaker wants to know. At least in principle, the researcher could update the evidence with new observational analyses conducted in real time as the policy expands to ever-greater shares of the population.

Consideration 2: Define a Realistic Intervention

One can speculate about the disparity under any treatment assignment $T = t$ . But that speculation is only useful in the real world if the treatment assignment $T = t$ is plausible for each unit in question. The pre-treatment characteristics of an individual can make certain treatments implausible. For example, someone without a high school degree is factually unlikely to hold a professional occupation (e.g., as a doctor). It would be better to rule out this type of implausible intervention when defining a gap-closing estimand. Stochastic interventions $π (t, \vec{ℓ})$ are extremely helpful in this regard: the researcher can define the assignment rule such that the probability of treatment $T = t$ is zero in subpopulations $\vec{L} = \vec{ℓ}$ for whom treatment value $t$ is implausible (see Nguyen et al., 2020 footnote 22 and the discussion of positivity in the identification section of Jackson, 2021).

When defining a realistic intervention, researchers may face a tradeoff between an equitable intervention and a credible intervention. There are at least three options. First, it might be most equitable to assign treatment irrespective of the covariates $\vec{L}$ , thus giving everyone in the population equal access to each treatment. But doing so risks the implausible counterfactuals discussed above. Second, the researcher could allow $π (t, \vec{ℓ})$ to be a function of some of the covariates in $\vec{ℓ}$ , such as education. Jackson (2021) outlines how researchers could distinguish between one set of covariates that we would allow to determine treatment assignment (e.g., education) and non-allowable covariates over which unequal allocation of treatments might be considered especially discriminatory (e.g., race, sex). In some settings, one may be able to avoid implausible treatment assignments by making the counterfactual assignments $π (t, \vec{ℓ})$ a function only of allowable covariates. Third, the researcher could accept all covariates as allowable and assign treatment as a function of all covariates. For example, one could imagine shuffling treatments across the category $X$ but maintaining the distribution of treatments within each covariate subgroup $\vec{L} = \vec{ℓ}$ . Doing so would maximize credibility (covariate subgroups are assigned the treatments as observed in that subgroup, thus ruling out implausible assignments) but may come at the cost of equity (it may be unjust to assign treatment as a direct function of some covariates). The possible tradeoff between equity and credibility is the kind of substantive debate one can only have after entering a causal framework with stochastic interventions.
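The three kinds of assignment rule discussed above can be written as one small function that differs only in which covariates it conditions on. The sketch below rests on assumptions of my own: the eight records are invented, region stands in for a covariate one might treat as non-allowable, and each rule is set equal to an observed treatment share (a researcher could instead specify the probabilities directly).

```python
from collections import Counter, defaultdict

# Hypothetical observed records: (education, region, treatment)
records = [
    ("college", "north", "professional"),
    ("college", "north", "professional"),
    ("college", "south", "professional"),
    ("college", "south", "working"),
    ("no_college", "north", "working"),
    ("no_college", "north", "working"),
    ("no_college", "south", "working"),
    ("no_college", "south", "working"),
]

def conditional_rule(records, key):
    """Build pi(t, l) as the observed share of treatment t given key(l)."""
    counts = defaultdict(Counter)
    for *l, t in records:
        counts[key(tuple(l))][t] += 1

    def pi(t, l):
        cell = counts[key(tuple(l))]
        return cell[t] / sum(cell.values())

    return pi

# Rule 1: ignore all covariates (most equitable, least credible).
pi_marginal = conditional_rule(records, key=lambda l: ())
# Rule 2: condition only on an allowable covariate (here, education).
pi_allowable = conditional_rule(records, key=lambda l: l[0])
# Rule 3: condition on all covariates (most credible, possibly least equitable).
pi_all = conditional_rule(records, key=lambda l: l)
```

In these invented data, the marginal rule gives someone without a college degree a 0.375 chance of a professional occupation, while both covariate-based rules give that person a probability of zero. That contrast is the equity and credibility tradeoff in miniature.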

Identify the Estimand: Causal Assumptions are Agnostic About the Effect of Social Categories

Like causal effects, gap-closing estimands involve potential outcomes that are not observed for some units (Holland, 1986). This section focuses on the link between the theoretical estimand and an empirical estimand defined in terms of observable data (Lundberg et al., 2021). The identification assumptions for gap-closing estimands are the same as those for estimating causal effects and are untestable: they must be defended on conceptual grounds. These assumptions include consistency, conditional mean independence, and positivity (following the terminology of Hernán and Robins, 2020). Consistency defines the potential outcomes and equates the observed outcome with the potential outcome under the observed treatment condition ($Y_{i} = y_{i} (T_{i})$). Conditional mean independence requires that the treatment $T$ is unconfounded given a set of observed covariates (see Figure 3). Positivity requires that treatment value $t$ occurs with non-zero probability in each stratum of those observed variables; without positivity, we might not observe any individuals from whom to learn the potential outcomes. The Appendix provides more details about these assumptions and proves that they are sufficient for identification.
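Although positivity is an assumption about the population, a researcher can at least list the strata of the observed data in which no unit has the treatment value of interest. The following sketch is a descriptive diagnostic only, not a test of the assumption; the function name and the records are hypothetical.

```python
from collections import defaultdict

def strata_missing_treatment(rows, t):
    """Return the (x, l) strata in which treatment value t is never observed."""
    seen = defaultdict(set)
    for x, l, treatment in rows:
        seen[(x, l)].add(treatment)
    return sorted(stratum for stratum, values in seen.items() if t not in values)

# Hypothetical records: (class origin x, education l, occupational class t)
rows = [
    ("working", "college", "professional"),
    ("working", "college", "working"),
    ("working", "no_college", "working"),
    ("professional", "college", "professional"),
    ("professional", "no_college", "professional"),
    ("professional", "no_college", "working"),
]

gaps = strata_missing_treatment(rows, t="professional")
```

In these invented records, no one of working-class origin without a college degree holds a professional occupation, so the data contain no unit from whom to learn that stratum's potential outcome under the professional treatment.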

Figure 3.

Identification of gap-closing estimands. Observed variables include the social category $X$ (e.g., race, class, gender), the manipulable treatment variable $T$ (e.g., college completion, occupational attainment), and other pre-treatment covariates $\vec{L}$ . Nodes $U$ and $V$ are unobserved. The blue $T \to Y$ edge represents the causal effect that must be identified. The dashed red edges in Panels D and E represent threats to identification. DAGs present assumptions that are stronger than needed; identification is possible under the slightly weaker assumption of conditional mean independence for the potential outcome $Y (t)$ under the treatment value $t$ of interest (see Appendix). A gap-closing estimand is identified under a wide range of assumptions about the social category $X$ (e.g., race, class, gender). Above, there is no causal effect of $X$ (Panel B), or it is not identified (Panels A and C) due to the backdoor path $X \leftarrow U \to Y$ through unobserved $U$ . The gap-closing estimand is nonetheless identified. A gap-closing estimand is not identified when $T \to Y$ is not identified.

Figure 3 illustrates key issues using Directed Acyclic Graphs (DAGs, Pearl, 2000).¹ The central benefit of gap-closing estimands is that counterfactuals are defined over hypothetical interventions to the treatment $T$ and not over the gap-closing category $X$ , thereby allowing a range of assumptions about that category. For example, suppose the category $X$ is race. Race may be understood as assigned prior to all other variables, so that covariates $\vec{L}$ are consequences of race (Panel A, see Sen and Wasow, 2016). Race may be understood to have no causal effects (Panel B, see Rubin, 1986; Holland, 2008). The covariates may affect how one racially identifies (Panel C, see Saperstein and Penner, 2012). In all of these cases, the gap-closing estimand is still identified because it involves the causal effect of the treatment, not the causal effect of race. The social category is understood agnostically as a marker of two collections of units across which we seek to summarize a gap.

The settings that threaten causal identification for gap-closing estimands are the same settings that threaten identification for the causal effect of the treatment $T$ . If a backdoor path cannot be blocked due to an unobserved variable $V$ that affects both the treatment and the outcome, then the gap-closing estimand is not identified (Panel D). Confounding can also arise through $M$ -bias (Panel E; Greenland et al., 1999; Greenland, 2003). If the pre-treatment covariates $\vec{L}$ are causally downstream from both an unobserved variable $U$ that affects the outcome directly and an unobserved variable $V$ that affects the treatment directly, then $\vec{L}$ is a collider variable and conditioning on it can open a backdoor path $T \leftarrow V \to \vec{L} \leftarrow U \to Y$ (Elwert and Winship, 2014). This possibility illustrates the need for a clear translation between our beliefs about the causal structure of the world and the mathematical assumptions required for identification. Making the research goal explicitly causal creates new opportunities for clear reasoning about these assumptions.

Estimation: Learn From Data

The central hurdle for estimation is that some potential outcomes $y_{i} (t)$ are unobserved. If the identification assumptions hold, we can learn the average of those potential outcomes for all units by examining the units we actually observe with treatment value $T_{i} = t$ , within each subpopulation with a particular category value $x$ and covariate set $\vec{ℓ}$ . Then, we can aggregate those subpopulation estimates across the population distribution of ${X, \vec{L}}$ . In a limited sample size, however, each of those subpopulations may have very few observations. Statistical models and machine learning tools improve efficiency by sharing information across subpopulations. Choices about how to share information correspond to the estimation step of Lundberg et al. (2021).

To simplify the discussion of estimation, it is helpful to break the gap-closing estimand into two components, which I call post-intervention means.

$$τ_{x, x^{'}} (π) = θ_{x} (π) - θ_{x^{'}} (π) \tag{3}$$

Once you have an estimator for each post-intervention mean $θ$ , you have by extension an estimator for the gap-closing estimand $τ$ . For simplicity, this section therefore focuses on estimators for post-intervention means: the expected outcome in a category $X_{i} = x$ under assignment to treatment by a rule $π$ . I first discuss estimation by predicted outcomes (e.g., regression modeling) and by predicted treatment probabilities (e.g., propensity score weighting). Then, I discuss how these two approaches can be brought together in a doubly robust estimator that is consistent if either estimator is consistent. The doubly robust estimator can be interpreted as either estimator paired with a bias correction. Double robustness becomes especially helpful when the prediction functions for treatment probabilities and outcomes are estimated by machine learning, which is biased by regularization. In that setting, it is also useful to carry out the bias correction of the doubly robust estimator with a sample splitting step. The end of the section briefly discusses standard error estimation by resampling strategies.

Estimation by Predicted Outcomes

One estimation approach relies on a function $g ()$ for predicting the outcome variable.

Outcome prediction function:

$$g (t, x, \vec{ℓ}) = E (Y \mid T = t, X = x, \vec{L} = \vec{ℓ}) \tag{4}$$

The true conditional mean function $g ()$ could be approximated by any statistical or machine learning algorithm (e.g., OLS) to predict an outcome $Y$ as a function of $T$ , $X$ , and $\vec{L}$ . If estimation is conducted on a complex survey sample, one should use survey weights in estimation of $g ()$ .

A function to predict outcomes can be converted to an estimate of the gap-closing estimand by an approach known as the parametric $g$ -formula (Robins, 1986; Hernán and Robins 2020:166). The researcher uses the estimated function to predict unobserved potential outcomes with estimates $\hat{g} ()$ , thereby imputing the potential outcome under treatment value $t$ for every unit regardless of whether that particular unit is observed in this treatment condition. It then becomes possible to aggregate over all observations by a weighted mean (equation (5)), where $η_{i}$ are inverse probability of sampling weights and $η_{x}$ is the sum of these weights among sampled units in category $X = x$ . After estimating the post-intervention means ${{\hat{θ}}_{x^{'}}^{Outcome}, {\hat{θ}}_{x}^{Outcome}}$ in each category ${x^{'}, x}$ , the difference is an estimate of the gap-closing estimand.
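The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation in the gapclosing package: the data are simulated with a hypothetical binary category, binary confounder, and binary treatment; $\hat{g} ()$ is estimated by weighted cell means rather than a regression; and the intervention sets $T = 1$ for everyone. All function names are invented for this example.

```python
import random
from collections import defaultdict

random.seed(0)

def simulate(n):
    """Hypothetical data; eta is the sampling weight (equal weights here)."""
    rows = []
    for _ in range(n):
        x = int(random.random() < 0.5)                   # category
        l = int(random.random() < (0.7 if x else 0.3))   # confounder
        t = int(random.random() < (0.8 if l else 0.2))   # treatment
        y = 1.0 * t + 2.0 * l + random.gauss(0, 1)       # outcome
        rows.append({"x": x, "l": l, "t": t, "y": y, "eta": 1.0})
    return rows

def fit_g(rows):
    """Estimate g(t, x, l) by weighted cell means (a saturated model)."""
    num, den = defaultdict(float), defaultdict(float)
    for r in rows:
        key = (r["t"], r["x"], r["l"])
        num[key] += r["eta"] * r["y"]
        den[key] += r["eta"]
    return {k: num[k] / den[k] for k in num}

def theta_outcome(rows, g_hat, x, t):
    """Weighted mean of imputed outcomes under 'set T = t' within category x."""
    sub = [r for r in rows if r["x"] == x]
    eta_x = sum(r["eta"] for r in sub)
    return sum(r["eta"] * g_hat[(t, x, r["l"])] for r in sub) / eta_x

data = simulate(20000)
g_hat = fit_g(data)

# Gap that would remain across categories if everyone received T = 1
tau_hat = theta_outcome(data, g_hat, 1, 1) - theta_outcome(data, g_hat, 0, 1)
print(round(tau_hat, 3))
```

In this simulated data generating process the true gap under the intervention is 0.8 (the confounder is more prevalent in category 1 and raises the outcome by 2), so the estimate should land close to that value.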

Estimation by Predicted Treatment Probabilities

Instead of predicting the unobserved outcomes, one can also reweight the observed outcomes to draw inference about the average outcome under treatment in the population. This approach begins with a function $m ()$ to predict the probability of treatment, which could be estimated by any statistical or machine learning algorithm (e.g., logistic regression).

Treatment prediction function:

$$m (t, x, \vec{ℓ}) = P (T = t \mid X = x, \vec{L} = \vec{ℓ}) \tag{6}$$

The task then becomes analogous to sampling from a population. We only observe the outcome $y_{i} (t)$ for units who factually have $T_{i} = t$ (analogous to only observing outcomes for sampled units). We estimate the probability that $T_{i} = t$ from the prediction function above (analogous to the known probability of sample inclusion in a survey). Generalized propensity score methods (Imbens, 2000) draw on this analogy to estimate the population-average outcome under treatment $T = t$ by the sample mean with inverse probability of treatment weights, analogous to similar estimators for population means weighted by inverse sampling probabilities (Horvitz and Thompson, 1952). For the more general setting of a stochastic assignment rule to set $t$ to a value selected probabilistically from the set $T$ , the inverse probability weights involve the ratio of the assignment probability for the factual treatment $T_{i}$ under the counterfactual rule $π (T_{i}, {\vec{L}}_{i})$ and the factual rule $m (T_{i}, X_{i}, {\vec{L}}_{i})$ . The estimate is a weighted average of the observed outcomes (equation (7)), where $η_{i}$ accounts for the unequal sampling probabilities for units from the population. An estimate $\hat{m} ()$ of the generalized propensity score thus translates to estimates ${{\hat{θ}}_{x^{'}}^{Treatment}, {\hat{θ}}_{x}^{Treatment}}$ of the post-intervention means, which can be differenced to estimate the gap-closing estimand.
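A minimal sketch of the weighting approach follows, under assumptions of my own: the data are simulated with hypothetical binary variables, $\hat{m} ()$ is estimated by weighted cell proportions, and the weighted average is normalized by the sum of the weights (one common convention, assumed here rather than taken from the source). The rule `set_to_one` is a hypothetical counterfactual rule $π$ that assigns $T = 1$ with probability 1.

```python
import random
from collections import defaultdict

random.seed(0)

def simulate(n):
    """Hypothetical data; eta is the sampling weight (equal weights here)."""
    rows = []
    for _ in range(n):
        x = int(random.random() < 0.5)
        l = int(random.random() < (0.7 if x else 0.3))
        t = int(random.random() < (0.8 if l else 0.2))
        y = 1.0 * t + 2.0 * l + random.gauss(0, 1)
        rows.append({"x": x, "l": l, "t": t, "y": y, "eta": 1.0})
    return rows

def fit_m(rows):
    """Estimate m(t, x, l) = P(T = t | X = x, L = l) by weighted proportions."""
    count, total = defaultdict(float), defaultdict(float)
    for r in rows:
        count[(r["t"], r["x"], r["l"])] += r["eta"]
        total[(r["x"], r["l"])] += r["eta"]
    return {k: count[k] / total[(k[1], k[2])] for k in count}

def set_to_one(t, l):
    """Counterfactual rule pi(t, l): probability of assigning treatment value t."""
    return 1.0 if t == 1 else 0.0

def theta_treatment(rows, m_hat, x, pi):
    """Weighted mean of observed outcomes with weights eta * pi / m_hat."""
    sub = [r for r in rows if r["x"] == x]
    w = [r["eta"] * pi(r["t"], r["l"]) / m_hat[(r["t"], r["x"], r["l"])]
         for r in sub]
    # Normalizing by the sum of weights is an assumption of this sketch
    return sum(wi * r["y"] for wi, r in zip(w, sub)) / sum(w)

data = simulate(20000)
m_hat = fit_m(data)
tau_ipw = (theta_treatment(data, m_hat, 1, set_to_one)
           - theta_treatment(data, m_hat, 0, set_to_one))
print(round(tau_ipw, 3))
```

Replacing `set_to_one` with a function that returns probabilities strictly between 0 and 1 gives a stochastic rule, in which case units in both treatment conditions receive positive weight.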

Doubly Robust Estimation

Doubly robust estimation combines predicted outcomes and treatment probabilities to produce an estimator that is consistent if either the estimator for the outcome prediction function is consistent for the true conditional mean function or the estimator for the treatment prediction function is consistent for the true conditional probability of treatment (Figure 4, see Robins et al., 1994; Bang and Robins, 2005; Glynn and Quinn, 2010).² Begin with an estimate ${\hat{θ}}_{x}^{Outcome} (π)$ based on predicted outcomes from $\hat{g} ()$ . Suppose the outcome function $\hat{g} ()$ has a functional form that is misspecified (e.g., missing an interaction term, so that $\hat{g} () \to \tilde{g} () \neq g ()$ ). For each unit, we are able to observe the error $\hat{g} (T_{i}, X_{i}, {\vec{L}}_{i}) - Y_{i}$ for the potential outcome in the observed treatment condition $T_{i}$ . By reweighting the errors by the inverse probability of treatment, we can estimate the average error in the target population (all units, regardless of their factual treatment). A doubly robust augmented inverse probability weighting estimator subtracts the estimated bias from the outcome estimator (equation (8)), where ${\hat{w}}_{i} = \frac{η_{i} \hat{π} (T_{i}, {\vec{L}}_{i})}{\hat{m} (T_{i}, X_{i}, {\vec{L}}_{i})}$ as in the treatment prediction estimator.³
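The bias correction can be illustrated with a small simulation. The sketch below makes several assumptions that are not in the source: hypothetical simulated data, an outcome model that is deliberately misspecified by ignoring the confounder, a correctly specified treatment model fit by cell proportions, an intervention that sets $T = 1$, and a bias term normalized by the sum of the weights. It is meant only to show the mechanics of subtracting the reweighted residuals.

```python
import random
from collections import defaultdict

random.seed(2)

def simulate(n):
    """Hypothetical data; eta is the sampling weight (equal weights here)."""
    rows = []
    for _ in range(n):
        x = int(random.random() < 0.5)
        l = int(random.random() < (0.7 if x else 0.3))
        t = int(random.random() < (0.8 if l else 0.2))
        y = 1.0 * t + 2.0 * l + random.gauss(0, 1)
        rows.append({"x": x, "l": l, "t": t, "y": y, "eta": 1.0})
    return rows

def cell_means(rows, key, value):
    """Weighted mean of value(r) within cells defined by key(r)."""
    num, den = defaultdict(float), defaultdict(float)
    for r in rows:
        num[key(r)] += r["eta"] * value(r)
        den[key(r)] += r["eta"]
    return {k: num[k] / den[k] for k in num}

def estimate(rows, x, t, g_hat, m_hat):
    """Return (outcome-based, doubly robust) estimates for 'set T = t' in category x."""
    sub = [r for r in rows if r["x"] == x]
    eta_x = sum(r["eta"] for r in sub)
    theta_out = sum(r["eta"] * g_hat(t, r) for r in sub) / eta_x
    # Weights are nonzero only for units whose factual treatment equals t
    w = [r["eta"] * (r["t"] == t) / m_hat(r) for r in sub]
    # Estimated bias: reweighted residuals g_hat - y (sum-of-weights normalization assumed)
    bias = sum(wi * (g_hat(r["t"], r) - r["y"]) for wi, r in zip(w, sub)) / sum(w)
    return theta_out, theta_out - bias

data = simulate(20000)

# Misspecified outcome model: ignores the confounder l
g_cells = cell_means(data, lambda r: (r["t"], r["x"]), lambda r: r["y"])
g_hat = lambda t, r: g_cells[(t, r["x"])]

# Correctly specified treatment model: P(T = 1 | x, l) by cell proportions
p_cells = cell_means(data, lambda r: (r["x"], r["l"]), lambda r: r["t"])
def m_hat(r):
    p1 = p_cells[(r["x"], r["l"])]
    return p1 if r["t"] == 1 else 1 - p1

out1, dr1 = estimate(data, 1, 1, g_hat, m_hat)
out0, dr0 = estimate(data, 0, 1, g_hat, m_hat)
tau_outcome, tau_dr = out1 - out0, dr1 - dr0
print(round(tau_outcome, 3), round(tau_dr, 3))
```

The true value in this data generating process is 0.8. The outcome-only estimate should be visibly off because its model omits the confounder, while the corrected estimate should be close to the truth because the treatment model is correct.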

Figure 4.

Doubly robust estimation helps when one of two prediction functions is correct. The data in this example are simulated, so the truth is known. The vertical line indicates the true gap-closing estimand $τ_{1, 0} (1)$ across binary categories $X = 1$ and $X = 0$ under an intervention to set treatment to $T = 1$ . Densities depict the empirical distribution of estimates ${\hat{τ}}_{1, 0} (1)$ by the three estimators on the vertical axis, over 1,000 simulations. Axis labels are omitted because the scale of the simulation is arbitrary; all three panels have the same scale. Estimation by predicted outcomes can be biased if the outcome prediction function is misspecified (in this case, missing an interaction term). Estimation by predicted treatment probabilities can be biased if the function to predict treatment probabilities is misspecified (in this case, missing a quadratic term). Doubly robust estimates are consistent and approximately unbiased as long as one of the two is correctly specified. In this simulation, there is a binary category $X$ , a binary treatment $T$ , and one continuous confounder $L$ . The confounder $L$ is a stochastic function of the category $X$ , the treatment $T$ is a stochastic function of the confounder, and the outcome $Y$ is a stochastic function of the confounder and the treatment. Simulation details are provided in the Appendix, simulation 1.

Doubly robust estimates are consistent if either $\hat{g} ()$ or $\hat{m} ()$ is consistent. There are two important caveats. First, if both prediction functions are inconsistent, there is no guarantee that a doubly-robust estimator will outperform an estimator based solely on outcome predictions (see the exchange between Kang and Schafer, 2007 and Robins et al., 2007). The next subsection addresses the setting where both the treatment and the outcome functions have unknown functional forms. Second, double robustness is about estimation of the factual outcome and treatment functions $g ()$ and $m ()$ ; there is no double robustness to misspecification of a counterfactual treatment assignment rule $\hat{π} ()$ that is learned from the data. For example, suppose you study the outcome when treatment is assigned proportional to its factual prevalence in each covariate subgroup, marginalized only over the category $X$ . You might estimate those prevalences from the data with a logistic regression model. But if that logistic regression model is misspecified, then your estimated counterfactual assignment rule $\hat{π} ()$ might be inconsistent for the true rule $π ()$ about which you sought to learn. Thus, while double robustness protects against misspecification of the factual functions $m ()$ and $g ()$ , it provides no protection against a researcher’s misspecification of the counterfactual rule $π ()$ . There are two common settings in which this is not a problem. First, this is not a problem when the counterfactual rule $π ()$ is known a priori, as when a researcher is interested in giving the treatment to all units. Second, it is not a problem for relatively simple counterfactual rules $π ()$ which can be learned nonparametrically (e.g., by means in subpopulations). 
While the world determines the complexity of the factual assignment rule $m ()$ , the complexity of the counterfactual rule $π ()$ is entirely under the researcher’s control: you can choose the complexity of the $π ()$ that you define. Thus, you can define it to involve a small set of categorical predictors linked to treatment by a function that is easy to learn. In contrast, the factual treatment function $m ()$ must include all confounders with whatever functional form factually exists for them, and thus robustness to misspecification of $m ()$ can be very useful.

Estimation With Cross Fitting

At the core of doubly robust estimation is an estimated bias that involves residuals $\hat{g} (T_{i}, X_{i}, {\vec{L}}_{i}) - Y_{i}$ . But if $\hat{g} ()$ is learned on a sample that includes case $i$ , the residual could be misleading due to overfitting—the algorithm explicitly learned $\hat{g} ()$ to minimize a loss function involving this residual. Sample splitting can produce a better bias correction: learn $\hat{g} ()$ and $\hat{m} ()$ in one sample and use another sample to convert them into an estimate of the causal estimand (including the bias correction). Cross fitting carries out this sample splitting procedure repeatedly: use subsample A for learning and B for estimation, then B for learning and A for estimation, and average the results (Chernozhukov et al., 2018; Bickel, 1982).
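The two-fold procedure can be sketched as follows. This is an illustrative sketch under my own assumptions (hypothetical simulated data, nuisance functions fit by cell means instead of a machine learning algorithm, an intervention that sets $T = 1$, and a bias term normalized by the sum of the weights); the function names are invented for this example.

```python
import random
from collections import defaultdict

random.seed(3)

def simulate(n):
    """Hypothetical data; eta is the sampling weight (equal weights here)."""
    rows = []
    for _ in range(n):
        x = int(random.random() < 0.5)
        l = int(random.random() < (0.7 if x else 0.3))
        t = int(random.random() < (0.8 if l else 0.2))
        y = 1.0 * t + 2.0 * l + random.gauss(0, 1)
        rows.append({"x": x, "l": l, "t": t, "y": y, "eta": 1.0})
    return rows

def cell_means(rows, key, value):
    """Weighted mean of value(r) within cells defined by key(r)."""
    num, den = defaultdict(float), defaultdict(float)
    for r in rows:
        num[key(r)] += r["eta"] * value(r)
        den[key(r)] += r["eta"]
    return {k: num[k] / den[k] for k in num}

def dr_theta(train, test, x, t):
    """Learn g and m on `train`; apply the doubly robust formula on `test`."""
    g = cell_means(train, lambda r: (r["t"], r["x"], r["l"]), lambda r: r["y"])
    p = cell_means(train, lambda r: (r["x"], r["l"]), lambda r: r["t"])
    sub = [r for r in test if r["x"] == x]
    eta_x = sum(r["eta"] for r in sub)
    theta_out = sum(r["eta"] * g[(t, x, r["l"])] for r in sub) / eta_x
    w, resid = [], []
    for r in sub:
        p1 = p[(x, r["l"])]
        m = p1 if r["t"] == 1 else 1 - p1
        w.append(r["eta"] * (r["t"] == t) / m)
        resid.append(g[(r["t"], x, r["l"])] - r["y"])  # out-of-sample residual
    bias = sum(wi * e for wi, e in zip(w, resid)) / sum(w)
    return theta_out - bias

def cross_fit_tau(rows, t=1):
    """Two-fold cross fitting: swap the learning and estimation halves, then average."""
    half = len(rows) // 2
    a, b = rows[:half], rows[half:]
    estimates = [dr_theta(train, test, 1, t) - dr_theta(train, test, 0, t)
                 for train, test in ((a, b), (b, a))]
    return sum(estimates) / len(estimates)

tau_cf = cross_fit_tau(simulate(20000))
print(round(tau_cf, 3))
```

The simulated rows are already in random order, so the split needs no shuffling; with real data the folds would have to be assigned at random. Cell means overfit very little, so the benefit of cross fitting is not visible here; the point is only the order of operations.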

When the treatment probabilities and expected outcomes follow unknown functional forms, cross fitting becomes especially important. In these settings, the researcher can learn the treatment and outcome functions with flexible machine learning estimators (as in McCaffrey et al., 2004; Lee et al., 2010; Hill, 2011; van der Laan and Rose, 2011). For example, a random forest would automatically learn interactions and nonlinearities (Breiman, 2001). Machine learning estimators balance bias against variance to produce a result optimized for individual-level prediction. But the task when estimating a gap-closing estimand is not individual-level prediction; a small but desirable bias for individual-level prediction could correspond to a large and undesirable bias once aggregated across individuals. For this reason, the bias correction of doubly robust estimation is important when using machine learning estimators, and is especially helpful when carried out with cross fitting (Chernozhukov et al., 2018).

The most pronounced benefit of cross fitting with machine learning estimators is improved convergence toward the truth as the sample size grows. Figure 5 illustrates this in a simulated setting. The true gap-closing estimand is known, and we can observe the root mean squared error (RMSE) of the gap-closing estimator over many simulations at various sample sizes. As the sample size grows, the RMSE approaches zero more quickly when cross fitting is used in the estimation procedure (details in Appendix simulation 3).

Figure 5.

Cross fitting can improve convergence rates. The data in this example are simulated, so the truth is known. Potential outcomes are a linear function of a binary treatment and 10 continuous confounders in this simulation. In the true data generating process, there is no disparity across categories $X = 1$ and $X = 0$ in the absence of treatment. The treatment effect is $+ 1$ if $X = 1$ and $- 1$ if $X = 0$ , so the gap-closing estimand under assignment to treatment is $τ_{1, 0} (1) = 2$ . The simulation estimates ${\hat{τ}}_{1, 0} (1)$ on samples from this data generating process at each sample size ( $x$ -axis). Because a researcher would not know the true functional form, I learn the functional form from the data with random forests using the defaults of the ranger package in R (Wright and Ziegler, 2017). I learn (1) a forest for treatment probabilities and (2) a forest for the potential outcome under treatment, estimated on the subsample that was factually treated. Then I aggregate to a gap-closing estimate ${\hat{τ}}_{1, 0} (1)$ according to the three methods represented by curves. Performance is measured by root mean squared error (RMSE) across the 1,000 simulations, $\sqrt{\frac{1}{1000} \sum_{r = 1}^{1000} ({\hat{τ}}_{1, 0}^{r} (1) - τ_{1, 0} (1))^{2}}$ . At small sample sizes, doubly robust estimation with cross fitting is a suboptimal estimator because of its high variance. As the sample size grows, the doubly robust cross fitting estimator RMSE converges to zero most quickly. For simulation details, see the Appendix, simulation 3.

Standard Error Estimation for Gap-Closing Estimands

The estimators discussed above all involve multiple steps: fit a predictive algorithm for the treatment and/or outcome, make predictions, and report some function of those predictions. Analytical standard errors for this procedure are not straightforward. Instead, researchers should produce standard errors computationally by resampling-based methods that mimic the process by which the sample was drawn. If the sample is a simple random sample, then the variance can be estimated by the nonparametric bootstrap: sample ${d a t a}^{*}$ from $d a t a$ with replacement, calculate an estimate ${\hat{τ}}^{*}$ on the resampled data, and use the empirical variance over resampled estimates $\hat{V} ({\hat{τ}}^{*})$ as an estimator for the variance of the point estimate $V (\hat{τ})$ (Efron and Tibshirani, 1994). In complex survey samples, the resampling procedure should mimic the procedure by which the original sample was drawn (Krewski and Rao, 1981; Rao and Wu, 1988; Rust and Rao, 1996; Lumley, 2011). The Appendix presents an example. Variance estimates can be converted into confidence intervals by a normal approximation to the sampling distribution.
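For the simple random sample case, the bootstrap can be sketched as below. The example rests on assumptions of my own: hypothetical simulated data, the predicted-outcomes estimator with cell means as the point estimator, 200 bootstrap replicates, and an intervention that sets $T = 1$. It does not cover complex survey designs, which need the replicate-weight methods cited above.

```python
import random
import statistics
from collections import defaultdict

random.seed(4)

def simulate(n):
    """Hypothetical simple random sample (no sampling weights needed)."""
    rows = []
    for _ in range(n):
        x = int(random.random() < 0.5)
        l = int(random.random() < (0.7 if x else 0.3))
        t = int(random.random() < (0.8 if l else 0.2))
        y = 1.0 * t + 2.0 * l + random.gauss(0, 1)
        rows.append((x, l, t, y))
    return rows

def gap_estimate(rows, t=1):
    """Predicted-outcomes estimate of the gap under 'set T = t'."""
    total, count = defaultdict(float), defaultdict(int)
    for x, l, t_obs, y in rows:
        total[(t_obs, x, l)] += y
        count[(t_obs, x, l)] += 1
    g_hat = {k: total[k] / count[k] for k in total}

    def theta(x):
        sub = [r for r in rows if r[0] == x]
        return sum(g_hat[(t, x, r[1])] for r in sub) / len(sub)

    return theta(1) - theta(0)

def bootstrap_se(rows, estimator, reps=200):
    """Refit the whole procedure on each resample drawn with replacement."""
    n = len(rows)
    draws = [estimator(random.choices(rows, k=n)) for _ in range(reps)]
    return statistics.stdev(draws)

data = simulate(2000)
point = gap_estimate(data)
se = bootstrap_se(data, gap_estimate)

# Normal-approximation 95% confidence interval
ci = (point - 1.96 * se, point + 1.96 * se)
print(round(point, 3), round(se, 3), tuple(round(b, 3) for b in ci))
```

The key design choice is that every step (fitting the prediction function, predicting, aggregating) is repeated inside each replicate, so that the variance estimate reflects uncertainty from all steps.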

Empirical Example: Class Ceiling in Pay

Laurison and Friedman (2016) describe the log incomes of British workers who attain higher managerial and professional occupations (hereafter “professional class”). Among this high-attainment category, mean log income is still lower for those whose father held a working-class occupation. The authors coin the term “class ceiling” for this intriguing descriptive result. This section extends the idea to a related causal estimand: what pay gap by class origin would persist if we intervened to assign some individuals to professional class destinations? This gap-closing estimand makes no restrictions on the meaning or causal importance of class origin $X$ (the social category), which simply denotes the collections of individuals with professional vs. working-class parents. The estimand directs attention to the manipulable treatment $T$ : one’s own class destination (a professional vs. a working-class occupation). It becomes important to adjust for observed confounders $\vec{L}$ of the association between class destination and pay (e.g., one’s own education), but not for confounders of the association between class origin and pay (e.g., one’s parents’ education). The end product is a gap-closing estimand that answers a straightforward question: whether an intervention to one’s own class attainment $T$ can break one free from the constraints of class origin $X$ .

I assess this new question in the U.S. context, analyzing data from the 1975–2018 General Social Survey (GSS, Smith et al., 2018), which is conducted each year on a national probability sample ( $N =$ 12,328). I define professional class for the respondent and for the father as any occupation in Class I of the Erikson-Goldthorpe-Portocarero (EGP) schema (Erikson et al., 1979), which Laurison and Friedman (2016) highlight as the U.S. parallel to the U.K. National Statistics Socioeconomic Classification used in their study. Examples of professional occupations include manager, engineer, scientist, and lawyer. In the weighted sample, 10.2% of respondents report that their father held a professional occupation (the gap-defining category) and 9.6% report that they personally hold a professional occupation (the gap-closing treatment). The outcome variable is log annual income in 1986 dollars. Because the goal of this example is to illustrate the method rather than to provide the definitive answer to this substantive question, I keep the adjustment set as simple as possible and only include four covariates: race (white, black, other), sex, age, and highest degree (less than high school, high school, junior college, bachelor’s, or a graduate degree). Additional details about the data and sample restrictions are provided in the Appendix. In this example, the unit-specific quantity at the core of the claim is potential log income in a professional occupation. The target population is U.S. adults ages 30–45 years in 1975–2018, with each year equally weighted. The theoretical estimand is the unit-specific quantity averaged over the target population.

Identification

Suppose we took a sample from the population and reassigned the class destinations of that sample. To what degree would the pay gap by class origin close for that sample? Answering this question requires us to identify the causal effect of one’s own occupational class on pay. As in section “Identify the Estimand: Causal Assumptions are Agnostic About the Effect of Social Categories,” this requires the assumption that the population average potential log income that would be realized in a professional occupation is equal to the observable outcome among those who factually hold a professional occupation, within subgroups defined by race, sex, age, education, and father’s occupational class. Because education is such a strong determinant of both occupational attainment and pay, its inclusion in this adjustment set is essential. Nonetheless, this conditioning set is likely to yield only imperfect identification. Because the aim of this analysis is only to illustrate the method, I proceed with this simple example and leave it to future work to conduct similar analyses in settings where the identification assumptions are more plausible but which would be more complex for illustrating the method.

Estimation

I estimate the outcome function $g$ and the treatment propensity function $m$ by ordinary least squares and logistic regression, respectively (Appendix Table 2). Using standard regression estimators is useful because it clarifies how the gap-closing estimand is distinct from a coefficient, even when the conditional mean functions are estimated with coefficient-based regression models. I aggregate these functions to an estimate by the doubly robust estimator (equation (8)). I estimate the variance by balanced repeated replication (BRR), a computational strategy for variance estimation in complex survey samples (Krewski and Rao 1981:1013). The Appendix presents the variance estimation procedure. Results with alternative estimators, including machine learning estimators that are more data-adaptive, are reported in Figure 11.

Results

Figure 6 presents results. Descriptively, log incomes are 0.32 points higher for those from professional class origins compared with working-class origins. One might argue that this pay gap is caused by the unequal rates at which people from these categories attain professional class destinations themselves: 24% among those from professional origins attain professional destinations compared with only 8% among those from working-class origins (Appendix Table 1). The gap-closing estimand allows us to estimate the gap that would persist if we took a sample and assigned them to professional class destinations, thus cutting off this potential explanation for the disparity in that sample. Under that intervention, the disparity would remain at 0.27 (84% of its original size, Panel A). Assigning a professional destination would increase pay in both groups, but the causal effects are about the same size so that the gap is almost unchanged. Thus, the disparity in class destinations does not explain the pay disparity by class origin: the pay disparity would be almost the same size even if we equalized class destinations. This reinforces the general conclusion of Laurison and Friedman (2016): attaining a professional destination is insufficient to erase the disparity by class origin. In fact, the causal estimate makes the point more strongly: intervening to assign a professional class destination would reduce the pay gap by class origin by only a small amount. Likewise, a substantial disparity by class origin would persist if we intervened to assign a working-class occupation (Panel B), if we stochastically intervened to assign a random class proportional to their population prevalence (Panel C), or if we stochastically intervened to assign a random class within subgroups of covariates (Panel D). In all cases, the gap-closing estimand is nearly as large as the descriptive disparity.

Figure 6.

An intervention to change one’s own social class would do little to close the pay gap across categories of one’s father’s social class. Among those whose father held a professional occupation, mean log income is 0.32 points higher than among those whose father held a working-class occupation. But what if we intervened on a sample to send people personally to professional occupations? Would the gap close? That intervention (Panel A) would causally increase pay in both categories but would leave the gap across categories almost unchanged. Similarly, the gap would be almost unchanged if we counterfactually assigned people to a working-class occupation (Panel B), to an occupational class selected randomly proportional to its prevalence in the population (Panel C), or to an occupational class selected randomly proportional to its prevalence among those who match the covariates of the person in question (Panel D). In no case does an intervention to class destination substantially close the pay gap by class origin. The Appendix presents details for this illustration, including sample selection, definitions of the interventions, and the regression specifications. Conceptually, this figure builds on Laurison and Friedman (2016). Data are pooled from the 1975–2018 General Social Survey ( $N = 12,328$ ). The term “professional class” as used here refers to holding any occupation within Class I of the Erikson et al. (1979) class schema, with all other occupations termed “working class.” Results use doubly robust estimators (equation (8)) with $g$ and $m$ estimated by weighted linear and logistic regression, respectively (Appendix Table 2). Confidence intervals are calculated by balanced repeated replication (see Appendix). Figure 11 presents estimates under alternative estimation approaches.

These results speak to theories of social mobility. Theories often posit a status attainment process that begins with one’s family background as a constraint on life chances. Over the life course, attaining a high level of education or a professional occupation might gradually free one from those constraints, as one’s own status overpowers disparities determined by one’s family of origin (e.g., the discussion of college in Hout, 1988). But the evidence here casts doubt on that set of theories. Even if we intervened to assign people to professional occupations, the pay disparity by class origin would almost entirely remain. Occupational attainment does not have the power to liberate individuals from the shadow of their family background.

The test of those theories in this empirical example is limited; all of these claims rely on identification assumptions involving no unobserved confounding, and these assumptions are unlikely to hold in the current example because of the very limited adjustment set. Yet unobserved selection into treatment may actually bias the estimate toward an overstatement of the degree to which an intervention to class destinations would close the pay gap by class origins. Those from working-class origins face greater barriers to occupational attainment, so those who attain professional destinations may be more positively selected among those from working-class origins than professional origins. This would upwardly bias the estimated post-intervention means, but more so among those of working-class origins, which would downwardly bias the gap-closing estimand (discussed in greater depth in the Appendix). Due to this bias, the disparity by class origin if we assigned people to professional class destinations might be even bigger than the estimate reported in this paper. Overall, there is good reason to believe that personally attaining a professional class occupation does very little to attenuate the pay disparity by class origin.

Contribution and Related Work

This paper introduces gap-closing estimands for social scientists, drawing on a growing literature on causal decomposition analysis in epidemiology and biostatistics (VanderWeele and Robinson, 2014; Jackson and VanderWeele, 2018; Jackson, 2018; Jackson and Arah, 2020; Jackson, 2021). Studies of fairness in machine learning are beginning to consider the causal process that produces an observed disparity (Zhang and Bareinboim, 2018). The present paper connects those research goals to a broader class of social science settings. In the service of that primary goal, there are three specific contributions: a conceptual contribution delimiting the scope of the intervention, a technical contribution deriving a doubly robust estimator for this setting, and a contribution to the accessibility of these methods by introducing them with examples intended to reach a broad social science audience.

The conceptual contribution delimits the scope of the intervention: the gap-closing estimand is the expected disparity in a sample $S$ from a population $P$ if treatment is reassigned for that sample (Figures 1 and 2). In experimental settings, this distinction is obvious: the protocol of any randomized controlled trial makes it clear that treatment is not randomized for the entire population but only for a sample. In observational settings, past work has already emphasized how appeal to a hypothetical experiment clarifies issues like how the treatment is defined, the assignment process, and who would be eligible (Hernán and Robins, 2016; Hernán, 2016; Hernán et al., 2016). The target trial is important not only for those reasons but also because it allows one to distinguish an intervention applied to a sample $S$ from one applied to a population $P$ . Even though the intervention sample $S$ is purely hypothetical in observational settings, the distinction clarifies the degree to which interference (a violation of consistency) threatens the validity of interpretations. When we conceptualize $S$ as a small fraction of $P$ , interference problems are greatly reduced because most of the people with whom any unit $i$ interacts would not experience a change in treatment assignment under the intervention. When we conceptualize a treatment applied to the full population $S = P$ , there is an enormous risk of interference that would need to be modeled formally (Jackson and Arah, 2020). A target trial that explicitly appeals to a sample $S$ is not a methodological fix for this problem; it is a conceptual framework for explicitly navigating the tension between the credibility of the claim (most credible if the size of $S$ is much less than $P$ ) and the scope of the claim (the size of $S$ ).

The technical contribution is a doubly robust estimator for this particular setting: stochastic treatment assignments in a complex survey sample. Doubly robust estimators have a long history in causal inference (Robins et al., 1994; Bang and Robins, 2005; Kang and Schafer, 2007), including for the setting with stochastic treatment assignments (see Dudík et al., 2014 Sec. 3.3 and Murphy et al., 2001 Sec. 5.2).⁴ There is also growing interest in applying doubly robust estimators in complex survey samples (Rudolph et al., 2014), albeit with a focus on non-stochastic treatment assignment rules. The present paper proves double robustness for stochastic treatment assignment rules when the sample contains units selected from the population with unequal probabilities (proofs in the Appendix). Given a doubly robust estimator, the extension to machine learning estimation of nuisance functions with unknown functional forms follows directly from the literature for causal effects (van der Laan and Rose, 2011; Chernozhukov et al., 2018). This paper also argues that double robustness for the treatment and outcome prediction functions $m ()$ and $g ()$ is important because these can be complex, whereas the researcher can control the complexity of the counterfactual rule $π ()$ for which there is no robustness to misspecification.

Each contribution above serves the third and main contribution: bringing a range of existing ideas together in an accessible framework designed to support social science inquiry to define interventions, estimate the resulting disparities, and inform policy to close gaps. By illustrating the relevance of gap-closing estimands to a range of substantive questions about inequality, this paper builds a bridge between the questions sociologists are already asking implicitly and a set of methods that can answer those questions more explicitly. In the service of this third contribution, the gapclosing R package provides publicly available and open source software to support the methods described in this paper.

Comparing Coefficients Across Regressions Does Not Estimate a Gap-Closing Estimand

There is a common research practice that does not estimate a gap-closing estimand. Suppose a researcher estimates two regression models, one with and one without the treatment variable $T$ .

$$E (Y \mid X, \vec{L}) = α_{\text{Model 1}} + β_{\text{Model 1}} X + {\vec{η}}_{\text{Model 1}}^{'} \vec{L} \tag{9}$$

$$E (Y \mid X, \vec{L}, T) = α_{\text{Model 2}} + β_{\text{Model 2}} X + {\vec{η}}_{\text{Model 2}}^{'} \vec{L} + γ_{\text{Model 2}} T \tag{10}$$

Suppose both functional forms are correct (matching the true conditional mean function). Suppose also that the adjustment set ${X, \vec{L}}$ identifies the causal effect of $T$ (as in Figure 3). The researcher might interpret the difference $β_{\text{Model 1}} - β_{\text{Model 2}}$ as the amount of the disparity “attributable” to the treatment. In the class gap in pay example, adding class destination ( $T$ ) to an OLS model of log income ( $Y$ ) would reduce the coefficient on class origin ( $X$ ) from ${\hat{β}}_{\text{Model 1}} = 0.05$ to ${\hat{β}}_{\text{Model 2}} = 0.02$ (Appendix Table 3). One might infer (incorrectly) that over half of the pay gap by class origin is “attributable” to class destination. But that comparison speaks to a complicated quantity.

\begin{aligned} β_{Model 1} - β_{Model 2} = {} & \overbrace{\left[E (Y ∣ X = 1, \vec{L} = \vec{ℓ}) - E (Y ∣ X = 0, \vec{L} = \vec{ℓ})\right]}^{\text{Disparity within subgroup } \vec{L} = \vec{ℓ}} \\ & - \underbrace{\left[E (Y (t) ∣ X = 1, \vec{L} = \vec{ℓ}) - E (Y (t) ∣ X = 0, \vec{L} = \vec{ℓ})\right]}_{\text{Disparity within subgroup } \vec{L} = \vec{ℓ} \text{ under intervention to set } T = t} \quad \forall \, t, \vec{ℓ} \end{aligned}

(11)

This quantity focuses on a subpopulation with a particular covariate set $\vec{L} = \vec{ℓ}$. Then, it involves the disparity that would persist in that subpopulation if the treatment were set to some value $T = t$. Because there are no interactions in equations (9)–(10), the comparison between the two is (by assumption) the same at every value of $\vec{ℓ}$ and $t$.

The comparison in equation (11) is difficult to interpret for two reasons. First, neither $β_{Model 1}$ nor $β_{Model 2}$ is the marginal descriptive disparity. Both adjust for covariates $\vec{L}$ that may lie along the causal path $X \to \vec{L} \to Y$ from the category to the outcome. Holding them constant blocks an important path. For the same reason, neither coefficient is a gap-closing estimand: an intervention to equalize the treatment would not equalize the covariates $\vec{L}$. One might still interpret equation (11) as the amount by which the intervention would close the gap net of the covariates $\vec{L}$.

But then there is a second complexity: endogenous selection bias opens a new path $X \to \vec{L} \leftarrow U \to Y$, which makes the disparity hard to interpret. For example, consider the pay disparity by class origin $X$ among those who complete a college degree $\vec{L} = \vec{ℓ}$. Because of barriers to college access, people from working-class origins $X = 0$ who nonetheless overcome the odds to complete college $\vec{L} = \vec{ℓ}$ may have a stronger work ethic (an unobserved $U$) than their counterparts who did not complete college. Adjusting for education $\vec{L}$ creates an association between work ethic $U$ and class origin $X$ even if the two are marginally unrelated.

Interpretation is difficult because the covariates $\vec{L}$ are possibly consequences of the gap-defining category $X$. The gap-closing estimand avoids both issues by not holding $\vec{L}$ constant. The covariates $\vec{L}$ are used to learn the algorithm to predict potential outcomes. But then those predictions are made for each unit $i$ with the covariates at their observed values ${\vec{L}}_{i}$. Thus, the gap-closing estimand addresses an intervention to the treatment $T$ while allowing the covariates $\vec{L}$ to remain as they factually exist.
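The contrast between holding $\vec{L}$ constant and leaving it at its observed values can be sketched in code. The example below is a minimal illustration, assuming a binary category, a single binary covariate, a binary treatment, and an outcome model that is simply the mean outcome in each (category, covariate, treatment) cell. The toy records and all function names are hypothetical; they come neither from the paper's data nor from the gapclosing package.

```python
from collections import defaultdict

# Hypothetical toy records: (category x, covariate l, treatment t, outcome y).
RECORDS = [
    (0, 0, 0, 1.0), (0, 0, 0, 1.0), (0, 0, 0, 1.0), (0, 0, 1, 2.0),
    (0, 1, 0, 2.0), (0, 1, 1, 3.0),
    (1, 0, 0, 1.5), (1, 0, 1, 2.5),
    (1, 1, 0, 2.5), (1, 1, 1, 3.5), (1, 1, 1, 3.5), (1, 1, 1, 3.5),
]


def fit_cell_means(rows):
    """Stand-in outcome model: mean outcome within each (x, l, t) cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, l, t, y in rows:
        sums[(x, l, t)] += y
        counts[(x, l, t)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def factual_gap(rows):
    """Observed difference in mean outcomes, category 1 minus category 0."""
    totals, n = defaultdict(float), defaultdict(int)
    for x, _l, _t, y in rows:
        totals[x] += y
        n[x] += 1
    return totals[1] / n[1] - totals[0] / n[0]


def gap_under_treatment(rows, t_new):
    """Gap if every unit received t_new, with covariates left as observed."""
    g = fit_cell_means(rows)
    totals, n = defaultdict(float), defaultdict(int)
    for x, l, _t, _y in rows:
        # Predict at the new treatment value but at the unit's own covariate value.
        totals[x] += g[(x, l, t_new)]
        n[x] += 1
    return totals[1] / n[1] - totals[0] / n[0]


def conditional_disparity(rows, l, t):
    """Disparity within one covariate stratum at one treatment value."""
    g = fit_cell_means(rows)
    return g[(1, l, t)] - g[(0, l, t)]


print(factual_gap(RECORDS))
print(gap_under_treatment(RECORDS, t_new=1))
print(conditional_disparity(RECORDS, l=1, t=1))
```

In this toy data the factual gap is about 1.17, the gap under the intervention is about 0.83, and the disparity within any covariate stratum is 0.5. The intervention closes part of the gap, but the remaining gap is larger than the conditional comparison suggests, because the categories still differ in their covariate distributions.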

Past Work on Neighboring Descriptive and Causal Topics Does Not Speak to Gap-Closing Estimands

Descriptive work may seem related but is distinct from gap-closing estimands because it does not invoke a causal claim. Substantial scholarship within sociology has summarized disparities by Kitagawa-Blinder-Oaxaca decompositions (Kitagawa, 1955; Blinder, 1973; Oaxaca, 1973). This technique (or the extension to generalized linear models by Fairlie, 2005) appears in examinations of inequality over numerous social categories in sociology: class origins (Laurison and Friedman, 2016), race (Ciocca Eller and DiPrete, 2018), disability (Shandra, 2018), sexual orientation (Mize, 2016), and gender (Weisshaar, 2017), to name a few. Absent causal assumptions, these decompositions are purely descriptive: they provide evidence about the disparity across categories among units who are identical along all covariates. Gap-closing estimands are different because they speak to whether an intervention would causally close a disparity. Using Kitagawa-Blinder-Oaxaca decompositions to estimate gap-closing estimands requires careful interpretation under a specific set of causal assumptions and a specific assumed functional form (Jackson and VanderWeele, 2018).⁵
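For readers unfamiliar with the descriptive accounting involved, the following is a minimal sketch of a symmetric two-component decomposition in the spirit of Kitagawa (1955). It splits a difference in crude rates between two groups into a part due to differing subgroup-specific rates and a part due to differing subgroup composition. The function names and toy numbers are hypothetical, and the result is an accounting identity rather than a causal quantity.

```python
def crude_rate(shares, rates):
    """Overall rate: subgroup rates averaged by subgroup population shares."""
    return sum(p * r for p, r in zip(shares, rates))


def kitagawa_decomposition(shares_a, rates_a, shares_b, rates_b):
    """Split the crude rate difference (group A minus group B) into two parts.

    rate_component        : rate differences, weighted by average composition
    composition_component : composition differences, weighted by average rates
    """
    cells = list(zip(shares_a, rates_a, shares_b, rates_b))
    rate_component = sum((ra - rb) * (pa + pb) / 2 for pa, ra, pb, rb in cells)
    composition_component = sum((pa - pb) * (ra + rb) / 2 for pa, ra, pb, rb in cells)
    return rate_component, composition_component


# Hypothetical toy inputs: two subgroups within each of two groups.
shares_a, rates_a = [0.5, 0.5], [10.0, 20.0]
shares_b, rates_b = [0.8, 0.2], [8.0, 18.0]

rate_part, composition_part = kitagawa_decomposition(shares_a, rates_a, shares_b, rates_b)
total_gap = crude_rate(shares_a, rates_a) - crude_rate(shares_b, rates_b)
print(total_gap, rate_part, composition_part)
```

The two components always sum to the total gap. Neither one says what would happen to the gap under an intervention, which is the question a gap-closing estimand addresses.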

Causal work on controlled direct effects may seem related but is distinct because it posits a different kind of intervention. A controlled direct effect (CDE; Pearl, 2001; Robins, 2003; Acharya et al., 2016; Zhou, 2019) is best understood in the context of a hypothetical experiment: the expected difference in an outcome $Y$ if we intervene to assign someone to treatment $X = x'$ or control $X = x$ while intervening to hold a post-treatment variable at some fixed value $T = t$. In potential outcomes notation, the CDE involves potential outcomes $y_{i} (x, t)$ defined over both the category and the treatment. In contrast, the gap-closing estimand involves potential outcomes $y_{i} (t)$ defined over the treatment only, which are then aggregated over collections of units $i$ defined by $X$ (which is never subject to a hypothetical intervention). This essential distinction makes the gap-closing estimand applicable in settings where mediation estimands are not: when the category $X$ is a social category for which the causal effect is philosophically fraught. Nonetheless, methods developed to estimate controlled direct effects have analogs for gap-closing estimands. Structural nested mean models for controlled direct effects (Vansteelandt, 2009) are analogous to estimation by predicted outcomes; both depend on models of the conditional mean of the outcome. Inverse probability weighting (VanderWeele, 2009) is directly analogous to estimation by predicted treatments. Other methods developed for controlled direct effects, such as regression with residuals (Zhou, 2019), could point toward analogous procedures for gap-closing estimands.
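The distinction between the two estimands can be written side by side. The display below is a sketch in the notation of this section for a fixed treatment value $t$, with $x'$ and $x$ denoting two category values; taking expectations over units is an assumption about how the unit-level potential outcomes are aggregated.

```latex
\begin{aligned}
\text{Controlled direct effect:} \quad & E\left[Y(x', t) - Y(x, t)\right] \\
\text{Gap-closing estimand:} \quad & E\left[Y(t) \mid X = x'\right] - E\left[Y(t) \mid X = x\right]
\end{aligned}
```

Only the first expression requires a potential outcome defined under an intervention on the category $X$. The second conditions on $X$ as observed and intervenes on $T$ alone.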

Discussion

Gap-closing estimands provide opportunities not only to study disparities but also to learn about interventions to close them. By making an explicitly causal claim, researchers who estimate gap-closing estimands gain transparency about the causal assumptions required for identification. They also gain opportunities to estimate with flexible predictive algorithms, which are implemented in open-source software (the gapclosing package). Gap-closing estimands have the potential to promote deeper understanding of gaps across social categories and how to close them.

The knowledge we can gain from gap-closing estimands complements existing bodies of research that focus on descriptive disparities and causal assessments of discrimination. Descriptive research could proceed in tandem with gap-closing research; a descriptive study documenting the presence of a large disparity would set the stage for subsequent studies to explore interventions to close that disparity. Causal assessments of discrimination in audit studies provide one type of understanding about why a gap exists: a gap may exist because decision makers react differently when they perceive a person to be of one category versus another. Gap-closing estimands provide a complementary type of understanding about disparities: how would a gap change if some other treatment variable took a different value? To fully understand disparities, we need both types of understanding.

An embrace of gap-closing estimands would both sharpen theory and change the language with which social scientists discuss race, class, and gender in observational studies. Too often, theory about disparities involves vague claims about the role that some treatment $T$ plays in generating the gap. Gap-closing estimands lend new theoretical precision to those claims: the treatment $T$ plays a role in the sense that an intervention to reassign that treatment by a new rule in a sample would change the gap in a particular way. Thus, the treatment is a cause of the gap. Language for discussing the categories would also change. We would no longer speak of the “effect” or “influence” (in scare quotes) of categories like race, class, and gender in observational studies. After all, a hypothetical intervention to change someone’s race, class, or gender (not just a signal, but the entire construct) is neither straightforward to define nor especially relevant for policy. Instead, researchers would be clear that the only effect identified is the effect of a manipulable treatment. Doing so would create space for substantive arguments about causal assumptions and technical advances in estimation, both of which would be grounded in a research goal defined outside of the statistical algorithm (Lundberg et al., 2021).

Substantively, a shift away from conditional comparisons (e.g., coefficients that statistically adjust for many covariates) and toward gap-closing estimands (e.g., the outcome of an intervention on one variable) might reveal that post-intervention disparities are actually larger than researchers might have otherwise thought. This shift would improve rhetorical clarity and also contribute to the policy-relevance of research: policymakers can understand that the research implies that an intervention to the variable studied might plausibly close a gap. Finally, formalizing the hypothetical experiment provides an opportunity to clarify the scope of the intervention (to a sample vs. to the population). Gap-closing estimands provide a framework for clarity about the target of statistical inference, thereby promoting transparent research and new estimators like the one developed in this paper to help us build evidence on interventions to close gaps.

Appendix

Acknowledgments

The methods presented in this paper are implemented in the R package gapclosing, available from CRAN. Replication code is available on Dataverse: doi.org/10.7910/DVN/UWYAJD. A preprint is available on SocArXiv: doi.org/10.31235/osf.io/gx4y3. For feedback relevant to this project, I thank Brandon Stewart, Matthew Salganik, Belén Unzueta, Christopher Felton, Daniela Urbina, Felix Elwert, Gillian Slee, Hannah Waight, Janet Xu, Rebecca Johnson, Simone Zhang, Xiang Zhou, members of the Stewart Lab, and reviewers.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

Research reported in this publication was supported by the National Science Foundation under Award Number 2104607 and by The Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health under Award Numbers P2CHD047879 and P2CHD041022.

ORCID iD

Ian Lundberg

Supplemental material

Supplemental material for this article is available online.

Author Biography

Ian Lundberg is a Postdoctoral Scholar in the Department of Sociology and California Center for Population Research at UCLA. He completed his PhD at Princeton University in May 2021. His research develops statistical and machine learning methods to answer new questions about inequality in America. You can read more at ianlundberg.org.

References

Acharya, Blackwell, Sen. 2016. “Explaining Causal Findings Without Bias: Detecting and Assessing Direct Effects.” American Political Science Review 110 (3): 512-29.

Glynn A. N. 2019. “Treatment Effect Deviation As An Alternative to Blinder-Oaxaca Decomposition for Studying Social Inequality.” Sociological Methods & Research 50 (3): 1006-33.

Aronow P. M., Samii, et al. 2017. “Estimating Average Causal Effects Under General Interference, With Application to a Social Network Experiment.” The Annals of Applied Statistics 11 (4): 1912-47.

Bang, Robins J. M. 2005. “Doubly Robust Estimation in Missing Data and Causal Inference Models.” Biometrics 61 (4): 962-73.

Beller. 2009. “Bringing Intergenerational Social Mobility Research Into the Twenty-first Century: Why Mothers Matter.” American Sociological Review 74 (4): 507-28.

Bertrand, Mullainathan. 2004. “Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination.” American Economic Review 94 (4): 991-1013.

Bickel P. J. 1982. “On Adaptive Estimation.” The Annals of Statistics 10 (1): 647-71.

Blau F. D., Kahn L. M. 2017. “The Gender Wage Gap: Extent, Trends, and Explanations.” Journal of Economic Literature 55 (3): 789-865.

Blinder A. S. 1973. “Wage Discrimination: Reduced Form and Structural Estimates.” Journal of Human Resources 8 (4): 436-55.

Breiman. 2001. “Random Forests.” Machine Learning 45 (1): 5-32.

Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, Newey, Robins. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal 21 (1): C1-C68.

Chetty, Hendren, Kline, Saez, Turner. 2014. “Is the United States Still a Land of Opportunity? Recent Trends in Intergenerational Mobility.” American Economic Review 104 (5): 141-47.

Ciocca Eller, DiPrete T. A. 2018. “The Paradox of Persistence: Explaining the Black-White Gap in Bachelor’s Degree Completion.” American Sociological Review 83 (6): 1171-214.

Díaz Muñoz, van der Laan. 2012. “Population Intervention Causal Effects Based on Stochastic Interventions.” Biometrics 68 (2): 541-9.

Dudík, Erhan, Langford, et al. 2014. “Doubly Robust Policy Evaluation and Optimization.” Statistical Science 29 (4): 485-511.

Efron, Tibshirani R. J. 1994. An Introduction to the Bootstrap. Boca Raton, FL: CRC Press.

Elwert, Winship. 2014. “Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable.” Annual Review of Sociology 40:31-53.

Erikson, Goldthorpe J. H., Portocarero. 1979. “Intergenerational Class Mobility in Three Western European Societies: England, France and Sweden.” The British Journal of Sociology 30 (4): 415-41.

Fairlie R. W. 2005. “An Extension of the Blinder-Oaxaca Decomposition Technique to Logit and Probit Models.” Journal of Economic and Social Measurement 30 (4): 305-16.

Glynn A. N., Quinn K. M. 2010. “An Introduction to the Augmented Inverse Propensity Weighted Estimator.” Political Analysis 18 (1): 36-56.

Greenland. 2003. “Quantifying Biases in Causal Models: Classical Confounding vs Collider-Stratification Bias.” Epidemiology (Cambridge, Mass.) 14 (3): 300-6.

Greenland, Pearl, Robins J. M. 1999. “Causal Diagrams for Epidemiologic Research.” Epidemiology (Cambridge, Mass.) 10 (1): 37-48.

Greiner D. J., Rubin D. B. 2011. “Causal Effects of Perceived Immutable Characteristics.” Review of Economics and Statistics 93 (3): 775-85.

Hernán M. A. 2016. “Does Water Kill? A Call for Less Casual Causal Inferences.” Annals of Epidemiology 26 (10): 674-80.

Hernán M. A., Robins J. M. 2016. “Using Big Data to Emulate a Target Trial When a Randomized Trial is Not Available.” American Journal of Epidemiology 183 (8): 758-64.

Hernán M. A., Robins J. M. 2020. Causal Inference: What If. Boca Raton, FL: Chapman & Hall/CRC.

Hernán M. A., Sauer B. C., Hernández-Díaz, Platt, Shrier. 2016. “Specifying a Target Trial Prevents Immortal Time Bias and Other Self-inflicted Injuries in Observational Analyses.” Journal of Clinical Epidemiology 79:70-5.

Hill J. L. 2011. “Bayesian Nonparametric Modeling for Causal Inference.” Journal of Computational and Graphical Statistics 20 (1): 217-40.

Holland P. W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81 (396): 945-60.

Holland P. W. 2008. “Causation and Race.” White Logic, White Methods: Racism and Methodology 4:93-109.

Horowitz. 2018. “Relative Education and the Advantage of a College Degree.” American Sociological Review 83 (4): 771-801.

Horvitz D. G., Thompson D. J. 1952. “A Generalization of Sampling Without Replacement From a Finite Universe.” Journal of the American Statistical Association 47 (260): 663-85.

Hout. 1988. “More Universalism, Less Structural Mobility: The American Occupational Structure in the 1980s.” American Journal of Sociology 93 (6): 1358-400.

Imbens G. W. 2000. “The Role of the Propensity Score in Estimating Dose-Response Functions.” Biometrika 87 (3): 706-10.

Imbens G. W., Rubin D. B. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. New York: Cambridge University Press.

Jackson J. W. 2018. “On the Interpretation of Path-specific Effects in Health Disparities Research.” Epidemiology (Cambridge, Mass.) 29 (4): 517-20.

Jackson J. W. 2021. “Meaningful Causal Decompositions in Health Equity Research: Definition, Identification, and Estimation Through a Weighting Framework.” Epidemiology (Cambridge, Mass.) 32 (2): 282-90.

Jackson J. W., Arah O. A. 2020. “Invited Commentary: Making Causal Inference More Social and (social) Epidemiology More Causal.” American Journal of Epidemiology 189 (3): 179-82.

Jackson J. W., VanderWeele T. J. 2018. “Decomposition Analysis to Identify Intervention Targets for Reducing Disparities.” Epidemiology (Cambridge, Mass.) 29 (6): 825-35.

Jonsson J. O., Grusky D. B., Di Carlo, Pollak, Brinton M. C. 2009. “Microclass Mobility: Social Reproduction in Four Countries.” American Journal of Sociology 114 (4): 977-1036.

Kang J. D., Schafer J. L. 2007. “Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean From Incomplete Data.” Statistical Science 22 (4): 523-39.

Kaufman J. S. 2008. “Epidemiologic Analysis of Racial/Ethnic Disparities: Some Fundamental Issues and a Cautionary Example.” Social Science & Medicine 66 (8): 1659-69.

Kennedy E. H., McHugh M. D., Small D. S. 2017. “Nonparametric Methods for Doubly Robust Estimation of Continuous Treatment Effects.” Journal of the Royal Statistical Society. Series B, Statistical Methodology 79 (4): 1229.

Killewald, Bryan. 2016. “Does Your Home Make You Wealthy?” RSF: The Russell Sage Foundation Journal of the Social Sciences 2 (6): 110-28.

Kitagawa E. M. 1955. “Components of a Difference Between Two Rates.” Journal of the American Statistical Association 50 (272): 1168-94.

Kohler-Hausmann. 2018. “Eddie Murphy and the Dangers of Counterfactual Causal Thinking About Detecting Racial Discrimination.” Northwestern University Law Review 113:1163.

Krewski, Rao. 1981. “Inference From Stratified Samples: Properties of the Linearization, Jackknife and Balanced Repeated Replication Methods.” The Annals of Statistics 9 (5): 1010-9.

van der Laan M. J., Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. New York: Springer Science & Business Media.

Laurison, Friedman. 2016. “The Class Pay Gap in Higher Professional and Managerial Occupations.” American Sociological Review 81 (4): 668-95.

Lee B. K., Lessler, Stuart E. A. 2010. “Improving Propensity Score Weighting Using Machine Learning.” Statistics in Medicine 29 (3): 337-46.

Lucas S. R. 2001. “Effectively Maintained Inequality: Education Transitions, Track Mobility, and Social Background Effects.” American Journal of Sociology 106 (6): 1642-90.

Lumley. 2011. Complex Surveys: A Guide to Analysis Using R, Vol. 565. Hoboken, NJ: John Wiley & Sons.

Lumley. 2019. survey: Analysis of Complex Survey Samples. R package version 3.36.

Lundberg, Johnson, Stewart B. M. 2021. “What is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory.” American Sociological Review 86 (3): 532-65.

McCaffrey D. F., Ridgeway, Morral A. R. 2004. “Propensity Score Estimation with Boosted Regression for Evaluating Causal Effects in Observational Studies.” Psychological Methods 9 (4): 403-25.

McIntosh, Moss, Nunn, Shambaugh. 2020. Examining the Black-white Wealth Gap. Washington DC: Brookings Institution.

Mize T. D. 2016. “Sexual Orientation in the Labor Market.” American Sociological Review 81 (6): 1132-60.

Morgan S. L. 2017. A coding of social class for the general social survey. Chicago, Illinois: GSS Methodological Report No. 125.

Murphy S. A., van der Laan M. J., Robins J. M., Group C. P. P. R. 2001. “Marginal Mean Models for Dynamic Regimes.” Journal of the American Statistical Association 96 (456): 1410-23.

Naimi A. I., Schnitzer M. E., Moodie E. E., Bodnar L. M. 2016. “Mediation Analysis for Health Disparities Research.” American Journal of Epidemiology 184 (4): 315-24.

Nguyen T. Q., Schmid, Stuart E. A. 2020. “Clarifying Causal Mediation Analysis for the Applied Researcher: Defining Effects Based on what We Want to Learn.” Psychological Methods 26 (2).

Oaxaca. 1973. “Male-Female Wage Differentials in Urban Labor Markets.” International Economic Review 14 (3): 693-709.

Omi, Winant. 1994. Racial Formation in the United States. New York: Routledge.

Pager. 2003. “The Mark of a Criminal Record.” American Journal of Sociology 108 (5): 937-75.

Pearl. 2000. Causality: Models, Reasoning and Inference. New York: Springer.

Pearl. 2001. Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 411–420. Morgan Kaufmann Publishers Inc.

Petersen, Morgan L. A. 1995. “Separate and Unequal: Occupation-Establishment Sex Segregation and the Gender Wage Gap.” American Journal of Sociology 101 (2): 329-65.

Pfeffer F. T., Hertel F. R. 2015. “How Has Educational Expansion Shaped Social Mobility Trends in the United States?” Social Forces 94 (1): 143-80.

Raftery A. E., Hout. 1993. “Maximally Maintained Inequality: Expansion, Reform, and Opportunity in Irish Education, 1921–75.” Sociology of Education 66 (1): 41-62.

Rao J. N. 1988. “Resampling Inference with Complex Survey Data.” Journal of the American Statistical Association 83 (401): 231-41.

Robins. 1986. “A New Approach to Causal Inference in Mortality Studies with a Sustained Exposure Period–Application to Control of the Healthy Worker Survivor Effect.” Mathematical Modelling 7 (9-12): 1393-512.

Robins, Sued, Lei-Gomez, Rotnitzky. 2007. “Comment: Performance of Double-robust Estimators When ‘Inverse Probability’ Weights Are Highly Variable.” Statistical Science 22 (4): 544-59.

Robins J. M. 2003. “Semantics of Causal DAG Models and the Identification of Direct and Indirect Effects.” Oxford Statistical Science Series 69 (2): 70-82.

Robins J. M., Rotnitzky, Zhao L. P. 1994. “Estimation of Regression Coefficients when Some Regressors are Not Always Observed.” Journal of the American Statistical Association 89 (427): 846-66.

Rubin D. B. 1986. “Comment: Which Ifs Have Causal Answers.” Journal of the American Statistical Association 81 (396): 961-2.

Rudolph K. E., Díaz, Rosenblum, Stuart E. A. 2014. “Estimating Population Treatment Effects From a Survey Subsample.” American Journal of Epidemiology 180 (7): 737-48.

Rust K. F., Rao. 1996. “Variance Estimation for Complex Surveys Using Replication Techniques.” Statistical Methods in Medical Research 5 (3): 283-310.

Saperstein, Penner A. M. 2012. “Racial Fluidity and Inequality in the United States.” American Journal of Sociology 118 (3): 676-727.

Scharfstein D. O., Rotnitzky, Robins J. M. 1999. “Adjusting for Nonignorable Drop-out Using Semiparametric Nonresponse Models.” Journal of the American Statistical Association 94 (448): 1096-120.

Sen, Wasow. 2016. “Race As a Bundle of Sticks: Designs that Estimate Effects of Seemingly Immutable Characteristics.” Annual Review of Political Science 19:499-522.

Shandra C. L. 2018. “Disability As Inequality: Social Disparities, Health Disparities, and Participation in Daily Activities.” Social Forces 97 (1): 157-92.

Smith T. W., Davern, Freese, Morgan. 2018. General Social Surveys, 1972-2018 [machine-readable data file]. Sponsored by National Science Foundation. –NORC ed.– Chicago: NORC, 2018: NORC at the University of Chicago [producer and distributor]. Data accessed from the GSS Data Explorer website at gssdataexplorer.norc.org.

Torche. 2011. “Is a College Degree Still the Great Equalizer? Intergenerational Mobility Across Levels of Schooling in the United States.” American Journal of Sociology 117 (3): 763-807.

U.S. Census Bureau. 2019. Educational attainment in the United States: 2018. https://www.census.gov/data/tables/2018/demo/education-attainment/cps-detailed-tables.html.

VanderWeele, Robinson. 2014. “On the Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables.” Epidemiology (Cambridge, Mass.) 25 (4): 473-84.

Vansteelandt. 2009. “Estimating Direct Effects in Cohort and Case–control Studies.” Epidemiology (Cambridge, Mass.) 20 (6): 851-60.

Weisshaar. 2017. “Publish and Perish? An Assessment of Gender Gaps in Promotion to Tenure in Academia.” Social Forces 96 (2): 529-60.

VanderWeele T. J. 2009. “Marginal Structural Models for the Estimation of Direct and Indirect Effects.” Epidemiology (Cambridge, Mass.) 20 (1): 18-26.

Western. 2006. Punishment and Inequality in America. New York: Russell Sage Foundation.

Winship, Radbill. 1994. “Sampling Weights and Regression Analysis.” Sociological Methods & Research 23 (2): 230-57.

Wood S. N. 2017. Generalized Additive Models: An Introduction with R. Boca Raton, FL: Chapman and Hall / CRC.

Wright M. N., Ziegler. 2017. “Ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.” Journal of Statistical Software 77 (1).

Zhang, Bareinboim. 2018. Fairness in Decision-making-The Causal Explanation Formula. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.

Zhou. 2019. “Equalization Or Selection? Reassessing the ‘Meritocratic Power’ of a College Degree in Intergenerational Income Mobility.” American Sociological Review 84 (3): 459-85.

Zhou, Wodtke G. T. 2019. “A Regression-With-Residuals Method for Estimating Controlled Direct Effects.” Political Analysis 27 (3): 360-9.