Nonparametric Bounds on the Causal Effect of University Studies on Job Opportunities Using Principal Stratification

Abstract

The authors propose a methodology based on the principal strata approach to causal inference for assessing the relative effectiveness of two degree programs with respect to the employment status of their graduates. An innovative use of nonparametric bounds in the principal strata framework is shown, examining the role of some assumptions in reducing uncertainty about the causal effects and proposing a strategy to use the covariates in the construction of the bounds. In the application, the nonparametric bounds turn out to be quite informative on the average causal effect for the latent group of students who are potentially able to graduate from both degree programs. There is some evidence that the effect is positive for economics with respect to political science, at least for some values of the covariates.

Keywords

bounds, causal effects, effectiveness, potential outcomes, principal strata

Traditional analyses of the effect of degree programs on the employment status of their graduates (an example of external effectiveness) are performed on the sole basis of graduated students, neglecting the fact that the set of students who are able to graduate from a certain degree program is in principle different from the set of students who are able to graduate from another degree program. In other words, two degree programs might select different kinds of students with different attitudes and propensities to become employed. An analysis of employment status based only on graduated students mixes the “direct” effect of a degree program on employment status with the “indirect” effect through graduation status. From a policy point of view, disentangling the two effects is very important. For example, if there is a direct effect on employment, a degree program with smaller effectiveness should try to adapt its contents to match labor market requirements. If, instead, the labor market success of a degree program is due merely to different selection criteria through the university career (e.g., one program is more difficult than another and thus selects better students), the problem becomes an issue of educational policy, for example, whether it is desirable for society to graduate students with low ability or to allow the existence of degree programs with different difficulty levels.

To study the direct effect of degree programs on employment status, avoiding the possible bias caused by different graduation processes, it is necessary to envisage a joint study of graduation and employment. In this respect, a convenient framework is that of principal stratification (Frangakis & Rubin, 2002), an important development of the potential outcomes approach to causal inference (Rubin, 1974). The framework of principal stratification was recently used by Barnard, Frangakis, Hill, and Rubin (2003) for the analysis of a complex randomized experiment in the educational context. In the present application, the treatment variable is the degree program, while the intermediate (posttreatment) variable defining the principal strata is graduation status (graduated or not). The key point is that if a student does not graduate, the outcome variable, employment status, is not defined for the purpose of assessing the effectiveness of graduation in a given degree program with respect to job opportunities. This is an example of the so-called truncation by death discussed by Zhang and Rubin (2003) in the hypothetical case of a randomized experiment concerning two high school educational programs, in which the intermediate variable is graduation and the outcome variable is the score on a final test. In our article, the approach of Zhang and Rubin is applied to a real case, which differs from their example in many respects: (a) the treatment is not randomized, (b) the two treatments are on an equal footing (i.e., there is no active treatment to be compared with a control), (c) the outcome variable is binary and subject to nonresponse, and (d) some relevant covariates are available.

The present analysis is limited to a comparison of only two degree programs. The extension to many degree programs entails some technical difficulties, but the conceptual framework would be essentially unaltered.

The two degree programs to be compared are in economics (economia e commercio) and political science (scienze politiche), which are quite similar, at least in Italy, with respect to the contents of the courses and to job opportunities. In the light of this similarity, for a given level of the observed covariates, the choice to enroll in a certain degree program is likely to be only weakly related to unobserved characteristics that potentially affect also graduation and employment status, so the key assumption of unconfoundedness discussed later seems reasonable.

The article is organized as follows. The first section describes the data, and the second and third sections outline the principal strata framework and the probabilistic structure used to model the data at hand. The fourth section is devoted to the derivation of large-sample nonparametric bounds for a homogeneous population, and the fifth section extends this technique to exploit the covariates. The final section offers some concluding remarks.

The Data

A joint analysis of the academic careers and job opportunities of university students required merging two data sources: an administrative database about a cohort of freshmen and survey data on the employment status of the graduates belonging to that cohort. In the present study, concerning the University of Florence, the two sources were

the administrative database of the 1992 cohort of freshmen enrolled in the two degree programs to be compared, economics and political science, and

three census surveys on the occupational status of the graduates in 1998, 1999, and 2000.

The data sets are merged by matriculation number. Overall, 1,941 freshmen belonged to the examined 1992 cohort: 1,068 enrolled in economics and 873 enrolled in political science.

The choice of the specific 1992 cohort was motivated by the availability of survey data for the graduates from 1998 to 2000: The 1992 cohort appeared to be the best choice, because only 21 students of that cohort graduated before 1998, and the majority of the students who did not drop out graduated in the triennium 1998–2000.

The status of the students by the end of 2000 is summarized in Table 1. The students were classified as “dropped out” if they abandoned the degree program before the end of 2000 or as “still enrolled” if they were still paying the enrollment fee by the end of 2000. Because our aim was to assess the effectiveness of the degree, the students still enrolled and the students who had dropped out both belonged to the same residual category of students who had not (yet) graduated. This implies that in the present analysis, graduation is defined as graduation within 9 years of enrollment, and thus the effectiveness of the degree must be interpreted in such terms.

Employment status at the interview for the subset of graduated students is reported in Table 2. All the interviewed students responded to the question on employment status. Apart from 21 students who graduated before 1998 and so were out of the target group, almost all missing interviews were due to missing contact.

The outcome variable of the analysis was a dichotomous indicator of permanent job at the time of the interview (i.e., from 1 to 2 years after graduation). Assessments of the permanent nature of the graduates’ jobs depended on the types of contracts for employees; job permanence was self-assessed for self-employed workers. Temporary jobs were ignored because they often are low-level, easy-to-obtain jobs that should not contribute to defining the outcomes of the graduates.

The administrative database includes some additional information on every student in the 1992 cohort, which was used to define five dichotomous covariates: gender, residence (Florence vs. others), high school degree (liceo, a type of secondary school, vs. others), high school grade (high grade [i.e., 50–60] vs. low grade [i.e., 36–49]), and late enrollment (i.e., the student did not enroll soon after high school). Table 3 reports the sample means of the covariates.

The covariates had different distributions in the two degree programs (treatment arms), so the assignment mechanism was likely to depend on the covariates. In particular, high school grades were higher for the economics students. The most striking difference concerned late enrollment, which was rare in economics but reached 22% in political science.

The Principal Strata Framework

Consider the 1992 cohort of freshmen enrolled in economics or political science at the University of Florence. For a generic student, the treatment variable Z is defined as Z =1 if the student enrolled in economics and Z =0 if the student enrolled in political science. Even if the enrollment decision was taken by the student, terms such as treatment and assignment are used to conform to the standard language of causal analysis.

Now let z denote the realized value of Z for a single individual, and let $\mathbf{z}$ denote the corresponding vector for all individuals. In the potential outcomes framework, every posttreatment variable (i.e., any relevant variable that takes its value after treatment assignment) depends on the vector of treatment assignments $\mathbf{z}$. However, in the present application, it is reasonable to make the following standard assumption, which rules out possible interactions between individuals:

Assumption 1 (stable unit treatment value assumption): For any individual, every posttreatment variable depends on $\mathbf{z}$ only through that individual's own z.

Under the stable unit treatment value assumption, every posttreatment variable has as many potential versions as the number of possible treatments (two in the present application). Moreover, assuming exchangeability, it is possible to omit the individual-specific subscripts from the random variables.

The posttreatment variables can be defined as follows. The first posttreatment variable is the intermediate variable S, with potential versions S(0) and S(1): For each value of z ∈ {0, 1}, S(z) is the indicator of the event “the student graduated within 9 years if enrolled in degree program z.” As already noted, the restriction to graduation within 9 years was imposed by the availability of survey data for the graduates up to year 2000.

Another posttreatment variable is the response indicator R, with potential versions R(0) and R(1): For each value of z ∈ {0, 1}, R(z) is the indicator of the event “the student responded to the question on the employment status if enrolled in degree program z and graduated.”

The last posttreatment variable is the outcome variable Y, with potential versions Y(0) and Y(1): For each value of z ∈ {0, 1}, Y(z) is the indicator of the event “the student had a permanent job at the time of the interview (i.e., from 1 to 2 years after graduation) if enrolled in degree program z and graduated.”

For any individual, the treatment variable assumes one and only one value, so for every posttreatment variable, only one of the two potential versions can be observed: S^obs = S(Z), R^obs = R(Z), and Y^obs = Y(Z).

Because both the treatment variable and the intermediate variable are dichotomous, four principal strata can be defined in the following way through the latent variable L:

L = “GG” (graduated, graduated) if S(1) =1 and S(0) =1: a student who would be able to graduate in both degree programs;

L =“GN” (graduated, not graduated) if S(1) =1 and S(0) =0: a student who would be able to graduate in the first degree program (economics) but not able to graduate in the second degree program (political science);

L =“NG” (not graduated, graduated) if S(1) =0 and S(0) =1: a student who would not be able to graduate in the first degree program (economics) but would be able to graduate in the second degree program (political science);

L =“NN” (not graduated, not graduated) if S(1) =0 and S(0) =0: a student who would not be able to graduate in either degree program.

Each student belongs to a single stratum; however, because S(1) and S(0) cannot be jointly observed, the data cannot reveal to which stratum a student belongs. In other words, the principal strata are latent classes, and the data allow only an estimation of the probability that a given individual belongs to a certain latent class. Also note that the principal strata are defined by the couple of potential values of the intermediate variable, so they are not affected by the treatment and thus can be viewed as an unobserved pretreatment covariate.

The relationship among the observed groups, defined by Z and S^obs, and the principal strata is described in Table 4, along with the corresponding supports of R^obs and Y^obs.

Note that given Z and S^obs, an individual can belong to only two of the four strata, so some strata memberships are ruled out by the data.
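The correspondence between observed groups and principal strata follows directly from the definitions of the strata. As a minimal sketch (the name `compatible_strata` is illustrative and does not come from the article), the strata that remain possible for each observed pair of treatment and graduation status can be enumerated as follows:

```python
# Principal strata coded as pairs (S(1), S(0)), following the definitions above.
STRATA = {"GG": (1, 1), "GN": (1, 0), "NG": (0, 1), "NN": (0, 0)}

def compatible_strata(z, s_obs):
    """Return the strata consistent with treatment z and observed graduation s_obs."""
    # Under treatment z only S(z) is observed: index 0 holds S(1), index 1 holds S(0).
    idx = 0 if z == 1 else 1
    return {name for name, s in STRATA.items() if s[idx] == s_obs}

for z in (1, 0):
    for s_obs in (1, 0):
        print(z, s_obs, sorted(compatible_strata(z, s_obs)))
```

Each observed group is compatible with exactly two strata, which is why stratum membership can be estimated only in probability.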

For the posttreatment variables S and Y, the sample proportions in the two treatment arms are of interest:

P_S,1 = 0.253 and P_S,0 = 0.202 are the sample proportions of graduates (S^obs = 1) among students enrolled in economics (Z = 1) and political science (Z = 0), respectively, and

P_Y,1 = 0.516 and P_Y,0 = 0.364 are the sample proportions of individuals with permanent jobs (Y^obs = 1) among students enrolled in economics (Z = 1) and political science (Z = 0), respectively, who graduated (S^obs = 1) and responded to the interview (R^obs = 1).

Therefore, economics had a higher graduation rate and also a higher employment rate among its graduates. The analysis should assess if the better performance of economics can be attributed to a positive causal effect.

Because the purpose of this study was to evaluate the effectiveness of graduation from a given degree program with respect to job opportunities, the outcome variable Y is defined only for graduates. Therefore, the causal effect Y(1) − Y(0) on the employment status is properly defined only in the GG stratum (i.e., for students who would be able to graduate from both degree programs). In principle, if data on employment were available, the outcome variable Y could be defined for all enrolled students, allowing comparisons within other strata. However, such comparisons would not address the issue of the relative effectiveness of graduation from different degree programs. The causal effect of main interest is thus defined within the GG stratum.

The strata other than the GG stratum could be split by defining the intermediate variable S as having three categories: “graduated within 9 years,” “still enrolled after 9 years,” and “dropped out within 9 years.” This definition of the intermediate variable would lead to 3² =9 principal strata. Even in this case, the causal effect of interest would still be sought only in the GG stratum, so a framework with 9 strata, although making the analysis considerably more complex, would not help the estimation of the effect of main interest.

The key issue when drawing causal inferences with observational studies is that the treatment is not randomized, so there may be confounders that influence both the treatment and the outcome. In such a case, a statistical association could not be interpreted as a causal effect. Indeed, in observational studies, causal effects are usually identified by assuming that the treatment is unconfounded. In the principal strata framework, the definition of unconfoundedness also includes the intermediate variable:

Assumption 2 (unconfoundedness of treatment assignment): Z ⊥ {S(0), S(1), Y(0),Y(1)}

When covariates are available, the unconfoundedness assumption is usually stated conditionally on such covariates to make it more plausible. In the following, the analysis is assumed to take place within a cell defined by the values of the available covariates, so all the assumptions and distributions are implicitly conditional on such covariates.

In the present application, the unconfoundedness assumption would be violated if students with the same observed covariates based their enrollment decisions on reliable predictions of graduation and employment determined by unobserved covariates. However, this behavior seems unlikely, because in Italy, the two competing degree programs have many common features, both in the subjects and in their occupational prospects (e.g., they are equivalent as for access to many positions in the civil service).

The data on the graduates’ outcomes also suffer from a problem of missing data; in fact, even if the outcome variable Y is defined for all the graduates, it is available only for the interviewed ones. In this article, it is assumed that the information about Y is missing at random:

Assumption 3 (missing at random): R(z) ⊥ Y(z)|S(z) =1 for each z ∈ {0, 1}

Under Assumption 3, the response mechanism is ignorable, so the analysis can be safely based on the available responses (conditional on observed covariates). Here the assumption of missing at random seems reasonable because almost all missing interviews are due to missing contact. Alternative assumptions on the response mechanism are discussed in Mealli, Imbens, Ferro, and Biggeri (2004).

The Probabilistic Structure

Under Assumptions 1–3, the data-generating process can be defined in terms of the following two sets of probabilities:

Probabilities of the principal strata:

$\begin{array}{l} π_{GG} = Pr (L = GG), \\ π_{GN} = Pr (L = GN), \\ π_{NG} = Pr (L = NG), \end{array}$

and

$π_{NN} = Pr (L = NN),$

where, for example, π_GN is the probability that a student belongs to the principal stratum GN (i.e., he or she would be able to graduate in economics but not in political science).

Probabilities of the outcome variable, conditional on the principal stratum:

$\begin{array}{l} γ_{1, GG} = Pr (Y^{obs} = 1 ∣ Z = 1, L = GG) = Pr [Y (1) = 1 ∣ L = GG], \\ γ_{0, GG} = Pr (Y^{obs} = 1 ∣ Z = 0, L = GG) = Pr [Y (0) = 1 ∣ L = GG], \\ γ_{1, GN} = Pr (Y^{obs} = 1 ∣ Z = 1, L = GN) = Pr [Y (1) = 1 ∣ L = GN], \end{array}$

and

$γ_{0, NG} = Pr (Y^{obs} = 1 ∣ Z = 0, L = NG) = Pr [Y (0) = 1 ∣ L = NG],$

where, for example, γ_1,GG is the probability that a student had a permanent job if he or she enrolled in economics (Z =1) and belonged to the principal stratum GG. Note that the second equalities in the probabilities of the outcome variable follow from unconfoundedness (Assumption 2).

There are eight possible combinations of treatment Z and principal stratum L, but the probabilities of the outcome are defined only for the four listed combinations; in fact, for the other four combinations, S^obs = 0, and thus Y^obs is not defined.

The probabilistic structure is analogous to that of latent class models, except that in the present case, belonging to a certain latent class (principal stratum) determines not only the values of the probabilities of the outcome but even whether they are defined or not. Another peculiarity is that any given individual can belong only to a subset of latent classes (i.e., given the data, the probabilities of belonging to certain classes are necessarily zero).

The conditional probabilities of the outcome cannot be directly estimated, because they are defined conditionally on the principal stratum. Rather, the data allow the estimation of the following probabilities:

γ_{1} = Pr (Y^{obs} = 1 ∣ Z = 1, S^{obs} = 1) = Pr [Y (1) = 1 ∣ S (1) = 1],

and

γ_{0} = Pr (Y^{obs} = 1 ∣ Z = 0, S^{obs} = 1) = Pr [Y (0) = 1 ∣ S (0) = 1],

where the second equalities follow from unconfoundedness (Assumption 2). These probabilities are in fact mixtures of the probabilities conditional on the principal stratum:

γ_{1} = γ_{1, GG} \frac{π_{GG}}{π_{GG} + π_{GN}} + γ_{1, GN} \frac{π_{GN}}{π_{GG} + π_{GN}},

(1)

and

γ_{0} = γ_{0, GG} \frac{π_{GG}}{π_{GG} + π_{NG}} + γ_{0, NG} \frac{π_{NG}}{π_{GG} + π_{NG}},

(2)

so estimation requires some mixture deconvolution.
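To see why the mixtures in Equations 1 and 2 cannot be inverted without further information, the following sketch evaluates Equation 1 for several pairs of stratum-specific probabilities. All numerical values are hypothetical and chosen only for illustration; the function name `mixture` is likewise an assumption.

```python
def mixture(gamma_gg, gamma_other, pi_gg, pi_other):
    """Observable outcome probability among graduates of one arm (Equations 1 and 2)."""
    weight_gg = pi_gg / (pi_gg + pi_other)
    return gamma_gg * weight_gg + gamma_other * (1 - weight_gg)

# Hypothetical stratum probabilities: pi_GG = 0.15 and pi_GN = 0.10,
# so the GG stratum has weight 0.6 among economics graduates.
pi_gg, pi_gn = 0.15, 0.10

# Three different hypothetical pairs (gamma_1,GG, gamma_1,GN).
candidates = [(0.70, 0.25), (0.52, 0.52), (0.40, 0.70)]
gammas = [mixture(g_gg, g_gn, pi_gg, pi_gn) for g_gg, g_gn in candidates]
print([round(g, 3) for g in gammas])
```

All three pairs produce the same observable probability γ₁ = 0.52, so the data on graduates alone cannot distinguish among them.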

The estimand of main interest is the average causal effect (ACE) on employment in the GG stratum (i.e., the expected difference between Y[1] and Y[0] for individuals belonging to the GG stratum). When the outcome Y is binary, this estimand becomes the difference between the probabilities of Y under the two treatments:

Pr [Y (1) = 1 ∣ L = GG] - Pr [Y (0) = 1 ∣ L = GG] = γ_{1, GG} - γ_{0, GG} .

(3)

Also, the probabilities of the principal strata are interesting per se, because they throw light on the dynamics of the graduation process in the two degree programs. In fact, the ACE on graduation is

Pr [S (1) = 1] - Pr [S (0) = 1] = (π_{GG} + π_{GN}) - (π_{GG} + π_{NG}) = π_{GN} - π_{NG} .

(4)

Therefore, the size of the GG stratum, π_GG, is irrelevant for the ACE on graduation, but it points out different scenarios. In particular, as π_GG diminishes, the graduates of the two degree programs tend to be more heterogeneous, and therefore there are more chances to increase the graduation rates by means of a suitable guidance policy.

Large-Sample Nonparametric Bounds for a Homogeneous Population

Now let us consider how to determine the range of admissible probability values for the principal strata under the available data and the corresponding bounds on the ACE on employment in the GG stratum. In this section, we show the methodology to derive the bounds for a homogeneous population (i.e., assuming that the analysis takes place within a cell defined by the values of the covariates). The methodology will be illustrated using data from the whole population (i.e., assuming that the set of covariates is empty and thus there is a single cell). In the next section, we address the issues of estimating the bounds within low-frequency cells and combining the conditional bounds to obtain unconditional bounds.

In the present application, there are four principal strata, whose distribution is defined by three nonredundant probabilities. When the treatment is unconfounded, the distribution of the principal strata is the same in both treatment arms. Therefore, with the addition of one constraint, the probabilities of the principal strata can be estimated by the observed proportions of graduates in the two degree programs, P_S,1 and P_S,0. When the sample is sufficiently large, the sampling errors can be neglected, yielding the following equations (see Table 4):

\begin{array}{r} P_{S, 1} = π_{GG} + π_{GN}, \\ 1 - P_{S, 1} = π_{NG} + π_{NN}, \\ P_{S, 0} = π_{GG} + π_{NG}, \end{array}

and

1 - P_{S, 0} = π_{GN} + π_{NN} .

From these equations, it follows that π_GG must lie in the interval

max (P_{S, 0} + P_{S, 1} - 1, 0) \leq π_{GG} \leq min (P_{S, 0}, P_{S, 1}) .

(5)

Then, fixing π_GG to a value in its admissible range, the probabilities of the other principal strata are given by

π_{GN} = P_{S, 1} - π_{GG},

(6a)

π_{NG} = P_{S, 0} - π_{GG},

(6b)

and

π_{NN} = 1 - P_{S, 1} - P_{S, 0} + π_{GG} .

(6c)

Figure 1 shows the four probabilities of the principal strata as functions of π_GG for the whole population at hand, where π_GG lies between 0 and 0.202. Note that the difference between the two parallel descending lines, π_GN − π_NG, is the ACE on graduation defined in Equation 4 and estimated by P_S,1 − P_S,0. Therefore, Figure 1 represents different scenarios yielding the same estimated ACE on graduation. In particular, the maximum admissible value of π_GG corresponds to the scenario in which the GN and NG strata are at their admissible minimums (i.e., π_GN = P_S,1 − P_S,0 and π_NG = 0).
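The admissible range in Equation 5 and the stratum probabilities in Equations 6a to 6c can be computed directly from the two graduation proportions. The following is a minimal sketch using the proportions reported in the text (0.253 for economics and 0.202 for political science); the function names are illustrative assumptions.

```python
def pi_gg_range(p_s1, p_s0):
    """Admissible interval for pi_GG (Equation 5)."""
    return max(p_s0 + p_s1 - 1, 0.0), min(p_s0, p_s1)

def strata_probs(p_s1, p_s0, pi_gg):
    """Stratum probabilities implied by a fixed pi_GG (Equations 6a-6c)."""
    low, high = pi_gg_range(p_s1, p_s0)
    if not low <= pi_gg <= high:
        raise ValueError("pi_GG outside its admissible range")
    return {
        "GG": pi_gg,
        "GN": p_s1 - pi_gg,
        "NG": p_s0 - pi_gg,
        "NN": 1 - p_s1 - p_s0 + pi_gg,
    }

P_S1, P_S0 = 0.253, 0.202  # graduation proportions reported in the text
low, high = pi_gg_range(P_S1, P_S0)
for pi_gg in (low, (low + high) / 2, high):
    probs = strata_probs(P_S1, P_S0, pi_gg)
    # pi_GN - pi_NG stays equal to P_S1 - P_S0 for every admissible pi_GG
    print(round(pi_gg, 3), {k: round(v, 3) for k, v in probs.items()})
```

Whatever admissible value of π_GG is chosen, the four probabilities sum to 1 and the difference π_GN − π_NG is unchanged, which is the sense in which the scenarios share the same ACE on graduation.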

The bounds on the ACE on employment in the GG stratum, γ_1,GG − γ_0,GG, are calculated for any fixed value of π_GG by considering the “best” and “worst” scenarios. From Equation 1, it follows that

γ_{1, GG} = \frac{γ_{1} - γ_{1, GN} (1 - ϕ_{1, GG})}{ϕ_{1, GG}},

(7)

where ϕ_1,GG = π_GG/(π_GG + π_GN). Then γ_1,GG attains its minimum when γ_1,GN = 1 and its maximum when γ_1,GN = 0, leading to the following bounds:

max (1 - \frac{1 - γ_{1}}{ϕ_{1, GG}}, 0) \leq γ_{1, GG} \leq min (\frac{γ_{1}}{ϕ_{1, GG}}, 1) .

(8)

These bounds rely on two unknown quantities that need to be estimated: ϕ_1,GG is estimated through Equation 6a, so it depends on the assumed value of π_GG and on the sample proportion of the intermediate variable for Z = 1, P_S,1; on the other hand, γ₁ is estimated by the sample proportion of the outcome variable for Z = 1, P_Y,1.

Similarly, Equation 2 implies the following bounds for γ_0,GG:

max (1 - \frac{1 - γ_{0}}{ϕ_{0, GG}}, 0) \leq γ_{0, GG} \leq min (\frac{γ_{0}}{ϕ_{0, GG}}, 1),

(9)

where ϕ_0,GG = π_GG/(π_GG + π_NG) depends on the assumed value of π_GG and on the sample proportion of the intermediate variable for Z = 0, P_S,0, while γ₀ is estimated by the sample proportion of the outcome variable for Z = 0, P_Y,0.

Finally, the bounds on the ACE on employment in the GG stratum, γ_1,GG − γ_0,GG, are derived from the bounds in Equations 8 and 9:

\begin{array}{l} max (1 - \frac{1 - γ_{1}}{ϕ_{1, GG}}, 0) - min (\frac{γ_{0}}{ϕ_{0, GG}}, 1) \leq γ_{1, GG} - γ_{0, GG} \\ \leq min (\frac{γ_{1}}{ϕ_{1, GG}}, 1) - max (1 - \frac{1 - γ_{0}}{ϕ_{0, GG}}, 0) . \end{array}

(10)

These bounds are similar to the ones derived by Zhang and Rubin (2003); the difference is that they have a continuous Y variable and calculate the bounds through a procedure based on the ordered values of Y. However, when Y is dichotomous, as in the present case, their procedure yields the same results as ours, up to small approximations due to the discreteness of the data. Also note that Zhang and Rubin analyzed data from an experiment with an active treatment versus a control, so they studied the bounds as functions of π_NG, whereas in the present case, given that the two treatments are on an equal footing, it is more natural to study the bounds as functions of π_GG.

The bounds in Equation 10, estimated by the sample proportions, are plotted as functions of π_GG in Figure 2 under the label “general bounds.” Note that the bounds widen as π_GG becomes smaller: For high values of π_GG (between 0.196 and the maximum 0.202), the extremes are both positive, so the sign of the ACE on employment in the GG stratum is determined; then the bounds widen until they reach the interval [−1,1], thus becoming noninformative.
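The general bounds can be traced numerically with a few lines of code. The sketch below evaluates Equations 8 to 10 with the four sample proportions reported earlier (0.253 and 0.202 for graduation, 0.516 and 0.364 for permanent employment); the function name `general_bounds` and the grid of π_GG values are illustrative assumptions.

```python
def general_bounds(p_s1, p_s0, p_y1, p_y0, pi_gg):
    """Large-sample bounds on gamma_1,GG - gamma_0,GG for a fixed pi_GG > 0 (Equation 10)."""
    phi1 = pi_gg / p_s1  # share of GG among economics graduates
    phi0 = pi_gg / p_s0  # share of GG among political science graduates
    lo1, hi1 = max(1 - (1 - p_y1) / phi1, 0.0), min(p_y1 / phi1, 1.0)  # Equation 8
    lo0, hi0 = max(1 - (1 - p_y0) / phi0, 0.0), min(p_y0 / phi0, 1.0)  # Equation 9
    return lo1 - hi0, hi1 - lo0

P_S1, P_S0 = 0.253, 0.202
P_Y1, P_Y0 = 0.516, 0.364
for pi_gg in (0.01, 0.05, 0.10, 0.15, 0.196, 0.202):
    lower, upper = general_bounds(P_S1, P_S0, P_Y1, P_Y0, pi_gg)
    print(f"pi_GG={pi_gg:.3f}  [{lower:+.3f}, {upper:+.3f}]")
```

With these rounded inputs, the lower bound is positive only near the top of the admissible range of π_GG, and the interval reaches [−1, 1] for small π_GG, in line with the behavior described for Figure 2.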

The bounds just calculated are “large-sample” bounds, in the sense that in large samples, they estimate the true bounds nearly without error and there is no need to consider the estimation of confidence limits. In general, both the upper and the lower bounds could be wrapped in confidence bands to take account of sampling variability. The derivation of confidence limits is not trivial, and it can be approached in various ways, as explained by Imbens and Manski (2004). In any case, in the present application, the main use of the bounds is to explore the data at hand, showing only the uncertainty on the causal effect caused by the partial identifiability of the model. Indeed, the bounds convey the uncertainty involved in the estimation of the ACE on employment in the GG stratum irrespective of the sample size: The message is that even in a very large sample, there is an entire interval of admissible values for the estimand of interest whose width depends on the structure of the population, notably on the size of the GG stratum.

The bounds can be sharpened by making some suitable assumptions on the probabilities of the principal strata or on the probabilities of the outcome.

As for the probabilities of the principal strata, a standard assumption is that of monotonicity, saying that there is no NG group (i.e., π_NG = 0). This assumption is often made in studies comparing an active treatment with a placebo-like treatment, because with regard to the intermediate variable S, the NG group has a negative performance under the active treatment (S[1] =0) and a positive performance under the control (S[0] =1). However, in the present application, the two treatment groups are on an equal footing, so it is likely that both the NG and GN groups are present. Therefore, the monotonicity assumption is not sensible here.

A restriction on the values of the probabilities of the principal strata, which seems reasonable in the present context, is that the students who would be able to graduate in both degree programs are a majority in the group of students who would be able to graduate in at least one degree program (i.e., the group with probability π_GG + π_NG + π_GN). This leads to the following assumption:

Assumption 4 (relative majority of the GG individuals): π_GG ≥ π_NG + π_GN

Equations 6a and 6b allow rewriting Assumption 4 as 3π_GG − (P_S,1 + P_S,0) ≥ 0. Because the bounds widen as π_GG diminishes, the widest bounds satisfying Assumption 4 correspond to the unique value of π_GG for which the inequality becomes an equality, namely, π_GG = (P_S,1 + P_S,0)/3, provided such a value of π_GG is admissible. This case is represented in Figure 2 by the vertical line labeled π_GG = π_NG + π_GN and passing through the value 0.152. The widest bounds satisfying Assumption 4 are thus the bounds corresponding to π_GG = 0.152, namely, [−0.290, 0.708], which are obviously more informative than the interval [−1, 1].
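The boundary value implied by Assumption 4 and its admissibility check can be sketched as follows. The function name `majority_threshold` is an illustrative assumption; the inputs are the graduation proportions reported in the text.

```python
def majority_threshold(p_s1, p_s0):
    """Smallest pi_GG satisfying Assumption 4, or None if that value is not admissible."""
    pi_gg = (p_s1 + p_s0) / 3
    low, high = max(p_s0 + p_s1 - 1, 0.0), min(p_s0, p_s1)  # Equation 5
    return pi_gg if low <= pi_gg <= high else None

P_S1, P_S0 = 0.253, 0.202
pi_gg = majority_threshold(P_S1, P_S0)
pi_gn, pi_ng = P_S1 - pi_gg, P_S0 - pi_gg  # Equations 6a and 6b
print(round(pi_gg, 3), round(pi_gn + pi_ng, 3))
```

At the boundary, π_GG equals π_NG + π_GN by construction. The admissibility check matters when the two graduation rates differ widely, because then (P_S,1 + P_S,0)/3 can exceed the smaller of the two rates.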

As for the probabilities of the outcome, it might be sensible to assume that the students who would be able to graduate in both degree programs (the GG stratum) had more chances to get permanent jobs than the students who would be able to graduate in one degree program but not in the other (the NG and GN strata). This consideration leads to the following stochastic dominance assumption for the binary outcome Y:

Assumption 5 (stochastic dominance): γ_1,GG ≥ γ_1,GN and γ_0,GG ≥ γ_0,NG

For an outcome of arbitrary type, the stochastic dominance assumption states that Pr(Y[1] ≤ t|L =GG) ≤ Pr(Y[1] ≤ t|L =GN) and Pr(Y[0] ≤ t|L =GG) ≤ Pr (Y[0] ≤ t|L =NG) for any real number t. Such an assumption was exploited by Zhang and Rubin (2003) in the case of a continuous outcome.

The plausibility of this assumption depends on the circumstances, and it is not appropriate to use it if one thinks that people who are more specific about their skills and preferences (the NG and GN strata) might have a better chance to get jobs.

Under stochastic dominance, the bounds are narrower than in the general case. In fact, from Equation 7 and the first inequality in Assumption 5, γ_1,GG attains its minimum when γ_1,GN = γ_1,GG, so from Equation 1, the lower bound of γ_1,GG in Equation 8 becomes γ₁. Similarly, the upper bound of γ_0,GG in Equation 9 becomes γ₀. Therefore, under stochastic dominance, Equation 10 becomes

γ_{1} - min (\frac{γ_{0}}{ϕ_{0, GG}}, 1) \leq γ_{1, GG} - γ_{0, GG} \leq min (\frac{γ_{1}}{ϕ_{1, GG}}, 1) - γ_{0} .

(11)

Note that when π_GG = π_NG + π_GN, the estimated bounds in Equation 11 are [0.030, 0.494] (i.e., the ACE on employment in the GG stratum is necessarily positive). This is an interesting result, because it shows that two weak assumptions, such as Assumptions 4 and 5, may be sufficient to determine the sign of the effect without the need to rely on parametric models.
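The bounds under stochastic dominance can be evaluated in the same way as the general bounds. The sketch below applies Equation 11 at the boundary case of Assumption 4, using the rounded sample proportions reported in the text; because the inputs are rounded to three decimals, the result should be expected to match the reported interval only approximately. The function name `dominance_bounds` is an illustrative assumption.

```python
def dominance_bounds(p_s1, p_s0, p_y1, p_y0, pi_gg):
    """Bounds on gamma_1,GG - gamma_0,GG under stochastic dominance (Equation 11)."""
    phi1 = pi_gg / p_s1  # share of GG among economics graduates
    phi0 = pi_gg / p_s0  # share of GG among political science graduates
    lower = p_y1 - min(p_y0 / phi0, 1.0)
    upper = min(p_y1 / phi1, 1.0) - p_y0
    return lower, upper

P_S1, P_S0 = 0.253, 0.202
P_Y1, P_Y0 = 0.516, 0.364
pi_gg = (P_S1 + P_S0) / 3  # boundary case of Assumption 4
lower, upper = dominance_bounds(P_S1, P_S0, P_Y1, P_Y0, pi_gg)
print(f"[{lower:.3f}, {upper:.3f}]")
```

Both extremes are positive, so the sign of the effect is determined under Assumptions 4 and 5 taken together.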

Large-Sample Nonparametric Bounds Exploiting the Covariates

The bounds for the whole population computed in the previous section are valid if all Assumptions 2–5 hold unconditionally on the covariates. However, because of the observational nature of the study, the assumptions are more plausible conditionally on the covariates, so it is advisable to exploit the covariates. A possible strategy is to derive the bounds for each cell defined by the covariates and then reconstruct the unconditional bounds through an average weighted by the cell frequencies. This procedure also has the advantage of showing how the bounds depend on the covariates, for example, whether some covariates are particularly effective in sharpening the bounds or whether in some cells the bounds are sufficient to determine the sign of the causal effect.
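The reconstruction of unconditional bounds from cell-level bounds amounts to a frequency-weighted average of the lower and of the upper extremes. A minimal sketch follows; the cell frequencies and bounds are hypothetical values used only to show the computation, and the function name `combine_bounds` is an assumption.

```python
def combine_bounds(cells):
    """Average cell-level bounds with weights proportional to cell frequencies.

    `cells` is a list of (n, lower, upper) tuples, one per covariate cell.
    """
    total = sum(n for n, _, _ in cells)
    lower = sum(n * lo for n, lo, _ in cells) / total
    upper = sum(n * hi for n, _, hi in cells) / total
    return lower, upper

# Hypothetical cells: (frequency, lower bound, upper bound)
cells = [(100, -0.20, 0.60), (300, 0.05, 0.45), (100, -0.40, 0.80)]
print(combine_bounds(cells))
```

In this hypothetical example, the sign of the effect is determined in the second cell even though the combined interval still contains zero.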

Using the cells defined by the covariates is impractical with a large number of covariates, especially if they have many categories, and it becomes infeasible with continuous covariates. In such cases, an alternative strategy to define the cells is to exploit the propensity score (Rosenbaum & Rubin, 1983). In fact, it is well known that if unconfoundedness holds conditionally on the covariates, it holds also conditionally on the propensity score, so the alternative strategy would involve estimating the propensity score with the available covariates and defining the cells through a suitable partitioning of the range of the propensity score.
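The partitioning step can be sketched as follows, assuming the propensity scores have already been estimated by some model. The use of equal-frequency (quantile) cells is an assumption made here for illustration, since the text only requires a "suitable" partitioning; the function name and the scores are hypothetical.

```python
def assign_propensity_cells(scores, n_cells):
    """Assign each unit to one of n_cells equal-frequency cells.

    scores: list of estimated propensity scores, one per unit.
    Returns a list of cell indices (0 = lowest scores), in the input order.
    Units with tied scores at a cell boundary are split by their sort order.
    """
    n = len(scores)
    if n_cells < 1 or n_cells > n:
        raise ValueError("n_cells must be between 1 and the number of units")
    # Rank the units by their propensity score.
    order = sorted(range(n), key=lambda i: scores[i])
    cells = [0] * n
    for rank, i in enumerate(order):
        # Map rank 0..n-1 onto cell 0..n_cells-1 with near-equal cell sizes.
        cells[i] = rank * n_cells // n
    return cells


# Hypothetical propensity scores for ten units, split into five cells.
scores = [0.12, 0.55, 0.31, 0.78, 0.45, 0.91, 0.20, 0.66, 0.38, 0.83]
cells = assign_propensity_cells(scores, 5)
print(cells)
```

The conditional bounds would then be computed within each cell, exactly as for cells defined directly by the covariates.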

The calculation of the conditional bounds, however, may raise a new problem: Depending on the sample size and on the number and nature of the covariates, the cells may have low frequencies. Therefore, to estimate the quantities that enter the bounds for a given cell c, it is advisable to replace the cell sample proportions ( $P_{S, 0}^{c}, P_{S, 1}^{c}, P_{Y, 0}^{c}, P_{Y, 1}^{c}$ ) with the predictions obtained with a smoothing technique. When the cells are defined by the covariates, the smoothing can be obtained using logit regression. Then the cell sample proportions $P_{S, 0}^{c}$ and $P_{S, 1}^{c}$ are replaced with the probabilities predicted by the models

$$\operatorname{logit} \Pr(S^{obs} = 1 \mid Z = z, x) = \alpha_{S,z} + x \beta_{S,z}, \qquad z \in \{0, 1\}, \tag{12}$$

while the cell sample proportions $P_{Y, 0}^{c}$ and $P_{Y, 1}^{c}$ are replaced with the probabilities predicted by the models

$$\operatorname{logit} \Pr(Y^{obs} = 1 \mid S^{obs} = 1, Z = z, x) = \alpha_{Y,z} + x \beta_{Y,z}, \qquad z \in \{0, 1\}. \tag{13}$$

The logit model is used here only to smooth the data, so inferential issues are ignored. In particular, each of the four models includes all the available covariates, regardless of statistical significance.
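To illustrate how a smoothed cell proportion is obtained from a fitted model of the form in Equations 12 and 13, the following sketch inverts the logit link for a given covariate pattern. The coefficient values and covariate vectors are hypothetical placeholders, not estimates from the data, and the fitting step itself is not shown.

```python
import math


def predict_logit_probability(alpha, beta, x):
    """Predicted probability from a logit model: expit(alpha + x . beta).

    alpha: intercept; beta: list of coefficients; x: list of covariate
    values defining the cell (same length as beta).
    """
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    linear = alpha + sum(b * v for b, v in zip(beta, x))
    # Numerically stable inverse logit.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


# Hypothetical coefficients for one of the four models (one per z and outcome).
alpha_hyp = -1.0
beta_hyp = [0.5, 0.3, -0.2]

# Baseline cell: all covariates equal to zero.
p_baseline = predict_logit_probability(alpha_hyp, beta_hyp, [0, 0, 0])
# A cell that differs from the baseline in the first covariate only.
p_changed = predict_logit_probability(alpha_hyp, beta_hyp, [1, 0, 0])
print(round(p_baseline, 4), round(p_changed, 4))
```

Each cell thus receives four predicted probabilities, one from each model, which replace the corresponding sample proportions in the bounds.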

The conditional bounds on the ACE on employment in the GG stratum for a given cell are then calculated as shown in the previous section, using Equation 10 for the general bounds and Equation 11 for the bounds under stochastic dominance.

If some covariates have missing values, some adjustments are necessary before computing the bounds. In our case, there were only 19 individuals with missing curricular information. Given their small number, those individuals were excluded from the data set, so the analysis based on the covariates was conducted on a sample of 1,922 individuals.

Alternatively, one could avoid the exclusion of individuals with missing values by adding the category “missing” to the covariates, thus increasing the number of cells. However, the added cells are likely to have low frequencies, so this variant of the procedure is worth being implemented only when there are many individuals with missing values and few covariates affected by missing data. The two mentioned strategies for dealing with missing data in the covariates (namely, deleting individuals and defining additional cells) yield valid inferences under different assumptions on the missing mechanism, as discussed by Mattei (2004).

To study the dependence of the bounds on the covariates, it is useful to restrict attention to a single scenario, for example, the one with π_GG = π_NG + π_GN for each cell, which is the worst situation (in terms of the width of the bounds) still satisfying Assumption 4 (relative majority of the GG individuals). The results under this scenario are shown in Table 5 (without stochastic dominance) and in Table 6 (under stochastic dominance within each cell). The tables show the bounds for six combinations of the covariates (i.e., 6 of the 30 nonempty cells). The first row, “baseline,” refers to an individual with all covariates equal to zero and corresponds to the cell with the highest frequency; each subsequent row refers to an individual who differs from the baseline in only one covariate. Although in each cell the bounds are computed using the proportions predicted from the logit models in Equations 12 and 13, we report the cell frequencies in both Table 5 and Table 6 to give an idea of the information contained in each cell.
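The scenario π_GG = π_NG + π_GN pins down the value of π_GG within each cell. The short derivation below is a sketch under the assumption that the graduation probabilities in cell c decompose as $P_{S,1}^{c} = \pi_{GG}^{c} + \pi_{GN}^{c}$ and $P_{S,0}^{c} = \pi_{GG}^{c} + \pi_{NG}^{c}$; the superscript c on the strata probabilities is notation introduced here for illustration.

```latex
P_{S,0}^{c} + P_{S,1}^{c}
  = 2\pi_{GG}^{c} + \pi_{NG}^{c} + \pi_{GN}^{c}
  = 3\pi_{GG}^{c}
\quad\Longrightarrow\quad
\pi_{GG}^{c} = \frac{P_{S,0}^{c} + P_{S,1}^{c}}{3}.
```

Under this reading, the scenario value of π_GG changes from cell to cell with the predicted graduation probabilities.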

Stochastic dominance (Table 6) roughly halves the length of the intervals. The sign of the ACE on employment in the GG stratum is still uncertain for the baseline individual, but a change in the covariate late enrollment is sufficient to constrain the causal effect to be positive. Moreover, the covariates female and high grade lead to intervals that include mostly positive values. The covariates thus play a crucial role in the analysis.

Unconditional bounds are obtained through an average weighted by the cell frequencies: The unconditional lower (upper) bound is the weighted average of the conditional lower (upper) bounds. Table 7 shows three types of bounds: (A) bounds ignoring the covariates, derived in the previous section, in the scenario π_GG = π_NG + π_GN, which results in π_GG = 0.152; (B) bounds exploiting the covariates, derived from the conditional bounds computed in the scenario π_GG = π_NG + π_GN for each cell (as in Tables 5 and 6); and (C) bounds exploiting the covariates, derived from the conditional bounds computed in the scenario π_GG = 0.152 for each cell.
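The weighted averaging described above can be sketched as follows. The cell frequencies and conditional bounds in the example are hypothetical and serve only to show the computation; the function name is likewise an illustrative choice.

```python
def unconditional_bounds(cells):
    """Average conditional bounds over cells, weighting by cell frequency.

    cells: list of (frequency, lower_bound, upper_bound) tuples, one per cell.
    Returns the unconditional (lower, upper) bounds.
    """
    total = sum(freq for freq, _, _ in cells)
    if total <= 0:
        raise ValueError("total cell frequency must be positive")
    # The lower (upper) bound is the weighted mean of the cell lower (upper) bounds.
    lower = sum(freq * lo for freq, lo, _ in cells) / total
    upper = sum(freq * up for freq, _, up in cells) / total
    return lower, upper


# Hypothetical cells: (frequency, conditional lower bound, conditional upper bound).
example_cells = [(50, -0.10, 0.40), (30, 0.05, 0.45), (20, -0.20, 0.30)]
unc_lower, unc_upper = unconditional_bounds(example_cells)
print(f"[{unc_lower:.3f}, {unc_upper:.3f}]")
```

Because the average is taken separately for the lower and the upper bounds, a few cells with negative lower bounds can make the unconditional lower bound negative even when most cells have bounds that identify a positive effect.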

The bounds of Case B are slightly narrower than the simple bounds of Case A, although the lower bound under stochastic dominance becomes barely negative. In general, Case B bounds are more reliable than Case A bounds, because the underlying assumptions are more plausible when applied conditional on the covariates.

The bounds of Case C are calculated to illustrate the danger of deriving the conditional bounds with a fixed value of π_GG for each cell, because the real meaning of the chosen value of π_GG is different from cell to cell, and it is quite possible that for some cells, the chosen value is outside the interval of admissible values in Equation 5. After correcting the nonadmissible values of π_GG, which is necessary for more than half of the cells, the bounds shown in the last row of Table 7 are obtained. In the present application, they are not so far from the previous bounds, but in general, they cannot be trusted.

The main result of the nonparametric analysis is that the bounds on the ACE on employment in the GG stratum of economics versus political science point toward an overall positive effect in favor of economics, provided that Assumptions 4 and 5 hold for any combination of the covariates, as seems reasonable in this setting. However, the sign of the effect remains quite uncertain for particular combinations of the values of the covariates.

Concluding Remarks

Two degree programs of the University of Florence have been compared to evaluate their effectiveness with respect to employment status after graduation. The principal strata approach to causal inference was used to set up a general framework of analysis, with a precise definition of the causal quantities. In this framework, nonparametric bounds for the causal effect of interest were derived: The nonparametric bounds permit one to restrict the range of possible inferences on the basis of a minimal set of assumptions, whose plausibility should be judged case by case. In observational studies, such as the one considered here, it is crucial to exploit the available covariates to make the assumptions more plausible. To that end, we propose a general strategy for building bounds that use the information carried by the covariates.

In this application, the nonparametric bounds are quite informative on the ACE in the GG stratum (i.e., for the students who would be able to graduate from both degree programs). There is some evidence that the effect is positive for economics, at least for some values of the covariates. The bounds rely on the key assumption of treatment unconfoundedness, which seems plausible given the available covariates. In practice, some other assumptions are needed to sharpen the bounds enough to draw some substantive conclusions. In this respect, we propose the joint use of a restriction on the probabilities of the principal strata (relative majority of the GG individuals) and a restriction on the probabilities of the outcome (stochastic dominance). Such restrictions are effective in sharpening the bounds and have a substantive meaning that makes them plausible in the present application.

An efficient way to exploit the covariates, at the price of adding further assumptions, is to build a parametric model, which can be fitted using either likelihood or Bayesian methods. Model specification and estimation are difficult tasks, because in the principal strata framework, the latent groups lead to mixtures of distributions that are difficult to disentangle. The covariates are extremely useful to identify the model: Identification can be achieved by several alternative restrictions, as illustrated by Jo (2002) in the special instance of noncompliance with two principal strata. However, the likelihood function is usually rather flat, so its maximization is not trivial. Using the theoretical framework and data outlined in this article, Grilli and Mealli (2007) performed a maximum likelihood analysis, discussing the relevant modeling and computational issues. The model-based analysis may also be developed within the Bayesian paradigm (Imbens & Rubin, 1997), which entails several difficulties (e.g., specification of the priors, computational complexity) but offers some advantages that become crucial as the complexity of the model increases, as in Barnard et al. (2003).

Regardless of the chosen paradigm of inference, the results of a model-based analysis must be interpreted with care, because they rely on several modeling assumptions. As general advice, the computation of nonparametric bounds should always be the first step of the analysis, and even the only step if the bounds are sufficiently informative to meet the main research question.
