A Modified General Location Model for Noncompliance With Missing Data

Abstract

Missing data, especially when coupled with noncompliance, are a challenge even in the setting of randomized experiments. Although some existing methods can address each complication, it can be difficult to handle both of them simultaneously. This is true in the example of the New York City School Choice Scholarship Program, where both the covariates and the outcomes were sometimes missing, and there was complicated noncompliance. The authors propose a modified general location model to integrate the ideas of missing data techniques and principal stratification and then analyze the same data as in Barnard, Frangakis, Hill, and Rubin (2003), where a pattern-mixture model was used. Their results are presented and compared with those in Barnard et al.

Keywords

causal inference; Rubin Causal Model; school voucher program

1. Introduction

The past two decades witnessed a series of debates in the United States over school choice voucher programs. Although the Supreme Court has ruled in this case (e.g., see Krueger & Zhu, 2004) that public funds may be used through voucher programs to sponsor children from low-income families to enroll in private religious schools, arguments about the effectiveness of such programs continue. Although quite a few school choice voucher programs have been conducted across the United States, the New York City School Choice Scholarship Program is arguably the largest and best-implemented private school choice randomized experiment to date. However, even this program suffers from two common complications in social science experiments: missing data and noncompliance.

Some existing statistical methods have proved to be relatively satisfactory for handling either complication. For example, the principal stratification framework (Frangakis & Rubin, 2002), which is more general and flexible than the standard instrumental variable (IV) approach (Angrist, Imbens, & Rubin, 1996), has successfully addressed the noncompliance complication in many randomized experiments (e.g., Jin & Rubin 2008; Frangakis, Rubin, & Zhou, 2002; Hirano, Imbens, Rubin, & Zhou, 2000; Imbens & Rubin, 1997). In addition, missing data techniques based on iterative computation (Little & Rubin, 2002; Rubin, 1974, 1987, 2004), such as the expectation-maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977) and data augmentation (Tanner & Wong, 1987), have also enjoyed wide application and success. Moreover, the integration of these two approaches, that is, principal stratification and missing data techniques, in randomized experiments with only outcome data missing is relatively straightforward (e.g., Jin & Rubin, 2009).

Covariate missingness, however, appears to be more challenging. For instance, Peterson, Myers, Howell, and Mayer (1999) based their analysis on the complete cases of the New York City School Choice Scholarship Program, thereby legitimately treating the pattern of covariate missingness also as a covariate, but thus ignored the possibly important information in the incomplete cases. Barnard, Frangakis, Hill, and Rubin (2003) made use of both the complete cases and the incomplete cases in the program through a pattern-mixture model, which indeed integrated principal stratification and missing data techniques, but strictly speaking they did not address the missing covariates directly. Specifically, they classified the students into different groups based on their covariate missingness patterns and used different sets of covariates, including both observed covariates and covariate missingness patterns, which are always observed, for different groups. However, one possible concern with their approach is that they assumed the missing covariate values did not play a role in the children’s compliance behavior and outcomes, which might be questionable (see the formalization in Section 4.2).

To address this concern, we present a new model, which is a modification of the general location model with missing data (Schafer, 1997), for the analysis of the same data from the school choice scholarship program as in Barnard et al. (2003). We believe that the new model provides an alternative way to integrate the principal stratification concept and missing data techniques, especially regarding the issue of missing covariates. We first describe the scholarship program in Section 2 and explain the basic idea of principal stratification, as well as relevant assumptions, in Section 3. We then contrast our assumptions about missing data with those of Barnard et al. (2003) in Section 4. In Section 5, we discuss a simple example of the modified general location model, as well as our actual model. Section 6 presents the results and compares them with those in Barnard et al. (2003), and Section 7 provides a concluding discussion.

2. New York City School Choice Scholarship Program

In February 1997, the School Choice Scholarship Foundation (SCSF) launched the New York City School Choice Scholarship Program and invited applications from eligible low-income families interested in scholarships toward private school expenses; these scholarships offered up to $1,400 for the academic year 1997–1998. Eligibility requirements included that the children were attending public school in Grades K through 4 in New York City at the time of application and that their families were poor enough to qualify for free school lunch. The SCSF received applications from over 20,000 students. In a mandatory information session before the lottery to assign the scholarships, each family provided background information, and the children in Grades 1 through 4 took the Iowa Test of Basic Skills (ITBS), the pretest in reading and math. In the final lottery held in May 1997, about 1,000 students were randomly selected to the treatment group and were awarded offers of scholarships; about another 1,000 were selected to the control group without the scholarship. Both groups were followed up and strongly encouraged to take a posttest, again the ITBS, at the end of the 1997–1998 academic year.

For administrative reasons, the lottery was conducted separately for families with only one child applying (single-child family) and those with at least two children applying (multiple-child family). In addition, two different random selection schemes, a propensity matched pairs design (PMPD; Rubin, 1979; Rosenbaum & Rubin, 1983; Hill, Rubin, & Thomas, 2002) and a randomized block design, were implemented in two periods of the application, respectively. Students originally attending public schools that had average test scores below the citywide median (low average score) were given a higher probability to be offered a scholarship. Thus, the entire experiment can be considered a randomized one, where the design variables are period of application, applicant’s original public school (low average score/high average score), and family size (single/multiple). For details of the experiment, see Hill et al. (2002). For consistency with Barnard et al. (2003), we only use the data from the 1,050 students from single-child families in Grades 1 through 4 at the time of application.

Two major complications exist in our data. One complication is the two-sided noncompliance. About 20% of the students in the treatment group, instead of using the assigned scholarship and attending private school, remained in public school; however, about 10% of the students in the control group attended private school with their own funding, supposedly unavailable at the time of application. The other complication is missing data. Although every effort was made at the information gathering session to eliminate missing covariate data, some students’ families did not provide certain background information; for example, a single mother might not know the current earnings of the child’s father. In addition, about 10% of the pretest scores and 20% of the posttest scores are missing.

3. Principal Stratification

3.1. Principal Stratification Framework

The principal stratification framework, based on the potential outcome concept of the Rubin Causal Model (Holland, 1986; Imbens & Rubin, 1997; Neyman, 1923; Rubin, 1990), has allowed substantial progress in recent years to address complications in randomized experiments involving partially observed intermediate outcomes, such as noncompliance (Angrist et al., 1996), censoring by death (Zhang & Rubin, 2003), and surrogate measurements (Rubin, 2004). In econometrics, the IV approach has been a major tool for noncompliance problems. Angrist et al. (1996) show that the principal stratification framework is not only compatible with the traditional IV approach of econometrics (e.g., Haavelmo, 1943; Tinbergen, 1930) but also more general and flexible, by making weaker assumptions and being more explicit. Barnard et al. (2003) exemplified how the framework can be applied to data from this program, through a pattern-mixture model to cope with covariate data missingness. In this article, we apply the same principal stratification idea but use a modified general location model to handle both noncompliance and missing data simultaneously. In this section, we discuss the principal stratification idea and possible assumptions but ignore the issue of missing data in the data collection process, which we address in Section 4.

We adopt a notation as close to that in Barnard et al. (2003) as possible. Let Z_i be the treatment assignment of child i in the program: Z_i = 1 if the child is assigned to the treatment group with the scholarship offer, and Z_i = 0 if the child is assigned to control. Let Y_i (1) represent the bivariate potential outcome under treatment, that is, the vector having the posttest reading score and math score if the child is assigned treatment; and let Y_i (0) represent the bivariate potential outcome if assigned control. Obviously, only one of the two potential outcomes can be actually observed: Y_i (1) can be observed only when child i is assigned treatment, and Y_i (0) can be observed only when child i is assigned control. The causal effect of treatment assignment on test scores for child i is defined to be E_i = Y_i (1) − Y_i (0). In addition, let X_i be the vector of background covariates for child i.

We now consider the noncompliance issue. Let D_i (1) denote the actual treatment received by child i if assigned treatment: D_i (1) = 1 if the child actually attends private school using the scholarship offered, and D_i (1) = 0 if the child does not use the offered scholarship and remains in public school. Analogously, let D_i (0) denote the actual treatment received by child i if assigned control: D_i (0) = 1 if the child attends private school with his or her own funding, and D_i (0) = 0 if the child remains in public school when not offered the scholarship. We define the principal stratum of child i to be $S_{i} = [D_{i} (1), D_{i} (0)]$ ; analogous to $Y_{i} (1)$ and $Y_{i} (0)$ , only one of the two components in $S_{i}$ can be actually observed. Moreover, there are four possible principal strata: compliers with $S_{i} = (1,0)$ , never-takers of private schools (0, 0), always-takers of private school (1, 1), and defiers (0, 1).

Suppose for the moment that all the children have the same covariate values. The principal stratification structure of the children is illustrated in Table 1, where “×” represents observed data and “?” represents unobserved, or missing, data due to treatment assignment. If every child were a complier (i.e., if all the $S_{i}$ took the value (1, 0)), we could estimate the average causal effect across all the children, $\bar{E} = AVE [Y_{i} (1) - Y_{i} (0)]$ , using the intention-to-treat estimate $\widehat{ITT} = {AVE}_{Z_{i} = 1} [Y_{i} (1)] - {AVE}_{Z_{i} = 0} [Y_{i} (0)]$ ; that is, the average of all the “×” in the $Y_{i} (1)$ column minus the average of all the “×” in the $Y_{i} (0)$ column. By randomization, $\widehat{ITT}$ would be an unbiased estimate of $\bar{E}$ .
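As a minimal illustration of the intention-to-treat estimate under full compliance, the following Python sketch simulates hypothetical potential outcomes and takes the difference in observed group means. The sample size, score distribution, and constant effect of 3 points are illustrative assumptions, not values from the scholarship program, and the function name `itt_estimate` is ours.

```python
import numpy as np

def itt_estimate(z, y_obs):
    """Difference in mean observed outcomes between the assigned groups."""
    z = np.asarray(z)
    y_obs = np.asarray(y_obs, dtype=float)
    return y_obs[z == 1].mean() - y_obs[z == 0].mean()

# Hypothetical data: every child is a complier (assumption for this sketch).
rng = np.random.default_rng(0)
n = 2000
y0 = rng.normal(20.0, 5.0, size=n)   # potential outcome under control
y1 = y0 + 3.0                        # potential outcome under treatment
z = rng.permutation(np.repeat([1, 0], n // 2))  # completely randomized assignment

# Only one potential outcome per child is observed (the "x" entries in Table 1).
y_obs = np.where(z == 1, y1, y0)

itt_hat = itt_estimate(z, y_obs)
true_effect = (y1 - y0).mean()       # knowable only because the data are simulated
print(f"ITT estimate: {itt_hat:.2f}, true average effect: {true_effect:.2f}")
```

Because assignment is randomized, the estimate differs from the true average effect only by sampling noise.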

Table 1.
Principal Stratification Structure

$i$	$Z_{i}$	$D_{i} (1)$	$D_{i} (0)$	$Y_{i} (1)$	$Y_{i} (0)$
1	1	$\times$	?	$\times$	?
2	1	$\times$	?	$\times$	?
…	1	$\times$	?	$\times$	?
$n_{T}$	1	$\times$	?	$\times$	?
$n_{T} + 1$	0	?	$\times$	?	$\times$
$n_{T} + 2$	0	?	$\times$	?	$\times$
…	0	?	$\times$	?	$\times$
$n_{T} + n_{C}$	0	?	$\times$	?	$\times$

Note: “ $\times$ ” represents observed data; “?” represents missing data due to treatment assignment.

Now consider covariates. In this section, we assume the background covariates $X_{i}$ are fully observed for each child. If everybody were a complier, and we had large enough samples, we could calculate ${\widehat{ITT}}_{cell}$ for each cell defined by $X_{i}$ , where children in one cell share the same values of $X_{i}$ . If all the covariates in the treatment group and the control group are balanced due to randomization, $\widehat{ITT}$ for all the children and ${\widehat{ITT}}_{cell}$ for the children in each covariate cell are unbiased estimates of $\bar{E}$ , the average causal effect across all the children, and ${\bar{E}}_{cell}$ , the average causal effect within the cell, respectively. However, if some covariates are design variables that influence the probability of treatment assignment, we have to consider the “grand” cells defined by the design variables first, within which the other covariates are balanced by randomization and the above arguments can then be applied.

With noncompliance, we can classify the children into four different principal strata: compliers, always-takers, never-takers, and defiers. Although the treatment assignment determines which component of the principal stratum $S_{i}$ can be actually observed, it cannot affect the value of the bivariate $S_{i}$ . Therefore, we can treat the principal stratum $S_{i}$ as a bivariate covariate and estimate the average causal effect within each principal stratum. These effects are called principal causal effects and are defined similarly to the above definition of ${\bar{E}}_{cell}$ in a covariate cell, except that $S_{i}$ is only a partially observed covariate. For example, the principal causal effect for compliers, or the complier average causal effect, is $CACE = {AVE}_{S_{i} = (1,0)} [Y_{i} (1) - Y_{i} (0)]$ . Analogously, we can define the always-taker average causal effect (AACE), the never-taker average causal effect (NACE), and the defier average causal effect (DACE). The CACE is especially interesting to us, because for compliers assignment to treatment implies attending private school and assignment to control corresponds to attending public school. Therefore, the CACE not only denotes the average causal effect of assignment to treatment versus control in this subgroup of children but also represents the effect of private school versus public school. Note that in Table 1, although there are missing data for both D and Y variables due to treatment assignment, only the missing D values are new: We need to infer each child’s likely principal stratum to estimate the principal causal effects for each principal stratum of the children. As in Barnard et al. (2003), this typically is done with a Bayesian parametric model and missing data techniques, to be discussed in Section 5.

3.2. Assumptions for Principal Stratification

Here, we explain assumptions common in our approach and Barnard et al. (2003).

Assumption 1: Stable Unit Treatment Value Assumption (SUTVA; Rubin, 1980). This assumption states that the treatment assignment of one child will not affect other children’s posttest scores and that there are no different versions of public schools that a particular child can attend, and analogously, there are no different versions of private schools. We consider this assumption reasonable, because the children in the experiment probably do not know each other, and they typically attend the closest private school or the closest public school from their home. The SUTVA is widely assumed in randomized experiments, and the representation of potential outcomes in Table 1 would not be adequate without it.

Assumption 2: Ignorable Treatment Assignment (Rubin, 1978). Formally, $P [Z | Y (1), Y (0), D (1), D (0), X] = P (Z | Y_{obs}, D_{obs}, X_{obs})$ , where $Y_{obs}$ , $D_{obs}$ , and $X_{obs}$ represent the observed values of outcome, compliance, and covariates, respectively. This assumption holds when the treatment assignment mechanism is completely randomized or only depends on observed values. With this assumption, for Bayesian or likelihood inference, we do not need to model the treatment assignment mechanism explicitly to estimate causal effects. This is the case in the scholarship program, although the model for the data needs to include the design variables to obtain valid estimates.

Assumption 3: Monotonicity: $D_{i} (1) \geq D_{i} (0)$ . Each child is at least as likely to attend private school if assigned the scholarship as if assigned control. This rules out the possibility of defiers in the program and is very plausible.

Assumption 4: Exclusion Restriction: If $D_{i} (1) = D_{i} (0)$ , then $Y_{i} (1) = Y_{i} (0)$ . This states that for each always-taker or never-taker, the causal effect of treatment assignment is zero. Intuitively, because these children attend the same type of school whether or not they are offered the scholarship, their academic achievements will not be affected by the offer. This assumption might be questionable. For example, an always-taker who will attend private school regardless of the treatment assignment will have extra money if offered the scholarship. This extra money might have an impact on the child’s academic performance and thus the posttest scores. Fortunately, because our estimate of the proportion of always-takers is relatively small (about 11%), and this assumption is more plausible for never-takers, there is reason to believe that making this assumption will not have a substantial impact on the results, as similarly stated in Barnard et al. (2003).

With the above four assumptions, there only exist three principal strata of children in the program: compliers, never-takers, and always-takers; and we are focused on the CACE, because the other two principal causal effects are assumed to be zero.
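To see why these four assumptions make the CACE estimable, recall that under them the intention-to-treat effect on the outcome equals the CACE times the proportion of compliers, which is the basis of the instrumental variable estimator of Angrist et al. (1996). The sketch below is that simple moment-based ratio applied to simulated data. It is not the Bayesian model-based approach developed in this article, it ignores covariates and missing data, and the stratum proportions, score distribution, and effect size are hypothetical.

```python
import numpy as np

def wald_cace(z, d, y):
    """Moment-based CACE: ITT effect on Y divided by ITT effect on D."""
    z, d, y = (np.asarray(v, dtype=float) for v in (z, d, y))
    itt_y = y[z == 1].mean() - y[z == 0].mean()
    itt_d = d[z == 1].mean() - d[z == 0].mean()  # estimates P(complier) under monotonicity
    return itt_y / itt_d

# Hypothetical population with no defiers (monotonicity).
rng = np.random.default_rng(1)
n = 20000
s = rng.choice(["c", "n", "a"], size=n, p=[0.7, 0.2, 0.1])
z = rng.integers(0, 2, size=n)                   # randomized assignment

# Attendance: always-takers attend, never-takers do not, compliers follow Z.
d = np.where(s == "a", 1, np.where(s == "c", z, 0))

# Exclusion restriction: assignment changes the outcome only for compliers.
y0 = rng.normal(20.0, 5.0, size=n) + 2.0 * (s == "a")
y = y0 + 4.0 * ((s == "c") & (z == 1))           # true CACE is 4 by construction

cace_hat = wald_cace(z, d, y)
print(f"Moment-based CACE estimate: {cace_hat:.2f}")
```

The ratio breaks down when the estimated proportion of compliers is near zero, which is one practical reason to prefer a model-based analysis in small samples.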

4. Missing Data

4.1. Missing Covariate and Outcome Data

In the New York City School Choice Scholarship Program, some background covariates and outcomes could not be collected; these are depicted by “??” in Table 2. Note that these data should have been observed but were somehow missing in the data collection process of the experiment, and thus differ from the missing data due to treatment assignment depicted by “?” in the same table. We use an indicator vector $R_{x i}$ to show whether each scalar component of $X_{i}$ is observed or not, where $R_{x i}$ and $X_{i}$ have the same dimensions. In Table 2, we only show the simplest case where each of them is a scalar. We use a scalar indicator $R_{y i} (1)$ to record whether the child takes the posttest in both reading and math if assigned treatment: $R_{y i} (1) = 1$ means that the child takes the test and we can observe $Y_{i} (1)$ , including both the reading score and the math score of the posttest. Analogously, we use $R_{y i} (0)$ to indicate whether the child takes the posttest if assigned control. Both $R_{y i} (1)$ and $R_{y i} (0)$ are potential outcomes like $Y_{i} (1)$ and $Y_{i} (0)$ .
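The indicators just defined can be computed directly from a data matrix in which uncollected values are coded as NaN. The short Python sketch below uses three hypothetical children with two covariates and a bivariate posttest; the numbers and the NaN coding are assumptions for illustration. The outcome indicator computed here is the observed one, that is, $R_{y i} (Z_{i})$ under the child's actual assignment.

```python
import numpy as np

# Hypothetical covariates (two per child); NaN marks a value not collected.
X = np.array([[12.0, np.nan],
              [np.nan, 30.0],
              [15.0, 28.0]])

# Hypothetical posttest scores (reading, math) under the actual assignment.
Y = np.array([[41.0, 39.0],
              [np.nan, np.nan],
              [35.0, 37.0]])

# R_x has the same dimensions as X: 1 if the component is observed, 0 otherwise.
R_x = (~np.isnan(X)).astype(int)

# R_y is a scalar per child: 1 only if both reading and math scores are observed.
R_y = (~np.isnan(Y).any(axis=1)).astype(int)

print(R_x)
print(R_y)
```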

Table 2.
Principal Stratification Structure With Missing Data

$i$	$Z_{i}$	$X_{i}$	$R_{x i}$	$D_{i} (1)$	$D_{i} (0)$	$Y_{i} (1)$	$Y_{i} (0)$	$R_{y i} (1)$	$R_{y i} (0)$
1	1	$\times$	1	$\times$	?	$\times$	?	1	?
2	1	??	0	$\times$	?	$\times$	?	1	?
…	1	$\times$	1	$\times$	?	??	?	0	?
$n_{T}$	1	$\times$	1	$\times$	?	$\times$	?	1	?
$n_{T} + 1$	0	$\times$	1	?	$\times$	?	$\times$	?	1
$n_{T} + 2$	0	$\times$	1	?	$\times$	?	??	?	0
…	0	??	0	?	$\times$	?	$\times$	?	1
$n_{T} + n_{C}$	0	$\times$	1	?	$\times$	?	$\times$	?	1

Note: “ $\times$ ” represents observed data; “?” represents missing data due to treatment assignment; “??” represents missing data in the data collection process.

4.2. Assumptions for Missing Data

We now consider possible assumptions with such a missing data problem, some of which also were explicitly made by Barnard et al. (2003).

Assumption 5: Compound Exclusion Restriction. If $D_{i} (1) = D_{i} (0)$ , then $Y_{i} (1) = Y_{i} (0)$ and $R_{y i} (1) = R_{y i} (0)$ . This assumption is an extension of the original exclusion restriction assumption achieved by treating $R_{y i} (1)$ and $R_{y i} (0)$ as additional potential outcomes. It states that for always-takers and never-takers, neither their potential posttest score values nor their posttest score missingness will be affected by the scholarship offer. It was first formulated by Frangakis and Rubin (1999) and was used by Barnard et al. (2003); we consider it reasonable for the program data.

Assumption 6: Ignorable Covariate Missingness. $P (R_{x} | X_{obs}, X_{mis}) = P (R_{x} | X_{obs})$ . This is the standard ignorability assumption for missing data (Little & Rubin, 2002). It states that the missingness of covariate data, $R_{x}$ , is independent of the missing values $X_{mis}$ , given the observed values $X_{obs}$ . With this assumption, for Bayesian inference, we do not need to model the missingness mechanism explicitly, if we are interested only in the values of the covariates, whether observed or missing. A special case of the assumption is $P (R_{x} | X_{obs}, X_{mis}) = P (R_{x})$ , which means missingness is completely at random. This assumption is largely irrelevant for Barnard et al. (2003), because they combined the observed covariates $X_{obs}$ and the missingness pattern $R_{x}$ , both of which are always observed, as their new set of covariates. As a result, there were no missing covariates to consider in their approach, which is different from what we present in this article.

Assumption 7: Latently Ignorable Outcome Missingness. $P [R_{y} (1), R_{y} (0) | Y (1), Y (0), X, S] = P [R_{y} (1), R_{y} (0) | X, S]$ . This differs from the standard ignorability: $R_{y} (1)$ and $R_{y} (0)$ , the missingness of $Y (1)$ and $Y (0)$ , do not depend on the underlying outcome values, although they still depend on X and S, both of which might contain certain missing components. Thus, the missingness mechanism of the outcomes is ignorable only if we know the values of the background covariates X and the principal stratum S. As a result, we cannot really ignore the missingness mechanisms of $R_{y} (1)$ and $R_{y} (0)$ ; instead, we need to model them explicitly, because they contain information about the unobserved X and S values we are interested in. This assumption seems more suitable for the outcome missingness than the standard ignorability assumption, possibly providing us more information about the missing data. The same assumption was presented in Barnard et al. (2003) as $P [R_{y} (1), R_{y} (0) | Y (1), Y (0), X, R_{x}, S] = P [R_{y} (1), R_{y} (0) | X_{obs}, R_{x}, S]$ , again because they used both $X_{obs}$ and $R_{x}$ as their fully observed covariates.

Assumptions 1 through 7 lay the foundation for our analysis regardless of specific parametric models. In contrast, Barnard et al. (2003) made Assumptions 1 through 5, together with Assumption 7 in a different form. Based on these assumptions, we can propose the modified general location model in the next section.

5. Modified General Location Model

5.1. Standard General Location Model

Table 2 can be transformed into Table 3 to illustrate the data structure we face from the computational perspective. There are three categorical variables: the binary treatment assignment $Z_{i}$ ; the binary fully observed actual attendance $D_{i}$ , which equals $D_{i} (1)$ for the treatment group children and $D_{i} (0)$ for the control group children; and the principal stratum $S_{i} \in \{c, n, a\}$ , which is missing due to the treatment assignment and can only take the value of complier (c), never-taker (n), or always-taker (a). There are two continuous variables: the background covariate $X_{i}$ such as pretest score and the actual outcome $Y_{i}$ , which equals $Y_{i} (1)$ for the treatment group children and $Y_{i} (0)$ for the control group children. Again, we use “×,” “?,” and “??” to indicate observed data, missing data due to treatment assignment, and missing data in the data collection process, respectively. We do not include $R_{x}$ in our model because of the standard ignorability assumption, and $R_{y}$ is discussed in the actual model used in Section 5.3.

Table 3.
General Location Model

$i$	$Z_{i}$	$D_{i}$	$S_{i}$	$X_{i}$	$Y_{i}$
1	1	$\times$	?	$\times$	$\times$
2	1	$\times$	?	??	$\times$
…	1	$\times$	?	$\times$	??
$n_{T}$	1	$\times$	?	$\times$	$\times$
$n_{T} + 1$	0	$\times$	?	$\times$	$\times$
$n_{T} + 2$	0	$\times$	?	$\times$	??
…	0	$\times$	?	??	$\times$
$n_{T} + n_{C}$	0	$\times$	?	$\times$	$\times$

Note: “ $\times$ ” represents observed data; “?” represents missing data due to treatment assignment; “??” represents missing data in the data collection process.

If there were no restrictions imposed by our explicit assumptions discussed in Sections 3 and 4, a standard general location model (Schafer, 1997) could be used to compute the missing data in Table 3 and make inferences for the underlying parameters. This model assumes a multivariate normal distribution $N [{(μ_{x}, μ_{y})}^{T}, Σ]$ for the continuous variables in each cell defined by the categorical variables, where the mean $μ = (μ_{x}, μ_{y})^{T}$ varies across the cells but the covariance matrix Σ remains the same. A constraint matrix A can be applied to reduce the number of unconstrained parameters governing the cell means, such that the mean matrix for all the $n_{cell}$ cells is

$$W = (μ_{1}, μ_{2}, \ldots, μ_{n_{cell}})^{T} = A β,$$

where the components of vector β are the unconstrained parameters. The categorical data follow a log-linear model:

$$\log π = M λ,$$

where π represents the probability vector for all the $n_{cell}$ cells, M represents a design matrix specifying the association among the categorical variables, and λ is the corresponding vector of unconstrained parameters.

Computational tools to make inferences for the parameters β, Σ, and λ, as well as the missing data, are generally iterative algorithms such as the EM (Dempster et al., 1977) or Expectation-Conditional Maximization (ECM) algorithms (Meng & Rubin, 1993), as well as data augmentation based on the Gibbs sampler or the Metropolis-Hastings algorithm (Gelman, Carlin, Stern, & Rubin, 2004). The basic idea is that starting from initial values, given the parameters and the observed data, we can estimate the missing data; then given the missing data and the observed data (i.e., the complete data), we in turn can estimate the parameters; and we can iterate this way until the process converges.

Specifically, in the case of the standard general location model, we can apply ECM to obtain maximum likelihood (ML) estimates: Given the complete data, λ can be estimated using one cycle of iterative proportional fitting (IPF; Bishop, Fienberg, & Holland, 1975), and (β, Σ) can be estimated through a multivariate regression $(X, Y) = U A β + ε$ , where (X, Y) is the matrix of the continuous variables, U is the matrix of dummy indicators recording the cell location of each child, and each row of the random error ε follows the multivariate normal distribution $N (0, Σ)$ ; given these parameters, the expectation of the missing complete-data sufficient statistic is straightforward. The process will converge when the iterations are repeated long enough, thus providing us the point estimate or the posterior distribution of the parameters and the missing data. For Bayesian inference, the analogous steps take random draws from the conditional posterior distributions. For details of the model and computation, see Schafer (1997).
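The complete-data regression step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation in Schafer (1997): the function name `fit_cell_means` is ours, the constraint matrix A is taken to be the identity (no constraints on the cell means), the data are simulated, and β is obtained by ordinary least squares with Σ estimated by the residual cross-product matrix divided by n.

```python
import numpy as np

def fit_cell_means(cell_idx, A, V):
    """ML estimates of (beta, Sigma) in V = U A beta + eps, given complete data."""
    n, n_cell = len(cell_idx), A.shape[0]
    U = np.zeros((n, n_cell))
    U[np.arange(n), cell_idx] = 1.0          # dummy indicators of cell location
    design = U @ A
    beta, *_ = np.linalg.lstsq(design, V, rcond=None)
    resid = V - design @ beta
    sigma = resid.T @ resid / n              # ML estimate divides by n
    return beta, sigma

# Hypothetical complete data: 6 cells, two continuous variables (X, Y).
rng = np.random.default_rng(2)
n, n_cell = 6000, 6
cell_idx = rng.integers(0, n_cell, size=n)
true_means = np.arange(12, dtype=float).reshape(n_cell, 2)
true_sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
V = true_means[cell_idx] + rng.multivariate_normal(np.zeros(2), true_sigma, size=n)

A = np.eye(n_cell)                           # assumption: unconstrained cell means
beta_hat, sigma_hat = fit_cell_means(cell_idx, A, V)
print(np.round(A @ beta_hat, 2))             # estimated cell means W = A beta
print(np.round(sigma_hat, 2))
```

In a full ECM or data augmentation scheme, this step would alternate with a step that fills in or draws the missing values given the current parameters.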

5.2. Modified General Location Model—A Simple Example

In Table 3, the value of $D_{i}$ is dictated by the values of $Z_{i}$ and $S_{i}$ ; therefore, the number of cells in Table 3 is determined only by the possible combinations of $Z \in \{0,1\}$ and $S \in \{c, n, a\}$ , that is, $n_{cell} = 2 \times 3 = 6$ . Therefore, we can further transform Table 3 into Table 4, a contingency table with six cells defined by (S, Z). For example, in the (c, 1) cell, all the children are compliers who are actually assigned treatment, the scholarship, in the program. The mean for these children in the cell therefore is defined as ${(μ_{x}^{(c, 1)}, μ_{y}^{(c, 1)})}^{T}$ . Analogously, we can define such means for other cells.
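The statement that attendance is dictated by assignment and principal stratum can be made concrete with a small Python sketch. The function and variable names below are hypothetical. The second function shows the converse direction, which is why $S_{i}$ is only partially observed: some observed (Z, D) pairs are compatible with two strata.

```python
from itertools import product

# Principal strata under monotonicity, coded as (D(1), D(0)).
STRATA = {"c": (1, 0), "n": (0, 0), "a": (1, 1)}

def attendance(z, s):
    """Observed attendance D implied by assignment z and stratum s."""
    d1, d0 = STRATA[s]
    return d1 if z == 1 else d0

def compatible_strata(z, d):
    """Strata that could have produced the observed pair (z, d)."""
    return [s for s in STRATA if attendance(z, s) == d]

# The six cells of the contingency table, each with its implied attendance.
cells = [(s, z, attendance(z, s)) for s, z in product("cna", (1, 0))]
for s, z, d in cells:
    print(f"cell ({s}, {z}): D = {d}")

for z, d in product((1, 0), (1, 0)):
    print(f"Z = {z}, D = {d}: possible strata {compatible_strata(z, d)}")
```

Children with (Z, D) equal to (1, 0) or (0, 1) reveal their stratum, whereas those with (1, 1) or (0, 0) are mixtures of two strata that the model must separate.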

Table 4.
Contingency Table of General Location Model

	$Z = 1$	$Z = 0$
S = complier	$N [{(μ_{x}^{(c, 1)}, μ_{y}^{(c, 1)})}^{T}, Σ]$	$N [{(μ_{x}^{(c, 0)}, μ_{y}^{(c, 0)})}^{T}, Σ]$
S = never-taker	$N [{(μ_{x}^{(n, 1)}, μ_{y}^{(n, 1)})}^{T}, Σ]$	$N [{(μ_{x}^{(n, 0)}, μ_{y}^{(n, 0)})}^{T}, Σ]$
S = always-taker	$N [{(μ_{x}^{(a, 1)}, μ_{y}^{(a, 1)})}^{T}, Σ]$	$N [{(μ_{x}^{(a, 0)}, μ_{y}^{(a, 0)})}^{T}, Σ]$

Now, we need to modify the standard general location model, by imposing two constraints to the cell means according to our explicit assumptions.

Ignorable treatment assignment assumption. If the experiment is completely randomized, the continuous covariate variables in a treatment cell (Z = 1) should have the same distribution as those in the corresponding control cell (Z = 0). Specifically, the normal means for the continuous covariates X in the following cells should be the same:

$$μ_{x}^{(c, 1)} = μ_{x}^{(c, 0)}, \quad μ_{x}^{(n, 1)} = μ_{x}^{(n, 0)}, \quad μ_{x}^{(a, 1)} = μ_{x}^{(a, 0)} .$$

In reality, there exist some design variables in the program, and so in our actual model in Section 5.3, the above relations hold within each “super” cell defined by those design variables, because all the other covariates are balanced in such cells of design variables by treatment assignment.

Exclusion restriction assumption. Always-takers and never-takers should have zero causal effects. Therefore, the outcomes in a treatment cell should have the same distribution as the outcomes in the corresponding control cell. Specifically, the normal means for the outcomes Y in the following cells should be the same:

$$μ_{y}^{(n, 1)} = μ_{y}^{(n, 0)}, \quad μ_{y}^{(a, 1)} = μ_{y}^{(a, 0)} .$$

Note that $μ_{y}^{(c, 1)} - μ_{y}^{(c, 0)}$ is actually the CACE of primary interest. We address the implementation of the compound exclusion restriction in Section 5.3.

As a result of the above two constraints, we should modify the specification and computation of the standard general location model. First, we divide the continuous variables into two blocks, the covariate X block and the outcome Y block, and impose a different constraint matrix on the normal means of each block:

W = (μ_{1}, μ_{2}, ..., μ_{n_{cell}})^{T} = (W_{x}, W_{y}) = (A_{1} β_{1}, A_{2} β_{2}) .

More specifically, let the unconstrained parameters be $β_{1} = (β_{1}^{0}, β_{1}^{1}, β_{1}^{2})^{T}$ and $β_{2} = (β_{2}^{0}, β_{2}^{1}, β_{2}^{2}, β_{2}^{3})^{T}$. Table 5 provides the definition of the constraint matrices $A_{1}$ and $A_{2}$, as well as the cell means $W_{x} = A_{1} β_{1}$ and $W_{y} = A_{2} β_{2}$. It is easy to verify that the cell means satisfy the ignorable treatment assignment and exclusion restriction assumptions. For example, $μ_{x}^{(c, 1)} = β_{1}^{0} = μ_{x}^{(c, 0)}$ and $μ_{y}^{(n, 1)} = β_{2}^{0} + β_{2}^{2} = μ_{y}^{(n, 0)}$. Note that $β_{2}^{1} = μ_{y}^{(c, 1)} - μ_{y}^{(c, 0)}$ represents the CACE.

Table 5.

Constraint Matrices and Cell Means of Modified General Location Model

Cell	Z	D	S	$A_{1}$			$A_{2}$				$W_{x}$	$W_{y}$
(c, 1)	1	1	c	1	0	0	1	1	0	0	$β_{1}^{0}$	$β_{2}^{0} + β_{2}^{1}$
(c, 0)	0	0	c	1	0	0	1	0	0	0	$β_{1}^{0}$	$β_{2}^{0}$
(n, 1)	1	0	n	1	1	0	1	0	1	0	$β_{1}^{0} + β_{1}^{1}$	$β_{2}^{0} + β_{2}^{2}$
(n, 0)	0	0	n	1	1	0	1	0	1	0	$β_{1}^{0} + β_{1}^{1}$	$β_{2}^{0} + β_{2}^{2}$
(a, 1)	1	1	a	1	0	1	1	0	0	1	$β_{1}^{0} + β_{1}^{2}$	$β_{2}^{0} + β_{2}^{3}$
(a, 0)	0	1	a	1	0	1	1	0	0	1	$β_{1}^{0} + β_{1}^{2}$	$β_{2}^{0} + β_{2}^{3}$
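The check described above can be sketched in a few lines of Python. The matrices are copied from Table 5; the numerical values of `beta1` and `beta2` are hypothetical and chosen only for illustration, and the helper `cell_means` is our own name rather than part of any existing software.

```python
# Rows follow the cell order of Table 5: (c,1), (c,0), (n,1), (n,0), (a,1), (a,0).
CELLS = [("c", 1), ("c", 0), ("n", 1), ("n", 0), ("a", 1), ("a", 0)]

A1 = [  # constraint matrix for the covariate block X
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
]
A2 = [  # constraint matrix for the outcome block Y
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]


def cell_means(A, beta):
    """Return the product A * beta as a dict keyed by (stratum, z)."""
    values = [sum(a * b for a, b in zip(row, beta)) for row in A]
    return dict(zip(CELLS, values))


# Hypothetical parameter values (assumptions, not estimates from the paper).
beta1 = [20.0, -3.0, 4.0]
beta2 = [25.0, 3.5, -2.0, 5.0]

W_x = cell_means(A1, beta1)
W_y = cell_means(A2, beta2)

# Ignorable treatment assignment: covariate means agree across Z in every stratum.
ignorable_ok = all(W_x[(s, 1)] == W_x[(s, 0)] for s in "cna")
# Exclusion restriction: outcome means agree across Z for never- and always-takers.
exclusion_ok = all(W_y[(s, 1)] == W_y[(s, 0)] for s in "na")
# The complier contrast recovers the second component of beta2 (the CACE).
cace = W_y[("c", 1)] - W_y[("c", 0)]

print(ignorable_ok, exclusion_ok, cace)
```

Because the two rows for each stratum are identical in $A_{1}$, and identical for never-takers and always-takers in $A_{2}$, the equalities hold for any choice of $β_{1}$ and $β_{2}$, not only for the values used here.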

Second, we partition the common covariance matrix Σ into four submatrices accordingly,

Σ = (\begin{matrix} Σ_{(11)} & Σ_{(12)} \\ Σ_{(21)} & Σ_{(22)} \end{matrix}),

where $Σ_{(21)} = Σ_{(12)}^{T}$.

Third, we modify the computation for $β_{1}$, $β_{2}$, and Σ, given the complete data, in the following way: We first estimate $β_{1}$ and $Σ_{(11)}$ through the multivariate regression $X = U A_{1} β_{1} + ε_{1}$, and $β_{2}$ and $Σ_{(22)}$ through $Y = U A_{2} β_{2} + ε_{2}$; then, given those parameters, we run a multivariate regression of the second block Y on the first block X to compute $Σ_{(12)}$:

(Y - U A_{2} β_{2}) = (X - U A_{1} β_{1}) Σ_{(11)}^{- 1} Σ_{(12)} + ε_{2 | 1},

where each row of the random error matrix $ε_{2 | 1}$ follows a multivariate normal distribution

N [0, Σ_{(22)} - Σ_{(21)} Σ_{(11)}^{- 1} Σ_{(12)}] .
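As a rough illustration of these three steps, the following Python sketch uses one scalar covariate and one scalar outcome, so that every block of Σ is a single number. It is a simplified stand-in under stated assumptions, not the S-PLUS implementation: the simulated "true" values are hypothetical, U is taken to be the matrix of cell indicators, and variances use the divisor n. Because $A_{1}$ and $A_{2}$ give one free mean to each group of cells that share a mean, least squares on $U A_{1}$ or $U A_{2}$ reduces to group means here.

```python
import random
from statistics import fmean

random.seed(0)

# Hypothetical "true" values, used only to simulate complete data.
MU_X = {"c": 20.0, "n": 17.0, "a": 24.0}        # equal across Z (ignorable assignment)
MU_Y = {("c", 1): 28.5, ("c", 0): 25.0,         # compliers: CACE = 3.5
        ("n", None): 23.0, ("a", None): 30.0}   # equal across Z (exclusion restriction)
S11, S12, S22 = 4.0, 3.0, 9.0


def y_group(s, z):
    """Cells that share one outcome mean under A2."""
    return (s, z) if s == "c" else (s, None)


def draw_unit():
    s = random.choices("cna", weights=[6, 3, 1])[0]
    z = random.randint(0, 1)
    ex = random.gauss(0.0, S11 ** 0.5)
    # Y given X has slope S12 / S11 and variance S22 - S12^2 / S11.
    ey = (S12 / S11) * ex + random.gauss(0.0, (S22 - S12 ** 2 / S11) ** 0.5)
    return s, z, MU_X[s] + ex, MU_Y[y_group(s, z)] + ey


def group_means(keys, values):
    groups = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    return {k: fmean(v) for k, v in groups.items()}


data = [draw_unit() for _ in range(5000)]
x_keys = [s for s, z, x, y in data]
y_keys = [y_group(s, z) for s, z, x, y in data]
xs = [x for s, z, x, y in data]
ys = [y for s, z, x, y in data]

# Step 1: regress X on U A1 (group means), then estimate Sigma_(11) from residuals.
mx = group_means(x_keys, xs)
rx = [x - mx[k] for k, x in zip(x_keys, xs)]
s11_hat = fmean([r * r for r in rx])

# Step 2: regress Y on U A2, then estimate Sigma_(22); the complier contrast is beta_2^1.
my = group_means(y_keys, ys)
ry = [y - my[k] for k, y in zip(y_keys, ys)]
s22_hat = fmean([r * r for r in ry])
cace_hat = my[("c", 1)] - my[("c", 0)]

# Step 3: regress the Y residuals on the X residuals; the slope estimates
# Sigma_(11)^{-1} Sigma_(12), so Sigma_(12) = slope * Sigma_(11).
slope = sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)
s12_hat = slope * s11_hat
cond_var = fmean([(b - slope * a) ** 2 for a, b in zip(rx, ry)])

print(round(cace_hat, 2), round(s11_hat, 2), round(s12_hat, 2), round(s22_hat, 2))
```

In this scalar case the residual variance from Step 3 equals `s22_hat - s12_hat**2 / s11_hat`, which is the sample analogue of the conditional covariance given above.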

It is relatively straightforward to extend this two-block model to models with more blocks. Suppose we have L blocks of continuous variables, each with its own constraint matrix $A_{l}$, $l = 1, 2, . . ., L$. Then the cell mean matrix W is partitioned as

W = (W_{1}, W_{2}, . . ., W_{L}) = (A_{1} β_{1}, A_{2} β_{2}, . . ., A_{L} β_{L}),

and the covariance matrix Σ now becomes

Σ = (\begin{matrix} Σ_{(11)} & ... & Σ_{(1 L)} \\ ... & ... & ... \\ Σ_{(L 1)} & ... & Σ_{(L L)} \end{matrix}) .

The extension in computation is also straightforward.

5.3. Modified General Location Model—The Actual Model

We actually use a four-block model for the single-child data in the program. The categorical variables are Z, D, S, application period (the PMPD period or not), applicant’s original school (low average score or high average score), and grades (1−4), which account for the design variables of the experiment.

We use the following four blocks of continuous variables:

Block 1 includes two fully observed categorical background covariates (whether the child has received any foreign education, whether the father’s work status is missing) and two almost fully observed categorical covariates (sex, religion) and treats them as continuous variables (X ₁);

Block 2 includes two continuous background covariates (X ₂): pretest reading score and pretest math score;

Block 3 includes two continuous outcomes (Y): posttest reading score and posttest math score;

Block 4 is the outcome missingness treated as a continuous variable (R_y ): whether the child takes the posttest.

In the above specification, we can treat the fully observed categorical data in Block 1 and Block 4 as continuous, because no missing data need to be estimated. For the two almost fully observed covariate variables in Block 1, we do not expect this approximation to significantly distort the results, because very few data (less than 5%) are missing. Of course, the more missing data in the categorical background covariates, the less appropriate it is to use this continuous approximation, and a better model to handle this issue is needed, as discussed in Section 7.

For the categorical variables in our modified general location model, we assumed a log-linear model that had two-way interactions of principal stratum S with application period, S with applicant’s original school and S with grade. We also imposed linear model constraints on the cell means that had main effects for S, application period, and grade along with all two-way interactions among these variables in addition to the constraints implied by ignorable treatment assignment and exclusion restriction.

Under the latently ignorable outcome missingness assumption, the outcome missingness R_y in Block 4 should be independent of the outcome values Y in Block 3, given the covariates X and the principal stratum S. Within each cell defined by the categorical variables, including S, we take $Σ_{(34)} = Σ_{(43)}^{T} = 0$, so that marginally across the pretreatment continuous variables, Y and R_y are independent. In addition, $A_{3}$ and $A_{4}$ should satisfy the compound exclusion restriction, which requires $μ_{R_{y}}^{(n, 1)} = μ_{R_{y}}^{(n, 0)}$ and $μ_{R_{y}}^{(a, 1)} = μ_{R_{y}}^{(a, 0)}$, in addition to the exclusion restriction on the outcome means $μ_{y}$ shown in Section 5.2 and Table 5.

We applied our model to the 1,050 children in Grades 1 through 4 from single-child families, with a flat prior distribution for the parameters of our four-block Bayesian model. We first ran the EM algorithm to get the ML estimate of the parameters, and then used it as an initial value for a Markov chain Monte Carlo (MCMC) run that draws posterior samples of the parameters and the missing data. In each MCMC iteration, we simulated the parameters and the missing principal strata, missing covariates, and missing outcomes. After convergence, we obtained the posterior distribution of all estimands of interest: parameters, CACE, proportions of the three principal strata, and so on. To assess convergence, we ran several separate chains starting from different initial values and monitored them with the Gelman-Rubin statistic (Gelman & Rubin, 1992).
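For readers unfamiliar with this diagnostic, the sketch below computes the basic potential scale reduction factor for one scalar estimand from several chains of equal length. It is a minimal version without the degrees-of-freedom correction in Gelman and Rubin (1992); the function name `gelman_rubin` and the toy chains are our own illustrative assumptions, not output from the model above.

```python
import random
from statistics import fmean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for m chains of equal length n."""
    n = len(chains[0])
    chain_means = [fmean(c) for c in chains]
    W = fmean([variance(c) for c in chains])   # average within-chain variance
    B = n * variance(chain_means)              # between-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled estimate of the posterior variance
    return (var_plus / W) ** 0.5


random.seed(1)

# Toy chains: four chains that sample the same distribution ...
mixed = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# ... and four chains stuck around different values.
stuck = [[random.gauss(2.0 * j, 1.0) for _ in range(1000)] for j in range(4)]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
print(round(r_mixed, 3), round(r_stuck, 3))
```

Values close to 1 are consistent with the chains having reached a common distribution, whereas values well above 1 indicate that the chains still disagree.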

We developed our software using the S-PLUS code for the standard general location model provided by Schafer (1997), with corresponding modifications based on our discussion in this section. To check the validity of our software and to investigate the frequentist properties of our Bayesian model, we carried out a frequentist simulation as follows: We set the true underlying parameters to their posterior medians; in each replication, we used the true parameters to generate a pseudo data set and made a certain proportion of the data missing completely at random (see Table 6 for the proportions); then we drew posterior samples of the parameters using MCMC and examined whether each true scalar parameter was covered by its corresponding posterior 95% interval; we repeated this procedure for 200 replications and thereby obtained the coverage rate for each scalar parameter. We tried two missingness scenarios: In the first, we made 10% of the covariate data and 15% of the outcome data missing completely at random; in the second, 25% of the covariate data and 25% of the outcome data were made missing completely at random. Table 6 displays the largest and the smallest coverage rates for those scalar parameters under the two scenarios. The simulation results show very good frequentist properties: The coverage rates are between 91.5% and 100%, which suggests that our software was written correctly. A more extensive Bayesian assessment would have used the method of Cook, Gelman, and Rubin (2006). In addition, the increase in the proportion of missing data does not appear to affect the coverage rates. To test the sensitivity of the results to different values of the true underlying parameters, we increased and then decreased certain individual components of the parameters by 20%, and there was little difference from the results in Table 6.

Table 6.

Simulation Results

Scenario	$X$ Missing Rate	$Y$ Missing Rate	Min Coverage Rate	Max Coverage Rate
1	10%	15%	91.5%	100%
2	25%	25%	92.0%	100%
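The structure of this coverage check can be sketched with a deliberately simple stand-in model: a single normal mean with a standard noninformative prior, whose posterior can be sampled directly. This is not the four-block model, and all constants below (`TRUE_MU`, `TRUE_SD`, the sample size, and the number of posterior draws) are hypothetical choices; only the 200 replications, the 15% missing rate, and the 95% interval follow the description above.

```python
import random
from statistics import fmean, variance

random.seed(2024)

# Hypothetical settings for the stand-in model.
TRUE_MU, TRUE_SD = 25.0, 3.0
N, MISSING_RATE = 200, 0.15
N_REPS, N_DRAWS = 200, 1000


def posterior_interval(y, n_draws, level=0.95):
    """Central posterior interval for a normal mean under the prior p(mu, sigma^2) ~ 1/sigma^2."""
    n, ybar, s2 = len(y), fmean(y), variance(y)
    draws = []
    for _ in range(n_draws):
        # sigma^2 | y is scaled inverse chi-square with n - 1 degrees of freedom.
        sigma2 = (n - 1) * s2 / random.gammavariate((n - 1) / 2, 2.0)
        # mu | sigma^2, y is normal around the observed mean.
        draws.append(random.gauss(ybar, (sigma2 / n) ** 0.5))
    draws.sort()
    lo = draws[int((1 - level) / 2 * n_draws)]
    hi = draws[int((1 + level) / 2 * n_draws) - 1]
    return lo, hi


covered = 0
for _ in range(N_REPS):
    # Generate a pseudo data set from the "true" parameters.
    full = [random.gauss(TRUE_MU, TRUE_SD) for _ in range(N)]
    # Delete values completely at random.
    observed = [v for v in full if random.random() >= MISSING_RATE]
    lo, hi = posterior_interval(observed, N_DRAWS)
    covered += lo <= TRUE_MU <= hi

coverage = covered / N_REPS
print(f"coverage of the 95% interval: {coverage:.3f}")
```

With 200 replications, the Monte Carlo standard error of a coverage rate near 95% is roughly 1.5 percentage points, which helps in reading the range of coverage rates reported in Table 6.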

6. Results

We summarize the posterior distributions of the CACE and the principal strata proportions within each group defined by the applicants’ grade at the time of application and whether their original public schools had an average test score higher or lower than the citywide median. Table 7 reports the posterior mean and posterior 95% interval of CACE for each group. Table 8 lists the posterior means and posterior standard deviations for the percentages of all three principal strata in each group. The corresponding results in Barnard et al. (2003) are shown in Tables 9 and 10, respectively.

Table 7.

Complier Average Causal Effect (CACE) of Our Four-Block Modified General Location Model

Grade at Application	Applicant’s School: Low Average Score		Applicant’s School: High Average Score
Grade at Application	Reading	Math	Reading	Math
1	1.8 (−3.1, 6.7)	5.8 (0.7, 10.9)	12.7 (2.2, 23.3)	9.6 (−2.3, 21.7)
2	−0.4 (−5.0, 4.2)	1.5 (−3.2, 6.4)	−1.5 (−11.5, 9.3)	−1.1 (−12.2, 10.4)
3	2.8 (−2.0, 7.3)	4.0 (−1.0, 9.1)	−6.5 (−18.8, 5.6)	−1.5 (−15.4, 11.9)
4	7.1 (1.5, 12.7)	3.7 (−2.1, 9.4)	−2.2 (−15.7, 11.3)	4.1 (−10.1, 18.0)
Overall	2.5 (−0.7, 5.8)	3.7 (0.5, 7.2)	1.3 (−5.4, 8.0)	2.6 (−6.0, 10.5)

Table 8.

Principal Strata Proportions of Our Four-Block Model (%)

Grade at Application	Applicant’s School: Low Average Score			Applicant’s School: High Average Score
Grade at Application	Never-Taker	Complier	Always-Taker	Never-Taker	Complier	Always-Taker
1	28.8 (3.2)	62.8 (3.7)	8.5 (1.9)	36.9 (5.3)	52.4 (5.6)	10.8 (2.9)
2	23.7 (3.2)	64.8 (3.7)	11.5 (2.2)	30.6 (5.0)	54.5 (5.5)	14.9 (3.5)
3	25.3 (3.3)	63.8 (3.9)	10.9 (2.2)	32.5 (5.2)	53.5 (5.6)	14.0 (3.5)
4	24.7 (3.7)	63.0 (4.3)	12.3 (2.7)	31.6 (5.3)	52.6 (5.8)	15.7 (3.9)
Overall	25.6 (1.9)	63.7 (2.2)	10.7 (1.2)	33.0 (4.1)	53.4 (4.4)	13.6 (2.6)

Table 9.

Complier Average Causal Effect (CACE) of Barnard et al. (2003)

Grade at Application	Applicant’s School: Low Average Score		Applicant’s School: High Average Score
Grade at Application	Reading	Math	Reading	Math
1	3.4 (−2.0, 8.7)	7.7 (3.0, 12.4)	1.9 (−7.3, 10.3)	7.4 (0.2, 14.6)
2	0.7 (−3.7, 5.0)	1.9 (−2.4, 6.2)	−0.9 (−9.4, 7.3)	1.5 (−6.2, 9.3)
3	1.0 (−4.1, 6.1)	5.0 (−0.8, 10.7)	−0.8 (−9.5, 7.7)	4.0 (−4.9, 12.5)
4	4.2 (−1.5, 10.1)	4.3 (−1.6, 10.1)	2.7 (−6.3, 11.3)	3.5 (−4.7, 11.9)
Overall	2.2 (−0.9, 5.3)	4.7 (1.4, 7.9)	0.6 (−7.1, 7.7)	4.2 (−2.6, 10.9)

Table 10.

Principal Strata Proportions of Barnard et al. (2003) (%)

Grade at Application	Applicant’s School: Low Average Score			Applicant’s School: High Average Score
Grade at Application	Never-Taker	Complier	Always-Taker	Never-Taker	Complier	Always-Taker
1	24.5 (2.9)	67.1 (3.8)	8.4 (2.4)	25.0 (5.0)	69.3 (6.1)	5.7 (3.3)
2	20.5 (2.7)	69.4 (3.7)	10.1 (2.5)	25.3 (5.1)	67.2 (6.2)	7.5 (3.4)
3	24.5 (3.2)	65.9 (4.0)	9.6 (2.5)	28.8 (5.6)	64.1 (6.7)	7.1 (3.5)
4	18.4 (3.3)	72.8 (4.6)	8.8 (3.0)	27.0 (5.5)	66.7 (6.7)	6.3 (3.6)

The CACE results in Table 7 are broadly similar to those in Table 9: Most of the posterior means are positive but below 10, and most of the 95% intervals cover zero. Both models nevertheless find that, for compliers originally from schools with low average scores, attending private school rather than public school improves overall math performance, with 95% intervals that exclude zero (3.7 [0.5, 7.2] in Table 7; 4.7 [1.4, 7.9] in Table 9). The improvement is especially evident for children in Grade 1 (5.8 [0.7, 10.9] in Table 7; 7.7 [3.0, 12.4] in Table 9). The two models differ for some other groups, however. Using our model, we find that reading scores were likely improved for children from Grade 4 of low average schools (7.1 [1.5, 12.7]) and children from Grade 1 of high average schools (12.7 [2.2, 23.3]); the estimates of Barnard et al. (2003) for these two groups, 4.2 (−1.5, 10.1) and 1.9 (−7.3, 10.3), respectively, were much smaller. They instead found that complying children from Grade 1 of high average schools likely enjoyed an increase in math score by attending private school (7.4 [0.2, 14.6]), whereas our estimate for that group, 9.6 (−2.3, 21.7), is higher but more uncertain.

As shown in Table 8, our model found a posterior distribution of principal strata proportions similar to that in Table 10: Compliers account for a majority of the children, and always-takers turn out to be the smallest principal stratum. This is understandable, because all the children are supposed to come from poor families and cannot attend private school without the scholarship. However, our model tends to predict a somewhat smaller complier stratum and slightly larger sizes for the other two strata than Barnard et al. (2003), especially for children from public schools with high average scores.

7. Discussion

Although our modified general location model and the model in Barnard et al. (2003) differ in some underlying assumptions, selection of covariates, and the actual parametric form, we share the same principal stratification framework. This might be one explanation for the similar result of the math improvement of complying children from low average score schools, which is different from the result in Peterson et al. (1999). Another similar result of Barnard et al. (2003) and ours is the proportional distribution of principal strata, which also provides interesting evidence for policymaking.

The relatively minor differences in the results from the two analyses, such as the reading improvements for certain groups, may be due in part to the partially observed covariates that our model makes use of, although a more detailed examination is needed to support such a conclusion. In general, our model seems to be a useful alternative to theirs, because it can use the information contained in covariates with missing values that would otherwise be ignored.

However, we have also made some simplifications for computational convenience, such as treating some categorical variables as continuous. This is generally fine for our current model, because the four categorical covariates we used in Block 1 are either fully observed or almost fully observed, and the outcome missingness R_y in Block 4 is also fully observed. However, our current model might encounter some difficulties if the outcome variable itself is categorical and partially observed. Therefore, such a specification calls for improvements in our future work, which hopefully should include all the categorical variables, whether covariate or outcome, in the categorical part of the general location model. In that case, the log-linear model for the categorical part may require further modifications.

Acknowledgments

This work was supported in part by NSF Grant SES-05-05887 and NIH Grant R01 DA023879-01.

References

Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables (with discussion). Journal of the American Statistical Association, 91, 444–472.

Barnard, Frangakis, C. E., Hill, J. L., & Rubin, D. B. (2003). Principal stratification approach to broken randomized experiments: A case study of School Choice vouchers in New York City. Journal of the American Statistical Association, 98, 288–311.

Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press.

Cook, S. R., Gelman, & Rubin, D. B. (2006). Validation of software for Bayesian models using posterior quantiles. Journal of Computational and Graphical Statistics, 15, 675–692.

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39, 1–38.

Frangakis, C. E., & Rubin, D. B. (1999). Addressing complications of intention-to-treat analysis in the combined presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika, 86, 365–379.

Frangakis, C. E., & Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics, 58, 20–29.

Frangakis, C. E., Rubin, D. B., & Zhou, X. H. (2002). Clustered encouragement designs with individual noncompliance: Bayesian inference with randomization, and application to advance directive forms. Biostatistics, 3, 147–164.

Gelman, Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian data analysis (2nd ed.). Boca Raton: Chapman & Hall/CRC.

Gelman, & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.

Haavelmo (1943). The statistical implications of a system of simultaneous equations. Econometrica, 11, 1–12.

Hill, J. L., Rubin, D. B., & Thomas (2002). The design of the New York school choice scholarship program evaluation. In Bickman (Ed.), Donald Campbell’s legacy (pp. 155–180). London: Sage Publications.

Hirano, Imbens, G. W., Rubin, D. B., & Zhou, X.-H. (2000). Assessing the effect of an influenza vaccine in an encouragement design. Biostatistics, 1, 69–88.

Holland (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–970.

Imbens, G. W., & Rubin, D. B. (1997). Bayesian inference for causal effects in randomized experiments with noncompliance. Annals of Statistics, 25, 305–327.

Jin, & Rubin, D. B. (2008). Principal stratification for causal inference with extended partial compliance. Journal of the American Statistical Association, 103, 101–111.

Jin, & Rubin, D. B. (2009). Public schools versus private schools: Causal inference with partial compliance. Journal of Educational and Behavioral Statistics, 34, 24–45.

Krueger, A. B., & Zhu (2004). Another look at the New York city school voucher experiment. American Behavioral Scientist, 47, 658–698.

Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.

Meng, X.-L., & Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika, 80, 267–278.

Neyman (1923). On the application of probability to agricultural experiments. Essay on principles. Section 9. Roczniki Nauk Rolniczych Tom X (Poland). Statistical Science, 5, 465–480 (Translated in 1990).

Peterson, P. E., Myers, D. E., Howell, W. G., & Mayer, D. P. (1999). The effects of school choice in New York City. In S. E. Mayer & P. E. Peterson (Eds.), Earning and learning; how schools matter (pp. 317–337). Washington: Brookings Institution Press.

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.

Rubin, D. B. (1974). Characterizing the estimation of parameters in incomplete data problems. Journal of the American Statistical Association, 69, 467–474.

Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. The Annals of Statistics, 6, 34–58.

Rubin, D. B. (1979). Using multivariate matched sampling and regression adjustment to control bias in observational studies. Journal of the American Statistical Association, 74, 318–328.

Rubin, D. B. (1980). Randomization analysis of experimental data: The Fisher randomization test. Comment. Journal of the American Statistical Association, 75, 591–593.

Rubin, D. B. (1987). The calculation of posterior distributions by data augmentation: Comment. Journal of the American Statistical Association, 82, 543–546.

Rubin, D. B. (1990). Comment: Neyman (1923) and causal inference in experiments and observational studies. Statistical Science, 5, 472–480.

Rubin, D. B. (2004). Direct and indirect causal effects via potential outcomes. Scandinavian Journal of Statistics, 31, 161–170.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. Boca Raton, FL: Chapman & Hall/CRC.

Tanner, M. A., & Wong, W. H. (1987). An application of imputation to an estimation problem in grouped lifetime analysis. Technometrics, 29, 23–32.

Tinbergen (1930). Determination and interpretation of supply curves: An example. In Hendry & Morgan (Eds.), The foundations of econometric analysis (pp. 233–248). Cambridge University Press. (Reprinted from Zeitschrift fur Nationalokonomie).

Zhang, & Rubin, D. B. (2003). Estimation of causal effects via principal stratification when some outcomes are truncated by “death.” Journal of Educational and Behavioral Statistics, 28, 353–368.
