Reference Values of Within-District Intraclass Correlations of Academic Achievement by District Characteristics

Abstract

Background:

Randomized experiments are often considered the strongest designs to study the impact of educational interventions. Perhaps the most prevalent class of designs used in large-scale education experiments is the cluster randomized design in which entire schools are assigned to treatments. In cluster randomized trials that assign schools to treatments within a set of school districts, the statistical power of the test for treatment effects depends on the within-district school-level intraclass correlation (ICC). Hedges and Hedberg (2014) recently computed within-district ICC values in 11 states using three-level models (students in schools in districts) that pooled results across all the districts within each state. Although values from these analyses are useful when working with a representative sample of districts, they may be misleading for other samples of districts because the magnitude of ICCs appears to be related to district size. To plan studies with small or nonrepresentative samples of districts, better information is needed about the relation of within-district school-level ICCs to district size.

Objective:

Our objective is to explore the relation between district size and within-district ICCs to provide reference values for math and reading achievement for Grades 3–8 by district size, poverty level, and urbanicity level. These values are not derived from pooling across all districts within a state as in previous work but are based on the direct calculation of within-district school-level ICCs for each school district.

Research Design:

We use mixed models to estimate over 7,000 district-specific ICCs for math and reading achievement in 11 states and for Grades 3–8. We then perform a random effects meta-analysis on the estimated within-district ICCs. Our analysis is performed by grade and subject for different strata designated by district size (number of schools), urbanicity, and poverty rates.

Keywords

education methodological development

Introduction

Randomized experiments are often used to evaluate the impact of educational interventions, products, or services. Over the past decade, the number of experiments in education funded by Federal sources has increased considerably (Spybrook & Raudenbush, 2009). Yet, the number of experimental studies reported in the literature has not kept pace (Spybrook, Puente, & Lininger, 2011), possibly due to null findings from studies that are underpowered. The most common experimental designs used in education are cluster randomized trials (CRTs) that assign whole schools to treatments and where the schools are nested within districts.

CRTs are commonly used to evaluate interventions in other fields such as health (Hayes, Moulton, & Press, 2009) and have been embraced by the education community. The primary reasons for their adoption are the containment of “spillover” or “contamination” effects (the mixing of treatment and control conditions in a common place) and the efficiencies in delivering place-based services (Bloom, 2005). Blocking (controlling for the effects of schools with similar characteristics) is another common practice in such experiments, with school districts serving as a naturally occurring characteristic on which to block schools. Cluster randomized designs incorporating blocks such as districts are sometimes called multisite CRTs (MSCRTs).

In multilevel designs, the precision of estimates of treatment effects, the statistical power to detect effects, and the minimum effect size that is detectable with a given level of certainty (the minimum detectable effect size) all depend (in part) on the variance decomposition between and within schools (Bloom, 2005; Bloom, Bos, & Lee, 1999; Hedges & Rhoads, 2011; Raudenbush, 1997). In two-level designs assigning schools to treatments, this variance decomposition is typically summarized by the school-level intraclass correlation (ICC) coefficient (ρ), which is the proportion of the total variance that occurs between schools. Therefore, planning the sample sizes for a CRT or an MSCRT requires knowledge of the likely value of the school-level ICC coefficient.

The purpose of this article is to explore the distribution of within-district ICCs and to provide guidance on ICC values for mathematics and reading achievement across districts of varying size, urbanicity, and levels of poverty. These values will be especially useful to evaluators employing CRTs where schools (clusters) are assigned to treatment condition but are located within a set of districts (blocked sites). The values presented in this article are unique in that they are not based on three-level (students in schools in districts) mixed models that include entire state data systems (as in Hedges & Hedberg, 2014) but are instead based on school-level ICCs estimated from individual districts. We then summarize these district-specific ICCs with a random effects meta-analysis by grade and district subgroups.

CRTs

The values provided in this article have specific use for CRTs or MSCRTs where schools are the level of randomization but are blocked by district fixed effects.¹ Blocked CRT designs in education are usually three-level designs because they involve three-stage sampling where districts (sites) are selected first, then schools (which are statistical clusters), and finally individuals within schools. However, when the number of districts is small, they may be considered to have fixed effects since modeling with so few districts would not produce reliable variance components (and thus district effects may be modeled as a set of dummy variables so that the model reduces to two levels of random effects). Thus, the model for this design is a two-level model predicting the outcome for the ith student in school j in district k, such as (Spybrook & Raudenbush, 2009):

\begin{aligned} y_{i j k} & = γ_{0} + γ_{1} W_{j k} + γ_{2} D_{1} + \cdots + γ_{K} D_{K - 1} + r_{j k} + e_{i j k}, \\ e_{i j k} & \sim N (0, σ_{W}^{2}), \quad r_{j k} \sim N (0, σ_{B}^{2}), \end{aligned}

where $γ_{0}$ is the grand mean, $γ_{1}$ is the average treatment effect, $W_{j k}$ is a school-level variable coded as 0.5 for treatment and −0.5 for control, $D_{1}, \ldots, D_{K - 1}$ are the district dummy variables, $r_{j k}$ is the school-level residual with mean 0 and variance $σ_{B}^{2}$, and $e_{i j k}$ is the student-level residual with mean 0 and variance $σ_{W}^{2}$.

Given this model, the statistical power of the test for the treatment effect depends on the sample sizes and two other parameters, that is, the within-district ICC and the effect size. The within-district school-level ICC is defined as follows:

ρ = \frac{σ_{B}^{2}}{σ_{B}^{2} + σ_{W}^{2}} .

The effect size (based on the total variation), δ, is defined as:

δ = \frac{γ_{1}}{\sqrt{σ_{B}^{2} + σ_{W}^{2}}} .

The three components to the sample size are as follows: the number of districts (sites) selected (K), the number of schools (clusters) per district (J), and the number of individuals within each school (n), which we will assume here are equal in each cluster for simplicity of exposition.

One method to produce a test statistic for testing the null hypothesis of no treatment effect employs the F sampling distribution with 1 and K(J − 2) degrees of freedom. Under the alternative hypothesis, the test statistic has the noncentral F distribution and has a noncentrality parameter as follows:

λ = \frac{K J δ^{2}}{4 \left[ ρ + \frac{1 - ρ}{n} \right]} .

The power of the design is the inverse cumulative (upper tail or survivor) noncentral F distribution employing this noncentrality parameter and degrees of freedom, that is, 1 and K(J − 2). For example, the power for a design with effect size 0.2, n = 20, J = 10, K = 12, and an ICC of .17 is 0.65.²
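The worked example above can be checked numerically. The following is a minimal sketch using only the Python standard library; it assumes a two-sided test at α = .05 (the significance level is not stated in this section), and the helper names (`betainc`, `f_cdf`, `f_critical`, `crt_power`) are illustrative rather than taken from the article. The noncentral F distribution is evaluated as a Poisson mixture of central F terms, which is one standard way to compute it.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            step = d * c
            h *= step
        if abs(step - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(x, df1, df2, ncp=0.0, terms=100):
    """CDF of the (noncentral) F distribution via a Poisson mixture."""
    if x <= 0.0:
        return 0.0
    y = df1 * x / (df1 * x + df2)
    if ncp == 0.0:
        return betainc(df1 / 2.0, df2 / 2.0, y)
    half = ncp / 2.0
    total = 0.0
    for j in range(terms):
        log_w = -half + j * math.log(half) - math.lgamma(j + 1.0)
        total += math.exp(log_w) * betainc(df1 / 2.0 + j, df2 / 2.0, y)
    return total

def f_critical(alpha, df1, df2):
    """Upper-alpha critical value of the central F distribution (bisection)."""
    lo, hi = 0.0, 1000.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, df1, df2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def crt_power(delta, n, J, K, rho, alpha=0.05):
    """Noncentrality parameter and power for the blocked CRT design."""
    ncp = K * J * delta ** 2 / (4.0 * (rho + (1.0 - rho) / n))
    df1, df2 = 1, K * (J - 2)
    crit = f_critical(alpha, df1, df2)
    return ncp, 1.0 - f_cdf(crit, df1, df2, ncp)

# Example from the text: delta = 0.2, n = 20, J = 10, K = 12, ICC = .17.
ncp, power = crt_power(0.2, 20, 10, 12, 0.17)
print(round(ncp, 3), round(power, 3))
```

Under these assumptions the noncentrality parameter is about 5.67, and the computed power should land close to the 0.65 reported in the text.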

Finally, it turns out that many different combinations of K, J, and n may give identical (or nearly identical) statistical power. The so-called optimal design or optimal allocation methods (which maximize precision or statistical power for a given cost function) are often used to assist in planning cluster randomized designs (see, e.g., Raudenbush, 1997). Optimal allocation depends on cost data and is also a function of the school-level ICC.

In summary, information about within-district school-level ICCs is crucial in planning experiments that use cluster randomized designs conducted either within a single district or using districts as blocks. ICCs are vital to both estimating the statistical power for a given design and optimally allocating resources to schools and students. This study adds to the empirical data about such values.

Previous Studies of Design Parameters

Several authors have assembled empirical evidence about ICCs to aid researchers in planning cluster randomized designs. For example, Bloom, Richburg-Hayes, and Black (2007) reported ICCs at several grade levels from five large urban school districts in the Eastern United States that had participated in evaluation studies. Bloom et al. (2008) extended the work of Bloom et al. (2007) to provide school-level parameters that extend beyond test scores and include other academic-related outcomes in the same five school districts, also providing ICCs for classrooms within schools. Brandon, Harrison, and Lawton (2013), in work that provides SAS code for estimating ICCs, also provide upper bound values for Hawaii, a state that is a single school district. Finally, Schochet (2008) provides values for ICCs based on large evaluation studies, but few of these are within-district values.

It is important to note that the variances of ICC estimates are inversely proportional to the number of schools used, and therefore ICC estimates from individual randomized trials (even relatively large ones) are subject to rather large sampling uncertainties (large standard errors). The same is true of ICC estimates from all but the largest school districts. Thus, the unrepresentative nature of the samples and the large sampling uncertainties of the estimates given in the studies cited earlier make them suboptimal as reference values for planning CRTs.

To provide ICC estimates from larger and more representative samples, Hedges and Hedberg (2007a) used a set of surveys with large (hundreds to thousands of schools) national probability samples to estimate school-level ICC values for reading and mathematics achievement from kindergarten through Grade 12. ICCs for rural areas were published in Hedges and Hedberg (2007b). Hedges and Hedberg (2011) also provide ICC estimates by grade, region, and certain school characteristics (such as socioeconomic status, achievement level, and urbanicity) via the so-called Online Variance Almanac (https://arc.uchicago.edu/reese/variance-almanac-academic-achievement). The ICC estimates are nationally representative and have acceptably small standard errors. However, the sampling designs of the surveys used did not permit the estimation of between-school district variation. Consequently, between-district variation is pooled into between-school variation in the ICC estimates that were computed, which means that the ICCs computed are overestimates of the school-level ICCs (based on three-level models) that are relevant for planning CRTs that use one or a few districts.

To obtain better estimates of within-district school-level ICCs, Hedges and Hedberg (2014) expanded their national database of ICCs by providing values for reading and mathematics achievement based on the analyses of State Longitudinal Data Systems (SLDS) in 11 states (Arkansas, Arizona, Colorado, Florida, Louisiana, Kansas, Kentucky, Massachusetts, North Carolina, West Virginia, and Wisconsin; see http://www.ipr.northwestern.edu/research-areas/designparameters/stateva.html). For evaluations across schools where the investigative team is not concerned with school district effects, they provide school-level ICCs based on two-level models that pool district-level variation into school-level variation. They also provide estimates of the ICC system from three-level models (i.e., an ICC for district-level effects and another ICC for school-level effects). Westine, Spybrook, and Taylor (2014) provide similar values based on SLDS systems for science outcomes.

Why Additional ICC Estimates Are Necessary

The school-level ICC values derived from the statewide three-level models are useful for planning designs that employ a representative sample of districts from a state. However, the research reported in this article demonstrates that within-district school-level ICCs are not constant throughout states but depend on characteristics of districts, particularly on the number of schools in the district (district size). Therefore, pooled state within-district ICCs may be an average of dissimilar values that underestimates the ICCs in large districts and overestimates the ICCs in small districts. Moreover, because the pooled state average within-district ICCs give more weight to large districts (because they contribute more information), the pooled state average ICC estimates are particularly poor estimates of the ICCs in smaller school districts. Thus, the estimates of Hedges and Hedberg (2014) based on SLDSs may not be ideal for planning a CRT using a small number of districts, particularly if the districts in the CRT sample are not representative of the state (e.g., if they are smaller districts).

A review of recently published randomized controlled trials (RCTs) suggests that the typical RCT uses a small number of districts, usually just one or two. All of the reviewed studies that randomized at the student level used three or fewer districts, and half of the studies that randomized at the class or school level also used three or fewer districts. Overall, 66% of all studies reviewed used three or fewer districts. This is consistent with the idea that most researchers use local education agencies near their institutions to recruit participants and have insufficient resources to manage more than a handful of districts. These results are based on a review of over 20 articles and reports published over the last 3 years, primarily from the American Educational Research Journal, Educational Evaluation and Policy Analysis, Journal of Research on Educational Effectiveness, and The Journal of Experimental Education, which are primary outlets for education experiments (see Agodini, Harris, Thomas, Murphy, & Gallagher, 2010; Bottge, Grant, Stephens, & Rueda, 2010; Bradshaw, Mitchell, & Leaf, 2010; Calderón, Slavin, & Sánchez, 2011; Fantuzzo, Gadsden, & McDermott, 2011; Fulmer & Frijters, 2011; Gersten, Dimino, Jayanthi, Kim, & Santoro, 2010; Goodson et al., 2011; Hamre et al., 2012; Isenberg et al., 2009; Kim, Capotosto, Hartry, & Fitzgerald, 2011; Lane et al., 2011; Laura, McMeeking, Orsi, & Cobb, 2012; Marley, Levin, & Glenberg, 2010; Marley, Szabo, Levin, & Glenberg, 2011; McQuillin, Smith, & Strait, 2011; Olson et al., 2012; Phelan, Choi, Vendlinski, Baker, & Herman, 2011; Reis, McCoach, Little, Muller, & Kaniskan, 2011; Rose, Woolley, Orthner, Akos, & Jones-Sanpei, 2012; Sarama, Clements, Wolfe, & Spitler, 2012; Slavin, Cheung, Holmes, Madden, & Chamberlain, 2012; Springer et al., 2012; VanDerHeyden, McLaughlin, Algina, & Snyder, 2012; Vaughn, Klingner, et al., 2011; Vaughn, Wexler, et al., 2011; Wirkala & Kuhn, 2011; Wolf et al., 2010).

Analysis Plan

The purpose of our analysis is to estimate typical within-district ICCs by subject, grade, district size, urbanicity, and poverty status. Our analysis follows three steps. First, specific to subject and grade, we estimate district-specific school-level ICCs using 11 state data systems: Arkansas, Arizona, Colorado, Florida, Kansas, Kentucky, Louisiana, Massachusetts, North Carolina, West Virginia, and Wisconsin. All data were from the 2009–2010 school year, with the exception of Florida, which supplied data from the 2006–2007 school year, Louisiana, which supplied data from the 2012–2013 school year, and West Virginia, which supplied data from the 2011–2012 school year. Since all states test in Grades 3–8, we focused our analysis only on these grades.

Whether a district was included in the analysis was evaluated separately for each grade. Eligible districts were those that had test scores in at least two schools that served a particular grade (since ICCs are undefined in a district with a single school) and had a harmonic mean number of at least two student scores per school. We used the harmonic mean since it is less sensitive to outliers. We used a threshold of two students because the variance in the ICC, given subsequently in Equation 8, increases sharply for harmonic means of fewer than two students (regardless of the value of the ICC).
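The eligibility rule can be sketched in a few lines. This is an illustration only: the function name and the toy school counts are hypothetical, and dropping schools with no scores before counting is an assumption, not something the text specifies.

```python
from statistics import harmonic_mean

def district_is_eligible(students_per_school, min_schools=2, min_harmonic_mean=2.0):
    """Apply the two eligibility conditions for one district and grade.

    students_per_school: number of student scores in each school serving the grade.
    """
    # Assumption: schools with no scores for the grade are ignored.
    counts = [c for c in students_per_school if c > 0]
    if len(counts) < min_schools:
        return False  # the ICC is undefined with a single school
    return harmonic_mean(counts) >= min_harmonic_mean

print(district_is_eligible([30, 25, 1]))   # True
print(district_is_eligible([1, 1, 200]))   # False
print(district_is_eligible([40]))          # False
```

The second toy district has an arithmetic mean of about 67 scores per school, but its harmonic mean is below 2 because two of its three schools contribute a single score each, so it is excluded.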

Each state employed a different achievement test, namely the Augmented Benchmark Examination (Arkansas), Arizona’s Instrument to Measure Standards, Colorado’s Student Assessment Program, the Florida Comprehensive Assessment Test, the Kansas Assessment Program, the Commonwealth Accountability Testing System (Kentucky), Louisiana’s Integrated Educational Assessment Program, the Massachusetts Comprehensive Assessment System, the North Carolina End of Grade Tests, West Virginia’s WESTEST, and the Wisconsin Knowledge and Concepts Examination.

Second, we compiled our district-specific ICCs into a database and assigned subgroup identifiers. Employing the Common Core of Data (Keaton, 2012), we estimated the 10th, 25th, and 50th percentiles of district size for districts that serve students in Grades 3, 4, 5, 6, 7, and 8. Our percentile analysis used the student count to weight the school records. Thus, we found the district size percentiles from the student point of view. The 10th percentile means that 10% of students are served by a district of that size or smaller. Weighting the districts by students served by grade, we found for Grades 3 and 4 that the 10th percentile of district size was 3 schools, the 25th percentile was 5 schools, and the 50th percentile was 10 schools. In Grades 5 and 6, the 10th, 25th, and 50th percentiles were 2, 5, and 11 schools, respectively. In Grade 7, the 10th, 25th, and 50th percentiles were 2, 3, and 6 schools, respectively. Finally, in Grade 8, the 10th, 25th, and 50th percentiles were 2, 3, and 7 schools, respectively. The sample of district-specific ICCs was then divided into four groups for each grade, using the 10th, 25th, and 50th percentiles of district size as cut points. These size groupings are labeled “very small,” “small,” “medium,” and “large” districts. We include the corresponding numbers of schools in the results tables for clarity.
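As an illustration of the student-weighted cut points and the resulting size groups, the sketch below uses hypothetical (district size, students served) pairs rather than Common Core of Data records. The percentile rule used here (the smallest size at which the cumulative student share reaches p) is an assumption; the article does not state its exact percentile definition. The grouping rule follows the school ranges shown in the results tables.

```python
def weighted_percentile(sizes_and_students, p):
    """Smallest district size whose cumulative student share reaches p."""
    ordered = sorted(sizes_and_students)
    total = sum(students for _, students in ordered)
    running = 0
    for size, students in ordered:
        running += students
        if running >= p * total:
            return size
    return ordered[-1][0]

def size_group(n_schools, cuts):
    """Label a district given the (10th, 25th, 50th) percentile cut points."""
    p10, p25, p50 = cuts
    if n_schools <= p10:
        return "very small"
    if n_schools <= p25:
        return "small"
    if n_schools <= p50:
        return "medium"
    return "large"

# Hypothetical (district size in schools, students served) pairs.
toy = [(2, 50), (3, 150), (5, 250), (10, 550)]
cuts = tuple(weighted_percentile(toy, p) for p in (0.10, 0.25, 0.50))
print(cuts)                  # (3, 5, 10)
print(size_group(4, cuts))   # small
print(size_group(11, cuts))  # large
```

With the Grade 3 cut points of 3, 5, and 10 schools, this rule gives the groups 2–3, 4–5, 6–10, and 11+ schools.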

Finally, we summarize the district-specific ICCs using a random effects meta-analytic approach (Borenstein, Hedges, Higgins, & Rothstein, 2011; Hedges & Vevea, 1998) as detailed subsequently. We do this by grade and subject and also for district size, poverty status, and urbanicity groups. Poverty is defined as a two-group variable indicating that the district has either (a) fewer than 50% of its students eligible for free or reduced-price lunch or (b) 50% or more of its students eligible for free or reduced-price lunch. Urbanicity is also a two-group variable indicating that the district is either (a) primarily not in an urban area or (b) primarily in an urban area. Urban areas are defined by Common Core of Data standards as any of the following: a territory inside an urbanized area and inside a principal city with a population of 250,000 or more; a territory inside an urbanized area and inside a principal city with a population of less than 250,000 and greater than or equal to 100,000; or a territory inside an urbanized area and inside a principal city with a population of less than 100,000.

We provide ICCs by district size categories because there is a relationship between the log number of schools and the value of the ICC (presented in the results section). In addition to district size, many studies are focused on impoverished populations and/or urban populations, which may have different ICC values. To that end, we also provide results by district size for districts with more and fewer than 50% of students eligible for free or reduced-price lunch and for districts that are or are not located in urban areas. For example, researchers conducting evaluation studies in large urban school districts will find Tables 6 and 7 most useful.

Statistical Methodology

Estimating District-Specific ICCs

The district-specific ICCs were estimated by selecting each eligible district in each state, selecting all students within a specific grade and setting the outcome to either the reading or the math score. Once selected, we estimated an unconditional two-level mixed model using restricted maximum likelihood,

y_{i j} = μ + η_{j} + ϵ_{i j},

where $y_{i j}$ is the score of the ith student in school j, μ is the average of the school averages, $η_{j}$ is the school random effect, and $ϵ_{i j}$ is the within-school student random effect. The within-school variance component (the variance of the $ϵ_{i j}$’s) is $σ_{W}^{2}$ and the between-school variance component (the variance of the $η_{j}$’s) is $σ_{B}^{2}$. The estimated ICC is obtained using the estimated variance components as specified in Equation 2.
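To make the variance decomposition concrete, the sketch below estimates the two variance components for one district with the one-way ANOVA (method of moments) estimator. This is a swapped-in technique for illustration, not the restricted maximum likelihood procedure used in the article; the two generally agree for balanced data when the between-school estimate is positive. The function name and the toy scores are hypothetical.

```python
def anova_icc(schools):
    """ICC for one district from a list of per-school score lists.

    Uses the one-way ANOVA estimator (not REML), truncating a negative
    between-school variance estimate to zero.
    """
    m = len(schools)
    total_n = sum(len(s) for s in schools)
    grand_mean = sum(sum(s) for s in schools) / total_n
    school_means = [sum(s) / len(s) for s in schools]

    ss_between = sum(len(s) * (mean - grand_mean) ** 2
                     for s, mean in zip(schools, school_means))
    ss_within = sum((y - mean) ** 2
                    for s, mean in zip(schools, school_means) for y in s)

    ms_between = ss_between / (m - 1)
    ms_within = ss_within / (total_n - m)
    # Effective number of students per school for unbalanced data.
    n0 = (total_n - sum(len(s) ** 2 for s in schools) / total_n) / (m - 1)

    var_between = max(0.0, (ms_between - ms_within) / n0)
    var_within = ms_within
    return var_between / (var_between + var_within)

# Hypothetical district with three schools and three students each.
toy_district = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
print(round(anova_icc(toy_district), 4))  # 0.8696
```

In the toy district the school means (2, 3, and 7) differ far more than the scores within any school, so most of the total variance lies between schools and the ICC is large.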

Random Effects Meta-Analysis of District-Specific ICCs

Our analysis produced several thousand district-specific ICCs, many of which are estimated from small districts where concerns about privacy are relevant. Moreover, there is considerable variation in estimates from similar districts, undoubtedly due to random sampling error. Therefore, rather than providing tables with several hundreds of estimates, we summarize our results by presenting average ICCs derived from a random effects meta-analysis (Hedges & Vevea, 1998). Below, we provide a brief overview of this procedure in the context of our study.

The goal of a meta-analysis is to summarize the results of a series of estimations in order to provide guidance on the expected “effect.” In our case, the effect is the ICC, and we wish to estimate the population’s typical ICC, based on a given set of k estimates, for use in planning CRTs. If we assumed that the true ICC was the same in all districts (in other words, treating the districts as fixed effects), we would conceptualize any estimate, $Y_{i}$, as the sum of the true effect, θ, and the sampling error, $ϵ_{i}$,

Y_{i} = θ + ϵ_{i} .

Of course, we do not know the true effect, only the estimates and the sampling variation associated with them. We can obtain an estimate of the true effect by using the inverse variance of each estimate as a weight. For our ICCs, the estimated variance for the ith ICC is (Fisher, 1925):

V_{i} = \frac{2 (1 - ρ_{i})^{2} (1 + (n_{i} - 1) ρ_{i})^{2}}{n_{i} (n_{i} - 1) (m_{i} - 1)},

where n_i is the harmonic mean number of students per school in the district and m_i is the number of schools in that district. The weight for each ICC is then simply,

W_{i} = V_{i}^{- 1},

and the estimate of the true ICC would be defined as,

{\hat{μ}}_{ρ} = \frac{\sum W_{i} Y_{i}}{\sum W_{i}} .
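The variance, weight, and weighted mean formulas above can be sketched as follows. The function names and the three toy districts are hypothetical and serve only to show the arithmetic.

```python
def icc_variance(rho, n, m):
    """Approximate sampling variance of an ICC estimate.

    rho: estimated ICC; n: harmonic mean number of students per school;
    m: number of schools in the district.
    """
    return (2.0 * (1.0 - rho) ** 2 * (1.0 + (n - 1.0) * rho) ** 2
            / (n * (n - 1.0) * (m - 1.0)))

def fixed_effect_mean(iccs, variances):
    """Inverse-variance weighted mean of the ICC estimates."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, iccs)) / sum(weights)

# Hypothetical districts: (estimated ICC, harmonic mean students, schools).
toy = [(0.05, 50.0, 4), (0.10, 60.0, 10), (0.20, 40.0, 25)]
iccs = [rho for rho, _, _ in toy]
variances = [icc_variance(rho, n, m) for rho, n, m in toy]
print(fixed_effect_mean(iccs, variances))

# The variance grows quickly as the harmonic mean approaches 1.
print(icc_variance(0.10, 20.0, 5), icc_variance(0.10, 1.5, 5))
```

Because n(n − 1) appears in the denominator, the variance becomes very large as the harmonic mean number of students per school approaches 1, which is why districts with very few students per school carry almost no weight.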

However, a weakness of this approach is that we cannot assume the same ICC for all districts, even in a subgroup, for two reasons. First, we are making a generalization beyond the observed results. This introduces a random effect beyond the sampling error that must be addressed. A second, more nuanced, set of problems with the fixed effects approach is that each state employs a different standardized test (at least in our data), each state organizes districts in a slightly different way, and the way districts organize their students is not universal. Thus, ICCs are derived from slightly different processes across our observed districts. As a result, we must employ a random effects approach to the meta-analysis.

In a random effects meta-analysis, we conceptualize the estimate, $Y_{i}$, as the sum of the average of the true effects, $μ_{θ}$, the district’s deviation from the average of the true effects, $ζ_{i}$, and the sampling error, $ϵ_{i}$,

Y_{i} = μ_{θ} + ζ_{i} + ϵ_{i} .

We must therefore account for the variance associated with both sampling errors and the variance in the true district values around the average true effect. This is accomplished by estimating the between-district variance of the ICCs, τ². This quantity can be estimated with a method of moments estimator, given in Hedges and Vevea (1998) as:

{\hat{τ}}^{2} = \frac{Q - (k - 1)}{\sum W_{i} - \sum W_{i}^{2} / \sum W_{i}},

where

Q = \sum W_{i} Y_{i}^{2} - \frac{{(\sum W_{i} Y_{i})}^{2}}{\sum W_{i}} .

Also note that this estimation makes no assumptions about the underlying distribution of the effects (i.e., ICCs). Therefore, it is still an unbiased estimate of the variation in ICCs. However, we do not recommend the use of this variance component to compute a range of plausible ICC values (e.g., ${\hat{μ}}_{ρ} \pm \hat{τ} \times z_{α / 2}$ ) because the distribution is not normal.

To test the null hypothesis that τ = 0, we use the fact that Q, given in Equation 12, has a χ² distribution with k − 1 degrees of freedom when τ = 0. With the estimate of τ², we calculate the random effects weight for each ICC:

W_{i}^{*} = (V_{i} + τ^{2})^{- 1} .

The summary reported in our results is then the weighted mean of the observed ICCs:

{\hat{μ}}_{ρ}^{*} = \frac{\sum W_{i}^{*} Y_{i}}{\sum W_{i}^{*}},

and its standard error (the square root of the inverse of the sum of the weights),

S E ({\hat{μ}}_{ρ}^{*}) = \sqrt{{(\sum W_{i}^{*})}^{- 1}} .

Note that if $\hat{τ}^{2}$ is estimated to be negative, we truncate it to 0; in that case, the weights are simply the fixed effect weights as in Equation 8 and the random effects analysis is identical to the fixed effects analysis.
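The full random effects procedure described in this section can be sketched as a single function. This is a minimal illustration under stated assumptions: the function name and inputs are hypothetical, and the numerator of the method of moments estimator is taken as Q − (k − 1), matching the k − 1 degrees of freedom of Q.

```python
import math

def random_effects_summary(iccs, variances):
    """Random effects mean ICC, its standard error, tau^2, and Q.

    iccs: district-specific ICC estimates (Y_i).
    variances: their estimated sampling variances (V_i).
    """
    k = len(iccs)
    w = [1.0 / v for v in variances]  # fixed effect weights
    sum_w = sum(w)
    sum_wy = sum(wi * yi for wi, yi in zip(w, iccs))

    # Homogeneity statistic Q and the method of moments estimate of tau^2.
    q = sum(wi * yi ** 2 for wi, yi in zip(w, iccs)) - sum_wy ** 2 / sum_w
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # negative estimates truncated to 0

    # Random effects weights, weighted mean, and standard error.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(ws * yi for ws, yi in zip(w_star, iccs)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mean, se, tau2, q

# Two hypothetical districts with very different estimates.
mean, se, tau2, q = random_effects_summary([0.0, 0.5], [0.01, 0.01])
print(mean, se, tau2, q)
```

When the estimates are homogeneous, Q falls below k − 1, the truncated estimate of τ² is 0, and the function returns the fixed effect mean with its smaller standard error.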

Database of District-Specific Estimates

This section describes the database of district-specific estimates that we compiled. In a small number of cases, the ICC estimates were quite large and could inflate the estimate of τ². Therefore, to avoid allowing outliers to have disproportionate influence on our estimate of τ², we removed the top 1% of estimates, redacting 71 estimates from our input data greater than the 99th percentile of the estimates (.557). Although this did not have a measurable impact on the mean estimate, it substantially decreased the estimate of τ². This resulted in a set of 3,555 ICCs for mathematics achievement and 3,557 ICCs for reading achievement. Table 1 presents the number of eligible districts by state, grade, and subject. Table 1 also includes the number of students used in the estimates.

Table 1.

Number of Estimated ICCs and Sample Sizes.

	Math
	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Total
Arkansas	33	33	21	16	12	11	126
	(15,668)	(15,251)	(11,307)	(8,756)	(7,904)	(7,252)	(66,138)
Arizona	73	73	71	54	44	44	359
	(59,619)	(59,396)	(58,626)	(52,472)	(50,395)	(51,063)	(331,571)
Colorado	49	49	47	37	32	33	247
	(47,340)	(46,570)	(45,448)	(41,666)	(40,890)	(40,402)	(262,316)
Florida	56	55	55	56	59	52	333
	(157,780)	(155,276)	(153,312)	(159,374)	(156,787)	(152,254)	(934,783)
Kansas	55	53	47	29	17	21	222
	(22,280)	(21,561)	(19,498)	(15,877)	(12,228)	(13,245)	(104,689)
Kentucky	89	89	88	45	28	27	366
	(35,960)	(36,418)	(36,163)	(25,814)	(21,558)	(21,317)	(177,230)
Louisiana	63	64	60	62	62	62	373
	(47,952)	(49,622)	(42,972)	(45,321)	(45,237)	(41,878)	(272,982)
Massachusetts	121	115	99	45	34	34	448
	(45,471)	(44,783)	(40,316)	(24,826)	(22,270)	(23,124)	(200,790)
North Carolina	96	94	92	77	71	72	502
	(96,468)	(93,134)	(91,233)	(85,836)	(83,089)	(83,455)	(533,215)
Wisconsin	102	98	83	31	20	20	354
	(34,944)	(34,460)	(30,880)	(19,134)	(16,426)	(16,670)	(152,514)
West Virginia	47	47	43	32	28	28	225
	(15,804)	(16,522)	(15,918)	(14,479)	(13,695)	(13,120)	(89,538)
Total	784	770	706	484	407	404	3,555
	(579,286)	(572,993)	(545,673)	(493,555)	(470,479)	(463,780)	(3,125,766)
	Reading
Arkansas	33	32	21	16	12	11	125
	(15,633)	(14,964)	(11,285)	(8,744)	(7,885)	(7,239)	(65,750)
Arizona	73	73	71	54	44	44	359
	(59,620)	(59,383)	(58,627)	(52,472)	(50,393)	(51,076)	(331,571)
Colorado	49	49	47	37	32	33	247
	(46,179)	(46,451)	(45,412)	(41,638)	(40,854)	(40,381)	(260,915)
Florida	56	56	55	56	59	53	335
	(157,839)	(155,494)	(153,292)	(157,758)	(156,323)	(157,475)	(938,181)
Kansas	55	52	47	29	18	20	221
	(22,264)	(21,333)	(19,499)	(15,886)	(12,745)	(12,925)	(104,652)
Kentucky	89	90	88	45	28	27	367
	(35,960)	(36,523)	(36,163)	(25,814)	(21,558)	(21,317)	(177,335)
Louisiana	63	65	62	62	62	61	375
	(47,944)	(50,253)	(43,111)	(45,317)	(45,242)	(42,836)	(274,703)
Massachusetts	121	115	99	45	33	34	447
	(45,032)	(44,404)	(39,975)	(24,551)	(21,815)	(22,856)	(198,633)
North Carolina	96	94	92	77	72	72	503
	(96,158)	(92,848)	(90,986)	(85,592)	(83,290)	(83,225)	(532,099)
Wisconsin	102	98	83	31	20	19	353
	(34,785)	(34,372)	(30,795)	(19,081)	(16,368)	(16,148)	(151,549)
West Virginia	47	47	43	32	28	28	225
	(15,804)	(16,522)	(15,918)	(14,479)	(13,695)	(13,120)	(89,538)
Total	784	771	708	484	408	402	3,557
	(577,218)	(572,547)	(545,063)	(491,332)	(470,168)	(468,598)	(3,124,926)

Note. ICC = intraclass correlation. Numbers of students are given in parentheses.

The results presented in this article are based on over 3.1 million students. Of the ICCs computed, 16% are from urban areas and about 58% are from high-poverty areas. About 57% of the nonurban areas and 62% of the urban areas are high poverty. Figure 1 presents the number of ICCs estimated by district size, urbanicity, and poverty. The modal ICC is estimated from a nonurban, high-poverty, medium-sized district, followed by a similar small district. The next most common ICC is estimated from a nonurban, low-poverty, small district. Larger districts were more prevalent in urban areas as would be expected.

Figure 1.

Number of estimates by grade-specific district size, urbanicity, and poverty.

The mean ICC estimated from all districts, grades, and subjects was .094, with a standard deviation of .092. The distribution is highly skewed, with a median of .056. The estimated ICCs for math had a mean of .104 and standard deviation of .104. The 10th, 25th, 50th, 75th, and 90th percentiles for math were .001, .027, .076, .149, and .246, respectively. The estimated ICCs for reading were generally lower, with a mean of .084 and standard deviation of .092. The 10th, 25th, 50th, 75th, and 90th percentiles for reading were <.001, .018, .056, .118, and .200, respectively.

Figure 2 presents box plots of the estimated district-specific ICCs by subject, grade, and district size. Each box plot presents a highly skewed distribution. In all districts, math ICCs tend to have a larger median than the reading ICCs, and the medians generally rise with grade level. The variance also increases with grade levels. Examining the box plots for the very small districts, those below the 10th percentile, we see the reverse pattern: ICCs and the variance decrease with grade. The small and medium school districts do not display a consistent pattern with grades, except that the eighth-grade variance is larger. Finally, large school districts are more reflective of the overall pattern.

Figure 2.

Box plot of district-specific intraclass correlations (ICCs) by grade, grade-specific district size, and subject.

Finally, as support for presenting meta-analyses by district size, we estimated unweighted correlations between the district-specific ICC and the log of district size (number of schools) for each subject and grade. The correlation coefficients ranged from .52 to .75, with a median of .70, which supports the claim that ICCs are related to district size.

Results of Meta-Analyses

Tables 2–11 present the estimated mean ICCs and τ for math and reading achievement by grade and district size. In these tables, we also present the empirical 25th, 50th (the median), and 75th percentiles to give a sense of the observed distribution and variance. Each table is organized in a series of horizontal panels, each for a district size category, with rows for each grade. The number of districts used for the analysis is denoted as k. Tables 2 and 3 present results for all districts. These results are useful for research designs that sample districts with a variety of characteristics and are not limited to only impoverished or rural areas.

Table 2.

Results of Random Effects Meta-Analysis of Within-District ICCs for Mathematics Achievement by District Size and Grade.

Size	Schools	Grade	k	Mean ICC	(SE)	τ	Empirical 25th percentile	Empirical 50th percentile	Empirical 75th percentile
All districts		3	787	.072	(.004)	.095*	.022	.070	.125
		4	774	.063	(.003)	.048*	.023	.067	.131
		5	711	.080	(.004)	.080*	.032	.075	.140
		6	487	.084	(.005)	.065*	.028	.086	.170
		7	414	.097	(.005)	.062*	.033	.094	.190
		8	418	.169	(.015)	.292*	.032	.119	.267
Very small districts	2–3	3	165	.009	(.004)	^a	<.001	.034	.084
	2–3	4	149	.007	(.003)	^a	<.001	.022	.079
	2	5	10	.006	(.019)	^a	<.001	.025	.081
	2	6	11	.002	(.006)	^a	<.001	.014	.073
	2	7	16	.002	(.005)	^a	<.001	.016	.110
	2	8	15	.001	(.005)	^a	<.001	.018	.075
Small districts	4–5	3	214	.012	(.003)	^a	.010	.046	.096
	4–5	4	216	.030	(.004)	.030*	.014	.047	.107
	3–5	5	313	.014	(.003)	^a	.013	.049	.117
	3–5	6	251	.050	(.006)	.059*	.014	.047	.113
	3	7	91	.060	(.010)	.060*	.005	.048	.137
	3	8	92	.004	(.002)	^a	.004	.030	.169
Medium districts	6–10	3	212	.084	(.012)	.162*	.029	.075	.125
	6–10	4	214	.037	(.003)	.016	.030	.061	.123
	6–11	5	214	.040	(.004)	.018*	.035	.065	.123
	6–11	6	133	.058	(.006)	.027*	.047	.092	.161
	4–6	7	150	.027	(.004)	.014	.027	.072	.141
	4–7	8	174	.139	(.014)	.152*	.032	.099	.210
Large districts	11+	3	172	.118	(.005)	.055*	.077	.117	.171
	11+	4	175	.120	(.005)	.050*	.079	.117	.172
	12+	5	155	.141	(.011)	.122*	.083	.132	.187
	12+	6	81	.174	(.011)	.079*	.100	.175	.252
	7+	7	126	.183	(.012)	.107*	.090	.171	.279
	8+	8	111	.255	(.043)	.442*	.107	.235	.352

Note. ICC = intraclass correlation. ^a τ estimated as 0. Very small districts defined as the 10th percentile of size weighted by students served by grade. Small districts defined as the 25th percentile of size weighted by students served by grade. Medium districts defined as the 50th percentile of size weighted by students served by grade. Large districts defined as the >50th percentile of size weighted by students served by grade.

*p(τ = 0) < .05, standard errors are given in parentheses.

Table 3.

Results of Random Effects Meta-Analysis of Within-District ICCs for Reading Achievement by District Size and Grade.

Size	Schools	Grade	k	Mean ICC	(SE)	τ	Empirical 25th percentile	Empirical 50th percentile	Empirical 75th percentile
All districts		3	787	.049	(.002)	.037*	.017	.050	.107
		4	774	.057	(.003)	.055*	.017	.053	.108
		5	711	.069	(.007)	.178*	.020	.057	.104
		6	487	.052	(.003)	.038*	.018	.058	.122
		7	414	.067	(.004)	.052*	.015	.065	.147
		8	419	.145	(.013)	.252*	.023	.083	.235
Very small districts	2–3	3	165	.008	(.003)	^a	<.001	.023	.073
	2–3	4	149	.008	(.004)	^a	<.001	.021	.059
	2	5	10	.004	(.015)	^a	<.001	.035	.169
	2	6	11	.005	(.010)	^a	.010	.022	.088
	2	7	17	.003	(.006)	^a	<.001	.011	.073
	2	8	15	.001	(.004)	^a	<.001	.013	.116
Small districts	4–5	3	214	.011	(.003)	^a	.006	.031	.072
	4–5	4	216	.011	(.003)	^a	.004	.035	.076
	3–5	5	313	.009	(.002)	^a	.007	.033	.082
	3–5	6	251	.008	(.002)	^a	.007	.033	.086
	3	7	90	.058	(.011)	.073*	<.001	.028	.097
	3	8	92	.013	(.005)	.018*	.003	.033	.121
Medium districts	6–10	3	212	.032	(.003)	.017*	.024	.054	.095
	6–10	4	214	.058	(.007)	.077*	.026	.054	.094
	6–11	5	214	.019	(.002)	^a	.024	.052	.084
	6–11	6	133	.065	(.008)	.055*	.027	.059	.109
	4–6	7	149	.014	(.003)	^a	.013	.048	.107
	4–7	8	174	.080	(.008)	.069*	.020	.056	.169
Large districts	11+	3	172	.105	(.005)	.052*	.063	.107	.153
	11+	4	175	.108	(.005)	.055*	.069	.103	.156
	12+	5	155	.125	(.024)	.298*	.068	.112	.163
	12+	6	81	.135	(.009)	.057*	.090	.143	.210
	7+	7	125	.127	(.009)	.071*	.065	.122	.201
	8+	8	111	.230	(.042)	.431*	.094	.195	.337

Note. ICC = intraclass correlation. ^aτ estimated as 0. Very small districts defined as the 10th percentile of size weighted by students served by grade. Small districts defined as the 25th percentile of size weighted by students served by grade. Medium districts defined as the 50th percentile of size weighted by students served by grade. Large districts defined as the >50th percentile of size weighted by students served by grade.

*p(τ = 0) < .05, standard errors are given in parentheses.

Table 4.

Results of Random Effects Meta-Analysis of Nonurban Within-District ICCs for Mathematics Achievement by District Size and Grade.

Size	Schools	Grade	k	Mean ICC	(SE)	τ	Empirical 25th percentile	Empirical 50th percentile	Empirical 75th percentile
All districts		3	683	.064	(.005)	.096*	.019	.063	.116
		4	668	.051	(.003)	.039*	.019	.057	.116
		5	609	.071	(.004)	.079*	.027	.066	.132
		6	397	.081	(.005)	.068*	.026	.079	.165
		7	330	.101	(.006)	.073*	.027	.088	.188
		8	334	.175	(.021)	.358*	.029	.117	.277
Very small districts	2–3	3	164	.009	(.004)	^a	<.001	.035	.085
	2–3	4	147	.007	(.003)	^a	<.001	.020	.080
	2	5	9	.008	(.023)	^a	<.001	.044	.081
	2	6	10	.001	(.006)	^a	<.001	.017	.073
	2	7	14	.001	(.005)	^a	<.001	.011	.086
	2	8	13	.001	(.005)	^a	<.001	.007	.060
Small districts	4–5	3	199	.012	(.003)	^a	.010	.041	.092
	4–5	4	201	.026	(.004)	.025*	.013	.044	.101
	3–5	5	300	.014	(.003)	^a	.012	.050	.114
	3–5	6	218	.055	(.007)	.068*	.014	.047	.128
	3	7	79	.075	(.014)	.079*	.005	.045	.137
	3	8	81	.007	(.004)	^a	.002	.029	.180
Medium districts	6–10	3	192	.083	(.013)	.171*	.028	.072	.121
	6–10	4	197	.038	(.004)	.017	.030	.058	.116
	6–11	5	193	.039	(.004)	.019*	.034	.064	.123
	6–11	6	112	.058	(.007)	.030*	.041	.089	.164
	4–6	7	123	.034	(.005)	.012	.027	.076	.151
	4–7	8	142	.157	(.020)	.211*	.034	.101	.232
Large districts	11+	3	107	.106	(.007)	.052*	.065	.105	.169
	11+	4	109	.112	(.007)	.050*	.069	.107	.168
	12+	5	93	.139	(.016)	.145*	.068	.122	.187
	12+	6	50	.172	(.016)	.088*	.096	.164	.272
	7+	7	87	.188	(.015)	.114*	.090	.171	.285
	8+	8	76	.277	(.054)	.459*	.120	.263	.377

Note. ICC = intraclass correlation. ^aτ estimated as 0. Very small districts defined as the 10th percentile of size weighted by students served by grade. Small districts defined as the 25th percentile of size weighted by students served by grade. Medium districts defined as the 50th percentile of size weighted by students served by grade. Large districts defined as the >50th percentile of size weighted by students served by grade.

*p(τ = 0) < .05, standard errors are given in parentheses.

Table 5.

Results of Random Effects Meta-Analysis of Nonurban Within-District ICCs for Reading Achievement by District Size and Grade.

Size	Schools	Grade	k	Mean ICC	(SE)	τ	Empirical 25th percentile	Empirical 50th percentile	Empirical 75th percentile
All districts		3	683	.037	(.002)	.026*	.012	.042	.093
		4	668	.046	(.003)	.049*	.012	.047	.091
		5	609	.063	(.009)	.196*	.016	.052	.094
		6	397	.045	(.003)	.036*	.016	.050	.114
		7	330	.063	(.005)	.055*	.013	.060	.134
		8	335	.148	(.017)	.284*	.020	.080	.244
Very small districts	2–3	3	164	.008	(.003)	^a	<.001	.024	.074
	2–3	4	147	.007	(.004)	^a	<.001	.021	.059
	2	5	9	.003	(.015)	^a	<.001	.013	.110
	2	6	10	.005	(.010)	^a	.010	.018	.088
	2	7	15	.003	(.007)	^a	<.001	.009	.073
	2	8	13	.001	(.004)	^a	<.001	.008	.074
Small districts	4–5	3	199	.010	(.003)	^a	.004	.029	.068
	4–5	4	201	.010	(.003)	^a	.004	.028	.069
	3–5	5	300	.012	(.002)	^a	.007	.034	.081
	3–5	6	218	.009	(.002)	^a	.007	.033	.089
	3	7	78	.065	(.014)	.089*	<.001	.027	.097
	3	8	81	.017	(.006)	.023*	.002	.032	.123
Medium districts	6–10	3	192	.031	(.003)	.018*	.023	.053	.085
	6–10	4	197	.057	(.007)	.079*	.025	.054	.089
	6–11	5	193	.018	(.002)	^a	.023	.051	.082
	6–11	6	112	.065	(.009)	.058*	.025	.058	.106
	4–6	7	123	.014	(.003)	^a	.012	.046	.124
	4–7	8	142	.077	(.009)	.065*	.020	.059	.175
Large districts	11+	3	107	.093	(.006)	.046*	.061	.096	.148
	11+	4	109	.097	(.006)	.050*	.058	.090	.154
	12+	5	93	.120	(.036)	.348*	.061	.102	.163
	12+	6	50	.122	(.011)	.051*	.081	.119	.210
	7+	7	86	.114	(.010)	.063*	.062	.112	.195
	8+	8	76	.250	(.054)	.462*	.102	.224	.361

Note. ICC = intraclass correlation. ^aτ estimated as 0. Very small districts defined as the 10th percentile of size weighted by students served by grade. Small districts defined as the 25th percentile of size weighted by students served by grade. Medium districts defined as the 50th percentile of size weighted by students served by grade. Large districts defined as the >50th percentile of size weighted by students served by grade.

*p(τ = 0) < .05, standard errors are given in parentheses.

Table 6.

Results of Random Effects Meta-Analysis of Urban Within-District ICCs for Mathematics Achievement by District Size and Grade.

Size	Schools	Grade	k	Mean ICC	(SE)	τ	Empirical 25th percentile	Empirical 50th percentile	Empirical 75th percentile
All districts		3	104	.119	(.009)	.068*	.082	.116	.164
		4	106	.120	(.008)	.058*	.076	.124	.169
		5	102	.120	(.008)	.058*	.083	.125	.174
		6	90	.099	(.009)	.056*	.048	.100	.175
		7	84	.093	(.008)	.047*	.051	.106	.227
		8	84	.120	(.010)	.055*	.040	.130	.235
Very small districts	2–3	3	—	—	—	—
	2–3	4	2	.049	(.179)	^a	.030	.040	.051
	2	5	—	—	—	—
	2	6	—	—	—	—
	2	7	2	.066	(.084)	^a	.044	.188	.332
	2	8	2	.023	(.041)	^a	.018	.134	.250
Small districts	4–5	3	15	.020	(.012)	^a	.042	.084	.118
	4–5	4	15	.134	(.036)	.103*	.049	.134	.193
	3–5	5	13	.023	(.012)	^a	.025	.036	.169
	3–5	6	33	.005	(.004)	^a	.017	.043	.087
	3	7	12	.002	(.004)	^a	.005	.052	.121
	3	8	11	.001	(.003)	^a	.020	.038	.133
Medium districts	6–10	3	20	.069	(.016)	.040*	.054	.106	.167
	6–10	4	17	.038	(.011)	^a	.026	.095	.129
	6–11	5	21	.069	(.013)	^a	.062	.086	.123
	6–11	6	21	.070	(.014)	^a	.071	.096	.148
	4–6	7	27	.004	(.003)	^a	.019	.061	.112
	4–7	8	32	.031	(.009)	.019*	.027	.088	.155
Large districts	11+	3	65	.138	(.009)	.052*	.098	.126	.172
	11+	4	66	.134	(.008)	.043*	.092	.133	.173
	12+	5	62	.143	(.009)	.050*	.094	.142	.182
	12+	6	31	.177	(.014)	.049*	.107	.192	.244
	7+	7	39	.172	(.017)	.085*	.090	.152	.264
	8+	8	35	.204	(.021)	.102*	.095	.211	.296

Note. ICC = intraclass correlation. ^aτ estimated as 0. Very small districts defined as the 10th percentile of size weighted by students served by grade. Small districts defined as the 25th percentile of size weighted by students served by grade. Medium districts defined as the 50th percentile of size weighted by students served by grade. Large districts defined as the >50th percentile of size weighted by students served by grade.

*p(τ = 0) < .05, standard errors are given in parentheses.

Table 7.

Results of Random Effects Meta-Analysis of Urban Within-District ICCs for Reading Achievement by District Size and Grade.

Size	Schools	Grade	k	Mean ICC	(SE)	τ	Empirical 25th percentile	Empirical 50th percentile	Empirical 75th percentile
All districts		3	104	.105	(.007)	.057*	.063	.112	.152
		4	106	.117	(.009)	.071*	.063	.120	.162
		5	102	.105	(.009)	.070*	.058	.100	.155
		6	90	.081	(.008)	.049*	.032	.072	.153
		7	84	.083	(.008)	.048*	.045	.088	.189
		8	84	.121	(.012)	.084*	.029	.094	.224
Very small districts	2–3	3	—	—	—	—
	2–3	4	2	.027	(.163)	^a	<.001	.015	.030
	2	5	—	—	—	—
	2	6	—	—	—	—
	2	7	2	.046	(.103)	.105	.011	.181	.351
	2	8	2	.016	(.032)	^a	.013	.157	.302
Small districts	4–5	3	15	.048	(.017)	^a	.047	.071	.174
	4–5	4	15	.132	(.037)	.116*	.022	.083	.183
	3–5	5	13	.009	(.009)	.010	.011	.031	.140
	3–5	6	33	.005	(.003)	^a	.007	.022	.070
	3	7	12	.004	(.005)	^a	.023	.058	.090
	3	8	11	.005	(.006)	^a	.004	.042	.104
Medium districts	6–10	3	20	.038	(.010)	^a	.034	.091	.133
	6–10	4	17	.059	(.016)	.037	.035	.071	.125
	6–11	5	21	.040	(.010)	^a	.034	.069	.089
	6–11	6	21	.059	(.013)	^a	.046	.061	.122
	4–6	7	26	.013	(.006)	^a	.015	.048	.096
	4–7	8	32	.090	(.020)	.087*	.022	.042	.158
Large districts	11+	3	65	.124	(.009)	.059*	.088	.122	.164
	11+	4	66	.126	(.009)	.060*	.088	.136	.170
	12+	5	62	.129	(.010)	.065*	.087	.131	.163
	12+	6	31	.153	(.016)	.062*	.105	.168	.219
	7+	7	39	.153	(.019)	.103*	.084	.132	.239
	8+	8	35	.182	(.023)	.119*	.089	.160	.259

Note. ICC = intraclass correlation. ^aτ estimated as 0. Very small districts defined as the 10th percentile of size weighted by students served by grade. Small districts defined as the 25th percentile of size weighted by students served by grade. Medium districts defined as the 50th percentile of size weighted by students served by grade. Large districts defined as the >50th percentile of size weighted by students served by grade.

*p(τ = 0) < .05, standard errors are given in parentheses.

Table 8.

Results of Random Effects Meta-Analysis of Low-Poverty Within-District ICCs for Mathematics Achievement by District Size and Grade.

Size	Schools	Grade	k	Mean ICC	(SE)	τ	Empirical 25th percentile	Empirical 50th percentile	Empirical 75th percentile
All districts		3	350	.062	(.007)	.120*	.017	.056	.102
		4	342	.034	(.003)	.022*	.014	.045	.094
		5	312	.057	(.004)	.044*	.018	.063	.122
		6	179	.074	(.007)	.068*	.018	.066	.138
		7	163	.091	(.007)	.060*	.024	.086	.184
		8	178	.153	(.015)	.173*	.029	.093	.259
Very small districts	2–3	3	88	.006	(.004)	^a	<.001	.027	.075
	2–3	4	79	.006	(.004)	^a	<.001	.011	.047
	2	5	8	.009	(.024)	^a	<.001	.044	.222
	2	6	9	.001	(.006)	^a	<.001	.014	.073
	2	7	12	.002	(.006)	^a	<.001	.012	.069
	2	8	11	.002	(.006)	^a	<.001	.007	.075
Small districts	4–5	3	111	.010	(.003)	^a	.006	.036	.079
	4–5	4	109	.012	(.003)	^a	.014	.036	.081
	3–5	5	153	.012	(.003)	^a	.009	.041	.091
	3–5	6	97	.044	(.009)	.065*	.007	.033	.097
	3	7	34	.081	(.017)	.074*	.005	.028	.137
	3	8	38	.014	(.007)	.016	.008	.034	.133
Medium districts	6–10	3	81	.078	(.027)	.233*	.025	.061	.091
	6–10	4	88	.022	(.004)	^a	.021	.045	.080
	6–11	5	86	.035	(.005)	.024*	.018	.058	.114
	6–11	6	38	.062	(.012)	.033	.031	.072	.165
	4–6	7	52	.006	(.003)	^a	.027	.056	.103
	4–7	8	71	.059	(.009)	.038*	.027	.064	.149
Large districts	11+	3	62	.104	(.008)	.044*	.076	.108	.165
	11+	4	58	.101	(.008)	.043*	.060	.109	.173
	12+	5	56	.136	(.011)	.068*	.085	.125	.187
	12+	6	28	.181	(.021)	.093*	.093	.135	.335
	7+	7	52	.205	(.020)	.119*	.097	.189	.299
	8+	8	46	.278	(.033)	.211*	.150	.277	.369

Note. ICC = intraclass correlation. ^aτ estimated as 0. Very small districts defined as the 10th percentile of size weighted by students served by grade. Small districts defined as the 25th percentile of size weighted by students served by grade. Medium districts defined as the 50th percentile of size weighted by students served by grade. Large districts defined as the >50th percentile of size weighted by students served by grade.

*p(τ = 0) < .05, standard errors are given in parentheses.

Table 9.

Results of Random Effects Meta-Analysis of Low-Poverty Within-District ICCs for Reading Achievement by District Size and Grade.

Size	Schools	Grade	k	Mean ICC	(SE)	τ	Empirical 25th percentile	Empirical 50th percentile	Empirical 75th percentile
All districts		3	349	.031	(.003)	.024*	.008	.039	.084
		4	342	.046	(.004)	.057*	.010	.040	.085
		5	314	.041	(.003)	.027*	.019	.052	.088
		6	179	.045	(.005)	.040*	.012	.047	.114
		7	164	.067	(.007)	.056*	.014	.059	.141
		8	180	.130	(.013)	.152*	.016	.065	.210
Very small districts	2–3	3	88	.007	(.004)	^a	<.001	.019	.064
	2–3	4	79	.006	(.004)	^a	<.001	.016	.047
	2	5	8	.003	(.015)	^a	<.001	.030	.140
	2	6	9	.006	(.011)	^a	.012	.022	.088
	2	7	12	.005	(.009)	^a	.005	.012	.055
	2	8	11	.001	(.005)	^a	<.001	.013	.188
Small districts	4–5	3	111	.008	(.003)	^a	<.001	.024	.050
	4–5	4	109	.009	(.003)	^a	.007	.032	.063
	3–5	5	154	.007	(.003)	^a	.005	.031	.067
	3–5	6	97	.008	(.003)	.007	.005	.019	.064
	3	7	34	.088	(.019)	.086*	<.001	.031	.097
	3	8	38	.023	(.009)	.031*	.004	.028	.104
Medium districts	6–10	3	80	.034	(.006)	.030*	.019	.043	.081
	6–10	4	88	.059	(.013)	.105*	.022	.046	.086
	6–11	5	87	.027	(.004)	.010	.025	.054	.080
	6–11	6	38	.098	(.019)	.096*	.019	.053	.164
	4–6	7	52	.012	(.004)	^a	.012	.040	.091
	4–7	8	72	.034	(.007)	.030*	.014	.040	.145
Large districts	11+	3	62	.086	(.007)	.038*	.057	.091	.163
	11+	4	58	.096	(.008)	.049*	.046	.087	.156
	12+	5	56	.105	(.008)	.042*	.062	.109	.168
	12+	6	28	.112	(.013)	.048*	.073	.098	.215
	7+	7	53	.120	(.013)	.066*	.065	.131	.221
	8+	8	47	.239	(.029)	.186*	.089	.212	.334

Note. ICC = intraclass correlation. ^aτ estimated as 0. Very small districts defined as the 10th percentile of size weighted by students served by grade. Small districts defined as the 25th percentile of size weighted by students served by grade. Medium districts defined as the 50th percentile of size weighted by students served by grade. Large districts defined as the >50th percentile of size weighted by students served by grade.

*p(τ = 0) < .05, standard errors are given in parentheses.

Table 10.

Results of Random Effects Meta-Analysis of High-Poverty Within-District ICCs for Mathematics Achievement by District Size and Grade.

Size	Schools	Grade	k	Mean ICC	(SE)	τ	Empirical 25th percentile	Empirical 50th percentile	Empirical 75th percentile
All districts		3	437	.076	(.004)	.054*	.030	.082	.147
		4	432	.088	(.004)	.062*	.037	.088	.154
		5	399	.094	(.006)	.102*	.041	.086	.157
		6	308	.089	(.006)	.054*	.038	.093	.178
		7	251	.104	(.007)	.067*	.042	.098	.195
		8	240	.179	(.026)	.382*	.037	.129	.269
Very small districts	2–3	3	77	.016	(.007)	^a	.004	.041	.139
	2–3	4	70	.013	(.007)	^a	.006	.053	.106
	2	5	2	.000	(.034)	^a	<.001	.003	.005
	2	6	2	.013	(.029)	^a	.010	.032	.054
	2	7	4	.001	(.010)	^a	.007	.074	.233
	2	8	4	.001	(.008)	^a	.016	.047	.246
Small districts	4–5	3	103	.016	(.005)	^a	.018	.063	.113
	4–5	4	107	.061	(.010)	.063*	.014	.065	.134
	3–5	5	160	.018	(.004)	^a	.023	.066	.155
	3–5	6	154	.033	(.005)	.016	.023	.066	.131
	3	7	57	.020	(.008)	.020	.008	.053	.130
	3	8	54	.002	(.003)	^a	.001	.028	.202
Medium districts	6–10	3	131	.045	(.005)	.019	.034	.084	.143
	6–10	4	126	.055	(.006)	.027*	.037	.080	.146
	6–11	5	128	.040	(.004)	^a	.042	.070	.138
	6–11	6	95	.056	(.007)	.025	.060	.096	.152
	4–6	7	98	.052	(.008)	.027	.027	.086	.175
	4–7	8	103	.172	(.028)	.255*	.039	.123	.233
Large districts	11+	3	110	.125	(.007)	.058*	.080	.127	.176
	11+	4	117	.129	(.006)	.049*	.086	.129	.170
	12+	5	99	.141	(.015)	.140*	.079	.132	.187
	12+	6	53	.171	(.013)	.068*	.109	.183	.244
	7+	7	74	.170	(.015)	.101*	.086	.157	.260
	8+	8	65	.240	(.060)	.476*	.095	.211	.325

Note. ICC = intraclass correlation. ^aτ estimated as 0. Very small districts defined as the 10th percentile of size weighted by students served by grade. Small districts defined as the 25th percentile of size weighted by students served by grade. Medium districts defined as the 50th percentile of size weighted by students served by grade. Large districts defined as the >50th percentile of size weighted by students served by grade.

*p(τ = 0) < .05, standard errors are given in parentheses.

Table 11.

Results of Random Effects Meta-Analysis of High-Poverty Within-District ICCs for Reading Achievement by District Size and Grade.

Size	Schools	Grade	k	Mean ICC	(SE)	τ	Empirical 25th percentile	Empirical 50th percentile	Empirical 75th percentile
All districts		3	438	.062	(.003)	.044*	.024	.062	.124
		4	432	.067	(.004)	.049*	.022	.064	.122
		5	397	.079	(.013)	.240*	.023	.064	.120
		6	308	.057	(.004)	.033*	.025	.061	.128
		7	250	.065	(.005)	.044*	.016	.072	.163
		8	239	.153	(.024)	.342*	.026	.102	.245
Very small districts	2–3	3	77	.009	(.006)	^a	<.001	.028	.086
	2–3	4	70	.011	(.007)	^a	<.001	.027	.081
	2	5	2	.049	(.115)	^a	.013	.137	.260
	2	6	2	.003	(.020)	^a	.001	.019	.038
	2	7	5	.001	(.010)	^a	<.001	<.001	.164
	2	8	4	.002	(.009)	^a	.004	.041	.095
Small districts	4–5	3	103	.019	(.005)	^a	.013	.038	.103
	4–5	4	107	.025	(.006)	.026*	.003	.035	.094
	3–5	5	159	.015	(.004)	^a	.009	.047	.103
	3–5	6	154	.014	(.003)	^a	.011	.040	.095
	3	7	56	.014	(.007)	^a	.002	.028	.099
	3	8	54	.009	(.006)	^a	<.001	.042	.124
Medium districts	6–10	3	132	.030	(.004)	^a	.030	.055	.104
	6–10	4	126	.035	(.004)	^a	.030	.057	.107
	6–11	5	127	.017	(.003)	^a	.019	.049	.086
	6–11	6	95	.031	(.005)	^a	.029	.060	.104
	4–6	7	97	.020	(.005)	.012	.013	.052	.108
	4–7	8	102	.113	(.014)	.098*	.024	.072	.196
Large districts	11+	3	110	.113	(.007)	.056*	.072	.120	.153
	11+	4	117	.114	(.007)	.057*	.074	.116	.156
	12+	5	99	.130	(.034)	.338*	.068	.114	.163
	12+	6	53	.149	(.012)	.061*	.095	.165	.206
	7+	7	72	.132	(.012)	.077*	.071	.118	.195
	8+	8	64	.222	(.060)	.470*	.099	.176	.338

Note. ICC = intraclass correlation. ^aτ estimated as 0. Very small districts defined as the 10th percentile of size weighted by students served by grade. Small districts defined as the 25th percentile of size weighted by students served by grade. Medium districts defined as the 50th percentile of size weighted by students served by grade. Large districts defined as the >50th percentile of size weighted by students served by grade.

*p(τ = 0) < .05, standard errors are given in parentheses.

However, other designs may be more specific. To serve those researchers, Tables 4 and 5 present results for nonurban districts. Tables 6 and 7 present results for urban districts. Tables 8 and 9 present results for low-poverty districts (less than 50% of students eligible for free or reduced-price lunch). Finally, Tables 10 and 11 present results for high-poverty districts (at least 50% of students eligible for free or reduced-price lunch). In this section, we present some patterns (or lack of patterns) of interest. Overall, each pattern noted here will have exceptions, but the following will provide some basic insight into the distribution of ICCs.

Results for all Districts and Comparison With Statewide Estimates

As is typical in other studies, we generally see that ICCs for mathematics are higher than ICCs estimated for reading. This is a relatively stable pattern, but there are exceptions. In medium-sized districts, the reading ICCs are larger for Grades 4 and 6. They are also larger for eighth grade in small districts. For mathematics achievement, the average within-district ICC derived from individual three-level models (students nested within schools nested within districts) from each state is about .11 for Grades 3, 4, and 5. The meta-analysis yields smaller ICCs for Grades 3, 4, and 5: .07, .06, and .08, respectively. Although our seventh-grade estimate is also smaller, .10 versus .13, our estimate for eighth grade is larger than that from the three-level models, .17 versus .16.

We also observe smaller estimated mean ICCs in our analysis for reading achievement in all grades, compared to the results from analyses that pool all data across districts within each state. In Grades 3 and 4, the average results from the state data that pool the information across districts are much larger than the estimated average of the district-specific estimates from the meta-analysis, .10 versus .05 and .10 versus .06, respectively. In Grade 7, the result from our meta-analysis is smaller than the average from the pooled state analyses, .07 versus .10.

To place the results presented in this section in the context of estimates that pool all data across districts, we provide the following guidance. The results in this study are appropriate for planning targeted samples that do not represent an entire state. Conversely, the results from data that pool information across all districts are meant to inform designs that draw a sample from all districts.

Results by District Size

In most grades, the ICCs are larger in large districts. For example, in Table 2, the Grade 3 math ICCs for very small, small, medium, and large districts are .009, .012, .084, and .118, respectively. This general pattern holds for all grades in math except Grade 7, where the small districts have a larger ICC than the medium districts. Another notable feature is that the increase is not linear across size categories: the large districts have much larger ICCs than the smaller districts.

Also of note is that the largest districts show similar ICCs for earlier grades compared with the ICCs from three-level models. For example, the mean math ICC from the meta-analysis for Grade 3 in the largest districts is .118, which is similar to the average from the three-level model analyses (.112). This supports the hypothesis that the three-level models are unduly influenced by larger districts.

Results by Grade

Previous investigations found that the ICCs generally increase with grade level (Hedges & Hedberg, 2007a, 2007b, 2014). While we again find this is true in broad strokes, closer examination reveals a more complicated picture. In all districts, we observe a pattern for math in which mean ICCs generally increase, with Grades 3, 4, 5, 6, 7, and 8 having mean ICCs of .072, .063, .080, .084, .097, and .169, respectively. This pattern is not evident in reading, with the ICCs appearing to “bounce around” for Grades 3–7. These patterns hold in the smaller districts as well, although there is a linear increase in the ICCs by grade in the largest districts.

Results by Urbanicity

Tables 4 and 5 present ICCs for mathematics and reading achievement for nonurban districts, while Tables 6 and 7 present math and reading ICCs for urban districts. Some combinations of district size and urbanicity were not represented in our data and thus meta-analysis was not possible. For districts of any size, we generally find ICCs in urban areas are larger for the lower grades than those in nonurban areas. In the higher grades, the nonurban areas tend to have larger ICCs, especially in eighth grade.

Results by Poverty

Tables 8 and 9 present ICCs for mathematics and reading achievement for districts with less than a 50% rate of free or reduced-price lunch eligible students, while Tables 10 and 11 present math and reading ICCs for districts with at least half of students eligible for free or reduced-price lunch. For districts of any size, we generally find that ICCs in high-poverty districts are larger than those in low-poverty districts. The exception to this pattern is seventh-grade reading, where the high-poverty ICC is slightly lower than the low-poverty ICC.

In the smaller districts, we generally see that the high-poverty ICCs are higher than the low-poverty ICCs in the earlier grades (i.e., Grades 3–5). In Grades 6–8, however, it is the high-poverty ICCs that are smaller in the smaller districts. In the medium-sized districts, the reading ICCs are lower in most grades for low-poverty compared to high-poverty districts, whereas the math ICCs do not seem to follow a pattern. Finally, in the largest districts, the math ICCs are higher in the high-poverty districts for the lower grades and are generally higher for reading in most grades except eighth.

Variation in Estimated ICCs

In Tables 2–11, we also report the variation in ICCs for each grade and district size combination as the standard deviation $\hat{\tau}$. We tested each estimate against the null hypothesis that τ = 0 and marked those for which the null hypothesis was rejected at the .05 level; in other words, we follow standard practice and mark statistically significant variance. For results of all districts (i.e., ignoring district size), we universally found significant variation in ICCs. This held consistently for the largest districts, although we found less consistent evidence of variation in ICCs for the smaller districts. In many cases, the estimated variance underlying τ was negative, and τ was truncated at 0. In such cases, we entered the superscript letter “^a” into the table.
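To illustrate how an estimate can come to be truncated at 0, consider one common method-of-moments estimator of the between-district variance, the DerSimonian–Laird estimator. Whether this is the estimator behind the tabled values is an assumption made here for illustration only, and the notation is ours: $\hat{\rho}_i$ is the estimated ICC for district $i$, $v_i$ is its sampling variance, $w_i = 1/v_i$, and $k$ is the number of districts.

$$
\bar{\rho} = \frac{\sum_{i=1}^{k} w_i \hat{\rho}_i}{\sum_{i=1}^{k} w_i}, \qquad
Q = \sum_{i=1}^{k} w_i \left(\hat{\rho}_i - \bar{\rho}\right)^2, \qquad
c = \sum_{i=1}^{k} w_i - \frac{\sum_{i=1}^{k} w_i^2}{\sum_{i=1}^{k} w_i}
$$

$$
\hat{\tau}^2 = \max\left\{0, \; \frac{Q - (k - 1)}{c}\right\}
$$

Under this estimator, whenever the observed dispersion $Q$ falls below $k - 1$, its expected value when all districts share one ICC, the numerator is negative and the estimate is set to 0.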

Discussion

In this study, we provide expected ICCs by grade and subject in a variety of contexts that will also be of interest to evaluation researchers, namely district size, urbanicity, and poverty status. Although we generally found expected patterns, the smaller districts presented a picture that was less consistent. Perhaps the sampling variability associated with smaller districts creates difficulty in uncovering patterns of results or perhaps such patterns do not exist.

To our knowledge, this is the first investigation of the distribution of within-district ICCs. One of the more important findings of this study is simply that these ICCs tend to be quite small for earlier grades in the smaller districts. This is particularly important for planning interventions in these settings because pretests on academic achievement that might be used as covariates to improve statistical power are less frequently available in administrative data on younger children. Given the small ICC estimates, the practice of spending resources on pretests may not be necessary.

Finally, it is worth noting that the more heterogeneous districts, in terms of expected ICCs, are the largest districts. This is not surprising, given the diversity found in large urban areas. It does, however, highlight the need to utilize local data when available. Although previous publications have provided such data from the handful of large urban districts that have been studied in previous evaluations, our results provide some guidance in other situations where little data are available.

Limitations

This study has two limitations. The first is the amount of data available to produce ICC estimates to analyze. To be sure, we employ a relatively large amount of data compared to most education studies, but for this article the unit of analysis is the district. With 11 states, we only have between 400 and 780 districts per grade to analyze. Thus, more detailed breakdowns by urbanicity and poverty produce too few estimates in certain cells to produce reliable means. Until more data are available, the conservative approach would be to employ the larger of the available ICCs for more targeted samples (e.g., impoverished urban areas). Second, we would also like to have a more detailed urbanicity breakdown, but again we are restricted by our sample size.

Conclusion

We have presented empirical evidence about design parameters useful in planning CRT experiments that use academic achievement as the outcome and that employ districts of a particular size or type. Our estimates are means derived from random effects meta-analyses and are presented along with standard errors that provide some sense of the sampling error inherent in these estimates. We now turn to the question of how best to use these design parameters in planning CRTs and what some limitations might be.

The ICC values reported in this tabulation differ from the national values reported by Hedges and Hedberg (2007a) and their more recent work (2014). Although the evidence reported in this article is based on near-census data from several state longitudinal data systems, it is data from only 11 states and only Grades 3–8. Although our estimates should be more relevant for some studies than national estimates like those of Hedges and Hedberg (2007a), there are significant heterogeneities across large districts. However, for certain applications, the results in this article may prove more useful than estimates derived from models that pool results across states. This study has also revealed that the distribution of within-district school-level ICCs is highly skewed, with numerous very small ICCs and a small number of large values. However, meta-analyses reveal that small districts have ICCs of relatively uniform size, with measures of variation seldom statistically differing from zero. A final caution is that the estimates reported here are based on state assessments and thus would be less relevant to studies using achievement tests that are not aligned with instruction.

Example Power Analyses for CRTs

Putting such limitations aside, these values can be used with several pieces of software designed for multilevel power computations, including Optimal Design for Windows (Spybrook, Raudenbush, Liu, Congdon, & Martínez, 2006), RDPOWER for Stata (Hedberg, 2012), and commercial software such as CRT-Power (Borenstein, Hedges, & Rothstein, 2012). Here, we provide an example with immediate commands in Stata.

Suppose we were to perform an experiment that impacts mathematics achievement of third graders in eight large school districts, with each district having 12 schools. We plan to collect data on 30 students in each of these 96 schools. Power to detect an effect size of 0.2 is computed using the noncentrality parameter (Equation 4) by entering the following into Stata:

. scalar K = 12
. scalar J = 8
. scalar n = 30
. scalar es = .2
. scalar rho = .118
. display nFtail(1, K*(J-2), (K*J*es^2)/(4*(rho+((1-rho)/n))), invFtail(1, K*(J-2), .05))

The result gives the statistical power of a two-tailed test at the .05 level of significance as .71.
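For readers without Stata, the same calculation can be sketched in Python using only the standard library. This is an illustrative sketch, not the authors' code: the function names (`betainc`, `f_tail`, `f_critical`, `crt_power`) are hypothetical, and the noncentral F tail is computed as a Poisson mixture of central F distributions, a standard identity used here in place of Stata's nFtail(). The arguments K, J, n, es, and rho mirror the scalars in the Stata commands.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction recurrence.
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),
        ):
            d = 1.0 + aa * d
            if abs(d) < tiny:
                d = tiny
            c = 1.0 + aa / c
            if abs(c) < tiny:
                c = tiny
            d = 1.0 / d
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_tail(df1, df2, f, ncp=0.0, terms=400):
    """Upper-tail probability of an F distribution (noncentral if ncp > 0)."""
    x = df1 * f / (df1 * f + df2)
    half = ncp / 2.0
    weight = math.exp(-half)  # Poisson probability of 0 with mean ncp/2
    cdf = 0.0
    for j in range(terms):
        if j > half and weight < 1e-16:
            break  # remaining Poisson weights are negligible
        cdf += weight * betainc(df1 / 2.0 + j, df2 / 2.0, x)
        weight *= half / (j + 1)
    return 1.0 - cdf


def f_critical(df1, df2, alpha):
    """Critical value with upper-tail area alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_tail(df1, df2, hi) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_tail(df1, df2, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def crt_power(K, J, n, es, rho, alpha=0.05):
    """Power of the F test, mirroring the Stata display expression."""
    df2 = K * (J - 2)
    ncp = (K * J * es ** 2) / (4 * (rho + (1 - rho) / n))
    return f_tail(1, df2, f_critical(1, df2, alpha), ncp)


if __name__ == "__main__":
    print(round(crt_power(K=12, J=8, n=30, es=0.2, rho=0.118), 3))
```

Under these inputs the sketch should land near the .71 reported in the text; changing `rho` or `n` shows how sensitive the result is to the assumed ICC and school sample size.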

Summary

The main finding of this study is that district size matters. In some cases, employing smaller districts (and thus fewer schools) may yield better power because their ICCs are so much smaller. Although investigators rarely control which districts participate in a study, suppose we are designing a study for third-grade math and the choice is between using 4 medium districts with 10 schools each (ICC = .031) and 4 large districts with 20 schools each (ICC = .118). Holding other factors constant, and assuming a fixed effects design, the smaller districts yield slightly more power for each effect size. Of course, employing a single large district, even with a larger ICC, may have the practical benefit of requiring the recruitment of only a single education agency.
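To make this comparison concrete, the noncentrality parameter used in the Stata example can be evaluated for both designs. The worked arithmetic below is a sketch under one added assumption: n = 30 students per school, carried over from the earlier example, because the comparison itself does not state n. The symbol N (total number of schools, the product K*J in the Stata code) is introduced here only for illustration.

$$
\lambda = \frac{N\,\delta^{2}}{4\left[\rho + (1-\rho)/n\right]}
$$

$$
\begin{aligned}
\text{Medium districts } (N = 40,\ \rho = .031):\quad
\lambda &= \frac{40\,\delta^{2}}{4\,(0.031 + 0.969/30)} = \frac{40\,\delta^{2}}{0.2532} \approx 158.0\,\delta^{2} \\
\text{Large districts } (N = 80,\ \rho = .118):\quad
\lambda &= \frac{80\,\delta^{2}}{4\,(0.118 + 0.882/30)} = \frac{80\,\delta^{2}}{0.5896} \approx 135.7\,\delta^{2}
\end{aligned}
$$

At an effect size of 0.2, for example, these give noncentrality parameters of about 6.32 and 5.43, so the design with half as many schools has the larger value. The two designs also differ in denominator degrees of freedom, and the ordering can change for other values of n, so this comparison is indicative only.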

Footnotes

Authors’ Note

The opinions expressed are those of the authors and do not necessarily represent the views of the Institute or the U.S. Department of Education.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305D110032, NORC at The University of Chicago.

Notes

References

Agodini, Harris, Thomas, Murphy, & Gallagher (2010). Achievement effects of four early elementary school math curricula: Findings for first and second graders (NCEE 2011–4001). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance.

Bloom, H. S. (2005). Learning more from social experiments: Evolving analytic approaches. New York, NY: Russell Sage Foundation.

Bloom, H. S., Bos, J. M., & Lee, S.-W. (1999). Using cluster random assignment to measure program impacts: Statistical implications for the evaluation of education programs. Evaluation Review, 23, 445–469.

Bloom, H. S., Richburg-Hayes, & Black, A. R. (2007). Using covariates to improve precision for studies that randomize schools to evaluate educational interventions. Educational Evaluation and Policy Analysis, 29, 30–59.

Bloom, H. S., Zhu, Jacob, Raudenbush, Martinez, & Lin (2008). Empirical issues in the design of group-randomized studies to measure the effects of interventions for children. New York, NY: MDRC.

Borenstein, Hedges, & Rothstein (2012). CRT-Power. Teaneck, NJ: Biostat.

Borenstein, Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2011). Introduction to meta-analysis. Wiley.com.

Bottge, B. A., Grant, T. S., Stephens, A. C., & Rueda (2010). Advancing the math skills of middle school students in technology education classrooms. NASSP Bulletin, 94, 81–106.

Bradshaw, C. P., Mitchell, M. M., & Leaf, P. J. (2010). Examining the effects of schoolwide positive behavioral interventions and supports on student outcomes: Results from a randomized controlled effectiveness trial in elementary schools. Journal of Positive Behavior Interventions, 12, 133–148.

Brandon, P. R., Harrison, G. M., & Lawton, B. E. (2013). SAS code for calculating intraclass correlation coefficients and effect size benchmarks for site-randomized education experiments. American Journal of Evaluation, 34, 85–90.

Calderón, Slavin, & Sánchez (2011). Effective instruction for English learners. The Future of Children, 21, 103–127.

Fantuzzo, J. W., Gadsden, V. L., & McDermott, P. A. (2011). An integrated curriculum to improve mathematics, language, and literacy for Head Start children. American Educational Research Journal, 48, 763–793.

Fisher, R. A. (1925). Statistical methods for research workers. New Delhi, India: Genesis.

Fulmer, S. M., & Frijters, J. C. (2011). Motivation during an excessively challenging reading task: The buffering role of relative topic interest. The Journal of Experimental Education, 79, 185–208.

Gersten, Dimino, Jayanthi, Kim, J. S., & Santoro, L. E. (2010). Teacher study group: Impact of the professional development model on reading instruction and student outcomes in first grade classrooms. American Educational Research Journal, 47, 694–739.

Goodson, Wolf, Bell, Turner, Finney, P. B., & Garcia (2011). The Effectiveness of a Program to Accelerate Vocabulary Development in Kindergarten (VOCAB): First grade follow-up impact report and exploratory analyses of kindergarten impacts. Final Report (NCEE 20104014). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance.

Hamre, B. K., Pianta, R. C., Burchinal, Field, LoCasale-Crouch, Downer, J. T., … Scott-Little (2012). A course on effective teacher-child interactions: Effects on teacher beliefs, knowledge, and observed practice. American Educational Research Journal, 49, 88–123.

Hayes, R. J., & Moulton, L. H. (2009). Cluster randomised trials. London, England: CRC Press.

Hedberg, E. C. (2012). RDPOWER: Stata module to perform power calculations for random designs. Statistical Software Components S457260, Boston College Department of Economics, revised 12 Feb 2012.

Hedges, L. V., & Hedberg, E. C. (2007a). Intraclass correlation values for planning group-randomized trials in education. Educational Evaluation and Policy Analysis, 29, 60–87.

Hedges, L. V., & Hedberg, E. C. (2007b). Intraclass correlations for planning group randomized experiments in rural education. Journal of Research in Rural Education, 22, 22–10.

Hedges, L. V., & Hedberg, E. C. (2011). Variance Almanac of Academic Achievement 2013. Retrieved from https://arc.uchicago.edu/reese/variance-almanac-academic-achievement

Hedges, L. V., & Hedberg, E. C. (2014). Intraclass correlations and covariate outcome correlations for planning two and three-level cluster-randomized experiments in education. Evaluation Review, 37(6), 445–489.

Hedges, L. V., & Rhoads, C. H. (2011). Correcting an analysis of variance for clustering. British Journal of Mathematical and Statistical Psychology, 64, 20–37.

Hedges, L. V., & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3, 486.

Isenberg, Glazerman, Bleeker, Johnson, Lugo-Gil, Grider, … Britton (2009). Impacts of comprehensive teacher induction: Results from the second year of a randomized controlled study (NCEE 2009-4072). Washington, DC: U.S. Department of Education, National Center for Educational Evaluation and Regional Assistance, Institute of Education Sciences, 2008.

Keaton (2012). Documentation to the NCES Common Core of Data Public Elementary/Secondary School Universe Survey: School Year 2010–11 (NCES 2012-338rev). Washington, DC: U.S. Department of Education, National Center for Education Statistics.

Kim, J. S., Capotosto, Hartry, & Fitzgerald (2011). Can a mixed-method literacy intervention improve the reading achievement of low-performing elementary school students in an after-school program? Results from a randomized controlled trial of READ 180 enterprise. Educational Evaluation and Policy Analysis, 33, 183–201.

Lane, K. L., Harris, Graham, Driscoll, Sandmel, Morphy, … Schatschneider (2011). Self-regulated strategy development at tier 2 for second-grade students with writing and behavioral difficulties: A randomized controlled trial. Journal of Research on Educational Effectiveness, 4, 322–353.

Laura, McMeeking, Orsi, & Cobb, R. B. (2012). Effects of a teacher professional development program on the mathematics achievement of middle school students. Journal for Research in Mathematics Education, 43, 159–181.

Marley, S. C., Levin, J. R., & Glenberg, A. M. (2010). What cognitive benefits does an activity-based reading strategy afford young Native American readers? The Journal of Experimental Education, 78, 395–417.

Marley, S. C., Szabo, Levin, J. R., & Glenberg, A. M. (2011). Investigation of an activity-based text-processing strategy in mixed-age child dyads. The Journal of Experimental Education, 79, 340–360.

McQuillin, Smith, & Strait (2011). Randomized evaluation of a single semester transitional mentoring program for first year middle school students: A cautionary result for brief, school-based mentoring programs. Journal of Community Psychology, 39, 844–859.

Olson, C. B., Kim, J. S., Scarcella, Kramer, Pearson, van Dyk, D. A., … Land, R. E. (2012). Enhancing the interpretive reading and analytical writing of mainstreamed English learners in secondary school: Results from a randomized field trial using a cognitive strategies approach. American Educational Research Journal, 49, 323–355.

Phelan, Choi, Vendlinski, Baker, & Herman (2011). Differential improvement in student understanding of mathematical principles following formative assessment intervention. The Journal of Educational Research, 104, 330–339.

Raudenbush, S. W. (1997). Statistical analysis and optimal design for cluster randomized trials. Psychological Methods, 2, 173–185.

Reis, S. M., McCoach, D. B., Little, C. A., Muller, L. M., & Kaniskan, R. B. (2011). The effects of differentiated instruction and enrichment pedagogy on reading achievement in five elementary schools. American Educational Research Journal, 48, 462–501.

Rose, R. A., Woolley, M. E., Orthner, D. K., Akos, P. T., & Jones-Sanpei, H. A. (2012). Increasing teacher use of career-relevant instruction: A randomized control trial of CareerStart. Educational Evaluation and Policy Analysis, 34, 295–312.

Sarama, Clements, D. H., Wolfe, C. B., & Spitler, M. E. (2012). Longitudinal evaluation of a scale-up model for teaching mathematics with trajectories and technologies. Journal of Research on Educational Effectiveness, 5, 105–135.

Schochet, P. Z. (2008). Statistical power for random assignment evaluations of education programs. Journal of Educational and Behavioral Statistics, 33, 62–87.

Slavin, R. E., Cheung, Holmes, Madden, N. A., & Chamberlain (2012). Effects of a data-driven district reform model on state assessment outcomes. American Educational Research Journal.

Springer, M. G., Pane, J. F., V.-N., McCaffrey, D. F., Burns, S. F., Hamilton, L. S., & Stecher (2012). Team pay for performance: Experimental evidence from the Round Rock pilot project on team incentives. Educational Evaluation and Policy Analysis, 34, 367–390.

Spybrook, Puente, A. C., & Lininger (2011). An examination of the impact of changes in federal policies on the landscape of educational research in the USA. Effective Education, 3, 83–88.

Spybrook & Raudenbush, S. W. (2009). An examination of the precision and technical accuracy of the first wave of group-randomized trials funded by the Institute of Education Sciences. Educational Evaluation and Policy Analysis, 31, 298–318.

Spybrook, Raudenbush, S. W., Liu, X.-F., Congdon, & Martínez (2006). Optimal design for longitudinal and multilevel research: Documentation for the “Optimal Design” software. Ann Arbor, MI: Survey Research Center of the Institute of Social Research at University of Michigan.

VanDerHeyden, McLaughlin, Algina, & Snyder (2012). Randomized evaluation of a supplemental grade-wide mathematics intervention. American Educational Research Journal, 49, 1251–1284.

Vaughn, Klingner, J. K., Swanson, E. A., Boardman, A. G., Roberts, Mohammed, S. S., & Stillman-Spisak, S. J. (2011). Efficacy of collaborative strategic reading with middle school students. American Educational Research Journal, 48, 938–964.

Vaughn, Wexler, Roberts, Barth, A. A., Cirino, P. T., Romain, M. A., … Denton, C. A. (2011). Effects of individualized and standardized interventions on middle school students with reading disabilities. Exceptional Children, 77, 391–407.

Westine, C. D., Spybrook, & Taylor, J. A. (2014). An empirical investigation of variance design parameters for planning cluster-randomized trials of science achievement. Evaluation Review. doi:0193841X14531584

Wirkala & Kuhn (2011). Problem-based learning in K–12 education: Is it effective and how does it achieve its effects? American Educational Research Journal, 48, 1157–1186.

Wolf, Gutmann, Puma, Kisida, Rizzo, Eissa, & Carr (2010). Evaluation of the DC opportunity scholarship program: Final report (NCEE 2010-4018). Washington, DC: National Center for Education Evaluation and Regional Assistance.