Abstract
This study investigated the relationship of teachers’ reading knowledge with students’ reading achievement using a direct teacher knowledge assessment rather than indirect proxies (e.g., certification). To address the inequitable distribution of teachers’ knowledge resulting from differences in teachers’ backgrounds and the disparities in how schools attract and cultivate knowledge, the study developed multilevel propensity score methods to identify comparable teachers on the basis of both teacher and school backgrounds. Results suggest that schools are complexly associated with differences in teachers’ knowledge and that comparisons which ignore the relevance of schools may be misleading. By comparing teachers with similar personal and school backgrounds, results show measured knowledge is significantly associated with students’ achievement in reading comprehension but not word analysis. The findings support policies which leverage school capacities to develop the specialized knowledge needed for teaching reading.
In the area of reading, indices of teachers’ knowledge used to estimate teacher quality have primarily relied upon indirect measures or proxies of knowledge rather than direct assessments of knowledge about reading. For instance, research has considered qualifications such as attainment (e.g., the SAT), certification types, advanced degrees in reading-related areas, or years of teaching experience (e.g., Ballou, 1996; Darling-Hammond, 2000; Ehrenberg & Brewer, 1995; Hanushek, Kain, O’Brien, & Rivkin, 2005). Similarly, measures thought to be more proximal to such knowledge, such as the number of courses or major, have also been considered (e.g., Croninger, Rice, Rathbun & Nishio, 2003).
In general, this line of research has generated inconsistent associations between such proxies of teachers’ knowledge and students’ achievement. Although some studies have found these measures to be related to students’ achievement, the magnitude and significance of these relationships vary considerably (e.g., Croninger et al., 2003; Darling-Hammond, 2000; Goldhaber & Brewer, 2000). Collectively, such research suggests that the formation of teachers’ knowledge does not necessarily stem from a common set of experiences and that simple proxies may not adequately capture differences among teachers in their knowledge (Nye, Konstantopoulous, & Hedges, 2004).
To this end, relating students’ achievement and teachers’ knowledge using such coarse, one-size-fits-all proxies introduces substantial potential for measurement error. For example, teachers’ certifications or degrees are often used as substitutes for the (teachers’) knowledge thought to be driving students’ achievement. Yet, because teachers’ knowledge is likely brought about through heterogeneous pathways (e.g., through professional development [PD] in addition to certification or degrees), such proxies may not be indicative of high knowledge for large portions of teachers. Furthermore, because such proxy qualifications tend to be measured nominally, the relationship between students’ achievement and teachers’ knowledge may also be attenuated.
Similarly, identifying comparable teachers solely on the basis of their backgrounds implicitly assumes that knowledge stems from only individual experiences. Such an assumption disregards the role of schools and the interplay between schools and teachers. It implies that despite operating in different schools (and for different lengths of time), teachers who have similar personal backgrounds (e.g., certification, experience, etc.) will necessarily remain directly comparable on their knowledge levels. Whether the school has a direct role in cultivating this knowledge or just attracting it, this absence of school relevance would also suggest that empirically, teachers’ knowledge is equally distributed among schools.
Yet research has indicated that most indices tend to show that teachers’ quality is clustered or overrepresented in certain schools (Hirsch, 2008). For example, differing amounts and foci of PD initiated by schools may contribute to systematic differences in teachers’ knowledge and its distribution across schools. Despite the potential for clustering and interplay between teachers and schools, inquiries surrounding teachers’ knowledge have generally muted the role of schools when matching comparable teachers.
Research on teachers’ knowledge about reading has rarely considered more direct measures of their subject-specific knowledge in a way that allows for knowledge acquisition to stem from different experiences and interactions. In particular, there is good reason to suspect that more direct measures of teachers’ subject-specific and pedagogical knowledge coupled with methods that attend to the complex distribution of teachers’ knowledge among schools could provide a better index of teachers’ knowledge as it relates to students’ achievement.
Using a measure constructed to assess early reading content and situated knowledge, this study examined the extent to which teachers’ knowledge about early reading contributed to their students’ achievement in reading in Reading First (RF) classrooms. Studies of components of teacher quality in this population are particularly critical, because RF schools tended to exhibit low achievement and high poverty, which are also characteristics of schools that are most likely to be staffed by the least qualified teachers (Darling-Hammond, 2004). In addition, because literature has suggested the clustering of high-quality teachers in more affluent schools, included was a second question that examined the extent to which measured knowledge is clustered in schools (Hirsch, 2008; Weiler & Mitchell, 1992).
To address these questions, I developed methods to identify and address the uneven distribution of teachers’ reading knowledge among schools. That is, to more effectively identify similar teachers, I examined how teacher characteristics as well as the schools they serve in were associated with changes in knowledge. To understand the potential for schools to attract or augment teachers’ knowledge, I studied how schools were associated with increased teachers’ knowledge and how they modified the predictive capacity of teacher-level covariates.
Although methods such as those based on propensity scores (PSs) have been extensively developed to identify comparable individuals (e.g., teachers), little attention has been paid to identifying individuals who are comparable both in their backgrounds and in how they interact with the groups (e.g., schools) they serve in. I addressed such influence by examining three different types of interactions between schools and teachers and developed corresponding PSs. In turn, I used these methods to explore how teacher and school characteristics and choices come together to distribute and/or form teachers’ knowledge in an imbalanced way. Subsequently, I used the structure of this imbalance to identify comparable teachers on the basis of both teacher backgrounds and the backgrounds of the schools and classrooms they serve in.
Using sets of comparable teachers, I evaluated the relationship between teachers’ reading knowledge and students’ reading achievement in grade one as measured by the Iowa Test of Basic Skills (ITBS) Reading Comprehension and Word Analysis subtests. The results first indicated that teachers’ reading knowledge is clustered in and differentially associated with schools. Second, the results suggested that students with high-knowledge teachers show a small yet statistically significant gain in reading comprehension but a nonsignificant increment in word analysis. Through this research, I intended to contribute to a number of ongoing policy-relevant questions, including the relevance of teachers’ knowledge to students’ achievement, the distribution of teachers’ knowledge and quality among schools, pathways by which such knowledge may develop (e.g., coursework, experience, PD, certifications), and the schools’ roles in these pathways.
Background
Teachers’ Reading Knowledge
In studying the effects of teachers’ knowledge about reading on students’ reading acquisition, an important problem is identifying the types and extent of knowledge that teachers need to hold. This problem is particularly complex for teachers of early reading, because, unlike mathematics and science, it is difficult to identify the necessary core content of instruction. Early reading instruction focuses on students’ acquisition of processes of word reading and comprehension.
Teachers’ knowledge in this area might be thought of as incorporating both an understanding of the process of reading and the methods by which children acquire skill in this process. In content domains other than reading, measures of teachers’ knowledge have focused on pedagogical content knowledge, a construct that involves the intersection of the knowledge that teachers need to impart and the methods used to convey this knowledge to students (Shulman, 1986). Yet many prior theoretical frameworks and previous empirical studies of reading have focused solely on the extent to which teachers hold content knowledge.
Perhaps the most common approach to measuring both the pedagogical and content knowledge of early grades reading teachers has focused on teachers’ knowledge of the linguistic foundations of early reading (e.g., Moats, 1994, 1999). Here, measurement of teachers’ knowledge focuses primarily on relations between the spoken and written aspects of language, about the sound structure of words, and about related topics in grammar, morphology, orthography and semantic knowledge (e.g., Piasta, Connor, Fishman, & Morrison, 2009). In the area of early reading, it can be reasoned that this linguistic knowledge and its use is the specialized content knowledge that teachers of early reading must hold to be effective in teaching early reading.
Recent Studies of Teachers’ Knowledge
Studies investigating teachers’ reading knowledge in the early grades have focused on several types of interrelated research questions about the teaching of early elementary reading. Some studies have examined the extent to which teachers actually know about the linguistic foundations of early reading; others have investigated whether specific staff development programs can increase teachers’ knowledge in this domain; still others have asked whether increasing teachers’ linguistic knowledge leads to more emphasis on explicit instruction in phonemic awareness, phonics, or other code-related aspects of reading; and yet other studies have examined whether teachers with greater knowledge in this area have a more positive impact on students’ reading achievement (e.g., Bos, Mather, Narr, & Babur, 1999; Foorman & Moats, 2004; Garet et al., 2008; McCutchen, Abbott, et al., 2002; McCutchen, Harry, et al., 2002; Moats & Foorman, 2003; Spear-Swerling & Brucker, 2003, 2004).
Although the number of studies addressing teachers’ knowledge–related research questions has grown over the past decade, the findings from this body of research have not always been clear. Although there seems to be consistent evidence indicating that the average teacher of reading in the early grades lacks strong knowledge about the linguistic foundations of reading, little research has been able to link such levels of knowledge to students’ reading achievement (e.g., Bos, Mather, Dickson, Podhajski, & Chard, 2001; McCutchen, Abbott et al., 2002; Moats, 1994).
Of those studies that focused on the contribution of teachers’ reading knowledge to students’ reading achievement, most had methodological features that make it difficult to infer the unique teacher knowledge effects. Specifically, this line of study has tended to focus on proxies or composite measures of teachers’ knowledge and assessed effects without adjusting for salient differences. For instance, studies have approximated the effect of teachers’ knowledge on students’ achievement by proxy by assessing the effect of participation in PD on students’ achievement (Bos et al., 1999; McCutchen, Abbott, et al., 2002).
In addition, assessments of the contribution of teachers’ knowledge to students’ achievement have largely failed to take into account the co-occurrence of knowledge with other important teacher, class, and school characteristics (e.g., Moats & Foorman, 2003). That is, estimation of a teacher knowledge effect from observational data is complicated because students’ exposure to teachers with high knowledge may be confounded with other teacher, class, and school characteristics. As a result, the unique contribution of teachers’ knowledge remains intertwined with even common alternative explanations.
Inequities in Distribution
Previous studies assessing the contribution of teachers’ reading knowledge to students’ reading achievement have additionally overlooked the roles of schools and the potential clustering of high-knowledge teachers. More specifically, because literature surrounding teacher quality has suggested that the distribution of highly qualified teachers is imbalanced among schools, there is good reason to suspect that this imbalance extends to teachers’ knowledge (e.g., Darling-Hammond, 2000; Ferguson, 1997; Haycock, 1998; Hirsch, 2008). In particular, lower quality teachers (e.g., uncertified, scored poorly on college and licensure exams) tend to cluster or be overrepresented in low-income schools (e.g., Jerald, 2002; Krei, 1998). Such inequities in teacher quality persist beyond financial resources as well. For example, economically disadvantaged White children tend to retain a higher likelihood of having well-qualified teachers compared with Black children with similar economic disadvantage (e.g., Kain & Singleton, 1996; Langford, Loeb, & Wyckoff, 2002). Similar findings have appeared with teachers’ content knowledge as well. Black and Latino children are far more likely to be taught by teachers with low content knowledge (e.g., scored poorly on teacher standardized tests, not certified in subject area) than their White peers (Ingersoll, 2001; Kain & Singleton, 1996).
Such disparities among schools in the quality of their teachers are frequently influenced by school-level factors (Hirsch, 2008; Leithwood, 2006). For instance, because more resourceful schools tend to hire teachers far earlier than hard-to-staff schools, hard-to-staff schools tend to fill significantly higher percentages of their positions with (the remaining) underqualified teachers (e.g., lower grade point averages, less experienced; Levin & Quinn, 2003). Furthermore, this distributional gap between more and less affluent schools likely persists even after hiring, as the climates cultivated by schools may provide avenues for schools to modify teachers’ capacities (e.g., Leithwood, 2006; Newman & Wehlage, 1995). For instance, there is a growing body of evidence that teachers can be directly taught linguistic knowledge through PD (e.g., Foorman & Moats, 2004; Garet et al., 2008). Because schools generally choose the depth and scope of the PD for all of their respective teachers, PD provides a systematic and sustained avenue for schools to modify their teachers’ knowledge. Similarly, the collective capacity of schools to, for example, cultivate collaborative communities and channel staff efforts provides another persistent avenue for schools to influence their respective teachers (Newman & Wehlage, 1995).
Collectively, the initial and sustained differences across schools (e.g., hiring, PD, collaboration) that are shared among teachers from the same school are strong reasons to suspect both that teacher clustering exists and that there are school-based avenues to systematically support knowledge augmentation. If these differences take root, their sum may produce coordinated and sustained disadvantages for both teachers and students at underserved schools. Such disadvantages contain important policy implications concerning schools’ roles in shaping and improving teachers’ knowledge and quality. The reach of such differences should extend to how we identify comparable sets of teachers in evaluating the relationship between this knowledge and students’ achievement. That is, no longer can we assume that two teachers with similar personal backgrounds are necessarily comparable. Rather, because schools are likely to be complexly associated with changes in teachers’ knowledge, the comparability of teachers should be based on both teachers’ backgrounds and the schools they serve in.
Current Study
The current study was designed to address limitations of previous studies, the goal being to assess the effect of teachers’ reading knowledge using a direct measure of relevant knowledge and methods that attend to the complex distribution of teachers’ knowledge. To address these limitations, I first used a measure designed to draw on knowledge about reading that early elementary teachers use as they teach children to read words and comprehend texts. Questions centered on situations that teachers might encounter when teaching reading in the early elementary years (Carlisle, Johnson, Phelps, & Rowan, 2008). Second, to address complex differences among teachers, I attempted to identify and match similar sets of teachers. Contrasting teachers and classrooms that are similar in distribution with respect to observed and unobserved characteristics is necessary to eliminate confounding bias, whereas blocking on similar students who were exposed to different levels of teachers’ knowledge is generally insufficient and unhelpful (e.g., Raudenbush, 2003). 1
To identify similar teachers, I first sought to understand the extent to which teacher and school factors might be associated with higher levels of teacher knowledge. In particular, I hypothesized that schools may be associated with teachers’ knowledge in complex and varying ways such that the predictive capacities of covariates vary across schools. For instance, more resourceful schools may be associated with increased teachers’ knowledge, and furthermore, these schools might provide a highly collaborative environment that, in turn, attenuates (i.e., makes less relevant) the predictive capacity of teacher-level covariates. In other words, what is indicative of a high-knowledge teacher at one school may be very different from what is indicative of one at another school.
Methods
Sample
The data for this study were derived from the RF program in Michigan. This program provides funding to support improvement of reading instruction in kindergarten through Grade 3 in school districts with high levels of poverty and underachievement in reading. Of the state RF population, approximately half of Grade 1 teachers volunteered to allow researchers to investigate the effects of teachers’ knowledge on students’ reading achievement and had sufficient student data. Collectively, this sample consisted of 298 teachers who instructed 5,746 students and were nested in 139 schools. 2 Although the full population of Michigan RF teachers could not be used in the presentation of this study, the data were available for both the population and research sample. This allowed comparisons of the characteristics of the two groups to determine the extent to which the volunteer sample differed from the larger population of teachers. On nearly all measures, the two groups were minimally different, and the differences were not significant. The one major exception was the measure of teachers’ knowledge, on which the volunteers scored significantly higher than the full population. As a result, the ability to generalize to RF teachers in Michigan is very limited. Moreover, because this study focused on RF teachers, who likely differ substantially from non-RF teachers, inferences are further limited and are focused only on volunteer Michigan RF teachers. Tables 1 and 2 describe and compare the characteristics of those who volunteered for research in the current study and the general state RF population.
Table 1. Selected Student Characteristics of State Population and Analytic Sample
Note. DIBELS = Dynamic Indicators of Basic Early Literacy Skills; NWF = Nonsense Word Fluency; RC = Reading Comprehension; WA = Word Analysis.
Table 2. Selected Teacher Characteristics of State Population and Analytic Sample
Note. RF = Reading First.
Measures
The primary source of data concerning teachers was a self-administered questionnaire focusing first on teacher experience, certification, education and the study’s measure of teachers’ knowledge, Teachers’ Knowledge of Reading and Reading Practices (TKRRP; Carlisle et al., 2008). This questionnaire also included a measure of professional training, because teachers who were in the program the previous year (84%) received extensive and variable PD as part of the RF program.
TKRRP was developed as part of the evaluation of RF in Michigan and was not based on the contents of any particular program of PD. Rather, the measure centered on experts’ judgments of the knowledge that teachers of early reading needed to know to be effective. On the basis of prior research, items tapping such knowledge were mapped along two primary dimensions: knowledge relevant to the teaching of word reading and knowledge relevant to the teaching of reading comprehension (Garet et al., 2008; Phelps & Schilling, 2004). Furthermore, drawing on insights from fields beyond reading, construction focused on measuring knowledge of pedagogy and student learning such that knowledge was situated in classroom practices (Hill, Rowan, & Ball, 2005). As a result, items were designed around classroom and student scenarios that early elementary teachers encounter. Specifically, items focused on oral language, reading, and writing activities in the domains of word reading (e.g., phonemic awareness, letter sound relationships, spelling patterns) and comprehension (e.g., morphology, text analysis, fluency). The measure consisted of 22 items and was centered primarily on knowledge important to Grade 1 instruction (Carlisle et al., 2008; Phelps & Schilling, 2004).
Student-level data were derived from the state student database and focused on student characteristics and backgrounds. The outcome measures for this study were the ITBS standardized subtests concerning word analysis and reading comprehension, published by Riverside Publishing. The Word Analysis subtest involves identifying and matching sounds and spelling elements of words. The Reading Comprehension subtest involves selecting responses to questions that follow short passages. The measure of students’ performance was the developmental standard score (Hoover, Dunbar, Frisbee, et al., 2003). As reported by Riverside, the reliability (computed using Kuder-Richardson Formula 20) for Grade 1 is .85 for Word Analysis and .91 for Reading Comprehension.
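As a reminder of what a Kuder-Richardson Formula 20 coefficient summarizes, the following is a minimal sketch on hypothetical dichotomous (0/1) item responses; it does not use ITBS data and is not the publisher's procedure. The function name `kr20` and the use of the population variance of total scores are assumptions made for illustration.

```python
def kr20(responses):
    """KR-20 reliability for dichotomous items.

    responses: list of per-student lists of 0/1 item scores (hypothetical data).
    Assumes the population variance (divide by n) for total scores.
    """
    n = len(responses)       # number of students
    k = len(responses[0])    # number of items
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum of item variances p * (1 - p), where p is the proportion correct.
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses from four students to three items.
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(example))
```

Values closer to 1 indicate that the items rank students more consistently.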
Because students were not assessed using the ITBS measures in kindergarten, my measure of prior achievement was the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) assessment at the beginning of first grade. DIBELS is a set of fluency measures of early reading skills used to assess elementary students’ progress in reading. The Grade 1 DIBELS measure, Nonsense Word Fluency (NWF), entails decoding two- or three-letter nonsense words on a printed page; credit is given for the number of letters correctly decoded in 1 minute (Assessment Committee, 2002). The median alternate-form reliability of the DIBELS NWF measure for first graders was .83, whereas the concurrent validity for those who had also taken the Woodcock-Johnson Readiness had a median of .51 (Assessment Committee, 2002). Finally, beyond aggregates, measures of school characteristics were drawn from the Michigan Department of Education website.
Analytic Approach
Applying Rubin’s causal model (e.g., Holland, 1986) to study the effect of increasing doses of knowledge, define the set of potential outcomes for each intact classroom as the classroom’s potential achievement after being exposed to each dose or level of knowledge. That is, if we allow Z to be the set of potential knowledge levels and Y to index the achievement for classroom j in response to knowledge level z, then the set of potential outcomes for classroom j is $\{Y_j(z) : z \in Z\}$.
To identify this effect, reliance on two standard assumptions of Rubin’s causal model is needed. First, we take up the stable-unit-treatment-value assumption as applied to multilevel settings such that we assume potential responses are relatively stable given intact classrooms (Hong & Raudenbush, 2006; Rubin, 1986). Under this formulation, results generalize to a population of existing first grade classrooms similar to those that volunteered.
Second, we assume that all confounders, at all levels, have been measured such that pairwise independence of knowledge levels with each of the potential outcomes is preserved (i.e., weak unconfoundedness; e.g., Imbens, 2000). That is, identification of the average effect here only requires weak unconfoundedness rather than independence of the complete set of potential outcomes and knowledge levels (i.e., strong unconfoundedness). Formally, if we let $I_j(z)$ for class $j$ be defined as $I_j(z) = 1$ if $Z_j = z$ and 0 otherwise and $X$ be the pretreatment confounders such that

$$I_j(z) \perp Y_j(z) \mid X_j \quad \text{for each knowledge level } z,$$

then the expected value of $Y(z)$ adjusting for $X$ is (e.g., Imbens, 2000):

$$E[Y(z) \mid X] = E[Y(z) \mid I(z) = 1, X] = E[Y \mid Z = z, X].$$

As a result, the average outcomes are estimated by averaging these conditional means,

$$E[Y(z)] = E_X\{E[Y \mid Z = z, X]\},$$

with the average difference between two levels 1 standard deviation apart,

$$E[Y(z + \mathrm{SD}_Z)] - E[Y(z)],$$

representing the estimand.
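The step of averaging conditional means can be illustrated with a toy calculation. The sketch below assumes a single discrete confounder and only two knowledge levels, which is far simpler than the continuous knowledge measure and multilevel covariates used in the study; the data and the function name `average_potential_outcome` are hypothetical.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def average_potential_outcome(records, level):
    """Estimate E[Y(level)] by averaging E[Y | Z = level, X = x] over the
    marginal distribution of X. records holds (x, z, y) tuples."""
    records = list(records)
    outcomes_by_x = defaultdict(list)  # outcomes observed at Z == level, per stratum
    count_by_x = defaultdict(int)      # stratum sizes in the full sample
    for x, z, y in records:
        count_by_x[x] += 1
        if z == level:
            outcomes_by_x[x].append(y)
    total = 0.0
    for x, count in count_by_x.items():
        if not outcomes_by_x[x]:
            raise ValueError(f"no classrooms at level {level!r} in stratum {x!r}")
        total += (count / len(records)) * mean(outcomes_by_x[x])
    return total


# Hypothetical classrooms: (confounder stratum x, knowledge level z, achievement y).
classrooms = [(0, 0, 10.0), (0, 0, 12.0), (0, 1, 14.0),
              (1, 0, 20.0), (1, 1, 24.0), (1, 1, 26.0)]

adjusted = (average_potential_outcome(classrooms, 1)
            - average_potential_outcome(classrooms, 0))
naive = (mean([y for _, z, y in classrooms if z == 1])
         - mean([y for _, z, y in classrooms if z == 0]))
print(adjusted, naive)  # adjusted difference is 4.0; naive difference is about 7.33
```

In this toy data the higher knowledge level is more common in the higher-achieving stratum, so the unadjusted contrast overstates the difference that remains after conditioning on the confounder.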
One increasingly common approach to adjudicate confounding explanations in high-dimensional observational data is to construct a balancing score on the basis of the PS (Rosenbaum & Rubin, 1983). PS methods adjust for confounding by using a model of teachers’ reading knowledge to identify groups of similar teachers on the basis of their likelihoods for knowledge given observed covariates. Literature has shown that conducting analyses using PS subsets can improve estimation and increase robustness to a number of standard assumptions (e.g., Imai & Van Dyk, 2004). Literature has developed multiple methods to infer such treatment likelihoods from the observed data to identify comparable units or teachers (e.g., Lee, Lessler, & Stuart, 2009; McCaffrey, Ridgeway, & Morral, 2004). Furthermore, the PS has been extended in a number of ways to address, for example, categorical, ordinal, and continuous treatments (e.g., Imai & Van Dyk, 2004; Imbens, 2000; Joffe & Rosenbaum, 1999).
Of relevance in the current study is the generalized PS or the propensity function (PF) (Imai & Van Dyk, 2004). The PF is the conditional probability of the observed level of knowledge given the appropriate covariates. Allow the PF to be denoted as

$$e(z \mid X) = p(Z = z \mid X)$$

with associated parametric model

$$e_\psi(z \mid X) = p_\psi(Z = z \mid X),$$

where $\psi$ parameterizes this distribution, and $Z$ represents teachers’ knowledge with observed level $z$ and covariates $X$ (Imai & Van Dyk, 2004). Furthermore, this framework presumes a unique parameter, $\theta$, such that the PF depends on the covariates only through some function $\theta_\psi(X)$. More concretely, if the conditional distribution of teachers’ knowledge given the covariates is correctly described by some function, then the result is a scalar balancing score.
When the distribution of $Z$ is continuous and approximately normal, literature has considered the Gaussian density function. We can describe parameters of the distribution with the mean and variance [$\psi = (\beta, \sigma^2)$] and use the ordinary least squares estimator [$\theta_\psi(X) = X^{T}\beta$] to estimate the PF. If the conditional independence of the potential outcomes and treatment assignment given the measured confounders holds, this implies the same conditional independence given the PF (Imai & Van Dyk, 2004). As a result, matching or stratifying on this score, for example, identifies teachers with similar knowledge likelihoods and backgrounds to alleviate measured bias.
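A minimal single-level sketch of this idea follows: the linear predictor from an ordinary least squares fit serves as the scalar balancing score, and teachers are then grouped into quantile strata on that score. The covariates (years of experience, hours of PD), the simulated data, the choice of five strata, and the function names are all assumptions made for illustration; this is not the study's multilevel specification.

```python
import random
import statistics


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_gaussian_pf(covariates, knowledge):
    """OLS fit of knowledge on covariates (intercept added).

    Returns (beta, sigma2, theta), where theta[i] is the linear predictor
    for unit i and plays the role of the scalar balancing score."""
    rows = [[1.0] + list(x) for x in covariates]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xtz = [sum(r[a] * z for r, z in zip(rows, knowledge)) for a in range(p)]
    beta = solve_linear_system(xtx, xtz)
    theta = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    resid = [z - t for z, t in zip(knowledge, theta)]
    sigma2 = sum(e * e for e in resid) / (len(rows) - p)
    return beta, sigma2, theta


def stratify(theta, n_strata=5):
    """Label each unit with its quantile stratum (0 .. n_strata - 1)."""
    cuts = statistics.quantiles(theta, n=n_strata)
    return [sum(t > c for c in cuts) for t in theta]


random.seed(0)
# Hypothetical covariates: years of experience and hours of PD.
covariates = [(random.uniform(0, 30), random.uniform(0, 60)) for _ in range(200)]
knowledge = [0.02 * exp + 0.01 * pd_hours + random.gauss(0, 1)
             for exp, pd_hours in covariates]
beta, sigma2, theta = fit_gaussian_pf(covariates, knowledge)
strata = stratify(theta, n_strata=5)
print(beta, sigma2, [strata.count(s) for s in range(5)])
```

Comparisons of achievement across knowledge levels would then be made within strata, where teachers have similar predicted knowledge given their observed backgrounds.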
Despite such extensions, such literature has been relatively scarce concerning those structures common to educational studies. Specifically, the multilevel structure of educational data presents properties atypical to common applications of the PS. For example, in the current context, there is likely a complex interplay between teachers and schools that potentially confounds interpretations of the association between knowledge and students’ achievement. In other words, because schools may change teachers’ likelihood to have high knowledge, comparing teachers on the sole basis of their personal backgrounds using a single-level PS may not fully address alternative explanations. The importance of addressing such interplay and the extent to which processes differ across schools have been demonstrated in several articles examining, for example, the effect of dropping out of high school and the effect of kindergarten retention (e.g., Hong & Raudenbush, 2006; Hong & Yu, 2007; Rosenbaum, 1986).
Two principal PS approaches have been proposed in this setting. The first makes use of a canonical single-level PS and restricts comparisons to individuals within the same school (Rosenbaum, 1986). This approach addresses the varying influence of a school by blocking on school membership and thus on observed and unobserved school covariates. Use of single-level PSs that constrain comparisons to teachers within schools ostensibly alleviates residual bias from measured and unmeasured school-level factors. Such protection is often paired with at least two difficulties. In particular, restricting comparisons to within schools generally requires a large reservoir of comparable teachers within each school to ensure high-quality matches. This issue is especially germane in teacher quality studies, because most schools maintain a very limited number of teachers within a grade, potentially resulting in poor matching options.
The second difficulty is that the removal of school-level bias from omitted variables by matching within schools using a single-level PS is heavily dependent on how knowledge acquisition processes differ among schools. In particular, removal of school-level omitted variable bias is dependent on the extent to which the predictive capacities of teacher variables differ among schools. If only the average level of knowledge varies among schools (e.g., random intercept only) then matching within schools can be an effective strategy for eliminating bias from omitted school-level confounders. If additionally the magnitudes of teacher variable coefficients vary across schools (e.g., random intercept and slopes), then matching within schools on a single-level PS is ineffective (Kim & Seltzer, 2007). More specifically, such coefficient variation across schools indicates that the knowledge likelihood ought to be represented by a different PS equation for each school rather than a uniform PS equation for all schools. As a result, within-school matches based on a uniform equation across all schools can misidentify teachers’ likelihoods for knowledge. Theoretically, this limitation can be overcome by constructing individual PSs for each school. However, given small within-school teacher sample sizes, constructing such PSs would require that, at best, only a few covariates contribute to this likelihood.
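A deliberately extreme toy case shows why a uniform PS equation can misidentify knowledge likelihoods when slopes vary across schools. In the sketch below, two hypothetical schools have opposite slopes on a single teacher covariate; the data, school labels, and the helper `simple_ols` are assumptions for illustration only.

```python
def simple_ols(x, z):
    """Return (intercept, slope) of the least squares line z = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope


# Hypothetical schools: x is a teacher covariate, z is measured knowledge.
schools = {
    "A": {"x": [0, 1, 2, 3], "z": [0, 1, 2, 3]},  # knowledge rises with x
    "B": {"x": [0, 1, 2, 3], "z": [3, 2, 1, 0]},  # knowledge falls with x
}

pooled_x = [v for s in schools.values() for v in s["x"]]
pooled_z = [v for s in schools.values() for v in s["z"]]
pooled = simple_ols(pooled_x, pooled_z)
by_school = {name: simple_ols(s["x"], s["z"]) for name, s in schools.items()}

print("pooled (intercept, slope):", pooled)
print("school-specific (intercept, slope):", by_school)
```

Here the pooled equation has a slope of zero and assigns every teacher the same predicted knowledge, so any two teachers within a school look comparable even though the school-specific equations predict very different knowledge for them.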
The second approach makes use of multilevel models and potentially permits matches across schools (Hong & Raudenbush, 2006; Kim & Seltzer, 2007). This approach explicitly models how knowledge acquisition differs among schools. Because it takes into account both school membership and how the predictive capacities of covariates vary across schools, resulting PSs are more comparable across school boundaries. As a result, the reservoir of potential matches is much larger and often leads to higher quality matches.
Such advantages, though, come with assumptions of their own. First, because multilevel PSs permit matches across schools, they assume that all relevant school-level covariates have been measured and modeled. Second, they constrain estimation of effect size variability. Specifically, whereas matching within a school allows estimation of both the overall average effect of teachers’ knowledge and the variability of this effect across schools, matches that traverse school boundaries prevent the estimation of such across-school variability without large assumptions and extrapolations.
Estimation of Multilevel PSs
Driven by the hypothesis that schools are complexly and variably associated with teachers’ knowledge and the impracticality of matching teachers within schools, I explored if and how schools were associated with changes in teachers’ knowledge using single and multilevel models to estimate the PF. I examined three different generic paths in which schools may be associated with changes in teachers’ knowledge and discuss their implications in subsequent sections. The first, a single-level PF, is a canonical single-level model that specifies knowledge acquisition as a function of teacher and classroom (student-aggregated) covariates only:
where TK indicates the knowledge of teacher j in school k, X and β represent teacher and classroom covariates and coefficients, and u has a normal distribution with variance σ². By ignoring schools, this path assumes that schools offer no information concerning their teachers’ levels of knowledge and that teachers’ levels of knowledge within a school are conditionally independent given only their personal and classroom characteristics.
Second, I considered a simple multilevel model with a random intercept (e.g., Hong & Raudenbush, 2006). This approach considers the influence of the teacher or classroom and the school, and in hierarchical form it is
where r0 is a single random effect for each school such that r0 ~ N(0, τ), W represent school covariates, and γ represent the corresponding school-level coefficients. Here, the prediction of teachers’ knowledge is improved upon by allowing school covariates (W) and membership (r0) to be associated with knowledge. Under this particular path, it would be assumed that schools are associated with a shift in knowledge but that the predictive nature of teacher and classroom covariates is constant across schools and each school has a constant and uniform influence on all of its teachers.
The third, a complex multilevel PS model, used random intercept and slopes. In hierarchical model form, we can specify this as
where r are now the multiple random (slope and intercept) school effects with r ~ MVN(0, τ), with the covariance matrix, τ, unstructured and empirically estimated to allow correlations across terms. In other words, this model considers teachers’ knowledge to be composed of teacher and classroom associations (X), a common school association (W, r0), a differential association on subgroups of teachers that is common across all schools (cross-level interactions), and a school-specific association on teacher subgroups (Xr). Compared with the simple multilevel PF, the prediction of teachers’ knowledge is improved upon by additionally allowing cross-level interactions and the predictive capacities of teacher and classroom covariates to vary across schools.
Whereas the single-level model assumes no variation at the school level, both multilevel mechanisms engage schools in the prediction process. Still, they do so in different and increasingly complex ways. A simple random intercept–only approach implies that only the average level of knowledge varies across schools, whereas the more complex random intercept and slopes approach implies that both the average knowledge level and the magnitudes of teacher slopes vary across schools.
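For reference, the three specifications can be written in a common notation. The display below is a sketch reconstructed from the verbal definitions above, not the original typeset equations: the subscripts, the intercept terms (γ00, γp0), and the index p over teacher covariates with random slopes are assumptions introduced for illustration.

```latex
% Single-level PF (what the text calls equation 7)
TK_{jk} = X_{jk}\beta + u_{jk}, \qquad u_{jk} \sim N(0, \sigma^2)

% Simple multilevel PF with a random intercept (equation 8)
TK_{jk} = \beta_{0k} + X_{jk}\beta + u_{jk}, \qquad
\beta_{0k} = \gamma_{00} + W_k\gamma + r_{0k}, \qquad r_{0k} \sim N(0, \tau)

% Complex multilevel PF with random intercept and slopes (equation 9)
TK_{jk} = \beta_{0k} + \sum_{p} X_{pjk}\beta_{pk} + u_{jk}, \qquad
\beta_{0k} = \gamma_{00} + W_k\gamma_{0} + r_{0k}, \qquad
\beta_{pk} = \gamma_{p0} + W_k\gamma_{p} + r_{pk}, \qquad
r_k = (r_{0k}, r_{1k}, \ldots)^{\top} \sim MVN(0, \tau)
```

In this reading, setting every slope effect r_pk and cross-level term W_kγ_p to zero recovers the simple multilevel PF, and additionally dropping the school-level terms W_kγ and r_0k recovers the single-level PF.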
For a simple illustration, consider the following hypothetical scenario. Assume that schools choose their teachers on the basis of perceived teacher quality and that high-knowledge teachers tend to be clustered in more resourceful schools. Next, assume that teachers with master’s degrees tend to have higher knowledge. Then, assume that some schools offer PD to all of their teachers, whereas other schools offer PD only to teachers without master’s degrees (and assume that such PD increases teachers’ knowledge). In predicting how backgrounds and experiences predispose teachers to differing levels of knowledge, first, those teachers with master’s degrees will generally have higher knowledge (e.g., properties of a single-level PS). Second, regardless of personal background (e.g., regardless of master’s degree status), all those teachers who serve in schools that offer PD to all teachers will receive an additional shift in their propensities to have high knowledge (e.g., properties of a simple multilevel PS). Finally, a further shift in propensities will be realized only by those teachers without master’s degrees who serve in schools that offer PD only to non–master’s degree teachers (e.g., properties of a complex multilevel PS). In other words, the final shift affects only a specific subgroup of teachers. Ignoring such differences may misidentify teachers as similar despite remaining systematic differences (e.g., more PD opportunity) that confound interpretation.
Because the observational data concerning teachers’ knowledge are cross-sectional, they were inadequate for providing rigorous evidence toward which of these paths of knowledge acquisition are most likely. Rather, the data in conjunction with these PFs provided complex descriptions of variable relationships and were suggestive of potential pathways and policies. Furthermore, the aim of the PS analyses was to identify similar teachers on the basis of their current knowledge likelihood rather than to identify causal factors surrounding knowledge acquisition.
To this end, I assessed the importance of these complex associations and identified similar teachers by empirically assessing which path most closely aligns with the observed data. The proposed PS models are nested: the simple multilevel model (equation 8) constrains the random slope effects of equation 9 to zero, whereas the single-level model (equation 7) constrains all random effects. Using maximum likelihood estimators, the difference in deviances between a full model and one that restricts certain parameters should have an approximately χ² distribution with degrees of freedom equal to the difference in the number of parameters (e.g., Pinheiro & Bates, 2000; Raudenbush & Bryk, 2002). Such comparisons help suggest which PS model best describes knowledge’s complex relationships and which PS model best approximates the conditional independence needed for inference.
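The deviance comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the function names and the deviance values are hypothetical. The 9 degrees of freedom assume that three random slopes are added with an unstructured covariance matrix, so that τ grows from 1 to 10 free parameters. The chi-square reference distribution is also only approximate, and it tends to be conservative when variance components are tested at the boundary of zero.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) times a finite sum of y**i / i!.
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from the df = 1 case and add one term per extra pair of df.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def deviance_test(deviance_restricted, deviance_full, extra_params):
    """Return the deviance difference and its approximate chi-square p value."""
    diff = deviance_restricted - deviance_full
    return diff, chi2_sf(diff, extra_params)


# Hypothetical deviances for a random-intercept model and a model that adds
# three random slopes with an unstructured covariance matrix (9 extra parameters).
diff, p_value = deviance_test(1520.4, 1495.1, 9)
print(f"deviance difference = {diff:.1f}, approximate p = {p_value:.4f}")
```

A small p value favors the fuller model, which is the sense in which the comparison indicates whether random slopes are needed.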
Statistical Models of Reading Outcomes
To estimate the conditional association between teachers’ reading knowledge and students’ reading achievement (i.e., the knowledge effect), I combined stratification on the estimated PSs with regression adjustment using linear mixed models with the ITBS Word Analysis and Reading Comprehension subtests as outcomes (e.g., Hirano & Imbens, 2002; Hong & Raudenbush, 2005). More specifically, to identify and contrast similar teachers, I constructed five strata using the quintiles of the estimated PS and used every available teacher or class and school covariate to adjust for remaining differences (e.g., Cochran, 1968). Covariance adjustment while stratifying on the PS has been shown to have a number of advantages, including further reduction in bias and variance (e.g., Imai & Van Dyk, 2004; Robins & Rotnitzky, 2001). Furthermore, to increase precision, I considered seven student covariates that are historically related to reading achievement: age, NWF prior achievement, gender, eligibility for free or reduced-price lunch, disability, limited English proficiency, and race. The model in mixed form was
where Y is the outcome of student i in classroom j for school k, the coefficient on TK is the association of knowledge and achievement, TK is the level of teacher knowledge, A are the student covariates with corresponding coefficients π, S represent the PS strata with β as coefficients, X represent the teacher-level covariates with β as coefficients, W represent the school-level covariates with γ as coefficients, and u, r, and ε are the random effects for the classroom, school, and student error.
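One way to write a mixed model consistent with these definitions is sketched below. It is a reconstruction under stated assumptions, not the original display: the symbol δ for the knowledge coefficient, the intercept γ0, the subscripts that separate the strata coefficients (β_S) from the teacher covariate coefficients (β_X), and the variance symbols for the random effects are all introduced here for illustration.

```latex
Y_{ijk} = \gamma_{0} + \delta\, TK_{jk} + A_{ijk}\pi + S_{jk}\beta_{S}
          + X_{jk}\beta_{X} + W_{k}\gamma + r_{k} + u_{jk} + \varepsilon_{ijk},
\qquad
r_{k} \sim N(0, \tau_{r}), \quad
u_{jk} \sim N(0, \tau_{u}), \quad
\varepsilon_{ijk} \sim N(0, \sigma^{2})
```

Here δ carries the conditional association of teachers’ knowledge with achievement after adjusting for the PS strata and the student, teacher, and school covariates.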
Results
The results are presented in four sections, and interpretation and implications are limited to the discussion section. The first section summarizes the psychometric properties of the TKRRP measure and its distribution among teachers and schools. The subsequent section focuses on the estimation of teacher propensities and the covariate balance achieved using these propensities. The third section describes the results of the model-based estimates of the teachers’ knowledge association with achievement, while the fourth section explicates the sensitivity of such estimates to unmeasured variables, PF specifications, and outcome specifications.
Teachers’ Knowledge and Distribution
Psychometric analyses suggested the measure was unidimensional despite the original intent to cover two dimensions (Carlisle et al., 2008). Using a one-parameter item response theory (IRT) model, teachers’ knowledge scores were constructed, resulting in .76 IRT reliability with a test information maximum well below average (Figure 1; e.g., Mislevy & Bock, 1997). 3 As is evident from the figure, the measure produced indices of knowledge that primarily differentiate among lower knowledge teachers. Such results suggest that although the measure had a fairly reliable signal for teachers between, say, −2.5 and 0.5 standard deviations, there may be substantial noise in the scores that fall outside this interval. Review of the scores indicated that approximately 60% of teachers fell within this range, with the sample mean falling at 0.27.

Figure 1. Teachers’ knowledge test information curve and standard error.
To further examine the distribution of teachers’ knowledge, I partitioned its variation across schools using the unconditional version of equation 8. The partitioning demonstrated significant clustering of teachers’ knowledge (Table 3). In particular, approximately 27% of the variation in teachers’ knowledge can be attributed to the school level. Figure 2 displays a graphical representation of this clustering as teachers of the same school tend to be more similar to each other than to teachers from other schools. Figure 3 displays the distribution of the teachers’ knowledge at the school level (school averages). Evident of significant clustering, teacher knowledge scores aggregated to the school level ranged from below −2 to beyond 2.
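The share of variation attributed to schools corresponds to the usual intraclass correlation for a random intercept model. In the expression below, τ is the school-level variance and σ² the teacher-level variance, as defined for the PFs above; the symbol ρ is introduced here for illustration.

```latex
\rho = \frac{\tau}{\tau + \sigma^{2}} \approx 0.27
\quad\Longrightarrow\quad
\frac{\tau}{\sigma^{2}} = \frac{\rho}{1 - \rho} \approx \frac{0.27}{0.73} \approx 0.37
```

That is, a 27% share implies a between-school variance a little over one third the size of the within-school variance among teachers.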
Table 3. Variance Components for Teachers’ Reading Knowledge
Note. Unconditional teacher and school components refer to the unconditional random intercept–only model using equation 9, whereas the variance components for race, undergraduate degree, and reading certification add only those covariates and corresponding random effects.

Figure 2. Box plot of teachers’ reading knowledge by school membership for a sample of schools.

Figure 3. Histogram of the distribution of teachers’ knowledge at the school level (school averages).
The data did not support a causal interpretation of how teachers acquired knowledge or of the role schools played. However, together with the multilevel PFs, they did expose relationships that are suggestive of possible pathways. Below, I briefly describe the scope and magnitude of these pathways in terms of standardized mean differences and leave discussion to later.
Incremental comparisons of lower and higher knowledge teachers demonstrated that they tended to differ on most teacher characteristics (Table 4). Teachers’ knowledge tended to be higher among those with several academic qualifications. For instance, knowledge was higher among teachers who had standard certification, had reading certification, had more approved professional trainings, were RF veterans, and had master’s degrees in literacy education. Of these, master’s degrees in literacy education and reading certification demonstrated the largest differences. Specifically, these qualifications were associated with increases of 0.50 and 0.80 standard deviations in teachers’ knowledge, respectively. With comparable magnitudes, the distribution of teachers’ knowledge was additionally imbalanced along several demographic markers, such as race. Each additional approved professional training (M = 3.7, SD = 1.7) was associated with a 0.16 standard deviation increase in teachers’ knowledge.
Table 4. Mean Differences in Teachers’ Knowledge Across Selected Teacher, Class, and School Characteristics
Note. DIBELS = Dynamic Indicators of Basic Early Literacy Skills; NWF = Nonsense Word Fluency; RF = Reading First.
p < .10.
Levels of teachers’ knowledge were also discordant across multiple classroom characteristics (Table 4). Collective student prior achievement was significantly associated with teachers’ knowledge. For each standard deviation increase in classrooms’ prior achievement, there was an associated 0.19 increment in teachers’ knowledge. That is, classrooms with higher prior achievement had teachers with higher knowledge. Parallel disadvantages appeared for classrooms with high eligibility for free or reduced-price lunch and high percentages of African American students, the magnitudes of which were quite large (Table 4). Together, such disparities suggest that historically disadvantaged students are served by lower knowledge teachers.
School characteristics analogous to those of classrooms and teachers were similarly misaligned along teachers’ knowledge (Table 4). Empirically, several significant knowledge imbalances among schools leave room for both baseline and sustained school factors to partially explain teacher knowledge disparities among schools. Take as a possible example of baseline differences schools with higher versus lower levels of prior achievement. Schools with higher prior student achievement maintained, on average, teachers with higher levels of knowledge. Of note, the magnitude of such differences (0.16) is similar to that of classroom prior achievement (0.19). This suggests that in this sample, higher performing schools are more likely to retain high-knowledge teachers and that higher performing classrooms are likely to be staffed by higher knowledge teachers. The school-level selection persists even when controlling for classroom prior achievement. Specifically, a 1 standard deviation increase in average school prior achievement while controlling for classroom prior achievement is still significantly associated with a 0.10 increase in teachers’ knowledge.
Evidence that such disparities potentially existed beyond prior ability was also found. Assessments indicated that the schoolwide proportion of students eligible for free or reduced-price lunch was associated with large downward shifts (Table 4) in knowledge. Consequently, schools serving students with fewer family economic resources tend to retain teachers with lower levels of knowledge. Apart from family resources, schools with high proportions of African American students also retained teachers with significantly and substantially lower levels of knowledge (0.62). This disparity endured even after controlling for prior achievement and free or reduced-price lunch eligibility (0.38, p < .05).
The potential for school-based pathways to widen this gap was also evident. In particular, the collective PD experience of teachers at a school was associated with higher levels of knowledge, the magnitude of which, interestingly, was larger than the teacher-level analogue (Table 4). Furthermore, the discrepancy remained even after controlling for teachers’ individual professional training (0.12, p < .05). Such relations suggested that schools might be associated with the augmentation of teachers’ knowledge.
PS Estimation
Under the cautious assumption that the observed variables adequately describe the likelihood of having high knowledge such that teachers with similar PSs do not systematically differ, I estimated each teacher’s propensity to have high knowledge by constructing each of the three increasingly complex PFs described above. Examination of the comparative fit of the PS models (equations 7 to 9) to the data using deviance tests indicated that the contribution of schools to the likelihood of being a higher knowledge teacher significantly varied among schools, as did the predictive capacity of several teacher variables. 4 In particular, teacher variable coefficients allowed to vary across schools included teachers’ race (African American, r1), undergraduate education (early childhood education, r2), and specialized reading certification (r3) and demonstrated significant variation across schools (Table 3). In other words, the complex multilevel PS (equation 9) most closely described the complex association of teachers’ knowledge with school and teacher factors. Such results suggest heterogeneity in what is predictive of knowledge levels and acquisition across schools.
Using equation 9, I then constructed the PF as a function of every class, teacher, and school measured variable (see the Appendix). Furthermore, because the analyses indicated that the predictive capacity of a teacher’s race, undergraduate education, and specialized reading certification varied significantly across schools, I considered cross-level interactions between each of these and every school-level covariate. To assess balance on observed covariates, I regressed each covariate and cross-level interaction on teachers’ reading knowledge with and without strata indicators (Imai & Van Dyk, 2004). In particular, if the PS strata appropriately adjust for differences in the predisposition to have high knowledge, covariates should have little residual predictive capacity. Because each covariate should be uncorrelated with knowledge given the actual PF, and to shield against model misspecification, I further assessed balance on several transformations of knowledge.
Figure 4(a) summarizes covariate balance using standardized differences/coefficients without adjusting for PS strata (1), balance with PS strata (2), and balance with PS strata on the logarithm of knowledge (3) (e.g., Imai & Van Dyk, 2004). Without PS strata, most covariates were largely imbalanced, whereas with PS strata, results indicated that no standardized coefficient had a magnitude greater than 0.1. Similarly, Figure 4(b) displays the t statistics of the standardized coefficients, indicating that, with the inclusion of PS strata, no covariate demonstrated a significant difference.
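The logic of this balance check can be sketched with a toy example: form quintile strata on an estimated PS, then compare the slope from regressing a covariate on knowledge with and without stratum indicators. Everything below is an illustrative assumption rather than the study’s data or code: the data are synthetic and deterministic, the function names are hypothetical, the coefficients are unstandardized, and within-stratum demeaning is used because it yields the same slope as adding stratum indicator variables.

```python
import bisect
import statistics


def quintile_strata(scores):
    """Assign each score to one of five strata (0-4) using quintile cut points."""
    cuts = statistics.quantiles(scores, n=5)  # four interior cut points
    return [bisect.bisect_right(cuts, s) for s in scores]


def slope(y, x):
    """Ordinary least squares slope from regressing y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_within(values, strata):
    """Subtract each stratum's mean from the values in that stratum."""
    groups = {}
    for v, s in zip(values, strata):
        groups.setdefault(s, []).append(v)
    means = {s: statistics.fmean(vs) for s, vs in groups.items()}
    return [v - means[s] for v, s in zip(values, strata)]


def stratified_slope(y, x, strata):
    """Slope of y on x after adjusting for stratum indicators."""
    return slope(demean_within(y, strata), demean_within(x, strata))


# Synthetic example: knowledge and a covariate both depend on the same score.
propensity = [i / 100 for i in range(100)]
knowledge = [p + (0.05 if i % 2 == 0 else -0.05) for i, p in enumerate(propensity)]
covariate = [p + (0.05 if (i // 2) % 2 == 0 else -0.05) for i, p in enumerate(propensity)]

strata = quintile_strata(propensity)
raw = slope(covariate, knowledge)
adjusted = stratified_slope(covariate, knowledge, strata)
print(f"slope without strata = {raw:.3f}, slope with strata = {adjusted:.3f}")
```

In this toy example the slope shrinks once strata are included but does not vanish, because the score still varies within each quintile. This residual imbalance is one motivation for combining stratification with covariance adjustment in the outcome model.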

Figure 4. Standard normal quantile plots of (a) standardized covariate coefficients and (b) t statistics when predicting teachers’ knowledge without propensity score (PS) strata (1), with PS strata (2), and with PS strata using log(TK) (3).
Outcome Models
Using a fully unconditional model, I partitioned the variance in students’ achievement into three components representing the variance among schools, the variance among teachers within schools, and the variance among students within classrooms (Table 5). For both outcomes, the majority (over 80%) of variation in students’ reading achievement was attributed to students and measurement error. I found that the remaining variation was split between the teacher and school components, with approximately 10% at the classroom level and 7% at the school level. Using the PS strata and covariates as in equation 10, the conditional association of teachers’ reading knowledge and students’ reading comprehension was 0.07 (p < .05; Table 6). That is, holding all other factors constant, an increase of 1 standard deviation in teachers’ reading knowledge was associated with a 0.07 standard deviation increase in students’ achievement. Collectively, for reading comprehension, the measured student covariates accounted for about 29% of the student variation in achievement, the measured teacher covariates accounted for about 44% of the teacher variation in achievement, and the measured school covariates accounted for about 86% of the school variation in achievement (Table 6).
Table 5. Variance Components for Achievement Models
Table 6. Relationship Between Students’ Achievement and Teachers’ Knowledge
Note. NWF = Nonsense Word Fluency. Teacher and school covariates are omitted for brevity (see Appendix for a full list of coefficients).
p < .05.
Subsequent analyses concerning the Word Analysis subtest indicated that teachers’ reading knowledge was associated with an effect size of much smaller magnitude (0.01, p > .05) and was indistinguishable from zero. For the Word Analysis outcome, the measured student covariates accounted for about 30% of the student variation in achievement, teacher covariates for about 50% of the teacher variation in achievement, and school covariates for about 43% of the school variation in achievement.
Sensitivity Analyses
Because there is little empirical evidence identifying both the core of teachers’ reading knowledge and its corresponding effect on students’ reading achievement, there is a considerable possibility that I have not controlled for relevant differences among teachers or classrooms and schools. To better understand the quality of the estimated association of teachers’ reading knowledge and reading comprehension, I assessed the sensitivity of the inferences. First, I assessed the sensitivity of the inferences to the inclusion of an unmeasured variable (e.g., Hong & Raudenbush, 2006; Rosenbaum, 1995). I constructed a sensitivity index from the set of observed measures to determine if the model estimates would be significantly influenced by potential hidden biases resulting from similar but unobserved covariates.
For the significant association of teachers’ reading knowledge and students’ reading comprehension, my sensitivity analyses indicated that the estimated association was robust to the omission of a wide range of measured characteristics but sensitive to an unmeasured confounder similar in magnitude to that of a classroom average prior achievement measure. Specifically, if a confounder with relationships to the outcome and treatment similar in magnitude to that of the aggregated prior achievement was unmeasured, it is possible that inferences would have been different.
Such sensitivity is typical, as measures of prior achievement often demonstrate the strongest impact in achievement models (Bloom, 2005). There is also evidence that the inclusion of additional covariates accounts for decreasing amounts of variance once the most predictive (e.g., pretests) are considered (Bloom, 2005). Nonetheless, such sensitivity is particularly relevant in this study because prior achievement was indexed using a measure different from the outcome.
To speculate on the quality of this pretest, I explored the magnitude of the NWF correlations with ITBS subtests in Grades 2 and 3 using the Michigan state student base. I found that consecutive ITBS scores in Grades 1 through 3 were correlated at around .50. The NWF measure used had correlations of .53 and .51 with the ITBS Reading Comprehension and Word Analysis outcomes, respectively. Such zero-order correlations do not alleviate omitted variable concerns, but they do provide some limited evidence as to the relevance of NWF as a pretest measure.
In addition, there is also a scarcity of literature on how schools are associated with teachers’ knowledge and little evidence of the effectiveness of multilevel PSs. My approach was guided by the theory that such knowledge may be clustered in schools, and my analyses indicated that there was statistical evidence toward complex clustering. To address this clustering, I included school characteristics and their cross-level interactions and allowed certain teacher coefficients to vary across schools. To assess the sensitivity of my findings to the multilevel structure of the PSs, I reestimated the association of teachers’ reading knowledge with students’ reading achievement, ignoring the predictive contribution of schools to teachers’ knowledge using the single-level (equation 7) and simple multilevel (equation 8) PSs (Table 7). The results indicated small differences that suggested similar effect sizes but potentially different inferences. Results for Reading Comprehension indicated that the association using a single-level PS was 0.05 (SE = 0.03) and using a simple multilevel PS was 0.06 (SE = 0.03), whereas results for Word Analysis indicated virtually no changes.
Table 7. Summary of Teachers’ Knowledge Effects for Different Models
Note. PS = propensity score. The first seven estimates refer to changes in the PS model, whereas the last two refer to changes in the outcome model.
Finally, to address potential model misspecification, I assessed the sensitivity of the results to alternate variable specifications of the PS and outcome models. Specifically, for the PS model, I first assessed the estimates’ sensitivity to the exclusion of all cross-level interactions (random effects summarize schools’ contribution) and second to omissions of one random slope at a time (ri). Third, for the outcome model, I assessed how results changed when using only covariance adjustment and no PS strata and fourth when using only the PS strata.
The results of these analyses (Table 7) indicated, first, that ignoring cross-level interactions had minimal impact, suggesting that the findings were robust to the exclusion of interactions. However, the constraint of the predictive capacity of a teacher covariate previously allowed to vary across schools (i.e., random slope) changed the associated effect by as much as 25%, potentially suggesting the importance of specific random slope effects (Table 7). Finally, by not subclassifying on the PSs, covariance adjustment alone would have led to smaller effect sizes and different inferences, whereas similar results were found by excluding the covariates in the outcome model and using only the PS strata.
Discussion
In this study, I have examined the effect of teachers’ reading knowledge on students’ reading achievement in Grade 1 RF classrooms. To overcome the potential limitations of indirect measures, I have used a direct assessment of reading knowledge that targeted content seen as central to teachers’ early reading instruction. As a first step in understanding knowledge as measured by the TKRRP, psychometric analyses were conducted, indicating that the measure could most strongly differentiate among those teachers with relatively lower knowledge. Approximately 40% of teachers fell in a range that might be considered unreliable, suggesting that the magnitude of relationships subsequently examined might be attenuated.
Next, because teachers’ knowledge acquisition avenues were thought to be complexly associated with the schools they served in, I studied the distribution of teachers’ knowledge. Such examination found significant evidence that knowledge was clustered in schools. In particular, approximately 27% of the variation in teachers’ knowledge could be uniquely attributed to school membership. When this proportion is compared with analogous estimates for students’ achievement (e.g., 10%), such imbalances display severe inequities in that teachers’ knowledge tends to be separated out more than students’ achievement. Moreover, as the sample was derived from RF volunteers, the estimate of clustering may represent a conservative estimate compared with national data.
Further investigation concerning the correlates of this clustering demonstrated multiple imbalances in teachers’ knowledge across a range of school characteristics. Such imbalances highlight the potential roles schools have in attracting, retaining, and developing quality teachers in terms of their knowledge. Specifically, the imbalance in teachers’ knowledge across school characteristics tended to be similar in magnitude to imbalances across teacher and class characteristics. Furthermore, these differences tended to persist even after controlling for comparable teacher and class characteristics.
Collectively, these disparities suggest that beyond differences in teacher qualifications and classroom composition, teachers’ knowledge is not equally distributed among schools. If the TKRRP measure of teachers’ reading knowledge can be seen as a component of or associated with teacher quality, such findings are suggestive of sustained systematic disadvantages for both teachers and students. The extent to which the school is an active agent in promoting such knowledge, just a marker of such knowledge, or a mixture of both has important policy implications.
Specifically, discerning the forces behind the skew of high-knowledge teachers to more effective and resourceful schools is central. If the imbalances are the result of sorting effects such that high-knowledge teachers choose or are recruited by academically strong schools, then hiring access to high-quality teachers becomes paramount. If, however, the imbalances stem largely from post-hire differences, training teachers through PD or teacher mentoring, for instance, becomes more central. The current analyses raise the question of whether efforts toward equitable access to high-quality teachers should focus on initial access to good teachers or on developing incumbent teachers. Unfortunately, the cross-sectional nature of the data on teachers’ knowledge can only suggest avenues for these imbalances and raise questions.
As a second step, I explored the extent to which schools adapted the predictive capacity of teacher characteristics in predicting teachers’ knowledge. In particular, because I suspected that the factors that shape teachers’ knowledge may be very different across schools, I examined the extent to which the predictive capacity of teacher covariates varied across schools. The results indicated that the association of race, undergraduate degree, and reading certification with knowledge differed across schools. Such results may suggest, for example, that schools may increase or decrease the relevance of race in obtaining reading knowledge, potentially creating less or more equitable environments.
At the same time, these results might also suggest that the effect of teachers’ personal experiences (e.g., academic or professional) on their formation of reading knowledge may vary and that, in part, this variation may be summarized by the schools they choose to serve in. If schools do modify knowledge and the relevance of teachers’ backgrounds, policies that support paths for schools to enable this type of learning might be appropriate. If, rather, the variation is due to differences in what is predictive of knowledge for disparate teacher populations, more emphasis might be placed on policies supporting preservice training and experiences. These differences underscore the complex interplay between schools and teachers but also draw attention to the indiscriminate nature of schools in predicting teachers’ knowledge in the propensity models. Again, the cross-sectional data used offer little evidence of whether schools modify teachers’ knowledge and the relevance of their backgrounds, whether these differences are an artifact of the schools teachers choose to serve in, or some combination.
One limited piece of evidence pointing to schools’ contribution is the difference in knowledge of new versus veteran RF teachers. Veteran RF teachers were exposed to substantial PD prior to the research year and scored significantly higher on the knowledge measure than did teachers who were new to RF. Similarly, in predicting knowledge, coefficients for race, undergraduate education, and reading certification were different for new and veteran RF teachers. Although conclusions are highly inferential, such differences suggested that measured differences in knowledge across schools are due at least in part to PD implemented by schools. Collectively, the schools’ potential roles in these data suggest that a salient limitation of proxy measures may be their inability to characterize the varying relationships of such proxies with knowledge.
To attend to these features of reading knowledge, I subsequently attempted to identify similar teachers on the basis of personal backgrounds, the classrooms they serve, and the schools they serve in by constructing multilevel PSs. The results of the analysis provided evidence that multilevel PSs more fully described the distribution of knowledge and aligned better with data. For instance, whereas the single-level PSs ascribed residual variability in knowledge to nonsystematic chance, the multilevel PSs attributed portions of that residual variability to school factors and school–teacher interactions. In turn, such differences suggest that approaches that ignore the predictive contribution of schools or assume that the predictive capacity of teacher characteristics is invariant across schools may mismatch teachers.
Using the subsets of similar teachers identified by the PS, I estimated the knowledge effect via the conditional association between students’ reading achievement and teachers’ reading knowledge and found mixed evidence. For the Reading Comprehension subtest, each standard deviation increase in teachers’ reading knowledge was associated with roughly 2 extra weeks of growth. In contrast, no association was found for the Word Analysis subtest.
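The logic of conditioning on PS strata can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the data are simulated at the teacher level, the score `ps` is taken as given rather than estimated from a multilevel model, the outcome model is a single-level ordinary least squares regression rather than the achievement models used in this study, and the function names are hypothetical. Five strata with the third as the reference category are used only to mirror the strata indicators listed in the Appendix.

```python
import numpy as np


def stratum_labels(ps, n_strata=5):
    """Assign each unit to a quantile-based stratum (1..n_strata) of the score."""
    # Interior quantile cut points, e.g. the 20th/40th/60th/80th percentiles.
    cuts = np.quantile(ps, np.linspace(0, 1, n_strata + 1)[1:-1])
    return np.searchsorted(cuts, ps, side="right") + 1


def stratified_slope(y, knowledge, ps, n_strata=5, reference=3):
    """OLS slope of y on knowledge, adjusting for stratum indicators."""
    strata = stratum_labels(ps, n_strata)
    # One indicator per stratum, omitting the reference stratum.
    dummies = [(strata == s).astype(float)
               for s in range(1, n_strata + 1) if s != reference]
    design = np.column_stack([np.ones_like(y), knowledge] + dummies)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


# Simulated illustration (hypothetical values, not the study data).
rng = np.random.default_rng(0)
n = 2000
background = rng.normal(size=n)               # confounding background factor
ps = background                               # expected knowledge given background
knowledge = ps + rng.normal(size=n)           # observed knowledge
true_slope = 0.07                             # assumed effect for the simulation
y = true_slope * knowledge + 0.5 * background + rng.normal(scale=0.5, size=n)

naive = np.polyfit(knowledge, y, 1)[0]        # ignores background entirely
stratified = stratified_slope(y, knowledge, ps)
print(f"naive slope: {naive:.3f}, stratified slope: {stratified:.3f}")
```

In this simulation the stratified slope lies much closer to the assumed effect than the naive slope does, although coarse strata leave some residual confounding within each stratum.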
It is possible that the word analysis results are due to the composition of the teachers’ reading knowledge scale, as the items may not be tapping the types of pedagogy that influence achievement in word analysis (Carlisle et al., 2008). Specialized knowledge needed for teaching reading remains difficult to identify and measure. For instance, items measuring knowledge of children’s texts and genres may strengthen the scale (e.g., Cunningham, Perry, Stanovich, & Stanovich, 2004). In addition, it is clear from the psychometric analyses that the measure of teachers’ reading knowledge is fairly coarse and primarily offers reliable information well below the population mean. Given these qualities and the above-average ability of the sample, it is likely that substantial noise biased the estimates downward and that more refined measures would produce stronger effects. Such findings emphasize the importance of measurement issues and highlight the difficulty in establishing and assessing the domains relevant to teaching reading in first grade.
The analyses presented have several key limitations. First, the target population was very different from mainstream education and is not representative of the distribution of students, teachers, or schools in the United States. The volunteer sample further constrains inferences, because teachers in this sample differed distinctly in their levels of knowledge from the target population. Within this sample, the knowledge measure was weak for about 40% of teachers, likely attenuating both school-knowledge relationships and knowledge-achievement relationships.
There are also clear limitations to PS-based analyses. Most notable is the inability of PS methods to eliminate confounding from unmeasured variables, and my sensitivity analyses did indicate that the results are sensitive to such confounding. In parallel, the data lacked several important measures, such as ITBS pretests or indices of general teacher aptitude. Furthermore, the use of multilevel PSs with across-school comparisons to address the limited within-school sample sizes and the clustered distribution of knowledge also exposed results to the assumption of no unmeasured school covariates. Although there is evidence that the findings were robust to unmeasured school covariates, further research is needed to understand the scope and conditions of this utility. Similarly, because teachers were matched across schools, inferences concerning the potential for effect variability across schools were constrained.
The contribution of this article is twofold. First, this study provides limited but promising evidence that direct measures of teachers’ reading knowledge can be linked with students’ reading achievement, suggesting that knowledge does matter. The results demonstrated the importance of measuring appropriate domains of knowledge and the difficulty in designing assessments that attend to the full range of teachers (especially those above average). Such results suggest that to link knowledge with achievement, we must first clarify the domains of knowledge needed to promote effective instruction. Such results further amplify the difficulty of using proxy measures to tap into knowledge or quality. Subsequent research will likely benefit from the development of items targeted toward specific domains that differentiate among teachers with above-average knowledge.
Once reliable and valid measures are constructed, the work here suggests that teachers still differ in how their experiences help them to acquire this knowledge. Schools may play a primary role in facilitating these experiences or further developing this knowledge. Such roles may signify that policies that support PD and collaborative environments can be effective in developing teachers’ capacities as they relate to students’ achievement.
Second, beyond the substance, the study developed a strategy for assessing the relationship between elements of teacher quality and students’ achievement as they exist in observational data. This strategy was guided by the theory that many teacher characteristics and quality indicators are unevenly distributed among schools and unevenly influenced by characteristics of schools. I believe this strategy may be particularly useful in future studies exploring malleable pathways for improving teacher quality as well as linking hypothesized elements of teacher quality with students’ achievement.
There were several benefits to the use of multilevel PSs in the current context beyond usual PS advantages such as robustness to nonlinearities, model misspecification, and respecification error (e.g., Dehejia & Wahba, 1999; Rubin, 2007). First, there is evidence in the literature that incorporating such subclassifications on the PS reduces bias and improves efficiency (Imai & Van Dyk, 2004). This may help explain the small but noteworthy discord between the multilevel PS and non-PS results and would suggest that the former should be preferred, because it tends to be more accurate. Next, the use of multilevel PS methods encouraged a more detailed study of teachers’ reading knowledge and how we might identify similar teachers. Although the primary purpose of multilevel PSs was to reduce confounding by identifying similar teachers, the use of multilevel PSs also gave insight into policy-relevant issues. For example, multilevel PSs helped identify correlates of knowledge and identified potential teacher and school pathways from which we might study and advance the knowledge that teachers hold. Finally, a salient practical feature of multilevel PSs in the context of teacher quality is the ability to match teachers across schools. Because teacher sample sizes within schools tend to be small and their comparability poor, methods that allow comparisons of teachers across schools are particularly useful.
Although the ability of multilevel PSs to identify comparable teachers across schools theoretically depends on the measurement of all relevant school covariates, its effectiveness in removing confounding may extend to unmeasured school-level covariates. The results that omitted all cross-level interactions but maintained the random effects suggest that in this study, random effects capture a substantial portion of these omitted interactions (see Table 7). However, the extent of this mechanism is unknown and likely dependent on, for example, the relevance of the interactions and the coefficient variation and reliability (e.g., Morris, 1983). Further research into the statistical properties of these approaches and under which conditions they outperform alternative methods is warranted.
Appendix
Coefficients of Achievement Models and List of Propensity Score Covariates
| Variable | WA Estimate | WA SE | RC Estimate | RC SE |
|---|---|---|---|---|
| Intercept | −3.54 | 1.87 | −1.27 | 1.59 |
| Teachers’ knowledge | 0.01 | 0.03 | 0.07 | 0.03 |
| Age in months | 0.00 | 0.00 | 0.00 | 0.00 |
| Fall NWF | 0.03 | 0.00 | 0.03 | 0.00 |
| Male student | −0.07 | 0.02 | −0.13 | 0.02 |
| Special education student | −0.08 | 0.07 | −0.08 | 0.07 |
| Student eligible for free or reduced-price lunch | −0.14 | 0.03 | −0.19 | 0.03 |
| Disabled student | −0.28 | 0.04 | −0.18 | 0.04 |
| Limited-English-proficient student | −0.09 | 0.04 | −0.12 | 0.04 |
| White student | 0.19 | 0.03 | 0.18 | 0.03 |
| Strata Indicator 1 | −0.09 | 0.11 | 0.07 | 0.10 |
| Strata Indicator 2 | −0.13 | 0.08 | −0.07 | 0.07 |
| Strata Indicator 4 | 0.01 | 0.08 | −0.01 | 0.07 |
| Strata Indicator 5 | 0.11 | 0.11 | −0.05 | 0.10 |
| Male teacher | 0.44 | 0.12 | 0.29 | 0.11 |
| White teacher | −0.09 | 0.12 | 0.01 | 0.11 |
| African American teacher | −0.01 | 0.14 | 0.13 | 0.13 |
| Hispanic teacher | 0.02 | 0.12 | 0.14 | 0.11 |
| Bachelor’s degree in elementary education | 0.19 | 0.10 | 0.10 | 0.09 |
| Bachelor’s degree in early childhood education | 0.19 | 0.13 | 0.07 | 0.11 |
| Bachelor’s degree in literacy education | 0.31 | 0.24 | 0.05 | 0.23 |
| Bachelor’s degree in special education | 0.19 | 0.12 | 0.03 | 0.11 |
| Master’s degree | 0.06 | 0.08 | 0.03 | 0.07 |
| Master’s degree in elementary education | 0.01 | 0.07 | −0.01 | 0.06 |
| Master’s degree in early childhood education | 0.00 | 0.11 | 0.04 | 0.10 |
| Master’s degree in literacy education | −0.07 | 0.10 | −0.04 | 0.09 |
| Master’s degree in special education | 0.12 | 0.08 | 0.07 | 0.07 |
| Post-master’s degree | 0.18 | 0.12 | 0.06 | 0.11 |
| Possesses a standard teaching certification | 0.03 | 0.06 | 0.07 | 0.06 |
| Possesses a reading certification | 0.25 | 0.18 | 0.21 | 0.11 |
| Possesses a special education certification | −0.01 | 0.12 | −0.10 | 0.11 |
| Number of approved reading trainings/PD seminars | −0.02 | 0.02 | 0.00 | 0.02 |
| Number of years teaching | 0.01 | 0.01 | 0.00 | 0.01 |
| More than 3 years of teaching | 0.11 | 0.06 | 0.09 | 0.05 |
| RF veteran status | 0.02 | 0.08 | −0.01 | 0.07 |
| Average age of class | 0.03 | 0.02 | 0.02 | 0.01 |
| Average of class DIBELS NWF in the fall | 0.00 | 0.00 | 0.00 | 0.00 |
| Standard deviation of class DIBELS NWF in the fall | 0.01 | 0.01 | 0.01 | 0.01 |
| Proportion in class that is male | 0.21 | 0.23 | −0.09 | 0.21 |
| Proportion of class identified as special education | 0.55 | 0.33 | 0.59 | 0.31 |
| Proportion of class eligible for free or reduced-price lunch | 0.31 | 0.23 | −0.02 | 0.21 |
| Proportion of class identified as having a disability | −0.97 | 0.32 | −0.44 | 0.30 |
| Proportion of class identified as having limited English proficiency | −0.14 | 0.30 | −0.25 | 0.27 |
| Proportion of class that is African American | 0.01 | 0.29 | −0.24 | 0.27 |
| Proportion of class that is Hispanic | 0.34 | 0.37 | 0.37 | 0.34 |
| Proportion of class that is White | 0.88 | 0.31 | 0.46 | 0.29 |
| Proportion of students in school eligible for free or reduced-price lunch | 0.10 | 0.44 | 0.29 | 0.36 |
| Proportion of students in school who are female | 0.40 | 0.72 | −0.52 | 0.67 |
| Proportion of students in school who are African American | 0.14 | 0.70 | −0.48 | 0.60 |
| Proportion of students in school who are Hispanic | −1.69 | 1.00 | −2.18 | 0.84 |
| Proportion of students in school who are White | 0.18 | 0.74 | 0.03 | 0.63 |
| Proportion of teachers in school who are male | −0.51 | 0.30 | −0.59 | 0.24 |
| Proportion of teachers in school who are White | −0.01 | 0.40 | −0.04 | 0.33 |
| Proportion of teachers in school who are African American | 0.37 | 0.41 | 0.20 | 0.34 |
| Proportion of teachers in school who are Hispanic | −0.14 | 0.44 | −0.20 | 0.38 |
| Proportion of teachers in school holding bachelor’s degrees in elementary education | 0.18 | 0.19 | 0.21 | 0.16 |
| Proportion of teachers in school holding bachelor’s degrees in early childhood education | −0.53 | 0.28 | −0.06 | 0.24 |
| Proportion of teachers in school holding bachelor’s degrees in literacy education | −1.37 | 0.93 | 0.15 | 0.78 |
| Proportion of teachers in school holding bachelor’s degrees in special education | 0.07 | 0.26 | 0.07 | 0.22 |
| Proportion of teachers in school holding master’s degrees | −0.01 | 0.20 | 0.06 | 0.17 |
| Proportion of teachers in school holding master’s degrees in elementary education | 0.24 | 0.17 | 0.27 | 0.15 |
| Proportion of teachers in school holding master’s degrees in early childhood education | 0.43 | 0.31 | 0.21 | 0.27 |
| Proportion of teachers in school holding master’s degrees in literacy education | 0.12 | 0.29 | 0.49 | 0.24 |
| Proportion of teachers in school holding master’s degrees in special education | −0.32 | 0.24 | 0.06 | 0.20 |
| Proportion of teachers retaining degrees beyond a master’s | 0.26 | 0.40 | 0.44 | 0.34 |
| Proportion of teachers in school holding standard certification | 0.02 | 0.16 | −0.14 | 0.13 |
| Proportion of teachers in school holding reading certification | −1.03 | 0.52 | −0.39 | 0.43 |
| Proportion of teachers in school holding special education certification | 0.19 | 0.31 | 0.08 | 0.27 |
| Average number of approved trainings that teachers maintain | 0.03 | 0.04 | 0.01 | 0.04 |
| Proportion of teachers with high levels of teaching experience | 0.08 | 0.15 | −0.03 | 0.13 |
| Average years of teacher experience | 0.01 | 0.01 | 0.01 | 0.01 |
| Proportion of teachers in school who are new to RF | −0.02 | 0.18 | −0.02 | 0.15 |
| Average age of first grade students in school | −0.01 | 0.01 | −0.01 | 0.01 |
| Average of school DIBELS NWF in the fall | 0.01 | 0.01 | 0.01 | 0.01 |
| Proportion of students in first grade eligible for special education | −0.68 | 0.63 | −0.51 | 0.54 |
| Proportion of students in first grade eligible for free or reduced-price lunch | −0.28 | 0.43 | −0.16 | 0.36 |
| Proportion of students in first grade with disabilities | 0.87 | 0.67 | 0.62 | 0.57 |
| Proportion of students in first grade with limited English proficiency | 0.59 | 0.38 | 0.36 | 0.33 |
| Proportion of students in first grade who are African American | −0.73 | 0.57 | −0.48 | 0.50 |
| Proportion of students in first grade who are Hispanic | 0.74 | 0.99 | 0.82 | 0.86 |
| Proportion of students in first grade who are White | −1.45 | 0.69 | −1.55 | 0.59 |
Note. DIBELS = Dynamic Indicators of Basic Early Literacy Skills; NWF = Nonsense Word Fluency; PD = professional development; RC = Reading Comprehension; RF = Reading First; WA = Word Analysis.
Acknowledgements
The author would like to thank Joanne Carlisle, Geoffrey Phelps, and Brian Rowan.
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was made possible by a Teacher Quality grant from the Institute for Education Sciences (Award #R305M050087); however, IES is not responsible for the design and analysis of the study or interpretation of results.
