Abstract
We apply a difference-in-difference design to measure the causal effect of a teacher obtaining an endorsement in Sheltered English Immersion under Massachusetts’s Rethinking Equity in the Teaching of English Language Learners initiative on students’ learning outcomes. More than 35,000 in-service public school teachers completed the semester-long course. We find no effect on English learners’ (ELs) average test scores, but modest positive spillovers for students with disabilities and other non-EL students. Training benefited teachers recently hired by their district but had no effect on longer serving teachers.
Most English learners (ELs) in U.S. public schools receive instruction from a core academic teacher within a general education setting (Staehr Fenner, 2013), and yet most teachers in those classrooms have received little to no training in how to instruct ELs (Cox et al., 2017; Greenberg et al., 2015). With federal mandates requiring that schools provide ELs with equal access to instruction, there is a widespread need to improve the current teaching workforce’s preparation to support these students. But given that more than half of public school teachers instruct at least one EL (Cox et al., 2017), 1 the scale of professional development (PD) required to train all relevant in-service teachers is daunting.
The Rethinking Equity in the Teaching of English Language Learners (RETELL) initiative pursued by Massachusetts was an ambitious attempt to address deficiencies in knowledge of strategies for instructing ELs among general education teachers at an unprecedented scale. RETELL required all core academic teachers instructing even one EL to obtain an endorsement in Sheltered English Immersion (SEI) to demonstrate their proficiency in instructional strategies for making academic content accessible to ELs and for scaffolding their English language development in the context of a general education classroom. For most teachers, this required completing a PD training equivalent in scope to a college-level semester-long course. Over a 5-year period, more than 35,000 in-service public school general education teachers throughout the state completed the training.
In this article, we apply a generalized difference-in-difference design to estimate the causal effect of a general education teacher obtaining an SEI endorsement under RETELL on the test scores of students they instruct. We find no significant effect from a general education teacher obtaining an SEI endorsement under RETELL on the test scores of ELs that the teacher instructs, on average. However, we find some evidence that obtaining an SEI endorsement benefited educators hired by the district within the previous 3 years—a proxy for classroom experience—but not for longer serving teachers within the district. Furthermore, we find a modest positive spillover effect for students with disabilities and the larger group of non-EL students that a teacher instructs.
The potential for required PD to benefit teacher quality at scale depends not only on the quality of the training but also on the fidelity of implementation and on variation in pre-existing characteristics and cultural contexts within schools (Matsumura et al., 2010). Our data do not allow us to disentangle the extent to which obtaining an SEI endorsement under RETELL failed to improve teacher impacts for ELs because of limitations inherent in the training or because of factors associated with large-scale implementation. However, Chang-Bacon (2022), drawing on interviews with participating teachers and providers, reports substantial variation in the fidelity with which the RETELL training was provided and in teacher buy-in for the training requirement, which suggests that issues with implementation likely contributed to limiting the effect of the training.
From a policy perspective, our results are relevant to a growing set of expansive teacher training requirements intended to improve instruction for ELs. At least 24 states currently require or recommend EL-specific training for general education teachers (Education Commission of the States [ECS], 2020), and about 27% of teachers nationwide participated in PD targeted to instructing ELs as of 2011–2012 (Rotermund et al., 2017). Yet, we know little about the effect that such training has on teacher contributions to the achievement of ELs or other students. The few recent quantitative studies finding an association between EL-specific training for general education teachers and ELs’ achievement are correlational in nature and may not translate to the context of a statewide requirement (Betts et al., 2003; Loeb et al., 2014; Master et al., 2016). 2 To our knowledge, we provide the first causal estimate for the impact of a large-scale teacher training requirement targeted to improving instruction for ELs on the academic performance of ELs they instruct.
More generally, we contribute to a growing literature evaluating policies intended to improve the effectiveness of in-service teachers. Research finding that uniform salary increases do not appear to increase the performance of teachers already in the classroom (de Ree et al., 2017) and that linking teacher compensation to performance in U.S. public schools has yielded null or small effects (Fryer et al., 2012; Glazerman & Seifullah, 2010; Goldhaber & Walch, 2012; Goodman & Turner, 2011; Springer et al., 2012) suggests that policies that focus on teacher compensation and motivation alone are not likely to meaningfully improve the effectiveness of in-service teachers. However, recent evidence that in-service teachers benefit from formal evaluations (Taylor & Tyler, 2012) and coaching (Kraft et al., 2018) suggests the potential to improve teacher effectiveness by supporting their practice directly.
Our findings are especially relevant to the literature evaluating the impact of training and certification on the effectiveness of the teaching workforce. Studies evaluating the efficacy of certification to ensure that students receive effective instruction have produced mixed results (Clotfelter et al., 2006, 2007, 2010; Goldhaber & Anthony, 2007; Harris & Sass, 2011; Sass, 2015). Recent meta-analyses of rigorous studies suggest that PD can have meaningful positive impacts on teacher effectiveness (Kennedy, 2016; Lynch et al., 2019). Some recent studies of expansive PD programs have found benefits within a variety of school contexts (Boulay et al., 2018), although recent evaluations of large-scale required PD within China failed to find significant benefits (Loyalka et al., 2019; Lu et al., 2019). 3
The remainder of the paper proceeds as follows. The section “Educating ELs” provides a basic description of EL education. The “Setting and Policy” section provides a brief historical context for educating ELs in Massachusetts and describes the RETELL initiative. The “Data” section describes the data, and the “Identifying the Causal Effect of SEI Endorsement Training” section describes our empirical strategy for estimating the causal effect of completing the training. We report the results in the “Results” section. In the section “Identification Test,” we present a test of the plausibility of our identifying assumption for the analysis. Finally, the section “Discussion and Conclusion” provides a brief summary of results and concludes.
Educating ELs
ELs are among the most rapidly growing and lowest performing student subpopulations in American public schools (e.g., National Education Association, 2015; U.S. Department of Education, 2018). ELs’ relatively low academic outcomes are especially concerning because federal law requires schools to provide sufficient language support services to ensure that their access to content instruction is equivalent to that of native English-speaking students. Under Title VI of the Civil Rights Act of 1964 and the Equal Educational Opportunities Act (EEOA) of 1974, states and school districts must “take ‘affirmative steps’ to address language barriers so that EL students may participate meaningfully in schools’ educational programs.” In Lau v. Nichols (1974), the seminal court case that definitively established the legal obligations of school districts and public schools to address ELs’ language barriers, the U.S. Supreme Court declared, “There is no equality of treatment merely by providing students with the same facilities, textbooks, teachers, and curriculum; for students who do not understand English are effectively foreclosed from any meaningful education.” The subsequent Castañeda v. Pickard (1981) ruling further specified that in order for a school district to be considered fulfilling its legal obligations, its EL program must meet three conditions: (a) that it is based on a sound educational theory, (b) that it is implemented with adequate resources and personnel, and (c) that over time, it proves effective in eliminating ELs’ language barriers.
Some of the legal obligations can be fulfilled by the language support services provided by English as a second language (ESL) and bilingual education specialist teachers. ESL teachers, for instance, might pull out ELs for English language development instruction or push into their general education classrooms to support ELs in academic instruction. However, concerns over segregating ELs from their English-speaking peers for separate instruction have led many states and school districts to favor an instructional model wherein ELs are kept in their regular classrooms as much as possible, and general education teachers serve as both content and language teachers (Harklau & Yang, 2019). Most ELs spend the majority of their school time in a general education classroom with a core academic teacher (Staehr Fenner, 2013), and by some estimates, more than half of public school teachers teach at least one EL. In such a model, general education teachers must be equipped to provide linguistic scaffolding along with academic instruction if ELs are to have equal access to academic content as non-ELs.
However, there is substantial variation across states in the pre-service and PD training required for general education teachers instructing ELs (ECS, 2020). Much of the policy discussion regarding the lack of quality instruction for ELs has focused on the fact that ELs tend to be taught by inexperienced and less qualified teachers (Ballantyne et al., 2008; Cosentino de Cohen et al., 2005; Dabach, 2015). But even otherwise fully qualified general education teachers may not have sufficient training to seamlessly incorporate the teaching of vocabulary and language functions into their instruction to make academic content accessible to ELs and to foster their language development (Penner-Williams et al., 2017). A 2014 survey found that only 24% of elementary teacher education programs provided any training in EL-specific instructional strategies (Greenberg et al., 2015). Similarly, only 29.5% of general education teachers in U.S. public schools who have at least one EL in their classroom have had the opportunity to receive PD in EL education (Cox et al., 2017). The logic underlying RETELL’s SEI training requirement and similar, although less ambitious, initiatives in other states is to improve educational outcomes for ELs by ensuring that they are instructed by teachers sufficiently trained to meet their specialized needs. However, there is currently little to no empirical support for the effectiveness of such policies or for the training they require. Some recent studies find a positive association between EL-specific training and ELs’ test scores (Betts et al., 2003; Loeb et al., 2014; Master et al., 2016). However, none of these prior studies specific to EL certification and pre-service coursework employs a research design capable of leading to a causal estimate.
Notably, it is possible that the specific deficiencies in instruction for ELs targeted by the SEI training and similar policies are not as daunting as commonly believed. First, the test score gap between ELs and non-ELs is somewhat misleading because it in part reflects the fact that students lose the EL designation once they have achieved sufficient proficiency, leading to what Saunders and Marcelletti (2013) describe as the “gap that can’t go away.” Furthermore, some recent studies have found that ELs’ lower test scores relative to non-ELs are at least in part a reflection of other achievement gaps based on race/ethnicity and household income, thus challenging the conventional understanding that the so-called EL gap is driven by insufficient language services (Callahan & Humphries, 2016; Umansky et al., 2016).
Setting and Policy
Recent Historical Context
In 2002, Massachusetts voters approved a ballot initiative requiring public school children to be taught in English language classrooms, which effectively eliminated transitional bilingual education programs in the state. Prior to this ballot initiative, 23% of ELs were enrolled in bilingual education. After the initiative became law, the majority of these students were moved into SEI programs.
In theory, under the SEI model, ELs were supposed to receive content instruction, mostly in English, from general education teachers who were trained to scaffold ELs’ academic and language learning, while receiving more explicit English language development instruction from ESL specialist teachers. In practice, many ELs were instructed by teachers who were not trained in SEI practices. Although the Massachusetts Department of Elementary and Secondary Education (MADESE) mandated specialist training for ESL teachers, it did not mandate SEI training for general education teachers who had ELs in their classrooms. Instead, the state offered four categories of SEI training and encouraged general education teachers to undergo the training. Because the category training was voluntary, only a small fraction of general education teachers took it. By 2010, 50,000 teachers, or 71% of the state’s public school teachers, lacked training to work with ELs under the state’s SEI model.
The statewide shortage of SEI-trained general education teachers meant that a large proportion of ELs were being taught by untrained teachers. For example, in 2010, only half of secondary-level ELs and a quarter of elementary-level ELs in Boston Public Schools, the largest school district in the state, received instruction from teachers who had either received category training or obtained an ESL license. Consequently, in July 2011, the U.S. Department of Justice (DOJ) sent a letter informing the state that it was found in violation of the EEOA for its failure to require adequate training for SEI teachers. Referring to the state’s decision not to mandate SEI training for general education teachers back in 2004, DOJ argued,
MADESE can no longer claim to be implementing an SEI Program model consistent with EEOA requirements if the voluntary PD program has resulted in a significant shortage of SEI teachers trained to educate ELL children in content classes seven years later.
RETELL and the SEI Endorsement Requirement
MADESE responded to DOJ’s concerns by proposing the RETELL initiative. At the heart of the initiative was a requirement that all core academic pre-service teachers and all core academic in-service teachers who were instructing ELs obtain an SEI teacher endorsement by June 30, 2016. Similarly, administrators (e.g., principals and supervisors) who evaluated SEI teachers were required to obtain the SEI administrator endorsement. SEI teachers could obtain the endorsement in one of three ways: (a) by already possessing an ESL license, (b) by passing an SEI licensure test, or (c) by completing a comprehensive SEI teacher endorsement course. Those who had no previous training or only the Category 1 training were required to take a full 45-hour endorsement course. Teachers who had completed three of the previously available category trainings were eligible to take a 15-hour short-bridge course, and those who had completed two of the category trainings could take a 24-hour long-bridge course. The policy applied to both new and existing teachers, and thus resulted in thousands of hours of PD across the state.
In this article, we focus on measuring the effect of obtaining the SEI endorsement under RETELL on a teacher’s contribution to student test score outcomes. However, it is worth noting that RETELL contained other aspects in addition to the training requirement. First, RETELL also required training for administrators where they learned how to provide supervision and support for teachers using SEI strategies. When the initiative began in 2013, the state also changed to the WIDA (World-Class Instructional Design and Assessment) English development standards and began to use WIDA’s ACCESS test to assess ELs’ English language proficiency annually. By 2015, the RETELL initiative also included coaching and extended learning opportunities meant to build upon the SEI endorsement training. Thus, our analysis looks at only one important aspect of the full RETELL initiative.
After a small pilot meant to evaluate the content of the training (August et al., 2012), the statewide rollout of the requirement began in 2013. Due to the large number of teachers who were required to complete this training, the state assigned each district to one of three cohorts. The districts with the highest EL incidence and lowest EL academic performance were assigned to Cohort 1. In other words, the districts with the highest needs for training received priority, whereas districts with fewer ELs and higher performing ELs received the training later (Chester, 2012). The state funded the required PD during the 2- to 3-year cohort period. A teacher could choose when to begin the training but was required to complete the training within 1 year of beginning the course.
Only core academic teachers who were assigned to instruct ELs during the district’s cohort years were required to obtain an SEI endorsement by the end of the cohort period and thus were eligible for the training. Unlike other PD opportunities, failure to successfully complete the training under RETELL resulted in meaningful consequences for teachers: Eligible teachers within a district who failed to acquire the endorsement by the end of the cohort period were unable to advance, renew, or extend their license, and they were required to earn the endorsement at their own expense.
SEI PD Course
The overall purpose of the SEI endorsement is to help general education teachers develop proficiency in instructional strategies for making academic content accessible to ELs as well as scaffolding their English language development in the context of general education classrooms. The 45-hour SEI teacher endorsement course consists of a 10-hour theory and policy component and a 35-hour practice component (MADESE, n.d.). The theory and policy component provides an overview of state and local educational agencies’ legal obligation to ELs, the characteristics of the EL population, second language acquisition theory, and the WIDA English language proficiency standards, which the state had adopted. The practice component encompasses vocabulary, discourse, reading, and writing instruction with a large emphasis on specific SEI instructional strategies. Figure 1 describes components of course content. As major assignments, course participants are required to implement these strategies in their classes and write six reflection reports over the duration of the course. Participants are also expected to develop a lesson plan incorporating SEI strategies, teach the lesson to their own students, reflect on the implementation, and teach a small component of the class to their course colleagues in the last two sessions of the course (MADESE, n.d.).

Figure 1. Summary of SEI course.
The practice component of the course is based on the Expediting Comprehension for English Language Learners (ExC-ELL) model (Calderón, 2011; Calderón et al., 2005). Funded by the Carnegie Corporation of New York, the model was designed “to help teachers provide effective instruction for ELs and all other students in their classrooms, particularly those reading below grade level and needing extensive vocabulary development” (Calderón & Slakk, 2018). The ExC-ELL model is quite similar to another, perhaps better known, SEI model called the Sheltered Instruction Observation Protocol (SIOP) model (Echevarria et al., 2008). Both are designed to incorporate explicit language instruction into academic content instruction, so that ELs will learn both English and academic content in tandem in the context of the general education classroom. However, the ExC-ELL model has a stronger focus on literacy while the SIOP model tends to emphasize oral interaction.
We are not aware of any independent assessment of the efficacy of the ExC-ELL model. However, many aspects of the ExC-ELL model are aligned with evidence-based recommendations for EL instruction. For example, the Institute of Education Sciences (Baker et al., 2014) makes four recommendations for EL academic instruction: (a) intensive academic vocabulary instruction, (b) integration of oral and written English language instruction into content teaching, (c) regular opportunities to practice writing, and (d) small group intervention for struggling language learners. The first three components of the recommendations occupy large portions of the SEI training. On the other hand, it is also important to remember that SEI is monolingual instruction; indeed, the SEI training was developed in the context of a state that had banned bilingual education. Therefore, the SEI course provides no training for teachers on how to utilize ELs’ first language resources. The concept of translanguaging, which emphasizes the importance of recognizing and utilizing all of multilingual learners’ linguistic resources (Otheguy et al., 2015), is certainly not reflected in the course, and neither are strategies that take advantage of the students’ oral and literacy skills in their first language to develop literacy in English and to facilitate their academic learning, even though the field has long recognized the importance of first-language resources in ELs’ literacy development in English (e.g., August & Shanahan, 2017).
Teachers were required to pass the course, but they were not required to pass a particular test. About 3.8% of teachers who enrolled in the training failed at least once.
Challenges Associated With Statewide Implementation
MADESE hired and mandated training for course instructors in an attempt to ensure the fidelity of course implementation. MADESE listed a total of 47 approved providers for the course throughout the state, including 17 school districts, 16 higher education institutions, and 14 outside providers such as educational collaboratives. 4
The instructor training strongly emphasized the need to implement the course with fidelity and that individual instructors could not modify the curriculum at will. Nonetheless, the need for so many providers to train such a large number of teachers across the state naturally raises questions about variability in how the training was supplied in practice. Indeed, Chang-Bacon (2022) found considerable variation in the extent to which the 33 interviewed SEI course instructors adhered to the specified course curriculum. Some followed the rules closely, some made small changes to personalize the course, while others modified the course content significantly, adding more theories and social justice perspectives they felt the course was lacking.
The scale at which the RETELL requirement was implemented also imposed challenges associated with teachers taking up the material in a way that informed their practice. For example, many providers had no prior relationship with the educators they taught, which Kennedy (2016) has identified as a potential weakness in PD designs. Furthermore, statewide variation across schools in the social resources necessary to support teachers in implementing what they learned could create variability in the effectiveness of the training at scale in a way that would push the average effect of the training toward zero (Matsumura et al., 2010). In addition, the fact that the PD was mandatory and initiated external to the teachers’ districts raises questions about variation in teacher motivation and ability to enact instructional strategies in ways that reflected the realities of their instructional context. For example, Chang-Bacon (2022) describes teachers who perceived the course as an unnecessary bureaucratic requirement and even nicknamed the initiative “RE-HELL.” Consequently, at least some instructors interviewed reported not believing (or in some cases not caring) that teachers were likely to actually use the strategies covered in the course. Once teachers completed the course successfully, there was no follow-up coaching or observation of their teaching in their own classrooms. Thus, the extent to which teachers actually changed their instruction after completing the training is unknown.
Potential for Spillovers to Non-ELs
Although the primary purpose of the SEI endorsement requirement and the RETELL initiative more broadly was to improve instruction for ELs, the fact that the training was provided to teachers working in general education settings presents an opportunity for spillover effects to non-ELs that the teachers also instruct.
Of course, some techniques for instructing ELs would not likely have effects for non-ELs. For example, August et al. (2005) summarize research showing the importance of taking advantage of connections between a student’s native language and English (Dressler, 2000; Durgunoğlu et al., 1993; García, 1991; Hancin-Bhatt & Nagy, 1994; Jiménez et al., 1996) and ensuring that ELs know the meaning of basic words (Beck et al., 2013; Calderón et al., 2005).
However, there are also several instructional strategies that are likely to be beneficial for both ELs and students proficient in English. Nearly two thirds of the time allocated within the SEI endorsement training under RETELL was dedicated to strategies for instructing students in academic content such as vocabulary, reading, writing, and differentiation. Some of the strategies emphasized by the course, such as modeling and explicit instruction, have also been found to benefit struggling readers (Connor et al., 2011), students with disabilities (Jones et al., 2022), and even the larger general student population (Cohen, 2018). Exposure to such instructional strategies within this PD could also improve a teacher’s general effectiveness in a way that benefits all students they instruct regardless of their facility with English.
Notably, one could also imagine scenarios in which completing the training required to obtain an SEI endorsement could reduce a teacher’s impact on non-ELs in the classroom. For example, the training might reveal to teachers that they have not previously dedicated sufficient time to instructing ELs, which, perhaps appropriately, could divert some attention they previously directed to non-ELs.
Potential for Heterogeneous Effects by Teacher Experience
At the teacher level, we investigate heterogeneity in the effect of obtaining an SEI endorsement by years of prior classroom experience. Despite a widespread perception that PD is more or only effective for early career teachers (Hill et al., 2022), some recent studies have found that later-career teachers can benefit from PD (Papay et al., 2020; Santagata et al., 2010). Nonetheless, that teachers’ skill development is steepest during the beginning of their teaching career (Kraft et al., 2020) suggests the potential for their effectiveness to be more malleable than is the case for later-career teachers.
Furthermore, teachers at different career stages may benefit from different types of PD, which could lead to variation in the effectiveness of any specific PD by teacher experience level. For instance, Floden et al. (2020) suggest that more experienced teachers might benefit from exposure to new expectations for student learning and best practices, while newly prepared teachers might already be well-versed in such practices but would especially benefit from learning strategies for implementing such ideas and practices in the classroom. Researching in the context of a developing country, Loyalka et al. (2019) found a positive effect of PD that was restricted to only “less qualified” teachers. Master et al. (2016) find some evidence that novice teachers in particular benefit from pre-service and in-service training and ESL certification.
In addition, variation in the perceived usefulness of the mandated PD under RETELL and willingness to alter instructional techniques to align with the training could be associated with prior classroom experience. Although Chang-Bacon (2022) did not present such an analysis, it seems plausible that the variation in teacher perceptions of the need for the state-mandated course and hostility toward the requirement could be more prevalent for later-career teachers than for early career teachers.
Data
We use longitudinal administrative data for the universe of Massachusetts public school students and their teachers for school years 2010–2011 through 2017–2018 provided by MADESE. Student-level data include demographic and classification information and math and English language arts (ELA) scores on the state’s spring standardized test, the Massachusetts Comprehensive Assessment System (MCAS), which we standardize by subject, grade, and year to have a mean of 0 and a standard deviation of 1. For ELs, the data also include the student’s proficiency levels on the annual English proficiency test that is used to inform the decision to reclassify a student as no longer an EL. 5 We use data on student and teacher classroom assignments to match students with their teachers. See the Supplementary Appendix in the online version of the journal for a detailed description of the data and matching process.
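The standardization step described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the record keys (`subject`, `grade`, `year`, `raw_score`) and the function name are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which formula was applied.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_scores(records):
    """Add a z-score computed within each subject-grade-year cell.

    `records` is a list of dicts with hypothetical keys
    'subject', 'grade', 'year', and 'raw_score'.
    """
    # Group raw scores by the cell within which scores are standardized.
    cells = defaultdict(list)
    for r in records:
        cells[(r["subject"], r["grade"], r["year"])].append(r["raw_score"])

    # Assumption: population standard deviation; the source does not say
    # whether the population or sample formula was used.
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in cells.items()}

    out = []
    for r in records:
        mu, sd = stats[(r["subject"], r["grade"], r["year"])]
        # The z-score is undefined when a cell has no variation in scores.
        z = (r["raw_score"] - mu) / sd if sd > 0 else None
        out.append({**r, "z_score": z})
    return out


# Toy usage with made-up scores for a single cell.
toy = [
    {"subject": "math", "grade": 4, "year": 2013, "raw_score": s}
    for s in (230, 240, 250)
]
standardized = standardize_scores(toy)
```

Standardizing within cells means that a score of 0 always denotes the statewide average for that subject, grade, and year, so estimates can be read in standard deviation units across grades and years.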
Our primary analyses evaluate the effect of assignment to an SEI-endorsed teacher on a student’s standardized math and ELA scores, with special consideration for the effect for ELs. A notable limitation of focusing on test scores in our context is that their lack of proficiency in English can interfere with ELs’ ability to demonstrate their content knowledge on standardized tests (Abedi et al., 2004; Faulkner-Bond & Sireci, 2015; Kieffer et al., 2009; Lane & Leventhal, 2015; Martiniello, 2009; Robinson, 2010). As our analyses compare ELs with other ELs, this issue is unlikely to bias the estimates, but it could reduce the precision of estimates within models evaluating the impact of the training for ELs. As do other states, Massachusetts provides accommodations to ELs to produce scores that are comparable with students who are proficient in English. In addition to content validity, a Bias Committee annually reviews the appropriateness of test items and passages to ensure that students are not disadvantaged for reasons that are not educationally relevant. 6 Most concretely, the technical documentation for the test reports very similar reliability properties for ELs as for all students in the state (MADESE, 2013). Nonetheless, it is not possible to know the extent to which this issue decreases the precision of our estimates when measuring effects for ELs.
The estimation sample includes observations for students in Grades 4 to 8 and Grade 10 with valid information on included variables. 7 We exclude third-grade students because the analysis controls for the student’s test score in the prior year, and testing begins in the third grade. The state does not administer a ninth-grade test. For tenth-grade students, we control for the student’s test score in the eighth grade.
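The grade restrictions and the choice of prior test score can be illustrated with a short sketch. The data layout (a per-student, per-subject dictionary keyed by spring test year) and the function name are hypothetical, and the rule of taking the most recent eighth-grade score for tenth graders is an assumption made here for students with more than one eighth-grade record.

```python
ANALYSIS_GRADES = {4, 5, 6, 7, 8, 10}


def prior_score(history, grade, year):
    """Return the lagged score used as a control, or None if unavailable.

    `history` maps spring test year -> (grade, standardized score)
    for one student in one subject (hypothetical layout).
    """
    if grade not in ANALYSIS_GRADES:
        # Grade 3 has no earlier test; Grade 9 is not tested.
        return None
    if grade == 10:
        # Tenth graders are matched to their eighth-grade score.
        # Assumption: use the most recent eighth-grade record before `year`.
        eighth = [(y, z) for y, (g, z) in history.items() if g == 8 and y < year]
        return max(eighth)[1] if eighth else None
    # Grades 4 to 8: use the score from the prior school year.
    prev = history.get(year - 1)
    return prev[1] if prev is not None else None


# Toy usage with made-up scores.
history = {2012: (7, 0.1), 2013: (8, 0.4), 2015: (10, 0.6)}
lag_for_grade_10 = prior_score(history, grade=10, year=2015)
```

A student-year without a valid lagged score returns `None` and would fall outside the estimation sample, which matches the requirement of valid information on included variables.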
We include students whom we can match to only a single teacher in the respective subject. Results are similar if we include students assigned to multiple teachers in a subject and we randomly choose the teacher to account for in the regression. We include only instructors who are classified as a “teacher,” and thus we omit other classifications such as coteachers and paraprofessionals.
We combine the administrative data with records from the SEI course, also provided by MADESE. For each year from 2012–2013 (pilot year) through 2016–2017, the data identify each teacher who enrolled in the SEI course, their completion date, course type (full, long-bridge, short-bridge, administrator), and indicate whether the teacher passed or failed the course. 8 A table in the Supplementary Appendix in the online version of the journal reports the distribution of the months that teachers completed the training, weighted by the students they instructed. Most teachers completed the training either in January, June, or August. For the purposes of the analysis described below, we classify a teacher as having been trained during a particular school year if they had successfully completed the training by the September of that school year. Results in a separate table in the Supplementary Appendix in the online version of the journal show that if we instead classify a teacher as trained if they had completed the training by January of the school year—thus including teachers who were trained by midyear—the results are in a similar direction but move toward zero.
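As a minimal sketch of the classification rule described above, the function below flags a teacher as trained for a given school year. The name `is_trained` and its parameters are hypothetical, and the reading that "by September" means on or before the last day of that month is an assumption, since the text does not state the exact cutoff day.

```python
from datetime import date
from typing import Optional


def is_trained(completion: Optional[date], school_year_start: int,
               cutoff_month: int = 9) -> bool:
    """Classify a teacher as trained for the school year that begins in
    the fall of `school_year_start` (e.g. 2014 for 2014-2015).

    cutoff_month=9 mirrors the main rule (completed by September);
    cutoff_month=1 mirrors the alternative rule (completed by January).
    Assumption: "by" a month means on or before its last day.
    """
    if completion is None:  # never completed (or did not pass) the course
        return False
    # Months 9-12 fall in the fall calendar year; months 1-8 in the next one.
    cutoff_year = school_year_start if cutoff_month >= 9 else school_year_start + 1
    # First day of the month following the cutoff month.
    if cutoff_month == 12:
        limit = date(cutoff_year + 1, 1, 1)
    else:
        limit = date(cutoff_year, cutoff_month + 1, 1)
    return completion < limit
```

Under these assumptions, a teacher who finished in January 2015 is untrained for 2014-2015 under the September rule, trained for 2014-2015 under the January rule, and trained for 2015-2016 under either rule.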
Table 1 presents descriptive statistics for relevant variables.
Table 1. Descriptive Statistics: ELA and Math Baseline Samples
Note. ELA = English language arts; EL = English learner.
Identifying the Causal Effect of SEI Endorsement Training
The goal of this article is to uncover the causal effect that obtaining an SEI endorsement under the RETELL initiative had on a teacher’s contribution to student outcomes. Our ideal experiment would be to randomly assign the training to teachers after students have already been assigned to classrooms. However, as we discussed in the section “RETELL and the SEI Endorsement Requirement,” the SEI endorsement training was rolled out to schools and districts on the basis of observables. Furthermore, much of the training was conducted during school breaks and hence occurred prior to the mapping of students to classrooms.
We expect that a naive comparison of outcomes between students with trained and untrained teachers will be biased by two sources of selection. First, the rollout of the program across the state effectively prioritized teachers in urban and underperforming school districts. This source of selection would lead us to conflate student socioeconomic characteristics with the impact of completing the training. Thus, we expect that the naive contrast is biased downward. Second, we worry that after teachers have received training, administrators within schools may endogenously sort students to trained teachers on the basis of ability. If trained teachers are more likely to receive EL or other typically low-performing students, this second source of selection would also lead the naive contrast to be biased downward.
We address these sources of selection empirically by leveraging cross-teacher variation in the timing of training via a two-way fixed-effect model, also known as a generalized difference-in-difference strategy. 9 The first difference is across teachers; the second difference is over time. Intuitively, we would like to compare the classroom-level trend in test score gains among teachers who have received the training to the trend among teachers who have not. Because the comparison across teachers is via trends, our identification strategy addresses the first source of selection by effectively differencing out variation in the socioeconomic and demographic composition of students across schools and districts. To address the second source of selection, we hold constant test scores from prior years and hence focus our attention on trends in test score gains. That is, we ask, “Do average test score gains increase suddenly in the classrooms of teachers who receive the training relative to the classrooms of teachers who do not?” Compositional changes emerging from the endogenous sorting of students to teachers are thus effectively controlled for by differencing out average student ability at the classroom level.
This intuition leads us to estimate variations of the following two-way fixed-effects regression:
where $y_{it}$ is the test score of student $i$ at time $t$; $\delta_{j}$ is a fixed effect for teacher $j$, with $j = j(i, t)$ representing a one-to-one mapping between students and the teachers to whom they have been assigned at time $t$; $\varphi_{t}$ is a time period fixed effect; $\tau_{jt}$ is an indicator for whether teacher $j$ has completed the SEI endorsement training as of September of school year $t$; $X_{it}$ are controls for the demographic characteristics for student $i$ at time $t$ 10; and $\epsilon_{it}$ is a projection residual which is uncorrelated with the included regressors by definition (Angrist & Pischke, 2008).
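For reference, a regression consistent with these definitions could be written as follows. This is a reconstruction from the stated definitions and not the typeset equation from the article: the coefficient vector $\gamma$ is a label introduced here, and the prior-year test score that the text describes as a control is assumed to enter alongside the demographic characteristics in $X_{it}$.

```latex
\begin{equation}
y_{it} = \delta_{j} + \varphi_{t} + \beta\,\tau_{jt} + X_{it}^{\prime}\gamma + \epsilon_{it},
\qquad j = j(i, t)
\end{equation}
```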
Our parameter of interest is β. Without further assumptions, β identifies a weighted average of the underlying two-by-two classroom-level difference-in-difference comparisons (Goodman-Bacon, 2018). 11 Provided the average classroom-level trend in test score gains of teachers who have not received or not-yet received the training is an accurate counterfactual for the average classroom-level trend of teachers who did receive training, β is properly interpreted as the causal effect that obtaining an SEI endorsement through the RETELL initiative has on student outcomes. This restriction on the treatment group counterfactual trend needed for causality is commonly referred to as a parallel trend assumption. The results from event-study style regressions that we report as an identification test in the “Identification Test” section suggest that the parallel trends assumption holds in our case and is consistent with providing a causal interpretation to the results.
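The mechanics of a two-way fixed-effects estimate can be illustrated with a small simulation. The sketch below is not the authors’ estimation code: it works on a hypothetical balanced teacher-by-year panel rather than student-level data, omits covariates, and uses made-up teacher effects, year effects, and a made-up true effect of 0.05. On a balanced panel, subtracting teacher means and year means and adding back the grand mean removes both sets of fixed effects.

```python
from collections import defaultdict


def _double_demean(rows, idx):
    """Subtract teacher and year means of column `idx`, add the grand mean."""
    unit, time = defaultdict(list), defaultdict(list)
    for r in rows:
        unit[r[0]].append(r[idx])
        time[r[1]].append(r[idx])
    grand = sum(r[idx] for r in rows) / len(rows)
    unit_mean = {k: sum(v) / len(v) for k, v in unit.items()}
    time_mean = {k: sum(v) / len(v) for k, v in time.items()}
    return [r[idx] - unit_mean[r[0]] - time_mean[r[1]] + grand for r in rows]


def twfe_beta(rows):
    """Two-way fixed-effects slope for rows of (teacher, year, treated, outcome).

    Assumes a balanced panel with exactly one row per teacher-year.
    """
    d = _double_demean(rows, 2)
    y = _double_demean(rows, 3)
    return sum(a * b for a, b in zip(d, y)) / sum(a * a for a in d)


# Hypothetical data: 3 teachers, 4 years, true effect 0.05, no noise.
teacher_effect = {"A": 0.20, "B": -0.10, "C": 0.00}
year_effect = {2013: 0.00, 2014: 0.02, 2015: 0.01, 2016: 0.03}
first_trained_year = {"A": 2014, "B": 2016, "C": None}  # C is never trained

panel = []
for j, a in teacher_effect.items():
    for t, f in year_effect.items():
        start = first_trained_year[j]
        treated = 1 if start is not None and t >= start else 0
        panel.append((j, t, treated, a + f + 0.05 * treated))

beta_hat = twfe_beta(panel)  # recovers 0.05 up to floating-point error
```

Because the simulated outcome has a constant treatment effect and no noise, the estimate equals the true effect; with real data, the interpretation depends on the parallel trend assumption discussed above.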
We use the student’s prior test score to account for nonrandom sorting of students to teachers. Thus, the model directly accounts for the possibility that administrators choose to sort lower-performing students into a teacher’s classroom once they have obtained an SEI endorsement. Our results will be biased if student assignments to a teacher’s classroom are related to their obtainment of an SEI endorsement in a way that is not captured in the student’s test score during the previous year.
We acknowledge here that there is some subtlety to the interpretation of the estimand of the two-way fixed-effects model when there is variation in treatment timing. Because teachers obtain endorsements at different time periods, Model 1 identifies a variance-weighted average of any underlying heterogeneous treatment effects rather than the Average Treatment Effect (ATE). Specifically, a teacher’s impact on the overall estimate is proportional to the number of periods they were observed before and after obtaining the endorsement, with those treated in the middle of the panel contributing more weight to the estimate than those who were treated at the extremes (Goodman-Bacon, 2018). Of particular concern is the fact that the weights can be negative when the treatment effect changes over time (Goodman-Bacon, 2018; de Chaisemartin & d’Haultfoeuille, 2020). However, this is unlikely to be a problem in our application, as we find little evidence of strong dynamic effects in our event-study type specifications. Furthermore, we note that despite being a potentially biased estimate of the ATE, the variance-weighted average can also be a substantially lower variance estimator (Angrist & Pischke, 2008). Ultimately, the goal of this article is to produce an estimate of the RETELL treatment effect that is as close as possible to the actual impact of the policy, in which case trading in some bias in exchange for a reduction in variance can be desirable (Friedman et al., 2001).
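The claim that teachers treated in the middle of the panel receive more weight can be illustrated with the variance of the treatment indicator. The sketch below computes only that variance term, which is one ingredient of the weights in the decomposition cited above (group sizes also matter); the six-period panel and the function name are hypothetical.

```python
def treatment_share_variance(first_treated_period: int, n_periods: int) -> float:
    """Variance of a 0/1 treatment indicator for a unit observed in
    periods 1..n_periods and treated from `first_treated_period` onward."""
    treated_periods = n_periods - first_treated_period + 1
    share = treated_periods / n_periods
    return share * (1.0 - share)


# Hypothetical six-period panel; treatment can begin in periods 2 through 6.
n_periods = 6
variances = {start: treatment_share_variance(start, n_periods)
             for start in range(2, n_periods + 1)}
```

In this example the variance term is largest for a teacher first treated in period 4, who is treated for half of the panel, and smallest for teachers first treated in periods 2 or 6.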
Note that our preferred specification analyzes the data at the student level rather than the (arguably more natural) specification which aggregates the data to the teacher level. This choice affords us two benefits: First, it transforms our estimate into a student-weighted average which we believe is the natural way to interpret the program impact; second, this allows us to leverage potential efficiency gains afforded by the inclusion of student-level covariates. However, as training is assigned to teachers (not students), we cluster our standard errors at the teacher level to account for within-teacher autocorrelation in training status in accordance with the design-based view of Abadie et al. (2017).
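To make the clustering choice concrete, the sketch below computes a cluster-robust standard error for a single regressor. It is a simplified illustration, not the authors’ procedure: it assumes the outcome and treatment have already been demeaned of fixed effects and controls, uses the basic sandwich formula without a small-sample correction, and runs on made-up numbers.

```python
from collections import defaultdict
from math import sqrt


def clustered_se(x, y, cluster):
    """Slope and cluster-robust standard error for a no-intercept
    regression of y on a single regressor x.

    x and y are assumed to be already demeaned of fixed effects and
    controls; no small-sample correction is applied.
    """
    sxx = sum(a * a for a in x)
    beta = sum(a * b for a, b in zip(x, y)) / sxx
    # Sum the score x_i * e_i within each cluster before squaring.
    score = defaultdict(float)
    for a, b, g in zip(x, y, cluster):
        score[g] += a * (b - beta * a)
    var = sum(s * s for s in score.values()) / (sxx * sxx)
    return beta, sqrt(var)


# Hypothetical demeaned data: four students taught by two teachers.
x = [1.0, 1.0, -1.0, -1.0]
y = [2.0, 2.0, -1.0, 1.0]
beta, se_teacher = clustered_se(x, y, ["T1", "T1", "T2", "T2"])
_, se_student = clustered_se(x, y, [0, 1, 2, 3])  # each student its own cluster
```

In this toy example the residuals are positively correlated within teacher, so the teacher-clustered standard error is larger than the one that treats each student as independent.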
We also note here that our choice to control for lagged test scores (and hence focus on gains) does impose a nontrivial cost. While this choice allows us to address the endogenous sorting of students to teachers, it will also force us to interpret our estimate as representing a pure short-run effect. To the extent that having an SEI endorsed teacher last year affects a student’s test scores in the current year, this causal effect of the training will also be indirectly controlled for by the inclusion of the student’s test score from last year in the regression. 12 We address this concern in the Supplementary Appendix in the online version of the journal by showing that having a trained teacher in the prior year does not have an independent effect on this year’s test scores, conditional on the student’s prior-year test score. Thus, we believe the benefits of addressing the potential for endogenous sorting by focusing on test score gains outweigh the associated cost.
In addition to estimating the effect of obtaining an SEI endorsement under RETELL at all, we also report the results from models that differentiate between teachers who completed the full training and those who completed one of the shorter supplemental versions of the training. 13 We might expect the effect of obtaining the endorsement to differ by type of training both because of differences in the prior exposure to similar training that led teachers to be assigned to their particular training and also because of differences in the amount of new material that was presented to the teachers.
From a policy perspective, we are primarily interested in the effect of having a trained teacher for students who are currently classified as an EL, as this is the group of students that RETELL was specifically designed to affect. When the sample is restricted to include only current ELs, the regression includes controls for indicators of the student’s proficiency in English at the end of the previous year as measured by their performance level on the proficiency assessment that the state uses to inform reclassification decisions. 14 The estimates are nearly unchanged when this variable is excluded.
EL is a transient category in that students are no longer classified as an EL after they are deemed to have sufficient proficiency in English. It is possible, however, that students might continue to respond to the instructional strategies taught within the SEI endorsement course. Thus, we also report results for the effect of having a trained teacher on the performance of Ever-ELs, which includes both current ELs and former ELs who have been reclassified as fluent-English-proficient. 15 When the sample is restricted to include Ever-ELs, we control for the student’s prior proficiency level on the Massachusetts English Proficiency Assessment (MEPA) if it is available. These controls account for differences in the student’s ability to understand English, which is part of the data-generating process for student math and ELA test scores.
As described in the section “Potential for Spillovers to Non-ELs,” we are also interested in investigating the potential that a teacher completing the training under RETELL had spillover effects for non-ELs they instruct. In particular, we report results for estimates within samples restricted to include students with disabilities receiving special education services, and the broader group of non-ELs. Because the large majority of students in these groups were never classified as an EL and thus did not take the MEPA assessment, when estimating within these samples, the regression does not include a control for prior English proficiency level.
In addition to the main results evaluating the average effect of the training on teacher impacts, we also present models looking for heterogeneity in the treatment effect by teacher experience level. Unfortunately, our data do not include specific information on the teacher’s prior years of classroom experience, and thus we use the teacher’s first hire date within the district as an imperfect but reasonable proxy. We bin teachers according to years since first hired within the district into categories that largely represent quartiles for the workforce in Massachusetts. For these models, we interact treatment with the teacher’s experience bin. We use the Bonferroni correction when inferring differences between groups to account for the assessment of multiple comparisons.
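The Bonferroni correction mentioned above can be sketched in a few lines. The p-values below are hypothetical and are not results from the article; the function simply multiplies each p-value by the number of comparisons and caps the result at 1.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply each p-value by the number
    of comparisons and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values for three pairwise comparisons between experience bins.
adjusted = bonferroni([0.01, 0.04, 0.50])
```

With three comparisons, a raw p-value of 0.04 becomes roughly 0.12 after adjustment, so a difference that looks significant at the 5% level in isolation no longer clears even the 10% threshold.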
To evaluate the robustness of our results and contribute estimates of potential policy interest, we report the results from several additional models in the Supplementary Appendix in the online version of the journal. First, to address the large demographic differences across cohorts and the potential that the training changed over time, we show that there are not major differences in the estimated effect by either the teacher’s cohort under RETELL or the year in which the teacher completed the training. We also show that our results do not appear to be driven by differences in the impact across grade levels. We find some differences in the effect based on the student’s prior proficiency level on the MEPA/ACCESS assessment, although these estimates are too imprecise to interpret confidently.
Results
Average Effect of Obtaining an SEI Endorsement on Student Math and Reading Scores
Table 2 reports the results from regressions evaluating the effect of completing the SEI endorsement training on teachers’ impact on average student ELA and math scores. For each subgroup, the table reports results from models that evaluate the effect of completing the training overall as well as a model that differentiates according to whether the teacher received the full training or one of the shorter versions.
Table 2. Average Effect of Training on Student Math and ELA Scores
Note. Standard errors in parentheses. ELA = English language arts; EL = English learner; MEPA = Massachusetts English Proficiency Assessment.
*p < .10. **p < .05. ***p < .01.
For ELs, our subgroup of primary interest, we find no significant effect from completing the training overall in either ELA or math. The results from models that include all students who are observed as an EL at some point in the data (Ever-ELs) are quite similar to the results from models restricted to current ELs. For both subjects, the estimated effect of the training is negative, and the coefficients are estimated precisely enough to detect meaningful effects. The point estimates from models that separately estimate by the type of training suggest the potential that the full training had a positive effect for current ELs and Ever-ELs. However, these models lack the necessary statistical precision to definitively distinguish the estimate for the full version from the estimates for the bridge versions.
We find some evidence of positive spillover effects for students with disabilities and for the larger group of Never-ELs on the ELA exam overall and on the math exam for those who completed the full training. Although statistically significant, the magnitude of these potential spillover effects on the ELA exam is modest. Again, these models cannot statistically distinguish the effect of the full training from the effect of the bridge versions of the training.
Effect by Teacher’s Years of Employment Within District
The results illustrated in Figure 2 suggest that the training had a significant positive effect on the performance of teachers hired within the previous 3 years. For ELs, the estimate is positive but imprecisely estimated on the ELA exam. The effect for ELs is statistically significant on the math test for teachers in secondary grades. The magnitude of the effect of the training for these newly hired teachers when combined across grades (between 0.03 and 0.06 standard deviations) is quite consistent across subjects, grade level, and subgroup. In contrast, we find no evidence that the training had an impact on the performance of teachers who have been working within the district for longer than 3 years. The imprecision in the ELA analysis leads all comparisons across categories for years since hired to be statistically insignificant. However, several of the differences between newly hired teachers and teachers with more in-district experience remain statistically significant at the 10% level after applying the Bonferroni adjustment for multiple comparisons.

Figure 2. Effect by teacher years since hired within district. (a) ELA. (b) Mathematics.
Identification Test
The primary assumption required to interpret β as the causal effect of completing the training on the outcomes of a teacher’s students is that there are no time-variant factors that are associated with both the timing of the teacher completing the training and the outcomes of students in her class. We test the plausibility that this assumption holds by conducting an event-study analysis that measures changes in teacher effects in the years leading up to and following the training. In particular, we estimate a regression taking the form,
where k is an index for the number of years from the teacher’s training, such that k = 0 represents the year prior to completing the training. We estimate Equation 2 for each subgroup and for a variety of different categorizations.
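For reference, an event-study regression consistent with this description could be written as below. This is a sketch assembled from the surrounding definitions and not the typeset Equation 2: $t^{*}_{j}$ denotes the school year immediately before teacher $j$ completes the training (so that $k = 0$ in that year, as stated above), the omitted reference period $k_{0}$ and the coefficient vector $\gamma$ are labels introduced here, and the remaining symbols follow the main two-way fixed-effects model.

```latex
\begin{equation}
y_{it} = \delta_{j} + \varphi_{t}
  + \sum_{k \neq k_{0}} \beta_{k}\,\mathbf{1}\{\, t - t^{*}_{j} = k \,\}
  + X_{it}^{\prime}\gamma + \epsilon_{it},
\qquad j = j(i, t)
\end{equation}
```

Under this formulation, coefficients $\beta_{k}$ for periods before the training that are close to zero would be consistent with the parallel trend assumption, while the coefficients for later periods trace out the post-training effect.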
Figure 3 illustrates results from models looking at the impact of the training on the average test scores for ELs, students with disabilities, and students never observed to be ELs for the full sample and for samples restricted by grade level. Consistent with the results in Table 2, there is no clear post-training difference in the outcomes for ELs but there appear to be positive post-treatment effects for students with disabilities and students never observed as ELs. 16

Figure 3. Identification test.
The patterns illustrated in the figures are generally consistent with applying a causal interpretation to the significant impacts identified in the regressions. In the cases where we previously found impacts from the training—the ELA exam for students with disabilities and Never-ELs—the pattern is for there to be no significant difference in years prior to completing the training followed by a jump in performance in the year immediately following the training.
Discussion and Conclusion
We use a generalized difference-in-difference design to estimate the causal effect of completing an intensive PD to obtain an SEI endorsement under Massachusetts’ RETELL initiative on in-service core academic public school teachers’ effectiveness. We find that a teacher completing the endorsement did not achieve the primary goal of significantly improving their impact on ELs they instruct, on average. However, we identify small spillover effects to other students in the classroom and potentially important heterogeneous impacts by teacher experience level. The existence of such spillovers suggests that the widespread training requirement did invoke a response and led to some learning improvements across the state.
Because the course emphasized several strategies that could benefit all students regardless of their language proficiency status, the existence of small spillovers for non-ELs is not itself surprising. However, it is at least somewhat puzzling for such spillovers to exist despite the training having no discernible impact for the students to which the training was targeted. One possible explanation is that receiving instruction within the SEI model itself limits ELs’ ability to respond to changes in teacher practices. The SEI instructional model essentially supports ELs’ fitting into the existing model of education that centers the experiences of White monolingual middle-class students. It is possible that ELs could be more responsive to the instructional strategies taught in the course if they were instead learning within a two-way bilingual education program. That is, improving teachers’ monolingual instruction alone may have limited impact on ELs’ academic learning. However, the existing data do not allow us to directly investigate this or other potential factors underlying this pattern of results.
Considered with the understanding that the instructional strategies taught under the required training would be expected to benefit all students, we present an example of an intensive PD distributed to teachers at an expansive scale leading to measurable teacher quality improvements. However, the positive spillover benefits that we identify for students with disabilities and the large subgroup of Never-ELs are small. It remains unclear to what extent the potential impact of the training for both ELs and non-ELs was muted by challenges associated with implementing such a mandate at large scale. It is important for future research to distinguish the conditions under which training that is effective for small groups of teachers within the context of a supportive environment can maintain its impact within the context of a mandated training at a large scale.
Our finding that the statewide mandated training benefited teachers recently hired by the district but not longer serving teachers is both policy relevant and worthy of future examination. Our results are consistent with Loyalka et al.’s (2019) analysis of a mandated PD in China, which found a positive effect that was restricted to only “less qualified” teachers. In a practical sense, if required widespread PD only benefits recently hired teachers, then the potential for large-scale PD as a tool for making immediate and widespread improvements to the teacher quality distribution could be quite limited. However, especially if the training has similar benefits for pre-service teachers, there could ultimately be a delayed but substantial improvement in instruction within the state over time, including for ELs, as a new generation of teachers who have been exposed to the training early in their career expands through the workforce.
We encourage additional research not only documenting the impact of widespread PD by teacher experience but, if our result holds in other contexts, also establishing the factors that produce this differential response. For instance, if teacher needs from PD differ based on the teacher’s career stage, then we would expect differences in the effectiveness of a uniform required PD for more and less experienced teachers. Furthermore, mid- and late-career teachers might be less willing to change their instructional practices in response to a mandated training, in which case policymakers may need to explore PD approaches that are particularly impactful for teachers who have established teaching practices.
Finally, our results are relevant to research on effective instructional practices, although we caution that the interpretations for our findings are limited in this area. In particular, it is important to keep in mind that we measure the causal effect of obtaining an SEI endorsement under RETELL, which is likely not the same as the effect of a teacher utilizing the instructional strategies taught in the training. We do not observe the extent to which teachers changed their instructional practices. Furthermore, it is possible that the expansive nature of the training across the state affects the quality and consistency of the training. Future research using causal research designs is necessary to more fully understand how teachers’ take-up of instructional strategies learned in required training affects the academic performance of ELs and other students.
Supplemental Material
Supplemental material, sj-pdf-1-epa-10.3102_01623737221136101 for Professional Development at Scale: The Causal Effect of Obtaining an SEI Endorsement Under Massachusetts’s RETELL Initiative by Jesse Bruhn, Nathan Jones, Yasuko Kanno and Marcus A. Winters in Educational Evaluation and Policy Analysis
Footnotes
Acknowledgements
The data can be obtained by filing a request directly with the Massachusetts Department of Elementary and Secondary Education. The authors are willing to assist. We would like to thank the Massachusetts Department of Elementary and Secondary Education for providing the data necessary to conduct the analysis and for valuable feedback. In particular, we are grateful for assistance from Carrie Conaway, Matthew Deninger, Paul Aguiar, Kendra Winner, Sibel Hughes, Zhaneta Liti, Aubree Webb, and Elana McDermott. We received valuable feedback from Martin West, Matthew Kraft, and participants at the annual conference for the Association for Education Finance and Policy and the Applied Microeconomics seminar series at Boston University. Vittoria Dicandia and Thomas Pearson provided excellent research assistance. The views expressed and any errors remaining are the authors’ alone. The authors have no other interests to declare.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We are grateful to the W. T. Grant Foundation for providing financial support for this research. The Foundation had no other direct involvement in the research.
Supplemental Material
Supplemental material for this article is available online.
Notes
Authors
JESSE BRUHN, PhD, is an assistant professor of economics at Brown University, Providence, RI, USA. His research focuses on labor economics.
NATHAN JONES, PhD, is an associate professor of special education and education policy at Boston University, MA, USA. His research focuses on teacher quality and teacher development. A particular focus over the last several years has been on the quality of instruction received by students with disabilities in both general education and special education.
YASUKO KANNO, PhD, is an associate professor of language education at Boston University Wheelock College of Education & Human Development, MA, USA. Her research focuses on high school English learners’ access to postsecondary education.
MARCUS A. WINTERS, PhD, is an associate professor and chair of the Department of Educational Leadership and Policy Studies at the Boston University Wheelock College of Education & Human Development and faculty director of the Wheelock Educational Policy Center, MA, USA. He is an applied micro-economist with research interests within K–12 education policy, especially topics related to school choice and teacher labor markets.
References
