Abstract
Limited scholarship has examined school districtwide turnaround reforms beyond the first few years of implementation or efforts to replicate successes in new contexts. We studied Massachusetts, home to a state takeover of the Lawrence school district that led to academic gains in early reform years and where state leaders attempted to replicate this success in three additional communities. We used statewide student-level data (2006–7 to 2018–19) and event study methods to estimate medium-term impacts on student outcomes. We found that improvements were largely sustained in Lawrence. We observed evidence of successful replication in the Springfield Empowerment Zone but not in Holyoke or Southbridge. Cases with positive outcomes struck a distinctive balance between state and local decision-making authority, suggesting that multilevel governance can provide one pathway for effective state-led school district improvement.
Introduction
Identifying the optimal balance of state and local authority for improving the performance of public institutions is an enduring puzzle for scholars across a wide range of disciplines. We studied this topic in the context of K–12 educational systems via a cross-case examination of four public school systems targeted by the state of Massachusetts for improvement reforms based on perceived low performance. States often and increasingly take the lead when they perceive local school districts are underachieving. Despite states’ constitutional responsibilities for public school systems, these state-led efforts often lack legitimacy with the public and have a mixed track record of success.
We sought to understand whether states can sustain gains made at the start of districtwide improvement initiatives, whether they can replicate effective systemwide reforms from one context to another, and whether successful reforms have features in common that can inform state-initiated district improvement policy more broadly. We therefore addressed three research questions:
Was Massachusetts able to sustain the early achievement gains in Lawrence—the first community in our sample targeted for state-led district improvement?
Was the state able to replicate the early Lawrence gains in three other low-performing contexts—Springfield, Holyoke, and Southbridge?
Were the impacts of these reforms consistent with the theory of action behind what we will call multilevel governance?
We found that Massachusetts was largely able to sustain the initial gains made in Lawrence up to 7 years after reforms began, although sustaining the rate of growth appeared more complicated because the rate of the initial gains leveled off over time. The state also was largely able to replicate the Lawrence successes in the Springfield context, where reforms generated large positive effects on several outcomes, at least through their fourth year (the farthest year we studied). In contrast, leaders had less success in Holyoke and Southbridge, where reforms led some outcomes to dip, at least in the 3–4 years we examined. The two more successful cases shared common features in their approach to improvement that aligned with a theory of action behind multilevel governance. In these contexts, leaders capitalized on the advantages of the governance shift away from an elected school board while actively guarding against the potential pitfalls of state takeover by incorporating local representation into district- and school-level decision making. Although we are not in a position to make strong claims about why some cases worked better than others given our study setup, our work provides suggestive evidence for one pathway through which state-level leaders pursuing district turnaround might find an effective balance between state and local authority for school system improvement. Future research should test this theory in a more direct and rigorous way.
Motivation: Why Study the Sustainability and Replicability of State-Led School District Improvement?
Educational inequality based on race, ethnicity, and socioeconomic status remains unacceptably high in the United States (Hashim et al., 2020; Hanushek et al., 2019). One way policymakers attempt to confront such inequality is through turnaround reforms, which represent efforts to rapidly improve outcomes for students in the public K–12 schools identified as among the lowest performing in a given state. These schools disproportionately serve low-income children and students of color, and therefore, improving them could put a meaningful dent in educational inequity. The federal government has devoted significant funding to these turnaround efforts, including the current Comprehensive Support and Improvement program, which provides notable leeway for local leaders to design policy responses (Meyers et al., 2022). Future policy developments are likely to push even greater decision making to the state and local levels. This policy context therefore motivates the continued need for research on best practices in school improvement policy to inform state and local decisions.
The existing turnaround literature has focused heavily on school-level improvement efforts; however, there are theoretical and empirical reasons to believe that more attention should be paid to district-level reforms. Scholars from organizational behavior, economics, and political science traditions alike have made the theoretical case that school districts play a crucial role in the education-production function (Blazar & Schueler, 2023). Scholars of education have argued that districts may have greater capacity to create the conditions and systemwide coherence needed for schools to succeed than individual schools on their own (Johnson et al., 2015; Peurach, 2011; Supovitz, 2006; Zavadsky, 2013). Quantitative evidence decomposing variations in student achievement illustrates that districts play a nontrivial role in producing these outcomes (Chingos et al., 2015). In the context of turnaround, a recent meta-analysis found suggestive evidence that districtwide improvement efforts were associated with greater gains on average than school-level turnaround reforms (Redding & Nguyen, 2020; Schueler et al., 2021). More generally, a recent report from a long list of prominent scholars and leaders highlights the pressing need for new research focused on district leadership and policy (Schwartz et al., 2023).
Another limitation of the existing turnaround research is that it has tended to focus on the short-term impacts of reforms, with limited attention paid to whether initial impacts are sustained over time. A meta-analysis of evaluations of turnaround interventions implemented in the post–No Child Left Behind (NCLB) period of universal test-based accountability found that two thirds of estimates examined <3 years of reform implementation (Schueler et al., 2021). This is an unfortunate omission because the evidence prior to NCLB concluded that school improvement takes time, particularly given that these reforms are often undertaken in contexts of concentrated disadvantage and therefore require systemic solutions (Cohen et al., 2014). More specifically, scholars argue that typically at least 3 years of reform implementation are necessary for these interventions to demonstrate results, if not more (Borman et al., 2003; Desimone, 2002; Gross et al., 2009; Peurach & Neumerski, 2015). Therefore, more research is needed on the impacts of districtwide turnaround reforms beyond the first 2 years of implementation.
The few studies of post-NCLB district improvement that examine outcomes beyond the first few years yielded mixed results. Harris and Larsen's (2022) work on New Orleans, Louisiana, showed that reforms enacted in the aftermath of Hurricane Katrina generated notable improvements in student achievement that persisted up to 8 years later. Pham et al. (2020) documented more mixed results in Tennessee, with positive effects from the iZone reforms (driven by earlier cohorts) and null effects for the Achievement School District up to 6 years into the initiatives. Chin et al. (2019) studied districtwide reforms in Newark, New Jersey, and found initial declines in achievement followed by a rebound, ultimately resulting in modest improvements in reading achievement growth but no change in math after 5 years of implementation. Given the variation across this small set of case studies, further research is needed concerning what makes some district improvement efforts more successful in the long run than others.
A final limitation of the existing literature is that there has not been significant attention paid to the issue of whether districtwide reforms are replicable across contexts. The case studies have often focused on situations that were exceptional in some way and/or the first of their kind in their respective contexts. For example, the New Orleans reforms were implemented in the context of a major natural disaster and resulted in what is now the only all-charter-school district in the country. Prior research in Massachusetts has focused on the Lawrence turnaround, which represented the first state takeover enacted after the passage of a new law giving the state greater authority when intervening in low-performing systems (Schueler et al., 2017). Therefore, it is important to understand whether it is possible to transport effective districtwide reforms and successfully apply them in new contexts. There is also an open question of whether states have the capacity to effectively support more than one takeover at a time as they try to replicate successes. On the one hand, it could be difficult for state agencies to support multiple districts at the same time (Torres, 2024). On the other hand, states could learn from earlier interventions and get more effective at supporting improvement over time.
The broader literature on the sustainability and scalability of district improvement problematizes the notion that the field simply needs to identify best practices for turnaround and then implement them with fidelity in new places (Elmore, 2016; Honig & Hatch, 2004). Notably, Peurach (2011) argued that reformers often act as though there is a quick fix to educational challenges that can be implemented in a top-down manner without a need to grapple with the true complexity of school improvement work. Complexity is reflected in the numerous interdependencies between districts, schools, interventions, and support organizations as well as between reforms and the contexts in which they are implemented. In other words, it's not just a matter of identifying what works but rather what combinations of approaches work for whom, when, and where (Strunk, 2023), as well as how leaders can facilitate continuous improvement. In fact, standardizing reforms may be harmful if this results in the loss of local influence. Instead, scholars argue that leaders should work collaboratively to retain practical local knowledge to enhance policy effectiveness (Peurach & Glazer, 2012; Peurach & Neumerski, 2015). A related literature focuses on teachers as implementers of school improvement policies and the need to secure their buy-in to ensure reform success (e.g., Higgins, 2022; Leithwood & Strauss, 2008; Weatherly & Lipsky, 1977). Therefore, an important empirical question is whether variation in attention to issues of complexity, local input into decision making, and implementation challenges explains differences across districts in the success of systemwide reforms.
Conceptual Framework: The Theory for Multilevel Governance
State Takeover as a District Improvement Strategy
One approach to districtwide turnaround that has become increasingly common over time is state takeover, which represents a change in educational governance that typically removes authority from a locally elected school board and places decision-making power with the state. Studies from the pre-NCLB era found that states had little success in improving academic outcomes via takeover (Wong & Shen, 2003). More recent research has shown that takeovers yielded no academic benefits, on average, for targeted districts (Schueler, 2024). Although test-score effects are null on average, scholars have found significant heterogeneity of takeover effects across districts (i.e., some positive, others negative, and still others producing neutral effects) (Schueler & Bleiberg, 2022). Therefore, takeover is not a silver bullet but appears to have benefited student outcomes in some cases. Furthermore, researchers observe heterogeneity of impacts both across and within states, suggesting that state capacity does not explain all the variation in effects. In other words, it does not seem to simply be the case that some states are better at implementing takeovers than others. This motivates the need for cross-district case studies within states, such as this one, to begin to reveal what makes some takeover efforts more successful at enhancing academic achievement than others.
Multilevel Governance
One possibility is that state-initiated turnaround using multilevel governance is more likely to generate and sustain improvements to student outcomes. Multilevel governance refers to a distribution of decision-making authority across various levels of government and types of organizations (Hooghe et al., 2020). Here we are using multilevel governance to refer to an approach that balances state and local power over decision making. The four cases under study all represent state-initiated reforms, with variation in the extent to which leaders emphasized building in formal mechanisms for local authority. By multilevel governance, we do not simply mean creating opportunities for local communities to provide input or feedback to decision makers but rather that local representatives have some formal authority over decision making.
Figure 1 provides an illustration of the theory of action through which we might expect multilevel governance to moderate the effectiveness of state-level district turnaround. Multilevel governance may be beneficial for improving the effectiveness of policy design. Capitalizing on local knowledge and allowing local input into decision making could lead to the effective matching of interventions to the district contexts and to individual schools within a district, such as the design of extended learning time calendars tailored to an individual school's needs or the replacement of particular staff members not well suited to particular school environments. Multilevel governance also may improve the political reception that reforms receive by minimizing the loss of local political and economic power. Policy effectiveness and positive public reception are reinforcing, illustrated by the double-headed arrow in Figure 1, because quick wins garner public support for reforms and politically palatable policy avoids distracting leaders from important implementation work. To consider the potential merits of this multilevel governance approach, we begin by outlining the key theoretical arguments for why state takeover may or may not improve school system performance, as related to four dimensions: representation, interest groups, capacity, and efficiency. For each dimension, we explain the advantages and disadvantages of takeover and then explain how multilevel governance could capitalize on the advantages of takeover while overcoming disadvantages. We summarize this section in Figure 2, which includes the advantages (column 1) and disadvantages (column 2) of takeover on each dimension (the rows) and lists the features of multilevel governance that could resolve these tensions (column 3).

Figure 1. Theory of change.

Figure 2. Theory of how multilevel governance capitalizes on advantages of state takeover while overcoming disadvantages.
Representation
The theory of action behind state takeover as a district improvement strategy is largely rooted in critiques of the typical local educational governance arrangement in which an elected school board is in charge of major district policy decisions and overseeing the superintendent (Henig, 2013; Manna & McGuinn, 2013). Critics of school boards point to the fact that voter turnout in board elections tends to be troublingly low and that the electorate selecting board members is not always representative of the local community, with affluent, White, and older citizens often overrepresented (Gersen & Gersen, 2011; Hartney, 2021; Kirst & Wirt, 2009; Kogan et al., 2021).
In contrast, critics of takeover have a more optimistic view of school boards and their ability to generate policymaking that reflects the interests and needs of local communities (see column 2, row 1 of Figure 2). Critics worry about the loss of local political power for communities that are in the minority statewide (Morel & Nuamah, 2020; Schueler & West, 2022). They are especially concerned about removing democratically elected local representation in cases where there is a demographic or partisan mismatch between local communities and state-level leaders (e.g., a majority Black district taken over in a majority White state) and the implications this may have for equity (Morel, 2018; Oluwole & Green, 2009). Local representation could lead to policy choices that are more aligned with the interests of local community members (e.g., Bryk et al., 2023).
To capitalize on the advantages of greater state involvement while overcoming the disadvantages, multilevel governance would mean shifting from a traditional school board arrangement while building in governance mechanisms for local representation and input into the decision-making process. This could look like an appointed board with a seat or multiple seats reserved for representatives of the local community. This would not simply be an opportunity for locals to voice feedback but to have a representative at the table contributing to decisions.
Interest Groups
Low turnout in local elections increases the extent to which teachers’ unions can influence electoral outcomes (Anzia, 2014; Berry & Howell, 2005; Moe, 2005, 2011). Takeover proponents argue that union influences may make it difficult to implement reforms that benefit students because union interests are not always aligned with the interests of students. This may be exacerbated when there is minimal overlap between the families served by a district and the population of teachers employed in that district. Venue shifting to the state level through takeover may allow reformers to adopt policies that otherwise would face too much resistance from organized groups at the local level.
Critics of takeover, on the flip side, argue that union interests are often consistent with student interests, teachers have valuable local knowledge that can inform a more effective approach to reform, teacher support is critical to sustaining reforms given the significant power these organized groups wield, and union members’ cooperation is critical to implementing policy change (Weatherly & Lipsky, 1977). After reviewing a wide range of system reforms, Cohen and Mehta (2017) argued that it is hard to identify successfully institutionalized initiatives that did not “offer solutions to problems that the people who worked in or around education knew that they had and wanted to solve” (p. 646). Critics also worry about the loss of economic power and job security for central office administrators and teachers that can accompany takeover, particularly in school systems that historically have served as a major venue for economic mobility among people of color (Henig et al., 1999; Lincove et al., 2018). Staff turnover can be destabilizing, disruptive, and costly (e.g., Pham et al., 2020; Rice & Malen, 2010).
To resolve these tensions, multilevel governance would involve leaders’ attunement to public perceptions of reforms and active efforts to build durable coalitions of support for policy changes (Glazer & Egan, 2018). This could be accomplished in part by creating mechanisms for union input into decision making at the district level, greater school-level autonomy for school administrators, local teacher input into decision making at the school level, and the cultivation of support from nonunion constituencies. Leaders could minimize disruptive staffing changes such as mass teacher dismissals, especially when turnaround is undertaken in communities where local labor markets do not provide an easily accessible alternative teacher workforce that is more effective than the existing one. They also could expand sources of support for reforms beyond teacher unions to include a broader multi–interest group coalition (Stone, 1993).
Capacity
Critics of school board governance point out that many board races are uncontested or not particularly competitive and that board members are not always well compensated. As a result, state-level education policymakers may have greater capacity and relevant expertise for school improvement than school boards. Furthermore, states have the ability to centralize and redistribute resources and capitalize on economies of scale. In contrast, state agencies may lack sufficient staffing and turnaround expertise, and centralizing decision-making authority with the state may be ineffective if state leaders are too far removed from the local contexts to diagnose challenges and tailor solutions to individual districts or schools (Greenblatt, 2018).
Multilevel governance could overcome these tensions by capitalizing on state resources while building in formal mechanisms for tailoring reforms to the district context and across individual schools based on local input (Honig & Coburn, 2008), including allowing flexibility of resource use across school sites. State leaders may have fewer political incentives than local leaders to preserve local administrative bureaucracies, such as an unnecessarily bulky district central office. However, gutting a central office may create challenges for sustaining improvements in the long run after the return to local control. Multilevel governance may involve pushing resources from the district to the school level while preserving the central office and continuing to build its capacity for school improvement. In addition, or alternatively, it could allow for the pooling of capacity across state and local agencies.
Efficiency
Finally, an appointed individual district leader or set of leaders who can act swiftly in the context of takeover may help make the decision-making process more efficient, whereas a group of elected school board members may require more deliberation and have a more challenging time coming to a consensus or a majority opinion. For these reasons, scholars argue that school boards can suffer from ineffectiveness, inefficiency, and dysfunction (e.g., Hess & Leal, 2005; Payne, 2008). However, by disempowering local authorities, takeover can generate significant local political opposition that can be distracting, require leaders’ time and attention, and make it difficult to implement reforms efficiently and effectively (Burns, 2003; Glazer & Egan, 2018; Jabbar, 2015; Mason & Reckow, 2017; Marsh et al., 2020; Welsh et al., 2020). Multilevel governance could take advantage of the benefits of appointed leaders while including mechanisms for local representation in the decision-making process at district and school levels and proactively building coalitions of support for reforms.
While the four cases studied here represent variation in the extent to which leaders appeared to rely on multilevel governance, we were not able to isolate the effects of the governance approach from other important factors that varied across contexts and may have influenced the success of the reform efforts (either in addition to or instead of the governance shifts). It is also possible that our pattern of results would generalize more to states with similarities rather than important differences from Massachusetts. Therefore, in the next section, we describe the statewide policy context in which the district reforms were undertaken as well as similarities and key differences across the four local communities we studied—both in terms of the preexisting characteristics of these contexts and in terms of the policy reforms undertaken in each district.
Policy Context: Massachusetts
Massachusetts is a valuable context for studying the sustainability and replicability of districtwide turnaround reforms as well as the potential of multilevel governance. The state is home to the Lawrence Public Schools, a historically low-performing district that was taken over by the state in 2012. Previous research documents that the reforms implemented by state-appointed leaders resulted in meaningful academic improvements (particularly in math) in the first 2 years of implementation without slippage on other indicators (Schueler et al., 2017). Based at least in part on this success, the state continued with district improvement efforts in Lawrence and simultaneously intervened in three other low-performing contexts—Holyoke, Southbridge, and Springfield—all of which serve large numbers of low-income children of color. More specifically, Massachusetts enacted state takeovers in Holyoke and Southbridge. Springfield avoided the threat of takeover by partnering with the state on a novel form of state-led turnaround in which an independent board made up of state appointees and local representatives oversaw a set of the district's low-performing schools in what is called the Springfield Empowerment Zone Partnership (SEZP). The nonprofit organization supporting SEZP now has led the creation of similar empowerment zones based on the SEZP model in 10 different states across the country (Empower Schools, 2023). This model has not yet been subject to rigorous independent evaluation, to our knowledge. All four cases represent shifts away from the traditional model of elected school board governance but also embody informative differences in the theories of action that appeared to drive reforms. In particular, in two of the contexts—Lawrence and Springfield—leaders adopted and emphasized what we will argue represents a multilevel governance approach to state-initiated district improvement.
State-Initiated District Turnaround in Massachusetts
Massachusetts is a relatively high-performing state when it comes to K–12 education but struggles with persistent educational inequality (Papay et al., 2020). In addition to its school-level improvement efforts, the state has been engaged in a number of initiatives targeting entire low-performing districts or large clusters of schools within districts. This has been made possible, in part, by the passage of the 2010 Achievement Gap Act, which allowed for state takeover (or receivership) of districts. Once a district is placed in receivership, the State Commissioner of Elementary and Secondary Education appoints a “Receiver” who assumes all the decision-making power previously held by the superintendent as well as the elected school board. The Receiver then enjoys broad authority to make districtwide policy changes and even has the ability to limit, suspend, or change provisions of the existing collective-bargaining agreement and require all staff to reapply for their positions (Commonwealth of Massachusetts, 2010).
The state enacted its first takeover after passage of the Achievement Gap Act—of the Lawrence Public Schools—in 2012 under the leadership of then Commissioner Mitchell Chester. Since then, the state has maintained a leadership role in Lawrence, has undertaken takeover of two additional districts—Holyoke and Southbridge—and has embarked on a unique state-initiated governance arrangement in a zone of schools within the Springfield Public Schools called SEZP. SEZP is the only context studied here that is not a formal example of state takeover but still represents a shift in governance away from a traditional locally elected school board model. The Massachusetts Department of Elementary and Secondary Education (MA DESE) provided technical support to these districts throughout this period and established a new office in 2016 focused on supporting receivership contexts in areas ranging from operations to academic improvement by assessing local conditions, identifying receiver candidates, representing the commissioner in collective bargaining in receivership districts, helping develop improvement strategies, providing ongoing support to Receivers, and strategizing regarding transitions out of receivership. Until this point, the state had enjoyed leadership stability—Commissioner Chester was the longest serving chief state school officer in the country when he passed away unexpectedly in the summer of 2017. In April of 2018, Chester was succeeded by Jeffrey Riley, who had until then been serving as the state-appointed Receiver in Lawrence. Figure 3 provides a timeline of leadership transitions.

Figure 3. Summarizing and comparing interventions and results across contexts.
As we show in Table 1, all four contexts targeted for turnaround were performing on standardized tests well below the statewide average, by between 0.60 and 1.20 standard deviations (SDs) on math assessments, and also below the average for majority low-income districts prior to the reforms. They all served majority low-income student populations with high concentrations of students of color. All four contexts had higher shares of first-year teachers than the rest of the state in the years leading up to reforms (see Appendix Table A1 in the online version of the journal). They also each had larger shares of Hispanic teachers, although none came close to having a teaching force that was demographically reflective of the student populations because all had majority White teaching populations (between 68% White for SEZP Cohort 2 and 93% for Southbridge).
Table 1. Baseline characteristics of the student sample
Note. For each of the turnaround contexts, we averaged across all pre-turnaround years. For example, for Lawrence, we averaged across 2007–08 through 2011–12. For the comparison groups in the last two columns, we averaged across the years that represented pre-turnaround years for all districts (2007–08 through 2011–12). The number of students represents the number of students in a single pre-turnaround year.
There also were similarities in the reforms pursued in each place. All districts implemented new teacher compensation systems that included a career ladder, a pay scale based in part on performance, stipends for extended learning time and teacher leadership roles, and pay increases. These changes came out of negotiations with the unions that resulted in newly ratified collective-bargaining agreements (although state law did not require Receivers to negotiate such agreements). In all four contexts, leaders prioritized diversifying the educator workforce. However, there also were notable differences among the four contexts and the policies pursued in each place, which we describe below. This paper focuses on reforms prior to the pandemic, given that COVID-19 differentially affected turnaround communities (Harbatkin et al., 2022). One implication is that we could only examine a few postreform years for some contexts (e.g., we observed 3 years post-takeover for Southbridge, although results in more recent years may have changed). We refer to academic years by the spring year (e.g., 2015–16 is “2016”).
Lawrence
Lawrence is a midsized postindustrial city located about 40 minutes north of Boston by car. The district serves roughly 13,000 students in 30 schools. Almost all students are growing up in low-income homes (92%), as shown in Table 1. Prior to turnaround, 88% of students were Hispanic, and 82% had a first language other than English. Lawrence is home to large communities that recently arrived in Massachusetts from Puerto Rico and the Dominican Republic. Based on persistent low performance as well as leadership challenges, Massachusetts placed the district into receivership and appointed a Receiver who began implementing turnaround efforts in 2012–13. At the time, the district performed 0.28 SD below the national mean on English language arts (ELA) exams and 0.20 SD below the national mean on math exams, based on the Stanford Education Data Archive, which allows for achievement comparisons across states by norming state exams to the National Assessment of Educational Progress (NAEP) test.
The Lawrence reforms were characterized by efforts to cultivate coalitions of support for reform and partner with key local stakeholders, including union leaders (Schueler, 2019). Not only was the union asked for input, but it also was given authority for operating one district school. There was a marked focus on increasing school-level autonomy and principal decision-making power—at differentiated levels depending on school performance and perceived capacity—and holding schools to higher expectations. The central office budget was reduced by 25%, and funds were pushed to the school level. Per-pupil spending did not increase in Lawrence relative to the increases statewide. If anything, in more recent years of reform, it declined (see Figure A1 in the online version of the journal). Principals, alongside teacher leader teams, were given autonomy over their calendars, interim assessments, staffing, and more. Most of the schools remained under district management, but a small number were handed over to outside operators, including a charter management group, a local nonprofit, and the local teachers’ union. All schools retained neighborhood-based student assignment and a unionized teaching force. Teacher leadership teams that contributed to school-level decision making were a notable aspect of the reforms, advocated for by the local union and codified and expanded over time. In year 4 of the takeover, leaders embarked on a high school redesign process, which was again revamped in year 7.
Throughout the period of study, Lawrence leaders prioritized increased learning time—extending the school day and/or year, building out extracurricular options in collaboration with community partners, and offering tutoring for students in need of support. The district ran “vacation academy” programs for students below proficiency thresholds on standardized exams. For these programs, the district recruited teachers it considered to be effective to work with small groups of about 10 students in a single subject over week-long vacation breaks. Previous work has shown that participation in academies explained a large part of the gains in early years of reform (Schueler et al., 2017). Another focus was on improving human capital. The Receiver's team replaced half of all principals by year 2. They actively replaced a smaller share—roughly 10%—of all teachers in those early years, and leaders publicly highlighted their restraint in this area and commitment to partnership with teachers (Moore Johnson, 2017). Reforms also placed an emphasis on using data to drive instructional improvements and, in later years, on shifting all schools to vetted, standards-aligned curricula, building out early college programs, and enhancing family engagement.
Starting in the 2017–18 year, as part of efforts to begin a process of returning local control, the state appointed a board—the Lawrence Alliance for Education—which included local representation, to serve as Receiver, oversee the superintendent, and include local leaders in decision making. After that year, the original Receiver left the district to become state commissioner and the Lawrence Alliance for Education hired a new superintendent (Moore Johnson, 2021). Weeks into the new superintendent's tenure, Lawrence experienced two major gas explosions, killing one former Lawrence student, leaving many families displaced from their homes, and leading to school evacuations for suspected gas leaks in the fall of 2018. We raise this because we later explore whether declines in outcomes appear due to the effects of these tragic events.
In the year following the gas explosions, the new leaders pushed forward with reforms such as developing a more explicit performance management framework to set common expectations for schools about how to earn autonomies and how school-level funding operated, creating structures for principal collaboration, hiring a new principal who worked on increasing coherence across programs at the high school, standardizing the calendar districtwide, and attempting to build support for a restorative justice approach to discipline. These new reforms were just getting underway in the year prior to the onset of the COVID-19 pandemic. See Appendix Figure A2 in the online version of the journal for a summary of the reforms over the 7-year period we studied here.
Holyoke
Holyoke, the context for the second takeover under the Achievement Gap Act, is a small city in western Massachusetts located about 1 hour and 40 minutes from Boston by car. The district serves roughly 5,000 students in 12 schools. As reported in Table 1, most of the students were growing up in low-income homes (84%) and identified as Hispanic (77%). A large share, though smaller than in Lawrence, had a first language other than English (59%). In March 2015, three years after the Lawrence takeover, the commissioner recommended takeover of the Holyoke Public Schools. He appointed a Receiver who began reforms in 2015–16 and was at the helm for the entire period under study here. At the time, the district was performing 0.58 SD below the national average in ELA and 0.38 SD below in math. The Holyoke community appeared to express greater resistance to takeover in the early reform years than Lawrence (Fried, 2020).
Holyoke reforms also increased school-level autonomy, although some things remained more standardized across schools than they had in Lawrence, such as the calendar. Rather than reducing central office staff and funding, as the Lawrence team had done, the Holyoke Receiver hired a new central office cabinet and built a team to support principals. Reforms in the years under study tended to target the district's youngest and oldest children. The team expanded pre-kindergarten programs significantly. The leaders redesigned the two high schools into a single campus, invested in career and technical education programs, created a menu of pathway programs and an early college program, and increased the availability of advanced coursework for high schoolers. Later, starting in 2018–19, the district handed management of one middle school (that remained a traditional public school) to an independent charter operator.
The Holyoke reforms also included extended learning time, vacation academies, enhanced enrichment offerings, efforts to improve human capital including principal and teacher replacements—similar to Lawrence with a heavier emphasis on replacing principals rather than teachers—using data to drive instructional improvement, engaging families and teachers through advisory groups, and addressing deferred maintenance to facilities as well as basic operational systems such as a phone communication solution for contacting families. There were increased efforts to ensure that students with disabilities were being served in the least restrictive environment and to incorporate more feedback from families through a Parent Advisory Council. This occurred in the aftermath of pre-takeover allegations of physical abuse of students with disabilities in one particular intervention program. Leaders also expanded the dual-language program and began introducing new curricular materials, although not consistently districtwide until later years outside our study window. Changes are summarized in Appendix Figure A3 in the online version of the journal. Funding increases in the reform period did not outpace statewide increases. If anything, spending declined in Holyoke after the takeover somewhat relative to the state (see Appendix Figure A1 in the online version of the journal).
Springfield Empowerment Zone Partnership
Springfield is a medium-sized city (larger than Lawrence but smaller than Boston) in western Massachusetts, just a 15-minute drive south of Holyoke. Leading up to the reforms, 90% of students were growing up in low-income families and a majority identified as Hispanic (52%). Springfield served a larger share of Black students (20%) than any of the other contexts under study and a smaller share of students whose first language was not English (28%) than Lawrence and Holyoke. Leading up to the intervention, the district was performing 0.32 SD below the national average on ELA tests and 0.26 SD below in math. Under the threat of receivership, the state and district agreed to a new, unique model for school improvement that allowed the district to avoid state takeover but still undertake state-initiated improvement efforts without a typical elected school board governance structure. Specifically, six middle schools (including one serving grades 6–12), serving roughly 4,000 students, categorized as underperforming in the state's accountability system, were placed in a “zone” and targeted for improvement through the SEZP (we call this SEZP Cohort 1).
MA DESE and the Springfield Public Schools signed a memorandum of understanding indicating that the SEZP would be governed by a nonprofit board of directors. This board is made up of the mayor, the superintendent of Springfield Public Schools, the vice chair of the school board, and four state commissioner appointees who are based in the region. Therefore, although the state appointed a majority of members, the board was intended to provide greater local representation and influence over decision making than what typically exists under state takeover (Jochim & Opalka, 2017). The majority of commissioner-appointed members had local Springfield ties (e.g., a minister, a family foundation officer, and a nonprofit leader). There was relative stability of leadership on the board over the period under study, including the same board chair and superintendent. SEZP was incubated by a nonprofit organization called Empower Schools, led by several people who were involved in shaping the Lawrence reforms in their early years. SEZP reforms began in 2015–16, the same year as the Holyoke takeover.
The SEZP reforms extended the approach taken in the early years of the Lawrence turnaround by granting school-based autonomy in exchange for a heightened level of accountability. However, unlike in Lawrence, the same level of autonomy was granted across all SEZP schools from the start. Principals and their teacher-leader teams had the authority to make decisions related to budget (up to 80% of the budget), curriculum, staffing, schedule, and school culture, and the district provided a menu of services that schools could select (or not). SEZP followed the Lawrence approach of empowering elected teacher-leader teams to set the working conditions at each school and emphasized partnership with teachers as a central pillar of the theory of action. In the early years, three of the middle schools were reconfigured such that the zone included a total of nine distinct learning communities. One of the new schools was managed by a charter operator, but none were converted to charter status, and all remained unionized. In 2017–18, a large high school serving 1,400 students was added to the zone and reconfigured into two new learning communities over 2 years (we call this SEZP Cohort 2).
From the start of SEZP, leaders expanded learning time across all zone schools, expanded tutoring offerings, and provided vacation academies to students who were struggling to meet proficiency benchmarks. A field experimental study showed that these week-long programs improved test scores and reduced exposure to exclusionary discipline for participating students (Schueler, 2018). There was again an emphasis on replacing school leaders and, to a lesser extent, teachers. SEZP emphasized the use of data for planning, accountability, and instructional improvement. Leaders also established new dual-language and early college programs. These changes are summarized in Appendix Figure A4 in the online version of the journal. While per-pupil spending in Springfield outpaced the state before reform, the funding after the takeover declined relative to the state, although we could only examine spending for the full district, not SEZP specifically (see Appendix Figure A1 in the online version of the journal).
Southbridge
Southbridge is the most recent Massachusetts district to enter receivership. It is a small city located about a 75-minute drive southwest of Boston. The district served roughly 1,800 students in six schools. As reported in Table 1, the district had the lowest share of low-income students of the four contexts but still a large majority (73%). It had the largest share of White students (48%) but also served a sizable share of Hispanic students (47%). About one third had a first language other than English. After the state placed the district into receivership, the state-appointed Receiver began in 2016–17. At the time, the district was achieving 0.30 SD below the national average in ELA and 0.41 SD below in math.
The Southbridge turnaround was marked by leadership instability. The first Receiver was placed on administrative leave after her first year and replaced for the first half of 2017–18 by a state-level leader who served as Interim Receiver until a more permanent Receiver was appointed midway through the year. This Receiver remained through the rest of the period under study (and beyond). Southbridge was no stranger to leadership churn because, prior to takeover, the district had seven superintendents and seven high school principals over the previous 6 years. The state agency was supporting the district while also supporting two other receivership districts (Lawrence and Holyoke) at the same time. Leaders indicated that their efforts were met with criticism from school board members throughout.
The first year of reform was focused on increasing alignment across schools. The state hoped to eventually increase school-level autonomy but did not see this as a possibility at the outset given perceived school capacity limitations. The Receiver established a new alternative high school program for students with behavioral issues, extended learning time for elementary students, focused on principal and teacher replacements, and negotiated a new contract (modeled on the other three contexts). The second year was focused on stabilization given the leadership transition. Schools began shifting to vetted, standards-aligned curricula across the district, added school-level family liaisons, and established new translation services for families.
In the third year of reform, the new Receiver focused on creating structures for principal collaboration and capacity building, redesigning the alternative high school into a therapeutic day school, adding time for teacher professional development and planning, using data, and implementing the Positive Behavior Intervention and Supports framework for improving student behavior. In this year, leaders also shifted from a paper-based to a digital recordkeeping system for student information management, finance, human resources, facilities, operations, budget, food service, and more. These new reforms were just getting underway in the year prior to the pandemic's onset. We summarize the policy changes in Appendix Figure A5 in the online version of the journal. Unlike the other contexts, per-pupil spending increased in Southbridge after the takeover relative to the state (see Appendix Figure A1 in the online version of the journal).
Synthesizing Policy Similarities and Contrasts Across Contexts
Figure 3 summarizes the policy reforms pursued across the contexts. While there are a number of similarities among all four improvement efforts—a governance shift, new contracts, extended time, and data use—some notable differences emerge in the theories of action that leaders seemed to embrace across contexts. In particular, the Lawrence and SEZP reforms appeared to have more in common with each other than with the reform approaches in the other two districts. In the last column of Figure 2, we describe how the approach taken in both cases appeared aligned with the theory of action behind multilevel governance as an approach to balancing state and local decision making. Specifically, both Lawrence and SEZP adopted models of appointed boards that intentionally reserved seats meant to provide local representation in a key district-level decision-making body (which did not exist in Holyoke or Southbridge). These representatives did not have absolute authority but did have a vote and therefore formal power. In both places, turnaround leaders emphasized a distinct commitment to partnership with teachers’ unions and their members, for example, through the establishment of teacher-leader teams that were not just advisory committees but were meaningfully empowered to contribute to school-level decision making. These teacher-leader teams were codified in the new collectively-bargained contracts. In the other two districts, leaders placed less of an emphasis on teacher leadership. Finally, in both places, leaders placed a heavy emphasis on increasing school-level autonomy (and the authority of school principals), paired with higher levels of accountability, and a reduced central office that allowed for greater school-level resources. In contrast, in Holyoke and Southbridge, there was less of an effort to shrink the central office and increase school-level autonomy.
Data
To assess the impact of the reforms on student outcomes, we leveraged statewide longitudinal student-level data provided by MA DESE for 2006–07 to 2018–19 (the last full pre-COVID-19 year). These data included each student's grade, school, district, demographic characteristics, standardized test scores, attendance, and discipline record by year. The data included more than 500,000 unique student observations per year. Our preferred analytic sample included roughly 25% of the full universe of Massachusetts students who were within the 54 districts that served a majority low-income student population in the pre-takeover period (i.e., districts where 50% or more of the students were classified as low income). This is a more relevant set of comparison districts given that all treated districts were majority low income and that socioeconomic status is well established as a correlate of academic outcomes.
The outcomes consisted of students’ academic performance as measured by their test scores on statewide math and ELA exams, administered annually in grades 3–8 and grade 10, as well as science exams for grades 5, 8, and 9 or 10. We standardized scores within year, subject, grade, exam, and modality (computer vs. paper) using the statewide sample. Standardizing within exam was necessary because there was variation over time and within years, with all students taking the Massachusetts Comprehensive Assessment System exam prior to 2015, some students taking the Partnership for Assessment of Readiness for College and Careers (PARCC) exam in 2015 and/or 2016, and then all students switching to the Massachusetts Comprehensive Assessment System 2.0 exam in 2017 and beyond. Additionally, in 2015 and 2016, about half the students who took the PARCC exam also took computer-based testing, whereas the other half took paper exams. From 2017 to 2019, an increasing share took computer-based testing (Backes & Cowan, 2019). We confirmed prior to standardizing (by examining raw scaled scores) that there was not a substantial change in the presence of floor or ceiling effects when these exam shifts occurred either in the treated or comparison districts that could have artificially resulted in perceived gains or losses for the treatment groups relative to the comparison group (see Appendix Figure A6 in the online version of the journal). We also examined nontest measures, including the number of days a student attended school, the number of out-of-school suspensions, and finally, whether a student progressed to the next grade in the next year rather than being retained.
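The within-cell standardization described above can be illustrated with a minimal sketch. The records, field names, and the `standardize_within_cells` helper are hypothetical and are not drawn from the MA DESE data; the sketch assumes the population standard deviation is used within each cell, which the text does not specify.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical cell definition: one cell per year x subject x grade x exam x modality.
CELL_KEYS = ("year", "subject", "grade", "exam", "modality")

def standardize_within_cells(records, keys=CELL_KEYS):
    """Return copies of the records with a 'z' field: the raw score
    standardized (mean 0, SD 1) within each cell defined by `keys`."""
    cells = defaultdict(list)
    for r in records:
        cells[tuple(r[k] for k in keys)].append(r["score"])
    # Assumption: population SD within cell; a cell with no variation gets z = None.
    stats = {cell: (mean(v), pstdev(v)) for cell, v in cells.items()}
    out = []
    for r in records:
        mu, sd = stats[tuple(r[k] for k in keys)]
        z = (r["score"] - mu) / sd if sd > 0 else None
        out.append({**r, "z": z})
    return out

# Made-up scaled scores for two modality cells within the same year, subject, grade, and exam.
base = {"year": 2016, "subject": "math", "grade": 5, "exam": "PARCC"}
records = (
    [{**base, "modality": "computer", "score": s} for s in (740, 750, 760)]
    + [{**base, "modality": "paper", "score": s} for s in (700, 720, 740)]
)
standardized = standardize_within_cells(records)
computer_z = [r["z"] for r in standardized if r["modality"] == "computer"]
paper_z = [r["z"] for r in standardized if r["modality"] == "paper"]
```

Because each cell is centered and scaled separately, a student's standardized score reflects standing relative to peers who took the same exam in the same modality, which is the reason given above for standardizing within exam.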
Analytic Methods
Examining Impacts on Student Outcomes
To study turnaround effects, we conducted difference-in-differences analyses that compared achievement trends of students in turnaround contexts with achievement trends of students in comparison districts that did not experience state-led turnaround. Because the reforms varied fairly substantially between districts, we estimated turnaround impacts separately for each context (excluding the other ever-treated districts from the sample) rather than estimating a staggered difference-in-differences model combining all treated districts together. Our approach should minimize concerns about bias due to variation in treatment timing because we estimated effects separately by context, within which treatment timing did not vary (Baker et al., 2022). For SEZP, we examined two cohorts separately because the reforms began in 2016 (Cohort 1), but a new high school was added to the zone starting in 2018 (Cohort 2). We began by running event study models to transparently assess the parallel-trends assumption and to examine how effects may have developed over time. In all student-level models, we treated the 6 years leading up to turnaround as the pre-takeover period and omitted the last year prior to the intervention as the comparison year. Our primary specification was a school-by-grade fixed-effects model as follows (using Lawrence as an example):
where Y is an outcome, such as a standardized math test score for student i in school s and grade g in year y, and
We included school-by-grade fixed effects (
Given previous research that showed that changes to exams and modality in Massachusetts impacted student test performance, particularly among students receiving special education services and whose first language was not English (Backes & Cowan, 2019), we included a set of test-related controls when estimating impacts on test-based achievement. Specifically, we controlled for whether the student took the PARCC exam, whether a student took a computerized exam, whether the test modality was new to the student in that year, and interaction terms that allowed the impact of a computerized exam to vary for students identified as special education or having a first language other than English. We clustered standard errors at the district level.
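For orientation only, an event study specification of the general type described in this section can be written as follows. This is a generic sketch, not the notation of model (1): the symbols $\beta_k$, $\gamma_{sg}$, $\delta_y$, $X_{isgy}$, and $y_0$ are assumptions, and the inclusion of year fixed effects is assumed rather than stated in the text.

```latex
% Generic event-study sketch (assumed notation, Lawrence as the treated context).
% y_0 : first reform year; k = y - y_0 : event time; k = -1 is the omitted comparison year.
Y_{isgy} \;=\; \sum_{\substack{k=-6 \\ k \neq -1}}^{K}
      \beta_k \,\bigl(\text{Lawrence}_s \times \mathbb{1}[\,y - y_0 = k\,]\bigr)
  \;+\; \gamma_{sg} \;+\; \delta_y \;+\; X_{isgy}'\theta \;+\; \varepsilon_{isgy}
```

Under these assumptions, $\gamma_{sg}$ stands for the school-by-grade fixed effects, $X_{isgy}$ for the test-related controls (exam, modality, and their interactions with student characteristics), and each $\beta_k$ for the difference between Lawrence and the comparison districts in event year $k$ relative to the year before the reforms. Coefficients with $k < -1$ serve as the check on parallel pre-takeover trends.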
We also examined whether reform impacts varied depending on student characteristics. There were two differences between the models for these analyses and model (1). First, instead of event study models that estimate separate coefficients for each year, we ran basic difference-in-differences models, pooling all post-treatment annual effects together. We did this to avoid an unmanageable proliferation of estimates given the many permutations of post-turnaround years and student characteristics, along with the fact that we had no theoretical expectation that heterogeneity of effects would vary over time. Second, we interacted the post-treatment indicator with the student demographic characteristic to test whether the treatment effect varied for a particular subgroup (separately for each demographic characteristic and treatment context).
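The logic of a pooled difference-in-differences estimate and its subgroup contrast can be illustrated with a deliberately simplified sketch based on cell means. The data, field names, and the `did` helper below are hypothetical; the sketch omits the fixed effects, test-related controls, and clustered standard errors used in the regression models.

```python
from statistics import mean

def did(rows, group=None):
    """Simple 2x2 difference-in-differences on cell means:
    (treated post - treated pre) - (comparison post - comparison pre).
    If `group` is given, only rows in that subgroup are used."""
    def cell_mean(treated, post):
        vals = [r["y"] for r in rows
                if r["treated"] == treated and r["post"] == post
                and (group is None or r["group"] == group)]
        return mean(vals)
    return (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Made-up standardized outcomes: one observation per cell, two subgroups A and B.
rows = [
    {"treated": 1, "post": 0, "group": "A", "y": -0.3},
    {"treated": 1, "post": 1, "group": "A", "y": 0.0},
    {"treated": 1, "post": 0, "group": "B", "y": -0.2},
    {"treated": 1, "post": 1, "group": "B", "y": -0.1},
    {"treated": 0, "post": 0, "group": "A", "y": -0.1},
    {"treated": 0, "post": 1, "group": "A", "y": 0.0},
    {"treated": 0, "post": 0, "group": "B", "y": 0.0},
    {"treated": 0, "post": 1, "group": "B", "y": 0.1},
]

pooled_effect = did(rows)                    # all post-treatment years pooled
effect_a = did(rows, group="A")
effect_b = did(rows, group="B")
heterogeneity = effect_a - effect_b          # analogue of the interaction term
```

In this toy example the subgroup contrast is simply the gap between the two subgroup estimates, which is the quantity the interaction with the post-treatment indicator is meant to test.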
Synthetic Control Methods Examining Impacts on Student Outcomes
The key underlying assumption of the difference-in-differences approach was that the comparison group and treatment group were on a similar outcome trajectory prior to the intervention. However, for some of the treated contexts and student outcomes, we observed violations of this parallel-trends assumption using model (1). As a check on whether any findings were driven by differences between the treatment and comparison groups in pre-takeover trends, we also used the synthetic control group method to identify comparison groups that were on a similar trajectory with respect to the outcome leading up to the reform implementation (Abadie, 2021; Abadie et al., 2010). We used the method to identify the weighted combination of all other untreated majority low-income districts in Massachusetts that minimized the mean squared prediction error of the outcome variable of the treated district across the pre-takeover years. We excluded districts not observed in every year to create a balanced panel for the synthetic control package. For SEZP Cohort 2, which consisted of a single high school, we used other high schools to make up our donor pool rather than districts. We then generated difference-in-differences estimates using the synthetic control as the comparison group.
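In generic form, the weighting problem described above can be sketched as follows. The notation is assumed for illustration: $Y_{1t}$ is the treated unit's outcome in pre-takeover year $t$, $Y_{jt}$ the outcome of donor unit $j$, and $T_0$ the number of pre-takeover years. The non-negativity and sum-to-one constraints are the standard ones in the synthetic control literature (Abadie et al., 2010) and are not spelled out in the text.

```latex
% Synthetic control weights: minimize pre-takeover mean squared prediction error.
w^{*} \;=\; \arg\min_{w}\; \frac{1}{T_0}\sum_{t=1}^{T_0}
      \Bigl( Y_{1t} - \sum_{j=2}^{J+1} w_j\, Y_{jt} \Bigr)^{2}
\quad \text{subject to} \quad w_j \ge 0, \;\; \sum_{j=2}^{J+1} w_j = 1 .

% Gap between the treated unit and its synthetic control in year t:
\hat{g}_t \;=\; Y_{1t} - \sum_{j=2}^{J+1} w_j^{*}\, Y_{jt}
```

A difference-in-differences estimate against the synthetic control then corresponds to the average of $\hat{g}_t$ over the post-takeover years minus its average over the pre-takeover years.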
It was not possible to use traditional statistical inference approaches to infer the statistical significance of results in a synthetic control group framework because doing so typically involves analyzing the data at the level of assignment to treatment (in this case, typically the district level), dramatically reducing the sample size. Instead, we followed Abadie et al. (2015), Hernández (2019), and McClelland and Gault (2017) and conducted “placebo studies” based on the idea that we would not expect to observe estimated effects similar to or greater in magnitude than those for the treatment groups in districts where the reforms did not occur. To do this, we temporarily assigned treatment status to each placebo district in the donor pool and then conducted the synthetic control group analysis, generating estimated effects. Finally, we compared the treatment effects for our treated contexts with the distribution of estimated placebo effects. Where most of the placebo effects were smaller in magnitude than the treatment effects, we had greater confidence in the estimated treatment effect (Billmeier & Nannicini, 2013; Shores et al., 2023).
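The placebo comparison can be sketched as a simple ranking exercise. The effect sizes below are invented for illustration, and `placebo_share` is a hypothetical helper; the synthetic control estimation that would produce each placebo effect is not reproduced here.

```python
def placebo_share(treated_effect, placebo_effects):
    """Share of placebo effects that are at least as large in magnitude
    as the effect estimated for the actually treated unit."""
    n_extreme = sum(abs(e) >= abs(treated_effect) for e in placebo_effects)
    return n_extreme / len(placebo_effects)

# Made-up estimates: one treated context and ten placebo districts from the donor pool.
treated_effect = 0.20
placebo_effects = [0.02, -0.05, 0.10, -0.25, 0.01, 0.07, -0.03, 0.04, 0.00, -0.08]

share = placebo_share(treated_effect, placebo_effects)
mostly_smaller = share < 0.5   # most placebo effects are smaller in magnitude
```

A small share indicates that effects of the observed size rarely arise in districts where no reform occurred, which is the sense in which the placebo distribution lends confidence to an estimate.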
Findings
Lawrence
Overall, the results of the Lawrence reforms were positive to neutral. We begin by displaying the results for test score outcomes graphically in Figure 4. This is part of a series of figures that all provide descriptive outcome trends for the treated context and the comparison districts in the left-most panel and third panel from the left, with regression-based estimates of the effects for each outcome to the right of the descriptive figures. Synthetic control-based estimates and results from the placebo tests are included in the Appendix in the online version of the journal. For Lawrence, we observed large positive impacts of reforms on math achievement that increased in magnitude for the first 3 years of the reforms, leveled off for the fourth and fifth years, and began to decrease in the last 2 years of the reforms, coinciding with the shift to an appointed board (2018) and the arrival of a new superintendent (2019). That said, the impacts remained positive even in year 7. The average effect across all seven post-takeover years was 0.21 SD, as shown in Table 2. Again, this is combining effects across all tested grades.

Figure 4. Lawrence outcome trends and event study estimates.
Table 2. Pooled difference-in-differences estimates of reform impacts by context.
Note. *p < 0.05. **p < 0.01. ***p < 0.001.
In ELA, the positive effects in the early years of reform were more modest in magnitude and began to trend downward in the last 2 years of reform that we observed. The overall pooled effect was not statistically different from zero (see Appendix Table A3 in the online version of the journal). The declines in the last reform year we observed (2019), which coincided with the gas explosions in the Lawrence community, did not appear to be due to these events alone because the declines persisted even after we excluded the schools located in neighborhoods most directly affected by the explosions (see Appendix Figure A11 in the online version of the journal). In science, positive effects began to emerge in the second year of the reforms and increased in magnitude until they began to dip, although they remained positive, in the last year we observed. The pooled impact across all post-takeover years was 0.12 SD (see Appendix Table A3 in the online version of the journal). Based on our visual inspection and joint F tests of the pre-takeover effects, reported in Appendix Table A2 in the online version of the journal, none of the test score impacts appeared to be driven by differences in pre-takeover trends between Lawrence and other majority low-income districts. We confirm that the results are not due to increased rates of missingness on the outcome measures after takeover in Appendix Table A6 in the online version of the journal.
Turning to nontest outcomes, the Lawrence reforms appeared to increase the average number of days students attended school by 1.78 days, pooling across all post-takeover years (see Table 2). However, both the F test of whether the pre-takeover coefficients were jointly statistically significant (reported in Appendix Table A2 in the online version of the journal) and a visual inspection suggested that these attendance impacts may have been driven by pre-takeover trends. The effects, however, were robust to the use of synthetic control methods, allowing us to compare Lawrence with a synthetic comparison district with a very similar attendance pre-takeover trend. In the upper right-hand corner of Appendix Figure A8 in the online version of the journal, we show that this does not appear to be due to chance because most of the placebo district effects were smaller in magnitude than the effects we observed for Lawrence.
When it came to disciplinary outcomes, the reforms appeared to decrease out-of-school suspensions by 0.03 after pooling effects across all post-takeover years (see Table 2). The decreases were larger in the early years of reform and became neutral from 2016 to 2019. Finally, Lawrence reforms also appeared to increase the rate of grade progression, but we could not rule out that this result may have been driven by pre-takeover trend differences because the results were not robust to the use of synthetic control. None of these results were sensitive to the inclusion of student fixed effects, suggesting that compositional shifts in the student population did not drive findings.
In Appendix Table A3 in the online version of the journal, we display results on whether the effects of the Lawrence reforms varied for students based on demographic characteristics. For most outcomes, we found that positive impacts were largest for students of color and low-income students, with the exception of discipline outcomes, for which the effects were somewhat smaller for these groups, although still positive. In contrast, the test score impacts were smaller for students identified as immigrants. In Appendix Table A4 in the online version of the journal, we demonstrate that the gains for most outcomes were larger for middle school and high school students than for elementary schoolers. There were two exceptions—both the attendance and grade-progression impacts were almost entirely concentrated among high school students.
Holyoke
Unfortunately, the story was not especially positive for the second takeover, which overall generated negative to neutral effects on student outcomes, at least in the first 4 years of reform (the period we examined here). We begin by displaying results for test-based outcomes in Figure 5. In math, the reforms did not appear to alter performance in the first year, but we observed negative effects in years 2 through 4, resulting in a pooled impact of −0.22 SD over all 4 years. The pooled impacts were smaller but still negative in ELA (−0.06 SD) and science (−0.04 SD). There is some suggestive evidence that these results could be due to pre-takeover trends on the test outcomes (all joint F tests were statistically significant), but the findings were generally robust to the use of synthetic control methods, as shown in Appendix Figure A12 in the online version of the journal. When it came to the nontest outcomes, we found no strong evidence of impacts—positive or negative—on attendance or grade progression. For out-of-school suspensions, Holyoke was on a very different pre-takeover trajectory than the comparison districts, with very high rates of exclusionary discipline in the pre-reform era that plummeted the year prior to the reforms, making it difficult to draw conclusions about the impact of the reforms on out-of-school suspensions based on our event study methods. The negative impacts did not appear to be due to the changing composition of the Holyoke student population. Results were robust to the inclusion of student fixed effects, as we show in Appendix Figure A10 in the online version of the journal. Also, we observed no changes in the share of students identified as Black, Hispanic, or low income as a result of the reforms, as reported in Appendix Table A5 in the online version of the journal. Therefore, it does not appear that these results were due to improvements in retention rates among disadvantaged students, for example.

Figure 5. Holyoke outcome trends and event study estimates.
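To make the logic behind the event study comparison concrete, the following minimal sketch computes the treated-minus-comparison gap in mean outcomes at each event time, net of the same gap in a reference year. It is a simplification under stated assumptions: the function name `event_study_gaps`, the choice of the year before takeover as the reference, and all numbers are hypothetical and are not drawn from the study data. The estimates reported in the article come from regression models and are not reproduced here. Gaps near zero before the takeover are what one would hope to see if the treated and comparison districts were on similar pre-takeover trajectories.

```python
from statistics import mean


def event_study_gaps(treated, comparison, reference_time=-1):
    """Return {event_time: gap}, where gap is the treated-minus-comparison
    difference in mean outcomes minus that same difference at reference_time."""
    def raw_gap(t):
        return mean(treated[t]) - mean(comparison[t])

    baseline = raw_gap(reference_time)
    return {t: raw_gap(t) - baseline for t in sorted(treated)}


# Hypothetical standardized test scores keyed by years relative to takeover.
# These values are invented for illustration only.
treated = {
    -3: [-0.50, -0.40],
    -2: [-0.48, -0.42],
    -1: [-0.47, -0.43],
    0: [-0.50, -0.44],
    1: [-0.70, -0.60],
}
comparison = {
    -3: [-0.30, -0.20],
    -2: [-0.27, -0.23],
    -1: [-0.26, -0.24],
    0: [-0.25, -0.25],
    1: [-0.24, -0.26],
}

gaps = event_study_gaps(treated, comparison)

# Pre-takeover gaps (excluding the reference year) serve as an informal
# check on whether the two groups were trending together beforehand.
pre_gaps = {t: g for t, g in gaps.items() if t < -1}
```

In this invented example the pre-takeover gaps are essentially zero, so the negative gap at event time 1 would not be attributed to diverging prior trends. When pre-takeover gaps are large, as described for some outcomes above, the post-takeover gaps are harder to interpret.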
In Appendix Table A9 in the online version of the journal, we explore whether Holyoke reform impacts varied depending on student demographic characteristics. It did not appear that vulnerable subgroups benefited more than other students. In fact, the negative effects were larger for Hispanic students, students from low-income homes, and students with a first language other than English. One bright spot is that the impacts were more neutral or even more positive for special education students, especially when it came to test score outcomes. Many of the Holyoke reforms targeted the district's youngest and oldest students rather than those in the middle grades (which included a large share of students in tested grades who contributed to our estimates of the reform impacts on test scores). In Appendix Table A4 in the online version of the journal, we explore whether impacts varied by grade level. We did not find that it was only middle school students, for example, driving negative results. In fact, for math, the negative impacts were no different for middle school and high school students than for elementary students. In ELA, effects were somewhat more negative for high schoolers than for elementary students, but no different for middle schoolers. For science, negative impacts were concentrated among elementary students. For attendance, middle schoolers saw more positive results than elementary students.
SEZP
The SEZP reforms produced generally positive to neutral effects on student academic outcomes, both for the first and second cohorts. In Figure 6, we show positive effects on all three test subjects by the second year of the reforms for Cohort 1. However, there are some signs that Cohort 1 was on a different pre-takeover trajectory than comparison districts based on the joint F tests reported in Appendix Table A2 in the online version of the journal. Synthetic control results suggested positive effects for Cohort 1 by the fourth year of the reforms, but positive effects in the earlier years were not always robust to this method. For Cohort 2, we also observed large positive impacts on all three test subjects, but again, there was evidence that this cohort was on a different trajectory than the comparison group in the pre-takeover period (Figure 7). The positive effects on math and science were robust to synthetic control methods, but the ELA effects were not (see Appendix Figure A16 in the online version of the journal). Student fixed-effects estimates suggested that the results were not driven by changes to the composition of the student population, as we show in Appendix Figure A10 in the online version of the journal.

Figure 6. SEZP Cohort 1 outcome trends and event study estimates.

Figure 7. SEZP Cohort 2 outcome trends and event study estimates.
For Cohort 1 nontest outcomes, it also was difficult to differentiate treatment impacts from preexisting differences in outcome trends. Our pooled difference-in-differences estimates in Table 2 suggest small positive effects on attendance, on the order of 0.58 days of school, that were not statistically significant. These positive results appeared robust to the use of synthetic control methods. Similarly, we observed a 1 percentage point increase in the rate of grade progression as a result of the reforms, which was robust to synthetic control. Unfortunately, the reforms appeared to increase out-of-school suspensions slightly for Cohort 1, by 0.06 suspensions (pooling across all postreform years), and this result persisted even when relying on the synthetic comparison group. Cohort 2 students appeared to experience gains on all nontest outcomes we measured after treatment, but we again were unable to determine whether these effects were due to the reforms or preexisting differences in SEZP Cohort 2 outcomes in the pre-takeover period. One exception was attendance, which the reforms increased by 8.16 days, and this result was robust to the use of synthetic control methods. Although it was difficult to find patterns of variation by student demographic characteristics in the SEZP results that were largely consistent across outcomes for Cohort 2, the results in general were more positive for Black students, whereas the impacts were not as large for Hispanic students or for low-income students.
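The idea behind a pooled difference-in-differences estimate can be illustrated with a simple two-group, two-period calculation: the change in the treated group's mean outcome minus the change in the comparison group's mean outcome, pooling all post-reform years. The sketch below is an assumption-laden simplification. The function name `pooled_did` and the attendance figures are hypothetical, and the estimates in Table 2 come from regression models that this example does not attempt to replicate.

```python
from statistics import mean


def pooled_did(treated_pre, treated_post, comparison_pre, comparison_post):
    """Difference-in-differences of group means, pooling all pre-period
    observations and all post-period observations within each group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    comparison_change = mean(comparison_post) - mean(comparison_pre)
    return treated_change - comparison_change


# Hypothetical days attended per student; invented for illustration only.
treated_pre = [160.0, 162.0]
treated_post = [163.0, 165.0]
comparison_pre = [165.0, 167.0]
comparison_post = [166.0, 168.0]

estimate = pooled_did(treated_pre, treated_post, comparison_pre, comparison_post)
```

Here the treated group gains 3 days and the comparison group gains 1 day, so the estimate is 2 days. The calculation is only credible as a causal estimate if the two groups would have followed parallel trends absent the reforms, which is the assumption the pre-takeover checks discussed above are meant to probe.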
Southbridge
Unfortunately, our results suggested that the most recent Massachusetts takeover in Southbridge generated negative to neutral impacts on student outcomes, at least through the first 3 years of reforms. On test outcomes shown in Table 2, we observed large negative effects on the order of −0.22 SD in math, −0.16 SD in ELA, and −0.29 SD in science when pooling all post-takeover years. Southbridge was on quite a different trajectory than comparison districts in the pre-takeover period on all outcomes, as we show visually in Figure 8. Joint F tests of the pre-takeover coefficients reported in Appendix Table A14 in the online version of the journal confirm that these pre-takeover differences were statistically significant. However, our findings on negative test score impacts were robust in all three subjects to the use of synthetic control methods, where we compared Southbridge with a synthetic district that was on a nearly identical trajectory before the takeover.
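The synthetic comparison described above can be sketched as a weighted average of donor districts, with nonnegative weights that sum to one, chosen so that the weighted average tracks the treated district's pre-takeover outcomes as closely as possible. The example below is a minimal illustration under several assumptions: it matches only on pre-period outcomes, it uses a simple Frank-Wolfe routine as the optimizer because that keeps the code dependency-free, and every name and number in it is hypothetical. The article does not describe its estimation routine at this level of detail, so this should not be read as the authors' implementation.

```python
def blend(weights, donors):
    """Weighted average of donor outcome series, period by period."""
    periods = len(donors[0])
    return [sum(w * donor[t] for w, donor in zip(weights, donors))
            for t in range(periods)]


def rmse(a, b):
    """Root mean squared difference between two equal-length series."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


def synthetic_weights(treated_pre, donors_pre, iterations=2000):
    """Frank-Wolfe with exact line search: minimize the squared pre-period gap
    over weights that are nonnegative and sum to one."""
    n_donors = len(donors_pre)
    weights = [1.0 / n_donors] * n_donors
    for _ in range(iterations):
        synthetic = blend(weights, donors_pre)
        residual = [s - y for s, y in zip(synthetic, treated_pre)]
        # Gradient of half the squared gap with respect to each donor weight.
        gradient = [sum(r * x for r, x in zip(residual, donor))
                    for donor in donors_pre]
        best = min(range(n_donors), key=gradient.__getitem__)
        # Move the synthetic series toward the single best donor.
        direction = [x - s for x, s in zip(donors_pre[best], synthetic)]
        curvature = sum(d * d for d in direction)
        if curvature == 0.0:
            break
        step = -sum(r * d for r, d in zip(residual, direction)) / curvature
        step = max(0.0, min(1.0, step))
        if step == 0.0:
            break
        weights = [(1.0 - step) * w for w in weights]
        weights[best] += step
    return weights


# Hypothetical standardized scores; invented for illustration only.
donors_pre = [
    [-0.50, -0.48, -0.46, -0.44],
    [-0.20, -0.25, -0.22, -0.21],
    [-0.60, -0.55, -0.65, -0.58],
]
donors_post = [[-0.42, -0.40], [-0.20, -0.19], [-0.57, -0.56]]
treated_pre = [-0.55, -0.515, -0.555, -0.51]
treated_post = [-0.70, -0.75]

weights = synthetic_weights(treated_pre, donors_pre)
pre_rmse = rmse(blend(weights, donors_pre), treated_pre)
# Post-takeover gaps between the treated unit and its synthetic counterpart.
effects = [y - s for y, s in zip(treated_post, blend(weights, donors_post))]
```

A small pre-period error indicates that the synthetic district was on a similar trajectory before the takeover, which is what makes the post-takeover gaps in `effects` informative.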

Figure 8. Southbridge outcome trends and event study estimates.
For nontest outcomes, the Southbridge takeover appeared to result in reductions in attendance based on our event study estimates, but we observed violations of the parallel-trends assumption, and these findings were not robust to the use of synthetic control. Event study estimates suggested that the takeover increased exposure to exclusionary discipline by 0.04 suspensions. Despite the presence of pre-takeover differences for Southbridge, the discipline results appeared robust to the use of synthetic control (see Appendix Figure A19 in the online version of the journal). We found no impacts, positive or negative, on grade progression. These results were not driven by changes to the composition of the student population because they were robust to the inclusion of student fixed effects (see Appendix Figure A10 in the online version of the journal).
Next, we turn to whether Southbridge impacts varied for subgroups. Test score impacts were not as negative for Black students, and there were greater post-takeover reductions in suspensions for Black students. In contrast, effects for nearly all outcomes were more negative for Hispanic students than for non-Hispanic students. Negative results also were concentrated among students for whom English was not a first language. In Appendix Table A4 in the online version of the journal, we examined whether effects varied by grade level. For math, negative effects were driven more by elementary students than by middle or high school students. Declines in ELA were driven more by elementary students and high schoolers than by middle schoolers, whereas declines in science achievement were driven by all three levels but were largest among high schoolers. Suspension increases occurred mostly at the middle school level. That said, given pre-takeover issues, it is hard to draw strong conclusions about subgroups.
Discussion
Prior work on school system improvement has focused largely on short-term impacts in initial reform contexts. Less is known about the sustainability of district turnaround effects over time and the replicability of district improvement success across environments. This cross-case study of four state-initiated district improvement efforts in Massachusetts begins to fill these gaps. Medium-term results for the Lawrence takeover indicate that it was indeed possible for a system serving a high concentration of low-income students of color to generate academic gains via state takeover and districtwide turnaround and to sustain those gains over time. Lawrence leaders generated positive effects on math and science performance, reduced exposure to exclusionary discipline, and increased the grade progression rate. Our examination of behavioral outcomes is especially important given the possibility that accountability policy improves high-stakes outcomes to the detriment of nontest outcomes that are not part of the accountability system.
Although we observed persistent positive impacts of these reforms, our findings suggest that sustaining gains at the same level as the initial improvements is challenging. In the case of Lawrence, this appeared especially true in the context of leadership turnover once the process of transitioning back to local control began. This finding suggests some parallels with what happened in the New Orleans school system with the return from takeover to a new version of local control, where results were mostly sustained but still somewhat mixed depending on the outcomes (Carroll et al., 2023). It also suggests that leaders considering or embarking on takeover should plan the state's exit strategy from the outset. Researchers should devote more attention to learning about the transition out of takeover to better inform policy in this area.
In terms of replicability, the results for the SEZP intervention suggest that it is indeed possible to replicate turnaround gains across contexts because we observed suggestive evidence of positive impacts on most of the test-based and nontest outcomes we examined. That said, it is somewhat challenging to fully separate out policy impacts from preexisting differences between the treated schools and the comparison group in this context. However, replication of the Lawrence results was not guaranteed and proved more challenging in the two other state takeover contexts—Holyoke and Southbridge—where the reforms appeared to negatively impact some (although not all) of the key student outcomes that we were able to study.
That said, we were only able to examine 4 years of post-takeover outcomes in Holyoke and 3 years in Southbridge. It is possible that reforms generated longer-term benefits that we were unable to observe. Leaders indicated during interviews that we conducted that many core reforms in Southbridge were just getting underway during year 3. Some of the earliest reforms involved establishing basic systems, such as shifting from paper to digitized recordkeeping, that may not have paid off in immediate student outcomes but ultimately may prove to be critical to setting a foundation for future improvement. It is possible that some of the preconditions necessary for replicating Lawrence's success may not have been present in other targeted contexts. This would be consistent with Leithwood and Strauss’ (2008, p. 10) distinction between the “crisis stabilization” versus “sustaining and improving performance” stages of turnaround. Furthermore, the measures available may not have captured important benefits of reforms. For example, Holyoke's focus on early childhood education and postsecondary/workforce preparation may not be well evaluated by test-score outcomes among tested grades of 3–8 and 10. There may have been gains on outcomes beyond the scope of this study, such as access to pre-K and/or advanced coursework.
Despite these limitations, it is worth considering whether variation in the policies pursued across the four contexts could explain differences in the impacts on student achievement. It is striking that leaders in the two districts with the more positive outcomes (Lawrence and SEZP) pursued policy approaches that resembled each other more than those of the other two districts and that were aligned with a multilevel governance approach to state-initiated district turnaround designed to balance state and local input into decision making. In both cases, reformers seemed to find a way to capitalize on the advantages of greater state authority while minimizing some of the greatest downsides of shifting away from elected school board governance, as described in our conceptual framework and Figure 2. In all four contexts, the elimination of elected school boards allowed state-appointed leaders to quickly enact unilateral changes that likely otherwise would have been difficult to make, aligned with the efficiency dimension of our framework. However, leaders in Lawrence and SEZP also created formal opportunities for meaningful local authority over district-level decision making by reserving seats for local representatives on newly appointed boards, addressing issues of representation and capacity. District leaders partnered with union leaders, gave principals substantial autonomy, and established teacher-leader teams with influence over school-level policies, which speaks to the tensions related to interest groups described in our framework. These actions did not simply involve opportunities for local stakeholder feedback but also granted formal decision-making power. Leaders therefore seemed to address potential implementation challenges by minimizing the extent to which local actors felt (and were) disempowered and attempting to increase buy-in of the reforms among a key stakeholder group responsible for implementation by giving teachers influence over reforms and minimizing teacher replacements. In contrast, the Holyoke and Southbridge reforms appeared less oriented toward increasing school-level autonomy or providing formalized roles for local influence over decision making.
Efforts to incorporate local voice in decision making in Lawrence and SEZP may have helped minimize local opposition to reforms that otherwise could have derailed efforts in either the short or longer run. Indeed, previous research has suggested that this was the case in Lawrence (Schueler, 2019), whereas the Holyoke and Southbridge reforms seemed to be met with significant public resistance (e.g., Fried, 2020). In both Lawrence and SEZP, leaders focused on increasing school-level autonomy—shifting funds from the central office to the school level, shifting from a compliance to a support orientation at the district level, and increasing the authority of school-based leadership teams—which has appeared to be a potent combination in other contexts scholars have studied (e.g., Fuller, 2022; Honig & Rainey, 2012; Jackson, 2023). These structures were designed to harness local knowledge and teacher expertise, which may have helped leaders address the often-overlooked complexity of school turnaround work by tailoring reforms to contexts and supporting ongoing learning. This study therefore provides suggestive evidence for one pathway through which leaders might find an effective balance between state and local authority for school system improvement. This approach appeared to be a valuable recipe, especially when combined with other features known to be associated with improved academic outcomes in turnaround contexts, such as extended learning time (Schueler et al., 2021) and higher teacher pay in the context of a performance-based career ladder system (e.g., Dee & Wyckoff, 2015; Hanushek et al., 2023).
That said, we want to be clear that these patterns are suggestive rather than definitive. For example, it is unclear whether effects varied because of differences in the policies leaders pursued or whether preexisting differences across the contexts dictated differences in the policy approaches leaders could realistically pursue (e.g., the Lawrence approach may not have been possible to implement in Holyoke or Southbridge). In these contexts, reformers may have had a more limited labor pool to draw on, and indeed, our interviews with leaders revealed challenges with recruiting staff in these contexts, and the research suggests teacher labor markets tend to be quite local (Sanderson Edwards et al., 2022). Based on the available evidence, we are not able to definitively say that the poorer outcomes in Holyoke and Southbridge were due to the reduced emphasis on multilevel governance in these contexts. There may have been other causes. For example, the state leadership's capacity for supporting multiple turnaround efforts could have contributed to challenges in Holyoke and Southbridge. It is also possible that the reasons for failure are entirely different in Holyoke versus Southbridge (e.g., one could have been due to teacher staffing challenges and the other to leadership turnover). For these reasons, we see this work as contributing to theory building rather than a rigorous test of a particular theoretical framework. Future research could isolate the impacts of this approach and test the theory behind a multilevel governance approach more rigorously. It is also unclear whether a multilevel governance approach would be optimal beyond Massachusetts. Furthermore, we do not mean to suggest that multilevel governance is the only possible pathway to district improvement. Neither do we argue that effective district improvement must occur in the context of takeover; rather, we are interested in understanding variation in the effectiveness of state-initiated district turnaround.
An underappreciated limitation of the accountability policy literature is that it often focuses on the impact of the intervention itself rather than the impact of accountability pressure (designed to incentivize better performance). For example, in the state takeover space, although researchers have found that when takeover occurs, it does not increase student achievement on average (Schueler & Bleiberg, 2022), it may be the case that takeover laws improve achievement in low-performing systems via the threat of takeover in districts that never actually experience one. The SEZP experience provides suggestive evidence on this question as an example where the district, under the threat of takeover, avoided state receivership by adopting a novel form of governance with greater state involvement than a typical district but greater local involvement than a typical takeover district. This arrangement benefited student achievement and is a model worth studying in more detail because it has now spread to 10 different states, covering more than 29,000 students nationwide. Although more research is needed, this may be a path to district improvement that avoids some of the more contentious aspects of takeover.
One sobering point is that reforms in both Lawrence and SEZP narrowed gaps in achievement between the targeted districts and the statewide average but did not close these gaps entirely. Lawrence caught up on some outcomes with other majority low-income districts but after 7 years of reform was still performing well behind the state average. SEZP made notable gains but remained at achievement levels below other majority low-income districts as of the last pre-pandemic year. Therefore, there remains an urgent need to identify strategies capable of more fully addressing opportunity gaps and ultimately eliminating educational inequality. Despite this need, the magnitude of the positive impacts in Lawrence and SEZP was noteworthy. In math, the effects were equivalent to roughly one quarter of the overall average statewide difference in achievement between low-income and non-low-income students. These impacts were comparable in size to the impact of efforts to implement the practices of high-performing charter schools into traditional public schools in Houston (Fryer, 2014) and to grandfather traditional public school students into high-performing charter schools in New Orleans (Abdulkadiroğlu et al., 2014). Therefore, although these districts still have ample room to grow, leaders did generate rare and remarkable improvements in student academic outcomes, providing lessons for leaders seeking to do the same in their own contexts.
Supplemental Material
Supplemental material, sj-pdf-1-aer-10.3102_00028312251360910 for Can States Sustain and Replicate School District Improvement? Evidence from Massachusetts on Multilevel Governance by Beth E. Schueler, Liz Nigro and John Wang in American Educational Research Journal
Footnotes
Acknowledgements
The authors thank Ziqiao Chen for significant research assistance and data management; our colleagues at the EdPolicyWorks Center at the University of Virginia; Association for Education Finance and Policy 2023 conference participants; NAED/Spencer 2023 Annual Fall Retreat participants; Doug Harris, Susan Moore Johnson, Kris Gutiérrez, Whitney Kozakowski, Jim Soland, and Kylie Anglin for feedback; and our partners at the Massachusetts Department of Elementary and Secondary Education as well as the Lawrence Public Schools, Holyoke Public Schools, Southbridge Public Schools, and the Springfield Empowerment Zone Partnership. Finally, we thank the editors and anonymous reviewers who pushed our thinking on this work in unusually helpful ways.
Funding
This research was supported by the Spencer Foundation and WT Grant Foundation. Liz Nigro and John Wang's work on this project was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant No. R305B200005 to the University of Virginia.
