Abstract
Over many decades, educators have developed countless interventions and theories about how to create lasting change. Implementation research is the study of these efforts with a set of basic questions: What are we doing? Is it working? For whom? Where? When? How? And, Why? In other words, implementation research is an endeavor to understand if and how educational efforts are accomplishing their goals. This chapter describes the landscape of implementation research, tracing it back to its historical roots and connecting it to other fields with the aim of identifying common threads across diverse efforts. The authors survey where the field is today and highlight different perspectives on complex questions that have long troubled researchers. They outline some of the sticky issues ahead and make a case for shared conceptual clarity and clearly communicated and understood language that will help researchers understand how various bodies of implementation research work are related. The authors conclude by describing the opportunity presented to the education research community in this moment: to capitalize on and learn from historical and contemporary work in education and other fields, and to identify connections across theories and approaches and find ways to collectively move forward toward the shared goal of making education better.
The field of education research is ultimately focused on one shared goal: making education better. Over many decades, we have developed countless interventions and theories about how to develop, enact, iterate, operationalize, institutionalize, and diffuse something that will yield the prize of successful, lasting change. Implementation research is the study of these efforts. It examines the products, strategies, processes, and theories that researchers, practitioners, policymakers, and other stakeholders create by asking a set of basic questions: What are we doing? Is it working? For whom? Where? When? How? And Why? In other words, implementation research is an endeavor to better understand whether the field of education research is accomplishing its goal.
This chapter aims to describe a landscape of implementation research so that the field can identify and connect common threads across diverse efforts. “Implementation” has been referenced in education for decades (Berman, 1981; Fullan & Pomfret, 1977; Penuel, Fishman, Cheng, & Sabelli, 2011; Scheirer, Shediac, & Cassady, 1995; Spillane, Reiser, & Reimer, 2002). However, as a named field of scholarship in education, implementation research is relatively young. There are no journals focused solely on implementation research in education, nor are there consistent methods of inquiry. Indeed, one of the premier journals for implementation research in health care, Implementation Science, has been in circulation only since 2006. Moreover, as the study of implementation in education has evolved, the emergence of vastly different philosophical, theoretical, and practical orientations has made shared learning difficult.
In this chapter we do not argue for creating one approach for carrying out implementation research. We do, however, make a case for shared conceptual clarity and common (or at least clearly communicated and understood) language so that those working under the broad umbrella of implementation research can understand one another and how their various bodies of work relate.
For this chapter, we offer a working definition of implementation research as systematic inquiry regarding innovations enacted in controlled settings or in ordinary practice, the factors that influence innovation enactment, and the relationships between innovations, influential factors, and outcomes. In turn, we define innovations as programs, interventions, technologies, processes, approaches, methods, strategies, or policies that involve a change (e.g., in behavior or practice) for the individuals (end users) enacting them. To clarify further, an innovation can range from simple (e.g., tool or procedure) to complex (e.g., system-wide professional development or cross-system collaboration), and it can reside at one or more system levels (e.g., classroom, school, district, state). To communicate across domains and disciplines, we sometimes simply describe the innovation as “the it”: the object or focus of change in implementation research.
In this chapter, we examine work in education and other fields to identify how their driving questions about large-scale social innovation, improving education, and creating and sustaining change converge in the hybrid endeavor we are calling implementation research. We survey where the field is today, as evidenced in both theoretical and empirical literature, and highlight different perspectives on issues and complex questions that have long troubled researchers. We outline some of the sticky issues ahead and finish by describing the opportunity presented to the education research community in this moment: to capitalize on and learn from the historical and contemporary work of others so that we can identify connections across theories and approaches and find ways to collectively move forward toward our shared goal of making education better.
Our Approach
Implementation research is not, in itself, a suitable topic for a traditional targeted systematic review. It is not the subject of empirical studies that can be found, evaluated, and synthesized. Nor is implementation research, itself, a specific innovation or even a genre of an innovation that can be tested as one might see in systematic reviews. It involves more than a single set of methodologies, and it includes many different theoretical approaches. The very qualities that make it a timely topic for this publication—its evolving nature, vague and overlapping definitions, diverse theoretical orientations, and widely varied applications—are the same qualities that make it a poor subject for a targeted search.
As a result, this chapter is a review in the most general sense, in that it provides an account of a particular body of literature (Sayfori, 2014). Under this general umbrella, we combine several of the focus areas and goals that characterize literature reviews as described by Cooper and Hedges (2009). More specifically, this chapter is informed by searches that focused on original and highly influential research as well as on systematic reviews, including a comprehensive review that we carried out in 2008 and a review conducted specifically for this chapter, targeting highly referenced works. This is a narrative synthesis in the sense that it seeks to tell a story—one of the growth and development of implementation research in education—yet not a synthesis in the sense of consolidating a set of findings (Popay et al., 2006).
In this narrative, we acknowledge that implementation research is carried out with different intentions, different approaches to inquiry, and different methodologies, and we aim to account for that diversity while also identifying common threads. We draw from others’ reviews and writings, as well as from our own, to describe emerging conceptual frameworks and approaches. Unlike some other syntheses, ours does not seek to arrive at any definitive conclusions about a single innovation or type of innovation by using the findings of other works. Rather, it uses a combination of best evidence synthesis, systematic review, and narrative synthesis to provide the reader with a coherent and succinct snapshot of the past, present, and future of implementation research.
Positioning Implementation Research: Looking Beyond Fidelity
For many, implementation research is assumed to focus on establishing or evaluating fidelity of implementation, that is, the extent to which an innovation is enacted according to its intended model. This assumption is sometimes coupled with a view that implementation for educational improvement should entail replication of programs that demonstrate sufficient evidence of being potentially beneficial (Domitrovich & Greenberg, 2000; Downer & Yazejian, 2013; Durlak, 2010). While implementation research does include these views, its definition is far more wide-ranging. Current understandings of implementation research acknowledge decades-old observations: Contexts and conditions can affect innovation enactment in legitimate ways; replication is not always possible or even desirable; and improving education requires processes for changing individuals, organizations, and systems. These ideas are not new, but they are only now finding a place in the implementation research context.
Looking back 40 years, in the 1970s the seminal RAND Change Agent study (Berman & McLaughlin, 1978) shifted education reformers’ mindsets with its conclusion that an innovation adoption decision alone was not reason enough to believe that the innovation had been or would be fully implemented (Short, 1973). Although this realization may seem obvious now (at least among implementation researchers), it was not so to many at the time. Solomon, Ferritor, Haern, and Myers (1973) made this then-new observation: “The fact that materials and strategies were prescribed does not guarantee that the teacher actually engaged children in the intended way” (p. 2). Accepting this fact called for recognition of the difference between dissemination and implementation. While “dissemination science” focuses on what creators of innovations do to reach potential adopters, implementation science focuses on what happens once the innovation is in their hands (Dearing & Kee, 2012). Simply put, implementation research (or what Dearing and Kee refer to as “implementation science”) is not about the adoption decision alone; rather, it is about investigating what happens next—what is actually enacted, how an innovation is enacted, and why the contexts, conditions, characteristics, and other influences shape innovation enactment as they do.
In the decades since the educational policy implementation studies of the 1970s (Backer, 1991; Honig, 2006), rich and broadening perspectives on implementation research have grown in the medical and health literature. Implementation research in education has broadened as well, though in relative isolation from these other fields. Some studies have maintained a focus on questions of fidelity, replication, and outcomes (e.g., Hulleman & Cordray, 2009; T. Smith, Cobb, Farran, Cordray, & Munter, 2013; Strain & Bovey, 2011), while others have turned away from fidelity, to examine innovation adaptation, the reasons for adaptation, and the relationships between adaptation and student outcomes (e.g., Barab & Luehmann, 2003; Fogleman, McNeill, & Krajcik, 2011; Forbes & Davis, 2010). Other efforts—some rooted in the design experiments of the 1990s (e.g., A. L. Brown, 1992)—pursue questions of innovation creation and iteration more deeply (Bell, 2004; Collins, Joseph, & Bielaczyc, 2004; DeBarger, Choppin, Beauvineau, & Moorthy, 2013), while yet others pursue questions that embrace the notion of iterative design with broader change concerns.
Confrey, Castro-Filho, and Wilhelm (2000), for example, wrote that implementation research “models and documents the interrelations among system components identifying the catalysts and impediments to change” (p. 182). The design-based implementation research (DBIR) approach (Penuel et al., 2011) also focuses on more complex questions related to problem identification and to innovation development and iteration, as well as on developing theory and supporting system capacity for sustained change. These efforts represent a growing recognition that implementation research seeks to do more than answer questions pertaining to efficacy and fidelity; it includes questions about all aspects of the dynamic, complex implementation process.
Given the breadth of studies that fall under the implementation research umbrella, it is not surprising that disparities exist in the literature with respect to how implementation is defined. For example, Durlak (2015) defines implementation as “what a program looks like ‘on the ground’ when it is being conducted, as opposed to what a program looks like in theory or on the drawing board” (p. 1124) and, in doing so, maintains an innovation-focused definition of implementation. In contrast, Fixsen, Naoom, Blasé, Friedman, and Wallace (2005) describe implementation as a “specified set of [professional development] activities designed to put into practice an [intervention] activity or program of known dimensions” (as quoted in Dunst, Trivette, & Raab, 2013, p. 88). Other researchers understand implementation research to be like translational research in medicine, which seeks to move findings from clinical studies to practice settings or communities (Rubio et al., 2010). Still others focus on the dynamic nature of the innovation and the context. Berman (1981), for example, observed that implementation is “the adaptation of an innovative idea to its institutional setting” (p. 273), and Bryk et al. (2013) offer even more complexity by describing a nested innovation—a networked improvement community that is itself enacting innovations in different settings.
Notwithstanding these diverse purposes, theoretical orientations, and methods, we are beginning to see some convergence on the idea that implementation research as an area of inquiry includes the study of the innovation implementation (on-the-ground enactment and/or participation by the end user), the conditions and contexts that affect implementation (influential factors), and the aligned outcomes at a grain size commensurate with the innovation and its components (Century & Cassata, 2014; Durlak, 2010; Downer & Yazejian, 2013). Bringing conceptual clarity to these three key elements—the innovation, the influential factors, and the aligned outcomes—is at the heart of facilitating the field’s movement forward.
Why Study Implementation?
In brief, implementation research seeks to shed light on what an innovation can and should be (innovation design, development, testing, and improvement), what happens during and after innovation enactment (whether it worked and how, why, and where), and what we can learn through these inquiries about improving education (Domitrovich & Greenberg, 2000; Fullan & Pomfret, 1977). Under this broad umbrella, researchers then have varied interests and perspectives that we have grouped into five main categories (see Table 1). These categories are neither mutually exclusive nor bound to any particular theoretical orientation.
Table 1. Reasons to Study Implementation and Example Purposes
First, implementation research asks questions that inform innovation design and development. That is, it examines questions about what the innovation could and/or should be, the extent to which an innovation is feasible in particular settings, and its utility from the perspective of the end users (Berman, 1981; Lynch & O’Donnell, 2005; Penuel & Fishman, 2012). Findings from these questions can help innovation developers—whether curriculum writers, researchers, policymakers, practitioners, or a combination thereof—make informed choices about innovation components; identify the supports that may be needed prior to, during, and after innovation implementation; and identify innovation elements that are more or less challenging for end users (Penuel, Phillips, & Harris, 2014; Rohrbach, Grana, Sussman, & Valente, 2006).
Second, implementation research pursues questions about whether the innovation achieves its desired outcomes. Some studies focus on evaluating efficacy and effectiveness in particular contexts to establish evidence of “what works” through rigorous (often experimental or quasi-experimental) innovation testing. These studies seek to make causal claims about innovation impacts, which requires establishing the extent to which the innovation was actually enacted in the treatment and comparison conditions (Basch, 1984; Cohen, 1975; Dhillon, Darrow, & Meyers, 2015; Domitrovich & Greenberg, 2000; Moncher & Prinz, 1991; L. Peterson, Homer, & Wonderlich, 1982; Rohrbach et al., 2006; Solomon et al., 1973; Song & Herman, 2010; Wolery, 2011). In these cases, implementation data can confirm the integrity of experimental designs and enable statistical analyses of relationships between levels of implementation and outcomes (e.g., Darrow, 2013; Durlak, 2015; Gresham, Gansle, Noell, Cohen, & Rosenblum, 1993). L. Peterson et al. (1982) succinctly refer to this as “assessment of the independent variable” (p. 479).
Questions of “does it work” need not be limited to experimental or quasi-experimental designs, however. Some innovations, such as those that are complex, place-based, or perhaps system-embedded, are not good fits for an efficacy/effectiveness orientation that functions under more constrained conditions or seeks to make claims of generalizability. For example, one might consider DBIR, with its specified characteristics and system-level outcomes, to be, in itself, an innovation. Within that innovation resides an effort to solve a “persistent problem of practice” (Penuel et al., 2011). The solution to the problem of practice may (or may not) be one or more other innovations that reside within the larger innovation of the DBIR structure. In this case, the question of effectiveness is twofold: Is there a meaningful solution to the problem of practice, and in turn, was the DBIR approach a powerful mechanism for realizing that solution? For dynamic and complex innovation approaches such as this, implementation research can examine success using qualitative, mixed-methods, and multidimensional research designs.
Third, implementation research asks questions that examine why the innovation works, for whom, where, when, and under what conditions. In other words, these studies explore relationships between influential factors (e.g., characteristics of the user, the organization, the external environment, or the innovation itself) and innovation enactment, with some also examining associated relationships to outcomes. In such studies, innovation enactment might be considered the dependent variable. Other studies with this orientation may test more complex theories, for example, by treating influential factors as moderating or mediating variables that influence innovation enactment and have indirect effects on intended outcomes.
Fourth, implementation research questions stem from the desire to improve the innovation and, in turn, the intended outcomes (Domitrovich & Greenberg, 2000). For studies on clearly bounded interventions (e.g., curriculum materials or professional development models) that have undergone efficacy testing, this type of implementation research shifts from establishing internal validity toward understanding what Wolery (2011, p. 156) called the “transportability” of the innovation and its use in the complex world of real-life settings (Berman, 1981; Domitrovich & Greenberg, 2000; Scheirer et al., 1995). Some studies targeting improvement seek to identify and establish support systems for innovation implementation, while others involving more complex or multilayered innovations give more attention to innovation design iteration. Ultimately, the goal of this kind of research is to improve the innovation design, the support strategies, and other elements of the context, to better realize the intended innovation outcomes.
Fifth, implementation research informs theory development. Implementation research informs a wide range of theoretical questions, from how to bring about deep, lasting change to how to understand the most essential elements of an innovative curriculum. Although the theoretical orientations, methodologies, and purposes of research described above vary widely, all nonetheless fall under the broad implementation research banner. Further, all share the goal of ultimately addressing the perennial and confounding fact that even when efficacy is established, implementers (end users) do not necessarily embrace innovations in any broad or sustainable way (Dearing, 2009; Dearing & Kee, 2012). This challenge stands for the simplest and the most complex innovations. In 1977, Fullan and Pomfret wrote, “By investigating implementation directly . . . we can begin to identify some of the most problematic aspects of bringing about change” (p. 337). The field has repeatedly confirmed that bringing about change is not as simple as finding what works. As Dearing (2009) succinctly states, “We assume that evidence matters in the decision making of potential adopters” (p. 509). Although evidence is likely to matter to some extent, many other things matter too. Implementation research reveals what they are.
Literatures that Inform Implementation Research
Implementation research has roots in many varied and overlapping fields, a few of which we touch on here. Each offers a different view on two inextricably tied questions: (a) How do ideas (as manifested in innovations) spread? and (b) Why do people choose (or choose not) to embrace them?
Diffusion Theory and Dissemination
Diffusion theory can be traced to the 1800s with the work of two contributing founders: the French sociologist Gabriel Tarde and the German sociologist Georg Simmel. Tarde explained diffusion as “a societal-level phenomenon of social change” (Dearing & Kee, 2012, p. 60). His views of diffusion were influenced by practical observations and his belief that societal changes resulted from individuals’ desires to imitate inspirational and original ideas (Green, Ottoson, García, & Hiatt, 2009; Kinnunen, 1996). Like Tarde, Simmel focused on individual actions but, in particular, theorized that they are affected by external conditions (Dearing, 2008). Both scholars are acknowledged as among the first to ask questions pertaining to how and why innovations spread from one individual or societal context to another (Dearing & Kee, 2012).
The 1920s and 1930s saw more anthropologists exploring diffusion (Backer, 1991; Green et al., 2009). As Dearing and Kee (2012) explained, anthropologists’ work “focused not only on spread of innovations, but also on how cultures in turn shaped those innovations by giving them new purposes and by adapting them to suit local needs—the beginnings of what we now call implementation science” (p. 61). Moving ahead to the 1940s, any account of the influence of diffusion on implementation research would be incomplete without mention of the Ryan and Gross report (1943) on the diffusion of farmers’ use of hybrid corn. Their study set the stage for many diffusion studies to follow, including those that would inform Everett Rogers’s (1962) seminal work Diffusion of Innovations. Rogers’s work brought varied perspectives together in demonstrating how “macrolevel processes of system change could be linked to microlevel behavior” (Dearing & Kee, 2012, p. 63).
Diffusion theory informs implementation research by examining individual decision making and the attributes of an innovation that affect that decision making. It has highlighted the pervasive but not always accepted view that individual behavior change is slow and sometimes “discontinuous” over time (Rohrbach et al., 2006). This phenomenon continues to stymie those who seek to design implementation research studies today.
If diffusion research is the study of the natural spreading of innovations, dissemination research then is focused on conscious, active efforts to spread new knowledge or information to potential new adopters (Green et al., 2009). Dissemination activities include clearinghouses, special publications, and other methods for getting information about an innovation to audiences. These activities are, in turn, distinct from implementation activities that happen after the information is provided (Backer, 1991). The first instances of explicit authorization in support of dissemination in education in the United States came with several broad policy initiatives, including the National Defense Education Act of 1958, the Elementary and Secondary Education Act of 1965, and the creation of the National Institute of Education, which in 1972 included dissemination as one of its priorities (Love, 1985). Other dissemination models included the creation of ERIC (Education Resources Information Center) and the regional laboratory system (Louis, 1992). And yet, even though dissemination activities were popularly supported in the 1960s and 1970s, there was little evidence that dissemination actually resulted in behavior change (i.e., ongoing implementation) on the part of end users (Greco & Eisenberg, 1993; Grimshaw, 1999).
Knowledge Utilization and Technology Transfer
In the 1970s, scholars recognized that the concepts of diffusion and dissemination, and the related concept of knowledge utilization, were distinct, even though individuals casually used the terms interchangeably (Green et al., 2009). Whereas studies concerned with diffusion and dissemination targeted the use of innovations, studies focused on knowledge utilization were concerned with the use of knowledge of all kinds (Green et al., 2009). The knowledge utilization lens involves asking questions about the ways that knowledge gets used and how it can better be brought into use. Lehming and Kane (1981) define “knowledge” as information from research, practice, or both and suggest that it can reside in ideas, theories, explanations, advice, or “things” (p. 11). It is these things that are of interest in implementation research; they are the innovations that implementation researchers study.
Best, Hiatt, and Norman (2008) suggest that knowledge utilization can be broken down into two steps: the “imparting of research knowledge from producers to potential users” (dissemination) and “knowledge uptake—that is, the acquisition and review of research knowledge and its utilization” (p. 321). Implementation research brings a parallel perspective. In the case of an externally created innovation (from the end user perspective), implementation represents knowledge uptake as manifested in enacting an innovation that embodies a “set” of knowledge that originated from the innovation developers. In other cases, implementation focuses on co-creation of knowledge through collaborative innovation development (DeBarger et al., 2013). Other studies examine what happens when end users not involved in the development of an innovation enact that innovation, taking into account their previous knowledge (what some might call an influential factor) as they make sense of the new knowledge operationalized in the innovation (Coburn & Talbert, 2006; Penuel et al., 2014).
To complicate the lexicon further, knowledge utilization is sometimes confused with knowledge transfer and technology transfer. Similar to the concept of dissemination, knowledge transfer is defined as “a process of transmitting or conveying information from the developer, organizer or interpreter of research to the potential users” (Love, 1985, p. 344). Technology transfer, then, is exactly what it sounds like: a process of dissemination of technology, equipment, and devices or of technical information (Backer, 1991; Love, 1985). Twenty years ago, Hutchinson and Huberman (1994) considered the technology transfer perspective to be antiquated in the context of understanding innovation implementation. They saw what Short (1973) saw more than 20 years earlier: that researchers have unrealistic expectations about the contributions that research makes to practice. Short posited that “the relationship of research to practice is not a one-to-one relationship; rather it appears to be a process involving a series of complexly interrelated steps, still only partially understood” (p. 242).
Individual, Organizational, and Educational Change
Questions pertaining to change lie squarely at the heart of implementation research: What does it take for people, organizations, and systems to change? Oancea and Pring’s (2008) observation illuminates the complexity of this question:
To ask why a person acts in the way he or she does is logically very different from asking why the lights failed or why such a person has “flu”—a different sort of explanation is required—one in terms of intentions and motives. (p. 31)
Thirty years prior, Berman and McLaughlin (1976) noted that effective implementation depends on individuals’ and organizations’ capacity for and receptivity to change and their joint process of adapting an innovation to meet local needs. Enacting new knowledge as manifested in new practices or programs in educational settings requires individual and organizational change that might be mechanically difficult and, more important, psychologically threatening (Backer, 1991).
Individual Change
Just as Tarde’s late 19th-century work on diffusion entailed looking at individual motives for embracing new ideas (Dearing & Kee, 2012), some scholars examine educational change through the lens of the individual end user. Theories of change emphasizing the key role of end users who bring their own information and expertise to a situation gained prominence in the 1970s (Hutchinson & Huberman, 1994). One of the most widely referenced models is Hall and Loucks’s (1978b) concerns-based adoption model that describes and explains the process of change experienced by teachers who implement instructional innovations (Anderson, 1997). This model is grounded in the assertion that change is a very personal experience requiring developmental growth (Hord, Rutherford, Huling-Austin, & Hall, 1987). Cohen and Ball (1990) also affirmed that teachers’ practices are shaped by their many personal experiences over time and that changing those practices is no trivial matter, asserting that “changing one’s teaching is not like changing one’s socks” (Cohen & Ball, 1990, p. 334). Fullan (2007) points to the potential discomfort associated with change that individuals may experience, noting that “even changes that do not seem to be complex to their promoter, may raise numerous doubts and uncertainties on the part of those not familiar with them” (p. 45).
As Hord et al. (1987) suggested 30 years ago, change is enacted by individuals, and individuals each bring their own experiences to the process of change. Coburn and Talbert (2006) emphasize the important role that enactors’ prior beliefs and experiences play in innovation implementation. Similarly, S. Peterson (2013) suggests that educational change is a collaborative practice of meeting people where they are. Other models of individual change have put forward the notion of sense-making as a lens through which to understand innovation implementation. Spillane et al. (2002), for example, note that the adaptations during implementation are the result of “human sense-making” (p. 419), that is, the ways that the innovation user constructs understanding of the innovation in practice informed by prior experience. Increasingly, implementation research is embracing these long-standing perspectives that recognize the important role of the individual in any change effort.
Organizational Change and Educational Change
The literature on organizational change comes from many disciplines, including health, business, and law, with many theories and research approaches seeking to find answers to similar challenges in how to bring about large-scale change behaviors (Burnes, 2005; House, 1981; Inbar, 1996; Rohrbach et al., 2006) or “transfer capabilities” (Szulanski, 1996, p. 27). Most of this literature carries a common theme: that organizational systems within which innovations exist (in the case of education, schools and school systems) are complex and constantly changing. With this acknowledgement, the organizational change literature points toward the challenge that implementation researchers face in making sense of that complexity. As Meyer and Goes (1988) observe, “From both the theoretical and practical perspectives, our cumulative knowledge of why and how organizations adopt and implement innovations is considerably less than the sum of its parts” (p. 897).
Given that educational change theories are closely tied to organizational theories, it is not surprising that they, too, overlap and, to some extent, lack coherence. Some focus on systemic change, others on piecemeal change, and still others on phases of change (Joseph & Reigeluth, 2010). There are theories about “change agents,” “planned change,” “systemic change,” “conditions of change,” and more (Ellsworth, 2000; Fullan, as cited in Anson, 1994; Leithwood & Montgomery, 1980; Miles, 1998; Sikorski, 1976; Yin, Quick, Bateman, & Marks, 1978). Clark and Guba (1965) wrote about models and processes of change in education over 50 years ago, using many of these same words; yet the challenge of bringing consistency to the dialogue persists.
Sashkin and Egermeier’s (1993) comprehensive review of educational change models identifies three perspectives on bringing about change. The first is a rational-scientific model that focuses on dissemination of new techniques to the end user. One might place the National Diffusion Network of the 1970s and the What Works Clearinghouse of today in this group. Second, there is a political perspective (top-down) that focuses on achieving change through policies. For example, the focus on systemic change in the 1990s (e.g., M. Smith & O’Day, 1991) sought to bring coherency to elements of the system rather than emphasize any single innovation (Knapp, 1997). Third is a cultural perspective, one that seeks to make change by influencing individual values. Given that these models are not mutually exclusive and overlap, implementation research can be situated within or across any of these orientations.
Many who take part in the educational change discourse lament the stubbornness of the system in relentlessly retaining the status quo. Elmore (1996) discusses the notion that what passes for change is not really change at all; that is, changes take place at a surface level but never really penetrate to “the core of educational practice” (p. 2) in any way that will endure. Similarly, Coburn (2003) offers an alternative view of scaling change as one that eschews breadth for depth, or what she calls “deep and consequential change in classroom practice” (p. 4). Tyack and Cuban (1995), in their seminal Tinkering Toward Utopia, put it well: “To bring about improvement at the heart of education—classroom instruction . . . has proven to be the most difficult kind of reform” (p. 135). Implementation research, in its current form, is shaped by the long-standing histories of preceding fields that sought to address these concerns.
Recent History: Approaches, Frameworks, and Methods
With these literatures as a backdrop, we now provide an overview of the state of implementation research today. Our literature search involved three distinct approaches. We began with a comprehensive, systematic review that we had conducted in 2008 to identify factors that affect the implementation, spread, and sustainability of innovations (Century, Cassata, Rudnick, & Freeman, 2012). For this chapter, we revisited that work (using the Web of Science database) with a 100-year historical lens by conducting backward and forward citation searches of the most highly referenced works to date, paying particular attention to references published in the education literature.
Next, because a comprehensive review was beyond the scope of this chapter, we identified key empirical research approaches and analytic strategies by locating highly cited articles that included practice recommendations and preliminary guidelines for implementation research (e.g., Dane & Schneider, 1998; Durlak & DuPre, 2008; Dusenbury, Brannigan, Falco, & Hansen, 2003; Mowbray, Holter, Teague, & Bybee, 2003; O’Donnell, 2008) and then conducting a forward search in Web of Science to locate empirical education research studies that referenced one or more of these seminal works.
Finally, we reviewed the recent “gray literature” from growing professional organizations in and outside of education (including the American Educational Research Association, the Global Implementation Initiative, the National Implementation Research Network, and the Society for Implementation Research Collaboration). We located relevant conference presentations, report briefs, webinars, and other, more ephemeral sources disseminated through organizational websites and at professional meetings from 2011 to the present.
Theoretical Frameworks for Implementation Research
Implementation research, by our working definition, is the systematic inquiry of innovations enacted in controlled settings or in ordinary practice, the factors that influence innovation enactment, and relationships between innovations, influential factors, and outcomes. Thus, frameworks that inform the organization of implementation research address two main concerns—how to conceptualize and describe the innovation itself, and how to identify and organize the contexts, conditions, and characteristics that influence innovation enactment (influential factors). These two fundamental concepts—(a) characteristics of the innovation and (b) influential factors—are basic elements of varied theories of change and a key part of most recent research syntheses or metaframeworks depicting innovations in context (e.g., Domitrovich et al., 2008; Donaldson, 2001; Hulleman, Rimm-Kaufman, & Abry, 2013; Moulin, Sabater-Hernandez, Fernandez-Llimos, & Benrimaj, 2015).
Conceptual frameworks enable researchers to effectively communicate about implementation phenomena with other researchers, practitioners, developers, and policymakers. Moreover, they provide a starting point for the sometimes obscured but substantial challenge of knowing exactly where to draw the conceptual boundary between an innovation itself and the contexts that influence its enactment. As established earlier, change is complex: Numerous multilevel, interacting, and dynamic variables work together to produce desired outcomes. Thus, the wily line that distinguishes innovation from context can be hard to pin down, but doing so, informed by a theoretical orientation, is a necessary step for researchers seeking to specify analytic models that explore the complex relationships between innovations, contexts, and outcomes.
In the sections that follow, we first discuss the challenge of, and approaches to, defining the innovation. We then outline theoretical frameworks for organizing and describing influential factors in innovation enactment. We end this section by revisiting considerations for distinguishing innovation from context.
Conceptualizing the Innovation
It is now widely accepted that educational innovations are (or at least should be) developed according to implicit or explicit theories of change and that they contain multiple elements (i.e., features, building blocks, ingredients) designed to produce desired outcomes in a given context. At the same time, researchers acknowledge that not all innovation elements are created equal: Some are core components (also known as essential components, critical components, or active ingredients) that are theorized or empirically determined to be the key contributors to outcomes of interest and primary mechanisms for change (Abry, Rimm-Kaufman, Larsen, & Brewer, 2013; Damschroder et al., 2009; Darrow, 2013; Domitrovich et al., 2008; Fixsen et al., 2005; Greenhalgh, Robert, MacFarlane, Bate, & Kyriakidou, 2004). For this reason, core components are often considered indispensable in practice (Damschroder et al., 2009; Greenhalgh et al., 2004), at least until empirical data prove otherwise. The remaining innovation components, in theory, are considered to be nonessential “related components” (Hall & Loucks, 1978a) or part of the “adaptable periphery” (e.g., Damschroder et al., 2009).
Identifying “core” components
Describing innovation components is challenging, even when working with the developers themselves (Hall & Loucks, 1978a; Leithwood & Montgomery, 1980; Meyers, Durlak, & Wandersman, 2012). This is true for relatively simple innovations as well as for complex and multifaceted innovations. It is not uncommon for developers to be unsure about which elements are indeed most critical (Remillard, 2005; Ruiz-Primo, 2005), and there is a tendency for innovation creators to identify the majority of components as “very important” (Mowbray et al., 2003) and to hold holistic views of their innovations (i.e., as “packages”), leading to component descriptions that lack specificity (Cohen, 1975; Harn, Parisi, & Stoolmiller, 2013; Leithwood & Montgomery, 1980). For these reasons, researchers are encouraged to use multifaceted approaches to identifying innovation components that combine information from developers and other experts, from end users, from observations of innovations in practice, and from reviews of artifacts, such as practice guides and other program materials (e.g., Century, Rudnick, & Freeman, 2010; Leithwood & Montgomery, 1980; Mowbray et al., 2003).
Some researchers who have focused on conducting comparison studies suggest that identified core components should be further classified into two categories originally described by Waltz, Addis, Koerner, and Jacobson (1993): “unique” (i.e., innovation-specific) and “necessary but not unique.” This perspective acknowledges that many innovations and business-as-usual educational practices may have some degree of component overlap (Century et al., 2010; Michie, Fixsen, Grimshaw, & Eccles, 2009; Nelson, Cordray, Hulleman, Darrow, & Sommer, 2012), in particular, with components representing general quality of instruction or good teaching practice. Identifying and measuring the enactment of the unique core components in both the treatment and comparison groups, while also taking into account the enactment of nonunique (shared) components, are key to determining differences in innovations (Darrow, 2013; Nelson et al., 2012).
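The classification described above can be illustrated with a minimal sketch. It assumes that each core component has already been tagged as unique or shared and that an enactment rate has been observed in each condition. The component names, the rates, and the helper names (Component, enactment_gaps) are hypothetical illustrations and are not drawn from the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    unique: bool            # True: innovation-specific; False: necessary but not unique
    treatment_rate: float   # share of treatment classrooms enacting it (hypothetical)
    comparison_rate: float  # share of comparison classrooms enacting it (hypothetical)


def enactment_gaps(components):
    """Return treatment-minus-comparison enactment gaps, grouped by component type."""
    gaps = {"unique": {}, "shared": {}}
    for c in components:
        key = "unique" if c.unique else "shared"
        gaps[key][c.name] = round(c.treatment_rate - c.comparison_rate, 2)
    return gaps


# Hypothetical data: one innovation-specific component, one shared with usual practice.
components = [
    Component("student-led investigations", True, 0.80, 0.10),
    Component("clear learning goals", False, 0.90, 0.85),
]
gaps = enactment_gaps(components)
print(gaps)
```

In this illustration, a large gap on the unique component alongside a small gap on the shared component would suggest that the two conditions differ mainly in what is specific to the innovation rather than in general instructional quality.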
Other studies are less focused on making comparisons between different innovation enactments and are more concerned with understanding the operation and evolution of innovations in natural settings. In these studies, it is still essential to articulate innovation components in order to measure, analyze, and understand their relationships to one another and to outcomes. Moreover, specific descriptions of innovation components afford the potential of enabling synthesis of studies that examine innovations with common components, leading to a cumulative knowledge base.
The extent to which innovation components should or can be described at a meaningful level of specificity is shaped by the innovation itself and by its underlying theory of action. Policy reforms, for example, have varied levels of specificity (Desimone, 2002) and often are vaguely stated (Cohen & Ball, 1990), leading to wide variation in implementation on the ground. As Berman and McLaughlin (1978) observed decades ago, policy in operation can look very different in different contexts. In brief, if the field is to learn about improving education through implementation studies, it is essential to be able to answer the question: implementation of what?
Organizing core components
As part of identifying and organizing innovation core components, researchers must decide on the appropriate level of detail with which the components should be described. For instance, in a classroom-level curricular innovation, core components may include specific teacher practices (e.g., guiding students’ learning by taking into account students’ ideas); larger, more abstract categories of practice (e.g., supporting student learning); or even more abstract constructs (e.g., instructional transactions; Ruiz-Primo, 2005). In comparison, in a school-wide innovation, core components may include decision-making activities (e.g., staff participate in decision making), leadership activities (e.g., school leaders model instructional practice), or larger constructs that subsume multiple components (e.g., staff foundations; LaForce, Noble, King, Holt, & Century, 2014). In a district-level innovation, core components may be much broader in nature, reflecting district-wide activities (e.g., professional development). Finally, there are innovations that cut across multiple levels of the education system and have process-oriented components (e.g., collaborative planning groups).
Abry, Hulleman, and Rimm-Kaufman (2015) suggest that researchers, as a rule of thumb, identify the “kernels” of an innovation—the “fundamental units that cannot be further reduced while retaining their impact” (p. 334). Others suggest that once identified, specific innovation components should be organized according to the latent constructs they represent, so that those constructs, even if operationalized differently, can be measured in both the treatment and comparison conditions (Nelson et al., 2012). Another consideration might be the level of the outcome and alignment between the innovation component and that outcome (e.g., professional development elements may align with teacher outcomes; elements of system-level processes may align with system-level outcomes).
In recent years, researchers have begun to make the conceptual distinction between core components that are structural in nature (i.e., those that provide a format and organizational structure for the innovation) and those that represent processes such as specific teacher and student interactions, participation in decision-making processes, and innovation codevelopment (e.g., S. A. Brown, Pitvorec, Ditto, & Kelso, 2009; Century & Cassata, 2014; Harn et al., 2013; McKenna, Flower, & Ciullo, 2014; Odom et al., 2010; O’Donnell, 2008; Ruiz-Primo, 2005). The structure/process approach to categorizing educational innovation components can be traced back to a seminal publication by Mowbray et al. (2003), whose review of implementation research studies challenged the field to rigorously measure processes as well as the more easily captured structural innovation elements. Their work suggested the importance of doing more than noting the presence of a structure; researchers need to understand what happens within structures to draw conclusions about innovation enactment and outcomes.
Conceptualizing Factors Influencing Innovation Implementation
Conceptualizing the innovation specifies the “what” of change; conceptualizing the range of influences on innovation enactment specifies the “why” and the “how.” For several decades, education researchers have sought to identify variables that influence how and why educational innovations are enacted in practice settings. As referenced earlier, in the 1960s and 1970s, educational innovations were seen as replicable “technologies” that would easily transfer once providers had knowledge that they “work.” Then large-scale evaluations brought to light the influence of administrative structures, material resources, and problem-solving strategies on implementation (Berman & McLaughlin, 1976; Fullan & Pomfret, 1977). Concurrently, curriculum implementation research illuminated the needs and concerns of individual practitioners as they attempted to enact changes in their practice (e.g., Ball & Cohen, 1996; Buttolph, 1992; Hall & Loucks, 1978b; Sieber, 1981). Berman (1981, p. 279) was among those documenting these phenomena and published a preliminary categorization of factors affecting the educational change process, calling on the field to develop a “taxonomy of contextual conditions” to guide the design of empirical studies examining the individual and interacting spheres of influence. In introducing his preliminary taxonomy, he spoke about the importance of categorizing different types of variables in order to clarify their status in the educational change process.
Spheres of influence
Since that time, numerous studies have generated lists of factors that may support or inhibit educational and other social innovation enactments. These factors converge across multiple disciplines, including school psychology, health, education, and prevention science. Greenhalgh et al. (2004) were among the first to organize influential factors into a comprehensive framework, creating a model that categorized them into layers or spheres of influence: the individual, the organization, the external environment, and the attributes of the innovation itself. Many similar frameworks and research reviews have since been published (e.g., Century & Cassata, 2014; Chambers, Glasgow, & Stange, 2013; Durlak & DuPre, 2008; Fixsen et al., 2005; Hall & Hord, 2015; Michie et al., 2009; Remillard, 2005; Rohrbach et al., 2006; Sanetti & Kratochwill, 2009; Weiss, Bloom, & Brock, 2013). Moulin et al. (2015) recently located, reviewed, and compared 49 such frameworks across disciplines. The Consolidated Framework for Implementation Research (CFIR) is a particularly well-developed and widely cited framework synthesizing the many influences on implementation identified through theory and empirical research in the health services sector. The CFIR includes a companion website with an online menu of constructs and measurement resources for researchers and evaluators (see http://cfirguide.org).
Characteristics of individual end users: The characteristics of individual end users cited in the literature generally fall into two categories: (a) characteristics of the individual in relation to the innovation (e.g., level of understanding, expertise, prior experience, beliefs, values, attitudes, motivation, or self-efficacy) and (b) characteristics of the individual that exist independently of the innovation (e.g., willingness to try new things, organizational skills, classroom management style, or views about teaching and learning in general). While research has emphasized the importance of individual competence and skills in enacting the innovation (e.g., Hall & Loucks, 1975; Mowbray et al., 2003), there is also a long-standing awareness that innovation use is more than a matter of skill or even self-efficacy. As early as 1950, Caswell urged researchers to consider “psychological factors in change” (p. 69), reminding them that trying out something new always involves an element of uncertainty and risk. This perspective emphasizes that individual innovation users are not passive recipients; in attempting new practices, they actively interpret and make decisions about their use by drawing on prior beliefs and experiences (Ball & Cohen, 1996; Greenhalgh et al., 2004; Penuel et al., 2014). Over several decades, research has noted the highly personal transformational process that takes place during innovation implementation, beginning with a sense of readiness, or a perception that one is willing and able to change (Buttolph, 1992; S. Peterson, 2013). While individual characteristics independent of the innovation are generally not emphasized in theory, preliminary research suggests they are potentially important influential factors (Hill, Blazar, & Lynch, 2015).
Organizational and environmental factors: The next spheres of influence are not always as clear. Depending on the innovation, the boundary of “inside the organization” and “outside the organization” may vary (Damschroder et al., 2009). However, in the context of classroom-level interventions, organizational factors often refer to school- or district-level influences (Snyder et al., 1992). Some organizational factors pertain to characteristics of the setting itself (e.g., class size, resources, physical space, scheduling, organizational structure; Hall & Hord, 2015; Macklem, 2014), while others involve the organizational administration, management, and decision-making processes that individuals in the organization (i.e., school or district) engage in related to innovation adoption and use (Fullan & Pomfret, 1977). The collective attitudes and behaviors of people within the organization (e.g., morale, vision, trust, collaboration, identity, commitment), which some refer to as aspects of organizational culture, are also considered important organizational influences (Maitlis & Sonenshein, 2010; Rohrbach et al., 2006). Environmental factors are those considered “outside the organization” (e.g., government agencies, economic conditions, shifting social priorities, or professional community networks). These elements of the broader context exert indirect influence on innovation implementation (Ball & Cohen, 1996; Fixsen et al., 2005; Fullan & Pomfret, 1977; Snyder et al., 1992).
Attributes of the innovation: The innovation attributes themselves can also influence innovation implementation. Innovations can have actual attributes (objective characteristics) and perceived attributes (subjective user perceptions about the innovation). These attributes, however, have not consistently been differentiated in the literature. Some researchers equate innovation attributes with objective characteristics such as number of components (complexity), specification, scope of effort, empirical evidence of effectiveness, design features, and cost (e.g., Berman, 1981; Century et al., 2012; Damschroder et al., 2009; Fullan & Pomfret, 1977; Snyder et al., 1992). The degree of specification—the explicitness with which an intervention (in whole or in part) is articulated for the end user—varies widely (Cohen & Ball, 1999; Desimone, 2002). While some interventions (e.g., curricula, technology, training materials) provide “blueprints” or detailed plans for end users, other types of interventions (e.g., policies, design principles, or goal statements) are much more ambiguous in comparison, leaving their operationalization more subject to the influence of the local context.
Other researchers consider innovation attributes to include not only objective features but also subjective judgments such as level of attractiveness of the materials, ease of use, familiarity, perceived relevance, and perceived advantage over current practice (e.g., Dearing, 2009; Rohrbach et al., 2006). Adaptability, or the extent to which an innovation may be flexibly enacted to fit the circumstances, may be subjective (depending on the skills, knowledge, and attitudes of the provider) or objective (in cases where flexibility is built into innovations by design). While both categories of factors are potentially important, researchers should note that the subjective factors will vary by different end user populations.
Implementation support strategies
In addition to the factors explained above, there is consensus in the field that deliberate, planned support for innovation users and their organizations is vital to change efforts (Forman et al., 2013; Hall & Hord, 2015; Peters, Adam, Alonge, Agyepong, & Tran, 2013). The variety of these supports is broad, encompassing operational planning, resource provision, professional development, mentoring, strategic planning, evaluative processes, and other strategies that support ongoing implementation and improvement. Supports may be provided by innovation developers or intermediary organizations (also called change agents, change facilitators, technical assistance providers, or purveyors) or may come from within the enacting organization. In the literature, this factor category is described in various ways, including implementation drivers (Fixsen et al., 2005), implementation-level activities (Darrow, 2013), support systems (Domitrovich et al., 2008; Meyers et al., 2012), strategies (Moulin et al., 2015), and implementation practices (Dunst et al., 2013). In this chapter, we use the term implementation support strategies to highlight the primary purpose of supporting end users as they put an innovation into practice (Fullan & Pomfret, 1977). While implementation support strategies are often considered key variables in theories of change (e.g., Dunst et al., 2013; Fixsen et al., 2005; Meyers et al., 2012; Weiss et al., 2013), they are not present in all innovations. Moreover, they do not align cleanly with any single “sphere of influence,” although they usually fall into the organizational or environmental groups.
Ultimately, implementation support strategies are designed according to underlying theories or best practices for facilitating individual and organizational learning and change. For this reason, researchers have begun to emphasize the importance of understanding the extent to which implementation support strategies were carried out as intended as part of interpreting observed effects (e.g., Dunst et al., 2013; Meyers et al., 2012; Nelson et al., 2012; Weiss et al., 2013).
Implementation over time
A final set of theories about factors influencing innovations focuses on implementation phases or stages. In general, these theories differ from others we have outlined because they bring a longer-term view of implementation, depicting a developmental arc from the first moment of innovation adoption to a point at which the innovation has (potentially) become routine. Time-related theories are often cited in research exploring factors that influence an innovation’s propensity to be sustained over time in a particular context. Such theories include views of innovation implementation from the perspectives of both the individual end user and the other individuals in the organization as they move through developmental stages from awareness, to initial adoption, to sophisticated innovation use.
Moulin et al. (2015), for example, in their synthesis of existing frameworks, explicitly note the nonlinear and recursive nature of the implementation process and the possibility of different contextual variables that come into play at different points in time. Hall and Loucks (1975) focus on the individual’s evolution, moving from routine, mechanical use to more flexible and adaptive use with increasing skills and competence. Other frameworks theorize that whole organizations experience similar phases, progressing through awareness, then start-up activities (e.g., planning and securing resources), initial implementation, skilled implementation, and ultimately, routine practice (e.g., Berman & McLaughlin, 1976; Fixsen et al., 2005; Yin et al., 1978).
In effect, studies that examine innovation endurance are concerned with implementation over time. In such studies, the duration of the time horizon combines with the study’s theoretical orientation to inform the study focus. In the shorter term, for example, studies may focus on the extent to which innovation structures are present. As the time horizon lengthens, however, research questions shift, reflecting a deeper concern for the changes that reside at the heart of lasting educational improvement. Adelman and Taylor (2003) speak about the presence of the “valued functions” that reside within the innovation structures and about giving attention to ways those functions can endure even as the innovation structures come and go. Others assert that for change to be truly permanent and meaningful, it must take deep hold at the core of practice (Coburn, 2003; Elmore, 1996). Thus, implementation research studies concerned with innovation endurance focus on questions of what is lasting: whether the innovation is changing, how it is changing, and why.
Distinguishing the Innovation From Context
The diversity of theoretical orientations that researchers bring to implementation research warrants an embrace of different research models, designs, and methods. Still, those engaged in a shared research endeavor need to establish shared conceptual understanding that transcends the different orientations (Bell, 2004; Coburn, 2003). The literature reveals an emerging consensus toward this goal, at the largest grain size, in designating variables as innovation core components or as contextual factors, and in specifying the level of contextual influence (i.e., innovation, individual, organizational, environmental). Then, within broad conceptual frameworks describing the phenomena under study (i.e., the innovation, its component parts, and categories of influence), researchers can home in on a variety of specific innovation-related and contextual variables according to their particular research questions and theoretical orientation.
Still, even with a commitment to identifying the innovation and the influential factors, the lines that distinguish them are not always clear. Depending on the theory of change, the same innovation elements may be considered part of the innovation, external to the innovation, or even outcomes of the innovation. For example, some support strategies might be considered innovations (or innovation elements) in themselves, enacted to enhance the behaviors and skills of practitioners delivering service to target recipients. More specifically, a school district may establish district coaching to support teachers’ implementation of a new instructional resource. The district (or researchers studying the phenomena, or both) may view the instructional resource as the innovation, with the coaching strategy acting as an influential factor (an implementation support strategy). Other districts or researchers, however, might view the coaching strategy and the instructional resource as key elements in a larger, systemic innovation. In the latter case, the coaching strategy is an innovation element, not an influential factor.
The challenge also applies to distinguishing between influential factors and outcomes. For example, teacher characteristics (e.g., beliefs about how students learn, values about education, attitudes about the subject matter, interpersonal skills) are often identified as key mediating or moderating variables that support or inhibit innovation enactment (e.g., Dane & Schneider, 1998; Domitrovich et al., 2008; Hulleman et al., 2013; Macklem, 2014; Ruiz-Primo, 2005). That is, they are considered influential factors. In some cases, however, individual attitudes, perceptions, and skills are explicitly targeted as desired outcomes of innovation enactment and theorized to change as a result of innovation use or participation.
The evident complexity of innovation enactment, its influences, and the multiple roles that variables play at different points in the implementation process challenge us as a field to find ways to communicate about our work in a clear and unambiguous manner. Having both clear language to discuss the innovation elements and shared conceptual understanding about the categories of influential factors will enable researchers to study implementation of their innovations of interest with the methodologies they embrace while being able to meaningfully share findings with one another.
Measurement and Analysis of Innovation Implementation
Perspectives on Measurement
For the purpose of this chapter, we use the definition of measurement provided by Zeller and Carmines (1980)—“the process of linking abstract concepts to empirical indicants” (p. 2)—which involves operationalizing abstract concepts so that they may be observed. Thus, here we discuss the ways the determination of innovation components, influential factors, and innovation enactment may be operationalized for the purpose of systematic measurement. In this process, the translation from abstract concept to operational definition does not dictate a single methodological orientation or research method. Peters et al. (2013, p. 2), in outlining a set of key principles for implementation research, emphasize that the research question, typically organized around a theory of change or specific research objective (i.e., to explore, describe, influence, explain, or predict), drives the specific methods used and assumptions taken. Research questions and associated objectives may be shaped by the five purposes for studying implementation outlined earlier (see Table 1). Peters et al. describe a wide range of qualitative and quantitative research methods, including randomized controlled trials (RCTs), participatory action research, and mixed methods, that can be used to achieve these very different purposes.
Implementation research measurement has been informed by two main perspectives. First, studies conducted with the objectives of designing and developing innovations and establishing their efficacy and effectiveness are frequently driven by an interest in evaluating fidelity of implementation of the innovation—that is, the extent to which the innovation (i.e., its core components) was enacted as intended. In comparison, a second perspective resides in studies driven by the goals of innovation improvement and exploring relationships between innovations, contextual factors, and outcomes. These studies are less concerned with evaluating fidelity and more concerned with describing implementation as conducted (i.e., what actually happened), the extent to which desired outcomes are achieved, and why. They focus on questions such as the following: Is the innovation (i.e., its core components) being used, how is it being used, and to what extent? What is being adapted or modified from the original model? Why?
Research coming from the first perspective (fidelity of implementation) asks, “Was the innovation enacted as intended (i.e., compared to an ideal standard)?” In the context of design and feasibility studies, for example, understanding fidelity of implementation even at the level of an individual component can inform developers (whether external to the setting, internal to the setting, or a combination) about the innovation’s potential to be enacted by the intended user. For experimental or quasi-experimental designs underlying efficacy or effectiveness studies, documenting fidelity of implementation in the treatment group is necessary to determine whether the treatment group indeed received the intended treatment (the innovation). Similarly, fidelity levels may be measured in the control or comparison condition (to the extent that common core components are identified) and then used to determine whether the two groups are sufficiently differentiated (Mowbray et al., 2003; Nelson et al., 2012; O’Donnell, 2008).
Research coming from the second perspective (describing implementation as conducted) may also examine relationships between implementation and outcomes without necessarily comparing it to a theoretical ideal. Such studies may seek to describe the extent and nature of innovation use in practice, including adaptations and omission of core components, and explore the contextual factors that support or inhibit innovation use. Rather than bringing an evaluative view to the innovation enactment, some studies that examine implementation as conducted bring a more descriptive and explanatory approach to the inquiry. Documenting implementation as conducted enables researchers to understand the ways that innovations are operationalized in practice, the influential factors that affect that practice (Hamilton & Feldman, 2014), the different patterns of practice, or “configurations” (Hall & Loucks, 1978a), and in some studies, the relationships between these patterns and outcomes. Researchers using this approach can also catalogue the nature of the adaptations that end users make to better describe the range of beneficial, acceptable (i.e., aligned with program goals and theory), or unacceptable adaptations (Durlak & DuPre, 2008; Hall & Loucks, 1978a; Moore, Bumbarger, & Cooper, 2013; O’Donnell, 2008; Penuel et al., 2014).
These two perspectives provide complementary and useful information about the status of and mechanisms driving innovation effectiveness. The first (fidelity of implementation) has been the primary focus of quantitative measurement in education to date, due in part to the recent predominance of RCT research designs. Fidelity of implementation is hypothesized to mediate the effect of random assignment to treatment or control and is important for understanding the mechanisms by which an innovation achieves its effects (Hansen, 2014). The second (implementation as conducted) involves descriptive or correlational analyses crucial for identifying critical components, specifying ranges of acceptable adaptation, and identifying key supportive and inhibiting contextual factors—data that can inform innovation improvement and endurance.
Notwithstanding the value of studying implementation as conducted and the wide range of implementation research purposes, the next section focuses on emerging understandings of fidelity and evolving strategies for fidelity measurement. The attention given to fidelity here is not intended to emphasize its importance over other perspectives but rather is a reflection of its dominance in the literature targeting implementation measurement.
Conceptualizing Fidelity of Innovation Implementation
While terminology varies across studies, fidelity criteria are generally related to which and how many core components are used, how much of the innovation is delivered and/or received, how well the innovation is delivered, and the level of participant engagement in the innovation (Sanetti & Kratochwill, 2009). Some authors suggest that in addition to these criteria, innovations come with expectations for how, when, and by whom each service should be provided, which may also be evaluated as part of fidelity measurement (e.g., Macklem, 2014; Sanetti & Kratochwill, 2009; Weiss et al., 2013).
Still, the most widely cited criteria for fidelity measurement are attributed to Dane and Schneider (1998), who conducted a literature review that identified five “aspects of program integrity,” which they labeled adherence, exposure, quality of delivery, participant responsiveness, and program differentiation. The first four criteria describe ways the innovation may be enacted or engaged in by users and/or recipients; program differentiation describes a manipulation check carried out by the researcher to ensure that subjects in each experimental condition received only planned interventions (to safeguard against diffusion of treatments). At the time, Dane and Schneider established these categories as provisional, indicating that uniform definitions were needed; yet, 15 years later, Darrow (2013) noted that while the five aspects of implementation identified by Dane and Schneider (1998) are often cited, “individual interpretations and lack of consensus around those categories exist” (p. 1140). Multiple authors have suggested their own definitions, many of which continue to vary across studies or lack the level of detail needed to guide researchers in what, exactly, should be measured.
Hansen (2014) explains that the language of fidelity assessment is neither universally applied nor universally understood: “Even the most basic of terms—fidelity, adherence, dosage, engagement, program differentiation, and adaptation—may have one meaning for an evaluation staff and a very different meaning for practitioners” (p. 336). For example, across studies in the education literature, definitions of adherence capture the general concept of “doing what was expected or as intended,” but as operationalized this may involve doing specified program activities (e.g., Benner, Nelson, Stage, & Ralston, 2011; Macklem, 2014), doing them at the recommended quantity (e.g., Abry et al., 2015; Moore et al., 2013), and doing them fully (e.g., Zucker, Solari, Landry, & Swank, 2013). Similarly, “exposure” has been operationalized in two ways: as the amount of innovation received from the perspective of the recipients (i.e., students, learners; e.g., Domitrovich et al., 2008; Dusenbury et al., 2003; Weiss et al., 2013) and as the amount of innovation delivered from the perspective of the end users (Durlak & DuPre, 2008; O’Donnell, 2008; Ruiz-Primo, 2005). In measuring quality, some researchers have measured end user characteristics (which we suggest fall into the spheres of influence), such as enthusiasm, interpersonal style, preparedness, and level of skill as indicators of quality (e.g., Domitrovich et al., 2008; Lynch & O’Donnell, 2005; Macklem, 2014; Pence, Justice, & Wiggins, 2008), while other researchers have focused on innovation use and on “how well” the innovation is delivered (e.g., Abry et al., 2015; Benner et al., 2011; Zucker et al., 2013). Participant responsiveness is sometimes defined as a measure of recipient (learner) participation and engagement (e.g., Lynch & O’Donnell, 2005) and sometimes as the interest and attention of both the end users and the recipients (e.g., Carroll et al., 2007).
Mowbray et al. (2003) and O’Donnell (2008) have encouraged researchers to consider the fidelity of implementation of process-related components (i.e., users and recipient behaviors and interactions) as well as structural components (i.e., innovation-specific materials, resources, and activities). Since that time, the field has increasingly acknowledged this view, resulting in hybrid models that organize Dane and Schneider’s (1998) fidelity criteria into a larger “structure” and “process” framework. More specifically, researchers have created measures of “fidelity to structure” that generally align with the concepts of adherence and exposure, while “fidelity to process” generally aligns with the concepts of quality and participant responsiveness (e.g., Benner et al., 2011; McKenna et al., 2014; O’Donnell, 2008). Although placing Dane and Schneider’s (1998) criteria into a structure/process framework provides a richer representation of enacted practice, given the lack of clear definitions and varied interpretations, this approach does not resolve problems of clarity and consensus.
The need for our field to establish clearly understood language regarding fidelity of implementation has been discussed by many researchers over the past decade (e.g., Century & Cassata, 2014; Greenhalgh et al., 2004; Irwin & Supplee, 2012; O’Donnell, 2008; Sanetti & Kratochwill, 2009). The need remains, and is as urgent as ever. The Department of Education’s Institute of Education Sciences (IES) 2015 funding solicitation asks researchers not only to provide evidence regarding the impact of innovations but also to describe sufficient implementation for achieving beneficial effects so that those effects can be generalized to new contexts. However, as illustrated above, determining what constitutes sufficient implementation is largely a matter of perspective. As of this writing, the field has no agreed-on, systematic way to do so.
The Process of Innovation Implementation Measurement
Several step-by-step guidelines and resources now exist to assist researchers seeking to measure implementation (e.g., Hall & Loucks, 1978a; Mowbray et al., 2003; Nelson et al., 2012; O’Donnell, 2008). As previously noted, these guidelines are discussed almost exclusively in studies focused on measuring fidelity of implementation rather than implementation as conducted, and relate primarily to quantitative data sources. They describe key decisions that researchers must make in the process of developing, administering, and using implementation measures:
Identifying and operationally defining the core components of a given innovation model
If relevant to the study, determining fidelity benchmarks, or expectations for component enactment (typically determined by innovation developers)
Developing a theoretical model linking core components (and mediating variables) to outcomes in a causal chain
Specifying the methods and data sources used to measure each core component
Selecting an appropriate time frame for data collection
Ensuring the data collected are reliable and valid
Determining how the data will be summarized and/or reduced for analysis
While an extensive review of each of these decisions is beyond the scope of this chapter, we summarize a few key areas of consensus below.
Data Sources
A range of data sources can be used for implementation measurement, including expert observations, user interviews and self-reported surveys, and collection of institutional records (Hansen, 2014; Scheirer & Rezmovic, 1983). For decades, direct observation conducted by expert raters has been considered the preferred method for assessing innovation implementation (Durlak, 2010; Fullan & Pomfret, 1977; Leithwood & Montgomery, 1980; Ruiz-Primo, 2005). While experts can potentially work with any data source (including interviews, videos, logs, and other documents), observations by expert raters have generally been regarded as the most direct measures of practice, the most rigorous, and the most objective with respect to implementation quality, compared to self-reported data. Domitrovich, Gest, Jones, Gill, and DeRousie (2010) note that unlike researcher observation, users’ reports via surveys or interviews may be inaccurate, for example, if their concerns about social desirability lead them to inflate their ratings. Furthermore, if the majority of respondents report their implementation at very high levels, variability in practice is not captured and the resulting data are not useful for outcome analysis.
However, with all of their advantages, observations are acknowledged to present clear practical challenges. They are expensive, often not feasible with large samples, and can be time-consuming (Domitrovich et al., 2010; Fullan & Pomfret, 1977). In addition, while observations can capture program elements with less potential bias than self-report measures, many innovation components are less observable and are difficult to assess (Fullan & Pomfret, 1977; Ruiz-Primo, 2005; Snyder et al., 1992) or not observable at all. Moreover, observations may not capture core components with enough precision (Leithwood & Montgomery, 1980), and the people being observed may act differently with the knowledge of being observed (McKenna et al., 2014). For these reasons, the use of a multimethod, multi-informant approach is recommended, including using observations to confirm self-reports (Domitrovich et al., 2008; McKenna et al., 2014; Mowbray et al., 2003; Nelson et al., 2012; Snyder et al., 1992). In some cases, self-report methods may be preferred, for example, in measuring provider knowledge, understanding, and other individual user characteristics (Fullan & Pomfret, 1977) or in documenting how often users completed specific activities or lessons (Durlak, 2010). Furthermore, in cases where analysis requires statistical power, self-report may be the only practical way to obtain a sufficient data set.
Instrument Validation
Recommendations in the literature encourage researchers to ensure that the measures used to assess fidelity are reliable and valid (Mowbray et al., 2003; Nelson et al., 2012; O’Donnell, 2008). However, recent reviews from the school psychology and health education fields reveal that psychometric properties of fidelity measures are rarely provided (Dusenbury et al., 2003; Schulte, Easton, & Parker, 2009). When researchers do report psychometric data, they typically report assessment of interrater or interobserver agreement, assessment of intraclass correlations among raters, examination of the internal structure of the data through measures of internal consistency or confirmatory factor analysis, and test-retest reliability (Mowbray et al., 2003; Scheirer & Rezmovic, 1983; Schulte et al., 2009). In turn, researchers most frequently report validation strategies such as establishing face validity, concurrent validity (comparing differences in fidelity scores among known groups), convergent validity (comparing data collected across multiple data sources), and predictive validity (examining relationships between fidelity of implementation and expected participant outcomes; Mowbray et al., 2003).
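To make two of the psychometric checks named above concrete, the following minimal sketch computes one agreement statistic and one internal consistency statistic. The choice of Cohen's kappa and Cronbach's alpha is our assumption for illustration; the reviews cited above do not prescribe particular statistics. The function names and all data are hypothetical.

```python
from collections import Counter
from statistics import variance


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters coding the same sessions."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal proportions.
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)


def cronbach_alpha(rows):
    """Internal consistency; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: two observers code whether a core component was
# present (1) or absent (0) in each of ten observed lessons.
observer_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
observer_2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical data: five teachers rate three fidelity items on a 0-3 scale.
survey = [[3, 3, 2], [2, 2, 2], [1, 1, 0], [3, 2, 3], [0, 1, 1]]

kappa = cohens_kappa(observer_1, observer_2)
alpha = cronbach_alpha(survey)
print(f"kappa = {kappa:.2f}, alpha = {alpha:.2f}")
```

With samples this small the estimates are unstable; the sketch is meant only to show what each statistic summarizes, not to suggest adequate sample sizes.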
Timing of Data Collection
While there is limited research on how many time points are needed to capture changes in implementation, the consensus is that innovation implementation is dynamic and should be measured on multiple occasions (Durlak, 2010; Harn et al., 2013; Odom et al., 2010). Empirical data support this view; analytic strategies such as linear growth modeling and growth curve modeling have been used to illustrate changes in implementation status over time (Clements, Sarama, Wolfe, & Spitler, 2015; Domitrovich et al., 2010). In addition, decisions about data collection timing require careful consideration of the nature of the innovation itself, as researchers need to be able to capture all innovation core components, including those that happen frequently as well as those that happen only occasionally (Domitrovich et al., 2010). Finally, it is important to take into account the presence of contextual factors surrounding the innovation, including training, which may indicate whether and how much change can be expected over time (Durlak, 2010).
Data Reduction
The literature presents two primary approaches to data reduction consistent with the two main approaches to quantitative measurement of innovation implementation: (a) seeking to create fidelity indices, or variables summarizing the degree of deviation from, or convergence with, the innovation model across items (the predominant approach in the literature) and (b) seeking to create measures that represent the extent of component use without reference to fidelity criteria. While the first approach creates measures that represent distance from the ideal (i.e., the difference between what was enacted and what should have been enacted), the second approach creates measures that represent the absolute extent or degree of enactment. Research studies may use data measuring the extent or degree of enactment to discover what is ideal, for example, by examining relationships between variation in component enactment (in terms of quantity or quality) and outcomes.
Fidelity indices approach
In creating fidelity indices, researchers measure core component enactment with respect to one or more predetermined criteria such as component presence, quantity, or quality of delivery and/or receipt (e.g., Hord et al., 1987; Ruiz-Primo, 2005). Researchers create indices with varying levels of specificity. It is common for researchers to create fidelity indices representing the innovation as a whole through a “composite fidelity score” (e.g., Abry et al., 2015; Aladjem et al., 2006; Blakely et al., 1987; Pas & Bradshaw, 2012). Hulleman and Cordray (2009) demonstrated multiple ways these indices can be created, including a proportion score in which achieved fidelity is divided by maximum possible fidelity, a binary score that involves assigning a dichotomous yes or no value for fidelity, and an average score in which higher scores represent more or better fidelity. However, the more general the index, the less nuanced information is available with which to understand implementation. In recent years, more detailed approaches for creating fidelity indices have emerged, such as creating multiple fidelity indices for different core components (Abry et al., 2015; Aladjem et al., 2006) and creating separate indices to represent fidelity to “structural” and “process” aspects of the innovation (Odom et al., 2010).
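The three composite scores described by Hulleman and Cordray (2009) can be illustrated with a minimal sketch. The rating scale, the component ratings, and the 0.8 cutoff for the binary score are hypothetical values chosen for illustration; in practice, benchmarks would typically come from the innovation developers.

```python
from statistics import mean


def proportion_score(achieved, maximum):
    """Achieved fidelity divided by maximum possible fidelity."""
    return sum(achieved) / sum(maximum)


def binary_score(achieved, maximum, threshold=0.8):
    """1 if the proportion score meets a benchmark, else 0.

    The default threshold is a hypothetical benchmark, not a recommended one.
    """
    return int(proportion_score(achieved, maximum) >= threshold)


def average_score(achieved):
    """Mean rating across components; higher means more or better fidelity."""
    return mean(achieved)


# Hypothetical ratings for one teacher on four core components, each scored 0-3.
ratings = [3, 2, 3, 1]
max_ratings = [3, 3, 3, 3]

print(proportion_score(ratings, max_ratings))  # 0.75
print(binary_score(ratings, max_ratings))      # 0
print(average_score(ratings))                  # 2.25
```

Note how the composite hides the low rating on the fourth component, which is the loss of nuance the more detailed, component-level indices are intended to address.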
Component approach
The component approach to data reduction creates indices that represent the degree of core component implementation without making the comparison between what was enacted and what was intended. This approach is useful for empirically determining which aspects of materials are linked to learning outcomes, that is, “for testing theories about what are the ‘active ingredients’” (Penuel et al., 2014, p. 773), as well as for answering the question of “how much” enactment is enough, so that we can create meaningful thresholds for the necessary presence and amounts of components (Abry et al., 2015). Some studies using the component approach provide descriptive reports of implementation (listing the components that were implemented most and least), while other studies explore relationships between component enactment and learning outcomes (e.g., Fogleman et al., 2011) or between component enactment and influential contextual factors (e.g., Stein & Kaufman, 2010). It is important to note that reporting implementation as enacted does not preclude the comparison of implementation as enacted to previously designated benchmarks or recommendations, although this type of reporting is less common (see Agodini et al., 2009, p. 41, as an example).
Analyzing the Implementation Process
In general, implementation analyses in the literature to date have focused on fidelity of implementation for three purposes: (a) discriminating between the treatment and control groups, (b) exploring relationships between variation in fidelity of implementation and learner outcomes, and (c) exploring relationships between variation in fidelity of implementation and contextual factors.
Discriminating Between Treatment and Control Group
With educational innovations, even when there is randomization, one cannot always assume there is a clear differentiation between treatment and control groups. Furthermore, the treatment group may deviate from protocol in a beneficial way (sometimes called positive infidelity; Cordray & Hulleman, 2009). For these reasons, several researchers have recommended computing the difference between fidelity in treatment and control groups (Durlak & DuPre, 2008; Mowbray et al., 2003; O’Donnell, 2008). Some researchers have conducted significance tests as a basic way to examine differences in fidelity scores between treatment and comparison conditions (O’Donnell, 2008), while others have devised more sophisticated methods to capture the differences. For example, the achieved relative strength index is a measure of intervention strength representing the standardized difference in the extent to which teachers enact core intervention components in the treatment and control conditions (Cordray, Pion, Brandt, Molefe, & Toby, 2012; Hulleman & Cordray, 2009).
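As a rough illustration of a standardized difference in fidelity between conditions, the sketch below divides the difference in group means by a pooled standard deviation. The pooled-variance form is our assumption for illustration; published versions of the achieved relative strength index may apply small-sample or clustering corrections that are not shown here. The fidelity scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def achieved_relative_strength(treatment, control):
    """Standardized difference in fidelity scores between two conditions.

    Uses a pooled sample variance; no small-sample or clustering
    corrections are applied (a simplifying assumption).
    """
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical proportion-of-components-enacted scores per teacher.
treatment_fidelity = [0.8, 0.9, 0.7, 0.8]
control_fidelity = [0.3, 0.4, 0.2, 0.3]

arsi = achieved_relative_strength(treatment_fidelity, control_fidelity)
print(f"achieved relative strength = {arsi:.2f}")
```

A value near zero would suggest that the two conditions were not meaningfully differentiated in practice, regardless of how they were assigned.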
Relating Fidelity of Implementation to Outcomes
Researchers have employed a range of linear modeling approaches for exploring relationships between fidelity and learner outcomes. For example, correlational analyses explore the nature of association between fidelity variables and outcomes (e.g., Odom et al., 2010), while multiple regression analyses assess the relative contribution of multiple fidelity variables to variation in outcomes (e.g., Benner et al., 2011). In contexts where study participants are nested (e.g., students within classrooms or groups), multilevel modeling approaches assess the extent to which fidelity (as one source of between-teacher variation) explains variation in student outcomes (e.g., Odom et al., 2010; Zvoch, 2012). Within the context of experimental designs, replacing the causal variable, or “intent to treat” (a dichotomous variable representing the treatment or control group as randomized), with a fidelity index score computed for each group estimates the effects of “treatment on the treated,” as recommended by Cook (2005; e.g., Davidson, Fields, & Yang, 2009; Justice, Mashburn, Pence, & Wiggins, 2008; T. Smith et al., 2013). Structural equation modeling approaches, while less common, may be used to estimate mediated or indirect effects of the innovation treatment through fidelity indices (e.g., Abry et al., 2013; Gennetian, Bos, & Morris, 2002). Although none of these analyses enable causal conclusions, they do provide important information for understanding the content within the “black box” of observed causal effects.
Mediation analyses are particularly useful for analyzing the impacts of multilevel, complex interventions where multiple components are placed together and theorized to work in a causal chain to influence outcomes. Mediating variables are those affected by the innovation and that, in turn, affect the outcome of interest (Donaldson, 2001; Raudenbush & Bloom, 2015). The relationship between mediator and outcome is indirect, in that the mediating variable is a precursor to the outcome variable of interest. In theory, hypothesized mediating relationships can be tested through pilot studies and small-scale data collection, repeating the process until the developer decides the innovation is worthy of full-scale implementation. The goal of repeated analysis is to find a parsimonious model that accounts for a large percentage of variance in the desired outcome variables. Still, researchers who are focused on such approaches face a conundrum because the limited empirical data available during innovation development do not typically provide enough statistical power to analyze mediating relationships. Donaldson (2001) suggests that even with limited resources, descriptive quantitative or qualitative data on enactment of the variables of interest can be used to gauge mechanisms by which innovations appear to work, resulting in immediate information to guide program development and hypotheses that can be tested in larger scale studies. These issues are of less concern for studies focused on complex, place-based innovations and studies not seeking to establish causal findings.
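The logic of a single-mediator model can be sketched with ordinary least squares and the product-of-coefficients approach. This particular estimator is our assumption for illustration; the sources cited above do not commit to it. The sketch shows only the arithmetic of decomposing a total effect into direct and indirect parts. It performs no significance testing, ignores nesting, and does not by itself support causal claims. All data and names are hypothetical.

```python
from statistics import mean


def _sum_cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def mediation_paths(x, m, y):
    """Decompose the association of x with y into direct and indirect parts.

    a: slope of mediator m on x
    b: slope of y on m, holding x constant
    direct: slope of y on x, holding m constant
    """
    sxx, smm, sxm = _sum_cross(x, x), _sum_cross(m, m), _sum_cross(x, m)
    sxy, smy = _sum_cross(x, y), _sum_cross(m, y)
    a = sxm / sxx
    det = sxx * smm - sxm ** 2  # zero only if m is a perfect linear function of x
    b = (sxx * smy - sxm * sxy) / det
    direct = (smm * sxy - sxm * smy) / det
    return {"a": a, "b": b, "indirect": a * b,
            "direct": direct, "total": sxy / sxx}


# Hypothetical data: assignment (0 = control, 1 = treatment),
# a fidelity index as the mediator, and a learner outcome score.
assignment = [0, 0, 0, 0, 1, 1, 1, 1]
fidelity = [0.2, 0.3, 0.1, 0.2, 0.7, 0.9, 0.8, 0.6]
outcome = [50, 53, 48, 51, 60, 66, 63, 59]

paths = mediation_paths(assignment, fidelity, outcome)
print({name: round(value, 2) for name, value in paths.items()})
```

In this linear setup the total effect equals the direct effect plus the indirect effect, which is what allows the mediated share to be reported separately.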
Relating Contextual Factors to Implementation
Analyses examining factors that influence fidelity of implementation reside one step to the left in the causal chain leading to outcomes; the contextual factor variables become independent variables, and the implementation variables become dependent variables. Such analyses are sometimes carried out to supplement the initial findings from RCTs by describing for whom or under what conditions the innovation “worked,” to uncover the best strategies for supporting optimal implementation, or to discover the influence of variation in context on implementation in a scale-up or sustainability study. These mostly exploratory analyses generally fall into two categories: those that examine short-term or immediate influences on implementation (e.g., Kurki, Boyle, & Aladjem, 2006; McCormick, Steckler, & McLeroy, 1995; Penuel, Fishman, Yamaguchi, & Gallagher, 2007) and those that examine long-term influences supporting sustained implementation over time (e.g., Clements et al., 2015; Lieber et al., 2010; McIntosh et al., 2013). Notably, in addition to quantitative studies, there are numerous examples of qualitative and mixed-methods studies that examine the influence of supportive or inhibiting factors on innovation implementation and sustainability (e.g., Billing, Sherry, & Havelock, 2005; Century & Levy, 2002; Lieber et al., 2009; Rijsdijk et al., 2014).
The influence of contextual factors may also be examined in terms of moderating variables in a theory of change. Moderating variables affect the direction or strength of relationships between predictor variables and outcomes by reducing, enhancing, or changing their influence (Fairchild & MacKinnon, 2009). Moderator effects are often referred to as interaction effects, where the effect of one variable depends on the levels of the other variables in the analysis. Moderator variables by definition must be uninfluenced by the innovation and observable prior to engaging in the innovation (Raudenbush & Bloom, 2015). Demographic characteristics of participants, such as gender, race, ethnicity, and socioeconomic status, are often included as moderating variables. Subgroup analysis can explore questions about for whom an innovation is most effective, and in what kinds of sites innovations work best. However, from a quantitative perspective, Fairchild and MacKinnon (2009) caution that, as in mediation analysis, the statistical power required to detect interaction effects often requires a sample size greater than typically available during innovation development.
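For the simplest case of a binary treatment indicator and a binary moderator measured before the innovation begins, an interaction effect can be sketched as a difference between subgroup treatment effects. The records below are hypothetical, and the sketch reports point estimates only; as noted above, detecting such effects reliably usually requires far larger samples.

```python
from statistics import mean


def subgroup_effects(records):
    """Treatment effect within each moderator subgroup, and their difference.

    records: list of (treated, moderator, outcome) tuples with 0/1 flags.
    The difference between subgroup effects is the interaction effect.
    """
    def cell_mean(treated, moderator):
        return mean(y for t, m, y in records if t == treated and m == moderator)

    effect_m0 = cell_mean(1, 0) - cell_mean(0, 0)
    effect_m1 = cell_mean(1, 1) - cell_mean(0, 1)
    return effect_m0, effect_m1, effect_m1 - effect_m0


# Hypothetical records: (treated, moderator subgroup, outcome score).
records = [
    (1, 1, 12), (1, 1, 14), (0, 1, 8), (0, 1, 10),
    (1, 0, 9), (1, 0, 11), (0, 0, 8), (0, 0, 10),
]

effect_m0, effect_m1, interaction = subgroup_effects(records)
print(effect_m0, effect_m1, interaction)  # 1 4 3
```

Here the estimated treatment effect is larger in one subgroup than in the other, which is the pattern a subgroup analysis asking "for whom" is designed to surface.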
Making New History: Seizing the Opportunity to Accumulate Knowledge
Looking at the current state of the literature in the context of a long and varied history, it is clear that although the field of education has made good progress, there is far to go. This section identifies decades-old needs and challenges that have yet to be resolved, new challenges ahead, and considerations for implementation research that we, as a field, must address.
Resolving the Fidelity-Adaptation Debate
Despite enduring challenges, there is a clear consensus that measuring innovation implementation provides one important avenue for understanding the education improvement process. In a review of over 200 experimental intervention studies, Sanetti, Gritter, and Dobey (2011) found that between 1995 and 2008, the proportion of studies reporting quantitative fidelity measurement increased threefold. Funding agencies increasingly require researchers to include fidelity data in innovation development and evaluation work (e.g., Stockard, 2010; U.S. Department of Education, IES, 2015; U.S. Department of Education, IES, & National Science Foundation, 2013). Furthermore, over the past 5 years, multiple professional development workshops and webinars on the topic of implementation measurement have been offered to education researchers, sponsored by organizations such as the Society for Research on Educational Effectiveness, the Global Implementation Initiative, and the National Implementation Research Network.
Profidelity Perspective
Underlying the rationale for measuring fidelity of implementation, in particular, is the view that once an innovation is found to be efficacious, future implementations should not deviate from the established “proven” or “evidence-based” model (which we will refer to as the profidelity view). This perspective recommends that users should be provided with supports that ensure that fidelity of implementation is sufficiently high. This profidelity stance has been extensively documented and referenced for decades as the dominant perspective on how end users should approach the use of novel practices and strategies that are identified as evidence-based (Blakely et al., 1987; Penuel et al., 2014; Snyder et al., 1992). In short, some of the literature embraces the assumption that more fidelity is better (Buxton et al., 2015; Cho, 1998; Macklem, 2014; Moore et al., 2013), and the default rule of thumb has been that “it is best not to tinker with the prescribed formula”—a conclusion drawn from the absence, rather than the presence, of empirical evidence about what types of adaptations are beneficial or harmful (Halle, Metz, & Martinez-Beck, 2013; Moore et al., 2013, p. 149). The profidelity position tends to view the educational improvement process as linear and rational, concerned with faithful implementation and minimizing variation and deviation from efficacious innovation models (Snyder et al., 1992).
Pro-Adaptation Perspective
Yet, for as long as the profidelity view has existed within the field, there has also existed an alternative perspective supported by assertions in the change and knowledge utilization literature. In this perspective, the innovation user’s adaptation of innovation elements (rather than strict adherence to them) is key to reproducing positive outcomes from one context to another and bringing about ongoing improvement. We call this the pro-adaptation perspective. More specifically, this perspective holds that adaptations from an original innovation model may add effective strategies, make the innovation more contextually relevant (e.g., McGrew, Bond, Dietzen, & Salyers, 1994; Sanetti & Kratochwill, 2009), or establish that components once considered core to the innovation are not, in fact, vital (Harn et al., 2013; Macklem, 2014). Over the years, researchers have stated that variation within and across sites and over time is expected (Berman, 1981), that perfect implementation is never obtained (Durlak, 2010; Moore et al., 2013), and that adaptation is the natural tendency of implementers and of the change process (Dearing, 2009; Hall & Hord, 2015).
Pro-adaptation advocates contend that a strict fidelity perspective is an “outsider’s perspective,” in which the developer is deemed the best person to dictate what implementation should look like (Buxton et al., 2015). Others identify end users as the active agents, focusing on their need to be responsive to their context and to act in keeping with the personal meaning-making that guides their decisions (Berman & McLaughlin, 1976; Buttolph, 1992; Buxton et al., 2015; Penuel et al., 2014). Furthermore, Blakely et al. (1987) reported many studies finding that local buy-in of end users was enhanced by enabling program adaptation; in turn, their buy-in helped maintain ongoing program operations. It is also well documented that developers themselves (or other intermediary organizations) may alter their innovations to ensure success in a new context if they believe the new iteration is a better fit—a process known as “mutual adaptation” (Berman & McLaughlin, 1976; Dearing, 2009; Dusenbury et al., 2003). The adaptation perspective suggests that the success and eventual sustainability of an innovation may depend on its potential for use in a variety of ways (Buxton et al., 2015; Leithwood & Montgomery, 1980).
Mitigating the Artificial Divide
Between the extreme profidelity and pro-adaptation points on the spectrum, there is, of course, a middle ground that asserts that different kinds of adaptations are acceptable depending on their extent of alignment with or deviation from program theory. This view considers adaptations acceptable to a certain point (sometimes called the “point” or “zone” of “drastic mutation”), as long as they are congruent with the goals and principles of the designers (Coburn, 2003; Hall & Loucks, 1978a; Kelly et al., 2000; Penuel & Fishman, 2012). DeBarger et al. (2013) use the term productive adaptations because such adaptations respond to the demands of the context and are consistent with the innovation’s core design principles. The authors note that productive adaptations may themselves be evidence-based, emerging from the tacit knowledge of the practitioners, which has developed through careful observations of their learners.
In reality, there is simply not enough information to determine which innovation components are truly critical to an innovation’s effects, how much of a particular component is “good enough,” or how much adaptation is acceptable. With the many interacting variables at play within educational innovations, the answer is almost certainly, “It depends.” Some researchers have suggested that certain types of innovations, such as those that are less specified or less structured in nature, may inherently demand more interpretation by the provider (and in turn, more adaptation; Cohen & Ball, 1990), while innovations that are well structured and well specified are better suited to implementing with fidelity (Berman, 1981; O’Donnell, 2008). Other researchers point to the importance of understanding why adaptations are made, where implementation challenges come from, and how the nature of an adaptation matters for outcomes, before making a priori judgments about their appropriateness (DeBarger et al., 2013; Remillard, 2005). And perhaps most important, others highlight the need to understand the relative contributions of different innovation components and how different levels of component enactment are related to outcomes (Abry et al., 2015; Damschroder et al., 2009; Durlak, 2015; Odom, 2009).
Finding Clear Language and Shared Conceptual Understanding
As education researchers, we need only look at our history to ascertain what we need to change moving forward. We have already established a lack of consensus in the ways that fidelity of implementation data are described and reported in education research literature (Dane & Schneider, 1998; Darrow, 2013; Downer & Yazejian, 2013; Missett & Foster, 2015; Mowbray et al., 2003; O’Donnell, 2008; Sanetti et al., 2011; Scheirer & Rezmovic, 1983). These reviews, spanning several domains, reveal inconsistency in terms of operationally defining fidelity of implementation, data sources used, and analytic strategies. More recent reviews also reveal a tendency for researchers to measure structural aspects of implementation (e.g., adherence or quantity) rather than quality (e.g., Downer & Yazejian, 2013; Sanetti et al., 2011) and failure to analyze fidelity data in relation to measured innovation outcomes (Downer & Yazejian, 2013; Missett & Foster, 2015).
Now, there is momentum toward convergence in three areas: (a) the importance of identifying core components, (b) categories of influential factors, and (c) the definition of fidelity of implementation (enactment of an innovation compared to a model or theoretical ideal). However, given that the conversations underlying these points of agreement have taken place over decades, we need to find ways to move more quickly, more coherently, and more collaboratively. We still have far to go with regard to describing innovations and the factors that influence them, reconciling intersections of theoretical frameworks, and moving forward with consistent or complementary terminology, measures, and analytic strategies. The work of the past 100 years, accelerated by the increased focus on implementation in the past 20 years, gives us much to build on; it is there for the taking. With greater clarity about language and shared conceptual frameworks, we will be able to compare results and accumulate knowledge about the innovations we are studying and their abilities to generate desired outcomes for learners (Century & Cassata, 2014; Darrow, Goodson, & Boulay, 2014).
Broadening Our Conception of Evidence
It may be easier to progress toward agreement on common frameworks and processes than on some of the thornier, more philosophical issues, such as the fidelity-adaptation debate and the closely related question, what is evidence? These two issues are, in fact, ultimately linked. The reasoning in favor of enacting innovations with fidelity is that once there is evidence that an innovation works, we must try to replicate the effects (Chambers et al., 2013). In contrast, some argue for a broader definition of legitimate, credible evidence and assert that because schools are social systems within which knowledge is socially constructed, school improvement should not focus on strict replication (Cousins & Leithwood, 1983, as cited in Hemsley-Brown & Sharp, 2003, p. 14).
Evidence Is Relative to Context
While one may assume that innovations designated as “evidence-based” are indisputably of value, in reality, evidence is relative. There are two important caveats to the “evidence-based” designation. First, evidence established in an efficacy study rests on the assumption that the intervention will be implemented with the experimental (treatment) group in the ideal manner. Second, RCT evidence, by its nature, is relative to what was happening in the control or comparison group in a particular study (Gennetian et al., 2002). Given that effectiveness studies often compare treatment to business-as-usual conditions, we need to acknowledge that what is effective now may not continue to be effective in 5 years, even if enacted in the ideal manner over time, if the business-as-usual condition changes (Lemons, Fuchs, Gilbert, & Fuchs, 2014).
Moreover, the widely recognized reality that actual implementation is always different from the theoretical ideal is a reminder that evidence coming out of efficacy and effectiveness studies, while informative, is not definitive. Chambers et al. (2013) question why we “reify early phase interventions tested in the most artificial settings” (p. 3), pointing out that doing so leads to an overreliance on quality assurance practices and, in turn, to missed opportunities for ongoing improvement through innovation customization and optimization. With innovation use in ordinary settings as the end goal, education research will benefit from a shift from the conception of educational innovations as static “evidence-based entities” to be replicated toward the recognition that innovation enactment is a dynamic process influenced by and adapted to the local context.
Even within the medical field, the push to replicate treatments is criticized. Naylor (1995) refers to the grey zones of practice where good clinical medicine blends “the art of uncertainty with the science of probability” (p. 841). Good clinicians use what Feinstein and Horwitz (1997) call “soft” information (e.g., severity and nature of the symptoms, associated diseases, rate of symptom growth) to make decisions; the authors assert that “a good clinician constantly uses this ‘soft’ information for diverse clinical decisions” (p. 531). Soft information is like the factors that affect implementation—the very kind of information that implementation research seeks to capture. While RCTs provide clinicians with information that is helpful for the average patient, the information will not apply to all patients (Feinstein & Horwitz, 1997). There simply is not an “average” patient. Nor, we assert, is there an average teacher, school, district, or system.
In medical clinical settings, “what works” is a matter of judgment that weighs and considers RCT findings in context (Morrison, 2001). The same is true in education, where scholars increasingly recognize that “evidence” means different things to different innovation enactors (Coburn & Talbert, 2006). Sanderson (2003) argues that “the question for teachers is not simply ‘what is effective,’ but rather, more broadly it is, ‘what is appropriate for these children in these circumstances’” (as quoted in Oancea & Pring, 2008, p. 23).
Embracing Complexity
Research in highly controlled settings is driven by the desire to reduce variability in order to draw clear conclusions about results. As acknowledged earlier, there is a place for this kind of study in the spectrum of implementation research. However, when moving from highly controlled conditions to ordinary settings, rather than try to manage the many influences that shape an end user’s implementation, the field needs to embrace that complexity. To truly understand how an innovation will be effective in a new context and/or over time, we need techniques that illuminate the ways that innovations are associated with desired outcomes by recognizing and accounting for complexity rather than reducing it (Burns & Knox, 2011; Snyder, 2013). To do otherwise denies the reality of complex social systems, where individuals interact, feedback loops exist, one action influences another, and individuals in the system are in constant development as a function of their ongoing experiences.
Designs that isolate, decontextualize, and simplify issues of complexity decrease the applicability of implementation research results. “The key problems of today are ‘wicked’” (Kessler & Glasgow, 2011, p. 638), and to solve them, we need to put all of our methodologies to work and recognize that some questions of implementation research can be fully addressed only with all of the techniques we have on hand and those not yet developed. The evidence-based movement carries the suggestion that solving problems of educational improvement is akin to a linear algorithm or technical fix. In such a paradigm, the problem space is knowable, all variables in the model that contribute to the outcome can be defined, and variables are treated largely as independent of one another (Preskill, Gopal, Mack, & Cook, 2014; Snyder, 2013). Yet educational innovations of any type are, in reality, much more complex. In systems, independent variables are not all knowable and do not behave uniformly at all times; rather, both end users and innovations are co-evolving (Nespor, 2002; Preskill et al., 2014). New innovation designs and associated analytic approaches that account for this complexity may provide much-needed insight into what it truly takes to realize lasting educational change.
Technological advances in data analysis emerging in the life sciences, economics, and systems science have something to offer education researchers, who seek to solve equally but differently complex problems (Lemke & Sabelli, 2008). For example, Burke et al. (2015) outline three nascent methodological approaches for implementation research (system dynamics modeling, agent-based modeling, and social network analysis) that may capture the complexity of the problem space of educational innovations. These methods enable modeling of aspects of complex systems such as delays between cause and effect, nonlinear relationships between variables, and unanticipated outcomes, all analyses that are limited for those using traditional methodological approaches. While such methods are not appropriate for the purpose of predicting what will happen, they are useful for describing and theorizing what has happened or what is happening currently, and for providing a useful paradigm for understanding the complex landscape where we work to improve education (Morrison, 2010). A commitment to looking beyond the current borders of education research will be essential for the creation of new approaches to managing and analyzing implementation data in increasingly meaningful ways.
Accumulating Knowledge
A commitment to embracing complexity, hand in hand with the field’s convergence on a component approach to describing innovations and frameworks for organizing influential factors, can enable us to accumulate knowledge and move toward our ambitious shared goal of making education better. Innovations today are commonly viewed as proprietary packages, but they share many of the same essential elements. Research that views innovations as combinations of components can contribute to learning not only about a specific innovation but also about its components. Mowbray et al. (2003) suggest an empirical approach of deconstructing innovation models and systematically testing the impact of key ingredients across sites. Similarly, Abry et al. (2015) envision a future where “evidence of active ingredients accumulates” and contributes to unified theories of change (p. 334). Likewise, the field can accumulate knowledge about the role of influential factors, ranging from characteristics of the innovation itself to organizational and environmental forces.
Looking Forward, Learning From the Past
We might do well to learn from the fields of medicine, public health, health care, psychology, prevention science, and other psychosocial disciplines, where the implementation research movement is alive and well. In the past 10 years alone, two cross-disciplinary communities of implementation researchers (the Society for Implementation Research Collaboration and the Global Implementation Initiative) were founded; an open-access peer-reviewed journal, Implementation Science, was launched; a yearly conference focusing on dissemination and implementation was instituted; and initiatives were launched to locate, catalogue, and rate measures associated with the CFIR framework (Damschroder et al., 2009). This is a pivotal time when the field can continue to diverge or decide to come together with coherency. Change is about changing mindsets (Joseph & Reigeluth, 2010). As Coburn wrote about rethinking scale in 2003, “scaling up” is more than a process of spreading an activity structure; it is also about spreading underlying beliefs and norms. In the field of education research, it is time that we put our own findings into practice to change ourselves.
