Abstract
Students with and without disabilities increasingly learn to read within multitiered systems of support that provide universal (Tier 1) instruction and targeted (Tier 2) intervention. Researchers have emphasized the importance of evaluating Tier 2 interventions in relation to the quality and coherence of Tier 1 instruction. Although large-scale studies increasingly report implementation fidelity and alignment across tiers, it is unclear to what extent single-case research design (SCRD) studies include such information. The SCRD encompasses a variety of within-subject experiments that are common in special education and have potential benefits for students with or at-risk for learning disabilities. In this systematic review, we examined how SCRD studies of Tier 2 literacy interventions (k = 9) reported instructional fidelity and alignment across tiers. While most studies documented fidelity for the implemented intervention, none described the nature or quality of Tier 1 instruction. Findings indicate that SCRD research rarely situates Tier 2 outcomes within the context of Tier 1, limiting interpretability and integration with the larger evidence base on reading interventions.
Since their introduction, tiered instructional frameworks—referred to as Response to Intervention (RTI) and most recently as Multitiered Systems of Support (MTSS)—have become a defining feature of elementary reading instruction in the United States (Berkeley et al., 2020; See Note 1). Although manifestations of MTSS vary across states, components such as universal screening, progress monitoring, delivery of standard instruction in Tier 1, and supplemental intervention at successive tiers (e.g., Tier 2) represent commonalities across systems (Hill et al., 2012; Miciak & Fletcher, 2020). The MTSS are a departure from flawed methods of allocating support for students (D. Fuchs & Fuchs, 2006; D. Fuchs et al., 2004) that have nonetheless (a) blurred the distinction between students with reading difficulty and those with learning disability (LD) and (b) complicated how instructional effectiveness is ascertained (Berkeley et al., 2020; Denton, 2012; King, Wang, Datchuk & Rodgers, 2023; Peng et al., 2020). The ubiquity of MTSS has been accompanied by challenges in identifying who benefits from intensive instruction and guidelines for implementing instructional tiers (Hintze et al., 2018; Miciak & Fletcher, 2020).
Because MTSS integrate assessment and intervention into a single system, research on instructional effectiveness depends on understanding how elementary literacy instruction at Tier 1 shapes outcomes at Tier 2 (Hill et al., 2012). The use of MTSS frameworks carries two implications for evaluations of instructional effectiveness. First, the extent to which students require intensive remediation is best determined after examining their performance in the context of effective Tier 1 instruction—in other words, their RTI (D. Fuchs et al., 2010). Without this context, assessment scores may be equally indicative of ineffective instruction or the need for more intensive intervention. Second, the effectiveness of supplemental Tier 2 interventions is a function of their impact on students who have received generally effective core instruction without benefit. That is, interventions may only meaningfully be characterized as effective “Tier 2” when they improve outcomes for students who did not benefit from appropriate Tier 1 instruction. Consequently, researchers who seek to evaluate Tier 2 instruction are obligated to provide insight into the quality of instruction provided at Tier 1 (Hill et al., 2012). The quality of Tier 2 intervention research carries particular significance for students with or at-risk for LD, as Tier 2 represents a source of support and a gatekeeping mechanism for LD identification (Miciak & Fletcher, 2020). Inasmuch as students who do not respond adequately to Tier 2 instruction are more likely to receive LD diagnoses and qualify for more intensive services, understanding what constitutes high-quality Tier 2 instruction is therefore critical for distinguishing between disability and inadequate instruction (Denton, 2012).
Our understanding of the typical relationship between Tier 1 instruction and Tier 2 intervention at the elementary level is primarily derived from large-scale group-design research and observational studies (Al Otaiba et al., 2025; Hill et al., 2012). However, the extent to which single-case research design (SCRD), a within-subject experimental approach used in much of special education with unrealized potential for students with or at-risk for LD (King, Wang, Nylen & Enders, 2023; Peltier et al., 2021), addresses this issue remains unclear. In this article, we describe and define the implications of two factors related to the context in which Tier 2 instruction is implemented: implementation fidelity and alignment across tiers. We then describe SCRD and consider its potential to capture contextual information about Tier 1 and Tier 2 instruction. Finally, we systematically assess how SCRD studies of Tier 2 literacy interventions report information related to fidelity and instructional alignment across tiers.
Connecting Instruction at Tier 1 and Tier 2: Implementation Fidelity and Alignment
Hill and colleagues (2012) identified two dimensions of instruction related to the implementation context of Tier 2 elementary literacy intervention. The first, implementation fidelity, traditionally refers to both measuring and maintaining the correspondence of instruction with a pre-established plan or protocol (O’Donnell, 2008; Sanetti & Luh, 2019). Recent analyses of reading intervention studies define implementation fidelity as a multidimensional construct encompassing adherence (i.e., instruction delivered as intended), differentiation (i.e., instruction is distinct from other practices), exposure (i.e., appropriate instructional duration), quality, and responsiveness (i.e., student engagement), reflecting both structural and process features of implementation (e.g., Gresham et al., 2009; van Dijk et al., 2023).
Multiple types of implementation fidelity data are needed to determine whether insufficient student progress results from inadequate intervention intensity or poor implementation (Sanetti & Collier-Meek, 2019). However, few reading intervention studies report fidelity of the primary reading intervention (Capin et al., 2018); when reported, data are generally presented as a single quantified index (e.g., percentage of instructional steps completed; van Dijk et al., 2023). Scholars have suggested that observations at Tier 1 should incorporate standardized observation tools, such as the Instructional Content Emphasis–Revised (ICE-R; Edmonds & Briggs, 2003), to provide insight into the quality and characteristics of instruction (Al Otaiba et al., 2025). Given the relationship between fidelity and student outcomes (e.g., Benner et al., 2011; Capin et al., 2018), one of the primary purposes of measuring implementation fidelity is to inform changes to instruction (Kretlow & Bartholomew, 2010). Accordingly, authors should also document training and other supports (e.g., coaching, feedback) designed to improve and maintain fidelity within each tier.
Highlighting instructional features through the ICE-R and similar tools relates to the second dimension of instructional content described by Hill and colleagues (2012): alignment. Alignment refers to whether similar instructional procedures, outcomes, and philosophies were employed across tiers. Intentional alignment of instruction across Tiers 1 and 2 has been shown to yield greater effects on student outcomes (e.g., Stevens et al., 2020, 2024). This alignment is particularly important because Tier 2 interventions are designed to provide targeted support for students who do not make expected progress with core instruction in the elementary grades (Wanzek et al., 2016). In contrast, aligning Tier 3 interventions with core instruction may be less feasible due to the highly individualized nature of intensive literacy interventions. For example, Al Otaiba and colleagues (2025) observed far more code-focused instruction in Tier 3 than in Tier 1 classrooms serving students in Grades 1–5. Thus, improved outcomes in subsequent tiers and beyond may be partially explained by low implementation fidelity in Tier 1 or by high variation in instructional content across tiers.
Systematic reviews suggest authors of Tier 2 intervention studies rarely report Tier 1 fidelity data or provide information necessary to determine alignment across tiers. Hill and colleagues (2012) examined how frequently group-design reading intervention studies (k = 22) that evaluated the efficacy of elementary Tier 2 instruction reported fidelity and alignment of instruction provided across tiers. In 75% of eligible studies, authors reported fidelity data for Tier 2 interventions, but Tier 1 fidelity data were reported in just 36% of studies. Approximately 32% of studies included information that could be used to assess alignment of instruction across tiers, and these data were more commonly reported when researchers manipulated Tier 1 instruction (e.g., Loftus et al., 2010). Overall, results indicated a marked increase in reporting of Tier 2 fidelity data as compared to prior reviews (i.e., Gresham et al., 2000; Swanson et al., 2013) but limited consideration of Tier 1 instruction.
Hill and colleagues (2012) established the limited extent to which earlier Tier 2 research (O’Connor et al., 2005; Scanlon et al., 2008) reported information related to alignment and fidelity. Although Hill and colleagues’ review has yet to be replicated, more recent group-design studies have explicitly emphasized the importance of both fidelity and alignment of Tier 1 instruction as key components of validating supplemental intervention delivered at Tier 2 (Young et al., in press). For example, Stevens and colleagues (2020) compared the effects of aligned Tier 1 and Tier 2 instruction to nonaligned instruction across tiers as well as a business-as-usual (BAU) Tier 2 reading intervention for readers who did not pass the state reading assessment at the end of their third-grade year. In the aligned Tier 1–Tier 2 condition, social studies teachers received initial training and ongoing coaching to implement the same instructional practices used in the researcher-provided Tier 2 intervention. In addition, the research team monitored adherence to essential instructional components and quality of implementation across tiers in all conditions. Findings indicated that students in the aligned condition significantly outperformed students who received nonaligned intervention or BAU instruction on measures of reading comprehension, content knowledge, and vocabulary. Based on these results, Stevens and colleagues noted, “It may be important for future research to include information about Tier 1 so that we can better understand the extent to which this may contextualize findings . . . it may be useful to . . . [view] Tiers 1 and 2 as connected and intentionally planning instructional delivery across the school day . . . rather than as separate pieces of the puzzle” (p. 446).
Emphasis on the importance of fidelity and alignment is also evident in recent studies where researchers did not manipulate Tier 1 instruction (e.g., Al Otaiba et al., 2014; Wanzek et al., 2017). In both studies, researchers implemented a Tier 2 reading intervention without modifying typical Tier 1 instruction. Yet they also observed and reported the content emphasis of Tier 1, allowing assessment of alignment. In addition, both research teams monitored implementation fidelity of the researcher-implemented Tier 2 intervention as well as school-implemented Tier 1 instruction. Together, these studies illustrate how researchers conducting group-design studies can evaluate a Tier 2 intervention as part of a full instructional package rather than as an isolated instructional component (Stevens et al., 2020). However, the contribution of SCRD, which represents most of the experiments in special education and increasingly appears in work involving students with or at risk for LD, has yet to be determined (Hott & Flores, 2023; King, Wang, Nylen & Enders, 2023).
Single-Case Design Research, LDs, and Contextualizing Tier 2
SCRD refers to a family of within-subject experimental approaches that use repeated measurement; manipulations in intervention timing; and replication of effects across individuals, behaviors, or groups to demonstrate causal relations between instruction and outcomes (King, Wang, Nylen & Enders, 2023). Participants in SCRD generally serve as their own control, meaning that an intervention effect is determined by comparing a student’s performance during baseline (i.e., nontreatment) and intervention phases (Ledford & Gast, 2024). Although SCRD represents more than half of experiments in special education, fewer than 15% of experiments published in journals focusing on LD use SCRD (King, Wang, Nylen & Enders, 2023). The rarity of SCRD in LD journals may stem from the historic tendency among researchers to interpret the design’s frequent use among small, heterogeneous populations as evidence that it is primarily suitable for examining intensive intervention, such as highly individualized Tier 3 interventions for students with LD and other disabilities (Hurtado-Parrado & López-López, 2015; King et al., 2024). Yet SCRD offers LD researchers opportunities to pilot new interventions, isolate components of effective instruction, attend to the needs of specific students, and conduct in-depth observations of the learning context across tiers (Hott, Flores et al., 2023; Peltier et al., 2021).
The small scale and flexibility of SCRD can feasibly accommodate detailed observations of instruction and other variables relevant to participant performance in Tier 2 intervention (Rila et al., 2025). These observations, in addition to facilitating the contextualization of Tier 2 reading interventions within MTSS frameworks, would be consistent with emerging standards of SCRD quality in LD research. Hott, Flores, et al. (2023) and Hott, Heiniger, et al. (2023) established the importance of providing replicable descriptions of conditions, as well as participant characteristics (e.g., previous instruction). Additional detail regarding typical classroom instruction for children with or at-risk for LD would therefore align with guidelines for SCRD research as well as calls for transparency regarding BAU instruction more generally (Gersten et al., 2005).
The intimacy and flexibility of SCRD uniquely position researchers to capture alignment. In contrast to the large samples and limited number of observations characteristic of most group designs, SCRD requires repeated observations of a single student in their routine instructional environment and the intervention context (King, Wang, Nylen & Enders, 2023). Despite this capacity, SCRD has historically emphasized control and replication over contextual description, leaving unresolved questions about how to balance rigor with relevance (Ledford et al., 2023). Nonetheless, recent studies have demonstrated the potential of SCRD to capture instructional context.
King, Lemons, et al.’s (2022) and King, Rodgers, and Lemons’s (2022) studies concerning reading instruction for children with Down syndrome, although outside the LD context, illustrate how SCRD examining intensive reading interventions can provide detailed descriptions of Tier 1 instruction. Across studies, authors used teacher interviews and observations based on the ICE-R to document the amount, format, and content of students’ ongoing school-based reading instruction. They noted that participants typically received instruction derived from commercial curricula for at least 1 hr each day across special and general education settings. These studies provided further context for outcomes, in which students learned content aligned with the intervention but generally did not improve on curriculum-based measures of reading.
Detailed observations of participants’ typical context are unusual, however, as descriptions of conditions within SCRD typically pertain to sessions directly under the control of researchers (e.g., Hott, Heiniger, et al., 2023). As a result, the brief observation probes conducted during baseline and intervention conditions threaten to provide a false sense of participants’ behavior and potentially omit significant contextual information (Lambert et al., 2025). Rila and colleagues (2025) found that SCRD studies do not capitalize on the intimacy of the format, and many authors have called to broaden the forms of data collected and questions answered by SCRD (e.g., Hitchcock et al., 2010; Onghena et al., 2019). The extent to which this larger conversation regarding SCRD methods pertains to the issue of connecting Tier 1 and Tier 2 instruction remains uncertain, however, given that previous reviews of the literature have excluded SCRD (e.g., Hill et al., 2012).
Need for the Current Study
Researchers have increasingly recognized the importance of examining Tier 1 instruction, in terms of both its fidelity and its alignment with subsequent intervention, when evaluating the effectiveness of Tier 2. More than a decade after Hill and colleagues (2012) articulated the need to assess the fidelity and alignment of instructional tiers in elementary Tier 2 intervention research, it remains unclear to what extent these elements are examined or supported in the broader literature, as no systematic review has revisited this question since the original search. Recent group-design research nonetheless suggests alignment across instructional tiers improves student outcomes (e.g., Coyne et al., 2022; Fien et al., 2015; Smith et al., 2016) or directly compares aligned and nonaligned interventions (e.g., Foorman et al., 2018; Stevens et al., 2020), underscoring alignment as a promising direction for future scholarship regarding MTSS. However, an emphasis on SCRD is warranted because of (a) their omission from previous examinations of this issue (Hill et al., 2012), (b) their prominence in special education and growing use among students with or at-risk for LD (Hott, Flores et al., 2023; King, Wang, Datchuk & Rodgers, 2023; Peltier et al., 2021), and (c) their potential to elaborate on the instructional contexts represented in intervention research (Rila et al., 2025). The purpose of this study is to extend Hill and colleagues’ original review by assessing the extent to which implementation fidelity and alignment appear in SCRD research evaluating Tier 2 literacy interventions for struggling readers in an MTSS context. The guiding questions were as follows: (a) To what extent is intervention fidelity reported across Tiers 1 and 2? (b) How do studies establish alignment across Tiers 1 and 2?
Method
We addressed the research questions through a process consisting of multiple stages. First, we conducted electronic database and ancestral searches of studies reporting on supplemental reading instruction either included in or designed for inclusion in an MTSS model. We then adapted codes from Hill and colleagues (2012) pertaining to intervention fidelity and the instructional alignment of Tiers 1 and 2. We then applied all codes to identified studies over the course of a descriptive review.
Search Procedures
Studies were identified using a three-step process. Figure 1 provides a visual depiction of the search in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Page et al., 2021). In EBSCO, we performed abstract, title, and keyword searches of PsycINFO, Education Research Information Center (ERIC), and Education Source databases. Although we retained the emphasis of Hill and colleagues’ (2012) original search, terms were expanded in recognition of changes in the conception of MTSS and the breadth of literacy domains subject to intensive intervention. Specifically, the search referred to variations of MTSS (“response to intervention” or RTI or “multi-tiered system” or MTSS or “tiered support system” or “graduated intervention” or “Tier 2” or “tier two” or “Tier 1” or “Tier one” or “tiered systems” or “progress monitor*” or supplemental) and reading (read* or literacy or phon* or “reading fluency” or comprehen* or “language arts” or decod* or vocabulary or) in the title and keywords. We also conducted an abstract search with additional terms related to instruction (instruct* or teach* or taught or interven* or train*). We performed an additional search for MTSS and literacy terms in all fields except full-text in ProQuest Dissertations & Theses Global. Together, the searches produced 7,671 results, which were manually assessed for duplicates using the Zotero reference management system. The team then evaluated the remaining 4,646 records for inclusion. We also conducted an ancestral search, for which we manually reviewed references cited by each study included in the corpus.
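As a minimal sketch of how the Boolean strings described above could be assembled from the listed terms, the following Python snippet joins each term group with OR and combines the groups with AND. The term lists are transcribed from the text; the function names, the use of AND between groups, and the assumption that the abstract search combined all three groups are illustrative and are not a record of the exact strings submitted to each database.

```python
# Term lists transcribed from the search description. Field restrictions
# (title, keyword, abstract) are set in the database interface and are
# not encoded here.
MTSS_TERMS = [
    '"response to intervention"', "RTI", '"multi-tiered system"', "MTSS",
    '"tiered support system"', '"graduated intervention"', '"Tier 2"',
    '"tier two"', '"Tier 1"', '"Tier one"', '"tiered systems"',
    '"progress monitor*"', "supplemental",
]
READING_TERMS = [
    "read*", "literacy", "phon*", '"reading fluency"', "comprehen*",
    '"language arts"', "decod*", "vocabulary",
]
INSTRUCTION_TERMS = ["instruct*", "teach*", "taught", "interven*", "train*"]


def or_group(terms):
    """Join one group of terms with OR and wrap it in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine term groups with AND (an assumption about the search logic)."""
    return " AND ".join(or_group(group) for group in groups)


# Hypothetical names: one string for the title/keyword search and one for
# the abstract search that adds the instruction terms.
title_keyword_query = build_query(MTSS_TERMS, READING_TERMS)
abstract_query = build_query(MTSS_TERMS, READING_TERMS, INSTRUCTION_TERMS)
print(title_keyword_query)
print(abstract_query)
```

Keeping the term groups as separate lists makes it straightforward to document which terms were added relative to an earlier search.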

Figure 1. Visual depiction of literature search.
Inclusion Criteria
We identified records for full-text review following a review of abstracts, titles, and keywords. We established inclusion criteria to ensure alignment with the research questions and avoid undue assessment of articles that may have provided brief descriptions of intervention procedures (e.g., subsample of an original study).
Abstract Review
Records retrieved from the electronic search were reviewed by title, keyword, and abstract using six inclusion criteria. First, we included records describing peer-reviewed articles, dissertations, and research reports that were published between 2012 (i.e., the conclusion of Hill and colleagues’ original search) and October 2024 (i.e., the date of the search); all other documents were excluded. Studies published prior to Hill and colleagues’ (2012) review were excluded, as their review was the first to systematically operationalize and evaluate alignment between MTSS tiers in intervention studies. Applying alignment criteria retroactively to studies published before Hill and colleagues established the construct would yield findings of limited interpretive value. Studies also needed to include elementary-age students (determined by reported participants enrolled in Grades K–5) or, in the absence of grade information, participants ages 5 to 12 years. Studies with students within a single grade or year outside the established ranges were included, provided the sample also included otherwise eligible students. This preserved alignment with the original review; in addition, the content emphasis of typical secondary instruction, combined with the intensive supports necessary for students in secondary grades who experience consistent challenges in reading (Brozo, 2015), would reasonably reduce the salience of alignment in such studies.
Studies were also excluded if a nonexperimental design was used, including studies that were qualitative, descriptive/correlational, or within-subject group designs. This criterion was consistent with Hill and colleagues’ (2012) earlier review as well as the potentially diminished interest in reporting specific features (e.g., fidelity) in nonexperimental studies (King, Wang, Nylen & Enders, 2023). Studies were reviewed at this level for the presence of a literacy instructional intervention. Finally, studies needed to be published in English to be included for full-text review. Records for which there was any disagreement across any criterion were also included. Following the abstract review, we identified 823 studies for further evaluation.
Full-Text Review
After completing the abstract review, we screened all remaining studies at the full-text level using five additional inclusion criteria. First, consistent with the inclusion criteria set by Hill and colleagues (2012), participants in the study needed to receive supplemental literacy intervention pertaining to any aspect of reading or writing appropriate for inclusion in an MTSS or RTI model. This could be achieved by (a) the authors explicitly linking the reading intervention to an MTSS framework maintained by the school or (b) the authors actively manipulating or influencing, by training or some other method, both Tier 1 and Tier 2 instruction. Descriptions of an intervention as suitable for Tier 2 without explicit reference to MTSS, RTI, or tiered instruction maintained by the school (e.g., Faggella-Luby & Wardwell, 2011; Furey et al., 2017) were insufficient for inclusion. Second, the intervention needed to be characterized as Tier 2 instruction given in a small-group or one-on-one setting. Interventions explicitly identified as Tier 3 were excluded to preserve consistency with Hill and colleagues (2012) and because the intensive nature of Tier 3 interventions might preclude considerations of alignment (Al Otaiba et al., 2016). Third, studies were reviewed to ensure that original data were used in the study. This meant that studies using extant data, or data previously published elsewhere, were excluded to prevent double-counting of findings and to avoid disadvantaging studies that may have reported procedural details across multiple manuscripts. When both a dissertation and a peer-reviewed article were available for the same study, only the latter was retained. Published articles are generally accepted as the authoritative source (e.g., National Information Standards Organization, 2008), and meaningful discrepancies between SCRD dissertations and peer-reviewed articles appear to be uncommon (Travers et al., 2026). Retaining the published article also avoided discrepant reporting across included studies, as dissertations falling outside the date range of the search would not have been available as an alternative source. Fourth, given that aspects of the MTSS may vary internationally, studies needed to take place in a U.S. school setting and primarily pertain to literacy instruction in English. Fifth, only single-case designs examining the efficacy of Tier 2 instruction were retained; group designs were excluded.
Upon completion of full-text screening of studies identified through the electronic search, nine studies were determined to be eligible for this review. Of these, two studies appeared as both a thesis or dissertation and a published article within the date range of the search (Boudreaux-Johnson, 2015; J. L. Kuhn, 2017); only the published article was retained. The ancestral search did not result in any additional articles.
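The record counts reported across the search and screening stages can be tallied with a short script. The sketch below uses only the counts stated in the text (7,671 retrieved; 4,646 after duplicate removal; 823 retained for full-text review; nine included). The class and function names are hypothetical, and the differences it computes are simple subtractions. They are not a breakdown of exclusion reasons, and the full-text figure assumes that the dissertation and article pairs noted above are already reflected in the count of nine.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One stage of the screening flow and the records remaining after it."""
    name: str
    remaining: int


# Counts as reported in the text.
stages = [
    Stage("records retrieved", 7671),
    Stage("after duplicate removal", 4646),
    Stage("retained for full-text review", 823),
    Stage("included in review", 9),
]


def removed_between(flow):
    """Return (stage name, number of records removed on entering that stage)."""
    return [
        (later.name, earlier.remaining - later.remaining)
        for earlier, later in zip(flow, flow[1:])
    ]


for name, removed in removed_between(stages):
    print(f"{name}: {removed} records removed")
```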
Coding Procedures
We coded for indicators of alignment and fidelity across Tier 1 and Tier 2. We also coded whether Tier 2 instruction involved reading exclusively or also emphasized writing (e.g., written spelling, written summaries of passages), the involvement of researchers in the direct implementation of the Tier 2 intervention, and the situation of study procedures within a school-based MTSS or RTI framework. Operational descriptions of all codes appear in Table 1.
Table 1. Coding Categories and Descriptions.
Note. ICE-R = Instructional Content Emphasis Revised. Alignment indicators across instructional tiers share coding subcategories unless otherwise indicated.
Alignment codes pertained to reports regarding instructional practices, content, duration, and arrangements that could be used to compare instruction across tiers. We recognized articles for providing this information even in instances where descriptions were not comprehensive. We applied alignment codes to Tier 1 instruction provided by schools (i.e., school-implemented Tier 1) and in studies where the research team indicated their involvement in Tier 1 (i.e., researcher-implemented Tier 1; e.g., direct implementation, professional development for instructors) separately. For the former, we further indicated whether descriptions of Tier 1 instruction were supplemented with interviews, surveys, or other observations.
Fidelity codes identified specific article features related to the documentation of the quality or consistency of instruction with an established protocol (e.g., frequency of observation) for researcher- and school-implemented variants of Tier 1 and researcher-implemented Tier 2 interventions. Specific codes pertained to reports regarding the precise definition of fidelity as well as training procedures, coaching, monitoring, and the tools used to assess fidelity. We further indicated whether teachers received feedback, the frequency of observations, and procedures used to obtain fidelity scores. We also noted whether authors reported the use of Tier 2 interventions by participating schools.
Interrater Agreement
Interrater agreement (IRR) was collected across multiple levels of the research process, including screening of record titles, abstracts, and keywords; full-text review; and article coding. Teams led by two doctoral-level special education faculty and a postdoctoral scholar with experience in systematic reviews conducted each stage of the project. Training, led by the first three authors, generally consisted of reviewing selection criteria or codes with the research team, guided practice with articles from the larger sample, and routine evaluations of agreement, with feedback, provided over the course of the project. Agreement during training was determined using phase-specific calculations. For initial screening, coders received an initial 1-hr training delivered by the postdoctoral scholar, which included coding prescored examples. Coders were then required to maintain >90% agreement with the postdoctoral scholar on 10 records before receiving records to code independently. Independent coding tasks were assigned in discrete sets (e.g., 20 records) to allow for routine assessment of agreement by the postdoctoral scholar and prevent observer drift. For full-text screening and coding, reviewers assessed the coding scheme and jointly reviewed articles outside the scope of the review until reaching 90% agreement on three consecutive articles.
Abstract Review
All initially retrieved records were double-screened by the postdoctoral scholar and three graduate researchers. We defined agreement as two screeners agreeing on the inclusion status of the same record and calculated an IRR coefficient by dividing the number of agreements by the total number of records screened. The average IRR was 92%. Any studies coded as a disagreement between screeners were reviewed again at the full-text level.
Full-Text Review
All full texts were double-screened by two authors, and any disagreements were resolved by a third. Agreement was defined as two authors agreeing on the inclusion status of an article. The average IRR, calculated by dividing the number of agreements by the total number of studies screened, was 93%.
Coding
A team consisting of four authors double-coded 66.67% of the included studies. We resolved disagreements through consensus between the two primary coders and consulted a third independent coder if necessary. We calculated IRR by dividing the total number of agreements by the total number of rated items for each study. The average IRR was 96%.
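The agreement index described here is a simple percentage: the number of items on which two coders agree divided by the number of items rated, averaged across double-coded studies. The following sketch illustrates that calculation under stated assumptions. The function names and the example codes are hypothetical and are not the review's data.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters assigned the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty item set.")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * agreements / len(rater_a)


def mean_agreement(per_study_codes):
    """Average the per-study agreement percentages across studies."""
    scores = [percent_agreement(a, b) for a, b in per_study_codes]
    return sum(scores) / len(scores)


# Hypothetical codes (Y = reported, N = not reported) for two studies.
study_1 = (["Y", "N", "Y", "Y"], ["Y", "N", "Y", "N"])  # 3 of 4 items agree
study_2 = (["N", "N", "Y", "Y"], ["N", "N", "Y", "Y"])  # 4 of 4 items agree

print(percent_agreement(*study_1))         # 75.0
print(mean_agreement([study_1, study_2]))  # 87.5
```

The same ratio applies to the screening stages, where each record's inclusion decision is the rated item. Percentage agreement does not correct for chance agreement.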
Results
The review yielded nine SCRD studies (see Table 2). Of these, 33.33% (k = 3) were available only as dissertations. Eighty-nine percent of studies (k = 8) implemented interventions at Tier 2 exclusively, while Boulos (2016) reported implementing interventions at both Tiers 1 and 2. Studies included 57 total participants, with an average sample size of six participants (R = 3–11; SD = 2.8). Across studies, 78.95% (k = 45) of participants were boys. On average, students were in the second grade at the time of the study (R = 1–6; SD = 1.69). All participants were identified as being at risk for reading disabilities based on benchmark screenings, state test scores, or teacher observations; none were formally identified with LD. Additional disability classifications represented among participants included emotional and behavioral disorders (15.79%, k = 9), speech-language impairment (3.51%, k = 2), and other health impairments (i.e., attention-deficit disorder; 5.26%, k = 3).
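The percentages in this paragraph follow directly from the reported counts. As a brief check, the sketch below recomputes them from the counts given in the text (nine studies, 57 participants). The helper name `pct` and the dictionary labels are hypothetical.

```python
def pct(count, total, digits=2):
    """Percentage of total represented by count, rounded to `digits` places."""
    return round(100 * count / total, digits)


TOTAL_STUDIES = 9
TOTAL_PARTICIPANTS = 57

# Counts as reported in the text.
study_level = {
    "dissertation only": 3,
    "Tier 2 exclusively": 8,
}
participant_level = {
    "boys": 45,
    "emotional and behavioral disorders": 9,
    "speech-language impairment": 2,
    "other health impairments": 3,
}

for label, k in study_level.items():
    print(f"{label}: {pct(k, TOTAL_STUDIES)}% (k = {k})")
for label, k in participant_level.items():
    print(f"{label}: {pct(k, TOTAL_PARTICIPANTS)}% (k = {k})")

# Mean participants per study.
print(round(TOTAL_PARTICIPANTS / TOTAL_STUDIES, 2))
```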
Table 2. Alignment and Fidelity of Studies.
Note. Dissertation studies listed in italics. CN = content; PC = instructional practices; AG = instructional arrangement; DR = Duration of instruction/component duration; OB = observation; DF = fidelity definition; CH = coaching; MN = monitoring; TL = origin of assessment tool reported; F = feedback; SC = score; FQ = assessment frequency; TN = training; T2 = description of school T2 project; Y = reported; N = not reported; TCD = total and component duration; TO = total duration only; ST = standardized tool; NS = nonstandard observation; SR = self report; EO = external monitoring; RC = researcher created; OU = origin unknown; EC = external score, with calculation; EW = external score, without calculation.
Description of researcher-implemented Tier 1 program appears in text.
Except for Boulos (2016), all studies were explicitly situated within an MTSS framework. Researchers were directly involved in the implementation of the intervention in 33.33% of studies (k = 3; Boulos, 2016; J. Kuhn & Albers, 2022; Spencer et al., 2024). Multiple-baseline designs (66.67%, k = 6) or nonconcurrent multiple-baseline designs (11.11%, k = 1), the most commonly used SCRD configuration in special education (King et al., 2024), appeared in the majority of studies. Alternating treatments designs (Boudreaux-Johnson et al., 2017; 11.11%, k = 1) and the related repeated acquisition design (Spencer et al., 2024; 11.11%, k = 1) appeared less frequently.
Alignment
School-Implemented Tier 1
No studies reported alignment across any of the categories assessed for school-implemented Tier 1 instruction.
Researcher-Implemented Tier 1
Boulos (2016) described a class-wide curriculum (specifically, Reading Mastery) as part of the study. Specific grouping arrangements and the duration of components were not described. However, the specific techniques (e.g., rhyming exercises), total duration, and content of instruction were described in sufficient detail to permit assessment of alignment between instruction across tiers.
Fidelity
School-Implemented Tier 1
As with alignment, no studies reported fidelity data for school-implemented Tier 1 instruction. The lack of fidelity data limits confidence that Tier 1 procedures were implemented as intended.
Researcher-Implemented Tier 1
Boulos (2016) reported multiple components related to both the maintenance and measurement of Tier 1 fidelity: the training provided to the teacher, a definition of fidelity accompanied by an implementation checklist, the calculation of fidelity, and the schedule by which the researcher conducted observations.
Researcher-Implemented Tier 2
Eight studies (88.89%) described training interventionists prior to implementation. Training varied from unspecified sessions prior to implementation (e.g., Fuoco, 2020) to several hours of training spanning multiple days (e.g., O’Keeffe et al., 2013). In terms of maintaining fidelity, authors reported using coaching (e.g., individual supports provided as needed; J. Kuhn & Albers, 2022) in 33.33% of studies (k = 3). Feedback was included in 44.44% of studies (k = 4). Boulos (2016) shared the results of fidelity observations with interventionists but provided no other form of support (i.e., coaching).
Six studies (66.67%) provided an explicit definition of fidelity or otherwise provided access to their observation tool. Authors reported using observers to externally monitor fidelity in eight studies (88.89%), with J. Kuhn and Albers (2022) also collecting teacher self-reports regarding the intervention condition. Fidelity observation tools ranged from researcher-created (e.g., Boudreaux-Johnson et al., 2017) to standardized tools packaged with their reading programs (e.g., Reading Mastery; Boulos, 2016). Authors reported fidelity scores in 88.89% of studies (k = 8), with external monitoring as the main method of assessing fidelity, although only half of these studies described calculation procedures. Assessment frequency was also reported in most studies (88.89%; k = 8). Five studies (55.56%) described conventional Tier 2 programs provided alongside the intervention.
Discussion
The present review examined how SCRD studies evaluating elementary Tier 2 literacy interventions report instructional fidelity and alignment across tiers. Across nine studies, all but one implemented researcher-provided Tier 2 interventions exclusively, with limited or no description of concurrent Tier 1 instruction. Researchers consistently reported fidelity of the intervention they administered, yet none provided information regarding how these interventions related to, or were aligned with, the instruction students typically received in Tier 1 settings. The absence of information regarding the Tier 1 context limits the interpretability of SCRD findings in relation to Tier 2 interventions and constrains the integration of this literature with research on students with or at-risk for LD more broadly. Findings provide clear guidance on how future SCRD work involving Tier 2 interventions may advance.
Although the two reviews were motivated by similar questions, differences in targeted studies and coding schemes complicate direct comparisons between the outcomes of the current review and those of Hill and colleagues (2012). Nonetheless, the principal result (Tier 2 intervention studies generally monitor Tier 2 intervention without providing additional insight into Tier 1 instruction) remains the same. The frequent reporting of implementation fidelity for the primary intervention distinguishes these studies from the wider reading intervention literature, which often omits such information (e.g., Capin et al., 2018). The extent to which studies in this review reported fidelity exceeds the marked increase in reporting observed in SCRD across subject areas (e.g., inappropriate behavior, mathematics; Gage et al., 2020). However, the actual data encompassed by fidelity procedures vary considerably across SCRD studies (Gage et al., 2020) and reading studies more generally (van Dijk et al., 2023), which is consistent with our findings. Whereas most studies identified by Hill and colleagues (2012) provided some form of support to interventionists, only 44% of studies in the present review reported coaching or feedback beyond initial training. The discrepancy may be due to the scale of SCRD, where primary authors generally conduct all observations and can informally exert control over the quality of intervention.
Previous reviews suggest that reports of Tier 1 fidelity in research concerning more intensive intervention are relatively uncommon (Al Otaiba et al., 2025; Hill et al., 2012). Although such reports are more common in group-design studies (e.g., Stevens et al., 2020), the scale of many Tier 2 intervention studies may explain, if not excuse, limited reports of the wider instructional context (Hill et al., 2012). Given the relatively small number of participants across studies, the limited reports of fidelity and alignment of Tier 1 instruction observed in the current review arguably stem from the conventions of SCRD rather than the scale of the studies. As a related example, SCRD studies have only recently begun to incorporate the perspectives of participants into their work (i.e., social validity; Snodgrass et al., 2018; Thoele & DeAngelo, 2023; Wellons et al., 2024). When collected, participant perspectives are often gathered via rating scales, a practice that is understandable among large samples but potentially reductive given that the median SCRD sample contains no more than four participants (King et al., 2024). Much like the rating scales used to gauge social validity, limited reporting of Tier 1 instruction in SCRD reflects conventions designed for efficiency and may not reveal enough about the conditions under which Tier 2 intervention is effective.
We hesitate to couch the discussion of alignment and fidelity in terms of quality, which can evoke arbitrary checklists that can seem disconnected from the objectives of research (e.g., Harris et al., 2019; Lanovaz & Rapp, 2016). Rather, we echo calls to acknowledge the utility of flexible, context-specific guidelines when conducting SCRD (Ledford et al., 2023). In the case of Tier 2 intervention research specifically, this requires greater attention to Tier 1 instruction. The SCRD studies that provide multiple safeguards related to internal validity, such as randomization and the inclusion of several data points within each experimental phase, no doubt have considerable value. These elements alone, however, do not address whether Tier 2 interventions function as intended within the ecology of MTSS. Changes in priorities of SCRD methods may be warranted as lines of research extend beyond whether a practice can produce changes in outcomes to how these practices function and interact with concurrent school supports (Ledford et al., 2023).
Limitations
This review has several notable limitations. First, none of the studies explicitly involved students with LD. The pattern of reading difficulties associated with participants is nonetheless consistent with recent research regarding students with or at-risk for LD, particularly following the advent of MTSS frameworks that have decreased formal LD diagnoses (Berkeley et al., 2020; King, Wang, Datchuk & Rodgers, 2023). Second, MTSS and reading terms were applied to titles and keywords, yet instruction-related search terms were limited to abstracts. Although this asymmetry may have limited the consistency of the search strategy, it also reduced the likelihood of including irrelevant records. Third, we limited the search to studies published after Hill and colleagues’ (2012) review, which did not include SCRD. While it is possible that a search of this earlier period may have revealed additional eligible studies, the few studies pertaining to RTI between 2004 and 2011 (i.e., the search range of the original review), combined with limited awareness of issues raised by Hill and colleagues, suggest that inclusion of records prior to 2012 would not have meaningfully changed our results. Fourth, we included only studies that clearly occurred within RTI or MTSS frameworks and excluded those in which the presence of such frameworks could not be verified. As studies that did not occur within an instructional framework were unlikely to mention Tier 1 instruction, these omissions avoided creating a skewed picture of the literature.
In addition, we excluded studies that may have occurred within qualifying MTSS frameworks that were not explicitly mentioned by the author. Certain states require the use of RTI or MTSS, and studies conducted within those states would presumably qualify for inclusion. As authors cannot presuppose readers’ exhaustive knowledge of the context beyond what is explicitly disclosed, these exclusions were justified. Given the variance in implementation of MTSS across states, we further recommend that authors explicitly note whether MTSS is employed and enumerate specific components encompassed by the MTSS framework, particularly if they are relevant to literacy.
Implications for Practice
Due to its scale and flexibility, SCRD may be readily integrated into the work of researcher-practitioners who incorporate systematic data collection and intervention implementation into their work (Blampied, 2013; Ninci, 2023). One consequence of more SCRD research involving students with or at-risk for LD and greater attention to Tier 1 instruction is that issues presumably amenable to Tier 2 intervention might be better resolved through changes to core instruction. As such, the collection of fidelity and alignment data is critical for implementing data-based decision-making in both research and practice. The process of observing and modifying Tier 1 instruction to improve student outcomes—rather than immediately prescribing a Tier 2 solution—represents a goal for future SCRD research with the potential to improve the professional judgment of special educators, instructional coaches, and others responsible for coordinating services for students with or at-risk for LD and related learning needs (Baker et al., 2010).
Assessing instructional fidelity and monitoring instruction to support students with significant learning needs present considerable challenges for many schools (e.g., Ruffini et al., 2016). While practitioners would likely benefit from efforts to disseminate flexible tools for monitoring Tier 1 reading instruction (e.g., Iowa Reading Research Center, 2025), there is a need for rigorous, standardized assessment in this area. These tools should also be coupled with guidance on how to improve instructional practice (Cuticelli et al., 2016). Both resource limitations and perceptions of fidelity monitoring can make attention to these contextual factors difficult to maintain (McKenna & Parenti, 2017). Minimizing the aversiveness of observations and increasing buy-in (e.g., Falletta-Cowden & Lewon, 2023) are more salient in a context where the use of specific literacy practices is increasingly compelled by state governments (Barnes & Peltier, 2022; Neuman et al., 2023). Addressing these gaps will depend on helping schools operationalize Tier 1 implementation fidelity that can be measured, supported, and improved in cooperation with practitioners.
Directions for Future Research
We view these findings as less indicative of shortcomings with individual studies or the scope of this review than a series of missed opportunities to address critical questions rooted in long-standing research conventions. The relative absence of SCRD in LD research observed here and elsewhere is significant given the clear potential for this design to address fidelity, alignment, and other issues relevant to students with LD (Peltier et al., 2021). The prohibitive costs of group designs—and the shifting priorities of many funding agencies (Northern & Opp, 2026)—further underscore the role of SCRD and other more feasible designs in addressing the range of questions pertinent to LD. Authors are therefore encouraged to reconsider the role of SCRD and what it can accomplish within LD research. Hott, Flores, et al. (2023) and Hott, Heiniger, et al. (2023) have called for more SCRD and more rigorous methodological reporting in LD research. We extend this call to encompass techniques that capture a broader range of data relevant to Tier 2 interventions and issues central to students with or at-risk for LD (e.g., culturally responsive instruction; Austin et al., 2024). To reduce the likelihood of missing critical connections between instructional tiers within MTSS, future SCRD should incorporate qualitative interviews and systematic observations—approaches that would strengthen its contribution to special education research and provide insight for future intervention implementation in school settings.
In terms of Tier 2 research, SCRD studies establishing what works (and what does not) within well-documented MTSS contexts could influence larger studies attempting to identify intensive interventions that respond to students whose needs extend beyond access to conventionally effective instruction. A first step is to extend the logic of direct observation beyond conditions directly controlled by the researcher to the contextual factors that shape instructional change. This could include incorporating the results of structured observations (e.g., ICE-R) conducted prior to baseline as well as stakeholder reports (e.g., King, Rodgers, & Lemons, 2022). Embedding interviews or contextual observations throughout the research process would align with proposed frameworks for mixed-methods SCRD, providing a richer account of instructional context while maintaining causal precision (Onghena et al., 2019). As simply raising awareness has a poor track record of producing reform, such changes will likely require alterations to how SCRD is taught (Kubina et al., 2021, 2023). These changes coincide with a broader movement to reconceptualize SCRD outcomes (e.g., use of effect sizes; Maggin et al., 2019) and quality in relation to more ambitious research objectives (Lambert et al., 2025; Ledford et al., 2023). Expansive approaches to SCRD characterized by mixed methods will likely require concomitant changes in how studies are interpreted and aggregated (e.g., Flemming & Noyes, 2021; Nye et al., 2016).
Returning to Tier 2 intervention research specifically, we advise authors to recognize the complexity of implementation fidelity when designing and articulating observation procedures (van Dijk et al., 2023). Like Hill and colleagues (2012), we examined whether studies provided sufficient information for readers to determine how fidelity was defined and the extent of alignment between instructional tiers. This approach is suitable for tracking issues that, based on our findings, represent a nascent concern within SCRD involving children with or at-risk for LD. Although our findings do not permit causal inferences, studies demonstrating improved outcomes under aligned Tier 1–Tier 2 conditions (e.g., Stevens et al., 2020) underscore the potential value of this practice. Establishing stronger connections between the dimensions of fidelity observed at Tier 1 and outcomes associated with interventions at Tier 2, however, will require greater attention to how fidelity across tiers is conceptualized and reported in original studies and aggregated in secondary analyses (Bason et al., 2025; van Dijk et al., 2023). We encourage future research teams to plan for fidelity proactively and to broaden the scope of data that can provide insight into the adequacy of intervention delivery.
Footnotes
Funding
The authors received the following financial support for the research, authorship, and/or publication of this article: This project was completed with support from the Iowa Department of Education.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
