Abstract
The purpose of this review was to systematically analyze the literature on behavior management training for general educators (Pre–K-12). We identified 74 articles in which general educators were trained to implement a behavior management strategy. General educators were most commonly trained to implement behavior-specific praise (n = 12), Class-Wide Function-Related Intervention Teams (n = 8), or a multi-component intervention package (i.e., a student-level intervention that included a number of strategies; n = 21). The two most common training components were initial training provided in a one-on-one format (n = 30) and the inclusion of ongoing coaching (n = 29). Thirty-nine articles included measures of practitioner fidelity or discrete behaviors (e.g., behavior-specific praise) within the context of an experimental design. We evaluated methodological rigor and evidence of effectiveness of these 39 articles using What Works Clearinghouse standards. Eleven articles included group design studies, nine (81.82%) of which met standards with or without reservations, and four designs had promising evidence of effectiveness. Twenty-eight articles included a total of 49 single-case research designs, of which 27 designs (55.10%) met standards with or without reservations, and 23 designs provided moderate to strong evidence of effectiveness. Directions for future research and implications for the field are provided.
Educators are charged with meeting the diverse needs of all learners, but managing challenging student behavior remains a prominent concern. General educators report difficulty managing more severe problem behavior (e.g., physical aggression; Butler & Monda-Amaya, 2016), continuing instruction when challenging behavior occurs (Fallon et al., 2014), and finding the time to plan and implement intensive individualized strategies (Martinussen et al., 2011). Difficulties with managing student behavior have been connected to increased rates of teacher burnout, poor teacher retention, and lower job satisfaction (Gilmour & Wehby, 2020). Furthermore, overwhelmed teachers are at an increased risk of relying on punitive and exclusionary practices (e.g., office discipline referrals, suspensions; Bucalos & Lingo, 2005). The overuse of these types of practices can result in students experiencing poor academic and social outcomes during school-age years and post-school trajectories (Lloyd et al., 2019; Scott & Alter, 2017).
Compared to special education teachers, general educators’ lack of specialized behavior management training leaves them especially vulnerable to the challenges involved with supporting student behavior (Lane et al., 2015). This is problematic as a general educator’s ability to effectively manage student behavior is integral to student success. As students with and at risk for disabilities receive instruction alongside their same-age peers (Individuals with Disabilities Education Improvement Act [IDEIA], 2004), general educators report feeling ill-equipped to meet the management demands of students who engage in more problematic behavior (Butler & Monda-Amaya, 2016). This is consistent with Flower et al.’s (2017) review of teacher preparation coursework, which found that general educators may enter the field having received most instruction on universal classroom management strategies, such as establishing rules and routines (i.e., Tier 1 behavioral supports). A lack of training and experience in managing problem behavior can impact their ability to fully support students in the general education setting who require Tier 2 and Tier 3 practices (Lloyd et al., 2019; Moore et al., 2017). Previous research clearly creates an impetus for teacher training efforts: provide educators with training on more intensive behavioral support strategies for students who engage in the most problematic behavior (Moore et al., 2017). For example, general educators are likely to teach students with an emotional and behavioral disorder (EBD), considering 48.5% of students with EBD receive at least 80% of their instruction in the general education setting (U.S. Department of Education, National Center for Education Statistics, 2019). Typically, students with EBD require strategies that go well beyond Tier 1 practices (Evans et al., 2012). However, students with EBD present especially challenging demands for educators. 
Gilmour and Wehby (2020) recently conducted a study that found teaching students with disabilities was associated with higher teacher turnover. Specifically, students who engaged in more problematic behavior (e.g., students with EBD) were strongly associated with turnover. Creating inclusive educational environments is, in part, contingent on general educators’ ability to implement effective interventions for problematic behavior, which can have positive collateral impacts on teacher job satisfaction and retention.
Researchers have recently addressed these challenges by providing training and ongoing coaching to general educators on evidence-based practices for classroom and behavior management. Reviews have summarized the effectiveness and utility of low-intensity strategies in applied settings, including teacher-delivered behavior-specific praise (BSP; Royer et al., 2019) and opportunities to respond (OTRs; MacSuga-Gage & Simonsen, 2015). In addition, researchers have summarized teacher training components and their relative effects on treatment fidelity. For example, Fallon et al. (2015) evaluated whether performance feedback was an evidence-based practice to promote implementation of school-based practices. Through application of widely used standards to evaluate single-case designs, performance feedback was determined to be an evidence-based practice. In a more recent review, R. P. Ennis and colleagues (2020) examined the quality of research on coaching to promote teacher-delivered BSP to determine whether coaching is an evidence-based practice. Coaching was determined to be an evidence-based practice for increasing teacher use of BSP. Researchers have also systematically reviewed the evidence for functional behavior assessments and function-based interventions (e.g., Lloyd et al., 2019), which are hallmark strategies for addressing problem behavior that is non-responsive to initial tiers of support (Scott & Alter, 2017). In a recent example, Lloyd and colleagues (2019) completed a systematic review of function-based interventions implemented in general education settings and found that general educators were the primary implementers in most of the studies included.
Reviews similar to the ones previously listed have played a critical role in identifying and summarizing the extant literature on evidence-based practices for improving student behavior. An understanding of what training components might result in sustained change in teacher behavior can also be useful to inform future professional development and, subsequently, general educator use of behavioral support strategies. However, previous reviews continue to leave larger questions unanswered. First, there is no existing review that has systematically and comprehensively identified what interventions general educators have been trained to implement to prevent and reduce problematic behaviors and promote pro-social behaviors. The benefit to answering this question is three-fold: (a) it can reveal trends in publication in general educator training; (b) it can allow researchers and teacher educators to identify shortcomings in general educator behavior management training, which would re-inform future research; and (c) it can inform professional development on a number of strategies for general educators, compared to previous reviews that targeted isolated strategies or training components.
A second question that remains unanswered is how general educators have been trained to implement behavioral support strategies. Changes in student behavior often occur as a concomitant outcome of teacher behavior. This cascading effect of interventionist behavior on student behavior warrants close attention to the methods by which educators were trained (Roberts et al., 2014). Information on typical training components across a diverse set of practices, as opposed to a single practice, can provide context on the procedural mechanisms necessary for sufficient implementation of behavioral support strategies (Lloyd et al., 2019).
However, information on which practices might result in desirable changes in teacher behavior should be informed by a reliable and high-quality source of science (Maggin et al., 2013). Methodologically rigorous research should guide the scientific advancement of sound educational practices (What Works Clearinghouse [WWC], 2017). WWC, an initiative of the U.S. Department of Education’s Institute of Education Sciences, provides guidelines for conducting, evaluating, and synthesizing the methodological quality of single-case and group design studies. Inclusion of WWC guidelines in applied educational research and systematic reviews is an important development for identifying practices that can result in changes in teacher behavior, which then results in collateral changes in student behavior.
Purpose
The purpose of this review was to extend the work of previous researchers to support future general educator training in behavior management. To our knowledge, there has been no review broadly focused on behavior management training for this subset of practitioners. This review included two stages. First, we described all relevant literature that involved training general educators to implement behavior management strategies to increase pro-social behaviors or decrease challenging behaviors of their students. Second, we analyzed the methodological rigor and evidence of effectiveness of studies that experimentally investigated the effect of training on teachers’ fidelity to intervention procedures or use of discrete behaviors (e.g., BSP). The research questions for this review were as follows:
What behavior management strategies have general educators (Pre–K-12) been trained to implement?
What procedures have researchers used when training general educators to implement behavior management strategies?
For studies that experimentally investigated the effect of training on teachers’ fidelity or discrete behaviors, which studies have adequate methodological rigor and evidence of effectiveness per WWC (2017) Version 4.0 guidelines?
Method
Inclusion Criteria
Studies eligible for inclusion met four criteria. First, studies were published in a peer-reviewed journal between the years of 2004 and 2020 in alignment with recommendations that systematic reviews focus on evidence from the previous 20 years (WWC, 2017). This criterion also aligns with the reauthorization of IDEA (IDEIA, 2004), which strengthened guidelines, and subsequently research, on functional behavior assessment and individualized behavioral interventions in public school settings. Second, study procedures included educator training on student-level interventions designed to result in immediate change in problematic or pro-social behaviors (e.g., aggression, disruption, academic engagement). We excluded studies solely focused on mental health outcomes (e.g., anxiety, depression) or studies that only measured distal outcomes (e.g., office discipline referrals). Third, at least 50% of participating trainees were described as general education teachers working in the United States. We included studies if teacher participants taught in a general education classroom, had a relevant bachelor’s or master’s degree, or held a teacher certification. We aimed to synthesize training procedures across a relatively homogeneous group of teacher participants and hypothesized there would be systematic differences between general educators in the United States and educators who had differing qualifications (e.g., associate’s degree in child development) or worked in other countries. Fourth, studies reported at least mean teacher fidelity or discrete behavior data as an outcome of training on the student-level intervention. We included studies if a percentage of intervention sessions were observed with a checklist of intervention components or discrete behaviors were measured in a time-series design (e.g., BSP or OTRs reported within a single case research [SCR] design). We excluded self-reports of fidelity and distal outcomes (e.g., classroom climate). 
As part of our rigor and effects coding, we later determined whether teacher fidelity and/or discrete behaviors were experimentally evaluated as a dependent variable.
Search and Screen
Our three-stage search and screening process is depicted in Figure 1. First, we searched Academic Search Complete, ERIC, PsycINFO, and Medline Full Text databases within EBSCO using terms related to (a) general education teachers (general education teacher OR classroom teacher OR regular education teacher), (b) behavioral interventions (behavior management OR classroom management OR behavior), and (c) training models (intervention* OR training OR professional development OR development OR coaching OR consultation). We connected phrases with the Boolean term “AND.” We limited results to records from scholarly, academic journals published from 2004 to 2019 (note that the hand search, described below, covered journals from 2004 to 2020). The database search resulted in 4,540 records. We removed duplicates within EBSCO and EndNote file management software, and 3,194 records remained.

Figure 1. Search and screening process.
Two undergraduate-level research assistants (RAs) independently screened titles and abstracts of all 3,194 records, using binary codes to indicate whether records clearly failed to meet an inclusion criterion (0) or should proceed to full-text screening (1). Prior to beginning, the first and second authors provided instruction on correct coding procedures, modeling, opportunities for rehearsal, and feedback during an initial training. After coding 20 records as a group, RAs independently demonstrated 90% agreement with the first author’s “gold standard” across two sets of 20 records. The first and second authors provided corrective feedback as RAs coded the first 500 records post-training and calculated inter-rater agreement (IRA) across all records (IRA = 92.29%). The first and second authors reviewed the titles and abstracts of records that either RA coded as “1” and removed any that were clearly irrelevant. Subsequently, 234 studies proceeded to full text screening.
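The text does not define how IRA was computed. A minimal sketch, assuming it is simple percent agreement (the number of matching codes divided by the total number of codes, multiplied by 100) on the binary screening codes; the function name and the example codes are hypothetical:

```python
def percent_agreement(codes_a, codes_b):
    """Simple percent agreement between two coders' codes for the same records."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must rate the same set of records")
    if not codes_a:
        raise ValueError("No records to compare")
    # Count the records on which both coders assigned the same code.
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * agreements / len(codes_a)


# Hypothetical codes for 10 records (0 = exclude, 1 = proceed to full text).
ra_1 = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
ra_2 = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
print(percent_agreement(ra_1, ra_2))  # the coders agree on 9 of 10 records
```

Percent agreement does not correct for chance agreement, which matters when most records are coded 0; a chance-corrected index would be a different calculation.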
We used a hierarchical coding scheme for full-text screening with criteria listed in the following order: (a) the student-level intervention was focused on challenging or pro-social behaviors, (b) at least 50% of trainees were general educators, (c) teachers were working in the United States, (d) researchers reported at least mean fidelity or a measure of discrete behaviors, and (e) the measure was directly linked to the intervention. If a study failed to meet one criterion in the hierarchy, we stopped screening and excluded it. Prior to independent screening, the first three authors double coded five studies until reaching 90% agreement and then split the remainder of the studies three ways. We completed blind double coding of 30% of studies (IRA = 88%). When two authors disagreed on inclusion status, the remaining author reviewed the study to resolve the discrepancy. Through this process, we included 66 studies.
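The stop-at-first-failure logic of the hierarchical scheme can be sketched as follows. The `StudyCodes` record and its field names are hypothetical and only illustrate the ordering of criteria (a) through (e); they are not the authors' codebook.

```python
from dataclasses import dataclass


@dataclass
class StudyCodes:
    """Hypothetical full-text screening codes for one study."""
    targets_behavior: bool        # (a) intervention targets challenging or pro-social behavior
    pct_general_educators: float  # (b) percentage of trainees who were general educators
    in_united_states: bool        # (c) teachers worked in the United States
    reports_fidelity: bool        # (d) mean fidelity or discrete behaviors reported
    measure_linked: bool          # (e) measure directly linked to the intervention


# Criteria are checked in the order listed in the text.
CRITERIA = [
    ("a", lambda s: s.targets_behavior),
    ("b", lambda s: s.pct_general_educators >= 50),
    ("c", lambda s: s.in_united_states),
    ("d", lambda s: s.reports_fidelity),
    ("e", lambda s: s.measure_linked),
]


def screen(study):
    """Return (included, first_failed_criterion), stopping at the first failure."""
    for label, check in CRITERIA:
        if not check(study):
            return False, label
    return True, None


example = StudyCodes(True, 40.0, True, False, True)
print(screen(example))  # excluded at (b); criterion (d) is never evaluated
```

Recording the first failed criterion, rather than all failures, mirrors the described procedure of ending screening as soon as one criterion is not met.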
Finally, we hand searched the four journals most represented in the database search results, including: (a) Journal of Positive Behavior Interventions, (b) Journal of Emotional and Behavioral Disorders, (c) Journal of Behavioral Education, and (d) School Psychology Review. Two graduate-level RAs reviewed titles and abstracts of articles published from 2004 to 2020 and recorded any studies that met our inclusion criteria in a spreadsheet. The first author then screened the studies (n = 23) according to our full-text screening criteria, and the second author double screened 30% (IRA = 100%), resulting in eight additional studies included.
Stage 1: Descriptive Coding
We coded studies to extract descriptive data about teacher and student participants (i.e., number, grade level, special education status), student-level interventions (see Table 2), teacher-level training characteristics (e.g., format, length, training components), research designs (i.e., group or SCR), and social validity results. Finally, we coded whether studies reported fidelity results or measures of discrete behaviors in the context of an experimental design. We coded “yes” for (a) group design studies that reported means for both intervention and control groups and (b) SCR design studies that included graphs of baseline and intervention conditions (e.g., ABAB, multiple baseline). The first two authors developed the codebook and refined variable descriptions by double coding six studies. Then, the first author trained a master’s level board certified behavior analyst (fourth author) with nine studies (91% agreement obtained), and the fourth author coded the remaining studies. The first author double coded 30% of studies (IRA = 90%) and the two coders met to review and resolve discrepancies.
Stage 2: Rigor and Effects Coding
Inclusion in WWC evaluation
Studies that evaluated teacher fidelity or discrete behaviors in the context of an experimental design (defined above) proceeded to an evaluation of rigor and effects based on the WWC standards for group and SCR designs (Version 4.0; WWC, 2017). Per WWC guidelines, we subsequently examined the effects of each design that met standards or met standards with reservations (note that SCR studies often included multiple designs). For both group and SCR designs, we only analyzed the effects of training on teacher behaviors that researchers aimed to increase (e.g., fidelity, BSP, OTRs) and excluded behaviors targeted for decrease (e.g., reprimands, negative interactions).
Group design coding
A group design (e.g., randomized controlled trial) met WWC standards if (a) intervention and comparison groups were assigned through randomization and (b) overall and differential attrition were low. A group design met WWC standards with reservations if (a) intervention and comparison groups were not assigned through random procedure but baseline equivalence was established or (b) random assignment did occur but sample attrition was high and baseline equivalence was established. The third author coded rigor of all group designs and the first author double coded 30% (IRA = 95.8%).
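The decision rule described above can be written as a small function. This is a simplified sketch of the rule as stated in this section, not the full WWC procedure, which defines specific attrition boundaries and baseline-equivalence thresholds; the function and argument names are hypothetical.

```python
def rate_group_design(randomized, low_attrition, baseline_equivalent):
    """Rate a group design using the simplified rule described in the text.

    randomized: groups were formed through random assignment
    low_attrition: overall and differential attrition were both low
    baseline_equivalent: baseline equivalence was established
        (may be None when it was not assessed)
    """
    if randomized and low_attrition:
        return "meets standards"
    # Either no random assignment, or random assignment with high attrition:
    # the design can still meet standards with reservations if the groups
    # were shown to be equivalent at baseline.
    if baseline_equivalent:
        return "meets standards with reservations"
    return "does not meet standards"


print(rate_group_design(True, True, None))
print(rate_group_design(True, False, True))
print(rate_group_design(False, True, False))
```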
We calculated effect sizes for each relevant teacher outcome in group designs that met WWC criteria with or without reservations. We extracted post-intervention group sizes, means, and standard deviations from results tables or, if necessary, narrative results. We calculated Hedges’ g effect sizes for all outcomes to account for unequal sample sizes across studies and to reduce bias. We then calculated an average effect per study. For example, if a study included praise, behavior management, and positive statements, we calculated Hedges’ g for each measure and then averaged them. The first and third authors double-coded 100% of Hedges’ g effect sizes. Initial agreement was low (IRA = 57.89%) due to incorrect sample size inputs in the Excel effect size calculator for one study with six outcomes. There was high agreement after re-coding to resolve all discrepancies (IRA = 100%).
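For reference, Hedges' g is commonly computed as the standardized mean difference based on the pooled standard deviation, multiplied by a small-sample correction factor. The sketch below uses that standard formula. It is an assumption that the Excel calculator used in this review applied the same computation, and the example values are invented.

```python
import math


def hedges_g(n_t, mean_t, sd_t, n_c, mean_c, sd_c):
    """Hedges' g for a treatment group (t) versus a comparison group (c)."""
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    if pooled_sd == 0:
        raise ValueError("Pooled standard deviation is zero")
    d = (mean_t - mean_c) / pooled_sd
    # Small-sample correction: 1 - 3 / (4 * df - 1), i.e., 1 - 3 / (4N - 9).
    correction = 1 - 3 / (4 * df - 1)
    return d * correction


def average_effect(outcomes):
    """Mean Hedges' g across several outcomes from one study."""
    effects = [hedges_g(*outcome) for outcome in outcomes]
    return sum(effects) / len(effects)


# Invented post-intervention data: (n_t, mean_t, sd_t, n_c, mean_c, sd_c).
praise = (10, 12.0, 2.0, 10, 10.0, 2.0)
positive_statements = (10, 10.0, 2.0, 10, 10.0, 2.0)
print(round(hedges_g(*praise), 2))
print(round(average_effect([praise, positive_statements]), 2))
```

Because every input feeds directly into the result, an incorrect group size changes both the pooled standard deviation and the correction factor, which is consistent with the input-related discrepancies described above.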
SCR design coding
We coded the quality and rigor of each individual SCR design within a study (e.g., each ABAB or multiple baseline design). An SCR design met WWC standards if (a) data were presented in a graphic format, (b) the researcher systematically manipulated the independent variable, (c) interobserver agreement (IOA) was measured for at least 20% of sessions in each phase with at least 80% agreement, (d) there were at least five data points per condition, and (e) there were at least three attempts to demonstrate an effect at three different points in time. A design met standards with reservations if there were 3 to 4 data points per condition and all other criteria were met. We identified whether SCR designs included graphs depicting teacher fidelity or discrete behaviors during descriptive coding; thus, we applied the remaining standards at this stage.
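The design standards in this paragraph can be sketched as a single rating function. The sketch assumes that the IOA thresholds are minimums (at least 20% of sessions in each phase, at least 80% agreement) and that the phase with the fewest data points determines the rating; the function and argument names are hypothetical.

```python
def rate_scr_design(graphed, iv_manipulated, ioa_pct_sessions_by_phase,
                    ioa_agreement, points_per_phase, n_effect_attempts):
    """Rate one single-case design against the criteria listed in the text.

    ioa_pct_sessions_by_phase: percentage of sessions with IOA, one value per phase
    ioa_agreement: mean IOA agreement (percentage)
    points_per_phase: number of data points, one value per phase
    n_effect_attempts: attempts to demonstrate an effect at different points in time
    """
    if not ioa_pct_sessions_by_phase or not points_per_phase:
        raise ValueError("At least one phase is required")
    other_criteria_met = (
        graphed
        and iv_manipulated
        and all(pct >= 20 for pct in ioa_pct_sessions_by_phase)
        and ioa_agreement >= 80
        and n_effect_attempts >= 3
    )
    if not other_criteria_met:
        return "does not meet standards"
    fewest_points = min(points_per_phase)
    if fewest_points >= 5:
        return "meets standards"
    if fewest_points >= 3:
        return "meets standards with reservations"
    return "does not meet standards"


# Hypothetical ABAB design with one short phase (4 data points).
print(rate_scr_design(True, True, [25, 30, 20, 33], 92.0, [5, 6, 4, 5], 3))
```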
If a design met standards with or without reservations, we used WWC guidelines to visually analyze the graphed data. We analyzed the primary dependent variable for designs with multiple outcomes; if the authors did not identify a primary dependent variable, we rated the evidence across all relevant outcomes. Specifically, we examined each phase contrast (i.e., baseline to intervention [A to B]; intervention to baseline [B to A]) and determined whether there was a clear change in level, trend, and/or variability of data in the expected direction. For visual analysis of A to B contrasts, we considered whether (a) the overall level increased, (b) the trend changed from decreasing/flat to increasing, or (c) variability decreased. For B to A comparisons, we expected data to demonstrate opposite patterns (e.g., a decrease in level). If there was a basic effect for one contrast (i.e., pattern a, b, or c was depicted), we examined whether the effect was replicated across two additional phase changes (i.e., three demonstrations of effect). We categorized the evidence of effectiveness as (a) strong, if there were at least three demonstrations of effect with no non-effects; (b) moderate, if there were three demonstrations of effect and one or more non-effects (e.g., for designs with four phase contrasts); or (c) no evidence, if there were fewer than three demonstrations of effect. The first two authors coded rigor and effects of SCR designs and point-by-point IRA was adequate (30% designs; IRA = 86.18%). Lower agreement occurred on an initial set (n = 8; IRA = 79.10%). After meeting to reach consensus, agreement on a second set was high (n = 7; IRA = 94.64%).
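The categorization of evidence described above reduces to counting demonstrations of effect and non-effects across phase contrasts. In the sketch below, each phase contrast is represented by a boolean that a visual analyst has already judged (True = basic effect in the expected direction). Treating three or more demonstrations with at least one non-effect as moderate is an assumption, and the function name is hypothetical.

```python
def rate_evidence(contrast_effects):
    """Categorize evidence of effectiveness for one design.

    contrast_effects: list of booleans, one per phase contrast, where True
    means visual analysis identified an effect in the expected direction.
    """
    demonstrations = sum(bool(effect) for effect in contrast_effects)
    non_effects = len(contrast_effects) - demonstrations
    if demonstrations < 3:
        return "no evidence"
    if non_effects == 0:
        return "strong"
    return "moderate"


print(rate_evidence([True, True, True]))         # three effects, no non-effects
print(rate_evidence([True, True, False, True]))  # three effects, one non-effect
print(rate_evidence([True, False, True]))        # only two effects
```

The function encodes only the final counting step; the judgment about level, trend, and variability for each contrast remains a visual-analysis decision made by the coders.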
Results
Through our systematic search and screening process, we identified 74 peer-reviewed journal articles that met all inclusion criteria. At our first stage, we conducted a descriptive review of all 74 studies (group n = 26; SCR n = 48). At our second stage, we applied WWC standards to evaluate the rigor and effects of 39 studies that analyzed teacher fidelity or discrete behaviors in the context of an experimental design (group n = 11; SCR n = 28).
Descriptive Results
Participants
Table 1 displays participant demographic data. Of the 74 studies included in this review, authors of 73 studies reported the number of general educator participants, and authors of 60 studies reported the number of student participants. Across those studies, there were 2,938 total general educator participants and 26,468 student participants. Group design studies included 2,768 general educators (treatment n = 1,505) and 24,715 students (treatment n = 11,400). SCR studies included 170 general educators and 1,753 students. Although SCR studies often include few participants (e.g., 3 to 5), the large number of students in these studies is reflective of researchers frequently measuring whole-class behaviors (e.g., group on-task).
Table 1. Participant Demographics by Design Type.
Note. SCR = single case research; NA = not applicable.
ᵃValues represent the number of participants in studies. ᵇValues represent the number of studies with participants in each category.
Authors of all 74 studies reported the grade level of student participants. In most of the studies, general educators were trained to implement behavioral interventions with elementary students (group study n = 22; SCR study n = 36). Authors of fewer studies reported training general educators to intervene with pre-kindergarten or Head Start students (group study n = 3; SCR study n = 5) or middle school students (group study n = 1; SCR study n = 6). Researchers in one study trained general educators to intervene with high school students (Lum et al., 2019). Authors of 68 studies reported students’ special education status, and most of those studies included both general education and special education students (group study n = 14; SCR study n = 21). Thirty studies included only general education students (group study n = 9; SCR study n = 21), and three SCR studies included only students receiving special education services.
Student-level interventions
Table 2 displays information on the student-level interventions general educators were trained to implement. Definitions for each student-level intervention are included as a supplemental file (see supplemental Table S1). We identified 20 student-level interventions, eight of which were included in three or more studies. In 21 of 74 studies (28.38%), researchers reported trainings focused on multi-component intervention packages (MCIP) designed by researchers based on assessment data and/or teacher consultation. In 12 studies (16.22%), researchers described trainings focused on BSP. Researchers described trainings focused on Class-Wide Function-Related Intervention Teams (CW-FIT) in 8 studies (10.81%). General educators were trained to implement the Good Behavior Game, the Incredible Years Teacher Classroom Management program, and tootling in four studies each (5.41% of studies per intervention). General educators were trained to implement BEST in CLASS and First Step to Success in three studies each (4.05% of studies per intervention). Twelve student-level interventions were the focus of training in only one study (e.g., Caterpillar Game, Daily Report Card) or two studies (e.g., Check-In/Check-Out, social-emotional learning).
Table 2. Number of Studies per Student-Level Intervention.
Note. Some student-level interventions were categorized into multiple tiers.
Teacher training procedures
We coded teacher training procedures for all 74 studies (see supplemental Table S2) and further examined patterns in methodologically rigorous studies. Authors of most studies described the training format (group study n = 18; SCR study n = 40) with distinct differences by design type. Group format training corresponded with group design studies (n = 16), whereas one-to-one trainings corresponded with SCR studies (n = 30). Similarly, initial training length was greater than 4 hr in group design studies (n = 10) but less than 1 hr in SCR studies (n = 18). Authors of 27 studies did not report the initial training length. During training sessions, didactic instruction was provided verbally (n = 27) or in multiple ways (n = 29), or its mode was not reported (n = 28). No authors described providing written instructions. Authors reported a mix of trainer models, including in vivo (n = 5), live (n = 25), and video (n = 7). Authors of more than half of all reviewed studies (group study n = 15; SCR study n = 29) did not report the type of trainer feedback provided; those that did primarily used verbal communication (group study n = 10; SCR study n = 18). Ongoing coaching was provided to participants in 21 group studies and 29 SCR studies. Finally, in every study that measured the social validity of the student-level intervention teachers were trained to implement, authors reported that participants found the intervention acceptable (group study n = 10; SCR study n = 37). It should be noted that 27 studies did not report social validity.
Rigor and Quality Results
Tables 3 and 4 display the results of coding the rigor and effects of group designs and SCR designs with WWC protocols (WWC, 2017).
Table 3. WWC Evaluation of Group Design Studies That Evaluated Teacher Fidelity or Discrete Behaviors.
Note. “*” Hedges’ g measure of effect size; Coding based on protocol described by What Works Clearinghouse (WWC, 2017); NA = not applicable; ES = effect size; DV = dependent variable; BSP = behavior specific praise; OTR = opportunities to respond; PC = precorrection; IF = instructive feedback; CF = corrective feedback; TA = teacher adherence; TC = teacher competence; IS = instructional strategies; BMS = behavior management strategies; TS = teaching strategies; SCERTS = Social, Communication, Emotional Regulation, and Transactional Support; TCI = Teacher Competence Inventory; CPF = contingent positive feedback.
ᵃBaseline equivalence not assessed for studies with low attrition. ᵇStandard deviations not provided; unable to calculate effect size.
Table 4. WWC Evaluation of SCR Design Studies That Evaluated Teacher Fidelity or Discrete Behaviors.
Note. Coding based on protocol described by WWC (2017). All studies included graphs depicting results and systematic manipulation of the independent variable. IOA = interobserver agreement; BSP = behavior-specific praise; CSS = contingency-specifying stimuli; DV = dependent variable; IV = independent variable; OTR = opportunities to respond; PC = pre-corrections; PRIDE = labeled praise, unlabeled praise, reflections, behavior description; R = reservations; WWC = What Works Clearinghouse.
Group design studies
We evaluated the rigor and effects of 11 group design studies with WWC standards and identified seven designs (63.64%) that met standards and two designs (18.18%) that met standards with reservations. We calculated Hedges’ g effect sizes for teacher outcomes (i.e., measures of fidelity or discrete behavior) that served as a dependent variable in studies that met standards with or without reservations and then averaged them within each study. Mean Hedges’ g ranged from 0.10 to 2.38. One study demonstrated a small average effect (Murray et al., 2018). Two studies demonstrated a moderate average effect (Fabiano et al., 2018; Snyder et al., 2011). Three studies demonstrated a large average effect (Conroy et al., 2015, 2018; Morgan et al., 2018). Three studies met WWC standards but could not be included in the effect size calculation as researchers did not report standard deviations.
SCR design studies
We evaluated the rigor and quality of each independent design within SCR studies using WWC standards (n studies = 28; n designs = 49). Outcomes included both measures of teacher fidelity to intervention procedures (n designs = 3) and discrete behaviors. The most commonly reported outcomes were direct measures of BSP or praise (n designs = 21). We identified 18 SCR designs (36.73%) that met standards and 9 designs (18.37%) that met standards with reservations. We visually analyzed data from those 27 designs to determine whether training resulted in increases in teacher fidelity or discrete behaviors. Of those 27 designs, 19 designs (70.37%) demonstrated strong evidence of effectiveness, 4 designs (14.81%) demonstrated moderate evidence of effectiveness, and 4 designs (14.81%) demonstrated no evidence.
Teacher training characteristics of effective studies
Following the WWC evaluation, we conducted a post hoc analysis of training procedures and characteristics in the 18 group and SCR studies that demonstrated moderate to strong evidence of effectiveness to identify commonalities. As shown in Table 5, seven of the 18 studies (38.89%) provided group training. In contrast, 9 studies (50%) provided one-on-one training. Fourteen studies (77.78%) provided didactic instruction verbally or through multiple modalities. Seven studies (38.89%) provided a live model of correct implementation of the student-level intervention or discrete behavior followed by an opportunity for rehearsal. Two studies (11.11%) provided a video model, with one study providing an opportunity for rehearsal immediately after. Immediate feedback was provided in 9 studies (50%). A majority of studies (n = 14; 77.78% of studies) that demonstrated moderate to strong evidence of effectiveness provided ongoing coaching after the initial training.
Table 5. Training Components Across Studies with Moderate to Strong Evidence of Effectiveness.
Note. “*” denotes studies with multiple designs that demonstrated moderate to strong evidence; “X” indicates the training component was present; M = multiple modalities; V = verbal; IV = in-vivo; L = live; VM = video model; T = rehearsed with trainer; S = rehearsed with students.
Discussion
The purpose of this review was to summarize the literature on behavior management training for general educators and to identify methodologically rigorous examples of studies that measured changes in discrete teacher behaviors and/or fidelity within the context of the experimental design. During the initial stage of the review, we identified what behavior management strategies general educators have been trained to implement and how they were trained to implement them. Seventy-four studies met all inclusion criteria. General educators were most frequently trained to implement BSP (n = 12) or CW-FIT (n = 8). In a number of studies (n = 21), researchers developed an MCIP that integrated numerous strategies into a singular intervention package (e.g., group contingencies, BSP, and pre-corrections). Initial training was most frequently provided in a one-on-one format, and teacher-level outcomes were most commonly evaluated within the context of an SCR design. Ongoing coaching was provided in a majority of studies across both group and SCR designs. The most significant finding was the underreporting of initial training methods. This lack of information is a barrier to identifying under what conditions positive change in student behavior is most likely to occur. We discuss this finding in forthcoming sections.
During the second stage of the review, we evaluated the methodological rigor and evidence of effectiveness for studies that measured teacher fidelity or discrete behaviors as a dependent variable in the context of an experimental design. We evaluated 39 studies using WWC guidelines: 11 group design studies and 28 SCR design studies. Seven group design studies met WWC standards and two met standards with reservations due to overall and differential attrition. We evaluated the 28 SCR studies at the design level (n designs = 49). Twenty-seven SCR designs met WWC standards with or without reservations. Most studies that met design standards with or without reservations had moderate to strong evidence of effectiveness, which is a relative strength of the findings from this review. SCR designs that did not meet WWC standards commonly had fewer than three data points per condition and/or fewer than three attempts to demonstrate an effect at different points in time (n = 16). Group designs that did not meet design standards (n = 2) lacked random assignment of participants.
Implications
Findings from the current review provide important implications for general educator training in behavior management. We thematically discuss the most important implications.
Scaling the intensity of strategies targeted for training
Most studies in this review targeted low-intensity, Tier 1, and Tier 2 practices or discrete teacher behaviors (e.g., BSP). We hypothesize this is, in part, due to environmental considerations for general education settings. Researchers may have chosen strategies aligned with the management demands of classrooms with large student-to-adult ratios. There is strong research support for both BSP (e.g., Gage et al., 2017; Royer et al., 2019) and CW-FIT (e.g., Kamps et al., 2015), the two most frequent student-level interventions found in this review. Many educators would likely benefit from integrating these low-intensity strategies into their current classroom management plan. However, our findings reveal a disconnect between teacher-reported needs (i.e., training on how to address more problematic behaviors; Butler & Monda-Amaya, 2016) and the targeted practices across most studies. This is consistent with findings from previous research indicating teachers lacked knowledge and effective use of intensive behavioral support strategies for students who engage in the most problematic behavior (e.g., Moore et al., 2017). In tandem with training on Tier 1 and Tier 2 practices, training general educators to implement individualized behavioral interventions or MCIPs is necessary to further equip them to meet the management demands of a classroom.
Nonetheless, we did identify a number of studies that included individualized interventions (i.e., MCIP). The 21 studies in this category included idiosyncratic behavior support plans or multiple Tier 1 management techniques (e.g., pre-corrections, proximity). Two studies measured practitioner implementation of individualized and function-based interventions within SCR designs (DiGennaro et al., 2005; Riley et al., 2011), and all three designs across these two studies demonstrated strong evidence that general educators can be trained to implement more intensive behavioral supports in general education settings. There remains a great need for more research on general educator implementation of function-based interventions as a dependent variable alongside concomitant student outcomes.
We strongly suggest that the frequency of studies that targeted a specific practice, as reported in this review, should not be the sole factor for choosing a class-wide or individualized intervention. The purpose of this review was to focus on what behavior management strategies general educators have been trained to implement, how they were trained to implement them, and evidence of effectiveness of methodologically rigorous studies. This review did not include information on student-level outcomes, which limits the recommendations that can be made regarding different student-level interventions. The results of this review should be interpreted in combination with applied studies and existing reviews on the included strategies.
Identifying effective and feasible professional development models
Training practices across studies included in the review varied. Nonetheless, most studies with strong evidence for the effect of training on teacher behavior integrated numerous high-quality training practices. This finding is already well-demonstrated in previous reviews (e.g., R. P. Ennis et al., 2020). For example, 14 of the 18 studies (77.78%) with moderate to strong evidence of effectiveness included some form of ongoing coaching after initial training. Interestingly, ongoing coaching was also included in a majority of studies (94.73%) that proceeded to WWC evaluation but either (a) did not meet standards or (b) did not demonstrate moderate to strong evidence of effectiveness. This indicates that similar training practices produced differential results. Questions regarding whether extensive professional development for behavior management is worth the investment are still outstanding, especially for studies that, at minimum, met WWC standards with reservations but were unable to demonstrate sufficient evidence. It is important to note that we did not analyze the effects of most studies that implemented high-quality training practices because they did not proceed to the WWC evaluation or did not meet standards. Authors of those studies may have successfully increased teacher fidelity or discrete behaviors.
An additional consideration is the feasibility of training provided by school-based or district personnel, especially because ongoing coaching (a costly and time-intensive support) was a key training feature identified in this review. To build capacity for efficacious interventions at the school-site level, schools need feasible structures for general educator training. Various models exist, such as a tiered approach to coaching based on practitioner need (Fallon et al., 2019), technology-based coaching to allow for remote support (Collier-Meek et al., 2017), or a train-the-trainer model. Through direct observation, persons providing training and ongoing coaching can match the intensity of support to teacher need (i.e., multi-tiered system for professional development; Gage et al., 2017).
Notably, many studies did not report how teacher participants were trained. This is a concerning finding. Most studies that did not report procedures for how teacher participants were trained did not measure teacher use of a discrete behavior or fidelity within the context of the experimental design. In those studies, teachers were primary implementers, but the main dependent variable of interest was student-level outcomes. Regardless, procedures for training teacher implementers should be provided with replicable precision. Reporting training procedures can help identify efficacious professional development structures that promote teacher-level and, collaterally, student-level outcomes.
Increasing methodological rigor
Measuring practitioner implementation alongside concomitant student outcomes can serve two purposes: (a) identifying optimal conditions under which teacher behavior can be effectively changed and (b) identifying strategies that can promote positive student outcomes. Research that continues to measure changes in both teacher and student behavior can help the field understand mechanisms needed to build capacity for behavioral supports that are implemented in inclusive classrooms. Across 74 studies included in the review, only 39 measured practitioner implementation in the context of the design. This is a compelling finding and one that, if integrated into future research, can improve educational practices for both teachers and students. However, of the 60 SCR and group designs that measured practitioner implementation as a dependent variable, only 36 designs (60.00%) met standards with or without reservations. Thus, there is a clear need for additional rigorous research on this topic. It is important to note that some WWC standards can be challenging to meet when conducting research in applied settings (e.g., attrition, collecting data across consecutive sessions).
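The pooled proportion of designs meeting standards can be recomputed from the counts reported in the Discussion. The short Python sketch below is illustrative only; it assumes each of the 11 group design studies contributes a single design, and the dictionary keys and variable names are hypothetical.

```python
# Counts as reported in this review. Treating each group design study as
# one design is an assumption made for this illustration.
designs = {
    "SCR": {"evaluated": 49, "met_standards": 27},
    "group": {"evaluated": 11, "met_standards": 9},
}

# Pool both design types before computing the proportion.
total_evaluated = sum(d["evaluated"] for d in designs.values())
total_met = sum(d["met_standards"] for d in designs.values())
pooled_percent = round(100 * total_met / total_evaluated, 2)

print(f"{total_met} of {total_evaluated} designs met standards "
      f"with or without reservations ({pooled_percent}%)")
```

Pooling the counts before dividing, rather than averaging the two design-type percentages, weights each design equally.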
Study Limitations and Future Research
The results from the current review should be interpreted with the following limitations in mind. First, we classified many of the student-level interventions as MCIPs. These studies included a number of strategies not reflected in the total number of studies per intervention type (see Table 2). For example, many MCIPs included BSP as a component, but we did not list it as a study that targeted BSP. Next, the first and second authors had low initial IRA for WWC coding for methodological rigor of SCR designs (IRA = 79.10%). Nonetheless, we resolved all discrepancies and demonstrated a high level of agreement on a second set of studies (96.64%). A low IRA also occurred for effect size calculations for all group design studies (57.89%). After re-coding, all discrepancies were resolved (IRA = 100%).
Results from this review suggest multiple directions for future research. First, it is important that researchers produce methodologically rigorous research that measures changes in teacher and student behavior. Consumers of research can then have increased confidence in the results of high-quality studies. Measuring teacher behavior as a dependent variable can provide information on effective training components that result in efficient and therapeutic changes. If training is provided in contrived settings and situations, researchers should assess to what extent skills generalize across classroom environments—a variable this review did not formally assess.
Feasible coaching support is an emerging area of research. To build capacity for the use and sustainment of behavioral support strategies, there needs to be a consideration of coaches’ time and resources available (Collier-Meek et al., 2017). Recent research has included individualized support for teachers’ use of evidence-based classroom management strategies (e.g., Fallon et al., 2019), but additional research is needed. Future iterations of feasible coaching support should be evaluated using SCR and group design research. Future research might also explore whether the intensity of student-level interventions moderates the level of coaching support required to support general educator implementation.
Finally, the current review provides information on trends in general educator behavior management training. Researchers continue to report that general educators experience difficulty with more problematic behavior. Targeting individualized behavioral intervention strategies through high-quality training can enhance a general educator’s ability to respond to behavior that is non-responsive to Tier 1 or Tier 2 strategies (Moore et al., 2017).
Conclusion
Across the past two decades, researchers have responded to general educators’ need for behavior management support by conducting numerous studies focused on training teachers to intervene on problematic or pro-social behaviors. This research base indicates general educators have been trained on a range of low-intensity, Tier 1, and Tier 2 practices. Based on the results of methodologically rigorous studies, training is generally effective at increasing teacher fidelity to those interventions or teachers’ use of discrete behaviors (e.g., BSP). However, there remains a need for additional research on specific training practices that add the most value to training activities, are feasible for school districts to implement, and can support teachers in implementing more intensive or individualized interventions.
Supplemental Material
sj-docx-1-pbi-10.1177_10983007211020784 – Supplemental material for A Systematic Review of General Educator Behavior Management Training
Supplemental material, sj-docx-1-pbi-10.1177_10983007211020784 for A Systematic Review of General Educator Behavior Management Training by Mark D. Samudre, Lauren M. LeJeune, Kate E. Ascetta and Hannah Dollinger in Journal of Positive Behavior Interventions
Supplemental Material
sj-docx-2-pbi-10.1177_10983007211020784 – Supplemental material for A Systematic Review of General Educator Behavior Management Training
Supplemental material, sj-docx-2-pbi-10.1177_10983007211020784 for A Systematic Review of General Educator Behavior Management Training by Mark D. Samudre, Lauren M. LeJeune, Kate E. Ascetta and Hannah Dollinger in Journal of Positive Behavior Interventions
Footnotes
Action Editor: Dan Maggin
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available on the Journal of Positive Behavior Interventions website with the online version of this article.
References
