Abstract
Class-wide behavioral interventions are a feasible and effective method to support the behavior of all students. In six peer-reviewed studies, Tootling, a class-wide intervention that combines positive peer reporting with an interdependent group contingency, has increased positive peer reports and academically engaged behavior (AEB), and decreased disruptive behavior (DB). However, no prior studies have been conducted with middle school students, and none have employed strategies to promote enduring behavior change. A withdrawal design with a maintenance phase, implemented across two middle school classrooms, found moderate effects (nonoverlap of all pairs [NAP] = 0.74, Tau-U = −0.48) of Tootling on decreasing DB and moderate to large effects (NAP = 0.76, Tau-U = 0.68) on increasing AEB, although threats to internal validity prevented the demonstration of a clear functional relationship across both classrooms. Results from the maintenance phase, in which the group contingency reward was removed, suggest promising strategies to support durable behavioral change. Limitations of the present study, directions for future research, social validity, and implications for practice are discussed.
Teachers consistently rank disruptive student behavior as a primary concern (e.g., Greer-Chase, Rhodes, & Kellam, 2002; Hoglund, Klingle, & Hosan, 2015). In recent surveys, more than 38% of teachers reported that disruptive behavior (DB) interfered with their teaching and more than 75% of secondary school teachers believed their teaching would be more effective if these behaviors were reduced (National Center for Education Statistics, 2012). Disruptive classroom behavior has been repeatedly related to both immediate and distal negative outcomes for students exhibiting these behaviors including poor academic achievement, delinquency, more significant mental health needs, and adult criminal behavior (Bradley, Doolittle, & Bartolotta, 2008; Greer-Chase et al., 2002; Suldo, Gormley, DuPaul, & Anderson-Butcher, 2014). Exposure to these behaviors has been found to be detrimental to the other students in the classroom, both in terms of lost instructional time and increasing risk of future social–emotional and behavioral issues (Kellam, Ling, Merisca, Brown, & Ialongo, 1998; Sterling-Turner, Robinson, & Wilczynski, 2001). Most critically, the longer DB persists, the greater the risk for students and the more intractable, resistant, and expensive it is to treat (Bradley et al., 2008).
In response to significant school behavioral health need, there have been calls for focusing on prevention, integrating with the public health model, and targeting systems most likely to have the broadest impact (Suldo et al., 2014). A multitiered system includes three tiers of supports: (a) primary prevention (Tier 1), including school- and class-wide systems and supports directed toward all students, such as effective classroom management and academic support; (b) targeted interventions (Tier 2), such as small-group administered social skills, or self-management interventions for those who require increased adult attention and monitoring; and (c) intensive individual supports (Tier 3), administered to the small group of students (~5%) who are unresponsive to Tier 1 and 2 supports (Fuchs, Fuchs, & Compton, 2012).
Although multiple components must be in place for a multitiered model to be successful, Tier 1 supports are considered the foundation. Without effective universal interventions in place, there will be more students in need of targeted and intensive supports than can feasibly be served. Tier 1 supports involve the use of class-wide behavioral interventions, evidence-based teaching strategies used with all students in a classroom to promote social and behavioral skills and decrease DB (Farmer et al., 2006). Class-wide interventions are feasible to implement, employ the efficiency of prevention efforts, and are effective at improving behavior for all students in the classroom (Chaffee, Briesch, Johnson, & Volpe, 2017). Unfortunately, research has found that the interventions typically used in schools to address DB are largely based on personal experience as opposed to empirically validated practices (Bramlett, Murphy, Johnson, Wallingsford, & Hall, 2002). As such, the extent to which the strategies used will actually be effective is unclear.
Although research has been conducted on class-wide interventions over the past 50 years, the adoption of the tiered school behavioral health model has increased the focus on effective class-wide interventions. Results of a recent meta-analysis of class-wide interventions conducted in the general education classroom indicated that token economies, the Good Behavior Game (GBG; Barrish, Saunders, & Wolf, 1969), and interdependent group contingencies were equally effective at improving problematic student behavior (Chaffee et al., 2017). A majority of the studies identified in the meta-analysis involved group contingencies, in which the consequences delivered (e.g., rewards, punishment) were based on each individual’s behavior (i.e., independent), the collective group performance (i.e., interdependent), or the behavior of a specific student in the group (i.e., dependent; Litow & Pumroy, 1975). Notably, nearly half of the identified interventions were punitive in nature, either focusing on problem behavior rather than the appropriate replacement behavior or involving response cost techniques in which points or rewards were removed rather than earned. This is a concern because punitive interventions may inadvertently escalate challenging behaviors (Martin & Pear, 2016), whereas interventions focused on teaching and reinforcing appropriate replacement behaviors produce more effective and enduring behavioral change (LeGray, Dufrene, Sterling-Turner, Olmi, & Bellone, 2010).
Tootling
One emerging intervention that aims to provide class-wide positive behavior supports is Tootling, a class-wide intervention involving student reports of peer prosocial and appropriate behaviors coupled with an interdependent group contingency reward (Skinner, Skinner, & Cashwell, 1998). The teacher distributes index cards to students and challenges them to record “tootles,” or positive behaviors exhibited by peers (e.g., seeing another student help pick up dropped papers, raising hand to participate instead of calling out). At the end of the period, the teacher collects and counts the tootles, reads five of them aloud, and praises the positive behaviors recorded on the index cards. Students in the class all earn a reward when the class meets its tootle goal (i.e., interdependent group contingency), and cumulative progress toward a predetermined criterion is publicly posted. Tootling encourages appropriate behavior by leveraging the teacher's praise and positive reinforcement, as well as positive peer social pressure to earn the group reward.
Over the past 20 years, there have been six published studies examining the effectiveness of Tootling. These studies have shown Tootling produces increased peer reports of prosocial behavior (Cashwell, Skinner, & Smith, 2001; Skinner, Neddenriep, Robinson, Ervin, & Jones, 2002), decreased class-wide DB (Cihak, Kirk, & Boon, 2009; Lambert, Tingstrom, Sterling, Dufrene, & Lynne, 2015; Lum, Tingstrom, Dufrene, Radley, & Lynne, 2017; McHugh, Tingstrom, Radley, Barry, & Walker, 2016), and increased class-wide engaged behavior (Lambert et al., 2015; Lum et al., 2017; McHugh et al., 2016). However, limitations of this work should be noted. To date, Tootling has primarily been implemented in elementary classrooms (Cashwell et al., 2001; Cihak et al., 2009; Lambert et al., 2015; McHugh et al., 2016; Skinner et al., 2002), with only one study conducted with students at the secondary level (Lum et al., 2017). As children enter adolescence, peer social relationships become increasingly complex and important. Furthermore, the type of behavior demonstrated and accepted by peers and classmates has been shown to affect individual academic achievement and school engagement (Carter et al., 2014; Lynch, Lerner, & Leventhal, 2013). Although Tootling has the potential to capitalize on the power and importance of peer relationships at the middle school level, it is unknown whether the Tootling intervention will be rejected by students as juvenile (Wigfield, Lutz, & Wagner, 2005). Furthermore, all prior studies of Tootling have occurred in the same geographic area (i.e., rural, southeastern United States), limiting the generalizability of prior results.
Finally, an important goal of a successful behavioral intervention is to achieve enduring behavioral change. It has been suggested that increased rates of appropriate classroom behavior will be naturally reinforcing as they allow for increased positive peer and adult interactions, ultimately facilitating generalization of appropriate behavior (McConnell, 1987). However, although several prior Tootling studies have recommended further research into the maintenance of the intervention’s positive behavioral effects (Cihak et al., 2009; Lum et al., 2017), no study to date has employed a strategy to support maintenance of those effects by shifting from programmed reinforcers (e.g., group contingency reward) to natural reinforcers (e.g., improved peer relationships, teacher praise).
Given these limitations of the existing literature base, the purpose of the current study was to answer the following research questions:
Method
Permission to conduct the study was obtained from school district administrators and school principals. In addition, all procedures were approved by the Institutional Review Board (IRB). Informed consent was obtained from each participating teacher.
Participants and Setting
Two general education classrooms from a middle school in a Northeastern metro area participated in this study. The school had approximately 614 students enrolled with 9.6% receiving free or reduced lunch, 6.5% English Language Learners, and 19.1% students receiving special education support. The middle school used a rotating schedule and had six 42- to 49-min blocks within one school day. The two participating teachers had contacted the school psychologist to request support to address disruptive student behavior in their classrooms.
Classroom A was a sixth-grade, general education English/language arts classroom consisting of 17 students (59% male; 59% White, 35% Asian American, 6% Black), one of whom was on a 504 plan for attention-deficit/hyperactivity disorder (ADHD). The teacher was a 31-year-old White female with a master’s degree in her sixth year of teaching. Classroom B was a sixth-grade general education inclusion social studies classroom of 24 students (54% male; 67% White, 21% Asian American, 12% Black), four of whom received special education services. Three students received special education services under the category of other health impairment for ADHD and one under the category of traumatic brain injury. An additional two students had a 504 plan for ADHD. In Classroom B, the teacher was a 54-year-old White male with a master’s degree in his 30th year of teaching, who was assisted by a special education paraprofessional on 3 out of every 6 days. The classes were taught during the same rotating block, and observation order alternated when they occurred within the period (e.g., first 20 min, last 20 min).
Dependent Variables
Class-wide behavior
The two primary dependent variables in this study were class-wide DB and AEB. DB was selected as a primary target for intervention because of its association with both immediate and long-term impacts on behavioral and mental health, as well as educational achievement for all students in the classroom (e.g., Greer-Chase et al., 2002; Kellam et al., 1998). As an important academic enabler contributing to academic and classroom success (DiPerna, Volpe, & Elliott, 2002), AEB was selected as a secondary target for intervention.
Per prior Tootling studies (Lambert et al., 2015; Lum et al., 2017), DB was operationally defined as a student demonstrating (a) out of seat behavior without permission, (b) audible vocalizations that were not permitted (e.g., talking, singing), or (c) motor activity not associated with the assigned task (e.g., physically touching another student, manipulating objects). AEB included both passive and active academic engagement. Active engagement was operationally defined as when the student was actively involved with academic tasks (e.g., reading aloud, writing) and/or speaking with a teacher or peer about the assigned material. Passive engagement was defined as attending to (e.g., looking at, listening to) the assigned work.
Data were collected in each classroom by the primary investigator and two trained observers. Observers were graduate students in school psychology who completed training prior to the implementation of the study. During the training session, the primary investigator trained observers on the operational definitions for DB and AEB. Once the operational definitions were mastered, as demonstrated by above 90% accuracy on a quiz, observers conducted simultaneous observations of previously coded videos until a .80 interobserver agreement (IOA) criterion was achieved. IOA was calculated by dividing the total number of agreements by the total number of intervals, and then multiplying this value by 100.
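The IOA formula described above can be illustrated with a minimal Python sketch. The function name and the interval codes are hypothetical and are not study data; the sketch assumes each observer's record is a list with one code per interval.

```python
def interval_ioa(observer_a, observer_b):
    """Percent agreement: agreements / total intervals * 100."""
    if len(observer_a) != len(observer_b):
        raise ValueError("Both observers must code the same number of intervals")
    if not observer_a:
        raise ValueError("At least one interval is required")
    # An agreement is an interval in which both observers recorded the same code.
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100 * agreements / len(observer_a)


# Hypothetical codes for 8 intervals (True = behavior recorded in that interval).
primary = [True, False, False, True, True, False, True, False]
secondary = [True, False, True, True, True, False, True, False]

ioa = interval_ioa(primary, secondary)
meets_criterion = ioa >= 80  # the 80% criterion stated in the text
print(f"IOA = {ioa:.2f}%, meets criterion: {meets_criterion}")
```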
Using the Behavioral Observation of Students in Schools (BOSS; Shapiro, 2013) mobile application, the dependent variables were measured during 20-min observations using a 15-s momentary time sampling (i.e., AEB) and partial interval (i.e., DB) recording procedure. At the start of each new interval, the BOSS mobile app cued the observer with a vibration and a student was observed. When cued for each subsequent interval, the observer rotated to a new student in a predetermined fixed pattern based on the student seating chart. This pattern included all students in each class and was continuously repeated for all intervals in the 20-min observation. Based on prior research on observational methods, the method of observing engagement in individual students in a fixed pattern provides valid estimates of class-wide behavior and is feasible (Briesch, Hemphill, Volpe, & Daniels, 2015). The observer also took notes about the time of the observation, number of students in attendance, and class activities. Data collection procedures were consistent across screening, baseline, intervention, and maintenance phases. Observations were scheduled for all of the school days of the study; however, either observations were not conducted or data were not included if the class activity deviated significantly from typical instruction (e.g., state-wide standardized testing, exam).
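As a rough illustration of the fixed-pattern rotation and the percentage-of-intervals summary described above, the following Python sketch may be useful. It does not model the BOSS application itself; the student labels, function names, and the example count are hypothetical.

```python
from itertools import cycle, islice

OBSERVATION_MINUTES = 20
INTERVAL_SECONDS = 15


def rotation_schedule(seating_order):
    """Assign one student to each interval, repeating the seating order."""
    n_intervals = OBSERVATION_MINUTES * 60 // INTERVAL_SECONDS  # 80 intervals
    return list(islice(cycle(seating_order), n_intervals))


def percent_of_intervals(codes):
    """Percentage of intervals in which the behavior was recorded."""
    return 100 * sum(codes) / len(codes)


# Hypothetical labels for a 17-student class (the size of Classroom A).
students = [f"S{i:02d}" for i in range(1, 18)]
schedule = rotation_schedule(students)

# Hypothetical record: DB coded in 32 of the 80 intervals.
db_codes = [True] * 32 + [False] * 48
db_percent = percent_of_intervals(db_codes)
print(len(schedule), schedule[:3], db_percent)
```

With 80 intervals and 17 students, the rotation cannot divide evenly, so some students are observed five times and others four times within a single observation.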
Social validity
To more fully understand the acceptability of and satisfaction with the intervention procedures, both teacher and student social validity were assessed using the Usage Rating Profile–Intervention, Revised (URP-IR; Chafouleas, Briesch, Neugebauer, & Riley-Tillman, 2011) and the Children’s Usage Rating Profile (CURP; Briesch & Chafouleas, 2009a) following the completion of the study.
Teacher-reported usability
The URP-IR is a 29-item scale that assesses multiple factors related to sustained intervention usage including acceptability, teacher understanding, home–school collaboration, feasibility, system climate, and system support (Chafouleas et al., 2011). Items were rated on a 6-point Likert-type scale, with higher ratings generally indicating higher perceived usability. The URP-IR has demonstrated solid internal consistency, with the reported Cronbach’s alpha for the subscales ranging from .67 to .95 (Briesch, Chafouleas, Neugebauer, & Riley-Tillman, 2013). Modifications included changing the tense of some words and the addition of Tootling-specific language.
Student-reported usability
The CURP is a 21-item questionnaire on which students were asked to rate the personal desirability, feasibility, and understanding of the intervention using a 4-point Likert-type scale. Briesch and Chafouleas (2009b) reported Cronbach’s alpha for the three subscales of the CURP to range from .75 to .92. Similar to the modifications to the URP-IR, the tense of some words was changed and the language was made more specific to the Tootling intervention on the CURP. Adjustments were also made to ensure that the name of the intervention matched the name chosen by the students in the class. All other aspects of the CURP, including one reverse-scored item, remained intact. The ratings for the six to eight items from each scale were summed and then averaged across all the students in the class.
IOA
IOA for class-wide DB and AEB was assessed between the primary research assistant and another trained observer for a minimum of 33% of observations (Classroom A: 46% of all observations, Classroom B: 38% of all observations) for each phase of the study in each classroom. IOA data were calculated as described earlier. Observers were required to obtain at least 80% IOA with the primary observer. If agreement fell below that criterion, observers were retrained on the procedures and operational definitions before conducting further observations. This retraining occurred only once for Classroom A (Day 3 of the second intervention phase). IOA for Classrooms A and B averaged 91.97% (range = 78.75%–100%) and 92.50% (range = 85.00%–98.75%), respectively, across both DB and AEB.
Study Design and Procedures
A single-subject, A–B–A–B–C reversal design with a maintenance phase was implemented in two middle school classrooms. Reversal designs are the most powerful within-subject design for demonstrating functional relations between the independent and dependent variables through prediction, verification, and replication (Gast & Baekey, 2014). Per What Works Clearinghouse (WWC) single-case design (SCD) study design standards, at least five observations were collected within each phase (Kratochwill et al., 2010). Phase change decisions were made based on visual analysis of DB data, given that this was the primary dependent variable.
Prebaseline
Prior to baseline, teachers were asked to use typical classroom management procedures. Similar to procedures used in prior Tootling studies (Lambert et al., 2015; Lum et al., 2017; McHugh et al., 2016), researchers conducted a 20-min screening observation to ensure that the classroom met the inclusion criteria for students exhibiting class-wide DB in at least 30% of observed intervals (Classroom A = 40.00%, Classroom B = 31.25%). This screening observation was intended to prevent floor effects and ensure that the baseline level of DB was in need of change (Kratochwill et al., 2010).
Baseline
Both classes began in the baseline phase, in which data were collected on the dependent variables while typical instructional and classroom management practices were in place. Teacher A reported using flexible seating and logical consequences, and was observed to use verbal prompting as classroom management procedures. Teacher B described his classroom management procedures as “non-existent,” and was observed to use verbal prompts and loss of privileges (e.g., lunch, recess). Classes remained in the baseline condition for at least 5 days or until a predictable pattern of behavior was established.
Teacher training
Following establishment of a stable pattern of behavior in the baseline phase, the researcher conducted a training session with each teacher. Teachers were introduced to the intervention, observed the primary researcher modeling the student training, participated in role-plays to introduce the intervention to students, and were provided implementation scripts for the student training and daily implementation of Tootling. During the meeting, the teachers were also assisted in developing a reward menu. Trained graduate students conducted observations of both the teacher trainings and subsequent student trainings to ensure that participating teachers and students were properly trained in the intervention.
Intervention
Based on the prior implementation of Tootling at the high school level (i.e., Lum et al., 2017) and other group contingency interventions implemented at the secondary level (e.g., Kleinman & Saigh, 2011), adaptations made to the intervention included (a) describing the intervention as a competition, (b) having students vote on a new name for the intervention, and (c) calling the positive peer reports “positive comments” rather than tootles. On the first school day of the intervention, the teachers introduced the Tootling intervention to the class and conducted a student training session, which included student practice writing positive peer comments and teacher feedback regarding the acceptability of the comments (i.e., name an individual student, specific, anonymous, focused on observed behavior). Each teacher solicited student ideas for names of the intervention and then conducted an anonymous vote. Classroom A chose Mores Magni Challenge (i.e., Good Behavior Challenge in Latin), and Classroom B chose Complementation. In addition to the teacher-identified reward list generated during the teacher training, the students in each class provided additional reward suggestions. In a class-wide vote, Classroom A selected a 15-min recess and Classroom B selected students choosing their own seats for a day.
The school day following the student training, each classroom teacher implemented Tootling procedures as described above. Teachers dispensed 3 × 5 inch note cards to the students and encouraged them to document their peers’ appropriate behavior. Prior to the end of each class, the teacher collected the written positive comments. Five minutes prior to class dismissal, the teacher randomly selected five tootles, read them aloud to the class, and provided praise to the students for the appropriate behavior listed on the tootle. The teacher also announced how many tootles were reported and recorded the number of tootles on a provided thermometer visual display that documented the class’ progress toward the cumulative goal. Tootles that were incorrect or inappropriate (e.g., identifying inappropriate behavior, a joke, identifying a positive attribute rather than describing a prosocial peer behavior) were ignored and not counted. Multiple tootles reporting the same prosocial behavior were counted individually.
The criterion for earning the interdependent class reward was determined by the teacher based on the value of the selected reward, and was initially set at 50 by each teacher. The teacher announced this criterion to the students in the class. When the class met the goal linked to the chosen reward (e.g., 50 tootles for a 15-min recess), the entire class received the predetermined reward. If the class did not meet the goal, the total number of tootles was applied to the next day’s total. Following the second day of the intervention, however, anecdotal reports suggest that students in Classroom B began openly discussing how to “outsmart” the intervention and earn the reward immediately by writing three positive comments each, often of duplicate behaviors. Teacher B adjusted the procedures to distribute only two index cards to each student and increased the criterion for reward to 160 positive comments.
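The cumulative criterion described above can be sketched in a few lines of Python. The daily counts and function name are hypothetical, and the sketch only finds the first day the running total meets the criterion; how the total was handled after a reward was earned is not specified here. The Classroom B arithmetic assumes all 24 students attend and every comment is counted.

```python
import math


def first_reward_day(daily_counts, criterion):
    """Return the 1-based day on which the running total first meets the criterion."""
    total = 0
    for day, count in enumerate(daily_counts, start=1):
        total += count  # unmet totals carry over to the next day
        if total >= criterion:
            return day
    return None  # criterion not reached within the recorded days


# Hypothetical daily counts of accepted positive comments.
counts = [18, 22, 15, 30]
day_at_50 = first_reward_day(counts, 50)
day_at_160 = first_reward_day(counts, 160)

# Classroom B: upper bounds on comments per day under each card limit.
students = 24
max_per_day_three = students * 3  # three comments each
max_per_day_two = students * 2    # after the limit of two index cards
min_days_at_160 = math.ceil(160 / max_per_day_two)
print(day_at_50, day_at_160, max_per_day_three, min_days_at_160)
```

Under these assumptions, three comments per student could exceed the initial criterion of 50 in a single day, whereas the revised limit and criterion require several days of sustained reporting.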
Withdrawal
Following clear treatment effects on DB in each classroom during the first intervention phase, the positive peer reporting procedures and interdependent group contingency were withdrawn. The teachers notified students that the class would no longer be playing the game and all intervention materials were removed. As in the baseline condition, observers conducted daily observations of class-wide DB and AEB. One environmental change could not be controlled: on Day 3 of the withdrawal phase, the regularly assigned special education paraprofessional returned from medical leave. This staff member was present in Classroom B 3 out of every 6 days for the remainder of the study. Thus, the design for Classroom B was reclassified as A–B–A–A’–B’–C to reflect the unexpected staffing change.
Reimplementation
After the withdrawal phase, the positive peer reporting procedures and group contingency were reimplemented as they were in the initial intervention phase. Both teachers chose to reinstate the classroom’s progress toward their previous goal.
Maintenance
Once a treatment effect was documented in the second intervention phase, teachers were instructed to move into the maintenance phase. To promote the sustained feasibility of the intervention over time, the procedures of peers recording the tootles and teachers reading them aloud remained constant, but the interdependent group contingency was removed. This condition lasted 5 days. Teacher and student social validity assessments (i.e., URP-IR, CURP) were administered following the last day of the maintenance phase.
Treatment integrity
Treatment integrity was assessed throughout the intervention, withdrawal, and maintenance phases to ensure and verify that the teachers and students were trained appropriately and the intervention was implemented as planned. Treatment integrity measures included checklists to monitor the teacher training conducted by the primary investigator and student introduction and intervention training by the teacher. Integrity data reflected 100% fidelity to procedures across the two teachers during both the teacher Tootling training sessions and the student Tootling training sessions.
The primary observer collected treatment integrity data for all observations during intervention, withdrawal, and maintenance phases. Because the two classes occurred during the same class period, observations alternated between the first and last 20 min of the period. Treatment fidelity IOA was obtained by a secondary observer for at least 33% (Classroom A: 48%, Classroom B: 38%) of occasions. Observers measured intervention treatment integrity in both classrooms using a 10-item checklist. Certain aspects of the intervention occurred only at the beginning (e.g., reminding students of positive comments procedures) or end (e.g., tallying the total number of positive comments) of the period; therefore, those specific items could not be directly observed on each day. Treatment integrity was based on the percentage of steps able to be observed each day. Treatment integrity, as rated by the observers, averaged 96.15% (range = 80.00%–100%) for Classroom A and 93.33% (range = 70.00%–100%) for Classroom B. Although levels of treatment integrity were high overall, it is of note that at times both teachers failed to implement the same items, which included reminding students to be looking for positive peer behaviors, reviewing the procedures, and reminding students of their progress toward the goal. The teachers may have felt that these components of the procedure were redundant on a daily basis. Treatment integrity data during withdrawal and maintenance phases averaged 100% in both classrooms. IOA for treatment integrity was 100% across all observations in both classrooms.
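Scoring integrity as a percentage of observable steps can be sketched as follows. This is a hypothetical illustration: the checklist contents are invented, and `None` is an assumed convention for a step that could not be observed on a given day.

```python
def integrity_percent(checklist):
    """Percent of observable checklist steps that were implemented.

    Each entry is True (implemented), False (omitted), or None (not observable
    during that day's 20-min window, so excluded from the denominator).
    """
    observed = [step for step in checklist if step is not None]
    if not observed:
        raise ValueError("No observable steps were recorded")
    return 100 * sum(observed) / len(observed)


# Hypothetical 10-item checklist: two end-of-period steps could not be observed,
# and one of the eight observable steps was omitted.
day_checklist = [True, True, False, True, True, True, True, True, None, None]
day_integrity = integrity_percent(day_checklist)
needs_feedback = day_integrity < 80  # performance feedback threshold in the text
print(f"{day_integrity:.2f}%", needs_feedback)
```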
In addition, teachers were provided with self-monitoring checklists of daily intervention and maintenance procedures; however, neither teacher regularly completed these checklists. Instead, both teachers reported primarily using the checklists as a prompting tool.
Performance feedback was given to teachers by the primary investigator if treatment integrity data fell below 80%, or if the teacher sought assistance. The only instance of performance feedback occurred for Teacher B on the second day of the first intervention phase, when treatment integrity was 70%. In this case, the teacher sought assistance from the researcher, given that multiple students were writing positive comments for the same student behavior in an intentional effort to increase the class progress toward the goal.
Data Analysis
Class-wide behavioral data were graphed as a percentage of intervals in which DB and AEB were recorded out of total intervals during 20-min observations. Although visual analysis may be used to examine several different features of the data (e.g., changes in level, trend, and variability, as well as the immediacy of effect), data gathered for both classrooms during baseline and intervention were inspected visually for changes in level and trend to determine treatment effects (Kazdin, 1982). Trend and level were emphasized during visual analysis as opposed to immediacy of effect because Lum et al. (2017) showed a gradual overall behavioral response when implementing Tootling with older students. Although there is natural variability in student behavior due to classroom factors (e.g., academic demand, instructional modality), both a gradual directional change and a change in average level are expected in response to the introduction and withdrawal of the intervention.
In addition to visual analysis, nonoverlap metrics were calculated to allow for effect comparison with prior Tootling studies and other class-wide interventions. Given the lack of consensus regarding quantitative analysis for single-case design studies, the WWC recommends the use of multiple indices (Kratochwill et al., 2010). Thus, two nonoverlap metrics, nonoverlap of all pairs (NAP; Parker & Vannest, 2009) and Tau-U (Parker, Vannest, Davis, & Sauber, 2011), were used to further evaluate the treatment effects of the Tootling intervention. NAP and Tau-U have been used to describe the size of effects seen in prior Tootling studies (e.g., Lambert et al., 2015; Lum et al., 2017; McHugh et al., 2016), allowing for direct comparison of the effects of Tootling on student behavior. All NAP and Tau-U calculations were performed for each adjacent phase contrast separately (i.e., each A–B of the withdrawal design) and then combined into an overall weighted value for each classroom using a web-based calculator (Vannest, Parker, Gonen, & Adiguzel, 2016). Data from the maintenance phase were not included in the overall weighted value. Instead, maintenance data were compared with the initial baseline phase as an index of the durability of behavioral change (Beeson & Robey, 2006).
NAP
The NAP nonoverlap metric builds upon the visual analysis of single-case design graphs and, like other nonoverlap metrics, summarizes data overlap between Phases A and B. Related to area under the curve (AUC) analyses, NAP compares each Phase A data point with each Phase B data point, and determines the probability that a random treatment data point will be greater than a random baseline data point. Although NAP is affected by baseline trend, it has no a priori assumptions, includes all data points in calculations, and has shown good discriminability as compared with other nonoverlap metrics (Parker & Vannest, 2009). NAP values between 0.00 and 0.65 are considered weak effects, scores between 0.66 and 0.92 are moderate effects, and scores from 0.93 to 1.00 are considered strong effects (Parker & Vannest, 2009).
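The pairwise comparison underlying NAP can be sketched in Python as below. The data are hypothetical and are not taken from this study, ties are counted as half an improvement, and the label cutoffs simply treat the ranges given above as contiguous, which is an assumption about values falling between the stated bounds.

```python
def nap(baseline, treatment, increase_is_improvement=True):
    """Nonoverlap of all pairs: share of A-B pairs showing improvement."""
    score = 0.0
    for a in baseline:
        for b in treatment:
            if a == b:
                score += 0.5  # a tie counts as half an improvement
            elif (b > a) == increase_is_improvement:
                score += 1.0
    return score / (len(baseline) * len(treatment))


def nap_label(value):
    """Interpretive label using the ranges cited in the text."""
    if value < 0.66:
        return "weak"
    if value < 0.93:
        return "moderate"
    return "strong"


# Hypothetical percentages of intervals with DB (lower values are better).
baseline_db = [30, 28, 25, 27, 22]
treatment_db = [24, 20, 26, 15, 22]

db_nap = nap(baseline_db, treatment_db, increase_is_improvement=False)
print(round(db_nap, 2), nap_label(db_nap))
```

For a behavior targeted for reduction, such as DB, improvement is defined as a lower treatment value, which is why the direction flag is set to `False` in the example.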
Tau-U
Tau-U is a conservative nonoverlap metric that allows for baseline and/or intervention phase trend control. As it includes all data points in calculations, it is resistant to outliers and a small number of data points. Tau-U is derived from Kendall’s rank correlation and the Mann–Whitney U test between groups. In contrast to methods relying on parametric assumptions or linear trends, Tau-U is more reliable at identifying trend with limited data points (Parker et al., 2011) and has been used recently within the single-case literature (e.g., Bowman-Perrott, Burke, Zaini, Zhang, & Vannest, 2016; Chaffee et al., 2017) and in prior Tootling intervention studies (Lum et al., 2017). However, like all single-case quantitative metrics, Tau-U has limitations. Tau-U is affected by the number of data points, values may exceed the conventional bounds of ±1, and it is not sensitive to the magnitude of change when there is no overlap between baseline and intervention (Tarlow, 2017). Despite these limitations, Tau-U was selected for the aforementioned advantages and to allow for direct comparisons of the results of this study with prior Tootling and class-wide intervention studies. Tau-U values of 0.20 to 0.60 have been interpreted as a moderate effect, 0.60 to 0.80 as a large effect, and above 0.80 as a large to very large effect (Vannest & Ninci, 2015).
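A simplified sketch of the nonoverlap component of Tau-U is shown below. It deliberately omits the baseline trend correction and the weighting across phase contrasts, so it should not be read as the calculation performed by the web-based calculator. The data are hypothetical, and the handling of values exactly at 0.60 and of values below 0.20 is an assumption.

```python
def tau_nonoverlap(baseline, treatment):
    """Basic A-versus-B Tau: (increasing pairs - decreasing pairs) / all pairs.

    No baseline trend correction is applied, so this is only the nonoverlap
    part of Tau-U. Negative values indicate that treatment values are lower.
    """
    pairs = [(a, b) for a in baseline for b in treatment]
    increasing = sum(b > a for a, b in pairs)
    decreasing = sum(b < a for a, b in pairs)
    return (increasing - decreasing) / len(pairs)


def tau_label(value):
    """Interpretive label using the ranges cited in the text (by magnitude)."""
    size = abs(value)
    if size > 0.80:
        return "large to very large"
    if size >= 0.60:
        return "large"
    if size >= 0.20:
        return "moderate"
    return "below 0.20 (no label given in the text)"


# Hypothetical percentages of intervals with DB (lower values are better).
baseline_db = [30, 28, 25, 27, 22]
treatment_db = [24, 20, 26, 15, 22]

db_tau = tau_nonoverlap(baseline_db, treatment_db)
print(round(db_tau, 2), tau_label(db_tau))
```

Without trend correction, the magnitude of this statistic equals 2 × NAP − 1 when ties are counted as half in NAP, which offers a quick consistency check between the two metrics.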
Results
The Tootling intervention was evaluated in two middle school classrooms using a single-case reversal design. A functional relationship was demonstrated in Classroom A between the Tootling intervention and increases in AEB, whereas effects on DB were somewhat mixed. However, the results obtained in Classroom B were more difficult to interpret, as some threats to internal validity occurred. Results specific to each classroom are discussed in the following section.
Classroom A
During baseline, the students in Classroom A exhibited DB during an average of 26% of intervals (range = 19%–31%), with no visible trend in the data (see Figure 1, top panel). When Tootling was introduced, the mean level of DB decreased to 17% (range = 4%–24%), and a significant decreasing trend was observed. The intervention was then withdrawn and the level of DB immediately increased from 4% to 15%. Throughout the withdrawal phase, the level of DB remained fairly consistent (M = 15%, range = 13%–21%) and lacked trend. When the intervention was reintroduced, the level of DB decreased slightly (M = 12%, range = 7%–16%), with an overall slight decreasing trend. During the maintenance phase, DB was observed during an average of 15% of intervals (range = 10%–24%) and no trend was observed. Across the intervention phases, the data show a strong or slight negative trend. Overall, Tootling in Classroom A had a moderate effect (Tau-U = −0.48, NAP = 0.74) in decreasing DB when using weighted Tau-U and NAP calculations (see Table 1). As an index of durable change from the intervention, DB in baseline compared with the maintenance phase demonstrated a moderate (NAP = 0.92) to large effect (Tau-U = −0.84).

Figure 1. Effects of Tootling on middle school classrooms.
Table 1. Effect Size Calculations.
Note. NAP = nonoverlap of all pairs; W = weak effect; M = moderate effect; L = large effect. A table footnote marks effects in the wrong direction.
During the baseline phase, class-wide AEB occurred in an average of 74% of intervals (range = 69%–83%), with a negative trend. However, when Tootling was introduced, AEB displayed a strong, positive trend coupled with increases in level (M = 84%, range = 75%–99%). When Tootling was subsequently removed, the class-wide AEB level immediately dropped from 99% to 71% and showed a decreasing trend (M = 80%, range = 70%–86%). When the intervention was reintroduced, the level increased (M = 84%, range = 75%–95%) and an increasing trend was observed. Finally, during the maintenance phase, AEB decreased slightly in level (M = 80%, range = 76%–86%) with no visible trend. Both intervention phases exhibited strong positive trends and levels of AEB that approached the ceiling (100%). Overall, calculations indicated Tootling had a moderate (NAP = 0.76) to large (Tau-U = 0.68) positive effect on AEB. Analyses of maintenance phase data compared with baseline showed a moderate (NAP = 0.80) to large (Tau-U = 0.76) durable effect of Tootling on AEB.
Classroom B
During the baseline phase, students in Classroom B (see Figure 1, bottom panel) demonstrated class-wide DB during an average of 37% of intervals (range = 23%–45%) and a slight increasing trend was observed. When Tootling was introduced, the level of DB decreased substantially to an average of 20% of intervals (range = 20%–34%) with no visible trend observed. Upon withdrawal of the intervention, DB initially increased but then decreased on the third day of implementation (at which time a staffing change occurred; see dotted line in Figure 1), leading to a steep negative trend (M = 20%, range = 13%–31%). When the intervention was reintroduced, the level of DB initially increased from 12% to 25%; however, the overall level was similar to that during the withdrawal phase (M = 20%, range = 14%–28%) with a slight decreasing trend. Finally, during the maintenance phase, DB decreased in level (M = 14%, range = 5%–24%) with a decreasing trend. Although nonoverlap metrics for class-wide DB in Classroom B indicated an overall weak effect (NAP = 0.63, Tau-U = −0.21), results suggested a large durable effect when comparing the baseline with maintenance phase (NAP = 0.96, Tau-U = −1.04; see Table 1).
Class-wide AEB averaged 65% of intervals (range = 56%–75%) with no visible trend observed during baseline. When the intervention was introduced, the level of AEB increased (M = 75%, range = 68%–80%), with no trend observed. Upon the withdrawal of Tootling, the intervals in which AEB occurred initially decreased from 78% to 65%; however, on the third day of this phase, the level of AEB increased dramatically from 56% to 79%. This dramatic change in behavior coincided with the aforementioned staffing change. Overall, AEB averaged 72% of intervals (range = 56%–79%) during the withdrawal phase with an overall increasing trend. Tootling was then reintroduced, and there was an increase in the level of AEB (M = 82%, range = 74%–89%) with a slight increasing trend. During the maintenance phase, AEB remained at the same level as the prior intervention phase (M = 82%, range = 73%–89%) with a slight increasing trend. Effect size calculations indicated Tootling had a moderate (NAP = 0.92) to large (Tau-U = 0.79) effect on increasing AEB in Classroom B. The durable effects (baseline vs. maintenance) of Tootling on AEB for Classroom B were large (NAP = 0.96, Tau-U = 0.92).
Social Validity
Teacher-reported usability
Results of the URP-IR suggested that both Teacher A and Teacher B understood the intervention components (M = 5.67), believed that the intervention was feasible (Teacher A: M = 4.83; Teacher B: M = 4.33), felt that it required minimal home–school collaboration (Teacher A: M = 4.83; Teacher B: M = 4.33), and felt that the system was supportive of its use (Teacher A: M = 4.20; Teacher B: M = 4.60). However, Teacher A reported higher levels (M = 4.78) than Teacher B (M = 3.78) on the factor of acceptability. Specifically, Teacher B responded "slightly disagree" or "disagree" to the items stating that the intervention was a fair way and a good way to handle the child’s behavior problem. Teacher B’s results also indicated that he believed that the intervention required significant system supports (e.g., consultative support, professional development; M = 4.67); however, Teacher A’s ratings (M = 1.33) indicated that the intervention could be implemented with low levels of system supports.
Student-reported usability
CURP results indicated that whereas both classes agreed that the intervention was feasible (Classroom A: M = 1.58; Classroom B: M = 1.69) and that they understood the intervention (Classroom A: M = 3.35; Classroom B: M = 3.34), students in Classroom A reported somewhat higher levels of personal desirability (Classroom A: M = 3.20; Classroom B: M = 2.80).
Discussion
Although there is an emerging evidence base in support of the use of Tootling to decrease disruptive classroom behavior (e.g., Cihak et al., 2009; Lambert et al., 2015; Lum et al., 2017; McHugh et al., 2016) and increase appropriate behavior or academic engagement (e.g., Lambert et al., 2015; Lum et al., 2017; McHugh et al., 2016), studies to date have implemented these procedures almost exclusively with elementary school students. Use of a positive peer reporting intervention in middle school classrooms has the potential to positively capitalize on this pivotal time of social pressure and self-growth; however, at the same time, it may be rejected as students attempt to assert independence from adults and their own maturity. This study sought to replicate the effects of prior studies of Tootling within two middle school classrooms. Given that previous research had demonstrated that the effects of Tootling did not necessarily maintain in the absence of the intervention (e.g., Cihak et al., 2009; McHugh et al., 2016), we not only utilized an A–B–A–B–C reversal design but also sought to assess the maintenance of effects in the absence of more labor-intensive components.
Conclusions regarding the effect of Tootling on class-wide behavior in middle school students must be made somewhat tentatively. Although visual analyses in Classroom A revealed an increasing trend in AEB as well as a decreasing trend in DB each time that the Tootling intervention was introduced, it is notable that there was not a high degree of separation across phases. This may have been due, in part, to the baseline levels of behavior, which were less problematic than in prior studies. For example, whereas the mean baseline levels of AEB and DB in the current study were 74% and 26% of intervals, respectively, prior Tootling studies have documented mean baseline levels of AEB below 60% (e.g., Lambert et al., 2015) and mean levels of DB as high as 54% (e.g., McHugh et al., 2016). As such, there was less room for improvement than in some prior related work. In addition, neither DB nor AEB fully returned to baseline levels during the withdrawal phase, resulting in a fair degree of overlap across phases. This is similar to the results of the one prior Tootling study conducted at the secondary level (Lum et al., 2017), which demonstrated data trends in the predicted direction with notable overlap across phases. In contrast, results of Tootling studies conducted at the elementary school level (e.g., Lambert et al., 2015; McHugh et al., 2016) exhibited minimal overlap across phases. Although this carryover is not desired with regard to a functional relationship, it may indicate that Tootling helps secondary students learn to display positive behavior even without the external reward in place. Tootling may support entrapment, the shift of newly acquired behaviors to natural reinforcement, by making positive peer behavior more salient to all students through the public positive peer reporting (McConnell, 1987; Skinner et al., 2002). 
The increased salience of positive peer behavior coupled with the interdependency has been shown to improve student behavior for all students, as well as targeted students with behavioral disorders (Algozzine, Daunic, & Smith, 2015). Even with the removal of the external reinforcement, it is likely students maintained their focus on the positive behavior of their peers.
Although results appeared to be modest based on visual analysis, quantitative analyses utilizing both NAP and Tau-U found moderate effects of Tootling on DB (NAP = 0.74, Tau-U = −0.48). These results were slightly smaller than the effects found by prior published Tootling studies (i.e., Lambert et al., 2015: NAP = 0.88–1.00; McHugh et al., 2016: NAP = 0.92–1.00; Lum et al., 2017: NAP = 0.92–0.95). In addition, both NAP and Tau-U analyses found moderate to large effects of Tootling on AEB (NAP = 0.76, Tau-U = 0.68). These results replicated the effects found by prior elementary-level studies (i.e., Lambert et al., 2015: NAP = 0.90–1.00; McHugh et al., 2016: NAP = 0.92–1.00), and were slightly smaller than effects previously found at the high school level (i.e., Lum et al., 2017: NAP = 0.82–0.86) for AEB. As hypothesized, students experiencing the unique developmental period of middle school may be slightly less receptive to the components of Tootling, potentially due to peer social pressure around positive comments. Students may consider it the teacher’s role to dispense compliments and, thus, may resist writing and receiving positive comments themselves for fear of being aligned with the adults rather than their peers. Alternatively, or in conjunction, during this period of self-identification that occurs in the middle school years, students may be self-conscious about appearing too juvenile in appreciating the positive comments. However, given that Tootling is largely student maintained and a feasible class-wide intervention, the moderate to large effect sizes on class-wide DB and AEB are encouraging for general education teachers seeking Tier 1 interventions for middle school classrooms.
Although there were three opportunities to demonstrate an effect in Classroom A, threats to internal validity prevented this from occurring in Classroom B. Effect size comparisons of baseline with the first Tootling phase indicate moderate to large effects for DB and large effects for AEB. These results were present despite the teacher’s decision to substantially increase the criterion for reward (i.e., from 50 to 160 positive comments) and limit students to two positive comments each per class session. These changes were made by Teacher B to maintain the acceptability of the intervention when students started to “game” the positive comment system, and served to distance the reward. Although replication of the effects appeared to continue, at least initially, in the withdrawal phase, there was a clear shift in the levels of both AEB and DB that coincided with the return of an assigned paraprofessional from medical leave and the multiday absence of a particularly disruptive student. It is unknown whether students’ subsequent behavioral response was due to the intervention, the changes in the classroom environment, or both. It should be noted that the levels of target behaviors seemed to stabilize in the withdrawal phase following the return of the paraprofessional, and student behavior in Classroom B responded to the reimplementation of Tootling with increases in AEB and a decreasing trend in DB. However, as a functional relationship was not established and the WWC standards were not met for Classroom B, these promising results are inconclusive and must be interpreted with caution.
Strategies for Maintenance of Behavioral Change
To demonstrate meaningful change in behavior, behavioral interventions must achieve results that endure over time; however, this has rarely been examined in prior Tootling studies. In fact, only two prior Tootling studies included a follow-up phase (Lambert et al., 2015; Lum et al., 2017). One study found that teachers continued to employ the intervention (Lambert et al., 2015); however, the other study found that after 1 week, teachers were no longer using any intervention components and class-wide behaviors had nearly returned to baseline levels (Lum et al., 2017). To address this limitation, the current study explored whether the behavioral gains of the Tootling intervention would be maintained once the reward component was removed. In line with our hypothesis, we found that behavioral changes from Tootling were durable when positive comments procedures were maintained, even with the removal of the interdependent group contingency. This, along with the carryover in the withdrawal phase, suggests that the promotion of a positive classroom environment through Tootling may lead to entrapment, the shift from reliance on external rewards to the natural reinforcement of the teacher and peer social recognition. In fact, Classroom A’s index of durable change (baseline vs. maintenance phases) for both DB and AEB indicated moderate to large effects. Although the data should be interpreted with caution, visual analyses of Classroom B’s maintenance phase seem to confirm, or even improve upon, the maintenance results of Classroom A, with the index of durable change suggesting large effects on both DB and AEB. These results indicate that it may not be necessary to keep the rewards component of Tootling to continue to see positive behavioral effects. 
This is particularly important as research has shown teachers have difficulty sustaining intervention components (Han & Weiss, 2005) and may find external reinforcement procedures less acceptable (Akin-Little, Eckert, Lovett, & Little, 2004).
Social Validity
Even with evidence of effectiveness and strategies to support durable behavioral change, an intervention is of limited value if teachers and students do not want to actually use it. Results suggest that teachers understood the intervention components, believed that the intervention was feasible, and felt that the intervention required minimal home–school collaboration; however, Teacher B found the intervention less acceptable as compared with Teacher A. According to anecdotal report, Teacher B’s baseline classroom management practices emphasized punitive individual student consequences. Research on teacher intervention acceptability has found higher levels of acceptability if the intervention is aligned with teachers’ professional and philosophical orientation (Witt, Elliott, & Martens, 1984). Thus, the fact that the collective and positive focus of the Tootling intervention conflicted with Teacher B’s typical classroom management style may have contributed to these lower ratings.
Although teacher acceptability should be considered, Tootling is a primarily peer-maintained intervention, so it is particularly important that students view it as acceptable and feasible. Student results for each classroom paralleled the teacher ratings. Students in Classroom B had lower ratings of personal desirability of the intervention, as compared with students in Classroom A, although the majority of students in both classes reported liking the intervention.
Limitations and Directions for Future Research
As is the case with any study conducted in an applied setting, there are some aspects that could not be controlled and may have affected the results. Of primary consequence in this study, the planned replication of Classroom B was inconclusive due to the return of a paraprofessional from medical leave during the withdrawal phase. This irreversible event threatened the internal validity of the results. In addition, there are some factors that may limit the generalizability of the results. For example, Classroom A had a relatively small class size (n = 17) compared with Classroom B (n = 24), and did not include any special education students. Thus, further replications of Tootling with diverse populations are needed to enhance the external validity and generalizability of the results. In particular, additional replications are needed at the secondary level, given the limited evidence base.
There were also some limitations with regard to study design that may have prevented the demonstration of a strong functional relationship or a full examination of the effects of the intervention. First, Classroom A demonstrated ceiling effects with AEB during both intervention phases. Prior Tootling studies (e.g., Lambert et al., 2015; Lum et al., 2017; McHugh et al., 2016) displayed much lower baseline levels of AEB, potentially allowing for greater intervention effects. Although a screening observation standard of 30% of intervals of DB was applied to prevent floor effects, future studies may consider a stricter standard, multiple screening observation sessions, or a standard that also involves a criterion for AEB. It may also be beneficial for future studies to employ a multiple baseline design, as this would allow for a longer intervention phase to more fully examine the effects across multiple reward cycles and address any entrapment effects during the withdrawal phase. Furthermore, class-wide behavioral intervention research often relies on teachers to self-identify their classroom as having difficulty. Future research should also survey students to determine whether they perceive a problem before implementing any intervention.
All behavior observation methods include some aspect of error. Although prior research has indicated that observing students individually in a fixed rotating pattern produces valid estimates of class-wide behavior (Briesch et al., 2015), this method does not evaluate the specific behavior or response to intervention for a target student. Future research should also assess both class-wide and target student behavior. In addition, as both classes met during the same period, observers were not able to directly assess all aspects of treatment integrity on each day of the study, as certain components occur at the beginning (e.g., review positive commenting procedures) or end (e.g., read aloud five positive comments) of the class session. Thus, actual treatment integrity might have been lower than what is reported. Self-report measures were planned to capture these missing treatment integrity data; however, teachers did not complete them with regularity. The classroom observations that did occur, as well as the teachers’ informal comments, suggest that the intervention was implemented with integrity.
Finally, the goal of any behavioral intervention is enduring behavioral change. Results from this study suggest that fading out the reward component may promote durable behavioral change; however, the maintenance phase only lasted 5 days. Future research should examine the longer term effects of Tootling on class-wide behavior change.
Implications for Practice
In addition to identifying avenues for future research, the current study has several implications for applied practice. This and prior studies suggest that Tootling is a feasible class-wide intervention that holds promise for decreasing overall DB and increasing academic engagement, although additional research is needed to document the evidence base for use at the middle school level. At the core of the Tootling intervention is peer-to-peer written praise, which is aligned with the positive behavioral interventions and supports (PBIS) principles of prevention and positive reinforcement of appropriate behavior. Tootling may be particularly appealing to teachers, who are managing the instructional and behavioral management demands of a classroom, as students are responsible for the majority of tasks (i.e., writing tootles) associated with the intervention. In addition, as there has been markedly less research conducted on class-wide interventions at the middle and high school levels (Chaffee et al., 2017; Maggin, Pustejovsky, & Johnson, 2017), it is promising that results of the current study suggest that middle school teachers and students alike found the intervention to be acceptable and feasible, thus making a potential contribution to the secondary-level intervention toolbox. Finally, as many evidence-based interventions tend to be focused on negative behaviors (Chaffee et al., 2017), Tootling is a promising option for teachers working within a school-wide PBIS context and seeking an aligned class-wide intervention.
Footnotes
Acknowledgements
This study would not have been possible without the financial support of the Society for the Study of School Psychology Dissertation Grant Award and the committed participation of the middle school teachers. Thank you to Kristin Nissen and Taicha Cornelio, our research assistants, for their hours of observations.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Preparation of this article was supported through funding from the Society for the Study of School Psychology (SSSP). Opinions expressed herein do not necessarily reflect the position of the SSSP, and such endorsements should not be inferred.
