Abstract
A within-participant withdrawal design was used to examine the effects of a positive version of the Good Behavior Game (GBG) for three students in an elementary special education classroom for students with emotional or behavioral disorders (EBD). Results indicated immediate improvements in disruptive behavior and academic engagement for all three students when the GBG was implemented, an effect that was generally replicated in the second intervention phase for academic engagement. Tau-U effect sizes ranged from .76 to .95, indicating large and very large effects for both behaviors for two students and for academic engagement for the third student, but Tau-U for the third student’s disruptive behavior was .32 (considered not effective). Two students exhibited slight decreasing trends in academic engagement and increasing trends in disruptive behavior near the end of intervention phases. A preliminary within-session analysis also suggested students’ behavior was generally better in intervention phases, both during and not during the GBG, when compared with their baseline levels of behavior. Social validity information suggested generally positive teacher and student perceptions about the GBG initially, with less positive student perceptions 10 weeks following the conclusion of the study. The teacher reported implementing the GBG 5 times over the 10 weeks following the conclusion of the study.
The Good Behavior Game (GBG; Barrish et al., 1969) is an interdependent group contingency, class-wide intervention with a focus on clarifying classroom rules (i.e., behavioral expectations) and providing student groups (i.e., teams) with feedback and consequences depending on whether or not team members adhere to those rules. Although there are several component variations (e.g., varying length of time, using or not using rewards, assigning/removing points for rule violations/following), the GBG is typically played for a brief length of time (e.g., 10 min during a school day) and typically involves defining and teaching classroom rules, dividing the class into two or more teams, removing team points for rule violations or awarding team points for rule following, and awarding daily or weekly rewards for teams meeting set criteria for winning the game (Flower, McKenna, Bunuan, et al., 2014). Recent studies (e.g., Tanol et al., 2010; Wright & McCurdy, 2012) demonstrated positive versions of the GBG where teams are assigned points for rule following are as effective and acceptable as versions of the game where teams are assigned or lose points for rule violations. Furthermore, teachers have indicated a preference for a positive version of the GBG because it helps to establish more positive classroom environments (Tanol et al., 2010), and positive versions of the GBG also stand in alignment with research-based recommendations for teaching and reinforcing appropriate behaviors to promote more positive classroom environments (e.g., Epstein et al., 2008).
Researchers reported positive effects of the GBG for improving day-to-day classroom behavior (e.g., reductions in disruptive or inappropriate behaviors and increases in on-task or academically engaged behaviors) for students across elementary and secondary grade levels, in general education and special education classroom settings, and for whole groups of students (e.g., class-level effects) and individual students (e.g., Bowman-Perrott et al., 2016; Flower, McKenna, Bunuan, et al., 2014). In most studies, researchers typically examined effects of the GBG by collecting data on student behavior only during the brief time when the game is being played; however, recent results reported by Donaldson et al. (2015) and Pennington and McComas (2017) demonstrated that positive effects of the GBG implemented in one classroom activity did not carry over to other classroom activities, suggesting improvements to student behavior did not generalize to other classroom activities, even those occurring in the same classroom and on the same days. The results from the Donaldson et al. (2015) and Pennington and McComas (2017) studies also demonstrated that problem behaviors did not worsen during class times when the GBG was not being implemented. Because of this, Donaldson et al. (2015) suggested teachers identify specific activities or problematic times to implement interventions like the GBG that require more monitoring and effort rather than implementing it for the whole day or for the whole class session.
GBG for Students With EBD
Results from several studies also demonstrated the GBG was generally effective for students who exhibit chronic problem behaviors such as students with or at risk for emotional or behavioral disorders (EBD; e.g., Darveaux, 1984; Donaldson et al., 2017; Groves & Austin, 2017; Johnson et al., 1978; McGoey et al., 2010; Salend et al., 1989; Sy et al., 2016; Tanol et al., 2010). Nevertheless, the research on the effects of the GBG specifically for students with or at risk for EBD is relatively limited. To illustrate, in their meta-analysis of single-case design studies examining the effects of the GBG in classroom settings, Bowman-Perrott and colleagues (2016) reported only 5.6% of participating students were identified as having or being at risk of EBD across all studies included in their review.
Research is even more limited on the effects of the GBG for students placed in self-contained classrooms specifically designated for students with EBD or other chronic problem behaviors. Results from three studies conducted in these types of classrooms demonstrated the effectiveness of the GBG to improve overall class-level behaviors (i.e., reporting results for the entire class or for groups of students within the class; Johnson et al., 1978; Salend et al., 1989; Sy et al., 2016). However, as highlighted in results reported by Donaldson et al. (2017), it may be important to focus on individual student data when analyzing effects of the GBG, perhaps particularly for students who exhibit chronic problem behaviors, in an effort to identify students who may need more individualized and intensive behavioral intervention. In Donaldson et al.’s (2017) study, two students identified by their general education teachers as being the most disruptive were not responsive to the GBG.
To our knowledge, only two studies (Groves & Austin, 2017; Phillips & Christie, 1986) examined the effects of the GBG on individual students’ behavior in self-contained classrooms for students with EBD or other chronic problem behaviors. First, Phillips and Christie (1986) reported results from an AB study conducted in the United Kingdom, which suggested potential effectiveness of the GBG to decrease off-task behavior for the most disruptive student in a middle school age-level classroom for students with academic or behavioral problems. However, the AB design used in this study does not meet current single-case design methodological standards (Council for Exceptional Children, 2014; Kratochwill et al., 2010). Second, Groves and Austin (2017) conducted a study in an upper elementary-age classroom in a South Wales special school for students with severe EBD. Specifically, they used an alternating treatments design to compare the effects of two variations of a positive version of the GBG: one variation where students earned points individually for adhering to classroom rules and another variation where students worked on teams to earn points for adhering to classroom rules. Groves and Austin (2017) reported reductions in problem behavior under both variations, when compared with observation sessions when the GBG was not being implemented, for each of the four students identified by their teacher as exhibiting the highest rates of problem behavior. Further research is needed, however, to build on Groves and Austin’s demonstration of positive individual student outcomes of the GBG in a classroom for students with EBD.
Further research is also needed to examine the social validity of the GBG for students with EBD in particular. To illustrate, recent reviews on the GBG (Bowman-Perrott et al., 2016; Flower, McKenna, Bunuan, et al., 2014) indicated a limited focus on the social validity of the GBG, in general. For example, Bowman-Perrott et al. (2016) reported only approximately 62% of studies included in their meta-analysis reported social validity information. When reported, social validity information indicated generally positive perceptions of the GBG (Bowman-Perrott et al., 2016). However, in at least four studies, including the Groves and Austin study conducted in a classroom for students with EBD (Flower, McKenna, Muething, et al., 2014; Groves & Austin, 2017; Mitchell et al., 2015; Salend et al., 1989), social validity information suggested students have mixed perceptions about the GBG (i.e., one or more students in each study indicated neutral or negative perceptions). Furthermore, in a study reported by Flower, McKenna, Muething, et al. (2014), the teacher provided favorable social validity ratings for the GBG and its positive effects, but at the 5-week follow-up visit, the teacher had discontinued implementing the game. In light of limited and mixed information about the social validity of the GBG, it is particularly important to gather social validity information focused on the acceptability and viability of using the GBG as a long-term intervention approach for students who exhibit chronic problem behaviors.
Purposes
The current study was designed to build on the published literature to extend the research base on the effects of the GBG for individual students with EBD, gather further information about the social validity of the GBG for students with EBD, and provide further descriptive information about student outcomes observed across whole class sessions where the GBG is implemented for only a portion of the class time. Specifically, the primary research question was: Does a teacher’s implementation of a positive version of the GBG in an elementary special education classroom for students with EBD result in improvements in individual students’ academic engagement and disruptive behaviors? A secondary purpose was to gather and summarize information related to the social validity of the GBG in this type of classroom setting, including gathering information about students’ and teacher’s perceptions about the GBG just after the initial implementation of the GBG, gathering student and teacher perception information 10 weeks after the completion of the study, and gathering information on the teacher’s reported continued use of the GBG across the 10 weeks after the completion of the study. In addition, we conducted a preliminary within-session analysis of student behavior in intervention phase sessions by disaggregating data during the GBG (i.e., when the game was being played during small group work at the end of class) and not during the GBG (i.e., when the game was not being played at the beginning of class).
Method
Setting
The study was conducted in an elementary school in the southeastern United States during the fourth- and fifth-grade remedial math class of a self-contained special education classroom for students with EBD. The special education teacher was a White female enrolled in a master’s degree–level teacher preparation program at a local university. At the time of the study, the teacher had less than 2 years of teaching experience. She was teaching on a provisional special education license and had not yet completed traditional license requirements. The primary researcher was the teacher’s advisor in her graduate program. The special education teacher requested support from the primary researcher to implement and evaluate an intervention to address problems in her classroom. Specifically, the teacher reported her students exhibited frequent disruptive outbursts and their “motivation to perform was low.” The primary researcher suggested the GBG because it could be implemented to address class-wide problems.
At the beginning of the study, there were four students in the math class. The primary researcher obtained parental/guardian consent as well as student assent for all four students. However, one student moved to another school after Session 19, which was the third session of the intervention phase; he is not included in this report. Data were collected on the three remaining target students (Ron, Josh, and Deidre). Two new students moved into the classroom after the study began: one arrived during the baseline phase and a second student began attending at Session 26 (she was present for the last 2 days of the first intervention phase). The new students participated in the GBG, but data were not collected on them.
The special education teacher facilitated all classroom instruction and implemented the GBG during the intervention phases. A teaching assistant was often in the classroom. The math class period lasted 45 min and typically began with whole-group instruction led by the teacher followed by small-group instructional activities (students working in groups of 2–3) where the teacher rotated among groups to assist and monitor student behavior. Often, the students had “free time” or “computer time” at the end of the class period.
Participants
The special education teacher provided the following summary information about Ron, Josh, and Deidre. Researchers did not review students’ academic or behavioral records (e.g., functional behavioral assessment results).
Ron
Ron was a fourth-grade, Black male with Attention-Deficit/Hyperactivity Disorder. He received all of his daily instruction in the special education classroom. The teacher indicated Ron performed at a fourth-grade level in math and required little additional instructional support to complete work. However, his teacher also reported he frequently worked ahead of other students and rarely communicated with other students about assigned work when working in small groups. His teacher said he frequently refused to answer teacher and peer questions about math, and he had difficulty working and interacting with peers. Initial anecdotal observations by researchers also indicated that Ron required frequent prompts and redirections to begin and complete work and to follow teacher directions; we observed instances of him mumbling to himself following a teacher’s direction or in response to a peer.
Josh
Josh was a fifth-grade, White male who received special education services for specific learning disability in math. He received most of his daily instruction in the special education classroom but attended related arts classes (i.e., art, gym, and music) with peers in a general education class. The teacher indicated Josh performed at a second-grade level for math, and he required substantial instructional supports and struggled to complete work even with additional supports. He consistently refused to complete work, and he rarely answered questions or participated in whole class or small group discussions. Researchers’ initial observations confirmed generally low levels of compliance with the teacher’s behavioral and academic directions.
Deidre
Deidre was a fifth-grade, Black female who received special education services under the eligibility category emotional disturbance. She received all daily instruction in the special education classroom. Deidre’s teacher indicated she performed on a fourth-grade level for math. Although math was a relative area of strength for Deidre, her teacher reported high levels of disruptive behavior during math class and throughout the day. According to her teacher, she frequently walked away from class activities (e.g., walking away from the instructional area to talk to the teaching assistant at the back of the room or in an adjacent office). At times, she would leave the classroom without permission. Her teacher also reported Deidre would make disruptive noises and have outbursts to gain attention or when she was frustrated. For example, on a day when the assigned task was difficult, Deidre slammed her pencil on the desk and made an audible noise (e.g., “Ugh!”). We also observed her saying, “This is stupid” in reference to an assigned task.
Materials
Intervention materials included a poster with classroom rules listed, a MotivAider® timer which vibrated on a fixed 2-min schedule cuing the teacher to assign points during the GBG, dry-erase markers for the teacher to write out team names and write team points during the GBG, written instructions for the teacher outlining steps to implement the GBG (given to the teacher during training), and rewards for students (small pieces of candy).
Data Collection and Dependent Variables
Two trained observers from a local university conducted direct observations to collect data on student behavior for two dependent variables: academic engagement and disruptive behavior. We collected data during the first 30 min of class each session (to include most of the instructional activities for the class), which included 10 min when the GBG was implemented during intervention phases. This allowed us to capture a more global picture of the effects of the GBG in this classroom (i.e., not just while the GBG was being played). During baseline phases, observations started at the beginning of class and typically ended after 30 min; however, one observation during the initial baseline phase and one observation during the second baseline phase were each less than 30 min because instructional activities ended early. Therefore, the mean length of observations during baseline phases was 28.96 min, ranging from 20 to 30 min. After learning about the GBG, the teacher elected to implement the GBG during small-group instructional activities, which typically occurred at the end of class and which she identified as particularly problematic. During intervention phases, observations started at the beginning of class and ended when the teacher announced the end of the GBG. The mean length of observations during intervention phases was 34.10 min, ranging from 29.50 to 44 min. Data were collected 2 to 4 days each week across phases due to class schedules, special activities (i.e., no instructional activities), student absences, and researcher availability. On average, the GBG began approximately 20 min after class began.
Data were collected on all three students simultaneously using interval recording procedures (30-s intervals; momentary time sampling for academic engagement and partial interval recording for disruptive behavior). From a feasibility standpoint, using 30-s intervals allowed us to record data simultaneously for all three students, for both target behaviors, and across the entirety of the observation session. Our initial observations (pre-baseline) indicated high and ongoing disruptive behaviors, so we wanted to capture student behavior across all intervals during a session as opposed to using shorter intervals and recording data on each student in a sequential, round-robin approach (e.g., watch Deidre for 10 s, then watch Josh for 10 s).
Data were coded for each student, for each interval, and for each dependent variable. Observers practiced and refined data collection definitions and procedures for five sessions in the classroom before beginning baseline data collection. During these practice sessions, observers discussed and came to a consensus about behavioral definitions, examples, and nonexamples; revised data collection sheets; and learned students’ names. In the last practice session, interobserver agreement (IOA) using interval-by-interval comparisons (i.e., point-by-point agreement) exceeded the 80% agreement criterion for all students and student behaviors and for all teacher behaviors. Specific information on procedures, behavioral definitions, and how IOA was assessed is provided in the sections that follow.
Academic engagement
Academic engagement was defined as actively or passively participating in an assigned or approved academic activity/task (adapted from Direct Behavior Ratings standard behaviors; Chafouleas et al., 2009). This included (but was not limited to) (a) looking at or having head oriented toward the material, task, or teacher/speaker; (b) making appropriate motor responses (e.g., writing, looking at the teacher or student speaking, appropriately using instructional manipulatives); or (c) asking for assistance in an acceptable manner (e.g., raising hand). Not academic engagement was defined as not participating in an assigned/approved academic activity. Examples included (but were not limited to) (a) inappropriately looking around the room, (b) inappropriately out of seat during an instructional activity, (c) disturbing others or unsanctioned talk to others, (d) engaging in an unapproved or unassigned activity, (e) down time, and (f) student out of the room or instructional area. Not academic engagement also included behaviors that were permissible according to classroom rules but did not meet the definition for academic engagement (adapted from Saudargas & Fellers, 1986), such as out of seat to sharpen pencil, to walk to or from teacher or desk or get materials, getting materials from under desk, or looking away from person talking during class discussion. Any behavior that was included in our list of examples for not academic engagement was treated as exclusive to being engaged (e.g., if a student was out of seat to sharpen pencil but was looking toward the teacher/speaker, the student was coded as not engaged). Observers used momentary time sampling to record academic engagement, and data were presented as percentages of intervals coded with academic engagement.
Disruptive behavior
Disruptive behavior was defined as any behavior that was disruptive to regular school or classroom activity (adapted from Direct Behavior Ratings standard behaviors; Chafouleas et al., 2009). Examples included but were not limited to out of seat (e.g., bottom out of seat or not sitting appropriately in seat), fidgeting (e.g., tapping desk with hands, feet, or tapping a pencil on desk), playing with objects (e.g., rolling pencil or other small object on desk, moving an object in hand or lap—other than instructional materials), aggression (such as pushing another person or throwing an object at another person), laying on the floor, hitting or making noise with an object or body, talking/yelling about things unrelated to the assigned academic task, negative talk, or inappropriate vocalizations. Observers used partial interval recording to record occurrences of disruptive behavior, and data were presented as a percentage of intervals coded with disruptive behavior. Based on our definitions and selected interval recording procedures, a student could have been recorded as exhibiting disruptive behavior and academic engagement in the same interval.
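To illustrate how the two interval recording procedures can yield different codes for the same stretch of behavior, the following is a minimal Python sketch, not the observers' actual protocol. It assumes behavior is available as hypothetical (start, end) episodes in seconds, that momentary time sampling codes the instant each 30-s interval ends (the source does not state the sampling instant), and that the session length is a multiple of the interval length. All function names are illustrative.

```python
def momentary_time_sample(episodes, session_len_s, interval_s=30):
    """Code an interval True if the behavior is occurring at the instant
    the interval ends (assumed sampling instant)."""
    codes = []
    for moment in range(interval_s, session_len_s + 1, interval_s):
        codes.append(any(start <= moment < end for start, end in episodes))
    return codes


def partial_interval(episodes, session_len_s, interval_s=30):
    """Code an interval True if the behavior occurs at any time within it."""
    codes = []
    for int_start in range(0, session_len_s, interval_s):
        int_end = int_start + interval_s
        codes.append(any(start < int_end and end > int_start
                         for start, end in episodes))
    return codes


def percent_of_intervals(codes):
    """Percentage of intervals coded with the behavior."""
    return 100.0 * sum(codes) / len(codes)


# Hypothetical 2-min excerpt: two brief episodes of one behavior.
episodes = [(10, 20), (55, 65)]
mts = momentary_time_sample(episodes, session_len_s=120)
pir = partial_interval(episodes, session_len_s=120)
```

In this hypothetical excerpt, partial interval recording codes three of four intervals while momentary time sampling codes one, which is one reason the two measures are reported separately and why a single interval can carry both an engagement code and a disruptive behavior code.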
Interobserver Agreement
To assess IOA, a second observer independently and simultaneously recorded data alongside the primary observer for at least 20% of sessions within each phase of the study and for each student with one exception: No IOA data were recorded for Deidre for the second intervention phase due to absences. Percentage agreement was calculated for each IOA session by dividing the number of intervals with agreement by the number of agreements plus disagreements (point-by-point agreement). Across the three students and across all IOA sessions, mean IOA was 96.0% for academic engagement (range: 91.7%–100%) and 96.7% for student disruptive behavior (range: 89.3%–100%). For academic engagement, mean IOA for baseline, intervention, withdrawal, and intervention phases, respectively, were Ron—96.1%, 95.4%, 96.7%, and 97.7%; Josh—97.2%, 94.3%, 93.3%, and 95.5%; and Deidre—94.1%, 95.8%, and 94.1%, with no IOA recorded for Deidre during the second intervention phase. For disruptive behavior, mean IOA across the four phases in order were: Ron—94.4%, 97.1%, 91.7%, and 97.7%; Josh—96.6%, 98.1%, 96.7%, and 97.7%; and Deidre—94.9%, 96.9%, and 93.3%.
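The point-by-point agreement calculation described above can be sketched as follows. This is a minimal illustration under the assumption that each observer's record is a list of per-interval codes of equal length; the function name and the example codes are hypothetical and are not study data.

```python
def point_by_point_ioa(primary, secondary):
    """Percentage agreement: intervals with agreement divided by
    agreements plus disagreements, multiplied by 100."""
    if len(primary) != len(secondary):
        raise ValueError("Both observers must code the same number of intervals.")
    if not primary:
        raise ValueError("At least one interval is required.")
    agreements = sum(p == s for p, s in zip(primary, secondary))
    disagreements = len(primary) - agreements
    return 100.0 * agreements / (agreements + disagreements)


# Hypothetical codes for six intervals (1 = behavior coded, 0 = not coded).
primary_codes = [1, 1, 0, 0, 1, 0]
secondary_codes = [1, 0, 0, 0, 1, 0]
ioa = point_by_point_ioa(primary_codes, secondary_codes)
```

Here the observers agree on five of six intervals, so the sketch returns roughly 83.3%.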
Design, Procedures, and Analysis
A within-subject withdrawal design (A-B-A-B) was used to evaluate the effects of the GBG. Individual student data were graphed and analyzed daily. Phase change decisions were response guided and were based on visual analysis of student data within and across phases (i.e., trend, level, and variability). As the GBG is a class-wide intervention, data for all three students were considered in making phase change decisions.
Following visual analysis, effect sizes were calculated and interpreted for each student and for each dependent variable. Specifically, Tau-U was used as the primary effect size of interest to account for observed overlap of multiple data points between baseline and intervention phases and to account for potential baseline trends (Maggin et al., 2019; Parker et al., 2011; Vannest & Ninci, 2015). Each Tau-U effect size was interpreted following the recommendations of Vannest and Ninci (2015). Additional effect size estimates were calculated and presented as supplementary information to guide readers’ interpretations. Specifically, percentage of nonoverlapping data (PND; Scruggs & Mastropieri, 1998) is presented as a basic overlap index, and within-case standardized mean difference effect sizes using Hedges’ g are presented, which are analogous to typical effect sizes reported in group research design studies (Maggin et al., 2019).
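For readers unfamiliar with these indices, the following Python sketch shows one common way to compute PND and a baseline-trend-corrected Tau-U for a single baseline-intervention contrast. It is an illustration under stated assumptions, not a description of the software or exact options used in this study: it follows the formulation in which the Kendall S for baseline trend is subtracted from the S for the phase contrast and the result is divided by the number of baseline-intervention pairs. Function names and the example data are hypothetical.

```python
def s_between(baseline, intervention):
    """Kendall S across phases: pairs where the intervention point is higher
    minus pairs where it is lower (ties count as zero)."""
    return sum((b > a) - (b < a) for a in baseline for b in intervention)


def s_trend(series):
    """Kendall S within one phase: later-minus-earlier comparisons."""
    n = len(series)
    return sum((series[j] > series[i]) - (series[j] < series[i])
               for i in range(n) for j in range(i + 1, n))


def tau_u(baseline, intervention, correct_baseline_trend=True):
    """Tau for the phase contrast, optionally corrected for baseline trend."""
    s = s_between(baseline, intervention)
    if correct_baseline_trend:
        s -= s_trend(baseline)
    return s / (len(baseline) * len(intervention))


def pnd(baseline, intervention, increase_expected=True):
    """Percentage of intervention points beyond the most extreme baseline point."""
    if increase_expected:
        count = sum(v > max(baseline) for v in intervention)
    else:
        count = sum(v < min(baseline) for v in intervention)
    return 100.0 * count / len(intervention)


# Hypothetical percentages of intervals with academic engagement (not study data).
baseline = [40, 50, 45, 55]
intervention = [70, 80, 75, 60, 85]
tau_corrected = tau_u(baseline, intervention)
tau_uncorrected = tau_u(baseline, intervention, correct_baseline_trend=False)
pnd_value = pnd(baseline, intervention)
```

For a behavior expected to decrease, such as disruptive behavior, the sign of the resulting Tau-U would be reversed before interpretation, and `increase_expected=False` would be passed to `pnd`.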
Baseline
At baseline, the teacher led instruction during whole-group and small-group instructional activities. The lead researcher asked the teacher to facilitate instruction and respond to student behavior in a typical manner. There was a list of classroom rules printed on a large poster board and posted on the front wall. The rules were: “I will: ignore inappropriate behavior, be safe at all times, be respectful, be responsible for myself and property, give 100% effort to complete work, follow first request, keep hands and feet to self, and use kind words.” The teacher reported she taught the classroom rules at the beginning of the school year and frequently referred to classroom rules. The teacher also reported using a token economy (i.e., school “bucks”) to reinforce appropriate behavior, and we did not ask her to discontinue using the token economy. However, researchers never observed the teacher awarding a buck to a student in either baseline or intervention phases. The first baseline phase lasted 3 weeks. One of the students had multiple absences during baseline data collection, which required us to extend the baseline phase much longer than planned.
GBG
Following baseline, the lead researcher met with the teacher for approximately 20 min to train her to implement the GBG. The researcher explained the steps to implement the GBG, provided rationales for the game’s effectiveness, and demonstrated examples of what the teacher should say or do when implementing the GBG. The teacher stated she would like to use the GBG to specifically target students’ behavior during small-group activities, and she decided to use the GBG to narrow the classroom rules to focus on behaviors that were most problematic and relevant for her students during small group work: “I will: be respectful, give 100% effort to complete work, and work with my team.” We did not anticipate the teacher’s change in rules, and we did not collect data or information on students’ adherence to specific rules or rule violations across phases of the study.
During this meeting, the teacher divided students from her current student roster into two teams (consisting of 2–3 students per team) to work together during small group work; the teacher formed teams based on having a mix of ability levels and based on students who the teacher thought generally worked well together. On most days during the intervention phases, Deidre was on a team with one or more students not participating in the study, and Ron and Josh were on the same team when they were both present (sometimes with one or more students not participating in the study joining their team). However, on 4 intervention days, the teacher had to make adjustments to teams when students were absent: In Session 28, Ron and Josh were on different teams made up of one or more additional students; in Session 38, Ron was the only target student present, and he was on a team that day with nonparticipating students; and on the last 2 days of the second intervention phase (i.e., Sessions 42 and 43), Deidre and Ron were on the same team, and the remaining nonparticipating students were on the second team. There was never a team consisting of only one student.
The teacher chose to assign team points for following rules (i.e., positive reinforcement for rule following as outlined in implementation procedures in Tanol et al., 2010) during the GBG rather than to remove points for not following rules. The lead researcher provided the teacher with a MotivAider® set to vibrate silently at 2-min intervals. On days when the teacher implemented the GBG, she wore the MotivAider®, which prompted her to attend to student behavior at the end of each 2-min interval, then to assess whether or not teams were following rules at that moment, and finally to either assign points or provide error correction to teams. We did not collect data on whether or not the teacher accurately assigned points. The teacher initially identified a criterion of each team earning 4 points for teams to “win” the game, but on some days, she set a criterion of 5 points. Her initial decision to set the criterion at 4 points was because she felt it was attainable and realistic for students, and she wanted them to earn the reward. We did not ask the teacher why she set variable criteria (4 or 5 points) each day. She selected edible reinforcers (i.e., candy) as the daily reward for winning teams because it would be easy to give out and would not interfere too much with class activities and student work. We did not ask the students about their reinforcer preferences. We did not instruct the teacher to simply ignore problem behaviors during the GBG. Instead, we instructed her to, when necessary, provide limited attention for rule violations or problem behavior by either ignoring the problem behavior or by providing brief, corrective feedback in a typical instructional voice.
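The point-assignment procedure described above can be summarized in a short Python sketch. It is only an illustration: it assumes that a team "wins" when its points are at least the daily criterion (4 or 5), that each timer cue yields either one point or an error correction, and that a 10-min game with a 2-min timer produces about five checks. The function name, team labels, and check outcomes are hypothetical.

```python
def score_game(checks_by_team, criterion=4):
    """Tally one point per timer cue at which a team was following the rules
    and flag whether the team met the daily criterion."""
    results = {}
    for team, checks in checks_by_team.items():
        points = sum(bool(following) for following in checks)
        results[team] = {
            "points": points,
            "error_corrections": len(checks) - points,
            "won": points >= criterion,  # assumption: meeting the criterion wins
        }
    return results


# Hypothetical outcomes at five 2-min checks (True = following rules at the cue).
checks = {
    "Team A": [True, True, False, True, True],
    "Team B": [True, False, False, True, True],
}
results = score_game(checks, criterion=4)
```

Because the contingency is interdependent within each team, both teams can win on the same day; in this hypothetical example only Team A meets a criterion of 4 points.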
During the next class session following teacher training, the lead researcher introduced the GBG to students and modeled implementing the GBG for the teacher; no data were collected that day. The following day, the teacher implemented the GBG as a practice session; student data were not reported for the practice session because most of the class period was free time. The lead researcher gave feedback to the teacher related to her implementation of the game.
In subsequent intervention sessions, the teacher implemented the GBG during small-group work (typically occurring at the end of the class period). The teacher was instructed to play the GBG for 10 min each day, which was consistent with previously reported procedures for the length of the game (e.g., Tanol et al., 2010). Observers noted during which 30-s interval the teacher began playing the GBG (i.e., when she started the 10-min timer, officially starting the game) and when she verbally announced the end of the game. During the first intervention phase, the mean duration of the game was approximately 10.90 min (ranging from approximately 10 to 16 min). Because the teacher chose to implement the GBG toward the end of class, each day she announced at the beginning of class they would be playing the GBG later that day during group work. Observers also recorded whether or not each team met criteria (“won the game”) each day. The first and second GBG phases lasted 3 and 2 weeks, respectively.
Withdrawal
The GBG was withdrawn for approximately 2 weeks. Each day, the teacher announced at the beginning of class they would not be playing the GBG.
GBG
The GBG was then reinstated for approximately 2 weeks. Each day, the teacher announced at the beginning of class they would be playing the GBG during group work. The mean duration of the game was approximately 10.33 min (ranging approximately from 9.5 to 11 min). Observers recorded winning teams each day.
Follow-up visit
After the completion of the overall effectiveness study, the lead researcher told the teacher she could choose to continue to implement the GBG or not, but the researcher asked the teacher to maintain a record of days she played the GBG after the study was over. Ten weeks later, researchers gathered this information from the teacher and also conducted a follow-up social validity assessment with the teacher and students.
Treatment Integrity Assessed
Treatment integrity was assessed using a checklist that contained procedural steps from Tanol et al. (2010) as well as additional components developed by the research team. Checklist items are depicted in Figure 1. Prior to beginning the study, the lead researcher met with the second observer to discuss items on the treatment integrity checklist. Here, the researchers discussed examples and nonexamples of items on the checklist.

Figure 1. GBG treatment integrity checklist items.
Treatment integrity was assessed each session during baseline and intervention phases and was calculated and reported as a percentage of steps implemented each day. We assessed treatment integrity during baseline and withdrawal phases because some of the procedural steps were ones that a teacher could implement even without playing the GBG. During the first baseline phase, the teacher implemented 6.3% of checklist items each day; during each baseline session, the teacher implemented the step, “consistently provides limited attention for rule violations” even before the GBG was introduced. On the day when the teacher practiced implementing the GBG (after training and modeling), she implemented 87.5% of checklist items (she did not refer to the classroom rules, and she did not refer to rule-following procedures), which met our training criterion of 80% of steps implemented. The lead researcher nevertheless discussed the omitted steps with the teacher; in all subsequent class sessions, she implemented 100% of steps across both intervention phases. During the withdrawal phase, she implemented a range of 6.3% to 18.8% of treatment integrity checklist items each day. On the first day of withdrawal, although she did not play the GBG, she referred to classroom rules, responded consistently to rule following, and consistently provided limited attention for rule violations (three steps implemented, 18.8%). On the second day of withdrawal, she referred to classroom rules, and she consistently provided limited attention to rule violations (two steps implemented, 12.5%). On the third day, she handed out candy at the end of class (as a reward for completing work), but the observer did not record that any of the other steps of the GBG were implemented (one step implemented, 6.3%). On the last 2 days of withdrawal, she consistently provided limited attention to rule violations (one step implemented each day, 6.3%).
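The daily percentages above follow from dividing the number of steps implemented by the number of checklist items. A minimal sketch of that arithmetic is shown below. The total of 16 items is an assumption inferred from the reported values (e.g., one step = 6.3%, two steps = 12.5%); it is not stated directly in the text, and the function name is hypothetical.

```python
# Assumption: 16 checklist items, inferred from the reported percentages.
CHECKLIST_ITEMS = 16


def integrity_percent(steps_implemented, total_items=CHECKLIST_ITEMS):
    """Return treatment integrity as a percentage of steps implemented."""
    if not 0 <= steps_implemented <= total_items:
        raise ValueError("steps_implemented must be between 0 and total_items")
    return 100.0 * steps_implemented / total_items


# Steps implemented on the five withdrawal days described above.
withdrawal_steps = [3, 2, 1, 1, 1]
withdrawal_pct = [integrity_percent(n) for n in withdrawal_steps]
print(withdrawal_pct)  # [18.75, 12.5, 6.25, 6.25, 6.25]

# Practice session: two of the assumed 16 steps were omitted.
practice_pct = integrity_percent(14)
print(practice_pct)  # 87.5
```

The unrounded values 18.75 and 6.25 correspond to the 18.8% and 6.3% reported in the text.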
Interobserver agreement was assessed on treatment integrity data for at least 20% of sessions of each phase and was calculated for each IOA session by dividing the number of agreements on observers’ ratings (i.e., implemented or not) for each procedural step by the number of agreements plus disagreements for procedural steps. Interobserver agreement for treatment integrity was 100% each session.
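The agreement calculation described above (agreements divided by agreements plus disagreements) can be expressed as a short function. The sketch below is illustrative; the observer ratings are hypothetical, and the 16-item checklist length is an assumption inferred from the reported integrity percentages.

```python
def interobserver_agreement(ratings_a, ratings_b):
    """Item-by-item agreement between two observers, as a percentage.

    Each argument is a list of booleans (step implemented or not),
    one entry per procedural step on the checklist.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both observers must rate the same steps")
    if not ratings_a:
        raise ValueError("at least one rated step is required")
    agreements = sum(a == b for a, b in zip(ratings_a, ratings_b))
    disagreements = len(ratings_a) - agreements
    return 100.0 * agreements / (agreements + disagreements)


# Hypothetical ratings for one session on an assumed 16-item checklist.
observer_1 = [True] * 16
observer_2 = [True] * 16
full_agreement = interobserver_agreement(observer_1, observer_2)
print(full_agreement)  # 100.0

# If the observers had disagreed on a single step:
observer_2_alt = [True] * 15 + [False]
one_disagreement = interobserver_agreement(observer_1, observer_2_alt)
print(one_disagreement)  # 93.75
```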
Social Validity Assessed
To gather information about the social validity of the GBG in this context, the lead researcher interviewed the teacher and asked her to respond to a set of 10 Likert-type items (from Tanol et al., 2010; response options of strongly disagree, disagree, neutral, agree, and strongly agree) and open-ended questions. Items on the social validity teacher interview were as follows: (a) I enjoyed implementing the GBG in my classroom; (b) I plan to use the GBG in my classroom in the future outside of this research study; (c) After using the GBG in my classroom, I was able to see immediate changes in my students’ behavior; (d) The addition of the GBG has improved academics in my classroom; (e) The addition of the GBG has improved behavior in my classroom; (f) The addition of the GBG has improved the atmosphere in my classroom; (g) I found it easy to use the GBG in my classroom; (h) The GBG was a good fit for students in my classroom; (i) Adding the GBG did not interfere with academic instruction and routines in my classroom; and (j) Using the GBG did not take up too much time. The researcher also conducted brief individual interviews with students where she asked them open-ended questions about their perceptions about the GBG (see Table 2 for questions asked). Social validity was assessed with the teacher and students post initial implementation (i.e., after the first intervention phase) and at the 10-week follow-up.
Results
Student Behavior
Percentages of intervals with academic engagement and disruptive behavior are presented for each session, for each student, and across all phases of the study in Figure 2 (graphed by school days because data were not collected every day). Additional missing data points for a student indicate the student was absent for all or almost all of the class session. Visual analyses of these data suggest the GBG was moderately effective for at least one target behavior for each of the students; however, there was evidence of variability in data and overlapping data in intervention phases compared with adjacent baseline phases for all students. Tau-U (the primary effect size index) indicated large to very large effects for both behaviors for Ron and Josh and for Deidre’s academic engagement. Supplemental effect size indices (PND and Hedges’ g) provide additional support for the general effectiveness of the GBG. Means, standard deviations, and ranges across phases and effect sizes are presented in Table 1.
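To make the nonoverlap indices concrete, the sketch below computes PND and a basic Tau nonoverlap value for one baseline-intervention comparison. It is a simplified illustration, not the analysis used in this study: it omits the baseline trend correction that Tau-U can apply and does not combine multiple phase contrasts. The data values and function names are hypothetical.

```python
def tau_nonoverlap(baseline, intervention):
    """Basic Tau: (improving pairs - worsening pairs) / all pairs.

    Assumes higher values indicate improvement; no baseline trend
    correction is applied (a simplification relative to Tau-U).
    """
    positive = negative = 0
    for a in baseline:
        for b in intervention:
            if b > a:
                positive += 1
            elif b < a:
                negative += 1
    return (positive - negative) / (len(baseline) * len(intervention))


def pnd(baseline, intervention, increase_expected=True):
    """Percentage of intervention points beyond the most extreme baseline point."""
    if increase_expected:
        nonoverlapping = sum(b > max(baseline) for b in intervention)
    else:
        nonoverlapping = sum(b < min(baseline) for b in intervention)
    return 100.0 * nonoverlapping / len(intervention)


# Hypothetical percentages of intervals with academic engagement.
baseline = [40, 55, 60, 50]
intervention = [70, 85, 58, 90, 80]

tau = tau_nonoverlap(baseline, intervention)
pnd_value = pnd(baseline, intervention)
print(tau)        # 0.9
print(pnd_value)  # 80.0
```

For a behavior expected to decrease (e.g., disruptive behavior), the same Tau calculation yields a negative value when the intervention is effective, and PND would be computed with `increase_expected=False`.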

Figure 2. Percentage of 30-s intervals with students’ academic engagement and disruptive behavior.
Table 1. Phase Means, Ranges, and Effect Sizes for Dependent Variables.
Note. GBG = Good Behavior Game; PND = percentage of non-overlapping data. Effect size interpretations follow recommendations in Vannest and Ninci (2015). p < .05.
Ron’s academic engagement was variable during baseline (M = 57.8% of intervals observed, SD = 14.3, range: 41.7%–80.0%). With the introduction of the GBG, the level of academic engagement was higher and slightly less variable (M = 82.7%, SD = 10.0, range: 66.2%–94.9%). There was a descending trend in Ron’s academic engagement when the GBG was withdrawn (M = 62.1%, SD = 20.3, range: 40.0%–81.8%) and an immediate increase in level when the GBG was reintroduced (M = 82.1%, SD = 9.9, range: 72.9%–95.4%). Ron’s disruptive behavior was generally high and variable during baseline (M = 42.8% of intervals observed, SD = 12.9, range: 25.8%–56.7%), but there was a decrease in his level of disruptive behavior during the GBG phase, although it was still variable (M = 16.0%, SD = 11.0, range: 3.4%–38.2%). When the GBG was withdrawn, Ron’s variability and level of disruptive behavior increased (M = 39.0%, SD = 10.4, range: 13.6%–77.5%), and his disruptive behavior was lower and less variable during the final GBG phase (M = 16.6%, SD = 10.4, range: 7.95%–30.2%). However, near the end of both the first and second intervention phases, there was a slight decreasing trend in Ron’s academic engagement and a slight increasing trend in his disruptive behavior. On Sessions 22 and 37, most of Ron’s disruptive behavior was coded because he was standing behind his seat or leaning back in his seat; on Session 24, Ron told his teacher he did not take his medication that day. Tau-U effect sizes were .79 (p = .005) for academic engagement (considered a large effect) and −.81 (p = .004) for disruptive behavior (very large effect).
Josh’s level of academic engagement behavior was higher and less variable during intervention phases when compared with baseline phases (baseline: M = 55.0% of intervals observed, SD = 13.3, range: 40.0%–78.3%; intervention: M = 82.4%, SD = 9.3, range: 71.7%–95.0%). Similar to Ron, he had a descending trend in academic engagement during the withdrawal phase, and his engagement immediately increased in level when the GBG was reintroduced (withdrawal: M = 55.6% of intervals observed, SD = 15.4, range: 38.3%–71.1%; reimplementation: M = 82.0%, SD = 3.2, range: 78.4%–84.9%). Josh also had lower levels of disruptive behavior during intervention phases compared with baseline phases (baseline: M = 33.6% of intervals, SD = 9.6, range: 18.3%–45.5%; intervention: M = 12.9%, SD = 11.6, range: 1.4%–27.9%; withdrawal: M = 43.2%, SD = 18.5, range: 23.3%–70.0%; reimplementation: M = 19.1%, SD = 14.7, range: 1.7%–32.8%). But, similar to Ron, near the end of both intervention phases, there was a slight decreasing trend in Josh’s academic engagement and a slight increasing trend in his disruptive behavior. For Josh, Tau-U was .94 (p < .002, very large effect) for academic engagement and −.76 (p = .003, large effect) for disruptive behavior.
Deidre’s level of academic engagement was higher during intervention phases when compared with baseline phases with a descending trend when the GBG was withdrawn and an immediate increase in level when it was reintroduced (baseline: M = 65.1% of intervals, SD = 12.5, range: 48.3%–78.3%; intervention: M = 85.5%, SD = 6.2, range: 75.0%–90.7%; withdrawal: M = 70.4%, SD = 6.9, range: 61.7%–76.7%; reimplementation: M = 85.9%, SD = 5.0, range: 82.0%–91.6%). On average, Deidre’s disruptive behavior was lower and generally more stable during intervention phases compared with baseline phases but with overlapping data (baseline: M = 22.4%, SD = 20.1, range: 3.9%–47.3%; intervention: M = 15.8%, SD = 14.5, range: 3.4%–52.8%; withdrawal: M = 35.4%, SD = 23.4, range: 10.0%–66.7%; reimplementation: M = 10.5%, SD = 4.4, range: 5.6%–14.3%). Nevertheless, Deidre’s disruptive behavior was variable during baseline and withdrawal phases. During the two baseline phases, she had 3 days of low disruptive behavior (Sessions 2, 12, and 33), but on the remaining 5 days (Sessions 1, 13, 28, 29, and 32) her disruptive behavior occurred more than 30% of the time. Her disruptive behavior was low during both GBG phases, with the exception of Session 21. On Session 21, much of Deidre’s disruptive behavior was coded because she was leaning back in her seat for many minutes. Tau-U for Deidre’s academic engagement was .95 (p = .001), indicating a very large effect. However, Tau-U for her disruptive behavior was −.32 (p = .280), indicating no effect.
For descriptive purposes, we disaggregated students’ academic engagement and disruptive behavior in intervention phases by behavior during the GBG (i.e., when the game was being played during small group work) and behavior not during the GBG (i.e., when the game was not being played—typically at the beginning of class during whole group instruction and before small group work began; see Figure 3). These data suggest students’ behavior was generally better in intervention sessions, even during portions of the class session when the GBG was not being played when compared with their baseline behavior (but with considerable overlap). Furthermore, their behavior followed similar patterns (e.g., levels, variability) during and not during the GBG, with overlapping data.

Figure 3. Percentage of 30-s intervals with student behavior during and not during the GBG.
Social Validity
The teacher strongly agreed with all Likert-type items on the social validity questionnaire at post initial implementation (i.e., the end of the first intervention phase) and at the 10-week follow-up. According to the teacher’s responses to social validity open-ended questions, at post initial implementation, she indicated she liked the GBG because it was structured, not competitive, and allowed for immediate rewards for students. She also reported she thought it was effective, and she liked playing it during the middle of the class session. She indicated she wanted to continue all components of the GBG.
According to the teacher’s records and reports, she implemented the GBG 5 times over the course of the 10 weeks following the completion of the overall effectiveness study. She indicated some of those times she implemented the GBG during science class (with the same students). At the 10-week follow-up, she indicated she liked the GBG because it helped make expectations consistent and all students understand expectations. She reported she intended to continue to implement the GBG for the next school year and would “definitely start out the year” using it. She again indicated she wanted to continue all components of the GBG. She also indicated she still thought the GBG was as effective as it was initially. However, she did report one negative aspect of the game: Her students sometimes argued about which team they were on. When asked about reasons why she did not implement the GBG on a particular day during the follow-up phase, she said sometimes she did not play the game due to interruptions to the typical class routine such as students not being in class, student outbursts, or testing. Finally, we asked, “When and how often do you think it would be most beneficial/helpful to play the GBG (e.g., every day, occasionally, during trouble spots during the year)?” The teacher indicated she would start out the school year implementing it every day, she would fade it to approximately two times each week once student behavior was “in line,” and she would not tell students ahead of time when it would be played.
Student responses to social validity open-ended questions are presented in Table 2. Responses are presented for post-initial implementation and for follow-up. Their initial responses suggest positive perceptions about the GBG, but at the 10-week follow-up, Ron’s responses to the social validity assessment suggested less positive perceptions with him stating, “It’s not that fun anymore,” and he did not hope his teacher would continue to play the game. Deidre’s responses were neutral at follow-up.
Table 2. Students’ Responses to Open-Ended Social Validity Questions.
Discussion
The results of this study are important in at least three ways. First, these results provide further support for the GBG as a generally effective intervention for improving individual student behavior in an elementary, special education classroom for students with EBD. Specifically, visual analyses indicated the GBG was somewhat effective for at least one target behavior for each of the three students. Tau-U effect sizes indicated large and very large effects for both behaviors for Ron and Josh and for Deidre’s academic engagement, but the Tau-U effect size for Deidre’s disruptive behavior was in the not effective range. There were immediate improvements in disruptive behavior and academic engagement for all three students when the GBG was first introduced, and there were slight decreases in variability for all three students. Changes in the variability for Deidre’s disruptive behavior were particularly noteworthy. With the exception of Session 21 (when most of her disruptive behavior was coded for her leaning back in her seat), her disruptive behavior was markedly less variable in both intervention phases compared with baseline phases. Furthermore, for all three students, when the GBG was withdrawn, there were descending trends in academic engagement and increasing trends and more variability in disruptive behavior. When the GBG was reintroduced, improvements were generally replicated for all three students’ academic engagement during the second intervention phase with immediate improvements in behavior, but two students (Ron and Josh) exhibited variable disruptive behavior in the second intervention phase. These same two students exhibited slight decreasing trends in academic engagement and increasing trends in disruptive behavior near the end of the initial 3-week intervention phase and the second intervention phase. Nevertheless, data did not indicate a nontherapeutic effect for any student.
Results are consistent with previous research demonstrating the general effectiveness of the GBG across various classroom settings (Bowman-Perrott et al., 2016; Flower, McKenna, Bunuan, et al., 2014). These positive results are particularly meaningful because they support the conclusion that the GBG can result in relatively global improvements in student behavior. In this study, we reported data on student behavior during almost the entire class session; this is in contrast to typical research on the GBG where data are reported on student behavior occurring only during times when the GBG is being played.
Results from our preliminary within-session analysis also suggested students’ behavior was generally better in intervention phases, both during and not during the GBG, when compared with their baseline levels of behavior. Student behavior followed similar patterns during and not during the GBG, with overlapping data. However, these findings should be interpreted with caution because the instructional activities were different during and not during the GBG. Although tenuous, the within-session analysis supports Donaldson et al.’s (2015) hypothesis: “the absence of a worsening effect when the GBG was introduced in one activity and not in another suggests that targeting the most problematic times in the classroom with interventions will not produce increased behavior problems at other times” (p. 689).
Second, despite these general, positive outcomes, there was evidence of high variability and overlapping data across adjacent phases for all three students as well as idiosyncratic responses to the GBG. Perhaps this finding is an artifact of examining more global outcomes in the current study (i.e., behavioral outcomes for almost the entire class session). Although there were immediate improvements in behavior for all three students when the GBG was initially introduced, results indicate both Ron and Josh exhibited decreasing trends in academic engagement and increasing trends in disruptive behavior near the end of the initial 3-week intervention phase and on the last day of the second intervention phase. This observed response variability is consistent with data presented by Donaldson et al. (2017), where two students exhibited immediate improvements during the first intervention phase, but their disruptive behavior became more variable and slightly increasing toward the end of the second intervention phase.
There are several potential explanations for this slight potential deterioration of effects observed for Ron and Josh. These include the following: (a) the novelty of the GBG may have worn off; (b) the reward (candy) may have lost its reinforcing effectiveness; (c) playing the game only a few times each week (as opposed to daily) could have limited effectiveness; and (d) playing the game over the course of several weeks during intervention phases may have limited effectiveness over time.
Third, results from the social validity assessments provide additional information about the GBG as a sustainable intervention approach for students with chronic problem behaviors. The teacher’s and students’ responses to the social validity assessments suggest positive perceptions about the GBG after the initial intervention phase. This is consistent with previous reports of positive social validity information (e.g., Bowman-Perrott et al., 2016). At a follow-up visit 10 weeks after the end of the study, the teacher’s responses to the social validity assessment still suggested she had positive perceptions about the GBG. However, she reported having only implemented the GBG 5 times over the 10 weeks. This is consistent with information reported by Flower, McKenna, Muething, et al. (2014), where the teacher discontinued the GBG at a 5-week follow-up visit. Finally, at the 10-week follow-up, Ron’s responses to the social validity assessment suggested less positive perceptions, and Deidre’s responses were neutral.
We are unsure how to interpret this social validity information. We relied on teacher report, rather than direct observation, on how often she implemented the GBG after the end of data collection, and we did not collect data on student behavior at follow-up. Furthermore, the fact that she did not implement the GBG that often (on average, once every 2 weeks) does not necessarily mean it was less socially valid (in fact, her responses to our social validity questions at the 10-week follow-up indicated she still had positive perceptions about the GBG). Among other possible reasons, her less frequent use of the GBG may have been a result of her perceived improvements in student behavior or her desire to improve contextual fit for her class by fading implementation. We acknowledge teacher and student responses to our social validity questions may have been biased toward positive responses because the first author served as the teacher’s graduate advisor, and we acknowledge if we had not asked the teacher to record times when she played the GBG, she may not have played it again at all after the end of data collection. In addition, we do not have good rationales for why Ron’s perceptions were more negative at follow-up. And, although Deidre’s responses at follow-up were more neutral compared with her initial post-intervention responses, her responses may have been indicative of an overall general improvement in her behavior and her perception that she did not need the GBG anymore. We hope the social validity information reported here will encourage other researchers to conduct more comprehensive examinations of social validity and how it might change over time, but we caution against drawing broad conclusions at this time.
Limitations
In addition to the limitations discussed earlier, limitations to this study include the following. First, our behavioral definitions were very broad in an effort to capture student behaviors typical for a variety of students; this limited our accuracy in capturing specific, behavioral topographies. Also, there are limitations to our data collection procedures. Our initial goal was to collect data during the first 30 min of class each baseline and intervention session (to include most of the instructional activities each day which consisted of whole-group instruction followed by small-group instructional activities) and to ask the teacher to implement the GBG during the first 10 min of class in intervention sessions. This would allow us to capture a more global picture of the effects of the GBG (i.e., not just while the GBG was being played) in a special education classroom for students with EBD. After baseline data were collected and after we introduced the GBG to the teacher in this study, she asked if she could implement the GBG during the class time when students worked in small groups (which occurred toward the end of the class session) because she felt it was the most problematic activity. To support the needs of the class, we adjusted our research plan to honor the teacher’s request to implement the GBG during small-group activities toward the end of the class period. Doing so negated our ability to conduct a focused, direct examination and comparison of effects occurring during and not during the GBG because we did not record when instructional activities shifted from whole-group to small-group during baseline observation sessions.
Furthermore, we used 30-s intervals for our time sampling procedures; a smaller interval (e.g., 10 or 15 s) may have yielded better sampling of student behavior. Although 30-s intervals are longer than what are typically used in research studies using direct observations, results from comparisons of simulated observational data (Devine et al., 2011) to examine experimental effects support the validity of using 30-s intervals, especially when observation sessions are lengthy (i.e., 30-min vs. 10-min observation sessions). Specifically, Devine et al.’s (2011) simulations indicated recording duration events (e.g., academic engagement) during 30-min observation sessions and using momentary time sampling and 30-s intervals generated data paths that were comparable with data generated using continuous recording. However, although simulation studies support intervals as long as 30 s for momentary time sampling of duration events, they support much shorter intervals for partial interval recording (Devine et al., 2011; Rapp et al., 2008). In particular, when compared with observational data captured through continuous recording, changes in frequency events that are measured using partial interval recording are most likely to be detected using shorter intervals (e.g., 10-s intervals). As a result, we are most concerned about how we collected data on students’ disruptive behavior: our use of 30-s intervals for partial interval recording likely reduced the accuracy of those data and limited our ability to detect changes across baseline and intervention phases for that dependent variable. Furthermore, our selection of interval lengths was based primarily on feasibility rather than on data. For example, as suggested by Ledford et al. (2015), determining actual duration per occurrence of our target behaviors would have enabled us to select interval lengths that more closely matched the typical duration of occurrences of our target behaviors.
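The concern about interval length in partial interval recording can be illustrated with a small, contrived simulation. The sketch below is not a reproduction of the cited simulation studies; the behavior stream (a 2-s event at the start of every minute across a 30-min session) and the function names are hypothetical, chosen only to show how interval length changes the estimate.

```python
def continuous_percent(stream):
    """Percentage of seconds in which the behavior occurred."""
    return 100.0 * sum(stream) / len(stream)


def partial_interval_percent(stream, interval):
    """Percentage of intervals scored under partial interval recording.

    An interval is scored if the behavior occurs at any point within it.
    """
    scored = [
        any(stream[start:start + interval])
        for start in range(0, len(stream), interval)
    ]
    return 100.0 * sum(scored) / len(scored)


# Hypothetical 30-min session, one boolean per second:
# a brief 2-s event occurs at the start of every minute.
session_seconds = 30 * 60
stream = [(t % 60) < 2 for t in range(session_seconds)]

actual = continuous_percent(stream)           # about 3.3% of the session
pir_30 = partial_interval_percent(stream, 30)  # 50.0% of 30-s intervals
pir_10 = partial_interval_percent(stream, 10)  # about 16.7% of 10-s intervals
print(actual, pir_30, pir_10)
```

In this contrived case, both interval lengths overestimate how much of the session the behavior occupied, but the 10-s intervals come considerably closer to the continuous value than the 30-s intervals do.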
We did not collect behavioral data on all students in the classroom (e.g., two students moved into the classroom after the study began), so we cannot draw conclusions on class-level effects of the GBG for this type of class setting. We also could not control for potential influences the new students might have had on our target students’ behavior or the teacher’s behavior (although we recognize, in these types of classrooms, students frequently move in and out of the class); these changes in the overall classroom setting could certainly have influenced observed variability in student behavior across phases of the study.
We did not collect data on the teacher’s accuracy in assigning points during the GBG, and we did not collect data on whether the teacher stated they would or would not be playing the GBG at the beginning of class each day. We did not observe any instance (across the study) where the teacher gave out or made a verbal reference to a “buck” from her token economy; however, we did not collect data on this. Our treatment integrity checklist required observers to make subjective judgments about the teacher’s implementation of the GBG. All of these could have influenced internal validity in our study.
We relied on the teacher to report how often she implemented the GBG in the 10 weeks after data collection ended, and we used interview formats to gather social validity information from the teacher and students. Certainly, these limitations make it difficult to make definitive statements related to social validity and the validity of the GBG as a long-term intervention approach. We did not collect direct observation data during the follow-up phase; specifically, we did not collect maintenance or generalization data. This would have given us more information to help us evaluate long-term effects and factors which might have influenced the social validity of the GBG over time.
Implications for Practice and Future Research
Despite limitations, the results of this study highlight important implications for practice and for future research. Results reported here provide additional support for the general effectiveness of GBG, specifically for improving behavior for students who exhibit chronic problem behaviors in EBD classrooms. Nevertheless, even during intervention phases, student behavior was variable from day to day, and Ron’s and Josh’s behavioral trends slightly worsened near the end of the first intervention phase and on the last day of the second intervention phase. Anecdotally, on the last day of data collection and after their team lost the game, Ron put his head down on his desk and Deidre immediately walked away from the instructional area, went into the teacher’s office in the back of the room, and slammed the door loudly. We recommend teachers implementing the GBG in EBD classrooms anticipate day-to-day variability, plan for potential outbursts, and provide preemptive instructions to students on how to respond when their team wins or loses and provide students with feedback on their responses. In addition, teachers should consider when the GBG might be most beneficial for their students. The teacher in this study was most concerned about improving student behavior during small-group instructional activities and therefore selected to implement the GBG during this time only.
Future research is needed to examine strategies to expand the effects of the GBG over time, to other times when the game is not being played, and to reduce day-to-day response variability, particularly for students with long histories of problem behaviors and in classrooms for students with EBD. In their meta-analysis on effects of the GBG, Flower, McKenna, Bunuan, et al. (2014) reported longer durations of implementation did not result in greater effects on student behavior, and their analyses suggested a typical “immediate drop in challenging behavior, but only slight additional change over time” (p. 565). For two students in our study, their behavior began to get worse toward the end of the first and second intervention phases. Researchers should examine strategies that reduce this potential deterioration of effects. For example, it may be possible to see meaningful long-term effects by implementing the GBG on an intermittent schedule (as suggested by the teacher in our study) to produce “immediate drops” in problem behavior at different intervals across the school year without deteriorated effects. Furthermore, additional research is needed to expand our understanding of within-session effects of the GBG as well as carry-over of effects to other times of the day, and focused research is needed to examine strategies for improving carry-over effects of the GBG. For example, results reported by Donaldson et al. (2015) and Pennington and McComas (2017) demonstrated improvements in student behavior did not carry over to other classroom contexts. In our study, student behavior was somewhat better in intervention sessions even when the GBG was not being played. However, we need to identify technologies that would allow teachers and students to benefit from effects of the GBG even when the game is not being implemented (e.g., announcing the game will be played later; increasing purposeful precorrection, reminders of classroom rules, and teacher praise).
We also need to identify ways to expand and extend the effectiveness of the GBG (e.g., gradually increasing the length of the game or playing the game more frequently throughout the day); guard against losing the effectiveness of the GBG (e.g., adaptations, such as providing different options for reinforcers or considering student preference when selecting reinforcers, to keep students interested and motivated); and make it more likely teachers will continue to implement the GBG over time. Finally, future investigations are necessary to delineate teachers’ and students’ perceptions, and potential changes in their perceptions, about the GBG if implemented for extended periods of time.
Footnotes
Author Note
Jason R. Gordon is now affiliated with the University of Tennessee, Chattanooga, USA.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
