Abstract
Mathematics fluency is a critical component of mathematics learning, yet few attempts have been made to synthesize this research base. Seventeen single–case design studies with 55 participants were reviewed using meta–analytic procedures. A component analysis of practice elements was conducted and treatment intensity and feasibility were examined. Findings suggest that drill and practice with modeling produced the largest effect sizes. Treatments with three or more components yielded higher effect sizes than those with fewer than three, and a combination of treatment agents led to better outcomes than a single agent. Other findings pertaining to prebaseline assessment, treatment time, experimental design employed, and treatment setting are also discussed.
Many schools are implementing response–to–intervention (RtI) decision–making frameworks to support student learning and as part of the process of identifying students as learning disabled (LD; Individuals with Disabilities Education Improvement Act, 2004). Central to the RtI service–delivery approach is that school professionals have knowledge of the types of instruction and interventions that fall under the evidence–based category (Gresham, 2004). Moreover, implementation of evidence–based interventions within RtI is critical for the data to meet legal and ethical standards necessary to be used for LD identification (Burns, Jacob, & Wagner, 2008). Unfortunately, there are relatively few treatment summaries available to assist school professionals with the selection and identification of empirically supported mathematics computation interventions (Daly & McCurdy, 2002), which represents one potentially significant challenge associated with successful implementation of the RtI framework for mathematics.
A recent report from the National Mathematics Advisory Panel (NMAP, 2008) indicated that children in this country cannot solve single–digit addition, subtraction, multiplication, or division problems as quickly or efficiently as students from other countries. The NMAP postulated that these differences in computational fluency were related to the quantity and quality of practice within the classroom. In other words, teachers do not incorporate enough opportunities to practice basic mathematics skills (Daly, Martens, Barnett, Witt, & Olson, 2007), and few textbook curricula incorporate sufficient activities to facilitate fluent performance of basic computation facts (NMAP, 2008).
Basic fact fluency is an important component of mathematics skill development because it is required for successful independent living (Patton, Cronin, Bassett, & Koppel, 1997) and may serve as the foundation for applications related to time, money, and problem solving (Shapiro, 2004; Daly et al., 2007). Research has suggested that students who lack automaticity for basic facts may be less able to comprehend underlying mathematical concepts or access curriculum that emphasizes a problem–solving approach (Gersten & Chard, 1999), potentially because simultaneously attending to multiple tasks is challenging, particularly if these tasks are complex. Therefore, when skill execution is fast and accurate, attention resources can be allocated to the more complex tasks for processing (Goldman & Pellegrino, 1987). Research with students with specific learning disabilities frequently finds low levels of computation fluency (e.g., Calhoon, Emerson, Flores, & Houchins, 2007; Geary, 1993), and longitudinal research has found that fluency deficits tend to persist among students with mathematics disabilities (Jordan, Hanich, & Kaplan, 2003). Thus, underdeveloped computation fluency may be a hallmark of mathematics difficulties (Gersten, Jordan, & Flojo, 2005).
Mathematics Basic Fact Fluency Interventions
Research on fluency building interventions highlights practice as a key active ingredient (Daly et al., 2007; Rivera & Bryant, 1992), with some fluency researchers indicating that as much as 70 percent of instructional time be allocated to practice activities (Binder, 1996). This is in line with research that has consistently recommended the provision of drill and practice activities for students struggling with mathematics, including those with learning disabilities (e.g., Ashcraft, 1987; Goldman, Mertz, & Pellegrino, 1986, 1989). Although there are many options for mathematics–practice activities, not all forms of practice are equally effective. Several elements are necessary for practice to be productive including the use of materials at an appropriate level of skill difficulty (Burns, VanDerHeyden, & Jiban, 2006), brief practice opportunities that incorporate modeling, feedback and reinforcement (Daly et al., 2007; Fuchs et al., 2008; Rivera & Bryant, 1992), timed practice (Rivera & Bryant, 1992), and self–management of individual practice opportunities (McDougall & Brady, 1998).
Recent research by Powell, Fuchs, Fuchs, Cirino, and Fletcher (2009) has demonstrated that treatment packages designed to improve automatic fact retrieval that included direct practice on retrieval, timed practice with corrective feedback, and strategic counting surpassed treatment packages that omitted these features. However, these researchers pointed out that the active ingredients in these intervention packages still needed to be determined. Haring and Eaton (1978) suggested that drill and practice are different activities and that each serves a separate role in fluency building. According to Haring and Eaton, drill is the practice of isolated items whereas practice requires the use of learned responses in combination with previously learned responses. The authors postulated that the former may facilitate proficiency and the latter may lead to retention, maintenance, and generalization. However, other researchers have suggested that practice on individual items is the most important aspect of an effective intervention (Cohen, Servan–Schreiber, & McClelland, 1992; Symonds & Chase, 1992). Therefore, identification of the ingredients of practice that are common across computation intervention strategies might prove useful for future treatment planning (Kratochwill & Shernoff, 2004), help to isolate the mechanisms associated with improved learning, and could lead to novel applications (Kazdin, 2008; Skinner, Fletcher, & Henington, 1996).
In addition to examining the effects of mathematics computation interventions on fluent performance, the intensity of treatments is important to consider. Interventions used within tier 2 of an RtI framework should be more intense than core instruction alone, and those used in tier 3 should be more intense than tier 2 (Burns, Deno, & Jimerson, 2007). Intensity can be defined as the total number of weeks or sessions of treatment or variations in the number of treatment sessions per week (Barnett, Daly, Jones, & Lentz, 2004). Another way to determine intervention intensity is to examine the number of treatment elements necessary for treatment success (Barnett et al., 2004). For example, Swanson and Sachse–Lee (2000) found that academic interventions across subject areas with three or more elements produced larger effect sizes than those with fewer elements. Layering various intervention components, such as adding performance feedback to a modeled practice procedure, may be necessary for some students, whereas other students may benefit from performance feedback alone. Determining the least intrusive, yet most effective, intervention is not only resource efficient for the school, but also advantageous for the student (Barnett et al., 2004) and for the intervention agent, as easier interventions are more likely to be consistently and accurately implemented (Gresham, 1989).
Intervention intensity could also be measured by examining the individuals responsible for administering the treatment and the setting under which the treatment was provided (Kazdin, 2008). Research has illustrated that teacher preparation and knowledge of fluency and evidence–based interventions is limited (Begeny & Martens, 2006; Fuchs et al., 2008; Kratochwill, Volpiansky, Clements, & Ball, 2007) and that teachers benefit from direct assistance with novel treatment procedures (Gilbertson, Witt, LaFleur, Singletary, & VanDerHeyden, 2007). Therefore, whether teachers, students, researchers, or a combination of these individuals implement treatment procedures may influence treatment effectiveness.
Research Designs
Practitioners should not rely on any single study for policy or practice decisions, but should instead examine meta–analytic studies (Kavale & Forness, 2000). Moreover, the evidence–based practice movement has “afforded meta–analysis a permanent seat at the policy table” (Cordray & Morphy, 2009, p. 489). Previous research syntheses of the mathematics literature identified effective instructional elements that are beneficial to the learning needs of all students, including low–achieving students, those with learning disabilities, and students with emotional or behavioral disorders (Baker, Gersten, & Lee, 2002; Kroesbergen & Van Luit, 2003; Kunsch, Jitendra, & Sood, 2007; Swanson & Sachse–Lee, 2000; Templeton, Neel, & Blood, 2008). Much of this earlier work examined instruction or intervention across several mathematics content areas including basic skills (Baker et al., 2002; Kroesbergen & Van Luit, 2003; Kunsch et al., 2007; Swanson & Sachse–Lee, 2000) and isolated the impact of direct instruction (Baker et al., 2002; Kroesbergen & Van Luit, 2003; Swanson & Sachse–Lee, 2000), strategy instruction (Kroesbergen & Van Luit, 2003; Swanson & Sachse–Lee, 2000; Templeton et al., 2008), and peer–mediated (Kroesbergen & Van Luit, 2003; Kunsch et al., 2007; Templeton et al., 2008) treatment types. Direct and strategy instruction or a combination of the two were found to be highly effective across studies for ameliorating mathematics difficulties and peer–mediated instruction produced moderate effects (Kunsch et al., 2007). Only three of these research syntheses included single–case design studies (Kroesbergen & Van Luit, 2003; Swanson & Sachse–Lee, 2000; Templeton et al., 2008). Of these three, only one focused exclusively on computation (Templeton et al., 2008) and included fluency as a dependent variable.
Templeton and colleagues (2008) examined 15 computation or basic–facts mathematics intervention single–case design studies that targeted students with emotional or behavioral disorders. Although fluency was included as a dependent variable, it was combined with accuracy to examine the impact of treatments on general mathematics performance. Results suggested that studies including strategy instruction or nontraditional instructional delivery (e.g., cooperative learning, peer tutoring) produced larger effect sizes than studies that did not include these factors. In addition, studies that did not include an environmental accommodation (e.g., choice, feedback) also yielded higher effects than studies that did include such components. The only study or participant variable that impacted treatment effects was whether the primary focus of treatment was mathematics performance. Maintenance outcomes, when provided, were poor, but for the few studies that assessed generalization, outcomes were good. Templeton et al. (2008) used percentage of nonoverlapping data points, a single–case design effect size metric that has been criticized by some for being influenced by outliers (Parker & Hagan–Burke, 2007) and for its inability to differentiate between treatments (Parker, Hagan–Burke, & Vannest, 2007).
Study Purpose
Given the limited evidence available on mathematics fluency interventions, the need for fluency interventions in schools, and the need to clearly define interventions used within an RtI model, we conducted the current study to identify components of effective mathematics fluency interventions. Thus, we used meta–analytic procedures to synthesize the extant single–case design literature on mathematics computation interventions that address fluency. Our study extends the work of Templeton et al. (2008) by including all students struggling with computation fluency rather than isolating those identified as having emotional and behavioral disorders and by limiting our study to computation fluency only. Additionally, we chose to examine the number and type of fluency intervention components rather than evaluate the effects of strategy instruction, environmental accommodations, and delivery of instruction as did Templeton and colleagues. We examined only students within elementary school (through sixth grade) whereas Templeton et al. included students at the secondary–school level as well. We also used an alternative to nonoverlapping data—percentage of all nonoverlapping data (PAND; Parker et al., 2007)—as our statistic of interest. PAND is a nonparametric effect size that includes all data and can be converted to a phi coefficient. Finally, we analyzed the impact of type of initial assessment data collected and single–case design on treatment effectiveness.
The study was guided by the following research questions that were based on questions used in the Swanson and Sachse–Lee (2000) meta–analysis: (1) what effects were associated with various forms and types of practice; (2) how does the number of treatment components impact treatment effectiveness; (3) to what extent does treatment intensity impact treatment effectiveness; (4) to what extent does treatment effectiveness vary according to type of initial assessment data collected, treatment agent, or treatment setting; and (5) how does type of single–case design impact treatment outcome?
Method
Data Collection
The PsycINFO and Education Full Text electronic databases were searched for articles in December 2008 using the terms “digits” and “correct” (357 found), “math” and “fluency” (239 found), “multiplication” and “intervention” (72 found), “division” and “intervention” (113 found), “addition” and “intervention” (121 found), and “subtraction” and “intervention” (109 found). The following criteria were used to compare articles in order to find data usable for the current meta–analysis:
(1) The study implemented a mathematical computation (addition, subtraction, multiplication, or division) intervention.
(2) The study used a single–case experimental design and provided data for individual students.
(3) The student participants in the study were elementary–age students between grades 1 and 6.
(4) The study included baseline data presented in a digits–correct–per–minute (dcpm) metric.
(5) The study was published or in press in a peer–reviewed journal.
(6) The study presented quantitative data that could be used to compute effect sizes.
(7) The study was written in English.
After identifying 19 articles appropriate for the review from the electronic searches, the references of identified articles were reviewed to identify potential additional articles as recommended by Cooper (1998), yielding nine additional articles. However, of the 28 identified articles, 5 used a group of students as the unit of analysis within the single–case design, 5 used a metric other than dcpm (e.g., percentage of problems answered correctly), and 1 did not implement an intervention. Thus, a total of 17 studies met the inclusion criteria and were included in the meta–analysis. Only one of these studies overlapped with those articles identified by Templeton et al. (2008).
The current study used only single–case design studies because data were needed for individual students and to replicate the approach used by Swanson and Sachse–Lee (2000). Meta–analyses of single–subject research are controversial primarily due to the lack of an accepted effect size metric (Baron & Derenne, 2000), but recently developed approaches discussed later alleviate some of that concern. Moreover, our design involved identifying the level of skill before intervention for each individual student, which are data not presented in a group design (Burns et al., 2007).
The articles used in this study were also used in a previous meta–analysis (Burns, Codding, Boice, & Lukito, 2010), but the focus of that study was to examine the treatment by skill interaction between dcpm and treatment provided. In that analysis, we found that interventions characterized as addressing acquisition skills yielded a large effect size for students whose fact performance fell in the frustration range while fluency interventions yielded only a moderate effect size, partially supporting the treatment by skill interaction. Only a few studies implemented fluency interventions, rendering interpretation of findings for students with instructional–level skills difficult. The intention of the current study was to conduct a component analysis of these interventions as a first step toward identifying active treatment ingredients, as well as to examine treatment intensity. The methods described in this article are similar to those indicated in the previous article, but these analyses and findings are novel.
Coding
Studies that met inclusion criteria were systematically reviewed using a data–coding form we created (available upon request). The four–page data–coding form was developed to facilitate accuracy and consistency throughout the coding process and included participant variables, intervention variables, and study variables. Participant variables included the age and grade of participants and participant ethnicity. Intervention variables included: intervention description and components (e.g., modeling, drill, practice, reinforcement), intensity (i.e., number of minutes, weeks, and sessions per week), intervention agent (e.g., teacher, graduate student, or researcher), and intervention setting (e.g., classroom, library). Study variables included the initial (prebaseline) assessment of skills, single–case research design, intervention fidelity, interobserver agreement (IOA), and intervention acceptability. The first and second authors coded the included studies. Table 1 displays intervention and participant variables across included studies.
Table 1. Description of Included Studies across Treatment Type, Component, and Category, Treatment Agent, Treatment Setting, and Treatment Sessions
Note: CBA–ID: Curriculum–based Assessment–Instructional Design; CBM: Curriculum–based Measurement; GLMPT: Great Leaps Mathematics Placement Test (Mercer et al., 2002); IBTS: Iowa Tests of Basic Skills (Hoover, Dunbar, & Frisbie, 2001); KMDT: KeyMath Diagnostic Arithmetic Test (Connolly, Natchman, & Pritchett, 1976); WJ: Woodcock–Johnson Psychoeducational Battery (Woodcock & Johnson, 1977); WJ–R: Woodcock–Johnson Psychoeducational Battery Revised (Woodcock & Johnson, 1989); WRAT–R: Wide Range Achievement Test–Revised (Jastak & Wilkinson, 1984); IR: incremental rehearsal; CR: contingent reinforcement; CCC: cover–copy–compare; PF: performance feedback; FC: flashcards; GL: great leaps; TP: taped problems; SM: self monitoring; I: interspersal; PT: peer tutoring; WU: warm–ups; W: weeks; S: sessions. w/= with; w/o = without.
Intervention Variables
There were eight interventions used by the various studies. Seven studies used some version of cover–copy–compare (CCC). CCC provides students with complete learning trials through the completion of five steps: (1) look at a model of the mathematics problem with the answer included, (2) cover the mathematics problem with the answer, (3) record the problem with the answer, (4) uncover the mathematics problem with the answer, and (5) compare the answer (Skinner, McLaughlin, & Logan, 1997). Three studies used a flashcard procedure such as incremental rehearsal, a sequenced drill procedure that controls the ratio of known to unknown facts practiced (Burns, 2005). The following interventions were used by one study each: Great Leaps for Math (Mercer, Mercer, & Campbell, 2002), interspersal technique (i.e., independent worksheet practice where instructional–level items are interspersed with easier items), self–management (i.e., self–monitoring), contingent reinforcement for increased performance, taped problems (students are required to record answers to mathematical problems before the audio–recording reveals the answer), and timed warm–up probes. One study used a combination of timings, peer–delivered immediate feedback, and positive practice overcorrection. Interventions were analyzed according to their individual components and then divided into four categories: (1) practice with modeling, (2) practice without modeling, (3) drill, and (4) self–management. Practice that incorporated self– or teacher–directed modeling (e.g., CCC) was grouped as practice with modeling and practice activities without modeling (e.g., interspersal, contingent reinforcement) were classified as such. Treatments that included repeated practice of individual items (e.g., incremental rehearsal) were included as drill (Cohen et al., 1992).
We examined effect size according to these four categories as well as by number of components included using the categorical division of three steps over or under baseline conditions previously described by Swanson and Sachse–Lee (2000). The number of treatment components ranged from one to six and included the following types of elements: modeling, prompting, error correction, performance feedback, reinforcement, overcorrection, goal setting, drill, timed practice, and manipulatives.
We also analyzed effect sizes according to treatment intensity defined as the total sessions of treatment. Five studies reported total weeks, 11 reported the number of sessions per week, and all reported total number of sessions. Eleven studies reported a combination of sessions per week, total sessions, or total weeks.
Treatment setting and agent were reported in all 17 studies. Setting varied across studies and included the classroom (n= 8), an empty room (n= 3), resource room (n= 2), an office/conference room (n= 2), library (n= 1), and computer lab (n= 1). Researchers or research assistants (graduate or undergraduate students) employed treatments in six studies, teachers served as the treatment agent in three studies, and students directed their own intervention in two studies. A combination of treatment agents was utilized in six studies and included the following: (1) researcher and student (n= 3), (2) teacher and student (n= 1), and (3) teacher and researcher (n= 2).
Study Variables
The 17 articles used in the meta–analysis were coded according to whether or not assessment data were collected before the intervention occurred and the type of assessment used. Studies were coded as “Using Assessment Information” if they used assessment data to identify students for inclusion in the study and/or to determine which skill to address with the intervention; 76 percent (n= 13) of the studies met this criterion. Of the 13 studies, 5 used curriculum–based assessment or measurement (CBA/M) to identify students for participation by collecting preintervention data in the skill and comparing them to a screening criterion (e.g., less than 20 dcpm), 5 used a standardized mathematical achievement test (e.g., KeyMath–3, Connolly, 2007; Great Leaps Placement Test, Mercer et al., 2002), and 3 used a combination of data (i.e., CBM and standardized achievement test).
Forty–one percent (n= 7) of studies used some type of multiple baseline (MLB) design (i.e., across students, problem sets, multiple probe) and 47 percent (n= 8) used an alternating treatment design (ATD). The remaining two studies employed either a quasi–experimental design (AB) or a combination of MLB and ATD.
We coded for treatment integrity, acceptability, and IOA even though these variables were examined only descriptively. Twelve studies reported treatment integrity but only 11 studies provided data. Data pertaining to IOA were provided by 14 studies. Only three studies reported that acceptability was examined.
Data Analyses
There is considerable debate about effect sizes for single–case design research. The percentage of nonoverlapping data (PND; Scruggs, Mastropieri, & Casto, 1987) is perhaps the most straightforward, easy to compute, and commonly used single–case design effect size (Scruggs & Mastropieri, 1998), but PND does not differentiate well between two interventions (Parker et al., 2007). Moreover, the lack of a known sampling distribution prevents computing confidence intervals, PND can be too heavily influenced by a single outlying data point (Parker & Hagan–Burke, 2007), and PND is not a measure of effect, nor is it related to any effect size estimate (Parker et al., 2007).
PAND (Parker et al., 2007) is a nonparametric effect size that is based on Cohen's (1988) description of the relationship between effect sizes and percentage of data that do not overlap. PAND alleviates many of the difficulties associated with PND, and is computed with all of the data, not just a single baseline data point (Parker et al., 2007). More importantly, PAND can be converted to a phi coefficient, which is a commonly accepted effect size with a known distribution from which a confidence interval can be computed.
Because there is no single effect size that is clearly superior to others when synthesizing single–case design studies, we converted the data to PND and PAND, and then used the PAND to compute phi. Data were converted to a PND by counting the number of intervention data points that exceeded the highest baseline data point and dividing by the total number of intervention points. PAND was computed by counting the total number of baseline and intervention data points, dividing the number of intervention points that overlapped the highest baseline data point by that total, and taking the complement, so that PAND equaled the percentage of all data points that did not overlap. PAND was converted to phi using the formula presented by Parker and Hagan–Burke (2007). We also computed a mathematics fluency gain score by subtracting the first baseline data point score from the median of the intervention data, reflecting the increase in dcpm scores per intervention.
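The counting rules in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring procedure: the function names and the dcpm series are hypothetical, PAND is taken as the complement of the overlap proportion described above (an assumption consistent with its name), and the conversion to phi is omitted because the text does not reproduce the formula.

```python
from statistics import median


def pnd(baseline, intervention):
    """Proportion of intervention points that exceed the highest baseline point."""
    ceiling = max(baseline)
    exceeding = sum(1 for x in intervention if x > ceiling)
    return exceeding / len(intervention)


def pand_from_overlap(baseline, intervention):
    """Complement of the overlap proportion described in the text.

    Assumption: an intervention point "overlaps" when it is at or below the
    highest baseline point, and the denominator is all data points.
    """
    ceiling = max(baseline)
    overlapping = sum(1 for x in intervention if x <= ceiling)
    total_points = len(baseline) + len(intervention)
    return 1 - overlapping / total_points


def gain_score(baseline, intervention):
    """Median of the intervention data minus the first baseline data point."""
    return median(intervention) - baseline[0]


# Hypothetical dcpm series for one student (not taken from any reviewed study).
baseline = [8, 10, 9]
intervention = [9, 12, 14, 15, 18, 20, 22]

print(round(pnd(baseline, intervention), 3))
print(round(pand_from_overlap(baseline, intervention), 3))
print(gain_score(baseline, intervention))
```

In this made-up series, six of the seven intervention points exceed the highest baseline value of 10, and one of the ten total data points overlaps it.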
Because PND and PAND are nonparametric data, we reported median scores for those data, along with a mean phi coefficient, and the 95 percent confidence interval around the mean phi based on the standard error of the mean. Due to the relatively small sample sizes, we also provided a median dcpm gain score. Median PND and PAND values of .80 were judged to indicate an effective intervention (Burns & Wagner, 2008; Scruggs & Mastropieri, 1998). A phi coefficient of .29 or less was considered negligible, .30 to .49 was small, .50 to .69 was moderate, and .70 or higher was strong (Cohen, 1988). Phi was primarily interpreted because it is a commonly accepted effect size, whereas neither PND nor PAND is, and because confidence intervals can be computed for phi, which greatly enhances the interpretability of the data (Hedges & Olkin, 1985). Moreover, gain scores can be problematic to interpret, especially with small samples (Norman, 1989). Thus, gain scores are included to assist in interpreting the findings, but the research questions are examined with the phi coefficient.
There was a relatively small number of studies from which mean phi coefficients were computed. Thus, a Fail–Safe N for effect sizes (Orwin, 1983) was computed to determine the potential influence of a file–drawer problem (Rosenthal, 1979). The criterion phi for a negligible effect of .29 was used in the formula. The result was the total number of additional studies that would have to be found in order to change the resulting observed mean phi to one with a negligible magnitude.
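The interpretive bands and the confidence interval described above could be applied as in the following sketch. Several details are assumptions rather than reported procedure: the function names are hypothetical, the bands are applied to the magnitude of phi, the interval uses a normal multiplier of 1.96 (the text does not say whether a z or t multiplier was used), and the phi values are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def interpret_phi(phi):
    """Label the magnitude of phi using the bands stated in the text."""
    magnitude = abs(phi)  # assumption: direction is reported separately
    if magnitude >= 0.70:
        return "strong"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.30:
        return "small"
    return "negligible"


def mean_phi_ci(phis, multiplier=1.96):
    """Mean phi with an interval built from the standard error of the mean."""
    m = mean(phis)
    se = stdev(phis) / sqrt(len(phis))
    return m, m - multiplier * se, m + multiplier * se


# Hypothetical phi coefficients for one treatment category.
phis = [0.40, 0.60, 0.80]
m, lower, upper = mean_phi_ci(phis)
print(interpret_phi(m), round(lower, 2), round(upper, 2))
```

With so few effects per category, an interval based on the t distribution would be wider than this normal approximation.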
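As a rough illustration, Orwin's fail-safe N is commonly written as N_fs = N_0 (d_0 - d_c) / (d_c - d_fs), where N_0 is the number of observed studies, d_0 their mean effect, d_c the criterion effect, and d_fs the assumed mean effect of the unlocated studies. The sketch below assumes d_fs = 0 and uses a hypothetical observed mean phi; neither value is reported in this section.

```python
def orwin_fail_safe_n(n_studies, mean_effect, criterion_effect,
                      missing_mean_effect=0.0):
    """Number of unlocated studies needed to pull the mean effect to the criterion.

    missing_mean_effect is the assumed average effect of the unlocated studies;
    zero is a common default and is an assumption here.
    """
    if criterion_effect <= missing_mean_effect:
        raise ValueError("criterion must exceed the assumed missing-study effect")
    return (n_studies * (mean_effect - criterion_effect)
            / (criterion_effect - missing_mean_effect))


# Hypothetical inputs: 17 studies, observed mean phi of .58, criterion phi of .29.
print(round(orwin_fail_safe_n(17, 0.58, 0.29), 1))
```

In this made-up case, 17 additional null-effect studies would halve the mean phi from .58 to the .29 criterion.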
Interscorer Agreement
A random subsample of 35 percent (n= 6) of all identified studies was independently coded by a third rater (the third author) to evaluate the interrater agreement relative to coding procedures used by the first and second authors. Interscorer agreement was examined for each item coded. The following formula was used to determine interobserver agreement: Agreement Rate = (No. of observations agreed upon/total no. of observations) × 100 percent. Reliability across categories ranged from 81 to 97 percent with an overall average agreement of 90 percent.
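The agreement formula above amounts to a point-by-point comparison of two raters' codes. A minimal sketch follows, with a hypothetical function name and invented codes.

```python
def agreement_rate(rater_a, rater_b):
    """Percentage of coded items on which two raters gave the same code."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must code the same items")
    agreed = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return agreed * 100 / len(rater_a)


# Hypothetical design codes for ten items; the raters differ on the last one.
primary = ["MLB", "ATD", "ATD", "MLB", "AB", "ATD", "MLB", "MLB", "ATD", "ATD"]
third_rater = ["MLB", "ATD", "ATD", "MLB", "AB", "ATD", "MLB", "MLB", "ATD", "MLB"]
print(agreement_rate(primary, third_rater))
```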
Results
Descriptive Data
Fifty–five participants (M= 3.6, SD= 1.5) were included in the 17 studies with 24 boys and 25 girls (two studies did not report gender). Participants’ average age was 10 (SD= 1.32) and average grade was 4 (SD= 1.3). Nine studies reported the racial composition of their sample: five studies included participants from multiple races, three studies included African American participants, and one study included Caucasian participants.
All students included were identified as performing below expectations for mathematics with 10.9 percent identified with a learning disability in mathematics, 12.7 percent labeled with a behavioral disorder, 21.8 percent identified as mentally retarded, and 54.5 percent of the students were not identified with any disability. The median baseline score for students in second grade was 8.75 dcpm, 10.00 dcpm for third–graders, 6.50 for fourth–graders, 24.00 for fifth–graders, and 28.63 for sixth–graders. The baseline dcpm score for students in second, third, and fourth grades fell within a frustration range, and the data for the fifth– and sixth–graders fell at the lowest extreme of the instructional range (Burns et al., 2006).
The median gain scores for each grade were 4.38 dcpm for second grade, 8.00 dcpm for third grade, 10.00 dcpm for fourth grade, 7.50 dcpm for fifth grade, and 18.25 dcpm for sixth grade. A nonparametric Kruskal–Wallis analysis of these data found a nonsignificant effect, χ²(df = 4) = 7.66, p = .11. Therefore, gain scores were collapsed across grade for subsequent analyses.
Of the 11 studies providing data, treatment integrity was examined across an average of 26 percent of sessions (SD= 10.0) and the average score was 98 percent (SD= 3.0). The average percentage of sessions across the 14 studies for which IOA was conducted was 28 percent (SD= 7.8) and percentage agreement was 99 percent (SD= 0.9). Only three studies reported acceptability data and all three reported that treatments were acceptable.
Research Questions
Effect sizes were computed across the four types of practice, the number of treatment components, and total treatment sessions. Analyses were also conducted to determine the extent to which treatment effects varied according to type of initial assessment data collected, treatment agent, and treatment setting. The final set of analyses examined the impact of single–case design on treatment outcomes.
Treatment Categories
Table 2 provides the median gain score, PND, PAND, and phi coefficients for each of the four treatment categories. The plurality of effect sizes (45 percent) computed from these studies was categorized as practice with modeling. Practice with modeling and drill produced strong effect sizes and self–management yielded a moderate effect size. Practice without modeling, which constituted 31 percent of all effects, produced negligible effect sizes.
Table 2. Median Gain in Digits Correct per Minute (dcpm) and Effect Sizes According to Type of Practice
Treatment Components
In order to examine the impact of number of treatment components on effect size, we used a criterion of <3 or ≥3 steps over baseline (Swanson & Sachse–Lee, 2000). Table 3 illustrates that, as expected, fewer treatment components produced negligible effects whereas more components produced a moderate effect size. That said, for the majority of effect sizes computed (76 percent), treatment incorporated three or more components.
Table 3. Median Gain in Digits Correct per Minute (dcpm) and Effect Sizes According to Number of Treatment Components
Treatment Intensity
The total number of sessions (M= 27.0, SD= 8.7) was used to determine treatment intensity and was categorized with the same heuristic that was used by Swanson and Sachse–Lee (2000). Thus, the studies were coded as 29 or fewer total sessions or 30 or more total sessions. As shown in Table 4, a large effect was found for studies that used 29 or fewer intervention sessions. A small effect was found for studies that used 30 or more intervention sessions.
Table 4. Median Gain in Digits Correct per Minute (dcpm) and Effect Sizes According to Number of Intervention Sessions
Prebaseline Assessment
Table 5 presents results illustrating whether assessment of skills and/or skill levels prior to baseline impacts effect size. Studies that included no prebaseline assessment or that used CBA/M yielded a small effect size. A cautionary note is that the CBA data represent only three students (5 percent). A moderate effect was found for indirect mathematical achievement tests and a large effect was found for the combination of CBA/M and indirect mathematical achievement tests.
Median Gain in Digits Correct per Minute (dcpm) and Effect Sizes According to Prebaseline Assessment Conducted
Treatment Agent
Table 6 presents effect sizes according to the person who implemented the intervention. Small effect sizes were yielded when the teacher implemented the treatment alone, but large effects were found when students implemented their own treatment. Researcher–implemented interventions resulted in a small mean effect with a confidence interval that included zero. Combinations of treatment agents produced moderate to large effect sizes. Researchers and students produced moderate effect sizes but the combination of teacher and student or teacher and researcher led to large effects.
Median Gain in Digits Correct per Minute (dcpm) and Effect Sizes According to Treatment Agents Responsible for Employing Interventions
Treatment Setting
As illustrated in Table 7, moderate to large effects were found across treatment settings with two exceptions. A negligible effect size was yielded for the computer lab, and a large but negative effect size was found for the library. The classroom was the most common treatment location (47 percent of all effects computed). Treatments employed in the resource room produced the largest effect sizes.
Median Gain in Digits Correct per Minute (dcpm) and Effect Sizes According to Setting the Treatment was Implemented
Experimental Design
Table 8 presents effect sizes according to the single–case design employed. A negligible effect was found for AB designs. Small effect sizes were found for ATD and large effects were found for MLB. The combination of MLB and ATD produced a moderate average effect size. There were relatively few effect sizes derived from MLB and ATD combinations, and the confidence interval was large, but the lower end of the confidence interval exceeded the criterion for a negligible effect (φ = .29).
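As a point of reference for the φ values reported in these tables, the sketch below shows the standard phi coefficient for a 2 × 2 table. It assumes φ is derived from a cross-classification of phase (baseline vs. treatment) by data-point status (overlapping vs. nonoverlapping); the cell counts are hypothetical and the exact tabulation used in this synthesis may differ.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for the 2 x 2 table [[a, b], [c, d]].

    Assumed layout (hypothetical):
        a = treatment points, nonoverlapping   b = treatment points, overlapping
        c = baseline points, overlapping       d = baseline points, nonoverlapping
    """
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


# Hypothetical counts, not data from the reviewed studies.
phi = phi_coefficient(9, 1, 2, 8)
print(round(phi, 2))
```

A φ near 0 indicates that phase membership tells little about whether a point overlaps, whereas a φ near 1 indicates nearly complete separation of the phases.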
Median Gain in Digits Correct per Minute (dcpm) and Effect Sizes According to Single–Case Experimental Design Employed
Discussion
The purpose of this study was to use meta–analytic procedures to evaluate the extant literature on mathematics basic fact interventions that focus on building fluency, defined as digits correct per minute (dcpm). This research synthesis was guided by the following questions: (1) What effects were associated with various types of practice? (2) Does the number of treatment components impact treatment effectiveness? (3) Does treatment intensity impact treatment effectiveness? (4) Does treatment effectiveness vary according to type of initial assessment data collected, treatment agent, or treatment setting? and (5) Does type of single–case design impact treatment outcome?
Practice has been described as a key ingredient of interventions targeting computational fluency (Binder, 1996; Fuchs et al., 2008; Daly et al., 2007; Rivera & Bryant, 1992) and for those with specific mathematics learning disabilities (Ashcraft, 1987; Goldman et al., 1986, 1989). In fact, Fuchs et al. (2008) described both drill and practice as critical elements of effective intervention for students with mathematics challenges. Many forms of practice have been used across treatments that target fluent performance of basic mathematics skills, including practice with and without modeling; drill, which incorporates practice on individual skill items; and self–management of practice activities. The results of this analysis suggest that drill and practice with modeling produce the largest treatment effects, with self–management yielding moderate effects.
Perhaps somewhat unexpected was that treatments incorporating practice without modeling, as in the case of interspersing easy with more difficult items on worksheets or using contingent reinforcement to increase the number of dcpm within a practice session, yielded no effects. This latter finding may have been related to students’ initial average baseline performance. That is, the average dcpm for students in grades two and three was below 14 dcpm, and for students in grades four and above, it ranged from 6 to 29 dcpm, suggesting that performance for a majority of participants fell in the frustration range (Burns et al., 2006). Therefore, these treatment elements alone may not be effective for students falling substantially behind their grade–level peers. This is commensurate with research by Tournaki (2003), who found that adding demonstration to standard drill and practice procedures resulted in better outcomes for students with learning disabilities, and with research suggesting that students with low fluency benefited more from a treatment including modeling than one without (Codding et al., 2007). Not surprisingly, and consistent with the meta–analysis from Swanson and Sachse–Lee (2000), we found that the greater the number of treatment components provided, the larger the effect size. This may suggest that students with fluency rates falling below expected levels benefit from more intense treatment packages (Barnett, Daly, Jones, & Lentz, 2007).
In addition to identifying effective practice procedures for fluency–building activities, we examined treatment intensity by quantifying the number of treatment sessions and found large effects for studies that used fewer than 30 intervention sessions and a small mean effect for studies that used 30 or more sessions. These data should not be interpreted to mean that adding intervention sessions reduces effectiveness, because an important intensity variable missing from our data is the number of minutes treatments were employed in one session. Number of minutes of treatment ranged between 3 and 30; however, not all studies isolated the treatment time from class instruction or assessment. Moreover, it could be that the interventions were stopped at 29 or fewer sessions because the intervention was effective and the deficit was deemed to be sufficiently remediated. Treatment dose is a critical variable when considering limited school resources (e.g., time, personnel), leading some researchers to suggest that decisions regarding which treatments are implemented should be made according to learning rates (Skinner, 2008). That is, the preferred treatment is the one that produces functional levels of fluency with the most efficient use of instructional time. Experimental manipulation of time allocated to treatment per session, sessions per week, or total time to a desired criterion is clearly needed in future research that evaluates treatment effectiveness on computation fluency.
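The learning-rate idea can be made concrete with a small sketch. It assumes learning rate is expressed as dcpm gained per minute of instructional time, which is one plausible reading of the efficiency argument above. The function name and all values are hypothetical.

```python
def learning_rate(baseline_dcpm, final_dcpm, sessions, minutes_per_session):
    """Gain in dcpm per minute of treatment time (an assumed operationalization)."""
    total_minutes = sessions * minutes_per_session
    if total_minutes <= 0:
        raise ValueError("total treatment time must be positive")
    return (final_dcpm - baseline_dcpm) / total_minutes


# Two hypothetical treatments that start from the same baseline fluency.
rate_a = learning_rate(baseline_dcpm=10, final_dcpm=30, sessions=20, minutes_per_session=5)
rate_b = learning_rate(baseline_dcpm=10, final_dcpm=34, sessions=30, minutes_per_session=10)

# Treatment B ends with the higher fluency, but Treatment A gains more per minute.
print(round(rate_a, 2), round(rate_b, 2))
```

Under this framing, a treatment with a smaller total gain can still be preferred if it reaches a functional fluency level in far less instructional time.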
In order to examine feasibility factors of treatment implementation, we examined effect sizes according to treatment agent, setting, and type of prebaseline data collected to identify skill level or study appropriateness. The studies examined herein used a variety of initial assessment methods for determining target skill selection including use of CBA, computation CBM, and standardized achievement tests. Small effect sizes were produced when no prebaseline assessment was used or when CBA or CBM procedures were implemented before baseline data were collected. The small effect sizes found when CBA/M were used could be due to the lack of empirically derived criteria for making instructional decisions. That is, only recently did Burns and colleagues (2006) empirically derive reliable and valid criterion levels for grades 2 through 5. Moderate effects were yielded when standardized achievement tests were administered and large effect sizes were found when both CBA/M and standardized measures were implemented. It is possible that a combination of assessment procedures was useful to provide an initial screening of students’ performance because, as previously noted, students’ basic skills were low. Although students in fifth and sixth grade generated performance levels that fell within the low end of the instructional range (Burns et al., 2006), basic fact fluency is expected to be mastered by the end of fourth grade indicating that these students were experiencing significant skill deficits perhaps characteristic of students in need of the most intensive levels of support (Batsche et al., 2005; Shapiro, 2004).
A combination of treatment agents yielded the two largest effect sizes. Specifically, when teachers and students or researchers and teachers worked together to implement intervention procedures, large effect sizes were found. Teachers implementing treatments alone resulted in small effect sizes, suggesting that teachers benefit from having additional support to employ treatment strategies, whether from researchers or from the students themselves. This finding is not surprising and reflects treatment integrity research illustrating the added benefit of providing direct assistance to teachers in order to enhance treatment implementation (e.g., Gilbertson et al., 2007). When students implemented their own intervention, a large effect size was also demonstrated, illustrating the benefits of students’ gaining responsibility over their own learning (McDougall & Brady, 1998). This finding holds promise for including time– and resource–efficient practice activities in classrooms. However, this benefit of student–directed interventions contrasts with Templeton et al. (2008), who found no differences between student– and other–directed interventions. Their findings may be the result of the population of students investigated or of the fact that not all combinations of intervention agents were analyzed. In the present study, only a small effect size was rendered when the researcher or a combination of the researcher and student employed the intervention. Previous research has demonstrated that CBM performance is better under more natural school conditions and with teachers (Derr–Minneci & Shapiro, 1992), so perhaps this logic could serve as one explanation for these findings.
Moderate to large effect sizes in the expected direction were generated across locations of treatment implementation (resource room, office, empty classroom, classroom) with the exception of the computer lab, where the effect size was negligible, and the library, where the effect size was large in the unexpected direction (students performed worse). It is important to point out that only a small number of students received interventions in these ineffective settings and the results are likely due to other treatment or study variables. For example, the intervention administered in both settings was characterized as practice without modeling, which was found to be a less effective practice strategy. It is possible that treatments employed in students’ classroom settings would produce effects that transferred to classroom mathematics activities more readily than treatments provided in an office, but we did not analyze performance across these dimensions. That said, Templeton et al. (2008) similarly found that setting did not impact mathematics treatment effects.
Because there are several methodological challenges associated with academic intervention research (Daly, Hintze, & Hamler, 2000; Martens & Eckert, 2007), we examined the impact of single–case design employed on effect size. As might be expected, multiple–baseline (MLB) designs yielded the largest effect sizes, followed by the combination of MLB and ATD. ATD produced small effects and AB designs led to no discernible effects. MLB may be particularly suitable for examining treatment effects as treatments are implemented in a sequenced manner across students or sets of problems. Although ATD seems to be a theoretically logical choice that is advantageous for comparing treatments, carryover and practice effects are likely (Martens & Eckert, 2007), particularly in mathematics (Lee & Tingstrom, 1994). As illustrated in a recent special series, adapted changing criterion or adapted alternating treatments designs may offer other useful alternatives (Martens & Eckert, 2007).
These findings should be considered in the context of the study limitations. First, the dcpm criteria for inclusion identified frustration and instructional levels through grade 5, and we included students in grade 6 using the same criteria. Second, we did not evaluate effect size across other dimensions of practice such as cumulative or sequenced practice sessions or number of complete learning trials (Daly et al., 2007; Skinner, 2008). Third, this analysis includes only basic fact intervention studies that used single–case designs and therefore may not be representative of all basic–fact fluency research. Fourth, we did not examine maintenance or generalization of treatment effects. Fifth, although we provided descriptive evidence of treatment integrity, IOA, and acceptability, these variables were not included in the meta–analysis. Sixth, there could be unidentified overlap in the components reviewed. For example, it could be that studies that used students as the treatment agent also extended over the greatest number of weeks. Thus, the effect sizes computed from relatively few studies should be interpreted cautiously. Seventh, although we found categorization of studies into practice with and without modeling, drill, and self–management straightforward based on review of the methods from individual studies, it is possible that we omitted other key components that could have led to different groupings. Finally, some of the fail–safe numbers were relatively low, so some of these findings warrant greater confidence than others.
The conditions under which practice activities that did not incorporate modeling are effective for fluency building should be examined in future research. Other elements of practice could be operationalized in empirical studies. It is still unclear how many opportunities to respond are required for students to gain fluent computation performance that meets an established criterion. Furthermore, practice conducted in the context of word problems was not examined and recent work by Fuchs et al. (2009) has suggested that skill transfer to algebra and word problems as well as fact retrieval and procedural calculation may only occur when direct instruction and practice on word problems is provided. Although we found that more treatment components led to better outcomes, the majority of students’ skills fell in the frustration range. Whether the number of treatment components needed could be reduced depending on students’ level of skill proficiency is worthy of future study. Perhaps, as suggested by Skinner (2008), research that explicitly manipulates the time or dose of treatment will provide the most useful information for determining which intervention strategies to use. In this analysis, we were only able to examine the association of dose with effect size. Synthesizing the maintained and generalized effects of treatments that target computation fluency should be considered in future meta–analyses, and a greater number of studies need to incorporate these variables in the study design. The type of prebaseline assessments necessary for administration should be a consideration of future research to provide a clearer explanation of our findings. Future researchers might consider our findings on design in the construction of their studies. For example, MLB or other adapted designs might be selected over ATD or researchers should incorporate design elements that minimize carryover and practice effects (Barlow & Hersen, 1984; Martens & Eckert, 2007).
In summary, this research synthesis suggests that practice with modeling, drill, and treatments with three or more components were highly beneficial for the included study participants, who tended to exhibit basic computation fluency that fell in the frustration range or was unexpectedly low considering students’ grade level. A combination of treatment agents generally led to better treatment effectiveness (although students’ own treatment implementation also yielded good outcomes), and most settings used were conducive, at least immediately, to good outcomes. For these studies, it appears that fewer than 30 total treatment sessions yielded better treatment effects and studies that employed MLB designs produced strong effects. A combination of CBA/M and standardized indirect assessment methods administered before baseline produced the largest treatment effects.
