Abstract
Science, Technology, Engineering, and Mathematics (STEM) education initiatives have placed pressure on teachers to bring technology tools into classrooms, including three-dimensional (3D) printing. Yet, little research has examined what specific math skills are required for 3D printing technology. This article describes a follow-up analysis of findings from a quasi-experimental study that tested the feasibility of a 3D geometry instruction program, Anchored Instruction with Technology Applications (AITA), which is based on Enhanced Anchored Instruction and designed to help students visualize and construct 3D models. Although we found that AITA improved math outcomes of students with math learning disabilities (MLD) in the previous analysis, we only used composite scores encompassing a variety of math and spatial tasks. In this study, we employed item response theory and differential item functioning to examine the impacts of MLD on students’ spatial thinking skills, identify the types of items that assess the intended skills in a valid way, and provide detailed information on whether student ability and MLD status led to different results in the assessment of students’ spatial thinking skills. Results showed that students with MLD struggle to learn spatial thinking skills, and AITA was a significant positive indicator of improved spatial thinking skills for students both with and without MLD.
Geometry is a prominent focus of math curriculum (National Council of Teachers of Mathematics [NCTM], 2000; National Governors Association Center for Best Practices, Council of Chief State School Officers, 2010). More specifically, the NCTM recommends that K–12 math instructional programs focus on the identification of geometric shapes and spatial relationships; application of transformations and symmetry; and use of visualization, spatial reasoning, and geometric modeling to solve problems. The final report of the National Mathematics Advisory Panel (NMAP, 2008) emphasized the importance of the spatial skills needed for teaching and learning geometry, and the proficiency standards measured by the National Assessment of Educational Progress (NAEP) suggest that students should be able to use informal geometric concepts in solving problems using technological tools such as computers and geometric shapes in eighth grade. Furthermore, research supports that students’ spatial skills predict their future success in Science, Technology, Engineering, and Mathematics (STEM) coursework (Uttal & Cohen, 2012; Wai et al., 2009), which promotes access to career and life opportunities.
Technology, increasingly prevalent in our lives, is another significant focus in our education system that has indelibly changed instructional activities across the nation. The market for K–12 software has exceeded 8 billion dollars in the United States. Moreover, families, schools, and governments increasingly value technology as a central part of education (Bulman & Fairlie, 2016). However, Escueta and colleagues (2017) noted that although most agree that educational technology can be helpful under some circumstances, education researchers are far from reaching consensus on what types of educational technology are most worth investing in and in which contexts to use them. The speed at which new technology tools and technology-based interventions reach the market seems to outpace the ability of education researchers and policy makers to conduct rigorous evaluations of such technologies to inform policy and practice (Molnar, 2017).
Three-dimensional (3D) printing technology, for example, has been widely used in school settings as a tool with promising potential for engaging students in STEM activities through technology-rich, hands-on interaction (Lacey, 2010; Murray, 2013; Sheridan et al., 2013, 2014). With numerous STEM education initiatives, 3D printing technology has provided an opportunity for teachers to bring cutting-edge technology into classrooms to replace traditional hands-on manipulatives for teaching geometric skills. However, little research has been conducted to investigate the extent to which teachers and students benefit from the use of 3D printing technology in their classrooms, and limited research has been published that demonstrates the effectiveness of 3D printing technology on student academic outcomes, especially for students with or at risk for math learning disabilities (MLD; Buehler et al., 2016). Although most education researchers agree conceptually that technology tools like 3D printing have potential utility in school settings, data are not well established that demonstrate the feasibility, usability, and promise of 3D printing technology for instructional facilitation and student learning.
Experts anticipate that a problem with expanding 3D printing technology to the field of education is that teachers and students may lack the computer-aided design (CAD) expertise and skills to create a digital model from scratch in software (Elrod, 2016; Pryor, 2014) and may not know how to leverage geometric measurement and spatial skills to facilitate this process. Of the limited studies conducted with 3D printers, the majority focused on the procedural skills necessary to create 3D products rather than on teaching the underlying essential geometric concepts and spatial thinking skills that facilitate the creation of 3D products (Butler et al., 2003; Maccini & Ruhl, 2000; Scheuermann et al., 2009). Without instruction in the specific geometry standards and spatial thinking skills that are called for in math standards (e.g., National Governors Association Center for Best Practices, Council of Chief State School Officers, 2010) and opportunities to apply these skills in structured learning contexts, students with or at risk for MLD will be more likely to struggle to apply school math in solving authentic problems and have difficulties in seeing spatial relationships in real-world situations (Lipson, 2007).
Other factors that may cause low utility of 3D printing technology especially in special education instructional settings include an unsupported line of research, limited professional development or workshop opportunities, and lack of curriculum-based instructional resources. Moreover, the focus of math interventions for students with or at risk for MLD has been limited to basic facts or simple computation and using drill-and-practice for remedial instruction (Sarama et al., 2011). This predominant instructional approach has resulted in students with or at risk for MLD having little to no formal instruction and guided support to develop their spatial skills and to apply formal geometric education to real-world skills, and has caused them to lack the knowledge and skills necessary to fully engage in STEM activities.
Host Study Overview
To address the emerging need to provide teachers with instructional materials for teaching spatial thinking and real-world geometry skills, we developed and tested the feasibility of a technology-based contextualized geometry intervention, called Anchored Instruction with Technology Applications (AITA; Choo, 2017). Based on the premise that technology is an essential tool for teaching math that can improve students’ geometry outcomes (Johnson et al., 2015) and spatial thinking skills are sufficiently malleable to make teacher instruction practically feasible (Uttal et al., 2013), we integrated 3D printing projects into an existing Enhanced Anchored Instruction (EAI; Bottge et al., 2010) curriculum that teaches the standards for mathematical practice that are part of the Common Core State Standards for mathematics (CCSS-M) with a specific focus on improving spatial thinking skills for students with MLD. In addition to utilizing EAI’s video-based anchored problem component (see Bottge et al., 2014, 2015), in the host study we developed a series of technology-based geometry lessons that employed explicit instruction to teach foundational concepts of basic geometry and contextualized instruction using 3D printing projects to teach CAD skills.
We conducted a quasi-experimental study to test AITA’s feasibility and usability compared with EAI and business-as-usual (BAU) instruction in the classroom settings with 90 middle school students including those with MLD (Choo, 2017). We provided teacher training prior to the study and daily lesson plans that included lesson objectives, detailed instructional activity manuals, and other materials needed to implement the lessons. Teachers assigned to the EAI condition followed Bottge and colleagues’ (2014, 2015) EAI curriculum, and teachers assigned to the BAU condition followed their regular school math curriculum. All conditions addressed the same CCSS-M standards across three domains: Number System, Ratios and Proportional Relationships, and Geometry. The overall mean instructional dosage was 40, 38, and 39 days (one 60-min lesson per day) for BAU, EAI, and AITA classes, respectively.
We conducted a total of 19 full class period observations: six in AITA classrooms, seven in EAI classrooms, and six in BAU classrooms, distributed across the study. Observation data included general information for each classroom (e.g., school, teacher, number of students), minutes of instructional time, and ratings of student engagement across conditions. For EAI and AITA conditions, we evaluated the level of treatment fidelity by rating alignment to prespecified lesson plans. For the BAU condition, we described instructional activities to evaluate content alignment with EAI and AITA conditions. Four observations involved a secondary observer to index reliability (one in AITA, two in EAI, and one in BAU), which accounted for 21.1% of the total number of observations. Interobserver agreement values were 98% for the AITA condition, 99% for the EAI condition, and 98% for the BAU condition.
Researcher-developed tests of students’ spatial thinking and problem-solving skills and two standardized achievement subtests that focused on computation and problem-solving were administered at pretest and posttest to compare the effects of three different instructional approaches on students’ math skills. Although baseline equivalence was not established for analytic groups at pretest and effect size differences did not meet the What Works Clearinghouse (WWC) criteria, pretest differences were controlled for in the analysis (i.e., statistical adjustment; WWC, 2017). Two-level hierarchical linear models were used to analyze the test data, and results demonstrated statistically significant positive effects of both AITA and EAI lessons, relative to BAU, on the spatial thinking and geometry skills of middle school students with and without MLD. We concluded that 3D printing technology can be used to reshape instructional activities to deliver geometry instruction as a part of the school curriculum when explicit instruction and contextualized approaches are systematically integrated with standards-aligned 3D printing projects and implemented by well-trained teachers (Choo, 2017).
Although we observed statistically significant increases in student math learning following the implementation of AITA compared to other intervention approaches, with moderate effect sizes ranging from .38 to .41 (Hedges’ g), the host study relied heavily on researcher-developed measures to assess students’ spatial thinking and geometric skills because commercially available measures of these skills were lacking. In addition, the host study did not examine whether student learning occurred as a function of the AITA intervention or of other factors that could have influenced observed effects, including MLD status and individual student ability. Further analysis is needed to examine the mean differences in math learning among the math interventions and the role of compounded error in the composite score used as the primary outcome in the study. In addition, further analysis is needed to determine whether mean differences among the interventions reflect students’ abilities or MLD status, or whether some underlying bias influenced responses.
Purpose of the Current Study and Research Questions
The purpose of this study was to conduct a post hoc evaluation of the item-level characteristics of the researcher-developed measures using item response theory (IRT; Lord & Novick, 1968). We also conducted differential item functioning (DIF) analysis to examine whether assessment items were equivalent or comparable between groups with different characteristics (i.e., MLD vs. no MLD). The content of the spatial thinking assessment was of particular interest to us given that neither students with nor students without MLD significantly outscored the other group on the outcome measure, despite the finding of the host study that students taught with AITA in resource rooms scored significantly higher than students who were taught with AITA in inclusive classrooms and those taught with other math interventions in any setting. Other student characteristics, such as gender or race/ethnicity, were not formally analyzed in this study. Taking the current context into account, we propose that DIF analysis would capture measurement bias and support the development of a valid tool for estimating the spatial thinking skills of students with or at risk for MLD.
In this study, we posed three research questions:
We propose that these research questions would (a) explain the impacts of MLD on students’ spatial thinking skills, (b) provide detailed information on whether student ability and MLD status have biased the results of assessing students’ spatial thinking skills, and (c) guide future research by providing recommendations on the selection of a set of STAT items, or of a different spatial ability measure, that is most informative and least biased at the subscale level.
Method
Participants
A total of 90 students from six middle schools participated in the host study (Choo, 2017). Schools were located in central Kentucky, and only one classroom from each school participated in the study. Of the six classrooms, three were seventh-grade inclusive math classrooms cotaught by one math teacher and one special education teacher, and the other three were eighth-grade math resource rooms taught by one special education teacher. After two students were removed due to missing data, data from 88 students comprised the analytic sample. Half of the participants (n = 44) were identified as having or being at risk for MLD, which was defined as having an Individualized Education Plan (IEP), receiving specific math learning accommodations through a Section 504 plan, or being placed in a Tier 2 supplemental math program. The other half of the sample (n = 44) did not have any identified disabilities or MLD. For all participants, MLD status was constant across the intervention period. The majority of participants (n = 74, 84.1%) were White, whereas 14 participants (15.9%) identified with other racial/ethnic groups. Just over half of the participants (n = 49) were male.
Study Conditions
All classrooms were assigned to their first preferred condition among the three treatment conditions (two in BAU, two in EAI, and two in AITA) based on data from a preference survey completed by participating teachers and their principals. Instruction in all conditions took place during regular school hours. All students in the same classroom received the same assigned instruction; no student was pulled out of his or her classroom to receive different math instruction during their math class period throughout the study. All students who were identified as being with or at risk for MLD in this study received additional Tier 2 supplemental math instruction outside of their regular math class period in lieu of one of their elective classes (e.g., Kentucky System of Interventions; Kentucky Department of Education, 2012).
BAU condition
Two teachers assigned to the BAU condition followed their regular math curriculum, which was aligned with the CCSS-M (National Governors Association Center for Best Practices, Council of Chief State School Officers, 2010) and their districts’ curriculum guides prior to the study. According to participating teachers’ surveys, lesson plans, instructional materials, and classroom observation records, BAU teachers often used explicit instruction, modeling, individual work, and collaborative group work. During the study period, BAU lessons addressed seventh-grade Geometry standards (7.G.1-6) and reviewed Number System and Ratios and Proportional Relationships standards as well. BAU instructional activities included warm-up questions to check basic computation skills or to review math concepts that were taught in the previous lessons, worksheets or teacher-made math packets, hands-on materials, and technology tools (e.g., computer, interactive whiteboard). The schedules of BAU instruction in schools were matched to EAI and AITA conditions; class periods of the BAU lessons were typically 60 min per day over an average of 40 school days.
EAI condition
Two teachers assigned to the EAI condition used two instructional units from the EAI package developed by Bottge and his colleagues (2010) during the study period: Fraction of the Cost (FOC) and Hovercraft (HOV). Based on the CCSS-M (National Governors Association Center for Best Practices, Council of Chief State School Officers, 2010), these two EAI units addressed seventh-grade standards in Geometry, Number System, and Ratios and Proportional Relationships. The average number of instructional days was 9.5 for the FOC unit and 28.5 for the HOV unit (38 days total), with one 60-min class session per day.
AITA condition
Two teachers assigned to the AITA condition used the FOC unit from the EAI curriculum and two instructional units that were developed through the feasibility study (Choo, 2017): Flatland (FL) and 3D-Hovercraft (3D-HC). The FOC unit used in the AITA condition was identical to the FOC EAI unit. FL and 3D-HC units were developed to teach seventh-grade Geometry standards with an explicit focus on drawing geometric figures (7.G.1-2) and solving real-life and multi-step mathematical problems involving two- and three-dimensional objects (7.G.4-6). Other math standards such as Number System and Ratios and Proportional Relationships were indirectly addressed throughout the instructional activities. The average number of instructional days was 10 for the FOC unit, 9.5 for the FL unit, and 19.5 for the 3D-HC unit, with one 60-min class session per day.
The first unit of AITA, FL, includes a series of video-based instruction and project-based learning activities. The FL lesson objectives are to understand basic geometric concepts and apply the concepts to solve contextualized problems using CAD software (SketchUp, Version 2016) that is available to educators at no cost. The FL unit begins with a 30-min movie, Flatland (Flat World Productions, LLC, 2007). The story is based on the 1884 science fiction novella Flatland: A Romance of Many Dimensions written by Edwin A. Abbott. Throughout the movie, two main characters, Square (a two-dimensional [2D] figure) and Sphere (a 3D figure), explore geometric concepts. In the study, students first watched the movie and were then asked to (a) construct 2D figures they saw in the movie using CAD software; (b) construct 3D figures and calculate their surface areas and volumes; and (c) construct a 2D car first and then extend it into a 3D model that could be 3D printed.
Instructional activities of the FL unit include warm-up questions, an explicit instructional approach to teach geometric concepts, and solving geometric problems related to the video story. For example, students learned how to identify irregular polygons and regular polygons, how to calculate perimeter and area with 2D figures, and how to calculate volume and surface area with 3D shapes (see Figure 1). Students then used CAD software to construct geometric figures they had learned in the previous lessons. The FL unit also included instructional activities focused on basic dimensioning skills. Students were taught to identify and use a specific CAD tool for each dimension, explain the procedures of CAD techniques, and construct sample CAD models. Students also received introductory level instruction to operate 3D printers. Once the students completed their 3D model, students used 3D printing software to preview their 3D models and print their final 3D products using 3D printers (see Figure 2).

Figure 1. AITA worksheet sample of FL unit.

Figure 2. AITA student work sample of 3D printed cars.
The second unit of AITA, 3D-HC, integrates 3D printing projects with the academic content of the HOV unit from the EAI curriculum. The main lesson objective of the 3D-HC unit is similar to that of the HOV unit (i.e., constructing a hovercraft [HC] rollover cage), but the 3D-HC unit requires students to use CAD software in planning, drawing, and constructing a 3D rollover cage and to print their 3D-HC models with 3D printers. More specifically, 3D-HC lessons focus on teaching students to design CAD models of HCs along the x-, y-, and z-axes, whereas HOV lessons of EAI consist of constructing 2D hand-drawn designs on graph paper. The 3D-HC unit specifically addresses dimension concepts and measurement skills, and the instructional activities focus on teaching one-dimensional (1D), 2D, and 3D concepts and on using CAD software tools to draw geometric shapes with given conditions (e.g., 7.G.2; see Figure 3). In this study, at the end of the unit, students constructed larger HCs using polyvinyl chloride (PVC) pipes, as in the HOV unit of the EAI curriculum.

Figure 3. AITA sample models of 3D hovercrafts.
Measures
In our host study, we used two proximal and two distal outcome measures to assess students’ math learning. The proximal outcome measures were the Spatial Thinking Ability Test (STAT) and Problem-Solving (PS; Bottge et al., 2014, 2015), both of which were researcher developed. The distal outcome measures were two subtests of the Iowa Tests of Basic Skills (University of Iowa, 2008): Math Computation (ITBS-C) and Math Problem-Solving and Data Interpretation (ITBS-PS). Each of the participating teachers administered all measures over 3 consecutive days immediately before (i.e., pretest) and after (i.e., posttest) they delivered the intervention associated with their assigned study condition. The first author independently scored each test, and a second rater who had scoring experience in large-scale EAI studies also independently scored 20% of the pretests and posttests across conditions. Interrater agreement was 96%–100% across tests. For this study, to evaluate the item-level characteristics of students’ spatial thinking ability, we examined student performance data from the STAT and ITBS-C.
STAT
The STAT (published by Association of American Geographers, 2006) was developed and validated by researchers (Lee & Bednarz, 2009, 2012) to assess students’ problem-solving skills in reading a map, determining a location based on given information, and differentiating among spatial data types. STAT consists of 16 dichotomous items (i.e., correct or incorrect answers) as multiple-choice questions. Closely aligned with the CCSS-M middle school geometry standards, questions require students to visually navigate a road map using verbal information, mentally visualize a 3D image based on 2D information, and identify real-life examples based on picture examples of zero dimension (0D), 1D, or 2D. Internal reliability estimates obtained in this study were .80 at pretest and .88 at posttest.
Iowa Tests of Basic Skills Math Computation (ITBS-C)
The ITBS-C test (Form C, Level 12; published by University of Iowa, 2008) consists of 30 dichotomous items as multiple-choice questions to assess students’ computation skills in four basic operations (i.e., addition, subtraction, multiplication, division) with whole numbers, fractions, and decimals. Out of nine addition questions, there are two for whole numbers, four for fractions, and three for decimals, which is the same distribution used for the nine subtraction questions. Out of eight multiplication questions, three are for whole numbers, two are for fractions, and three are for decimals. Out of four division questions, three are for whole numbers and one is for decimals (there are none for fractions). Internal reliability estimates obtained in this study were .89 at pretest and .93 at posttest.
Data Analysis
Item parameters and person abilities were examined for the posttest STAT using IRT. DIF analysis was conducted to examine if students with MLD responded to the 16 items on the STAT differently than students without MLD. DIF is a statistical method used for examining whether test items are equivalent or comparable between groups that have different, unique characteristics, such as females and males. If an item is identified as showing DIF, it implies that the two groups respond to the item differently and the psychometric properties vary by group characteristics (Ercikan, 2002). The differences between groups on a certain item may result either from actual differences in abilities demonstrating understanding of the construct or from collectively different responses to the items that are related to the secondary traits, unrelated to the construct being measured.
If items showing DIF measure secondary traits, the traits may be irrelevant to the targeted ability, and the scores from those items may not be validly interpreted as demonstrating comparable ability across groups that have the same scores. Clauser and Mazor (1998) discussed the use of statistical procedures to identify differentially functioning test items, claiming that a test item is biased when it unfairly favors one group over another. Items showing DIF are likely to measure secondary or irrelevant abilities rather than the intended abilities or construct. Such items are considered a source of measurement error or bias that impedes test fairness (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014). To make a test valid and fair, items are analyzed with DIF analytic methods to identify statistically significant differences in responses between groups that vary on some characteristic, and content experts judge the items showing DIF to decide whether they are biased or not (Buzick & Stone, 2011; Zieky, 1993; Zumbo, 1999).
In this study, the IRT parameters of the posttest STAT were analyzed with Rasch dichotomous modeling. A DIF analysis was then conducted to examine how differently students with and without MLD responded to the test items. Through the analysis, eight of the 16 items were selected. The parameters of these eight posttest STAT items were anchored to calibrate the participants’ pretest spatial thinking abilities so that the item parameters of the posttest and the pretest were on the same scale. If items or persons are anchored, the mean and standard deviation of the parameters are calibrated from the same data (Embretson & Reise, 2000). With the item parameters anchored at their posttest values, the pretest person abilities were calibrated to examine changes in the participants’ abilities between the two time points. The students’ changes in ability from pretest to posttest were then compared across the three interventions: BAU, EAI, and AITA. In this study, the posttest items were anchored because the same test items were used in both tests.
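The anchoring step can be illustrated with a minimal sketch. It assumes the dichotomous Rasch model, in which the probability of a correct response depends only on the difference between person ability (theta) and item difficulty (b), and it holds the item difficulties fixed while estimating ability from a response pattern. The difficulty values, response patterns, and function names below are hypothetical, and the sketch uses a simple per-person Newton-Raphson maximum likelihood estimate rather than the marginal maximum likelihood estimation provided by TAM.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, anchored_b, iterations=50, tol=1e-8):
    """Maximum likelihood ability estimate with item difficulties held fixed.

    responses: list of 0/1 scores; anchored_b: fixed item difficulties (logits).
    Zero and perfect scores have no finite ML estimate, so they are rejected.
    """
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("ML estimate is not finite for zero or perfect scores")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_prob(theta, b) for b in anchored_b]
        gradient = raw - sum(probs)                    # observed minus expected score
        information = sum(p * (1 - p) for p in probs)  # test information at theta
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical anchored difficulties for an eight-item form (not the study's values).
anchored_b = [-1.2, -0.8, -0.3, 0.0, 0.2, 0.6, 1.0, 1.5]
pre = estimate_ability([1, 1, 0, 0, 0, 0, 0, 0], anchored_b)
post = estimate_ability([1, 1, 1, 1, 0, 1, 0, 0], anchored_b)
print(round(pre, 2), round(post, 2))
```

Because both estimates are computed against the same fixed difficulties, the difference between the two values can be read as change on a common logit scale, which is the purpose of anchoring.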
By removing items showing DIF from the measure of interest, the test can measure the spatial thinking skills of the participants with less bias, which may provide more valid and equitable interpretations of the scores for the participants. A smaller number of items will also reduce the burden on students who take the test and enable teachers to measure students’ skill learning in a shorter time. The TAM package in R (Robitzsch et al., 2018) was used to fit a set of Rasch models with marginal maximum likelihood estimation.
Results
The 88 participants in the analytic sample completed the STAT and ITBS-C at pretest and posttest. Table 1 presents the mean scores, t-test results, and effect sizes (Cohen’s d) for participants with and without MLD. Independent samples t-tests were conducted to compare the group means. All mean differences in STAT and ITBS-C scores between the two groups were significant. On the STAT pretest, there was a significant mean difference between the MLD participants (M = 4.80, SD = 2.278) and the non-MLD participants (M = 6.14, SD = 2.958), t(86) = 3.529, p = .001. The effect size approached the conventional threshold for a large effect, Cohen’s d = .75. On the STAT posttest, there was also a significant mean difference between the MLD participants (M = 4.80, SD = 2.278) and the non-MLD participants (M = 7.84, SD = 2.10), t(86) = 3.097, p = .003. The effect size was moderate to large, Cohen’s d = .659. The two groups also differed in mean ITBS-C scores. At pretest, there was a significant mean difference between the MLD participants (M = 13.41, SD = 5.571) and the non-MLD participants (M = 20.67, SD = 5.145), t(84) = 6.268, p < .001. The effect size was large, Cohen’s d = 1.354. At posttest, there was also a significant mean difference between the MLD participants (M = 15.43, SD = 6.638) and the non-MLD participants (M = 21.77, SD = 4.533), t(86) = 5.232, p < .001. The effect size was large, Cohen’s d = 1.115.
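For readers who wish to check how an effect size and a t statistic relate to the reported summary statistics, the following sketch computes Cohen’s d from group means and standard deviations using the pooled standard deviation, and then the t statistic implied by d. It is an illustration rather than the analysis code used in the study. The function names are hypothetical, and equal group sizes of 44 are assumed from the Participants section. Applied to the ITBS-C posttest values, it reproduces the reported d and t to rounding.

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d using the pooled standard deviation of two independent groups."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var)

def t_from_d(d, n1, n2):
    """Independent samples t statistic implied by d and the two group sizes."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))

# ITBS-C posttest summary statistics as reported; n = 44 per group is assumed.
d = cohens_d(15.43, 6.638, 44, 21.77, 4.533, 44)
t = t_from_d(d, 44, 44)
print(f"d = {d:.3f}, t(86) = {t:.3f}")
```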
Table 1. Comparing Mean Scores in STAT and ITBS-C by MLD Status.
Note. MLD = participants with math learning disability; non-MLD = participants without math learning disability; CI = confidence interval; LL = lower limit; UL = upper limit; STAT = Spatial Thinking Ability Test; ITBS-C = Iowa Tests of Basic Skills Math Computation.
Table 2 shows the correlations between demographic variables of interest (gender and MLD status) and test scores. Correlations between gender and test scores were not statistically significant; however, MLD status was negatively correlated with all measures. In other words, if participants were identified as having MLD, their test scores were significantly lower than those of the participants without MLD at both time points for STAT and ITBS-C.
Table 2. Correlations Among Gender, MLD Status, and Test Scores.
Note. MLD = participants who have math learning disability; pre-STAT = pretest of Spatial Thinking Ability Test; Pre-ITBS = pretest of Iowa Tests of Basic Skills Math Computation; Post-STAT = posttest of Spatial Thinking Ability Test; post-ITBS = posttest of Iowa Tests of Basic Skills Math Computation.
*p < .01; **p < .001.
IRT Results of the Posttests From Rasch Modeling
A unidimensional Rasch model was fit to estimate the parameters for the set of 16 items in the posttest STAT data set. Cronbach’s coefficient alpha, a classical test theory estimate, was .64 [.54, .75]. The IRT expected a posteriori (EAP) reliability, an estimate of overall instrument reliability under the IRT model, was .57. These two indices indicate that the measure is moderately reliable, although values of .80 or higher would be preferred.
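As a reminder of what the classical index summarizes, the sketch below computes Cronbach’s alpha from a persons-by-items matrix of dichotomous scores as k/(k − 1) multiplied by one minus the ratio of summed item variances to total-score variance. The response matrix is hypothetical and far smaller than the study data, and the function name is illustrative only.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items matrix of item scores."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 0/1 responses: 6 persons by 4 items (illustration only).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Population variances are used for both the items and the total score; using sample variances throughout would give the same alpha because the correction factor cancels in the ratio.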
Good item fit was observed, with the fit statistics of all 16 items falling within tolerance. INFIT values for all items fell between 3/4 and 4/3. INFIT is “a mean-square weighted statistic based on the squared standardized residual between what is observed and what would be expected on the basis of the model” (De Ayala, 2013, p. 52). There is no established rule of thumb for interpreting the item-fit statistic. One guideline holds that INFIT values between 3/4 and 4/3 are preferred (Wu et al., 1998) and that having fewer than 5% of items out of tolerance indicates strong fit. Using this guideline, we concluded there was good item fit for the current data set.
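A minimal sketch of the information-weighted mean square may help make the tolerance band concrete. For one item, the statistic is the sum of squared residuals between observed responses and Rasch-model probabilities, divided by the sum of the model variances. The abilities, responses, and function names here are hypothetical, and the sketch omits refinements that estimation software may apply.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def infit_mnsq(item_responses, thetas, b):
    """Information-weighted mean square: squared residuals over model variances."""
    numerator = 0.0
    denominator = 0.0
    for x, theta in zip(item_responses, thetas):
        p = rasch_prob(theta, b)
        numerator += (x - p) ** 2
        denominator += p * (1 - p)
    return numerator / denominator

def within_tolerance(infit, low=3 / 4, high=4 / 3):
    """Check a value against the 3/4 to 4/3 guideline cited in the text."""
    return low <= infit <= high

# Hypothetical abilities and a perfectly ordered response pattern for one item.
thetas = [-1.5, -0.5, 0.0, 0.5, 1.5]
responses = [0, 0, 1, 1, 1]
example_infit = infit_mnsq(responses, thetas, b=0.0)
print(round(example_infit, 2), within_tolerance(example_infit))
```

Values near 1 indicate responses about as variable as the model expects. The perfectly ordered pattern in this toy example is more predictable than the model expects, so its value falls below the lower bound.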
DIF Analysis
A DIF analysis was conducted to examine whether performance on any of the items differed between students with and without MLD after controlling for differences in person location (De Ayala, 2013). The mean difference in EAP person ability between the two groups was tested using an independent samples t-test. The mean EAP person abilities were −.29 for the participants with MLD and .27 for those without MLD, t(86) = 3.21, p < .01. This significant difference in estimated person abilities indicates that the two groups differed in their capacity to perform spatial thinking skills.
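The group comparison above can be sketched as a pooled-variance independent samples t-test, for which the degrees of freedom equal the combined sample size minus 2. Whether the study pooled variances is an assumption here, and the function name and example values are hypothetical rather than study data.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, df) for a pooled-variance independent samples t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical ability estimates for two small groups (not study data)
t_value, degrees_of_freedom = pooled_t([2, 3, 4], [1, 2, 3])
```

The sign of t depends only on the order in which the groups are passed; the p value would then be read from a t distribution with the returned degrees of freedom.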
To conduct the DIF analysis, the variable “MLD” was set up as a facet and the IRT analysis was rerun. Table 3 presents the results of the DIF calibration and the items that were flagged accordingly. A significance test was conducted by dividing the interaction term by its standard error, and Table 3 shows the resulting z-value for each STAT item. A z-value larger than 2 indicates that the two groups responded significantly differently to the given item. Of the 16 items, four (Items 1, 3, 7, and 9) were flagged with z-values larger than 2. For participants with MLD, Items 1 and 3 were much more difficult, and Items 7 and 9 were much easier, than they were for participants without MLD.
Results of Differential Item Functioning Analysis for Post-STAT.
Note. STAT = Spatial Thinking Ability Test; theta = person ability; SE of theta = standard error of theta.
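The flagging rule used in the DIF analysis (interaction estimate divided by its standard error, compared against 2) can be sketched as follows. Because the flagged items were both harder and easier for students with MLD, the sketch assumes the threshold applies to the absolute value of z; that reading, the function name, and the example estimates are assumptions for illustration and are not values from Table 3.

```python
def flag_dif(estimates, standard_errors, threshold=2.0):
    """Return (item number, z) pairs whose |z| exceeds the threshold.

    estimates: item-by-group interaction estimates, one per item
    standard_errors: standard errors of those estimates, in the same order
    """
    flagged = []
    for item, (est, se) in enumerate(zip(estimates, standard_errors), start=1):
        z = est / se
        # Assumption: flag on magnitude, since DIF can favor either group
        if abs(z) > threshold:
            flagged.append((item, round(z, 2)))
    return flagged


# Hypothetical interaction estimates and standard errors for three items
example_flags = flag_dif([0.50, 0.10, -0.45], [0.20, 0.20, 0.15])
```

In this hypothetical example, the first and third items are flagged (z = 2.5 and z = −3.0), and the second is not (z = 0.5).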
Selecting Items and Geometry Standards
In addition to the flagged items (Items 1, 3, 7, and 9) based on the z-values, we examined Lee and Bednarz’s (2012) detailed spatial thinking components and the grade-level appropriateness of each STAT item. We excluded Items 4, 6, and 8 because difficulty indices indicated these items were too complex for middle school students to comprehend. Items 9, 10, 11, and 12 were also excluded because the spatial thinking measure was intended to assess proximal intervention effects, but these items did not have direct relevance to the AITA curriculum or grade-level standards. After taking standard alignment into account, we included Items 1 and 3 in the final analyses because of their content similarity to another item (e.g., Items 1 and 2; Items 3 and 5). Finally, Items 1, 2, 3, 5, 13, 14, 15, and 16 remained in the analyses of the STAT data set.
Items 1 and 2: If you are located at Point 1 and travel north one block, then turn west and travel three blocks, and then turn south and travel two blocks, you will be closest to point (Item 1); and if you are located at Point 1 and travel west one block, then turn left and travel three, then turn west and travel one block, and then turn right and travel four blocks, you will be closest to point (Item 2). To solve Items 1 and 2, students should be able to visually navigate road maps using verbal information including current location, directions to destination, or street information (Lee & Bednarz, 2012). These items measure (a) dimensioning skills of spatial thinking components (e.g., graphing points in the first quadrant of the coordinate plane, interpreting coordinate values of points in the context of the situation), (b) orientation and direction skills (e.g., forward-backward, left-right, up-down, back-front, horizontal-vertical, north/south/east/west), and (c) mathematical problem-solving skills (e.g., representing real world and mathematical problems).
Items 3 and 5: If you draw a graph showing change of Texas annual precipitation between A and B, the graph will be _____ (Item 3); and imagine that you are standing at location X and looking in the direction of A and B. Among five slope profiles (A~E), which profile most closely represents what you would see (Item 5)? To solve Items 3 and 5, students should be able to recognize map patterns and represent them in graphic form, and create a profile of topography along a proposed line on a contour map (Lee & Bednarz, 2012). These items measure (a) understanding of cross-section concepts (e.g., 7.G.3: describe the 2D figures that result from slicing 3D figures, as in plane sections of right rectangular prisms and right rectangular pyramids) and (b) being able to transform perceptions, representations, and images from one dimension to another (e.g., cross-sections of 2D images transformed to 3D images). Moreover, the items partially addressed 7.G.1-2 (e.g., describing geometrical figures and the relationships between them).
Items 13, 14, 15, and 16: locations of weather stations in Washington County _____ (multiple choice: Lines, Area, Points and Lines, Points and Area; Item 13); Mississippi River channels and their basins _____ (multiple choice: Lines, Area, Points and Lines, Lines and Area; Item 14); shuttle bus route of the Lincoln Elementary School _____ (multiple choice: Points, Area, Points and Lines, Points and Area; Item 15); and places that can be reached by Franklin County fire engines in 5 min or less _____ (multiple choice: Points, Lines, Areas, Points and Lines; Item 16). To solve Items 13, 14, 15, and 16, students should be able to visually extract types of spatial data from verbally expressed spatial information (Lee & Bednarz, 2012). These items measure (a) comprehending integration of geographic features represented as points, networks, and regions and (b) comprehending spatial shapes and patterns. These items also partially addressed 7.G.4-6 (i.e., solve real-world and mathematical problems involving area, surface area, and volume).
Anchoring and Comparison Between the Pretests and the Posttests
With the parameters of the final eight items anchored in the posttest of STAT, the person parameters of the pretest of STAT were estimated, and the differences between the pretest and posttest were compared by setting the item difficulty at the same scale through anchoring. Cronbach’s coefficient alpha using classical test estimation of the pretest was .45 [.28, .62]. The IRT EAP reliability was .36. Cronbach’s coefficient alpha using classical test estimation of the posttest was .62 [.50, .74]. The IRT EAP reliability was .59.
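The anchoring step above can be illustrated with a minimal sketch of EAP ability estimation in which item difficulties are held fixed at previously calibrated values, so that pretest and posttest abilities are expressed on the same scale. The standard normal prior, the evenly spaced grid from −6 to 6, and the function name are assumptions for illustration; the study's software settings are not reported in this section.

```python
import math


def eap_ability(responses, anchored_b, n_points=61, prior_sd=1.0):
    """EAP ability estimate under the Rasch model with fixed item difficulties.

    responses: 0/1 responses of one person
    anchored_b: item difficulties fixed from an earlier calibration
    """
    # Evenly spaced quadrature grid over the ability scale (assumed range)
    grid = [-6.0 + 12.0 * q / (n_points - 1) for q in range(n_points)]
    numerator = 0.0
    denominator = 0.0
    for theta in grid:
        # Unnormalized normal prior weight at this grid point
        weight = math.exp(-0.5 * (theta / prior_sd) ** 2)
        # Multiply in the likelihood of each observed response
        for x, b in zip(responses, anchored_b):
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            weight *= p if x == 1 else 1.0 - p
        numerator += theta * weight
        denominator += weight
    # Posterior mean of ability
    return numerator / denominator
```

Because the item difficulties are not re-estimated, a change in a person's EAP estimate between the two time points reflects a change in responses rather than a shift in the scale.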
A two-way between-subjects analysis of variance (ANOVA) was conducted to examine the extent to which participating students with versus without MLD made the same gains on the test of spatial thinking skills during the study. Each student in the sample participated in one of three interventions: BAU, EAI, or AITA. For person abilities on the pretest of STAT, the ANOVA indicated an effect of MLD, F(1, 82) = 11.62, p < .01; an effect of intervention, F(2, 82) = 6.16, p < .01; and a nonsignificant interaction between MLD and intervention, F(2, 82) = .42, p = .66. For person abilities on the posttest of STAT, the effect of MLD was F(1, 82) = 9.6, p < .01; the effect of intervention was F(2, 82) = 12.59, p < .001; and the interaction between MLD and intervention was F(2, 82) = 6.06, p < .01.
Tables 4 and 5 present the mean spatial thinking abilities on the pretest and posttest of the eight-item STAT. MLD status was a significant predictor of participants’ spatial thinking abilities at both pretest and posttest: students with MLD had lower abilities in the targeted skills than those without MLD at both time points. The mean spatial thinking ability of students with MLD at pretest was −.30 (SD = .12), whereas the mean of those without MLD was .30 (SD = .13). Similarly, the mean spatial thinking ability of students with MLD at posttest was −.29 (SD = .15), whereas the mean of those without MLD was .27 (SD = .09).
The Mean Spatial Thinking Abilities on the Pretest and Posttest of eight-item STAT I.
Note. Standard deviations are in parentheses. STAT = Spatial Thinking Ability Test; MLD = participant with math learning disabilities; Non-MLD = participant without math learning disabilities; BAU = business as usual; EAI = Enhanced Anchored Instruction; AITA = Anchored Instruction with Technology Applications.
The Mean Spatial Thinking Abilities on the Pretest and Posttest of eight-item STAT II.
Note. Standard deviations are in parentheses. STAT = Spatial Thinking Ability Test; MLD = participant with math learning disabilities; Non-MLD = participant without math learning disabilities; BAU = business as usual; EAI = Enhanced Anchored Instruction; AITA = Anchored Instruction with Technology Applications.
A significant effect of the intervention was also observed. On the pretest eight-item STAT, the mean spatial thinking ability of students enrolled in BAU was −.40 (SD = .14), the mean in EAI was .30 (SD = .17), and the mean in AITA was .18 (SD = .15). At posttest, the mean spatial thinking ability of students enrolled in BAU was −.53 (SD = .17), the mean in EAI was .15 (SD = .15), and the mean in AITA was .39 (SD = .10). A significant interaction effect of MLD and intervention was found at posttest but not at pretest. At pretest, the participants with MLD who were enrolled in BAU had the lowest mean spatial thinking abilities (M = −.59), and those without MLD enrolled in EAI had the highest mean abilities (M = −.07). At posttest, the participants with MLD who were enrolled in BAU had the lowest mean spatial thinking abilities (M = −1.02), and those without MLD enrolled in EAI had the highest mean abilities (M = .47).
Discussion
This study examines changes in students’ abilities of spatial thinking skills following math intervention in one of three conditions—AITA, EAI, or BAU—in the context of a larger host study that sought to examine the effects of the interventions for middle school students with and without MLD. Taking a measurement approach, this study focused on examining differential effects of the intervention for students with MLD versus students without MLD. The results of a proximal measure of spatial thinking skills (STAT) were analyzed by means of IRT and DIF analysis to examine whether there is potential bias in assessing student math learning that may have concealed effects for students with MLD versus those without MLD. Because the host study demonstrated greater increases in students’ math learning in the resource (special education) classroom compared with the inclusive (general education) classroom, especially for students in the AITA condition, we were concerned that the lack of significant differential effects in student math learning by disability status was due to measurement error or bias.
We extended our analysis of results from the larger feasibility study to select relevant items of the STAT and to determine whether the improvement in student outcomes reflects students’ abilities or whether some underlying bias influenced responses. Findings indicate that several STAT items showed significant DIF across MLD status after controlling for person ability. Independent samples t-test results reveal that students with MLD had significantly lower spatial thinking abilities than students without MLD across the treatment conditions. This finding contradicts the result of the host study that student characteristics, including disability status, did not significantly predict any of the math outcome measures. Based on the analyses conducted here, we conclude that spatial thinking, like working memory, number facts, computation, and problem solving, appears to be another mathematical area in which students with MLD struggle.
With respect to differences in performing spatial thinking skills between students with and without MLD, four of the STAT items indicated the presence of significant DIF. The results indicate that some items in the measure may not assess the intended construct (i.e., spatial thinking skills) in a valid way across groups because of irrelevant content. With a reduced set of items after DIF analysis and content evaluation according to the geometry standards, we recalibrated the parameters of the STAT items. We suggest that the reduced set of STAT items in this study measured the spatial thinking skills of the participants without bias that could mislead interpretations of the outcomes of the students with MLD. This study observed that AITA was a significant positive predictor of improved spatial thinking skills for students both with and without MLD, which is consistent with our findings in the host study using a two-level hierarchical linear model. This study also observed a positive interaction effect between the EAI condition and students without MLD: EAI students without MLD had the highest spatial thinking abilities based on the selected STAT items. We note this as a positive finding because it extends the previous literature on EAI (Bottge, 1999; Bottge et al., 2014, 2015; Cognition and Technology Group at Vanderbilt, 1990) and provides empirical data that EAI can improve students’ spatial thinking skills.
After taking into account the parameters of the original 16-item STAT and the final eight-item STAT, we believe that the presence of DIF warrants a refined examination of students’ spatial thinking skills using the restricted set of items, because the data suggest that age appropriateness, content relevance, and bias related to disability status may be interfering with the precision of measurement using the original set of STAT items. Results indicate that the original version of the STAT may not accurately capture the spatial thinking skills of the participating students with and without MLD because of measurement error and limited content relevance.
Future Research and Limitations
Although the findings indicated that the selected STAT items are closely aligned with seventh-grade Geometry standards in the CCSS-M measuring participants’ spatial thinking skills without bias, future studies should include, or develop, additional spatial thinking measures to assess other components of spatial skills. Three categories identified by Linn and Petersen (1985)—spatial perception (the ability to determine spatial relationships with respect to the orientation of their own body), mental rotation (the ability to imagine shapes rotated into a new orientation that is used in more complex spatial skills such as spatial visualization), and spatial visualization (a predictor of success in a variety of academic areas)—are used extensively in more complex spatial tasks such as 3D printing projects and CAD. This suggestion is closely related to a general challenge in the field with regard to commercially available mathematics assessments that adequately measure targeted mathematics constructs, which may have limited the ability of the host study to measure effectiveness of interventions. In addition, some spatial measures (e.g., Mental Rotation Test, Peters et al., 1995; Differential Aptitude Test-Spatial Relations, Bennett et al., 1947; Spatial Visualization, Winter et al., 1896) have physically deteriorated because only copies of copies are available.
Additional limitations of this study are related to sample size and demographic variables. The small sample size (N = 90) in the host study decreased the generalizability of findings in this study. Similarly, the lack of racial and ethnic diversity of the sample relative to the national population demographics constrains interpretation of findings. Although the study was informed by the results of the host study and a priori hypotheses were related to potentially concealed effects of the interventions on a single proximal outcome—students’ spatial thinking skills—it is possible measurement artifacts similarly influenced other outcomes assessed in the host study that are not reported here. Future studies that examine 3D printing technology in middle school should be conducted with larger, more diverse samples, and carefully consider the role of measurement in the study (e.g., examining proximal vs. distal outcomes using researcher developed vs. commercially available measures).
Implications for Practice
This study holds implications for practice. For one, it expands previous research by providing detailed information on whether student ability and MLD status lead to different results when assessing students’ spatial thinking skills. More specifically, this study identifies the spatial thinking skills on which students with or at risk for MLD performed significantly lower than students without MLD. Although some STAT items were too complex for the participating students to comprehend, the geometry standards addressed by the selected STAT items can be particularly useful for educators in identifying and teaching relevant spatial thinking skills in their math classrooms. We anticipate that educators can use the geometry standards aligned with the selected STAT items in planning and preparing their geometry instruction to improve the spatial thinking skills of students with or at risk for MLD.
Another implication for practice is that spatial thinking, in general, is a math skill that students with or at risk for MLD struggle with. We agree that 3D printing technology has great potential to bring an evolving technology trend to classrooms and replace traditional hands-on manipulatives for teaching and learning math. However, without explicit teaching of the spatial thinking skills needed to use 3D printing tools and design software (CAD skills), students with or at risk for MLD will likely continue to struggle to apply 3D-related math concepts in solving authentic problems or to see spatial relationships in real-world situations (Choo, 2017; Lipson, 2007; Uttal et al., 2013). Given the increased attention in STEM education toward 3D printing technology, spatial thinking skills may deserve more attention and should be explicitly taught in classrooms.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
