Online and Handwritten Homework in Calculus for STEM Majors

Abstract

Calculus is an essential intellectual gateway and initiation in the education of science, technology, engineering, and mathematics (STEM) students. In this study, graded online computer homework was compared with graded handwritten homework in second-semester calculus for STEM students. Previous calculus studies excluded STEM students or compared graded online computer homework with ungraded uncollected handwritten homework. Two large sections and two small sections of Calculus II were studied using a quasi-experimental design. Students were given the same lecture by the same instructor for each class. Students were not aware of the type of homework to be assigned when they registered. Online homework sections used WebAssign, which is one of the most widely used textbook and homework systems. The analysis indicates that there was no significant difference in performance between handwritten and online homework. Assessment questions that required graphical answers provided the greatest contrast between handwritten and online homework and were separately evaluated. Controls were included for entering math scores and a socioeconomic indicator. This study does not find any significant difference due to homework type. As a secondary question, the effect of class size is examined.

Keywords

calculus, computer homework, class size, higher education, mathematics education studies, online homework, STEM education

Introduction

Calculus is a significant gateway course in the education of science, technology, engineering, and mathematics (STEM) college students. STEM students typically are required to take calculus classes as a prerequisite to advanced classes. Sometimes it is called calculus for science majors or is classified as Mainstream Calculus. Mainstream Calculus I, II, and III were defined for the purposes of the Conference Board of the Mathematical Sciences¹ (CBMS) survey (Blair, Kirkman, & Maxwell, 2018) and the Mathematical Association of America survey (Bressoud, Mesa, & Rasmussen, 2015) as calculus that can be used as part of the calculus prerequisite for higher level mathematics courses. The 2015 CBMS survey estimated that Mainstream Calculus II had a (nondistance learning) enrollment in Fall 2015 of roughly 157,000 students in U.S. higher education (Blair et al., 2018, p. 17, Table S.5). The enrollment in spring may have been even larger since Mainstream Calculus I, the feeder course, had a Fall 2015 enrollment of 317,000 students (Blair et al., 2018, p. 17, Table S.5). The average class size in a Mainstream Calculus II class in a 4-year college or university was 39 students (Blair et al., 2018, p. 17, Table S.5).

Online homework systems in Mainstream Calculus classes are now in widespread use. More data are available for Calculus I than for Calculus II, and so we note that 49% of faculty had Calculus I students submit paper assignments, 36% had students submit assignments using an online homework system, and 12% never had students submit homework (Burn & Mesa, 2015, p. 50).² Both online homework systems and handwritten assignments are in extensive use. In this study, the authors compared WebAssign, one of the most widely used online homework systems, with handwritten homework. A distinction of this study is that previous online or computer homework studies in calculus compared ungraded handwritten homework with computer-graded homework or were not classes for STEM students.

Mainstream Calculus II is taught primarily by lecture in 85% of classes in public 2-year colleges (Blair, Kirkman, & Maxwell, 2013, p. 2). The CBMS study reports that approximately 46% of Mainstream Calculus II was in lecture with recitation (Blair et al., 2018, p. 17, Table S.5). But the previous study points out, “With the creation of mathematics tutoring centers, perhaps recitation sections are becoming less necessary, and required calculus lab assignments may not always be completed in a ‘section’ of a course” (Blair et al., 2013, p. 17). As for online homework systems, an American Mathematical Society (AMS) homework survey found that homework systems in general calculus are about evenly divided among MyMathLab, WebAssign, and WeBWorK (Kehoe, 2010, p. 755). All three homework systems allow problems to require mathematical expressions and equations as answers as well as multiple choice or numerical answers. All three systems will grade these answers and can give immediate feedback. Algebraic expressions are graded with underlying mathematical software or numerical sampling. Students cannot submit pictures to be graded by the software. WebAssign and MyMathLab have copyrighted questions tied to specific textbooks, while WeBWorK has an open source supply of problems not tied to textbooks.

Alan Schoenfeld (2008) comments on the difficulty of controlling variables in educational studies. He writes, “If different teachers taught ‘experimental’ and ‘control’ classes, the ‘teacher variable’ might be the most significant factor in the experience” (p. 472). The teacher variable was controlled in this study because every day the same teacher taught all classes and delivered the same lecture. While classes received different homework treatments, students were not aware of the homework expectations until after the semester began. Students were aware of the class size and meeting times prior to registration. There was one large and one small section for each homework type. Students’ standardized test scores and Pell Grant status were investigated as covariates. Pell Grants are the primary source of U.S. government student aid for college students, and awards are based on a computation of the expected family contribution (Federal Pell Grant Program, 2015). Pell Grant status is a ubiquitous, if imperfect, proxy for socioeconomic status in U.S. higher education (Delisle, 2017).

Literature Review

Online and handwritten homework offer a tradeoff between immediate feedback that may be limited in scope and potentially more detailed feedback that is delayed. The homework feedback issue was named as both the greatest benefit and the greatest drawback in the opinion survey given by the AMS on homework software (Kehoe, 2010). The majority of respondents believed that online homework promotes learning (e.g., through immediate feedback and allowing multiple attempts), which is a major benefit. However, a majority of respondents also thought that it was a major drawback that students do not show their work. A major benefit named by respondents who are now using homework software is that they can grade homework that previously was ungraded. A significant number of respondents believed or anticipated that homework software would reduce student copying (as individual questions have certain random parameters so students may get different versions of a question).

Several studies examined homework comparisons and student performance in Calculus. The largest study was by Hirsch and Weibel (2003) and involved 1,300 students, but the study excluded engineering, physics, chemistry, and mathematics majors. The study compared handwritten homework with a mixture of handwritten and computer homework making this a study of blended homework versus handwritten homework. In the blended group about 11 written problems per week were replaced by WeBWorK problems. Students doing the blended homework did slightly better on the final exam. Hirsch and Weibel used placement scores as a control. While this study may not be a study of STEM students, may not be Mainstream Calculus, and is not a comparison test of computer versus handwritten homework, it is the first indication that adding computer-based homework may improve student achievement in Calculus as measured by final exams.

There are three additional studies (Halcrow & Dunnigan, 2012; LaRose, 2010; Zerr, 2007) that compared ungraded handwritten homework with online or computer-graded homework. These three studies explored whether students’ achievement, time spent on Calculus, and attitude can be improved by the addition of some graded online homework. However, they are not a fair comparison of achievement using online (or computer) homework with handwritten homework because one is graded and the other is not. The differing treatment of grading is either a fundamental difference in the treatments or a severe confounding variable. Zerr (2007) compared ungraded handwritten assignments to computer assignments in Calculus. The computer group did better but not at a statistically significant level. LaRose (2010) conducted a Calculus II study examining ungraded (and uncollected) handwritten homework versus online homework that was not counted in students’ grades and online homework that was counted in students’ grades. Each group had approximately 220 students. LaRose found that, as to whether students spend time on homework, “the medium by which the homework is delivered (handwritten or online) matters less than whether it is graded” (p. 674). The computer homework groups performed slightly better on tests, but the difference was statistically significant on only one of three tests. Since the handwritten homework was ungraded, LaRose’s results may be a reasonable upper bound of the benefit of online homework in comparison to handwritten. Halcrow and Dunnigan (2012) examined the author-written online “CalcPortal” with four small classes. The study included 38 students in the online group who took the final exam. It controlled for the teacher variable by having each of two teachers teach a treatment and a control section. They compared test scores of students in handwritten homework classes with those in online homework classes in Calculus I. However, the handwritten homework was not graded. They reported positive or neutral results. For one teacher, the online students’ mean test scores were better on all six tests, with two tests showing statistical significance; but for the second teacher, the online mean test scores were better on three of the six tests, and none of the differences were statistically significant (Halcrow & Dunnigan, 2012). For further discussion of handwritten homework versus online homework, the reader may consult “pros and cons” in Kathleen Malevich’s (2011) MS thesis.

The major issues for faculty members in the AMS study on homework software (Kehoe, 2010) were the value of additional feedback gained from computer homework and whether that feedback is adequate since students do not show their work. Since the feedback to students is such an important issue in relation to homework, the conclusions of studies involving handwritten homework that is not graded seem limited. This study fills that gap in the literature.

Treatments and Design

Research Questions

The primary question under study is the following:

How do students using online homework perform compared with students using handwritten homework in second-semester Calculus?

The design is quasi-experimental, but students were “virtually randomized” as is detailed in the Treatment and Control Group Selection section.

A secondary question is this:

How do students in large classes (90–150) perform compared with students in small classes (<40) in second-semester calculus?

The design is quasi-experimental, and students were allowed to select their section.

Both research questions are examined for both large and small classes using both online and graded handwritten homework.

Nonhomework Aspects of the Course

The hosting department offered eight sections of its Mainstream Calculus II class in Fall 2016. Students, by their own choice, registered for a section based on their schedule, instructor preference, time preference, and available space. Four sections were used for this experiment. Two sections required handwritten homework (control), and two sections required the online homework system (treatment) associated with the textbook (Enhanced WebAssign). It is optional for teachers in the department to use online homework. Both the treatment and control groups had one small section (<40) and one large section (90–150).

The four sections in this study were listed in the schedule book as team taught by the first two authors. They delivered the lectures, assembled the homework assignments, and authored the tests. Classes met 4 days a week and were taught in lecture format with specific days reserved for review. In addition, there was a tutoring center available for students. This is typical of the Mainstream Calculus II format as previously discussed.

All classes received the same lectures and used the textbook Calculus: Early Transcendentals (Stewart, 2011). A lecture that was given to the first section was also given to all other sections by the same teacher on the same day. Sample tests, sample problems, or handouts were identical for all sections and posted on the course management system, Moodle. Teachers taught the four classes as similarly as they could. Students could not request homework problems on random days. In teacher evaluations, some students stated a preference for one teacher over the other, but only one student objected to the team-teaching format.

Four 1-hour tests and a 2-hour final exam included both multiple-choice and open-ended items (i.e., questions that required showing work or giving explanations). The multiple-choice items were scanned, and the open-ended items were graded by graduate teaching assistants, following a rubric developed by the two instructors. The graduate student who graded a particular open-ended question graded that question on every test. Each 1-hour test consisted of 12 multiple-choice problems and 3 open-ended problems in which students had to show work and give explanation. The open-ended problems were graded out of 20 points, and each multiple choice was worth 1 point, for respective maximum scores of 60 and 12 points. The final exam was 24 multiple-choice and 6 open-ended problems for respective maximum scores of 24 and 120 points. For the purposes of this study, we constructed a composite score, which weighted the multiple-choice and long-answer sections evenly and weighted a problem the same regardless of the assessment on which it appears. The composite score is

\frac{1}{12} \left( 2 \times \frac{\text{Final Exam multiple choice}}{24} + 2 \times \frac{\text{Final Exam long answer}}{120} + \sum_{i = 1}^{4} \left( \frac{\text{Test } i \text{ multiple choice}}{12} + \frac{\text{Test } i \text{ long answer}}{60} \right) \right)

Multiple-choice items are not particularly common in Calculus II, and so an analysis of the open-ended problems is included. For this purpose, the open-ended composite score is

\frac{1}{6} \left( 2 \times \frac{\text{Final Exam long answer}}{120} + \sum_{i = 1}^{4} \frac{\text{Test } i \text{ long answer}}{60} \right)
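The two composite scores can be checked numerically. The following is a minimal sketch, assuming raw section scores are available as plain numbers; the function and argument names are illustrative and do not come from the study.

```python
def composite_score(final_mc, final_long, tests_mc, tests_long):
    """Composite score in [0, 1] from raw section scores.

    final_mc   : final exam multiple-choice points (max 24)
    final_long : final exam long-answer points (max 120)
    tests_mc   : four multiple-choice scores, one per test (max 12 each)
    tests_long : four long-answer scores, one per test (max 60 each)
    """
    if len(tests_mc) != 4 or len(tests_long) != 4:
        raise ValueError("expected scores for exactly four tests")
    # The final exam sections carry double weight.
    total = 2 * final_mc / 24 + 2 * final_long / 120
    # Each 1-hour test contributes one multiple-choice and one long-answer term.
    total += sum(mc / 12 + la / 60 for mc, la in zip(tests_mc, tests_long))
    return total / 12


def open_ended_composite(final_long, tests_long):
    """Open-ended composite score in [0, 1] (long-answer sections only)."""
    if len(tests_long) != 4:
        raise ValueError("expected scores for exactly four tests")
    total = 2 * final_long / 120 + sum(la / 60 for la in tests_long)
    return total / 6
```

A student with full marks on every section receives 1.0 on both scores, since the weights inside the parentheses sum to 12 and 6, respectively.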

Exams closely reflected the common pool of homework problems, sample problems made available on the course management system, and examples done in class.

Dichotomy of Homework Systems

A challenge to instructors is to compose homework that reflects the learning objectives of a course. In handwritten calculus homework, students must formulate and express solutions. This aspect of expression is lost in online homework. Does online homework prepare students to meet the challenge of adequately expressing and demonstrating understanding on tests? Selinski and Milbourne (2015) note a lack of data to determine if online homework supports understanding and retention in mathematics courses. We believed it was important that tests reflect the level of expression and understanding expected in handwritten homework classes, thereby maintaining the same expected level of expression and understanding.

In the design, an attempt was made to assign approximately the same number of hours of work to the treatment and control groups. Hence, the online students had more homework problems, but the handwritten group had a greater burden of expression.

Online homework

WebAssign, the online homework system, is tailored to the textbook (Mathematics textbook questions in WebAssign, 2018). Homework problems in WebAssign are identified with homework questions in the textbook with some additional “extra practice” problems. Homework assignments were estimated (by WebAssign) to be approximately 1 to 1½ hours for each of 33 sections on the syllabus. The online group had 33 separate assignments consisting of one per section. One teacher picked the WebAssign homework problems.

Students usually had 1 week to complete a section, and multiple sections were open at a time. Students were allowed 10 attempts to answer a free-response question. For a multiple-choice question, the number of attempts allowed was one less than the number of choices. The online system would immediately indicate if an attempt was correct or incorrect. A key with worked out solutions was made available by the online system to students after the due date.

Most of the questions in WebAssign require a mathematical expression as an answer. Some explanatory problems from the textbook become multiple choice in WebAssign (e.g., “What is a convergent series? What is a divergent series?”). The questions that transfer to WebAssign from the textbook with the most significant change are questions that ask students to sketch a graph, curve, or surface. In WebAssign, students are asked to sketch, but they cannot enter a picture into the online system. Instead, this type of problem is implemented on WebAssign as multiple choice. However, students submitting the handwritten assignment version of this question must make a freehand sketch. All students must make freehand sketches on tests, since it is a desired skill. We called test questions requiring freehand sketches graphical answer questions, and we examined their scores separately.

Handwritten homework

There were 14 weekly homework assignments. They were graded by undergraduate graders. The assigned problems that were to be collected and graded were selected by one of the teachers from the corresponding assigned WebAssign problems.

Writing up a complete handwritten answer takes significantly more time than just doing scratch work, so the collected handwritten assignments were usually about 60% as long as the corresponding WebAssign assignments. However, all book problems that corresponded to a WebAssign assignment were assigned as practice problems. Corrected homework assignments were returned to the students the following week. Answers to homework problems were posted to the course management system, where they could be accessed by the students. To ensure consistency of grading, the instructors provided the graders with a guide.

Test Items

Referring to White and Mesa’s (2014) categories, which are a variant of those of Tallman and Carlson (2012), we believe that our test problems fall within the categories of Remember, Recall and apply procedure, Recognize and apply procedure, Understand, and Apply understanding. These category appearances are consistent with White and Mesa’s (2014) analysis of Calculus I tests, that is, “ … [Analyze, Evaluate, Create] codes showed up in our data less than 0.1% of the time …” (p. 680). Hence, we believe that our tests fall within the mainstream of calculus. We do recognize that tests constructed by various faculty members (even in the same department) may significantly vary as was noted in the White and Mesa (2014) Calculus I study. Some samples from Test 2 are given in Tables 1 and 2. Test 2 covered parametric and polar equations and the beginning of sequences and series.

Table 1.

Open-Ended Problems From Test 2.

1 (a). Sketch the polar curve given by

r = 1 - \sin (θ)

for 0 ≤ θ ≤ 2π.

(b). Find the area enclosed by the curve in part (a). Express your answer as a definite integral, that is, set up an integral but do not evaluate it.

2. Determine if the series

\sum_{n = 1}^{\infty} \frac{\ln (n)}{n^{2}}

is convergent or divergent. Be sure you state the test used or method as part of your work.

3. Determine if the series

\sum_{n = 1}^{\infty} (- 1)^{n - 1} \frac{n}{\sqrt{2 n^{2} + 1}}

is convergent or divergent. Be sure you state the test used or method as part of your work.

Table 2.

Sample Multiple-Choice Sequences and Series Problems From Test 2.

1. Suppose

\sum_{n = 1}^{\infty} a_{n}

is an infinite series for which

\lim_{n \to \infty} a_{n} = 0.

Which of the following statements must be true?

(1) \sum_{n = 1}^{\infty} a_{n} is convergent.

(2) \sum_{n = 1}^{\infty} a_{n} = 0.

(3) \sum_{n = 1}^{\infty} a_{n} does not diverge to infinity.

(a) 1 only. (b) 2 only. (c) 3 only. (d) 1 and 3 only. (e) None of the above.

2. Consider the series

1 - \frac{1}{2} + \frac{1}{10} - \frac{1}{4} + \frac{1}{100} - \frac{1}{8} + \frac{1}{1{,}000} - \frac{1}{16} + \frac{1}{10{,}000} - \frac{1}{32} + \dots

where

a_{n} = \begin{cases} \frac{1}{10^{\frac{n - 1}{2}}} & \text{if } n \text{ is odd} \\ \frac{1}{2^{\frac{n}{2}}} & \text{if } n \text{ is even} \end{cases}

Which of the following is true?

(a) The alternating series test applies and shows that the series converges.

(b) The alternating series test applies and shows that the series diverges.

(c) The series converges, but the alternating series test does not apply.

(d) The series diverges, but the alternating series test does not apply.

(e) None of the above are true.

3. Which of the following sequences

\{a_{n}\}_{n = 1}^{\infty}

diverge?

(a) a_{n} = \frac{1}{n}

(b) a_{n} = (- 1)^{n + 1} \frac{1}{n}

(c) a_{n} = \frac{n^{2}}{e^{n}}

(d) a_{n} = (- 1)^{n + 1}

(e) None of the above diverge.

4. You may use the comparison test on the series

\sum_{n = 1}^{\infty} \frac{1}{\sqrt{n^{4} + 2}}

by noting

(a) \frac{1}{\sqrt{n^{4} + 2}} < \frac{1}{n^{4}}.

(b) \frac{1}{n^{4}} < \frac{1}{\sqrt{n^{4} + 2}}.

(c) \frac{1}{\sqrt{n^{4} + 2}} < \frac{1}{n^{2}}.

(d) \frac{1}{n^{2}} < \frac{1}{\sqrt{n^{4} + 2}}.

(e) None of the above may be used for the comparison test.

5. Consider the series

\sum_{n = 0}^{\infty} e^{n}.

The sequence of terms begins as which of the following?

(a) \frac{1}{1 - e}

(b) e, e^{2}, e^{3}, \dots

(c) e + e^{2} + e^{3} + \dots

(d) 1 + e + e^{2} + \dots

(e) None of the above

Answers: e, c, d, c, e

Question 1 (Table 1) is a graphical answer question, a cardioid. The material on polar coordinates is covered in less than 2 hours, and most of that time is devoted to calculus methods rather than the basic definitions and graphing. The initial method to use is clear and falls into Recognize and apply procedure, but the implementation requires understanding for those who have not previously seen this equation. Students using WebAssign usually see graphing questions as multiple choice and may eliminate incorrect graphs without having to make the sketch themselves. Question 2 may be done either with an integral test, which would then require integration by parts, or as a subtle comparison test. A student must recognize the need for a test and which test to apply. The execution requires understanding. We viewed that question as Apply understanding. Question 3 also requires recognizing which test to apply, but the execution is a more straightforward limit computation from Calculus I. We considered it a Recognize and apply procedure question.

Multiple-choice items (Table 2) included memorization but also included testing for details of understanding concepts and methods. Some questions may be categorized as Remember, but Understand is more common as students must make comparisons or inferences that require an understanding of a concept.

Cost of Treatment and Control

Both the treatment and the control have associated costs. We cannot separate the cost of the textbook from the cost of the homework since they were usually purchased as a package for the students in the treatment group. We give the semester cost but multisemester options are also available. They are summarized in Table 3.

Table 3.

Book and Online Access Cost.

Option	Semester access ($)	Multisemester access ($)
Ebook alone	34.99	41.99
WebAssign with ebook	94.00	125.00
Paper book (own), WebAssign, and Ebook	171.65
Paper book rental	44.55

The undergraduate homework graders were paid $10 per hour and filled out their own time sheets. The cost per student (based on 14th-day enrollment) was $22.81. The total cost of the one-semester option is $57.80 (with $22.81 paid by the department or university and $34.99 paid by the student).

Treatment group costs were completely paid by the students. WebAssign with ebook access for one semester was $94.00 per student. WebAssign offers discounts at some schools and multiclass bundles. Note that from the school’s standpoint, the cost is $22.81 per student for handwritten homework versus $0.00 per student for online homework.
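The cost comparison in this section is simple arithmetic, and a short sketch makes the split between payers explicit. The dollar figures below are the ones reported above; the constant and function names are illustrative only.

```python
from decimal import Decimal

# Figures reported in the text and Table 3 (one-semester options).
GRADING_COST_PER_STUDENT = Decimal("22.81")  # paid by the department or university
EBOOK_SEMESTER = Decimal("34.99")            # paid by the student (handwritten sections)
WEBASSIGN_WITH_EBOOK = Decimal("94.00")      # paid by the student (online sections)


def handwritten_costs():
    """Return (institution cost, student cost, total) per student for handwritten homework."""
    total = GRADING_COST_PER_STUDENT + EBOOK_SEMESTER
    return GRADING_COST_PER_STUDENT, EBOOK_SEMESTER, total


def online_costs():
    """Return (institution cost, student cost, total) per student for online homework."""
    return Decimal("0.00"), WEBASSIGN_WITH_EBOOK, WEBASSIGN_WITH_EBOOK


hw_school, hw_student, hw_total = handwritten_costs()
on_school, on_student, on_total = online_costs()
# Extra amount paid by each student in an online section.
student_difference = on_student - hw_student
```

Under these figures, the handwritten option totals $57.80 per student, and a student in an online section pays $59.01 more than a student in a handwritten section.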

Methodology

The experimental design had a fully crossed factorial structure (Hinkelmann & Kempthorne, 2007). Specifically, the treatment factors considered in the statistical model consisted of the main effects of homework type (handwritten, online), class size (small, large), and gender (male, female), with all possible second- and third-order interaction effects. The interaction effects are of particular interest (e.g., to determine if the differences across the two types of homework varied across the levels of class size or gender). We also examined the model conditioned on class size and conditioned on the student being an underrepresented minority (i.e., African American, Hispanic or Latino, American Indian or Alaska Native, or Pacific Islander). We used SAS/STAT (version 9.4) and R (version 3.4.2) for our statistical analyses.

Statistical Model

The design used two covariates to control nonsystematic variation (a common technique often referred to as analysis of covariance) and made treatment comparisons with this adjustment (Hinkelmann & Kempthorne, 2007, pp. 239–276). When the covariates are not significant, analysis of covariance reduces to standard analysis of variance. It is a common approach to mix regression with analysis of variance models. Since each factor is at two levels, simple indicator variables (generally denoted as x) can be used to express each of the design factors. Thus, the full model is

\begin{matrix} Y = β_{0} + β_{1} x_{1} + β_{2} x_{2} + β_{3} x_{3} + β_{12} x_{1} x_{2} + β_{13} x_{1} x_{3} \\ + β_{23} x_{2} x_{3} + β_{123} x_{1} x_{2} x_{3} + β_{4} x_{4} + β_{5} z + ɛ \end{matrix}

where x₁, x₂, x₃, and x₄ are the homework type, class size, gender, and Pell indicators; z is the ACT covariate (below); and ɛ is the error term, which is assumed to be approximately normally distributed with mean zero and common variance, and independent across students. The response Y corresponds to the composite score, the open-ended composite score, or the sum of the graphical answer scores.
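To make the structure of the full model concrete, the sketch below builds the regressor vector for one student and evaluates the mean response for given coefficients. It is an illustration only: the 0/1 coding of each indicator (which level is coded 1) is an assumption, the function names are hypothetical, and no fitted coefficients from the study are used.

```python
def design_row(x1, x2, x3, x4, z):
    """Regressors for one student, in the order of the coefficients
    (b0, b1, b2, b3, b12, b13, b23, b123, b4, b5).

    x1, x2, x3, x4 : 0/1 indicators for homework type, class size, gender, Pell
                     (which level is coded 1 is an assumption of this sketch)
    z              : ACT covariate
    """
    return [1, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3, x4, z]


def mean_response(beta, row):
    """Expected value of Y (the model without the error term)."""
    if len(beta) != len(row):
        raise ValueError("beta and row must have the same length")
    return sum(b * r for b, r in zip(beta, row))


def homework_effect(beta, x2, x3):
    """Difference in expected Y between x1 = 1 and x1 = 0 at fixed class size
    and gender. The Pell and ACT terms cancel, leaving
    b1 + b12*x2 + b13*x3 + b123*x2*x3."""
    with_hw = mean_response(beta, design_row(1, x2, x3, 0, 0))
    without_hw = mean_response(beta, design_row(0, x2, x3, 0, 0))
    return with_hw - without_hw
```

The last function shows why the interaction terms matter: the homework effect is a single number only when the coefficients on x₁x₂, x₁x₃, and x₁x₂x₃ are zero; otherwise it varies with class size and gender.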

Covariates or Control Variables

To account for nonhomogeneity among the students, we make use of the Math ACT or Math SAT equivalent score in the form of a covariate called ACT and the indicator variable of whether a student received a Pell Grant. The ACT and SAT are the standardized tests used for admissions in U.S. higher education. Gender can also be viewed as a control variable. The ACT was used as a proxy for prior mathematical preparation or a pretest. The Pell Grant was a proxy socioeconomic indicator. Both ACT score and Pell status are measured for each student prior to the course assignment, and thus their independence from treatment assignment is plausible. If a student took the ACT more than once, then the highest score was used. We further assume that the ACT has a linear relationship to the response with a common slope for each homework type and class-size combination. In all models considered, we always found ACT to be a highly significant effect, and we often found Pell to be moderately significant. In effect, the ACT and Pell covariates are used in the model to further reduce experimental error in the design and thereby sharpen the significance of the treatment effects and their interactions.

Treatment and Control Group Selection

The Institute of Education Sciences (IES) has a strong preference for randomized controlled trials (Coalition for Evidence-Based Policy, 2003). While the pool of students was not randomly assigned, students were unaware that they were participating in a study and were unaware of the type of homework and grading in each section at the time of registration. They could not select a section based on whether they preferred one homework type or the other. We say homework treatment was virtually randomized.

One check of whether randomness is a reasonable interpretation is to examine the distribution of the 60 underrepresented minority students. The χ² statistic with 3 degrees of freedom for the distribution is 1.82. Under the assumption of a random assignment, there is a 61% chance the distribution would have a higher (worse) χ², which is consistent with random selection.
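The 61% figure can be reproduced from the reported statistic alone. The sketch below uses the closed form of the χ² upper-tail probability for 3 degrees of freedom, P(X > x) = erfc(√(x/2)) + √(2x/π)·e^(−x/2), so that only the Python standard library is needed; the function name is illustrative, and the underlying counts by section are not reproduced here.

```python
import math


def chi2_upper_tail_3df(x):
    """P(X > x) for a chi-square random variable with 3 degrees of freedom.

    Uses the closed form erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2),
    which is specific to 3 degrees of freedom.
    """
    if x < 0:
        raise ValueError("the chi-square statistic must be nonnegative")
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Upper-tail probability for the reported statistic of 1.82.
p_value = chi2_upper_tail_3df(1.82)
```

For the reported statistic, the result rounds to 0.61, in agreement with the text.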

The IES objection to a comparison group design is as follows: “ … unobservable differences between the members of the two groups that differentially affect their outcomes. For example, if intervention participants self-select themselves into the intervention group, they may be more motivated to succeed than their control-group counterparts.” We address the IES objections and explain how the two intervention groups in this study avoid the pitfalls discussed by IES (pp. 3, 11–12).

1. Students have similar prior test scores and demographics.

2. Students should be studied over the same time period, with the same methods, and same outcome data.

Item 1 is a matter of concern for both randomized and comparison group studies. We included ACT and Pell Grants as covariates to address test scores and socioeconomic characteristics. Concerning Item 2, both groups were studied over the same time period, with identical methods, and the same outcome data.

3. Students should not self-select their treatment.

4. The comparison group should not be comprised of individuals who had the option to participate in the intervention but declined.

5. The intervention or comparison groups and outcome measures are determined “prospectively,” that is, before the intervention is administered.

Concerning (3) and (4), students were unaware of the two options at the time of registration and could not make the selection based on whether they preferred one or the other. Three students who changed between control and treatment sections during the add/drop period were not included in the data. All the measures were arranged prospectively (5), and the only data gathered could have been obtained in nonexperimental sections. This last item was necessary to obtain the university’s institutional review board ethics approval without making the students aware of the study.

The issues with self-selection not discussed earlier are the typical priorities of students scheduling in multisection classes: (a) fit into schedule, (b) choice of instructor, and (c) time of day. Concerning (a), the authors checked and are unaware of any important class conflict that may have prejudiced the students’ selection process and caused differences in the control and treatment groups. The choice of the instructor (b) was not a factor between the control and treatment groups since the online registration booklet listed all four sections as team-taught with the same instructors. Concerning (c), there was no control. It is possible that early risers and late risers may have sorted themselves between the 8:30 a.m. treatment group and an 11:30 a.m. control group. However, if students were randomized into time slots in which they could not be effective, then, one can argue, it would prejudice or reduce the value of the trials (e.g., a late riser put in an 8:30 a.m. class might not attend lectures or pay attention in them but might do so at 11:30 a.m.). Students selecting their own time reflect the real-world situation and “… representations of particular situations—and the usefulness of the model will depend on the fidelity of the representation” (Schoenfeld, 2008, p. 480).

Note that there was virtual randomization with respect to all the effects except class size. Students knew the class size at the time of selection. There was virtual randomization between the two small sections and virtual randomization between the two large sections.

Treatment and Control Group Observations

The initial total enrollment was 270 students. To be included in the study, a student had to complete the course (i.e., turn in a final exam). Students could not withdraw during the last month of class. Three students were removed because they switched sections in the first 2 weeks after they may have become aware of the homework grading methodologies. One student was removed for cheating on a test. Ultimately, there were 242 students in this study.

Of the 242 students in this study, 226 had complete data. The gender split was 71.2% male and 28.8% female. Demographics were diverse with 60.6% White, 16.2% African American, 8.3% Hispanic or Latino, 6.6% Asian, 0.4% American Indian or Alaska Native, 0.0% Pacific Islander, and 7.9% listed as other.

The majority of the 18 students who had some missing data were missing only the covariate ACT or Pell. The level of “missingness” was fairly constant across the statistical models at approximately 10%. In addition to missing ACT, this percentage also reflects students with at least one missing exam, for whom a composite score could not be computed without some remedy. A complete case analysis with no imputation is allowed with missing data at a level under 10% (Hair, Black, Babin, & Anderson, 2000). Given the relatively low levels of missingness and the relatively large sample size in treatment cells, no imputation was performed. The raw data are available (Smolinsky, Olafsson, Marx, & Wang, 2018).
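A complete case analysis of the kind described above can be illustrated with a minimal sketch. The records and field names (act, pell, composite) are hypothetical and are not the columns of the published data set; the sketch shows only how listwise deletion and the resulting missingness rate might be computed.

```python
# Hypothetical student records; None marks a missing value.
# Field names are illustrative assumptions, not the study's actual variables.
records = [
    {"id": 1, "act": 28, "pell": 0, "composite": 0.52},
    {"id": 2, "act": None, "pell": 1, "composite": 0.31},
    {"id": 3, "act": 25, "pell": None, "composite": 0.44},
    {"id": 4, "act": 30, "pell": 0, "composite": 0.61},
    {"id": 5, "act": 22, "pell": 1, "composite": 0.27},
]


def complete_cases(rows, fields):
    """Keep only rows in which every listed field is present (listwise deletion)."""
    return [r for r in rows if all(r[f] is not None for f in fields)]


def missingness(rows, fields):
    """Fraction of rows dropped by listwise deletion on the given fields."""
    kept = complete_cases(rows, fields)
    return 1 - len(kept) / len(rows)


model_fields = ["act", "pell", "composite"]
usable = complete_cases(records, model_fields)
rate = missingness(records, model_fields)
print(len(usable), f"{rate:.0%}")
```

Because each statistical model uses a different set of variables, the missingness rate is computed per model, which may be why the level reported varies slightly from one analysis to another.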

Results and Statistical Analysis

Figure 1 provides a simple summary comparing the composite scores across homework type (handwritten vs. online). We find that the score distributions are relatively symmetric with means of 0.405 for handwritten and 0.415 for online, with a pooled standard deviation of 0.165. The t test found no significant differences across the factor homework type (p > .62). Even after adjusting for the ACT covariate, nonsignificance remained (p > .29). Similar findings were present for t tests comparing composite scores across the factor class size or the factor gender (p's > .86 and .49, respectively) and remained nonsignificant after adjusting for the covariate (p's > .99 and .12, respectively).
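To put the difference in means in context, a standardized effect size can be computed from the summary statistics reported above. This is an added illustration and not a statistic reported in the study; it assumes the usual definition of Cohen's d as the difference in means divided by the pooled standard deviation.

```python
# Summary statistics as reported in the text for the composite score.
mean_handwritten = 0.405
mean_online = 0.415
pooled_sd = 0.165


def cohens_d(mean_a, mean_b, sd_pooled):
    """Standardized mean difference (mean_a - mean_b) / pooled SD."""
    return (mean_a - mean_b) / sd_pooled


# Positive values favor online homework.
d = cohens_d(mean_online, mean_handwritten, pooled_sd)
print(round(d, 3))
```

The result is roughly 0.06 pooled standard deviations, which is small by any conventional benchmark and in line with the nonsignificant t test.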

Figure 1.

Distribution of composite scores by homework type.

Investigating the composite score using a full factorial on homework type, class size, and gender, along with the ACT linear covariate, we found moderate evidence of a Gender × Class Size interaction, suggesting that females had higher composite scores than males, but only in the large sections.

Composite Score

There were no statistically significant main effects or interactions involving class size, homework type, and gender (all p > .05). The ACT was statistically significant as a linear covariate (p < .001). The Pell effect indicator was also significant (p < .046). Pell recipients had an average composite score 0.048 lower than nonrecipients after controlling for all other effects. The model R² value was 39% with approximately 10% missingness in the data. The gender by class size interaction was significant at the 0.10 level (p < .081). The effect was again that females performed moderately better than males on average, but only in the large sections. Figure 2 displays this interaction, showing the four averages for the gender and class size combinations.

Figure 2.

Composite score: Least-squares means by class size and gender.

Open-Ended Composite Score

There were no statistically significant main effects or interactions involving class size, homework type, and gender (all p > .05). However, for this response, the three-factor interaction between homework type, class size, and gender was found to be moderately significant, with p < .076. The major driver of this interaction was the difference found across gender, but only for handwritten homework in large sections. There was an average increase of 0.101 points for females over males for this slice (p < .056, Bonferroni corrected). The Pell effect was also moderately significant (p < .095), with a 0.046 average increase in open-ended composite scores for non-Pell relative to Pell recipients. The ACT covariate remained highly significant with p < .0001.

Sum of the Three Graphical Answer Problems

For this response, a three-factor interaction was found between homework, section size, and gender (p < .023). Note that a significant difference was found across gender, but only at the levels of handwritten homework in large sections, with an average increase of 10.4 points for females over males (p < .008, Bonferroni corrected). Figure 3 displays the three-factor interaction in terms of homework comparisons (hand vs. online) at the four section-by-gender combinations. The left panel displays an average increase of 7.69 points for females in large sections who had handwritten homework when compared with online homework. The right panel shows a similar shift of 8.66 points for males in small sections who used handwritten homework when compared with online homework. The unprotected p values for these shifts are .055 and .043, respectively. However, when adjusting for multiple comparisons, these two shifts each become nonsignificant (p > .10). The model R² was 31%, and the ACT covariate was significant (p < .0001), while the Pell effect was nonsignificant (p > .42), after adjusting for all other effects.
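The effect of adjusting the two unprotected p values for multiple comparisons can be sketched as follows. The passage does not state which adjustment or how many comparisons were used for these shifts, so this sketch assumes a Bonferroni correction over four comparisons (one per section-by-gender combination in Figure 3); both choices are assumptions made for illustration.

```python
def bonferroni(p_values, n_comparisons):
    """Bonferroni adjustment: multiply each p value by the number of comparisons, capped at 1."""
    return [min(1.0, p * n_comparisons) for p in p_values]


# Unprotected p values reported in the text for the two shifts.
unprotected = [0.055, 0.043]

# Assumption: one homework comparison in each of the four
# section-by-gender combinations shown in Figure 3.
n_comparisons = 4

adjusted = bonferroni(unprotected, n_comparisons)
print([round(p, 3) for p in adjusted])
```

Under this assumption, both adjusted values exceed .10, which agrees with the statement that the shifts become nonsignificant after adjustment. The same conclusion holds for any number of comparisons of three or more.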

Figure 3.

Sum, graphical answer problems: Least-squares means by class size, gender, and homework.

Analysis, Conditional on Underrepresented Minorities

Conditional analyses were performed only on the 24.9% of the data that were considered from underrepresented minorities. For each of the three responses (composite score, sum of graphical answers, and open-ended composite), no significant main effects or interactions were found across the factor homework type and section size (p > .10 in all cases). The ACT covariate was highly significant for each response (p < .005 in all cases), whereas the Pell indicator was nonsignificant in all cases (p > .34). The conditional analysis on the underrepresented group had less than 7% missingness.
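The 24.9% figure can be cross-checked against the demographic percentages reported earlier. The sketch below assumes that the underrepresented minority group comprises the African American, Hispanic or Latino, American Indian or Alaska Native, and Pacific Islander categories; this grouping is an assumption, since the passage does not list the categories explicitly.

```python
# Demographic percentages as reported for the 242 students in the study.
demographics = {
    "White": 60.6,
    "African American": 16.2,
    "Hispanic or Latino": 8.3,
    "Asian": 6.6,
    "American Indian or Alaska Native": 0.4,
    "Pacific Islander": 0.0,
    "Other": 7.9,
}

# Assumed grouping of underrepresented minorities (not stated in the text).
assumed_urm = [
    "African American",
    "Hispanic or Latino",
    "American Indian or Alaska Native",
    "Pacific Islander",
]

urm_share = round(sum(demographics[group] for group in assumed_urm), 1)
total = round(sum(demographics.values()), 1)
print(urm_share, total)
```

The assumed grouping sums to 24.9%, matching the reported share, and the seven categories together account for 100.0% of the students.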

Analysis, Conditional on Class Size

Analyses were performed conditional on students being in a large section and, separately, conditional on students being in a small section. This was done for both the composite score and the open-ended composite score, yielding four cases. There was no indication of an effect from homework type, with p values ranging between .42 and .95. ACT was significant in all cases, with greater model R² found in small sections when compared with large sections (large section and open-ended R² = 0.31; large section and composite R² = 0.34; small section and open-ended R² = 0.48; small section and composite R² = 0.53). Gender was significant in the large sections, with women performing 0.067 higher on the open-ended composite (p < .022) and 0.056 higher on the composite (p < .026). Pell Grant recipients performed 0.049 worse on the composite in the large sections, with a moderate p < .088. There were no other significant or moderate effects.

Discussion

Computer homework has a number of ostensible advantages and disadvantages. Unlike handwritten homework, each online question has algorithmic variants, so students are assigned somewhat different questions. The variations in questions may encourage individual time on task compared with static problems. Promoting time on task is a good practice and an important part of homework assignments (Burn & Mesa, 2015; Chickering & Gamson, 1991). But students can find ways to avoid this benefit. (In 2018, the first author observed a large group of students on the GroupMe app. The first person to solve a homework problem, confirmed correct via immediate feedback, would send the screenshot to the whole group.)

Burn and Mesa (2015) note that “assignments are a primary means to provide students with timely, supportive, and corrective feedback, which is known to make a difference in students’ learning (Angelo & Cross, 1993; McKeachie & Svinicki, 2006)” (p. 49). Students get immediate feedback from online homework, but the feedback is limited to the answer rather than the work. It is timely, but is it supportive and corrective?

While we recognize these issues, we found no significant impact on the composite score or the open-ended composite score due to homework type (Table 4). The p values for the effect of homework type in all variations of the model were consistently nonsignificant, with the smallest being .42 and the largest over .99.

Table 4.

Scores.

	Large class		Small class
	Composite	Open-ended composite	Composite	Open-ended composite
Handwritten	0.392	0.3503	0.433	0.3773
Online	0.429	0.3797	0.377	0.3209
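The direction of the raw differences in Table 4 can be made explicit with a short calculation. The sketch below only subtracts the tabulated scores (online minus handwritten) within each class size; it is an added illustration and says nothing about statistical significance, which the models above did not find.

```python
# Scores copied from Table 4, keyed by (class size, response).
table4 = {
    ("large", "composite"): {"handwritten": 0.392, "online": 0.429},
    ("large", "open_ended"): {"handwritten": 0.3503, "online": 0.3797},
    ("small", "composite"): {"handwritten": 0.433, "online": 0.377},
    ("small", "open_ended"): {"handwritten": 0.3773, "online": 0.3209},
}

# Online minus handwritten; positive values favor online homework.
differences = {
    key: round(scores["online"] - scores["handwritten"], 4)
    for key, scores in table4.items()
}

for (size, response), diff in differences.items():
    print(f"{size:5s} {response:10s} {diff:+.4f}")
```

The raw differences favor online homework in the large sections and handwritten homework in the small sections, so they do not point in a consistent direction across class sizes.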

Homework type may not matter in most Mainstream Calculus classes because the expected level of explanation and work might not require a grader’s attention. For example, students would be penalized for incorrect use of an equal sign. If students performed a substitution in a definite integral without changing the limits of integration (even if they ultimately changed back to the original variable and arrived at the correct answer) or if they used the equal sign as an indiscriminate symbol between steps, then they would be penalized. However, there were no greater expectations of argumentation and precision. There were few students who correctly wrote coherent explanations using logical quantifiers or the implies symbol. Perhaps this is not surprising since even mathematics majors usually get their preparation for writing logical arguments in mathematics after calculus. The courses in this study were not more “in-depth” honors courses (Selinski & Milbourne, 2015, p. 32). Perhaps the expectations of Mainstream Calculus do not require handwritten homework, and tutoring support is a more effective use of resources than handwritten homework grading.

The similarity of results between online and handwritten aspects of a course has also been reported in other settings. For example, Ardid, Gómez-Tejedor, Meseguer-Dueñas, Riera, and Vidaurre (2015) noted a similarity of results between online and offline assessments in physics for engineers.

There was no virtual randomization with respect to class size, but we believe there was within the small-class group and within the large-class group. We therefore examined the effects conditional on the class size (i.e., comparing within each class size group). There was no effect from the homework type. We do note that women performed slightly but statistically significantly better than men in the large sections.

The study is still a valid comparison group study when viewed as a class size study. There was no virtual randomization, but it did include control for ACT and Pell. Looking at the effect of class size, we find no significant or moderate effect due to class size. The Mathematical Association of America study literature review commented that some studies, which are not specific to mathematics, show a negative association between class sizes and student achievement, while others found that class size is not a good predictor or not a predictor of student achievement (Selinski & Milbourne, 2015, p. 32). This study supports the view that class size is not a predictor in Calculus II.

Graphical answer homework questions became multiple choice in WebAssign, so it is plausible that the influence of homework type would be revealed in the graphical answer test questions. This could not be verified in either the small or the large sections. However, the influence of homework type in large sections on the graphical answers had a p < .13. While it is not at the level of moderately significant, it is worth noting as a possible point of further exploration. Looking deeper at these results, it first appears that females in large sections benefited from handwritten homework and males in small sections benefited from handwritten homework; however, these appearances do not rise to the level of statistical significance.

Conclusion

Is it detrimental that students do not show their work in online homework? This concern about using fully online or computer homework rather than handwritten homework was voiced by teachers (Kehoe, 2010). We did not find this fear to be realized and can endorse online or computer homework as being as good as handwritten and graded homework. Even the difference in scores for graphical answer questions was not significant. Departmental faculties that are concerned with the cost to students may use graders or use WeBWorK. It does not seem necessary in this era to assign homework that does not provide feedback to students.

There was also no difference in performance between class sizes of 90 to 150 and class sizes of 40. Students chose their class aware of the class size, and so we cannot justifiably recommend only offering large sections. Nevertheless, we can recommend that schools that solely teach using small sections may include larger sections as options.

Acknowledgments

The investigators are grateful to many people who assisted in the study. Graduate assistants Yu-Chan Chang and Sana Issa from the Louisiana State University (LSU) Department of Mathematics graded exams. Undergraduates Margaret Carey, Joey Lane, Juan E Rubio, Grady Cunningham, and Christopher Chuang graded homework. Bernie Braun, Director of Institutional Research, prepared the student variables for the study. The authors are grateful to Eugene Kennedy of the LSU School of Education for several discussions. The investigators also acknowledge the LSU President’s Student Aide Program, which supported undergraduate graders. The manuscript was improved by helpful suggestions from the reviewers. Charles Delzell and Susan Dunham made corrections to the manuscript.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Author Biographies

Lawrence Smolinsky is a Roy Paul Daniels professor in the Department of Mathematics at Louisiana State University. His present research interests include mathematics in higher education and informal middle school mathematics education as well as information science. His mathematical work includes topology and mathematical physics.

Gestur Olafsson is an alumni professor in the Department of Mathematics at Louisiana State University. His research interests include harmonic analysis, representation theory, and geometry.

Brian D. Marx is a professor in the Department of Experimental Statistics at Louisiana State University. His main research interests include P-spline smoothing and chemometric applications. He is the coordinating editor for the journal Statistical Modelling.

Gaomin Wang has a PhD in physics and is completing a degree in applied statistics at Louisiana State University. She is currently an Insight Data Science fellow at Insight, Palo Alto.

References

Angelo, T. A., & Cross, K. P. (1993). Classroom assessment techniques: A handbook for college teachers. San Francisco, CA: Jossey-Bass.

Ardid, Gómez-Tejedor, J. A., Meseguer-Dueñas, J. M., Riera, & Vidaurre (2015). Online exams for blended assessment. Study of different application methodologies. Computers & Education, 81, 296–303.

Blair, Kirkman, E. E., & Maxwell, J. W. (2013). Statistical abstract of undergraduate programs in the mathematical sciences in the United States: Fall 2010 CBMS survey. Washington, DC: American Mathematical Society.

Blair, Kirkman, E. E., & Maxwell, J. W. (2018). Statistical abstract of undergraduate programs in the mathematical sciences in the United States: Fall 2015 CBMS survey. Washington, DC: American Mathematical Society.

Bressoud, D. M., Mesa, & Rasmussen, C. L. (2015). Insights and recommendations from the MAA national study of college calculus. Washington, DC: Mathematical Association of America.

Burn & Mesa (2015). The Calculus I curriculum. In Bressoud, D. M., Mesa, & Rasmussen, C. L. (Eds.), Insights and recommendations from the MAA national study of college calculus (pp. 45–57). Washington, DC: Mathematical Association of America.

Chickering, A. W., & Gamson, Z. F. (1991). Seven principles for good practice in undergraduate education. New Directions for Teaching and Learning, 47, 63–71.

Coalition for Evidence-Based Policy. (2003). Identifying and implementing educational practices supported by rigorous evidence: A user friendly guide (Publication No: NCEE EB2003). Institute for Education Science, U.S. Department of Education. Retrieved from https://ies.ed.gov/pubsearch/pubsinfo.asp?pubid=NCEEEB2003.

Delisle, J. (2017). The Pell Grant proxy: A ubiquitous but flawed measure of low-income student enrollment (Evidence Speaks Reports 2, no. 26, 2017). Retrieved from https://www.brookings.edu/research/the-pell-grant-proxy-a-ubiquitous-but-flawed-measure-of-low-income-student-enrollment/.

Federal Pell Grant Program. (2015). Programs: Federal Pell Grant Program. U.S. Department of Education. Retrieved from https://www2.ed.gov/programs/fpg/index.html.

Hair, J. F., Jr., Black, W. C., Babin, B. J., & Anderson, R. E. (2000). Multivariate data analysis (7th ed.). Upper Saddle River, NJ: Pearson Education.

Halcrow & Dunnigan (2012). Online homework in Calculus I: Friend or foe? PRIMUS, 22(8), 664–682.

Hinkelmann & Kempthorne (2007). Design and analysis of experiments, volume 1: Introduction to experimental design (2nd ed.). Hoboken, NJ: Wiley.

Hirsch & Weibel (2003). Statistical evidence that web-based homework helps. FOCUS, 23(2), 14.

Kehoe (2010). AMS homework software survey. Notices of the American Mathematical Society, 57, 753–757.

LaRose, P. G. (2010). The impact of implementing web homework in second-semester calculus. PRIMUS, 20(8), 664–683.

Malevich, K. (2011). The accuracy and validity of online homework systems (master’s thesis). University of Minnesota, Duluth, MN. Retrieved from http://www.d.umn.edu/math/Technical%20Reports/Technical%20Reports%202007-/TR%202011/TR_2011_2.pdf.

Mathematics Textbook Questions in WebAssign. (2018). WebAssign from Cengage. Retrieved from https://www.webassign.net/features/textbooks/mathematics_textbooks.html.

McKeachie, W. J., & Svinicki, M. D. (2006). Teaching tips: Strategies, research, and theory for college and university teaching (12th ed.). Boston, MA: Houghton Mifflin.

Schoenfeld (2008). Research methods in mathematics education. In English, L. D. (Ed.), Handbook of international research in mathematics education (2nd ed., pp. 467–519). New York, NY/London, England: Routledge, Taylor & Francis Group.

Selinski, N. E., & Milbourne (2015). The institutional context. In Bressoud, D. M., Mesa, & Rasmussen, C. L. (Eds.), Insights and recommendations from the MAA national study of college calculus (pp. 31–44). Washington, DC: Mathematical Association of America.

Smolinsky, L., Olafsson, G., Marx, B. D., & Wang, G. (2018). Data for: Online and handwritten homework in Calculus for STEM majors (Data set; Mendeley Data v1). doi:10.17632/d4js3h6zsx.1.

Stewart (2011). Calculus: Early transcendentals (7th ed.). Boston, MA: Brooks Cole Cengage Learning.

Tallman & Carlson, M. P. (2012). A characterization of calculus I final exams in U.S. colleges and universities. In Brown, Larsen, Marrongelle, & Oehrtman (Eds.), Proceedings of the 15th Annual Conference on Research in Undergraduate Mathematics Education (pp. 217–226). Portland, OR: Portland State University.

White, N. J., & Mesa (2014). Describing cognitive orientation of Calculus I tasks across different types of coursework. ZDM Mathematics Education, 46(4), 675–690.

Zerr, R. J. (2007). A quantitative and qualitative analysis of the effectiveness of online homework in first-semester calculus. Journal of Computers in Mathematics and Science Teaching, 26(1), 55–73.