Abstract
Many studies use multiexperiment designs where experiments are carried out at different times of semester. When comparing between experiments, the data may be confounded by between-participants effects related to motivation. Research indicates that course-credit participants who engage in research early in semester have different personality and performance characteristics compared to those tested late in semester. This study examined whether the semester effect is caused by internal (inherent motivation of the participant) or external (looming exams, essays) factors. To do this, sustained attention and intrinsic/extrinsic motivation were measured in groups of course-credit (n = 40) and paid (n = 40) participants early and late in semester. While there was no difference in sustained attention between the groups early in semester, the course-credit group performed significantly worse late in semester. The course-credit group also showed a significant decrease in intrinsic motivation with time whereas the paid participants showed no change. Because changes were not seen for both groups, the semester difference cannot be due to external factors. Instead, the data demonstrate that course-credit participants who engage early have high sustained attention and intrinsic motivation compared to their late counterparts, who leave their participation to the last minute. Researchers who use multiexperiment designs across semester need to control for these effects, perhaps by using paid participants who do not vary across semester.
All participants may not be created equal. To illustrate how individual differences could affect the inferences drawn from psychological experimentation, consider the following scenario. An experiment investigates the effect of predictive spatial cues on shifts in exogenous attention. Like many psychology studies, the experiment uses undergraduate psychology students who participate as part of their course requirements. The results show a strong effect of cueing whereby reaction times (RTs) are faster for validly cued trials than for invalid trials. This successful experiment is followed by a second experiment later in the semester where the spatial cues no longer predict the target's location. The second experiment shows a reduced cueing effect, and this effect is confirmed with an omnibus analysis of variance, which shows a significant interaction between the within-subject factor (valid/invalid) and the between-subject factor (predictive/nonpredictive). Based on these results, the experimenters conclude that spatial cues produce a larger cueing effect when they are predictive than when they are not.
The scenario outlined above is one that could be played out in experimental psychology laboratories around the world. While the conclusions seem sound at a surface level, there is a problem related to the effect of time: One of the experiments was carried out earlier in the semester than the other. University students who carry out experiments as part of their course requirements are used widely throughout psychology research (Miller, 1981). The benefit of using student pools is that they are easily accessible, relatively homogenous, and (hopefully) bright and able. It is possible, however, that these participants differ in their motivation to carry out the experiment. Returning to our scenario again, it is conceivable that participants in the first experiment, which was conducted earlier in semester, had higher levels of motivation than those tested later in semester. Increased motivation is known to enhance neural processing within task-related regions and change reaction time profiles (Small et al., 2005) and improve detection performance (Engelmann, Damaraju, Padmala, & Pessoa, 2009). More importantly, increasing motivation is known to reduce intraindividual variability in reaction time for a choice RT task (Garrett, MacDonald, & Craik, 2012). It is therefore possible that the reduced effect of cueing observed in Experiment 2 is related to lower motivation and higher data variability in these participants, which obscures differences between the cueing conditions, rather than a true effect of cue predictability.
Research indicates that differences exist between participants who partake in studies early or late in semester. Participants who engage in studies early in semester are more likely to be female (Harber, Zimbardo, & Boyd, 2003; Richter, Wilson, Milner, & Senter, 1981), have high grades, and be more learning oriented (Bender, 2007), as well as having lower sensation seeking and impulsivity (Zelenski, Rusting, & Larsen, 2003). In terms of task performance, early participants attempt more items on symbol substitution tasks (Richter et al., 1981), scan more items in a digit scanning task (Richert & Ward, 1976), and spend more time completing difficult items on an anagram task (Hom, 1987). Participants who sign up late in semester are also more likely to withdraw from an aversive task, when given the chance, than those who sign up early (Navarick & Bellone, 2010).
While a number of studies have established that performance differences exist between early and late participants, the locus of this difference is not clear. The locus could be internal—related to the inherent quality of the participant, or external—related to environmental pressures. To give an example: A participant who engages in a study late in semester may perform poorly on a task (Hom, 1987) or withdraw (Navarick & Bellone, 2010) because of an inherent lack of motivation or external pressures such as looming essays and exams. Support for the internal account comes from research showing differences between early and late participants in personality traits such as personal structure (Roman, Moskowitz, Stein, & Eisenberg, 1995) and compliance (Aviv, Zelenski, Rallo, & Larsen, 2002). Such traits are presumably relatively stable and reflect an internal locus rather than external, situational factors. Grimm, Markman, and Maddox (2012) argue, however, that late participants are not necessarily unmotivated, but are in a different motivational state caused by situational factors at the end of semester. To test the proposition, they administered a maths test where either correct responses were rewarded with points, or fewer points were lost. Results showed that early participants performed better when they tried to maximize gains, whereas late participants performed better when they tried to minimize losses—perhaps due to a situational prevention focus.
Knowing the locus of the semester-related timing effect is important as it would allow researchers to control for, or avoid, such effects in their experimental designs. To provide a better understanding of why late participants perform worse than early participants, this study used two groups tested at different times in the semester. The first group were students participating as part of their psychology course requirements, whereas the second group were students paid for their time. While both groups experience the same external/situational constraints, they may differ in relation to their motivation to perform experimental tasks.
Task-related performance was measured using the Sustained Attention to Response Task (SART; Robertson, Manly, Andrade, Baddeley, & Yiend, 1997). The SART requires participants to maintain attention and withhold their response to infrequent targets. As such, it should provide a good reflection of how performance is affected on a broad range of cognitive tasks that measure RT and detection. Performance was measured using an inverse efficiency score, which takes error and RT into account, as well as trial-to-trial variability in RT. If the lower performance of late participants is related to external demands (e.g., pressure of upcoming exams), then both groups should show decreased processing efficiency as well as more variable RTs from early to late semester. Conversely, if the lower performance is driven by internal constraints where unmotivated course-credit participants leave their attendance to the last minute, then an effect of semester should only be apparent for the course-credit participants.
Individual differences in motivation were measured with the Student Work Preference Inventory (Amabile, Hill, Hennessey, & Tighe, 1994). This inventory measures stable traits relating to intrinsic motivation (work done for its own sake) and extrinsic motivation (work done for reward or recognition; Amabile et al., 1994). In relation to intrinsic motivation, Hom (1987) concluded that early participants had higher levels than late participants because their performance was detrimentally affected by the presence of an external reward. No effect of semester, however, was reported by Casa de Calvo and Reich (2007) when they administered a questionnaire assessing intrinsic motivation. Although these inconsistent results make predictions difficult, it seems reasonable to propose that, if the early/late semester effect is the result of internal processes, then course-credit participants who engage in studies early in the semester should have higher intrinsic motivation than late participants. In contrast, the paid participants should have the same level of intrinsic motivation irrespective of the time of semester.
To our knowledge, no study has investigated the effect of semester on extrinsic motivation. While it could be argued that both course-credit and paid participation are done for reward, the reward is more salient for paid than for course-credit participants. With this in mind, it was predicted that paid participants would have higher levels of extrinsic motivation than course-credit participants. Moreover, levels of extrinsic motivation should be stable across time for both groups. This lack of interaction contrasts with the interaction predicted for intrinsic motivation and will demonstrate that the effect of semester is specific to intrinsic motivation and not related to a general motivational mechanism.
Method
Participants
Eighty university students were recruited using the Sona human subject pool management software (m = 18, f = 62,
Although the experiment aimed to use all participants' data, there was one clear outlier. This female from the early course-credit condition correctly responded to targets and withheld her response to nontargets only 65% of the time. The nearest lowest score was 81%, and the mean accuracy for all participants was 91%. Given that this participant clearly misunderstood or was unable to do the task, her data were omitted from the analysis. Chi-square analyses revealed no difference in gender ratio as a function of time (early, late) or reward (course-credit, paid). An analysis of variance (ANOVA) with time and reward as between-participants factors revealed that early participants were older (
Apparatus
Stimulus presentation was controlled with a PC running E-prime 2.0 software and displayed on an LCD screen (Dell U2212HM) with a diagonal width of 545 mm. Responses were recorded using an E-prime serial response box (Model 200A). The box was placed in front of the participant parallel with their midsagittal plane. A height-adjustable chin rest maintained participants' head position so that the centre of the display panel was in line with their midsagittal plane at eye level at a distance of 450 mm. A closed-circuit video camera ensured that participants' concentration was maintained.
Stimuli
The stimuli were based on the SART developed by Robertson, Manly, Andrade, Baddeley, and Yiend (1997). Digits ranging between 1 and 9 were shown in white against a black background. The height of the digits ranged between 12 and 29 mm in roughly five equal steps. Correspondingly, the width of the digits ranged between 7 and 20 mm in five steps. A 24-mm diameter circle with an “x” in it (⊗) was used as a mask. The stimuli were placed in the centre of the screen.
Procedure
The SART was administered in a quiet, evenly lit room. Participants completed 360 trials. The factorial combinations of digit (1–9) and font size (5 levels) were equally represented with 7 repeats. The order in which the factorial combinations occurred was randomized so that each participant saw a unique series of trials. The task began with the presentation of a digit for 250 ms. This digit was immediately replaced by a mask for 900 ms (the stimulus onset asynchrony, SOA, was therefore 1150 ms). For each trial, participants determined whether the digit was a “3” or not. If a “3” was presented, participants withheld their response. Otherwise, if the digit was any other number, participants made a speeded response using the index finger of their preferred hand. Participants therefore had to withhold their response, on average, for 1 out of 9 trials. Both speed and accuracy were emphasized as important. Following the presentation of the mask, the next digit was immediately presented. While a fixed, predictable sequence has been used in some studies (Johnson et al., 2007), we used a random sequence where the next digit could not be predicted. The cycle of presentation was not contingent upon participants' responses and continued until all 360 trials were completed. The task took 7 minutes to administer. Participants were given a block of 18 practice trials prior to completing the experimental trials. Following completion of the SART, the Student Work Preference Inventory (Amabile et al., 1994) was administered.
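The timing figures in the procedure can be checked with a few lines of arithmetic. The sketch below uses only values stated above (250-ms digit, 900-ms mask, 360 trials, one target digit out of nine); the variable names are illustrative and are not taken from the E-prime script used in the study.

```python
# Values taken from the Procedure; names are illustrative only.
DIGIT_MS = 250    # digit presentation time
MASK_MS = 900     # mask presentation time
N_TRIALS = 360    # experimental trials per participant
N_DIGITS = 9      # digits 1-9, of which only "3" is a target

# Stimulus onset asynchrony: onset of one digit to onset of the next.
soa_ms = DIGIT_MS + MASK_MS

# Presentation is not contingent on responses, so duration is fixed.
duration_min = N_TRIALS * soa_ms / 1000 / 60

# Digits are equally represented, so one trial in nine is a target.
expected_targets = N_TRIALS / N_DIGITS

print(f"SOA: {soa_ms} ms")
print(f"Task duration: {duration_min:.1f} min")
print(f"Trials requiring a withheld response: {expected_targets:.0f}")
```

The computed duration of just under 7 minutes agrees with the administration time reported above.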
Analyses
Errors were recorded when participants failed to withhold a response to a “3” (commission errors) or when they failed to respond to a number other than “3” (omission errors). The total number of errors was summed and was converted to a percentage of the total number of trials (360). Reaction times were recorded for correct responses to numbers other than “3” and were averaged across trials. Because it is likely that different participants traded off speed against accuracy, an inverse measure of processing efficiency was used to take accuracy and RT into account. The inverse efficiency score was suggested by Townsend and Ashby (1983) and divides the average RT by percentage accuracy. While the inverse efficiency score has been criticized (Bruyer & Brysbaert, 2011), it has been fruitfully used by a number of researchers (Roder, Kusmierek, Spence, & Schicke, 2007; Shore, Barnes, & Spence, 2006). Rach, Diederich, and Colonius (2011) compared the inverse efficiency score with a method that fitted a sequential sampling model to the RT and accuracy data. Although the sequential sampling method has some benefits, Rach et al. (2011) concluded that the inverse efficiency score was still a useful indicator of performance. In addition to the inverse efficiency score, the standard deviation of the RT was analysed to investigate whether correct response times were more variable on a trial-to-trial basis.
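As an illustration of how these per-participant measures could be computed, the following is a minimal sketch rather than the analysis code used in the study. It assumes that accuracy is expressed as a proportion (so the score stays in milliseconds) and that RT variability is the sample standard deviation; the text above does not specify either detail. The function name `sart_scores` and the example values are hypothetical.

```python
from statistics import mean, stdev


def sart_scores(rts_ms, n_errors, n_trials=360):
    """Summarise one participant's SART data.

    rts_ms   -- reaction times (ms) for correct responses to non-targets
    n_errors -- commission errors plus omission errors
    n_trials -- total number of trials (360 in the procedure above)
    """
    error_pct = 100.0 * n_errors / n_trials
    # Assumption: accuracy enters the score as a proportion (0-1).
    accuracy = 1.0 - n_errors / n_trials
    mean_rt = mean(rts_ms)
    return {
        "error_pct": error_pct,
        "mean_rt": mean_rt,
        # Higher values indicate less efficient processing.
        "inverse_efficiency": mean_rt / accuracy,
        # Assumption: sample (n - 1) standard deviation of correct RTs.
        "rt_sd": stdev(rts_ms),
    }


# Hypothetical participant: illustrative values, not data from the study.
example = sart_scores([300.0, 340.0, 320.0, 360.0, 280.0], n_errors=36)
print(example)
```

Dividing by accuracy inflates the mean RT of participants who respond quickly but inaccurately, which is why the score guards against speed-accuracy trade-offs when comparing groups.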
Results
The inverse efficiency data were analysed with an analysis of covariance (ANCOVA) with time (early, late) and reward (course-credit, paid) as between-participant factors. Preliminary analyses revealed a correlation between age and performance on the SART, r(79) = .283, p = .012, with older participants performing worse on the SART. In addition, there was a significant correlation between age and extrinsic motivation, r(79) = −.235, p = .037, whereby older participants had lower levels of extrinsic motivation. Given that early participants were significantly older than late participants and the effect of age on the dependent variables, age was included as a covariate in the analyses.
For the inverse efficiency data, there was no main effect of time, F(1, 74) = 0.606, p = .439,
Figure 1. Mean inverse processing efficiency scores, controlling for age covariance, for the course-credit and paid groups tested early and late in semester. Standard error bars are shown.
The same ANCOVA model was used for the RT variability data. There was no main effect of time, F(1, 74) = 2.175, p = .104,
Figure 2. Response time variability measured using the standard deviation (SD) of reaction time, controlling for age covariance, for the course-credit and paid groups tested early and late in semester. Standard error bars are shown.
Intrinsic and extrinsic motivation scores were calculated separately for each participant and were analysed using the same ANCOVA as that for the inverse efficiency data. For intrinsic motivation, there was no effect of reward, F(1, 74) = 0.163, p = .688,
Figure 3. Mean (a) intrinsic and (b) extrinsic motivation scores, controlling for age covariance, for the course-credit and paid groups tested early and late in semester. Standard error bars are shown.
For extrinsic motivation, there was a significant effect of reward, F(1, 74) = 8.568, p = .005,
Discussion
Unlike many studies, which have investigated the effect of semester timing on course-credit participant characteristics, the current study used a control group of paid participants. This control group allowed us to test whether differences between early and late participants were the result of internal factors such as motivation or external factors such as study pressures late in semester.
Cognitive performance was measured using an inverse efficiency and RT variability score obtained from the SART (Robertson et al., 1997). The inverse efficiency data give an indication of the accuracy of responses relative to the speed at which the responses were made. In contrast, the RT variability reflects trial-to-trial variability in response times for correct responses. There was no overall difference between the course-credit and paid participants in sustained attention performance for either measure. No difference between course-credit and paid participants has also been reported by Brase (2009) for a Bayesian task. There was also no overall decrement in performance from early to late semester for either measure. This lack of difference effectively rules out the possibility that semester effects are related to external factors such as study pressures late in semester. Instead, the data showed an interaction between group and time for both the inverse efficiency and the RT variability data.
For the inverse efficiency data, early participants showed no difference in performance between the course-credit and paid groups. In contrast, for the late participants, the course-credit group performed significantly worse than the paid group. The interaction was brought about by a nonsignificant fall in performance by the course-credit group coupled with a nonsignificant rise in performance in the paid group. For the paid participants, there is some indication that sustained attention increased later in semester. This increase could reflect a learning effect whereby first-year psychology students learn to do better in experiments as they gain experience by participating in psychological research. This potential learning effect may have been counteracted in the course-credit participants where internal factors, such as lower motivation, reduced sustained attention.
The RT variability data complement the inverse efficiency data. Course-credit participants had low levels of RT variability early in semester, which significantly increased later in semester. In contrast, RT variability was not affected by the time of semester in the paid participants. Given that lower motivation is related to increased variability on an RT task (Garrett, MacDonald, & Craik, 2012), the increased variability later in semester for course-credit participants suggests a lack of motivation in this group.
Individual differences in intrinsic and extrinsic motivation were measured using the Student Work Preference Inventory (Amabile et al., 1994). For intrinsic motivation, there was no overall group difference, nor was there an effect of time. There was, however, some indication of an interaction between group and time. While it should be acknowledged that post hoc tests should not be conducted on a nonsignificant interaction, the pattern in Figure 3a is in the direction that would be predicted. The post hoc tests revealed that, although there was no effect of time for paid participants, the early course-credit participants had significantly higher levels of intrinsic motivation than their counterparts who signed up later in the semester. The results therefore support Hom's (1987) claim that early participants have higher levels of intrinsic motivation than late participants. The results, however, contrast with those of Casa de Calvo and Reich (2007) who reported no effect of time on responses to the Intrinsic Motivation Inventory (Ryan, 1982). Although the cause of the discrepancy is unclear, it is possible that the Student Work Preference Inventory (Amabile et al., 1994) is more sensitive to intrinsic and extrinsic motivation in students because it is specifically designed to assess this group.
The interaction observed for intrinsic motivation stands in contrast to the pattern of results observed for extrinsic motivation. In this case, there was a main effect of group whereby paid participants had a higher level of extrinsic motivation than course-credit participants. It is likely that this effect reflects the fact that individuals with higher levels of reward-driven extrinsic motivation are more likely to participate in an experiment for money. It is also noteworthy that the effect of group was consistent across time. Thus, the interaction between time and group is quite specific to intrinsic motivation and does not reflect a general motivational state.
The current study has demonstrated that, while there are no differences in sustained attention between course-credit and paid participants early in semester, paid participants do better later in semester. There was also an effect of time on intrinsic motivation whereby scores decreased across semester for course-credit participants whereas paid participants showed no change. These results therefore demonstrate that the difference between early and late course-credit participants is related to an internal mechanism linked to motivation. Students who participate early in experiments are keen to engage in the study and do well. In contrast, students who are less enthusiastic about participating in research leave it to the last minute and do less well. Another related possibility is that the late participants were intrinsically motivated early in semester, but experienced “burn-out” during the semester. If we had tested these participants early in semester, they would have had high levels of intrinsic motivation and sustained attention. Because “burn-out” is known to affect tasks related to response readiness (Kleinsorge, Diestel, Scheil, & Niven, 2014), it is possible that these same participants had lower levels of intrinsic motivation and sustained attention later in semester.
The results of this study have implications for experimental design. Experiments conducted early in semester are likely to include participants who are more motivated and perform better than participants tested later in semester. A lack of motivation and/or poorer performance may introduce noise into the data (Garrett et al., 2012) and obscure effects that may have been significant otherwise. Such effects become particularly problematic when experiments are conducted at different times of semester, and the results are compared. Because the effect of semester is driven by internal qualities of the participants, any intervention needs to be targeted at this level.
One solution to the effect of semester is to screen the data for outliers with poor performance and exclude these participants. This solution, however, is somewhat post hoc and therefore not ideal. One could also argue that different experiments should be conducted at the same time of semester with participants interleaved between the experiments. While this solution may work in an ideal world, it is more likely that most experiments are conducted on the basis of results of the previous experiment—making it difficult to control for order. Another solution is to use paid participants in multiple experiment designs because they do not differ in relation to motivation or performance across semester. Whatever the solution, researchers should be aware of the differences that can arise between the data collected early and late in semester and control for these effects.
