Abstract
Objective
In this study, we aimed to show that post-task self-reported mind-wandering can be influenced by task performance.
Background
Retrospective self-report scales are widely used to measure thought content such as task-unrelated thoughts or mind-wandering in sustained attention or vigilance research. Self-reported thought content is presumed to be a predictor of performance. However, it is possible performance affects how people report their thought content.
Method
In a remote online experiment, we used a fixed order Sustained Attention to Response Task (SART) to force errors by manipulating an expected stimulus. We then assessed self-reported thought content.
Results
We were successful in forcing errors in the SART. Participants in the forced error version of the task reported more task-unrelated thoughts than participants in a version of the task that did not force an error, despite the tasks being identical up until the forced error.
Conclusion
Post-task thought content probes (and similar thought content measures) are apparently affected by task performance despite their conventional use as a predictor of that performance. The current method of using post hoc thought content probes is thus a poor choice for studying the impact of thought content on performance.
Application
A fixed order SART with forced errors is a novel way to investigate relationships between performance and self-report measures of thought content.
Keywords
In vigilance and sustained attention, there is a growing literature examining the impact of thought content, for example, mind-wandering, on performance (Christoff et al., 2009; Seli et al., 2018; Smallwood et al., 2004; Zanesco et al., 2024) and also those challenging this work (Dang et al., 2025; Head & Helton, 2018; Humphrey et al., 2025). The advocates of employing thought content measures as predictors for vigilance performance suggest vigilance errors are due to either states of mindlessness or attention withdrawal from the task due to task-unrelated thoughts (Smallwood et al., 2004). The critics of this perspective propose task-unrelated thoughts may occur during vigilance tasks but that these thoughts are basically irrelevant for performance (Dang et al., 2025). An example of an early study in vigilance examining the impact of task-unrelated thoughts on performance was Antrobus and colleagues (1966). In their study, Antrobus and colleagues induced task-unrelated thoughts by exposing participants in one condition to unsettling news prior to performing a vigilance task. Telling male university students that their university draft deferment for the Vietnam War was no longer valid via a sham news report did generate more post-task self-reports of task-unrelated thoughts in that condition compared to a control, but interestingly had no significant impact on vigilance performance. Nevertheless, despite early failures in the literature to support an impact of self-reports of task-unrelated thoughts on vigilance performance, more contemporary researchers have continued to advocate the possibility that task-unrelated thoughts directly cause sustained attention or vigilance lapses (Seli et al., 2018).
A challenge for the entire field of examining the impact of thought content on performance, not just vigilance performance, is that assessment of thought content requires a first-person verbal report or response (Overgaard, 2006). Participants must report what they are thinking about during or after performance. If thought content is assessed ahead of performance or physiological responses, then the thought content assessment may interfere with subsequent performance or physiological responses. The participant would be performing a task, and then probed regarding what the participant is thinking. Then this probed response would be used to predict subsequent performance or physiological responses. The problem is the probing itself may influence subsequent thoughts, and this potential change in thoughts may influence subsequent performance or physiological responses. Reliably assessing the impact of thought content prior to performance is a difficult challenge as the act of verbalizing or reporting thoughts may alter the stream of thoughts itself and thus, subsequent performance and physiological responses (note researchers using “think aloud” protocols have raised similar concerns, see Raffaelli et al., 2021). Thought content researchers interested in mind-wandering and task-unrelated thoughts have instead decided to probe participants during tasks or immediately after tasks with retrospective self-reports of thought content (Christoff et al., 2009; Seli et al., 2018; Smallwood et al., 2004; Zanesco et al., 2024). The participant is performing a task and then is probed about what they were thinking in the near past (Kane et al., 2021).
Head and Helton (2018), however, have pointed out a serious problem with this entire method. Causes precede effects in time (White, 2000). The self-report occurs after the period of performance or physiological response. This naturally raises the question of the direction of causality. Participants may instead utilize self-reports of thought content to explain or manage impressions of their performance or physiological responses. Suppose a participant has sweaty palms and is then probed about thought content. The participant notices the sweaty palms and therefore infers they were thinking about something that could cause sweaty palms. This is not implausible as it is structurally similar to the James-Lange theory of emotion (Lange & Haupt, 1922). Even though it is plausible, researchers pursuing mind-wandering and task-unrelated thoughts seem to discount this possibility (see Aitken et al., 2023; Head & Helton, 2018).
If self-reports are made partially in response to performance, we would expect participants to report more task-unrelated thoughts after a forced error than after no error. The forced error would preferably be an obvious error or near error (like an action slip) made by the participant, not just error feedback provided by an experimenter which may or may not be believed (Helton et al., 1999). One of the most widely used tasks in the mind-wandering or thought content literature, especially with regard to sustained attention, is the Sustained Attention to Response Task (SART; Robertson et al., 1997). The SART is a Go-No-Go task with very infrequent No-Go stimuli: randomized single digits are presented, and key presses are withheld only for the digit 3. Accuracy and speed of response to both Go and No-Go stimuli are measured. A modified version of the SART, the fixed order version, provides an excellent candidate for a forced error task (Carter et al., 2013; Dockree et al., 2006; Manly et al., 2003). In the fixed order version of the SART, the number stimuli occur in a repeating order (4 always follows 3 and so on) rather than being presented randomly. When the participant performs the fixed order SART, the pattern quickly becomes obvious. Therefore, the expectancy of what stimulus will occur next in the task is very high. Errors still happen in the fixed order SART, especially errors of commission (inappropriate responses to No-Go stimuli), but they are much rarer than with the traditional SART. However, if after a period of the fixed order SART the order were altered, an error could be forced. The participant would have a strong expectancy, for example, that the next stimulus to occur would be a Go stimulus, but the participant would instead encounter a No-Go stimulus. Since the SART has a notable speed-accuracy trade-off, many participants would likely make the expected response to the catch-No-Go erroneously (Dang et al., 2018).
Since error awareness in the SART is essentially 100% (McAvinue et al., 2005), the participant would know they made a mistake. Furthermore, participants would also likely notice if they began to respond but were still successfully able to withhold the response (e.g., a near error, see Head et al., 2020; Wilson et al., 2018).
While the SART may initially appear to be unrelated to occupational tasks, the SART has gained ground among applied researchers in a number of domains, such as military settings and law enforcement (Head et al., 2017), and has even been recently applied to agricultural jobs (Mensen et al., 2024). Essentially the SART represents any Go-No-Go selection or detection task where the Go rate is more prevalent than the No-Go rate. This can occur in a variety of occupational settings where predominate motor responses need to be disrupted occasionally or a prepotent motor action needs to be inhibited. For example, an agricultural worker may need to spray weeds, but occasionally withhold the spraying action to a non-weed, or a forestry operator may need to cut logs but occasionally inhibit the cutting action when necessary.
In the present study, we had participants perform one of four fixed order SARTs. We wanted to probe thought content only once after the last number stimulus, the catch-trial, was presented in the task. Probing once would limit any potential contamination of repeated probing and enabled the thought content probe to occur at the same time for all versions of the task (after the 160th number). There were two fixed-orders for the main part of the task. One task order led to the expectation of the catch-trial being a Go (number 0) and the other task order led to the expectation of a No-Go (number 3). For both task orders, we then modified a version of the task so the catch-trial was changed to the other type (so in the expected-Go task, the participant was instead presented with a No-Go stimulus). This resulted in four fixed order SARTs: Task EG-RG (expected Go, received Go), Task EG-RN (expected Go, received No-Go), Task EN-RN (expected No-Go, received No-Go), and Task EN-RG (expected No-Go, received Go). The task where we expected to be able to force errors was Task EG-RN (expected Go, received No-Go) as the main errors made in a SART are errors of commission (responding to No-Go stimuli inappropriately).
Our first prediction was that we could indeed force errors of commission with the catch-trial switch. We expected more errors of commission in Task EG-RN (expected Go, received No-Go) than Task EN-RN (expected No-Go, received No-Go). Errors of omission are less common and not the primary performance metric of interest in the SART. Comparing errors of omission for Task EG-RG (expected Go, received Go) with Task EN-RG (expected No-Go, received Go) would, however, enable us to determine if we could force errors of omission as well in the fixed order SART. We were less confident this would be possible in the SART, but included Task EN-RG as a test of this possibility.
In regards to self-reports of task-unrelated thoughts, Task EG-RG (expected Go, received Go) and Task EG-RN (expected Go, received No-Go) were perceptually and procedurally identical up until the final catch-trial. Since the participants were randomly assigned to task, self-reports of task-unrelated thoughts should be identical for Task EG-RG and Task EG-RN participants if thought content was uninfluenced by the catch-trial manipulation. We expected, however, we could force errors with our manipulation and we expected this would impact self-reports of thought content. Since participants in Task EG-RG were expecting a Go stimulus and given the default behavior in the SART is responding (given the relative prevalence of Go stimuli), we expected very few, if any, participants making an error in Task EG-RG on the catch-trial. Task EG-RG’s participants would, therefore, not have thought content reports impacted by errors. Task EG-RN’s participants would be highly likely to make errors (button responses) or near errors (initial movement responses that were successfully inhibited before an actual button response was committed), hence their reports of thought content would be subject to error awareness or even the awareness of a near error. By near error we mean the initiation of an inappropriate response that was successfully withheld and therefore, did not result in an actually erroneous response (Head et al., 2020; Wilson et al., 2018). We expected Task EG-RN to result in higher self-reported task-unrelated thoughts after the catch-trial than Task EG-RG.
Methods
Participants
Four hundred and fifty-nine undergraduate students from introductory psychology classes at George Mason University served as participants for course credit. Participants were native English speakers at least 18 years old and had normal or corrected-to-normal vision. We did not collect other demographic data on the participants as this was deemed irrelevant to the research goal of this experiment. We based our sample size on expectations from previous research using a binary performance metric (see Head & Helton, 2018). Our aim was to collect data from as many participants as possible before the end of the semester in which we ran the study; only if we fell below 400 participants would we continue collecting data into the next semester (and then repeat the rule).
Procedure
The study was conducted remotely online using Millisecond Inquisit software, which was downloaded to the participants’ computers to avoid network lag affecting timing. Participants completed 30 practice trials followed by 160 test trials. In each trial, a number between 0 and 9 was shown on the screen for 250 ms, followed by a mask (a circle with a cross in it) for 900 ms. Therefore, the full task (not including practice) took about 3 min. Trials were shown continuously apart from a break between practice and test trials. Participants were instructed to press the spacebar for each number except “3.” During the practice trials, participants were provided feedback if they pressed or withheld pressing incorrectly. No performance feedback was presented during the main portion of the task.
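The "about 3 min" figure follows directly from the trial timing. The short sketch below only reproduces that arithmetic; the constant and function names are illustrative and are not taken from the Inquisit script used in the study.

```python
# Timing values are those stated in the Procedure; all names are illustrative.
STIMULUS_MS = 250      # digit shown on screen
MASK_MS = 900          # circle-with-cross mask
PRACTICE_TRIALS = 30
TEST_TRIALS = 160


def block_duration_s(n_trials, stimulus_ms=STIMULUS_MS, mask_ms=MASK_MS):
    """Duration in seconds of n_trials presented back to back."""
    return n_trials * (stimulus_ms + mask_ms) / 1000


test_seconds = block_duration_s(TEST_TRIALS)
practice_seconds = block_duration_s(PRACTICE_TRIALS)

print(f"test block: {test_seconds:.0f} s ({test_seconds / 60:.2f} min)")
print(f"practice block: {practice_seconds:.1f} s")
```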
While participants were instructed not to assume that any apparent pattern in the numbers would continue, both the practice and the test trials did cycle through the numbers 0–9 in order (a target rate of 1/10 stimuli) until the final trial, the catch-trial. Participants were randomly assigned to one of four conditions:
(1) Task EG-RG: The last number was not a 3, and it was in numerical order (e.g., 7-8-9-0)
(2) Task EG-RN: The last number was a 3, but it was not in numerical order (e.g., 7-8-9-3)
(3) Task EN-RN: The last number was a 3, and it was in numerical order (e.g., 0-1-2-3)
(4) Task EN-RG: The last number was not a 3, and it was not in numerical order (e.g., 0-1-2-8)
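The four conditions can be sketched as a small sequence generator. This is a minimal illustration, not the study's stimulus script: the function name is hypothetical, the starting digit is an assumption (the sequence is built backwards from the expected final digit so the cycle stays in order), and the out-of-order digit 8 for Task EN-RG is simply the example given above.

```python
NO_GO_DIGIT = 3
N_TEST_TRIALS = 160

# Digit the fixed order leads participants to expect on the final trial.
EXPECTED_LAST = {"EG": 0, "EN": 3}
# Digit actually shown on the catch-trial when the order is broken
# (3 and 8 are the examples given in the text; 8 is otherwise arbitrary).
CATCH_OVERRIDE = {"EG-RN": 3, "EN-RG": 8}


def fixed_order_sart(condition, n_trials=N_TEST_TRIALS):
    """Return the digit sequence for one condition (hypothetical helper)."""
    expected_last = EXPECTED_LAST[condition[:2]]
    # Count backwards from the expected final digit so 0-9 cycles in order.
    seq = [(expected_last - (n_trials - 1 - i)) % 10 for i in range(n_trials)]
    if condition in CATCH_OVERRIDE:
        seq[-1] = CATCH_OVERRIDE[condition]
    return seq


for name in ("EG-RG", "EG-RN", "EN-RN", "EN-RG"):
    print(name, fixed_order_sart(name)[-4:])
```

Under these assumptions, the EG-RG and EG-RN sequences differ only on the final trial, which is the property the self-report comparison relies on.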
After the final catch-trial, participants completed a visual analog scale rating how much they were thinking about the task they were doing (“Entirely about the task”) relative to something other than the task (“Entirely about something other than the task”) on a 100-point scale. The exact wording was, “During the last number display, were you thinking more about the task you were doing or more about something other than the task you were doing?” The direction of the scale, which side was about the task or something other than the task, from right to left was counterbalanced across participants. While there are a number of ways to probe thought content, this approach has been used previously (see Weinstein, 2018). In addition, participants were asked to report the last number they had seen during the trial by entering it via their keyboard. This enabled us to assess awareness of the last number seen in the task.
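Because the direction of the scale was counterbalanced, raw slider positions need to be mapped onto a common direction before they can be averaged. The sketch below shows one plausible way to do this, assuming raw positions are recorded left to right from 0 to 100; the function and parameter names are hypothetical and the study's actual scoring code is not described in the text.

```python
def task_unrelated_score(raw_position, unrelated_on_right):
    """Map a raw slider position (0-100, left to right) onto a common scale.

    On the returned scale, 0 means "entirely about the task" and 100 means
    "entirely about something other than the task".
    """
    if not 0 <= raw_position <= 100:
        raise ValueError("slider position must lie between 0 and 100")
    # Reverse-score participants who saw the task-unrelated anchor on the left.
    return raw_position if unrelated_on_right else 100 - raw_position


print(task_unrelated_score(30, unrelated_on_right=True))
print(task_unrelated_score(30, unrelated_on_right=False))
```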
Results
Sixty-four participants were excluded from analyses due to omission error rates greater than 10% (not making enough responses to the Go stimuli), commission error rates greater than 40% (indicating they may not have understood the requirement to withhold responses to the No-Go stimuli), incomplete participation, or other technical issues. These performance levels were chosen as exclusion criteria because they would indicate either a failure to understand the task requirements or a lack of serious engagement with the task. Given the remote nature of the study, participants needed to demonstrate active engagement with the task and compliance with task instructions. These performance criteria were set before analyzing the results. This resulted in a final sample of 395 participants.
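The performance-based exclusion rule can be written as a simple predicate. This is a sketch under stated assumptions: the function name and argument layout are hypothetical, and the text does not say whether the catch-trial itself counted toward the error rates, so the caller decides which trials to pass in.

```python
MAX_OMISSION_RATE = 0.10     # more than 10% missed Go trials -> exclude
MAX_COMMISSION_RATE = 0.40   # more than 40% responses to No-Go trials -> exclude


def should_exclude(go_trials, omissions, no_go_trials, commissions,
                   completed=True):
    """Return True if a participant meets any exclusion criterion."""
    if not completed:
        return True
    omission_rate = omissions / go_trials
    commission_rate = commissions / no_go_trials
    return (omission_rate > MAX_OMISSION_RATE
            or commission_rate > MAX_COMMISSION_RATE)


# Hypothetical participants with 144 Go and 16 No-Go trials.
print(should_exclude(144, 15, 16, 2))   # omission rate just above 10%
print(should_exclude(144, 5, 16, 6))    # both rates within limits
```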
For all statistical tests we used a priori tests as recommended by methodologists given we had direct comparisons we wanted to make to test our hypotheses (Ruxton & Beauchamp, 2008).
Forcing Errors
Error Rates and Thought Reports for all Tasks
Self-Reported Thoughts
We performed a t-test comparing self-reports of task-unrelated thoughts to the catch-trial probe between Task EG-RG (expected Go, received Go) and Task EG-RN (expected Go, received No-Go). We predicted participants in the EG-RN condition would report having more thoughts about something other than the task than participants in the EG-RG condition. As predicted, participants reported thinking significantly more about something other than the task in Task EG-RN (M = 49.44, SD = 29.57) than in Task EG-RG (M = 40.33, SD = 29.49), t(193) = 2.14, p = .034, Mdifference = 9.11, 95% CI [.72, 17.50], d = .31. The self-report values for all tasks are displayed in Table 1.
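The relationship between the reported means, standard deviations, t statistic, and Cohen's d can be illustrated with a pooled-variance calculation from summary statistics. This is a hedged sketch: the group sizes (108 for Task EG-RN, 87 for Task EG-RG) are an assumption chosen to be consistent with the reported 193 degrees of freedom and are not stated in this section, and a pooled-variance (Student) test is likewise assumed.

```python
import math


def pooled_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Student's t, degrees of freedom, and Cohen's d from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    diff = m1 - m2
    d = diff / pooled_sd                                   # Cohen's d
    t = diff / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))    # t statistic
    return t, df, d


# Means and SDs are from the text; n1 = 108 and n2 = 87 are ASSUMED sizes.
t, df, d = pooled_t_from_summary(49.44, 29.57, 108, 40.33, 29.49, 87)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

A different split of the 195 participants would change t slightly while leaving d nearly unchanged, since d depends on group sizes only through the pooled standard deviation.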
Memory Check
Using a Fisher’s exact test, we checked whether the participants in Task EG-RN (expected Go, received No-Go) who made an inappropriate response to the catch-trial differed from participants who correctly withheld a response to the catch-trial in whether they correctly reported the number stimulus presented on the catch-trial. The percentage of participants who were able to correctly remember the catch-trial stimulus was not significantly different between those who withheld to the catch-trial (72.72%) and those who made an error of commission to the catch-trial (73.26%), p = 1.000.
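For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2 × 2 table can be sketched with the standard library alone. The implementation below sums the hypergeometric probabilities of all tables no more likely than the observed one. The cell counts in the usage example (16 of 22 and 63 of 86 recalling correctly) are hypothetical values chosen only because they reproduce the reported percentages; the actual counts are not given in the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (no more probable) than observed;
    # the small tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Rows: withheld vs. commission error; columns: recalled vs. not recalled.
# Counts are ASSUMED (16/22 = 72.7%, 63/86 = 73.3%), not reported values.
p = fisher_exact_two_sided(16, 6, 63, 23)
print(f"p = {p:.3f}")
```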
Speed Check
We checked to see if the participants in Task EG-RN (expected Go, received No-Go) who made an inappropriate response to the catch-trial were different from the participants who correctly withheld a response to the catch-trial regarding the preceding trial’s response time (the response time to the trial immediately prior to the catch-trial). Participants who made an inappropriate response to the catch-trial (M = 262.70 ms, SD = 133.44) were significantly faster than participants who correctly withheld a response to the catch-trial (M = 421.48 ms, SD = 243.81), t(101) = 4.02, p < .001, Mdifference = 158.78 ms, 95% CI [80.48, 237.09], d = .98. Note the degrees of freedom are slightly lower because some participants did not respond to the stimulus preceding the catch-trial.
Discussion
In the present study, we were able to force errors in a fixed order SART. The participants in two of our tasks were led to expect either a Go or No-Go catch-trial but were instead given the opposite stimulus. In the case of those expecting a Go, being presented instead with a No-Go caused commission error rates to go from 22.43% of participants to 79.63%. In the case of those expecting a No-Go, being instead presented with a Go caused omission error rates to go from 1.15% of participants to 11.83%. A fixed order SART with catch-trial manipulation provides an excellent tool to examine how forced errors may impact other metrics of interest to researchers.
Having established we could indeed cause errors in the SART, particularly the more relevant commission errors, could we also cause task-unrelated-thought reports? This appears to be the case. Those participants who expected a Go stimulus but instead were presented a No-Go stimulus reported thinking more about something other than the task than those participants who expected a Go stimulus and were indeed presented a Go stimulus. The effect, while statistically significant, was small (Cohen’s d = .31; see Lakens, 2013; Sullivan & Feinn, 2012). Our self-report measure of task-unrelated thoughts was, however, a visual analog scale ranging from 0, completely task-related thoughts, to 100, completely task-unrelated thoughts, and the means hovered around the mid-point of the scale. Many researchers in the mind-wandering community instead use binary forced choice responses with their thought content probes: “are you either having a task-related thought or a task-unrelated thought” (Kane et al., 2021). Future research should employ a binary or forced choice thought content probe to determine if a forced error impacts a forced choice thought content report. A person facing a binary choice who is unsure of whether their thoughts are more or less task-related may get nudged to one option or the other if they are aware of errors. Essentially, the effect found with the visual analog scale in this study may actually be larger when a participant is given fewer response options. We should also note there was overall a great deal of variability in the thought report measure across participants, which may make any effect on this metric likely to be small.
In addition, future researchers should examine and assess near errors and their impact directly on self-reports of thought content. While we had no independent assessment of a near error in the present study, we suspect these may be noticeable by participants in SART-like tasks (see Head et al., 2020; Wilson et al., 2018). If the participant almost makes a mistake, does it influence performance as much as making an actual error? The participant could interpret a near error in a similar way. For example, the participant may reason, “I almost messed up, I must have been thinking about something other than the task.” This could be studied in the future using electromyography (EMG) to assess muscular activity in the responding hand and arm, or by employing an alternative input device (see Wilson et al., 2018).
For the task of primary focus, Task EG-RN (expected a Go, received a No-Go) we also performed two further analyses to examine differences between those who made a commission error to the catch-trial stimulus and those who managed to appropriately withhold their response to the catch-trial stimulus. First we checked whether they were different in being able to report accurately the number stimulus of the catch-trial after the thought content probe. The two groups of participants were not different. In both cases ∼73% of participants could accurately free recall the number stimulus they were presented on the catch-trial. One interpretation of this result is the people who made a commission error and those who did not in this task were not different regarding their task awareness during the catch-trial. Where these two groups of individuals were clearly different was in regard to their response times to the stimulus immediately preceding the catch-trial stimulus. In this case the participants who failed to withhold to the catch-trial stimulus were much faster on the preceding stimulus than those who could withhold appropriately to the catch-trial stimulus. This effect was large (Cohen’s d = .98; see Sullivan & Feinn, 2012). Given the notable speed-accuracy trade-off in the SART (Dang et al., 2018, 2023), a likely cause of commission errors is simply deciding to respond quickly. Another perspective is the high likelihood or expectation of what stimulus would occur next enabled the participant to respond more quickly. Their level of confidence may have related to their response speed. 
Commission errors in the SART likely are due to response strategy choices or a lenient response bias (Bedi et al., 2023, 2024a, 2024b, 2024c; Dang et al., 2018, 2025), not variation in task-unrelated thoughts/mind-wandering or perceptual decoupling (Cheyne et al., 2009; Christoff et al., 2009; Manly et al., 2000, 2003; Schooler et al., 2011; Seli, 2016; Seli et al., 2012; Smallwood et al., 2004; Zanesco et al., 2024). Nevertheless, how response strategy or response bias may or may not interrelate with self-reported thought content remains open to investigation.
Thought content may or may not impact performance, but researchers investigating thought content, in particular, task-unrelated thoughts or mind-wandering, should give method issues further consideration. The Sustained Attention to Response Task (SART), in particular, is one of the preferred tasks for examining the impact of task-unrelated thoughts on human performance (Cheyne et al., 2009; Christoff et al., 2009; Manly et al., 2000, 2003; Schooler et al., 2011; Seli, 2016; Seli et al., 2012; Smallwood et al., 2004; Zanesco et al., 2024). There are researchers who challenge the use of the SART as a vigilance task, especially when used for short durations (as we have done here), because it has a substantial speed-accuracy trade-off (Dang et al., 2018; Helton, 2025). Despite these concerns, other researchers continue to advocate the SART as a vigilance or sustained attention measure (see Shelat & Giesbrecht, 2025). Putting this issue regarding the SART’s status as a vigilance task aside, errors (which participants are aware of in the SART; see McAvinue et al., 2005) may influence self-reports of task-unrelated thought. Indeed, in the present case we have demonstrated post-task thought content measures were affected by task performance (errors). Researchers, however, regularly use post-task thought content measures as predictors of prior performance and some researchers advocate strongly for employing them to understand the psychological processes employed by participants during task performance (Cheyne et al., 2009; Christoff et al., 2009; Manly et al., 2000, 2003; Schooler et al., 2011; Seli, 2016; Seli et al., 2012; Smallwood et al., 2004; Zanesco et al., 2024). Given the possibility of confusing the direction of causality, we believe this method is a poor choice for studying the impact of thought content on performance.
Some researchers claim these self-report measures are indeed reliable because they predict prior occurring physiological states or measures, not just performance (Denkova et al., 2018). While the current study does not directly address physiological measures, we should note the authors using self-report measures to predict prior physiological metrics do not determine or even attempt to assess the direction of causality in their studies. While they may assume participants’ self-reports could not be influenced by preceding physiological measures, this assumption warrants further examination. Could a neurophysiological signal, for example, something measured via electroencephalogram (EEG), correlate with a physiological signal the participant could be directly aware of, such as blink-rate, eye-movements, muscle tremor, and so on? Trivially, of course, which is why researchers have to carefully control for gross movement artefacts when using EEG. However, putting the obvious aside, there is evidence of substantial correlations between physiological signals people can be aware of and underlying neurophysiological signals (Sciaraffa et al., 2021). This issue should be explored in future studies.
Despite concerns, some researchers argue strongly in favor of using participant self-reports (Corneille & Gawronski, 2024). Alternatively, Nisbett and Wilson (1977, p. 231) argued there is “… little to no direct introspective access to higher order cognitive processes.” Since at least Plotinus’ work in 270 CE in the Enneads people have realized much of human cognition is implicit, preconscious or unconscious. Freud later popularized the degree to which cognition is unconscious or preconscious (Westen, 1999), and current empirical work is verifying this perspective (Afalo et al., 2022). While the evolution of human language is a vexing problem for researchers (Corballis, 2017), there is little reason to suspect it did not evolve from an ancestor species’ animal communication system (Haldane, 1955). Animal communication is used to manipulate others’ behavior, not as an internal or introspective probe of cognitive processes. Additionally, animal communication or signaling systems are rife with deception (Šekrst, 2022; indeed even self-deception see Angilletta et al., 2019). Consider a simple case like Batesian mimicry, where some animals mimic the signals of more dangerous or less palatable animals. The non-venomous Texas rat snake (Pantherophis obsoletus lindheimeri), for example, mimics the rattling of the venomous western diamondback rattlesnake (Crotalus atrox). While human language may be a unique adaptation, there is no reason currently to believe it deviates from the imperatives of other animal communication systems. Therefore, self-reports could be contaminated or influenced by social or other goals. Furthermore, like other animal communication systems, human language is often deceptive; people do lie, for example, even when the content of those lies can be objectively refuted. In the case of self-reports of internal processes, we cannot verify them objectively, but we should at least be sensitive to social uses of those reports. 
Perhaps self-reports of mind-wandering are sometimes a social justification for mishaps: “I am not always incompetent, I was just momentarily distracted. This can be rectified.” They may even be a case of self-deception, “I am not incompetent; I just was momentarily distracted.” While speculative, there are reasons to have doubts about the ability of post-task self-reports to diagnose action slips or lapses (Dang et al., 2025). We, however, do not want to cast an overly broad stroke against some uses of introspective methods, such as think-aloud protocols used to probe areas where conscious deliberation is clearly relevant. We are mainly addressing the specific use of thought content reports to predict actions or behavior already committed.
The current method of using after-the-fact thought content probes to infer the impact of thought content on earlier performance or physiological responses may require further examination. Alternatively, researchers could focus on causing thought content changes directly (so for example, inducing mind-wandering) and seeing whether and how those subsequent changes in thought content impact performance or physiological responses (e.g., Antrobus et al., 1966; Hancock et al., 2017). Regardless, our manipulated fixed order SART provides a viable and interesting avenue to further explore the complex relationship between thought content reports and performance.
Key Points
• We were able to force errors using a fixed-order version of the Sustained Attention to Response Task.
• Participants experiencing a forced error reported higher rates of mind-wandering despite the tasks being identical, which suggests that performance affected thought content reports.
• The current use of post hoc thought content probes is a poor choice for measuring the effects of thought content on performance.
Footnotes
Author Contributions
A.E. and W.H. wrote the main manuscript text.
A.E. and W.H. reviewed the manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Ethical Considerations
The research was approved by the George Mason University Human Ethics Committee.
Consent to Participate
All participants gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki.
Consent for Publication
All participants provided consent for publication.
Data Availability Statement
The dataset for the current study is not publicly available because it constitutes an excerpt of research in progress, but it is available from the corresponding author on reasonable request.
