Disengaged Responses, Distorted Risks: Evaluating Effort on Student Risk Assessments

Abstract

Chronic absenteeism among high school students poses a significant threat to academic success. As schools address the root causes of absenteeism, assessments such as the Washington Assessment of the Risks and Needs of Students (WARNS) help identify students’ risk factors and support needs. However, assessment scores depend on the quality and authenticity of student responses. When students are disengaged (e.g., rushing through items, exerting minimal effort), the data may misrepresent their actual needs, undermining assessment validity. We examined disengagement patterns among high school students completing the WARNS, focusing on response time as a behavioral indicator of engagement. A small percentage (<5%) of students displayed disengagement, which differed between males and females and across assessment contexts. Differences in risk classification patterns were observed for students identified as rapid responders. Results highlighted the importance of incorporating response process data into assessment interpretation and suggested practical strategies for improving the accuracy of these types of assessments.

Keywords

risk assessment, response time, chronic absenteeism

Chronic absenteeism in high school is a serious concern, as it predicts a host of negative outcomes including low academic achievement, dropout, and increased risk of delinquency (Black & Elgaddal, 2024). In the 2021–2022 school year, 31% of U.S. students were chronically absent, meaning they missed 10% or more of school days. In 2023, the national average declined only slightly to 28% (U.S. Department of Education, n.d.). In response, educators and juvenile courts have turned to structured risk and needs assessments to identify the factors underlying truancy and to guide early interventions (Development Services Group, 2015).

Description and Utility of Risk Assessments

Risk assessments are structured tools used to evaluate the likelihood of encountering adverse outcomes (e.g., reoffending, dropping out of school, substance use) and to identify the needs or factors that, if addressed, can reduce those risks. In practice, these assessments help professionals collect and synthesize information about behavioral, emotional, academic, and social risk factors in a systematic way. The assessments yield quantitative estimates (risk scores or classifications) that inform decisions about supports for youth. That is, practitioners use risk assessments to bring objectivity and evidence into early intervention planning (Schwalbe, 2008).

This first step in early intervention can prevent escalation into more serious problems. Problems like chronic absenteeism, early aggressive behavior, or substance experimentation are warning signs for later outcomes such as dropping out of school or criminal behavior. For instance, high truancy in high school is associated with higher dropout rates and increased risk of delinquency (Henry & Huizinga, 2007; Rocque et al., 2017). By using risk assessments, schools and agencies aim to intervene early, before a student disengages completely. As Strand et al. (2023) point out, combating truancy and school failure even at the middle school level is seen as a cornerstone of intervention efforts to improve long-term outcomes. In short, youth risk assessments are critical tools because they (a) identify youth who are on concerning trajectories, (b) clarify the factors contributing to risks, and (c) guide the allocation of help (from truancy reduction efforts to mental health referrals) in a timely manner. Early, targeted intervention can alter a young person’s life path – keeping them in school and out of trouble – and risk assessments are designed to facilitate exactly that.

Risk assessments typically evaluate a broad range of factors across multiple domains, including behavioral, emotional/psychological, academic/educational, and social/environmental. These domains are grounded in research showing that challenges in any of these areas can increase a youth’s likelihood of negative outcomes (e.g., delinquency, school failure). Early aggressive behavior or conduct problems (behavioral risk) in childhood predict criminal activity (Shaw et al., 2019). Untreated depression or trauma (emotional risks) can contribute to substance abuse or school dropout. Academic difficulties and disengagement from school are strong risk factors for not graduating (academic risk, Henry et al., 2012).

The Washington Assessment of the Risks and Needs of Students (WARNS) evaluates six domains associated with school disengagement and failure (George et al., 2021). It was originally developed in response to the Washington State Becca Bill legislation to assist with identifying and addressing the needs of students with chronic unexcused absences (Gotch & French, 2020). Tools like the WARNS are employed in schools, community programs, and juvenile justice settings to collect student responses and combine them with additional information to guide supportive actions. The WARNS functions as an initial screening instrument rather than a stand-alone diagnostic, disciplinary, or placement tool. The primary use of WARNS scores is to begin conversations with the student and school personnel.

Quality of Risk Assessment Scores Due to Low Student Effort

Risk assessment scores are only useful if youth provide accurate responses. Many youth assessments rely on self-report or an interview, which means the youth’s cooperation and effort in responding are crucial. However, individuals may provide inaccurate or misleading answers. For example, they may minimize concerns due to fear of punishment or overstate issues in an effort to seek attention. Moreover, some youth may expend minimal effort when responding, answering so quickly that they could not have processed the item and provided a thoughtful response. The prevalence and pattern of these quick non-effortful responses is the focus of our study, as such disengaged responding can compromise assessment results and, in turn, the decisions made based on these results.

Concerns about rapid responding and invalid responses are not unique to risk assessments but tend to be an issue for any assessment that is low stakes for respondents. Unlike high-stakes assessments, low-stakes assessments have little or no personal high-stakes consequence for the respondent (Wise & DeMars, 2005). In achievement testing contexts, high-stakes assessments impact outcomes such as students’ grades, graduation, or admission decisions. By contrast, low-stakes assessments in achievement contexts are typically gathered for test development (DeMars, 2000), school improvement initiatives (Finney et al., 2016), or as formative assessments that do not impact grades but are used to provide teachers with immediate and actionable feedback about student ability (Arhavbarien, 2026). Moreover, test scores reported for comparability studies (e.g., NAEP and TIMSS) or accountability mandates (e.g., Race to the Top) are often gathered in low-stakes testing contexts, where test performance has no impact on the student (e.g., Braun et al., 2011; Cole & Osterlind, 2008; Hopfenbeck & Kjærnsli, 2016; Rios & Soland, 2022; Smith & Smith, 2004; Zamarro et al., 2019). In such low-stakes assessment contexts, a non-trivial proportion of students tend to expend minimal effort (Wise & DeMars, 2010). Importantly, low effort can bias test scores and result in inaccurate interpretations (Wise, 2017; Wise & DeMars, 2005).

Beyond achievement test performance, educators and the public are interested in a more expansive view of student competence that includes non-cognitive attributes or personal qualities (e.g., self-control, growth mindset, social belonging), with the goal of enhancing these malleable attributes (Duckworth & Yeager, 2015). These assessments are gathered in a low-stakes context in both K-12 and higher education, and issues of low effort have been empirically documented (Akhtar & Kovacs, 2023; Barry & Finney, 2016; Schaefer & Finney, 2025; Swerdzewski et al., 2011). Moreover, these attributes are often measured as part of a needs assessment or early alert mechanism to identify students with low scores (Markle et al., 2013; Markle & O’Banion, 2014; Meikrantz Sharp et al., 2026) and provide educational supports (Pope et al., 2023). Yet, students may not take these assessments seriously as they know their responses do not impact grades or graduation. In turn, measures of these attributes are often accompanied by measures of expended effort to gauge the trustworthiness of the responses (DIA Higher Education Collaborators, n.d.).

Response time is a common approach to quantify expended effort (Wise & Kong, 2005) and could be used to evaluate rapid responding on risk assessments. For example, spending only 1 or 2 seconds (s) answering an item is evidence of disengagement (Schnipke, 1995; Soland et al., 2019; Wise, 2017). Rapid responding is not an all-or-nothing behavior; most low-effort respondents give effort on some items (Wise & Kuhfeld, 2021). Wise (2015) illustrated how score-based inferences from achievement tests can change after removing students who rapidly responded to more than 10% of items, which previous research suggests is less than 25% of the test-taking population (Jensen et al., 2018; Kong et al., 2007; Rios et al., 2014; Soland, 2018a, 2018b; Soland et al., 2019; Soland & Kuhfeld, 2019; Wise & Cotten, 2009; Wise & DeMars, 2010; Wise & Kong, 2005). Although it is encouraging that 75% or more of students are engaging in minimal rapid responding, ideally, we want all students to put forth the requisite level of effort needed to ensure valid interpretations from scores.

For non-cognitive assessments and surveys, a variety of methods for identifying careless or inattentive responses have been used (e.g., long-string analysis, Mahalanobis distance, person-total correlations, self-reported effort; Curran, 2016; Meade & Craig, 2012; Schaefer & Finney, 2025). Response time, or the time it took an individual to complete a set of survey items, is commonly used on an intuitive basis to identify careless survey responding, yet it can be difficult to create a threshold for response time on a survey (Curran, 2016). Different methodological approaches exist to employ response time to identify careless responding. One approach is employing mixture modeling to empirically uncover classes of respondents who carelessly respond. Each respondent’s response time is given a probability of being generated by a careless responding class or non-careless responding class (Lundgren & Eklöf, 2023). This approach can also employ a stepwise weighting method that uses survey screen time information to determine the probability that an individual carelessly responded and then downweights these responses in the analysis (Ulitzsch et al., 2024).

A second approach is using a fixed or common time threshold for each item or block of items. The threshold classifies respondents’ response times as rapid if below the threshold or non-rapid otherwise. Response time effort (RTE), or the proportion of an individual’s responses classified as non-rapid, can then be computed to represent an individual’s effort on the entire assessment (Wise & Kong, 2005). A challenge to this approach is determining the fixed threshold. A common approach for academic tests is the normative threshold (NT) method: setting the item threshold at 10%, 20%, or 30% of the mean response times (Wise, 2017; Wise & Ma, 2012). A common approach for setting the fixed threshold in surveys is to determine the number of seconds necessary to answer an item and set the threshold at this value. For example, the threshold of 2 s per item has gained traction in the survey domain (Curran, 2016; Huang et al., 2012). Moreover, Soland et al. (2019) coupled response time with other survey detection methods (e.g., long-string analysis) to identify disengaged survey responses at the page level of sets of survey items. They found that the highest overlap in responses deemed disengaged across detection methods occurred for responses under 2 s per item. Given the computational and conceptual ease of the fixed threshold method relative to a mixture modeling approach, we employed this approach for the initial assessment of rapid responding on the WARNS. Additionally, this approach has the practical advantage of being easier to communicate to stakeholders who ask about how seriously students take the assessment.

Despite growing literature on expended effort on achievement tests (e.g., Borger et al., 2025; Borghans et al., 2024) and surveys (e.g., Meade & Craig, 2012), we could find no literature on response time effort for risk assessments. If youth perceive the risk assessments as having little to no high-stakes consequences on their lives, they may rush through the assessments and provide invalid responses. Reed et al. (2024) examined test-taking effort on a low-stakes reading assessment among youth with high needs (juvenile offenders). Based on total testing times, results suggested response patterns consistent with rapid responding. Although helpful in understanding this population’s behavior on a reading test, the results do not provide insight into behavior when completing risk assessments.

Moreover, rapid responding is influenced by administration conditions. Remote assessment settings have been associated with both lower scores and lower motivation relative to in-person settings (Alahmadi & DeMars, 2022). Whereas in-person assessment settings are more controlled, remote settings are less controlled, which can negatively impact an assessment’s psychometric properties (Barry & Finney, 2009; Schaefer & Finney, 2025). Additionally, well-trained proctors are associated with increased examinee motivation relative to less trained proctors during in-person testing (Lau et al., 2009). Proctored in-person testing is associated with higher examinee motivation relative to unproctored remote testing (Alahmadi & DeMars, 2022). Research on proctoring in remote settings indicates that proctoring does not have an impact on examinee motivation relative to no proctoring (e.g., Hollister & Berenson, 2009; Rios & Liu, 2017). For risk assessments like the WARNS, youth typically complete the assessments in person; however, the presence and behavior of a proctor are often unknown. Administrators are given specific protocols explaining the use of the risk assessment scores, but it is unclear whether these protocols are implemented with fidelity and whether doing so influences youths’ perceptions of the stakes of the risk assessment. With that said, the amount of rapid responding could be examined across administration settings (e.g., school, treatment/resource facility) and formats (e.g., group vs. individual administration) to identify where rapid responding is more common. For example, group assessment administrations may increase time pressure as youth observe peers finishing, which may encourage faster responding, whereas individual administrations may reduce peer-related pressure and allow youth to answer more slowly and thoughtfully.

To begin understanding youths’ response effort when completing risk assessments, a first step is to estimate the prevalence of rapid responding and whether it is associated with individual or context characteristics. Further, rapid responding data can provide insight into risk misclassification, which can result in misallocation of resources and support. For example, a student who needs support but is misclassified as low risk (i.e., a false negative) may not be offered services (e.g., meeting with a school counselor or a referral to resources) to address underlying factors contributing to their absenteeism or disengagement from school. Alternatively, a student who is misclassified as high risk (i.e., a false positive) may receive unnecessary support, potentially diverting limited resources away from students who need them most. Typically, in educational screening contexts, greater priority is placed on minimizing false negatives so that students who may benefit from additional support are less likely to be overlooked (e.g., Kearney et al., 2023; Wu et al., 2026).

Purpose of the Study

We examined high school students’ rapid and effortful responding on the WARNS. We were interested in the following three research questions:

(1) What percentage of students are rapidly responding to the WARNS items?

A low percentage of rapid responses would increase confidence in the interpretation and use of WARNS scores. At present, however, the prevalence of rapid responding on this risk assessment is unknown. This aim aligns with Standard 4.13 in the Standards for Educational and Psychological Testing (AERA et al., 2014), which directs assessment users to investigate potential sources of irrelevant variance that may affect scores. Pervasive rapid responding would prompt new administration processes to ensure high-quality usable scores.

(2) Which student and context characteristics are associated with rapid responding?

If rapid responding is more common for particular student groups or particular administration contexts, data collection procedures could be adapted to reduce rapid responding in those settings. Currently, it is unclear whether rapid responding on the WARNS varies across student groups or administration contexts.

(3) Are students who rapidly respond disproportionately classified into particular risk categories?

Understanding how rapid responders are categorized in terms of risk helps clarify whether these students are more likely to miss needed supports or to be assigned unneeded supports.

Methods

Sample

Participants included 2,773 high school students who completed the WARNS during the 2023–2024 school year through participating organizations across the U.S. Over 90% of the sample was from Washington state, where the WARNS is named in legislation as an assessment that can be used when middle school and high school students reach a certain threshold of unexcused absences in a school year (2 and 7; RCW 28A.225.020). Assessments were primarily administered within public school districts, with additional administrations through youth service agencies and county programs. Students were from 62 school districts, representing 21% of the districts in the state. Table 1 provides sample descriptive statistics by grade level. The sample was composed primarily of students who identified as Hispanic or White.

Table 1.

Descriptive Characteristics of the Sample by Grade Level

Grade	N	Age (SD)	Female (%)	Male (%)	Hispanic (%)	Black (%)	White (%)	American Indian/Alaska Native (%)	Pacific Islander (%)	Asian (%)	Other Race/ethnicity (%)
9	730	14.5 (0.6)	46.8	51.4	42.2	9.2	39.9	7.1	4.4	5.2	2.7
10	794	15.5 (0.6)	44.8	53.5	43.1	7.8	40.6	6.3	2.9	5.4	3.3
11	708	16.5 (0.6)	42.3	54.3	44.6	7.1	42.2	8.9	2.8	5.8	2.8
12	541	17.8 (1.1)	40.5	57.3	46.2	6.7	47.0	6.8	1.7	5.2	2.8
All grades	2,773	15.9 (1.4)	43.8	53.9	43.9	7.8	42.0	7.3	3.0	5.4	2.9

Instrument

The Washington Assessment of the Risks and Needs of Students (WARNS, George et al., 2021) is an online 40-item self-report measure designed for schools, courts, and youth service providers to assess students’ risks and needs related to truancy and school failure and to help target follow-up services (George et al., 2021). The WARNS measures six subdomains, including Aggression-Defiance (AD), Depression-Anxiety (DA), Substance Abuse (SA), Peer Deviance (PD), Family Environment (FE), and School Engagement (SE). The AD (7 items) subscale assesses frequency of aggressive (“I got so angry I hit or broke something”) and defiant (“I lied, scammed, or conned someone to get what I wanted”) behaviors. The DA (8 items) subscale assesses the symptoms of depression (“I felt like nothing could cheer me up”) and anxiety (“I had trouble sleeping or eating because I couldn’t get something off my mind”). The SA subscale (5 items) measures usage and effects of drugs and alcohol (“I missed or skipped school to use or recover from drugs or alcohol”). The PD subscale (5 items) queries participants’ perception of deviance of their peers with respect to multiple areas of risk such as defiance, aggression, and substance abuse (“My friends got into physical fights”). The FE subscale (6 items) measures the quality of the parent–child relationship as well as the home environment (“I felt safe with my family”). Finally, the SE subscale (9 items) measures the participant’s attachment to school and learning (“I liked going to school”), efforts to succeed (“I studied for my quizzes and tests”), and connectedness to school and school personnel (“My teachers took a personal interest in me”).

Items refer to attitudes and behaviors within the past 2 months and use a four-point rating scale ranging from “Never, or hardly ever” to “Always, or almost always.” Items are brief and written at approximately a fourth-grade reading level. Prior research supports a bifactor structure comprising one general risk and needs factor and six specific factors each corresponding to one WARNS subdomain (Strand et al., 2019). In addition, an accumulation of evidence supports several inferences in a developing validity argument for score use (e.g., Gotch & French, 2020). The internal consistency reliability estimate (omega) with data used in this study for the WARNS total score, which is used for classification of risk, was .96.

Procedures

Assessments were administered in a digital format as part of routine organizational practice by WARNS users. School districts and partner organizations determine when to administer the assessment. Typical reasons include fulfilling state-mandated assessment requirements triggered by established truancy thresholds, supporting comprehensive risk assessment initiatives, or as part of general student evaluations. No data were collected on the exact reason for completing the assessment. The online assessment platform uses a forced-choice format, requiring students to answer each item before proceeding to the subsequent item. Therefore, there are no missing data.

Administration procedures vary across organizations. Implementations include a range of contexts and formats, such as in-person or remote, individual or group sessions, and completion on either student or school devices. For example, schools using the WARNS as a general screener may administer the assessment to large groups in a computer lab, whereas referrals tied to truancy thresholds may be completed individually with a counselor. Regardless of administration procedure, all organizations are instructed to inform students why they are being asked to complete the WARNS and that the general purpose is to start a conversation about where they may need support. When completing the WARNS, students are asked to respond to the items by reflecting on their life within the past 2 months. Administrators were provided access to standard guidance and training materials describing implementation procedures and suggested language for introducing the assessments. Fidelity of these procedures at the point of use was not recorded.

The online assessment system presents items in blocks of three to five per screen with a total of 12 blocks. Timestamps are recorded at the block level, such that a single submission time is stored for each block of items. Items representing different WARNS domains are randomly distributed across blocks, and the order and layout of blocks are fixed across students. In this study, for each block, response time was computed as the difference between the block start and submit times. For example, in Table 2, for Block 1 with 5 items, the mean response time of 42.5 s represents the average time to complete all 5 items across all students. Assessments with total completion times exceeding 50 min were excluded to remove implausibly long administrations.
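The block-level timing computation described above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names, the timestamp format, and the example values are hypothetical, and the sketch assumes that total completion time equals the sum of block response times.

```python
from datetime import datetime, timedelta

# Exclusion rule from the text: total completion time exceeding 50 minutes.
MAX_TOTAL_SECONDS = 50 * 60


def block_response_times(events):
    """Seconds spent on each block, given (start, submit) datetime pairs."""
    return [(submit - start).total_seconds() for start, submit in events]


def keep_administration(block_times):
    """True when the summed block times do not exceed the 50-minute cutoff.

    Assumption: total completion time is the sum of block response times.
    """
    return sum(block_times) <= MAX_TOTAL_SECONDS


# Hypothetical student with two blocks (timestamps are illustrative only).
t0 = datetime(2024, 1, 15, 9, 0, 0)
events = [
    (t0, t0 + timedelta(seconds=42)),
    (t0 + timedelta(seconds=42), t0 + timedelta(seconds=75)),
]
times = block_response_times(events)
retained = keep_administration(times)
```

Because timestamps are stored per block rather than per item, every later threshold in this sketch would also have to be expressed in seconds per block.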

Table 2.

Number of Rapid Responders (NRR) and Threshold Value (TV) in Seconds Per Block

Block (n items)	Mean RT (SD)	NT10 NRR	NT10 TV	NT20 NRR	NT20 TV	NT30 NRR	NT30 TV	2-second NRR	2-second TV
Block 1 (5)	42.5 (41.9)	0	4.25	3	8.50	31	12.76	10	10
Block 2 (4)	37.7 (45.5)	2	3.77	10	7.53	57	11.30	14	8
Block 3 (5)	37.2 (24.1)	1	3.72	6	7.44	29	11.16	18	10
Block 4 (5)	42.3 (35.8)	1	4.23	13	8.45	47	12.68	24	10
Block 5 (5)	32.2 (51.7)	0	3.22	10	6.45	33	9.67	37	10
Block 6 (5)	29.1 (32.7)	1	2.91	9	5.83	40	8.74	64	10
Block 7 (4)	25.6 (36.0)	1	2.56	10	5.11	39	7.67	49	8
Block 8 (5)	29.3 (30.8)	1	2.93	9	5.86	63	8.78	108	10
Block 9 (5)	25.7 (20.0)	0	2.57	8	5.14	29	7.71	81	10
Block 10 (4)	24.0 (20.6)	0	2.40	11	4.79	27	7.19	50	8
Block 11 (3)	17.1 (19.3)	1	1.71	8	3.43	33	5.14	52	6
Block 12 (3)	18.1 (13.6)	1	1.81	11	3.63	37	5.44	53	6
RTE < .90		1		18		86		115

Note. NRR = number of rapid responders. Mean RT = average time in seconds to complete a block of items (e.g., Block 1 = 42.5 seconds to complete 5 items on average). TV = threshold value in seconds. RTE = response time effort, which is the proportion of blocks to which an individual responded non-rapidly. From the RTE values, we created a low effort indicator as RTE < .90, corresponding to an individual rapidly responding on more than 10% of blocks.

Item Response Analysis

Rapid responding was identified by applying four commonly used rules to the block response times. First, we used the two-second per item rule (Huang et al., 2012; Soland et al., 2019), where a block was flagged as rapid if its response time in seconds was less than or equal to 2 multiplied by the number of items in the block (e.g., Block 1 = 5 items × 2 = 10 s); otherwise, it was classified as non-rapid. Second, we used a normative threshold method (NT; Wise & Ma, 2012), where responses were flagged as rapid if their time fell below a specified proportion of the block’s mean response time. Specifically, we considered thresholds of 10%, 20%, and 30% of the block mean (NT10, NT20, and NT30, respectively) given previous empirical study of their functioning to identify rapid responding (e.g., Soland et al., 2021; Wise & Kuhfeld, 2021). For example, the NT10 threshold for Block 1 was 4.25 s, representing 10% of the average block response time, with NT20 and NT30 representing 20% and 30% of the average block response time. For each method (two-second rule, NT10, NT20, NT30), we computed response time effort (RTE) as the student-level proportion of blocks classified as non-rapid (Wise & Kong, 2005). Aligning with prior research (e.g., Rios & Deng, 2021; Rios et al., 2017; Wise & DeMars, 2005), we created a low effort indicator using RTE < .90, corresponding to rapid responding on more than 10% of blocks, which was equivalent to two or more of the 12 WARNS blocks.
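To make the flagging rules concrete, the following sketch shows one way the thresholds, RTE, and the low effort indicator could be computed. It is an illustration under stated assumptions rather than the study's code: the function names and the example student are hypothetical, while the items-per-block counts come from Table 2. As described in the text, the two-second rule is treated as inclusive (less than or equal to the threshold) and the NT rules as strict (below the threshold).

```python
from statistics import mean

# Items per block, taken from Table 2.
N_ITEMS = [5, 4, 5, 5, 5, 5, 4, 5, 5, 4, 3, 3]


def nt_thresholds(all_times, proportion):
    """Normative thresholds: a proportion of each block's mean response time.

    all_times holds one list of block response times per student.
    """
    block_means = [mean(col) for col in zip(*all_times)]
    return [proportion * m for m in block_means]


def two_second_thresholds(n_items):
    """Fixed thresholds: 2 s multiplied by the number of items in each block."""
    return [2 * n for n in n_items]


def rte(times, thresholds, inclusive):
    """Proportion of one student's blocks classified as non-rapid."""
    if inclusive:
        rapid = [t <= th for t, th in zip(times, thresholds)]
    else:
        rapid = [t < th for t, th in zip(times, thresholds)]
    return 1 - sum(rapid) / len(rapid)


def low_effort(rte_value):
    """Low effort indicator: rapid responding on more than 10% of blocks."""
    return rte_value < 0.90


# Hypothetical student who rushed the first two blocks (illustrative values).
student = [9, 7, 30, 35, 28, 25, 20, 22, 21, 19, 12, 11]
student_rte = rte(student, two_second_thresholds(N_ITEMS), inclusive=True)
flag = low_effort(student_rte)
```

With 12 blocks, RTE can only take values in steps of 1/12, so a student with one rapid block (RTE of about .92) is retained as effortful, whereas a student with two rapid blocks (RTE of about .83) is flagged.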

Validity Check of Response-Time Thresholds

To evaluate the validity of response-time thresholds before applying them to the student sample, multiple administrations of the WARNS were completed by the authors of this study. Each author completed at least one effortful administration (i.e., reading and answering items carefully) and one rapid-responding administration (i.e., completing the assessment as quickly as possible without reading items). This resulted in 14 administrations with known response behavior, including six that were effortful and eight that were rapid. The rapid responding thresholds estimated using the four rules applied to the student sample (two-second rule, NT10, NT20, NT30) were compared to the authors’ time spent answering thoughtfully or as quickly as possible. A threshold-setting method was considered invalid if it failed to flag the authors’ quick, thoughtless responses as rapid (too insensitive) or flagged their effortful responses as rapid (too sensitive). A method that is working well should minimize both false positives and false negatives.
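The two error counts used in this check can be tallied as sketched below. The response times are hypothetical values chosen only for illustration (they are not the authors' data and do not reproduce Table 3), the function name is an assumption, and the thresholds are the Block 1 values from Table 2.

```python
def validity_counts(known_rapid, known_effortful, threshold, inclusive=False):
    """Return (rapid missed, effortful flagged) for one block and one threshold."""
    def is_rapid(t):
        return t <= threshold if inclusive else t < threshold

    rapid_missed = sum(not is_rapid(t) for t in known_rapid)
    effortful_flagged = sum(is_rapid(t) for t in known_effortful)
    return rapid_missed, effortful_flagged


# Hypothetical Block 1 times in seconds: eight rapid and six effortful runs.
rapid_runs = [3.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 9.0]
effortful_runs = [25.0, 30.0, 32.0, 40.0, 45.0, 60.0]

# Block 1 threshold values from Table 2.
results = {
    "NT10": validity_counts(rapid_runs, effortful_runs, 4.25),
    "NT20": validity_counts(rapid_runs, effortful_runs, 8.50),
    "NT30": validity_counts(rapid_runs, effortful_runs, 12.76),
    "2-second": validity_counts(rapid_runs, effortful_runs, 10, inclusive=True),
}
```

A method with perfect detection returns (0, 0) for every block; a large first count signals an insensitive threshold, and a nonzero second count signals an overly sensitive one.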

Analysis of Risk Status by Disengagement Status

Proportions of students classified as low effort and effortful were estimated within overall risk level classification (high/low) and within each domain-level risk category (low, medium, high). Additional summaries of disengagement were generated by gender, grade, locale, organization type, and assessment administration format. Two-way cross-tabulations were created for pairs of grouping variables to examine patterns at subgroup intersections.
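One way to produce such one-way and two-way summaries is sketched here. The record layout, field names, and example records are hypothetical and serve only to show the tabulation logic; they are not drawn from the study data.

```python
from collections import Counter


def low_effort_rates(records, keys):
    """Proportion of low-effort students within each cell defined by keys."""
    totals, flagged = Counter(), Counter()
    for r in records:
        cell = tuple(r[k] for k in keys)
        totals[cell] += 1
        flagged[cell] += r["low_effort"]  # True counts as 1, False as 0
    return {cell: flagged[cell] / totals[cell] for cell in totals}


# Hypothetical student records (illustrative only).
records = [
    {"risk": "high", "gender": "male", "low_effort": True},
    {"risk": "high", "gender": "male", "low_effort": False},
    {"risk": "high", "gender": "female", "low_effort": False},
    {"risk": "low", "gender": "male", "low_effort": False},
    {"risk": "low", "gender": "female", "low_effort": False},
    {"risk": "low", "gender": "female", "low_effort": True},
]

by_risk = low_effort_rates(records, ["risk"])
by_risk_and_gender = low_effort_rates(records, ["risk", "gender"])
```

Passing two keys yields the subgroup intersections; cells with very few students would produce unstable proportions, so cell counts should be inspected alongside the rates.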

Internal Consistency Reliability Analysis

Internal consistency reliability was estimated with McDonald’s omega (ω, McDonald, 1999) for each domain-specific factor, the general factor, and the total composite. Estimates were based on the bifactor model, which tends to fit the data the best (e.g., Strand et al., 2019). Estimates were obtained for the full sample and then recomputed after effort-based filtering using the RTE ≥ .90 criterion under each response-time rule.
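For readers unfamiliar with model-based reliability, the sketch below computes omega for the total composite and for the general factor from bifactor parameter estimates. It assumes orthogonal factors with unit variances and uses the standard expressions for these coefficients; the exact variant computed in this study is not specified in the text, and the loadings shown are hypothetical.

```python
def bifactor_omegas(general, specific, residual):
    """Omega for the total composite and the general factor.

    general:  general-factor loadings, one per item
    specific: dict mapping each specific factor to its items' loadings
    residual: residual variances, one per item
    Assumes orthogonal factors with unit variances.
    """
    general_part = sum(general) ** 2
    specific_part = sum(sum(loads) ** 2 for loads in specific.values())
    total_variance = general_part + specific_part + sum(residual)
    return {
        "omega_total": (general_part + specific_part) / total_variance,
        "omega_general": general_part / total_variance,
    }


# Hypothetical four-item example with two specific factors.
omegas = bifactor_omegas(
    general=[0.6, 0.6, 0.6, 0.6],
    specific={"A": [0.4, 0.4], "B": [0.4, 0.4]},
    residual=[0.48, 0.48, 0.48, 0.48],
)
```

Recomputing these coefficients on the effort-filtered sample means refitting the model to the retained students and passing the new estimates to the same function.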

Results

Given that the threshold-setting procedures have not been examined for risk assessments, we first evaluated whether each method could correctly identify rapid and non-rapid responses to the WARNS provided by the authors. This validation check allowed us to have confidence in the detection criteria we used for the analysis. Table 3 presents the performance of the threshold methods applied to the authors’ data with known response behavior. Under perfect detection, the “Rapid missed” and “Effortful flagged” columns would contain only zeros. Across blocks, NT10 missed nearly all known rapid responses, indicating that a cutoff at 10% of the mean response time was too lenient (insensitive) for detecting rapid responding on this risk assessment. NT20 reduced the number of missed rapid responses but still failed to identify several known rapid responses across many blocks. NT30 and the two-second rule performed best, identifying nearly all known rapid responses on this risk assessment. Although NT30 incorrectly flagged one known-effortful response on Block 2, it was substantially more sensitive to rapid responding than NT10 and NT20. Accordingly, we focus on results from NT30 and the two-second rule. Given their common use with achievement tests, results for the NT10 and NT20 thresholds are reported for comparison (i.e., to show how using these less sensitive thresholds would change the number of responses classified as rapid on this assessment). Because NT10 and NT20 demonstrated limited sensitivity for the WARNS, we do not interpret those results in detail below.

Table 3.

Block-Level Performance of Response Time Threshold Methods in Authors’ Response Data

Block (n items)	NT10 rapid missed	NT10 effortful flagged	NT20 rapid missed	NT20 effortful flagged	NT30 rapid missed	NT30 effortful flagged	2-second rule rapid missed	2-second rule effortful flagged
Block 1 (5)	8	0	3	0	0	0	1	0
Block 2 (4)	5	0	0	0	0	1	0	0
Block 3 (5)	8	0	1	0	0	0	0	0
Block 4 (5)	5	0	0	0	0	0	0	0
Block 5 (5)	8	0	1	0	0	0	0	0
Block 6 (5)	8	0	1	0	0	0	0	0
Block 7 (4)	8	0	1	0	0	0	0	0
Block 8 (5)	8	0	1	0	0	0	0	0
Block 9 (5)	8	0	3	0	0	0	0	0
Block 10 (4)	8	0	6	0	1	0	0	0
Block 11 (3)	8	0	4	0	2	0	2	0
Block 12 (3)	8	0	4	0	0	0	0	0

Note. The authors completed the WARNS effortfully 6 times and quickly 8 times. Counts reflect the number of administrations that the threshold misclassified. “Rapid missed” denotes known-rapid responses not identified as rapid using that threshold method. “Effortful flagged” denotes known-effortful responses incorrectly identified as rapid using that threshold method.

In Table 2, we present the number of rapid responders (NRR) and corresponding threshold value (TV, in seconds) for each block under four threshold rules: NT10, NT20, NT30, and the two-second rule. NRR refers to the number of students whose response times fell below the specified threshold for that block, indicating potential rapid responding. The TV column represents the cutoff time applied for that threshold (e.g., under NT10 for Block 1, responses faster than 4.25 s were flagged as rapid). To orient the reader, Block 1 illustrates these metrics clearly. The mean response time was 42.5 s (SD = 41.9), suggesting generally effortful engagement. Under NT10, 0 students were classified as rapid responders using a threshold of 4.25 s. NT20 flagged 3 students as responding rapidly using a threshold of 8.50 s. This insensitivity confirms our validity check results and our focus on the other thresholds. NT30 identified 31 students as responding rapidly using a 12.76 s threshold. The two-second rule classified 10 students as rapid responders, applying a fixed cutoff of 2 s multiplied by the number of items in the block (10 s for Block 1 with 5 items). This pattern shows that NT10 and NT20 identified very few rapid responses, whereas NT30 and the two-second rule flagged substantially more students. Across all response time threshold rules, most students were classified as effortful. More than 95% of students met the RTE ≥ .90 criterion, meaning that fewer than 5% of students exhibited rapid responses on more than 10% of blocks. Specifically, the number of students classified as low effort (RTE < .90) ranged from 1 under the insensitive NT10 threshold rule to 115 under the more sensitive two-second rule (see Table 2).

Average response times declined across blocks, ranging from 42.5 s on Block 1 to 18.1 s on Block 12. Block-level prevalence of rapid responding varied substantially across blocks and threshold methods (Table 2). As expected, given the validity check of the threshold methods using the authors’ responding data, NT10 and NT20 identified relatively few rapid responses per block (0–2 and 3–13, respectively), whereas NT30 and the two-second rule identified more students (29–63 and 10–108, respectively).

Differences in Rapid Responding

Simple Group Differences

Table 4 summarizes rapid responding across student groups, reporting each group’s percentage of the overall sample and percentage of rapid responders (PRR) identified under each response time threshold rule. Some groups contributed a disproportionately large share of rapid responses relative to their percentage of the sample: males, 11–12th grade students, group-administered, non-district setting, and urban setting. For example, students who identified as male represented 53.8% of the total sample, yet accounted for 61.6% (NT30) and 63.5% (two-second rule) of students with rapid responses. Alternatively, students who identified as female represented 43.7% of the total sample and accounted for only 36.1% (NT30) and 34.8% (two-second rule) of students with rapid responses. For grade level, 11–12th grade students comprised 45% of the total sample but were 55.8% (NT30) and 52.2% (two-second rule) of the rapid responders. Students who completed the assessment in a school district during a group administration represented 35.6% of the total sample yet were 55.8% (NT30) and 62.6% (two-second rule) of students with rapid responses, whereas students who completed the assessment individually within a district represented 59.6% of the total sample but were only 25.6% (NT30) and 22.6% (two-second rule) of students with rapid responses. Students in urban locations represented 62.8% of the total sample but were 70.9% (NT30) and 73.0% (two-second rule) of the rapid responders. Students in rural locations represented 31.9% of the total sample but were only 10.5% (NT30) and 12.2% (two-second rule) of students with rapid responses.

Table 4.

Rapid Responders by Student Group

Grouping variable	Group	Total sample (%)	NT10 PRR (%)	NT20 PRR (%)	NT30 PRR (%)	2-second PRR (%)
Sex	Female	43.7	0	33.3	36.1	34.8
Sex	Male	53.8	100	61.1	61.6	63.5
Grade Level	9–10th	55.0	100	27.8	44.2	47.8
Grade Level	11–12th	45.0	0	72.2	55.8	52.2
Org.type	Not school	4.8	0	5.6	18.6	14.8
	District	59.6	0	22.2	25.6	22.6
	District (group)	35.6	100	72.2	55.8	62.6
Locale	Rural	31.9	0	16.7	10.5	12.2
Locale	Urban	62.8	100	77.8	70.9	73.0
AD Level	High	4.9	0	22.2	11.6	8.7
	Moderate	8.0	0	22.2	10.5	8.7
	Low	87.1	100	55.6	77.9	82.6
DA Level	High	22.8	0	16.7	10.5	11.3
	Moderate	20.1	0	22.2	23.3	16.5
	Low	57.1	100	61.1	66.3	72.2
FE Level	High	12.8	100	22.2	17.4	16.5
	Moderate	18.9	0	61.1	32.6	27.8
	Low	68.3	0	16.7	50.0	55.7
PD Level	High	3.7	0	0	3.5	2.6
	Moderate	11.3	0	33.3	10.5	9.6
	Low	85.0	100	66.7	86.1	87.8
SE Level	High	20.5	100	66.7	47.7	40.0
	Moderate	26.7	0	33.3	22.1	21.7
	Low	52.8	0	0	30.2	38.3
SA Level	High	3.7	0	22.2	8.1	6.1
	Moderate	9.2	0	16.7	10.5	7.8
	Low	87.1	100	61.1	81.4	86.1
Risk Level	High	74.9	100	100	76.7	68.7
Risk Level	Low	25.1	0	0	23.3	31.3

Note. PRR = percentage of rapid responders. Org.type = organization type. AD = Aggression-Defiance. DA = Depression-Anxiety. FE = Family Environment. PD = Peer Deviance. SE = School Engagement. SA = Substance Abuse. Note that Risk Level is determined by a cumulative score across subscales (e.g., a combination of moderate and low can equal high risk).

In sum, across blocks, rapid responding was rare overall (<5% of students), but prevalence varied by threshold and block. NT30 and the two-second rule flagged substantially more rapid responders than NT10/NT20. Group differences emerged: rapid responding was disproportionately higher among males, 11–12th graders, students tested in group/urban settings, and those outside district settings.
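One way to make these comparisons concrete is to divide a group's share of rapid responders by its share of the sample. The sketch below does this for several Table 4 rows under the two-second rule. The "representation ratio" is an illustrative index introduced here, not a statistic reported in the study, and the function and variable names are hypothetical.

```python
# Percentages copied from Table 4: (share of total sample,
# share of rapid responders under the two-second rule).
TABLE_4_TWO_SECOND = {
    "Female": (43.7, 34.8),
    "Male": (53.8, 63.5),
    "District": (59.6, 22.6),
    "District (group)": (35.6, 62.6),
    "Rural": (31.9, 12.2),
    "Urban": (62.8, 73.0),
}


def representation_ratio(sample_pct: float, rapid_pct: float) -> float:
    """Values above 1 mean the group is over-represented among rapid responders."""
    if sample_pct <= 0:
        raise ValueError("sample_pct must be positive")
    return rapid_pct / sample_pct


ratios = {
    group: round(representation_ratio(sample, rapid), 2)
    for group, (sample, rapid) in TABLE_4_TWO_SECOND.items()
}

# Print groups from most over-represented to most under-represented.
for group, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    label = "over-represented" if ratio > 1 else "under-represented"
    print(f"{group:<17} {ratio:.2f}  {label}")
```

A ratio near 1 indicates that a group appears among rapid responders roughly in proportion to its size. Because the ratio is purely descriptive, it says nothing about whether a difference is statistically reliable.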

Intersectional Group Differences

Table 5 summarizes rapid responding within intersectional groups, including each group’s percentage of the overall sample and the percentage of rapid responses identified under each method. Some of the largest differences between a group’s share of the sample and its share of students with rapid responses involved school district-based group assessment sessions. For example, students who completed the assessment in school districts during group administration sessions in urban areas accounted for 35.6% of the total sample but were 55.8% (NT30) and 62.6% (two-second rule) of rapid responders. Within district group assessment sessions, 11th- and 12th-grade students represented 14.4% of the total sample yet were 31.4% (NT30) and 31.3% (two-second rule) of rapid responders. Within district group assessment sessions, students who identified as male represented 18.6% of the total sample yet were 29.1% (NT30) and 35.7% (two-second rule) of rapid responders. Students classified as having low risk in these group sessions represented 12.9% of the sample yet were 20.9% (NT30) and 28.7% (two-second rule) of rapid responders.

Table 5.

Rapid Responders by Intersectional Student Group

Grouping variable	Group	Total sample (%)	NT10 PRR (%)	NT20 PRR (%)	NT30 PRR (%)	2-second PRR (%)
Sex & Grade Level	Female \| 11–12th	18.7	0.0	33.3	18.6	16.5
	Female \| 9–10th	25.1	0.0	0.0	17.4	18.3
	Male \| 11–12th	25.0	0.0	38.9	36.1	34.8
	Male \| 9–10th	28.8	100.0	22.2	25.6	28.7
Sex & Org.type	Female \| Not School	1.8	0.0	5.6	3.5	3.5
	Female \| District	25.9	0.0	0.0	5.8	4.4
	Female \| District (group)	16.0	0.0	27.8	26.7	27.0
	Male \| Not School	2.9	0.0	0.0	14.0	10.4
	Male \| District	32.4	0.0	16.7	18.6	17.4
	Male \| District (group)	18.6	100.0	44.4	29.1	35.7
Sex & Risk Level	Female \| High	32.7	0.0	33.3	20.9	18.3
	Female \| Low	11.0	0.0	0.0	15.1	16.5
	Male \| High	39.9	100.0	61.1	53.5	48.7
	Male \| Low	14.0	0.0	0.0	8.1	14.8
Sex & Locale	Female \| Rural	15.6	0.0	0.0	0.0	0.9
	Female \| Urban	26.1	0.0	27.8	32.6	30.4
	Male \| Rural	15.8	0.0	11.1	9.3	10.4
	Male \| Urban	34.9	100.0	50.0	38.4	42.6
Grade Level & Org.type	11–12th \| Not School	1.6	0.0	5.6	7.0	5.2
	11–12th \| District	29.1	0.0	11.1	17.4	15.7
	11–12th \| District (group)	14.4	0.0	55.6	31.4	31.3
	9–10th \| Not School	3.3	0.0	0.0	11.6	9.6
	9–10th \| District	30.5	0.0	11.1	8.1	7.0
	9–10th \| District (group)	21.1	100.0	16.7	24.4	31.3
Grade Level & Risk Level	11–12th \| High	33.5	0.0	72.2	44.2	36.5
	11–12th \| Low	11.6	0.0	0.0	11.6	15.7
	9–10th \| High	41.4	100.0	27.8	32.6	32.2
	9–10th \| Low	13.6	0.0	0.0	11.6	15.7
Grade Level & Locale	11–12th \| Rural	14.0	0.0	5.6	8.1	9.6
	11–12th \| Urban	29.5	0.0	61.1	40.7	37.4
	9–10th \| Rural	17.9	0.0	11.1	2.3	2.6
	9–10th \| Urban	33.4	100.0	16.7	30.2	35.7
Org.type & Risk Level	Not School \| High	4.2	0.0	5.6	17.4	13.9
	Not School \| Low	0.7	0.0	0.0	1.2	0.9
	District \| High	48.1	0.0	22.2	24.4	20.9
	District \| Low	11.5	0.0	0.0	1.2	1.7
	District (group) \| High	22.7	100.0	72.2	34.9	33.9
	District (group) \| Low	12.9	0.0	0.0	20.9	28.7
Org.type & Locale	District \| Rural	31.9	0.0	16.7	10.5	12.2
	District \| Urban	27.3	0.0	5.6	15.1	10.4
	District (group) \| Urban	35.6	100.0	72.2	55.8	62.6
Locale & Risk Level	Rural \| High	24.8	0.0	16.7	10.5	10.4
	Rural \| Low	7.1	0.0	0.0	0.0	1.7
	Urban \| High	45.6	100.0	77.8	48.8	44.4
	Urban \| Low	17.2	0.0	0.0	22.1	28.7

Note. PRR = percentage of rapid responders. Org.type = organization type.

In contrast, several groups in individual district administrations had smaller shares of rapid responders than of the total sample. Students who identified as female and completed the assessment individually in district settings comprised 25.9% of the total sample but were only 5.8% (NT30) and 4.4% (two-second rule) of rapid responders, and 9th- and 10th-grade students who completed the assessment individually in district settings comprised 30.5% of the sample but were only 8.1% (NT30) and 7.0% (two-second rule) of rapid responders. Rural district settings showed a similar pattern, accounting for 31.9% of the total sample but only 10.5% (NT30) and 12.2% (two-second rule) of rapid responders.

Students who completed the assessment in non-school settings (i.e., community-based agencies) and were classified as having high risk represented 4.2% of the total sample yet were 17.4% (NT30) and 13.9% (two-second rule) of rapid responders. Males in non-school settings represented 2.9% of the total sample yet were 14.0% (NT30) and 10.4% (two-second rule) of rapid responders. Males who were classified as high risk represented 39.9% of the total sample yet were 53.5% (NT30) and 48.7% (two-second rule) of rapid responders. Across gender and grade level, rapid responding was most commonly observed among male 11th–12th graders. This group represented 25% of the sample but constituted 36.1% (NT30) and 34.8% (two-second rule) of rapid responders.

Disproportional Risk Classification

Prior to filtering the rapid responders, the majority of the total sample was classified as low risk across the individual WARNS domains. As seen in Table 4, the percentage of students classified as low risk ranged from 52.8% for the School Engagement domain to 87.1% for the Aggression-Defiance and Substance Abuse domains. However, for students labeled as rapid responders using the two-second rule, the distribution of low-risk classifications differed from the full sample: low risk was more common for Depression-Anxiety (72.2% vs. 57.1%) but less common for School Engagement (38.3% vs. 52.8%), Family Environment (55.7% vs. 68.3%), and Aggression-Defiance (82.6% vs. 87.1%).

For the Total Risk classification, prior to filtering rapid responders, 74.9% of students were categorized as high risk and 25.1% as low risk. For students labeled as rapid responders, 76.7% (NT30) and 68.7% (two-second rule) were categorized as high risk, whereas 23.3% (NT30) and 31.3% (two-second rule) were categorized as low risk. Using the two-second rule, low-risk categorization was overrepresented, and high-risk categorization was underrepresented relative to the full sample. Thus, at the total-risk level, evidence consistent with possible false negatives was observed under the two-second rule, but not under NT30. The opposite (i.e., a false positive) could occur with straight-lined responses on the WARNS (selecting the same response that represents high risk across all items), but this pattern occurred less than 1% of the time. In student risk screening, the consequences of false negatives are often more severe than those of false positives. A false negative, misclassifying a truly at-risk student as low risk, can delay or prevent access to timely intervention, allowing academic, behavioral, or mental health difficulties to compound over time. Screening systems in education frequently prioritize sensitivity when the cost of missed risk is high, and support early identification (e.g., Kearney et al., 2023; Wu et al., 2026).

Internal Consistency Reliability Across Filtering Conditions

Table 6 reports internal consistency reliability via coefficient omega (ω) for the WARNS total score and for subdomains based on the bifactor model. Reliability for the total score, which is used for classification of risk, was .96. Reliability estimates for the six WARNS domains showed little variation across filtering conditions that removed students identified as rapid responders under each response time threshold method. Estimates for the general factor (ω_h = .78) and total score (ω_t = .96) were the same across all filtering rules. We note that the percentage filtered ranged from .04% (n = 1; NT10) to 4% (n = 115; two-second rule) of the total sample (N = 2773). At these percentages, it is not surprising that the omega estimates did not change. The factor loading coefficients used to calculate omega would not be expected to change with such a minor loss of sample size.

Table 6.

Coefficient Omega (ω) from Bifactor CFA by Effort-Based Filtering Rule

Data analyzed	ω_s AD	ω_s DA	ω_s FE	ω_s PD	ω_s SA	ω_s SE	ω_h g	ω_t total
All students	0.16	0.44	0.36	0.33	0.17	0.57	0.78	0.96
RTE ≥ .90 (NT10)	0.16	0.44	0.36	0.33	0.17	0.57	0.78	0.96
RTE ≥ .90 (NT20)	0.16	0.44	0.36	0.33	0.17	0.57	0.78	0.96
RTE ≥ .90 (NT30)	0.16	0.43	0.36	0.33	0.17	0.56	0.78	0.96
RTE ≥ .90 (two-second rule)	0.16	0.44	0.36	0.33	0.17	0.56	0.78	0.96

Note. RTE = response time effort. AD = Aggression-Defiance. DA = Depression-Anxiety. FE = Family Environment. PD = Peer Deviance. SE = School Engagement. SA = Substance Abuse.
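For readers less familiar with the omega coefficients in Table 6, the sketch below shows how ω_h, ω_t, and the subscale ω_s values are commonly computed from standardized bifactor loadings with orthogonal factors. It is an illustration under stated assumptions, not the authors' estimation code: the loadings are hypothetical, unique variances are taken as one minus the squared loadings, and the function name is invented for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Item = Tuple[str, float, float]  # (domain, general loading, specific loading)


def bifactor_omegas(items: Iterable[Item]) -> Tuple[float, float, Dict[str, float]]:
    """Return (omega_h, omega_t, omega_s by domain) from standardized loadings."""
    items = list(items)
    general_sum = sum(g for _, g, _ in items)
    general_by_domain: Dict[str, float] = defaultdict(float)
    specific_by_domain: Dict[str, float] = defaultdict(float)
    unique_by_domain: Dict[str, float] = defaultdict(float)
    for domain, g, s in items:
        general_by_domain[domain] += g
        specific_by_domain[domain] += s
        unique_by_domain[domain] += 1.0 - g**2 - s**2  # assumed unique variance

    general_var = general_sum**2
    specific_var = sum(v**2 for v in specific_by_domain.values())
    total_var = general_var + specific_var + sum(unique_by_domain.values())

    omega_h = general_var / total_var                   # general factor only
    omega_t = (general_var + specific_var) / total_var  # all common variance
    omega_s = {
        d: specific_by_domain[d] ** 2
        / (general_by_domain[d] ** 2 + specific_by_domain[d] ** 2 + unique_by_domain[d])
        for d in specific_by_domain
    }
    return omega_h, omega_t, omega_s


# Hypothetical loadings: two domains with three items each.
example_items = [("SE", 0.6, 0.4)] * 3 + [("DA", 0.6, 0.4)] * 3
omega_h, omega_t, omega_s = bifactor_omegas(example_items)
print(round(omega_h, 2), round(omega_t, 2), {d: round(v, 2) for d, v in omega_s.items()})
```

Because every term depends only on the loadings, removing a small fraction of cases would be expected to leave the coefficients essentially unchanged unless the loadings themselves shift, which matches the stability seen across filtering rules.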

Discussion

We examined the prevalence of rapid responding on a youth risk assessment, how rates of rapid responding varied across student groups and administration contexts, and how risk classifications could differ for students identified as rapid responders. We analyzed data from students who took the WARNS high school assessment, using response times to flag disengaged responses. The vast majority of students were adequately engaged, with over 95% meeting a high effort threshold (RTE ≥ .90) under various detection rules. In other words, fewer than 5% of students exhibited clear signs of rapid responding. Nonetheless, a non-trivial subset did respond rapidly. We also observed a pattern of declining response times over the course of the assessment. On average, students spent about 42.5 seconds on the first block, dropping to around 18.1 seconds by the last block. This speeding-up suggests that some students became less attentive or more eager to finish as they progressed, consistent with known patterns of waning motivation during low-stakes tests.

We explored whether rapid responding varied across student characteristics or administration contexts. In general, disengaged responding was observed across different demographic groups and was not limited to any single group. Students who identified as male accounted for a larger share of rapid responders than their representation in the sample, consistent with the literature (e.g., DeMars et al., 2013; Wise & Kuhfeld, 2021). We also observed differences by administration context. For example, rapid responding was more common in less supervised, district-based group administration contexts, which is consistent with research on how student effort differs between proctored, more controlled environments and less controlled ones (Alahmadi & DeMars, 2022; Barry & Finney, 2009; Schaefer & Finney, 2025).

Our results underscore the importance of accounting for respondent effort when interpreting risk assessment results. Practitioners should not assume that a risk assessment score is an error-free reflection of a student’s situation; they need to be aware of how the student responded to the items. Moreover, we call for practitioners to integrate response process data, such as item response timing, into assessment scoring or reports. For example, if a student’s average response time is extremely low, the WARNS report could flag this student’s responses and recommend caution regarding score interpretation. Implementing an effort flag would allow school personnel to treat those results with skepticism and follow up with a conversation with the student. In fact, the WARNS system currently has a section for such comments from school personnel. That is, personnel can note if they think the student did not take the assessment seriously or displayed odd assessment behavior. However, that option is rarely used (<10%). An effort flag would be helpful at the individual level, as it would allow for subsequent gathering of data to inform student support. The flag would also be helpful at the aggregate level when trying to accurately summarize risk for groups (schools, districts, etc.). For example, Wise and Kuhfeld (2021) have advocated for motivation filtering, or removing data from low-effort examinees, when producing aggregate-level results. Motivation filtering leads to more valid conclusions because construct-irrelevant variance reflecting low effort has been removed. In fact, filtering out disengaged responses has been shown to improve correlations between test scores and external criteria, meaning the assessment does a better job of reflecting true ability or needs once the noise of rapid responding is reduced.

In a risk assessment context, such filtering should be explored at the school or district level to understand how a filtering practice may change the risk rates at that level. That is, filtering out rapid responders who were classified as low risk but may actually be high risk will adjust the low-risk percentages downward, making them more accurate, possibly at the expense of the accuracy of the high-risk percentages. What this filtering does not capture, as we have described, is that some rapid responders may straight-line their responses and be classified as high risk. We observed this behavior in less than 1% of responses. We note this because filtering in a risk context requires careful examination and may not be as straightforward as in achievement contexts, especially when trying to understand risk at the school or district level.
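The effect of motivation filtering on an aggregate risk rate can be illustrated with a small sketch. The data below are invented for illustration, and the `Student` record, the function names, and the default RTE ≥ .90 cutoff as a filter are assumptions for this example rather than features of the WARNS system.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Student:
    risk: str   # "high" or "low" total-risk classification
    rte: float  # response time effort, between 0 and 1


def high_risk_rate(students: Sequence[Student]) -> float:
    """Share of students classified as high risk."""
    if not students:
        raise ValueError("at least one student is required")
    return sum(s.risk == "high" for s in students) / len(students)


def motivation_filter(students: Sequence[Student], min_rte: float = 0.90) -> List[Student]:
    """Keep only students who meet the effort criterion."""
    return [s for s in students if s.rte >= min_rte]


# Hypothetical school: 7 engaged high-risk students, 2 engaged low-risk
# students, and 1 rapid responder who was classified as low risk.
school = (
    [Student("high", 1.0)] * 7
    + [Student("low", 1.0)] * 2
    + [Student("low", 0.5)]
)

unfiltered = high_risk_rate(school)
filtered = high_risk_rate(motivation_filter(school))
print(f"high-risk rate: {unfiltered:.1%} unfiltered, {filtered:.1%} filtered")
```

In this toy case, removing one low-risk rapid responder raises the high-risk rate and lowers the low-risk rate. A straight-lining rapid responder classified as high risk would move the rates in the opposite direction, which is why the direction of the adjustment cannot be assumed in advance.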

Implications

Our results have important implications for practitioners using risk assessments to guide interventions. Even a few disengaged responses can distort a student’s risk profile. At the total score level under the two-second rule, rapid responders were more often classified as low risk and less often classified as high risk than students in the full sample. Thus, some students truly at high risk may appear low risk simply because they rushed through the assessment, potentially delaying follow-up and timely intervention despite actual risk (e.g., attendance, behavior, and mental health). We also acknowledge the opposite can happen. In either case (false negative or positive), low-effort responding undermines accuracy and fairness in risk assessments.

Another implication of our results is the importance of interventions to minimize rapid responding (e.g., Finney & Pastor, 2025; McFadden & Finney, 2025). This is perhaps the most critical issue at the individual level. Our findings suggest engagement often drops near the end of the assessment, possibly due to fatigue, making real-time effort monitoring valuable. Detecting rapid responses as they occur would allow staff to intervene immediately. For example, a counselor could remind a student to slow down, or the computer could prompt them after consecutive quick clicks (e.g., Wise et al., 2019). Effort-monitoring in tools like WARNS could improve the validity of score interpretations and would be low cost. More importantly, it may increase the accuracy of the information for an individual student, resulting in more appropriate next steps to support that student. This is where accuracy matters the most.
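A real-time prompt of the kind described above could be driven by a simple counter of consecutive quick responses. The sketch below is one hypothetical design and not a description of the WARNS platform or of the procedure in Wise et al. (2019): the class name, the 2-second per-item cutoff, and the three-in-a-row trigger are all illustrative assumptions.

```python
class EffortMonitor:
    """Signals a prompt after several consecutive rapid item responses."""

    def __init__(self, min_seconds: float = 2.0, max_consecutive: int = 3) -> None:
        if min_seconds <= 0 or max_consecutive < 1:
            raise ValueError("min_seconds must be positive and max_consecutive >= 1")
        self.min_seconds = min_seconds
        self.max_consecutive = max_consecutive
        self._streak = 0

    def record(self, seconds: float) -> bool:
        """Record one item's response time; return True when a prompt is due."""
        if seconds < self.min_seconds:
            self._streak += 1
        else:
            self._streak = 0  # an effortful response resets the count
        if self._streak >= self.max_consecutive:
            self._streak = 0  # start counting again after prompting
            return True
        return False


# Hypothetical sequence of item response times in seconds.
monitor = EffortMonitor()
item_times = [5.0, 1.0, 1.2, 0.8, 6.0, 1.0]
prompts = [monitor.record(t) for t in item_times]
print(prompts)  # the prompt fires on the third consecutive quick response
```

Resetting the streak after each prompt keeps the reminder from firing on every subsequent item, and the same signal could be logged so that a counselor sees which part of the assessment triggered it.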

Limitations

Several limitations warrant caution. First, the sample may limit generalizability. Our data came from high school students in one U.S. region completing the WARNS, reflecting older adolescent behaviors. Findings may not extend to younger students or other populations. Regional factors such as school culture or administration practices could also influence engagement.

Second, we defined low-effort responding using response times. Although widely used, time-based thresholds involve trade-offs between false positives and false negatives. We applied liberal criteria (e.g., NT30, two-second rule) to avoid mislabeling rapid responders as effortful. Other indicators such as straightlining could be examined. Moreover, asking students to self-report expended effort typically identifies students who did not respond thoughtfully but did not rapidly respond (Schaefer & Finney, 2025). Future WARNS administrations should consider asking students if their responses were thoughtful and valid. In short, we applied one behavioral method of identifying disengaged responders (i.e., rapid responding). Although this method is grounded in established approaches, some misclassification is possible; thus, the prevalence estimates should be interpreted as approximate (Wise, 2017).

Third, administration conditions varied by school, introducing uncontrolled variability. Setting differences (quiet office vs. group) and instructions can affect motivation (Alahmadi & DeMars, 2022; Barry & Finney, 2009; Schaefer & Finney, 2025). We do not know the extent to which students were supervised individually, in groups, or remotely. This leaves questions to explore about how different administration protocols influence or at least are related to disengagement.

Fourth, other validity threats besides rapid responding remain, such as deliberate under- or over-reporting of risks. For example, a student might downplay substance use yet appear engaged by timing metrics. Alternatively, a student may exaggerate risk behaviors (e.g., reporting higher levels of stress or substance use than actually experienced) in order to gain attention or access to additional support or resources. Such biases (e.g., social desirability, attention-seeking) are beyond response-time detection.

Future Directions

Our next step is to examine rapid responding in middle school students. Early adolescence (ages ∼11–14) may pose different challenges for self-report assessments. Middle schoolers might try harder to please adults, reducing disengagement compared to high school students (Soland, 2018a; Soland & Kuhfeld, 2019). Replication is needed for generalizability. Because middle school is critical for catching emerging attendance or behavior issues, ensuring valid risk data for this group is essential. This work will extend WARNS-like tools for early intervention.

Another avenue is refining detection methods for non-achievement scales. We used NT-based thresholds and the two-second rule, but optimal thresholds may vary by question type or age group. Future studies should calibrate thresholds by context and explore machine-learning approaches to identify aberrant patterns beyond a single cutoff. Combining methods could reduce false positives and negatives, improving corrective actions.

Finally, research should explore strategies to prevent disengagement. Interventions could occur before or during assessments. For example, asking students to commit to full effort (Finney et al., 2025) or adding a pre-survey script emphasizing honest responses may boost engagement. Real-time monitoring could prompt students when rapid responding is detected (Wise et al., 2019). Piloting these approaches in schools and evaluating their impact on engagement and risk assessment outcomes would be valuable.

In conclusion, our work highlights a subtle but important threat to the validity of youth risk assessments. Disengaged responding can distort the identification of students at risk or in need of support. Our study found that although most students put forth effort on the WARNS, a small proportion did not, and those few cases can undermine the accuracy of the results. Failing to account for rapid responding could lead to misclassification and possibly misallocated resources – either students in need not receiving support or, conversely, interventions being triggered by inaccurate data. We offered suggestions for next steps for research and practice. Specifically, by extending research to different age groups, improving methods to detect and prevent low-effort responses, and integrating these practices into assessment administration, we can strengthen the effectiveness of risk assessments. The long-term goal of such work is to ensure that every student’s assessment truly reflects their circumstances, so that educators can make informed decisions and direct support to where it is needed. In the fight against chronic absenteeism and related youth risks, maintaining the integrity of assessment data is crucial – and that means making sure disengaged responses do not lead to distorted risks.

Footnotes

ORCID iD

Brian F. French

Ethical Considerations

This study was conducted in accordance with ethical standards for research involving human participants. The Institutional Review Board deemed the work exempt because the data were archival.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A210087 to Washington State University. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request. Due to privacy and confidentiality agreements, some restrictions may apply.

References

Akhtar

Kovacs

(2023). Which tests should be administered first, ability or non-ability? The effect of test order on careless responding. Personality and Individual Differences, 207, Article 112157. https://doi.org/10.1016/j.paid.2023.112157

Alahmadi

DeMars

C. E.

(2022). Large-scale assessment during a pandemic: Results from James Madison University’s remote assessment day. Research & Practice in Assessment, 17(1), 5–15. https://www.rpajournal.com/large-scale-assessment-during-a-pandemic-results-from-james-madison-universitys-remote-assessment-day/

American Educational Research AssociationAmerican Psychological Association, & National Council on Measurement in Education . (2014). Standards for educational and psychological testing. American Educational Research Association.

Arhavbarien

(2026). Low-stakes assessments in secondary and further education schools: A systematic literature review. Social Sciences & Humanities Open, 13, Article 102320. https://doi.org/10.1016/j.ssaho.2025.102320

Barry

Finney

S. J.

(2016). Modeling change in effort across a low-stakes testing session: A latent growth curve modeling approach. Applied Measurement in Education, 29(1), 46–64. https://doi.org/10.1080/08957347.2015.1102914

Barry

C. L.

Finney

S. J.

(2009). Does it matter how data are collected? A comparison of testing conditions and the implications for validity. Research & Practice in Assessment, 3, 1–15. https://eric.ed.gov/?id=EJ1062735.

Black

L. I.

Elgaddal

(2024). Chronic school absenteeism for health-related reasons among children ages 5–17 years: United States, 2022 (NCHS Data Brief No. 498). National Center for Health Statistics.

Borger

Eklöf

Johansson

Strietholt

(2025). The issue of test-taking motivation in low- and high-stakes tests: Are students underachieving in PISA? Learning and Individual Differences, 122, Article 102722. https://doi.org/10.1016/j.lindif.2025.102722

Borghans

Diris

Tavares

(2024). Student characteristics and effort during test-taking. Learning and Instruction, 93, Article 101924. https://doi.org/10.1016/j.learninstruc.2024.101924

10.

Braun

Kirsch

Yamamoto

(2011). An experimental study of the effects of monetary incentives on performance on the 12th-grade NAEP reading assessment. Teachers College Record: The Voice of Scholarship in Education, 113(11), 2309–2344. https://doi.org/10.1177/016146811111301101

11.

Cole

J. S.

Osterlind

S. J.

(2008). Investigating differences between low- and high-stakes test performance on a general education exam. The Journal of General Education, 57(2), 119–130. https://doi.org/10.2307/27798099

12.

Curran

P. G.

(2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4–19. https://doi.org/10.1016/j.jesp.2015.07.006

13.

DeMars

C. E.

(2000). Test stakes and item format interactions. Applied Measurement in Education, 13(1), 55–77. https://doi.org/10.1207/s15324818ame1301_3

14.

DeMars

C. E.

Bashkov

B. M.

Socha

A. B.

(2013). The role of gender in test-taking motivation under low-stakes conditions. Research & Practice in Assessment, 8, 69–82.

15.

Development Services Group, Inc. (2015). “Risk and needs assessment for youths.” Literature review. Office of Juvenile Justice and Delinquency Prevention. https://ojjdp.ojp.gov/model-programs-guide/literature-reviews/risk_needs_assessments_for_youths.pdf

16.

DIA Higher Education Collaborators . (n.d.). ISSAQ holistic student success platform. ISSAQ. https://www.issaq.net/

17.

Duckworth

A. L.

Yeager

D. S.

(2015). Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes. Educational Researcher, 44(4), 237–251. https://doi.org/10.3102/0013189X1558432

18.

Finney

S. J.

Miller

S. A.

McGoey

(2025). Increasing expended effort on low-stakes accountability tests via priming: Effectiveness with graduating university students. Research & Practice in Assessment, 20(1), 76–92. https://www.rpajournal.com/increasing-expended-effort-on-low-stakes-accountability-tests-via-priming-effectiveness-with-graduating-university-students/

19.

Finney

S. J.

Pastor

D. A.

(2025). Perceived normativity of giving effort on low-stakes tests: Relations with examinee effort and test performance. Educational Assessment, 30(3), 165–182. https://doi.org/10.1080/10627197.2025.2474395

20.

Finney

S. J.

Sundre

D. L.

Swain

M. S.

Williams

L. M.

(2016). The validity of value-added estimates from low-stakes testing contexts: The impact of change in test-taking motivation and test consequences. Educational Assessment, 21(1), 60–87. https://doi.org/10.1080/10627197.2015.1127753

21.

George

Coker

French

Strand

Gotch

McBride

McCurley

(2021). Washington assessment of the Risks and needs of Students (WARNS) user manual (Version 6.1). Washington State Center for Court Research, Administrative Office of the Courts.

22.

Gotch

C. M.

French

B. F.

(2020). A validation trajectory for the Washington assessment of risks and needs of students. Educational Assessment, 25(1), 65–82. https://doi.org/10.1080/10627197.2019.1702462

23.

Henry

K. L.

Huizinga

D. H.

(2007). Truancy’s effect on the onset of drug use among urban adolescents placed at risk. Journal of Adolescent Health, 40(4), 358.e9–358.e3.58E17. https://doi.org/10.1016/j.jadohealth.2006.11.138

24.

Henry

K. L.

Knight

K. E.

Thornberry

T. P.

(2012). School disengagement as a predictor of dropout, delinquency, and problem substance use during adolescence and early adulthood. Journal of Youth and Adolescence, 41(2), 156–166. https://doi.org/10.1007/s10964-011-9665-3

25.

Hollister

Berenson

M. L.

(2009). Proctored versus unproctored online exams: Studying the impact of exam environment on student performance. Decision Sciences Journal of Innovative Education, 7(1), 271–294. https://doi.org/10.1111/j.1540-4609.2008.00220.x

26.

Hopfenbeck

T. N.

Kjærnsli

(2016). Students’ test motivation in PISA: The case of Norway. The Curriculum Journal, 27(3), 406–422. https://doi.org/10.1080/09585176.2016.1156004

27.

Huang

J. L.

Curran

P. G.

Keeney

Poposki

E. M.

DeShon

R. P.

(2012). Detecting and deterring insufficient effort responding to surveys. Journal of Business and Psychology, 27(1), 99–114. https://doi.org/10.1007/s10869-011-9231-8

28.

Jensen

Rice

Soland

(2018). The influence of rapidly guessed item responses on teacher value-added estimates: Implications for policy and practice. Educational Evaluation and Policy Analysis, 40(2), 267–284. https://doi.org/10.3102/0162373718759600

29.

Kearney

C. A.

Dupont

Fensken

Gonzálvez

(2023). School attendance problems and absenteeism as early warning signals: Review and implications for health-based protocols and school-based practices. Frontiers in Education, 8, Article 1253595. https://doi.org/10.3389/feduc.2023.1253595

30. Kong, X. J., Wise, S. L., & Bhola, D. S. (2007). Setting the response time threshold parameter to differentiate solution behavior from rapid-guessing behavior. Educational and Psychological Measurement, 67(4), 606–619. https://doi.org/10.1177/0013164406294779

31. Lau, A. R., Swerdzewski, P. J., Jones, A. T., Anderson, A. D., & Markle, R. E. (2009). Proctors matter: Strategies for increasing examinee effort on general education program assessments. Journal of General Education, 58(3), 196–217. https://doi.org/10.2307/27798138

32. Lundgren, & Eklöf (2023). Questionnaire-taking motivation: Using response times to assess motivation to optimize on the PISA 2018 student questionnaire. International Journal of Testing, 23(4), 231–256. https://doi.org/10.1080/15305058.2023.2214647

33. Markle, Olivera-Aguilar, Jackson, Noeth, & Robbins (2013). Examining evidence of reliability, validity, and fairness for the success navigator assessment. Educational Testing Service. https://doi.org/10.1002/j.2333-8504.2013.tb02319.x

34. Markle, R. E., & O'Banion (2014). Assessing affective factors to improve retention and completion. Learning Abstracts, 17(11), 1–16. https://www.league.org/occasional-papers/assessing-affective-factors-improve-retention-and-completion

35. McDonald, R. P. (1999). Test theory: A unified treatment. Lawrence Erlbaum.

36. McFadden, M. E., & Finney, S. J. (2025). Investigating the impact of multiple priming questions on examinee effort during low-stakes testing. International Journal of Testing, 25(1), 109–133. https://doi.org/10.1080/15305058.2024.2414425

37. Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437–455. https://doi.org/10.1037/a0028085

38. Meikrantz Sharp, M. K., Herr, R. K., & Finney, S. J. (2026). Factors affecting international student success in higher education: A needs assessment to guide differentiated and intentional programing. Journal of College Student Retention: Research, Theory & Practice. Advance online publication. https://doi.org/10.1177/15210251261430891

39. Pope, A. M., Finney, S. J., & Crewe (2023). Evaluating the effectiveness of an academic success program: Showcasing the importance of theory to practice. Journal of Student Affairs Inquiry, 6(1), 35–50. https://doi.org/10.18060/27924. https://files.eric.ed.gov/fulltext/ED627166.pdf

40. Reed, Hall, S. R., & Houchins, D. E. (2024). High-risk students taking low-stakes assessments: Do the data reflect ability or effort? Forum Pedagogiczne, 14(2.1), 17–31. https://doi.org/10.21697/fp.2024.2.1.2

41. Rios, J. A., & Deng (2021). Does the choice of response time threshold procedure substantially affect inferences concerning the identification and exclusion of rapid guessing responses? A meta-analysis. Large-Scale Assessments in Education, 9(1), 18. https://doi.org/10.1186/s40536-021-00110-8

42. Rios, J. A., Guo, Mao, & Liu, O. L. (2017). Evaluating the impact of careless responses on aggregated scores: To filter unmotivated examinees or not? International Journal of Testing, 17(1), 74–104. https://doi.org/10.1080/15305058.2016.1231193

43. Rios, J. A., & Liu, O. L. (2017). Online proctored versus unproctored low-stakes internet test administration: Is there differential test-taking behavior and performance? American Journal of Distance Education, 31(4), 226–241. https://doi.org/10.1080/08923647.2017.1258628

44. Rios, J. A., Liu, O. L., & Bridgeman (2014). Identifying low-effort examinees on student learning outcomes assessment: A comparison of two approaches. New Directions for Institutional Research, 2014(161), 69–82. https://doi.org/10.1002/ir.20068

45. Rios, J. A., & Soland (2022). An investigation of item, examinee, and country correlates of rapid guessing in PISA. International Journal of Testing, 22(2), 154–184. https://doi.org/10.1080/15305058.2022.2036161

46. Rocque, Jennings, W. G., Piquero, A. R., Ozkan, & Farrington, D. P. (2017). The importance of school attendance: Findings from the Cambridge study in delinquent development on the life-course effects of truancy. Crime & Delinquency, 63(5), 592–612. https://doi.org/10.1177/0011128716660520

47. Schaefer, K. E., & Finney, S. J. (2025). The influence of student disengagement on a non-cognitive measure: Practical solutions for assessment practitioners. Research and Practice in Assessment, 20(1), 34–48. https://files.eric.ed.gov/fulltext/EJ1488106.pdf

48. Schnipke, D. L. (1995). Assessing speededness in computer-based tests using item response times. The Johns Hopkins University.

49. Schwalbe, C. S. (2008). A meta-analysis of juvenile justice risk assessment instruments: Predictive validity by gender. Criminal Justice and Behavior, 35(11), 1367–1381. https://doi.org/10.1177/0093854808324377

50. Shaw, D. S., Galán, C. A., Lemery-Chalfant, Dishion, T. J., Elam, K. K., Wilson, M. N., & Gardner (2019). Trajectories and predictors of children's early-starting conduct problems: Child, family, genetic, and intervention effects. Development and Psychopathology, 31(5), 1911–1921. https://doi.org/10.1017/S0954579419000828

51. Smith, L. F., & Smith, J. K. (2004). The influence of test consequences on national examinations. North American Journal of Psychology, 6(1), 13–25.

52. Soland (2018a). Are achievement gap estimates biased by differential student test effort? Putting an important policy metric to the test. Teachers College Record, 120(12), 1–26. https://doi.org/10.1177/016146811812001202

53. Soland (2018b). The achievement gap or the engagement gap? Investigating the sensitivity of gaps estimates to test motivation. Applied Measurement in Education, 31(4), 312–323. https://doi.org/10.1080/08957347.2018.1495213

54. Soland, & Kuhfeld (2019). Do students rapidly guess repeatedly over time? A longitudinal analysis of student test disengagement, background, and attitudes. Educational Assessment, 24(4), 327–342. https://doi.org/10.1080/10627197.2019.1645592

55. Soland, Kuhfeld, & Rios (2021). Comparing different response time threshold setting methods to detect low effort on a large-scale assessment. Large-Scale Assessments in Education, 9(8), 1–21. https://doi.org/10.1186/s40536-021-00100-w

56. Soland, Wise, S. L., & Gao (2019). Identifying disengaged survey responses: New evidence using response time metadata. Applied Measurement in Education, 32(2), 151–165. https://doi.org/10.1080/08957347.2019.1577244

57. Strand, P. S., French, B. F., & Austin, B. W. (2023). Assessment of the risks and needs of middle school students: Invariance properties related to gender and ethnicity. Assessment, 30(3), 580–591. https://doi.org/10.1177/10731911211062505

58. Strand, P. S., Gotch, C. M., French, B. F., & Beaver, J. L. (2019). Factor structure and invariance of an adolescent risks and needs assessment. Assessment, 26(6), 1105–1116. https://doi.org/10.1177/1073191117706021

59. Swerdzewski, P. J., Harmes, J. C., & Finney, S. J. (2011). Two approaches for identifying low-motivated students in a low-stakes assessment context. Applied Measurement in Education, 24(2), 162–188. https://doi.org/10.1080/08957347.2011.555217

60. Ulitzsch, Shin, H. J., & Lüdtke (2024). Accounting for careless and insufficient effort responding in large-scale survey data. Development, evaluation, and application of a screen-time-based weighting procedure. Behavior Research Methods, 56(2), 804–825. https://doi.org/10.3758/s13428-022-02053-6

61. U.S. Department of Education. (n.d.). Chronic absenteeism. https://www.ed.gov/teaching-and-administration/supporting-students/chronic-absenteeism

62. Wise, S. L. (2015). Effort analysis: Individual score validation of achievement test data. Applied Measurement in Education, 28(3), 237–252. https://doi.org/10.1080/08957347.2015.1042155

63. Wise, S. L. (2017). Rapid-guessing behavior: Its identification, interpretation, and implications. Educational Measurement: Issues and Practice, 36(4), 52–61. https://doi.org/10.1111/emip.12165

64. Wise, S. L., & Cotten, M. R. (2009). Test-taking effort and score validity: The influence of student conceptions of assessment. In D. M. McInerney, G. T. L. Brown, & G. A. D. Liem (Eds.), Student perspectives on assessment: What students can tell us about assessment for learning (pp. 187–206). Information Age.

65. Wise, S. L., & DeMars, C. E. (2005). Low examinee effort in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10(1), 1–17. https://doi.org/10.1207/s15326977ea1001_1

66. Wise, S. L., & DeMars, C. E. (2010). Examinee non-effort and the validity of program assessment results. Educational Assessment, 15(1), 27–41. https://doi.org/10.1080/10627191003673216

67. Wise, S. L., & Kong, X. J. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18(2), 163–183. https://doi.org/10.1207/s15324818ame1802_2

68. Wise, S. L., & Kuhfeld, M. R. (2021). Using retest data to evaluate and improve effort-moderated scoring. Journal of Educational Measurement, 58(1), 130–149. https://doi.org/10.1111/jedm.12275

69. Wise, S. L., Kuhfeld, M. R., & Soland (2019). The effects of effort monitoring with proctor notification on test-taking engagement, test performance, and validity. Applied Measurement in Education, 32(2), 183–192. https://doi.org/10.1080/08957347.2019.1577248

70. Wise, S. L. (2012). Setting response time thresholds for a CAT item pool: The normative threshold method. In annual meeting of the National Council on Measurement in Education, Vancouver, Canada, April 2012, pp. 163–183.

71. Weiland, & Stained (2026). The chronic(les) of absenteeism measurement: Unpacking the many measures of attendance and evidence for a lower chronic absenteeism threshold (EdWorkingPaper no. 26-1380). Annenberg Institute at Brown University. https://doi.org/10.26300/1zvw-qw93

72. Zamarro, Hitt, & Mendez (2019). When students don’t care: Reexamining international differences in achievement and student effort. Journal of Human Capital, 13(4), 519–552. https://doi.org/10.2139/ssrn.2857243