Abstract

Keywords
Introduction
The introduction of generative artificial intelligence (GenAI) into education represents a double-edged sword. While GenAI offers unprecedented efficiencies in task completion, it may simultaneously undermine the metacognitive processes—students’ ability to monitor, evaluate, and control their own learning—that are foundational to deep learning and academic achievement (Fan et al., 2025; Kasneci et al., 2023; Wang & Guo, 2025). These tools may inadvertently foster metacognitive laziness, defined as learners’ tendency to offload effortful metacognitive self-regulatory practices, such as goal-setting, error monitoring, and strategic reflection, to AI systems rather than engaging in deliberate cognitive effort (Fan et al., 2025; Song & Song, 2023).
When students consistently delegate metacognitive processes to AI tools, they may miss critical opportunities to develop their metacognition (Molenaar, 2022), potentially disrupting the development of critical thinking and self-regulated learning strategies essential for academic success. This phenomenon has consequential implications, as meta-analyses demonstrate that metacognitive skills predict academic performance above and beyond intelligence (He et al., 2024; Ohtani & Hisasaka, 2018) and underpin lifelong learning and clinical reasoning in professional contexts where critical decisions cannot be outsourced to AI systems. Yet existing instruments—whether assessing general metacognitive awareness (Schraw & Dennison, 1994) or work avoidance (Dowson & McInerney, 2004)—cannot capture AI-mediated metacognitive offloading as a distinct phenomenon, leaving educators unable to assess whether AI tools enhance learning or erode metacognitive rigor. Understanding and measuring this behavioral pattern is therefore essential for developing evidence-based policies for responsible AI integration in education.
Metacognitive laziness conceptually aligns with work avoidance goals, a well-established motivational orientation describing students’ motivation to minimize effort and avoid challenging academic tasks (Dowson & McInerney, 2004; King & McInerney, 2014). Students with work avoidance goals strategically seek to complete tasks with minimal cognitive engagement, prioritizing efficiency over mastery. While traditionally manifesting as selecting easier courses or using superficial study strategies, AI tools may enable a different form of work avoidance through strategic metacognitive delegation. This AI-mediated form of work avoidance has profound implications for student engagement and disaffection—distinct motivational constructs representing active engagement versus withdrawal behaviors in learning environments (Skinner et al., 2009).
Existing instruments fail to capture AI-mediated metacognitive offloading for conceptually distinct reasons. On one hand, traditional metacognitive measures, such as the Metacognitive Awareness Inventory (MAI; Schraw & Dennison, 1994), presume that students engage in metacognitive processes themselves and cannot distinguish between those who independently employ strategies and those who delegate to AI. For example, a student using AI to set learning goals, monitor comprehension, and evaluate understanding might score identically to one developing these capacities autonomously, yet their developmental trajectories differ fundamentally (Schön et al., 2023). Existing work avoidance measures (e.g., Goal Orientation and Learning Strategies Survey [GOALS-S]; Dowson & McInerney, 2004), on the other hand, capture general effort minimization but lack specificity for AI-mediated metacognitive delegation as a distinct, technology-enabled mechanism. In addition, emerging AI literacy scales (e.g., Gen-AI awareness scale; Semerci Şahin et al., 2025) and ChatGPT usage measures (Abbas et al., 2024; Nemt-allah et al., 2024) assess AI knowledge and frequency of use but cannot distinguish beneficial AI use (e.g., using AI as a tutor while maintaining metacognitive engagement) from detrimental dependence (e.g., outsourcing metacognitive processes entirely).
Given these measurement gaps, the present study developed and initially validated the Metacognitive Laziness Scale (MLS) within a health professions education context, where faculty have observed changing student behaviors with ChatGPT integration (Durmuş Sarıkahya et al., 2025). We examined the MLS's relationships with student engagement and disaffection and hypothesized that it would (a) have a unidimensional factor structure, (b) demonstrate adequate internal reliability, and (c) show stronger associations with disaffection than with engagement measures, reflecting its nature as an avoidance-oriented construct.
Methods
Scale Development and Validation
The MLS was developed following established scale development guidelines (DeVellis & Thorpe, 2021). Below, we describe our rationale and methodological procedures.
Item Development
Item generation, the process of creating an initial pool of items that operationalize a target construct, requires grounding in theoretical frameworks and empirical evidence to ensure content validity (DeVellis & Thorpe, 2021). Given that “metacognitive laziness” describes students’ tendency to avoid effortful thinking by relying on GenAI tools for metacognitive tasks, we drew the MLS items from the six-item Work Avoidance Goals subscale of the GOALS-S (Dowson & McInerney, 2004) and contextualized them to measure students’ metacognitive laziness. For example, the original Work Avoidance Goals scale item, “I choose easy options in school so that I don't have to work too hard,” was revised and adapted to “I choose to use AI for assignments, so I don't have to think too hard.”
Item Refinement
Item refinement ensures clarity, specificity, and theoretical alignment through expert evaluation and pilot testing, minimizing construct-irrelevant variance (Haynes et al., 1995). Two experts in assessment and educational technology independently assessed the six-item pool for relevance to metacognition and behavioral specificity. Ambiguous items (e.g., “I don't develop study strategies when I can use AI instead”) were revised to reflect direct attribution (e.g., “I don't develop my own study strategies when I can use AI instead”).
Participants and Procedures
Of the 316 participants invited to participate in the data collection via convenience sampling, 144 participants from a Hong Kong university gave their voluntary informed consent and completed the survey (45.6% response rate). The sample comprised predominantly female students (n = 99, 68.8%) with male students representing 31.2% (n = 45) of the sample (Table 1). Participants were drawn from six academic disciplines, with the largest representation from Nursing (n = 67, 46.5%), followed by Pharmacy (n = 25, 17.4%) and Medical school students (n = 16, 11.1%). Smaller groups included Chinese Medicine (n = 13, 9.0%), Social Work (n = 13, 9.0%), and Food and Nutritional Science (n = 10, 6.9%). Students were distributed across undergraduate year levels, with the highest representation from Year 2 (n = 53, 36.8%) and Year 4 (n = 52, 36.1%) students. Year 3 students comprised 18.1% (n = 26) of the sample, while Year 1 students represented the smallest group (n = 13, 9.0%).
Sample Characteristics (N = 144).
Note. UG = undergraduate; MBBS = Bachelor of Medicine, Bachelor of Surgery.
The survey questions were built into the Qualtrics platform, and the survey link was distributed online via the students’ learning management system, Moodle, after their completion of a three-week interprofessional education (IPE) simulation course. Their consent or non-consent to participate in the survey did not, in any way, affect their academic standing in the course. The ethics and procedures of this study were in accordance with the 1964 Helsinki Declaration and its later amendments. Ethics approval was obtained from the Human Research Ethics Committee of The University of Hong Kong.
Validation Study Measures
Metacognitive Laziness Scale: This newly developed six-item scale was used to measure participants’ metacognitive laziness due to the use of GenAI tools. Sample items include “I avoid challenging learning tasks when AI can do them for me” and “I don't develop my own study strategies when I can use AI instead.” The instructions stated, “Different students have different approaches to using AI in their learning. Please rate how true each statement is of you.” Participants rated how true each statement was to them using a 5-point response scale, from 1 (not at all true of me) to 5 (very true of me). Higher mean scores indicate higher metacognitive laziness levels. In the current study, the scale's internal consistency was high (α = .95).
Engagement and Disaffection: We used the Engagement versus Disaffection with Learning scale (EVDL; Skinner et al., 2009), adapted to the IPE context. Sample items include “In IPE sessions, I work as hard as I can” (five-item behavioral engagement subscale, α = .93), “I enjoy learning new things in IPE sessions” (five-item emotional engagement subscale, α = .92), “When I’m in IPE sessions, I just act like I’m working” (five-item behavioral disaffection subscale, α = .93), and “When we work on something in IPE sessions, I feel bored” (five-item emotional disaffection subscale, α = .98). Participants responded to each item using a 4-point scale ranging from 1 (not at all true) to 4 (very true). Higher mean scores for each subscale indicate higher behavioral and emotional engagement or disaffection. Our confirmatory factor analysis (CFA) results confirmed two-factor structures for both engagement and disaffection measures. The engagement model (behavioral and emotional engagement) showed acceptable fit, χ2(34) = 89.72, p < .001, Comparative Fit Index (CFI) = .957, Tucker–Lewis Index (TLI) = .944, root mean square error of approximation (RMSEA) = .107. The disaffection model (behavioral and emotional disaffection) also demonstrated acceptable fit, χ2(34) = 94.39, p < .001, CFI = .964, TLI = .952, RMSEA = .111.
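The scoring rule described for both measures (a respondent's scale score is the mean of the item ratings, with higher means indicating more of the construct) can be sketched as follows. This is a minimal illustration in Python, not the authors' scoring code; the function name `scale_score` and the example responses are hypothetical.

```python
from statistics import mean


def scale_score(responses, low, high):
    """Return the mean item rating after checking every rating is an
    integer within the scale's response range [low, high]."""
    if not responses:
        raise ValueError("at least one item response is required")
    for r in responses:
        if not isinstance(r, int) or not (low <= r <= high):
            raise ValueError(f"response {r!r} is outside the {low}-{high} scale")
    return mean(responses)


# Hypothetical respondent: six MLS items rated on the 5-point scale.
mls_items = [2, 3, 2, 4, 3, 3]
mls_score = scale_score(mls_items, low=1, high=5)

# Hypothetical respondent: five behavioral disaffection items (4-point scale).
bd_items = [1, 2, 2, 1, 2]
bd_score = scale_score(bd_items, low=1, high=4)

print(mls_score, bd_score)
```

Validating the response range before averaging guards against mixing the 5-point MLS ratings with the 4-point EVDL ratings.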
Data Analysis
No missing data was found in the dataset. For all variables, descriptive statistics, such as frequency distribution, means, standard deviations, and bivariate correlations, were calculated. The internal reliabilities of the measures were examined via Cronbach's alpha.
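For readers unfamiliar with the reliability coefficient named above, Cronbach's alpha for k items is k/(k − 1) × (1 − sum of item variances / variance of the total score). The sketch below implements that standard formula in plain Python for illustration only; the study's analyses were run in R, and the function name and toy data here are assumptions, not the study's data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer the same k items")

    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (hypothetical): five respondents, three items.
toy = [[1, 1, 2], [2, 2, 2], [3, 3, 4], [4, 5, 4], [5, 5, 5]]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```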
We examined both the within- and between-network validity of the MLS (see Mendoza & Yan, 2021, 2025). To examine the within-network validity and hypothesized unidimensional factor structure of the MLS, we performed CFA with weighted least squares mean and variance adjusted (WLSMV) estimation with polychoric correlations, which is appropriate for ordinal Likert-scale data. Given that the MLS was based on an existing subscale (i.e., the Work Avoidance Goals subscale) whose unidimensionality has strong empirical support (Dowson & McInerney, 2004; King & McInerney, 2014), we directly performed CFA without running an exploratory factor analysis (EFA; Tavakol & Wetzel, 2020). Model modifications were considered only when theoretically justified by conceptual relationships between items, guided by sequential examination of modification indices. We also compared the hypothesized unidimensional model against alternative factor structures to confirm that it fit the data better.
For the between-network validity, we used a structural equation model (SEM) to test the relationships between the MLS scores and the engagement and disaffection scores. We employed robust maximum likelihood estimation, which is appropriate for nonnormal data; nonnormality was indicated by Shapiro–Wilk tests. For both the CFA and SEM, we considered model fit good when the CFI and TLI were greater than .90 and the RMSEA was less than .08 (Hu & Bentler, 1995). In addition, a standardized root mean square residual (SRMR) value of less than .08 is considered a good fit, while a value of .00 is considered a perfect fit (Hu & Bentler, 1999).
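To make the RMSEA criterion concrete, the sketch below computes the common point estimate, the square root of max(χ2 − df, 0) / (df × (N − 1)), and compares each index with the cutoffs stated above. It is an illustration in Python rather than the authors' R code. The N − 1 denominator is an assumption (some software uses N), robust estimators may apply additional corrections, and the helper names are hypothetical. The inputs are the chi-square values, degrees of freedom, CFI, and TLI reported for the engagement and disaffection CFA models with N = 144.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def check_cutoffs(cfi, tli, rmsea_value, srmr=None):
    """Compare each index with the cutoffs stated in the text
    (CFI and TLI > .90, RMSEA < .08, SRMR < .08); returns per-index booleans."""
    result = {
        "cfi": cfi > 0.90,
        "tli": tli > 0.90,
        "rmsea": rmsea_value < 0.08,
    }
    if srmr is not None:
        result["srmr"] = srmr < 0.08
    return result


# Values reported for the two-factor engagement and disaffection CFA models.
engagement_rmsea = rmsea(chi2=89.72, df=34, n=144)
disaffection_rmsea = rmsea(chi2=94.39, df=34, n=144)

print(round(engagement_rmsea, 3), check_cutoffs(0.957, 0.944, engagement_rmsea))
print(round(disaffection_rmsea, 3), check_cutoffs(0.964, 0.952, disaffection_rmsea))
```

Returning one boolean per index, rather than a single pass/fail verdict, keeps the check compatible with evaluating fit across several indices together.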
All these analyses were performed using the statistical software R (R Core Team, 2016).
Results
Descriptive Statistics
Descriptive statistics and intercorrelations for all study variables are presented in Table 2. The MLS demonstrated large positive correlations with both behavioral (r = .46, p < .001) and emotional disaffection (r = .50, p < .001), but nonsignificant correlations with behavioral (r = .01) and emotional engagement (r = .04). Engagement measures were highly intercorrelated (r = .84), as were disaffection measures (r = .86), while engagement-disaffection correlations were small and non-significant (−.07 to .01).
Descriptive Statistics and Intercorrelations for Study Variables (N = 144).
Note. All variables showed nonnormal distributions (Shapiro–Wilk p < .05). Metacognitive Laziness Scale used a 5-point scale (1 = not at all true of me to 5 = very true of me). Engagement and disaffection measures used 4-point scales (1 = not at all true to 4 = very true).
*p < .05. **p < .01. ***p < .001.
Confirmatory Factor Analyses
Metacognitive Laziness Scale: Consistent with the recommendations to evaluate model fit holistically across multiple indices (Hair et al., 2019; Kline, 2023), the unidimensional CFA model (see Table 3) showed overall acceptable fit to the data, SBχ2(6) = 20.57, p < .002, after accounting for three modification indices. Specifically, we added residual covariances between items 1 and 2, 1 and 3, and 2 and 3 to account for the inherent conceptual connections among these items. The unidimensional model (Figure 1) with the modifications had significantly better model fit than the default unidimensional model, ΔSBχ2(3) = 38.95, p < .001. The scaled CFI (.998), TLI (.995), and SRMR (.014) indicated good fit, and the standardized factor loadings were uniformly strong (.73 to .96). Although the RMSEA was somewhat elevated (RMSEA = .130, 90% CI [.071, .194]), the overall pattern of evidence supported the adequacy of the measurement model.

The CFA model of the MLS.
Confirmatory Factor Analysis Model Comparison for the Metacognitive Laziness Scale.
Note. N = 144. CFI = Comparative Fit Index; TLI = Tucker–Lewis Index; RMSEA = root mean square error of approximation; CI = confidence interval; SRMR = standardized root mean square residual. All models were estimated using WLSMV with polychoric correlations. The recommended unidimensional modified model is highlighted in bold. The modified model included residual covariances between items 1↔2, 1↔3, and 2↔3 based on modification indices and theoretical justification. Good model fit criteria: CFI/TLI ≥ .95, RMSEA ≤ .08, SRMR ≤ .08 (Hu & Bentler, 1999).
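The degrees of freedom reported for these models follow from counting parameters. The sketch below, in Python for illustration only, uses the covariance-matrix counting convention for a simple-structure CFA with factor variances fixed to 1. That convention is an assumption made here for simplicity (the source models were estimated with WLSMV on ordinal data), and the function name `cfa_df` is hypothetical. Under this assumption the counts agree with the reported values: 6 for the modified six-item model, a difference of 3 from the unmodified model, and 34 for the two-factor, ten-item engagement and disaffection models.

```python
def cfa_df(n_indicators, n_factors=1, n_residual_covariances=0):
    """Degrees of freedom for a simple-structure CFA.

    Assumes each indicator loads on exactly one factor, factor variances
    are fixed to 1, and all factor covariances are freely estimated.
    """
    p = n_indicators
    # Unique variances and covariances among p observed indicators.
    moments = p * (p + 1) // 2
    loadings = p
    residual_variances = p
    factor_covariances = n_factors * (n_factors - 1) // 2
    free_parameters = (
        loadings + residual_variances + factor_covariances + n_residual_covariances
    )
    return moments - free_parameters


default_df = cfa_df(6)                             # unmodified one-factor model
modified_df = cfa_df(6, n_residual_covariances=3)  # items 1-2, 1-3, 2-3
two_factor_ten_items_df = cfa_df(10, n_factors=2)  # engagement or disaffection CFA

print(default_df, modified_df, default_df - modified_df, two_factor_ten_items_df)
```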
To assess whether the modified unidimensional model is robust, we further tested it against an alternative factor structure: a two-factor model with alternative item groupings (items 1, 3, 5 vs. 2, 4, 6). The two-factor alternative model showed poor fit (CFI = .992, RMSEA = .227), clearly inferior to the modified unidimensional model.
Standardized factor loadings for the modified MLS model were all strong and significant (Table 4), ranging from .73 to .96 (all ps < .001). Specifically, loadings were .88 for item 1, .86 for item 2, .73 for item 3, .91 for item 4, .96 for item 5, and .95 for item 6. All loadings exceeded the conventional threshold of .70, with item 5 demonstrating the strongest association with the latent construct.
Standardized Factor Loadings for the Metacognitive Laziness Scale.
Note. N = 144. All factor loadings are statistically significant at p < .001. SE = standard error. CI = confidence interval. Factor loadings are standardized coefficients from the modified unidimensional confirmatory factor analysis model with residual covariances between items 1↔2, 1↔3, and 2↔3.
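One way to read the standardized loadings reported above: when an item loads on a single factor, the squared standardized loading is the proportion of that item's variance (for ordinal items, of its underlying latent response) accounted for by the factor. The short Python sketch below applies this to the six reported loadings and checks them against the .70 threshold. It is an illustrative calculation, not part of the original analysis, and the variable names are hypothetical.

```python
# Standardized loadings reported for MLS items 1-6 (modified model).
loadings = {1: 0.88, 2: 0.86, 3: 0.73, 4: 0.91, 5: 0.96, 6: 0.95}

# Squared standardized loading = share of item variance explained by the factor.
explained = {item: lam ** 2 for item, lam in loadings.items()}

# Check the conventional .70 threshold and find the strongest item.
all_above_threshold = all(lam > 0.70 for lam in loadings.values())
strongest_item = max(loadings, key=loadings.get)

for item, share in explained.items():
    print(f"item {item}: loading {loadings[item]:.2f}, variance explained {share:.2f}")
print(all_above_threshold, strongest_item)
```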
Structural Equation Modeling: The SEM included the MLS as a predictor of four outcome factors: behavioral engagement, emotional engagement, behavioral disaffection, and emotional disaffection. The SEM (Figure 2) showed an acceptable fit to the data, χ2(285) = 565.73, p < .001, CFI = .931, TLI = .921, RMSEA = .083, 90% CI [.073, .093], SRMR = .054. CFI and TLI values above .90 and an RMSEA below .08 indicate acceptable model fit according to conventional standards (Hu & Bentler, 1999). The MLS factor explained substantial variance in disaffection outcomes, 23.7% of the variance in behavioral disaffection (R2 = .24) and 25.0% of the variance in emotional disaffection (R2 = .25), and demonstrated large, significant positive relationships with both disaffection factors: behavioral disaffection (β = .49, SE = .07, p < .001) and emotional disaffection (β = .50, SE = .07, p < .001). In contrast, MLS explained virtually no variance in engagement outcomes (behavioral engagement R2 < .001; emotional engagement R2 = .00) and showed no significant relationships with engagement factors: behavioral engagement (β = .01, SE = .05, p = .87) and emotional engagement (β = .05, SE = .047, p = .59), further supporting the discriminant validity of the relationships.

The fully latent SEM of the relationships among metacognitive laziness, engagement, and disaffection. Note. mt_ = metacognitive laziness; bh_n = behavioral engagement; em_n = emotional engagement; bh_d = behavioral disaffection; and em_d = emotional disaffection. Arrows with thicker bolding indicate stronger relationships.
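Because each outcome factor in this model has a single predictor (the MLS factor), the variance explained in an outcome equals the squared standardized path coefficient. The sketch below, in Python for illustration only, applies that identity to the reported coefficients. The coefficients are rounded to two decimals in the source, so the squared values can differ slightly from the reported percentages (for example, .49 squared is about .240 versus the reported 23.7%). The variable names are hypothetical.

```python
# Standardized paths from the MLS factor to each outcome, as reported.
paths = {
    "behavioral disaffection": 0.49,
    "emotional disaffection": 0.50,
    "behavioral engagement": 0.01,
    "emotional engagement": 0.05,
}

# With one predictor per outcome, R-squared is the squared standardized path.
r_squared = {outcome: beta ** 2 for outcome, beta in paths.items()}

for outcome, r2 in r_squared.items():
    print(f"{outcome}: beta = {paths[outcome]:.2f}, R2 = {r2:.4f}")
```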
Discussion
This study developed and validated the MLS, providing the first psychometrically sound instrument to assess AI-mediated metacognitive laziness in educational contexts. Our findings support all three hypotheses in sequence.
The CFA results confirmed that the MLS exhibits a unidimensional structure (H1). The strong factor loadings across all items further support the coherence of metacognitive laziness as a single construct, extending work avoidance goal theory into AI-mediated learning contexts.
The MLS demonstrated excellent internal consistency (H2), exceeding conventional reliability standards and indicating that the six items effectively capture the construct with minimal measurement error. This reliability level is comparable to well-established educational scales (Cale et al., 2025; King & McInerney, 2014) and supports the instrument's utility for both research and applied contexts.
The pattern of correlations strongly supported our theoretical predictions among metacognitive laziness, engagement, and disaffection (H3). Metacognitive laziness showed significant and large positive correlations with both behavioral and emotional disaffection, while demonstrating non-significant associations with engagement measures. The SEM results further confirmed this pattern, with metacognitive laziness explaining substantial variance in disaffection outcomes (24%–25%) while contributing no variance to engagement outcomes. This discriminant validity evidence aligns with theoretical models distinguishing engagement and disaffection as separate constructs rather than opposite poles of a continuum (Skinner et al., 2009), suggesting that AI-mediated metacognitive offloading specifically promotes withdrawal behaviors rather than merely reducing positive participation.
These findings have significant theoretical and practical implications. Theoretically, the results extend work avoidance goal theory into the digital age and demonstrate initial evidence that AI tools may create novel pathways for academic disengagement that transcend traditional effort-minimization strategies. Practically, the MLS could enable educators to identify students at risk for AI-dependent learning patterns before they become entrenched. The moderate mean score suggests that metacognitive laziness is already present in our student sample, warranting proactive interventions. Educational institutions could use the MLS to develop targeted support programs that promote metacognitive awareness while harnessing AI's benefits responsibly.
Limitations and Future Directions
Several limitations merit consideration. First, the cross-sectional design precludes causal inferences about the relationship between AI use and metacognitive development. Longitudinal studies are needed to establish whether sustained AI dependence actually diminishes metacognitive skills or whether students with pre-existing metacognitive deficits are simply more prone to AI offloading.
Second, sample characteristics limit generalizability. The sample was small and limited to health professions students in Hong Kong, though the sample size falls within recommended ranges for CFA with simple structures (Kyriazos, 2018; Wolf et al., 2013). The gender imbalance (68.8% female) reflects commonly observed health professions demographics but constrains cross-gender comparability. The 45.6% response rate may raise selection bias concerns, though observed score variability for the MLS (M = 2.70, SD = 1.07, range = 1.00–5.00) and diversity across six disciplines and four year levels suggest reasonable sample heterogeneity. Measurement invariance testing across gender and replication with larger, more diverse samples are warranted.
Future research should examine the MLS across diverse academic contexts and larger samples to further establish its predictive validity for objective learning outcomes, such as problem-solving performance and transfer of learning. Additionally, intervention studies testing whether metacognitive training can mitigate AI-dependent behaviors would provide crucial evidence for educational practice. Finally, investigating the potential moderating effects of AI literacy and metacognitive instruction on the relationship between AI use and learning outcomes represents a promising avenue for promoting responsible AI integration in education.
Conclusion
The Metacognitive Laziness Scale offers researchers and educators a validated tool to navigate the complex landscape of AI-enhanced learning, potentially enabling evidence-based approaches to fostering both technological fluency and metacognitive competence in the digital age. We hope that this modest contribution advances understanding of AI-driven metacognitive laziness and encourages researchers and practitioners to incorporate the construct into their research agendas.
Takeaway Message
This study developed and validated the MLS, providing the first psychometrically sound instrument to measure AI-mediated metacognitive laziness in educational contexts.
The MLS demonstrates excellent reliability and validity, with confirmatory factor analysis supporting its unidimensional structure and strong factor loadings across all six items.
Metacognitive laziness showed significant positive correlations with behavioral and emotional disaffection while demonstrating non-significant associations with engagement, confirming theoretically predicted relationships.
The scale enables educators to identify students at risk for AI-dependent learning patterns and implement proactive interventions before such patterns become entrenched.
Supplemental Material
Supplemental material, sj-docx-1-roe-10.1177_20965311261450994 for Assessing AI-Driven Metacognitive Offloading: Initial Development and Validation of the Metacognitive Laziness Scale by John Ian Wilzon T. Dizon, Norman B. Mendoza, Dragan Gasevic and Fraide A. Ganotice in ECNU Review of Education
Footnotes
Acknowledgments
We thank the institutional leaders at the University of Hong Kong for their consistent support.
Ethical Considerations
The ethics and procedures of this study were approved by the Human Research Ethics Committee of The University of Hong Kong (approval number EA240378).
Author Contributions
John Ian Wilzon T. Dizon co-conceptualized the study, analyzed the data, and was a significant contributor to the writing and editing of the manuscript. Norman B. Mendoza reviewed the conceptualization, data analysis, and results of the study and contributed to the writing and editing of the manuscript. Dragan Gasevic reviewed the conceptualization and the results of the study and contributed to the writing and editing of the manuscript. Fraide A. Ganotice, Jr. co-conceptualized the study, supervised the data collection, reviewed the results, and contributed to the writing and editing of the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the Teaching Development Grant from the University of Hong Kong, granted to the corresponding author.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
Notes
References
