Abstract
This study developed and validated a comprehensive AI literacy scale for higher education students through a mixed-methods approach. The development process integrated literature review, expert interviews, and BERTopic modeling analysis. Following content validity assessment and pilot testing, the scale was validated with a sample of 400 students. Factor analyses supported a 17-item, four-dimensional structure comprising AI Fundamental Knowledge, AI Impact Assessment, AI Performance Evaluation, and AI Practical Application. The scale demonstrated adequate internal consistency and configural and metric invariance across academic disciplines. Group differences were observed according to academic major, AI course experience, and academic level. Structural equation modeling indicated that AI literacy is positively associated with academic self-efficacy, which is in turn related to multi-item measured creativity and students’ GPA. Mediation analysis indicated that academic self-efficacy mediated the association between AI literacy and creativity, while the indirect association between AI literacy and academic achievement via this pathway was offset by the negative link between creativity and GPA. However, the generalizability of these findings may be constrained by the specific cultural context and sample characteristics.
Keywords
Introduction
AI has moved beyond experimental applications to transform practical educational contexts, with the emergence of generative AI democratizing access to these technologies among higher education students (Chan and Tsi, 2024; Walter, 2024; Yang et al., 2025). University students increasingly engage with AI systems for academic research, writing support, problem-solving, and discipline-specific inquiry. However, this widespread adoption brings concerns about reliability, academic integrity, ethical considerations, and overreliance on systems, highlighting the critical need for appropriate AI literacy development to prevent negative educational outcomes (Stanford University Human-Centered Artificial Intelligence, 2021).
AI literacy differs substantially from traditional digital literacy frameworks. While digital literacy primarily focuses on basic technological skills and information management, AI literacy encompasses understanding AI concepts and technologies, applying AI in everyday contexts, and critically evaluating AI systems and AI-generated outputs (Long and Magerko, 2020; Ng et al., 2021). Unlike computational thinking or ICT competence, which emphasize algorithmic problem-solving or tool proficiency, AI literacy involves engagement with autonomous systems that make probabilistic decisions and generate content independently, often without transparent explanations of their internal processes. This distinction is particularly crucial in higher education, where students navigate complex human-AI interactions that require interpretive judgment and evaluative reasoning beyond conventional digital skills.
Despite growing recognition of its importance, reliable AI literacy measurement remains challenging. Establishing consistent definitions and assessment methods has proven difficult, with existing studies showing limited consensus on core competencies (Annapureddy et al., 2025; Wang et al., 2023a). Numerous AI literacy scales have been developed; however, many target K-12 students, teachers, or general populations and focus primarily on awareness, attitudes, or ethical perceptions rather than academically situated AI use (Jin et al., 2025; Lintner, 2024; Tagare et al., 2025). As a result, these measurements provide limited insight into how AI literacy functions as an educational competence within higher education contexts, where AI engagement is closely intertwined with academic practices, disciplinary learning, and learning-related psychological variables.
Building on established frameworks proposed by Long and Magerko (2020), Ng et al. (2021), Wang et al. (2023b), and related studies, this research develops a synthesized four-dimensional AI literacy model specifically focused on higher education contexts. Across these frameworks, recurring dimensions – conceptual understanding, applied use, critical evaluation, and ethical awareness – emerge consistently, forming the theoretical basis of the four-dimensional structure adopted in this study. By situating these dimensions within academic contexts, the present study emphasizes not only functional interaction with AI systems but also students’ capacity to evaluate and reflect on AI use in learning activities such as research, writing, and problem-solving.
Methodologically, existing scale development has relied on expert-driven thematic analysis for item generation. While such qualitative approaches are essential for construct definition, they are inherently influenced by researcher interpretation and typically based on relatively small expert samples. To enhance construct validity and reduce researcher subjectivity, this study complements traditional expert interviews with computational text mining using BERTopic (Grootendorst, 2022). Whereas thematic analysis depends on interpretive coding, BERTopic enables systematic identification of latent thematic structures across a large corpus of international AI literacy discourse, thereby improving construct coverage, transparency, and reproducibility while retaining the strengths of qualitative judgment.
Beyond measurement concerns, the educational consequences of AI literacy remain insufficiently explored. Although preliminary studies suggest associations between technological competencies and academic outcomes, the pathways through which AI literacy relates to academic self-efficacy, creativity, and academic achievement remain theoretically underdeveloped and empirically underexamined. Without such evidence, AI literacy risks remaining a descriptive concept rather than an explanatory construct within educational research.
Accordingly, this study addresses the following research questions:
This research proceeds in three phases: (1) constructing a theoretical framework through systematic literature review, (2) developing and validating measurement items through mixed methods (literature review, expert interviews, and topic modeling), and (3) empirically examining relationships with academic variables (academic self-efficacy, creativity, and academic achievement). This study contributes by integrating computational topic modeling into AI literacy scale development and empirically validating AI literacy’s relationship with core academic outcomes. Through these processes, the study aims to contribute to the AI literacy measurement literature while providing empirical evidence for AI literacy as a distinct multidimensional construct and offering a validated assessment instrument to inform educational practice in AI-integrated higher education contexts.
Literature review
Concept and competency of AI literacy
The conceptualization of literacy has evolved substantially with technological advances, progressing from traditional literacy (Bawden, 2001) to various specialized forms such as information and media literacy (Buckingham, 2015; Livingstone, 2004). While AI literacy was initially conceptualized as a subset of digital literacy, recent scholarly discourse increasingly suggests its distinct theoretical positioning due to AI’s unique characteristics and societal implications (Ng et al., 2021; Wang et al., 2023b). Long and Magerko (2020: 58) define AI literacy as “the ability to critically evaluate AI technologies, communicate and collaborate with AI, and use AI as a tool across contexts.” AI literacy appears to differ fundamentally from traditional literacy frameworks in both scope and complexity. Traditional literacy frameworks primarily focus on text interpretation and basic digital tool usage (Street, 2003), whereas AI literacy includes understanding of AI principles, algorithmic characteristics, and potential biases embedded within systems (Long and Magerko, 2020). The ethical dimensions of AI literacy extend beyond conventional information ethics to address unique challenges including algorithmic bias, AI accountability, and transparent decision-making processes (Floridi et al., 2018).
Recent theoretical developments have increasingly proposed frameworks for understanding AI literacy components. Ng et al. (2021) suggested that AI literacy encompasses core components including understanding AI concepts, applying AI tools, evaluating and creating with AI systems, and navigating AI ethics. However, the rapid evolution of AI technologies, particularly the emergence of generative AI applications and related algorithms, may require expansion of these foundational frameworks to address new competencies such as prompt engineering, content verification, and human-AI collaborative workflows.
This multifaceted nature of AI literacy suggests potential distinctions between knowledge-based competencies (understanding how AI systems function) and application-oriented capabilities (effectively utilizing AI tools for specific purposes). This theoretical distinction may have important implications for measurement development, as different competency types could require different assessment approaches and validation strategies. Furthermore, the contextual nature of AI literacy – how competencies manifest differently across educational, professional, and personal contexts – remains an area requiring further theoretical development and empirical investigation.
Existing AI literacy measurement
The need for reliable AI literacy measurement has grown substantially alongside increasing AI adoption in education and society. While theoretical frameworks have been established, empirical research on validated measurement tools has rapidly expanded, particularly since the emergence of generative AI technologies. A recent systematic review provides comprehensive insights into the current state of AI literacy measurement, revealing both progress and significant gaps in instrument development. Lintner’s (2024) systematic review represents a comprehensive analysis of AI literacy scales to date, evaluating the psychometric quality of 22 studies validating 16 different scales using the COSMIN (Consensus-based Standards for the selection of health Measurement Instruments) tool. This systematic assessment revealed that while most existing scales demonstrated good structural validity and internal consistency, critical limitations persist across the measurement landscape. Significantly, only a few scales have been tested for content validity, reliability, construct validity, and responsiveness, while none have been examined for cross-cultural validity and measurement error – fundamental requirements for robust psychometric instruments. Early foundational work by Long and Magerko (2020) established core competencies focusing on critical evaluation, communication, and collaboration with AI systems, providing theoretical foundations that influenced subsequent empirical developments. However, their framework preceded the mainstream introduction of generative AI, limiting its applicability to contemporary AI interactions that students regularly encounter.
Recent studies have witnessed significant advancement in AI literacy measurement, with researchers addressing both general AI competencies and emerging generative AI applications. Wang et al. (2023c) introduced a comprehensive 31-item scale measuring AI literacy across four dimensions – awareness, usage, evaluation, and ethics – validated through rigorous three-step content validation and factor analysis procedures. Their measurement demonstrated psychometric properties and established significant relationships with digital literacy and attitudes toward AI technology, though development focused primarily on general adult populations rather than higher education contexts. Carolus et al. (2023) made substantial theoretical contributions with the Meta AI Literacy Scale (MAILS), a 34-item measurement that expanded beyond traditional AI knowledge to incorporate psychological competencies including problem-solving, learning, and emotion regulation. This multidimensional framework represented a significant advancement, integrating knowledge-related, operational, critical, and ethical dimensions while addressing psychological change and meta-competencies. Koch et al. (2024) subsequently confirmed the scale’s robustness through further validation studies and developed a 10-item short version, demonstrating consistency across different populations and contexts.
Recognizing the limitations of general population measurements, several studies have developed approaches targeting specific educational contexts. Laupichler et al. (2023) developed the Scale for the Assessment of Non-Experts’ AI Literacy (SNAIL) through an iterative Delphi expert methodology, resulting in a 31-item scale with three factors: technical understanding, critical appraisal, and practical application. Their approach specifically addressed non-expert populations, filling a crucial gap in measurement for individuals without formal AI or computer science education. For higher education contexts specifically, Hornberger et al. (2023) developed a comprehensive measurement focusing on technical knowledge assessment, though their approach emphasized factual knowledge over practical application skills that characterize contemporary AI literacy requirements. Yuan et al. (2024) addressed this limitation by developing a holistic AI literacy scale including individual, interactive, and sociocultural dimensions, with cognitive, behavioral, and normative competencies across six dimensions: AI features, AI processing, algorithm influences, user efficacy, ethical consideration, and threat appraisal.
The rapid adoption of generative AI tools has created new measurement needs that traditional AI literacy scales inadequately address. Liu et al. (2025) developed a workplace-oriented Generative AI Literacy (GAIL) framework covering five core dimensions: basic technical competence, prompt optimization, content evaluation, innovative application, and ethical and compliance awareness. This framework specifically addresses the unique competencies required for effective human-AI collaboration in generative AI contexts, including prompt engineering skills and content evaluation capabilities. Chen et al. (2025) conducted empirical research examining generative AI literacy across four dimensions – utilization, interaction, evaluation of output, and ethics – among higher education students. Their findings revealed that while students actively use generative AI tools for academic purposes, most demonstrate critical evaluation of outputs and express a need for explicit institutional guidance regarding ethical and appropriate use.
Recent efforts have addressed the cultural specificity of AI literacy measurement through cross-cultural validation studies. Hobeika et al. (2024) developed and validated an Arabic version of the AI Literacy Scale (AILS) for university students. Their work demonstrated the feasibility of cross-cultural adaptation while highlighting the need for culturally appropriate measurement approaches.
Most existing scales have been developed primarily for general populations or specific professional contexts, with relatively limited attention to the distinctive requirements of higher education students (Carolus et al., 2023; Wang et al., 2023b). Higher education students tend to integrate AI tools into academic research, scholarly writing, and learning processes in ways that may differ substantially from general population usage patterns (Hornberger et al., 2023). This contextual gap may be particularly significant given evidence suggesting that students encounter AI in specialized academic contexts, including research assistance, automated feedback systems, and intelligent content curation (Chen et al., 2025), which could require competencies that differ from those captured by measurements designed for broader populations. Furthermore, existing measurement has typically relied on single methodological approaches, with limited integration of multiple validation methods (Lintner, 2024). While individual studies have identified specific psychometric properties through conventional approaches such as factor analysis and reliability testing (Laupichler et al., 2023; Wang et al., 2023c), fewer studies appear to have systematically combined diverse methodological approaches such as literature analysis, expert interview, and computational text analysis to ensure comprehensive construct coverage. This methodological pattern suggests potential opportunities for more fine-tuned measurement development through integrated validation approaches.
Additionally, while some research has begun to explore relationships between AI competencies and academic variables (Wang et al., 2023c), investigation of complex mediating pathways appears to be in early stages of development. For instance, the potential role of academic self-efficacy as a mediating mechanism through which AI literacy might influence creativity and academic achievement has received relatively limited empirical attention, despite established theoretical frameworks in self-efficacy research (Bandura, 1997) suggesting such psychological pathways could be important for understanding educational outcomes in technology-enhanced learning contexts.
Academic variables related to AI literacy
The relationship between AI literacy and academic outcomes has emerged as an increasingly important area of scholarly inquiry with recent empirical evidence revealing complex associations among technological competencies and educational variables. Contemporary research suggests that AI competencies may influence academic self-efficacy, creativity, and academic achievement through multifaceted pathways, though findings remain mixed and require continued investigation (Mansoor et al., 2024; Zhang et al., 2024).
Academic self-efficacy, conceptualized by Bandura (1977) as learners’ belief in their ability to perform academic tasks, has been established as a significant predictor of academic performance across diverse educational contexts. Research in self-efficacy theory suggests that competence in specific technology domains may enhance confidence in related academic tasks (Bandura, 1997). Recent studies have begun to examine these relationships more directly within AI contexts. Zhang et al. (2024) explored the complex associations between academic self-efficacy, academic stress, and performance expectations in AI usage behaviors among 300 university students, finding that academic self-efficacy mediates relationships between academic variables and problematic AI dependency patterns. Their investigation using the Interaction of the Person-Affect-Cognition-Execution (I-PACE) model revealed that students with lower academic self-efficacy may be more prone to developing dependent relationships with AI tools. The emergence of AI technologies in educational contexts appears to create multidimensional dynamics in self-efficacy development. Recent multinational research suggests complex relationships between AI literacy and self-efficacy beliefs. A multinational study of 1465 university students across Germany, the UK, and the US found that students demonstrated foundational levels of AI literacy alongside relatively high levels of interest and positive attitudes toward AI technologies, though significant cross-national variations existed in self-efficacy beliefs (Hornberger et al., 2025). These findings suggest that cultural and educational contexts may influence how AI literacy development relates to academic confidence. Zhang et al. (2024) identified several concerning consequences of AI dependency, including increased academic laziness, spread of misinformation, reduced creativity, and diminished critical and independent thinking abilities.
These findings highlight the complexity of relationships between AI literacy development and academic self-efficacy, suggesting that while technological competence may enhance confidence, excessive dependency could potentially undermine the very academic capabilities it initially supported.
Creativity has gained renewed prominence in educational discourse within AI contexts, with emerging research suggesting contradictory relationships between AI usage and creative capabilities. While some theoretical perspectives propose that AI tools might enhance human creativity by providing resources for idea generation and iterative refinement, empirical findings suggest more complex dynamics. Research conducted among Austrian university students found that AI critical appraisal significantly and negatively impacted both AI self-efficacy and AI output quality, suggesting that critical evaluation of AI capabilities may reduce overconfidence while potentially constraining creative exploration (Hornberger et al., 2025). Chen et al. (2025) conducted an empirical investigation of generative AI literacy across four dimensions – utilization, interaction, evaluation of output, and ethics – among higher education students. Their findings indicated that while students actively use generative AI tools for academic purposes, most exhibited critical evaluation of outputs and expressed a need for explicit institutional guidance regarding ethical and appropriate use. These results suggest that creativity in AI-enhanced learning may depend on students’ ability to balance AI assistance with critical evaluation skills. The relationship between AI tools and creativity appears to vary considerably based on implementation approaches and student characteristics. Recent evidence suggests that well-founded AI tool usage may support student performance and motivation, while excessive reliance on AI-powered tools may impact student performance negatively (Khoso et al., 2023; Montenegro-Rueda et al., 2023). This variability highlights the importance of understanding how different approaches to AI integration influence creative capabilities and academic outcomes.
Academic achievement, measured through grades, standardized assessments, or performance evaluations, represents a critical outcome variable in educational research. Recent empirical evidence reveals complex relationships between AI literacy and academic performance. Mansoor et al. (2024) conducted a comparative transnational survey among university students, finding an overall moderate AI literacy level and reporting an inverse relationship between academic performance and AI literacy levels. Their analysis suggested that students with lower academic performance may tend to rely more heavily on AI tools to complete academic tasks, though the causal mechanisms underlying this relationship require further investigation. These findings align with other recent research showing mixed results regarding AI literacy and academic achievement relationships. While some studies have identified positive associations between AI literacy, AI usage, and academic performance, others report weak correlations or inconclusive results (Abbas et al., 2019; Asio, 2024). Research with Austrian university students found that AI technical understanding, critical appraisal, practical application, self-efficacy, and output quality had statistically nonsignificant effects on students’ academic performance, despite identifying significant relationships with AI self-efficacy and output quality (Hornberger et al., 2025). Research examining generative AI literacy specifically has provided additional insights into these complex relationships. O’Dea et al. (2026) investigated factors affecting university students’ generative AI literacy in UK and Hong Kong contexts, utilizing the four-dimensional AI literacy framework including knowledge and understanding, use and application, evaluation and creation, and AI ethics.
Their findings suggested that generative AI literacy development varies significantly across cultural and educational contexts, with implications for how AI competencies relate to academic outcomes. Walter (2024) emphasized that AI integration in education requires systematic approaches considering creativity, technological fluency, and critical thinking skills, moving beyond traditional educational methods to embrace more dynamic, student-centered learning environments. This perspective suggests that the relationships among AI literacy, academic self-efficacy, creativity, and academic achievement may depend on pedagogical approaches and institutional support structures. The complexity of these relationships underscores the importance of comprehensive theoretical frameworks that can account for multiple pathways, mediating mechanisms, and contextual factors. While empirical evidence continues to accumulate, gaps remain in understanding how AI literacy development influences academic outcomes through psychological and pedagogical mechanisms, particularly in different cultural and disciplinary contexts. The mixed findings across studies highlight the need for more refined theoretical models and measurement approaches that can capture the multidimensional nature of AI literacy’s educational impact.
Methodology
This study employed a systematic mixed-methods approach to develop and validate an AI literacy measurement tool specifically designed for higher education contexts. Recent methodological developments emphasize the importance of triangulation in scale development to enhance validity and reduce biases inherent in single-method approaches. The development process involved three main phases: conceptualization through methodological triangulation, comprehensive validation through expert review and pilot testing, and examination of relationships with academic variables (Figure 1).

Figure 1. Steps of scale development and validation.
The initial development phase integrated three complementary approaches to ensure comprehensive coverage of AI literacy components: literature analysis, expert interviews, and computational text analysis through BERTopic modeling. Literature analysis established theoretical foundations by examining existing AI literacy definitions and measurement frameworks. Expert interviews with eight participants, including industry professionals and professors, provided insights into required competencies and contextual factors. The interview data were qualitatively analyzed to identify and refine AI literacy components, serving as the primary basis for construct definition. BERTopic modeling was applied as a complementary procedure to examine thematic convergence and construct coverage across international AI literacy discourse, rather than as a primary method for construct generation. The resulting topics were systematically mapped onto the preliminary construct framework derived from literature review and expert interviews to assess alignment, identify missing areas, and refine construct boundaries. This mixed-method approach, combining qualitative research with computational text analysis, provided complementary perspectives and enhanced construct validity.
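The mapping of topic-model output onto a preliminary construct framework can be illustrated with a minimal sketch. It assumes each topic is summarized by its top keywords and each construct by a small keyword set, and it assigns a topic to the construct with the highest Jaccard overlap. The keyword sets, the example topics, the 0.1 threshold, and the function names are all hypothetical and chosen for illustration; they are not the study's codebook or procedure.

```python
# Hypothetical construct keyword sets (illustrative only, not the study's codebook).
CONSTRUCT_KEYWORDS = {
    "AI Knowledge": {"algorithm", "model", "data", "training", "concept"},
    "AI Application": {"prompt", "tool", "coding", "workflow", "generate"},
    "AI Ethics and Evaluation": {"bias", "ethics", "privacy", "verify", "accuracy"},
    "AI's Social Impact": {"jobs", "society", "career", "regulation", "education"},
}


def jaccard(a, b):
    """Jaccard similarity between two collections of keywords."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_topics(topics, constructs, threshold=0.1):
    """Assign each topic to its best-matching construct.

    A topic whose best overlap falls below `threshold` is mapped to None,
    flagging it as a candidate "missing area" for manual review.
    """
    mapping = {}
    for topic_id, words in topics.items():
        scores = {name: jaccard(words, kws) for name, kws in constructs.items()}
        best = max(scores, key=scores.get)
        mapping[topic_id] = best if scores[best] >= threshold else None
    return mapping


# Hypothetical top keywords per topic, e.g. as exported from a topic model.
topics = {
    0: ["prompt", "tool", "generate", "chatbot"],
    1: ["bias", "privacy", "ethics", "fairness"],
    2: ["hardware", "chips", "gpu", "cloud"],
}

mapping = map_topics(topics, CONSTRUCT_KEYWORDS)
unmapped = [topic_id for topic_id, construct in mapping.items() if construct is None]
```

In this toy input, topic 2 shares no keywords with any construct and is therefore left unmapped, which is the kind of case a researcher would inspect by hand before deciding whether the framework needs an additional component.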
For content validity assessment, eight different experts evaluated items using Lynn’s (1986) method. A pilot study was conducted with 42 higher education students to assess face validity, item clarity, and comprehensibility. Participants provided written feedback on ambiguous or difficult items. Based on this process, items were revised or removed due to ambiguity, redundancy, or conceptual misalignment. After this, a total of 400 participants from multiple higher education institutions and diverse academic disciplines were recruited and randomly divided into two equal groups of 200 participants each. The detailed demographic characteristics are presented in Table 2. Exploratory factor analysis was conducted using SPSS 27.0, and confirmatory factor analysis was performed using AMOS 26.0, enabling thorough statistical validation and measurement invariance testing. Measurement invariance was examined through multi-group confirmatory factor analysis, sequentially testing configural and metric invariance, with more restrictive forms evaluated in subsequent models.
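As an illustration of the content validity step, the sketch below computes an item-level content validity index (I-CVI) in the spirit of Lynn (1986): the proportion of experts who rate an item 3 or 4 on a 4-point relevance scale. The ratings and item names are invented, and the 0.78 retention cutoff is a commonly cited value used here as an assumption, since the study's own criterion is not stated in this section.

```python
def item_cvi(ratings, relevant_min=3):
    """Proportion of experts rating the item as relevant (>= 3 on a 4-point scale)."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def flag_items(ratings_by_item, cutoff=0.78):
    """Return the I-CVI per item and the items falling below the cutoff.

    The cutoff is an assumed, commonly cited value, not the study's criterion.
    """
    cvi = {item: item_cvi(r) for item, r in ratings_by_item.items()}
    flagged = [item for item, value in cvi.items() if value < cutoff]
    return cvi, flagged


# Hypothetical ratings from eight experts on a 4-point relevance scale.
ratings_by_item = {
    "item_a": [4, 4, 3, 4, 3, 4, 4, 3],
    "item_b": [4, 3, 2, 4, 3, 2, 2, 4],
    "item_c": [4, 4, 4, 3, 3, 4, 2, 4],
}

cvi, flagged = flag_items(ratings_by_item)
```

With these invented ratings, only five of the eight experts rate item_b as relevant, so it is the one item flagged for revision or removal.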
To test the hypotheses, we used validated measurements for each construct. Academic self-efficacy was measured using Kim and Park’s (2001) scale. Creativity was assessed using a scale adapted by Choi (2014), which conceptualizes creativity as a multidimensional construct including fluency, flexibility, originality, and elaboration. The scale was developed by reorganizing creativity-related items grounded in classic creativity theories, including Guilford’s structure of intellect (Guilford, 1950, 1967, 1968), Torrance’s divergent thinking framework (Torrance, 1965), and Amabile’s componential model (Amabile, 1988, 1996). The scale has been empirically validated through exploratory and confirmatory factor analyses and has demonstrated adequate internal consistency across its sub-dimensions (Cronbach’s α = 0.773–0.908). Academic achievement was operationalized using participants’ GPA in AI-related coursework.
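For readers unfamiliar with the internal consistency statistic reported above, the following sketch computes Cronbach's alpha from a respondents-by-items score matrix using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The response data are invented for illustration and do not come from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    if len(rows) < 2:
        raise ValueError("at least two respondents are required")
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: five respondents, three Likert-type items.
responses = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
]

alpha = cronbach_alpha(responses)
```

Sample variances are used for both the items and the total score, so the ratio inside the formula is unaffected by the choice between sample and population variance.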
Using data from 182 participants, the validated scales were applied to analyze the structural relationships among AI literacy, academic self-efficacy, creativity, and academic achievement. Structural equation modeling was conducted with AMOS 26.0 using composite scores as indicators rather than individual items. Following CFA validation, composite scores were calculated as mean values of items within each validated factor, with AI literacy modeled using four composite indicators and academic self-efficacy and creativity using their respective composite indicators. Mediation effects were assessed through bias-corrected bootstrap procedures with 500 resamples, incorporating phantom variables to decompose complex indirect pathways.
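The logic of a bias-corrected bootstrap test of an indirect effect can be sketched as follows. This is a simplified stand-in, not the AMOS procedure used in the study: it estimates a single-mediator indirect effect (a × b) with ordinary least squares on observed scores, and it does not model latent variables, phantom variables, or the sequential pathway. The data are simulated, and all function names, coefficients, and the random seeds are assumptions for illustration.

```python
import random
from statistics import NormalDist, mean


def indirect_effect(x, m, y):
    """a*b, where a is the slope of M ~ X and b is the slope of M in Y ~ X + M."""
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((v - mx) ** 2 for v in x)
    smm = sum((v - mm) ** 2 for v in m)
    sxm = sum((p - mx) * (q - mm) for p, q in zip(x, m))
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    smy = sum((p - mm) * (q - my) for p, q in zip(m, y))
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)
    return a * b


def bc_bootstrap_ci(x, m, y, n_boot=500, alpha=0.05, seed=0):
    """Bias-corrected bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    est = indirect_effect(x, m, y)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        boots.append(indirect_effect([x[i] for i in idx],
                                     [m[i] for i in idx],
                                     [y[i] for i in idx]))
    boots.sort()
    nd = NormalDist()
    # Bias correction: share of bootstrap estimates below the sample estimate.
    prop = sum(b < est for b in boots) / n_boot
    prop = min(max(prop, 1 / n_boot), 1 - 1 / n_boot)  # keep inv_cdf finite
    z0 = nd.inv_cdf(prop)
    z = nd.inv_cdf(1 - alpha / 2)
    lo_idx = min(n_boot - 1, int(nd.cdf(2 * z0 - z) * n_boot))
    hi_idx = min(n_boot - 1, int(nd.cdf(2 * z0 + z) * n_boot))
    return est, boots[lo_idx], boots[hi_idx]


# Simulated data with a built-in indirect effect (illustrative values only).
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(182)]
m = [0.8 * xi + rng.gauss(0, 0.5) for xi in x]
y = [0.7 * mi + rng.gauss(0, 0.5) for mi in m]

est, lo, hi = bc_bootstrap_ci(x, m, y)
```

An interval that excludes zero is conventionally read as evidence of mediation. A sequential (two-mediator) pathway would extend the same resampling loop to the product of three path coefficients.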
Based on the proposed model (Figure 2), the following hypotheses were tested:

Figure 2. Research model for validation.
In addition to testing direct effects, mediation analyses were conducted to examine whether academic self-efficacy mediates the relationship between AI literacy and creativity, and whether AI literacy has indirect effects on academic achievement through the sequential mediation of academic self-efficacy and creativity.
Scale development and validation
This study developed and validated an AI literacy scale through three methodological approaches. We first analyzed literature to establish a theoretical foundation for conceptualizing AI literacy and its components, focusing on higher education student competencies. We then conducted in-depth interviews with domestic AI experts to identify specific competencies needed for workforce entry. Additionally, we applied topic modeling to international AI experts’ social media content to complement our previous findings. These approaches were integrated to establish AI literacy components with clear definitions and develop appropriate measurement items.
Exploration of AI literacy components and initial item development
Primary components identification
In this phase, we established a theoretical framework for AI literacy components, examining scales across various educational and professional contexts. Previous research has focused on diverse populations including college students, adults, and corporate employees, predominantly using Long and Magerko’s (2020) AI competencies framework (Carolus et al., 2023; Hornberger et al., 2023; Karaca et al., 2021; Laupichler et al., 2023; Long and Magerko, 2020; Mikalef and Gupta, 2021; Ng et al., 2021; Wang et al., 2023b). The systematic analysis of these studies identified four primary components:
AI Knowledge. Understanding of AI concepts, principles, and technical foundations, differentiated into basic AI knowledge, advanced AI knowledge, and AI learning methods.
AI Application. Integration of development and utilization capabilities, reflecting no-code/zero-code tools and generative AI, which have blurred traditional boundaries of development and application skills.
AI Ethics and Evaluation. Ethical considerations combined with critical evaluation capabilities, including data ethics, AI ethical principles, and critical thinking about AI applications.
AI’s social impact. Understanding AI’s influence on daily life, social structures, careers, and broader societal implications.
Expert interview analysis
Eight experts were interviewed to validate and expand the literature-identified components. The interviews explored required competencies for higher education students in the AI era. Thematic analysis revealed five primary themes:
The first theme centered on AI knowledge structure, revealing a hierarchical organization distinguishing between basic conceptual understanding and advanced technical knowledge. Industry experts emphasized the importance of raising domain knowledge quickly, as one expert noted: “Since I went in without any practical experience. . . As a career move but with no AI experience, I decided to quickly raise my domain knowledge level to produce results” (Industry expert D).
The importance of data literacy was particularly emphasized: “Data is the most important. Whether it's deepfakes or inappropriate videos, you first need human labels to create machine learning models” (Industry expert E).
The second theme revealed an evolution in perceptions of AI application skills, particularly following the emergence of generative AI. The focus has shifted from pure programming capability to problem-solving abilities and hands-on experience. As one industry expert explained: “Current LLM implementations enable natural language coding interfaces, producing analytical results comparable to traditional programming outputs, effectively democratizing development capabilities” (Industry expert C).
The emergence of prompt engineering was highlighted as a crucial skill: “The prominence of prompt engineering will increase substantially. It includes understanding algorithmic foundations and formulating queries that align with the system's operational parameters” (Professor G).
Ethical considerations emerged as the third theme, with experts emphasizing organizational management of ethics and data protocols. Industry experts consistently emphasized data ethics and copyright considerations: “While the application of outputs is crucial, we carefully review each dataset used in model development. Although more data would of course lead to better results, we have to be very careful because we cannot use data in an arbitrary way and ignore copyright issues” (Industry expert D).
Academic perspectives added depth to ethical considerations: “(Regarding ChatGPT usage in assignments) Will students disclose its use or not? Their continued non-disclosure indicates their assumption of responsibility through final review. . . This reveals how potential elite members of our society conceptualize intellectual property” (Professor F).
The fourth theme focused on social impact assessment, particularly the concept of “AI transition.” Experts discussed both immediate impacts and future implications: “Being able to think about or logically reason about how activities in your daily life could be transformed by AI . . . I thought that might be a really important point from that perspective” (Industry expert D).
Concerns about societal changes were evident: “If its output created in 5 minutes is cleaner and better than what I produced after much deliberation, wouldn't people stop engaging in creative activities that require deep thinking?” (Professor F).
The fifth theme, basic AI competency, emerged as a distinct component focusing on fundamental skills required in the AI era. Communication abilities were particularly emphasized: “Communication skills seem really important. Whether you are good at English or not, being able to explain things clearly is crucial” (Industry expert E).
The importance of human collaboration was also highlighted: “I think human-to-human collaboration skills are really necessary. In my view, you need to be able to collaborate with other humans first before you can work well with AI” (Professor H).
These findings contributed to framework development in three ways: (1) reconstructing technical components into a unified “AI Utilization” component reflecting industry practices, (2) adding “Basic AI Competency” for fundamental AI-era skills not identified in the literature, and (3) expanding ethical considerations to include both individual and organizational protocols. These modifications enhanced the framework’s relevance for higher education contexts.
Topic modeling analysis
BERTopic modeling analysis of international experts’ content generated 33 initial topics, consolidated into 10 meaningful clusters through hierarchical clustering. These clusters aligned with the five preliminary components, providing empirical support for the conceptual framework and revealing the interconnected nature of AI literacy dimensions (Table 1).
Mapping of BERTopic modeling’s clustering results to components.
Key findings include the identification of distinct dimensions within AI knowledge, an emphasis on the international discourse surrounding AI ethics and regulations, the emergence of a complex interplay among various AI literacy dimensions, and the recognition of technical infrastructure as a significant organizational-level factor. The analysis also identified a temporal dimension of AI literacy, emphasizing both immediate and future implications of AI technologies. These findings were incorporated into scale development, ensuring comprehensive coverage of the AI literacy construct and providing empirical support for the proposed framework.
Initial assessment
Initial assessment covered 80 items across five factors: Factor 1 (27 items, CVI range: 0.50–1.00), Factor 2 (20 items, 0.50–1.00), Factor 3 (17 items, 0.25–1.00), Factor 4 (9 items, 0.75–1.00), and Factor 5 (7 items, 0.625–1.00). Based on experts’ feedback and subsequent revisions, the scale was refined to a more precise and comprehensive 84-item scale for pilot testing. The item counts for each component were adjusted: Factor 1 was reduced to 25 items, Factor 2 was expanded to 24 items, Factor 3 was streamlined to 15 items, Factor 4 was refined to 11 items, and Factor 5 retained its 7 items with content revisions.
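The CVI ranges reported above can be read as item-level content validity indices. A minimal sketch of the conventional calculation follows, assuming a 4-point relevance scale on which ratings of 3 or 4 count as "relevant" and an eight-member panel (a value such as 0.625 is consistent with five of eight raters, but the rating scale and panel size are assumptions here, and the ratings are hypothetical).

```python
def item_cvi(ratings, relevant_min=3):
    """Item-level CVI: share of experts whose rating is at least relevant_min."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(1 for r in ratings if r >= relevant_min) / len(ratings)


# Hypothetical ratings from an assumed eight-member panel (4-point scale).
example_ratings = {
    "item_a": [4, 4, 3, 4, 3, 4, 4, 3],  # all eight experts rate it relevant
    "item_b": [4, 3, 2, 4, 1, 3, 3, 2],  # five of eight rate it relevant
    "item_c": [2, 1, 4, 2, 3, 1, 2, 2],  # two of eight rate it relevant
}

cvi = {name: item_cvi(r) for name, r in example_ratings.items()}
for name, value in cvi.items():
    print(f"{name}: I-CVI = {value:.3f}")
```

Under these assumptions, the three hypothetical items yield indices of 1.00, 0.625, and 0.25, which span the range of values reported for the five factors.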
Based on the pilot test with 42 higher education students and a subsequent expert review, items showing redundancy, ambiguity, or conceptual overlap were identified and removed. As a result, the initial pool of 84 items was refined to 48 items prior to exploratory factor analysis. The refined scale comprised five constructs: Factor 1 (14 items), Factor 2 (12 items), Factor 3 (12 items), Factor 4 (6 items), and Factor 5 (4 items).
As shown in Table 2, the sample consisted of 154 males (38.5%) and 246 females (61.5%). In terms of age, 319 participants (79.8%) were 29 years old or younger, 63 (15.8%) were between 30 and 39, and 18 (4.4%) were 40 or older. By academic status, the sample comprised 219 undergraduates (54.8%) and 181 graduate students (45.3%). A slight majority of participants attended private institutions (54.8%) and institutions located in metropolitan areas (55.8%). Participants’ major fields were primarily Engineering (42.5%) and Humanities & Social Sciences (40.0%), with smaller proportions from Natural Sciences (8.0%), Medicine (5.0%), and Arts & Physical Education (4.5%).
Demographic characteristics of survey respondents (N = 400).
Course grades are reported only for students who have completed regular AI-related courses. Attendees of one-time seminar events (n = 2) are classified as “Not enrolled.”
Descriptive statistics for all 48 items revealed mean scores ranging from 2.13 to 4.00 with standard deviations between 0.734 and 1.402. The skewness and kurtosis values for all factors fell within the ±2 range, indicating that the data met the assumption of normality for subsequent analyses (Table 3).
Descriptive statistics.
Item-item correlation analysis showed that the correlation coefficients ranged from −0.014 to 0.798, with most correlations being statistically significant (p < 0.01 or p < 0.05). No correlation coefficients exceeded the 0.80 threshold, indicating no severe multicollinearity issues that would interfere with factor analysis procedures.
Exploratory and confirmatory factor analysis
Exploratory factor analysis
Exploratory factor analysis was conducted with Sample 1 (n = 200). Before EFA, preliminary data screening was conducted. Descriptive statistics revealed that the mean scores ranged from 2.09 (SD = 1.304) to 4.01 (SD = 0.716) across all 48 items. Standard deviations ranged from 0.716 to 1.383, indicating adequate variability in responses. All variables demonstrated acceptable levels of skewness (−0.944 to 0.958) and kurtosis (−1.339 to 1.534), with absolute values below 2.0, indicating that the assumption of normality was satisfied for all variables.
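The screening rule described above (absolute skewness and kurtosis below 2.0) can be sketched as follows. This is a minimal illustration using population-moment estimators and excess kurtosis. Statistical packages often apply small-sample corrections, so their values can differ slightly, and the responses shown are hypothetical.

```python
from statistics import fmean


def skew_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def passes_screening(values, limit=2.0):
    """True when both |skewness| and |excess kurtosis| fall below the limit."""
    skew, kurt = skew_and_excess_kurtosis(values)
    return abs(skew) < limit and abs(kurt) < limit


# Hypothetical 5-point Likert responses for a single item.
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5]
skew, kurt = skew_and_excess_kurtosis(responses)
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}, "
      f"passes = {passes_screening(responses)}")
```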
An exploratory factor analysis was conducted on 48 items with data from 200 participants (Sample 1). Prior to factor extraction, we conducted an item refinement process, removing items with factor loadings below 0.40, communalities below 0.30, or cross-loadings above 0.32, which resulted in the retention of 21 items. The Kaiser-Meyer-Olkin measure verified the sampling adequacy for the analysis, KMO = 0.938, which is above the acceptable limit of 0.6. Bartlett’s test of sphericity χ2 (210) = 2834.82,
Results of exploratory factor analysis.
Note. Loadings ≥ .40 are shown in bold.
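The item refinement criteria stated for the EFA (primary loading of at least 0.40, communality of at least 0.30, and no cross-loading above 0.32) can be expressed as a simple filter. The sketch below is illustrative only: the item names, loadings, and communalities are hypothetical, and communalities are passed in explicitly because, under oblique rotation, they cannot be recovered from pattern loadings alone.

```python
def retain_item(loadings, communality,
                min_loading=0.40, min_communality=0.30, max_cross=0.32):
    """Apply the three retention criteria to one item's factor loadings."""
    ordered = sorted((abs(value) for value in loadings), reverse=True)
    primary = ordered[0]
    secondary = ordered[1] if len(ordered) > 1 else 0.0
    return (primary >= min_loading
            and communality >= min_communality
            and secondary <= max_cross)


# Hypothetical four-factor loadings and communalities.
items = {
    "q1": ([0.72, 0.10, 0.05, -0.03], 0.55),  # meets all criteria
    "q2": ([0.35, 0.20, 0.12, 0.08], 0.22),   # primary loading too weak
    "q3": ([0.55, 0.41, 0.02, 0.10], 0.52),   # cross-loading above 0.32
    "q4": ([0.45, 0.05, 0.10, 0.02], 0.25),   # communality too low
}

retained = [name for name, (load, h2) in items.items() if retain_item(load, h2)]
print(retained)
```

In practice such a filter is applied iteratively, re-estimating the factor solution after each round of removals, since loadings shift as items are dropped.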
Internal consistency was assessed using Cronbach’s alpha coefficient. The overall 21-item scale demonstrated high internal consistency (α = 0.942). Internal consistency estimates for the factors identified in the EFA were α = 0.927 for Factor 1 (seven items), α = 0.867 for Factor 2 (five items), α = 0.868 for Factor 3, and α = 0.875 for Factor 4 (six items). All factors exceeded the recommended threshold of 0.70 for acceptable internal consistency. Item-total correlations ranged from 0.434 to 0.867, indicating that all items contributed meaningfully to the coherence of the scale. Inter-factor correlations ranged from −0.578 to 0.518, supporting the use of oblique rotation and indicating moderate relationships among factors.
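For reference, Cronbach’s alpha for a k-item scale is k/(k − 1) × (1 − Σ item variances / variance of the total score). The following sketch implements that standard formula. The response matrix is hypothetical and serves only to show the calculation.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items matrix (list of lists)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: four respondents, two items.
responses = [
    [1, 2],
    [2, 1],
    [3, 4],
    [4, 3],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```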
Confirmatory factor analysis
Following the EFA and assessment of internal consistency conducted on Sample 1, a second independent sample (Sample 2, N = 200) was collected for confirmatory factor analysis. Preliminary descriptive statistics for the CFA sample revealed that mean scores ranged from 2.68 (SD = 1.214) to 4.00 (SD = 0.777), with standard deviations ranging from 0.749 to 1.319. All 21 items demonstrated acceptable distributional properties, with skewness values ranging from −0.871 to 0.343 and kurtosis values ranging from −1.058 to 1.501, indicating normality assumptions were satisfied for confirmatory factor analysis.
To validate the factor structure identified in the EFA, confirmatory factor analysis was conducted using maximum likelihood estimation on an independent sample (N = 200). Following item refinement based on squared multiple correlation (SMC) below acceptable thresholds, 17 items were retained for the final model. The hypothesized four-factor model demonstrated acceptable fit to the data (Figure 3 and Table 6):

Confirmatory factor analysis model.
All factor loadings were statistically significant (
All factors demonstrated satisfactory internal consistency. Cronbach’s alpha was 0.914 for the full scale and, for each factor, as follows: AI fundamental knowledge = 0.873, AI impact assessment = 0.791, AI performance evaluation = 0.835, and AI practical application = 0.791. These values indicate that the items within each factor were consistently related to their underlying constructs.
To assess convergent validity, average variance extracted (AVE) and construct reliability (CR) values were calculated for each factor. As shown in Table 5, AVE values for all constructs ranged from 0.553 to 0.620, exceeding the recommended threshold of 0.50 (Fornell and Larcker, 1981). In addition, the CR values ranged from 0.797 to 0.880, surpassing the suggested minimum of 0.70 (Hair et al., 2019).
Results of confirmatory factor analysis.
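AVE and CR are conventionally computed from standardized factor loadings: AVE is the mean squared loading, and CR is the squared sum of loadings divided by that quantity plus the summed error variances. The sketch below applies these standard expressions to a hypothetical set of loadings. The loadings are not taken from Table 5.

```python
def average_variance_extracted(loadings):
    """Mean of squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def construct_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical standardized loadings for one four-item factor.
loadings = [0.70, 0.75, 0.80, 0.85]
ave = average_variance_extracted(loadings)
cr = construct_reliability(loadings)
print(f"AVE = {ave:.3f} (threshold 0.50), CR = {cr:.3f} (threshold 0.70)")
```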
Multi-group confirmatory factor analysis
Multi-group confirmatory factor analysis was conducted to examine measurement invariance across Engineering and Humanities major students (Table 6). The unconstrained model demonstrated acceptable fit to the data (χ2 = 333.640, p < 0.001, TLI = 0.909, CFI = 0.925, RMSEA = 0.055). χ2 difference tests revealed that the unconstrained model and the model with constrained factor loadings (Model 1) did not differ significantly (
Multi-group confirmatory factor analysis results.
Constrained model 1: Factor loadings constrained equal across groups.
Constrained model 2: Covariances constrained equal across groups.
Constrained model 3: Both factor loadings and covariances constrained equal across groups.
Constrained model 4: Factor loadings, covariances, and error variances constrained equal across groups.
Group differences in validated scales
Following confirmation of measurement invariance, independent samples t-tests were conducted to examine AI literacy differences across key demographic variables.
Academic discipline (Engineering vs Humanities/Social Sciences)
Results revealed significant differences across all four AI literacy factors (Table 7). Engineering students demonstrated higher scores than humanities/social science students in AI Fundamental Knowledge (M = 3.615 vs M = 2.871, t = −7.652, p < 0.001, Cohen’s d = 0.842), AI Performance Evaluation (M = 3.233 vs M = 2.591, t = −5.968, p < 0.001, Cohen’s d = 0.651), and AI Practical Application (M = 3.729 vs M = 3.260, t = −5.421, p < 0.001, Cohen’s d = 0.596). A smaller but significant difference was also found in AI Impact Assessment (M = 4.027 vs M = 3.829, t = −2.824, p < 0.01, Cohen’s d = 0.311). Effect sizes ranged from small to large, with AI Fundamental Knowledge showing the largest difference (d = 0.842) and AI Impact Assessment showing the smallest difference (d = 0.311) between groups.
Independent t-test results by academic disciplines.
Note. *p < .05, **p < .01, ***p < .001.
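The effect sizes above follow the usual pooled-standard-deviation form of Cohen’s d, which is linked to the independent-samples t statistic by t = d × √(n₁n₂/(n₁ + n₂)). The sketch below illustrates both. The group means are the reported ones, but the standard deviations are hypothetical, and the group sizes (170 and 160) are an assumption inferred from the reported percentages of the 400 respondents.

```python
import math


def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Cohen's d using the pooled standard deviation of two groups."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def t_from_d(d, n_a, n_b):
    """Independent-samples t statistic implied by d and the group sizes."""
    return d * math.sqrt(n_a * n_b / (n_a + n_b))


# Reported means for AI Fundamental Knowledge; SDs below are hypothetical.
d_example = cohens_d(3.615, 2.871, sd_a=0.88, sd_b=0.88, n_a=170, n_b=160)

# Consistency check using the reported d and the assumed group sizes.
t_check = t_from_d(0.842, n_a=170, n_b=160)
print(f"d (hypothetical SDs) = {d_example:.3f}, implied |t| = {t_check:.2f}")
```

Under the assumed group sizes, the reported d of 0.842 implies a t statistic of approximately 7.64 in absolute value, close to the reported 7.652.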
AI course experience
Similarly, students with AI course experience showed significantly higher proficiency across all AI literacy dimensions compared to those without such experience (Table 8). The most pronounced differences were observed in AI Fundamental Knowledge and AI Performance Evaluation. Significant differences were also evident in AI Practical Application and AI Impact Assessment.
Independent t-test results by AI course taking.
Note. ***p < .001.
Academic level (undergraduate vs graduate)
Regarding academic level differences, graduate students demonstrated significantly higher AI literacy in specific domains (Table 9). Differences were found in AI Fundamental Knowledge and AI Performance Evaluation, with graduate students outperforming undergraduates in both areas. However, no significant differences were observed in AI Impact Assessment or AI Practical Application.
Independent t-test results by status.
Note. *p < .05, ***p < .001.
Research model validation
Analysis with 182 participants providing GPA information demonstrated strong internal consistency (Cronbach’s α: Academic self-efficacy = 0.889, Creativity = 0.888). Prior to conducting structural equation modeling (SEM), correlation analysis among latent factors was performed. The results revealed correlations of 0.428 between AI literacy and academic self-efficacy, 0.234 between AI literacy and creativity, and 0.537 between academic self-efficacy and creativity. All correlation coefficients were below 0.80, indicating no multicollinearity concerns and confirming that the basic assumptions for SEM analysis were satisfied. The factors demonstrated appropriate levels of intercorrelation, supporting the feasibility of structural modeling. Academic self-efficacy and creativity exhibited the strongest correlation, suggesting these constructs are closely related. These preliminary findings supported the subsequent structural equation modeling analysis by establishing the initial statistical relationships between AI literacy, academic self-efficacy, creativity, and academic achievement.
The hypothesized structural model was tested using maximum likelihood estimation. Model fit indices indicated acceptable fit: χ2 (40) = 88.887, p < 0.001, SRMR = 0.075, CFI = 0.927, TLI = 0.900, RMSEA = 0.082 (90% CI: 0.059–0.105). The measurement model included three factors for academic self-efficacy (task difficulty preference, self-regulatory efficacy, and confidence) and three factors for creativity (fluency, flexibility, and originality), based on their factor loadings and theoretical foundations.
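As a check on the reported fit, the RMSEA point estimate can be recomputed from the chi-square statistic, its degrees of freedom, and the sample size. The sketch below uses the common formula √(max(χ² − df, 0) / (df × (N − 1))) and assumes the structural model was estimated on the 182 participants who provided GPA information. Some software divides by N rather than N − 1, which changes the result only marginally.

```python
import math


def rmsea(chi_square, df, n):
    """RMSEA point estimate from chi-square, degrees of freedom, and sample size."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Reported values: chi-square(40) = 88.887; N = 182 is an assumption.
value = rmsea(88.887, 40, 182)
print(f"RMSEA = {value:.3f}")
```

With these inputs the estimate rounds to 0.082, matching the reported value.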
Model specification testing was conducted to examine whether the relationship between AI literacy and creativity involved indirect pathways through academic self-efficacy. An indirect pathway of this kind is consistent with theoretical frameworks suggesting that technology literacy is associated with creative capabilities through enhanced self-confidence and competence beliefs. Based on the model specification testing and fit indices, the final structural model indicates that AI literacy is indirectly related to creativity through academic self-efficacy, while maintaining direct effects on both academic self-efficacy and academic achievement.
Path analysis results for the direct effects are presented in Table 10 and Figure 4. All five direct path hypotheses were supported. AI literacy was positively associated with both academic self-efficacy
Direct path analysis results.
Note.*p < .05, ***p < .001.

Research model result.
The structural equation modeling results provided support for all five hypothesized relationships, though one finding emerged in an unexpected direction (Table 11). AI literacy showed a significant positive association with academic self-efficacy, confirming H1 and aligning with self-efficacy theory (Bandura, 1997), which posits that perceived competence in specific domains is associated with greater confidence in related academic tasks. Academic self-efficacy showed a strong positive relationship with creativity, supporting H2 and consistent with Tierney and Farmer’s (2002) findings that self-efficacy beliefs are important correlates of creative performance. This relationship is consistent with the theoretical proposition that confidence in one’s academic abilities facilitates creative thinking processes. Both AI literacy and academic self-efficacy were positively associated with academic achievement, confirming H3 and H4. The self-efficacy-achievement relationship aligns with extensive research demonstrating this well-established connection (Zimmerman, 2000). Creativity showed a significant negative association with academic achievement, supporting H5, though the direction differed from theoretical expectations. This counterintuitive finding may reflect tensions between creative thinking processes and conventional assessment methods.
Hypothesis results.
Given the negative association between creativity and academic achievement, additional analyses were conducted to examine the indirect pathways through which AI literacy is statistically associated with academic outcomes. Mediation analysis allows for decomposition of total effects into direct and indirect components, providing insights into whether the association between AI literacy and academic outcomes is accounted for by psychological mechanisms such as self-efficacy. Understanding these mediating pathways is particularly important for educational practice, as it can inform whether interventions should focus on developing AI technical skills directly or on building academic confidence as a foundation for effective AI integration.
To examine indirect effects, bias-corrected bootstrap mediation analysis with 500 resamples was conducted. Phantom variables were employed to decompose complex indirect pathways and examine specific mediation effects within the structural model. Results are presented in Table 12.
Mediation analysis results for AI literacy and academic achievement.
*p < 0.05. **p < 0.01.
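The logic of a bias-corrected bootstrap interval for an indirect effect can be sketched in a simplified setting. The example below is not the phantom-variable SEM procedure used in the study: it swaps in ordinary least squares for a single-mediator model (X → M → Y) on simulated data, with 500 resamples as in the study. All variable names and data are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    b = slope(x, y)
    mx, my = fmean(x), fmean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


def indirect_effect(x, m, y):
    """Product a*b: a is the X->M slope, b the M->Y slope controlling for X."""
    a = slope(x, m)
    b = slope(residuals(x, m), residuals(x, y))
    return a * b


def bc_bootstrap_ci(x, m, y, n_boot=500, alpha=0.05, seed=0):
    """Bias-corrected percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimate = indirect_effect(x, m, y)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boots.append(indirect_effect([x[i] for i in idx],
                                     [m[i] for i in idx],
                                     [y[i] for i in idx]))
    boots.sort()
    nd = NormalDist()
    # Bias-correction constant from the share of resamples below the estimate.
    share = sum(b < estimate for b in boots) / n_boot
    share = min(max(share, 1 / n_boot), 1 - 1 / n_boot)  # keep inv_cdf finite
    z0 = nd.inv_cdf(share)
    p_lo = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    p_hi = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    lower = boots[min(n_boot - 1, int(p_lo * n_boot))]
    upper = boots[min(n_boot - 1, int(p_hi * n_boot))]
    return estimate, lower, upper


# Simulated (hypothetical) data with a true indirect effect of 0.6 * 0.6.
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) for _ in range(200)]
m = [0.6 * xi + data_rng.gauss(0, 1) for xi in x]
y = [0.6 * mi + data_rng.gauss(0, 1) for mi in m]

est, lo, hi = bc_bootstrap_ci(x, m, y)
print(f"indirect effect = {est:.3f}, 95% BC CI = [{lo:.3f}, {hi:.3f}]")
```

An indirect effect is treated as statistically significant when the interval excludes zero. Unlike the plain percentile method, the bias correction shifts the interval endpoints when the bootstrap distribution is not centered on the sample estimate, which is common for products of coefficients.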
The mediation analysis revealed that the total indirect effect of AI literacy on academic achievement was not statistically significant
Discussion and conclusion
This study developed and validated a multidimensional AI literacy scale for higher education students, addressing a gap in the assessment of AI competencies within academic contexts. The systematic development process yielded a 17-item instrument measuring four dimensions: AI fundamental knowledge, AI impact assessment, AI performance evaluation, and AI practical application. The scale demonstrated adequate psychometric properties and revealed significant relationships between AI literacy and academic variables.
The four-factor structure emerged consistently across exploratory and confirmatory factor analyses, supporting the theoretical distinction between knowledge-based and application-oriented AI competencies. This finding aligns with existing digital literacy frameworks that differentiate between technical knowledge and practical application skills (Wang et al., 2023b), while extending these concepts to AI-specific contexts. The measurement invariance testing confirmed that the AI literacy construct operates consistently across engineering and humanities students, indicating that the underlying factor structure remains stable despite disciplinary differences. This finding provides confidence that the scale measures the same construct across different academic populations, though the significant mean differences suggest that absolute competency levels vary substantially between groups. The internal consistency coefficients (α = 0.791–0.873), together with construct validity, indicate that the scale is suitable for research purposes. The average variance extracted values (0.553–0.620) meet acceptable thresholds, indicating that each factor captures meaningful shared variance among its constituent items.
As expected, engineering students demonstrated higher AI literacy scores across all dimensions, providing evidence for the scale’s discriminant validity. However, the magnitude of differences varied meaningfully across dimensions, with the largest gaps in fundamental knowledge (Cohen’s d = 0.842) and AI performance evaluation (Cohen’s d = 0.651), suggesting these areas are most influenced by technical training background. The significant differences based on AI course experience provide further evidence for the scale’s discriminant validity, as students with formal AI education demonstrated higher competencies across all dimensions. This pattern supports the theoretical expectation that formal instruction contributes to AI literacy development. The pattern of differences (largest effects in fundamental knowledge and performance evaluation, moderate effects in practical application, and smallest effects in impact assessment) suggests that technical training primarily enhances analytical capabilities, while evaluative and applied skills may develop through more diverse pathways. This finding warrants further investigation to understand how AI literacy components develop across different educational experiences.
The structural equation modeling results revealed that AI literacy is positively associated with academic self-efficacy (β = 0.442, p < 0.001), supporting social cognitive theory’s propositions about the relationship between competence and confidence (Bandura, 1997). This finding extends previous research on technology self-efficacy by demonstrating the relationship specifically within AI contexts. Academic self-efficacy’s positive relationship with creativity (β = 0.539, p < 0.001) aligns with existing research on self-efficacy as an antecedent to creative performance (Tierney and Farmer, 2002). The indirect relation between AI literacy and creativity through academic self-efficacy suggests that these constructs are statistically linked through academic self-efficacy rather than directly. The unexpected negative relationship between creativity and academic achievement (β = −0.219, p < 0.05) requires careful interpretation. This finding may reflect tensions between creative approaches to learning and conventional assessment methods, or it could indicate suppression effects within the model. The negative coefficient emerged despite positive zero-order correlations, suggesting that creativity’s relationship with achievement becomes negative when controlling for AI literacy and self-efficacy.
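The offsetting pattern described above can be illustrated with the product-of-coefficients rule, under which a specific indirect effect is the product of the path coefficients along the pathway. The sketch below multiplies the standardized coefficients reported in this paragraph. It is a rough illustration only: the estimates in Table 12 may differ, for example if they are unstandardized.

```python
# Standardized path coefficients reported in the text.
ai_to_efficacy = 0.442             # AI literacy -> academic self-efficacy
efficacy_to_creativity = 0.539     # academic self-efficacy -> creativity
creativity_to_achievement = -0.219 # creativity -> academic achievement

# Specific indirect effect of AI literacy on creativity via self-efficacy.
indirect_creativity = ai_to_efficacy * efficacy_to_creativity

# Sequential indirect effect on achievement via self-efficacy and creativity.
sequential_achievement = indirect_creativity * creativity_to_achievement

print(f"AI literacy -> self-efficacy -> creativity: {indirect_creativity:.3f}")
print(f"... -> creativity -> achievement: {sequential_achievement:.3f}")
```

The sequential pathway carries a negative sign because of the creativity-to-achievement coefficient, so it pulls against any positive indirect pathway that runs through academic self-efficacy alone.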
The validated four-factor structure contributes to the theoretical understanding of AI literacy. Its novelty lies in its focus on higher education students, with items specifically designed for university and graduate-level contexts, which distinguishes it from previous general AI literacy scales. The distinction between fundamental knowledge and practical application aligns with broader information literacy frameworks while addressing AI-specific requirements. The indirect-only effect of AI literacy on creativity through academic self-efficacy extends social cognitive theory to technology-enhanced learning contexts, demonstrating that academic self-efficacy serves as a key statistical mediator relating technological competencies to other academic outcomes. This finding suggests that AI literacy development involves psychological as well as technical dimensions.
Limitations and future directions
Several limitations constrain the interpretation and generalizability of these findings. This study is limited by its context specificity, as data were collected from higher education students in a single country; by its reliance on self-reported measures; and by its cross-sectional design, which precludes causal inferences about the relationships among constructs. Although the overall sample size was adequate for estimating the proposed measurement and structural models, the split-sample design for exploratory and confirmatory factor analyses and the group sizes used in the multigroup CFA represent relatively modest samples for complex latent variable modeling. Consequently, some parameter estimates and invariance tests may be sensitive to sampling variability, particularly for cross-group comparisons, and should be interpreted with appropriate caution. Additionally, cultural factors may influence students’ AI literacy perceptions and self-efficacy, suggesting that cross-cultural validation is needed before applying the scale in different educational contexts. The reliance on self-report measures, while appropriate for assessing perceived competencies, may introduce response bias and does not capture objective performance capabilities. The unexpected creativity-achievement relationship highlights the need for more nuanced measurement approaches that can distinguish between different types of creative thinking and academic performance. The focus on general AI literacy rather than domain-specific applications limits understanding of how AI competencies operate within specific disciplinary contexts. While group differences were statistically significant, variations in prior AI experience among students may have contributed to these differences.
Future research could further control for such factors and examine domain-specific AI literacy measures to provide more targeted insights for educational applications.
The validated scale provides a foundation for longitudinal research examining AI literacy development trajectories and the stability of the four-factor structure over time. Investigation of the relationship between self-reported AI literacy and objective performance measures would strengthen construct validity evidence. Cross-cultural validation studies could establish the scale’s applicability across different educational systems and cultural contexts. Additionally, research examining the effectiveness of different instructional approaches for developing specific AI literacy dimensions could inform evidence-based program development. The unexpected creativity-achievement relationship warrants further investigation using alternative creativity measures and different academic achievement indicators to understand the nature of this relationship and its implications for educational practice.
Implications and conclusions
This research makes several important contributions to AI literacy measurement and understanding in higher education contexts. The development and validation of a four-factor AI literacy scale addresses a critical assessment gap in higher education, specifically designed for university and graduate learners. The proposed scale incorporates items reflecting students’ real experiences, disciplinary differences, and AI-related course exposure. The structure is empirically grounded through factor analyses and topic modeling of international AI discourse. The empirical findings reveal significant patterns in AI literacy development. Notable group differences across academic majors, course experience, and academic levels demonstrate that AI competence develops unevenly across student populations. Engineering students’ pronounced advantages in technical domains, combined with smaller differences in ethical reasoning, suggest that disciplinary training influences specific aspects of AI literacy while other dimensions develop more independently. The structural relationships identified through mediation analysis offer important theoretical insights. AI literacy is statistically linked to academic outcomes mainly through psychological mediators rather than via direct paths, and academic self-efficacy accounted for the association between AI literacy and creativity. This finding challenges assumptions about direct skill transfer and highlights the importance of confidence-building in technology education.
These results have practical implications for educational design and implementation. The findings suggest that effective AI education may require differentiated approaches that account for students’ disciplinary backgrounds while simultaneously addressing both technical competencies and psychological factors. The critical role of academic self-efficacy indicates that AI education programs could benefit from integrating confidence-building strategies alongside skill development to maximize learning outcomes. This research provides both a validated measurement tool and evidence-based theoretical insights that advance understanding of AI literacy in educational contexts. The findings contribute to the growing knowledge base needed to develop effective AI education programs that prepare students for successful engagement with AI technologies in their academic and professional futures. The validated scale offers researchers and educators a reliable instrument for assessing AI literacy while the structural findings provide theoretical foundations for understanding how AI competencies relate to broader academic outcomes.
Footnotes
Acknowledgements
This article is a revision of the first author’s doctoral dissertation from Yonsei University.
Ethical considerations
The study was approved by Yonsei University Graduate School after submission of a declaration of ethical conduct in research. The institution does not require ethical approval for the submission of a thesis.
Consent to participate
Written informed consent was obtained from all participants prior to participation in the survey.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Yonsei University Humanities and Social Sciences Field Creative Research Fund of 2024-22-0576.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
