Development and validation of a multidimensional AI literacy scale for higher education students: A mixed-method study

Abstract

This study developed and validated a comprehensive AI literacy scale for higher education students through a mixed-methods approach. The development process integrated literature review, expert interviews, and BERTopic modeling analysis. Following content validity assessment and pilot testing, the scale was validated with a sample of 400 students. Factor analyses supported a 17-item, four-dimensional structure comprising AI Fundamental Knowledge, AI Impact Assessment, AI Performance Evaluation, and AI Practical Application. The scale demonstrated adequate internal consistency and configural and metric invariance across academic disciplines. Group differences were observed according to academic major, AI course experience, and academic level. Structural equation modeling indicated that AI literacy is positively associated with academic self-efficacy, which is in turn related to multi-item measured creativity and students’ GPA. Mediation analysis indicated that academic self-efficacy mediated the association between AI literacy and creativity, while the indirect association between AI literacy and academic achievement via this pathway was offset by the negative link between creativity and GPA. However, the generalizability of these findings may be constrained by the specific cultural context and sample characteristics.

Keywords

artificial intelligence; AI literacy; scale development; higher education; psychometrics; validation; structural equation modeling

Introduction

AI has moved beyond experimental applications to transform practical educational contexts, with the emergence of generative AI democratizing access to these technologies among higher education students (Chan and Tsi, 2024; Walter, 2024; Yang et al., 2025). University students increasingly engage with AI systems for academic research, writing support, problem-solving, and discipline-specific inquiry. However, this widespread adoption brings concerns about reliability, academic integrity, ethical considerations, and overreliance on systems, highlighting the critical need for appropriate AI literacy development to prevent negative educational outcomes (Stanford University Human-Centered Artificial Intelligence, 2021).

AI literacy differs substantially from traditional digital literacy frameworks. While digital literacy primarily focuses on basic technological skills and information management, AI literacy encompasses understanding AI concepts and technologies, applying AI in everyday contexts, and critically evaluating AI systems and AI-generated outputs (Long and Magerko, 2020; Ng et al., 2021). Unlike computational thinking or ICT competence, which emphasize algorithmic problem-solving or tool proficiency, AI literacy involves engagement with autonomous systems that make probabilistic decisions and generate content independently, often without transparent explanations of their internal processes. This distinction is particularly crucial in higher education, where students navigate complex human-AI interactions that require interpretive judgment and evaluative reasoning beyond conventional digital skills.

Despite growing recognition of its importance, reliable AI literacy measurement remains challenging. Establishing consistent definitions and assessment methods has proven difficult, with existing studies showing limited consensus on core competencies (Annapureddy et al., 2025; Wang et al., 2023a). Numerous AI literacy scales have been developed; however, many target K-12, teachers, or general populations and focus primarily on awareness, attitudes, or ethical perceptions rather than academically situated AI use (Jin et al., 2025; Lintner, 2024; Tagare et al., 2025). As a result, these measurements provide limited insight into how AI literacy functions as an educational competence within higher education contexts, where AI engagement is closely intertwined with academic practices, disciplinary learning, and learning-related psychological variables.

Building on established frameworks proposed by Long and Magerko (2020), Ng et al. (2021), Wang et al. (2023b), and related studies, this research develops a synthesized four-dimensional AI literacy model specifically focused on higher education contexts. Across these frameworks, recurring dimensions – conceptual understanding, applied use, critical evaluation, and ethical awareness – emerge consistently, forming the theoretical basis of the four-dimensional structure adopted in this study. By situating these dimensions within academic contexts, the present study emphasizes not only functional interaction with AI systems but also students’ capacity to evaluate and reflect on AI use in learning activities such as research, writing, and problem-solving.

Methodologically, existing scale development has relied on expert-driven thematic analysis for item generation. While such qualitative approaches are essential for construct definition, they are inherently influenced by researcher interpretation and typically based on relatively small expert samples. To enhance construct validity and reduce researcher subjectivity, this study complements traditional expert interviews with computational text mining using BERTopic (Grootendorst, 2022). Whereas thematic analysis depends on interpretive coding, BERTopic enables systematic identification of latent thematic structures across a large corpus of international AI literacy discourse, thereby improving construct coverage, transparency, and reproducibility while retaining the strengths of qualitative judgment.

Beyond measurement concerns, the educational consequences of AI literacy remain insufficiently explored. Although preliminary studies suggest associations between technological competencies and academic outcomes, the pathways through which AI literacy relates to academic self-efficacy, creativity, and academic achievement remain theoretically underdeveloped and empirically underexamined. Without such evidence, AI literacy risks remaining a descriptive concept rather than an explanatory construct within educational research.

Accordingly, this study addresses the following research questions:

RQ1. What are the consistent factors of AI literacy as a measurable construct in higher education?

RQ2. What are the relationships between AI literacy and academic self-efficacy, creativity, and academic achievement?

This research proceeds in three phases: (1) constructing a theoretical framework through systematic literature review, (2) developing and validating measurement items through mixed methods (literature review, expert interviews, and topic modeling), and (3) empirically examining relationships with academic variables (academic self-efficacy, creativity, and academic achievement). This study contributes by integrating computational topic modeling into AI literacy scale development and empirically validating AI literacy’s relationship with core academic outcomes. Through these processes, the study aims to contribute to the AI literacy measurement literature by providing empirical evidence for AI literacy as a distinct multidimensional construct and offering a validated assessment instrument to inform educational practice in AI-integrated higher education contexts.

Literature review

Concept and competency of AI literacy

The conceptualization of literacy has evolved substantially with technological advances, progressing from traditional literacy (Bawden, 2001) to various specialized forms such as information and media literacy (Buckingham, 2015; Livingstone, 2004). While AI literacy was initially conceptualized as a subset of digital literacy, recent scholarly discourse increasingly suggests its distinct theoretical positioning due to AI’s unique characteristics and societal implications (Ng et al., 2021; Wang et al., 2023b). Long and Magerko (2020: 58) define AI literacy as “the ability to critically evaluate AI technologies, communicate and collaborate with AI, and use AI as a tool across contexts.” AI literacy appears to differ fundamentally from traditional literacy frameworks in both scope and complexity. Traditional literacy frameworks primarily focus on text interpretation and basic digital tool usage (Street, 2003), whereas AI literacy includes understanding of AI principles, algorithmic characteristics, and potential biases embedded within systems (Long and Magerko, 2020). The ethical dimensions of AI literacy extend beyond conventional information ethics to address unique challenges including algorithmic bias, AI accountability, and transparent decision-making processes (Floridi et al., 2018).

Recent theoretical developments have increasingly proposed frameworks for understanding AI literacy components. Ng et al. (2021) suggested that AI literacy encompasses core components including understanding AI concepts, applying AI tools, evaluating and creating with AI systems, and navigating AI ethics. However, the rapid evolution of AI technologies, particularly the emergence of generative AI applications and related algorithms, may require expansion of these foundational frameworks to address new competencies such as prompt engineering, content verification, and human-AI collaborative workflows.

This multifaceted nature of AI literacy suggests potential distinctions between knowledge-based competencies (understanding how AI systems function) and application-oriented capabilities (effectively utilizing AI tools for specific purposes). This theoretical distinction may have important implications for measurement development, as different competency types could require different assessment approaches and validation strategies. Furthermore, the contextual nature of AI literacy – how competencies manifest differently across educational, professional, and personal contexts – remains an area requiring further theoretical development and empirical investigation.

Existing AI literacy measurement

The need for reliable AI literacy measurement has grown substantially alongside increasing AI adoption in education and society. While theoretical frameworks have been established, empirical research on validated measurement tools has rapidly expanded, particularly since the emergence of generative AI technologies. A recent systematic review provides comprehensive insights into the current state of AI literacy measurement, revealing both progress and significant gaps in instrument development. Lintner’s (2024) systematic review represents a comprehensive analysis of AI literacy scales to date, evaluating the psychometric quality of 22 studies validating 16 different scales using the COSMIN (Consensus-based Standards for the selection of health Measurement Instruments) tool. This systematic assessment revealed that while most existing scales demonstrated good structural validity and internal consistency, critical limitations persist across the measurement landscape. Significantly, only a few scales have been tested for content validity, reliability, construct validity, and responsiveness, while none have been examined for cross-cultural validity and measurement error – fundamental requirements for robust psychometric instruments. Early foundational work by Long and Magerko (2020) established core competencies focusing on critical evaluation, communication, and collaboration with AI systems, providing theoretical foundations that influenced subsequent empirical developments. However, their framework preceded the mainstream introduction of generative AI, limiting its applicability to contemporary AI interactions that students regularly encounter.

Recent studies have witnessed significant advancement in AI literacy measurement, with researchers addressing both general AI competencies and emerging generative AI applications. Wang et al. (2023c) introduced a comprehensive 31-item scale measuring AI literacy across four dimensions – awareness, usage, evaluation, and ethics – validated through rigorous three-step content validation and factor analysis procedures. Their measurement demonstrated psychometric properties and established significant relationships with digital literacy and attitudes toward AI technology, though development focused primarily on general adult populations rather than higher education contexts. Carolus et al. (2023) made substantial theoretical contributions with the Meta AI Literacy Scale (MAILS), a 34-item measurement that expanded beyond traditional AI knowledge to incorporate psychological competencies including problem-solving, learning, and emotion regulation. This multidimensional framework represented a significant advancement, integrating knowledge-related, operational, critical, and ethical dimensions while addressing psychological change and meta-competencies. Koch et al. (2024) subsequently confirmed the scale’s robustness through further validation studies and developed a 10-item short version, demonstrating consistency across different populations and contexts.

Recognizing the limitations of general population measurements, several studies have developed approaches targeting specific educational contexts. Laupichler et al. (2023) developed the Scale for the Assessment of Non-Experts’ AI Literacy (SNAIL) through an iterative Delphi expert methodology, resulting in a 31-item scale with three factors: technical understanding, critical appraisal, and practical application. Their approach specifically addressed non-expert populations, filling a crucial gap in measurement for individuals without formal AI or computer science education. For higher education contexts specifically, Hornberger et al. (2023) developed a comprehensive measurement focusing on technical knowledge assessment, though their approach emphasized factual knowledge over practical application skills that characterize contemporary AI literacy requirements. Yuan et al. (2024) addressed this limitation by developing a holistic AI literacy scale including individual, interactive, and sociocultural dimensions, with cognitive, behavioral, and normative competencies across six dimensions: AI features, AI processing, algorithm influences, user efficacy, ethical consideration, and threat appraisal.

The rapid adoption of generative AI tools has created new measurement needs that traditional AI literacy scales inadequately address. Liu et al. (2025) developed a workplace-oriented Generative AI Literacy (GAIL) framework covering five core dimensions: basic technical competence, prompt optimization, content evaluation, innovative application, and ethical and compliance awareness. This framework specifically addresses the unique competencies required for effective human-AI collaboration in generative AI contexts, including prompt engineering skills and content evaluation capabilities. Chen et al. (2025) conducted empirical research examining generative AI literacy across four dimensions – utilization, interaction, evaluation of output, and ethics – among higher education students. Their findings revealed that while students actively use generative AI tools for academic purposes, most demonstrate critical evaluation of outputs and express a need for explicit institutional guidance regarding ethical and appropriate use.

Recent efforts have addressed the cultural specificity of AI literacy measurement through cross-cultural validation studies. Hobeika et al. (2024) developed and validated an Arabic version of the AI Literacy Scale (AILS) for university students. Their work demonstrated the feasibility of cross-cultural adaptation while highlighting the need for culturally appropriate measurement approaches.

Most existing scales have been developed primarily for general populations or specific professional contexts, with relatively limited attention to the distinctive requirements of higher education students (Carolus et al., 2023; Wang et al., 2023b). Higher education students tend to integrate AI tools into academic research, scholarly writing, and learning processes in ways that may differ substantially from general population usage patterns (Hornberger et al., 2023). This contextual gap may be particularly significant given evidence suggesting that students encounter AI in specialized academic contexts, including research assistance, automated feedback systems, and intelligent content curation (Chen et al., 2025), which could require competencies that differ from those captured by measurements designed for broader populations. Furthermore, existing measurement has typically relied on single methodological approaches, with limited integration of multiple validation methods (Lintner, 2024). While individual studies have identified specific psychometric properties through conventional approaches such as factor analysis and reliability testing (Laupichler et al., 2023; Wang et al., 2023c), fewer studies appear to have systematically combined diverse methodological approaches such as literature analysis, expert interview, and computational text analysis to ensure comprehensive construct coverage. This methodological pattern suggests potential opportunities for more fine-tuned measurement development through integrated validation approaches.

Additionally, while some research has begun to explore relationships between AI competencies and academic variables (Wang et al., 2023c), investigation of complex mediating pathways appears to be in early stages of development. For instance, the potential role of academic self-efficacy as a mediating mechanism through which AI literacy might influence creativity and academic achievement has received relatively limited empirical attention, despite established theoretical frameworks in self-efficacy research (Bandura, 1997) suggesting such psychological pathways could be important for understanding educational outcomes in technology-enhanced learning contexts.

Academic variables related to AI literacy

The relationship between AI literacy and academic outcomes has emerged as an increasingly important area of scholarly inquiry with recent empirical evidence revealing complex associations among technological competencies and educational variables. Contemporary research suggests that AI competencies may influence academic self-efficacy, creativity, and academic achievement through multifaceted pathways, though findings remain mixed and require continued investigation (Mansoor et al., 2024; Zhang et al., 2024).

Academic self-efficacy, conceptualized by Bandura (1977) as learners’ belief in their ability to perform academic tasks, has been established as a significant predictor of academic performance across diverse educational contexts. Research in self-efficacy theory suggests that competence in specific technology domains may enhance confidence in related academic tasks (Bandura, 1997). Recent studies have begun to examine these relationships more directly within AI contexts. Zhang et al. (2024) explored the complex associations between academic self-efficacy, academic stress, and performance expectations in AI usage behaviors among 300 university students, finding that academic self-efficacy mediates relationships between academic variables and problematic AI dependency patterns. Their investigation using the Interaction of the Person-Affect-Cognition-Execution (I-PACE) model revealed that students with lower academic self-efficacy may be more prone to developing dependent relationships with AI tools. The emergence of AI technologies in educational contexts appears to create multidimensional dynamics in self-efficacy development. Recent multinational research suggests complex relationships between AI literacy and self-efficacy beliefs. A multinational study of 1465 university students across Germany, the UK, and the US found that students identified foundational levels of AI literacy alongside relatively high levels of interest and positive attitudes toward AI technologies, though significant cross-national variations existed in self-efficacy beliefs (Hornberger et al., 2025). These findings suggest that cultural and educational contexts may influence how AI literacy development relates to academic confidence. Zhang et al. (2024) identified several concerning consequences of AI dependency, including increased academic laziness, spread of misinformation, reduced creativity, and diminished critical and independent thinking abilities.
These findings highlight the complexity of relationships between AI literacy development and academic self-efficacy, suggesting that while technological competence may enhance confidence, excessive dependency could potentially undermine the very academic capabilities it initially supported.

Creativity has gained renewed prominence in educational discourse within AI contexts, with emerging research suggesting contradictory relationships between AI usage and creative capabilities. While some theoretical perspectives propose that AI tools might enhance human creativity by providing resources for idea generation and iterative refinement, empirical findings suggest more complex dynamics. Research conducted among Austrian university students found that AI critical appraisal significantly and negatively impacted both AI self-efficacy and AI output quality, suggesting that critical evaluation of AI capabilities may reduce overconfidence while potentially constraining creative exploration (Hornberger et al., 2025). Chen et al. (2025) conducted empirical investigation of generative AI literacy across four dimensions – utilization, interaction, evaluation of output, and ethics – among higher education students. Their findings identified that while students actively use generative AI tools for academic purposes, most exhibited critical evaluation of outputs and expressed a need for explicit institutional guidance regarding ethical and appropriate use. These results suggest that creativity in AI-enhanced contexts may depend on students’ ability to balance AI assistance with critical evaluation skills. The relationship between AI tools and creativity appears to vary considerably based on implementation approaches and student characteristics. Recent evidence suggests that well-founded AI tool usage may support student performance and motivation, while excessive reliance on AI-powered tools may impact student performance negatively (Khoso et al., 2023; Montenegro-Rueda et al., 2023). This variability highlights the importance of understanding how different approaches to AI integration influence creative capabilities and academic outcomes.

Academic achievement, measured through grades, standardized assessments, or performance evaluations, represents a critical outcome variable in educational research. Recent empirical evidence reveals complex relationships between AI literacy and academic performance. Mansoor et al. (2024) conducted a comparative transnational survey among university students, finding an overall moderate AI literacy level and reporting an inverse relationship between academic performance and AI literacy levels. Their analysis suggested that students with lower academic performance may tend to rely more heavily on AI tools to complete academic tasks, though the causal mechanisms underlying this relationship require further investigation. These findings align with other recent research showing mixed results regarding AI literacy and academic achievement relationships. While some studies have identified positive associations between AI literacy, AI usage, and academic performance, others report weak correlations or inconclusive results (Abbas et al., 2019; Asio, 2024). Austrian university student research found that AI technical understanding, critical appraisal, practical application, self-efficacy, and output quality had statistically insignificant effects on students’ academic performance, despite identifying significant relationships with AI self-efficacy and output quality (Hornberger et al., 2025). Research examining generative AI literacy specifically has provided additional insights into these complex relationships. O’Dea et al. (2026) investigated factors affecting university students’ generative AI literacy in UK and Hong Kong contexts, utilizing the four-dimensional AI literacy framework including knowledge and understanding, use and application, evaluation and creation, and AI ethics. 
Their findings suggested that generative AI literacy development varies significantly across cultural and educational contexts, with implications for how AI competencies relate to academic outcomes. Walter (2024) emphasized that AI integration in education requires systematic approaches considering creativity, technological fluency, and critical thinking skills, moving beyond traditional educational methods to embrace more dynamic, student-centered learning environments. This perspective suggests that the relationships among AI literacy, academic self-efficacy, creativity, and academic achievement may depend on pedagogical approaches and institutional support structures. The complexity of these relations suggests the importance of comprehensive theoretical frameworks that can account for multiple pathways, mediating mechanisms, and contextual factors. While empirical evidence continues to accumulate, gaps remain in understanding how AI literacy development influences academic outcomes through psychological and pedagogical mechanisms, particularly in different cultural and disciplinary contexts. The mixed findings across studies highlight the need for more refined theoretical models and measurement approaches that can capture the multidimensional nature of AI literacy’s educational impact.

Methodology

This study employed a systematic mixed-methods approach to develop and validate an AI literacy measurement tool specifically designed for higher education contexts. Recent methodological developments emphasize the importance of triangulation in scale development to enhance validity and reduce biases inherent in single-method approaches. The development process involved three main phases: conceptualization through methodological triangulation, comprehensive validation through expert review and pilot testing, and examination of relationships with academic variables (Figure 1).

Figure 1.

Steps of scale development and validation.

The initial development phase integrated three complementary approaches to ensure comprehensive coverage of AI literacy components: literature analysis, expert interviews, and computational text analysis through BERTopic modeling. Literature analysis established theoretical foundations by examining existing AI literacy definitions and measurement frameworks. Expert interviews with eight participants, including industry professionals and professors, provided insights into required competencies and contextual factors. The interview data were qualitatively analyzed to identify and refine AI literacy components, serving as the primary basis for construct definition. BERTopic modeling was applied as a complementary procedure to examine thematic convergence and construct coverage across international AI literacy discourse, rather than as a primary method for construct generation. The resulting topics were systematically mapped onto the preliminary construct framework derived from literature review and expert interviews to assess alignment, identify missing areas, and refine construct boundaries. This mixed-method approach, combining qualitative research with computational text analysis, provided complementary perspectives and enhanced construct validity.
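
The topic-to-framework mapping step described above can be illustrated with a small sketch. The source does not report how the mapping was performed, so the keyword-overlap (Jaccard) heuristic, the threshold, the keyword lists, and the example topics below are all assumptions made for illustration; only the four construct names come from the preliminary framework.

```python
# Illustrative sketch only: the mapping rule, threshold, keywords, and topics
# are hypothetical and are not taken from the study.

def jaccard(a, b):
    """Jaccard similarity between two keyword collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def map_topics(topics, constructs, threshold=0.1):
    """Assign each topic to its best-matching construct, or None if no match clears the threshold."""
    mapping = {}
    for topic_id, keywords in topics.items():
        scores = {name: jaccard(keywords, kws) for name, kws in constructs.items()}
        best = max(scores, key=scores.get)
        mapping[topic_id] = best if scores[best] >= threshold else None
    return mapping

# Construct names follow the preliminary framework; keyword lists are invented.
constructs = {
    "AI Knowledge": ["concept", "algorithm", "model", "data", "learning"],
    "AI Application": ["tool", "prompt", "use", "workflow", "generate"],
    "AI Ethics and Evaluation": ["bias", "ethics", "privacy", "evaluate", "fairness"],
    "AI's social impact": ["job", "society", "career", "impact", "future"],
}

# Invented stand-ins for top keywords of extracted topics.
topics = {
    0: ["prompt", "tool", "generate", "chatbot"],
    1: ["bias", "fairness", "regulation"],
    2: ["gpu", "hardware", "chip"],
}

mapping = map_topics(topics, constructs)
# Topics matching no construct may point to areas the framework misses.
unmapped_topics = [t for t, c in mapping.items() if c is None]
# Constructs that no topic maps to may be under-represented in the corpus.
covered = {c for c in mapping.values() if c is not None}
uncovered_constructs = [c for c in constructs if c not in covered]
```

In a check of this kind, unmapped topics are candidates for missing content areas, while uncovered constructs would need to be justified from the literature and interviews alone. In practice a human review of each assignment would still be needed.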

For content validity assessment, eight different experts evaluated items using Lynn’s (1986) method. A pilot study was conducted with 42 higher education students to assess face validity, item clarity, and comprehensibility. Participants provided written feedback on ambiguous or difficult items. Based on this process, items were revised or removed due to ambiguity, redundancy, or conceptual misalignment. After this, a total of 400 participants from multiple higher education institutions and diverse academic disciplines were recruited and randomly divided into two equal groups of 200 participants each. The detailed demographic characteristics are presented in Table 2. Exploratory factor analysis was conducted using SPSS 27.0, and confirmatory factor analysis was performed using AMOS 26.0, enabling thorough statistical validation and measurement invariance testing. Measurement invariance was examined through multi-group confirmatory factor analysis, sequentially testing configural and metric invariance, with more restrictive forms evaluated in subsequent models.
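
As a minimal sketch of the content validity step, the item-level content validity index (I-CVI) associated with Lynn's (1986) approach is the proportion of experts who rate an item 3 or 4 on a 4-point relevance scale. The ratings and item names below are hypothetical, and the 0.78 cutoff is a commonly cited convention assumed here, since the source does not report the threshold applied.

```python
# Illustrative sketch: ratings, item names, and the cutoff are assumptions.

def item_cvi(ratings, relevant_min=3):
    """Proportion of experts rating an item as relevant (>= relevant_min on a 4-point scale)."""
    return sum(r >= relevant_min for r in ratings) / len(ratings)

# Hypothetical ratings from eight experts (1 = not relevant ... 4 = highly relevant).
ratings_by_item = {
    "item_01": [4, 4, 3, 4, 3, 4, 4, 3],
    "item_02": [4, 3, 2, 4, 3, 4, 3, 3],
    "item_03": [2, 3, 2, 1, 3, 2, 4, 2],
}

CUTOFF = 0.78  # assumed threshold; not reported in the source

cvi = {item: item_cvi(r) for item, r in ratings_by_item.items()}
retained = [item for item, value in cvi.items() if value >= CUTOFF]
# Scale-level index, averaging approach (S-CVI/Ave): mean of the item-level values.
scale_cvi_ave = sum(cvi.values()) / len(cvi)
```

With eight raters, an item needs at least seven "relevant" ratings (7/8 = 0.875) to clear a 0.78 cutoff, because six of eight gives only 0.75.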

To test the hypotheses, we used validated measurements for each construct. Academic self-efficacy was measured using Kim and Park’s (2001) scale. Creativity was assessed using a scale adapted by Choi (2014), which conceptualizes creativity as a multidimensional construct including fluency, flexibility, originality, and elaboration. The scale was developed by reorganizing creativity-related items grounded in classic creativity theories, including Guilford’s structure of intellect (Guilford, 1950, 1967, 1968), Torrance’s divergent thinking framework (Torrance, 1965), and Amabile’s componential model (Amabile, 1988, 1996). The scale has been empirically validated through exploratory and confirmatory factor analyses and has demonstrated adequate internal consistency across its sub-dimensions (Cronbach’s α = 0.773–0.908). Academic achievement was operationalized using participants’ GPA in AI-related coursework.
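
For readers less familiar with the internal consistency coefficient reported above, the following sketch computes Cronbach's alpha from its standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The response matrix is invented for illustration and is unrelated to the study data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-respondent, 3-item Likert data.
data = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(data)
```

The coefficient rises when items covary strongly relative to their individual variances, which is why it is usually computed separately for each sub-dimension rather than across the whole instrument.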

Using data from 182 participants, the validated scales were applied to analyze the structural relationships among AI literacy, academic self-efficacy, creativity, and academic achievement. Structural equation modeling was conducted with AMOS 26.0 using composite scores as indicators rather than individual items. Following CFA validation, composite scores were calculated as mean values of items within each validated factor, with AI literacy modeled using four composite indicators and academic self-efficacy and creativity using their respective composite indicators. Mediation effects were assessed through bias-corrected bootstrap procedures with 500 resamples, incorporating phantom variables to decompose complex indirect pathways.
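
The bias-corrected bootstrap logic can be sketched in a simplified setting. The example below is not the AMOS procedure used in the study: it replaces the latent-variable model and phantom variables with an ordinary least squares simple mediation (X to M to Y) on simulated data, and every variable name, coefficient, and seed is an assumption. It keeps the study's sample size (182) and number of resamples (500) only to make the scale of the computation concrete.

```python
import random
from statistics import NormalDist, mean

def slope(x, y):
    """OLS slope of y regressed on x alone."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def partial_slope(x, m, y):
    """OLS coefficient of m in the two-predictor regression y ~ x + m."""
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    return (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)

def indirect(x, m, y):
    """Product-of-coefficients indirect effect: (X -> M) * (M -> Y given X)."""
    return slope(x, m) * partial_slope(x, m, y)

def bc_bootstrap_ci(x, m, y, n_boot=500, level=0.95, seed=0):
    """Point estimate and bias-corrected bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    est = indirect(x, m, y)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        boots.append(indirect([x[i] for i in idx],
                              [m[i] for i in idx],
                              [y[i] for i in idx]))
    boots.sort()
    nd = NormalDist()
    # z0 measures how far the bootstrap distribution is shifted from the estimate.
    prop = sum(b < est for b in boots) / n_boot
    prop = min(max(prop, 1 / n_boot), 1 - 1 / n_boot)  # keep inv_cdf finite
    z0 = nd.inv_cdf(prop)
    z = nd.inv_cdf(1 - (1 - level) / 2)
    lo_idx = min(n_boot - 1, max(0, int(nd.cdf(2 * z0 - z) * n_boot)))
    hi_idx = min(n_boot - 1, max(0, int(nd.cdf(2 * z0 + z) * n_boot)))
    return est, boots[lo_idx], boots[hi_idx]

# Simulated data with an assumed true indirect effect of 0.5 * 0.4 = 0.2.
sim = random.Random(42)
n = 182
x = [sim.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + sim.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.1 * xi + sim.gauss(0, 1) for xi, mi in zip(x, m)]

est, ci_low, ci_high = bc_bootstrap_ci(x, m, y)
```

An indirect effect is conventionally treated as supported when the interval excludes zero. The bias correction shifts the percentile cut points when the bootstrap distribution is not centered on the point estimate, which is common for products of coefficients.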

Based on the proposed model (Figure 2), the following hypotheses were tested:

H1: AI literacy has a positive effect on academic self-efficacy.

H2: Academic self-efficacy has a positive effect on creativity.

H3: AI literacy has a positive effect on academic achievement.

H4: Academic self-efficacy has a positive effect on academic achievement.

H5: Creativity has an effect on academic achievement.

Figure 2.

Research model for validation.

In addition to testing direct effects, mediation analyses were conducted to examine whether academic self-efficacy mediates the relationship between AI literacy and creativity, and whether AI literacy has indirect effects on academic achievement through the sequential mediation of academic self-efficacy and creativity.
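
The indirect effects named above can be written as products of path coefficients. The labels are notation assumed here, not taken from the source: a (H1, AI literacy to academic self-efficacy), b (H2, academic self-efficacy to creativity), c' (H3, AI literacy to academic achievement), d (H4, academic self-efficacy to academic achievement), and e (H5, creativity to academic achievement), with AIL, ASE, CRE, and GPA abbreviating the four constructs.

```latex
\begin{aligned}
\text{AIL} \to \text{ASE} \to \text{CRE}: &\quad a \cdot b \\
\text{AIL} \to \text{ASE} \to \text{GPA}: &\quad a \cdot d \\
\text{AIL} \to \text{ASE} \to \text{CRE} \to \text{GPA}: &\quad a \cdot b \cdot e \\
\text{Total effect of AIL on GPA}: &\quad c' + a \cdot d + a \cdot b \cdot e
\end{aligned}
```

Under this notation, a negative e makes the sequential term a·b·e negative when a and b are positive, so it can partly or fully cancel the positive term a·d. This is one way to read the offsetting pattern summarized in the abstract.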

Scale development and validation

This study developed and validated an AI literacy scale through three methodological approaches. We first analyzed literature to establish a theoretical foundation for conceptualizing AI literacy and its components, focusing on higher education student competencies. We then conducted in-depth interviews with domestic AI experts to identify specific competencies needed for workforce entry. Additionally, we applied topic modeling to international AI experts’ social media content to complement our previous findings. These approaches were integrated to establish AI literacy components with clear definitions and develop appropriate measurement items.

Exploration of AI literacy components and initial item development

Primary components identification

In this phase, we established a theoretical framework for AI literacy components, examining scales across various educational and professional contexts. Previous research has focused on diverse populations including college students, adults, and corporate employees, predominantly using Long and Magerko’s (2020) AI competencies framework (Carolus et al., 2023; Hornberger et al., 2023; Karaca et al., 2021; Laupichler et al., 2023; Long and Magerko, 2020; Mikalef and Gupta, 2021; Ng et al., 2021; Wang et al., 2023b). The systematic analysis of these studies identified four primary components:

AI Knowledge. Understanding of AI concepts, principles, and technical foundations, differentiated into basic AI knowledge, advanced AI knowledge, and AI learning methods.

AI Application. Integration of development and utilization capabilities, reflecting no-code/zero-code tools and generative AI, which have blurred traditional boundaries of development and application skills.

AI Ethics and Evaluation. Ethical considerations combined with critical evaluation capabilities, including data ethics, AI ethical principles, and critical thinking about AI applications.

AI Social Impact. Understanding AI’s influence on daily life, social structures, careers, and broader societal implications.

Expert interview analysis

Eight experts were interviewed to validate and expand the literature-identified components. The interviews explored required competencies for higher education students in the AI era. Thematic analysis revealed five primary themes:

The first theme centered on AI knowledge structure, revealing a hierarchical organization distinguishing between basic conceptual understanding and advanced technical knowledge. Industry experts emphasized the importance of raising domain knowledge quickly, as one expert noted: “Since I went in without any practical experience. . . As a career move but with no AI experience, I decided to quickly raise my domain knowledge level to produce results” (Industry expert D).

The importance of data literacy was particularly emphasized: “Data is the most important. Whether it's deepfakes or inappropriate videos, you first need human labels to create machine learning models” (Industry expert E).

The second theme revealed an evolution in perceptions of AI application skills, particularly following the emergence of generative AI. The focus has shifted from pure programming capability to problem-solving abilities and hands-on experience. As one industry expert explained: “Current LLM implementations enable natural language coding interfaces, producing analytical results comparable to traditional programming outputs, effectively democratizing development capabilities” (Industry expert C).

The emergence of prompt engineering was highlighted as a crucial skill: “The prominence of prompt engineering will increase substantially. It includes understanding algorithmic foundations and formulating queries that align with the system's operational parameters” (Professor G).

Ethical considerations emerged as the third theme, with experts emphasizing organizational management of ethics and data protocols. Industry experts consistently emphasized data ethics and copyright considerations: “While the application of outputs is crucial, we carefully review each dataset used in model development. Although more data would of course lead to better results, we have to be very careful because we cannot use data in an arbitrary way and ignore copyright issues” (Industry expert D).

Academic perspectives added depth to ethical considerations: “(Regarding ChatGPT usage in assignments) Will students disclose its use or not? Their continued non-disclosure indicates their assumption of responsibility through final review. . . This reveals how potential elite members of our society conceptualize intellectual property” (Professor F).

The fourth theme focused on social impact assessment, particularly the concept of “AI transition.” Experts discussed both immediate impacts and future implications: “Being able to think about or logically reason about how activities in your daily life could be transformed by AI . . . I thought that might be a really important point from that perspective” (Industry expert D).

Concerns about societal changes were evident: “If its output created in 5 minutes is cleaner and better than what I produced after much deliberation, wouldn't people stop engaging in creative activities that require deep thinking?” (Professor F).

The fifth theme, basic AI competency, emerged as a distinct component focusing on fundamental skills required in the AI era. Communication abilities were particularly emphasized: “Communication skills seem really important. Whether you are good at English or not, being able to explain things clearly is crucial” (Industry expert E).

The importance of human collaboration was also highlighted: “I think human-to-human collaboration skills are really necessary. In my view, you need to be able to collaborate with other humans first before you can work well with AI” (Professor H).

These findings contributed to framework development in three ways: (1) reconstructing technical components into unified “AI Utilization” reflecting industry practices, (2) adding “Basic AI Competency” for fundamental AI-era skills not identified in literature, and (3) expanding ethical considerations to include both individual and organizational protocols. These modifications enhanced the framework’s relevance for higher education contexts.

Topic modeling analysis

BERTopic modeling analysis of international experts’ content generated 33 initial topics, consolidated into 10 meaningful clusters through hierarchical clustering. These clusters aligned with the five preliminary components, providing empirical support for the conceptual framework and revealing the interconnected nature of AI literacy dimensions (Table 1).

Table 1.

Mapping of BERTopic modeling’s clustering results to components.

Predefined component	Hierarchical cluster	IDM cluster
AI Knowledge	Human-AI Difference	Human-centered AI
	AI-related Learning	AI Methodology
		AI Knowledge Acquisition Methods
		AI-related Concepts
AI Basic Competency	Academic Training	Academic Training
AI Basic Competency	Human Interaction	Human Interaction
AI Ethics and Evaluation	AI Ethics and Regulation	AI data security
		AI model democratization
		AI Regulations
		AI Technology Infrastructure
		AI Technology Development and Legal Restrictions
AI Social Impact	AI Transform	AI Transformation
	ChatGPT’s Impact	AI and Social Dynamics
	Technical Infrastructure Construction	Importance of Human Role
	AI Social Impact
	Accessibility

Key findings include the identification of distinct dimensions within AI knowledge, an emphasis on the international discourse surrounding AI ethics and regulations, a complex interplay among the AI literacy dimensions, and recognition of technical infrastructure as a significant organizational-level factor. The analysis also identified a temporal dimension of AI literacy, emphasizing both immediate and future implications of AI technologies. These findings were incorporated into scale development, ensuring comprehensive coverage of the AI literacy construct and providing empirical support for the proposed framework.

Initial assessment

Initial assessment covered 80 items across five factors: Factor 1 (27 items, CVI range: 0.50–1.00), Factor 2 (20 items, 0.50–1.00), Factor 3 (17 items, 0.25–1.00), Factor 4 (9 items, 0.75–1.00), and Factor 5 (7 items, 0.625–1.00). Based on experts’ feedback and subsequent revisions, the scale was refined to a more precise and comprehensive 84-item scale for pilot testing. The item counts for each component were adjusted: Factor 1 was reduced to 25 items, Factor 2 was expanded to 24 items, Factor 3 was streamlined to 15 items, Factor 4 was refined to 11 items, and Factor 5 retained its 7 items with content revisions.
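The item-level content validity index (CVI) is commonly computed as the proportion of experts who rate an item as relevant. The sketch below assumes a 4-point relevance scale on which ratings of 3 or 4 count as relevant, and a panel of eight raters. Neither detail is stated in this section, and the item names and ratings are invented for illustration only.

```python
def item_cvi(ratings, relevant_min=3):
    """Proportion of experts whose rating is at or above `relevant_min`."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(1 for r in ratings if r >= relevant_min) / len(ratings)


# Hypothetical ratings from eight experts on a 4-point relevance scale.
example_items = {
    "item_A": [4, 4, 3, 4, 3, 4, 4, 3],  # all eight experts rate it relevant
    "item_B": [4, 3, 2, 4, 1, 3, 2, 4],  # five of eight rate it relevant
    "item_C": [2, 1, 3, 2, 1, 4, 2, 2],  # two of eight rate it relevant
}

cvi_by_item = {name: item_cvi(r) for name, r in example_items.items()}
for name, value in cvi_by_item.items():
    print(f"{name}: CVI = {value:.3f}")
```

Under these assumptions, values such as 0.25 and 0.625 arise naturally as fractions of an eight-member panel, and items with a low CVI would be the candidates for revision or removal.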

Based on the pilot test with 42 higher education students and a subsequent expert review, items showing redundancy, ambiguity, or conceptual overlap were identified and removed. As a result, the initial pool of 84 items was refined to 48 items prior to exploratory factor analysis. The refined scale comprised five constructs: Factor 1 (14 items), Factor 2 (12 items), Factor 3 (12 items), Factor 4 (6 items), and Factor 5 (4 items).

As shown in Table 2, the sample consisted of 154 males (38.5%) and 246 females (61.5%). In terms of age, 319 participants (79.8%) were 29 years old or younger, 63 (15.8%) were between 30 and 39, and 18 (4.4%) were 40 or older. By academic status, the sample comprised 219 undergraduates (54.8%) and 181 graduate students (45.3%). A slight majority of participants attended private institutions (54.8%), and a slight majority attended institutions located in metropolitan areas (55.8%). Participants’ major fields were primarily Engineering (42.5%) and Humanities & Social Sciences (40.0%), with smaller proportions from Natural Sciences (8.0%), Medicine (5.0%), and Arts & Physical Education (4.5%).

Table 2.

Demographic characteristics of survey respondents (N = 400).

Characteristics	n	%	Characteristics	n	%
Gender			AI course experience
Male	154	38.5	Yes	192	48.0
Female	246	61.5	No	208	52.0
Age			Major field
⩽29	319	79.8	Engineering	170	42.5
30–39	63	15.8	Humanities & Social Sciences	160	40.0
⩾40	18	4.4	Natural Sciences	32	8.0
Academic status			Medicine	20	5.0
Undergraduate	219	54.8	Arts & Physical Education	18	4.5
Graduate	181	45.3	AI usage frequency
Institution type			Daily	130	32.5
National	181	45.3	Weekly (1–3 times)	159	39.8
Private	219	54.8	Monthly (1–3 times)	77	19.3
Institution location			Rarely (⩽3 times/6 months)	24	6.0
Metropolitan area	223	55.8	Never	10	2.5
Non-Metropolitan area	177	44.2	Course grade^a
			A grade	127	31.8
			B grade or below	55	13.8
			Currently enrolled	8	2.0
			Not enrolled	210	52.5

^a Course grades are reported only for students who have completed regular AI-related courses. Attendees of one-time seminar events (n = 2) are classified as “Not enrolled.”

Descriptive statistics for all 48 items revealed mean scores ranging from 2.13 to 4.00 with standard deviations between 0.734 and 1.402. The skewness and kurtosis values for all items fell within the ±2 range, indicating that the data met the assumption of normality for subsequent analyses (Table 3).

Table 3.

Descriptive statistics.

Item	Min	Max	Mean	SD	Skewness	Kurtosis
1	1	5	3.83	0.855	−0.228	−0.569
2	1	5	3.71	0.905	−0.368	−0.414
3	1	5	3.28	1.086	−0.203	−0.737
4	1	5	3.19	1.180	−0.106	−0.962
5	1	5	3.06	1.164	0.008	−0.858
6	1	5	3.67	0.957	−0.460	−0.201
7	1	5	2.63	1.250	0.346	−0.897
8	1	5	3.47	1.083	−0.494	−0.399
9	1	5	3.57	1.016	−0.523	−0.120
10	1	5	3.21	1.306	−0.238	−1.103
11	1	5	3.28	1.138	−0.110	−0.842
12	1	5	2.66	1.282	0.324	−0.995
13	1	5	3.41	0.971	−0.232	−0.342
14	1	5	3.71	0.981	−0.583	−0.196
15	1	5	3.39	0.973	−0.278	−0.463
16	1	5	3.04	1.220	−0.117	−1.034
17	1	5	3.27	0.982	−0.196	−0.465
18	1	5	3.21	1.068	−0.209	−0.622
19	1	5	3.62	0.953	−0.487	−0.134
20	1	5	3.18	1.106	−0.255	−0.717
21	1	5	2.45	1.256	0.553	−0.750
22	1	5	3.59	1.022	−0.471	−0.363
23	1	5	3.78	0.953	−0.612	−0.119
24	1	5	2.99	1.409	−0.011	−1.357
25	1	5	2.64	1.221	0.286	−0.935
26	1	5	2.14	1.325	0.918	−0.429
27	1	5	3.66	0.947	−0.548	0.054
28	1	5	3.82	0.881	−0.741	0.499
29	1	5	3.71	0.886	−0.538	0.025
30	1	5	3.68	1.000	−0.586	−0.324
31	1	5	3.23	1.163	−0.250	−0.915
32	1	5	2.24	1.359	0.805	−0.648
33	1	5	3.93	0.869	−0.906	1.171
34	1	5	3.53	0.942	−0.342	−0.307
35	1	5	3.41	0.935	−0.243	−0.350
36	1	5	3.99	0.990	−0.941	0.382
37	1	5	2.93	1.159	0.118	−0.853
38	1	5	3.08	1.153	−0.073	−0.813
39	1	5	3.89	0.790	−0.714	0.919
40	1	5	3.79	0.820	−0.691	0.707
41	1	5	3.62	0.893	−0.506	0.052
42	1	5	3.99	0.784	−0.701	0.610
43	1	5	3.99	0.732	−0.631	1.048
44	2	5	3.95	0.744	−0.543	0.335
45	1	5	3.81	0.866	−0.738	0.696
46	1	5	3.89	0.899	−0.644	0.247
47	1	5	4.00	0.869	−0.802	0.513
48	1	5	3.37	1.186	−0.262	−0.876

Item-item correlation analysis showed that the correlation coefficients ranged from −0.014 to 0.798, with most correlations being statistically significant (p < 0.01 or p < 0.05). No correlation coefficients exceeded the 0.80 threshold, indicating no severe multicollinearity issues that would interfere with factor analysis procedures.

Exploratory and confirmatory factor analysis

Exploratory factor analysis

Exploratory factor analysis was conducted with Sample 1 (n = 200). Before EFA, preliminary data screening was conducted. Descriptive statistics revealed that the mean scores ranged from 2.09 (SD = 1.304) to 4.01 (SD = 0.716) across all 48 items. Standard deviations ranged from 0.716 to 1.383, indicating adequate variability in responses. All variables demonstrated acceptable levels of skewness (−0.944 to 0.958) and kurtosis (−1.339 to 1.534), with absolute values below 2.0, indicating that the assumption of normality was satisfied for all variables.

An exploratory factor analysis was conducted on 48 items with data from 200 participants (Sample 1). Items with factor loadings below 0.40, communalities below 0.30, or cross-loadings above 0.32 were removed, resulting in the retention of 21 items. The Kaiser-Meyer-Olkin measure verified the sampling adequacy for the analysis, KMO = 0.938, which is above the acceptable limit of 0.6. Bartlett’s test of sphericity, χ²(210) = 2834.82, $p < .001$, indicated that correlations between items were sufficiently large for EFA. Principal axis factoring with oblimin rotation was used because the factors were expected to be correlated. Four factors had eigenvalues over Kaiser’s criterion of 1 and in combination explained 63.02% of the variance. Table 4 shows the factor loadings after rotation.

Table 4.

Results of exploratory factor analysis.

Item	Factor 1	Factor 2	Factor 3	Factor 4	Communality
1	0.907	0.052	−0.041	−0.073	0.835
2	0.787	0.027	0.116	0.110	0.642
3	0.780	−0.002	0.041	0.082	0.631
4	0.754	0.105	−0.208	−0.102	0.773
5	0.694	−0.047	−0.283	−0.017	0.740
6	0.666	0.078	−0.108	0.030	0.616
7	0.467	0.145	0.049	0.268	0.485
8	0.022	0.784	0.034	−0.067	0.559
9	−0.032	0.753	−0.008	−0.009	0.547
10	0.009	0.705	−0.049	0.051	0.573
11	0.079	0.685	−0.076	0.159	0.731
12	0.129	0.638	−0.012	0.025	0.512
13	−0.039	0.120	−0.866	0.025	0.828
14	0.036	0.041	−0.793	0.066	0.759
15	0.170	−0.089	−0.582	0.140	0.565
16	−0.144	0.140	−0.005	0.757	0.600
17	0.169	−0.093	−0.051	0.745	0.693
18	0.163	−0.010	−0.045	0.617	0.546
19	0.157	−0.014	−0.109	0.569	0.532
20	0.073	0.032	−0.279	0.526	0.618
21	−0.076	0.275	−0.088	0.470	0.450
Eigenvalue	9.415	1.905	1.124	0.791
% of variance	44.832	9.074	5.351	3.767
Cumulative %				63.024

Note. Loadings ≥ .40 are shown in bold.
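The percentages in Table 4 can be checked against the reported eigenvalues. The sketch below assumes that each percentage is the eigenvalue divided by the number of retained items (21), since the total variance of standardized items equals the item count, and that the degrees of freedom for Bartlett’s test are p(p − 1)/2 for p items. The variable names are illustrative.

```python
n_items = 21
eigenvalues = [9.415, 1.905, 1.124, 0.791]  # values as reported in Table 4

# Degrees of freedom for Bartlett's test of sphericity with p items.
bartlett_df = n_items * (n_items - 1) // 2

# Share of total variance attributed to each factor, in percent.
pct_variance = [100 * ev / n_items for ev in eigenvalues]
cumulative_pct = sum(pct_variance)

print("Bartlett df:", bartlett_df)
print("% of variance:", [round(p, 2) for p in pct_variance])
print("Cumulative %:", round(cumulative_pct, 2))
```

Under this assumption the computed degrees of freedom and cumulative percentage agree with the values reported for the 21-item solution.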

Internal consistency was assessed using Cronbach’s alpha coefficient. The overall 21-item scale demonstrated high internal consistency (α = 0.942). Internal consistency estimates for each factor identified in the EFA were α = 0.927 for Factor 1 (seven items), α = 0.867 for Factor 2 (five items), α = 0.868 for Factor 3 (three items), and α = 0.875 for Factor 4 (six items). All factors exceeded the recommended threshold of 0.70 for acceptable internal consistency. Item-total correlations ranged from 0.434 to 0.867, indicating that all items contributed meaningfully to the coherence of the scale. Inter-factor correlations ranged from −0.578 to 0.518, supporting the use of oblique rotation and indicating moderate relationships among factors.
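For readers who want to see how the coefficient is obtained, a minimal sketch of Cronbach’s alpha follows. It uses the standard formula based on the sum of item variances and the variance of total scores. The response data are invented for illustration and are not the study’s data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of respondent scores."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per respondent (sum across items).
    totals = [sum(responses) for responses in zip(*item_scores)]
    item_variance_sum = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: three items answered by five respondents.
hypothetical_items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 3],
    [4, 2, 5, 3, 4],
]
alpha = cronbach_alpha(hypothetical_items)
print(f"alpha = {alpha:.3f}")
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is read here against the 0.70 threshold.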

Confirmatory factor analysis

Following the EFA and assessment of internal consistency conducted on Sample 1, a second independent sample (Sample 2, N = 200) was collected for confirmatory factor analysis. Preliminary descriptive statistics for the CFA sample revealed that mean scores ranged from 2.68 (SD = 1.214) to 4.00 (SD = 0.777), with standard deviations ranging from 0.749 to 1.319. All 21 items demonstrated acceptable distributional properties, with skewness values ranging from −0.871 to 0.343 and kurtosis values ranging from −1.058 to 1.501, indicating normality assumptions were satisfied for confirmatory factor analysis.

To validate the factor structure identified in the EFA, confirmatory factor analysis was conducted using maximum likelihood estimation on an independent sample (N = 200). Following removal of items whose squared multiple correlations (SMC) fell below acceptable thresholds, 17 items were retained for the final model. The hypothesized four-factor model demonstrated acceptable fit to the data (Figure 3 and Table 5): $\chi^{2}(112) = 238.24$, $p < .001$; CFI = .933, TLI = .918, SRMR = .061, RMSEA = .075 (90% CI [.062, .089]). Model fit was evaluated using multiple complementary indices, consistent with recommendations that emphasize joint interpretation of absolute and incremental fit measures rather than reliance on a single cutoff criterion (Hu and Bentler, 1999).
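The reported RMSEA can be approximately reproduced from the chi-square statistic, its degrees of freedom, and the sample size. The sketch below assumes the common point-estimate formula with N − 1 in the denominator; some software uses N instead, which changes the result only slightly at this sample size.

```python
import math


def rmsea(chi_square, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values as reported for the four-factor CFA model (N = 200).
cfa_rmsea = rmsea(238.24, 112, 200)
print(f"CFA RMSEA = {cfa_rmsea:.3f}")
```

The confidence interval is not reproduced here because it requires the noncentral chi-square distribution.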

Figure 3.

Confirmatory factor analysis model.

All factor loadings were statistically significant ($p < .001$) and ranged from 0.61 to 0.89. Factor correlations ranged from 0.44 to 0.73. SMC values ranged from 0.38 to 0.80. The final model comprised four factors: AI Fundamental Knowledge (six items), AI Impact Assessment (four items), AI Performance Evaluation (three items), and AI Practical Application (four items).

All factors demonstrated satisfactory internal consistency. Cronbach’s alpha was 0.914 for the full scale, and the values for each factor were as follows: AI Fundamental Knowledge = 0.873, AI Impact Assessment = 0.791, AI Performance Evaluation = 0.835, and AI Practical Application = 0.791. These values indicate that the items within each factor were consistently related to their underlying constructs.

To assess convergent validity, average variance extracted (AVE) and construct reliability (CR) were calculated for each factor. As shown in Table 5, AVE values for all constructs ranged from 0.553 to 0.620, exceeding the recommended threshold of 0.50 (Fornell and Larcker, 1981). In addition, the CR values ranged from 0.797 to 0.880, surpassing the suggested minimum of 0.70 (Hair et al., 2019).

Table 5.

Results of confirmatory factor analysis.

Factor	No. of items	β	AVE	CR
AI Fundamental knowledge	6	0.673–0.846	0.553	0.880
AI Impact assessment	4	0.612–0.784	0.620	0.866
AI Performance evaluation	3	0.630–0.893	0.573	0.797
AI Practical application	4	0.675–0.784	0.563	0.837
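AVE and CR are functions of the standardized factor loadings. The sketch below assumes the Fornell and Larcker (1981) definitions, with each indicator’s error variance taken as one minus its squared standardized loading. Because only loading ranges are reported in Table 5, the loadings used here are hypothetical.

```python
def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def construct_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l * l for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical standardized loadings for a three-item factor.
hypothetical_loadings = [0.70, 0.75, 0.80]
ave = average_variance_extracted(hypothetical_loadings)
cr = construct_reliability(hypothetical_loadings)
print(f"AVE = {ave:.3f}, CR = {cr:.3f}")
```

With these definitions, AVE cannot exceed the largest squared loading, so the loadings and AVE reported for a factor can be cross-checked against each other.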

Multi-group confirmatory factor analysis

Multi-group confirmatory factor analysis was conducted to examine measurement invariance between Engineering and Humanities & Social Sciences students (Table 6). The unconstrained model demonstrated acceptable fit to the data (χ² = 333.640, p < 0.001, TLI = 0.909, CFI = 0.925, RMSEA = 0.055). χ² difference tests revealed that the unconstrained model and the model with constrained factor loadings (Model 1) did not differ significantly ($\Delta\chi^{2} = 18.574$, $\Delta df = 13$, $p > .05$), supporting both configural and metric invariance. This finding indicates that the AI literacy construct is conceptualized similarly across both academic disciplines, and that factor loadings between latent variables and observed indicators are equivalent between groups, thereby confirming the appropriateness of multi-group analyses. However, the models with additional equality constraints on factor covariances and error variances (Models 2–4) fit significantly worse than the unconstrained model, indicating that invariance did not extend beyond the metric level.

Table 6.

Multi-group confirmatory factor analysis results.

Model	$\chi^{2}$	$df$	TLI	CFI	RMSEA	$\Delta\chi^{2}$	$\Delta df$	$p$
Unconstrained model	333.640	224	0.909	0.925	0.055
Constrained model 1^a	352.215	237	0.909	0.921	0.054	18.574	13	0.137
Constrained model 2^b	354.135	234	0.904	0.918	0.056	20.494	10	<0.05
Constrained model 3^c	432.218	264	0.881	0.885	0.062	98.577	40	<0.001
Constrained model 4^d	457.677	282	0.884	0.879	0.062	124.037	58	<0.001

Constrained model 1: Factor loadings constrained equal across groups.

Constrained model 2: Covariances constrained equal across groups.

Constrained model 3: Both factor loadings and covariances constrained equal across groups.

Constrained model 4: Factor loadings, covariances, and error variances constrained equal across groups.
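The χ² difference tests in Table 6 compare each constrained model with the unconstrained model. The sketch below recomputes the difference statistics and p-values for Models 1 and 2 from the reported values. The helper `chi2_sf` is an illustrative pure-Python implementation of the chi-square upper-tail probability, written as a series expansion of the regularized incomplete gamma function; it is not the routine used by the authors’ software.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x/2).
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_difference(chi2_constrained, df_constrained, chi2_free, df_free):
    """Return (delta chi-square, delta df, p-value) for two nested models."""
    d_chi2 = chi2_constrained - chi2_free
    d_df = df_constrained - df_free
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


# Values as reported in Table 6.
metric = chi2_difference(352.215, 237, 333.640, 224)      # Model 1 vs unconstrained
covariance = chi2_difference(354.135, 234, 333.640, 224)  # Model 2 vs unconstrained
print("Model 1: d_chi2=%.3f, d_df=%d, p=%.3f" % metric)
print("Model 2: d_chi2=%.3f, d_df=%d, p=%.3f" % covariance)
```

A non-significant difference for Model 1 is what supports equal factor loadings across groups, while the significant difference for Model 2 signals that the added constraints worsen fit.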

Group differences in validated scales

Following confirmation of configural and metric invariance, independent samples t-tests were conducted to examine AI literacy differences across key demographic variables.

Academic discipline (Engineering vs Humanities/Social Sciences)

Results revealed significant differences across all four AI literacy factors (Table 7). Engineering students demonstrated higher scores than humanities/social science students in AI Fundamental Knowledge (M = 3.615 vs M = 2.871, t = −7.652, p < 0.001, Cohen’s d = 0.842), AI Performance Evaluation (M = 3.233 vs M = 2.591, t = −5.968, p < 0.001, Cohen’s d = 0.651), and AI Practical Application (M = 3.729 vs M = 3.260, t = −5.421, p < 0.001, Cohen’s d = 0.596). A smaller but significant difference was also found in AI Impact Assessment (M = 4.027 vs M = 3.829, t = −2.824, p < 0.01, Cohen’s d = 0.311). Effect sizes ranged from small to large, with AI Fundamental Knowledge showing the largest difference (d = 0.842) and AI Impact Assessment showing the smallest difference (d = 0.311) between groups.

Table 7.

Independent t-test results by academic disciplines.

Factor	Group	N	Mean	SD	t	Cohen’s d
AI Fundamental Knowledge	Humanities & Social Sciences	160	2.871	0.804	−7.652***	0.842
AI Fundamental Knowledge	Engineering	171	3.615	0.950	−7.652***	0.842
AI Impact Assessment	Humanities & Social Sciences	160	3.829	0.643	−2.824**	0.311
AI Impact Assessment	Engineering	171	4.027	0.632	−2.824**	0.311
AI Performance Evaluation	Humanities & Social Sciences	160	2.591	0.852	−5.968***	0.651
AI Performance Evaluation	Engineering	171	3.233	1.097	−5.968***	0.651
AI Practical Application	Humanities & Social Sciences	160	3.260	0.795	−5.421***	0.596
AI Practical Application	Engineering	171	3.729	0.777	−5.421***	0.596

Note. *p < .05, **p < .01, ***p < .001.
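The effect sizes in Table 7 can be approximately reproduced from the group means, group sizes, and the dispersion column. The sketch below assumes the pooled-standard-deviation form of Cohen’s d and treats the dispersion values as standard deviations, an assumption that reproduces the reported effect sizes to within rounding.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d using the pooled standard deviation of two independent groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Engineering (n = 171) vs Humanities & Social Sciences (n = 160), values from Table 7.
d_knowledge = cohens_d(3.615, 0.950, 171, 2.871, 0.804, 160)
d_impact = cohens_d(4.027, 0.632, 171, 3.829, 0.643, 160)
print(f"AI Fundamental Knowledge: d = {d_knowledge:.3f}")
print(f"AI Impact Assessment:     d = {d_impact:.3f}")
```

By Cohen’s conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), the first value is large and the second is small to medium.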

AI course experience

Similarly, students with AI course experience showed significantly higher proficiency across all AI literacy dimensions compared to those without such experience (Table 8). The most pronounced differences were observed in AI Fundamental Knowledge and AI Performance Evaluation. Significant differences were also evident in AI Practical Application and AI Impact Assessment.

Table 8.

Independent t-test results by AI course experience.

Factor	Group	N	Mean	SD	t	Cohen’s d
AI Fundamental Knowledge	With AI course	192	3.664	0.867	10.674***	1.068
AI Fundamental Knowledge	Without AI course	208	2.767	0.811	10.674***	1.068
AI Impact Assessment	With AI course	192	4.040	0.607	3.992***	0.400
AI Impact Assessment	Without AI course	208	3.793	0.628	3.992***	0.400
AI Performance Evaluation	With AI course	192	3.288	1.023	8.156***	0.821
AI Performance Evaluation	Without AI course	208	2.501	0.893	8.156***	0.821
AI Practical Application	With AI course	192	3.747	0.746	7.291***	0.730
AI Practical Application	Without AI course	208	3.185	0.792	7.291***	0.730

Note. ***p < .001.

Academic level (undergraduate vs graduate)

Regarding academic level differences, graduate students demonstrated significantly higher AI literacy in specific domains (Table 9). Differences were found in AI Fundamental Knowledge and AI Performance Evaluation, with graduate students outperforming undergraduates in both areas. However, no significant differences were observed in AI Impact Assessment or AI Practical Application.

Table 9.

Independent t-test results by academic level.

Factor	Group	N	Mean	SD	t	Cohen’s d
AI Fundamental Knowledge	Undergraduate	219	3.037	0.872	−3.747***	0.381
AI Fundamental Knowledge	Graduate	181	3.393	1.005	−3.747***	0.381
AI Impact Assessment	Undergraduate	219	3.905	0.632	−0.231	-
AI Impact Assessment	Graduate	181	3.920	0.629	−0.231	-
AI Performance Evaluation	Undergraduate	219	2.767	0.914	−2.344*	0.241
AI Performance Evaluation	Graduate	181	3.015	1.152	−2.344*	0.241
AI Practical Application	Undergraduate	219	3.417	0.769	−1.029	-
AI Practical Application	Graduate	181	3.501	0.877	−1.029	-

Note. *p < .05, ***p < .001.

Research model validation

Analysis with 182 participants providing GPA information demonstrated strong internal consistency (Cronbach’s α: Academic self-efficacy = 0.889, Creativity = 0.888). Prior to conducting structural equation modeling (SEM), correlation analysis among latent factors was performed. The results revealed correlations of 0.428 between AI literacy and academic self-efficacy, 0.234 between AI literacy and creativity, and 0.537 between academic self-efficacy and creativity. All correlation coefficients were below 0.80, indicating no multicollinearity concerns and confirming that the basic assumptions for SEM analysis were satisfied. The factors demonstrated appropriate levels of intercorrelation, supporting the feasibility of structural modeling. Academic self-efficacy and creativity exhibited the strongest correlation, suggesting these constructs are closely related. These preliminary findings supported the subsequent structural equation modeling analysis by establishing the initial statistical relationships between AI literacy, academic self-efficacy, creativity, and academic achievement.

The hypothesized structural model was tested using maximum likelihood estimation. Model fit indices indicated acceptable fit: χ² (40) = 88.887, p < 0.001, SRMR = 0.075, CFI = 0.927, TLI = 0.900, RMSEA = 0.082 (90% CI: 0.059–0.105). The measurement model included three factors for academic self-efficacy (task difficulty preference, self-regulatory efficacy, and confidence) and three factors for creativity (fluency, flexibility, and originality), based on their factor loadings and theoretical foundations.

Model specification testing examined whether the relationship between AI literacy and creativity operated through an indirect pathway via academic self-efficacy. The indirect specification was retained, which is consistent with theoretical frameworks suggesting that technology literacy is associated with creative capabilities through enhanced self-confidence and competence beliefs. Based on the model specification testing and fit indices, the final structural model indicates that AI literacy is indirectly related to creativity through academic self-efficacy, while maintaining direct effects on both academic self-efficacy and academic achievement.

Path analysis results for the direct effects are presented in Table 10 and Figure 4. All five direct path hypotheses were supported. AI literacy was positively associated with both academic self-efficacy ($\beta = .442$) and academic achievement ($\beta = .190$). Academic self-efficacy was positively associated with creativity ($\beta = .539$) and academic achievement ($\beta = .295$). Creativity was negatively associated with academic achievement ($\beta = -.219$).

Table 10.

Direct path analysis results.

Path	B	β	Standard error	C.R. (t)
AI Literacy → Academic self-efficacy	0.374	0.442	0.096	3.895***
Academic self-efficacy → Creativity	0.787	0.539	0.092	2.002***
AI Literacy → Academic achievement	0.185	0.190	0.096	3.895*
Academic self-efficacy → Academic achievement	0.341	0.295	0.082	−2.108*
Creativity → Academic achievement	−0.173	−0.219	0.174	4.533*

Note. *p < .05, ***p < .001.

Figure 4.

Research model result.

The structural equation modeling results provided support for all five hypothesized relationships, though one finding emerged in an unexpected direction (Table 11). AI literacy showed a significant positive association with academic self-efficacy, confirming H1 and aligning with self-efficacy theory (Bandura, 1997), which posits that perceived competence in specific domains is associated with greater confidence in related academic tasks. Academic self-efficacy showed a strong positive relationship with creativity, supporting H2 and consistent with Tierney and Farmer’s (2002) findings that self-efficacy beliefs are important correlates of creative performance. This relationship is consistent with the theoretical proposition that confidence in one’s academic abilities facilitates creative thinking processes. Both AI literacy and academic self-efficacy were positively associated with academic achievement, confirming H3 and H4. The self-efficacy-achievement relationship aligns with extensive research demonstrating this well-established connection (Zimmerman, 2000). Creativity showed a significant negative association with academic achievement, supporting H5, though the direction differed from theoretical expectations. This counterintuitive finding may reflect tensions between creative thinking processes and conventional assessment methods.

Table 11.

Hypothesis results.

Hypothesis	Path	Results
H1	AI Literacy has an effect on Academic self-efficacy.	Accepted
H2	Academic self-efficacy has an effect on Creativity.	Accepted
H3	AI Literacy has an effect on Academic achievement.	Accepted
H4	Academic self-efficacy has an effect on Academic achievement.	Accepted
H5	Creativity has an effect on Academic achievement.	Accepted

Given the negative association between creativity and academic achievement, additional analyses were conducted to examine the indirect pathways through which AI literacy is statistically associated with academic outcomes. Mediation analysis allows for decomposition of total effects into direct and indirect components, providing insights into whether the association between AI literacy and academic outcomes is accounted for by psychological mechanisms such as self-efficacy. Understanding these mediating pathways is particularly important for educational practice, as it can inform whether interventions should focus on developing AI technical skills directly or on building academic confidence as a foundation for effective AI integration.

To examine indirect effects, bias-corrected bootstrap mediation analysis with 500 resamples was conducted. Phantom variables were employed to decompose complex indirect pathways and examine specific mediation effects within the structural model. Results are presented in Table 12.
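To illustrate the logic of a bias-corrected bootstrap confidence interval for an indirect effect, a simplified sketch follows. It is not the AMOS procedure used in the study: it assumes a single-mediator X → M → Y model estimated with ordinary least squares rather than a latent-variable model with phantom variables, and it runs on simulated data. The function names, seeds, and simulated coefficients are all hypothetical.

```python
import random
from statistics import NormalDist, mean


def indirect_effect(x, m, y):
    """a*b for X -> M -> Y, where b is the slope of Y on M controlling for X."""
    x_bar, m_bar, y_bar = mean(x), mean(m), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    smm = sum((mi - m_bar) ** 2 for mi in m)
    sxm = sum((xi - x_bar) * (mi - m_bar) for xi, mi in zip(x, m))
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    smy = sum((mi - m_bar) * (yi - y_bar) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)
    return a * b


def bc_bootstrap_ci(x, m, y, n_boot=500, level=0.95, seed=0):
    """Bias-corrected percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimate = indirect_effect(x, m, y)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        boots.append(indirect_effect([x[i] for i in idx],
                                     [m[i] for i in idx],
                                     [y[i] for i in idx]))
    boots.sort()
    normal = NormalDist()
    # Bias correction: how far the bootstrap distribution is shifted from the estimate.
    prop_below = sum(b < estimate for b in boots) / n_boot
    prop_below = min(max(prop_below, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = normal.inv_cdf(prop_below)
    z_crit = normal.inv_cdf(1 - (1 - level) / 2)
    lower_p = normal.cdf(2 * z0 - z_crit)
    upper_p = normal.cdf(2 * z0 + z_crit)
    lower = boots[min(n_boot - 1, int(lower_p * n_boot))]
    upper = boots[min(n_boot - 1, int(upper_p * n_boot))]
    return estimate, lower, upper


# Simulated data with a true indirect effect of 0.5 * 0.4 = 0.2 (illustrative only).
sim = random.Random(42)
x = [sim.gauss(0, 1) for _ in range(182)]
m = [0.5 * xi + sim.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.2 * xi + sim.gauss(0, 1) for xi, mi in zip(x, m)]
estimate, lower, upper = bc_bootstrap_ci(x, m, y)
print(f"indirect = {estimate:.3f}, 95% BC CI [{lower:.3f}, {upper:.3f}]")
```

An indirect effect is treated as statistically significant when the resulting interval excludes zero, which is the criterion applied to the intervals reported in this section.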

Table 12.

Mediation analysis results for AI literacy and academic achievement.

Path	Total effect	Direct effect	Total indirect effect	95% CI lower (LLCI)	95% CI upper (ULCI)
AI Literacy → Academic Achievement	0.268**	0.190*	0.078	−0.004	0.284

Note. *p < 0.05, **p < 0.01.

The mediation analysis revealed that the total indirect effect of AI literacy on academic achievement was not statistically significant ($\beta = .078$, 95% CI [−.004, .284]), while the indirect effect of AI literacy on creativity through academic self-efficacy was statistically significant ($\beta = .238$, 95% CI [.083, .394]).
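The reported indirect effects can be traced back to the standardized path coefficients. The sketch below assumes the usual product-of-coefficients rule, in which each specific indirect effect is the product of the paths along its route and the total indirect effect is the sum of the specific effects. The variable names are illustrative.

```python
# Standardized direct paths as reported in the path analysis results.
literacy_to_efficacy = 0.442
efficacy_to_creativity = 0.539
literacy_to_achievement = 0.190
efficacy_to_achievement = 0.295
creativity_to_achievement = -0.219

# Specific indirect effects of AI literacy on academic achievement.
via_efficacy = literacy_to_efficacy * efficacy_to_achievement
via_efficacy_and_creativity = (literacy_to_efficacy
                               * efficacy_to_creativity
                               * creativity_to_achievement)

total_indirect = via_efficacy + via_efficacy_and_creativity
total_effect = literacy_to_achievement + total_indirect

# Indirect effect of AI literacy on creativity through academic self-efficacy.
literacy_to_creativity_indirect = literacy_to_efficacy * efficacy_to_creativity

print(f"via self-efficacy:            {via_efficacy:.3f}")
print(f"via self-efficacy, creativity: {via_efficacy_and_creativity:.3f}")
print(f"total indirect:               {total_indirect:.3f}")
print(f"total effect:                 {total_effect:.3f}")
print(f"literacy -> creativity:       {literacy_to_creativity_indirect:.3f}")
```

The positive route through self-efficacy alone is partly cancelled by the negative sequential route through creativity, which is why the total indirect effect on achievement is small.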

Discussion and conclusion

This study developed and validated a multidimensional AI literacy scale for higher education students, addressing a gap in the assessment of AI competencies within academic contexts. The systematic development process yielded a 17-item instrument measuring four dimensions: AI fundamental knowledge, AI impact assessment, AI performance evaluation, and AI practical application. The scale demonstrated adequate psychometric properties and revealed significant relationships between AI literacy and academic variables.

The four-factor structure emerged consistently across exploratory and confirmatory factor analyses, supporting the theoretical distinction between knowledge-based and application-oriented AI competencies. This finding aligns with existing digital literacy frameworks that differentiate between technical knowledge and practical application skills (Wang et al., 2023b), while extending these concepts to AI-specific contexts. The measurement invariance testing confirmed that the AI literacy construct operates consistently across engineering and humanities students, indicating that the underlying factor structure remains stable despite disciplinary differences. This finding provides confidence that the scale measures the same construct across different academic populations, though the significant mean differences suggest that absolute competency levels vary substantially between groups. The internal consistency coefficients (α = 0.791–0.873), together with construct validity, indicate that the scale is suitable for research purposes. The average variance extracted values (0.553–0.620) meet acceptable thresholds, indicating that each factor captures meaningful shared variance among its constituent items.

As expected, engineering students demonstrated higher AI literacy scores across all dimensions, providing evidence for the scale’s discriminant validity. However, the magnitude of the differences varied across dimensions, with the largest gaps in fundamental knowledge (Cohen’s d = 0.842) and AI performance evaluation (Cohen’s d = 0.651), suggesting that these areas are most influenced by technical training background. The significant differences by AI course experience provide further evidence of discriminant validity, as students with formal AI education demonstrated higher competencies across all dimensions. This pattern supports the theoretical expectation that formal instruction contributes to AI literacy development. The pattern of differences, with the largest effects in fundamental knowledge and performance evaluation, moderate effects in practical application, and the smallest effects in impact assessment, suggests that technical training primarily enhances analytical capabilities, while evaluative and applied skills may develop through more diverse pathways. This finding warrants further investigation to understand how AI literacy components develop across different educational experiences.
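The effect sizes above are standardized mean differences. As a hedged illustration, the sketch below computes Cohen's d with the pooled standard deviation, which is one common formulation; the text does not state which variant was used, and the scores shown are hypothetical.

```python
import math
import statistics as st


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * st.variance(group_a)
                  + (nb - 1) * st.variance(group_b)) / (na + nb - 2)
    return (st.fmean(group_a) - st.fmean(group_b)) / math.sqrt(pooled_var)


# Hypothetical subscale scores, for illustration only.
engineering = [4.2, 3.8, 4.5, 4.0, 3.9, 4.4]
humanities = [3.6, 3.9, 3.2, 3.8, 3.5, 3.4]
print(f"d = {cohens_d(engineering, humanities):.3f}")
```

Because d is expressed in standard deviation units, values such as 0.842 and 0.651 can be compared directly across subscales that differ in raw variability.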

The structural equation modeling results revealed that AI literacy is positively associated with academic self-efficacy (β = 0.442, p < 0.001), supporting social cognitive theory’s propositions about the relationship between competence and confidence (Bandura, 1997). This finding extends previous research on technology self-efficacy by demonstrating the relationship specifically within AI contexts. Academic self-efficacy’s positive relationship with creativity (β = 0.539, p < 0.001) aligns with existing research on self-efficacy as an antecedent to creative performance (Tierney and Farmer, 2002). The indirect relation between AI literacy and creativity through academic self-efficacy suggests that these constructs are statistically linked through academic self-efficacy rather than directly. The unexpected negative relationship between creativity and academic achievement (β = −0.219, p < 0.05) requires careful interpretation. This finding may reflect tensions between creative approaches to learning and conventional assessment methods, or it could indicate suppression effects within the model. The negative coefficient emerged despite positive zero-order correlations, suggesting that creativity’s relationship with achievement becomes negative when controlling for AI literacy and self-efficacy.
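The sign change described above can be reproduced with a small simulation. The sketch below is an illustration only: it uses synthetic data with a built-in negative direct path, controls for a single shared predictor (the study's model also included AI literacy), and relies on ordinary least squares rather than SEM. All variable and function names are hypothetical.

```python
import random
import statistics as st


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = st.fmean(u), st.fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def pearson_r(u, v):
    """Zero-order Pearson correlation."""
    return cross(u, v) / (cross(u, u) * cross(v, v)) ** 0.5


def partial_slope(c, s, g):
    """Coefficient of c when g is regressed on c and s (OLS)."""
    scc, sss, scs = cross(c, c), cross(s, s), cross(c, s)
    scg, ssg = cross(c, g), cross(s, g)
    return (scg * sss - scs * ssg) / (scc * sss - scs ** 2)


# Synthetic data with a built-in negative direct path (-0.3).
rng = random.Random(7)
n = 2000
efficacy = [rng.gauss(0, 1) for _ in range(n)]
creativity = [s + rng.gauss(0, 0.7) for s in efficacy]
gpa = [s - 0.3 * c + rng.gauss(0, 0.5)
       for s, c in zip(efficacy, creativity)]

r = pearson_r(creativity, gpa)
b = partial_slope(creativity, efficacy, gpa)
print(f"zero-order r(creativity, gpa) = {r:.3f}")
print(f"partial slope of creativity   = {b:.3f}")
```

In this setup the two variables correlate positively because both depend on the shared predictor, yet the coefficient for creativity turns negative once that predictor is held constant, which mirrors the pattern reported for creativity and academic achievement.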

The validated four-factor structure contributes to the theoretical understanding of AI literacy. Its novelty lies in its focus on higher education students, with items designed specifically for university and graduate-level contexts, which distinguishes it from previous general AI literacy scales. The distinction between fundamental knowledge and practical application aligns with broader information literacy frameworks while addressing AI-specific requirements. The indirect-only effect of AI literacy on creativity through academic self-efficacy extends social cognitive theory to technology-enhanced learning contexts, demonstrating that academic self-efficacy serves as a key statistical mediator linking technological competencies with other academic outcomes. This finding suggests that AI literacy development involves psychological as well as technical dimensions.

Limitations and future directions

Several limitations constrain the interpretation and generalizability of these findings. The study is context-specific, as data were collected from higher education students in a single country; it relies on self-reported measures; and its cross-sectional design precludes causal inferences about the relationships among constructs. Although the overall sample size was adequate for estimating the proposed measurement and structural models, the split-sample design for exploratory and confirmatory factor analyses and the group sizes used in the multigroup CFA represent relatively modest samples for complex latent variable modeling. Consequently, some parameter estimates and invariance tests may be sensitive to sampling variability, particularly for cross-group comparisons, and should be interpreted with appropriate caution. Additionally, cultural factors may influence students’ AI literacy perceptions and self-efficacy, suggesting that cross-cultural validation is needed before applying the scale in different educational contexts. The reliance on self-report measures, while appropriate for assessing perceived competencies, may introduce response bias and does not capture objective performance capabilities. The unexpected creativity-achievement relationship highlights the need for more nuanced measurement approaches that can distinguish between different types of creative thinking and academic performance. The focus on general AI literacy rather than domain-specific applications limits understanding of how AI competencies operate within specific disciplinary contexts. While group differences were statistically significant, variations in prior AI experience among students may have contributed to these differences. Future research could further control for such factors and examine domain-specific AI literacy measures to provide more targeted insights for educational applications.

The validated scale provides a foundation for longitudinal research examining AI literacy development trajectories and the stability of the four-factor structure over time. Investigation of the relationship between self-reported AI literacy and objective performance measures would strengthen construct validity evidence. Cross-cultural validation studies could establish the scale’s applicability across different educational systems and cultural contexts. Additionally, research examining the effectiveness of different instructional approaches for developing specific AI literacy dimensions could inform evidence-based program development. The unexpected creativity-achievement relationship warrants further investigation using alternative creativity measures and different academic achievement indicators to understand the nature of this relationship and its implications for educational practice.

Implications and conclusions

This research makes several important contributions to AI literacy measurement and understanding in higher education contexts. The four-factor AI literacy scale, designed specifically for university and graduate learners, addresses a critical assessment gap in higher education. The proposed scale incorporates items reflecting students’ real experiences, disciplinary differences, and AI-related course exposure. The structure is empirically grounded through factor analyses and topic modeling of international AI discourse. The empirical findings reveal significant patterns in AI literacy development. Notable group differences across academic majors, course experience, and academic levels demonstrate that AI competence develops unevenly across student populations. Engineering students’ pronounced advantages in technical domains, combined with smaller differences in ethical reasoning, suggest that disciplinary training influences specific aspects of AI literacy while other dimensions develop more independently. The structural relationships identified through mediation analysis offer important theoretical insights. AI literacy is statistically linked to academic outcomes mainly through psychological mediators rather than via direct paths, and academic self-efficacy accounted for the association between AI literacy and creativity. This finding challenges assumptions about direct skill transfer and highlights the importance of confidence-building in technology education.

These results have practical implications for educational design and implementation. The findings suggest that effective AI education may require differentiated approaches that account for students’ disciplinary backgrounds while simultaneously addressing both technical competencies and psychological factors. The critical role of academic self-efficacy indicates that AI education programs could benefit from integrating confidence-building strategies alongside skill development to maximize learning outcomes. This research provides both a validated measurement tool and evidence-based theoretical insights that advance understanding of AI literacy in educational contexts. The findings contribute to the growing knowledge base needed to develop effective AI education programs that prepare students for successful engagement with AI technologies in their academic and professional futures. The validated scale offers researchers and educators a reliable instrument for assessing AI literacy while the structural findings provide theoretical foundations for understanding how AI competencies relate to broader academic outcomes.

Footnotes

Acknowledgements

This article is the revision of the first author’s doctoral dissertation from Yonsei University.

ORCID iDs

Keun Young Kang

Ji-Hong Park

Ethical considerations

The study was approved by Yonsei University Graduate School after submitting a declaration of ethical conduct in research. The institution does not require ethical approval for the submission of a thesis.

Consent to participate

Written informed consent was obtained from all participants prior to participation in the survey.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Yonsei University Humanities and Social Sciences Field Creative Research Fund of 2024-22-0576.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author biographies

Keun Young Kang is an instructor in the Graduate School of Education at Yonsei University, S. Korea. She holds a PhD in library and information science from Yonsei University. Her research interests include AI literacy, human-AI collaboration, and research methodology. Her research has been published in journals such as Journal of the Korean Society for Library and Information Science, PLOS One, Journal of the Association for Information Science and Technology, Frontiers in Research Metrics and Analytics, and BMC Bioinformatics.

Ji-Hong Park is a full professor at the Department of Library and Information Science of Yonsei University, Seoul, S. Korea. His areas of research interest include AI-driven information services, scholarly communication, and social networks. He holds a PhD in information science and technology from the School of Information Studies at Syracuse University, USA, an MS in information management (currently, information systems) from Syracuse University, and a BA in library and information science from Yonsei University. He has published several academic articles in journals such as Journal of Librarianship and Information Science, Journal of Documentation, Library & Information Science Research, Journal of the Association for Information Science and Technology, Information Processing & Management, Journal of Information Science, LIBRI, and Electronic Library.

References

Abbas, Hussain and Rasool (2019) Digital literacy effect on the academic performance of students at higher education level in Pakistan. Global Social Sciences Review IV(I): 108–116.

Amabile (1988) A model of creativity and innovation in organizations. In: Staw and Cummings (eds) Research in Organizational Behavior, vol. 10. JAI Press, pp.123–167.

Amabile (1996) Creativity in Context: Update to the Social Psychology of Creativity. Westview Press.

Annapureddy, Fornaroli and Gatica-Perez (2025) Generative AI literacy: Twelve defining competencies. Digital Government: Research and Practice 6(1): 1–21.

Asio JMR (2024) Artificial intelligence (AI) literacy and academic performance of tertiary level students: a preliminary analysis. Social Sciences, Humanities and Education Journal 5(2): 309–321.

Bandura (1977) Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review 84(2): 191–215.

Bandura (1997) Self-efficacy. Cambridge University Press.

Bawden (2001) Information and digital literacies: A review of concepts. Journal of Documentation 57(2): 218–259.

Buckingham (2015) Defining digital literacy - What do young people need to know about digital media? Nordic Journal of Digital Literacy 10: 21–35.

Carolus, Koch, Straka, et al. (2023) MAILS - meta AI literacy scale: Development and testing of an AI literacy questionnaire based on well-founded competency models and psychological change- and meta-competencies. Computers in Human Behavior: Artificial Humans 1(2): 100014.

Chan CKY and Tsi LHY (2024) Will generative AI replace teachers in higher education? A study of teacher and student perceptions. Studies in Educational Evaluation 83: 101395.

Chen, Tallant and Selig (2025) Exploring generative AI literacy in higher education: Student adoption, interaction, evaluation and ethical perceptions. Information and Learning Sciences 126(1–2): 132–148.

Choi (2014) The effect of culture and arts based experience on individual creativity and performance. PhD Thesis, Ewha Womans University, South Korea.

Floridi, Cowls, Beltrametti, et al. (2018) AI4People - An ethical framework for a good AI society: Opportunities, risks, principles, and recommendations. Minds and Machines 28(4): 689–707.

Fornell and Larcker (1981) Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research 18(1): 39–50.

Grootendorst (2022) BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794.

Guilford (1950) Creativity. American Psychologist 5(9): 444–454.

Guilford (1967) The Nature of Human Intelligence. McGraw-Hill.

Guilford (1968) Intelligence has three facets: There are numerous intellectual abilities, but they fall neatly into a rational system. Science 160(3828): 615–620.

Hair, Risher, Sarstedt, et al. (2019) When to use and how to report the results of PLS-SEM. European Business Review 31(1): 2–24.

Hobeika, Hallit, Malaeb, et al. (2024) Multinational validation of the Arabic version of the Artificial Intelligence Literacy Scale (AILS) in university students. Cogent Psychology 11(1): 2395637.

Hornberger, Bewersdorff and Nerdel (2023) What do university students know about Artificial Intelligence? Development and validation of an AI literacy test. Computers and Education: Artificial Intelligence 5: 1–12.

Hornberger, Bewersdorff, Schiff, et al. (2025) A multinational assessment of AI literacy among university students in Germany, the UK, and the US. Computers in Human Behavior: Artificial Humans 4: 100132.

Hu and Bentler (1999) Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal 6(1): 1–55.

Jin, Martinez-Maldonado, Gašević, et al. (2025) GLAT: The generative AI literacy assessment test. Computers and Education: Artificial Intelligence 9: 100436.

Karaca, Çalışkan and Demir (2021) Medical artificial intelligence readiness scale for medical students (MAIRS-MS) - Development, validity and reliability study. BMC Medical Education 21(1): 112–119.

Khoso, Ali and Aslam (2023) Use of Chat-GPT and AI tools by undergraduates: Students and teachers’ perspective. Spry Contemporary Educational Practices 2(2): 1–24.

Kim and Park (2001) Construction and validation of academic self-efficacy scale [In Korean]. Korean Journal of Educational Research 39(1): 95–123.

Koch, Carolus, Wienrich, et al. (2024) Meta AI literacy scale: Further validation and development of a short version. Heliyon 10(21): e39686.

Laupichler, Aster, Haverkamp, et al. (2023) Development of the “Scale for the assessment of non-experts’ AI literacy” - An exploratory factor analysis. Computers in Human Behavior Reports 12: 100338.

Lintner (2024) A systematic review of AI literacy scales. npj Science of Learning 9(1): 50.

Liu, Zhang and Wei (2025) Generative artificial intelligence literacy: Scale development and its effect on job performance. Behavioral Sciences 15(6): 811.

Livingstone (2004) What is media literacy? Intermediair 32(3): 18–20.

Long and Magerko (2020) What is AI literacy? Competencies and design considerations. In: Proceedings of the 2020 CHI conference on human factors in computing systems, Virtual conference, pp.1–16. Hawaii: Association for Computing Machinery.

Lynn (1986) Determination and quantification of content validity. Nursing Research 35(6): 382–385.

Mansoor HMH, Bawazir, Alsabri, et al. (2024) Artificial intelligence literacy among university students - A comparative transnational survey. Frontiers in Communication 9: 1478476.

Mikalef and Gupta (2021) Artificial intelligence capability: Conceptualization, measurement calibration, and empirical study on its impact on organizational creativity and firm performance. Information & Management 58(3): 103434.

Montenegro-Rueda, Fernández-Cerero, Fernández-Batanero, et al. (2023) Impact of the implementation of ChatGPT in education: A systematic review. Computers 12(8): 1–13.

Ng DTK, Leung JKL, Chu SKW, et al. (2021) Conceptualizing AI literacy: An exploratory review. Computers and Education: Artificial Intelligence 2: 1–11.

O’Dea, Ng DTK, O’Dea, et al. (2026) Factors affecting university students’ generative AI literacy: Evidence and evaluation in the UK and Hong Kong contexts. Policy Futures in Education 24: 13–34. DOI: 10.1177/14782103241287401

Stanford University Human-Centered Artificial Intelligence (2021) Artificial Intelligence Index Report 2021. Report, Stanford University Human-Centered Artificial Intelligence.

Street (2003) What’s ‘new’ in new literacy studies? Critical approaches to literacy in theory and practice. Current Issues in Comparative Education 5(2): 77–91.

Tagare and Karki (2025) K-12 teachers’ ethical competencies for AI literacy: Insights from a systematic literature review. Computers & Education 239: 105435.

Tierney and Farmer (2002) Creative self-efficacy: Its potential antecedents and relationship to creative performance. Academy of Management Journal 45(6): 1137–1148.

Torrance (1965) Scientific views of creativity and factors affecting its growth. Daedalus 94: 663–681.

Walter (2024) Embracing the future of Artificial Intelligence in the classroom: the relevance of AI literacy, prompt engineering, and critical thinking in modern education. International Journal of Educational Technology in Higher Education 21(15): 1–29.

Wang, Rau PLP and Yuan (2023a) Measuring user competence in using artificial intelligence: Validity and reliability of artificial intelligence literacy scale. Behaviour & Information Technology 42(9): 1324–1337.

Wang, Liu, Zhao, et al. (2023b) Review of large vision models and visual prompt engineering. Meta-Radiology 1(3): 1–13.

Wang, Sun and Chen (2023c) Effects of higher education institutes’ artificial intelligence capability on students’ self-efficacy, creativity and learning performance. Education and Information Technologies 28(5): 4919–4939.

Yang, Zhang, Sun, et al. (2025) Navigating the landscape of AI literacy education: Insights from a decade of research (2014–2024). Humanities and Social Sciences Communications 12(1): 374.

Yuan, Tsai HYS and Chen (2024) Charting competence: A holistic scale for measuring proficiency in artificial intelligence literacy. Journal of Computer Assisted Learning 62(7): 1455–1484.

Zhang, Zhao, Zhou, et al. (2024) Do you have AI dependency? The roles of academic self-efficacy, academic stress, and performance expectations on problematic AI usage behavior. International Journal of Educational Technology in Higher Education 21(1): 34.

Zimmerman (2000) Self-efficacy: An essential motive to learn. Contemporary Educational Psychology 25(1): 82–91.