Abstract
This pilot study explored how preservice teachers (PSTs) use generative artificial intelligence (AI) to create disability reference handouts by comparing a ChatGPT group to a traditional search group. Thirty preservice general and special educators enrolled in an introductory special education course at a Midwestern U.S. university were assigned to one of two conditions. Handouts were evaluated with a rubric aligned to a widely used textbook, assessing definitions, characteristics, instructional challenges, and strategies. ChatGPT handouts were at least as accurate as traditional ones. Participants reported positive experiences using AI. Findings suggest that AI can support PSTs’ knowledge acquisition, warranting further study of specialized AI in preservice education.
The public launch of ChatGPT in late 2022 marked a critical moment for K–12 and higher education. Within a very short time, generative artificial intelligence (AI) shifted from a novelty (e.g., asking it to write a Spice Girls-style song about your dog) to a tool that permeates nearly every aspect of educators’ daily tasks (e.g., supporting lesson planning or helping write parent emails). During the 2023 to 2024 school year, 60% of U.S. K–12 educators used AI to draft lesson plans, differentiate texts, and grade student work (Gallup & Walton Family Foundation, 2025). The following year, 84% used AI and planned to expand usage in the future (Policar & Morgan-Bortone, 2024).
As AI tools become increasingly available to educators and learners, the field must grapple with whether these tools support authentic, practical learning without compromising educational rigor and accuracy. AI-powered tools offer opportunities to personalize content, adapt materials, and support learner variability. These features are especially relevant to students with learning disabilities (SLDs), who often experience challenges with decoding, organizing, and retaining information (Fangwi & Fedelis, 2025). AI can simplify text, repurpose content, and scaffold tasks, strategies that are consistent with high-leverage practices for SLDs (McLeskey et al., 2017) and are central to Universal Design for Learning (CAST, 2018).
Preservice Teacher Preparation for Students With Learning Disabilities
While emerging technologies, such as AI, offer new possibilities for improving instruction and supporting students with disabilities (SWDs), their impact hinges on the understanding of student needs that educators develop during their preservice preparation. For preservice teachers (PSTs), building this understanding begins with high-quality instruction, yet research continues to highlight gaps in how preservice programs address the strengths and needs of SWDs, including SLDs (McLeskey et al., 2017). Across the country, many PSTs complete their programs with limited coursework or practical experience related to teaching SWDs. When it comes to literacy or age-appropriate instruction, training is often too broad or overly theoretical (Byrd & Alexander, 2020). As a result, PSTs often enter the classroom with misconceptions about how disabilities affect learning and how to adapt instruction effectively. This is particularly problematic for SLDs, where challenges in reading, writing, memory, and attention profoundly affect access to the general curriculum (Kim, 2022).
AI’s capacity to personalize learning and tailor instruction to individual students’ needs provides support that aligns directly with research on effective instruction for SLDs, including explicit instruction, visual cues, and guided practice (Turnbull et al., 2020). This convergence places both PST education and K–12 instruction at a pivotal moment, as many in-service educators face limitations when planning and designing for SWDs. AI could serve as an initial planning partner as educators consider appropriate accommodations and modifications.
Theoretical Framework
Extending Shulman’s (1986) Pedagogical Content Knowledge (PCK), the Technological Pedagogical Content Knowledge (TPACK) framework (Mishra & Koehler, 2006) posits that effective technology integration requires a complex, nuanced understanding of the relationships between technology, pedagogy, and content. With over 20 years of research and more than 1,200 journal articles, TPACK has been found to improve digital competency and learner outcomes (Aslan et al., 2025; Moreno et al., 2019; Su, 2023; Tseng et al., 2022). Research continues to show that technology for its own sake fails to promote effective use (Voithofer & Nelson, 2021). Rather than treating technology, pedagogy, and content as isolated silos, the framework illustrates that technology is most effective when used alongside evidence-based pedagogy within a specific content context.
Drawing on the TPACK framework, the Effective Use of AI in Educator Preparation Framework (EUAI; see Figure 1) illustrates that effective AI integration in special education is predicated on two foundational prerequisites: basic AI literacy, defined as the ability to understand, use, evaluate, and act ethically with AI (Dilek et al., 2025), and a foundational knowledge of special education. This base supports four essential pillars: understanding of the AI platform, quality prompt engineering, a clear verification process, and operational understanding. Collectively, these pillars facilitate the integration of AI into special educator preparation.

Figure 1. Effective use of AI in educator preparation.
The first pillar requires PSTs to have a functional understanding of various AI interfaces. Teacher educators should introduce these tools (e.g., ChatGPT, CoPilot, Claude) by explicitly highlighting their strengths and limitations. The second pillar requires PSTs to master effective prompting to direct AI toward desired outputs. Third, teacher educators should implement a rigorous verification process for AI-produced content. Fourth, PSTs must synthesize this information to demonstrate an operational understanding of the content. Taken together, these four pillars underscore that effective AI integration for PSTs is not achieved through tool access alone but rather through intentional instructional design that builds educators’ capacity to use AI thoughtfully, critically, and contextually as an adaptation, scaffolded support, and evidence-based practice.
Building PSTs’ Capacity to Use AI
Students with learning disabilities (SLDs) frequently encounter barriers to academic content due to challenges in foundational skills like decoding, reading fluency, written expression, and executive function, which are often implicitly assumed in general education materials (Turnbull et al., 2020). This necessitates scaffolds that adapt content presentation, output format, or engagement methods. AI tools, such as ChatGPT, offer solutions aligned with evidence-based interventions for SLDs by simplifying text, providing multiple means of representation, and generating visual supports for comprehension (Goldman et al., 2024; Turnbull et al., 2020). When integrated, AI can deliver personalized learning, reduce cognitive load, aid organization and task planning, and support learning outcomes for learners with working memory, attention, or information processing difficulties (Goldman et al., 2025).
Despite this potential, little is known about how AI tools are being used to benefit SLDs directly (Panjwani-Charania & Zhai, 2023). As the field explores appropriate applications of emerging technologies, it is essential to ensure that these tools not only supplement instruction but also align with the individualized, research-based supports required by SLDs.
Concerns About the Use of AI in Education
While the potential of AI to transform teaching and learning is undeniable, its application in educational settings presents both possibilities and pitfalls. The accelerated adoption of AI has raised concerns about its ethical use and the integrity of its application. A primary reason for these challenges is that the quality of the AI training data inherently limits the effectiveness of AI tools. A May 2024 Pew Report revealed that one-quarter of educators believe AI currently does more harm than good, citing factual inaccuracies, algorithmic bias, and a perceived invitation to cheat (Lin, 2024). As AI tools are trained on publicly available internet data, their output can include information that is accurate, inaccurate, debunked, and inherently biased (Porayska-Pomsta et al., 2023). Thus, teaching educators about the ethical and responsible use of AI in their workflows is paramount. These essential AI literacy skills include the ability to effectively prompt AI tools; evaluate AI datasets; understand how personal information is collected, used, and shared; and assess the credibility of AI output (Mills et al., 2024).
A particularly prominent concern, quickly documented in initial AI use, was the phenomenon of AI hallucinations, in which large language models confidently generated inaccurate or made-up information (Zhang et al., 2023). When applied to disability content, such inaccuracies risk perpetuating myths, reinforcing stereotypes, or misrepresenting essential information, which is particularly problematic for PSTs who often lack the necessary background knowledge to evaluate the content’s accuracy. A recent analysis of AI responses on special education terminology found that over one-third of outputs contained factually incorrect or misleading information (Bernard & Aladé, 2024).
For PSTs, the danger lies not only in receiving incorrect content but also in trusting the information without engaging in critical reflection. For instance, the CEEDAR Center highlighted that while AI may help scaffold knowledge and instructional planning, teacher educators must ensure it does not replace foundational learning experiences, such as analyzing case studies, designing individualized supports, or writing evidence-based Individualized Education Program (IEP) goals (Dieker et al., 2024). At the forefront of these considerations are concerns about academic cheating, which appears to be growing in both prevalence and concern. PSTs are using AI to generate lesson plans and discussion posts without citing AI use (Langreo, 2024b). These behaviors are problematic because teacher preparation programs aim to build PSTs’ foundational knowledge and reflective skills. These skills are necessary for supporting SWDs, especially those with SLDs.
To address concerns about AI for PSTs, teacher education programs must implement clear guidelines that promote responsible AI use while reducing the risk of academic dishonesty or bias (Karataş et al., 2025). Clear, specific policies for the appropriate use of AI are necessary. To achieve this, faculty must clearly define AI use policies, while PSTs must remain transparent about their AI use in lesson plans and assessments. Faculty in teacher education programs should model ethical AI practices through discussion, demonstration, and case studies to explore ethical dilemmas (e.g., academic dishonesty, data privacy, and over-reliance on AI), thereby bolstering PSTs’ skills and professional judgment (Karataş & Yüce, 2024).
Purpose of the Study
As preservice education programs strive to equip future special educators with effective tools and strategies, AI technologies like ChatGPT could enhance their understanding of and support for SWDs. These tools offer quick access to information, sample accommodations, and instructional examples and resources that, when used intentionally, could supplement traditional learning experiences. It is within this context that AI may offer value, not as a replacement for preservice education, but as a tool to support it. As the Center for Innovation, Design, and Digital Learning (2024) highlights, successful AI integration requires thorough instruction, guided practice, and thoughtful evaluation. Particularly within the context of preservice education focused on SLDs, understanding the impact of AI on the development of educator knowledge is essential. This study directly addresses that need by investigating how PSTs use AI to create handouts on SLDs and comparing them with materials produced through traditional methods. In doing so, we aim to explore whether AI can help bridge longstanding gaps in educators’ knowledge and preparation regarding SWDs.
A significant question is whether the content generated by tools like ChatGPT is reliable. Little is known about the accuracy or educational value of the content it produces, particularly when used to learn about disability-specific content. Early reports and initial studies indicate that PSTs find AI tools helpful for saving time and generating ideas, but remain uncertain about the trustworthiness of the information provided (Bae et al., 2024; Unal & Hobe, 2025). This is especially important in learning disabilities, where a clear understanding of a student’s needs is essential for effective instructional planning. Therefore, the purpose of this study is to examine whether ChatGPT can produce high-quality information on disabilities and the K–12 classroom.
We investigated how PSTs used ChatGPT to create disability-related handouts and how these handouts compared to those made using traditional search methods. We assessed PSTs’ perceptions of ChatGPT’s usefulness, accuracy, and overall experience. The following research questions guided our study:
Method
Participants, Sampling, and Setting
This study was conducted during the spring semester of 2023, just 2 months after the release of ChatGPT. Participants in this study were enrolled in one of the two sections of an introductory special education course offered at a Midwestern U.S. university. Two of the authors were instructors of this course, providing a readily accessible participant pool. The participant sample for this study consisted of 30 preservice general and special educators and related service providers, selected through convenience sampling. The class met twice a week for 15 weeks. Most participants were in their second year of college (n = 20). The remainder were in their third year (n = 6), fourth year (n = 1), and fifth year (n = 1). Participant ages were 19 (n = 8), 20 (n = 10), 21 (n = 3), 22 (n = 4), 23 (n = 1), 25 (n = 2), 27 (n = 1), and 41 (n = 1). Before this assignment, only 10% of participants had used ChatGPT, with 36.67% reporting that they had heard of it but not used it, and the remaining 53.33% stating that they had never heard of it. The majority reported being comfortable with technology but did not consider themselves experts (n = 23; see Table 1).
Table 1. Comfort With Technology and AI Knowledge of Study Participants.
Research Design
This study used a quasi-experimental post-test-only control-group design to assess the use of ChatGPT as a resource for learning about the disability categories outlined in the Individuals with Disabilities Education Improvement Act (IDEA, 2004). Working individually, in pairs, or in trios, PSTs were assigned to either the treatment group (ChatGPT; n = 16) or the control group (n = 14) to create handouts explaining the various IDEA disabilities and their impact on teaching and learning. Participants in the control group were not explicitly instructed to use ChatGPT. Following the creation of these handouts, researchers evaluated their accuracy to determine the effect of ChatGPT use on information accuracy.
Procedures
The disability project, a major project in the Introduction to Special Education course, requires PSTs to research one of eight IDEA (2004) disability categories and create a handout that includes a disability overview, challenges for both students and teachers, and relevant resources, tools, or interventions. Requirements for these sections are operationally defined in Table 2.
Table 2. Operational Definitions of Handout Components.
PSTs self-selected their partners, forming four pairs and seven triads, with one student choosing to work alone. Each group chose one disability from the following options to research: Learning Disabilities, Autism (Levels 1 and 3), Emotional Behavioral Disorders, Attention-Deficit/Hyperactivity Disorder (ADHD), Speech and Language Disorders, Intellectual Disabilities, or Developmental Delays. They then created a presentation, a professional learning plan, and a supporting handout that defined and described the disability. While IDEA does not explicitly list ADHD as a disability nor differentiate support levels within autism, course instructors included these options due to their frequent manifestation in general education settings. To prevent redundancy, each disability could be selected only once per section.
After PSTs selected their groups and disability category, each group was assigned to either the treatment or the control (no direction to use ChatGPT) condition. As the study was part of a classroom assignment, the instructor wanted to ensure all PSTs were successful. Thus, he considered his personal knowledge of the PSTs, including their general learner dispositions, when finalizing assignments for the treatment or control group. This decision-making process was centered on students’ overall academic readiness rather than their technical proficiency with search engines or AI tools. The treatment group consisted of two pairs and four triads.
Handouts were completed out of class. The treatment group was explicitly instructed to use ChatGPT to gather their information. The control group used the internet (e.g., YouTube and Google), books, or other reference materials. Once the handouts were completed, two researchers assessed their accuracy by comparing the information to widely used textbooks on disabilities (Turnbull et al., 2014, 2020). The Special Education Eligibility: A Step-by-Step Guide (Pierangelo & Giuliani, 2007) and the DEC Recommended Practices with Examples (Division for Early Childhood [DEC], 2016) were used for developmental delay, as there were no corresponding chapters in the other textbooks. Handouts were evaluated on four criteria as compared to the textbook: (a) the definition of the disability, (b) the characteristics of students with the disability, (c) the instructional challenges a student with the disability brings to the classroom, and (d) the strategies or solutions that can be used to support students with the disability. Upon completion of the assignment, the control group received ChatGPT training to ensure equitable learning.
Training
The researcher conducted a short training session for the treatment group on using ChatGPT. After the control group left, training began with explicit instruction on account creation and basic prompting. The researcher modeled these steps live by connecting her computer to a projector and sharing her screen. Prompt engineering techniques were not taught. The training ended with time for participants to practice and ask questions.
Measures
Independent Variable
The independent variable was the intentional use of ChatGPT to generate the content for the disability handout. This variable had two distinct levels: treatment and control. The treatment group received explicit training on ChatGPT and was subsequently instructed to use it to create their handout content. The control group did not receive any training on ChatGPT and was given no specific guidance regarding its use; they were free to develop their handouts using their usual methods. The potential effect of guided ChatGPT use (independent variable) on the quality of developed handouts (dependent variable) was then assessed using a researcher-developed rubric.
Handout Accuracy
Handout accuracy was measured using a researcher-developed rubric aligned with the information presented in two editions of a widely used textbook: Exceptional Lives (Turnbull et al., 2014, 2020). The selection of textbooks was informed by their extensive publication history (10 editions over 30 years) and their status as a leading resource in special education (Assaf et al., 2021). The contents in each chapter followed the same format, regardless of disability. They each had a section on defining the disability, describing the characteristics, determining supplemental aids and supports, and using effective instructional strategies. Accuracy was operationally defined as the extent to which the handouts aligned with the textbook content. Information on developmental delay was derived from the eighth edition of Exceptional Lives (Turnbull et al., 2014). All other information was taken from the ninth edition of Exceptional Lives (Turnbull et al., 2020).
The rubric assessed four key dimensions: definition, characteristics, instructional challenges, and strategies (see online supplemental materials). For each dimension, the researchers compared the handout’s content with the textbook using a 3-point scale (3 = matched textbook, 2 = partial match, 1 = significantly different or inaccurate), so that higher scores indicated closer alignment and a handout could earn a maximum of 12 points. To satisfy the accuracy threshold, handouts required a minimum frequency of factual matches for each rubric criterion. For example, participants were required to identify at least five characteristics that aligned with the textbook’s content.
Two researchers assessed each handout. To ensure scoring consistency, the first author trained the second author, also a special education researcher, before independent scoring. This training involved modeling the scoring of one randomly selected handout and co-scoring two more randomly selected handouts. Although these three handouts were used for training, they were drawn from the actual data set and were retained in the final analysis to maintain the full sample size. Following this training, the researchers independently scored the remaining eight handouts. Handout scores were compared, and disagreements were discussed to reach a consensus. This process ensured that the scoring rubrics were applied consistently across all items. Inter-rater reliability was calculated by dividing the number of agreements by the total number of opportunities and multiplying the quotient by 100. Agreement was 97.7%.
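The percent-agreement calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name `percent_agreement` and the two score lists are hypothetical, and the study does not report the underlying counts. The example uses 44 scoring opportunities with one disagreement only because that is one combination that would round to the reported 97.7%.

```python
def percent_agreement(rater_a, rater_b):
    """Return percent agreement between two raters' item-level scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    # An agreement is any item on which both raters assigned the same score.
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return agreements / len(rater_a) * 100


# Hypothetical scores for illustration only (not the study's data):
# 44 scoring opportunities with a single disagreement.
rater_1 = [3] * 44
rater_2 = [3] * 43 + [2]
agreement = percent_agreement(rater_1, rater_2)
print(f"{agreement:.1f}%")
```

Percent agreement is simple to report but does not correct for chance agreement, which is worth keeping in mind when scores cluster at one end of a 3-point scale.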
Usability, Feasibility, Acceptability, and Perceptions
We used the Intervention Appropriateness Measure (IAM; α = .87) and Feasibility of Intervention Measure (FIM; α = .89; Weiner et al., 2017), both utilizing 5-point Likert scales, to assess the intervention’s (i.e., the use of ChatGPT) usability, feasibility, acceptability, and user perceptions of implementing an evidence-based practice. Including these scales was essential for determining students’ perceptions of using ChatGPT or their business-as-usual (BAU) search procedures. The Miscellaneous (MISC) scale was a researcher-developed instrument designed to elicit further insights into PSTs’ feelings about using ChatGPT. While not a validated scale, this measure allowed PSTs to share their feelings about the use of AI. Examples include “Using ChatGPT saved me time” and “I felt like using ChatGPT was cheating” (see the complete scale in the online supplemental material). The use of these scales enabled analysis to answer Research Question 3.
Data Analysis
Statistical comparisons were conducted using the Statistical Package for the Social Sciences (SPSS) and included descriptive statistics, independent samples t-tests, effect size estimates, and Pearson chi-square tests. To compare perceptions between the ChatGPT and control groups, we conducted independent samples t-tests and a confirmatory Mann–Whitney U test to account for potential violations of normality, followed by a descriptive item-level analysis to examine potential patterns and differences in individual responses across the two groups.
Results
The purpose of this study was to determine if ChatGPT could produce quality information about disabilities. Twelve handouts were submitted for this assignment. One was excluded from the analysis after an embedded copyright notice identified it as plagiarized, leaving 11 handouts to address the research questions.
RQ1: Comparison of Treatment to Control
An independent samples t-test comparing mean handout scores showed a statistically significant difference between the ChatGPT and non-ChatGPT groups. In our sample (n = 11), handouts created using ChatGPT (treatment; n = 6) matched the textbook information more closely on average (M = 2.83, SD = .31) than handouts created without ChatGPT (control; n = 5, M = 2.15, SD = .52), t(9) = −2.73, p = .023. The effect size was very large (Hedges’ g = −1.51). Upon further examination, we found no statistically significant differences between the groups within the individual rubric categories (see Table 3).
Table 3. Independent Samples t-test.
Note: CI = confidence interval; SD = standard deviation.
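For readers who want to see how the overall t statistic and Hedges’ g relate to the reported means and standard deviations, the following Python sketch recomputes both from summary statistics. It is an illustration under stated assumptions rather than the authors' SPSS output: the function names are hypothetical, equal variances are assumed, and the small-sample correction uses the common approximation 1 − 3/(4df − 1). Because the inputs are rounded to two decimals, the results should land close to, but not exactly on, the reported t(9) = −2.73 and g = −1.51.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def student_t(m1, sd1, n1, m2, sd2, n2):
    """Equal-variance independent samples t statistic and its degrees of freedom."""
    sp = pooled_sd(sd1, n1, sd2, n2)
    t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d multiplied by the approximate correction 1 - 3 / (4 * df - 1)."""
    df = n1 + n2 - 2
    d = (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)
    return d * (1 - 3 / (4 * df - 1))


# Reported overall scores as (M, SD, n). Control is entered first, which
# reproduces the negative sign used in the text.
control = (2.15, 0.52, 5)
treatment = (2.83, 0.31, 6)
t_value, df = student_t(*control, *treatment)
g_value = hedges_g(*control, *treatment)
print(f"t({df}) = {t_value:.2f}, g = {g_value:.2f}")
```

The negative sign carries no substantive meaning beyond the order of subtraction (control minus treatment); the treatment group has the higher mean.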
However, each criterion demonstrated varying effect sizes: one showed a small to medium effect, one a medium effect, another a medium to large effect, and one a large effect. These distinctions offer a more nuanced and practically meaningful interpretation.
To illustrate the differences in statistical significance and effect sizes, we detail the category-level comparisons between handouts created with ChatGPT and those created without it across all researched disability areas. For the “definition” category, handouts generated by ChatGPT (n = 6, M = 2.83, SD = .41) were not statistically significantly closer to the textbook definition than those created using other methods (n = 5, M = 2.00, SD = 1.00), t(5.11) = −1.75, p = .140. A very large effect size was observed (Hedges’ g = −1.04). For the “characteristics” category, handouts produced by ChatGPT (n = 6, M = 3.00, SD = 0.00) were not statistically significantly closer to the textbook content than handouts created using other methods (n = 5, M = 2.00, SD = 1.00), t(4.00) = −2.24, p = .089. The effect size was very large (Hedges’ g = −1.37). For the “instructional challenges” category, handouts produced by ChatGPT (n = 6, M = 2.67, SD = 0.82) did not differ significantly from handouts created using other methods (n = 5, M = 1.80, SD = 1.10), t(9) = −1.51, p = .166. The effect size was large (Hedges’ g = −.83). For the “strategies and solutions” category, handouts produced by ChatGPT (n = 6, M = 2.83, SD = 0.41) were not statistically significantly different from handouts created using other methods (n = 5, M = 2.80, SD = 0.45), t(9) = −0.13, p = .90. The effect size was small (Hedges’ g = −.07).
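The fractional degrees of freedom reported for the “definition” and “characteristics” categories are what an unequal-variance (Welch) t-test produces. The sketch below is our reading of those values rather than a documented detail of the authors' analysis. It computes Welch's t and the Welch–Satterthwaite degrees of freedom from the reported summary statistics, and the function name `welch_t` is hypothetical.

```python
import math


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Reported "definition" scores: control (M, SD, n), then treatment (M, SD, n).
t_def, df_def = welch_t(2.00, 1.00, 5, 2.83, 0.41, 6)
print(f"definition: t({df_def:.2f}) = {t_def:.2f}")

# Reported "characteristics" scores. The treatment SD of 0 means all of the
# variance comes from the control group, so df collapses to n_control - 1 = 4.
t_char, df_char = welch_t(2.00, 1.00, 5, 3.00, 0.00, 6)
print(f"characteristics: t({df_char:.2f}) = {t_char:.2f}")
```

With groups of five and six handouts, the correction can shrink the degrees of freedom to roughly half of the pooled value of 9, which helps explain why large observed differences still fail to reach significance.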
RQ2: Comparison of SLD Handouts
We compared rubric scores for the SLD handout created with ChatGPT (treatment) with those for the SLD handout created by the control group. The treatment handout scored a “12” on the rubric, which was a perfect score. The handout created in the control group scored a “6,” only matching the textbook in the areas of “explaining the instructional strategies” and “solutions.” Whereas the handout created using ChatGPT defined specific learning disabilities as “neurological differences that affect how individuals process information and may impact their ability to learn and use certain academic skills,” the control group defined it by listing various disabilities, including dyslexia, dyscalculia, ADHD, autism, and intellectual disabilities. Some of the characteristics listed by the treatment group included challenges with executive functioning, reading, writing, and math. When asked to list the strengths of SLDs, the treatment group shared creativity, resilience, and strong social skills. They noted that SLDs may struggle to maintain attention, engage in disruptive behavior, and become frustrated. The control group did not list any.
Additionally, the treatment handout shared that traditional testing may not be effective for SLDs. The control handout, however, stated, “It’s important to note that each student may experience their learning disability differently, and the list above is not exhaustive. Additionally, students may also have multiple disabilities, so educators need to work closely with parents, specialists, and support staff to provide the best possible education for all students.”
Finally, both handouts provided adequate explanations of how educators can support SLDs.
RQ3: Perceptions of Using ChatGPT to Create a Disability-Related Handout
To answer our final research question, PSTs in the treatment group (n = 16) and the control group (n = 14) were surveyed on their perceptions of their search method. Descriptive statistics were calculated for both groups across three social validity measures: IAM, FIM, and MISC. Scores on the IAM, FIM, and MISC ranged from 12.00 to 20.00, 13.00 to 20.00, and 13.00 to 22.00, respectively. For the IAM measure, the treatment group (M = 16.43, SD = 3.06) had a slightly lower mean than the control group (M = 16.63, SD = 2.42), suggesting comparable perceptions of the appropriateness of the search method. For the FIM measure, the treatment group (M = 17.07, SD = 2.09) also scored slightly lower than the control group (M = 17.25, SD = 2.27), indicating similar views on its feasibility. However, for the MISC measure, the treatment group (M = 18.93, SD = 1.98) reported higher scores than the control group (M = 17.63, SD = 2.25), suggesting that participants using ChatGPT perceived additional aspects of the intervention more positively. Complete results are presented in Table 4.
Table 4. Social Validity Results: Student Perceptions of Intervention Impact.
Note: IAM = Intervention Appropriateness Measure; FIM = Feasibility of Intervention Measure; MISC = Miscellaneous scale.
Independent samples t-tests were used to determine whether the differences were statistically significant. Levene’s test for equality of variances showed no significant differences in variability between the two groups. Results indicated no significant differences in means between the treatment and control groups for IAM, t(28) = 0.20, p = .85; FIM, t(28) = 0.22, p = .83; or MISC, t(28) = −1.68, p = .11. Although not statistically significant, the effect sizes provide additional context: IAM (d = 0.07) and FIM (d = 0.08) showed negligible effects. In contrast, MISC (d = 0.61) indicated a moderate effect size in favor of the ChatGPT group.
Mann–Whitney U tests were conducted as a nonparametric alternative because the ordinal data could violate the normality assumption required for the independent samples t-test. Both distributions were similar in shape, and their medians were not statistically different. These tests confirmed the independent samples t-test findings between groups for IAM (U = 104.50, p = .76), FIM (U = 103.00, p = .73), and MISC (U = 149.50, p = .12).
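To make the U statistic concrete, the following Python sketch computes it by ranking the pooled scores and averaging the ranks of tied values, which matters for Likert-type totals where ties are common. The function name and the two small samples are hypothetical and are not the study's data. The sketch stops at the U statistic and does not compute a p value.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) using average ranks for tied values."""
    # Pool the observations, tagging each with its group (0 = a, 1 = b).
    pooled = sorted(
        (value, group)
        for group, sample in enumerate((sample_a, sample_b))
        for value in sample
    )
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


# Hypothetical scale totals for illustration only (not the study's data).
group_a = [19, 21, 18, 20]
group_b = [17, 18, 16]
u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)
```

The two U values always sum to the product of the group sizes, so with 16 and 14 participants the possible range is 0 to 224, and values near 112 indicate little separation between groups.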
We conducted an item-level descriptive analysis of the responses to the social validity items to identify nuanced differences in how participants in the treatment and control groups perceived the intervention. Using descriptive statistics for these data provided a more detailed view of participants’ thoughts that may not be reflected in the overall mean scores. Across the IAM and FIM scales, both groups reported high levels of agreement, indicating that the treatment and control were generally seen as appropriate and feasible. Items such as “My search method was suitable,” “doable,” and “easy to use” received mean ratings above 4.0 in both groups, suggesting that both were acceptable. However, subtle differences emerged: the control group tended to rate items related to fit (e.g., “My search method was fitting” and “a good match”) slightly higher, while the treatment group showed marginally higher ratings on items reflecting applicability and ease of use, such as “My search method was applicable” and “easy to use.”
The most notable differences appeared in the MISC items, which captured broader perceptions of the intervention’s value and future utility. For “ChatGPT may have been a helpful resource,” the treatment group reported a mean of 4.50 (SD = 0.52), whereas the control group reported a mean of 3.93 (SD = 0.80). While both groups tended to agree with the statement, the treatment group was slightly more positive. This suggests those who used ChatGPT may have found it more engaging and useful. The treatment group also expressed greater willingness to use ChatGPT in the future for assignments (M = 3.86; SD = 0.77) and to better understand students (M = 3.79; SD = 0.80). In contrast, the control group reported slightly less interest in using it for future assignments (M = 3.33; SD = 1.11) and for understanding students (M = 3.40; SD = 0.63).
Discussion
This study explored whether ChatGPT could help PSTs understand and support students with disabilities. Framed by the EUAI, this study examined: (a) the accuracy of disability-related information generated by ChatGPT; (b) the similarities and differences between handouts created with ChatGPT and traditional search methods; and (c) PSTs’ perceptions of ChatGPT’s utility, helpfulness, and quality. This study was designed to examine the accuracy and perceived usefulness of AI-generated informational resources and does not evaluate instructional quality, teaching effectiveness, or student learning outcomes. Findings should therefore be interpreted within the scope of these measures. Results showed that ChatGPT-created resources were at least as accurate as those from traditional search methods, and participants were optimistic and enthusiastic about the future use of ChatGPT. These findings suggest potential related to information access and organization rather than evidence of instructional transformation.
Accuracy of ChatGPT
As established in the EUAI, it is essential that teachers critically evaluate the information produced by AI. However, it is also important to establish a baseline for the technology’s accuracy. Our findings showed that handouts created with ChatGPT matched the textbook information at least as accurately as those created through traditional search methods. There were no instances where the handout created using traditional methods was statistically significantly better. This indicates that, for the specific task of creating disability-related reference handouts, ChatGPT can support PSTs in producing content aligned with commonly used instructional texts. At the same time, accuracy in this context reflects alignment with textbook descriptions and should not be interpreted as evidence of instructional quality, pedagogical decision-making, or classroom effectiveness. These mixed findings underscore the importance of cautious interpretation, particularly given the study’s pilot nature and small sample size.
Our study and other research (Rakap, 2024) suggest that ChatGPT has value as a tool for PSTs. This value appears to lie primarily in facilitating access to and organization of foundational information rather than replacing deeper instructional preparation. There is also ongoing concern about the accuracy of AI-generated information among special educators and researchers (Zaugg, 2024), in part because much of the peer-reviewed special education literature is locked behind paywalls (Cook et al., 2025) and may therefore be missing from the data on which these tools are trained. This gap between AI’s potential and the quality of its training data is particularly significant in special education, where accuracy and nuance are indispensable.
Even so, PSTs in our study and those in studies by Education Week (Langreo, 2024a, 2024b) showed an interest in learning more about AI and using it in their future classrooms. However, only one aspect of the ChatGPT-created handouts earned a perfect rubric score, indicating that AI-generated content still requires careful review, verification, and contextualization by users. This implies a need to continue developing specialized AI tools trained explicitly on peer-reviewed research and evidence-based practices (e.g., High-Leverage Practices; Council for Exceptional Children). In turn, such training may increase these tools’ credibility and utility as instructional supports rather than general-purpose tools.
AI’s Utility
Findings from this study contribute to the emerging literature on the implications and utility of AI as a tool to support SWDs. While much of this developing literature focuses on classifying students and predicting academic performance (e.g., Alvarez et al., 2020; Cruz-Jesus et al., 2020; Pallathadka, 2021), as well as AI’s ability to personalize learning and create targeted interventions (Morciano et al., 2024; Sukiman & Abd Aziz, 2021), it is equally important to explore how AI can directly support educators of SWDs. AI has the potential to personalize learning, adapt content and materials, and act as a thought partner in developing effective lesson plans and IEPs. AI’s capacity to support general educators is particularly relevant, as SLDs are predominantly served in their classrooms (Silva et al., 2022). Because many general educators complete only one special education course (Clausen et al., 2023), it is important that both general and special education teachers have the essential skills (e.g., direct instruction, learning strategies, cooperative grouping) needed to support SWDs (Graham et al., 2022).
Despite this need, general educators frequently report feeling inadequately prepared to effectively support SWDs in their classrooms (Smith, 2020). AI, therefore, potentially presents a unique opportunity to bridge this preparedness gap. A recent meta-analysis found that professional development has a positive impact on educators’ knowledge, skills, and beliefs, as well as student behavior related to inclusive education (Donath et al., 2023). However, educators report that these opportunities are often either too generic (Schwartz, 2023) or too long (Ehlert & Souvignier, 2024). The results of this study suggest that ChatGPT could be a valuable resource for educators to facilitate personalized professional development, but that they need to be critical consumers of AI outputs to use it with integrity in their practice.
Educators’ Perceptions
Our findings align with current, limited research indicating educators’ optimism about the potential benefits of AI in education (Chounta et al., 2022; Uygun, 2024). Adoption of AI in schools is still in its infancy. Prior to our study, nearly all participants had no experience using ChatGPT. This pattern is consistent with a recent Education Week report, which found that 46% of 498 educators had not explored AI tools (Sawchuk, 2024). Another report found that 84% of educators were not leveraging AI to develop personalized assessments (BCS, The Chartered Institute for IT, 2024). Despite AI’s early stage on the Diffusion of Innovations curve, its adoption is progressing at a significantly faster rate than that of previous technologies (e.g., smartphones, computers), underscoring the ongoing need to survey educators about its perceived usefulness for SWDs (Garcia-Aviles, 2020).
Previous studies have found a correlation between preservice preparation, attitudes, and the use of technology in the classroom (Kent & Giles, 2017; Kramarski & Michalsky, 2014). This reinforces the importance of the current PSTs’ perceptions of AI within this course. In-service teachers and PSTs need opportunities to learn about integrating AI into special education and to interact with AI in ways that will enhance their professional development (Hur, 2025). These perceptions should be interpreted as indicators of acceptability and feasibility, not as evidence of effectiveness or impact on teaching practice. This pilot contributes to the literature by suggesting that early exposure to AI may increase interest and comfort. However, further research is needed to determine whether these perceptions translate into meaningful instructional application.
Limitations
While this study provides valuable insights into educators’ use of ChatGPT to learn about SLDs and other disabilities, several limitations must be acknowledged. First, this pilot study included only 30 participants, all enrolled at the same university. The small sample size limits generalizability and external validity (Curtis & Keeler, 2023). Additionally, the study lacked an a priori power analysis and had low statistical power. The observed non-significant findings, in conjunction with small effect sizes, suggest that the results should be interpreted with caution, as they may represent a Type II error (a failure to detect a true effect) rather than a definitive absence of an effect. It is also possible that participants’ prior exposure to AI influenced the outcomes. Future research should prioritize a larger, more diverse sample and implement standardized, equalized training protocols to minimize confounding variables and increase confidence that any observed differences are attributable to the intervention.
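To make the power concern concrete, the sketch below applies the standard normal-approximation formula for a two-group comparison of means, n per group ≈ 2(z₁₋α/₂ + z₁₋β)² / d², where d is the standardized mean difference. It is illustrative only: the function names (`n_per_group`, `approx_power`) are hypothetical, the effect sizes are Cohen's conventional benchmarks rather than estimates from this study, and the group size of 15 assumes an even split of the 30 participants.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided, two-group test."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size**2)


def approx_power(effect_size, n, alpha=0.05):
    """Approximate power with n participants per group.

    The negligible probability of rejecting in the opposite tail is ignored.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(effect_size * sqrt(n / 2) - z_alpha)


# Conventional benchmark effect sizes (assumed, not taken from this study).
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label} (d = {d}): about {n_per_group(d)} per group; "
          f"power with 15 per group is about {approx_power(d, 15):.2f}")
```

Under these assumptions, a medium effect (d = 0.5) calls for roughly 63 participants per group, and two groups of 15 would detect such an effect less than 30% of the time. Exact calculations based on the t distribution give slightly larger sample sizes than this approximation.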
Another limitation is that the activity was conducted as part of a class assignment. Although participants understood that their grades would not be affected by the results of their social validity surveys, they may have responded in ways they believed would be more socially desirable (Rosenthal & Rosnow, 2009), thereby rating the intervention more favorably. In addition, although all participants in the ChatGPT treatment group received the same training, it is impossible to determine whether they received additional guidance or training, or had inherent abilities, that may have affected these results. The same is true of the control group. Participants were not taught how to search; it was assumed they had the skills to conduct basic internet research.
Several further limitations arose because the research was conducted as part of a classroom assignment. First, because the priority was student learning and student success, the assignment of treatment and control groups was not random. Although knowledge of ChatGPT was not considered in the assignments, there is a chance that intentional grouping affected the results. Additionally, PSTs were given the option to work individually, in pairs, or in triads, which meant some had a greater workload than others. Those working collaboratively benefited from opportunities for peer feedback and support. Interestingly, the student who chose to work alone was the same one who submitted a plagiarized handout.
Implications for Practice
The findings suggest that ChatGPT can effectively support PSTs in acquiring accurate, useful information about disabilities, particularly learning disabilities. In this study, this support was limited to informational reference handouts. It should be interpreted as evidence of AI’s potential to support foundational knowledge acquisition rather than instructional practice or teaching effectiveness. From a pedagogical standpoint, AI tools may offer value in PST preparation, particularly as PSTs seek efficient, structured, and readily available content.
ChatGPT was also well received by PSTs, suggesting that it could function as an acceptable and feasible instructional support within coursework. However, positive perceptions alone do not indicate that AI use improves PSTs’ instructional decision-making, lesson design, or ability to support students with learning disabilities. As AI continues to be integrated into educational environments, preservice programs may consider incorporating AI tools into coursework in a structured and guided manner, emphasizing their use as supplements to traditional instruction rather than replacements for core pedagogical learning experiences.
These findings suggest that teacher preparation programs should explicitly address the ethical considerations, accuracy limitations, and verification requirements associated with AI-generated content. Users cannot assume the accuracy of AI-generated materials, and the present study highlights the importance of teaching PSTs to critically evaluate, corroborate, and contextualize AI outputs. Programs seeking to integrate AI should therefore pair AI use with instruction in information literacy, ethical decision-making, and responsible technology use. As preservice programs explore ways to integrate AI into their coursework, this study supports using AI for introductory learning tasks, such as synthesizing disability-related information, rather than for complex instructional planning or individualized intervention design. Coursework that introduces AI should reinforce that effective teaching requires deep content knowledge, instructional strategies, and professional judgment that extend beyond AI-generated suggestions.
Finally, while AI tools may help PSTs engage more efficiently with disability-related content, programs should avoid positioning AI as a solution to gaps in preservice preparation without additional empirical support. AI should be framed as one of multiple tools that, when used thoughtfully and critically, complement existing instructional approaches.
Implications for Research
The study’s findings highlight important areas for future research. Because this study was a small-scale, post-test-only pilot examining the accuracy of AI-generated informational materials and PSTs’ perceptions, the findings should be viewed as exploratory rather than confirmatory. The observed effect sizes, ranging from small to large, suggest that meaningful differences may exist that the limited sample size lacked the power to detect. These effect sizes indicate potential areas of inquiry rather than evidence of AI’s superiority over traditional methods.
Future research should continue to examine the accuracy of information generated by AI to ensure its output aligns with current, evidence-based practices in special education. The present study establishes a baseline for comparing AI-generated content with textbook sources but does not address how PSTs interpret, adapt, or apply AI-generated information in instructional contexts. As new research emerges, it will be important to systematically evaluate whether AI tools reflect updated knowledge and avoid outdated or inaccurate information.
Researchers should also investigate the impact and reliability of specialized AI tools and custom chatbots that are trained on peer-reviewed and practitioner content. While this study examined a general-purpose AI tool (ChatGPT), it does not provide evidence regarding the effectiveness or reliability of education-specific AI systems. Comparative studies examining general versus specialized AI tools may help clarify whether training data improves accuracy, relevance, and usability for educator preparation.
Future research should also examine how training influences educators’ perceptions and use of AI. Although this study captured PSTs’ perceptions of usefulness and feasibility, it did not examine how structured instruction in AI use affects knowledge development, instructional planning, or professional judgment. Future studies could employ pre-post designs, experimental comparisons of training models, and qualitative methods (e.g., interviews, artifact analysis) to examine how AI literacy instruction shapes educators’ engagement with AI tools.
Finally, future research should explore how AI-supported knowledge acquisition interacts with broader preservice preparation experiences. This study did not examine whether AI use influences candidates’ ability to design lessons, develop IEP goals, or implement evidence-based practices in applied settings. Longitudinal and practice-based research designs are needed to determine whether AI-supported learning during preservice coursework translates into changes in instructional decision-making, classroom practice, or student outcomes for SLDs.
Conclusion
This study demonstrates the potential of AI, particularly ChatGPT, as an instructional resource for PSTs to understand and support SWDs. Our findings show that resources created with ChatGPT can be as accurate as, if not more accurate than, resources created through more traditional methods using internet searches. The identified accuracy, along with the optimistic perspectives of PST participants, suggests a possible shift in how educators teach and inform their students in a personalized, timely manner, using the most up-to-date information. ChatGPT and other generative AI applications can provide educators with highly accessible resources and support the development of individualized classroom materials. As AI use grows in education, it is shifting from a support tool to a key part of preservice preparation and instruction. Therefore, preservice programs and professional development creators must purposefully incorporate AI, ensuring that current and future educators possess the ethical and practical skills necessary to implement it effectively. The integration of AI into teaching students with disabilities and across other areas of special education holds promise for creating effective, inclusive learning environments for all students.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
