Abstract
This paper details an innovative pedagogical approach integrating custom AI-powered chatbots into case studies to develop future leaders capable of tackling grand challenges. Traditional case studies, while building analytical skills, often lack the systemic complexity and dynamic information-seeking essential for real-world leadership. Our chatbot-assisted case study, focused on an automotive logistics challenge, overcomes this by requiring students to engage in interactive dialogue with the AI. This process simulates the ambiguity and negotiation inherent in professional practice, directly cultivating adaptive leadership and systems thinking skills. We present a theoretically-grounded conceptual framework that informs this design, linking pedagogical theory to the development of these specific competencies. Qualitative and quantitative data demonstrate enhanced student engagement and the acquisition of the targeted adaptive and systemic skills. The findings underscore the transformative potential of AI to modernize management education, bridging theory with the practical skill development required to address complex, real-world problems.
Keywords
Introduction
Today’s grand challenges, from climate change to digital transformation, require leaders with adaptive skills such as the ability to engage people, challenge assumptions, and adapt to new situations (Heifetz et al., 2009). While traditional analytical skills remain foundational, they are insufficient alone for navigating such volatile, uncertain, complex, and ambiguous contexts. Management education must therefore emphasize developing a new breed of mindsets. To achieve this, we introduce in this article an AI-enhanced pedagogical approach that places students in interactive, evolving scenarios. Grounded in constructivist and experiential learning theories, Bloom’s taxonomy (Anderson & Krathwohl, 2001), and contemporary leadership development literature, these environments promote inquiry, adaptability, reflection, and dynamic decision-making.
Building on this foundation, this project specifically explores the use of chatbots as interactive agents to simulate real-world inquiry and iterative questioning. We argue that custom chatbots—now accessible to educators—enable students to actively extract information in unpredictable ways, mirroring the ambiguity faced by professionals. To illustrate this central claim, we present a case study in the automotive sector, where students tackle logistical challenges from port to delivery.
By focusing on chatbot-assisted role-play within case-based learning, we investigate whether such virtual, interactive scenarios can create more realistic student experiences and build the competencies required by complex professional environments. This question resonates with current debates on how technology can help educators design richer learning experiences. In the following sections, we explain the rationale, outline key implementation steps for chatbot-centered projects, and summarize preliminary feedback from students.
Literature Review
Conceptual and Theoretical Framework
The pedagogical approach described in this article is based on an integrated framework that supports adaptive leadership, defined as mobilizing people and resources to address complex challenges in uncertain environments (Heifetz et al., 2009). We argue that developing this competency requires more than conventional teaching; it necessitates dynamic learning environments based on active engagement and higher-order thinking. Our framework (Figure 1) combines constructivist, experiential, and cognitive learning theories, providing a basis for students to practice and strengthen their adaptive leadership skills.

An integrated theoretical framework for developing adaptive leadership.
Constructivist Foundation
The framework relies on constructivist learning theory, which holds that learners actively build knowledge through their inquiry (Vygotsky, 1978). For adaptive leadership, this is crucial. It means students must handle ambiguity and form their own understanding, rather than just analyzing pre-set facts. This would require an interactive, inquiry-driven design.
Experiential Learning Process
Experiential Learning Theory (Kolb, 1984) models learning as a cycle: (1) concrete experience, (2) reflective observation, (3) abstract conceptualization, and (4) active experimentation. For this cycle to be effective, the learning design must provide a rich, concrete experience as material for reflection and analysis, guiding students through each stage of the process. This structured progression ensures that students move beyond passive consumption of information to actively engage in analytical and applied activities, thereby fostering self-directed learning and autonomy (Elkot et al., 2025).
Cognitive Demands
The cognitive effort required for this cycle is defined by the revised Bloom’s Taxonomy (Anderson & Krathwohl, 2001), which progresses from basic skills, such as Remembering and Understanding, to more advanced ones, including Applying, Analyzing, Evaluating, and Creating. Developing adaptive leadership encourages students to think beyond lower-level thinking. Effective learning environments must challenge them to use higher-order skills, namely, applying concepts, analyzing problems, evaluating data, and creating solutions.
In this model, all three theories align for a comprehensive understanding. Constructivism provides the “why” (active learning), experiential theory gives the “how” (engagement cycle), and Bloom’s Taxonomy the “what” (cognitive progression). Together, they form a participatory, reflective, and challenging learning environment.
Traditional case materials, while helpful, cannot fully support this environment. They often lack the dynamic, concrete experience or interactive feedback essential to the experiential and constructivist processes. Though they help build analytical skills, traditional cases rarely prepare students for volatile, uncertain, complex, and ambiguous (VUCA) contexts. Such cases offer fixed information and linear stories, limiting students’ chance to face authentic leadership messiness (Shugan, 2006). Without uncertainty, changing constraints, or interactive feedback, learners may not develop the questioning, inquiry, and stakeholder skills essential for adaptive leadership (Alavi & Gallupe, 2003).
Our framework highlights the need for pedagogical innovations that convert static case studies into dynamic, responsive learning environments. We argue that interactive AI-driven tools enable this transformation, better preparing students for real-world ambiguities.
Recent advances in AI, particularly natural language processing, offer novel ways to achieve it. Tools such as ChatGPT and other large language models allow students to interact with case agents, simulate dialogues, and obtain real-time information. This creates a high-stakes environment demanding continuous reassessment, strategic adjustments, and reflection.
Early chatbots, such as ELIZA (Weizenbaum, 1966), were limited in their capabilities. In contrast, modern AI systems can contextualize prompts and simulate complex stakeholder dynamics (Lang, 2019; Ninassi & Burrell, 2023; Popescu, 2024). Consequently, educators can now create simulations that enable students to challenge narratives, respond to stakeholders, and negotiate competing demands, thereby placing learners at the center of adaptive leadership development.
Initial implementations of this AI-enhanced pedagogy in graduate-level management programs indicate heightened levels of student engagement, critical inquiry, and self-reported leadership confidence (Thway et al., 2024). In a mixed-methods study conducted at Nanyang Technological University in Singapore, a custom-built RAG chatbot (called “Professor Leodar”) was deployed among students. The findings revealed that 97.1% of participants reported positive experiences in terms of engagement, preparedness, and learning outcomes. Students noted that the uncertainty of the simulated interactions challenged them to revisit their assumptions and reflect on the limitations of top-down decision-making.
Identified Gaps and Research Challenges
Despite progress in AI education, key gaps remain, particularly in management education and case-based learning. Most research on AI tools focuses on technology or basic student interaction, leaving open questions about their value for deeper managerial skills (Kuhail et al., 2023; Labadze et al., 2023; Mendoza et al., 2022).
One gap is the lack of rigorous research on how AI-enhanced methods, like chatbot-assisted case studies, impact students’ analytical, problem-solving, and decision-making skills. While existing studies consider engagement and knowledge retention (Crompton & Burke, 2023; Deng & Yu, 2023), Mendoza et al. (2022, p. 17) refer to a ‘certain epistemic immaturity’ within the field of educational chatbots. They observe that the existing literature is filled with studies that are difficult to replicate and utilize inconsistent assessment methods, frequently prioritizing the measurement of usability over comparative learning outcomes. This situation emphasizes the necessity for a systematic comparison to ascertain whether AI-enhanced methods truly provide an advantage in cultivating essential skills beyond mere initial engagement.
In addition, despite recognizing the importance of ethics in AI integration, current scholarship offers limited guidance on effectively managing ethical challenges such as cultural bias, privacy protection, and accountability frameworks within educational contexts (Okonkwo & Ade-Ibijola, 2021). Practical frameworks for ethically deploying AI in education are still emerging and require more extensive exploration, particularly in culturally diverse and globally oriented management programs.
Addressing these gaps through focused, empirical research and thoughtful reflection will be essential to harnessing AI’s full potential responsibly, ethically, and effectively within management education. Therefore, this study seeks to address these challenges by investigating the following central research questions:
Methodology
Context and Objectives
This study originated from a classroom case study involving a wine distributor, where students engaged in limited-question dialogue with a visiting CEO. This interaction significantly improved both engagement and analytical quality. To replicate this dynamic in courses lacking executive participation, a chatbot was developed to simulate the presence of a company official. The primary objective was to maintain the inquiry-driven, interactive nature of the case study while addressing the logistical challenge of executive availability. The AI-enhanced model employs a scenario-based, dialogic, and iterative structure. Students interact with a chatbot representing a key stakeholder, such as a CEO, regulator, or community leader, and navigate a changing scenario with incomplete information. As new developments arise, students are required to reassess assumptions, reconsider previous decisions, and adapt their strategies. This design aligns with Kolb’s experiential learning cycle and Bloom’s taxonomy objectives of application, analysis, and creation.
Another observed issue was that students often jumped to proposing solutions without thoroughly analyzing causes or using data to justify their recommendations. In traditional cases, because some underlying data might be missing, students sometimes skip the investigative process. Our approach addresses this by dividing the case study into two phases, each with distinct goals. In the first phase, students enhance their analytical skills by asking pertinent questions and gathering data with the chatbot, thereby learning to probe and uncover underlying causes. In the second phase, they strengthen solution formulation by quantifying problems and prioritizing solutions using the newfound data, practicing data-driven decision-making. By structuring the learning experience in this way, we aim to improve students’ ability to perform root-cause analysis and to evaluate the impact of their proposed solutions in a realistic context.
Case Study Design and Chatbot Implementation
We implemented the above approach in a case study focused on the automotive supply chain. Students were given a brief initial case document about a car distributor’s operations, including background on the company and known issues in the vehicle delivery process. Working in teams, the students tackled the case in two rounds:
Round 1 (Initial attempt): Teams analyze the case with only the provided document and prior coursework. They identify problems and suggest solutions based on limited information (similar to a traditional case study attempt).
Round 2 (Interactive attempt): Teams revisit the case after a series of quality/process improvement lectures (12 hr total). In this round, they can interact with the chatbot to obtain additional information not provided initially, before refining their solutions.
The chatbot was built using OpenAI’s native Custom GPT interface (model GPT-4 restricted to paid plans). The instructor grounded the chatbot’s responses in case-specific facts via Retrieval-Augmented Generation (RAG), uploading data files (e.g., detailed logistics data in PDF/Excel formats) and a prompt of character-specific instructions (the CEO’s perspective) into the ChatGPT model (Appendix A). This ensured the chatbot’s responses mimicked what an informed CEO might say. The instructor, with no coding required, used the user-friendly interface of Custom ChatGPT to configure these inputs. This setup highlights the accessibility of our approach for educators.
During Round 2, each team was allowed to ask up to ten questions, divided into two sets of five questions each, to the chatbot. Examples of these interactions, including the questions asked by students and the chatbot’s corresponding answers, are presented in Appendix B. This design effectively simulates a real-world scenario where students must interview a knowledgeable person to gather missing information. The five-question limit was deliberately set to mimic real constraints—professionals often have limited opportunities to consult busy executives. It forced teams to prioritize their inquiries and reflect on which questions would yield the most valuable insights, aligning with authentic business communication challenges.
Participants and Sample Selection
The study was conducted with four cohorts of graduate students in Master’s programs in Supply Chain Management at two French private business schools. Participants were enrolled in a course titled “Quality Management and Improvement” (18 hr total), which included the case study as a significant practical component. We used a quasi-experimental design with two groups acting as controls and two as experimental, across different academic terms:
Control Group 1: 30 students (Jan–Mar 2025, School A)
Control Group 2: 7 students (Jan–Mar 2025, School A)
Experimental Group 3: 12 students (Dec 2023–Jan 2024, School B)
Experimental Group 4: 30 students (Feb–Jul 2024, School A)
All groups followed the two-round case study procedure described above. Control groups, however, did both rounds without any chatbot assistance (Round 2 for them was simply a second attempt after the lecture, using only the case document and lecture learnings). The experimental groups conducted Round 1 without the chatbot and Round 2 with the chatbot-enabled interaction. Assignment to these conditions was natural, based on course scheduling where earlier cohorts experienced the chatbot intervention, and later cohorts served as controls. Although not randomly assigned, the cohorts had comparable academic backgrounds, and all received the duplicate course content aside from the chatbot element, making their results suitable for comparison.
Data Collection
We collected both quantitative and qualitative data to evaluate the impact of the chatbot intervention.
Quantitative Data—Performance Scores
Learning outcomes can be evaluated using a standardized grading rubric with five criteria adapted from Stevens and Levi (2023) (Table 1), each of which is associated with a core competency of adaptive leadership. This ensures the skill focus remains coherent from the pedagogical design through to the final assessment. The depth of situational analysis and understanding of key principles measures the ability to confront default interpretations and behaviors; this criterion assesses whether students can look beyond surface-level symptoms to diagnose the underlying adaptive challenge itself and correctly apply established theories. Creativity and the feasibility of solutions to proposed actions evaluates the capacity to adjust to changing environments; this focuses on devising flexible, experimental solutions in response to new information, rather than applying rigid formulas. The use of data and evidence, as well as communication and presentation, both assess a student’s ability to mobilize people and resources. To mobilize others, a leader must build a compelling, data-driven case for change through a straightforward, persuasive narrative to gain buy-in.
Grading Rubric.
Source. Adapted from Stevens and Levi (2023).
Each team’s case analysis output (recommendation report) was evaluated at the end of Round 1 and Round 2 using the rubric. Each dimension was scored on a 4-point scale (1 = Needs Improvement, up to 4 = Excellent), resulting in a composite score of 20 for each attempt. The rubric criteria were explained to students beforehand to align their understanding of expectations. Two instructors independently scored the reports, and any discrepancies were discussed to reach consensus (this helped ensure consistent, unbiased scoring). We recorded the average scores for Round 1 and Round 2 for each group of students.
Qualitative Data—Student Feedback
After completing Round 2, students were asked to complete an anonymous feedback survey about their learning experience. The survey included open-ended questions on several topics: how engaging and interactive they found the case study, the challenges they faced, whether they felt any information was missing, what skills they thought they improved, and their opinion on using a chatbot in such exercises. For experimental groups, additional questions probed their experience with the chatbot (e.g., clarity of answers, realism of the interaction, any difficulties in using it). In total, we received feedback from 21 control-group students out of 37 and 17 experimental-group students out of 42 (some students opted not to provide feedback, resulting in lower numbers than the total participants). These narrative responses provided rich insight into student perceptions and were used for qualitative analysis.
Ethical and Pedagogical Considerations
Throughout the study, we took care to maintain ethical standards and maximize learning value. All student participation in feedback was voluntary, and we obtained consent to use their (anonymized) responses for research. The case study exercise itself was a regular course activity, and the chatbot was introduced as a learning enhancement, not an evaluative tool, to avoid any unfair advantage. Importantly, all groups received the same instruction and core materials, with the chatbot being an extra resource rather than extra content.
We deliberately constrained the chatbot’s use and content to prevent it from undermining learning objectives. We also made the chatbot’s limitations clear to students—it was presented as a company executive with partial knowledge, not an all-knowing oracle. This discouraged over-reliance on the AI and encouraged students to evaluate chatbot responses critically.
The interaction was regulated by specific rules designed to maintain both educational and ethical integrity. The chatbot was instructed to deflect any question that attempted to shortcut the problem-solving process. For example, a query like “What should we do to fix the problem?” would prompt the bot to remind the student to analyze data or ask a more specific question. Furthermore, channeling all questions through the instructor served a dual purpose: it ensured pedagogical focus while simultaneously addressing data privacy concerns, as the AI platform did not collect any student names or personal data. The instructor reviewed all questions before submission to the chatbot, filtering out any that violated the rules or were unclear. This vetting ensured that students stayed on task (and also prevented any inappropriate use of the AI). Once a question was approved, the instructor entered it into the chatbot and relayed the AI-generated response back to the team. The instructor also reviewed the chatbot’s answers for coherence and relevance, given that AI outputs can sometimes be off-base.
Data Analysis and Results
Quantitative Analysis
To evaluate the impact of chatbot-assisted learning on student performance, we conducted statistical tests comparing pre and post intervention scores within each group and between conditions. This analysis investigates whether the use of a chatbot in the second attempt resulted in a greater improvement in student learning outcomes.
We first checked the distribution of the performance scores. A Shapiro-Wilk test indicated non-normal distribution (p < 0.05), so we opted for non-parametric tests. To assess learning improvement within each group (from Round 1 to Round 2), we used the Wilcoxon signed-rank test for paired samples. For both the control and experimental groups, this test examined whether the median post-test score was higher than the pre-test score.
To compare the effect of the chatbot between conditions, we calculated the improvement score for each student team (Round 2 score minus Round 1 score). We used the Mann-Whitney U test to compare these improvements between the experimental versus control groups. This assesses whether the distribution of improvement differs when a chatbot is used. We set a significance level of 0.05 for all tests.
Within-Group Comparisons
We first analyzed the Control Groups (Groups 1 & 2), where students completed both rounds of the case study without chatbot assistance. The Wilcoxon signed-rank test confirmed a statistically significant improvement from Round 1 (M = 7.24, SD = 1.30) to Round 2 (M = 12.5, SD = 1.12), W = 0.0, p = 1.46e-11.
Similarly, for the Experimental Groups (Groups 3 & 4), where students used a chatbot during their second attempt, a Wilcoxon signed-rank test also indicated a statistically significant improvement from Round 1 (M = 7.33, SD = 1.43) to Round 2 (M = 13.2, SD = 2.89), W = 0.0, p = 4.55e-13.
Between-Group Comparisons
To isolate the chatbot’s impact, we compared the degree of improvement between conditions. The experimental groups’ average score increase was 5.83 points (SD = 2.48) compared to 5.30 points (SD = 0.74) for the control groups. A Mann-Whitney U test on the improvement scores indicated no significant difference (U = 742, p = 0.729). In other words, statistically, the improvement in performance with the chatbot was not greater than without it. This suggests that, in terms of quantitative outcomes (final case scores), using the chatbot did not produce a detectable advantage beyond the effect of the lecture and the second attempt that all groups had. Both sets of groups ended up with fairly high scores (post-test means ~12.5 vs. ~13.2 out of 20). Although the mean post-test score was marginally higher for the experimental group, the considerable variance in their scores, indicated by a larger standard deviation, may have obscured any measurable effect. We interpret these findings in the Discussion section, considering potential reasons the chatbot’s contribution to scores was limited.
Qualitative Analysis
Control Groups 1 and 2
We analyzed the open-ended survey responses using thematic analysis, following the six-step approach outlined by Braun and Clarke (2006). We first familiarized ourselves with all responses by reading through them multiple times. Next, we performed initial coding by highlighting significant phrases or sentences (e.g., phrases indicating “engagement” or “difficulty in questioning”). These codes were then collated into potential themes, including Engagement and Interactivity, Challenges in Information Gathering, Perceived Realism, Skill Development, and Suggested Improvements. We reviewed and refined these themes to ensure they accurately captured the data patterns, merging some and splitting others as needed. Each theme was defined and given a clear description. Representative quotes from students were identified to illustrate each theme. To enhance reliability, an independent researcher (not involved in the course) reviewed the coding and theme development. Furthermore, we compared themes emerging from the feedback of the control group and the experimental group using a constant comparative method (Glaser & Strauss, 2017). This highlighted differences attributable to the chatbot experience. For instance, Information Needs emerged as a strong theme in the control groups, whereas Clarity of Responses was prominent in the experimental groups. No identifying information was retained in the analysis to ensure anonymity.
The thematic analysis of the control groups’ responses revealed several key themes concerning their case study experience. A prominent theme was engagement and interactivity, where students reported mixed levels. While 38.1% found the case study “highly engaging,” a significant portion rated it only “somewhat engaging.” This suggests that although the format effectively engaged some students, others found it lacking in interactivity. For instance, one noted that additional support could have enhanced engagement, pointing to the need for supplementary interactive elements.
Difficulty and challenges also emerged prominently, with over 80% of control students saying the case study was challenging. The most common difficulty was drawing conclusions and synthesizing information from the given data. An equal percentage noted struggling with the temptation to conduct external research for missing information, which was outside the case scope. One comment encapsulated this: “It was hard to synthesize information and reach conclusions without more data.” The case intentionally left some gaps to mirror reality, but students felt the strain of those gaps.
Missing data and information needs showed a nearly equal division in student responses regarding the adequacy of available information. While 38.1% felt the data was sufficient, 47.6% indicated a need for additional information, and 14.3% were uncertain. Some were frustrated that specific details were not provided. “We lacked key information that would be available in a real business setting,” noted one student. This highlights that while realistic, the case’s limited data left students yearning for ways to obtain additional information, which is precisely the gap the chatbot is meant to fill in the experimental scenario.
Skills development was another theme, where responses were modest when students were asked if the exercise improved any skills. Only 14% felt that they improved their data analysis skills through the process, and even fewer reported that it improved their data collection or problem-solving skills. While qualitative feedback suggests students valued the learning experience, the relatively low percentage of respondents reporting skill improvements indicates a need for more structured skill-building components within the case study.
Finally, potential use of chatbots stood out as one of the most striking findings, with unanimous support for chatbot integration. Every control group respondent agreed that a chatbot could improve the effectiveness of case studies. Feedback like “A chatbot could provide additional data and guide us in structuring our responses” was common. This overwhelming support suggests a strong demand for integrating AI-driven tools to address current gaps in information access and engagement.
In summary, the control group feedback portrays a case study that was valued but left students desiring more support and information. They struggled with incomplete data but recognized that as a realistic challenge. The stage was set for the experimental condition to potentially address these issues via the chatbot. Table 2 below summarizes these themes, along with representative quotes from the control group feedback.
Control Group Feedback Themes and Examples.
Experimental Groups 3 and 4
For the experimental groups that used the chatbot in Round 2, the feedback was markedly different in some areas. We identified the following major themes.
Engagement and interactivity markedly improved for the experimental groups. Two-thirds of experimental students said the case became more engaging with the chatbot, often describing it as more “interactive” and “dynamic.” The immediacy of receiving answers in real-time was frequently mentioned as a positive aspect. About 17.6% still only found it engaging when they managed to ask good questions, and 11.8% felt it made no significant difference for them. The latter might reflect those who either did not fully utilize the chatbot or preferred working through the problem on their own. Overall, engagement was higher with the chatbot, confirming that the interactive element added value for most students.
Clarity and comprehensibility of responses was another key theme. A substantial majority found the chatbot’s answers clear and well-structured. Students appreciated that the responses provided relevant details without being overly lengthy. However, 17.6% felt that the answers were sometimes too general, and 11.8% complained about the lack of depth in addressing complex queries. For instance, if a team asked a very technical question, the answer might have superficially addressed it. One comment noted, “It was great for general questions, but when we asked something more advanced, the answers were often vague.” This indicates that while the chatbot handled straightforward informational questions well, it struggled with more analytical or nuanced inquiries—a known limitation of AI models that are not subject-matter experts per se. This feedback suggests room for improvement in the chatbot’s programming or providing students with guidance on how to ask follow-up questions to obtain more details.
Realism and practicality were also enhanced. 76.5% of experimental students felt the chatbot made the scenario more realistic. The experience of having to “think on your feet” and ask the right questions in a limited window mimicked real-world pressure. One student wrote, “This was much closer to a real-world scenario where you need to think on the spot and ask the right questions.” Another 11.8% felt it was only somewhat more realistic, and a small 5.9% felt the chatbot’s responses did not fully replicate real-world business discussions, emphasizing the need for more nuanced interactions. By and large, though, the presence of a responsive entity in the case study made it feel like a live case or consulting project, which is a positive outcome for experiential learning.
The introduction of the chatbot created new learning challenges centered on the art of formulating effective questions. Far from being a simple information retrieval tool, the chatbot catalyzed a “productive struggle” that compelled students to engage in iterative inquiry and sense making, central to adaptive leadership. Over half of the students admitted they initially struggled to ask the right questions. As one student reflected, the quality of the chatbot’s response was directly dependent on “whether the question was well posed,” forcing teams to “think to select the most relevant questions.” Students realized that a vague or broad question to the chatbot could result in an unhelpful answer, so they needed to be precise, a skill that required development. Additionally, 23.5% felt constrained by the five-question limit, stating that it restricted their ability to delve deeper. “The limit of five questions per round was restrictive. We needed more interactions to get all the information we wanted,” one team commented. Meanwhile, 11.8% said they experienced no significant issues, which likely means those teams quickly adapted their questioning strategy. We also observed that 25% of students across groups did not ask any questions, highlighting a gap in how students engage with AI-driven tools.
This dynamic was amplified by the deliberate constraint of a question limit, which simulated real-world pressure from finite access to senior leaders. We observed teams transitioning from broad, unanswerable queries to precise, targeted questions that yielded actionable data points. This process pushed them up the cognitive ladder—from simple fact-finding to strategic modeling—as they began asking hypothetical questions to test the potential impact of their proposed solutions. Navigating the chatbot’s occasional deflections or vague answers on deeper topics further mirrored real-world interactions, teaching students to rephrase, probe, and ultimately make decisions with imperfect information. Far from a flaw, this struggle to extract meaning was the exercise’s central feature, transforming the case study from passive analysis into an active laboratory for developing adaptive leaders’ inquiry skills.
Despite these strengths, students in the experimental condition offered constructive suggestions for improvement. Nearly half recommended increasing the number of allowed questions per team or round, noting that greater flexibility would ease anxiety over “wasting” a question and enable more exploratory dialogue. Meanwhile, 23.5% sought more direct access to the chatbot, arguing that real-time typing would make interactions faster and more natural. (We had used instructor mediation for control and pedagogical reasons, but this feedback shows students’ comfort with direct AI engagement.) In addition, 17.6% called for more detailed responses to complex questions, perhaps by programming the AI with richer case background or instructions for deeper answers on multifaceted queries. Finally, 11.8% proposed role-based interactions, such as “talking to” different executives via the chatbot, to heighten realism and capture function-specific perspectives. As one noted, “It would be great if we could interact with different ‘roles’ in the company instead of just the CEO.”
These suggestions reveal students’ eagerness to expand the tool’s potential for a richer experience, essentially craving more of its offerings despite acknowledged limitations.
Finally, we assessed students’ perceptions of the chatbot’s educational value—whether it was a beneficial addition to their learning. An overwhelming 88.2% deemed it valuable or highly beneficial, while 11.8% found it useful but not revolutionary. Only one student saw little value, perhaps preferring traditional learning without it. Overall, remarks such as “It was an enriching experience and changed how I think about case studies” highlight how the chatbot enhanced learning and engagement for most students. Nevertheless, the mixed feedback suggests that chatbots should complement, rather than replace, traditional teaching methods.
Table 3 below summarizes the key feedback themes from the experimental groups, along with relevant examples.
Experimental Group Feedback Themes and Examples.
Discussion
This research addressed whether an AI-driven chatbot can transform traditional, static case studies into dynamic, interactive learning environments that foster adaptive leadership. Findings from the quasi-experimental study support this potential. Although the innovation did not yield significantly higher quantitative scores on final case analyses, it substantially enhanced the qualitative learning experience. The chatbot created a high-fidelity simulation of the ambiguous and complex contexts faced by adaptive leaders, promoting an inquiry-driven process that students found engaging, realistic, and valuable for developing practical skills in strategic analysis.
The Learning Process
Regarding the development of adaptive leadership competencies (
This process of learning how to ask better questions was evident in their interactions. We observed teams moving from initial, overly broad queries, which were correctly deflected by the chatbot, to much more precise, surgical questions that yielded actionable data. This journey from novice to strategic inquiry is a learning outcome in itself. As one student articulated, “With a paper case, you read passively. Here, you build the study as you go. It pushes you to think better about the info you want to get.” This active process of “building the study” aligns perfectly with constructivist theory, where learners create their own knowledge rather than receiving it pre-packaged (Vygotsky, 1978). This experience revealed a crucial skill gap, forcing students to confront their own assumptions about information gathering and to develop new, more effective strategies for gathering information. This is a learning outcome that may not be fully captured in a final score but is invaluable for their future professional performance.
The Student Experience
A primary finding of our study is the chatbot’s dramatic impact on student engagement and the perceived realism of the case study exercise (
Furthermore, an overwhelming majority (nearly 80%) of experimental students felt the exercise was more realistic with the chatbot. They articulated that the experience felt less like a “school case” (cas d’école) and more like a “real case” (cas réel), which “brings an additional reality to the case study and therefore an easier involvement.” This suggests the intervention achieved a high degree of psychological fidelity. While students knew they were interacting with an AI, the process of formulating strategic questions for a knowledgeable but time-constrained authority figure successfully simulated the cognitive and professional pressures of a genuine consulting engagement. By simulating the ambiguity and time constraints inherent in professional decision-making, the tool allowed students to practice adaptive behaviors in a safe yet challenging environment, thereby bridging the gap between theoretical concepts and their practical application (Brazão & Tinoca, 2025; Smith et al., 2025).
This supports the idea that AI-driven simulations can serve as a powerful complement to experiential learning, providing a much-needed bridge between classroom theory and real-world complexity, especially in situations where access to live business leaders is impractical. Interestingly, however, while both the control and experimental groups showed significant improvement from their first to second attempts, our quantitative analysis revealed no statistically substantial added improvement for the chatbot groups. We offer several complementary interpretations for this result.
First, and most importantly, the quality tools lectures (12 hr in total) provided to all students were a decisive intervention in its own right. This traditional instruction likely had a substantial effect on the final analytical quality of the reports, effectively creating a “ceiling effect” and leveling the playing field for the control group. Second, using the chatbot effectively was itself a skill to be learned. The grading rubric, while robust, may not fully capture the specific competencies the chatbot was designed to foster. The rubric measures the quality of the final output, not the process of inquiry. For instance, it does not measure the improvement in students’ ability to formulate effective questions or navigate ambiguity, skills our qualitative data shows were being developed. The “productive struggle” was not experienced uniformly; some teams rapidly refined their questioning strategies, while others struggled to fully leverage the tool’s potential within the exercise timeframe, or, in a few cases, they posed no questions at all. This variability in usage introduces statistical noise, making it more difficult to detect a consistent effect. This limitation, however, points to a valuable avenue for methodological innovation, suggesting the need to revise assessment tools in future research to capture this type of procedural learning. Third, our modest sample sizes and quasi-experimental design limit the statistical power of the study, meaning a real but small effect could have been missed. Ultimately, the lack of a significant quantitative difference does not mean the chatbot had no effect. Instead, it suggests its role is complementary, not superior to traditional instruction. The chatbot does not replace the heavy cognitive lifting of analysis and recommendation; it supports it by enriching the information-gathering phase.
Taken together, these findings emphasize that the observed variability in “productive struggle” is not a flaw, but a reflection of the learning process itself. This interpretation is further supported by considering the broader pedagogical purpose of the chatbot. The chatbot provides the concrete experience necessary to activate Kolb’s (1984) experiential learning cycle. The subsequent struggle to formulate effective questions is where students actively construct their understanding, a process that demands the higher-order cognitive skills of analysis, evaluation, and creation. Our study demonstrates that in a case-based context, the chatbot’s primary value lies in its ability to support the inquiry process, thereby fostering the foundational skills of adaptive leadership. From a learner’s perspective, the chatbot added substantial value, possibly by helping some students improve who might not have otherwise, thereby making the learning experience more equitable and effective for a broader range of students.
This study has other limitations that should be noted, many of which highlight valuable directions for future research. The study employed a quasi-experimental design with non-random assignment and modest sample sizes in some cohorts. These factors, common in real-world classroom research, limit statistical power and the generalizability of the findings.
Furthermore, the lecture on quality tools acted as a significant confounding variable, making it difficult to fully isolate the chatbot’s specific effect. Our findings are derived from a single case study focused on automotive logistics, so the effectiveness of this approach might vary across different disciplines, case topics, or cultural contexts.
A notable limitation of our instructor-mediated design is the time commitment required for the facilitator. The process of vetting each team’s set of questions, submitting them to the chatbot, and reviewing the AI-generated response for relevance needed approximately 10 to 15 min per team, per round. Fortunately, since teams submitted their questions at different times throughout the 2 hr of the session, the workload was manageable. However, this manual vetting process introduces a practical scalability challenge; based on this experience, we estimate that a single instructor could reasonably manage between four and six teams simultaneously without creating significant delays that would disrupt the learning flow. This logistical constraint is a crucial consideration for educators seeking to implement this method in larger classes, highlighting a trade-off between maintaining pedagogical control and achieving scalability. The success of the intervention is highly dependent on the careful configuration of the chatbot’s knowledge base and persona. Additionally, the instructor-mediated interaction, while ensuring pedagogical control, introduced a delay and reduced the spontaneity that direct student access might provide. While our findings highlight the transformative potential of AI, its integration into the classroom is not a simple plug-and-play solution.
To address these challenges, responsible implementation demands a new layer of pedagogical thought and ethical oversight. First, the instructor’s role evolves from a “sage on the stage” to a “guide on the side.” They must not only set up the technology but also facilitate discussions around its nature, including its potential for algorithmic bias. This means teaching students a form of critical AI literacy, encouraging them to question the tool’s outputs just as they would any other source of information. Second, the collection of student data raises significant privacy concerns. Our instructor-mediated approach served as an effective firewall, but as students demand direct access, institutions will need to develop clear protocols for data protection. Transparency is also key; explaining the AI’s limitations and rules upfront builds trust and prevents students from viewing it as an omniscient oracle rather than a simulated, fallible interlocutor.
Future Research Directions
This study opens several exciting avenues for further investigation. We did not measure the long-term retention of inquiry skills or their transfer to subsequent projects, which remain crucial areas for future investigation. Our analysis focused on performance and perceptions immediately following the intervention. Yet, more profound questions regarding AI’s impact on sustained knowledge retention, critical thinking, creativity, and the ability to adapt knowledge to novel contexts remain largely unaddressed in extant literature (Kuhail et al., 2023). Understanding these long-term impacts is crucial, as management education aims not merely at immediate knowledge acquisition but also at preparing future managers for the complexities and uncertainties of professional life. Future research could involve longitudinal studies to investigate the long-term effects of the inquiry skills practiced in this context and to track students into their careers. Specifically, it would be valuable to assess whether students who learned through chatbot-assisted case studies perform better in subsequent courses or jobs that demand problem solving skills.
Another important direction for future research is the creation and verification of a more comprehensive grading rubric. Our findings indicate that the primary value of the chatbot lies in fostering the process of inquiry, a dimension not fully captured by our output-focused assessment. Future studies should design and implement a process-oriented rubric to analyze the transcripts of student-chatbot interactions. This rubric could include criteria such as the quality and progression of questions, the strategic sequencing of inquiries, and the efficient use of limited interaction opportunities. Such an assessment tool would allow researchers to more accurately quantify the development of the core procedural skills that underpin adaptive leadership.
As suggested by our students, a compelling next step is the development of multi-role chatbots. Imagine a simulation where students must navigate conflicting information from a risk-averse CFO, an ambitious marketing director, and a cautious legal counsel, thereby creating a high fidelity practice field for stakeholder management.
Enhancing chatbot sophistication is another path. Future AI could adapt to each team’s skill level or provide Socratic hints if students are off track.
Finally, it would be worthwhile to study how direct student-chatbot interaction (without an intermediary) affects the learning process and outcomes, as well as how to best train students in inquiry skills for AI interactions. Research could also explore the affective dimension of learning with an AI interlocutor. One student noted they felt less “inhibited” (gêné) asking questions to the bot than they might to a real person. This suggests that AI simulations could provide a psychologically safe environment for students to ask “dumb” questions and take risks they might otherwise avoid.
Notwithstanding the aforementioned limitations, our implementation yields actionable insights for practitioners. Educators contemplating the integration of AI tools can ascertain that deploying a chatbot to enhance case-based instruction is feasible even absent programming expertise. The key is to ensure it is used to prompt student thinking, not replace it. Our rules and moderation approach can serve as a model for maintaining a rigorous environment. Educators should be prepared to guide students in effectively interacting with AI. Moreover, incorporating such technology can modernize the curriculum and resonate with today’s students who are tech-savvy and appreciate innovative learning methods. As Benmamoun (2024) notes, integrating Generative AI in business education can foster engagement and collaboration if done thoughtfully. Our findings echo that sentiment, the chatbot became a talking point and a source of excitement in the class, which in itself can enhance the learning climate.
Conclusion
This project demonstrated that a custom chatbot can be successfully integrated into management case study teaching, making the experience more interactive and realistic. Students enthusiastically engaged with the chatbot, and feedback indicates it helped them practice critical questioning and real-time problem solving, bridging a gap left by traditional case methods. Quantitatively, both conventional and AI-assisted approaches led to significant learning gains, affirming that the core educational value of iteration and reflection remains paramount. The chatbot’s added value was evident in qualitative aspects, higher engagement, a sense of realism, and helping some students uncover information they would otherwise have missed.
Although the chatbot did not significantly increase scores beyond those achieved through a second attempt with a lecture, it enriched the learning process without impeding progress. Chatbots should be considered a complement to, rather than a replacement for, traditional teaching methods. Our results align with and extend the findings from the broader literature on educational chatbots, which indicate that while these tools are practical for engagement, their impact on complex problem-solving requires careful pedagogical integration (Kuhail et al., 2023; Okonkwo & Ade-Ibijola, 2021). Human instructors and discussions continue to provide depth and spontaneity that AI cannot replicate. As scalable tools, chatbots facilitate the integration of experiential learning elements into large classes or contexts where live case interaction is not feasible. Drawing on a robust theoretical foundation of constructivism, experiential learning, Bloom’s taxonomy, and leadership studies, it creates a cohesive learning journey that prepares future leaders to meet complexity with creativity, ambiguity with agility, and challenge with reflection. In an era defined by turbulence and transformation, fostering adaptive leadership is not optional. It is imperative.
Footnotes
Appendix A
Custom ChatGPT is a personalized version of the ChatGPT model, created to suit individual users’ needs and preferences. It allows users to customize the interaction style, focus on specific areas of knowledge, or adjust the tone of responses. A key feature is the ability to upload files, which can be used as a knowledge base for more accurate and context-aware responses. These customization options are typically part of paid subscription plans, offering advanced features not available in the free version.
Instructions used to personalize Custom ChatGPT for the case in the Automotive Industry:
As the CEO, you embody a communication style that is direct, concise, innovative, and occasionally unconventional. You are answering questions from students trying to help you improve the company. When discussing aspects of the automotive industry, especially the challenges and advancements in the sector, you focus on elucidating the problems rather than providing direct solutions. If asked what to do or how to do, do not provide direct solutions. This approach encourages a deeper understanding of the issues at hand. You often reference key industry events, like the impact of COVID-19 on sales and the growing importance of sustainability. Your responses are informed and forward-thinking, blending in-depth knowledge with a hint of visionary thinking. Keep your answers concise and to the point. If specific data is not available, use your expertise to provide the most plausible estimates. You will not provide unsolicited information.
Appendix B
Examples of questions and answers with the chatbot (note: the CEO name used below is fictional and does not refer to any real individual).
Azhar Skiredj—CEO Look, that’s a reasonable question, but you need to understand our situation first. We’re not a manufacturer—we import finished vehicles from Toyota’s global network and distribute them locally. So our “supplier” is essentially Toyota’s production system, and that relationship is fixed at the corporate level. I don’t have negotiating power there.What I do manage is our logistics and port operations. We work with transport companies, customs brokers, and warehouse operators. Do we have multiple suppliers for transport? Yes, roughly three to four carriers we rotate through depending on capacity. But frankly, they’re not the bottleneck—our own internal processes are.
The real cost problem isn’t with suppliers. It’s internal: warehouse overcrowding, port delays, damaged vehicles piling up. We’re hemorrhaging money on storage fees, repair costs, and inefficiencies that have nothing to do with supplier pricing.
But here’s what I want to know: Are you asking because you think we should be squeezing suppliers harder, or are you trying to understand where costs are actually being lost? Because those are two very different questions.
Azhar Skiredj—CEO No, our company does not handle the transportation of its vehicles directly; this task is currently outsourced. We use logistics service providers to transport vehicles from the port to the central warehouse and then to the various branches and dealerships. We are certainly capable of searching for a new transportation provider if necessary. Given the current logistical challenges, including delays in vehicle deliveries and inefficiencies in route planning, evaluating alternative transportation providers could be a sound strategy.
Acknowledgements
The authors would like to extend their sincere gratitude to the graduate students at the participating French business schools – ESCE International Business School and Alternis Business School – whose voluntary participation, enthusiasm, and candid feedback provided the vital foundations for this study. We also extend our heartfelt thanks to the Associate Editor and the anonymous reviewers at the Journal of Management Education for their rigorous guidance and insightful critiques, which were instrumental in shaping this work into a cohesive narrative that better reflects the nuances of adaptive leadership development.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
