Personalized Driven Instruction Through Explainable Agentic AI in Multicultural Higher Education Environments

Abstract

The rapid diversification of higher education settings and the exponential growth of educational big data have increased the need for intelligent, transparent, and adaptive personalized instruction systems. To address the pedagogical, cultural, and cognitive variability present in multicultural higher education settings, this study proposes a Big Data–Driven Personalized Instruction framework powered by Explainable Agentic Artificial Intelligence (X-AI). The proposed approach uses autonomous agentic AI architectures that orchestrate goal-directed learning, dynamic learner profiling, and real-time instructional adaptation by drawing on large-scale, multimodal educational data, namely the HarvardX-MITx Person-Course Dataset. Explainability mechanisms are incorporated at both the model and decision levels to support pedagogical trustworthiness and ethical deployment, yielding interpretable insights into learner performance predictions, instructional recommendations, and adaptive intervention strategies. To help teachers understand cross-cultural learning patterns and reduce algorithmic bias, the system incorporates feature attribution, causal inference, and visual analytics. In experimental evaluations carried out on diverse higher education datasets, the proposed methodology substantially improves learning outcome prediction accuracy ($F_{1} = 0.88$), instructional relevance, and student engagement relative to traditional data-driven personalization techniques. This work advances a scalable and culturally responsive paradigm for personalized education by combining big data analytics, agentic autonomy, and explainable AI. It offers practical insights to educators, institutions, and policymakers who seek to transform higher education systems in an equitable, transparent, and data-driven manner.

Keywords

big data; explainable agentic AI; higher education; instruction; multiculture; personalization

Introduction

The digitization of higher education has sparked a “Big Data” revolution, marked by the unprecedented volume, velocity, and variety of learner-generated data.¹ With the growing use of Learning Management Systems (LMS) and Massive Open Online Courses (MOOCs) by institutions, the student body has evolved from a confined, homogeneous group into a heterogeneous, globally dispersed community. This change poses a significant pedagogical challenge: how to provide individualized education at scale that is both culturally sensitive and cognitively appropriate.²

Traditional learning analytics (LA) has focused mainly on predictive modeling, which categorizes students as “at-risk” or “safe” based on past data.³ These models are useful, but they are essentially passive: they can describe what has happened or forecast what may happen, but they lack the agency to act independently. A paradigm shift toward Agentic AI is now under way. Agentic systems can perceive their surroundings, reason about objectives, act to alter the system’s state, and reflect on the results.⁴,⁵ In an educational setting, an agentic system does more than identify a student who is having difficulty; it also functions as an independent instructor, changing the curriculum order, providing remedial material, or adjusting the level of assessment difficulty in real time.

However, using Agentic AI in multicultural settings raises two significant concerns: bias and opacity. Complex agentic systems, such as those built on Large Language Models and Deep Reinforcement Learning, function as “black boxes”.⁶ These autonomous agents need to be explainable for educators to trust them. Moreover, in the absence of explicit constraints, agents that optimize basic reward metrics (such as course completion rates) may unintentionally adopt policies that discriminate against subpopulations, perpetuating historical injustices found in the training data.⁷

This paper presents a new framework for culturally aware agentic personalization. In contrast to earlier research that relies on limited datasets (such as OULAD⁹), we use the HarvardX-MITx Person-Course Dataset⁸ to predict learner behavior across more than 100 countries. By explicitly incorporating Fairness Constraints derived from Hofstede’s Cultural Dimensions, we formally characterize the personalization problem as a Constrained Partially Observable Markov Decision Process (CPOMDP).¹⁰ We show that transparent and efficient agents can be developed by combining Offline Reinforcement Learning with Shapley Additive exPlanations (SHAP).

Related Work

Agentic AI in education

The transition from Intelligent Tutoring Systems (ITS) to Agentic AI reflects an increase in autonomy. Rule-based expert systems were the foundation of early ITS.¹¹ Modern Agentic AI uses Reinforcement Learning (RL) to learn effective teaching policies from data. For example, recent work has represented student interaction as a Markov Decision Process (MDP) in order to improve the sequencing of learning activities.¹²,¹³ However, the majority of current agents presume a “universal learner,” ignoring the significant influence of cultural background on learning behaviors (e.g., collaborative vs. competitive engagement, help-seeking).¹⁴

Explainable AI (XAI) and visual analytics

The demand for XAI has increased as AI models have become more complex. For black-box models, methods such as SHAP and LIME offer post-hoc explanations.¹⁵,¹⁶ XAI is essential for “Human-in-the-Loop” decision-making in education, where educators must verify algorithmic suggestions.¹⁷ Visual analytics dashboards provide the interface for this validation by converting intricate feature attributions into useful educational insights.¹⁸

Fairness in educational data mining

There is ample evidence of algorithmic bias in education. Models trained on majority populations frequently perform poorly for minority groups.¹⁹ Fairness-aware machine learning aims to lessen this through strategies such as adversarial debiasing and re-weighting. However, maintaining fairness in sequential decision-making (RL) is much more difficult because the agent’s actions affect the future data distribution.²⁰ To overcome this, we use constrained optimization to incorporate fairness explicitly into the agent’s optimization objective.

Mathematical Framework

We use a CPOMDP to model the interaction between the multicultural learner and the agentic tutor.

State space and cultural embedding

Unlike fully observable MDPs, educational environments involve culturally mediated help-seeking behavior, motivation levels, and latent cognitive competency that are not directly observable in log data. Belief-MDP formulations approximate such hidden states probabilistically, but they do not explicitly include fairness criteria across protected cultural features. We therefore characterize the instructional personalization problem as a CPOMDP, which accounts for uncertainty in both knowledge tracing and culturally contextualized behavioral adaptation. This formulation enables the simultaneous optimization of educational effectiveness and group-level fairness under uncertainty.

Let $S$ be the state space. The state $s_{t}$ of a student at time $t$ is composed of three vectors:

$$ s_{t} = [k_{t}, b_{t}, c] $$

where:

$k_{t} \in \mathbb{R}^{d}$ : The latent Knowledge State, which represents mastery of $d$ concepts (e.g., tracked by Bayesian Knowledge Tracing).

$b_{t} \in \mathbb{R}^{m}$ : The Behavioral State, which is based on recent log data (such as quiz attempts, forum posts, and video pauses).

$c \in \mathbb{R}^{6}$ : The Cultural Context vector. In contrast to earlier studies that used simple “Country” labels, we link each student’s country to Hofstede’s six cultural dimensions (Power Distance, Individualism, Masculinity, Uncertainty Avoidance, Long-Term Orientation, Indulgence).¹⁰ This enables the agent to generalize across countries with comparable cultural features.
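The state construction above can be illustrated with a minimal sketch. It assumes a lookup table from country label to six Hofstede scores on a 0 to 100 scale; the table `HOFSTEDE`, its values, the country labels, and the midpoint fallback for unknown countries are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical scores (illustrative only, not real index data).
# Order: Power Distance, Individualism, Masculinity,
# Uncertainty Avoidance, Long-Term Orientation, Indulgence.
HOFSTEDE: Dict[str, List[float]] = {
    "Country A": [40.0, 90.0, 60.0, 45.0, 25.0, 70.0],
    "Country B": [80.0, 20.0, 65.0, 30.0, 85.0, 25.0],
}


def cultural_vector(country: str) -> List[float]:
    """Map a country label to a 6-dimensional vector scaled to [0, 1].

    Unknown countries fall back to 0.5 on every dimension (an assumption).
    """
    scores = HOFSTEDE.get(country)
    if scores is None:
        return [0.5] * 6
    return [s / 100.0 for s in scores]


def build_state(k_t: Sequence[float], b_t: Sequence[float], country: str) -> List[float]:
    """Concatenate knowledge, behavioral, and cultural vectors: s_t = [k_t, b_t, c]."""
    return list(k_t) + list(b_t) + cultural_vector(country)


# Two concepts (d = 2), three behavioral features (m = 3), six cultural dimensions.
s_t = build_state(k_t=[0.7, 0.2], b_t=[3.0, 1.0, 0.0], country="Country A")
print(len(s_t))
```

Because the cultural part of the state is a vector of dimension scores rather than a one-hot country code, two countries with similar scores yield nearby states, which is what allows a policy to generalize between them.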

Action space

The agent chooses an action $a_{t} \in A$ from a set of pedagogical interventions:

$$ A = \{\text{Nudge}, \text{Simplify}, \text{Deepen}, \text{Connect Peer}, \text{Notify Instructor}, \text{Do Nothing}\} $$

Objective function: Reward and fairness

Because interventions affect future learning trajectories, demographic parity is justified in sequential instructional settings. Longitudinal inequality worsens if some cultural groups consistently receive fewer high-impact interventions. Imposing parity in expected intervention benefits therefore helps ensure equitable cumulative learning opportunities.

The agent aims to satisfy fairness requirements while maximizing a cumulative reward $R$.

The base reward function $r(s, a, s')$ is defined as:

$$ r_{t} = \Delta \mathrm{Grade}_{t+1} - \lambda \cdot \mathrm{Cost}(a_{t}) $$

where $\Delta \mathrm{Grade}_{t+1}$ is the improvement in assessment score, and $\mathrm{Cost}(a_{t})$, weighted by $\lambda$, penalizes expensive interventions (e.g., involving a human instructor).
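As a minimal sketch of the base reward, the function below computes grade improvement minus a weighted intervention cost. The per-action costs in `ACTION_COST` and the default weight `lam` are hypothetical placeholders, since no numeric values are given in the text.

```python
# Hypothetical per-action costs (assumed for illustration only).
ACTION_COST = {
    "Nudge": 0.1,
    "Simplify": 0.2,
    "Deepen": 0.2,
    "Connect Peer": 0.3,
    "Notify Instructor": 1.0,  # involving a human instructor is the most expensive
    "Do Nothing": 0.0,
}


def reward(grade_before: float, grade_after: float, action: str, lam: float = 0.05) -> float:
    """r_t = (grade improvement) - lam * Cost(a_t)."""
    return (grade_after - grade_before) - lam * ACTION_COST[action]


# Same grade improvement, different intervention cost.
cheap = reward(0.60, 0.70, "Nudge")
costly = reward(0.60, 0.70, "Notify Instructor")
print(cheap > costly)
```

With equal grade gains, the cheaper intervention earns the higher reward, so the agent is nudged toward escalating to an instructor only when the expected gain justifies it.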

Fairness Constraint: We impose Demographic Parity on the benefit of interventions across cultural clusters. Let $G$ be a set of cultural groups (e.g., High vs. Low Individualism). For a minimum expected-return threshold $\tau$:

$$ \forall g \in G, \quad \mathbb{E}_{\pi} \left[ \sum_{t} r_{t} \,\middle|\, c \in g \right] \geq \tau $$

This ensures the agent does not maximize global rewards by neglecting specific cultural groups.
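The per-group constraint can be checked empirically with a short sketch. It assumes trajectories are available as pairs of a group label and a reward sequence, and it uses a plain Monte Carlo average of cumulative reward per group; the group labels, the sample data, and the threshold value are hypothetical. Note that averaging logged trajectories estimates the return of the policy that generated them, not of a new target policy.

```python
from collections import defaultdict
from statistics import mean


def group_returns(trajectories):
    """Average cumulative reward per group.

    trajectories: iterable of (group_label, [r_0, r_1, ...]).
    """
    totals = defaultdict(list)
    for group, rewards in trajectories:
        totals[group].append(sum(rewards))
    return {g: mean(v) for g, v in totals.items()}


def violated_groups(trajectories, tau):
    """Return the groups whose estimated expected return falls below tau."""
    return sorted(g for g, j in group_returns(trajectories).items() if j < tau)


# Hypothetical logged trajectories for two cultural clusters.
trajs = [
    ("high_IDV", [0.5, 0.5]),
    ("high_IDV", [0.25, 0.25]),
    ("low_IDV", [0.25, 0.0]),
    ("low_IDV", [0.25, 0.5]),
]
print(group_returns(trajs))
print(violated_groups(trajs, tau=0.6))
```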

Optimization via Lagrangian relaxation

To learn from the static HarvardX-MITx dataset without online interaction, we employ Offline RL (more precisely, Conservative Q-Learning, or CQL) to solve this constrained optimization problem.²¹

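We construct the Lagrangian:

$$ L(\pi, \nu) = J_{R}(\pi) - \sum_{g} \nu_{g} \left( \tau - J_{C_{g}}(\pi) \right) $$

where $J_{R}$ is the expected return, $J_{C_{g}}$ is the expected return for group $g$, and $\nu_{g} \geq 0$ are Lagrange multipliers. The penalty term $\tau - J_{C_{g}}(\pi)$ is positive exactly when the constraint $J_{C_{g}}(\pi) \geq \tau$ is violated. The agent updates its policy $\pi$ to maximize $L$, while simultaneously updating $\nu$ to enforce the constraints (Dual Gradient Descent).²²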
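The multiplier update in dual gradient descent can be sketched in a few lines. The sketch assumes the sign convention in which each multiplier is non-negative and grows while its group's return stays below the threshold; the step size `lr`, the group labels, and the return values are hypothetical, and the policy update itself (CQL in the paper) is not shown.

```python
def lagrangian(j_r, j_c, nu, tau):
    """L = J_R - sum_g nu_g * (tau - J_Cg), with nu_g >= 0."""
    return j_r - sum(nu[g] * (tau - j_c[g]) for g in j_c)


def dual_step(nu, j_c, tau, lr=0.5):
    """One projected gradient step on the multipliers.

    nu_g increases when group g is below the threshold and is
    clipped at zero when the constraint is satisfied with slack.
    """
    return {g: max(0.0, nu[g] + lr * (tau - j_c[g])) for g in j_c}


# Hypothetical per-group returns under the current policy.
j_c = {"high_IDV": 1.0, "low_IDV": 0.5}
nu = dual_step({"high_IDV": 0.0, "low_IDV": 0.0}, j_c, tau=0.75)
print(nu)
print(lagrangian(0.75, j_c, nu, tau=0.75))
```

Only the under-served group acquires a positive multiplier, so the next policy update sees a lower objective until that group's return rises toward the threshold.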

Methodology and System Architecture

Dataset selection: HarvardX-MITx

We use the HarvardX-MITx Person-Course Dataset (Academic Year 2013) to meet the need for a rigorous, multicultural dataset.⁸,²³

Scale: 641,138 unique registrants.

Features: userid_DI (De-identified ID), course_id, final_cc_cname_DI (Country), LoE_DI (Level of Education), YoB (Year of Birth), nplay_video, nforum_posts, grade.

Cultural Mapping: We join this dataset with Hofstede’s index data,²⁴ mapping final_cc_cname_DI to the 6-dimensional vector $c$ .

Data Preprocessing:

1. Filtering: We select learners who engaged with at least 5 chapters to focus on active participants.

2. Imputation: To prevent bias injection, missing demographic data is handled as a distinct category.

3. Normalization: Log transformation is applied to interaction counts (nevents).
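The three preprocessing steps can be sketched with the standard library alone, treating each record as a dictionary. The sketch assumes the chapter count lives in a column named `nchapters`, uses `log1p` so that zero counts stay defined, and applies the transform to three count columns; these choices and the `MISSING` label are assumptions, not details confirmed by the text.

```python
import math

MISSING = "MISSING"  # explicit category for absent demographic values (assumed label)
DEMOGRAPHIC_COLS = ("LoE_DI", "YoB", "final_cc_cname_DI")
COUNT_COLS = ("nevents", "nplay_video", "nforum_posts")


def preprocess(rows, min_chapters=5):
    """Filter active learners, flag missing demographics, log-scale counts."""
    out = []
    for row in rows:
        # 1. Filtering: keep learners with at least `min_chapters` chapters.
        if (row.get("nchapters") or 0) < min_chapters:
            continue
        clean = dict(row)
        # 2. Imputation: missing demographics become their own category.
        for col in DEMOGRAPHIC_COLS:
            if clean.get(col) in (None, ""):
                clean[col] = MISSING
        # 3. Normalization: log(1 + count), treating missing counts as zero.
        for col in COUNT_COLS:
            clean[col] = math.log1p(clean.get(col) or 0)
        out.append(clean)
    return out


# Two hypothetical records; the second is filtered out as inactive.
rows = [
    {"nchapters": 6, "LoE_DI": "", "YoB": 1990, "final_cc_cname_DI": "India",
     "nevents": 0, "nplay_video": 9, "nforum_posts": None},
    {"nchapters": 2, "LoE_DI": "Bachelor's", "YoB": 1985, "final_cc_cname_DI": "Spain",
     "nevents": 40, "nplay_video": 3, "nforum_posts": 1},
]
cleaned = preprocess(rows)
print(len(cleaned), cleaned[0]["LoE_DI"])
```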

Agentic workflow

Three loops make up the system design (Fig. 1):

The Perception Loop: Uses a Recurrent Neural Network (LSTM) to ingest raw clickstream records and update the Belief State $b (s_{t})$ .

The Decision Loop: The RL policy $\pi_{\theta}(a \mid b)$ selects the best educational action given the current belief.

The Explanation Loop: The X-AI Module calculates SHAP values for the state attributes to explain why an action was selected and creates counterfactuals for recourse.

FIG. 1. The Fair-Adaptive Agentic Architecture (FA3).

Experimental Results

Our Fairness-Aware Agentic (FAA) model was compared to two baselines:

1. Rule-Based Heuristic (RB): The standard “if-then” intervention logic employed by many LMSs.

2. Unconstrained PPO Agent (PPO): A typical Proximal Policy Optimization agent that maximizes grade improvement only.

Performance metrics

To estimate the expected reward on held-out data, models were assessed using offline policy evaluation techniques, particularly Doubly Robust (DR) estimation.
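The idea behind DR estimation can be shown with a simplified one-step (contextual bandit) sketch. It combines a model-based value estimate with an importance-weighted correction on the logged reward; the sequential, per-step variant needed for full trajectories is not shown. The function signatures, the logged tuples, and the constant reward model are hypothetical.

```python
def doubly_robust(logged, target_policy, q_hat):
    """One-step doubly robust value estimate.

    logged: list of (state, action, reward, behavior_prob) tuples.
    target_policy(state) -> dict mapping action -> probability.
    q_hat(state, action) -> model estimate of the reward.
    """
    total = 0.0
    for state, action, reward, behavior_prob in logged:
        pi = target_policy(state)
        # Model-based part: expected q_hat under the target policy.
        baseline = sum(p * q_hat(state, a) for a, p in pi.items())
        # Importance-weighted correction using the observed reward.
        rho = pi.get(action, 0.0) / behavior_prob
        total += baseline + rho * (reward - q_hat(state, action))
    return total / len(logged)


# Hypothetical log from a uniform behavior policy over two actions.
logged = [("s", "Nudge", 1.0, 0.5), ("s", "Do Nothing", 0.0, 0.5)]
always_nudge = lambda state: {"Nudge": 1.0, "Do Nothing": 0.0}
crude_model = lambda state, action: 0.5  # deliberately uninformative reward model

print(doubly_robust(logged, always_nudge, crude_model))
```

In this toy log the estimate recovers the value of always nudging even though the reward model is uninformative, which illustrates why the estimator is called doubly robust: it remains consistent if either the reward model or the behavior probabilities are correct.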

Analysis

The unconstrained PPO agent obtained the highest raw reward but showed a considerable fairness gap (0.24): it disproportionately favored students from high-individualism (Western) cultures, where the training data was densest. Our FAA model demonstrated the effectiveness of the Lagrangian constraints by reducing this cultural disparity by almost 80% (to a gap of 0.05) while sacrificing only a modest margin of global utility (0.79 vs. 0.82).
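As a quick check of these figures, using only the fairness gaps and normalized rewards reported for the PPO and FAA agents, the relative gap reduction and the relative reward cost work out as follows:

```latex
\[
\frac{0.24 - 0.05}{0.24} \approx 0.79,
\qquad
\frac{0.82 - 0.79}{0.82} \approx 0.037
\]
```

That is, a reduction of roughly 79% in the fairness gap is obtained for a loss of under 4% in average reward.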

CQL Sensitivity Analysis

The CQL conservatism parameter $\alpha$ was varied over $\{0.1, 0.5, 1.0\}$. Higher $\alpha$ values increased fairness stability and decreased overestimation bias, but they also marginally reduced expected reward (−1.8% at $\alpha = 1.0$). A moderate $\alpha = 0.5$ achieved the best balance between reward retention and fairness enforcement.

Training complexity

Training complexity is O(N·T·|A|) for CQL with LSTM belief updates. Training took 4.6 hours on an NVIDIA A100 GPU. Inference latency is 18 ms per decision, which makes the system appropriate for real-time LMS integration. The fairness gap reduction (0.24 → 0.05) has a 95% bootstrap confidence interval (CI) of [0.16, 0.21] and is statistically significant (p < 0.01).
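A percentile bootstrap interval of the kind reported here can be sketched as follows. The sketch assumes the statistic of interest is a mean over per-unit gap-reduction estimates; the sample values, the number of resamples, and the fixed seed are hypothetical and do not reproduce the paper's data.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    stats = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical per-fold estimates of the fairness gap reduction.
reductions = [0.10, 0.20, 0.15, 0.25, 0.30, 0.05, 0.20, 0.22, 0.18, 0.19]
lower, upper = bootstrap_ci(reductions)
print(round(lower, 3), round(upper, 3))
```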

In addition to DR estimation, we used Fitted Q Evaluation and Weighted Importance Sampling. All three estimators produced a consistent model ranking, with variance bounds overlapping within 95% CIs, which confirms the stability of the evaluation under model misspecification.

Cultural distance analysis

We examined how Cultural Distance from the U.S.-centric norm affected the agent’s performance.

$$ D_{\mathrm{cult}} = \lVert \mathbf{c}_{\mathrm{learner}} - \mathbf{c}_{\mathrm{US}} \rVert_{2} $$

Standard agents demonstrated a significant inverse relationship ($r = -0.65$) between prediction accuracy and cultural distance. The FAA model showed no significant correlation ($r = -0.08$), indicating that it handled intercultural diversity effectively.
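The two quantities used in this analysis, the Euclidean cultural distance and the Pearson correlation between distance and accuracy, can be computed with a short sketch. The example vectors and accuracy values are hypothetical and serve only to exercise the functions.

```python
import math


def cultural_distance(c_learner, c_ref):
    """Euclidean (L2) distance between two cultural vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_learner, c_ref)))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical reference vector and three learner vectors.
c_ref = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
learners = [[3.0, 4.0, 0.0, 0.0, 0.0, 0.0],
            [6.0, 8.0, 0.0, 0.0, 0.0, 0.0],
            [9.0, 12.0, 0.0, 0.0, 0.0, 0.0]]
distances = [cultural_distance(c, c_ref) for c in learners]
accuracy = [0.9, 0.8, 0.7]  # hypothetical accuracy that falls with distance
print(distances, pearson_r(distances, accuracy))
```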

Explainability fidelity

Rather than evaluating each step in isolation, fidelity is calculated over the entire learner trajectory. In particular, we quantify the agreement between cumulative predicted Q-values and SHAP-reconstructed contributions across time steps while preserving sequential dependencies.

We computed the Fidelity of our SHAP explanations.

$$ \text{Fidelity} = \mathbb{E} \left[ \left| f(x) - \sum_{i} \phi_{i} \right| \right] $$

The explanation fidelity was > $0.95$ , ensuring that the insights provided to teachers accurately reflected the agent’s decision logic.
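The term inside the expectation, the gap between a model output and the sum of its attributions, can be illustrated with exact Shapley values on a tiny model. The sketch enumerates all feature subsets, replaces absent features with a baseline value, and assumes the base value (the model output at the baseline) is counted together with the attributions; the toy function `f`, the input, and the baseline are hypothetical. It computes the per-instance gap only and does not normalize it into a score.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values by subset enumeration (feasible for few features)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value, the rest the baseline.
        return f([x[j] if j in subset else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def reconstruction_gap(f, x, baseline):
    """|f(x) - (base value + sum of attributions)|."""
    phi = shapley_values(f, x, baseline)
    return abs(f(x) - (f(baseline) + sum(phi)))


# Toy model with one additive term and one interaction term.
f = lambda z: 2 * z[0] + z[1] * z[2]
phi = shapley_values(f, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi, reconstruction_gap(f, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

For exact Shapley values the gap is zero up to floating-point error; with sampled or approximate attributions, as is typical for larger models, the gap becomes a meaningful diagnostic.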

Ablation study

Removing the fairness constraints increased the reward by 3% but doubled the fairness gap. Removing the SHAP module decreased interpretability fidelity by 41% but had no effect on reward.

Visual Analytics for Educators

We created a Visual Analytics Dashboard (Fig. 2) to make the “Black Box” transparent.

FIG. 2. Explainable, Actionable, and Cluster-Aware Visual Analytics for Agentic Educational Decision Support.

Figure 2 Description:

• Left Panel (The “Who”): Learners colored by Cultural Cluster in a t-SNE projection. Teachers can use this to determine whether the agent is handling certain clusters (such as “High Uncertainty Avoidance”) differently.

• Center Panel (The “Why”): A chosen student’s SHAP Force Plot. For instance, it shows that the student’s “High Peer Connection” (derived from the forum network analysis) reduced the risk score while “Low Video Interaction” increased it.

• Right Panel (The “What if”): A Counterfactual Interface. It answers: “What is the minimal change required to move this student from ‘Fail’ to ‘Pass’?”

◦ For instance: “If the student posts 2 more times in the forum, Probability of Pass increases by 15%.”
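The counterfactual question in the right panel can be sketched as a search for the smallest change in one actionable feature that pushes a predicted pass probability over a threshold. The logistic model, its weights, the feature names, and the threshold below are all hypothetical stand-ins for the actual predictor.

```python
import math


def pass_probability(forum_posts, video_plays, w_posts=0.4, w_video=0.05, bias=-2.0):
    """Hypothetical logistic model of the probability of passing."""
    z = bias + w_posts * forum_posts + w_video * video_plays
    return 1.0 / (1.0 + math.exp(-z))


def minimal_extra_posts(forum_posts, video_plays, threshold=0.5, max_extra=50):
    """Smallest number of additional forum posts that reaches the threshold.

    Returns None if the threshold is not reached within `max_extra` posts.
    """
    for extra in range(max_extra + 1):
        if pass_probability(forum_posts + extra, video_plays) >= threshold:
            return extra
    return None


# A student with 1 forum post and 10 video plays.
before = pass_probability(1, 10)
needed = minimal_extra_posts(1, 10)
after = pass_probability(1 + needed, 10)
print(needed, round(before, 2), round(after, 2))
```

Restricting the search to a feature the student can actually change, such as forum activity, is what makes the resulting statement usable as recourse rather than as a mere explanation.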

Discussion

Adopting Agentic AI in education is an ethical matter and not just a technical advancement. Using the HarvardX-MITx dataset, our findings show that “blind” big data approaches risk automating the advantages enjoyed by previously privileged groups (e.g., those from cultures whose learning styles coincide with Western MOOC design). We restore equity by explicitly modeling culture using Hofstede’s dimensions and applying mathematical constraints. Under the standard assumptions of CQL and bounded reward functions, convergence to a conservative optimal Q-function is ensured in the limit of sufficient data coverage. For non-stationary cultural distributions, we use fairness constraint recalibration and periodic policy re-estimation.

The proposed framework is designed to align with GDPR Article 22 on automated decision-making and the associated “right to explanation.” Meaningful transparency is operationalized through our counterfactual recourse interface and the SHAP-based explanation layer. To support regulatory compliance in institutional deployments, audit logs and fairness-monitoring dashboards are incorporated.

The observed Constraint-Reward Trade-off (Table 1) illustrates the “Price of Fairness”: a significant increase in equity is purchased at a minor cost in aggregate efficiency. This is an investment that educational institutions must make. Additionally, the AI becomes a partner instead of an oracle when Visual Analytics is integrated. Using the SHAP-based explanations, teachers can detect failures of system design as well as student failure (e.g., if “Country” is frequently a top predictor of failure, the course content may be culturally biased).

Table 1. Performance comparison of instructional decision-making models under accuracy–fairness trade-offs

Model	Avg. reward (normalized)	Dropout prediction F1	Fairness gap (high vs low IDV)
Rule-Based (RB)	0.45	0.62	0.12
Unconstrained PPO	0.82	0.89	0.24
Fairness-Aware Agent (FAA)	0.79	0.88	0.05

We acknowledge that Hofstede indices operate at national aggregates and may not capture intra-country heterogeneity. To mitigate this limitation, we augment the national cultural vectors with behavioral cultural proxies obtained by unsupervised clustering of engagement patterns in the interaction traces, which enables latent cultural inference and personalized cultural embeddings at the individual level. Furthermore, temporal drift is addressed through periodic offline re-training and policy recalibration using rolling-window data, which supports robustness under evolving cultural distributions.

SHAP assumes feature independence in its marginal contribution estimation, so attribution can be distributed arbitrarily among correlated educational features (such as forum activity and peer engagement). To reduce this risk, we applied hierarchical clustering of correlated features before SHAP computation.

Conclusion

This study proposed a robust, mathematically grounded framework for Agentic AI in multicultural higher education. We showed that it is feasible to create intelligent and fair personalized instruction systems by combining Constrained POMDPs, Offline RL, and Visual Analytics on the large HarvardX-MITx dataset. Future research will concentrate on multiagent scenarios in which teachers and students work together to co-create learning pathways.

Authors’ Contributions

C.Q.: Conceptualization. K.C.: Data curation. Z.W.: Formal analysis. L.H.: Drafting of article.

Acknowledgment

The authors would like to thank their respective institutes for their support and for providing the work space to perform this experiment.

Author Disclosure Statement

The authors report no conflicts of interest.

Funding Information

No funding was received for this article.

References

1. Long, Siemens. Penetrating the fog: Analytics in learning and education. EDUCAUSE Review 2011;46(5):31–40.

2. Kizilcec, Cohen. Eight-minute self-regulation intervention raises educational attainment at scale in individualist but not collectivist cultures. Proc Natl Acad Sci USA 2017;114(17):4348–4353.

3. Shaun, De Baker, Inventado. Chapter 4: Educational data mining and learning analytics. Comput Sci. 2014;7:1–6.

4. Norvig, Russell. Artificial intelligence: A modern approach. Pearson: Hoboken, New Jersey; 2016.

5. Luckin, Holmes. Intelligence unleashed: An argument for AI in education. Pearson: London, United Kingdom; 2016.

6. Arrieta, Díaz-Rodríguez, Del Ser, et al. Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion 2020;58:82–115.

7. Kusner, Loftus, Russell, et al. Counterfactual fairness. Advances in neural information processing systems: Long Beach, California; 2017;Vol 30.

8. Ho, Chuang, Reich, et al. HarvardX and MITx: Two years of open online courses fall 2012-summer SSRN; 2014.

9. Kuzilek, Hlosta, Zdrahal. Open university learning analytics dataset. Sci Data 2017;4(1):1–8.

10. Hofstede. Dimensionalizing cultures: The Hofstede model in context. Online Read Psychol Cult 2011;2(1):8.

11. Anderson, Corbett, Koedinger, et al. Cognitive tutors: Lessons learned. J Learn Sci. 1995;4(2):167–207.

12. Rafferty, Brunskill, Griffiths, et al. Faster teaching via POMDP planning. Cogn Sci 2016;40(6):1290–1332.

13. Doroudi, Aleven, Brunskill. Where’s the reward? A review of reinforcement learning for instructional sequencing. Int J Artif Intell Educ 2019;29(4):568–620.

14. Sun, Liu, Lin, et al. Temporal learning analytics to explore traces of self-regulated learning behaviors and their associations with learning performance, cognitive load, and student engagement in an asynchronous online course. Front Psychol 2023;13:1096337.

15. Lundberg, Lee. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 2017;30.

16. Ribeiro, Singh, Guestrin. “Why should I trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM; 2016; pp. 1135–1144.

17. Holstein, Wortman Vaughan, Daumé, et al. Improving fairness in machine learning systems: What do industry practitioners need? In: Proceedings of the 2019 CHI conference on human factors in computing systems. CHI; 2019; pp. 1–16.

18. Vieira, Parsons, Byrd. Visual learning analytics of educational data: A systematic literature review and research agenda. Comput Educ. 2018;122:119–135.

19. Le Quy, Nguyen, Friege, et al. Evaluation of group fairness measures in student performance prediction problems. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer Nature Switzerland: Cham; 2022; pp. 119–136.

20. Wen, Bastani, Topcu. Algorithms for fairness in sequential decision making. In: International Conference on Artificial Intelligence and Statistics. PMLR; 2021; pp. 1144–1152.

21. Levine, Kumar, Tucker, et al. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv Preprint 2020.

22. Bertsekas. Constrained optimization and Lagrange multiplier methods. Academic Press: San Diego, California; 2014.

23. MITx H. HarvardX-MITx Person-Course Academic Year 2013 De-Identified dataset, version 2.0. Harvard Dataverse. Harvard Dataverse; 2014.

24. Insights H. Country comparison tool. Hofstede Insights: Helsinki, Finland; 2024.