Beyond Human Actors: Leveraging AI for Enhanced Public Safety XR Training

Abstract

Law enforcement officers often respond to behavioral health crises, in which individuals with mental health conditions face an 11.6-fold higher risk of police use of force. While extended reality (XR) offers a promising training solution for de-escalation, current systems lack the realism needed to develop skills under pressure. This study introduces an XR training prototype powered by an AI-driven non-player character (NPC) to address these shortcomings. A two-phase evaluation assessed the system's usability and its training efficacy with officers. Results showed high user engagement, a strong sense of virtual presence, and a cognitive load profile favorable for learning. Most notably, participating officers showed a 50% reduction in race and gender bias as measured by implicit association test (IAT) scores. These results highlight the potential of AI-enhanced XR to improve police decision-making and mitigate bias in high-stakes encounters.

Keywords

artificial intelligence; AR/VR/MR; education and learning; extended reality; machine learning; user-centered design

Introduction

Law enforcement officers (LEOs) and community partners often act as first responders to 911 calls, including those involving behavioral health crises. Calls involving individuals with mental health conditions make up approximately 20% of service requests, and people with serious behavioral health issues face an estimated 11.6 times higher risk of police use of force (UoF) than the general population (Laniyonu & Goff, 2021; Morabito et al., 2017). To reduce the likelihood and severity of police UoF, it is crucial to enhance LEO training and promote effective de-escalation and crisis management strategies. Integrating extended reality (XR) technologies (Skarbez et al., 2021) offers potential benefits for police decision-making (Regal et al., 2023) by supporting the successful application of de-escalation tactics in critical situations. However, existing XR-based LEO scenarios leave significant gaps in addressing the nuanced demands of de-escalation training, such as a lack of diverse training scenarios and of real-time stressful interactions, both of which are critical for developing and practicing de-escalation skills under stress. Moreover, existing XR simulations depend on human operators to guide training sessions, which constrains their utility and impact. This paper leverages artificial intelligence (AI)-informed training experiences within XR to help bridge these gaps (Lavoie et al., 2023).

Background

XR law enforcement training simulations for de-escalation present various scenarios set in different environments of varying complexity (Muñoz et al., 2024). Recently, AI-driven NPCs (e.g., Inworld AI, Convai) have gained popularity as embodied intelligent agents for virtual simulations and engaging XR learning experiences, using large language models (LLMs) to hold intelligent and adaptive conversations (Mochamad et al., 2024). Because real-world interactions between individuals and police are unpredictable, training scenarios need to increase situational difficulty by invoking emotion-driven NPC behavior (e.g., civilian conduct, behavioral health). In this paper, we design neurobehaviorally diverse NPCs in XR that are driven by state-of-the-art LLMs and facilitate a comprehensive and intuitive learning experience for LEOs.
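The text does not describe how the LLM is conditioned on NPC behavior. One common pattern, shown here only as a hypothetical sketch, is to compose a system prompt from the behavioral scenario factors and pass it to whichever conversational backend drives the avatar; no Inworld AI or Convai API is used or implied. `NPCPersona`, `build_system_prompt`, and the cue wording are illustrative assumptions. Avatar gender and race are deliberately left out of the prompt and treated as appearance and voice properties, so the model is not asked to infer behavior from demographics.

```python
from dataclasses import dataclass

# Hypothetical behavior cues for the two behavioral factors; the wording is an
# illustrative assumption, not the prompts used in the prototype.
CONDUCT_CUES = {
    "cooperative": "You answer the officer's questions calmly and follow reasonable requests.",
    "non-cooperative": "You are agitated, refuse reasonable requests, and argue with the officer.",
}
NEUROTYPE_CUES = {
    "neurotypical": "Your reactions are typical for someone who is stressed by a police encounter.",
    "neurodivergent": "You are in a behavioral health crisis and your emotional responses shift quickly.",
}


@dataclass(frozen=True)
class NPCPersona:
    """Hypothetical behavioral profile for one LLM-driven NPC."""
    conduct: str    # "cooperative" or "non-cooperative"
    neurotype: str  # "neurotypical" or "neurodivergent"


def build_system_prompt(persona: NPCPersona, scene: str) -> str:
    """Compose a provider-agnostic system prompt that conditions an LLM on the persona."""
    if persona.conduct not in CONDUCT_CUES:
        raise ValueError(f"unknown conduct: {persona.conduct!r}")
    if persona.neurotype not in NEUROTYPE_CUES:
        raise ValueError(f"unknown neurotype: {persona.neurotype!r}")
    return "\n".join([
        f"You are a civilian in the following situation: {scene}",
        CONDUCT_CUES[persona.conduct],
        NEUROTYPE_CUES[persona.neurotype],
        "Stay in character and reply only with what you would say aloud to the officer.",
    ])


if __name__ == "__main__":
    persona = NPCPersona(conduct="non-cooperative", neurotype="neurodivergent")
    print(build_system_prompt(persona, "an officer knocks on your door for a welfare check"))
```

Keeping the cues in plain dictionaries makes it easy to tune conduct or neurodivergence behaviors independently, for example when officers report that non-cooperative NPCs feel too compliant.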

Approach

We followed a two-phase evaluation approach for the AI-NPC-based law enforcement XR training prototype (Figure 1). In the first phase, we evaluated the overall technological usability of the prototype with non-law enforcement participants (n = 26, 14 F; age 31.5 ± 16.4 years). In the second phase, we evaluated training efficacy with LEOs (n = 6, 1 F; age 38 ± 10.5 years; mean law enforcement experience 13.6 years) and gathered both subjective and quantitative feedback on the prototype. We designed sixteen NPC scenarios, motivated by real-world incidents and body-camera footage (Figure 1), from a counterbalanced 2 x 2 combination of conduct (cooperative/non-cooperative) and neurodiversity (neurotypical/neurodivergent) presented across NPC avatars matched for gender (male/female) and race (Black/White). Participants completed two rounds of NPC scenarios and responded to a cognitive load theory (CLT) questionnaire (Sweller, 1994) at the end of each round. They also reported the emotions they experienced after interacting with NPCs of each characteristic using the Discrete Emotions Questionnaire (DEQ; Harmon-Jones et al., 2016). Participants' implicit race and gender biases were measured with an implicit association test (IAT; Greenwald et al., 1998) administered before and after the study. At the end of the study, participants reported their overall user experience through the User Engagement Scale (UES; O’Brien, 2016), the VR Presence Questionnaire (VRPQ; Witmer & Singer, 1998), and the Simulator Sickness Questionnaire (SSQ; Kennedy et al., 1993).
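The scenario count follows from crossing four two-level factors (2 x 2 x 2 x 2 = 16). The text does not specify how presentation order was counterbalanced, so the sketch below assumes a Williams (balanced Latin square) design over the four conduct x neurodiversity cells; how avatar gender and race are assigned within each cell is left open. `scenario_grid`, `balanced_latin_square`, and `cell_order` are hypothetical helpers, not part of the prototype.

```python
from itertools import product

# Factor levels as described in the study; the ordering scheme below is an assumption.
CONDUCT = ("cooperative", "non-cooperative")
NEUROTYPE = ("neurotypical", "neurodivergent")
GENDER = ("male", "female")
RACE = ("Black", "White")


def scenario_grid():
    """Return all 2 x 2 x 2 x 2 = 16 (conduct, neurotype, gender, race) combinations."""
    return list(product(CONDUCT, NEUROTYPE, GENDER, RACE))


def balanced_latin_square(n):
    """Williams design for an even number of conditions n.

    Each condition appears once in every serial position, and each condition
    immediately precedes every other condition exactly once across the rows.
    """
    if n < 2 or n % 2:
        raise ValueError("this construction requires an even n >= 2")
    first, low, high = [0], 1, n - 1
    # First row alternates from both ends: 0, 1, n-1, 2, n-2, ...
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Every later row shifts the first row by r (mod n).
    return [[(c + r) % n for c in first] for r in range(n)]


def cell_order(participant_index):
    """Hypothetical helper: order of the four conduct x neurotype cells for one participant."""
    cells = list(product(CONDUCT, NEUROTYPE))
    square = balanced_latin_square(len(cells))
    return [cells[i] for i in square[participant_index % len(cells)]]


if __name__ == "__main__":
    print(len(scenario_grid()))  # 16
    for k in range(4):
        print(k, cell_order(k))
```

A Williams square controls first-order carryover (for example, a non-cooperative NPC always preceding a cooperative one) with only four distinct orders, which suits small samples such as the LEO group.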

Figure 1.

XR Training prototype implementation.

Outcome

For the first phase of evaluation with non-law enforcement participants (Table 1, left), median UES scores were high for aesthetic appeal, reward (which captures the hedonic aspects of our AI-NPC-based XR training), and emotional engagement. Focused attention was rated medium and perceived usability was rated low, which we attribute to the prototype being designed for LEOs rather than civilians. Participants reported high median presence scores across the VRPQ subscales, and median SSQ discomfort scores were low. CLT scores corroborated these findings: median germane load was high, suggesting conditions conducive to long-term knowledge retention with the XR training prototype, and median extraneous load was low.

For the second phase of evaluation with LEOs (Table 1, right), median UES scores for aesthetic appeal and perceived usability were higher than those of non-LEO participants. Median VRPQ scores were medium to high across all subscales but lower than the non-LEO scores. LEOs also reported low median SSQ discomfort scores, indicating little nausea or oculomotor discomfort. Median CLT scores were high for germane load and low for intrinsic and extraneous load, with germane and intrinsic load lower than in the non-LEO group.

Table 1.

User Evaluation Outcomes of the XR Training Prototype With Non-LEO and LEO User Groups.

Metric	Subscale	Non-LEO (n = 26), M (SD)	LEO (n = 6), M (SD)
UES	Aesthetic Appeal	3.46 (1.35)	4 (1.29)
	Focused Attention	2.57 (1.24)	2.66 (0.82)
	Perceived Usability	0.5 (1.11)	1.5 (1.44)
	Reward	4 (1.32)	3.5 (0.96)
	Emotional Engagement	4 (1.07)	4 (1.64)
VRPQ	Spatial Presence	4.06 (1.3)	3.22 (0.94)
	Involvement	4.8 (1.01)	4.6 (0.72)
	Realism	4.12 (0.8)	3.75 (0.78)
CLT	Intrinsic Load	3.5 (1.24)	1 (0.54)
	Germane Load	4.33 (0.79)	4 (0.57)
	Extraneous Load	2 (1.11)	2 (1.34)
SSQ	Nausea	5 (4.26)	4 (1.48)
	Oculomotor	3 (4.03)	1 (1.22)
	Disorientation	0 (0.29)	0 (1.34)

A two-way (conduct x neurodiversity) ANOVA on DEQ scores pooled across both study phases found higher anger, anxiety, and fear scores for non-cooperative NPCs (all p < .003) and higher sadness scores for neurodivergent NPCs (p < .001). Paired-samples t-tests on IAT scores for AI NPC race (p = .002) and gender (p < .001) indicated that post-study scores were significantly higher than pre-study scores, corresponding to a 50% mitigation of race and gender bias. Post-study semi-structured interviews showed a promising response to the AI NPCs, especially from LEOs. Participants in both studies perceived the AI NPCs to be as conversant as an actual human being. They also found the neurodivergent AI NPCs relevant for training on mental health issues because these NPCs presented diverse emotional responses, something lacking in current LEO training. However, LEOs found the AI NPCs relatively compliant, even with non-cooperative personalities, compared with real-world officer-civilian interactions, suggesting that NPC believability needs improvement.
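The analyses named above can be made concrete with a minimal standard-library sketch that computes (a) a paired-samples t statistic on pre/post scores, (b) a relative reduction in bias magnitude, and (c) F ratios for a balanced two-factor ANOVA. Several assumptions are labeled here and in the code: bias magnitude is taken as the mean absolute IAT D-score, the 50% figure is read as a halving of that magnitude, and the ANOVA treats cells as independent groups (a between-subjects layout), whereas the study's within-participant design may call for a repeated-measures model. The function names are hypothetical, and p-values would additionally require the t and F distributions, for example from SciPy (`scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired-samples t statistic and degrees of freedom for post - pre differences."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # fails if all differences are identical
    return mean(diffs) / se, len(diffs) - 1


def relative_reduction(pre, post):
    """Percent reduction in mean absolute score (assumed bias-magnitude metric)."""
    before = mean(abs(x) for x in pre)
    after = mean(abs(x) for x in post)
    return 100.0 * (before - after) / before


def two_way_anova(cells):
    """Balanced between-subjects two-factor ANOVA (assumption; see lead-in).

    cells maps (level_a, level_b) -> list of scores; every list has the same length.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(a_levels) * len(b_levels) or any(len(v) != n for v in cells.values()):
        raise ValueError("design must be complete and balanced")
    if n < 2:
        raise ValueError("need at least 2 scores per cell")
    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}
    ka, kb = len(a_levels), len(b_levels)
    # Sums of squares for main effects, interaction, and within-cell error.
    ss_a = kb * n * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = ka * n * sum((m - grand) ** 2 for m in b_mean.values())
    ss_ab = n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)
    df_a, df_b = ka - 1, kb - 1
    df_ab, df_err = df_a * df_b, ka * kb * (n - 1)
    ms_err = ss_err / df_err
    return {
        "F_a": (ss_a / df_a) / ms_err,
        "F_b": (ss_b / df_b) / ms_err,
        "F_ab": (ss_ab / df_ab) / ms_err,
        "df_err": df_err,
    }
```

For a DEQ subscale such as anger, `cells` would map (conduct, neurodiversity) pairs to the scores collected in that condition; factor A is then conduct and factor B is neurodiversity.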

Conclusion

In this paper, we present the design and evaluation of neurobehaviorally diverse AI NPCs in XR. Our evaluation findings highlight an overall positive user experience with the AI NPC prototypes and the effectiveness of the AI NPCs in evoking emotions appropriate to the scenarios for LEOs. LEO feedback on improving NPC realism by adjusting conduct- and neurodivergence-based behaviors will guide our future work on refining NPC behaviors.

Footnotes

ORCID iD

Ranjana K Mehta

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by funds from the National Science Foundation (Grant No: 2349138) to Dr. Ranjana K. Mehta. The funding organization played no role in the study design, data collection, analysis, interpretation of data, or the writing of this manuscript.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. (1998). Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality and Social Psychology, 74(6), 1464.

Harmon-Jones, Bastian, & Harmon-Jones (2016). The discrete emotions questionnaire: A new tool for measuring state self-reported emotions. PLoS One, 11, e0159915.

Kennedy, R. S., Lane, N. E., Berbaum, K. S., & Lilienthal, M. G. (1993). Simulator Sickness Questionnaire (SSQ) [Database record]. APA PsycTests. https://doi.org/10.1037/t04669-000

Laniyonu & Goff, P. A. (2021). Measuring disparities in police use of force and injury among persons with serious mental illness. BMC Psychiatry, 21(1), 500. https://doi.org/10.1186/s12888-021-03510-w

Lavoie, Álvarez, Baker, & Kohl (2023). Training police to de-escalate mental health crisis situations: Comparing virtual reality and live-action scenario-based approaches. Policing: A Journal of Policy and Practice, 17, paad069.

Morabito, M. S., Socia, Wik, & Fisher, W. H. (2017). The nature and extent of police use of force in encounters with people with behavioral health disorders. International Journal of Law and Psychiatry, 50, 31–37. https://doi.org/10.1016/j.ijlp.2016.10.001

Muñoz, J. E., Lavoie, J. A., & Pope, A. T. (2024). Psychophysiological insights and user perspectives: Enhancing police de-escalation skills through full-body VR training. Frontiers in Psychology, 15, 1390677. https://doi.org/10.3389/fpsyg.2024.1390677

O’Brien (2016). Theoretical perspectives on user engagement. In O’Brien & Cairns (Eds.), Why engagement matters (pp. 1–26). Springer. https://doi.org/10.1007/978-3-319-27446-1_1

Skarbez, Smith, & Whitton, M. C. (2021). Revisiting Milgram and Kishino’s reality-virtuality continuum. Frontiers in Virtual Reality, 2, 647997.

Sweller (1994). Cognitive load theory, learning difficulty, and instructional design. Learning and Instruction, 4(4), 295–312. https://doi.org/10.1016/0959-4752(94)90003-5

Witmer, B. G., & Singer, M. J. (1998). Measuring presence in virtual environments: A presence questionnaire. Presence: Teleoperators and Virtual Environments, 7(3), 225–240. https://doi.org/10.1162/105474698565686