Developing shooter movement models from virtual reality simulation

Abstract

This research uses virtual reality (VR) to immerse human subjects in an active school shooting scenario in order to generate ecologically valid models of school shooter movement and behavior. Historically, data recovered from US school shootings has lacked the fidelity needed to model shooter movement; consequently, simulations of school shootings have had to rely on significant and unsupported assumptions about the movements of the shooter and/or victims. We asked human subjects to act as school shooters in a VR simulation. We then recorded their movements, observations, and actions. Our results show that participant shooters are statistically equivalent to historical incidents with respect to aggregate engagement metrics (shot rate, victim rate, and accuracy) across most scenarios. Moreover, empirical models trained on participant data reduced prediction error by at least 15.5% compared to heuristic baselines and 16.7% compared to models trained on pedestrian data. Overall, this work provides a reproducible framework for data-driven modeling of shooter movement, supporting controlled simulation-based evaluation of response strategies.

Keywords

Virtual reality, active shooter, behavior modeling, machine learning

1. Introduction

Despite increased spending on school safety and preventive measures in the United States, school shootings have increased; among incidents involving at least one victim, more shootings occurred in the last decade (2016–2025) than in the previous five decades combined (1966–2015).^1–3 Unfortunately, the effectiveness of preventive measures, such as the introduction of armed school security guards, has not been rigorously evaluated,^4–6 largely because ecologically valid studies would require the simulation of a school shooting that could cause psychological distress to participants.^7,8 In order to evaluate potential school shooting prevention measures, techniques must be developed that allow researchers to accurately simulate these traumatic events without harming human subjects. This paper argues that virtual reality (VR) can be used to safely simulate school shootings in order to model the behavior of shooters and capture data that reflects their behavior.

Our near-term goal is to generate data suitable for training a model that accurately predicts school shooter movements and behavior. We believe that an accurate model of the movements of a school shooter could be leveraged to evaluate or improve prevention measures or victim evacuation. Yet modeling school shooter movements is difficult because, with one exception, data from prior shootings are too coarse to reconstruct the movements and actions of the shooter.^9–11 Nevertheless, agent-based modeling (ABM) has been used to simulate shooting incidents, often relying on critical but unsupported assumptions about how a shooter will move in different situations. Common assumptions include the shooter remaining stationary,¹² moving randomly,¹³ or consistently advancing toward the nearest civilian.^14–17 The validity of these assumptions and their impact on the data collected from these simulations are unclear.

We intend to augment these agent-based models with data from human-subject studies conducted in VR. To this end, we asked university students and faculty to assume the role of a school shooter in a VR simulation. We then recorded their movements and actions. The resulting data include approximately 450 min of high-fidelity movement information of participants acting as a school shooter.

For several reasons, we contend that our approach produces ecologically valid school shooter data. First, participants are given face-valid descriptions of the study and told that their goal is “to shoot as many people as possible.” We used surveys to confirm that participants understood these directions and were motivated to follow them. School shooters often express similar goals before initiating school shootings.^9,18 Second, preliminary studies show that, despite being left to decide on their own how to act, participants take actions that qualitatively resemble shooter crime scripts developed by investigators to model shooter behavior: plan, enter the location, select the first target, fire, search for additional targets, fire, and repeat.¹⁹ Although their movements vary, all of our subjects used a similar pattern of actions, alternating between searching for targets and remaining in one location to fire on nearby targets of opportunity. In this paper, we further evaluate this hypothesis by comparing data collected from these human-subject studies to historical data collected from previous school shootings. We then compare a simple, empirical model of school shooter movements to previous approaches in terms of prediction error on unseen data. Finally, we evaluate how participants felt about the value of participating in this study.

This paper provides evidence that VR can be used to collect data capturing human behavior in rare, safety-critical scenarios under controlled conditions. We also demonstrate that the resulting data can be used to develop predictive models. This work is based on a paper presented at the 2025 Annual Modeling and Simulation Conference.²⁰ The data and models related to this work are available on GitHub (https://github.com/chrismcclurg/vr-shooter-data). The simulation environment is available upon request.

2. Related work

Our approach is motivated by research in ABM, where representing human behavior remains a central challenge; human trajectory modeling, which offers empirically grounded approaches to modeling individual decision-making; and VR, which provides both a source of empirical human behavior data and a means for validating behavioral models. These research areas are discussed in the following sections.

2.1. Agent-based modeling

Active shooting incidents have been studied with ABM, where researchers have examined how different modeling assumptions and interventions influence casualty outcomes. In ABM, individuals are represented as agents governed by rules or constraints, allowing population-level trends to be examined across many simulated trials.²¹ Anklam et al.,¹³ for example, examined whether having individuals with concealed carry permits and/or an assigned resource officer would reduce casualties. In their model, the shooter agent randomly selects victims and randomly decides when to change location, continuing until it encounters law enforcement or randomly commits suicide. Hayes and Hayes examined the impact of a proposed gun bill (Assault Weapons Ban of 2013) on school shootings.^14,22 In that study, the modeled shooter followed a simple process of reloading, shooting the nearest individual, and moving toward the most recently targeted individual. Briggs and Kennedy studied whether unarmed resistance by bystanders would reduce casualties.¹² Stewart used ABM to examine the impact of cognitive delay, response strategy, and police response time, with the shooter again targeting the closest person.¹⁵ Lee et al.¹⁶ evaluated the effectiveness of the Run. Hide. Fight. response strategy, with shooter behavior defined by proximity-based targeting within range, and later extended this work to an environment modeled after the Columbine High School library.¹⁷ Across these studies, shooter behavior is specified through a small set of predefined decision rules.
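To make the contrast with data-driven models concrete, the three movement heuristics named in this paper (remaining stationary, moving randomly, and advancing toward the nearest civilian) can be sketched as step functions on an obstacle-free plane. This is an illustrative simplification, not a reimplementation of any cited model: the unit step length, the fixed civilian positions, and all function names are assumptions.

```python
import math
import random

Point = tuple[float, float]


def stationary(shooter: Point, civilians: list[Point], step: float, rng: random.Random) -> Point:
    """Heuristic 1: the shooter never moves."""
    return shooter


def random_walk(shooter: Point, civilians: list[Point], step: float, rng: random.Random) -> Point:
    """Heuristic 2: move one step in a uniformly random direction."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (shooter[0] + step * math.cos(theta), shooter[1] + step * math.sin(theta))


def advance_to_nearest(shooter: Point, civilians: list[Point], step: float, rng: random.Random) -> Point:
    """Heuristic 3: move one step straight toward the nearest civilian, without overshooting."""
    if not civilians:
        return shooter
    target = min(civilians, key=lambda c: math.dist(shooter, c))
    d = math.dist(shooter, target)
    if d <= step:
        return target
    return (shooter[0] + step * (target[0] - shooter[0]) / d,
            shooter[1] + step * (target[1] - shooter[1]) / d)


def rollout(policy, shooter: Point, civilians: list[Point], steps: int,
            step: float = 1.0, seed: int = 0) -> list[Point]:
    """Roll a heuristic forward for a fixed number of timesteps (civilians held fixed)."""
    rng = random.Random(seed)
    path = [shooter]
    for _ in range(steps):
        path.append(policy(path[-1], civilians, step, rng))
    return path


print(rollout(advance_to_nearest, (0.0, 0.0), [(3.0, 4.0), (10.0, 0.0)], steps=6))
```

In the cited models these rules sit inside richer simulations (targeting, reloading, law enforcement response); the sketch isolates only the movement component, which is fixed a priori rather than learned from data.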

Similar approaches have been applied to other risk-critical domains. Fire evacuation models commonly rely on heuristic decision rules (e.g., nearest-exit selection, shortest-path routing, and rule-based evacuation timing), often coupled to increasingly detailed physical or environmental dynamics.^23–25 Similarly, Bae et al.²⁶ modeled city evacuation during a bombing by representing agents as traffic participants, assuming fixed congestion dynamics over road topology. Other studies model evacuation using agents with discrete action spaces, where actions are stochastic and probabilistically calibrated from survey data.^27,28 Beyond evacuation, Malleson et al.²⁹ examined urban crime patterns using agents governed by the Physical conditions, Emotional states, Cognitive capabilities, and Social status (PECS) framework, which represents internal agent state while retaining prescriptive behavioral structure. In these domains, agent behavior is similarly defined through predefined heuristic, probabilistic, or conceptually structured decision rules.

Across both active shooter and broader risk modeling, a central challenge lies in how human decision-making is specified within agent-based systems. Kennedy identified three categories of decision-making models in ABM: (1) mathematical approaches based on simplified rules or functions, (2) conceptual frameworks that explicitly represent cognitive or emotional states, and (3) high-fidelity cognitive architectures.³⁰ Most existing models adopt mathematical or conceptual representations that favor interpretability and scalability over behavioral fidelity, reflecting practical constraints related to data availability and validation in high-risk domains. While heuristic-based ABMs may incorporate empirical calibration, the structure of the decision logic is typically defined a priori and held invariant. In contrast, our approach leverages VR to collect empirical trajectories from which decision-making policies are inferred directly, relaxing the assumption of predefined decision structure.

2.2. Human trajectory modeling

Human behavior representation is also central to the extensive literature on human trajectory modeling, which approaches decision-making implicitly through observed motion. In this setting, the positions of a target pedestrian are observed over time $t \in [0, T_{obs}]$, and future positions are then predicted over time $t \in (T_{obs}, T_{obs} + T_{pred}]$.³¹ Since this problem has been studied for more than 30 years,³² we refer the reader to comprehensive surveys^31,33,34 and instead focus on models that, like ours, incorporate Long Short-Term Memory networks (LSTMs).³⁵

Alahi et al.³⁶ introduced the Social-LSTM model, in which each pedestrian’s movement is represented by an individual LSTM. Information from neighboring pedestrians is shared in a pooling layer either through hidden states (S-LSTM) or coordinates (O-LSTM). Bartoli et al.³⁷ extended this model by including a “context-aware” pooling layer that accounts for the coordinates of static obstacles. Bisagno et al.³⁸ extended the S-LSTM model by first applying k-means clustering to the trajectories in a scene and then only pooling the hidden states of out-group trajectories with respect to a given trajectory. Pei et al. introduced a hand-crafted social affinity map (SAM) that divided the space around the target into bins. Neighboring pedestrians located within the same bin were then treated as a group and jointly influenced the target’s motion.³⁹ Other work used attention mechanisms to account for either static obstacles⁴⁰ or nontarget pedestrians^41,42 in a scene, but these were often limited to high-resolution data.⁴⁰ Some researchers^43–45 have used multichannel encoder–decoder architectures. Typical inputs included target position and/or velocity, static occupancy maps, and neighbor context encoded either as occupancy grids or relative motion. Pfeiffer et al.⁴⁴ combined velocity with explicit static maps and neighbor grids, while Xue et al.⁴³ replaced the static map with raw scene features. Finally, Shi et al.⁴⁵ avoided using grids, instead using a subnetwork based on the relative motion of a pedestrian’s neighbors. These types of models have demonstrated improved performance over S-LSTM on most benchmark scenes where direct comparisons were made.^43,45

None of these previous models has been used to predict a shooter’s trajectory, nor have any been trained on data resembling a shooting event. The standard datasets for pedestrian movement prediction (ETH,⁴⁶ UCY⁴⁷) are based on the movements of crowds of homogeneous pedestrians; the ETH dataset authors note that pedestrians who are “standing or strolling aimlessly” were ignored in order to remove nonhomogeneous data.⁴⁶ As a result, models trained on these datasets produce socially compliant, smooth, and group-coherent^38,39 trajectory predictions. School shootings, in contrast, involve a heterogeneous set of agents: bystanders who may run, freeze, or hide (producing no trajectory), and a shooter who is motivated to harm bystanders. Thus, previous models based on the movement of pedestrians are unlikely to generalize to shooter scenarios because of the difference in motivation of the agents. To capture this heterogeneity, we adopt a multichannel encoder–decoder architecture, which allows distinct input channels to account for different agent and environment states.^43–45 Unlike previous work, our model explicitly encodes state information for neighboring pedestrians (e.g., alive or dead) using dedicated channels. Finally, we avoid reliance on raw image data, which tends to overfit to scene-specific appearance features and generalizes poorly to visually distinct environments, especially under a simulation-to-reality domain shift.
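A minimal sketch of channel-separated neighbor encoding, assuming an egocentric occupancy grid centered on the shooter with one channel each for alive neighbors, dead neighbors, and static obstacles. The grid size, cell size, channel order, and the `egocentric_grid` name are illustrative assumptions rather than the configuration used in this work.

```python
import math

ALIVE, DEAD, OBSTACLE = 0, 1, 2  # hypothetical channel order


def egocentric_grid(center, neighbors, obstacles, size=8, cell=1.0):
    """Return a [3][size][size] grid of occupancy counts centered on `center`.

    neighbors: iterable of ((x, y), is_alive) pairs
    obstacles: iterable of (x, y) points (e.g., sampled door or wall positions)
    Points falling outside the size x size window are ignored.
    """
    grid = [[[0.0] * size for _ in range(size)] for _ in range(3)]
    half = size * cell / 2.0

    def mark(channel, x, y):
        # Shift into window coordinates, then bin into cells.
        col = math.floor((x - center[0] + half) / cell)
        row = math.floor((y - center[1] + half) / cell)
        if 0 <= row < size and 0 <= col < size:
            grid[channel][row][col] += 1.0

    for (x, y), alive in neighbors:
        mark(ALIVE if alive else DEAD, x, y)
    for x, y in obstacles:
        mark(OBSTACLE, x, y)
    return grid
```

Keeping alive and dead neighbors in separate channels lets a downstream encoder weight them differently, which a single occupancy channel (as in pedestrian models) cannot express.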

2.3. VR for data and validation

VR has repeatedly been validated as a tool for measuring and developing human decision-making in situations of risk. In fire evacuation, Arias et al.⁴⁸ found behavioral patterns (i.e., evacuation times and exit choices) to be comparable in VR and physical experiments. In fire evacuation with robot assistance, Yin et al.⁴⁹ found outcomes and trust in the robot to be comparable in VR and physical experiments. Shipman et al.⁵⁰ studied a simulated violent threat, demonstrating that psychological responses and key behavioral dependencies observed in VR were statistically indistinguishable from those measured in a corresponding physical experiment. Medical researchers have demonstrated that skill development in VR is comparable to real surgery training.^51–53 Qin et al.⁵⁴ demonstrated that VR-based combat trauma care training leads to significant improvements in trainee knowledge relative to pretraining baselines. Finally, Harris et al.⁵⁵ found the decision-making of individuals in military combat scenarios to be comparable in VR and live-fire settings. This last example is particularly relevant to our work, because participants are given a gun and must make split-second decisions about whether and when to shoot.

The presented studies establish VR as a validated experimental instrument for measuring decision-making and behavioral dynamics in high-risk, time-critical scenarios. In the context of school shootings, continuous ground-truth movement data from real perpetrators is essentially unavailable, with only a single documented example known to the authors.⁵⁶ Accordingly, VR provides a rare and ethically feasible means of collecting ecologically valid, continuous behavioral data under controlled conditions. When individual real-world trajectories are unavailable, VR-derived behaviors are evaluated against aggregate statistics reported across historical incidents, which represents the level of validation available in this domain. Complementary work⁵⁷ has explored alternative modeling abstractions designed to support evaluation across multiple environments, trading geometric specificity for broader contextual coverage.

3. Human-subject experiment

We conducted three studies (S1, S2, and S3), which tasked human subjects in VR with acting as an active school shooter. Data were collected at 1 Hz in S1 and at 2 Hz in S2 and S3. All studies recorded the participant’s position and cumulative shots fired, but only S2 and S3 recorded the participant’s rotation, gaze direction, pupil diameter, cumulative reloads, and the position and visibility of other objects. In S1 and S2, data were written to a file at every timestep, whereas in S3, raw data were transferred over a User Datagram Protocol (UDP) socket to improve simulation performance and reduce subject nausea. Data were collected over three periods: S1 in April 2023, S2 in September–October 2023, and S3 in October–December 2024. The total number of participants was 103, with 32, 39, and 32 participants in S1, S2, and S3, respectively. Data from participants who did not finish the simulation due to physical discomfort (6), psychological discomfort (3), or hardware error (4) were discarded.

3.1. Simulation environment

A virtual reproduction of Columbine High School, the site of the infamous 1999 school shooting in the United States,⁹ served as the simulation environment. The perimeter of the building was measured with the built-in measuring tool in Google Maps, as shown in Figure 1(a). Newer renovations of the building were ignored in order to faithfully simulate the building at the time of the shooting. Once the building perimeter was scaled, crime scene diagrams from the official report⁹ were scaled to overlay the building perimeter, resulting in a complete representation of the floor plans (except for rooms lacking crime scene diagrams). The floor plans were then used to create a computer-aided design (CAD) model in SolidWorks (Figure 1(b)), which was imported into Unity. Unity’s Asset Store was used to obtain textures and objects such as doors, cars, and a detailed handgun. A comparison of the virtual environment to the physical environment in 1999 is depicted in Figures 1(c) and (d).

Figure 1.

Steps (a–c) to make the simulation environment resemble the real environment (d) are depicted. (a) Google Maps. (b) SolidWorks Assembly. (c) Unity Scene. (d) Columbine High School.⁹

3.2. Nonplayer characters as bystanders

Nonplayer characters (NPCs) populated the environment with autonomously behaving students and teachers. These NPCs were controlled by finite state machines (FSMs), a common practice for modeling evacuees.^58–60 We designed the FSM to mimic the “Run. Hide. Fight.” response recommended by the Department of Education.⁶¹ Because fighting is difficult to simulate accurately and seldom occurs during actual school shootings, this option was omitted. Figure 2 (right) depicts the implemented FSM. The default state of each NPC was relaxed; if triggered, the NPC switched from the relaxed state to the alarmed state. From the alarmed state, the NPC transitioned randomly to either the hiding or the escaping state. The run–hide decision probability is a parameter of our model, which we set so that NPCs attempted to escape 25% of the time (see Appendix 5 for a sensitivity analysis). While in a given state (relaxed, alarmed, hiding, or escaping), an NPC moved along a corresponding set of waypoints, color-coded on the left of Figure 2. The column of rectangles on the right side of the figure shows which animations correspond to each state. By default, NPCs were sitting, walking, or teaching. When triggered, the NPCs ran (yellow, then orange or green), and once an NPC reached the end of a green or orange waypoint path, it played a crouch animation.
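The state logic described above can be sketched as a small state machine. This is a simplified, hypothetical Python rendering (the simulation itself ran in Unity) that covers only the state transitions; waypoint following and animations are omitted, and `BystanderFSM` and its methods are illustrative names.

```python
import random
from enum import Enum
from typing import Optional


class State(Enum):
    RELAXED = "relaxed"
    ALARMED = "alarmed"
    HIDING = "hiding"
    ESCAPING = "escaping"


class BystanderFSM:
    """Hypothetical bystander FSM: relaxed -> alarmed -> (hiding | escaping)."""

    def __init__(self, p_escape: float = 0.25, rng: Optional[random.Random] = None):
        self.state = State.RELAXED
        self.p_escape = p_escape  # run-hide decision probability (escape 25% of the time)
        self.rng = rng if rng is not None else random.Random()

    def trigger(self) -> None:
        """Alarm a relaxed NPC (e.g., after the participant's first shot)."""
        if self.state is State.RELAXED:
            self.state = State.ALARMED

    def step(self) -> State:
        """Advance one tick; an alarmed NPC makes a single random run/hide choice."""
        if self.state is State.ALARMED:
            self.state = State.ESCAPING if self.rng.random() < self.p_escape else State.HIDING
        return self.state
```

In this sketch hiding and escaping are absorbing states, so the run–hide split is decided once per NPC; varying `p_escape` is the kind of parameter sweep a sensitivity analysis would perform.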

Figure 2.

Each NPC is modeled as a finite state machine (right). The column of ovals describes the states and their transitions, whereas the column of rectangles describes the animation used by the NPC for that state. Within each state, the NPCs follow a specific waypoint system (left).

3.3. Hardware

An HTC Vive Pro headset was used, which compares favorably with other headsets in comfort, display quality, tracking stability, and compatibility with prescription glasses.^62,63 Valve Index hand controllers were used, with the right rear trigger used to shoot the gun. Although a two-handed gun controller was considered, we decided that a one-handed controller would better reflect the fact that 77.2% of mass shooters from 1966 to 2019 used handguns.⁶⁴ Foot interfaces (Cybershoes)⁶⁵ were used to simulate embodied walking (Figure 3). Participants wore the shoes while sitting on a swivel chair, which allowed a full range of motion while remaining seated.

Figure 3.

Participants were seated in a swivel chair with an HTC Vive headset. Cybershoes were strapped over shoes for locomotion. Valve Index controllers were strapped to hands for hand movements and shooting.

3.4. Participants

The experiments were conducted in a lab on the Pennsylvania State University campus. Poster and email advertisements were used to recruit subjects. The inclusion criteria required participants to be male, at least 18 years of age, and to have not previously experienced motion sickness while using a VR headset. Female participants were excluded because 98% of shooters between 1966 and 2019 were male.⁶⁴ Our target recruits, young male students, generally matched the demographic characteristics of real school shooters. Participants were paid $15 in S1 and S2 and $30 in S3 to attract more participants. Individuals were not permitted to participate more than once. The study was approved by the Institutional Review Board (IRB).

3.5. Procedure

Upon arrival at the lab, the participant was asked to complete a consent form. In S1, the experimenter read a fixed script describing the experiment and task. In S2 and S3, an NPC modeled after the experimenter explained the task. Next, a research assistant outfitted the subject with VR devices (Cybershoes, controllers, and a VR headset) and explained how to use them. Upon entering VR, participants found themselves in a virtual environment modeled to closely resemble the real-world school. The subjects were led to a training room by the NPC experimenter, where the participant practiced walking with the shoe interfaces and firing the gun at targets, and the NPC experimenter explained how to shoot and reload the gun. The participant was allowed as much time as needed in the training room. After training, the participant exited the training room without a gun and proceeded to an outdoor shed to study a map of the school environment. In S2 and S3, the map, shown in Figure 4(c), was in focus for exactly 1 min to reduce variability among participants; in S1, participants could look at the map for as long as they wanted. In S2 and S3, the participant then toured the school and was shown, in order, the closest entrance, the cafe, the library, the auditorium, and the back entrance. Once the tour was complete, the participant was given a handgun. The NPCs representing students then arrived at the school, marking the beginning of the experiment. Once the participant fired their first shot, the NPC bystanders reacted and the experiment continued for five more minutes. The different stages of the study are depicted in Figure 4.

Figure 4.

The stages (a–f) of the simulation are depicted. (a) Receive instructions. (b) Practice aiming. (c) Study map. (d) Take tour. (e) Receive gun. (f) Be shooter.

4. Ecological validity of data

Having collected a large dataset on participant movements and actions across three studies (S1–S3), we now evaluate its suitability for modeling school shooter behavior. Specifically, we assess whether VR participants reproduce historically plausible aggregate engagement metrics and whether the resulting dataset improves trajectory prediction performance. To establish ecological validity, we compare participant shot rate, victim rate, and shot accuracy to historical school shooting data using the two one-sided tests (TOST) procedure to assess statistical equivalence. Because detailed real-world shooter trajectories and decision-level context are rarely available, this validation is intentionally limited to aggregate outcomes rather than fine-grained path or choice-point behavior.

4.1. Two one-sided tests

To evaluate the ecological validity of the collected data, we compared participant shooting statistics to measurements from historical school shootings. Using a database of school shootings spanning 1966–2023,⁶⁴ we extracted incidents with publicly reported durations, numbers of shots fired, and numbers of victims (see Table 1). For each outcome variable (shot rate, victim rate, and shot accuracy), we computed the experimental mean and 90% confidence interval (CI) and compared them to prespecified equivalence bounds (EB) based on the central 95% of the historical distribution (2.5th–97.5th percentiles). Equivalence was determined with TOST,⁶⁶ which tests whether the mean lies within the equivalence region; equivalently, whether the 90% CI lies entirely within the EB. Unlike traditional $t$-tests, which assess whether two groups differ, TOST provides evidence that differences, if present, are too small to be practically meaningful. We emphasize that historical school shooting data are sparse and highly heterogeneous, with many unobserved contextual factors (e.g., building layout, responder timing, perpetrator count, and population density) varying substantially across incidents. This variance yields wide EB, so TOST is interpreted as a plausibility check rather than as evidence of scenario-level similarity. We did not perform statistical outlier removal, as the historical sample is small and extreme incidents may reflect the phenomenon of interest. We also tested whether school level (elementary/middle/high/post-secondary) correlated with the outcome metrics and found only weak-to-moderate associations (Pearson $|r| \leq 0.45$), suggesting that school level does not strongly explain the observed variation.
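A minimal, dependency-free sketch of the TOST decision from summary statistics (sample mean, standard error, and degrees of freedom). To avoid external dependencies, the Student $t$ CDF is approximated here by Simpson integration of the density; in practice a library routine such as `scipy.stats.t.cdf` would normally be used. The function names are hypothetical.

```python
import math


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """Student's t CDF via Simpson's rule on the density over [0, |x|] (steps must be even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u: float) -> float:
        return c * (1.0 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * pdf(i * h)
    # The density is symmetric, so add or subtract the integrated area from 0.5.
    return 0.5 + math.copysign(total * h / 3.0, x)


def tost(mean: float, se: float, df: int, lower: float, upper: float, alpha: float = 0.05):
    """Two one-sided t-tests of the mean against equivalence bounds [lower, upper]."""
    t_lower = (mean - lower) / se        # H0: mu <= lower
    t_upper = (mean - upper) / se        # H0: mu >= upper
    p_lower = 1.0 - t_cdf(t_lower, df)   # right-tail p-value
    p_upper = t_cdf(t_upper, df)         # left-tail p-value
    equivalent = max(p_lower, p_upper) < alpha
    return t_lower, t_upper, p_lower, p_upper, equivalent
```

As a rough check against Table 2, taking the S2 shot-rate row and back-computing the standard error from its reported 90% CI (half-width 5.22 divided by $t_{0.95,15} \approx 1.753$, giving about 2.98) yields $t \approx 5.33$ and $t \approx -5.40$ for the lower and upper tests, matching the reported values.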

Table 1.

Historical data of US school shooting incidents.

Year	Location	Time (min)	No. of rounds	No. of victims	Shot rate (rounds/min)	Victim rate (victims/min)	Shot accuracy (victims/round)
1966	Austin, TX^64,67	96	150	46	1.6	0.5	0.307
1976	Fullerton, CA^64,68	5	23	9	4.6	1.8	0.391
1989	Stockton, CA^64,69,70	3	106	35	35.3	11.7	0.330
1998	Springfield, OR^64,71,72	5	50	29	10.0	5.8	0.580
1999	Littleton, CO^9,64	47	94	36	2.0	0.8	0.383
2005	Red Lake, MN^64,73	9	45	16	5.0	1.8	0.356
2007	Blacksburg, VA^64,74	9	170	58	18.9	6.4	0.341
2008	DeKalb, IL^64,75,76	6	54	26	9.0	4.3	0.481
2012	Sandy Hook, CT^64,77	5	154	28	30.8	5.6	0.182
2013	Santa Monica, CA^64,78	10	100	8	10.0	0.8	0.080
2014	Isla Vista, CA^64,79	8	55	20	6.9	2.5	0.364
2014	Marysville, WA^64,80,81	4	8	7	2.0	1.8	0.875
2018	Parkland, FL^56,64,82	6	150	34	25.0	5.7	0.227
2021	Oxford, MI^64,83,84	5	33	11	6.6	2.2	0.333
2022	Uvalde, TX^64,85	77	142	38	1.8	0.5	0.268
2023	Nashville, TN^64,86,87	14	150	6	10.7	0.4	0.040

The min/max values of columns are highlighted.

As shown in Table 2, participant shot rate, victim rate, and shot accuracy were statistically equivalent to historical data under the prespecified bounds, with two exceptions, both in S1: the S1 shot rate exceeded the upper equivalence bound, and equivalence could not be established for S1 shot accuracy because its 90% CI extended below the lower bound. Both deviations can be attributed to the absence of a reload requirement in S1. A reload action was implemented in S2 and S3, after which both measures fell within the EB. To assess the robustness of these findings, we conducted a sensitivity analysis by rerunning the TOST procedure with narrower EB. Specifically, we replaced the prespecified central 95% range (2.5th–97.5th percentiles) with the central 90% range (5th–95th percentiles), keeping all other parameters (sample sizes, standard errors, and the $α = 0.05$ decision threshold) unchanged. This analysis produced the same equivalence decisions for shot rate and shot accuracy across all sessions. For victim rate, however, the narrower bounds shifted S2 and S3 from equivalent to inconclusive, reflecting the heavy-tailed nature of the historical victim rate distribution (coefficient of variation $CV \approx 0.94$, skewness $γ_{1} = 1.42$, kurtosis $γ_{2} = 2.19$). While the median historical rate was only 2.0 victims/min, one event (Stockton, 1989) reached 11.7 victims/min, so small changes to the cutoff affect the equivalence decision near the upper tail. Overall, these findings suggest that participant behavior was comparable to historical data once reloading was required, but they also illustrate the difficulty of drawing firm conclusions when the reference distributions are skewed and heavy-tailed.
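The equivalence bounds can be recomputed directly from Table 1. The sketch below assumes the bounds were taken as linear-interpolation percentiles of the per-incident rates (the same rule as NumPy's default percentile method); under that assumption it reproduces the shot-rate bounds [1.67, 33.63] and victim-rate bounds [0.45, 9.71] in Table 2, as well as the 2.0 victims/min median. The helper names are illustrative.

```python
import math

# (duration_min, rounds, victims) transcribed from Table 1, in table order
INCIDENTS = [
    (96, 150, 46), (5, 23, 9), (3, 106, 35), (5, 50, 29), (47, 94, 36),
    (9, 45, 16), (9, 170, 58), (6, 54, 26), (5, 154, 28), (10, 100, 8),
    (8, 55, 20), (4, 8, 7), (6, 150, 34), (5, 33, 11), (77, 142, 38),
    (14, 150, 6),
]


def percentile(values, q):
    """Linear-interpolation percentile at q in [0, 100] (NumPy's default rule)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def central_bounds(values, coverage):
    """Equivalence bounds as the central `coverage` percent of the distribution."""
    tail = (100.0 - coverage) / 2.0
    return percentile(values, tail), percentile(values, 100.0 - tail)


shot_rate = [r / t for t, r, v in INCIDENTS]    # rounds/min
victim_rate = [v / t for t, r, v in INCIDENTS]  # victims/min
accuracy = [v / r for t, r, v in INCIDENTS]     # victims/round

for name, vals in [("shot rate", shot_rate), ("victim rate", victim_rate), ("accuracy", accuracy)]:
    b95 = central_bounds(vals, 95.0)
    b90 = central_bounds(vals, 90.0)
    print(f"{name}: 95% [{b95[0]:.2f}, {b95[1]:.2f}], 90% [{b90[0]:.2f}, {b90[1]:.2f}]")
```

Switching `coverage` from 95.0 to 90.0 gives the narrower bounds used in the sensitivity analysis. For victim rate, the upper bound drops from about 9.71 to 7.75 victims/min, below the upper 90% CI limits of S2 (7.83) and S3 (8.76) but just above that of S1 (7.74), consistent with only S2 and S3 becoming inconclusive.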

Table 2.

Results of two one-sided tests (TOST) comparing participant shooting statistics (shot rate, victim rate, and shot accuracy) from studies S1–S3 to historical school shooting data.

Measure	Data	Mean	90% CI	Eq. bounds	Lower test	Upper test	Equivalent?
Shot rate	S1	$55.14$	$[44.97, 65.32]$	$[1.67, 33.63]$	$t (15) = 9.21, p < 0.001$	$t (15) = 3.71, p = 0.999$	No
	S2	$17.54$	$[12.32, 22.76]$	$[1.67, 33.63]$	$t (15) = 5.33, p < 0.001$	$t (15) = - 5.40, p < 0.001$	Yes
	S3	$19.55$	$[13.97, 25.13]$	$[1.67, 33.63]$	$t (15) = 5.62, p < 0.001$	$t (15) = - 4.42, p < 0.001$	Yes
Victim rate	S1	$6.17$	$[4.60, 7.74]$	$[0.45, 9.71]$	$t (15) = 6.39, p < 0.001$	$t (15) = - 3.96, p < 0.001$	Yes
	S2	$6.17$	$[4.51, 7.83]$	$[0.45, 9.71]$	$t (15) = 6.04, p < 0.001$	$t (15) = - 3.74, p < 0.001$	Yes
	S3	$7.05$	$[5.34, 8.76]$	$[0.45, 9.71]$	$t (15) = 6.77, p < 0.001$	$t (15) = - 2.73, p = 0.008$	Yes
Shot accuracy	S1	$0.12$	$[0.03, 0.21]$	$[0.05, 0.76]$	$t (15) = 1.32, p = 0.104$	$t (15) = - 12.97, p < 0.001$	No
	S2	$0.36$	$[0.27, 0.45]$	$[0.05, 0.76]$	$t (15) = 5.82, p < 0.001$	$t (15) = - 7.72, p < 0.001$	Yes
	S3	$0.37$	$[0.28, 0.47]$	$[0.05, 0.76]$	$t (15) = 5.92, p < 0.001$	$t (15) = - 7.31, p < 0.001$	Yes

Experimental means and 90% confidence intervals (CI) are shown together with equivalence bounds (EB), defined as the central 95% of the historical distribution (2.5th–97.5th percentiles). Equivalence is concluded when the 90% CI lies entirely within the equivalence bounds, which is mathematically equivalent to both one-sided tests being significant ($p < α = 0.05$). Rows marked “No” indicate nonequivalence.

5. Modeling shooter movements

This section evaluates shooter trajectory prediction. The goal is to predict, at every timestep, the next $N_{T}$ seconds of a shooter’s movement given the previous $N_{T}$ seconds of movement. Unlike pedestrian trajectory prediction, where large-scale datasets are available and widely used,^46,47 there is only one known example of continuous positional data from a real school shooter.⁵⁶ Our ecologically valid dataset therefore provides a unique source of data for training and testing models to predict shooter trajectories over extended horizons. We evaluate model performance using cross-validation and by comparison to the limited real data available. Furthermore, we evaluate prediction performance across multiple time horizons ($N_{T} \in \{5, 10, 20\}$ s), representing progressively more challenging and practically relevant forecasting tasks. To our knowledge, this is the first work to predict shooter trajectories at horizons this long.
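A minimal sketch of how each 2 Hz trajectory can be cut into equal-length observation and prediction windows for a horizon $N_T$ (that is, $2N_T$ samples on each side at 2 Hz). Advancing the window one sample at a time corresponds to predicting "at every timestep"; the stride, the `make_windows` name, and the list-of-tuples representation are assumptions.

```python
def make_windows(traj, horizon_s, hz=2.0):
    """Split one trajectory into (observed, future) pairs of equal length.

    traj: list of (x, y) positions sampled at `hz`.
    Each pair uses n = horizon_s * hz samples of history to predict the next n.
    The window start advances by one sample, giving one pair per timestep.
    """
    n = int(round(horizon_s * hz))
    pairs = []
    for start in range(len(traj) - 2 * n + 1):
        obs = traj[start:start + n]
        fut = traj[start + n:start + 2 * n]
        pairs.append((obs, fut))
    return pairs
```

For a hypothetical uninterrupted 5 min (600-sample) session, this yields 581, 561, and 521 windows for $N_T$ = 5, 10, and 20 s, respectively, so longer horizons also leave fewer training examples per participant.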

5.1. Evaluations

We evaluate our data in three complementary ways. First, we train a sequence-to-sequence LSTM trajectory prediction model on the participant data, providing a basic data-driven model of shooter dynamics, and compare its predictive accuracy to the heuristic-governed shooter trajectories commonly used in agent-based simulations.^12,14–17 Second, we investigate the importance of data relevance by comparing model performance when trained on no data, on nonrepresentative pedestrian data, and on shooter-specific data. Finally, we compare multiple model architectures to examine whether additional architectural complexity provides measurable benefits once relevant data are available. Together, these analyses assess both the representativeness of our collected data and its utility for building models that accurately predict shooter movement.

5.2. Data

Our data sources included participants acting as shooters in VR, a real school shooting, and publicly available pedestrian crowd data. For the participant data, we used S2 and S3 for model training and testing because equivalence testing (Section 4) showed that both were statistically equivalent to historical shooter data. In addition, only S2 and S3 included annotations of neighboring pedestrians and static objects, which are critical for more sophisticated models. This resulted in 60 participants and approximately 300 min of data sampled at 2 Hz. For the real shooter data, we extracted trajectories from a publicly available video animation of the Marjory Stoneman Douglas High School shooting.⁵⁶ This reconstruction, based on video surveillance, depicted the shooter and bystanders over time, relative to the school floor plan. We overlaid a Cartesian grid (scaled using Google Maps perimeter measurements) and discretized the animation into 300 frames. In each frame, we recorded the timestamp and positions of the shooter, bystanders, and doors using an interactive Python plotting script (see Figure 5). We then truncated the data to the longest uninterrupted segment and resampled it to 2 Hz using linear interpolation, yielding 192 s of data. For the pedestrian data, we used the ETH/UCY data, the standard benchmark for pedestrian trajectory prediction.^36,46,47 It contains several hours of crowded urban and campus scenes with approximately 1500 annotated trajectories. For consistency, we resampled the data from 2.4 Hz to 2.0 Hz using linear interpolation.
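The truncation and resampling steps above can be sketched as follows, assuming each extracted frame is stored as a `(t, x, y)` tuple in seconds, with `x` and `y` set to `None` when the shooter's position is missing, and that timestamps are strictly increasing. The function names are hypothetical, and "uninterrupted" is interpreted here as a run of frames with no missing shooter position.

```python
import math
from bisect import bisect_right


def longest_run(samples):
    """Longest contiguous run of (t, x, y) samples whose position is not None."""
    best_start, best_len, start = 0, 0, None
    for i, (_, x, _) in enumerate(samples):
        if x is None:
            start = None
            continue
        if start is None:
            start = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return samples[best_start:best_start + best_len]


def resample(samples, hz=2.0):
    """Linearly interpolate >= 2 (t, x, y) samples onto a uniform time grid at `hz`."""
    ts = [s[0] for s in samples]
    n = int(math.floor((ts[-1] - ts[0]) * hz + 1e-9)) + 1
    out = []
    for i in range(n):
        t = ts[0] + i / hz
        # Index of the segment [ts[k], ts[k + 1]] containing t.
        k = min(max(bisect_right(ts, t) - 1, 0), len(ts) - 2)
        (ta, xa, ya), (tb, xb, yb) = samples[k], samples[k + 1]
        w = (t - ta) / (tb - ta)
        out.append((t, xa + w * (xb - xa), ya + w * (yb - ya)))
    return out
```

The same `resample` helper applies unchanged to the ETH/UCY trajectories when converting them to 2.0 Hz.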

Figure 5.

Positional data was extracted frame-by-frame from a video animation of the school shooting in Parkland, Florida.⁵⁶
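The linear-interpolation resampling described above (to 2 Hz for the Parkland trajectory, and from 2.4 Hz to 2.0 Hz for ETH/UCY) can be sketched in NumPy as follows. The function name `resample_trajectory` is hypothetical and not taken from the authors' code; the sketch assumes strictly increasing timestamps in seconds and deliberately does not extrapolate beyond the recorded span.

```python
import numpy as np


def resample_trajectory(t, xy, rate_hz=2.0):
    """Linearly interpolate a 2D trajectory onto a uniform time grid.

    t  : (K,) strictly increasing timestamps in seconds
    xy : (K, 2) positions at those timestamps
    Returns (t_new, xy_new) sampled every 1 / rate_hz seconds, covering
    only the recorded span [t[0], t[-1]] (no extrapolation).
    """
    t = np.asarray(t, dtype=float)
    xy = np.asarray(xy, dtype=float)
    step = 1.0 / rate_hz
    # Uniform grid from the first timestamp; the small epsilon keeps the
    # final sample when the span is an exact multiple of the step.
    t_new = np.arange(t[0], t[-1] + 1e-9, step)
    # np.interp handles one coordinate at a time.
    x_new = np.interp(t_new, t, xy[:, 0])
    y_new = np.interp(t_new, t, xy[:, 1])
    return t_new, np.column_stack([x_new, y_new])


# Example: a pedestrian walking in a straight line, recorded at 2.4 Hz.
t_src = np.arange(12) / 2.4                      # 12 samples, about 4.58 s
xy_src = np.column_stack([1.2 * t_src, 0.5 * t_src])
t_out, xy_out = resample_trajectory(t_src, xy_src, rate_hz=2.0)
# t_out holds 10 samples: 0.0, 0.5, ..., 4.5 s
```

Because linear interpolation is exact for straight-line motion, the example recovers the same path at the new rate; for curved paths it slightly smooths sharp turns, which is one reason to keep the two sampling rates close.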

5.3. Metrics

Following previous work on pedestrian trajectory prediction,³⁶ we used average path error (APE) and final path error (FPE) as quantitative metrics for comparing models. APE measures the mean Euclidean distance between predicted and real positions over the entire predicted trajectory, whereas FPE measures the Euclidean distance at the final predicted timestep. Let $N$ denote the total number of predicted trajectories, $T$ the number of timesteps per trajectory, and $p_{n,t}, \hat{p}_{n,t} \in \mathbb{R}^{2}$ the real and predicted 2D positions, respectively, at timestep $t$ of trajectory $n$. The metrics are defined as:

$$
\begin{aligned}
\mathrm{APE} &= \frac{1}{NT} \sum_{n=1}^{N} \sum_{t=1}^{T} \lVert \hat{p}_{n,t} - p_{n,t} \rVert_{2} \\
\mathrm{FPE} &= \frac{1}{N} \sum_{n=1}^{N} \lVert \hat{p}_{n,T} - p_{n,T} \rVert_{2}
\end{aligned}
\tag{3}
$$
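Equation (3) maps directly onto array operations. The sketch below assumes predicted and real positions are stored as NumPy arrays of shape (N, T, 2); the function name `ape_fpe` is hypothetical.

```python
import numpy as np


def ape_fpe(pred, true):
    """Average and final path error as defined in Equation (3).

    pred, true : arrays of shape (N, T, 2) holding predicted and real
                 2D positions for N trajectories over T timesteps.
    """
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    # Euclidean (L2) distance for every trajectory n and timestep t -> (N, T)
    dist = np.linalg.norm(pred - true, axis=-1)
    ape = dist.mean()          # (1 / NT) times the double sum over n and t
    fpe = dist[:, -1].mean()   # (1 / N) times the sum over n at t = T
    return ape, fpe


# Two toy trajectories of three steps each.
true = np.zeros((2, 3, 2))
pred = np.array([[[3, 4], [3, 4], [3, 4]],     # error 5 at every step
                 [[0, 0], [0, 0], [0, 2]]],    # error 2 at the last step only
                dtype=float)
ape, fpe = ape_fpe(pred, true)   # ape = 17 / 6 (about 2.833), fpe = 3.5
```

The toy example shows why the two metrics are complementary: the second trajectory contributes little to APE but half of the FPE.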

5.4. Procedure

All datasets were handled with the same evaluation pipeline. Participant data were randomly ordered by subject and divided by scene into five subsets. The real shooter data contained only one subject and thus formed a single subset. For the empirical models, we applied a leave-one-subset-out cross-validation scheme: in each split, four participant subsets (80%) were used for training and the remaining subset (20%) for testing, and this was repeated until each subset had served as the test set once. For models trained on nonrepresentative pedestrian data, we used the same five-fold structure for training but evaluated each model on the corresponding held-out participant subset, enabling direct cross-dataset comparison. Prediction errors were aggregated by subject, yielding one independent error value per subject for each model-horizon condition. Accordingly, the number of independent samples for statistical testing equaled the number of participants ($n = 60$) for the VR data and $n = 1$ for the real shooter trajectory.
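A subject-level leave-one-subset-out split, followed by per-subject aggregation of errors, might look like the sketch below. It assumes each of the five subsets is a random group of whole subjects, so no subject appears in both training and testing; the division by scene mentioned in the text is not modeled, and all function names are hypothetical. scikit-learn's `GroupKFold` offers a comparable off-the-shelf grouped split.

```python
import numpy as np


def subject_folds(subject_ids, n_folds=5, seed=0):
    """Randomly order the unique subjects and split them into n_folds groups."""
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(np.unique(subject_ids))
    return np.array_split(subjects, n_folds)


def leave_one_fold_out(subject_ids, n_folds=5, seed=0):
    """Yield (train_subjects, test_subjects); each fold is tested exactly once."""
    folds = subject_folds(subject_ids, n_folds, seed)
    for k in range(n_folds):
        train = np.concatenate([f for i, f in enumerate(folds) if i != k])
        yield train, folds[k]


def per_subject_error(subject_of_sample, sample_errors):
    """Average sample-level errors so that each subject contributes one value."""
    subject_of_sample = np.asarray(subject_of_sample)
    sample_errors = np.asarray(sample_errors, dtype=float)
    return {s: sample_errors[subject_of_sample == s].mean()
            for s in np.unique(subject_of_sample)}


# 60 anonymized participants, as in the released dataset's ID style.
subjects = [f"P{i:02d}" for i in range(60)]
splits = list(leave_one_fold_out(subjects))
# 5 splits, each with 48 training subjects and 12 test subjects
errs = per_subject_error(["P00", "P00", "P01"], [1.0, 3.0, 5.0])
```

Averaging within subjects before testing is what makes $n = 60$ the correct sample size: individual prediction windows from the same participant are not independent.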

5.5. Models

Several heuristics served as nonempirical baselines for comparison. The no-movement model and closest target (CT) model have been used previously in ABM,^12,14–17 while the constant velocity (CV) approach is often used as a baseline in human trajectory prediction.^36,46 Details for these baselines are provided in Appendix 3.

No movement (NM). The shooter remains stationary throughout the prediction horizon.¹²

CT-f. The shooter moves toward the nearest alive person at a fixed speed, set to an empirically chosen constant.^14–17

CT-a. The shooter moves toward the nearest alive person at an adaptive speed, computed as the mean velocity over a fixed window of preceding timesteps.

CV. The shooter continues moving with its current velocity, maintaining both speed and direction.⁸⁸
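The four heuristics above can be sketched as simple rollouts over a prediction horizon. This is an illustrative sketch, not the implementation in Appendix 3: it assumes a 0.5 s step (2 Hz), speeds in m/s, targets held at their last observed positions, and a hypothetical placeholder for the CT-f constant; the CT-a averaging window is also a hypothetical parameter.

```python
import numpy as np

DT = 0.5  # seconds per step at 2 Hz


def predict_nm(history, horizon):
    """No movement: repeat the last observed position."""
    return np.repeat(history[-1:], horizon, axis=0)


def predict_cv(history, horizon):
    """Constant velocity: extrapolate the last observed displacement."""
    v = history[-1] - history[-2]
    steps = np.arange(1, horizon + 1)[:, None]
    return history[-1] + steps * v


def predict_ct(history, horizon, targets, speed):
    """Closest target: step toward the nearest target at `speed` (m/s).

    Targets are held at their last observed positions (an assumption).
    """
    pos = history[-1].astype(float)
    out = []
    for _ in range(horizon):
        d = targets - pos
        dist = np.linalg.norm(d, axis=1)
        j = np.argmin(dist)
        step = min(speed * DT, dist[j])      # do not overshoot the target
        if dist[j] > 0:
            pos = pos + step * d[j] / dist[j]
        out.append(pos.copy())
    return np.array(out)


def adaptive_speed(history, window=4):
    """CT-a speed: mean speed over the last `window` steps (window is hypothetical)."""
    recent = history[-(window + 1):]
    return np.linalg.norm(np.diff(recent, axis=0), axis=1).mean() / DT


history = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])
targets = np.array([[3.0, 0.0], [0.0, 10.0]])
nm = predict_nm(history, 4)                           # stays at (1, 0)
cv = predict_cv(history, 4)                           # (1.5, 0) ... (3, 0)
ct_f = predict_ct(history, 4, targets, speed=1.5)     # 1.5 m/s is a placeholder
ct_a = predict_ct(history, 4, targets, speed=adaptive_speed(history, window=2))
```

In this toy case CT-f reaches the target early and then stops, whereas CT-a inherits the shooter's recent pace (1 m/s), which illustrates why an adaptive speed can track observed motion more closely than a fixed constant.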

All of our empirical models used LSTM³⁵ network units for temporal encoding and decoding. An LSTM is a specialized recurrent neural network (RNN),⁸⁹ in which the recurrent cell maintains a hidden state $h_{t}$ that is recursively updated and provided as input to the cell at the next time step. This recurrence makes RNNs well suited to processing sequential data such as time series. LSTM networks also include a cell state that acts as a long-term memory, allowing them to capture dependencies over longer horizons and mitigate the vanishing gradient problem.³⁵
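The hidden-state and cell-state updates described above can be made concrete with one step of a standard LSTM cell in NumPy. This is the textbook formulation with input, forget, and output gates, shown for illustration only; the weights are random and untrained, and the stacked gate ordering is an arbitrary convention chosen for the sketch.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell.

    x : (d_in,) input at time t;  h, c : (d_h,) previous hidden and cell states
    W : (4*d_h, d_in), U : (4*d_h, d_h), b : (4*d_h,) stacked gate parameters
        in the order [input, forget, output, candidate].
    """
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_new = f * c + i * g          # cell state: gated, additive long-term memory
    h_new = o * np.tanh(c_new)     # hidden state fed to the next time step
    return h_new, c_new


# Encode a short sequence of 2D velocities with random (untrained) weights.
rng = np.random.default_rng(0)
d_in, d_h = 2, 8
W = rng.normal(scale=0.1, size=(4 * d_h, d_in))
U = rng.normal(scale=0.1, size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
for v in np.array([[0.5, 0.0], [0.6, 0.1], [0.4, 0.2]]):
    h, c = lstm_step(v, h, c, W, U, b)
```

The additive update `f * c + i * g` is the mechanism usually credited with easing gradient flow across many steps, in contrast to the purely multiplicative recurrence of a plain RNN.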

For the empirical models, a single-channel LSTM encoder–decoder was used as the base model (BA-LSTM) for shooter trajectory prediction. This model, shown in Figure 6, consists of an LSTM encoder, an LSTM decoder, and a dense refinement stack of fully connected layers. The dense refinement stack was added to correct severe underfitting observed when training the model to predict longer sequences. The model takes the sequence of previous 2D velocities as input and predicts the sequence of future 2D velocities. Three other architectures were considered: a three-channel LSTM encoder–decoder that its authors call a “data-driven model,”⁴⁴ which we refer to as DD-LSTM; a six-channel model that has been used to guide a shooter-distracting robot,⁹⁰ which we refer to as RO-LSTM; and our proposed shooter trajectory model, ST-LSTM, which combines aspects of both RO-LSTM and DD-LSTM.

Figure 6.

The base model used in this experiment, where the number of hidden units of each layer is annotated.
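Because BA-LSTM consumes and predicts velocities while APE and FPE are computed on positions, predicted velocities must be integrated back into positions. A minimal sketch, assuming finite-difference velocities at the 2 Hz rate (0.5 s steps) and integration from the last observed position; the function names are hypothetical, and a constant-velocity placeholder stands in for the network output.

```python
import numpy as np


def positions_to_velocities(xy, dt=0.5):
    """Finite-difference velocities from (K, 2) positions sampled every dt s."""
    return np.diff(xy, axis=0) / dt


def velocities_to_positions(last_xy, v_future, dt=0.5):
    """Integrate predicted (T, 2) velocities starting from the last observed position."""
    return last_xy + np.cumsum(v_future * dt, axis=0)


obs = np.array([[0.0, 0.0], [0.6, 0.2], [1.2, 0.4], [1.8, 0.6]])
v_obs = positions_to_velocities(obs)          # model input, shape (3, 2)
v_pred = np.tile(v_obs[-1], (4, 1))           # placeholder for the model output
future_xy = velocities_to_positions(obs[-1], v_pred)
# future_xy: (2.4, 0.8), (3.0, 1.0), (3.6, 1.2), (4.2, 1.4)
```

Predicting velocities makes the model invariant to where in the building the shooter is, but integration also means early velocity errors accumulate, which is consistent with errors growing at longer horizons.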

These other empirical models are shown in Figure 7. The DD-LSTM model (top-left) has three input channels: 2D velocity, a static occupancy grid ($60 \times 60 \times 6$, Cartesian), and a person occupancy grid ($72 \times 1$, polar). The RO-LSTM model (top-right) has six input channels: 2D velocity, a static occupancy grid ($21 \times 21$, Cartesian), and separate $20 \times 20$ polar grids for alive bystanders, victims, open doors, and closed doors. The ST-LSTM (bottom-left) has two input channels: 2D velocity and a stacked input ($60 \times 60 \times 6$) whose layers are the static occupancy and room accessibility grids and the distance maps of bystanders, victims, open doors, and closed doors. Whereas RO-LSTM simply flattens its Cartesian and polar grids, both DD-LSTM and ST-LSTM use a pretrained encoder (PTE) as a fixed feature extractor. The PTE is the encoder (blue) portion of the full convolutional autoencoder (CAE) shown in the bottom-right of Figure 7. The CAE encodes a $60 \times 60 \times C$ input as a latent vector of dimension $D$, a compact representation from which the decoder can reconstruct the original input. The training loss is the mean squared error between the input and its reconstruction, with L2 regularization. Details of autoencoder training are provided in Appendix 2. Whether flattened or encoded, the input sequences are passed to their corresponding LSTM encoders, whose outputs are concatenated and passed through a model-specific combination of the LSTM decoder and the dense refinement stack.

Figure 7.

Comparison of architectures used in our STP analysis. Both DD-LSTM and ST-LSTM use a pretrained encoder (PTE) as a feature extractor. This PTE comes from the convolutional autoencoder (CAE) shown in the bottom-right. The training of the CAE uses reconstruction loss, which is depicted with the dashed arrow.
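To illustrate what a polar person grid such as DD-LSTM's $72 \times 1$ input could encode, the sketch below counts neighbors in 72 angular sectors (5° each) around the shooter. What each cell actually stores in the cited model is not stated here, so per-sector counts, the optional radius cutoff, and the function name `polar_occupancy` are all assumptions.

```python
import numpy as np


def polar_occupancy(center, others, n_bins=72, radius=None):
    """Count neighbors in each angular sector around `center`.

    center : (2,) position of the shooter; others : (M, 2) positions.
    n_bins=72 gives 5-degree sectors; an optional `radius` ignores
    neighbors farther away than that distance.
    """
    d = np.asarray(others, dtype=float) - np.asarray(center, dtype=float)
    dist = np.linalg.norm(d, axis=1)
    keep = dist > 0
    if radius is not None:
        keep &= (dist <= radius)
    theta = np.arctan2(d[keep, 1], d[keep, 0]) % (2 * np.pi)   # in [0, 2*pi)
    bins = (theta / (2 * np.pi) * n_bins).astype(int) % n_bins
    grid = np.zeros(n_bins)
    np.add.at(grid, bins, 1)          # accumulate repeated bin indices
    return grid


others = np.array([[1.0, 0.02], [2.0, 0.04], [0.02, 1.0], [-1.0, 0.05], [10.0, 0.5]])
grid = polar_occupancy([0.0, 0.0], others, radius=5.0)
# grid[0] == 2, grid[17] == 1, grid[35] == 1; the far neighbor is ignored
```

An angular-only grid like this discards distance, which keeps the input small (72 values per step) at the cost of not distinguishing near from far neighbors in the same direction.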

5.6. Implementation

All empirical models were trained and evaluated within a common implementation framework to ensure comparability across architectures. The DD-LSTM and RO-LSTM were adapted from their respective sources^44,90 to use the same dense refinement stack as BA-LSTM (Figure 6). Across all models, encoder LSTM dropout, decoder LSTM dropout, and L2 regularization were selected from shared search ranges, while all other optimization settings were held fixed. Specifically, encoder LSTM dropout was selected from $\{0.2, 0.3, 0.4\}$, decoder LSTM dropout from $\{0.3, 0.4, 0.5\}$ (with recurrent dropout fixed at 0.2), and L2 regularization strength from $\{10^{-3}, 5 \times 10^{-3}\}$. Dense layers used a fixed L1 regularization of $10^{-6}$, and batch normalization (momentum $= 0.6$) was applied at the fusion layers. All models were trained with a batch size of 64 for up to 200 epochs, using early stopping and learning-rate reduction on plateau.
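Early stopping and learning-rate reduction on plateau both react to a stalled validation loss. The plain-Python sketch below shows that logic; the patience values, reduction factor, and initial learning rate are hypothetical, since the text states only that both mechanisms were used. If the models were built in Keras (the text does not say), `tf.keras.callbacks.EarlyStopping` and `tf.keras.callbacks.ReduceLROnPlateau` provide comparable behavior.

```python
class PlateauController:
    """Minimal early-stopping and reduce-LR-on-plateau logic (hypothetical settings)."""

    def __init__(self, lr=1e-3, factor=0.5, lr_patience=5,
                 stop_patience=15, min_delta=0.0):
        self.lr = lr
        self.factor = factor
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.since_best = 0      # epochs since the last improvement
        self.since_reduce = 0    # epochs since the last improvement or LR cut

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.since_best = 0
            self.since_reduce = 0
        else:
            self.since_best += 1
            self.since_reduce += 1
            if self.since_reduce >= self.lr_patience:
                self.lr *= self.factor        # shrink the step size
                self.since_reduce = 0
        return self.since_best >= self.stop_patience


ctrl = PlateauController(lr=1e-3, factor=0.5, lr_patience=2, stop_patience=4)
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.75, 0.69]
for epoch, loss in enumerate(losses):   # the paper caps training at 200 epochs
    if ctrl.step(loss):
        break
# Stops at epoch 6 after two LR cuts (lr = 2.5e-4); best loss 0.7
```

Setting the LR patience shorter than the stopping patience gives the optimizer a chance to make progress at a smaller learning rate before training is abandoned.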

To select hyperparameters that differed across model variants, we performed a focused grid search over encoder dropout, decoder dropout, and L2 regularization strength, while holding all remaining hyperparameters constant. Hyperparameters were selected based on validation loss using the same cross-validation splits employed for performance evaluation. The results from the grid search are provided in Appendix 4. Specifically, Table 8 gives the validation loss corresponding to each model and hyperparameter configuration. This strategy prioritizes controlled architectural comparison but does not exhaustively explore the full hyperparameter space for each baseline; as a result, some baselines may achieve improved performance under more extensive, model-specific tuning. Comparative results are therefore interpreted as indicative rather than optimal.
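The focused grid described above contains $3 \times 3 \times 2 = 18$ configurations per model. The sketch below enumerates them and selects the one with the lowest validation loss; `validation_loss` stands in for a hypothetical routine that trains on each cross-validation split and returns the mean validation loss, and the toy loss in the example is invented purely to exercise the selection step.

```python
import itertools

ENC_DROPOUT = [0.2, 0.3, 0.4]
DEC_DROPOUT = [0.3, 0.4, 0.5]
L2_STRENGTH = [1e-3, 5e-3]


def grid():
    """Yield all 3 x 3 x 2 = 18 searched configurations."""
    for enc, dec, l2 in itertools.product(ENC_DROPOUT, DEC_DROPOUT, L2_STRENGTH):
        yield {"enc_dropout": enc, "dec_dropout": dec, "l2": l2}


def select_best(configs, validation_loss):
    """Return the configuration with the lowest validation loss.

    `validation_loss(config)` is a hypothetical callable that trains on each
    cross-validation split and returns the mean validation loss.
    """
    return min(configs, key=validation_loss)


def toy_loss(cfg):
    # Invented stand-in for real training; minimized at (0.3, 0.4, 1e-3).
    return abs(cfg["enc_dropout"] - 0.3) + abs(cfg["dec_dropout"] - 0.4) + cfg["l2"]


configs = list(grid())
best = select_best(configs, toy_loss)
# {'enc_dropout': 0.3, 'dec_dropout': 0.4, 'l2': 0.001}
```

Because only these three hyperparameters vary, each architecture is trained 18 times per split, which keeps the comparison controlled but, as noted above, may leave some baselines short of their individually tuned best.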

5.6.1. Empirical versus heuristic models

The first evaluation compared empirical and nonempirical approaches for predicting shooter trajectories. In ABM, researchers often rely on heuristics to approximate shooter movement. However, these heuristics can introduce systematic bias into population-level outcomes, because agent movements determine when and where encounters with victims or responders occur. This evaluation therefore assesses how much participant-derived data improve trajectory prediction accuracy over heuristic methods. APE and FPE were used to quantify overall path deviation and endpoint accuracy, respectively. In the results below, we report APE; tabulated APE and FPE values are provided in Appendix 1.

The top row of Figure 8 depicts APE as a function of predictive model, comparing the empirical BA-LSTM to nonempirical baselines (NM, CT-f, CT-a, and CV) for predicting participant and real trajectories at 5 s, 10 s, and 20 s horizons. For participant trajectories, BA-LSTM reduced APE relative to NM, CT-f, CT-a, and CV by 30.8%, 48.0%, 35.6%, and 27.8% at 5 s; by 23.1%, 36.5%, 27.4%, and 28.0% at 10 s; and by 14.3%, 24.5%, 18.6%, and 32.4% at 20 s, with all differences statistically significant ($p < .05$; Table 3). For real shooter trajectories, BA-LSTM reduced APE relative to NM, CT-f, CT-a, and CV by 34.7%, 47.2%, 44.9%, and 30.1% at 5 s; by 25.5%, 35.5%, 33.2%, and 26.9% at 10 s; and by 15.1%, 19.0%, 18.6%, and 27.8% at 20 s.

Figure 8.

Comparison of APE across all three evaluations. For participant trajectories (left), asterisks indicate significance under Welch’s unequal-variance t-test: * $p < .05$, ** $p < .01$, *** $p < .001$. For the real shooter trajectory (right), significance could not be determined with a sample size of $n = 1$.
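The percentage reductions reported in this section are relative to the baseline's error. Assuming they compare mean APE values (the text does not spell out the aggregation), the arithmetic is as sketched below; the APE values in the example are hypothetical and not taken from the paper.

```python
def percent_reduction(baseline_err, model_err):
    """Relative error reduction of a model versus a baseline, in percent."""
    return 100.0 * (baseline_err - model_err) / baseline_err


# Hypothetical mean APE values in meters (not taken from the paper).
reduction = percent_reduction(2.0, 1.5)    # 25.0
increase = 100.0 * (2.0 - 1.5) / 1.5       # about 33.3: same gap, other reference
```

Because the denominator changes with the baseline and the horizon, a smaller percentage at 20 s can still correspond to a larger absolute error gap than a bigger percentage at 5 s.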

Table 3.

Pairwise Welch’s $t$-tests for the first evaluation: BA-LSTM (BA) compared to nonempirical baselines.

Horizon	Comparison	Welch’s $t$-test
5 s	BA vs. NM	$t(105.8) = 5.317$, $p < .001$
5 s	BA vs. CT-f	$t(116.7) = 12.78$, $p < .001$
5 s	BA vs. CT-a	$t(106.1) = 6.615$, $p < .001$
5 s	BA vs. CV	$t(100.6) = 4.314$, $p < .001$
10 s	BA vs. NM	$t(113.8) = 3.811$, $p < .001$
10 s	BA vs. CT-f	$t(118.0) = 8.069$, $p < .001$
10 s	BA vs. CT-a	$t(113.9) = 4.803$, $p < .001$
10 s	BA vs. CV	$t(102.0) = 4.253$, $p < .001$
20 s	BA vs. NM	$t(117.2) = 2.420$, $p = .017$
20 s	BA vs. CT-f	$t(118.0) = 4.927$, $p < .001$
20 s	BA vs. CT-a	$t(116.7) = 3.263$, $p = .001$
20 s	BA vs. CV	$t(94.5) = 5.155$, $p < .001$

All differences were statistically significant ($p < .05$).
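Table 3 (like Tables 4 and 5) reports Welch's $t$ statistics with fractional degrees of freedom from the Welch–Satterthwaite approximation. The NumPy sketch below computes both from two sets of per-subject errors; the example arrays are synthetic. With 60 per-subject values per model, the degrees of freedom can be at most $2(60 - 1) = 118$, the value reached when the two variances are equal, which matches the largest entries in the tables. For $p$-values, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test.

```python
import numpy as np


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size      # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size      # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, df


# Synthetic per-subject APE values for 60 subjects (baseline vs. model).
baseline_ape = 1.0 + 0.01 * np.arange(60)
model_ape = 0.8 + 0.01 * np.arange(60)
t, df = welch_t(baseline_ape, model_ape)   # equal variances, so df is about 118
```

Smaller degrees of freedom in the tables (for example, 94.5 for BA vs. CV at 20 s) indicate that the two models' per-subject errors had noticeably different spreads.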

5.6.2. Impact of data relevance

The second evaluation examined how prediction accuracy changes with the relevance of the available training data: no data (using the best-performing nonempirical heuristic), nonrepresentative data (training on pedestrians moving in crowds), and representative data (training on participants acting as school shooters in VR). One might argue that motion models are domain-agnostic, that is, that a single-channel encoder–decoder trained on kinematic inputs should generalize to any moving agent; this evaluation directly tests that assumption. By comparing prediction accuracy across these three levels of data availability, we quantify the benefit of domain-specific shooter data over unrelated data or no data at all. APE and FPE were used to quantify overall path deviation and endpoint accuracy, respectively. In the results below, we report APE; tabulated APE and FPE values are provided in Appendix 1.

The middle row of Figure 8 shows APE as a function of training data availability (no data, nonrepresentative pedestrian data, and representative participant data) for predicting participant and real trajectories at 5 s, 10 s, and 20 s horizons. For participant trajectories, representative training reduced APE relative to no training data (i.e., the best nonempirical baseline) and pedestrian (ETH/UCY) training data by 27.8% and 19.0% at 5 s, by 23.1% and 19.7% at 10 s, and by 14.3% and 13.4% at 20 s, with all differences statistically significant ($p < .05$; Table 4). For real shooter trajectories, a similar pattern was observed: representative training reduced APE relative to no training data and pedestrian training data by 30.1% and 10.2% at 5 s, by 25.5% and 11.3% at 10 s, and by 15.1% and 14.4% at 20 s.

Table 4.

Pairwise Welch’s $t$-tests for the second evaluation: training with participant data (Pa) compared to no data (No) or pedestrian data (Pe). All differences were significant ($p < .05$).

Horizon	Comparison	Welch’s $t$-test
5 s	Pa vs. No	$t(100.6) = 4.314$, $p < .001$
5 s	Pa vs. Pe	$t(110.6) = 2.967$, $p = .004$
10 s	Pa vs. No	$t(113.8) = 3.811$, $p < .001$
10 s	Pa vs. Pe	$t(117.5) = 3.352$, $p = .001$
20 s	Pa vs. No	$t(117.2) = 2.420$, $p = .017$
20 s	Pa vs. Pe	$t(117.1) = 2.239$, $p = .027$

5.6.3. Effect of model architecture

The third evaluation compared trajectory prediction accuracy across model architectures of increasing complexity: a kinematic-only, single-channel model (BA-LSTM); a model incorporating static and pedestrian occupancy grids (DD-LSTM); an extension with separate grids for alive and dead pedestrians (RO-LSTM); and a hybrid model that combines the most effective features of the others (ST-LSTM). All models converged during training. This evaluation assessed whether trajectory prediction benefits from greater architectural sophistication. APE and FPE were used to quantify overall path deviation and endpoint accuracy, respectively. In the results below, we report APE; tabulated APE and FPE values are provided in Appendix 1.

The bottom row of Figure 8 shows APE as a function of predictive model, comparing empirical architectures (BA-LSTM, RO-LSTM, DD-LSTM, ST-LSTM) for predicting participant and real trajectories at 5 s, 10 s, and 20 s horizons. For participant trajectories, ST-LSTM reduced APE relative to BA-LSTM, RO-LSTM, and DD-LSTM by 7.7%, 8.3%, and 4.4% at 5 s; by 8.7%, 7.1%, and 5.1% at 10 s; and by 10.1%, 7.0%, and 4.6% at 20 s, although none of these differences was statistically significant (all $p > .05$; Table 5). An ablation study of the ST-LSTM model is provided in Appendix 6. For real shooter trajectories, ST-LSTM reduced APE relative to RO-LSTM and DD-LSTM by 18.8% and 2.5% at 5 s; by 14.7% and 1.8% at 10 s; and by 10.7% and 3.4% at 20 s. However, the simpler BA-LSTM outperformed ST-LSTM on the real trajectory at all horizons.

Table 5.

Pairwise Welch’s $t$ -tests comparing our hybrid model (ST) to other (*-LSTM) model architectures.

Horizon	Comparison	Welch’s $t$-test
5 s	ST vs. BA	$t(116.3) = 1.252$, $p = .213$
5 s	ST vs. RO	$t(117.5) = 1.409$, $p = .161$
5 s	ST vs. DD	$t(117.4) = 0.709$, $p = .480$
10 s	ST vs. BA	$t(115.5) = 1.376$, $p = .171$
10 s	ST vs. RO	$t(118.0) = 1.193$, $p = .235$
10 s	ST vs. DD	$t(117.3) = 0.814$, $p = .418$
20 s	ST vs. BA	$t(116.4) = 1.686$, $p = .094$
20 s	ST vs. RO	$t(118.0) = 1.212$, $p = .228$
20 s	ST vs. DD	$t(117.1) = 0.739$, $p = .462$

None of the differences was statistically significant (all $p > .05$).

6. Discussion

Several trends emerge from our results. First, nonempirical heuristics that assume how a shooter moves performed significantly worse than our baseline empirical model trained with participant data, indicating a clear improvement over prior approaches. Second, training the empirical model with nonrepresentative pedestrian data (ETH/UCY) also yielded significantly worse performance than training with participant data, indicating that shooter movement models are not domain-agnostic. Finally, while the hybrid ST-LSTM slightly outperformed the other models in predicting participant trajectories, the differences were not statistically significant, and results were mixed on the real shooter data. Notably, the single-channel BA-LSTM remained competitive in every comparison on the real trajectory. Altogether, these results suggest that improving the availability of relevant data has a greater impact on predictive accuracy than introducing new architectures. This finding is consistent with recent trends in machine learning showing that data-centric approaches, such as correcting labels or augmenting training sets, often outperform model-centric efforts like hyperparameter tuning.⁹¹ Our findings also align with recent data-quality research emphasizing informativeness and representativeness as critical dimensions of trustworthy machine-learning datasets.⁹² For the active shooter modeling task, task-aligned participant data provided greater performance gains than more complex model architectures. These results further support data-centric machine learning and indicate that VR offers a practical and more ethical avenue for collecting high-fidelity human behavior data in safety-critical contexts.

6.1. Subjective responses

To further assess the trustworthiness of the data, we examined participant feedback on their experience. Across the three studies, participants rated statements on a 1–5 scale (least to most true). On average, they reported moderate immersion in the environment ($M = 3.65$, $SD = 0.94$), strong task understanding and seriousness ($M = 4.42$, $SD = 0.79$), and generally positive perceived freedom of movement ($M = 3.58$, $SD = 1.15$). These results suggest that participants engaged with the task as intended and were able to navigate the environment effectively. Participants were also asked to anonymously evaluate the study as a whole, although this survey was included only in the latter portion of S3. Responses (see Figure 9) indicated strong endorsement: 92% of participants ($n = 25$) rated the study as very or extremely important, 80% as very or extremely helpful, and 76% reported that the study presented no more than slight danger. These subjective responses complement the quantitative findings, strengthening confidence that the VR-collected behavioral data provide a valid and trustworthy basis for predictive modeling of shooter trajectories.

Figure 9.

Study 3 (S3) surveyed participants’ experiences and impressions ( $n = 25$ ) about the research using a 5-point Likert-type scale.

6.2. Ethical considerations

Because this study involved simulated violence and potentially distressing scenarios, steps were taken to minimize potential negative impacts on participants. The study protocol was reviewed and approved by the authors’ IRB, which classified the research as minimal risk under US federal guidelines (45 C.F.R. § 46.102).⁹³ The experiment was designed to be fully face-valid and nondeceptive: participants were informed at recruitment, during consent, and through in-experiment instructions that the task involved shooting NPCs in a virtual environment. Participants were advised that they could withdraw at any time without penalty and still receive full compensation, and experimenters were trained to monitor for visible signs of distress and to terminate the session if necessary. Following participation, individuals completed postexperiment questionnaires assessing perceived danger and emotional discomfort, with a strong majority (76%) of respondents rating the study as no more than slightly dangerous. All participants were also provided with contact information for university mental health services in case distress arose after leaving the laboratory; consistent with the IRB’s determination of minimal risk, no structured psychological debriefing or longitudinal monitoring for delayed distress was required or implemented beyond these safeguards. Participant data were handled in accordance with an IRB-approved protocol requiring deidentification and access control: identifiable information was collected only for compensation purposes and stored separately from experimental data, no master list linking participant identities to behavioral records was created, and the publicly released dataset consists of anonymized, abstracted coordinate trajectories from a simulated environment labeled only with arbitrary participant identifiers (e.g., P00 and P01) and containing no direct or indirect identifiers, demographic information, or link to real-world identities. 
Prior to release, data were stored on institutionally managed systems with access restricted to authorized research personnel. Although the goal of this work is to support the evaluation of defensive interventions, we recognize that any model predicting behavior in violent scenarios could, in principle, be repurposed for harmful purposes; to reduce this risk, we limit public release to anonymized trajectories in a simulated environment and do not provide recommendations, optimization strategies, or guidance for real-world actions.

6.3. Limitations

A few limitations of this work should be noted. Participants were not real school shooters, and access to such individuals is effectively impossible, as most perpetrators either die during their attack or serve long prison sentences. Accordingly, this study does not attempt to reproduce or model the internal psychological states of real shooters, particularly given the lack of a clear or consistent psychological profile across cases.⁶⁴ In addition, the absence of ground-truth trajectory data from real-world school shootings precludes direct validation of individual movement paths. Nevertheless, the experimental task was designed to be face valid with respect to the assumed proximate objective of maximizing casualties. Under this framing, participant movement trajectories are treated as approximations of behavior consistent with the task instructions. Postexperiment surveys (section 6.1) indicated that participants understood the task and took it seriously, and the observed behavioral metrics were consistent with aggregate patterns reported in historical school shooting data, providing contextual support for interpreting the results.

This study did not explicitly assess participants’ prior familiarity with the Columbine shooting or its associated narratives, which is a potential source of unobserved bias. Participants were drawn from a college-aged population ($23.6 \pm 5.4$ years), for whom the Columbine shooting (1999) largely predates lived experience; this plausibly limits, but does not eliminate, the likelihood of detailed prior familiarity with the case. The experimental protocol did, however, include a standardized map-familiarization period and a guided walkthrough of the virtual school environment for all participants before the task, ensuring a consistent baseline level of spatial familiarity regardless of prior exposure. While we cannot rule out that some participants recognized the setting or had background knowledge of the event, these design choices reduce the likelihood that case-specific familiarity systematically influenced the observed behavior. Future work could explicitly measure prior exposure to relevant historical events to examine this effect more directly.

Certain participant characteristics warrant clarification. While participants engaged in simulated violence as instructed, without personal intent to harm real individuals, the participant pool was not selected to be stress-free or psychologically neutral. A strong majority (70%) reported ongoing life stressors, and a substantial portion (60%) reported prior involvement in physical fights, reflecting heterogeneity in stress and conflict exposure. These factors do not approximate perpetrator psychology or confer violent intent, but they indicate that observed behaviors arose from real individuals making goal-directed decisions under nontrivial cognitive and emotional demands rather than from an artificially neutral participant pool.

The NPC behavior in the present study is intentionally simplified and does not capture higher-order responses such as panic, freezing, herding, or barricading, all of which may influence real-world dynamics during violent emergencies. The NPCs are therefore modeled as a controlled representation of bystander availability and exposure rather than as a comprehensive behavioral simulation. This abstraction enables tractable analysis and avoids introducing additional unvalidated behavioral assumptions, while still supporting sensitivity analysis of key parameters (Appendix 5).

Finally, all VR data used to train the models were collected in a single school environment, which limits the ability to make strong claims about generalization across arbitrary environmental layouts. Although the learned continuous trajectory prediction model was evaluated on a real shooter from a different, unseen environment, broader claims about environmental generality would require empirical data collected across multiple school layouts. We nonetheless view this work as an important step toward better active shooter simulations. Reproducibility across active shooters, the focus of this paper, may allow us to create active shooter models that generalize across environments if environment-independent patterns can be found or if the framework can be used to collect seed data in other environments; these steps are beyond the scope of a single study. The evaluations presented here support consistency at the level of aggregate behavioral patterns but do not imply generalization across environments or at the level of individual paths. Follow-on work has begun to explore alternative modeling abstractions to facilitate evaluation across multiple environments.⁵⁷

7. Conclusion

Evaluating school shooting preventive measures requires an accurate and reproducible simulation. This paper presents a method for using human-subject experiments conducted in VR to generate data for modeling shooter movements. Virtual reality offers a means of immersing participants in stressful or emotionally charged situations without risk of harm, but the ecological validity of such data has remained an open question. Our results provide evidence that participant behavior in VR reflects the aggregate engagement behavior of real school shooters. Participants showed behavioral tendencies similar to those observed in historical shootings. Moreover, empirical models trained with participant data significantly outperformed both heuristic baselines and models trained with nonrepresentative pedestrian data. Finally, survey responses indicated strong endorsement from participants, with the majority rating the study as very to extremely important, very to extremely helpful, and no more than slightly dangerous. Altogether, we believe that this research will contribute to the efforts of the ABM community to simulate school shootings by providing informative and representative data grounded in the behavior of human subjects.

In future work, we plan to extend this research in several directions. The participant dataset will be used to inform agent-based models of active shooter incidents, enabling the discovery of new, emergent population-level trends. We will also repeat the experimental procedure in our virtual environment to generate training data for models of victims and first responders. Importantly, we plan to replicate the data collection protocol across multiple school layouts to explicitly examine how building layout variations influence learned movement patterns and model generalization. In addition, we will use the procedure to systematically evaluate the effectiveness of various security measures on shooter outcomes, building on prior work deploying shooter-mitigating robots in schools.⁹⁰ Beyond school shooting scenarios, our methodology for collecting ecologically valid, context-specific behavioral data offers a template for other domains where domain-agnostic movement models are inadequate or where representative datasets are scarce. Together, these extensions will enable formal validation of agent-based models and rigorous benchmarking of intervention strategies across systematically varied simulation conditions (including facility layouts, agent behaviors, and mitigation strategies), supporting reproducible, data-driven evaluation of school safety policies and other high-risk scenarios.


Appendix 1

Appendix 2

Appendix 3

Appendix 4

Appendix 5

Appendix 6

ORCID iD

Christopher A. McClurg

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This material is based upon work supported by the National Science Foundation under grant no. IIS-2045146. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Portions of this paper were edited and refined with assistance from generative AI. The authors reviewed and take responsibility for all content.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Author biographies

Christopher A. McClurg is a PhD candidate in the Department of Aerospace Engineering at The Pennsylvania State University. He received his MS and BS from The Ohio State University. His research interests focus on simulation, human-robot interaction, and machine learning with applications in emergency response scenarios.

Alan R. Wagner is an associate professor in the Department of Aerospace Engineering at The Pennsylvania State University. He is also a research associate in the Rock Ethics Institute. He received his PhD from Georgia Institute of Technology’s College of Computing. His research and teaching interests focus on the development of techniques that allow robots to interact with a wide variety of people in various social contexts.

References

Riedman

. K–12 school shooting database. https://k12ssdb.org (2025, accessed 20 November 2025).

Jonson

. Preventing school shootings: the effectiveness of safety measures. Vict Offenders 2017; 12: 956–973.

Cornell

Mayer

Sulkowski

. History and future of school safety research. Sch Psychol Rev 2020; 50: 143–157.

Addington

. Cops and cameras: public school security as a policy response to columbine. Am Behav Sci 2009; 52: 1426–1446.

Schwartz

Ramchand

Barnes-Proby

, et al. The role of technology in improving K-12 school safety. Rand Corporation, 2016.

The Johns Hopkins University Applied Physics Laboratory. A comprehensive report on school safety technology, 2016. https://www.ojp.gov/pdffiles1/nij/grants/250274.pdf

Suomalainen

Haravuori

Berg

, et al. A controlled follow-up study of adolescents exposed to a school shooting–psychological consequences after four months. Eur Psychiatry 2011; 26: 490–497.

Elklit

Kurdahl

. The psychological reactions after witnessing a killing in public in a Danish high school. Eur J Psychotraumatol 2013; 4: 19826.

Grider

Young

Battan

, et al. Columbine high school shootings: Jefferson County Sheriff’s office report. Technical report, Jefferson County Sheriff’s Office, 2000.

10.

Connecticut State Police. After action report—Newtown shooting incident, December 14, 2012. Technical report, Connecticut State Police, 2018.

11.

Virginia Tech Review Panel. Mass shootings at Virginia Tech, April 16, 2007: report of the review panel. Technical report NCJ 219774. Commonwealth of Virginia, 2007.

12.

Briggs

Kennedy

. Active shooter: an agent-based model of unarmed resistance. In: 2016 winter simulation conference (WSC). Washington, DC, 11–14 December 2016, pp. 3521–3531. IEEE.

13.

Anklam

Kirby

Sharevski

, et al. Mitigating active shooter impact: analysis for policy options based on agent/computer-based modeling. Technical report, MITRE Corporation, 2014.

14. Hayes. Agent-based simulation of mass shootings: determining how to limit the scale of a tragedy. J Artif Soc Soc Simul 2014; 17: 5.

15. Stewart. Active shooter simulations: an agent-based model of civilian response strategy. PhD Thesis, Iowa State University, 2017.

16. Lee, Dietz, Ostrowski. Agent-based modeling for casualty rate assessment of large event active shooter incidents. In: 2018 winter simulation conference (WSC), Gothenburg, 9–13 December 2018, pp. 2737–2746. IEEE.

17. Lee. Agent-based modeling to assess the effectiveness of run hide fight. PhD Thesis, Purdue University, 2019.

18. Office of the Child Advocate, State of Connecticut. Shooting at Sandy Hook Elementary School: report of the Office of the Child Advocate. Technical report, State of Connecticut, 2014.

19. Osborne, Capellan. Examining active shooter events through the rational choice perspective and crime script analysis. Secur J 2017; 30: 880–902.

20. McClurg, Wagner. Using virtual reality to simulate and study the movements of school shooters. In: 2025 annual modeling and simulation conference (ANNSIM), Madrid, 26–29 May 2025, pp. 1–13. IEEE.

21. Macy, Willer. From factors to actors: computational sociology and agent-based modeling. Annu Rev Sociol 2002; 28: 143–166.

22. US Congress. S.150—assault weapons ban of 2013. 113th Congress, 1st session. https://www.congress.gov/bill/113th-congress/senate-bill/150

23. Xie, Chen, Kwan, et al. Numerical simulation of the fire emergency evacuation for a metro platform accident. Simulation 2021; 97: 19–32.

24. Niu, Song. A simulation model fusing space and agent for indoor dynamic fire evacuation analysis. Simulation 2016; 92: 215–232.

25. Wang, Jiang, et al. Metro station evacuation safety assessment considering emergency response. Simulation 2022; 98: 919–931.

26. Bae, Lee, Hong, et al. Simulation-based analyses of an evacuation from a metropolis during a bombardment. Simulation 2014; 90: 1244–1267.

27. Iskandar, Dugdale, Beck, et al. Agent-based simulation of seismic crisis including human behavior: application to the city of Beirut, Lebanon. Simulation 2024; 100: 357–377.

28. Gillet, Daudé, Saval, et al. Modeling staged and simultaneous evacuation during a volcanic crisis of La Soufrière of Guadeloupe (France). Simulation 2024; 100: 401–416.

29. Malleson, See, Evans, et al. Implementing comprehensive offender behaviour in a realistic agent-based model of burglary. Simulation 2012; 88: 50–71.

30. Kennedy. Modelling human behaviour in agent-based models. In: Heppenstall, Crooks, See, et al. (eds) Agent-based models of geographical systems. Springer, 2011, pp. 167–179.

31. Sighencea, Stanciu, Căleanu. A review of deep learning-based methods for pedestrian trajectory prediction. Sensors 2021; 21: 7543.

32. Helbing, Molnar. Social force model for pedestrian dynamics. Phys Rev E 1995; 51: 4282.

33. Rudenko, Palmieri, Herman, et al. Human motion trajectory prediction: a survey. Int J Rob Res 2020; 39: 895–935.

34. Korbmacher, Tordeux. Review of pedestrian trajectory prediction methods: comparing deep learning and knowledge-based approaches. IEEE Trans Intell Transp Syst 2022; 23: 24126–24144. https://doi.org/10.1109/TITS.2022.3205676

35. Hochreiter, Schmidhuber. Long short-term memory. Neural Comput 1997; 9: 1735–1780.

36. Alahi, Goel, Ramanathan, et al. Social LSTM: human trajectory prediction in crowded spaces. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, 27–30 June 2016, pp. 961–971. IEEE.

37. Bartoli, Lisanti, Ballan, et al. Context-aware trajectory prediction. In: 2018 24th international conference on pattern recognition (ICPR), Beijing, 20–24 August 2018, pp. 1941–1946. IEEE.

38. Bisagno, Zhang, Conci. Group LSTM: group trajectory prediction in crowded scenarios. In: Proceedings of the European conference on computer vision (ECCV) workshops. Springer, 2018, pp. 243–259. https://doi.org/10.1007/978-3-030-11018-5_16

39. Pei, Zhang, et al. Human trajectory prediction in crowded scene using social-affinity long short-term memory. Pattern Recognit 2019; 93: 273–282.

40. Varshneya, Srinivasaraghavan. Human trajectory prediction using spatially aware deep attention models. arXiv Preprint. 2017. https://doi.org/10.48550/arxiv.1705.09436

41. Vemula, Muelling. Social attention: modeling attention in human crowds. In: 2018 IEEE international conference on robotics and automation (ICRA). IEEE, 2018, pp. 4601–4607.

42. Fernando, Denman, Sridharan, et al. Soft+ hardwired attention: an LSTM framework for human trajectory prediction and abnormal event detection. Neural Netw 2018; 108: 466–478.

43. Xue, Huynh, Reynolds. SS-LSTM: a hierarchical LSTM model for pedestrian trajectory prediction. In: 2018 IEEE winter conference on applications of computer vision (WACV), Lake Tahoe, NV, 14–15 March 2018, pp. 1186–1194. IEEE.

44. Pfeiffer, Paolo, Sommer, et al. A data-driven model for interaction-aware pedestrian motion prediction in object cluttered environments. In: 2018 IEEE international conference on robotics and automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018, pp. 5921–5928. IEEE.

45. Shi, Shao, Guo, et al. Pedestrian trajectory prediction in extremely crowded scenarios. Sensors 2019; 19: 1223.

46. Pellegrini, Ess, Schindler, et al. You’ll never walk alone: modeling social behavior for multi-target tracking. In: 2009 IEEE 12th international conference on computer vision, Kyoto, Japan, 28 September–2 October 2009, pp. 261–268. IEEE.

47. Lerner, Chrysanthou, Lischinski. Crowds by example. Comput Graph Forum 2007; 26: 655–664.

48. Arias, Mossberg, Nilsson, et al. A study on evacuation behavior in physical and virtual reality experiments. Fire Technol 2022; 58: 817–849.

49. Yin, Nayyar, Holman, et al. Validation and evacuee modeling of virtual robot-guided emergency evacuation experiments. PsyArXiv. 2024. https://doi.org/10.31234/osf.io/mr78s

50. Shipman, Majumdar, Feng, et al. A quantitative comparison of virtual and physical experimental paradigms for the investigation of pedestrian responses in hostile emergencies. Sci Rep 2024; 14: 6892.

51. Dequidt, Courtecuisse, Comas, et al. Computer-based training system for cataract surgery. Simulation 2013; 89: 1421–1435.

52. Huri, Gülşen, Karmiş, et al. Cadaver versus simulator based arthroscopic training in shoulder surgery. Turk J Med Sci 2021; 51: 1179–1190.

53. Kawashima, Nader, Collins, et al. Virtual reality simulations in robotic surgery training: a systematic review and meta-analysis. J Robot Surg 2025; 19: 1–10.

54. Qin, You, Liu, et al. Development of virtual reality training system for combat musculoskeletal trauma care. Simulation 2025; 101: 3–11.

55. Harris, Arthur, Kearse, et al. Exploring the role of virtual reality in military decision training. Front Virtual Real 2023; 4: 1165030.

56. Fox News. Chilling animation: Parkland shooter’s movements in school, 2018. https://www.youtube.com/watch?v=Laizg39LsuQ&t=2s

57. McClurg, Wagner. Developing a discrete-event simulator of school shooter behavior from VR data. In: 2026 annual modeling and simulation conference (ANNSIM). IEEE, 2026, pp. 1–13.

58. Jayaparvathy, Sheeba Angel. Study of human factors contributing to fatal injury in a multi-floor building during emergency using finite state machines. In: 2021 Asian conference on innovation in technology (ASIANCON), Pune, India, 27–29 August 2021, pp. 1–5. IEEE.

59. Che, Niu, Shui, et al. A novel simulation framework based on information asymmetry to evaluate evacuation plan. Vis Comput 2015; 31: 853–861.

60. Kielar, Handel, Biedermann, et al. Concurrent hierarchical finite state machines for modeling pedestrian behavioral tendencies. Transp Res Procedia 2014; 2: 576–584.

61. U.S. Department of Education, Office of Elementary and Secondary Education, Office of Safe and Healthy Students. Guide for Developing High-Quality School Emergency Operations Plans. Washington, DC: U.S. Department of Education, 2013. https://www.ed.gov/media/document/rems-guide-developing-high-quality-emergency-operations-plans-k-12-schools-2013-113150.pdf

62. Angelov, Petkov, Shipkovenski, et al. Modern virtual reality headsets. In: 2020 international congress on human-computer interaction, optimization and robotic applications (HORA), Ankara, Turkey, 26–28 June 2020, pp. 1–5. IEEE. https://doi.org/10.1109/HORA49412.2020.9152604

63. Mehrfard, Fotouhi, Taylor, et al. A comparative analysis of virtual reality head-mounted display systems. arXiv Preprint. 2019. https://doi.org/10.48550/arXiv.1912.02913

64. Peterson, Densley. The violence project mass shooter database, Version 7. The Violence Project, 2023. https://theviolenceproject.org/databases/mass-shooters

65. Bieglmayer. Apparatus for capturing movements of a person using the apparatus for the purposes of transforming the movements into a virtual space, 2022. US Patent 11,216,081.

66. Lakens. Equivalence tests: a practical primer for t tests, correlations, and meta-analyses. Soc Psychol Personal Sci 2017; 8: 355–362. https://doi.org/10.1177/1948550617697177

67. Lavergne. University of Texas tower shooting (1966). Handbook of Texas, 2017.

68. Services, Lloyd. 40 years later, families endure ‘pain of remembering’ Cal State Fullerton mass shooting, 2016. https://www.nbclosangeles.com/news/cal-state-fullerton-mass-shooting-memorial/154932/

69. Books. Mass murderers. Time Life Medical, 1993.

70. Davidson. Under fire: The NRA and the battle for gun control. University of Iowa Press, 1998.

71. PBS, FRONTLINE. 111 years without parole: the killer at Thurston High, 2024. https://www.pbs.org/wgbh/pages/frontline/shows/kinkel/trial/

72. Bull. KLCC special documentary: the Thurston High School shooting—20 years later, 2018. https://www.klcc.org/crime-law-justice/2018-05-21/klcc-special-documentary-the-thurston-high-school-shooting-20-years-later

73. Hughes. Feds: assault at Red Lake over in nine minutes, 2005. https://news.minnesota.publicradio.org/features/2005/04/18_hughesa_redlakeupdate/

74. Reuters. Virginia Tech rampage lasted just nine minutes, 2007. https://www.reuters.com/article/world/virginia-tech-rampage-lasted-just-nine-minutes-idUSN16311336/

75. Chicago Tribune. 6 dead in NIU shooting, 2008. https://www.chicagotribune.com/2008/02/15/6-dead-in-niu-shooting-2/

76. Martin, Stewart. NIU gunman had ‘stopped taking medication’, 2008. https://www.npr.org/2008/02/15/19073647/niu-gunman-had-stopped-taking-medication

77. Ray. Sandy Hook Elementary School shooting, 2025. https://www.britannica.com/event/Sandy-Hook-Elementary-School-shooting

78. Blankstein. Santa Monica gunman fired 100 rounds during rampage, officials say, 2013. https://www.latimes.com/local/lanow/la-xpm-2013-jun-12-la-me-ln-santa-monica-gunman-100-rounds-20130612-story.html

79. Serna. Elliot Rodger meticulously planned Isla Vista rampage, report says, 2015. https://www.latimes.com/local/lanow/la-me-ln-santa-barbara-isla-vista-rampage-investigation-20150219-story.html

80. HeraldNet. Timeline of the Marysville Pilchuck shooting, 2014. https://www.heraldnet.com/news/timeline-of-the-marysville-pilchuck-shooting/

81. Johnson, Cavaliere. Two killed, four wounded in Washington State school shootings, 2014. https://www.yahoo.com/news/washington-state-high-school-lockdown-shooting-reports-183324596.html

82. Perez. Florida school shooter could have fired many more bullets, 2018. https://www.cnn.com/2018/02/27/us/florida-school-shooter-ammunition-left/index.html

83. Altavena, Shepard, Witsil, et al. 4 dead, 7 injured in Oxford High School shooting; suspect is 15-year-old student, 2021. https://www.freep.com/story/news/local/michigan/oakland/2021/11/30/oxford-high-school-active-shooter-victims/8810588002/

84. Foster, Kuznicki. Judge rules ‘every bullet’ fired in Oxford High School shooting could affect settlement amount, 2024. https://www.wilx.com/2024/11/15/judge-rules-every-bullet-fired-oxford-high-school-shooting-could-affect-settlement-amount/

85. Burrows, Moody, Guzman. Texas House of Representatives: investigative committee on the Robb Elementary shooting, 2022. https://www.house.texas.gov/

86. Smithson, Breslin. Demands for shooter’s writings continue month after Nashville school shooting, 2023. https://www.wsmv.com/2023/04/26/demands-shooters-writings-continue-month-after-nashville-school-shooting/

87. Gadd, Wegner, Fiscus, et al. Nashville school shooting: seven fatally shot at Covenant School, including 28-year-old suspect, 2023. https://www.tennessean.com/story/news/2023/03/27/covenant-school-nashville-shooting-green-hills/70052363007/

88. Halliday, Resnick, Walker. Fundamentals of physics. 10th ed. John Wiley & Sons, 2013.

89. Rumelhart, Hinton, Williams. Learning representations by back-propagating errors. Nature 1986; 323: 533–536.

90. McClurg, Wagner. Studying the effects of robot intervention on school shooters in virtual reality. In: Proceedings of the 2026 international joint conference on artificial intelligence and European conference on artificial intelligence (IJCAI-ECAI), Bremen, 15–21 August 2026.

91. Bhatt, Prajapati, et al. A data-centric approach to improve performance of deep learning models. Sci Rep 2024; 14: 22329.

92. Schwabe, Becker, Seyferth, et al. The metric-framework for assessing data quality for trustworthy AI in medicine: a systematic review. NPJ Digit Med 2024; 7: 203.

93. U.S. Department of Health and Human Services. Protection of human subjects, 45 C.F.R. § 46. https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-A/part-46 (2025, accessed 23 August 2025).