Abstract
There is a need to leverage non-electric applications of nuclear heat to generate additional income streams via thermal power dispatch (TPD) to other industrial processes, especially during periods of low electricity demand. Research efforts at the Idaho National Laboratory (INL) have been exploring the development of a TPD system to supply heat to a nearby high-temperature steam electrolysis (HTSE) plant for hydrogen production. As the TPD system evolves, supporting concepts for operating it from the control room need to be developed in parallel. This paper describes a graded simulator-based approach to assess mental workload while testing the TPD operating concepts. The mental workload of three distinct groups of participants, comprising college students, operational experts, and trained operators, was assessed in three separate studies using TPD simulators at two levels of fidelity. Results showed that students’ overall workload was low and that the workload of trained operators was medium across all scenarios. Further findings are discussed in the context of verifying the TPD operating concepts using a graded simulator-based approach.
Keywords
Introduction
Nuclear power plants (NPPs) remain a reliable source of clean baseload electrical power and hold immense potential for achieving U.S. net-zero carbon emission goals. To make NPPs economically competitive with alternative power sources such as natural gas and other fossil fuels, there is a need to leverage non-electric applications of nuclear heat to generate additional income streams, especially during periods of low electricity demand. Alternative uses of nuclear heat will require a thermal power dispatch (TPD) system to channel heat from the NPP to other users. Research efforts at the Idaho National Laboratory (INL), under the auspices of the Flexible Plant Operation and Generation (FPOG) pathway, have been exploring the development of a TPD system to supply heat to a nearby high-temperature steam electrolysis (HTSE) plant for hydrogen production. Modernizing existing plants and designing new advanced reactors with TPD capabilities would go a long way toward preserving the continued operation of the large light water reactors (LWRs) of the U.S. NPP fleet, which serves as critical energy infrastructure (Knighton et al., 2020), and toward improving the economic competitiveness of new and existing plants in general.
Safety and efficiency are paramount in conventional plant modernization and advanced reactor development. Human factors engineering (HFE), which involves the development and testing of new control room operating concepts, is critical both to advanced reactor designs and to the modernization of existing plants to include advanced capabilities like the TPD. Unfortunately, HFE often does not receive the attention it needs in the development lifecycle and is deferred to later stages (Gideon & Boring, 2023). As the TPD system evolves, the supporting concepts for operating it from the control room need to be developed in parallel and iteratively, and human operator-in-the-loop (HOIL) testing is needed to determine the impact of those operating concepts on operator performance. Mental workload is an important construct in such testing because it provides insight into the level of cognitive demand imposed on operators performing control room tasks.
Mental workload can be measured in three ways: through (1) perceived or subjective experience, (2) observed effects on performance, or (3) physiological responses to task demands. The safe and efficient operation of NPPs requires constant monitoring and detection, timely decision-making, and execution of control actions (O’Hara, 2000), all of which impose varying levels of mental workload. Operators require an optimal level of workload to remain alert; however, workload beyond the level required for optimal performance can lead to stress, degrading operator performance (Matthews et al., 2010) and compromising plant safety. Therefore, it is paramount to test new operating concepts to ensure that they do not impose significant mental workload beyond what is currently observed in operating plants.
Simulators may be used to test novel operating concepts like the TPD iteratively and in a graded fashion. Early stages of testing involve operator surrogates, such as college students, operating the simulator to detect usability issues common to both students and trained operators. Although trained students may not attain the expertise needed to understand the underlying basis of their actions, they have been shown to experience levels of mental workload equivalent to those of operators when control room tasks are simplified and ample training is provided (Yang et al., 2023). This suggests that testing the TPD operating concept with different participant groups at different points in system development, using simulators of corresponding fidelity, should produce equivalent levels of mental workload.
Graded Simulator-Based Approach
Simulators are used for the verification and validation (V&V) of new designs, components, and operating concepts in NPP operations according to the Human Factors Engineering Program Review Model (NUREG-0711; O’Hara et al., 2012). A graded simulator-based approach (Gideon & Boring, 2023) employs simulators of varying fidelity for concurrent and iterative testing of designs and control concepts. Initial licensing of NPPs, as well as subsequent license amendments arising from the introduction of novel operating concepts like the TPD, requires documented V&V. Additionally, conducting V&V with simulators of varying fidelity is cost-effective and pragmatic: part-task simulators may be used to verify advanced operating concepts early in development, while integrated system validation (ISV) using full-scope simulators is deferred to later stages. The graded simulator-based approach was used throughout this multi-year research project to develop the TPD models, human-system interfaces (HSIs), operating concepts (ConOps), use case scenarios, and corresponding procedures. Because the TPD is a novel system that requires multiple development iterations, the graded simulator-based approach enables iterative testing with simulators of corresponding fidelity across the development lifecycle using formative and summative evaluation methods.
Formative and summative evaluations represent a continuum of control room development activities using simulator-based testing (Boring et al., 2021). The goal of formative evaluation is to derive insights that address basic usability issues using low-fidelity simulators. Summative evaluation is conducted toward the end of system development using simulators with high functional and physical fidelity. Overall, the graded simulator-based approach facilitates pragmatic and cost-effective iterative testing with operator surrogates during the initial stage of development, addressing general usability issues early and allowing trained operators to focus on the technical expertise required to operate the system during summative evaluation.
Thermal Power Dispatch Simulator
A HOIL assessment involves human operators performing representative control room scenarios in a simulator environment to assess the performance of a new system or system component like the TPD. The TPD system models were developed as part of INL’s ongoing efforts and implemented in GSE Solutions’ Generic Pressurized Water Reactor (GPWR) simulator to support HOIL testing using scenario-based assessment (Ulrich et al., 2021). The current TPD iteration has a rated capacity of 100 MWt for a 500 MWe hydrogen plant. Frequency of occurrence was used as the basis for determining scenario representativeness. The TPD is not expected to undergo startup and shutdown often, as it would be shut down only during outages, which occur every eighteen months on average. For most of its operation, the TPD system will be in either Hot Standby or Online mode. Therefore, the most frequent operating scenarios are the transitions from Hot Standby to Online and from Online to Hot Standby. These two representative scenarios correspond to raising the steam extraction rate from Hot Standby (with marginal extraction) to Online (at 2.5 lb/s) and lowering it back again. For convenience, the Hot Standby to Online and Online to Hot Standby scenarios are referred to as engage TPD and disengage TPD, respectively.
Operators perform the TPD scenarios by interacting with the HSI (Figure 1). Prior work captures a detailed TPD HSI specification (Ulrich et al., 2023). Operating scenarios can be performed in Manual control or Auto control mode, hereafter referred to as “Manual control” and “Auto control.” In Manual control, the operator executes a sequence of procedure steps in line with the adapted operating procedures. Auto control executes the procedure steps automatically while the operator verifies them, in accordance with the U.S. NRC’s rules on control automation requiring operator verification (O’Hara & Higgins, 2020). The goal of the current study was to assess operators’ mental workload while verifying the TPD operating concepts, using different participant groups and varying levels of simulator fidelity according to the graded simulator-based approach. Two categories of studies were conducted: formative and summative evaluations. The formative evaluations compared participants’ workload when operating the TPD in Auto control and in Manual control. The summative evaluation assessed the mental workload of a different participant group operating the TPD system in Auto control. The study tested two hypotheses: first, that Auto control functionality would reduce mental workload, and second, that mental workload levels would be equivalent across the different participant groups operating simulators of varying fidelity levels.

Figure 1. Thermal power dispatch human-system interface (HSI).
Methods
Participants
According to the graded simulator-based approach, the studies comprised participants with varying experience levels operating simulators of different fidelities. Each study had a distinct group of participants. The participant groups comprised college students (N = 12), operational experts (a mix of former operators and non-operators; N = 6), and trained operators (N = 2).
The student participants had no prior operational experience. Students have been shown to experience mental workload comparable to that of trained operators when complex tasks are simplified and adequate training is provided (Yang et al., 2023). The current study included operational experts, a mix of former operators and non-operators, as a middle ground to introduce more variability along the continuum of participant experience. The non-operators in the operational expert group had in-depth operational knowledge but no recent plant operational experience. The last group comprised highly trained operators with current operating licenses, who had 18 years of combined experience as licensed commercial NPP operators. The study protocols were approved by the Institutional Review Boards of the corresponding institutions.
Location/Facilities
The TPD tests consisted of three separate studies conducted at three different locations: the Psychology Department at the University of Idaho (UI), the Human Performance Test Facility (HPTF) at the U.S. Nuclear Regulatory Commission (NRC), and the Human Systems Simulation Laboratory (HSSL) at INL. The UI and NRC studies constituted formative evaluations, and the INL study was a summative evaluation. The formative evaluations were first conducted with students and operational experts as participants to elicit feedback for refining the TPD prior to the summative evaluation six months later. The formative evaluations involved performing the scenarios in Manual control and Auto control, while the summative evaluation was performed in Auto control only. The combination of operating scenarios and control modes yielded four operating conditions used in the studies (Table 1).
Table 1. Operating Conditions: Operating Scenarios and Control Modes.
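The crossing of operating scenarios and control modes behind Table 1 can be sketched in a few lines of Python. This is an illustrative sketch only; the string labels and the `operating_conditions` helper are hypothetical names, not part of the TPD simulator software.

```python
from itertools import product

# Factor levels named in the text; the string labels are illustrative.
SCENARIOS = ("Engage TPD", "Disengage TPD")
CONTROL_MODES = ("Manual control", "Auto control")


def operating_conditions(modes=CONTROL_MODES):
    """Return every (scenario, control mode) pair for the given control modes."""
    return list(product(SCENARIOS, modes))


# Formative evaluations (UI, NRC): full 2 x 2 crossing -> four conditions.
formative = operating_conditions()
# Summative evaluation (INL): Auto control only -> two conditions.
summative = operating_conditions(modes=("Auto control",))

for scenario, mode in formative:
    print(f"{scenario} / {mode}")
```

The summative conditions are a subset of the formative ones, which is what allows the Auto control results to be compared across all three participant groups.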
The formative and summative testing used two different levels of HSI fidelity. In the formative tests, the TPD was adapted as a part-task simulator, with the HSI fitted onto a single 32-inch 4K UHD monitor and connected to the GPWR simulator via an application programming interface for dynamic element presentation. The summative testing involved a full-scope simulator with multiple HSI panels similar to those found in conventional LWR control rooms, so the simulator HSI more closely approximated an actual control room setting.
Measures
The mental workload of participants was measured using the National Aeronautics and Space Administration Task Load Index (NASA-TLX; Hart & Staveland, 1988). The NASA-TLX is a multidimensional measure of the subjective workload imposed by performing a task. Three of its dimensions relate to the demands the task places on the operator: mental demand (MD), physical demand (PD), and temporal demand (TD). The other three relate to the operator’s interaction with the task: performance (OP), effort (EF), and frustration (FR). Participants rated the workload they experienced while performing the task on each of the six dimensions, using a scale from 0 (very low) to 100 (very high). The final workload score was calculated as the average of the six dimension ratings, yielding scores from 0 to 100.
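Averaging the six subscale ratings without the pairwise-comparison weights of the original NASA-TLX procedure is commonly referred to as the raw TLX. A minimal sketch of that scoring rule follows, assuming each rating is on the 0–100 scale described above; the `raw_tlx` function and the subscale keys are hypothetical names, and the example ratings are invented for illustration, not study data.

```python
from statistics import mean

# NASA-TLX subscales as abbreviated in the text.
SUBSCALES = ("MD", "PD", "TD", "OP", "EF", "FR")


def raw_tlx(ratings):
    """Unweighted (raw) TLX: the mean of the six 0-100 subscale ratings."""
    missing = [name for name in SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    for name in SUBSCALES:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} rating out of range: {ratings[name]}")
    return mean(ratings[name] for name in SUBSCALES)


# Hypothetical ratings for one participant in one operating condition.
example = {"MD": 30, "PD": 10, "TD": 20, "OP": 15, "EF": 25, "FR": 5}
print(raw_tlx(example))  # 17.5
```

Rejecting missing or out-of-range ratings up front keeps a data-entry error from silently shifting a participant's score.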
Protocols
The formative evaluations conducted at the two locations (UI and NRC) followed the same abbreviated and simplified protocol, while the INL study used a separate protocol scaled to accommodate the increased simulator fidelity and the larger number and types of scenarios. The formative testing used a 2 (Operating scenario: Engage TPD, Disengage TPD) × 2 (Control mode: Auto control, Manual control) repeated measures design in which participants completed the tasks individually. The formative evaluation protocol proceeded as follows. Participants completed informed consent and watched a video showing a walkthrough of engaging the TPD in Manual control. Then, participants were randomly assigned to conditions, keeping sample sizes as equal as possible. Next, participants completed one operating scenario in both Manual and Auto control, with order counterbalanced, and then completed the NASA-TLX. Participants then completed the second scenario in both Manual and Auto control and completed a second NASA-TLX. Finally, participants were debriefed on their general experience performing the tasks.
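The text does not spell out exactly which orders were counterbalanced. The sketch below assumes, hypothetically, that both the scenario order and the within-scenario control-mode order were varied, giving four order sequences, and uses block randomization so that group sizes differ by at most one for any number of participants (e.g., N = 6 for the operational experts). Function names, participant IDs, and the seed are illustrative assumptions.

```python
import random
from collections import Counter
from itertools import permutations, product

SCENARIOS = ("Engage TPD", "Disengage TPD")
CONTROL_MODES = ("Manual control", "Auto control")

# Assumed order sequences: scenario order x control-mode order within each
# scenario, giving 2 x 2 = 4 sequences.
SEQUENCES = list(product(permutations(SCENARIOS), permutations(CONTROL_MODES)))


def assign_sequences(participant_ids, seed=None):
    """Block-randomize participants to order sequences.

    Every block of len(SEQUENCES) consecutive participants receives each
    sequence exactly once, in random order, so sequence counts differ by at
    most one for any number of participants.
    """
    rng = random.Random(seed)
    assignment = {}
    block = []
    for pid in participant_ids:
        if not block:
            block = SEQUENCES.copy()
            rng.shuffle(block)
        assignment[pid] = block.pop()
    return assignment


def run_order(sequence):
    """Expand one sequence into the ordered (scenario, control mode) runs."""
    scenario_order, mode_order = sequence
    return [(scenario, mode) for scenario in scenario_order for mode in mode_order]


# Usage with hypothetical IDs for the six operational experts.
experts = assign_sequences([f"E{i}" for i in range(1, 7)], seed=42)
print(Counter(experts.values()).most_common())
print(run_order(experts["E1"]))
```

Block randomization is one common way to "force equal sample sizes as much as possible"; simple random assignment would not guarantee balance with groups this small.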
The summative testing used a repeated measures design involving the two operating scenarios in Auto control only. Participants operated as a crew of two in reader-doer roles (Ulrich et al., 2024), with one participant as the shift supervisor (SS) and the other as the operator at the controls (OAC). The SS performed the reader role, using the procedures to guide the scenario while maintaining a high-level overview of the process, and instructed the OAC to perform monitoring and execute control actions. The study spanned four days. On the first day, participants provided informed consent, received a general orientation, and became acquainted with the layout of the HSSL. The second day included system fundamentals and operations training, followed by a dry-run scenario. Participants executed the scenarios on the third and fourth days, completing the NASA-TLX after each scenario and providing a debrief on their experience.
Results and Discussion
The NASA-TLX data were analyzed using marginal means for the four operating conditions (Figure 2). The resulting means were interpreted using a categorization scheme established in oil and gas process control, given that its operations and operating crews are comparable to those in nuclear process control. For example, the central control room (CCR) of an oil processing plant operates around the clock, with operating crews monitoring and controlling the plant over 8-hour shifts (Sugarindra et al., 2017), similar to NPP control rooms. Therefore, the CCR interpretation of mental workload scores was applied in the current study: low (0–9), medium (10–29), rather high (30–49), high (50–79), and very high (80–100). Results showed that students’ overall mental workload was low across all operating conditions. Operational experts reported medium mental workload in all operating conditions except engaging the TPD in Auto control. Trained operators reported medium mental workload in both operating conditions. The medium workload levels in the operational expert and trained operator groups partially support the hypothesis that mental workload levels would be equivalent across participant groups operating simulators of varying fidelity levels.
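A minimal sketch of the scoring pipeline described above: average TLX scores within each group and operating condition, then map each mean onto the CCR bands. Because averaged scores need not be integers, the sketch assumes each band’s upper limit is exclusive (so 9.5 counts as low); that boundary rule, the function names, and the example scores are assumptions for illustration, not study data.

```python
from collections import defaultdict
from statistics import mean

# CCR bands from the text, as (exclusive upper bound, label). Treating the
# upper bounds as exclusive is an assumption for non-integer means.
BANDS = (
    (10, "low"),
    (30, "medium"),
    (50, "rather high"),
    (80, "high"),
    (float("inf"), "very high"),
)


def categorize(score):
    """Map a 0-100 TLX score onto the CCR workload bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"TLX score out of range: {score}")
    return next(label for upper, label in BANDS if score < upper)


def condition_means(records):
    """Average scores per (group, scenario, control mode) cell.

    records: iterable of (group, scenario, mode, tlx_score) tuples.
    """
    cells = defaultdict(list)
    for group, scenario, mode, score in records:
        cells[(group, scenario, mode)].append(score)
    return {cell: mean(scores) for cell, scores in cells.items()}


# Hypothetical scores for illustration only (not study data).
records = [
    ("students", "Engage TPD", "Auto control", 8.0),
    ("students", "Engage TPD", "Auto control", 11.0),
    ("trained operators", "Disengage TPD", "Auto control", 20.0),
]
for cell, avg in condition_means(records).items():
    print(cell, round(avg, 1), categorize(avg))
```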

Figure 2. Average NASA-TLX scores by operating condition.
The other hypothesis was that Auto control functionality would reduce mental workload. Both students and operational experts reported lower workload when disengaging the TPD in Auto control than in Manual control (Figure 2). However, both groups reported higher workload when engaging the TPD in Auto control. Thus, the hypothesis was supported for disengaging the TPD but not for engaging it. Figure 3 shows frustration ratings across the three participant groups; operational experts reported relatively high frustration in the same scenario (engaging the TPD) in which Auto control was associated with higher mental workload.
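The direction of the Auto control effect can be read off the condition means by subtracting the Manual control mean from the Auto control mean for each group and scenario. The sketch below is purely descriptive, matching how the pattern is reported in the text; `auto_effect` is a hypothetical helper, and the cell means are invented values chosen only to mirror the reported direction (higher for engaging, lower for disengaging).

```python
def auto_effect(cell_means):
    """Auto minus Manual mean TLX for each (group, scenario).

    cell_means maps (group, scenario, mode) -> mean TLX score. A negative
    difference means Auto control was rated lower, which is the direction
    the hypothesis predicts; a positive difference means it was rated higher.
    """
    effects = {}
    for (group, scenario, mode), auto_mean in cell_means.items():
        if mode != "Auto control":
            continue
        manual_mean = cell_means.get((group, scenario, "Manual control"))
        if manual_mean is not None:
            effects[(group, scenario)] = auto_mean - manual_mean
    return effects


# Hypothetical cell means (not the study's values).
example = {
    ("students", "Engage TPD", "Manual control"): 6.0,
    ("students", "Engage TPD", "Auto control"): 8.0,
    ("students", "Disengage TPD", "Manual control"): 7.0,
    ("students", "Disengage TPD", "Auto control"): 5.0,
}
for key, diff in auto_effect(example).items():
    verdict = "supports" if diff < 0 else "does not support"
    print(key, f"{diff:+.1f}", verdict, "the hypothesis")
```

Cells without a Manual control counterpart, such as the Auto-only summative conditions, are skipped rather than compared against a missing baseline.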

Figure 3. Average frustration rating by operating condition.
Post-task debriefs of students and operational experts provided some insight into the patterns observed in the NASA-TLX scores. The feedback suggested that the higher workload recorded when engaging the TPD in Auto control was likely driven by the nature of the training the students received before performing the task and by their limited operational understanding. Because students are novices with a limited understanding of the operational significance of their actions, they may have built a fragile mental model around the Manual control procedure steps used in training. Hence, Auto control, which was intended to enhance their performance, may have disrupted their mental model of system operation. The operational experts, by contrast, had sufficient knowledge of the operational significance of their actions to infer what the automation was doing, but they reported that the TPD HSI provided insufficient information for them to infer what was happening in the backend. Feedback addressing the concerns raised by the students and operational experts was used to enhance the TPD HSI before the full-scope study with trained operators. Overall, the sharp drop in frustration among trained operators suggests that the changes made in response to the operational experts’ feedback, together with the additional information provided via multiple screens in the high-fidelity simulator, reduced mental workload.
Conclusions
In conclusion, this study provides further support for the graded simulator-based approach as a proactive and cost-effective strategy for testing novel operating concepts like the TPD from early development to maturity, avoiding potential rework costs during late development and licensing applications. Using different participant groups and simulators of varying fidelity for multi-stage verification provided further assurance that the TPD does not impose significant additional workload on operators. The comparable medium workload levels of operational experts and trained operators partially support the hypothesis that mental workload levels would be equivalent across participant groups operating simulators of varying fidelity levels. The lower mental workload recorded by the students is understandable, given their level of expertise and the experimental artifact of the training video. The benefits of the graded simulator-based approach can be maximized by tailoring formative evaluations to the nuances of how different participant groups perform the task, preventing unintended artifacts from affecting the results.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work of authorship was prepared as an account of work sponsored by Idaho National Laboratory, an agency of the U.S. Government. Neither the U.S. Government nor any agency thereof, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness, of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.
