Abstract
Given the same set of training, qualifications, and information, experts in a reliable and trustworthy system should theoretically make similar decisions. However, undesired decision variability (“noise”) has been observed in many high-stakes domains, which is both concerning and unsurprising given the complexity of decisions and known influence of several factors. Moreover, not all variability is undesirable, and the study of intuitive decision-making in naturalistic settings demonstrates the skill of the human expert when evaluating complex cases. The current paper describes a “noise audit” where we evaluated individual risk-informed decision making for regulatory action at U.S. commercial nuclear power plants. In a scenario study, individuals exhibited high consistency in inspection type decisions, although they varied substantially when assigning a significance code, resulting in many decisions that were different than what would be expected by the agency. Together, the results indicate the presence of decision “noise” in one of the two nuclear regulatory processes examined, though some of that variability is desirable given plant and case-specific differences.
Introduction
Undesired decision variability, or “noise,” popularized in the 2021 book of the same name (Kahneman et al., 2021), has been observed in many high-impact systems including medical diagnosis, scientific peer review, and sentencing in the judicial system (Bonavia & Marin-Garcia, 2023; Choi et al., 1998; Hofer et al., 1999). Decision noise can have large consequences for individual cases, resulting in unjust outcomes or inappropriate costs, but it can also degrade the functioning and reliability of the system as a whole.
Given a group of individuals operating with the same set of information and level of training, each individual should theoretically arrive at the same decision. However, studies in the risk analysis domain demonstrate that decisions about risk parameters are subject to various individual biases (Montibeller & von Winterfeldt, 2015), and many factors can also contribute “noise” to a decision system, defined as undesired variability in decisions that should ideally be identical (Kahneman et al., 2021). Decision noise can be characterized in a variety of ways, generally as either “Level Noise” (i.e., variability in average level of judgments between individuals) or “Pattern Noise” (i.e., variability in an individual’s responses to particular cases). Individual differences factors (“stable pattern noise”) like personality or risk tolerance (Rolison & Shenton, 2020) and more transient factors (i.e., “occasion noise”) like time of day or physical state (fatigue, hunger, etc.) have also been shown to influence decision consistency (Neprash & Barnett, 2019).
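The distinction between level noise and pattern noise can be made concrete with a small calculation. The following is a minimal sketch, assuming numeric judgments arranged as a judge-by-case table and the variance decomposition described by Kahneman et al. (2021), in which system noise squared equals level noise squared plus pattern noise squared. The function name `noise_components` and the ratings are hypothetical and are not data from this study. With a single judgment per judge and case, occasion noise cannot be separated and is absorbed into the pattern term.

```python
from statistics import fmean


def noise_components(ratings):
    """Split judgment variability into level and pattern components.

    ratings[i][j] is the numeric judgment of judge i on case j.
    Returns variances (squared noise), using population variances so that
    system = level + pattern holds exactly.
    """
    n_judges = len(ratings)
    n_cases = len(ratings[0])

    # Remove the case effect: deviations of each judgment from the case mean.
    case_means = [fmean(ratings[i][j] for i in range(n_judges)) for j in range(n_cases)]
    dev = [[ratings[i][j] - case_means[j] for j in range(n_cases)] for i in range(n_judges)]

    # Level noise: how far each judge's average sits from the group average.
    judge_effect = [fmean(row) for row in dev]

    system_var = fmean(d * d for row in dev for d in row)
    level_var = fmean(a * a for a in judge_effect)
    # Pattern noise: what remains after removing case and judge effects.
    pattern_var = fmean(
        (dev[i][j] - judge_effect[i]) ** 2
        for i in range(n_judges)
        for j in range(n_cases)
    )
    return {"system": system_var, "level": level_var, "pattern": pattern_var}


# Hypothetical judgments: 3 judges x 4 cases on an ordinal 0-3 severity scale.
hypothetical_ratings = [
    [0, 0, 1, 2],
    [0, 1, 1, 3],
    [1, 1, 2, 2],
]
components = noise_components(hypothetical_ratings)
```

Treating an ordinal scale as numeric is itself a simplification; the sketch is meant only to show how the two components relate, not how the present study analyzed its data.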
The presence of human variability (both between people and within a person) has long been known in behavioral science, contributing to methodological developments such as interrater reliability statistics and repeated measures experimental designs (Belur et al., 2021). However, building upon the vast array of research on bias in judgment and decision making, “noise” has started to receive more attention, especially in domains like medicine (Krauss et al., 2016) and hiring (Campion et al., 1998). One of the key tenets of the recent research is that organizations should try to reduce the extent to which major decisions are subject to the “lottery” of who is chosen as the decision-maker. Encouraging or enforcing guidelines and practicing other “decision hygiene” activities can decrease variability and result in a more reliable and predictable process (Kahneman et al., 2021; Levashina et al., 2014).
However, not all decision variability is undesirable, and there is much to be said for skilled expert intuition and the value it can bring to decision making in highly complex and uncertain environments (Kahneman & Klein, 2009). Although decisions are often made without decision-makers’ explicit awareness of their decision processes (Fischhoff & Broomell, 2020), skilled experts are thought to perform a subconscious pattern recognition task when faced with a novel situation, identifying aspects of the current situation that may be similar to situations they have encountered in the past (Klein et al., 2010). Rather than decision makers explicitly comparing and choosing from a set of two or more options as has been posited by some theories of judgment and decision making (Johnson & Raab, 2003), Naturalistic Decision Making theories posit that judgments are informed by a relatively subconscious pattern recognition and more conscious mental simulation activity that allows skilled experts to arrive at remarkably perceptive and informed decisions (Klein, 1993; Klein et al., 2010). These theories suggest that variability may (accurately) reflect true individual expert intuition about a situation and that the opportunity for variability should not be completely designed out of a system. For complex systems especially, reductionist, revisionist approaches to safety and accident evaluations (i.e., attempting to identify a single or few “causes”) tend to underestimate the impact of the emergent properties of the system that are simply not all knowable in any one evaluation (Dekker et al., 2011). A systems-focused approach with an emphasis on diversity of narrative is argued to be more desirable than more single-component cause investigations. These accounts would warn against striving for consistency at all costs. 
Similarly, much safety research suggests that “over-proceduralization” can have unintended consequences (Fuchs & Dien, 2017), and the introduction of procedures and protocols themselves increase the complexity of the system (Dekker et al., 2011).
Risk-informed decision making is especially complex and subject to the particulars of a given situation by design. The U.S. Nuclear Regulatory Commission uses a risk-informed decision making process (the “Reactor Oversight Program”) that was initially developed and refined in the late 1990s as a response to concerns about the former regulatory process (the Systematic Assessment of Licensee Performance Process) that was criticized as being subjective and inconsistent (U.S. Nuclear Regulatory Commission, 2023a) and consequently subject to unpredictable and unjust outcomes. It is important to note that the Reactor Oversight Program is risk-informed and not risk-based because it considers many other factors in addition to the quantified risk values, for example, qualitative factors such as safety-margins affected and defense-in-depth impacts. Risk-informed processes are further complemented by the performance-based approach that the NRC takes with respect to regulations. The program aims to minimize inconsistency between the NRC regions and encourages more objective and risk-informed decisions, while still necessarily relying on the professional judgment of analysts and managers. “Regions” in this context could refer to either the geographical area and the commercial nuclear power plants contained therein, or the NRC’s Regional Offices and the associated decision-makers (https://www.nrc.gov/about-nrc/locations.html). Unless otherwise noted, the term “regions” in the paper is referring to the latter and not the former. Each Regional Office operates independently, though they use the same guidelines and procedures (developed by the program office in Headquarters) for inspection/oversight. See the following website for more information on the organization and relationships between the Regional Offices and the Office of Nuclear Reactor Regulation: https://www.nrc.gov/docs/ML2132/ML21320A324.pdf.
Indeed, the Reactor Oversight Program follows many protocols that encourage decision hygiene (Kahneman et al., 2021), including guidelines, risk-informed thresholds, and methods for minimizing the effects of bias (U.S. Nuclear Regulatory Commission, 2023a). The Reactor Oversight Program has been in use since 2000 and has succeeded in standardizing the NRC’s processes, resulting in more objective, traceable, and risk-informed decisions. However, the NRC continually evaluates and iterates its procedures using both self-assessment and external evaluations to meet its primary objectives of “ensur[ing] that commercial nuclear power plants are operated in a manner that provides adequate protection of public health and safety” (U.S. Nuclear Regulatory Commission, 2023, p. 11). Since the primary mission of the NRC is to ensure public safety through transparent and risk-informed processes, the aim of this study is both to conduct a self-evaluation to determine the level of variability for two important processes supporting the Reactor Oversight Program and to continue to provide transparency of the NRC’s processes, thus retaining (and where possible, increasing) the public trust (Grimmelikhuijsen, 2009).
The current paper describes an exploratory study of decision making in the nuclear regulatory domain. While much prior research has aimed at understanding judgment and decision making for nuclear power plant operators (Grosdeva & De Montmollin, 1994; Norros et al., 2014; Roth et al., 1994), less emphasis has been placed on the decision making of regulators, who also impact the safety and reliability of the nuclear industry as a whole. We describe our approach for conducting a pilot “noise audit” of decision variability by evaluating whether individual decision-makers exhibit decision noise between each other when assessing the same set of cases.
NRC Reactor Oversight Program
The NRC is a self-evaluating organization focused on “ensuring the safe use of radioactive materials for beneficial civilian purposes while protecting people and the environment. The NRC regulates commercial nuclear power plants and other uses of nuclear materials, such as in nuclear medicine, through licensing, inspection and enforcement of its requirements” (About NRC | NRC.gov). The aim of the NRC’s Reactor Oversight Program is to evaluate potential safety concerns at commercial power plants in the U.S. and to ensure licensee adherence to safety and other procedures. The Reactor Oversight Program only applies to the existing fleet of light water power plants and does not apply to radioactive materials or other nuclear areas that the NRC regulates. On invitation of the NRC and with our NRC co-investigator, we focused our study on two key decision processes: the Incident Investigation Program (IIP) and the Significance Determination Process (SDP). Each of these processes is intended to aid the regulators in arriving at a defensible regulatory decision given the complexities and inherent uncertainties. The processes are also part of a larger regulatory process that includes elements that were not the focus of the current study (e.g., the Accident Sequence Precursor (ASP) program). The Incident Investigation Program is primarily aimed at deciding what inspection type to conduct following an occurrence, and the Significance Determination Process is primarily aimed at assigning significance to inspection findings identified within the seven cornerstones of safety at operating reactors. The cornerstones are Initiating Events, Mitigating Systems, Barrier Integrity, Emergency Preparedness, Occupational Radiation Safety, Public Radiation Safety, and Security. SDP decisions undergo multiple decision points, with a preliminary estimation of significance, then a formal review period, and then a final decision. 
Risk-informed decision making, in these two processes and in others, helps the agency focus on safety-significant issues, minimizes resource expenditures on low-risk issues, enhances public perception through transparency, and helps ensure consistent regulatory treatment in both licensing and oversight.
Decisions about regulatory action in each process are based on a series of procedures outlined in detail in the publicly available NRC Incident Investigation Program Management Directive 8.3 (U.S. Nuclear Regulatory Commission, 2023) and NRC Inspection Manual Chapter 0609 (U.S. Nuclear Regulatory Commission, 2020). Broadly speaking, both decisions are made by regional administrators, division directors, branch chiefs, or other decision makers using insights and inputs shared with them by cognizant subject matter experts (SMEs). In both processes there is a limited number of decision makers; typically there are 1 or 2 decision makers in the MD8.3 process and 4 in the Significance Determination Process. In many cases, decisions are informed by calculations of the probability of damage to the reactor core of the plant (the Conditional Core Damage Probability or the Core Damage Frequency) and results from other evaluations conducted by NRC personnel other than the ultimate decision maker. However, according to procedures, decision makers are meant to review these materials and arrive at an independent decision prior to the group meeting (see Figure 1).
Figure 1. Overview of two major components of the NRC’s Reactor Oversight Program. Note. The Incident Investigation Program (IIP; second column from the left) focuses on decisions related to reactive inspection type. The Significance Determination Process (SDP; far right) assigns safety significance to inspection findings. Each of these decisions is informed by core damage probabilities or frequencies calculated, when available, by experienced senior reactor analysts.
Method
We developed a scenario-based survey that presented all decision-makers with the same set of naturalistic scenarios and prompted them to make decisions like they would in the real world. We then compared individual decisions on each of the same scenarios to evaluate decision consistency.
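One simple way to summarize decision consistency on a scenario is the share of respondents who chose each option, together with the share who chose the most common option. The sketch below assumes this "modal agreement" summary; it is an illustration, not necessarily the computation used in the study. The function names and the response lists are hypothetical.

```python
from collections import Counter


def response_shares(responses):
    """Return the proportion of respondents choosing each option."""
    counts = Counter(responses)
    total = len(responses)
    return {option: n / total for option, n in counts.items()}


def modal_agreement(responses):
    """Return the share of respondents who chose the most common option."""
    return max(response_shares(responses).values())


# Hypothetical responses for two scenarios (not the study's data).
hypothetical = {
    "unanimous": ["Baseline"] * 10,
    "split": ["Baseline"] * 6 + ["SIT"] * 4,
}
agreement = {name: modal_agreement(r) for name, r in hypothetical.items()}
```

A modal agreement of 1.0 means every respondent chose the same option; values near 1 divided by the number of options indicate responses spread almost evenly.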
Study 1—IIP
Materials
We constructed naturalistic scenarios that described information related to specific events and/or conditions occurring at hypothetical nuclear reactor facilities. The scenarios were initially written by the NRC co-investigator on the study and second author of this paper and were validated with an expert panel. The scenarios included critical information required for study participants to analyze the case descriptions to judge the need for further evaluation and to assess potential risk/safety significance related to the event or condition. The scenarios were either in the low (CCDP < 1E-7), medium (CCDP ∼ 1E-6), or high (CCDP > 1E-4) range, and the risk thresholds were selected based on “breakpoints” in the associated NRC procedures. See Appendix A for examples.
Five scenarios were constructed for each process to represent different risk levels and completeness of information provided for analysis. We manipulated the level of risk and the level of “obviousness” (i.e., clear cues to indicate that risk level) to provide a range of scenarios that may reflect real-world situations encountered by these decision makers. It is important to note that the information provided to the survey participants included the same information (in the same format) that would be included in standard review packets at NRC, simulating a naturalistic review situation. The obviousness manipulation was meant to work as a nudge to evaluate whether people would pay greater attention to the other factors in the scenario instead of or in addition to the calculated risk value. Hence “obviousness” in the IIP (and SDP) scenarios reflects both (1) whether the calculated risk value falls clearly in one category or another (including its proximity to a threshold) and (2) complicating factors, for example, qualitative inputs that cannot be included in the quantified values, that might influence the decision makers. An example would be if the failed/degraded component in the nuclear plant had a potential extent-of-condition impact beyond the system boundaries. This would not be captured in the probabilistic risk analysis (PRA) model but might “nudge” the decision maker’s estimate of the safety significance to a higher level. The experimental conditions for the IIP survey showing risk level and obvious/not obvious conditions are shown in Figure 2.
Figure 2. IIP scenarios. Note. Scenarios varied in risk level and “obviousness” and were presented in a random order. Scenario C was explicitly designed to be a non-obvious medium-risk scenario. Specifically, the authors chose CCDP values for the medium-risk scenarios that were straddling the border between regulatory response categories.
Participants
We recruited participants from within the NRC who were known to have experience with each process. The survey recruitment and data collection were managed independently by the research team to ensure anonymity. We aimed to recruit respondents with high levels of experience with these processes from across regions and roles within the NRC. For the IIP survey, we received responses from 10 individuals with an average of 13.9 years (SD = 7.2) of experience with IIP decisions. The roles of respondents included 1 regional administrator, 1 deputy regional administrator, 3 deputy directors, 3 division directors, and 1 branch chief. One respondent did not state their role.
Procedure
Respondents received a link to an online survey created in Qualtrics and were prompted to provide consent to participation in the study. At the start of the survey, participants were provided information about the purpose of the study: “The objective of the study is to obtain information regarding the application of risk informed decision making as that process is typically applied. In so doing, we hope to better understand how different people approach the process and determine how we might simplify and improve procedures and possibly develop guidelines to assist NRC decision-makers in performing decision-making tasks.” Participants were then instructed: “You will be provided a case scenario that describes events or conditions occurring at a nuclear facility that require your assessment of potential safety risk. Please carefully review the scenario description, and follow NRC standard procedure to arrive at a decision regarding risk determination and recommended course of action. Following your review of the Case Scenario, you will be asked to make appropriate decisions regarding risk estimation and a recommended course of action. Once you have completed the review and decision and recommendation responses, you can proceed to answering a short questionnaire concerning the decision-making process that you used and further information about your specific approach to the process.”
Scenarios were presented one at a time and in a randomized order. Respondents were prompted with the following questions following each case: (1) Based on your review of the case scenario that you have been given, if you are the responsible decision maker, what would be your most appropriate recommendation for further investigation? Please select from the Inspection Types listed below (Baseline/No Additional Inspection, Special Inspection Team, Augmented Inspection Team, Incident Investigation Team), (2) briefly explain the main reason or rationale that you used to make your decision, and (3) how confident are you in your selection of the most appropriate risk value color? (Slider from 0 to 100%).
After completing all questions for the scenario, participants clicked to continue to the next screen, where the next scenario was presented. They proceeded this way to the end of the survey. At the end of the survey, there were several follow-up questions, including questions about whether the study instructions were understandable (or why not), whether the case scenarios gave sufficient information (or why not), whether they believe other individuals with the same experience would respond similarly (or why not), and their title/years of experience. Participants could take as much time as they needed to complete the survey.
Study 2—SDP
Materials
We created a separate set of 5 scenarios that described naturalistic events/conditions that were inspired by real SDP cases. Similar to the IIP scenarios, we manipulated the level of risk and “obviousness,” as shown in Figure 3. The scenarios were either in the low (delta CDF < 1E-7), medium (delta CDF ∼ 1E-6), or high (delta CDF > 1E-4) range, and the risk thresholds were selected based on “breakpoints” in the associated NRC procedure. The SDP scenarios were written and formatted in a way that is standard for information packets in the NRC.
Figure 3. SDP scenarios. Note. Scenarios varied in risk level and “obviousness” and were presented in a random order. Scenario C was explicitly designed to be a non-obvious medium-risk scenario. Specifically, the authors chose CCDP values for the medium-risk scenarios that were straddling the border between regulatory response categories.
Participants
Similar to the IIP survey, we aimed to recruit respondents with high levels of experience with the SDP process from across regions and roles within NRC. We received responses from 13 individuals with an average of 14.5 years (SD = 7.0) of experience with SDP decisions. Roles included 3 deputy directors, 1 regional administrator, 5 division directors, 1 branch chief, 1 deputy regional administrator, and 1 enforcement specialist. One respondent did not state their role. Ten of these individuals also completed the IIP survey, as they had experience with both processes.
Procedure
Participants were provided a link to a Qualtrics survey where they gave consent to participate in the study and then read a description of the purpose of the study. Participants received the following instructions: “The objective of the study is to obtain information regarding the application of risk informed decision making as that process is typically applied. In so doing, we hope to better understand how different people approach the process and determine how we might simplify and improve procedures and possibly develop guidelines to assist NRC decision-makers in performing decision-making tasks.” Participants were then given the following instructions: “You will be provided a case scenario that describes events or conditions occurring at a nuclear facility that require your assessment of potential safety risk. Please carefully review the scenario description, and follow NRC standard procedure to arrive at a decision regarding risk determination and recommended course of action. Following your review of the Case Scenario, you will be asked to make appropriate decisions regarding risk estimation and a recommended course of action. Once you have completed the review and decision and recommendation responses, you can proceed to answering a short questionnaire concerning the decision-making process that you used and further information about your specific approach to the process.”
Participants then proceeded through each of the 5 scenarios at their own pace. At the end of each scenario, they were asked the following questions: (1) Based on your review of the case scenario that you have been given, if you are the responsible decision maker, what safety significance characterization do you believe is most appropriate for this case? Choose the color label that represents your preferred risk value from the following list. If you are working as a designated panel member, please choose your selected recommendation for submission to the panel (Green, White, Yellow, or Red (response options)), (2) briefly explain the main reason or rationale that you used to make your decision, and (3) how confident are you in your selection of the most appropriate risk value color? (Slider from 0 to 100%).
After completing all questions for the scenario, participants clicked to continue to the next screen, where the next scenario was presented. They proceeded this way to the end of the survey. At the end of the survey, there were several follow-up questions, including questions about whether the study instructions were understandable (or why not), whether the case scenarios gave sufficient information (or why not), whether they believe other individuals with the same experience would respond similarly (or why not), and their title/years of experience. Scenarios were presented in a random order.
Results
Study 1—IIP
In the two low-risk scenarios, responses were 100% consistent and respondents reported high confidence (84–90% on average). The medium risk scenario elicited the most variability, with 56% of respondents recommending a baseline inspection and 44% recommending a special inspection. The patterns of responses in the high-risk scenarios reflected the manipulation of obviousness. In Scenario D, the majority of participants (78%) chose AIT, with 22% choosing SIT. When the level of risk/safety significance was less explicit in Scenario E, the pattern essentially flipped, with only 11% choosing the AIT response, and the vast majority choosing SIT (89%) (see Figure 4).
Figure 4. Response patterns on IIP survey.
Regarding confidence, respondents reported high confidence in the two low-risk scenarios (84–90% on average), though confidence levels dipped to the low 70s in the medium risk scenario, showing an expected decrease in confidence when risk was at the border of IIP decision criteria and qualitative information in the scenario was incongruent with what the PRA model indicated. Confidence levels were also high in the high-risk scenarios (80–88%), with the exception of the confidence of those who chose the AIT response in Scenario E (59% confidence) (see Figure 5).
Figure 5. Average confidence responses on IIP survey.
We also examined individual differences in confidence levels across the scenarios to evaluate potential individual factors that could contribute to variability. Though the sample was relatively small with only 10 respondents, the data show marked variation across individuals.
Figure 6 shows that some individuals reported a high level of confidence no matter what level of risk or ambiguity, some individuals reported lower confidence when the scenarios had more non-obvious risk information, and some individuals reported low confidence in all cases.
Figure 6. Individual differences in confidence on IIP survey.
Finally, we asked our subject matter expert to identify the responses that were expected according to NRC procedures and proportional to the level of risk. Figure 7 demonstrates that the majority of responses (95.6%) were consistent with the expected agency response, depicted by the regions labeled “proportional” (i.e., a response proportional to the safety/risk of the scenario). However, two responses, both in Scenario D, were considered an unexpectedly low regulatory response for the level of risk.
Figure 7. IIP individual responses according to level of risk and proportionality. Note. The average level of confidence for those who made the decisions that were considered proportional to the level of risk in Scenario D was only 3.7% higher than the confidence of those who made the responses that were unexpected. Though we did not run statistical analyses due to the small sample size, this result suggests that confidence was not a strong predictor of “accuracy.” Also, it is worth mentioning that the Y-axis contains some subjectivity in that the decision of a higher or lower regulatory response relies to some extent on individual judgment (while following the procedures and guidance documents). The X-axis is objective and fixed in that the scenarios are based on actual events where the risk was calculated via PRA.
Qualitative Data
Analysis of respondents’ reasoning/rationale revealed several interesting insights. Respondents were aware of the limits of their expertise. Several respondents described a lack of clarity regarding the difference between SIT and AIT, and individuals tended to vary in terms of whether they felt the “default” response should be the more or less severe inspection type. Some individuals described choosing the less severe inspection type to start, then escalating the response depending on what is found, and others described choosing the more severe inspection type to start to be conservative. This may reflect a difference in understanding/adherence to guidelines or recommendations, individual differences in risk tolerance, or a lack of clarity in guidelines. Respondents also questioned deterministic criteria (one of the standard components of the scenario) and their applicability to diverse operational scenarios.
One respondent directly compared the current scenario to a case that they had experienced in the past, performing a kind of recognition-primed decision action (Klein, 1993). They noted that the prior case, with similar features, had ultimately ended up as a Red finding and they decided (correctly) to recommend a more severe inspection type (AIT).
Study 2—SDP
Unlike the IIP data, which demonstrated high consistency, results from the SDP study show a high level of variability in all scenarios. See the Discussion for further details. As shown in Figure 8, in the two low-risk scenarios there was a 58/42% and 50/50% split in terms of Green and White decisions, respectively. In the medium-risk scenario (Scenario C), response recommendations included Green (59%), White (33%), and a small percentage Yellow (8%). The high-risk scenarios included more significant SDP responses (White and Yellow), but respondents only made Red SDP choices when the risk level was both high and obvious (Scenario D).
Figure 8. Response patterns on SDP survey.
Confidence levels were highest in Low-Risk, Obvious Scenario A, but dropped in Scenario B. Confidence was lowest in the medium-risk scenario, but it was similar regardless of choice for all scenarios (see Figure 9).
Figure 9. Average confidence levels in SDP data.
We also evaluated individual variation in confidence levels, as shown in Figure 10. Though we did not run statistical analyses due to the small sample size, we observed that the response patterns were highly variable between individuals.
However, people showed similar levels of confidence regardless of their choice of significance color.
Figure 10. Individual variation in confidence on SDP scenarios.
Finally, we evaluated whether the responses on the SDP survey were considered expected by the agency according to procedural guidelines for the level of risk/safety significance. Our SME identified acceptable responses on each scenario and we compared them to the responses on the survey, displaying the results in Figure 11. While the majority (63.7%) of responses were considered proportional, 20.7% were a higher level of regulatory response than expected and 15.5% were a lower level of regulatory response than expected.
Figure 11. Level of risk and level of response for individual SDP responses. Note. The average confidence level among those who made responses that were considered proportional to the level of risk was 8.01% higher than the confidence of those who made responses that were considered unexpected. Though we did not run statistical analyses due to the small sample size, this result suggests that there may be a weak relationship between confidence and “accuracy.” Also, it is worth mentioning that the Y-axis contains some subjectivity in that the decision of a higher or lower regulatory response relies to some extent on individual judgment (while following the procedures and guidance documents). The X-axis is objective and fixed in that the scenarios are based on actual events where the risk was calculated via PRA.
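The comparison of survey responses against SME-identified acceptable responses can be sketched as a simple tally. The example below is a hedged illustration, assuming the four SDP colors form an ordered scale (Green lowest, Red highest) and that the acceptable responses for a scenario span a contiguous range on that scale. The names `classify` and `tally`, the acceptable sets, and the responses are all hypothetical and do not reproduce the study's scenarios or data.

```python
from collections import Counter

# SDP significance colors in increasing order of regulatory response.
SCALE = ["Green", "White", "Yellow", "Red"]
RANK = {color: i for i, color in enumerate(SCALE)}


def classify(response, acceptable):
    """Label a response relative to the acceptable colors for a scenario."""
    ranks = [RANK[color] for color in acceptable]
    r = RANK[response]
    if r < min(ranks):
        return "lower"
    if r > max(ranks):
        return "higher"
    return "proportional"


def tally(responses, acceptable_by_scenario):
    """Return the share of responses in each category.

    responses is a list of (scenario, color) pairs.
    """
    counts = Counter(
        classify(color, acceptable_by_scenario[scenario])
        for scenario, color in responses
    )
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("lower", "proportional", "higher")}


# Hypothetical acceptable sets and responses (illustration only).
hypothetical_acceptable = {"A": {"Green"}, "D": {"Yellow", "Red"}}
hypothetical_responses = [("A", "Green"), ("A", "White"), ("D", "White"), ("D", "Red")]
shares = tally(hypothetical_responses, hypothetical_acceptable)
```

Because the acceptable set is treated as a range, a response that falls between its lowest and highest colors counts as proportional; a different rule would be needed if acceptable responses were not contiguous.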
Qualitative Data
Respondents indicated a high degree of uncertainty in their SDP choices, especially when there was missing information from the licensee. They described that “in the real world” they would have additional questions for the licensee and be able to request additional PRA analysis, which would potentially allow them greater certainty about the case. Respondents highlighted the challenge of integrating uncertainties in risk assessments, particularly regarding external events and human factors. Discussions often centered around the adequacy of mitigation strategies, regulatory compliance, and the interpretation of risk thresholds. Respondents considered the role of expert opinions, operational complexities, and regulatory frameworks when determining risk/safety significance. Finally, some respondents described the need for clarifying the “Greater than Green” approach guidelines, which are part of preliminary SDP determinations.
Conclusion
Taken together, results show a difference in noise levels between the IIP and SDP processes. Whereas the IIP survey had high consistency with very few decisions that were considered unexpected by the agency, the SDP scenarios elicited higher levels of variability, even in the low-risk and more obvious cases. Approximately 36% of participant responses differed from those of the SME. Respondents provided extensive rationales for each decision, revealing a high degree of expertise and insight. They also demonstrated a high degree of self-awareness, acknowledging the limits of their knowledge and experience.
General Discussion
In a study of decision making in the nuclear regulatory domain, we aimed to determine whether individual decision-makers vary from one another when assessing the same set of cases. Overall results showed a high degree of consistency, especially for decisions about inspection type, but greater variability in decisions about SDP risk/safety significance. Responses varied most for the non-obvious cases. It was beyond the scope of this study to identify specific sources of variability, so these observations should be confirmed with follow-on research.
It is worth noting that some variability is expected in nuclear regulatory decisions, especially given high system complexity and plant- and case-specific considerations. For example, consider two events at two identical plants with identical calculated risk impacts. If the event at plant A stemmed from a deficiency with a broad, generic impact on the whole U.S. fleet (e.g., a diesel generator used at 20 different units), whereas the event at plant B stemmed from an emergency diesel generator that is used at only one plant in the country, these would be very different cases, and variability in regulatory action would be desirable to ensure protection of public health and safety. However, in our scenario study we effectively controlled the variability between plants by presenting the same scenarios to all decision makers, suggesting that some of the observed variability could be considered undesirable “noise” as defined by Kahneman et al. (2021).
Risk-informed decisions are particularly nuanced and complex, especially when there is no known “correct answer.” In the nuclear regulatory domain, decision makers are operating with probabilities and making the best-informed decision possible with the tools and information they have. The regulators use the ROP processes to try to minimize uncertainties and reach defensible decisions that support regulatory action. As one decision-maker stated, “hard truths are difficult to come by,” which is why the decisions must be risk-informed, not purely risk-based. Moreover, in both the SDP and IIP processes, decision makers in real-world cases would likely have been involved in rounds of management review, peer review, and opportunities for licensee input, which would have provided more context than we provided by showing only the “final” information packets to each decision-maker individually. This “work as done” differs from the “work as described” in the stated NRC procedures, where managerial decisions are supposed to occur in an unbiased, independent fashion based on only this information. Instead, decisions are often made collaboratively and with more exposure, context, and back-and-forth than what the written procedures might dictate. For example, NRC Inspection Manual Chapter 0609-Attachment 1, “Significance and Enforcement Review Panel (SERP) Process,” clearly establishes the four voting panel members, yet in practice many more individuals influence SDP decisions. We view this discrepancy as an opportunity for the agency either to bring procedures/processes into alignment with reality or to add greater “decision hygiene” to processes so that individual inputs are first established and then merged for a final decision. This is consistent with how professional decisions are made in many other domains (Dekker et al., 2011).
For decisions about significance, our results suggest that decision makers may be particularly likely to recommend greater regulatory responses when the information provided is more clearly high-risk/safety significant. We also observed decisions that could result in an overuse of resources in the low-risk cases and an underuse of regulatory action in the high-risk cases. While we did not have any way of determining the absolute “correctness” of the decisions, we did work with our subject matter expert to evaluate whether each decision was at least proportional to its level of risk. Only in the SDP study did a large number of responses fall outside of the response options that the agency might expect. This finding is open to several interpretations. One interpretation is that a higher level of variability in the SDP process may be expected because of higher system complexity (Dekker et al., 2011), implying that more weight should be given to the holistic professional judgment and less to the individual component decision pieces or processes. The SDP process also requires more precision (as can be seen with the narrower bands between different colors in Figure 1) and involves more time and resources. It is also possible that the variability is indeed undesirable “noise” (Kahneman et al., 2021) and that steps should be taken to increase decision hygiene. Processes, procedures, and decision support tools could be improved to ensure more consistency.
The individual differences data showed that some people’s confidence varied according to the level of obviousness, whereas others reported high confidence regardless of the cues. Though we did not test the correlations for statistical significance due to the small sample sizes, confidence appeared to decrease with years of experience, suggesting that more experienced decision-makers may be more attuned to the high level of uncertainty present in all risk-informed decisions and thus have lower confidence. While this would need to be more formally tested with a larger sample, this pattern is consistent with prior research showing that confidence and accuracy/knowledge are often not well calibrated (Fischhoff & Broomell, 2020). Thus, the variations in confidence data may reflect individual differences in experience, personality, risk tolerance, or other factors that should be studied in future research.
Finally, although it was not the intention of this study to follow a Naturalistic Decision Making methodological approach per se, we observed several aspects of the decision making process that would be ripe for analysis using Cognitive Task Analysis (including Critical Decision Method) and other approaches (Crandall et al., 2006). Our approach to the scenario study was inspired by NDM-style studies that have high ecological validity. We allowed the decision making in the scenario study to happen outside of a controlled lab environment and did not control what resources or individual strategies/processes people may have used to support their decision making. This may have allowed more “noise” to enter the process, but as we were attempting to audit the noise level during naturalistic decision making, we argue that this is a useful first pass at understanding the decision environment. In the rationale descriptions, we also noted some evidence of recognition priming—decision makers described key pieces of information about each scenario that prompted pattern recognition in some individuals. An informal interview with one decision maker also revealed a story of a time when he had a hunch about a situation at a plant and decided to go against the recommended inspection type according to the calculated risk level. He ended up discovering critical information about the case that was important to the investigation process. This is an anecdotal example of the risk of over-proceduralization and taking the human expert out of the equation.
Limitations and Future Directions
The study had several limitations, including statistical and sample size constraints. Our scenarios were non-naturalistic in the sense that the respondents had to make decisions “in one sitting” and were not able to ask for additional information from the licensee, request additional probabilistic risk assessment work, or benefit from the perspectives of others, an activity that is intrinsic to the NRC’s decision-making processes. Indeed, in their rationales, participants noted that further discussion and input from other experts could change their minds, and that in the real world, these decisions are made by a group. However, one of the recommendations for “decision hygiene” is to ensure independence of decisions (Kahneman et al., 2021). We therefore recommend that the NRC encourage individual decisions to be formed prior to the group discussion, and we argue that our approach demonstrates how those decisions may vary upon entering the group meeting. The final group decision may then be made with higher confidence.
It is also possible that the variation we see in responses on the scenarios (especially in the SDP survey) is not a reflection of “noise” from the decision-makers, but it could instead reflect a failure of construct validity. The scenarios may not have effectively portrayed the intended aspects of risk, and thus respondents may have been responding accurately to the higher or lower levels of risk that were unintended by our designed experimental manipulation. The scenarios may not have included the key elements for making risk determinations. Also, we intentionally deidentified the plants by using fake names in order to control for any prior knowledge or experience that decision-makers may have had with a particular plant. This may have stripped the scenarios of some realism or individually known context about that plant that may have informed the decision. However, the scenarios were designed to follow the standard format for an SDP review packet, including each of the standard sections with typical descriptions, which are intended to be reviewed prior to the group meeting. All scenarios in the SDP survey included the Timeline or “story,” Performance Deficiency, the “more-than-minor” criteria met, a description of how the deficient licensee performance was the proximate cause of the degraded condition (or event), the basis for not screening the finding to Green, Event/Condition/Both, exposure time, NRC calculated risk level, licensee’s position, and significant uncertainties. The scenarios were also reviewed and edited by an expert panel. As such, we have high confidence that the scenario descriptions were as representative as possible of what the decision-makers would typically be working with. Moreover, the within-subjects nature of our study design ensured that all participants viewed and responded to all of the same scenarios, including whatever limitations or missing information they may have had.
We also did not attempt to measure or define explicitly what noise factors could be influencing the decision variability. In other domains like the medical field, time of day, mood, and other “occasion noise” factors have been shown to affect decision outcomes (Kahneman et al., 2021; Neprash & Barnett, 2019). In the current study, individual differences factors like risk tolerance could also explain some of the variability (Rolison & Shenton, 2020). As our intention was to simply conduct a “noise audit,” evaluating the presence and extent of potential noise in this process, we did not attempt to control for or measure these types of factors, and instead let them “have their way” during the decision making. Future research should focus on determining and specifying the sources of noise more systematically.
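As a starting point for such follow-on work, the sketch below shows one way system noise could be split into level noise and pattern noise when every judge rates every case. It is an illustration under stated assumptions, not an analysis performed in this study. It assumes the ordinal responses have been given a numeric coding, the ratings matrix is invented, and the function name `noise_audit` is hypothetical. With a single response per judge and case, occasion noise cannot be separated and is absorbed into the pattern term.

```python
from statistics import fmean


def noise_audit(ratings):
    """Decompose variability in a complete judges-by-cases matrix.

    ratings[j][c] is the numeric judgment of judge j on case c.
    Population variances are used so that
    system**2 == level**2 + pattern**2 holds exactly.
    """
    n_judges, n_cases = len(ratings), len(ratings[0])
    judge_means = [fmean(row) for row in ratings]
    case_means = [
        fmean(ratings[j][c] for j in range(n_judges)) for c in range(n_cases)
    ]
    grand = fmean(judge_means)
    cells = [(j, c) for j in range(n_judges) for c in range(n_cases)]

    # Level noise: spread of each judge's average judgment.
    level_var = fmean((a - grand) ** 2 for a in judge_means)
    # Pattern noise: judge-by-case residual after removing both averages.
    pattern_var = fmean(
        (ratings[j][c] - judge_means[j] - case_means[c] + grand) ** 2
        for j, c in cells
    )
    # System noise: spread across judges within a case, averaged over cases.
    system_var = fmean((ratings[j][c] - case_means[c]) ** 2 for j, c in cells)

    return {
        "level": level_var ** 0.5,
        "pattern": pattern_var ** 0.5,
        "system": system_var ** 0.5,
    }


# Hypothetical ratings: 3 judges x 3 cases, coded 1 (lowest) to 4 (highest).
ratings = [[1, 2, 3], [2, 3, 4], [1, 3, 2]]
audit = noise_audit(ratings)
print(audit)
```

Treating ordinal significance codes as interval-scaled numbers is a strong simplification. An analysis of real responses might prefer agreement statistics designed for ordinal data.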
Lastly, we used a single SME for the evaluation of acceptable responses. While multiple SMEs would have been desirable, this was cost/resource prohibitive in the current pilot study. We argue however that this limitation was minimized by (1) the use of quantified risk values which are an objective measure (any potential biases of the SME were lessened by the use of objective, quantifiable risk models that constrained the acceptable answers) and (2) the fact that the simulation scenarios were pulled from actual historical events in the U.S. nuclear industry, for which we know the “correct” decisions. Nonetheless, future research should use a larger group of SMEs to evaluate the acceptable responses.
Conclusion
In sum, we conducted a “noise audit” of decisions in the nuclear regulatory domain following the approach outlined by Kahneman et al. (2021). We evaluated decisions in the U.S. NRC’s Reactor Oversight Process by analyzing variability in two regulatory processes: incident investigation and significance determination. We evaluated individual decisions on representative scenarios and observed a high degree of consistency, especially for decisions about inspection type in the IIP. Our results suggest that, as a whole, the Reactor Oversight Process facilitates reliable and objective risk-informed decisions in nuclear power regulation, contributing to the continued success of the Nuclear Regulatory Commission’s mission of safety. However, our data also reveal the potential presence of decision variability in some aspects of the SDP in particular. This extends the research on “noise” into the risk-informed decision-making domain and suggests that other organizations that depend upon reliable expert judgment in risk decision making, especially those with direct health/safety responsibilities, might also benefit from this type of analysis (Kahneman et al., 2021). Some examples of risk-informed agencies that might benefit include (1) those tasked with water management (e.g., control of dams where diverse and competing goals exist for maximizing river navigation, lessening environmental impact, and minimizing loss of property), (2) agencies charged with managing the reliability of the electrical grid, where the availability of power is paramount and which are increasingly challenged by extreme weather events, and (3) organizations charged with responding to diseases and epidemics, where rare and “single decisions” can have worldwide impacts.
Future research could clarify the sources of that variability and identify opportunities for guidelines, procedural enhancements, or training approaches that can better support the decision makers when faced with inevitable complexity and uncertainty. One approach to this could be to leverage NDM-inspired cognitive task analysis and scenario-based training to (1) understand and teach the factors that go into expert judgment and decision making and (2) provide newer decision makers exposure to complex historic cases, the regulatory decisions, and the ultimate outcomes so that they may have increased recognition and (ideally) reduced variability when faced with similar cases in the future.
Footnotes
Author Note
This report was prepared as an account of work sponsored by an agency of the U.S. Government. Neither the U.S. Government nor any agency thereof, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for any third party’s use, or the results of such use, of any information, apparatus, product, or process disclosed in this report, or represents that its use by such third party would not infringe privately owned rights. The views expressed in this paper are not necessarily those of the U.S. Nuclear Regulatory Commission.
Acknowledgments
We are indebted to Kevin Coyne, Jack Giessner, and Russell Felts at the U.S. Nuclear Regulatory Commission for their service as an expert panel for the study. Their inputs on the scenarios, estimations of acceptable levels of variation, and overall support on the project were invaluable. We would also like to thank Benjamin Schwartz for helpful comments on the study methodology and early reports. We are grateful to Sunil Weerakkody for comments from the NRC on the initial draft of the paper. Finally, we are grateful to the NRC decision makers who participated in the study and allowed us a window into their processes and procedures.
Consent to Participate
The study was considered an internal improvement project and the NRC did not require institutional review board review. All participants in the scenario study provided written informed consent prior to participation.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was funded by the U.S. Nuclear Regulatory Commission Future Focused Research project program under Contract Number 31310023P0030.
Declaration of Conflicting Interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: This paper is the outcome of a project that was funded by an internal Nuclear Regulatory Commission (NRC) Future Focused Research project grant that was awarded to co-author J.D. Hanna, a former employee of NRC. He formulated the original idea and acquired the internal funding, and the NRC panel selected Monterey Technologies, Inc., as the awardee of the contract, thus bringing principal investigator E.M. Barhorst-Cates and co-investigator A.P. Ciavarelli onto the project. The initial concept of conducting a “Noise Audit” of the NRC Reactor Oversight Process was part of the explicitly defined tasking in the contract, though the specific methodology for doing so was not pre-determined. Authors EBC and APC declare no conflicts of interest. Author JDH declares he has a conflict of interest, in that he is employed by Jensen Hughes, which is an organization that has an interest in the research and/or is likely to benefit from its publication and dissemination.
