Implementing Lumberjacks and Black Swans Into Model-Based Tools to Support Human

Abstract

Objective:

The objectives were to (a) implement theoretical perspectives regarding human–automation interaction (HAI) into model-based tools to assist designers in developing systems that support effective performance and (b) conduct validations to assess the ability of the models to predict operator performance.

Background:

Two key concepts in HAI, the lumberjack analogy and black swan events, have been studied extensively. The lumberjack analogy describes the effects of imperfect automation on operator performance. In routine operations, an increased degree of automation supports performance, but in failure conditions, increased automation results in more significantly impaired performance. Black swans are the rare and unexpected failures of imperfect automation.

Method:

The lumberjack analogy and black swan concepts have been implemented into three model-based tools that predict operator performance in different systems. These tools include a flight management system, a remotely controlled robotic arm, and an environmental process control system.

Results:

Each modeling effort included a corresponding validation. In one validation, the software tool was used to compare three flight management system designs, which were ranked in the same order as predicted by subject matter experts. The second validation compared model-predicted operator complacency with empirical performance in the same conditions. The third validation compared model-predicted and empirically determined time to detect and repair faults in four automation conditions.

Conclusion:

The three model-based tools offer useful ways to predict operator performance in complex systems.

Application:

The three tools offer ways to predict the effects of different automation designs on operator performance.

Keywords

human–automation interaction human performance modeling levels of automation attentional processes situation awareness

Introduction

As automation is implemented to a greater extent and across a wider range of systems, there is a need for effective, research-based guidance to support designers. This paper provides the theoretical and empirical framework for two human–automation interaction (HAI) concepts—the lumberjack analogy and black swan (totally unexpected) events—and their effects on overall operator performance. The paper then describes the implementation and validation of these concepts in three distinct and separate model-based software tools to assist designers in developing automated systems to support operator performance. There are relatively few modeling efforts that attempt to address HAI except at a very detailed level (e.g., Boehm-Davis, Holt, Diez, & Hansberger, 2002; Degani, Heymann, & Shafto, 2013; John, Blackmon, Polson, Fennell, & Teo, 2009; Lüdtke & Osterloh, 2010; Lüdtke, Osterloh, & Frische, 2012; Lüdtke, Osterloh, Mioch, Rister, & Looije, 2009), and they either do not directly assess issues of imperfect automation or do not contain firm validation data. Finally, this paper identifies lessons learned across the projects, and thus it can be informative to other human factors professionals attempting to predict HAI performance.

A stages-and-levels taxonomy of automation developed by Parasuraman, Sheridan, and Wickens (2000, 2008; Wickens, Mavor, Parasuraman, & McGee, 1998) defined automation partially in terms of the stages of human information processing that could be assisted by automation. This stage taxonomy is represented schematically in Figure 1, along with some prototypical examples of automated functions at each stage depicted at the bottom.

Figure 1.

Stages in which automation can assist human performance, including examples.

Such a taxonomy provides a framework for understanding three important characteristics of human interaction with imperfect automation. First, across many applications, there is a marked distinction between automation that provides assistance to the operator regarding the diagnosis of the state of the world (Stages 1 and 2) and that which offers decision recommendations for actions and sometimes even executes the recommended action (Stages 3 and 4). Examples of this dichotomy may include, in medicine, the distinction between automated diagnostic aids and treatment recommendations (Garg et al., 2005; Morrow, Wickens, & North, 2006). In remotely controlled robotic arm design, the distinction is between displays of trajectory error and autopilot flying of the trajectory (Li, Wickens, Sarter, & Sebok, 2014). In aviation, the distinction is between midair conflict detection systems (Wickens, Rice et al., 2009) and conflict resolution advisories (Prevot, Homola, Martin, Mercer, & Cabrall, 2012; Trapsilawati, Qu, Wickens, & Chen, 2015). In statistical analysis, the distinction is between software packages that offer only an inference with a confidence interval versus those that recommend a decision to accept or reject the null hypothesis.

Second, the development of such automation aids is more problematic for later-stage (3 and 4) automation than for earlier-stage (1 and 2) automation. The reason lies in the contrast between the two clusters regarding optimal design. For earlier-stage automation, there is typically a “ground truth” of correctness. A diagnosis can ultimately be determined to be right or wrong (even though at the time it is made, it may be uncertain). Hence automated systems can be evaluated on their absolute level of diagnostic accuracy. In contrast, an automated system that recommends (and/or executes) a course of action in the face of an uncertain diagnosis (Stages 3 and 4) must, by definition, make an assessment of the values (or costs) of certain outcomes.

Consider an automated medical assistant that recommends to the physician a particular treatment in the face of an uncertain diagnosis. The assistant must account for the relative costs of recommending an unneeded treatment versus those of an untreated condition. There is no reason to assume that the automation’s value structure is identical to that of the physician (or the patient), and therefore there is no reason to assume that the recommendation is “optimal.” As a consequence, later-stage automation may more often be “wrong” in the sense of providing a recommendation that is different from that of the human operator, not because the diagnosis underlying the recommendation was incorrect but because the value structure differed. Thus the physician and automation may disagree not because one was right and the other wrong but because one felt that the cost of the unnecessary surgery (in dollars and danger to the patient) was greater. With a greater likelihood of being incorrect from the point of view of the human user, the automation’s perceived reliability will be lower and certainly harder to objectively quantify.

Third, the data on human use of imperfect automation indicate greater costs of such imperfections when automation does fail. To understand this third source of importance for the taxonomy, we must also consider the level of automation within each stage. Based on the original classic work of Sheridan and Verplank (1979), Parasuraman et al. (2000) considered that each stage within the taxonomy could be characterized by different levels of implementation. These levels can be thought of as the ratio of automation work to human work or, alternatively, how “aggressive” and authoritative the automation is in performing its assistance. The most intuitive contrast of levels, within Stage 3 automation, is between an automated agent that recommends one or more courses of action (lower level) and an agent that will execute the action within a certain period unless the human operator vetoes it (Olson & Sarter, 2001). As Table 1 shows, similar assignments of levels can be made across each of the different stages.

Table 1:

Multiple Examples of Automation Stages and Levels

Variable	Stage 1: Attention Filtering	Stage 2: Diagnosis	Stage 3: Action Choice	Stage 4: Action Execution
Example	The cockpit traffic display	The medical diagnostic assistant	The engine supervisor	Vehicle navigation autonomous functions
Higher level
Filters all but one	Displays only the closest aircraft	Diagnoses only the most likely disease	Shuts down the engine	Totally automated (e.g., Google car)
Presents all, but highlights	Displays all aircraft but highlights the closest	Presents and rank orders the three most likely diseases	Shuts down the engine unless the operator vetoes	Lane keeping and headway control
Lower level
Presents all	Presents all surrounding traffic	Presents 10 plausible diseases	Recommends engine shutdown	Headway control only

Importantly, there is no absolute number of levels that must be assigned within each stage, and indeed, Stage 3 lends itself to more possible levels than are indicated in Table 1. In addition, there are certain contingencies between levels across stages. In particular, the highest level at Stage 3 by default dictates high levels at Stage 4. That is, if an automated system “decides” to execute an action at the highest level of Stage 3 (rather than simply recommend it), it must execute the action (highest level of Stage 4). Also there is a partial contingency between Stage 2 and Stage 3 in that many decisions of automation are made on the basis of the automation’s own diagnosis. However, this automation-driven diagnosis–decision link is not invariably true, as, for example, a physician may enter his or her own independent diagnosis into an automation treatment recommendation system for the automation to recommend a course of action.

Onnasch, Wickens, Li, and Manzey (2014) used the framework provided by the matrix of stages and levels to define a new variable, degree of automation (DOA), that is increased by moving upward and/or rightward across the cells of Table 1. The importance of collapsing these two dimensions (stages and levels) onto a single one (degree) is because both stages and levels appear to have the same implications for HAI in normal and automation failure conditions. These implications are embedded in a set of four phenomena that Onnasch et al. derived from a meta-analysis of 16 experiments: nominal performance; automation failure performance, which we will refer to here specifically as a human response to unexpected automation failure (HRUAF); situation awareness (SA); and workload. All studies included within the meta-analysis varied the degree of imperfect automation (automation that could occasionally “fail”) and recorded both normal performance of the human–automation system and performance when a failure occurred. In addition, where possible, Onnasch et al. extracted and recorded SA across the manipulated DOA as a possible moderator of performance. Workload was assessed in many of these studies and was found to covary with routine performance, decreasing with higher DOA. Workload is not presented here, as it is not central to our thesis.

An important consideration is that the broader meaning of automation failure may include software bugs, hardware failure, circumstances for which the automation was not designed, and instances in which the automation performs as the designer intended but not as the user expected. Such situations could occur if the user, for example, incorrectly programmed the automation or had an inaccurate mental model of its functioning.

The results of the meta-analysis (Onnasch et al., 2014) are shown in Figure 2. Panel (a) indicates that a higher DOA increases routine performance. This finding is not surprising, as it shows that across studies, automation is functioning as intended. Panel (b), however, indicates a progressive loss in performance with higher DOA (automation is doing more work) on the infrequent occasion when automation fails and the operator must assume manual control. This trade-off is the critical aspect of the data, represented by the “lumberjack analogy”: The higher the trees (better routine performance with higher DOA), the harder they fall (worse performance in failure conditions).

Figure 2.

Results of meta-analysis of degrees of automaton (from Onnasch, Wickens, Li, & Manzey, 2014). The regression trend lines are the dashed lines.

Panel (c) depicts the critical variable that underlies the performance trade-off: the progressive loss of SA with higher DOA. Such a finding is also consistent with the general understanding that being out of the loop degrades awareness of what automation is doing (Endsley & Kiris, 1995; Kaber & Endsley, 2004), with a progressively greater loss as automation does more work and the human does less work. Such a phenomenon, sometimes referred to as “out-of-the-loop unfamiliarity,” or OOTLUF, can be seen as resulting from a combination of three components:

Complacency, inducing in the operator a failure to monitor and hence a failure to notice effectively when a fault occurs. The irony of automation is that the more reliable the automation is, the rarer failure events are, becoming essentially “black swans” (Taleb, 2007) that are missed altogether (Molloy & Parasuraman, 1996; Wickens, Hooey, Gore, Sebok, & Koenecke, 2009).

The reduction of active responding and engagement with the system, which induces in the operator a failure to learn and remember the dynamically evolving state of automation and to understand fully the environment in which automation is functioning. This reduction is amplified in moving to higher DOAs. The resulting loss of SA in turn degrades the appropriate response to the failure.

The progressive loss of skill of the operators with increasing DOA (Casner, Geven, Recker, & Schooler, 2014), which degrades the quality of an appropriate intervention.

Thus the key performance variables in this trade-off model are the divergence of the two performance curves as DOA increases, creating the higher “tree” and the harder “fall”; the key mediating variable is the loss of SA. The DOA trade-off model is embodied in the straight regression trend lines shown in the figure; the data that validate the model are seen in the “spaghetti graphs” of the individual studies underlying the trend lines.

Naturally, designers should be concerned about such a trade-off, given that almost all automation can fail. These failures are likely to be unexpected black swans, not anticipated by the designer. A host of possible approaches or solutions exist, four of which are described next.

A higher DOA is appropriate if failure analysis indicates that the consequences of failures are not sufficiently serious to outweigh the benefits of higher DOA, given the frequency of correct operation, and there are only very modest costs of such failures (Sheridan & Parasuraman, 2000). This may be the case for routine automation in non-safety-critical environments, such as a photocopier or other office equipment.

Simply automate less: Choose a lower DOA, and accept that the human will be required to perform additional work, if the reduced costs of the automation failure outweigh the additional workload. For example, continue with automation diagnostic systems (earlier stage), or implement automation decision systems only at a low level.

Capitalize on the meaning of the “loss of SA curve” and carefully craft intuitive displays of dynamic processes that can improve SA without imposing significant mental effort on the operator, in the spirit of ecological displays (Burns et al., 2008; Hoff & Bashir, 2015; Mercado et al., 2016). Such a design approach can reduce or even eliminate the downward slope of the dynamic performance loss function (Figure 2b) without decreasing benefit of automation (Figure 2a).

Provide training specifically on the experience of automation failures (Hoff & Bashir, 2015; Sauer, Chavaillaz, & Wastell, 2015; Schaefer, Chen, Szaslmas, & Hancock, 2016). This training may be simply verbal instruction that automation could fail (and how it might). But the full impact of the findings of active learning benefits (Wickens, Hutchins, Carolan, & Cumming, 2013) suggest that actually experiencing an automation failure during training, for example, in a simulator, can be a compelling way to establish the operator’s mental model that automation is imperfect (Bahner, Hüper, & Manzey, 2008).

Earlier, we described the phenomenon of HRUAF: the human response to the black swan events of unexpected automation failure. We also described how the lumberjack analogy characterizes how the human response is affected by increasing DOAs, as mediated by the reduction of SA. We also identified four potential solutions to the problem. In the following section, we focus explicitly on three validated models of HRUAF (or models of HAI that contain an explicit HRUAF component). Such models have the advantages that they provide quantitative predictions of operator performance in routine and failure conditions, they offer metrics of HAI quality, and they can predict the effects of proposed design solutions listed earlier.

Modeling

The benefits of quantitative computational modeling in general are well documented elsewhere (e.g., Byrne, 2013; Foyle & Hooey, 2008; Gray, 2007; Laughery, Lebiere, & Archer, 2006; Pew & Mavor, 1997), as are the requirements that such models be validated against human performance data in order to be trusted and useful (Wickens & Sebok, 2014). Indeed, such validated human performance models of HRUAF are rare, particularly as that response is moderated by DOA. In this section, we describe three such modeling efforts that we have conducted and from which we have extracted lessons learned. We note that all three efforts addressed automation in complex safety critical aerospace systems: flight deck automation, automated robotic arm control, and automated environmental process control.

All three models are embedded in software tools that provide an interface through which the user interacts with the model. The interface allows the user to investigate questions about the effects of automation properties on HRUAF and other aspects of performance and cognition. Finally, and critically, all three software tools incorporate a model of human noticing of the unexpected visual event. In these projects, the unexpected event is the visual manifestation of the automation failure. The visual noticing model, NSEEV (noticing, salience, expectancy, effort, and value), describes the quantitative manner in which detection of a visual indicator of the automation failure (or any other discrete visual event) is influenced by factors such as the expectancy of the event and salience of the failure indication. For example, the low expectancy of the black swan failure will increase detection time, whereas the high salience of a flashing “no data” message will shorten it. The model also predicts the degrading effects of cognitive workload to response time and, most importantly, the degrading effect of greater retinal eccentricity away from the operator’s momentary fixation at the time of the failure. This fixation in turn is driven by the visual demands of the ongoing task at the time of the failure. The NSEEV model is described in detail elsewhere (Wickens, 2015; Wickens, Sebok, Li, Sarter, & Gacy, 2015; Steelman, McCarley, & Wickens, 2011).

Table 2 provides an overview of critical features of the three modeling efforts to predict human interaction with automation. The table identifies the domains, the unexpected event(s) they address, and how they treat the HRUAF. In addition to this overview, we describe each modeling effort in more detail.

Table 2:

Overview of Three Modeling Efforts on Black Swan Automation Failure Response

Variable	Automation Design Advisor Tool (ADAT)	Man–Machine Integration Design Analysis System–Function Allocation Support Tool (MIDAS-FAST)	Space Performance Research Integration Tool (S-PRINT)
Domain	Flight deck automation: Flight management system (FMS)	Robotic arm supervision and control	Robotic arm and environmental process control in a multitasking environment
The unexpected (black swan) event	Uncommanded flight mode changes within the FMS	Deviations of a robotic arm movement from the intended trajectory	Fluctuations in a process control system coupled with failure of an automated failure diagnosis and guidance task
How the loss of situation awareness was modeled	An algorithm to quantify automation mode complexity applies increasing penalties as complexity increases	A model of complacency based on human failure to sufficiently monitor automation; changes to scanning result in delayed noticing and response to trajectory deviations	A multitasking selection model predicts operator engagement and therefore noticing and response
Degree of automation	Not varied	Varied in terms of robotic arm trajectory control and hazard alerting	Varied in terms of robotic arm trajectory control and the presence or absence of a process control diagnosis and response system

Automation Design Advisor Tool (ADAT)

The flight management system (FMS) is a complex, yet highly useful, automated aviation system (Ferris, Wickens, & Sarter, 2010; Sebok et al., 2012) that helps pilots plan and execute flight paths to maximize efficiency while maintaining safe performance (Degani, 2004; Pritchett, 2009). The FMS has been on flight decks since the 1970s, providing a wealth of lessons learned regarding the challenges it presents to pilots. Briefly summarized and as reflected in the literature on the challenges of HAI, the FMS (1) affects complex, interacting flight control systems; (b) operates in ways that pilots, at times, do not expect; (c) presents nonsalient, alphabetic-code indications of the current mode, requiring the pilot to interpret the code and visualize the mode’s effects on the flight path; (d) can have different outcomes for pilot actions, depending on the current mode; (e) has many modes of operation; and (f) has multiple ways to trigger mode transitions (Fennell, Sherry, Roberts, & Feary, 2006; Sarter & Woods, 1992, 1994, 2000; Sherry, Feary, Polson, & Fennell, 2003). As a consequence, subtle mode changes, initially unnoticed by pilots, may sometimes be perceived as automation failures and lead to confusion in response.

ADAT was developed to help FMS designers identify potential design concerns that might be expected to impair pilot performance by using the software tool to replicate the layout of the FMS design and then characterize the user interactions. Figure 3 shows a screenshot of the tool (Sebok et al., 2012), where the user re-creates the layout of the FMS components on the flight deck. Specifically, ADAT evaluates six aspects of FMS design in six modules, each based upon computational models (metrics) of factors influencing human performance: (a) layout of the displays and controls, (b) noticeability of changes, (c) meaningfulness of terms, (d) confusability of terms and symbols, (e) complexity of modes, and (f) procedures performed when using the FMS. Each area, referred to as a module in ADAT, has concerns that could adversely affect pilot performance. These six modules were selected through iterative discussions with subject matter experts (pilots, avionics designers, pilot trainers, and airline procedure writers) to identify critical FMS design issues. The current paper focuses on the two most relevant modules (noticing changes and complexity; see Sebok et al., 2012, for a more detailed discussion of other modules). Note that these modules are tabs shown at the top of the screenshot in Figure 3.

Figure 3.

Screenshot of the Automation Design Advisor Tool layout module, where designers locate and characterize the flight management system displays and controls.

Based on user inputs, ADAT provides feedback about potential human factors concerns for each of the aforementioned aspects. The results of the six computational metrics are indicated with a stoplight status code for each of the design factors and their expected impacts on pilot performance. When ADAT identifies a design concern, it provides links to empirical research summaries in the tool that explain the problem or present potential solutions.

The changes module addresses a well-documented finding that pilots are sometimes unaware of display changes that indicate important aspects of aircraft behavior pertaining, in particular, to modes of automation. When mode transitions occur that are not commanded by the pilot, they are sometimes unexpected (Nikolic, Orr, & Sarter, 2004; Sarter, Mumaw & Wickens, 2007; Wickens & Alexander, 2009; Wickens, Hooey, et al., 2009). The changes module, based heavily on the NSEEV model described earlier, reflects the well-known phenomenon of change blindness (Rensink, 2002). This lack of awareness of changes can negatively affect safety. The changes module uses the NSEEV model of visual noticing (Steelman et al., 2011; Wickens, 2015; Wickens & McCarley, 2008) to predict whether or not pilots will notice mode transitions based on (a) salience, expectancy, and location of changes; (b) phase of flight and the associated expected pilot workload (higher workload results in decreased noticing of unexpected events); and (c) time elapsed since the change occurred. Time elapsed is important because change blindness research suggests that if a visual change is not detected within 5 to 10 s after its occurrence, the change is likely to be missed (Rensink, 2002). Such a failure of detection represents a breakdown in the first component of OOTLUF, described earlier. The model penalizes designs to the extent that important events, such as subtle mode changes, are not salient, are located in the peripheral visual field, are unexpected, or are encountered during high-workload conditions.

The complexity module assesses the problems associated with understanding and predicting the many different FMS behaviors (Degani et al., 2013; Funk et al., 1999; Sarter & Woods, 1994, 2000). The number of modes, the interactions or couplings between the modes (Halford, Baker, McCredden, & Bain, 2005), and the agent that activates a mode transition (pilot or automation) all affect system complexity and thus are assumed to affect pilot SA, the second component of OOTLUF. Finally, the feedback presented to the pilot regarding the effects of mode transitions on the flight path is evaluated. Graphical displays that show the flight path could greatly enhance pilot mode awareness, but the typical feedback is a three- or four-letter code, from which pilots should remember flight performance implications. This code is a poor display for supporting SA. The complexity module evaluates these relevant design issues that affect pilot SA and hence should influence the quality of the HRUAF failure response.

In the validation of ADAT ratings of FMS designs, five expert raters evaluated three types of FMS systems and found that ADAT rankings matched a priori expectations, as described next. The raters included four people trained in human factors; four of the five had piloting experience (two were commercial pilots, two were former private pilots or student pilots), and one was an avionics designer. The only non–human factors person was a professional pilot. A commercial FMS, with known issues, was rated lowest. A more modern corporate design, developed to remediate earlier problems, received intermediate ratings. The third design, specifically developed by human factors specialists to address automation complexity and interface design concerns (Boorman & Mumaw, 2004; Mumaw, Boorman & Prada, 2006), received the highest ratings.

Man–Machine Integration Design Analysis System–Function Allocation Support Tool (MIDAS-FAST)

Our second automation modeling effort is of the robotic arm, such as one used to conduct inspections and repairs on the International Space Station (Li et al., 2014; Wickens, Sebok, et al., 2015). Astronauts manipulate the arm using joysticks and monitor arm movement through camera and window views, which typically provide suboptimal perspectives of the task environment. The arm contains moving segments and joints that pose potential collision hazards with surrounding structure. Figure 4 shows the simulated robotic arm environment, with a trajectory (orange “staple”) indicating the recommended flight path around an obstacle. Given the complexity and criticality of robotic arm operations, astronauts receive hundreds of hours of robotic control training (Z. Tomlinson, personal communication, May 10, 2010). Automation offers a potential way to further support operator performance.

Figure 4.

One view of the robotic arm, with the end effector aimed at the recommended trajectory for traversing an obstruction.

This project used modeling, simulation, and human-in-the-loop (HITL) research to investigate how different DOAs applied to the movement and supervision of the arm could be expected to influence operator performance. A human performance model and a robotic simulation were integrated in the MIDAS-FAST software (Sebok et al., 2013). The model was implemented in MIDAS (Gore, 2010), a product of NASA Ames Research Center, and the robotic simulation was the Basic Operational Robotics Instructional System, from the NASA Johnson Space Center.

The operator model included six components, elaborated in Wickens, Sebok, et al. (2015). The NSEEV model (used in ADAT) predicted complacency in terms of failure to monitor automation for unexpected events, hence compromising failure management.

The complacency model predicted the loss of SA resulting from a higher DOA, as underlies the lumberjack analogy (see Figure 2). Details of this model are provided in Wickens, Sebok, et al. (2015), but its function was to predict the progressive reduction of scanning (and resulting degradation of SA) to the display panels vital to detecting robotic arm failures (errant trajectories) as the operator relied on progressively higher DOAs. These degrees varied from manual performance, to performance supported by a display of the path (supporting SA: Stage 2 automation) to automation that provided direct control (Stages 3 and 4). The reduction in scanning (to the display showing the errant trajectory) results both from the smoother control (lower bandwidth) at higher DOA and from the lower expectancy of failure under autopilot control. Both effects are embodied in the NSEEV model. An experiment with 36 highly trained participants (Li et al., 2014) indicated that, compared with manual performance, increasing the DOA improved routine performance in arm manipulation and reduced operator workload. However, failure detection and fault management were increasingly compromised by the higher DOA. The complacency model correctly predicted this differential failure response across the three DOA conditions, with a correlation of 0.97 between model predictions and empirical findings (Wickens, Sebok, et al., 2015).

Space Performance Research Integration Tool (S-PRINT)

The third project investigated HAI in a multitasking environment. The S-PRINT tool predicts human performance in unexpected workload transitions due to automation failures. S-PRINT was developed to support astronaut performance on long-duration missions, where fatigue and potential task overload are concerns. The HAI component of S-PRINT focuses on the features of any automation system that can degrade the failure detection, diagnosis, and response, thus invoking the lumberjack analogy. This HAI model contains four quantitative components or performance shaping functions (PSFs) that influence monitoring, failure detection, and SA, and hence the overall quality of the failure response. These are as follows:

DOA, as described earlier and as implemented in MIDAS-FAST.

The reliability of the automation. As research indicates (Moray & Inagaki, 2000; Rovira, McGarry, & Parasuraman, 2007; Wickens, Hooey, et al., 2009), the more reliable a system, the less likely an operator is to expect a failure. The less likely the failure is thought to be, the less frequently the operator scans the system displays (Hegerth, Lorenz, Villmek, & Krems, 2016) as predicted by components of the NSEEV model. S-PRINT users specify whether a failure is expected or unexpected for each type of failure. Reliability is associated with a PSF that affects operator performance more severely for unexpected automation failures.

The salience of the failure indication affects operator noticing of automation failures, as we highlighted in NSEEV and ADAT earlier. S-PRINT users input the salience of a particular failure, which affects the PSF applied to operator performance. Less salient failure indications result in longer detection times and hence impaired response in automation failure conditions.

The type of automation failure can affect operator performance. Although there are many ways any given automation system can fail, based on the work of Mosier, Skitka, Heers, and Burdick (1998) and Parasuraman and Manzey (2010), an important dichotomy built into our S-PRINT model of automation failure detection was whether automation decision aid failed because it gave incorrect advice—an error of commission—or because it unexpectedly gave no advice at all—an error of omission (Wickens, Clegg, Vieane, Sebok, & Janes, 2015). These two types of automation errors were expected to have qualitatively different effects on failure detection and recovery.

We note here that all four of these are objective metrics and that 1, 2, and 3 can be defined on a quantitative ordinal scale. Thus S-PRINT aggregates these factors into a single metric that predicts the adequacy of automation support for an effective human response to automation failure.

The model of operator performance in the multitasking environment was implemented as a discrete event simulation. The operator conducts a robotic arm task, as described earlier in the context of MIDAS-FAST, by moving the arm around an obstruction while monitoring—and intervening with as needed—an environmental process control system, AutoCAMS (Manzey et al., 2008). In robotic arm control, the operator tasks of planning the route, executing the route, and making decisions about rate and changing direction are each modeled explicitly. Similarly, the decisions to scan and intervene with the process control system when it fails are modeled as specific subtasks. These tasks include distributions of operator performance time and accuracy, based on interviews with subject matter experts.

The human performance model in S-PRINT was implemented using the Improved Performance Research Integration Tool (IMPRINT; Allender, 2000). Thus, the operator model for robotic arm control in S-PRINT used the same concepts (e.g., differences in trajectory control tasks and workload were modeled across the trajectory control conditions) as the operator model in MIDAS-FAST. However, the specific implementations varied, as MIDAS and IMPRINT are different types of modeling architectures and address human performance in somewhat different ways (Wickens, Sebok, et al., 2013).

The S-PRINT model was validated in an empirical study that utilized a multitask environment. The robotic arm task used in MIDAS-FAST was employed in both the higher-workload Stage 2 mode (automation-supported SA by presenting trajectory deviations) and the lower-workload Stage 3 mode (automated control). The robotic task was time-shared with monitoring AutoCAMS, the environmental process control simulation to control the astronaut’s cabin atmospheric environment (Wickens, Clegg, et al., 2015). The automation supporting routine performance of this system could occasionally fail, and failure management was supported by an automated decision aid that provided both diagnostic advice (Stage 2) and guidance for repair (Stage 3). On a totally unexpected black swan occasion, at the end of the experiment, the diagnosis and guidance failed to appear, which induced the HRUAF.

As anticipated, operator performance suffered when the automation decision aid was unexpectedly gone. Of greatest relevance, the discrete event simulation model and the four-component HAI performance-shaping function described earlier accurately predicted both detection time and total repair time. These times were predicted for conditions in which the decision aid was present and when it was unexpectedly gone. These times were also predicted under both higher- and lower-workload conditions created by the lower and higher DOA of the concurrent robotic arm task described earlier.

The S-PRINT model predictions and performance data are shown in Table 3 and Figures 5 (detection) and 6 (total repair time) and reveal the high degree of correlation (r = .99 for detection, r = .88 for repair).

Table 3:

Model-Predicted and Empirically Gathered Time to Detect and Repair Process Control Faults

Process Control Task	Robotic Arm Condition	Diagnosis and Guidance Automation	Model-Predicted Times (s)		Actual Observed Times (s)
Process Control Task	Robotic Arm Condition	Diagnosis and Guidance Automation	M	SD	M	SD
Detect	Automated	No failure	5.4	4.5	1	3
Detect	Automated	Gone	6.5	3.8	2	2
Detect	Manual	No failure	7.0	6.2	2	1
Detect	Manual	Gone	9.1	6.8	4	7
Repair	Automated	No failure	59.4	1.6	88	28
Repair	Automated	Gone	71.0	0.4	114	29
Repair	Manual	No failure	59.1	0.1	82	6
Repair	Manual	Gone	69.3	0.6	137	37

Figure 5.

Plot of mean model-predicted (x-axis) and mean empirically determined time (y-axis) to detect fault in the process control system (r = .99).

Figure 6.

Plot of mean model-predicted (x-axis) and mean empirically determined (y-axis) time to repair fault in the process control system (r = .88).

Discussion and Conclusions

In this paper we argue, and data support the conclusion, that the higher DOA (the higher tree), as operationalized by both later stages and higher levels of automation, increasingly compromises the human response to automation failures (the harder fall following the lumberjack axe). This compromise results from the higher DOA inducing more complacency, which delays or prevents noticing the black swan automation failure. The higher DOA supports less operator engagement, decreasing understanding and lowering SA in more highly automated conditions. The loss of SA is an undesirable state fostered by increasing automation complexity and decreasing transparency and mode feedback.

The aforementioned effects were modeled in different ways to characterize three different complex systems. In all cases, these model predictions were validated with reasonably high correlations between predicted and obtained values (Wickens & Sebok, 2014). In ADAT, the validation was based on the rankings of FMS designs. In MIDAS-FAST and S-PRINT, validations were performed against actual HITL responses to unexpected automation failures in simulation studies. The capability of such modeling efforts is vital to being able to predict the appropriate DOAs and other automation system design factors that effectively support operator performance.

Importantly, each model utilized specific metrics that characterize the quality of HAI in terms of support for HRUAF. In ADAT, this metric was support for failure noticing and automation complexity. In MIDAS-FAST, it was the automation-induced complacency prediction, and in S-PRINT, it was the combined metrics of reliability, DOA, automation complexity, and failure detection and interpretation.

These three projects, summarized in Table 4, were performed sequentially, and lessons learned during one project were applied to subsequent projects. Comparing these three projects offers a set of lessons learned, divided into (a) scope, modeling, and software tool development lessons and (b) validation study lessons.

Table 4:

Summary and Comparison of the Three Modeling Efforts

	Automation Design Advisor Tool (ADAT)	Man–Machine Integration Design Analysis System–Function Allocation Support Tool (MIDAS-FAST)	Space Performance Research Integration Tool (S-PRINT)
Domain	Flight deck automation: FMS	Robotic arm supervision and control	Robotic arm and environmental process control in a multitasking environment
Primary output	Qualitative assessments of the extent to which the design is expected to support pilot performance	Performance predictions: Deviations from the planned trajectory, quality, and speed of performance; workload; and SA; scanning behavior	Performance predictions: Time to detect and repair fault in the process control system and trajectory control performance in the robotic environment
Validation	Five SMEs evaluated three FMS designs, and the rank orders of scores matched expectations	Experimental studies provided data to parameterize the model; a model of operator complacency was used for a true validation	Experimental studies provided data to build the model and to validate aspects of the model (time to complete task, task selection)
Primary success	Evaluates and compares FMS designs and predicts probability of detecting and time to detect an FMS mode transition	Predicts DOA-induced complacency and resulting HRUAF directly and successfully; directly connects the human performance model to a robotic simulation	Predicts operator performance (time to complete mission, errors, workload) in a variety of contexts; allows users to investigate different types of automation design and failures
Primary limitation	Does not address other human performance considerations (e.g., workload, predicted performance); does not evaluate “across-the-flight” performance	Includes a limited number of automation scenarios and requires extensive “post processing” to identify critical failure parameters, e.g., complacency	Does not connect to a simulator; does not include visual scanning or SA predictions

Note. DOA = degree of automation; FMS = flight management system; HRUAF = human response to unexpected automation failure; SA = situation awareness; SME = subject matter expert.

Regarding scope issues, many factors have the potential to influence HAI. Careful analysis is needed beforehand to identify those factors that are likely to have the greatest impact on operator and system performance (e.g., frame-of-reference transformation in robotic arm control; Gacy, Wickens, Sebok, Gore, & Hooey, 2011; fatigue effects on long-duration missions; Wickens, Hutchins, Laux, & Sebok, 2015). Second, trade-offs are needed to decide what should be modeled. The additional realism gained by connecting a model to a simulator can result in more accurate predictions, but it can create a less usable system. Third, a dynamic model of human performance that predicts a variety of performance parameters across the mission offers far more insights into expected human–system performance than a simple design assessment tool. Such a model, running multiple iterations, can provide measures of variance in outputs, such as response time to failure. Fourth, if failure response is of interest, there should be ways for users to identify a variety of failure types, such as the distinction between errors of omission (failing to detect) and errors of commission (following an inappropriate automation recommendation for action; Wickens, Clegg, et al., 2015). Fifth, the tool should be developed to allow extensibility. S-PRINT, as an example, allows users to develop completely new human performance models to evaluate a wide variety of automation types.

Another key point from this work is that validation efforts are necessary for establishing model credibility, particularly when the models attempt to predict human performance in complex work environments. It is sometimes possible through knowledge of previous studies in the empirical literature to identify a study with conditions that are similar to those predicted by the model. The empirical findings can be compared with model predictions and provide support for the validity of model predictions. One important caveat is that the scenarios investigated in the experiment must be similar enough to those predicted by the model that comparisons can be justified.

Although previous studies can be useful for validation, it is generally desirable to perform targeted empirical studies that specifically examine the same scenarios as the model predicts. These studies need to be planned carefully to identify what parameters will be compared and then ensure that the experiment includes those performance measures. Targeted validation studies must gather the same performance parameters (e.g., time to complete mission, deviations from the trajectory) as the model predicts. It is also highly desirable to include the same types of measures in the experiment as are predicted by the model (e.g., workload, SA). However, collecting the same types of measures is typically challenging, as models often predict continuous workload or SA, and experiments typically gather discrete subjective workload measures and SA snapshots. It is possible to gather continuous physiological measures of workload to compare with model-predicted workload functions, and it is also possible to collapse (integrate or average) model-predicted workload or SA across a scenario.

Finally, for stochastic models that attempt to predict a variety of human performance metrics, a full model validation is virtually impossible. Models that predict multiple parameters quickly extend beyond what is possible to evaluate in an HITL experiment. However, by identifying key parameters—those considered most important for the end users of the model—it is possible to develop the study to test these most relevant issues and thereby validate models for these critical parameters.

Key Points

The lumberjack analogy specifies that as the degree of automation increases, routine performance improves, but performance is more severely degraded in automation failure conditions.

Automation failures, particularly in highly reliable systems, are unexpected “black swan” events that are vulnerable to being overlooked by operators.

Three human performance model–based tools that include these two principles have been developed to predict operator performance in different automation conditions.

Subject matter expert and empirical validations provide evidence for the quality of model predictions in identifying the impacts of automation design and failure on operator performance.

Footnotes

Acknowledgements

The projects described in this paper were supported by three NASA grants: NNX07AT79A (ADAT), NNX09AM81G (MIDAS-FAST), and NNX12AE69G (S-PRINT). The authors would like to thank the NASA project sponsors and our fellow researchers. In addition, we thank the many subject matter experts who contributed to each of these projects. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of NASA.

Angelia Sebok is a principal human factors engineer and program manager at Alion Science and Technology. She earned her MS degree in industrial and systems engineering from Virginia Tech in 1991, and she is a Certified Human Factors Professional (CHFP).

Christopher D. Wickens is a professor emeritus of aviation and psychology at the University of Illinois. He is currently a consulting senior scientist for Alion Science and Technology and a researcher and visiting professor at Colorado State University.

References

Allender

(2000). Modeling human performance: Impacting system design, performance, and cost. In Chinni

(Ed.), Proceedings of the Military, Government and Aerospace Simulation Symposium, 2000 Advanced Simulation Technologies Conference (pp. 139–144). San Diego, CA: Society for Computer Simulation International.

Bahner

J. E.

Hüper

Manzey

(2008). Misuse of automated decision aids. International Journal of Human–Computer Studies, 66, 688–699.

Boehm-Davis

D. A.

Holt

R. W.

Diez

Hansberger

J. T.

(2002). Developing and validating cockpit interventions based on cognitive modeling. In Gray

W. D.

Schunn

C. D.

(Eds.), Proceedings of the Twenty-Fourth Annual Conference of the Cognitive Science (p. 27). New York, NY: Psychology Press.

Boorman

D. J.

Mumaw

R. J.

(2004, September/October). A new autoflight/FMS interface: Guiding design principles. Paper presented at the International Conference on Human-Computer Interaction in Aeronautics, Toulouse, France.

Burns

C. M.

Skraaning

Jamieson

G. A.

Lau

Kwok

Welch

Andresen

(2008). Evaluation of ecological interface design for nuclear process control: Situation awareness effects. Human Factors, 50, 663–679.

Byrne

(2013). Cognitive models. In Lee

Kirlik

(Eds.), The Oxford handbook of cognitive engineering (pp. 415–423). Oxford, UK: Oxford University Press.

Casner

S. M.

Geven

R. W.

Recker

M. P.

Schooler

J. W.

(2014). The retention of manual flying skills in the automated cockpit. Human Factors, 56, 1506–1516.

Degani

(2004). Taming Hal: Designing interfaces beyond 2001. New York, NY: Palgrave Macmillan.

Degani

Heymann

Shafto

(2013). Modeling and formal analysis of human–machine interaction. In Lee

Kirlik

(Eds.), The Oxford handbook of cognitive engineering (pp. 433–448). Oxford, UK: Oxford University Press.

10.

Endsley

M. R.

Kiris

E. O.

(1995). The out-of-the-loop performance problem level of control in automation. Human Factors, 37, 387–394.

11.

Fennell

Sherry

Roberts

R. J.

Jr. Feary

(2006). Difficult access: The impact of recall steps on flight management system errors. International Journal of Aviation Psychology, 16, 175–196.

12.

Ferris

Wickens

C. D.

Sarter

(2010). Cockpit automation: Still struggling to catch up . . . In Salas

Maurino

(Eds.), Human factors in aviation (2nd ed., pp. 479–502). New York, NY: Elsevier.

13.

Foyle

Hooey

(2008). Human performance models in aviation. Boca Raton, FL: Taylor & Francis.

14.

Funk

Lyall

Wilson

Vint

Niemczyk

Suroteguh

Owen

(1999). Flight deck automation issues. International Journal of Aviation Psychology, 9, 109–123.

15.

Gacy

A. M.

Wickens

C. D.

Sebok

Gore

B. F.

Hooey

B. L.

(2011). Modeling operator performance and cognition in robotic missions. In Proceedings of the Human Factors and Ergonomics Society 55th Annual Meeting (pp. 861–865). Santa Monica, CA: Human Factors and Ergonomics Society.

16.

Garg

A. X.

Adhikari

N. K.

McDonald

Rosas-Arellano

M. P.

Devereaux

P. J.

Beyene

Sam

Haynes

R. B.

(2005). Effect of computerized decision support systems on practitioner performance and patient outcome. Journal of the American Medical Association, 293, 1223–1238.

17.

Gore

B. F.

(2010). Man–Machine Integration Design And Analysis System (MIDAS) v5: Augmentations, motivations, and directions for aeronautics applications. In Cacciabu

P. C.

Hjalmdahl

Lüdtke

Riccioli

(Eds.), Human modelling in assisted transportation (pp. 43–54). Heidelberg, Germany: Springer.

18.

Gray

(2007) Integrated models of cognitive systems. Oxford, UK: Oxford University Press.

19.

Halford

G. S.

Baker

McCredden

J. E.

Bain

J. D.

(2005). How many variables can humans process? Psychological Science, 16, 70–76.

20.

Hegerth

Lorenz

Villmek

Krems

(2016). Keep your scanners peeled: Gaze behavior as a measure of automation trust during highly automated driving. Human Factors, 58, 509–519.

21.

Hoff

Bashir

(2015) Trust in automation integrating empirical evidence on factors that influence trust. Human Factors, 57, 407–434.

22.

John

B. E.

Blackmon

M. H.

Polson

P. G.

Fennell

Teo

(2009). Rapid theory prototyping: An example of an aviation task. In Proceedings of the Human Factors and Ergonomics Society 53rd Annual Meeting (pp. 794–798). Santa Monica, CA: Human Factors and Ergonomics Society.

23.

Kaber

D. B.

Endsley

M. R.

(2004). The effects of level of automation and adaptive automation on human performance, situation awareness and workload in a dynamic control task. Theoretical Issues in Ergonomics Science, 5, 113–153.

24.

Laughery

K. R.

Jr. Lebiere

Archer

(2006). Modeling human performance in complex systems. In Salvendy

(Ed.), Handbook of human factors and ergonomics (3rd ed., pp. 967–996). Hoboken, NJ: Wiley.

25.

Wickens

Sarter

Sebok

(2014). Stages and levels of automation in support of space teleoperations. Human Factors, 56, 1050–1061.

26.

Lüdtke

Osterloh

J.-P.

(2010, November). Modeling memory effects in the operation of advanced flight management systems. Paper presented at the Human Computer Interaction Aero Conference 2010, Cape Canaveral, FL.

27.

Lüdtke

Osterloh

J.-P.

Frische

(2012, February). Multi-criteria evaluation of aircraft cockpit systems by model-based simulation of pilot performance. Paper presented at the Embedded Real Time Software and Systems Conference, Toulouse, France.

28.

Lüdtke

Osterloh

J.-P.

Mioch

Rister

Looije

(2009). Cognitive modelling of pilot errors and error recovery in flight management tasks. In Proceedings of 7th IFIP WG 13.5 Working Conference, HESSD 2009 (pp. 54–67). Berlin, Germany; Springer.

29.

Manzey

Bleil

Bahner-Heyne

J. E.

Klostermann

Onnasch

Reichenbach

Rottger

(2008). Auto-CAMS 2.0 manual. Retrieved from http://www.aio.tu-berlin.de/?id=30492

30.

Mercado

Rupp

Chen

Barnes

Barber

Procci

(2016). Intelligent agent transparency in human-agent teaming for multi-UxV management. Human Factors, 58, 401–415.

31.

Molloy

Parasuraman

(1996). Monitoring an automated system for a single failure: Vigilance and task complexity effects. Human Factors, 38, 311–322.

32.

Moray

Inagaki

(2000). Attention and complacency. Theoretical Issues in Ergonomics Science, 1, 354–365.

33.

Morrow

D. G.

Wickens

C. D.

North

(2006). Reducing and mitigating human error in medicine. In Nickerson

R. S.

(Ed.), Annual review of human factors and ergonomics (Vol. 1, pp. 254–296). Santa Monica, CA: Human Factors and Ergonomics Society.

34.

Mosier

K. L.

Skitka

L. J.

Heers

Burdick

(1998). Automation bias: Decision-making and performance in high-tech cockpits. International Journal of Aviation Psychology, 8, 47–63.

35.

Mumaw

Boorman

D. J.

Prada

R. L.

(2006, September). Experimental evaluation of a new autoflight interface. Paper presented at HCI-Aero 2006, International Conference on Human Computer Interaction, Seattle, WA.

36.

Nikolic

Orr

Sarter

(2004). Why pilots miss the green box: How display context undermines attention capture. International Journal of Aviation Psychology, 14, 39–52.

37.

Olson

W. A.

Sarter

N. B.

(2001). Management-by-consent in human–machine systems: When and why it breaks down. Human Factors, 43, 255–266.

38.

Onnasch

Wickens

C. D.

Manzey

(2014). Human performance consequences of stages and levels of automation: An integrated meta-analysis. Human Factors, 56, 476–488.

39.

Parasuraman

Manzey

(2010). Complacency and bias in human use of automation: A review and attentional synthesis. Human Factors, 52, 381–410.

40.

Parasuraman

Sheridan

T. B.

Wickens

C. D.

(2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics–Part A: Systems and Humans, 30, 286–297.

41.

Parasuraman

Sheridan

T. B.

Wickens

C. D.

(2008). Situation awareness, mental workload, and trust in automation: Viable, empirically supported cognitive engineering constructs. Journal of Cognitive Engineering and Decision Making, 2, 141–161.

42.

Pew

Mavor

(1997). Modeling cognitive and organizational performance. Washington, DC: National Academy Press.

43.

Prevot

Homola

J. R.

Martin

L. H.

Mercer

J. S.

Cabrall

C. D.

(2012). Toward automated air traffic control: Investigating a fundamental paradigm shift in human/systems interaction. International Journal of Human–Computer Interaction, 28, 77–98.

44.

Pritchett

(2009). Aviation automation: General perspectives and specific guidance for the design of modes and alerts. In Durso

(Ed.), Reviews of human factors and ergonomics (Vol. 5, pp. 82–113). Santa Monica, CA: Human Factors and Ergonomics Society.

45.

Rensink

R. A.

(2002). Change detection. Annual Review of Psychology, 53, 245–277.

46.

Rovira

McGarry

Parasuraman

(2007). Effects of imperfect automation on decision making in a simulated command and control task. Human Factors, 49, 76–87.

47.

Sarter

N. B.

Mumaw

R. J.

Wickens

C. D.

(2007). Pilots’ monitoring strategies and performance on highly automated flight decks: An empirical study combining behavioral and eye tracking data. Human Factors, 49, 347–357.

48.

Sarter

N. B.

Woods

D. D.

(1992). Pilot interaction with cockpit automation: Operational experiences with the flight management system (FMS). International Journal of Aviation Psychology, 2, 303–321.

49.

Sarter

N. B.

Woods

D. D.

(1994). Pilot interaction with cockpit automation II: An experimental study of pilots’ models and awareness of the flight management system. International Journal of Aviation Psychology, 4, 1–28.

50.

Sarter

N. B.

Woods

(2000). Team play with a powerful and independent agent: A full-mission simulation study. Human Factors, 42, 390–402.

51.

Sauer

Chavaillaz

Wastell

(2015). Experience of automation failures in training: Effects of trust, automation bias, complacency, and performance. Ergonomics, 59, 767–780.

52.

Schaefer

Chen

Szaslmas

Hancock

(2016) A meta-analysis of factors influencing the development of trust in automation. Human Factors, 58, 377–400.

53.

Sebok

Wickens

Sarter

Gore

Hooey

Gacy

Brehon

(2013). Space human factors and habitability MIDAS-FAST: Development and validation of a tool to support function allocation. Final report provided to the NASA Human Research Program under Grant NNX09AM81G.

54.

Sebok

Wickens

Sarter

Quesada

Socash

Anthony

(2012). The Automation Design Advisor Tool (ADAT): Development and validation of a model-based tool to support flight deck automation design for NextGen operations. Human Factors and Ergonomics in Manufacturing and Service Industries, 22, 378–394.

55.

Sheridan

Parasuraman

(2000). Human versus automation responding to failures: An expected value analysis. Human Factors, 42, 403–407.

56.

Sheridan

T. B.

Verplank

W. L.

(1979). Human and computer control of undersea teleoperators. Cambridge, MA: MIT.

57.

Sherry

Feary

Polson

Fennell

(2003). Drinking from the fire hose: Why the flight management system can be hard to train and difficult to use (NASA Tech. Rep. NASA/TM-2004-212274). Washington, DC: NASA.

58.

Steelman

K. S.

McCarley

J. S.

Wickens

C. D.

(2011). Modeling the control of attention in visual workspaces. Human Factors, 53, 142–153.

59.

Taleb

(2007). The black swan: The impact of the highly improbable. New York, NY: Random House.

60.

Trapsilawati

Wickens

C. D.

Chen

C.-H.

(2015). Factors assessment of conflict resolution aid reliability and time pressure in future air traffic control. Ergonomics, 58, 897–908.

61.

Wickens

C. D.

(2015). Noticing events in the visual workplace: The SEEV and NSEEV models. In Hoffman

R. R.

Hancock

P. A.

Scerbo

M. W.

Parasuraman

Szalma

J. L.

(Eds.), Cambridge handbook of applied perception research (pp. 749–768). New York: Cambridge.

62.

Wickens

C. D.

Alexander

A. L.

(2009). Attentional tunneling and task management in synthetic vision displays. International Journal of Aviation Psychology, 19, 182–189.

63.

Wickens

C. D.

Clegg

Vieane

Sebok

Janes

(2015). Visual attention allocation between robotic arm and environmental process control: Validating the STOM task switching model. In Proceedings of the Human Factors and Ergonomics Society 59th Annual Meeting (pp. 617–621). Santa Monica, CA: Human Factors and Ergonomics Society.

64.

Wickens

Hooey

Gore

Sebok

Koenecke

(2009). Identifying black swans in NextGen: Predicting human performance in off-nominal conditions. Human Factors, 51, 638–651.

65.

Wickens

C. D.

Hutchins

Carolan

Cumming

(2013). Effectiveness of part-task training and increasing-difficulty training strategies: A meta-analysis approach. Human Factors, 55, 461–470.

66.

Wickens

C. D.

Hutchins

S. D.

Laux

Sebok

(2015). The impact of sleep disruption on complex cognitive tasks: A meta-analysis. Human Factors, 57, 930–946.

67.

Wickens

C. D.

Mavor

A. S.

Parasuraman

McGee

J. P.

(Eds.). (1998). The future of air traffic control: Human operators and automation. Washington, DC: National Academy Press.

68.

Wickens

C. D.

McCarley

J. M.

(2008). Applied attention theory. Boca Raton, FL: CRC Press.

69.

Wickens

C. D.

Rice

Keller

Hutchins

Hughes

Clayton

(2009). False alerts in the air traffic control conflict alerting system: Is there a “cry wolf “ effect ? Human Factors, 51, 446–462.

70.

Wickens

C. D.

Sebok

(2014). Flight deck models of workload and multitasking: An overview of validation. In Vidulich

M. A.

Tsang

P. S.

Flach

J. M.

(Eds.), Advances in aviation psychology (Vol. 1, pp. 69–84). Burlington, VT: Ashgate.

71.

Wickens

C. D.

Sebok

Keller

Peters

Small

Hutchins

Algarín

Gore

B. F.

Hooey

B. L.

Foyle

D. C.

(2013). Modeling and evaluating pilot performance in NextGen: Review of and recommendations regarding pilot modeling efforts, architectures, and validation studies (NASA/TM 2013–216504). Washington, DC: NASA.

72.

Wickens

C. D.

Sebok

Sarter

Gacy

(2015). Using modeling and simulation to predict operator performance and automation-induced complacency with robotic automation: A case study and empirical validation. Human Factors, 57, 959–975.

Implementing Lumberjacks and Black Swans Into Model-Based Tools to Support Human–Automation Interaction

Abstract

Objective:

Background:

Method:

Results:

Conclusion:

Application:

Keywords

Introduction

Modeling

Automation Design Advisor Tool (ADAT)

Man–Machine Integration Design Analysis System–Function Allocation Support Tool (MIDAS-FAST)

Space Performance Research Integration Tool (S-PRINT)

Discussion and Conclusions

Key Points

Footnotes

Acknowledgements

References