Abstract
The concept of the “lumberjack effect” addresses important concerns regarding the role of automation (and impact of automation failure) in complex engineering systems such as aviation, power plant control rooms, and others. Conflicts between “narrow” and “broad” definitions of automation failures uncover logical inconsistencies regarding human-automation interaction processes in maintaining system performance. However, the “lumberjack effect” debate obscures the essential consideration that automation tools exist as components in complex systems with multiple levels of abstraction hierarchies (Rasmussen). A sole focus on automation failure conflates the automation components with the overall system function. Elaborating operating conditions associated with human-automation interaction conflicts and failures supports an improved approach to failure mode and effects analysis techniques for improving system safety. Detecting system failure modes and operating conditions, poorly quantified in other risk priority determinations, is an important contribution at the overall system analysis level, not as an isolated emphasis on the automation itself.
Introduction
Skraaning and Jamieson’s paper (SJ: (Skraaning Jr & Jamieson, 2023)) addressing “narrow” and “broad” views of automation failure provides an important set of insights, and exposes some troubling concerns, regarding cognitive human factors engineering approaches to human-automation interaction (HAI). An ongoing debate regarding these aspects of automation failure centers around the metaphoric “lumberjack effect”: with more complex and higher levels of automation rather than human task allocation (the “taller the trees”), the impacts of automation failure lead to more consequential and catastrophic outcomes (the “harder they fall”) (Hauptman & McNeese, 2022).
One potentially problematic element of this “lumberjack effect” debate is whether the focus on “automation failure” is the appropriate level of analysis and critique for these elements of system design, analysis, and intervention. This is not to say that automation failures do not occur, or that they are not problematic in their effects on complex systems operations. Further, there are many aspects of complex systems operations where elimination of automation, or restrictions of automated system functions to purely human-controlled teleoperations, would render these systems completely inoperable. Consider, as an exemplar instance, the performance of automated systems in spaceflight exploration and science missions (Caldwell, 2023). Earth-based distributed supervisory team coordination and information flow in human spaceflight operations (let alone fully autonomous and robotic missions) fundamentally requires multiple automated systems to perform accurately and appropriately, and to effectively provide information to support distributed human expertise, for such missions to be accomplished at all (Caldwell, 2005, 2005b; Caldwell et al., 2007; Caldwell & Onken, 2012).
Spaceflight exploration quintessentially involves extending human awareness, information flow, and task performance into previously unknown domains. Humans do not have the capability to perform these extensions without the benefit of complex automation systems operating in novel and unexpected conditions. The debate about the lumberjack effect has its origins here, in the design phase of the automation systems themselves, and may be laid at the feet of the (human) designers of those systems. Engineering systems operating in novel conditions cannot be designed to be infinitely robust, or capable of anticipating all possible operating conditions (Caldwell, 2014). An automation system with a fixed and limited level of data processing, component response capability, and performance reliability is necessarily limited in its operational range over time. Operating envelopes regarding system component designs and performance capabilities begin with the system-as-designed, with any associated assumptions of potential risks of failure and mechanisms for how to respond to them during the performance of the system-as-implemented (Liu et al., 2013; Sharma & Srivastava, 2018).
An explicit aspect of spaceflight event/fault detection, isolation, and recovery (EDIR/FDIR) anomaly resolution is that engineering components, sensors reporting the state and function of those components, telemetry information flows between components and monitoring systems, and supervisory control interfaces to those monitoring systems can all be sources of a reported anomaly and need to be assessed for correct functioning (Caldwell et al., 2007; Onken & Caldwell, 2009, 2011). Such risk analysis approaches are intended to examine system operating capabilities and responses to potential failures throughout the system development and operational cycles from initial design through operational life and into final retirement (Buede, 2000; Fiksel, 2003; Maier, 1998; NASA, 2007; Suranto, 2015). Automation designer hubris, not the automation system itself, is responsible for any designs that require an automation system to autonomously, completely, and independently function in all combinations of dynamic and unaddressed environmental conditions or system configurations in any complex task setting. In this sense, the height of the tree is not the level of the automation itself, but the hubris of the design engineer and the brittleness of the resulting design to respond to dynamic operating challenges. This issue of system design and risk analysis will be addressed again later in this response.
Logical Versus Empirical Disconfirmation
Two types of hypothesis testing and knowledge advancement are common in the sciences and engineering. Most work in areas of cognitive psychology, human factors engineering, and other branches of engineering operates on the basis of empirical disconfirmation: we set up a hypothesis, collect data, and try to determine if the hypothesis is likely supported by the data collected. Logical disconfirmation can be a much stronger form of knowledge advancement. If we propose an explanation of the world, and it can be logically shown to be inconsistent with or contradictory to other explanations about the same phenomena at the same scale of analysis, we know we have a significant problem where some component of the explanations must be false. This is a fascinating, and potentially powerful, aspect of the SJ discussion. Both the “narrow” and “broad” definitions of automation failure presented by Wickens and critiqued by Skraaning and Jamieson have very concerning logical weaknesses. A broad (liberal) definition of automation failure, which suggests that any misalignment of the operator’s mental model of automated system performance with the actual design or implementation of the automation qualifies as automation failure (SJ, second manuscript page, second column), is exceptionally suspect. The very process of human learning can be defined as a refinement of cognitive models and associations of condition, cause, and effect through increased task-relevant experience. By definition, the “mistakes” class of human error analysis represents conditions where the operator performs an action based on an incorrect understanding of system state or the effect of that action; errors of omission or commission can also result from operator misunderstandings or lack of knowledge about the true system state (Reason, 1988, 1990; Swain, 1990).
A novice student pilot’s failure to understand how an aircraft autopilot operates, or the difference between descent rate versus angle, is not a failure of the autopilot automation. Even an experienced pilot may be exposed to a new automation technology for the first time, and use an incorrect cognitive model based on how previous technologies have functioned in their prior experience. This would not be seen as automation failure, but negative transfer of training (Grossman & Salas, 2011; Liu et al., 2008).
Similarly, a very narrow (conservative) definition of automation failure as interpreted (SJ, second manuscript page, first column) seeks to draw hard distinctions between the actual task operations and the “support” features of those task operations (such as information presentation, state projection, or decision support options). This type of component distinction seems to reject the design and implementation of human supervisory control systems and any version of the Sheridan “level of automation” hierarchy supporting system interfaces between human interactive systems and task interactive systems, that is, levels 2–9 on the 1–10 scale (Sheridan, 1987, 1992; Sheridan, 2011; Sheridan et al., 1978). It becomes a logical impossibility to suggest that automation components without their related support and information flow systems represent the full automation system unless and until the engineering system as a whole functions continuously at a Sheridan level 10 automation (i.e., no human input or interaction at all during system operations).
Such results imply that neither the narrow nor the broad definitions of automation failure as discussed are useful in our study of complex human-automation interactions (HAI). A fundamental defect in the lumberjack effect debate is the confusion of what and how the automation being addressed relates to the system as it operates, and the functions that such a system is designed, or is in reality capable, to perform in order to support desired goals.
The Automation is Not the System
As described above, automation is an essential contributor to spaceflight exploration, whether or not the humans themselves are in space (Caldwell, 2023). However essential it may be, the purpose of spaceflight automation ranging from German clock drives to on-board state determination and error correction for radio telescope observatories is not just the creation or operation of the automation itself. In other words, the spaceflight automation is not the entire spaceflight mission, nor does it exist as a standalone entity decontextualized from the purpose of the system. Here, the analysis of any complex system’s automation components must be viewed in terms of the abstraction hierarchy of purpose, function, and physical form, as introduced by Rasmussen (Rasmussen, 1985).
Conceptually, the allocation of functions to automation has long been seen as a potentially valuable method for increasing system safety and reliability. For many decades since Fitts, the question has always been how, where, and when to allocate those functions (Fitts et al., 1951; Fitts & Posner, 1967). Rasmussen’s emphasis on abstraction hierarchies and ecological displays of system operations has been a specific focus on increasing both the transparency of the system’s operation and the elaboration of “maps” of the linkages of the system’s physical forms and component interactions to the purpose and function of the engineering system and its operations in the world (Lind, 2003; Rasmussen, 1985, 1988; Sheridan, 2017; Vicente & Rasmussen, 1992). The power plant examples at the basis of the SJ work directly relate to the abstraction hierarchy work performed by Rasmussen and others (Lind, 2003; Rasmussen, 1988; Reising, 2000; Vicente & Rasmussen, 1992). There is a very substantial problem, though, in confusing the performance of a specific version (“physical form”) of an automation implementation with the purpose or abstract function of an automation subsystem of a complex engineering system (such as a power plant, aircraft, or spacecraft). Further, the strict restriction on which inputs, outputs, and processes are within scope (as implied by the narrow definition of automation failure) is even more problematic in relating a single specific form to a general abstract function of automation.
Unfortunately, this discussion of the abstraction hierarchy of specific automation version implementations and their effects on system operations seems to be missing (at least to this author). In fact, there is a broader consideration of how crucial terms are used in order to support such analyses. It seems without question that the purpose of human factors engineering analysis and research on HAI is to improve the overall reliability of complex, high risk systems that utilize complex HAI, not simply to decompose and point-optimize specific automation components. The distinction of system analysis and component function, however, can be elaborated in the differences in terminology use regarding the concept of “mode” in the SJ and Wickens work, compared to other treatments of systems engineering and risk analysis.
Modes and Modes
A text search of the word “mode” in SJ (Skraaning Jr & Jamieson, 2023) helps to illuminate one of the concerns of an excessively limited view of the discipline when addressing system failure (especially when compared to the authors’ considerations of automation failure). Four instances of the term exist in the paper (with a fifth in the reference section, within the title of the Sarter and Woods paper (Sarter & Woods, 1995) on “mode error” in supervisory control systems). Each of these four in-text uses addresses an aspect of “mode” in the sense of configurations, programs or settings intended to execute a particular type of aviation function (autopilot, final approach flare, or throttle position hold). Sarter’s later work on the topic (Sarter, 2008) indicates that “[m]ode awareness refers to an operator’s knowledge and understanding of the current and future automation configuration, including its status, targets, and behavior” [pg. 506], implying that the aviation “mode” of interest is the functional configuration of the automation-as-implemented (whether or not it faithfully executes the capabilities of automation-as-designed).
It is surprising, then, that a paper addressing fatal accidents in the aviation context does not have any reference to the rich historical tradition, born and developed in the aerospace industry, of risk evaluation or risk assessment known as failure modes and effects analysis (FMEA) (Liu et al., 2013; Sharma & Srivastava, 2018). In the common FMEA usage, “mode” is not simply a particular automation configuration, but an overall combination of features, functions, and conditions that can or will (or in a post-hoc analysis, did) lead to an overall system failure (not simply of a single component). Failure mode in this sense is more analogous to the overall pattern of holes allowing a situation to progress to a catastrophe in Reason’s “Swiss cheese” model of error (Reason, 1988, 1990), not the characteristics or locations of specific holes.
Also lost in the SJ discussion of modes is a broader consideration of causal and contributing factors in the determination of (human) error and system failures. Many error causation models have been developed since the original Swain conceptualizations (Swain, 1990); major weaknesses in a number of these models include excessive emphasis on a single source of fault, or a single contributing causal factor. The second issue, of single versus multiple causal factors, is addressed in a discussion of the “pinball versus pachinko” model of error causation analysis (Caldwell, 2008; Thomadsen et al., 2003). Is any automation failure addressed in the lumberjack model cases really just a single-cause failure, or a more complex set of interactions of components (including humans), interfaces, and understanding of context (including that of designers)?
Toward a Discussion of Contexts and Risk Impacts of Human-Automation Interactions
Despite these critiques, this author sees a potentially significant and novel contribution to the systems safety and FMEA literature arising from the lumberjack effect debate. An important aspect of quantifying the results of an FMEA is the development of “risk priority numbers” (RPNs) addressing probability of the failure mode, severity of the failure, and likelihood of detection of the failure mode as it is progressing (Liu et al., 2013). A frequent critique of RPN calculations is that the determination of failure mode detectability is challenging at best. This is, however, one of the important elements actually provided by the lumberjack effect studies and debate: how well can designers anticipate, or operators and the automation itself detect, the combination of conditions and contexts that represent an impending failure mode (not to be confused with automation mode)? Both simulator studies and retrospective analyses of sentinel incidents (not just fatal accidents, but near misses with operator reports) of the type that SJ and Wickens describe help us to understand where and how our understanding is limited and condition recognition and recovery are impaired.
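As a minimal illustration of the RPN calculation discussed above, the sketch below assumes the conventional multiplicative form, in which severity, occurrence, and detection are each rated on a 1 to 10 ordinal scale and a higher detection rating means the failure mode is harder to detect. The class name, function name, failure mode labels, and ratings are hypothetical and are not drawn from SJ, Wickens, or any cited study; the sketch only shows how a poorly determined detection rating propagates directly into the risk ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One FMEA row; all ratings are ordinal scores from 1 to 10 (assumed scale)."""
    name: str
    severity: int    # 10 = most severe consequence
    occurrence: int  # 10 = most likely to occur
    detection: int   # 10 = least likely to be detected before harm

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-10 scale.
        for label in ("severity", "occurrence", "detection"):
            value = getattr(self, label)
            if not 1 <= value <= 10:
                raise ValueError(f"{label} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Conventional risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda mode: mode.rpn, reverse=True)


# Hypothetical ratings, for illustration only.
hard_to_detect = FailureMode("operator-automation mode confusion", 9, 3, 8)
easy_to_detect = FailureMode("sensor dropout flagged by telemetry", 6, 4, 2)

for mode in rank_by_rpn([easy_to_detect, hard_to_detect]):
    print(f"{mode.name}: RPN = {mode.rpn}")
```

In this hypothetical pair, the two failure modes have similar severity and occurrence ratings, so the ranking is driven almost entirely by the detection rating, which is the term that the critique above identifies as the hardest to determine.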
The results of such studies and analyses would not assess the strength of a purported lumberjack effect, but something more crucial to overall system safety analysis. In essence, the analysis provides the first steps towards a quantitative determination of the Contextualized Risk Impacts on Safety and Performance of Human-Automation Response Dynamics (“CRISP-HARD”). Such “CRISP-HARD” metrics not only provide important insights to the FMEA RPN determination of failure mode likelihood of detection, but also support processes of troubleshooting during real time EDIR/FDIR anomaly resolution.
Conclusion
The lumberjack effect debate between Skraaning and Jamieson, and Wickens and colleagues, regarding automation failure represents, in this author’s view, a potentially valuable consideration of HAI, but at an incorrect level of systems safety analysis. The specific forms of the automation components for power plants, aerospace missions, or other engineering systems are not to be studied or addressed separately from the lifecycle of task and function allocations of the broader functions and purposes of the systems of which the automation is a part. Our studies of when and how automation capabilities are limited in different performance contexts, and how these limitations are hidden from designers or operators, can provide important contributions to the overall assessment and improvement of system safety. However, these contributions require awareness of the abstraction hierarchy and context of the system and its failure modes, not an isolated discussion of the automation as a singular focus.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
