Ergonomics evaluation modeling of multimodal interaction technology for airplane cockpit and experimental validation

Abstract

Background

With the development of eye movement and voice control, multimodal interaction has become a leading trend for future aircraft cockpits. Yet, most studies either demonstrate its effectiveness broadly or focus on single channels, especially the visual channel. A comprehensive ergonomic evaluation method is still lacking.

Objective

This study proposes and validates a model for evaluating multimodal interaction in aircraft cockpits, addressing workload distribution, channel occupancy, and resource demand.

Methods

Our research proposes an evaluation model for multimodal interaction in aircraft cockpits, based on the analysis of pilots’ information cognitive processing. The model sequentially evaluates the workload of multimodal interaction technology in aircraft cockpit scenarios from the perspectives of task level, channel occupancy, and channel resource demand. Validation experiments were conducted using route planning tasks, assessing the workload for tactile interaction modes and multimodal interaction modes.

Results

Results demonstrated the effectiveness of the proposed ergonomics evaluation model, as correlation analysis revealed a significant positive relationship between the model outcomes and subjective workload scores under both tactile and multimodal conditions. While multimodal interaction incurred significantly higher task time and subjective workload compared to tactile interaction, it also enabled more balanced workload distribution across channels, suggesting potential advantages for managing complex multi-channel tasks in future cockpit operations.

Conclusions

The validation experiments confirmed the effectiveness of our ergonomics evaluation model for multimodal interaction in airplane cockpits. It offers a practical tool for early cockpit design by identifying channel conflicts and workload distribution issues before prototype development, thus supporting safer and more efficient multimodal systems.

Keywords

man-machine systems, workload, multimodal interaction technology, multiple resource theory, hierarchical task analysis, VACP scale, ergonomics evaluation model, workload assessment

Introduction

Multimodal interaction technology has become the main development trend for future aircraft cockpit interaction. From the perspective of natural interaction, human perceptual channels operate in parallel; multimodal interaction aims to reasonably allocate and utilize non-visual channels to enhance interaction naturalness and reliability. Traditional cockpit interaction relies primarily on tactile control and visual perception. While these conventional methods limit resource channel conflicts, they often lead to perceptual overload due to the heavy reliance on a single sensory modality. In contrast, multimodal interaction employs resources such as auditory, visual, and tactile—using voice, eye-movement, and touch controls—to manage systems. This approach has been proven to enhance information transmission efficiency, reduce cognitive load, lower task errors, and improve fault tolerance.¹ However, the simultaneous use of multiple channels might result in conflicts between interaction tasks and resource channels without proper design, ultimately compromising performance. Therefore, developing evaluation methods for multimodal interaction during the design phase is essential for understanding channel utilization and potential conflicts, thereby contributing to improvements in cockpit design.

Previous studies have demonstrated that modifying or increasing interaction modalities can enhance task performance. Cohen et al. verified that voice input is more efficient and natural than keyboard input.² In map navigation, Oviatt proposed that integrating multiple channels effectively compensates for the limitations of single-modality voice interaction.³ Van Erp et al. confirmed that in weightless environments, tactile information suffers less loss compared to visual input.⁴ Wilkins and Acton found tactile feedback minimally affected in high-noise environments,⁵ while Jansen et al. showed that tactile cues reduce pilots’ visual workload in flight.⁶ Shree DV et al. demonstrated that gaze-controlled interfaces significantly increase reaction speed compared to touchscreens and joysticks.⁷ Rakkolainen et al. summarized multimodal interaction technologies based on human sensory modalities, highlighting technical feasibility and effectiveness.⁸ Moreover, multimodal interaction improves user satisfaction in VR,⁹ reduces system coupling, decreases errors, lowers cognitive load, and balances cognitive resource allocation.¹ Under emergencies, multimodal systems provide faster responses and greater adaptability.¹⁰ Experimental studies further indicate that effective multimodal fusion reduces cognitive load and enhances overall efficiency.^11,12

The current evaluation of human–machine interaction in aircraft cockpits primarily focuses on single elements such as display interfaces, alert systems, and control methods. Wei et al. analyzed cockpit display interfaces from an ergonomics perspective, experimentally assessing cognitive workload.¹³ Similarly, Zeng et al. conducted cognitive experiments on optimized interfaces to evaluate the relationship between design factors and task performance,¹⁴ thereby providing theoretical foundations and design implications for display interfaces. Furthermore, Causse, Behrend and Mumaw performed eye-tracking experiments with 20 pilots during flight tasks to analyze gaze fixation distributions in monitoring tasks, thereby providing a theoretical foundation for layout design.¹⁵ In addition, Tippey et al. compared pilot responses to various alert modalities, demonstrating that tactile alerts yield the highest perceptual efficiency.¹⁶ Strickland, Pioro and Ntuen investigated the impact of cockpit instrument design on visual fatigue in flight,¹⁷ suggesting that rational design can alleviate fatigue and enhance work efficiency. Similarly, Schreuder et al. employed an eye tracker to assess driver fatigue, which provided a physiological basis for improving in-vehicle alarm designs.¹⁸ Lastly, Aloise et al. examined visual attention toward targets using an eye tracker, analyzing the optimal areas for interface design elements and the optimal forms of interface layout.¹⁹

In addition to these single-modality approaches, many evaluations rely on prototype-based experiments and conduct overall assessments, but lack decomposition across individual channels. Traditional methods include subjective evaluations (e.g., questionnaires, interviews, observation records),^20,21 objective measures such as task performance and log analysis,^22,23 and physiological indices such as eye movement and EEG.^24,25 These assessments typically use interaction efficiency, user experience, adaptability and many other measurements as indicators.^26,27 For example, Li et al. proposed a user experience evaluation method combining eye-tracking, finger movement, and facial expressions, and further developed an MLP-based affect prediction method from multimodal data.²⁸ Triantafyllidis et al. compared multimodal combinations to assess cognitive workload and usability.²⁹ Aloise et al. employed subjective scales to evaluate flight performance, human–machine efficiency, and comfort during pilot operations, thereby identifying main factors influencing mental workload and effectiveness.¹⁹ Li et al. used subjective measurement of situational awareness to reorganize and reconstruct information presentation according to pilots’ cognitive models, resulting in an improved design for system interfaces.³⁰ AuerStefan et al. compared the effectiveness of various feedback modalities, including auditory and tactile cues, within a VR simulator.³¹ Wang et al. examined the feasibility of replacing conventional side sticks or control yokes with touchscreens for aircraft manipulation; their findings revealed that pilots’ performance, system usability, and situational awareness were significantly lower with touchscreen interfaces, which indicated that the future of manipulation still needs to be explored and improved.³² Furthermore, Xin et al. combined subjective evaluations with eye-tracking data in flight tests to assess AR-HUD displays, demonstrating high situational awareness and low workload under emergency conditions, thereby confirming the validity of the new interface.³³ At the conceptual stage, Dong and Liu developed a multimodal evaluation framework for design concepts, establishing multimodal expression mechanisms and multisensory collaboration principles.³⁴ They proposed methods to convert scattered conceptual views into perceivable multimodal scenarios and applied semantic differential methods for early-stage user experience evaluation.

Although existing methods have advanced the evaluation of cockpit interaction, they remain limited in several aspects. Prototype- and simulation-based assessments cannot fully identify potential channel conflicts during the early design stage. These approaches often measure overall workload or satisfaction but provide limited insight into how specific resource channels are utilized or compete, making it difficult to determine clear directions for improvement. In summary, while prior research has demonstrated both the feasibility and benefits of multimodal interaction, extant evaluation methods mainly rely on single technologies or experiment-based comparisons, lacking systematic mechanisms to assess conflicts and channel occupancy. Given the increasing demand for multimodal cockpit interaction and the potential risk of task–channel conflicts, a comprehensive evaluation framework is urgently needed during the design phase. To address this gap, this study employs multiple resource theory³⁵ and Hierarchical Task Analysis (HTA)³⁶ to analyze pilots’ information processing. Based on this, an evaluation model comprising “Task Conflict Evaluation—Channel Conflict Evaluation—Channel Load Evaluation” is developed and preliminarily validated through a route-planning simulation experiment.

Methodology

Ergonomics evaluation modeling of multimodal interaction technology for airplane cockpit

Analysis of pilot information cognitive processing

Multiple Resource Theory explains the allocation of resources among concurrent tasks and the relationship between workload and task difficulty.³⁵ Multiple Resource Theory categorizes multitasking operations into two primary types: (1) simultaneous operation of tasks within the same channel and (2) simultaneous operation of tasks across different channels. Based on the understanding of pilots’ operational tasks, it is assumed that: (1) when tasks within the same channel are performed simultaneously, conflicts arise; the overall task workload is greater than or equal to the sum of the individual task workloads, and the operation time exceeds the sum of the individual task operation times (as individuals typically convert such tasks into sequential processing); (2) when tasks across different channels are executed simultaneously, they can be processed in parallel; however, the combined task workload is greater than that of any single task yet less than the sum of the individual workloads, and the overall operation time is similarly greater than any single task but less than the sum of their individual times. This theory is typically characterized by four channels: visual, auditory, cognition, and perception. It provides a framework for evaluating the rationality of multimodal interaction technologies in aircraft cockpits by analyzing the occupancy of resource channels. By examining the utilization of each channel, one can assess channel occupancy under multimodal interaction, thereby evaluating the appropriateness of the design and integration of these systems. Based on this rationale, we first analyzed the information processing of pilots.
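The two assumptions stated in this paragraph can be written compactly. As a sketch, let $W_{A}$, $W_{B}$ and $T_{A}$, $T_{B}$ denote the workloads and operation times of two tasks performed alone, and $W_{AB}$, $T_{AB}$ those of the same two tasks performed together; these symbols are introduced here for illustration only and are not part of the original notation.

\begin{aligned} \text{same channel:} \quad & W_{AB} \geq W_{A} + W_{B}, \qquad T_{AB} > T_{A} + T_{B} \\ \text{different channels:} \quad & \max(W_{A}, W_{B}) < W_{AB} < W_{A} + W_{B}, \qquad \max(T_{A}, T_{B}) < T_{AB} < T_{A} + T_{B} \end{aligned}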

The pilot's cognitive processing of information (Figure 1) begins with visual, auditory, and somatosensory input and ends with output through visual, vocal, gesture, and tactile control. Sound information enters through the ears into the auditory channel, visual information through the eyes into the visual channel, and somatosensory information through the sense of touch into the tactile channel. Human cognition selectively processes perceived information, which then enters working memory.³⁷ While processing information, working memory updates and mobilizes long-term memory to process information from the different perceptual channels.³⁷ Each channel, while processing its own information, can also be integrated with other channels, as in “visual-auditory integration,” “tactile-auditory integration,” and “tactile-visual integration.”³⁸ Through the comprehensive integration of information from different channels, the pilot makes decisions and proceeds to operation, selecting the appropriate control method to manipulate the aircraft.

Figure 1.

Pilot information process integrating multimodal human-machine interaction technology modes.

The main difference in information processing between multimodal interaction technologies and traditional cockpits lies in channel allocation. In conventional aircraft cockpits, control actions are primarily executed via hand-operated control sticks/joysticks, throttle levers, and instrument panel buttons, with limited use of foot controls. These actions mainly engage the behavioral channel and therefore cause minimal conflict with the information input channels. In contrast, under a multimodal interaction mode, eye movement control, voice control, and gesture/tactile control concurrently occupy the visual, behavioral, and voice channels.

During aircraft operation, there is the possibility that both information input and output might utilize the same channel, or that multiple pieces of information might be transmitted through a single channel. In other words, under a multimodal interaction framework—which involves visual, auditory, cognition, and behavioral channels—there exists the potential for conflicts between information input and output, as well as among multiple information inputs/outputs. Therefore, eliminating channel conflicts at the procedural logic level in the design of multimodal interaction systems is an effective strategy for optimizing integration. Furthermore, identifying and analyzing these conflicts can serve as a method to evaluate the comprehensive effectiveness of multimodal interaction integration.

Comprehensive ergonomics evaluation model for multimodal interaction technology

Flight tasks are inherently complex, involving intricate interfaces and the simultaneous execution of multiple sub-tasks. To evaluate the effectiveness of multimodal interaction technologies in complex scenarios, it is necessary to decompose the overall task. Event decomposition modeling relies on HTA, which provides a comprehensive model of the sub-goal hierarchy within a system; its structure is applicable to various analyses.³⁶ HTA elucidates the sequence, steps, and interrelationships among sub-tasks, thereby facilitating the analysis of task objectives, required knowledge and skills, and necessary resources. One notable advantage of the HTA method is its ability to effectively analyze task structure and execution processes, and to identify the relationships and dependencies among individual steps. Based on this, our study proposes a “Task Conflict Evaluation—Channel Conflict Evaluation—Channel Load Evaluation” model. In this model, the “Task Conflict Evaluation” component assesses the number of tasks that must be completed at time t; the “Channel Conflict Evaluation” component evaluates the degree of conflict in channel occupancy at time t, that is, the amount of information occupying each channel; and the “Channel Load Evaluation” component measures the resource demands for processing information in each channel at time t. A network model of aircraft tasks based on multimodal interaction technologies is constructed, comprising the following sub-steps:

For any operational task M, input the task and derive its task flow based on the cockpit's available multimodal human–machine interaction modalities.

For any task M, extract its task segments in chronological order to form the task segment set for M, which is represented using the following dataset: M = {m₁, m₂, m₃, ……, m_x, ……, m_n};

Each task segment m_x is decomposed into a sequence of operational tasks arranged in order of execution; a task segment may contain one or more operational tasks. Let $B_{jt}$ indicate whether the jth operational task within task segment m_x must be handled at time t ($B_{jt} = 1$ if so);

Each operational task is further decomposed into visual, auditory, cognitive, and motor channels, thereby forming a time series of channel behaviors. The occupancy demand for each channel is considered to be stochastic. For the jth operational task at time t within task segment m_x, let $B_{v j t}$ denote the visual channel occupancy demand, $B_{a j t}$ the auditory channel occupancy demand, $B_{c j t}$ the cognitive channel occupancy demand, and $B_{p j t}$ the motor channel occupancy demand.

Construct a task sequence for the input task M based on its task segments, an operational task sequence derived from the operational tasks, and a behavior element sequence derived from the behavioral factors. Note that both the operational task sequence and the behavior element sequence may contain multiple tasks or behavior elements concurrently. After decomposition, these sequences are organized into a network-like structure.
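The decomposition steps listed above can be sketched as a small nested data structure. This is only an illustrative sketch, not the authors' implementation: the class names (`Operation`, `Segment`, `Task`), the discrete time steps, and the timing of the example operations are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    """One operational task; `channels` lists the channels it occupies."""
    name: str
    start: int       # first time step at which the operation is active
    end: int         # last active time step (inclusive)
    channels: tuple


@dataclass
class Segment:
    """Task segment m_x: operational tasks in order of execution."""
    name: str
    operations: list = field(default_factory=list)


@dataclass
class Task:
    """Operational task M: task segments in chronological order."""
    name: str
    segments: list = field(default_factory=list)

    def active_operations(self, t):
        """Operations from any segment that are active at time step t."""
        return [op for seg in self.segments for op in seg.operations
                if op.start <= t <= op.end]


# Hypothetical timing for a gaze-plus-voice step; the numbers are made up.
step2 = Segment("set formation", [
    Operation("gaze at formation area", start=0, end=2, channels=("visual",)),
    Operation("say 'Formation Pattern 1'", start=1, end=2, channels=("motor",)),
])
route_planning = Task("route planning", [step2])

names_at_t1 = [op.name for op in route_planning.active_operations(1)]
```

Because several operations can be active at the same time step, querying the structure per time step yields the concurrent tasks and channel behaviors that the conflict evaluations operate on.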

Task Conflict Evaluation

N_t represents the number of tasks that the operator needs to handle at time t, evaluated through the decomposition of tasks using HTA.

N_{t} = \sum_{j = 1}^{n} B_{jt}

(1)

where $B_{jt} = 1$ indicates that the jth task needs to be completed at time t.
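As a minimal sketch of Equation (1), the task count can be computed as a column sum over a 0/1 activity matrix. The function name `task_conflict` and the example matrix are hypothetical and serve only to illustrate the summation.

```python
def task_conflict(B):
    """Return N_t for every time step.

    B[j][t] is 1 if task j must be handled at time step t, else 0.
    """
    if not B:
        return []
    n_steps = len(B[0])
    return [sum(row[t] for row in B) for t in range(n_steps)]


# Hypothetical activity matrix: three tasks over four time steps.
B = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
N = task_conflict(B)  # N[1] == 2: two tasks overlap at the second time step
```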

Channel Conflict Evaluation

Channel conflict evaluation assesses the occupancy of each channel based on the decomposition of tasks into various channels.

Visual/Auditory Channel. Nv_t / Na_t represents the visual/auditory channel conflict of the operator under multimodal interaction tasks.

{Nv}_{t} = \sum_{j = 1}^{n} B_{v j t}, \qquad {Na}_{t} = \sum_{j = 1}^{n} B_{a j t}

(2)

where $B_{v j t} = 1$ (respectively $B_{a j t} = 1$) indicates that the visual (respectively auditory) channel is occupied at time t during task j.

Behavior Channel. Np_t represents the behavior channel conflict of the operator under multimodal interaction tasks.

{Np}_{t} = \sum_{j = 1}^{n} (B_{s j t} + B_{h j t})

(3)

where $B_{s j t} = 1$ indicates that the behavior channel is occupied at time t during task j by voice control behavior, and $B_{h j t} = 1$ indicates that it is occupied at time t during task j by hand control behavior.

The Channel Conflict Evaluation Model assumes that, although a pilot may be engaged in multiple tasks at any given moment, the visual, auditory, and behavioral channels can each meet the occupancy requirement of only one task at a time. In this model, visual channel occupancy encompasses both information observation and eye control; auditory channel occupancy involves listening to information; and behavioral channel occupancy includes gesture control, voice control, and tactile control. A channel occupancy demand exceeding one may occur if different tasks require the same channel simultaneously or if a single task demands multiple uses of the same channel. When ${Nv}_{t}$, ${Na}_{t}$, or ${Np}_{t}$ equals 0 or 1, the pilot's demand for that channel at time t does not exceed one unit, implying no conflict. Conversely, when ${Nv}_{t}$, ${Na}_{t}$, or ${Np}_{t}$ exceeds 1, the pilot's demand for that channel exceeds its processing capacity, indicating that the task is infeasible and that the task flow or the multimodal interaction design must be adjusted.
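The conflict check in Equations (2) and (3) can be sketched for a single time step as follows. The function name `channel_conflicts`, the dictionary keys, and the example tasks are hypothetical choices for illustration; only the sums and the "greater than 1" rule come from the model.

```python
def channel_conflicts(tasks_at_t):
    """Return per-channel demand at one time t and the channels in conflict.

    tasks_at_t holds one dict per task j with 0/1 flags:
    "v" visual, "a" auditory, "s" voice control, "h" hand control.
    """
    demand = {
        "visual": sum(task.get("v", 0) for task in tasks_at_t),    # Nv_t
        "auditory": sum(task.get("a", 0) for task in tasks_at_t),  # Na_t
        "behavior": sum(task.get("s", 0) + task.get("h", 0)        # Np_t
                        for task in tasks_at_t),
    }
    # A demand above one unit exceeds the channel's capacity.
    conflicts = [channel for channel, n in demand.items() if n > 1]
    return demand, conflicts


# Hypothetical moment: task 1 combines gaze with a voice command while
# task 2 needs a hand input, so the behavior channel is requested twice.
demand, conflicts = channel_conflicts([{"v": 1, "s": 1}, {"h": 1}])
```

In this made-up case the behavior channel is flagged, which under the model would call for re-sequencing the two tasks or changing the modality of one of them.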

Channel Load Evaluation

W_T represents the workload at moment T, constructed based on the channel behavior after task decomposition. It enables the analysis of the changes in channel resource demands of the operator during task.³⁹

\begin{aligned} W_{T} = & [\sum_{i = 1}^{l} \sum_{t = 1}^{m} a_{t, i}] + [\sum_{i = 1}^{l} c_{i, i} \sum_{t = 1}^{m} a_{t, i} \\ & + \sum_{i = 1}^{l - 1} \sum_{j = i + 1}^{l} c_{i, j} \sum_{t = 1}^{m - 1} \sum_{s = t + 1}^{m} ((a_{t, i} + a_{s, j}) + (a_{t, j} + a_{s, i}))] \end{aligned}

(4)

where W_T represents the load at time T; i and j index the resource channels, with l channels in total; t and s index the operator's tasks or behaviors, with m in total; $a_{t, i}$ represents the occupancy of channel i by task t; $c_{i, i}$ represents the intra-channel conflict coefficient; and $c_{i, j}$ represents the inter-channel conflict coefficient. In addition, if $a_{t, i} = 0$ or $a_{s, j} = 0$, then $(a_{t, i} + a_{s, j}) = 0$; similarly, if $a_{t, j} = 0$ or $a_{s, i} = 0$, then $(a_{t, j} + a_{s, i}) = 0$.

Within the second pair of brackets in Equation (4), the first part calculates the conflict within the same channel, while the second part assesses the conflict across different channels. The formula is designed to capture conflicts that arise when the same channel is occupied by different operations and when different channels are occupied by multiple operations; that is, conflicts may exist both within and between channels. Because this formulation accounts for the various channel occupancy scenarios, it is well suited to implementation on software tool platforms; for routine analysis, however, a simplified version that omits the conflict coefficients may be used. In the Channel Load Evaluation model, the McCracken-Aldrich scale³⁹ is used as the basis for computational analysis. McCracken and Aldrich developed rating scales for each VACP (Visual, Auditory, Cognitive, and Psychomotor) component, providing relative ratings of the utilization level of each resource component. When the value of W_T exceeds a predetermined threshold, the operator is considered overloaded at that moment, indicating that the multimodal interaction process is suboptimal and that the multimodal design for pilots under these task conditions requires further optimization.
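One possible reading of Equation (4) is sketched below: a base term summing all channel demands, an intra-channel term weighted by $c_{i,i}$, and an inter-channel term weighted by $c_{i,j}$ over pairs of tasks and pairs of channels. The function names, the matrix layout `a[t][i]`, and the example numbers are assumptions made for illustration, and the coefficients are not values from the study.

```python
from itertools import combinations


def pair_load(x, y):
    """Sum of two demands, or 0 when either demand is absent."""
    return x + y if x and y else 0.0


def channel_load(a, c):
    """Workload W_T under one reading of Equation (4).

    a[t][i]: demand of task t on channel i.
    c[i][j]: conflict coefficient between channels i and j (c[i][i] is intra-channel).
    """
    n_tasks, n_channels = len(a), len(c)
    base = sum(a[t][i] for t in range(n_tasks) for i in range(n_channels))
    intra = sum(c[i][i] * sum(a[t][i] for t in range(n_tasks))
                for i in range(n_channels))
    inter = 0.0
    for i, j in combinations(range(n_channels), 2):
        pairs = sum(pair_load(a[t][i], a[s][j]) + pair_load(a[t][j], a[s][i])
                    for t, s in combinations(range(n_tasks), 2))
        inter += c[i][j] * pairs
    return base + intra + inter


# Hypothetical case: task 0 uses only channel 0, task 1 uses only channel 1.
a = [[3.0, 0.0],
     [0.0, 2.0]]
c = [[0.5, 0.2],
     [0.2, 0.5]]
w_full = channel_load(a, c)
w_simplified = channel_load(a, [[0.0, 0.0], [0.0, 0.0]])  # coefficients omitted
```

Setting every coefficient to zero reduces the result to the plain sum of channel demands, which corresponds to the simplified version mentioned above for routine analysis.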

Model validation experiment

Participants

The experiment involved 20 pilots with extensive flight experience. All participants were right-handed, had normal or corrected-to-normal vision, and did not exhibit any color blindness or color weakness.

Experimental design

This study developed a simulation platform for multimodal interaction in the aircraft cockpit, integrating control modes such as gesture control, haptic control, voice control, and eye movement control. Based on the route planning task provided by the multimodal interaction platform, the experiment employed two completion modes: a tactile operation mode and a multimodal interaction mode (comprising voice, manual, and eye control). The specific task content and sub-task breakdown are presented in Table 1. The flight route planning task can be decomposed into eight subtasks: insert a new waypoint before waypoint 3 (step 1); set the formation pattern of the new waypoint to Formation 1 (step 2); set the flight altitude of the new waypoint to 12,000 (step 3); set the flight speed of the new waypoint to 345 (step 4); change the geographical location of the new waypoint to enemy target A2 (step 5); play the simulation process (step 6); adjust the play speed to five times (step 7); end play (step 8).

Table 1.

Flight route planning task process.

Interaction mode	Tactile interaction mode	Multimodal interaction mode
Step1: Insert a new waypoint before waypoint 3	– Select the waypoint 3 – Click the “Insert Key” to add a new waypoint	– Long press the voice button and say “Waypoint 3”. – Gaze at the waypoint list, long press the voice button and say “Insert”.
Step2: Set the formation pattern of the new waypoint to Formation 1	– Click the “Formation Pattern 1” button in the waypoint properties panel	– Gaze at the formation attribute area – Long press the voice button with right hand, and say “Formation Pattern 1”.
Step3: Set the flight altitude of the new waypoint to 12,000	– Drag the flight altitude slider to near 12,000	– Gaze at the flight altitude area – Press the “up/down” button on the joystick to adjust the altitude to near 12,000.
Step4: Set the flight speed of the new waypoint to 345	– Drag the flight speed slider to near 345	– Gaze at the flight speed area – Press the “up/down” button on the joystick to adjust the speed to near 345.
Step5: Change the geographical location of the new waypoint to enemy target A2	– Drag the new waypoint to a position near enemy target A2	– Gaze somewhere on the map – Long press the “voice button” and say “Here”.
Step6: Play the simulation process	– Click the “Play” button in the deduction player.	– Long press the “voice button” and say “Replay”.
Step7: Adjust the play speed to five times faster	– Click the speed button to adjust to 5 times speed.	– Long press the “voice button” and say “Play speed five times”.
Step8: End play	– Click the “Pause” button.	– Long press the “voice button” and say “Pause”.

In the tactile interaction mode, participants were asked to complete the eight subtasks using only the haptic channel. In the multimodal interaction mode, participants’ operations were allocated to different channels. As shown in Table 1, Step 1 can be decomposed as: action channel (voice): long press the voice button and say “Waypoint 3”; and action channel (voice) and visual channel: after Waypoint 3 is highlighted, gaze at the waypoint list, long press the voice button, and say “Insert”. Step 2 can be decomposed as: visual channel: gaze at the formation attribute area in the waypoint properties panel; and action channel (voice): long press the voice button with the right hand and say “Formation Pattern 1”. Step 3 can be decomposed as: visual channel: gaze at the flight altitude area in the waypoint properties panel; and action channel (behavior): press the “up/down” button on the joystick with the right hand to adjust the altitude to near 12,000. Step 4 can be decomposed as: visual channel: gaze at the flight speed area in the waypoint properties panel; and action channel (behavior): press the “up/down” button on the joystick with the right hand to adjust the speed to near 345. Step 5 can be decomposed as: visual channel: gaze somewhere on the map; and action channel (voice): long press the “voice button” with the right hand and say “Here”. Steps 6, 7, and 8 were changed from tactile control to voice control, which mainly occupied the action channel (voice).

Prior to the formal experiment, participants were required to practice all platform operations sufficiently. The experimental order was arranged using a Latin square balanced design to mitigate learning effects during the formal experiment. The experimental dependent variables included task completion time, subjective workload measured by the NASA-TLX scale, and a comprehensive evaluation score derived from the model proposed in this study.

Results

The analysis of different interaction modes

The descriptive statistics for task time and NASA-TLX scores are shown in Table 2. A t-test was conducted to compare the multimodal interaction mode with the tactile (single-channel) interaction mode. Task time was significantly higher in the multimodal interaction mode (M = 41.41) than in the tactile interaction mode (M = 27.79), t(18) = −5.89, P < 0.001. Subjective workload was also higher in the multimodal interaction mode (M = 45.13) than in the tactile interaction mode (M = 33.96), although this difference only approached the conventional 0.05 significance level, t(16) = −2.11, P = 0.051. These findings indicate that the tactile interaction mode was more convenient in terms of task time and tended to impose a lower subjective workload.

Table 2.

The descriptive statistics for task time and NASA-TLX scores.

	Tactile interaction mode	Multimodal interaction mode
Task time	27.79 (8.62)	41.41 (6.88)
NASA-TLX scores	33.96 (17.71)	45.13 (20.31)

The analysis of channel occupancy in multimodal interaction modes

The modeling described in Section 2.1 includes task conflicts, intra-channel conflicts, and inter-channel conflicts, making it suitable for decomposing complex tasks and enabling calculation through a software platform. In this validation experiment, to evaluate multimodal interaction, we first referred to the task conflict evaluation model defined in Section 2.1.2.1 to decompose tasks into non-conflicting steps within the same time interval. Subsequently, following Section 2.1.2.2, we decomposed operations within the same step across different channels, addressing situations where the same channel is occupied by multiple operations or where different channels are simultaneously occupied, thereby resolving the possibility of both intra-channel and inter-channel conflicts. This allowed us to simplify the ‘channel workload evaluation’ formula by ignoring the conflict coefficients and instead applying the McCracken-Aldrich VACP (Visual, Auditory, Cognitive, and Psychomotor) scale⁴⁰ as the basis for calculation and analysis. The VACP method, grounded in multiple resource theory, is designed to predict mental workload induced by tasks and meets the requirement of this study to assess the effectiveness of multimodal interaction design in its early design phase. McCracken and Aldrich have developed scoring tables for each VACP component, providing relative ratings of the extent to which each resource component is utilized.⁴⁰ When the workload exceeds the preset threshold, it is determined that the operator is overloaded at that time point, indicating that the multimodal human–machine interaction process is unreasonable, and the multimodal design for pilots under such task conditions requires further optimization. Table 3 presents the occupancy and load for each channel.

Table 3.

Decomposition of channel occupancy by interaction mode.

Interaction mode	Channel	Step 1	Step 2	Step 3	Step 4	Step 5	Step 7
Tactile interaction mode	Cognition	Symbol/Signal Recognition (1.2)
	Action	Discontinuous Response (2.2)				Continuous Adjustment (2.6)	Discontinuous Response (2.2)
	Visual	Visual Positioning/Adjustment (4)				Visual Observation/Inspection (3)
Multimodal interaction mode	Cognition	Decoding/Encoding, Recollection (5.3)
	Action	Simple Language (2)				Continuous Adjustment (2.6)	Simple Language (2)
	Visual		Visual Observation/ Inspection (3)	Visual Tracking/Gazing (4.4)	Visual Observation/ Inspection (3)
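The simplified channel load calculation described above, in which conflict coefficients are ignored and VACP ratings are summed, can be sketched as follows. The ratings are taken from Table 3, but grouping them into the two example steps is an illustrative assumption, and the threshold value is hypothetical because the study does not report one here.

```python
def step_load(ratings):
    """Simplified load of one step: sum of VACP ratings over channels."""
    return sum(ratings.values())


def overloaded_steps(steps, threshold):
    """Names of steps whose summed rating exceeds the threshold."""
    return [name for name, ratings in steps.items()
            if step_load(ratings) > threshold]


# Ratings from Table 3; the grouping into steps is an assumption.
steps = {
    "tactile click": {"cognition": 1.2, "action": 2.2, "visual": 4.0},
    "multimodal gaze + voice": {"cognition": 5.3, "action": 2.0, "visual": 3.0},
}
THRESHOLD = 8.0  # hypothetical cut-off, not a value from the study

loads = {name: round(step_load(r), 1) for name, r in steps.items()}
flagged = overloaded_steps(steps, THRESHOLD)
```

With a different threshold the flagged set changes, so the cut-off should be treated as a design parameter to be justified for the task at hand.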

Figure 2 illustrates the comparative analysis of channel workload across different interaction modes at each procedural step of the route planning task. Overall, workload trends remain consistent across steps, with both interaction modes showing relatively low levels; however, the multimodal interaction mode generally imposes a slightly higher workload than the tactile mode. Breaking down into channels, the most notable discrepancies between multimodal interaction mode and tactile mode are in the cognitive and visual dimensions. In the cognitive channel, workload under tactile interaction consistently remains lower than that of multimodal interaction, suggesting that the integration and coordination of multiple modalities require additional cognitive resources. In contrast, in the visual channel, certain steps under multimodal interaction show near-zero workload, as voice control directly replaces visual involvement, highlighting its effectiveness in releasing visual resources. Psychomotor workload shows minimal variation across modes, indicating limited influence from the type of interaction. For total workload, patterns largely mirror those of the cognitive channel, confirming cognition as the main contributing factor. These results suggest that multimodal interaction, while increasing cognitive demand, can effectively redistribute workload by reducing visual load during specific task stages, providing valuable implications for cockpit interface design.

Figure 2.

Comparison of channel workload in each step between different interaction modes.
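To make the channel-level contrast concrete, the sketch below takes the Step 1 ratings from the table above and computes the multimodal-minus-tactile difference for each channel. It is an illustration under stated assumptions only: the empty visual cell for the multimodal mode is treated as 0, the helper name `channel_differences` is hypothetical, and the raw ratings are not the model's final workload values, which also reflect task level and channel occupancy.

```python
# Step 1 channel ratings transcribed from the table above.
# Assumption: an empty table cell means the channel is not engaged (0.0).
STEP1_RATINGS = {
    "tactile": {"cognition": 1.2, "action": 2.2, "visual": 4.0},
    "multimodal": {"cognition": 5.3, "action": 2.0, "visual": 0.0},
}


def channel_differences(ratings, base="tactile", other="multimodal"):
    """Return the other-minus-base rating per channel (hypothetical helper)."""
    return {
        channel: round(ratings[other][channel] - ratings[base][channel], 2)
        for channel in ratings[base]
    }


diffs = channel_differences(STEP1_RATINGS)
for channel, delta in diffs.items():
    # A positive value means the multimodal mode is rated higher on that channel.
    print(f"{channel:>9}: {delta:+.1f}")
```

Under these assumptions the cognitive rating rises while the visual rating falls at Step 1, which is the same direction of redistribution described for Figure 2.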

The analysis of the effectiveness under different evaluation methods

Figure 3 compares the evaluation methods and shows consistent trends across interaction modes, with multimodal interaction generally incurring a higher workload than tactile interaction. The single-channel and multimodal evaluation results agree with each other in both interaction modes. This finding aligns with the task completion times and subjective workload outcomes, thereby preliminarily validating the efficacy of the ergonomic evaluation model. These results indicate that the comprehensive evaluation model proposed in this study, through task decomposition and channel-specific analysis, can assess multimodal interaction designs during the design phase without requiring extensive testing, which should help optimize interaction designs early. Although the multimodal evaluation method produced overall higher workload scores than the NASA-TLX because the two use different scoring approaches, we examined whether it could still capture workload variations across control modes. To this end, we conducted a correlation analysis between the NASA-TLX and multimodal evaluation difference scores under tactile control and multimodal control conditions. The results revealed a significant positive correlation (r = 0.52, P = 0.032), indicating that the two methods showed a consistent trend in describing workload differences. This finding supports the validity of the multimodal evaluation method as a complementary measure of workload.

Figure 3.

Comparison of evaluation methods between different interaction modes.
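The correlation check described above can be sketched as follows. This is a minimal illustration, assuming a Pearson correlation on paired difference scores; the function names and the sample values are hypothetical and are not the study's data. The t statistic with n - 2 degrees of freedom is what a statistics package would convert into the reported P value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical paired difference scores (multimodal minus tactile), one pair
# per participant: NASA-TLX on the left, model output on the right.
tlx_diff = [4.0, 9.5, 2.0, 7.5, 6.0, 1.5]
model_diff = [1.1, 2.4, 1.6, 1.9, 2.2, 0.8]

r = pearson_r(tlx_diff, model_diff)
t = t_statistic(r, len(tlx_diff))
print(f"r = {r:.2f}, t({len(tlx_diff) - 2}) = {t:.2f}")
```

A positive r on such difference scores indicates that participants who reported a larger workload increase on the NASA-TLX also tended to receive a larger increase from the model.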

Discussion

Theoretical contribution

By comparing results with NASA-TLX, this study preliminarily validates the effectiveness of the comprehensive ergonomics evaluation model in assessing task workload. Previous studies have demonstrated that the VACP scale can effectively reflect subjective mental workload in domains such as driving and healthcare⁴¹; our work extends these findings to the context of aircraft cockpits. Unlike traditional prototype- and experiment-based evaluations, this study provides a feasible approach to identifying potential channel conflicts and workload distribution issues during the design phase, thereby reducing the cost of later-stage modifications. Furthermore, it expands upon scenario-based evaluation methods that typically rely on observation, questionnaires, and log analysis.⁴²

Interestingly, this study found that tactile-based interaction enabled more convenient task completion and imposed a lower workload compared to multimodal interaction, which contradicts prior findings.¹ A plausible explanation lies in participants’ high proficiency with tactile interaction: in multimodal interaction, greater reliance on memory and cognitive effort was required, resulting in higher cognitive load, as also indicated by subjective workload assessments. This aligns with Oviatt,⁴³ who observed that users tend to prefer unimodal input for simple tasks, while multimodal input becomes advantageous as task complexity increases. This phenomenon can be further explained by the expertise–performance paradox: experienced individuals employ top-down processing to rapidly complete familiar tasks, but such reliance may reduce flexibility and mask the potential advantages of new interaction modes.⁴⁴,⁴⁵ In this study, professional pilots’ reliance on training-based expertise allowed them to complete tactile tasks efficiently, thereby obscuring the theoretical benefits of multimodal interaction.

Practical implications

In future aircraft cockpits, multimodal human–machine interaction technologies will be widely adopted, fundamentally transforming pilot–system interactions. The effective application of diverse interaction methods—such as voice commands, eye-tracking, gesture recognition, and tactile feedback—will be critical for improving operational efficiency and safety. The comprehensive evaluation model developed in this study is particularly valuable in this context, as it enables the assessment of multimodal interaction designs during the design phase without requiring physical prototypes. The results of this model are consistent with those obtained in post-prototype testing, validating both its accuracy and reliability.

Although findings indicate that multimodal interaction increases cognitive load, it simultaneously reduces visual load in specific task stages. This reflects a resource reallocation mechanism, consistent with prior evidence that voice input can effectively relieve demands on visual resources.³,⁶ By redistributing workload more evenly across channels, multimodal interaction reduces the likelihood of single-modality overload. This redistribution not only enhances usability but also contributes to safety and robustness: in emergency conditions, multimodal systems may offer stronger adaptability and fault tolerance, consistent with previous findings on their advantages under high-stress environments.¹⁰

By offering detailed insights into workload distribution, channel occupancy, and task performance, the model facilitates rapid iterative development and minimizes the costs of later redesigns. Moreover, through systematic analysis of multiple design schemes, it supports the formulation of preliminary design guidelines and practical recommendations for refining interaction strategies. Consequently, this evaluation framework not only enhances baseline usability but also ensures that multimodal interaction designs are optimized from the outset—ultimately enabling more efficient, resilient, and safer operations in future aircraft cockpits.

Limitations and future research

This study has some limitations. First, the proposed model was validated only on a route planning task and has not been verified in complex flight combat scenarios; future work should validate its effectiveness in more intricate settings involving multiple tasks and multi-channel conflicts. Second, future research could categorize sub-tasks in more detail to calibrate task conflict coefficients and priority processing levels, and could employ physiological measurements to assess channel occupancy across different sub-tasks, thereby further enhancing the precision of the multimodal interaction technology evaluation model.

Conclusion

This study proposed and validated a comprehensive ergonomic evaluation model for multimodal interaction in aircraft cockpits, addressing the limitations of existing single-channel or prototype-based evaluation approaches. By integrating “Task Conflict Assessment–Channel Conflict Assessment–Channel Load Assessment”, the model provides a systematic framework for assessing workload distribution and potential channel conflicts during the design phase. Validation experiments demonstrated that although multimodal interaction incurred higher task time and subjective workload compared to tactile interaction, the evaluation results of the proposed model were consistent with subjective assessments. The significant correlation between model outcomes and NASA-TLX scores further supported its reliability. Moreover, the multimodal interaction mode enables a more balanced distribution of workload across different channels, indicating its potential advantages for managing complex multi-channel tasks in the future. These findings highlight the model's potential to serve as a practical tool for early-stage cockpit interface design, enabling designers to identify workload issues before physical prototyping, reduce redesign costs, and ultimately optimize the efficiency and safety of future multimodal cockpit systems.

Footnotes

Acknowledgements

We thank the editor and the reviewers for their useful feedback that improved this paper.

Ethical approval

This study was conducted in accordance with the ethical principles of the Declaration of Helsinki and was approved by the Tsinghua University Committee on Science and Technology Ethics (Approval No.: THU04KS2025005).

Informed consent

Written informed consent was obtained from all participants.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Foundation of National Key Laboratory of Human Factors Engineering (Grant No. HFNKL2023J09) and the National Natural Science Foundation of China (Grant No. 72171130).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. This article is a revised and expanded version of a paper entitled “Ergonomics evaluation modeling of multimodal interaction technology for airplane cockpit and experimental validation” presented at IEA2024, Jeju, Korea, August 25–29.

References

1. LaViola Jr, Kruijff, McMahan, et al. 3D User Interfaces: Theory and Practice. Boston, MA: Addison-Wesley Professional, 2017.

2. Cohen, Dalrymple, Moran, et al. Synergistic use of direct manipulation and natural language. In: Proceedings of the SIGCHI Conference on Human factors in Computing Systems, 1989, pp.227–233.

3. Oviatt. Multimodal interactive maps: designing for human performance. Hum–Comput Interact 1997; 12: 93–129.

4. Van Erp, Van Veen, et al. A multi-purpose tactile vest for astronauts in the international space station. In: Proceedings of eurohaptics. Dublin, Ireland: ACM Press, 2003, pp.405–408.

5. Wilkins, Acton. Noise and accidents—a review. Ann Occup Hyg 1982; 25: 249–260.

6. Jansen, Wennemers, Vos, et al. Flytact: a tactile display improves a helicopter pilot’s landing performance in degraded visual environments. In: Haptics: Perception, Devices and Scenarios: 6th International Conference, EuroHaptics 2008 Madrid, Spain, June 10–13, 2008 Proceedings 6, 2008, pp.867–875: Springer.

7. Shree DVJ, Murthy LRD, Saluja, et al. Operating different displays in military fast jets using eye gaze tracker. J Aviat Technol Eng 2018; 8: 31.

8. Rakkolainen, Farooq, Kangas, et al. Technologies for multimodal interaction in extended reality—a scoping review. Multimodal Technol Interact 2021; 5: 81.

9. Luo, Zhang, Pan, et al. Dream-Experiment: a MR user interface with natural multi-channel interaction for virtual experiments. IEEE Trans Vis Comput Graph 2020; 26: 3524–3534.

10. Wickens, Sarter, et al. Informing the design of multimodal displays: a meta-analysis of empirical studies comparing auditory and tactile interruptions. Proc Hum Factors Ergon Soc Annu Meet 2011; 55: 1170–1174.

11. Bischoff, Graefe. Dependable multimodal communication and interaction with robotic assistants. In: Proceedings. 11th IEEE International Workshop on Robot and Human Interactive Communication, 2002, pp.300–305: IEEE.

12. Johnson, Agah. Human robot interaction through semantic integration of multiple modalities, dialog management, and contexts. Int J Soc Robot 2009; 1: 283–305.

13. Wei, Zhuang, Wanyan, et al. A model for discrimination and prediction of mental workload of aircraft cockpit display interface. Chin J Aeronaut 2014; 27: 1070–1077.

14. Zeng, Sun, Liu, et al. Visual cognition-based optimised design of primary flight displays in cockpits. Aeronaut J 2025; 129: 609–626.

15. Causse, Behrend, Mumaw. Aerospace Psychology and Human Factors: Applied Methods and Techniques. Göttingen: Hogrefe, 2024.

16. Tippey, Roady, Rodriguez-Paras, et al. General aviation weather alerting: the effectiveness of different visual and tactile display characteristics in supporting weather-related decision making. Int J Aerosp Psychol 2017; 27: 121–136.

17. Strickland, Pioro, Ntuen. The impact of cockpit instruments on pilot exhaustion. Comput Ind Eng 1996; 31: 483–486.

18. Schreuder, Riccio, Risetti, et al. User-centered design in brain–computer interfaces—A case study. Artif Intell Med 2013; 59: 71–80.

19. Aloise, Aricò, Schettini, et al. Asynchronous gaze-independent event-related potential-based brain–computer interface. Artif Intell Med 2013; 59: 61–69.

20. Jin, Chen, et al. Modeling takeover behavior in level 3 automated driving via a structural equation model: considering the mediating role of trust. Accid Anal Prev 2021; 157: 106156.

21. Liang. Voice search behavior under human–vehicle interaction context: an exploratory study. Libr Hi Tech 2024; 42: 1496–1516.

22. Han, Abowd, Stasko. Intivisor: a visual analytics system for interaction log analysis. IEEE Trans Vis Comput Graph 2024; 31: 1772–1784.

23. Lou, Yan, et al. Distance effects on visual search and visually guided freehand interaction on large displays. Int J Ind Ergon 2022; 90: 103318.

24. Kumar. Measurement of cognitive load in HCI systems using EEG power spectrum: an experimental study. Procedia Comput Sci 2016; 84: 70–78.

25. Wang, et al. Pilot behavior recognition based on multi-modality fusion technology using physiological characteristics. Biosensors 2022; 12: 04.

26. Guerino, Valentim NMC. Usability and user experience evaluation of natural user interfaces: a systematic mapping study. IET Softw 2020; 14: 451–467.

27. Roider, Rümelin, Pfleging, et al. Investigating the effects of modality switches on driver distraction and interaction efficiency in the car. J Multimodal User Interfaces 2019; 13: 89–97.

28. Zeng, et al. Multi-modal user experience evaluation on in-vehicle HMI systems using eye-tracking, facial expression, and finger-tracking for the smart cockpit. Int J Veh Perform 2022; 8: 429–449.

29. Triantafyllidis, Mcgreavy, et al. Study of multimodal interfaces and the improvements on teleoperation. IEEE Access 2020; 8: 78213–78227.

30. W-C, Zakarija, C-S, et al. Interface design on cabin pressurization system affecting pilot’s situation awareness: the comparison between digital displays and pointed displays. Hum Factors Ergon Manuf Serv Ind 2020; 30: 103–113.

31. Stefan, Christoph, Harald, et al. Aircraft cockpit interaction in virtual reality with visual, auditive, and vibrotactile feedback. Proc ACM Hum-Comput Interact Epub ahead of print 31 October 2023; 7: 420–443.

32. Wang, W-C, Korek, et al. Future flight deck design: developing an innovative touchscreen inceptor combined with the primary flight display. Int J Ind Ergon 2024; 101: 103588.

33. Xin, Kam, Qinbiao, et al. Exploring the human-centric interaction paradigm: augmented reality-assisted head-up display design for collaborative human-machine interface in cockpit. Adv Eng Inf 2024; 62: 102656.

34. Dong, Liu. Research on UX evaluation method of design concept under multi-modal experience scenario in the earlier design stages. Int J Int Des Manuf (IJIDeM) 2018; 12: 505–515.

35. Wickens. Multiple resources and performance prediction. Theor Issues Ergon Sci 2002; 3: 159–177.

36. Stanton. Hierarchical task analysis: developments, applications, and extensions. Appl Ergon 2006; 37: 55–79.

37. Baddeley. Exploring Working Memory: Selected works of Alan Baddeley. Routledge, 2017.

38. Alais, Newell, Mamassian. Multisensory processing in review: from physiology to behaviour. Seeing Perceiving 2010; 23: 3–38.

39. Bierbaum, Szabo, Aldrich. Task analysis of the UH-60 mission and decision rules for developing a UH-60 workload prediction model: Summary report. US Army Research Institute for the Behavioral and Social Sciences, 1989.

40. McCracken, Aldrich. Analyses of Selected LHX Mission Functions: Implications for Operator Workload and System Automation Goals. Fort Belvoir, VA: Defense Technical Information Center. Epub ahead of print 1 June 1984. DOI: 10.21236/ADA232330.

41. Huang, Zhang. Generalizability of mental workload prediction using VACP scales in different fields. In: Harris, W-C (eds) Engineering psychology and cognitive ergonomics. Cham: Springer Nature Switzerland, 2023, pp.79–94.

42. Fernández MÁ, Peláez, López, et al. Multimodal interfaces for the smart home: findings in the process from architectural design to user evaluation. In: Bravo, López-de-Ipiña, Moya (eds) Ubiquitous computing and ambient intelligence. Berlin, Heidelberg: Springer, 2012, pp.173–180.

43. Oviatt, Coulston, Lunsford. When do we interact multimodally? Cognitive load and multimodal communication patterns. In: Proceedings of the 6th International Conference on Multimodal Interfaces, New York, NY, USA: Association for Computing Machinery, pp.129–136.

44. Dror, et al. The paradox of human expertise: why experts get it wrong. Paradoxical Brain 2011: 177–188.

45. Ericsson, Smith. Toward a general theory of expertise: Prospects and limits. New York, NY, USA: Cambridge University Press, 1991.