Abstract
Background
With the development of eye movement and voice control, multimodal interaction has become a leading trend for future aircraft cockpits. Yet, most studies either demonstrate its effectiveness broadly or focus on single channels, especially the visual channel. A comprehensive ergonomic evaluation method is still lacking.
Objective
This study proposes and validates a model for evaluating multimodal interaction in aircraft cockpits, addressing workload distribution, channel occupancy, and resource demand.
Methods
Our research proposes an evaluation model for multimodal interaction in aircraft cockpits, based on the analysis of pilots’ information cognitive processing. The model sequentially evaluates the workload of multimodal interaction technology in aircraft cockpit scenarios from the perspectives of task level, channel occupancy, and channel resource demand. Validation experiments were conducted using route planning tasks, assessing the workload for tactile interaction modes and multimodal interaction modes.
Results
Results demonstrated the effectiveness of the proposed ergonomics evaluation model, as correlation analysis revealed a significant positive relationship between the model outcomes and subjective workload scores under both tactile and multimodal conditions. While multimodal interaction incurred significantly higher task time and subjective workload compared to tactile interaction, it also enabled more balanced workload distribution across channels, suggesting potential advantages for managing complex multi-channel tasks in future cockpit operations.
Conclusions
The validation experiments confirmed the effectiveness of our ergonomics evaluation model for multimodal interaction in aircraft cockpits. The model offers a practical tool for early cockpit design by identifying channel conflicts and workload distribution issues before prototype development, thus supporting safer and more efficient multimodal systems.
Keywords
Introduction
Multimodal interaction technology has become the main development trend for future aircraft cockpit interaction. From the perspective of natural interaction, human perceptual channels operate in parallel; multimodal interaction aims to reasonably allocate and utilize non-visual channels to enhance interaction naturalness and reliability. Traditional cockpit interaction relies primarily on tactile control and visual perception. While these conventional methods limit resource channel conflicts, they often lead to perceptual overload due to the heavy reliance on a single sensory modality. In contrast, multimodal interaction employs resources such as auditory, visual, and tactile—using voice, eye-movement, and touch controls—to manage systems. This approach has been proven to enhance information transmission efficiency, reduce cognitive load, lower task errors, and improve fault tolerance. 1 However, the simultaneous use of multiple channels might result in conflicts between interaction tasks and resource channels without proper design, ultimately compromising performance. Therefore, developing evaluation methods for multimodal interaction during the design phase is essential for understanding channel utilization and potential conflicts, thereby contributing to improvements in cockpit design.
Previous studies have demonstrated that modifying or increasing interaction modalities can enhance task performance. Cohen et al. verified that voice input is more efficient and natural than keyboard input. 2 In map navigation, Oviatt proposed that integrating multiple channels effectively compensates for the limitations of single-modality voice interaction. 3 Van Erp et al. confirmed that in weightless environments, tactile information suffers less loss compared to visual input. 4 Wilkins and Acton found tactile feedback minimally affected in high-noise environments, 5 while Jansen et al. showed that tactile cues reduce pilots’ visual workload in flight. 6 Shree DV et al. demonstrated that gaze-controlled interfaces significantly increase reaction speed compared to touchscreens and joysticks. 7 Rakkolainen et al. summarized multimodal interaction technologies based on human sensory modalities, highlighting technical feasibility and effectiveness. 8 Moreover, multimodal interaction improves user satisfaction in VR, 9 reduces system coupling, decreases errors, lowers cognitive load, and balances cognitive resource allocation. 1 Under emergencies, multimodal systems provide faster responses and greater adaptability. 10 Experimental studies further indicate that effective multimodal fusion reduces cognitive load and enhances overall efficiency.11,12
The current evaluation of human–machine interaction in aircraft cockpits primarily focuses on single elements such as display interfaces, alert systems, and control methods. Wei et al. analyzed cockpit display interfaces from an ergonomics perspective, experimentally assessing cognitive workload. 13 Similarly, Zeng et al. conducted cognitive experiments on optimized interfaces to evaluate the relationship between design factors and task performance, 14 thereby providing theoretical foundations and design implications for display interfaces. Furthermore, Causse, Behrend and Mumaw performed eye-tracking experiments with 20 pilots during flight tasks to analyze gaze fixation distributions in monitoring tasks, thereby providing a theoretical foundation for layout design. 15 In addition, Tippey et al. compared pilot responses to various alert modalities, demonstrating that tactile alerts yield the highest perceptual efficiency. 16 Strickland, Pioro and Ntuen investigated the impact of cockpit instrument design on visual fatigue in flight, 17 suggesting that rational design can alleviate fatigue and enhance work efficiency. Similarly, Schreuder et al. employed an eye tracker to assess driver fatigue, which provided a physiological basis for improving in-vehicle alarm designs. 18 Lastly, Aloise et al. examined visual attention toward targets using an eye tracker, analyzing optimal areas for interface design elements and the optimal forms for interface layout. 19
In addition to these single-modality approaches, many evaluations rely on prototype-based experiments and conduct overall assessments, but lack decomposition across individual channels. Traditional methods include subjective evaluations (e.g., questionnaires, interviews, observation records),20,21 objective measures such as task performance and log analysis,22,23 and physiological indices such as eye movement and EEG.24,25 These assessments typically use interaction efficiency, user experience, adaptability and many other measurements as indicators.26,27 For example, Li et al. proposed a user experience evaluation method combining eye-tracking, finger movement, and facial expressions, and further developed an MLP-based affect prediction method from multimodal data. 28 Triantafyllidis et al. compared multimodal combinations to assess cognitive workload and usability. 29 Aloise et al. employed subjective scales to evaluate flight performance, human–machine efficiency, and comfort during pilot operations, thereby identifying main factors influencing mental workload and effectiveness. 19 Li et al. used subjective measurement of situational awareness to reorganize and reconstruct information presentation according to pilots’ cognitive models, resulting in an improved design for system interfaces. 30 AuerStefan et al. compared the effectiveness of various feedback modalities, including auditory and tactile cues, within a VR simulator. 31 Wang et al. examined the feasibility of replacing conventional side sticks or control yokes with touchscreens for aircraft manipulation; their findings revealed that pilots’ performance, system usability, and situational awareness were significantly lower with touchscreen interfaces, which indicated that the future of manipulation still needs to be explored and improved. 32 Furthermore, Xin et al. combined subjective evaluations with eye-tracking data in flight tests to assess AR-HUD displays, demonstrating high situational awareness and low workload under emergency conditions, thereby confirming the validity of the new interface. 33 At the conceptual stage, Dong and Liu developed a multimodal evaluation framework for design concepts, establishing multimodal expression mechanisms and multisensory collaboration principles. 34 They proposed methods to convert scattered conceptual views into perceivable multimodal scenarios and applied semantic differential methods for early-stage user experience evaluation.
Although existing methods have advanced the evaluation of cockpit interaction, they remain limited in several aspects. Prototype- and simulation-based assessments cannot fully identify potential channel conflicts during the early design stage. These approaches often measure overall workload or satisfaction but provide limited insight into how specific resource channels are utilized or compete, making it difficult to determine clear directions for improvement. In summary, while prior research has demonstrated both the feasibility and benefits of multimodal interaction, extant evaluation methods mainly rely on single technologies or experiment-based comparisons, lacking systematic mechanisms to assess conflicts and channel occupancy. Given the increasing demand for multimodal cockpit interaction and the potential risk of task–channel conflicts, a comprehensive evaluation framework is urgently needed during the design phase. To address this gap, this study employs multiple resource theory 35 and Hierarchical Task Analysis (HTA) 36 to analyze pilots’ information processing. Based on this, an evaluation model comprising “Task Conflict Evaluation—Channel Conflict Evaluation—Channel Load Evaluation” is developed and preliminarily validated through a route-planning simulation experiment.
Methodology
Ergonomics evaluation modeling of multimodal interaction technology for aircraft cockpits
Analysis of pilot information cognitive processing
Multiple Resource Theory explains the allocation of resources among concurrent tasks and the relationship between workload and task difficulty. 35 Multiple Resource Theory categorizes multitasking operations into two primary types: (1) simultaneous operation of tasks within the same channel and (2) simultaneous operation of tasks across different channels. Based on the understanding of pilots’ operational tasks, it is assumed that: (1) when tasks within the same channel are performed simultaneously, conflicts arise; the overall task workload is greater than or equal to the sum of the individual task workloads, and the operation time exceeds the sum of the individual task operation times (as individuals typically convert such tasks into sequential processing); (2) when tasks across different channels are executed simultaneously, they can be processed in parallel; however, the combined task workload is greater than that of any single task yet less than the sum of the individual workloads, and the overall operation time is similarly greater than any single task but less than the sum of their individual times. This theory is typically characterized by four channels: visual, auditory, cognition, and perception. It provides a framework for evaluating the rationality of multimodal interaction technologies in aircraft cockpits by analyzing the occupancy of resource channels. By examining the utilization of each channel, one can assess channel occupancy under multimodal interaction, thereby evaluating the appropriateness of the design and integration of these systems. Based on this rationale, we first analyzed the information processing of pilots.
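The two working assumptions above can be written compactly as inequalities. This is only a notational sketch: the symbols W(·) for workload and T(·) for operation time, the task labels a and b, and the concurrency symbol are introduced here for illustration and do not appear in the original.

```latex
% Assumption (1): tasks a and b share one channel and are effectively serialized
W(a \parallel b) \ge W(a) + W(b), \qquad T(a \parallel b) > T(a) + T(b)

% Assumption (2): tasks a and b use different channels and can run in parallel
\max\{W(a), W(b)\} < W(a \parallel b) < W(a) + W(b), \qquad
\max\{T(a), T(b)\} < T(a \parallel b) < T(a) + T(b)
```

Read this way, a same-channel pairing is never cheaper than doing the two tasks one after the other, whereas a cross-channel pairing always costs more than the harder task alone but less than the serial total.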
The pilot’s cognitive information processing (Figure 1) begins with the input of visual, auditory, and somatosensory information and ends with output through visual, vocal, gesture, and tactile control. Sound information enters the auditory channel through the ears, visual information enters the visual channel through the eyes, and somatosensory information enters the tactile channel through the haptic sense. Human cognitive behavior allows for the selective processing of perceived information, which then enters working memory. 37 While processing information, working memory updates and mobilizes long-term memory to process information from the different perceptual channels. 37 Each channel, while processing its own information, can also combine with other channels to integrate information jointly, such as “visual-auditory integration,” “tactile-auditory integration,” and “tactile-visual integration.” 38 Through the comprehensive integration of information from different channels, the pilot makes decisions and proceeds to operation, selecting the appropriate control method to manipulate the aircraft.

Figure 1. Pilot information process integrating multimodal human-machine interaction technology modes.
The main difference in information processing between multimodal interaction technologies and traditional cockpits lies in channel allocation. In conventional aircraft cockpits, control actions are primarily executed via hand-operated control sticks/joysticks, throttle levers, and instrument panel buttons, with limited use of foot controls, thus mainly engaging the behavioral channel and resulting in minimal conflicts with information input channels. In contrast, under a multimodal interaction mode, eye movement control, voice control, and gesture/tactile control concurrently occupy the visual, behavioral, and voice channels.
During aircraft operation, there is the possibility that both information input and output might utilize the same channel, or that multiple pieces of information might be transmitted through a single channel. In other words, under a multimodal interaction framework—which involves visual, auditory, cognition, and behavioral channels—there exists the potential for conflicts between information input and output, as well as among multiple information inputs/outputs. Therefore, eliminating channel conflicts at the procedural logic level in the design of multimodal interaction systems is an effective strategy for optimizing integration. Furthermore, identifying and analyzing these conflicts can serve as a method to evaluate the comprehensive effectiveness of multimodal interaction integration.
Comprehensive ergonomics evaluation model for multimodal interaction technology
Flight tasks are inherently complex, involving intricate interfaces and the simultaneous execution of multiple sub-tasks. To evaluate the effectiveness of multimodal interaction technologies in complex scenarios, it is necessary to decompose the overall task. Event decomposition modeling relies on HTA, which provides a comprehensive model of the sub-goal hierarchy within a system; its structure is applicable to various analyses. 36
HTA elucidates the sequence, steps, and interrelationships among sub-tasks, thereby facilitating the analysis of task objectives, required knowledge and skills, and necessary resources. One notable advantage of the HTA method is its ability to effectively analyze task structure and execution processes, and to identify the relationships and dependencies among individual steps. Based on this, our study proposes a “Task Conflict Evaluation—Channel Conflict Evaluation—Channel Load Evaluation” model. In this model, the “Task Conflict Evaluation” component assesses the number of tasks that must be completed at time t; the “Channel Conflict Evaluation” component evaluates the degree of conflict in channel occupancy at time t, that is, the amount of information occupying each channel; and the “Channel Load Evaluation” component measures the resource demands for processing information in each channel at time t. A network model of aircraft tasks based on multimodal interaction technologies is constructed, comprising the following sub-steps:
(1) For any operational task M, input the task and derive its task flow based on the cockpit's available multimodal human–machine interaction modalities.
(2) For any task M, extract its task segments in chronological order to form the task segment set for M, represented as M = {m1, m2, m3, …, mx, …, mn}.
(3) Each task segment mx is decomposed into a sequence of operational tasks arranged in order of execution. Within task segment mx, one or more operational tasks may exist.
(4) Each operational task is further decomposed into visual, auditory, cognitive, and motor channels, thereby forming a time series of channel behaviors. The occupancy demand for each channel is considered to be stochastic.
(5) Construct a task sequence for the input task M based on its task segments, an operational task sequence derived from the operational tasks, and a behavior element sequence derived from the behavioral factors. Note that both the operational task sequence and the behavior element sequence may contain multiple tasks or behavior elements concurrently. After decomposition, these sequences are organized into a network-like structure.
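The decomposition described above (task, task segments, operational tasks, channel behaviors) can be sketched as a small nested data structure. This is a minimal illustration rather than the authors' implementation: the class names, the `channel_sequence` helper, and the channel labels are assumptions, and the example entries paraphrase the altitude-setting step of the route planning task with the action channel mapped to "motor".

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Channel labels assumed for this sketch (the text names visual, auditory,
# cognitive, and motor channels).
CHANNELS = ("visual", "auditory", "cognitive", "motor")


@dataclass
class BehaviorElement:
    """One channel behavior, e.g. gazing at a panel area."""
    channel: str
    description: str

    def __post_init__(self) -> None:
        if self.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {self.channel}")


@dataclass
class OperationalTask:
    """An operational task made of one or more behavior elements."""
    name: str
    behaviors: List[BehaviorElement] = field(default_factory=list)


@dataclass
class TaskSegment:
    """A task segment m_x holding operational tasks in execution order."""
    name: str
    operations: List[OperationalTask] = field(default_factory=list)


@dataclass
class Task:
    """The input task M as a chronological list of task segments."""
    name: str
    segments: List[TaskSegment] = field(default_factory=list)

    def channel_sequence(self) -> List[Tuple[str, str, str]]:
        """Flatten M into (segment, operation, channel) triples in order."""
        return [
            (seg.name, op.name, beh.channel)
            for seg in self.segments
            for op in seg.operations
            for beh in op.behaviors
        ]


# Hypothetical encoding of one segment of the route planning task.
set_altitude = OperationalTask(
    "set altitude",
    [
        BehaviorElement("visual", "gaze at the flight altitude area"),
        BehaviorElement("motor", "press the up/down button on the joystick"),
    ],
)
route_planning = Task("route planning", [TaskSegment("step 3", [set_altitude])])
print(route_planning.channel_sequence())
```

Keeping the three levels as separate types makes it straightforward to attach timing or rating information to behavior elements later without changing the task hierarchy.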
Task Conflict Evaluation
Nt represents the number of tasks that the operator needs to handle at time t, evaluated through the decomposition of tasks using HTA.
Channel Conflict Evaluation
Channel conflict evaluation assesses the occupancy of each channel based on the decomposition of tasks into various channels.
The Channel Conflict Evaluation Model considers that under ideal conditions, at any given moment a pilot may be engaged in multiple tasks; however, the visual, auditory, and behavioral channels can each only meet the occupancy requirement for one corresponding task. In this model, visual channel occupancy encompasses both information observation and eye control; auditory channel occupancy involves listening to information; and behavioral channel occupancy includes gesture control, voice control, and tactile control. A channel occupancy demand exceeding one may occur if different tasks require the same channel simultaneously or if a single task demands multiple uses of the same channel.
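As a rough illustration of the task count and channel occupancy checks at a single time point, the sketch below counts concurrent tasks and flags any channel whose occupancy demand exceeds one. The function names, the capacity parameter, and the example demands are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each demand is a (task name, channel) pair active at the same time t.
Demand = Tuple[str, str]


def task_count(active: List[Demand]) -> int:
    """Number of distinct tasks the operator handles at time t."""
    return len({task for task, _ in active})


def channel_occupancy(active: List[Demand]) -> Dict[str, int]:
    """Number of simultaneous demands placed on each channel at time t."""
    return dict(Counter(channel for _, channel in active))


def conflicting_channels(active: List[Demand], capacity: int = 1) -> List[str]:
    """Channels whose occupancy demand exceeds the assumed capacity of one."""
    occupancy = channel_occupancy(active)
    return sorted(ch for ch, n in occupancy.items() if n > capacity)


# Hypothetical snapshot: two tasks both need the visual channel.
snapshot = [
    ("set altitude", "visual"),
    ("set altitude", "behavioral"),
    ("check map", "visual"),
    ("listen to alert", "auditory"),
]
print(task_count(snapshot))            # 3
print(conflicting_channels(snapshot))  # ['visual']
```

Because demands are counted per channel rather than per task, the same check also catches the case where a single task requires the same channel more than once.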
Channel Load Evaluation
WT represents the workload at moment T, constructed based on the channel behavior after task decomposition. It enables the analysis of changes in the operator's channel resource demands during the task. 39
Where WT represents the load at time T, i, j denote the different resource channels, t represents the operator's tasks or behaviors,
Within the parentheses, the first part calculates the conflict within the same channel, while the second part assesses the conflict across different channels. This formula is designed to analyze potential conflicts arising from the same channel being occupied by different operations and from different channels being occupied by multiple operations—that is, conflicts may exist both within and between channels. This comprehensive formulation accounts for various channel occupancy scenarios, making it highly suitable for implementation on software tool platforms; however, for routine analysis, a simplified version that omits the conflict coefficient might be used. In the Channel Load Evaluation model, the McCracken-Aldrich scale 39 is used as the basis for computational analysis. McCracken and Aldrich have developed rating scales for each VACP component, providing relative ratings of the utilization level of each resource component. When the value of WT exceeds a predetermined threshold, the operator's workload is considered overloaded at that moment, indicating that the multimodal interaction process is suboptimal and that further optimization of the multimodal design for pilots under these task conditions is necessary.
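The simplified version mentioned above, which omits the conflict coefficients, reduces to summing the channel ratings of all behaviors active at a time point and comparing the total with a threshold. The sketch below assumes that reading; the rating values and the threshold are placeholders chosen for illustration and are not values from the McCracken-Aldrich scale or from this study.

```python
from typing import Dict, List


def channel_load(ratings: Dict[str, List[float]]) -> Dict[str, float]:
    """Sum the ratings of all behaviors occupying each channel at time T."""
    return {channel: sum(values) for channel, values in ratings.items()}


def total_workload(ratings: Dict[str, List[float]]) -> float:
    """Simplified W_T: total of all channel loads, with no conflict terms."""
    return sum(channel_load(ratings).values())


def is_overloaded(workload: float, threshold: float) -> bool:
    """True when the workload exceeds the predetermined threshold."""
    return workload > threshold


# Placeholder ratings for one time point (hypothetical values).
ratings_at_t = {
    "visual": [4.0],
    "auditory": [],
    "cognitive": [4.0, 1.0],
    "psychomotor": [2.5],
}
HYPOTHETICAL_THRESHOLD = 10.0  # assumption; the source does not state a value

w_t = total_workload(ratings_at_t)
print(channel_load(ratings_at_t), w_t, is_overloaded(w_t, HYPOTHETICAL_THRESHOLD))
```

Evaluating this per time step yields the workload profile over the task, and the per-channel loads show which channel drives any overload.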
Model validation experiment
Participants
The experiment involved 20 pilots with extensive flight experience. All participants were right-handed, had normal or corrected-to-normal vision, and did not exhibit any color blindness or color weakness.
Experimental design
This study developed a simulation platform for multimodal interaction in aircraft cockpits, integrating control modes such as gesture control, haptic control, voice control, and eye movement control. Based on the route planning task provided by the multimodal interaction platform, this experiment employed two completion modes: a tactile operation mode and a multimodal interaction mode (comprising voice, manual, and eye control). The specific task content and sub-task breakdown are presented in Table 1. The flight route planning task can be decomposed into eight subtasks: insert a new waypoint before waypoint 3 (step 1); set the formation pattern of the new waypoint to Formation 1 (step 2); set the flight altitude of the new waypoint to 12,000 (step 3); set the flight speed of the new waypoint to 345 (step 4); change the geographical location of the new waypoint to enemy target A2 (step 5); play the simulation process (step 6); adjust the play speed to five times (step 7); and end play (step 8).
Table 1. Flight route planning task process.
In the tactile interaction mode, participants were asked to complete all eight subtasks using only the haptic channel. In the multimodal interaction mode, participants’ operations were allocated across different channels. As shown in Table 1, Step 1 can be decomposed as: action channel (voice): long press the voice button and say “Waypoint 3”; then action channel (voice) and visual channel: after Waypoint 3 is highlighted, gaze at the waypoint list, long press the voice button, and say “Insert”. Step 2 can be decomposed as: visual channel: gaze at the formation attribute area in the waypoint properties panel; and action channel (voice): long press the voice button with the right hand and say “Formation Pattern 1”. Step 3 can be decomposed as: visual channel: gaze at the flight altitude area in the waypoint properties panel; and action channel (behavior): press the “up/down” button on the joystick with the right hand to adjust the altitude to near 12,000. Step 4 can be decomposed as: visual channel: gaze at the flight speed area in the waypoint properties panel; and action channel (behavior): press the “up/down” button on the joystick with the right hand to adjust the speed to near 345. Step 5 can be decomposed as: visual channel: gaze at a location on the map; and action channel (voice): long press the voice button with the right hand and say “Here”. Steps 6, 7, and 8 were changed from tactile control to voice control, which mainly occupied the action channel (voice).
Prior to the formal experiment, participants were required to practice all platform operations sufficiently. The experimental order was arranged using a Latin square balanced design to mitigate learning effects during the formal experiment. The experimental dependent variables included task completion time, subjective workload measured by the NASA-TLX scale, and a comprehensive evaluation score derived from the model proposed in this study.
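The counterbalancing described above can be illustrated with a short sketch that builds a cyclic Latin square over the conditions and assigns participants to its rows in rotation. The function names are hypothetical and the exact assignment procedure used in the experiment is not stated in the text; with only two conditions the square reduces to the two possible orders, each used by half of the participants.

```python
from typing import List, Sequence


def cyclic_latin_square(conditions: Sequence[str]) -> List[List[str]]:
    """Build an n x n square where each condition appears once per row and column."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(n_participants: int, conditions: Sequence[str]) -> List[List[str]]:
    """Give participant p the order in row p mod n of the square."""
    square = cyclic_latin_square(conditions)
    return [square[p % len(square)] for p in range(n_participants)]


orders = assign_orders(20, ["tactile", "multimodal"])
tactile_first = sum(1 for order in orders if order[0] == "tactile")
print(tactile_first, len(orders) - tactile_first)  # 10 10
```

For more than two conditions a cyclic square balances position but not immediate carryover, so a different construction would be needed if sequence effects were a concern.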
Results
The analysis of different interaction modes
The descriptive statistics for task time and NASA-TLX scores are shown in Table 2. A t-test was first conducted to compare the multimodal interaction mode with the tactile (single-channel) interaction mode. Task time was significantly higher in the multimodal mode (M = 41.41) than in the tactile mode (M = 27.79), t(18) = −5.89, P < 0.001. Subjective workload was also higher in the multimodal mode (M = 45.13) than in the tactile mode (M = 33.96), although this difference was only marginally significant, t(16) = −2.11, P = 0.051. These findings indicate that the tactile interaction mode was more convenient and imposed a lower workload.
Table 2. Descriptive statistics for task time and NASA-TLX scores.
The analysis of channel occupancy in multimodal interaction modes
The modeling described in Section 2.1 includes task conflicts, intra-channel conflicts, and inter-channel conflicts, making it suitable for decomposing complex tasks and enabling calculation through a software platform. In this validation experiment, to evaluate multimodal interaction, we first referred to the task conflict evaluation model defined in Section 2.1.2.1 to decompose tasks into non-conflicting steps within the same time interval. Subsequently, following Section 2.1.2.2, we decomposed operations within the same step across different channels, addressing situations where the same channel is occupied by multiple operations or where different channels are simultaneously occupied, thereby resolving the possibility of both intra-channel and inter-channel conflicts. This allowed us to simplify the ‘channel workload evaluation’ formula by ignoring the conflict coefficients and instead applying the McCracken-Aldrich VACP (Visual, Auditory, Cognitive, and Psychomotor) scale 40 as the basis for calculation and analysis. The VACP method, grounded in multiple resource theory, is designed to predict mental workload induced by tasks and meets the requirement of this study to assess the effectiveness of multimodal interaction design in its early design phase. McCracken and Aldrich have developed scoring tables for each VACP component, providing relative ratings of the extent to which each resource component is utilized. 40 When the workload exceeds the preset threshold, it is determined that the operator is overloaded at that time point, indicating that the multimodal human–machine interaction process is unreasonable, and the multimodal design for pilots under such task conditions requires further optimization. Table 3 presents the occupancy and load for each channel.
Table 3. Decomposition of channel occupancy by interaction mode.
Figure 2 illustrates the comparative analysis of channel workload across different interaction modes at each procedural step of the route planning task. Overall, workload trends remain consistent across steps, with both interaction modes showing relatively low levels; however, the multimodal interaction mode generally imposes a slightly higher workload than the tactile mode. Breaking down into channels, the most notable discrepancies between multimodal interaction mode and tactile mode are in the cognitive and visual dimensions. In the cognitive channel, workload under tactile interaction consistently remains lower than that of multimodal interaction, suggesting that the integration and coordination of multiple modalities require additional cognitive resources. In contrast, in the visual channel, certain steps under multimodal interaction show near-zero workload, as voice control directly replaces visual involvement, highlighting its effectiveness in releasing visual resources. Psychomotor workload shows minimal variation across modes, indicating limited influence from the type of interaction. For total workload, patterns largely mirror those of the cognitive channel, confirming cognition as the main contributing factor. These results suggest that multimodal interaction, while increasing cognitive demand, can effectively redistribute workload by reducing visual load during specific task stages, providing valuable implications for cockpit interface design.

Figure 2. Comparison of channel workload in each step between different interaction modes.
The analysis of the effectiveness under different evaluation methods
Figure 3 depicts the comparative evaluation of different methodologies, confirming consistent trends across interaction modes, with multimodal interaction generally incurring a higher workload than tactile interaction. The evaluation results for both single-channel and multimodal interactions are consistent across different interaction modes. This finding aligns with the task completion times and subjective workload outcomes, thereby preliminarily validating the efficacy of the ergonomic evaluation model. These results indicate that the comprehensive evaluation model proposed in this study—through task decomposition and channel-specific analysis during the design phase—can effectively assess the validity of multimodal interaction without requiring extensive testing, which will play a crucial role in optimizing interaction designs at the design stage. Although the multimodal evaluation method produced overall higher workload scores than the NASA-TLX due to differences in scoring approaches, we examined whether it could still capture workload variations across control modes. To this end, we conducted a correlation analysis between the NASA-TLX and multimodal evaluation difference scores under tactile control and multimodal control conditions. The results revealed a significant positive correlation (r = 0.52, P = 0.032), indicating that the two methods showed a consistent trend in describing workload differences. This finding supports the validity of the multimodal evaluation method as a complementary measure of workload.
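One way to read the correlation analysis above is as a Pearson correlation between per-participant difference scores (multimodal minus tactile) from the two evaluation methods. The sketch below assumes that reading, which the text does not spell out; the function names are hypothetical and the numbers are placeholders, not data from the experiment.

```python
import math
from typing import List, Sequence


def difference_scores(multimodal: Sequence[float], tactile: Sequence[float]) -> List[float]:
    """Per-participant difference: multimodal score minus tactile score."""
    if len(multimodal) != len(tactile):
        raise ValueError("both conditions need one score per participant")
    return [m - t for m, t in zip(multimodal, tactile)]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Placeholder scores for four hypothetical participants.
tlx_diff = difference_scores([45.0, 50.0, 40.0, 55.0], [35.0, 30.0, 38.0, 36.0])
model_diff = difference_scores([60.0, 66.0, 58.0, 70.0], [52.0, 50.0, 55.0, 54.0])
print(round(pearson_r(tlx_diff, model_diff), 2))
```

A significance test for r would additionally need the sample size; that step is left out here to keep the sketch free of external dependencies.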

Figure 3. Comparison of evaluation methods between different interaction modes.
Discussion
Theoretical contribution
By comparing results with NASA-TLX, this study preliminarily validates the effectiveness of the comprehensive ergonomics evaluation model in assessing task workload. Previous studies have demonstrated that the VACP scale can effectively reflect subjective mental workload in domains such as driving and healthcare 41 ; our work extends these findings to the context of aircraft cockpits. Unlike traditional prototype- and experiment-based evaluations, this study provides a feasible approach to identifying potential channel conflicts and workload distribution issues during the design phase, thereby reducing the cost of later-stage modifications. Furthermore, it expands upon scenario-based evaluation methods that typically rely on observation, questionnaires, and log analysis. 42
Interestingly, this study found that tactile-based interaction enabled more convenient task completion and imposed a lower workload compared to multimodal interaction, which contradicts prior findings. 1 A plausible explanation lies in participants’ high proficiency with tactile interaction: in multimodal interaction, greater reliance on memory and cognitive effort was required, resulting in higher cognitive load, as also indicated by subjective workload assessments. This aligns with Oviatt, 43 who observed that users tend to prefer unimodal input for simple tasks, while multimodal input becomes advantageous as task complexity increases. This phenomenon can be further explained by the expertise–performance paradox: experienced individuals employ top-down processing to rapidly complete familiar tasks, but such reliance may reduce flexibility and mask the potential advantages of new interaction modes.44,45 In this study, professional pilots’ reliance on training-based expertise allowed them to complete tactile tasks efficiently, thereby obscuring the theoretical benefits of multimodal interaction.
Practical implications
In future aircraft cockpits, multimodal human–machine interaction technologies will be widely adopted, fundamentally transforming pilot–system interactions. The effective application of diverse interaction methods—such as voice commands, eye-tracking, gesture recognition, and tactile feedback—will be critical for improving operational efficiency and safety. The comprehensive evaluation model developed in this study is particularly valuable in this context, as it enables the assessment of multimodal interaction designs during the design phase without requiring physical prototypes. The results of this model are consistent with those obtained in post-prototype testing, validating both its accuracy and reliability.
Although findings indicate that multimodal interaction increases cognitive load, it simultaneously reduces visual load in specific task stages. This reflects a resource reallocation mechanism, consistent with prior evidence that voice input can effectively relieve demands on visual resources.3,6 By redistributing workload more evenly across channels, multimodal interaction reduces the likelihood of single-modality overload. This redistribution not only enhances usability but also contributes to safety and robustness: in emergency conditions, multimodal systems may offer stronger adaptability and fault tolerance, consistent with previous findings on their advantages under high-stress environments. 10
By offering detailed insights into workload distribution, channel occupancy, and task performance, the model facilitates rapid iterative development and minimizes the costs of later redesigns. Moreover, through systematic analysis of multiple design schemes, it supports the formulation of preliminary design guidelines and practical recommendations for refining interaction strategies. Consequently, this evaluation framework not only enhances baseline usability but also ensures that multimodal interaction designs are optimized from the outset—ultimately enabling more efficient, resilient, and safer operations in future aircraft cockpits.
Limitations and future research
This study has several limitations. First, the proposed model was validated only with a route planning task and was not verified in complex flight combat scenarios; future work should validate its effectiveness in more intricate settings involving multiple tasks and multi-channel conflicts. Second, the sub-tasks were categorized only coarsely; future research could categorize them in more detail to calibrate the task conflict coefficients and priority processing levels, and could employ physiological measurements to assess channel occupancy across sub-tasks, thereby further improving the precision of the multimodal interaction evaluation model.
Conclusion
This study proposed and validated a comprehensive ergonomic evaluation model for multimodal interaction in aircraft cockpits, addressing the limitations of existing single-channel or prototype-based evaluation approaches. By integrating “Task Conflict Assessment–Channel Conflict Assessment–Channel Load Assessment”, the model provides a systematic framework for assessing workload distribution and potential channel conflicts during the design phase. Validation experiments demonstrated that, although multimodal interaction incurred higher task time and subjective workload than tactile interaction, the evaluation results of the proposed model were consistent with the subjective assessments. The significant correlation between model outcomes and NASA-TLX scores further supported the model’s reliability. Moreover, the multimodal interaction mode enabled a more balanced distribution of workload across channels, indicating its potential advantages for managing complex multi-channel tasks in the future. These findings highlight the model’s potential to serve as a practical tool for early-stage cockpit interface design, enabling designers to identify workload issues before physical prototyping, reduce redesign costs, and ultimately optimize the efficiency and safety of future multimodal cockpit systems.
Footnotes
Acknowledgements
We thank the editor and the reviewers for their useful feedback that improved this paper.
Ethical approval
This study was conducted in accordance with the ethical principles of the Declaration of Helsinki and was approved by the Tsinghua University Committee on Science and Technology Ethics (Approval No.: THU04KS2025005).
Informed consent
Written informed consent was obtained from all participants.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Foundation of National Key Laboratory of Human Factors Engineering (Grant No. HFNKL2023J09) and the National Natural Science Foundation of China (Grant No. 72171130).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. This article is a revised and expanded version of a paper entitled “Ergonomics evaluation modeling of multimodal interaction technology for airplane cockpit and experimental validation”, presented at IEA2024, Jeju, Korea, August 25–29.
