Abstract
This study examines the relationship between action team communication characteristics and task demand levels, with the goal of informing communication-based measures of team workload. Forty teams of three participated in an experiment emulating key teaming aspects of action teams in a simulated task based on wildland firefighting command and control. Task demand levels and team interdependence were systematically manipulated. Team communication characteristics and subjective workload were measured. Results revealed several trends in team communication characteristics under high demand conditions, including more time spent communicating, longer utterance durations, reduced rates of speech, reductions in communication pattern complexity, and greater centralization in high-interdependence teams. Additionally, content-free communication measures improved statistical models of team performance in high demand missions beyond traditional workload questionnaires. Overall, this study provides evidence for the value of communication-based measures of team workload, highlights the important role of team interdependence in team workload, and offers direction for further development and validation.
Keywords
Introduction
It is critical for safety and performance that human capabilities be matched with the demands of work. Research on cognitive workload has explored the relationships between task demands, performance, and cognitive resources (Longo et al., 2022; Norman & Bobrow, 1975; Young et al., 2015), resulting in impactful interventions and system designs. This research has expanded from the workload of individuals to the study and measurement of the collective workload of teams, called team workload (Bowers et al., 1997; Funke et al., 2012; Zhang et al., 2023). Although definitions and conceptualizations of team workload vary across the literature, there is some consensus that team workload should be treated as distinct from individual workload because it must capture additional constructs inherent to teamwork (Bowers et al., 1997; Bowers & Jentsch, 2005; Johnson et al., under review; Funke et al., 2012; Bedwell et al., 2014; Young et al., 2007; Zhang et al., 2023). For this work, we define team workload as a multi-dimensional construct emerging at the team level and representing the interaction between a team’s finite performance capacities, the teamwork and taskwork demands placed on the team, and their performance on their team-level tasks within an operating environment over time.
Focusing on workload at the team level may be necessary for improving interdependent teams, in which the sustained performance of the whole team matters more than that of individual members. Action teams are a highly interdependent type of team that contend with particularly demanding and complex conditions in which the consequences of overload and performance degradation are often severe. Action teams are characterized by expertise, information, and tasks that are distributed heterogeneously across several highly trained individuals with specialized roles and skills (Kozlowski et al., 1996). Examples include command and control teams, search and rescue teams, and surgical teams (Sundstrom, 1999). Action teams stand to benefit substantially from improved team workload measurement techniques, which could in turn allow their work, environment, and team processes to be improved. Better team workload assessments for action teams can aid researchers, engineers, designers, and acquisition decision makers in assessing the impact of system designs and interventions. Future real-time applications may also have several uses, such as improving a commander’s awareness of the demands their teams are under or triggering adaptive automation to alleviate workload in a targeted manner. Other user groups may also employ team workload measures to evaluate the impact of new systems on workflows.
The most common ways to measure team workload in action teams include subjective (e.g., questionnaires; Sellers et al., 2014), performance-based (e.g., secondary task paradigms; Lenné et al., 2014), and physiological (Verdière et al., 2019) measurement (Funke et al., 2012). These measures are often gathered from individuals and aggregated into a team-level workload measure. Aggregation can involve various methods, such as averaging, summing, or more complex statistical techniques. However, measurement and aggregation approaches come with implicit assumptions (Mathieu & Luciano, 2019; Waller et al., 2016), which may often be violated in action teams. For example, averaging subjective questionnaire scores to derive a team-level workload measure may assume that individual workload levels are homogeneously distributed across team members and that there is an additive linear relationship between the workload levels of individual teammates and overall team workload. However, average-based measures of team states are prone to undue influence from extreme values and rely statistically on homogeneity of variance for their validity (Mathieu & Luciano, 2019). In action teams, workload homogeneity is not always present or consistent over time (Barnes et al., 2008), and the relationship between individual and team-level workload may be non-linear (Funke et al., 2012). For instance, overload in one or a few members may create bottlenecks (such as the need for special skills or information) that prevent the team from deploying its resources to maintain performance, despite available cognitive resources among other members of the team.
Subjective measures are also obtrusive, generally impractical to embed within realistic action team environments, and limited to static snapshots in time. Physiological measures collected from wearable sensors overcome some of the shortcomings of subjective measures and show promise for studying teams in various contexts (e.g., Dias et al., 2019; Halgas et al., 2023; Kazi et al., 2021). Physiological measures can offer continuous, real-time data collection, but they also come with challenges, such as the need for calibration; potential discomfort for participants; operational contexts that preclude their use due to functional limitations or security concerns; and the complexity of interpreting physiological data in the context of team interactions and workload. Work linking physiological measures, which are collected at the individual level, to team-level states is still at a nascent stage, and such links are not well established for team workload (Kazi et al., 2021).
Communication Measures of Team Workload
Team interaction-based measurement approaches may provide some advantages over other techniques for assessing workload in action teams. Interaction-based measures, much like physiological measures, can be embedded within the work context. They have potential for real-time applications using behavioral event data generated from the interactions among teammates that take place naturally during task execution. Communication-based interaction measures have the particular advantage of neither interrupting the task nor requiring a suite of physiological sensors, which are often prone to environmental disruptions. Communication measures based on communication flow (i.e., who talks to whom), content (i.e., what is said), or a combination can provide valuable insights into team interactions; they have been used effectively to assess team dynamics and performance (Cooke & Gorman, 2009) and can tap directly into team cognition (Cooke et al., 2013). Common communication-based measures in action teams include the frequency and pattern of communication among team members (Barth et al., 2015; van den Oever & Schraagen, 2021), turn-taking dynamics in conversations (Gorman et al., 2012), and content analysis of team communication (Fincannon et al., 2011). However, communication-based approaches to measuring team workload have been understudied, and the viability and generalizability of different measures are not known. This study aims to increase our understanding of communication-based team interaction measures and their effective application to team workload measurement in action teams.
As demands placed upon a team approach or exceed its workload capacity, team members must adapt their individual and collective behaviors to maintain performance (Funke et al., 2012; Woods & Wreathall, 2016). It is also well established that people proactively adjust their communications as contexts change to facilitate the exchange of information (Clark & Brennan, 1991; Sacks et al., 1974). Research in action teams suggests that teams adjust communication characteristics such as quantity, rate, centralization, and complexity as task demands change in order to maintain performance. For instance, greater communication quantities were observed in simulated submarine command and control scenarios when the number of vessels the team was required to identify and navigate around was increased (Roberts et al., 2017). Aviation crews in simulation have been observed to communicate more, and in shorter and more rapid exchanges, under non-routine flight conditions (Lei et al., 2016). Teams conducting remote search tasks have also been found to exhibit increased rates of speech and to decentralize their communication when time pressure was increased (Gervits et al., 2016). Decentralization of communications has been associated with high demand conditions in other studies, including high-complexity surgical procedures (Barth et al., 2015; van den Oever & Schraagen, 2021), the Apollo 13 flight director’s voice loops during the crisis (van den Oever & Schraagen, 2021), and simulated submarine scenarios (Roberts et al., 2019; Stanton & Roberts, 2020). Measures of interaction complexity have also been applied to military command and control teams, revealing that the complexity of communication patterns tended to decrease when more targets were introduced in an air battle management task (Parker et al., 2016; Russell et al., 2012; Strang et al., 2012).
Presumably, patterns of adaptation partly reflect the coordination required to maintain performance under different demand levels. For example, higher speech rates and quantities may allow teammates to convey more information more quickly when executing joint tasks that require assembling and sequencing information, or may enable continuous adjustment between teammates executing parallel tasks. Decentralization may reflect situations in which information and expertise from several team members are required to complete an interdependent task (Barth et al., 2015), or it may distribute communication load among several people instead of a few. The complexity of communication patterns may reflect task constraints altering the degrees of freedom available for team coordination as task demands change (Strang et al., 2012) or indicate an adaptive response (Wiltshire et al., 2018). Overall, the empirical work in action teams provides some evidence of links between team workload and team communication patterns. However, there has not been sufficient study to determine how these patterns generalize, and possibly persist, across different action team and task contexts and, in particular, what role team interdependence plays in these relationships.
The Current Study
Our study aims to build on research suggesting a link between communication patterns and team workload, to explore the potential for using communication measures to assess team workload in action teams, and to understand how these measures relate to team performance under load. To this end, we conducted an experiment in which task demands were manipulated in a resource-limited manner (Norman & Bobrow, 1975), but at the team level (Johnson et al., 2023). This design required participants in 3-person teams to collaborate to allocate resources effectively in a simulated task based on wildland firefighting command and control. The tasks were intended to mimic key activities of action teams, a context in which effective communication is critical. Team interdependence was also directly manipulated to examine how the interdependencies between teammates influence the relationship between team workload and communication.
Team workload is a multifaceted and complex construct, which can make isolating and understanding its parts (e.g., resources, demands, or performance) particularly challenging. Therefore, this study focuses on the relationship between team demand manipulations, as characteristics of the tasks being executed, and team performance outcomes. The aim is to investigate whether communication measures vary in ways congruent with team workload. We also examine how communication measures relate directly to traditional subjective team workload measures, investigating the convergence between these measures and identifying any unique variance in team performance that communication measures might capture. The primary research questions are whether team communications provide information about team workload states and team performance under high demand conditions, and whether this relationship changes with interdependence in action teams.
Although previous studies provide a link between team communication characteristics and team workload in action teams, most of the findings are indirect and narrowly scoped. Furthermore, the team structures and demand manipulations took several different forms, often including “off-nominal events” (Gervits et al., 2016; van den Oever & Schraagen, 2021) or qualitatively different tasks, which may elicit markedly different patterns than high demand, but otherwise nominal, conditions. This makes it challenging to generalize previous research to different action teams (e.g., with different levels of team interdependence) and contexts. Overall, there remains a lack of work directly testing the relationship between team communication characteristics (e.g., quantities, rates, distributions, and patterns) and team workload in controlled action team experimental settings. To address that gap, the current study directly measures several of these communication characteristics in a single action team task context and systematically manipulates demand levels to clarify how communications change under load.
The following hypotheses are derived from previous findings describing how action team communication characteristics change under high demand conditions. These findings suggest that as the demands placed on the team increase, communication quantities increase (Lei et al., 2016; Roberts et al., 2017), communication rates increase (Gervits et al., 2016; Lei et al., 2016), communications become more decentralized across team members (Grote et al., 2010; Roberts et al., 2019; Stanton & Roberts, 2020b; Van Den Oever & Schraagen, 2021), and communication patterns become less complex over time (Parker et al., 2016; Russell et al., 2012; Strang, Horwood, et al., 2012). Therefore, we hypothesize that, under High Demand relative to Low Demand conditions:
• H1: Communication quantities will increase.
• H2: Communication rates will increase (i.e., shorter utterances delivered at faster speech rates).
• H3: Communications will become more decentralized across team members.
• H4: Communication patterns will become less complex.
Exploratory Objectives
A primary goal of team workload research is also to understand team performance under load, ultimately so that it can be anticipated, designed for, or otherwise modified. Commonly, the relationship between task demands and team performance is examined using relatively constrained traditional measurement techniques (e.g., subjective workload questionnaires). The current study seeks to move beyond traditional approaches by examining how team performance under high demand relates to both team communication characteristics and more traditional subjective measures. To do so, we pursued the following exploratory objectives:
• Exploratory Objective 1: Examine how team communication measures relate to team performance under high demand conditions.
• Exploratory Objective 2: Examine whether team communication measures account for variance in team performance under high demand beyond that captured by traditional subjective workload measures.
These two exploratory objectives ultimately focus on whether communication measures may be suitable for predicting team performance when demands are high and if they might supplement traditional measures in that respect.
Study Methods
Participants and Design
One hundred twenty participants, ages 18–47 (M = 22.9, SD = 3.7), were recruited from a large university in the southwestern United States. Forty teams of three participants were formed, and individuals were randomly assigned to the roles of Blue Leader, Red Leader, or Gold Leader. The experiment used a 2 × 2 mixed design, with Task Demand (High Demand/Low Demand) as the within-teams factor (order counterbalanced) and Team Interdependence (High Interdependence/Low Interdependence) as the between-teams factor. All teams completed two missions. Each session lasted approximately 90 minutes, and participants were compensated with a $25 digital gift card upon completion.
Simulation Environment
Task Design
Asset Attribute Comparison.
Note. Color/greyscale coding indicates relative advantage of each asset: green/light grey = big advantage, amber/medium grey = moderate advantage, red/dark grey = disadvantage.
Task Flow
First, an alarm would sound, and a red “X” would display the location of a potential fire. Next, an asset (e.g., helicopter) would be sent to confirm whether there was a fire at that location. A separate digital map would be marked to indicate whether the warning was correct. Next, the team needed to determine which houses were at risk of being burned down and collectively plan their tactics. Finally, the team would deploy and use fire trucks and bulldozers to fight the fire directly or indirectly, while also resupplying as needed. Teams could use different strategies, but the task flow generally followed the sequence shown in Figure 1.
Team task sequence.
Between-subject Manipulation: Team Interdependence
Two conditions of team interdependence were tested: High Interdependence and Low Interdependence. Interdependence was implemented by adapting the Co-active Design process proposed by Johnson et al. (2014), which consists of a joint task analysis followed by a systematic, iterative examination of the dependencies between roles based on the task parameters. Hard dependencies, in which a task cannot be completed without another teammate, generally increase the need for coordination. For example, fire trucks were the only assets that could extinguish fires, creating a hard dependency between teammates when all the fire trucks were controlled by one teammate. Soft dependencies allow for improvements in efficiency or performance via collaboration and backup behaviors. For example, fire trucks could be used for reconnaissance of fires, but helicopters were faster and improved the efficiency of this task. However, neither asset was strictly required for reconnaissance, regardless of who controlled it.
High Interdependence (Top) and Low Interdependence (Bottom) Condition Role Details.
Note. Dark grey boxes indicate the role has access to the associated information/task.
In the Low Interdependence condition, all three roles (Red, Blue, and Gold) were assigned the same assets (1 helicopter, 1 bulldozer, and 1 fire truck), received the same information from the simulation (fire warnings, wind status, and forecast), and could all complete the map marking task. In this condition, all the elements of the team task could be completed by one individual alone. However, containing the fires and achieving a good performance score in the High Demand condition required more than one individual because of the number and scale of the fires.
In the High Interdependence condition, each role had access to only one asset type (3 helicopters, 3 bulldozers, or 3 fire trucks) and heterogeneous access to environmental information, so a large portion of the task was classified as hard dependencies. All three teammates were needed to complete the team’s goals in both High and Low Demand conditions.
Within-Teams Manipulation: Demand
Demand was manipulated at the team level by changing the number of fires ignited during a mission (Figure 2). The task was also resource-limited at the team level because it depended on the distribution of team resources and necessitated collaboration among members.
Timeline of events in low demand versus high demand mission.
Note. Blue circle indicates wind direction change. Subdued blue circle indicates the time of the associated forecast. Red “X” indicates fire warning populated (correct warning). Subdued, dashed red “X” indicates fire warning populated (false alarm). Red flame indicates a new fire spawned in the environment.
In the High Demand mission, two fires were generated at each spawn point instead of one, with a moderate distance between them, and an additional fire was spawned 100 seconds after each of the main fires (Figure 2). To increase equivalency between missions, three of the fires appeared at the same times (0 sec, 300 sec, 600 sec) and at equivalent locations in both the High Demand and Low Demand missions; however, fire spawn points were geographically mirrored between missions so that participants would be less likely to anticipate them.
Materials and Apparatus
During the experiment, participants sat at a large desk, separated by partitions preventing them from communicating non-verbally or seeing one another’s computer screens. Six computers with a local area network and internet connectivity, keyboards, and mice were used. The first three computers were used as participants’ workstations with horizontal dual monitor setups. An interactive digital map implemented in Qualtrics and a secondary task window were displayed on the left monitor. The NFC simulation environment was displayed on the right monitor. A fourth computer hosted the simulation and other software, while the fifth and sixth computers remotely administered a secondary task for the other two participants. Four Epos / Sennheiser GSP 670 gaming audio headsets with an integrated microphone were used for communications.
Workstations were labeled by team member role (e.g., Red Leader). Quick reference sheets were attached to the left side of each participant’s workstation desk. These sheets included reminders of communication procedures, controls, and map marking rules. A paper compass rose indicating cardinal directions (north, south, east, and west) was taped to the right side of each workstation desk to assist with communicating directions in the task.
Procedure
Team Communication Best Practices Reinforced in Training.
Following training, participants completed two 12-min missions (one High Demand and one Low Demand), with order counterbalanced between teams. Workload questionnaires were administered after each mission. After both missions and their workload questionnaires, participants completed additional questionnaires, including a demographics questionnaire, a leadership questionnaire, and qualitative workload questions. Finally, participants were debriefed and compensated.
Measures
Team Performance
The team performance score was embedded within the NFC software (NFC Manual v1.40, Omodei et al., n.d.) and was based on the number of trees (−1 point each) and houses (−50 points each) destroyed by the fire. Scores began at 100% of the possible points in the scenario, and points were deducted for each tree and house burned during the scenario (see Supplemental Materials).
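As a minimal sketch of the scoring rule described above, the function below converts tree and house losses into a percentage of the possible points. The function name is hypothetical, and the scenario total (`max_points`) is an assumption supplied by the caller because the NFC scenario totals are not given here; this is an illustration of the arithmetic, not the NFC implementation.

```python
def team_performance_score(trees_burned, houses_burned, max_points):
    """Hypothetical percentage score: starts at 100% and loses points per loss.

    Deductions follow the text (1 point per tree, 50 points per house);
    `max_points` is an assumed scenario total supplied by the caller.
    """
    deductions = 1 * trees_burned + 50 * houses_burned
    return 100.0 * (max_points - deductions) / max_points
```

For example, in a hypothetical 1,000-point scenario, losing 100 trees and 2 houses deducts 200 points and leaves 80% of the possible points.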
Team Communication Measures
Team Communication Measures.
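The table of measure definitions is not reproduced here, but the Results describe Intensity as the time spent communicating and Frequency as the number of utterances, and report utterance Duration and Speech Rate. The sketch below computes those four quantities from a hypothetical utterance log. The `Utterance` record, the use of merged (non-overlapping) speaking time for Intensity, and words per second of speaking time for Speech Rate are assumptions for illustration rather than the study's exact operational definitions; Centralization and Complexity are omitted.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # role label, e.g. "Red"
    start: float  # seconds from mission start
    end: float    # seconds from mission start
    words: int    # number of transcribed words


def communication_measures(utts):
    """Return Intensity, Frequency, Duration, and Speech Rate for one mission."""
    frequency = len(utts)  # Frequency: number of utterances
    durations = [u.end - u.start for u in utts]
    total_talk = sum(durations)
    # Duration: mean utterance length in seconds.
    duration = total_talk / frequency if frequency else 0.0
    # Speech Rate (assumed definition): words per second of speaking time.
    speech_rate = sum(u.words for u in utts) / total_talk if total_talk else 0.0
    # Intensity (assumed definition): seconds in which at least one person
    # is speaking; overlapping utterances are merged so they count once.
    intensity = 0.0
    cur_start = cur_end = None
    for u in sorted(utts, key=lambda u: u.start):
        if cur_end is None or u.start > cur_end:
            if cur_end is not None:
                intensity += cur_end - cur_start
            cur_start, cur_end = u.start, u.end
        else:
            cur_end = max(cur_end, u.end)
    if cur_end is not None:
        intensity += cur_end - cur_start
    return {"Intensity": intensity, "Frequency": float(frequency),
            "Duration": duration, "SpeechRate": speech_rate}
```

Merging overlaps keeps Intensity bounded by the mission length even when teammates talk over one another; summing raw durations instead would be an equally plausible definition.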
Recurrence Quantification Analysis
An adaptation of categorical recurrence quantification analysis (RQA) was used to characterize team communication complexity. RQA is a set of techniques used to assess patterns of complex systems over time (Webber & Zbilut, 2005). RQA has been applied to characterize coordination in action teams, using a symbol-based categorical (i.e., discrete) approach that requires the coding of relevant variables prior to analysis (Gorman et al., 2012, 2020). For team workload specifically, prior studies suggest that communication pattern entropy decreases in higher demand conditions, but only when the semantic meaning of content is considered (Parker et al., 2016; Russell et al., 2012; Strang et al., 2012). Therefore, semantic content was coded (see description below) in addition to speaker states for RQA, using a primarily automated approach. In contrast to sequential approaches (Demir et al., 2023; Gorman et al., 2012; Russell et al., 2012), timing was retained by sampling at 0.2 Hz (one sample every 5 seconds), allowing comparison with other time series measures. This sampling rate was a compromise: a higher sampling rate (e.g., 1 Hz or more) might be overly sensitive to differences in timing and gaps in communication, making meaningful patterns difficult to detect, whereas a sequential approach would prevent direct comparison with other time series measures.
Speaker Coding
Speaker states were defined by the communication flow, that is, who was speaking during each time period. Overlapping communications were coded as a separate state. Speaker codes were sampled at 5-second intervals throughout each mission. If no one spoke during an interval, the interval was coded as zero and retained (Figure 3).
Recurrence quantification state space example.
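A minimal sketch of the speaker coding, assuming each utterance is available as a (speaker, start, end) tuple in seconds. Each role is given a hypothetical bit value so that any combination of simultaneous speakers in a 5-second bin maps to its own state, while silent bins stay 0. The bitmask scheme and the names `ROLE_BITS` and `speaker_states` are illustrative choices, not necessarily the coding used in the study.

```python
import math

# Hypothetical bit values: any set of simultaneous speakers gets a unique code.
ROLE_BITS = {"Red": 1, "Blue": 2, "Gold": 4}


def speaker_states(utterances, mission_len=720.0, bin_len=5.0):
    """Code who spoke in each bin: 0 = silence, otherwise the OR of role bits.

    `utterances` is an iterable of (speaker, start_sec, end_sec) tuples.
    Bins are half-open intervals [k*bin_len, (k+1)*bin_len).
    """
    n_bins = int(mission_len // bin_len)  # 720 / 5 -> 144 bins
    states = [0] * n_bins
    for speaker, start, end in utterances:
        first = int(start // bin_len)
        if first >= n_bins:
            continue  # utterance starts after the mission window
        last = max(first, min(math.ceil(end / bin_len) - 1, n_bins - 1))
        for k in range(first, last + 1):
            states[k] |= ROLE_BITS[speaker]
    return states
```

With three roles this yields at most eight speaker states (silence, three single speakers, and four overlap combinations).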
Content Coding via NLP
Natural language processing (NLP) techniques were used to code concepts in the speech. First, utterances were split into individual words, called tokens, using the R package tidytext (Silge & Robinson, 2016). The tokens were then lemmatized (transformed into their base forms) with the R package textstem (Rinker, 2018). Word embeddings from a pre-trained model (Google Code Archive, n.d.) containing 300-dimensional vectors for 3 million words and phrases were used to quantify semantic meaning. K-means clustering was then used to group words into clusters based on the semantic similarity of their embeddings. Five clusters were selected using the elbow method on the within-groups sum of squares, and each word was assigned to its cluster; these clusters represented the conceptual states in the dialogue.
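The clustering and elbow steps can be sketched as follows. To stay self-contained, the example uses pure Python, toy 2-D points in place of the 300-dimensional pre-trained embeddings, and a deterministic farthest-point initialization as a stand-in for the random or k-means++ initialization a library would typically use; all function names are hypothetical. The elbow method inspects the within-cluster sum of squares (WSS) as k grows and picks the k after which further reductions level off.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (labels, within-cluster sum of squares)."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss


# Toy stand-in for word embeddings: three well-separated groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0),
       (0, 10), (0, 11), (1, 10)]
curve = {k: kmeans(pts, k)[1] for k in range(1, 5)}  # WSS for each k
```

On these toy points the WSS drops sharply up to k = 3 and then levels off, which is the elbow; the same kind of inspection on the real embeddings is what led to five clusters in the study.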
The time series of conceptual codes was also sampled in 5-second intervals. If more than one code was present in an interval, a separate code was generated to represent that combination of codes. Conceptual codes were multiplied by 100 and then combined with the speaker state symbol for the associated time interval, creating a unique combined state symbol for each combination of speaker and content codes. The resulting state series for each mission contained 144 symbols (720-sec mission / 5-sec sampling intervals), with 32 total combined states (see Figures 3 and 4 for an example).
Example symbolic state series of speaker and role combinations.
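The combination rule can be sketched as below, assuming per-bin speaker codes like those described under Speaker Coding and, for each bin, the set of concept-cluster ids detected. Numbering multi-cluster combinations after the single-cluster codes, in order of first appearance, is a hypothetical scheme. The multiply-by-100 step follows the text and keeps every speaker and content pairing unique as long as speaker codes stay below 100.

```python
def combined_states(speaker_codes, content_sets, n_content_clusters=5):
    """Combine per-bin speaker codes with per-bin content codes.

    content_sets[k] is the set of concept-cluster ids (1..n) heard in bin k.
    A single cluster keeps its id; each new combination of clusters gets a
    new code numbered after the single-cluster codes (hypothetical scheme).
    """
    combo_codes = {}
    out = []
    for spk, clusters in zip(speaker_codes, content_sets):
        if not clusters:
            content = 0  # no speech content in this bin
        elif len(clusters) == 1:
            content = next(iter(clusters))
        else:
            key = frozenset(clusters)
            if key not in combo_codes:
                combo_codes[key] = n_content_clusters + 1 + len(combo_codes)
            content = combo_codes[key]
        out.append(content * 100 + spk)  # content code x 100 + speaker code
    return out
```

For example, a bin in which speakers coded 3 used words from clusters 1 and 4 becomes 603 under this numbering, and the same cluster pair in a later bin reuses code 6.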
Categorical Entropy
RQA produces several measures, one of which is entropy, a measure of system complexity that in RQA is often computed as the Shannon entropy of the frequency distribution of diagonal lines in a recurrence plot (Figure 5). Recent research suggests that categorical entropy (catH) may be better suited to estimating information entropy in categorical time series, such as those in this study (Leonardi, 2018). Categorical entropy is calculated as the Shannon entropy of the distribution of the areas of the rectangles present in the recurrence plot.
Example recurrence plot associated with the combined state series in Figure 4.
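For reference, the classic diagonal-line entropy mentioned above can be sketched in pure Python as follows. This is not catH: categorical entropy as computed by crqa uses the distribution of rectangle areas (Leonardi, 2018), which is not reproduced here. The sketch builds a categorical recurrence plot (a point is recurrent when two states are identical), skips the line of identity, scans only the upper triangle because the plot is symmetric, counts diagonal lines of at least `min_line` points, and returns the Shannon entropy in natural-log units; these choices and the function names are assumptions of the sketch.

```python
import math
from collections import Counter


def recurrence_matrix(series):
    """Categorical recurrence: R[i][j] = 1 when the two states are identical."""
    return [[int(a == b) for b in series] for a in series]


def diagonal_line_entropy(series, min_line=2):
    """Shannon entropy (natural log) of diagonal line lengths >= min_line."""
    R = recurrence_matrix(series)
    n = len(series)
    lengths = Counter()
    for offset in range(1, n):  # offset 0 would be the line of identity
        run = 0
        for i in range(n - offset):
            if R[i][i + offset]:
                run += 1
            else:
                if run >= min_line:
                    lengths[run] += 1
                run = 0
        if run >= min_line:  # close a line that reaches the plot edge
            lengths[run] += 1
    total = sum(lengths.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in lengths.values())
```

For the alternating series 1, 2, 1, 2, 1, 2 the upper triangle contains one line of length 4 and one of length 2, so the entropy is ln 2.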
The R package crqa (Coco & Dale, 2014) was used to compute categorical entropy (catH), the focus of this analysis. Following common convention for categorical RQA, the minimum diagonal line length was set to 2, delay = 1, embedding dimension = 1, metric = “Euclidean,” and data type = “categorical.” RQA metrics are generated from characteristics of the recurrence plot (Webber & Zbilut, 2005). See Figure 5 for an example.
Workload Questionnaires
Workload Questionnaires.
Note. The NASA-TLX items are also included within the TWLQ.
Results
Communications Data Preparation
Team Communication Measures Descriptive Statistics.
Two teams were removed from the analyses. One team was excluded due to very little overall communication, and another was excluded because one of the participants did not communicate at all during one of the missions, and very little during the other. These exclusions were necessary to ensure the quality of the communication data for a meaningful analysis. See Table 6 for descriptive statistics for each communication measure.
Communication Measure Multivariate Effects
A repeated measures Multivariate Analysis of Variance (MANOVA), with Demand as the within-teams factor and Interdependence as the between-teams factor, was conducted on six team communication dependent variables (DVs): Intensity, Frequency, Duration, Speech Rate, Centralization, and Complexity. There was a significant multivariate effect of Demand on the combined dependent variables, Pillai’s Trace = 0.354, F(6, 30) = 2.74, p = .030, ηp² = .192. There was also a significant multivariate effect of Interdependence on the combined dependent variables, Pillai’s Trace = 0.528, F(6, 30) = 5.59, p < .001, ηp² = .786. The interaction between Demand and Interdependence did not reach statistical significance, Pillai’s Trace = 0.238, F(6, 30) = 1.56, p = .194, ηp² = .052. These results suggest that both Demand and Interdependence significantly influenced the combined team communication measures.
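Because both factors have two levels, each multivariate effect has a single hypothesis degree of freedom (s = 1), and Pillai’s trace V then converts exactly to F = (V / (1 − V)) × (df2 / p), where p is the number of dependent variables and df2 is the denominator degrees of freedom. The short check below, using a hypothetical helper name, reproduces the reported F values from the reported traces with p = 6 and df2 = 30.

```python
def pillai_to_f(v, p, df2):
    """Exact F for Pillai's trace when the effect has one hypothesis df (s = 1).

    v   : Pillai's trace
    p   : number of dependent variables (numerator df)
    df2 : denominator degrees of freedom
    """
    return (v / (1.0 - v)) * (df2 / p)


for label, trace in [("Demand", 0.354), ("Interdependence", 0.528),
                     ("Interaction", 0.238)]:
    print(label, round(pillai_to_f(trace, 6, 30), 2))
```

The printed values (2.74, 5.59, and 1.56) match the F statistics reported above, which is a quick consistency check on the transcription of the MANOVA results.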
Communication Measures Univariate Effects
Summary of 2 × 2 Split Plot ANOVAs for Communication Measure Dependent Variables.
Note. *** <.001, ** <.01, * <.05. Bold font is used to highlight statistically significant results (p<.05) to improve readability.

Team communication measures across demand and interdependence conditions.
Note. Error bars indicate 95% CI of the mean.
Summary of Results (H1–H4)
Communication Quantities and Demand (H1)
Table 7 and Figure 6 show that, partially supporting H1, the time spent communicating (Intensity) increased with higher demands, but the average number of utterances (Frequency) did not. Teams in the High Interdependence condition also tended to talk more (greater Intensity and Frequency), which is unsurprising given the additional hard dependencies that required more coordination between roles.
Message Lengths and Demands (H2)
In High Demand missions, team utterance durations tended to increase and team speech rates decreased, failing to support H2 (Table 7 and Figure 6). These unexpected findings are complex and are considered further in the discussion.
Centralization and Demands (H3)
A significant interaction between Demand and Interdependence was found for Centralization (Table 7 and Figure 6). Post-hoc contrasts indicated no difference between levels of Demand when Interdependence was Low, t(35) = 1.20, p = .24. Conversely, when Interdependence was High, Centralization differed significantly between levels of Demand, t(35) = −2.52, p = .016. In other words, High Interdependence teams had more centralized communications in the High Demand mission, whereas the centralization of Low Interdependence teams stayed about the same between missions. These results indicate a relationship between task demands and centralization; however, it was not in the direction predicted by H3, and only High Interdependence teams differed significantly between demand conditions.
Complexity and Demands (H4)
Team communication Complexity was lower in the High Demand condition, supporting H4 (Table 7 and Figure 6). Furthermore, results revealed that communication complexity tended to be higher in the Low Interdependence teams.
Across findings for H1–H4, our results indicate that several communication measures can be linked to high levels of team demands, perhaps laying the groundwork for communication-based team workload measures.
Exploratory Analysis of Communication Demand Question
Pearson correlations were computed to examine the correspondence between communication quantities (Frequency and Intensity) and responses to the TWLQ Communication Demand item, averaged at the team level (as is common practice). Intensity and Frequency were computed separately for the High and Low Demand conditions for comparison.
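The team-level aggregation and correlation described above can be sketched as follows, using hypothetical ratings and Intensity values for three teams. Individual Communication Demand ratings are averaged within each team, and the team means are then correlated with a team-level communication measure. The Pearson formula is written out rather than relying on `statistics.correlation`, which requires Python 3.10 or later; the helper names and data are illustrative only.

```python
import math
from statistics import mean


def team_means(ratings_by_team):
    """Average each team's individual item ratings (one value per member)."""
    return {team: mean(r) for team, r in ratings_by_team.items()}


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: three members' ratings per team and team Intensity (sec).
ratings = {"T1": [50, 60, 70], "T2": [30, 40, 50], "T3": [80, 90, 70]}
intensity = {"T1": 300.0, "T2": 200.0, "T3": 400.0}
means = team_means(ratings)
teams = sorted(ratings)
r = pearson_r([means[t] for t in teams], [intensity[t] for t in teams])
```

Averaging discards within-team spread, so two teams with very different rating patterns can share the same mean; this is the aggregation assumption discussed in the Introduction.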
Correlations Between TWLQ “Communication Demand” Item and Communication Behaviors.
Note. *** <.001, ** <.01, * <.05. Bold font is used to highlight statistically significant results (p<.05) to improve readability.
Communication Measures and Team Performance
Exploratory Objective 1 examines how communication measures relate to team performance under high demand conditions. First, after the data were screened for collinearity, correlations were conducted to determine whether each variable was individually related to team performance, followed by a multiple regression with semi-partial correlations to identify the best joint predictors of team performance in the High Demand mission.
Individual Variables and Performance Regressions
Several regression models were fit to evaluate which communication measures were individually related to team performance under High Demand conditions. Each model included one communication variable and the Interdependence term as predictors, with Team Performance as the outcome variable.
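Each per-measure model can be sketched as an ordinary least squares fit of Team Performance on one communication measure plus an Interdependence indicator. This is a minimal, dependency-free illustration, assuming Interdependence is coded 0 (Low) / 1 (High); the helper names are hypothetical, and the study may have used different software and coding.

```python
def ols(columns, y):
    """OLS of y on an intercept plus predictor columns.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    Solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination.
    """
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda row: abs(A[row][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def performance_model(comm, interdependence, performance):
    """Team Performance ~ one communication measure + Interdependence (0/1)."""
    beta, r2 = ols([comm, interdependence], performance)
    n, k = len(performance), 2
    df = (k, n - k - 1)  # df = (2, 34) corresponds to n = 37 with k = 2
    f_stat = (r2 / df[0]) / ((1.0 - r2) / df[1])  # overall model F statistic
    return beta, r2, f_stat, df
```

The F statistic's p-value would come from the upper tail of the F(2, n − 3) distribution, which is omitted here to keep the sketch self-contained.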
Regression Results for Each Communication Measure in the High Demand Mission.
Note. df = (2,34). *** <.001, ** <.01, * <.05. Bold font is used to highlight statistically significant results (p<.05) to improve readability.
Multiple Regression
A multiple regression analysis was then conducted to evaluate predictors of team performance identified in the previous steps in the High Demand mission. Semi-partial correlations were calculated for each of the communication variables quantifying the unique contribution of each variable to the prediction of team performance, controlling for the other variables in the model.
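The semi-partial correlation described above can be computed from R² differences: the squared semi-partial correlation for predictor j equals R² of the full model minus R² of the model without j, signed by j's coefficient in the full model. The sketch below is a minimal stdlib illustration of that identity; `ols` and `semi_partial` are hypothetical helper names, not the study's code.

```python
def ols(columns, y):
    """OLS of y on an intercept plus predictor columns; returns (coefficients, R^2)."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def semi_partial(columns, y):
    """Signed semi-partial correlation of each predictor with y.

    sr_j^2 = R^2(full) - R^2(full minus predictor j): the variance in y that
    predictor j explains uniquely, controlling the other predictors.
    """
    beta_full, r2_full = ols(columns, y)
    result = []
    for j in range(len(columns)):
        others = columns[:j] + columns[j + 1:]
        _, r2_reduced = ols(others, y)
        unique = max(r2_full - r2_reduced, 0.0)  # guard against tiny negative rounding
        sign = 1.0 if beta_full[j + 1] >= 0 else -1.0
        result.append(sign * unique ** 0.5)
    return result
```

With orthogonal predictors the squared semi-partials sum to the full-model R²; with correlated predictors (the usual case) they sum to less, because shared variance is credited to no single predictor.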
The variables included in the final model were determined based on (1) significant or near-significant relationships with performance in the previous analysis and (2) collinearity diagnostics. The Duration variable was removed because it did not show a relationship with team performance. Collinearity diagnostics indicated that Frequency was highly collinear with both Intensity (r = .84) and Complexity (r = −.89) and also inflated the Variance Inflation Factor (VIF) of both Frequency and Duration far beyond the conventional cut-off (VIF > 10) when it was included in the model. Therefore, both Frequency and Duration were dropped from the final model in favor of Intensity, which captured most of the variance of those two measures.
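The VIF diagnostic referred to above is VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all other predictors. A minimal stdlib sketch follows; the helper names and example values are hypothetical, and the VIF > 10 threshold is the conventional cut-off mentioned in the text.

```python
def r_squared(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Variance inflation factor for each predictor column."""
    out = []
    for j, target in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        r2_j = r_squared(others, target)  # how well the other predictors explain predictor j
        out.append(float("inf") if r2_j >= 1.0 else 1.0 / (1.0 - r2_j))
    return out


# Hypothetical example: two nearly collinear measures both exceed the cut-off.
flagged = [v > 10 for v in vif([[1, 2, 3, 4, 5], [1.1, 2.0, 2.9, 4.2, 4.8]])]
```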
High Demand Mission Multiple Regression Results.
Note. *** <.001, ** <.01, * <.05. Bold font is used to highlight statistically significant results (p<.05) to improve readability.
Overall, the results for Exploratory Objective 1 suggest links between communication variables and team performance under high demand when each variable is examined in isolation (Table 9). However, these links become less clear when the communication variables are analyzed jointly (Table 10).
Communication Measures as a Supplement to Subjective Workload Measures
Exploratory Objective 2 evaluates how the findings thus far combine, asking: Do the communication measures provide information about team performance under high demand beyond traditional workload questionnaires?
Correlations Between Measures
Correlations Between Communication Measures and Workload Questionnaires.
Note. *** <.001, ** <.01, * <.05. Bold font is used to highlight statistically significant results (p<.05) to improve readability.
Speech Rate was negatively correlated with all three team-level workload questionnaire averages (TLX Avg, TWLQ Avg, RS-TLX Avg), both overall and in the Low Demand mission. In the High Demand mission, only TWLQ Avg was significantly correlated with Speech Rate, and the relationship was weaker. These results suggest that slower speech is associated with higher perceived workload averages; otherwise, no apparent collinearity exists between the communication measures and traditional team workload measures.
Communication Measures and Questionnaires
Structures of Regression Models Predicting Team Performance With Communication Measures Arranged From Least to Most Difficult Data Types to Collect.
Note. Each model requires the same data types as all preceding models, in addition to the “additional data requirements” listed for that model. For example, Model 3 requires questionnaires aggregated as average and CV, utterance durations, and speaker identities.
Hierarchical Regression Results in High Demand Mission.
Note. *** <.001, ** <.01, * <.05. Bold font is used to highlight statistically significant results (p<.05) to improve readability.
The results of the hierarchical regression (based on significant ΔR²) suggest that the communication measures explain additional variance in team performance beyond the questionnaires under high demand, complementing the mean- and variance-based aggregation of traditional workload questionnaires in team performance prediction. Notably, much of the additional explanatory power can be attributed to the relatively easy-to-implement, content-free flow measures (Intensity and Centralization) in Models 2 and 3, although Speech Rate contributed marginally in Model 4. The Complexity measure did not provide additional information above and beyond the other measures.
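The ΔR² test behind each hierarchical step compares nested models: F_change = (ΔR² / df1) / ((1 − R²_full) / df2), with df1 equal to the number of added predictors and df2 = n − k_full − 1. The sketch below shows that arithmetic together with adjusted R²; the function names and example inputs are hypothetical, and the R² values would come from fitting each model in the sequence.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fit to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def r2_change_test(r2_reduced, r2_full, n, k_reduced, k_full):
    """F test for the R^2 gained by adding predictors to a nested model.

    Returns (delta_r2, f_change, (df1, df2)).
    """
    if k_full <= k_reduced:
        raise ValueError("the full model must add at least one predictor")
    delta = r2_full - r2_reduced
    df1, df2 = k_full - k_reduced, n - k_full - 1
    f_change = (delta / df1) / ((1.0 - r2_full) / df2)
    return delta, f_change, (df1, df2)


# Hypothetical step: 2 predictors -> 4 predictors, n = 21 observations.
delta, f_change, dfs = r2_change_test(0.2, 0.5, n=21, k_reduced=2, k_full=4)
```

If SciPy is available, the p-value for the step is `scipy.stats.f.sf(f_change, *dfs)`. Adjusted R² is reported alongside raw R² because adding predictors can never lower raw R², whereas the adjustment penalizes each added term.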
These results suggest that communication measures can contribute to predictions of team performance under high demand conditions beyond workload questionnaire measures. Their explanatory power does not appear to be collinear with traditional subjective workload questionnaires; rather, they provide different, supplemental information instead of a straightforward replacement.
Discussion
The aim of this study was to test whether communication-based team interaction measures (e.g., Cooke & Gorman, 2009) can be used to assess team workload in action teams. The current study consolidated hypotheses generated from previous findings (Entin & Serfaty, 1999; Gervits et al., 2016; Grote et al., 2010; Stanton & Roberts, 2020) and is unique in that team-level demands were manipulated with the explicit goal of assessing their effect on team-level communication characteristics. Several characteristics of team communications were indeed significantly different between high and low demand task conditions, including the quantity of communication, the duration of utterances, the centralization of the communication, the complexity of the communication patterns, and the rate of speech. The results also suggest that team interdependence plays an important part in determining how communication characteristics adapt as demands change.
Increased Demands Increased Communication Time, but not Frequency
Teams tended to spend more mission time communicating in the high demand mission; however, they did not produce more distinct utterances (“Frequency”). One reason utterance frequency did not increase is that teams may need to convey more information in each utterance as the number of concurrent tasks increases with task demands. In the current study, the team itself was often required to “multitask” and spread its resources when demands were greater; this could have resulted in more complex communications, which take longer to convey and are not necessarily broken up into separate utterances. Consistent with this interpretation, Gervits et al. (2016) and Khawaja et al. (2012) found that higher demand conditions are associated with an increased number of words spoken by the team.
An exploratory analysis also revealed that the amount participants communicated was not strongly related to their ratings of communication demands. Perhaps unsurprisingly, this indicates that supplementing standard team workload measures with behavioral measures may help more accurately measure the role of team communication demand as a part of team workload.
Teams that spent more mission time talking tended to perform better in the high demand condition. This finding is in line with previous work (Gervits et al., 2016; Grote et al., 2010) and highlights how task context can impact any hypothesized relationships between communication characteristics and demand loads. In our study, increases in task demands corresponded with a greater need to coordinate. However, implicit coordination was difficult due to a lack of visibility between participants and very limited time spent working together. Thus, a similar trend may apply to other teams for which implicit coordination is difficult, such as geographically distributed action teams. In contrast, other research on co-located and experienced teams suggests a shift from explicit coordination to implicit coordination modes during periods of high demand, presumably to free up individual cognitive resources (Entin & Serfaty, 1999; Grote et al., 2010; MacMillan, Entin, et al., 2004).
Teams Communicated Slower and Longer when Demands were Greater
When the number of things that must be accomplished increases under time constraints, people may increase the rate at which they perform certain tasks to maintain overall performance (Wickens & Tsang, 2015). In the context of teaming, this includes communicating with teammates. Counter to our H2 predictions, under high demand teams actually had longer utterances (in seconds) on average, and they tended to talk more slowly (a lower rate of word production). The literature does show that some teams extend their utterances when demand increases; however, those studies used measures that did not account for time. Namely, Gervits et al. (2016) found that increases in demands were associated with more words per utterance, and Khawaja et al. (2012) found a similar increase in the number of words per sentence. Unfortunately, neither measure can be computed from our data because discrete sentences could not be reliably identified in the transcripts.
Our findings regarding utterance length explain how time spent speaking increased, but utterance frequency did not; messages were simply longer and may have contained more information, a response to an increase in concurrent team demands. However, that finding fails to explain why utterances were also slower. One explanation is that teams were required to balance their mental resources between teamwork and taskwork, leaving fewer resources for communications, which resulted in slower speech. Supplementary analysis of the TWLQ indicates that the teamwork-taskwork balancing sub-dimension scores were indeed significantly higher in the high demand mission compared to the low demand mission (See Supplemental Materials).
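The arithmetic linking these measures can be made explicit: total speaking time equals Frequency multiplied by mean utterance Duration, so Intensity can rise while Frequency stays flat if utterances lengthen. The sketch below assumes non-overlapping utterances, Intensity defined as speaking time divided by mission time, and Speech Rate defined as words per second of speaking time; these definitions and the `Utterance` record are illustrative assumptions, not the study's exact operationalizations.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    start_s: float  # onset, seconds from mission start
    end_s: float    # offset, seconds from mission start
    n_words: int    # word count from the verified transcript


def communication_summary(utterances, mission_s):
    """Frequency, mean Duration, Intensity, and Speech Rate for one team and mission."""
    frequency = len(utterances)
    speaking_s = sum(u.end_s - u.start_s for u in utterances)  # assumes no overlapping speech
    words = sum(u.n_words for u in utterances)
    return {
        "frequency": frequency,
        "mean_duration_s": speaking_s / frequency if frequency else 0.0,
        "intensity": speaking_s / mission_s,  # share of mission time spent speaking
        "speech_rate_wps": words / speaking_s if speaking_s else 0.0,
    }


utts = [Utterance("A", 0, 2, 4), Utterance("B", 5, 8, 6), Utterance("C", 10, 11, 3)]
summary = communication_summary(utts, mission_s=20)
```

Doubling every utterance's duration while keeping its word count would double Intensity and halve Speech Rate with Frequency unchanged, which is the qualitative pattern described above.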
It may be the case that participants both spoke more slowly and provided more detail in high demand conditions because they were attempting to avoid errors and to ensure clarity in any communications that could harm team performance if missed. Research in pragmatics suggests that people often proactively adjust their communication to prevent potential errors and ensure clarity (Clark & Brennan, 1991; Sacks et al., 1974). In the current context, it may have manifested itself as more deliberate speech and the provision of additional details to account for higher task demands and the increased possibility of communication breakdowns.
Higher Team Interdependence Increased Centralization of Communication
Despite predicting that teams under higher demands would have decentralized or “flattened” communications (Barth et al., 2015; Grote et al., 2010; Roberts et al., 2019; Stanton & Roberts, 2020b), the current study found no evidence of that effect. Instead, high interdependence teams actually centralized their communications further, whereas low interdependence teams tended to maintain the same level of centralization. Decentralization under load, as reported by others (Barth et al., 2015; Grote et al., 2010; Roberts et al., 2019; Stanton & Roberts, 2020b), may not be universal across action teams; it may instead depend more directly on how individual requirements to communicate and coordinate are distributed throughout the team (Stanton et al., 2017). In the high interdependence teams in this study, a larger proportion of the coordination activity fell on the Gold Leader, who controlled the helicopters used to identify potential fires. Follow-on analysis confirmed that the Gold Leader experienced the greatest increase in communication behaviors (See Supplemental Materials). In contrast, the low interdependence teams had no set role allocations and no reason for one role to carry a higher coordination burden than others.
When team performance was considered, the results also show that effective teams had relatively lower levels of centralization when demands increased, regardless of interdependence level. This may indicate that everyone in the team was doing their part to help coordinate, engage in sensemaking, or conduct interdependent tasks, and that team members were engaging their individual resources as a collective unit.
Higher Demands Reduced Communication Complexity
This study introduced an automated method of conducting recurrence quantification analysis (RQA) that incorporates team communication flow and content using NLP. It builds upon prior research (Demir et al., 2023; Gorman et al., 2012; Russell et al., 2012; Strang et al., 2012) but adopts a “true” time series approach that allows RQA to be extended to continuous assessment. Team interaction patterns were less complex when demands were high, supporting previous work (Parker et al., 2016; Russell et al., 2012; Strang, Horwood, et al., 2012). The reduction could be a result of increased negative constraints (e.g., task demands, cognitive load) that cause dynamical systems to lose complexity, as suggested by Strang et al. (2012), citing Goldberger et al. (1990). In the current context, an increase in tasks constrained the team’s potential actions in time, and this configuration may have led the teams to self-organize into less complex coordination patterns. Further, the high interdependence teams tended to have lower interaction complexity; because they had fewer degrees of freedom in their potential actions, this too aligns with the negative constraints hypothesis.
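As a rough illustration of the RQA quantities involved, the sketch below computes recurrence rate (RR) and determinism (DET) for a categorical state sequence, where a point (i, j) is recurrent when the states at times i and j match. It assumes states are already integer-coded (hypothetically, combined speaker and NLP-cluster labels, with 0 for silence); it is not the study's pipeline, and how the study's Complexity score was derived from RQA output is not reproduced here.

```python
def rqa_categorical(states, l_min=2):
    """Recurrence rate (RR) and determinism (DET) for a categorical sequence.

    Only the upper triangle (j > i) of the recurrence plot is scanned: the plot
    is symmetric and the main diagonal is trivially recurrent, so it is excluded.
    """
    n = len(states)
    if n < 2:
        raise ValueError("need at least two observations")
    recurrent = 0  # recurrent points above the main diagonal
    on_lines = 0   # recurrent points on diagonal lines of length >= l_min
    for offset in range(1, n):
        run = 0
        for i in range(n - offset):
            if states[i] == states[i + offset]:
                run += 1
                recurrent += 1
            else:
                if run >= l_min:
                    on_lines += run
                run = 0
        if run >= l_min:  # close a line that reaches the edge of the plot
            on_lines += run
    rr = recurrent / (n * (n - 1) / 2)
    det = on_lines / recurrent if recurrent else 0.0
    return rr, det


periodic = rqa_categorical([1, 2, 1, 2, 1, 2])  # strictly repeating pattern
irregular = rqa_categorical([1, 2, 3, 1])       # a single isolated recurrence
```

Higher DET means more recurrences fall on diagonal lines, i.e., repeated sequences of states; this is commonly read as more regular, less complex patterning, although the exact mapping to a complexity score depends on the chosen RQA statistic.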
Finally, teams who had less complex communications tended to perform better in the high demand mission, showing some association with team performance. Thus, the use of team communication pattern complexity measures shows promise for detecting meaningful changes in demand at the team-level and associated changes in team performance.
Communication Measures of Workload Improved Prediction of Performance
This study provides new evidence that communication-based team interaction measures improved predictions of team performance beyond traditional workload questionnaires under high demand conditions (Exploratory Objective 2). Questionnaires aggregated with both average and variance approaches (CV) only accounted for 23% (adjusted R² = .15) of the variance in team performance during the high demand mission. However, this explanatory power more than doubled to 49% (adjusted R² = .39) with the inclusion of just three communication measures (Intensity, Centralization, and Speech Rate).
Models of team performance in the high demand mission were improved most substantially by adding content-free team communication measures (Intensity and Centralization) and, to a lesser extent, Speech Rate. These relatively accessible communication measures did not overlap with existing workload measures and can provide utility beyond traditional subjective measures when the goal is to predict team performance under high demand conditions.
Complexity, the measure that requires the most labor-intensive data, was a significant predictor when analyzed in isolation but did not account for significant additional variance when added on top of Intensity, Centralization, and Speech Rate. This indicates that Complexity, as it was computed for this study, may overlap substantially with these other measures, but it also has potential as a useful standalone measure deserving of further investigation.
Limitations and Future Directions
This study suggests that communication measures of team workload have the potential to be a powerful tool for the measurement of team workload with further development. Our findings for H1–H4 suggest that several communication measures correlate with high levels of demands placed on teams, indicating their potential as interaction-based team workload indicators. These correlations provide initial evidence that communication metrics reflect team workload. However, these measures currently indicate relative shifts rather than absolute workload levels. To be most useful, workload measures need to quantify team workload on an absolute scale. Developing such a scale would enable precise assessments and comparisons across different contexts and teams, capturing not only relative changes but also establishing baseline levels and thresholds for various workload states. Future research should aim to refine communication measures to function on an absolute scale, ensuring reliable workload quantification. This advancement is crucial for applying these measures in real-time settings and for designing interventions to optimize team performance and safety under high demand conditions.
In addition, there is a conceptual challenge inherent in disentangling workload and team performance. In interdependent team tasks, communication is required for performance, and the characteristics of communication are likely to have some relationship to team performance based on the context, regardless of workload levels. The primary comparisons for assessing the link between team workload and communication measures were the experimentally manipulated demand conditions and the workload questionnaires. Although neither the questionnaires nor the manipulations are assumed to represent the construct of team workload perfectly, they act as proxies for exploring this multifaceted phenomenon from different perspectives.
Our findings indicated that performance was statistically related to questionnaire scores, consistent with previous research. Although we identified some relationships between communication measures and team performance, there was not a strong direct link between workload questionnaires and communication measures.
Interestingly, the communication measures were most effective as predictors of team performance under load when statistically combined with the workload scores. This suggests that communication measures and workload scores capture different aspects of the relationship between team performance and workload, warranting further exploration in future studies. Specifically, future work should aim to clarify the constructs of team workload and team performance in action teams and consider including additional workload proxies, such as secondary tasks or physiological measures, to assess convergent validity.
The current study relied on communication measures summarized at the mission level as a first step toward understanding team workload. These communication measures may also be suitable for real-time capture and analysis via embedded technology (e.g., communication analysis in a military vehicle crew). Communication measures may be sensitive to fluctuations in workload that are simply not captured by other measures, and they can be implemented using time series methods such as a sliding window technique (Gorman et al., 2012, 2020). Other analytical approaches may also be worth exploring in the context of team workload, including techniques developed by Wiltshire et al. (2018) for detecting phase transitions via sample entropy. These and other related techniques (e.g., Grimm et al., 2023) may be useful for identifying team workload transitions, which could serve as leverage points for improving action team performance. Automated detection methods could then be used to trigger supporting interventions, inform the reallocation of resources, or support training and assessment by quantifying team responses to changing demands. These methods were not tested here but are encouraged in future work.
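A sliding window approach of the kind mentioned above can be sketched by recomputing a recurrence statistic over successive windows of the coded state sequence. This minimal illustration tracks only recurrence rate; the `window` and `step` parameters are hypothetical choices, and, as noted above, the study did not test such methods.

```python
def recurrence_rate(states):
    """Share of matching state pairs (i < j) within a categorical sequence."""
    n = len(states)
    pairs = n * (n - 1) / 2
    hits = sum(1 for i in range(n) for j in range(i + 1, n) if states[i] == states[j])
    return hits / pairs


def sliding_rr(states, window, step):
    """Recurrence rate in successive windows as (window_start_index, RR) pairs."""
    if window < 2 or step < 1:
        raise ValueError("window must be >= 2 and step >= 1")
    return [(start, recurrence_rate(states[start:start + window]))
            for start in range(0, len(states) - window + 1, step)]


# Hypothetical coded states: five silent intervals followed by five distinct states.
series = [0, 0, 0, 0, 0, 1, 2, 3, 4, 5]
non_overlapping = sliding_rr(series, window=5, step=5)
overlapping = sliding_rr(series, window=5, step=1)
```

Overlapping windows (step smaller than window) give a smoother trace at the cost of correlated successive estimates; the window length trades temporal resolution against the stability of each estimate.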
Communication-based measures can provide rich contextual information about team interactions and processes. However, they can also be time-consuming to collect and analyze when relying on transcription and content coding (Hasan et al., 2022; Jahangir et al., 2021). In this study, automatic transcription of the speaker’s identity and words spoken was accomplished using Otter.ai. However, these automated transcripts were deemed too inaccurate to analyze without manual verification and correction. Although there is progress being made in automatic speaker detection and automatic behavioral coding in various team contexts (e.g., Hasan et al., 2022; Jahangir et al., 2021), further advancements are essential for the real-time implementation of communication measures, especially those that rely on content such as speech rate and complexity measures used in this study.
Physiological measures have some relative advantages over communication measures for real-time implementation as they can capture data at a high sampling rate, potentially improving the continuous tracking of team workload during periods of silence where the primary form of team communication and coordination may be non-verbal. Recent work has demonstrated that physiological measures can be applied to teams (e.g., Dias et al., 2019; Halgas et al., 2023; Kazi et al., 2021). However, they are vulnerable to environmental disturbances, can be dislodged during use, require expensive hardware, and need significant computational power. Thus, communication measures may have a particular advantage over physiological measures in action teams where the primary forms of communication and coordination are verbal, where collection via embedded technology is possible (e.g., a military vehicle crew intercom system), and in settings where physiological sensors are otherwise unsuitable. As both communication and physiological measures of team workload continue to mature, their relative strengths and weaknesses should continue to be evaluated.
A challenge of the NLP technique used in the current study is that it required a fixed number of clusters to be specified. Five clusters were chosen based on the sum of squares elbow method; however, this still requires judgment on the researcher’s part. Future work could take an approach similar to Parker et al. (2016) to evaluate how different cluster counts and clustering strategies affect the sensitivity of the complexity measure. In addition, communication states were sampled over 5-second intervals, and intervals in which no one spoke were coded as 0. This can potentially inflate RR and DET values if non-speaking “dead air” states constitute a large portion of the communication data. For this reason, some applications of RQA remove (Gorman et al., 2012) or randomize (Simpson et al., 2022) dead air states. However, neither mitigation strategy was deemed appropriate for this study, because neither could also be applied to a continuous measure in future work (e.g., sliding window approaches; Gorman et al., 2020). Furthermore, the risk of dead air causing spurious results was mitigated by several factors, including the relatively low sampling rate and the high prevalence of speaking states (on average, someone was speaking during about 20–39% of mission time, depending on condition). Retaining dead air states also allowed us to incorporate potentially meaningful information provided by breaks in communication into the complexity measure. Future work should evaluate how different sampling frequencies affect results and whether sequential approaches yield results similar to those found here.
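The interval coding described above can be sketched as follows: the mission is divided into fixed 5-second bins, and each bin receives a state code, or 0 when no one spoke. The rule used here (the utterance with the largest overlap wins a bin) and the integer codes are assumptions for illustration; the study's exact assignment rule is not described in this section.

```python
import math


def bin_states(utterances, mission_s, bin_s=5.0):
    """Code each bin_s-second interval with an utterance state, or 0 for dead air.

    utterances: (start_s, end_s, code) triples, where code is a positive integer
    (hypothetically, a speaker/content-cluster label). When several utterances
    fall in one interval, the one with the largest overlap wins (assumed rule).
    """
    n_bins = math.ceil(mission_s / bin_s)
    states = []
    for b in range(n_bins):
        lo, hi = b * bin_s, min((b + 1) * bin_s, mission_s)
        best_code, best_overlap = 0, 0.0  # default: silence
        for start, end, code in utterances:
            overlap = min(end, hi) - max(start, lo)
            if overlap > best_overlap:
                best_code, best_overlap = code, overlap
        states.append(best_code)
    return states


def speaking_share(states):
    """Proportion of intervals containing speech (non-zero states)."""
    return sum(1 for s in states if s != 0) / len(states)


coded = bin_states([(0.0, 3.0, 7), (6.0, 14.0, 2)], mission_s=20.0)
```

The speaking share is the quantity that indicates how much of the recurrence plot dead air could dominate; a low share would make the 0 state the main source of recurrences.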
As is often the case with experimental team studies, the participants consisted of ad hoc teams of university students. Given the participant population and pragmatic constraints, the ability of the testbed to capture realistic action team conditions was also limited. The testbed was designed with this in mind, with a focus on cognitive fidelity rather than domain realism, and a relatively low training requirement. The measures and the task were based on eliciting action team-like coordination and communication behaviors, and thus suggest that they may generalize across many action teams that perform dynamic command and control tasks (e.g., military command and control, intelligence surveillance and reconnaissance). However, the generalization to real world action teams is not yet validated. Future work will benefit from applying the analyses in this study to more realistic settings (e.g., training simulations) and field studies.
One potential criticism of the study design is the presence of more false alarms in the Low Demand condition. However, this was considered during the study design, and it was determined that the consequences of ignoring a potential fire warning in the scenario were so large, and the effort required to investigate them so small, that typical signal detection logic does not apply to a significant extent and should not impact the findings.
One additional technical limitation of this study is the lack of correction for Type 1 error inflation. Given the exploratory nature of this study to identify potential team workload measures for further study and low power of the study which relies on measures taken at the team level, we chose not to apply statistical corrections like Bonferroni. Although this increases the risk of Type 1 errors, the potential impact of such errors is minimal in this context. Future research with larger samples should consider these corrections to ensure more robust findings.
Conclusion
In conclusion, this study found that several team communication characteristics change in a systematic fashion when team demands are high and can also be meaningfully linked to both workload and performance under high demand conditions in action team tasks. However, findings were not always consistent with previous studies, suggesting that context, structure, and the nature of the demands themselves may have a substantial impact on communication changes in demanding conditions. In addition, this study found that communication measures supplemented, rather than replaced, traditional subjective measures of team workload for predicting high demand team performance. These findings support the continued development of communication-based measures of team workload in dynamic settings and highlight the need for further advancement in underlying theory.
Supplemental Material
Supplemental Material - Exploring the Relationship Between Team Workload and Communication in Action Teams
Supplemental Material for Exploring the Relationship Between Team Workload and Communication in Action Teams by Craig J. Johnson, Robert S. Gutzwiller, Eric Holder, Polemnia G. Amazeen, and Nancy J. Cooke in Journal of Cognitive Engineering and Decision Making
Footnotes
Acknowledgments
We would like to acknowledge the efforts of Christopher Lieber who assisted with the initial scenario design, as well as the support provided by the Arizona State University Center for Human, A.I., and Robot Teaming (CHART), and the Department of Defense Science Mathematics, and Research for Transformation (SMART) Scholarship-for-Service program. The views expressed are those of the authors and do not reflect the official policy or position of the US Army, Department of Defense, or the US Government.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
