Multilevel Confirmatory Factor Analysis Reveals Two Distinct Human

Abstract

Objective

This work examined the relationship between the constructs measured by the trust scales developed by Chancey et al. (2017) and Jian et al. (2000) using a multilevel confirmatory factor analysis (CFA).

Background

Modern theories of automation trust have been proposed based on data collected using trust scales. Chancey et al. (2017) adapted Madsen and Gregor’s (2000) trust scale to align with Lee and See’s (2004) human–automation trust framework. In contrast, Jian et al. (2000) developed a scale empirically with trust and distrust as factors. However, it remains unclear whether these two scales measure the same construct.

Method

We analyzed data collected from previous experiments to investigate the relationship between the two trust scales using a multilevel CFA.

Results

The data provided evidence that the Jian et al. (2000) and Chancey et al. (2017) automation trust scales are only weakly related. Trust and distrust were found to be distinct factors in Jian et al.’s (2000) scale, whereas performance, process, and purpose were distinct factors in Chancey et al.’s (2017) trust scale.

Conclusion

The analysis suggested that the two scales purporting to measure human–automation trust are only weakly related.

Application

Trust researchers and automation designers may consider using Chancey et al. (2017) and Jian et al. (2000) scales to capture different characteristics of human–automation trust.

Keywords

trust in automation, human–automation interaction, confirmatory factor analysis

Introduction

Automation is a pervasive and integral part of many professional and everyday activities in a modern society. Automation is defined as any mechanized and/or computational agent that partially or fully performs a variety of physical and/or cognitive tasks that humans used to perform or that humans are not able to perform (Bainbridge, 1983). Human operators ideally use automation as intended, improving human or joint human–automation performance. However, operators may also adopt suboptimal automation usage strategies (Sorkin & Woods, 1985; Yamani & McCarley, 2016, 2018) which disuse reliable or misuse unreliable automation (Parasuraman & Riley, 1997), degrading joint human–automation performance (Yamani & Horrey, 2018; Yamani et al., in press).

The rich literature on human–automation interaction in human factors suggests that an operator’s trust toward an automated system, or human-automation trust, is a critical factor for forming, growing, and maintaining successful human-automation interactions (Chen & Barnes, 2014; Lee et al., 2021; Lee & Moray, 1992, 1994; Long et al., 2020, 2022; Lyons & Stokes, 2012; Moray et al., 2000; Muir, 1994; Muir & Moray, 1996; Parasuraman & Riley, 1997; Riley, 1994; Yamani et al., 2020). The goal of the current study is to examine the relationship of the constructs measured by two trust scales, a theory-driven scale by Chancey et al. (2017) and an empirically developed scale by Jian et al. (2000), using a multilevel confirmatory factor analysis (CFA).

Providing one of the more popular theoretical perspectives on human–automation trust, Lee and See (2004) define trust as “an attitude that an agent will help achieve an individual’s goals in a situation characterized by uncertainty and vulnerability” (p. 51). Moreover, Lee and See (2004) offer an integrative perspective of human–automation trust comprising three informational bases: performance, process, and purpose. Performance refers to the current and historical behaviors of automation that are observable to operators (e.g., automation reliability) and describes what the automation does. Process refers to the appropriateness of the automation’s algorithm that controls the automation’s behavior in a given situation and describes how the automation operates. Finally, purpose refers to the extent to which the automation is used as intended by the designers and describes why it was developed. These informational bases indicate levels of attributional abstraction that are theorized to influence the development of trust (Lee & See, 2004). That is, trustors collect information pertaining to each of the three bases to develop their trust toward automation.

One method for measuring human–automation trust is via questionnaires, which can be categorized into two types: empirically or theoretically driven scales (see Kohn et al., 2021, for an extensive review of human–automation trust measures). The empirically determined scale of trust in automated systems developed by Jian and colleagues (2000) is widely used in human factors. Their questionnaire does not directly rest on any pervasive theory. Instead, Jian and colleagues (2000) asked participants to generate descriptions of trust and evaluate words related to trust and distrust, examined relationships between the elicited words and the word “trust,” and quantified the similarity of paired words. This empirical process resulted in a 12-item scale with trust and distrust dimensions. A later study showed that the Jian et al. (2000) scale measures trust and distrust as distinct factors (Spain et al., 2008).

Alternatively, Madsen and Gregor (2000) designed and tested a psychometric instrument to measure human-computer trust from existing trust theories (e.g., Mayer et al., 1995; Rempel et al., 1985). Using the nominal group technique (Delbecq et al., 1975), experienced computer users generated factors that they associated with human–computer trust. Madsen and Gregor (2000) refined these user-generated factors using several techniques, including comparison to previous theories, interrater reliability judgments (Thurstone scaling technique; Moore & Benbasat, 1991; Neuman, 1994), and a principal components analysis. The resulting scale included five dimensions (perceived reliability, perceived technical competence, perceived understandability, faith, and personal attachment) and consisted of 25 items rated on a 12-point Likert scale from “Not at All” to “Very Much” (Madsen & Gregor, 2000).

Recognizing the psychometric qualities of Madsen and Gregor’s (2000) scale and a plausible connection between identified factors and the theoretical perspective outlined by Lee and See (2004), Chancey and colleagues (2017) re-classified “perceived reliability” as “performance-based trust” (predictability or ability; what the automation does; trust in the actions of the agent), “perceived understandability” as “process-based trust” (dependability or integrity; how the automation works; trust in the agent, not the actions), and “faith” as “purpose-based trust” (faith or benevolence; why the automation was developed; trust in the agent, irrespective of past behaviors). The resulting scale is referred to here as the Chancey et al. (2017) scale. Using a sensor-based signaling system task, Chancey and colleagues (2017) examined the effects of miss/false alarm rate and risk on trust (i.e., performance, process, and purpose), compliance (i.e., operator response when a signal is issued), and reliance (i.e., operator refraining from a response when the system is silent or indicating normal operations) using mediation and moderated-mediation analyses. Indirect effects indicated that the factors of trust mediated the relationships between false alarms and compliance, but not reliance (as hypothesized). This suggests the Chancey et al. (2017) scale is reliably impacted by miss/false alarms (e.g., mistakes impact trust), and in turn the scale predicted compliance (e.g., trust produces changes in behavior). Additionally, serial mediation analyses (Hayes, 2017) indicated an indirect effect through performance, process, and then purpose, demonstrating that compliance rate was influenced through each of the three bases of trust significantly and sequentially (Chancey, 2016). This suggests distinct factors exist within the Chancey et al. (2017) scale, and the factors uniquely account for performance differences, such as compliance rate. Moreover, Chancey et al. (2017) found the scale showed adequate internal consistency for trust (α = .97) as well as for the modified scale’s factors: performance (α = .96), process (α = .91), and purpose (α = .93).

The Chancey et al. (2017) scale has been used in several additional experimental works (e.g., Chancey & Politowicz, 2020; Karpinsky et al., 2018; Politowicz et al., 2021; Sato et al., 2020). For instance, Karpinsky and colleagues (2018) examined the impact of task load on perceived trust towards imperfect automation in a flight simulation task. In their experiment, participants performed a continuous tracking task using a joystick while monitoring four gauges representing the state of two engines of the aircraft. An imperfect automated system alerted the operator at a reliability level of 70%. The results showed that the operators rated trust levels lower when visual demand of the tracking task was greater, demonstrating the measure is sensitive to detecting impacts on trust. Most critically, the high task load lowered trust ratings only on the performance and process dimensions but not on the purpose dimension, suggesting some independence between the dimensions.

Current Study

Although human–automation trust is considered a critical factor in successful human–automation interactions, research on psychometric qualities and relationship among human–automation trust scales currently available in the literature is still lacking (Bolton, 2022). Human–automation trust scales (e.g., Chancey et al., 2017; Jian et al., 2000) are widely used to measure the same construct, human–automation trust, in human factors studies. However, when employed together, researchers have found that discrepancies arise (e.g., Sato et al., under review). Each of these scales was developed, either empirically or theoretically, to include distinct dimensions. To date, though, no research has used a data-driven approach such as confirmatory factor analysis (CFA) to examine the dimensions that exist within the scales. Additionally, there is still a gap in understanding how theoretical dimensions of trust (i.e., performance, process, and purpose) might relate to empirically derived dimensions of trust (i.e., trust and distrust).

The current study compares the two scales using existing data from previous experiments conducted in our laboratory. Using multilevel CFA, this study examined how well the Jian et al. (2000) and Chancey et al. (2017) scales of human–automation trust conform to the hypothesized models in a new sample of participants. Specifically, the multilevel CFAs allowed us to examine how well the items correspond to each other within each scale, and to estimate the correlation between the two scales to assess whether they operationalize trust in the same way.

Experiment Summaries

Below are summaries of the four experiments from which data were included in the current analysis (see Table 1). In all experiments, participants concurrently performed the compensatory tracking task manually and the system monitoring task with support from the automated system, and reported their trust levels after completing the task via Jian et al.’s (2000) and Chancey et al.’s (2017) trust scales. Importantly, although each scale purportedly measures the same construct, effects were not consistent across manipulations. Depending on the scale employed, 3 out of 7 cases (43%) led to different conclusions. To illustrate, tracking task difficulty had an inconsistent effect on trust as measured by Jian et al.’s (2000) scale (i.e., trust declined with greater difficulty once and showed no effect twice). In contrast, Chancey et al.’s (2017) scale showed an effect of tracking task difficulty on trust in all three experiments. The original data file is available for download (https://www.yamanilab.com/).

Table 1.

Summaries of the Four Experiments Included in the Current Analysis.

Study	Independent Variable	Scale Used for Trust	Observed Effect
Karpinsky et al. (2018), N = 40	Tracking difficulty	Jian et al. (2000)	Trust declined with greater load
	Tracking difficulty	Chancey et al. (2017)	Trust declined with greater load
	Signaling system error (False alarm vs. Miss)	Jian et al. (2000)	No effect on trust
	Signaling system error (False alarm vs. Miss)	Chancey et al. (2017)	No effect on trust
Sato et al. (2023), N = 34	Tracking difficulty	Jian et al. (2000)	No effect on trust
	Tracking difficulty	Chancey et al. (2017)	Trust declined with greater load
	Task priority	Jian et al. (2000)	No effect on trust
	Task priority	Chancey et al. (2017)	No effect on trust
Sato et al. (in preparation), N = 16	Number of concurrent tasks	Jian et al. (2000)	Trust increased with greater load
	Number of concurrent tasks	Chancey et al. (2017)	No effect on trust
Sato et al. (2024), N = 39	Tracking difficulty	Jian et al. (2000)	No effect on trust
	Tracking difficulty	Chancey et al. (2017)	Trust declined with greater load
	Frequency of interruption	Jian et al. (2000)	No effect on trust
	Frequency of interruption	Chancey et al. (2017)	No effect on trust
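
The comparison summarized in Table 1 can be tallied mechanically. The sketch below is illustrative only: the rows are transcribed from Table 1, and the short effect labels ("declined", "none", "increased") and the helper name `disagreements` are shorthand introduced here, not part of the original analysis.

```python
# Each tuple: (study, manipulation, effect per Jian et al. scale,
# effect per Chancey et al. scale), transcribed from Table 1.
# The effect labels are shorthand assumed for this sketch.
TABLE_1 = [
    ("Karpinsky et al. (2018)", "Tracking difficulty", "declined", "declined"),
    ("Karpinsky et al. (2018)", "Signaling system error", "none", "none"),
    ("Sato et al. (2023)", "Tracking difficulty", "none", "declined"),
    ("Sato et al. (2023)", "Task priority", "none", "none"),
    ("Sato et al. (in preparation)", "Number of concurrent tasks", "increased", "none"),
    ("Sato et al. (2024)", "Tracking difficulty", "none", "declined"),
    ("Sato et al. (2024)", "Frequency of interruption", "none", "none"),
]


def disagreements(rows):
    """Return the rows where the two scales led to different conclusions."""
    return [row for row in rows if row[2] != row[3]]


differing = disagreements(TABLE_1)
share = len(differing) / len(TABLE_1)
print(f"{len(differing)} of {len(TABLE_1)} manipulations differ ({share:.0%})")
for study, manipulation, jian, chancey in differing:
    print(f"  {study}: {manipulation} (Jian: {jian}; Chancey: {chancey})")
```

Listing the differing rows also shows which manipulations drive the discrepancy: tracking difficulty in two experiments and the number of concurrent tasks in one.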

Karpinsky et al. (2018). This experiment studied the relationship between trust in an imperfect signaling system and allocation of visual attention using the Multi-Attribute Task Battery II (MATB-II; Santiago-Espada et al., 2011). Participants performed the compensatory tracking task with varying levels of task difficulty while responding to the system monitoring task assisted by an imperfect signaling system with reliability of 70%. Results showed that automation trust declined for both scales when the tracking task difficulty was high.

Sato et al. (2023). This experiment used the MATB-II to investigate the impact of task priority on automation trust. Participants were asked to perform the tracking task, the system monitoring task aided by an automated signaling system, and the fuel management task in either the equal priority or tracking priority conditions. In the equal priority condition, participants equally prioritized the three tasks. In the tracking priority condition, participants prioritized the tracking task more than the other two tasks. Results showed that, using Chancey et al.’s (2017) trust scale, prioritization of the tracking task eliminated the effect of task load on performance-based trust in automation. However, this effect of task priority was not confirmed using Jian et al.’s (2000) trust scale.

Sato et al. (in preparation). Using the MATB-II, this experiment explored the impact of the number of concurrent tasks on automation trust and attention allocation. Participants performed the tracking task and the system monitoring task with an automated signaling system as in the previous studies above with or without the fuel management task. Results using Jian et al.’s (2000) trust scale showed higher trust ratings among participants when concurrently performing the three tasks compared to performing two tasks. However, ratings measured via Chancey et al.’s (2017) scale showed comparable trust levels between the two experimental conditions across the three bases of trust.

Sato et al. (2024). This experiment used the MATB-II to test whether frequency of interruptions impacts automation trust. Participants performed the tracking task and the system monitoring task assisted by an automated signaling system. During each of the two 20-min trials, participants were interrupted by the communication task either 16 times or four times, during which they listened to auditory stimuli and tuned the radio to a target frequency. Ratings measured by Chancey et al.’s (2017) scale showed the effect of task load on automation trust, replicating Karpinsky et al. (2018), while ratings measured by Jian et al.’s (2000) scale did not. No effect of task interruption was observed in either scale.

Method

Participants

Participants were 129 undergraduate students (94 females, mean age = 20.64 years, SD = 4.11 years) recruited from the community of Old Dominion University (Norfolk, VA) for participation in one of four studies (i.e., participants took part in only one of the four studies): Karpinsky et al. (2018), Sato et al. (2023), Sato et al. (under review), and Sato et al. (2024). All participants received course credit for their participation. This research complied with the American Psychological Association Code of Ethics and all four studies were approved by the Institutional Review Board at Old Dominion University. Informed consent was obtained from each participant.

Scales

The modified Human-Computer Trust Questionnaire (Chancey et al., 2017; cf. Madsen & Gregor, 2000) used in the studies presented in the current paper contains 13 items (see Table 2). According to Lee and See (2004), the three hypothesized factors are critical for the development of automation trust: (i) Performance (5 items) assessing what automation does, (ii) Process (5 items) assessing how automation works, and (iii) Purpose (3 items) assessing why the automation was developed. All items were measured using a 12-point Likert scale with the extremities on the scale anchored to “Not Descriptive” and “Very Descriptive.”

Table 2.

Chancey et al. (2017) Trust Scale.

Performance

The system always provides the advice I require to help me perform well.

The system’s advice reliably helps me perform well.

The system’s advice consistently helps me perform well.

For me to perform well, I can rely on the system to function properly.

The system adequately analyzes the task to help me perform well.

Process

Although I may not know exactly how the system works, I know how to use it to perform well.

I will be able to perform well the next time I use the system because I understand how it behaves.

I understand how the system will help me perform well.

It is easy to follow what the system does to help me perform well.

To help me perform well, I recognize what I should do to get the advice I need from the system the next time I use it.

Purpose

To help me perform well, when I am uncertain about deciding, I believe the system rather than myself.

Even when the system gives me unusual information, I am certain that the system’s advice will help me to perform well.

Even if I have no reason to expect that the system will function properly, I still feel certain that it will help me perform well.

Jian et al.’s (2000) scale contains 12 items (see Table 3) to measure two factors: Trust (7 items) and Distrust (5 items). All items were measured using a 7-point Likert scale with the extremities on the scale anchored to “Not at all” and “Extremely.”

Table 3.

Jian et al.’s (2000) Trust Scale.

Distrust

The system is deceptive.^a

The system behaves in an underhanded manner.^a

I am suspicious of the system’s intent, action, or output.^a

I am wary of the system.^a

The system’s actions will have a harmful or injurious outcome.^a

Trust

I am confident in the system.

The system provides security.

The system has integrity.

The system is dependable.

The system is reliable.

I can trust the system.

I am familiar with the system.

^aIndicates distrust item for reverse coding.

Analysis Approach

The current study analyzed trust data from the four studies described above. Given that dimensionality of both scales is already known based on prior work (Jian et al., 2000; Madsen & Gregor, 2000), a CFA approach was used to examine if the proposed factor structures fit the current sample. CFAs examine how well each item on a scale corresponds with the latent factor it represents via the factor loading. Low standardized factor loadings (near zero) indicate that the way participants respond to that item is not similar to the way participants respond to other items for that factor. High standardized factor loadings (near one) indicate that the way participants respond to that item is very similar to how they respond to other items for that factor. Consistently high factor loadings across items within a factor indicate the items reflect a cohesive latent construct. CFAs allow for multiple latent factors to reflect that a scale may have multiple dimensions (related but distinct characteristics of the overall construct). Because multiple responses/trials were nested within individuals in the four experiments (i.e., a multilevel design; Raudenbush & Bryk, 2001), a multilevel CFA (Li et al., 1998) was conducted to examine the factor structures of both trust scales. Performance-, process-, and purpose-based trust were modeled as first-order latent variables defined by the corresponding items from the Chancey et al. (2017) scale (see Table 2 for a listing of items that served as indicators for each factor). A second-order factor model was specified, where these latent variables served as indicators of a second-order factor of trust for this scale. Similarly, trust and distrust latent variables were defined by corresponding items from the Jian et al. (2000) scale (see Table 3 for a listing of items that served as indicators for each factor), and these latent variables served as indicators of a second-order factor of trust for the Jian et al. (2000) scale. The correlation between the two second-order factors was estimated. This allows for a direct examination of how similarly participants responded to the two measures (i.e., if someone high on trust as defined by performance-, process-, and purpose-based trust is also high on trust as defined by trust versus distrust). See Figure 1 for a depiction of this model.
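
The interpretation of standardized loadings described above can be made concrete with a small calculation. The sketch below assumes a standardized one-factor model with uncorrelated residuals, in which the correlation implied between two items equals the product of their loadings and the squared loading gives the share of item variance attributed to the factor. The loading values and function names are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
def implied_item_correlation(loading_a: float, loading_b: float) -> float:
    """Correlation between two items implied by a standardized one-factor
    model with uncorrelated residuals: the product of their loadings."""
    return loading_a * loading_b


def variance_explained(loading: float) -> float:
    """Share of an item's variance attributed to its factor (squared loading)."""
    return loading ** 2


# Hypothetical loadings, chosen only for illustration
strong_a, strong_b, weak = 0.80, 0.80, 0.10

print(round(implied_item_correlation(strong_a, strong_b), 2))  # 0.64
print(round(implied_item_correlation(strong_a, weak), 2))      # 0.08
print(round(variance_explained(weak), 2))                      # 0.01
```

Two items with high loadings are expected to correlate substantially with each other, whereas an item with a near-zero loading shares almost nothing with the other items on its factor.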

Figure 1.

Second-order multilevel confirmatory factor analysis of both trust measures.

The same model was specified at both levels of analysis (within-person versus between-person). This means the same latent variables are represented within-person (capturing any differences across trials) and between-person (capturing any differences across individuals). For all first-order latent factors, a fixed factor approach was used (fixing the variance of the latent factor to 1 and allowing all item loadings to be freely estimated). For the higher order latent factors, the first loading was fixed at 1 to scale the factor, and the others were freely estimated. After confirming all items were normally distributed, robust maximum likelihood estimation was used. Analyses were conducted in Mplus version 8.4 (Muthén & Muthén, 1998–2019). For each factor loading, a 95% confidence interval (CI) was computed. Model fit was evaluated using the Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR). A model with adequate fit would have CFI greater than .90, TLI greater than .95, RMSEA less than .06, and SRMR less than .08 (Hu & Bentler, 1999). The overall model fit would indicate if the decisions made regarding which items load onto which factors were generally good, or if the model should be respecified in some way (e.g., if some items should be dropped for not corresponding well with the others, or if a second-order factor of overall trust is not well-defined by the first-order dimensions). Finally, to examine correlations among subscales for both measures, another multilevel CFA was conducted. Rather than including a second-order factor to represent overall trust as assessed by each scale, all factors representing subscales were allowed to correlate.
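
The distinction between the two levels of analysis can be illustrated with observed scores. The sketch below is an intuition aid only and rests on stated assumptions: the ratings are hypothetical, the function name `decompose` is introduced here, and the split uses observed person means, whereas a multilevel CFA estimates the within-person and between-person components as part of the model rather than from raw means.

```python
from statistics import mean

# Hypothetical ratings: participant id -> one trust rating per trial
ratings = {
    "p1": [8.0, 10.0, 9.0],
    "p2": [4.0, 5.0, 3.0],
    "p3": [11.0, 11.0, 8.0],
}


def decompose(ratings_by_person):
    """Split each rating into a between-person part (the person's mean)
    and a within-person part (the trial's deviation from that mean)."""
    between = {pid: mean(vals) for pid, vals in ratings_by_person.items()}
    within = {
        pid: [v - between[pid] for v in vals]
        for pid, vals in ratings_by_person.items()
    }
    return between, within


between, within = decompose(ratings)
print(between)       # differences across individuals
print(within["p1"])  # trial-to-trial variation for one individual
```

In this toy example, the person means carry the differences across individuals, and the deviations (which sum to zero for each person) carry the variation across trials.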

Results

The fit for the second-order factor multilevel CFA was adequate for most fit indices, with others suggesting some room for improvement: χ² (538) = 998.33, p < .001, CFI = .91, TLI = .90, RMSEA = .06, within SRMR = .09, between SRMR = .08. The standardized factor loadings (λ) at both the within-person and between-person levels can be seen in Tables 4 and 5 for the Chancey et al. (2017) scale and the Jian et al. (2000) scale, respectively. As seen in the tables, factor loadings are stronger for both scales at the between-person level, indicating both scales are better at distinguishing trust differences across individuals than variability within an individual. Moreover, the 95% CIs are narrower at the between-person level, suggesting greater precision. Factor loadings for all items are relatively strong for both measures with 95% CIs that do not contain zero, with a few exceptions for the Jian et al. (2000) scale. This suggests that generally, the items are tapping into the same dimensions as depicted in the model (e.g., the three purpose-based trust items all reflect the same underlying construct). The last item assessing trust (“I am familiar with the system”) has a very low loading at the within-person level (.05) and a 95% CI containing zero, suggesting participants do not respond to this item similarly to the other trust items and it does not contribute to the latent factor. Additionally, the second-order factor for the Jian et al. (2000) scale is not well-defined by the two subscales (i.e., does not represent one cohesive construct defined by those two subscales), with the distrust subscale exhibiting very low factor loadings at both the within-person (.36) and between-person (.27) levels, again with 95% CIs that contain zero. This suggests the Jian et al. (2000) questionnaire is not one cohesive measure. These low factor loadings may be driving some of the model fit indices suggesting room for improvement.

Table 4.

Standardized Factor Loadings for the Multilevel Confirmatory Factor Analysis of the Chancey et al. (2017) Scale.

	Within		Between
	Loading (λ)	95% CI	Loading (λ)	95% CI
First-Order Factors (Chancey et al., 2017)
Purpose 1: Even when the system gives me unusual advice, I am certain that the system’s advice will help me perform well.	.58	0.437, 0.724	.99	0.946, 1.033
Purpose 7: Even if I have no reason to expect that the system will function properly, I still feel certain that it will help me to perform well.	.66	0.476, 0.842	.98	0.953, 1.011
Purpose 9: To help me perform well, when I am uncertain about deciding, I believe the system rather than myself.	.49	0.287, 0.699	.98	0.949, 1.018
Performance 2: For me to perform well, I can rely on the system to function properly.	.73	0.562, 0.891	.96	0.897, 1.016
Performance 4: The system’s advice reliably helps me perform well.	.86	0.810, 0.914	.99	0.970, 1.027
Performance 5: The system’s advice consistently helps me perform well.	.90	0.850, 0.958	.99	0.961, 1.010
Performance 12: The system always provides the advice I require to help me perform well.	.53	0.307, 0.758	1.00	0.941, 1.059
Performance 13: The system adequately analyzes the task to help me perform well.	.57	0.418, 0.722	.99	0.971, 1.026
Process 3: It is easy to follow what the system does to help me perform well.	.77	0.669, 0.876	.95	0.887, 1.012
Process 6: I understand how the system will help me perform well.	.78	0.689, 0.878	1.00	0.969, 1.038
Process 8: Although I may not know exactly how the system works, I know how to use it to perform well.	.50	0.301, 0.705	.97	0.921, 1.011
Process 10: To help me perform well, I recognize what I should do to get the advice I need from the system the next time I use it.	.35	0.149, 0.557	.96	0.871, 1.039
Process 11: I will be able to perform well the next time I use the system because I understand how it behaves.	.40	0.144, 0.660	.96	0.866, 1.044
Second-Order Factor (Chancey et al., 2017)
Purpose	1.00	--	1.00	--
Performance	.94	0.869, 1.019	.95	0.907, 0.998
Process	.93	0.809, 1.056	.97	0.933, 0.996

Table 5.

Standardized Factor Loadings for the Multilevel Confirmatory Factor Analysis of the Jian et al. (2000) Scale.

	Within		Between
	Loading (λ)	95% CI	Loading (λ)	95% CI
First-Order Factors (Jian et al., 2000)
Distrust 1: The system is deceptive.	.40	0.181, 0.611	.89	0.745, 1.03
Distrust 2: The system behaves in an underhanded manner.	.47	0.217, 0.717	.87	0.692, 1.038
Distrust 3: I am suspicious of the system’s intent, action, or outputs.	.68	0.453, 0.905	.99	0.884, 1.098
Distrust 4: I am wary of the system.	.91	0.780, 1.044	.86	0.759, 0.957
Distrust 5: The system’s actions will have a harmful or injurious outcome.	.41	0.206, 0.619	.60	0.381, 0.819
Trust 1: I am confident in the system.	.64	0.448, 0.829	.99	0.904, 1.084
Trust 2: The system provides security.	.69	0.445, 0.933	.86	0.726, 0.987
Trust 3: The system has integrity.	.72	0.519, 0.925	.81	0.661, 0.948
Trust 4: The system is dependable.	.80	0.532, 1.058	.99	0.937, 1.054
Trust 5: The system is reliable.	.81	0.580, 1.041	1.00	0.934, 1.069
Trust 6: I can trust the system.	.80	0.689, 0.913	.97	0.933, 1.015
Trust 7: I am familiar with the system.	.05	−0.179, 0.283	.63	0.342, 0.924
Second-Order Factor (Jian et al., 2000)
Distrust	.36	−0.003, 0.717	.27	−0.013, 0.556
Trust	1.00	—	1.00	—
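
The exceptions among the Jian et al. (2000) loadings can be located by checking which 95% CIs in Table 5 include zero. The sketch below transcribes the interval bounds from Table 5; the data layout and the helper name `contains_zero` are assumptions made for illustration. The second-order Trust loading is omitted because it was fixed at 1 and has no interval.

```python
# (label, within-person 95% CI, between-person 95% CI), transcribed from Table 5
TABLE_5_INTERVALS = [
    ("Distrust 1", (0.181, 0.611), (0.745, 1.03)),
    ("Distrust 2", (0.217, 0.717), (0.692, 1.038)),
    ("Distrust 3", (0.453, 0.905), (0.884, 1.098)),
    ("Distrust 4", (0.780, 1.044), (0.759, 0.957)),
    ("Distrust 5", (0.206, 0.619), (0.381, 0.819)),
    ("Trust 1", (0.448, 0.829), (0.904, 1.084)),
    ("Trust 2", (0.445, 0.933), (0.726, 0.987)),
    ("Trust 3", (0.519, 0.925), (0.661, 0.948)),
    ("Trust 4", (0.532, 1.058), (0.937, 1.054)),
    ("Trust 5", (0.580, 1.041), (0.934, 1.069)),
    ("Trust 6", (0.689, 0.913), (0.933, 1.015)),
    ("Trust 7", (-0.179, 0.283), (0.342, 0.924)),
    ("Second-order Distrust", (-0.003, 0.717), (-0.013, 0.556)),
]


def contains_zero(interval):
    """True when the interval's lower and upper bounds bracket zero."""
    lower, upper = interval
    return lower <= 0.0 <= upper


within_flagged = [label for label, w, _ in TABLE_5_INTERVALS if contains_zero(w)]
between_flagged = [label for label, _, b in TABLE_5_INTERVALS if contains_zero(b)]
print("Within-person CIs containing zero:", within_flagged)
print("Between-person CIs containing zero:", between_flagged)
```

The flagged entries are the familiarity item (Trust 7) at the within-person level and the second-order Distrust loading at both levels.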

In contrast, the Chancey et al. (2017) scale exhibits strong second-order factor loadings for all subscales at both levels (all loadings > .90). The correlation between the two trust scales was r = .40 (p < .001) at the within-person level, and r = .36 (p = .003) at the between-person level, representing only small-to-medium effect sizes. Table 6 presents results of regressions for the Jian et al. (2000) scale factor scores predicting the Chancey et al. (2017) scale factor scores. These regressions indicate nonsignificant effects of the distrust factor on all three Chancey et al. (2017) factors, and significant, small-to-medium effects of the trust factor on all three Chancey et al. (2017) factors. The same pattern was true for both within-person associations (i.e., how much scores varied from trial to trial) as well as between-person associations (i.e., how much scores varied across people).
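
One way to read the size of these correlations is to square them, which under the usual linear interpretation gives the proportion of variance the two second-order factors share. This calculation is not reported in the original analysis; the sketch below simply applies it to the two correlations stated above.

```python
# Correlations between the two second-order trust factors, as reported in the text
correlations = {"within-person": 0.40, "between-person": 0.36}

# Squared correlation: proportion of variance shared by the two factors
shared = {level: r ** 2 for level, r in correlations.items()}

for level, r_squared in shared.items():
    print(f"{level}: r = {correlations[level]:.2f}, r^2 = {r_squared:.2f}")
```

At both levels the two factors share well under a quarter of their variance, which is consistent with describing the scales as only weakly related.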

Table 6.

Regressions for the Jian et al. (2000) Scale Factor Scores Predicting the Chancey et al. (2017) Scale Factor Scores.

	Within			Between
	B (SE)	p	β	B (SE)	p	β
Outcome: Purpose factor
Distrust factor	−0.07 (0.07)	.308	−0.061	−0.07 (0.07)	.276	−0.066
Trust factor	0.63 (0.07)	<.001	0.521	0.41 (0.07)	<.001	0.384
Outcome: Perform factor
Distrust factor	−0.07 (0.07)	.315	−0.061	−0.12 (0.07)	.075	−0.105
Trust factor	0.54 (0.07)	<.001	0.483	0.49 (0.06)	<.001	0.459
Outcome: Process factor
Distrust factor	−0.13 (0.07)	.052	−0.120	−0.11 (0.07)	.110	−0.098
Trust factor	0.51 (0.07)	<.001	0.471	0.41 (0.07)	<.001	0.382

Finally, correlations among the multilevel CFA factors representing subscales of both measures are shown in Table 7. At the within-person level, the Trust factor of the Jian et al. (2000) scale had medium-sized correlations with Purpose (r = .49), Performance (r = .37), Process (r = .31), and Distrust (r = .35). At the between-person level, the Trust factor of the Jian et al. (2000) scale had medium-sized correlations with Purpose (r = .33), Performance (r = .41), and Process (r = .34), and a small correlation with Distrust (r = .21). It is worth noting that neither correlation between Trust and Distrust was significant at the .05 level.

Table 7.

Correlations Among Factor Scores in a Multilevel Confirmatory Factor Analysis of the Jian et al. (2000) and Chancey et al. (2017) Scales.

	Purpose	Perform	Process	Distrust	Trust
Chancey et al. (2017)
Purpose factor	—	.99*	.99*	.12	.49*
Perform factor	.95*	—	.88*	.08	.37*
Process factor	.96*	.92*	—	−.01	.31*
Jian et al. (2000)
Distrust factor	.03	.01	.00	—	.35
Trust factor	.33*	.41*	.34*	.21	—

Note. Within-person estimates are above the diagonal and between-person estimates are below the diagonal.

*p < .05.
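
Because Table 7 packs two correlation matrices into one grid, it can help to separate them before reading off values. The sketch below transcribes Table 7 and rebuilds one symmetric matrix per level; the function name `split_levels` and the use of `None` for the diagonal are conventions assumed here for illustration.

```python
LABELS = ["Purpose", "Perform", "Process", "Distrust", "Trust"]

# Table 7 as printed: within-person estimates above the diagonal,
# between-person estimates below it. None marks the diagonal.
TABLE_7 = [
    [None, 0.99, 0.99, 0.12, 0.49],
    [0.95, None, 0.88, 0.08, 0.37],
    [0.96, 0.92, None, -0.01, 0.31],
    [0.03, 0.01, 0.00, None, 0.35],
    [0.33, 0.41, 0.34, 0.21, None],
]


def split_levels(table):
    """Return (within, between) as symmetric matrices with 1.0 on the diagonal."""
    n = len(table)
    within = [[1.0] * n for _ in range(n)]
    between = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            within[i][j] = within[j][i] = table[i][j]    # upper triangle
            between[i][j] = between[j][i] = table[j][i]  # lower triangle
    return within, between


within, between = split_levels(TABLE_7)
trust = LABELS.index("Trust")
for k, name in enumerate(LABELS[:-1]):
    print(f"Trust with {name}: within = {within[trust][k]:.2f}, "
          f"between = {between[trust][k]:.2f}")
```

Reading the Trust row of each rebuilt matrix reproduces the within-person and between-person correlations quoted in the text.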

Discussion

The current study used a multilevel CFA to explore the factor structure of an empirically-driven trust scale, Jian et al.’s (2000) scale, and a theory-driven trust scale, Chancey et al.’s (2017) scale. We reanalyzed data from 129 participants from four low-fidelity flight simulator experiments that involved the identical automated alerted-monitor system with high, but imperfect, reliability. Three points can summarize the results.

First, the results confirm the hypothesized factor structures for both trust scales. For the Jian et al. (2000) scale, corroborating the finding of Spain et al. (2008), the data support the view that trust and distrust are distinct factors, and likely two separate constructs. In other words, distrust is not the inverse of trust (which would have been represented with a very strong negative loading) but may actually be a separate construct that is only loosely correlated with trust. Additionally, for the Chancey et al. (2017) scale, the data provide evidence of three distinct factors, labeled performance, process, and purpose, as hypothesized based on Lee and See’s (2004) triadic model of automation trust. This finding confirms that the Chancey et al. (2017) trust scale is appropriate for measuring the three bases of trust.

Second, the multilevel CFA revealed a significant but weak association between the two scales. This result indicates that the two scales likely measure different constructs of human–automation trust that are only modestly correlated. The weak association was largely driven by the distrust factor in the Jian et al. (2000) scale, which did not predict any of the factors from the Chancey et al. (2017) scale. Yet, showing some overlap, Jian et al.’s (2000) trust factor did significantly predict all three of the Chancey et al. (2017) factors, with small-to-medium effects. Indeed, the distrust factor may simply not be congruent with the performance, process, and purpose bases of trust outlined in Lee and See (2004), whereas Jian et al.’s (2000) trust factor may have some predictive value. Importantly, the results indicate that the triadic factor structure of the Chancey et al. (2017) trust scale, which is theoretically congruent with the foundational work by Lee and See (2004), does not align with the dual factor structure of the Jian et al. (2000) trust scale. The current results thus suggest that the distrust items in Jian et al. (2000) measure a construct distinct from that measured by the trust items in Jian et al.’s (2000) scale and by Chancey et al.’s (2017) scale. This raises the important question of what trust actually means and when the two questionnaires can be used for different types of interaction with automation.

Third, the Jian et al. (2000) scale, which was developed to measure trust and distrust, has been widely implemented in human factors research. Although the scale was empirically developed, results obtained with it have informed design recommendations and decisions in many domains, ranging from highly automated driving (Hartwich et al., 2019) to wearable fitness technology (Rupp et al., 2016), highlighting its practical value.

However, the current findings suggest that recommendations based on data that treat the Jian et al. (2000) trust scale as a single measure, with trust and distrust as opposites on a continuum, may not be accurate (i.e., they are two separate constructs; cf. Spain et al., 2008). If trust and distrust are at the ends of a continuous underlying dimension, then factors that promote or degrade trust will have the opposite effects on the distrust dimension (i.e., higher trust scores should lead to lower distrust scores). In that case, an aggregate measure of trust from the Jian et al. (2000) scale could be derived by reverse coding the distrust items. Yet, the results from the CFA showed that trust and distrust are separate constructs, indicating that each can vary independently of the other. From a practical perspective, an aggregate trust score from Jian et al.’s (2000) scale may therefore not be appropriate. Indeed, trust and distrust have been considered as two separate constructs in other domains (e.g., Conchie et al., 2011; Sitkin & Roth, 1993), with trust being regarded as positive expectations regarding another’s conduct, and distrust as negative expectations regarding another’s conduct (Lewicki et al., 1998). From this perspective, a trustor may have both high trust and high distrust in a trustee, where a person may be assured of another in certain respects but has reason to be highly wary in others (e.g., “trust but verify”).
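
The scoring choice discussed above can be made concrete. The following is a minimal sketch, assuming the commonly used 12-item, 7-point format of the Jian et al. (2000) scale with items 1 to 5 worded as distrust and items 6 to 12 worded as trust; the response values are invented for illustration and the function names are hypothetical. It contrasts the aggregate score obtained by reverse coding the distrust items with separate subscale means.

```python
from statistics import mean

SCALE_MAX = 7                  # assumed 7-point response format
DISTRUST_ITEMS = range(0, 5)   # assumed: items 1-5 are distrust items (0-based index)
TRUST_ITEMS = range(5, 12)     # assumed: items 6-12 are trust items (0-based index)

def aggregate_score(responses):
    """Single score: reverse code the distrust items, then average all 12 items."""
    recoded = [
        (SCALE_MAX + 1 - r) if i in DISTRUST_ITEMS else r
        for i, r in enumerate(responses)
    ]
    return mean(recoded)

def subscale_scores(responses):
    """Separate means for the distrust and trust items (no reverse coding)."""
    distrust = mean(responses[i] for i in DISTRUST_ITEMS)
    trust = mean(responses[i] for i in TRUST_ITEMS)
    return distrust, trust

# Hypothetical respondent who reports high trust AND high distrust
# ("trust but verify"); the values are invented for illustration.
responses = [6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]

print("aggregate:", aggregate_score(responses))
print("distrust, trust:", subscale_scores(responses))
```

For this invented respondent the aggregate lands close to the scale midpoint, which hides the fact that both subscale means are high. Reporting the two subscales separately keeps that pattern visible.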

An important takeaway from these results is that it is not appropriate to construct theoretical arguments based on Lee and See (2004) and then attempt to measure trust using the Jian et al. (2000) scale, because the distrust items in the Jian et al. (2000) scale do not necessarily measure the construct that Lee and See (2004) proposed. More research should examine trust-related scales within existing frameworks of human–automation trust to better guide practical and impactful decisions in safety-critical domains using the scales. In addition, further experimentation is necessary to examine how trust as measured by the two scales mediates the effect of automation characteristics (e.g., reliability) on human behaviors such as compliance and reliance (Chancey et al., 2015, 2017; Meyer et al., 2014).

Overall, the multilevel CFA findings verify that the intended distinct dimensions (process, performance, and purpose in Chancey et al.’s (2017) scale; trust and distrust in Jian et al.’s (2000) scale) exist in their respective scales. However, the findings reveal that Chancey et al.’s (2017) scale and Jian et al.’s (2000) scale likely do not measure the same construct of human–automation trust. This finding helps explain the discrepancies in findings observed when the two scales were used together (e.g., Sato et al., 2024). The two scales do not operationalize trust in the same way.

From this, questions emerge about what construct or aspect of human–automation trust each scale is actually capturing. The Chancey et al. (2017) scale was developed from the triadic structure of human–automation trust, a structure echoed across years of cross-domain trust research (e.g., Lee & See, 2004; Mayer et al., 1995; Muir & Moray, 1996; Rempel et al., 1985), and findings from this study support that the Chancey et al. (2017) scale does contain three distinct dimensions. This poses an important question regarding the similarities and differences in what Jian et al.’s (2000) and Chancey et al.’s (2017) scales measure about human–automation trust. A related question is the extent to which these two scales vary in their sensitivity. Rating variability across the four experiments may reflect not only differences in the validity of the two scales but also differences in their sensitivity, which should be examined in future research.

Several limitations exist in the current study. First, the results suggest that trust and distrust in the Jian et al. (2000) scale are distinct in this sample, but future researchers may want to evaluate this in their own samples to establish generalizability. Second, although the Jian et al. (2000) scale is widely used to make design decisions and drive human–automation trust theory, findings from this study suggest that it does not measure human–automation trust along the theoretical triadic dimensions (cf. Lee & See, 2004; Mayer et al., 1995; Muir & Moray, 1996; Rempel et al., 1985) and, importantly, that the distrust items in the Jian et al. (2000) scale did not correlate with any of the three dimensions of the Chancey et al. (2017) scale. The current results are, however, constrained to studies that use relatively simple automation (i.e., sensor-based signaling systems). Therefore, more research is necessary to determine whether the current results hold for interactions between humans and more complex technologies, such as human–robot interactions (Hancock et al., 2011; Nam & Lyons, 2020) and human-autonomy teaming (Chancey et al., 2021; Sato et al., 2022). Lastly, as with many laboratory studies, the studies used in the current analysis included system failures that were randomly generated. These failures are not fully representative of what an operator would encounter in the real world, where failures would have a root cause that the operator could account for, potentially affecting when and how the operator would trust the system. This limitation should be taken into consideration.

Key Points

• We examined the relationship of the constructs measured by the Chancey et al. (2017) and Jian et al. (2000) trust scales using a multilevel confirmatory factor analysis.

• Results showed that trust and distrust are distinct factors loading onto the same construct of trust, and that performance, process, and purpose are distinct factors loading onto another.

• The analysis suggested that the two scales likely measure different constructs of human–automation trust.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was partially supported by a Cooperative Agreement, Langley Research Center (NIA.COOP.05.202075-202075) to Old Dominion University from NASA’s Autonomous Systems Enduring Discipline under the Transformational Tools and Technologies Revolutionary Aviation Mobility Sub-project. The views expressed are those of the authors and do not necessarily reflect the official policy or position of NASA or the U.S. Government.

ORCID iD

Yusuke Yamani

Author Biographies

Yusuke Yamani is an interim associate dean of the Perry Honors College and an associate professor in the Department of Psychology and the Department of Civil and Environmental Engineering at Old Dominion University. He received his PhD in psychology (Visual Cognition and Human Performance) from the University of Illinois at Urbana-Champaign in 2013.

Shelby K. Long is a PhD candidate in the Department of Psychology at Old Dominion University. She earned her MS in psychology at Old Dominion University in 2018 and her BS in psychology at Georgia Institute of Technology in 2013.

Tetsuya Sato is a PhD candidate in the Department of Psychology at Old Dominion University. He received his MS and BS in psychology from Old Dominion University in 2020 and 2018, respectively.

Abby Braitman is an assistant professor in the Department of Psychology at Old Dominion University. She earned her PhD in applied experimental psychology at Old Dominion University in 2012.

Mike Politowicz is a PhD student in the Department of Psychology at Old Dominion University and a research aerospace engineer in the Crew Systems and Aviation Operations Branch at NASA Langley Research Center. He received his BS in aerospace engineering from the University of Michigan in 2011.

Eric T. Chancey is a human factors researcher in the Crew Systems and Aviation Operations branch at NASA’s Langley Research Center. He earned his PhD in human factors psychology at Old Dominion University in 2016.

References

Bainbridge (1983). Ironies of automation. Automatica, 19(6), 775–779. https://doi.org/10.1016/0005-1098(83)90046-8

Bolton, M. L. (2022). Trust is not a virtue: Why we should not trust trust. Ergonomics in Design: The Quarterly of Human Factors Applications, 0(0), Article 106480462211301. https://doi.org/10.1177/10648046221130171

Chancey, E. T. (2016). The effects of alarm system errors on dependence: Moderated mediation of trust with and without risk [Doctoral Dissertation, Old Dominion University]. ProQuest Dissertations & Theses Global.

Chancey, E. T., Bliss, J. P., Proaps, A. B., & Madhavan (2015). The role of trust as a mediator between system characteristics and response behaviors. Human Factors, 57(6), 947–958. https://doi.org/10.1177/0018720815582261

Chancey, E. T., Bliss, J. P., Yamani, & Handley, H. A. (2017). Trust and the compliance–reliance paradigm: The effects of risk, error bias, and reliability on trust and dependence. Human Factors, 59(3), 333–345. https://doi.org/10.1177/0018720816682648

Chancey, E. T., & Politowicz, M. S. (2020). Public trust and acceptance for concepts of remotely operated Urban Air Mobility transportation. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 64(1), 1044–1048. https://doi.org/10.1177/1071181320641251

Chancey, E. T., Politowicz, M. S., & Le Vie (2021). Enabling advanced air mobility operations through appropriate trust in human-autonomy teaming: Foundational research approaches and applications. AIAA Scitech 2021 Forum.

Chen, J. Y., & Barnes, M. J. (2014). Human–agent teaming for multirobot control: A review of human factors issues. IEEE Transactions on Human-Machine Systems, 44(1), 13–29. https://doi.org/10.1109/thms.2013.2293535

Conchie, S. M., Taylor, P. J., & Charlton (2011). Trust and distrust in safety leadership: Mirror reflections? Safety Science, 8-9, 1208–1214.

Delbecq, A. L., Van de Ven, A. H., & Gustafson, D. H. (1975). Group techniques for program planning: A guide to nominal group and Delphi processes. Scott, Foresman.

Hancock, P. A., Billings, D. R., Schaefer, K. E., Chen, J. Y., De Visser, E. J., & Parasuraman (2011). A meta-analysis of factors affecting trust in human-robot interaction. Human Factors, 53(5), 517–527. https://doi.org/10.1177/0018720811417254

Hartwich, Witzlack, Beggiato, & Krems, J. F. (2019). The first impression counts–A combined driving simulator and test track study on the development of trust and acceptance of highly automated driving. Transportation Research Part F: Traffic Psychology and Behaviour, 65, 522–535. https://doi.org/10.1016/j.trf.2018.05.012

Hayes, A. F. (2017). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Publications.

Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

Jian, J. Y., Bisantz, A. M., & Drury, C. G. (2000). Foundations for an empirically determined scale of trust in automated systems. International Journal of Cognitive Ergonomics, 4(1), 53–71. https://doi.org/10.1207/s15327566ijce0401_04

Karpinsky, N. D., Chancey, E. T., Palmer, D. B., & Yamani (2018). Automation trust and attention allocation in multitasking workspace. Applied Ergonomics, 70, 194–201. https://doi.org/10.1016/j.apergo.2018.03.008

Kohn, S. C., de Visser, E. J., Wiese, Lee, & Shaw, T. H. (2021). Measurement of trust in automation: A narrative review and reference guide. Frontiers in Psychology, 12, Article 604977. https://doi.org/10.3389/fpsyg.2021.604977

Lee, Yamani, Long, S. K., Unverricht, & Itoh (2021). Revisiting human-machine trust: A replication study of Muir and Moray (1996) using a simulated pasteurizer plant task. Ergonomics, 64(9), 1–14. https://doi.org/10.1080/00140139.2021.1909752

Lee, J. D., & Moray (1992). Trust, control strategies and allocation of function in human-machine systems. Ergonomics, 35(10), 1243–1270. https://doi.org/10.1080/00140139208967392

Lee, J. D., & Moray (1994). Trust, self-confidence, and operators’ adaptation to automation. International Journal of Human-Computer Studies, 40(1), 153–184. https://doi.org/10.1006/ijhc.1994.1007

Lee, J. D., & See, K. A. (2004). Trust in automation: Designing for appropriate reliance. Human Factors, 46(1), 50–80. https://doi.org/10.1518/hfes.46.1.50_30392

Lewicki, R. J., McAllister, D. J., & Bies, R. J. (1998). Trust and distrust: New relationships and realities. Academy of Management Review, 23(3), 438–458.

Li, Duncan, T. E., Harmer, Acock, & Stoolmiller (1998). Analyzing measurement models of latent variables through multilevel confirmatory factor analysis and hierarchical linear modeling approaches. Structural Equation Modeling: A Multidisciplinary Journal, 5(3), 294–306. https://doi.org/10.1080/10705519809540106

Long, S. K., Lee, Yamani, Unverricht, & Itoh (2022). Does automation trust evolve from a leap of faith? An analysis using a reprogrammed pasteurizer simulation task. Applied Ergonomics, 100(6), Article 103674. https://doi.org/10.1016/j.apergo.2021.103674

Long, S. K., Sato, Millner, Loranger, Mirabelli, & Yamani (2020). Empirically and theoretically driven scales on automation trust: A multi-level confirmatory factor analysis. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 64(1), 1829–1832. https://doi.org/10.1177/1071181320641440

Lyons, J. B., & Stokes, C. K. (2012). Human–human reliance in the context of automation. Human Factors, 54(1), 112–121. https://doi.org/10.1177/0018720811427034

Madsen & Gregor (2000). Measuring human-computer trust. In Proceedings of 11th Australasian Conference on Information Systems (Vol. 53, pp. 6–8). ACM.

Mayer, R. C., Davis, J. H., & Schoorman, F. D. (1995). An integrative model of organizational trust. Academy of Management Review, 20(3), 709–734. https://doi.org/10.5465/amr.1995.9508080335

Meyer, Wiczorek, & Gunzler (2014). Measures of reliance and compliance in aided visual scanning. Human Factors, 56(5), 840–849. https://doi.org/10.1177/0018720813512865

Moore, G. C., & Benbasat (1991). Development of an instrument to measure the perceptions of adopting an information technology innovation. Information Systems Research, 2(3), 192–222. https://doi.org/10.1287/isre.2.3.192

Moray, Inagaki, & Itoh (2000). Adaptive automation, trust, and self-confidence in fault management of time-critical tasks. Journal of Experimental Psychology: Applied, 6(1), 44–58. https://doi.org/10.1037//1076-898x.6.1.44

Muir, B. M. (1994). Trust in automation: Part I. Theoretical issues in the study of trust and human intervention in automated systems. Ergonomics, 37(11), 1905–1922. https://doi.org/10.1080/00140139408964957

Muir, B. M., & Moray (1996). Trust in automation. Part II. Experimental studies of trust and human intervention in a process control simulation. Ergonomics, 39(3), 429–460. https://doi.org/10.1080/00140139608964474

Nam, C. S., & Lyons, J. B. (Eds.). (2020). Trust in human-robot interaction. Academic Press.

Neuman, S. P. (1994). Generalized scaling of permeabilities: Validation and effect of support scale. Geophysical Research Letters, 21(5), 349–352. https://doi.org/10.1029/94gl00308

Parasuraman & Riley (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors: The Journal of the Human Factors and Ergonomics Society, 39(2), 230–253. https://doi.org/10.1518/001872097778543886

Politowicz, M. S., Chancey, E. T., & Glaab, L. J. (2021). Effects of autonomous sUAS separation methods on subjective workload, situation awareness, and trust. AIAA SciTech.

Raudenbush, S. W., & Bryk, A. S. (2001). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Sage.

Rempel, J. K., Holmes, J. G., & Zanna, M. P. (1985). Trust in close relationships. Journal of Personality and Social Psychology, 49(1), 95–112. https://doi.org/10.1037//0022-3514.49.1.95

Riley, V. A. (1994). Human use of automation [Doctoral Dissertation, University of Minnesota]. ProQuest Dissertations Publishing.

Rupp, M. A., Michaelis, J. R., McConnell, D. S., & Smither, J. A. (2016). The impact of technological trust and self-determined motivation on intentions to use wearable fitness technology. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 60(1), 1434–1438. https://doi.org/10.1177/1541931213601329

Santiago-Espada, Myer, R. R., Latorella, K. A., & Comstock, J. R. (2011). The multi-attribute task battery II (MATB-II) software for human performance and workload research: A user’s guide (NASA/TM-2011–217164).

Sato, Islam, Still, J. D., Scerbo, M. W., & Yamani (2023). Task priority reduces an adverse effect of task load on automation trust in a dynamic multitasking environment. Cognition, Technology & Work, 25(1), 1–13. https://doi.org/10.1007/s10111-022-00717-z

Sato, Jackson, & Yamani (2024, in press). Number of interrupting events influences response time in multitasking, but not trust in automation. The International Journal of Aerospace Psychology. https://doi.org/10.1080/24721840.2024.2311706

Sato, Long, S. K., & Yamani (in preparation). The effects of multitasking on visual scanning and trust towards imperfect automation.

Sato, Politowicz, M. S., Islam, Chancey, E. T., & Yamani (2022). Attentional considerations in advanced air mobility operations: Control, manage, or assist? Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 66(1), 28–32. https://doi.org/10.1177/1071181322661184

Sato, Yamani, Liechty, & Chancey, E. T. (2020). Automation trust increases under high-workload multitasking scenarios involving risk. Cognition, Technology & Work, 22(2), 399–407. https://doi.org/10.1007/s10111-019-00580-5

Sitkin, S. B., & Roth, N. L. (1993). Explaining the limited effectiveness of legalistic “remedies” for trust/distrust. Organization Science, 4(3), 367–392.

Sorkin, R. D., & Woods, D. D. (1985). Systems with human monitors: A signal detection analysis. Human-Computer Interaction, 1(1), 49–75. https://doi.org/10.1207/s15327051hci0101_2

Spain, R. D., Bustamante, E. A., & Bliss, J. P. (2008). Towards an empirically developed scale for system trust: Take two. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 52(19), 1335–1339. https://doi.org/10.1177/154193120805201907

Yamani & Horrey, W. J. (2018). A theoretical model of human-automation interaction grounded in resource allocation policy during automated driving. International Journal of Human Factors and Ergonomics, 5(3), 225–239. https://doi.org/10.1504/ijhfe.2018.095912

Yamani, Long, S. K., & Itoh (2020). Human–automation trust to technologies for naïve users amidst and following the COVID-19 pandemic. Human Factors, 62(7), 1087–1094. https://doi.org/10.1177/0018720820948981

Yamani & McCarley, J. S. (2016). Workload capacity: A response time–based measure of automation dependence. Human Factors, 58(3), 462–471. https://doi.org/10.1177/0018720815621172

Yamani & McCarley, J. S. (2018). Effects of task difficulty and display format on automation usage strategy: A workload capacity analysis. Human Factors, 60(4), 527–537. https://doi.org/10.1177/0018720818759356

Yamani, Sato, Jackson, Politowicz, & Chancey, E. T. (in press). A theoretical approach to management of limited attentional resources to support the m:N operation in advanced air mobility ecosystem. In Lawless, W. F., Mittu, Sofge, & Hesham (Eds.), Human-machine shared contexts. Elsevier Science.
