When dimensions collide: The risks of unjustified simplification of complex leadership concepts

Abstract

Although prior research has shown that behavioral and nonbehavioral items are not equivalent, researchers have failed to recognize that these items elicit reliance on different memory processes and predict different outcomes. The typical use of multidimensional scales compounds this problem even further when researchers ignore the multidimensional nature of the construct and smash disparate dimensions based on semantic and episodic memory sources into a single composite score, a practice we refer to as DIMSmash. We conduct a literature review of four popular leadership measures to examine the prevalence of DIMSmash, demonstrate how a memory-based approach to measurement suggests that DIMSmash is ill-advised, and conduct a computer simulation to show that the consequences of DIMSmash are not trivial. We found that the practice of DIMSmash is pervasive; 89% of the reviewed articles collapsed multidimensional leadership measures into a single score, even though the consequences of DIMSmash matter. Proceeding with DIMSmash despite the warning signs yields misleading results that obscure relationships with outcomes, which leads to incorrect conclusions. We suggest that the use of a bifactor model can overcome the limitations of DIMSmash.

Keywords

Measurement; Validity; Leadership; Memory processes; Theory testing; Construct validity

Words matter. Yet researchers routinely assume that scale items are equivalent and act on that assumption by engaging in practices that undermine construct validity. This problem is illustrated by the widespread conflation of behavioral and nonbehavioral items (Banks et al., 2023) in scales developed to measure a wide range of constructs in the management domain, with leadership being a prime example. As observed by Fischer (2023), measures intended to be behavioral are often composed of multiple items with highly abstract language that refers to the leader's character or the leader's effect on followers. While mixing different types of items within a single scale has been criticized, both behavioral and nonbehavioral items are necessary to capture complex multidimensional leadership constructs because the dimensions typically focus on both what the leader does and trait-based attributions of who the leader is (Lord, 1985). However, problems arise with the typical use of multidimensional scales because researchers act as if the behavioral and nonbehavioral dimensions are equivalent and collapse them into a single composite score.

We argue that researchers are largely unaware that the dimensions in their multidimensional measures may not only reflect different dimensions of a construct but also tap into distinct memory processes in a way that profoundly affects construct and predictive validity. Indeed, cognitive research has long distinguished between two distinct memory processes: one relying on abstract general impressions (semantic memory) and the other relying on behaviors or events in a specific context (episodic memory). Prior research has documented that there are meaningful differences between behavioral and nonbehavioral items; they are associated with different memory processes, and they predict different outcomes (Hansbrough et al., 2021). The practice of collapsing all dimensions into a single composite score, regardless of the memory system reflected by these items, not only potentially misrepresents the overall construct but also may obfuscate relationships predicted by our theories. We refer to this practice as DIMSmash because all of the items that are intended to represent different dimensions of the multidimensional construct are smashed into a single score. As observed by Rhemtulla et al. (2020), the assumption that all items measure the same thing is plausible in some contexts; however, when measuring broader and more conceptually complex constructs, the items used in the scale are not interchangeable manifestations of the multidimensional construct. Nevertheless, common factor models are routinely assumed and applied without any justification, and the misapplication of those models can result in incorrect interpretations (Rhemtulla et al., 2020). When a construct's multidimensional structure has been conceptually and empirically verified, believing that the smashed composite score will adequately represent the intended construct is more of a hope or a wish than sound science.

While we focus on the practice of DIMSmash in the leadership domain, it should be noted that these concerns apply more broadly to both micro and macro levels of research. For example, Boon et al. (2019) observed that the human resource (HR) systems literature assumes an additive relationship between individual practices by summing scores on individual practices to create a single HR system score. Similarly, the strategic human capital literature has been criticized for conflating knowledge, skills, abilities, and other characteristics (KSAO) into a single human capital resource (HCR) score (Nyberg et al., 2014). As such, the ideas presented here have widespread applicability.

In the present paper, we (a) detail how a memory-based approach to measurement raises questions about the advisability of DIMSmash and how it undermines theory testing; (b) present the results of a literature review of four popular leadership measures to examine the prevalence of DIMSmash and the justification provided for doing so; (c) present the results from a computer simulation to explore the pitfalls of DIMSmash; and (d) provide recommendations to consider prior to DIMSmash to strengthen theory testing of multidimensional constructs. We begin our discussion with a review of the distinction between semantic and episodic memory.

Semantic and episodic memory

Semantic memory consists of general knowledge and broad cognitive schemas that include representations of people in terms of their traits, attributes, and group memberships (i.e., leader/non-leader). Semantic memory entails a pattern completion process that fills in the gaps in our memories and operates at a preconscious level (Hanges et al., 2000; Lord et al., 2001; Smith & DeCoster, 2000). For example, people may rely on general schemas such as implicit personality and implicit leadership theories when they have incomplete information about another person (Fiske & Taylor, 2013). Reliance on semantic memory may contribute to inaccuracy in ratings because people endorse items that seem familiar but did not actually occur (Hansbrough et al., 2015; Shondrick et al., 2010). It should be noted that semantic memory tends to be the default mode of processing due to its general relevance and ease of use in guiding retrospective judgments (Srull & Wyer, 1989).

In contrast, episodic memory is a context-specific memory for events and personal experiences that is integrated with information about the self and consists of rich, vivid details about the spatial context for an event, when the event occurred, and the emotions experienced during encoding (Allen et al., 2008; Tulving, 2002). Because episodic memory enables individuals to consciously re-experience past events upon retrieval (Tulving, 2002), it enables them to provide explanations or justifications for their conclusions. As indicated by Smith and DeCoster (2000), this differs from semantic memory, which may render perceivers unable to provide any justification for their answers other than intuition. Episodic and semantic memories are both aspects of declarative memory, yet they generally emphasize different brain structures (temporal lobes for semantic memory and the hippocampus and limbic systems for episodic memory; Addis & Schacter, 2012; Allen et al., 2008), which is consistent with dual-process models of memory.

It is important to note that one type of memory is not inherently better than the other. Rather, we maintain that the memory system being tapped must be consistent with the nature of the intended dimension. We recommend that ratings should be drawn from semantic memory when the dimensions focus on general impressions of the leader, whereas ratings should be drawn from episodic memory when the dimensions focus on leader behaviors.

It should be noted that developing measures that are purely episodic or semantic is challenging, in part, because raters have different experiences with leaders. Therefore, no item will evoke an episodic or semantic-based response from all individuals. Nevertheless, carefully written items can have markedly different tendencies to elicit responses based on semantic or episodic memory for both leadership and criterion measures (Hansbrough et al., 2021).

Winocur et al. (2010) contended that there is a dynamic interplay between semantic and episodic memory. Over time, the gist of an episodic memory is consolidated into semantic memory while still preserving the episodic memory. Thus, a rater may have two very different types of memories regarding a specific leader. For example, an individual may encode in episodic memory an event where a leader helped them complete a specific work task on a particular afternoon so they could leave early for an important event. The consolidated semantic representation of this experience may only be that the leader is considerate and helpful. The type of memory that raters use depends upon the availability of the information, the cues that are present, and the demands of the particular task.

The words used in an item have been shown to influence the cognitive processing and memory sources people rely on when responding to the item (ter Doest & Semin, 2005). For example, Hansbrough et al. (2021) found that abstract items centered on adjectives or generalized impressions were associated with semantic memory, whereas items focusing on behaviors were associated with episodic memory. Similarly, Lord et al. (2021) reported that episodic memory increased when an item referenced repeated events and specified a particular context. Recently, Balthazard et al. (2023) reported preliminary neurological evidence showing that abstract and concrete items differentially activate regions in the brain associated with either semantic or episodic memory.

Our focus on the memory system used by raters is consistent with the call from multiple psychometricians (e.g., Furr, 2021; Kane, 2006; Messick, 1990) to explore response process validity. This call has been codified in the 1999 Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999), which explicitly states the need for researchers to explore the response process validity of their measures. Response process validity concerns the alignment of people's thought processes as they respond to measures with the thought processes researchers assume respondents use when responding to these measures (Zumbo & Chan, 2014).

From a memory-based perspective, when researchers intend to assess constructs consistent with semantic memory, such as implicit leadership theories or overall leadership impressions, raters will tend to emphasize semantic memory while responding to items. Alternatively, when researchers intend to assess constructs consistent with episodic memory, such as leader behaviors, raters will tend to emphasize episodic memory when responding to items. Construct validity improves as the overlap between the actual and hypothesized thought processes increases (Böckenholt, 2012; Böckenholt, 2017).

The common practice of combining dimensions based on behavioral and nonbehavioral items into a single composite score overlooks the fact that they may reflect two distinct memory processes and predict different outcomes. As shown by Hansbrough et al. (2021), scales based on episodic memory better predicted specific, event-based constructs and outcomes (i.e., trust¹ and empowerment), whereas scales based on semantic memory better predicted general, abstract constructs (i.e., perceived organizational support) or global outcomes. We contend that combining semantic and episodic items into a single composite score may hide support for some of a theory's predictions, whereas other predictions may be overestimated simply as an artifact of a larger percentage of items that tap into one particular memory source, which is consistent with the nature of the outcome.

To summarize, we contend that the practice of collapsing dimensions based on behavioral and nonbehavioral items into a single composite score (i.e., DIMSmash) (1) ignores the meaningful distinction between dimensions based on semantic memory and those based on episodic memory, undercutting response process validity, and (2) results in misleading conclusions regarding the support of our theories.

Use of Multidimensional Scales

Although many leadership constructs are conceptualized and empirically validated as multidimensional, these constructs are typically collapsed into single composite scores. Ignoring the multidimensional nature of a construct and treating it as unidimensional is inappropriate (Carver, 1989; Chen et al., 2012; Hull et al., 1991) and reflects a lack of understanding of common factor models. Researchers erroneously assume that a common factor model reflects a combination of a measure's items. In reality, common factor models reflect only what is shared (i.e., common) among all items in a measure (Rhemtulla et al., 2020). Any variance that is not shared among all of the items is considered to be random error.
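The point about shared variance can be made concrete with the standard single common factor model. The following is a minimal sketch; the symbols $x_i$, $\lambda_i$, $\eta$, and $\varepsilon_i$ are expository notation assumed here and do not appear in the cited sources.

$$
x_i = \lambda_i \eta + \varepsilon_i, \qquad \operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \ \ (i \neq j), \qquad \operatorname{Cov}(\eta, \varepsilon_i) = 0
$$

$$
\operatorname{Cov}(x_i, x_j) = \lambda_i \lambda_j \operatorname{Var}(\eta) \quad (i \neq j)
$$

Under this model, every covariance between items is carried by the single factor $\eta$. Variance that is shared by only a subset of items (e.g., the items of one dimension) has no place in the model other than the residual terms $\varepsilon_i$, which are assumed to be uncorrelated.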

Consequences of DIMSmash. The consequences of DIMSmash are not trivial as they may obscure relationships with outcomes. “If indicators of the construct represent disparate facets and those facets uniquely predict, or are predicted by, another variable in the model, the common factor model will lead to inappropriate inferences” (Rhemtulla et al., 2020, p. 32). This concern is particularly relevant for multidimensional models of leadership where behavioral and nonbehavioral dimensions, which are associated with different memory processes, are anticipated to predict different outcomes.

As noted by Edwards (2001), the determination of the appropriate model depends on whether the multidimensional construct is a superordinate or aggregate construct. A superordinate construct is top-down whereby the construct is manifested by its dimensions (e.g., leader–member exchange [LMX] as indicated by Affect, Loyalty, Contribution, and Professional Respect). Thus, multidimensional leadership theories are superordinate constructs. In contrast, an aggregate construct is bottom-up whereby the construct is an additive combination of its dimensions (e.g., overall job satisfaction as a composite of satisfaction with specific job facets). While there are aggregate constructs in organizational behavior (Edwards, 2001), the majority of researchers conceptualize their constructs as superordinate (Credé & Harms, 2015). In keeping with this practice and our objective to focus on the use of multidimensional models in the leadership domain, we will confine our discussion to superordinate constructs.

Although superordinate constructs are often operationalized by summing the scores on all dimensions to create a single composite score (i.e., DIMSmash), this fails to recognize that there may be meaningful differences among the first-order dimensions, and these dimensions are expected to differentially relate to various dependent variables. As noted by Edwards (2001, p. 185), “If the dimensions were not distinct, then the construct would be unidimensional rather than multidimensional.” A bifactor model reflects the general factor that is common among the items as well as the individual dimensions. As such, a bifactor model allows researchers to examine the influence of the different dimensions apart from that of the common factor on criteria. Bifactor models are particularly useful when researchers are interested in the predictive validity of both the general factor and the different dimensions (Chen et al., 2006). Therefore, bifactor models are particularly relevant for multidimensional leadership theories where different dimensions are anticipated to predict different outcomes. We contend that a bifactor model is a more appropriate model because the contribution of each dimension is preserved.²
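For comparison with the single common factor model, a bifactor model in its common orthogonal form can be sketched as follows. The notation is an expository assumption: $G$ denotes the general factor, $S_{d(i)}$ the specific factor for the dimension $d(i)$ to which item $i$ belongs, and $\lambda^{G}_{i}$ and $\lambda^{S}_{i}$ the corresponding loadings.

$$
x_i = \lambda^{G}_{i}\, G + \lambda^{S}_{i}\, S_{d(i)} + \varepsilon_i, \qquad \operatorname{Cov}(G, S_d) = 0, \qquad \operatorname{Cov}(S_d, S_{d'}) = 0 \ \ (d \neq d')
$$

Because each item loads on both the general factor and its own dimension's factor, variance that is shared only within a dimension is modeled explicitly rather than relegated to the residual, which is what allows a dimension to relate to a criterion apart from the general factor.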

How Common is DIMSmash? Although Rhemtulla et al. (2020) asserted that researchers routinely practice DIMSmash, we do not know to what extent this extends to the leadership domain. Nor do we know what justifications researchers provide for this practice. Therefore, we surveyed the literature to explore (a) the prevalence of DIMSmash and (b) what justification, if any, was provided for doing so. The following multidimensional leadership measures were selected based on reviews of the most popular leadership models (e.g., Dinh et al., 2016; Zhu et al., 2019): transformational leadership (Multifactor Leadership Questionnaire [MLQ] 5X; Bass & Avolio, 1996); LMX (LMX MDM, Liden & Maslyn, 1998); authentic leadership (Avolio et al., 2007; Walumbwa et al., 2008); and servant leadership (Liden et al., 2008). All of these measures were originally developed to represent multidimensional constructs, yet the extent to which they continued to be used in this manner is unclear.

Following Kim et al. (2020), the search was constrained to 26 high-quality and peer-reviewed journals in management, psychology, and leadership published in 2018–2023 (see Appendix A for a complete list of the journals included). All journals are indexed in PsycINFO. A keyword search was conducted using the terms “transformational leadership,” “LMX,” “authentic leadership,” and “servant leadership.” The initial search yielded a total of 343 articles. We then applied the following inclusion/exclusion criteria as depicted in Figure 1. Articles were excluded from the sample if they were not based on original empirical, quantitative data; did not use one of the multidimensional measures of interest; did not use all of the items in the scale; or used modified scale items. A total of 55 articles met the inclusion criteria and were included in our sample. All articles included in the sample are denoted by an asterisk (*) in the references.

Figure 1.

Inclusion/Exclusion Criteria. Note. LMX MDM = Leader–Member Exchange MDM; MLQ = Multifactor Leadership Questionnaire.

We found that DIMSmash was widespread as evidenced by approximately 89% of the articles engaging in this practice. As shown in Table 1, DIMSmash is not confined to a particular measure but is a consistent practice across measures. This suggests that DIMSmash is the norm rather than the exception. Furthermore, 63.27% of the studies provided no evidence to justify DIMSmash, and another 12.24% relied on the explanation that it is a common practice (e.g., everybody does it).

Table 1.

Prevalence of DIMSmash and Justification.

Measure	Total number of studies (k) meeting inclusion criteria	Combined multidimensional scale into a single score	Justification: None	Justification: “Everyone does it”	Justification: CFA second-order factor with highly correlated first-order dimensions	Justification: Conducted CFA but ignored results	Justification: Ran bifactor and used model's factor scores
LMX (LMX–MDM)	20	20	16 (80.00%)	1 (5.00%)	2 (10.00%)	1 (5.00%)	0 (0.00%)
Servant Leadership	5	5	2 (40.00%)	1 (20.00%)	1 (20.00%)	1 (20.00%)	0 (0.00%)
Transformational Leadership (MLQ-5x)	21	18	9 (50.00%)	3 (16.67%)	4 (22.22%)	1 (5.56%)	1 (5.56%)
Authentic Leadership (ALQ)	9	6	4 (66.67%)	1 (16.67%)	0 (0.00%)	1 (16.67%)	0 (0.00%)
Total	55	49	31 (63.27%)	6 (12.24%)	7 (14.29%)	4 (8.16%)	1 (2.04%)
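The percentages in Table 1 can be recomputed from the counts. The short sketch below transcribes the counts from Table 1 and assumes, as the row totals suggest, that each justification percentage uses the number of studies that combined the scale as its denominator; the variable and function names are illustrative only.

```python
# Counts transcribed from Table 1:
# measure -> (k meeting criteria, k combined into one score, justification counts)
JUSTIFICATIONS = [
    "None",
    "Everyone does it",
    "CFA second-order factor",
    "Conducted CFA but ignored results",
    "Ran bifactor",
]
TABLE_1 = {
    "LMX (LMX-MDM)": (20, 20, [16, 1, 2, 1, 0]),
    "Servant Leadership": (5, 5, [2, 1, 1, 1, 0]),
    "Transformational Leadership (MLQ-5X)": (21, 18, [9, 3, 4, 1, 1]),
    "Authentic Leadership (ALQ)": (9, 6, [4, 1, 0, 1, 0]),
}


def pct(part, whole):
    """Percentage rounded to two decimals, as reported in the table."""
    return round(100 * part / whole, 2)


total_k = sum(row[0] for row in TABLE_1.values())
total_combined = sum(row[1] for row in TABLE_1.values())
# Column sums of the justification counts across the four measures.
justification_totals = [
    sum(row[2][j] for row in TABLE_1.values()) for j in range(len(JUSTIFICATIONS))
]

if __name__ == "__main__":
    print("Combined into a single score:", pct(total_combined, total_k), "%")
    for label, count in zip(JUSTIFICATIONS, justification_totals):
        print(f"{label}: {count} ({pct(count, total_combined)}%)")
```

Each justification count sums across measures to the "Total" row, and the share of studies that combined the scale is 49 of 55.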

Additionally, 14.29% of the articles conducted a confirmatory factor analysis (CFA) and argued that the obtained higher-order common factor was highly correlated with the first-order dimensions, thereby justifying smashing the dimensions into a single composite score. However, as Credé and Harms (2015) pointed out, this is insufficient. Researchers must also demonstrate that the second-order factor completely accounts for the relationships among the first-order dimensions as well as among all the scale's items. If the second-order factor cannot account for these relationships, only focusing on the second-order composite score ignores any unique relationships that can be predicted by the first-order dimensions. The readily available χ² fit test can be used to assess this property (Credé & Harms, 2015). If a second-order CFA is imposed on the data and the χ² fit test is non-significant, this provides evidence that the property is satisfied. However, caution is warranted when interpreting a non-significant χ² fit test due to possible alternative explanations (e.g., low statistical power or alternative models with superior global fit). We agree with Credé and Harms (2015) that a second-order factor should be used only if it has been shown that the higher-order factor completely explains the relationships below it (i.e., items and first-order relationships). Here, we discuss the consequences of using a single higher-order factor when that factor cannot explain all of the lower-level relationships. Unfortunately, this is a common occurrence; therefore, researchers need to be cognizant of the serious pitfalls of DIMSmash.

In other cases, the results of the CFA were not used to inform subsequent analyses; 8.16% of the articles presented CFA results that showed that the multidimensional model was a better fit but ignored those results and used the single composite score. Only one article (2.04%) ran a bifactor model, which we contend is the most appropriate model for multidimensional leadership measures.

Taken together, the results indicate that not only are multidimensional measures typically used in a manner that is inconsistent with the conceptualization of the intended constructs but also that DIMSmash is a widely accepted practice in the leadership literature. Indeed, the high percentage of articles that provided no justification for DIMSmash suggests that neither researchers nor reviewers view this practice as problematic.

Memory and Advisability of DIMSmash: An Example Based on LMX MDM

A memory-based approach questions the advisability of DIMSmash. We contend that researchers must consider the underlying nature of the dimensions, the item characteristics within the dimensions, and the outcomes predicted by the different dimensions prior to engaging in DIMSmash. We apply this approach using one of the leadership measures identified in the review, LMX MDM, to explore how taking memory into account might impact our decision to engage in DIMSmash and how engaging in DIMSmash despite the warning signs might change our conclusions. Our intention is not to criticize LMX, or the LMX MDM, but rather to reflect on how the measure, and other multidimensional measures, are typically used.

Underlying nature of the dimensions. LMX MDM consists of four dimensions: Affect, Loyalty, Professional Respect, and Contribution. A close inspection of the dimensions reveals that they are consistent with both semantic and episodic memory. Specifically, the Affect, Loyalty, and Professional Respect dimensions are based on generalized impressions (e.g., likable/dislikable, Srull & Wyer, 1989) and are consistent with semantic memory, whereas the Contribution dimension, described as task-related behaviors (Liden & Maslyn, 1998), is consistent with episodic memory. Although the construct is conceptualized as being multidimensional, Liden and Maslyn (1998) found support for a higher-order factor. Therefore, subsequent research routinely treats LMX MDM as a unidimensional, composite measure (e.g., Bauer et al., 2006). However, as we noted earlier, showing support for a higher-order factor is, by itself, insufficient to justify using a single composite score (e.g., Credé & Harms, 2015).

Item characteristics. We examined items within each dimension according to Hansbrough et al.'s (2021) approach. The item characteristics provide further evidence that the dimensions of LMX MDM may be based on different memory sources. Most of the original 11 LMX MDM items within the different dimensions center on global generalized impressions of the leader and, therefore, are more abstract (e.g., “My supervisor is the type of person one would like to have as a friend”). However, the Contribution items refer to behaviors/events involving subordinate behaviors (e.g., “I do work for my supervisor that goes beyond what is specified in my job description”) and are more concrete. Moreover, because these items focus on the rater's actions, they may allow one to re-experience actions in a particular context, which is the hallmark of episodic memory (Tulving, 2002). In contrast, the Affect, Loyalty, and Professional Respect items pertain to global general impressions of the leader and enduring emotional states (e.g., “I like my supervisor as a person”), which are much less likely to evoke a memory of a particular event. (See also Appendix B for a list of items and their likely memory source.)

Next, we recruited six subject matter experts to classify the 11 LMX MDM items from Liden and Maslyn (1998) into one of two categories. The items were randomly presented to the raters, who were provided with precise definitions of the two intended categories. Specifically, each item was to be classified as either semantic or episodic.

Semantic items were defined as those that refer to a generalized impression of the leader, inferences about the leader's internal state or motives, likely future behavior, or an enduring emotional state in relation to the leader. In contrast, episodic items were defined as those that refer to specific behaviors or events that respondents could recall and mentally re-experience when responding, respondents’ self-knowledge in a particular context, or respondents’ anticipated future behavior.

The classification task was explained to all raters during a single Zoom session. Raters were guided through four practice items, and after each classification, the responses were reviewed collectively. Once the practice phase concluded and any questions were addressed, the raters completed the task independently. Of these raters, 100% classified the Contribution items as episodic whereas 33% classified any of the other items as episodic. Thus, the results replicated our classification of items.

Prediction of outcomes. Liden and Maslyn (1998) anticipated that “a global outcome such as satisfaction with supervision, which encompasses multiple beliefs about the leader, would be related to Professional Respect as well as Affect, Loyalty and Contribution” (p. 62). However, based on the predictive congruence principle (Hansbrough et al., 2021), we would expect that only dimensions consistent with semantic memory (i.e., Affect, Loyalty, and Professional Respect) would predict satisfaction with supervision because it is also based on a general impression. Indeed, Liden and Maslyn (1998) found that Affect, Loyalty, and Professional Respect all predicted satisfaction with supervision, while the Contribution dimension, which is more concrete, did not. In contrast, the Contribution dimension predicted satisfaction with work, but the other dimensions did not.

The examination of item characteristics within the dimensions and the outcomes they predict can be used to guide the selection of the most relevant models to compare using CFA. Based on our observation that Contribution differs from the other dimensions, the most relevant comparison would be a bifactor model in which the global semantic factor more strongly loads on the Loyalty, Affect, and Professional Respect items than the Contribution items and the Contribution items also load on a unique factor.³ The intercorrelations among the latent factors reported by Liden and Maslyn (1998, Figure 1) support this interpretation. The average correlation between the Contribution latent factor and the other three factors is .52, whereas the average correlation between the three non-Contribution factors is .73. These results suggest that Contribution is acting in a systematically different way than the other three dimensions. Therefore, DIMSmash may be particularly inadvisable in this situation.

But let us assume that despite all of the warning signs, we proceed with DIMSmash anyway. Our example using LMX MDM, as detailed above, shows that Contribution is the only dimension that predicts satisfaction with work. Rhemtulla et al. (2020) cautioned that when a dimension uniquely predicts a variable, collapsing the scale into a single composite score may obscure relationships with outcomes. We conducted a simulation to demonstrate this assertion.

Does DIMSmash Matter? A Simulation

Procedures. We conducted a Monte Carlo simulation to illustrate the misleading results obtained when a multidimensional measure (i.e., LMX MDM) is collapsed into a single composite score. As discussed earlier, Liden and Maslyn's (1998) exploratory factor analysis and CFA revealed that LMX MDM had four distinct factors (i.e., Affect, Loyalty, Contribution, and Professional Respect). We used the correlations from Table 5 of Liden and Maslyn's (1998) paper (i.e., correlations from the validation sample of organizational employees) to generate data for each LMX dimension in our simulation.

We chose the two dependent variables discussed earlier (i.e., “satisfaction with work” and “satisfaction with supervision”) to demonstrate any potential biasing effect of DIMSmash in our simulation. As shown by Liden and Maslyn (1998; see Table 6), these two variables exhibited different patterns of results with the four LMX dimensions. We used the standardized regression coefficients shown in Liden and Maslyn's Table 6 for satisfaction with work and satisfaction with supervision to generate the dependent variables in our simulation. We were able to recapture the values that Liden and Maslyn (1998) obtained for the correlations among the LMX dimensions (see Table 2) as well as the standardized regression coefficients with the dependent variables (see Table 3) in our simulation. This replication of Liden and Maslyn's (1998) results confirmed that our simulation code was working and that our results are interpretable.

Table 2.

Comparison of Liden and Maslyn's (1998) Table 5 Correlations among the LMX Dimensions and Simulation Results.

Liden & Maslyn's Results	Affect	Loyalty	Contribution	Professional Respect
1. Affect		0.65	0.38	0.67
2. Loyalty			0.38	0.57
3. Contribution				0.32
Simulation Results	Affect	Loyalty	Contribution	Professional Respect
1. Affect		0.649	0.379	0.669
2. Loyalty			0.379	0.569
3. Contribution				0.320

Note. We used Liden and Maslyn's (1998) correlations for the validation sample of organizational employees.

Table 3.

Comparison of Liden and Maslyn's (1998) Table 6 Regression Coefficients and Simulation Results.

Liden and Maslyn's results	Affect	Loyalty	Contribution	Professional Respect
Satisfaction with work	−0.33	0.25	0.29	0.00
Satisfaction with supervision	0.34	0.25	0.13	0.26
Simulation results	Affect	Loyalty	Contribution	Professional Respect
Satisfaction with work	−0.329	0.252	0.290	−0.002
Satisfaction with supervision	0.340	0.249	0.130	0.260

We ran the simulation in RStudio 2025.05.0+496 “Mariposa Orchid” Release for Windows. One of the authors of this paper wrote the R code for this simulation (see Appendix C). Each simulated sample had the same size as Liden and Maslyn's (1998) validation sample of organizational employees (n = 249), and we randomly generated 50,000 samples. After generating the data for the variables, we committed DIMSmash in two ways. First, we combined the four LMX dimensions into a single composite, hereafter referred to as the unit composite, by simply averaging the dimension scores. In the original scale development study, the four LMX dimensions differed in terms of the number of items used to measure each dimension; the Contribution dimension was measured with two items whereas all the other LMX dimensions had three items. When the number of items for each dimension varies, the overall score more strongly reflects the dimensions with the most items. Therefore, we also created a second composite intended to reflect all dimensions equally. This second composite score, hereafter referred to as the differentially weighted composite, was created by weighting each LMX dimension by the number of items in that dimension. Within each sample, we correlated the two composite scores with the dependent variables and recorded the results. Our results represent the average correlations obtained across the 50,000 replication samples.
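The logic of this procedure can be sketched in a few lines. The Python code below is an illustrative reimplementation and not the R code in Appendix C. It rests on assumptions that the text does not specify: the dimension scores are drawn from a multivariate normal distribution with the Table 2 correlations, each dependent variable is generated from the Table 3 coefficients with a residual variance chosen so that the criterion has unit variance, and the differentially weighted composite uses weights proportional to the item counts (3, 3, 2, 3). The number of replications is reduced to keep the sketch fast, so its output should not be expected to reproduce the reported values.

```python
import math
import random

# Correlations among Affect, Loyalty, Contribution, Professional Respect (Table 2).
R = [
    [1.00, 0.65, 0.38, 0.67],
    [0.65, 1.00, 0.38, 0.57],
    [0.38, 0.38, 1.00, 0.32],
    [0.67, 0.57, 0.32, 1.00],
]
# Standardized regression coefficients (Table 3, Liden and Maslyn's results).
BETAS = {
    "work": [-0.33, 0.25, 0.29, 0.00],
    "supervision": [0.34, 0.25, 0.13, 0.26],
}
UNIT_W = [1, 1, 1, 1]  # unit composite: simple average of the dimensions
ITEM_W = [3, 3, 2, 3]  # ASSUMPTION: weights proportional to item counts


def cholesky(a):
    """Lower-triangular factor L with L L' = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residual_sd(beta):
    """ASSUMPTION: residual SD that gives the criterion unit variance."""
    explained = sum(beta[i] * R[i][j] * beta[j] for i in range(4) for j in range(4))
    return math.sqrt(1.0 - explained)


def simulate(outcome, weights, n=249, reps=200, seed=1):
    """Average composite-criterion correlation across `reps` samples of size `n`."""
    rng = random.Random(seed)
    low = cholesky(R)
    beta = BETAS[outcome]
    sd_e = residual_sd(beta)
    total = 0.0
    for _ in range(reps):
        composite, criterion = [], []
        for _ in range(n):
            z = [rng.gauss(0.0, 1.0) for _ in range(4)]
            # Correlated dimension scores: x = L z.
            x = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(4)]
            criterion.append(sum(b * v for b, v in zip(beta, x)) + rng.gauss(0.0, sd_e))
            composite.append(sum(w * v for w, v in zip(weights, x)) / sum(weights))
        total += pearson(composite, criterion)
    return total / reps


if __name__ == "__main__":
    for outcome in BETAS:
        for label, w in (("unit", UNIT_W), ("item-weighted", ITEM_W)):
            print(outcome, label, round(simulate(outcome, w), 3))
```

Calling `simulate("work", UNIT_W)` and `simulate("supervision", UNIT_W)` contrasts the two criteria under the same composite; increasing `reps` toward 50,000 narrows the Monte Carlo error at the cost of run time.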

Results. Both the unit composite and the differentially weighted composite failed to exhibit significant correlations with satisfaction with work (Unit Composite: $\bar{r}$ (98) = 0.094, $\bar{t}$ (98) = 1.598, p > .05; Differentially Weighted Composite: $\bar{r}$ (98) = 0.084, $\bar{t}$ (98) = 1.376, p > .05). Collapsing the LMX MDM measure into a single composite score yields the impression that LMX does not relate to satisfaction with work. However, that is an incorrect conclusion. Liden and Maslyn (1998) found that the Contribution dimension significantly predicted satisfaction with work. Because none of the other LMX dimensions significantly contributed to the prediction of this dependent variable, both the Unit and Differentially Weighted Composite scores hid this relationship. Therefore, DIMSmash would have misled researchers regarding the relationship between LMX and satisfaction with work.

The opposite result was obtained with satisfaction with supervision as the criterion. Both the Unit Composite and the Differentially Weighted Composite showed significant correlations with this dependent variable (Unit Composite: $\bar{r}$ (98) = 0.805, $\bar{t}$ (98) = 42.474, p < .001; Differentially Weighted Composite: $\bar{r}$ (98) = 0.811, $\bar{t}$ (98) = 44.043, p < .001). However, Liden and Maslyn (1998) found that the Affect, Loyalty, and Professional Respect dimensions significantly predicted satisfaction with supervision, but the Contribution dimension did not. This differs from the conclusions that would be reached using the DIMSmash composite scores. The magnitude of the correlations between the composite scores and the dependent variable is a highly inflated estimate of LMX's relationship to satisfaction with supervision. Indeed, these correlations are so large that they might raise questions in researchers’ minds regarding the extent to which LMX and satisfaction with supervision are truly separate constructs. However, a comparison of these DIMSmash composite score correlations with the actual regression weights shown in Table 3 reveals the misleading nature of the composite correlations. In summary, consistent with Carver’s (1989) and Rhemtulla et al.’s (2020) observations, creating a single composite score via DIMSmash yields misleading results.
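The pooling that produces these composite correlations can be written out with a standard identity for weighted sums of standardized variables. As a sketch, assume the dimension scores are standardized, and let $\mathbf{R}_{xx}$ denote their intercorrelation matrix, $\mathbf{r}_{xy}$ the vector of zero-order correlations between each dimension and the outcome, and $\mathbf{w}$ the composite weights (these symbols are introduced here for illustration and are not part of the original analysis). The correlation between the composite and the outcome is then

$$r_{cy} = \frac{\mathbf{w}^{\top}\mathbf{r}_{xy}}{\sqrt{\mathbf{w}^{\top}\mathbf{R}_{xx}\,\mathbf{w}}}$$

For the unit composite all weights are equal, so the expression reduces to the sum of the dimension-outcome correlations divided by the square root of the sum of all entries of $\mathbf{R}_{xx}$. A single composite correlation therefore pools the dimension-level correlations: same-signed correlations accumulate in the numerator and opposite-signed or near-zero correlations dilute it, so the composite value alone cannot show which dimensions carry the relationship.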

Although we focused on how dimensions based on different types of memory are related to different outcomes, it should be noted that there may be other reasons why the dimensions relate to outcomes. For example, Liden and Maslyn (1998) reported that both Loyalty and Contribution were related to leader-rated performance. This is likely because they reflect different aspects of performance (e.g., relationships and task) that are associated with both semantic and episodic memory. However, the association between Loyalty and leader-rated performance (.48) is much higher than the association between Contribution and leader-rated performance (.32). DIMSmash would have obscured this distinction. The Contribution dimension items center on the follower's self-reported behavior. Therefore, another possibility is that social desirability or self-presentational concerns may be driving the Contribution results. However, as pointed out by Liden and Maslyn (1998), although Contribution was significantly related to social desirability (.16, p < .05), the relationship was not practically significant, and none of the correlations between social desirability and Affect, Loyalty, or Professional Respect were significant. Moreover, the reported means of Contribution were similar to those of the other dimensions, which suggests that Contribution was not inflated by impression management.

To illustrate what the bifactor model does and how it shows the specific dimensions independent from the common factor, we applied this model using data generated based on Liden and Maslyn’s (1998) LMX MDM data. Liden and Maslyn (1998) reported that their data fit the correlated first-order CFA model in Figure 1. We used the standardized coefficients in this figure to generate simulated data that conformed to their original data. We chose a bifactor model because Edwards (2001) argued that it is the most appropriate model for superordinate multidimensional theories.

We used RStudio 2025.05.0+496 “Mariposa Orchid” Release for Windows to run our simulation, and one of the authors of this paper wrote the R code for this simulation (see Appendix D). The program generated 500 observations using the standardized coefficients shown in Liden and Maslyn's Figure 1. Before examining what happens when the bifactor model is applied to these simulated data, we needed to verify that the R code worked properly. We did this by fitting the generated data back to Liden and Maslyn's (1998) Figure 1 correlated first-order CFA model. We expected that the simulated data would fit exceptionally well. As shown in Figure 2, the fit of the correlated first-order factor model was almost perfect. The obtained factor loadings closely converged to those reported by Liden and Maslyn (average discrepancy = −0.002, min/max discrepancy = −0.057 to 0.038), as did the correlations among the first-order factors (average discrepancy = 0.0135, min/max discrepancy = −0.03 to 0.041). Therefore, we concluded that our R code worked correctly and generated data corresponding to Liden and Maslyn's (1998) data.

Figure 2.

Results for Correlated First-order Confirmatory Factor Analysis (CFA) to Confirm Appendix D Program is Working. Note. BIC = Bayesian information criterion; CFI = comparative fit index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual.

Next, we applied the bifactor model to these data. As shown in Figure 3, the fit of this model was almost identical to the correlated first-order factor model. Comparing the fit of the correlated first-order model to the fit of the bifactor model reveals a non-significant difference between these models ($\Delta\chi^{2}(3) = 4.97$, ns). Based on fit indices alone, the bifactor model may appear equivalent to the common factor model. However, examination of the factor loadings of the bifactor model highlights an essential difference between these models. Consistent with our expectations, the previously identified semantic Affect, Loyalty, and Professional Respect items (see Appendix B) loaded more strongly on the general bifactor LMX dimension (LMX_Affect $\bar{r} = 0.79$, LMX_Loyalty $\bar{r} = 0.63$, LMX_Professional Respect $\bar{r} = 0.71$) than on the original first-order dimensions (Affect $\bar{r} = 0.36$, Loyalty $\bar{r} = 0.38$, Professional Respect $\bar{r} = 0.48$). In contrast, the episodic Contribution items loaded less strongly on the general bifactor LMX dimension (LMX_Contribution $\bar{r} = 0.25$) than on the original Contribution dimension (Contribution $\bar{r} = 0.71$). In summary, consistent with our argument, the common dimension does not represent all the items and first-order factors contained in the LMX MDM measure. Rather, the general dimension was predominantly weighted toward the semantic instead of episodic items. Thus, a common factor model can be misleading whereas a bifactor model retains the information contained across all items.
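The nonsignificance of the reported chi-square difference can be checked directly from the statistic and its degrees of freedom. The following minimal base R sketch uses only the two values given in the text; the object names are illustrative.

```r
# Reported chi-square difference between the two models and its degrees of freedom
delta_chisq <- 4.97
delta_df <- 3

# Upper-tail probability under the chi-square distribution
p_value <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)

# Critical value at alpha = .05, for comparison with the observed statistic
critical_value <- qchisq(0.95, df = delta_df)

cat(sprintf("p = %.3f; critical value = %.3f\n", p_value, critical_value))
```

Because the observed statistic falls below the critical value, the difference in fit is not significant at the .05 level, which matches the conclusion in the text.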

Figure 3.

Results for Bifactor Confirmatory Factor Analysis (CFA) Using the Simulated Liden and Maslyn (1998) data. Note. BIC = Bayesian information criterion; CFI = comparative fit index; LMX = leader–member exchange; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual.

Discussion

Researchers inadvertently engage in practices that undermine validity. We contend that this stems from a failure to recognize that behavioral and nonbehavioral items reflect two very different memory processes and predict different outcomes. The ubiquitous practice of smashing dimensions based on semantic and episodic memory into a single composite score (DIMSmash) compounds this problem even further. Indeed, 89% of the articles examined in our review collapsed multidimensional measures into a single composite score. Not only was DIMSmash pervasive across all of the measures examined, over 75% of the studies either offered no justification for doing so or observed that it was consistent with prior practice. However, DIMSmash undermines theory testing because it no longer represents the theory as intended. Going forward, we contend that researchers must provide more rigorous evidence (or any!) prior to DIMSmash. Failure to address the issue or simply observing that “everybody does it” is insufficient. Moreover, we call on reviewers to request compelling justification for DIMSmash.

Taking memory into account questions the advisability of DIMSmash. As shown in our example using LMX MDM, examining the intended nature of the underlying dimensions, item characteristics within each dimension, and the outcomes predicted by the different dimensions indicates that Contribution is acting in a systematically different way from that of the other dimensions. Therefore, this observation suggests that DIMSmash is ill-advised. As previously noted, collapsing a multidimensional measure into a single composite score conflates different types of memory and obscures relationships with dependent variables, resulting in incorrect conclusions.

Our simulation, based on Liden and Maslyn's (1998) data, highlights this point. Because Contribution is the only dimension that predicts satisfaction with work, DIMSmash obscures this relationship. Conversely, DIMSmash heightens the relationship with satisfaction with supervision because three of the four dimensions predict satisfaction with supervision, while Contribution is poorly correlated with the other dimensions as well as the dependent variable. In contrast, we contend that a bifactor model represents the theory as intended and will not yield misleading results because it shows the influence of the different dimensions on outcomes apart from that of the common factor.

Although our focus here was on the use of multidimensional leadership scales, our concerns regarding DIMSmash also apply to dependent variables such as organizational citizenship behaviors (OCBs). This is particularly relevant because OCBs are often used as a key outcome linked to leadership. If both multidimensional scales are smashed, the interpretation of the results becomes muddied. This issue also applies beyond the leadership domain. For example, general intelligence is often conceptualized as having both crystallized and fluid components, which show only modest correlations and predict different criteria. Furthermore, fluid intelligence declines with advancing age whereas crystallized intelligence is more stable (Allen et al., 2008; Persson & Nyberg, 2008). Yet these dimensions are typically combined into a G factor.

Nevertheless, DIMSmash appears to be the rule rather than the exception. It should be noted that the implications of the routine practice of DIMSmash are not trivial and extend well beyond a single study. For example, because relationships with key dependent variables may be obscured, researchers may erroneously conclude that portions of the theories they intended to test are not supported. Multiplied over many studies, DIMSmash raises significant concerns about whether we know what we think we know about the effect of leadership on outcomes. Meta-analyses composed of studies that use DIMSmashed composites may yield true scores that are entirely meaningless and uninterpretable because they do not represent the theory as intended. Moreover, there may be cascading effects when measures subjected to DIMSmash are used to provide evidence of construct validity/discriminant validity for new scale development. In the next section, we provide recommendations to consider prior to DIMSmash, which can be applied to many areas of organizational science.

Recommendations to Consider Prior to DIMSmash

Consider the nature of the underlying dimension, item characteristics within the dimensions, and prediction of outcomes. As highlighted in our example using LMX MDM, we encourage researchers to consider the nature of the underlying dimensions, item characteristics of the dimensions, and whether the dimensions predict the same or different outcomes to determine if DIMSmash is advisable. DIMSmash might be a reasonable option under the following conditions: if all of the dimensions are consistent with semantic memory; if the items within the dimensions focus on general impressions or enduring emotional states; if all of the dimensions predict the same outcome(s); and if there is clear evidence that the second-order composite score completely accounts for the correlations among the first-order factors and the correlations among the items themselves (e.g., Credé & Harms, 2015). However, to our knowledge, no current multidimensional leadership measure meets these criteria.

The distinction between a common factor and a bifactor model matters. From a memory-based perspective, because a common factor model takes into account only the common systematic variance that is shared among all of the dimensions, it is likely to tap into an overarching schema, based on semantic memory, rather than the specific dimensions intended by the multidimensional construct. Furthermore, as shown in our simulation, a common factor model can obscure relationships with outcomes, which leads to erroneous conclusions. This is particularly likely in cases where one of the dimensions predicts a unique outcome. In contrast, a bifactor model shows the individual influence of different dimensions on criteria apart from that of the common factor. Therefore, a bifactor model will not obscure relationships with outcomes. As shown by our CFA simulation, although the common factor and bifactor models appear equivalent based on fit indices, the factor loadings showed that the common dimension did not represent all of the items and first-order factors contained in the LMX MDM measure. In contrast, the bifactor model retained information across all items.

Future Research and Implications

Scale development: Moving toward response process validity. The use of questionnaires is problematic because they may capture general behavior, such as affect, rather than unique leader behaviors (Eva et al., 2024). A memory-based approach to measurement may be particularly well suited to pull apart leader behaviors from general impressions. Given the important implications of different types of memory, going forward, we call for the systematic examination of the memory source of items when developing scales to enhance the validity of our measures as well as to address calls for “purer” scales (Fischer, 2023; Hansbrough et al., 2021) composed of only behavioral or nonbehavioral items that reflect the same memory source. A variety of techniques have been proposed (e.g., remember/know judgments, think-aloud protocols, response times, physiological [e.g., pupillometry, EEG (electroencephalogram)] measures) that can be used to verify that items are operating as expected. These approaches offer different “windows” for revealing the underlying memory processes people use when responding to items. Although such windows may be too cumbersome to accompany routine measurement, they can be particularly helpful in the initial development of a measure.

Tulving (1985) proposed that raters have metacognitions about memory and can distinguish between ratings based on episodic memory, referred to as remember judgments, and those based on semantic memory, referred to as know judgments. The cognitive neuroscience literature has shown that episodic and semantic memories align with remember and know judgments (Diana et al., 2006, 2007). For example, Martell and Evans (2005) asked raters to indicate after responding to each item whether their response was based on a vivid recollection of a specific event (remember judgment) or was based on a general feeling or impression (know judgment). This technique could be used in scale development to identify items that are more likely to be associated with episodic or semantic memory. Think-aloud protocols are another technique that relies on rater metacognition. This technique can provide insight into people's cognitive processes and responses as they complete items. For example, Ercikan et al. (2010) applied the think-aloud protocol to identify and confirm sources that contribute to differential item functioning (DIF) in standardized assessments. However, concerns have been raised that asking people to reflect on their cognitive processes changes them (i.e., reactivity). This seems particularly relevant if we want to determine whether responses are based on automatic processes (e.g., schemas) or recollection of specific leader behaviors, because asking people to report their cognitive processes may move them away from default semantic memory processes. Furthermore, it should be noted that there is a lag between brain activity and free recall; put differently, free recall is an afterthought. As shown by Fried (2022), neurons in the hippocampus that code for a specific episodic memory activate 1 to 2 s before a person becomes aware of recalling that memory.

Response times may offer another helpful technique to identify different memory processes. For example, response times have been widely used in educational testing to identify when people are guessing (e.g., Sireci et al., 2008). Generally, search time in the memory literature is conceptualized as the time between the presentation of the memory probe (item to be recalled or items to be rated) and the initiation of the response to items. Because semantic memory is based on automatic associations, responses based on semantic memory should be shorter than those based on episodic memory. Consistent with this expectation, Dehaene (2014) found a key boundary between local, automatic, and conscious brain-scale processes occurred around 300 ms. Therefore, in terms of scale development, responses that are less than 300 ms are more likely to be associated with semantic memory. However, it is important to note that the relationship between response time and memory is unlikely to be linear. Memory search requires time and effort. If people are unable to recall a specific behavior, at some point they will give up and rely on a gut feeling (e.g., semantic memory). As observed by De Boeck and Jeon (2019, p. 8), “Early on in the response process the cost of spending more time is compensated by an increasing chance to find the correct response, but the longer it takes to find the correct response, the higher the cost becomes while the perceived chance of finding the correct response may decrease so that the expectation of a correct response no longer compensates for the cost of effort.”
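As a first-pass illustration of how the 300 ms boundary might be applied during scale development, the base R sketch below flags fast responses for one rater. The latencies and item labels are hypothetical values invented for this example, and a simple cutoff is an assumption that ignores the nonlinearity discussed above.

```r
# HYPOTHETICAL response latencies in milliseconds for one rater (illustrative values)
rt_ms <- c(A1 = 240, A2 = 280, L1 = 310, C1 = 1450, C2 = 2100, PR1 = 260)

# Threshold taken from the 300 ms boundary discussed in the text
threshold_ms <- 300

# Fast responses are candidates for semantic-memory responding
likely_semantic <- rt_ms < threshold_ms

# Names of the items flagged as fast
fast_items <- names(rt_ms)[likely_semantic]
print(fast_items)
```

Because raters who fail to retrieve a specific behavior may eventually fall back on a general impression, long latencies should not be read as evidence of episodic responding; a screen of this kind only identifies the fast end of the distribution.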

Finally, the use of physiological measures might provide useful information regarding the memory source of items. Pupillometry measures the diameter of participants’ pupils over time and has been used to reflect both emotional arousal (Laeng et al., 2012) and cognitive effort (Kahneman & Beatty, 1966). Research has shown pupil dilation increases when episodic memories are activated as the activation of these memories requires greater cognitive load (Goldinger & Papesh, 2012). Therefore, pupillometry may provide confirmation that items elicit the intended memory processes. Neurological measures, such as EEG, can also be used to capture neural activation in the moment when people respond to items to examine the impact of item characteristics on the memory source. Neurological data enable researchers to obtain a more direct measure of response processes rather than an indirect proxy such as verbal protocols. Because the cognitive networks involved in episodic memory differ from those typical of semantic memory, indications of what brain systems initiate a rater's response to an item can be a helpful indicator of the memory source. Although the neurocognitive literature has primarily relied on functional magnetic resonance imaging (Diana et al., 2007), recent developments in neuroimaging suggest that less invasive devices, such as magnetoencephalography (MEG) recording using an external cap, can detect activation of core neural processes (Pizzo et al., 2019). Neuroimaging data could also be used to understand how specific incidents (i.e., episodic memory) get consolidated into semantic memory and how those memories are accessed at a later time. Taken together, these techniques may be useful to inform scale development and validation as well as the development of “purer” scales (Fischer, 2023; Hansbrough et al., 2021) composed of items that reflect the same memory source.

Scale use. Response validity and memory processes are often overlooked not only in the initial development of scales but also when measures are changed by subsequent researchers. For example, Heggestad et al. (2019) found that while adaptations to scales are common, evidence to support the validity of these changes is rarely provided. Indeed, “any alteration such as changing the wording for some items or eliminating others from the scale means that the resulting scale may no longer be accurately measuring the underlying construct that it was originally created to assess” (Aguinis & Vandenberg, 2014, p. 589). As a result of these and other practices, the constructs and theories that researchers intend to test are disconnected from the measures used to validate them. We contend any modification to scale items may alter the type of memory that is emphasized which, in turn, alters the meaning of the underlying measure.

Questionnaire attributes. The type of memory used by raters may also be influenced by characteristics of the questionnaire. Careful consideration regarding questionnaire attributes may elicit responses based on more or less episodic memory. For example, research indicates that instructions to report only responses based on specific memories and to avoid responses based on general impressions (i.e., source monitoring) increase reliance on episodic memory (Hansbrough et al., 2021; Martell & Evans, 2005). The prevalence of events queried may also affect the memory source used. Rare events/behaviors that have not been experienced by most raters (e.g., low baseline behaviors such as abusive leadership) may make reliance on episodic memory equally rare because most people do not have memories of these specific events. Moreover, response alternatives with a broad temporal focus that reflect all of a rater's experience with a leader (e.g., frequency estimates) may elicit more semantic memories, whereas a narrow temporal focus (e.g., the last week or the last day) may elicit more episodic memories.

Rater factors. Motivation may also be important because unmotivated raters may be more inclined to rely on semantic memory due to ease of effort and rapid accessibility (Srull & Wyer, 1989). Lack of motivation is of particular concern for leadership ratings because raters do not view the rating process as particularly important (Martinko et al., 2018a). We also expect that motivation can change during long surveys, as raters may become fatigued in later sections of a survey prompting greater reliance on semantic memory. Different samples may also vary substantially in underlying motivation, with crowdsourcing samples likely lower on motivation than participants in face-to-face experiments or organizational samples where participants rate their actual leader. A rater's length of experience with a specific leader may also be important, as memory consolidation processes may increase the quantity and accessibility of semantic memories as their experiences with a leader cumulate.

Conclusion

We hope that our discussion will help researchers to recognize the meaningful distinction between items based on different types of memory and to stop the widespread practice of collapsing dimensions that are intended to represent multidimensional constructs into a single composite score (DIMSmash), which creates significant misalignment between theory and method. Put differently, when researchers engage in DIMSmash they may be testing a different theory rather than the theory that is assumed. Moreover, as shown by our simulation, DIMSmash is inadvisable because it can produce misleading results. Multiplied over many studies, the practice of DIMSmash raises significant concerns about whether we know what we think we know about the effect of leadership on outcomes. In conclusion, we can no longer assume that items across different dimensions are equivalent and act on that assumption by smashing multidimensional scales into a single composite score. It is time for us to begin the cleanup.

Footnotes

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the grant, “Rater Memory: The Missing Link to Improve Measurement and Validity” (W911NF-23-1-0336) U.S. Army Research Institute for the Behavioral and Social Sciences (ARI) awarded to the first and second authors and “Advancing Leadership Research” (W911NF-18-2-0049) U.S. Army Research Institute for the Behavioral and Social Sciences (ARI) awarded to the third author. The views, opinions, and/or findings contained in this paper are those of the authors and shall not be construed as an official Department of the Army position, policy, or decision unless so designated by other documents.

ORCID iD

Tiffany Keller Hansbrough

Notes

Appendix A

Journals Included in Literature Review

Appendix B

LMX MDM Items and Likely Memory Source

Appendix C

This is the R-code used to run the simulation described in the paper.

$R^{2} = 1 - \frac{(1 - R^{2}_{adj})(N - p - 1)}{N - 1}$
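The conversion above, which recovers the unadjusted R² from an adjusted R², can be sketched as a small base R helper. The function name and the adjusted R² value of .30 are hypothetical and chosen only for illustration; N = 249 and four predictors match the simulation described in the text.

```r
# Convert an adjusted R-squared back to the unadjusted R-squared
r2_from_adjusted <- function(adj_r2, n, p) {
  1 - (1 - adj_r2) * (n - p - 1) / (n - 1)
}

# HYPOTHETICAL adjusted R-squared; n and p follow the simulated design
r2 <- r2_from_adjusted(adj_r2 = 0.30, n = 249, p = 4)

# Round trip through the usual adjustment formula to confirm the inversion
adj_check <- 1 - (1 - r2) * (249 - 1) / (249 - 4 - 1)
```

The round trip returns the starting adjusted value, which confirms that the helper inverts the standard adjustment.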

#Setting up matrices to store results for each iteration and activating libraries

# Input of LMX standardized regression coefficients predicting dependent variable obtained from Liden and Maslyn

# Main loop for simulation

# Generating LMX dimensions with the properties specified from Liden and Maslyn

# Generating the dependent variables

Sat_Work = -0.33*val_datz[,1] + 0.25*val_datz[,2] + 0.29*val_datz[,3] + 0.0*val_datz[,4] + (sqrt(1 - 0.098181818))*rnorm(samplesize,0,1.18)

Sat_Sup = 0.34*val_datz[,1] + 0.25*val_datz[,2] + 0.13*val_datz[,3] + 0.26*val_datz[,4] + (sqrt(1 - 0.671538462))*rnorm(samplesize,0,1)

# Obtained regression results for particular sample

# Creating the two composite scores, differentially weighted composite = [,1] and unit-weighted composite = [,2]

# Storing the results from the sample

# Storing obtained LMX correlations for sample

# Created 2 output files

Appendix D

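#Loading the packages used below (faux provides rnorm_multi; lavaan provides cfa)

library(faux)

library(lavaan)

#Number of simulated observations, as stated in the text

samplen <- 500

#Creation of first-order factors consistent with Figure 1 of Liden and Maslyn (1998)

dat <- rnorm_multi(n = samplen,
  mu = c(0, 0, 0, 0),
  sd = c(1, 1, 1, 1),
  r = c(0.763, 0.544, 0.74, 0.563, 0.692, 0.456),
  varnames = c("Affect", "Loyalty", "Contribution", "PRespect"),
  empirical = FALSE)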

#Creation of the individual items from Figure 1 of Liden and Maslyn (1998)

dat$A1 = .851*dat$Affect + sqrt(.275)*rnorm(samplen,0,1)

dat$A2 = .864*dat$Affect + sqrt(.254)*rnorm(samplen,0,1)

dat$A3 = .875*dat$Affect + sqrt(.234)*rnorm(samplen,0,1)

dat$L1 = .554*dat$Loyalty + sqrt(.693)*rnorm(samplen,0,1)

dat$L2 = .775*dat$Loyalty + sqrt(.400)*rnorm(samplen,0,1)

dat$L3 = .856*dat$Loyalty + sqrt(.267)*rnorm(samplen,0,1)

dat$C1 = .738*dat$Contribution + sqrt(.455)*rnorm(samplen,0,1)

dat$C2 = .551*dat$Contribution + sqrt(.697)*rnorm(samplen,0,1)

dat$PR1 = .846*dat$PRespect + sqrt(.285)*rnorm(samplen,0,1)

dat$PR2 = .866*dat$PRespect + sqrt(.251)*rnorm(samplen,0,1)

dat$PR3 = .867*dat$PRespect + sqrt(.248)*rnorm(samplen,0,1)

#Correlated first-order factor model

model.1st <- '# Factors
L_Affect =~ A1 + A2 + A3
L_Loyalty =~ L1 + L2 + L3
L_Contrib =~ C1 + C2
L_ProfRespect =~ PR1 + PR2 + PR3'

fit_1st <- cfa(model.1st, data = dat, std.lv = TRUE, information = "observed")

summary(fit_1st, fit.measures = TRUE, standardized = TRUE)

#Bifactor Model

model.bi <- '
# Specific factors
L_Affect =~ A1 + A2 + A3
L_Loyalty =~ L1 + L2 + L3
L_Contrib =~ 0.78*C1 + 0.512*C2
L_PR =~ PR1 + PR2 + PR3
# General factor
LMX =~ A1 + A2 + A3 + L1 + L2 + L3 + C1 + C2 + PR1 + PR2 + PR3
# Orthogonality constraints (bifactor structure)
LMX ~~ 0*L_Contrib
LMX ~~ 0*L_Affect
LMX ~~ 0*L_Loyalty
LMX ~~ 0*L_PR
L_Contrib ~~ 0*L_Affect
L_Contrib ~~ 0*L_Loyalty
L_Contrib ~~ 0*L_PR
L_Affect ~~ 0*L_Loyalty
L_Affect ~~ 0*L_PR
L_Loyalty ~~ 0*L_PR'

fit_bi <- cfa(model.bi, data = dat, std.lv = TRUE, information = "observed")

summary(fit_bi, fit.measures = TRUE, standardized = TRUE)

#Chi-squared difference test for the two models

anova(fit_1st, fit_bi)

References

Addis

D. R.

Schacter

(2012). The hippocampus and imagining the future: Where do we stand? Frontiers in Human Neuroscience, 5, 173. https://doi.org/10.3389/fnhum.2011.00173

* Afota

M.-C.

Robert

Vandenberghe

(2021). The interactive effect of leader–member exchange and psychological climate for overwork on subordinate workaholism and job strain. European Journal of Work and Organizational Psychology, 30(4), 495–509. https://doi.org/10.1080/1359432x.2020.1858806

* Agarwal

Bhal

K. T.

(2020). A multidimensional measure of responsible leadership: Integrating strategy and ethics. Group & Organization Management, 45(5), 637–673. https://doi.org/10.1177/1059601120930140

Aguinis

Vandenberg

R. J.

(2014). An ounce of prevention is worth a pound of cure: Improving research quality before data collection. Annual Review of Organizational Psychology and Organizational Behavior, 1, 569–595. https://doi.org/10.1146/annurev-orgpsych-031413-091231

Allen

P. A.

Kaut

K. P.

Lord

R. R.

(2008). Emotion and episodic memory. In Dere

Easton

Nadel

Huston

J. P.

(Eds.), Handbook of behavioral neuroscience (Vol. 18, pp. 115–132). London: Elsevier. https://doi.org/10.1016/S1569-7339(08)00208-7

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education . (1999). Standards for educational and psychological testing. Washington, D.C.: American Educational Research Association.

* Anand

Vidyarthi

Liden

R. C.

(2018a). Leader–member exchange as a linking pin in the idiosyncratic deals—Performance relationship in workgroups. The Leadership Quarterly, 29(6), 698–708. https://doi.org/10.1016/j.leaqua.2018.07.005

* Anand

Vidyarthi

Rolnicki

(2018b). Leader–member exchange and organizational citizenship behaviors: Contextual effects of leader power distance and group task interdependence. The Leadership Quarterly, 29(4), 489–500. https://doi.org/10.1016/j.leaqua.2017.11.002

Avolio

B. J.

Gardner

Walumbwa

F. O.

(2007). Authentic leadership questionnaire for researchers (ALQ) [Database record]. APA PsycTests. https://doi.org/10.1037/t06442-000

10.

* Avolio

B. J.

Wernsing

Gardner

W. L.

(2018). Revisiting the development and validation of the authentic leadership questionnaire: Analytical clarifications. Journal of Management, 44(2), 399–411. https://doi.org./10.1177/0149206317739960

11.

Balthazard

Hansbrough

T. K.

Hanges

Acton

Zheng

Lord

R. G.

Foti

R. J.

(2023). An EEG investigation of rater memory process congruence in organizational leadership assessment. In AoM Community (NEU) Conference in Organisational Neuroscience, Rotterdam.

12.

Banks

G. C.

Gooty

Ross

R. L.

Williams

C. E.

Harrington

N. T.

(2018). Construct redundancy in leader behaviors: A review and agenda for the future. The Leadership Quarterly, 29(1), 236–251. https://doi.org/10.1016/j.leaqua.2017.12.005

13.

Banks

G. C.

Woznyj

H. M.

Mansfield

C. A.

(2023). Where is “behavior” in organizational behavior? A call for a revolution in leadership research and beyond. The Leadership Quarterly, 34(6), 101581. https://doi.org/10.1016/j.leaqua.2021.101581

Bass, B. M. (1990). Bass & Stogdill's handbook of leadership: Theory, research, and managerial applications (3rd ed.). New York, NY: Free Press.

Bass, B. M., & Avolio, B. J. (1996). Manual for the multifactor leadership questionnaire. Palo Alto, CA: Mindgarden.

Bauer, T. N., Erdogan, Liden, R. C., & Wayne, S. J. (2006). A longitudinal study of the moderating role of leader–member exchange, performance and turnover during new executive development. Journal of Applied Psychology, 91(2), 298–310. https://doi.org/10.1037/0021-9010.91.2.298

* Bauer, T. N., Perrot, Liden, R. C., & Erdogan (2019). Understanding the consequences of newcomer proactive behaviors: The moderating contextual role of servant leadership. Journal of Vocational Behavior, 112, 356–368. https://doi.org/10.1016/j.jvb.2019.05.001

Böckenholt (2012). Modeling multiple response processes in judgment and choice. Psychological Methods, 17(4), 665–678. https://doi.org/10.1037/a0028111

Böckenholt (2017). Measuring response styles in Likert items. Psychological Methods, 22(1), 69–83. https://doi.org/10.1037/met0000106

Boon, Den Hartog, D. N., & Lepak, D. P. (2019). A systematic review of human resource management systems and their measurement. Journal of Management, 45(6), 2498–2537. https://doi.org/10.1177/0149206318818718

* Bowler, W. M., Paul, J. B., & Halbesleben, J. R. (2019). LMX and attributions of organizational citizenship behavior motives: When is citizenship perceived as brownnosing? Journal of Business and Psychology, 34(2), 139–152. https://doi.org/10.1007/s10869-017-9526-5

* Breevaart & Zacher (2019). Main and interactive effects of weekly transformational and laissez-faire leadership on followers’ trust in the leader and leader effectiveness. Journal of Occupational and Organizational Psychology, 92(2), 384–409. https://doi.org/10.1111/joop.12253

* Brimhall, K. C., & Palinkas (2020). Using mixed methods to uncover inclusive leader behaviors: A promising approach for improving employee and organizational outcomes. Journal of Leadership & Organizational Studies, 27(4), 357–375. https://doi.org/10.1177/1548051820936286

Carver, C. S. (1989). How should multifaceted personality constructs be tested? Issues illustrated by self-monitoring, attributional style, and hardiness. Journal of Personality and Social Psychology, 56(4), 577–585. https://doi.org/10.1037/0022-3514.56.4.577

Chen, F. F., Hayes, Carver, C. S., Laurenceau, J. P., & Zhang (2012). Modeling general and specific variance in multifaceted constructs: A comparison of the bifactor model to other approaches. Journal of Personality, 80(1), 219–251. https://doi.org/10.1111/j.1467-6494.2011.00739.x

Chen, F. F., West, S. G., & Sousa, K. H. (2006). A comparison of bifactor and second-order models of quality of life. Multivariate Behavioral Research, 41(2), 189–225. https://doi.org/10.1207/s15327906mbr4102_5

* Chénard-Poirier, L. A., Morin, A. J., Boudrias, J. S., & Gillet (2022). The combined effects of destructive and constructive leadership on thriving at work and behavioral empowerment. Journal of Business and Psychology, 37, 173–189. https://doi.org/10.1007/s10869-021-09734-7

* Chiniara & Bentein (2018). The servant leadership advantage: When perceiving low differentiation in leader-member relationship quality influences team cohesion, team task performance and service OCB. The Leadership Quarterly, 29(2), 333–345. https://doi.org/10.1016/j.leaqua.2017.05.002

* Choi, Kraimer, M. L., & Seibert, S. E. (2020). Conflict, justice, and inequality: Why perceptions of leader–member exchange differentiation hurt performance in teams. Journal of Organizational Behavior, 41(6), 567–586. https://doi.org/10.1002/job.2451

* Cooper, C. D., Kong, D. T., & Crossley, C. D. (2018). Leader humor as an interpersonal resource: Integrating three theoretical perspectives. Academy of Management Journal, 61(2), 769–796. https://doi.org/10.5465/amj.2014.0358

Credé & Harms, P. D. (2015). 25 years of higher-order confirmatory factor analysis in the organizational sciences: A critical review and development of reporting recommendations. Journal of Organizational Behavior, 36(6), 845–872. https://doi.org/10.1002/job.2008

De Boeck & Jeon (2019). An overview of models for response times and processes in cognitive tests. Frontiers in Psychology, 10, 102. https://doi.org/10.3389/fpsyg.2019.00102

Dehaene (2014). Consciousness and the brain: Deciphering how the brain codes our thoughts. New York, NY: Viking.

Diana, R. A., Reder, L. M., Arndt, & Park (2006). Models of recognition: A review of arguments in favor of a dual-process account. Psychonomic Bulletin & Review, 13(1), 1–21.

Diana, R. A., Yonelinas, A. P., & Ranganath (2007). Imaging recollection and familiarity in the medial temporal lobe: A three-component model. Trends in Cognitive Sciences, 11(9), 379–386.

* Dietl & Reb (2021). A self-regulation model of leader authenticity based on mindful self-regulated attention and political skill. Human Relations, 74(4), 473–501.

Dinh, J. E., Lord, R. G., Gardner, W. L., Meuser, J. D., & Liden, R. C. (2014). Leadership theory and research in the new millennium: Current theoretical trends and changing perspectives. The Leadership Quarterly, 25(1), 36–62.

* Djourova, N. P., Rodríguez Molina, Tordera Santamatilde, & Abate (2020). Self-efficacy and resilience: Mediating mechanisms in the relationship between the transformational leadership dimensions and well-being. Journal of Leadership & Organizational Studies, 27(3), 256–270.

Edwards, J. R. (2001). Multidimensional constructs in organizational behavior research: An integrative analytical framework. Organizational Research Methods, 4(2), 144–192.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.

Ercikan, Arim, Law, Domene, Gagnon, & Lacroix (2010). Application of think aloud protocols for examining and confirming sources of differential item functioning identified by expert reviews. Educational Measurement: Issues and Practice, 29(2), 24–35.

Eva, Howard, J. L., Liden, R. C., Morin, A. J. S., & Schwarz (2024). An inconvenient truth: A comprehensive examination of the added value (or lack thereof) of leadership measures. Journal of Management Studies. https://doi.org/10.1111/joms.13156

* Farahnak, L. R., Ehrhart, M. G., Torres, E. M., & Aarons, G. A. (2020). The influence of transformational leadership and leader attitudes on subordinate attitudes and implementation success. Journal of Leadership & Organizational Studies, 27(1), 98–111.

Fischer (2023). Measuring behaviors counterfactually. The Leadership Quarterly, 34(6), 101750.

Fiske, S. T., & Taylor, S. E. (2013). Social cognition: From brains to culture. New York, NY: McGraw-Hill Higher Education.

Fried (2022). Neurons as will and representation. Nature Reviews Neuroscience, 23, 104–114.

* Frieder, R. E., Wang, & Oh, I.-S. (2018). Linking job-relevant personality traits, transformational leadership, and job performance via perceived meaningfulness at work: A moderated mediation model. Journal of Applied Psychology, 103(3), 324–333.

Furr, R. M. (2021). Psychometrics: An introduction (4th ed.). Thousand Oaks, CA: Sage.

* Gagné, Morin, A. J. S., Schabram, Wang, Z. N., Chemolli, & Briand (2020). Uncovering relations between leadership perceptions and motivation under different organizational contexts: A multilevel cross-lagged analysis. Journal of Business and Psychology, 35(6), 713–732.

* Gill & Caza (2018). An investigation of authentic leadership’s individual and group influences on follower responses. Journal of Management, 44(2), 530–554.

Giordano, Ones, D. S., Waller, N. G., & Stanek, K. C. (2020). Exploratory bifactor measurement models in vocational behavior. Journal of Vocational Behavior, 120, 103430. https://doi.org/10.1016/j.jvb.2020.103430

* Goldammer, Annen, Stöckli, P. L., & Jonas (2020). Careless responding in questionnaire measures: Detection, impact, and remedies. The Leadership Quarterly, 31(4), 101384.

Goldinger, S. D., & Papesh, M. H. (2012). Pupil dilation reflects the creation and retrieval of memories. Current Directions in Psychological Science, 21(2), 90–95.

* Groves, K. S. (2020). Testing a moderated mediation model of transformational leadership, values, and organization change. Journal of Leadership & Organizational Studies, 27(1), 35–48.

* Hackney, K. J., Maher, L. P., Daniels, S. R., Hochwarter, W. A., & Ferris, G. R. (2018). Performance, stress, and attitudinal outcomes of perceptions of others’ entitlement behavior: Supervisor–subordinate work relationship quality as moderator in two samples. Group & Organization Management, 43(1), 101–137.

* Hancock, A. J., Gellatly, I. R., Walsh, M. M., Arnold, K. A., & Connelly, C. E. (2023). Good, bad, and ugly leadership patterns: Implications for followers’ work-related and context-free outcomes. Journal of Management, 49(2), 640–676.

Hanges, Lord, & Dickson (2000). An information-processing perspective on leadership and culture: A case for connectionist architecture. Applied Psychology, 49(1), 133–161.

Hansbrough, T. K., Lord, R. G., & Schyns (2015). Reconsidering the accuracy of follower leadership ratings. The Leadership Quarterly, 26(2), 220–237.

Hansbrough, T. K., Lord, R. G., Schyns, Foti, R. J., Liden, R. C., & Acton, B. P. (2021). Do you remember? Rater memory systems and leadership measurement. The Leadership Quarterly, 32(2), 101455.

Heggestad, E. D., Scheaf, D. J., Banks, G. C., Hausfeld, M. M., Tonidandel, & Williams, E. B. (2019). Scale adaptation in organizational science research: A review and best practice recommendations. Journal of Management, 45(6), 2596–2627. https://doi.org/10.1177/0149206319850280

Hull, J. G., Lehn, D. A., & Tedlie, J. C. (1991). A general approach to testing multifaceted personality constructs. Journal of Personality and Social Psychology, 61(6), 932–945.

* Jiang & Chen, C. C. (2018). Integrating knowledge activities for team innovation: Effects of transformational leadership. Journal of Management, 44(5), 1819–1847.

Kahneman & Beatty (1966). Pupil diameter and load on memory. Science, 154(3756), 1583–1585.

* Kammerhoff, Lauenstein, & Schütz (2019). Leading toward harmony – Different types of conflict mediate how followers’ perceptions of transformational leadership are related to job satisfaction and performance. European Management Journal, 37(2), 210–221.

* Kanat-Maymon, Elimelech, & Roth (2020). Work motivations as antecedents and outcomes of leadership: Integrating self-determination theory and the full range leadership theory. European Management Journal, 38(4), 555–564.

Kane, M. T. (2006). Validation. In Brennan, R. L. (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: Praeger.

Kim, Yammarino, F. J., Dionne, S. D., Eckardt, Cheong, Tsai, Guo, & Park, J. W. (2020). State-of-the-science review of leader–follower dyads research. The Leadership Quarterly, 31(1), 101306.

* Kim, K. Y., Atwater, Latheef, & Zheng (2019). Three motives for abusive supervision: The mitigating effect of subordinates attributed motives on abusive supervision’s negative outcomes. Journal of Leadership & Organizational Studies, 26(4), 476–494.

* Kim, K. Y., Eisenberger, Takeuchi, & Baik (2022a). Organizational-level perceived support enhances organizational profitability. Journal of Applied Psychology, 107(12), 2176–2196.

* Kim, Liden, R. C., & Liu (2022b). The interplay of leader–member exchange and peer mentoring in teams on team performance via team potency. Journal of Organizational Behavior, 43(5), 932–945.

* Kirrane, Kilroy, Kidney, Flood, P. C., & Bauwens (2019). The relationship between attachment style and creativity: The mediating roles of LMX and TMX. European Journal of Work and Organizational Psychology, 28(6), 784–799.

Laeng, Sirois, & Gredebäck (2012). Pupillometry: A window to the preconscious? Perspectives on Psychological Science, 7(1), 18–27.

* Lapointe, Vandenberghe, Ben Ayed, A. K., Schwarz, Tremblay, & Chenevert (2020). Social comparisons, self-conceptions, and attributions: Assessing the self-related contingencies in leader–member exchange relationships. Journal of Business and Psychology, 35, 381–402.

* Lemoine, G. J., & Blum, T. C. (2021). Servant leadership, leader gender, and team gender role: Testing a female advantage in a cascading model of performance. Personnel Psychology, 74(1), 3–28.

* Lemoine, G. J., Hartnell, C. A., Hora, & Watts, D. I. (2024). Moral minds: How and when does servant leadership influence employees to benefit multiple stakeholders? Personnel Psychology, 77, 1055–1085.

* Levesque-Cote, Fernet, Austin, & Morin, A. J. S. (2018). New wine in a new bottle: Refining the assessment of authentic leadership using exploratory structural equation modeling (ESEM). Journal of Business and Psychology, 33(5), 611–628.

Lewis, J. D., & Weigert (1985). Trust as a social reality. Social Forces, 63(4), 967–985.

* Li, Chen, Bai, Liden, R. C., Wong, M.-N., & Qiao (2023). Serving while being energized (strained)? A dual-path model linking servant leadership to leader psychological strain and job performance. Journal of Applied Psychology, 108(4), 660–675.

Liden, R. C., & Maslyn, J. M. (1998). Multidimensionality of leader–member exchange: An empirical assessment through scale development. Journal of Management, 24(1), 43–72.

Liden, R. C., Wayne, S. J., Zhao, & Henderson (2008). Servant leadership: Development of a multidimensional measure and multi-level assessment. The Leadership Quarterly, 19(2), 161–177.

Lin, Y.-T. (2020). The experience of being oneself in memory: Exploring sense of identity via observer memory. Review of Philosophy and Psychology, 11(2), 405–422.

Lindskold (1978). Trust development, the GRIT proposal and the effects of conciliatory acts on conflict and cooperation. Psychological Bulletin, 85(4), 772–793.

Lord, R. G. (1985). Accuracy in behavioral measurement: An alternative definition based on raters’ cognitive schema and signal detection theory. Journal of Applied Psychology, 70(1), 66–71.

Lord, R. G., Brown, D. J., Harvey, J. L., & Hall, R. J. (2001). Contextual constraints on prototype generation and their multilevel consequences for leadership perceptions. The Leadership Quarterly, 12(3), 311–338.

Lord, R. G., Hall, R. J., Gatti, P. M., Zheng, & Morgan (2021). Item characteristics and personal semantics as predictors of episodic memory and factor loadings in the levels of self-concept scale. Paper presented at the annual meeting of the Academy of Management, Virtual.

Martell, R. F., & Evans, D. P. (2005). Source-monitoring training: Toward reducing rater expectancy effects in behavioral measurement. Journal of Applied Psychology, 90(5), 956–963.

* Martinko, M. J., Mackey, J. D., Moss, S. E., Harvey, McAllister, C. P., & Brees, J. R. (2018a). An exploration of the role of subordinate affect in leader evaluations. Journal of Applied Psychology, 103(7), 738–752.

* Martinko, M. J., Randolph-Seng, Shen, Brees, J. R., Mahoney, K. T., & Kessler, S. R. (2018b). An examination of the influence of implicit leadership theories, attribution styles and performance cues on questionnaire measures of leadership. Journal of Leadership & Organizational Studies, 25(1), 116–133.

* Masterson, Sun, Wayne, S. J., & Kluemper (2021). The roller coaster of happiness: An investigation of interns’ happiness variability, LMX, and job-seeking goals. Journal of Vocational Behavior, 131, 103654.

Messick (1990). Validity of test interpretation and use. Research Report 90-11. Educational Testing Service.

* Nielsen, Tafvelin, von Thiele Schwarz, & Hasson (2022). In the eye of the beholder: How self-other agreements influence leadership training outcomes as perceived by leaders and their followers. Journal of Business and Psychology, 37(1), 73–90.

Nyberg, A. J., Moliterno, T. P., Hale, & Lepak, D. P. (2014). Resource-based perspectives on unit-level human capital: A review and integration. Journal of Management, 40(1), 316–346.

* Pathki, C. S., Kluemper, D. H., Meuser, J. D., & McLarty, B. D. (2022). The Org-B5: Development of a short work frame-of-reference measure of the big five. Journal of Management, 48(5), 1299–1337.

Persson & Nyberg (2008). Structure–function correlates of episodic memory in aging. In Dere, Easton, Nadel, & Huston, J. P. (Eds.), Handbook of episodic memory (pp. 521–536). Amsterdam, Netherlands: Elsevier.

Pizzo, Roehri, Medina Villalon, Trébuchon, Chen, Lagarde, Carron, Gavaret, Giusiano, McGonigal, Bartolomei, Badier, J. M., & Bénar, C. G. (2019). Deep brain activities can be detected with magnetoencephalography. Nature Communications, 10(1), 971.

* Quade, M. J., McLarty, B. D., & Bonner, J. M. (2020). The influence of supervisor bottom-line mentality and employee bottom-line mentality on leader–member exchange and subsequent employee performance. Human Relations, 73(8), 1157–1181.

Ranganath & Ritchey (2012). Two cortical systems for memory-guided behaviour. Nature Reviews Neuroscience, 13(10), 713–726.

Rhemtulla, van Bork, & Borsboom (2020). Worse than measurement error: Consequences of inappropriate latent variable measurement models. Psychological Methods, 25(1), 30–45.

* Seitz, S. R., & Owens, B. P. (2021). Transformable? A multi-dimensional exploration of transformational leadership and follower implicit person theories. European Journal of Work and Organizational Psychology, 30(1), 95–109.

Semin, G. R., & Fiedler (1988). The cognitive functions of linguistic categories in describing persons: Social cognition and language. Journal of Personality and Social Psychology, 54(4), 558–568.

* Seo, J. J., Nahrgang, J. D., Carter, M. Z., & Hom, P. W. (2018). Not all differentiation is the same: Examining the moderating effects of leader–member exchange (LMX) configurations. Journal of Applied Psychology, 103(5), 478–495.

Shondrick, S. J., Dinh, J. E., & Lord, R. G. (2010). Developments in implicit leadership theory and cognitive science: Applications to improving measurement and understanding alternatives to hierarchical leadership. The Leadership Quarterly, 21(6), 959–978.

* Singh & Vidyarthi, P. R. (2018). Idiosyncratic deals to employee outcomes: Mediating role of social exchange relationships. Journal of Leadership & Organizational Studies, 25(4), 443–455.

Sireci, S. G., Baldwin, Martone, Zenisky, A. L., Kaira, Lam, Shea, C. L., Han, Deng, Delton, & Hambleton, R. K. (2008). Massachusetts adult proficiency tests technical manual: Version 2. Amherst, MA: Center for Educational Assessment. http://www.umass.edu/remp/CEA_TechMan.html

Smith, E. R., & DeCoster (2000). Dual-process models in social and cognitive psychology: Conceptual integration and links to underlying memory systems. Personality and Social Psychology Review, 4(2), 108–131.

* Song, Wang, & Zhao (2021). How employee authenticity shapes work attitudes and behaviors: The mediating role of psychological capital and the moderating role of leader authenticity. Journal of Business and Psychology, 36(6), 1125–1136.

Srull, T. K., & Wyer, R. S. (1989). Person memory and person judgment. Psychological Review, 96(1), 58–83.

* Sun, P. Y. T., Anderson, M. H., & Gang (2024). Determining the hierarchal structure and nature of servant leadership. Journal of Business and Psychology, 39, 715–734.

* Svendsen, Unterrainer, & Jønsson, T. F. (2018). The effect of transformational leadership and job autonomy on promotive and prohibitive voice: A two-wave study. Journal of Leadership & Organizational Studies, 25(2), 171–183.

* Tafvelin, Hasson, Holmstrom, & Schwarz, U. T. (2019). Are formal leaders the only ones benefitting from leadership training? A shared leadership perspective. Journal of Leadership & Organizational Studies, 26(1), 32–43.

ter Doest & Semin, G. R. (2005). Retrieval contexts and the concreteness effect: Dissociations in memory for concrete and abstract words. European Journal of Cognitive Psychology, 17(6), 859–881.

Tulving (1985). How many memory systems are there? American Psychologist, 40(4), 385–398.

Tulving (2002). Episodic memory: From mind to brain. Annual Review of Psychology, 53, 1–25.

* Turner, L. A., & Merriman, K. K. (2022). Cultural intelligence and establishment of organizational diversity management practices: An upper echelons perspective. Human Resource Management Journal, 32(2), 321–340.

* Van Dijk, Kark, Matta, & Johnson, R. E. (2021). Collective aspirations: Collective regulatory focus as a mediator between transformational and transactional leadership and team creativity. Journal of Business and Psychology, 36(4), 633–658.

Walumbwa, F. O., Avolio, B. J., Gardner, W. L., Wernsing, T. S., & Peterson, S. J. (2008). Authentic leadership: Development and validation of a theory-based measure. Journal of Management, 34(1), 89–126.

* Watkins, Lee, S. H., Yam, K. C., Zhan, & Long (2022). Helping after dark: Ambivalent leadership outcomes of helping followers after the workday. Journal of Organizational Behavior, 43(6), 1038–1062.

Winocur, Moscovitch, & Bontempi (2010). Memory formation and long-term retention in humans and animals: Convergence towards a transformational account of hippocampal–neocortical interactions. Neuropsychologia, 48(8), 2339–2356.

* Xu, A. J., Loi, Cai, & Liden, R. C. (2019). Reversing the lens: How followers influence leader–member exchange quality. Journal of Occupational and Organizational Psychology, 92(3), 475–497.

* Yang, Chen, & Wang, X. H. (2024). Paradoxical leadership behavior and employee creative deviance: The role of paradoxical mindset and leader–member exchange. Journal of Business and Psychology, 39, 697–713.

Zhu, Song, L. J., Zhu, & Johnson, R. E. (2019). Visualizing the landscape and evolution of leadership research. The Leadership Quarterly, 30(2), 215–232.

Zumbo, B. D., & Chan, E. K. (2014). Setting the stage for validity and validation in social, behavioral, and health sciences: Trends in validation practices. In Zumbo, B. D., & Chan, E. K. H. (Eds.), Validity and validation in social, behavioral, and health sciences (pp. 3–8). Cham, Switzerland: Springer.