Abstract
Research on implicit and explicit prejudice has treated implicit prejudice as a unitary construct characterized by automatic access to negative concepts. The present article makes the case that tasks purported to measure implicit prejudice actually assess 2 different processes. Some assess the extent to which prejudice is activated automatically on the perception of a member of the target group. Other implicit tasks assess the extent to which prejudice is automatically applied in judgment. In the reported study, participants completed 4 implicit and 2 explicit measures of prejudice against women. Factor analysis yielded a 3-factor solution. The solution provides support for the distinction between explicit prejudice and 2 types of implicit prejudice corresponding to automatic activation and automatic application of prejudice. Prejudice appears to be a multifaceted construct, different aspects of which are measured by different tasks.
For more than 70 years, social psychologists have been concerned with the measurement of prejudice toward out-groups. Indeed, this was arguably the original concern of some of the most prominent early social psychologists, such as Thurstone and Likert (Jones, 1985; Steiner, 1979). Although the basic goal has remained the same, measurement techniques have changed considerably over the years. Initially, researchers assessed stereotypes and prejudice in a manner that was straightforward and transparent to the respondent. Bogardus's (1925) Social Distance Scale is a typical example. Here, respondents were asked to express the intimacy they were willing to tolerate from people belonging to an out-group (on a scale ranging from would exclude from my country to would admit to close kinship by marriage). Thurstone's Equal Interval Scale (1927), Guttman's scalogram (1950), and Osgood, Suci, and Tannenbaum's (1957) semantic differential are other techniques that were developed to measure prejudice.
In the 1960s and 1970s, societal pressures against prejudiced attitudes began to increase. Concurrently, and perhaps unsurprisingly, the traditional measures indicated a decline in racism and sexism, at least in the United States (Dovidio & Gaertner, 1986). In this context, interpretation of the existing measures became worrisome at best. Although it was plausible that prejudice was on the decline, it was also possible that prejudice was taking more subtle and insidious forms to which the available assessment methods were largely insensitive. This concern spawned the development of a second generation of prejudice scales that were aimed at assessing presumably covert forms of prejudice, also called “modern racism” or “symbolic racism.” Such scales were considered to be less reactive because attempts were made to develop items and response formats that disguised the true purpose of the measurement instrument. Thus, if participants were “truly” prejudiced, despite their attempts at self-presentation, this would be revealed in these measures. Examples of such scales include the Modern Racism Scale (McConahay, Hardee, & Batts, 1981), the Pro-Black Scale and the Anti-Black Scale (Katz & Hass, 1988), the Attitudes Toward Blacks Scale (Brigham, 1993), the Subtle Prejudice Scale (Pettigrew & Meertens, 1995), the Modern Sexism Scale (Swim, Aikin, Hall, & Hunter, 1995), and the Ambivalent Sexism Inventory (Glick & Fiske, 1996). Example items are listed in Table 1.
Table 1. Explicit Measures of Prejudice
Even these more subtle measures, however, seemed to many researchers too vulnerable to self-presentation. Some argued that although the purpose of the scales was not necessarily obvious to the respondent, the fact that the instrument dealt exclusively with a specific target out-group (e.g., Blacks or women) and questions assessed beliefs about this out-group was certainly transparent (Dovidio & Fazio, 1992; Fazio, Jackson, Dunton, & Williams, 1995; Gaertner & Dovidio, 1986). Thus, given that respondents were aware of the topic under consideration, and given the lack of time constraints on their responding, they could control their responses so as to present themselves in a favorable light.
In support of this criticism of the second-generation prejudice scales, Fazio et al. (1995) demonstrated that participants’ responses to the Modern Racism Scale were systematically affected by the race of the experimenter who administered the questionnaire. White participants who completed the scale in the presence of a Black experimenter tended to have lower prejudice scores than those who completed the scale in the presence of a White experimenter. Indirect evidence was also provided by Chen, Lee-Chai, and Bargh (1998), who examined the effects of power on social responsibility. The study took place in a professor's office, and power was manipulated by the position in which the participant was seated during the study: Participants were seated either in the professor's chair, behind the desk (high power), or in a guest chair, in front of the desk (low power). As expected, this subtle manipulation had an influence on participants’ prejudice scores on the Modern Racism Scale, which they completed while sitting at the desk. 1 Taken together, the research by Fazio and colleagues and by Chen and colleagues demonstrates the vulnerability of the second-generation explicit measures of prejudice to social desirability and situational pressures.
In the wake of fervent interest in the specification and definition of automatic cognitive processes and reports of tasks to examine such processes, prejudice researchers have recently developed implicit measures of prejudice. This development parallels recent interest in implicit memory in cognitive psychology (Graf & Schacter, 1985; Kihlstrom, 1987; Roediger, 1990; Schacter, 1987). In implicit measures of prejudice, the researcher attempts to assess automatic evaluative responses to a social category of interest. If a negative evaluative response is automatically activated by exposure to a member of a target group or the symbolic representation of the target group (e.g., a verbal label designating the group or a highly descriptive trait), this is held to be an indication of prejudice.
As a means of assessing automaticity of responding, the stimulus representing the out-group is often presented outside of the conscious awareness of the participant or in a task in which control over the influences of exposure to the stimulus on subsequent processing is limited or even prevented. As a means of assessing valence of response, the effects of prior presentation of a symbolic representation of the target group on judgments about stimuli not related to the target group are typically measured. Because the respondent is unaware of the purpose of the task and often unaware of the presence of the symbolic representation of the target group per se, the implicit assessment techniques are held to be insensitive to self-presentational concerns and to measure the respondent's true level of prejudice toward a given target group. Widely used implicit measures of prejudice include the category inclusion task (Dovidio, Evans, & Tyler, 1986), the automatic application task (Devine, 1989; Lepore & Brown, 1997), the Stroop task (Locke, MacLeod, & Walker, 1994), the adjective evaluation task (Fazio et al., 1995), the lexical decision task (Wittenbrink, Judd, & Park, 1997), the adjective categorization task (Dovidio, Kawakami, Johnson, Johnson, & Howard, 1997), and, most recently, the word pronunciation task (Kawakami, Dion, & Dovidio, 1998) and the implicit association test (Greenwald, McGhee, & Schwartz, 1998). 2 Table 2 provides a short description of each task.
Table 2. Implicit Measures of Prejudice
Note. SOA = stimulus onset asynchrony; RT = response time.
In the original study, the adjective evaluation task consisted of six phases, but only the fourth phase is described here. The other phases were included to obtain baseline response latencies for the adjective evaluations (Phase 1) and to make the cover story more credible.
The implicit association test is actually composed of five phases, but only the two phases used to calculate a participant's prejudice score are described here. The other three phases are included to habituate the participants to use their left and right hands to categorize the target stimuli.
The distinction between implicit and explicit measures of prejudice parallels the distinction between implicit and explicit cognition (Devine, 1989; Greenwald & Banaji, 1995). Explicit cognitive processes are largely conscious, effortful, intentional, and demanding of resources, whereas implicit processes are generally held to be unconscious, effortless, unintentional, and not demanding of cognitive resources (Bargh, 1989, 1994; Shiffrin & Schneider, 1977). Different tasks have been designed to measure the two processes. Thus, although social psychologists have remained interested in the same phenomenon over the years—that is, in prejudice—in devising different kinds of assessment techniques they may no longer be measuring the same construct. The following questions therefore arise for contemporary theory and research on prejudice: Do implicit and explicit measures of prejudice assess the same underlying construct? What are the differences between the extant explicit measures? What about the implicit measures of prejudice? Can the implicit measures be used interchangeably, or do different implicit measures assess different aspects of prejudice?
In what follows, we address these questions. First, we examine the existing conceptualizations of the relationships between implicit and explicit prejudice and review research findings that have been cited as evidence in support of those views. Second, we propose a possible way to categorize implicit measures of prejudice that might serve to organize the conceptual confusion. Then we report findings concerning the relationships among four implicit measures and two explicit measures of prejudice. Although much extant research on the relationship between implicit and explicit prejudice examines racism in the United States, the present findings concern the relations between implicit and explicit sexism 3 in Germany. Finally, we discuss the relationship between the empirical findings and our proposed conceptual organization of existing measures of implicit prejudice.
Implicit and Explicit Prejudice: One Construct or Two Constructs?
Same Construct Approach
In one view, implicit and explicit measures of prejudice assess precisely the same thing because what has become automatic is the prejudice that was initially consciously, and perhaps intentionally, learned. Put differently, the implicit measures assess the internalization of the prejudice tapped by the explicit measures. Under this view, then, there should be a positive relationship between implicit and explicit measures of prejudice (Kawakami et al., 1998; Lepore & Brown, 1997; Locke et al., 1994; Neumann, 1998; Wittenbrink et al., 1997). Importantly, the same construct approach does not require that this relationship be particularly strong. First, explicit measures involve a different experimental method than implicit measures (self-report questionnaires in one case and computer tasks in the other), and the correlation between two measures of the same construct is smaller when the two indicators involve different methods than when they involve the same method (Campbell & Fiske, 1959). Second, prejudice scores derived from implicit tasks are generally relative in nature (i.e., the responses to out-group primes are compared with the responses to in-group primes), whereas prejudice scores on explicit measures tend to be absolute (i.e., they assess individuals’ beliefs about the out-group independent of whether they have more favorable or unfavorable beliefs about the in-group).
Furthermore, the same construct approach does not deny the fact that self-presentational concerns can and do affect responding to the explicit measures. It is assumed that self-presentational strategies affect the responses of all respondents and, in addition, that self-presentational concerns affect the responses of all respondents in the same manner (e.g., as less prejudiced than they actually are). If all respondents present themselves more favorably than they actually are, then this tendency should affect the mean values but not the rank order of respondents on the explicit prejudice scales (Wittenbrink, Judd, Park, & Stone, 1998). Thus, it may be that, on self-report scales, highly prejudiced individuals appear to be less prejudiced than revealed by an implicit measure, but this does not deny the possibility that prejudiced individuals lie on the upper end of the distribution on both the implicit and the explicit measures. If this is the case, then a positive relationship between implicit and explicit prejudice should still be observed.
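The rank-order argument above can be illustrated with a small numerical sketch. All scores below are hypothetical values invented for illustration, and the uniform shift of 1.0 scale point is an assumption standing in for "all respondents present themselves more favorably to the same degree"; nothing here comes from the studies cited.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_order(scores):
    """Respondent indices sorted from lowest to highest score (no ties assumed)."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


# Hypothetical data, invented for illustration only.
implicit = [0.2, 0.5, 0.9, 1.4, 1.8]        # implicit prejudice scores
true_explicit = [2.0, 2.5, 3.5, 4.0, 5.0]   # explicit scores without self-presentation

# Assumption: every respondent lowers the self-reported score by the same amount.
UNIFORM_SHIFT = 1.0
reported_explicit = [score - UNIFORM_SHIFT for score in true_explicit]

r_true = pearson_r(implicit, true_explicit)
r_reported = pearson_r(implicit, reported_explicit)

print("mean explicit score, true vs. reported:",
      mean(true_explicit), mean(reported_explicit))
print("implicit-explicit r, true vs. reported:", r_true, r_reported)
print("rank order preserved:",
      rank_order(true_explicit) == rank_order(reported_explicit))
```

Under this assumption the mean of the explicit scores drops, but the ordering of respondents and the implicit-explicit correlation stay the same. If respondents differed in how strongly they adjusted their answers, the correlation would no longer be guaranteed to survive.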
A number of recent studies provide evidence in support of the positive relationship between implicit and explicit measures of prejudice. Wittenbrink et al. (1997), for example, found a positive correlation between implicit prejudice, as measured by a lexical decision task, and McConahay et al.'s (1981) Modern Racism Scale and Katz and Hass's (1988) Pro-Black Scale (Table 3 provides a list of studies testing, either directly or indirectly, the relationship between implicit and explicit prejudice; see Blair, in press, for a similar table). Neumann (1998) used a modified version of the implicit association test (Greenwald et al., 1998) in which participants’ implicit prejudice is inferred from the speed with which they can respond to prejudice-inconsistent stimuli with the same motor response (see Table 2 for a detailed description of the task). He found a significant correlation of .43 between participants’ implicit prejudice scores on the implicit association test and their responses on Pettigrew and Meertens's (1995) Subtle Prejudice Scale. Banaji (1999), Capozza and Voci (1998), Dovidio et al. (1997, Study 2), Locke et al. (1994), Moskowitz, Salomon, and Taylor (in press), and Moskowitz, Wasel, Gollwitzer, and Schaal (1999) have also reported positive relationships between implicit and explicit prejudice.
Table 3. Relationships Between Implicit and Explicit Measures of Prejudice
Devine (1989), Lepore and Brown (1997), and Locke et al. (1994) used the explicit measure to divide participants into high and low prejudice groups, treated the explicit prejudice score as a categorical variable, and reported F values. We transformed the F values to r values with the following formula: $r = [F/(F + df_{\text{error}})]^{.5}$.
Dovidio et al. (1997) and Wittenbrink et al. (1997) calculated several indexes of prejudice from their implicit measures. The correlations reported here refer to the most commonly used index, which is defined as the difference in response latency between prejudice-inconsistent trials (“White-negative”/“Black-positive”) and prejudice-consistent trials (“White-positive”/“Black-negative”).
Kawakami et al. (1998), Moskowitz et al. (in press), Moskowitz et al. (1999), and Locke et al. (1994) used the response latency data to calculate an implicit stereotyping score rather than an implicit prejudice score. That is, the implicit scores represent the extent to which respondents activate stereotype-consistent and stereotype-irrelevant traits after the presentation of Black and White primes. As a result, the correlations reported here describe the relationship between explicit prejudice and implicit stereotyping.
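The two computations mentioned in the notes above, the F-to-r transformation and the latency difference index, can be sketched as follows. This is a minimal illustration rather than the procedure used in any of the cited studies. The function names are hypothetical; the F-to-r conversion assumes an F test with one numerator degree of freedom (a two-group comparison) and yields only the magnitude of r, not its sign; and computing the index as inconsistent minus consistent mean latency is an assumed direction.

```python
import math
from statistics import mean


def f_to_r(f_value, df_error):
    """Convert an F value to r via r = sqrt(F / (F + df_error)).

    Assumes one numerator degree of freedom; the result is non-negative,
    so the direction of the effect has to be taken from the group means.
    """
    if f_value < 0 or df_error <= 0:
        raise ValueError("F must be non-negative and df_error positive")
    return math.sqrt(f_value / (f_value + df_error))


def latency_difference_index(inconsistent_ms, consistent_ms):
    """Mean latency on prejudice-inconsistent trials minus mean latency on
    prejudice-consistent trials (assumed direction: larger = more prejudice)."""
    return mean(inconsistent_ms) - mean(consistent_ms)


# Hypothetical numbers, invented for illustration only.
example_r = f_to_r(4.0, 96)
example_index = latency_difference_index([650, 700, 690], [600, 610, 620])

print("r from F(1, 96) = 4.0:", example_r)
print("latency difference index (ms):", example_index)
```

A positive index means slower responses on prejudice-inconsistent pairings than on prejudice-consistent ones.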
To test the specific hypothesis that self-presentational concerns affect the responses of all respondents in more or less the same manner, Wittenbrink et al. (1998) systematically varied the presentation time of the primes in the lexical decision task. Primes were presented at below-threshold levels in the first study, they were partly visible in a second study, and they were presented supraliminally in a third study. Variation in presentation time had a marked influence on level of prejudice revealed by the task: The longer the presentation times, the less prejudiced the participants appeared on average. However, presentation time did not affect the relationship between implicit and explicit prejudice. The correlations between scores on the lexical decision task and scores based on the single latent factor found to account for all of the covariance in five explicit measures of prejudice were consistently positive and significant (average rs = .40, .35, and .36 for stimulus onset asynchronies [SOAs] of 15 ms, 30 ms, and 250 ms, respectively 4). Thus, although participants presented themselves in an increasingly favorable light as the presentation times of the primes increased, the relationship with the explicit measures remained the same.
Kawakami et al. (1998) observed a similar pattern of correlations in a study in which they used a word pronunciation task to assess implicit stereotyping and compared this with a combined index of responses on the Blatant Racism Scale (Pettigrew & Meertens, 1995), the Modern Racism Scale (McConahay et al., 1981), and the evaluation thermometer. In addition, they varied the SOA in the word pronunciation task (300 ms vs. 2,000 ms). There was a moderate but significant correlation between implicit and explicit prejudice (r = .17). More important, the relationship was not affected by the manipulation of SOA (rs = .16 and .20 for SOAs of 300 ms and 2,000 ms, respectively). Taken together, the research by Wittenbrink et al. (1998) and by Kawakami et al. (1998) suggests that the rank order of participants according to prejudice level is largely unaffected by the amount of control they have over responding or their awareness of the purpose of the task (see also Locke et al., 1994).
Dissociation Approach
An alternative approach holds that implicit and explicit measures of prejudice assess very different things and that the two types of measures are therefore largely uncorrelated. A prominent example of this theoretical approach is Devine's (1989) dissociation model (but see Banaji & Greenwald, 1995; Dovidio et al., 1997; Fazio et al., 1995). In Devine's account, all individuals are exposed to cultural stereotypes and prejudices from very early in life, and these cultural prejudices become internalized such that they are automatically activated on mere exposure to a member of an out-group (or a symbolic representation of the out-group). However, over time individuals learn new information about members of out-groups, in addition to beliefs about prejudice itself. Through learning, some individuals come to hold and consciously report beliefs that are strongly at odds with cultural prejudices. At the same time, other individuals learn and consciously hold new personal beliefs that are consistent with the cultural prejudice. Thus, at least in adults, there are two constructs of interest: cultural prejudices that are internalized and automatized and personal beliefs that are effortfully elaborated and consciously available. Implicit measures presumably assess the former, whereas explicit measures assess the latter. What this means for the observed statistical relationship between implicit and explicit measures of prejudice is quite straightforward. Because all individuals respond automatically with the culturally determined prejudice, but only some people learn and embrace new, nonprejudiced beliefs, there should be no stable correlation between implicit and explicit measures of prejudice (Dovidio et al., 1997).
There is also a somewhat weaker, more methodologically oriented version of the dissociation approach according to which implicit and explicit measures should be unrelated. Both Dunton and Fazio (1997) and Plant and Devine (1998) proposed that explicit measures contain a high degree of systematic error, whereas implicit measures do not. If, in the extreme, explicit measures are highly sensitive to socially desirable responding, and implicit measures reveal the true underlying prejudice, there is no reason to assume that the measures should be correlated. The implication of this account, then, is that if researchers could somehow remove the systematic error from self-report measures (e.g., by using the bogus pipeline), they would observe a relationship between the two types of prejudice. A further implication is that the relationship between implicit and explicit measures of prejudice should be affected by individual differences. For individuals who are highly motivated to control prejudiced responses, there should be no relationship; for individuals who are willing to report prejudiced attitudes, however, explicit and implicit measures should be correlated.
A number of studies provide support for the dissociation approach. Devine (1989) assessed participants’ explicit prejudice level with the Modern Racism Scale. Implicit prejudice was measured with the two-part automatic application task. In the first part, as described in Table 2, participants were exposed to words associated with the Black stereotype that were masked to prevent conscious recognition. In the second part, participants read a story about an ambiguous target person and rated his hostility. If exposure to words associated with Blacks automatically activates the concept of hostility (which is central to the negative stereotype of Blacks), an individual will interpret the behavior of the ambiguous target person as hostile. Devine (1989) found no relationship between participants’ self-reported prejudice and their prejudice as assessed by the priming task. It should be noted, however, that Lepore and Brown (1997) replicated Devine's procedure but primed participants with words associated with the category Blacks rather than with the stereotype for Blacks. A significant relationship between implicit and explicit prejudice was observed, providing evidence for the same construct approach.
Dovidio and colleagues also found some support for the dissociation approach. In two studies, Dovidio et al. (1997) assessed participants’ level of implicit prejudice using the adjective categorization task, in which participants are exposed to subliminal primes of pictures of Blacks and Whites and then judge whether a series of positive and negative adjectives are descriptors of people in general. Participants’ explicit prejudice was assessed with Brigham's (1993) Attitudes Toward Blacks Scale and McConahay et al.'s (1981) Modern Racism Scale. In neither study was a relationship between the implicit and explicit measures observed. 5 Likewise, Fazio et al. (1995) measured implicit prejudice with the adjective evaluation task, in which participants judge the valence of positive and negative adjectives that are primed by pictures of members of the target out-group. Participants’ level of prejudice, as measured by the implicit task, did not correlate with their responses on the Modern Racism Scale. Greenwald et al. (1998) and Banaji and Greenwald (1995) also found no relationship between implicit and explicit prejudice.
In summary, there are two major approaches to the conceptualization of implicit and explicit prejudice. In the same construct approach, implicit and explicit measures are positively related because whatever is measured by the former task is an internalization of what is measured by the latter task. In the dissociation approach, implicit measures assess culturally shared beliefs that have been internalized. Conversely, explicit measures assess conscious beliefs that may be determined by the individual's attitudes toward prejudice, fairness, and personal learning history, as well as self-presentational concerns. For some individuals, these beliefs are consistent with internalized prejudice; for others, these beliefs are inconsistent. Thus, empirical observation should reveal no relationship between implicit and explicit measures of prejudice.
Common Assumptions
Although the major approaches to prejudice just reviewed involve different conceptions of implicit and explicit prejudice, adherents of both views appear to hold similar unquestioned convictions that guide their research. Adherents of both approaches distinguish, for example, between implicit measures, on the one hand, and explicit measures, on the other. Although researchers tend to explain in detail why they use both types of measures in their research, they generally do not justify their choice of instruments within these categories. The practice is hardly problematic for the use of explicit measures, because the second-generation self-report measures correlate highly with each other. For this reason, many researchers combine several explicit measures into a single explicit prejudice index (e.g., Kawakami et al., 1998; Wittenbrink et al., 1998). Furthermore, the presentation of a new scale generally includes a detailed report of how the scale relates to already-existing prejudice measures (see Glick & Fiske, 1996, for a good example). However, the lack of explicit criteria for choosing a particular measure is considerably more problematic with regard to implicit measures. Researchers have a tendency to use the measure that they themselves developed. Although there are substantial differences between the available implicit tasks, as revealed in Table 2, many researchers treat implicit measures as equivalent when they discuss the theoretical implications of their results and when they relate their work to existing findings.
To our knowledge, however, there is no published research that has systematically examined the relationships among more than two implicit measures of prejudice in addition to multiple explicit measures. In fact, in nearly all extant studies of implicit prejudice, one implicit measure of prejudice was assessed and compared with one or more explicit measures of prejudice. 6 Thus, to date there has been no empirical test of the hypothesis that implicit measures of prejudice assess the same underlying construct. After presenting an argument for a possible classification of implicit measures of prejudice, we report such a test.
Prejudice as a Multidimensional Construct
A Theoretical Classification of Implicit Measures
As seen in the foregoing review and revealed in detail in Table 2, there is much variation among implicit measures of prejudice. In principle, the tasks can be distinguished along a number of different dimensions. For example, in some tasks the prime stimuli are presented subliminally (e.g., Wittenbrink et al., 1997), whereas in other tasks the primes are presented supraliminally (Locke et al., 1994). The prime stimuli are sometimes words that denote target groups (such as Black and White; Kawakami et al., 1998), sometimes they are pictures of exemplars of the target groups (Fazio et al., 1995), and at other times they are first names that are typical of members of the target groups (Greenwald et al., 1998). Some tasks aim to activate the cultural stereotype of a target group (Devine, 1989), but other tasks are specifically designed to activate only words that are associated with the target category (Lepore & Brown, 1997). In some tasks, participants make relatively superficial responses to the targets, such as lexical decision judgments (Wittenbrink et al., 1997) or word pronunciation (Kawakami et al., 1998). Other tasks involve responses that require more in-depth processing of the targets. For example, in some tasks participants assess the evaluative connotation of a word (Fazio et al., 1995) or decide whether a word can be used to describe people or houses (Dovidio et al., 1986, 1997). In most tasks, participants are unaware that their prejudice level is being measured (Fazio et al., 1995; Wittenbrink et al., 1997), but this is not always the case (Greenwald et al., 1998; Locke et al., 1994).
Without minimizing the importance of the just-mentioned distinctions, we propose here a classification of the implicit tasks that, we argue, may provide some conceptual clarity to the construct of implicit prejudice. Specifically, we propose that the tasks can be classified according to whether they assess the automatic activation of prejudice or whether they assess the automatic application of prejudice to a target or target concept. Indeed, there is good evidence that automatically activated concepts can influence later information processing without awareness (Bargh & Pietromonaco, 1982; Higgins, Rholes, & Jones, 1977; Wegner & Bargh, 1998). However, in recent years, evidence has also accumulated to suggest that the link between the activation and the application of a concept is not as straightforward as initially assumed. It seems that situational variables and individual differences can determine whether an activated concept is subsequently used in later judgment or behavior.
In their first experiment, for instance, Gilbert and Hixon (1991) exposed experimental participants to primes related to Americans’ stereotype of Asians. Some participants were made cognitively busy during such exposure, and, as compared with participants who were not under cognitive load, there was no evidence of automatic stereotype activation for these individuals. The second experiment showed, furthermore, that once a stereotype was automatically activated, cognitive busyness actually increased stereotype application. Thus, there was a clear distinction between activation, which was inhibited by cognitive load, and application, which was facilitated by cognitive load. In interpreting these findings, Gilbert and Hixon (1991) noted that although stereotype activation typically increases the use of that stereotype in information processing, “it does not mandate such use, nor does it determine the precise nature of its use. It is possible for activated information to exert no effect on subsequent judgments or to have a variety of different effects” (p. 512). The notion that situational factors moderate stereotype activation and use is consistent as well with the reasoning of Bargh (1989), who distinguished between two types of automatic processes. Whereas “non-goal-dependent” automatic processes are independent of situational demands, “goal-dependent” automatic processes are elicited only when individuals have a particular processing goal in mind.
Halberstadt and Niedenthal (1997), echoing Dalgleish and Watts (1990), made a related distinction between the automatic capture of attention by an emotional stimulus and the use of the attended stimulus in subsequent processing, using individual differences as a case in point.
Even if participants do orient their attention to a particular stimulus, or spend more time looking at it, this does not necessarily mean that the stimulus received further processing, or provide insight into the nature of that processing… in the case of phobics, for example, the attended-to stimuli may receive less processing, as measured by their relatively poor memorability. (Halberstadt & Niedenthal, 1997, p. 1020)
Förster and Mussweiler (1999) recently provided more direct evidence for individual differences in the automatic activation of a concept versus the use of the concept. In an initial study, these researchers showed that the concept of sex automatically activated the concept of aggression in both men and women. In a subsequent study, however, they found that men, but not women, actually applied the concept of aggression to target individuals (i.e., men behaved more aggressively in a task measuring aggression unobtrusively). Results of numerous other studies also seem to be consistent with our distinction between the automatic activation and the automatic application of prejudice (Banaji, Hardin, & Rothman, 1993; Chen et al., 1998; Higgins, 1996; Higgins & Brendl, 1995; Locke et al., 1994; Locke & Walker, 1999; Macrae, Bodenhausen, Milne, Thorn, & Castelli, 1997).
With regard to the specific implicit tasks, it would seem, a priori, that the adjective evaluation task (Fazio et al., 1995), the lexical decision task (Wittenbrink et al., 1997), the adjective categorization task (Dovidio et al., 1997), and the word pronunciation task (Kawakami et al., 1998) measure the extent to which prejudice is automatically activated by prime stimuli. The participants’ task is to categorize the target adjectives as words or nonwords (Wittenbrink et al., 1997), as having either a positive or negative connotation (Fazio et al., 1995), or as attributes generally used to describe persons or houses (Dovidio et al., 1997), or their task is to simply pronounce the target adjective (Kawakami et al., 1998).
In contrast, the category inclusion task (Dovidio et al., 1986), the automatic application task (Devine, 1989; Lepore & Brown, 1997), and the Stroop task (Locke et al., 1994) seem to measure something resembling application of prejudice. Here, participants are asked to apply positive and negative attributes to a target person or a target group by indicating whether these attributes are descriptive of the target (Dovidio et al., 1986; Locke et al., 1994) or whether a fictitious target possesses the attributes (Devine, 1989; Lepore & Brown, 1997). 7
The Three Components of Prejudice
The preceding analysis leads to a conceptualization of prejudice as a multidimensional construct. Being prejudiced can mean that negative concepts are accessed immediately on contact with a member of a target out-group. This aspect of prejudice is probably best assessed by the tasks that we have just argued measure automatic concept activation. Being prejudiced can also mean that biased thoughts or feelings are implicitly expressed when drawing an inference about or attributing character traits to a member of an out-group. This aspect of prejudice is likely to be tapped most accurately by tasks that measure an individual's tendency to apply prejudice automatically when making a judgment about the target group. And, finally, being prejudiced means espousing negative beliefs about members of discriminated groups and denying them certain rights. This aspect of prejudice is assessed by explicit measures in which respondents are asked to self-report their attitudes toward the target out-group.
To test this multidimensional conceptualization of prejudice, we conducted a study in which we assessed participants’ prejudice toward women with six different tasks. Two were explicit self-report measures of sexism, and four were implicit measures of sexism (strict adaptations of the existing implicit tasks). Two of the implicit measures were, following the present account, measures of automatic activation, and two were measures of automatic application. Data were then subjected to correlational and factor-analytic analyses. The same construct approach most naturally predicts a one-factor solution. However, because similar methodology produces shared error variance, a two-factor solution with two correlated factors (implicit measures loading highly on the first factor and explicit measures loading highly on the second) is also consistent with the same construct approach (see Green, Goldman, & Salovey, 1993, for a related argument in the domain of mood). The dissociation approach predicts a two-factor solution with two relatively independent factors. Finally, the present conceptualization of prejudice as a multidimensional construct predicts measures to be strongly related within subcategories (i.e., implicit activation, implicit application, and explicit) and to be moderately related across categories. Furthermore, we expected that a three-factor solution would best account for the relationship among the different prejudice scores for each task.
Relating Implicit and Explicit Measures of Sexism
In the present study, 130 male undergraduates at the University of Konstanz (Germany) completed a German translation of Swim et al.'s (1995) Modern Sexism Scale. Two months later, 80 individuals who had either particularly high or particularly low scores on the explicit measure were invited to participate in a presumably unrelated study of perceptual processes. Thirty-five “sexist” and 33 “nonsexist” individuals agreed to return to the laboratory. In the laboratory, the participants completed four implicit and two explicit measures of prejudice toward women. The Fazio et al. (1995) adjective evaluation task and the Wittenbrink et al. (1997) lexical decision task were adapted as implicit measures of sexism because, as discussed previously, they were judged a priori to measure the automatic activation of prejudice. Devine's (1989) automatic application task and Dovidio et al.'s (1986) category inclusion task were also adapted as implicit measures because they appear to assess automatic application of prejudice. Explicit measures included, again, the Modern Sexism Scale and Glick and Fiske's (1996) Ambivalent Sexism Inventory.
The logic of the procedure was to lead the participants from the most uncontrollable to the most controllable measure of prejudice. The entire study was presented as a cognitive psychology experiment designed to examine people's capacity to categorize objects quickly. Thus, after an initial computer task in which individual thresholds for the subliminal priming were established, participants performed the automatic application task and the lexical decision task. Both tasks involved below-threshold presentation of primes. Participants then performed the adjective evaluation task, in which primes were presented supraliminally but for which a highly plausible cover story disguised the true purpose of the task (as described subsequently). The last computer task was the category inclusion task. Here the primes were presented supraliminally, and although a misleading cover story was presented, the participants were nevertheless aware that they were establishing links between men and women, on the one hand, and positively and negatively valenced adjectives, on the other hand. At the end of the study, participants completed the two self-report measures.
Because the original tasks were modified to assess sexism rather than racism, and because other researchers may wish to use similar adaptations, we provide some details of the procedure here. Additional details can be obtained from the authors.
Automatic Application Task
Adaptation of the automatic application task as an implicit measure of prejudice toward women involved a number of steps (Devine, 1989; Lepore & Brown, 1997). The first was to assess the cultural stereotype for women and to find the central negative trait for the stereotype among German undergraduates (Devine used “hostility” for the negative stereotype for Blacks in the United States).
Participants who did not take part in the main study were asked to list traits and adjectives that other individuals generally associate with women. Emotionality was the most frequently mentioned trait. According to German undergraduates, “emotionality” expresses the view that women are moody, unstable, and incapable of controlling their affective reactions, which often prevents them from functioning efficiently in their environment. The idea that emotionality is contained in the cultural stereotype for women is supported by Bem (1974).
Next, five brief descriptions revealed in pretesting to be ambiguous (but plausible) behavioral manifestations of emotionality were embedded in a short, coherent story called “A Week in D.'s Life.” The story was gender neutral; the main character was called “D.,” and the use of pronouns was avoided. The full story, translated into English, is presented in Appendix A.
The resulting automatic application task involved two subtasks. In the first, ostensibly a vigilance task, the target category “women” was activated through the usual presentation of primes to parafoveal vision while participants attended to a central fixation point. There were two types of subliminal primes, female primes and neutral primes. The female primes evoked the category of women but had no link to the concept of emotionality. The neutral primes were related neither to women nor to emotionality (see Appendix B). Thus, the female primes directly activated the category women but not necessarily the negative belief that all women are emotional. In fact, 40 of the participants were exposed to 80% female words and 20% neutral words (female prime condition), whereas the other participants saw 80% neutral words and 20% female words (neutral prime condition) in the vigilance task. Only those who received the priming with 80% female words appear in many of the critical analyses pertinent to this article. 8 However, both conditions were conducted so as to replicate Devine (1989) and Lepore and Brown (1997).
In the second subtask, participants read the short story and then evaluated the main character on 12 continuous rating scales. Six of the scales referred to emotional traits (sensitive, moody, emotional, rational, steady, and thoughtful), and 6 referred to traits that were neutral with respect to emotionality (closed-minded, boring, pessimistic, fair, artistic, and polite). Within each category, half of the traits were negative, and half were positive. Prejudice scores were calculated as follows: The continuous rating scales were divided into 26 equal intervals, and a number between 1 and 26 was assigned to participants’ ratings. Consistent with the procedure of Devine (1989), an emotionality score was calculated by inverting the “rational” items and averaging them with the ratings of the emotional items. Higher values indicate that participants interpreted the main character as more emotional.
To replicate the analyses of Devine (1989) and Lepore and Brown (1997), we assigned participants to high and low prejudice groups based on their scores on the Modern Sexism Scale at Time 1. If the concept of emotionality is automatically activated in participants both high and low in sexism, and this concept is then applied to an ambiguous target, then we should find a main effect of prime condition for emotionality ratings (Devine, 1989). If automatic activation and application occur only in individuals high in sexism, then an interaction between priming condition and prejudice level should be observed (Lepore & Brown, 1997).
For the female prime condition, the mean emotionality ratings were −2.27 and −1.35, respectively, for the low and high prejudice groups. For those in the neutral prime condition, the corresponding means were −0.56 and −2.23. In a Prejudice Level (high vs. low) × Priming (female vs. neutral) analysis of variance, neither the main effects nor the interaction effect was significant (or even marginal). A similar analysis was also performed with a different dependent variable (i.e., prejudice toward D.) that reflected the difference between negative and positive scales (Lepore & Brown, 1997). Again, no effects were significant. Our findings therefore replicate neither those of Devine (1989) nor those of Lepore and Brown (1997).
Lexical Decision Task
Wittenbrink et al.'s (1997) implicit measure of prejudice was presented as a lexical decision task in which participants were to indicate as quickly as possible whether or not a given letter string represented a German word. On each trial, participants were exposed to one of three subliminal primes: “woman,” “man,” or “XXXX.” Targets were 16 words and 16 pronounceable nonwords. The 16 words consisted of 8 positive and 8 negative adjectives (see Appendix C). 9
Response latencies from trials on which the target was a word were used to calculate prejudice scores. Specifically, response latencies to adjectives that followed the prime “woman” were subtracted from the latencies of response to the same target following a neutral prime. This difference represents a facilitation score that expresses the extent to which access to a given target word is facilitated by the prime “woman.” The same facilitation score was computed for the “man” prime. Facilitation scores for positive and negative adjectives were then computed separately. This procedure resulted in four types of facilitation scores that expressed the extent to which lexical decisions about positive and negative adjectives were facilitated by the primes “woman” and “man.” In theory, sexist individuals respond quickly to prejudice-consistent prime–target combinations (negative adjectives following female primes and positive adjectives following male primes), and they respond slowly to prejudice-inconsistent prime–target combinations (positive adjectives following female primes and negative adjectives following male primes). Nonsexist individuals respond equally fast to both types of prime–target combinations. On average, there was a 30.29-ms difference between facilitation on prejudice-consistent trials and facilitation on prejudice-inconsistent trials. Analyses of the log-transformed data showed that the difference was reliable, t(67) = 2.03, p < .05. This indicates that the 68 participants were, on average, relatively sexist.
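The scoring just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis script used in the study: the function names, the data layout (one mean latency in milliseconds per prime and adjective valence), and the example latencies are all hypothetical. Facilitation is computed here on valence-level means rather than word by word, and the log transformation applied before the significance test is omitted.

```python
from statistics import mean

def facilitation(neutral_rt: float, primed_rt: float) -> float:
    """Facilitation in ms: latency after the neutral prime minus latency after the category prime."""
    return neutral_rt - primed_rt

def lexical_decision_sexism_score(rt: dict) -> float:
    """rt maps (prime, valence) to a mean latency in ms (hypothetical layout).

    Primes are "woman", "man", and "neutral"; valences are "positive" and "negative".
    """
    f = {
        (prime, valence): facilitation(rt[("neutral", valence)], rt[(prime, valence)])
        for prime in ("woman", "man")
        for valence in ("positive", "negative")
    }
    # Prejudice-consistent: negative after "woman", positive after "man".
    consistent = mean([f[("woman", "negative")], f[("man", "positive")]])
    # Prejudice-inconsistent: positive after "woman", negative after "man".
    inconsistent = mean([f[("woman", "positive")], f[("man", "negative")]])
    return consistent - inconsistent

# Made-up latencies for one participant, for illustration only.
example_rt = {
    ("neutral", "positive"): 600.0,
    ("neutral", "negative"): 620.0,
    ("woman", "positive"): 590.0,
    ("woman", "negative"): 580.0,
    ("man", "positive"): 570.0,
    ("man", "negative"): 615.0,
}
score = lexical_decision_sexism_score(example_rt)
```

With these made-up latencies, a positive score means greater facilitation on prejudice-consistent than on prejudice-inconsistent trials, which is the direction interpreted as sexism in the text.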
Adjective Evaluation Task
The Fazio et al. (1995) implicit measure of prejudice involves six phases, and the crucial response time data are collected during Phases 1 and 4. In Phase 1 of the present study, participants saw 12 positive and 12 negative adjectives presented on the computer screen one at a time, and their task was to indicate as quickly as possible whether the adjective was positive or negative in connotation. The adjectives themselves, and the presentation times for the adjectives, were identical to those used in Phase 4, which made it possible to use the response times in Phase 1 as baseline ratings for the analysis of responses in Phase 4.
In Phase 2, participants saw pictures of 6 men and 6 women on the screen and were instructed to memorize them. Phase 3 consisted of the presentation of 24 pictures, half of which had been seen in Phase 2 and half of which were new. The participants’ task was to indicate whether or not they had seen the pictures before. Phase 4 was presented to participants as a combination of the previous tasks. On each trial, they saw a picture and then an adjective. They were instructed to memorize the picture and, at the same time, to indicate as quickly as possible whether the presented adjective was positive or negative (see Appendix D for a complete list of adjectives). In Phase 5, participants saw some of the pictures shown in Phase 4, along with some new pictures, and their task was to indicate which items they had seen earlier in the study. Phase 6 of the Fazio et al. (1995) procedure was not included in the present study. 10
Participants’ response latencies from Phase 4 served as the primary dependent variable, and their response latencies from Phase 1 were used as baseline scores. Facilitation scores were averaged within prejudice-consistent and prejudice-inconsistent trials. The difference between the two scores was taken as an indicator of implicit sexism. Analysis of the raw scores revealed that facilitation was 6.26 ms greater on prejudice-consistent trials than on prejudice-inconsistent trials. Although this difference is relatively small, a t test on the log-transformed scores revealed that it was significant, t(67) = 3.16, p < .01. This suggests, as before, that the participants were relatively sexist on average.
Category Inclusion Task
The last implicit measure was a modified version of Dovidio et al.'s (1986) category inclusion task, which has also been used by Judd, Park, Ryan, Brauer, and Kraus (1995) and Wittenbrink et al. (1997). The present version of the task included three types of primes: “woman,” “man,” and “house.” The primes were presented supraliminally, and participants were able to read them. Among the 24 target adjectives, 16 were descriptive of human beings (e.g., intelligent), and 8 were generally used to describe houses (e.g., dilapidated; see Appendix E). Within each category, half of the adjectives were positive, and half were negative. Each of the 24 target adjectives was paired once with each of the three primes, resulting in 72 experimental trials. 11
As a means of forming a sexism score, response latencies were averaged across prejudice-consistent trials (those on which negative adjectives were paired with female primes and those on which positive adjectives were paired with male primes) and, separately, across prejudice-inconsistent trials (those on which negative adjectives were paired with male primes, and those on which positive adjectives were paired with female primes), and the difference between the two was then calculated. The average difference was 6.35 ms, and analyses of the inverse-transformed data showed that this difference was not reliably different from zero, t(67) = 0.29, ns.
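Unlike the facilitation-based scores of the preceding two tasks, this score is built from raw latencies. A minimal sketch follows, with several assumptions that are not stated in the text: the trial layout and function name are hypothetical, trials with the "house" prime are simply ignored, and the direction of the difference (inconsistent minus consistent, so that positive values indicate faster responding on prejudice-consistent trials) is chosen for illustration only.

```python
from statistics import mean

CONSISTENT = {("woman", "negative"), ("man", "positive")}
INCONSISTENT = {("woman", "positive"), ("man", "negative")}

def category_inclusion_score(trials) -> float:
    """trials: iterable of (prime, valence, latency_ms) tuples (hypothetical layout)."""
    consistent = [rt for prime, valence, rt in trials if (prime, valence) in CONSISTENT]
    inconsistent = [rt for prime, valence, rt in trials if (prime, valence) in INCONSISTENT]
    # Assumed direction: positive score = slower on prejudice-inconsistent trials.
    return mean(inconsistent) - mean(consistent)

# Made-up trials for one participant, for illustration only.
example_trials = [
    ("woman", "negative", 700.0),
    ("man", "positive", 690.0),
    ("woman", "positive", 710.0),
    ("man", "negative", 720.0),
    ("house", "positive", 650.0),  # not part of either trial type, so ignored
]
score = category_inclusion_score(example_trials)
```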
Findings
Relationships among Measures of Prejudice
Bivariate correlations were calculated to examine the relationships among the six different measures of prejudice. The correlations are reported in Table 4.
Table 4. Intercorrelations Among the Different Measures of Prejudice
Note. MSS1 = Modern Sexism Scale, Time 1 (Swim et al., 1995); MSS2 = Modern Sexism Scale, Time 2 (Swim et al., 1995); ASI = Ambivalent Sexism Inventory (Glick & Fiske, 1996); AAT = automatic application task (Devine, 1989; Lepore & Brown, 1997); LDT = lexical decision task (Wittenbrink et al., 1997); AET = adjective evaluation task (Fazio et al., 1995); CIT = category inclusion task (Dovidio et al., 1986). Significance levels flagged in the table: p < .10, p < .05, p < .01, and p < .001.
As can be seen in Table 4, the test–retest reliability coefficient for the Modern Sexism Scale was .81. The two explicit measures of prejudice, the Ambivalent Sexism Inventory and the Modern Sexism Scale, correlated highly with each other at Time 2 (r = .52, p < .001). This finding closely replicates Glick and Fiske (1996), who found a correlation of .57. The correlations with the implicit measures are also shown in Table 4. The lexical decision task, the adjective evaluation task, and the category inclusion task shared a common pattern: They were correlated with the Modern Sexism Scale (rs = .24, .28, and .28, respectively), but they were not related to the Ambivalent Sexism Inventory (rs = .20, −.01, and .10, respectively).
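To give a sense of how correlations of this size relate to conventional significance thresholds, the standard t statistic for a Pearson correlation can be computed as follows. This is an illustrative sketch only: it assumes that each correlation is based on all 68 participants, which the text does not state explicitly, and the function name is hypothetical.

```python
import math

def t_for_correlation(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

N = 68  # assumption: all participants who returned to the laboratory

# Correlations with the Modern Sexism Scale and the Ambivalent Sexism Inventory
# reported in the text for the lexical decision task.
t_ldt_mss = t_for_correlation(0.24, N)
t_ldt_asi = t_for_correlation(0.20, N)

for label, t in [("LDT with MSS", t_ldt_mss), ("LDT with ASI", t_ldt_asi)]:
    print(f"{label}: t({N - 2}) = {t:.2f}")
```

Under this assumption, the two-tailed .05 critical value for 66 degrees of freedom is about 2.00, so a correlation of .24 falls just above the threshold and a correlation of .20 falls below it.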
Whereas the lexical decision task and the adjective evaluation task were correlated with each other (r = .27), the category inclusion task was unrelated to the other two implicit measures of prejudice. The prejudice scores derived from the automatic application task were not related to any of the other measures. This is not surprising because we did not replicate the original findings.
A number of additional analyses were also conducted to allow exclusion of alternative interpretations of the results. First, prejudice scores based on the two subscales of the Ambivalent Sexism Inventory, the Benevolent Sexism subscale and the Hostile Sexism subscale, were calculated separately. Whereas benevolent sexism measures the extent to which participants see women stereotypically, hostile sexism reflects negative attitudes toward women. We expected hostile sexism to be more closely related to the implicit measures of prejudice, which assessed the extent to which participants associated negative traits with women. However, scores on neither of the two subscales were related to any of the implicit measures. As one might expect, the Modern Sexism Scale score at Time 2 was more strongly correlated with the Hostile Sexism subscale score (r = .60, p < .0001) than with the Benevolent Sexism subscale score (r = .25, p < .05).
Second, an alternative explanation of the relationships among scores from the category inclusion task, the lexical decision task, and the adjective evaluation task was also addressed. Whereas the prejudice scores from the category inclusion task were based on raw data, the prejudice scores from the other two tasks were calculated on the basis of facilitation scores. One could argue that the nonsignificant correlation between the category inclusion task, on the one hand, and the lexical decision task and the adjective evaluation task, on the other hand, was due to the differences in how the prejudice scores were derived.
Thus, new prejudice scores were calculated for the lexical decision task and the adjective evaluation task. This time, absolute response latencies (i.e., not subtracting from baseline responding) rather than facilitation scores were used to estimate prejudice. The results generally replicated the pattern of the former scores, but the correlations tended to be weaker. The lexical decision task and the adjective evaluation task were still positively correlated, although only marginally (r = .21, p = .09), and they remained unrelated to the category inclusion task.
These correlations are most consistent with the conceptualization of prejudice as a multidimensional construct. Although the tasks measuring automatic activation (the lexical decision task and the adjective evaluation task) and the task measuring automatic application (the category inclusion task) were related to the Modern Sexism Scale, they were unrelated to each other. This finding supports the general notion that different implicit measures tap different components of prejudice. Still, two comments must be made at this point. First, the multidimensional approach might have predicted that the relationships between two measures assessing the same component would be stronger than the relationships between two tasks assessing two different components. One should not forget, however, that the present versions of the implicit tasks contained considerably fewer trials than those used in the original studies. This change negatively affected the measures’ reliability, which decreased the correlations that these measures could possibly have with other measures. Second, one might be surprised that the implicit measures were related to one of the explicit measures (the Modern Sexism Scale) and not the other (the Ambivalent Sexism Inventory). But this is not surprising given that participants were preselected on the basis of either very high or very low scores on the Modern Sexism Scale at Time 1. This selection procedure increased our chances of finding correlations between the Modern Sexism Scale and other measures of prejudice.
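The point about reliability can be made explicit with the classical attenuation relation from test theory. The notation is introduced here for illustration; the reliabilities of the shortened implicit tasks were not estimated in this study, so no specific values are implied.

```latex
r_{XY}^{\mathrm{obs}} = r_{XY}^{\mathrm{true}} \, \sqrt{r_{XX'} \, r_{YY'}}
```

Here $r_{XX'}$ and $r_{YY'}$ denote the reliabilities of the two measures. As a purely hypothetical example, if both measures had a reliability of .50, a true correlation of .60 would be observed as $.60 \times \sqrt{.50 \times .50} = .30$.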
Factor Analyses
To gain a greater understanding of the relationships among the measures of prejudice, we conducted a factor analysis using all prejudice scores collected at Time 2, except those from the automatic application task (because it was uncorrelated with the other measures). Principal factors extraction with an oblique (promax) rotation was performed on the five remaining scores. Three factors with eigenvalues larger than one were retained. In total, the factors explained 79% of the variance. Loadings of variables on factors after rotation, communalities, and percentages of variance explained are shown in Table 5. The Modern Sexism Scale and the Ambivalent Sexism Inventory loaded highly on the first factor. The second factor was represented by the lexical decision task and the adjective evaluation task. The third factor was defined by a single variable, the category inclusion task.
Table 5. Factor Loadings After Promax Rotation: Factor Analyses at Time 2
Note. All factor loadings below .40 have been deleted. MSS2 = Modern Sexism Scale, Time 2 (Swim et al., 1995); ASI = Ambivalent Sexism Inventory (Glick & Fiske, 1996); LDT = lexical decision task (Wittenbrink et al., 1997); AET = adjective evaluation task (Fazio et al., 1995); CIT = category inclusion task (Dovidio et al., 1986).
We also examined an alternative one-factor solution (37% of the variance) and a two-factor solution (59% of the variance). To compare these solutions, we applied Cattell's (1966) scree test. Although the first three eigenvalues descended linearly (from 1.86 to 1.02), there was a considerable gap between the third (1.02) and the fourth (0.65) eigenvalues. Thus, a three-factor solution was preferred to a one- or a two-factor solution in the interpretation of the results.
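The reported percentages and eigenvalues can be checked against one another with simple arithmetic. The sketch below assumes that each percentage is an eigenvalue divided by the number of analyzed measures (five), which reproduces the reported 37% for the first factor. The second eigenvalue is not reported; the value used here is inferred from the rounded 59% figure and is therefore approximate. Variable and function names are hypothetical.

```python
N_MEASURES = 5  # MSS2, ASI, LDT, AET, CIT

def variance_share(eigenvalue_sum: float, n_measures: int = N_MEASURES) -> float:
    """Proportion of total variance accounted for by the given eigenvalue(s)."""
    return eigenvalue_sum / n_measures

# Eigenvalues reported in the text.
first, third, fourth = 1.86, 1.02, 0.65

# Not reported: inferred from the rounded 59% cumulative variance of two factors.
implied_second = 0.59 * N_MEASURES - first

one_factor = variance_share(first)
three_factor = variance_share(first + implied_second + third)

# Scree logic: compare the drops between successive eigenvalues.
gap_second_third = implied_second - third
gap_third_fourth = third - fourth

print(f"one factor: {one_factor:.0%}, three factors: {three_factor:.0%}")
print(f"drop 2->3: {gap_second_third:.2f}, drop 3->4: {gap_third_fourth:.2f}")
```

Under these assumptions, the drop from the third to the fourth eigenvalue is several times larger than the drop from the second to the third, which is the gap the scree test picks up.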
Implications
We proposed initially that prejudice is a multidimensional construct and that the different existing measures of prejudice assess different components of this construct. After having discussed the distinction between implicit and explicit measures, we proposed that implicit measures of prejudice can be partitioned into two categories: those measuring automatic activation of prejudice versus those measuring automatic application of prejudice. In the present study, we assessed sexism with four implicit and two explicit measures. One of the implicit measures, the automatic application task used by Devine (1989) and by Lepore and Brown (1997), was not related to any of the other implicit and explicit measures.
The remaining three implicit measures were positively related to the explicit Modern Sexism Scale (Swim et al., 1995), but the relationships among them were rather weak. Whereas Wittenbrink et al.'s (1997) lexical decision task and Fazio et al.'s (1995) adjective evaluation task correlated with each other, Dovidio et al.'s (1986) category inclusion task was unrelated to the other two implicit measures. Somewhat surprisingly, the explicit Ambivalent Sexism Inventory was strongly related to the only other explicit measure included in the study (the Modern Sexism Scale) but was unrelated to any implicit measure.
A factor analysis yielded a three-factor solution, with the two explicit measures representing the first factor, the two correlated implicit measures loading highly on the second factor (lexical decision task and adjective evaluation task), and the category inclusion task constituting the third factor. It seems clear that the first factor describes the explicit, controlled component of prejudice. As the examples in Table 1 show, both the Modern Sexism Scale and the Ambivalent Sexism Inventory ask individuals about their beliefs about and attitudes toward women. These beliefs and attitudes are self-reported, and participants are free to present themselves in socially desirable or self-enhancing ways.
In addition, we suggest that the second factor constitutes the implicit, automatic activation of prejudice. Both the lexical decision task and the adjective evaluation task involve rapid decisions about trait adjectives, and participants are either unaware of the primed category or unaware of the relationship between the category and the trait adjective. What is measured here are the associations that are automatically activated on contact with the category. Note that participants are not asked to apply these concepts to the target group under consideration.
The third factor is particularly interesting, because the implicit measure loading highly on this factor (the category inclusion task) was unrelated to any of the other implicit measures obtained in the study. Consistent with our prior reasoning, we believe that the task assesses a qualitatively different aspect of prejudice, namely the automatic application of prejudice (i.e., the spontaneous inferences and attributions that people make about a target group). Consider the participants’ task. On the critical trials on which a label for a social category and an adjective describing human beings are presented, participants see a category prime and are asked to decide whether a subsequently presented trait adjective could ever describe the category. Both primes and target words are presented above threshold. In contrast to the other implicit tasks, this task involves a matching process in which the participants determine the fit between the category and the trait adjective. In this sense, the task requires the application of activated positive and negative concepts in a judgment about the target group. This contrasts with the other implicit measures that assess the concepts activated on contact with a member of the target category.
Alternative Accounts
One might object that the tasks that we classify as measuring the activation versus the application of a primed concept and the explicit measures of prejudice differ in the amount of awareness that respondents have of the fact that their level of prejudice is being measured. Whereas respondents have no awareness in the tasks that measure spontaneous activations (e.g., lexical decision task and adjective evaluation task), they have a moderate amount of awareness in the task that measures automatic applications (e.g., category inclusion task), and it is likely that they are aware that their prejudice is being measured in the explicit measures of prejudice (e.g., Modern Sexism Scale and Ambivalent Sexism Inventory). However, if this were the case, we would expect the observed correlations to depend on the distance between two tasks on the no awareness–awareness continuum. Specifically, we would expect to find moderately strong correlations between the spontaneous activation and automatic application tasks and between the automatic application tasks and the explicit measures and relatively weak correlations between the spontaneous activation tasks and the explicit measures. This was not the case.
The same reasoning can be applied to the order of completion or shared error variance due to similar methodology. If order of completion or shared error variance were responsible for the results, one might expect that the size of correlations would depend on the amount of time that separated two tasks during the experimental procedure or on the similarity of the methodology. Again, this was not the case. The results are inconsistent with interpretations that appeal to participants’ awareness that prejudice against women was being measured, the order of task completion, or the shared error variance due to similar methodology.
Relation to Previous Findings
These findings replicate some of the earlier studies on implicit and explicit prejudice and are contradictory to others. Wittenbrink and colleagues (1997) found a correlation of .24 between prejudice scores on the lexical decision task and explicit prejudice scores on the Modern Racism Scale (see Table 3), and we observed a correlation of exactly the same size between the lexical decision task and the Modern Sexism Scale. Furthermore, Wittenbrink and colleagues found no relationship between the lexical decision task and Dovidio et al.'s (1986) category inclusion task, and the present results replicate this finding. Contrary to the findings of Wittenbrink et al. (1997), which revealed no relationship between the category inclusion task and questionnaire measures of prejudice, the prejudice scores derived from the category inclusion task in our study were correlated with Modern Sexism Scale scores. This difference may be due to the modifications made in the present version of the category inclusion task (see Footnote 11).
Finally, a significant correlation between the adjective evaluation task and the Modern Sexism Scale was observed in the present study, which contrasts with the lack of relationship that Fazio and colleagues (1995) obtained between the adjective evaluation task and the Modern Racism Scale. In general, given the fact that our implicit tasks involved fewer experimental trials than the original versions of the tasks and that the participants completed several implicit measures one after the other, whereas participants in previous studies generally completed these tasks in isolation, the results of the present study are quite convergent with those obtained by other researchers.
As we pointed out earlier, the relationships observed in this study and previous studies are likely to be affected by the measures’ reliability and validity. In reviewing the current literature on implicit and explicit measures of prejudice, we were surprised to find a difference in the way the measures were developed and tested. Whereas developers of explicit scales usually go through considerable effort to examine the test–retest reliability, internal consistency, and convergent, discriminant, and predictive validity of their new scale, most of the inventors of implicit tasks do not apply the same scientific rigor. The field would greatly benefit if more were known about the psychometric properties of implicit measures of prejudice (for a more detailed discussion of this issue, see Blair, in press).
The absence of a relationship between the automatic application task and other measures of prejudice should not be overinterpreted, given our failure to replicate the original effect (Devine, 1989; Lepore & Brown, 1997). The nonreplication may be due to the specific cultural stereotype we used, emotionality. Participants may not have agreed on whether emotionality should be considered a desirable or an undesirable feature. In this sense, emotionality may be more a part of the stereotype of women than part of prejudiced attitudes toward women. The participants’ ratings of the main character's emotionality may have been related to implicit measures of stereotypicality (counterstereotypical minus stereotypical adjectives) rather than implicit measures of prejudice (positive minus negative adjectives). However, the present study did not allow us to test this hypothesis. Alternatively, as one reviewer pointed out to us, it could be that participants imagined the main character to be male, even though the sex of the person was not specified. This may have been due to the fact that the default sex for a person whose sex is unknown is male. If this was the case, the stereotypically female trait “emotional” may not have been applicable to the target person (Banaji et al., 1993).
Conclusions
The present review and findings suggest that prejudice should be conceived as a multidimensional construct that involves the automatic activation of prejudice upon perception of a member of the target group, application of these ideas in judgments about a member of a target group, and conscious beliefs and action tendencies toward members of the target group. Although these aspects are likely to be related for some individuals, it may nevertheless be the case that someone who is highly prejudiced in one sense is somewhat less prejudiced in another sense. As the historical overview in the introduction suggests, the search by social psychologists for new measures of prejudice has been driven by the desire to get closer and closer to the heart of the construct itself and to progressively eliminate systematic and random error. Each new measure has been presented as being better than previous ones because it supposedly is a better indicator of the single, underlying construct. It appears that, rather than finding a better single indicator of prejudice, investigators have developed measures that assess different aspects of the construct.
Instead of searching for the one perfect measure of prejudice, and instead of conducting yet another study showing that implicit and explicit prejudice are related (or not), it is probably more promising for future research to systematically examine which measures assess which aspect of prejudice and, especially, which measure best predicts which kind of prejudiced behavior. It may be that if one wants to predict an individual's racial bias in the selection of job candidates, one is better off measuring inferences and attributions. If one wants to predict an individual's friendliness in an interaction with a member of the target group, it may be most promising to measure automatic associations. And, finally, if one wants to predict an individual's voting behavior on an amendment on affirmative action, one may be advised to measure beliefs and attitudes. Future research is necessary to verify these claims.
Dovidio et al.'s (1997) recent studies are a first step in this direction. These authors assessed participants’ prejudice level with one implicit task measuring the automatic activation of concepts (adjective categorization task) and two explicit questionnaires measuring self-reported attitudes. They found that the implicit task best predicted spontaneous responses on a word completion task and nonverbal behaviors in an interaction with a member of the out-group. The explicit questionnaire measures were closely related to deliberative, juridical decisions about a member of the out-group and the relative evaluation of interaction partners belonging to the in-group and the out-group. These findings demonstrate that different measures predict different kinds of prejudice-related behaviors. Future studies will likely support and extend such results. The present study suggests that researchers should pay close attention to the measure of prejudice they choose for their own work, because this choice should depend on the kind of prejudiced behavior they are interested in predicting.
Footnotes
1
Participants in the high power condition provided less racist responses, but this was true only for individuals with a communal relationship orientation (Clark & Mills, 1979); the responses of individuals with an exchange relationship orientation were not affected by the power manipulation.
2
Note that this is not an exhaustive list. Other implicit tasks include the judgment task (Banaji, Hardin, & Rothman, 1993), the false fame task (Greenwald & Banaji, 1995), the implicit memory task (Hense, Penner, & Nelson, 1995), the pronoun task (Blair & Banaji, 1996), and the linguistic intergroup bias task (Von Hippel, Sekaquaptewa, & Vargas, 1997).
3
We use the terms sexism and prejudice against women interchangeably. In both cases, we refer to an individual's tendency to see women in negative terms and to attribute to them negative character traits, independent of whether these traits are stereotypical or not (Fazio et al., 1995).
4
The authors calculated several different prejudice scores based on participants’ responses on the implicit lexical decision task. The correlations reported here are those for the “implicit prejudice score,” which was based on responses to trials involving positive and negative traits stereotypic of the category that appeared as the prime.
5
It should be noted, however, that participants’ prejudice scores on the adjective categorization task were related to their prejudice scores from both explicit measures in one study (Study 2).
6
The only exception is Wittenbrink et al. (1997), who asked their participants to complete a category inclusion task together with their own lexical decision task. They found no correlation between the two implicit tasks and attributed this finding to the methodological weaknesses of the category inclusion task.
7
The implicit association test (Greenwald et al., 1998) is difficult to classify. On the one hand, one might argue that this task assesses spontaneous activations because participants are not asked to make a judgment about a target group. On the other hand, one might argue that responding to positive–negative words and to Black–White first names with either the same hand or two different hands is equivalent to examining the categorical fit between the target group and the attributes and therefore implies an inference process about the target group.
8
The automatic application task can be considered an implicit measure of prejudice, but only in the condition in which participants are primed with words that activate the target category. In other words, only the scores of participants in the female prime condition could be compared with other measures of prejudice. For this reason, a higher proportion of participants were assigned to the female prime condition than to the neutral prime condition.
9
In comparison with Wittenbrink et al.'s procedure, the present task included fewer targets (16 words instead of 48 words) and a smaller number of trials (96 instead of 232). We also presented an equal number of words and nonwords and therefore sampled yes and no responses equally (contrary to Wittenbrink et al., who had 5 times as many words as nonwords). Whereas Wittenbrink et al. (1997) used a prime presentation time of 15 ms, we decided to present the primes for 33 ms (which corresponded to two refresh cycles on a computer with a refresh rate of 66.67 Hz). This was done for three reasons. First, our primes were smaller (font size of 12 instead of the font size of 18 used by Wittenbrink et al.). Second, the scrambled letter mask was more efficient than the “XXXX” mask used by Wittenbrink et al. (Bargh & Chartrand, in press). Third, pretesting of the same paradigm used in another study (Wasel & Gollwitzer, 1997) had shown that participants had no conscious access to primes that were presented for 33 ms. In contrast to the Wittenbrink et al. task, we did not have an ostensibly unrelated prepriming task in which participants identified first names as being typical for either one or the other of the two target groups under consideration.
10
The only differences from the original procedure were that participants saw black-and-white photographs of Caucasian men and women and that the fourth phase involved fewer trials. There were 24 photographs (12 men and 12 women) and 24 adjectives (12 positive and 12 negative). Each photograph was paired once with a positive and once with a negative adjective, yielding 48 experimental trials (Fazio et al. included 96 experimental trials in the fourth phase).
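The trial count in this footnote can be checked with a short sketch. The footnote does not state how individual adjectives were assigned to photographs or how trials were ordered, so the random assignment, the use of each adjective exactly twice, the shuffled order, and all names below (`build_pairings`, the placeholder stimulus labels) are assumptions made purely for illustration.

```python
import random

def build_pairings(photos, positive, negative, seed=0):
    """Pair every photo once with a positive and once with a negative adjective.

    Assumption: adjectives are assigned at random and reused evenly;
    the footnote only fixes the counts, not the assignment rule.
    """
    rng = random.Random(seed)
    trials = []
    for valence, adjectives in (("positive", positive), ("negative", negative)):
        # Repeat the adjective list until it covers all photos, then shuffle.
        repeats = len(photos) // len(adjectives) + 1
        pool = (adjectives * repeats)[:len(photos)]
        rng.shuffle(pool)
        for photo, adjective in zip(photos, pool):
            trials.append((photo, adjective, valence))
    rng.shuffle(trials)  # hypothetical: randomized presentation order
    return trials

# Placeholder labels standing in for the 24 photographs and 24 adjectives.
photos = [f"man_{i}" for i in range(1, 13)] + [f"woman_{i}" for i in range(1, 13)]
positive = [f"pos_{i}" for i in range(1, 13)]
negative = [f"neg_{i}" for i in range(1, 13)]

trials = build_pairings(photos, positive, negative)
print(len(trials))  # 24 photos x 2 valences
```

Under these assumptions, each of the 24 photographs contributes one positive and one negative trial, which gives the 48 experimental trials reported above.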
11
A number of important modifications were made to earlier versions of the task. Post-experimental interviews in the studies conducted by Judd et al. (1995) revealed that participants tended to mentally recode the two group primes into one single category labeled “human beings.” This recoding obviously decreased the chances of detecting differences in activation and application between the two target groups. Judd et al. (1995) and Wittenbrink et al. (1997) attempted to address this problem by presenting the primes for a shorter duration (500 ms) than that used in the original study (2,000 ms). We also adopted a short presentation time, but we made an additional modification by adding so-called “reminder trials” to the study. On these reminder trials, the target was an adjective that was descriptive of one of the two human categories but not of the other (e.g., “pregnant” or “is a husband”). These adjectives were always paired with the human category for which they were not descriptive (e.g., pregnant was paired with “man”), so the correct response on the reminder trials was always no. Note that this is quite different from the so-called incongruent trials (such as the prime “man” paired with an adjective typically associated with women), in which the correct answer is yes. In all other respects, reminder trials were identical to experimental trials. Of the 9 reminder trials, 3 were used as practice trials, and 6 were interspersed among the experimental trials (there was 1 reminder trial after Experimental Trials 10, 20, 30, 40, 50, and 60). Therefore, the experimental phase consisted of 78 trials: 72 experimental trials and 6 reminder trials. Response times to reminder trials were not analyzed.
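The structure of the experimental phase described in this footnote can be sketched as follows. This is a minimal illustration only: the function name `build_trial_sequence` and the tuple labels are hypothetical, and the sketch assumes that the positions 10, 20, ..., 60 count experimental trials only (not reminder trials).

```python
def build_trial_sequence(n_experimental=72,
                         reminder_after=(10, 20, 30, 40, 50, 60)):
    """Return the ordered trial list for the experimental phase.

    One reminder trial follows each experimental trial whose number is
    listed in `reminder_after` (assumed to count experimental trials only).
    """
    reminder_positions = set(reminder_after)
    sequence = []
    for number in range(1, n_experimental + 1):
        sequence.append(("experimental", number))
        if number in reminder_positions:
            # Reminder trials always require a "no" response and are
            # excluded from the response time analysis.
            sequence.append(("reminder", number))
    return sequence

sequence = build_trial_sequence()
n_reminder = sum(1 for kind, _ in sequence if kind == "reminder")
print(len(sequence), n_reminder)
```

With these settings, the sequence contains 72 experimental trials and 6 reminder trials, matching the 78 trials reported for the experimental phase.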
The Story Used in the Automatic Application Task: “A Week in D.'s Life”
D. went downtown on Saturday. In one of the stores, D. saw a pair of pants which D. liked. Although the pants were expensive, D. bought the pants without hesitating. On Sunday, we went on a boat trip on the lake with some friends. After a while I noticed that water was accumulating at the bottom of the boat, and D. checked what was going on. Briefly after D. told everybody that we had a hole in the boat I noticed that D. turned around and looked at the shore. Fortunately, we reached the harbor without drowning. On Tuesday, my boss fired me. When D. came home in late afternoon, I mentioned the bad news. D. was very disappointed at first but shortly after, D. was calm again. Later during dinner, D. wanted to eat a yogurt but added salt instead of sugar. On Thursday evening, we dressed formally because we had planned to go to an art opening. We stopped at the university on the way to the art gallery. D. wanted to look at the grades on the bulletin board of the department. D. had failed the exams. D. turned around and walked away. After a sandwich, which we bought at a hot-dog stand, we went to the art opening. A particular painting attracted our attention, and we looked at it for a while. Although D. was no expert in this domain, D. had the feeling that the price for the painting was much too high. When we came home we both were very tired and we went to bed right away.
Stimulus Words Used in the Subliminal Priming Phase of the Automatic Application Task
| Female words | Neutral words |
|---|---|
| Female | Glasses |
| To chat | Tree |
| Dress | Picture |
| Mother | Bet |
| Woman | Curl |
| Witch | Curtain |
| Lady | Nose |
| Pregnant | Turkey |
| To knit | River |
| To give birth | Moon |
| Bosom | School |
| Wife | Boat |
| Miss | Church |
| Erotic | Cloud |
| Hairdresser | Oven |
| Emancipation | To drink |
Stimulus Words Used as Targets in the Lexical Decision Task
| Positive words | Negative words |
|---|---|
| Attractive | Brutal |
| Affectionate | Gossipy |
| Beautiful | Heartless |
| Cooperative | Hysterical |
| Determined | Manipulative |
| Sensuous | Neurotic |
| Strong | Sexist |
| Sturdy | Cold |
Stimulus Words Used as Targets in the Adjective Evaluation Task
| Positive words | Negative words |
|---|---|
| Analytic | Boastful |
| Caring | Boorish |
| Charming | Cold |
| Comradely | Credulous |
| Courageous | Fearful |
| Fond of children | Indifferent |
| Gentle | Insensitive |
| Independent | Passive |
| Realistic | Submissive |
| Strong | Talkative |
| Tactful | Underhanded |
| Understanding | Violent |
Category Inclusion Task: Stimulus Words
Stimulus Words Used as Targets in Reminder Trials
| Prime | Target | Correct response |
|---|---|---|
| Man | Pregnant | No |
| Man | Becomes a mother | No |
| Man | Made up | No |
| Woman | Bald | No |
| Woman | Becomes a grandfather | No |
| Woman | Is a husband | No |
