Abstract
We present a core–concept model (CCM) suggesting that stimulus centrality is an important factor in category representations in implicit measures. We tested the hypothesis that idiographic stimuli (first name, birthday) are more central and therefore assess self–concept in Implicit Association Tests (IATs) more validly than generic and nonspecific stimuli (me, you). Superior validity of the idiographic variant emerged across three different domains of self–concept. First, an idiographic self–esteem IAT displayed higher correlations than a generic IAT with self–assessments and observer–assessments of self–esteem. Second, an idiographic body scheme–IAT predicted subjective ratings of body image and objective body–mass index. Third, an idiographic aggressiveness–IAT had higher incremental validity for unprovoked aggression when interacting with explicit measures of aggressiveness. We conclude that idiographic stimuli focus participants’ attention on the core features of the self, hence, tapping into self–related associations to a stronger degree than generic stimuli. Copyright © 2011 John Wiley & Sons, Ltd.
Would you rather pay attention when someone called ‘hey, you’ or when someone called your name after spotting you in the crowd? Apparently, whether we address each other by unspecific or specific labels (such as pronouns or names) can have a tremendous effect on what we focus on, or how easily we pay attention. Can the different ways of addressing the self be exploited to improve the assessment of self–concept? The self–concept refers to a very refined knowledge structure about the person we know best (Symons & Johnson, 1997). It plays a key role in memory when we filter information, construct knowledge, or evaluate objects and people including ourselves (Rogers, 1959; Markus, 1977; Greenwald & Pratkanis, 1984; Mussweiler & Strack, 2000). It is a dynamic associative structure with manifold semantic and evaluative links (Collins & Loftus, 1975; Markus & Wurf, 1987). In the present research, we investigate the hypothesis that the stimulus setup can influence the validity of implicit measures. The Implicit Association Test (IAT; Greenwald, McGhee, & Schwartz, 1998) is one of the most popular implicit measures to target the self–concept (Greenwald & Farnham, 2000). Previous findings have shown that the specific stimulus selection shapes the outcomes of the IAT procedure (e.g. Bluemke & Friese, 2006; Govan & Williams, 2004). Across three psychological domains, we will show that idiographic representations by self–relevant stimuli (e.g. BRAD or BOB) in the IAT are likely to be more sensitive to self–related associations than unspecific, generic pronouns (e.g. ME or YOU).
Theoretical Considerations: A Core–Concept Model
To explain the impact of stimuli on the validity of self–concept implicit measures, we propose a core–concept model (CCM) that is based on existing research on semantic network models and includes self–schemas as cognitive–affective structures (Markus, 1977). Object–related associations are typically depicted in social knowledge structures such as Greenwald et al.'s (2002) ‘unified theory.’ The model incorporates triangular associations between attitudes (object–valence relations), stereotypes (object–attribute relations), self–esteem (self as object–valence relations), and self–concept (self as object–attribute relations). The CCM makes additional qualifications and assumptions about the centrality of the stimuli required for the representation of a concept. Centrality has been defined as ‘the relative importance that various aspects bear in a person's conceptualization of objects in a given cognitive domain’ (Scott, Kline, Faguy–Coté, & Peterson, 1980, p.12).
Obviously, not all aspects are equally central in defining who one is. Trait centrality refers to the idea that some personality traits are closer to the core identity than alternative traits (Sedikides & Skowronski, 1993). For example, age may be more important for some people than for others. Or consider the idea that people's global self–esteem can be understood only to the extent that one acknowledges the esteem attached to self–components, which themselves are regarded by persons as more or less central (Rosenberg, 1979). In its strictest sense, centrality refers to invariant characteristics of the concepts. In the case of self–concept, this is a person's core.
Take an exemplary participant (including a social comparison standard) as depicted in Figure 1. Centrality can be illustrated by the relative strengths of links, or spatial distances, between hierarchically related concepts in a semantic network model. Some stimuli are closer to the concepts or have stronger ties than others. The more central a stimulus is, the more conceptually relevant the associative echo from the stimulus prompt will be. Stimulus centrality helps a participant focus on the concept in question during a measurement procedure. Consequently, when targeting the self–concept with central stimuli, the mental representation will be centred more strictly on the core self. In the IAT, this will influence the ease with which a stimulus can be validly associated with me, thus influencing which aspects during the measurement procedure constitute compatible and incompatible category alignments, ultimately reducing nuisance variance in the measure.

A social knowledge structure (adapted from Greenwald et al., 2002) according to the core–concept model (CCM) depicting the self of an exemplary person including a friend as a comparison standard: Nodes represent semantic concepts, and links represent associations between concepts that can vary in strength (line thickness). In contrast to stereotypes and attitudes, self–concept and self–esteem always involve direct links to ME. Depending on the comparison standard, aspects relating to NOT–ME can be included in the self too, either directly or indirectly (as mediated by links). ME and NOT–ME can overlap in social categories (gender, professional roles), trait attributes (e.g. intelligent), and first name initials (B). Alternatively, ME and NOT–ME can uniquely relate to attributes (e.g. forgetful, athletic), pronouns (my, your), and family name initials (L, F). Aspects central to the self are represented closer to ME, resulting in the self being linked to pronouns, initials, birthdays, and other personal identifiers with increasing centrality.
To qualify the CCM more formally, we assume that for accurate inferences about people's associations, the concepts in implicit measures need to be tapped into as centrally as possible. This can be achieved with stimulus selections that are (a) representative for and (b) proximal to the core concept. The stimulus selection to represent self–concept will determine how centrally represented a person's self–concept is. Importantly, these stimulus selection rules apply to any kind of stimulus type, regardless of whether letters, numbers, action verbs, state verbs, adjectives, social categories, nouns, or names are used to address the self. Moreover, we assume that the CCM principles apply regardless of whether the self–concept is assessed by category–based measures (that use category labels) such as the IAT or exemplar–based measures (that do not rely on self–related categories) such as the affective priming task (Wentura, Kulfanek, & Greve, 2005) or the name–letter–task (NLT; Nuttin, 1985). Although the CCM is meant to be applicable to a range of measurement situations, we test the basic idea of tapping into core concepts by exploring the centrality of self–related aspects. And although the model is applicable to various kinds of implicit measures, in the following experiments, we examine its utility with respect to self–concept IATs.
With regard to the first requirement, representativeness, the CCM appreciates that different cues increase the accessibility of different representations. When the concept representation within an implicit measurement procedure is not represented in a balanced manner or otherwise deviates from the concept as it is encountered in a specific social context, the validity of the measure is reduced markedly (Blair, 2002; Barden, Maddux, Petty, & Brewer, 2004; Rydell & Gawronski, 2009). Therefore, the stimulus selection should always be representative of the concept. If the selection targets the core concept, then the stimuli should be superior to stimuli that merely elicit peripheral and fluctuating mental representations. For if peculiarities prevent the mental construal of the intended concept, the measurement outcomes will be less indicative of the conceptual associations. In support of this, implicit–explicit (I–E) as well as implicit–implicit (I–I) correlations of an IAT that employed synonyms of the category labels as stimuli were higher than that of a (typical) stimulus–based IAT that allowed fluctuating mental construal because of peculiar stimulus connotations (Steffens, Kirschbaum, & Glados, 2008). With regard to self–concept measures then, ideally, those stimuli should be chosen that reduce fluctuations and allow access to the invariant features representing a person, unless one is explicitly interested in a context–dependent self construal (see McConnell, 2011).
With regard to the second aspect of the CCM, proximity, one and the same concept can be represented by stimuli that reside at different locations in the semantic network (Rosch, 1978; Rosch, Mervis, Gray, Johnson, & Boyes–Braem, 1976). Although two stimuli may both have ties with a concept, the semantic distance between these stimuli and the concept may differ. For instance, a researcher may choose EINSTEIN to represent the target concept math in an IAT. The semantic links from EINSTEIN to the concept math can be construed as such, but they are weak from the perspective of what characterises math at heart. By contrast, ALGEBRA, NUMBER, or EQUATION resonate more strongly with math. Whenever a researcher incorporates stimuli to target conceptual associations, proximal stimuli that resonate strongly are to be preferred to distal ones that resonate only weakly with the concept. Again, this pertains to the self as well, and this is where the distinction between idiographic and generic stimuli becomes crucial. Stimulus sets that represent self– and other–concept in implicit measures can vary in their degree of idiography. Generic and idiographic stimulus selections reflect the nomothetic and the idiographic approach to person description, a distinction developed by the later Heidelberg philosopher Windelband (1894/1998), introduced into psychology by Münsterberg (1899/1994), and popularised by Allport (1937). The nomothetic approach draws on normative, standardised procedures to compare individuals, mostly using quantitative data. The idiographic approach relies on person–specific profiling, often using qualitative data. The latter approach takes many degrees of freedom to reach a full understanding, but at the risk of volatile data and low chances for exact replication. Neither of these approaches is a priori more legitimate than the other (cf. Molenaar, 2004; Tuerlinckx, 2004).
Thus, pronouns such as I or MINE are used as normative stimuli, in line with a nomothetic approach to measurement. In this case, the concepts and the measurement procedure can be held constant, applied to anyone, and still bear somehow on individual characteristics. On the downside, generic pronouns are distal and hardly descriptive of people's cores. (Likewise, nomothetic stimuli such as THEY or THEIR can represent the concept others or not–me, but they remain rather distant and vague.) By contrast, idiographic stimuli, such as name and birthday, convey highly person–specific aspects, yield vivid mental representations, and typically imply temporally invariant characteristics. Therefore, idiographic aspects are closer to the core of the self. At an intermediate level between idiographic and generic stimulus selections, sociographic stimuli (e.g. FEMALE or STUDENT) can be used to represent social categories such as gender or other social groups (Richetin, Richardson, & Mason, 2010). Some of these social group memberships can change, others cannot; but sociographic stimuli always differ from idiographic ones by their degree of person specificity. Thus, the proximity of sociographic stimuli to the core self tends, on average, to be lower than that of idiographic stimuli.
The CCM suggests that even within a level of relatively low proximity, some stimuli might still be more central than others. The self can be tapped via semantic routes at levels even lower than that of pronouns, provided they are idiographic, pre–meditated, and accessible. The name–letter task exploits this logic in using the liking of name letters as indication of self–esteem (Nuttin, 1985). The effect is strongest for participants’ initials commonly used for acronyms (Hoorens & Todorova, 1988; Stieger, Voracek, & Formann, 2011). Birthday numbers are likewise associated with the self and better liked than other numbers (Kitayama & Karasawa, 1997; Koole, Dijksterhuis, & Van Knippenberg, 2001).
Taken together, the CCM extends Greenwald et al.'s (2002) unified theory to incorporate the centrality of items and to clarify the semantic overlap and relative hierarchical distance between items and concepts. Centrality considerations apply to all sorts of concepts, and misspecification of categories by choosing suboptimal stimuli runs the risk of introducing nuisance variance. A specific methodological consequence following from the CCM is the hitherto untested hypothesis that IATs employing idiographic stimuli should have higher validity than IATs employing generic stimuli. There is some evidence to support this assumption. Greenwald and Farnham (2000) used both a generic and an idiographic self–concept IAT. The generic IAT consisted of preselected self–concept items (pronouns) whereas a second IAT comprised idiographic items, that is, participants themselves chose target stimuli that were descriptive for the individuals (e.g. their names). The researchers found stronger I–E correlations for the idiographic gender–self–concept IAT as compared with the generic IAT, |r| = .33 vs .22, despite a high correlation between both types of IAT (r = .68). The authors hesitated to infer the equivalence of the two IAT types—the idiographic IAT ‘better defined that latent variable’ in a confirmatory factor analysis—yet they suggested that the generic format was ‘likely to be the more efficient’, as the idiographic format requires collecting participant–specific information (pp. 1030–1031).
This initial finding was later corroborated by a meta–analysis on the relationship between IATs and explicit measures (Hofmann, Gawronski, Gschwendner, Le, & Schmitt, 2005). The study found higher I–E consistency for idiographic than generic IATs (r = .32 vs .15), but this analysis rested on a low number of idiographic studies (k = 6). In addition, the meta–analysis compared IAT types post hoc and across different study contents, samples, and explicit measures. Therefore, it does not allow for causal inferences and can merely hint at generalisability if causality is established independently. In sum, these data suggest that idiographic IATs may assess self–relevant associations better than generic IATs.
To date, possible differences between generic and idiographic IATs have neither been theoretically explained nor systematically investigated. The current research therefore (1) suggests a theoretical model, the CCM, to explain differences between idiographic and generic self–concept measures, and (2) tests the CCM–derived prediction that idiographic stimuli outperform generic stimuli in self–concept IATs across topics, within samples (repeated measures), while using the identical validation criteria.
Overview of Experiments and Main Hypothesis
From the CCM logic, it follows that idiographic stimuli should raise the focus on the core self more than generic stimuli do. Using pronouns can be sufficient to tap into the self–concept, but an idiographic representation of IAT categories should reliably favor a central and more resonant route to the self, resulting in higher validity of assessed self–associations (Perugini, O'Gorman, & Prestwich, 2007). Whereas the general theoretical hypothesis is that tapping into core concepts increases validity, the main empirical hypothesis for the present research is: Idiographic self–representations reduce nuisance variance in IATs, as they narrow the nomothetic–idiographic gap, and this will be evident in stronger relationships of idiographic than generic implicit measures with domain–specific criteria.
We compared idiographic IATs (henceforth idio–IAT) that applied individual descriptions of self/others (e.g. first name, date of birth) to generic IATs (gen–IAT) that used pronouns (e.g. I, YOU; cf. Bosson, Swann, & Pennebaker, 2000; Greenwald & Farnham, 2000). To limit the between–participant variation, our selection procedure for idiographic stimuli restricted which person the participants could choose to represent the other–category (Karpinski, 2004). Depending on a participant's choice of the other–category, self and other might have been confounded with positivity differences, or involved a gender–mismatch. Therefore, we instructed participants to choose a well–known person such as a close friend or a relative of the same sex and of similar age. Similar approaches have been used by other researchers (e.g. Yamaguchi et al., 2007; Franck, De Raedt, Dereu, & Van den Abbeele, 2007). Moreover, we avoided fluctuating items that might only be vaguely and non–permanently related to the self, such as telephone number or student ID. To increase statistical power, cross–sectional within–subject designs required that participants took both IATs on the same occasion. We tested the better quality of idiographic IATs for self–evaluations (‘implicit self–esteem’; Experiment 1), body–schema (‘implicit weight identity’; Experiment 2), and the ‘automatic aggressive self–concept’ (Experiment 3).
Which criteria can be used to establish the higher validity of any IAT? Implicit and explicit self–concepts are assessed with implicit and explicit measures. Neither of these is process pure, so, the conceptual level must not be equated with the measurement level. For each topic, we will discuss whether the utility of idio–IAT will be evident in direct or interactive relations with explicit measures (Perugini, 2005; Perugini, Richetin, & Zogmaister, 2010). Generally, implicit and explicit measures at least partly depend on the same associative basis and can influence each other (Gawronski & Bodenhausen, 2006; Strack & Deutsch, 2004; Whitfield & Jordan, 2009); therefore, a complete I–E dissociation is unlikely in the first place (Nosek, 2007). Research has also shown that increasing self–accessibility results in stronger implicit–explicit correlations (LeBel, 2010; Perugini et al., 2007); this should make I–E consistency more likely for idiographic IATs. If one IAT variant consistently results in higher I–E correlations in a head–to–head comparison—equal reliability provided—we will conclude that lower correlations indicate lower validity (Hofmann, Gschwendner, Nosek, & Schmitt, 2005). To reduce same–source bias, we will also employ objective indicators such as observer assessment, physical properties, and objective behaviour.
Experiment 1
In the first study, we looked at implicit self–esteem. To our knowledge, idiographic self–esteem IATs have hardly ever been used, hence, idiographic and generic self–esteem IATs have not been compared within participants or between samples drawn from the same population. Powerful within–sample comparisons should reveal validity differences, if they exist. Following the CCM, we hypothesised that idio–IAT would show stronger signs of validity than gen–IAT with regard to the following criteria: First, we expected higher I–E consistency for IATs that apply idiographic rather than generic stimuli (Hofmann, Gawronski, et al., 2005). Despite some non–significant I–E self–esteem correlations in the literature (e.g. Baccus, Baldwin, & Packer, 2004; Jordan, Spencer, Zanna, Hoshino–Browne, & Correll, 2003; Koole et al., 2001), low but significant I–E relationships for self–esteem have repeatedly been found before (e.g. Conner & Barrett, 2005; Gebauer, Riketta, Broemer, & Maio, 2008; Greenwald & Farnham, 2000; Jordan, Whitfield, & Zeigler–Hill, 2007; Karpinski, 2004; Krizan, 2008; Krizan & Suls, 2008; LeBel, 2010; Olson, Fazio, & Hermann, 2007; Wentura et al., 2005). Second, implicit measures have predicted non–verbal behaviour indicative of personality aspects that people are unable to report (e.g. Asendorpf, Banse, & Mücke, 2002), so, we expected higher correlations of the idio–IAT than gen–IAT with experimenters’ ratings of participants’ self–esteem. Third, we expected higher correlations of idio–IAT than gen–IAT with sense of coherence (SOC; Antonovsky, 1979). SOC acts as a moderator of healthy (or unhealthy) responses to stress and serves as a proxy for physical and mental health (Eriksson & Lindström, 2006). It is a global orientation that entails (a) how good an individual is at comprehending things that occur to her, (b) how skilled an individual is to deal with challenges throughout life, and (c) how meaningful life is experienced despite obstacles. 
As such, SOC should be positively related to self–esteem (Johnson, 2004).
Method
Procedure and materials
Participants
One–hundred and forty–four (29 male) psychology undergraduates at Heidelberg University (Mage = 23.55 years; SD = 7.17) took part in a lab study on computer–based assessment in exchange for course credit. Participants worked in groups of three to six, yet individually, in cubicles, first on gen–IAT, then idio–IAT, and finally, the questionnaires. Once they started working on the tasks, the experimenter rated the self–esteem of participants. After the examination had ended, participants were debriefed and compensated.
Procedure
The order of (a) measures, (b) IAT blocks (within the measures), and (c) IAT stimuli (in the blocks) remained constant across participants because we were primarily interested in correlations, not mean IAT effects. We were guided by the following reasoning: First of all, counterbalancing the design factors would have introduced error variance that is usually detrimental for correlations (Egloff & Schmukle, 2002). Second, although meta–analytic findings indicate that—on average and across all attitudinal domains—no systematic differences for I–E–order could be observed (Hofmann, Gawronski, et al., 2005), specifically with regard to self–concept, stronger I–E consistency has emerged when the explicit measure preceded the implicit measure (e.g. Bosson et al., 2000; Glen & Banse, 2004; Krizan & Suls, 2008). As prior reflection might influence the subsequent assessment of implicit self–esteem, resulting in biassed I–E consistency owing to a methodological factor, we decided that explicit measures should follow implicit ones. Third, idio–IAT might invoke a person's self–concept more vividly than gen–IAT, so an idio–IAT–gen–IAT order might have induced unwanted carry–over effects (LeBel, 2010). Finally, as most previous studies relied exclusively on gen–IAT, we kept its position comparable to the one in previous studies, thereby putting idio–IAT at a potential disadvantage as later IATs tend to suffer from fatigue or strategic effects, error proneness, and reduced reliability (Greenwald, Nosek, & Banaji, 2003; Nosek, Greenwald, & Banaji, 2007). In sum, a standardised order conserves the properties of typical gen–IATs, while putting the presumably higher validity of idio–IATs to a stringent test.
Observer assessment of self–esteem
After introducing themselves and after having seated participants at the computers, approximately 1 minute after arrival, the experimenters rated participants’ self–esteem on a single 11–point rating scale (0 = no self–esteem at all; 10 = very high self–esteem). Experimenters were unacquainted with the participants and unaware of name, date of birth, or explicit and implicit self–esteem scores (cf. Jordan et al., 2007; Ranganath, Smith, & Nosek, 2008). Therefore, they had to rely on spontaneous gut reactions for their judgments. Prior to the data collection, experimenters underwent a coding training and agreed on the usage of the scale. By identifying people they all knew and by agreeing on which answering categories to assign to them, they anchored the meaning of the scale. Training highlighted the importance of paraverbal and nonverbal cues (speech parameters, facial expression, gaze avoidance, or head position; cf. Hofmann, Gschwendner, & Schmitt, 2009). Experimenters formed an overall impression of each participant during the initial interaction at the beginning of the study (less than a minute). We restricted coding to a global overall rating because extensive coding activity on several participants at the same time was infeasible and would have distracted and possibly raised suspicion. Research has shown that single–item measures can be reliable measures of global concepts (Wanous & Reichers, 1996; Wanous, Reichers, & Hudy, 1997) and that single–item measures can be good proxies for global self–esteem (Robins, Hendin, & Trzesniewski, 2001).
Implicit measures: Self–esteem–IATs
Gen–IAT and idio–IAT were applied as a nine–block procedure that required approximately 10 minutes (404 trials; see Table 1). Category labels and procedural parameters (e.g. inter–trial interval of 150 ms) were identical. The exceptions were (a) the stimuli that represented the self–categories and other–categories, and (b) the missing single discrimination of positive−negative attributes for the second IAT. Every stimulus was practised and presented at least once in the respective practice and combined blocks. Generic target stimuli for self were ICH (I), SELBST (myself), MIR (me), MEINE (mine), EIGEN (own). The other–category was represented by ANDERE (others), EUCH (you [dative plural]), IHR (her or you [vocative plural]), EURE (your [genitive plural]), FREMD (foreign [German for not–me/strange/not familiar with]). Idiographic target stimuli for self were the first name, family name, place of birth, year of birth, and month and day of birth (e.g. 22 June). Equivalent information was obtained for a close and well–known person (e.g. a friend or a relative, such as a cousin) of roughly the same age and identical sex. So that the self–category and other–category could not be confused, the items had to be different.
Table 1. Procedure of Generic and Idiographic Implicit Association Test (Experiment 1; Analogous Setup for Exp. 2–3)
For both IATs, the valence categories contained 18 stimuli each to reflect the multiple dimensions of the multidimensional self–esteem scale (MSES; Schütz & Sellin, 2006; see succeeding text). This was done to represent self–esteem broadly and vividly and to increase the chances of conceptual I–E consistency. Three pairs of adjectives were derived from each of the six MSES–subscales (see Appendix). The positive and negative adjectives did not differ significantly in word length or number of syllables (ts < 1), but they did differ in valence according to the average ratings of two raters who rated each item on 5–point rating scales from −2 (very negative) to +2 (very positive) (inter–rater reliability: Krippendorff's (1970) α [ordinal] = .81), Ms = 1.19 vs –1.11 (SDs = 0.41 vs 0.72), t(34) = 11.81, p < .001 (corrected for unequal variances), Cohen's d = 3.94. We computed IAT effects as D5–scores (see Greenwald et al., 2003). Higher scores represent more positive self–esteem. Cronbach's α of gen–IAT and idio–IAT amounted to .87 and .89, respectively.
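The valence check can be retraced from the rounded summary statistics alone. The following is a minimal sketch, assuming 18 adjectives per valence category and the root–mean–square form of the pooled SD for Cohen's d; the function names are illustrative and not taken from the original analysis. Because the inputs are rounded, the results only approximate the reported t = 11.81 and d = 3.94.

```python
import math

def cohens_d(m1, m2, sd1, sd2):
    """Cohen's d for two equal-sized groups (pooled SD = root mean square of both SDs)."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

def welch_t(m1, m2, sd1, sd2, n1, n2):
    """t statistic that does not assume equal variances in the two groups."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / standard_error

# Rounded values as reported in the text; n = 18 per category is an assumption
d = cohens_d(1.19, -1.11, 0.41, 0.72)
t = welch_t(1.19, -1.11, 0.41, 0.72, 18, 18)
print(round(d, 2), round(t, 2))
```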
Explicit measures: Self–esteem and sense of coherence
First, a German version of the Rosenberg global self–esteem scale (RSES; Rosenberg, 1979; von Collani & Herzberg, 2003) was administered. Ten items assessed self–enhancement and self–derogation as positive and negative aspects of global self–esteem on 4–point rating scales (α = .86). RSES is not domain specific and taps into a person's overall global self–worth. As a more specific measure, participants filled in the multidimensional self–esteem scale (MSES; Schütz & Sellin, 2006). The 32 MSES–items were answered on 7–point rating scales (α = .93). The MSES features (1) emotional self–esteem, (2) social self–esteem during social contact, (3) capability to deal with criticism, (4) performance–related self–worth, (5) self–esteem owing to physical attractiveness, and (6) self–esteem owing to athleticism. The overall self–esteem score, based on all the items, served as a validity criterion. Finally, the short form of Antonovsky's Sense of Coherence scale (SOC–13; Antonovsky, 1993, 1997) assessed the extent to which participants express confidence in life as being predictable and explicable (comprehensibility), their availability of resources to meet the demands throughout life (manageability), and the general significance of life in terms of challenges and worthiness of investment (meaningfulness). We used all answers on the 7–point rating scales to form an aggregate scale score (α = .78).
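The internal consistencies reported for these scales are Cronbach's α coefficients. As a reminder of what the coefficient summarises, a minimal sketch follows; the response matrix is invented for illustration and is not study data, and the function name is hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of participants, each a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))  # regroup scores: one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Hypothetical answers of five participants on four 4-point items
demo = [
    [4, 3, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 1],
    [4, 4, 3, 4],
]
alpha = cronbach_alpha(demo)
print(round(alpha, 2))
```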
Data analysis
Unless indicated otherwise, potential univariate and bivariate outlier values (Mahalanobis distance) were not detrimental to analyses and conclusions. When contrasting IAT correlations, within–sample planned comparisons with one–tailed testing were used (Steiger, 1980).
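The planned comparisons contrast two correlations that share one variable (the criterion) within the same sample. A minimal sketch of this kind of test is given below, assuming the pooled–correlation Z statistic for overlapping dependent correlations described by Steiger (1980). The input correlations are hypothetical placeholders, not values from Table 2, and the function names are illustrative.

```python
import math

def steiger_z(r_jk, r_jh, r_kh, n):
    """Z for H0: rho_jk == rho_jh, where k and h are both correlated with j.

    r_jk, r_jh: correlations of each IAT variant with the shared criterion
    r_kh: correlation between the two IAT variants
    n: sample size
    """
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)  # Fisher transformation
    r_bar_sq = ((r_jk + r_jh) / 2) ** 2
    psi = r_kh * (1 - 2 * r_bar_sq) - 0.5 * r_bar_sq * (1 - 2 * r_bar_sq - r_kh ** 2)
    s = psi / (1 - r_bar_sq) ** 2  # covariance of the two Fisher z values
    return math.sqrt(n - 3) * (z_jk - z_jh) / math.sqrt(2 - 2 * s)

def one_tailed_p(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical criterion correlations; only the inter-IAT correlation varies
for r_kh in (0.20, 0.40, 0.68):
    z = steiger_z(0.30, 0.15, r_kh, n=144)
    print(r_kh, round(z, 2), round(one_tailed_p(z), 3))
```

With the criterion correlations held fixed, Z grows as the correlation between the two measures increases, so a weak correlation between the two IAT variants lowers the power of the comparison.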
Results and Discussion
As regards convergent validity, global self–esteem (RSES) and multidimensional self–esteem (MSES) showed high overlap (cf. Table 2). By comparison, the relationship between the IATs was significant, but relatively weak, given that two equally reliable measures both targeted implicit self–esteem. Both IAT variants correlated significantly and to a similar extent with RSES. Only idio–IAT showed a significant relationship with MSES, resulting in a marginally significant difference in correlations between the two IAT variants, Steiger's Z (df = 141) = 1.26, p = .10. The weak correlation between the two IATs reduced the power for this planned comparison (Steiger, 1980). [Our sample size was in line with power considerations based on Greenwald and Farnham's (2000) finding of r = .68 between the IAT variants. Had our IATs correlated at this level, or even only as low as .40, the p–value would have been significant.] In addition, idio–IAT correlated higher than gen–IAT with the experimenters’ ratings of participants’ self–esteem, Z (df = 141) = 1.64, p = .05, and descriptively so with sense of coherence, Z (df = 141) = 1.25, p = .11. Observers’ intuitively reported summary evaluations of global self–esteem were significantly, although weakly, related to global self–esteem according to RSES (cf. Robins et al., 2001). These global observer ratings may not qualify as a comprehensive assessment of participants’ self–esteem–based behaviour, but they may capture less controlled aspects of a participant's facial expressions, gaze, voice, body posture, and interaction style (cf. Hofmann, Friese, & Strack, 2009; Hofmann, Gschwendner, et al., 2009) that leak through from self–esteem during the brief encounter. Even alongside well–established multiple–item measures of self–esteem, idio–IAT was the strongest predictor of observer ratings of participants’ self–esteem.
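Table 2. Descriptives and Interrelations (Experiments 1–3). [Table body and marker symbols not recoverable.]
Note: Markers in the table flagged p < .10, p < .05, p < .01, and p < .001; a further marker flagged N = 123 (due to outliers; cf. Fn.1).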
In sum, Experiment 1 supported the usefulness of idiographic IATs in line with the expectations derived from the CCM. When implicit self–esteem was measured with idiographic items rather than generic items, the IAT showed stronger relations with several criteria such as multi–dimensional self–esteem, observer assessments of participants’ self–esteem, and sense of coherence.
Experiment 2
Subjective body schemes play an important role in the self–concept (Markus, Hamill, & Sentis, 1987). Whereas body image refers to the mental representation of one's own body shape, body ideal represents an endorsed standard and is conceptually independent of one's current body shape. (Empirically, though, when people at the more extreme ends of the weight spectrum are included in a sample, actual body shape exerts some influence on what body shape is considered to be ideal; e.g. Woodman & Steer, 2011.) Insofar as discrepancies between body image and body ideal exist, they will lead to body dissatisfaction (e.g. Williamson, Gleaves, Watkins, & Schlundt, 1993). To the extent that participants spontaneously associate themselves differently with thinness and thickness, they show ‘implicit weight identity’ (Grover, Keel, & Mitchell, 2003). In line with previous findings, implicit weight identity should mirror to some extent the explicitly reported current body shape and objective physical properties such as body–mass index (BMI; cf. Grover et al., 2003). By contrast, implicit weight identity should not reflect one's own or society–based standards such as body ideal. In line with the CCM, idio–IAT should reflect implicit weight identity better than gen–IAT.
Method
Procedure and materials
Participants
One–hundred and twenty–three students of the University of Heidelberg (43 male) took part in the experiment in exchange for course credit or a snack bar and a soft–drink for the 20–minute session. Their mean age was 22.51 years (SD = 7.33).
Procedure
First, participants completed implicit measures of weight concept and provided socio–demographic data including self–reported height and weight. Next, explicit body image and body ideal were assessed, as well as objective physical properties.
Implicit measures: Weight–identity–IATs
The setup largely followed Experiment 1, with the exception of the attribute categories thin and thick. Trait words that formed 10 pairs of opposite content were chosen as attribute items, under the constraint that stimulus valence and word length were matched (see Appendix). IAT scores were computed as before (subtracting self + thick from self + thin). The higher the IAT effect, the more strongly implicit weight identity was related to thickness. The IATs (both αs = .87) correlated at r = .48.
Explicit measures and body mass index
Participants were presented with a sex–specific figure rating scale (Stunkard, Sorensen, & Schulsinger, 1983). These visual analogue scales, which display body silhouettes from thin to fat along a fine–graded ruler (1 = smallest ectomorph; 9 = largest endomorph), asked for the current body image as well as the subjective body ideal (Rand, Resnick, & Seldman, 1997). Higher ratings indicate a robust body shape, or a personal preference toward robust forms. We also obtained self–reports and factual data on weight and height, yet only the factual data were used for the analysis (cf. Nyholm et al., 2007). BMI scores were computed as weight (in kilograms) divided by squared height (in meters). On average, the sample was neither underweight (BMI < 18.5) nor overweight (BMI > 25), M = 23.01 (SD = 3.11).
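The BMI computation and the two cut–offs quoted above can be written out as a short Python sketch. The function names, the label for the middle category, and the example participant are assumptions made for illustration; only the formula and the thresholds of 18.5 and 25 come from the text.

```python
def body_mass_index(weight_kg, height_m):
    """BMI = weight in kilograms divided by squared height in meters."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def weight_category(bmi):
    """Classify with the cut-offs quoted in the text (18.5 and 25)."""
    if bmi < 18.5:
        return "underweight"
    if bmi > 25:
        return "overweight"
    return "normal range"  # label chosen for this sketch


# Hypothetical participant: 70 kg, 1.75 m.
bmi = body_mass_index(70, 1.75)
print(round(bmi, 2), weight_category(bmi))
```

Applied to the reported sample mean of 23.01, the classification returns the middle category, in line with the statement that the sample was on average neither underweight nor overweight.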
Results and Discussion
Implicit weight identity showed significant convergence with body image only when assessed with idio–IAT (cf. Table 2), making it a stronger predictor than gen–IAT, Steiger's Z (df = 120) = 1.62, p = .05. As expected, no significant correlations of implicit weight identity with body ideal emerged, which can be taken as an indication of discriminant validity. Surprisingly, there was no direct relationship with BMI. To account for unwanted influences of sex and age, which shared variance with body schema variables, we regressed each criterion simultaneously on an IAT and the control variables in multiple regression analyses (cf. Table 3). Idio–IAT, but not gen–IAT, was uniquely related to body image, but not to body ideal, attesting once more to the convergent and discriminant validity of the idio–IAT. After controlling for sex and age, idio–IAT was significantly related to BMI too, whereas gen–IAT was not. These data support the superiority of idiographic stimuli over generic stimuli for establishing the validity of implicit weight identity.
Convergent and Discriminant Validity (Experiment 2): Multiple Regressions of Body Image, Body Ideal, and Body Mass Index on Age and Sex as well as Idiographic (Model 1a) and Generic (Model 1b) Implicit Measures
Note: N = 123, df = 1, 119; β = standardised beta coefficient.
Experiment 3
The ‘automatic aggressive self–concept’ has been successfully used to predict rough and irregular behaviour in sports, although this behaviour is difficult to foretell by self–reports (Banse & Fischer, 2002). It also reflects long–term and short–term consequences of playing violent computer games (Uhlmann & Swanson, 2004; Bluemke, Friedrich, & Zumbach, 2010) as well as treatment effects of social skills trainings (Gollwitzer, Banse, Eisenbach, & Naumann, 2007). Because of typically low I–E consistency in aggression research (Bluemke & Zumbach, 2007), deviating from our predictions in Experiments 1 and 2, we expected neither idio–IAT nor gen–IAT to correlate substantially with explicit measures. Instead, we expected incremental validity when predicting aggressive behaviour.
Implicit measures are usually good at predicting uncontrolled and nonverbal behaviours (e.g. Asendorpf et al., 2002). As participants can usually monitor and control their behaviour, the usefulness of implicit measures strongly depends on participants’ self–control (Friese, Hofmann, & Schmitt, 2008). To begin with, we used an unobtrusive picture–choice task that reduces the likelihood that the behaviour is actually monitored and controlled (Webb, Campbell, Schwartz, & Sechrest, 1966). Yet, the incremental validity of aggressiveness–IATs should become particularly obvious when explicit measures indicate low self–control (Friese & Hofmann, 2009; Hofmann, Friese, et al., 2009). Participants low in trait self–control are known to respond more aggressively when provoked (Stucke & Baumeister, 2006; DeWall, Baumeister, Stillman, & Gailliot, 2007). Even in the absence of provocation, people differ in how they construe an ambivalent situation (irritability), the amount of behavioural control they exert (impulsivity), and their overall likelihood to control their anger or show hostility (trait aggressiveness). An individual's propensity for aggression may then be predicted on grounds of impulsive precursors (as indicated by IAT scores) when self–control is low, in other words, when irritability, impulsivity, and trait aggressiveness are high. Instead of a direct relationship, or incremental validity beyond explicit measures, we hypothesized an I × E interaction (Perugini, 2005; Perugini et al., 2010), and this interaction should be more pronounced for idio–IAT.
Method
Procedure and materials
Participants
Students of various majors at the University of Heidelberg took part in a lab study on personality and reaction times in exchange for course credit or a snack and a soft–drink. Data sets from N = 125 (43 male; M = 23.38 years, SD = 6.90) were available (see Footnote 1). Because of the aversive nature of the aggression measure, we ensured that no participant was experiencing depressive or suicidal thoughts or undergoing psychotherapeutic treatment at the time.
Procedure
Before the participants could become aware of the study topic, we covertly assessed aggressive behaviour. Upon arrival, the experimenter asked the participant for help with an additional 1–minute task before the main examination, namely to select 10 out of a set of 30 pictures. Ostensibly, this random selection of pictures was needed for another participant in an unrelated study. Given the many experiments run at the institute at the time, the cover story was plausible. After this measure of aggressive behaviour, the session continued with the implicit measures and paper–and–pencil questionnaires. Finally, the experimenter debriefed and compensated the participants. Funneled debriefing confirmed that none of the participants suspected the picture selection to be related to the experiment.
Behavioural measure of aggression
Following Mussweiler and Förster's (2000) procedure, the participants saw 30 photos from the International Affective Picture System (Lang, Bradley, & Cuthbert, 2005). Participants were left by themselves to select the pictures they considered most suitable for an approaching fellow student. The lack of any explicit selection rules underscored the randomness of the choice. Half of the randomly ordered photos (15 x 20 cm) displayed positive content; the other half displayed negative, emotionally disturbing content. Behavioural aggression was indexed by the ratio of negative pictures to the total number of pictures selected by a participant. To the extent that participants choose negative pictures, they violate the implicit social norm not to inflict pain on others. When prompted, participants are well aware of the psychological consequences of the procedure and dislike receiving negative pictures themselves (Denzler, Förster, & Liberman, 2009). As the images are aversive and create discomfort in the recipient, selecting negative pictures and exposing others to them constitutes unprovoked aggression (Mussweiler & Förster, 2000).
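The behavioural index described above is a simple proportion. The following Python sketch shows the computation under an assumed coding in which each selected picture carries a valence label; the function name and the data format are hypothetical, as the text specifies only the ratio itself.

```python
def aggression_index(selected):
    """Proportion of negative pictures among those a participant selected.

    selected: iterable of valence labels, "negative" or "positive"
    (hypothetical coding; the text does not describe a data format).
    """
    selected = list(selected)
    if not selected:
        raise ValueError("no pictures selected")
    return sum(v == "negative" for v in selected) / len(selected)


# Hypothetical participant who picks 4 negative and 6 positive pictures.
print(aggression_index(["negative"] * 4 + ["positive"] * 6))
```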
Implicit measures: Aggression–IATs
The only change in setup involved the categories peaceful and aggressive. Two pairs of complementary action verbs, matched in word length, mirrored each of the subscales of the Buss and Perry (1992) aggression questionnaire (physical aggression, verbal aggression, anger, and hostility; see Appendix). Higher IAT scores indicate stronger associations of the self with aggressive relative to peaceful. Both gen–IAT and idio–IAT were reliable (αs = .82 and .85) and correlated at r = .56.
Explicit measures: Irritability, impulsivity, and aggressiveness
Participants completed three questionnaires that became increasingly blatant in how they asked about the target construct, aggressiveness. First, a German version of the 20–item Caprara Irritability Scale (CIS; Caprara et al., 1985; Bluemke & Steinmayr, 2008) assessed the tendency to aggress impulsively in response to mild or even no provocation, hence irritability (6–point rating scales, e.g. ‘I can't help being a little rude to people I don't like’, α = .82). Next, a German translation of the Gladue Aggression Inventory followed (GAI; Gladue, 1991; Bluemke & Steinmayr, 2008), which provided seven items measuring (aggressive) impulsivity (GAI–imp; 5–point rating scales, e.g. ‘I become easily impatient and irritable if I have to wait’, α = .51). Finally, applying the same answering format, the widely used 29–item Aggression Questionnaire (BPAQ; Buss & Perry, 1992) served as a measure of trait aggressiveness along the facets of physical and verbal aggression, anger, and hostility. For the German BPAQ–adaptation, only 27 of the 29 items entered the composite score to satisfy the intended factor structure (Herzberg, 2003; von Collani & Werner, 2005, α = .82).
Results and Discussion
As expected, the explicit measures converged in the medium range, but none of them correlated significantly with the IATs, yielding the typical I–E dissociation in aggression research (Bluemke & Zumbach, 2007; see Table 2). Given that the explicit measures were substantially correlated, and to maximise reliability, we ran a Principal Component Analysis (PCA) on irritability, impulsivity, and trait aggressiveness scores. The PCA suggested a strong first component (Eigenvalues = 2.14, .55, and .31, explaining 71%, 18%, and 10% of the variance). All three measures loaded significantly on the first component (λ = .88, .78, and .88 for CIS, GAI–imp, and BPAQ, respectively). The factor scores of the one–component solution were taken as an explicit aggressiveness index that was subjected to further analyses. Separate regression analyses were used to inspect the incremental validity of idio–IAT and gen–IAT. Following Aiken and West's (1991) recommendations on centering variables, we predicted aggressive behaviour by explicit aggressiveness (PCA scores), implicit measure (IAT scores), and the respective I × E interaction effect (see Table 4). Although main effects were not significant, the interaction involving idio–IAT, but not gen–IAT, explained variance significantly. Simple slopes analyses showed that the propensity to aggress without being provoked increased with higher (+1 SD) idio–IAT–scores, but only for people who were high (+1 SD) in self–reported aggressiveness (Figure 2). In other words, when participants reported a dispositional lack of controlling aggression, the implicit measure gained predictive utility. Participants with high explicit and high implicit scores selected almost twice as many disturbing images as participants for whom either score was low. All explicit measures yielded similar outcomes in separate analyses (see Footnote 2). To summarise, neither measure predicted unprovoked aggression on its own. Instead, idio–IAT, but not gen–IAT, predicted unprovoked aggressive behaviour as a function of explicit aggressiveness. This result is in line with the CCM that predicts idio–IATs to better tap into the core self than gen–IATs.
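The moderated regression with centered predictors and the simple slopes at ±1 SD can be illustrated with a self–contained Python sketch. It is not the authors' analysis script: the function names, the plain normal–equations solver, and the eight data points are assumptions made for illustration, and the sketch returns unstandardised coefficients rather than the standardised β reported in Table 4.

```python
from statistics import mean, stdev


def center(values):
    """Subtract the mean so that main effects are evaluated at the average."""
    m = mean(values)
    return [v - m for v in values]


def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def moderated_regression(explicit, implicit, behaviour):
    """Regress behaviour on centered E, centered I, and their product."""
    e, i = center(explicit), center(implicit)
    product = [x * z for x, z in zip(e, i)]
    _, b_e, b_i, b_ei = ols([e, i, product], behaviour)
    sd_e = stdev(explicit)
    return {
        "b_explicit": b_e,
        "b_implicit": b_i,
        "b_interaction": b_ei,
        # Slope of behaviour on the implicit measure at -1 SD / +1 SD explicit.
        "slope_low_explicit": b_i - b_ei * sd_e,
        "slope_high_explicit": b_i + b_ei * sd_e,
    }


# Hypothetical scores for eight participants (not data from the study).
explicit = [1, 2, 3, 4, 5, 6, 7, 8]
implicit = [0.2, -0.1, 0.5, 0.3, -0.4, 0.1, 0.6, -0.2]
behaviour = [0.5, 0.3, 0.3, 0.4, 0.3, 0.4, 0.7, 0.2]
fit = moderated_regression(explicit, implicit, behaviour)
print({name: round(value, 3) for name, value in fit.items()})
```

A positive interaction coefficient means that the slope for the implicit measure is steeper at high than at low explicit aggressiveness, which is the pattern described for idio–IAT.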
Incremental validity (Experiment 3): multiple regression of aggressive behavior on explicit aggressiveness (PCA–factor–scores), idiographic (Model 1a) and generic (Model 1b) implicit measures, and the respective Implicit × Explicit interaction terms
Note: N = 125, df = 1, 121; β = standardised beta coefficient.

Prediction of unprovoked aggressive behaviour (proportion of negative images) by the implicit aggressive self–concept at different levels of explicit aggressiveness (Experiment 3).
General Discussion
Summary and Interpretation
Three experiments confirmed our main hypothesis: Idiographic self–representations render IATs more suitable for the assessment of the implicit self–concept than do generic self–representations. Experiment 1 demonstrated that implicit and explicit measures of global self–esteem were related, regardless of whether gen–IAT or idio–IAT was taken as an indicator for implicit self–esteem. However, only idio–IAT captured the relations of implicit self–esteem to observer–assessed self–esteem, self–reported multidimensional self–esteem, and the salutogenetic criterion sense of coherence. Experiment 2 confirmed that implicit weight identity was related to explicit body image, yet only so when it was assessed by idio–IAT. Moreover, only for idio–IAT could a relationship to physical properties such as BMI be established. In Experiment 3 idio–IAT, but not gen–IAT, predicted aggression for individuals that lack self–control as indicated by their high scores in self–reported aggressiveness. Our experiments close a gap in the literature on the theoretical and empirical conclusiveness of a suggested methodological moderator of IAT validity (Hofmann, Gawronski, et al., 2005): the idiographic versus generic stimulus selection for representing the self and other. These findings are among the first to show that the stimulus selection in IATs may indeed influence the validity of these IATs, not only the magnitude of IAT effects (e.g. Bluemke & Friese, 2006; Govan & Williams, 2004).
Irrespective of the content domain, several times when gen–IAT failed to show significant direct or interactive relations with validity criteria, idio–IAT had a higher likelihood to reveal them. Remarkably, idio–IAT performed better despite being related to gen–IAT to a sizeable extent. Note that these findings were obtained although the idiographic selection of stimuli introduces variability across subjects, both with regard to the representation of the self, and with regard to the representation of the other. Thus, the chances to obtain meaningful correlations for idio–IAT were a priori lower than for gen–IAT, and still we never found a single instance where gen–IAT outperformed idio–IAT. We infer from this direct comparison that the chances to tap into the relevant social knowledge structures with implicit measures are higher when using an idiographic as compared with a generic approach. Had we relied on gen–IATs instead, most of the relationships with several criteria would have been obscured. Idiographic representations allowed a more optimistic view on IAT validity.
The empirical hypothesis is closely linked with the theoretical hypothesis derived from the CCM that stimulus centrality is a crucial ingredient in what concepts are activated in implicit measures. Our findings are compatible with the CCM. The relevance of idiographic stimuli follows from their centrality for the concept in question. First, idiographic items allow a better representation of invariant person cores, and second, they are more proximal to the core self than generic pronouns. In combination, they ensure quick and accurate categorizations of stimuli, an essential ingredient for meaningful associations measured with speeded–classification tasks. The CCM is independent of the kind of implicit measure used, so our findings should conceptually replicate across other measures. Although it is clearly beyond the scope of this paper, the CCM also suggests further tests of the centrality assumptions for categories other than self–concept such as ethnic or gender groups.
Limitations
With regard to the fixed order of implicit and explicit measures, one might assume that I–E consistency was better for idio–IAT than gen–IAT, because the former was taken in closer proximity to the questionnaires. Note, however, that several criteria were established before the IATs were taken (i.e., observer ratings of self–esteem, body proportions, aggressive behaviour). The proximity of the idiographic measure cannot be responsible for the higher relationships of idio–IAT with these criteria.
Logically, our findings cannot rule out that idio–IAT is a better predictor only when preceded by gen–IAT, as was the case in the present studies. Theoretically, the continuous activation of the self–concept by completing gen–IAT may have helped the subsequent idio–IAT to better tap into self–relevant information. However, we think that it is unlikely that the idio–IATs consistently correlated higher with validity criteria because they were used at the second position, as IATs at later positions have generally been found to yield less pronounced, less reliable, and less valid IAT effects (Nosek et al., 2007). Thus, by employing gen–IAT first and idio–IAT second, we put the hypothesis of higher validity of the idio–IAT to a particularly stringent test. In addition, the reverse order (idio–IAT first, then gen–IAT) would have been likely to invoke carry–over effects of idiographic self– and other–concepts due to a vivid pre–activation of these concepts during the idio–IAT (LeBel, 2010). Nevertheless, future research should use a between–participants approach to investigate the differential validities of generic and idiographic IATs to rule out any remaining ambiguities of the within–participants approach employed in the present research.
Finally, we used idiographic other–stimuli in our study, which may have increased error variance due to another non–standardised element on top of idiographic self–stimuli. However, given that the self–concept overlaps with ‘significant others’, such a procedure need not be detrimental; idiographic other–stimuli may actually help the assessment of the relevant implicit self–evaluations (Mashek, Aron, & Boncimino, 2003; DeHart, Pelham, Fiedorowicz, Carvallo, & Gabriel, 2011). Nevertheless, the role of the other–category in self–concept IATs is an important and under–researched issue. Future research may explore whether validity can be improved even further by using other–stimuli not pertaining to a close or well–known friend, but to a fictitious person or a prominent person known by all participants that can be kept constant across participants. Such a pragmatic mixture of idiographic and standardised elements might further reduce nuisance variance. This is an open empirical question that reaches beyond our initial theoretical question.
Implications and Outlook
Our findings have implications for research on the self, and for self–concept IATs in particular. Part of the variance observed in I–E relations may be attributable to volatile self–representations. Generic representations seem to run a higher risk of merely assessing participants’ sorting speed, rather than self–related associations proper. Previous research may have underestimated I–E consistency as well as IAT–validity by relying on generic self– and other–representations (cf. Greenwald, Poehlman, Uhlmann, & Banaji, 2009). Some researchers have concluded from low I–E correlations that the IAT may not be a useful measure (e.g. Bosson et al., 2000); others have concluded that the dissociation indicates distinct mental representations (e.g. Rudolph, Schröder–Abé, Schütz, Gregg, & Sedikides, 2008; Krizan & Suls, 2009). Still others have suggested that explicit measures might not be suitable validation criteria for implicit measures (e.g. Perugini et al., 2007). Such conclusions may have been premature. The importance of distinguishing the conceptual from the measurement level cannot be stressed enough, as all theorizing involves assumptions about the conceptual level, whereas we can merely investigate the measurement level. Therefore, previous null–findings on the I–E consistency in self–concept domains invite a check with idio–IATs. A cross–check may likewise remove inconsistencies with regard to the incremental value of self–concept IATs for predicting behaviour. Whether based on single studies or meta–analyses, estimates of I–E consistency and predictive validity are biased downward if associations are less than optimally assessed.
The quality of a measure partly depends on the mental representations elicited by the stimuli. They fill with life the concepts that people will associate during the task. Researchers need to weigh the potential for incremental utility of an idiographic over a generic approach against the additional effort spent on collecting idiographic stimuli. The idiographic approach requires flexible software tools that can use individually determined stimuli as items in the measurement procedure. From a diagnostic perspective the implications are clear; choosing the idiographic approach may require somewhat more effort, but it offers the possibility to explain extra variance that would remain unexplained by a generic approach. The CCM predicts that the present findings will extend to implicit measures other than IATs, but this prediction has not been tested yet. Therefore, at this stage, we conclude that idiographic outperform generic representations at least in self–concept IATs.
Footnotes
Acknowledgements
The authors are grateful to Lisa Bossmann, Johannes Dabisch, Lisa Göbelbecker, Nadine Haag, Tillman Ihrig, Meike Meister, Anna Mokhart, Marius Prohl, Arnhild Proß, and Nathalie Seefried for their help in collecting the data of the Experiments 1–3. Wilhelm Hofmann's support in identifying the idiographic studies is clearly acknowledged as are Klaus Fiedler's and Jonathan Jong's helpful comments on earlier drafts of this manuscript. We further thank Kristin Steslow and Tristan Philip for proof–reading the manuscript. Financial support by the research pool of the University of Heidelberg granted to the first author (D.100200/08013) is gratefully acknowledged.
1. Data from two participants had to be dropped exclusively from the correlation/regression analysis involving gen–IAT and BPAQ, because when they were included, two bivariate (MAHAL) outliers distorted this specific relationship, r = .19 (p = .03), and the I × E interaction, ΔR² = .04, β = .20 (p = .03). We deem these unpredicted results unreliable, because they are not supported by the other analyses, and because their significance hinges on a mere two data points. For all other analyses, the full number of participants was retained.
2. Separate models with CIS, GAI–imp, and BPAQ as predictors yielded analogous findings, ΔR² (idio) = .05, .04, and .06 (ps < .02) vs. ΔR² (gen) = .01, .002, and .01 (ps > .24), respectively. At this point, some readers may suggest full regression models that enter an explicit measure, both implicit measures, and their two interaction terms at once, or models that test the idio–IAT interaction as a final incremental step after the explicit measure, gen–IAT, idio–IAT, and gen–IAT × E interaction have been controlled for. However, such regression models lower the ratio of data points per predictor in an unfavorable manner: to detect a small effect with sufficient statistical power, a sample size of several hundred individuals is required (Miles & Shevlin, 2001). More importantly, such models provide an answer neither to the crucial question of incremental validity of an implicit over an explicit measure, nor to the question of incremental validity of an IAT's interaction term. Instead, they test predictors after two IAT main effects and one gen–IAT interaction have also been entered; consequently, they test only what remains of a variable after the variance it shares with the other predictors has been partialled out. Also, these models do not deal adequately with predictor multicollinearity; in our case, gen–IAT and idio–IAT were substantially correlated, and their two interaction terms are, to more than 50%, strictly mathematically dependent. Although redundant independent variables are unlikely to emerge as significant predictors, these full regression models showed increments for the idio–IAT × E interaction as the last predictor, ΔR² = .05, .05, .03, and .05 (p = .02, .02, .08, and .02) for CIS, GAI–imp, BPAQ, and PCA–scores, respectively.
