Abstract
Linguistic relativity—the idea that language affects thought by way of its grammatical categorizations—has been controversially debated for decades. One of the contested cases is the grammatical gender of nouns, which is claimed to affect how their referents are conceptualized (i.e., as rather female or male in congruence with the grammatical gender of the noun), especially when used allegorically. But is this association strong enough to be detected in implicit measures, and, if so, can we disentangle effects of grammatical gender and allegorical association? Three experiments with native speakers of German tackled these questions. They revealed a gender congruency effect on allegorically used nouns, but this effect was stronger with an explicit measure (assignment of biological sex) than with an implicit measure (Extrinsic Affective Simon Task) and disappeared in the implicit measure when grammatical gender and allegorical associations were set into contrast. Taken together, these findings indicate that the observed congruency effect was driven by the association of nouns with personifications rather than by their grammatical gender. In conclusion, we also discuss implications of these findings for linguistic relativity.
Mother Nature, Lady Liberty, Ded Moroz (“Grandfather Frost”), or Gevatter Tod (“Godfather Death”) are allegories of a specific type: human personifications of otherwise intangible entities. For centuries, artists across Europe have chosen such personified allegories to visualize abstract concepts (Baskins & Rosenthal, 2007; Battistini, 2005). To this end, the (inanimate) entity needed to be endowed with animate properties, including assignment of biological gender. The assumption that the selection of a biological gender has been largely driven by the grammatical gender of the respective noun was corroborated by a large-scale analysis of personified allegories from the ARTstore database (Segel & Boroditsky, 2011): In 78% of cases, the noun's grammatical gender in the artist's language coincided with the biological gender of its portrayal, thus rendering personified allegories a promising candidate for linguistic relativity.
Linguistic relativity is one of various labels for the idea that the language we speak affects the way we think. One mediating factor is the categorization principles that are implemented in and enforced by grammar (Whorf, 1956; see also Gentner & Goldin-Meadow, 2003; Gumperz & Levinson, 1996; Lucy, 1992). These “Whorfian” effects of language on thought have been subject to controversy for several decades, with grammatical gender being a particularly contested domain.
Grammatical gender refers to an abstract system of noun classes that determines the behaviour of associated words such as determiners or adjectives (Harley, 2008; Hockett, 1958). The two relevant classes here are masculine and feminine, which in (formal) gender languages like Latin, French, Russian, or German extend way beyond sexuated referents [1] (Comrie, 1999; Corbett, 1991). This raises the question of whether speakers of such languages include information on grammatical gender (and hence notions related to the respective sex) in their conceptual representation of entities: Is, for instance, liberty conceptualized as female and death as male, respectively, simply because their denotations (liberté in French and Tod in German) have feminine and masculine grammatical gender?
[1] For reasons of conceptual clarity, in the remainder of this paper the term “gender” is used for the grammatical category, and the term “sex” for the biological gender of animates; likewise, usage of “feminine” and “masculine” is restricted to grammatical gender, and usage of “female” and “male” to sex.
There is currently no straightforward answer to this question, as allegories have not yet been investigated in detail, and studies on other categories produced mixed findings. Convincing evidence in support of such a gender congruency effect was reported for person nouns (Irmen & Kurovskaja, 2010; Irmen & Roßberg, 2004; Rothermund, 1998; Stahlberg, Braun, Irmen, & Sczesny, 2007), but not for inanimate objects (overviews in Bender, Beller, & Klauer, 2011; Cubelli, Paolieri, Lotto, & Job, 2011). More often, a gender effect on inanimates has been found with explicit measures such as sex-related ratings or assignment of female versus male names or voices (Flaherty, 2001; Sera, Berge, & Castillo Pintado, 1994; Sera et al., 2002; but see Bowers, Vigliocco, Stadthagen, & Vinson, 1999) rather than with implicit measures such as the elicitation of semantic substitution errors (Kousta, Vinson, & Vigliocco, 2008; Vigliocco, Vinson, Indefrey, Levelt, & Hellwig, 2004) or priming methods (Bender et al., 2011). Apparently, the more explicit measures facilitate the strategic (i.e., intentional) usage of grammatical gender when working on the task at hand: Without a rationale for assigning a female or male voice to an entity like a bridge, actively drawing on the grammatical gender of the word “bridge” might be the nearest resort.
To circumvent this problem, researchers experimented with indirect tasks such as ratings based on the semantic differential (Clarke, Losoff, McCracken, & Still, 1981; Koch, Zimmermann, & Garcia-Retamero, 2007; Konishi, 1993, 1994), similarity sortings (Cubelli et al., 2011; Vigliocco, Vinson, Paganelli, & Dworzynski, 2005), generation of fitting adjectives (Boroditsky, Schmidt, & Phillips, 2003; Martinez & Shatz, 1996), memorizing pairs of nouns and names (Boroditsky et al., 2003; Koch et al., 2007), or drawing inferences about properties (Imai, Schalk, Saalbach, & Okada, 2014; Saalbach, Imai, & Schalk, 2012). But while these tasks may not explicitly draw attention to the relevance of grammatical gender, they do not prevent participants from strategically employing such information, either. This concern also holds for the correlation in artworks reported above (Segel & Boroditsky, 2011): Considerations on the organization and the details of paintings are protracted and deliberate, thus leaving ample scope for an explicit consideration of grammatical gender.
Which methods one adopts (or even considers appropriate) in this field of research depends on the level at which effects of language on thought are localized. One possible reading of linguistic relativity is that the words we use in language, and the grammatical properties they possess, should have an impact on how we (consciously) think about their referents, for instance, by way of implicit or explicit associations. A second reading is that the grammatical properties should affect the semantic content of the word, for instance by instilling its referent with a notion of biological sex. Both readings are compatible with some of the different versions of the so-called Whorfian hypothesis, but they conceptualize the core question in slightly different ways, which then require distinct approaches for investigation.
As pointed out by Cubelli et al. (2011), the approach inclined towards the first reading above focuses on the connotative meaning of words and assumes the expected influence to take place at the level of prelinguistic, conceptual knowledge. A noun's association with biological sex (derived from its grammatical gender) is considered to be part of the conceptual representation of the object and therefore to affect how objects are perceived and categorized. With this agenda at stake, nonlinguistic tasks are mandatory (Lucy, 1992); effects of language on cognitive behaviour in tasks that involve language, on the other hand, are instances of “thinking for speaking” (Slobin, 1996) and would thus not qualify as evidence of “Whorfian effects” in the strong sense. Studies that invite or document free associations such as in voice assignment (Sera et al., 1994, 2002) or gender/sex correlations in art (Segel & Boroditsky, 2011) therefore produce informative evidence in this regard, conditional upon the methodological caveat noted above.
The approach inclined towards the second reading of linguistic relativity focuses on the denotative meaning of words and assumes the expected influence to take place at the lexical–semantic level (Cubelli et al., 2011). Respective studies examine whether lexical properties like gender shape the semantic meaning of the word (e.g., Cubelli et al., 2011; Kousta et al., 2008). As a matter of principle, these tasks cannot be nonlinguistic. If they tapped into underlying, implicit associations of the words they were investigating, however, they would still be able to reveal important insights into the possible influence of linguistic properties on cognition.
Our study was mainly motivated by this latter approach in that we intend to investigate whether the grammatical gender of a noun affects its semantic content, but we also attempt to combine both approaches, as the tasks we use (explained below) tap into both the lexical–semantic content of words and their conceptual associations. We were therefore not so much interested in whether correlations between grammatical gender and allegorical sex are to be found at all, but whether they are strong enough to also be found in implicit measures and, if so, whether a genuinely grammatical gender effect can be disentangled from an effect of allegorical association. In tackling these questions, we combined explicit and implicit measures to address the problem of strategic usage. A set of three experiments with native speakers of German investigated whether gender effects can be observed for allegorically used nouns (Experiment 1), to what extent implicit and explicit measures of such effects converge (Experiment 2), and how the effect of grammatical gender can be separated from an effect of allegorical association (Experiment 3).
In focusing on speakers of just one language, our studies depart from typical investigations of linguistic relativity, which either compare speakers of different languages (e.g., Boroditsky et al., 2003; Konishi, 1993; Sera et al., 1994, 2002) or work with bilinguals (Kousta et al., 2008). However, if such effects of grammatical properties on the semantic content are to be found, they should already be detectable with speakers of one single language, namely by contrasting two classes of nouns with diverging properties such as feminine versus masculine gender (possible implications of this procedure are taken up in the discussion).
Experiment 1
The goal of our first experiment was to investigate whether a gender congruency effect can be observed for allegorically used nouns. Even if such an effect may not be consistently observed for nouns referring to inanimates (Bender et al., 2011), this could well be the case for such entities that are frequently portrayed as human beings. To test this assumption, we employed a version of the Extrinsic Affective Simon Task (EAST; De Houwer, 2003), which provides an implicit test of associations.
Method
Participants
Twenty-eight German speakers (16 female and 12 male; age M = 21.8 years, range = 19 to 30, SD = 2.88) from the Freiburg area participated in this experiment. They were rewarded with up to 4.92 euros, contingent on the number of correct decisions in the task made within 800 ms.
Material and design
Nouns used as stimuli belonged either to the category under scrutiny (allegorically used entities) or to a reference category (here: animates with congruent biological and grammatical gender); all nouns are reported in Supplemental Material A. The category of animates encompassed 40 pairs of given names and 20 pairs of sex-specific nouns such as kin terms. The category of personified allegories encompassed 20 pairs of items that were selected from a more extensive list of potential allegories and pretested for associations with biological sex (by 20 female and 20 male native speakers; same instruction as that in Supplemental Material A). On a 4-point rating scale ranging from clearly female (1), rather female (2), rather male (3), to clearly male (4), the feminine allegories had obtained strong female associations (M = 1.46) and the masculine allegories strong male associations (M = 3.49), t(38) = 27.21, p < .001. Feminine and masculine allegories did not differ in the strength of sex-related association [feminine: 1.46 vs. masculine: 1.51; t(38) = 0.671, p = .506; with the scale being flipped for the masculine items], only marginally in the average number of letters [feminine: 5.50 vs. masculine: 4.75; t(38) = −1.976, p = .055], and not in the frequency of usage [average log frequency according to the Celex database (Baayen, Piepenbrock, & Gulikers, 1995); feminine: 1.35 vs. masculine: 1.50; t(38) = 0.713, p = .480].
Stimuli were presented in black, green, or blue colour, together with a randomly selected, postpositioned definite article that was either feminine (die) or masculine (der) to discourage the strategic use of grammatical gender. [2] The categorization task depended on category membership: Stimuli in black were exclusively animates (i.e., given names) and were to be categorized according to their biological sex; stimuli in green or blue consisted of one half animates (i.e., kin terms and other sex-specific nouns as reference category) and one half allegories (as target category) and were to be categorized according to their colour.
[2] For most native speakers of German, the grammatical gender of nouns appears to be arbitrary. In order to assess a noun's gender they therefore typically combine it with a definite article and test their agreement (e.g., Brücke is feminine if die Brücke “sounds right”, an intuition that is largely derived from the lexical entry in the mental lexicon). This strategy is emphasized by the Duden, the standard reference work on the German language, which indicates gender simply by giving the appropriate definite article. To interfere with this strategy of gender checking, we presented randomly selected feminine or masculine articles together with the noun, producing correct or incorrect combinations in half of the trials each. To further hamper gender checking, the article was presented in postposition, which is generally at odds with German syntax. Participants were instructed to ignore these articles as irrelevant to the task at hand. When we introduced these two manipulations in previous studies, we basically observed a substantial overall increase in reaction time, together with a slight decrease in accuracy, but no changes in the overall pattern. Possible implications of this manipulation are picked up in the discussion.
One biological sex each and one colour each were matched by assigning them to the same key, thus creating two types of trials. A trial was congruent if a feminine object was presented in a colour that requested categorization by the same key as the female sex of the animates, and if a masculine object was presented in a colour that requested categorization by the same key as the male sex of the animates. In the reversed case, the trial was incongruent (for an illustration of the principle, see Fig. 1). If grammatical gender instils the noun with a notion of biological sex, then the categorization should be facilitated in congruent as opposed to incongruent trials.
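The congruency rule can be illustrated with a minimal Python sketch. The key labels, the particular colour-to-key mapping, and the function name are hypothetical and chosen only for illustration; in the experiment itself the assignment of keys was randomized.

```python
# Hypothetical key assignment for one participant (an assumption for
# illustration): each key codes one sex (black stimuli) and one colour
# (green or blue stimuli).
KEY_FOR_SEX = {"female": "left", "male": "right"}
KEY_FOR_COLOUR = {"green": "left", "blue": "right"}

# The biological sex that corresponds to each grammatical gender.
SEX_FOR_GENDER = {"feminine": "female", "masculine": "male"}


def is_congruent(gender: str, colour: str) -> bool:
    """Return True if the colour of the noun requests the same key as the
    sex that corresponds to the noun's grammatical gender."""
    return KEY_FOR_COLOUR[colour] == KEY_FOR_SEX[SEX_FOR_GENDER[gender]]


# A feminine noun in green shares its key with "female": congruent.
print(is_congruent("feminine", "green"))   # True
# A feminine noun in blue shares its key with "male": incongruent.
print(is_congruent("feminine", "blue"))    # False
```

Under this mapping, every coloured noun falls into exactly one of the two trial types, which is what allows congruent and incongruent trials to be compared within participants.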

Figure 1. Design of the Extrinsic Affective Simon Task (EAST) for the tasks on gender congruency (instantiated here for an incongruent trial).
Procedure
Targets were displayed in the centre of a 58.4-cm LCD screen with a 100-Hz refresh rate. They subtended about 5° of visual angle horizontally and 1° vertically. Category labels (“female”/“male” in black, and “green”/“blue” in green/blue, respectively) were present during all trials in the bottom-left and bottom-right corners. Participants were tested individually. Brief instructions were given on the screen, followed by a set of practice items with trial-wise feedback for correct and incorrect responses. The same feedback was provided in the subsequent test trials. The 480 test trials were presented in three complete blocks, with target items appearing in random order for 0.8 s each. Responses were made by pressing the inner keys of two computer mice placed to the left and right of the keyboard, respectively. Assignment of keys to position (left vs. right) was randomized.
Results and discussion
A gender congruency effect is diagnosed when responses are faster and/or more accurate in congruent trials (i.e., colour of feminine object = female sex of animate; colour of masculine object = male sex of animate) than in incongruent trials (colour of feminine object = male sex of animate, and colour of masculine object = female sex of animate). We expected that it would be stronger for animates than for allegories, as the former do have a biological sex, information on which is an essential part of the semantic meaning of the word (e.g., “daughter” is not just a child to someone, but a female child). This information on biological sex is activated during lexical access and should therefore naturally affect the item's categorization by pressing keys that also code sex categories, even though this is clearly irrelevant in these specific trials in which stimulus colour is relevant. Allegories may do so, too, but as their “biological” sex would be one only by association (at best), the effect should be weaker.
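As a purely descriptive illustration of how such an effect can be quantified, the following Python sketch computes the latency and accuracy differences between congruent and incongruent trials. The trial tuples are invented toy data, the function name is hypothetical, and restricting latencies to correct responses is an assumption rather than something stated in the text; the inferential tests reported below rely on mixed models, not on this simple difference.

```python
from statistics import mean


def congruency_effect(trials):
    """Compute descriptive congruency effects.

    trials: list of (congruent: bool, rt_ms: float, correct: bool).
    Returns (rt_effect_ms, accuracy_effect_percent); positive values mean
    that congruent trials were faster or more accurate, respectively.
    """
    cong = [t for t in trials if t[0]]
    incong = [t for t in trials if not t[0]]

    def mean_rt(subset):
        # Assumption: only latencies of correct responses are averaged.
        return mean(rt for _, rt, ok in subset if ok)

    def accuracy(subset):
        return sum(1 for _, _, ok in subset if ok) / len(subset)

    rt_effect = mean_rt(incong) - mean_rt(cong)
    acc_effect = 100 * (accuracy(cong) - accuracy(incong))
    return rt_effect, acc_effect


# Invented toy data for one participant.
toy_trials = [
    (True, 500.0, True), (True, 520.0, True),
    (True, 600.0, False), (True, 510.0, True),
    (False, 540.0, True), (False, 560.0, True),
    (False, 530.0, False), (False, 550.0, False),
]
rt_effect, acc_effect = congruency_effect(toy_trials)
print(rt_effect, acc_effect)  # 40.0 25.0
```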
In all analyses reported below, response latencies that were outliers in each individual's distribution of latencies—as identified by Tukey's criterion (i.e., latencies that were below the first quartile minus 1.5 times the interquartile range or above the third quartile plus 1.5 times the interquartile range; see Clark-Carter, 2004)—were excluded.
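Tukey's criterion can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and the quartile convention (here the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the text does not specify one. The filter would be applied separately to each participant's latencies.

```python
from statistics import quantiles


def tukey_filter(latencies, k=1.5):
    """Drop latencies outside [Q1 - k*IQR, Q3 + k*IQR] for one participant."""
    # Assumption: quartiles follow the "inclusive" convention.
    q1, _, q3 = quantiles(latencies, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in latencies if lower <= x <= upper]


# Invented latencies (ms) for one participant; 1500 ms lies far above the fence.
toy_latencies = [400, 420, 430, 450, 460, 480, 500, 1500]
print(tukey_filter(toy_latencies))
# [400, 420, 430, 450, 460, 480, 500]
```

Different quartile conventions shift the fences slightly, so borderline latencies may be classified differently under another convention.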
The accuracy and latency data were each analysed in three steps. In the first step, we estimated mixed linear models (for the accuracy data: generalized mixed linear models with logistic link function) with participants and target items as random factors in order to determine which random structure fits the data best: Are random intercepts for participants and items sufficient, or are additional random slope components necessary for the experimental factors as a function of participants or items (see Jaeger, 2008; Judd, Westfall, & Kenny, 2012)? The strategy for selecting a model with appropriate random-effects structure is reported in Supplemental Material B. In the second step, the model with appropriate random-effects structure was used to check the fixed effects of three within-subject factors: type of target (animate vs. allegory), grammatical gender of target (feminine vs. masculine), and response association (whether the trial requires pressing the “female” or “male” key). [3] A gender congruency effect is detected when the interaction Gender × Response Association is significant. Delta chi-square statistics are used for the accuracy data, and F statistics with Kenward–Roger approximated degrees of freedom (according to Judd et al., 2012) for the latency data. In the third step, we checked each noun category (in Experiment 1: animates and allegories) for a gender congruency effect. All analyses were conducted in the statistical programming language R (R Core Team, 2014) using the packages lme4 (Bates, Maechler, Bolker, & Walker, 2014) and afex (Singmann, 2014). The findings for Experiment 1 are summarized in Figure 2A and Supplemental Material C, the model comparisons are reported in Supplemental Material B(I), and the results for the fixed effects and for the noun categories are presented in the following sections.
[3] As indicated in Supplemental Material B, the final model in most of the reported analyses includes random slopes for response association as a function of participants. This indicates interindividual differences with regard to the preference for the “female” or “male” response key, perhaps due to an interaction between the participant's handedness and the balanced assignment of the response keys.

Figure 2. Results in terms of accuracy and reaction time for Experiments 1 to 3 (mean values and standard deviations are reported in Supplemental Material C).
Accuracy
The analysis of the fixed effects indicated a general gender congruency effect [Gender × Response Association, χ2(1) = 110.52, p < .001] that interacted with the type of targets [Type × Gender × Response Association, χ2(1) = 15.13, p < .001], but no other effects. Testing the congruency effect for the two noun categories revealed a significant effect for animates [M = 9.29%, χ2(1) = 105.46, p < .001, one-tailed] and for allegories [M = 4.11%, χ2(1) = 21.63, p < .001, one-tailed].
Reaction time
The analysis of the fixed effects again indicated a general gender congruency effect [Gender × Response Association: F(1, 5990.0) = 9.15, p = .002] that interacted with the type of targets [Type × Gender × Response Association: F(1, 5989.7) = 14.44, p < .001], but no other effects. Testing the congruency effect for the two noun categories revealed a significant effect for animates [M = 19.48 ms, F(1, 2951.2) = 23.00, p < .001], but not for allegories [M = −2.29 ms, F(1, 2981.3) = 0.33, p = .56].
These findings indicate that the colour of the reference and target stimuli (i.e., the sexuated nouns and allegories) was easier to categorize when it generated a gender-congruent key match than when it did not. Especially for animates, the categorization was both faster and more accurate, as expected. A similar effect was found for the allegorically used nouns, but it was smaller and restricted to the accuracy data.
Apparently, allegorically used nouns do indeed evoke associations with a biological sex that are in line with the noun's grammatical gender, as evidenced by the congruency effect in the accuracy data. These findings contrast with a previous study (Bender et al., 2011) in which we did not obtain such an effect. However, the two studies differ with regard to noun category as well as method: The earlier study targeted inanimate objects and was based on priming. There is reason to assume that differences in the emergence and strength of the congruency effect may depend on word category (Beller, Brattebø, Lavik, Reigstad, & Bender, in press). Specifically, nouns used as allegories may evoke stronger associations with a biological sex (e.g., due to their personification) than nouns referring to plain objects. If this assumption is correct, then allegories should not only generally generate a stronger congruency effect than nouns for objects, but the emergence of this effect should also largely depend on the strength of the association. To enable a direct comparison across categories, Experiment 2 therefore included an object condition. In addition, we supplemented the EAST with an explicit task to assess the strength of the association, which also allowed us to relate the lexical semantic and the conceptual level to each other, on which the two approaches described in the introduction focus respectively.
Experiment 2
Experiment 2 pursued three main goals: to replicate the obtained gender congruency effect for allegories, to set it into context by comparing it with nouns for objects, and to assess its strength by comparing the implicit measure with an explicit measure. The latter should also allow us to assess whether stronger associations with a biological sex may pave the way for the gender congruency effect.
Method
Participants
Sixty-one German speakers (33 female and 28 male; age M = 23.4 years, range = 18 to 46, SD = 5.19) from the Freiburg area participated in this experiment. They were rewarded with up to 4.92 euros, contingent on the number of fast and correct decisions in the task.
Material and design
The experiment consisted of two parts: Part I used an implicit measure (the EAST) and Part II an explicit measure (assignment of biological sex).
For the EAST, participants worked on nouns for animates (as reference category) and either allegories or nouns for objects (target category) in a between-subjects design. The items for the animates and the allegories were taken from Experiment 1. For the additional object condition, 20 pairs of nouns that referred to artefacts (see Supplemental Material A) were selected from a pool of nouns used in a previous study (Bender et al., 2011); again, average word length and frequency of usage did not differ significantly across feminine and masculine items.
For the explicit measure, all participants were also presented with a randomized list of nouns. For each noun, they had to indicate whether they would portray its referent as a female or male person on a 4-point rating scale ranging from clearly as woman (1), rather woman (2), rather man (3), to clearly man (4). The list included all 40 allegories and all 40 object nouns used as stimuli in the EAST (together with some other candidates for allegories); the order of the items and the order of the rating labels was controlled for (the exact phrasing of the instruction is given in Supplemental Material A).
Procedure
The procedure for the EAST was identical to that in Experiment 1. The assignment task for biological sex followed after the EAST.
Results and discussion
We first present the results from the implicit measure (EAST)—both for Experiment 2 only and jointly for Experiments 1 and 2—then from the explicit measure (assignment of biological sex), and finally from a reanalysis of the implicit data with explicit ratings and the number of letters as covariates.
Implicit measure: Experiment 2 only
The data were analysed in the same way as described for Experiment 1: We first determined the model with the most appropriate structure of random effects of participants and target items [reported in Supplemental Material B(II)] and then used this model to test the fixed effects of the factors type of target (animate vs. allegory/object), grammatical gender of target (feminine vs. masculine), and response association (whether the trial requires pressing the “female” or “male” key) as within-subject factors, and group (category of inanimate concepts: allegory vs. object) as between-subjects factor. The findings are summarized in Figure 2B and Supplemental Material C.
Accuracy
The analysis of the fixed effects indicated a general gender congruency effect [Gender × Response Association, χ2(1) = 122.14, p < .001] that interacted with the type of targets [Type × Gender × Response Association, χ2(1) = 29.38, p < .001]. It also indicated a main effect of Type, χ2(1) = 13.92, p < .001, and an interaction Type × Response Association, χ2(1) = 4.25, p = .04; as the only effect of group, we found a marginally significant interaction Type × Group, χ2(1) = 3.76, p = .05. Testing the congruency effect for the three noun categories revealed a significant effect for animates in both groups [M = 7.56%, χ2(1) = 79.14, p < .001; and M = 8.01%, χ2(1) = 79.37, p < .001, one-tailed], for allegories [M = 1.39%, χ2(1) = 2.75, p = .05, one-tailed], and for objects [M = 2.63%, χ2(1) = 13.05, p < .001, one-tailed].
Reaction time
The analysis of the fixed effects again indicated a general gender congruency effect [Gender × Response Association: F(1, 13118.9) = 44.35, p < .001] that interacted with the type of targets [Type × Gender × Response Association: F(1, 13117.0) = 14.21, p < .001], but no other effects. Testing the congruency effect for the different noun categories revealed a significant effect for animates of both groups [M = 25.50 ms, F(1, 3152.4) = 37.70, p < .001; and M = 16.80 ms, F(1, 3243.2) = 15.13, p < .001, one-tailed] and for allegories [M = 6.75 ms, F(1, 3241.4) = 3.01, p = .04, one-tailed], but not for objects [M = 5.61 ms, F(1, 3365.4) = 1.71, p = .095, one-tailed].
Again, the colour of the stimulus was easier to categorize when it generated a gender-congruent key match than when it did not. As expected, this was again most pronounced for the animates, for which categorization was both faster and more accurate. The congruency effect for allegories found in the accuracy data of Experiment 1 was replicated; it was smaller in the accuracy data but extended to the latencies. Interestingly, we also found a congruency effect for the objects, although it was restricted to the accuracy data.
Implicit measure: Joint analysis of Experiments 1 and 2
As Experiment 2 used the same animates and allegories as those in Experiment 1, we analysed the data from the allegory group of Experiment 2 jointly with the data from Experiment 1 in order to check whether or not the congruency effect was moderated by the additional factor “experiment” (Experiment 1 vs. Experiment 2). The random structure analysis is reported in Supplemental Material B(III).
Accuracy
The analysis of the fixed effects confirmed both the general gender congruency effect [Gender × Response Association: χ2(1) = 59.37, p < .001] and the interaction with the type of targets [Type × Gender × Response Association: χ2(1) = 19.69, p < .001] across experiments. The congruency effect was weaker for both types of nouns, animates and allegories, in Experiment 2 than in Experiment 1 [Experiment × Gender × Response Association: χ2(1) = 4.98, p = .03], but the difference between the two types of targets was not moderated by the factor experiment [Experiment × Type × Gender × Response Association: χ2(1) = 0.31, p = .58].
Reaction time
The analysis of the fixed effects again confirmed both the general gender congruency effect [Gender × Response Association: F(1, 12442.2) = 38.18, p < .001] and the interaction with the type of targets [Type × Gender × Response Association: F(1, 12441.1) = 25.77, p < .001] across experiments. The congruency effect appeared to be stronger for both types of nouns, animates and allegories, in Experiment 2 than in Experiment 1, although this effect did not reach significance [Experiment × Gender × Response Association: F(1, 12443.7) = 3.66, p = .06]; again, the difference between the two types of targets was not moderated by the factor experiment [Experiment × Type × Gender × Response Association: F(1, 12442.5) = 0.06, p = .81].
The findings from this joint analysis suggest that the gender congruency effects from Experiment 1 and from the allegory group of Experiment 2 differed only in strength: In the accuracy data they were stronger in Experiment 1 than in Experiment 2, whereas in the latencies they were weaker (equally for both animates and allegories). Taken together, the two experiments thus suggest that allegorically used nouns do evoke associations with a biological sex that are in line with the noun's grammatical gender.
Explicit measure
The assignment task for biological sex included all those allegory and object nouns used as stimuli in the EAST. Assignments of biological sex were coded from 1 (clearly female) to 4 (clearly male). Thus, the neutral midpoint of the scale had position 2.5. Mean assignment for each item was calculated across all participants (the mean ratings for all items are reported in Supplemental Material A). The feminine allegories were indeed regarded as strongly female (M = 1.53) and the masculine allegories as strongly male (M = 3.18) on average, and this difference was significant, t(38) = 20.82, p < .001, d = 6.58. Evidently, the stimuli for the allegorical condition were well chosen, as they evoked rather clear associations with a biological sex in the explicit measure, in line with the grammatical gender. This cannot be said, however, for the category of objects, which turned out to be not as neutral as intended: Both the feminine objects and the masculine objects evoked associations that were, on average, inclined more towards male sex [feminine: M = 2.73; masculine: M = 2.91; difference not significant, t(38) = 1.31, p = .099; d = 0.414], which is in line with Mullen's (1990) observation that artefacts are generally more strongly associated with male sex.
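The reported effect sizes are numerically consistent with the standard conversion from an independent-samples t value to Cohen's d, d = t · sqrt(1/n1 + 1/n2), with 20 feminine and 20 masculine items per category. Whether the authors computed d in exactly this way is an assumption; the Python sketch below, with a hypothetical function name, only checks the arithmetic.

```python
from math import sqrt


def cohens_d_from_t(t, n1, n2):
    """Convert an independent-samples t value into Cohen's d."""
    return t * sqrt(1 / n1 + 1 / n2)


# t values as reported in the text; group sizes of 20 items each follow
# from the 20 noun pairs per category (df = 38).
d_allegories = cohens_d_from_t(20.82, 20, 20)
d_objects = cohens_d_from_t(1.31, 20, 20)
print(round(d_allegories, 2), round(d_objects, 3))  # 6.58 0.414
```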
Reanalysis of Experiments 1 and 2 with ratings and the number of letters as covariates
In order to test whether the gender congruency effect observed in the EAST was moderated by the associations and/or by the (slightly confounded) word length, we reanalysed the data from Experiment 1 and Experiment 2 by comparing their results with respect to the gender congruency effect in two sets of models: one set in which we included the mean rating of the targets (centred on the neutral scale midpoint, 2.5, and scaled to a standard deviation of 1.0) and the interaction of ratings and response association as covariates, and another set in which we included the number of letters of the items (z-transformed) and its interaction with response association as covariates. Again, we first determined the model with the most appropriate structure of random effects of participants and target items [reported in Supplemental Material B(IV) for Experiment 1 and Supplemental Material B(V) for Experiment 2] and then used this model to test the fixed effects. The results were clear-cut: In all cases in which we had observed a gender congruency effect in the initial analyses, the relevant interaction Gender × Response Association was still present when we included the number of letters and its interaction with response association as covariates: in the accuracies of Experiment 1, χ2(1) = 21.67, p < .001, in the accuracies of Experiment 2, χ2(1) = 15.09, p < .001, and in the latencies of Experiment 2, F(1, 6604.7) = 4.90, p = .03. On the other hand, this interaction disappeared when the ratings were included instead: in the accuracies of Experiment 1, χ2(1) = 1.13, p = .29, in the accuracies of Experiment 2, χ2(1) = 0.18, p = .68, and in the latencies of Experiment 2, F(1, 6603.4) = 1.41, p = .23. This emphasizes that the gender congruency effect was not mediated by the (slightly confounded) number of letters, but instead by the explicit ratings of biological sex. In other words, stronger associations pave the way for gender congruency.
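The two covariate transformations described above can be illustrated with a minimal sketch. The item values are hypothetical, and the use of the sample standard deviation is an assumption, as the text does not specify which estimator was used.

```python
import statistics

SCALE_MIDPOINT = 2.5  # neutral midpoint of the 1 (female) to 4 (male) scale

def centre_on_midpoint_and_scale(ratings):
    """Centre ratings on the scale midpoint and scale them to SD = 1."""
    sd = statistics.stdev(ratings)  # assumption: sample standard deviation
    return [(r - SCALE_MIDPOINT) / sd for r in ratings]

def z_transform(values):
    """Standard z-transformation: centre on the mean, scale to SD = 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical mean ratings and word lengths for five items.
ratings = [1.4, 1.6, 2.5, 3.2, 3.5]
letters = [4, 5, 6, 7, 8]

rating_covariate = centre_on_midpoint_and_scale(ratings)
letter_covariate = z_transform(letters)
```

Centring on the midpoint rather than on the mean keeps a value of zero interpretable as a neutral rating, whereas the z-transformed word length has a mean of zero by construction.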
Analogous to the initial analysis of Experiment 2, we did not find a significant interaction Group × Gender × Response Association in this reanalysis; this indicates that the gender congruency effect is alike for allegories and objects (with the animates excluded) and in both cases mediated by the explicit ratings.
The animates were not included in these analyses as no ratings were collected for these items (and there would be only little variance, as these items are, by definition, either clearly female or clearly male).
It is important to note that, besides an effect of allegorical association and of grammatical gender per se, the strategic use of grammatical gender could contribute to the congruency effect as well. Especially the explicit measure used here (assignment of biological sex) leaves ample space for such a strategy, as it was not timed, and to some extent this may also be true for the implicit measure (EAST), which was timed but may not have been rigid enough to preempt gender recognition and usage. This might explain in part why, in Experiment 2, the object category also generated a gender congruency effect—even in the absence of distinct associations (as a function of grammatical gender) with biological gender in the explicit ratings.
But where do these associations come from? Do they emerge directly from the grammatical gender of the words (which would be mandatory for a proper Whorfian effect), or are they conveyed by the sex of the personified allegories with which people have grown up (regardless of whether the biological sex of these allegories may have been inspired by the grammatical gender in the first place)? For the current data, both implicit and explicit, this is hard to tell. A direct effect of the grammatical gender is more likely for the EAST, as the processing of conceptual knowledge, such as associations with personified allegories, is less likely in this speeded task than in the untimed explicit assignment task. While the design of the previous two experiments did not provide a means to disentangle these potential influences, the question of whether the effect of grammatical gender per se may be superimposed (if not entirely driven) by the referent's allegorical association with a biological sex can, in fact, be resolved. Testing this assumption was the goal of Experiment 3.
Experiment 3
To further scrutinize the nature of the gender congruency effect, we took advantage of the fact that not all allegories are gender-congruent. The assignment task provided us with a number of allegories for which grammatical gender and association with a biological sex diverged. The season Frühling (“spring”), for instance, is grammatically masculine, but strongly associated with female sex (mean rating = 1.40). Items like these can help us to assess the relative contribution of grammatical gender and strength of association to the overall effect.
Method
Participants
Twenty-nine German speakers (17 female and 12 male; age M = 22.9 years, range = 19 to 33, SD = 3.69) from the Freiburg area participated in this experiment. They were rewarded with up to 4.92 euros, contingent on the number of fast and correct decisions in the task.
Material and design
Based on the results of the explicit assignment task of Experiment 2, we selected 10 pairs of feminine and masculine allegorically used nouns, each with strong gender-congruent associations, and 10 pairs with incongruent associations for a within-subject design. The gender-congruent allegories had obtained average ratings of 1.33 (feminine/female) and 3.44 (masculine/male), respectively, t(18) = 29.56, p < .001. Feminine and masculine words differed slightly in the strength of gender association [feminine: 1.33 vs. masculine: 1.56; t(18) = 3.18, p = .006; with the scale being flipped for the masculine items] and in the number of letters [feminine: 6.0 vs. masculine: 4.6; t(18) = −2.20, p = .041], but not in the frequency of usage [average log frequency according to the Celex database (Baayen et al., 1995); feminine: 1.05 vs. masculine: 1.62; t(18) = 2.01, p = .060]. The gender-incongruent allegories had obtained ratings of 1.83 (masculine/female) and 3.39 (feminine/male), respectively; t(18) = −11.97, p < .001. Feminine and masculine words did not differ in the strength of gender association [feminine: 1.61 vs. masculine: 1.83; t(18) = 1.64, p = .118; with the scale being flipped for the feminine items], in the number of letters [feminine: 5.8 vs. masculine: 5.8, t(18) = 0, p = 1], or in the frequency of usage [average log frequency; feminine: 1.39 vs. masculine: 1.48, t(18) = 0.346, p = .734]. Nouns for animates (as reference category) were the same as those in the previous experiments.
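The flipping of the scale used to compare association strength can be retraced with a small sketch. It assumes that flipping means mirroring a rating around the midpoint of the 1 to 4 scale (i.e., 5 minus the rating); the function name is illustrative.

```python
SCALE_MIN, SCALE_MAX = 1, 4  # 1 = clearly female, 4 = clearly male

def flip_rating(mean_rating: float) -> float:
    """Mirror a rating around the scale midpoint, so that male-associated
    items are expressed as distance from the 'clearly male' pole."""
    return (SCALE_MIN + SCALE_MAX) - mean_rating

# Congruent set: masculine/male items (M = 3.44) are flipped.
congruent_masculine = round(flip_rating(3.44), 2)
# Incongruent set: feminine/male items (M = 3.39) are flipped.
incongruent_feminine = round(flip_rating(3.39), 2)

print(congruent_masculine, incongruent_feminine)
```

The flipped values (1.56 and 1.61) match those reported in the strength comparisons above, which places female-associated and male-associated items on a common scale where lower values indicate a stronger association.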
Procedure
The procedure (EAST) was the same as that in Experiment 1.
Results and discussion
The data were analysed in the same way as described for Experiment 1: We first determined the model with the most appropriate structure of random effects of participants and target items [reported in Supplemental Material B(VI)] and then used this model to test the fixed effects of three within-subject factors: type of target (animate vs. congruent allegory vs. incongruent allegory), grammatical gender of target (feminine vs. masculine), and response association (whether the trial requires pressing the “female” or “male” key). The findings are summarized in Figure 2C and Supplemental Material C.
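For illustration, the crossing of grammatical gender and response association can be expressed as a congruency coding of trials. The sketch below is purely descriptive and is not the mixed-model analysis reported here; the trial records are hypothetical, and the class and function names are ours.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    gender: str    # grammatical gender of the target: "feminine" or "masculine"
    response: str  # key required on this trial: "female" or "male"
    correct: bool  # whether the participant responded correctly

CONGRUENT_PAIRS = {("feminine", "female"), ("masculine", "male")}

def is_congruent(trial: Trial) -> bool:
    """A trial is gender-congruent if gender and required key match."""
    return (trial.gender, trial.response) in CONGRUENT_PAIRS

def congruency_effect(trials) -> float:
    """Accuracy on congruent minus incongruent trials, in percentage points."""
    def accuracy(subset):
        return 100.0 * sum(t.correct for t in subset) / len(subset)
    congruent = [t for t in trials if is_congruent(t)]
    incongruent = [t for t in trials if not is_congruent(t)]
    return accuracy(congruent) - accuracy(incongruent)

# Hypothetical data: one error on an incongruent trial.
trials = [
    Trial("feminine", "female", True),
    Trial("feminine", "female", True),
    Trial("masculine", "male", True),
    Trial("masculine", "male", True),
    Trial("feminine", "male", True),
    Trial("feminine", "male", False),
    Trial("masculine", "female", True),
    Trial("masculine", "female", True),
]
effect = congruency_effect(trials)
```

A positive value indicates that responses were more accurate when the required key matched the grammatical gender of the target, which corresponds to the direction of the Gender × Response Association interaction.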
Accuracy
The analysis of the fixed effects indicated a general gender congruency effect [Gender × Response Association, χ2(1) = 20.65, p < .001] that interacted with the type of targets [Type × Gender × Response Association, χ2(2) = 20.18, p < .001]. It also indicated a marginally significant effect of response association, χ2(1) = 3.96, p = .05, but no other effects. Testing the congruency effect for the three noun categories revealed a significant effect for animates [M = 6.90%, χ2(1) = 52.14, p < .001, one-tailed] and for the congruent allegories [M = 4.09%, χ2(1) = 10.24, p < .001, one-tailed], but not for the incongruent allegories [M = −0.65%, χ2(1) = 0.35, p = .28, one-tailed], with the difference between congruent and incongruent allegories being significant [Type × Gender × Response Association, χ2(1) = 6.99, p = .008].
Reaction time
The analysis of the fixed effects did not indicate a general gender congruency effect [Gender × Response Association, F(1, 6151.6) = 1.89, p = .17], but only the interaction Type × Gender × Response Association, F(2, 6151.0) = 9.93, p < .001. Testing the congruency effect for the different noun categories revealed a significant effect only for the animates [M = 21.92 ms, F(1, 3002.6) = 25.53, p < .001, one-tailed], but neither for the congruent allegories [M = −4.78 ms, F(1, 1519.8) = 0.79, p = .185, one-tailed], nor for the incongruent allegories [M = −3.07 ms, F(1, 1523.6) = 0.32, p = .285, one-tailed].
Generally speaking, ease of categorization depended on the type of stimulus, as indicated by the interaction with type: Categorization of the stimuli was facilitated when it generated a gender-congruent key match in the case of the gender-congruent items (animates and gender-congruent allegories like “war”), but not in the case of the gender-incongruent allegories (like “spring”). Again, as expected, the congruency effect in terms of both accuracy and reaction time was most pronounced for the animates, whereas for the congruent allegories it was restricted to accuracy. In other words: Effects of implicit associations (as indicated in the EAST) are confined to those items for which the explicitly associated biological sex (as indicated in the ratings) concurs with the grammatical gender. If (associated) biological sex and grammatical gender are in conflict, the effect disappears. These findings indicate that most, if not all, of the gender congruency effect in this paradigm can be attributed to the referent's association with the personified allegory rather than to an impact of the noun's grammatical gender.
Reanalysis with ratings and the number of letters as covariates
To substantiate this argument, we reanalysed the accuracy data for the congruent and incongruent allegories with the mean rating of the targets (as obtained from Experiment 2; centred on the scale midpoint 2.5 and scaled to a standard deviation of 1.0) and its interaction with response association as well as number of letters (z-transformed) and its interaction with response association as covariates. Again, we first determined the model with the most appropriate structure of random effects of participants and target items [reported in Supplemental Material B(VII)] and then used this model to test the fixed effects. The average number of letters played no role for either the accuracies or the latencies. For the accuracies, however, for which we had observed a gender congruency effect in the initial analysis, the covariate Rating × Response Association reached significance, χ2(1) = 8.82, p = .003, whereas the interaction Gender × Response Association disappeared, χ2(1) = 1.78, p = .18, indicating that the gender congruency effect for allegories is mediated by the explicit ratings of biological sex (as in Experiments 1 and 2). In fact, the interaction Type × Gender × Response Association was eliminated, χ2(1) = 0.31, p = .58, in this analysis, whereas it was significant, χ2(1) = 7.09, p = .008, when only the number of letters and its interaction with response association were included as covariates. This finding indicates that the partially confounded number of letters was not relevant; differences in the size of the observed congruency effect between congruent and incongruent allegories were accounted for only by the explicit ratings.
Whether the gender congruency effect is entirely driven by the associations with biological sex, rather than merely superimposed by them, is difficult to decide, however. One might expect that, if the impact were based entirely on association, the effect should reverse rather than disappear. However, items in the incongruent condition were weaker on average in the strength of their association, especially for masculine words with female association, and this may explain why they could not fully reverse the effect. The current design also cannot completely dispel the concern that some participants may have made strategic use of the grammatical gender of the items. Nevertheless, usage of an implicit task like the EAST goes a long way towards preempting such strategic usage, as elaborated on below.
General Discussion
Dating back at least two millennia, liberty has been portrayed as female: as the goddess Libertas in the Roman Empire, as Marianne in Eugène Delacroix's painting La Liberté guidant le peuple (1830), or in embodiments such as the Statue of Liberty. In Latin and the Romance languages derived from it, the feminine (grammatical) gender of the noun most likely inspired the female sex of the allegory. But does this relationship corroborate that the grammatical gender of nouns affects how their referents are conceptualized? Two pieces of evidence speak in favour of this assumption: the strong correlation found in European literature and artworks (Segel & Boroditsky, 2011), and our own collection of allegorically used nouns, in which gender-congruent allegories clearly prevailed, both in number and in strength, over gender-incongruent allegories.
The principle of linguistic relativity, however, extends beyond such explicitly made associations and postulates that, even when not taking notice of the class to which a noun belongs, its grammatical gender should affect conceptualization. A conclusive test therefore requires the associations of grammatical gender and biological sex to be scrutinized without explicating this link. Our version of the Extrinsic Affective Simon Task (EAST) did obtain an effect of grammatical gender on the categorization of allegories (in all experiments) and of objects (in Experiment 2). At the same time, however, it also helped to put this effect into perspective: In terms of Cohen's d measure of effect size, the effect for allegories (with d ranging between −0.127 and 0.715) turned out to be less strong than that for animates (d ranging between 0.757 and 1.669; all experiments); it was much stronger with explicit measures (d = 6.58) than with implicit measures (Experiment 2); it was moderated by allegorical associations (all experiments); and it disappeared when setting grammatical gender and allegorical associations into contrast (Experiment 3). These findings suggest, along with the mediational analyses involving explicit ratings of sex-related associations, that what drove the apparent gender effect in our studies was the association of nouns with personified allegories rather than the gender of the noun itself. This is not to say that grammatical gender had no relevance for the effect, but it raises the question as to how, exactly, it contributes to this effect. In tackling this question, the discussion revolves around the following issues: (a) possible concerns about the appropriateness of the adopted method, (b) the most likely source of the gender congruency effect, and (c) its implications for linguistic relativity.
Methodological concerns
In our studies, we tested the Extrinsic Affective Simon Task (EAST) as a means to explore sex-related associations for words while at the same time preventing the strategic usage of grammatical gender. Such strategic usage has been critical in previous research with more explicit measures—and is also a problem for the assignment task used for the explicit rating in Experiment 2. The correlation of grammatical gender and biological sex in the assignment done by German speakers is considerably stronger here than in other studies with similar tasks and samples (Koch et al., 2007; Sera et al., 2002; Vigliocco et al., 2005). While this may indeed indicate stronger allegorical associations, we cannot rule out strategic usage. For the purpose of the current study, however, this is not critical, as allegorical association per se is not of principal interest, but was basically assessed to control for a confounding variable. If the strategic usage of grammatical gender overestimated the strength of this confounding variable, it would simply render the assessment of the relevant effect more conservative.
To some extent, participants may have drawn on grammatical gender deliberately even in the EAST (as probably indicated by the gender congruency effect for objects in Experiment 2), but the speeded categorizations required for this task would have hampered this more than in most other methods previously used in this field. As an additional means to interfere with such strategies, we presented randomly selected feminine or masculine articles postpositioned to the target items. This produced a correct match in half of the trials and an incorrect match in the other half. However, as none of the combinations was presented in the correct order (article–noun), they would have further impeded, rather than facilitated, gender checking. As correct and incorrect matches were counterbalanced, possible impediment and facilitation effects should have evened out and thus largely attenuated strategic usage. Any observed difference between responses to gender-congruent versus gender-incongruent trials would therefore indicate a genuine gender congruency effect.
One side-effect of the speeded categorization was that participants were not encouraged to process in any depth the conceptual connotations of the target items. For this very reason, one might argue that we prevented participants from thinking, and thus from any of those cognitive activities that could have served as evidence for a language-on-thinking effect. For the following reasons, however, we consider such an argument untenable. First of all, not all types of thinking are equally well suited for testing effects of language on thinking. Instances of deliberate or strategic usage of linguistic metaknowledge (such as information on grammatical gender) must be excluded, not only for methodological reasons, but also for conceptual reasons, as such effects would be of no avail in the controversy on linguistic relativity. Second, if cognitive processing were not at all involved in producing the gender congruency effect, the grammatical gender would be solely responsible for the emergence of this effect. A direct effect of a grammatical property like gender on semantic content, however, would not only be interesting in and of itself, but would also necessarily affect how these items are perceived and categorized (namely as somewhat more female or more male), which, regardless of theoretical position, would then be considered a Whorfian effect. As argued above, however, our findings are unlikely to reflect a direct effect of a grammatical property on semantic content, but rather its mediation by allegorical associations, which are located at the conceptual level and hence must have involved “thinking” at least at some point in their emergence. In other words, while the EAST may not directly invite people to think about the items to be categorized, it does still reveal a product of this thinking.
Sources of the gender congruency effect
The pattern of findings obtained in our studies suggests that the gender congruency effect is brought about by associations of nouns with personified allegories such as Lady Liberty, rather than the nouns’ grammatical gender per se. More often than not, these associations may have relied on grammatical gender, which explains the prevalence of gender-congruent allegories (even if this may have involved a detour into the history of literature and arts, with the sex-related association of, for instance, English “liberty” most likely deriving from Latin libertas). But for a strong sex-related association to arise from such allegories, these allegories need to be popular and widespread, thus rendering such associations a cultural rather than a linguistic phenomenon (Beller et al., in press). As a consequence, some allegories give rise to strong, gender-congruent associations, while others are rather neutral or even incongruent. As a case in point, the German noun for “liberty”, Freiheit, seems to trigger no sex-related associations at all. Despite its grammatically feminine gender (as in Latin) and despite strong female associations in other countries, the participants in our pretest rated the biological sex of Freiheit as absolutely neutral.
Implications for linguistic relativity
The question at the core of this study is: Does language affect thought, or more precisely, do the grammatical distinctions and categories in people's native language affect how they think about the world? In the case of the domain under scrutiny here, this boils down to the question of whether the grammatical gender of a noun in a language makes its speakers think of this noun's referent in more female or male terms.
The assumption that the gender congruency effect is indeed brought about by a cultural transmission also has implications for how our findings relate to the debate on linguistic relativity, as its origin in grammar is crucial for any effect to be considered a “Whorfian effect”.
Typically, examination of linguistic relativity involves either a comparison of different languages (e.g., Sera et al., 2002; Vigliocco et al., 2005) or the performance of bilingual speakers under different conditions (Kousta et al., 2008). Our study differs from many others in that it focused on one single language. Does this diminish the significance of our findings? Could, for instance, our observation of a gender congruency effect be the result of a universal pattern of assigning sex to allegories, rather than language-specific sex-mappings? We do not believe this to be the case. To reiterate an argument developed elsewhere (Bender et al., 2011), the general presumptions in this field of research are, first, that grammatical structures implicit in a language should affect how speakers of this language perceive or categorize respective items, and, second, that this should be done differently in different languages. To test the first presumption, cross-linguistic comparisons are not required. If linguistic structure prestructures thought within each single language (Whorf, 1956, p. 213f.), it should be sufficient to examine whether thought is indeed prestructured; in our study, this means examining whether masculine words are conceived of as more male than feminine words (and vice versa). If evidence for this first presumption is not unequivocal, as is the case in our study, testing the second is redundant for this particular instance.
Yet, as pointed out earlier, cross-linguistic differences in the mapping complexity of the gender systems may account for differences in the emergence and/or strength of respective effects (Sera et al., 2002; Vigliocco et al., 2005). Generalizations should therefore be drawn with caution: Whether a genuine gender effect might be found in a two-gender language like French or Spanish remains an open question for empirical investigation. Furthermore, the fact that we were unable to substantiate an unequivocal Whorfian effect in the domain of grammatical gender does not speak against Whorfian effects in other domains (see, e.g., Boroditsky & Gaby, 2010; Dolscheid, Shayan, Majid, & Casasanto, 2013; Haun, Rapold, Janzen, & Levinson, 2011; Majid, Bowerman, Kita, Haun, & Levinson, 2004).
Conclusion
Our study set out to investigate whether the grammatical gender of a noun affects its semantic content and whether a genuinely grammatical gender effect can be disentangled from an effect of allegorical association. To tackle these questions, we combined explicit and implicit measures to address the problem of strategic usage. What we did find was evidence for a gender congruency effect that was predominantly driven by associations with personified allegories. These associations are likely to be motivated in many instances by grammatical gender, but the effect itself was not driven directly by grammatical gender. We interpret this as a rubbing off from conceptual categories onto the semantic content of words rather than an effect of grammatical properties such as gender on conceptualization, and therefore as a cultural rather than a linguistic effect. For this reason, we hesitate to regard it as evidence for linguistic relativity.
Our current studies will surely not put to rest the controversy on linguistic relativity, not even in this narrow and clearly defined domain of grammatical gender. In particular, the question of methodological rigor will remain an issue and a challenge. Nonetheless, we hope to have demonstrated that implicit methods like the EAST can go a long way to deal with these challenges and provide a thorough means for scrutinizing Whorfian effects. But whatever Whorfian effects may be detectable, Lady Liberty and Gevatter Tod are not likely to serve as figureheads in this endeavour.
