Abstract
Some words are lexically suggestive about the taxonomic position of their referent (e.g., jellyfish in English), and this information can vary across languages (e.g., in Dutch the equivalent of jellyfish holds no taxonomic information: kwal). To evaluate the role of such lexical suggestions, we conducted a cross-linguistic study in which similarity judgements from two language groups (Dutch and English speakers) were compared. We paired asymmetrically informative items with items that are considered to be typical members of the referenced category (e.g., jellyfish–salmon). Our analyses revealed that items were deemed more similar by speakers of a language in which the lexical information was present (e.g., English speakers tended to give relatively higher ratings for jellyfish–salmon than Dutch participants did for the non-informative equivalent kwal–zalm). Results are discussed in light of theories of concept representation and compound processing.
Introduction
It is commonly assumed that our mental representation and conceptualization of the world are determined in large part by perceivable features provided by that world. The representation of everyday concepts such as tree, for example, hinges on observable properties such as perceptual (has leaves), relational (required for production of paper), and ecological (the home of birds) information. Crucial human capacities such as identifying a novel tree, inferring characteristics from oaks to beeches, or judging similarity between trees and shrubs, all rely to a large extent on that information (e.g., Gelman, 2009; Hampton, 1979, 2006; Murphy, 2002; Rosch & Mervis, 1975; Tversky, 1977; Wu & Barsalou, 2009).
However, humans also rely on other sources to build their mental lexicon, such as how words are used in everyday communication. In this article, we focus on how linguistic context can potentially influence semantic representations. More specifically, we examine whether the arbitrary conventions of using particular labels in a language can subtly affect the meaning of the concepts involved, independent of observable features. Although most words in a language can be considered purely conventional and arbitrary signifiers (but see, for example, Lockwood & Dingemanse, 2015), a number of concepts are referred to by compounds that are suggestive of a particular meaning. For example, the words goldfish and jellyfish both include a reference to fish, and therefore, speakers of English can rely on cultural knowledge encoded in their language and reasonably suspect they are both subspecies of fish. They would be led astray sometimes, as jellyfish are actually cnidarians instead of fish, but clearly both labels contain information suggestive of their taxonomic position. The question then is, “Are the mental representations of these concepts influenced by the label, which is nothing more than a convention among the speakers of a language?”
The power of the label
The role of compound labels trivially depends on the status of the compound itself. For novel combinations of concepts, it is reasonable to assume that the interpretation of the combination is derived from the meaning of the constituents (Hampton, 1997; Wisniewski, 1997; but see, for example, Connolly, Fodor, Gleitman, & Gleitman, 2007). Even when there is little knowledge about the modifying constituent, one can expect people to rely on the head noun to infer information regarding the combination, because they can rely on their meta-knowledge of how combinations of concepts are generally interpreted (Gagné & Spalding, 2011).
By contrast, labels that are highly familiar and lexicalized, such as jellyfish or bobcat, can be expected to have separate entries in the semantic system, representing their conventional meaning (e.g., Hampton, 1997; Kamp & Partee, 1995). As such, they should fall under the explanatory umbrella of traditional approaches to concept representation (e.g., exemplar theory, Smith & Medin, 1981; family resemblance, Rosch & Mervis, 1975; prototype theory, Hampton, 1995), which all rely on “world features” and not on the arbitrary label. Based on these models, one would not expect the category labels to have any bearing on the corresponding concepts. Alternatively, one can consider the meaning of idioms such as jellyfish to be entrenched in the distributed semantic network of a language (e.g., Landauer & Dumais, 1997; Lund & Burgess, 1996). Again, there is no reason to suspect that the labels as such would affect the usage in everyday language, and thus the pattern of relations between words.
In sum, whether you expect the meaning of jellyfish to derive from the features of jellyfish or the everyday usage of the label in language, or both, the—in this case erroneous—information in the label should not fundamentally influence how the concept is represented semantically.
A cross-linguistic approach
Interestingly, given the contingency of the processes that drive naming patterns (Malt, Sloman, Gennari, Shi, & Wang, 1999), not all languages provide lexically informative labels for the same categories, and thus, referents can be found for which some languages have lexically informative labels, whereas others do not. We can thus detect the potential influence of labels by comparing concept representations of speakers of different languages, focusing on concepts associated with labels that are informative in one language but not in another.
In a previous study (Djalal, Voorspoels, Heyman, & Storms, 2016), we asked participants from three language groups (English, Dutch, and Indonesian speakers) to judge the similarity between concepts that shared the same lexical information in one of the three languages (e.g., jellyfish–catfish, chestnut–peanut), but not in the other two (e.g., jellyfish–catfish became kwal–meerval in Dutch and ubur-ubur–ikan lele in Indonesian, which contain no lexical information). In addition, we examined whether the lexical information that is included in objects’ names could affect people’s typicality judgements (“How typical is jellyfish for the category of fish?”). We found that participants rated similarity and typicality higher for items that were informative in their own native language, suggesting that their judgements are influenced by the labels’ lexical information. However, these studies leave open the possibility that participants relied on the phonetic and orthographic similarity of the terms. That is, jellyfish and (cat)fish also look and sound more similar than their Dutch and Indonesian equivalents because of the common constituent, which could in turn influence participants’ judgements.
The current study
In this study, we isolated the (potential) impact of the label on a concept’s representation using stimuli that were not orthographically or phonologically similar (e.g., jellyfish–salmon). More specifically, we asked participants of different language groups to judge the similarity of concept pairs. One item of each pair had a label that in one language was informative as to the concept’s taxonomical position (e.g., jellyfish is informative in English, but the Dutch equivalent kwal is not). Conversely, the label for a squid is suggestive in Dutch (i.e., inktvis; the word vis in Dutch means fish) but not in English (squid). Such words were paired with an item that is considered to be a typical member of the category mentioned in the label (e.g., jellyfish–salmon; squid–herring). If lexical information influences the representation of concepts, we would expect participants to judge similarity relatively higher for pairs with an item that is lexically informative in their own language, compared with participants from a different language group.
Method
Pre-registration
We pre-registered this experiment before the data collection on March 10, 2016 (see: osf.io/kq956). The pre-registration contains a short description of the hypothesis, the stimulus material, and statistical analyses. We have performed data collection and analyses as described in the pre-registration, unless otherwise stated. Materials, data, and analysis scripts are publicly available (see: osf.io/4ndmc).
Participants
Sixty English speakers (22 females, mean age: 25.45 years) and 68 Dutch speakers (32 females, mean age: 18.72 years) completed the experiment. The Dutch speakers were students who participated on a voluntary basis, whereas the English speakers, recruited online using Prolific (www.prolific.ac), received £1 in return for their participation.
Materials
A list of 60 item pairs was presented in a web survey. The critical pairs comprised one word that contained lexical information, either in Dutch or in English, but never in both languages. The second word was related to the informative part of the first word in that it was either a typical member of the category (e.g., jellyfish–salmon; salmon is considered a typical member of the category fish) or a similar category coordinate (e.g., titmouse–rat; rat is considered a similar category coordinate of mouse). Of the 60 pairs, 20 were informative in English (e.g., jellyfish–salmon), 20 were informative in Dutch (e.g., inktvis–haring [squid–herring]), and 20 were fillers.
Procedure
In the survey, each word pair was presented in the following format: “How similar are X and Y?” Following our previous experiments, participants answered on a 10-point rating scale ranging from 1 (not at all similar) to 10 (extremely similar). The order of item pairs was randomised per participant. The study was presented as a similarity rating task, and self-evidently, participants were not informed about the purpose of the study. The survey took 5 to 7 minutes.
Results
To test the hypothesis that people’s similarity judgements are influenced by the lexical information in the objects’ names, we performed mixed-effects analyses on the similarity judgement scores. In the analyses, language group and language informativity (i.e., the language which contains the lexical information) were included as fixed effects. Participants and items (i.e., the 40 critical pairs) were included as random effects such that a maximal random structure was created (Barr, Levy, Scheepers, & Tily, 2013).
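To make the critical interaction concrete, the following sketch computes a difference of differences on the cell means of a 2 × 2 design (language group × language informativity). It is only an illustration: the ratings are invented, the names `ratings` and `interaction_contrast` are assumptions, and the calculation is not the mixed-effects model itself, which additionally includes random effects for participants and items.

```python
from statistics import mean

# Hypothetical ratings (1-10 scale), keyed by
# (language group of the rater, language in which the label is informative).
# These numbers are invented for illustration and are not the study's data.
ratings = {
    ("english", "english"): [5, 6, 4, 5],
    ("english", "dutch"): [4, 4, 5, 3],
    ("dutch", "english"): [4, 5, 4, 3],
    ("dutch", "dutch"): [5, 5, 6, 4],
}


def interaction_contrast(cells):
    """Difference of differences on cell means for a 2 x 2 design.

    A positive value means that each group rates pairs relatively higher
    when the label is informative in its own language.
    """
    m = {key: mean(values) for key, values in cells.items()}
    # English group minus Dutch group, on English-informative pairs.
    gap_on_english_items = m[("english", "english")] - m[("dutch", "english")]
    # English group minus Dutch group, on Dutch-informative pairs.
    gap_on_dutch_items = m[("english", "dutch")] - m[("dutch", "dutch")]
    return gap_on_english_items - gap_on_dutch_items


contrast = interaction_contrast(ratings)
print(f"interaction contrast = {contrast:.2f}")
```

A contrast near zero would indicate that the two language groups differ by the same amount on both item types, that is, no interaction.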
The results revealed that similarity scores did not significantly differ across the two language groups, χ²(1) = 0.47, p = .493, or the two language informativity conditions, χ²(1) = 1.45, p = .228. Crucially, however, the interaction between language group and language informativity proved to be statistically significant, χ²(1) = 6.10, p = .014, suggesting that similarity judgements are influenced by lexical information, as can be seen in Figure 1.

Figure 1. Similarity scores averaged across participants per language group and items per informativity level. Error bars represent 95% confidence intervals.
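The p values reported for the three tests can be checked by hand, assuming that each statistic is referred to a χ² distribution with one degree of freedom, as the notation χ²(1) indicates. The sketch below uses only the Python standard library; the function name `chi2_df1_p_value` is an assumption made for this illustration.

```python
import math


def chi2_df1_p_value(statistic):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom.

    With 1 df, a chi-square variate is the square of a standard normal
    variate, so the tail probability equals erfc(sqrt(statistic / 2)).
    """
    if statistic < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2))


# Test statistics as reported in the text (rounded there to two decimals).
reported = [
    ("language group", 0.47),
    ("language informativity", 1.45),
    ("interaction", 6.10),
]
for label, stat in reported:
    print(f"{label}: chi2(1) = {stat:.2f}, p = {chi2_df1_p_value(stat):.3f}")
```

Because the reported statistics are rounded, recomputed p values may differ from the published ones in the third decimal.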
Discussion
In this study, we have shown that, for objects with informative labels in one language but not in another, similarity judgements of speakers were subtly different between languages. Contrary to what traditional approaches to concept representation predict, our findings seem to suggest that the representational level does get influenced by the informative label, which implies that these theories ought to be revised. In what follows, we aim to determine what it is about informative labels that drives the observed effect in our experiment. In a series of post hoc analyses, we consider a difference at the representational level, the activation and composition processes that are associated with interpreting compounds, and more high-level processes that are at work when meaning is constructed.
To examine the possibility of an informative label influencing the meaning of a word on a representational level, we replicated our main analysis on the basis of model-based relatedness scores derived from distributional semantic models. Using the semantic spaces recommended by Mandera, Keuleers, and Brysbaert (2017, available via http://meshugga.ugent.be/snaut/), for which the predictive validity of relatedness ratings has been well established in general (Baroni, Dinu, & Kruszewski, 2014; Mandera et al., 2017), we obtained cosine distance estimates for all our critical items in both Dutch and English. If linguistic labels truly affect semantic representation, one would anticipate the lexically informative item (e.g., jellyfish) to be located closer to the target item (e.g., salmon) than its non-informative counterpart in the other language (i.e., kwal to zalm). A mixed-effects analysis of variance (ANOVA) performed on the resulting cosine distances (based on 38 item pairs, because word embeddings for two items were not available) did not show a significant Language Group × Language Informativity interaction, F(1, 36) = 0.45, p = .506. This suggests that, at the semantic level, the languages do not differ in how our critical items are represented, and thus that the informative label has no influence.
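For readers unfamiliar with the measure, the sketch below shows how a cosine distance between two word vectors can be computed. It assumes the common convention that cosine distance equals one minus cosine similarity; the three-dimensional vectors are invented toy values, not embeddings from the semantic spaces used in the study, and the names `cosine_distance` and `toy_vectors` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


# Toy 3-dimensional "embeddings"; real semantic spaces have hundreds of dimensions.
toy_vectors = {
    "jellyfish": [0.9, 0.1, 0.3],
    "salmon": [0.8, 0.2, 0.1],
    "bottle": [0.1, 0.9, 0.2],
}
print(cosine_distance(toy_vectors["jellyfish"], toy_vectors["salmon"]))
print(cosine_distance(toy_vectors["jellyfish"], toy_vectors["bottle"]))
```

Smaller values indicate that two words occur in more similar contexts; in the toy data, jellyfish lies closer to salmon than to bottle by construction.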
To further examine the (lack of) impact of informative labels on the (distributed) semantic representations, we expanded the set of lexicalized compounds to 100 in each language, including the informative stimuli used in this study, and compared the cosine distance between informative items (e.g., hedgehog) and their corresponding category labels (e.g., hog) with the distance between the nearest neighbours of those informative items (e.g., the nearest neighbour of hedgehog is rabbit, again according to the cosine distance metric) and the same category labels (e.g., hog). The rationale behind this comparison is the following: If the informative noun in a label influences its corresponding semantic representation, we expect the informative item to be, on average, closer to the relevant noun (hog) than its nearest neighbour. Indeed, the nearest neighbour does not contain the informative label, and there is no reason to expect that it is systematically closer to the relevant noun.
However, across the 100 items in each language, paired-samples t-tests showed no significant difference in the cosine distances between the two types of pairs (e.g., critical pairs like hedgehog–hog versus nearest-neighbour pairs like rabbit–hog): for English, between the critical pairs (M = 0.70, SD = 0.12) and the nearest-neighbour pairs (M = 0.72, SD = 0.15), t(99) = –1.21, p = .23, 95% confidence interval (CI) = [–0.03, 0.01]; for Dutch, between the critical pairs (M = 0.64, SD = 0.14) and the nearest-neighbour pairs (M = 0.65, SD = 0.14), t(99) = –1.56, p = .12, 95% CI = [–0.03, 0.00]. Although the differences trend in the expected direction, they provide no compelling evidence that the effect of informativity on similarity judgements can be explained purely in terms of (distributed) semantic representations. To summarise, both our analyses suggest that the label does not appear to considerably affect a concept’s semantic representation.
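The paired comparison can be illustrated with a short standard-library sketch. The distances below are invented for five hypothetical items (the study used 100 per language), and `paired_t` is a name assumed for this example; it returns only the t statistic and degrees of freedom, not a p value or confidence interval.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Invented cosine distances to the category label for five items.
critical = [0.70, 0.62, 0.81, 0.55, 0.74]   # e.g., hedgehog-hog
neighbour = [0.73, 0.60, 0.85, 0.58, 0.75]  # e.g., rabbit-hog
t_value, df = paired_t(critical, neighbour)
print(f"t({df}) = {t_value:.2f}")
```

A negative t in this setup means that the critical items are, on average, closer to the category label than their nearest neighbours are, which is the direction the label-influence account predicts.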
Thus far, we have exclusively focussed on the representational level, thereby ignoring that the critical items in our study are also compounds, albeit highly lexicalized ones. From this perspective, it may be informative to consider how compounds are processed when trying to determine the source of the observed effect. As a matter of fact, it is typically assumed that encountering a compound triggers the activation of its constituents (Libben, 1998; Zwitserlood, 1994), and there is evidence that some form of combinatorial process is initiated (Spalding, Gagné, Mullaly, & Ji, 2010), even in highly familiar and lexicalized compounds (e.g., El-Bialy, Gagné, & Spalding, 2013; Kuperman, Schreuder, Bertram, & Baayen, 2009; Marelli, Dinu, Zamparelli, & Baroni, 2015; Marelli, Gagné, & Spalding, 2017; Schmidtke, Matsuki, & Kuperman, 2017).
Focusing on mere activation of constituents, one may predict that activating the informative constituent (e.g., fish in jellyfish) can influence similarity judgements through spreading of activation from the relevant constituent to the target: For example, the activation of fish when confronted with jellyfish may boost the similarity with salmon, which is a typical fish. Indeed, in research on semantic priming, for example, Zwitserlood (1994) showed that compound words can prime associates of their constituents (e.g., blackbird priming white). Similarly, an account that assumes an automatic initiation of combinatorial processes requires the activation of the constituents. Even if the derived interpretation competes with and is eventually expected to give way to the conventional meaning of a lexicalized compound such as jellyfish, the activation can spread to related concepts.
Considering such compound-related processing, one can, thus, expect lexical informativity to boost similarity ratings proportionally to the similarity between the informative constituent (e.g., fish) and the target (e.g., salmon), that is, the expected influence of activation spreading from the informative label. However, we found no significant association between the item-level effect sizes of the informative constituents and their corpus-extracted cosine distances to the corresponding targets (r = –.06, 95% CI = [–.37, .26]). Although this result does not invalidate the theoretical frameworks outlined above, it does suggest that the observed effect of the informative label was not purely the result of constituent activation spreading.
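An item-level correlation with a confidence interval of this kind can be sketched as follows. The source does not state how the interval was obtained; the example assumes the widely used Fisher z transformation, and both the data and the function names (`pearson_r`, `fisher_ci`) are hypothetical.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via Fisher's z transformation."""
    if n <= 3:
        raise ValueError("Fisher's z interval needs n > 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Invented item-level values: effect size of the label and cosine distance
# between the informative constituent and the target.
effect_sizes = [0.2, 0.5, 0.1, 0.4, 0.3, 0.6]
distances = [0.61, 0.70, 0.55, 0.72, 0.58, 0.66]
r = pearson_r(effect_sizes, distances)
low, high = fisher_ci(r, len(effect_sizes))
print(f"r = {r:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

An interval that spans zero, as reported in the text, means the data are compatible with no linear association between the two item-level measures.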
Finally, in our discussion thus far, we have tacitly assumed that similarity judgements are solely driven by the conceptual resemblance between two items (e.g., jellyfish and salmon), whether or not augmented through activation spreading from the informative constituent’s representation. However, as, for instance, recognised by the relational interpretation competitive evaluation (RICE) theory of compound processing (Spalding et al., 2010), inferential processes that go beyond the mere content of a concept also play a role (i.e., the so-called elaboration phase). Posing the question “how similar are X and Y” requires a decision that invites a certain type of reasoning. It is plausible that participants in our study relied on meta-knowledge, especially when confronted with the lesser known concepts. Gagné and Spalding (2011) presented evidence that upon encountering compounds, people do not solely rely on the meaning of the constituents to arrive at an interpretation but also on knowledge regarding the combination process in general. For example, the notion that modified concepts tend to refer to a member of the head concept is a form of meta-knowledge as it does not directly follow from the specific constituents themselves, but through inference from experience with compounds.
In the absence of firsthand experience with category members of a compound concept, people may consequently rely on the information in the constituents. Indeed, Gagné and Spalding (2011) showed that people are likely to agree that features that are generally true for bottles are also true for brinn bottles, although the modifier concept brinn is completely unknown. On the basis of meta-knowledge on how concepts combine, people seem to infer that brinn bottles must be a subclass of bottles, and as such, they are comfortable attributing features of bottles to brinn bottles (albeit to a lesser extent, see Gagné & Spalding, 2011, 2014).
From a meta-knowledge perspective, one would expect the effect of the informative label to be stronger when the informative items were less familiar. It turns out that the item-level effect size of the informative label correlated –.26 (95% CI = [–.53, .06]) with judged familiarity and –.51 (95% CI = [–.72, –.22]) with log-transformed contextual diversity, two measures of how widely known a label is (Keuleers, Brysbaert, & New, 2010; Van Heuven, Mandera, Keuleers, & Brysbaert, 2014). In other words, informative items that are well known (e.g., jellyfish) tended to show smaller effects than lesser known items (e.g., crayfish). Analogous to Gagné and Spalding’s (2011) brinn bottles, when people have to judge the similarity between salmon and a fairly unknown concept called crayfish, they might rely on the information conveyed by the label and assume that the latter item is some kind of fish. As a result, people will infer that crayfish is presumably similar to salmon.
Conclusion
Concepts with informative labels were judged more similar to concepts related to the informative aspect by speakers of the informative language than by speakers of a language in which the label was not informative. Although more research is required for definitive answers, our planned and post hoc analyses were inconsistent with a difference at a representational level or an account relying on the mere activation of the informative constituent, but point to meta-knowledge being responsible for the present findings.
Supplemental Material
Supplemental material, Disclosure_form_QJE-STD_17-026.R2 for Is jellyfish more of a fish in English than in Dutch? The effect of informative labels by Farah M Djalal, Wouter Voorspoels, Gert Storms and Tom Heyman in Quarterly Journal of Experimental Psychology
Footnotes
Acknowledgements
F.M.D. is a lecturer at Bina Nusantara University. However, this project began during her doctoral years at the University of Leuven and was financed by the “Bijzonder Onderzoeksfonds” of the University of Leuven. T.H. is a post-doc of the Research Foundation – Flanders (FWO-Vlaanderen). W.V. is a post-doc at the University of Leuven. The data are stored and can be accessed on the Open Science Framework (osf.io/4ndmc).
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research described in the manuscript was sponsored by grants DBOF/12/010 and OT/10/024 from the Leuven Research Council.
