Abstract
This study is a cross-linguistic survey of terms for the ‘unique beginner’, defined as the highest and most inclusive rank in an ethnozoological taxonomy. Drawing on data from a world-wide sample of 149 languages, I show that terms corresponding to this category are often formally complex or characterized by polysemy. In addition, languages often lack a term for the unique beginner category altogether, confirming claims to this effect in the literature. Furthermore, I point out that the status of the unique beginner category and its lexical structure, in languages which have such a category, are positively correlated with mode of subsistence. Small-scale societies relying on hunting and/or gathering as the main mode of subsistence are likely to lack a term for the unique beginner, while those practicing advanced agriculture are the most likely to have a simplex unique beginner term not characterized by polysemy.
Introduction
Research in recent decades has uncovered some universal principles of how cultures use language to organize and categorize their surrounding flora and fauna (e.g., Berlin 1992; Berlin et al. 1973; Brown 1984). In the standard view, categorization is achieved by means of taxonomies, that is, by chains of concepts linked to each other by a relation of hyponymy. There are six ranks within ethnobiological taxonomies that have been identified by previous research: unique beginner, life-form, intermediate (a rarely occurring rank, whose status is somewhat controversial), generic, specific and varietal. Examples of terms corresponding to these ranks (excluding intermediate) from American English would be: plant (unique beginner), tree (life-form), oak (generic), white oak (specific), northern white oak (varietal) (Brown 2002:473). Note that each rank is included in the rank occurring above, and that this relationship is transitive: oak, occurring on the generic level, is immediately included in the higher life-form taxon tree, and by virtue of tree being included in plant, is also a hyponym of the unique beginner plant.
Work carried out from a historical perspective (e.g., Berlin et al. 1973) as well as research in cognitive psychology (e.g., Atran 1999) suggests that 1) not all of the ranks are of the same psychological salience and that therefore 2) many languages lack expressions for ranks with lesser salience or, if they have terms, these have a special status on the linguistic level, as compared to more basic ranks. This pertains in particular to the life-form and the unique beginner ranks (Berlin 1973; Brown 1986). These observations do not only apply to synchronic psychological saliency. Empirical data also imply a rather fixed diachronic sequence in which terms for higher ranks in ethnobiological taxonomies are acquired by languages. Assuming a psychological primacy of the generic rank, Berlin proposed in a seminal paper that “[t]he development of life-form names is most certainly subsequent to the appearance of generic names in the evolution of ethnobotanical nomenclature” (Berlin 1973:65–66) and that “[t]he last category to be lexically designated in the development of any ethnobotanical lexicon will be the unique beginner” (Berlin 1973:53).
The special status of these terms, both in terms of lower psychological salience and as historically recent additions to the lexicon, may be indicated linguistically in a variety of ways. First, it is common for terms on the life-form and unique beginner levels to be morphologically complex, that is, decomposable into smaller constituent parts. The term designating the unique beginner rank in Yoruba (Niger-Congo) is of this kind. Rather than employing a simplex lexical item that cannot be broken down into smaller meaningful elements, Yoruba uses
An important variable that is said to correlate with the absence of a term for unique beginners is societal complexity (Brown 1985). An often-made assumption is that small-scale societies characterized by a traditional mode of subsistence based on hunting and/or gathering are likely to lack a unique beginner term altogether (e.g., Berlin 1973:78; Brown 1985). As Berlin (1973) shows for the realm of ethnobotany, whenever terms for categories on higher taxonomic levels come into being, they are usually special in the sense outlined above. It should therefore be possible to show quantitatively that the unique beginner is often expressed by special terms, since such terms are likely to be relatively recent additions to the lexicon and psychologically less basic than terms for other ranks in ethnobiological taxonomies. This study tests this scenario specifically for ethnozoological classification and provides for the first time, as a resource for further research, a detailed survey of common lexical structures in unique beginner terms. A further goal is to detect any possible areal patterns in the semantics of unique beginner terms.
Surveying Terms for the Unique Beginner in Ethnozoology
Sampling
The data on which this study is based come from a larger database that has been created to investigate properties of the basic nominal lexicon across languages, including differences and similarities in the semantic structure of complex and polysemous expressions. To capture a maximum of linguistic diversity, sampling was carried out in the following fashion: one language for each family recognized in the classification of the World Atlas of Language Structures (Dryer 2005), one of the most recent classifications of the world's languages, was chosen as a representative of this family. To account for the greater time depth and presumably greater internal diversity of the larger families, one additional language for every 10th genus in a family was included in the sample (genus is a subordinate genealogical unit in Dryer (2005) roughly comparable to Germanic or Romance within Indo-European). Due to the lack of adequate lexical sources for some smaller families, it was not possible to find a suitable representative for every language family. The final sample consists of data from 111 languages. There are also 38 additional languages in the sample, for which data were gathered to randomly check intra-family variation. Thus, all in all, the sample provides data for 149 languages. Given the exploratory nature of this study, data for all 149 languages were used. For a complete list of the languages and sources, see Appendix A.
Defining the Unique Beginner for Practical Purposes
Defining the unique beginner category appears to be an easy task in theory, but in practice poses several problems when carrying out large-scale comparisons based on extant materials (Brown 1977). Dedicated ethnobiological literature on taxonomic hierarchies is limited in scope and is virtually unavailable for some regions of the world. In principle, what is sought is the most inclusive category in the ethnozoological folk taxonomy that immediately contains life-form categories such as ‘bird’ and ‘fish’. Tests for hyponymy as used in lexical semantics (cf. Cruse 1986) should therefore confirm a taxonomic relationship involving the conjectured unique beginner term. That is, to confirm the status of the unique beginner term, native speakers should accept translational equivalents of test sentences like birds are a kind of creature or fish are a kind of creature as valid and semantically normal statements. 1 However, such semantic tests can obviously not be performed with native speakers when a wide range of languages is under consideration.
Working with dictionaries is beneficial because it allows for consideration of a wide range of languages, but it has a serious drawback: lexicographers do not always include enough fine-grained information for relatively subtle semantic nuances to be apparent. For the purpose of this study, I therefore make the heuristic assumption that, unless further comments are provided to the contrary, terms presented in the source as having the meaning ‘animal’ or ‘creature’ are in fact lexical designations for the unique beginner category in that language. There are some reasons to be optimistic about this. First, the consulted dictionaries use a major European language (English, French, Spanish or Portuguese) as the semiotic system in which the meanings of the target language are described. French, Spanish and Portuguese have true unique beginner terms in the ethnozoological domain. For instance, in Spanish it is perfectly acceptable to say pájaros son animales or peces son animales. Second, lexicographers will have received a considerable amount of academic education, so that their conception of ‘animal’ is likely to be influenced by the scientific category of Animalia to a significant degree. For these two reasons it is likely that deviations from the scientific conception would be noted in the languages investigated, although potential errors cannot be excluded. The results presented in the following sections of this paper should therefore be considered preliminary. It is hoped that they will motivate further, more fine-grained research.
Results
Idiosyncratic Lexicalization of Semantic Features
Not only do many languages lack a term for the unique beginner category altogether; many also lexicalize, in seemingly idiosyncratic ways, meaning distinctions that cross-cut the taxonomic structure found, for example, in better-known European languages. For instance, Kaluli (Bosavi, Papua New Guinea)
No General Term
No general term for the unique beginner category could be identified from lexical sources for 29 of the 149 languages in the sample (19.5%). This may be due to one of two reasons: 1) the language in question indeed lacks a term, or 2) a term does exist, but it is not included in the dictionary. The latter possibility is unlikely since lexicographers whose mother tongue has a well-entrenched term for the unique beginner are likely to seek lexical equivalents in the target language. However, sheer lack of documentation cannot be ruled out entirely. Fourteen of the languages with no unique beginner term are languages of small-scale societies in New Guinea and Australia. In Eurasia, lack of a unique beginner term is found in Laz (Kartvelian). In the Americas, 12 languages in the sample lack a lexical designation for the unique beginner category.
Another recurrent phenomenon encountered world-wide is languages making a lexical distinction between ‘wild animals’ and ‘domestic animals,’ but lacking a general term that would qualify as a true unique beginner. For instance, in the Rendille language (Cushitic),
The Berik language (Tor, Australia-New Guinea) makes a lexical distinction between
Loanword
One strategy for acquiring a term for the unique beginner is borrowing from a contact language. In Africa, the donor language is typically Arabic; in the two Indian cases, it is Sanskrit (in Badaga via the neighboring Kannada language). Borrowing of a unique beginner term occurred in 14 languages of the sample (9.4%), excluding the two creole languages Bislama and St. Lucian Creole French (Table 1, Figure 1).

Figure 1. Areal distribution of loanwords for the unique beginner.
Table 1. Languages with loanwords for the unique beginner rank.
While many of these terms are relatively recent borrowings resulting from contact with European languages in the era of colonization, in a number of cases borrowing may have taken place much earlier: Basque and Sora, for instance, are likely to have acquired their unique beginner term approximately two millennia ago, 2 and the fact that these terms are still detectable as loanwords at all is due entirely to the lucky circumstance that we have significant knowledge of the donor languages. Given the potentially old age of these terms, they are likely to be well entrenched in the lexicon of the target languages. That is, loanwords need not necessarily be good indicators of low psychological saliency, especially if they have been present in the borrowing language for a considerable time.
Taxonomic Ambiguity
By taxonomic ambiguity, I mean the possibility of expanding the denotational range of a life-form or generic level term, usually of high cultural importance, to a higher taxon, in this case to the unique beginner level (see speculations to this effect in Berlin 1973:82). Taxonomic ambiguity occurs in a total of 14 languages in the sample (9.4%; Figure 2). Ambiguity with a variety of terms on the generic level occurs in four of these languages (Table 2). It is notable that the narrow meanings at the generic level all denote animals that are culturally or otherwise salient (i.e., large mammals that are typical in the regions of the world where the languages are spoken).

Figure 2. Areal distribution of taxonomic ambiguity in unique beginner terms.
Table 2. Languages with taxonomic ambiguity (generic level) for the unique beginner rank.
Another type of taxonomic ambiguity is with a term on the life-form level, namely ‘bird,’ which occurs in seven languages of the sample. This phenomenon is particularly common in Oceanic languages (Fijian, Kapingamarangi, Samoan and Rotuman), but it also occurs in three languages of Meso- and South America. For Oceania, it seems very likely that the absence of larger mammals on small and isolated islands plays a key role in the emergence of this pattern of polysemy. For instance, on Samoa there are no indigenous mammals other than three species of bat, including two of flying-fox (Garden 2005:320). In contrast, the biological diversity of birds is extremely high in the area, making birds the animals par excellence. The common term for ‘bird, animal’ is
Another life-form polysemy is found in two Eurasian languages, Kildin Saami, spoken on the Kola Peninsula, and Nivkh, spoken in the Russian Far East, as well as in Hawaiian. These languages have unique beginner terms that also have a narrower reading as ‘insect, bug.’ The motivation for this phenomenon remains unclear, as insects intuitively do not constitute the most salient life-form, either in terms of perception or function. It is likely that these structures reflect Brown's (1984:16) “wug,” a category encompassing small creatures that is defined by Brown as comprising “insects and other small creatures such as spiders.”
Polysemy
This section surveys recurrent polysemous structures found in terms that designate the unique beginner rank; details are provided in Figure 3. Eight languages in the sample (5.4%) metonymically extend a term for ‘meat’ to also designate the unique beginner rank (see also Brown 1984:72). Semantically, this extension is natural, as animals provide a source of meat for human consumption, and this is one of the most important aspects from the point of view of humans. In this sense, this type of superordinate can be viewed as a collective superordinate in the sense of Wierzbicka (1984). This polysemy is particularly common in Africa (four of the eight languages with this polysemy are spoken there).

Figure 3. Areal distribution of recurrent patterns of polysemy in unique beginner terms.
Seven languages in the sample (4.7%) use a word meaning ‘thing’ to also express the ethnozoological unique beginner rank. Under the assumption that ‘thing’ is an even more inclusive taxonomic rank than the unique beginner, one might conceptualize this type of polysemy as one of taxonomic ambiguity as well. Four of the sampled languages are located in South America. For the Meyah language (East Bird's Head, Papua New Guinea), Gravelle (2004:375) notes, “Meyah also lacks a single lexical item for ‘animal’. Instead, they use the phrase ‘things that live in the forest’, or they use specific terms for the kinds of animals.” The Bwe Karen (Sino-Tibetan) term
Furthermore, there are four languages (2.7%) in the sample that use ‘livestock’ as the unique beginner. The correlation with societal type is, as one might expect, perfect: all four languages are spoken in societies whose economy is based to a significant extent on pastoralism.
Complex Terms
Morphologically complex terms for the unique beginner rank, which are discussed in this section, are also common in the world's languages. Figure 4 provides more detailed data and maps showing the geographical distribution of each strategy.

Figure 4. Areal distribution of recurrent complex terms for the unique beginner rank.
One cross-linguistically common strategy to designate the unique beginner rank is to form a complex term based on the notions ‘alive’ or ‘animate’. For instance, Nez Perce (Penutian, Sahaptian)
A conceptualization based on the fact that animals, as opposed to other elements of the environment, can be killed is found in two languages of North America, Haida (Isolate) and Oneida (Iroquoian). Given the great geographical distance between the areas where these languages are spoken (the Queen Charlotte Islands and the Great Lakes), it is likely that these are independent developments.
Complex terms based on verbs meaning ‘to move’ occur in two areas of the world, the Plains of North America and East and Southeast Asia, and in both cases are likely to be due to calquing on a local scale. In North America, the association is found in Blackfoot (Algic, Algonquian) and Lakhota (Siouan). The Blackfoot term
In New Guinea, it is common to use a dvandva compound to express the unique beginner rank. For instance, in Takia (Austronesian, Oceanic) the word designating the unique beginner rank is
Finally, a conceptualization of the unique beginner category on the basis of a verb meaning ‘to walk’ is found in Hawaiian (Austronesian) and Great Andamanese, an indigenous language of the Andaman Islands.
Other Patterns
Table 3 lists languages with other special terms, including information on their linguistic affiliation and glosses of the terms.
Table 3. Languages with other special terms for the unique beginner rank.
There are also a number of languages with semi-analyzable terms, which are not included in the quantitative analysis and further discussion. However, it is still worth reporting on the semantic structure of these terms (Table 4).
Table 4. Languages with semi-analyzable terms for the unique beginner rank.
Quantitative Evaluation
Twenty-nine languages in the sample (19.5%) have no identifiable term for the unique beginner, and there are 72 instances of special terms for the concept, including the ones that occur only once in the sample. However, this is not the same as saying that there are also 72 languages with a special term, for two reasons. First, one language may feature two terms, only one of which has special characteristics. Since without further fine-grained data there are no a priori reasons to assume a higher saliency for either of the terms, such languages might be provisionally and informally counted as having 0.5 special terms. Second, some languages may have been counted twice in the above analysis since they feature several special terms. For instance, Yoruba has a term with ‘meat’-polysemy,
Correlations with Societal Type and Size of the Speaker Community
Comparing societies according to their relative socio-cultural development necessarily involves significant simplification. It is a procrustean enterprise, given that no society will be precisely identical to any other society in all respects. Cross-societal research carried out in the 1950s and 1960s (e.g., Marsh 1967; Naroll 1956) sought to reduce the individual differences along which cultures may vary by conflating a large variety of parameters into a single figure that was meant to index the degree of societal development. Figures taken from these studies have sometimes been used in the ethnobiological literature (e.g., Witkowski et al. 1981) to check for correlations between particular aspects of the vocabulary of languages and the type of society in which they are spoken. For this study, a somewhat different approach was chosen that allows particular aspects of societal organization to be examined more specifically. Murdock and White (1969:354–368), although rooted in the research tradition mentioned above, provide significantly more detailed data for each of the 186 cultures in their sample, indicating main and subsidiary modes of subsistence, level of political integration, and prevailing descent rules. Values for mode of subsistence were extracted from this source for each language in the present sample associated with one of the cultures in Murdock and White's sample.
To incorporate more languages in the statistical search for correlations with societal type, data on the mode of subsistence for additional cultures from this study were extracted from the Encyclopedia of World Cultures (Levinson 1991) and were coded in the same way as in Murdock and White (1969). Data are available for the majority of sample languages, but not for all. Statistical testing was therefore carried out only for the subset of the languages for which cultural data from either Murdock and White (1969) or Levinson (1991) were available. Cultures were grouped according to their main mode of subsistence: (1) hunting, gathering or fishing, (2) horticulturalism or pastoralism, or (3) advanced agriculture. Grouping is complicated by the fact that some language communities may rely to the same degree on several different subsistence modes. For instance, the Toaripi of New Guinea engage in both hunting and gathering and horticulture. To accommodate such ambiguities, societies that obtain at least as much subsistence from mode 1 as from mode 2 or mode 3 were grouped in mode 1, and societies that obtain more subsistence from mode 2 than from mode 1 and at least as much as from mode 3 were included in mode 2.
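The tie-breaking rule just described can be written down as a small decision procedure. The following Python sketch is only an illustration under stated assumptions: the function name, the argument names, and the idea of expressing each mode's contribution as a number on a common scale are hypothetical, and assigning all remaining cases to mode 3 is an inference from the text rather than something it states.

```python
def main_subsistence_mode(foraging, horticulture_pastoralism, agriculture):
    """Return the subsistence group (1, 2 or 3) for one society.

    Arguments are the relative contributions of each mode on any common
    scale (hypothetical coding; the source data use their own codes).
    """
    # Mode 1: at least as much from hunting/gathering/fishing as from
    # either of the other two modes (ties go to mode 1).
    if foraging >= horticulture_pastoralism and foraging >= agriculture:
        return 1
    # Mode 2: strictly more from horticulture/pastoralism than from
    # foraging, and at least as much as from advanced agriculture.
    if (horticulture_pastoralism > foraging
            and horticulture_pastoralism >= agriculture):
        return 2
    # Assumption: everything else counts as advanced agriculture.
    return 3


# Invented example values, not taken from the sample:
print(main_subsistence_mode(5, 5, 0))  # equal reliance on modes 1 and 2
print(main_subsistence_mode(1, 2, 7))  # predominantly agricultural
```

Under this reading, a society with equal reliance on foraging and horticulture, as described for the Toaripi, would fall into mode 1.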
There are a number of further adjustments that had to be made for a meaningful statistical analysis. First, statistical analysis requires independence of the data to be evaluated, which is not the case if data come from genealogically related languages. To satisfy the condition of independence, only one language per family was included in the statistical evaluation. Languages for which data were not included in the statistical analysis appear in italics in Appendix A. A further issue is that some languages have several terms for the unique beginner, and these obviously need not all be of the same status. For instance, Hausa has
Distribution of types of unique beginner terms according to mode of subsistence.
Statistical evaluation of this distribution yields a statistically significant result (χ2 test with p-value obtained by Monte Carlo simulation with 2000 replicates: p = 0.01899; Cramér's V = 0.3808). To assess in more detail where this significance comes from, it is instructive to look at the difference between the observed cell frequencies and the values expected under the null hypothesis, namely that the types of unique beginner terms are randomly distributed over modes of subsistence. The most dramatic deviations are in boldface in Table 6.
Table 6. Deviations of frequencies for each cell from expected values under the assumption of no correlation between mode of subsistence and the lexical encoding (or lack thereof) of the unique beginner rank.
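For readers who wish to reproduce this kind of evaluation, the following Python sketch shows one way to compute a χ2 statistic, Cramér's V, a Monte Carlo p-value, and the deviations of observed from expected cell frequencies for a contingency table. It is a minimal sketch, not the procedure actually used for this study: the counts in `example_table` are invented for illustration (the real frequencies are those in the table on the distribution of term types), all function names are hypothetical, and the simulation permutes column labels with both margins held fixed, which is an assumption about how the replicates were generated.

```python
import random


def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square statistic (assumes no empty row or column)."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))


def cramers_v(table):
    """Cramer's V effect size for an r x c table."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return (chi_square(table) / (n * k)) ** 0.5


def monte_carlo_p(table, replicates=2000, seed=0):
    """Share of random tables with the same margins whose statistic is
    at least as large as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    rows = [i for i, row in enumerate(table) for _ in range(sum(row))]
    cols = [j for j, col in enumerate(zip(*table)) for _ in range(sum(col))]
    observed = chi_square(table)
    hits = 0
    for _ in range(replicates):
        rng.shuffle(cols)  # break any row/column association
        sim = [[0] * len(table[0]) for _ in table]
        for i, j in zip(rows, cols):
            sim[i][j] += 1
        if chi_square(sim) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (replicates + 1)


def deviations(table):
    """Observed minus expected frequency for every cell."""
    exp = expected_counts(table)
    return [[o - e for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, exp)]


# INVENTED counts: rows = subsistence modes 1-3,
# columns = no term, special term, simplex term.
example_table = [[10, 8, 2], [3, 12, 5], [1, 9, 10]]
print(cramers_v(example_table), monte_carlo_p(example_table))
print(deviations(example_table))
```

Large positive or negative entries in the output of `deviations` correspond to the cells that contribute most to the test statistic.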
There is a strong skewing with respect to the lack of a unique beginner term and societal type. Absence of a unique beginner term occurs almost exclusively in hunter-gatherer societies, the exception being the Laz people, who practice agriculture but lack a lexical designation of the unique beginner rank. In addition, simplex, non-special terms for the unique beginner are strongly associated with agriculture-based subsistence; this association is the second main contributor to statistical significance.
In addition, data from Lewis (2009) on the size of the speech communities of the sample languages were assembled. Size of the speech community and mode of subsistence are, of course, not independent variables. In hunter-gatherer societies, the family is the primary economic and social unit, which, together with the low population densities typical of hunter-gatherers, leads to relatively small speech communities. Societies of horticulturalists tend to be significantly larger, but are still smaller than societies with advanced agriculture (cf. Johnson and Earle 2000:246). To test for correlations with both subsistence mode and size of the speech community thus means approaching the same basic issue from two different but related angles. Testing for size of speech community also allows for the inclusion of more sample languages in the statistical analysis, as speaker numbers for all languages are provided in Lewis (2009). Additionally, a match in the results of both tests can be taken as a further indication of the validity of the correlations found. Extinct languages (Tasmanian, Biloxi, Upper Chehalis, Ineseño Chumash, Wappo and Tehuelche) were excluded from analysis. Where Lewis (2009) gives a range instead of a precise number of speakers in a speech community, the mean of the range was used. Table 7 provides the geometric mean of the size of the speech community for languages without a lexical designation of the unique beginner, with special terms, and with simplex terms (for the precise figures for individual languages that are included in the statistical analysis, see Appendix A).
Table 7. Geometric mean of speaker size of languages with no, special and simplex unique beginner terms.
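The computation behind these group figures can be sketched as follows in Python. This is an illustrative sketch only: the function names and the representation of a speaker range as a `(low, high)` tuple are hypothetical, and the numbers in the example are invented rather than taken from Lewis (2009).

```python
import math


def community_size(entry):
    """Return one speaker figure; a (low, high) range is replaced by its mean."""
    if isinstance(entry, tuple):
        low, high = entry
        return (low + high) / 2
    return float(entry)


def geometric_mean(entries):
    """Geometric mean, computed as the exponential of the mean logarithm.

    All sizes must be positive, which is one practical reason why extinct
    languages (zero speakers) cannot enter this calculation.
    """
    sizes = [community_size(e) for e in entries]
    return math.exp(sum(math.log(s) for s in sizes) / len(sizes))


# INVENTED speaker figures for one group of languages:
print(geometric_mean([350, (1000, 3000), 120000]))
```

The geometric mean is less sensitive than the arithmetic mean to a few very large speech communities, which matters when speaker numbers span several orders of magnitude.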
Table 7 reveals that, on average, the size of the speech community increases in step with the order in which unique beginner terms are thought to come into being: languages with no unique beginner terms have on average the smallest number of speakers, languages with special terms are spoken by somewhat more people, and simplex unique beginner terms occur in languages with even more speakers. Statistically, though, no clear correlation emerges. With respect to the size of the speech community, the difference between languages that lack a unique beginner term and those with special terms is almost nonexistent (p = 0.9638, Wilcoxon Rank Sum Test). There is a tendency, which does not reach significance at the conventional 0.05 level, for languages with simplex terms to be spoken by larger societies when compared with languages with special terms (p = 0.09537, Wilcoxon Rank Sum Test). When languages with no unique beginner term and those with special terms are grouped together (given that the internal difference between those is almost negligible), this tendency becomes somewhat stronger but still falls short of significance (p = 0.07565, Wilcoxon Rank Sum Test). Mode of subsistence, therefore, appears to be the more powerful predictor of a language's lexical realization of the unique beginner rank.
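For illustration, a two-sided Wilcoxon rank sum comparison of two groups of speaker numbers can be sketched in Python as below. This is a sketch under stated assumptions, not a reconstruction of the analysis reported here: it uses the normal approximation with tie and continuity corrections (statistical packages may instead compute exact p-values for small samples without ties), the function name is hypothetical, and the speaker figures are invented.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test.

    Returns (U statistic for x, approximate p-value) using the normal
    approximation with tie correction and continuity correction.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values tied: no evidence of a difference
        return u, 1.0
    z = max(abs(u - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    return u, math.erfc(z / math.sqrt(2))  # two-sided tail probability


# INVENTED speaker numbers for two groups of languages:
special = [400, 2500, 12000, 800, 60000]
simplex = [30000, 150000, 9000, 2000000, 500000]
print(rank_sum_test(special, simplex))
```

Because the test works on ranks rather than raw values, it is not distorted by the extreme skew typical of speaker numbers.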
Conclusion
The results presented in this paper provide statistical support for the assumed association of languages without a term for the unique beginner level with traditional societies. In particular, the results suggest a correlation between the lack of a lexical designation for the unique beginner category and a mode of subsistence that relies primarily on hunting and gathering, and between the presence of a simplex term with no other function than to designate the unique beginner rank and agricultural societies. The size of the speaker community appears to play a subordinate role. As mentioned, the results of this investigation are preliminary. Further qualitative research may shed light on why traditional societies typically can do without a unique beginner term, and which societal developments play a role in the innovation of a lexical designation of the unique beginner category.
Acknowledgments
I am indebted to Cecil H. Brown, Bernard Comrie, Cliff Goddard, and the referees for this paper for helpful comments and useful suggestions, as well as to Heriberto Avelino for help with the Spanish abstract. I am also very grateful to Johanna Mattissen for providing unpublished field data for Laz, to Mark Donohue for unpublished lexical data for One and Yei, and to Tonya Stebbins and Julius Tayul for making a pre-publication version of their Mali dictionary available to me.
Footnotes
1. There are problems with English animal, which, as a folk category in American English, means ‘mammal’ for some speakers and is thus no designation for the unique beginner rank in the strict sense (cf. Wierzbicka 1984:315 and Brown 2002: 474–475 for discussion).
2. For Basque, this is suggested by the fact that the term was borrowed with the Latin neuter plural suffix. Neuter gender disappeared early in the development of the Romance languages (the plural of Spanish animal is animales). Early contact between Munda languages and Sanskrit is documented by loanwords from Munda languages into Sanskrit (Burrow 1946).
Appendix A. Sample languages and consulted sources.
