Abstract
The incompleteness thesis in cultural anthropology holds that the human mind is incomplete apart from culture, based on the rationale that the human brain and culture developed in interaction with each other, or “coevolved.” In evolutionary biology this process is known as the Baldwin effect, a form of natural selection in which individuals' innovative behavior creates novel selection pressures leading to genetic support or assimilation of that behavior. While the nature of these selection pressures is still controversial, as reflected in recent evolutionary accounts of language and tools, the Baldwin effect does not imply that evolution left the mind “unfinished” or rule out the possibility of psychic unity. Two basic approaches to the coevolution of mind and culture can be distinguished, according to how the human-made niche is defined. Informational coevolution is based on the idea that cultural artifacts and practices consist of information, such that the coevolutionary process produces cognitive mechanisms for acquiring and manipulating such information. Semiotic coevolution is based on the idea that culture consists of the meaning-making activity of the individuals who partake in it, such that the coevolutionary process produces the capacity to generate and experience such meaning. Although Evolutionary Psychology reflects the informational approach, the semiotic approach fits better with basic evolutionary principles.
The incompleteness thesis in cultural anthropology states that, because the human mind evolved in tandem with—rather than prior to—the emergence of culture, it is “incomplete” without the guidance of cultural patterns (Geertz, 1973; Schwartz, 1992). This evolutionary notion underpins a basic tendency in interpretive studies to view culture as an “external control system” (Geertz, 1973), as an abstract and disembodied system of symbols with considerable power over human thought, and to downplay the role of individual agents in producing or manipulating those symbols. If the mind has no proper structure or function apart from symbols, after all, it must be the symbols themselves that define human mentation in all its diversity. Within cultural psychology, the incompleteness thesis shapes the view that higher-order mental functions are inseparable from cultural meanings (Miller & Schaberg, 2003), and that research should therefore focus on cultural diversity in thought and emotion rather than on innate processes or psychic unity (Shweder, 1990).
The incompleteness thesis was inspired by Geertz's (1973) observation that humans are born as “incomplete or unfinished animals” in desperate need of symbolic codes and patterns to orient themselves to the world (p. 49). Early hominins, Geertz argued, began to invent culture long before some hypothetical “critical point” of biological development. Hominins of the Lower Pleistocene used crude tools and an assortment of communicative signals, but their brains were only about one-third the size of those of modern Homo. In fact, this early tool use and communicative activity itself spurred cognitive and cortical development, by changing the environment to which these early hominins had to adapt. Enhanced cognitive abilities, in turn, led to more advanced tools and social activities, giving rise to an evolutionary “feedback loop” between culture and mind.
In evolutionary biology this coevolutionary process is known as the Baldwin effect, a form of natural selection in which organisms, by dint of their own flexible and intelligent behavior in response to adaptive challenges, influence the selection process in such a way as to bring about genetic support of that very behavior. More simply, Baldwin's idea was that learning guides genetic change. Yet this idea raises certain questions, one of which is the nature and extent of such change. While the Baldwin effect is often associated with the genetic “assimilation” of behavior, in which learned patterns give way to genetic control, there is nothing about the principle that makes this inevitable. Indeed some versions of the Baldwin effect emphasize genetic “accommodation” rather than “assimilation,” claiming that cultural innovations such as language and tools create pressure for adaptations that enhance rather than diminish phenotypic plasticity and the capacity to learn (Deacon, 1997; West-Eberhard, 2003).
This is the approach to which Geertz tacitly appeals, but for him genetic accommodation to culture resulted in so much plasticity as to render the mind shapeless and underdeveloped, bereft of intrinsic adaptive features. He thus misconstrues the reciprocal nature of the Baldwin effect as an evolutionary process. If mind lacks a coherent design apart from culture, how did culture get invented in the first place, how is it learned and transmitted, and how does it change? The Baldwin effect is a form of natural selection, and natural selection is a process that enhances the fitness of organisms by crafting traits with proper structures and functions. Coevolution is still evolution, and as a product of coevolution the mind is an adaptive trait in its own right.
The Baldwin effect: Background and recent developments
Although named after the American psychologist James Mark Baldwin, the idea of the Baldwin effect was proposed independently by Baldwin (1896), Osborn (1896), and Lloyd Morgan (1896), all of whom wished to preserve a role for learning and conscious choice in the evolutionary process. All three theorists were responding to findings by Weismann (1891) that seemed to rule out any form of Lamarckian inheritance, or the direct genetic transmission of traits acquired during one's lifetime. The alternative they came up with was that learned behavior could indeed produce genetic change, not directly, by altering the genetic material passed to one's offspring, but indirectly, by increasing the survival and reproduction rates of individuals and their descendants, thereby fostering hereditary changes that support the learned trait.
Baldwin and Lloyd Morgan took as an archetype of their idea the development of animal instinct from learned behavior. In the face of an environmental problem, individual animals with enough plasticity and intelligence to find a solution through innovative behavior—or to learn such behavior from others—are at an advantage. At the outset natural selection favors such plasticity and the ability to learn, but learning has its costs: it takes up time and energy, and offers no guarantee that the behavioral adaptation will be transmitted correctly. Over time, then, natural selection favors individuals who can acquire the new behavior more easily and quickly, with less learning. Eventually, as in the Lamarckian scheme, the learned behavior yields to instinctual control, although this process takes place over the course of many generations rather than just one.
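The selective logic of this scenario can be made concrete with a toy calculation. The sketch below is an illustrative assumption rather than a model drawn from Baldwin or Lloyd Morgan: it supposes that a behavior has ten components, that each component an individual must learn carries a fixed fitness cost, and that selection acts deterministically on variation already present in the population. The function names and all parameter values (`benefit`, `cost_per_item`, the number of generations) are hypothetical.

```python
def fitness(innate, total, can_learn, benefit=1.0, cost_per_item=0.05):
    """Fitness of an individual with `innate` of `total` behavior components built in."""
    missing = total - innate
    if missing == 0:
        return 1.0 + benefit  # behavior fully innate, so no learning cost
    if can_learn:
        # The remaining components are learned, each at a cost in time and energy.
        return 1.0 + benefit - cost_per_item * missing
    return 1.0  # without learning, an incomplete behavior yields no benefit


def select(freqs, total, can_learn, generations):
    """Deterministic selection on standing variation (no mutation, no drift)."""
    for _ in range(generations):
        weighted = [f * fitness(k, total, can_learn) for k, f in enumerate(freqs)]
        mean_w = sum(weighted)
        freqs = [w / mean_w for w in weighted]
    return freqs


def mean_innate(freqs):
    """Average number of innate components in the population."""
    return sum(k * f for k, f in enumerate(freqs))


TOTAL = 10
# Equal shares of types with 0..9 innate components; no one has all 10 yet.
start = [1.0 / TOTAL] * TOTAL + [0.0]

with_learning = select(start, TOTAL, can_learn=True, generations=200)
without_learning = select(start, TOTAL, can_learn=False, generations=200)

print(round(mean_innate(start), 2))
print(round(mean_innate(with_learning), 2))
print(round(mean_innate(without_learning), 2))
```

Under these assumptions, learning gives partial genetic support a graded payoff, so the population shifts toward types that need less learning. Without learning, no type short of the complete behavior has any advantage, and the frequencies do not move. The sketch leaves out mutation, finite population size, and the cost of plasticity itself.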
For its proponents the Baldwin effect thus suggested a role for learning in the evolutionary process. Its affinity to Lamarck's model, even at one remove, nonetheless proved a source of controversy within 20th-century evolutionary theory, known as the Modern Synthesis, which emphasized genetic variation due to random mutation as the basis of selection. In an influential critique, Simpson (1953) acknowledged that learning could influence natural selection in some cases, but argued that such influence would be difficult to verify given the possibility that a trend toward genetic control could result merely from an altered expression of existing genes rather than a change in the genotype. Mayr (1963) and Dobzhansky (1970) took this critique a step further, asking why, if learning is so effective in allowing individuals to acquire behaviors from earlier generations, any selection pressure for fixity and genetic assimilation would arise in the first place. Presumably such pressure might occur when the benefits of an automatic response outweigh the costs, as in cases of extreme environmental stress. Yet to skeptics of the Baldwin effect, bartering away the flexibility of learned adaptations for the fixity of genetic ones seems a dubious evolutionary bargain, and paradoxically would undermine the very plasticity on which the process depends: learning leads to genetic change, but genetic change leads to a loss of learning. So conceived, the Baldwin effect has a built-in tendency to extinguish the very learning and flexibility that make it possible.
Even if some form of genetic assimilation in the presence of learning can be documented, moreover, this outcome does not necessarily mean that learning was a causal factor. In Baldwin's original formulation, as Godfrey-Smith (2003) pointed out, the learning stage is not essential to the eventual outcome of genetic assimilation, which might just as easily result from normal processes of natural selection. For Baldwin (1896) the learning stage served as a buffer against environmental pressures, keeping the organism's lineage alive long enough to allow natural selection to accumulate genetic variations in support of the behavior. Yet few adaptive problems involve such extreme, “life-or-death” consequences; even those individuals who do not learn the adaptive skill can reproduce and pass on mutations conducive to it, thereby altering gene frequencies. The key point is that the selective environment already contains the pressures that make the mutations useful for survival. These pressures make it possible for natural selection to achieve the same genetic change the Baldwin effect achieves, but without an intervening stage of learning by members of the population. The possibility of such an outcome in turn makes it difficult to prove that learning affects the selection process, or that the Baldwin effect is an independent evolutionary factor.
Given that the very definitions of “learning” and “selection process” are still undecided, however, the debate about the Baldwin effect is far from over. A more recent interpretation, based on the evolutionary concept of niche construction, focuses on how learning guides natural selection by altering the selection pressures themselves. As explained by Godfrey-Smith (2003), individuals not only cope with the adaptive niche through learning and plasticity, they transform it, generating selection pressures that did not exist before. In contrast to the original formulation, in which an activity, preference, or skill is favored by selection pressures “out there” from the start, the niche construction approach holds that these behaviors, once widely adopted, create a new adaptive niche, with new adaptive challenges, and the genetic response is to this newly constructed niche rather than to the original setting in which the learned capacity arose. Natural selection will favor individuals best able to cope with the learning demands imposed by this novel niche, a selection process that exposes or “unmasks” preexisting genetic variation in the population and need not wait for additional mutations (Deacon, 2003; Dor & Jablonka, 2001). The development of language and tools among early hominins created just such a demand for learning. Gene combinations supporting language and tool use could not have served this function before the Pleistocene; they became selectively relevant because these learned behaviors had transformed the niche in a fundamental way.
The niche construction approach to the Baldwin effect is related to but distinct from the more general concept of niche construction as a model of adaptation (Laland, Odling-Smee, & Feldman, 2000; Lewontin, 1982). In his critique of adaptation as an accommodation or “fit” to a preexisting niche, Lewontin (1982) argued for a concept of “mutual construction” whereby organisms actively choose and modify the environments they inhabit. Building on this idea, Laland et al. (2000) defined niche construction as a ubiquitous adaptive process in which the choices, activities, or innovations of organisms, by altering their adaptive niches, generate “feedback” in the form of altered selection pressures and genetic change in future generations of those organisms. Beehives, spider webs, bird nests, beaver dams—all typify niche construction, and in every case the organism evolved new traits in response to the need to maintain, defend, or regulate its modified niche. Oft-cited examples among humans include the development of lactose tolerance among populations that domesticated goats and cattle, and the selection of malaria-resistant blood proteins (sickle-cell hemoglobin) among populations whose farming methods led to an increase in breeding areas for mosquitoes (Durham, 1991).
Although these examples involve novel selection pressures stemming from human-made changes to the adaptive niche, they do not involve the genetic encoding of the learned behaviors in question (e.g., herding goats or planting crops). Coevolution due to niche construction, in other words, may have broad genetic consequences, by no means limited to the assimilation or support of the particular behaviors on which such construction is based. The Baldwin effect can be understood as a special case of niche construction in which selection pressures do act on learned capacities, and do so because such capacities, once entrenched in the social ecology of the population, themselves form the selection pressure; they become so integral to survival that individuals lacking them suffer a loss of fitness. Whereas in the original formulation learning helps individuals cope with a preexisting, “given environment” (Baldwin, 1896, p. 553), the niche construction approach holds that the “given environment” of natural selection is itself modified by learned behavior. Learning is thus instrumental rather than incidental to the eventual genetic change.
A further departure of the niche construction approach to the Baldwin effect concerns the extent to which learned behavior is made innate. Although niche construction is found in a broad range of species, including both animals and plants, human niche construction is exceptional in the breadth and creativity of its technological, behavioral, and symbolic elements (Kendal, Tehrani, & Odling-Smee, 2011). Adaptive behaviors are adaptive in the context of culturally grounded knowledge about the world (conventions, narratives, beliefs) and thus depend on the transmission and learning of this knowledge as it accumulates across generations. The heterogeneity and changeability of human culture thus make it difficult for natural selection to fix adaptive behaviors genetically. Innovations such as language and stone tools are not singular, static constructions like bird nests or beaver dams; they are tools of adaptation whose behavioral applications are diverse, open-ended, and better served by flexibility than by fixed action patterns. Human niche construction, such as using a hand axe to prepare a carcass or a word to point the way to water, begets further niche construction. Carrying hand axes for scavenging, for example, creates the need for containers such as skin pouches, and for new tools such as scrapers with which to make the pouches. Pouches make it easier to travel long distances in search of food, which heightens the need for planning and communication.
In the context of hominin evolution, then, the niche construction approach to the Baldwin effect identifies the capacity for cultural learning, rather than specific learned behaviors, as the locus of adaptive pressure, and further claims that this pressure favors behavioral plasticity and the enhancement of learning rather than assimilation and genetic control. The result is not the ill-formed, ineffectual plasticity of Geertz's model; plasticity, to be adaptive, entails the ability not only to acquire social knowledge, but also to comprehend one's day-to-day, moment-to-moment circumstances in terms of it, to evaluate events for their relevance to survival, and to act accordingly. Flexible behaviors in turn generate new problems, not all of which social learning provides a solution for, and this means that phenotypic plasticity, to be adaptive, must not only be discriminating but inventive. Genetic variants underlying such plasticity are different from, but no less substantial or complex than, those underlying behavioral fixity; they serve to support and stabilize learning, thinking, and inventiveness, in essence making plasticity more reliable, an adaptive outcome described by West-Eberhard (2003) as “genetic accommodation.”
In fact, as West-Eberhard (2003) and others have shown, Baldwin himself recognized that the selection effects of learned behavior might lead to increased rather than decreased intelligence and phenotypic plasticity. Baldwin (1902) viewed consciousness as the basis of this plasticity and intelligence, as a discrete variation whose function is to enable individuals to “profit by experience” and to inherit the social patterns and knowledge of past generations (p. 39). The need to accommodate oneself to these social patterns, Baldwin held, itself has evolutionary consequences, leading to a “steady refinement of plasticity and its accompanying intelligence” (1902, p. 38). Despite these hints at a niche construction approach, however, Baldwin's thinking remained tied to the orthodox view of natural selection as an autonomous process grounded in external, stable environments, and he did not pursue the idea of social behavior as a selection force or the question of how culture might have shaped mind in a coevolutionary relation spanning millions of years. The niche construction approach, by acknowledging that behavioral solutions to adaptive problems modify the context of natural selection, opens the way to an understanding of how higher mental capacities among humans could have been selected for their adaptive value.
Yet understanding the influence of culture on human evolution ultimately hinges on understanding what “culture” is. For all the importance of cultural knowledge to human survival, its impact as a selection force depends on the nature of knowledge itself. Two basic approaches to the coevolution of mind and culture can be distinguished according to how they answer this question. The informational approach, represented by Evolutionary Psychology, holds that cultural artifacts and practices consist of information, defined in computational terms as objective stimuli from the environment that result in input to the organism. Tasks of survival imposed by the human-made niche during the Pleistocene were tasks of information processing (Tooby & Cosmides, 1992). The semiotic or meaning-centered view, by contrast, holds that the diverse innovations of the Pleistocene possessed representational power, and thus exerted adaptive pressure on the evolving hominin mind, only by virtue of the meanings they had, as defined by collectively shared frames of reference. Tasks of survival imposed by the human-made niche were tasks of meaning-making, such that the coevolutionary process produced the capacity to generate and experience such meaning.
Contrary to the incompleteness thesis of cultural anthropology, both of these approaches agree that the coevolution of mind and culture produced fully fledged and distinctly human mental adaptations. Both agree, consistent with the niche construction approach to the Baldwin effect, that social and technological innovations during the Pleistocene introduced selection for behavior that would not have been adaptive before. These approaches disagree, however, about the nature of the human-made niche and the kind of selection pressure it created. This is not a dispute about the gross anatomical changes wrought by these pressures on the hominin body and brain. The human brain is three times larger than what would be predicted for a primate of our size (Passingham, 1982), and both coevolutionary approaches concur that these proportions reflect the selection force of culture. Rather, the dispute is about what kind of mental activities were selected for, and thus what kind of mind this enlarged brain supports.
The Baldwin effect and language
One way to address this question is to consider specific forms of hominin niche construction during the Pleistocene and the adaptive challenges they posed. Language is the quintessential example. A few theorists, most notably Chomsky (1988), have argued for a “discontinuity” approach to language evolution, according to which language is fundamentally unlike prelinguistic communication among nonhuman primates and therefore must have developed abruptly, due to a singular, chance mutation in the brain of an early hominin or to the “co-option” of a preexisting structure designed for another purpose, such as navigation (Hauser, Chomsky, & Fitch, 2002). For “continuity” theorists, by contrast, its very complexity means that language must have evolved from earlier prelinguistic capacities through a process of gradual, adaptive change in response to environmental pressures. But this raises the question: what were these environmental pressures? If early hominins broke away from a communicative pattern of calls, grunts, and gestures that had met their needs for millions of years, a pattern that continues to suffice for other great apes to this day, there must have been a change in the conditions of survival that made the transition to language adaptive.
The answer of the continuity theorists is that this selection pressure came not from the preexisting or “given” environment, but from a modification to the environment introduced by the hominins themselves. A Baldwinian approach suggests that a simple set of words, or protolanguage, could have developed among early hominins without any new adaptations to support it, and once this communicative behavior spread through the population it began to exert pressure for just such supportive adaptations. Any perceptual, motor, or cognitive predisposition that enhanced the ability to learn this protolanguage, and thereby to communicate with others and gain access to cultural knowledge, would have been favored by natural selection. Each incremental change led to more complex forms of language, increasing the pressure on linguistic capacity.
Where continuity theorists differ, however, is on the nature and extent of these adaptive changes. Anatomical changes corresponding to the articulatory and auditory demands of the linguistic niche are well documented, such as changes in the position and shape of the larynx and tongue (Lieberman, Laitman, Reidenberg, & Gannon, 1992). It is likely that this niche also favored brain-based changes related to speech perception and production, including neural support for the ability to discriminate among linguistic sounds, to produce clear speech sounds with varying intonations, and to focus on and hold others' speech sounds in memory long enough to grasp their meaning as a whole (Dor & Jablonka, 2010). Apart from the specific demands of language use, of course, the linguistic niche imposed the more basic demand of learning a given language to begin with, and this too formed a selection pressure on the evolving hominin brain. Consistent with the Baldwin effect, socially entrenched behavior favored genetic variations in support of more efficient and reliable learning of that behavior. The key question for continuity theorists, however, is whether specific linguistic contents—rules of grammar, symbolic categories, and patterns of verbal behavior—were assimilated to the genes as part of this Baldwinian process. How continuity theorists answer this question depends on how they define the adaptive task of language learning, and ultimately how they define language itself.
Deacon's The Symbolic Species (1997) defined this task as one of meaning-making. Language use in Deacon's view is not a simple adaptive skill like gathering fruit or avoiding predators, and thus not the kind of behavior that lends itself to genetic encoding in the form of an instinct. Language depends on a process of symbolic reference, in which basic units like words or morphemes derive their meaning not only from a correspondence to objects in the world but also from their embeddedness in the entire system of symbols that makes up a given language. Linguistic behavior goes hand-in-hand with this intersubjectively shared system, and once such behavior pervades an adaptive niche, turning it into a symbolically mediated environment, survival depends on the ability to learn this system, and to make references and communicate on the basis of it. This is a Baldwinian process, but the key point is that the linguistic niche—at least in a form that could exert an adaptive pressure—only exists as it is given representational power by the community of individuals who inhabit it. This representational dimension cannot be genetically instilled, simply because representation depends on a mental act of reference. Accordingly the evolved solution to the problem of language learning consists not of any linguistic content per se, but of supportive adaptations in the form of vocal, auditory, and referential capacities that make such content easier to acquire.
Pinker's The Language Instinct (1994), by contrast, defined the adaptive task of language learning as one of information processing. The basic units—words—come precoded with information, and processing these units is a matter of combining them in a coherent order rather than generating reference. Among early hominins these combinatorial skills initially consisted of learned behavior, but over time were assimilated in the form of a “language instinct.” “Instinct” Pinker defines not in the traditional sense of an innate fixed response, but rather as an innate disposition to behave in a certain way, based on genetically encoded rules for learning and applying that behavior to adaptive problems. And for Pinker, the rules of language are the rules of grammar, such that the language instinct consists of an innate “blueprint” of grammatical knowledge—a universal grammar.
The universal grammar is a neural structure, unique to humans, containing the core grammatical principles of all the languages in the world. Its function is to enable young children, even without special training or extended exposure to grammatical sentences, to learn the grammar of their native tongue and thereby adapt to a local linguistic environment. The idea of a universal grammar has had many proponents, foremost among them Chomsky (1988), but Pinker's work attempts to show how it could have evolved as a biological trait. To this end he appeals to the Baldwinian logic underlying the “massive modularity thesis” of Evolutionary Psychology, according to which adaptive tasks faced by our Pleistocene ancestors such as communicating, finding food, or using tools were tasks of information processing, and the evolved solution was a collection of brain-based circuits, or “cognitive modules,” each dedicated to processing information about one of those tasks.
Adapting to the human-made niche, by this view, is not simply a matter of acquiring information, but also being able to distinguish types of information relevant to survival and to manipulate that information adaptively. This is not a single capacity but a vast array of them, corresponding to the vast array of problems facing Pleistocene hunter-gatherers, and all began as learned responses to these problems which, once having taken hold in the population, imposed a selection force in their own right. Natural selection then acted on genetic variations supporting the learned response, and it did so by retaining neural connections that strengthened the relation between informational input and behavioral output. Enhanced problem-solving behavior in turn created selection pressure for yet more powerful information-processing capacities, and so on until the learned behaviors were assimilated in the form of innate algorithms—rules and procedures that encapsulate information about a particular adaptive problem and how to solve it.
This is the Baldwin effect as framed by the information-processing model (see also Briscoe, 2000). The algorithms contained in cognitive modules, like those in computer software, operate on information about a prespecified problem and produce output in the form of a correct solution to the problem. “Information” too is defined computationally, not in the ordinary sense of facts or details but as an arrangement of objects in the environment that can be impressed on a substrate such as circuits in a computer or neurons in a brain. Whether in a computer or a cognitive module, the processing of such information occurs automatically, without conscious deliberation or guesswork. It follows that the meaning or semantic content of informational units must be intrinsic or “preassigned” to them (Bruner, 1990). Indeed the very notion of modularity—that information about an adaptive problem activates the relevant module, setting in motion a cascade of physical effects guided by innate knowledge about that very problem—assumes that the information has intrinsic representational power.
Linguistic information is no exception. In Pinker's (1994) view each language is a “discrete combinatorial system,” which means that the basic units retain distinct properties even when combined (rather than merely blending together, as in the case of light or sound), which in turn results in combinations with properties quite different from those of the individual units. Another example is the genetic code in DNA, in which four basic nucleotides combine to produce a practically unlimited array of distinct genes, all with distinct properties. Despite this variation, however, the properties of the whole, be it a gene or a sentence, can always be predicted from the intrinsic properties of the parts and the rules by which they are arranged. The function of the language instinct is to combine words according to a set of rules (the universal grammar) so as to produce larger structures (sentences) of endless variety.
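The contrast between combining and blending can be illustrated with a brief sketch. It is offered only as an aid to the reader and is not drawn from Pinker's text; the function names `combine` and `blend` are hypothetical, and the numeric averaging used for blending is a deliberately crude stand-in for the mixing of light or sound.

```python
from itertools import product


def combine(units, length):
    """All ordered sequences of `length` items drawn from `units`."""
    return list(product(units, repeat=length))


def blend(values):
    """Blending averages its inputs, discarding their identity and order."""
    return sum(values) / len(values)


nucleotides = ["A", "C", "G", "T"]
triplets = combine(nucleotides, 3)

# Every triplet is distinct, and the count grows as 4 ** length.
print(len(triplets), len(set(triplets)))

# Blending the same inputs in a different order gives the same result...
print(blend([1.0, 3.0]) == blend([3.0, 1.0]))

# ...whereas a combinatorial system keeps the units and their order apart.
print(("dog", "bites", "man") == ("man", "bites", "dog"))
```

Four units arranged in sequences of three already yield 4 × 4 × 4 = 64 distinct combinations, and the number grows exponentially with sequence length. What the sketch cannot show is the point on which the informational and semiotic models divide, namely whether the units mean anything in their own right.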
The semiotic model, by contrast, holds that the task of learning language is more fundamental than combining linguistic units, because the very existence of linguistic units depends on a collectively shared process of reference. The key error of the universal grammar, as Deacon (1997) noted, is to assume that the tokens arranged into sentences are somehow symbolic or representational in their own right. Indeed the idea that language itself is based on grammar—on the proper arrangement of linguistic units—only holds up if the linguistic units being arranged already mean something. The semiotic model says no, they do not; to mean anything to anybody, linguistic signifiers must refer or “point to” signifieds in an individual's mind, against the background of a greater system of signs. No concatenation of linguistic units will deliver meaning in the absence of this “pointing to” process.
The field of semiotics offers a way to characterize this process of reference and thus the adaptive pressure posed by language. Ferdinand de Saussure (1916/1986) defined a sign as a unit of meaning comprising a signifier and a signified. The signifier is the form which the sign takes, for example a printed word, a riff of music, or a gesture, while the signified is the mental concept to which the signifier refers in the mind of the beholder. Although Saussure described the signifier-signified relation as a dyad, other theorists, most notably the American philosopher Charles Peirce (1978), have argued that a referential or “pointing to” relation requires a third term, or what Peirce labeled the “interpretant.” While the nature of this third term is still a matter of debate, it can be described broadly as a conceptual ground or point of view belonging to the interpreter, such that signifier points to signified to an experiencing subject, in accord with preexisting, internalized concepts and understandings. A more recent study defined this third term as the intendant—a field of preconscious aims, goals, or purposes in the service of which signifier points to signified (Cousins, 2012). By this view the “pointing to” relation is framed not only by socially inherited concepts about the world, but also by the individual's goals in that world, such that the reference itself functions to facilitate attainment of those goals and thus to enhance the individual's adaptedness to the symbolically mediated niche in which those goals are formed.
In both the semiotic and informational models, then, language use among early humans created a novel adaptive niche, with selection pressures favoring an improved capacity for language learning. But because these two models define language differently, they view the learning process differently as well, and thus the selection pressure imposed by it. For Evolutionary Psychologists, language learning, as with the learning in any other behavioral domain, is a matter of processing information, and the innate meaning of this information makes it possible to encode it genetically, in the form of domain-specific rules, thereby reducing the need for learning. The semiotic model instead views learning as a process of meaning-making—of relating signifiers to signifieds in terms of socially shared conceptual frames. Indeed this type of learning is the only option, given the basic epistemic condition that the ties of reference are not intrinsic to the world. What the selection pressure of language consists of is not words per se, or the rules of grammar binding them together, but rather the referential process that gives words their representational power in the first place.
Evaluation of Baldwinian approaches to language
As variations of the Baldwin effect, and thus of natural selection, the semiotic and informational models of language evolution can be further evaluated according to how well they conform to basic evolutionary principles. One such principle is that a given selection pressure, to result in genetic change, must be consistent over vast spans of time. If the coevolution of language and the brain has produced genetic adaptations to support language learning, as both the informational and semiotic models hold, then some feature of the linguistic niche must have exerted such a consistent pressure. Yet languages typically undergo continual change in their phonetic, syntactic, and semantic aspects. The pace of language change far exceeds that of evolutionary genetic change, such that any linguistic feature susceptible to such change could not have imposed a consistent, long-term pressure on the evolving human brain (Christiansen & Chater, 2008).
A key question for the Baldwinian view of language as a selection pressure, then, is how to reconcile language change with the evolutionary stipulation that selection pressures be invariant. This poses a particular problem for the informational model and its conception of the language adaptation as an algorithmic processing device. For the genetic assimilation of such a device to occur, the overt content of language, that is, the informational units (words) that need to be combined into larger structures (sentences), must exert a consistent selection pressure over a long period, yet such overt content changes much too fast to do this. Words shift in pronunciation, lose meanings or gain new ones, and fall into disuse as new words come into being.
Even in the event that an early group of East African hominins spoke a single protolanguage from which all other languages developed, it is unlikely that such a language would have endured long enough to have its grammatical principles instilled in the brains of those hominins, or that they would have stayed together as a group long enough for this to happen. Although Evolutionary Psychologists refer to “the Pleistocene” as a stable, more or less uniform adaptive environment, in fact evidence shows that early hominins coped with a wide range of habitats and climatic conditions (Potts, 1998). Such diverse habitats would have spurred diverse solutions in the areas of food gathering, social organization, tool manufacture—and diverse forms of language to reflect on and communicate about those solutions. Language diversity and change, then, would have been ubiquitous even as the language capacity was still evolving, circumstances hard to reconcile with the adaptive outcome of an innate grammar device designed for a finite set of linguistic units.
On the semiotic view, by contrast, it was not the overt content of language that exerted a selection pressure, but rather the process of reference that made it possible for early hominins to produce and interpret that content. The evolutionary criterion of invariance applies not to units of linguistic information or to the grammar rules binding them together, but to the process of reference that makes these units representational to begin with. This process is not only compatible with language change but the very basis of it, and as a selection pressure remains constant throughout such change.
Change is inherent to language precisely because the reference of linguistic signifiers is grounded in the semantic context of the entire language, as shared by a community of individuals who use that language to communicate about their own needs and purposes (Lehmann, 1985). Changes in these needs open the way to novel forms of language, be it new usages of familiar words or the invention of new words and phrases. Indeed the growth of language can be explained in large part as the metaphorical or analogical extension of conventional terms to new or unfamiliar contexts, such as the use of “hang-glider” to describe an unpowered flying rig or “cyberspace” to describe the electronic medium of computer networks (Campbell, 2004; Traugott & Dasher, 2002). In semiotic terms, language change is based on novel “pointing to” relations between signifier and signified, corresponding to changes in the interpretive context by which that relation is defined.
By the semiotic view, then, the coevolutionary relation between language and the brain during the Pleistocene could have continued even after language began to change and diversify, that is, even after an initial period of uniformity in the content of language as a selection force (if there ever was such a period). However languages diverged among hunter-gatherer groups, the selection pressure of being able to “enter in” to this collectively shared intentional world remained invariant, and this meant being able to internalize the semantic system of language and to interpret signifiers—linguistic or otherwise—in terms of it. As languages grew, the selection pressure stemmed not from the need to process an expanded vocabulary according to more complex grammatical rules, but from the more fundamental need to experience the referential or “pointing to” relation of linguistic units in terms of an expanded linguistic context. Pleistocene individuals who were best able to do this, to glean meaning from the babble of sounds and expressions being used around them, would have been more involved in their community, more attractive as mates, and more likely to have offspring. Enhanced neural capacity for language then led to more complex forms of language, increasing the pressure on neural capacity.
What was the neural precursor of this capacity, and thus the focus of selection pressure imposed by language during the Pleistocene? This question points to another evolutionary criterion by which the semiotic and informational approaches can be judged. Evolution, as Darwin (1859) explained, is “descent with modification.” Natural selection never starts from scratch but always recruits prior structures and modifies those structures in ways that help solve an adaptive problem (Jacob, 1977). Whatever its form, then, the early linguistic niche was based on preexisting mental capacities and brain structures, and it was modifications of these very brain structures that in turn provided a better fit to the linguistic niche and were captured and elaborated by natural selection. This reciprocal process presumably originated with nonhuman primates, which means that a model of human language evolution ought to be able to identify a nonhuman precursor to the human language capacity, including supporting structures in nonhuman brains, and show how nonhuman communicative capacities are on a continuum with human ones.
This criterion too poses a problem for the concept of a neural device laden with innate grammatical rules. By implication a nonhuman precursor of this device should support an intermediate form of grammar for an intermediate or “simple” vocabulary, yet no such communicative system has ever been found among nonhuman primates or other animals. Nonhuman primates use calls and signals to warn others of danger, identify foods, or assert claims to territory, but these calls and signals are based on direct links to things in the environment—predators, prey, territorial markers—and lack the kind of complex interrelationships with other calls and signals that give words in human language their syntactic properties. And if nonhuman communication systems have no words and no need for grammar, there is no adaptive need to encode grammatical rules, and no apparent basis for a neural precursor to the universal grammar in humans. The implausibility of such an evolutionary transition is one reason Chomsky (2004) and others have favored a “discontinuity” approach to the origin of language, and why Evolutionary Psychologists have yet to elaborate on the origins of the universal grammar among nonhuman primates (Bickerton, 2007).
The semiotic model, in its emphasis on referential relations between signifier and signified rather than grammatical relations among words as the basis of language, highlights the continuity rather than discontinuity of human and nonhuman systems of communication. The semiotic model makes no assumption of intrinsically meaningful units; signifiers take on meaning only by referring to signifieds in individual minds, and this is true of signifiers in nonhuman communication systems no less than those in human ones. Nonhuman calls and signals—though not “words” with syntactic properties—nevertheless could not function as calls and signals without pointing to signifieds in individual animals' minds. Although this referential relation is not situated in the broader context of language, it is still reference. By this logic, then, symbolic reference is not the only kind of reference, but a particular type of it that only humans have acquired the capacity for.
Indeed semiotic theorists have outlined a hierarchy of referential relationships, based on the degree of complexity of the signifier-signified relation, which illustrates the kind of reference that nonhuman animals experience and its continuity with symbolic reference among humans (Deacon, 1997). Peirce (1978) described three types of signifying relations: iconic, indexical, and symbolic. Iconic reference is based on a visible similarity between signifier and signified, such as the relation of a painting or sculpture to its subject matter. Indexical reference, by contrast, is based on a causal or spatial relation of signifier to signified, as in the relation of thunderclaps to stormy weather, or the mercury in a thermometer to temperature. Whereas both iconic and indexical reference thus depend on knowledge of physical correlations in the environment, symbolic reference is based on a preexisting, socially shared convention or code such as language. The signifier “octopus” refers to a particular animal not because of any physical resemblance or proximity, but because individuals who know the English language agree that it does.
This is the standard rendition of the referential hierarchy, but it is misleading to the extent that it identifies particular signifiers as icons, indexes, or symbols. If signifiers have no intrinsic reference, it follows that no signifier is an icon, index, or symbol in and of itself. The referential hierarchy is based not on anything intrinsic to signifiers, then, but rather on the interpretive capacity by virtue of which they become signifiers. This capacity can be understood as a characteristic mode of reference, grounded in the brain, that defines all the interpretive activities of a given species (Deacon, 1997). Human beings, adapted to the niche of language, invariably perceive things symbolically, including so-called icons like pictures or statues, or indexes like thermometers. Nonhuman animals, by contrast, are bound by their characteristic mode of reference to perceive even the most complex symbol as an icon or index (if they perceive it at all).
Understood in terms of this referential hierarchy, nonhuman communication capacities—and the brain structures supporting them—appear to provide an evolutionary precursor to human language. In Deacon's (1997) view the calls and signals of nonhuman animals are based on iconic and indexical reference, in which signifiers point to signifieds in terms of causal, spatial, or temporal associations in the animal's adaptive environment. African vervet monkeys, known for their repertoire of alarm calls, are an oft-cited example (Cheney & Seyfarth, 1992). As studied in the wild, vervet monkey calls serve to alert other monkeys to approaching predators—leopards, snakes, eagles—as well as to elicit protective responses appropriate to the danger. Despite this referential function, however, the “pointing to” relation of signifier (particular call) and signified (concept of predator) is invariably defined by stable physical correlations in the monkeys' habitat, as experienced by each monkey. If a particular predator were to disappear from this habitat, then the respective alarm call—its interpretive basis undone—would vanish as well.
Efforts to teach nonhuman great apes (gorillas, chimpanzees, bonobos, orangutans) a simple symbol system further suggest that their communicative capacities, while functionally continuous with those of humans, are designed for indexical rather than symbolic reference. Chimpanzees have been taught to communicate with hand signs based on American Sign Language (Gardner & Gardner, 1985), as well as to use lexigrams (geometric symbols on an electronic touchpad) to indicate objects or activities. Some of these studies have yielded evidence of symbolic comprehension, such as when the chimpanzees Sherman and Austin learned how to point to the lexigrams for “food” and “tool” to classify unfamiliar food and tool items, suggesting a grasp of “food” and “tool” as abstract categories (Savage-Rumbaugh, 1986). The bonobo Kanzi is renowned for his knowledge of over 3000 spoken English words, and for his ability to respond to complex, novel commands such as “Give the doggie some carrots” or “Put the telephone on the TV” (Savage-Rumbaugh & Lewin, 1994). Kanzi is able to produce his own sentences as well, combining lexigrams in two- or three-sign groupings that conform to an action–object syntax and convey appropriate—and often novel—wishes or intentions: “Chase ball”; “Ice water go”; “Hide peanut.” Remarkable as they are, however, these achievements have occurred in captivity, in interaction with human caretakers, and do not reflect spontaneous responses to adaptive tasks in a natural habitat. The capacity for symbolic reference seems to be present in incipient form among nonhuman great apes, as it may be in other mammals such as dogs, elephants, and bottlenose dolphins (King & Janik, 2013), or birds such as African grey parrots (Pepperberg, 2006), but the neural substrate is not designed for a fully fledged symbolic adaptation as it is in humans.
The development of this adaptation likely began among the australopithecines approximately 2.5 million years ago, and involved a coevolutionary process in which rudimentary symbol use favored the selection of supportive anatomical and neurological structures, which then enabled greater symbol use. As the studies on ape language suggest, the transition to symbolic reference depended on a prior capacity of indexical reference—on the learning of an extensive set of references based on direct links between communicative signifiers and worldly objects, events, and actions. The transition occurred when these indexical references—spoken signs for types of prey, navigational directions, raw materials for tools and shelter—grew so numerous and entrenched in the communicative habits of a hominin community that these indexical references themselves began to define reference—they became part of the interpretive context in terms of which signifiers pointed to signifieds in individual hominin minds. The signifier-signified relation for “wildebeest,” for example, might now be defined not only by prior associations to a particular animal, but also by prior knowledge of the signs for “food,” “search,” or “axe.” What “wildebeest” meant to an individual hominin, in other words, now derived from a network of related signs; the reference itself was situated in—and accommodated to—communal conceptions of the world and its adaptive tasks.
Thus detached from its indexical moorings, the signifier for “wildebeest” could be used to convey needs apart from any particular setting or animal, and, if the studies with Kanzi and other apes are a guide, the next step was to form multiword combinations to express more complex needs and intentions: “Wildebeest there”; “Chop body”; “Carry meat home.” As Kanzi's novel sign combinations suggest, moreover, this process did not depend on a prior knowledge of syntax (Savage-Rumbaugh, 1990); it depended first and foremost on a capacity to understand signifiers apart from their grounding in the here and now. Syntax—regularities of word order in these rudimentary groupings—developed as an aid to comprehension, but, from an evolutionary standpoint, was incidental to the symbolizing faculty by which words could become words and stand in relation to each other in the first place.
Unlike in Kanzi's case, there were no Homo sapiens present to reward the achievement of symbolic reference, not to mention electronic touchpads to facilitate it. This achievement had to provide its own reward, so to speak, in terms of survival value, and it did so by making it possible, among other things, to communicate about objects and events at a remove from immediate adaptive settings, to abstract out the essential elements of those settings and form concepts by which to cope with novel problems of survival, and moreover, to conceive of and communicate about solutions to those problems. Such were their advantages that symbolic thought and communication would have pervaded every domain of social life, transforming the adaptive niche into a symbolically mediated one and creating selection pressures for genetic support of the symbolic capacity.
This leads back to the question of what the neural precursor of this capacity might have been. For many decades neuroscientists have linked the high-level mental abilities of humans to the prefrontal cortex, pointing to a disproportionate expansion of this cortical area, as compared to that of other primates, which occurred in tandem with the development of language over the past two million years (Blinkov & Glezer, 1968; Passingham, 1982). The prefrontal cortex occupies the forward-most portion of the frontal lobes and is characterized by extensive interconnections with other parts of the brain, including sensory systems, motor control systems, and midbrain systems involved with affect and memory. In humans these connections appear to support the prefrontal cortex in a variety of “executive” functions such as acting on goals, solving problems, and reflecting on experience—all of which involve language (Miller, Freedman, & Wallis, 2002; Nieder, 2009).
Comparative studies since the turn of the century have nonetheless questioned the idea that the prefrontal cortex is disproportionately large in human as compared to nonhuman primates, pointing instead to changes in the organization and relative volume of particular prefrontal areas or components (Falk, 2012; Smaers et al., 2011). Documented areas of expansion include Brodmann area 10 (Semendeferi, Armstrong, Schleicher, Zilles, & Van Hoesen, 2001) and the dorsolateral prefrontal cortex (including Brodmann areas 10, 44, 45, 46, and 47; Nieder, 2009). Schoenemann, Sheehan, and Glotzer (2005) compared the prefrontal cortex of 11 primate species and found that the relative volume of white matter—high-density, myelin-covered axons—was exceptionally large in humans, although Smaers et al. (2011), using different anatomical criteria, obtained this result for the left prefrontal hemisphere only. More recently, Semendeferi et al. (2011) found that the horizontal spacing between neurons in Brodmann area 10 is significantly greater in humans than in other higher apes, a difference that allows more room for axons and dendrites and suggests that this subarea was reorganized during hominin evolution. While the data are still preliminary, all of these postulated changes are thought to support a more complex connectivity between the prefrontal cortex and other parts of the brain, and thus to support the basic capacity of symbolic language and related behavior. All of these changes are consistent with a coevolutionary process in which the innovation of symbolic language and thought imposed a selection pressure on the prefrontal cortex—if not in its entirety, then some feature thereof—the elaboration of which then served as a neural platform for enhanced language and thought.
Although the genetic basis of this coevolutionary process is largely unknown, the discovery of a mutation in the FOXP2 gene in a three-generational family with a communication disorder provides the first direct evidence of a language-related gene, and opens up the possibility of investigating how genes changed in tandem with the evolution of language (Lai, Fisher, Hurst, Vargha-Khadem, & Monaco, 2001). Mutations of the FOXP2 gene in humans lead to problems with speech articulation, including difficulty controlling the mouth and lower part of the face, as well as with verbal comprehension and expression, both oral and written (Vargha-Khadem, Watkins, Alcock, Fletcher, & Passingham, 1995; Watkins, Dronkers, & Vargha-Khadem, 2002). FOXP2 is a “transcription factor” gene, involved in regulating the expression of other genes, and molecular and neuroimaging data suggest that the human version of FOXP2 supports the development and function of neural tissues related to language learning and vocalization, including the frontal cortex, striatum, and cerebellum (Fisher & Marcus, 2006). This is not to say that FOXP2 is a “gene for language,” however. As with other transcription factor genes, FOXP2 has multiple roles, regulating genes in other tissues such as the lung, gut, and heart, and it is likely to be just one link in a complex genetic pathway affecting language (Marcus & Fisher, 2003). Moreover FOXP2 is not unique to humans but is found in a wide range of vertebrates including mice, monkeys, and songbirds. Nevertheless FOXP2 has undergone two changes (amino-acid substitutions) on the human lineage after the split from the chimpanzee lineage 4.6–6.2 million years ago, an accelerated rate of evolution suggesting that these changes took hold in the hominin population by increasing fitness (Zhang, Webb, & Podlaha, 2002). This was a fitness advantage that did not accrue to chimpanzees or other primates, raising the possibility that novel communicative behavior among early hominins contributed to the selection pressure for changes in the FOXP2 gene.
Together with the neurophysiological data, then, these genetic findings point to the recruitment and modification of extant neural structures in the adaptive shift from prelinguistic to linguistic capacities. We share with our great ape cousins a large, well-elaborated prefrontal cortex with extensive connections to other parts of the brain, and we likewise share a capacity for reference on which various behaviors such as planning future actions, communicating needs and intentions, and learning new tricks of survival are based. The difference is that for humans, the signifier-signified relation is mediated by the vast, conceptual network of language—related to, but independent of, concrete settings. If indeed the prefrontal cortex serves as the neural substrate of the reference process, then the distinctive structure of this brain area among humans, as defined by a relative expansion or elaboration of specific regions, reflects the heightened demands of symbolic as opposed to indexical reference.
The semiotic model thus holds that human modes of communication hinged on an improved capacity for reference rather than on an all-new, sui generis capacity for grammar. As prelinguistic calls and signals gained some detachment from concrete contexts and began to derive meaning from other calls and signals as well, they invariably took on a grammatical aspect, functioning as certain parts of speech and following tacit rules in how they were combined to convey meaning. Yet this grammatical aspect was entirely dependent on the underlying process of symbolic reference—on the “pointing to” process going on in early hominins' minds—without which there would be no words to be grammatical. Signifiers—linguistic or otherwise—are not signifiers without referring to signifieds in the context of a shared conceptual system, and it is only by learning this system that a subject can generate, and thereby understand, this referential relation.
This was the primary adaptive demand of the linguistic niche, and the adaptive solution was not just a mode of communication but a mode of thought. To perceive and respond to features of the Pleistocene environment important to survival—hunting routes, sites for camping, edible plants and animals, raw materials for tools and shelters—did not always involve the use of language, but the very act of perceiving these features—of distinguishing them from the manifold of environmental stimuli and attending to them in accord with goals of survival—was invariably shaped by it. The ability to interpret reality symbolically, once it emerged, would have supported not only communication but the whole gamut of social activities in a given hominin community. All social activities, being defined by and dependent on collective understandings, would have imposed the selection pressure of symbolic reference. Scant evidence of these early activities remains, but one exception is stone tools, and this evidence, compared with that of skeletal fossils, suggests that flaked stone technology was no less important than language as a selection pressure on hominin brains and bodies.
The Baldwin effect and tools
Although the recent discovery of cut-marked ungulate bones in Dikika, Ethiopia raises the possibility of hominin tool use as early as 3.39 million years ago (McPherron et al., 2010), the earliest examples of deliberately flaked stone tools are 2.6 million years old (Semaw et al., 2003). These early tools, made by striking one stone (the “hammer”) against another (the “core”) to form pieces with a sharp cutting edge, belong to what archeologists call the “Oldowan stone industry” because the first samples were found in the Olduvai Gorge, Tanzania (Leakey, 1971). Fossil remains suggest that a species of australopithecines, Australopithecus garhi, was contemporary with the earliest Oldowan tools (Asfaw et al., 1999), which predated the expansion of hominin brain size marking the presence of Homo habilis approximately 2 million years ago (Holloway, Broadfield, & Yuan, 2004). Though limited, this fossil evidence is consistent with a Baldwinian process in which novel behavioral strategies themselves altered the context of natural selection, creating adaptive pressure for genetic support of those very strategies.
Our hominin ancestors, in other words, did not wait for the transition to Homo to begin making tools; rather it was toolmaking that helped bring this transition about. Crude as they were, Oldowan tools made it possible for early hominins to cut into thick-skinned carcasses, to cleave flesh from bones as well as crush bones for their marrow—thus enabling these hominins to exploit food niches on the African savannah long dominated by carnivores such as lions and leopards. Expanded scavenging activity created the need for more advanced tools, as well as for better communication and cooperation in social groups, leading to the selection of bigger brains and enhanced mental capacities, and so on in an interactive cycle of niche construction and genetic change.
As with the Baldwinian approach to language, however, questions remain as to what kind of selection pressure toolmaking actually imposed, and what kind of genetic change this pressure resulted in. Evolutionary Psychology holds that the selection pressure of tool use, like that of language use, consisted of information, and the evolved solution was a cognitive module dedicated to processing information about stone tools and their manufacture. Standard versions of Evolutionary Psychology thus include a “tool-use module” among the hundreds or thousands of evolved modules residing in the human brain. By this view toolmaking began as a learned habit or pattern, but was gradually assimilated as a piece of neural circuitry containing rules about, for example, the selection of lithic materials, evaluation of core shape, or use of the hammerstone. Activated by a toolmaking task (input), these rules guide behavior so as to bring about a result in the form of a stone tool (output). Yet this model assumes, as in the case of language, that both the information about the adaptive problem and the algorithms that manipulate such information are intrinsically representational. Indeed the tool module and the language module are able to function as discrete and self-contained “modules” only because they respond to distinct types of information, that is, because the domains of environmental information they deal with are likewise discrete and self-contained.
The semiotic approach by contrast holds that the selection pressure of tool use, like that of language use, consisted of a socially shared process of meaning-making. By this view the units of toolmaking, like the units of language, contain no information in and of themselves, but only as they signify something in the minds of those who use them. Lithic signifiers, like linguistic signifiers, only become signifiers by virtue of a process of reference. Fracture angles, percussion targets, cutting edges—such attributes are not intrinsic to the stone, but depend on an interpretation grounded in common notions and principles of toolmaking. As with language, the selection pressure imposed by tool use was one of being able to interpret signifiers as others did, and the adaptive solution was an interpretive capacity—the ability to relate signifiers to signifieds in terms of a conventional code.
It follows that language use and tool use are not discrete adaptive domains, regulated by functionally discrete cognitive modules, but different expressions of the same underlying capacity, that of symbolic reference (Holloway, 1981; Kitahara-Frisch, 1980; Stout & Chaminade, 2009). Holloway (1969) argued that both stone tools and language are based on symbolization, or the “capacity to structure the environment arbitrarily” (p. 399). “Arbitrary” he defined as “non-iconic”: there is no necessary relation—be it one of physical similarity or spatial contiguity—between the original material (cluster of vocalizations, cobble of stone) and the finished form (sentence, handaxe). Although the physical properties of stone arguably put more constraints on the production process than do the physical properties of vocal sound (Davidson & Noble, 1993), Holloway's (1981) point is that the finished form is not intrinsic to the stone but “imposed” on it in accord with a preconceived design. For tools as for sentences, the imposition of structure is not a single event but a sequence of activities, each of which is meaningful only in the context of the entire process and the anticipated product. There is nothing in the immediate environment to guide each step in the process; knowing what to do next—choosing a word or its inflection, choosing a striking angle or the degree of force—depends on internalized “conventions of sequence” (Holloway, 1969, p. 402) by which each step is understood as a result of the prior one and as a precursor to what follows. As Stout and Chaminade (2009) explained, the reference of any given step in the toolmaking process stems from its “associations with other actions in a superordinate system of technological rules” (p. 89).
The standardized design of Paleolithic handaxes and other tools, moreover, suggests that these rules were standardized as well, and thus socially shared and historically transmitted (Ambrose, 2001). Although Oldowan tools, those linked to Australopithecus and Homo habilis, were made with three or four blows of a hammerstone, early Acheulian tools, linked to Homo erectus, required from 60 to 75 blows (Semenov, 1970). Acheulian handaxes are typically pear-shaped, 12 to 18 cm long, and flaked on both sides. The broad end serves as the handle, while the tip end, usually thinner and more finely crafted, does the work of the axe. Though the axe is symmetrical in outline, each side has its own subtle contours, with hollows, ridges, and notches arranged for a secure grip by either the left or the right hand. The grip and cutting edge give an indication of how the axe was mainly used, whether for sawing or cutting, as through animal hides or flesh (typical of right-handed axes, which tend to be gripped from around the side), or for pounding or striking, as in severing limbs from carcasses or cracking bones to get the marrow (typical of left-handed axes, which tend to be gripped from the top). In either case the bilateral symmetry of the axe suggests that the toolmaker systematically rotated the stone to trim flakes from each side. This was a lengthy process, filled with contingencies such as missed blows or fractures due to flaws in the material, yet the finished product showed a remarkable consistency for 1.5 million years of the Lower Paleolithic.
Contrary to the notion of this period as one of technological stasis, Stout (2011) argued that Acheulian tools did in fact undergo significant change in design, accompanied by developments in production technique and the use of ancillary tools. From their first appearance 1.6 million years ago to the Late Acheulian stage beginning 700,000 years ago, Acheulian handaxes grew thinner and more symmetrical, with sharper, less serrated edges—changes likely made possible by the use of “soft” hammers of bone, antler, or wood to refine the tool after it had been roughed out with a hammerstone. Yet all of these modifications involved the production of a regular shape from irregular raw materials, and all thus reflect a toolmaking process guided by a “stable representation of intended tool form” (Stout, 2011, p. 1053).
Language would have been instrumental to such cumulative technological change, to the transmission across generations of idealized tool forms and the techniques to achieve them, while more complex tools would have heightened pressure on language skills in a mutually reinforcing cycle (Stout & Chaminade, 2009). For all its importance, however, the coevolutionary relation between language and tools is to be distinguished from the coevolutionary relation that both of these innovations had with the evolving hominin mind and brain. Both toolmaking and language required the ability to learn and think on the basis of ideational codes, and both imposed the selection pressure of symbolization on early hominin populations.
To the extent that these selection pressures were the same, moreover, so were their demands on neurological function. Stout, Toth, Schick, and Chaminade (2008) studied the neural demands of Stone Age toolmaking by measuring the brain activity of expert toolmakers during the production of both Oldowan and Acheulian tools. Areas of activation during handaxe production included the anterior and right lateralized prefrontal cortex, indicating much overlap with the activation patterns of language use (Bookheimer, 2002). This common grounding in the prefrontal cortex suggests that toolmaking and language reflect alternate expressions of the human capacity for intentional, goal-directed behavior (Stout et al., 2008), which in turn suggests that this capacity coevolved with both of these behaviors. As in the case of language, then, the prefrontal cortex can be seen as a neurological precursor of symbolically mediated tool use, and as the possible basis of an evolutionary continuity in tool use between human and nonhuman primates.
Some theorists have questioned the affinity of language and tools as adaptive pressures in human evolution on the grounds that Paleolithic toolmaking consisted of overt, observable regularities of behavior, easily acquired through imitation and without the communication of rules (Chase, 1991; Noble & Davidson, 1996; Wynn, 1995). Chase (1991), for example, argued that stone tools are iconic rather than symbolic products because the operations involved in their manufacture are based on cause-effect relations in nature. Unlike language, in which the relation between a word and referent is based on convention, the relation between a strike to a stone edge and the resulting flake scar conforms to the laws of physics and the properties of the materials used. Despite some variation in form, as Chase (1991) noted, the final product always bears a natural and thus iconic relation to its function as a cutting or pounding tool, a function which itself can be learned by direct observation, without any need for symbolic understanding.
Again, however, the status of an object as icon, index, or symbol is not intrinsic to it, but always relative to a process of interpretation. The key question in understanding the evolutionary impact of toolmaking on early hominin mentality is not whether various steps in the process bear an iconic or indexical relation to the finished product, but whether a mentality characterized by iconic or indexical reference would be capable of carrying out these steps. Chase (1991) focused on the flaking process, but there were many other stages in the production of an Acheulian handaxe, not all of which resulted in immediate, visible progress toward the final result. Among other things, making a stone tool required the use of other tools, such as hammerstones or batons of bone or wood, which needed to be crafted for this specific function. This involved foresight and thus a kind of abstract reflection about the toolmaking process itself (Kitahara-Frisch, 1980). It involved looking upon an object in nature and “seeing” a potential tool of a prespecified form, the function of which was to shape another tool of a different prespecified form. The raw material thus stood for something at many degrees of remove from the immediate setting; there was no necessary link, based on physical correlations in this setting, between the signifier (stone cobble, bone, antler) and signified (potential secondary tool).
Further evidence of symbolic thought processes lies in the fact that the function of Acheulian handaxes was by no means limited to butchering carcasses. Use-wear analysis suggests that these tools were also used for scraping wood and cutting plant material (Keeley & Toth, 1981), and other uses likely included cutting and scraping hides, digging for roots or burrowing animals, and stripping bark from trees (Schick & Toth, 1993). Notably, many Acheulian handaxes have been found that show no use at all, or are far too big to have had any practical value, suggesting a social or ritual significance, for example as a display to prospective mates of the toolmaker's strength and skill. Here Chase's (1991) postulate of a natural or iconic relation between the original stone material and its function breaks down. The “pointing to” relation between the crafted stone and such referents as social status or physical fitness was not based on resemblance or a spatio-temporal co-occurrence, but on communal understandings about individual identity and social relationships.
To use an Acheulian handaxe, then, required learning—not merely by mimicking others' behaviors, but by internalizing an ideational framework in terms of which those behaviors took on direction and meaning. The multiple uses of Acheulian handaxes required the ability to perceive a given signifier—the handaxe—in different ways, according to different adaptive needs. Surely the same was true of words and morphemes in the protolanguages of the Pleistocene. Symbolic communication was integral to the development of stone technology and more broadly to the occupation of habitats ranging from the East African savannah to the temperate woodlands of Europe and China. Hunting and gathering in these diverse habitats was far from a uniform set of practices, but required detailed local knowledge of available raw materials for tools, garments, and shelter, suitable sites for camping, water and plant sources, tracks and habits of prey—and the ability to convey that knowledge. Changes in these survival conditions created selection pressures for inventiveness and flexibility of behavior, including the ability to use words and tools in new ways, and these innovations in turn created selection pressures of their own.
This is the Baldwin effect at work, but the point here is that knowledge of how to use a given innovation—a word, a stone tool, an animal trap—stemmed not from any self-evident, objective properties, but from prior knowledge of other, like artifacts—other words, other tools—as well as knowledge of the habitat and how to survive in it. The reference of lithic signifiers, like that of linguistic signifiers, was not intrinsic, but depended on collectively shared interpretations. It follows that the power of words and stone tools in altering the adaptive niche for a community of hunter-gatherers was inseparable from the symbolizing faculty by which they existed as words and stone tools. Consistent with the Baldwin effect, genetic changes shored up the behavioral advance, but these changes served to strengthen the referential capacity underlying various behaviors rather than to encode the basic content of the behaviors themselves.
Informational versus semiotic coevolution
The advent of the symbolic niche thus had a different evolutionary impact than that suggested by the “incompleteness thesis” of cultural anthropology. For Geertz (1973) the mind that evolved in interaction with culture is formless and inert, consisting of little more than the capacity to be socialized. Although the symbolic realm of Geertz's model favors the ability to learn, nowhere does Geertz speak of the mind's contribution to this symbolic realm. This is not because Geertz neglects meaning, but because he views it as external and “public,” embodied in observable, concrete symbols. Much of interpretive ethnography thus takes the form of code-breaking, of deciphering symbolic forms, and Geertz's own studies of the Balinese cockfight or the Moroccan bazaar reflect this notion that meaning inheres in symbols apart from the psychology of the participants. The symbolic niche, so defined, places no special demand on the ability to make meaning, and this would have been true in the Pleistocene as well. Geertz's claim that the human mind is “incomplete” or “unfinished” rests on this assumption that symbols are innately referential, that a given signifier—a word, a ritual—can stand for a particular referent without an agent to make sense of it. The mind need only acquire the symbols of a given culture—not generate them—and is utterly helpless until it does so.
Evolutionary Psychology's postulate of cognitive modules, though reflecting a different evolutionary outcome from that described by Geertz, likewise stems from an objectivist conception of the human-made niche. Evolutionary Psychologists claim that the plasticity implied by Geertz's model would leave individuals ill-equipped to deal with various tasks of survival, and that some sort of inborn “guidance system” is necessary to ensure that their choices are adaptive (Tooby & Cosmides, 1992). Novel adaptive behaviors among early Pleistocene hominins were thus assimilated as genetically encoded instructions and goals. This account assumes, however, that various activities—toolmaking, food gathering, language use—were contained and standardized enough to be regulated by a set of algorithms, and moreover that such algorithms, and the information from the environment they manipulate, could have intrinsic meaning apart from a process of reference.
Evolutionary Psychology imputes intrinsic meaning to information in the environment and the neural circuitry that encapsulates it, while cultural anthropology imputes meaning to public artifacts and social practices. Both overlook the semiotic insight that signifiers are only signifiers inasmuch as they point to a signified, and this “pointing to” relation always involves an interpretive process. Both positions thus misconstrue the adaptive imperative of a symbolically mediated niche. “Systems of symbols” do not inhere in reality, be it extrasomatic or genetic; they need to be constructed, interpreted, and utilized. If it is true, as Geertz argued, that human minds are incomplete without symbols, then it is equally true that symbol systems are incomplete without human minds.
As a biological design process natural selection is said to have many flaws, but leaving its works incomplete is not one of them. Lungs evolved in interaction with air, eyes with light, and ears with sound, and none of these traits could function in the absence of their respective elements, but this fact does not render these traits anything less than complete. Unlike air, light, or sound, moreover, much of culture is observer-relative and intentional; it depends for its existence on the meaning-making of a community of individuals. This means that culture as a selection force cannot be separated from mind; that a process of meaning-making itself is integral to the selection pressure of culture.
The semiotic approach thus offers a different view of the human-made niche from that of Evolutionary Psychology, corresponding to a different view of the selection pressure of this niche and thus of the coevolutionary process itself. For Evolutionary Psychology, various cultural entities—words, tools, patterns of social life—come bundled, so to speak, with ties of relatedness already attached, such that adapting to a culture filled with these entities is a matter of processing precoded information. This selection pressure resulted in the development, during the Pleistocene, of computational mechanisms which presumably led to the production of more information—more intricate kin relations, more complex grammatical patterns—and so on in a process of informational coevolution. The growing complexity of Pleistocene culture created further selection pressure for information processing, and the eventual outcome was a complex array of brain-based modules equipped to deal with specific survival tasks in this ancient environment.
The semiotic view, by contrast, holds that the diverse innovations of the Pleistocene possessed representational power—and thus adaptive influence on the evolving hominin mind—only by virtue of the collective interpretations of individual subjects. The adaptive task of the human-made niche was not to perform algorithmic operations on things with meaning already attached to them, but to bestow meaning on these things in terms of shared ideational frames. Semiotic coevolution stems from the epistemic condition that meaning—the reference of signifier to signified—is a creative response to things, not something intrinsic to them. What semiotic coevolution produces, then, is a foundational behavior from which other behaviors arise, including linguistic behavior: the capacity to make signifiers “signifiers” in the first place, that is the capacity to generate and experience the reference of a signifier to a signified. The growing complexity of Pleistocene culture created selection pressure for a more sophisticated referential capacity—namely the capacity to learn an increasingly complex ideational system and to interpret the world in terms of it.
This system is at once internal to the individual and intersubjectively shared, and for this reason the so-called arbitrary nature of symbolic reference does not mean capricious or unguided. By defining what refers to what, ideational frameworks produce a “consensus of perceptions,” making communication possible and ensuring a degree of regularity and predictability in social behavior (Holloway, 1969, p. 406). Being able to join this consensus is a requirement of survival, such that one function of symbolic reference, as Deacon (1997) observed, is to provide “fail-safe” access to the symbolic niche.
Semiotic coevolution and cultural change
Yet if providing such access were the only function of symbolic reference, the Stone Age might never have ended. The idea of semiotic coevolution sheds light on a question that any Baldwinian account of the evolution of the human mind must answer, namely, assuming that this coevolutionary process took place during the 2.6 million years of the Pleistocene epoch, how is it that humans have created, since the end of the Pleistocene 11,700 years ago, a sociocultural environment so radically different from the one in which their minds evolved? This question is especially relevant to Evolutionary Psychology's model of cognitive modules filled with algorithms from the Pleistocene. If these modules were perfectly calibrated to the adaptive tasks of a hunter-gatherer lifestyle, why would Pleistocene hominins have ever left that lifestyle behind? More specifically, if Pleistocene hominins apprehended problems of survival in terms of innate categories and rules, how could they ever have perceived new problems or come up with fundamentally new solutions?
The answer, by the present view, is that natural selection—in the form of semiotic coevolution—designed mind not as a computational process defined by brain-based goals and strategies, but as a semiotic process whereby goals and strategies are based in memory and as such are constantly changing to reflect the changing adaptive demands of the environment. The referential or meaning-making capacity responds to these changing demands by generating new meanings—new “pointing to” relations between signifier and signified, as guided by prior understandings of those very adaptive demands.
Hunter-gatherers in the Fertile Crescent of Southwest Asia around 13,000 years ago faced a shortage of wild food plants and game animals due to a major climate shift toward cooler, drier conditions (Hillman et al., 2001). As staples of fruit and legumes gradually disappeared from the region, these gatherers were forced to rely more on drought-resistant cereal plants like rye, wheat, and feather-grass, yet these too fell into decline. The solution these people came up with was to cultivate these wild cereals rather than try to gather them from an increasingly arid terrain. On more than one occasion, in more than one early settlement, an observant hunter-gatherer looked at the seeds of these plants and saw the possibility of sowing them by hand, rather than leaving it to the wind to spread them. The signifier—a seed of wheat or rye—was perceived in a new way, according to the preconceived goal of obtaining more grain.
As this insight occurred in the last few millennia of the Pleistocene, the resulting novel behavior—cultivating cereal grasses—could not have been predisposed by a cognitive module. Yet once this idea caught on, becoming part of the shared knowledge of the culture, it was only a matter of time before other food plants—lentil, barley, chickpea, field bean—were made into crops as well. The herding and breeding of wild animals—sheep, goats, cows, pigs—as a source of food and clothing likewise followed from the pressures of climate change on wild resources. The shift to an agricultural economy was a process of domesticating both plants and animals rather than wandering in search of them, and this process depended on seeing familiar signifiers—a grain of wheat, a wild goat—in unfamiliar ways.
The rise of agricultural settlements in turn created new adaptive pressures, for example the need for tools to support more intensive farming practices, vessels for grain preparation and storage, vehicles for transporting surplus grain for trade, and a means of keeping records of stored as well as bartered grain. Fired pottery, copper and bronze metalworking, wheeled carts and plows, writing systems—all developed in response to these adaptive needs. Bolstered by improved technology and expanded trade, agricultural settlements in southern Mesopotamia around 5000 years ago began to develop into prosperous cities such as Ur, Lagash, and Eridu. Large urban populations in turn created the need for more complex social and political organization, spurring the development of many features of civilization as we know it today: centralized bureaucracies, occupational specialization (artisans, tradesmen, teachers, priests), legal codes, religious systems, monumental architecture, public works, standing armies, and tax collection.
These were innovations that would not have been adaptive in the Pleistocene world, in which humans lived in groups of 50–150 individuals. Nor could these innovations have been derived from genetic algorithms based on a hunter-and-gatherer existence. They resulted instead from novel symbolic relationships in response to new adaptive problems—new relations between signifier and signified, stemming from changes in the ideational framework in which this reference is grounded. Then as now, these social institutions and practices existed as intentional entities—their reality and power based on collective consensus—and once they became part of the conventional social world they created new adaptive pressures, favoring yet further innovation.
Evolutionary Psychologists claim that modern humans have a “Stone Age mind,” and readily concede that the ancient goals and strategies found in cognitive modules, being tailored to the survival tasks of prehistoric hunter-gatherers, are poorly matched to more recent agricultural, industrial, or urban settings, all of which involve basic differences in social structure, status seeking, mating patterns, and resource acquisition. Indeed they argue that the power of these ancient goals is such that their poor fit with modern society leads to widespread maladaptive behavior, including a broad range of mental disorders (Stevens & Price, 1996). Yet this conclusion overlooks an important affinity between the Pleistocene and the present, namely that both of these environments comprise “intentional” or “constituted” worlds (Shweder, 1990), worlds whose events and entities are all quite real, but only by virtue of the intentional states of those involved with them. The Pleistocene world too was an intentional world, filled with artifacts and patterns of behavior that had no existence apart from human invention and interpretation. Then as now, the very existence of tools as tools, or ritual as ritual, presupposed a collective of minds capable of experiencing them as such, and this is the world to which our Pleistocene ancestors had to adapt. Cultural change since then is a story of countless incremental changes to this symbolic realm, changes which ultimately hinged on signifiers referring to different signifieds in people's minds, according to different adaptive needs.
Theorists have long noted how cultural change resembles biological change, with cultural artifacts, practices, or institutions accumulating small modifications over time much as organisms and their respective adaptations do (Mesoudi, Whiten, & Laland, 2004; Ziman, 2003). “Descent with modification” in the cultural realm applies not only to tangible things like stone tools, sailing ships, and woodwind instruments but also to intangible things like scientific theories, childrearing beliefs, and economic policy. Attempts to turn this biological analogy into an explanatory model, however, have met with many difficulties, the primary one being how to define the cultural entities that actually evolve. Neither material artifacts nor abstract ideas can easily be reduced to discrete elements, comparable to genes, that serve as units of transmission from generation to generation. As critics have argued, this is because material artifacts and abstract ideas are embedded in a social context; understanding and making use of them depends on understanding many other artifacts and ideas, all of which depends on an understanding of language (Sperber, 2000; Wimsatt, 1999).
For a similar reason any such elemental cultural units would not replicate or undergo change in the same way genes do. When a gene is transmitted through reproduction its structure remains largely intact, but when cultural content is transmitted through learning it gets interpreted in terms of the prior understandings and beliefs of the person who learns it. Such interpretation often leads to change: people not only learn ideas, but use imagination and foresight to transform them into something new. This process is quite unlike the random mutation and natural selection of genetic material, and for many theorists the interpretive dimension of culture poses a major obstacle to a Darwinian model of cultural change.
By the present view, however, the interpretive dimension of culture provides the very basis of such a model. Indeed by this view the search for cultural units akin to genes is misguided: all cultural entities, from the tangible to the intangible, exist as intentional entities—their very status as “entity” being based on social consensus—such that cultural change is always a matter of change in the collective experience of reference by which these entities are known. The key to understanding how cultural evolution resembles biological evolution, then, is to understand that symbolic reference itself is an adaptive trait designed to construe cultural entities according to prior needs and goals, and thus to produce meaning conducive to the achievement of those goals (Cousins, 2012). To the extent that the field of goals is widely shared, such that members of a cultural community share similar processes of reference, cultural signifiers remain stable in what they signify, and this is the basis of conventional perception and behavior. When needs and goals change, however, the process of symbolic reference generates new meanings—new “pointing to” relations between signifier and signified—in such a way as to make those signifiers useful to addressing the need. The result is “descent with modification”—a change in the collective construal of a cultural entity, and thus a change in the entity itself.
