Abstract
This article explores the implications of the social brain and the endorphin-based bonding mechanism that underpins it for the evolution of religion. I argue that religion evolved as one of the behavioural mechanisms designed to facilitate community bonding when humans first evolved the larger social groups of ~150 that now characterise our species. This is not a matter of facilitating cooperation, but of engineering social cohesion – a very different problem. Analysis of the size of C19th utopian communities suggests that a religious basis both allowed larger groups to form and greatly enhanced their longevity. I suggest that religion evolved in two stages: an early immersive form with no formal structure based on trance-dancing (a form still evident in the rituals and practices of many hunter-gatherers) and a later form which had more formal structures and gave rise to our modern doctrinal religions. I argue that the modern doctrinal religions did not replace ancestral immersive religions but rather that the doctrinal component was overlaid on the ancient immersive form, thereby giving rise to the mystical stance that underlies all world religions. I suggest that it is this mystical stance that causes the constant upwelling of cults and sects within world religions.
Introduction
Although religion, as a topic, has been approached from a number of different angles by different disciplines, it is probably fair to say that most approaches have emphasised the cognitive aspects of religions (what people believe), with a corresponding tendency to focus on the modern world religions. One reflection of this has been the emphasis in the cognitive science of religion on cognitive predispositions to adopt religious beliefs (e.g. the hyperactive agency detection device, or HADD: Boyer, 2008). This has, I suggest, led to an over-emphasis on the rational and intellectual features of religious behaviour. There is another important aspect of religion that tends to be given much less attention, and this is the essentially emotional side of religion. People typically become religious or join particular religions not for intellectual reasons but as a result of emotional experiences (reflected in extreme cases in the conversion phenomenon). The intellectual, or theological, side of religion is certainly important – it informs the particular nature of ontological beliefs and regulates the particularities of the rituals associated with those beliefs. But none of these would have any real force if they were not built on the emotional experiences, the ‘raw feels’, that give these beliefs an emotional significance. This raises an important question that addresses the ‘how?’ of why religion seems to work.
My claim is that an emphasis on emotional components of religion points us firmly in the direction of two other questions that have received surprisingly little attention, namely, when and why religion evolved. Some approaches to religion seem to take these questions for granted or at best assume that religion in some way fosters prosociality and cooperation – though without specifying how this happens or why such mechanisms might be needed by humans but not by any other animal species. The answers to these two questions may help illuminate the ‘how?’ question with which I began this article.
The long-standing conventional view in the history of religion(s) has been that the modern world (or doctrinal) religions are of relatively recent origin and were preceded by a long period during which religions (at least in the sense of beliefs about the world) were animist in form. As a term, animism fell into disrepute in the middle of the 20th century (albeit largely for spurious ideological reasons), although it has been rehabilitated to some extent more recently (Bird-David, 1999; Guthrie, 1993). These forms of religion still persist in many hunter-gatherer societies (Peoples et al., 2016) and in all likelihood far beyond in the sense that they are a natural consequence of a tendency to attribute intentionality to all manner of animate and inanimate objects (Dunbar, 1995; Guthrie, 1993). For want of any better term other than animism, I have been inclined to follow Eliade (1964) and use the term ‘shamanic religions’ (with a small ‘s’) (Dunbar, 2013), with the clear proviso that I recognise the distinction between shamanism sensu stricto (the Siberian forms with religious specialists and a focus on healing) and shamanism sensu lato (which includes a wide variety of African and South American forms where rituals may be limited to trance experiences by a much wider range of individuals). An alternative label that in many ways captures the essence of what they involve might be immersive religions, by which I mean that individuals become collectively immersed in rituals and experiences such as trance.
These religions differ from the modern doctrinal religions in a number of key respects that apply, as the philosophers would say, severally and jointly (meaning the criteria apply generally, but not necessarily in every case). These include the absence of a formal theology, including gods (the spirit world is usually peopled by ancestor spirits and various kinds of therianthropes and mischief-makers whose intentions are often malevolent), the absence of any religiously justified moral codes (this is not to say the community has no moral codes), a major focus on transcendental experiences generated through trance, belief in spirits that inhabit natural features such as springs and trees, and the absence of both formal religious specialists and religious spaces. In its simplest forms, this does not usually include any beliefs or rituals for healing, although these may be important in full-blown shamanic forms (which probably emerged later: Peoples et al., 2016). In short, they are largely religions of experience rather than doctrine.
In this article, then, I ask three questions: why, when and how did religions as we know them evolve? I will use the term religion here, although some might argue that there is no such thing as religion-in-the-generality and that I am really concerned with religiosity rather than religion(s) as such. I recognise that finding a consensus definition for the term religion is not easy, and I will not even try to do so. Definitional disagreements are rarely profitable, and I generally prefer to go with the more practical scientific view that we ‘sort of know what we mean’ (Dunbar, 1995) and leave it at that. I will simply base my argument on the weak claim that religiosity is what gives us religions (I fail to see what a religion is if it does not involve a religious sense or sentiment) and that the transition from immersive-type religions to doctrinal religions (where the term clearly applies) is an important part of the story. So there are two related parts to my analysis: How, when and why did the early immersive, shamanic forms of religion arise? How, when and why did the transition to doctrinal religions occur?
The social brain
The Social Brain Hypothesis was originally proposed as an explanation for the fact that primates have much larger brains than all other animals. The reason, it was suggested, is that primates have much more complex social groups than other animals. This was later elaborated into the Machiavellian Intelligence Hypothesis (Whiten & Byrne, 1988), which viewed social complexity mainly in terms of the ability to outwit group members and steal a march on them by exploiting them. This claim tended to treat all primate species as pretty much identical and simply sought to explain why all primates have bigger brains than all other animals. Subsequently, this was reformulated as the social brain hypothesis sensu stricto which suggested that differences in brain size between primate species were related to the need to manage groups with different numbers of relationships (Dunbar, 1998). This was demonstrated by the fact that group size and brain size are highly correlated in primates, but not most other groups of mammals or birds that do not have the kind of bonded social groups characteristic of primates (Shultz & Dunbar, 2007, 2010).
Because the relationship between group size and brain size is remarkably robust, it can be used to predict a ‘natural’ group size for modern humans based on our brain size. The figure predicted by human brain size is 150, and this value has been confirmed empirically by data from both natural human groups (hunter-gatherer communities, military organisation, village sizes in small-scale agricultural communities ranging from the Domesday Book of AD 1086 to the contemporary Hutterites of the American northwest) and personal social networks (Dunbar, 2014a). Humans, of course, have the capacity to live in much larger communities, and in this respect they resemble many other primates that have multilevel social systems.
We have used the social brain equation to estimate social group sizes for extinct hominin populations. We can do this because (a) the relationship is very robust and (b) because we know the start and end points for the human trajectory (i.e. chimpanzees and modern humans), so it is just a matter of deciding how the values are strung out in between. (Although some have questioned whether this can be done, in reality palaeoanthropologists do exactly this all the time in order to estimate the body masses, stature, and reproductive and ecological parameters for fossil hominins, so there really isn’t any justification for doubting the validity of doing so.) When we estimate group size for these species from their brain volumes, it is clear that group size remained stable at around the 50 limit for the first several million years of our evolutionary history after our lineage had separated from the great ape line (the australopithecine phase) (Figure 1). However, beginning 2 million years ago with the first appearance of the genus Homo, brain size, and hence group size, began to increase exponentially.

Figure 1. Community sizes (mean ± 95% CI) for the major fossil hominin species. Mean brain sizes for each population (clusters of fossil skulls from the same location and time) are mapped onto the ape regression equation for the social brain graph (see Gamble et al., 2011). Neanderthals are calculated at 80% cranial volume (cf. Figure 3) to compensate for the fact that they had more brain tissue assigned to the visual system and proportionately less assigned to the frontal lobes and would therefore have had smaller community sizes than would be implied by their total brain volume (see Pearce et al., 2013). The dashed line indicates the mean community size of ~40 for living chimpanzees. AMH signifies fossil anatomically modern humans (Homo sapiens).
Primate social groups differ in a number of crucial ways from those of almost all other birds and mammals. Most species form groups simply as a result of casual aggregations: animals stay together as long as being in a group is useful and the foraging conditions allow it. Left to themselves over a period of time, such groups usually disperse as animals drift off to continue foraging while other members of the group stop to rest. Because these groups are so unstable, animals can often end up on their own – or at least in groups that are smaller in size than would be ideal. Primates have solved this problem by evolving a form of bonded relationship that gives stability and coherence to the group by keeping its members together. This is no small feat because animals have to be willing to coordinate their behaviour with other group members, and this means being able to defer a rest period or a foraging period even though you really might prefer to carry on with these activities. The capacity to do that depends on the ability to inhibit prepotent actions (i.e. doing what you really want to do). This highly specialised cognitive ability depends on the evolution of novel brain regions (notably Brodmann area 10 in the frontal pole) not found in other mammalian orders or in birds. Indeed, in primates the capacity to inhibit behaviour increases as a function of brain size, and hence group size.
The central problem of sociality and its solutions
Humans, like all primates, are intensely social, and their peculiarly bonded form of sociality (Dunbar & Shultz, 2010; Shultz & Dunbar, 2007) has been a defining feature of their societies. Individuals are bound together by what can be best described as emotional ties. This gives their groups a degree of cohesiveness and stability through time that is invariably lacking in the societies of other mammals and birds (other than in the form of pair-bonded monogamy). The problem for all species is that living in close proximity incurs costs in terms of ecology (groups have to travel further to find sufficient food) as well as social stress, and these have significant negative consequences for the animals involved, and especially the females whose fertility is dramatically affected by stress (Dunbar et al., 2009, 2018; Dunbar, 2019; Dunbar & MacCarron, 2019). These costs offset the advantages of group-living (in most cases, protection from predators), such that group size is always a trade-off between the benefits and costs of living in groups versus living alone. Species occupying high-predation-risk habitats (such as terrestrial species living in open habitats) will be prepared to tolerate higher costs (and hence live in larger groups) than species that habitually live in low-predation-risk habitats (such as arboreal species living in forests) (Dunbar et al., 2009, 2018; Dunbar & MacCarron, 2019).
There has been a long-standing tendency to see the evolution of cooperation as the central function of primate (and human) society (e.g. Lang et al., 2019). In fact, this is to completely misunderstand the nature of primate (and human) sociality as well as its evolution. Cooperation (if it occurs at all in primates) emerges long after bonded groups evolved in primates and is a consequence, not a cause, of social life. Group-living evolved as a defence against predators, but it is not a form of cooperation that incurs the public goods dilemma (the principal problem that most analyses of the evolution of cooperation are concerned with). Group-living is a passive form of cooperation. The problem that social mammals face in these contexts is how to maintain the cohesion of social groups, and especially large social groups, in the face of the centrifugal forces created by the stresses of living in groups. Primates (and a small number of other mammal orders, including the dolphins, equids and elephants) achieve this by forming bonded groups in which members develop intense forms of commitment to each other through social grooming. These bonds allow groups to stay together rather than drift apart as happens in most other species (unless they form monogamous pairbonds, the only form of bonded group found in other mammals and most birds: Shultz & Dunbar, 2007).
Brain power alone is not, of itself, sufficient to give primate social groups the kind of bonded quality that they possess. This requires a mechanism for creating intense emotional bonds between individuals. This seems to have been engineered by using social grooming. Primates are avid social groomers, and the more social they are, the more enthusiastically they groom. As a result, grooming time increases more or less linearly with group size across primate species, reaching a maximum at groups of about 50 when animals devote about 20% of their entire day to grooming. Individuals who groom frequently together monitor each other’s movements closely so as to avoid becoming separated, are careful to follow each other if one of them moves and, more importantly, come to each other’s aid when one of them is threatened. What seems to create this sense of bonding and commitment is that grooming triggers the endorphin system in the brain.
Endorphins are neuropeptides whose main function seems to be as part of the brain’s pain management system. They are opioids and create an opiate-like effect (calmness, analgesia, a sense that all is well with the world) that induces a sense of commitment and obligation towards those with whom one happens to be interacting. Because of these effects, the system seems to have been co-opted during primate evolution to underpin the process of social bonding (Depue & Morrone-Strupinsky, 2005; Keverne et al., 1989; Loseth et al., 2014; Dunbar, 2010; Machin & Dunbar, 2011; Panksepp et al., 1978). In primates, the endorphin system is activated by the slow leafing through the fur that occurs during grooming, an action not unlike the stroking that humans do in intimate relationships (Nummenmaa et al., 2016). Activation of the endorphin system is mediated by a highly specialised peripheral neural system, the afferent C-tactile (or CT) fibres (Olausson et al., 2010). These differ from almost all other peripheral nerves in that they are unmyelinated (and hence slow), one way (there is no return motor loop from the brain), target the endorphin-producing neuron bundles in the brain and respond only to light slow stroking at around 2 cm per second (approximately the speed of hand movements during grooming and stroking).
The problem with grooming is that it is very time-consuming: it is a one-on-one activity (you cannot groom two people simultaneously) and it requires a considerable time investment in each individual to create a meaningful, bonded relationship. Indeed, in both humans and monkeys, the strength of a relationship (measured in terms of its likelihood of producing mutual support) depends directly on the time invested in it (Dunbar, 2018; Sutcliffe et al., 2012). This, combined with the fact that the time available in the day for social interaction is limited, sets an upper limit on the size of group that can be bonded by grooming. This limit is at about 50 individuals (of all ages).
When our hominin ancestors began to evolve the larger groups that characterise our lineage, the need to increase grooming time beyond this limit to ensure that additional individuals were bonded into the group would have placed significant stress on time budgets – not least because these were already at capacity with little or no spare time even during the australopithecine phase when group sizes seem to have been much the same as they are in chimpanzees today (Dunbar, 2014b). Estimates of how much time modern humans devote to social interaction indicate that we do not spend more time interacting than the most social of the primates (Dunbar, 2009). In other words, the 20% barrier seems to have been absolute and inviolable, and neither we nor our ancestors have ever been able to breach it. The only way to bond a larger group within that fixed time budget is to find a way of grooming more people at the same time – in effect, to use time more efficiently.
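The arithmetic behind this argument can be made explicit with a minimal sketch. It is an illustration only, not a model taken from the sources cited: it assumes that grooming time scales in strict proportion to group size (the data are described above only as 'more or less linear'), anchored on the two figures given in the text (20% of the day at a group size of 50). The parameter `reach`, meaning the number of individuals bonded simultaneously in one episode, is a hypothetical name introduced here.

```python
import math

# Figures taken from the text: about 20% of the day spent grooming
# at a group size of about 50.
GROOMING_CAP = 0.20
GROUP_AT_CAP = 50


def grooming_fraction(group_size: int, reach: int = 1) -> float:
    """Fraction of the day needed to bond a group of the given size.

    Assumes strict proportionality between group size and grooming time
    (an illustrative simplification). `reach` is the hypothetical number
    of individuals 'groomed' at once; one-to-one grooming has reach 1.
    """
    if group_size < 0 or reach < 1:
        raise ValueError("group_size must be >= 0 and reach must be >= 1")
    return GROOMING_CAP * group_size / (GROUP_AT_CAP * reach)


def min_reach(group_size: int) -> int:
    """Smallest whole-number reach that keeps grooming time within the cap."""
    if group_size < 0:
        raise ValueError("group_size must be >= 0")
    return max(1, math.ceil(group_size / GROUP_AT_CAP))


if __name__ == "__main__":
    for size in (50, 150):
        print(size, round(grooming_fraction(size), 2), min_reach(size))
```

Under these assumptions, a community of 150 bonded purely one-to-one would demand about 60% of the day, three times the observed ceiling, and stays within the 20% limit only if each bonding episode reaches three individuals at once.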
It seems that we did this by finding a number of ways to trigger the endorphin system indirectly (i.e. without physical contact), thereby allowing several, or even many, individuals to be ‘groomed’ at the same time. In sequential order, these seem to have involved laughter (a form of chorusing), singing (without words), dancing, feasting (eating together), storytelling and religion. We have been able to show that all of these trigger the endorphin system (Dunbar, Teasdale et al., 2016; Nummenmaa et al., 2016; Pearce et al., 2015; Tarr et al., 2015; Weinstein et al., 2016). All of these play a central role in our social world. Of these, laughter was clearly the earliest, since we share it with the great apes – and, indeed, laughter is an adaptation of play invitation vocalizations of Old World monkeys. Storytelling and religion, of course, depend on the possession of language and so must have been late evolutionary acquisitions. Singing and dancing, which are closely related, and perhaps communal feasting appear to be of intermediate age. Each allowed us to break through what was, in effect, a series of glass ceilings on group size.
The social role of religion
So how was religion involved in this bonding process? In this context, we are concerned with the earliest forms of religion, which, as I suggested above, were essentially immersive rather than doctrinal and ritual-based. In the absence of anything better, I take as my model for this the kind of trance-dancing found in hunter-gatherers like the !Kung San of southern Africa (Mathias, 1999). These typically involve the women providing the music (by singing and clapping) and the men engaging in a round dance that eventually triggers a trance (Eliade, 1964; Goodman, 1990). During trances, initiates experience travel in the spirit world that the trance gives them access to. During these travels, they meet ogres and tricksters (whose main interest seems to be in deflecting the traveller from finding their way back out to the real world) and good spirits (mainly ancestors) who act as spirit guides and help the traveller return to the world from which they came. Fear of failing to find the exit is palpable and real, as the exit is always described as small and difficult to find – the entrance to a tunnel connecting the two worlds (Eliade, 1964). The fear may even be justified: trance requires the initiate to dance to the point of exhaustion, and it may well be that sometimes individuals die in the process, which is then interpreted as a consequence of their spirit failing to find the exit back to the real world. The experience of trance seems to be cathartic in a way that seems to wipe the slate of social disaffections within the community clean. Indeed, in the San, trance dances are often explicitly held to defuse tensions that have built up within the community (Alan Barnard, personal communication).
A line of inquiry that developed in the 1980s, but which subsequently seemed to languish, suggested that trance states were the product of endorphin activation (Frecska & Kulcsar, 1989; Henry, 1982; Jilek, 1982). This is an extremely plausible proposal and would certainly account for the phenomenon of Durkheimesque effervescence as well as the Turneresque sense of communitas that so often accompanies these kinds of ceremonies. The emotional and magical experiences that people have during trance states surely added to these effects. I suspect that one reason this line of enquiry died out was simply the fact that endorphins are difficult to work with, both in the field and in the laboratory, because they do not cross the blood–brain barrier (Machin & Dunbar, 2011).
The important question for us here is what added value these kinds of effects had in terms of community bonding. My claim is that the endorphin activation adds significantly to the bonding process because it allows a much larger number of individuals to take part than would be the case with most other social activities. It is significant that these kinds of rituals make use of many of the elements used in conventional social bonding – singing, dancing, handclapping, synchronised physical exertion – that are known to trigger the endorphin system (Cohen et al., 2010; Dunbar et al., 2016; Pearce et al., 2015; Tarr et al., 2015; Weinstein et al., 2016). These same elements continue to feature, of course, in the rituals and practices of most contemporary doctrinal religions.
The rituals and beliefs of religion differ strikingly from all the other older bonding mechanisms in that they require language. Physical contact, laughter, singing (chorusing) and dance are all behaviours that can occur spontaneously and, in and of themselves, create an immersive experience. There seems to be a crucial social component for all of them. Laughter, for example, is difficult to trigger when someone is on their own, and singing and dancing alone simply do not produce the same intensity of engagement or emotion as doing these activities with other people. This seems to reflect an added value due to behavioural synchrony, which is well illustrated by Cohen et al.’s (2010) study of rowers: the physical exertion of rowing triggers an endorphin response (as indexed by elevated pain thresholds), but rowing in synchrony with others even in a virtual boat elevates this effect by around 100% without any detectable increase in effort. This, in many ways, parallels the use of sea shanties and other work songs to increase work output in the context of heavy manual work in 18th- and 19th-century sailing ships (or, alternatively, maintain work output in the face of increasing exhaustion).
Seen in the context of the progressive addition of new bonding behaviours to allow increases in group size, it is worth noting that religion does seem to add significantly to the size of communities that can be maintained as coherent units. In a study of optimal community size in 19th-century American utopian communities, Dunbar and Sosis (2017) found that the optimal size at foundation that maximised longevity was ~64 for secular cults and ~171 for religious cults (Figure 2). The difference in mean cult duration is equally marked: just 8 years for secular cults versus 36 years for religious cults. Longevity at optimal foundation size differed even more: 15 years versus 100 years. There appears to be something about a specifically religious ideology that makes it possible for individuals to hang together as a community for much longer before frictions eventually lead to community fragmentation and demise. A religious ethos somehow seems to enable people to tolerate each other’s foibles and irritating behaviour rather better than would otherwise be the case.

Figure 2. Duration (in years) for religious (filled symbols) and secular (unfilled symbols) C19th American utopian communities, plotted against size at foundation. Regression lines are upper bounds. Vertical lines demarcate optimal foundation size that maximises longevity. Reproduced from Dunbar and Sosis (2017).
There would seem to be two ways this might come about. One, which has been widely touted in the literature, is the ‘policeman in the sky’ phenomenon: an omniscient moralising high god who can see everything that we do, even when the community itself cannot, acts as a threat to keep people in line. This clearly places a heavy weight on punishment as the principal means whereby community cohesion and stability are maintained. While this sits well with the post-agriculture environment where very large towns and cities are the primary social environment within which people live, it sits less well with a hunter-gatherer context where community size is small (100–200) and the groups within which people live on a day-by-day basis (camp groups or bands) are even smaller (35–50). The religions of such societies lack High Gods who can oversee the behaviour of community members, and they lack a moral code justified by religious principles (Lang et al., 2019). (This does not mean they lack a moral code; rather, the issue is that the code is not justified by appeal to religious principles.) The alternative view is that a religious framework, with its attendant rituals, creates an endogenous sense of commitment to the community that induces a greater willingness to put up with others’ irritating behaviour, either for its own sake or in order to maintain community equilibrium for the benefits the community provides.
The difference between these two would seem to be that between negative and positive reasons for being a member of a community – or, alternatively, the imposition of discipline from without (top-down from the community) versus the imposition of discipline from within (bottom-up from the individual). Common experience suggests that endogenous (bottom-up) commitment is always more effective than exogenous force in maintaining the rule of law. If we commit ourselves to something (the psychological phenomenon known as pre-commitment: Crockett et al., 2013), we are less likely to weaken and backslide than if someone else tries to make us adhere to some regulation that we believe is arbitrary and contrary to our interests. The role of religion seems to be that of making us committed from within. The use of a Moral High God (MHG) as an enforcer certainly occurs, at least among the Abrahamic religions (Lang et al., 2019), but it should probably be seen as an ancillary device rather than as religion’s primary function. The fact that at least some doctrinal, or world, religions (e.g. Buddhism) rely entirely on internal motivation and have no external force majeure in the form of an MHG is strong evidence in support of this suggestion.
Dating the origins of religion
Archaeological evidence for religion tends to rely on formal burials associated with grave goods, since, it is argued, these are explicit evidence of belief in an afterlife. Evidence for burials, however, dates back only around 100,000 years at most (the double burial of a mother and presumed child at Qafzeh in Israel) and is based mainly on the fact that the bones are heavily stained with ochre that had been sprinkled on the corpses. Ochre can act both as a preservative and as a decoration, and this is taken to imply something symbolic about the afterlife of the body. More convincing evidence based on associated grave goods (for use in the afterlife) and the clothing in which bodies were wrapped becomes common only from around 40,000 years ago in Europe. Although claims for deliberate burials have been made for some earlier Neanderthal sites (dating back perhaps as early as 300,000 years), most of these are disputed. That said, one has to wonder why some archaic humans apparently disposed of their dead down deep shafts in the back of caves: in the Sima de los Huesos (the Pit of Bones) in the Atapuerca Mountains of northern Spain, dated to c. 400,000 years BP, bones from 28 different bodies are jumbled together at the foot of a 13-m-deep shaft. It looks suspiciously like the entrance to the underworld. The main problem is that there are also a very large number of cave bear bones down there. Although these could be interpreted as offerings for the journey (especially given the apparent significance of cave bears to all these early Europeans), they might equally reflect a natural accumulation of bones from periodic cave-dwellers being washed down during occasional floods.
Although archaeologists have been justifiably cautious about interpreting these kinds of archaeological finds as evidence of religion, other disciplines may now offer insights from a very different direction. There are two key sources of information. One is a secure dating for the origin of language, since language is a sine qua non for formal religion. This is because, although it is perfectly possible to have religious experiences on one’s own, these cannot form the basis for a viable religion unless and until they can be exchanged and compared with other individuals. It is the shared religious experiences that are meaningful: I have to be able to explain what I have experienced to you, and, more importantly, I have to be able to explain its significance to you. The second source of evidence derives from the fact that doing so requires us to be able to reflect on our religious experiences, and this requires mentalising or mindreading abilities. Mentalising is crucial for religion since it is what allows us to imagine other worlds – that is to say, worlds other than the one that we live in, in the here-and-now. If we can map mentalising abilities onto brain size, then it should be possible to identify their evolutionary trajectory through time.
There are a number of anatomical markers that palaeoanthropologists have used, with varying degrees of confidence, as markers of language. Strictly speaking, they are markers of speech rather than language, since they all relate to the capacity to control the vocal channels. Since it is possible, even likely, that there was a preverbal phase to language prior to the evolution of fully functional language (Dunbar, 2009), this may simply indicate the earliest point at which language could have evolved rather than when it actually evolved. Each of these markers involves a distinctive shift from a primate-like form to a modern human form. There are four of these anatomical markers: the expansion of the thoracic vertebrae in the upper chest (reflecting the nerves that control breathing and the ability to manage the long slow exhalations required for both singing and speech), the expansion of the hypoglossal canal in the floor of the braincase (reflecting the nerves that control movements of the tongue and vocal chamber in the upper throat that are required for the full range of human speech sounds), the positioning of the hyoid bone that supports the top of the oesophagus in the throat (which affects the range of sounds that can be produced by changing the size of the vocal chamber in the mouth, making certain vowel sounds possible) and the presence of human-like ear canals (which determine whether or not certain sounds, especially those of human speech, can be heard).
Figure 3 plots the timing of these against brain size for the different species of hominins, where data are available. In each case, it marks periods when the fossil evidence for the ancestral primate-like versions is present and periods when the derived modern human-like form is present, with the latter being taken to indicate the capacity for speech. Although the validity of most of these markers has been questioned at some point (albeit often with considerable statistical naivety), what is striking about this graph is how closely they all agree. The consensus is clearly that modern human-like forms for these markers all first appear with the evolution of archaic humans (the Heidelberg folk, and their successors the Neanderthals). This might simply signal control over the vocal apparatus for chorusing (wordless singing or ‘humming’ in Mithen’s (2011) sense), or it might signal some form of true language.

Figure 3. Mean (±95% CI) cranial volume for different hominin species, listed in chronological order. Overlain on this is the presence of primate-like and human-like anatomical markers for speech. The human-like conditions are the following: a distinctive bulge in the size of the neural canal in the thoracic vertebrae (allowing improved control over breathing, needed for the long exhalations required for speech); increased size of the hypoglossal canal in the base of the skull (to allow greater innervation of tongue and mouth musculature for greater control over speech sounds); a lowering of the hyoid bone that supports the top of the oesophagus (allowing the production of certain key vowel sounds); and the presence of human-like vestibular canals (allowing recognition of the greater variety of human speech sounds). Redrawn from Dunbar (2009).
Mentalising, or mindreading, is the capacity to understand the intentions of others by modelling their mental states. The concept was originally developed by philosophers of mind to refer to the capacity to use intentional terms (represented by verbs like intending, believing, wanting, supposing, thinking, understanding, knowing, etc.). Mentalising forms a naturally recursive hierarchy which starts with first-order intentionality (knowing one’s own mind), the level that all conscious animals can aspire to. The capacity to understand someone else’s mindstate (second-order intentionality, better known as theory of mind) is shared with the great apes (probably) and appears in humans at about 5 years of age. Most adult humans are able to manage five orders of intentionality (Kinderman et al., 1998; Stiller & Dunbar, 2007), with the intermediate levels appearing successively at different ages during childhood and adolescence (Henzi et al., 2007). These capacities are important because they strongly influence our ability to imagine other worlds, to create and enjoy stories and to handle the grammatical complexity of language (Carney et al., 2014; Carney & Robertson, 2018; Dunbar, Launay et al., 2016). Indeed, the ability to handle complexly structured sentences maps directly onto mentalising ability in adult humans (Oesch & Dunbar, 2017).
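The recursive structure of this hierarchy can be made concrete with a small sketch. The Python below is purely illustrative: the function name `nest_statement`, the list of agents and verbs, and the example proposition are all assumptions introduced here, not material from the studies cited. It builds a sentence by embedding one mindstate inside another, so that the order of intentionality is simply the number of mindstates in the chain.

```python
# Hypothetical (agent, verb) pairs, one per level of embedding.
# These are illustrative choices, not taken from the cited studies.
MINDSTATES = [
    ("I", "believe"),
    ("you", "think"),
    ("Ann", "supposes"),
    ("Ben", "knows"),
    ("Cara", "understands"),
]


def nest_statement(order, proposition):
    """Wrap `proposition` in `order` nested mindstates."""
    if not 1 <= order <= len(MINDSTATES):
        raise ValueError(f"order must be between 1 and {len(MINDSTATES)}")
    # Each level contributes one "<agent> <verb> that" clause.
    clauses = [f"{agent} {verb} that" for agent, verb in MINDSTATES[:order]]
    return " ".join(clauses + [proposition])


if __name__ == "__main__":
    # Print one example sentence per order of intentionality.
    for order in range(1, len(MINDSTATES) + 1):
        print(order, nest_statement(order, "it is raining"))
```

Each added level lengthens the chain of embedded clauses that a listener has to hold in mind at once, which gives some intuition for why higher orders might be linked to the handling of complexly structured sentences.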
Over the past two decades, a plethora of neuroimaging studies has linked theory of mind to a neural network in the brain involving the prefrontal cortex, the temporoparietal junction and elements in the temporal lobe (van Overwalle, 2009). More recent imaging studies have related individuals’ intentionality competences (the number of levels they can achieve) to volumetric changes in the size of these brain regions, particularly in the prefrontal cortex (Lewis et al., 2017; Powell et al., 2010). By the same token, intentional competences across the primates (monkeys at first order, apes at second order and humans at fifth order) also correlate linearly with prefrontal cortex volume (Dunbar, 2009; see also Devaine et al., 2017). We can use the regression equation for this to estimate the intentional competences of fossil hominins (Figure 4). This suggests that while early Homo is likely to have been able to manage third-order intentionality (i.e. one step up from the great apes), fully modern fifth-order intentionality would not have appeared before the evolution of anatomically modern humans around 200,000 years ago. Archaic humans, including Neanderthals, would have aspired to fourth-order intentionality, but not much more.

Figure 4. Mean (±95% CI) intentionality level achieved with the cranial volumes for individual hominin species, listed in chronological order. Intentionality is estimated by interpolating through a series of equations that estimate intentionality from frontal lobe volume, in turn estimated from brain size which, in turn, was estimated from intracranial volume for individual fossil skulls. In each case, the regression equations have r² values of the order of 0.95. For details, see Dunbar (2009).
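The chain of estimates described in this caption can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the procedure or the equations of Dunbar (2009): every slope and intercept is a toy placeholder chosen for arithmetic convenience, treating each step as a simple linear fit is itself an assumption, and the names `LinearFit`, `estimate_intentionality` and `TOY_CHAIN` are hypothetical. It shows only how three regressions compose, with the output of each step feeding the next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearFit:
    """A fitted straight line y = slope * x + intercept."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def estimate_intentionality(cranial_volume_cc, chain):
    """Pass a cranial volume through each regression in `chain` in turn."""
    value = cranial_volume_cc
    for fit in chain:
        value = fit.predict(value)
    return value


# TOY coefficients only: NOT the published regression equations.
TOY_CHAIN = (
    LinearFit(slope=0.5, intercept=0.0),    # intracranial volume -> brain size
    LinearFit(slope=0.5, intercept=10.0),   # brain size -> frontal lobe volume
    LinearFit(slope=0.01, intercept=1.0),   # frontal lobe volume -> intentionality
)

if __name__ == "__main__":
    for volume in (500.0, 1000.0):
        print(volume, estimate_intentionality(volume, TOY_CHAIN))
```

Because each step carries its own error, uncertainty accumulates along such a chain even when every individual fit is tight; the sketch does not model this.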
These differences in intentional competences would have dramatically affected both the language complexity and the mental world complexity of these species. In terms of linguistic competence, it suggests that archaic humans may well have had language but that it is unlikely to have been fully modern in its complexity. Their stories would have lacked the creative ‘bite’ of the modern storyteller (Carney et al., 2014). This, in turn, has implications for the level of complexity that they would have been able to achieve in talking about their religious experiences. Table 1 summarises the level of complexity in statements about religion that people with different intentional competences could achieve. While the complexity increases with higher orders of intentionality, my sense is that there may be a phase change at fifth order in which the complexity of religious utterances shifts from forms of personal religion (these are things I believe) to forms of communal religion (these are things we all agree to believe). If so, then it suggests that, even if archaic humans may have had some form of religion, conceptually modern forms of religion are unlikely to have evolved before the appearance of anatomically modern humans. In other words, religion as-we-know-it cannot be much more than 200,000 years old. Heidelbergs and Neanderthals may have had religious experiences, but their ability to interpret and communicate these would have been of the kind possible only with fourth-order intentionality – equivalent to those of a young human teenager rather than an adult human. My guess is that this would suggest a form of religious experience that is relatively unreflective and largely mystical or ecstatic in form.
Table 1. Forms of religious statement that are possible with different levels of intentionality.
I believe this statement to be true, but there is no obligation or requirement that you believe it.
The crisis of the Neolithic
I have suggested that shamanic religions of immersion characterise hunter-gatherer societies and have done so since humans first acquired the capacity to think and act in a religious manner. The general consensus among historians of religion has been that doctrinal, or world, religions are a late development. The most likely timing for this phase shift in religious complexity is the Neolithic Settlement. There are two main reasons for thinking this. One is that this is when we first see archaeological evidence for formal religion. This includes what appear to be religious spaces (‘temples’), evidence for symbolism of a kind that might refer to divine beings (skulls embedded in walls, statues) and evidence for religious specialists (priests). These are all documented in the archaeological record from the Levant and are associated with the first villages and towns (Hodder, 2010).
The second reason for thinking that the period of the Neolithic Settlement marks the key transition point is the mere fact of settlement. As we saw earlier, social life incurs significant stresses for primates, and much of what they do (social grooming, the formation of coalitions and alliances) is designed to mitigate these costs in order to allow animals to live in stable social groups. Contemporary hunter-gatherers manage this problem in large part through fission–fusion sociality: this form of dispersed social system in which the community is divided between three or four camp groups reduces the number of individuals that anyone has to live with to around 50 while allowing some flexibility for individuals to move between camp groups if they become too stressed by the individuals with whom they happen to be living. Chimpanzees do something similar (Dunbar, 2019). The Neolithic Settlement (itself probably driven by the need for defence against raiders: Johnson & Earle, 2000) disrupted this by forcing the entire community to live in one location with no mechanism for mitigating these costs.
These stresses would have been so considerable that life in villages of even the size of a natural hunter-gatherer community (100–200 people) would have been impossible without some solution. I suggest that formal religion played a significant role in this for exactly the reasons identified in Figure 2: a formal religious dimension allows more individuals to live in harmony for longer. This would likely have coincided with, and been supported by, the rise of political structures that, in effect, allowed a secular police force to enforce the moral and social codes of the religious system. In other words, the centrepiece of the Neolithic is not the invention of agriculture (which had, in fact, already been in existence for several thousand years before the first settlements) but the invention of mechanisms that allowed people to live together in cramped conditions without (not to put too fine a point on it) killing each other. By implication, the more conventional immersive forms of religion were not sufficient for these purposes: something more robust was needed.
In fact, we see this pattern nicely illustrated in a number of New World contexts, where the shift to large communities occurred much later.
Bandy (2004) analysed village sizes on the Taraco Peninsula on Lake Titicaca in Bolivia between 1500 BC and AD 550, a period that spans the transition from small-scale agricultural villages to the formation of an integrated state. In the early phase, the mean village size was about 127, with village fissions typically occurring when the village was around 170 people. From around 1000 BC, village size increases inexorably, from a mean of 275 to larger towns of around 400 people by 500 BC. This later phase coincides with the rise of the Tiwanaku state and is associated with the simultaneous appearance of a new formal religious complex, the Yaya-Mama religious tradition. This includes a novel form of ceremonial public space (the classic sunken ball court), decorated serving bowls, ceramic trumpets, incense burners and a distinctive style of stone sculpture. Organised religion seems to have been part of the toolkit used to keep the lid on fractiousness so as to allow larger communities to exist. Adler and Wilshusen (1990) identify a similar sequence in both the ethnographic and the archaeological records of the Pueblo Indians of the American Southwest: when village size exceeds ~250 people, ritual spaces (such as kivas) appear, suggesting a role for communal rituals in community bonding.
The mystical stance
The role of trance in small-scale religion raises a central question about the differences between these forms of religion and modern doctrinal religions. We might suppose that resorting to trance states with their intensely emotional content has been superseded by a calmer, more rigorous, theologically based cognitive approach to religion. In fact, this seems not to be the case. Even a cursory glance at the history of the Abrahamic religions, not to mention Buddhism and the yogic traditions within Hinduism and associated religions, is enough to remind us that a mystical tradition has continued to bubble beneath the surface of world religions throughout their respective histories. One need only point to the gnostic schools of North Africa (notably those under the influence of Pseudo-Dionysius) around the time of the fall of the Roman Empire, the Cathars of medieval France and many of the early Protestant sects of the 17th and 18th centuries (including the Quakers and Methodists, as well as the now forgotten Ranters and Johann Gichtel’s Amsterdam Brethren of the Angels among a great many others). On a more orthodox plane, there has obviously been the very long tradition of Christian mystics from Meister Eckhart, Julian of Norwich, St Theresa of Avila and St Francis of Assisi to the 18th-century Maronite nun Hannah al-‘Ujaimi. Within Judaism, there has been the Kabbalah tradition and within Islam the Sufi tradition, both of some considerable vintage.
A mystical stance thus seems to have been, and probably still is, a baseline motif that refuses to go away, despite the often draconian attempts by central authorities to eradicate it as theologically unsound. In some cases, such as the Cathars, the central authorities were successful in completely repressing the sect, but in many cases they were conspicuously unsuccessful. This is undoubtedly because the ‘raw feels’ sense of fusing into the divine presence that is associated with mystical experiences is so powerful and so motivating that it overrides the threats that any secular power might issue. The mystical stance seems to introduce an uncontrollable element, a degree of individualism that is difficult to manage. In short, the mystical stance seems to have been present in all doctrinal religions throughout their history and remains very much alive and well.
In effect, what seems to have happened is that the doctrinal elements of the modern world religions have been grafted onto the old underlying mystical base. If so, this may explain why all the world religions are bedevilled by a constant welling up of cults and sects from below. These cults are almost always based around a charismatic leader (of either sex), involve small groups of individuals, have more ecstatic forms of religious service and often have what the central hierarchy considers to be thoroughly heretical theological views. These have clearly been problematic for the central hierarchy and mainstream believers, who are apt to try to suppress them in order to maintain some kind of theological regularity.
This suggestion explains two observations about modern world religions that are otherwise difficult to explain – and which, perhaps for that inconvenient reason, seldom seem to feature in discussions of religion. These are the apparent ease with which religions fragment into cults and sects and the fact that most of these are typically small in size. If we accept that religion is intended to be, or functions as, a unifying force in society, then these two phenomena are puzzling. If we suppose that religion arises as a consequence – by-product or otherwise – of basic universal cognitive processes (essentially the position defended by the cognitive science of religion), then this fragmentation is very puzzling. Why would a unitary, universal psychological mechanism give rise to so much variety, and seemingly do so as quickly as it does?
In this respect, religion resembles language and the speed with which languages spawn dialects that, within a few hundred years, can evolve into new, mutually incomprehensible languages. It has taken less than 1500 years for Latin to diversify into the six (maybe eight) Romance languages and less than 1000 years for English to evolve into its six currently recognised official languages. This too is odd if, as most of those interested in language assume, language exists to permit communication. Why would language behave in a way that seems explicitly designed to make communication impossible? The reason, I suggest, is the same in both cases: languages and religions are explicitly small-scale phenomena and both are designed to bond small communities. In other words, they are both mechanisms for identifying and bonding hunter-gatherer communities of 100–200 individuals. As it happens, both scale up extremely well and provide the basis for bonding potentially very large mega-communities (Launay & Dunbar, 2016). But neither does so perfectly, suggesting that they are really designed to manage much smaller groupings.
Conclusion
In this article, I have proposed an alternative view of the evolution of religion that combines the social brain hypothesis with the neurobiology of social bonding. I argue that the predominant focus on the cognitive and formal ritual aspects of religion overlooks what may, in fact, be by far the most important aspect of religion and the reason why people believe and behave religiously. This is the emotional ‘raw feels’ component that arises from engaging in religious rituals. This phenomenon is what I refer to as the ‘mystical stance’ – the capacity to become immersed in the religious experience, to enter into trance states. This arose, I suggest, as one of several mechanisms that humans developed to bond their relatively large social groups. I suggest that the doctrinal aspects of the modern world religions have simply been grafted onto this ancient substrate, and that this may explain some peculiar features of modern doctrinal religions such as their constant tendency to fragment into sects and cults. Doctrinal (or world) religions, as we know them, seem to date only from the Neolithic, even though the cognitive capacities that are necessary for modern religions (mentalising and language) long predate this. They seem to have arisen as a mechanism for bonding large numbers of people living in relatively cramped conditions. Although Neanderthals and other archaic humans may well have had both religion and language, the cognitive evidence strongly suggests that these would have been less sophisticated than those found in modern humans. This implies that religion as we know it (but still in its immersive, adoctrinal form) only arose with the appearance of modern humans around 200,000 years ago.
