Abstract
Whether face processing is modular or not has been the topic of a lively empirical and theoretical debate. In expert observers, the perception of nonface objects in their domain of expertise is remarkably similar to their perception of faces, in patterns of both behavioral performance and brain activation, providing some evidence against the modularity of face perception. However, the studies that have yielded these results do not rule out the possibility that object expertise and face processing occur in spatially overlapping, but functionally independent, brain regions. Recent research using an interference paradigm reveals that expert object (car) processing interferes with face processing. The level of interference was proportional to an individual's level of car expertise. These results may provide the most direct evidence to date that face and object recognition are not functionally independent.
Many debates in cognitive neuroscience center on modularity, that is, on whether the mind-brain is divided according to content (modularity) or according to process. Evolutionary pressures may lead to the development of specialized modules. For example, the ability to recognize faces is a vital skill for humans, allowing people to distinguish friend from foe, and thus face processing may have dedicated neural circuitry. Consistent with this idea, an impressive body of evidence suggests that face recognition differs qualitatively from object recognition. For example, newborns prefer to look at a facelike configuration rather than at control stimuli, including an upside-down version of the same image. In adults, the recognition of faces is more influenced by inversion (turning the face upside-down) than is the recognition of objects, and it is also more dependent on the spatial relations between features (i.e., configural effects; Farah, Wilson, Drain, & Tanaka, 1998).
In the brain, a small part of the human visual cortex (the fusiform face area, or FFA) is more active when people look at faces than when they look at other objects (Kanwisher, McDermott, & Chun, 1997). Recordings of the gross electrical activity (event-related potentials, or ERPs) of the visual cortex in response to the presentation of a face show a face-selective response that peaks early, at about 170 ms after the onset of the stimulus (Bentin, Allison, Puce, Perez, & McCarthy, 1996). In addition to this evidence for face selectivity in humans, face-selective cells are found in the visual cortex of monkeys, and recent functional magnetic resonance imaging (fMRI) studies in monkeys suggest that the “face areas” seen in humans may also be present in other animals. Also, a double dissociation (one patient worse with category A than category B and another patient showing the opposite pattern of performance) is found between brain-lesioned patients who have more difficulty recognizing faces than other objects and rare patients showing the reverse pattern (Farah, 1991). Together, these results suggest that faces may be processed by a domain-specific module. Such a system may have evolved because of the primordial importance of face recognition for human survival, and it could support the unique requirements of this skill, starting early in life.
EVIDENCE AGAINST THE MODULARITY OF FACE RECOGNITION
Evidence favoring the modularity of face processing would be easy to interpret if not for the fact that for each result suggesting that faces are “special,” another finding indicates that “face-specific” effects can be obtained with nonface objects under the right conditions. For example, newborns are found to prefer any pattern (not just faces) that has more elements at its top than at its bottom, suggesting an interesting but general bias, unlikely to reflect an innate preference for faces (Simion, Valenza, Cassia, Turati, & Umilta, 2002). Behavioral effects such as sensitivity to inversion or configural changes can be obtained for nonface objects in observers trained to become experts with a novel category (Gauthier & Tarr, 2002).
At a neurophysiological level, activity in the human FFA in response to nonface objects, such as birds or cars, is correlated with observers' expertise with those objects (Gauthier, Skudlarski, Gore, & Anderson, 2000). ERP studies reveal expertise effects in the brain that occur as early as the first face-specific response, around 170 ms following stimulus presentation (Tanaka & Curran, 2001). In monkeys, visual neurons can respond selectively to nonface objects, and this response is strikingly similar in some respects to that of face-selective neurons, especially after the monkeys have extensive experience with the nonface category (Logothetis, Pauls, & Poggio, 1995). Also, evidence from brain-lesioned patients is questioned, partly because almost no two patients are tested in the same manner, and there is controversy regarding the evidence necessary to claim a face-selective deficit. In addition, computer simulations that model neural networks show that double dissociations can be obtained even in a nonmodular architecture (Plaut, 1995).
Although all these findings cast doubt on the existence of a face module, the presence of specialization for faces compared with other objects is generally not questioned. Indeed, studies with expert observers typically aim at explaining the very origins of this specialization.
LIMITATIONS OF CURRENT METHODOLOGIES
Thus, researchers find themselves in something of a deadlock. They have been caught up in making the strongest case possible using the most recently developed techniques, but these techniques may not be ideally suited to answering the question of modularity. Human brain imaging using fMRI has been in the spotlight in recent years, and authors have used this technique to debate whether or not responses to faces and objects of expertise overlap. It may seem that conclusions are constrained by the limited spatial resolution of fMRI. However, efforts to look into smaller and smaller “brain pixels” are unlikely to resolve questions of modularity. After all, scientists already know from single-cell recordings that a certain proportion of neurons encode both faces and objects (e.g., Sigala, Gabbiani, & Logothetis, 2002). In the end, the question is not whether face and nonface objects are processed in different places in the brain, but whether they are processed independently. Evidence suggests that object expertise and face processing are very closely related in the brain both in space and in time. But one cannot rule out the possibility that there exist two functionally independent systems that obey different computational rules and have independent processing capacities, but that are so intermingled in the brain that they appear to be the same.
AN INTERFERENCE PARADIGM
We set out to test the functional independence of face and expert processing more directly. We were interested in people who are dual experts, highly skilled at recognizing both faces and, in this case, cars. We used cars as our expert category because they are very different from faces (especially in profile), and experts with this category are readily available. We predicted that dual experts would experience some kind of interference if they had to process a car and a face at the same time, as would be expected if the two skills relied on a common system. More specifically, we focused on interference in holistic processing because it has become a hallmark distinguishing face from object recognition. Holistic processing can be defined as the obligatory processing of all parts of a stimulus, even when observers are directed to attend selectively to one part (e.g., processing of the nose when observers are asked to attend only to the eyes). We knew that people process objects more holistically when they become experts (Gauthier et al., 2000), and therefore we predicted that this increase would trade off with the holistic processing of faces when car experts had to process a car and a face at the same time.
Behavioral Interference
The task we used is illustrated in Figure 1a (Gauthier, Curran, Curby, & Collins, 2003). Observers saw a sequence of faces alternating with cars. Each car or face was made out of two parts (top and bottom), and observers pressed a key to indicate whether the bottom of the current image was the same as or different from the bottom of the last image of the same category. This key press triggered the presentation of the next image. Observers were told to always ignore the top halves of the cars and faces. However, on half of the trials, the top part of the image was incongruent with the correct response for the bottom (e.g., if the bottoms were the same, the tops were different). Thus, we were able to obtain a standard measure of holistic processing for faces and for cars: If observers could selectively attend to the bottom as instructed, the information in the top part should not have influenced their responses. Therefore, our measure of holistic processing was the extent to which the identity of the top part influenced the judgment on the bottom half on each trial.
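One simple way to express such a measure is as a congruency effect: performance on trials where the ignored top half agrees with the correct response, minus performance on trials where it does not. The Python sketch below is illustrative only. The use of raw accuracy, the `Trial` record, and the function names are assumptions and not the scoring procedure of the study itself.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One same/different judgment on the bottom half (hypothetical record)."""
    congruent: bool  # the ignored top half agrees with the correct response
    correct: bool    # the observer's judgment on the bottom half was right


def accuracy(trials):
    """Proportion of correct responses in a non-empty list of trials."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.correct for t in trials) / len(trials)


def holistic_index(trials):
    """Congruency effect: accuracy on congruent minus incongruent trials.

    A value near 0 means the top half was successfully ignored; larger
    values mean the top half influenced judgments on the bottom half.
    """
    congruent = [t for t in trials if t.congruent]
    incongruent = [t for t in trials if not t.congruent]
    return accuracy(congruent) - accuracy(incongruent)
```

Under this convention, an observer who attends perfectly selectively to the bottom half would score close to zero, and the index would be computed separately for face trials and for car trials.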

Fig. 1. Example of trials used to measure the interference effect of car processing on face processing. Composites of faces and cars made of the tops and bottoms of different objects were presented alternately. Subjects were instructed to attend only to the bottom halves of the images for the entire experiment and, for each one, to judge whether the bottom matched that of the last object of the same category. Normal faces were interspersed with either (a) normal cars (tops upright) or (b) cars in a transformed configuration with the top half upside-down. From Gauthier, Curran, Curby, and Collins (2003).
To perform this task accurately on both faces and cars, observers had to keep a car part in memory while they made a judgment on a face part. We could thereby measure the influence of the car context on holistic processing of faces. Performance in this situation was compared with performance in a control condition (Fig. 1b) in which the same faces were alternated with cars that had an upside-down top part. Thus, the task and part to be attended were identical in the experimental and control conditions, but the control condition used a car configuration that was not familiar to car experts. Unfamiliar configurations, such as those resulting from inversion, are processed less holistically than familiar configurations (Farah et al., 1998). We therefore expected that the transformed cars would cause less interference on holistic processing of faces than the upright cars would.
We tested 40 observers, ranging in their expertise with cars from none to extensive (as measured by their performance matching pictures of cars compared with their performance matching pictures of birds, a category with which they had no special experience). To summarize our predictions, we expected that holistic processing for normal cars would be greater than that for transformed cars, especially in observers with more car expertise. We also expected that holistic processing resources would trade off, such that faces would be processed less holistically the more holistically the cars were processed. These predictions were supported; we not only found evidence that car experts process upright cars more holistically than cars in a modified configuration, but we also found a relationship between car expertise and an index specifying the predicted interference. The interference was measured on identical face trials in the two conditions, when only the car context differed. Car experts and novices did not differ in their overall performance on these face trials, but differed only in the extent to which performance varied between the two conditions. These results ruled out simple accounts of the interference in terms of differences in task difficulty or the physical attributes of the object categories and led us to reject the hypothesis of functional independence between face and expert object processing.
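The logic of relating expertise to interference across observers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis used in the study: the difference-score definitions, the sign convention, and all function names are hypothetical, and no data from the study are reproduced.

```python
import math


def expertise_index(car_matching, bird_matching):
    """Car expertise as car-matching performance relative to bird matching.

    Assumption: a simple difference score on a common performance scale.
    """
    return car_matching - bird_matching


def interference_index(face_holistic_transformed, face_holistic_normal):
    """Drop in holistic face processing caused by the normal-car context.

    Assumed sign convention: positive when faces are processed less
    holistically among normal cars than among transformed cars.
    """
    return face_holistic_transformed - face_holistic_normal


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

With one expertise score and one interference score per observer, `pearson_r(expertise_scores, interference_scores)` would be positive if interference grows with car expertise, which is the pattern the prediction describes.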
Neural Interference
An advantage of this paradigm is that it showed that the interference from the car context was associated specifically with holistic processing. Therefore, the interference was at a level that was relevant to prior claims of modularity for face perception. It was unclear where in the brain this interference occurred, so we could not draw conclusions about the neural substrate mediating holistic processing. However, prior research provides some indirect clues. First, car expertise is directly related to activity in the FFA (Gauthier et al., 2000). Second, during expertise training with novel objects, increases in holistic processing are correlated with activity in the FFA (Gauthier & Tarr, 2002). Therefore, this region is a candidate locus for the interference between holistic processing of objects of expertise (e.g., cars) and faces.
But a more important issue revolves around the temporal dynamics of this interference. In particular, the dual task that we used relies heavily on a transient form of memory known as short-term or working memory. Results of fMRI studies suggest that a distributed network of brain areas—including the FFA and frontal areas of the brain—is involved in face working memory; however, the response in the frontal areas lags behind activity in the FFA following encoding of a face (Druzgal & D'Esposito, 2003). To test whether the interference occurs during early stages of visual processing, rather than during later task-specific stages of processing (e.g., the consolidation of items into memory), we used ERPs, which have very fine temporal resolution.
ERPs were recorded from our subjects while they performed our dual task with cars and faces (Gauthier et al., 2003). We focused our analyses on an early response called the N170, the earliest face-selective response in the human brain (Rossion & Gauthier, 2002). This electrical potential is maximal at scalp electrodes above the occipital and temporal lobes of the brain (toward the back of the head) and peaks around 170 ms after the onset of the image (Bentin et al., 1996). In our study, novices showed a larger N170 in response to faces than cars, but car experts showed the reverse pattern. We also calculated an ERP interference index: the difference in magnitude of the N170 elicited by faces seen in the context of transformed cars and the N170 elicited by faces seen among normal cars. This index correlated with car expertise, indicating that the mechanisms responsible for the N170 responses to faces and to cars are not independent. In addition, the fact that interference occurred this early in processing is an indication that the interference was perceptual.
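The ERP interference index described above is a difference between two N170 magnitudes. The following Python sketch shows one way such an index could be computed from averaged waveforms. It is illustrative only: the peak-picking rule, the 130 to 200 ms search window, the waveform format, and the function names are assumptions, not the procedure used in the study.

```python
def n170_magnitude(waveform, window_ms=(130.0, 200.0)):
    """Magnitude of the N170 in an averaged waveform.

    waveform: list of (time_ms, microvolts) samples, time relative to
    stimulus onset. The N170 is a negative deflection, so its magnitude
    is taken here as minus the most negative voltage in the window.
    The window bounds are an assumed choice.
    """
    lo, hi = window_ms
    in_window = [volts for time_ms, volts in waveform if lo <= time_ms <= hi]
    if not in_window:
        raise ValueError("no samples fall inside the search window")
    return -min(in_window)


def erp_interference_index(faces_among_transformed, faces_among_normal):
    """N170 magnitude to faces among transformed cars minus among normal cars."""
    return n170_magnitude(faces_among_transformed) - n170_magnitude(faces_among_normal)
```

One value of this index per observer could then be correlated with that observer's car expertise, in the same way as the behavioral index.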
IMPLICATIONS FOR FUTURE RESEARCH
Although cognitive neuroscience often highlights studies using the latest technique, our research program taught us a useful lesson: The best spatial and temporal resolution cannot make up for asking the wrong question. Questions about the modularity of the mind have mostly received answers about localized specialization in the brain. However, modules could be supported by distributed neural networks, and localized functions need not reflect processing that is specific to a particular kind of content. Because arguments in favor of the modularity of face processing historically rested so much on there being an area in the brain that is face selective, many efforts to test this claim focused on whether object expertise could also engage this region. Although much was learned from these studies, the emphasis on “where” in the brain faces and objects are processed may have led researchers to ignore an important issue: whether processing of faces is functionally independent of the processing of other domains. Our interference results suggest that, wherever in the brain faces and objects are processed, their processing is not functionally independent and thus can hardly be called modular. And if face and object processing are not independent, the fields of object recognition and the study of expertise can facilitate understanding of how faces are processed and how the brain becomes specialized for their perception. Indeed, if modular models are rejected, cognitive neuroscience may need to bridge traditionally separate fields of research (Palmeri & Gauthier, 2004).
The study we have summarized here opens up many new lines of research. Our results indicate that expert object processing interferes with face processing in a working memory task, but it is unknown whether these two kinds of processing could interact in a more automatic fashion. For example, under conditions in which holistic processing resources are at capacity, does the mere perceptual co-occurrence of faces and cars (in the case of car experts) result in interference? In addition, the temporal conditions for this interference are unknown: If one cannot perform optimally when simultaneously processing faces and objects of expertise, how fast can one switch between domains of perceptual expertise without incurring a cost? Finally, although holistic processing can be measured reliably and correlated with specific neural markers, there is still too little work focusing on the mechanisms underlying this phenomenon (see Wenger & Ingvalson, 2003). Computational modeling, constrained by knowledge of neurophysiology, is a necessary step to allow researchers to move away from simplistic debates (e.g., “are faces special?”) and to investigate the complex mechanisms that govern learning and functional flexibility in the visual system.
Acknowledgements
This work was supported by National Science Foundation Award BCS-0091752, National Eye Institute Grant R01-EY13441, and a grant from the James S. McDonnell Foundation to the Perceptual Expertise Network.
