Holistic Processing of Faces in Preschool Children and Adults

Abstract

Contrary to the encoding-switch hypothesis, recent research demonstrates that 6-year-olds do not rely solely on parts-based encoding to recognize upright faces. This research shows better recognition of face parts presented in the whole face than in isolation, indicating use of holistic encoding. The present study examined whether children younger than 6 years also recognize faces holistically. Four-year-olds, 5-year-olds, and adults were administered a part-whole face recognition task. Children below the age of 6 remembered parts from upright faces better when tested in the whole-face context than in isolation. This whole-face advantage did not occur when faces were inverted. Although children showed a smaller inversion decrement than adults and generally performed more poorly than adults, the different age groups showed similar patterns of performance, indicating that young preschoolers, like older children and adults, are able to recognize faces holistically.

Much is known about the way adults recognize faces. Research has distinguished between three types of information, each of which is important for upright face recognition: (a) first-order relational information (i.e., two eyes above a nose, which is above a mouth); (b) holistic information (i.e., representations of the overall structure of the face; Tanaka & Farah, 1993); and (c) configural information (i.e., representations of the spatial relations among the individual features; for a recent review, see Maurer, Le Grand, & Mondloch, 2002). Holistic and configural information are employed for processing upright faces, but not inverted faces, which instead rely on parts-based processing (Bartlett & Searcy, 1993; Rhodes, Brake, & Atkinson, 1993; Tanaka & Farah, 1993).

Less is known about the processes that children employ for face recognition. Face recognition is certainly impressive in infancy. Within the first few hours of life, infants can distinguish between a moving schematic face and a scrambled face (Goren, Sarty, & Wu, 1975; Johnson, Dziurawiec, Ellis, & Morton, 1991). Infants are also able to recognize their mother's face within a few days of birth (Bushnell, Saï, & Mullin, 1989). By the age of 4 years, children have had much experience with faces and are capable of using the face as a cue to various types of information (e.g., identity, emotion, gaze direction; Bruce et al., 2000). Despite this ability, children do not achieve adult levels of face recognition performance until adolescence (Carey, Diamond, & Woods, 1980; Mondloch, Le Grand, & Maurer, 2002).

There has been long-standing debate regarding whether or not there is any qualitative difference in the way that children and adults recognize faces. On the basis of a series of experiments, Carey and Diamond (1977) proposed that the increasing accuracy in face recognition abilities during childhood may reflect a qualitative shift in processing modes. They suggested that until the age of 10, children rely on parts-based processing to recognize faces, and that there is a switch to configural processing at age 10. Recent research, however, has demonstrated that children under the age of 10 do not rely solely on piecemeal processing to make face recognition judgments.

Carey and Diamond (1994) used Young, Hellawell, and Hay's (1987) chimeric-face task to test for holistic processing in 6- and 10-year-old children. After the children studied a set of faces, the bottom and top halves of different faces were fused, and the children were asked to identify the top halves of these chimeric faces. Young et al. had found that adults' recognition accuracy decreases when the halves are aligned compared with when they are misaligned, because a new “Gestalt” emerges when the two halves are aligned. This effect is not seen when chimeric faces are inverted, as adults do not rely on the holistic aspects of inverted faces for recognition. Carey and Diamond (1994) found a similar pattern for 6- and 10-year-old children, indicating that children, like adults, encode faces holistically.

Tanaka, Kay, Grinnell, Stansfield, and Szechter (1998) have also provided evidence indicating that children and adults use similar modes of processing for unfamiliar faces. Six-, 8-, and 10-year-old children learned a set of unfamiliar faces. They were then required to recognize target facial features (eyes, mouth, nose) in a forced-choice procedure, with the features sometimes presented in the whole face and sometimes presented alone. Tanaka et al. reasoned that if children represent faces holistically, then they should recognize a facial feature better when it is embedded in the whole face than when it is presented alone. Indeed, children showed an advantage for recognizing features in whole upright faces. This whole-face advantage disappeared for inverted faces; for inverted faces, recognition performance was similar in the whole and part conditions. Similar results have been found with adults (Tanaka & Farah, 1993). These studies suggest that children and adults both use holistic processing to recognize faces.

These results are contrary to Carey and Diamond's (1977) proposal in indicating that children below the age of 10 do not depend solely on piecemeal representations, but instead rely on holistic encoding for face recognition. When does this adultlike style of processing emerge? Few studies have examined face recognition between infancy and school age (i.e., 6 years), and even fewer have investigated holistic processing in young children. In one study, children learned to recognize two target faces and were then required to identify these faces (presented either upright or inverted) among eight distractor faces, as quickly as possible (Brace et al., 2001). All children older than 5 years showed the classic inversion effect: They were able to access upright faces from memory more efficiently than inverted faces. Children below the age of 5, however, were faster at recognizing the target face when it was upside down than when it was upright. Brace et al. asserted that this “inverted inversion effect” might be due to a lack of reliance on holistic encoding for upright face recognition. They suggested that with increased exposure to faces, children become better at encoding faces holistically.

Using a categorization task, Schwarzer (2002) examined the use of holistic encoding in 2- to 5-year-old children for both face and nonface stimuli (birds and planes). In one experiment with faces and a separate experiment with nonface stimuli, children initially learned to categorize the stimuli into two groups, which were constructed so as to allow categorization in terms of either individual parts or overall structural similarity. A subsequent test phase involving new stimuli determined what type of strategy the children used (parts based or holistic) during the learning phase. Young children were better at categorizing all stimuli in terms of piecemeal characteristics than in terms of overall structure, indicating that they tended to use parts-based processing, rather than holistic processing.

Although previous studies have demonstrated that children aged 6 years and older process faces in a manner that is qualitatively similar to the way adults process faces, two studies examining face recognition in children below the age of 6 indicate developmental differences in processing strategies. Our aim was to determine whether or not 4- and 5-year-old children recognize faces holistically. We employed Tanaka and Farah's (1993) part-whole face recognition task to investigate holistic processing. To ensure that the task was developmentally appropriate, we used an immediate recognition memory task, similar to that of Tanaka et al. (1998, Experiment 3). We used photos of children's faces (which all participants were required to recognize), conducted the experiment over two sessions to prevent fatigue in the children, and increased the exposure duration to accommodate our younger participants. We also presented the tasks on a computer with a touch-sensitive screen to make them more enjoyable, particularly for the children.

In accord with Tanaka and Farah (1993), we reasoned that if preschoolers and adults recognize upright faces holistically, then a target feature should be easier to recognize when presented in the whole face rather than in isolation. To ensure that any observed whole-face advantage was specific to upright faces, we presented inverted faces as control stimuli. We expected that if young children and adults process inverted faces in terms of the component parts, then we would observe no advantage for recognizing target features when they were presented in inverted whole faces rather than in isolation. We also expected that, as in previous studies (e.g., Bruce et al., 2000), children's recognition accuracy would be significantly poorer than that of adults. However, if children and adults use qualitatively similar processing modes (i.e., holistic) to make recognition judgments, then all participants would show an advantage for whole faces over parts when faces were presented upright.

METHOD

Participants

There were three groups, each comprising 23 participants: 4-year-olds (M = 4 years 3 months; range: 3 years 11 months–4 years 7 months), 5-year-olds (M = 5 years 1 month; range: 4 years 8 months–5 years 7 months), and adults (M = 20 years; range: 18–26 years). Children were recruited from the Child Study Centre at the University of Western Australia and a child-care center. The adults participated to fulfill a requirement in their introductory psychology course.

Materials

The face stimuli were similar to those used by Tanaka et al. (1998, Experiment 3). Eight photographs of unfamiliar children's faces (four boys and four girls) were used as target faces. Four different children's faces were used as practice items. Four target faces (two boys, two girls) were presented in an upright orientation, and the remaining four faces were inverted (rotated 180°). For the whole condition, three distractor faces were created for each target by replacing the eyes, mouth, or nose from the target face with the respective feature from a new face. For the part condition, target features were generated by removing either the eyes, mouth, or nose from a face and presenting it in isolation on the screen; distractor features were generated by removing the relevant feature from a distractor face and placing it alone on the screen. The target and distractor whole-face stimuli were approximately 240 mm × 180 mm in size, and were arranged on a dark-gray background. The photographs were presented on a computer with a touch-sensitive screen using Metacard (Version 2.3) software.

Procedure

All participants completed an upright task and an inverted task. Children were seen for about 10 min on each of two separate occasions, 1 week apart. They were tested in a quiet room and were seated approximately 30 cm from the computer screen. The adult participants completed both tasks within a single 25-min session. The order of presentation of the upright and inverted tasks was counterbalanced across participants.

Upright task

Participants were told that they were going to play a game with photos of children's faces shown on a computer. They were instructed to look carefully at each face in order to remember it later. The task began with the presentation of four practice trials (two from each condition) to ensure that all participants understood the instructions and to familiarize the children with the task. On each trial, participants were introduced to a photo of an unfamiliar child's face (e.g., “This is Luke”; see Fig. 1a). The photo remained on the screen for 6 s for inspection (see Footnote 1). In the whole condition, the forced-choice recognition task followed immediately. For this task, two faces appeared on the monitor (see Fig. 1b). One of the faces was the original target face, and the other face was a distractor face. Crucially, the distractor face differed from the target face by only one facial feature (eyes, mouth, or nose). The experimenter explained that one of the faces was the face seen previously (e.g., Luke) and the other was the face of that person's sibling (e.g., Luke's brother). Participants were asked, “Which is Luke's face?” and were instructed to identify the correct face by touching it on the screen.

Fig. 1. Example of stimuli presented in the upright task: (a) target face, (b) test stimuli presented in the whole condition, and (c) test stimuli presented in the part condition.

In the part condition, two isolated facial features (eyes, mouths, or noses; see Fig. 1c) were presented immediately after the target face was shown. One of the features was from the target face, and the other feature was a distractor. Participants were asked which feature belonged to the target face (e.g., “Which are Luke's eyes?”).

There were 24 test trials in total: 2 conditions (whole and part) × 3 features (eyes, mouth, and nose) × 4 target faces (2 boys and 2 girls). Presentation of trials was randomized with the condition that the same target face could not be presented on consecutive trials. The left and right positions of the target stimuli were counterbalanced across test items during recognition trials. The experimenter initiated the test trials for children to ensure that their attention was directed at the computer screen. The adult participants initiated presentation of test items for themselves. Participants were not told whether their answer on each trial was correct.
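The trial structure described above can be illustrated with a short sketch. It is not the authors' Metacard implementation: the face labels, the reshuffle-until-valid strategy, and the half-left/half-right assignment of target position are assumptions made here for illustration only.

```python
import itertools
import random

CONDITIONS = ("whole", "part")
FEATURES = ("eyes", "mouth", "nose")
# Hypothetical face labels; the text specifies only 2 boys and 2 girls.
FACES = ("boy_1", "boy_2", "girl_1", "girl_2")


def build_trials():
    """Cross 2 conditions x 3 features x 4 target faces (24 trials)."""
    return [
        {"condition": c, "feature": f, "face": face}
        for c, f, face in itertools.product(CONDITIONS, FEATURES, FACES)
    ]


def has_consecutive_face(trials):
    """True if any two adjacent trials show the same target face."""
    return any(a["face"] == b["face"] for a, b in zip(trials, trials[1:]))


def randomize(trials, rng, max_tries=100_000):
    """Reshuffle until no target face repeats on consecutive trials."""
    order = list(trials)
    for _ in range(max_tries):
        rng.shuffle(order)
        if not has_consecutive_face(order):
            break
    else:
        raise RuntimeError("no valid order found")
    # Assumed scheme: target on the left for half the trials, right for the rest.
    sides = ["left", "right"] * (len(order) // 2)
    rng.shuffle(sides)
    return [dict(trial, target_side=side) for trial, side in zip(order, sides)]


if __name__ == "__main__":
    for trial in randomize(build_trials(), random.Random(1)):
        print(trial)
```

Rejection sampling is used here only because it is easy to verify; with 24 trials a valid order is typically found within a few hundred shuffles.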

Inverted task

The procedure for the inverted task was identical to that for the upright task, except that all faces and features were shown inverted.

RESULTS

For each task (upright and inverted), we performed a separate three-way mixed-design analysis of variance (ANOVA) on participants' response accuracy, with test type (part, whole) and feature (eyes, mouth, nose) as within-subjects factors, and age (4-year-olds, 5-year-olds, adults) as the between-subjects factor. Preliminary ANOVAs with task order (upright first, inverted first) as an additional factor showed no main effect or interactions involving task order.

Upright task

The upright task showed a significant effect of age, F(2, 66) = 28.85, p < .001, indicating that accuracy improved with age. Adults performed much better (M = 80.4%, SD = 7.2) than preschool children (4-year-olds: M = 63.2%, SD = 9.9; 5-year-olds: M = 63.4%, SD = 9.1). As predicted, there was a significant effect of test type, F(1, 66) = 43.30, p < .001; participants recognized more facial features when they were presented in the whole face (M = 74.6%, SD = 15.3) than when they were presented alone (M = 63.4%, SD = 12.6; see Fig. 2). Test type did not interact with age, and planned t tests revealed that this whole-face advantage was significant for each age group (all ps < .001).

Fig. 2. Children's and adults' performance in the part and whole conditions of the upright task. Standard error bars are shown.

There was also a significant effect of facial feature, F(2, 66) = 11.20, p < .001; children and adults performed better when the target feature was the eyes (M = 75.2%, SD = 17.5) than when it was the mouth (M = 70.3%, SD = 17.9) or nose (M = 61.6%, SD = 17.2). This main effect was qualified by a significant Test Type × Feature interaction, F(2, 66) = 3.88, p < .05. For features presented in whole faces, participants were more accurate at recognizing a set of eyes (M = 84.0%, SD = 17.6) than they were at recognizing mouths (M = 72.5%, SD = 21.5) or noses (M = 67.4%, SD = 23.2). This preference for recognizing the eyes was not evident when the features were presented alone (eyes: M = 66.3%, SD = 24.6; mouth: M = 68.1%, SD = 22.2; nose: M = 63.2%, SD = 26.8).

Inverted task

The ANOVA for the inverted task revealed a significant effect of age, F(2, 66) = 6.88, p < .005, with children performing generally more poorly (4-year-olds: M = 62.4%, SD = 10.8; 5-year-olds: M = 58.3%, SD = 8.8) than adults (M = 69.6%, SD = 11.3). Contrary to predictions, there was a significant effect of test type, F(1, 66) = 13.84, p < .001; target features were better recognized in isolation (M = 68.1%, SD = 16.4) than in the whole-face condition (M = 58.9%, SD = 13.7; see Fig. 3). This effect did not interact with age, and post hoc tests (with Bonferroni correction) were significant for all age groups (all ps < .001).

Fig. 3. Children's and adults' performance in the part and whole conditions of the inverted task. Standard error bars are shown.

There was also a significant effect of feature, F(2, 66) = 8.11, p < .001, with participants recognizing eyes (M = 67.8%, SD = 18.2) and noses (M = 65.9%, SD = 17.6) better than mouths (M = 56.9%, SD = 16.0). Unlike in the upright task, this effect of feature was not dependent on the presentation of the stimuli (i.e., part, whole).

Inversion effect

Examination of Figures 2 and 3 indicates that the effect of inversion in the whole-face condition was greater for adults than for children. We conducted a two-way ANOVA on recognition accuracy in the whole condition, with orientation (upright, inverted) and age (4-year-olds, 5-year-olds, adults) as factors. As expected, the Orientation × Age interaction was significant, F(2, 66) = 3.97, p < .05. An inversion decrement score was calculated by applying the following formula to the accuracy data in the whole condition: [(upright − inverted)/upright] × 100%. A positive score thus reflects better recognition performance on upright than inverted trials. These decrement scores increased with age (4-year-olds: M = 12.2, SD = 26.0; 5-year-olds: M = 14.5, SD = 26.7; adults: M = 26.2, SD = 18.0). The effect of age was marginally significant, F(2, 66) = 2.28, p < .10, with the pattern of results revealing the greatest difference in scores to be between 4-year-olds and adults.
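As a worked illustration of the decrement formula, the sketch below applies it to the pooled whole-condition means reported earlier (upright 74.6%, inverted 58.9%). The function name is hypothetical. Because the reported group scores are averages of per-participant decrements, a value computed from pooled means is not expected to reproduce them.

```python
def inversion_decrement(upright, inverted):
    """Return [(upright - inverted) / upright] x 100 for two accuracy scores."""
    if upright <= 0:
        raise ValueError("upright accuracy must be positive")
    return (upright - inverted) / upright * 100.0


# Pooled whole-condition means across all participants (illustration only).
pooled = inversion_decrement(74.6, 58.9)
print(f"decrement from pooled means: {pooled:.1f}")

# A negative score would indicate better performance on inverted trials.
print(f"hypothetical reversed case: {inversion_decrement(50.0, 60.0):.1f}")
```

Dividing by upright accuracy expresses the drop as a proportion of each participant's own upright performance, which partly offsets the lower overall accuracy of the children.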

DISCUSSION

The aims of the present study were to investigate the processes preschoolers use to recognize faces and to determine whether these processes are similar to those used by adults. As predicted, preschoolers' and adults' recognition of upright features was better when they were presented in the context of a whole face than when they were presented in isolation. This finding is evidence for holistic encoding of faces by preschool children. Although the results revealed a significant effect of age (indicating that adults performed significantly better than preschoolers), age did not interact with test type. Thus, there was no increase in holistic encoding with age, providing no evidence for an encoding switch. Children and adults found it quite difficult to recognize features when they were embedded in an inverted face. In fact, all participants showed improved recognition when inverted features were presented alone. Finally, in the whole-face condition, inversion affected performance increasingly with age.

Carey and Diamond (1977) proposed that children use parts-based processing to represent faces until the age of 10, when there is a switch in processing modes. Recent evidence, however, has suggested that children aged 6 and above do not use parts-based processing to recognize faces. Instead, they employ a holistic encoding strategy (Carey & Diamond, 1994; Tanaka et al., 1998). The present results extend these findings and indicate that children as young as 4 encode faces holistically. As expected, preschoolers showed a whole-face advantage in the upright task; that is, they were better at recognizing individual features embedded in the whole face than in isolation. These findings are contrary to those of Schwarzer (2002), who demonstrated that 2- to 5-year-old children prefer to categorize faces on the basis of constituent parts. Instead, our results indicate that preschool children mentally represent the face as a whole, rendering the individual parts less accessible than the face itself. What might account for this discrepancy between the two studies? Schwarzer's use of schematically drawn faces, instead of more realistic photos (as used here), may have enhanced the salience of individual parts, affecting children's categorization abilities. A second, more likely possibility concerns the nature of the two tasks. Schwarzer used a categorization task to capture children's face-processing preferences. In contrast, this study used an immediate recognition task. It may be that when children are required to group a set of faces, they do so via piecemeal characteristics; when recognition is required, children prefer to use holistic information.

Consistent with recent research, these findings fail to support the existence of an encoding switch. Both preschoolers and adults employ holistic processes to recognize faces. Research suggests that faces may be represented holistically within the first year of life. Cohen and Cashon (2001) used a composite design in which 7-month-old infants first habituated to two female faces. They were then tested on a familiar face, a novel face, and a switched face (a familiar face whose internal facial features had been switched with those from the other familiar face). The infants looked longer at the switched face than the familiar face, suggesting that they were responding to the novelty of a new facial configuration. Thus, rather than explaining the experimental results by a developmental shift in processing modes from parts based to holistic, it is more parsimonious to suggest that adults and even very young children process faces in a qualitatively similar way, namely, via holistic processing.

In the inverted task, both children and adults recognized features presented in isolation better than features embedded in the face configuration. This indicates that features are not processed holistically in inverted faces. Rather, they appear to be represented independently. In this case, a face configuration acts as extraneous noise and impairs recognition of the target feature.

Preschoolers and adults were better at recognizing eyes than mouths or noses when the features were presented in the context of an upright face (but not when they were presented alone). Tanaka and Farah (1993) also reported that adults made more accurate eye judgments than nose or mouth judgments. Research has shown that infants as young as 2 months prefer looking at the eyes to looking at other facial features (Haith, Bergman, & Moore, 1977). This sensitivity to the eyes accords with neurophysiological evidence for a neural system dedicated to processing eye-gaze direction. Perrett et al. (1985) have identified cells in the superior temporal sulcus of the cortex in the macaque monkey that respond not only to faces, but also to the direction of the eyes. The present results reveal that the eyes are significant features within the face for both preschoolers and adults.

By the age of 4, young children represent upright faces holistically. Thus, preschoolers and adults use qualitatively similar processing modes (i.e., holistic) for faces. However, preschoolers' performance did differ from adults' in two important ways. First, 4- and 5-year-olds recognized upright faces more poorly than adults. This finding is consistent with the results of previous studies demonstrating that children do not show an adultlike expertise for faces (Bruce et al., 2000). Second, the pattern of results indicates that preschoolers were less affected by inversion than were adults. Carey and Diamond (1994) also found age-related changes in the size of the inversion effect. Our results extend Carey and Diamond's (1994) results to children below the age of 6. It is important to note that these results indicate that the developmental change in the size of the inversion effect is not due to a greater reliance on holistic encoding. However, recognition accuracy in the upright whole condition was not equated across age groups; that is, adults found this condition much easier than did children. It will be necessary to confirm whether adults' increased sensitivity to inversion holds when baseline performance is matched across participants.

These results lead us to speculate about the possible mechanisms for the developmental differences in face recognition performance. One possibility that Carey and Diamond (1994) suggested is that the development of face-encoding skills reflects increasing knowledge of faces per se. Specifically, they suggested that adults' increased sensitivity to inversion may be the result of a greater reliance on configural information, that is, the spatial relations between features. Configural encoding is widely held to characterize adultlike expertise with faces (Diamond & Carey, 1986). Mondloch et al. (2002) investigated the use of configural information in children aged 6, 8, and 10 years, by manipulating unfamiliar faces either in terms of featural information (by changing the shape of the eyes and mouth) or in terms of configural information (by altering the spacing of the eyes and mouth). Children were able to use both types of information to aid in recognition, but younger children performed more poorly than adults when configural processing was required. The authors concluded that developmental changes in face-processing abilities are due to the gradual development of configural processing. However, adults' recognition accuracy was better when featural information was manipulated than when configural information was manipulated, indicating that subtle configural changes to the stimuli were more difficult to detect than featural changes (E. McKone, personal communication, December 5, 2002). Future studies will need to ensure that performance is equated across the crucial manipulations.

A second possible explanation of the developmental differences in face recognition performance concerns how facial representations are stored in memory. Valentine (1991) suggested that representations might be located in a multidimensional space, where faces are distinguished according to how they vary along the dimensions that define the space. It is conceivable that developmental differences might reflect fewer dimensions being represented in the child's than the adult's face-space (Johnston & Ellis, 1995). As more faces are experienced with age, additional dimensions may be invoked to allow one to discriminate between faces. Future research will need to pinpoint what these dimensions are, and whether or not they might account for the developmental differences observed in this study. Although children and adults both use holistic processing for face recognition, preschoolers evidently need much more experience with faces to achieve adultlike expertise. We have demonstrated, however, that this expertise is not due to a greater reliance on holistic encoding.

Footnotes

1. Pilot testing with longer exposure times (8 and 10 s) resulted in children becoming more distractible between test trials.

Acknowledgements

We are grateful to Bob Joseph at Boston University for allowing us to use the face stimuli, and to Susan Carey and Murray Maybery for comments on previous versions of this manuscript. We also wish to thank all the children for so eagerly participating in this study.

References

Bartlett, J.C., & Searcy (1993). Inversion and configuration of faces. Cognitive Psychology, 25, 281–316.

Brace, N.A., Hole, G.J., Kemp, R.I., Pike, G.E., Van Duuren, & Norgate (2001). Developmental changes in the effect of inversion: Using a picture book to investigate face recognition. Perception, 30, 85–94.

Bruce, Campbell, R.N., Doherty-Sneddon, Import, Langton, McAuley, & Wright (2000). Testing face-processing skills in children. British Journal of Developmental Psychology, 18, 319–333.

Bushnell, I.W.R., Saï, & Mullin, J.T. (1989). Neonatal recognition of the mother's face. British Journal of Developmental Psychology, 7, 3–15.

Carey & Diamond (1977). From piecemeal to configurational representation of faces. Science, 195, 312–314.

Carey & Diamond (1994). Are faces perceived as configurations more by adults than by children? Visual Cognition, 1, 253–274.

Carey, Diamond, & Woods (1980). Development of face recognition—a maturational component? Developmental Psychology, 16, 257–269.

Cohen, L.B., & Cashon, C.H. (2001). Do 7-month-old infants process independent features or facial configurations? Infant and Child Development, 10, 83–92.

Diamond & Carey (1986). Why faces are and are not special: An effect of expertise. Journal of Experimental Psychology: General, 115, 107–117.

Goren, C.C., Sarty, & Wu, P.Y.K. (1975). Visual following and pattern discrimination of face-like stimuli by newborn infants. Pediatrics, 56, 544–549.

Haith, M.M., Bergman, & Moore, M.J. (1977). Eye contact and face scanning in early infancy. Science, 198, 853–855.

Johnson, M.H., Dziurawiec, Ellis, & Morton (1991). Newborns' preferential tracking of face-like stimuli and its subsequent decline. Cognition, 40, 1–19.

Johnston, R.A., & Ellis (1995). The development of face recognition. In Valentine (Ed.), Cognitive and computational aspects of face recognition: Explorations in face space (pp. 1–23). London: Routledge.

Maurer, Le Grand, & Mondloch, C.J. (2002). The many faces of configural processing. Trends in Cognitive Sciences, 6, 255–260.

Mondloch, C.J., Le Grand, & Maurer (2002). Configural face processing develops more slowly than featural face processing. Perception, 31, 553–566.

Perrett, Smith, Potter, Mistlin, Head, Milner, & Jeeves (1985). Visual cells in the temporal cortex sensitive to face view and gaze direction. Proceedings of the Royal Society of London, B223, 293–317.

Rhodes, Brake, & Atkinson, A.P. (1993). What's lost in inverted faces? Cognition, 47, 25–57.

Schwarzer (2002). Processing of facial and non-facial visual stimuli in 2- to 5-year-old children. Infant and Child Development, 11, 253–269.

Tanaka, J.W., & Farah, M.J. (1993). Parts and wholes in face recognition. Quarterly Journal of Experimental Psychology, 46A, 225–245.

Tanaka, J.W., Kay, J.B., Grinnell, Stansfield, & Szechter (1998). Face recognition in young children: When the whole is greater than the sum of its parts. Visual Cognition, 5, 479–496.

Valentine (1991). A unified account of the effects of distinctiveness, inversion and race on face recognition. Quarterly Journal of Experimental Psychology, 43A, 161–204.

Young, A.W., Hellawell, & Hay, D.C. (1987). Configural information in face perception. Perception, 16, 747–759.