Read My Lips

Abstract

It is well known that the right side of the mouth moves more than the left during speech, but little is known about how this asymmetry affects lipreading. We investigated asymmetries in the visual expression and perception of speech using the McGurk effect—an illusion in which incongruent lip movements cause listeners to misreport sounds. Thirty right-handed participants watched film clips in which the left, the right, or neither side of the mouth was covered. The McGurk effect was attenuated when the right side of the mouth was covered, demonstrating that this side is more important to lipreading than is the left side of the mouth. Mirror-reversed images tested whether the asymmetry was the result of an observer bias toward the left hemispace. The McGurk effect was stronger in the normal than in the mirror orientation when the mouth was fully visible. Thus, observers attend more to what they think is the right side of the speaker's mouth. Asymmetries in mouth movements may reflect the gestural origins of language, which are also right lateralized.

Despite its apparent structural symmetry, the face displays important functional asymmetries. For the expression of emotion, it appears that the left side of the face moves more than the right (Wylie & Goodale, 1988), and the left side is perceived to be more expressive of emotion than the right side is (Nicholls, Wolfgang, Clode, & Lindell, 2002; Sackheim & Gur, 1978). In contrast, the right side of the face, particularly the mouth, may be more important to the expression of speech than the left side is. Research measuring movements made by the mouth has revealed that the right side of the mouth opens earlier and wider during speech production (Wolf & Goodale, 1987). Increased motility of the right-mouth has been observed in 5-month-old infants during syllabic babbling (Holowka & Petitto, 2002) and in common marmosets during social-contact calls (Hook-Costigan & Rogers, 1998). Because the right side of the mouth is innervated by the left cerebral hemisphere, increased movement of the right-mouth probably reflects left-hemisphere dominance for speech production, which occurs in 95% of the right-handed (dextral) population (Hausmann et al., 1998).

An insight into the functional significance of asymmetrical mouth movements during speech has been provided by Graves and Potter (1988). They asked participants to restrain manually either the left or the right side of their lips while reciting a tongue twister that required precise bilabial coordination. Listeners, who could not see the speakers, judged the quality of articulation of each sample. The tongue twisters were articulated more clearly when the right side of the mouth was free to move, demonstrating that movements of the right side of the mouth are more important to the articulation of speech than are movements of the left side of the mouth.

Asymmetries in mouth movements may also affect the visual communication of speech. This type of communication, known as lipreading, plays an important role in speech perception in individuals with hearing impairments, as well as those with normal hearing (Massaro, 1987). Campbell (1986) examined asymmetries in lipreading using photographic chimeric images in which one side of the face was articulating one sound (a consonant or vowel) while the other side was articulating another. When observers were asked to identify the sounds spoken by the two halves, the half that featured the right side of the speaker's face and mouth was reported more accurately than the other half, and Campbell concluded that the right side of the mouth is more important than the left to the visual expression of speech.

As Campbell (1986) noted, however, increased accuracy for the right side of the mouth could be brought about by a perceptual asymmetry on the part of the observer. Divided-visual-field research has yielded mixed results, with some investigators reporting a left-visual-field advantage (Baynes, Funnell, & Fowler, 1994) and others reporting a right-visual-field advantage (Smeele, Massaro, Cohen, & Sittig, 1998) for visual speech perception. Campbell investigated the effect of perceptual biases by including mirror-reversed versions of the chimeric images. Post hoc analyses suggested that the advantage for the right-mouth was reduced when the images were mirror reversed. This, in turn, suggests that observers had a perceptual-attentional bias toward the side of the image that fell in their left hemispace (i.e., the right side of the model's face and mouth). Such an effect could reflect an attentional bias generated by the face as a whole (Luh, Rueckert, & Levy, 1991; Moreno, Borod, Welkowitz, & Alpert, 1990) or could be related specifically to lipreading.

Although Campbell's (1986) research suggests asymmetries in lipreading, the study was limited to relatively artificial, static images presented for very brief periods. The present study determined whether lipreading asymmetries exist for more natural, moving images. The visual expression of speech was indexed using the McGurk effect. This illusion arises when speech sounds and lip movements are incongruent. For example, if the mouth movements of “ga” are dubbed over the sound “ba,” a fusion of the two (e.g., “da”) is often reported (McGurk & MacDonald, 1976). In the reverse situation, when the mouth movements of “ba” are dubbed over the sound “ga,” a combination of the two (e.g., “bga”) is often reported (McGurk & MacDonald, 1976). These effects demonstrate that visual processing affects auditory experience in normal individuals (Driver, 1996; Massaro, 1987).

To investigate asymmetries in the expression of speech, we covered either the left or right side of the mouth in our stimuli. We also included a baseline condition in which neither side of the mouth was covered. We expected McGurk-type errors to be most frequent in the baseline condition because this condition maximizes visual information from the mouth. In view of the research showing that the right side of the mouth moves more than the left during speech, we predicted that covering the right side of the mouth would reduce the number of McGurk errors relative to covering the left side. To investigate asymmetries in the perception of speech, we included mirror-reversed versions of the stimuli. If perceptual asymmetries play no part in lipreading, mirror reversal should have no effect on the side of the mouth that produces more McGurk errors. Alternatively, if perceptual asymmetries are important, then mirror reversal should moderate or reverse the asymmetry between the sides of a speaker's mouth.

METHOD

Participants

Thirty undergraduate students (28 females, 2 males) participated in this study as part of their course requirements. The participants' modal age was 18 years, and all reported having normal hearing and visual acuity. All participants were dextral (score>8), as indicated by the Edinburgh Handedness Inventory (Oldfield, 1971). Participants gave informed consent before testing, but were naive regarding the specific aims of the study.

Apparatus

Stimuli were filmed and played back using a Canon MVX1 digital video camera. They were edited on an Apple Macintosh G4 computer using FinalCutPro editing software. Stimuli were displayed on a 280×210 color television monitor coupled with a set of Sennheiser HD40 headphones.

Stimuli

Ten right-handed people (Edinburgh score>8; 5 males, 5 females) whose ages ranged between 18 and 40 years and who spoke English as their first language were used as models. Care was taken to ensure that models' faces were lit symmetrically during filming. Models were asked to fixate the center of the camera lens and articulate the following consonant-vowel (CV) syllables: ba, ga, pa, and da. Models were encouraged to pronounce each syllable naturally, ensuring that their lips were fully closed before and after speaking. Each CV syllable was filmed a number of times to ensure that a clear example was obtained.

To form the control and experimental stimuli, we separated and recombined the visual and acoustic components of the video using FinalCutPro software, taking care to ensure that the vision and sound of the recombined footage were temporally matched. The control stimuli consisted of a soundtrack that matched the visual images (e.g., visual ba combined with audio ba, visual ga combined with audio ga). To make the quality of the control stimuli comparable to that of the experimental stimuli, we created the control stimuli by dubbing the soundtrack from one example of a given CV syllable onto the visual image of another example of the same syllable.

The experimental stimuli consisted of six cases in which the soundtrack did not match the visual image. The vision-sound pairs of ga-ba, da-ba, and da-pa were designed to elicit fusion responses, whereas the reverse pairs, ba-ga, ba-da, and pa-da, were designed to elicit combination responses (see MacDonald & McGurk, 1978). The experimental stimuli were digitally edited to produce normal and mirror-reversed versions of the six vision-sound pairs. These images were then edited so that the right, the left, or neither side of the mouth was covered by a bar. The bar was scaled to the size of each speaker's mouth so that it covered half of the mouth. Its inner edge was aligned with the midline of the mouth, and the bar extended laterally from that edge (see Fig. 1).

Fig. 1. Sample images from the lipreading experiment. The examples illustrate the manipulation of side of the mouth covered (right, neither, or left side covered) and orientation (normal and mirror reversed).

Procedure

There were 40 control trials, which comprised each of the four congruent CV vision-sound pairs as spoken by the 10 models. The factors of incongruent vision-sound pair (ga-ba, da-ba, da-pa, ba-ga, ba-da, pa-da), orientation (normal, mirror reversed), side of mouth covered (right, neither, left) and model (1–10) were combined to produce 360 experimental trials. The order in which the factorial combinations were presented was randomized. The control and experimental trials were mixed and presented in four blocks of 100 trials. The order in which the blocks were administered was balanced across participants using a pseudorandom procedure.
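The trial structure described above can be sketched in a few lines of Python. This is only an illustration of the factorial arithmetic (6 × 2 × 3 × 10 = 360 experimental trials plus 4 × 10 = 40 control trials). The function name `build_blocks`, the dictionary keys, and the shuffle-then-slice allocation of trials to blocks are assumptions, because the text does not state how the mixed trials were assigned to the four blocks.

```python
import itertools
import random

# Factor levels as listed in the Procedure (pairs are written vision-sound).
PAIRS = ["ga-ba", "da-ba", "da-pa", "ba-ga", "ba-da", "pa-da"]
ORIENTATIONS = ["normal", "mirror"]
SIDES_COVERED = ["right", "neither", "left"]
MODELS = list(range(1, 11))
SYLLABLES = ["ba", "ga", "pa", "da"]


def build_blocks(seed=0, block_size=100):
    """Return the 400 trials as a list of blocks (hypothetical helper)."""
    # Fully crossed experimental design: pair x orientation x side x model.
    experimental = [
        {"kind": "experimental", "pair": p, "orientation": o,
         "covered": c, "model": m}
        for p, o, c, m in itertools.product(
            PAIRS, ORIENTATIONS, SIDES_COVERED, MODELS)
    ]
    # Congruent control trials: each syllable spoken by each model.
    control = [
        {"kind": "control", "pair": f"{s}-{s}", "model": m}
        for s, m in itertools.product(SYLLABLES, MODELS)
    ]
    trials = experimental + control
    # Assumption: control and experimental trials are mixed by a plain
    # shuffle and then cut into consecutive blocks.
    random.Random(seed).shuffle(trials)
    return [trials[i:i + block_size]
            for i in range(0, len(trials), block_size)]


blocks = build_blocks()
n_experimental = sum(t["kind"] == "experimental" for b in blocks for t in b)
print(len(blocks), [len(b) for b in blocks], n_experimental)
# prints: 4 [100, 100, 100, 100] 360
```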

Participants were seated at a table with the television screen placed in front of them and aligned with their midline. The soundtrack was played binaurally through headphones. Participants were asked to watch the video and report what the model said. There was a 5-s gap between trials, which allowed participants to respond and the experimenter to write down the response. The experimenter was blind to the experimental condition.

Participants attended one testing session, which lasted approximately 45 min. Before the experimental trials began, 20 practice trials (10 control and 10 experimental) were administered to ensure participants understood the requirements of the task.

RESULTS AND DISCUSSION

Each error fell into one of three categories: (a) fusion responses (e.g., for vision-sound pair ga-ba, “da” is reported), (b) combination responses (e.g., for vision-sound pair ba-ga, “bga” is reported), and (c) lipreading responses (e.g., for vision-sound pair ga-ba, “ga” is reported). Analysis of the error subtypes revealed that the majority of errors (18.9% of the trials overall) were fusion errors, followed by lipreading errors (2.2%) and combination errors (1.0%). A prevalence of fusion relative to combination errors has been reported previously (Green & Gerdeman, 1995) and may reflect listeners' reluctance to report phonotactically unacceptable sounds such as “bga.”
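The three error categories can be illustrated with a small classifier for incongruent trials. This is a sketch based only on the examples given above, not the scoring procedure actually used. The function name `classify_response` is hypothetical, and treating every remaining error as a fusion is an assumption that follows from the statement that errors fell into these three categories.

```python
def classify_response(vision, sound, response):
    """Classify one response to an incongruent vision-sound pair.

    vision and sound are CV syllables such as "ga" and "ba".
    """
    response = response.strip().lower()
    if response == sound:
        return "correct"       # the soundtrack was reported accurately
    if response == vision:
        return "lipreading"    # the visual syllable was reported
    # A combination keeps both consonants, e.g. "bga" for ba-ga.
    if vision[0] in response and sound[0] in response:
        return "combination"
    # Assumption: any other error counts as a fusion, e.g. "da" for ga-ba.
    return "fusion"


print(classify_response("ga", "ba", "da"))   # fusion
print(classify_response("ba", "ga", "bga"))  # combination
print(classify_response("ga", "ba", "ga"))   # lipreading
```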

Total error was calculated by summing the three types of errors and converting this number into a percentage of the total number of trials. The average error rate for control trials was 3.2%, demonstrating that participants easily identified the stop consonants when the visual and auditory information was congruent. Considerably more errors (22.1%) were made for the experimental stimuli, in which the visual image and soundtrack were mismatched, indicating that participants' auditory experience was affected by visual information from the speaker, leading to the McGurk effect (Driver, 1996).
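As a worked illustration of this calculation, the sketch below sums the three error counts and expresses them as a percentage of trials. The function name `total_error_percent` and the example counts for a single participant are hypothetical. The last line simply checks that the reported subtype percentages (18.9%, 2.2%, and 1.0%) add up to the reported total of 22.1%.

```python
def total_error_percent(fusion, combination, lipreading, n_trials):
    """Total error as a percentage of the trials presented."""
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return 100.0 * (fusion + combination + lipreading) / n_trials


# Hypothetical counts for one participant over the 360 experimental trials.
print(round(total_error_percent(68, 4, 8, 360), 1))  # 22.2

# The reported subtype percentages sum to the reported overall error rate.
print(round(18.9 + 2.2 + 1.0, 1))  # 22.1
```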

Total-error data for experimental trials were analyzed with a repeated measures analysis of variance with side of the mouth that was covered (right, neither, or left) and orientation (normal or mirror reversed) as within-participant factors. Effect sizes are reported as eta-squared values (η²). Figure 2 illustrates that the error rate changed as a function of the side of the mouth that was covered, F(2, 58)=15.70, p<.001, η²=.35. Post hoc t tests revealed no difference in the error rate between the neither-side-covered condition and the left-side-covered condition, t(29)=0.78, n.s. The error rate was lower, however, when the right side of the mouth was covered than when neither side was covered, t(29)=6.68, p<.001, or when the left side was covered, t(29)=3.90, p<.005.
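The reported effect size can be related to the F ratio with a short calculation. The sketch below uses the standard conversion from F and its degrees of freedom to partial eta-squared. That the reported η² was obtained this way is an assumption, although the value for the main effect of side covered is consistent with it. The function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F ratio and its df."""
    ss_ratio = f_value * df_effect  # SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# Main effect of side of the mouth covered: F(2, 58) = 15.70.
print(round(partial_eta_squared(15.70, 2, 58), 2))  # 0.35
```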

Fig. 2. Mean error rate for stimuli with the right, the left, or neither side of the mouth covered in the normal and mirror-reversed orientations. Confidence intervals were calculated using the technique described by Loftus and Masson (1994). Note that “left” and “right” refer to the side of the speaker's mouth that was covered, irrespective of whether the stimulus was mirror-reversed.

To investigate whether the asymmetry was consistent across the 10 models—and not due to a few idiosyncratic outliers—we reorganized the data so that the 10 models became the source of variance and side of the mouth was the repeated factor. Once again, a significant effect of side of the mouth was found, F(2, 18)=3.6, p<.05, η²=.28. Post hoc analyses revealed that error rates were lower when the right side of the mouth was covered than when neither side was covered, t(9)=2.97, p<.05, or when the left side was covered, t(9)=2.39, p<.05. Thus, it appears that the mouth asymmetry was consistent across models.

The finding that fewer errors were made when the right side of the mouth was covered than when the left or neither side was covered demonstrates that the right side of the mouth is more important than the left in generating the McGurk effect. This, in turn, suggests that the right-mouth is more visually expressive than the left-mouth during speech. We predicted that the McGurk effect would be strongest when both sides of the mouth were visible, because this condition maximized visual input. Instead, the error rate for the full-mouth condition was no higher than that observed when the left side was covered. This suggests that the visual information provided by movements of the right side of the mouth is just as informative as the information provided by movements of the entire mouth.

There was no effect of orientation, F(1, 29)=0.01, n.s. However, Figure 2 shows a significant interaction between the side of the mouth covered and orientation, F(2, 58)=8.54, p<.005, η²=.23. Post hoc t tests showed that orientation had no effect on error rate when the right or left side of the mouth was covered, t(29)=1.32, n.s., and t(29)=1.67, n.s., respectively. In contrast, when neither side of the mouth was covered, the error rate was higher when stimuli were presented in the normal orientation than when they were presented in the mirror orientation, t(29)=3.14, p<.005. Such an effect suggests a perceptual asymmetry of observers toward their left hemispace. Thus, the magnitude of the McGurk effect was maximized in the normal orientation when the more expressive, right side of the speaker's mouth fell in the attended left hemispace. Conversely, the McGurk effect was reduced in the mirror-reversed condition when the right side of the speaker's mouth fell in the unattended right hemispace.
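The geometry behind this interpretation can be made explicit with a small mapping from the side of the speaker's mouth to the viewer's hemispace. This is only a sketch of the reasoning in the paragraph above, and the function name `viewer_hemispace` is hypothetical.

```python
OPPOSITE = {"left": "right", "right": "left"}


def viewer_hemispace(speaker_side, orientation):
    """Hemispace of the viewer in which one side of the speaker's mouth appears."""
    # A speaker facing the viewer has their right side in the viewer's left.
    side = OPPOSITE[speaker_side]
    # Mirror reversal flips the image horizontally.
    if orientation == "mirror":
        side = OPPOSITE[side]
    return side


print(viewer_hemispace("right", "normal"))  # left
print(viewer_hemispace("right", "mirror"))  # right
```

Under a leftward attentional bias, the more expressive right side of the mouth therefore falls in the attended hemispace only in the normal orientation.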

The interaction between orientation and the side of the mouth covered provides an insight into the mechanisms that underlie the asymmetry. If the leftward attentional bias was generated by the face as a whole (e.g., Luh et al., 1991), one would expect a reduction in the McGurk effect whenever the mask that covered the mouth fell in the viewer's left hemispace. Instead, the effect of mirror-reversal disappeared when one side of the mouth was covered. This pattern of results suggests that the perceptual asymmetry is tied to the mouth and that participants preferentially attend to the part of the mouth that falls to their left. How participants knew to look toward what they thought was the right side of the speaker's mouth is a matter for debate. It is possible that the behavior is learned. Given the long evolutionary history of oral asymmetries in communication (Hook-Costigan & Rogers, 1998), however, the possibility that the bias is innate, like the universal ability to express and understand facial emotions such as sadness and happiness (Ekman, 1980), cannot be ruled out.

Our results demonstrate the importance of the right side of the mouth to the visual expression and comprehension of speech. Corballis (2002) suggested that language originated in gestures made by the right hand before evolving into “hidden” vocal gestures (i.e., speech). Our study suggests that the evolutionary origins of language are still visible and are lateralized to the same side—the right. It appears that listeners have an intuitive knowledge of this asymmetry and preferentially attend to the right side of a speaker's mouth. It would be interesting to examine this asymmetry in individuals with congenital hearing impairments, who are expert lip-readers, to determine the extent to which their lipreading skills are lateralized.

References

Baynes, Funnell, M.G., & Fowler, C.A. (1994). Hemispheric contributions to the integration of visual and auditory information in speech perception. Perception & Psychophysics, 55, 633–641.

Campbell (1986). The lateralization of lip-reading: A first look. Brain & Cognition, 5, 1–21.

Corballis, M.C. (2002). From hand to mouth. Princeton, NJ: Princeton University Press.

Driver (1996). Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature, 381, 66–68.

Ekman (1980). The face of man: Expressions of universal emotion in a New Guinea village. New York: Garland STPM Press.

Graves & Potter, S.M. (1988). Speaking from two sides of the mouth. Visible Language, 22, 129–137.

Green, K.P., & Gerdeman (1995). Cross-modal discrepancies in coarticulation and integration of speech information: The McGurk effect with mismatched vowels. Journal of Experimental Psychology: Human Perception and Performance, 21, 1409–1426.

Hausmann, Behrendt-Koerbitz, Kautz, Lamm, Radelt, & Guentuerkuen (1998). Sex differences in oral asymmetries during word repetition. Neuropsychologia, 36, 1397–1402.

Holowka & Petitto, L.A. (2002). Left hemisphere cerebral specialization for babies while babbling. Science, 297, 1515.

Hook-Costigan, M.A., & Rogers, L.J. (1998). Lateralised use of the mouth in production of vocalizations by marmosets. Neuropsychologia, 36, 1265–1273.

Loftus, G.R., & Masson, M.E.J. (1994). Using confidence intervals in within-subjects designs. Psychonomic Bulletin & Review, 1, 476–490.

Luh, K.E., Rueckert, L.M., & Levy (1991). Perceptual asymmetries for free viewing of several types of chimeric stimuli. Brain & Cognition, 16, 83–103.

MacDonald & McGurk (1978). Visual influences on speech perception processes. Perception & Psychophysics, 24, 253–257.

Massaro, D.W. (1987). Speech perception by ear and eye. Hillsdale, NJ: Erlbaum.

McGurk & MacDonald (1976). Hearing lips and seeing voices. Nature, 264, 746–748.

Moreno, C.R., Borod, J.C., Welkowitz, & Alpert (1990). Lateralisation for the expression and perception of facial emotion as a function of age. Neuropsychologia, 28, 199–209.

Nicholls, M.E.R., Wolfgang, B.J., Clode, & Lindell, A.K. (2002). The effect of left and right poses on the expression of facial emotion. Neuropsychologia, 40, 1662–1665.

Oldfield, R.C. (1971). The assessment of handedness: The Edinburgh Inventory. Neuropsychologia, 9, 97–113.

Sackheim, H.A., & Gur, R.C. (1978). Lateral asymmetry in the intensity of emotional expression. Neuropsychologia, 16, 473–482.

Smeele, P.M.T., Massaro, D.W., Cohen, M.M., & Sittig, A.C. (1998). Laterality in visual speech perception. Journal of Experimental Psychology: Human Perception and Performance, 24, 1232–1242.

Wolf, M.E., & Goodale, M.A. (1987). Oral asymmetries during verbal and non-verbal movements of the mouth. Neuropsychologia, 25, 375–396.

Wylie, D.R., & Goodale, M.A. (1988). Left-sided oral asymmetries in spontaneous but not posed smiles. Neuropsychologia, 26, 823–832.