Abstract
Multimedia advertisements often contain nonverbal auditory elements, such as music and sound effects, and nonverbal visual elements, such as images and logos. On the one hand, these elements can have the unintended negative effect of interfering with the processing of the verbal ad copy. Two experiments demonstrate that auditory elements interfere more with the learning of and cognitive responding to English ad copy than with Chinese ad copy, and vice versa for visual elements. On the other hand, auditory and visual elements have the intended positive effect of facilitating ad copy recall when they are reinstated as part of an integrated marketing campaign or as a recall cue in an advertising tracking study. A third experiment demonstrates that auditory elements are better retrieval cues for English than for Chinese ad copy, and vice versa for visual elements. The authors discuss implications of these cross-linguistic differences for the effective design of multimedia communications, integrated marketing campaigns, advertising tracking studies, and cross-cultural research.
Multimedia advertisements employ various nonverbal auditory and visual elements, often to capture and hold consumers' attention and to serve as retrieval cues for later recall. These elements can relate to the ad copy in a meaningful way, or they can be unrelated to the ad copy. For example, of a large sample of U.S. television commercials, more than 42% contained music, but only 12% carried commercial messages in the lyrics (Stewart and Furse 1986). Of the Web's top 75 sites, 60% feature audio in addition to images, and approximately 25% provide ambient music (Jupiter Research, rpt. in Crockett 2001). We examine the influence of unrelated auditory and visual elements on the processing of ad copy that is written in the world's two most popular scripts: alphabetic English and logographic Chinese. Recent memory research on nonsensical Korean words suggests that alphabetic words should be more sensitive to the influence of auditory elements, whereas logographic words should be more sensitive to the influence of visual elements (Tavassoli and Han 2001).
We begin by exploring unintended detrimental consequences of employing auditory and visual ad elements. Nonverbal elements compete for limited cognitive resources at the time of message exposure and can interfere with the learning of the verbal ad copy, inhibit consumers' ability to elaborate, and thereby affect product judgments. Interference has been shown to result from auditory elements, such as music (Olsen 1997; Park and Young 1986) and sound effects (Rule and Rehill 1970), and from visual elements, such as video (Festinger and Maccoby 1964) and unrelated pictures (Edell and Staelin 1983). We argue that auditory elements are relatively more detrimental to the processing of English ad copy, whereas visual elements cause more interference in Chinese. We test this hypothesis for message recall, memory-based attitudes, and product interest in Experiment 1 and for cognitive responding in Experiment 2.
Despite competing for cognitive resources during message exposure, nonverbal ad elements can also serve as effective memory cues. Marketers often reinstate ad images or music as memory cues in advertising tracking studies (Stewart, Farmer, and Stannard 1990) and on Web sites, product packaging, and point-of-purchase displays as part of an integrated marketing campaign, to remind consumers of the advertising content (Edell and Keller 1989). For many consumers, the song “Like a Rock” triggers the recall of the Chevrolet brand and ad copy that was presented with the music. Similarly, the image of a raincoat-clad baby sitting in a car tire facilitates recall of the Michelin brand name and ad copy that was communicated alongside the image. We argue that reinstating music is a relatively stronger retrieval cue for the recall of English ad copy, whereas reinstating ad images is a relatively stronger retrieval cue in Chinese. We test this hypothesis in Experiment 3.
To summarize, the effective use of nonverbal elements in the design of marketing communications can present a trade-off between interference at message reception and facilitation at message retrieval. We provide insight into this design challenge by examining the relative potency with which auditory and visual elements contribute to these reciprocal effects in Chinese and in English. Our research builds on the more theoretical contribution of Tavassoli and Han (2001), whose findings on nonsensical words have several limitations that potentially prevent them from having broad marketing applications. We address these limitations and test their ideas for real words and levels of information processing that are central to consumer decision making. We conclude by discussing theoretical implications for the everyday processing of written language and practical implications for the effective design of multimedia communications, integrated marketing campaigns, and advertising tracking studies. We also discuss our results with respect to academic research in which auditory and visual stimuli or filler activities can confound cross-cultural research across alphabetic and logographic scripts.
Processing Chinese and English
Auditory and visual ad elements should differentially interact with Chinese and English words because reading alphabetic words relies more on sound-based (phonological) processes and reading logographs relies more on visual processes. As do most languages, English relies on an alphabetic script in which letters represent sounds. Readers of English tend to subvocalize written words, except for a small number of frequently used words (McCusker, Hillinger, and Bias 1981), and they rehearse English words primarily using a sound-based mental code in a subsystem of short-term memory known as the phonological loop (Baddeley 1986; Paivio 1986). Routine English language processing does not appear to rely on visual short-term memory other than through mental imagery (Gathercole and Baddeley 1993).
The process of reading differs considerably for the 25% of the world population who must visually distinguish upward of 7000 Chinese logographs. Logographs represent meaning, and a reader can mentally access concepts unmediated by subvocalization, though the activation of pronunciation may be immediate and can mediate the activation of meaning (Perfetti and Zhang 1991; Spinks et al. 2000). Reading logographs is not dominated by sound-based processes and appears to rely to a greater degree on visual processes (Hung and Tzeng 1981; Schmitt, Pan, and Tavassoli 1994; Zhou and Marslen-Wilson 1999). Chinese speakers may rely less on the sound-based processes to encode and rehearse words because there is an abundance of homophones in Chinese (words that sound the same but have different meanings, such as “dear” and “deer” in English). For example, Mandarin Chinese uses only approximately 400 syllables (1300 with tones), compared with approximately 4000 in English. Homophones make sound an ambiguous mental code, and Chinese speakers may supplement it with visual orthographic information for which meaning is unambiguous.
Marketing Implications
Much of the psychological research on differences in reading alphabetic and logographic scripts has been limited to lower-level processes, such as those involved in lexical access. In contrast, the marketing literature has identified effects that represent higher-level processes involved in learning and persuasion. Several articles have examined effects of selective attention. Pan and Schmitt (1996) find that attitude ratings provided by readers of Chinese are more sensitive than those provided by readers of English to the match between the femininity or masculinity of fonts for feminine (e.g., lipstick) and masculine (e.g., motorcycles) products, whereas the reverse relationship holds for auditory relationships (i.e., a female versus a male speaker). Tavassoli (2001) also finds that readers of Chinese are more sensitive to visual features of written words. They are more likely than readers of English to remember the print color of a brand name and are influenced more by a color match among brand names in brand evaluations. Notably, bilinguals have been found to be flexible in the way they process brand names in a mixed-language context (selectively attending sound-based or semantic relationships) depending on the emphasis given to the copresented alphabetic English and the logographic Chinese brand name (Zhang and Schmitt 2001).
In addition to selective attention to visual and acoustic features, Tavassoli (1999, 2002) and Tavassoli and Han (2001) argue that the two scripts also differ in the short-term memory processes on which they rely. Memory for alphabetic words has been found to reflect the serial rehearsal characteristics of short-term memory's phonological loop (Baddeley 1986; Paivio 1986), and memory for the presentation order of written words is better for alphabetic words than for logographic words (Tavassoli 1999; Tavassoli and Han 2001). In contrast, memory for logographs reflects the spatial–relational rehearsal characteristics of visual short-term memory (Jiang, Olson, and Chun 2000; Paivio 1986), and memory for the spatial location of logographic words is better than it is for alphabetic words (Tavassoli 2002; Tavassoli and Han 2001).
Central to the thesis of this article, however, are not the qualitative rehearsal characteristics of short-term memory's phonological and visual stores but the quantitative limitations in their respective processing capacities (Baddeley 1986; Gathercole and Baddeley 1993; Miller 1956). We expect that auditory and visual ad elements will interact differently with Chinese and English words on the basis of a differential overlap in processing requirements.
Experiment 1: Recall, Attitudes, and Customer Interest
In Experiment 1, we examine detrimental effects of auditory and visual ad elements on the processing of ad copy. Tavassoli and Han (2001, Experiment 1) find that sounds interfere more with the encoding of alphabetic words, whereas geometric images interfere more with the encoding of logographic words. They tested this idea by having Korean participants learn nonsensical words that were written either in the alphabetic Hangul or in the logographic Hancha scripts. The nonsensical words were presented for one second each and were each followed by either a two-second sound or image that also had to be learned. As Tavassoli and Han predicted, recognition memory for alphabetic words was worse when colearned with sounds, whereas recognition of logographic words was worse when colearned with images.
However, Tavassoli and Han's (2001) results need to be treated with caution, because they may not necessarily generalize to the processing of advertising information. This is because the processing of nonsensical alphabetic words is largely limited to sound-based processes (Baddeley, Gathercole, and Papagno 1998). Semantic rehearsal is limited because letters have no meaning and words have no lexical entry. Visual rehearsal is limited because a small inventory of letters is reused for different words. In contrast, nonsensical logographic words need to be constructed from at least two different characters that inherently possess meaning and each consist of a unique stroke combination. In other words, whereas new-word learning in alphabetic scripts is largely limited to sound-based processing, the visually differentiated nonsensical logographs can be encoded and rehearsed semantically to some degree. This could have artificially created effects that are limited to nonmeaningful marketing stimuli, such as the brand name Exxon for alphabetic scripts or Microsoft for logographic scripts. Finally, only approximately 50% of Korean words can be written in a logographic script, and these come from certain categories of words, such as nouns. It is therefore possible that there is a word-category effect that Tavassoli and Han did not control for with Korean nonsensical words.
In Experiment 1, we extend Tavassoli and Han's (2001) findings in several ways. First, we use real words that can be processed semantically in both Chinese and English. Second, we rely on sentences that are typical of information presented in advertisements. Third, we ask participants to view the information in the way they would view a television commercial rather than provide them with explicit instructions to learn both a list of unrelated words and the interspersed nonverbal stimuli. Dual-task learning may lead participants to adopt learning strategies that do not reflect the way consumers normally process commercials. Fourth, in addition to assessing memory, we examine the effect of auditory and visual distraction on attitudes formed, the strength of these attitudes, and customer interest. We next develop hypotheses for these dependent measures.
Message Reception
Distraction has a significant effect on learning verbal information (for reviews, see Baron, Baron, and Miller 1973; McGuire 1985). It has been found that recall of advertising claims is attenuated by visual elements, such as unrelated pictures in print advertisements (Edell and Staelin 1983), and auditory elements, such as background music (Park and Young 1986). Because short-term memory has multiple subsystems (Baddeley 1986), it can be confusing to interpret the effects of distracters in terms of a single-capacity processor, especially in complex multimedia advertisements (Janiszewski 1990). Instead, we should expect distracters to cause more interference in task performance and memory the greater the overlap is in processing resources with the target stimuli (Klingberg and Roland 1997). For example, even though English words can interfere with memory for pictorial information, and vice versa, this cross-modality interference is significantly weaker than within-modality interference (Anderson and Paulson 1978). Similarly, auditory interference is stronger for words than for pictures in English, and vice versa for visual interference (Pellegrino, Siegel, and Dhawan 1976). By the same reasoning, nonverbal auditory and visual information should have differential effects on the encoding and subsequent recall of alphabetic and logographic words. Auditory distracters such as music (Salame and Baddeley 1989) and other nonspeech sounds (Jones 1993) have been shown to interfere with processing in short-term memory's phonological loop and should interfere more with the learning of alphabetic English words. In contrast, visual distracters are processed in visual short-term memory and should interfere more with memory for Chinese logographs. 
In other words, although it can be assumed that both auditory and visual distracters compete for processing resources in both logographic Chinese and alphabetic English, there should be relative differences in the degree to which they do so across the scripts.
In some circumstances, nonverbal information can increase verbal memory. For example, pictures that are meaningfully related to a brand name and processed interactively can facilitate recall (Lutz and Lutz 1977; Schmitt, Tavassoli, and Millard 1993). Distracters can also aid memory when they prevent the generation of thoughts that are unrelated to the target message. For example, background music negatively affected recall for radio advertisements compared with silent advertisements when interstimulus intervals were relatively brief (≤2 seconds) but facilitated recall when they were extensive (3 seconds; Olsen 1997). We do not expect a facilitating effect because we rely on unrelated auditory and visual elements, either of which is present in the interstimulus interval.
Attitudes
Message reception is also central to models of attitude formation and attitude change. When distraction blocks message reception, it dilutes the effect of the message content (Baron, Baron, and Miller 1973; McGuire 1985). The effect of message reception on attitude formation and change has been found to be particularly strong in memory-based judgments (Hastie and Park 1986; Lichtenstein and Srull 1987). Consumers make memory-based judgments when they form evaluations about products and services by recalling what they previously learned. Distraction can also limit message scrutiny and affect online attitudes without affecting message reception (for a review, see Petty and Cacioppo 1986). Because we allow consumers to naturally process the advertisements, the attitudes we measure most likely contain a mix of online and memory-based elements.
Distraction can enhance or reduce the persuasive impact of a message. Distraction reduces learning of an advertisement's persuasive content and limits message scrutiny. Therefore, distraction should inhibit favorable evaluations of attitude objects described by strong messages that lead to the generation of positive cognitive responses, whereas distraction should lead to more favorable evaluations of attitude objects described by weak messages that result in the counterarguing of claims (Petty and Cacioppo 1986; Petty, Wells, and Brock 1976). In some circumstances, distraction can also lead to more favorable evaluations of attitude objects described by strong messages, such as when consumers have more cognitive resources available than are needed for processing a message. In that case, consumers' idiosyncratic thoughts can undermine persuasion, because the thoughts are likely to be less favorable than direct message associations (Anand and Sternthal 1989). Consumers are likely to have excess resources available for simple messages or when they repeatedly encounter a message.
We provide single exposures of an advertisement with strong message content for which distraction should reduce the favorability of evaluation. Brand evaluations should be relatively more favorable with visual than with auditory distraction in English, and vice versa in Chinese. Attitudes should also be stronger and held with more confidence with visual than with auditory distraction in English, and vice versa in Chinese. Attitudes held with more confidence persist longer, are more resistant to persuasion, and are better predictors of subsequent behaviors (Petty and Krosnick 1995). Therefore, we ask participants whether they would like to receive additional information on the advertised brand. We expect the same pattern of results for this behavioral measure of interest as for the attitudinal measures.
Method
Design
A total of 120 students participated in the 2 (script: alphabetic versus logographic) × 2 (distraction: auditory versus visual) between-participants experiment. We expected that the levels of auditory and visual distraction we chose would attenuate message learning in both script conditions, and we therefore did not include a no-distraction control condition. In this experiment and all others, the instructions, materials, and questionnaires were provided in Chinese in the logographic condition and in English in the alphabetic condition.
Participants
In this and all other experiments, we relied on Singapore bilinguals who learned Chinese and English in childhood. Proficient bilinguals rely on different scripts to access a single conceptual system (for a review, see Francis 1999; for Chinese, see Chen and Leung 1989). We were thus able specifically to isolate the script used to represent the same concepts in the different languages. Moreover, the use of bilinguals controls for a host of factors that may differ between populations as different as Chinese speakers living in China and English speakers living in the United States. These factors include educational differences that can affect memory rehearsal strategies, cultural factors that can affect the evaluation of certain product characteristics, and different responses to nonverbal audiovisual stimuli.
Stimuli
We adapted the target verbal stimuli from Aaker and Lee's (2001, Experiment 3) study. These had been pretested to provide a positive message advocating a tennis racquet. The advertisement we used read as follows:
The new “Star” tennis racquet is of the highest quality created based on certain important attributes. First, the weight is light and optimally distributed to be heavier on the sides of the frame. This means that the “Star” tennis racquet allows you to hit solid, powerful returns and serves. The size of the “Star” tennis racquet's sweet spot is considerably larger than most competing brands. This allows you to hit with both power and accuracy. Finally, its shock absorbers are made of a new technology patented by Lycra. This uniquely eliminates vibrations that lead to painful tennis elbow and other arm-related injuries. Get the new difference now.
These stimuli were translated into Chinese and back-translated by bilingual Singaporeans. The advertisement's verbal content was displayed in a text box by a computer in a sentence-by-sentence manner with a three-second interval between each sentence.
In the auditory distraction condition, the verbal content was continuously accompanied by energetic, fast-tempo rock music without any visuals. The music had bass and rock guitar instrumentation. In the visual distraction condition, the advertisement was silent, and text boxes were framed by composite photographs that depicted people and objects related to tennis but not specifically to any of the verbal claims (e.g., images of tennis balls, stadiums, and people playing tennis). Similar but different images were also shown in the intervals between text boxes. The advertisement lasted about one minute.
Procedure
Participants were shown a set of three advertisements; the middle one was the target advertisement. The target advertisement was shown either in Chinese or in English with auditory or visual distraction. The two filler advertisements were shown in the same language as the target advertisement, and both contained text that was accompanied by music and images. Participants were instructed to “view the information as they would normally view a television advertisement.” After seeing all advertisements, participants engaged in a two-minute math quiz to clear short-term memory, under the assumption that this would not differentially affect performance across conditions. They then had two minutes to complete the attitude scales and had the option to request additional information on the brand. Finally, participants had two minutes to recall the advertisement's claims.
Dependent measures
Participants provided ratings on five attitude and behavioral intention scales. These scales did not contain any numbers, but they contained seven boxes, of which participants marked one. These were coded from 1 to 7, such that more positive ratings received a higher score. The five seven-point scales were anchored by “very good”/“very bad,” “like a lot”/“do not like at all,” “very desirable”/“not at all desirable,” “would like to try”/“would not try,” and “would buy”/“would not buy.” We averaged the ratings on the five scales to provide an index for attitude positivity (α = .94). We measured the strength with which these attitudes were held using two scales that were coded from 1 (“I am not at all certain about my ratings”/“I am not at all confident about my ratings”) to 7 (“I am very certain about my ratings”/“I am very confident about my ratings”). We used the average score on these two scales as an indicator of attitude strength (r = .96). As a behavioral measure of customer interest, participants also had the option to request or decline additional information about the tennis racquet. Finally, we calculated recall memory with the number of informational elements participants recalled from the advertisement. There were a total of eight possible recall elements coded (one from each sentence) by two coders. There was little disagreement in the coding, and discrepancies were resolved by discussion.
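The scoring described above can be illustrated with a minimal sketch. It assumes the attitude index is the simple mean of the five 1-to-7 ratings and that the reported α is Cronbach's alpha computed from sample variances; the ratings below are hypothetical and are not data from the experiment.

```python
from statistics import mean, variance


def attitude_index(ratings):
    """Average one participant's scale ratings (each coded 1-7) into one index."""
    return mean(ratings)


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of ratings.

    rows: one list per participant, one entry per scale.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings for five participants on the five attitude scales.
ratings = [
    [6, 6, 5, 6, 5],
    [3, 4, 3, 3, 4],
    [5, 5, 5, 4, 5],
    [2, 2, 3, 2, 2],
    [7, 6, 7, 7, 6],
]

indices = [attitude_index(row) for row in ratings]
alpha = cronbach_alpha(ratings)
print(indices, round(alpha, 2))
```

An alpha close to 1, such as the .94 reported here, indicates that the five scales move together closely enough to be averaged into a single index.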
Results
We expected the same pattern of results across all dependent measures: attitude positivity, attitude strength, customer interest, and recall memory. Therefore, we performed a multivariate analysis of variance across these four measures on the between-participant factors script (alphabetic English versus logographic Chinese) and distraction (auditory versus visual). The main effects were not significant, but the interaction effect between script and distraction was highly significant (F(1, 119) = 13.01, p < .0005). We next report detailed results from univariate tests. The critical test of the differential processing hypothesis is the significance of the interaction term. Nevertheless, we also provide the results of ad hoc contrasts across the script and distracter conditions for ease of interpretability. The raw means are summarized in Table 1.
Table 1. Summary of the Means (Standard Errors) from Experiment 1
Attitudes
We hypothesized that the more participants were distracted, the less positive and the weaker their attitudes would be. The analyses of variance (ANOVAs) yielded no significant main effects, but the interaction between script and distraction was significant for attitude positivity (F(1, 119) = 14.34, p < .0001) and for attitude strength (F(1, 119) = 6.85, p = .01). Ad hoc contrasts show that in Chinese, attitudes were more positive in the auditory than in the visual distraction condition (MA = 3.95 versus MV = 3.17; t[58] = 3.08, p < .004) and directionally stronger (MA = 4.72 versus MV = 4.15; t[58] = 1.78, p = .08). In contrast, in English, attitudes were less positive in the auditory than in the visual distraction condition (MA = 3.35 versus MV = 3.96; t[58] = −2.30, p < .03) and directionally weaker (MA = 4.22 versus MV = 4.80; t[58] = −1.93, p < .06). Across scripts, auditory distraction led to more positive attitudes (t[58] = 2.50, p < .02) that were directionally stronger (t[58] = 1.66, p = .10) in Chinese than in English. Visual distraction led to less positive attitudes (t[58] = −2.85, p < .007) that were weaker (t[58] = −2.03, p < .05) in Chinese than in English. The means for attitude positivity are represented graphically in Figure 1.

Figure 1. Experiment 1: Means for Attitude Positivity (+SE) for Advertisements Presented with Distracting Music or Distracting Images
Customer interest
We performed a logit analysis on the binary information request data. There were no main effects in this analysis, but the interaction effect between script and distraction was significant (Wald's χ2 = 4.26, p < .04). Ad hoc contrasts showed that directionally more participants chose to receive additional information in Chinese in the auditory distraction condition (MA = .33) than in the visual distraction condition (MV = .13; χ2 = 3.35, p < .07). Although the pattern of means was in the opposite direction of the Chinese condition, the contrast between the auditory (MA = .27) and visual (MV = .40) distraction conditions did not approach significance in English. Across scripts, the means did not differ in the auditory distraction condition, but they were significantly lower in Chinese than in English in the visual distraction condition (χ2 = 5.45, p < .02).
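The two significant or marginal contrasts above can be approximately reproduced from the reported proportions. The sketch below rests on two assumptions that the text does not state: each of the four cells contained 30 participants (consistent with 120 participants and the 58 degrees of freedom in the t-tests), and the contrasts are Pearson chi-square tests without a continuity correction. The helper names are illustrative.

```python
CELL_N = 30  # assumed cell size: 120 participants spread evenly over 4 cells


def counts(proportion, n=CELL_N):
    """Convert a reported proportion into (requested, declined) counts."""
    requested = round(proportion * n)
    return requested, n - requested


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


chinese_auditory = counts(0.33)  # 10 requested, 20 declined
chinese_visual = counts(0.13)    # 4 requested, 26 declined
english_visual = counts(0.40)    # 12 requested, 18 declined

# Chinese: auditory versus visual distraction.
chi2_within_chinese = pearson_chi2_2x2(*chinese_auditory, *chinese_visual)

# Visual distraction: Chinese versus English.
chi2_visual_across = pearson_chi2_2x2(*chinese_visual, *english_visual)

print(round(chi2_within_chinese, 2), round(chi2_visual_across, 2))
```

Under these assumptions the two statistics come out at about 3.35 and 5.45, in line with the values reported in the text. The omnibus interaction (Wald's χ2) comes from the logit model and is not reproduced by this sketch.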
Attribute recall
An ANOVA on recall memory yielded a significant main effect of script (F(1, 119) = 4.15, p < .05). Recall memory was greater in English (M = 3.05) than in Chinese (M = 2.67). The distraction main effect was not significant, but the interaction between script and distraction was significant (F(1, 119) = 8.55, p < .005). Ad hoc contrasts showed that recall was greater in Chinese in the auditory distraction condition (MA = 2.93) than in the visual distraction condition (MV = 2.40; t[58] = 1.98, p = .05). In contrast, recall was lower in English in the auditory (MA = 2.77) than in the visual distraction condition (MV = 3.33; t[58] = −2.16, p < .04). Across scripts, recall was not significantly different in the auditory distraction condition, but it was significantly lower in Chinese than in English in the visual distraction condition (t[58] = 3.20, p < .002). The recall means are represented graphically in Figure 2.

Figure 2. Experiment 1: Means for Recall Memory (+SE) for Information Presented with Distracting Music or Distracting Images
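The crossover pattern behind the script × distraction interactions can be made concrete by computing the interaction contrast directly from the cell means reported in the text. This is an illustrative sketch only: it uses the reported means for recall and attitude positivity, assumes equal cell sizes when averaging across cells, and says nothing about significance, which depends on the error variance.

```python
# Cell means as reported in the text, keyed by (script, distraction).
recall = {
    ("Chinese", "auditory"): 2.93,
    ("Chinese", "visual"): 2.40,
    ("English", "auditory"): 2.77,
    ("English", "visual"): 3.33,
}
attitude_positivity = {
    ("Chinese", "auditory"): 3.95,
    ("Chinese", "visual"): 3.17,
    ("English", "auditory"): 3.35,
    ("English", "visual"): 3.96,
}


def simple_effect(means, script):
    """Auditory minus visual mean within one script."""
    return means[(script, "auditory")] - means[(script, "visual")]


def interaction_contrast(means):
    """Difference between the two simple effects (Chinese minus English)."""
    return simple_effect(means, "Chinese") - simple_effect(means, "English")


def script_mean(means, script):
    """Unweighted mean over distraction conditions (assumes equal cell sizes)."""
    return (means[(script, "auditory")] + means[(script, "visual")]) / 2


for label, means in (("recall", recall), ("attitude positivity", attitude_positivity)):
    print(
        label,
        "Chinese A-V:", round(simple_effect(means, "Chinese"), 2),
        "English A-V:", round(simple_effect(means, "English"), 2),
        "interaction contrast:", round(interaction_contrast(means), 2),
    )
```

The simple effects have opposite signs in the two scripts for both measures, which is the crossover the interaction terms capture. The unweighted script means for recall (about 2.665 for Chinese and 3.05 for English) are also consistent with the marginal means reported for the script main effect.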
Discussion
The significant interaction effects of Experiment 1 are consistent with the notion that alphabetic words rely more on sound-based processing and logographic words rely more on visual processes. Fast-paced music interfered with verbal processing relatively more than complex images for alphabetic English words, whereas the reverse was true for Chinese logographs. Distraction attenuated message learning, inhibited the favorability of evaluations and their strength, and lowered customer interest in the brand. These results suggest that Tavassoli and Han's (2001, Experiment 1) findings were not an artifact of using nonsensical words and dual-task learning instructions. The results generalize to information typical of advertisements based on sentences containing real words with lexical entries that can be processed semantically. Moreover, the results establish the differential effect of auditory and visual interference on alphabetic and logographic words in a language other than Korean, in which the use of the logographic script is largely restricted to nouns. The results also extend Tavassoli and Han's (2001) intentional recognition memory findings to incidental recall memory, attitudes, and a behavioral measure of customer interest, which are key measures of advertising effectiveness. The results have direct implications for the design of persuasive messages that benefit from improved message reception: Designers should favor visual elements over auditory elements in English, and they should favor auditory elements over visual ones in Chinese.
In light of several nonsignificant ad hoc contrasts, it is problematic to interpret the simple means in this experiment. This is because it is difficult to calibrate auditory and visual distracters such that they are equally distracting in an absolute sense. Similarly, despite careful controls, there could be subtle differences between the Chinese and the English conditions even without distraction. For example, the means were higher in English than in Chinese for all dependent measures, and in the case of recall they were significantly so. This obfuscates the direct comparison of means across languages (Rosnow and Rosenthal 1991; Ross and Creyer 1993). Therefore, we can make clear inferences only about the relative effects auditory and visual distracters have on memory for alphabetic and logographic words as indicated by the significant interactions. For all dependent measures, these interactions supported the differential processing hypothesis.
A limitation of Experiment 1 and previous research (Tavassoli and Han 2001) is that the dependent measures largely reflect only the information-processing stage of encoding information into long-term memory. Moreover, the concurrent distracters allow for the possibility that the interference occurred before short-term memory in perceptual memory's acoustic and visual stores. Although this finding is notable in and of itself, it would have greater theoretical and practical implications if the effect of differential interference extended to deeper levels of processing involved in cognitive responding. Indeed, consumers' cognitive responses to a message may have a stronger impact on attitudes and subsequent behaviors than message learning (Greenwald 1968). Experiment 2 examines the differential effect of auditory and visual distraction on cognitive responding in Chinese and English. It also uses auditory and visual distracters that are not concurrent with the verbal information to avoid perceptual interference.
Experiment 2: Elaboration
Free reporting of message-evoked thoughts has been a central part of persuasion research since Greenwald's (1968) argument that attitudes are primarily based on people's reaction to a message rather than the learning of text per se and the notion that such thoughts mediate distraction effects on message impact (Festinger and Maccoby 1964; Osterhouse and Brock 1970). Distraction affects judgments through the process of inference making about a message's judgmental implications (Fishbein and Ajzen 1975; Greenwald 1968), a process that requires short-term memory capacity (Baddeley 1986).
The reduced opportunity to scrutinize the content of a message can enhance or reduce the persuasive impact of a message. For strong messages, the thoughts evoked are primarily positive, and evaluations should be more favorable the lower the level of distraction is (McGuire 1985). The reverse should occur for counterattitudinal messages that primarily evoke counterarguments. We argue that, for a strong message, more positive thoughts should be generated with visual distraction than with auditory distraction in English, and vice versa in Chinese. Attitude positivity, attitude strength, and attribute recall should follow a similar pattern. Again, this assumes that consumers would not have excess cognitive resources available with which idiosyncratic thoughts could dilute a message's persuasive impact (Anand and Sternthal 1989).
Method
Design
In Experiment 2, 80 bilingual Singapore students different from those in Experiment 1 participated in the 2 (script: alphabetic versus logographic) × 2 (distraction: auditory versus visual) between-participants experiment. All instructions and materials were provided in the respective language. We used the same tennis racquet advertisement as in Experiment 1.
Procedure
Participants were told that we were interested in their ability to perform two tasks at once. They were told to read the information about the tennis racquet as they “would read an advertisement” and that the experimenter would “ask some questions about the ad[vertisement] later.” Participants were also told to remember the nonverbal stimuli that were interspersed between each sentence. Thus, the auditory or visual distracters were presented as part of a divided-attention task. The auditory distracters were distinct sound sequences that included musical tones, science fiction–like beeps, and sound effects such as metallic clunks. The images were colored abstract shapes that were designed to represent corporate logos (i.e., a combination of lines and various geometric shapes).
After being exposed to the information, participants were given two minutes to list all the thoughts, reactions, and ideas that went through their minds as they read the advertisement. Next, they were asked to indicate for each thought listed how favorable or unfavorable that thought was by labeling it from +3 (“very favorable”) to −3 (“very unfavorable”). A neither favorable nor unfavorable thought was rated 0 (for a similar procedure, see Calder, Insko, and Yandell 1974; Lee and Mason 1999). The thought-listing procedure always preceded the attitude scales, because thoughts are the main variable of interest in this experiment. After rating their thoughts in this manner, participants filled out the same five attitude and purchase intention scales as in Experiment 1. Attitude positivity was again an average of these ratings (α = .97). We also assessed attitude strength as we did previously by averaging the response to two scales (r = .93). Recall scores were coded by two coders who had virtually no disagreement.
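The scoring steps described above can be sketched in a few lines of code. This is a minimal illustration and not the authors' scoring procedure: the function names and the sample ratings are hypothetical, and the sketch assumes that a thought counts as positive when its self-rating is above 0 and that attitude positivity is the simple mean of the five scale responses.

```python
from statistics import mean

def count_thoughts(ratings):
    """Tally self-rated thoughts on the -3..+3 favorability scale."""
    for r in ratings:
        if not -3 <= r <= 3:
            raise ValueError(f"rating out of range: {r}")
    return {
        "positive": sum(1 for r in ratings if r > 0),
        "neutral": sum(1 for r in ratings if r == 0),
        "negative": sum(1 for r in ratings if r < 0),
        "total": len(ratings),
    }

def attitude_positivity(scale_responses):
    """Average the attitude and purchase-intention scale responses."""
    return mean(scale_responses)

# Hypothetical participant: four listed thoughts and five scale responses.
thoughts = count_thoughts([3, 1, 0, -2])
attitude = attitude_positivity([5, 6, 5, 4, 5])
```

Keeping the positive, neutral, and negative counts separate mirrors the reporting of results by thought category, so the same tally can feed an analysis of positive thoughts only or of total thoughts.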
Results
We expected the same pattern of results across all dependent measures: thoughts listed, attitude positivity, attitude strength, and recall memory. Therefore, we performed a multivariate analysis of variance across the four measures on the between-participant factors script (alphabetic English versus logographic Chinese) and distraction (auditory versus visual). The main effects were not significant, but the interaction effect between script and distraction was highly significant (F(1, 119) = 26.00, p < .0001). We next report detailed results from univariate analyses.
Thoughts listed
As an indicator of cognitive responding, the key dependent variable of interest in Experiment 2 is the number of thoughts listed. We expected that the less participants were distracted, the more thoughts they would list, such that more thoughts would be listed in the auditory-distraction condition in Chinese and in the visual-distraction condition in English. Because all the differences in thoughts listed were found for thoughts that participants identified as favorable (the dominant cognitive response), we analyzed only the positive thoughts. (The results remain the same when total thoughts are analyzed.) The results for all thought categories are summarized in Table 2.
Table 2. Summary of the Means (Standard Errors) of Experiment 2
We performed an ANOVA on the number of positive thoughts for the between-participant factors script (alphabetic English versus logographic Chinese) and distraction (auditory versus visual). The main effects were not significant, but the interaction between script and distraction was significant (F(1, 79) = 15.32, p < .0002). Ad hoc contrasts showed that participants listed more thoughts in Chinese when information was presented among auditory distracters (MA = 3.0) than when presented among visual distracters (MV = 2.4; t[38] = 2.35, p < .03). In contrast, fewer thoughts were listed in the auditory distraction condition in English (MA = 2.2) than in the visual distraction condition (MV = 3.1; t[38] = −3.15, p < .004). Across scripts, fewer thoughts were listed in English than in Chinese in the auditory distraction condition (t[38] = −2.99, p < .005), but more were listed in the visual distraction condition (t[38] = 2.55, p < .02). The means for the number of positive thoughts listed are represented graphically in Figure 3.

Figure 3. Experiment 2: Means for the Number of Positive Thoughts (+SE) Generated Among Auditory and Visual Distracters
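The crossover pattern in the positive-thought means can be made concrete with a short calculation. The sketch below uses only the four cell means reported in the text. The variable and function names are illustrative, and the marginal means assume equal cell sizes (20 participants per cell is an inference from the 80 participants and the t[38] contrasts, not a figure stated in the text). It is descriptive arithmetic and does not reproduce the significance tests.

```python
# Cell means for positive thoughts listed, as reported in the text.
means = {
    ("Chinese", "auditory"): 3.0,
    ("Chinese", "visual"): 2.4,
    ("English", "auditory"): 2.2,
    ("English", "visual"): 3.1,
}

def simple_effect(script):
    """Auditory minus visual mean within one script."""
    return means[(script, "auditory")] - means[(script, "visual")]

def marginal(factor_index, level):
    """Unweighted mean of the cells at one factor level (assumes equal cell sizes)."""
    cells = [m for key, m in means.items() if key[factor_index] == level]
    return sum(cells) / len(cells)

chinese_effect = round(simple_effect("Chinese"), 2)
english_effect = round(simple_effect("English"), 2)
# Interaction contrast: difference between the two simple effects.
interaction = round(chinese_effect - english_effect, 2)
# Marginal gaps correspond to the two main effects.
script_gap = round(marginal(0, "Chinese") - marginal(0, "English"), 2)
distraction_gap = round(marginal(1, "auditory") - marginal(1, "visual"), 2)
```

The two simple effects come out with opposite signs, while both marginal gaps are small relative to the interaction contrast. That is the descriptive signature of a crossover interaction and is consistent with the nonsignificant main effects reported above.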
Attitudes and recall
The ANOVAs similarly showed that only the script-distraction interaction effects, not the main effects, were significant for attitude positivity (F(1, 79) = 16.13, p < .0001), attitude strength (F(1, 79) = 15.17, p < .0002), and attribute recall (F(1, 79) = 12.75, p < .0006). In Chinese, attitudes were more positive in the auditory than in the visual distraction condition (MA = 5.0 versus MV = 4.2; t[38] = 3.07, p < .004), attitudes were stronger (MA = 4.2 versus MV = 3.6; t[38] = 2.20, p < .04), and more attributes were recalled (MA = 3.7 versus MV = 2.9; t[38] = 2.85, p < .008). In contrast, in English, evaluations were less favorable in the auditory than in the visual distraction condition (MA = 4.3 versus MV = 5.0; t[38] = −2.64, p < .02), attitudes were weaker (MA = 3.5 versus MV = 4.4; t[38] = −3.27, p < .003), and fewer attributes were recalled (MA = 3.2 versus MV = 3.8; t[38] = −2.19, p < .04). Across scripts, evaluations in the auditory distraction condition were more favorable in Chinese than in English (t[38] = 2.71, p < .01) but less favorable in the visual distraction condition (t[38] = −2.96, p < .006), attitudes were stronger in Chinese than in English in the auditory distraction condition (t[38] = −2.67, p < .02) but weaker in the visual distraction condition (t[38] = 2.84, p < .008), and recall was directionally higher in Chinese than in English for the auditory distraction condition (t[38] = −1.86, p < .08) but lower in the visual distraction condition (t[38] = 3.15, p < .004).
Discussion
A field study found that English advertisements with music were associated with lower message elaboration than advertisements without music, as measured in a telephone survey, especially of the advertisements' verbal contents (Stewart, Farmer, and Stannard 1990). Music was also found to reduce cognitive responding to an English advertisement for people with high but not low involvement (MacInnis and Park 1991). The results of Experiment 2 suggest that this type of auditory interference is stronger in English whereas visual interference is stronger in Chinese. This differential interference effect was found for the inhibition of positive thoughts (the dominant cognitive response evoked by the advertisement), attitudes toward the product, and recall of ad copy. These results replicate and extend the findings of Experiment 1. They demonstrate that auditory and visual distraction differentially affected thought processes based on alphabetic and logographic words, beyond the encoding of the words themselves. This suggests that the differential interference effect extends to deep levels of information processing.
The results of the first two experiments suggest that messages that would benefit from elaboration should have a minimum of unrelated music or sound effects in English but should have a minimum of unrelated visuals in Chinese. However, this conclusion would only paint half the picture, because nonverbal message elements can also have facilitating effects on message recall. Auditory and visual stimuli can become associated with verbal information and can serve as retrieval cues for the verbal content. This facilitating effect is often the reciprocal effect of interference.
Experiment 3: Integration
Successful advertising design depends not only on the processing of an advertisement's verbal content in isolation but also on relations formed between items of information across modalities. Memory for a brand name can be enhanced when meaningful associations can be made with an advertisement's images (e.g., Lutz and Lutz 1977; Schmitt, Tavassoli, and Millard 1993) or background music (Kellaris, Cox, and Cox 1993). However, it is not necessary that meaningful associations be established between the ad copy and nonverbal information. The encoding-specificity principle (Tulving and Thomson 1973) states that any contextual cue can become associated with an advertisement's verbal content (even being in the same room as at exposure) and can serve as a recall-enhancing retrieval cue when reinstated at retrieval. Even meaningless elements such as Intel's “tune” and Nike's swoosh can serve as memory cues. Reproduction of advertising elements on Web sites, product packaging, and point-of-purchase displays thus reminds consumers of the advertisement's verbal content, and this is a key function of integrated marketing communications (Edell and Keller 1989). We add to this literature by examining the relative effectiveness of auditory and visual memory cues in Chinese and in English.
Experiment 3 builds on Tavassoli and Han's (2001, 2002) findings; they examined pair-recognition of nonsensical words paired with abstract sounds and images in an intentional learning paradigm. When the words were written in an alphabetic script, participants were more likely to recognize word–sound pairs than word–image pairs, and vice versa for logographs. This pattern of results is the opposite of the differential interference effect and reflects the principle that the integration of information is stronger the more the items rely on the same encoding mechanisms (McClelland 1996; Tavassoli 1998; Yee, Hunt, and Pellegrino 1991). In other words, the same overlap in mental resources that causes mutual interference between two stimuli at encoding can also facilitate the formation of associations between them in memory.
We extend Tavassoli and Han's (2001, 2002) pair-recognition findings to recall memory in an advertising context. In Tavassoli and Han's experiments, participants learned a list of single nonsensical words that were paired with either brief sound effects or visual images akin to corporate logos. Participants were specifically instructed to learn the information. At test they were presented with the same information as at learning except that half the stimuli were in the same pairings and half were in pairings that were cross-matched. Participants had to recognize which pairings were the same as at learning. As we discussed previously, the use of two-syllable nonsensical words largely restricts the processing of alphabetic words to a sound-based code, whereas logographic words are highly visually differentiated, and each character used to represent a syllable can be processed both visually and semantically. This makes it problematic to generalize the findings to stimuli that are typical in marketing communications. In Experiment 3, we therefore use real words and sentences presented as part of an advertisement.
We also do not use intentional learning instructions, but we ask participants to view the information as they would view an advertisement. We use instrumental background music and images that are generically related to the product category but not to any product attribute that is communicated. Moreover, we rely on a cued-recall task that is typical of advertising tracking studies (Stewart, Farmer, and Stannard 1990) to test retrieval from memory rather than a paired-recognition test that does not rely on retrieval from memory (Dosher 1991). In this way, we can assess the degree to which verbal recall benefits from incidental associations made with an advertisement's auditory and visual content. The degree to which reinstating these cues at recall benefits retrieval should depend on the degree to which they are integrated in memory. Because of a greater overlap in processing resources, we expect that visual memory cues facilitate recall memory of Chinese information relatively more than auditory memory cues do. In contrast, we expect that auditory cues facilitate recall memory of English information relatively more than visual memory cues do. We isolate this facilitating effect by presenting the identical audiovisual context across learning conditions and by varying only whether an auditory or a visual memory cue is present at test. Whereas attention to nonverbal stimuli inhibits the encoding of verbal information, it does not inhibit memory retrieval (Fernandes and Moscovitch 2000).
Method
Design
In Experiment 3, 80 bilingual Singapore students different from those in Experiments 1 and 2 participated in the 2 (script: alphabetic versus logographic) × 2 (memory cue: auditory versus visual) between-participants experiment. All instructions and materials were provided in the respective language.
Stimuli
We relied on two target advertisements. One was the tennis racquet advertisement used in the previous two experiments, and the other was an advertisement describing a restaurant dinner: “Introducing the new Naysen restaurant. Charming. Affordable. Unique. Authentic. Famous. Cozy. Dining at Naysen is an experience.” Each sentence or word was displayed in a separate text box. The attributes were selected in a pretest with 36 bilinguals different from those in the main experiment. Of the participants, 18 rated the attributes in English and 18 rated them in Chinese on two seven-point scales. For the first scale, participants evaluated each attribute in terms of its valence (1 = “very negative,” 7 = “very positive”). For the second scale, participants indicated how important they perceived each attribute to be among all possible attributes in influencing an overall judgment (1 = “not at all important,” 7 = “very important”). The set of six attributes was evaluated as equally positive in Chinese (M = 5.49) and English (M = 5.37; p > .52) and was perceived similarly in terms of importance in Chinese (M = 4.87) and English (M = 4.79; p > .83). Therefore, we expected that participants would semantically process the information similarly whether written in Chinese or English.
The tennis racquet and restaurant dinner target advertisements were flanked by filler advertisements for a car and a razor. All advertisements were presented by a computer. The presentation of advertisements was either in Chinese or in English and did not otherwise differ between the auditory and visual cue conditions, to isolate a differential effect at retrieval. All advertisements were therefore audiovisual and contained continuous background music and photographic images. A text box for each sentence or attribute was superimposed on a different image that also remained on the screen for three seconds after the text box disappeared. The images were related to the product category but not specifically to any of the attributes described. For example, the tennis advertisement used an image of tennis balls and one of tennis racquets in a wallpaper-like pattern. The restaurant advertisement showed different images of the interior of a restaurant. The music in the tennis advertisement was a medium-paced rock/funk piece. The music was playful and rhythmic and had brass, piano, synthesizer, and percussion instrumentation. The music in the restaurant advertisement was a warm and intimate slow-paced ambient jazz piece; this music had saxophone and acoustic guitar instrumentation. The tennis advertisement lasted about 60 seconds, and the restaurant advertisement lasted about 45 seconds.
Procedure
Participants were instructed to view the advertisements as they would view a television commercial. After seeing the advertisements, participants engaged in a two-minute math quiz to clear short-term memory. Next, they recalled the claims from the tennis racquet advertisement and the restaurant advertisement. In the auditory-cue condition, the music from each advertisement was played during the time participants had to recall a specific advertisement. In the visual-cue condition, the advertisement's images were shown in the same order and pace as at learning. The music or images from each advertisement were looped twice, and participants had two minutes to recall the tennis racquet advertisement and a minute and a half to recall the restaurant dinner advertisement.
Results
Cued recall
The pattern of results was identical for the recall of the tennis and restaurant advertisements, so we combined the recall scores. We performed an ANOVA on the combined recall scores for the between-participant factors script (alphabetic English versus logographic Chinese) and memory cue (auditory versus visual). The script main effect was significant (F(1, 79) = 5.44, p < .03). Recall was higher in English (M = 8.55) than in Chinese (M = 8.05). The memory-cue main effect was not significant (F < 1). The interaction between script and memory cue was significant (F(1, 79) = 10.67, p < .002). Ad hoc contrasts showed that recall was higher in Chinese when aided by visual cues (MV = 8.45) than by auditory cues (MA = 7.65; t[38] = −2.42, p < .02). In contrast, recall was lower in English when aided by visual cues (MV = 8.25) than by auditory cues (MA = 8.85; t[38] = 2.20, p < .04). Across scripts, recall was higher in the auditory-cue condition in English than in Chinese (t[38] = 4.49, p < .0001), but it did not differ in the visual-cue conditions. Again, we caution that the significant script main effect makes direct comparisons of raw means problematic. The raw means are represented graphically in Figure 4.

Figure 4. Experiment 3: Mean Recall (+SE) with Auditory and Visual Cues
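As a consistency check on the recall figures, the script-level means can be recomputed from the four cell means given in the text. The sketch below is illustrative only: the names are hypothetical, and it assumes equal cell sizes so that each script mean is the unweighted average of its two cue conditions.

```python
# Cued-recall cell means from Experiment 3, as reported in the text.
cells = {
    "Chinese": {"auditory": 7.65, "visual": 8.45},
    "English": {"auditory": 8.85, "visual": 8.25},
}

def script_mean(script):
    """Unweighted mean across the two cue conditions (assumes equal cell sizes)."""
    values = list(cells[script].values())
    return sum(values) / len(values)

def visual_cue_advantage(script):
    """Visual-cue mean minus auditory-cue mean within one script."""
    return cells[script]["visual"] - cells[script]["auditory"]

# Map each script to (overall mean, visual-cue advantage).
summary = {
    script: (round(script_mean(script), 2), round(visual_cue_advantage(script), 2))
    for script in cells
}
```

Under the equal-cell-size assumption, the recomputed script means match the reported values of 8.05 for Chinese and 8.55 for English. The visual-cue advantage is positive in Chinese and negative in English, which is the relative pattern that the interaction captures even though the script main effect shifts the raw means.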
Discussion
The results of Experiment 3 provide further support for the differential processing hypothesis. More of the English ad copy was recalled in the presence of the auditory cues, whereas more of the Chinese ad copy was recalled in the presence of visual cues. These findings demonstrate a reciprocal facilitating effect to the interference effects observed in Experiments 1 and 2. In Experiments 1 and 2, auditory distraction attenuated the processing of English to a greater degree, whereas visual distraction attenuated the processing of Chinese to a greater degree. In Experiment 3, there was no opportunity for differential interference at encoding, because the advertisements in all conditions contained the same auditory and visual elements. Instead, reinstating either the audio or the visual portion of an advertisement led to differential levels of retrieval from memory in the two scripts.
Experiment 3 suggests that logos and ad images, such as the Michelin baby, can be used on product packaging and in-store displays to facilitate the retrieval of verbal information that was previously learned in the presence of these cues. This facilitating effect should be relatively stronger in Chinese than in English. Auditory brand identifiers can also be used and “like visuals or smells, sounds can become associated with brands, and once they are, they become hugely powerful as branding devices” (Andrew Ingram qtd. in Croft 1999, p. 41). For example, the MGM lion roars when the computer mouse is moved over the image on the corporate Web site. It has also been suggested that music can serve as a particularly effective retrieval cue in integrated marketing campaigns (Edell and Keller 1989). Auditory brand identifiers and music should be relatively more potent retrieval cues in English than in Chinese.
General Discussion
Overview of Findings
Written language is a central aspect of everyday life and is at the core of culture. Writing is a principal vehicle of communication in marketing and is the vehicle with which marketing researchers present experimental stimuli. We demonstrated three main results that provide direct marketing application for the theoretical contribution of Tavassoli and Han (2001), who predicted that the world's two major writing systems should interact differently with auditory and visual stimuli. In Experiment 1, we found that nonverbal auditory stimuli interfered more than nonverbal visual stimuli with the processing of alphabetic English, and vice versa for logographic Chinese. We demonstrated this for recall memory, attitude positivity, attitude strength, and customer interest. In Experiment 2, we found this differential distraction effect to extend beyond the encoding of information to cognitive responding at the time of exposure. In terms of tropes of the mind, this suggests that the phrase “I can't hear myself think” is more representative of processing alphabetic English words, and the phrase “I see what you mean” is more representative of processing Chinese logographs. In Experiment 3, as a reciprocal effect of interference, auditory memory cues facilitated the recall of English better than visual memory cues, and vice versa for the cued recall of Chinese.
These findings have broad applications because the everyday reading of information rarely occurs without some form of auditory and visual influence, be it music that is part of a commercial or a jackhammer outside the window, be it graphics or flashing banner advertisements on a Web page or the visual content of the programming context. We also believe that the implications of our findings extend beyond the population of fluent bilinguals whom we employed to native speakers of Chinese and English. Tavassoli and Han's (2001) pair-recognition finding was originally demonstrated for a single language, Korean, in which many words can be written in both an alphabetic and a logographic script. Tavassoli and Han (2002) replicated this effect with Chinese–English bilinguals and with native Chinese and native English speakers. Our choice of bilingual participants was determined by our ability to control for cultural factors and therefore should not limit the generalizability of our results to monolinguals.
Marketing Implications
The differential effect of the auditory and visual distracters on the retention of alphabetic and logographic ad copy suggests, for example, that visually demanding traffic conditions attenuate memory for a just-read billboard relatively more for Chinese than for English. It also suggests that noise such as that in a subway should be more detrimental to consumers reading English than Chinese advertisements displayed on a poster or in a newspaper. Whereas these factors are not under the control of the marketer, the use of auditory and visual elements in a television commercial or a Web site is. Stimulating music that is designed to capture and hold consumers' attention should be relatively more detrimental to message reception and cognitive responding in English, whereas engaging visual graphics used to capture consumers' attention should be relatively more detrimental in Chinese. Effective advertisement design would thus mandate a relative shift from using auditory to visual elements in English, and vice versa in Chinese.
In contrast, nonverbal stimuli can be powerful memory devices, and our findings demonstrate a differential facilitating effect that works in the opposite direction of the differential interference effect. The findings of Experiment 3 suggest that auditory cues, such as music (e.g., Chevrolet's “Like a Rock”), sound effects (e.g., the MGM lion's roar), and meaningless brand identifiers (e.g., Intel's trademarked “tune”), should be more potent retrieval cues in English. In contrast, visual cues, such as meaningless logos (e.g., Nike's swoosh), images (e.g., the Michelin baby), and colors (e.g., the UPS brown), should be more potent retrieval cues in Chinese. This cross-linguistic difference is an important consideration for advertising tracking studies that try to use the most sensitive memory cues to test advertising effectiveness (Stewart, Farmer, and Stannard 1990) and for the effective design of advertisements and integrated marketing communications. These studies need to consider both the detrimental effects of nonverbal information at exposure and their facilitating effects when reproduced in multimedia Web sites or point-of-purchase displays. There can be no absolute recommendation for the use of auditory and visual ad elements, because these can differ in the degree to which they attract attention, demand cognitive resources, and relate in meaningful ways to the verbal ad copy. The key insight from this research is that alphabetic and logographic scripts differ in the relative degree to which they are sensitive to the influence of these elements.
It is also worthwhile to note that less distraction does not always lead to more positive attitudes. For example, when consumers have an excess amount of cognitive resources available to process a message, they may generate idiosyncratic thoughts that are likely to be less favorable than direct message associations (Anand and Sternthal 1989). These situations can occur, for example, when a message is exceedingly simple or when a customer is repeatedly exposed to an advertisement. The most persuasive level of distraction also depends on how well arguments hold up under careful scrutiny. If the primary cognitive response to a message is the generation of counterarguments, distraction can also lead to more positive attitudes (e.g., Petty, Wells, and Brock 1979).
Distraction can also lead to a qualitative shift in information processing (for a review, see Petty and Wegener 1998). As consumers move down the elaboration likelihood continuum, central route processes diminish in magnitude. Attitudes then are determined less by the effortful examination of all relevant information and more by less careful examination of the same information or the effortful examination of less information. As a result, there may be a qualitative shift in information processing, such that peripheral mechanisms that do not involve thought about the substantive merits of the arguments gain in influence. In the context of central and peripheral cues, it would be particularly worthwhile to examine the relative influence of auditory and visual peripheral cues on persuasion for information written in alphabetic and logographic scripts. As Experiment 3 shows, the reciprocal effect of interference is the integration of verbal and nonverbal information in memory. Visual memory cues were relatively more potent in reinforcing memory for Chinese, whereas auditory memory cues were relatively more potent in English. These elements could also serve as peripheral cues. Our findings would predict that visual peripheral cues would be relatively more potent than auditory ones in Chinese, and vice versa in English. This is because visual peripheral cues compete for the short-term memory capacity required to process logographic central cues to a greater degree than do auditory cues while being integrated to a greater degree with the verbal memory trace.
Finally, our results have serious implications for academic research. Most experimental research involves the presentation and reporting of written words. Many experiments also include multimedia stimuli and some form of filler task that includes auditory and visual features. These factors can interact to create differences in the degree to which information is learned and processed. For example, consider a hypothetical experiment that compares the relative effects of central and peripheral processes in China and the United States. If the researchers arbitrarily chose an advertisement with distracting music, they might conclude that U.S. respondents, whose verbal processing should be relatively more distracted by the music, are more influenced by peripheral cues (e.g., the number of claims) and less by central arguments (e.g., the quality of these claims) than are Chinese respondents. They might reach exactly the opposite conclusion if they chose a visually complex advertisement instead, which should be relatively more distracting in the processing of Chinese words. These are difficult issues to resolve, but researchers need to be aware of the potential effects nonexperimental variables may have.
