Abstract
Voices are an important part of marketing communications. Hence, this study draws on sound symbolism to explore how voice pitch in a destination promotion video helps communicate the gender identity of a destination and ultimately generate visitors. The findings of three experimental studies demonstrate that a lower (vs. higher)-voice pitch symbolizes masculinity (vs. femininity), providing an obvious gender cue for tourists to evaluate the gender identity of a destination. This voiceover-destination congruency in masculinity/femininity perception was found to trigger one’s auditory mental imagery and travel intention, whereas voiceover-tourist congruency only fostered one’s travel intention to masculine destinations. Theoretically, results challenge self-congruency theory and suggest that masculine/feminine tourists may not necessarily value gender identity congruency as much as they have in the past which might be due to heterosexuality becoming less normative. Practically, auditory communication of destinations should consider masculinity/femininity congruency between the voiceover and the destination.
Keywords
Introduction
Drawing on symbolism, Ekinci and Hosany (2006) proposed the concept of destination personality that describes “the set of human characteristics associated with a destination as perceived from a tourist rather than a local resident viewpoint” (p. 127). Subsequently, personification has been an effective way to differentiate similar destinations (Usakli & Baloglu, 2011), as human-like characteristics have been suggested to result in more favorable attitudes and travel intentions than branding with functional attributes such as natural resources (Letheren et al., 2017).
Destination personalities can also be manipulated via being personified as more masculine or feminine. Hence, a gendered lens has widely been used to portray the masculinity and femininity of brands (Grohmann, 2009). Typical masculine characteristics have been proposed to encompass attributes such as assertiveness, forcefulness, authority, resilience, self-reliance, valor, and intrepidity. Conversely, typical feminine traits are often described as including nurturance, deference, elegance, tenderness, and sensitivity (Grohmann, 2009). Given the overarching effectiveness of a gendered lens in brand marketing and positioning (Lieveb et al., 2015; Pan et al., 2017), destination marketers have increasingly developed gendered cues to attract potential tourists. Examples of gendered destinations include the commodification of women in Thai tourism and London’s macho performativity.
However, tourism scholars have historically focused on smaller and occasionally nebulous personal traits such as smartness (Au & Tsang, 2022) and coolness (Kock, 2021) rather than the gender identity of destinations (C. G. Q. Chi et al., 2018). Recently, Pan et al. (2021) developed a scale to measure two dimensions of destination gender: masculinity (i.e., dominance, vigor, courage, and competence) and femininity (i.e., grace, softness, gorgeousness, and kindheartedness). While the destination gender scale has enabled scholars to examine how tourists respond to gender cues related to destinations (e.g., Hamdy et al., 2023, 2024), it remains largely unclear how to effectively communicate a destination’s gender identity (Pan et al., 2020).
In modern marketing, sensory experiences play an important role in shaping consumer perceptions and decision-making. Sensory marketing, which leverages visual, auditory, olfactory, tactile, and gustatory stimuli, has been widely recognized for its ability to evoke emotions and enhance brand associations (Krishna, 2012). For example, John Denver’s classic song “Take Me Home, Country Roads” is suggested to portray a relaxing countryside destination image of West Virginia that imbues mental imagery and nostalgic feelings of mountains and their scenic beauty (Fan et al., 2023). Despite the growing importance of sensory marketing in tourism, existing studies have primarily focused on visual elements (e.g., color, symbols, and imagery) to shape destination perceptions (e.g., He et al., 2024; Yu et al., 2020), while auditory cues (such as voice) remain largely unexplored.
Voices, as one of the most obvious cues to distinguish gender, play an important role in brand communication (Krishna, 2012) and may serve as a powerful yet underutilized tool for conveying a destination’s gender identity. Existing research has recognized voice as the first stimulus that individuals recognize and react to, even prior to birth (Kisilevsky et al., 2003). As suggested by sound symbolism, a speaker’s voice can act as a powerful persuasion tool, as consumers implicitly connect sound characteristics (e.g., voice pitch) with product/brand characteristics (e.g., destination gender) (Melzner & Raghubir, 2023). For example, Hurtz and Durkin (2004) discovered that voice gender in advertisements elicits gender-stereotypical attributes that lead to better recall performance of an advertisement. In particular, one’s perceptions of a voice’s gender primarily rely on voice pitch, with female voices typically being an octave higher than male voices (Latinus & Taylor, 2012). Hence, the pitch of a voiceover is likely an effective gender cue for destination marketers to communicate a destination’s gender identity.
Drawing on sound symbolism, the current study aims to examine how the pitch of the voice used in a destination promotion video can be intentionally designed to better communicate a destination’s gender identity, enhance perceived voiceover-destination congruency in masculinity/femininity, and increase downstream visitation. This aim is believed to be particularly timely because destination marketers have increasingly developed short videos on mobile social platforms such as TikTok, Instagram, and Snapchat to promote travel activities (Gan et al., 2023), yet the verbal perspective of these communications has largely been overlooked in the sensory marketing literature (Motoki et al., 2023).
This aim was examined by employing three online experiments, adopting a between-group methodology. The first experiment examined how voice pitch influences individuals’ masculinity/femininity perception of the voiceover. The second experiment delved into the mechanism through which voice pitch affects one’s perceived masculinity/femininity of the voiceover, ultimately influencing their voiceover-destination congruency. Building upon the first and second experiments, the third experiment focused on how voiceover-destination congruency and voiceover-tourist congruency in masculinity/femininity perceptions influence one’s travel intention through auditory mental imagery.
Literature Review
Non-Verbal Communication in Tourism
As a powerful instrument for communication, voice involves two main elements: verbal and non-verbal. The former mainly refers to what is being said (i.e., linguistic content), whereas the latter describes how an individual says things (Tracy et al., 2011). Non-verbal communication has received extensive scholarly attention in the tourism literature (e.g., Islam & Kirillova, 2020; Jung & Yoon, 2011), not only because Birdwhistell (1952) asserted that 65% of human communication is non-verbal, but also because the tourism industry is a service industry where practitioners are required to demonstrate a high level of soft skills to show friendliness, responsiveness, and enthusiasm (Baker & Kim, 2018).
Sundaram and Webster (2000) identified four main forms of non-verbal communication in service encounters: physical appearance, kinesics, proxemics, and paralanguage. Tourism scholars have focused their studies on physical appearance (e.g., facial piercing: Pinto et al., 2020), kinesics (e.g., eye contact: K. Kim & Baker, 2019), and proxemics (e.g., physical distance between patrons and servers: Jacob & Guéguen, 2012). However, paralanguage (e.g., pitch, speech rate, and amplitude) has been suggested as the only form of non-verbal communication that influences both the positive and negative emotions of customers in tourism service encounters (H. Lin et al., 2020).
Given the rapid development of digital communication (Bharadwaj & Shipley, 2020), tourism scholars have increasingly shifted their focus to the effect of paralanguage on persuasion. For example, Wang et al. (2024) adopted a voice mining technique to report a significant effect of speech rate, loudness, and pitch in digital interpretation platforms on one’s tourism interpretation purchases. Further, Barnes (2024) found that tourism marketing videos with lower voice intensity and higher speech rate were more effective in triggering viewers’ positive emotions. Also, Barattin and Latusi (2025) discovered that a conversational human voice generated a higher level of hedonic value and thus resulted in more sharing behaviors for tourism destination brands than a formal corporate voice.
However, despite the recognition of the impact of paralanguage on persuasion, tourism studies have largely overlooked how the symbolic meanings embedded in these vocal characteristics connect with destination attributes (Motoki et al., 2023). This perspective is believed to be important not only because it complements a psycholinguistic view of tourism (Rahmani et al., 2019), but also because it introduces new psychological mechanisms to explain the context-dependent effect of paralanguage on persuasion as discovered by existing tourism studies (e.g., Barattin & Latusi, 2025; Zhou & Huang, 2024).
Sound Symbolism and Personality
Since ancient Greek philosophy, sound has been assigned semantic meaning (Klink, 2001). In the dialogue Cratylus, Plato suggested that “the letter r appears to me to be the general instrument expressing all motion” (p. 460). This non-arbitrary relationship between sound characteristics and meaning, also known as sound symbolism, was first discovered by Sapir (1930) who demonstrated that the sound ‘mal’ symbolized a larger object, but the sound ‘mil’ symbolized a small object. With a myriad of supporting evidence on the symbolic meanings of sound (Westbury et al., 2018), many marketing scholars have started to investigate how a product name could be equipped with meanings for better marketing purposes. Specifically, sound symbolism occurs effortlessly to offer great marketing advantages (Parise & Pavani, 2011; Peiffer-Smadja & Cohen, 2019).
The characteristics of sound have resulted in at least two main research areas in the marketing literature. Inspired by Sapir (1930), the first research area explores the symbolic meanings of vowels and consonants. For example, Joshi and Kronrod (2020) discovered that words with voiceless consonants (e.g., /k/, /p/, and /t/) triggered environmental friendliness perceptions, while those with voiced consonants (e.g., /b/, /d/, /g/, /z/, and /v/) were perceived as harsher (Pathak et al., 2020). Unlike the first research area, which has mainly focused on word pronunciations, the second research area, which is more relevant to this study, has shifted the focus to paragraphs to examine the effects of various vocal features, such as pitch (Tolmeijer et al., 2021) and speech rate (Y. Lee et al., 2019). Table 1 summarizes relevant literature that has helped explain how individuals’ perceptions have been formed by vocal features.
Table 1. Relevant Studies on Sound Symbolism.
Since Aronovitch (1976, p. 208) asserted that “people do make personality judgements about other people based, at least in part, on vocal cues”, vocal cues have widely been recognized as reliable and prominent predictors of personality judgment (Riggio & Friedman, 1986). This argument was supported by the bio-informational dimensions theory, which suggests that the human voice contains markers signaling a speaker’s characteristics. Breil et al. (2021) suggested that paralanguage allows individuals to make stable and acute judgments about others’ personalities within seconds, because it provides additional (e.g., emotional state) and contradictory (e.g., sarcasm and deception) information beyond the meanings of spoken words.
Voice Pitch and Masculinity/Femininity
From a biological perspective, voice pitch, also known as the fundamental frequency (F0) of the voice, is determined by the amount of testosterone present at the later stages of puberty. As a sexually dimorphic feature, voice pitch gradually decreases throughout childhood development up until the onset of puberty in both males and females (Huber et al., 1999). The voice pitch of females generally decreases at a slower rate than that of males through puberty, with males generally having a lower vocal pitch (F0 range: 80–185 Hz) than females (F0 range: 165–255 Hz) (Tsantani et al., 2016). This biological characteristic has widely been recognized in sound symbolism studies as a masculinity or femininity cue (e.g., Cartei et al., 2014; Wu et al., 2023).
Generally, regardless of one’s actual gender identity (Ko et al., 2006), individuals with higher-pitched voices are more likely to be perceived as more feminine, while those with lower-pitched voices are more often stereotyped as more masculine. This argument was supported by Krahé and Papakonstantinou (2020) who discovered that a female was more likely to be perceived as masculine if her voice pitch was below 165 Hz. The relationship between voice pitch and masculinity/femininity has also been recently verified in many digital settings such as robot design (e.g., Perugia et al., 2022) and artificial intelligence-based voice assistants (e.g., Shiramizu et al., 2022). Given the biological characteristics of males, a lower (higher)-pitched voice is expected to be perceived as more masculine (feminine). Thus, the following hypothesis was proposed:
Masculinity/Femininity and Persuasion
Gélinas-Chebat et al. (1996, p. 243) asserted that voice is “an ignored vehicle of persuasion” and suggested the importance of future research in this area. More recent research has suggested that gender-related voice in advertisements can alter the perceived credibility of a message (Potter et al., 2019) and the attitude toward a commercial (Potter & Choi, 2006). These potential effects have encouraged scholars to compare the effects of voice masculinity versus voice femininity in advertising; yet the results have been largely inconclusive (Casado-Aranda et al., 2018).
Initial studies concluded that voice masculinity was more effective, because it was suggested to convey greater credibility, trust, authority, and expertise and was, consequently, more persuasive (Klofstad, 2016). This is similar to Lovdal (1989), who asserted that male voices were considered more authoritative and convincing than female voices. Even when the voice has been perceived as less warm, voice masculinity has been found to trigger positive cognitive, affective, and conative effects on individuals’ responses (Zuckerman & Hodgins, 1993). These results have also been reflected by the dominant role of male voiceovers in advertising, with Pedelty and Kuecker (2014) reporting a 4:1 ratio of male to female voiceovers after a quantitative content analysis of 1,055 television advertisements.
However, given the rise of the feminist movement (Varghese & Kumar, 2022), scholars have started to doubt whether the superiority of voice masculinity in persuasion is based exclusively on traditional stereotypes (Rodero et al., 2013). Pedelty and Kuecker (2014) argued that the large proportion of male voiceovers in the past has shaped consumers’ expectations for hearing advertisements performed by male voiceovers, and thus marketers assumed consumers’ preference for male voiceovers. This circular logic has therefore been carried on through generations, ultimately exaggerating the superiority of voice masculinity. Also, Stevens and Ostberg (2020) argued that consumers are increasingly focusing more on femininity characteristics (e.g., emotions, senses, and impulses) in decision-making processes than on masculinity characteristics (e.g., cognition, rationality, and logic). The disparity of past results, and the potential importance of understanding the role voice plays in decision-making, call for a more in-depth investigation of the effect of voice masculinity/femininity in persuasion.
Application of Self-Congruent Theory
Debevec and Iyer (1988) introduced a match-up hypothesis that postulated that gender consistency between the product and the presenter leads to more positive evaluations, higher purchase intention, and higher perceived expertise (Rodero et al., 2013; Casado-Aranda et al., 2018; Efthymiou et al., 2024). This gender congruence has been found to also be at a brand category level, suggesting that consumers also tend to create their gender identities through the brands they use (Avery, 2012). Alreck (1994) asserted that gender identity is one of the most salient dimensions of brand personality. Neale et al. (2016) also suggested gender identity (i.e., masculinity/femininity) as a more effective dimension to predict consumers’ attitudes and responses toward a product/brand than biological sex (i.e., male/female).
This gender congruence is rooted in self-congruity theory (Neale et al., 2016). By definition, self-congruity describes the extent to which products or brands are similar to how individuals see or would like to see themselves (Malhotra, 1988). Self-congruity theory, hence, posits that a high degree of congruence leads to more favorable attitudes and higher purchase intention (Belanche et al., 2021). Its symmetry is also supported by cognitive dissonance theory, which suggests that individuals experience mental discomfort and adapt their behaviors if they engage with cognitively inconsistent elements. In other words, to increase self-congruence, Belanche et al. (2021) discovered that individuals are more likely to follow suggestions from advertisers whose image is cognitively consistent with their own self-image and select products that are cognitively consistent (Fleck et al., 2012; D. Y. Kim & Kim, 2021).
Congruence with Destination Masculinity/Femininity
Since Aaker (1997) introduced the concept of brand personality to reflect the “set of human characteristics associated with a brand” (p. 347), marketing scholars have demonstrated its strong predictive power on individuals’ decision-making processes in various contexts such as shopping malls (e.g., H. R. Kim et al., 2005) and restaurants (e.g., Siguaw et al., 1999). Marketing researchers have studied how the gender of brands or products influences brand personality (Neale et al., 2016), consumer preferences, brand positioning, and brand value (Machado et al., 2019).
As a branding strategy that helps develop uniqueness in place perceptions, the concept of gender identities has recently been applied by Pan et al. (2021) to conceptualize gender as a two-dimensional construct consisting of destination masculinity and destination femininity. Sound symbolism theory (Melzner & Raghubir, 2023) suggests that auditory cues such as voice pitch implicitly communicate characteristics such as masculinity and femininity. When a voiceover’s gendered characteristics align with the perceived gender of a destination, it has been found to enhance congruency perceptions, leading to stronger associations and more favorable evaluations (Fleck et al., 2012). Therefore, a voiceover with masculine traits (e.g., low voice pitch) is expected to align more strongly with masculine destinations perceived as dominant or adventurous, while a voiceover with feminine traits (e.g., high voice pitch) should align with feminine destinations associated with elegance and relaxation. Following the above discussion, this study hypothesized that:
The two-dimensional conceptualization of destination gender has already been widely linked with self-congruity theory to examine travel decisions and behaviors such as destination attachment (Hamdy et al., 2024), destination loyalty (Ren & Pan, 2024), and destination revisit intention (Hamdy et al., 2023). Specifically, traveling has been found to be an important lifestyle choice that can play an important role in seeking, negotiating, and constructing self-identity (McWha et al., 2018). This suggests that tourists are expected to look for destinations that fit or are congruent with their gender expression.
Congruence with Tourist Masculinity/Femininity
Customer congruence has also been found to relate to individuals’ attitudinal and behavioral responses (Belanche et al., 2021). As suggested by social identity theory (Tajfel et al., 2001), individuals tend to categorize themselves based on social groups and attributes, seeking experiences that reinforce their identity. Gendered vocal cues serve as signals that either align or misalign with a listener’s self-concept of gender identity. When an individual perceives a voice as congruent with their gender identity, they are more likely to feel a connection with the advertisement (D. Y. Kim & Kim, 2021).
In the context of celebrity advertising, Glover (2009) suggested that consumers are more likely to engender a sense of familiarity and closeness when there is a fit between the self-concepts of a celebrity and their own personality (Li et al., 2023). Similarly, Hamdy et al. (2023) found a significant effect of celebrity-tourist personality congruence on tourists’ revisit intention. Hence, destination promotion videos whose personality (e.g., masculinity/femininity) matches that of potential tourists are expected to be more influential, because tourists are more likely to choose destinations that reflect their self-identity and reinforce their sense of belonging (McWha et al., 2018). Hence, this study hypothesized that:
The Mediating Role of Mental Imagery
Mental imagery represents a cognitive process that involves the activation of perceptual knowledge stored in an individual’s long-term memory (Miller et al., 2000). As a crucial cognitive process during the pre-consumption stage (Horowitz, 1972), mental imagery has garnered substantial attention in advertising research to examine how advertising messages can elicit mental imagery among consumers, which in turn can influence individuals’ cognitive, affective, and conative responses (e.g., Gavilan et al., 2014; Lien & Chen, 2013; J. Yoo & Kim, 2014).
Kosslyn et al. (1978) proposed the quasi-pictorial theory of mental imagery to suggest mental imagery as the linguistic description of visual scenes. Prior to this, most scholars had doubted whether a mental image is picture-like, in that it only exists in visual form. This finding has stimulated scholars to explore different sensory formations of mental imagery. Specifically, Kosslyn et al. (2010) suggested that mental images are not only in visual form, but also include auditory and kinesthetic elements as well. However, our understanding of mental imagery has likely been biased toward visual elements, because most scholars have followed the initial conceptualization of mental imagery to focus exclusively on visual elements such as color (Au et al., 2024) and 360-degree rotatable product images (S. Kim et al., 2020).
White et al. (1977) argued that auditory imagery is the second most prominent factor, following visual imagery, in mental imagery formation. Similar conclusions have subsequently been reached by many scholars who have explored the dimensionality of mental imagery (e.g., Khalilzadeh et al., 2023). The scarce scholarly attention to auditory mental imagery could be attributed to the unclear sensory quality of the auditory construct. Unlike the visual construct, whose quality can be assessed by its saturation, brightness, and colorfulness (Moriya, 2024), the auditory construct does not have a clear quality assessment.
While the context-specific nature of sound has complicated its impacts on individuals’ mental imagery formation and has produced inconsistent findings, Oakes (2007) introduced a congruency perspective to better clarify the role of sound in one’s decision making. Such a perspective contains two congruence elements: relevancy (i.e., the extent to which a stimulus facilitates or impedes clear identification of a message and its meaning) and expectancy (i.e., the extent to which a stimulus aligns with one’s prior knowledge) (Heckler & Childers, 1992). In other words, sound could stimulate individuals’ narrative thinking and evoke memories of characters that express the destination personality and meaning, thereby creating a connection between the destination and the self.
Gallace and Spence (2006) affirmed auditory congruency with mental imagery by discovering that individuals identified the characteristics of a visual cue more quickly and accurately when the visual cue was presented with a congruent audio stimulus. Hagtvedt and Patrick (2008) further elaborated on auditory congruency, suggesting that individuals find it difficult to mentally connect advertising messages with their brand image if the advertising messages are presented in incongruent sounds. Hence, it is hypothesized that:
As a concept rooted in self-concept theory (Sirgy, 1982), the congruency effect has been found not to be limited to external elements (e.g., voiceover and destination), but also to be related to how one’s self-image (i.e., tourist’s masculinity/femininity) is consistent with the stimulus message (e.g., Javornik et al., 2021) and the brand image (e.g., J. Yoo & Kim, 2014). For example, Holmes (2021) discovered that individuals with a higher degree of self–brand congruence demonstrated stronger emotion and recognition (e.g., mental imagery) than those with less self–brand congruence. Also, drawing on cognitive balance theory, Fan et al. (2023) discovered that tourists with higher perceived congruence with a destination were more likely to develop auditory mental imagery, and consequently higher visit intentions. Hence, the following research model (Figure 1) was developed, by hypothesizing that:

Figure 1. Proposed research model.
Study One
Study One examines H1: whether lower (higher) pitched voices positively influence perceived voice masculinity (femininity).
Experiment Design
Study One used a between-group methodology with a single-factor (low vs. middle vs. high pitched voice) experimental design to investigate individuals’ perceptions of masculinity toward voice. The experimental stimuli were created in three stages. First, four destination promotion sentences were randomly generated using a publicly available random-text generator (https://deepai.org/chat/text-generator) (Appendix A). Second, this study used a “male” voice from Play.ht (i.e., an AI-powered voice generator that creates ultra-realistic humanlike voices; https://www.play.ht) as the baseline voice and created an audio file for each sentence. A computer software package for speech analysis (i.e., Praat) suggested that the pitch (i.e., 113.29–135.14 Hz) of the baseline voices fell within the range (i.e., 85–155 Hz) of an average adult man. Third, voice pitch was systematically increased and decreased by three levels using the built-in function provided by AudioDirector 365 to create two additional conditions. The current study followed Efthymiou et al. (2024), who recognized voice pitch as an objective vocal property, as existing auditory cognition literature has suggested that not all individuals consciously perceive pitch variations, especially when the manipulation is subtle or embedded in natural speech (Bidelman & Krishnan, 2009). Hence, rather than measuring individuals’ subjective perception of the pitch manipulation, we verified it acoustically by employing Praat (Figure 2) to display the variation in pitch (measured in Hertz, Hz) over time (in seconds) for the different pitch conditions of each audio file. The results suggested the manipulation of voice pitch was successful, as the average pitch of the high-pitched voices was 142.05 Hz, 18.04% higher than that of the middle-pitched voices (i.e., 120.34 Hz) and 37.29% higher than that of the lower-pitched voices (i.e., 103.47 Hz).
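The reported percentage differences can be reproduced from the three mean pitch values. The following is a minimal sketch, assuming the percentages are simple ratios of the condition means; the function name `percent_higher` is illustrative and not part of the original analysis, which was carried out in Praat.

```python
# Mean F0 values (Hz) reported for the male-voice stimuli in Study One.
mean_f0 = {"low": 103.47, "middle": 120.34, "high": 142.05}


def percent_higher(value: float, reference: float) -> float:
    """Return how much higher `value` is than `reference`, in percent."""
    return (value / reference - 1.0) * 100.0


# Illustrative helper calls; the study itself only reports the resulting percentages.
high_vs_middle = percent_higher(mean_f0["high"], mean_f0["middle"])
high_vs_low = percent_higher(mean_f0["high"], mean_f0["low"])

print(f"High vs. middle: {high_vs_middle:.2f}%")
print(f"High vs. low:    {high_vs_low:.2f}%")
```

Rounded to two decimals, both values agree with the figures reported above.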

Figure 2. Voice pitch of a male voice.
The study was conducted on lediaocha.com. At the start of the online survey, this study included a hardware check to exclude participants who were not using sound-capable devices. Then, participants were instructed to complete a hearing test. Those who did not provide a correct answer were considered ineligible for the experiment. Eligible participants were randomly assigned to one of the three groups based on different voice pitched levels. Each participant was required to listen to four destination promotion sentences at random. After listening to each sentence, they were asked to rate the perceived masculinity of the voiceover using a seven-point Likert scale (−3 = very feminine; 3 = very masculine) (Wu et al., 2023). Finally, participants provided sociodemographic information, including gender, age, education level, and income level.
Data Analysis and Findings
With the assistance of an independent professional survey company (i.e., Lediaocha, operated by Shanghai Feiguan Information Technology Co, Ltd.), 314 responses were collected over a one-week period from its panel members. Given its 11 years of research experience in China (Lediaocha.com, 2025), Lediaocha.com has widely been utilized by scholars to conduct scientific surveys with residents in China (e.g., P. M. Lin et al., 2023). Responses completed within an unreasonably short time (0.5 min; n = 4), hearing test failures (n = 2), and validation test failures (n = 8; “How many sentences have you listened to in the experiment?”) were eliminated, resulting in 300 valid responses (n = 100 for each voice pitch level). This sample size satisfied the minimum sample size requirement (n = 159; n = 53 per experimental group) as suggested by an a priori power analysis (α = .05; power [1 − β] = .80; f = 0.25) using G*Power software. Further, the sample was deemed representative (Appendix B) as its demographic characteristics were similar to those of Chinese tourists (i.e., female: 52.0% in the sample vs. 56.0% in the population; average age: 37.6 in the sample vs. 40.0 in the population) (Zhang, 2024).
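The sample accounting in this paragraph can be checked with a few lines of arithmetic. This is a small sketch using only the counts reported above; the dictionary keys are illustrative labels, and the minimum sample sizes are taken as given from the reported G*Power analysis rather than recomputed.

```python
# Counts reported for Study One.
collected = 314
exclusions = {
    "completed_too_fast": 4,        # finished within 0.5 min
    "hearing_test_failed": 2,
    "validation_check_failed": 8,
}

valid = collected - sum(exclusions.values())
groups = 3                          # low, middle, high pitch
per_group = valid // groups

# Thresholds as reported from the a priori power analysis (not recomputed here).
min_total, min_per_group = 159, 53
meets_requirement = valid >= min_total and per_group >= min_per_group

print(f"Valid responses: {valid} ({per_group} per group)")
print(f"Meets minimum sample size: {meets_requirement}")
```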
Data analysis consisted of four stages using IBM SPSS 27.0 to analyze the valid responses. First, chi-square analyses and one-way analyses of variance (ANOVAs) were used to identify sociodemographic differences between sample groups. The insignificant results (p > .05) affirmed the similarity of groups, suggesting a lack of confounding effects for group comparison (Appendix B). Second, a reliability analysis returned strong Cronbach’s alphas for the perceived masculinity/femininity of the voiceover (
Third, a one-way ANOVA was performed which suggested a significant effect of voice pitch on perceived masculinity for all four sentences (S1: F = 178.255, p < .001; S2: F = 170.113, p < .001; S3: F = 147.184, p < .001; S4: F = 129.395, p < .001), thereby supporting H1. Lastly, on top of the significant ANOVA result (F = 204.666, p < .001), Tukey post-hoc tests were performed to compare the average scores of perceived masculinity/femininity across the three conditions (Table 2). Consistent with many existing studies (e.g., Wolff & Puts, 2010; Cartei et al., 2014; Wu et al., 2023), respondents rated their perception of low-pitched voice messages to be more masculine/less feminine than high-pitched voice messages.
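To make the test statistic concrete, the sketch below computes a one-way ANOVA F value from scratch. The ratings are HYPOTHETICAL values on the −3 to 3 scale, invented purely for illustration; they are not the study's data, and the original analysis was run in IBM SPSS rather than Python.

```python
from statistics import mean

# HYPOTHETICAL masculinity ratings (-3 = very feminine, 3 = very masculine).
ratings = {
    "low": [2, 3, 2, 3],
    "middle": [0, 1, 0, 1],
    "high": [-2, -3, -2, -3],
}


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f(list(ratings.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A large F indicates that differences between the pitch conditions are large relative to the variation within each condition; the p value would then be obtained from the F distribution with the corresponding degrees of freedom.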
Table 2. Perceived Voiceover Masculinity/Femininity of the Three Conditions (Study One).
Note. p < .001.
Study Two
The objective of Study Two was twofold: (1) to examine whether voice pitch differences foster individuals’ voice-destination congruency choice (H2) and (2) to test whether these effects are robust across voice genders.
Experiment Design
Similar to Study One, Study Two also adopted a between-group methodology with a single-factor (low vs. middle vs. high pitched voice) experimental design to investigate individuals’ voice-destination congruency choice. The same text-to-speech interface was adopted to alter the vocal features of each sentence, but Study Two used a “female” voice from Play.ht as the baseline voice. Specifically, the pitch (i.e., 217.89–229.07 Hz) of the baseline voices fell within the suggested range (i.e., 165–255 Hz) of an average adult woman (Purcell & John, 2010). The pitch differences were again cross-checked by Praat (Figure 3) to validate the manipulation of voice pitch, with the average pitch of the high-pitched voices (i.e., 265.26 Hz) being 21.52% and 43.48% higher than those of the middle-pitched voices (i.e., 218.28 Hz) and lower-pitched voices (i.e., 184.88 Hz), respectively. Participants were randomly assigned to listen to one of the three conditions, in which the four sentences were automatically broadcast in a random sequence. Then, participants completed the same scale to assess the perceived masculinity of the voiceover as in Study One.
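Pitch differences are often also expressed on a logarithmic (semitone) scale, which is closer to how listeners perceive pitch intervals. The sketch below converts the reported condition means into semitone offsets from the middle-pitched voice. This conversion is an illustrative addition, not a measure reported by the authors, and it makes no assumption about what the "three levels" in the editing software correspond to.

```python
from math import log2

# Mean F0 values (Hz) reported for the female-voice stimuli in Study Two.
mean_f0 = {"low": 184.88, "middle": 218.28, "high": 265.26}


def semitones(frequency: float, reference: float) -> float:
    """Return the interval from `reference` to `frequency` in semitones."""
    # Twelve semitones per octave; one octave is a doubling of frequency.
    return 12.0 * log2(frequency / reference)


# Illustrative conversion only; the study reports percentages, not semitones.
high_offset = semitones(mean_f0["high"], mean_f0["middle"])
low_offset = semitones(mean_f0["low"], mean_f0["middle"])

print(f"High vs. middle: {high_offset:+.2f} semitones")
print(f"Low vs. middle:  {low_offset:+.2f} semitones")
```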

Voice pitch of a female voice.
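The manipulation check above rests on simple ratio arithmetic. As a minimal sketch, the reported percentage differences can be recomputed from the mean pitch values stated in the text; the function name `percent_higher` is illustrative and not part of the original study materials.

```python
def percent_higher(high_hz: float, reference_hz: float) -> float:
    """Return how much higher high_hz is than reference_hz, in percent."""
    return (high_hz / reference_hz - 1.0) * 100.0


# Mean pitch values (Hz) reported for the Study Two "female" voice.
high, middle, low = 265.26, 218.28, 184.88

print(round(percent_higher(high, middle), 2))  # 21.52
print(round(percent_higher(high, low), 2))     # 43.48
```

The same function reproduces the Study Three figures when given the corresponding mean pitch values.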
Inspired by prior work on gender-based destination stereotypes (Pan et al., 2020), a binary destination choice task was used to examine individuals’ voice-destination congruency choices. Specifically, self-congruency theory posits that a masculine (feminine) voice is more likely to be perceived as fitting a masculine (feminine) destination. After listening to the four sentences, participants were presented with four randomized binary destination pairs, each consisting of one stereotypically feminine option and one stereotypically masculine option (Pan et al., 2020, 2021) (see Appendix C). From each pair, they were instructed to select the destination that best fit the voice they had just heard. In other words, this procedure yielded four congruency choices for each participant, which made it possible to check the consistency of their responses when determining voice-destination congruency. The tenet of sound symbolism holds that sound features carry universal symbolic meaning across languages (Svantesson, 2017). Hence, we did not align the semantic meanings of the sentences with specific destination pairs, in order to minimize confounding effects of semantic meaning on sound symbolism (Efthymiou et al., 2024).
The four binary destination options were selected based on the characteristics of destination femininity (e.g., relaxing, lovely, romantic, and graceful) and masculinity (e.g., adventurous, heroic, conquering, and vast) identified by Pan et al. (2021). A pilot test was then conducted with 59 tourism postgraduates to evaluate the manipulation of destination masculinity/femininity. Specifically, pilot respondents were required to indicate their perceptions of the masculinity (7 items; Hamdy et al., 2023) and femininity (6 items; Hamdy et al., 2023) on 7-point scales (1 = strongly disagree; 7 = strongly agree) of the destinations. The results of several paired sample t-tests suggested that the masculine destinations were significantly more masculine than the feminine destinations (t = 33.731, p < .001); and that the feminine destinations were significantly more feminine than the masculine destinations (t = 29.574, p < .001).
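For readers unfamiliar with the pilot-test statistic, the following is a minimal sketch of a paired-sample t statistic computed with the Python standard library. The ratings are hypothetical illustration data, not the study's pilot data, and `paired_t` is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired observations."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical masculinity ratings (7-point scale) given by the same four
# respondents to a masculine and a feminine destination.
masculine_destination = [6, 7, 6, 5]
feminine_destination = [2, 3, 1, 2]

t_value, df = paired_t(masculine_destination, feminine_destination)
print(round(t_value, 3), df)
```

A large positive t value indicates that the same respondents rated the first destination as more masculine than the second.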
Data Analysis and Findings
With the help of a professional survey company (i.e., Credamo) different from the company used in Study One, 314 responses were collected from panel members over a one-week period. Credamo is a Chinese professional survey company, established in 2017, that provides data collection services to various academic institutions (e.g., New York University and Peking University) through its panel of more than 3 million people (Credamo, 2025). Responses that were completed within an unreasonable time period (20 s for each listening question; 1.5 min for the whole survey) (n = 8), failed the hearing test (n = 1), or failed the validation test (“How many sentences have you listened in the experiment?”) (n = 5) were eliminated. This resulted in 300 valid responses, above the minimum sample size (n = 159) suggested by an a priori power analysis (α = .05; power [1 − β] = .8; f = 0.25) using G*Power software. The sample was deemed representative (Appendix C) as its demographic characteristics matched those of Chinese tourists (i.e., female: 53.0% in the sample vs. 56.0% in the population; average age: 38.2 in the sample vs. 40.0 in the population) (Zhang, 2024).
Data analysis involved five main stages. The first four stages were similar to the data analysis approach in Study One. After identifying insignificant sociodemographic differences between sample groups (Appendix D) and checking the scale reliability for perceived masculinity/femininity of the voiceover (
Perceived Voiceover Masculinity/Femininity of the Three Conditions (Study Two).
Note.
p < .001.
The last stage of the analysis consisted of four logistic-based mediation model assessments (PROCESS Macro Model 4; Hayes, 2012) to examine whether voice pitch differences result in individuals’ voice-destination congruency choices through perceived masculinity/femininity for each destination pair (Table 4). This separate model approach was employed to help control for different characteristics of destination masculinity/femininity, assessing whether individuals’ voice-destination congruency choices were robust across different destination characteristics. One dummy variable served as a binary dependent variable with the feminine destination as the baseline (Y1 = 1 for masculine destination). All four regression models included two other dummy variables (X1 = 1 for low voice pitch; X2 = 1 for high voice pitch) as multi-categorical independent variables. Specifically, X1 compared the low voice pitch condition with the other two conditions (middle and high), whereas X2 compared the high condition with the low and middle conditions.
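The coding of the three pitch conditions into X1 and X2 can be sketched as follows. This is a minimal illustration of the indicator coding stated above (X1 = 1 for low, X2 = 1 for high), under which the middle-pitched condition receives zero on both variables; the function name `indicator_code` is hypothetical.

```python
def indicator_code(condition: str) -> tuple:
    """Map a pitch condition to (X1, X2): X1 = 1 for low, X2 = 1 for high."""
    if condition not in {"low", "middle", "high"}:
        raise ValueError(f"unknown pitch condition: {condition!r}")
    return (int(condition == "low"), int(condition == "high"))


for condition in ("low", "middle", "high"):
    print(condition, indicator_code(condition))
```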
Logistic Regression Analyses Predicting Individual’s Destination Choice.
*p < .05. **p < .01. ***p < .001. ns: p > .05.
Italic terms represent the predicting power of the model.
The results consistently suggested that perceived masculinity/femininity partially and positively mediated the effect of low voice pitch (D1: b = 1.544, 95% confidence interval [CI] [0.721, 2.763]; D2: b = 1.256, 95% CI [−0.333, 2.551]; D3: b = 1.123, 95% CI [0.367, 2.117]; D4: b = 1.701, 95% CI [0.258, 3.845]) and completely and negatively mediated the effect of high voice pitch (D1: b = −1.916, 95% CI [−3.460, −0.844]; D2: b = −1.558, 95% CI [−3.214, −0.412]; D3: b = −1.394, 95% CI [−2.612, −0.462]; D4: b = −2.112, 95% CI [−4.807, −0.380]) on individuals’ selections of a masculine (vs. feminine) destination, thereby supporting H2. These results triangulate the relationship between voice pitch and perceived masculinity/femininity of voiceovers across voice genders. The significant direct positive effect of low voice pitch on individuals’ destination choices (b = 1.206–2.782, p < .001) supported Tsantani et al.’s (2016) argument that lower-pitched voices are perceived as more reliable and trustworthy, especially in a Chinese context (Wu et al., 2021). Hence, consistent with Dahl (2011), the current study found that a lower-pitched voice is more effective at triggering direct responses from consumers.
Study Three
The primary objective of Study Three was to examine the psychological mechanism through which voiceover-destination congruency (H3) and voiceover-tourist congruency (H4) influence tourists’ travel intentions through mental imagery (H5a and H5b).
Study Context
China was chosen as Study Three’s setting for three reasons. First, videos have been one of the most popular marketing tools for many tourism destinations in China (Shao et al., 2016). For example, a series of promotional videos produced by Litang County attracted more than 1.512 million tourists that year and increased local tourism income by 72.4% (Bytedance, 2021). Second, China has diverse landscapes for tourism purposes. This diversity has allowed China to develop multiple gender identities across different destinations. Hence, investigations in a Chinese context allow comparisons between destination masculinity and femininity within a single country, preventing possible cultural confounding effects in the proposed research model. Lastly, as reflected by many traditional poems, destination gender identity has been widely embedded within the Chinese education system. For example, Mountain Tai has been imbued with masculine vigor and heroic spirit (e.g., “I must ascend the mountain’s crest; it dwarfs all peaks under my feet.”) and West Lake has been associated with femininity (e.g., “Compare West Lake to the beautiful woman Xi Zi: She looks just as becoming, lightly made up or richly adorned.”) (Ren & Pan, 2024).
Experiment Design
With the help of the two professional survey companies, we recruited 800 participants (400 from Lediaoche.com and 400 from Credamo) with normal hearing. In addition to exceeding the minimum sample size (n = 240) suggested by an a priori power analysis (α = .05; power [1 − β] = .8; f = 0.25) using G*Power software, the sample was deemed representative (Appendix D) as its demographic characteristics were similar to those of Chinese tourists (i.e., female: 53.8% in the sample vs. 56.0% in the population; average age: 37.6 in the sample vs. 40.0 in the population) (Zhang, 2024). Participants were then assigned to one of eight conditions in a 2 (voice pitch: high vs. low) × 2 (voice gender: male vs. female) × 2 (destination gender identity: masculinity vs. femininity) between-groups experiment. Voice gender and destination gender were included as between-group factors to isolate the effect of voice pitch. Beyond examining the voice pitch effect on tourists’ behavioral intentions (i.e., travel intentions to visit the advertised destination), Study Three also clarified whether such effects vary depending on voice gender and destination gender. The entire experiment had four phases. In the first phase, participants were instructed to indicate their level of masculinity/femininity using four items (1 = strongly disagree; 7 = strongly agree) from B. Yoo et al.’s (2011) Individual Cultural Values Scale.
In the second phase, one of the eight destination promotion videos was randomly presented to the participants. Specifically, two serene landscape videos for a stereotypically masculine destination (Mountain Tai: 35 s) and a stereotypically feminine destination (West Lake: 36 s) were obtained from Douyin. Douyin, a popular short-video platform launched in China (Ren & Pan, 2024), has been found to profoundly influence destination marketing (Wei et al., 2023). The original sounds of the two videos were removed and replaced by scripts written by the first author based on the official information of the relevant destination management organizations (Figure 4).

Voiceovers of the two videos.
A pilot test was performed with 75 tourism undergraduates and postgraduates to evaluate the experimental stimuli manipulation. First, the participants were instructed to watch the two soundless videos before indicating their perceived masculinity/femininity of the destinations using the measurement items (masculinity: 7 items; femininity: 6 items) developed by Hamdy et al. (2024). As intended by the manipulation, the results revealed that Mountain Tai was perceived as significantly more masculine (F = 93.386, p < .001) and less feminine (F = −47.276, p < .001) than West Lake. Also, the participants evaluated the perceived credibility of the scripts (1 = not credible; 7 = very credible). The perceived credibility was high (Mountain Tai: x̄ = 6.413; West Lake: x̄ = 6.587) and did not differ significantly between the two video scripts (t = −1.778, p > .05).
After validating the stimuli manipulation, the same text-to-speech interface used in Studies One and Two was adopted to develop a high-pitched condition and a low-pitched condition for both “male” and “female” voices from Play.ht. As a result, four voice conditions (i.e., high-pitched male, low-pitched male, high-pitched female, and low-pitched female) were exported to mp3 files and added to the two serene landscape videos accordingly (Figure 4). The pitch differences were again cross-checked with Praat, affirming that the average pitch of the high-pitched male voice (i.e., 143.06 Hz) and female voice (i.e., 259.63 Hz) were 36.4% and 41.4% higher than those of the low-pitched voices (i.e., male: 104.89 Hz; female: 183.57 Hz), respectively. After watching the video, respondents rated their perceptions of the voiceover as being masculine/feminine, using a seven-point Likert scale (−3 = very feminine; 3 = very masculine) (Wu et al., 2023).
In the third phase, participants were instructed to rate their perceptions of the masculinity/femininity of the destination using a seven-point Likert scale (−3 = very feminine; 3 = very masculine). Despite potential order effects, assessing destination masculinity/femininity at this point kept the measurement consistent with the first two phases and captured participants’ fresh cognitive appraisals of the destination promotion video. In the fourth phase, participants reported their mental imagery (3 items: Miller et al., 2000) of traveling to Mountain Tai/West Lake and indicated their travel intentions (3 items: Alvarez & Campo, 2014) (from 1 = strongly disagree to 7 = strongly agree). Lastly, participants provided sociodemographic information, including gender, age, education level, and income level, which were treated as control variables to help prevent possible bias.
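The 2 × 2 × 2 between-groups design described above can be enumerated programmatically. The sketch below lists the eight cells and deals participants into them at random; balanced cell sizes and the round-robin procedure are assumptions of this illustration, as the text does not specify how assignment was implemented.

```python
import random
from collections import Counter
from itertools import product

PITCH = ("high", "low")
VOICE_GENDER = ("male", "female")
DESTINATION = ("masculine", "feminine")

# All 2 x 2 x 2 = 8 cells of the between-groups design.
CONDITIONS = list(product(PITCH, VOICE_GENDER, DESTINATION))


def assign_conditions(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin into the eight cells.

    Balanced (equal-sized) cells are an assumption of this sketch.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(ids)}


assignment = assign_conditions(range(800))
cell_sizes = Counter(assignment.values())
print(len(CONDITIONS), set(cell_sizes.values()))
```

With 800 participants, this procedure places 100 participants in each of the eight cells.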
Data Analysis and Findings
All the scales used in the online survey underwent factor analysis (extraction method: Principal Components; no rotation) to produce one-dimensional factorial structures. The subsequent reliability analysis showed strong Cronbach’s alphas (tourist’s masculinity/femininity = 0.914; mental imagery = 0.899; travel intention = 0.899) in the two video conditions (West Lake: tourist’s masculinity/femininity = 0.894; mental imagery = 0.896; travel intention = 0.906; Mountain Tai: tourist’s masculinity/femininity = 0.929; mental imagery = 0.902; travel intention = 0.890), allowing for the creation of single variables for the measures by averaging the items in each scale.
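As a reminder of how the reliability coefficient reported above is obtained, the following is a minimal sketch of Cronbach’s alpha using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The item scores are hypothetical illustration data, not the study’s responses.

```python
from statistics import variance


def cronbach_alpha(items):
    """Compute Cronbach's alpha.

    items: one list of scores per scale item, all for the same respondents
    in the same order.
    """
    k = len(items)
    total_scores = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(total_scores))


# Hypothetical responses of five participants to a three-item scale.
scale_items = [
    [5, 6, 7, 4, 6],
    [5, 7, 6, 4, 6],
    [4, 6, 7, 5, 6],
]
print(round(cronbach_alpha(scale_items), 3))
```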
As a more conservative approach, a confirmatory factor analysis was performed to confirm the convergent validity of the proposed research model using three criteria proposed by Fornell and Larcker (1981): (1) all factor loadings exceeded 0.7 (tourist’s masculinity/femininity = 0.743–0.896; mental imagery = 0.833–0.911; travel intention = 0.815–0.908), (2) all composite reliability (CR) values exceeded 0.7 (tourist’s masculinity/femininity = 0.915; mental imagery = 0.900; travel intention = 0.900), and (3) all average variance extracted (AVE) values exceeded 0.5 (tourist’s masculinity/femininity = 0.732; mental imagery = 0.752; travel intention = 0.752) (Table 5). The model fit was found satisfactory as all fit indices met the recommended cut-offs (χ2/df = 2.938 < 3; CFI = 0.989 > 0.95; TLI = 0.985 > 0.95; GFI = 0.977 > 0.95; RMSEA = 0.049 between 0.03 and 0.08).
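The CR and AVE criteria above follow directly from standardized factor loadings. The sketch below applies the usual Fornell and Larcker (1981) formulas, AVE = Σλ²/n and CR = (Σλ)² / [(Σλ)² + Σ(1 − λ²)], to a hypothetical set of loadings; the individual loadings of the study are reported in Table 5 and are not reproduced here.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR: squared sum of loadings over itself plus summed error variances."""
    loading_sum_sq = sum(loadings) ** 2
    error_variance = sum(1 - l * l for l in loadings)
    return loading_sum_sq / (loading_sum_sq + error_variance)


# Hypothetical standardized loadings for a three-item construct.
example_loadings = [0.80, 0.85, 0.90]
print(round(average_variance_extracted(example_loadings), 3))
print(round(composite_reliability(example_loadings), 3))
```

A construct meets the convergent validity criteria when its AVE exceeds 0.5 and its CR exceeds 0.7.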
Confirmatory Factor Analysis of the Measurement Items.
Note. χ2/df = 2.938; CFI = 0.989; TLI = 0.985; GFI = 0.977; RMSEA = 0.049.
Data analysis involved four main stages. First, similar to the first two studies, a series of chi-square analyses and one-way analyses of variance (ANOVAs) were conducted to identify sociodemographic differences between the eight sample groups, with the goal of minimizing potential confounding effects in further analyses (Appendix E).
Second, as a prerequisite verification for examining the proposed research model, a three-way ANOVA was performed to help affirm the symbolic meanings of voice pitch on masculinity/femininity by examining the effects of voice pitch, voice gender, and destination gender on individuals’ perceptions of the masculinity/femininity of the voiceovers (Table 6). In addition to the significant effect of voice gender (F = 4.491, p < .05), findings cross-validated the effect of voice pitch on perceived masculinity/femininity of the voiceover as perceived masculinity was significantly higher when the voice pitch was lower (F = 95.030, p < .001). This effect held true across destination gender as the effect of destination gender and all other interaction terms were insignificant. This finding confirmed the voice pitch-induced perception of voiceover’s masculinity/femininity, supporting further analysis.
Effects on Perceived Voiceover Masculinity/Femininity.
*p < .05. ***p < .001. ns: p > .05.
Third, a three-way ANOVA was performed to examine the direct effects of voice pitch, voice gender, and destination gender on individuals’ mental imagery and travel intention (Table 7). While the significant direct effect of destination gender on mental imagery and travel intention (mental imagery: F = 4.582, p < .05; travel intention: F = 5.960, p < .05) was consistent with Hamdy et al. (2024), who suggested the superiority of a destination’s feminine characteristics, there was no significant main effect of voice pitch (mental imagery: F = 0.373, p > .05; travel intention: F = 2.968, p > .05) or voice gender (mental imagery: F = 0.039, p > .05; travel intention: F = 0.035, p > .05) in the destination promotion video. Despite the insignificant three-way interaction effects (mental imagery: F = 0.012, p > .05; travel intention: F = 0.390, p > .05), several significant two-way interactions were identified. Voice pitch was found to significantly interact with destination gender (mental imagery: F = 9.418, p < .01; travel intention: F = 49.501, p < .001), suggesting that the influence of voice pitch depends on the gendered nature of the destination. Similarly, voice gender was found to interact with destination gender (mental imagery: F = 7.614, p < .01), suggesting that the effect of voice gender on mental imagery was moderated by how the destination was gendered. These findings highlight the importance of examining possible psychological mechanisms (e.g., voice-destination congruency) beyond the effect of vocal properties on tourists’ psychological and behavioral responses.
Effects on Mental Imagery and Travel Intention.
*p < .05. **p < .01. ***p < .001. ns: p > .05.
Lastly, two moderated mediation models using the SPSS PROCESS macro (Hayes, 2018; Model 8) were estimated to examine the mechanism through which the two masculinity/femininity dyads (i.e., voiceover-destination congruence and voiceover-tourist congruence) influenced individuals’ travel intentions through mental imagery across the two destinations (Table 8). Voice gender was considered as a binary moderator in the regression model assessments. After calibrating the 7-point scale data of the voiceover masculinity/femininity (VG) and the tourist masculinity/femininity (TG) into fuzzy set scores ranging from 0 to 1 to align with the binary variable of destination masculinity/femininity (DG), we followed Pradhan et al.’s (2023) methods to calculate the congruence scores between voiceover and destination/tourist using the generalized Euclidean distance (GED) square model:
Where
Regression Analyses of Individual’s Travel Intentions.
Notes. ***p < 0.001; **p < 0.01; *p < 0.05; ns: p > 0.05.
The generalized Euclidean distance (GED) square model was adopted to calculate congruence scores for three main reasons. First, it aligns with the core focus of Study Three, which is to capture perceived congruence between tourist, voiceover, and destination. While experimental manipulations of voice gender and destination gender offered categorical distinctions, they could not fully reflect individuals’ subjective perceptions of masculinity/femininity. In other words, the GED square model allowed us to operationalize congruence as a continuous variable in a more precise way. Second, the GED square model helps create three congruence scores based on squared differences (Edwards, 1994), limiting the number of independent variables to reduce the multicollinearity risk inherent in creating multiple high-order interaction terms (Davison et al., 2002). Specifically, three independent variables (i.e., tourist, voiceover, and destination masculinity/femininity), together with their interactions, would yield seven variables in the regression model, inflating multicollinearity that in turn destabilizes coefficient estimates and reduces statistical power (Aiken et al., 1991). Lastly, the GED square model has been widely adopted in the marketing literature (e.g., Pradhan et al., 2016, 2023) to calculate congruency. The use of the GED square model therefore allows more appropriate discussions of how the study findings are similar to or different from the existing literature.
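To make the idea of a squared-difference congruence score concrete, the following is a minimal sketch under stated assumptions. The exact GED formula and calibration procedure used in the study are not reproduced in this text, so both the min-max rescaling in `rescale` and the scoring rule in `congruence` (1 minus the squared difference, so that 1 means identical scores) are assumptions made purely for illustration, and all function names are hypothetical.

```python
def rescale(score: float, low: float, high: float) -> float:
    """Min-max rescale a Likert score to [0, 1] (assumed calibration)."""
    return (score - low) / (high - low)


def squared_distance(a: float, b: float) -> float:
    """Squared difference between two scores on the same 0-1 scale."""
    return (a - b) ** 2


def congruence(a: float, b: float) -> float:
    """Assumed congruence score: 1 minus the squared difference."""
    return 1.0 - squared_distance(a, b)


# Hypothetical participant: voiceover rated +2 on the -3..3 scale, own
# masculinity rated 5 on the 1..7 scale, masculine destination coded 1.
vg = rescale(2, -3, 3)  # voiceover masculinity/femininity (VG)
tg = rescale(5, 1, 7)   # tourist masculinity/femininity (TG)
dg = 1.0                # destination masculinity/femininity (DG)

print(round(congruence(vg, dg), 4))  # voiceover-destination congruence
print(round(congruence(vg, tg), 4))  # voiceover-tourist congruence
```

Whatever the exact functional form, the design point is the same as in the text: each dyad is collapsed into one continuous score, so the regression needs far fewer terms than a full set of interactions.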
Results revealed that voiceover-destination congruency significantly increased participants’ travel intention to West Lake (b = 1.920, p < .001) but not to Mountain Tai (b = 0.376, p > .05), thereby lending partial support for H3. However, voiceover-tourist congruency was found to significantly reduce participants’ travel intention to West Lake (b = −1.938, p < .05) but had no significant effect on participants’ travel intention to Mountain Tai (b = 1.226, p > .05). Thus, H4 was not empirically supported.
It was also found that voiceover-destination congruency significantly increased participants’ mental imagery (West Lake: b = 2.409, p < .001; Mountain Tai: b = 2.888, p < .001). Also, mental imagery significantly increased participants’ travel intentions (West Lake: b = 0.388, p < .001; Mountain Tai: b = 0.573, p < .001). Given the insignificant moderating role of voice gender in the mediation relationship between voiceover-destination congruency and travel intentions (West Lake: b = 0.523, p > .05; Mountain Tai: b = 0.834, p > .05), mental imagery served as a partial mediator between voiceover-destination congruency and travel intentions at the feminine destination regardless of the voice gender (female voice: b = 0.935, 95% CI [0.609, 1.282]; male voice: b = 0.713, 95% CI [0.404, 1.042]). However, it served as a full mediator at the masculine destination (female voice: b = 1.655, 95% CI [1.257, 2.094]; male voice: b = 1.432, 95% CI [0.921, 2.002]), thereby supporting H5a.
It was further found that voiceover-tourist congruency significantly increased participants’ mental imagery of the masculine destination (b = 4.151, p < .001) but had no significant effect in the feminine destination condition (b = −1.015, p > .05). In the masculine destination condition, given the insignificant direct effect of voiceover-tourist congruency on travel intentions (b = 1.226, p > .05), mental imagery fully mediated the effect of voiceover-tourist congruency on travel intentions across voice gender (female voice: b = 1.991, 95% CI [1.433, 4.048]; male voice: b = 2.691, 95% CI [1.372, 4.195]). However, mental imagery did not significantly mediate the effect of voiceover-tourist congruency on travel intentions in the feminine destination condition (female voice: b = −0.244, 95% CI [−1.671, 0.816]; male voice: b = −0.553, 95% CI [−1.494, 0.339]). Thus, H5b was partially supported.
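The labels "partial", "full", and "no" mediation used above follow a common decision rule: the indirect effect is treated as significant when its bootstrap confidence interval excludes zero, and mediation is called full when the direct effect is not significant. The sketch below encodes that rule and applies it to values reported in this section; it illustrates the interpretation only, is not the PROCESS macro, and the function name `classify_mediation` is hypothetical.

```python
def classify_mediation(ci_low: float, ci_high: float, direct_significant: bool) -> str:
    """Classify mediation from an indirect-effect CI and the direct effect."""
    indirect_significant = ci_low > 0 or ci_high < 0  # CI excludes zero
    if not indirect_significant:
        return "no mediation"
    return "partial mediation" if direct_significant else "full mediation"


# Voiceover-destination congruency, female voice (values from the text).
print(classify_mediation(0.609, 1.282, direct_significant=True))    # West Lake
print(classify_mediation(1.257, 2.094, direct_significant=False))   # Mountain Tai

# Voiceover-tourist congruency, female voice.
print(classify_mediation(1.433, 4.048, direct_significant=False))   # Mountain Tai
print(classify_mediation(-1.671, 0.816, direct_significant=True))   # West Lake
```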
Discussions and Implications
General Discussion
Table 9 reports the results of the hypothesis testing. The findings of Study One support H1 using a male voice. Consistent with various studies that suggested voice pitch as a significant biological characteristic to indicate one’s masculinity/femininity (e.g., Wolff & Puts, 2010; Cartei et al., 2014; Wu et al., 2023), voice pitch was found to negatively (positively) influence individuals’ perceived masculinity (femininity) of a male voiceover. H1 was further supported by Study Two using a female voiceover, suggesting that lower (higher)-pitched voices triggered one’s perceived masculinity (femininity) of a voiceover, regardless of the voiceover’s biological sex (male/female) (Ko et al., 2006; Krahé & Papakonstantinou, 2020).
Results of the Hypothesis Testing.
*p < .05. ns: p > .05.
In addition to triangulating the negative effect of voice pitch on one’s perceived masculinity (H1), Study Two also examined the mechanism through which voice pitch influenced one’s destination choice through perceived masculinity (femininity). H2 was empirically supported, revealing that respondents preferred a masculine (feminine) voice to promote masculine (feminine) destinations. This supports Efthymiou et al. (2024), who discovered that a voice’s masculine (feminine) characteristics (i.e., vocal tract length) promoted congruency attributions toward stereotypically masculine (feminine) products. Hence, the current findings extend to destination marketing the work of studies that have advocated for congruency between vocal characteristics and a brand or product (e.g., Hu et al., 2023; Huh et al., 2023).
In addition, Study Two found a significant direct effect of lower-pitched voice on one’s selections toward masculine destinations but an insignificant direct effect of higher-pitched voice, highlighting the non-linearity of voice pitch in one’s decision making (Liu et al., 2024; Wang et al., 2024). The insignificant role of higher-pitched voice is consistent with Dahl (2011), who suggested that a lower-pitched voice is generally more effective at triggering consumers’ direct responses. Specifically, a lower-pitched voice has been suggested to be perceived as more reliable and trustworthy (Tsantani et al., 2016), especially in the Chinese context where Study Two was conducted (Wu et al., 2021).
Building upon the voice pitch-induced voiceover-destination congruency in Studies One and Two, Study Three examined a mechanism through which voiceover-destination congruency and voiceover-tourist congruency in masculinity/femininity influenced one’s travel intention through mental imagery. The results yielded partial support for H3 (i.e., the direct effect of voiceover-destination congruency on one’s travel intention). Specifically, the direct effect of voiceover-destination congruency on one’s travel intention was significant only for the feminine destination, not for the masculine destination. Dietrich et al. (2019) suggested that voice pitch is an emotional characteristic, which aligns more closely with the inherently emotive nature of feminine destinations. In contrast, masculine elements rely more on pragmatic appeals (Putrevu, 2004), where the voice pitch effect might be less influential.
However, the direct effect of voiceover-tourist congruency on one’s travel intention (H4) was found insignificant for both feminine and masculine destinations. While this insignificant effect contradicts many influencer-brand congruency studies (e.g., Casalo et al., 2020; Belanche et al., 2021), it aligns with Pradhan et al. (2023) and highlights the unique formation of one’s travel intention. Specifically, while voiceover’s aspirational masculinity/femininity was found to transfer to the destination to create a sense of voiceover-destination congruency, a high voiceover-tourist congruency might lead to insignificant aspirational masculinity/femininity and thus have little effect on one’s travel intentions (Pradhan et al., 2016).
Study Three also examined the mediating role of mental imagery in the relationship between the congruency effects and travel intentions. Results supported H5a and suggest that mental imagery mediated the effect of voiceover-destination congruency on travel intentions. This significant mediating effect reaffirmed the ability of voices to elicit images in listeners’ minds (Rodero, 2012). Specifically, consistent with Kim et al. (2021), the voice of a voiceover was found to serve as a visual imagery-evoking tool that influenced the listener’s response toward a destination (i.e., travel intentions).
However, H5b was only partially supported in Study Three, as the mediating effect of voiceover-tourist congruency was significant in the masculine destination condition but insignificant in the feminine destination condition. This finding is consistent with social role theory and highlights the unbalanced impact of masculine and feminine cues on one’s perceptions (Eagly & Sczesny, 2019). As suggested by Pan et al. (2020), destination masculinity triggered more heuristic information processing that aligns with the formation of mental imagery. In addition, as inspired by many studies that have conceptualized femininity as the absence of masculinity rather than a distinct construct (e.g., Felix et al., 2022), masculine destinations might have more stereotyped or defined imagery that relies on a specific gendered presentation. Feminine destinations may not depend as strongly on gendered imagery, making congruency less impactful.
Theoretical Contributions
This study contributes to the academic understanding of sound symbolism, destination masculinity/femininity, congruency in destination choice, and mental imagery in several ways. First, despite extensive scholarly attention on sensory marketing in the tourism literature, it is believed this is the first study to apply sound symbolism to study the effect of voice pitch in destination marketing. Existing studies have primarily focused on visual cues (e.g., colors: Au et al., 2024; font style: Huang, 2019) to communicate a destination’s characteristics. However, the one-sided discussion on the visual cues has overlooked vocal cues as one of the most reliable and prominent marketing tools that allows listeners to recognize a destination’s characteristics (Parise & Pavani, 2011; Peiffer-Smadja & Cohen, 2019).
It is also believed the current findings advance the destination masculinity/femininity literature by examining how vocal features (i.e., voice pitch) form one’s masculinity/femininity perceptions. Drawing on sound symbolism, this study confirmed the mediating role of masculinity/femininity perception in the relationship between voice pitch and the congruency between a voice and a destination. While a lower-pitched voice was found to foster respondents’ perceived congruency with a masculine destination, a higher-pitched voice did not result in respondents’ perceived congruency with a feminine destination. This unexpected result raises questions about possible misinterpretations in research that assumes linear effects of voice pitch on listeners’ perceptions and behaviors. Similar to Tsantani et al. (2016), a lower-pitched voice appeared to be more stereotyped or defined as a gendered cue than a higher-pitched voice.
This study also expands self-congruency theory by examining the effects of voiceover-destination congruency and voiceover-tourist congruency in masculinity/femininity perceptions on individuals’ travel intentions. Scholars have primarily adopted self-congruency theory to focus on the congruency between tourists’ personality or self-concept and the destination’s relevant characteristics (e.g., Šegota et al., 2022; Ranjbarian & Ghaffari, 2018) but have overlooked the congruency role of a destination’s spokesperson (e.g., voiceover). Specifically, travel intentions were found to be significantly triggered by voiceover-destination congruency but not voiceover-tourist congruency. This unexpected finding challenges self-congruency theory and suggests that masculine/feminine tourists may not necessarily value gender identity congruency, given that heterosexuality is becoming less normative (Habarth, 2015).
The current research enhances the body of knowledge on mental imagery by testing the voice pitch-induced effects of voiceover-destination congruency and voiceover-tourist congruency related to masculinity/femininity perceptions. While mental imagery can be elicited by different sensory inputs, prior research in advertising contexts has primarily focused on the elicitation of mental imagery through visual stimuli such as color (e.g., Au et al., 2024; He et al., 2024). Hence, this study represents a pioneering attempt to demonstrate how tourists’ mental imagery could be triggered by vocal stimuli (i.e., voice pitch). Specifically, mental imagery was found to significantly mediate the relationship between the voice pitch-induced voiceover-destination congruency and travel intentions. The mediating role was insignificant in the relationship between the voice pitch-induced voiceover-tourist congruency and travel intentions to a feminine destination, which reaffirmed that individuals do not necessarily value others with the same gender identity.
The last study contribution is methodology related. Compared to economic and medical scenarios, tourist decision-making and behaviors are likely more complex and involve various uncontrollable factors. While conducting experimental studies in tourism scenarios is often challenging to satisfy internal validity (Viglia & Dolnicar, 2020), recent breakthroughs in artificial intelligence (AI) technologies have provided tourism scholars with insightful ways to design scenario-based experiments (Xiong et al., 2024). While text-to-image AI tools have increasingly generated scholarly attention in experimental research, this study represents a pioneering attempt in the tourism literature to manipulate sound characteristics using an AI voice generator (i.e., Play.ht: https://www.play.ht). Specifically, the step-by-step explanation of the experimental stimuli in this study has introduced a new experimental approach for tourism scholars to understand sound characteristics in advertisements.
Practical Contributions
This research also has practical implications for various stakeholders in the tourism industry, such as destination management organizations, travel vloggers, and television programmers. First, this study focused on an underexplored area (i.e., voice characteristics) in destination promotion. As demonstrated by the study’s methodology, voice characteristics (e.g., voice pitch) are easy to manipulate in a digital environment. Destination management organizations can use appropriate voice characteristics as signals to better communicate their destination’s competitive advantages and ultimately generate visitation. Our finding that a lower-pitched voice is more effective in communicating destination masculinity provides valuable insights for destination management organizations promoting their tourism offerings.
Second, the study findings suggest a new approach to celebrity marketing. Celebrity endorsers have long been used by destination marketers to promote destinations to both domestic and international tourists (Roy et al., 2021). Notable examples include actor Chris Hemsworth for Australia, singer Taylor Swift for New York, and actor Jackie Chan for Indonesia. Linking a destination brand with a celebrity is expected to help attract tourists, especially fans of the celebrity (S. Lee et al., 2008). However, voiceover-tourist congruency in masculinity/femininity perceptions was not found to significantly trigger travel intentions. Hence, destination marketers should neither overestimate the effect of a celebrity’s popularity on individuals’ travel intentions nor overlook the congruency between a celebrity and a destination, at least in the masculinity/femininity context.
Third, travel vloggers are increasingly visible in destination marketing as advocates of tourism and hospitality experiences (Nguyen et al., 2025). They concentrate on creating videos and disseminating them through social media platforms (Xu et al., 2021). Travel vloggers may benefit from aligning their vocal characteristics with the type of destination they promote to educate and entertain viewers about travel and tourism more effectively. Specifically, this study supports congruency between the voice and the brand or product being promoted, suggesting that a lower-pitched (higher-pitched) voice better fits a masculine (feminine) destination. As a result, travel vloggers should consider how their natural voice pitch, or a pitch adjusted through audio editing, can enhance the persuasiveness of their storytelling (F. Chi et al., 2025). For example, a lower-pitched (i.e., masculine) voice would likely be more effective for sharing adventurous and entertaining experiences, whereas a higher-pitched (i.e., feminine) voice is likely more appropriate for promoting romantic travel experiences.
Lastly, travel documentaries are popular in the tourism industry and offer individuals an opportunity to enjoy the travel experience while staying comfortably at home in front of their televisions or computers. Given the historical mindset that suggests a superiority of masculine voices in message communication (Rodero et al., 2013), travel documentaries have typically been dominated by male speakers (i.e., a lower-pitched voice). However, this study drew on self-congruency theory to highlight the voice pitch-induced voiceover-destination congruency related to femininity. The results suggest that television programmers should not assume that male voices are superior, but should instead focus on the congruency between the voice and the product or destination being communicated.
Limitations and Future Research Directions
This study has seven limitations that require further investigation. First, this study conceptualized gender as binary (masculine vs. feminine), a historically heteronormative approach. However, gender is fluid and exists on a spectrum. Future research should explore non-binary and gender-neutral voiceovers to provide a more inclusive understanding of auditory marketing in tourism. Second, this study focused exclusively on the effect of voice pitch. Future studies should explore verbal effects in real-world scenarios involving various verbal characteristics such as loudness, speech rate, and accent.
Third, as individuals’ travel plans and loyalty to a place have become more affected by gender cues than by other personality traits (Pan et al., 2020), this study focused on how voice pitch triggered one’s masculinity/femininity perceptions. Future work should further explore cross-modal effects between vocal features and other human-like characteristics such as visual appearance and age. Fifth, this study systematically adjusted voice pitch to enable linear comparisons between the low and high voice pitch levels. However, as discovered in this study, the effects of voice pitch on one’s perceptions and behaviors are unlikely to be linear. Potential non-linear effects of voice pitch warrant consideration in future research.
Sixth, the model assessments were conducted in a Chinese context, which eliminated confounding effects from other variables (e.g., type of attractions, travel party, and travel purpose). While Svantesson (2017) suggested that most sound features hold universal semantic meaning across languages and cultures, cultural factors such as power distance may moderate the effect of voice pitch. Specifically, Chinese consumers may tend to favor masculine figures in marketing communications (Song et al., 2019). Hence, understanding how the effect of voice pitch varies with different cultural attributes is important for future research. Lastly, this research focused on the congruency of voice masculinity/femininity with destination and tourist masculinity/femininity. Additional work is needed to better understand how the congruency effects vary with other aspects of a promotion video (e.g., background music, video script, and sentence structure).
Supplemental Material
Supplemental material, sj-docx-1-jtr-10.1177_00472875251371054 for Masculine and Feminine Gender Cues in Destination Promotion Videos: The Effect of Voice Pitch by Wai Ching Wilson Au, Fiona Chi and James F. Petrick in Journal of Travel Research
Footnotes
Author Contributions
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
Author Biographies
References
