Abstract
The interaction between interviewer and respondent is one of the most significant sources of interviewer bias, with respondents often answering questions in a way they think is socially desirable. This bias, called social desirability bias, has been studied in a variety of contexts and found to influence the validity of the data collected. In this study, we examine whether the national identity of an interviewer biases results in a survey on tourist destinations when the interviewer’s nationality matches the target destination. In a 2 × 2 between-subjects experiment, the main finding is that respondents evaluate an advertisement for a country, the country’s attractiveness, and its people more positively when interviewed by someone from that country than when interviewed by someone from another country. Our results imply that the practice of having a local or domestic interviewer survey tourists while they are at a destination is severely prone to biased results. Theoretical and managerial implications are offered based on these findings.
Introduction
A significant part of market research conducted today, either by tourism companies themselves or by market research firms on behalf of industry clients, is based on some kind of customer survey. While both more qualitative studies and (field) experiments are employed for a huge variety of research questions, the fact that interviewing customers still holds such an important place in data collection calls for careful consideration of how suitable some of these data are for the decision makers they are meant to assist. Although typical sources of error like item wording, response set, leading, threatening, or double-barreled questions in the survey itself (Frankfort-Nachmias and Nachmias, 2008), key informant issues (John and Reve, 1982) or nonresponse bias (Mitchell and Jolley, 2001) are often kept in mind when designing and conducting survey research, other sources of error are more difficult to handle. For example, imagine that a local Norwegian destination company wants to learn how satisfied international visitors are with their stay in the region. To collect some information, the company engages a marketing research firm that carefully constructs a survey following all the ‘textbook’ procedures, assigns the data collection task to one of its most trusted employees, and then undertakes a sampling technique and/or interviewing procedure based on interception (e.g. Walsh et al., 2008). Data are collected from a large sample, the analyses produce interesting results, and the destination company redesigns some of its planned marketing activities based on, among other things, the results of this survey. While we might question various parts of this research process for lack of detail on what actually went on, one particular error source is of specific interest in this article: the fact that the interviewer is Norwegian and thus associated with ‘the host’, while the respondents are international tourists visiting the home of the interviewer.
In other words, the situation is highly prone to interviewer effects, meaning errors occurring in the interaction between the interviewer and the respondent. In this article, we seek to identify, and experimentally produce, biases in tourism surveys due to interviewer effects. While previous research has identified interviewer-related errors caused by issues like the respondents’ inability to speak the same language as the interviewer (e.g. Boyd and Westfall, 1965), the body language or facial expressions of the interviewer (Frankfort-Nachmias and Nachmias, 2008), the interviewer’s race (Cotter et al., 1982), or the interviewer’s sex (Walker, 1992), our focus of attention is somewhat different. In personal surveys of tourists at a destination, one of the parties to the interaction will typically take on the role of a guest, while the other is a representative of the host. Hence, based on a social desirability bias explanation, we assume that the guest is concerned with not offending the host and thus attunes his or her answers in a way that makes the interview situation a pleasant experience for both parties. The succeeding parts of this article outline the theoretical bases for this assumption and present an experimental procedure designed to identify such social desirability biases in surveys of tourists. The empirical results are then presented and implications for tourism research discussed.
Interviewer effects and SDB
In the fall of 2013, the TV station Channel 4 broadcast a documentary titled ‘The cruel cut’, focusing on female genital mutilation (FGM). Among other things, viewers met Leyla Hussein, who had decided to study the degree to which people strive to be politically correct in relation to FGM. She intercepted consumers in shops in Northampton, UK, and asked them to sign an appeal in support of her ‘culture, traditions and rights to be protected’ by allowing FGM. In just half an hour, 19 people signed up, and while many of them were actually against FGM, they still signed because ‘FGM was a part of Leyla’s culture’ (denkorteavis.dk). Simply stated, they did what was socially desirable in the current situation, even though it contradicted their more principled beliefs. Not surprisingly, Leyla was shocked by how easily the intercepted individuals would comply with her request and sign on to support FGM.
What Leyla Hussein’s little experiment illustrated is the frequent problem with self-reported measures (like answers given in an interview), where respondents tend to provide socially desirable answers. This form of response bias, whether it implies overreporting positive issues or underreporting negative issues, is one of the most studied forms of response bias in social science (Fisher and Katz, 2000) and has concerned researchers in most areas that rely on human self-reports. When Edwards (1957) proposed the term social desirability bias (SDB), it was described as a tendency to respond in a way that presents ourselves in a socially acceptable way, for which we expect some level of approval. A more recent explanation is that social desirability underlies the propensity for ‘survey respondents to tailor their answers to what they think would satisfy or please the interviewer’ (Davis and Silver, 2003: 33). According to Paulhus (1984), SDB can take on two distinct forms, where the first is termed self-deceptive positivity and concerns honest but too favorable self-presentations. The second form is impression management and implies a desire to appear in a socially approved way. Zerbe and Paulhus (1987) argue that self-deceptive positivity is a dispositional tendency that is reasonably invariant and thus not a biasing factor per se (King and Bruner, 2000) and that impression management is what really corrupts research. In their elaborate discussion of SDB in marketing and consumer research, King and Bruner (2000: 94) present a variety of situations that can foster SDB, of which designs where self-report measures are used are one and situations where subjects expect their response will ‘result in normatively influenced or evaluative consequences’ are another.
Drawing on Thomas and Kilman (1975), they further assume that ‘the contaminating influence of response bias should be expected to operate whenever ratings are used to assess variables with evaluative overtones’ (King and Bruner, 2000: 94).
Related to our research question, we will argue that tourists intercepted by interviewers at the destination they are visiting are likely to feel that their responses are evaluated in a normative way by the interviewer. Previous research has found this assumption to hold in a variety of different contexts and we find reason to believe that it also applies to interviewing visitors. For example, in an experiment with American undergraduate students, Kemph and Kasser (1996) found that attitudes toward male homosexuality were significantly influenced by the (male) interviewers’ sexual orientation. Walker (1992) found that sex of interviewer (male vs. female) influenced the answers given on a measure of attitudes toward women. In a study of candidates in a job interview situation, Keenan (1976) found the nonverbal behavior of the interviewer to influence both how relaxed the respondents felt and the impressions they gave in the interview. Interviewer effects have also been studied in relation to race, and while Schuman and Converse (1971) found racial effects in personal interviews, Cotter et al. (1982) found the same in telephone interviews. However, the studies on racial effects in interviews provide some interesting details that are of importance to our study. First, Hatchett and Schuman (1975) argue that both races (Whites and Blacks) seem to constrain themselves from responses that might offend the other race while being more honest with interviewers from their own race. Second, the effects of race seem to be limited to issues that have to do with race, which is also what Cotter et al. (1982) found in their study. While race of interviewer had no effect on nonracial questions, they found that White respondents interviewed by a Black interviewer were more pro-Black than Whites interviewed by Whites.
These findings have some important implications for our study and for the way we conceive of the terms ‘interviewer effect’ and SDB. Drawing on the definitions and conceptualizations previously mentioned, we have used the terms interchangeably, which can be somewhat confusing. To clarify this, we will argue that when the response is biased due to a characteristic of the interviewer (sex, nationality, or race), we are faced with an interviewer effect. The question then is what this effect is. We believe that when the effect is respondents tailoring their answers to what is socially desirable in the current situation, the interviewer effect is an SDB. For example, assume that a group of respondents is given a survey on racism or attitudes toward charity giving. If respondents tailor their answers according to what is socially acceptable in general, it would be consistent with how Edwards (1957), Paulhus (1984), and Davis and Silver (2003) present the concept of SDB. However, it would not be an effect caused by the interviewer but rather by general social norms. Conversely, if answers to a survey are influenced by an interviewer’s unpleasant behavior, we are faced with an interviewer effect, but that does not necessarily mean that respondents answer in a socially desirable manner. Thus, we may have interviewer effects unrelated to SDB and we may have SDB not caused by the interviewer. Finally, we argue that when interviewer characteristics drive respondents to answer in a way that is socially desirable (e.g. trying to please the person doing the interview), we have a situation where the response bias at hand is both an interviewer effect and an SDB. This is similar to the aforementioned studies (Cotter et al., 1982; Hatchett and Schuman, 1975), where White (Black) respondents constrained themselves from offending Black (White) interviewers on race-related questions, whereas response differences disappeared when questions were unrelated to race.
Hence, here SDB is caused by the interviewers’ race, and the SDB and the interviewer effect are the same.
Transferred to our setting of tourism surveys, we believe that a similar pattern will emerge when the interviewer is associated with the target destination (country) but not when the interviewer is one of ‘the respondents’ own’. Obviously, we also argue that the effect is limited to questions addressing the target destination or country. For example, we believe that a Norwegian interviewer intercepting an Italian and asking questions about Norway will induce biased responses. However, on the same basis, as there are no race effects on questions unrelated to race, we see no reason why a Norwegian interviewer intercepting an Italian and asking questions about Hungary should have the answers contaminated by SDB interviewer effects. The remainder of this article describes the procedures designed to test these assumptions regarding tourism surveys.
Method
The subjects were Danish consumers who were intercepted in the city center of Aarhus and asked to participate in the study. By interviewing Danish consumers while they were in Denmark (instead of foreign tourists while in Norway), we achieved two important things. First, we reduced to a minimum the chance that responses were based on a rather recent or current experience with the destination in question, as we did not want empirical variance to be generated by respondents basing their answers on differences in recent service experiences. Second, interviewing Danish consumers in Denmark meant that respondents were on their ‘home turf’, which contributes to isolating the effect of the experimental manipulation (Churchill, 1995). Stated differently, in this procedure, the guest–host roles are somewhat weakened by taking the respondent out of the ‘guest’ situation, thereby weakening the strength of the manipulation. The reasoning here is that if an explanatory variable that is weaker in a controlled experimental setting than in the real world still yields significant results, it increases the probability of a real-world cause–effect relationship and thus the validity of the results.
The experimental manipulation was achieved by the respondents being interviewed either by a Danish-speaking interviewer or by a Norwegian-speaking interviewer. Both interviewers were women and they both intercepted participants at the same locations. However, they were placed far enough apart that a subject could not hear, or pay attention to, the interview taking place some distance away. In both conditions, subjects received a small booklet where they were initially exposed to an advertisement presenting Norway as a place to spend the holidays. After spending some time on the ad, subjects were asked to answer the questions measuring our dependent variables. First, one question was related to the ad itself and read ‘To what degree do you think this ad is representative for Norway as a holiday destination?’. The next two questions were related to Norway as a country and how Norwegians treat visitors. They read ‘To what degree is Norway an attractive holiday destination for you?’ and ‘To what degree do you find Norwegians accommodating toward tourists?’ All items were based on a 7-point normal category scale, anchored ‘to a very small degree’ and ‘to a very large degree’. These three items were deliberately designed so that the first concerns the ad, the second Norway as a country, and the third Norwegians as a people. Hence, if there is an effect due to social desirability bias, it seems intuitive that the effect should be stronger the more the object (ad, country, and people) can be associated with the interviewer. All items, response anchors, instructions, and ad claims were written in Danish.
The measurement items and procedure were first subject to a face-validity evaluation by a marketing researcher and then pretested on a group of 10 Danish citizens; these interviews were also conducted by the Danish or the Norwegian interviewer. No changes were made based on this undertaking.
The fact that our experimental manipulation is constructed with one interviewer being a native Dane and the other a native Norwegian might drive another kind of interviewer effect than the one we are interested in here. If one of the two is more polite, prettier, less confident, has a calmer voice, and so on, there might be biased results due to personal differences rather than differences in language/nationality. To control for this possibility, we introduced one more manipulation. The booklet alternated between focusing on Norway and Sweden, and which subjects were exposed to the Norwegian or the Swedish version of the treatment was completely randomized. That is, the booklets (including the ad) were identical, the only difference being the word Norway versus Sweden in the ad and the corresponding change of country in the questions. Both the Norwegian and the Swedish version of the ad are portrayed in Figure 1. In summary, the experimental design was a 2 (interviewer language) × 2 (target country) between-subjects factorial design, with 25 randomized subjects in each of the four experimental cells. The distribution of men and women in the total sample was 48% and 52%, respectively, and the mean age was 40 years. For the dependent variables, there were no significant mean differences between men and women in the sample.
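The allocation to the four cells can be sketched in a few lines of Python. The article does not describe how the randomization was carried out, so the approach below (each interviewer works through a shuffled stack of 25 Norway and 25 Sweden booklets) and the names `booklet_stacks`, `INTERVIEWERS`, and `COUNTRIES` are illustrative assumptions only.

```python
import random
from collections import Counter

# Labels for the two experimental factors (taken from the design description).
INTERVIEWERS = ("Danish", "Norwegian")
COUNTRIES = ("Norway", "Sweden")


def booklet_stacks(n_per_cell=25, seed=None):
    """Return one shuffled stack of booklets per interviewer.

    Assumption: each interviewer hands out booklets in the shuffled order,
    so the target country a subject sees does not depend on who is intercepted.
    """
    rng = random.Random(seed)
    stacks = {}
    for interviewer in INTERVIEWERS:
        stack = [country for country in COUNTRIES for _ in range(n_per_cell)]
        rng.shuffle(stack)
        stacks[interviewer] = stack
    return stacks


stacks = booklet_stacks(seed=1)
# Count subjects per (interviewer, target country) cell.
cell_counts = Counter(
    (interviewer, country)
    for interviewer, stack in stacks.items()
    for country in stack
)
print(cell_counts)
```

Shuffling fixed-size stacks guarantees the balanced 25-per-cell layout reported above, which simple coin-flip assignment would not.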

Figure 1. Experimental material: the ads employed.
To test the hypothesized effects, a multivariate analysis of variance (MANOVA) test was conducted with interviewer nationality (Norwegian/Danish) and target country (Norway/Sweden) as fixed factors, and (1) whether the ad is representative for the target country, (2) the country’s attractiveness, and (3) whether the inhabitants of the target country are accommodating toward tourists as dependent variables. Mean scores for the dependent variables are presented in Table 1, sorted by the experimental factors.
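To make the logic of the factorial test concrete, the following sketch computes the F ratios for the two main effects and the interaction in a balanced 2 × 2 between-subjects design. It covers only a univariate test for a single dependent variable, not the full multivariate procedure, and the scores are made-up toy numbers (three per cell), not data from the study; the function name `two_way_anova` is likewise hypothetical.

```python
from statistics import mean

# Hypothetical 7-point ratings for ONE dependent variable. NOT study data.
scores = {
    ("Norwegian", "Norway"): [6, 7, 5],
    ("Norwegian", "Sweden"): [4, 5, 3],
    ("Danish", "Norway"): [4, 3, 5],
    ("Danish", "Sweden"): [4, 5, 3],
}


def two_way_anova(cells):
    """F ratios for a balanced 2 x 2 between-subjects design."""
    a_levels = sorted({a for a, _ in cells})  # interviewer nationality
    b_levels = sorted({b for _, b in cells})  # target country
    n = len(next(iter(cells.values())))       # subjects per cell (balanced)
    grand = mean(x for xs in cells.values() for x in xs)
    cell_mean = {key: mean(xs) for key, xs in cells.items()}
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )
    ss_error = sum(
        (x - cell_mean[key]) ** 2 for key, xs in cells.items() for x in xs
    )
    df_error = len(cells) * (n - 1)
    ms_error = ss_error / df_error
    # Each effect has 2 - 1 = 1 degree of freedom, so its mean square equals its SS.
    return {
        "interviewer": ss_a / ms_error,
        "country": ss_b / ms_error,
        "interaction": ss_ab / ms_error,
        "df_error": df_error,
    }


result = two_way_anova(scores)
print(result)
```

In the toy data only the matched cell (Norwegian interviewer, Norway as target) is elevated, which is the pattern the design is meant to detect: it feeds into both main effects and the interaction.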
Table 1. Means and standard deviations for dependent variables, sorted by main experimental factors.
The results of the MANOVA showed a significant main effect of interviewer nationality on the representativeness of the ad (F = 7.38, p = 0.008), the attractiveness of the target country (F = 10.199, p = 0.002), and the perception of inhabitants being accommodating toward tourists (F = 9.552, p = 0.003). As for the effects of target country, significant differences were found for the representativeness of the ad (F = 3.97, p = 0.049) and the country’s attractiveness (F = 4.068, p = 0.046). However, whether inhabitants are accommodating toward tourists did not differ between the two target countries (F = 0.106, p = 0.746).
The analyses returned a significant interaction effect between interviewer nationality and target country on one of the three dependent variables, with F value (p value) of 5.53 (0.021) for attractiveness. There was no significant interaction effect on either ad representativeness or being accommodating toward tourists (i.e. F values (p values) were 2.65 (0.106) and 3.202 (0.077), respectively). All significance tests are presented in Table 2.
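As a rough plausibility check, the reported p values can be recomputed from the reported F values. The article does not state the degrees of freedom; the sketch below assumes 1 and 96 (100 subjects in four cells, no missing responses), and uses the fact that an F statistic with one numerator degree of freedom is the square of a t statistic. The helper names `t_pdf` and `p_from_f` are hypothetical, and the numerical integration is a simple stand-in for a statistics library.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def p_from_f(f_value, df_error, steps=2000):
    """p value of F(1, df_error), via F = t**2 and Simpson's rule (steps must be even)."""
    t = math.sqrt(f_value)
    h = t / steps
    total = t_pdf(0.0, df_error) + t_pdf(t, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    area = total * h / 3  # P(0 < T < t)
    return 1.0 - 2.0 * area  # two-sided tail probability


DF_ERROR = 96  # ASSUMPTION: 100 subjects minus 4 cells; not stated in the article

# (F, reported p) pairs as given in the text.
reported = {
    "interviewer / ad": (7.38, 0.008),
    "interviewer / attractiveness": (10.199, 0.002),
    "interviewer / accommodating": (9.552, 0.003),
    "country / ad": (3.97, 0.049),
    "country / attractiveness": (4.068, 0.046),
    "country / accommodating": (0.106, 0.746),
    "interaction / attractiveness": (5.53, 0.021),
    "interaction / ad": (2.65, 0.106),
    "interaction / accommodating": (3.202, 0.077),
}

recomputed = {name: p_from_f(f, DF_ERROR) for name, (f, _) in reported.items()}
for name, (f, p) in reported.items():
    print(f"{name}: F = {f}, reported p = {p}, recomputed p = {recomputed[name]:.4f}")
```

If the assumed degrees of freedom are right, each recomputed value should sit close to the reported one, apart from rounding.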
Table 2. MANOVA results for main and interaction effects.
MANOVA: multivariate analysis of variance.
a F value.
*Significant at p ≤ 0.05.
In addition to being significant, the results were also in the expected direction. As can be seen from the marginal means portrayed in Figures 2 to 4, the effect of interviewer nationality exceeds the effect of target country on all three dependent variables. Recall from the p values presented above that the effects of target country are either not significant or only marginally below the significance criterion (0.049 and 0.046). This implies that the effect of the interviewer materializes primarily when there is a match between her nationality and the target country. The results are further discussed in the succeeding paragraphs.

Figure 2. Estimated marginal means of ad representativeness.

Figure 3. Estimated marginal means of country attractiveness.

Figure 4. Estimated marginal means of accommodating toward tourists.
Discussion
The manipulations of the interviewer situation produced the expected results, and before turning to the SDB issues we would first like to present some thoughts on the control conditions. First, the main effects of the different countries used show that regardless of interviewer nationality, subjects find the ad more representative for Norway than for Sweden and they also perceive Norway to be a more attractive holiday destination. These are differences in marginal means, that is, they are calculated across the two interviewers. The fact that there is no significant difference in the perception of how accommodating Norwegians and Swedes are toward tourists serves as an indication that SDB disappears when questions are not related to the interviewer’s nationality. As can be seen in Figures 2 to 4, the scores for Sweden (lower line) are not very different between the subjects interviewed by the Norwegian and Danish interviewer.
The real difference in responses occurs when the subjects are asked to judge the degree to which the ad represents Norway, whether Norway is an attractive holiday destination, and whether Norwegians are accommodating toward tourists. The major conclusion from our analyses is that subjects interviewed by a person from the country they are asked to evaluate adjust their responses upward, toward what they think would satisfy the interviewer. Hence, the interviewer effect is clearly present. In fact, looking at Figures 2 to 4, we see that the mean score for all dependent variables is equal to, or lower, for Norway than for Sweden when the interviewer is Danish. When the interviewer is Norwegian, however, the scores are substantially higher on all three variables, implying that the data collected are ‘faked good’, a tendency referred to as a ‘lying factor’ (King and Bruner, 2000). These results extend the current knowledge on SDB as they show that the effect is present also when the object under study is a country or destination and when the interviewer is attached to the country in question. While previous research has found similar effects from interviewer sexual orientation, race, sex, and nonverbal behavior, to name a few, country of origin is a concept that is one step farther away from the individual than his or her sex, race, and sexual orientation. The fact that we find SDB also for this research object implies that the effect might be operating in more extensive contexts than often assumed. Moreover, the fact that we also find it to influence the perceptions of an ad, which the interviewer did not make herself, shows that the connection between content of the ad and origin of the interviewer is ‘enough’ to produce a bias.
For marketing research firms, and users of marketing research, our results hold some important practical implications. First, and simplest, humans still lie. It seems more important to be nice than honest, or to be socially or politically correct rather than risk offending someone. While lying is viewed negatively in interpersonal relationships and interactions between humans, the farther away we get from personal friends or acquaintances, the easier it is to let the end justify the means. In other words, the simpler it is to lie to comfort or satisfy someone else. In an interview situation where respondents do not know the interviewer, lying has no other personal consequence than being perceived as nicer and more polite. Hence, an implicit motivation to be perceived as nice, and not to offend or hurt the interviewer’s feelings, drives us toward social desirability (SD)-biased responses. To make sure data collected are not hampered by SDB effects, marketing researchers should consider the research context in terms of the respondents’ probability of ‘faking good’ or ‘faking bad’ (King and Bruner, 2000) and adopt a baseline assumption that respondents will lie if it is socially desirable. However, we may question whether the lying factor is a result of conscious lying as a means to be perceived as a nice person or whether we unconsciously adapt our responses to the current situation. Strictly speaking, we may discuss whether the latter is actually lying, as the unconscious kind of SDB is not based on a cost-benefit kind of analysis. We suggest this as a future pathway for research into the more detailed mechanisms of SDB in tourism settings.
Second, because consumers adapt to the situation when they believe a faked response is more desirable than an honest response, researchers should be careful when assigning data collection to their employees or hired interviewers, taking into account the measures to be used, the sample to be interviewed, and the characteristics of the interviewer. If the survey is measuring the level of satisfaction among international visitors to a skiing resort, it is fairly easy to make sure the interviewer is not a local or maybe not even from the same country. International visitors seldom speak the local language, so replacing the locally born interviewer should be unproblematic.
Third, there are procedures that can be applied to test for SDB in data collected in interviews. For example, there are different SD scales that can be included in the survey to measure the level of SD responses; examples are the Marlowe–Crowne Social Desirability Scale (Crowne and Marlowe, 1960, 1964) and the same battery in short form (e.g. Reynolds, 1982). The idea behind these scales is that a low intercorrelation between the SDB scales and the target scale (e.g. attitudes) indicates data not confounded with SDB (King and Bruner, 2000). Hence, there are tools available to marketing and tourism researchers, and when faced with a project where SDB-troubled data are considered possible, measures should be taken to deal with the potential problem. Our research has shown that SDB effects are found even in measures of a concept as uncontroversial as attitudes toward an ad, and market research companies and tourism managers should strive toward basing their suggestions and activities on data that are as ‘clean’ as possible. We think our research has yet again underscored the importance of carefully thought-through research designs, even down to a detailed level specifying who interviews whom about what.
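The intercorrelation check described above can be sketched as a plain Pearson correlation between each respondent’s SD scale score and his or her score on the target scale. The scores below are made-up toy values, not study data, and the function name `pearson_r` is hypothetical; what counts as a ‘low’ correlation is a judgment the article does not specify.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one scale has no variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-respondent scores. NOT study data.
sd_scale = [2, 4, 6, 8]   # summed score on a social desirability scale
attitude = [5, 3, 6, 4]   # score on the target scale (e.g. attitude toward the ad)

r = pearson_r(sd_scale, attitude)
print(f"r = {r:.2f}")
```

A correlation near zero, as in this toy example, would be read as a sign that the target scale is not confounded with SD responding; a clearly nonzero correlation would call for closer inspection.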
Footnotes
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
