Abstract
This article presents a qualitative systematic review of sex-of-interviewer effects on survey responses on different topics. Overall, we identified 90 scientific studies including 100 datasets between 1962 and 2022. We find that sex-of-interviewer effects are present but also that the topical scope matters. Unsurprisingly, sensitive questions relating to topics such as gender, health, or sexuality are more susceptible to sex-of-interviewer effects than factual items such as demographic questions. We conclude that interviewer sex deserves further attention in scientific studies, as interviewers may affect the survey estimate. We also note that survey reporting on methodological specifications needs to be more transparent to allow the quality of the survey data collection to be evaluated. Our findings have implications for researchers working with interviewer-administered survey data, as they should consider that interviewer sex can influence survey estimates and, as a result, may wish to control for potential effects.
Introduction
Survey data quality is a major concern for all stakeholders aiming to make inferences based on survey data. The Total Survey Error framework (TSE, e.g., Groves et al., 2004; Groves & Lyberg, 2010; Weisberg, 2009; Andersen et al., 1979) suggests that amongst other aspects, interviewer error may affect the overall quality of the survey estimate by inducing measurement error (but see e.g., West & Kreuter, 2015, for the impact of interviewers on nonresponse adjustments and representation error). As such, studying interviewer effects has been one focus in survey methodology and prior research has confirmed the existence of interviewer effects in interviewer-administered data collections (Beullens & Loosveldt, 2016; Davis et al., 2010; Fowler & Mangione, 1990; Groves, 1989, 2004; Loosveldt, 2008; Sturgis, 2014; West & Blom, 2017). Looking specifically at the increasing number of studies on sex-of-interviewer effects, a systematic review of these influences seems useful to understand the overall presence and influence of potential interviewer error. Moreover, systematically reviewing this literature may also reveal what other survey characteristics might influence sex-of-interviewer effects. Previous attempts to analyze the literature on sex-of-interviewer effects are limited, as they followed a narrative review approach (Davis et al., 2010; West & Blom, 2017).
Thus, the goal of this study is to examine sex-of-interviewer effects on the survey response by means of a systematic review, methodically including all eligible studies that fulfill rigorous inclusion and exclusion criteria. In other words, while previous literature reviews included a sample of relevant research in this topic, this review aims to assess the population of studies on the topic of interviewer sex effects on responses. The present work summarizes the modal findings qualitatively and neither offers computations nor standardizations of effect sizes related to interviewer sex error. Further, this article also looks at core study characteristics, including but not limited to sampling strategies, sample sizes, number and sex of interviewers, and sex-of-interviewer—sex-of-respondent matching. Finally, we also account for the topical scope of each study and dataset to identify content areas that may be more prone to sex-of-interviewer effects.
The article is structured as follows: we begin by building a theoretical framework by evaluating the role of interviewer effects within the TSE framework. Next, we discuss the design of our systematic review. Subsequently, we present the results of our analyses organized by looking at the overall study characteristics, first, and reviewing sex-of-interviewer effects organized by topic next. We close with concluding remarks as well as implications of this research for survey methodology, in general, but also for researchers, practitioners, and other stakeholders working with interviewer-administered survey data.
Interviewer Effects and the TSE Framework
The TSE framework (Andersen et al., 1979; Groves et al., 2004; Groves & Lyberg, 2010; Weisberg, 2009) posits that errors can occur at every stage of the survey process and affect the quality of the survey estimate. In particular, it distinguishes representation from measurement error. While the former refers to error induced especially by sampling strategies, the latter focuses on all aspects that affect the questionnaire design and survey response. Interviewer effects predominantly manifest themselves in the measurement error strand, which can be explained by looking at the response process model (e.g., Tourangeau & Rasinski, 1988) with four cognitive processing stages: comprehension of the survey question, retrieval of relevant information, use of that information to make the required judgments, and mapping the answer onto response categories for reporting. Prior research suggested that survey interviewers may affect the survey response (Hox et al., 2004) in three ways: 1) their socio-demographic characteristics may (un)intentionally trigger certain memories or responses in respondents; 2) interviewers’ personality traits and social skills may be used more or less successfully to elicit answers; and 3) interviewer opinions and attitudes may shift respondent answers in particular directions. While the standardized survey protocol discourages interviewers from voicing opinions or attitudes, we acknowledge that conscious or subconscious human error may occur, and interviewers do deviate from the survey protocol. This might further depend on the type of questions asked (e.g., Schnell & Kreuter, 2005). Personality and social skills should be important to convince the prospective survey respondents to take part in the survey interview but should play a subordinate role in inducing measurement error.
While we acknowledge that interviewers with certain personality traits and social skills may create a more comfortable survey situation in general, the latter two mechanisms are difficult to measure and, thus, beyond the scope of our study. Instead, the focus of this research is on interviewers’ sex as one of many socio-demographic characteristics that have been found to be influential.
Social attribution theory posits that respondents may amend their answers in response to notable interviewer characteristics, such as their sex (Fendrich et al., 1999; Heeb & Gmel, 2001). One underlying mechanism might be social desirability bias, i.e., respondents’ tendency to overreport desirable behavior and attitudes and to underreport undesirable ones (e.g., Krumpal, 2013; Phillips & Clancy, 1972; Tourangeau et al., 2000) as a reaction to the interviewer’s sex and thus provide responses that they believe the interviewer may expect, prefer, or tolerate (Stojmenovska & Steinmetz, 2017). For instance, we may expect an impact of interviewer sex on survey responses when respondents use cues, such as social norms, values, beliefs or stereotypes (Fendrich et al., 1999; Heeb & Gmel, 2001). To give an example, if asked about their attitudes on gender equality, respondents might report more progressive views to women in comparison to men. This might occur because of image management (e.g. to appear likeable) or adhering to social etiquette (e.g. to avoid insulting or upsetting the female interviewer).
Prior studies have found that formal question characteristics may offer interviewers some leeway to influence the survey response (Fellegi, 1964; Hyman, 1954; Schnell & Kreuter, 2005): for instance, open-ended, attitudinal, difficult (Mangione et al., 1992) or sensitive questions may be more prone to interviewer effects compared to close-ended, factual, easy or non-sensitive questions (Schnell & Kreuter, 2005). While our systematic review does not break down each study to individual questions, we apply these ideas to wider survey topics. For instance, we may expect that a certain topical scope, such as health, sexuality, or gender attitudes, may be more susceptible to sex-of-interviewer effects compared to factual items, e.g., demographics.
Study Design
Selection Strategy
We follow the guidelines by Moher et al. (2009) on the Preferred Reporting of Items for Systematic Reviews and Meta-Analyses (PRISMA) and present a flowchart in Figure 1 outlining our study selection strategy.
Figure 1. Simplified PRISMA Chart for Sex-of-Interviewer Effects on the Survey Response
Study Selection Criteria
We used an inclusive search string to search bibliographic databases (e.g., Web of Science, EBSCOhost, and JSTOR). The search also included grey literature websites and portals, as well as a manual collection based on prior review articles, studies, and conference programs. In addition, a call for published or unpublished studies, initiated by the authors and distributed through relevant mailing lists, helped us find additional literature. By January 2022, we had identified n_s = 1,670 studies; n_s = 456 were duplicate records, which were removed from our database, leaving us with n_s = 1,214 studies for screening.
At the screening stage, we applied clearly defined eligibility criteria summarized in Table 1 (middle row). We focused on the use of real survey data, as opposed to simulated, fabricated or fraudulent data, that were collected using an interviewer-administered mode and had an individual as the unit of analysis. The respective study also had to include both female and male interviewers. After screening, we were left with n_s = 169 studies that had to be inspected more closely for eligibility.
To determine eligibility, we further refined our selection criteria and inspected whether the studies reported findings on sex-of-interviewer effects on a survey item or item battery (see Table 1, bottom row). We included only those studies in which the respondent gave a survey response and in which at least one numerical outcome was reported, to ensure that studies had sufficient focus and analysis on interviewer sex effects. This means that null findings were also included in the review, as long as the authors reported statistical results for the non-significant findings. This strategy left us with n_s = 90 eligible studies that included n_d = 100 datasets for the systematic review.
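The selection counts reported above can be tallied stage by stage as a quick consistency check. The following is a minimal sketch, not part of the original study: the function name `selection_flow` and the stage labels are illustrative assumptions, and only the four counts (1,670 identified, 456 duplicates, 169 after screening, 90 included) come from the text.

```python
def selection_flow(identified, duplicates, after_screening, included):
    """Return (stage, records, share_of_previous_stage) tuples for a simple flow.

    Stage labels are illustrative; only the counts passed in come from the text.
    """
    screened = identified - duplicates  # records left after removing duplicates
    counts = [
        ("identified", identified),
        ("screened", screened),
        ("assessed for eligibility", after_screening),
        ("included", included),
    ]
    flow = []
    previous = None
    for stage, count in counts:
        # A later stage can never contain more records than the one before it.
        if previous is not None and count > previous:
            raise ValueError(f"stage '{stage}' has more records than the previous stage")
        share = 1.0 if previous is None else count / previous
        flow.append((stage, count, share))
        previous = count
    return flow


flow = selection_flow(identified=1670, duplicates=456, after_screening=169, included=90)
for stage, count, share in flow:
    print(f"{stage:<26}{count:>6}  ({share:.1%} of previous stage)")
```

The sketch only reproduces the arithmetic of the flow; it says nothing about why individual records were excluded, which is governed by the criteria in Table 1.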
Coding
The coding scheme was developed in an iterative process and included potential moderators and/or mediators to explore what, if anything, influenced the relationship between interviewer sex and respondents’ answers to survey items. We coded the following study characteristics: (1) whether the focus of the study was methodological, (2) the survey design (cross-sectional or longitudinal), (3) the main sampling strategy (probability, non-probability, mixed), (4) whether an interpenetrated design to disentangle spatial and interviewer effects was employed (yes/no), (5) the population (general or special), (6) sample sizes, and (7) the mode of data collection (CAPI/PAPI, CATI, CAWI, mixed). Moreover, we coded (8) the total number of interviewers, the number of (9) male and (10) female interviewers, (11) the sex of respondents (male/female/both), and (12) whether interviewer and respondent sex matching took place. We also coded (13) the topical scope of the survey and identified ten wider themes: health, social (attitudes/behavior), family (including household matters), sexuality, crime, gender (including equality issues and feminism), work (including organizational or business matters), politics (including economic issues), demographics, and other. Finally, we coded (14) the regional reach of the studies (Africa, Australasia, Europe, Latin America, North America). Coding was completed by two independent coders, and the intercoder reliability, after resolving discrepancies between coders, was moderate to high (Cronbach’s α = 0.99; Cohen’s κ = 0.72). Coding discrepancies between the two coders were resolved by discussion.
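For readers unfamiliar with Cohen’s κ, the statistic compares the observed agreement between two coders with the agreement expected by chance from each coder’s marginal category frequencies. The following is a minimal sketch under stated assumptions: the function `cohens_kappa` is written here for illustration, and the two short lists of sampling-strategy codes are hypothetical, not the study’s coding data.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one category per item."""
    if len(codes_a) != len(codes_b) or len(codes_a) == 0:
        raise ValueError("both coders must rate the same non-empty set of items")
    n = len(codes_a)
    # Observed agreement: share of items on which the coders chose the same category.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over categories.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for the sampling strategy of eight studies (illustration only).
coder_a = ["PR", "PR", "NPR", "Mix", "PR", "NPR", "PR", "NPR"]
coder_b = ["PR", "PR", "NPR", "PR", "PR", "NPR", "NPR", "NPR"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"Cohen's kappa = {kappa:.2f}")
```

Because κ discounts chance agreement, it is typically lower than the raw share of matching codes, which is one reason a κ of 0.72 can sit alongside a much higher α.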
Analysis
The systematic review begins with a discussion of the overall number of studies examining sex-of-interviewer effects and its variation across time. Next, we quantitatively describe the results of the coding exercise. We focus on the number of studies (n_s) and also include the number of datasets (n_d), given that some studies included more than one dataset. A qualitative review of sex-of-interviewer effects on the survey response is presented subsequently, organized by topical area.
Results: A Systematic Review of Sex-of-Interviewer Effects on the Survey Response
General Observations
First, we discuss general observations about our sample and the number of studies on sex-of-interviewer effects. Figure 2 shows the number of studies in our sample by year since the initial and ground-breaking work on interviewer effects by Kish (1962). Overall, we found 90 studies (n_s) including 100 datasets (n_d) that related to sex-of-interviewer effects during our field period. The earliest eligible study on sex-of-interviewer effects was published by Colombotos et al. (1968); the most recent studies in our field period are Johann and Mayer (2021), Martin (2021), and Nillesen et al. (2021).
Figure 2. Number of Studies by Year
Figure 2 further demonstrates that only a few studies explored the sex of interviewers until the mid-2000s, when they became more common. Push-to-web initiatives (e.g., Carpenter & Burton, 2018; Dillman, 2017; Lynn, 2020) may explain the potential pause in sex-of-interviewer studies, as web surveys rarely rely on interviewers. This was likely exacerbated by the Covid-19 pandemic, which halted many face-to-face survey data collections. Next, we move on to discussing the overall study characteristics.
Study Characteristics
Systematic Review of Sex-of-Interviewer Research
Note. Y refers to yes, N to no, N/A to not available.
PR refers to probability sampling strategy, NPR to non-probability sampling, Mix to a mixed sampling strategy combining probability and non-probability methods.
CAPI stands for Computer-Assisted-Personal-Interviewing, PAPI for Paper-and-Pencil-Interviewing, CATI for Computer-Assisted-Telephone-Interviewing including mobile phones, CAWI for Computer-Assisted-Web-Interviewing with virtual interviewers, Mix to mixed-mode designs.
No refers to number.
AFR refers to Africa, AASIA includes Australasian countries, EUR refers to Europe including Switzerland and Norway, LATAM to Latin and Middle America, NORAM to North America.
To get a better idea of the topical scope of the reported survey results on sex-of-interviewer effects, we coded up to five topics for each study in our sample. Figure 3 presents an overview of the number of topics (n_t) found across datasets (n_d) and studies (n_s). Note that the number of topics (n_t = 164) exceeds the overall number of datasets (n_d = 100) and studies (n_s = 90), as a single survey dataset could include several topics. Overall, we found that the largest number of datasets focused on sexuality (n_t = 29), followed by gender (n_t = 28) and health (n_t = 26). Further, n_t = 20 datasets covered politics, n_t = 19 social issues, and n_t = 18 demographics. Fewer datasets focused on family (n_t = 7), crime (n_t = 6), work (n_t = 6), or other topics (n_t = 5).
Figure 3. Number of Survey Topics Across Datasets. Note that Datasets/Studies Could Include Multiple Topics, i.e., n_t = 164, n_d = 100, n_s = 90
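The topic counts reported above can be tallied to confirm that they add up to the stated total of 164 topic codes. The following is a minimal sketch, not part of the original analysis: the dictionary name `topic_counts` is an illustrative assumption, and the counts are copied from the text.

```python
# Topic counts as reported in the text (number of datasets coded with each topic).
topic_counts = {
    "sexuality": 29,
    "gender": 28,
    "health": 26,
    "politics": 20,
    "social": 19,
    "demographics": 18,
    "family": 7,
    "crime": 6,
    "work": 6,
    "other": 5,
}

total_topics = sum(topic_counts.values())
n_datasets = 100  # n_d reported in the text

# Print topics from most to least frequent with their share of all topic codes.
for topic, count in sorted(topic_counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{topic:<13}{count:>4}  {count / total_topics:.1%}")

print(f"total topic codes: {total_topics}")
print(f"average topics per dataset: {total_topics / n_datasets:.2f}")
```

Since a dataset can carry up to five topic codes, the shares are proportions of topic codes, not of datasets.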
Qualitative Review of Sex-of-Interviewer Effects on the Survey Response
We qualitatively reviewed the n_s = 90 eligible studies (n_d = 100) regarding sex-of-interviewer effects on the survey response. Acknowledging that the topical scope is relevant to the presence of such effects, we organize our review by topic. Where possible, we identify the over-arching or most typical or systematic outcome for the critical topics, but we do not focus on the magnitude of sex-of-interviewer effects. This would require standardizing, converting and quantitatively summarizing effect sizes as part of a meta-analysis, which is beyond the scope of this study. We highlight core findings by topical area in bold.
Sexuality
We would expect surveys on topics relating to sexuality to be more susceptible to sex-of-interviewer effects than other topics, with the modal finding that response patterns differ between same-sex and opposite-sex pairs. This is especially the case for female respondents, suggesting that homophily (Frankel, 2016), group membership (Schneider, 2008), in-group loyalty (Stojmenovska & Steinmetz, 2017) or self-disclosure theory (Catania et al., 1996, p. 110) play a part in the direction of answers. Generally, women may feel more comfortable disclosing their intimate experiences to other women; however, in many instances the evidence is too mixed to draw firm conclusions, as the reported sex-of-interviewer effects sometimes differ in the direction of bias.
Galla et al. (1981) find that female interviewers elicited more non-traditional responses compared to their male counterparts. Research by Darrow et al. (1986) indicated that female researchers recorded reports of men engaging in homosexual behavior more frequently, while male researchers recorded engagement in oral sex more often. Both studies relied on relatively small sample sizes (e.g., #10, #14). Axinn (1991) suggested that interviewer sex played a role in fertility reports, where female interviewers gathered significantly more progressive responses on contraceptive use and desire for a smaller family size. However, this study did not report the actual number of male and female interviewers (e.g., #2). Becker et al. (1995) find that women refused to answer questions about sexual intercourse more frequently when a male interviewer conducted the survey, but also that they were less likely to respond to female interviewers on the use of contraceptives, arguing that women might be worried about female gossip. It is worth noting that the number of female and male interviewers is uneven (e.g., #30).
Regarding sexual coercion and abuse, some studies find that survey respondents are more likely to disclose coercion and abuse to male interviewers (Chun et al., 2011; Dailey & Claus, 2001). However, the latter study in particular, while relying on large case numbers, recruited respondents using a non-probability sampling strategy and also included a much larger number of female interviewers (e.g., #38). Other research suggests that female interviewers are more likely to elicit higher reports of condom use (Kianersi et al., 2020; McCombie & Anarfi, 2002). The latter study, though, relies on smaller case numbers and only two male and two female interviewers (e.g., #86). Ahn et al. (2005) find that men exaggerate their erectile function when interviewed by a male interviewer. However, their study relied on a small sample size and involved only one male and one female interviewer (e.g., #46). In addition, a study of homosexual men using ACASI indicates that a male voice yielded fewer reports of unprotected sexual intercourse but elicited higher reports of HIV-negative partners (Fahrney et al., 2010). Surveys on sexual practices have also produced null findings for potential sex-of-interviewer differences (Blanc & Croft, 1992; Catania et al., 1996; Chitwood, 1988; DeLamater, 1974; Johnson & DeLamater, 1976; Johnson & Moore, 1993). However, some studies relied on non-probability methods, uneven numbers of male and female interviewers, or did not transparently report the number of interviewers or their sex (e.g., #4, #7, #16).
Gender
Given the topical scope, we might expect reporting on topics relating to gender to be generally more prone to sex-of-interviewer effects. The modal finding in the literature indicates that male interviewers elicit more traditional responses and female interviewers more liberal, progressive or feminist views on gender issues (Ballou & Del Boca, 1980; Benstead, 2010; Flores-Macias & Lawson, 2008; Galla et al., 1981; Huddy et al., 1997; Kane & Macaulay, 1993; Kappelhof, 2014; Lipps & Lutz, 2016; Stojmenovska & Steinmetz, 2017; Wang et al., 2010). Some studies display small or uneven numbers of male and female interviewers (e.g., #9, #75), do not report the number of interviewers or their sexes (e.g., #68, #79, #56), rely on non-probability samples or mixed sampling designs combining probability with non-probability methods (e.g., #10, #48), or have small sample sizes (e.g., #32). Similar sex-of-interviewer effects are also found regarding attitudes towards the gender pay gap (Walzenbach et al., 2017) or the use of gender-exclusive language (Wang et al., 2010). Neither study allowed tracking the number of interviewers by sex. The sex-of-interviewer effect also seems to be present when employing the same male and female virtual interviewers for all respondents (Tourangeau et al., 2003). However, there appears to be variation regarding the wider topic area of gender: some scholars find that male interviewers may trigger more liberal views (Landis et al., 1973), although admittedly this research relied on a very small sample size and only one male and one female interviewer (e.g., #3). Others suggest that more progressive attitudes are revealed to female interviewers (Walker, 1992), but it appears that a larger number of women were also employed as interviewers (e.g., #24).
Health
We found mixed results about sex-of-interviewer effects on health-related survey items, but by and large the modal finding suggests that respondents state better health conditions when interviewed by men (Chun et al., 2011; Lipps & Lutz, 2016; Moum, 1998), while other research indicates that respondents seem more prone to admit to poor health to female interviewers (Krol et al., 2011; Pollner, 1998). We may speculate that respondents want to look good in front of male interviewers and feel more comfortable revealing poor health to women, who might presumably be seen as more caring. It is noteworthy that some studies rely on poor sample quality or small samples (e.g., #59, #60), have an uneven distribution of male and female interviewers (e.g., #59, #75), or lack information about the interviewers (see e.g., #33).
Looking at substance abuse, research indicates that respondents are more likely to disclose this behavior to male interviewers than to female interviewers. Findings to this effect have been reported for alcohol abuse (Chun et al., 2011; Cosper, 1972; Johnson & Parsons, 1994) and drug abuse (Johnson & Parsons, 1994). While the above studies claim probability sampling strategies, the case numbers are low, with a few hundred participants from special populations (e.g., #2, #60, #26). Moreover, it seems that the number of male and female interviewers is uneven (e.g., #2, #26) or completely unknown (e.g., #60). Studies specifically on drug abuse suggest the opposite effect, i.e., respondents are more likely to report substance abuse to female interviewers (Darrow et al., 1986; Pollner, 1998). The former relies on a non-probability sample, small case numbers and a small number of interviewers unevenly distributed across sexes (e.g., #14). Another two recent studies do not find any sex-of-interviewer differences in reporting substance abuse (Heeb & Gmel, 2001; Johnson et al., 2000). While the former does not report the number and sex distribution of the interviewers, it is noteworthy that the latter employed twice as many female interviewers (e.g., #40, #37).
Research on mental health suggests that respondents report higher levels of subjective anxiety to men (Pollner, 1998; Waters, 1975). Admittedly, the latter study relies on very small case numbers sampled using non-probability methods (e.g., #6). In addition, a study on the willingness to pay higher fees for healthcare products or services (Ngongo et al., 2015) indicates that a female interviewer elicits a greater willingness to pay higher prices. Again, this research relies on one female and one male interviewer only (e.g., #69). Finally, null findings regarding sex-of-interviewer effects are found on restraint and disinhibited eating (Eisinga et al., 2011).
Politics
Overall, we also identify sex-of-interviewer effects on political topics, although the direction of the effect is mixed and does not present clear modal findings. For instance, similarly to the results on gender issues, Ballou and Del Boca (1980) find that female interviewers elicit more progressive responses on a question about the women’s movement, and research by Kane and Macaulay (1993) indicates that female interviewers trigger more egalitarian and critical responses on inequality. Callegaro et al. (2005) find that white female interviewers trigger more “don’t know” responses, but also that male African-American interviewers elicit a higher willingness to vote for the white, male political candidate than female African-American interviewers. The former finding is also confirmed by research conducted by Schneider (2008). Work by Nguyen (2018) found that female interviewers more frequently elicit respondents’ admissions of having taken a loan in the last 12 months but also higher trust in financial institutions; male interviewers seem to trigger higher reported prevalence of village meetings in the last 12 months. Johann and Mayer’s study (2021) indicates that respondents interviewed by (highly educated) men reveal a higher level of political knowledge. While these studies vary in the mode of data collection, most employ high-quality sampling strategies, decent sample sizes, and even numbers of female and male interviewers. Only one of the studies employed a larger number of female interviewers and a smaller sample in one of the datasets (e.g., #9). Using virtual interviewers, Tourangeau et al. (2003) suggest that respondents more often report having turned out in elections when the virtual interviewer is male. Null effects are found in studies on ideology and political leaders (Benstead & Malouche, 2015; DeLamater, 1974; Lipps & Lutz, 2016; Schneider, 2008) as well as finance (Crossley et al., 2021).
Social Matters
Within the topical area of social matters, sex-of-interviewer effects also matter. The modal finding across several studies indicates that respondents report more progressive social views when the interviewer is female (Ballou & Del Boca, 1980; Dailey & Claus, 2001; Nguyen, 2018; Pearson, 1982; Walzenbach et al., 2017). It also appears that, despite the reported random assignment of respondents to interviewers, male interviewers interview more male respondents, fewer respondents over age 60, fewer widowed respondents, and more respondents who are working or have higher total family incomes. Benstead (2014) finds that the sex of interviewers and religiosity interact: among religious respondents, female secular interviewers received fewer religious responses compared to men. Furthermore, religious respondents are less likely to say religion is very important when interviewed by a secular female interviewer. Moreover, respondents state a higher willingness to pay for an environmental good when interviewed by female interviewers compared to male interviewers (Gong & Aadland, 2011). Finally, Nass et al. (2003) show that respondents are more likely to reveal sensitive social behavior when prompted by a female voice. However, Tourangeau et al. (2003) suggest that virtual interviewers did not display sex-of-interviewer effects on embarrassing behavior. Admittedly, some studies suffer from poor design characteristics: for instance, unevenly distributed numbers of female and male interviewers (e.g., #13, #9, #38, #58), a lack of information on interviewers (e.g., #80), or small sample sizes (e.g., #11).
Demographics
Sex-of-interviewer effects do not seem to be commonly present in demographic questions; the modal finding is a null effect, suggesting that more factual questions do not elicit sex-of-interviewer effects. Only Haber et al. (2018) show that female interviewers increased the odds of reporting asset ownership and household size. Most studies find no substantive sex-of-interviewer effects in respondents’ self-reported educational level (Groves & Fultz, 1985; Oreffice & Quintana-Domeque, 2016), race (Groves & Fultz, 1985), anthropometric measures referring to proportions of the human body (Oreffice & Quintana-Domeque, 2016), or other socio-demographic attributes including age, occupation, religion, income, or union membership (Hutchinson & Wegge, 1991). Most of these studies rely on quality samples and larger case numbers. Although socio-demographic topics in these manuscripts vary extensively, null findings on sex-of-interviewer effects may not be surprising, given that these are factual questions which may be less prone to misreporting to either sex.
Family
Within the topic area of family and household, we identified few sex-of-interviewer effects, with the modal finding indicating that female interviewers received more pro-family or candid responses related to respondents’ household or domestic lives. For instance, female interviewers have been found to elicit higher reported rates of child deaths in Nigeria (Becker et al., 1995) but also more pro-marriage attitudes than male interviewers (Liu & Stainback, 2013). In a related finding, research shows that female interviewers are associated with systematically less traditional attitudes towards gender roles in the household amongst Turkish respondents (Kappelhof, 2014). Stojmenovska and Steinmetz (2017) also find small but significant sex-of-interviewer effects on household items. The latter two studies do not provide details about the total number of interviewers or their split by sex (e.g., #68, #79). However, null findings on sex-of-interviewer effects are prevalent in other studies on home- or family-related content, such as children born or partners’ attitudes (Blanc & Croft, 1992; Kane & Macaulay, 1993; Lueptow et al., 1990).
Crime
The modal finding amongst crime studies indicates that interviewer sex does not influence respondents’ answers on the topic of crime. The only study reporting effects suggests that female interviewers trigger greater concerns about crime (Schräpler, 2001). While the study relies on a large sample, questions need to be raised about the sampling strategy, which mixes probability and non-probability methods. It also did not specify the total number of interviewers or their split by sex.
Work
Looking at work-related topics, we identify some interviewer sex effects in the literature, although these findings do not present a clear pattern of effects. For example, Anker et al. (1987) find that male interviewers elicit higher labor force participation rates in more specific questions, while female interviewers elicit higher rates for more general questions. Furthermore, research identified some interviewer effects concerning stereotyping occupations on the basis of sex (Clarke, 1989): female interviewers were more likely to reduce sex stereotyping for occupations often associated with men, e.g., university lecturers; male interviewers had a similar effect on stereotyping for jobs often associated with women, e.g., fashion reporters. Axinn (1991) indicated that female interviewers gathered significantly more reports of any wage work, farm work, ownership of farm animals, and money loaning compared to male interviewers. Some effect was identified on work-related items by Kane and Macaulay (1993), where female interviewers triggered more egalitarian responses in favor of gender equality. Martin (2021) finds that female interviewers yielded more negative ratings of journalists in Lebanon and Jordan in a survey on journalistic credibility. However, some studies provide little information about the design regarding the total number of interviewers or their split by sex (e.g., #17, #20, #89).
Other Topics
Miscellaneous topics are addressed in a few studies. For example, Pol and Ponzurick (1989) find that female interviewers trigger higher reports of attendance rates for football games. Little information is available about the study design, which seems to rely on rather small case numbers and lacks information on the total number of interviewers or their split by sex (e.g., #18). However, null effects were found for the use of seat belts (Fhanér & Hane, 1974) or on general knowledge (Yang & Yu, 2008).
In sum, while our systematic review suggests that there is wide variation in the presence and direction of the effect of interviewer sex on the survey response, we note that the sex of the interviewer seems to matter. Furthermore, we critically observed that reporting does not seem to follow a systematic pattern for the eligible studies, except for a few topics such as items related to gender issues, health or sex. Standard information for some studies was difficult to obtain or unavailable.
Discussion and Conclusions
This article presented a systematic review of sex-of-interviewer effects on the survey response. We posited that while sex-of-interviewer effects have been studied previously (e.g., Catania, Gibson, Chitwood, et al., 1990; Catania, Gibson, Marin, et al., 1990; Davis et al., 2010; Schaeffer et al., 2010; Sudman & Bradburn, 1974; West & Blom, 2017), a systematic review of the findings on potential effects is still missing. Our article aims to fill this gap by studying publications and grey literature published between 1962 and 2022. To our knowledge, this article represents the first systematic review of sex-of-interviewer effects on the survey response. We embedded our research in the Total Survey Error framework (e.g., Andersen et al., 1979; Groves & Lyberg, 2010; Weisberg, 2009), in which interviewer effects are predominantly associated with measurement error. Our analysis relies on a sample of n_s = 90 studies containing n_d = 100 datasets, which represented a larger sample than we anticipated. We presented our overall observations about the number of studies across time, and systematically discussed the studies’ survey characteristics. We also provided a qualitative review of sex-of-interviewer effects by topical scope.
Our findings indicate that the presence and direction of sex-of-interviewer effects may depend on a variety of characteristics. The results lend some confidence that sampling procedures, population types, sample sizes, and the number of interviewers, both in total and by sex, are relevant characteristics to study and control for when modelling interviewer effects. While the analysis further suggests that the topical scope is important, it may be too simplistic to argue that some topics carry more masculine connotations, such as politics and economics, or more feminine ones, such as gender equality and family.
Sensitive questions relating to topical areas such as gender, health or sexuality seem to be more susceptible to sex-of-interviewer effects compared to factual topics such as demographic questions. In addition, future research may wish to investigate the interaction between interviewer and respondent sex. Potential relations between interviewers and respondents have already been observed (e.g., Benstead, 2014; Kane & Macaulay, 1993; Sahgal & Horowitz, 2011; Stojmenovska & Steinmetz, 2017). Researchers may want to re-think the suitability of the mode of data collection, opting for self-administered rather than interviewer-administered modes, or consider sex-matching, if appropriate.
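As an illustration only, the modelling points above can be written as a random-intercept model in which respondents are nested within interviewers. This is a generic sketch under stated assumptions, not a specification taken from any of the reviewed studies; all symbols are introduced here for illustration.

```latex
% Illustrative sketch; every symbol below is an assumption, not taken from the reviewed studies.
% i indexes respondents, j indexes interviewers.
% F_j = 1 if interviewer j is female, R_i = 1 if respondent i is female.
% x_{ij} collects design controls (e.g., sampling procedure, population type).
\begin{align}
  y_{ij} &= \beta_0 + \beta_1 F_j + \beta_2 R_i + \beta_3 (F_j \times R_i)
            + \boldsymbol{\gamma}^{\top} \mathbf{x}_{ij} + u_j + \varepsilon_{ij}, \\
  u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2), \\
  \rho_{\mathrm{int}} &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}.
\end{align}
```

In this sketch, \beta_1 captures a main effect of interviewer sex, \beta_3 captures the interaction between interviewer and respondent sex, and \rho_{\mathrm{int}} is the share of residual variance attributable to interviewers. Estimating \beta_1 and \sigma_u^2 requires knowing which interviewer conducted each interview and each interviewer's sex, which is why the number of interviewers in total and by sex needs to be reported.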
Evidence-Based Recommendations
In sum, our research suggests that it is important to continue conversations about interviewer sex in survey methodology and the wider framework of Total Survey Error. Only if we better understand how sex and other characteristics might influence the survey response can we start thinking about how to mitigate these effects or how to correct for them after data collection. Our study proposes a few evidence-based recommendations on how sex-of-interviewer effects could be addressed in fieldwork.
Acknowledgements
The authors would like to thank Prof. Dr. Rainer Schnell for his valuable guidance in defining the research scope as well as Dr. Vanessa Gash and Dr. Sally Stares for their feedback and suggested improvements to this research.
Funding
The main author discloses receipt of the following financial support for the research and authorship of this article: This work was supported by the GESIS Eurolab Grant in 2018.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
