Abstract
Socially desirable responding (SDR) has been of long-standing interest to the field of marketing. Unfortunately, the construct has not always been well understood by marketing researchers. The authors provide a review of the SDR literature organized around three key issues—the conceptualization and measurement of SDR; the nomological constellation of personality traits, values, sociodemographics, and cultural factors associated with SDR; and the vexing issue of substance versus style in SDR measures. The authors review the current “state of the literature,” identify unresolved issues, and provide new empirical evidence to assess the generalizability of existing knowledge, which is disproportionately based on U.S. student samples, to a global context. The new evidence is derived from a large international data set involving 12,424 respondents in 26 countries on four continents.
Surveys play a crucial role in marketing research. For example, of the 636 empirical articles that appeared in Journal of Marketing and Journal of Marketing Research during 1996–2005, nearly 30% employed surveys (Rindfleisch et al. 2008). A frequently noted concern with self-reports collected through surveys is that respondents may not respond truthfully but simply provide answers that make them look good (Paulhus 2002; Tourangeau and Yan 2007). This phenomenon is called socially desirable responding (SDR). The SDR phenomenon introduces extraneous variation in scale scores, which compromises the validity of marketing survey data. Consequently, SDR has been called “one of the most pervasive response biases” in survey data (Mick 1996, p. 106).
Despite the generally recognized importance of SDR in survey research, it has attracted relatively little attention in marketing. Only a few articles that explicitly address SDR (De Jong, Pieters, and Fox 2010; Fisher 1993; Mick 1996) have appeared in the major marketing journals in the recent past. Response biases, including SDR, are sometimes mentioned in scale development studies, but usually researchers simply report a correlation between the substantive construct of interest and an SDR scale and either conclude that SDR is not a problem (if the correlation is nonsignificant) or claim that SDR is not a serious issue (if the correlation is relatively small).
Socially desirable responding has been an area of active research in recent years, especially in psychology, and this work has led to important new insights, which call into doubt theories and practices that are still considered standard in marketing. Our reading of the marketing literature has led us to identify at least four common misconceptions: (1) SDR can be validly conceptualized as a unidimensional construct, (2) any of the SDR scales available in the literature can be used to assess SDR because they all measure the same construct, (3) the goal is to avoid a significant correlation between substantive constructs and SDR scales because such an association always implies contamination, and (4) the biasing influence of SDR can be removed simply by including a measure of SDR as a control variable.
Consequently, the aim of this article is twofold: (1) to update marketing researchers on the latest thinking in SDR research and (2) to reinforce and extend what is known using an unusually large international data set involving 12,424 nationally representative respondents in 26 countries on four continents. We organize the article around three key issues that are important for an improved understanding and treatment of SDR in marketing survey research. First, we consider the SDR construct and its measurement. Second, we examine the nomological constellation of personality traits, values, sociodemographics, and cultural factors associated with SDR. As with any other behavioral construct, SDR does not exist in isolation but rather derives much of its meaning from the theoretical constellation of related constructs. Third, we discuss the vexing issue of whether respondents’ ratings on SDR scales represent substance or style and how researchers should interpret an association of SDR with a substantive marketing construct. We discuss each issue in a separate section, first providing a review of the literature and identifying unresolved issues and then reporting new empirical evidence based on our global study. The article concludes with guidelines for the field and suggestions for further research.
The SDR Construct
An in-depth discussion of SDR should begin with the construct per se, which has been the target of considerable debate over the years. In this section, we discuss different conceptualizations and self-report measures proposed in the literature and review prior use of SDR scales in the marketing literature. Finally, we present new empirical evidence and address several unresolved issues using our global study.
Varieties of Social Desirability
Socially desirable responses are answers that make the respondent look good, based on cultural norms about the desirability of certain values, traits, attitudes, interests, opinions, and behaviors. In the past, social desirability has been studied either as a characteristic of items or as an aspect of personality. Our focus here is on the latter—that is, respondents’ enduring tendencies to provide overly positive self-descriptions (Paulhus 2002).
Initially, SDR was conceptualized as a unidimensional construct, and several instruments were developed to measure individual differences in SDR. However, low correlations between these scales soon led to the formulation of various two-factor models. One proposal was that SDR could be either a reflection of an exaggerated but honestly held self-view—an unconscious tendency to claim positive attributes and deny negative ones—or a deliberate attempt to project a favorable self-image. Terms such as alpha or gamma bias, self- versus other-deception, and self-deceptive enhancement (SDE) versus impression management (IM) were used to refer to these different expressions of SDR (Paulhus 1991).
More recently, instead of emphasizing the distinction between two forms of SDR based on level of awareness (nonconscious versus conscious), researchers have focused on the difference between two content domains in which SDR may be displayed. According to this view, self-favoring response tendencies are best understood in the context of two “fundamental modalities of human experience”—agency and communion (Paulhus and John 1998). Some people are more likely to engage in SDR in agency-related contexts, which involve dominance, assertiveness, autonomy, influence, control, mastery, uniqueness, power, status, and independence. Paulhus and John (1998) call this form of SDR egoistic response tendencies (ERT). Other people are more likely to engage in SDR in communion-related contexts, which are associated with affiliation, belonging, intimacy, love, connectedness, approval, and nurturance. Paulhus and John refer to this type of SDR as moralistic response tendencies (MRT).
The most elaborate conceptualization of SDR to date has been proposed by Paulhus (2002), who cross-classifies SDR by degree of awareness and domain of content. Thus, positively biased (superhero-like) self-perceptions on intellectual, social, and emotional qualities (ERT) can be unconscious and honestly held or deliberately and strategically projected. Similarly, positively biased (saint-like) self-perceptions on attributes related to responsibility and interpersonal relationships (MRT) can be sincere and genuinely believed or purposefully and instrumentally distorted. Paulhus (2002) argues that conscious IM is more susceptible to situational demands and therefore is less consistent across contexts and time, whereas unconscious self-deception is more dispositional and traitlike.
Self-Report Measures of SDR
Many scales have been proposed over the years to measure individual differences in SDR (for a review and references to the original literature, see Paulhus 1991). Among these are the Edwards SD scale, the Wiggins Sd scale, the Marlowe–Crowne social desirability scale, various lie scales (e.g., the EPI lie scale), and Paulhus's own balanced inventory of desirable responding (BIDR). The BIDR is the only multidimensional instrument and differentiates between SDE, which was assumed to measure unconscious positivity bias, and IM, which was believed to assess deliberate inflation of self-descriptions. A third subscale, called self-deceptive denial, was also hypothesized to measure unconscious bias, but it usually correlates strongly with IM and is not used frequently. The Edwards SD scale is closely related to SDE, whereas the Wiggins Sd scale and the EPI lie scale are strongly associated with IM. Paulhus (1991) argues that even though the Marlowe–Crowne scale is significantly correlated with both SDE and IM, it is primarily a measure of conscious distortion. However, several studies show that the correlations with SDE and IM are similar (e.g., Helmes and Holden 2003).
Recent studies have demonstrated that it is important to distinguish between the agentic and the communal forms of SDR (i.e., ERT and MRT) and that the SDE and IM subscales of the BIDR can be used to measure ERT and MRT, respectively (Konstabel, Aavik, and Allik 2006; Lalwani, Shavitt, and Johnson 2006; Paulhus 2002; Paulhus and John 1998; Pauls and Crost 2004; Pauls and Stemmler 2003). Paulhus (1991) reports reliabilities from .68 to .80 for SDE and from .75 to .86 for IM, but other more recent studies show lower scale reliabilities—in the mid-.60s for SDE and in the low .70s for IM (Meston et al. 1996; Pauls and Stemmler 2003; Reid-Seiser and Fritzsche 2001; Roth and Herzberg 2007). With an intercorrelation in the .05–.40 range, SDE and IM exhibit discriminant validity (Paulhus 1991; Pauls and Stemmler 2003).
Paulhus (2002) further suggests that the SDE scale assesses unconscious ERT, that the self-deceptive denial scale measures unconscious MRT, and that the IM scale (which should be renamed “communion management” to more clearly express that it measures only one form of IM, namely, moralistic IM) captures deliberate MRT. According to Paulhus, no scale exists to measure agency management (deliberate ERT).
However, the notion that the SDE scale assesses unconscious bias and the IM scale assesses conscious bias has been largely discredited (Pauls and Crost 2004; Roth and Herzberg 2007). Initial support for this idea was based on findings that IM was sensitive to explicit “fake good” manipulations, whereas SDE was not. Subsequent research has shown that this was primarily because the “fake good” manipulations had a connotation of “fake communion” and that if “fake agency” instructions were used, both scales were equally sensitive to faking manipulations. It appears that though the SDE and IM scales can be used effectively to differentiate ERT from MRT, the two scales do not tap unconscious and conscious bias, respectively. In situations in which demands for favorable self-presentation are minimal (e.g., when the topic is not sensitive and the data are collected anonymously), the SDE and IM scales are likely to capture unconscious biases. In contrast, when situational pressures to project a favorable image are strong (e.g., with explicit faking instructions, when the topic is sensitive and public disclosure of responses is possible, when something is at stake as in personnel selection contexts), the two scales probably capture both unconscious biases (which emerge even in the absence of situational demands) and conscious biases (which are encouraged by the situation). Table 1 summarizes our review of the literature on the conceptualization and measurement of SDR and identifies several important unresolved issues.
Table 1. Summary of the State of the Literature and Unresolved Issues
Notes: Unresolved issues that are investigated in our global study are in italics.
Prior Use of SDR Scales in Marketing Research
We conducted a search of the marketing literature to determine how often SDR scales were used in empirical research. Specifically, we analyzed how often the two most well-known scales (Marlowe–Crowne and BIDR) were used in articles in Journal of Marketing Research, Journal of Marketing, and Journal of Consumer Research. This analysis showed that the Marlowe–Crowne scale was by far the most frequently employed SDR instrument. It appeared in 26 articles between 1968 and 2008. In 23 cases, the Marlowe–Crowne scale was used to check for response bias in a construct of interest or to control for response bias when investigating the relationship between substantive constructs. In three instances, the Marlowe–Crowne scale served as a measure of a substantive construct (e.g., social approval). The BIDR was used in only 7 articles between 1996 and 2008. In 4 applications, only the IM scale was employed; in 1 application, the overall BIDR scale was used; and in 2 instances, both the SDE and IM scales were applied. With one exception, the BIDR was used to control for response bias.
These results show three things. First, it appears that SDR scales are used infrequently in the major marketing journals. Second, if an SDR scale is employed, it is usually the Marlowe–Crowne scale, which does not distinguish between different forms of SDR and actually confounds egoistic and moralistic responding. Third, almost without exception, an association with SDR is treated as evidence of response bias. As we subsequently show in greater detail, in general, this practice is not warranted.
Empirical Evidence from the Global Study
In our empirical study, we measured ERT and MRT with the SDE and IM subscales of Paulhus's (1991) BIDR. The BIDR consists of 20 SDE and 20 IM items, but the market research agencies that administered the surveys considered the full 40-item scale too long to administer and were concerned about respondent dropout (see also De Jong, Steenkamp, and Veldkamp 2009). Therefore, we selected a subset of 10 SDE and 10 IM items by omitting potentially offensive and/or inappropriate items while retaining the balanced structure of the scale (5 positively and 5 negatively worded items per SDR dimension). We used five-point Likert scales (1 = “strongly disagree,” and 5 = “strongly agree”) to collect the ratings. Table 2 presents the items.
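To make the role of the balanced item structure concrete, the following minimal sketch shows how negatively keyed items on a five-point scale can be reverse-keyed before they are combined. It is an illustration only: the item labels and ratings are hypothetical, and the simple mean score is an assumption made for clarity, whereas the study itself estimates latent scores with hierarchical IRT.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # five-point Likert scale, as in the study


def reverse_key(rating: int) -> int:
    """Flip a rating so that 1 <-> 5, 2 <-> 4, and 3 stays 3."""
    if not SCALE_MIN <= rating <= SCALE_MAX:
        raise ValueError(f"rating {rating} is outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - rating


def scale_score(ratings: dict[str, int], negatively_keyed: set[str]) -> float:
    """Mean rating after reverse-keying the negatively worded items."""
    return mean(
        reverse_key(r) if item in negatively_keyed else r
        for item, r in ratings.items()
    )


# Hypothetical respondent; item labels and ratings are made up for illustration.
ratings = {"sde_1": 4, "sde_2": 2, "sde_3": 5, "sde_4": 1}
negative_items = {"sde_2", "sde_4"}
score = scale_score(ratings, negative_items)
print(score)
```

After reverse-keying, higher values on every item point in the same direction, so agreement with a positively worded item and disagreement with a negatively worded item both raise the score.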
Table 2. Items from the BIDR Used in Global Study
Indicates a negatively keyed item.
Our extensive data set—collected among more than 12,000 respondents in 26 countries in Europe, Asia, and North America—enables us to examine basic characteristics of these measures of choice for ERT and MRT on a global scale. Details of our entire data collection effort appear in the Web Appendix (see http://www.marketingpower.com/jmrapril10). We estimated respondents’ latent ERT and MRT scores using a recently developed hierarchical item response theory (IRT) modeling technique, which relaxes the condition of cross-national measurement invariance. We calculated reliability of the construct as
The average reliabilities for ERT and MRT were .67 and .73, with a range from .49 to .76 and .67 to .77, respectively. In 25 of 26 countries, MRT is measured more reliably than ERT. Thus, although the BIDR was developed and refined in North America, both components of SDR can typically be measured with a reasonable degree of reliability around the world (the only exception was ERT in Thailand, where the reliability was only .49). These results are especially encouraging given that negatively keyed items often work less well in non-Western countries. In addition, ERT and MRT exhibit discriminant validity in all countries. The average correlation between ERT and MRT was .31, with a range of .19–.43. Figure 1 gives the country means on ERT and MRT (relative to the United States, which is indexed at 100). The figure shows that ERT and MRT are not equally prevalent around the world. In the next section, we examine possible cultural causes.

Figure 1. Country Scores on ERT and MRT
The Constellation of ERT and MRT
As with other behavioral constructs, ERT and MRT do not exist in isolation but rather derive much of their meaning from the nomological constellation of related constructs. Understanding this constellation provides additional insights into the two dimensions of SDR. In this section, we first discuss previous work on the relationships of ERT and MRT with personality traits, personal values, sociodemographics, and national culture. Then, we present new evidence to reinforce what is known and to investigate unresolved issues based on our global study.
Personality Traits
Personality traits represent basic human ways of experiencing and reacting to the world. The dominant conceptualization of personality is the Big Five factor model, which distinguishes five fundamental personality traits: extraversion, emotional stability (or its opposite, neuroticism), agreeableness, conscientiousness, and openness to experience (Digman 1990). At an even more abstract level, the Big Five factors tend to load on two higher-order constructs consisting of openness to experience and extraversion on the one hand and agreeableness and conscientiousness on the other hand (the position of emotional stability is less clear). Digman (1997) relates the two factors to the theoretical distinction between agency (openness to experience and extraversion) and communion (agreeableness and conscientiousness).
The motives underlying ERT and MRT are congruent with the consistencies in behavior captured by the two sets of traits. People engage in egoistic responding to satisfy their power and achievement strivings and their needs for mastery and control, and behavioral regularities such as being outgoing, generating enthusiasm, or radiating energy (extraversion) and having an imaginative mind, being open to new ideas, or valuing change (openness to experience) support these motives. In contrast, MRT entails an avoidance of disapproval by conforming to social norms and a positive valuation of relationships and intimacy. Being considerate, cooperating with others, or showing affection (agreeableness) and doing things thoroughly, controlling impulses, or showing perseverance (conscientiousness) are in accordance with these motives (Paulhus 2002; Pauls and Stemmler 2003).
Two additional issues must be mentioned briefly. First, in Paulhus and John's (1998) theoretical work, emotional stability is not directly involved in either ERT or MRT. However, key characteristics of emotional stability, such as high self-esteem and dominant-assured personality, have clear agency qualities. In support of this notion, Mick (1996) reports that self-esteem, conceptualized as a psychological adjustment construct, exhibited a correlation of .53 (p < .001) with ERT and a correlation of .21 (p < .01) with MRT. Thus, we may expect a primary association of emotional stability with ERT and a secondary association with MRT. Second, Digman (1997, p. 1251) notes a small but consistent relationship between conscientiousness and the agency metafactor, which may be explained by the notion that conscientiousness includes the subfactors of achievement and competence (see also Paulhus and John 1998). These considerations suggest that conscientiousness exhibits a secondary effect on ERT.
The relationship between the Big Five factors and ERT/MRT has attracted considerable empirical research, which we summarize in Table 3 (including sample-size weighted average correlations and 95% confidence intervals). Across all studies, emotional stability is strongly related to ERT (
Table 3. Correlations of ERT and MRT with the Big Five Personality Traits: Summary of Previous Studies
Notes: O = openness to experience, E = extraversion, ES = emotional stability, C = conscientiousness, and A = agreeableness. Correlations that are significant at p = .05 (two-sided) are in bold. In case a study reports correlations for “standard” and “special” conditions (e.g., fake good), we report the results for the standard condition. Average correlations and confidence intervals are sample-size weighted.
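As an illustration of what a sample-size weighted average correlation involves, the sketch below pools correlations across studies. The (r, n) pairs are made up and are not values from Table 3. The confidence interval uses a fixed-effect Fisher z pooling with weights n − 3, which is one common choice and an assumption here; the notes do not state which interval method was used.

```python
import math


def weighted_mean_r(studies):
    """Sample-size weighted mean of correlations; studies = [(r, n), ...]."""
    total_n = sum(n for _, n in studies)
    return sum(r * n for r, n in studies) / total_n


def fisher_ci(studies, z_crit=1.96):
    """95% CI from fixed-effect Fisher z pooling (weights n - 3).

    This is an assumed method, chosen for illustration only.
    """
    weights = [n - 3 for _, n in studies]
    z_bar = sum(
        w * math.atanh(r) for w, (r, _) in zip(weights, studies)
    ) / sum(weights)
    se = 1.0 / math.sqrt(sum(weights))
    return math.tanh(z_bar - z_crit * se), math.tanh(z_bar + z_crit * se)


# Made-up (correlation, sample size) pairs for illustration.
studies = [(0.40, 100), (0.50, 300), (0.45, 200)]
r_bar = weighted_mean_r(studies)
low, high = fisher_ci(studies)
print(round(r_bar, 3), round(low, 3), round(high, 3))
```

Larger studies pull the average toward their estimate, which is why a single large sample can dominate a summary row of this kind.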
Although many of these results are consistent with theoretical expectations, several issues remain. First, there are questions about the generalizability of the findings because all studies were conducted in North America and Europe and the participants were mostly students. Second, prior studies relied on bivariate correlations between SDR and the Big Five, which ignore the shared variance among the latter. As we indicated previously, at a higher level, the Big Five load on two factors, so it is of interest to investigate whether previous findings change when ERT and MRT are regressed on all five trait factors simultaneously. Third, contrary to expectations, conscientiousness does not have a stronger relationship to MRT than to ERT.
Values
Values are concepts or beliefs pertaining to desirable end states or modes of conduct that transcend specific situations, guide selection or evaluation of behavior, and are ordered by relative importance (Schwartz 1992, p. 4). It is currently widely accepted that the most comprehensive and rigorously validated representation of human values is the Schwartz (1992) value typology. Schwartz derived a typology of ten distinct types of values that reflect a continuum of related motivations. The value types are organized into four higher-order value domains: self-enhancement, openness to change, self-transcendence, and conservation.
Self-enhancement values express the extent to which people are motivated to enhance their personal interests, even at the expense of others. Underlying the self-enhancement domain are the value types of power and achievement. People who place priority on self-enhancement values should be more prone to exhibit ERT. Openness-to-change values derive from people's needs for control, autonomy, independence, and stimulation. The common aspect of these values is that they motivate people to follow their own intellectual and emotional interests in unpredictable and uncertain directions, which is congruent with the agency motive underlying ERT.
Self-transcendence arrays values in terms of the extent to which they motivate people to transcend selfish concerns and promote the welfare of others, close and distant, and society at large. Self-transcendence encompasses the value types of benevolence and universalism. Benevolence focuses on concern with the welfare of close others, and universalism focuses on understanding, appreciation, and protection of the welfare of all people. According to Paulhus and John (1998, p. 1039), the communion orientation underlying MRT extends beyond relationships to the positive value placed on “benefiting others, even the society as a whole,” which mirrors benevolence and universalism, respectively. Therefore, self-transcendence should be positively related to MRT. Finally, the common aspect of conservation values is that they motivate people to “preserve the status quo and the certainty it provides in relationships with close others, institutions, and traditions” (Schwartz 1992, p. 43). This is expressed in people's need for security, harmony, and conformity. Conservation values are congruent with the communion orientation of MRT.
The research of Shavitt and colleagues (Lalwani, Shavitt, and Johnson 2006; Shavitt et al. 2006) sheds some light on the relationships between the Schwartz value typology and ERT and MRT. These authors focus on four cultural orientations at the individual level, namely, horizontal versus vertical individualism and collectivism. These cultural orientations broadly mirror the Schwartz value typology. Vertical individualists emphasize self-enhancement values, whereas horizontal individualists emphasize openness to change values. Vertical collectivists stress conservation values, whereas horizontal collectivists emphasize self-transcendence (Shavitt et al. 2006). Across nine studies among samples of U.S. respondents, Shavitt and colleagues find that horizontal individualism (horizontal collectivism) is consistently positively correlated with ERT (MRT). Thus, Shavitt and colleagues provide indirect empirical support for the notion that openness to change (self-transcendence) should be positively related to ERT (MRT). They neither predicted nor found evidence for the role of vertical individualism and vertical collectivism in shaping ERT and MRT, respectively. This might imply that there is no relationship between self-enhancement (conservation) and ERT (MRT). Alternatively, it is possible that vertical individualism (vertical collectivism) does not fully mirror self-enhancement (conservation).
Sociodemographics
Gender is the only sociodemographic variable that has been repeatedly examined in the context of ERT and MRT. Research has consistently found that men score higher on ERT and that women score higher on MRT (Heine and Lehman 1995; Lalwani, Shavitt, and Johnson 2006; Paulhus 1991). Gender differences on ERT and MRT may be explained by traditional, gender-based socialization roles.
National Culture
Crowne and Marlowe (1964) posit that people's tendency to engage in SDR might be systematically related to the culture in which they live. Agency and communion can be related to two of the dimensions of national culture distinguished by Hofstede (2001): individualism/collectivism and masculinity/femininity.
Individualism/collectivism pertains to the degree to which people in a country prefer to act as individuals rather than as members of a group. Members of individualist societies place their personal goals and desires ahead of those of the in-group. In contrast, in collectivist countries, there is a close-knit social structure, in which people expect their group to care for them in exchange for unwavering loyalty. The desires for uniqueness and independence are core elements of individualism, whereas conformity and interdependence are central to collectivism (Hofstede 2001).
Because agency traits, such as independence, self-reliance, and uniqueness, are socially desirable in individualist cultures, exaggerated self-perceptions on these qualities are likely to be beneficial. In contrast, collectivist cultures are conformity oriented, and loyalty to the group and concern with promoting the group's continued existence are rewarded. Communal traits, such as belongingness and maintenance of social relationships, are socially desirable in collectivist cultures, which should encourage people to present themselves in a favorable light on these traits to meet interpersonal goals (Lalwani, Shavitt, and Johnson 2006).
Several studies have contrasted mean differences on ERT or MRT between a collectivist country and an individualist country. Heine and Lehman (1995) find no differences between Canadian and Japanese students on either ERT or MRT. However, Lalwani, Shavitt, and Johnson (2006) report that U.S. students ranked significantly higher than Singaporean students on ERT and significantly lower on MRT. Church (2000) reviews research showing that North Americans score higher than Asians on self-esteem measures and list more positive self-statements. This may be taken as indirect evidence for the notion that ERT is higher in individualist countries. Van Hemert and colleagues (2002) correlate the aggregate scores of 23 countries on Hofstede's (2001) individualism/collectivism dimension with country means on MRT (as measured by the Eysenck lie scale). They report a correlation of −.68 (p < .01). In summary, there is strong evidence that MRT is higher in collectivist countries than in individualist countries. Evidence for a positive relationship between individualism and ERT is more equivocal.
Masculinity/femininity refers to the dominance of ego-enhancing versus relationship-enhancing tendencies in a culture, which are often associated with gender roles. In masculine cultures, the dominant values are assertiveness, achievement, and success, whereas the dominant values in feminine cultures are quality of life, warm interpersonal relationships, and caring for the weak. Because a focus on achievement, power, and dominance encourages self-favoring tendencies on these dimensions, masculine cultures should be more likely to exhibit ERT. Conversely, an emphasis on relationships, nurturance, and the welfare of people and nature is more in line with a self-favoring communion orientation, so feminine cultures should be more likely to exhibit MRT. Van Hemert and colleagues (2002) report a correlation of −.17 between masculinity/femininity and MRT. This correlation is consistent with theoretical expectations but does not reach statistical significance because of the low power of the test.
Table 1 summarizes our review of the literature on the constellation of SDR. The table also lists unresolved issues addressed in the global study, which we describe next.
Empirical Evidence from the Global Study
Method
In our global survey, we included measures for the Big Five inventory, the Schwartz Value Survey, and sociodemographics (see the Web Appendix at http://www.marketingpower.com/jmrapril10). We used hierarchical IRT modeling to compute the latent scores on the personality and value constructs. Table 4 provides information on their reliabilities and reports correlations between all individual difference variables, pooled across countries. We investigated the nomological constellation of ERT and MRT using the following multilevel specification. For Level 1,
For Level 2,
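One way to write a two-level specification that is consistent with the coefficient labels used in the results (γ10 through γ120) is sketched below. The symbols β, r, and u, the index 13 for social class, and the abbreviations IND and MAS for the two national-culture dimensions are assumptions made for illustration rather than the authors' own notation.

```latex
\begin{aligned}
\text{Level 1:}\quad & \mathrm{SDR}_{ij} = \beta_{0j} + \sum_{q=1}^{13} \beta_{qj} X_{qij} + r_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{IND}_j + \gamma_{02}\,\mathrm{MAS}_j + u_{0j} \\
& \beta_{qj} = \gamma_{q0} + u_{qj}, \qquad q = 1, \dots, 13
\end{aligned}
```

Here i indexes respondents and j indexes countries, and SDR stands for either ERT or MRT. The predictors X1 to X5 are the Big Five traits (openness to experience, extraversion, emotional stability, conscientiousness, agreeableness), X6 to X9 are the value domains (self-enhancement, openness to change, self-transcendence, conservation), and X10 to X13 are gender, age, education, and social class. The random term u_qj drops out of a slope equation when the corresponding variance component is fixed. With 26 countries and three country-level parameters in the intercept equation, 23 degrees of freedom remain for the national-culture effects.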
Table 4. Reliability and Pooled Correlations between Individual-Difference Variables
Notes: O = openness to experience, E = extraversion, ES = emotional stability, C = conscientiousness, A = agreeableness, Cons = conservation, Open = openness to change, SelfTran = self-transcendence, and SelfEnh = self-enhancement. In parentheses, we report partial correlations, as Schwartz (1992) recommends.
Estimation
As a baseline model, we estimated a model with a random intercept but no individual-level or country-level covariates. The Level 1 variances for ERT and MRT were .034 and .173, and the Level 2 variances were .016 and .025, respectively. Thus, approximately 32% (13%) of the variation in ERT (MRT) was between countries. After we added the individual-level covariates, the Level 1 variances decreased to .025 (ERT) and .133 (MRT), respectively. Thus, these constructs explained 27% of individual differences in ERT and 25% of individual differences in MRT. Finally, we included the cultural variables, which explained 29% (18%) of the cross-national variation in ERT (MRT). Multicollinearity is no reason for concern, because all variance inflation factor values are below 3.
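The variance decomposition in this paragraph can be retraced with a few lines of arithmetic. The sketch below uses the variance components reported above; the function names are ours. Because the reported variances are rounded to three decimals, the recomputed proportions need not match the published percentages exactly.

```python
def between_country_share(level1_var: float, level2_var: float) -> float:
    """Share of total variance that lies between countries (intraclass correlation)."""
    return level2_var / (level1_var + level2_var)


def proportion_explained(baseline_var: float, conditional_var: float) -> float:
    """Proportional reduction in a variance component after adding covariates."""
    return (baseline_var - conditional_var) / baseline_var


# (Level 1, Level 2) variances from the baseline random-intercept model.
baseline = {"ERT": (0.034, 0.016), "MRT": (0.173, 0.025)}
# Level 1 variances after adding the individual-level covariates.
with_covariates = {"ERT": 0.025, "MRT": 0.133}

for construct, (l1, l2) in baseline.items():
    share = between_country_share(l1, l2)
    explained = proportion_explained(l1, with_covariates[construct])
    print(f"{construct}: between-country share = {share:.3f}, "
          f"Level 1 variance explained = {explained:.3f}")
```

The between-country share is the quantity that motivates a multilevel model in the first place: when it is far from zero, respondents from the same country cannot be treated as independent observations.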
We report the (unstandardized) parameter estimates in Table 5. In multilevel analysis, standardized coefficients are not used because the variance is partitioned across different levels.
Table 5. Effects on ERT and MRT
Notes: n.s. = not significant at p = .05; for national-culture effects, we use p < .10 as the cutoff because there are only 23 degrees of freedom for these parameters. A variance component is fixed if the variance of the coefficient in question is not significantly different from zero. This implies that the effect is the same across countries. T-values are reported for the structural coefficients, and χ2 values are reported for the variance components.
Personality
Our results confirm that conscientiousness (γ40,MRT = .1714, p < .01) and agreeableness (γ50,MRT = .1226, p < .01) are positively associated with MRT. As we also expected, conscientiousness has a positive secondary effect on ERT (γ40,ERT = .0694, p < .01), and agreeableness is unrelated to ERT. In further support of previous research, openness to experience (γ10,ERT = .0258, p < .01) and extraversion (γ20,ERT = .0181, p < .01) are positively related to ERT, though the effect sizes are smaller than those for conscientiousness/agreeableness and MRT. Extraversion has a small negative effect on MRT (γ20,MRT = −.0202, p < .05), but openness is unrelated to MRT. The effects of emotional stability on ERT and MRT are both significant (γ30,ERT = .0491, p < .01; γ30,MRT = .0210, p < .01), but the former is more than twice as large.
Overall, these results support prior research, and they address unresolved issues in three ways (Table 1). First, they indicate that the predicted relationships are generalizable across a diverse sample of respondents from many different cultures. Second, our multivariate procedure enables us to assess the unique effect of each trait, controlling for the effects of other traits, thus alleviating a methodological limitation of previous research. Third, consistent with theoretical arguments, conscientiousness is indeed more strongly related to MRT than to ERT.
Values
In general, the findings confirm the predicted relationships among the four value domains and ERT/MRT. Openness to change is positively related to ERT (γ70,ERT = .0060, p < .01), whereas self-transcendence has a positive effect on MRT (γ80,MRT = .1010, p < .01), which is conceptually consistent with previous research (Lalwani, Shavitt, and Johnson 2006). In addition, self-enhancement is positively associated with ERT (γ60,ERT = .0061, p < .01), and conservation has a significant, positive influence on MRT (γ90,MRT = .0519, p < .01). Finally, self-enhancement and openness to change have significant, negative relationships to MRT (γ60,MRT = −.0609, p < .01; γ70,MRT = −.0557, p < .01). Though not hypothesized, this finding makes sense because these values are in conflict with need for approval.
Sociodemographics
Confirming previous research, ERT is higher for men, whereas MRT is higher for women (γ100,ERT = .0131, p < .01; γ100,MRT = .0281, p < .01). Furthermore, older respondents ranked higher on both ERT and MRT (γ110,ERT = .0006, p < .01; γ110,MRT = .0038, p < .01). Education has a negative effect on ERT and a positive effect on MRT (γ120,ERT = −.0044, p < .05; γ120,MRT = .0095, p < .05). Social class is unrelated to ERT and MRT.
Relative Effect of the Three Types of Individual-Difference Variables
We performed a series of sequential analyses to examine the relative contribution of the three blocks of individual-difference variables. We began by entering sociodemographics because this information is widely used by marketing practitioners. Next, we added personality and then values, and vice versa. We focus on the change in explained variance. The results (see Table 6) show several things. First, people's psychological makeup is much more important in explaining ERT and MRT than their sociodemographic characteristics. Second, for both types of SDR, personality traits have a greater impact than a person's values. Third, personality traits have a substantially greater impact on ERT than on MRT, whereas values have a much greater effect on MRT than on ERT.
Table 6. Relative Effects of Big Five and Value Domains on ERT and MRT
National Culture
Our findings are consistent with prior research in that individualist countries ranked lower on MRT (γ01,MRT = −.0020, p < .01). Our findings also address the unresolved issue of the effect of masculinity. In our data, masculine countries are characterized by somewhat higher ERT scores (γ02,ERT = .0005, p < .10) and lower MRT scores (γ02,MRT = −.0007, p < .10). However, whereas previous research has suggested that individualist countries rank higher on ERT, we found the opposite (γ01,ERT = −.0016, p < .05). Yik, Bond, and Paulhus (1998) report a similar finding. They show that, overall and relative to a comparable North American sample, Chinese respondents tend to self-efface. However, on agentic traits, there is actually a tendency toward self-enhancement. Sedikides, Gaertner, and Toguchi (2003) provide additional evidence, documenting that people in all cultures have a need to self-enhance. Thus, it appears that the relationship between individualism and collectivism on the one hand and ERT and MRT on the other hand is more complicated than initially assumed.
The Meaning of Relationships between SDR Scales and Measures of Marketing Constructs
So far, we have examined the two dimensions of the construct of SDR and embedded ERT and MRT in a constellation of associated personality traits, values, sociodemographics, and national culture. We have not yet addressed whether the systematic variance captured by SDR scales always signals stylistic contamination, as has typically been assumed in the marketing literature, or whether it could also indicate substance. In this section, we discuss this vexing issue and expand on previous work by proposing a procedure to assess whether an observed association of ERT or MRT with a substantive marketing scale constitutes nonnegligible bias. Finally, we use our global data set to identify, for nine substantive marketing scales, countries in which potential contamination with ERT and MRT is minimal. In these countries, marketing researchers can use the scale in question without worrying about SDR contamination.
Do ERT and MRT Scales Capture Substance or Style?
In line with Tourangeau and Yan (2007), a high score on an SDR scale may indicate one or more of the following: (1) Although the self-descriptions given are seemingly overly positive, the respondent actually engages in the socially desirable behaviors and refrains from engaging in the reported socially undesirable behaviors; (2) the respondent provides exaggerated self-descriptions, but the self-reports are sincere; and (3) the respondent deliberately presents an inflated self-view to manage a certain impression. In the first case, the SDR scale clearly captures substance, and in the last case, it clearly captures style. The second case is somewhat ambiguous. On the one hand, the self-report is distorted, so if style is equated with bias (i.e., departure from reality), there is stylistic responding. On the other hand, the self-report is sincere, so if “positive illusions” are viewed as a component of the substantive construct of interest, there may be grounds for viewing unconscious bias as substantive (see Paulhus 1991).
To determine whether shared variance between an SDR scale and a measure of a construct of interest is due to substantive or stylistic overlap, several procedures have been proposed. Consider first correlational approaches to separate accurate responses from overly positive self-reports (regardless of whether they are made unconsciously or deliberately). One procedure is based on the idea that if an SDR scale assesses distortion, it should be positively correlated with the extent to which a self-report exceeds a hypothesized unbiased criterion for the self-report. Paulhus and colleagues (2003) call such indexes criterion discrepancy measures and distinguish operational criteria (e.g., intelligence test scores) and social consensus criteria (e.g., ratings by knowledgeable observers). Let S and SDR refer to respondents’ self-reports on a substantive marketing scale of interest and some SDR scale, respectively; let O be an appropriate criterion measure for the self-report (e.g., peer rating); and regress S on both O and SDR:
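The regression equation is not reproduced in this version of the text. Given the definitions of S, O, and SDR and the coefficient a2 interpreted in the next paragraph, it presumably takes the following form; the intercept a0, the criterion coefficient a1, and the error term e are notational assumptions:

```latex
S = a_0 + a_1\,O + a_2\,\mathrm{SDR} + e
```

Under this reading, a2 captures the part of the self-report that SDR predicts over and above the criterion O, that is, the criterion discrepancy.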
If a2 > 0, this supports the validity of SDR as a measure of response distortion. Prior empirical research using this approach has demonstrated that SDR scales indeed assess overly positive responding (e.g., Paulhus 2002; Paulhus et al. 2003; Pauls and Stemmler 2003).
The criterion discrepancy approach has considerable intuitive appeal, but there are several problematic aspects. First, truly objective criterion measures are rare, and even if they exist, they are cumbersome to collect. Second, bias measures based on self- and observer ratings may not be a valid indicator of overly positive self-presentation because (1) a self-rating that is higher than an observer rating does not necessarily indicate self-favoring because the respondent may provide overly positive ratings in general about both self and others (see Kwan et al. 2004) and (2) observer ratings may not be a valid (unbiased) criterion measure (e.g., Konstabel, Aavik, and Allik 2006).
Another procedure is based on a comparison of the criterion-related validity of S (i.e., the self-report measure for the marketing construct of interest) for predicting O (i.e., the rating of the respondent on the same construct by a knowledgeable observer, though in general any external criterion, such as a measure of objective job performance, could be used), with or without controlling for SDR. The two models that are compared are as follows:
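The two models are not reproduced in this version of the text. From the coefficients b′1 and b1 discussed in the following paragraphs, and from the later reference to Equation 5 as the model containing b′1, they can presumably be written as follows; the intercepts, the SDR coefficient b2, and the error terms are notational assumptions:

```latex
\begin{align*}
O &= b'_0 + b'_1\,S + e' \tag{5} \\
O &= b_0 + b_1\,S + b_2\,\mathrm{SDR} + e
\end{align*}
```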
Assuming that O is free of stylistic variance so that the overlap in variance between S and O is solely due to shared substantive variation (or, at a minimum, that the stylistic variance in O is uncorrelated with the stylistic variance in S and SDR), a significant relationship between S and SDR is attributed to style or substance by comparing b′1 with b1: (1) If b1 > b′1, SDR measures primarily style (controlling for SDR purifies the relationship between O and S), and (2) if b1 < b′1, SDR measures primarily substance (controlling for SDR removes substantive variance and thus weakens the original substantive relationship between O and S). The case of b1 ≈ b′1 most likely indicates that SDR measures both style and substance, unless SDR was unrelated to S to begin with, in which case the issue of whether SDR captures substantive or stylistic variance in S does not arise.
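The logic of comparing b′1 with b1 can be illustrated with simulated data. The sketch below is purely hypothetical: it assumes a self-report S made up of a true trait plus a style component, a criterion O that is free of style, and an SDR scale that measures either only style or only the trait. None of these data-generating choices come from the study.

```python
import random


def cov(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def slope_simple(y, x):
    """b'1: coefficient of x when y is regressed on x alone."""
    return cov(x, y) / cov(x, x)


def slope_partial(y, x, z):
    """b1: coefficient of x when y is regressed on both x and z."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    return (cov(x, y) * szz - cov(z, y) * sxz) / (sxx * szz - sxz ** 2)


def simulate(sdr_is_style, n=5000, seed=1):
    """Hypothetical data: S = trait + style + noise, O = trait + noise."""
    rng = random.Random(seed)
    S, O, SDR = [], [], []
    for _ in range(n):
        trait, style = rng.gauss(0, 1), rng.gauss(0, 1)
        S.append(trait + style + rng.gauss(0, 0.5))
        O.append(trait + rng.gauss(0, 0.5))  # assumed free of stylistic variance
        source = style if sdr_is_style else trait
        SDR.append(source + rng.gauss(0, 0.5))
    return S, O, SDR


for label, flag in (("SDR measures style", True), ("SDR measures substance", False)):
    S, O, SDR = simulate(sdr_is_style=flag)
    print(label, "b'1 =", round(slope_simple(O, S), 3),
          "b1 =", round(slope_partial(O, S, SDR), 3))
```

In this setup, controlling for a pure style measure raises the coefficient of S, whereas controlling for a pure substance measure lowers it. The conclusion depends entirely on the assumption that O carries no stylistic variance.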
Beginning with McCrae and Costa's (1983) influential study, research based on the partialing approach has usually shown that the coefficient of S remains unchanged, or that its magnitude decreases in absolute value, when SDR is introduced as a control variable, which is inconsistent with the notion that SDR scales measure only style (for more recent evidence, see Kurtz, Tarquini, and Iobst 2007; Pauls and Stemmler 2003). Unfortunately, despite claims to the contrary, these studies are not as conclusive as they first appear because the assumption of unbiased criterion scores is probably not generally true. If the association between S and O in Equation 5 is due to both substance and style, b′1 is not a useful standard of comparison, because b′1 is inflated as a result of the shared stylistic variance between S and O.
Partial correlation approaches have also been used to check whether relationships between different constructs are influenced by SDR. In this case, the problems associated with the partialing approach are exacerbated because O is no longer a (presumably unbiased) criterion but now refers to a measure of another construct collected from the same respondent. If it is assumed that the association between O and S in Equation 5 is due to substance and SDR measures substance, b1 < b′1 implies that SDR incorrectly removed substantive variance from S and O, whereas if it is assumed that the initial association between O and S was inflated by style and SDR measures style, b1 < b′1 suggests that controlling for SDR successfully removed the confounding effect of stylistic variance (for an example of this type of reasoning, see Mick 1996).
It is clear that all the correlational techniques reviewed so far have serious shortcomings. Consequently, researchers have proposed other procedures that are based on a different logic. These alternative techniques have also been used to differentiate between unconscious and deliberate distortion, which the correlational methods are ill-equipped to handle. The basic idea is to experimentally manipulate the degree of demand for self-presentation and to compare respondents’ scores in “standard” (low-demand) and “fake good” (high-demand) conditions (see Paulhus 2002). Asking respondents to “fake good” should encourage deliberate misrepresentation, so if scores on socially desirable constructs increase relative to the “standard” (control) condition, this provides evidence that conscious SDR can contaminate scores. Of particular relevance, prior research has shown that SDR scales are sensitive to demand manipulations, which suggests that they can capture deliberate distortion (McFarland and Ryan 2006; Pauls and Crost 2004; Roth and Herzberg 2007). An important question that these findings raise is whether respondents naturally distort their answers to questions when situational demands are high, even when they are not explicitly asked to do so.
In summary (see Table 1), there is broad consensus in the psychological literature that SDR scales contain both substantive and stylistic variance. A correlation between a marketing construct and an SDR scale may indicate confounding, but the evidence is not conclusive. Partialing SDR from measures of substantive marketing constructs is of limited usefulness, and more explicit experimental manipulations are needed to establish whether scores on constructs of interest are contaminated.
A Procedure to Check for SDR Bias in Marketing Constructs
On the basis of the foregoing discussion, we propose the following procedure to address the substance versus style issue (for a flowchart, see Figure 2). The first step is to investigate whether there is a potential problem with SDR bias. This involves establishing whether there is an association between the marketing construct of interest and SDR. If the associations with both ERT and MRT are negligible, there is no social desirability problem, and the researcher can safely proceed. If either MRT or ERT has a nonnegligible relationship to the marketing construct, additional analyses are necessary to shed further light on the issue.

Figure 2. Proposed Procedure for Distinguishing between Substance and Style
What constitutes a negligible or nonnegligible relationship probably depends on the magnitude of the associations typically encountered in the area of research under consideration, but from our experience in scale construction and survey research, we propose that a standardized regression coefficient exceeding .2 in absolute value indicates a nonnegligible relationship between SDR and the marketing scale of interest. Because ERT and MRT are positively correlated, a standardized coefficient of .2 roughly corresponds to a zero-order correlation greater than .2, which is halfway between a small and a medium effect size.
The second step is to investigate whether the association between the marketing construct and SDR is due to shared stylistic variance and whether the bias (if there is bias) is unconscious or deliberate. Initially, a conceptual analysis of the situation at hand should be conducted. When demands for favorable self-presentation are minimal (e.g., when the items contained in the scale do not measure highly sensitive topics, the data collection occurs under conditions of guaranteed anonymity of responses, and there is no incentive to manage an impression), it is likely that the self-reports of respondents with relatively high SDR scores are either accurate or distorted unconsciously. In contrast, when there are strong situational demands for favorable self-presentation (i.e., when there are incentives to project a favorable image), respondents with high SDR scores also include those who may dissemble deliberately.
Marketing researchers will probably be most concerned about respondents deliberately distorting their answers to surveys to manage a certain impression. This will occur if situational demands for favorable self-presentation are high. To ascertain whether conscious SDR bias is a problem, the scores on the marketing construct of interest of respondents who are relatively high in SDR in the high-demand situation (either ERT or MRT) should be compared across low- and high-demand conditions (either between or within subjects). If a construct is (not) susceptible to conscious misrepresentation, the distribution of scale scores obtained in the high-demand situation should (not) be significantly different from the distribution obtained in the low-demand situation (see also Pauls and Crost 2004, 2005).
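One way to compare the two score distributions in a between-subjects design is a permutation test on the two-sample Kolmogorov-Smirnov statistic. This is a minimal sketch under stated assumptions: the choice of test, the function names, and the simulated scores are illustrative and are not part of the proposed procedure itself, which only requires that the distributions be compared.

```python
import bisect
import random


def ks_statistic(a, b):
    """Largest absolute gap between the two empirical distribution functions."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for value in a_sorted + b_sorted:
        fa = bisect.bisect_right(a_sorted, value) / len(a_sorted)
        fb = bisect.bisect_right(b_sorted, value) / len(b_sorted)
        gap = max(gap, abs(fa - fb))
    return gap


def permutation_p_value(low, high, n_perm=500, seed=0):
    """Share of random relabelings with a KS statistic at least as large."""
    rng = random.Random(seed)
    observed = ks_statistic(low, high)
    pooled = list(low) + list(high)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(low)], pooled[len(low):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical scale scores of high-SDR respondents in each condition.
rng = random.Random(42)
low_demand = [rng.gauss(3.0, 1.0) for _ in range(80)]
high_demand = [rng.gauss(4.0, 1.0) for _ in range(80)]
print("p =", permutation_p_value(low_demand, high_demand))
```

A small p-value would suggest that the construct is susceptible to conscious misrepresentation. For a within-subjects design, a paired test would be needed instead.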
When a marketing construct is significantly associated with an SDR scale under conditions of low demand, any bias, if it exists at all, is most likely due to unconscious distortion. If it is of interest to separate unconscious distortion from seemingly desirable but accurate responding, a different procedure is needed. Currently, relatively little is known about the cognitive processes involved in SDR, but Holtgraves (2004) indicates that unconscious distortion is a relatively automatic process. This implies that a manipulation encouraging respondents to be more reflective and less impulsive may eliminate the distortion caused by unconscious SDR (for similar arguments in a different context, see Strack and Deutsch 2004). Specifically, to ascertain whether unconscious SDR bias is a problem, the scores on the marketing construct of interest of respondents who are relatively high in SDR in the low-demand situation (either ERT or MRT) should be compared across low-demand and reflective mind-set conditions (either between or within subjects). In the reflective mind-set condition, respondents are encouraged to be more mindful of their behavior relative to the behavior of others or in relation to objective standards of behavior. For accurate responders (because they already respond truthfully), this comparison should make no difference, but for sincere self-deceivers, distortion should decrease in the reflective mind-set condition.
We emphasize that at this point, no empirical evidence on the efficacy of the proposed reflective mind-set manipulation is available. Research by Wilson and Schooler (1991) shows that introspection (analyzing the reasons for a person's preferences, evaluating all attributes of different choice objects) can decrease the quality of preferences and decisions, so it is important to verify that reflection actually reduces nonconscious response distortion and does not introduce another bias.
Effects of ERT and MRT on Marketing Scales: Empirical Evidence from the Global Study
Although a full illustration of the proposed procedure for distinguishing between substance and style is beyond the scope of this article, we briefly present some findings relevant to the first step. To ascertain whether there is a potential for SDR bias in marketing constructs, it is necessary to relate the marketing scale of interest to separate measures of ERT and MRT. Unfortunately, scale development in marketing has typically relied on the Marlowe–Crowne scale, so we know little about the (differential) effects of ERT and MRT on marketing constructs. We know even less about the effects of ERT and MRT in other countries because most scale development work has been carried out in the United States.
We collected data for scales of nine important marketing constructs: susceptibility to normative influence, innovativeness, deal proneness, nostalgia, quality consciousness, material success, environmental consciousness, consumer ethnocentrism, and health consciousness (for details, see the Web Appendix at http://www.marketingpower.com/jmrapril10). We investigate potential SDR bias in these scales by regressing respondents’ scores on the nine marketing scales on their ERT and MRT scores. We did this for each country and each scale separately because we were interested in the differential effects across countries. Figure 3 displays plots of the standardized regression coefficients for ERT and MRT.
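The per-country screening step can be sketched as follows. The code regresses a scale on ERT and MRT within each country, using the closed-form solution for two standardized predictors, and flags countries in which either coefficient exceeds the .2 cutoff in absolute value. The data layout, function names, and simulated observations are assumptions for illustration only.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def standardized_betas(y, ert, mrt):
    """Standardized coefficients of ERT and MRT in the regression of y on both."""
    zy, z1, z2 = standardize(y), standardize(ert), standardize(mrt)
    n = len(zy)
    r_y1 = sum(a * b for a, b in zip(zy, z1)) / n
    r_y2 = sum(a * b for a, b in zip(zy, z2)) / n
    r_12 = sum(a * b for a, b in zip(z1, z2)) / n
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


def screen_countries(rows, cutoff=0.2):
    """rows: iterable of (country, scale_score, ert, mrt) tuples."""
    by_country = defaultdict(list)
    for country, y, ert, mrt in rows:
        by_country[country].append((y, ert, mrt))
    result = {}
    for country, obs in by_country.items():
        y, ert, mrt = zip(*obs)
        b_ert, b_mrt = standardized_betas(y, ert, mrt)
        flagged = abs(b_ert) > cutoff or abs(b_mrt) > cutoff
        result[country] = {"ERT": b_ert, "MRT": b_mrt, "nonnegligible": flagged}
    return result


# Hypothetical data: in country "A" the scale depends on MRT, in "B" it does not.
rng = random.Random(7)
rows = []
for _ in range(2000):
    ert, mrt = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append(("A", 0.8 * mrt + rng.gauss(0, 0.5), ert, mrt))
    rows.append(("B", rng.gauss(0, 1), ert, mrt))

for country, summary in screen_countries(rows).items():
    print(country, summary)
```

A flagged country signals only potential contamination; whether the association reflects style or substance still has to be established in a follow-up analysis.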

Figure 3. Plots of Within-Country Standardized Regression Coefficients of ERT and MRT for Nine Substantive Marketing Scales
The plots show several things. First, they reveal countries in which researchers can use a particular scale without worrying about stylistic contamination (assuming that the conditions of scale administration are similar to those in this study). Second, the plots identify country–marketing scale combinations in which the effect of ERT or MRT exceeds the |.2| cutoff. For most marketing scales, there is a subset of countries in which there is a relatively strong relationship between the marketing scale and either ERT or MRT. This finding underlines the importance of studying social desirability in cross-cultural survey research.
Third, there are no cases in which a substantive marketing scale exhibits a substantial relationship to both ERT and MRT. In other words, marketing scales apparently share variance with either agency- or communion-related SDR, but not both. Fourth, in the United States, where most scales were developed, in general, social desirability does not seem to contaminate scale scores, which increases our confidence in the validity of published findings in marketing. The only exceptions are material success (for a similar observation, see Mick 1996) and health consciousness.
In countries in which there is a nonnegligible relationship between a particular marketing scale and ERT/MRT, additional analyses (as described previously) are required to determine whether scale scores are actually contaminated by SDR. Further research is necessary to provide more conclusive evidence about whether potential contamination translates into actual contamination and, more generally, how effective the proposed experimental procedures are in identifying unconscious and deliberate distortion in marketing constructs.
Discussion
Socially desirable responding has been of long-standing interest to the field of marketing. Unfortunately, the construct has not always been well understood by marketing researchers, which has led to misconceptions and erroneous practices. The purpose of this article is to remedy this unsatisfactory state of affairs. We provide a review of the SDR literature organized around three key issues—(1) the construct of SDR; (2) the theoretical constellation of personality traits, values, sociodemographics, and cultural dimensions associated with SDR; and (3) the vexing issue of substance versus style in SDR scales. We highlight the state of the literature, identify unresolved issues, and present results from an extensive global study to reinforce what is known and to address several unresolved issues. Importantly, our findings pertaining to the basic measurement characteristics and the theory-based constellation of related constructs provide strong support for the nomological validity of the ERT and MRT measures on a global basis.
Our study provides several concrete guidelines for marketing researchers. First, there are two distinct, content-based dimensions of SDR, grounded in different modalities of human experience, which are differentially affected by personal and cultural factors. Consequently, the use of the unidimensional Marlowe–Crowne scale should be discontinued. It confounds the two SDR dimensions, and as such, it is unclear what it really measures. Instead, marketing research should include dedicated scales for ERT and MRT, Paulhus's BIDR scale being the preferred instrument. Our study supports the nomological validity of the BIDR in international applications for countries ranging from France to China. To facilitate implementation of this recommendation, the Web Appendix (http://www.marketingpower.com/jmrapril10) provides the translation of the 20-item short form of the BIDR in the 19 languages represented in our study.
Second, the assumption that a correlation between a marketing scale and an SDR measure invariably indicates contamination is unwarranted. Although a significant correlation between ERT or MRT and a substantive scale should be taken seriously because it may signal bias, it is necessary to conduct more detailed follow-up work to establish whether the observed association is due to substance or style. We outlined the contours of such a procedure. A corollary of the previous point is that the widespread practice in scale development research of purifying scales by deleting items that correlate highly with SDR scales may actually reduce the construct validity of the scale, unless it is established that this association is driven by style.
Third, for nine substantive marketing scales, we identify countries in which potential contamination with ERT and MRT is minimal (Figure 3). In these countries, marketing researchers can use the scale in question without worrying about possible SDR contamination. These marketing scales measure important constructs, and the countries included in our study cover more than 80% of total global market research (Marketing News 2008). To facilitate use of these important scales outside the United States, translations of the scale items are available on request. We also identify countries in which SDR contamination might be an issue for the construct in question and in which a follow-up experiment is necessary before we can conclusively decide whether the substantial effect of ERT or MRT on that construct is primarily due to style or substance.
Suggestions for Further Research
Our cross-national empirical study sheds light on several unresolved issues (Table 1), but important issues remain. Our test of the theoretical constellation of SDR is based on main effects. As such, it shares the limitation of previous research of being potentially susceptible to common method bias. Further research might address this issue by including interactions. The information on fixed versus random effects (Table 5) is useful in directing such research. For most traits and values, we found that the variance component was significant for ERT and/or MRT. This indicates that the effect in question varies across countries. Work by Church (2000), Van de Vijver, Van Hemert, and Poortinga (2008), and others on how psychological constructs and culture interact in shaping people's responses to the environment may prove useful in developing a conceptual rationale for such cross-level interactions.
Currently available self-report measures of ERT and MRT cannot reliably distinguish between conscious and nonconscious SDR. It is unclear whether it is possible to construct such scales, but the issue certainly warrants additional research before a conclusion can be reached. Furthermore, an intriguing result was the unexpected negative effect of cultural individualism on ERT. Further research should probe more deeply the theoretical mechanisms underlying this effect, which probably requires experimental studies.
We proposed a new procedure to clarify the meaning of relationships between SDR scales and substantive marketing constructs, but only the first part of the procedure was illustrated with our data set. Future studies should further develop, test, and refine the procedure in its entirety. An issue that is especially pertinent is whether the reflective mind-set condition is effective in removing SDR bias.
Survey research would also benefit from a better understanding of when style is more important than substance, and vice versa. Such efforts could ultimately lead to informed predictions regarding SDR contamination versus substantive overlap in marketing scales. It is also important to explore the reasons particular countries exhibit substantial versus negligible effects of ERT or MRT on a specific marketing scale.
Although our samples were broadly representative on key sociodemographics, this does not guarantee that the samples were also representative on personality and values. If the relationship between traits and values and willingness to participate in surveys differs across countries, this may also give rise to country differences. Further research could attempt to investigate this issue.
Finally, we note that SDR scales are used to correct for SDR after the data have been collected. There are also procedures that aim to change the design of the survey to prevent SDR from biasing measures in the first place. Various methods have been proposed, such as indirect questioning (Fisher 1993), bogus pipeline techniques (Roese and Jamieson 1993), and, most recently, item randomized response (De Jong, Pieters, and Fox 2010). The item randomized response procedure in particular seems to have considerable value, and research that compares the effectiveness of the different methods is necessary.
Much remains to be studied before we can offer more definitive answers about whether social desirability is a serious problem in survey research, but we hope that this article provides an impetus to other marketing researchers to make SDR the focus of some of their own work.
