Analyzing factorial survey data with structural equation models

Abstract

In factorial survey designs, respondents evaluate multiple short descriptions of social objects (vignettes) that experimentally vary different levels of attributes of interest. Analytical methods (including individual-level regression analysis and multilevel models) estimate the weights (or utilities) assigned to the levels of the different attributes by participants to arrive at an overall response to the vignettes. In the current paper, we explain how data from factorial surveys can be analyzed in a structural equation modeling framework using an approach called structural equation modeling for within-subject experiments. We review the use of factorial surveys in social science research, discuss typically used methods to analyze factorial survey data, introduce the structural equation modeling for within-subject experiments approach, and present an empirical illustration of the proposed method. We conclude by describing several extensions, providing some practical recommendations, and discussing potential limitations.

Keywords

Factorial survey, structural equation modeling, conjoint analysis, vignette study, survey methods

In factorial surveys (sometimes referred to as vignette studies), respondents are asked to evaluate multiple vignettes, which are short descriptions of social objects that experimentally vary attributes of interest (Auspurg and Jäckle 2017). This experimental variation allows researchers to quantify the effect of each of the attributes on respondents’ evaluations (Auspurg and Hinz 2015; Auspurg and Jäckle 2017; Dülmer 2016; Shamon, Dülmer, and Giza forthcoming; Su and Steiner 2020). The aim of factorial surveys is to estimate the relative contribution of different levels of each attribute to overall evaluations (Wallander 2009). For instance, a researcher could use a factorial survey to assess the impact of different levels of racial diversity (the attribute) in a neighborhood (the social object) on willingness to live there (the evaluation; Emerson, Chai, and Yancey 2001). In such a study, respondents would be asked to rate their willingness to live in neighborhoods described in vignettes that experimentally vary racial diversity (probably along with some other neighborhood attributes). By contrast, in a more traditional approach, respondents would indicate their degree of agreement with classical Likert-type items such as “I prefer to live in a neighborhood with people of the same race as me.”

Factorial surveys are increasingly being used in the social sciences in general and in sociology in particular (Auspurg and Hinz 2015; Dülmer 2016; Jasso 2012; Liebe et al. 2020; Su and Steiner 2020), in part because vignettes offer several advantages over classical (Likert-type) attitude items (Dülmer 2007, 2016; Jasso 2006; Shamon et al. forthcoming; Wallander 2009). First, analyzing people's responses to vignettes containing concrete descriptions is likely to yield more realistic representations of judgments and evaluations than asking people to indicate their reaction to certain attributes of social objects directly. In particular, a major benefit of factorial surveys is that they incorporate the tradeoffs between attributes that frequently arise in practice. Second, responses to vignettes tend to be less prone to social desirability bias because when an object is described on several attributes simultaneously, participants may feel less pressure to defend the relative influence of each attribute on their response, as each response can be justified in multiple ways (Tomassetti, Dalal, and Kaplan 2016).

When researchers analyze data from factorial surveys, they employ analytical methods to estimate the weights (or utilities) assigned to the levels of the different attributes by participants to arrive at an overall response to the vignettes. This is typically done by estimating a regression-type model in which the response variable is specified as a linear function of the levels of the vignette attributes (Jasso 2006). In the current paper, we propose that data from factorial surveys can be analyzed in a structural equation modeling framework using an approach called Structural Equation Modeling for Within-Subject Experiments, or SEMWISE for short (Weijters and Baumgartner 2019). In this approach, a factor model is specified in which the responses to the different vignettes serve as the indicators of latent weight factors that represent individual variation in the weights or utilities assigned to the levels of the attributes manipulated in the factorial survey. Individual differences in how strongly different attributes and their levels influence overall ratings can then be related to various antecedents or consequences in an integrated model. In addition, the usual advantages of structural equation modeling apply: measurement error in the observed responses can be accounted for, the fit between model and data can be thoroughly assessed based on established model fit indices, and the generalizability of relationships across groups of respondents can be investigated using multisample models.

In the following, we will review the use of factorial surveys in social science research, discuss typically used methods to analyze factorial survey data, introduce the SEMWISE approach, compare the SEMWISE model to the multilevel approach, demonstrate its use with an empirical example, and suggest some possible modeling extensions. We will conclude by making some practical recommendations and discussing potential limitations of this approach.

Factorial surveys

Factorial surveys in the social sciences and sociology

Factorial surveys are popular in marketing research, policy studies, organizational research, and applied psychology, as well as in sociology (Liebe et al. 2020; Shamon et al. forthcoming). In research domains outside sociology, the method is often presented using different names (the most common names include conjoint analysis in marketing; policy capturing or judgment analysis in organizational research; factorial surveys or vignette studies in sociology and political science). This has led to cross-disciplinary fragmentation, and consequently researchers in one discipline are frequently not aware of the methodological advances of the approach in other disciplines (Aiman-Smith, Scullen, and Barr 2002).

In marketing research, the term rating-based conjoint analysis is used for factorial surveys in which each respondent rates multiple vignettes (usually descriptions of products along various attributes) in terms of attractiveness or purchase intention. Conjoint analysis (which can also be based on choices between products) has a long research tradition in marketing (Green and Rao 1971) and has grown into a highly specialized and sophisticated academic field (Carroll and Green 1995; Green, Krieger, and Wind 2001; Green and Srinivasan 1990). The method is also widely used by applied marketing researchers (Gustafsson, Herrmann, and Huber 2000), particularly for designing new products, making pricing decisions, and forecasting market shares following product introductions (Green and Srinivasan 1990). The goal of conjoint analysis is to derive respondent-specific weights (called part-worth utilities) that show the relative contribution of the different levels of the product attributes of interest (e.g. whether a coffee is organic or not, has a good or average flavor, and a low vs. high price) to a product's overall utility.

Factorial surveys are also popular in organizational research and applied psychology, where they are called policy capturing (Aiman-Smith et al. 2002; Karren and Barringer 2002; Tomassetti et al. 2016) or, less commonly, judgment analysis (Tomassetti et al. 2016). As these names suggest, in organizational research the method is typically used to understand issues such as how people in organizations make hiring decisions, assign performance ratings, or set compensation. In this context, policy capturing is employed to study which factors influence people's judgments and how heavily each factor is weighted (Aiman-Smith et al. 2002). Similar to marketing, this method has a long tradition dating back almost half a century (Doherty and Keeley 1972; Mertz and Doherty 1974).

In sociology, factorial surveys are mainly used to study the factors that drive social responses (including evaluations, judgments, and beliefs) toward social objects such as individuals, groups, or situations (Dülmer 2007, 2016; Emerson et al. 2001; Jasso 2006; Liebe et al. 2020; Shamon et al. forthcoming; Wallander 2009). People routinely judge multidimensional phenomena in everyday life, as the following examples illustrate (Shlay 2010): deciding whether to buy a house based on factors such as location, price, and neighbors; determining whether a sex act constitutes rape based on the attire of the victim, the presence of alcohol, and other factors; evaluating the appropriateness of a mate based on education, family background, and ethnicity; or voting for a president based on political position, race or ethnicity, and gender.

Analyzing factorial surveys

Factorial surveys are typically analyzed using linear regression models or extensions thereof (Wallander 2009). In these models, the vignette attribute levels are used as the (categorical) independent variables and vignette ratings as the dependent variable. Table 1 provides an overview and examples of regression-based analytical approaches that have been used in sociological research. The most straightforward approach is to run a pooled regression across all individual vignette ratings provided by respondents, estimating one intercept term and one set of regression coefficients for the entire sample of respondents (see row 1 in Table 1). As an example, Schwappach and Gehring (2014) study how attributes of medical situations (error in checking a prescription, missed hand disinfection, rule violations in medication preparation, rule violation during lumbar puncture) affect the likelihood of speaking up about safety concerns among healthcare professionals, using pooled multiple linear regression (i.e. one set of regression coefficients is estimated for the entire sample of respondents). This approach has two key shortcomings. First, since each respondent contributes multiple data points, the assumption that all observations are sampled independently from an underlying population is violated. Second, estimating a single equation for all respondents assumes that the effects of the vignette attributes on responses are homogeneous across respondents, which also prevents researchers from studying potential antecedents or consequences of between-respondent differences in how vignette attributes affect ratings. Whereas the first issue is mainly a statistical nuisance, as it may result in faulty estimates of standard errors if the dependencies of the observations are not accounted for, the second issue may have important substantive implications.

Table 1.

Examples of how factorial surveys in sociology have been analyzed.

Type of analysis	Example	Intercept(s)	Attribute weights	Antecedents of weights
(1) Pooled regression	Schwappach and Gehring (2014)	Pooled	Pooled	N.A.
(2) Pooled regression with fixed effects for respondents	Di Stasio and Gerxhani (2015)	A fixed individual intercept term is estimated for each respondent (i.e. fixed effects)	Pooled	N.A.
(3) Multilevel regression with random intercept	Lyons (2008)	A random intercept term captures between-respondent variation in mean ratings	Pooled	N.A.
(4) Individual regressions	Jasso and Opp (1997)	A fixed intercept term is estimated for each respondent (i.e. fixed effects)	Separate fixed regression coefficients are estimated for each respondent	(Would require a separate analysis using the estimates of the individual regression coefficients as the data)
(5) Multilevel regression with random intercept and random coefficients	Finger (2016)	A random intercept term captures between-respondent variation in mean ratings	Random regression coefficients capture between-respondent variation in the effects of attribute levels on ratings	Interaction terms involving the attribute levels at the within-level and observed variables at the between-respondent level
(6) SEMWISE	Current study	Intercept factor	Weight factors	Regression coefficients linking weight factors to observed and/or latent respondent variables

SEMWISE: structural equation modeling for within-subject experiments.

To address the issue of nonindependent observations, researchers can use a fixed effects model in which a separate intercept is estimated for each individual respondent (see row 2 in Table 1). As an example, Di Stasio and Gerxhani (2015) use this approach to study how applicant characteristics (gender, previous work experience, participation in an internship at the firm, level of education, field of study, study duration, grade point average, extracurricular activities) affect hiring propensity, applicant trainability, and applicant–organization fit among a sample of English employers in the Information and Communication Technology industry. Alternatively, researchers can use a random intercept model in which the individual intercepts are assumed to follow a known probability distribution (usually a normal distribution) and only the mean and variance of this distribution have to be estimated (see row 3 in Table 1). As an example, Lyons (2008) studies how incident characteristics (offender race, number of offenders, victim race, victim sex, victim sexuality) affect perceptions of seriousness of the offense among students, using a random intercept specification to account for data dependencies at the respondent level.

Allowing for variation in the intercept does not address the limiting assumption of effect homogeneity in the other regression coefficients. Researchers in sociology analyzing factorial surveys are often interested not only in the social component (the generally agreed-upon or average component) but also in the individual component (the individual deviation from the social component) of people's judgments (Rossi and Anderson 1982). For example, respondents may favor neighborhoods with low racial diversity on average (the social component), but some may do so more than others and a few may actually prefer higher racial diversity (i.e. there are individual deviations from the social component). Although some researchers have shown an interest in individual differences, Wallander (2009) points out, based on her extensive review of the literature, that many researchers (particularly in earlier studies) have largely overlooked the potentially relevant between-respondent variation in judgments.

Researchers who have accounted for individual differences in the weighting of vignette attributes have done so in different ways. One solution is to estimate a separate regression model for each individual respondent (see row 4 in Table 1). These individual-level regression weights could then be used in a subsequent analysis in which the weights are linked to individual background variables such as age or sex (Jasso 2006) and/or other observed respondent-level variables (Castillo, Olivos, and Azar 2019; Finger 2016). As an example, Jasso and Opp (1997) study how characteristics of protests (economical/political discontent, legal/illegal protest, low/high personal influence, personal risk, expected number of participants, gender) affect normative evaluations of protest participation among German citizens, using individual-level regression models to account for the fact that respondents differentially weigh the protest characteristics in their evaluations. The disadvantage of individual regressions is that they require the estimation of a large number of parameters, and these estimates may be unstable, particularly when the number of repeated observations per person is relatively small. Recent methodological advances and related software developments have encouraged the use of multilevel modeling to analyze factorial surveys (Shamon et al. forthcoming). In multilevel models (see row 5 in Table 1), random regression coefficients capture the between-respondent variation in the effects of different attribute levels on overall ratings. As in individual-level regression, the individual variation in the random coefficients can be related to various antecedents specified to explain this variability (technically, these effects are cross-level interactions between the within-respondent and between-respondent effects), but the advantage of random effects models is that the parameter estimates tend to be much more stable. As an example, Liebe et al. (2020) study the effect of offense (knocked over milk, stole milk), type of sanction (scolding, beating), and sanctioning person (relative, nonrelative) on the acceptability of punishment among rural Benin residents, accounting for between-respondent heterogeneity in average ratings and effects by means of a multilevel model involving a random intercept and random slopes. As another example, Finger (2016) studies the effect of application situations (distance, reputation, selection procedure, information about application and admission procedure of the university, personal interest in subject, social networks, size of the university city) on intention to apply to university among future students as a function of their academic background, using a multilevel model with random intercepts and random slopes and interaction terms between the vignette attributes and respondent-level variables.

In this paper we will present another approach for analyzing factorial surveys called SEMWISE, which generally yields the same results as multilevel modeling but offers important advantages under specific circumstances, as described in more detail below. In the following, we will first provide an intuitive explanation of how SEMWISE works using a simple example of a specific factorial survey. We will then present a more formal development of the SEMWISE approach and demonstrate its equivalence to the multilevel modeling framework.

Analyzing factorial surveys using SEMWISE

In this section, we first provide an intuitive explanation of the SEMWISE approach to introduce the basic idea and then develop the SEMWISE model more formally. Since it is likely that many readers are already familiar with multilevel modeling, we start out with the multilevel specification and then show how the multilevel specification can be translated into the SEMWISE specification.

Intuitive explanation of the SEMWISE approach

As a specific example, consider a factorial survey in which two attributes of immigrant profiles are varied orthogonally at two levels each: (1) the country of origin (COO) of the immigrant is either Canada or Mexico, and (2) the immigrant's level of education is having or not having a university degree. Respondents rate four immigrant profiles or vignettes (two countries of origin by two levels of education) by indicating to what extent they approve or disapprove of an immigrant from a given country with a given level of education being allowed to enter the country based on a rating scale ranging, for instance, from 0 (definitely disapprove) to 10 (definitely approve). The SEMWISE approach can then be used to model individual variation in the extent to which respondents are (a) more or less accepting of immigrants in general, (b) more or less sensitive to immigrants’ COO (i.e. Canada vs. Mexico in the present case), and (c) more or less sensitive to immigrants’ level of education (i.e. whether or not they have a university degree). This is done as follows.

The ratings of the four immigrant profiles per person are used as the four observed indicators of three latent factors: an intercept factor (F_INT), which captures respondents’ average approval rating of immigrants in general; a weight factor for country of origin (F_COO), which captures the effect of immigrant country of origin (Canada vs. Mexico) on respondents’ approval ratings; and a weight factor for university degree (F_UD), which captures the effect of immigrant education level (university degree vs. no university degree) on respondents’ approval ratings. The distinguishing feature of the SEMWISE approach is that the loadings of the weight factors are not free model parameters but fixed to specific values such that they correspond to the different attribute levels. Although different coding schemes are possible, we propose to use effect coding, which means that the loading on the F_COO weight factor is +1 (−1) for profiles in which the immigrant is from Canada (Mexico) and the loading on the F_UD weight factor is +1 (−1) for profiles in which the immigrant has (does not have) a university education. The loadings on the intercept factor F_INT are fixed to +1 for all profiles.

Figure 1 shows how the SEMWISE model decomposes the overall ratings of the four profiles into component utilities for one fictitious respondent. Although the factor loadings are fixed for all respondents, the factor scores for the intercept factor and the two weight factors capture how this respondent rates immigrants in general and how an immigrant's COO and level of education influence the overall ratings. In the example, the respondent's average approval rating of immigrants in general is 4 on a scale from 0 (definitely disapprove) to 10 (definitely approve), and the respondent is more approving of immigrants from Canada than Mexico and less approving of immigrants with a university education than those without a university education. To illustrate the interpretation of Figure 1, consider a fictitious respondent who has latent factor scores of F_INT = 4, F_COO = 2, and F_UD = −1 and whose ratings are, for the sake of the example, perfectly determined by the factor scores and profile attributes. The factor scores can be interpreted as follows. First, the factor score of 4 for the intercept factor indicates that this respondent gave the vignettes an average rating of 4 (since all vignette ratings have a unit loading on the intercept factor, they are all affected equally by the intercept factor score). Second, the factor scores of +2 and −1 for the two weight factors mean that the respondent provided higher ratings for Canadian immigrants (whose factor loading is +1 which, when multiplied by the factor score of +2, increases the overall rating by 2) than Mexican immigrants (whose factor loading is −1 which, when multiplied by the factor score of +2, decreases the overall rating by 2), and that the respondent also gave lower ratings to immigrants with a university degree (whose factor loading is +1 which, when multiplied by the factor score of −1, decreases the overall rating by 1) than immigrants without a university degree (whose factor loading is −1 which, when multiplied by the factor score of −1, increases the overall rating by 1).
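The decomposition for this fictitious respondent can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the profile order (Canada with degree, Canada without degree, Mexico with degree, Mexico without degree) and the names `LOADINGS` and `implied_ratings` are assumptions made for this example, not part of the SEMWISE specification itself.

```python
# Effect-coded loadings for the four immigrant profiles (assumed profile order).
# Columns: intercept factor, country-of-origin factor, university-degree factor.
PROFILES = [
    "Canada, degree",
    "Canada, no degree",
    "Mexico, degree",
    "Mexico, no degree",
]
LOADINGS = [
    (1, +1, +1),  # Canada, university degree
    (1, +1, -1),  # Canada, no university degree
    (1, -1, +1),  # Mexico, university degree
    (1, -1, -1),  # Mexico, no university degree
]


def implied_ratings(f_int, f_coo, f_ud):
    """Rating of each profile implied by one respondent's factor scores (no error)."""
    return [
        l_int * f_int + l_coo * f_coo + l_ud * f_ud
        for l_int, l_coo, l_ud in LOADINGS
    ]


# Factor scores of the fictitious respondent described in the text.
ratings = implied_ratings(f_int=4, f_coo=2, f_ud=-1)
for profile, rating in zip(PROFILES, ratings):
    print(f"{profile}: {rating}")

# The average of the four ratings recovers the intercept factor score.
average_rating = sum(ratings) / len(ratings)
```

Under these assumptions the four implied ratings are 5, 7, 1, and 3, and their average equals the intercept factor score of 4.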

Figure 1.

SEMWISE (structural equation modeling for within-subject experiments) model rationale for one fictitious respondent.

Figure 2 shows the specification of the core SEMWISE model across all respondents in graphical form. The intercept and weight factors now capture individual variation in average ratings and the influence of the manipulated attributes on overall ratings. Note that the intercept and weight factors can be correlated. For instance, a positive correlation between F_COO and F_UD would indicate that respondents who like Canadian (vs. Mexican) immigrants also tend to like immigrants with (vs. without) a university degree. The intercept and weight factors each have a mean (the gamma (γ) parameters in the figure), which represents the social component of the effects of the immigrant attributes on approval ratings. The intercept and weight factors also have a variance, which represents the individual component (as variation around the mean) in the effect of the immigrant attributes on approval ratings. Finally, the approval ratings each have a unique factor, which captures random error and other stochastic variance not related to either the average rating or the effect of the two immigrant profile attributes.

Figure 2.

Structural equation modeling for within-subject experiments model.

In the current example, there are two binary attributes and four profiles, and three latent factors (one intercept factor and two weight factors) are needed to model the variation in the intercept and the effect of the binary attributes on ratings. As specified, the SEMWISE model in Figure 1 only accounts for the main effects of University Degree (UD) and COO on the profile ratings; the fit of the model to the data indicates whether this specification is appropriate (extensions that include interactions will be described below). Researchers can formally test whether a specified model is consistent with the data, using the usual chi-square (χ²) goodness-of-fit test, or rely on alternative fit indices and conventional rules of thumb (e.g. root mean square error of approximation (RMSEA) ≤ .06; standardized root mean square residual (SRMR) ≤ .08; comparative fit index (CFI) ≥ .95; Tucker-Lewis index (TLI) ≥ .95; Hu and Bentler 1999) to assess whether the model provides a reasonable approximation to the data. If the model is judged to be adequate, the parameter estimates of interest can be interpreted; typically, the interpretation will focus on the intercept and weight factor mean and variance estimates (as illustrated in Figures 1–3).
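As a small aid for applying these rules of thumb, the cutoffs can be written down as a lookup table and checked mechanically. This is a hedged sketch: the function name `check_fit` and the example fit values are hypothetical, and the cutoffs are conventions rather than strict tests, so a failed check calls for judgment and not automatic rejection.

```python
# Conventional cutoffs cited in the text (Hu and Bentler 1999).
# Each entry maps an index name to (direction, threshold):
# "max" means the index should not exceed the threshold,
# "min" means it should not fall below it.
CUTOFFS = {
    "rmsea": ("max", 0.06),
    "srmr": ("max", 0.08),
    "cfi": ("min", 0.95),
    "tli": ("min", 0.95),
}


def check_fit(indices):
    """Return {index: True/False} for each reported index with a known cutoff."""
    verdict = {}
    for name, value in indices.items():
        direction, threshold = CUTOFFS[name]
        if direction == "max":
            verdict[name] = value <= threshold
        else:
            verdict[name] = value >= threshold
    return verdict


# Hypothetical fit statistics, invented purely for illustration.
example = {"rmsea": 0.04, "srmr": 0.03, "cfi": 0.98, "tli": 0.94}
result = check_fit(example)
print(result)
```

In this invented example only the TLI falls short of its conventional threshold.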

Figure 3.

Extended structural equation model for within-subject experiments.

Once the core model has been tested and found to be acceptable, researchers can extend the model by embedding the intercept and weight factors in a broader nomological net of antecedents, correlates, and consequences. Figure 3 illustrates how latent antecedents and consequences can be related to the SEMWISE factors (a similar approach applies to observed variables). For example, personal values could be added as antecedents of the intercept and weight factors (this will be discussed further in the empirical application), and willingness to sign an anti-immigration petition could be considered as a potential consequence. Before including antecedents and consequences, we recommend that researchers carefully evaluate the measurement model underlying the latent antecedents and/or consequences in a preliminary analysis. Also, when the intercept and weight factors are modeled as dependent variables, residual covariance terms should be included because it is unlikely that the antecedents considered in the model will be able to account for all of the shared variance between the factors. This would also apply to the situation in which the factors are treated as antecedents because in most cases the factors will be correlated. If the extended model shows acceptable fit, the key parameters of interest are the regression coefficients relating the antecedents to the intercept and weight factors or the intercept and weight factors to the consequences. Their interpretation will be further illustrated in the empirical example. The next section provides a more formal development of the SEMWISE model and compares it to a multilevel specification that may be more familiar to some readers; readers who are unfamiliar with multilevel modeling can skip to the empirical illustration.

Formal specification of the core SEMWISE model

In this section, we develop and explain the SEMWISE model formally and contrast it with the more common multilevel modeling approach. For ease of exposition, we again use the same example as before (see Figure 1). Since there are four ratings per person, the data set consists of R × 4 observations, where R is the total number of respondents. To analyze the data by means of multilevel modeling, the data set is typically structured in long format: Each record contains a particular respondent's rating of one of the four vignettes. In other words, in long format, the repeated observations of a given respondent are separate records and, consequently, each respondent is represented using four lines in the data set.
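To make the long format concrete, the sketch below builds such a data set for two hypothetical respondents and also pivots it to one record per respondent, the layout that SEM software typically expects. All ratings, the vignette numbering, and the helper name `to_wide` are assumptions made for illustration.

```python
# Assumed vignette numbering: vignette -> (COO, UD), effect coded.
DESIGN = {1: (+1, +1), 2: (+1, -1), 3: (-1, +1), 4: (-1, -1)}

# Hypothetical ratings of vignettes 1..4 for two respondents.
RATINGS = {1: [5, 7, 1, 3], 2: [6, 6, 4, 4]}

# Long format: one record per respondent-vignette combination (R x 4 records).
long_rows = [
    {
        "respondent": r,
        "vignette": v,
        "coo": DESIGN[v][0],
        "ud": DESIGN[v][1],
        "rating": ys[v - 1],
    }
    for r, ys in RATINGS.items()
    for v in sorted(DESIGN)
]


def to_wide(rows):
    """Pivot long records into one record per respondent with columns Y1..Y4."""
    wide = {}
    for row in rows:
        record = wide.setdefault(
            row["respondent"], {"respondent": row["respondent"]}
        )
        record[f"Y{row['vignette']}"] = row["rating"]
    return [wide[key] for key in sorted(wide)]


wide_rows = to_wide(long_rows)
print(len(long_rows), "long records ->", len(wide_rows), "wide records")
```

With R = 2 respondents this yields 8 long records and 2 wide records.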

A multilevel model for these data can be specified as follows. The level-1 model is:

$$Y_{rv} = \beta_{0r} + \beta_{1r}\,\mathrm{COO} + \beta_{2r}\,\mathrm{UD} + \varepsilon_{rv}, \tag{1}$$

where $Y_{rv}$ is respondent r's (r = 1 to R) rating of vignette v (v = 1 to 4), COO is the country of origin of the immigrant, UD indicates whether or not the immigrant has a university degree, and $\varepsilon_{rv}$ is an error term. The coefficients $\beta_{0r}$, $\beta_{1r}$, and $\beta_{2r}$ are the intercept and the effects of COO and UD on Y, respectively. Note that all three coefficients are respondent-specific (i.e. random). In factorial surveys, the independent variables will usually be effect- or dummy-coded nominal variables. For example, effect-coding would yield COO = +1 or −1 depending on whether the immigrant's COO is Canada or Mexico, respectively, and UD = +1 or −1 depending on whether the immigrant has or does not have a university degree.

The level-2 model is given by:

$$\beta_{0r} = \gamma_{00} + \upsilon_{0r}, \tag{2}$$

$$\beta_{1r} = \gamma_{10} + \upsilon_{1r}, \tag{3}$$

and

$$\beta_{2r} = \gamma_{20} + \upsilon_{2r}, \tag{4}$$

where $\gamma_{00}$, $\gamma_{10}$, and $\gamma_{20}$ are the fixed effects (or means) corresponding to the three coefficients and $\upsilon_{0r}$, $\upsilon_{1r}$, and $\upsilon_{2r}$ are the respondent-specific deviations from the fixed effects. It is usually assumed that the means of $\varepsilon_{rv}$, $\upsilon_{0r}$, $\upsilon_{1r}$, and $\upsilon_{2r}$ are zero and that $\varepsilon_{rv}$ is uncorrelated with $\upsilon_{0r}$, $\upsilon_{1r}$, and $\upsilon_{2r}$. However, the $\upsilon_{ir}$ are allowed to covary.

Substituting equations (2) to (4) into equation (1) yields

$$Y_{rv} = \gamma_{00} + \gamma_{10}\,\mathrm{COO} + \gamma_{20}\,\mathrm{UD} + \upsilon_{0r} + \upsilon_{1r}\,\mathrm{COO} + \upsilon_{2r}\,\mathrm{UD} + \varepsilon_{rv}. \tag{5}$$

Usually, researchers are interested in the fixed effects, which indicate the average rating ($\gamma_{00}$) across all immigrant profiles and the average effect of COO and UD on the ratings ($\gamma_{10}$ and $\gamma_{20}$), respectively, as well as the variability of the coefficients across respondents (i.e. $\mathrm{Var}(\upsilon_{0r})$, $\mathrm{Var}(\upsilon_{1r})$, and $\mathrm{Var}(\upsilon_{2r})$), and possibly the covariances between $\upsilon_{0r}$, $\upsilon_{1r}$, and $\upsilon_{2r}$.
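One way to build intuition for equation (5) is to simulate ratings from it and check that simple averages recover the fixed effects. The following is a minimal sketch under stated assumptions: all parameter values are hypothetical, the random effects are drawn independently (the model itself allows them to covary), and all disturbances are normal.

```python
import random

random.seed(1)

# Hypothetical parameter values, chosen only for illustration.
GAMMA = (4.0, 0.5, -0.25)  # gamma_00, gamma_10, gamma_20
SD_U = (1.0, 0.5, 0.5)     # SDs of upsilon_0r, upsilon_1r, upsilon_2r
SD_E = 1.0                 # SD of epsilon_rv
DESIGN = [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]  # (COO, UD) per vignette


def simulate(n_respondents):
    """Draw ratings from equation (5); random effects are independent here."""
    rows = []
    for r in range(n_respondents):
        u0, u1, u2 = (random.gauss(0.0, sd) for sd in SD_U)
        for coo, ud in DESIGN:
            y = (
                GAMMA[0] + GAMMA[1] * coo + GAMMA[2] * ud
                + u0 + u1 * coo + u2 * ud
                + random.gauss(0.0, SD_E)
            )
            rows.append((r, coo, ud, y))
    return rows


rows = simulate(2000)
n = len(rows)

# In a balanced, effect-coded design the fixed effects can be estimated
# by simple (signed) averages of the ratings.
g00_hat = sum(y for _, _, _, y in rows) / n
g10_hat = sum(coo * y for _, coo, _, y in rows) / n
g20_hat = sum(ud * y for _, _, ud, y in rows) / n
print(round(g00_hat, 2), round(g10_hat, 2), round(g20_hat, 2))
```

The signed averages work because COO and UD are balanced and orthogonal in this design; with unbalanced or dummy-coded attributes a regular regression routine would be needed instead.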

The model in equation (5) can be contrasted with two other models. If the three coefficients are constant across respondents (which implies that the variability in these coefficients is zero), one obtains the pooled regression model, in which a single regression with constant coefficients is specified across all observations (see row 1 in Table 1). Another possibility is to specify a separate regression model for each individual respondent (individual-level regression; see row 4 in Table 1). As it turns out, the multilevel model is a weighted combination of the pooled regression and the individual-level regressions (Gelman and Hill 2007). When there is little variability in coefficients across respondents, the multilevel model estimates will be close to the pooled estimates, but as the variability in coefficients across respondents increases, the multilevel model estimates move toward the individual-level estimates. This weighting is advantageous because it allows for effect heterogeneity across respondents while avoiding the instability of the individual-level regression coefficients (particularly when the number of profiles is small).
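The two extremes that the multilevel model compromises between can be computed by hand for the 2 × 2 example. The sketch below uses hypothetical ratings for three respondents and relies on the fact that, in a balanced effect-coded design, the ordinary least squares solution reduces to signed averages (X'X is diagonal). The helper name `ols_effect_coded` is an assumption of this example, and the multilevel estimates themselves are not computed here.

```python
# Effect-coded design rows: (intercept, COO, UD) for vignettes 1..4.
DESIGN = [(1, +1, +1), (1, +1, -1), (1, -1, +1), (1, -1, -1)]

# Hypothetical ratings of the four vignettes by three respondents.
RATINGS = {
    "r1": [5, 7, 1, 3],
    "r2": [6, 6, 4, 4],
    "r3": [9, 7, 8, 6],
}


def ols_effect_coded(observations):
    """OLS coefficients for a balanced effect-coded design.

    Only valid when the design columns are orthogonal with entries of +1/-1,
    so that X'X = n * I and the solution is b = X'y / n.
    """
    n = len(observations)
    return tuple(sum(x[k] * y for x, y in observations) / n for k in range(3))


# Individual-level regressions: one coefficient vector per respondent.
individual = {
    resp: ols_effect_coded(list(zip(DESIGN, ys))) for resp, ys in RATINGS.items()
}

# Pooled regression: a single coefficient vector across all observations.
pooled = ols_effect_coded(
    [(x, y) for ys in RATINGS.values() for x, y in zip(DESIGN, ys)]
)

print("individual:", individual)
print("pooled:", pooled)
```

In this balanced example the pooled coefficients equal the averages of the individual coefficients, which shows what information the pooled model discards: the spread of the individual weights around those averages.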

The SEMWISE approach is formally identical to the multilevel model approach, but the model specification is quite different (Weijters and Baumgartner 2019). To begin with, the data have to be structured in wide format (or multivariate format), where each respondent is represented with one line of data and the four ratings are treated as four different variables (i.e. $Y_{r1}$, $Y_{r2}$, $Y_{r3}$, and $Y_{r4}$, with R records in total). Figure 4 illustrates the difference between long and wide format. In addition, a factor model is specified in which latent weight factors capture the individually varying intercepts and effects of the experimentally manipulated attributes on the response variable. To see how this works, note that, depending on whether COO and UD equal +1 or −1, equation (5) yields four different equations:

$$Y_{r1} = γ_{00} + γ_{10} + γ_{20} + υ_{0r} + υ_{1r} + υ_{2r} + ε_{r1} \tag{6}$$

$$Y_{r2} = γ_{00} + γ_{10} - γ_{20} + υ_{0r} + υ_{1r} - υ_{2r} + ε_{r2} \tag{7}$$

$$Y_{r3} = γ_{00} - γ_{10} + γ_{20} + υ_{0r} - υ_{1r} + υ_{2r} + ε_{r3} \tag{8}$$

$$Y_{r4} = γ_{00} - γ_{10} - γ_{20} + υ_{0r} - υ_{1r} - υ_{2r} + ε_{r4} \tag{9}$$

Figure 4.

Illustration of long format versus wide format.
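The restructuring from long to wide format can be sketched in a few lines of Python. This is a minimal illustration with made-up ratings, not the authors' code; the record layout (`respondent`, `profile`, `rating`) and the helper name `long_to_wide` are assumptions introduced here.

```python
from collections import defaultdict


def long_to_wide(long_rows, n_profiles=4):
    """Reshape one-row-per-rating records into one-row-per-respondent records."""
    ratings = defaultdict(dict)
    for row in long_rows:
        ratings[row["respondent"]][row["profile"]] = row["rating"]

    wide_rows = []
    for respondent in sorted(ratings):
        record = {"respondent": respondent}
        for v in range(1, n_profiles + 1):
            # A profile without a rating is kept as None instead of being dropped.
            record[f"yr{v}"] = ratings[respondent].get(v)
        wide_rows.append(record)
    return wide_rows


# Hypothetical long-format data: 2 respondents x 4 profiles = 8 records.
long_rows = [
    {"respondent": 1, "profile": 1, "rating": 8},
    {"respondent": 1, "profile": 2, "rating": 6},
    {"respondent": 1, "profile": 3, "rating": 7},
    {"respondent": 1, "profile": 4, "rating": 5},
    {"respondent": 2, "profile": 1, "rating": 9},
    {"respondent": 2, "profile": 2, "rating": 9},
    {"respondent": 2, "profile": 3, "rating": 4},
    {"respondent": 2, "profile": 4, "rating": 3},
]

wide_rows = long_to_wide(long_rows)
for record in wide_rows:
    print(record)
```

Each wide record holds the four ratings of one respondent as separate variables (`yr1` to `yr4`), which is the layout an SEM program expects for the factor model described next.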

By setting $γ_{00} + υ_{0r} = F_{INT,r}$ (where INT refers to the intercept), $γ_{10} + υ_{1r} = F_{COO,r}$, and $γ_{20} + υ_{2r} = F_{UD,r}$, we get

$$Y_{r1} = F_{INT,r} + F_{COO,r} + F_{UD,r} + ε_{r1} \tag{10}$$

$$Y_{r2} = F_{INT,r} + F_{COO,r} - F_{UD,r} + ε_{r2} \tag{11}$$

$$Y_{r3} = F_{INT,r} - F_{COO,r} + F_{UD,r} + ε_{r3} \tag{12}$$

$$Y_{r4} = F_{INT,r} - F_{COO,r} - F_{UD,r} + ε_{r4}, \tag{13}$$

where $E(F_{INT,r}) = γ_{00}$, $E(F_{COO,r}) = γ_{10}$, $E(F_{UD,r}) = γ_{20}$, $Var(F_{INT,r}) = Var(υ_{0r})$, $Var(F_{COO,r}) = Var(υ_{1r})$, and $Var(F_{UD,r}) = Var(υ_{2r})$. The covariances between $F_{INT,r}$, $F_{COO,r}$, and $F_{UD,r}$ are also parameters of this model. It is apparent that this is a factor model in which the $Y_{rv}$ load on all three factors $F_{INT,r}$, $F_{COO,r}$, and $F_{UD,r}$, but the loadings are fixed to values of +1 or −1. In addition, the factor model has a mean structure because the means of the factors ($γ_{00}$, $γ_{10}$, and $γ_{20}$) are model parameters.
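Equations (10) to (13) can be collected in matrix form, which makes the fixed loading pattern explicit. The symbols $Λ$, $α$, $Φ$, and $Θ$ below follow common SEM notation and are introduced here for illustration; the moment expressions additionally assume that the unique terms are uncorrelated with the factors and with each other.

$$\begin{pmatrix} Y_{r1} \\ Y_{r2} \\ Y_{r3} \\ Y_{r4} \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & -1 & -1 \end{pmatrix}}_{Λ} \begin{pmatrix} F_{INT,r} \\ F_{COO,r} \\ F_{UD,r} \end{pmatrix} + \begin{pmatrix} ε_{r1} \\ ε_{r2} \\ ε_{r3} \\ ε_{r4} \end{pmatrix}$$

$$E(Y_r) = Λ α, \qquad Cov(Y_r) = Λ Φ Λ^{\top} + Θ$$

Here $α = (γ_{00}, γ_{10}, γ_{20})^{\top}$ contains the factor means, $Φ$ is the covariance matrix of the three factors, and $Θ$ is the diagonal matrix of unique variances. Every entry of $Λ$ is fixed by the experimental design, so only $α$, $Φ$, and $Θ$ are estimated.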

In the SEMWISE approach, $E(F_{INT,r}) = γ_{00}$ equals the mean rating across all immigrant profiles because $(1/4)[E(Y_{r1}) + E(Y_{r2}) + E(Y_{r3}) + E(Y_{r4})] = γ_{00}$. The variance of $F_{INT,r}$ shows the variability in respondents’ ratings of immigrant profiles in general. In part, this reflects differences in respondents’ acceptance of immigrants in general, but it also captures scale usage differences and other respondent-specific influences on ratings. The means of $F_{COO,r}$ and $F_{UD,r}$ express the average effect of being an immigrant from Canada versus Mexico and of being an immigrant with versus without a university degree on ratings, respectively. For example, if $E(F_{COO,r}) = .5$, this implies (given the effect coding described earlier) that being an immigrant from Canada (Mexico) increases (decreases) ratings by .5 on average. Therefore, the difference in acceptance of immigrants from Canada and Mexico is 2(.5) = 1. The variances of $F_{COO,r}$ and $F_{UD,r}$ indicate the variability in acceptance of immigrants from different countries and immigrants with different educational backgrounds, respectively, across respondents.
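Both statements follow from taking expectations of equations (10) to (13), assuming that the unique terms have mean zero. Because each weight factor enters two equations with a positive sign and two with a negative sign, the weights cancel in the average:

$$\tfrac{1}{4} \sum_{v=1}^{4} E(Y_{rv}) = \tfrac{1}{4} \left[ 4γ_{00} + (1 + 1 - 1 - 1)\, γ_{10} + (1 - 1 + 1 - 1)\, γ_{20} \right] = γ_{00}$$

Contrasting the two Canadian profiles with the two Mexican profiles instead isolates the COO weight:

$$\tfrac{1}{2} \left[ E(Y_{r1}) + E(Y_{r2}) \right] - \tfrac{1}{2} \left[ E(Y_{r3}) + E(Y_{r4}) \right] = 2γ_{10}$$

With $γ_{10} = .5$, this contrast equals 1, as in the example above.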

In the SEMWISE approach it is straightforward to relate the variation in the intercept and weight factors to various respondent-level antecedents, correlates, or consequences. As the data are already structured by respondent, the model can be easily extended to incorporate observed or latent variables that are associated with the intercept and weight factors. If the respondent-specific intercept and weight factor scores vary significantly across respondents, a researcher may be interested in investigating the determinants of this variation. In a multilevel model, antecedents of the random effects can be added to equations (2) to (4) as cross-level interactions (i.e. interactions involving variables from two different levels in the data). In the SEMWISE model, $F_{I N T, r}$ , $F_{C O O, r}$ , and $F_{U D, r}$ can be expressed as functions of potential determinants of the intercept and weight factors. For example, individual differences in acceptance of immigrants may be related to the values that respondents hold, as shown in our empirical demonstration. In addition, the SEMWISE approach makes it easy to specify $F_{I N T, r}$ , $F_{C O O, r}$ , and $F_{U D, r}$ as determinants of other observed or latent variables. For example, acceptance of immigrants may be studied as an influence on willingness to provide support for immigrants. In fact, one of the major advantages of the SEMWISE approach is that integrative frameworks of the antecedents and consequences of the latent intercept and weight factors can be easily specified.
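The correspondence between the two specifications can be written out for a single respondent-level antecedent. In the following sketch, $X_r$ and its coefficient $γ_{11}$ are notation introduced here for illustration, $COO_{rv}$ denotes the effect code (+1 or −1) of profile v, and only the COO weight is shown:

$$F_{COO,r} = γ_{10} + γ_{11} X_r + υ_{1r}$$

$$F_{COO,r} \cdot COO_{rv} = γ_{10}\, COO_{rv} + γ_{11}\, (X_r \cdot COO_{rv}) + υ_{1r}\, COO_{rv}$$

The coefficient $γ_{11}$ from regressing the weight factor on $X_r$ in the SEMWISE model is therefore the coefficient of the product term $X_r \cdot COO_{rv}$, that is, the cross-level interaction in the multilevel model. In this conditional specification, $γ_{10}$ is the intercept of the weight factor, which equals its mean only if $X_r$ is mean-centered.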

Advantages of the SEMWISE approach relative to the multilevel model approach

For basic factorial survey designs, the multilevel modeling and SEMWISE approaches will yield identical results. However, the SEMWISE approach has several advantages. Firstly, SEMWISE enables sophisticated measurement model specifications that in some cases are likely to prove more realistic. For example, in multilevel models, the level-1 error variances are usually specified to be the same (see $Var(ε_{rv})$ in equations (6) to (9)). This assumption is not necessary in SEMWISE, and the unique factor variances of different profiles are generally not constrained to be equal. It is also straightforward to formulate more explicit measurement models when multiple indicators of respondents’ reactions to each profile are available (whereas in multilevel models multiple responses are usually simply averaged), and it is possible to take into account systematic response biases (if assumed to be present).

Secondly, SEMWISE makes it possible to investigate antecedents and consequences of the latent intercept and weight factors and study potential mediational mechanisms in nomological networks of related constructs. Although antecedents of the random effects can be accommodated in multilevel models, they are not modeled as latent variables (whereas modeling the random effects as random variables facilitates modeling their antecedents and/or consequences). Also, studying consequences of the random effects and multistage frameworks of antecedents and consequences is less straightforward in multilevel modeling but can be done easily with SEMWISE.

Thirdly, SEMWISE provides tests of overall model fit and facilitates model comparisons. Both global and local fit tests and comparisons of models and model parameters can be conducted using a variety of estimation procedures that take into account nonnormality, nonindependence of observations, and other violations of standard estimation methods.

Fourthly and finally, as explained in more detail below, multisample models can be formulated that allow researchers to study the (in)variance of model parameters across groups of respondents. In particular, the multisample SEMWISE approach can be used to investigate moderator effects of grouping variables (e.g. gender) on any of the single-sample model parameters.

Empirical illustration

In this section, we will present an empirical example to illustrate the specification, estimation, and testing of SEMWISE models and to demonstrate several key benefits (e.g. overall model testing and model comparisons, investigating antecedents of the latent intercept and weight factors). Specifically, we will estimate a model for factorial survey data corresponding to the example shown in Figure 2, and we will additionally demonstrate how individual variation in the intercept and weight factors can be related to other (latent) variables. In particular, we will investigate whether acceptance of immigrants in general and acceptance of immigrants as a function of the immigrant's COO and level of education is associated with the personal values held by the respondent. The main purpose of the empirical illustration is to provide a practical application of the SEMWISE approach.

Conceptual background

In previous studies, universalism and security (Schwartz et al. 2012) have been found to be positively and negatively related to acceptance of immigrants, respectively (Beierlein, Kuntz, and Davidov 2016; Davidov and Meuleman 2012). We will therefore test whether participants who score higher on universalism and lower on security values are (a) more accepting of immigrants in general; (b) less sensitive to immigrants’ COO (i.e. Canada vs. Mexico in the present case); and (c) less sensitive to immigrants’ level of education (i.e. whether or not they have a university degree). More specifically, based on previous literature (e.g. Davidov and Meuleman 2012), we expect that individuals who endorse universalistic values are more likely to approve of immigration (H1a) and to be less sensitive to the country of origin (H1b) and level of education of immigrants (H1c). In contrast, we expect that individuals who endorse security values will be less supportive of immigration (H2a) and will prefer immigrants from Canada (vs. Mexico; H2b) and immigrants with a university degree (vs. no university degree; H2c).

Design

Immigrant COO and level of education were manipulated in a factorial survey in which participants (who were U.S. residents) rated all four possible immigrant profiles. We also included several filler attributes. Some were part of the background story (e.g. male) and were kept constant across vignettes (Shamon et al. forthcoming); others (e.g. name) were randomized across vignettes and respondents. This strategy has two advantages: (a) although the design based on the two attributes is simple, the inclusion of additional attributes enabled us to sample from a broader set of possible profiles and, as a consequence, (b) the task becomes less transparent and respondents can justify their choices based on other attributes that are not of interest, which should reduce socially desirable responding. Including fully randomized filler attributes adds variance to the response variable, but this is accounted for by the unique variance terms for each of the observed variables in the model (see the model specification below). Table 2 summarizes the vignette design used in this illustration.

Table 2.

Vignette design.

Attribute	Attribute type	Levels
Gender	Constant	Male
Marital status	Constant	Single
COO	Experimental	Canada/Mexico
Educational level	Experimental	No university degree/university degree
Name	Randomized filler	Random (nested in COO) For Canada: Angus May, Peter Elliott, Rocco Smith, Charlie Cole, Zach Holmes, Cooper Stone, Marcel Macdonald, Lennon Knight For Mexico: Lucho Gálvez, Samuel Puig, Maximiliano Hurtado, Lucas Cicerón, Jacobo Abasto, Benjamín Céspedes, Arturo Carballar, Umberto Moya
Age	Randomized filler	Random (24–29 years)

COO: country of origin.

Note: Random names were generated from https://www.fantasynamegenerators.com/british-english-names.php and https://www.fantasynamegenerators.com/hispanic_names.php.
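The design in Table 2 can be sketched as a small generator that builds the four profiles for one respondent. This is an illustrative sketch rather than the survey software actually used. The function name `build_vignette_set`, the dictionary layout, and the choice to draw names without replacement within a country are assumptions made here; the attribute levels and name lists are taken from Table 2.

```python
import itertools
import random

NAMES = {
    "Canada": ["Angus May", "Peter Elliott", "Rocco Smith", "Charlie Cole",
               "Zach Holmes", "Cooper Stone", "Marcel Macdonald", "Lennon Knight"],
    "Mexico": ["Lucho Gálvez", "Samuel Puig", "Maximiliano Hurtado", "Lucas Cicerón",
               "Jacobo Abasto", "Benjamín Céspedes", "Arturo Carballar", "Umberto Moya"],
}
EDUCATION = ["university degree", "no university degree"]


def build_vignette_set(rng):
    """Return the four COO x education profiles shown to one respondent."""
    # Assumption: two distinct names per country, so no name repeats in a set.
    names = {coo: rng.sample(pool, k=len(EDUCATION)) for coo, pool in NAMES.items()}
    profiles = []
    for coo, education in itertools.product(NAMES, EDUCATION):
        profiles.append({
            "gender": "male",            # constant attribute
            "marital_status": "single",  # constant attribute
            "coo": coo,                  # experimental attribute
            "education": education,      # experimental attribute
            "name": names[coo].pop(),    # randomized filler, nested in COO
            "age": rng.randint(24, 29),  # randomized filler, 24 to 29 years
        })
    return profiles


vignettes = build_vignette_set(random.Random(2023))
for profile in vignettes:
    print(profile)
```

The experimental attributes are fully crossed, whereas the fillers only add respondent-specific noise, which is why they can be absorbed by the unique variance terms of the model.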

We presented the profiles using textual (not tabular) presentation in order to minimize potential social desirability bias (Shamon et al. forthcoming). Figure 5 displays the four example vignettes used in the study. Participants were instructed as follows: “On the following pages, you will be presented with the personal profiles of four single men who recently moved to the U.S. and who are applying for a residency permit. Please indicate to what extent you personally feel each person should or should not be granted a permit. There are no right or wrong answers, only your personal opinion matters here.”

Figure 5.

Example vignettes.

Measures

Vignette response variable. For each vignette, participants were asked the following question: “Please indicate whether you would approve or disapprove of a residence permit application for the individual described above?” Respondents indicated their answer on a slider rating scale ranging from 0 (definitely disapprove) to 10 (definitely approve).

Antecedents. We measured values by means of selected items from the Portrait Values Questionnaire (PVQ57; Cieciuch et al. 2014; Schwartz et al. 2012), using the following instructions: “Here we briefly describe some people. Please read each description and think about how much each person is or is not like you. Tick the box to the right that shows how much the person in the description is like you (very much like me, like me, somewhat like me, a little like me, not like me, not like me at all)” (Davidov, Schmidt, and Schwartz 2008; Zercher et al. 2015). Measures of the value domains universalism (concern and tolerance; PVQ items 5, 37, 52, 14, 34, 57) and personal and societal security (PVQ items 13, 26, 53, 2, 35, 50) were of particular interest to us (see Table 3 for the items used; e.g. Davidov and Meuleman 2012).

Table 3.

Value items (shown here in the male version only).

PVQ item	Value domain	Parcel	Item statement
pvq2	SEC	sec1	It is important to him that his country is secure and stable.
pvq5	UNI	uni1	It is important to him that the weak and vulnerable in society be protected.
pvq13	SEC	sec1	It is very important to him to avoid disease and protect his health.
pvq14	UNI	uni1	It is important to him to be tolerant toward all kinds of people and groups.
pvq26	SEC	sec2	It is important to him to be personally safe and secure.
pvq34	UNI	uni2	It is important to him to listen to and understand people who are different from him.
pvq35	SEC	sec2	It is important to him to have a strong state that can defend its citizens.
pvq37	UNI	uni2	It is important to him that every person in the world have equal opportunities in life.
pvq50	SEC	sec3	It is important to him that his country protect itself against all threats.
pvq52	UNI	uni3	It is important to him that everyone is treated justly, even people he doesn't know.
pvq53	SEC	sec3	It is important to him to avoid anything dangerous.
pvq57	UNI	uni3	It is important to him to accept people even when he disagrees with them.

Note: The first column reports the item number in the PVQ.

SEC: security; UNI: universalism; items were assigned to three parcels per value domain (cf. the column labeled “parcel”).

Background variables. For sample descriptive purposes, we included the following background variables: year of birth, gender, education (“About how many years of education have you completed, whether full-time or part-time? Please report these in full-time equivalents and include compulsory years of schooling. Type in …”), immigration status (“Were you born in the USA? Yes/No/Don't know”), and self-reported income (“Which of the following descriptions comes closest to how you feel about your household's income nowadays? Living comfortably on present income = 1, Coping on present income = 2, Finding it difficult on present income = 3, Finding it very difficult on present income = 4, Refusal = 7, Don't know = 8”; ESS 2001).

Sample

A convenience sample of U.S. residents from the Amazon Mechanical Turk panel responded to a Qualtrics survey. Nine respondents were deleted because they responded negatively to the question, “In your honest opinion, should we use your data in our analyses in this study? Please answer honestly, this will not affect your compensation” (Meade and Craig 2012). In addition, six respondents were dropped because they provided no responses to the four profiles in the factorial design. The resulting sample included N = 232 respondents, with ages ranging from 20 to 79 years (Mean = 38.47); 36.64% of the respondents were female, and respondents had 13.86 years of formal education on average (min = 1, max = 30). A total of 3.9% of respondents were immigrants (i.e. they were not born in the USA), and the income distribution suggested that 36.2% were “Living comfortably on present income,” 41.4% were “Coping on present income,” 12.5% were “Finding it difficult on present income,” and 8.2% were “Finding it very difficult on present income.”

Results

The mean ratings and standard deviations in response to the four vignettes were M = 6.35 (standard deviation SD = 2.59) for Canadian immigrants with no university degree; M = 7.85 (SD = 2.07) for Canadian immigrants with a university degree (i.e. the most positive attitude); M = 5.97 (SD = 2.81) for Mexican immigrants with no university degree (i.e. the least positive attitude); and M = 7.60 (SD = 2.16) for Mexican immigrants with a university degree.

For the analysis, we used the R package lavaan 0.6-3 (Rosseel 2012). However, any general structural equation modeling (SEM) package could be used to analyze the data (i.e. the model specification is not specific to lavaan). In the current empirical application, we fit the SEMWISE model corresponding to Figure 2 to the four profile ratings. Table 4 reports the SEMWISE model syntax for lavaan and Mplus. The SEMWISE model fits the data well (χ²(1) = .467, p = 0.494, CFI = 1.00, TLI = 1.02, RMSEA = .000, SRMR = .008). Note that this model includes main effect weight factors only (i.e. it is assumed that there is no interaction between country of origin and education level), and the fact that the model showed no significant misfit to the data implies that the interaction effect between country of origin and education level is not significantly different from zero.
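The single degree of freedom of this model can be recovered by counting observed moments and free parameters. The sketch below assumes that the free parameters are the factor means, the factor variances and covariances, and one unique variance per rating, which is consistent with the estimates reported for this model but is an inference rather than a statement taken from the software output. The function name `semwise_df` is introduced here for illustration.

```python
def semwise_df(n_indicators, n_factors, equal_unique_variances=False):
    """Degrees of freedom of a SEMWISE model with all loadings fixed."""
    # Observed moments: means plus the distinct variances and covariances.
    moments = n_indicators + n_indicators * (n_indicators + 1) // 2
    factor_means = n_factors
    factor_covariances = n_factors * (n_factors + 1) // 2  # includes variances
    unique_variances = 1 if equal_unique_variances else n_indicators
    free_parameters = factor_means + factor_covariances + unique_variances
    return moments - free_parameters


# Four ratings, three factors (intercept, COO, UD): 14 moments - 13 parameters.
df_core = semwise_df(n_indicators=4, n_factors=3)
print(df_core)
```

Constraining the four unique variances to be equal, as is usual in the multilevel specification, would free up three additional degrees of freedom (`semwise_df(4, 3, equal_unique_variances=True)` returns 4).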

Table 4.

SEMWISE model syntax in lavaan and Mplus.

Lavaan	Mplus
semwise_core <- '	model:
f_int =~ 1*yr1 + 1*yr2 + 1*yr3 + 1*yr4	f_int by yr1@1 yr2@1 yr3@1 yr4@1;
f_coo =~ 1*yr1 + 1*yr2 + -1*yr3 + -1*yr4	f_coo by yr1@1 yr2@1 yr3@-1 yr4@-1;
f_ud =~ 1*yr1 + -1*yr2 + 1*yr3 + -1*yr4	f_ud by yr1@1 yr2@-1 yr3@1 yr4@-1;
yr1 ~ 0*1	[yr1-yr4@0];
yr2 ~ 0*1	[f_int f_coo f_ud];
yr3 ~ 0*1
yr4 ~ 0*1
f_int ~ 1	f_int@1;
f_coo ~ 1	f_coo@1;
f_ud ~ 1	f_ud@1;
'

SEMWISE: structural equation modeling for within-subject experiments.

Note: yr1–yr4 are the observed variables (i.e. respondents’ ratings of the four profiles); f_int, f_coo, and f_ud are the intercept and weight factors.
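For designs with more profiles, writing the fixed loadings by hand becomes error-prone. The following sketch generates lavaan-style model syntax of the kind shown in Table 4 from a table of effect codes. The helper `semwise_lavaan_syntax` is hypothetical and not part of lavaan; it only assembles a text string that would then be passed to the SEM software.

```python
def semwise_lavaan_syntax(indicators, loadings):
    """Build lavaan model syntax with all loadings fixed to the design codes."""
    lines = []
    for factor, codes in loadings.items():
        terms = " + ".join(f"{code}*{y}" for code, y in zip(codes, indicators))
        lines.append(f"{factor} =~ {terms}")
    # Fix the indicator intercepts to zero so that the factor means are identified.
    lines += [f"{y} ~ 0*1" for y in indicators]
    # Freely estimate the factor means.
    lines += [f"{factor} ~ 1" for factor in loadings]
    return "\n".join(lines)


indicators = ["yr1", "yr2", "yr3", "yr4"]
loadings = {
    "f_int": [1, 1, 1, 1],    # intercept factor
    "f_coo": [1, 1, -1, -1],  # Canada = +1, Mexico = -1
    "f_ud": [1, -1, 1, -1],   # university degree = +1, no degree = -1
}

syntax = semwise_lavaan_syntax(indicators, loadings)
print(syntax)
```

The effect codes in `loadings` mirror the signs in equations (10) to (13), so a coding mistake in the design table shows up directly in the generated loading lines.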

The parameter estimates for this model are reported in Table 5. The estimated mean of the intercept factor of 6.946 shows that respondents are, on average, moderately positively predisposed toward immigrants in general (as the response scale ranged from 0 to 10). The average effect of Canada (Mexico) as COO is .149 (−.149), which implies that, on average, immigrants from Canada are rated .298 (i.e. .149 − (−.149)) points higher than those from Mexico (p = 0.002). Furthermore, immigrants with a university degree are, on average, rated 1.560 (i.e. .780 − (−.780)) points higher than those without (p < 0.001). These fixed effects capture the mean rating and the mean differences between the levels of the two attributes across respondents, but importantly all the factors show statistically significant and substantial variance. This finding indicates that there are individual differences in mean profile ratings and in the effects of country of origin and education level on acceptance of immigrants. Given this variability, we can relate the intercept and weight factors to other variables, thus utilizing the extended SEMWISE approach (in this case, a conditional SEMWISE model) in which the latent weight factors are expressed as functions of potential antecedents.
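The arithmetic behind these statements can be reproduced from the three factor means alone. The sketch below combines the estimated means with the effect codes (Canada = +1, Mexico = −1; university degree = +1, no degree = −1) to obtain the model-implied mean rating of each profile; the variable names are chosen here for illustration.

```python
# Estimated factor means from the core SEMWISE model.
factor_means = {"f_int": 6.946, "f_coo": 0.149, "f_ud": 0.780}

# Effect codes (intercept, COO, UD) of the four profiles.
profiles = {
    "Canada, university degree": (1, 1, 1),
    "Canada, no university degree": (1, 1, -1),
    "Mexico, university degree": (1, -1, 1),
    "Mexico, no university degree": (1, -1, -1),
}

implied_means = {
    label: round(sum(code * mean for code, mean in zip(codes, factor_means.values())), 3)
    for label, codes in profiles.items()
}

# A difference between two levels is twice the effect-coded weight.
coo_gap = round(2 * factor_means["f_coo"], 3)
ud_gap = round(2 * factor_means["f_ud"], 3)

for label, mean in implied_means.items():
    print(f"{label}: {mean:.3f}")
print(f"Canada vs. Mexico: {coo_gap:.3f}; degree vs. no degree: {ud_gap:.3f}")
```

The implied means (7.875, 6.315, 7.577, and 6.017) lie within .05 of the observed vignette means reported at the start of this section (7.85, 6.35, 7.60, and 5.97), which is in line with the good fit of the main-effects model.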

Table 5.

Parameter estimates for the core SEMWISE model in Figure 2.

	Mean estimates			Variance estimates			Correlations
	M	SE	p(>|z|)	Var.	SE	p(>|z|)	F_COO	F_UD
F_INT	6.946	0.128	<0.001	3.467	.353	<.001	−.115	−.266
F_COO	0.149	0.047	0.002	0.254	.058	<.001		.241
F_UD	0.780	0.072	<0.001	0.903	.116	<.001

SEMWISE: structural equation modeling for within-subject experiments; F_INT: intercept factor; F_COO: factor for country of origin; F_UD: factor for university degree.

Note: The significance level of the correlation reported in boldface is p < 0.01 and of all other correlations is p > 0.05; M: mean, SE: standard error.

For the illustration we use value measures as antecedents. We first evaluate the measurement model for the value factors. Specifically, we constructed three parcels for universalism and three parcels for security (Cole, Perkins, and Zelkowitz 2016), and our specific parceling approach is reported in Table 3. A confirmatory factor analysis with universalism and security as latent factors, each with three parcels as indicators, yields an acceptable fit based on most fit indices (χ²(8) = 26.08, p = 0.001, CFI = .984, TLI = .971, RMSEA = .099, SRMR = .031). In addition, the standardized factor loadings range from .82 to .92 (all p-values < 0.001), and both factors show good internal consistency with composite reliabilities of C.R. = .93 for universalism and C.R. = .90 for security.
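Composite reliability is commonly computed from the standardized loadings of a factor. The sketch below uses one widely used formula, which assumes uncorrelated unique terms; whether the reported values were obtained with exactly this formula is not stated, and the three loadings are hypothetical values chosen only to fall within the reported range of .82 to .92.

```python
def composite_reliability(std_loadings):
    """Composite reliability from standardized loadings (uncorrelated uniquenesses)."""
    true_score_part = sum(std_loadings) ** 2
    # With standardized indicators, each unique variance equals 1 - loading^2.
    error_part = sum(1 - loading ** 2 for loading in std_loadings)
    return true_score_part / (true_score_part + error_part)


# Hypothetical standardized loadings of three parcels on one value factor.
hypothetical_loadings = [0.82, 0.87, 0.92]
cr = composite_reliability(hypothetical_loadings)
print(round(cr, 2))
```

With loadings of this size, the composite reliability is around .90, the same order of magnitude as the values reported for the two value factors.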

Adding the universalism and security factors (each with three parceled indicators) as antecedents of the intercept and weight factors to the SEMWISE model resulted in an acceptable model fit (χ²(27) = 47.345, p = 0.009, CFI = .988, TLI = .979, RMSEA = .057, SRMR = .030). The estimates of the structural coefficients for the intercept and the weight factors are reported in Figure 6. Consistent with expectations, and as evidenced by the relationship between the values and the intercept factor, participants who scored higher on universalism values were more positively predisposed toward immigrants in general, supporting H1a. In contrast, participants who scored higher on security values were less positively predisposed toward immigrants in general, supporting H2a. The associations between the values and the weight factors for country of origin and education displayed in Figure 6 suggest that universalistic individuals are approximately equally supportive of immigrants regardless of country of origin and level of education. This is demonstrated by the small and nonsignificant associations between the universalism scores and the two weight factors, which is consistent with hypotheses H1b and H1c. By way of contrast, individuals who attribute greater importance to security values prefer immigrants from Canada (vs. Mexico) and immigrants with a university degree (vs. not having a university degree), as demonstrated by the significant and positive associations between the security value scores and the two weight factors, which supports H2b and H2c.

Figure 6.

Structural model with estimates.

Extensions of the SEMWISE model

The previous empirical example demonstrated the basic SEMWISE model with added latent antecedents. This basic SEMWISE model can be extended in various ways. These extensions relate to the factors manipulated in the survey, the measurement of the dependent variable (ratings), and differences in model parameters across groups of respondents.

In the basic model, the manipulated factors were binary nominal variables. If a factor is metric and the effect of different levels of the factor on responses is thought to be linear, loadings with equal intervals can be used (e.g. loadings of 1, 2, and 3 for price levels of $1, $2, and $3, respectively). It is also possible to model interactions between the manipulated factors. The loadings of interaction factors are obtained by multiplying the loadings of the components of the interaction. For example, if a researcher expects that COO and level of education have an interactive effect on acceptance of immigrants, an interaction weight factor with loadings of +1, −1, −1, and +1 can be added to the model (although it should be noted that for the model to be identified, further restrictions should be imposed).
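The construction of the interaction loadings can be checked with a few lines of Python. The sketch multiplies the effect codes of the two main-effect factors profile by profile and verifies that the resulting column is orthogonal to the intercept and main-effect columns; the variable names and the helper `dot` are introduced here for illustration.

```python
# Fixed loadings of the four profiles on the intercept and main-effect factors.
intercept = [1, 1, 1, 1]
coo = [1, 1, -1, -1]
ud = [1, -1, 1, -1]

# Interaction loadings: elementwise product of the component loadings.
interaction = [c * u for c, u in zip(coo, ud)]
print(interaction)


def dot(a, b):
    """Inner product of two loading columns."""
    return sum(x * y for x, y in zip(a, b))


# In a balanced effect-coded design, the interaction column is orthogonal
# to the intercept and to both main-effect columns.
orthogonal = all(dot(interaction, column) == 0 for column in (intercept, coo, ud))
print(orthogonal)
```

The need for further restrictions follows from a simple count: with four factors, freely estimating four factor means, ten factor variances and covariances, and four unique variances would require 18 parameters, whereas four ratings provide only 14 observed means, variances, and covariances.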

Often, the dependent variable is measured with a single indicator, especially when respondents are asked to rate many different profiles. However, if a researcher believes that single measures may not be sufficiently reliable, multiple measures may be used to assess respondents’ reactions to each profile. A second-order factor model can be specified in this case, in which the first-order factors capture the commonality of the multiple responses to each profile and the first-order factors are then related to the second-order weight factors. It is also possible to account for method variance in the ratings (Podsakoff et al. 2003), for example, by relating the observed responses to a hypothesized method factor (e.g. social desirability) or including an (implicit) method factor (e.g. if there are reverse-keyed items, a method factor for the reversed items could be specified; see, e.g. Billiet and Davidov 2008; Billiet and McClendon 2000).

Finally, the basic SEMWISE model was formulated for a single-sample context. However, it is possible to specify a multisample model in which separate models are estimated and tested for different groups of respondents. For example, a researcher may hypothesize that men and women respond differently to immigrants and thus might want to investigate gender differences explicitly. Gender would be employed as a grouping variable in this case, and the multisample model could be used to formally assess similarities and differences in acceptance of immigrants between men and women (e.g. in terms of mean differences and homogeneity of effects as well as relationships with various antecedents and consequences). Consequently, one could estimate if the means and variances of the weight factors differ across males and females. The multisample approach may be particularly beneficial in cross-cultural and cross-country studies.

Summary and general discussion

Factorial surveys are commonly used by researchers in many disciplines, including the social sciences in general and sociology in particular. One likely reason is that they offer several advantages over classical (Likert-type) attitude items. Analyzing people's responses to vignettes with concrete descriptions is likely to yield more realistic and direct representations of judgments and evaluations than using more general survey questions. In addition, responses to vignettes tend to be less prone to social desirability bias. In the current paper, we proposed that data from factorial surveys could be analyzed in an SEM framework using the Structural Equation Modeling for Within-Subject Experiments, that is, SEMWISE, approach. In SEMWISE, the weights linking the factorial attributes to responses are modeled as latent variables and the model is essentially a confirmatory factor model with a fixed loading matrix. The key advantages of using SEMWISE to analyze factorial survey data are that (a) the latent intercept and weights can be easily related to observed or latent antecedents and outcome variables—thus making it possible to model interindividual variation in these weights; (b) measurement error in observed responses can be accounted for; (c) the overall fit of the model can be assessed; and (d) the generalizability of the means of the intercept and weights and their relationships with other variables across groups of respondents can be studied using multisample models.

In this paper we began by introducing the method, we illustrated how it can be applied, and we explained its advantages over other approaches to analyze factorial surveys. In the practical illustration we examined how the (positive) effect of universalism values and the (negative) effect of security values on people's willingness to allow immigrants into the country may vary as a function of immigrants’ country of origin (Canada vs. Mexico) and level of education (university degree or no university degree). The goal of the empirical illustration was to provide a hands-on example of how the SEMWISE approach can be used in practice and to verify that it yields results that are consistent with the extant literature. In line with expectations, the results suggested that universalists more strongly endorsed, and individuals scoring higher on security more strongly rejected, immigrants in general. Importantly, individuals who attributed greater importance to security values preferred immigrants from Canada (vs. Mexico) and immigrants with a university degree (vs. not having a university degree).

Even though we pointed out some advantages of factorial surveys, there are also some limitations, and these limitations cannot be avoided by using structural equation modeling during data analysis. First, while the factorial survey approach tries to address the problem of social desirability by employing specific scenarios rather than more general questions as commonly implemented in surveys, it does not resolve the problem fully. It could still be the case that respondents provide socially desirable responses, which may bias the conclusions. Furthermore, although the description of specific vignettes for the measurement of dependent variables may help to establish causality, the direction of the causal process linking the individual difference variables to the weight factors is not entirely clear (i.e. people's responses to specific scenarios may influence the values they express later in the survey). Only panel or experimental studies can address this issue more concretely. Finally (and as pointed out by a reviewer), vignette studies are not commonly included in large social surveys such as the European Social Survey, the European Value Study, or other large-scale international surveys. One reason may be that classical Likert items are perceived as relatively efficient, especially when the aim is to measure many evaluations that do not involve tradeoffs. However, if the aim is to understand (individual differences in) the way attributes of social objects influence respondents’ evaluations of these objects, especially when tradeoffs between attributes are involved, factorial survey designs may be the more useful approach.

The SEMWISE approach may be particularly useful when researchers want to model individual variation in individuals’ responses to the attributes manipulated in a factorial survey, and when they want to relate this individual variation to other variables, especially if these relations involve conceptual networks or sequences of multiple variables (e.g. with mediation) and/or latent variables (e.g. personal values). Our practical illustration demonstrated that the effects of vignette attributes on responses vary significantly across individuals, and the SEMWISE approach may help identify and model such variation more accurately.

Footnotes

Acknowledgments

Eldad Davidov would like to thank the University of Zurich Research Priority Program Social Networks. Hans Baumgartner gratefully acknowledges support from the Smeal Chair Endowment. All three authors would like to thank Lisa Trierweiler for the English proof of the manuscript.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Author Biographies

Bert Weijters is an associate professor of market research at Ghent University (Belgium) in the department of Work, Organization, and Society and the research center Behavioral Economics For Life (BE4Life). His main research interests and recent publications are in research methodology (focusing on survey methods and measurement analysis) and consumer psychology (focusing on sustainable consumer behavior).

Eldad Davidov is a professor of sociology at the Universities of Cologne (Germany) and Zurich (Switzerland) and codirector of the university research priority program on social networks at the University of Zurich. He was president of the European Survey Research Association between 2015 and 2017. His research interests concentrate on structural equation modeling especially applied to cross-cultural and longitudinal survey data. In his research he analyzes human values and attitudes toward immigrants or other minorities.

Hans Baumgartner is the Smeal Chair Professor of Marketing in the Smeal College of Business at the Pennsylvania State University, University Park, PA. His research interests are in consumer psychology and research methodology, particularly structural equation modeling and measurement analysis.

References

Aiman-Smith

Lynda

Scullen

Steven E.

Barr

Steve H.

. 2002. “Conducting Studies of Decision Making in Organizational Contexts: A Tutorial for Policy-Capturing and Other Regression-Based Techniques.” Organizational Research Methods 5(4):388–414.

Auspurg

Katrin

Hinz

Thomas

. 2015. Factorial Survey Experiments. Quantitative Applications in the Social Sciences series, Vol. 175. Los Angeles: Sage Publications.

Auspurg, Katrin and Annette Jäckle. 2017. “First Equals Most Important? Order Effects in Vignette-Based Measurement.” Sociological Methods & Research 46(3):490–539.

Beierlein, Constanze, Anabel Kuntz, and Eldad Davidov. 2016. “Universalism, Conservation and Attitudes Toward Minority Groups.” Social Science Research 58:68–79.

Billiet, Jaak B. and Eldad Davidov. 2008. “Testing the Stability of an Acquiescence Style Factor Behind Two Interrelated Substantive Variables in a Panel Design.” Sociological Methods & Research 36(4):542–62.

Billiet, Jaak B. and McKee J. McClendon. 2000. “Modeling Acquiescence in Measurement Models for Two Balanced Sets of Items.” Structural Equation Modeling 7(4):608–28.

Carroll, J. Douglas and Paul E. Green. 1995. “Psychometric Methods in Marketing Research: Part I, Conjoint Analysis.” Journal of Marketing Research 32(4):385–91.

Castillo, Juan C., Francisco Olivos, and Ariel Azar. 2019. “Deserving a Just Pension: A Factorial Survey Approach.” Social Science Quarterly 100(1):359–78.

Cieciuch, Jan, Eldad Davidov, Peter Schmidt, René Algesheimer, and Shalom H. Schwartz. 2014. “Comparing Results of an Exact vs. an Approximate (Bayesian) Measurement Invariance Test: A Cross-Country Illustration with a Scale to Measure 19 Human Values.” Frontiers in Psychology 5:982.

Cole, David A., Corinne E. Perkins, and Rachel L. Zelkowitz. 2016. “Impact of Homogeneous and Heterogeneous Parceling Strategies When Latent Variables Represent Multidimensional Constructs.” Psychological Methods 21(2):164–74.

Davidov, Eldad and Bart Meuleman. 2012. “Explaining Attitudes Towards Immigration Policies in European Countries: The Role of Human Values.” Journal of Ethnic and Migration Studies 38(5):757–75.

Davidov, Eldad, Peter Schmidt, and Shalom H. Schwartz. 2008. “Bringing Values Back In: The Adequacy of the European Social Survey to Measure Values in 20 Countries.” Public Opinion Quarterly 72(3):420–45.

Di Stasio, Valentina and Klarita Gerxhani. 2015. “Employers’ Social Contacts and Their Hiring Behavior in a Factorial Survey.” Social Science Research 51:93–107.

Doherty, Michael E. and Stuart M. Keeley. 1972. “Use of Subjective Predictors in Regression Analysis for Policy Capturing.” Journal of Applied Psychology 56(3):277–78.

Dülmer, Hermann. 2007. “Experimental Plans in Factorial Surveys: Random or Quota Design?” Sociological Methods & Research 35(3):382–409.

Dülmer, Hermann. 2016. “The Factorial Survey: Design Selection and Its Impact on Reliability and Internal Validity.” Sociological Methods & Research 45(2):304–47.

Emerson, Michael O., Karen J. Chai, and George Yancey. 2001. “Does Race Matter in Residential Segregation? Exploring the Preferences of White Americans.” American Sociological Review 66(6):922–35.

ESS, European Social Survey. 2001. European Social Survey Core Questionnaire Development - Overview. London: Centre for Comparative Social Surveys, City University London.

Finger, Claudia. 2016. “Institutional Constraints and the Translation of College Aspirations Into Intentions-Evidence From a Factorial Survey.” Research in Social Stratification and Mobility 46(Part B):112–28.

Gelman, Andrew and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models (Analytical Methods for Social Research). New York: Cambridge University Press.

Green, Paul E., Abba M. Krieger, and Yoram Wind. 2001. “Thirty Years of Conjoint Analysis: Reflections and Prospects.” Interfaces 31(3):S56–73.

Green, Paul E. and Vithala R. Rao. 1971. “Conjoint Measurement for Quantifying Judgmental Data.” Journal of Marketing Research 8(3):355–63.

Green, Paul E. and Venkat Srinivasan. 1990. “Conjoint Analysis in Marketing: New Developments with Implications for Research and Practice.” The Journal of Marketing 54(4):3–19.

Gustafsson, Anders, Andreas Herrmann, and Frank Huber. 2000. “Conjoint Analysis as an Instrument of Market Research Practice.” Pp. 5–45 in Conjoint Measurement, edited by A. Gustafsson, A. Herrmann, and F. Huber. Berlin, Heidelberg: Springer.

Hu, Li-tze and Peter M. Bentler. 1999. “Cutoff Criteria for Fit Indexes in Covariance Structure Analysis: Conventional Criteria Versus New Alternatives.” Structural Equation Modeling: A Multidisciplinary Journal 6(1):1–55.

Jasso, Guillermina. 2006. “Factorial Survey Methods for Studying Beliefs and Judgments.” Sociological Methods & Research 34(3):334–423.

Jasso, Guillermina. 2012. “Safeguarding Justice Research.” Sociological Methods & Research 41(1):217–39.

Jasso, Guillermina and Karl-Dieter Opp. 1997. “Probing the Character of Norms: A Factorial Survey Analysis of the Norms of Political Action.” American Sociological Review 62(6):947–64.

Karren, Ronald J. and Melissa Woodard Barringer. 2002. “A Review and Analysis of the Policy-Capturing Methodology in Organizational Research: Guidelines for Research and Practice.” Organizational Research Methods 5(4):337–61.

Liebe, Ulf, Ismaïl M. Moumouni, Christine Bigler, Chantal Ingabire, and Sabin Bieri. 2020. “Using Factorial Survey Experiments to Measure Attitudes, Social Norms, and Fairness Concerns in Developing Countries.” Sociological Methods & Research 49(1):161–92.

Lyons, Christopher J. 2008. “Individual Perceptions and the Social Construction of Hate Crimes: A Factorial Survey.” The Social Science Journal 45(1):107–31.

Meade, Adam W. and S. Bartholomew Craig. 2012. “Identifying Careless Responses in Survey Data.” Psychological Methods 17(3):437–55.

Mertz, William H. and Michael E. Doherty. 1974. “The Influence of Task Characteristics on Strategies of Cue Combination.” Organizational Behavior and Human Performance 12(2):196–216.

Podsakoff, Philip M., Scott B. MacKenzie, Jeong-Yeon Lee, and Nathan P. Podsakoff. 2003. “Common Method Biases in Behavioral Research: A Critical Review of the Literature and Recommended Remedies.” Journal of Applied Psychology 88(5):879–903.

Rosseel, Yves. 2012. “lavaan: An R Package for Structural Equation Modeling.” Journal of Statistical Software 48(2):1–36.

Rossi, Peter H. and Andy B. Anderson. 1982. “The Factorial Survey Approach: An Introduction.” Pp. 15–67 in Measuring Social Judgments: The Factorial Survey Approach, edited by P. H. Rossi and S. S. Nock. Beverly Hills: Sage.

Schwappach, David L. B. and Katrin Gehring. 2014. “Silence That Can Be Dangerous: A Vignette Study to Assess Healthcare Professionals’ Likelihood of Speaking Up About Safety Concerns.” PLoS One 9(8):e104720.

Schwartz, Shalom H., Jan Cieciuch, Michele Vecchione, Eldad Davidov, Ronald Fischer, Constanze Beierlein, Alice Ramos, Markku Verkasalo, and Jan-Erik Lönnqvist. 2012. “Refining the Theory of Basic Individual Values.” Journal of Personality and Social Psychology 103(4):663–88. doi: 10.1037/a0029393

Shamon, Hawal, Hermann Dülmer, and Adam Giza. Forthcoming. “The Factorial Survey: The Impact of the Presentation Format of Vignettes on Answer Behavior and Processing Time.” Sociological Methods & Research. https://doi.org/10.1177/0049124119852382.

Shlay, Anne B. 2010. “African American, White and Hispanic Child Care Preferences: A Factorial Survey Analysis of Welfare Leavers by Race and Ethnicity.” Social Science Research 39(1):125–41.

Su, Dan and Peter M. Steiner. 2020. “An Evaluation of Experimental Designs for Constructing Vignette Sets in Factorial Surveys.” Sociological Methods & Research 49(2):455–97.

Tomassetti, Alan J., Reeshad S. Dalal, and Seth A. Kaplan. 2016. “Is Policy Capturing Really More Resistant Than Traditional Self-Report Techniques to Socially Desirable Responding?” Organizational Research Methods 19(2):255–85.

Wallander, Lisa. 2009. “25 Years of Factorial Surveys in Sociology: A Review.” Social Science Research 38(3):505–20.

Weijters, Bert and Hans Baumgartner. 2019. “Analyzing Policy Capturing Data Using Structural Equation Modeling for Within-Subject Experiments (SEMWISE).” Organizational Research Methods 22(3):623–48.

Zercher, Florian, Peter Schmidt, Jan Cieciuch, and Eldad Davidov. 2015. “The Comparability of the Universalism Value Over Time and Across Countries in the European Social Survey: Exact vs. Approximate Measurement Invariance.” Frontiers in Psychology 6:733.