Abstract
Item orientation—whether agreement with a survey item indicates higher (positive) or lower (negative) standing on a construct—can influence the factor structure of psychological assessments in education research. This study investigated how item orientation affected inferences about the factor structure of students’ epistemology in history because previous studies consistently measured factors of epistemology with uniformly oriented items within factors and polarized item orientation across factors. We administered a survey comprised of positively and negatively oriented items for three previously hypothesized dimensions of student epistemology—complex, tentative, and integration beliefs. We used confirmatory factor analysis to examine how different approaches to handling item orientation affected conclusions about the factor structure of epistemology. First, we replicated the factor structure of epistemology in prior research using a subset of uniformly oriented items within factors and polarized item orientation across factors, finding similar fit and small inter-factor associations as seen in previous research. Second, we introduced items of mixed orientation to measure each factor and found that model fit became poor. Third, we simultaneously modeled substantive and method factors, resulting in improved model fit. The method factor explained more variance than substantive factors, indicating item orientation was a significant confound. When controlling for a method effect (i.e., item orientation), associations among epistemology factors were large and positive, suggesting that previous findings of roughly orthogonal epistemology factors may result from offsetting negatively associated method factors and positively associated substantive factors.
Introduction
Self-report surveys measuring respondents’ psychological traits, attitudes, or beliefs often contain items that are positively or negatively oriented toward the construct of interest. When a respondent agrees with a positively oriented item, this indicates a higher level of a construct. For example, the following positively oriented item intends to measure respondents’ beliefs about the complexity of knowledge in history: “To what extent do you agree with the following? Historical events are due to multiple causes.” Agreement is hypothesized to indicate a belief that knowledge in history is complex. Because historians generally agree that knowledge in history is complex and tentative (as opposed to simple and certain), strong agreement with the positively oriented item indicates sophisticated epistemological development that is adaptive for performance in source-based reasoning tasks in history (Bain, 2006; Goldman et al., 2016; Monte-Sano, 2012; Wiley et al., 2020).
Conversely, when responding to a negatively oriented item, disagreement indicates a higher standing on that construct (Zhang & Savalei, 2024). To measure beliefs about the complex nature of knowledge in history, a survey item might ask: “To what extent do you agree with the following? Historical events are due to a single cause.” Disagreement in this case indicates high levels of belief in the complexity of historical knowledge. Because these items have an inverse relation to the construct of interest, they are frequently reverse-coded for ease of interpretation in descriptive and predictive models (Bandalos, 2018; van Sonderen et al., 2013). These items are sometimes referred to as reverse-coded, negatively valenced, or negatively keyed. We use the term “oriented” in this paper for clarity of meaning. We also distinguish between negative orientation, where agreement indicates lower levels of a construct, and negative wording, which uses grammatical negations like “not.” An item can be negatively oriented without using negation, and vice versa.
Importantly, the use of positively and negatively oriented items in the same survey presumes individuals respond to both groups of items similarly (e.g., whether an item is positively or negatively oriented, a belief in the complex nature of knowledge consistently informs their response) (Marsh, 1996; Wang et al., 2018). However, growing empirical evidence undermines this presumption (Cole et al., 2019; Dalal & Carter, 2014; Deemer & Minke, 1999; Dueber et al., 2022; Woods, 2006).
If patterns of item orientation affect responses to surveys and these response patterns are unaccounted for, the underlying psychometric properties of the measure are confounded by method artifacts. We argue that previously validated scales measuring students’ epistemology—beliefs about the nature of knowledge and how it is formed (Barzilai et al., 2015)—are confounded by response patterns due to item orientation. Specifically, patterns of uniform item orientation within factors and polarized item orientations across factors (e.g., dimension A is comprised of positively oriented items; dimension B is comprised of negatively oriented items) contribute to apparent factor structure and attenuate observed relations between factors. We use a survey measuring epistemological beliefs in history to illustrate broader measurement challenges with research measuring epistemological beliefs.
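The attenuation argument can be illustrated with a toy calculation. The sketch below is not a model estimated in this study. It assumes, for illustration only, standardized items, equal trait loadings, and a single acquiescence-like style factor that pushes raw responses toward agreement regardless of orientation. After reverse-coding, that style factor enters the negatively oriented scale with the opposite sign, so it subtracts from the between-scale covariance. The function name and all parameter values are hypothetical.

```python
import math


def scale_correlation(trait_loading, method_loading, trait_corr, n_items):
    """Population correlation between two sum scores (illustrative assumptions).

    Scale A: n_items positively oriented items measuring trait 1.
    Scale B: n_items negatively oriented items measuring trait 2, reverse-coded.
    Every item has unit variance; the style factor loads +method_loading on
    scale A items and -method_loading on reverse-coded scale B items.
    """
    lam2 = trait_loading ** 2
    gam2 = method_loading ** 2
    if lam2 + gam2 > 1:
        raise ValueError("loadings imply a negative residual variance")
    # Items in the same scale share both the trait and the sign of the style factor.
    within_cov = lam2 + gam2
    # Items in different scales share correlated traits but opposite style signs.
    cross_cov = lam2 * trait_corr - gam2
    scale_var = n_items + n_items * (n_items - 1) * within_cov
    return (n_items ** 2) * cross_cov / scale_var


# Traits correlate at .60 in both scenarios (hypothetical values).
no_method = scale_correlation(0.6, 0.0, 0.6, 4)
offsetting = scale_correlation(0.6, math.sqrt(0.36 * 0.6), 0.6, 4)
print(round(no_method, 3), round(offsetting, 3))
```

Under these assumed values, the scale correlation is about .42 without the style factor and approximately zero with it, even though the underlying traits correlate at .60 in both cases.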
To contextualize our study, we first discuss item orientation as a confound in other psychological scales. Then, we discuss research measuring dimensions of students’ epistemological beliefs, highlighting patterns of item orientation that make it unclear whether findings related to the dimensionality of epistemology are due to substantive differences in epistemology or, simply, item orientation.
Item Orientation as a Confound in Psychological Measurement
Some researchers suggest that a mix of positively and negatively oriented items may force survey respondents to think more intently about each item before responding and, consequently, reduce acquiescence bias or social desirability bias (Arias et al., 2022; Barnette, 2000; Hughes, 2009). Others argue that including negatively oriented items in surveys can introduce bias in measures because negatively oriented items can be difficult to understand (Wang et al., 2018). For example, the statement “Historical explanations should not change in light of new information” is negatively oriented because disagreement indicates a higher standing on the construct—tentative beliefs. Individuals who disagree with that statement should agree with its positively oriented version, “Historical explanations should change in light of new information,” for a psychometric scale to be reliable and valid.
Empirically, numerous studies show positively and negatively oriented items do not always function equivalently; they can generate response patterns based on item orientation that are independent of the underlying construct of interest (e.g., Deemer & Minke, 1999). Consequently, failing to address item orientation produces data and interpretations that are confounded by artifactual method effects, including deflated scale means and changes in factor structure (e.g., Arias et al., 2022; Cole et al., 2019; Dalal & Carter, 2014; DiStefano & Motl, 2006; Dueber et al., 2022). For example, one might incorrectly describe a unidimensional concept as multidimensional because half of the items were positively oriented and half were negatively oriented (Arias et al., 2022).
Despite theoretical and empirical documentation of item orientation as a confound, measures of the dimensions of epistemology—beliefs about knowledge and how it is constructed—do not fully account for the influence of item orientation in their development of scales or in subsequent descriptive and predictive studies reporting evidence that epistemology is multidimensional (e.g., Bråten & Strømsø, 2009; Conley et al., 2004; Muis et al., 2015; Ståhl, 2019; Strømsø et al., 2008; Trevors et al., 2017; Wiley et al., 2020; Yli-Panula et al., 2021). Failure to adequately control for method variance has potentially problematic consequences for claims researchers make about the unique predictability of epistemic dimensions as “patterns of responses based on item orientation may generate independent factors that represent artifacts of the language of items, rather than underlying construct[s]” (Dueber et al., 2022, p. 2; see also Deemer & Minke, 1999; DiStefano & Motl, 2006). If the factor structure of epistemology is confounded by artifactual method factors (e.g., item orientation), we cannot measure epistemology accurately. Therefore, the present study was designed to examine the influence of item orientation on scales measuring secondary students’ epistemology in history, providing an evaluation of potential measurement issues for epistemological beliefs more broadly. Before describing the present study, we review previous studies examining the dimensionality of epistemology.
Dimensional Views of Epistemology
Early theoretical work describing students’ beliefs about knowledge—epistemology (Hofer, 2001; Hofer & Pintrich, 1997; Schommer, 1994)—advanced multidimensional concepts of epistemology that recent empirical work sought to validate for descriptive or predictive applications (e.g., how different sets of beliefs predict reasoning). Two prominent dimensions represented in this research reflect an individual’s views about the nature of knowledge: (a) whether knowledge is simple or complex, and (b) whether knowledge is certain or tentative. Two other dimensions emphasized focus on an individual’s views about how one comes to know: (c) what one considers a good source of knowledge, and (d) how one should justify a knowledge claim (Bråten & Strømsø, 2006; Bråten et al., 2008; Bromme, 2005; Strømsø et al., 2008; Wiley et al., 2020). These theorized dimensions of epistemology are conceptualized as existing on a spectrum ranging from naive (e.g., knowledge is simple and certain) to more sophisticated beliefs (e.g., knowledge is complex and tentative) as illustrated in Figure 1A.

Dimensional view of epistemology.
In the figure, more sophisticated beliefs are represented on the right-hand side of the spectrum. Each line is hypothesized to be a distinct dimension that in turn predicts individual variation in complex reasoning, reading, evaluation, and writing tasks across multiple subjects (e.g., Barzilai & Eshet-Alkalai, 2015; Bråten & Strømsø, 2006; Bråten et al., 2014; Bromme, 2005; Muis et al., 2015). For example, in history, Wiley et al. (2020) argued that different components of students’ epistemic beliefs—measured by simplicity/certainty and integration components—uniquely predicted historical writing outcomes across three studies and found these beliefs were not strongly related.
To develop items to measure epistemology in self-report surveys, researchers have commonly adapted items from more extensive epistemology surveys (e.g., the Topic-Specific Epistemic Beliefs Questionnaire [TSEBQ]) to fit with specific domains (e.g., science or history). They also have reduced items in their studies using factor analytic methods. However, researchers have generally failed to report which items were retained or modified from the TSEBQ or how these decisions were made (e.g., Bråten & Strømsø, 2010). Further, the “dimensions” researchers find are typically informed by items with uniform orientation within subscales and disuniform, or polarized orientation across subscales (i.e., one dimension has negatively oriented items, and one has positively oriented items). These subscales become distinct and dissociable factors predicting reasoning outcomes, though they may conflate response patterns due to item orientation with substantive factors. In short, item orientation may be a confound.
Turning to a specific study, Bråten and Strømsø (2009) adapted the TSEBQ to measure students’ epistemic beliefs about climate change. They posited that simplicity and certainty were two distinct dimensions of epistemic beliefs about climate change and that these dimensions uniquely predicted student reasoning. Their scale has been applied or modified in subsequent studies (see Wiley et al., 2020). We note here that the names of dimensions are aligned with naive beliefs, whereas other researchers name the dimensions aligned with the higher end of the construct. However, the naming of the construct is arbitrary. One can simply reverse-code all items for the same result. What matters is whether the inclusion of negatively and positively oriented items within and across scales is possibly confounding response patterns with constructs of interest. For clarity in this paper, we operationalize positively oriented items as those where agreement indicates high amounts of sophisticated beliefs.
Returning to Bråten and Strømsø’s study, consider two items in their first dimension that are both positively oriented, where agreement indicates tentative beliefs: “Theories about climate change can be disproved at any time” and “What is considered to be certain knowledge about climate research today may be considered to be false tomorrow.” Agreement with these statements implies more sophisticated beliefs that knowledge is tentative. Meanwhile, a second dimension is composed entirely of negatively oriented statements, where disagreement indicates sophisticated, complex beliefs, such as “Knowledge about climate change is indisputable.”
Although this item, like the two before it, assesses a belief in whether knowledge changes, it is placed within a different construct alongside other negatively oriented items. We contend that, once this item is reverse-coded, responses should match those to the first two items. They do not. Similarly, Bråten and Strømsø’s (2009) positively oriented item, “Theories about climate change can be disproved at any time,” is similar to Wiley et al.’s (2020) negatively oriented item, “Historical explanations should not change in light of new information.” These two research teams posit different factor structures, despite semantic overlap in items (see Appendix C). Given examples of this throughout the field, we hypothesize that when items are grouped into separate factors based on uniform item orientation within factors and polarized orientation across factors, factor analytic results partially reflect response patterns based on item orientation, rather than substantively distinct constructs.
Across all the papers we examined using Likert scale responses to measure dimensions of epistemology, the use of subscales/factors/components/dimensions with uniform item orientation was ubiquitous. Table 1 shows 21 examples of hypothesized dimensions of epistemology measured with uniformly oriented items within factors and polarized item orientation across at least one other factor. These studies use a variety of factor analytic approaches but share the theoretical perspective that epistemology has multiple dimensions, which they represent in uniformly positive or negative subscales. We identified only three counterexamples of studies not using polarized items across scales. Gunes and Bati (2018) present one dimension that is nearly uniformly negative and another that is nearly uniformly positive (each dimension features one disuniform item). Bråten et al. (2008) use all negatively oriented items for both the Source and Simplicity dimensions. They report low Cronbach’s alpha values (.70 and .56), marginally poor fit in CFA, and do not test their two-factor model against a baseline unidimensional model. Özmen and Özdemir (2019) appear to be an exception. They note they omit inappropriate response patterns and delete items with cross-loadings when building their scales.
Table 1. Studies Using Uniform Item Orientation Within Scales and Polarized Item Orientation Across Scales
Note. *Additional dimensions were used.
Boldface indicates the orientation of items within that construct.
No studies we examined accounted for the potential effects of item orientation methodologically (e.g., by correlating residuals or controlling for positive and negative valence of items through correlated trait–correlated method approaches). Further, researchers frequently reported reducing items by examining fit statistics and coefficient alpha while ignoring patterns in item orientation (e.g., Bråten & Strømsø, 2009; Bråten et al., 2009; Strømsø et al., 2008; Trevors et al., 2017; Wiley et al., 2020; Yli-Panula et al., 2021) and used factor analytic methods (i.e., PCA and EFA) that are sensitive to capturing latent factors in the data that may be “method” factors (i.e., sharing item orientation) (Marsh, 1996). At the same time, researchers have noted issues with cross-validation, disagreements in factor structure and relations between dimensions, and item loadings that are unexpected or contradictory to hypotheses in epistemological research using self-report measures (Buehl, 2008; Schraw, 2013). While theory suggests dimensions of epistemology are correlated, many studies find factors to be orthogonal. All of this indicates a potential measurement issue, which we investigate here using beliefs in history as an illustrative case.
There appears to be some consideration that item orientation may influence response patterns. For example, Bråten and Strømsø (2010) briefly discussed the issue of item orientation in a study examining the effects of epistemic beliefs on student reading. They used PCA to reduce items in the TSEBQ and noted that items with poor factor loadings were discarded:
“While it was not possible to differentiate between the items that we excluded and the items that we retained with respect to content, it should be noted that all the items excluded from the justification dimension were negatively [oriented] (e.g., I often feel that I just have to accept that what I read about climate problems can be trusted), suggesting that, at least to our participants, rejection of a negatively [oriented] item did not necessarily mean the same as endorsement of a positively [oriented] item.” (p. 643)
Despite this observation, the influence of item orientation was not further investigated, and the practice of using statistical methods to reduce items and create uniform item orientation within scales has persisted in research examining the dimensions of epistemic beliefs (e.g., Demirbag, 2021; Huang et al., 2023; Urhahne & Kremer, 2023; Voitle et al., 2022; Wiley et al., 2020; Yli-Panula et al., 2021).
Present Study
Given the influence of item orientation as a confound in other areas of research and the inattention to item orientation as a confound in research measuring dimensions of epistemology in various subjects, we conducted a study to examine the influence of item orientation on a scale measuring students’ beliefs about knowledge in history. The following research question guided the study:
RQ 1: Do factors identified in previous epistemology research primarily reflect substantively distinct dimensions of epistemology or method factors due to item orientation?
We designed and administered a scale to measure students’ epistemological beliefs in history with both positively and negatively oriented items measuring hypothesized epistemological factors from prior research. Having both positively and negatively oriented items allowed us to use confirmatory factor analysis (CFA) to test the presumption of whether these items function equivalently as assumed in previous research. We then presented various models to examine the influence of item orientation on response patterns. We organize these analyses under two subordinate research questions.
RQ 1a: Can item orientation be leveraged to both replicate factors in previous research and create artifactual factors?
First, we estimated two models with epistemology factors having uniformly oriented items within factors, as is common practice in previous research. In one of these models, items were loaded onto factors that align with previous research. In a second model, items were loaded onto factors in a way that contradicted previous findings about the dimensionality of epistemology, but patterns of item orientation remained uniform. If both models had a good fit, this would (1) illustrate how factor structure can change due only to method variance (i.e., item orientation) and (2) support the hypothesis that item orientation is a confound (see Figure 1b).
RQ 1b: What are the relative contributions of substantive and method factors in explaining response patterns when using a mix of positively and negatively oriented items to measure dimensions of epistemology?

Models showing how item orientation contributes to factor structure in previous research.
Next, we tested the relative fit of models in which substantive and method factors were represented to explain item responses. In these models, comparative model fit, factor loadings, and ancillary bifactor indices were evaluated to estimate the relative contributions of substantive and method factors (i.e., item orientation) in explaining item responses.
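The text does not specify which ancillary bifactor indices were computed. One widely used index is explained common variance (ECV), which expresses each factor's share of the common variance as its sum of squared standardized loadings divided by the total across factors. The following is a minimal sketch of that calculation; the function name and the loadings are hypothetical and are not results from this study.

```python
def common_variance_shares(factor_loadings):
    """Share of common variance per factor from standardized loadings.

    factor_loadings maps a factor label to the list of standardized
    loadings of the items that load on that factor.
    """
    sums_of_squares = {
        name: sum(loading ** 2 for loading in loadings)
        for name, loadings in factor_loadings.items()
    }
    total = sum(sums_of_squares.values())
    return {name: value / total for name, value in sums_of_squares.items()}


# Hypothetical loadings: eight items on a substantive factor, four of which
# also load on a method factor for negative orientation.
shares = common_variance_shares({
    "substantive": [0.3] * 8,
    "method": [0.6] * 4,
})
print(shares)
```

With these made-up loadings, the method factor accounts for about two thirds of the common variance, which is the kind of pattern the comparison of substantive and method contributions is meant to detect.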
Based on prior studies showing item orientation as a confound in other domains and the consistent use of uniform item orientation within factors and polarized item orientation across factors, we anticipated that the practice of positively and negatively orienting items on an epistemology survey in history would impact its factor structure in every model we tested. Further, we expected that item orientation, represented in a method factor, would explain more variance in response patterns than previously hypothesized substantive factors. In short, we expected that respondents having to agree or disagree with questions would contribute more to factor structure than the substance/content of the questions.
Methods
Participants
This study was conducted in 2022 in a suburban school district in the western United States with a predominantly Hispanic population (80%). Most students qualified for free or reduced-price lunch (64%), and approximately 14% of the students were English learners as designated by the district. About 45% of the students in the participating district historically met or exceeded standards for English language arts prior to the COVID-19 pandemic (during the pandemic, either the test results were unreliable, or the tests were not administered).
Participants in this study included 214 students in Grades 6–12 who were participating in a field trial of an instructional intervention to improve students’ source-based argument writing in history. In spring 2022, 20 classes in Grades 6–12 took a measure of student epistemology following the yearlong intervention. Of the 214 students, there were 34 in Grade 6, 44 in Grade 7, 33 in Grade 8, 22 in Grade 10, 34 in Grade 11, and 30 in Grade 12; 52% of the students were female.
Epistemology Survey
We generated a 16-item epistemology survey using existing research measures (Bråten et al., 2014; Buehl & Alexander, 2005; VanSledright & Maggioni, 2016; Voss, 1998; Wiley et al., 2020). The scale was developed using a domain-specific (i.e., history) approach as prior research has provided evidence that when measuring students’ epistemic beliefs, the discipline in which students are reasoning must be specified due to differences in disciplinary norms around knowledge construction (Goldman et al., 2016; McCrudden & Sparks, 2014; Rouet et al., 2017).
Items included in our epistemological survey were designed to measure beliefs in three potential dimensions that are relevant to creating knowledge in history: complex (interrelated theories and facts, and multiple causes or explanations are best in history), tentative (explanations and truths can be revised or changed with new information), and integration beliefs (multiple sources of evidence are needed to justify claims). In history, beliefs that knowledge is complex and tentative are needed to spur effortful evaluation of conflicting information (Bain, 2006; Goldman et al., 2016; Wiley et al., 2020). The integration dimension corresponds to how one believes knowledge claims should be justified and how evidence should be integrated, as evidentiary reasoning is a key component of history as a discipline (Goldman et al., 2016; Wiley et al., 2020). We did not measure a fourth dimension about what counts as a good source of knowledge given unclear findings regarding this dimension in previous research and a lack of clear applicability of this dimension in history (i.e., sometimes trusting authority as a source of knowledge is adaptive and sometimes one should trust themself as a constructor of knowledge).
Ten potential items were generated for each of the three hypothesized dimensions—complex, tentative, and integration beliefs—based on items used in previous research (Bråten & Strømsø, 2009, 2010; Wiley et al., 2020). We adopted items from previous studies of epistemology as much as possible, but some items were modified to ensure the scale was developmentally appropriate for the sample, which featured some middle school students and some English learners.
Items also needed to be modified from previous research to ensure that at least two positively and two negatively oriented items were developed for each dimension. This was necessary due to the observed practice of featuring uniform item orientation within scales and polarized orientation across scales. Items that were negatively oriented were worded to (1) most closely resemble the semantic opposites of the corresponding positive items and (2) be as clear as possible. For example, the following two items represent students’ beliefs in complexity: “Most historical events are due to a single cause” and “Most historical events are due to multiple causes.” The first item appears in Wiley et al. (2020). The second item was generated by us to match the semantic opposite of the first and to align with an interpretation that a student has strong beliefs in the complexity of knowledge.
We did not write, “Most historical events are not due to a single cause,” as this would be confusing. However, confusing items featuring negation are common in extant research, for example, “Problems within climate research do not have any clear and unambiguous solutions” (Bråten & Strømsø, 2009; emphasis added). In sum, we tried to match items from previous research to better contextualize prior findings, but sometimes modified items to make them understandable. This included avoiding items with negation, when possible, as these are particularly confusing (Arias et al., 2022).
Some items were more difficult to orient negatively, clearly, and congruently to positively oriented items. For example, the item “To understand the causes of historical events, you need to connect evidence using reasoning” might be oriented negatively to read “To understand the causes of historical events, you don’t need to connect evidence using reasoning.” However, this item is unclear when written in the negated mode (Colston, 1999). Instead, drawing on insights from previous researchers that the opposite of justification through evidence and reasoning is justification through apparent facts (e.g., Buehl & Alexander, 2005; VanSledright & Maggioni, 2016; Voss, 1998; Wiley et al., 2020), we represented the opposite end of the spectrum measuring this epistemological dimension with the question: “To understand the causes of historical events, you just need the facts.” In history, students who disagree with this statement would have firm beliefs in integrating evidence from multiple sources.
Before administering the survey, three members of the larger study team, who were not co-authors on this paper, reviewed the list of 30 items. They noted issues with (1) readability and clarity for the target population and (2) alignment with the hypothesized dimension. Given their feedback, a 16-item scale was produced and administered (see Table 2).
Table 2. Items Measuring Epistemology Using a Dimensional Approach
Note. Items marked with an asterisk (*) were reverse-coded so responses were all positively oriented across dimensions; the highest response represents a view of knowledge as complex, tentative, and justified through the integration of evidence.
Student responses were collected using a sliding scale from 1 to 10, with 1 indicating that they strongly disagreed with the statement and 10 indicating they strongly agreed with it. All negatively oriented items were reverse-coded before analyses. Items were presented to students in a randomized order. Given this randomization and equivalent content in positively and negatively oriented items, we could systematically test whether responses reflected the orientation of the items or underlying substantive dimensions of epistemology.
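Reverse-coding on a bounded scale reflects each response around the scale midpoint. The following is a minimal sketch, assuming the 1-to-10 response range described above; the function name is hypothetical and this is not the authors' code.

```python
def reverse_code(response, scale_min=1, scale_max=10):
    """Reflect a response so that strong disagreement becomes the top score."""
    if not scale_min <= response <= scale_max:
        raise ValueError("response falls outside the scale range")
    return scale_min + scale_max - response


# A student who strongly disagrees (1) with a negatively oriented item
# receives the highest score (10) after reverse-coding.
recoded = [reverse_code(r) for r in (1, 3, 10)]
print(recoded)
```

After this transformation, higher values on every item indicate more sophisticated beliefs, regardless of the item's original orientation.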
Analytic Approach
Confirmatory factor analysis (CFA) using Mplus 8.4 (Muthén & Muthén, 1998–2017) was used to answer our research question: Do factors identified in previous epistemology research primarily reflect substantively distinct dimensions of epistemology or method factors? All models were estimated using full information maximum likelihood. CFA was suitable for our research question because it allowed us to use various models to (a) test hypotheses about the role of item orientation as a source of method variance and (b) evaluate whether dimensional structures reported in prior research could be recovered once orientation was modeled explicitly. Exploratory factor analyses were not conducted because the study was not intended to identify an optimal factor structure.
We first tested two models of epistemology in which item orientation was fully confounded with substantive dimensions. Each dimension was comprised of only positive or negative items (Figure 2). These models were intended to show how item orientation could be leveraged to change factor structure irrespective of the content of the underlying items. The first model, Figure 2a, represented complex and tentative beliefs as a single dimension (see Wiley et al., 2020) and featured negatively oriented items. Integration beliefs were a separate dimension featuring positively oriented items. This model has been used previously.

Dimensions of epistemology with uniform item orientation within factors.
Figure 2b tested a model at odds with prior theoretical assumptions and empirical findings but used uniform item orientation within factors. This model combined negatively oriented items from the complex and integration dimensions to form a single, atheoretical dimension. Another dimension consisted of positively oriented items from the tentative dimension. If this model had a similar fit to the first model, it would provide evidence that method variance can be leveraged to generate artifactual factor structures, which we hypothesize is occurring in prior research (see Table 1).
Next, we tested five competing CFA models using all the items in the survey. For all models, correlated residuals were specified for item pairs (e.g., complex1 and complex1b) to account for shared wording or semantic overlap between items. We tested three types of models—substantive factors, method factors, and reference and nonreference factors—that allowed us to estimate the relative contributions of (a) method variance due to item orientation and (b) substantive dimensions of epistemology on response patterns in the data (Byrne, 2013; DiStefano & Motl, 2006; Geiser et al., 2008; Marsh, 1996).
The first model we tested was a baseline unidimensional model representing epistemology as a single construct (Figure 3a). This model had no dissociable dimensions of epistemology or factors modeling method variance. Next, we tested a correlated three-factor model reflecting three theoretical dimensions of epistemology in history: complex, tentative, and integration (Figure 3b). This model assumes responses are due only to the substantive dimensions of epistemology. It was necessary to test unidimensional and multidimensional models, given our hypothesis that factor structure was an artifact of method variance in previous research.

Alternative models of student epistemology in history with and without method factors.
Next, we tested a correlated two-factor “methods” model reflecting only positive and negative item orientation of the underlying measure (Figure 3c). One factor represented the method variance of positive item orientation, and another factor represented the method variance attributable to negative item orientation. This model represented the hypothesis that item orientation is the only source of item variance.
The final two models refit the unidimensional and multidimensional models but added a nonreference factor, representing method effects, to account for systematic residual variance due to item orientation. In these correlated trait–correlated method minus one [CT-C(M–1)] models (Eid et al., 2003; Geiser et al., 2008), positively oriented items loaded onto a reference factor (e.g., epistemology measured by positively oriented items), and negatively oriented items loaded onto both the reference and nonreference factors (i.e., negative item orientation). The nonreference factor, which we called “negative orientation,” captured shared variation among items due to the method, above and beyond the traits (i.e., epistemology) specific to the negatively oriented items. This allowed us to decompose variance in responses due to trait and method (item orientation) effects, which was the key purpose of the study. The nonreference factor could also have comprised positive items without changing results.
Specifically, Figure 3d was a bifactor model with epistemology as a substantive reference factor and a nonreference factor accounting for the variance due to negative item orientation. Figure 3e was a two-tier model with three correlated reference factors measuring three dimensions of epistemology with positively oriented items—complex, tentative, and integration—and a specific nonreference factor accounting for the variance due to negative item orientation (Cai, 2010). In Figure 3d and e, the nonreference method factor was orthogonal to the reference substantive factor(s) (Eid et al., 2003; Geiser et al., 2008; Marsh, 1996; Wang et al., 2018).
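The reference/nonreference structure described above can be written as a measurement model. The following is a sketch in illustrative notation (the symbols $T_k$, $M$, $\lambda$, $\gamma$, and $\varepsilon$ are assumptions introduced here, and the correlated residuals for item pairs are omitted for brevity). Item $i$ belongs to substantive dimension $k(i)$; $M$ is the negative orientation factor.

```latex
\begin{align}
  Y_i &= \lambda_i \, T_{k(i)} + \varepsilon_i
      && \text{(positively oriented item)} \\
  Y_j &= \lambda_j \, T_{k(j)} + \gamma_j \, M + \varepsilon_j
      && \text{(negatively oriented item)} \\
  \operatorname{Cov}(T_k, M) &= 0 \quad \text{for all } k,
      \qquad \operatorname{Cov}(T_k, T_{k'}) \text{ freely estimated}
\end{align}
```

Under this notation, Figure 3d corresponds to a single reference factor $T$, and Figure 3e to three correlated reference factors $T_1, T_2, T_3$ (complex, tentative, and integration).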
After CFA, various model fit indices were considered, including chi-square goodness of fit, comparative fit index (CFI), Tucker–Lewis index (TLI), root mean square error of approximation (RMSEA), and standardized root mean squared residual (SRMR within and between). Recommended cutoff values of those indices for good model fit are statistically nonsignificant chi-square (p ≥ .05), CFI and TLI ≥ .90 with ≥.95 considered ideal, RMSEA ≤ .06, and SRMR ≤ .10 with ≤.08 considered ideal (Hu & Bentler, 1999; Kline, 2015). Model comparisons for nested models, especially comparing unidimensional and three-factor models, were conducted using chi-square difference-of-fit tests (Hu & Bentler, 1999; Kline, 2015).
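The cutoffs listed above can be expressed as a small lookup. This is a minimal sketch for illustration only: the function name `classify_fit`, its labels, and the example values are hypothetical and are not taken from any SEM package or from the study's results.

```python
def classify_fit(chi2_p, cfi, tli, rmsea, srmr):
    """Label each fit index using the cutoffs stated in the text."""

    def at_least(value, good, ideal):
        # Higher is better (CFI, TLI).
        if value >= ideal:
            return "ideal"
        return "good" if value >= good else "poor"

    def at_most(value, good, ideal):
        # Lower is better (SRMR).
        if value <= ideal:
            return "ideal"
        return "good" if value <= good else "poor"

    return {
        "chi2": "good" if chi2_p >= 0.05 else "poor",
        "CFI": at_least(cfi, good=0.90, ideal=0.95),
        "TLI": at_least(tli, good=0.90, ideal=0.95),
        "RMSEA": "good" if rmsea <= 0.06 else "poor",
        "SRMR": at_most(srmr, good=0.10, ideal=0.08),
    }


# Hypothetical values for a single fitted model.
labels = classify_fit(chi2_p=0.001, cfi=0.92, tli=0.89, rmsea=0.059, srmr=0.054)
for index, label in labels.items():
    print(index, label)
```

Because the indices can disagree, as they do for these example values, a model's overall fit is a judgment across indices rather than a single pass/fail outcome.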
We also evaluated the validity of models by examining parameter estimates, namely the direction and magnitude of standardized factor loadings and correlations between factors (DiStefano & Motl, 2006; Dueber, 2017; Marsh, 1996). Further, the relative strength of the hypothesized substantive factor(s) to the negative nonreference method factor in Figures 3d and e was examined using the explained common variance (ECV) index. Viewing each of the substantive factors (epistemology in Figure 3d and complex, tentative, and integration in Figure 3e) as its own general factor and the nonreference factor as a specific factor, larger ECVs (closer to 1) would suggest the general factor(s) explain a greater amount of item variance than the specific nonreference method factor (Dueber, 2017; Rodriguez et al., 2016).
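To make the ECV index concrete, the sketch below computes it as the share of summed squared standardized loadings attributable to one factor. It assumes one substantive factor and one orthogonal method factor; the loadings are hypothetical and are not the study's estimates, and the function name is our own.

```python
def explained_common_variance(target_loadings, other_loadings):
    """Share of common variance attributable to the target factor.

    Both arguments hold standardized loadings for the same items; common
    variance is the sum of squared loadings across both factors.
    """
    target = sum(loading ** 2 for loading in target_loadings)
    other = sum(loading ** 2 for loading in other_loadings)
    return target / (target + other)


# Hypothetical loadings for four items. The last two items are negatively
# oriented, so they also load on the method factor.
substantive = [0.60, 0.50, 0.20, 0.10]
method = [0.00, 0.00, 0.60, 0.70]

ecv_substantive = explained_common_variance(substantive, method)
ecv_method = explained_common_variance(method, substantive)
print(round(ecv_substantive, 3), round(ecv_method, 3))
```

In this two-factor case the two values sum to 1, so a substantive ECV below .50 means the method factor accounts for the larger share of common variance.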
The contributions of the substantive factors and method effects to item variances were also evaluated by examining coefficient omega hierarchical (ωH), omega hierarchical subscale (ωHS), and Hancock’s H (Dueber, 2017; Hancock & Mueller, 2001; Reise, 2012; Reise et al., 2010; Rodriguez et al., 2016). A higher coefficient omega indicates that a higher proportion of the observed total score variance for these items is explained by this factor after partialing out the variance attributed to the subscale or general factor. Hancock’s H indicates the stability of a factor (in our case, substantive or method factors).
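The indices above can be computed directly from standardized loadings. The following is a sketch under simplifying assumptions: one general (substantive) factor, one orthogonal specific (method) factor, standardized items, and no correlated residuals. The loadings are hypothetical and the function names are our own.

```python
def _components(general, specific):
    """Squared loading sums and total unique variance for a set of items."""
    g = sum(general) ** 2
    s = sum(specific) ** 2
    unique = sum(1 - a ** 2 - b ** 2 for a, b in zip(general, specific))
    return g, s, unique


def omega_hierarchical(general, specific):
    """Proportion of total score variance due to the general factor."""
    g, s, unique = _components(general, specific)
    return g / (g + s + unique)


def omega_hierarchical_subscale(general, specific):
    """Proportion of subscale score variance due to the specific factor,
    using only the items that load on that specific factor."""
    pairs = [(a, b) for a, b in zip(general, specific) if b != 0]
    g, s, unique = _components([a for a, _ in pairs], [b for _, b in pairs])
    return s / (g + s + unique)


def hancock_h(loadings):
    """Hancock's H (construct replicability) from standardized loadings."""
    ratio_sum = sum(l ** 2 / (1 - l ** 2) for l in loadings)
    return ratio_sum / (1 + ratio_sum)


# Hypothetical loadings; the last two items also load on the method factor.
general = [0.60, 0.50, 0.20, 0.10]
specific = [0.00, 0.00, 0.60, 0.70]

omega_h = omega_hierarchical(general, specific)
omega_hs = omega_hierarchical_subscale(general, specific)
h_method = hancock_h([l for l in specific if l != 0])
print(round(omega_h, 3), round(omega_hs, 3), round(h_method, 3))
```

Models with several correlated reference factors, as in Figure 3e, need formulas that account for the factor correlations; this sketch does not cover that case.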
Results
An examination of the uni- and multivariate properties of variables indicated all variables were adequate for engaging in CFA (the slight skewness that caused them to fail the Shapiro-Wilk test was not sufficient to cause them to be problematic in our analyses). Items that were negatively oriented did have lower overall means than other items. For example, responses to the item: “Explanations should change with new information” were higher (M = 6.55; SD = 2.37) than responses to the item “Explanations should not change with new information” after reverse coding (M = 5.86; SD = 2.65). Descriptive statistics for our variables are set out in Table 3.
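For readers unfamiliar with reverse coding, a minimal sketch follows. It assumes the 1–10 response range used in this survey and mirrors a response around the scale midpoint; the function name is hypothetical.

```python
def reverse_code(response, scale_min=1, scale_max=10):
    """Mirror a response so that agreement with a negatively oriented item
    is scored like disagreement with its positively oriented counterpart."""
    if not scale_min <= response <= scale_max:
        raise ValueError("response is outside the scale range")
    return scale_min + scale_max - response


# Strong agreement (9) with a negatively oriented item becomes a low score.
reversed_score = reverse_code(9)
print(reversed_score)
```

If orientation had no effect, the reverse-coded mean of a negatively oriented item would be expected to sit close to the mean of its positively oriented counterpart.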
Items Measuring Dimensions of Students’ Epistemology
Note. Responses ranged from 1–10 and were recorded by sliding a bar between strongly disagree and strongly agree. An asterisk (*) indicates responses that have been reverse-coded. Using the Shapiro-Wilk normality test (Shapiro & Wilk, 1965), the variables indicated with ‡ were not normally distributed.
We also examined bivariate and item-total correlations between variables. A correlation matrix is presented in Appendix A. At this point, we determined 2 of the 16 variables should be omitted from CFA. The variables complex3b and just1 had negative correlations with other substantive and methodologically similar (shared item orientation) items. Complex3b had negative correlations with three of the four variables in the hypothesized complex dimension and weak or statistically nonsignificant correlations with other positively oriented items. In a single-factor model, both complex3b and just1 had negative factor loadings, suggesting they were particularly misunderstood by students or were poorly designed items.
The first model of epistemology, where item orientation is confounded with substantive dimensions (Figure 2a), had good model fit: χ2 = 37.947, df = 19, p < .001; CFI = .956; TLI = .935; RMSEA = .056; SRMR = .042. In this model, complex/tentative was a single dimension featuring items that were all negatively oriented, and integration was a separate dimension featuring items that were uniformly positive in orientation. All factor loadings for the respective dimensions were statistically significant and moderate to strong (.508–.668), and the factors were orthogonal (r = −.001, p = .994). The relations between factors, model fit, and estimates from this model were theoretically and methodologically consistent with previous research (see Table 1).
The next model also leveraged method variance but combined items in a way that was incongruent with previous research (Figure 2b). This model intended to show item orientation as a source of method variation contributing to factor structure from previous research. This model combined negatively oriented items from complex and integration to form a single dimension. Another dimension consisted of positively oriented items from the tentative dimension. This model also had good fit: χ2 = 30.830, df = 19, p < .001; CFI = .974; TLI = .961; RMSEA = .044; SRMR = .038. All factor loadings for the respective dimensions were statistically significant and moderate to strong (.423–.721). The relation between factors was also approximately 0 (r = −.046, p = .554). Together, the models in Figure 2 provide evidence that factor structure can change by leveraging method variance.
Next, five CFA models were fit to the data using a mix of positive and negative items. Model fit indices are shown in Table 4. Overall, the one-factor and three-factor substantive models (3a and 3b, respectively) that failed to account for method variance showed poor fit. The two-factor model, which featured method factors only (3c), had marginally good fit (χ2 = 147.647, df = 70, p < .001) with CFI (.916) and RMSEA (.059) values considered good, SRMR considered ideal (.054), and TLI considered marginally poor (.890). This model had consistently significant, positive, and moderate to strong standardized factor loadings (.427–.663) as seen in Figure 4. The correlation between method factors was weak and nonsignificant, similar to factors modeled in previous research.
Model Fits of Confirmatory Factor Analysis Shown in Figure 4
Note. Correlated residuals for item pairs were set in all models.

Standardized loadings for methods factor model; item orientation explains item variation.
The models that included both reference factor(s) and a nonreference factor modeling residual variance due to negative item orientation (3d and 3e) had good fit. Figure 3d—one substantive reference factor and one nonreference method factor—had an SRMR (.045) value that was ideal and CFI (.927), TLI (.895), and RMSEA (.058) values that were good. The fit indices for the three correlated substantive reference factors and one nonreference method factor model (Figure 3e) were ideal (CFI = .963; TLI = .944; RMSEA = .044; SRMR = .044). Further, the chi-square difference-of-fit test indicated a preference for this model over the one reference factor model (∆χ2 = 35.1237, ∆df = 2, p < .001) and the two method factors model (∆χ2 = 52.514, ∆df = 9, p < .001).
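The difference-of-fit tests above refer a change in chi-square to a chi-square distribution with the change in degrees of freedom. The sketch below reproduces that lookup with the standard library only. It assumes an unscaled difference test; if a robust estimator with a scaling correction was used, the appropriate computation differs. Function names are our own.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in chi = sqrt(x).
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total


def chi2_difference_p(delta_chi2, delta_df):
    """p value for a chi-square difference-of-fit test."""
    return chi2_sf(delta_chi2, delta_df)


# Differences reported in the text.
p_vs_one_reference = chi2_difference_p(35.1237, 2)
p_vs_two_methods = chi2_difference_p(52.514, 9)
print(p_vs_one_reference < 0.001, p_vs_two_methods < 0.001)
```

Both reported differences fall well below the .001 threshold under this computation.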
Examining the magnitude of standardized factor loadings, we see that nonreference factors contributed substantially to variation in item responses. However, standardized factor loadings were generally small and sometimes negative for the reference factors, indicating a lack of clearly defined epistemology factor(s) after a method factor was estimated.
The three reference and one nonreference factor model (see Figure 5) featured standardized factor loadings from the negative orientation factor that were all moderate and statistically significant (.438–.664). For the reference factors, all but two of the factor loadings were statistically significant. All of the factor loadings, except one (−.153), were positive. Correlations between reference factors were significant, positive, and moderate to strong (.622–.815), though 8 of the 14 factor loadings onto reference factors were less than .300, with a range of .136 to .710. Given the factor loadings for reference factors shown in Figure 5 and the consistently strong, significant, and positive direction of factor loadings in Figure 4 (two method factors), it can be argued that the methods-only model explains the underlying data well.

Standardized loadings for three correlated substantive reference factors and one nonreference method factor model.
Returning to the CT-C(M–1) models, ancillary bifactor indices (Table 5) show the relative strength of the hypothesized epistemology factor(s) compared to the negative orientation factor in Figure 3d and e. First, considering the model with one substantive epistemology factor, Table 5 shows a lower explained common variance (ECV) for the substantive epistemology factor (.414) than the negative orientation factor representing method effects (.975). Additionally, omega hierarchical (.232) and omega subscale (.807) indicated the negative orientation factor was a better source of reliable variance than the epistemology factor (Rodriguez et al., 2016).
Ancillary Bifactor Indices for Substantive and Method Factors
Note. Factor indices computed using Dueber (2017).
Similar coefficients are seen in the model with three correlated reference factors. Specifically, the negative orientation factor had the highest ECV (.957), followed by integration (.593), tentative (.520), and complex (.252) factors. Omega hierarchical for the reference factors ranged from .126 to .329, and omega subscale for the negative orientation factor was .809, indicating that the negative orientation factor representing method effects was a better source of reliable variance than the substantive reference factors. Hancock’s H, which indicates the stability of a factor, suggests (1) none of the substantive reference factors are stable or reliable, and (2) the negative factor is actually a “well-defined latent variable” with a value greater than .80 (Hancock & Mueller, 2001, p. 230). In sum, substantive factors are weak and unstable. The method factor is strong, stable, reliable, and consistently explains more variance in item responses.
Importantly, the method factor(s) in the two most plausible models (the method factors model, Figure 4, and the three correlated substantive reference factors and one nonreference method factor, Figure 5) have different interpretations. In the method factors–only model, the method factors are roughly orthogonal, suggesting independent sources of variation for positively and negatively oriented items. This model has the advantages of simplicity and interpretability, although it raises the question of which, if either, method factor better captures aspects of participants’ epistemic beliefs.
The three correlated substantive reference factors and one nonreference method factor model (Figure 5) effectively implies that a method factor strongly affects participants’ responses on negatively oriented items. Responses are also affected by three moderately to highly correlated substantive dimensions of epistemology, albeit to a lesser degree than by the methods factor. In this case, the method factor could plausibly be conceptualized as “lack of comprehension” or “inattention,” assuming that confused or inattentive respondents are more likely to show different response patterns for negatively and positively oriented items. We estimate that this effect is strong, given that models accounting for, or leveraging, item orientation had a significantly better fit.
As sensitivity analyses, we also examined the fit of the best-fitting model for older students (Grades 8, 10, 11, and 12) only and non-EL students only, given these subgroups would be least likely to fail to comprehend items. In these models, the main patterns in model fit, factor loadings, and ancillary bifactor indices hold; the method factors explain more variance than the substantive factors.
As additional sensitivity analyses, we fit three more models to the data: 1) a second-order bifactor model, where a general epistemology factor encompassed three subfactors—complex, tentative, and integration beliefs—as seen in Figure 3e; 2) the model shown in Figure 3d with an additional method factor representing response patterns due to positive item orientation; and 3) Figure 3e with an additional method factor representing response patterns due to positive item orientation. The latter two analyses use a multitrait–multimethod modeling (MTMM) approach (Byrne, 2013; DiStefano & Motl, 2006; Marsh, 1996), presenting effects of both positively and negatively orienting items as method effects.
All models had a similar fit to the correlated three-factor model reported in Figure 5, and patterns of factor loadings did not differ from the main results reported; factors identified in previous epistemology research are likely confounded with method effects that influence factor structure beyond substantive dimensions of epistemology. See Appendix B for the model fits of the sensitivity analyses.
Discussion
Item Orientation as a Significant Confound
We found substantial evidence that item orientation significantly influenced item response patterns when students completed a survey designed to assess their epistemological beliefs in history. First, models leveraging uniform item orientation within factors and polarized item orientation across factors achieved ideal fit even if items were set to load onto factors in patterns that contradicted prior research and theory (i.e., complex and integration beliefs as a single construct). It is important to note that these models are methodologically consistent with prior research; they leverage shared item orientation within scales and polarized item orientation across scales. The marginally good fit of the method factors model also emphasizes that significant sources of variance due to item orientation can generate artifactual factor structure in epistemology scales (Arias et al., 2022; Woods, 2006). This model could be viewed as proposing two distinct factors of “epistemology.” It would have model fit and coefficient alpha equivalent to models in prior research, even as it disregards prior theoretical contributions by mixing items from three “substantive” dimensions to form artifactual latent constructs. To illustrate, our models with excellent fit treated item pairs like the following as representing two separate dimensions of epistemology: “Most historical events are due to a single cause” and “Most historical events are due to multiple causes.” We agree with those arguing that positively and negatively oriented items do not function equivalently (Roszkowski & Soven, 2010; van Sonderen et al., 2013; Woods, 2006).
Second, the significant improvement of model fit when including a nonreference factor to capture method variance also highlights the influence of item orientation on response patterns. Moreover, the models that did not account for item orientation and did not leverage item orientation as a confound by having uniformly positive or negative items in a factor all had poor fit. While the correlated three-factor model with a nonreference method factor had a good fit, the ancillary bifactor indices and factor loadings indicated that item orientation was more than a confound; it was the primary source of variation explaining student response patterns. The hypothesized constructs of complex, tentative, and integration beliefs more weakly informed response patterns. These factors were weak and unstable. Further, the consistently small or negative item loadings for negatively oriented items onto their respective reference factors indicate that responses to these items have little, if anything, to do with these epistemological dimensions once method variance is accounted for in the nonreference factor. We note here that choosing the negative method factor was arbitrary and that statistically equivalent models specifying a positively oriented, rather than a negatively oriented, method factor would yield identical results.
Given these findings and those from investigations into other psychological constructs (e.g., Cole et al., 2019; Deemer & Minke, 1999; DiStefano & Motl, 2006; Wang et al., 2018), we argue that studies measuring the dimensions of epistemic beliefs are substantially confounded with artifactual method effects to the extent that claims about factor structure are invalid. Not only do prior studies fail to control for the influence of method artifacts that we have presently modeled as significant and predominant, they also amplify this shared variance due to item orientation by (1) using methods like PCA and EFA and (2) reducing items to form scales without considering the content and orientation of items (e.g., Bråten & Strømsø, 2009; Conley et al., 2004; Huang et al., 2023; Muis et al., 2015; Ståhl, 2019; Strømsø et al., 2008; Trevors et al., 2017; Wiley et al., 2020; Yli-Panula et al., 2021). These practices exacerbate item orientation as a confound by leveraging the shared variance of positively and negatively oriented items within a subscale to create distinct yet artifactual factors. In our study, the greater explanatory power, reliability, and stability of a method factor and the excellent fit of models leveraging item orientation demonstrate how one can effectively increase scale reliability by leveraging the commonalities generated by artifactual method effects. In sum, when item orientation is not properly accounted for, the factor structure of epistemology is capricious.
Contextualizing Previous Research Findings
Our findings may also help explain other counterintuitive findings or disagreements from previous research. First, competing previous findings related to factor structure (e.g., Strømsø & Bråten, 2009; Wiley et al., 2020) are plausibly explained by item orientation as a confound. Wiley et al. (2020) find that complex and tentative beliefs are a single dimension—with all items uniformly negative—only after positively oriented items from Bråten and Strømsø’s study (2009) were reversed or removed (see Appendix C). Whether a construct is one dimension or two, it appears the answer partially lies in how much method variance, unrelated to the constructs, is leveraged. The good fits of our models leveraging method variance—Figure 2a and b—explicitly highlight the malleability of factor structure. The weak or negative factor loadings for negative items in Figure 5 show how such malleability can be generated by a method factor that is comprised of items not at all related to epistemology.
Second, our findings may also help contextualize prior findings that epistemological dimensions were weakly correlated or orthogonal (see Table 1). We see two possible explanations for the low correlations between dimensions of epistemology in prior research: (1) dimensions of epistemology are not strongly related (e.g., having beliefs that knowledge is complex is not strongly related to having beliefs that knowledge is tentative) or (2) low correlations or orthogonal relations are due to partially offsetting negatively associated method factors and positively associated substantive factors. We see the second explanation as more plausible, given theoretical conceptualizations of epistemological dimensions as related (Hofer, 2001; Hofer & Pintrich, 1997; Schommer, 1994) and our own findings.
Our models that leveraged uniform item orientation within factors and polarized orientation across factors (Figure 2a and b) had correlations between factors of −.001 and −.046. These match low correlations in extant research (see Table 1). Likewise, our methods-only model with a marginally good fit (Figure 3c) reported a near-zero correlation between factors. These models measured substantive and method variance together in the nearly uncorrelated factors. In doing so, they plausibly feature offsetting negatively associated method factors and positively associated substantive factors.
Conversely, when we used a CT-C(M–1) approach, we disentangled plausibly offsetting negatively associated method effects by utilizing reference and nonreference factors. This resulted in positively associated substantive factors. Therefore, it appears that weak correlations between oppositionally valenced factors in previous research reflect offsetting (1) highly negatively correlated method factors and (2) highly positively correlated substantive factors, which we have disentangled in the present study. Conceptually, strong correlations between substantive factors make considerably more sense than previous studies with weakly correlated substantive factors (see the right-most column in Table 1). A student with more sophisticated beliefs in the nature of knowledge, viewing knowledge as complex and tentative, likely has more sophisticated beliefs in how knowledge is constructed (Hofer, 2001; Hofer & Pintrich, 1997; Schommer, 1994). Our final model indicates there may be conceptually distinct dimensions of epistemology that are strongly related (if method variance is accounted for). Yet, the ancillary bifactor indices and consistently weak factor loadings for items loading onto the reference and nonreference factors in the best-fitting model (Figure 5) make us hesitant to make any claims about reliably estimating substantive dimensions of epistemology.
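The offsetting argument can be illustrated with a simple worked equation. This is a sketch under strong simplifying assumptions that are not part of the fitted models: each confounded factor $F_a$ is a weighted sum of a substantive component $T_a$ and a method component $M_a$, all with unit variance, and substantive and method components are mutually uncorrelated. The symbols $w$, $\rho_T$, and $\rho_M$ and the numerical values are hypothetical.

```latex
\begin{align}
  F_a &= w\,T_a + \sqrt{1 - w^2}\,M_a, \qquad a = 1, 2 \\
  \operatorname{Corr}(F_1, F_2)
      &= w^2 \rho_T + (1 - w^2)\,\rho_M \\
  \text{e.g., } w^2 = .5,\ \rho_T = .70,\ \rho_M = -.70
      &\;\Rightarrow\; \operatorname{Corr}(F_1, F_2) = .35 - .35 = 0
\end{align}
```

Under these assumptions, a strongly positive substantive correlation $\rho_T$ and a strongly negative method correlation $\rho_M$ can combine into a near-zero correlation between the confounded factors, which is the pattern described above.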
Implications for Future Research
We argue that there is not presently a reliable understanding of the dimensions of epistemology. Accordingly, we see several steps as warranted. First, when reevaluating prior findings related to the dimensions of epistemology, researchers should test for the effects of item orientation when forming and validating scales that propose epistemology is best represented by multiple dimensions. Further, insufficient reporting of psychometric decisions likely compounds the effects documented here. When reducing or modifying items, researchers should name these items, document their reasons for removing them, and note the item orientation of retained/removed items. Empirically investigating the influence of item orientation is vital if researchers are concerned with measuring what they intend to measure and not artificial method variance.
Future scales might include uniform item orientation across scales or intentionally balance item orientation across hypothesized dimensions in scale creation and validation. Positive item orientation and clear, developmentally appropriate language might ensure that language fluency or other individual attributes do not confound response patterns (Hughes, 2009). This may include removing potentially confusing, negatively oriented and/or worded items (e.g., “Problems within climate research do not have any clear and unambiguous solution” or “Historical explanations should not change in light of new information”; Arias et al., 2022; Colston, 1999; Wang et al., 2018).
At the same time, we lack the content validity evidence to attest that using only clear and positively oriented items would reliably measure epistemology. Instead, researchers should engage in more systematic content validation and the use of alternative formats to Likert scales. Methods such as the expanded format have individuals choose between two positively and clearly worded options. Instead of choosing between agreement and disagreement, individuals might indicate their simple or complex beliefs by sliding a bar across continuous scale points to indicate their placement on a spectrum of epistemological development: Only one good historical explanation ←→ Many good historical explanations
This approach is more semantically clear about what “lower” and “higher” responses mean, avoids polarity, is less likely to elicit acquiescence bias compared to Likert scales, and results in response patterns more consistent with theory (Zhang & Savalei, 2024; Zhang et al., 2019). Item-specific (Saris et al., 2010) and forced-choice formats (Schuman et al., 1981) also include specific, semantically clear response options for each item. Though these formats take more time for respondents to complete, they can reduce issues in response patterns, which, we argue, presently obscure clear measurement.
Limitations
Given the limited sample size, linguistic heterogeneity of participants, and purpose of the study, we only provide evidence about the problematic effects of measurement practices rather than definitive statements about the dimensional structure of epistemological beliefs in this or a general population. Although reading skills and processing may influence scale responses (Roszkowski & Soven, 2010; Dueber et al., 2022), we see it as plausible that issues of item orientation affect factor structure in previous studies with different populations, given the replication of previous findings in Figure 2a and b. Still, using the research design employed presently with post-secondary or adult populations would add validity to the present findings.
Another limitation is that although most item pairs are clear opposites in valence or meaning, some may not be clear opposites. In balancing readability, item orientation, and meaning, we cannot be sure that all items functioned as intended. For example, the negation of tent3b, “You can be certain that historical explanations are true,” might more accurately be “You cannot be certain that historical explanations are true.” Similarly, tent2 and tent2b could be more clearly opposed by rewriting tent2b as “The knowledge about a historical topic remains the same.” To address these limitations, future studies should consider having external reviewers assess items for validity and equivalent meaning.
At the same time, these item pairs do not need to be exact in CFA models that account for shared variance in item pairs with correlated residuals, allow for correlations between factors, and permit additional sources of variance while still modeling the construct of interest: beliefs in the tentative nature of knowledge. Further, many of the item pairs that we see as clear semantic opposites (e.g., complex1 and complex1b, tent1 and tent1b, just2 and just2b) had weak or even negative correlations after reverse coding. This provides evidence at the item level that item orientation influenced response patterns. Still, one might simply bypass this issue by using alternatives to Likert-scale formats, as mentioned previously.
The testing conditions are also a limitation in this study. We had limited oversight over the testing situation because teachers administered the epistemological surveys as part of classroom instruction. This could have also influenced the results, as students might not have treated their responses with the seriousness we expected as researchers. Future studies in a more tightly monitored testing situation, or with items to indicate careless response patterns, could extend our understanding of why and when item orientation affects response patterns. Finally, examining the relative difficulty of items using item response theory could further clarify the findings.
Conclusion
The current investigations offer further support for researchers and test developers who argue against the inclusion of negatively oriented items due to their influence on the factor structure (Cole et al., 2019; Dalal & Carter, 2014; Deemer & Minke, 1999; Hughes, 2009; Roszkowski & Soven, 2010). Given the influence of item orientation as a confound to factor structure in other domains and the present study, previous studies featuring dimensional scales that permit and ignore item orientation as a source of variance should be reevaluated.
Hofer and Pintrich’s (1997) initial theoretical work outlining multiple dimensions of epistemology is still influential in studies of epistemology today, as epistemic cognition is often seen as occurring within unique epistemological dimensions that are specifically relevant to an inquiry task (Chinn et al., 2011). Therefore, accurate measurement of factors is important to understand how specific beliefs about knowledge are related to student reasoning, especially in a 21st-century information society where struggles with epistemic differences (e.g., how to know what is true) create social and civic discord (Barzilai & Chinn, 2020; Kavanagh & Rich, 2018). First, though, we contend that accurately and reliably measuring constructs of interest is a foundational process that must be reexamined and improved to move forward.
Footnotes
Appendix
Perceived Modifications Made by Wiley et al. (2020), From Extant Research
| Items Grouped by Original Dimension | Modification | Modified Item Under Simple/Certain Factor |
|---|---|---|
| [Negatively oriented] Simplicity of Knowledge Dimension | [Negatively oriented] Component (Simple/Certain) | |
| Within topic, various theories about the same thing will make things unnecessarily complicated. | Change topic | The best explanations in history are those that stick just to the one major cause that most directly leads to the event. |
| Knowledge about topic is indisputable. | Change topic | Good explanations in history are always indisputable. |
| With respect to topic, there are seldom connections among different issues. | Change topic | There is only one good historical explanation that can be written from a set of facts. |
| [Positively oriented] Certainty of Knowledge Dimension | [Negatively oriented] Component (Simple/Certain) | |
| What is considered to be certain knowledge about topic today may be considered to be false tomorrow. | Change topic, orientation, and dimension | You can be certain that historical explanations are true. |
| Theories about topic can be disproved at any time.* | Change topic, orientation, and dimension | Historical explanations should not change in light of new information. |
| Problems within topic research do not have any clear and unambiguous solution. | Change topic, orientation, and dimension | Most historical events are due to a single cause. |
Note. *This item is also similar to “The knowledge about topic is constantly changing.” If this were the adapted item, there is also a change in item orientation and a subsequent change in factor structure. The item is also similar to “Knowledge about topic is indisputable,” which appears in the negatively oriented simplicity dimension. One might argue that there is a substantive difference between something being indisputable versus not being able to be disproved at any time (or knowledge and theory are different), and, therefore, these are indicators of substantively different dimensions of epistemology. However, we see this argument as weak on its own and not robust to critiques of item orientation. One item reoriented would result in the following two questions loading onto different dimensions of epistemology: “Knowledge about topic can be disproved” and “Theories about topic can be disputed.”
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305C190007, the WRITE Center. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.
Open Practices
Authors
JACOB STEISS was a Project Scientist at the University of California, Irvine when this study was conducted;
DREW BAILEY is a professor at the University of California, Irvine;
TAMARA TATE is a project scientist at the University of California, Irvine, and associate director of the Digital Learning Lab;
STEVE GRAHAM is the Warner Professor in the Division of Leadership and Innovation in Teachers College;
