Abstract
Epistemic beliefs and their possible effects on human behavior have been a focal point of research in recent decades. By measuring epistemic beliefs, researchers aim to better understand how individuals perceive and approach knowledge. More recently, however, the construct itself has come under closer scrutiny, as it is often described as difficult to conceptualize and assess. Hence, this paper further analyzes the psychometric properties of one widely used instrument, the Connotative Aspects of Epistemological Beliefs (CAEB). Across four data sets, a total of N = 1,108 educational science and psychology students were asked to complete the CAEB with regard to educational science and/or psychology. Data were analyzed via confirmatory factor analyses, exploratory structural equation models, and analyses of measurement invariance. Results show that neither the originally proposed theoretical dimensions nor the empirically identified dimensions of the CAEB can be replicated. Furthermore, exploratory analyses reveal a new model with a total of 12 items and two factors, which shares some similarities with the empirically identified measurement model. Group comparisons between the samples indicate that the metric invariance hypothesis holds, whereas scalar invariance cannot be substantiated. The findings suggest that the originally proposed CAEB dimensions are not applicable to educational science and psychology. Instead, this preliminary 12-item version appears promising for assessing connotative epistemological beliefs. Nonetheless, the characteristics of the convenience samples, limitations regarding reliability, and the lack of scalar invariance need to be addressed in future studies in different domains and across different samples.
Introduction
Modern society is built upon the accumulated knowledge and learning of its individuals. The understanding of what knowledge is and what it entails has led to the development of educational systems that pass on this knowledge in order to shape the future. More recently, greater attention has been paid to how individuals think about knowledge in specific domains and how this shapes their behavior. Epistemic beliefs, that is, beliefs about the nature of knowledge and knowing (Hofer & Pintrich, 1997), have become an extensively researched phenomenon in educational science, psychology, and related fields. Among other outcomes, they are reported to influence learning processes and conceptual change (Hofer & Pintrich, 1997), at least indirectly (e.g., Muis & Franco, 2009).
However, epistemic beliefs are described as a complex and difficult construct to measure (Mason, 2016). Many studies rely on self-report questionnaires, but if these instruments lack reliability and validity, meaningful conclusions about their influence cannot be drawn. Poor measurement can lead to misinterpretations and ineffective interventions, ultimately limiting the practical application of findings. For instance, consider a teacher whose epistemic beliefs shape how they approach new or contradictory research findings. If these beliefs are then assessed with an unreliable instrument, any conclusions drawn about them and their role in processing and integrating research findings may be inaccurate. Furthermore, if the relationships between epistemic beliefs and how individuals engage with knowledge are based on inaccurate measurements, their theoretical and empirical significance must be reconsidered. These concerns have also been voiced by Lee et al. (2021), who observed that numerous studies assessing epistemic beliefs lack adequate reporting on validity. Against this backdrop, ensuring the validity and reliability of measurement tools is essential for advancing research and deriving meaningful practical implications.
Hence, the aim of the current study is to reevaluate an epistemic beliefs instrument, namely the Connotative Aspects of Epistemological Beliefs scale (CAEB; Stahl & Bromme, 2007), which has been in use for almost 20 years, but which has not been critically evaluated in terms of its psychometric properties for the disciplines of educational science and psychology. This paper aims to contribute to the current discourse on the feasibility of the quantitative assessment of epistemic beliefs and the reliability and validity of the particular instrument CAEB.
Theoretical Background
Over the last 25 years, researchers have investigated a wide variety of concepts and frameworks addressing beliefs about the Nature of Knowledge and the process of Knowing, which are usually subsumed under the term epistemic beliefs (Muis et al., 2016). The beginnings of research in this field are usually associated with William G. Perry, who reasoned that individuals start with dualistic epistemic beliefs (e.g., knowledge is either true or false), proceed in their development to a certain level of multiplism (e.g., preliminary knowledge can be contradictory), and, in some cases, become more sophisticated and reach relativism (e.g., knowledge is socially and actively constructed). Perry's model is considered very influential (Hofer & Bendixen, 2012), as researchers have published numerous refinements, extensions, or revisions with regard to, for example, the number and gradation of the stages (e.g., Barzilai & Weinstock, 2015; Kuhn & Weinstock, 2002; Peter et al., 2016), the judgment domains (Kuhn et al., 2000), or the granularity of the entities the beliefs are about (Merk et al., 2018; Muis et al., 2006). All of these frameworks and models have in common that they posit a (recursive) development through at least three stages. This is why some researchers subsume them under the developmental perspective on epistemic beliefs (Muis et al., 2006).
The so-called dimensional perspective evolved from a critique of a key assumption of developmental approaches, namely that several aspects of epistemic beliefs change simultaneously when an individual proceeds to the next developmental stage, and it gave epistemic beliefs research a tremendous boost. A first Likert-type assessment of epistemic beliefs was developed by Schommer (1990; cf. Hofer & Pintrich, 1997), which laid the groundwork for applying more elaborate statistical methods such as factor analysis. This sparked a lively debate about the number and nature of the aspects, or dimensions, of epistemic beliefs. Since then, the construct has repeatedly been described as hard to conceptualize and measure (Lee et al., 2021; Mason, 2016).
To date, the most broadly adopted framework (Bromme et al., 2010), proposed by Hofer and Pintrich (1997), postulates two subdimensions for the Nature of Knowledge (Certainty and Simplicity of Knowledge) and two for the Nature of Knowing (Source and Justification of Knowing). However, as Bråten et al. (2011) point out, factor-analytic approaches have not consistently verified these four dimensions (Hofer, 2000; Kienhues et al., 2008). Despite this lack of consensus regarding the structure of epistemic beliefs, many studies in the field of education have investigated their possible effects, such as their associations with teaching practice, work engagement, teaching motivation, and burnout (Bråten & Ferguson, 2015; Kilinç & Seymen, 2014; Lammassaari et al., 2021; Lammassaari et al., 2022). Teaching motivation, in particular, has been examined in the context of epistemic beliefs through social utility, where teachers are motivated by the societal value of their work (Kilinç & Seymen, 2014), and task value, which reflects the perceived importance of teaching tasks (Bråten & Ferguson, 2015). Although not directly linked, these motivations conceptually align with the teacher behavior of providing rationales, as described by Ahmadi et al. (2023).
In addition to influencing teaching motivation, epistemic beliefs are also said to shape teachers' instructional approaches in the classroom (Soleimani, 2020). Some studies suggest that teachers who hold more sophisticated beliefs are more likely to create learning environments that foster deeper, self-regulated learning, ultimately promoting critical thinking (Brownlee et al., 2012). In contrast, less elaborate beliefs have been found to be negative predictors of student outcomes (Alexander et al., 1997; Bråten et al., 2011; Mason & Boscolo, 2004; Pieschl et al., 2008). Further evidence suggests that epistemic beliefs may also influence students' argumentation, enjoyment of learning, and academic achievement (Guo et al., 2022; Liang et al., 2023; Şen et al., 2023). Nevertheless, such findings must be treated with caution, as the continuing challenges of accurately measuring epistemic beliefs cast doubt on how robust they actually are.
Connotative Aspects of Epistemological Beliefs
Given the complexities of the construct, different approaches have been used to conceptualize and measure epistemic beliefs (Kienhues et al., 2008). While many studies have focused on the explicit, reflective dimensions of epistemic beliefs outlined above, others have explored alternative conceptualizations that capture less deliberate, more intuitive aspects. This distinction was sharpened by Stahl and Bromme (2007), who examined how intuitive epistemic beliefs are expressed. In their terminology, they distinguish between explicit, denotative beliefs, which are accessible to conscious reflection and often context-specific, and associative, connotative beliefs, which are activated spontaneously and often require further contextual information. For example, judging whether the findings of a specific psychological study are certain activates denotative aspects of beliefs, whereas the question, or rather the feeling, of whether psychological knowledge is generally certain or uncertain addresses connotative aspects of beliefs (example adapted from Rott et al., 2015). In this sense, most of the belief scales mentioned above address denotative beliefs.
Recognizing the abundance of denotative questionnaires, Stahl and Bromme (2007) specifically aimed to develop an instrument that addresses connotative beliefs: the Connotative Aspects of Epistemological Beliefs (CAEB). They analyzed the data with a factor analytical approach in two studies with more than 1,000 participants each. The CAEB is a semantic differential instrument comprising 24 pairs of opposite adjectives (e.g., certain–uncertain, stable–unstable). Participants rate these pairs on a 7-point Likert scale in relation to a specific domain – such as educational science or psychology (used in the studies presented below), which are typical subjects in teacher education.
Instead of the three theoretically assumed factors Simplicity, Certainty, and Source (derived from a literature review), Stahl and Bromme (2007) empirically identified two factors via exploratory factor analysis (see Table 1 for a comprehensive depiction). The factor Texture is a mixture of concepts about the Nature of Knowledge and the Nature of Knowing. It consists mostly of items about the structure and accuracy of knowledge, drawn primarily from the hypothesized factors Simplicity (e.g., superficial–profound) and Source (e.g., precise–imprecise). The factor Variability represents beliefs about the stability and dynamics of knowledge, which were initially subsumed under the factor Certainty (e.g., dynamic–static). In this context, ratings that knowledge is rather exact, structured (both Texture), stable, and inflexible (both Variability) are commonly described as naïve or absolutistic beliefs, whereas the notion that knowledge is vague, unstructured, dynamic, and flexible reflects multiplistic beliefs (Rosman et al., 2017).
Classification of the 24 Adjective Pairs of the CAEB to the Theoretically Assumed Dimensions and the Empirically Identified Dimensions by Stahl and Bromme (2007).
Source. Stahl and Bromme (2007).
Note. The English adjective pairs are translations of the German items used in the studies presented below. The brackets indicate the original position of the item pair in the psychometric scale. Based on the results of the exploratory factor analysis, item pairs marked with (T) were later assigned to the dimension Texture and item pairs marked with (V) to the dimension Variability. All other item pairs could not be assigned to either dimension by Stahl and Bromme and were excluded from their measurement model.
The two-factor solution identified by Stahl and Bromme (2007) was based on (mainly biology) students' ratings of knowledge in plant identification, genetics, physics, and organic chemistry. Subsequently, the CAEB was applied to other domains (Kienhues et al., 2011; Pieschl et al., 2008; Rosman et al., 2017; Schreck et al., 2023), particularly in teacher education (Kunter et al., 2017; Münchow et al., 2019). The two dimensions Texture and Variability were usually adopted, but, as with many other instruments for the assessment of epistemic beliefs, their transferability to other disciplines or populations has not been systematically examined, which seems essential in light of the so-called replication crisis (Shrout & Rodgers, 2018).
Current Study
Effects of epistemic beliefs have consistently been found using CAEB adaptations, and their practical relevance is emphasized across the literature (e.g., Kienhues et al., 2011; Pieschl et al., 2008; Schreck et al., 2023). Nonetheless, their actual impact remains uncertain, as the psychometric quality of this instrument has not been rigorously evaluated across different domains. For instance, Schreck et al. (2023) used an adaptation of the CAEB in mathematics, but their primary focus was on students' development of epistemic beliefs over time rather than on validating the instrument. Similarly, the studies by Kienhues et al. (2011) and Pieschl et al. (2008) were limited by small sample sizes, which restricts the generalizability of their findings.
While the CAEB has been applied in various studies, systematic validation efforts remain scarce. This lack of validation is particularly significant in the disciplines of educational science and psychology, which are central to epistemic beliefs research. These fields provide key contexts for studying epistemic beliefs, especially in relation to teaching and learning (Hofer & Pintrich, 1997). Robust measurement tools are also essential for teacher education programs, where epistemic beliefs are believed to influence variables like instructional practices and student outcomes (e.g., Bråten et al., 2011; Lammassaari et al., 2022). Validating the CAEB in these domains not only ensures its applicability but also strengthens its utility for studying relationships between epistemic beliefs and other constructs. This study addresses this gap by systematically evaluating the psychometric properties of the CAEB in these two critical domains.
Against this background, this contribution aims to examine whether established and commonly used instruments such as the CAEB (Stahl & Bromme, 2007) are suitable for study programs in educational science and psychology. Additionally, this research seeks to contribute to the systematic assessment of the insufficiently explored psychometric properties of the CAEB. Accordingly, the following research questions are addressed in this paper:
a: three-dimensional, theoretically assumed model with the dimensions Simplicity, Source, and Certainty
b: two-dimensional, empirically identified model with the dimensions Texture and Variability
Methods
Data Collection and Samples
A total of four convenience samples were used in this study. These datasets were not originally collected for the purpose of scale validation, but were obtained in different research projects with distinct objectives. The data for the CAEB scales were subsequently used for secondary analysis in this paper to validate the instrument with reference to educational science and psychology (for comparison between datasets see Table 2):
The first convenience sample was drawn from a (large-scale) study within the German cross-university project Learning the Science of Education (LeScEd; Groß Ophoff et al., 2014). In one substudy, which aimed at item generation and test standardization for the assessment of Educational Research Literacy, 1,360 students from different study programs in the field of Educational Science at six German universities were recruited (Study 1: winter semester 2012/2013 and summer semester 2013; Groß Ophoff et al., 2014). As part of the original study's methodology, a pre-test was conducted in which two-thirds of the participants completed subscales from intelligence tests, namely the Culture Fair Intelligence Test (CFT 20-R; Weiß & Weiß, 2006) or the Intelligenz-Struktur-Test (I-S-T 2000R; Groß Ophoff et al., 2014; Liepmann et al., 2001). The remaining third of the participants (N = 364 students) were randomly assigned to complete the CAEB with regard to either educational science or mathematics. For the purposes of this paper, only the subsample that completed the CAEB with regard to educational science is analyzed.
The second convenience sample was obtained from the German BilWiss study (e.g., Linninger et al., 2015). This project assessed the general pedagogical knowledge of more than 3,000 teacher training students in a census survey in North Rhine-Westphalia in spring/summer 2011 (Kunter et al., 2017). The corresponding scientific use file is available upon request at the German Research Data Centre and contains data of N = 519 university graduates who completed the CAEB instrument in educational science.
The third convenience sample was collected as a part of the Assessment and Training of Scientific Literacy (ASTRALITE) project, which aimed to evaluate and develop scientific reasoning of psychology students (N = 225; Münchow et al., 2019). Besides the newly developed instruments, such as the Argument Structure Test (AST) and Argumentative Judgment Test (AJT; see Münchow et al., 2019), the CAEB was administered to assess epistemic beliefs regarding psychology.
The fourth convenience sample comprises a subsample from the German BilWiss study (N = 264; see Sample 2), who completed the CAEB instrument with respect to psychology (referred to as BilWissP).
In all four samples, participation was voluntary and pseudonymous. For further details about study implementation, see Groß Ophoff et al. (2014; Sample 1), Kunter et al. (2017; Samples 2 and 4), and Münchow et al. (2019; Sample 3).
Descriptive Statistics of the Samples From the Main Studies. LeScEd (Sample 1), BilWiss (Sample 2), ASTRALITE (Sample 3), BilWissP (Sample 4).
Note. n = number of study participants; M (SD) = mean (standard deviation).
Semester = half of an academic year; in German higher education, a semester usually lasts 6 months.
Abitur = German University Entrance Qualification, grades range from 1 to 6 (4 as lowest passing grade) with lower numbers indicating better results.
Psychometric Scale and Statistical Analyses
As the main focus of this research was to explore the psychometric qualities of an instrument assessing connotative epistemological beliefs, the CAEB was administered in all four samples. In Sample 1 and Sample 2, students from the field of educational science were asked to judge the Nature of Knowledge and Knowing within that domain. In contrast, Sample 3 comprised psychology students who completed the CAEB with regard to psychology. Sample 4, a subsample of Sample 2, also completed the CAEB with regard to psychology.
The analysis was initially based on confirmatory factor analyses (CFA) of the theoretically assumed and the empirically identified dimensions by Stahl and Bromme (2007). If the confirmatory analyses yielded insufficient model fit, exploratory analyses in the form of structural equation modeling (Mplus, Version 8; Muthén & Muthén, 2017) were to be used to determine a new, and ideally stable, measurement model. Specifically, the data were analyzed using exploratory structural equation models (ESEM; Asparouhov & Muthén, 2009). ESEM combines the advantages of exploratory and confirmatory factor analysis within a structural equation modeling framework by allowing each item to load not only on one dimension (as is common in CFA) but on all dimensions of the measurement model. To evaluate model fit, indices such as the Comparative Fit Index (CFI; Bentler, 1990) and the Root Mean Square Error of Approximation (RMSEA) were used (Marsh et al., 2004), with RMSEA < 0.080 and CFI > 0.900 indicating acceptable fit and RMSEA < 0.050 and CFI > 0.950 indicating good fit.
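To make the fit criteria above concrete, the following sketch computes CFI and RMSEA from chi-square statistics and applies the cut-offs stated in the text. It assumes the standard textbook formulas; the RMSEA denominator here uses N − 1, whereas some programs use N, so values may differ slightly from Mplus output. The function names are illustrative and not part of any package.

```python
import math


def cfi(chisq_model: float, df_model: int, chisq_base: float, df_base: int) -> float:
    """Comparative Fit Index (Bentler, 1990) from target and baseline model chi-squares."""
    noncentral_model = max(chisq_model - df_model, 0.0)
    # Denominator: the largest of the model, baseline, and zero noncentrality.
    noncentral_base = max(chisq_base - df_base, noncentral_model)
    if noncentral_base == 0.0:
        return 1.0  # both models fit the data perfectly
    return 1.0 - noncentral_model / noncentral_base


def rmsea(chisq_model: float, df_model: int, n: int) -> float:
    """Root Mean Square Error of Approximation; assumes N - 1 in the denominator."""
    return math.sqrt(max(chisq_model - df_model, 0.0) / (df_model * (n - 1)))


def classify_fit(cfi_value: float, rmsea_value: float) -> str:
    """Label model fit using the cut-offs given in the text."""
    if rmsea_value < 0.050 and cfi_value > 0.950:
        return "good"
    if rmsea_value < 0.080 and cfi_value > 0.900:
        return "acceptable"
    return "insufficient"


# Fit values reported for the final model in Sample 1 and for Sample 4
print(classify_fit(0.921, 0.060))  # acceptable
print(classify_fit(0.961, 0.040))  # good
```

Both criteria must be met jointly, so a model with RMSEA = 0.050 but CFI = 0.898 is still labeled insufficient under these rules.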
As the robustness of CFA and ESEM results depends on adequate sample sizes, the sample sizes were evaluated against established guidelines for CFA and ESEM, as compiled by Kyriazos (2018). Since this study draws on datasets from different research projects, no a priori sample size estimations were conducted. However, the sample sizes of all datasets met or exceeded the commonly recommended range (N = 100–250) and were therefore deemed sufficient for the validation analyses (Kyriazos, 2018).
Sample 1 served as the starting point for model development for two main reasons. First, it was the earliest dataset readily accessible to the author team, allowing for unrestricted exploratory analyses and iterative refinement. This accessibility was essential for the initial development process. Second, the CAEB scale was originally validated in the context of educational science, and participants in Sample 1 had a curricular background in this field, providing a theoretically grounded basis for model construction. Sample 2, Sample 3, and Sample 4 were subsequently used for cross-validation, both within the field (educational science) and across disciplines (psychology). The workflow of this investigation is depicted in Figure 1.

Analysis procedure.
First, the ratings of epistemic beliefs with regard to knowledge in educational science were modeled. To identify a stable, well-fitting solution, items were successively excluded if they had low loadings on more than one factor, no loading above λij = .30 on any factor (Hair et al., 2014), or an inconclusive loading pattern (i.e., similar loadings on more than one factor). The selection of items was guided by established principles of factor analysis (Clark & Watson, 2019), with particular attention to factor loadings and loading patterns. While item-total correlations were examined as supplemental information, they were not used as the primary criterion for item exclusion. This decision reflects concerns raised in the literature: relying solely on item-total correlations may inflate internal consistency at the expense of construct coverage (DeVellis & Thorpe, 2021). This is especially critical for epistemic beliefs, a construct known for its conceptual complexity and frequently low scale reliabilities (Lee et al., 2021). Therefore, maximizing reliability alone was deemed insufficient and potentially misleading in this context.
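The exclusion rules above can be sketched as a simple screening function over a loading matrix. This is an illustrative assumption, not the authors' procedure: a loading of exactly .30 is treated as salient, and "similar loadings on more than one factor" is operationalized as two salient loadings differing by less than a hypothetical margin of .10. The item names and loadings in the example are invented. In the study, items were removed one at a time with re-estimation and content considerations also played a role, so a function like this could only flag candidates.

```python
from typing import Dict, List


def screen_items(loadings: Dict[str, List[float]],
                 min_loading: float = 0.30,
                 min_gap: float = 0.10) -> Dict[str, str]:
    """Flag candidate items for exclusion from a (rotated) factor loading matrix.

    'low'       -> no absolute loading reaches min_loading on any factor
    'ambiguous' -> two salient loadings that differ by less than min_gap
    'keep'      -> a clear primary loading
    min_gap is a hypothetical cut-off chosen only for illustration.
    """
    flags: Dict[str, str] = {}
    for item, row in loadings.items():
        ranked = sorted((abs(x) for x in row), reverse=True)
        if ranked[0] < min_loading:
            flags[item] = "low"
        elif len(ranked) > 1 and ranked[1] >= min_loading and ranked[0] - ranked[1] < min_gap:
            flags[item] = "ambiguous"
        else:
            flags[item] = "keep"
    return flags


# Hypothetical three-factor loadings for three items
example = {
    "item_a": [0.62, 0.10, -0.05],
    "item_b": [0.21, -0.18, 0.12],
    "item_c": [0.41, 0.37, 0.02],
}
print(screen_items(example))
# {'item_a': 'keep', 'item_b': 'low', 'item_c': 'ambiguous'}
```

Absolute values are used because the sign of a loading only reflects the direction of the adjective pair, not its salience.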
Once a satisfactory factor solution was identified, it was translated into a confirmatory measurement model, which was then applied to Sample 2, Sample 3, and Sample 4 using CFA. In addition, measurement invariance for Sample 2, Sample 3, and/or Sample 4 versus Sample 1 was analyzed successively by comparing hierarchically nested models of configural, metric, and scalar factorial invariance (Cheung & Rensvold, 2002). The more parsimonious model was retained if the decline in model fit was ΔCFI ≤ 0.01 (Chen, 2007). Testing measurement invariance determines whether an instrument measures the same construct across different samples. At a minimum, metric invariance (i.e., equivalent factor loadings) must be established to ensure that the same constructs are being assessed. Based on the standardized factor loadings, the composite reliability McDonald's ω was computed (McDonald, 1999). This coefficient indicates the extent to which the indicators represent the information of the underlying factor and can be interpreted similarly to Cronbach's alpha.
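The sequential ΔCFI decision rule described above can be illustrated with a short sketch. The CFI values in the example are hypothetical; the function steps through the nested models and keeps the most restrictive level whose CFI drop, relative to the previously retained model, does not exceed 0.01.

```python
from typing import List, Tuple


def retained_invariance_level(fits: List[Tuple[str, float]], max_drop: float = 0.01) -> str:
    """Return the most restrictive invariance level supported by the Delta-CFI criterion.

    `fits` lists (level, CFI) pairs ordered from least to most restrictive,
    e.g., configural -> metric -> scalar.
    """
    retained_level, retained_cfi = fits[0]
    for level, cfi_value in fits[1:]:
        if retained_cfi - cfi_value <= max_drop:
            retained_level, retained_cfi = level, cfi_value
        else:
            break  # stricter levels presuppose this one, so stop here
    return retained_level


# Hypothetical CFI values for one pairwise sample comparison
print(retained_invariance_level([("configural", 0.930), ("metric", 0.925), ("scalar", 0.890)]))
# metric
```

Each level is compared with the previously retained one rather than with the configural model, which mirrors the nested-model logic of the invariance hierarchy.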
Results
At the heart of this paper is the question of the dimensionality of connotative aspects of epistemological beliefs in educational science and psychology, and whether the emerging dimensions are generalizable across closely related disciplines. To gain insights into this issue, the following analyses were conducted.
Confirmatory Factor Analyses
To test whether epistemic beliefs can be measured by (a) the three theoretically assumed dimensions (24 items) and/or (b) the two empirically identified dimensions (17 items) of the CAEB (Stahl & Bromme, 2007), CFAs were estimated for each individual sample (see Table 3).
Factor Analyses for All Samples.
Note. / = confirmatory model did not converge.
Confirmatory Factor Analyses of Stahl & Bromme’s (2007) Theoretical (Three Factors, 24 Items)-, and Empirical (Two Factors, 17 Items) Model for Each Individual Convenience Sample.
First, the theoretically assumed model did not converge in CFA for Sample 1, Sample 2, and Sample 4. Only the CFA for Sample 3 yielded a solution, but its fit was poor (CFI = 0.682, RMSEA = 0.082). Second, the empirically identified model did not converge for Sample 4. In Sample 1 (CFI = 0.788, RMSEA = 0.080), Sample 2 (CFI = 0.738, RMSEA = 0.102), and Sample 3 (CFI = 0.764, RMSEA = 0.088), where the models converged, model fit was poor.
Exploratory Structural Equation Modeling
Because of the unsatisfactory results of the previous CFAs, the analysis was restarted from scratch. In line with the theoretically assumed model (Stahl & Bromme, 2007), a three-dimensional model with all 24 CAEB items (ESEM1; see Table 4) was specified for Sample 1; it narrowly missed the commonly applied criteria for acceptable fit (CFI = 0.898, RMSEA = 0.050). The ESEM could be improved (ESEM2) by successively excluding four items (items 04, 15, 23, and 24; for full item descriptions see Table 1) with low factor loadings (λij < .30) and inconclusive loading patterns. The remaining pool of 20 items showed an acceptable model fit. However, items 14 (inconclusive loading pattern) and 20 (low factor loading) were also excluded before the model was deemed sufficient and transferred into a confirmatory model. In this CFA, each item was assigned to the factor on which it had its highest loading, and the loadings of the two items representing the third factor (items 01 and 07) were fixed to 1. This CFA showed unacceptable fit (CFA1; see Table 4), so the three-factor solution, and with it the third factor, was abandoned in favor of a two-factor solution. In addition, item 10 showed the highest number of cross-loadings and was therefore excluded from further analyses. Furthermore, residual correlations were permitted between items 21 (accepted–disputed) and 22 (certain–uncertain) and between items 11 (sorted–unsorted) and 16 (structured–unstructured), because their wording was very similar (face validity). Such similarity can lead items to capture overlapping facets of the same construct, which was also reflected in high modification indices (see CFA2). This holds especially true for items 21 and 22, which use similar words and were presented in direct succession.
Allowing residual correlations, particularly in cases of item similarity, is common practice in factor analysis, provided it is theoretically justified and applied carefully (Brown, 2015). The newly identified measurement model showed improvements but was still not sufficient for reliable analyses. To address this issue, other evaluation metrics for model refinement were examined for CFA2 in addition to standardized factor loadings and modification indices. They revealed high measurement error, low item-total correlations, and low inter-item correlations for items 08 and 18, as well as cross-loadings for item 17; theoretical and construct-validity considerations further justified the exclusion of these items. The final model (CFA3) showed an acceptable fit in Sample 1 (CFI = 0.921, RMSEA = 0.060) and was consequently used in the analysis of Sample 2, Sample 3, and Sample 4 (see next section). The final two-factor model used for cross-validation is presented in Figure 2.
Goodness-of-Fit Statistics of the Exploratory and Confirmatory Factor Analyses of the CAEB-Items From the Main Study Sample in Winter Semester 2012/2013.
Note. ESEM = exploratory structural equation modeling; CFA = confirmatory factor analysis; nf = number of latent factors; ni = number of test items included.

Identified two-factorial solution for the LeScEd dataset.
This model is generally comparable, but not identical, to the two-factor (Texture, Variability) empirical model of Stahl and Bromme (2007). Thus, the factor labels were retained for the measurement model presented here.
Cross-Validation of the Dimensionality
Subsequently, a CFA was conducted for each of the remaining three samples, which revealed acceptable (Sample 2: CFI = 0.939, RMSEA = 0.060) to good (Sample 4: CFI = 0.961, RMSEA = 0.040) fit, with Sample 3 (CFI = 0.955, RMSEA = 0.051) at the border of good fit (cf. Supplemental Material, Table A1). Regarding the reliability of the emerging factors, McDonald's ω for Texture ranged from ω = .80 (Sample 4) to ω = .84 (Sample 2), which can be considered high across all samples. For Variability, ω ranged from .59 (Sample 4) to .66 (Sample 3), which is at most sufficient for group comparisons (Taber, 2018). The two factors were significantly negatively correlated in Sample 1 (r = −.39, p < .001) and Sample 3 (r = −.29, p = .001), whereas the correlations in Sample 2 (r = .07, p = .302) and Sample 4 (r = −.15, p = .155) were not significant.
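For readers who wish to reproduce ω from a set of standardized loadings, the following sketch implements the usual composite-reliability formula for a congeneric factor, ω = (Σλ)² / [(Σλ)² + Σ(1 − λ²)]. The optional residual-covariance term is an extension added here as an assumption; it becomes relevant because the Texture factor includes correlated residuals (items 21/22 and 11/16), and the text does not state whether such terms entered the reported ω values. The loadings in the example are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def mcdonald_omega(std_loadings: Sequence[float],
                   residual_covs: Optional[Dict[Tuple[int, int], float]] = None) -> float:
    """Composite reliability (McDonald's omega) from standardized loadings.

    Residual variances are taken as 1 - lambda**2 (standardized solution).
    `residual_covs` maps index pairs (i, j), i < j, to residual covariances on the
    standardized metric; each pair enters the error term twice.
    """
    true_score_var = sum(std_loadings) ** 2
    error_var = sum(1.0 - lam ** 2 for lam in std_loadings)
    if residual_covs:
        error_var += 2.0 * sum(residual_covs.values())
    return true_score_var / (true_score_var + error_var)


# Hypothetical standardized loadings for a three-item factor
print(round(mcdonald_omega([0.60, 0.55, 0.50]), 3))  # 0.566
```

Positive residual covariances enlarge the error term, so ignoring them would tend to overstate ω for a factor such as Texture.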
Table 5 gives an overview of the standardized factor loadings from the CFAs reported above for all four convenience samples; they range from λs = .35 (item 13, Sample 4) to λs = .86 (item 06, Sample 4). To support the interpretation of what the newly identified dimensions represent, the dimensions originally assumed on theoretical grounds and those empirically identified by Stahl and Bromme (2007) are reported in the third and fourth columns on the left side of Table 5. Nine item pairs are located on the first dimension (here called Texture), of which five (items 03, 05, 09, 12, 21) were originally expected to represent beliefs about the Source of Knowledge, three the Certainty of Knowledge (items 02, 11, 22), and one the Simplicity of Knowledge (item 16). Six of these items were later identified by Stahl and Bromme as representing the empirical dimension of epistemic beliefs about the Texture of Knowledge. The present model also includes three additional items, namely items 21 and 22, which were originally excluded, and item 02, which was originally assigned to the Variability dimension, a dimension that was also identified in the samples described in this paper. However, given their consistent and sufficiently high factor loadings and their content-related connection to the factor, these items were retained, as was the factor label Texture.
Standardized Factor Loadings (Standard Errors) for Confirmatory Factor Analysis With the Two CAEB-Dimensions Texture and Variability for Sample 1, Sample 2, Sample 3, and Sample 4.
Note. λs = factor loading (standardized solution), with standard error in brackets. The number in brackets after each item text indicates its position in the scale. Excluded items: simple-complex (01); integrated-separated (04); superficial-profound (07); temporary-everlasting (08); absolute-relative (10); definite-ambiguous (14); negotiated-discovered (15); completed-uncompleted (17); refutable-irrefutable (18); connected-divided (20); detailed-global (23); constructed-preexisting (24).
The second dimension contains three items that Stahl and Bromme (2007) originally described as beliefs about the Certainty of Knowledge. Because they later labeled the items of this dimension Variability, the factor label Variability was also retained for this model. Unlike the Texture items, the items dynamic–static (06), flexible–inflexible (13), and open–closed (19) were originally reverse-coded, so that higher values represent the notion that knowledge in the corresponding domain is rather static (06), inflexible (13), or closed (19).
Overall, this measurement model differs from Stahl and Bromme’s (2007) empirical model in that it contains fewer items and includes some refinements, such as the inclusion of items 21 and 22 and the reassignment of item 02. Despite these modifications, the underlying two-factor structure remains the same, so the model can be considered a preliminary short form of their empirical model.
As a last step, multigroup analyses with increasingly restrictive forms of measurement invariance were compared for Sample 1, Sample 2, Sample 3, and Sample 4. The results of these group comparisons are reported in Tables 6 and 7 (overall test of measurement invariance) and in the Supplemental Material (Table A2). Metric measurement invariance can be assumed for all group comparisons, whereas the assumption of scalar measurement invariance had to be rejected (ΔCFI > 0.010). In other words, the CAEB items appear to measure the same dimensions of connotative epistemological beliefs, but the item intercepts differ across groups (for more information on intercepts and latent means, see Supplemental Material Tables A3 and A4). More precisely, respondents from different groups who hold the same level of the latent belief may still respond differently to some items, which limits the ability to compare means directly between groups.
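The sequential decision rule applied above can be illustrated with a short sketch. It assumes that ΔCFI is computed as the drop in CFI from one model to the next more restrictive one (configural, then metric, then scalar) and that a drop greater than 0.010 rejects the more restrictive level. The helper names (`FitResult`, `highest_supported_level`) and the CFI values in the usage example are hypothetical and do not reproduce the results in Tables 6 and 7. Fitting the multigroup models themselves requires SEM software and is not shown.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FitResult:
    model: str   # invariance level, e.g. "configural", "metric", "scalar"
    cfi: float   # comparative fit index of the multigroup model


def highest_supported_level(fits: Sequence[FitResult], cutoff: float = 0.010) -> str:
    """Return the most restrictive invariance level that is still supported.

    `fits` must be ordered from least to most restrictive model. A level is
    rejected when its CFI drops by more than `cutoff` relative to the
    preceding model; testing stops at the first rejected level.
    """
    supported = fits[0].model  # the configural model serves as the baseline
    for previous, current in zip(fits, fits[1:]):
        delta_cfi = previous.cfi - current.cfi  # decrease in CFI
        if delta_cfi > cutoff:
            break
        supported = current.model
    return supported


# Hypothetical CFI values, for illustration only (not the study's results)
example = [
    FitResult("configural", 0.950),
    FitResult("metric", 0.945),
    FitResult("scalar", 0.920),
]
print(highest_supported_level(example))  # metric
```

Stopping at the first rejected step mirrors the nested logic of invariance testing: scalar invariance is only evaluated against a metric model that has itself been supported.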
Goodness-of-Fit Statistics of the Confirmatory Factor Analyses (Two Factors, 12 Items) Between Two Samples.
Goodness-of-Fit Statistics of the Confirmatory Factor Analyses (Two Factors, 12 Items) Between All Four Samples (LeScEd vs. BilWiss vs. ASTRALITE vs. BilWissP).
Discussion
The overarching purpose of this paper was to assess the dimensionality of the CAEB instrument and whether the scale can be applied to the disciplines of educational science and psychology. The first question was which dimensions of epistemic beliefs can be empirically differentiated in the available convenience samples. Confirmatory factor analyses for the theoretically assumed and the empirically identified dimensions by Stahl and Bromme (2007) yielded poorly fitting models and often did not converge at all. Unfortunately, this is not new in this line of research: other established epistemic belief instruments have also proven difficult to replicate (e.g., Leal-Soto & Ferrer-Urbina, 2017). There are many possible reasons why such replication attempts fail. Some psychometric scales are difficult to generalize across cultures (e.g., Odebiyi & Choi, 2022; So & Lee, 2010; Tuncay-Yüksel et al., 2023), and others depend on the target group (e.g., Yli-Panula et al., 2021). Other researchers stress the vagueness of the construct itself and the resulting inadequate operationalization of epistemic beliefs (Mason, 2016). Like Stahl and Bromme (2007), the current study administered the CAEB to German university students, but with a crucial difference: participants were asked about educational science and psychology (rather than, e.g., plant identification or organic chemistry). This raises the question of whether the poor results are primarily related to characteristics of the available convenience samples or to more general challenges in measuring epistemic beliefs.
Second, a new preliminary CAEB model for the disciplines of educational science and psychology was derived from the available data using ESEM and subsequent CFAs. This model has the same two-factor structure as the model empirically identified by Stahl and Bromme (2007), with a few differences: (a) the current model contains 12 instead of 17 items; (b) item 02 (stable–unstable), which was originally assigned to the factor Variability, is assigned to the factor Texture in the current analyses; and (c) items 21 (accepted–disputed) and 22 (certain–uncertain), which were not part of the empirically identified model, were assigned to the factor Texture. The reassignment of item 02 suggests that respondents perceive it as addressing the structure and accuracy of knowledge rather than its stability and dynamics. Future research should investigate whether this reassignment is consistent across different samples and contexts, or whether item 02 is generally less clearly tied to a single dimension than previously assumed.
Despite these changes, results from four convenience samples indicate that the preliminary 12-item model shows acceptable psychometric performance in the context of educational science and psychology, although important limitations regarding the characteristics of the convenience samples and the reliability of the factors must be considered. The measurement invariance analyses further qualify this interpretation: metric invariance could be assumed across all samples, whereas scalar invariance was not fully established. Consequently, these results primarily support the claim that the CAEB items represent the same underlying construct and dimensions across disciplines, while comparisons of latent means between groups remain limited.
In contrast to the three convenience samples of educational science students (Sample 1, Sample 2, and Sample 4), Sample 3, in which psychology students answered the CAEB with regard to psychology, appears to be an outlier. ΔCFI values indicate that group comparisons including Sample 3 show larger differences than those excluding it. These differences may reflect sample characteristics such as composition, homogeneity, participants’ level of expertise, or minor disciplinary variations; however, the present study cannot identify the specific cause. Unfortunately, data on participants’ semester level or prior exposure to the discipline, such as previous coursework or related experience, were not collected for Sample 2 and Sample 4. As a result, the reasons for scalar non-invariance between these groups cannot be determined conclusively, and comparisons of latent means across groups should be interpreted with caution.
Strengths
The present study addresses an issue that is widely discussed among researchers, namely the replicability of quantitative epistemic belief instruments. Four samples were used to provide a detailed evaluation of the applicability of the CAEB in educational science and psychology. The analyses revealed that neither the theoretically assumed nor the empirically identified dimensions by Stahl and Bromme (2007) could be replicated when the scale was applied to these disciplines. However, a preliminary model with 12 items and two dimensions, which differs only slightly from the model empirically identified by Stahl and Bromme, could be identified in the present convenience samples. The few modifications, namely the inclusion of items 21 and 22 and the reassignment of item 02, may suggest that the overall model should be reconsidered. Despite these adjustments, the underlying factor structure remains recognizable, as the items are conceptually consistent with the two factors proposed by Stahl and Bromme (2007), Texture and Variability, although the reliability of the Variability factor warrants further examination in future studies. Thus, the measurement model presented can be considered a preliminary adapted short form of the original model. This is further supported by its initial validation across two independent but related disciplines, in which students of both educational science and psychology were tested. Because configural and metric invariance were established, the same construct and dimensions appear to be measured across all samples, whereas the lack of scalar invariance limits comparisons of latent means between groups. This study also extends previous work by systematically evaluating the CAEB across two distinct disciplines, using larger and more diverse samples than earlier studies.
Unlike prior research, which often focused on specific outcomes or relied on small samples, this study provides a more detailed assessment of the instrument’s psychometric properties. Moreover, by addressing the lack of systematic validation efforts for the CAEB and by focusing on two key disciplines in epistemic beliefs research, this study provides an initial basis for improving the applicability of epistemic belief measures across diverse contexts. Overall, this study emphasizes the need for more rigorous and transparent evaluation of epistemic belief instruments, particularly in light of researchers’ responsibility amid the replication crisis in psychological research (Korbmacher et al., 2023). In particular, the findings call into question the use of the CAEB in its original form.
Limitations
Despite these strengths, the current study has some notable limitations. First, only half of the items of the original CAEB remained in the preliminary model proposed here. This is a drastic change and casts doubt on whether the CAEB should be applied at all. There may be several reasons for the weak performance of the CAEB. One reason might be the wording of the items, as ESEMs and CFAs revealed inconclusive loading patterns and substantial cross-loadings for several items. Furthermore, the original studies by Stahl and Bromme (2007) also support the interpretation that the scale contains redundant items. Another problem concerns the Variability factor: across the four samples, its reliability (ω) remained questionable and was at most sufficient for analyses at the group level. These issues warrant further examination, especially through the development and testing of new items for the scale. A further challenge is the interpretation of measurement invariance between the different groups. Because the four samples differ slightly, any explanation for the lack of scalar invariance must be offered with the utmost caution. In particular, the lack of scalar invariance may reflect differences in the characteristics of our convenience samples (e.g., composition, homogeneity, or level of expertise) and could also be influenced by domain differences. However, because of the convenience sampling, the present study cannot draw conclusions about the specific causes of these differences and thus cannot isolate or separately assess the main effects and interactions of domain specificity, expertise, and sample homogeneity. These questions remain open and should be addressed in future research. Nonetheless, the emerging patterns are worthy of critical discussion and can be a starting point for future studies in this field.
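To make the reliability concern more concrete, McDonald’s ω for a single factor can be computed from standardized loadings as ω = (Σλ)² / [(Σλ)² + Σ(1 − λ²)], assuming a congeneric model with uncorrelated residuals and all items coded in the same direction. The sketch below is illustrative only: the function name is hypothetical, and the loadings are invented values, not estimates from the four samples. It shows that, with the loadings held constant, a three-item factor such as Variability yields a lower ω than a longer scale, which is one reason why developing additional items may help.

```python
from typing import Sequence


def mcdonalds_omega(loadings: Sequence[float]) -> float:
    """Omega total for one factor from standardized loadings.

    Assumes a congeneric model with uncorrelated residuals and that all
    items are coded in the same direction (reverse-keyed items recoded).
    """
    if not loadings:
        raise ValueError("at least one loading is required")
    true_var = sum(loadings) ** 2                      # (sum of loadings)^2
    error_var = sum(1.0 - lam ** 2 for lam in loadings)  # sum of residual variances
    return true_var / (true_var + error_var)


# Hypothetical loadings, for illustration only (not estimates from the study)
three_items = [0.60, 0.50, 0.40]
nine_items = three_items * 3  # same loadings, three times as many items

print(round(mcdonalds_omega(three_items), 2))  # 0.5
print(round(mcdonalds_omega(nine_items), 2))   # 0.75
```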
Implications and Future Directions
Despite these issues, the current preliminary short form of the CAEB can be considered a useful instrument for assessing connotative epistemological beliefs. By targeting connotative aspects, the CAEB is unique in its aim to assess spontaneous and associative beliefs, whereas most instruments aim to activate explicit, denotative beliefs. This distinction is important because the CAEB seeks to capture epistemic beliefs by focusing on attitudes and emotions about the Nature of Knowledge and Knowing rather than on conscious reflection (Stahl & Bromme, 2007). Across four convenience samples from two disciplines, this version shows potential as a more economical instrument for assessing connotative epistemological beliefs in educational science and psychology. In this context, the Texture factor shows comparatively stable and satisfactory properties, whereas the Variability factor warrants further examination and should be used with caution because of its lower reliability and greater sensitivity to sample characteristics. It is important to note that these conclusions are based on four convenience samples and on models for which only metric, but not scalar, invariance could be established. Consequently, the current form should be regarded as a preliminary solution whose robustness and generalizability require further validation in more diverse and independent samples.
Building on these points, the current CAEB model still faces several issues that need to be addressed in future investigations and applications. First, the comparatively lower reliability of the Variability factor has implications for its practical use and interpretation. Researchers applying the CAEB should therefore consider the consequences of insufficient reliability, such as attenuated correlations with other variables and reduced power to detect true differences. Regarding the factor itself, developing and testing new items could be a worthwhile way to improve its robustness.
Second, the domain generalizability of the current CAEB model should be assessed in future investigations. While the results of this initial study provide preliminary evidence for the utility of the CAEB in educational science and psychology, it remains uncertain whether it can be applied to other disciplines or samples. In the long term, a domain-general instrument for associative epistemic beliefs could have its merits, as it may provide a different perspective on how individuals conceptualize knowledge in different domains. Such an instrument could also give insights into overarching patterns that cannot be captured by domain-specific measures. Despite its limitations (e.g., characteristics of the convenience samples, limited reliability, lack of scalar invariance), the current preliminary short form shows promise as an adaptable, economical, and efficient tool that could complement instruments measuring denotative epistemic beliefs.
Third, whether the poor performance of the CAEB is due to the instrument itself or to the construct of epistemic beliefs as a whole cannot be answered conclusively. On the one hand, the CFAs and ESEMs suggest that many items are redundant, suboptimally worded, or do not fit the proposed factors. On the other hand, these problems may stem from the general messiness of the construct itself, which has been described by other authors (e.g., Lee et al., 2021; Mason, 2016) and which other researchers have also encountered when trying to validate other epistemic belief instruments (e.g., Leal-Soto & Ferrer-Urbina, 2017; Odebiyi & Choi, 2022; Yli-Panula et al., 2021). These challenges reflect broader issues highlighted by the replication crisis, underscoring the need for more robust and transparent methodological practices (Korbmacher et al., 2023). Thus, the issue can only be resolved by more rigorous and thorough attempts to validate the CAEB in other samples and domains.
Fourth, the measurement invariance tests suggest that factors such as domain specificity and characteristics of the convenience samples (e.g., level of expertise) may influence how epistemic belief questionnaires are answered. These findings raise important questions about the practical use of the CAEB across academic disciplines. Although a uniform scoring system is currently applied, it may ignore discipline-specific ways of interpreting and responding to items. Some items may feel more relevant or intuitive to certain groups, potentially introducing bias. The lack of scalar invariance suggests that such interpretive differences may undermine the comparability of scores across disciplines. Group comparisons should therefore be conducted only with these psychometric limitations in mind, particularly regarding the comparability of latent means. Experimental designs that isolate the effects of disciplinary background could clarify their influence on measurement invariance and score validity. Future research should also explore whether discipline-specific scoring or item weighting could enhance the fairness and validity of the CAEB. A better understanding of these influences could substantially benefit the future development of epistemic belief questionnaires and the adaptation of established measures.
Last, future research on epistemic beliefs should address the question of which approaches are most suitable for measuring epistemic beliefs and, in the case of this paper, their connotative aspects. Given that other researchers have called for more methodological openness (e.g., multi-method approaches, contextualized questionnaires; Elby et al., 2016; Lee et al., 2021; Mason, 2016), it may be worth considering alternative, complementary methods for measuring connotative epistemological beliefs. Specifically, qualitative methods such as think-aloud protocols or interviews could be used alongside the traditional quantitative approach.
Supplemental Material
Supplemental material, sj-docx-1-sgo-10.1177_21582440261461971 for Validating the Assessment of Epistemic Beliefs in Education and Psychology: Insights From Four Convenience Samples by Marcel Mayr, Jana Groß Ophoff, Samuel Merk and Benjamin Rott in SAGE Open
Footnotes
Ethical Considerations
Ethical approval was not required for this validation study.
Consent to Participate
Written informed consent to participate was obtained from participants in the LeScEd study. Since the authors of this paper were not involved in the administration or implementation of the BilWiss and ASTRALITE projects, consent to participate is not applicable for those studies.
Consent for Publication
Written informed consent for data to be published was obtained from participants in the LeScEd study. Since the authors of this paper were not involved in the administration or implementation of the BilWiss and ASTRALITE projects, consent to publish data is not applicable for those studies.
Author Contributions
M. Mayr: Conceptualization, Methodology, Data Curation, Formal analysis, Visualization, Writing – original draft, Writing – review and editing, Project administration, Final approval. J. Groß Ophoff: Conceptualization, Conception and design of Study 1, Data Acquisition Study 1, Formal Analysis, Writing – original draft, Writing – review and editing, Project administration, Final approval. S. Merk: Conceptualization, Writing – original draft, Final approval. B. Rott: Conceptualization, Writing – original draft, Final approval.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The corresponding author’s position is funded by the Austrian Federal Ministry of Education, Science and Research (Bundesministerium für Bildung, Wissenschaft und Forschung; BMBWF) in cooperation with the Innovation Foundation for Education (Innovationsstiftung für Bildung; ISB) through the initiative “Educational Innovation Needs Educational Research B3,” which aims to strengthen the field of educational research by creating cooperative doctoral programs between universities and teacher training colleges. The data of Study 1 stems from the project Learning the Science of Education (LeScEd), which was originally funded by the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung; BMBF) within the research program KoKoHs (2011–2015). The data of Study 2 and Study 4 stems from the project Bildungswissenschaftliches Wissen und der Erwerb professioneller Kompetenz in der Lehramtsausbildung (BilWiss), which was originally funded by the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung; BMBF) within the research program ProPäda (2009–2016). The data of Study 3 stems from the project Assessment and Training of Scientific Literacy (ASTRALITE), which was originally funded by the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung; BMBF) within the research program KoKoHs (2016–2018).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Upon request, access to research data can be provided for the LeScEd data set only. The data from the BilWiss and ASTRALITE projects were obtained from separate publicly funded projects; requests for access to these data must therefore be directed to the respective project administrators.
Supplemental material for this article is available online.
References
