Abstract
People base judgements about their own memory processes on probabilistic cues such as the characteristics of study materials and study conditions. While research has largely focused on how single cues affect metamemory judgements, a recent study by Undorf, Söllner, and Bröder found that multiple cues affected people’s predictions of their future memory performance (judgements of learning, JOLs). The present research tested whether this finding was indeed due to strategic integration of multiple cues in JOLs or, alternatively, resulted from people’s reliance on a single unified feeling of ease. In Experiments 1 and 2, we simultaneously varied concreteness and emotionality of word pairs and solicited (a) pre-study JOLs that could be based only on the manipulated cues and (b) immediate JOLs that could be based both on the manipulated cues and on a feeling of ease. The results revealed similar amounts of cue integration in pre-study JOLs and immediate JOLs, regardless of whether cues varied in two easily distinguishable levels (Experiment 1) or on a continuum (Experiment 2). This suggested that people strategically integrated multiple cues in their immediate JOLs. Experiment 3 provided further evidence for this conclusion by showing that false explicit information about cue values affected immediate JOLs over and above actual cue values. Hence, we conclude that cue integration in JOLs involves strategic processes.
Metacognition research focuses on judgements and decisions that refer to one’s own cognitive processes (e.g., Do I know this river’s name? Do I know this topic well enough to stop studying?). In contrast, research on judgement and decision-making (JDM) addresses judgements and decisions about the external world (e.g., Will it freeze tonight? Is the water here safe to drink?). This difference in the object of judgement, however, does not necessarily imply that the cognitive processes underlying metacognitive judgements and judgements about the external world are entirely different. In particular, a lack of definitive knowledge about the current and future states of the object of judgement may produce similar cognitive processes. While a lack of definitive knowledge is obvious in many judgements about the external world because the world is inherently unpredictable or information about relevant variables is missing (Hastie & Dawes, 2010), it may be somewhat surprising in the case of metacognitive judgements (Koriat, 1997, 2000, 2007). However, there is abundant evidence that people have no direct access to their cognitive systems and therefore have to infer metacognitive judgements from cues that they regard as predictive of the cognitive processes to be judged (see Koriat, 2015; Rhodes, 2016, for recent reviews). Consequently, both metacognitive judgements and judgements about the external world rely on probabilistic cues.
Concerning people’s predictions of remembering recently studied information at a later test (judgements of learning, JOLs), cues can be classified into three groups (Koriat, 1997). Intrinsic cues are characteristics of the study items such as concreteness or emotionality of to-be-studied words, whereas extrinsic cues refer to conditions of the specific study situation (e.g., presentation time, number of study presentations). Intrinsic and extrinsic cues can affect JOLs directly through the deliberate application of a rule or a belief about the cues’ impact on memory. Alternatively or additionally, intrinsic and extrinsic cues can impact JOLs indirectly through their effects on mnemonic cues. Mnemonic cues such as encoding fluency or perceptual fluency derive from people’s processing of the items at study and give rise to a subjective feeling of ease, fluency, or mastery that has the quality of a direct, unified experience without a clear awareness of its basis (Koriat, 1997; Koriat & Levy-Sadot, 1999).
Given that metacognitive judgements and judgements about the external world are inferred from a variety of cues, the issue of how and when people integrate multiple pieces of information in their judgements is essential to fully account for the basis and nature of either type of judgement. For several decades, JDM research has focused on the question of whether people integrate multiple cues or neglect information in judgements about the external world (Anderson, 1981; Gigerenzer, Todd, & the ABC Research Group, 1999). In contrast, researchers have only just begun to investigate cue integration in metacognitive judgements. A recent study tested whether people can integrate intrinsic and extrinsic cues in their JOLs (Undorf, Söllner, & Bröder, 2018). Across four experiments, participants studied single words and estimated the probability of recalling each word at test. Later, they completed a free recall test. When varying the two extrinsic cues number of study presentations (1 vs 2) and font size (18 point vs 48 point), individual-level analyses revealed that the majority of participants integrated both cues in their JOLs. Similarly, people integrated the two intrinsic cues concreteness (abstract vs concrete) and emotionality (neutral vs emotional) in their JOLs. Two further experiments showed that all four cues affected JOLs when manipulated simultaneously and confirmed that people integrated three cues that varied on a continuum in their JOLs. Overall, Undorf and colleagues (2018) found that each of the up to four simultaneously manipulated cues affected JOLs at the aggregate level and that no less than half of the participants integrated two or more cues in their JOLs.
The issue of cue integration in JOLs, however, is far from being settled. In particular, it is possible that in Undorf and colleagues’ (2018) study, participants based their JOLs on a single mnemonic cue rather than on multiple intrinsic and extrinsic cues. Specifically, multiple intrinsic and extrinsic cues may have fed into a unified feeling of ease, which in turn may have been the only cue that informed JOLs (cf. Koriat, 1997; Koriat & Levy-Sadot, 1999). If so, the resulting data would mimic strategic cue integration, which we define as basing JOLs on multiple intrinsic or extrinsic cues. Basing JOLs on a unified feeling of ease would not qualify as strategic cue integration, as this would mean that JOLs were based on only a single cue. To address this potential alternative explanation of Undorf and colleagues’ (2018) findings, it is necessary to consider whether strategic cue integration occurs when people’s JOLs cannot rely on mnemonic cues such as feelings of ease.
Also, a difference between Undorf and colleagues’ (2018) experiments and typical JDM studies was that participants were not explicitly informed about cue values and therefore had to extract cue values from the to-be-studied items. In JDM studies, people often receive explicit information about cue values, as in studies on multiple cue probability learning (MCPL; see Newell, Lagnado, & Shanks, 2015, for an overview), cue search in matrix-like information boards (Payne, Bettman, & Johnson, 1988), or clinical judgements based on vectors of symptom values (Hammond, 1955), among others. It therefore may be that explicit information about cue values would boost cue integration in JOLs.
In sum, the aim of this study is to improve our understanding of cue integration in metacognitive judgements by addressing two questions. First, is cue integration in JOLs strategic? Second, does explicit information about cue values boost cue integration in JOLs?
To examine cue integration in the presence versus absence of mnemonic cues, we solicited pre-study JOLs and immediate JOLs (cf. Castel, 2008; Jia et al., 2015; Mueller, Dunlosky, & Tauber, 2016; Mueller, Dunlosky, Tauber, & Rhodes, 2014; Mueller, Tauber, & Dunlosky, 2013; Price & Harrison, 2017; Sitzman, Rhodes, & Kornell, 2016; Susser, Panitz, Buchin, & Mulligan, 2017; Witherby & Tauber, 2017). Pre-study JOLs are made prior to studying the respective item. Together with the JOL prompt, participants are provided with explicit information about cue values (e.g., “You are about to study a concrete word pair”). Because pre-study JOLs are made before the to-be-judged items appear, they cannot possibly rely on mnemonic cues that derive from people’s processing of the respective item at study. However, pre-study JOLs can of course be based on intrinsic and extrinsic cues. In contrast, immediate JOLs are made immediately after studying each pair and may therefore be based on intrinsic and extrinsic cues as well as on mnemonic cues (for discussion, see Undorf & Erdfelder, 2015).
Previous studies revealed that pre-study JOLs were affected by word frequency (Jia et al., 2015), writing words with the dominant or non-dominant hand (Susser et al., 2017), and feedback about the accuracy of answers on a test (Sitzman et al., 2016). Studies comparing pre-study and immediate JOLs found that the relatedness effect (higher JOLs for related pairs such as moon–star than for unrelated pairs such as duck–cake, Mueller et al., 2013) and the identical effect (higher JOLs for identical pairs such as moon–moon than for related or unrelated pairs, Mueller et al., 2016) were smaller in pre-study JOLs than in immediate JOLs. Witherby and Tauber (2016) found that effects of word concreteness were less pronounced in pre-study JOLs than in immediate JOLs (Experiment 2) or similar across the two JOL types (Experiment 3). Mueller and colleagues (2014) demonstrated similar effects of presenting words in a larger font and a smaller font on pre-study and immediate JOLs. Most relevant for present purposes, Price and Harrison (2017) obtained pre-study JOLs when manipulating two within-subjects factors. The results revealed that, compared with immediate JOLs, pre-study JOLs evidenced larger effects of font size and smaller effects of relatedness. Because relatedness had a much stronger effect on memory performance than font size, pre-study JOLs were less accurate than immediate JOLs.
Any difference between pre-study and immediate JOL conditions apart from the timing of JOLs can complicate the interpretation of results. For this reason, participants in both JOL conditions usually received explicit information about cue values (Mueller et al., 2016; Mueller et al., 2014; Mueller et al., 2013; Witherby & Tauber, 2016). For instance, Witherby and Tauber (2016) informed not only the pre-study group whether the upcoming word would be concrete or abstract but also told the immediate group whether the previously studied word was concrete or abstract. As explained above, this variation of the standard immediate JOL procedure might promote cue integration in JOLs. The current study therefore included two immediate JOL conditions as used previously by Price and Harrison (2017, Experiment 3). The immediate group conformed to standard immediate JOL conditions without explicit information. Conversely, participants in the immediate-plus group were provided with explicit information when making immediate JOLs. This design allowed separating effects of JOL timing from effects of explicit information about cue values. Price and Harrison (2017, Experiment 3) found that effects of relatedness or font size on JOLs did not differ between the immediate and immediate-plus groups. However, this does not allow the conclusion that cue integration is independent of providing explicit information. As has been argued by Undorf and colleagues (2018), effects of two or more simultaneously manipulated cues on JOLs at the aggregate level may occur even if some participants based their JOLs only on one cue (e.g., relatedness) and other participants based their JOLs only on the other cue (e.g., font size). Thus, investigating cue integration in JOLs requires not only aggregate-level analysis but also individual-level analysis.
The first two experiments reported here simultaneously varied two intrinsic cues (concreteness, emotionality) in three conditions (pre-study JOL group, immediate JOL group, and immediate-plus JOL group). Both experiments tested (1) whether people strategically integrate multiple intrinsic cues in their JOLs and (2) whether explicit information boosts cue integration in JOLs. If people integrate multiple intrinsic cues in their JOLs, JOLs from all conditions should evidence effects of concreteness and emotionality at the aggregate and at the individual level. In contrast, if previous findings of multiple cues affecting JOLs at the individual level were due to these cues feeding into a unified mnemonic feeling, participants should not integrate multiple cues in their pre-study JOLs. Instead, each participant would base his or her pre-study JOLs on a single cue (i.e., concreteness or emotionality) or on neither cue. If explicit information about cue values boosts cue integration in JOLs, cue integration should be less prevalent in immediate JOLs than in immediate-plus JOLs. In contrast, similar amounts of cue integration across immediate and immediate-plus JOLs would indicate that cue integration in JOLs is independent of explicit information about cue values. To evaluate these hypotheses, we compared the pre-study, immediate, and immediate-plus conditions with respect to cue effects on JOLs at the aggregate and individual levels. In addition, to see whether potential differences in cue integration affected JOL accuracy, we compared the three conditions regarding recall performance, resolution (i.e., the degree to which JOLs capture differences in the relative memorability of items), and calibration (i.e., the extent to which JOLs are realistic). In Experiment 3, we addressed an alternative interpretation of the findings from the first two experiments.
Experiment 1
In Experiment 1, we presented participants with word pairs that varied in concreteness and emotionality. Both concreteness and emotionality were manipulated in two discrete levels (i.e., each pair was either clearly low or high in concreteness and emotionality). In the pre-study and immediate-plus conditions, cue levels were denoted by exact values (e.g., concreteness of 2.68 and emotionality of 6.01).
Based on studies that manipulated concreteness of word pairs in isolation (e.g., Begg, Duft, Lalonde, Melnick, & Sanvito, 1989; Tullis & Benjamin, 2012), we expected that JOLs and recall performance would both increase with concreteness. Studies that investigated the effects of word emotionality all found higher JOLs for positive and negative items as compared with neutral items (Hourihan, Fraundorf, & Benjamin, 2017; Tauber & Dunlosky, 2012; Undorf et al., 2018; Zimmerman & Kelley, 2010). Free recall performance for single words showed a similar pattern (Hourihan et al., 2017; Tauber & Dunlosky, 2012; Undorf et al., 2018; Zimmerman & Kelley, 2010), whereas cued recall performance was similar for negative and neutral word pairs or worse for negative than for neutral word pairs but better for positive than for neutral word pairs (Zimmerman & Kelley, 2010). Based on these findings, we expected that emotionality would increase JOLs. With regard to cued recall performance, Zimmerman and Kelley’s (2010) findings suggested three possible outcomes. First, recall might slightly increase with emotionality due to the facilitating effects of positive word pairs. Second, recall might slightly decrease with emotionality due to the detrimental effects of negative word pairs. Finally, effects of positive and negative word pairs might cancel each other out, resulting in a null effect of emotionality on recall performance.
More importantly, if people integrate concreteness and emotionality in their JOLs, individual-level analyses should reveal that a large number of participants from the pre-study, immediate, and immediate-plus conditions base their JOLs on both cues. In contrast, if previous findings of multiple cues affecting JOLs at the individual level were due to JOLs relying on a unified mnemonic feeling, each participant’s pre-study JOLs should be affected by at most one cue. Finally, if explicit information about cue values boosts cue integration in JOLs, individual-level analyses should show less cue integration in the immediate than in the immediate-plus condition. In contrast, similar amounts of cue integration across conditions would show that cue integration in JOLs is independent of explicit information about cue values.
Method
Participants and design
In this and the subsequent experiment, we set a sample size of about 30 participants per group. A post hoc power analysis using the effect sizes found in Undorf and colleagues’ (2018) Experiment 2 (emotionality: Cohen’s d = .87, concreteness: Cohen’s d = .85) and an alpha level of .05 revealed that statistical power to replicate effects of emotionality and concreteness on JOLs within each group exceeded .99. Also, the power for detecting a medium-sized within-between interaction was above .99. As shown by Quené and van den Bergh (2004), these values provide conservative estimates of statistical power for the generally more powerful multilevel regression models used in Experiment 2. Participants were 105 University of Mannheim undergraduates, randomly assigned to the pre-study (n = 35), immediate (n = 35), or immediate-plus (n = 35) conditions.
Materials
Stimuli were 60 unrelated word pairs that varied in concreteness and arousal. Neutral pairs were low in arousal, whereas emotional pairs were high in arousal. The two words in each pair were similar in concreteness and arousal. Normed values for imagery (rated on a 7-point scale ranging from 1 = low imageability to 7 = high imageability) and arousal (rated on a 5-point scale ranging from 1 = low arousal to 5 = high arousal) were taken from Võ and colleagues (2009). Fifteen pairs each were abstract and neutral (imagery: M = 2.90, SD = 0.74; arousal: M = 2.43, SD = 0.24), abstract and emotional (imagery: M = 2.87, SD = 0.51; arousal: M = 3.44, SD = 0.18), concrete and neutral (imagery: M = 5.71, SD = 0.49; arousal: M = 2.35, SD = 0.28), and concrete and emotional (imagery: M = 5.53, SD = 0.74; arousal: M = 3.38, SD = 0.35). Four additional pairs (one from each combination of concreteness and emotionality) served as primacy buffers and were not included in the analysis. Participants were prompted with exact values of imagery and arousal that were transformed to range from 1 to 7 and rounded to two decimal places.
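The transformation of the normed values onto the 1 to 7 prompt scale is not spelled out in the text. As a minimal sketch, one plausible reading is a linear min-max mapping followed by rounding to two decimal places. The function name `rescale`, its parameters, and the choice of the nominal scale endpoints as the source range are assumptions made for illustration only; the authors may have used the observed range of the norms instead.

```python
def rescale(value, old_min, old_max, new_min=1.0, new_max=7.0):
    """Linearly map value from [old_min, old_max] onto [new_min, new_max].

    Hypothetical helper: the source only says that values were transformed
    to range from 1 to 7 and rounded to two decimal places.
    """
    fraction = (value - old_min) / (old_max - old_min)
    return round(new_min + fraction * (new_max - new_min), 2)


# Assumption: arousal norms are mapped from their nominal 1-5 rating scale.
arousal_prompt = rescale(3.44, old_min=1, old_max=5)
print(arousal_prompt)
```

Passing the observed minimum and maximum of the norms as `old_min` and `old_max` would implement the alternative reading without changing the function.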
Procedure
The experiment consisted of a study phase and a cued recall test. Instructions informed all participants that they would study 64 word pairs of varying concreteness and emotionality. It was explained that words high in concreteness refer to things that can be experienced directly through the senses (e.g., pear), whereas words low in concreteness refer to things that cannot be experienced through the senses (e.g., purpose). Moreover, it was explained that words high in emotionality evoke pleasant or unpleasant feelings (e.g., love, poison), whereas words low in emotionality evoke no feelings (e.g., percentage). Participants from the pre-study and immediate-plus groups were also told that, when making JOLs, they would be informed about each pair’s concreteness and emotionality, with values of 1 indicating very low concreteness or emotionality and values of 7 indicating very high concreteness or emotionality. Participants studied each pair for 5 s and made a JOL for it by clicking a percentage scale that was displayed on the screen together with a done button (participants could change their response but not after clicking done). In the pre-study condition, participants were asked to make their JOL for each pair before studying it. JOLs were prompted with (translated from German) “You are about to study a word pair with a concreteness of x and an emotionality of y, please estimate your chance of recall.” In the other conditions, participants were asked to make their JOL for each pair immediately after studying it. In the immediate condition, the JOL prompt was “Please estimate your chance of recall.” In the immediate-plus condition, JOLs were prompted with “You have studied a word pair with a concreteness of x and an emotionality of y, please estimate your chance of recall.” Following the study phase, participants performed a numerical filler task for 3 min. Afterwards, they completed a self-paced cued recall test, in which the first word of each pair was presented as the cue and participants were asked to type the second word (target). For each participant, stimuli were presented in a new random order at study and test.
Results
Concreteness and emotionality affected JOLs and recall performance in all three conditions (see Figure 1). JOLs were submitted to a mixed analysis of variance (ANOVA) with concreteness (abstract, concrete) and emotionality (neutral, emotional) as within-subjects factors and the two contrast-coded between-subjects factors JOL timing (prior to study, after study) and explicit information (without, with). Due to the unbalanced design, interactions involving JOL timing and, at the same time, explicit information could not be computed. The main effects of concreteness, F(1, 103) = 94.89, p < .001,

Figure 1. Mean judgements of learning (JOL; top panel) and percentage of correctly recalled targets (recall; bottom panel) in the pre-study, immediate, and immediate-plus groups of Experiment 1.
A mixed ANOVA on recall performance with concreteness and emotionality as within-subjects factors and JOL timing and explicit information as between-subjects factors revealed a main effect of concreteness, F(1, 103) = 98.49, p < .001,
To examine cue integration at the individual level, we first analysed simple mean differences. We coded participants as having based JOLs on concreteness if their JOLs were higher for concrete pairs than for abstract pairs and as having based JOLs on emotionality if their JOLs were higher for emotional pairs than for neutral pairs. The results revealed that 79 participants (75.24%) integrated concreteness and emotionality in their JOLs (binomial test against a chance level of 25%: p < .001). The remaining participants based their JOLs on either concreteness (7 participants) or emotionality (15 participants) or on neither cue (4 participants). The percentage of participants who integrated concreteness and emotionality in their JOLs did not differ across the pre-study (85.71%), immediate (74.29%), and immediate-plus groups (65.71%), p = .172 by Fisher’s Exact Test.
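The mean-difference coding and the binomial test described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis script: the function names, the dictionary layout, and the example JOLs are hypothetical, and the test is computed as a one-sided upper tail because the text does not say whether the test was one- or two-sided. The 25% chance level presumably reflects two independent cues that each show a positive difference with probability .5.

```python
from math import comb
from statistics import mean


def classify(jols):
    """Code one participant from JOLs grouped by (concreteness, emotionality) cell."""
    concrete = jols[("concrete", "neutral")] + jols[("concrete", "emotional")]
    abstract = jols[("abstract", "neutral")] + jols[("abstract", "emotional")]
    emotional = jols[("abstract", "emotional")] + jols[("concrete", "emotional")]
    neutral = jols[("abstract", "neutral")] + jols[("concrete", "neutral")]
    uses_concreteness = mean(concrete) > mean(abstract)
    uses_emotionality = mean(emotional) > mean(neutral)
    if uses_concreteness and uses_emotionality:
        return "both"
    if uses_concreteness:
        return "concreteness only"
    if uses_emotionality:
        return "emotionality only"
    return "neither"


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); one-sided test (an assumption)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical JOLs (percentages) for one participant.
example = {
    ("abstract", "neutral"): [20, 30, 25],
    ("abstract", "emotional"): [35, 40, 30],
    ("concrete", "neutral"): [50, 45, 55],
    ("concrete", "emotional"): [60, 70, 65],
}
print(classify(example))

# Reported counts: 79 of 105 participants integrated both cues.
print(binomial_upper_tail(79, 105, 0.25) < 0.001)
```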
In addition, we calculated Cohen’s d for concreteness and emotionality for each participant. As can be seen in Figure 2, the majority of participants from all three groups are located in the upper right quadrant, indicating that they predicted better memory for concrete than for abstract pairs and better memory for emotional than for neutral pairs. At the same time, effect sizes were much larger in the pre-study group than in the immediate or immediate-plus groups. Using Cohen’s (1977) small effects convention of |d| ⩾ .2 as evidence for reliable cue effects, 47.62% of participants integrated concreteness and emotionality in their JOLs, binomial test against a chance level of 25%: p < .001, with percentages differing across the pre-study (65.71%), immediate (34.29%), and immediate-plus groups (42.86%), p = .028 by Fisher’s Exact Test. Follow-up tests revealed significant differences in the probability of cue integration depending on JOL timing, p = .013, but not depending on explicit information about cue values, p = .624.
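A per-participant effect size of this kind can be sketched as below. The text does not state which variant of Cohen’s d was computed for each participant, so the pooled-standard-deviation formula across items is an assumption, as are the function names and the example values. The threshold follows the |d| ⩾ .2 convention quoted above.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(high, low):
    """Cohen's d for one participant: JOLs for high- vs low-cue items.

    Assumption: standardised by the pooled item-level standard deviation.
    Raises ZeroDivisionError if JOLs do not vary within either item set.
    """
    n_high, n_low = len(high), len(low)
    pooled_var = ((n_high - 1) * variance(high) + (n_low - 1) * variance(low)) / (
        n_high + n_low - 2
    )
    return (mean(high) - mean(low)) / sqrt(pooled_var)


def integrates_both(d_concreteness, d_emotionality, threshold=0.2):
    """True if both cue effects reach the small-effect convention |d| >= .2."""
    return abs(d_concreteness) >= threshold and abs(d_emotionality) >= threshold


# Hypothetical JOLs for concrete vs abstract items of one participant.
d_example = cohens_d([60, 70, 80], [30, 40, 50])
print(round(d_example, 2))
```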

Figure 2. Scatterplot of individual effect sizes (Cohen’s d) measuring the effects of concreteness (x-axis) and emotionality (y-axis) on JOLs in the pre-study, immediate, and immediate-plus groups of Experiment 1.
Mirroring the analysis at the aggregate level, the two individual-level analyses converge on the conclusion that cue integration occurred in all three groups, with only the individual-level analysis based on effect size suggesting that cue integration was more frequent in pre-study JOLs than in immediate JOLs.
We also evaluated JOL accuracy in terms of resolution (measured by within-participant gamma correlations between JOLs and recall performance) and calibration (see Table 1). Calibration was indexed by two measures: bias (difference between mean JOLs and mean recall performance) and Brier scores (mean squared difference between JOLs and recall performance at the item level). Gamma correlations (excluding four participants because of a lack of variability in recall performance; ns per condition are noted in Table 1), bias, and Brier scores were submitted to separate two-way ANOVAs with JOL timing and explicit information as between-subjects factors. Again, interactions involving JOL timing and, at the same time, explicit information could not be computed. For gamma correlations, a marginal effect of JOL timing, F(1, 98) = 3.51, p = .064,
Table 1. Mean gamma correlations, bias, and Brier scores in the pre-study, immediate, and immediate-plus groups of Experiments 1 and 2 and in the immediate-plus group of Experiment 3.
Numbers in parentheses are standard deviations. Asterisks refer to one-sample t tests against zero.
n = 33.
n = 28.
*p < .05. **p < .01. ***p < .001.
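The three accuracy measures reported in Table 1 can be illustrated for a single participant as follows. This is a sketch only: the function names and example data are hypothetical, gamma is computed as the Goodman-Kruskal coefficient over item pairs with ties ignored, and the Brier score assumes that JOLs are converted from percentages to proportions before squaring, which the text does not specify.

```python
from itertools import combinations
from statistics import mean


def gamma(jols, recalled):
    """Goodman-Kruskal gamma between JOLs (0-100) and recall (0 or 1)."""
    concordant = discordant = 0
    for (j1, r1), (j2, r2) in combinations(zip(jols, recalled), 2):
        product = (j1 - j2) * (r1 - r2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None  # undefined, e.g. when recall does not vary across items
    return (concordant - discordant) / (concordant + discordant)


def bias(jols, recalled):
    """Mean JOL minus mean recall, both in percent; positive = overconfidence."""
    return mean(jols) - 100 * mean(recalled)


def brier(jols, recalled):
    """Mean squared item-level difference between JOL (as proportion) and recall."""
    return mean((j / 100 - r) ** 2 for j, r in zip(jols, recalled))


# Hypothetical data for one participant: four items.
jols = [80, 60, 40, 20]
recalled = [1, 1, 0, 1]
print(gamma(jols, recalled), bias(jols, recalled), brier(jols, recalled))
```

Returning `None` when no item pair is informative mirrors the exclusion of participants whose recall performance lacked variability.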
Discussion
High concreteness produced higher JOLs and better memory performance, whereas high emotionality produced higher JOLs but lower memory performance. Both concreteness and emotionality had larger effects on pre-study JOLs than on immediate JOLs. Two complementary individual-level analyses revealed that cue integration occurred in all three conditions. Most importantly, the finding of cue integration in pre-study JOLs argues against the possibility that basing immediate JOLs on a single mnemonic cue mimicked strategic cue integration in previous research (Undorf et al., 2018). Instead, it suggests that participants integrated multiple cues in their JOLs. Unexpectedly, the individual-level analysis based on effect sizes showed that cue integration was more frequent in pre-study JOLs than in immediate JOLs. We will return to this point shortly. Concerning the effects of providing explicit cue values, individual-level analyses revealed no evidence that this may have boosted cue integration in JOLs. Finally, pre-study JOLs were less accurate than immediate JOLs, particularly regarding calibration.
Why were cue effects on pre-study JOLs larger than cue effects on immediate JOLs? This finding was somewhat unexpected because prior studies usually found cue effects on pre-study JOLs that were smaller than or similar to those on immediate JOLs (e.g., Mueller et al., 2016; Mueller et al., 2014; Mueller et al., 2013; Witherby & Tauber, 2017, but see Price & Harrison, 2017). We suspect that cue effects were larger on pre-study JOLs because people can incorporate their own study experiences in immediate JOLs but not in pre-study JOLs. Because concreteness and emotionality were the only pieces of information available when making pre-study JOLs, these cues probably impacted pre-study JOLs more strongly than immediate JOLs. As a related point, idiosyncratic person–item interactions may have reduced the impact of concreteness and emotionality on immediate JOLs (Bröder & Undorf, 2019). For instance, the abstract and neutral word percentage might be highly memorable for a person who just learned that he or she failed to achieve the percentage of points needed to pass an exam. This person–item interaction would boost the person’s memory and immediate JOL for percentage, but not his or her pre-study JOL, which is made before the item is presented. Consequently, person–item interactions may decrease the overall impact of cues on immediate JOLs but not on pre-study JOLs. We suspect that these two mechanisms are also responsible for increased cue integration in pre-study JOLs. In particular, it is plausible that increased reliance on concreteness and emotionality also fostered integration of the two cues in JOLs.
In sum, Experiment 1 suggested that participants integrated multiple cues in their JOLs. In Experiment 2, we aimed to extend these results to a representative design where concreteness and emotionality varied on a continuum.
Experiment 2
Experiment 2 tested whether the Experiment 1 finding of cue integration in pre-study JOLs would replicate when manipulating concreteness and emotionality on a continuum rather than in two easily distinguishable levels. In Experiment 1, we selected words that were clearly low or high in concreteness and emotionality. This resulted in a nonrepresentative selection of stimuli in so far as we excluded all words of intermediate concreteness and emotionality. In JDM research, it has been argued that a nonrepresentative selection of stimuli may change the processes that underlie people’s judgements as compared with a natural domain (e.g., Dhami, Hertwig, & Hoffrage, 2004; Gigerenzer, Hoffrage, & Kleinbölting, 1991). The study list for Experiment 2 included word pairs of intermediate concreteness and emotionality in addition to word pairs that were clearly low or high in concreteness and emotionality. This also enabled us to test whether explicit information about cue values would boost cue integration when there are fine-grained differences in concreteness and emotionality.
Method
Participants
Participants were 89 University of Mannheim undergraduates, randomly assigned to the pre-study (n = 30), immediate (n = 29), or immediate-plus (n = 30) conditions.
Materials and procedure
Stimuli were a representative sample of 120 German 3–10 letter nouns from Võ and colleagues (2009). The selected words were similar to all words with respect to concreteness (Ms: 4.28 vs 4.17, ranges: 1.67–6.78 vs 1.22–6.89) and arousal (Ms: 2.74 vs 2.76, ranges: 1.47–4.42 vs 1.11–4.71). We created 60 unrelated word pairs, each consisting of two words with similar concreteness and arousal. Four additional pairs served as primacy buffers and were not included in the analysis. As in Experiment 1, participants were prompted with values of imagery and arousal that were transformed to range from 1 to 7 and rounded to two decimal places. The procedure was identical to that of Experiment 1.
Results
We used multilevel regression models to evaluate the impact of concreteness and emotionality on JOLs in the pre-study, immediate, and immediate-plus conditions. JOLs were regressed on the contrast-coded fixed-effect predictors JOL timing and explicit information, the centred fixed-effect predictors concreteness and emotionality, and interactions between predictors. The regression model also included random intercepts for participants. As in Experiment 1, interactions involving JOL timing and, at the same time, explicit information could not be computed. Significantly positive unstandardized regression coefficients for concreteness, b = 2.74, SE = 0.12, t(5,251) = 21.95, p < .001, and emotionality, b = 2.64, SE = 0.13, t(5,251) = 20.42, p < .001, indicated that JOLs increased with both concreteness and emotionality. A significant interaction between concreteness and emotionality, b = 0.18, SE = 0.07, t(5,251) = 2.63, p = .009, indicated that the effect of emotionality on JOLs increased with increasing levels of concreteness. Significant interactions of JOL timing with concreteness, b = 1.10, SE = 0.18, t(5,251) = 6.26, p < .001, and emotionality, b = 1.60, SE = 0.18, t(5,251) = 8.74, p < .001, revealed larger effects of either cue on pre-study JOLs than on immediate JOLs. A significant interaction of explicit information with emotionality, b = −0.39, SE = 0.16, t(5,251) = 2.48, p = .013, revealed smaller effects of emotionality on immediate JOLs than on immediate-plus JOLs. A significant triple interaction between JOL timing, concreteness, and emotionality, b = −0.27, SE = 0.10, t(5,251) = 2.74, p = .006, indicated that the difference between pre-study and immediate JOLs decreased with increasing levels of concreteness and emotionality. No other effects were significant, all ts ≤ 1.59, ps ≥ .113.
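For orientation, the random-intercept model described in this paragraph can be written out as an equation. This is a sketch assembled from the verbal description, not a formula given by the authors. All symbols are assumptions: i indexes items and j participants, T_j and X_j are the contrast codes for JOL timing and explicit information, and C_i and E_i are centred concreteness and emotionality. Terms combining T_j and X_j are omitted because the text states that they could not be computed.

```latex
\begin{aligned}
\mathrm{JOL}_{ij} ={}& \beta_0 + u_{0j}
  + \beta_1 T_j + \beta_2 X_j + \beta_3 C_i + \beta_4 E_i + \beta_5 C_i E_i \\
 &+ \beta_6 T_j C_i + \beta_7 T_j E_i + \beta_8 T_j C_i E_i \\
 &+ \beta_9 X_j C_i + \beta_{10} X_j E_i + \beta_{11} X_j C_i E_i
  + \varepsilon_{ij}, \\
u_{0j} \sim{}& \mathcal{N}(0, \tau^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

In this notation, the reported interactions of JOL timing with concreteness and emotionality correspond to \beta_6 and \beta_7, and the triple interaction to \beta_8.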
We followed up on the interactions with separate multilevel regression models for each group. In these models, JOLs were regressed on concreteness, emotionality, and their interaction. The results revealed significant main effects of concreteness and emotionality in all three groups, pre-study group: t(1,770) = 21.11, p < .001, for concreteness and t(1,770) = 22.46, p < .001, for emotionality; immediate group: t(1,711) = 10.10, p < .001, for concreteness and t(1,711) = 5.81, p < .001, for emotionality; immediate-plus group: t(1,770) = 8.72, p < .001, for concreteness and t(1,770) = 9.68, p < .001, for emotionality.
A logistic regression on recall performance revealed a main effect of concreteness, b = 0.37, SE = 0.02, z = 17.25, p < .001, indicating that memory performance increased with concreteness. A main effect of emotionality, b = −0.05, SE = 0.02, z = 2.28, p = .023, indicated that memory performance decreased with emotionality. An interaction between concreteness and emotionality, b = 0.05, SE = 0.01, z = 4.53, p < .001, indicated that detrimental effects of emotionality decreased with increasing concreteness. An interaction between concreteness and explicit information, b = −0.05, SE = 0.03, z = 1.99, p = .047, indicated that concreteness increased memory performance particularly when comparing immediate and immediate-plus JOLs. No other effects were significant, all z ⩽ 1.77, p ⩾ .076.
To examine cue integration at the individual level, we submitted each participant’s JOLs to a multiple linear regression with concreteness, emotionality, and their interaction as predictors. Similar to the individual-level analysis based on simple mean differences, we coded participants as having based JOLs on a particular cue if that cue revealed a positive regression weight. The results revealed that 74 participants (83.15%) integrated concreteness and emotionality in their JOLs, p < .001 in a binomial test against a chance level of 25%. The remaining participants based their JOLs on either concreteness (6 participants) or emotionality (7 participants) or neither cue (2 participants). The percentage of participants who integrated concreteness and emotionality in their JOLs did not differ across the pre-study (86.67%), immediate (75.86%), and immediate-plus groups (86.67%), p = .530 by Fisher’s Exact Test.
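The binomial test reported in this paragraph can be checked with a short calculation. The following is a minimal sketch, not the authors' analysis script. It assumes an exact one-sided test and a total of 89 participants, a number inferred here from the reported percentage (74/89 = 83.15%) rather than stated in this section. The 25% chance level presumably corresponds to two regression weights both being positive when each sign is equally likely. The function name is illustrative.

```python
from math import comb


def binomial_upper_tail(successes: int, n: int, chance: float) -> float:
    """Exact one-sided p-value: P(X >= successes) for X ~ Binomial(n, chance)."""
    return sum(
        comb(n, k) * chance**k * (1.0 - chance) ** (n - k)
        for k in range(successes, n + 1)
    )


# Assumed counts: 74 integrators out of 89 participants (inferred, not reported).
n_participants = 89
n_integrators = 74
p_value = binomial_upper_tail(n_integrators, n_participants, chance=0.25)

print(f"proportion = {n_integrators / n_participants:.4f}")
print(f"one-sided p < .001: {p_value < 0.001}")
```

The same function can be applied to the effect-size-based classification by changing the number of integrators.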
For the individual-level analysis based on effect size, Figure 3 depicts individual participants’ standardised regression weights. As in Experiment 1, the majority of participants from all groups are located in the upper right quadrant and the effect sizes are larger in the pre-study group than in the immediate or immediate-plus groups. Testing each regression weight against Cohen’s (1977) effect size convention for small effects in measures of association (|r| ⩾ .10) revealed that 50.56% of the participants integrated concreteness and emotionality in their JOLs, p < .001 in a binomial test against a chance level of 25%, with percentages of cue integration differing across the pre-study (70.00%), immediate (41.38%), and immediate-plus groups (40.00%), p = .015 by Fisher’s Exact Test. Follow-up tests revealed significant differences in the probability of cue integration depending on JOL timing, p = .017, but not depending on explicit information about cue values, p = 1.
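The follow-up comparison for explicit information can be illustrated with a two-sided Fisher's exact test on a 2 × 2 table. This is a sketch under stated assumptions: the cell counts (12 of 29 in the immediate group, 12 of 30 in the immediate-plus group) are inferred from the reported percentages and are not given in the text, and the function is a plain hypergeometric implementation rather than the software the authors used.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Assumed counts (inferred from the reported percentages):
# immediate group: 12 integrators, 17 others; immediate-plus: 12 integrators, 18 others.
p_explicit_info = fisher_exact_2x2(12, 17, 12, 18)
print(f"p = {p_explicit_info:.3f}")
```

With these assumed counts the observed table is the most probable one given its margins, so the two-sided p-value is 1 up to floating-point rounding, in line with the value reported for explicit information.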

Figure 3. Scatterplot of individual effect sizes (standardised regression weights) measuring the effects of concreteness (x-axis) and emotionality (y-axis) on JOLs in the pre-study, immediate, and immediate-plus groups of Experiment 2.
To evaluate JOL accuracy, we conducted ANOVAs with JOL timing and explicit information as between-subjects factors on gamma correlations (excluding four participants because of a lack of variability in recall performance; ns per group are noted in Table 1), bias, and Brier scores. For gamma correlations, a significant effect of JOL timing, F(1, 82) = 14.35, p < .001,
Discussion
As in Experiment 1, JOLs from the pre-study, immediate, and immediate-plus conditions increased with concreteness and emotionality, with stronger cue effects on pre-study JOLs than on immediate JOLs. Concreteness again improved memory performance, whereas emotionality reduced memory performance. Replicating Experiment 1, two complementary individual-level analyses showed cue integration in all conditions. Evidence for cue integration in pre-study JOLs argues against the possibility that basing immediate JOLs on a single mnemonic cue mimicked strategic cue integration in previous work. The individual-level analysis based on effect sizes again showed that cue integration was more frequent in pre-study JOLs than in immediate JOLs. As in Experiment 1, we did not find any effects of explicit information about cue values on cue integration in immediate JOLs. Finally, pre-study JOLs were less accurate than immediate JOLs, particularly regarding resolution.
Experiment 3
In Experiments 1 and 2, multiple cues affected immediate JOLs and, critically, pre-study JOLs. We interpreted this finding as indicating that cue integration in JOLs is strategic rather than due to people basing their JOLs on a single, unified mnemonic feeling. However, an alternative interpretation is that despite the generally similar effects of concreteness and emotionality on pre-study JOLs and immediate JOLs, the two types of JOLs may have had different bases. Specifically, people may have based their pre-study JOLs on strategic integration of multiple cues, but may have based their immediate JOLs on a single mnemonic cue. As an analogy, consider a person who judges his emotional reaction to pictures based on descriptions. That person would probably predict that a picture high in threat content and low in controllability would frighten him. Another person who actually sees a picture of a snarling Rottweiler let off the leash might come up with a very similar judgement. It seems unlikely, however, that the two judgements would be based on the same psychological processes. Similarly, it might be possible that pre-study JOLs and immediate JOLs in Experiments 1 and 2 are based on different cognitive processes. If so, the results from the pre-study conditions would still indicate that, in principle, people are capable of strategically integrating multiple cues in their JOLs. These results would not, however, imply that people do so when making immediate JOLs. Rather, the results would then be compatible with the idea that people base their JOLs on unified mnemonic cues whenever possible (see also Undorf & Erdfelder, 2013).
Consistent with this possibility, the basis of metacognitive judgements is known to vary between judgement types and is affected by situational variables. For instance, global JOLs, in which people predict how many items of an entire study list they will recall, rely more on metacognitive knowledge and beliefs than immediate JOLs (Bjork, Dunlosky, & Kornell, 2013; Undorf & Erdfelder, 2015). Also, study-test practice changes the basis of immediate JOLs from reliance on intrinsic and extrinsic cues towards reliance on mnemonic cues (Koriat, 1997; Undorf & Erdfelder, 2015, see Serra & Ariel, 2014, for a different interpretation).
To test whether immediate JOLs from Experiments 1 and 2 were based on a unified mnemonic feeling, Experiment 3 investigated whether false explicit information about cue values affected immediate JOLs. Unlike veridical explicit information about cue values, false explicit information can impact immediate JOLs only when participants strategically integrate multiple cues in their JOLs. Finding that immediate JOLs rely on false explicit information would therefore argue in favour of strategic integration of multiple cues in immediate JOLs. In Experiment 3, all participants made immediate JOLs and received explicit information about cue values, which was false for some items and veridical for the remaining items.
If people strategically integrate multiple intrinsic cues in immediate JOLs, JOLs should evidence not only effects of actual concreteness and actual emotionality but also effects of false explicit information, that is, of announced concreteness and announced emotionality. In contrast, if previous findings of multiple cues affecting immediate JOLs were due to participants basing these JOLs solely on a unified mnemonic feeling, JOLs should reveal the effects of actual concreteness and actual emotionality only.
Method
Participants
Participants were 50 University of Mannheim undergraduates. A post hoc power analysis revealed that statistical power for detecting medium-sized effects of actual and announced cue values exceeded 0.90 (conservative estimate from repeated-measures ANOVA, see Quené & van den Bergh, 2004).
Materials and procedure
Stimuli were 64 unrelated word pairs, 16 in each of four categories: abstract and neutral (imagery: M = 2.90, SD = 0.59; arousal: M = 2.15, SD = 0.27), abstract and emotional (imagery: M = 2.94, SD = 0.56; arousal: M = 3.44, SD = 0.28), concrete and neutral (imagery: M = 5.77, SD = 0.44; arousal: M = 2.14, SD = 0.21), and concrete and emotional (imagery: M = 5.83, SD = 0.52; arousal: M = 3.42, SD = 0.28). For each combination of concreteness and emotionality, 12 word pairs were prompted with actual values of concreteness and emotionality (transformed to range from 1 to 7 and rounded to two decimal places). The remaining four word pairs at each combination were prompted with false values of concreteness (one item each), false values of emotionality (one item each), or false values of concreteness and emotionality (two items each). For each participant, we randomly selected false values for items low in arousal or concreteness from the range of high values (5.03–6.65) and false values for items high in arousal or concreteness from the range of low values (1.38–2.88). The procedure was identical to that of the immediate-plus condition in Experiments 1 and 2.
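The assignment of veridical and false cue announcements described above can be sketched as follows. This is an illustration only: the text does not say how false values were drawn within each range, so uniform sampling is an assumption, and all names (`false_value`, `plan_cell`, the dictionary keys) are hypothetical.

```python
import random

HIGH_RANGE = (5.03, 6.65)  # false values announced for items that are actually low
LOW_RANGE = (1.38, 2.88)   # false values announced for items that are actually high


def false_value(actual_is_high: bool, rng: random.Random) -> float:
    """Draw a false announced value from the opposite end of the scale.

    Uniform sampling is an assumption; the text only says "randomly selected".
    """
    low, high = LOW_RANGE if actual_is_high else HIGH_RANGE
    return round(rng.uniform(low, high), 2)


def plan_cell(concrete: bool, emotional: bool, rng: random.Random) -> list:
    """Build the 16 announcement plans for one concreteness x emotionality cell."""
    # 12 veridical, 1 false concreteness, 1 false emotionality, 2 with both false.
    flags = [(False, False)] * 12 + [(True, False), (False, True)] + [(True, True)] * 2
    rng.shuffle(flags)
    plans = []
    for false_conc, false_emo in flags:
        plans.append({
            "concrete": concrete,
            "emotional": emotional,
            # None means: announce the item's actual (normed) value.
            "announced_concreteness": false_value(concrete, rng) if false_conc else None,
            "announced_emotionality": false_value(emotional, rng) if false_emo else None,
        })
    return plans


rng = random.Random(1)  # fixed seed only to make the sketch reproducible
design = [
    plan
    for concrete in (False, True)
    for emotional in (False, True)
    for plan in plan_cell(concrete, emotional, rng)
]
n_false = sum(
    p["announced_concreteness"] is not None or p["announced_emotionality"] is not None
    for p in design
)
print(len(design), n_false)  # 64 items in total, 16 of them with false information
```

Because false values always come from the opposite range, announced and actual cue values are negatively related on false items and identical on veridical items, which is why they are correlated by design.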
Results
To evaluate the impact of actual and announced concreteness and actual and announced emotionality on immediate JOLs, we used a series of multilevel regression models with random intercepts for participants. Model 1 predicted JOLs from actual concreteness, actual emotionality, and their interaction. Model 2 additionally included announced concreteness, announced emotionality, and their interaction as predictors. Comparing the models’ relative fit using likelihood ratio tests revealed that Model 2 provided a significantly better fit to the data than Model 1, χ²(3) = 22.82, p < .001. To specifically test for effects of announced concreteness, we regressed JOLs on actual concreteness, actual emotionality, their interaction, and announced concreteness. This model provided a significantly better fit to the data than Model 1, χ²(1) = 6.34, p = .012. Similarly, we tested for effects of announced emotionality by regressing JOLs on actual concreteness, actual emotionality, their interaction, and announced emotionality. This model also provided a significantly better fit to the data than Model 1, χ²(1) = 16.48, p < .001. Together, these results show that JOLs were based not only on actual concreteness and actual emotionality but also on announced concreteness and announced emotionality.
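The p-values attached to these likelihood ratio tests follow from the chi-square distribution, with degrees of freedom equal to the number of added predictors. The sketch below uses the test statistics reported in the text; the underlying log-likelihoods are not reported, so `likelihood_ratio_statistic` is shown only to make the definition explicit. The closed-form tail probabilities cover only the two cases needed here (df = 1 and df = 3), and the function names are illustrative.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (closed forms for df 1 and 3)."""
    if x < 0:
        raise ValueError("the test statistic must be non-negative")
    if df == 1:
        return erfc(sqrt(x / 2.0))
    if df == 3:
        return erfc(sqrt(x / 2.0)) + sqrt(2.0 * x / pi) * exp(-x / 2.0)
    raise ValueError("this sketch only covers df = 1 and df = 3")


def likelihood_ratio_statistic(loglik_restricted: float, loglik_full: float) -> float:
    """LRT statistic: twice the gain in maximised log-likelihood."""
    return 2.0 * (loglik_full - loglik_restricted)


# Test statistics as reported in the text, with their degrees of freedom.
reported = {
    "Model 2 vs Model 1": (22.82, 3),
    "announced concreteness": (6.34, 1),
    "announced emotionality": (16.48, 1),
}
for label, (stat, df) in reported.items():
    print(f"{label}: chi2({df}) = {stat}, p = {chi2_sf(stat, df):.4f}")
```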
Since actual and announced cue values were correlated by design, it was important to avoid problems in estimating regression weights due to multicollinearity. We therefore used sequential likelihood ratio tests to establish independent influences of announced and actual cue values on JOLs. At the same time, the regression weights of the full model (Model 2) confirmed the results reported above. In particular, significantly positive regression coefficients for actual concreteness, b = 2.51, SE = 0.23, t(3,150) = 10.67, p < .001, and actual emotionality, b = 0.82, SE = 0.24, t(3,150) = 3.47, p < .001, indicated that JOLs increased with actual concreteness and actual emotionality. Significantly positive regression coefficients for announced concreteness, b = 0.59, SE = 0.24, t(3,150) = 2.52, p = .012, and announced emotionality, b = 0.96, SE = 0.24, t(3,150) = 4.06, p < .001, indicated that JOLs increased with announced concreteness and announced emotionality. None of the interactions were significant, both t < 1. This again showed that people integrated false explicit information in their JOLs, arguing against the possibility that immediate JOLs were solely based on a unified mnemonic feeling.
A logistic regression on recall performance with actual concreteness, actual emotionality, announced concreteness, announced emotionality, and their interactions as predictors and random intercepts for participants revealed a main effect of actual concreteness, b = 0.33, SE = 0.03, z = 11.03, p < .001, indicating that memory performance increased with actual concreteness, and a main effect of actual emotionality, b = −0.07, SE = 0.03, z = 2.41, p = .016, indicating that memory performance decreased with actual emotionality. No other effects were significant, all z ⩽ 0.94, p ⩾ .347.
To examine cue integration at the individual level, we submitted each participant’s JOLs to a multiple linear regression with actual concreteness, actual emotionality, announced concreteness, and announced emotionality as predictors. As in Experiment 2, we coded participants as having based JOLs on a particular cue if that cue revealed a positive regression weight. The results revealed that 9 participants (18.00%) integrated all four cues in their JOLs and that 27 participants (54.00%) integrated three cues in their JOLs. Another 14 participants (28.00%) integrated two cues in their JOLs, with 5 participants integrating actual concreteness and actual emotionality, 6 participants integrating actual concreteness and announced emotionality, 2 participants integrating actual and announced emotionality, and 1 participant integrating actual emotionality and announced concreteness. Overall, 45 participants (90.00%) integrated announced concreteness and/or announced emotionality in their JOLs.
The individual-level analysis based on effect size revealed a similar pattern: 4 participants (8.00%) integrated all four cues in their JOLs, 15 participants (30.00%) integrated three cues in their JOLs, 16 participants (32.00%) integrated two cues in their JOLs (actual concreteness and actual emotionality: 1 participant, actual and announced concreteness: 3 participants, actual concreteness and announced emotionality: 6 participants, actual emotionality and announced concreteness: 3 participants, actual and announced emotionality: 1 participant, announced concreteness and announced emotionality: 2 participants), 2 participants based their JOLs on only one cue (actual concreteness: 1 participant, announced concreteness: 1 participant), and 5 participants based their JOLs on none of the cues. Overall, 35 participants (70.00%) integrated announced concreteness and/or announced emotionality in their JOLs.
As can be seen in Table 1, gamma correlations, bias, and Brier scores were similar to Experiments 1 and 2.
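For readers unfamiliar with these accuracy measures, the sketch below shows one common way to compute them for a single participant. It assumes the standard definitions, which this section does not spell out: gamma as the Goodman-Kruskal rank correlation between JOLs and recall (resolution), bias as mean JOL minus percentage recalled, and the Brier score as the mean squared difference between JOL/100 and recall. It also assumes JOLs on a 0–100 scale. The data are made up for illustration.

```python
from itertools import combinations
from typing import Optional, Sequence


def gamma_correlation(jols: Sequence[float], recalled: Sequence[int]) -> Optional[float]:
    """Goodman-Kruskal gamma between JOLs and recall (1 = recalled, 0 = not).

    Returns None when there are no concordant or discordant pairs, for
    example when recall does not vary across items.
    """
    concordant = discordant = 0
    for (j1, r1), (j2, r2) in combinations(zip(jols, recalled), 2):
        direction = (j1 - j2) * (r1 - r2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


def bias(jols: Sequence[float], recalled: Sequence[int]) -> float:
    """Mean JOL minus percentage recalled; positive values mean overconfidence."""
    return sum(jols) / len(jols) - 100.0 * sum(recalled) / len(recalled)


def brier_score(jols: Sequence[float], recalled: Sequence[int]) -> float:
    """Mean squared difference between JOL (as a proportion) and recall outcome."""
    return sum((j / 100.0 - r) ** 2 for j, r in zip(jols, recalled)) / len(jols)


# Made-up data for one hypothetical participant.
jols = [80, 60, 40, 20]
recalled = [1, 1, 0, 1]

gamma = gamma_correlation(jols, recalled)
print(gamma, bias(jols, recalled), brier_score(jols, recalled))
```

The `None` return value corresponds to the situation in which gamma is undefined, which is why participants without variability in recall have to be excluded from gamma analyses.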
Discussion
Consistent with Experiments 1 and 2, immediate JOLs made in the presence of explicit information about cue values increased with actual concreteness and actual emotionality. As in Experiment 1, actual concreteness improved memory performance, whereas actual emotionality reduced memory performance. More importantly, JOLs but not cued recall performance also evidenced effects of announced concreteness and announced emotionality. Two complementary individual-level analyses showed that the majority of participants integrated announced concreteness and/or announced emotionality in their JOLs. The finding that immediate JOLs were based on false explicit information that did not feed into a unified mnemonic feeling argues against the possibility that people base their JOLs on a single mnemonic cue whenever possible. In contrast, it indicates that people integrate multiple cues in their JOLs.
General discussion
Until recently, the issue of how and when people integrate multiple pieces of information in metacognitive judgements was hardly addressed (cf. Rhodes, 2016). Manipulating up to four cues simultaneously, Undorf and colleagues (2018) found that the majority of people based their immediate JOLs on multiple cues. However, this result did not necessarily imply that people adopted a strategy of information integration. As an alternative explanation, it was possible that people based their JOLs on a single mnemonic cue (i.e., a unified feeling of ease, fluency, or mastery), which was affected by multiple intrinsic or extrinsic cues. Hence, an automatic process might mimic strategic cue integration. Furthermore, it was possible that providing explicit information about cue values (e.g., the extent to which to-be-studied items were concrete or emotional) would further boost cue integration in JOLs. The current experiments addressed both issues.
Experiments 1 and 2 reported here revealed that cue integration was not limited to immediate JOLs but also occurred with pre-study JOLs. Pre-study JOLs were made before studying each item and therefore could not possibly rely on a single mnemonic cue that derived from people’s processing of the item at study. Consequently, cue integration in pre-study JOLs showed that people can, in principle, base their JOLs on multiple cues. This was found independently of whether cues were manipulated in two discrete levels (Experiment 1) or on a continuum (Experiment 2). However, it was still possible that participants strategically integrated multiple cues in pre-study JOLs, but based their immediate JOLs on a single mnemonic cue. Inconsistent with this possibility, Experiment 3 showed that false explicit information about cue values contributed to immediate JOLs. Notably, false explicit information could impact immediate JOLs through strategic integration of multiple cues but not through mnemonic cues such as a unified feeling of ease.
Neither Experiment 1 nor Experiment 2 found differences in cue integration depending on whether or not participants were provided with explicit information about cue values (for a similar finding, see Price & Harrison, 2017). This is quite remarkable given that, in Experiment 2, there were fine-grained differences between items in concreteness and emotionality. These findings suggest that people can easily extract relevant cue information from to-be-studied items, as has been suspected by Undorf and colleagues (2018).
There are three more noteworthy aspects of the current results. First, across experiments, concreteness and emotionality had larger effects on pre-study JOLs than on immediate JOLs. In contrast, prior studies usually found smaller or similar cue effects on pre-study JOLs than on immediate JOLs (e.g., Mueller et al., 2016; Mueller et al., 2014; Mueller et al., 2013; Witherby & Tauber, 2017, but see Price & Harrison, 2017). As mentioned earlier, we suspect that larger cue effects on pre-study JOLs than on immediate JOLs resulted because participants did not have the possibility to incorporate their own study experiences in pre-study JOLs. In particular, when making pre-study JOLs, participants could not attune their beliefs about how concreteness and emotionality would affect memory to their subjective impressions of item memorability. Moreover, idiosyncratic person–item interactions could not possibly diminish cue effects on pre-study JOLs but may reduce the impact of the two manipulated cues on immediate JOLs (Bröder & Undorf, 2019). For instance, a dedicated librarian may accurately predict that she will remember the word “edition”, even though this is an abstract and neutral word according to word norms. Alternatively or additionally, the fact that concreteness and emotionality were the only pieces of information available when making pre-study JOLs may explain why the two cues impacted these JOLs more strongly than immediate JOLs, for which the learning experience provided additional cues. Also important, given that false explicit information about cue values affected immediate JOLs in Experiment 3, we can exclude the possibility that differences in the size of cue effects were due to participants basing immediate JOLs solely on mnemonic cues. Note, however, that this account cannot explain why prior studies found smaller or similar cue effects on pre-study JOLs than on immediate JOLs.
One might speculate that providing participants with exact cue values (e.g., concreteness of 6.01) rather than with verbal descriptions of cue values (e.g., concrete word pair) increases cue effects on pre-study JOLs. Another possibility may be that manipulating two cues simultaneously increases cue effects on pre-study JOLs. Consistent with this possibility, the only prior study that found larger cue effects on pre-study JOLs than on immediate JOLs also manipulated two cues (Price & Harrison, 2017), whereas all other studies manipulated single cues. Finally, it is possible that our results are not so discrepant after all, because previous studies also yielded inconsistent results concerning the relative size of cue effects on pre-study and immediate JOLs. For instance, Witherby and Tauber (2017) found smaller effects of concreteness on pre-study JOLs than on immediate JOLs in one experiment but similar effects of concreteness on the two types of JOLs in another experiment. More research is needed to test these ideas.
Second, in the current experiments, participants integrated two cues, one of which (concreteness) had parallel effects on JOLs and memory performance, whereas the other (emotionality) resulted in higher JOLs but had a detrimental effect on memory performance. These findings demonstrated that immediate JOLs were indeed based on probabilistic cues rather than on direct access to internal memory representations (see, e.g., Koriat, 1997, 2007).
Finally, accuracy of pre-study JOLs was lower than accuracy of immediate JOLs, with differences showing up mainly in calibration in Experiment 1 and differences showing up mainly in resolution in Experiment 2. The finding that pre-study JOLs revealed a certain degree of accuracy demonstrated that the manipulated cues alone permitted a reasonably good prediction of later memory performance. At the same time, accuracy disadvantages of pre-study JOLs as compared with immediate JOLs showed that the possibility to factor one’s own experiences into JOLs permits yet better predictions (see also Bröder & Undorf, 2019). Whether this excess accuracy is due to effects of mnemonic cues, idiosyncratic cues in the form of person–item interactions, or both, is an interesting avenue for future research.
To test whether people integrate multiple cues in single JOLs, this study examined cue integration across items. Obviously, simultaneously manipulating multiple cues in a within-participants design necessitates using several items (e.g., a minimum of four items when manipulating two cues with two levels each). We therefore cannot exclude the possibility that participants based their JOLs for some items on concreteness and their JOLs for other items on emotionality. However, this problem is common to all studies on cue integration in judgements and decisions and is by no means specific to this study.
From a broader perspective, similar effects of multiple cues on pre-study JOLs and immediate JOLs suggest that explicit beliefs about memory may govern JOLs in situations with multiple varying cues, as has been found in some studies that manipulated single cues (Mueller et al., 2014; Undorf & Zimdahl, 2019; Witherby & Tauber, 2017, but see Besken & Mulligan, 2013; Undorf & Erdfelder, 2015; Undorf & Zander, 2017; Undorf, Zimdahl, & Bernstein, 2017). This fits with analytic processing theory, according to which people deliberately search for variability across items and base their JOLs on activated or newly formed beliefs about how item characteristics or experimental manipulations may affect memory (Dunlosky & Tauber, 2014; Mueller et al., 2013). Although we cannot exclude the possibility that simultaneously manipulating more than two cues may foster the reliance of JOLs on mnemonic cues such as fluency (cf. Undorf et al., 2018), this study provided no evidence for this idea. Rather, this work suggests that cue integration in JOLs is at least partly strategic.
Acknowledgements
The authors thank Gabriela Ay, Maike Czink, Carolin Horn, and Franziska Schäfer for help with data collection.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Grant UN 345/1-3 from Deutsche Forschungsgemeinschaft (to MU), a Margarete von Wrangell fellowship from the state of Baden-Württemberg (to MU), and a grant from the Deutsche Forschungsgemeinschaft to both the authors (UN 345/2-1 and BR 2130/14-1).
