Unveiling pre-service teachers’ cognitive processes: insights from thinking-aloud protocols on self-regulated learning assessment / Descubriendo los procesos cognitivos de los docentes en formación: enseñanzas sobre la evaluación del aprendizaje autorregulado mediante protocolos de reflexión en voz alta

Abstract

Self-regulated learning (SRL) is essential for lifelong learning but challenging to achieve without guidance. Teachers play a key role in fostering SRL, requiring diagnostic competence to assess students’ SRL skills and provide tailored support. However, many teachers lack the ability to accurately assess SRL, highlighting the need to address this in pre-service teacher training. This study explored how SRL-specific tools influence pre-service teachers’ cognitive processes and judgement accuracy in SRL assessment. An experimental study with 96 elementary education pre-service teachers (47 first-year and 49 fourth-year; M = 19.9 years) compared the use of a rubric and a rating scale for assessing SRL. Our results revealed that the rubric significantly enhanced pre-service teachers’ cognitive processes, enabling them to recall observed actions and align them with assessment criteria, which led to more accurate judgements. These findings highlight the importance of integrating structured tools like rubrics into teacher training programmes to enhance SRL assessment and its development and support future educators in promoting SRL effectively.

Keywords

self-regulated learning; SRL assessment; rubric; pre-service teachers; experiment

Self-regulated learning (SRL) is a fundamental skill for lifelong learning, but its development requires training due to its complexity. Studies emphasize the importance of SRL training in schools and the need for teachers to have specific SRL competences to effectively promote it (Karlen et al., 2020). One of these essential competences is the diagnostic competence, which is crucial for adapting SRL promotion to students’ needs (Dignath & Sprenger, 2020; Karlen et al., 2024).

Recent research has begun to explore how teachers develop SRL diagnostic competence within classroom settings, revealing that many teachers lack sufficient knowledge about SRL and the appropriate tools to assess it effectively (Michalsky, 2017). This highlights the potential value of focusing on pre-service teachers, equipping them with this competence during their training so they can apply it effectively in their professional practice. At the same time, understanding how SRL assessment is carried out requires more than just identifying gaps in knowledge or training. It also requires a deeper exploration of the internal processes that teachers engage in when assessing students’ SRL. Although ongoing research continues to explore the outcomes of SRL assessment (Dörrenbächer-Ulrich et al., 2021), limited attention has been given to the specific cognitive processes that underpin teachers’ assessments. Investigating these processes in pre-service teachers could provide valuable insights into how SRL assessment is performed, complementing efforts to improve pre-service teacher training by informing strategies to enhance the accuracy and effectiveness of SRL assessments.

Equally, research on assessment has consistently highlighted the importance of using structured assessment tools with clear criteria to ensure accurate and reliable evaluations (Woolf, 2004). Rubrics and rating scales, in particular, have been widely recognized as effective tools for guiding assessment by providing explicit criteria that help evaluators make more consistent judgements. However, these tools are often confused with one another, despite their distinct structures and purposes (Brookhart, 2013). Since they have proven useful for assessing other competencies, it is important to determine which of the two better supports pre-service teachers in accurately assessing SRL. Clarifying their distinct contributions can help refine assessment practices and improve teacher training.

We designed an experimental study to explore the cognitive processes pre-service teachers use when assessing students’ SRL with either a rubric or a rating scale. We also looked at how these processes and assessment tools influence their judgement accuracy. Additionally, we controlled for pre-service teachers’ year level (first and fourth year) to see if the general training they receive — intended to prepare them for professional practice — helps develop their diagnostic skills, even if it is not directly related to SRL.

Self-regulated learning and teachers’ diagnostic competence

Self-Regulated Learning (SRL) refers to ‘self-generated thoughts, feelings and actions that are planned and cyclically adapted for the attainment of personal goals’ (Zimmerman, 2000, p. 14). SRL is a dynamic process influenced by a range of internal components and external environmental factors. To better understand and explain these interactions, researchers have developed various models. Although each model differs slightly in its approach, they generally share the idea that SRL consists of distinct components (e.g., cognition, metacognition, motivation, emotion and behaviour) interacting with each other (Panadero, 2017). Self-regulated learners have the ability to use and combine various SRL strategies to prepare, monitor and control different actions depending on the learning situation and task to achieve the desired goals (Pressley et al., 1987). It is, therefore, understandable that SRL is a complex process that requires training and teachers’ support.

It is essential for teachers to acquire specific competences that enable them to promote SRL in the classroom effectively (Heirweg et al., 2022; Karlen et al., 2025). Among these competences, diagnostic competence is central: only when teachers can accurately assess their students’ SRL will they be able to promote SRL adaptively (Karlen et al., 2024).

Teachers’ diagnostic competence refers to their ability to accurately assess students’ SRL. In this context, SRL assessment specifically aims to identify students’ strengths and weaknesses while learning, allowing teachers to adapt their instruction and offer personalized support to foster SRL skills (Karlen et al., 2020). This diagnostic competence requires teachers to master various assessment skills. In addition to possessing a deep understanding of SRL, teachers must be knowledgeable about effective SRL assessment practices, including the use of appropriate tools and cues. Furthermore, teachers need the ability to critically analyse and interpret the information gathered in order to make accurate and informed instructional decisions (Loibl et al., 2020). However, studies indicate that many teachers lack the necessary knowledge and skills to carry out effective SRL assessments (Karlen et al., 2024; Latva-aho et al., 2024). It is, therefore, essential to support teachers in improving their ability to manage SRL assessment effectively. One approach to achieving this could be by integrating SRL diagnostic competence training from the earliest stages of teacher education.

Pre-service teachers’ cognitive processes when assessing SRL

Pre-service teachers engage in cognitive processes when assessing SRL, since they have to carry out actions such as identifying, gathering, recording, interpreting and reporting information about their students, processes that ultimately inform how they adapt their SRL instruction (Karlen et al., 2020; Michalsky, 2017). These types of cognitive processes are well established as integral to the formative assessment process and typically involve careful observation of students, reflective analysis of what has been observed, interpretation of that evidence and informed instructional decision-making (Black & Wiliam, 2009).

These actions are complex and difficult to observe, since they are embedded in the teaching–learning process. It is therefore difficult to understand what pre-service teachers actually do when they assess students’ SRL, since they must transfer what they have learned into practice, and the two may sometimes be in conflict (Deprit et al., 2025). It is also not easy for researchers to access this information while pre-service teachers are performing the assessment in a real classroom environment.

For this reason, most research on SRL assessment — performed with in-service teachers — has focused primarily on its outcomes (Dörrenbächer-Ulrich et al., 2021; Karlen et al., 2024). However, to improve SRL assessment practices, it is crucial to understand the cognitive processes employed during the assessment itself. One way to stimulate these internal processes could be to use think-aloud protocols (Panadero, 2023).

Think-aloud protocols are a widely established method for collecting data on cognitive processes (Ericsson & Simon, 1980). In educational contexts, they serve to connect participants’ internal perspectives — such as those of pre-service teachers — with a more systematic and objective analysis of the phenomenon under investigation (Ericsson & Fox, 2011; McIntyre et al., 2022). As stated by Panadero et al. (2024, 2025) in their SEFEMO model, this method has proven to be highly effective in the fields of SRL and self-assessment, offering valuable insights into the cognitive processes students undertake while engaging in these complex operations.

Building on this, our first research question focuses on identifying the cognitive processes pre-service teachers engage in while assessing SRL, as elicited through think-aloud protocols conducted during the assessment process. Additionally, we examine how these cognitive processes relate to the accuracy of their assessments.

Accurate SRL assessment

Accuracy in assessment refers to the degree of correspondence between what has been assessed and the real performance or competence of the assessed student (Brown & Harris, 2013). The quality of an assessment is therefore determined by its accuracy. Research on the judgement accuracy of SRL assessment (e.g., Carr & Kurtz, 1991; Friedrich et al., 2013; Karlen et al., 2024) has focused essentially on the outcome of the assessment. These studies have shown that teachers struggle to accurately judge students’ SRL skills, emphasizing the importance of better understanding teachers’ assessment processes and improving their diagnostic competence.

To address this, our study contributes to SRL assessment research by examining judgement accuracy alongside online measures, such as think-aloud protocols, to gain deeper insights into the cognitive processes involved. By combining accuracy measures with an analysis of pre-service teachers’ thought processes, we aim to provide a clearer framework for refining SRL assessment practices. This approach allows us to identify strategies that support SRL assessment and to examine key factors — such as year level, the use of assessment tools and their interaction — that may influence judgement accuracy. Although it is well established that diagnostic competence does not fully develop without explicit SRL training (Michalsky, 2021), we hypothesize that using structured SRL-related tools may help compensate for this gap and positively impact pre-service teachers’ SRL judgement accuracy. If these tools support pre-service teachers in making more accurate judgements, they could serve as valuable scaffolds for developing SRL assessment skills, even in the absence of targeted SRL instruction.

Building on this, our second research question investigates whether the experimental conditions (assessment tools and year level) and their interaction influence pre-service teachers’ SRL assessment accuracy.

How to measure SRL skills: assessment practices and tools

Part of the diagnostic competence is that teachers master the assessment practices and tools they will use for SRL assessment. However, research shows this is not the case, since teachers frequently depend on tools that do not provide relevant diagnostic information (Michalsky, 2017). The choice of assessment tools depends largely on the types of inferences teachers wish to draw about students’ SRL (Cleary & Russo, 2023). It is crucial, however, that clear assessment criteria are established to ensure the primary goal of the assessment is met (Clark, 2012) — namely, identifying students’ strengths and needs in SRL to promote SRL adaptively.

Michalsky (2017) found that the most commonly used assessment tools for SRL assessment are oral interviews and learning diaries — tools that typically lack explicit assessment criteria. Research on assessment accuracy has consistently shown that using clear, systematic assessment criteria — where evidence is compared against defined standards — is more effective than relying on subjective judgements (Heitzmann et al., 2019; Südkamp et al., 2012). This highlights a key tension in SRL assessment: the need to shift from intuitive, impression-based methods to more structured approaches. One way to address this challenge is through the use of tools that incorporate SRL-specific assessment criteria, such as rubrics or rating scales.

Rubrics are tools that ‘articulate expectations for student work by listing criteria for the work and performance level descriptions across a continuum of quality’ (Brookhart, 2018, p. 1). In contrast, rating scales are ‘lists of specific characteristics with a place for marking the degree to which each characteristic is displayed’ (Brookhart, 2013, p. 78). These two assessment tools are often mistaken for one another. However, while both can provide teachers with specific assessment criteria that serve as clear standards for evaluating SRL, they differ in their design and purpose. Rubrics offer a detailed framework that outlines the specific actions students need to take to achieve the desired outcome, whereas rating scales are designed for quicker assessments, facilitating more immediate evaluations during the teaching–learning process. Given these inherent differences, these assessment tools may elicit different cognitive processes in pre-service teachers when performing SRL assessment.

Limited research has specifically utilized these tools for SRL assessment, despite their established effectiveness in evaluating academic performance (Bores-Garcia et al., 2023; Panadero & Jonsson, 2013). As a result, this study is primarily exploratory.

In this line, our third research question explores whether the use of these assessment tools — either independently or in interaction with pre-service teachers’ year level — affects the cognitive processes pre-service teachers engage in while assessing SRL.

Pre-service teachers’ training on SRL assessment

Assessment skills are a core professional competence across educational systems, enabling teachers to support students’ learning adaptively. Consequently, supporting teachers in developing their assessment skills has been a primary focus over the past two decades, particularly emphasizing assessment skills related to students’ performance (Allal, 2020; DeLuca et al., 2016). To date, few studies have focused on teacher training for diagnostic competence in SRL (Bäuerlein et al., 2023). A promising approach to fostering the development of teachers’ SRL diagnostic competence is to integrate relevant training into their initial teacher education. Specifically, incorporating SRL assessment into pre-service teacher training programmes can help future teachers acquire the necessary skills to accurately assess their students’ SRL from the outset of their professional careers (Dignath & Büttner, 2008).

Previous studies have highlighted that while interventions designed to train pre-service teachers in SRL — either as learners, teachers or both — have been implemented, they often lack a focus on SRL assessment (Ortube et al., 2024). Therefore, there appears to be a notable gap in research on how to train pre-service teachers to assess their students’ SRL effectively. Research suggests that teachers’ experience and age show a weak correlation with SRL judgement accuracy (Michalsky, 2021), indicating that years of teaching practice alone do not substantially aid in developing this competence. By extension, it is plausible to assume that pre-service teachers, who lack professional experience in assessing students, may face even greater difficulties in cultivating this critical skill.

This raises an important question: does the standard training provided in teacher education programmes, that is, pre-service teachers’ development as professionals, adequately contribute to the development of skills associated with diagnostic competence? We consider it relevant to investigate whether the general training received throughout the programme (including the development of competencies for the teaching profession and two internship experiences) has an impact on pre-service teachers’ diagnostic competence, especially on their judgement accuracy and the cognitive processes they use.

To address this, we included pre-service teachers’ year level in research questions two and three. While year level is not the primary focus of the study, it serves as an important factor in understanding the potential influence of training and experience on SRL assessment outcomes.

Aim and research questions (RQs)

Our aim is to investigate which cognitive processes pre-service teachers develop when assessing students’ SRL and whether these cognitive processes, combined with regular teacher training and the use of distinct assessment tools, make pre-service teachers more accurate assessors. To achieve this aim, we examine the following research questions (RQ):

RQ1: Which cognitive processes do pre-service teachers use when assessing students’ SRL?

H1: We expect that pre-service teachers use cognitive processes similar to the ones seen before in other assessment processes (SEFEMO model — see Method) (H1a). We also anticipate identifying cognitive processes that negatively correlate with pre-service teachers’ Absolute Accuracy Error, indicating that certain cognitive processes may enhance their assessment accuracy (H1b).

RQ2: How do year level and the type of assessment tool influence pre-service teachers’ judgement accuracy in assessing students’ SRL?

H2: We expect that pre-service teachers using the rubric are more accurate in their SRL assessments, since this assessment tool has been proven to be highly effective (Jönsson, 2020) (H2a). We do not expect a direct effect of pre-service teachers’ year level on their judgement accuracy (Michalsky, 2021) (H2b). We do expect an interaction effect between assessment tool and year level on judgement accuracy (H2c).

RQ3: Do the year level and the assessment tool affect pre-service teachers’ cognitive processes when assessing students’ SRL?

H3: We expect the year level (pre-service teachers’ development as professionals) to have a direct impact on the cognitive processes (H3a). We expect the assessment tools to have a direct impact on the cognitive processes (Woolf, 2004) (H3b). We expect an interaction between year level and assessment tool (H3c).

Method

Sample and procedure

A total of N = 96 elementary school pre-service teachers from the University of Deusto (Spain) participated in this study, of whom 76 were female. Pre-service teachers in our study were first-year (n = 47) and fourth-year students (n = 49). The average age of the sample was M = 19.9 years (SD = 1.96).

An a priori power analysis was conducted for one-way ANOVA using G*Power version 3.1.9.7 (Faul et al., 2007). The minimum sample size required was n = 96 to achieve 95% power for detecting a low effect size at a significance criterion of α = .05. The power analysis showed that our sample size has sufficient power to detect a possible effect.

Our sample is a convenience sample, since the participants were students at the first author’s university, to which we had access. We approached the students by contacting their university teachers and asking for permission to present the study in the classrooms. The first author prepared a brief session in which the benefits of SRL assessment and the aim of the study were presented. Then, pre-service teachers had the opportunity to write their name and email in an Excel document, so the first author could contact them. Pre-service teachers participated in the study for 1.5 hours and received a reward of 0.5 points towards their final grade in the subject where the explanation of the study was conducted. By linking participation to their academic performance, this reward contributed to enhancing the study’s ecological validity. Even though the task was beneficial for their training, without this incentive, pre-service teachers may not have perceived the task as directly connected to their training. By linking the study to their grades, we ensured that participants recognized the relevance of the task within their broader educational context.

Design

We used a randomized between-groups design with four conditions (Figure 1). The aim of the experiment was to explore what pre-service teachers do when assessing students’ SRL. For this, pre-service teachers participating in the experiment carried out two SRL assessments of two different fictitious elementary school students (one male and one female). Pre-service teachers were randomly assigned to one of two groups: one group used a rubric for both assessments (n = 47), while the other group used a rating scale (n = 49). Both assessment tools function as SRL diagnostic cues, differing only in their format. Importantly, the random assignment to groups was independent of the participants’ academic level, meaning that both first-year (n = 47) and fourth-year students (n = 49) were distributed across both conditions.

Figure 1.

Design of the study.

As part of a larger project, the study incorporated a comprehensive set of measures, including a battery of self-reported questionnaires, eye-tracker data, electro-dermal activity data and concurrent think-aloud protocols (Appendix 1 in the Supplementary Material). In this article, we focus solely on analysing think-aloud protocol data to examine the actions, strategies and criteria pre-service teachers employ when assessing students’ SRL, as well as the accuracy of their assessments.

Instruments

Fictitious elementary school student description

We created two fictional scenarios in which two elementary students were working on different academic assignments (see Appendix 2 in the Supplementary Material). These scenarios were developed based on Seli and Dembo’s (2020) book, where this type of activity is proposed. The scenarios Seli and Dembo propose describe university students, but we adapted those exemplars to elementary students to align with our sample of elementary education pre-service teachers. To develop the fictional profiles, we created two 11-year-old students, one boy and one girl. We chose older elementary school students because, given the exploratory nature of our study, we wanted to represent students with more advanced SRL strategies and make these behaviours more visible. Additionally, since our participants were assessing SRL for the first time, we sought to keep the task manageable by avoiding overly complex profiles. Each fictional student was designed to represent a distinct SRL profile: one with high SRL skills and the other with low SRL skills. The profiles incorporated specific cues related to SRL areas and strategies (Appendix 3 in the Supplementary Material), providing opportunities for pre-service teachers to identify and interpret them accurately. To prevent order effects, we counterbalanced the presentation of the scenarios across participants.

The first author, in collaboration with an expert university teacher, developed an expert assessment for both scenarios (see Appendices 3 and 4 in the Supplementary Material). While we acknowledge that expert ratings may vary depending on the assessor, we took steps to minimize this potential variability. In this case, both researchers conducted an independent expert rating, and both assessments were thoroughly compared and discussed to ensure consistency and establish a well-founded final expert assessment as the standard.

Think-aloud method

We gathered concurrent think-aloud protocols as participants performed the SRL assessment (see Procedure below). Next, we explain the decisions we took regarding our think-aloud method.

Instructions

During the pre-experimental phase, participants were asked to act as teachers during the experiment, and we explained that, when assessing the students, they had to explain aloud the reasoning for their assessment. For greater clarity, we told them to explain as concretely as possible why they were giving the student that rating and no other.

Task nature

In this study, all participants completed the same task — assessing the SRL of two students — ensuring that any variation in difficulty was not attributable to differences in the task. Additionally, the activity was chosen for its moderate level of difficulty, following recommendations from existing literature (Charters, 2003; Ericsson & Simon, 1993). However, the task was something none of the participants had ever done before, even if it is completely related to their professionalization process.

Think-aloud level of verbalization

The instructions, task characteristics and experimental conditions were specifically designed to elicit primarily level-1 and level-2 verbalizations. Focusing on these levels was essential to reduce potential disruptions to the natural cognitive processes involved in completing the task (Ericsson & Simon, 1993). Level-1 verbalizations were elicited with verbal actions closely related to the task instructions (Ericsson & Simon, 1980). For example, participants engaged in level-1 verbalizations when talking about the rating they were giving to the students or the assessment criteria. This is not highly demanding cognitively.

Level-2 verbalizations were prompted when participants needed to put their deeper reasoning into words. For example, participants produced level-2 verbalizations when describing the steps they followed and the reasons behind their decisions during the assessment. These verbalizations are essential because they capture cognitive processes and decision-making as they occur, with minimal alteration through verbalization, thereby maintaining the authenticity of the natural cognitive flow (Ericsson & Simon, 1980).

Segmentation and unit of analysis

Each unique idea conveyed by the participants was treated as a unit of analysis. While these units typically aligned with single sentences, they occasionally extended across multiple sentences when necessary to encompass a complete idea. This method enabled a thorough examination of the complexity of participants’ cognitive activity. Defining individual ideas as discrete units allowed for detailed monitoring of how participants navigated the task.

Assessment tools

Rubric

We created a rubric based on Pintrich’s (2000) model (see Appendix 5 in the Supplementary Material). The rubric has nine assessment criteria and four performance levels presented in a table format. Each assessment criterion refers to an SRL strategy, and the performance levels describe the quality of those strategies, with four being the highest quality level and one the lowest. Pre-service teachers had to rate from 1 to 4; no intermediate ratings were allowed (e.g., 2.5).

Rating scale

We developed the rating scale by identifying key characteristics that a highly self-regulated student might demonstrate. We ensured that these reflected the same SRL strategies as the rubric but without explicitly naming them (see Appendix 6 in the Supplementary Material). The rating scale is presented in nine individual statements, which pre-service teachers rated on a four-point scale (1 to 4), with no intermediate scores allowed (e.g., 2.5).
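Because both tools yield nine whole-number ratings between 1 and 4, a participant's ratings can be compared criterion by criterion with the expert assessment. The following is a minimal sketch only: it assumes that an accuracy error is computed as the mean absolute difference between participant and expert ratings, which is one common operationalization and not necessarily the exact formula used in this study. The function name and the example ratings are hypothetical.

```python
from statistics import mean

N_CRITERIA = 9                 # both tools contain nine assessment criteria
VALID_RATINGS = {1, 2, 3, 4}   # four-point scale, no intermediate scores


def absolute_accuracy_error(participant, expert):
    """Mean absolute difference between participant and expert ratings.

    Assumption: lower values mean the participant is closer to the expert
    standard; 0 means perfect agreement on all nine criteria.
    """
    if len(participant) != N_CRITERIA or len(expert) != N_CRITERIA:
        raise ValueError("expected nine ratings per assessment")
    if not set(participant) <= VALID_RATINGS or not set(expert) <= VALID_RATINGS:
        raise ValueError("ratings must be whole numbers from 1 to 4")
    return mean(abs(p - e) for p, e in zip(participant, expert))


# Hypothetical ratings for illustration only (not data from the study).
expert_ratings = [4, 4, 3, 4, 3, 4, 4, 3, 4]
participant_ratings = [4, 3, 3, 4, 2, 4, 4, 4, 4]
error = absolute_accuracy_error(participant_ratings, expert_ratings)
```

Under this assumption, the error is bounded between 0 and 3 on a four-point scale, so values from the rubric and rating-scale groups are directly comparable.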

Procedure of the assessment

Participants were contacted and given a date to come to the laboratory individually for 1.5 hours. Once in the laboratory, the pre-experimental phase took place. First, the aim of the study and the ethical procedures were explained to the participants. Second, participants signed the informed consent. Third, participants were asked to act as a teacher from that moment on, throughout the whole experience, and the experiment was explained in depth to them using a supporting diagram (see Appendix 7 in the Supplementary Material). Fourth, the thinking-aloud instructions were given to them (see the Think-aloud method section). No specific training for the thinking-aloud procedure was given, since the instructions were clear enough to elicit the information we were expecting. Fifth, participants were invited to ask questions regarding the experimental procedure if necessary. Finally, participants filled out all the pre-test measures.

After the pre-experimental phase, participants were taken into the experimental setting to start the experimental phase. The experimental phase had two parts. In the first part, after the eye-tracker calibration, the first fictitious student description was shown. The participants were asked to read the description, simulating that they were observing this fictitious student’s behaviour. When they finished, participants filled out some questionnaires and moved on to the SRL assessment. For this, participants were shown the same description they had read before and the assessment tool corresponding to their condition (rubric or rating scale, each with nine assessment criteria) (see Appendix 7 in the Supplementary Material).

Participants were asked to rate the student’s SRL while thinking aloud about their assessment process (see the Think-aloud method section for more detail). Finally, participants filled out some post-test measures. In the second part of the experimental phase, participants underwent the same process, but this time they had to assess a different fictitious student.

Data analyses

Coding scheme

We developed a coding scheme based on the SEFEMO model (Panadero et al., 2024) to analyse the Thinking-Aloud Protocols (TAP) related to SRL assessment. The categories were adapted to our data, but we equally found actions, strategies and criteria in our TAP data. The relation between the coding schemes can be seen in Table 1.

Table 1.

Comparison of coding schemes.

Panadero et al. categories	Our categories	Explanation
No change in the categories
Rate	Rate	In Panadero et al., rate is defined as the instance when students estimate the quality of their own work. In our coding scheme, rate refers specifically to the point at which pre-service teachers (PSTs) explain the rating they assigned to the fictitious student.
Assess	Assess	In Panadero et al., assess involves students estimating the quality of their own work based on predetermined criteria. In our study, assess refers to the action in which PSTs justify their scoring decisions by explicitly referencing the assessment criteria they applied to the fictitious student.
Reinterpretation of categories
Recall	Recall	Panadero et al. defined recall as the process by which students remember their actions when they did the activity they are self-assessing. In our coding, we identified recall as a strategy embedded within the assess action. This category emerged when PSTs assessed a specific criterion by recalling details of the fictitious student’s actions, thus referencing information from the provided scenarios.
Recoding of the categories
Redo	Reassess	In Panadero et al., redo is used when students perform the task they are assessing again. In our coding, we recoded this category as reassess, reflecting an advanced strategy that involves re-evaluating the task, critically reflecting on the initial assessment, identifying mistakes and justifying adjustments, with the strategies used in the assessment phase serving as criteria in the reassessment process.
Compare	Moment-to-moment observation	In Panadero et al., compare is used when students compare information sources. In our coding, this category refers to the ‘moment-to-moment observation’ strategy, in which PSTs explicitly revisit the text to locate information required for the assessment. By engaging in this strategy, PSTs actively compared the scenarios to the assessment tool.
Not added categories
Read	—	In our study, all participants were required to read the provided materials (scenarios and assessment tools). As a result, reading itself was not considered a distinct action indicative of the SRL assessment process. Instead, the observation process, nested within the assess action, was simulated through reading.
New categories
Tools		This strategy refers to the use of the assessment tool when performing SRL assessment.
Beliefs		This strategy refers to the use of SRL and learning beliefs when performing SRL assessment. Statements were coded as ‘beliefs’ when PSTs expressed personal opinions about student actions or the assessment criteria.
Advice		This action refers to PSTs offering guidance to the fictitious students about how to improve their SRL.

The first author read all the think-aloud data and created general categories derived from the manual. These general categories were classified as actions (rate, assess, reassess and advice) (see Appendix 8 in the Supplementary Material). Then, to identify the sub-categories, the first author and another expert researcher independently coded six participants per round over nine rounds (54 participants in total). After each round, both researchers met to discuss disagreements and to decide whether to cluster or discard categories. Inter-rater reliability was κ = .84, indicating high agreement (Landis & Koch, 1977). Finally, the remaining cases were coded by the first author.
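As an illustration of how an agreement coefficient such as the κ reported above can be obtained, the following minimal sketch computes Cohen's kappa for two coders. The function name and the ten example codes are hypothetical and are not data from this study; the sketch assumes both coders assigned exactly one code to each segment.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same segments."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of segments")
    n = len(coder_a)
    # Observed agreement: share of segments given the same code by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    codes = freq_a.keys() | freq_b.keys()
    p_chance = sum(freq_a[c] * freq_b[c] for c in codes) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for ten think-aloud segments (illustrative only).
coder_1 = ["recall", "recall", "tools", "beliefs", "recall",
           "tools", "tools", "beliefs", "recall", "tools"]
coder_2 = ["recall", "recall", "tools", "beliefs", "tools",
           "tools", "tools", "beliefs", "recall", "recall"]

kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 4))
```

In this toy example the coders agree on eight of ten segments, and chance agreement is .36, so kappa is lower than the raw agreement of .80.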

Through the coding process, a final coding scheme was established, which identified four main categories, each nested within the other (see Figure 2). First, four core actions were identified, as previously mentioned. Second, within these core actions, a sub-category termed strategies was established, which describes what participants do to execute those core actions. Third, a more refined category called criteria was identified, outlining how participants used the identified strategies. Fourth, within this last category, we further differentiated the quality of the criteria employed by participants. To capture this variation, the criteria were categorized into levels (ranging from 0 to 2).

Figure 2.

Representation of the final coding scheme.

Quantitative analysis — frequencies, correlations, MANOVAs and accuracy analysis

We performed four quantitative analysis procedures on the data obtained through the coding scheme.

First, we gathered the frequencies of all the categories. This way we had a descriptive overview of what participants were doing when assessing students’ SRL. These frequencies were used to answer RQ1. For this analysis, we utilized data from the entire sample, which included participants from both experimental conditions (rubric vs. rating scale). Therefore, when interpreting these results, it should be considered that the data are already influenced by the experimental conditions.

Second, we conducted a correlation analysis between the most frequently used categories and pre-service teachers’ Absolute Accuracy Error, which was calculated as the difference between the SRL assessments made by pre-service teachers and the yardstick assessment. This analysis allowed us to identify the most effective cognitive processes for assessing SRL. We used this analysis in RQ1.

Third, to determine which groups (first-year vs. fourth-year pre-service teachers and rubric vs. rating scale) were more accurate in assessing SRL, we conducted a two-way ANOVA to compare the Absolute Accuracy Error across the groups. This analysis provided the data necessary to address RQ2.

Fourth, to examine the effects of pre-service teachers’ year level and assessment tool, as well as their interaction, on pre-service teachers’ cognitive processes while assessing students’ SRL, we performed MANOVA analyses for each category of the coding scheme. We used these data to answer RQ3.
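The second analysis step (Absolute Accuracy Error correlated with strategy use) can be sketched as follows. This is a minimal illustration under stated assumptions: the error is taken here as the sum of absolute differences between a participant's ratings and the yardstick ratings across criteria, which is one plausible reading of the definition above, and all ratings and recall counts are hypothetical.

```python
import math


def absolute_accuracy_error(ratings, yardstick):
    """Sum of absolute deviations between a participant's ratings and the yardstick."""
    if len(ratings) != len(yardstick):
        raise ValueError("ratings and yardstick must cover the same criteria")
    return sum(abs(r - y) for r, y in zip(ratings, yardstick))


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    ss_x, ss_y = sum(d * d for d in dx), sum(d * d for d in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(ss_x * ss_y)


# Hypothetical yardstick and participant ratings on four criteria.
yardstick = [3, 1, 2, 4]
participant_ratings = [
    [3, 1, 2, 4],
    [3, 2, 2, 4],
    [2, 2, 3, 4],
    [1, 3, 2, 2],
    [4, 3, 4, 1],
]
# Hypothetical frequency of the recall strategy for the same five participants.
recall_counts = [25, 22, 18, 12, 10]

errors = [absolute_accuracy_error(r, yardstick) for r in participant_ratings]
r = pearson_r(recall_counts, errors)
print(errors, round(r, 3))
```

A negative coefficient in such an analysis means that more frequent use of the strategy goes together with a smaller error, that is, with a more accurate judgement.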

Results

RQ1. Which are the cognitive processes pre-service teachers use when assessing students’ SRL?

Frequencies show that the actions performed were rate, assess, reassess and advice (see Table 2). The first two actions were specifically promoted in the experimental setting, so their frequencies are high (rate f = 1,490; assess f = 3,726). However, during the SRL assessment process, pre-service teachers also reassessed and advised. Although the frequencies for these actions are low (reassess f = 26; advice f = 22), they are noteworthy because the researchers did not prompt them.

Table 2.

Frequencies of actions, strategies and criteria.

Action	Strategy	Criterion	Level	Action f	Strategy f	Strategy %	Criterion f	Criterion %
Rate	Rate	The score and the explanation do not match	0	1,490	1,490	—	15	1
		Pre-service teacher is doubtful when rating	1				75	5
		Pre-service teacher is sure when rating	2				1,400	94
Assess	Moment to moment observation	Explicit finding of evidences	1	3,726	276	7.4	276	-
	Recall of observed information	Erroneous evidence// Imprecise information//Judgements	0		1,794	48.1	422	23.5
		Description of concrete and detailed actions	1				1,123	62.6
		Inferences// Identification of lack of information	2				249	13.8
	Tools	Only read	0		1,114	29.8	102	9.2
		Affirm or deny the achievement of the assessment criteria	1				771	69.2
		Break down the assessment criteria and analyse it by pieces	2				241	21.6
	Beliefs	The argument given is not related to the criteria	0		539	14.5	32	5.9
		Beliefs of what is right and/or wrong	1				277	51.4
		Personal opinions about the criteria or about the student	2				230	42.7
Reassess	Reassess	Use of erroneous evidence	0	26	26	—	3	11.5
		Use of beliefs and/or opinions	1				5	19.2
		Use of tools and/or proper observed information	2				18	69.2
Advice	Advice	Explain actions the student could do to improve their SRL	1	22	22	—	17	77.3
		Explain actions that he/she could do as pre-service teacher to help the student improve	2				5	22.7

Note: For more detailed information regarding the categories, see Appendix 5 in the Supplementary Material.
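The criterion percentages in Table 2 are each criterion's share of the frequency of its parent strategy. A minimal sketch of that calculation, using the frequencies reported for the tools strategy (the function name is a hypothetical helper, not part of the study's materials):

```python
def criterion_shares(frequencies):
    """Percentage of a strategy's total frequency accounted for by each criterion level."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("the strategy has no coded instances")
    return {level: round(100 * f / total, 1) for level, f in frequencies.items()}


# Frequencies for the tools strategy by criterion level, as reported in Table 2.
tools = {0: 102, 1: 771, 2: 241}

shares = criterion_shares(tools)
print(sum(tools.values()), shares)
```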

Regarding the strategies and criteria used by pre-service teachers, Table 2 highlights that the most frequently employed were rate (f = 1,490), recall (f = 1,794), use of assessment tools (f = 1,114) and reliance on beliefs (f = 539). To examine the impact of these strategies and their criteria on pre-service teachers’ diagnostic competence, we conducted a correlation analysis with their judgement accuracy (Table 3), measured using Absolute Accuracy Error (see Method section).

Table 3.

Correlation analysis of strategies and criteria with judgement accuracy.

Strategies	r	p	Criteria	r	p
Rate	−.145	.158	Level 2	−.155	.132
Recall	−.294	.004**	Level 1	−.357	< .001***
Tool	−.287	.005**	Level 1	−.252	.013*
Beliefs	.462	< .001***	Level 1	.070	.500
			Level 2	.529	< .001***

Note: n = 96; df = 94 for all correlations; *p < .05; **p < .01; ***p < .001

Because a lower Absolute Accuracy Error means higher accuracy, negative correlations indicate more accurate judgements. The data show that pre-service teachers’ judgements were more accurate when they used recall (r = −.294, p = .004) and when they relied on the assessment tools (r = −.287, p = .005). Further correlation analysis of the most frequently used criteria within these strategies confirms that accuracy increases when pre-service teachers recall concrete student actions (r = −.357, p < .001) and when they use assessment tools to affirm or deny the achievement of assessment criteria (r = −.252, p = .013). Conversely, relying on personal opinions during SRL assessment is associated with lower accuracy (r = .529, p < .001).
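The link between the correlation coefficients in Table 3 and their significance can be illustrated with the standard transformation t = r·√(df / (1 − r²)), with df = 94. The sketch below applies it to the four strategy-level coefficients. The critical value of 1.986 is an approximate two-tailed table value for α = .05 at df = 94 and is an assumption of this illustration, not a figure reported in the study.

```python
import math


def t_from_r(r, df):
    """t statistic for testing a Pearson correlation against zero."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(df / (1 - r**2))


DF = 94
T_CRITICAL = 1.986  # approximate two-tailed critical value, alpha = .05 (assumed)

# Strategy-level correlations with Absolute Accuracy Error, from Table 3.
correlations = {"Rate": -0.145, "Recall": -0.294, "Tool": -0.287, "Beliefs": 0.462}

significant = {}
for strategy, r in correlations.items():
    t = t_from_r(r, DF)
    significant[strategy] = abs(t) > T_CRITICAL
    print(f"{strategy}: t({DF}) = {t:.2f}, significant at .05: {significant[strategy]}")
```

Under these assumptions the pattern matches Table 3: recall, tool and beliefs exceed the critical value, whereas rate does not.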

In sum, the data suggest that, within the experimental setting, pre-service teachers most frequently engaged in rating and assessing, often in combination with recalling observed information and utilizing the assessment tool — both of which enhanced their judgement accuracy. While the use of personal beliefs was also common, albeit to a lesser extent, it appeared to impact the accuracy of SRL judgements negatively. These findings confirm our hypothesis, since the cognitive processes identified align with those described in the SEFEMO model, while also revealing assessment-specific processes unique to the SRL assessment context (see Table 1). Furthermore, our hypothesis is also supported by the correlation between these cognitive processes and pre-service teachers’ SRL judgement accuracy, reinforcing the link between thought processes and assessment performance.

RQ2. How do year level and the type of assessment tool influence pre-service teachers’ accuracy in assessing students’ SRL?

The results showed no significant interaction effect between year level and assessment tool on SRL assessment accuracy. Moreover, there was no main effect of year level on accuracy. However, a significant main effect of the assessment tool was found, F(1, 92) = 21.87, p < .001, η² = .188, indicating that the type of assessment tool explained approximately 19% of the variance in judgement accuracy. Pre-service teachers who employed the rubric for SRL assessment demonstrated greater accuracy than those using the rating scale. Although this represents a relatively large effect size, it is consistent with the experimental nature of the manipulation, where the rubric provides detailed performance-level descriptions that directly structure the evaluation process, whereas the rating scale offers less explicit guidance (see Table 4).

Table 4.

Two-way ANOVA of the Absolute Accuracy Error.

Means
Year level	Rubric	Rating scale	Overall
1st	12.78	15.75	14.29
4th	11.71	15.28	13.53
Overall	12.23	15.51	

Two-way ANOVA
Effect	F	p	η²
Year level*Assessment tool	.185	.668	.002
Year level	.983	.324	.010
Assessment tool	21.871	< .001***	.188

Note: n = 96. All effects were tested with F(1, 92). The model explained 20.1% of the variance (R² = .201). ***p < .001; η² indicates the classical effect size

Therefore, our hypothesis is partially confirmed. As anticipated, we found a direct effect of the rubric on pre-service teachers’ judgement accuracy, while no such direct effect was observed for the year level. However, our expectation of an interaction effect between these two variables on SRL judgement accuracy was not supported. This indicates that, although we initially hypothesized that the combination of both factors would enhance pre-service teachers’ SRL judgement accuracy, our results suggest that it is the rubric alone that has a meaningful impact.

RQ3. Do the year level and the assessment tool affect pre-service teachers’ cognitive processes when assessing students’ SRL?

Regarding the actions, no significant main or interaction effects were found for any of the four actions analysed (rate, assess, reassess, advice). This suggests that neither year level nor type of assessment tool had a measurable impact on the types of actions pre-service teachers engaged in during SRL assessment (see table in Appendix 9 in the Supplementary Material). Given the exploratory nature of these analyses and the number of comparisons conducted across strategies and criteria levels, the findings should be interpreted with caution.

Regarding strategies, while no interaction effects were observed, several main effects emerged. Year level significantly affected the use of the ‘recall’ strategy (F = 11.93, p < .001), with first-year students relying on it more heavily. In contrast, assessment tool type had a significant effect on three strategies: observation (F = 7.54, p = .007), use of the tool (F = 7.89, p = .006) and beliefs (F = 26.43, p < .001). Notably, students using rating scales relied more on beliefs and observational judgements, while those using rubrics referred more directly to the tool itself (Table 5).

Table 5.

MANOVA and estimated means with strategies as dependent variables.

		Year level*Assessment tool							Year level					Assessment tool
Action	Strategy	F	p	η²	Rubric 1st	Rubric 4th	Rating scale 1st	Rating scale 4th	F	p	η²	Mean 1st	Mean 4th	F	p	η²	Mean Rubric	Mean Rating scale
Assess	Observation	.107	.744	.001	2.652	1.792	3.792	3.240	2.246	.137	.024	3.222	2.516	7.542	.007**	.076	2.222	3.516
	Recall	.705	.403	.008	21.391	18.042	20.542	15.040	11.932	< .001***	.115	20.966	16.541	2.259	.136	.024	19.716	17.791
	Tool	.215	.644	.002	12.391	14.000	8.708	11.360	3.583	.062	.037	10.550	12.680	7.891	.006**	.079	13.196	10.033
	Beliefs	.055	.816	.001	3.435	3.542	7.792	7.520	.010	.919	.000	5.613	5.531	26.429	< .001***	.223	3.488	7.656

Note: n = 96. Results correspond to univariate ANOVAs conducted following the MANOVA, F(1, 92). **p < .01; ***p < .001
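The marginal means in Table 5 appear to be estimated marginal means, that is, unweighted averages of the corresponding cell means. This is an inference from the reported numbers rather than something stated explicitly. The following sketch recomputes them from the cell means in Table 5; the function and dictionary names are hypothetical.

```python
# Cell means copied from Table 5: (assessment tool, year level) -> mean frequency.
cell_means = {
    "Observation": {("rubric", "1st"): 2.652, ("rubric", "4th"): 1.792,
                    ("rating scale", "1st"): 3.792, ("rating scale", "4th"): 3.240},
    "Recall": {("rubric", "1st"): 21.391, ("rubric", "4th"): 18.042,
               ("rating scale", "1st"): 20.542, ("rating scale", "4th"): 15.040},
    "Tool": {("rubric", "1st"): 12.391, ("rubric", "4th"): 14.000,
             ("rating scale", "1st"): 8.708, ("rating scale", "4th"): 11.360},
    "Beliefs": {("rubric", "1st"): 3.435, ("rubric", "4th"): 3.542,
                ("rating scale", "1st"): 7.792, ("rating scale", "4th"): 7.520},
}


def marginal_mean(cells, factor, level):
    """Unweighted mean of the cells at one level of a factor.

    factor 0 is the assessment tool, factor 1 is the year level.
    """
    values = [mean for key, mean in cells.items() if key[factor] == level]
    if not values:
        raise ValueError(f"no cells found for level {level!r}")
    return sum(values) / len(values)


for strategy, cells in cell_means.items():
    by_tool = [round(marginal_mean(cells, 0, lvl), 3) for lvl in ("rubric", "rating scale")]
    by_year = [round(marginal_mean(cells, 1, lvl), 3) for lvl in ("1st", "4th")]
    print(strategy, "tool:", by_tool, "year:", by_year)
```

The recomputed values agree with the marginal means reported in Table 5 to within rounding, which supports reading them as unweighted averages over the four cells.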

Regarding criteria, this dimension revealed the most nuanced differences (Table 6). Several main and interaction effects were found across different strategies and cognitive levels:

Recall: both year level and assessment tool affected the accuracy of recalled information, with first-year students making more errors (F = 11.00, p = .001) and rubric users recalling more of students’ concrete behaviours (F = 6.80, p = .011).

Tools and beliefs: interaction effects indicated that fourth-year students using rubrics engaged in more analytic evaluations, whereas those using rating scales often relied on surface-level or personal interpretations (e.g., beliefs Level 0: F = 11.19, p = .001).

In sum, while the broader actions during SRL assessment were not directly influenced by year level or assessment tool, both factors played a crucial role in shaping pre-service teachers’ specific cognitive processes. Our hypothesis is confirmed, since we found that assessment tools (particularly the rubric at the criteria level) and their interaction with year level influenced the depth of reasoning and strategies used during SRL assessment. Notably, the rubric appeared to promote more structured and analytical assessment practices, an effect that was especially pronounced among more experienced pre-service teachers.

Table 6.

MANOVA and estimated means with criteria as dependent variables.

			Year level*Assessment tool							Year level					Assessment tool
Action	Strategy	Criteria	F	p	η²	Rubric 1st	Rubric 4th	Rating scale 1st	Rating scale 4th	F	p	η²	Mean 1st	Mean 4th	F	p	η²	Mean Rubric	Mean Rating scale
Rate	Rate	Level 0	.624	.432	.007	.060	.042	.083	.060	1.740	.190	.019	.042	.021	1.740	.190	.019	.215	.103
		Level 1	.010	.922	.000	.000	.000	.083	.120	.077	.782	.001	.420	.60	.357	.552	.004	.701	.857
		Level 2	.748	.389	.008	.261	.250	.083	.160	.870	.353	.009	.172	.205	.694	.407	.007	15.130	14.080
Assess	Observation	Level 1	.107	.744	.001	2.650	1.790	3.790	3.240	2.250	.137	.024	3.222	2.52	7.540	.007**	.076	2.222	3.520
	Recall	Level 0	.000	.995	.000	5.522	3.375	5.458	3.320	10.998	.001***	.107	5.490	3.35	.008	.927	.000	4.448	4.389
		Level 1	.409	.524	.004	13.478	12.458	11.625	9.400	2.967	.088	.031	12.552	10.92	6.797	.011*	.069	12.968	10.513
		Level 2	.781	.379	.008	2.391	2.208	3.458	2.320	1.493	.225	.016	2.925	2.26	1.188	.279	.013	2.300	2.889
	Tool	Level 0	6.859	.010**	.069	.609	.167	.208	3.160	3.751	.056	.039	.409	1.66	4.004	.048*	.042	.388	1.684
		Level 1	.004	.951	.000	8.609	8.625	7.417	7.520	.007	.993	.000	8.013	8.07	2.614	.109	.028	8.617	7.468
		Level 2	6.601	.012**	.067	3.174	5.208	1.083	.680	2.955	.089	.031	2.129	2.94	48.661	< .001***	.346	4.191	.882
	Beliefs	Level 0	11.193	.001***	.108	.870	.292	.042	.160	4.877	.030*	.050	.456	.226	21.261	< .001***	.188	.581	.101
		Level 1	.343	.560	.004	1.609	2.500	3.500	3.840	1.711	.194	.018	2.554	3.17	11.780	< .001***	.114	2.054	3.670
		Level 2	.114	.705	.002	.957	.750	4.250	3.520	.461	.499	.005	2.603	2.13	19.345	< .001***	.174	.853	3.885
Reassess	Reassess	Level 0	3.087	.082	.032	.060	.420	.830	.060	.343	.560	.004	.042	.021	.343	.560	.004	.021	.042
		Level 1	.166	.685	.002	.000	.000	.083	.120	.166	.685	.002	.042	.060	5.097	.026*	.052	.000	.102
		Level 2	.210	.648	.002	.261	.250	.083	.020	.119	.731	.001	.172	.205	1.961	.165	.021	.255	.122
Advice	Advice	Level 1	.882	.350	.009	.304	.020	.250	.160	2.985	.087	.031	.277	.080	.214	.645	.002	.152	.205
		Level 2	3.615	.060	.038	.430	.020	.020	.160	1.185	.279	.013	.022	.080	1.185	.279	.013	.022	.080

Note: Results for Observation Level 1 are the same as in Table 5, as it is the same variable. n = 96. Results correspond to univariate ANOVAs conducted following the MANOVA, F(1, 92). *p < .05; **p < .01; ***p < .001

Discussion

Our study aimed to explore the cognitive processes that pre-service teachers engage in while assessing students’ SRL. By analysing their think-aloud protocols, we sought to understand how they approach SRL assessment and examine whether the year level and the use of distinct assessment tools influence their cognitive processes and their judgement accuracy of SRL assessment. In the following section, we synthesize our findings by addressing our research questions.

Insights into SRL assessment process

The analysis of the think-aloud protocols revealed that pre-service teachers followed relatively straightforward strategies when assessing students’ SRL using assessment tools with clear assessment criteria. These strategies are recalling previously observed student behaviours and comparing them with the criteria outlined in the assessment tool, affirming or denying whether the criteria had been met. Both strategies contribute to enhancing pre-service teachers’ judgement accuracy. Notably, these same strategies have been identified in prior research as effective for developing assessment practices (Heitzmann et al., 2019; Südkamp et al., 2012).

Moreover, drawing on the coding framework developed by Panadero et al. (2024), we developed a categorization that underscores the structured nature of SRL assessment and the role of explicit assessment criteria in guiding pre-service teachers’ evaluations, a factor previously emphasized by research in the field of assessment (Brookhart, 2013; Woolf, 2004).

Our findings suggest that providing pre-service teachers with SRL-specific assessment tools can clarify and simplify the SRL assessment process. Research by Michalsky (2017) has shown that, in the absence of explicit assessment criteria (e.g., learning diaries), SRL is often assessed in a more generalized and less precise manner. In contrast, the tools used in this study offer clear, structured criteria that guide pre-service teachers in focusing on specific aspects of SRL, rather than treating SRL as an overly broad construct.

However, our findings also reveal that pre-service teachers often rely on personal beliefs when assessing SRL, and those who do so more frequently tend to be less accurate in their evaluations. This aligns with previous research indicating that teachers might also hold beliefs about SRL that conflict with SRL theory and are negatively related to SRL practices (Vosniadou et al., 2020). This highlights the need for further research to explore the impact of beliefs about SRL on SRL assessment processes. While this is an area requiring deeper investigation, our results also suggest that pre-service teachers using the rubric are less likely to rely on personal beliefs, which may contribute to more accurate assessments.

The role of rubrics in SRL assessment and instruction

Our findings indicate that using the rubric led to greater accuracy in SRL assessment compared to the rating scale and also shaped more refined cognitive processes. This aligns with previous research highlighting the effectiveness of rubrics as assessment tools (Brookhart, 2018; Panadero & Jonsson, 2013). One possible explanation is that the rubric provides structured SRL-related information, which may have helped pre-service teachers better understand SRL and how to identify it, since they had a tool that may have given them more information to make better evidence-based interpretations (Black & Wiliam, 2009). Prior research has shown that teachers with higher SRL knowledge are more likely to implement SRL-promoting practices in the classroom (Spruce & Bol, 2015).

This raises an important question: Could the same be true for SRL assessment? Furthermore, could engaging in SRL assessment contribute to developing SRL knowledge? If so, rubrics could extend beyond assessment, serving as a valuable tool for fostering teachers’ SRL knowledge and instructional practices.

In this sense, it is important to recognize that SRL assessment goes beyond the diagnostic phase; it also encompasses the adaptive phase, where teachers use the collected data to inform their instructional decisions (Klug et al., 2013). While our study primarily focuses on the first step — diagnosing students’ SRL — it is crucial to acknowledge the significance of taking action based on assessment findings, as highlighted by adaptive learning theory (Randi, 2017). Although our data suggest that some pre-service teachers initiated this second step by providing advice to students, the low frequency of such occurrences prevented us from drawing definitive conclusions on this aspect. Future research should focus on the adaptivity aspect of SRL assessment to gain a more comprehensive understanding of the entire process.

Impact of the teacher training programme

In line with previous research showing a low correlation between teachers’ experience and SRL judgement accuracy (Michalsky, 2021), we also found that pre-service teachers’ year level did not significantly affect their SRL judgement accuracy. This finding is not unexpected, since the teacher training programme in which elementary pre-service teachers are enrolled does not include specific coursework on SRL. While this training equips them with the competencies needed to enter the workforce as elementary teachers, it does not provide dedicated instruction on SRL throughout the four years of the degree. This lack of SRL-specific training is not an isolated case. Previous research has shown that SRL interventions directed at pre-service teachers typically do not focus on developing their diagnostic competence (Ortube et al., 2024).

However, our findings suggest that fourth-year pre-service teachers, when using the rubric, demonstrated more analytical and in-depth cognitive processes during SRL assessments. While the rubric likely played a role in fostering these deeper cognitive processes, the development of pre-service teachers’ professional competencies during their general training may also have contributed to this effect.

In sum, our results show that while pre-service teachers’ year level — reflecting their general training — did not significantly affect their judgement accuracy, their cognitive processes during assessment were more sophisticated in their final year, particularly when supported by the rubric. Although this interaction did not impact the judgement accuracy of fourth-year pre-service teachers, it is important to emphasize the value of incorporating SRL-specific assessment tools into teacher preparation programmes. Such tools can foster the development of cognitive processes that, as demonstrated in our study, are associated with improved SRL judgement accuracy, ultimately enhancing both instruction and the development of SRL diagnostic competence.

Educational and theoretical implications

Regarding the theoretical implications, our study underscores the effectiveness of think-aloud protocols in examining the cognitive processes involved in educational activities such as SRL assessment. This methodology has proven valuable in shedding light on the complexities of SRL assessment, and further research should continue to employ it to deepen our understanding of these processes.

While diagnostic competence is not limited to diagnosis alone, the accuracy of diagnosis plays a crucial role, since it directly influences the potential for appropriate instructional interventions. In this regard, our study’s educational implications suggest the development of tools that can assist in the diagnostic phase of SRL assessment, particularly in identifying students’ SRL strengths and weaknesses. Specifically, our SRL-based rubric guided pre-service teachers’ cognitive processes more effectively than the rating scale, leading to more accurate assessment. Moreover, as we have argued throughout the discussion, this rubric could also serve as instructional material for developing pre-service teachers’ diagnostic competence.

Limitations of the study and future research lines

While this study provides valuable insights into pre-service teachers’ cognitive processes and accuracy of SRL assessment, several limitations must be acknowledged. Firstly, the experimental nature of the study may limit its ecological validity. The controlled setting allowed us to isolate and examine specific variables related to SRL assessment and pre-service teachers’ cognitive processes; however, it does not fully replicate the complexities and dynamics of real classroom environments. As such, the findings may not entirely capture how pre-service teachers would approach SRL assessment in authentic teaching contexts, where multiple factors such as classroom management, time constraints and student diversity may influence their decision-making and cognitive processes (Duffy et al., 2009).

Also, it is important to note that the data used to draw our conclusions are inherently influenced by the study’s experimental design, particularly the assessment tool used by participants. Our experiment did not include a control group that did not use any assessment tool, since SRL judgement accuracy was one of our outcome variables, and the lack of an assessment tool would have complicated the analysis. Therefore, these results serve as an initial exploration and should be contextualized within the study’s framework and experimental conditions to derive meaningful conclusions. Future studies might replicate the study with a group working without an assessment tool, to examine the cognitive processes elicited in that situation.

A final limitation concerns the specific population studied. This research focuses on pre-service teachers who have limited to no prior experience in assessing students, particularly in the domain of SRL. As novices in both teaching and assessment, their understanding of SRL assessment strategies and criteria is likely less developed than that of in-service teachers. Consequently, the findings reflect the perspectives and challenges of pre-service teachers in training and may not generalize to more experienced teachers. Addressing these limitations in future research will help build a more comprehensive understanding of how to support both pre-service and in-service teachers in developing diagnostic competence in SRL assessment.

Conclusion

This study provides valuable insights into how diagnostic competence can be developed in pre-service teachers, highlighting the importance of SRL-specific assessment tools with clearly defined assessment criteria, specifically rubrics. It also emphasizes the importance of training pre-service teachers to identify and interpret SRL-related diagnostic cues and underscores the need for further research to refine and adapt assessment tools for practical use in educational settings. A key contribution of this study is its focus on the structured nature of SRL assessment, which is often lacking in prior research. By employing explicit criteria through rubrics, this study differentiates itself from previous work, offering a more systematic approach to SRL assessment that can potentially enhance diagnostic accuracy and support teachers in fostering students’ SRL skills.
