Iterative co-evaluation with a rubric of narrative texts in Primary Education / Coevaluación iterativa con rúbrica de textos narrativos en la Educación Primaria

Abstract

This study compares two types of activities involving narrative text revision: the first one consists in a traditional evaluation by teachers and the second one involves iterative co-evaluation with a rubric. A total number of 128 Primary school learners that were randomly assigned to the two types of assessment took part in this study. They were asked to write a composition that was assessed with the assigned evaluation procedure in each case. After the evaluation process (by means of hetero-evaluation or iterative co-evaluation), the participants completely rewrote their compositions. A standardized test (PROESC: assessment of writing processes) and a rubric were applied in order to analyze the improvements in the new versions of the compositions. The narrations were also marked by four teachers, unaware of the research. The number of changes introduced by the participants in the second version of the narrations was also counted. The results show that 76% of the subjects that were assessed by means of iterative co-evaluation introduced the changes suggested by their peers as opposed to 86% of the learners in the other group that incorporated the modifications indicated by their teacher. The process of iterative co-evaluation with rubric resulted in a clear improvement in the organization and contents of the narrative texts (mainly in the description of the characters and in the story setting) while the students assessed by their teachers improved, significantly, their grammar and, above all, spelling mistakes.

The act of writing, as an ability requiring the activation of linguistic knowledge and cognitive processes, has been the object of extensive research. Learning to write requires a complex process of knowledge construction with diverse cognitive operations occurring simultaneously during the act of writing (Bereiter & Scardamalia, 1987; Butterfield, Hacker, & Alberstong, 1996; Camps, 1990; Cassany, 1999; Flower & Hayes, 1981; Hayes, 1996, 2006; Sánchez, 1998). Theoretical models not only consider textual editing processes to be cognitive operations specific to writing, but they also regard planning and revising skills as a fundamental part of writing competence (Castelló, 2002; Faigley & Witte, 1981; Fitzgerald, 1987; Piolat, 1991). These two metacognitive skills involve the development of high levels of self-regulation. Hence, many researchers believe not only that it is important for students to write frequently (Graham, Gillespie, & McKeown, 2013; Graham, Harris, & Hebert, 2011; Rogers & Graham, 2008), but also that self-regulation strategies in the learning process need to be acquired (Fidalgo & García, 2008; Zimmerman & Reisemberg, 1997).

Revision and self-regulation in writing processes

Various studies have documented the effects of peer support on writing self-efficacy and self-regulation, as well as on improvements in compositions (Graham, Harris, & Mason, 2005; Schunk & Zimmerman, 2007; Vass, Littleton, Miell, & Jones, 2008). During the writing process, reflection occurs when students consciously plan, revise, and assess their own texts. Gender differences in the ability to reflect or self-regulate when writing have not been demonstrated (Ramos, Cuadrado, & Iglesias, 2005), although age differences have been seen, where an upward trend in said ability exists as the student’s school grade level increases (Graham & Harris, 2000). Strategies such as self-correction and peer review of texts are useful for motivating and creating helpful reflection over the writing itself (Chen, Wei, Wu, & Uden, 2009; Harris, Graham, & Mason, 2006; Saito & Miwa, 2007).

This last resource, peer review, has proven to be particularly useful for potentiating self-regulation of composition processes, insofar as it favours the self-assessment of those involved (Min, 2006; Yang, 2010). An analysis of self-regulation’s principal theories reveals that self-assessment is the common denominator and the crucial component of that process (Puustinen & Pulkkinen, 2001). It has been shown that when students draft and co-evaluate a composition in pairs they produce better texts (above all with respect to the vocabulary used), in comparison with those produced individually (De la Paz & Graham, 2002; Graham et al., 2005; Yarrow & Topping, 2001). The verbal information exchanges that are a part of co-evaluation trigger many more changes to the compositions than those that would be expected from the feedback analysis received from peers (Peterson, 2003). This data supports the idea that co-evaluation activities indirectly generate additional self-assessment processes, derived from suggestions or discussion with peers. In fact, the positive effects from peer help and evaluation have not only been noted in the quality of the compositions, but also in the strategic and motivational behaviours characteristic of the self-regulated learning process for composing narrative and argumentative texts (Harris et al., 2006).

Even though these benefits have been widely documented in Secondary Educational levels (see the extensive and sound review from Graham & Perin, 2007), we have less evidence to go on in the Primary Education stage. Ochoa-Angrino, Aragón, Correa, and Mosquera (2008) tested a planning and evaluation system for writing stories in two stages, individually and jointly. In the case of individual correction, it was found that the children focused on superficial features related to grammar and spelling; however, in joint correction, the students were able to pick up on errors linked to deep content and text coherence. These results suggest that, when evaluating, the students are capable of practicing metacognitive processes and skills of a more advanced level when they receive help from teachers and peers, producing higher quality texts.

Other research has studied the extent to which feedback is incorporated in story-writing tasks carried out under peer review conditions (Chou, 1999; Tsui & Ng, 2000). These investigations concluded that fewer than 50% of the students incorporated the suggestions proposed by their peers. Such co-evaluation tends to rely on certain instruments that promote self-assessment, such as scripts (Alonso-Tapia & Panadero, 2010; Bannert, 2009) or rubrics (Jonsson & Svingby, 2007).

The use of rubrics in the production and assessment of texts

A rubric is a descriptive-ordinal scale comprised of a series of criteria or skill assessment categories, or specific product execution levels (normally four) that are defined with exact precision. The rubric lets each student easily recognize their situation with respect to the objectives that they must reach to produce a concrete piece of work (Bissell & Lemons, 2006; Gallavan & Kottler, 2009; Schamber & Mahoney, 2006), which constitutes ‘high-quality’ information about their own learning process (Arter & McTighe, 2001; Wiggins, 1989). Moreover, rubrics create explicit expectations that facilitate self-efficacy, self-assessment, and learning (Jonsson & Svingby, 2007). This benefit depends, however, on various factors. In a recent meta-analysis of 21 studies on the use of rubrics, Panadero and Jonsson (2013) identified several factors that modulate the effects of rubrics on self-regulated learning. The results of the studies chiefly concur in that the assessment criteria should be ‘transparent’, meaning that they should be understandable to the students, and adequately put into operation. On the one hand, this helps the students understand what is expected of them (Andrade & Du, 2005), reduces their anxiety, and increases their sense of self-efficacy (Panadero, Alonso-Tapia, & Huertas, 2012). On the other hand, the rubric’s criteria and levels should be set forth in such a way that they facilitate mindful reflection and feedback throughout the whole learning process, from the planning and forming of drafts, to the self-assessment of the final product (Andrade & Du, 2005; Panadero et al., 2012).

In spite of studies that also vouch for the rubric’s use in the classroom (Rezaei & Lovorn, 2010), concrete issues that hinder its use and efficacy have been found. Students show a certain mistrust of the rubric’s validity as an assessment instrument (Ross-Fisher, 2005), mainly derived from the assessors’ lack of training and experience (Knoch, Read, & von Randow, 2007). In this respect, Jonsson and Svingby (2007) analyzed a series of risks stemming from the use of rubrics. Two are especially noteworthy: the difficulty of defining appropriate execution levels for the skills or products to be assessed, especially if these are not sufficiently specific; and the difficulty of achieving an appropriate reliability index when several assessors use the same rubric to assess a task. Such reliability, however, can be improved if the rubrics are supplemented with examples and if the assessors are trained in their use.

Furthermore, using a rubric for in-class learning seems to be quite influenced by the application context. We know, for example, that the teachers’ attitudes toward the rubric tend to be more positive when there are fewer students in the classroom (Kutlu, Bilican, & Yildirim, 2010). The negative attitudes of the teaching staff and a lack of training on the use of rubrics can cause incorrect use of the instrument, and consequently, less transparency (Gelbal & Kelecioglu, 2007). Finally, it seems that broader interventions are needed in order to succeed in documenting the rubric’s positive effects in Primary classrooms, as opposed to when higher-level students utilize them (Panadero & Jonsson, 2013).

We can specifically look at a few previous studies on the use of rubrics in Primary students’ writing activities. Benítez (2008) drafted a rubric for assessing a narration’s theme, characters (variety, description…), the story’s context (time, place…), plot (clarity and structure of chapters), and other linguistic aspects (grammar, spelling, vocabulary…). Although he found relations between the study’s two variables (analogical reasoning and quality text production), he also observed that some writing domains depended on knowledge other than verbal analogical reasoning.

Yan et al. (2012) developed a rubric made up of seven different criteria (with four levels of execution) for assessing the quality of a text. Relevance, breadth, depth in the elaboration of ideas, cohesion, coherence, and text structure and intelligibility were specifically evaluated. The results showed gender differences in textual quality, in favour of the girls. However, it was unclear if this difference was due to writing skills, planning, or a combination of the two.

Objectives

In this theoretical framework, the main objective of the present research is to gain information on the use of a rubric during peer review activities in Primary Education; an educational stage in which, as we have remarked, only a small number of studies have been done. More specifically, we seek to study the effects of a brief sequence of classroom activities that we have named coevaluación iterativa con rúbrica (CIR - iterative co-evaluation with a rubric) (Lucero & Montanero, 2012). By co-evaluation we mean a collaborative activity in which the students themselves actively participate in all or some of the assessment phases of their own learning and that of their peers. The process is iterative because the learner turns in the composition again after revising it, before receiving a new assessment (either from peers or the teacher himself). Both assessment activities depend on a rubric with four levels of execution for each criterion.

We specifically analyze the effects of a CIR on the improvement of narrative text production processes from students in the last years of Primary Education in this research, as compared with the traditional hetero-evaluation activities that are usually employed in the classroom. We additionally aim to document the changes that the students succeed in incorporating into their work, based on the suggestions from teachers and classmates.

Method

Participants

The participants in this study were 128 students from the 4th, 5th, and 6th years of Primary Education, from three schools located in the Badajoz province. The students of each group or class in the study’s two experimental conditions (traditional assessment methodology and iterative co-evaluation with a rubric) were randomly distributed in each school, following an alphabetical criterion.

The final sample included 52 students from the 4th year of Primary, 37 from the 5th year, and 39 from the 6th year, all between the ages of 10 and 12, with a slightly larger representation of girls (Table 1). There were no immigrants among the participating students, nor any student with special educational needs due to an insufficient command of Castilian Spanish. Students who attended only one of the two sessions were excluded.

Table 1.

Sample Distribution.

Group	Year	Age	Gender	No.
Co-evaluation with a rubric	4th	10 years	Male	9
	4th	10 years	Female	17
	5th	11 years	Male	4
	5th	11 years	Female	12
	6th	12 years	Male	12
	6th	12 years	Female	8
Conventional hetero-evaluation	4th	10 years	Male	8
	4th	10 years	Female	18
	5th	11 years	Male	8
	5th	11 years	Female	13
	6th	12 years	Male	8
	6th	12 years	Female	11

Procedure

The study’s design was based on a comparison of two evaluation and improvement methods for writing narrative compositions: the iterative co-evaluation with a rubric (CIR) and the conventional hetero-evaluation (CH), based on the scoring and individualized corrections from teachers in the ‘traditional’ format that is usually practiced in Primary classrooms. The texts were produced and assessed over the course of two sessions, each one an hour long.

The first session was identical for both experimental conditions: CIR and CH. During an initial 30-minute phase, one of the researchers gave an oral review on the concept of narration, the main narrative genres, and the structure of a story (based on the model proposed by Thorndyke, 1977). Using a familiar story, he demonstrated how to identify the parts of a narration (setting, initial event, plot, resolution), remarking aloud on each one of them after the reading. Due to time constraints, he did not subsequently assess the students’ learning.

During the second phase, each student had 30 minutes in which to write a story (invented or real) whose theme dealt with a boy that becomes a hero in his town, with a length of between one or two sides of a sheet of paper.

The second session (45 minutes long) took place four days later, after randomly distributing the students into the two experimental conditions and placing them into different classrooms in the school. In the traditional evaluation (CH), the teacher of each group or class, who had previously graded and left written remarks on the original narrations from the students of said group using a scale from 0 to 10, handed back the compositions to each student and gave them some time to revise them or raise any doubts. During that same time, the teacher made an overall assessment of the principal improvement needs detected. They then read a story entitled ‘The Skinny Princess’ with the students, which served to model the improvements that the students should make. Finally, each student had 30 minutes available for rewriting the story, to try to improve upon it (on another sheet of paper and keeping the previous version and the teacher’s review present). The teachers did not receive strict instructions for running the sessions, just brief oral and written guidelines.

The CIR groups’ second session also lasted approximately 45 minutes and was run by the researchers (one for each school). At the start, the researcher illustrated the rubric’s use, utilizing the same story as in the traditional evaluation group (‘The Skinny Princess’) as a model. The student pairs then exchanged their narrations. Each student individually reviewed their schoolmate’s composition, recording the level of execution on a rubric according to each criterion, and writing down any qualitative comments that applied. The rubrics were then exchanged and each student discussed the noted assessments with their schoolmate, explaining them and adding suggestions for improvement. Finally, each student had 30 minutes available for rewriting the story, to try to improve upon it (on another sheet of paper and keeping the previous version and the rubric present).

The 256 narrations composed by the students (before and after each one of the hetero-evaluation and co-evaluation modalities) were analyzed from different perspectives. First, each story was evaluated with a standardized test, the assessment of writing processes (PROESC), and a narration assessment rubric (that we discuss in the following section). Additionally, four Primary teachers (unaware of the research and unconnected with the schools in which the study took place) corrected the narrations (64 texts each) using a scale of 0 to 10, without knowing what condition or stage each composition was from.

Materials

The textual content of the compositions written by the students participating in the research was analyzed through two systems of categories that were devised, respectively, from the assessment criteria for writing processes proposed in PROESC (Cuetos, Ramos, and Ruano, 2002) and from the rubric’s criteria that was created ad hoc for this investigation.

The PROESC is an assessment battery of writing processes (Cuetos et al., 2002) consisting of 10 assessment criteria grouped into two large dimensions: contents and coherence. The ‘contents’ dimension is formed with the following criteria: where and when, characters, event with consequences, coherent outcome, and creativity. Whereas the ‘coherence’ dimension contains the following criteria: logical continuity, unity, literary figures, complex sentences, and vocabulary. The instrument’s manual reports an internal consistency of 0.82 (coefficient alpha). Furthermore, it presents good criterion related validity and an adequate factorial validity. In accordance with the test manual, each one of these criteria had two execution levels to which a certain score was awarded. Thus, a grade of ‘0’ corresponded with the absence of that criterion’s requirements in the assessed narration, whereas a grade of ‘1’ was used to indicate its presence.

The rubric for assessing narrations is a descriptive-ordinal scale with a total of seven criteria: four refer to the ‘Organization and content’ dimension and three to the ‘Grammatical aspects’ dimension (see Appendix 1). Both the criteria and the four respective execution levels were compiled with a few antecedents in mind, mainly the instructional studies on Thorndyke’s organizational pattern of texts (1977) and the research on story rubrics from Yan et al. (2012). Aspects related to the assessment criteria in the area of the Castilian Spanish language, established in the Enseñanzas Mínimas de Primaria (Minimum Teaching Requirements in Primary Education), were also considered. The statements at each level were formulated so that they formed exhaustive and mutually exclusive categories concerning the story’s quality. The following scores were associated with each level of execution: 0 points (level 1), 0.5 points (level 2), 1 point (level 3), and 1.5 points (level 4). The student received the score for a level only if their text met all the requirements set forth for that level. If any requirement was missing, the score given was that of the level immediately below.
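
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the criterion labels are borrowed from the rows of Table 3, while the function names, the data layout, and the sample levels are hypothetical and do not come from the study's materials.

```python
# Points attached to each execution level, as stated in the text.
LEVEL_POINTS = {1: 0.0, 2: 0.5, 3: 1.0, 4: 1.5}


def achieved_level(requirements_met):
    """Return the highest level whose requirements are all satisfied.

    `requirements_met` is a hypothetical structure mapping a level (2, 3, 4)
    to a list of booleans, one per requirement of that level. If any
    requirement of a level is missing, the next level down is tried;
    level 1 is the floor.
    """
    for level in (4, 3, 2):
        checks = requirements_met.get(level, [])
        if checks and all(checks):
            return level
    return 1


def rubric_total(levels):
    """Sum the points for a mapping of criterion name -> assigned level."""
    return sum(LEVEL_POINTS[level] for level in levels.values())


# Hypothetical levels for one narration (not data from the study).
sample_levels = {
    "setting": 3, "theme": 2, "plot": 3, "creativity": 2,
    "sentences": 3, "vocabulary": 2, "spelling": 1,
}
print(rubric_total(sample_levels))  # 4.5
```

With seven criteria worth at most 1.5 points each, the maximum rubric total under this scheme is 10.5 points.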

In order to estimate the reliability of the PROESC and rubric scoring, a brief training session on the application of both instruments was given. Afterward, 14 narrations were randomly chosen and analyzed separately by two of the researchers. The reliability index obtained with Cohen’s Kappa was 0.97 (p < .01) for the total PROESC scores and 0.88 (p < .01) for the rubric’s total scores.
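
For readers unfamiliar with the index, Cohen's Kappa corrects the observed agreement between two raters for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following Python sketch shows the standard unweighted calculation; the function name and the sample ratings are hypothetical, and the source does not state how the total scores were categorized before the index was computed.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty item set")
    n = len(ratings_a)

    # Observed agreement: proportion of items given the same category.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of each rater's marginal proportions,
    # summed over categories (Counter returns 0 for unseen categories).
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical presence/absence judgements from two raters (not study data).
rater_1 = [1, 1, 0, 0]
rater_2 = [1, 0, 0, 0]
print(cohen_kappa(rater_1, rater_2))  # 0.5
```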

Results

Improvements in the quality of the narrations

Tables 2 and 3 reflect the data corresponding to the first and second narration that the students in the two experimental conditions composed, according to the quantitative assessment of their quality obtained with the PROESC and the rubric.

Table 2.

Means and standard deviations of the PROESC criteria results in the Iterative Co-evaluation with a Rubric (CIR) and conventional hetero-evaluation (CH) conditions.

		Original narration (N1)				Final narration (N2)				Dif. N2-N1
		CIR		CH		CIR		CH		CIR	CH
Criteria		Mean	SD	Mean	SD	Mean	SD	Mean	SD	Mean	Mean
Content	Where and when	0.65	0.48	0.77	0.42	0.74	0.41	0.77	0.42	0.09	0.00
	Characters	0.65	0.49	0.77	0.50	0.79	0.40	0.77	0.50	0.14**	0.00
	Event with consequences	0.90	0.30	0.91	0.31	0.94	0.25	0.94	0.27	0.04	0.03
	Coherent ending	0.61	0.49	0.73	0.45	0.60	0.49	0.73	0.45	-0.01	0.00
	Creativity	0.60	0.49	0.73	0.44	0.63	0.49	0.77	0.41	0.03	0.04
	Content subtotal	3.41	1.40	3.91	1.25	3.70	1.14	3.98	1.11	0.29*	0.07
Coherence	Logical flow	0.76	0.43	0.80	0.39	0.71	0.46	0.83	0.36	-0.05	0.03
	Unity	0.60	0.49	0.67	0.50	0.60	0.49	0.72	0.49	0.00	0.05
	Literary figures	0.37	0.49	0.45	0.50	0.43	0.50	0.50	0.49	0.06	0.05
	Complex sentences	0.18	0.37	0.26	0.45	0.21	0.40	0.36	0.48	0.03	0.10*
	Vocabulary	0.39	0.49	0.18	0.39	0.47	0.50	0.27	0.45	0.08	0.09
	Coherence subtotal	2.30	1.32	2.36	1.37	2.42	1.36	2.68	1.27	0.12	0.32*
	Total	5.71	2.46	6.27	2.22	6.12	2.23	6.66	2.07	0.41	0.39

Note: Statistically significant differences: (*) p < .05; (**) p < .01.

The quality of the original narration (before intervention) was homogeneous in both experimental conditions. The only significant difference was found in the PROESC ‘richness of vocabulary’ criterion (Chi-squared = 6.66; p < .05), in which the CIR condition received a slightly higher average.

Table 3.

Means and standard deviations of the rubric’s criteria results in the Iterative Co-evaluation with a Rubric (CIR) and conventional hetero-evaluation (CH) conditions.

	Original narration (N1)				Final narration (N2)				Dif. N2-N1
	CIR		CH		CIR		CH		CIR	CH
Criteria	Mean	SD	Mean	SD	Mean	SD	Mean	SD	Mean	Mean
Setting	0.92	0.41	1.02	0.39	1.22	0.46	1.02	0.40	0.30**	0.00
Theme	0.67	0.33	0.61	0.31	0.72	0.35	0.64	0.29	0.05	0.03
Plot	0.78	0.36	0.81	0.30	0.77	0.36	0.85	0.29	-0.01	0.04
Creativity	0.54	0.46	0.58	0.43	0.53	0.46	0.60	0.42	-0.01	0.02
Organization and content subtotal	2.91	1.13	3.02	1.01	3.24	1.18	3.11	1.03	0.33	0.09
Sentences	0.85	0.37	0.86	0.39	0.87	0.37	1.01	0.41	0.02	0.15
Vocabulary	0.67	0.29	0.61	0.25	0.71	0.28	0.61	0.26	0.04	0.00
Spelling	0.23	0.41	0.20	0.41	0.35	0.47	0.70	0.46	0.12	0.50**
Grammatical aspects subtotal	1.75	0.72	1.67	0.66	1.93	0.74	2.32	0.82	0.18	0.65*
Total	4.66	1.58	4.69	1.44	5.17	1.61	5.43	1.55	0.51	0.74

Note: Statistically significant differences: (*) p < .05; (**) p < .01.

The application of the PROESC (Table 2) reveals score improvements between the first and second narrations in the CIR condition for the following criteria: vocabulary, literary figures, where and when, and characters. Although the only criterion-level difference that reached significance was ‘characters’ (Chi-squared = 24.28; p < .01), the overall ‘content’ subtotal also improved significantly (t = 2.75; p < .05).

In the traditional methodology group (CH), improvements were observed between the first and second narration in the following criteria: vocabulary, complex sentences, and literary figures. These differences reached significance in the specific criterion ‘complex sentences’ (Chi-squared = 3.68; p < .05), as well as between the averages in the overall ‘coherence’ subtotal (t = 2.26; p < .05).

The assessment results with the rubric follow the same trend as those obtained with the PROESC. In Table 3, improvements are seen in the content of the first and second narration from the CIR condition participants, even though they were only significant for the specific criterion ‘setting’ (t = 4.62; p < .01), which alludes to the spatial and temporal contextualization of the story and its characters. For their part, the students that received improvement suggestions from their teacher (CH condition) registered significant improvements in the grammatical aspects of the stories (t = 5.23; p < .05), especially in ‘spelling’ (t = 3.04; p < .01).
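
The t values reported above compare the first and second narrations written by the same students. The source does not name the exact test that was used, so the following Python sketch assumes a paired-samples t statistic; the function name and the sample scores are hypothetical and are not data from the study.

```python
import math
import statistics


def paired_t(first_scores, second_scores):
    """Return (t, degrees of freedom) for paired before/after scores.

    Assumption: each position holds the two scores of the same student.
    """
    if len(first_scores) != len(second_scores) or len(first_scores) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [after - before for before, after in zip(first_scores, second_scores)]
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical rubric scores for three students (N1, then N2).
n1 = [1.0, 1.0, 1.0]
n2 = [2.0, 3.0, 4.0]
t_value, df = paired_t(n1, n2)
print(round(t_value, 3), df)  # 3.464 2
```

The resulting t would then be compared with the critical value for the given degrees of freedom to obtain the significance level.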

Finally, the overall assessment from the four teachers (unaware of the research), who corrected 64 randomly distributed texts each, only registered significant improvements between the first and the second narration in the hetero-evaluation modality. The assessment average of the first narration from the CH condition was 4.99; whereas the second was 5.73 (t = 2.07; p < .05).

Amount of modifications introduced in the narrations

In order to study the changes introduced between the first and second narration of both experimental conditions in more detail, the changes made in each subject’s second narration, with respect to the first, were counted and then analyzed qualitatively. For example, the second subject of the CIR condition (identified as JRR) improved a level in the rubric’s ‘setting’ criterion, as is seen in the following transcription:

Extract from the 1st narration: ‘Once upon a time there was a boy that lived in a very ugly city. The boy was named Alejandro […]’

Peer correction: The co-evaluating peer selects the third level out of the four that exist in the ‘setting’ criterion. In addition, they make the following written suggestion: ‘You have to describe the character’. After that, JRR completes the 2nd narration incorporating the change proposed by his peer.

Extract from the 2nd narration: ‘Once upon a time there was a poor boy that lived in a city. The boy was named Alejandro, he was tall, strong, and had very green eyes […]’

Fruitless or negative changes were also analyzed. This situation presumably occurred when the student did not understand or was incapable of incorporating the suggestions that the peer evaluator had pointed out, even though they often tried. An example of a negative change in the ‘spelling’ criterion is shown below:

Extract from the 1st narration: ‘Once upon a time there was a boy who always dreamed of being a hero. One day while he was playing with his friends a spider bite him but it did not hurt him one day fiddling with his hands a spider web shot out. […]’.

Peer correction: The peer co-evaluator selects the third level out of the four existing in the ‘spelling’ criterion, and adds the following suggestion: ‘Don’t make mistakes’.

Extract from the 2nd narration: ‘One morning Martin wake up happy, his mother named Sara made him some delicious toast. They sat down to eet breakfast, but suddenly ‘cataplás’ that noise interupted the breakfast, they went runing up the stairs but it was nothing. […]’

The results of this analysis showed that 47 of the 62 subjects (76%) that revised the texts following a CIR method incorporated changes in at least one of the evaluation criteria. This data is similar to the number of students that incorporated changes derived from the proposed improvements from teachers (condition CH), 57 subjects out of 66 total (86%).

Table 4 displays the criteria in which changes have been made, as well as their positive or fruitless-negative trend.

Table 4.

Percentage of changes after the CIR condition co-evaluation, according to criteria.

Dimension	Criterion	Positive change	Negative change	Criterion total	Dimension total
Organization and content	Setting	33(35.1%)	0(0%)	33(35.1%)	61(65%)
	Theme	5(5.3%)	0(0%)	5(5.3%)
	Plot	8(8.6%)	1(1%)	9(9.6%)
	Creativity	11(11.7%)	3(3.2%)	14(15%)
Grammatical aspects	Sentences	9(9.6%)	1(1%)	10(10.6%)	33(35%)
	Vocabulary	7(7.4%)	2(2.2%)	9(9.6%)
	Spelling	12(12.8%)	2(2.2%)	14(15%)

As can be seen, the variations made after the CIR mainly occurred in the ‘content’ dimension. Sixty-five percent of the changes generated in the set of narrations from the CIR condition subjects belong to this dimension, whereas only 35% pertain to grammatical aspects. A great number of these changes have more specifically affected the ‘setting’ of the narration (35.1%), whereas the rest of the criteria did not exceed 15%. It should be noted that the percentage of negative changes, taking into account the suggestions from co-evaluating classmates, is lower than 4% in all the rubric’s criteria.
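
The percentages in Table 4 can be recomputed from the raw counts. The Python sketch below takes the positive and negative change counts printed in Table 4 and derives each criterion's share of all changes in the CIR condition; the variable and function names are hypothetical.

```python
# (positive, negative) change counts per criterion, copied from Table 4.
CIR_CHANGES = {
    "setting": (33, 0), "theme": (5, 0), "plot": (8, 1), "creativity": (11, 3),
    "sentences": (9, 1), "vocabulary": (7, 2), "spelling": (12, 2),
}
CONTENT_CRITERIA = ("setting", "theme", "plot", "creativity")


def change_shares(changes):
    """Return (total changes, percentage share of each criterion)."""
    total = sum(pos + neg for pos, neg in changes.values())
    shares = {
        name: round(100 * (pos + neg) / total, 1)
        for name, (pos, neg) in changes.items()
    }
    return total, shares


total_changes, shares = change_shares(CIR_CHANGES)
content_changes = sum(sum(CIR_CHANGES[name]) for name in CONTENT_CRITERIA)

print(total_changes)                                     # 94
print(shares["setting"])                                 # 35.1
print(round(100 * content_changes / total_changes, 1))   # 64.9
```

Recomputed shares can differ from the printed ones by about a tenth of a point because of rounding (for example, 14 of 94 changes is 14.9%, printed as 15%).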

Table 5 exhibits the criteria in which the group assessed with traditional methodology made changes. These variations mainly occurred in ‘spelling’ (37.6% of the confirmed changes in this condition’s set of narrations). A high percentage of changes in the structure of ‘sentences’ (21.8%) was also registered. However, the changes in the rest of the criteria do not even reach 10%. The high percentage of negative changes that occurred in creativity, even reaching higher than that of the positive changes, should be noted.

Table 5.

Percentage of incorporation of changes after the traditional teacher evaluation, according to criteria.

Dimension	Criterion	Positive change	Negative change	Criterion total	Dimension total
Organization and content	Setting	8(7.9%)	0(0%)	8(7.9%)	34(33.66%)
	Theme	8(7.9%)	0(0%)	8(7.9%)
	Plot	9(8.9%)	0(0%)	9(8.9%)
	Creativity	4(4%)	5(4.9%)	9(8.9%)
Grammatical aspects	Sentences	21(20.8%)	1(1%)	22(21.8%)	67(66.34%)
	Vocabulary	5(4.9%)	2(2%)	7(6.9%)
	Spelling	36(35.6%)	2(2%)	38(37.6%)

If we compare Tables 4 and 5, we see that the number of changes in ‘theme’, ‘plot’, and ‘vocabulary’ is very similar in the two didactic conditions. The major differences in percentage change between the assessment systems occur in the following criteria: setting (CIR: 35.1%; CH: 7.9%), sentences (CIR: 10.6%; CH: 21.8%), spelling (CIR: 15%; CH: 37.6%), and creativity (CIR: 15%; CH: 8.9%). These inequalities are summarized in the following comparison chart (Figure 1), which shows the improvements in content and formal aspects for both conditions, according to the changes analyzed across the rubric’s criteria. The students who rewrote the narration on the basis of the suggestions proposed after the co-evaluation made a significantly greater percentage of changes in the ‘content’ dimension than in the dimension related to formal aspects (t = 2.06; p < .05). The opposite occurred in the group that received hetero-evaluation (t = 2.45; p < .05).

Figure 1.

Changes produced after the conventional hetero-evaluations by teachers (CH) and the Iterative Co-evaluations with a rubric (CIR).

Discussion and Conclusions

In this study we sought to compare the effects of different assessment activities on the improvement of narrative text production and revision processes among students in the 4th, 5th, and 6th years of Primary Education.

We know that one of the principal writing difficulties affecting students at this educational stage is the ‘structuring of the content’ (Gallego, 2012; Ramos et al., 2005). The results from applying the PROESC and a rubric created ad hoc to assess Primary students’ narrations show that a brief classroom activity of peer reviewing an original narration, with hardly any teacher participation, can be more useful than traditional teacher evaluation for students in the last years of Primary who are trying to improve certain aspects of organization and content. The subjects that participated in an iterative co-evaluation with a rubric activity (CIR) showed significant improvement, especially in character descriptions and story setting, whereas those that were teacher-assessed within a conventional hetero-evaluation activity (CH) improved grammatical and, above all, spelling aspects. This result, although it reveals the limitations of peer review activities between Primary students for achieving significant improvements in the syntactic and orthographic structure of sentences, supports the potential of the CIR methodology for improving other aspects more closely related to story planning processes.

Several possible explanations for the difference found between both didactic modalities exist. It is possible that, when evaluating compositions, a large portion of the Primary teachers tend to focus their corrections more on formal language issues, both grammatical and orthographic, than on content planning processes. The fact that the teachers, unaware of the research, evaluated the second narrations from the CH condition as better overall is consistent with this interpretation since they gave a higher score to those texts that they found to have a better use of grammar. In contrast, the students that co-evaluated their classmates’ narrations proposed more improvement suggestions on aspects related to the narration’s content, that is, the setting, theme, plot, or creativity of the story (Table 4).

Another possible reason has to do with the quality of the processes for improving story content that peer review activities generate, unlike traditional hetero-evaluation activities. In fact, some studies have determined that discussions about a learning task are more productive when the students work in pairs than when a teacher is the one doing the evaluating (Kirschner, Beers, Boshuizen, & Gijselaers, 2008). The discussions that occur during co-evaluation are characterized by the use of a ‘language between peers’ and by a high level of participant involvement. In this respect, other studies have demonstrated that peer review activities produce better texts than those in which a student works individually (De la Paz & Graham, 2002; Graham et al., 2005; Yarrow & Topping, 2001).

The fact that we have not found significant improvements across all of the content-specific criteria in the CIR condition may be related to the greater complexity of the writing processes that underlie these narration components, mainly the plot. It could also be connected to the transparency (Panadero & Jonsson, 2013) of the rubric used in the peer review process. On the one hand, it is possible that the rubric’s levels of execution for the plot and theme criteria are not well adapted to the students’ level, or that they are not functional enough to generate sufficiently specific revisions of pieces of text. On the other hand, several studies have demonstrated the difficulties that students encounter when interpreting tables and other external representation systems in school tasks (Gabucio, Martí, Enfedaque, Gilabert, & Konstantinidou, 2010). There is no doubt that the rubric’s format is difficult for a Primary student to interpret, given that it contains columns and rows that signify something different from what is usually associated with this type of representation.

Regarding the second specific objective of this investigation, determining the extent to which the improvement suggestions from teachers and students were incorporated into the text revisions, we found changes in 75% of the narrations from both modalities. This result contrasts with previous studies with older students, which concluded that fewer than 50% of the students incorporated the suggestions made by their peers (Chou, 1999; Tsui & Ng, 2000).

The quantification of the changes is also consistent with the above conclusion, in that the CIR activities triggered more modifications in the narration’s content and the organization, whereas the traditional hetero-evaluation activities generated more changes in syntactic and orthographic aspects. In accordance with the data in Table 4, more than 64% of the CIR condition’s changes affected narration content (either setting, theme, plot, or story creativity). In the case of the traditional evaluation method (CH), the positive changes are concentrated in the ‘spelling’ (37.6%) and in the ‘sentences’ (21.8%) criteria, remaining lower than 10% in the rest of the dimensions. The few negative or fruitless changes produced in both assessment methods can be explained by problems related to how the suggestions for improvement were expressed or understood (Graham & Hebert, 2010, 2011; Martínez, Martín, & Mateos, 2011).

In conclusion, even though the students produced 10% fewer positive changes after receiving suggestions from classmates (CIR condition) than after the teachers’ corrections (CH condition), a difference already brought to light in previous research (Paulus, 1999), peers offer assessments that can be more or less effective for improving the content of narrations. Many students most likely noticed errors in their own compositions that could be corrected while using the rubric to evaluate those of their classmates (Gallavan & Kottler, 2009; Schamber & Mahoney, 2006). The use of rubrics would make it easier for students to self-evaluate (to a greater extent than other traditional activities) the text content planning processes that, as we have already explained, are crucial to self-regulated learning.

This conclusion carries relevant implications for the revision of educational practices in teaching writing. From the previous results, we could not conclude that Primary teachers pay insufficient attention to planning and revising processes in writing, beyond grammatical and spelling aspects. However, there is no doubt that didactic resources such as those that we have tested in the CIR activities can be used by teachers as a supplement to the instructional support that other more ‘traditional’ activities supply.

We should first mention that the meager amount of instruction and the participants’ complete lack of experience in using rubrics were among this work’s principal limitations (to which should be added that the degree to which the rubric’s different criteria were understood was not evaluated). Evidence that peer reviewer training has positive effects comes from comparing the assessments of subjects who received instruction in this area with those of subjects who did not (Min, 2006; Stanley, 1992; Zhu, 1995). In this regard, we seek to replicate this study in future research with students who are familiar with the use of external representation systems that serve to support co-evaluation processes at different educational levels.

Likewise, it would be advantageous to ensure that the Primary teachers in charge of the traditional evaluation activities have sufficient knowledge of the writing processes and the narrative structures on which the rubric is based, so that this lack of training will not be responsible for the differences found.

The last limitation concerns the complexity of the assessment activities studied, which involve different didactic variables. In fact, the combination of the use of the rubric with other metacognitive activities is one of the most commonly seen limitations in the review of the literature (Panadero & Jonsson, 2013). At present, we are conducting research with another sample of Primary learners to determine the extent to which the benefits of the CIR condition derive specifically from peer co-evaluation, from the instrument that supports it (a rubric, a script, etc.), or from a combination of both.

Iterative co-evaluation with a rubric of narrative texts in Primary Education

The act of writing, as an ability requiring the activation of linguistic knowledge and cognitive processes, has been the object of extensive research. Learning to write requires a complex process of knowledge construction with diverse cognitive operations occurring simultaneously during the act of writing (Bereiter & Scardamalia, 1987; Butterfield, Hacker, & Alberstong, 1996; Camps, 1990; Cassany, 1999; Flower & Hayes, 1981; Hayes, 1996, 2006; Sánchez, 1998). Theoretical models not only consider textual editing processes to be cognitive operations specific to writing, but they also regard planning and revising skills as a fundamental part of said competence (Castelló, 2002; Faigley & Witte, 1981; Fitzgerald, 1987; Piolat, 1991). These two metacognitive skills involve the development of high levels of self-regulation. Hence, many researchers not only believe that it is important for students to write frequently (Graham, Gillespie, & McKeown, 2013; Graham, Harris, & Hebert, 2011; Rogers & Graham, 2008), but also that self-regulation strategies in the learning process need to be acquired (Fidalgo & García, 2008; Zimmerman & Reisemberg, 1997).

Revision and self-regulation of writing processes

Various studies have documented the effects of peer assistance on writing self-efficacy and self-regulation, as well as on improvements in compositions (Graham, Harris, & Mason, 2005; Schunk & Zimmerman, 2007; Vass, Littleton, Miell, & Jones, 2008). In the writing process, reflection occurs when students consciously plan, revise and evaluate their own texts. No differences in the capacity for reflection or self-regulation when writing have been demonstrated according to sex (Ramos, Cuadrado, & Iglesias, 2005), although differences according to age have been found, with this capacity tending to increase as the student’s school grade rises (Graham & Harris, 2000). Strategies such as self-correction and peer review of texts are useful for motivating students and generating productive reflection on their own writing (Chen, 2010; Chen, Wei, Wu, & Uden, 2009; Harris, Graham, & Mason, 2006; Saito & Miwa, 2007).

This last resource, peer review, has proven particularly useful for strengthening the self-regulation of composition processes, insofar as it fosters self-evaluation by those involved (Min, 2006; Yang, 2010). An analysis of the main theories of self-regulation shows that self-evaluation is the common denominator and the crucial component of that process (Puustinen & Pulkkinen, 2001). It has been shown that, when students write and co-evaluate a composition in pairs, they produce better texts (above all in terms of the vocabulary used) than those who work individually (De la Paz & Graham, 2002; Graham et al., 2005; Yarrow & Topping, 2001). The verbal exchanges that take place in co-evaluation situations trigger many more changes in the compositions than would be expected from an analysis of the feedback received from classmates (Peterson, 2003). This finding supports the idea that co-evaluation activities indirectly generate additional self-evaluation processes, derived from the classmates’ suggestions or from the discussion with them. In fact, the positive effects of peer assistance and peer assessment have been observed not only in the quality of the compositions, but also in the strategic and motivational behaviors that characterize the self-regulated learning processes involved in writing narrative and argumentative texts (Harris et al., 2006).

Although these benefits have been widely documented at educational levels from Secondary Education onwards (see the extensive and solid review by Graham & Perin, 2007), less evidence is available for the Primary Education stage. Ochoa-Angrino, Aragón, Correa, and Mosquera (2008) tested a system for planning and evaluating story writing in two phases, one individual and one collective. In the individual correction, the children focused on surface features related to grammar and spelling; in the group correction, however, the students were able to notice errors linked to the deep content and coherence of the text. These results suggest that, when evaluating, students are able to put into practice more advanced metacognitive processes and skills when they have the help of teachers and classmates, producing higher-quality texts.

Other studies have examined the extent to which feedback given in a peer review context is incorporated in story-writing tasks (Chou, 1999; Tsui & Ng, 2000). These studies conclude that fewer than 50% of students incorporate the suggestions proposed by their classmates. Such co-evaluation is usually supported by instruments that promote self-evaluation, such as scripts (Alonso-Tapia & Panadero, 2010; Bannert, 2009) or rubrics (Jonsson & Svingby, 2007).

The use of rubrics in the production and evaluation of texts

A rubric is a descriptive-ordinal scale made up of a series of criteria or categories for evaluating a skill or product, operationalized in levels of execution (usually four) that are stated with some precision. The rubric makes it easier for each student to know where they stand with respect to the objectives to be reached in a specific task (Bissell & Lemons, 2006; Gallavan & Kottler, 2009; Schamber & Mahoney, 2006), which constitutes ‘high-quality’ information about their own learning process (Arter & McTighe, 2001; Wiggins, 1989). Moreover, rubrics create explicit expectations that facilitate self-efficacy, self-evaluation and learning (Jonsson & Svingby, 2007). This benefit depends, however, on several factors. In a recent meta-analysis of 21 studies on the use of rubrics, Panadero and Jonsson (2013) identified several factors that moderate the effects of rubrics on the self-regulation of learning. The results of the studies mainly agree that the evaluation criteria must be ‘transparent’, in the sense of being understandable to students and adequately operationalized. On the one hand, this helps students understand what is expected of them (Andrade & Du, 2005), reduces their anxiety and increases their sense of self-efficacy (Panadero, Alonso-Tapia, & Huertas, 2012). On the other hand, the rubric’s criteria and levels must be worded so as to facilitate conscious reflection and feedback throughout the whole learning process, from planning and drafting to the self-evaluation of the final product (Andrade & Du, 2005; Panadero et al., 2012).

Despite the studies that also support its use in the classroom (Rezaei & Lovorn, 2010), specific aspects have been found that hinder the use and effectiveness of the rubric. On the one hand, students show a certain distrust of the validity of the rubric as an assessment instrument (Ross-Fisher, 2005), stemming mainly from the evaluators’ lack of training and experience (Knoch, Read, & von Randow, 2007). In this regard, Jonsson and Svingby (2007) analyzed a series of risks in the use of rubrics, among which we could highlight the difficulty of delimiting adequate levels of execution for the skills or products being evaluated, especially if these are not sufficiently specific, as well as the difficulty of achieving an adequate reliability index when several subjects evaluate a task using the same rubric. This reliability, however, can be improved if the rubrics are complemented with examples and if the evaluators are trained in their use.

On the other hand, the usefulness of the rubric for classroom learning seems to be strongly mediated by the context of application. We know, for example, that teachers’ attitudes toward the rubric tend to be more positive the smaller the number of students in the classroom (Kutlu, Bilican, & Yildirim, 2010). Negative teacher attitudes and a lack of training in the use of rubrics can lead to incorrect use of the instrument and, consequently, to less transparency (Gelbal & Kelecioglu, 2007). Finally, it seems that longer interventions are needed in Primary classrooms to document positive effects of rubrics than when they are used by students at higher levels (Panadero & Jonsson, 2013).

There are some specific precedents of research on the use of rubrics in writing activities with Primary children. Benítez (2008) designed a rubric to evaluate the theme of a narration, the characters (variety, description…), the context of the story (time, place…), the plot (clarity and articulation of the episodes), and other linguistic aspects (grammar, spelling, lexicon…). Although relationships were found between the two study variables (analogical reasoning and production of quality texts), it should be noted that some writing domains were observed to depend on other knowledge besides verbal analogical reasoning.

Yan et al. (2012) developed a rubric of seven criteria (with four levels of execution) to evaluate the quality of a text. Specifically, they evaluated relevance, breadth, depth in the elaboration of ideas, cohesion, coherence, text structure and intelligibility. The results showed differences in textual quality according to gender, in favor of girls. However, it could not be determined whether this difference was due to writing skills, planning skills or a combination of both.

Objectives

Within this theoretical framework, the main objective of the present research is to obtain more information about the usefulness of the rubric in peer review activities in Primary Education, an educational stage in which, as noted above, fewer studies along these lines have been conducted. More specifically, we aim to study the effects of a brief sequence of classroom activities that we have called iterative co-evaluation with a rubric (CIR) (Lucero & Montanero, 2012). By co-evaluation we mean a collaborative activity in which the students themselves actively participate in all or some of the phases of the evaluation of their own learning and that of their classmates. The process is iterative because the student resubmits the work, after revising it, before receiving a new evaluation (whether from classmates or from the teacher). Both evaluation activities are supported by a rubric with four levels of execution for each criterion.

In this research we specifically analyze the effects of a CIR on the improvement of the narrative text production processes of students in the last years of Primary Education, compared with the traditional hetero-evaluation activities usually employed in the classroom. In addition, we set out to document the changes that students manage to incorporate into their texts on the basis of suggestions from teachers and classmates.

Method

Participants

A total of 128 students from the 4th, 5th and 6th years of Primary, from three schools in the province of Badajoz, took part in the study. In each school, the students of each class group were randomly distributed between the two experimental conditions of the study (traditional evaluation methodology and iterative co-evaluation with a rubric), following an alphabetical criterion.

The final distribution of the sample by grade was 52 students in the 4th year of Primary, 37 in the 5th year and 39 in the 6th, aged between 10 and 12, with a slightly higher representation of girls (Table 1). Immigrant students and those with special educational needs did not take part in the study because they lacked a sufficient command of the Spanish language. Students who attended only one of the two sessions were also excluded.

Table 1.

Distribution of the sample.

Group	Grade	Age	Gender	N
Co-evaluation with rubric	4th	10 years	Male	9
	4th	10 years	Female	17
	5th	11 years	Male	4
	5th	11 years	Female	12
	6th	12 years	Male	12
	6th	12 years	Female	8
Conventional hetero-evaluation	4th	10 years	Male	8
	4th	10 years	Female	18
	5th	11 years	Male	8
	5th	11 years	Female	13
	6th	12 years	Male	8
	6th	12 years	Female	11

Procedure

The research design was based on the comparison of two methods for evaluating and improving narrations: iterative co-evaluation with a rubric (CIR) and conventional hetero-evaluation (CH), based on grading and individualized corrections by the teacher, in the ‘traditional’ format commonly used in Primary classrooms. The process of producing and evaluating the texts took place in two sessions of one hour each.

The first session was identical for the two experimental conditions, CIR and CH. In a first phase lasting 30 minutes, one of the researchers orally reviewed the concept of narration, the main narrative genres and the structure of a story (based on the model proposed by Thorndyke, 1977). Using a well-known tale, the researcher taught the students to identify the parts of a narration (setting, initial event, plot, resolution), commenting on each part after it was read aloud. For reasons of time, the students’ learning was not subsequently assessed.

During the second phase, each student had 30 minutes to write a story (invented or real) about a boy who becomes a hero in his village, with a length of between one and two sides of a sheet of paper.

The second session (lasting 45 minutes) took place four days later, after the students had been randomly distributed between the two experimental conditions, which occupied different classrooms in the school. In the traditional evaluation activity (CH), the teacher of each class group, who had previously graded and commented in writing on the original narrations of the students in that group using a scale from 0 to 10, handed the compositions with the corrections back to each student and gave them time to review them or raise any questions. At the same time, the teacher gave an overall assessment of the main needs for improvement detected. The teacher then read with the students a story entitled ‘La princesa flaca’, which served as a model of the improvements to be introduced. Finally, each student had 30 minutes to rewrite the story, trying to improve it (on another sheet of paper, keeping the previous version and the teacher’s review at hand). The teachers did not receive training as such to conduct the session, only some brief oral and written guidelines.

The second session of the CIR group also lasted approximately 45 minutes and was led by the researchers (one per school). At the start, the researcher demonstrated the use of the rubric, using as a model the same story as in the traditional evaluation group (‘La princesa flaca’). Next, the pairs of students exchanged narrations. Each student individually reviewed the classmate’s composition, recording on a rubric the level of execution for each criterion, as well as some qualitative comments. They then exchanged rubrics and each student discussed the recorded assessments with the classmate, justifying them and adding suggestions for improvement. Finally, each student had 30 minutes to rewrite the story, trying to improve it (on another sheet of paper, keeping the previous version and the rubric at hand).

The 256 narrations produced by the students (before and after each of the hetero-evaluation or co-evaluation modalities) were analyzed from different perspectives. On the one hand, each story was assessed with a standardized test, the Writing Processes test (PROESC), and a rubric for evaluating narrations (described in the next section). On the other hand, four Primary teachers (unconnected to the research and to the schools where the study was carried out) marked the narrations (64 texts each) on a scale from 0 to 10, without knowing which condition or moment each manuscript belonged to.

Materials

The textual content of the written compositions created by the participating students was analyzed through two category systems, developed, respectively, from the evaluation criteria for writing processes proposed in the PROESC (Cuetos, Ramos, & Ruano, 2002) and from the criteria of the rubric created ad hoc for this research.

The PROESC is a battery for assessing writing processes (Cuetos et al., 2002) composed of 10 evaluation criteria grouped into two broad dimensions: content and coherence. The ‘content’ dimension comprises the following criteria: where and when, characters, event with consequences, coherent outcome and creativity. The ‘coherence’ dimension comprises the following criteria: logical continuity, unitary sense, literary figures, complex sentences and vocabulary. The instrument’s manual reports an internal consistency of 0.82 (alpha coefficient). It also shows good criterion-related validity and adequate factorial validity. In accordance with the test manual, each of these criteria was considered to have two levels of execution, each assigned a given score. A score of ‘0’ corresponded to the absence of the requirements of that criterion in the narration evaluated, while a score of ‘1’ was used to indicate their presence.
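
The binary scoring just described can be sketched in a few lines of Python. This is only an illustration of the 0/1 scheme and the two subtotals reported in Table 2: the English criterion labels, the dictionary layout and the function name are assumptions made for the example, not part of the PROESC manual.

```python
# PROESC criteria as listed in the text, grouped by dimension.
# The English labels are illustrative translations, not official names.
PROESC = {
    "content": ("where_and_when", "characters", "event_with_consequences",
                "coherent_outcome", "creativity"),
    "coherence": ("logical_continuity", "unitary_sense", "literary_figures",
                  "complex_sentences", "vocabulary"),
}


def score_proesc(present):
    """Score one narration: 1 point per criterion judged present, 0 otherwise."""
    known = {c for criteria in PROESC.values() for c in criteria}
    unknown = set(present) - known
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    scores = {dim: sum(c in present for c in criteria)
              for dim, criteria in PROESC.items()}
    scores["total"] = scores["content"] + scores["coherence"]
    return scores


# Hypothetical narration in which three criteria were judged present.
example = score_proesc({"characters", "event_with_consequences", "vocabulary"})
print(example)
```

Under this scheme each subtotal ranges from 0 to 5 and the total from 0 to 10, which matches the scale of the means in Table 2.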

The rubric for evaluating narrations is a descriptive-ordinal scale with a total of seven criteria: four referring to the dimension ‘Organization and content’ and three to ‘Grammatical aspects’ (see Appendix 1). Both the criteria and their four respective levels of execution were written taking some precedents into account, mainly the instructional studies on the organizational pattern of texts by Thorndyke (1977) and the research with story rubrics by Yan et al. (2012). Aspects related to the evaluation criteria for the area of Spanish Language, established in the Enseñanzas Mínimas (minimum teaching requirements) for Primary, were also considered. The statements for each level were worded so that they formed exhaustive and mutually exclusive categories of story quality. The following scores were assigned to the levels of execution: 0 points (level 1), 0.5 points (level 2), 1 point (level 3) and 1.5 points (level 4). The student received the score for a level only when the text met all the requirements stated in it. If any requirement was missing, the score obtained was that of the level immediately below.
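
The scoring rule described above (points per level, with a drop to the level immediately below when a requirement is missing) can be sketched as follows. This is a minimal illustration under stated assumptions: the criterion names follow those used in Table 3, while the function names, the data layout and the example levels are hypothetical.

```python
# Points attached to each level of execution, as stated in the text.
LEVEL_POINTS = {1: 0.0, 2: 0.5, 3: 1.0, 4: 1.5}

# The seven criteria, grouped into the rubric's two dimensions.
RUBRIC = {
    "organization_and_content": ("setting", "theme", "plot", "creativity"),
    "grammatical_aspects": ("sentences", "vocabulary", "spelling"),
}


def awarded_level(candidate_level, all_requirements_met):
    """Apply the rule in the text: a level counts only if all its requirements
    are met; otherwise the level immediately below is awarded (never below 1)."""
    if candidate_level not in LEVEL_POINTS:
        raise ValueError("level must be 1, 2, 3 or 4")
    if all_requirements_met or candidate_level == 1:
        return candidate_level
    return candidate_level - 1


def score_rubric(levels):
    """Convert awarded levels (criterion -> level) into subtotals and a total."""
    scores = {dim: sum(LEVEL_POINTS[levels[c]] for c in criteria)
              for dim, criteria in RUBRIC.items()}
    scores["total"] = sum(scores.values())
    return scores


# Hypothetical narration: the levels below are invented for illustration.
example_levels = {"setting": awarded_level(4, False), "theme": 2, "plot": 3,
                  "creativity": 1, "sentences": 3, "vocabulary": 2, "spelling": 1}
print(score_rubric(example_levels))
```

With seven criteria and a maximum of 1.5 points each, the highest possible total under this scheme is 10.5 points (6 for organization and content, 4.5 for grammatical aspects).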

To calculate reliability in the application of the PROESC and the rubric, after brief training in their application, 14 narrations were chosen at random and analyzed separately by two of the researchers. The reliability index obtained with Cohen’s kappa was 0.97 (p < .01) for the PROESC total scores and 0.88 (p < .01) for the rubric total scores.
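
For readers unfamiliar with the index, Cohen’s kappa compares the observed agreement between two raters with the agreement expected by chance. The sketch below implements the standard unweighted formula in plain Python. The text does not say how the total scores were categorized for this computation, so the two lists of ratings are invented purely for illustration and do not reproduce the study’s data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four narrations (1 = criterion present, 0 = absent).
researcher_1 = [1, 1, 0, 0]
researcher_2 = [1, 0, 0, 0]
print(cohen_kappa(researcher_1, researcher_2))
```

In this toy example the raters agree on three of four items (0.75) while chance agreement is 0.50, giving a kappa of 0.5. Values close to 1, such as those reported above, indicate agreement far beyond chance.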

Results

Improvements in the quality of the narrations

Tables 2 and 3 show the data for the first and second narrations written by the students in the two experimental conditions, according to the quantitative assessment of their quality obtained with the PROESC and the rubric.

The quality of the original narration (before the intervention) was homogeneous in both experimental conditions. We found a significant difference only in the PROESC criterion ‘richness of vocabulary’ (Chi-square = 6.66; p < .05), in which the CIR condition obtained a slightly higher mean.

The application of the PROESC (Table 2) reveals certain improvements in the scores from the first to the second narrative production in the CIR condition in the categories of vocabulary, literary figures, where and when, and characters. Although only the difference in the ‘characters’ criterion was significant (Chi-square = 24.28; p < .01), significant differences are also observed in the total score for the global ‘content’ dimension (t = 2.75; p < .05).

In the traditional methodology group (CH), improvements are observed between the first and second narration in the criteria of vocabulary, complex sentences and literary figures. These differences were significant for the specific ‘sentences’ criterion (Chi-square = 3.68; p < .05), as well as between the means of the global ‘coherence’ dimension (t = 2.26; p < .05).

Table 2.

Means and standard deviations of the results for the PROESC criteria in the Iterative Co-evaluation with a Rubric (CIR) and conventional hetero-evaluation (CH) conditions.

		Original narration (N1)				Final narration (N2)				Diff. N2-N1
		CIR		CH		CIR		CH		CIR	CH
Criteria		Mean	SD	Mean	SD	Mean	SD	Mean	SD	Mean	Mean
Content	Where and when	0.65	0.48	0.77	0.42	0.74	0.41	0.77	0.42	0.09	0.00
	Characters	0.65	0.49	0.77	0.50	0.79	0.40	0.77	0.50	0.14**	0.00
	Event with consequences	0.90	0.30	0.91	0.31	0.94	0.25	0.94	0.27	0.04	0.03
	Coherent outcome	0.61	0.49	0.73	0.45	0.60	0.49	0.73	0.45	-0.01	0.00
	Creativity	0.60	0.49	0.73	0.44	0.63	0.49	0.77	0.41	0.03	0.04
	Content subtotal	3.41	1.40	3.91	1.25	3.70	1.14	3.98	1.11	0.29*	0.07
Coherence	Logical continuity	0.76	0.43	0.80	0.39	0.71	0.46	0.83	0.36	-0.05	0.03
	Unitary sense	0.60	0.49	0.67	0.50	0.60	0.49	0.72	0.49	0.00	0.05
	Literary figures	0.37	0.49	0.45	0.50	0.43	0.50	0.50	0.49	0.06	0.05
	Complex sentences	0.18	0.37	0.26	0.45	0.21	0.40	0.36	0.48	0.03	0.10*
	Vocabulary	0.39	0.49	0.18	0.39	0.47	0.50	0.27	0.45	0.08	0.09
	Coherence subtotal	2.30	1.32	2.36	1.37	2.42	1.36	2.68	1.27	0.12	0.32*
	Total	5.71	2.46	6.27	2.22	6.12	2.23	6.66	2.07	0.41	0.39

Note: Statistically significant differences: (*) p < .05; (**) p < .01.

The results of the evaluation with the rubric follow the same trend as those obtained with the PROESC. Table 3 shows improvements in content between the first and second narration for the group of students who took part in the CIR activity, although these were significant only for the specific ‘setting’ criterion (t = 4.62; p < .01), which refers to the spatial and temporal contextualization of the story and its characters. The students who received improvement suggestions from their teacher (CH condition) showed significant improvements in the grammatical aspects of the stories (t = 5.23; p < .05), especially in ‘spelling’ (t = 3.04; p < .01).

Table 3.

Means and standard deviations of the results for the rubric criteria in the Iterative Co-evaluation with a Rubric (CIR) and conventional hetero-evaluation (CH) conditions.

	Original narration (N1)				Final narration (N2)				Diff. N2-N1
	CIR		CH		CIR		CH		CIR	CH
Criteria	Mean	SD	Mean	SD	Mean	SD	Mean	SD	Mean	Mean
Setting	0.92	0.41	1.02	0.39	1.22	0.46	1.02	0.40	0.30**	0.00
Theme	0.67	0.33	0.61	0.31	0.72	0.35	0.64	0.29	0.05	0.03
Plot	0.78	0.36	0.81	0.30	0.77	0.36	0.85	0.29	-0.01	0.04
Creativity	0.54	0.46	0.58	0.43	0.53	0.46	0.60	0.42	-0.01	0.02
Organization and content subtotal	2.91	1.13	3.02	1.01	3.24	1.18	3.11	1.03	0.33	0.09
Sentences	0.85	0.37	0.86	0.39	0.87	0.37	1.01	0.41	0.02	0.15
Vocabulary	0.67	0.29	0.61	0.25	0.71	0.28	0.61	0.26	0.04	0.00
Spelling	0.23	0.41	0.20	0.41	0.35	0.47	0.70	0.46	0.12	0.50**
Grammatical aspects subtotal	1.75	0.72	1.67	0.66	1.93	0.74	2.32	0.82	0.18	0.65*
Total	4.66	1.58	4.69	1.44	5.17	1.61	5.43	1.55	0.51	0.74

Note: Statistically significant differences: (*) p < .05; (**) p < .01.

Finally, the overall assessment by the four teachers (unconnected to the research), each of whom marked 64 randomly distributed texts, showed significant improvements between the first and the second narration only in the hetero-evaluation modality. The mean mark for the first narration in the CH condition was 4.99, while for the second it was 5.73 (t = 2.07; p < .05).

Number of modifications introduced in the narratives

To examine in detail the improvements introduced between the first and second narratives in both experimental conditions, we counted the changes that each participant's second narrative showed with respect to the first and analysed them qualitatively. For example, the second participant in the CIR condition (identified as JRR) improved by one level on the rubric criterion ‘setting’, as the following transcription shows:

Fragment of the first narrative: ‘Erase una vez un niño que vivía en una ciudad muy fea. El niño se llamaba Alejandro […]’ [‘Once upon a time there was a boy who lived in a very ugly city. The boy was called Alejandro […]’].

Peer correction: The co-evaluating classmate selects the third of the four levels for the criterion ‘setting’ and adds the following written suggestion: ‘Tienes que describir al personaje’ [‘You have to describe the character’]. JRR then writes the second narrative, incorporating the change proposed by the classmate.

Fragment of the second narrative: ‘Erase una vez un niño pobre que vivía en una ciudad. El niño se llamaba Alejandro, era alto, fuerte y con unos ojos muy verdes […]’ [‘Once upon a time there was a poor boy who lived in a city. The boy was called Alejandro, he was tall, strong and had very green eyes […]’].

Unsuccessful or negative changes were also analysed. These presumably occurred when the student did not understand, or was unable to incorporate, the suggestions made by the peer evaluator, even though he or she often tried. An example of a negative change on the criterion ‘spelling’ is shown below:

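Fragment of the first narrative: ‘Erase una vez un niño que siempre soñaba con ser un héroe. Un día mientras que jugaba con sus amigos le pico una araña pero no le dolió un día jugando a mover las manos disparó una telaraña. […]’ [‘Once upon a time there was a boy who always dreamed of being a hero. One day while he was playing with his friends a spider bit him but it did not hurt one day playing at moving his hands he shot a spider web. […]’].

Peer correction: The co-evaluating classmate selects the third of the four levels for the criterion ‘spelling’ and adds the following suggestion: ‘no cometas errores’ [‘do not make mistakes’].

Fragment of the second narrative: ‘Una mañana Martín, se levanto contento, su mamá que se llamaba Sara le hizo unas ricas tostadas. Se sentaron a desallunar, pero de repente cataplás ese ruido interrunpió el desayuno, subieron curriendo las escaleras pero no era nada. […]’ [‘One morning Martín got up happy, his mum, who was called Sara, made him some tasty toast. They sat down to have breakfast, but suddenly crash that noise interrupted breakfast, they ran up the stairs but it was nothing. […]’; the Spanish misspellings ‘levanto’, ‘desallunar’, ‘interrunpió’ and ‘curriendo’ are the student's own].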

The results of this analysis showed that 47 of the 62 participants (76%) who revised their texts following the CIR methodology incorporated changes in at least one of the evaluation criteria. This figure is similar to that of the students who incorporated changes derived from the teachers' improvement proposals (HC condition): 57 of 66 participants (86%).
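The reported percentages follow directly from the counts given above. As a worked check (arithmetic only, with no additional analysis implied):

```latex
\[
\frac{47}{62} \approx 0.758 \;(76\%), \qquad
\frac{57}{66} \approx 0.864 \;(86\%), \qquad
62 + 66 = 128 .
\]
```

The two group sizes add up to the 128 participants reported in the abstract.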

Table 4 shows the criteria in which the changes occurred and their direction, positive or unsuccessful-negative:

Table 4.

Percentage of changes after co-evaluation in the CIR condition, by criterion.

Dimension	Criterion	Positive change	Negative change	Criterion total	Dimension total
Organization and content	Setting	33 (35.1%)	0 (0%)	33 (35.1%)	61 (65%)
Organization and content	Theme	5 (5.3%)	0 (0%)	5 (5.3%)
Organization and content	Plot	8 (8.6%)	1 (1%)	9 (9.6%)
Organization and content	Creativity	11 (11.7%)	3 (3.2%)	14 (15%)
Grammatical aspects	Sentences	9 (9.6%)	1 (1%)	10 (10.6%)	33 (35%)
Grammatical aspects	Vocabulary	7 (7.4%)	2 (2.2%)	9 (9.6%)
Grammatical aspects	Spelling	12 (12.8%)	2 (2.2%)	14 (15%)

As can be seen, the changes made after CIR occurred mainly in the ‘content’ dimension. Some 65% of the changes made across the narratives of the CIR participants affect this dimension, whereas only 35% affect grammatical aspects. A large share of these changes affected the ‘setting’ of the narrative specifically (35.1%), while no other criterion exceeds 15%. It is worth noting that the percentage of negative changes made in response to the peer co-evaluators' suggestions is below 4% for every criterion of the rubric.

Table 5 shows the criteria in which changes occurred in the group evaluated with the traditional methodology. These changes occurred mainly in ‘spelling’ (37.6% of the changes recorded across the narratives in this condition). A high percentage of changes was also recorded in the structure of ‘sentences’ (21.8%). In the remaining criteria, however, changes stayed below 10%. It is worth noting the high percentage of negative changes in creativity, which even exceeded that of positive changes.

Table 5.

Percentage of changes incorporated after the teacher's traditional evaluation, by criterion.

Dimension	Criterion	Positive change	Negative change	Criterion total	Dimension total
Organization and content	Setting	8 (7.9%)	0 (0%)	8 (7.9%)	34 (33.66%)
Organization and content	Theme	8 (7.9%)	0 (0%)	8 (7.9%)
Organization and content	Plot	9 (8.9%)	0 (0%)	9 (8.9%)
Organization and content	Creativity	4 (4%)	5 (4.9%)	9 (8.9%)
Grammatical aspects	Sentences	21 (20.8%)	1 (1%)	22 (21.8%)	67 (66.34%)
Grammatical aspects	Vocabulary	5 (4.9%)	2 (2%)	7 (6.9%)
Grammatical aspects	Spelling	36 (35.6%)	2 (2%)	38 (37.6%)

If we compare Tables 4 and 5, we see that the number of changes in ‘theme’, ‘plot’ and ‘vocabulary’ is very similar in the two instructional conditions. The large differences in the percentage of change according to the evaluation system occur in the criteria setting (CIR: 35.1%; HC: 7.9%), sentences (CIR: 10.6%; HC: 21.8%), spelling (CIR: 15%; HC: 37.6%) and creativity (CIR: 15%; HC: 8.9%). These differences are summarized in the comparative chart in Figure 1, which shows the improvements in content and formal aspects in both experimental conditions, according to the changes analysed through the rubric criteria. The students who rewrote their narrative on the basis of the suggestions made after co-evaluation made a significantly higher percentage of changes in the ‘content’ dimension than in the dimension concerning formal aspects (t = 2.06; p < .05). In the group that received hetero-evaluation, the opposite occurred (t = 2.45; p < .05).

Figure 1.

Changes produced after the activities of conventional teacher hetero-evaluation (HC) and iterative co-evaluation with a rubric (CIR).
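The comparison between the two change tables can be re-derived from the raw counts. The sketch below is a minimal illustration that assumes only the positive and negative change counts transcribed from Tables 4 and 5. The names `CHANGES` and `share` are hypothetical helpers, not part of the original study, and the t-tests reported in the text are not reproduced.

```python
# (positive, negative) change counts transcribed from Tables 4 (CIR) and 5 (HC).
CHANGES = {
    "CIR": {
        "Setting": (33, 0), "Theme": (5, 0), "Plot": (8, 1),
        "Creativity": (11, 3), "Sentences": (9, 1),
        "Vocabulary": (7, 2), "Spelling": (12, 2),
    },
    "HC": {
        "Setting": (8, 0), "Theme": (8, 0), "Plot": (9, 0),
        "Creativity": (4, 5), "Sentences": (21, 1),
        "Vocabulary": (5, 2), "Spelling": (36, 2),
    },
}

CONTENT = ("Setting", "Theme", "Plot", "Creativity")


def share(condition, criteria):
    """Percentage of a condition's changes that fall in the given criteria."""
    counts = CHANGES[condition]
    total = sum(pos + neg for pos, neg in counts.values())
    part = sum(sum(counts[c]) for c in criteria)
    return round(100 * part / total, 1)


for condition in CHANGES:
    content = share(condition, CONTENT)
    grammar = round(100 - content, 1)
    print(f"{condition}: content {content}%, grammar {grammar}%")
```

Under these assumptions, roughly two thirds of the CIR changes fall in organization and content, while roughly two thirds of the HC changes fall in grammatical aspects. Shares recomputed this way may differ from the printed ones by about a tenth of a point.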

Discussion and conclusions

In this study we set out to compare the effects of different assessment activities on improving the processes of production and revision of narrative texts by students in the second and third cycles of Primary Education.

We know that one of the main difficulties in written composition for students at this educational stage is ‘structuring the content’ (Gallego, 2012; Ramos et al., 2005). The results of applying the PROESC and a rubric created ad hoc to assess narratives by Primary school students show that a brief classroom activity of peer revision of an original narrative, with hardly any teacher participation, can be more useful than traditional teacher evaluation for helping students in the final years of Primary Education to improve certain aspects of organization and content. The participants who took part in an iterative co-evaluation with rubric (CIR) activity improved significantly, especially in the description of the characters and of the setting in which the story takes place, whereas those who were evaluated by the teacher, in a conventional hetero-evaluation (HC) activity, improved grammatical and, above all, spelling aspects. Although this result reveals the limitations of peer revision activities among Primary school students for achieving significant improvements in the syntactic and orthographic construction of sentences, it supports the potential of the CIR methodology for improving other aspects more closely related to the processes of planning the story.

There are several possible explanations for the difference found between the two instructional conditions. It is possible that, when assessing compositions, many Primary school teachers tend to focus their corrections more on formal aspects of language, namely grammar and spelling, than on the processes of content planning. The fact that the teachers unaware of the research gave higher overall ratings to the second narratives in the HC condition is consistent with this interpretation, since they awarded higher marks to the texts that were grammatically better written. The students who co-evaluated their classmates' narratives, by contrast, proposed more improvement suggestions on aspects related to the content of the narrative, that is, the setting, theme, plot or creativity of the story (Table 4).

Another possible reason concerns the quality of the content-improvement processes generated by peer revision activities, as opposed to traditional hetero-evaluation activities. Indeed, some studies have found that discussions about a learning task are more productive when students work in pairs than when a teacher evaluates (Kirschner, Beers, Boshuizen, & Gijselaers, 2008). The discussions that take place during co-evaluation are characterized by the use of a ‘language between equals’ and by a high level of involvement from the participants. In this regard, other studies have shown that peer revision activities produce better texts than those in which students work individually (De la Paz & Graham, 2002; Graham et al., 2005; Yarrow & Topping, 2001).

The fact that we did not find significant improvements in all the specific content criteria in the CIR condition may be related to the greater complexity of the writing processes underlying these components of the narrative, mainly the plot. It could also be related to the transparency (Panadero & Jonsson, 2013) of the rubric that mediated the peer revision process. On the one hand, the performance levels of the rubric for the plot and theme criteria may not be well adjusted to the students' level, or may not be adequately operationalized to generate sufficiently specific revisions of text fragments. On the other hand, several studies have shown the difficulties students encounter when interpreting tables and other external systems of representation in school tasks (Gabucio, Martí, Enfedaque, Gilabert, & Konstantinidou, 2010). The format of the rubric is undoubtedly complex for Primary school students to interpret, since its columns and rows carry a different meaning from the one usually associated with this type of representation.

As for the second specific objective of this research, to determine the extent to which the improvement suggestions made by teachers and students were incorporated into the text revision, we found changes in more than 75% of the narratives in both conditions. This result contrasts with previous studies with older students, which conclude that fewer than 50% of students incorporate the suggestions proposed by their peers (Chou, 1999; Tsui & Ng, 2000).

The count of changes is also consistent with the previous conclusion, in that CIR activities trigger more modifications in the content and organization of the narrative, whereas traditional hetero-evaluation activities generate more changes in syntactic and spelling aspects. According to the data in Table 4, more than 64% of the changes in the CIR condition affected the content of the narrative (whether the setting, theme, plot or creativity of the story). With the traditional evaluation methodology (HC), the changes are concentrated in the criteria ‘spelling’ (37.6%) and ‘sentences’ (21.8%), and remain below 10% in the other criteria (Table 5). The few negative or unsuccessful changes that occur in both evaluation conditions may be explained by problems in expressing and understanding the improvement suggestions (Graham & Hebert, 2010, 2011; Martínez, Martín, & Mateos, 2011).

In conclusion, although students make 10% fewer positive changes following their peers' suggestions (CIR condition) than following their teachers' corrections (HC condition), as previous research has shown (Paulus, 1999), peers offer assessments that can be just as effective, or more so, for improving the content of the narratives. Many students probably became aware of errors they could correct in their own writing when using the rubric to evaluate that of their classmates (Gallavan & Kottler, 2009; Schamber & Mahoney, 2006). The use of rubrics would make it easier for students to self-assess, to a greater extent than other traditional activities, the processes of planning the content of the text, an aspect that, as we have already argued, is considered crucial in self-regulated learning processes.

This conclusion has relevant implications for reviewing educational practice in the teaching of writing. From the above results we cannot conclude that Primary school teachers pay insufficient attention to the processes of planning and revising writing, beyond grammatical and spelling aspects. Nevertheless, teaching resources such as those we tested in the CIR activities can undoubtedly be useful for teachers to complement the educational support provided by other, more ‘traditional’ activities.

Among the main limitations of this work we must point out, first, the limited instruction and the complete lack of experience that the participants had in using rubrics (in addition, their degree of understanding of the different criteria was not assessed). There is evidence that training in peer revision has positive effects when the evaluations of participants who have received instruction in this area are compared with those of participants who have not (Min, 2006; Stanley, 1992; Zhu, 1995). Accordingly, in future work we intend to replicate this study with students who are familiar with the external systems of representation that support co-evaluation processes at different educational levels.

Likewise, it would be advisable to ensure that the Primary school teachers in charge of the traditional evaluation activities have sufficient knowledge of the writing processes and narrative structures on which the rubric is based, so that a lack of such training is not responsible for the differences found.

A final limitation concerns the complexity of the evaluation activities investigated, which involve different instructional variables. In fact, the combination of rubric use with other metacognitive activities is one of the most frequent limitations noted in reviews of the literature (Panadero & Jonsson, 2013). We are currently studying, with another sample of Primary school students, the extent to which the benefits of the CIR condition derive specifically from peer co-evaluation, from the instrument that supports it (whether a rubric, a script, etc.), or from the combination of both.

References

Alonso-Tapia & Panadero (2010). Effect of self-assessment scripts on self-regulation and learning. Infancia y Aprendizaje, 33(3), 385–397.

Andrade (2005). Student perspectives on rubric-referenced assessment. Practical Assessment, Research & Evaluation, 10(3), 1–11.

Arter, J. A., & McTigue (2001). Scoring Rubrics In The Classroom: Using Performance Criteria For Assessing And Improving Student Performance. Thousand Oaks, CA: Corwin Press.

Bannert (2009). Promoting self-regulated learning through prompts. Zeitschrift für Pädagogische Psychologie, 23(2), 139–145.

Benítez (2008). La evaluación de las narraciones escritas: una perspectiva holística focalizada. Enunciación, 13, 28–37.

Bereiter & Scardamalia (1987). The psychology of written composition. Hillsdale, NJ: Erlbaum.

Bissell, A. N., & Lemons, P. R. (2006). A New Method For Assessing Critical Thinking In The Classroom. Bioscience, 56(1), 66–72.

Butterfield, Hacker, & Alberstong (1996). Environmental, cognitive and metacognitive influences of text revision: Assessing the evidence. Educational Psychology Review, 8(3), 239–297.

Camps (1990). Modelos del proceso de redacción: algunas implicaciones para la enseñanza. Infancia y Aprendizaje, 49, 3–17.

Cassany (1999). Construir la escritura. Barcelona: Paidós.

Castelló (2002). De la investigación sobre el proceso de composición a la enseñanza de la escritura. Revista Signos, 35(51–52), 149–162.

Chen, C. H. (2010). The implementation and evaluation of a mobile self- and peer-assessment system. Computers & Education, 55(1), 229–236.

Chen, N. S., Wei, C. W., Wu, K. T., & Uden (2009). Effects of high level prompts and peer assessment on online learners’ reflection levels. Computers & Education, 52, 283–291.

Chou, M. C. (1999). How peer negotiations shape revisions. In Katchen & Leung, Y. N. (Eds.), The Proceedings of the Seventh International Symposium on English Teaching (pp. 349–359). Taipei: The Crane Publishing Co.

Cuetos, Ramos, J. L., & Ruano (2002). PROESC: Batería de evaluación de los procesos de escritura. Madrid: TEA Ediciones.

De la Paz & Graham (2002). Explicitly teaching strategies, skills and knowledge: writing instruction in middle school classrooms. Journal of Educational Psychology, 94, 291–304.

Faigley, & Witte, S. P. (1981). Analyzing revision. College Composition and Communication, 32, 400–414.

Fidalgo, & García, J. N. (2008). El desarrollo de la competencia escrita a través de una enseñanza metacognitiva de la escritura. Cultura y Educación, 20(3), 325–346.

Fitzgerald (1987). Research on revision in writing. Review of Educational Research, 57, 481–506.

Flower & Hayes (1981). A cognitive process theory of writing. College Composition and Communication, 32(4), 365–387.

Gabucio, Martí, Enfedaque, Gilabert, & Konstantinidou (2010). Niveles de comprensión de las tablas en alumnos de primaria y secundaria. Cultura y Educación, 22(2), 183–197.

Gallavan, N. G., & Kottler (2009). Constructing rubrics and assessing progress collaboratively with social studies students. The Social Studies, 100(4), 154–158.

Gallego, J. L. (2012). Cómo estructuran el contenido de la escritura alumnos de Educación Primaria. Contextos Educativos, 15, 9–26.

Gelbal & Kelecioglu (2007). Ögretmenlerin Ölçme Degerlendirme Yöntemleri Hakkindaki Yeterlik Algilari Ve Karsilastiklari Sorunlar. Hacettepe Üniversitesi Egitim Fakültesi Dergisi, 33, 135–145.

Graham, Gillespie, & McKeown (2013). Writing: importance, development, and instruction. Reading & Writing: An Interdisciplinary Journal, 26(1), 1–15.

Graham & Harris (2000). The role of self-regulation and transcription skills in writing and writing development. Educational Psychologist, 35(1), 3–12.

Graham, Harris, & Mason (2005). Improving the writing performance, knowledge and self-efficacy of struggling young writers: the effects of self-regulated strategy development. Contemporary Educational Psychology, 30, 207–241.

Graham, Harris, K. R., & Hebert (2011). Informing writing: The benefits of formative assessment. Washington, DC: Alliance for Excellence in Education.

Graham & Hebert (2010). Writing to reading: Evidence for how writing can improve reading. Washington, DC: Alliance for Excellence in Education.

Graham & Hebert (2011). Writing-to-read: A meta-analysis of the impact of writing and writing instruction on reading. Harvard Educational Review, 81, 710–744.

Graham & Perin (2007). A Meta-Analysis of Writing Instruction for Adolescent Students. Journal of Educational Psychology, 99(3), 445–476.

Harris, K. R., Graham, & Mason, L. H. (2006). Improving the writing, Self-Regulated Strategy Development With and Without Peer Support. American Educational Research Journal, 43(2), 295–340.

Hayes (1996). A new framework for understanding cognition and affect in writing. In Levy, C. M., & Ransdell (Eds.), The Science of Writing: Theories, methods, individual differences and applications (pp. 1–27). Mahwah, NJ: Lawrence Erlbaum Associates.

Hayes, J. R. (2006). New Directions in Writing Theory. In MacArthur, Graham, & Fitzgerald (Eds.), Handbook of writing research. New York: The Guilford Press.

Jonsson & Svingby (2007). The Use Of Scoring Rubrics: Reliability, Validity And Educational Consequences. Educational Research Review, 2, 130–144.

Kirschner, P. A., Beers, P. J., Boshuizen, H. P. A., & Gijselaers, W. H. (2008). Coercing shared knowledge in collaborative learning environments. Computers in Human Behavior, 24, 403–420.

Knoch, Read, & von Randow (2007). Re-training writing raters online: How does it compare with face-to-face training? Assessing Writing, 12(1), 26–43.

Kutlu, Ö., Bilican, & Yildirim, Ö. (2010). A study on the primary school teachers’ attitudes towards rubrics with reference to different variables. Procedia Social and Behavioral Sciences, 2(2), 5398–5402.

Lucero & Montanero (2012). Aprender a aprender colaborando. VI Encuentro Estatal de Orientación: “Innovación y buenas prácticas” (Abstract published in CD).

Martínez, Martín, & Mateos (2011). Enseñar a leer y escribir para aprender en la Educación Primaria. Cultura y Educación, 23(3), 399–414.

Min (2006). The effects of trained peer review on EFL students’ revision types and writing quality. Journal of Second Language Writing, 15, 118–141.

Ochoa-Angrino, Aragón, Correa, & Mosquera (2008). Funcionamiento metacognitivo de niños escolares en la escritura de un texto narrativo antes y después de una pauta de corrección conjunta [Versión electrónica]. Acta Colombiana de Psicología, 11(2), 77–88.

Panadero, Alonso-Tapia, & Huertas, J. A. (2012). Rubrics and self assessment scripts effects on self-regulation, learning and self-efficacy in secondary education. Learning and Individual Differences (0). doi:10.1016/j.lindif.2012.04.007.

Panadero & Jonsson (2013). The use of scoring rubrics for formative assessment purposes revisited: A review. Educational Research Review, 9(0), 129–144.

Paulus (1999). The effect of peer and teacher feedback on student writing. Journal of Second Language Writing, 8(3), 265–289.

Peterson (2003). Peer response and students’ revisions of their narrative writing. L1-Educational Studies in Language and Literature, 3(3), 239–272.

Piolat (1991). Effect of Word processing on text revision. Language and Education, 5, 255–272.

Puustinen & Pulkkinen (2001). Models of Self-regulated Learning: A review. Scandinavian Journal of Educational Research, 45(3), 269–283.

Ramos, J. L., Cuadrado, & Iglesias (2005). La composición escrita en el alumnado de Educación Primaria y Secundaria. Cultura y Educación, 17(3), 239–251.

Rogers & Graham (2008). A meta-analysis of single subject design writing intervention research. Journal of Educational Psychology, 100, 879–906.

Ross-Fisher, R. L. (2005). Developing effective success rubrics. Kappa Delta Pi, 41(3), 131–135.

Saito & Miwa (2007). Construction of a learning environment supporting learners’ reflection: a case of information seeking on the Web. Computers & Education, 49, 214–229.

Sánchez (1998). Comprensión y redacción de textos. Barcelona: Edebé.

Schamber, J. F., & Mahoney, S. L. (2006). Assessing And Improving The Quality Of Group Critical Thinking Exhibited In The Final Projects Of Collaborative Learning Groups. Journal Of General Education, 55(2), 103–137.

Schunk, D. H., & Zimmerman, B. J. (2007). Influencing children’s self-efficacy and self-regulation of reading and writing through modeling. Reading and Writing Quarterly, 23, 7–25.

Stanley (1992). Coaching student writers to be effective peer evaluators. Journal of Second Language Writing, 1(3), 217–233.

Thorndyke, P. W. (1977). Cognitive structures in comprehension and memory of narrative discourse. Cognitive Psychology, 9, 77–110.

Tsui, A. B. M., & Ng (2000). Do secondary L2 writers benefit from peer comments? Journal of Second Language Writing, 9(2), 147–170.

Vass, Littleton, Miell, & Jones (2008). The discourse of collaborative creative writing: peer collaboration as a context for mutual inspiration. Thinking Skills and Creativity, 3(3), 192–202.

Wiggins (1989). A True Test: Toward More Authentic And Equitable Assessment. Phi Delta Kappan, 79, 703–713.

Yan, C. M. W., McBride-Chang, Wagner, R. K., Zhang, Wong, A. M. Y., & Shu (2012). Writing quality in Chinese children: speed and fluency matter. Reading and Writing, 25(7), 1499–1521.

Yang (2010). Students’ reflection on online self-correction and peer review to improve writing. Computers & Education, 55, 1202–1210.

Yarrow & Topping (2001). Collaborative writing: The effects of metacognitive prompting and structured peer interaction. British Journal of Educational Psychology, 71, 261–282.

Zhu (1995). Effects of training for peer response on students’ comments and interaction. Written Communication, 1(4), 492–528.

Zimmerman & Reisemberg (1997). Becoming a self-regulated writer: A social cognitive perspective. Contemporary Educational Psychology, 22, 73–101.