Abstract
This research inspects the allocation of involvement load to the evaluation component of the involvement load hypothesis, examining how three typical approaches to evaluation (cloze-exercises, sentence-writing, and composition-writing) promote word learning. The results of this research were partially consistent with the predictions of the hypothesis: the two writing tasks with greater involvement load led to significantly better word learning than cloze-exercises with lower load, while composition-writing was significantly more effective than sentence-writing despite the same involvement load according to the matrix of the original model. Such results are explained from the perspectives of information organization and pre-task planning, based on which evaluation induced by cloze-exercises is suggested to be allocated with ‘moderate evaluation’ as it involves no use of chunking, hierarchical organization or pre-task planning, evaluation induced by sentence-writing with ‘strong evaluation’ as it involves chunking and pre-task planning at the sentence level, and evaluation induced by composition-writing with ‘very strong evaluation’ for it involves chunking, hierarchical organization and pre-task planning at the composition level.
I Introduction
Word knowledge plays a significant role in communication and is regarded as an essential component of second language acquisition. Knowing a word generally entails knowledge of its form, meaning and use, and word learning is incremental in nature as some aspects of word knowledge are developed before others (Nation, 2001). A number of hypotheses have been proposed to explain word learning, among which one of the most important and powerful is Laufer and Hulstijn’s (2001) involvement load hypothesis.
The involvement load hypothesis holds that in language learning the retention of word knowledge is conditional upon its involvement load, that is, how involved the student is in the word learning task, and tasks with greater loads are more effective than those with lower loads. The hypothesis thus acknowledges the significance of the level of processing and realizes the necessity of operationalizing such general cognitive notions from the perspective of word-focused tasks. According to the theory, the load of a task is the sum of the prominence degrees of three factors: need, search and evaluation. Need is the drive to complete a task, search is the attempt to find the meaning or form of a word, and evaluation involves comparing and selecting the most suitable meaning or form, or the creation of an original context. Among the three components, need relates to the motivational dimension and has two prominence degrees: strong need (symbolized by ‘need ++’) is intrinsic, and moderate need (‘need +’) is extrinsic. Search, from the cognitive dimension, has only one prominence degree and is symbolized by ‘search +’, and includes such activities as dictionary consultation, inferencing and negotiation. The third component, evaluation, is also a cognitive factor, and has two prominence degrees. It is moderate (‘evaluation +’) when the decision-making process involves only comparisons, but strong (‘evaluation ++’) when learner-created contexts are generated (Laufer & Hulstijn, 2001).
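The summation described above can be sketched in a few lines of code. This is a minimal illustration, not part of the hypothesis as published here: the function name `involvement_load` is hypothetical, and the numeric weights (absent = 0, moderate = 1, strong = 2) are an assumption about how prominence degrees are converted to numbers.

```python
# Hypothetical helper: converts the shorthand used in this article
# (e.g. 'need +, search –, evaluation ++') into a numeric load index.
# Assumed weights: absent ('–' or '-') = 0, moderate ('+') = 1, strong ('++') = 2.
DEGREE_WEIGHTS = {"–": 0, "-": 0, "+": 1, "++": 2}
COMPONENTS = ("need", "search", "evaluation")


def involvement_load(notation: str) -> int:
    """Sum the prominence degrees of need, search and evaluation."""
    total = 0
    for part in notation.split(","):
        # Each part looks like 'need +': a component name, a space, a degree.
        component, degree = part.strip().rsplit(" ", 1)
        if component not in COMPONENTS:
            raise ValueError(f"unknown component: {component!r}")
        total += DEGREE_WEIGHTS[degree]
    return total


if __name__ == "__main__":
    cloze = involvement_load("need +, search –, evaluation +")
    writing = involvement_load("need +, search –, evaluation ++")
    print("cloze-exercises:", cloze)
    print("writing tasks:", writing)
```

Under these assumed weights, a writing task carries a higher index than a cloze-exercise solely because of its evaluation component.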
1 Research on the involvement load hypothesis
The proposal of the involvement load hypothesis has triggered numerous studies. On the one hand, many researchers found supportive evidence of it. For example, Hulstijn and Laufer (2001), Beal (2007), Keating (2008), Kim (2008), Huang, Willson, and Eslami (2012), Eckerth and Tavakoli (2012), and Mármol and Sánchez-Lafuente (2013) all noted that cloze-exercises (need +, search –, evaluation +) were less effective than writing tasks (need +, search –, evaluation ++), but more effective than reading tasks (need +, search –, evaluation –). Laufer and Girsai (2008) also found that translation exercises (need +, search +, evaluation +/++) were more effective than reading tasks and cloze-exercises (need +, search +, evaluation –/+). Nassaji and Hu (2012) further showed that the task of inferring the meaning of and making additional derivational changes to target words (need +, search ++, evaluation ++) was most effective, while inferring meanings without options (need +, search +, evaluation +) was less effective, and inferring meanings with options provided (need +, search –, evaluation +) the least effective. Pichette, de Serres, and Lafontaine (2012) found that writing tasks with strong evaluation were more effective than reading tasks with zero or moderate evaluation. The value of writing tasks with high involvement loads has also been highlighted by Laufer and Rozovski-Roitblat (2011) and Moonen, de Graaff, Westhoff, and Brekelmans (2014). Additionally, Niu and Helms-Park (2014) observed that written and oral output with higher involvement loads led to significantly better learning than simple reading with a lower load.
On the other hand, many studies have reported evidence against the involvement load hypothesis. Laufer’s (2003) research revealed that the task of sentence completion and dictionary consultation (need +, search +, evaluation +) was more effective than sentence-writing (need +, search –, evaluation ++) in promoting the retention of target words, even though the two tasks induced the same total involvement load. Folse (2006) also questioned the hypothesis, as he found that three cloze-exercises (need +, search –, evaluation +) were more effective than one sentence-writing exercise (need +, search –, evaluation ++). Similar results are discussed in Lu (2013). Some researchers, for example Folse (2006), Keating (2008) and Webb (2005), have also pointed out that their results in support of the hypothesis were called into question because the advantage of tasks with higher loads disappeared when the time invested in the learning activities was the same (Bird, 2012).
Like the present article, other studies have suggested various elaborations and refinements of the involvement load hypothesis. The framework of Nation and Webb (2011) incorporates a greater number of elements (e.g. motivation, noticing, retrieval, generation and retention), utilizing a technique feature analysis that may produce a more sensitive prediction of vocabulary learning. As promising as this is, the present research uses Laufer and Hulstijn’s (2001) involvement load hypothesis as a starting point.
2 Limitations of the involvement load hypothesis
The review of the involvement load hypothesis literature suggested one limitation that would benefit from further investigation. The construct does not clearly differentiate between the two degrees of prominence of evaluation, nor does it fully explain why evaluation has only two degrees or why those degrees are allocated as they are to the different methods of evaluation. Among the various approaches, cloze-exercises with moderate evaluation, and composition-writing and sentence-writing with strong evaluation, are three of those most frequently practiced by language learners. However, composition-writing is commonly believed to be more difficult than sentence-writing. Laufer (in a personal communication with Kim) also recognized that the overall difficulty of a task should be attributed not only to the processing of target words but also to other factors such as the need to maintain coherence, and that composition-writing holistically involves deeper cognitive processing than sentence-writing (Kim, 2008). Nevertheless, she still claimed that these two approaches induce the same involvement load because they both require learners to use target words in self-generated contexts (Kim, 2008).
Moreover, most studies on the hypothesis concentrate on comparing the effectiveness of different approaches to evaluation but pay little attention to the thinking processes induced by them, and hence offer no explanation of why the approaches differ from one another. Therefore, research that investigates the thinking processes entailed by different approaches to evaluation is essential for an explicit explanation and further development of the involvement load hypothesis.
This study aims to examine how evaluation induced by cloze-exercises, sentence-writing, and composition-writing differs in terms of involvement load. Furthermore, the study investigates the reasons for any similarities or differences that might be detected with regard to the processing of target words by participants.
II Method
To achieve the two research objectives, I conducted an experiment with 147 participants randomly assigned to three groups: Group 1 (44 participants) were asked to complete Task 1: cloze-exercises; Group 2 (46 participants) were asked to complete Task 2: sentence-writing; and Group 3 (57 participants) were asked to complete Task 3: composition-writing. To measure their word learning, 30 participants from Group 1, 30 from Group 2 and 34 from Group 3 were given an immediate posttest after task completion, and another posttest, unexpectedly, one week later (see Table 1).
Table 1. Allocation of participants to tasks.
To tap into the participants’ thinking processes and triangulate the collected data, 8 participants from Group 1, 10 from Group 2, and 15 from Group 3 were trained to do a think-aloud protocol while completing their assigned tasks. Additionally, 6 participants from Group 1, 6 from Group 2, and 8 from Group 3 took part in retrospective interviews about how they tackled these tasks. These participants did their self-reporting in English or Chinese (their first language), as they preferred. They also took the immediate posttest and were further interviewed about their ability to recall meanings of target words and generate original contexts using these words. However, the test data for these participants were excluded from the statistical analyses to eliminate the possible influence of engagement in self-reporting on subsequent test performance.
1 Tasks
Task 1: cloze-exercises asked participants to read a text where 10 target words were deleted from the original text and replaced by 10 blanks. The target words were listed with their glosses in the margin of the text, and the participants were required to fill in the blanks by selecting appropriate words from the list. According to the involvement load hypothesis, this task induced need +, search –, evaluation +, as marginal glosses for the target words were provided, and participants were required to learn these words by comparing them and deciding which one suited the contexts most appropriately.
The same 10 target words with accompanying glosses were employed in the two writing tasks. Task 2: sentence-writing asked the participants to write 10 original sentences of at least 10 words in length using the target words. The sentences had to be semantically and grammatically appropriate for the target words. Task 3: composition-writing asked the participants to write a composition that coherently connects the 10 target words, and correct use of all words was required for task completion. The involvement loads of these two tasks were the same: need +, search –, evaluation ++, as the need was externally imposed, glosses were provided, and generation of original contexts was involved. No feedback was given to any participants after their completion of the assigned tasks.
2 Reading text, target words and glosses
A reading text about procrastination was used because this topic was likely to be similarly familiar to all participants, as the experience of procrastinating and coping with it was very common. Only the participants who did cloze-exercises were given the text. The participants who did sentence-writing and composition-writing were provided with a list of target words and their glosses.
Ten monosemous words (divulge, renege, taunt, lassitude, trait, apprehensive, assiduous, indispensable, ostensible and pernicious) were selected from within the context of procrastination. A range of parts of speech was covered so as to eliminate any possible influence of grammatical category. A pretest among the 147 participants was conducted three weeks before the experiment, and the results showed that the participants had almost zero pre-knowledge of the target words.
Covering the parts of speech and basic meanings, the glosses of target words were written based on their definitions in three reputable dictionaries: the Longman Dictionary of Contemporary English (5th edition, 2009), the Oxford Advanced Learner’s Dictionary (7th edition, 2009) and the Collins Cobuild Advanced Dictionary of English (2009).
3 Participants
The 147 participants, 81 female and 66 male, were non-English major freshmen at Tsinghua University, China. Their ages ranged from 18 to 21 years, and they had been learning English for at least 10 years. Their scores in the College English Examination Band 4, a Chinese national English proficiency exam in which they participated three months earlier, ranged from 425 to 450. Such scores are roughly equivalent to an IELTS score of 5.5; thus, these participants could be regarded as intermediate English learners.
4 Testing materials
Folse’s (2006) modified version of Paribakht and Wesche’s (1997) vocabulary knowledge scale (see Table 2) was employed as it is able to measure learners’ receptive knowledge of word meanings and productive knowledge of meanings and use. These aspects, on which most previous studies have concentrated, are those that are most likely to be learnt (Schmitt, 2000).
Table 2. The modified vocabulary knowledge scale.
5 Scoring
The scoring system for the modified vocabulary knowledge scale is based on those of previous studies. Following Paribakht and Wesche (1993), Hulstijn and Laufer (2001), and Keating (2008), a meaning was graded zero if it was completely incorrect (e.g. writing an answer of ‘comprehensive’ for the meaning of ‘apprehensive’), a half score if it was a semantically acceptable equivalent of the target word (e.g. ‘negative’ for ‘pernicious’), and a full score if it was a comparable meaning to that of the target word (e.g. ‘tease’ for ‘taunt’). For the sentences, the criteria developed by Paribakht and Wesche (1993), also used by Hulstijn and Laufer (2001), were applied: a sentence was graded zero if it had a completely inappropriate semantic context for the target word (e.g. ‘It’s apprehensive to work hard’), a half score if it had an appropriate semantic context but the target word was used ungrammatically (e.g. ‘I’m lassitude and don’t want to do anything’), and a full score if it had an appropriate semantic context and the target word was used grammatically (e.g. ‘Never renege on your promises to others’) (for details, see Zou, 2012).
Blind scoring was employed using two trained raters who scored the answers separately. As 10 target words were investigated, full marks equated to 20 (10 for meanings and 10 for sentences). The Pearson’s r for the inter-rater reliability was .95 for the immediate posttest, and .94 for the delayed posttest.
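The scoring scheme and the inter-rater check can be illustrated with a short sketch. It assumes that a half score is encoded as 0.5 and a full score as 1; the function names `total_score` and `pearson_r` are hypothetical, and the rater totals in the example are invented for illustration rather than taken from the study.

```python
import math

# Assumed encoding of the rubric: zero = 0, half score = 0.5, full score = 1.
ALLOWED_GRADES = {0, 0.5, 1}


def total_score(meaning_grades, sentence_grades):
    """Add up meaning and sentence grades; 10 words give a maximum of 20."""
    if len(meaning_grades) != len(sentence_grades):
        raise ValueError("each target word needs one meaning and one sentence grade")
    for grade in (*meaning_grades, *sentence_grades):
        if grade not in ALLOWED_GRADES:
            raise ValueError(f"invalid grade: {grade}")
    return sum(meaning_grades) + sum(sentence_grades)


def pearson_r(x, y):
    """Pearson correlation between two raters' totals for the same participants."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented totals from two raters for five participants (illustration only).
    rater_1 = [8.0, 12.5, 15.0, 9.5, 17.0]
    rater_2 = [8.5, 12.0, 15.5, 9.0, 17.0]
    print("inter-rater r:", round(pearson_r(rater_1, rater_2), 3))
```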
6 Analysis of think-aloud and interview data
The audio data obtained from self-reporting were transcribed into Microsoft Word documents. The transcriptions were in both English and Chinese because, as explained earlier, the participants did self-reporting in either language, as they preferred. English translations for the Chinese transcriptions were added below the corresponding transcripts.
After obtaining a holistic view of all data through the transcription stage, I coded the data following Goodfellow (1998): all data were first skimmed to note distinct elements that facilitated word learning; the relevant literature on these elements was then reviewed to identify explanatory theories; finally, the data were re-examined and the thinking processes analysed. The participants’ thinking patterns were identified by analysing their thinking processes and noting recurring expressions in similar contexts (see Denzin, 1997).
III Results
The descriptive statistics, as shown in Table 3, demonstrate that: (1) all three tasks led to effective word learning, since the participants’ pre-knowledge of the target words was close to zero; (2) participants doing composition-writing acquired higher mean scores in both posttests (15.91 in the immediate and 13.91 in the delayed posttest) than participants doing sentence-writing (12.33 in the immediate and 9.60 in the delayed posttest), and participants doing cloze-exercises had the lowest scores (8.32 in the immediate and 5.30 in the delayed posttest).
Table 3. Descriptive statistics of the participants’ scores in the pretest, immediate and delayed posttests.
To examine whether the effectiveness of the three tasks differed significantly, two one-way ANOVA tests were applied. The one-way ANOVA of participants’ scores in the immediate posttest showed statistically significant differences among the three groups (F = 28.64, p < .001, η² = .386). This result is supported by the post hoc multiple comparisons, as the significance values of the three pairwise comparisons were all smaller than .001. Similarly, for participants’ scores in the delayed posttest, significant differences were identified among the three groups (F = 31.17, p < .001, η² = .406). The significance values of the three post hoc pairwise comparisons were again all smaller than .001, and zero was excluded from the 95% confidence intervals of the differences between tasks.
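For readers who wish to see how the F ratio and η² relate, the following is a minimal sketch of a one-way ANOVA computed from raw scores, together with the standard conversion from F to η². The function names are hypothetical and the scores in the example are invented. The degrees of freedom used at the end (2 and 91) are not reported in the text; they are an assumption inferred from three groups and the 94 posttest participants.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of score lists."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-groups sum of squares: group size times squared mean deviation.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    # Within-groups sum of squares: squared deviations from each group's own mean.
    ss_within = sum(
        (score - sum(group) / len(group)) ** 2 for group in groups for score in group
    )
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


def eta_squared_from_f(f_stat, df_between, df_within):
    """Recover eta squared from a reported F ratio and its degrees of freedom."""
    return f_stat * df_between / (f_stat * df_between + df_within)


if __name__ == "__main__":
    # Invented scores for three small groups (illustration only).
    demo_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    print(one_way_anova(demo_groups))
    # Assumed df = (2, 91): three groups, 94 posttest participants.
    print(round(eta_squared_from_f(28.64, 2, 91), 3))
```

With the assumed degrees of freedom, the immediate-posttest F ratio yields a value that rounds to the reported η² of .386.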
In summary, each of the three tasks differed significantly from the other two in both the immediate and delayed posttests. These results were partially consistent with the involvement load hypothesis, as the two writing tasks with strong evaluation were found to promote significantly more effective learning than cloze-exercises with moderate evaluation. However, contrary to the prediction of the hypothesis, the effectiveness of sentence-writing and composition-writing differed significantly even though the two tasks were accorded the same load.
Based on these findings, I suggest allocating strong load (‘evaluation ++’) to the evaluation induced by sentence-writing and very strong load (‘evaluation +++’) to the evaluation induced by composition-writing, given that the evaluation induced by cloze-exercises is moderate (‘evaluation +’). To explain these proposed involvement loads in more detail, the following section discusses how the thinking processes of participants doing the three tasks differed from one another.
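The effect of the proposed reallocation on the total load of each task can be sketched as follows. The numeric weights are assumptions made for illustration: moderate = 1 and strong = 2 follow the usual reading of ‘+’ and ‘++’, while the value 3 for very strong evaluation is simply the natural extension and is not specified in the text. The names `TASKS` and `revised_load` are hypothetical.

```python
# Assumed numeric weights for each prominence degree.
NEED = {"none": 0, "moderate": 1, "strong": 2}
SEARCH = {"none": 0, "present": 1}
EVALUATION = {"none": 0, "moderate": 1, "strong": 2, "very strong": 3}  # 3 is assumed

# Degrees for the three tasks in this study: need is externally imposed
# (moderate), glosses are provided (no search), evaluation as proposed.
TASKS = {
    "cloze-exercises": ("moderate", "none", "moderate"),
    "sentence-writing": ("moderate", "none", "strong"),
    "composition-writing": ("moderate", "none", "very strong"),
}


def revised_load(task: str) -> int:
    """Total load of a task under the proposed three-degree evaluation scale."""
    need, search, evaluation = TASKS[task]
    return NEED[need] + SEARCH[search] + EVALUATION[evaluation]


if __name__ == "__main__":
    for name in TASKS:
        print(name, revised_load(name))
```

Under these assumptions the three tasks receive strictly increasing totals, which matches the ordering of the posttest means.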
IV Discussion
The similarities and differences among the three approaches to evaluation (cloze-exercises, sentence-writing, and composition-writing) are explained in this section from the perspectives of information organization in cognitive science (chunking and hierarchical organization), pre-task planning, time-on-task, and lengths of written products.
1 Chunking
The notion of chunking, as a unifying information-processing mechanism, was initially proposed by Chase and Simon (1973) to explain expert behavior. A ‘chunk’ is defined as a collection of information elements that have associations with each other; examples include the grouping of letters into words, words into sentences, and sentences into paragraphs or even compositions (Gobet et al., 2001). Note that the terms ‘chunk’ and ‘chunking’ are used here differently from the way ‘chunk’ is used in research on formulaicity, where it refers to formulaic multiword units (Boers & Lindstromberg, 2012). Through chunking, the information being processed is grouped into meaningful units, and hence is much easier to memorize (Hintzman, 1978). The significant role of chunking in increasing learners’ ability to extract information from contexts has been noted by many empirical studies, and it is believed that chunking is important in determining the effectiveness of information memorization activities (Gobet et al., 2001). Research on chunking is, however, limited to the area of psychology, and no previous research has discussed chunking in the field of language acquisition. In the context of this research, participants’ generation of original sentences that shared associated semantic contexts is regarded as a sign of their use of chunking.
Participants doing the cloze-exercises did not make use of chunking because, according to the self-reporting data, they did not proactively create any contexts that associated different target words, but simply did what was required for task completion: comparing and selecting words by deciding the parts of speech and meanings of words to be placed in the blanks according to the words located immediately before or after the blanks. They did make decisions based on contexts. However, these were just the micro-contexts within sentences, not the macro-contexts between sentences. The data also showed that most participants did not feel required to comprehend the text thoroughly; many of them even noted in the post-task interviews that ‘I did not read sentences without blanks because they are useless.’ As a result, these participants’ learning of the target words involved very little processing of the connections between different words, although semantically associated sentences were provided. After task completion, they had difficulty recalling the content of the text, let alone the connection of target words to the contexts. In contrast, an examination of the thinking processes of participants doing sentence- and composition-writing shows that the method of organizing information through chunking was applied by them.
a Use of chunking by participants doing sentence-writing
The use of chunking by participants writing sentences is supported by considerable evidence in the think-aloud and interview transcripts. The data confirm that when a participant attempts to conceive a context for a target word, he or she is quite often influenced by the contexts already created for previous target words, and consequently chunks among these words are generated. For example, the first think-aloud extract below shows that a participant generated two sentences that were closely connected by the context of an exam, and the second extract illustrates how another participant conceived a context that connects renege and taunt.
Apprehensive, adjective, worried that something unpleasant may happen … worry … something unpleasant … Um … the final exam is coming, maybe I can write something about it … Um … I am apprehensive that … do bad in the exam … I am … that I may not perform good in the exam … OK. The next one: assiduous, adjective again, hardworking and careful … Um … still can be related to the exam … should work hard for the exam … and be careful … Um … To perform good in the exam, one should be assiduous.
Renege on something, to break a promise, an agreement … Um … Mary reneged on her promise … what kind of promise? … work hard … her promise that she will work hard for the project … taunt somebody about something, to upset somebody by laughing at him, … taunt … Um, why? … When would one taunt another? … Oh, when Mary did something wrong … so John taunts Mary about her breaking the promise.
The following interview extract of another participant also demonstrates the use of chunking. Moreover, this participant explicitly recalled that it was the sentence he generated for assiduous that prompted his creation of the context for pernicious and taunt.
When I tried to construct a context for pernicious, I spontaneously recalled the sentence I wrote for assiduous. Um, ‘I work assiduously for the project because I have great interest in it.’ And the picture that I stayed up very late at night working for the project was in my mind, and I know that was unhealthy … harmful. So I wrote the sentence of ‘It is pernicious to sleep late and eat unhealthy food.’ … And when I wrote the sentence for taunt, again, I recalled that I worked hard for the project because I dislike failure. I’m afraid of being laughed at … um, of being taunted.
Abundant examples of similar supporting evidence can be found in the self-reporting data; hence it is clear that the use of chunking was involved in the processing of target words by participants doing sentence-writing (for more examples, see Zou, 2012).
Further, the data make clear that the use of chunking contributed significantly to the participants’ success in word learning. With the successful recall of the meaning of a target word, the meanings of other words in the same chunk were triggered by the association of the words with each other. The following think-aloud transcript of a participant provides evidence for this, revealing that his recollection of the meaning of lassitude resulted from the retrieval of the sentence he generated for it, the recall of which was prompted by recalling the sentence that shared the associated context.
I got the meaning of lassitude, because I recalled the sentence I created for it, and it is the sentence I created for assiduous that prompted my recollection of the sentence for lassitude, because I wrote that ‘lassitude results from a long period of hardworking’.
Another participant also stated that he recalled the meaning of apprehensive because it was in the same chunk as renege.
Actually … Um, I originally thought that I could not recall the meaning of apprehensive, but when I memorized that I had generated the sentence for renege by writing that ‘I’m apprehensive that he will renege on his promise’, I brought into mind its meaning.
One participant even pointed out directly that words sharing associated contexts were easy to remember because each readily cued the recall of the others.
My memory of the meaning for taunt was evoked by divulge, because the contexts I created for these two words were closely associated to each other … So one word induces the memory of the other easily.
Therefore, it is reasonable to conclude that the use of chunking contributed significantly to the effectiveness of sentence-writing in promoting word learning.
b Use of chunking by participants doing composition-writing
The analysis of the thinking processes of participants doing composition-writing also reveals that associated semantic contexts of target words were generated, and this contributed to word learning. Supporting evidence can be found in the following transcripts.
Um, maybe I can, um … put ostensible and lassitude together. It can be, the ostensible reason is … people feel tired … Um. The ostensible reason for procrastination is lassitude, so people do not feel like working … but … However, the real reason is … people do not want to work hard … assiduous, people don’t want to be assiduous.
I remembered firmly the three verbs because my whole composition was developed from a story constructed by these three words … Tom taunted Jim, so Jim was very angry. So Jim reneged on his promise of keeping a secret for Tom and divulged this secret to Ann.
c Differences in the use of chunking by participants doing sentence-writing and composition-writing
There was, however, a marked difference in the manner in which participants doing sentence-writing and composition-writing used chunking. The former group made use of chunking spontaneously, whereas the latter did so systematically. Participants writing sentences only needed to create contexts for individual target words rather than contexts that connected a target word with previously generated contexts. Accordingly, none of them intentionally forced themselves to devise a context that had to connect with others. If they happened to get an idea that easily associated different words within the same context, a chunk might be produced, whereas if no such idea came to them, they made no special effort to generate a context that suited more than one target word. Evidence of this can be found in the following extract.
I did not bother conceiving a context that could connect trait with other words, because it is not required. The reason that I connected divulge and pernicious is that I happened to get the idea that it is pernicious to divulge national secrets.
Even when participants writing sentences had an idea of a context that could connect two target words, if their first attempts at writing a sentence with that context in English failed, they would quickly give up and conceive an unrelated but more easily produced context. Evidence of this can be found in the thinking processes of another participant who abandoned his idea of writing something about an accident in a Japanese nuclear power plant, because he did not know how to express the concept of nuclear leak in English:
I changed my mind while generating the sentence for pernicious, I did not write a sentence about the accident that happened in the Japanese nuclear power plant, although my sentences for apprehensive, divulge, and ostensible were all related to the accident, because I did not know how to write it … Um, I originally wanted to say that ‘the nuclear leak was pernicious’, but I did not know to express it in English … I did not know the word for nuclear leak.
However, participants who did composition-writing were required by the task to conceive an organized scenario that appropriately connected all target words; they therefore endeavored to relate every target word to at least one of the others through associated contexts. The following think-aloud extract shows that when a participant found that ostensible had not been used, he continued to contemplate how it could be associated with other words. This effort continued until an appropriate context had been successfully conceived.
OK. Among these adjectives, many of them can be used to describe a person, like apprehensive, assiduous, um … and indispensable … And this noun trait is about personality. So I can write them together … So a workaholic, he seldom has lassitude … but it’s pernicious … and, how about the three verbs: divulge, renege and taunt? … Um, he is a bad guy, he reneged on his promise and divulged a secret, so others laughed at him … Um … ostensible has not been used. Where can I put it? ostensible, seeming to be true, but not necessarily so … Where can I put it? Um, it can also be connected to the guy’s personality … reasons for his being a workaholic … the ostensible reason was that he is proud …
Based on the above discussion, it is justifiable to conclude that the use of chunking is a significant difference among the approaches to evaluation and, further, that it helps explain the different performances of participants doing evaluation through cloze-exercises, sentence-writing, and composition-writing. Moreover, as the use of chunking by participants doing sentence-writing differed from that of participants doing composition-writing, and the statistics show that the latter did significantly better than the former, it is reasonable to conclude that the difference in the manner in which these two groups used chunking helps explain the different effectiveness of the two tasks.
2 Hierarchical organization
Hierarchical organization is another method of information organization that could explain the difference in effectiveness of the three tasks for word learning and justify the proposal to reallocate involvement loads to evaluation for cloze-exercises, sentence-writing, and composition-writing. Hierarchical organization involves units and sub-units and various relations among different units. It has been found to be an effective mechanism for cueing memory of target information because it enables participants to systematically search their memories for the items (Anderson, 2010). The difference between hierarchical organization and chunking is that in a hierarchically organized structure, every entity, except the superordinate, is subordinate to a single other entity, whereas in chunking, items are independent of one another. Participants who organize information not only in terms of units but also in terms of hierarchical relations among various units can retrieve significantly more information than those who do not make use of hierarchical organization (Anderson, 2010; Myers, 2007).
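The structural distinction drawn above can be made concrete with a small sketch. It represents an organization as a mapping from each entity to its single parent and checks whether exactly one superordinate exists to which every other entity leads. The function name `is_hierarchy`, the labels ‘story’, ‘personality’, ‘events’, ‘chunk A’ and ‘chunk B’, and the particular groupings of target words are all hypothetical and serve only as illustration.

```python
def is_hierarchy(parent_of):
    """True if there is exactly one superordinate and every other entity
    leads to it through a chain of single parents."""
    roots = [node for node, parent in parent_of.items() if parent is None]
    if len(roots) != 1:
        return False
    for start in parent_of:
        node, seen = start, set()
        while parent_of[node] is not None:
            # Reject cycles and parents that are not themselves entities.
            if node in seen or parent_of[node] not in parent_of:
                return False
            seen.add(node)
            node = parent_of[node]
    return True


# Hypothetical composition: chunks are subordinate to one overarching story.
composition = {
    "story": None,
    "personality": "story",
    "events": "story",
    "assiduous": "personality",
    "trait": "personality",
    "renege": "events",
    "divulge": "events",
}

# Hypothetical sentence set: two chunks exist but nothing connects them.
independent_chunks = {
    "chunk A": None,
    "chunk B": None,
    "assiduous": "chunk A",
    "lassitude": "chunk A",
    "renege": "chunk B",
    "taunt": "chunk B",
}

if __name__ == "__main__":
    print(is_hierarchy(composition))
    print(is_hierarchy(independent_chunks))
```

In this representation, independent chunks fail the check only because they lack a shared superordinate, which is the property that distinguishes the two forms of organization here.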
In the context of this research, the participants’ coherent connection of various chunks through systematizing their hierarchical relationships is regarded as a sign of their use of hierarchical organization. An analysis of the thinking processes of participants doing cloze-exercises, sentence-writing, and composition-writing shows that the structuring of information through hierarchical organization was applied by participants doing composition-writing, but not those doing cloze-exercises or sentence-writing. Although participants writing sentences processed information about target words through the use of chunking, the chunks of target words were independent and shared no hierarchical relationship. Thus, no hierarchical organization was involved.
Figure 1 illustrates the sentences written by one female participant, and the following extract from her interview transcript demonstrates the use and non-use of chunking by the participant as well as the lack of hierarchical organization in her use of chunking. It can be seen that among the 10 sentences, only two chunks involving six words were developed, while the other four sentences were completely independent. She did not attempt to analyze the possible hierarchical relationships among the target words so as to connect them. Some sentences happened to share associated contexts and were connected by chunks, but others were not. Also, the two chunks were unrelated not only to one another, but also to the other four independent sentences. The participant simply created contexts for target words one by one without systematic organization; hence the chunks and the separate sentences were independent and shared no relevance to one another.
The idea of systematically organizing all the 10 target words has never come to me. I simply wrote sentences for the target words one by one. Some sentences are associated because it just happened that I conceived those associated contexts. But I did not intend to do that, so many of them are not … And the associated contexts are separately developed, so they have no connection to one another, I mean, the chunks are developed by chance … I did not have an organization of them.

Figure 1. Chunks used by a participant who did sentence-writing.
Many other participants doing sentence-writing thought in a similar way to the above participant. The following extract reveals that another participant did not attempt to systematize the relationships among target words through hierarchical organization.
I simply conceived contexts for each target word for its own sake. I never bothered working out what logic relationships these contexts or words might have with one another.
It is therefore evident that participants doing sentence-writing did not make use of hierarchical organization while generating original sentences.
In contrast, participants doing composition-writing did make use of hierarchical organization. This can be seen in Figure 2, which shows the composition written by a male participant, and in the following extract from his interview transcript. The participant systematically developed three chunks that connected all target words, and constructed a hierarchical organization of the three chunks by analyzing the relationships among the chunks so as to associate them.
Firstly, I put assiduous, trait and lassitude together because people with the characteristics of hardworking are easy to feel tired. Then, I put ostensible, pernicious, apprehensive and indispensable together, because they are all adjectives … and they are of parallel relationship. Um, then, finally, I put taunt, divulge and renege together … One tells another a secret, then he is afraid that he may not keep the promise, or even taunt him.

Figure 2. Hierarchical organization used by a participant who did composition-writing.
The attempt to join the three chunks indicated the participant’s intention to make use of hierarchical organization through identifying the logical relationships among these chunks. It is the processing of possible connections among target words and the chunks constituting them that helped him organize the whole composition as a coherent piece of writing. The identification of the causal relationship between Chunk 1 and Chunk 2 and the illustrational relationship between Chunk 2 and Chunk 3 enabled him to build a network of these words through an organized hierarchy.
Further evidence can be seen in the following think-aloud transcript, which shows an identification of a relationship among target words and their corresponding and associated contexts:
Um, to compare with people who procrastinate, there can be people who are hardworking and careful, assiduous … And, I can say something about their traits … To say something about one’s personality, I can use indispensable … Um, in spite of tiredness, lassitude, they try to manage everything well … and say they are indispensable … I can use trait there … The three verbs … Oh, I see. I can use them to give advice from the perspective of assiduous people to people who procrastinate. Firstly, do not try to break a promise, use renege, one should do what he promised … Secondly, use taunt, you delay something and fail to obey your promise and taunt yourself seriously … Then, divulge, tell others that you always procrastinate, and ask them to supervise you.
The following extract from another participant shows that the use of hierarchical organization helped the participant identify the relationships among different target words in order to recall several words that shared associated semantic contexts with the first word that could be recalled:
I can recall the meanings of these words, because … when I attempted to recall their meanings, I would think about the composition I wrote. Once I brought the composition to mind, I was able to work out the words’ meanings … Um, moreover, as these words are connected to each other, the recollection of one of them would help recall the others that share associated contexts. I wrote the composition myself, so it is easy for me to remember it clearly.
It is clear that the use of hierarchical organization was involved in the processing of target words by participants doing composition-writing, and that hierarchical organization contributed significantly to their success in word learning. Such results are in accordance with Anderson (2010), who noted that participants who organized target words into meaning hierarchies by using associative networks had better word learning than those who used the same words in random combination. The hierarchical organization of relationships among target words enabled participants to systematically search their memories for target items in terms of their connections to each other. Hierarchical connections may also inter-activate each other and provide participants with a large number of cues for recall.
3 Pre-task planning
A third possible explanation for the diverse effectiveness of the three tasks is the different degrees of pre-task planning that are induced by them. As the participants doing the cloze-exercises were given contexts for the target words and hence did not feel required to do anything more than compare different words to decide how to place them in the blanks provided, they did not engage in any pre-task planning. However, the participants who did sentence- and composition-writing needed to generate original contexts, and before they wrote down any text on the paper, they created potential scenarios incorporating the target words in a virtual mental space. These participants, therefore, were likely to have practiced using the target words twice: virtually in their heads during pre-task planning, and actually on paper while writing the text down. This double practice, which can also be considered a type of rehearsal, facilitates word learning according to Hulstijn (2001).
Nevertheless, the pre-task planning induced by sentence-writing is less demanding than that by composition-writing because the latter requires participants to think of a coherent story in which all obligatory words would fit, while the former does not. A further inference is that the pre-task planning induced by sentence-writing is at the sentence level, while that by composition-writing is at the composition level and involves more elaborate processing. This may also explain why composition-writing was the most effective among the three tasks.
4 Time-on-task
An alternative explanation of the findings may be the influence of time-on-task. The average completion time of participants doing the cloze-exercises was around 30 minutes, and that of the participants doing each of the two writing tasks was approximately 35 minutes. Whether such differences had a determining effect on the participants’ learning of the target words is uncertain, as this was not the focus of this research and was not explicitly investigated. Existing studies, however, have shown conflicting results with regard to the influence of task completion time on task effectiveness. Some found a negative correlation between study time and subsequent retention (e.g. Erten & Tekin, 2008; Nakata, 2008); some noted a positive correlation (e.g. Folse, 2006; Keating, 2008); while others have argued that time-on-task had no relation to task effectiveness (e.g. Chen, 2002; Hill & Laufer, 2003).
5 Length of the written products
In addition to the above discussions on the possible explanations of the different effectiveness of the three tasks, it is worth noting that each task produced texts of different lengths. The length of the text for the cloze-exercises was 517 words, the average total number of words written in the sentence-writing task was 153 (with a standard deviation of 30.52), while that by the participants who did composition-writing was 138 (standard deviation 50.19). Such differences, however, are unlikely to be the main determinant of the learning outcomes of the participants, especially since the major factor that affects the lengths of the texts is the richness of the transitional and contextual language, rather than the number of words that are combined with the target words. Unlike the given text for the cloze-exercises, which includes rich background information and transitional language, the participant-generated sentences and compositions did not contain a great deal of language aimed at building coherence and cohesiveness (see Figure 3).

Figure 3. Comparison of the texts of the three tasks.
This is perhaps due to the restriction of the English proficiency levels of the participants and the fact that they were reluctant to do more than the minimum required by the tasks. The composition-writing group produced even less language than the sentence-writing group as composition-writers often used several target words in one sentence, and did not try to achieve smooth transitions between different ideas.
6 The augmented evaluation framework
Based on the three essential differences among the three tasks (whether chunking, hierarchical organization, or pre-task planning is involved or not), I propose an augmented evaluation framework to differentiate various approaches to evaluation and to identify the degrees of prominence that should be allocated to them.
a Moderate evaluation: the phrase level
Evaluation induced by a task is moderate when the processing of target lexical items (words, phrases, or idioms) entails simply comparisons among several target items or several senses of one target item to see which one fits a given context best. The focus of evaluation induced by such tasks is at the phrase level, as the contexts are given and participants do not need to generate original contexts themselves. No chunking, hierarchical organization, or pre-task planning is involved in this case. Consulting a dictionary about a polysemous word or doing cloze-exercises are typical examples of tasks inducing moderate evaluation.
b Strong evaluation: the sentence level
Evaluation induced by a task is strong when the task involves the generation of original sentences by using target lexical items (words, phrases, or idioms). The evaluation induced by such tasks is at the sentence level as contexts are not given but must be generated by participants. These tasks induce chunking and pre-task planning at the sentence level, but the chunking units are usually independent of each other. Sentence-writing is a typical example of a task inducing strong evaluation.
c Very strong evaluation: the composition level
Evaluation induced by a task is very strong when the task involves the generation of a composition with original contexts that coherently associate all target lexical items (words, phrases, or idioms). The focus of evaluation induced by such tasks is at the composition level because participants doing this task not only need to create original contexts but also have to ensure that all these contexts are associated coherently. Chunking, hierarchical organization, and pre-task planning at the composition level are all involved in this case. Composition-writing is a typical example of a task inducing very strong evaluation.
This augmented evaluation framework is consistent with the work of Joe (1998) who applied ‘generative learning theory and the depth of processing theory to incidental vocabulary learning’ (p. 358) and found that ‘higher levels of generation producing greater gains for previously unknown words’ and ‘greater use and retrieval of the target form in recall is likely to strengthen the learning pathway’ (p. 375). Specifically, the findings of the present research are consistent with Joe’s (1998) scale of generativity, wherein tasks with moderate evaluation (e.g. cloze-exercises) entail no or low generation as only small grammatical or inflectional changes may be involved; tasks with strong evaluation (e.g. sentence-writing) lead to reasonable or high generation as the use of new collocations or stretching of meanings is induced; and tasks with very strong evaluation (e.g. composition-writing) induce high generation for it involves using other words and stretching meanings.
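The allocation rule described in this section can be summarized as a small decision procedure. The following Python sketch is offered only as an illustration, not as part of the study's method: the names `EvaluationFeatures`, `evaluation_degree`, and `involvement_load` are hypothetical, the fallback for feature combinations that the framework does not describe is an assumption, and the need and search values in the usage loop are placeholders rather than values reported in this research. Tasks that induce no evaluation at all are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationFeatures:
    """Processing features that a task induces, per the augmented framework."""
    chunking: bool
    hierarchical_organization: bool
    pre_task_planning: bool


def evaluation_degree(f: EvaluationFeatures) -> int:
    """Return 1 (moderate), 2 (strong), or 3 (very strong) evaluation."""
    if f.chunking and f.pre_task_planning:
        # Hierarchical organization separates the composition level
        # from the sentence level.
        return 3 if f.hierarchical_organization else 2
    # Assumption: any combination not described in the framework is
    # treated as moderate evaluation.
    return 1


def involvement_load(need: int, search: int, evaluation: int) -> int:
    """Involvement load as the sum of the three component degrees."""
    return need + search + evaluation


# Feature profiles of the three tasks as described in the text.
TASKS = {
    "cloze-exercises": EvaluationFeatures(False, False, False),
    "sentence-writing": EvaluationFeatures(True, False, True),
    "composition-writing": EvaluationFeatures(True, True, True),
}

for name, features in TASKS.items():
    degree = evaluation_degree(features)
    # need=1 and search=0 are placeholder values, not taken from the study.
    load = involvement_load(need=1, search=0, evaluation=degree)
    print(f"{name}: evaluation {'+' * degree}, illustrative load {load}")
```

Encoding the three features separately, rather than mapping task names directly to degrees, keeps the sketch applicable to other tasks whose feature profile can be established.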
V Implications
The results of this study demonstrate that writing exercises with target words are highly effective in word learning, as such writing tasks involve pre-task planning and require systematic organization of information about target words via chunking or hierarchical organization. Further, cloze-exercises – which entail no chunking, hierarchical organization, or pre-task planning – are significantly less effective. This is consistent with psychological studies that have shown that information organization via chunking and hierarchical organization facilitates information memorization (e.g. Myers, 2007). Hence, it is suggested that effective word learning should involve writing exercises which engage learners in pre-task planning and in organizing target items via chunking, or via chunking together with hierarchical organization. Self-learners are also encouraged to spend time and effort on writing tasks which involve new vocabulary. In terms of teaching materials, reading-based exercises will be more effective if they are complemented with writing tasks using target vocabulary, especially if the writing tasks require chunking, hierarchical organization, and pre-task planning.
VI Conclusions
This research re-examined the involvement load hypothesis with the aim of further developing its allocation of involvement loads to evaluation. The results partially support the hypothesis in that the two writing tasks, which were accorded greater involvement load according to the matrix of the original model, were significantly more effective than the cloze-exercises, which were accorded a lesser load. However, results inconsistent with the involvement load hypothesis were also found: composition-writing led to significantly better word learning than sentence-writing even though they were hypothesized to induce the same load. The research findings can be explained in terms of the different information organization methods and degrees of pre-task planning these tasks entailed. Correspondingly, I propose an augmented evaluation framework, suggesting that evaluation induced by the cloze-exercises should be ‘moderate +’, evaluation induced by sentence-writing ‘strong ++’, and evaluation induced by composition-writing ‘very strong +++’.
Owing to limits of time and resources, this study has some limitations that may affect the generalizability of the results, such as its scope, the short time gap between posttests (one week), and the possible influence of experimental conditions. For instance, as the research concentrated on the participants’ generation of original contexts using target words, the appropriateness of the resulting texts of the sentence-writing and composition-writing tasks in terms of connotations, collocations, and grammatical accuracy was not taken into consideration. Finally, it should also be noted that the notion of ‘composition-writing’ in the experiment was one that required the incorporation of certain target words in the writing, whereas the term ‘composition-writing’ generally entails students deciding by themselves what words to use.
Despite the limitations, the very strong correlations between the observed results, which are supported by the think-aloud protocols and interview data, and their explication via the cognitive science concepts of information organization (i.e. chunking and hierarchical organization) suggest that further research along these lines would be productive, especially with the aim of improving the generalizability of results.
Appendix
The reading text.
Dealing with Procrastination

Procrastination refers to the act of delaying the work you should do to a later time. It is wasting time when you have some work to do, but choose not to do it early. Most procrastinators do not feel that they are doing this on purpose. Instead, they feel that they really tried to do the work. But they could not start because there are too many things out of their control. So they have a long list of [1, ] reasons, such as ‘I did not have time’, ‘I had to attend a wedding’ or ‘I had other important things to do’. However, these surface reasons are not true.

When procrastination becomes a habit, it is [2, ]. If you procrastinate, you may often find yourself not having enough time to do a satisfying work. This can make other people unhappy and get a bad impression of you. Habitual procrastination can even damage your friendships. As you always [3, ] on your promises to complete work on time, your friends may no longer trust you. Thus it is very necessary to know the causes of procrastination and learn to deal with it.

Procrastination is often caused by a real or imagined fear or worry. For instance, you might delay preparing for an oral presentation, because you are [4, ] that you will not be able to remember the entire speech. You may be so worried about doing a bad job that you decide not to work on it until the last minute. Being a perfectionist is a main [5, ] that causes fear and anxiety. When you imagine yourself making an English presentation, are you comparing yourself to great speakers? Or are you picturing that others [6, ] you about your accent? Rather than worry yourself with these thoughts, think of specific ways to improve the performance, may help to lessen performance anxiety.

‘Lack of motivation’ may also cause procrastination. If you are forced to learn a subject you are not interested in, you may find yourself wasting time instead of being [7, ]. Even if you know that the subject can help you get a good job, you will not work hard. It is not easy to think carefully about something you have no interest in.

Feelings of [8, ] can cause procrastination, too. This often happens when you keep on pushing yourself very hard without getting any rest. If so, you may experience a state of tiredness and feel unable to focus on any work. Learning to balance your time can be helpful in preventing this.

Sometimes you put off doing something because you do not know how to do it. For example, if you start doing a job that requires collecting data and creating graphics, having the right skills is [9, ]. Knowing how to do a task before you begin it is very important. Sometimes it is difficult to ask for help and sometimes it is even harder to realize that you need help. Being able to [10, ] personal limitations and ask for help is a skill we need to learn.

| Target word | Gloss |
|---|---|
| apprehensive | (adj.) worried that sth. unpleasant may happen |
| assiduous | (adj.) hardworking and careful |
| divulge | (vt.) [divulge sth. to sb.] to tell others information that should be secret |
| lassitude | (n.) a state of tiredness |
| ostensible | (adj.) seeming to be true, but not necessarily so |
| indispensable | (adj.) absolutely essential |
| pernicious | (adj.) very harmful |
| renege | (vi.) [renege on sth.] to break a promise, an agreement, etc. |
| taunt | (vt.) [taunt sb. about sth.] to upset sb. by laughing at him |
| trait | (n.) a particular characteristic, personality or quality that someone or sth. has |
Notes. adj. = adjective, sb. = somebody, sth. = something, vi. = intransitive verb, vt. = transitive verb.
Acknowledgements
I would like to thank Alice Chan, James Lambert and Haoran Xie for their helpful and insightful comments. Special gratitude also goes to the editor, Frank Boers, and the reviewers of Language Teaching Research for their invaluable suggestions for improvement. This article is based on part of my PhD dissertation, which was submitted to City University of Hong Kong in 2012.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
