Abstract
The aim of this experiment was to investigate how glossing influences second language (L2) reading comprehension in relation to text difficulty and to local and global meaning representations. Fifty-eight undergraduate students were asked to read three texts (easy, moderate, and difficult) and, following each passage, answer twenty comprehension questions targeting local and global concepts in one of two conditions: first-language-glossed or unglossed. Half of the participants in each group were asked to think aloud while reading. The results revealed a significant difference between the performance of glossed and unglossed groups on comprehension of local concepts at all three difficulty levels. However, the impact of glossing on comprehension of global concepts was significantly influenced by text difficulty. The qualitative analysis of think-aloud protocols suggested a substantial difference in glossing functionality on fluency between the easy and the difficult texts. Furthermore, it is suggested that revisiting the glossing effect in combination with text difficulty on the reading product and underlying processes might reconcile some divergent hypotheses on glossing impact on fluency.
I Introduction
Reading comprehension is not a mechanical part-to-whole process, as postulated by bottom-up (i.e. text-driven) models, nor is it performed in a unidirectional order as it is envisaged by individual bottom-up and top-down (i.e. conceptually-driven) models; yet it is a constantly bidirectional process requiring synchronous lower-order (e.g. word-recognition) and higher-order (e.g. inference-making) processes which are supported by and regulated in working memory (e.g. Khalifa and Weir, 2009; Kim, 2020; Kintsch, 1988; Perfetti, 1985, 2007; Walczyk, 2000). Since second language (L2) readers’ working memory has a limited capacity, it selectively directs focal attention by constantly prioritizing between these cognitive processes. In this respect, Perfetti’s (1985, 2007) verbal efficiency model of reading identifies text difficulty (particularly in terms of lexical coverage) as the main determinant in fluent L2 reading. In easy texts, for instance, working memory balances the reading processes in favor of comprehension, leading to free fluent reading with the least attentional capacity dedicated to linguistic processing; conversely, increasing the text’s linguistic demands changes this balance dynamically in favor of lexical processing leading to intensive, scrupulous, and less efficient reading. Hence, L2 readers’ mental representation of the text might remain fragmentary and disconnected (Perfetti, 1985, 2007; Perfetti and Stafura, 2014).
Traditionally, one technique that serves to help struggling readers establish an integrated text comprehension by overcoming their vocabulary limitations is marginal glossing. By offering extra information in the margin of a text, this adjunct micro-level technique can help L2 readers understand unknown words and avoid inaccurate guessing. Moreover, glosses might spare L2 learners the frustration of continual recourse to dictionaries. Nonetheless, given the muddled picture painted by previous studies, it remains poorly understood whether glossing facilitates or hinders fluency, either by reserving attentional resources for higher-order comprehension or by distracting them from it (e.g. Bowles, 2004; Johnson, 1982; Jung, 2016, 2017, 2020; Ko, 2005; Martínez-Fernández, 2008, 2010).
One plausible explanation for the conflicting effects of reading adjuncts (here glossing) might lie in the Rand Reading Study Group’s (Snow, 2002) perspective, which assumes that the success of any intervention in reading comprehension varies depending on other factors such as text, task, and learner variables. Accordingly, in order to determine the optimal level of glossing effect, it is necessary to consider the interplay between glossing and such variables (Jung and Révész, 2018). In particular, various levels of text difficulty (i.e. easy, moderate, and difficult) affect the aforementioned push-pull competition between lexical and comprehension processes differently and, as a result, regulate the product of reading. Nevertheless, to the best of the researchers’ knowledge, previous glossing studies have not accounted for glossing functionality in combination with text difficulty on the product and processes of L2 reading tasks.
II Review of literature
1 The effect of glossing on reading comprehension product
Previous studies have used different annotation types such as first language (L1) translation and L2 definitions (e.g. Ko, 2005, 2012), paper-and-pen and computer-based glosses (e.g. Bowles, 2004), fill-in-the-blank, and multiple-choice glosses (e.g. Martínez-Fernández, 2008, 2010; Nagata, 1999), single textual and multimedia glosses (e.g. Chun & Plass, 1996; Davis & Lyman-Hager, 1997) and glosses in different degrees of explicitness (e.g. Hulstijn & Laufer, 2001), to name a few, for different linguistic elements (e.g. vocabulary and grammar). When it is a question of glossing impact on L2 reading comprehension, the findings of some studies are not supportive of the facilitative role of glossing on reading comprehension (e.g. Davis & Lyman-Hager, 1997; Jacobs, Dufon, and Hong, 1994; Jung, 2016; Johnson, 1982; Pak, 1986). For instance, in an early study, Johnson (1982) found no trend in favor of learners who read the text with annotated definitions of some unknown words on a written recall posttest. The researcher proposed that glossing might preclude L2 readers from fluent reading, predisposing them to focus on linguistic elements of the text and thus proceed reading the text from a bottom-up standpoint. Later studies by Pak (1986) and Jacobs et al. (1994) supported this conclusion. The former indicated that the reading ability, contrary to the marginal definition of target words, had an effect on readers’ comprehension, as measured through a cloze test. The latter found that neither L1 nor L2 glossed groups outperformed the control group on general comprehension, as measured by written recall protocols. Finding a trend in favor of participants with higher proficiency levels, the researchers concluded that in the two ends of the difficulty continuum (i.e. easy and difficult texts), glossing does not promote comprehension, being either superfluous or inadequate. 
In the same vein, in Davis and Lyman-Hager (1997), the effect of different glossing conditions (L1-, L2- and no-glossed) on comprehension of a literary excerpt (i.e. a descriptive diary) was investigated. Although 85 percent of participants consulted L1 glosses, the researchers found no effect for glossing on reading comprehension, as measured by written recall protocols and a multiple-choice test. The researchers mentioned that the potential interplay of glossing and text difficulty might have clouded the glossing effect on text comprehension. Finally, in a recent study, Jung (2017) found no effect for glossing (L1 gloss vs. no-gloss) on comprehension of two relatively easy expository texts, as measured by a multiple-choice test. Jung, the same as Davis and Lyman-Hager (1997), referred to the potential interplay of text difficulty with glossing effect as an explanation for the observed findings in her study and suggested that future study might shed light on gloss effects in easy and difficult texts through accounting for different comprehension levels.
Conversely, several studies found a facilitative role for glossing in reading comprehension (e.g. Bell and LeBlanc, 2000; Bowles, 2004; Chun & Plass, 1996; Davis, 1989; Jacob, 1994; Ko, 2005). Davis (1989), for instance, indicated that learners who read a glossed text (a literary story), accompanied by questions-and-comments in advance, outperformed the learners who read the same text with no aid on a written recall posttest. In a follow-up study Jacob (1994) found similar results in favor of the L1 glossed group and concluded that marginal annotations, rather than distracting L2 readers, assist fluent reading comprehension. Ko (2005), on the other hand, showed that between L1 and L2 gloss groups, only the participants in L2 glossed group outperformed the unglossed condition on a 25-item multiple-choice posttest. This finding was not supported by Bell and LeBlanc (2000), who showed in a computerized experiment that both gloss types (L1 vs. L2) correlated with improved comprehension. In response to the use of computerized annotations, Bowles (2004) investigated the impact of computerized and paper-and-pen glosses on reading comprehension as measured by twelve multiple-choice items. The findings showed glossed groups in both computerized and paper-and-pen conditions accomplished the posttest significantly better than the control group, while there was no statistically significant difference between the computerized and paper-and-pen conditions.
In sum, despite the growing body of work in this research strand, there is a critical limitation inherent in previous studies. Whilst some studies found glossing to be an effective strategy (e.g. Jacob, 1994; Ko, 2005), further follow-up investigation is needed to show whether this effect is merely due to enhanced perception of local ideas (glossed items) or whether the gloss effect has also permeated to global ideas (unglossed ideas) through inferential comprehension. The same is true when no effect is found. For instance, Jung (2017) suggests that the idea type (i.e. glossed and unglossed ideas) might have acted as a confounding variable in her study; in other words, the absence of an effect for glossing in the comprehension of the two easy texts might be due to the interference of glossing with the fluent comprehension of global ideas.
2 The effect of glossing on reading processes
Lomicka (1998) employed process measures to investigate the effects of textual and multimedia hypertexts on text comprehension in a computerized study. Twelve beginner French learners were asked to read aloud a poem, and their verbalization of inferences (i.e. paraphrases, predictions, and explanations) was coded as a measure of text comprehension. Despite learners’ tendency toward consulting textual glosses (L1 translation of target word), neither textual nor multimedia annotations resulted in improved comprehension.
In a later study, Ko (2005) targeted the beneficial effects of glossing on reading strategies by asking twelve participants to think aloud while reading a non-fictional story. Think-aloud protocols indicated an improvement in employing high-level strategies by participants in both (L1 and L2) glossed conditions. However, due to not dividing comprehension levels, Ko failed to indicate whether these heightened high-level strategies led to higher-level comprehension in L2 reading.
As a follow-up, some researchers, such as Guidi (2009) and Martínez-Fernández (2008), have intentionally differentiated comprehension levels in relation to glossing impact in their research designs. The former assigned sixty-five college students to four experimental groups based on whether they read an L1-glossed or unglossed text and whether they thought aloud or not; participants read a ‘fairly easy’ (p. 145) text, a description of Argentinians’ common customs in a narrative sequence (which was well-organized and contained clear transitions), and answered ten multiple-choice questions targeting comprehension of glossed items (i.e. local items). Guidi used think-aloud protocols as qualitative evidence for investigating the glossing impact on global comprehension. The latter statistically estimated the comprehension of both local concepts (i.e. concepts expressed by glossed items) and global concepts (i.e. the concepts that are not expressed by glossed items) by developing ten open-ended questions for each meaning representation. Not surprisingly, the two studies reached compatible findings. Glossing was shown to be a facilitative strategy for recalling local ideas but, unlike in Ko (2005), had no effect on activating high-level strategies or on global comprehension.
Even though Martínez-Fernández (2008, 2010) has separated comprehension levels to investigate glossing functionality on their underlying processes (i.e. word recognition vs. comprehension), the interplay of glossing with the effects of other task-related factors such as text type (Yoshida, 2012) and text difficulty (Horiba, 2000) on fluency was not adequately addressed in these studies. In coherently organized genres such as the story in Martínez-Fernández (2008), readers might better tolerate the difficulty of linguistic items in the text because they can still understand episodically related events. Consequently, further studies are needed to investigate whether glossing would affect global comprehension differently with more difficult texts or with genres in which fluency is restricted.
3 Text difficulty, fluency, and comprehension
From a cognitive information processing standpoint, reading processes include two broad essential components, namely ‘lexical access’ and ‘comprehension’ (Perfetti, 1985, p. 4). Lexical access embraces individual word identification in terms of their underlying semantic meaning and surface formal properties such as pronunciation and orthographic features. Text comprehension includes propositional encoding, which refers to encoding individual sentences, and propositional integration, which refers to connecting individual sentences to develop a whole in the form of a coherent text. Lexical access is addressed as lower-order processing, and reading comprehension is referred to cognitively as higher-order processing. Fluency is a key element in comprehension which links lower-level word recognition to higher-level comprehension (National Reading Panel, 2000). To complete the reading task efficiently, readers require, at least, a minimum degree of automaticity (i.e. fluency) on the part of the lexical access processing (Schreiber, 1980), which allows readers to discharge available cognitive and attentional resources from lower-order processes and accommodate more of higher-order comprehension. An increase in cognitive demand of lexical access shifts the attention from higher-level comprehension to lower-order linguistic processing. Thus, if the linguistic difficulty of a given text (i.e. lexical access) exceeds the readers’ existing cognitive resources, comprehension becomes unsuccessful and inefficient. The present study investigates how assisting lexical access processing through micro-level aids such as glossing might influence understanding at the comprehension level as the texts’ linguistic-processing demands increase.
4 Taxonomy of text difficulty
There is no single way to determine the degree of difficulty of a given text for a particular population of readers. Traditional methods, such as lexical coverage and readability formulas, have defined text difficulty by relying on various vocabulary measures. Lexical coverage, for instance, reflects the percentage of words in a given text that the readers know. Laufer (1989) indicated that a minimum lexical coverage of 95% is sufficient for achieving adequate comprehension, defined as a score of 55%. Carver (1994) also found that in reasonably easy texts, nearly 0% are unknown words. In fairly hard texts, nearly 2% or more are unknown words, and in texts that are matched with readers’ current proficiency, about 1% are unknown words. On the other hand, Hu and Nation (2000) indicated that L2 learners need to understand 98% of lexical units in the text for ‘adequate’ comprehension. Later studies suggested that optimal and minimum comprehension require 98% and 95% lexical coverage, respectively (Nurmukhamedov and Webb, 2019). Readability indices, by contrast, point to sentence- and word-length difficulty and are measured via old-generation (e.g. Flesch–Kincaid, FOG, and SMOG) and new-generation (e.g. New Dale–Chall and Lexile) readability formulas. These tools use largely identical text variables, while the new-generation readability formulas employ more sophisticated algorithms for calculating text difficulty (Hiebert & Pearson, 2014). Although prevalent among researchers and commercial material publishers for several decades (Lupo et al., 2019), readability formulas have been criticized for being superficial, relying on only limited measures of vocabulary, and underestimating the difficulty of texts containing short sentences (Sheehan et al., 2014).
With advances in cognitive theories regarding how readers might process texts, some researchers have hypothesized that text difficulty should be defined in terms of inter-sentential components (e.g. cohesion) in a passage rather than merely measuring intra-sentential, word-related features (e.g. Britton & Gulgoz, 1991; McNamara & Kintsch, 1996). Cohesion can be described as a set of connectives that relate sentences to each other (Halliday and Hasan, 2013). Text Evaluator and Coh-Metrix are two software programs that measure respectively lexical- and deep-cohesion (Lupo et al., 2019).
The major weakness of the methods mentioned above is that they perform poorly when applied to web pages. This might be due to the web pages’ short length (often fewer than 100 words) and their inherent noise (Schwarm and Ostendorf, 2005). Some studies indicated that statistical language modeling (SLM) is a practical technique for calculating the probability of occurrence of a particular word or grammar form in web-based texts (Collins-Thompson & Callan, 2004). However, this technique requires a large corpus of texts that have already been grouped by difficulty level, as well as a computer scientist working alongside the reading researcher (Heilman et al., 2007, 2008).
In the present study, the aim is to investigate the glossing effect on reading comprehension when the automaticity (or fluency) of readers over lower-order operations decreases. In Perfetti’s (1985, 2007) verbal efficiency model, lexical units are assumed to be the building blocks of lower-order processes. Hence, the present study regulates text difficulty at the intra-sentential and particularly the word level using the traditional methods of lexical coverage and readability formulas. In so doing, by increasing text difficulty, the general criterion that the type of instructional support (here glossing) should be compatible with the type of difficulty (either at higher- or lower-level) in a given text is observed too (McDaniel & Einstein, 1989, 2004). Moreover, each difficulty level was validated by investigating students’ perceived difficulty (a subjective method proposed by Schraw, Bruning, & Svoboda, 1995) and by matching each difficulty level to the actual comprehension scores of participants in a pilot study.
5 The present study
In the present study, the term product refers to what proportion of ideas, regardless of being glossed or unglossed in the text, can be recalled in general post-exposure tests. On the other hand, the underlying processes refer to how glossing affects comprehension of glossed and unglossed ideas through accounting for different comprehension levels in quantitative post-exposure measurements as well as gathering qualitative evidence using think-aloud protocols to validate text-difficulty modifications. In so doing, according to Perfetti’s (1985) reading processes model, it is assumed that local comprehension (here understanding of glossed ideas) reflects the glossing impact on reading processes at the lexical access level while global comprehension (here understanding of unglossed ideas) reflects the glossing impact on reading processes at the comprehension level. Moreover, since code complexity and fluency are inversely related, the glossing impact on fluency is reconsidered in relation to different text-difficulty levels. To this end, the following research questions guided the present study:
Does L1 glossing affect the L2 reading outcome in texts with different difficulty levels (easy, moderate, and difficult)?
To what extent does L1 glossing have an impact on local and global comprehension levels in specific text-difficulty levels?
III Method
1 Design
The study used a mixed between-within participants (split-plot) design in which the participants were randomly assigned to the control (i.e. unglossed condition) and the experimental (i.e. L1 glossed condition) groups and read three passages followed by comprehension tests during the three treatment sessions.
2 Participants and context
The participants were 58 freshmen taking their general English course in two 90-minute sessions per week. Twenty-eight students were male, and the mean age was 18.5. The participants, majoring in either chemistry or physics, were chosen from an original pool of more than 150 students based on the following criteria: (1) not possessing prior knowledge of target items; (2) demonstrating English proficiency at pre-intermediate level (A2 in the Common European Framework of Reference for Languages, CEFR); and (3) not receiving explicit English instruction outside the school curriculum.
IV Materials and instruments
1 Placement test
In order to homogenize the participants, the Interchange placement test developed by Richards et al. (2008) was administered; it showed satisfactory internal consistency (α = 0.86). The test includes 70 items and, based on its standard scoring scale, scores between 18 and 23 are associated with the pre-intermediate level (A2 on the CEFR scale).
2 Glossed items
The target items were two grammatical forms: reduced adjective clauses comprising single verbal items (present and past participles) and past modal tense comprising grammatical structures (should have/ought to have + past participle). This means that the glossed items carried both grammatical and lexical meaning in this research. Each text contained twenty glosses, of which ten were past modal tense forms and the other ten were reduced adjective clauses. The target present and past participles in these two forms were selected based on the results of a vocabulary test in a pilot study (described later), and each appeared once in each passage. Each text was available in two forms: a text with no gloss and a text with L1 gloss (Persian translation of target words along with the structural meaning that each carried in the text). Two Iranian native speakers who were also proficient in English checked the accuracy of the translations. Each glossed item was colored and boldfaced, as shown in Appendix 1.
3 The pilot test of present and past participles
In the first phase of the study, a list was developed of the target present and past participles in the three passages, accompanied by their low-frequency synonyms (which did not belong to the 5,000 most frequent English words according to the Oxford word frequency list). One year prior to the main experiment, twenty randomly selected pre-intermediate students in the same general English courses took a vocabulary test requiring them to provide the L1 translation of the verbs in the list; the target items (i.e. present and past participles) were chosen from among those items whose meanings none of the participants knew. In so doing, some of the participles were kept as they appeared in the original passages, and some were replaced with their low-frequency synonyms.
4 Targeted texts and text difficulty
The experimental texts were adaptations of three narrations from the Oxford Bookworms series. The texts were chosen with reference to two considerations: (1) unfamiliarity of the topics to the participants and (2) an adequate number of target grammatical items, or the texts’ capacity to be embedded with those structures (this criterion was set for the learning phase of this project). The three selected texts were classified as easy (i) when the text difficulty was suitable for learners’ current proficiency level (i.e. linguistic elements of the input impose the least decoding difficulty on the learners); moderate (i+1) when it was one level beyond learners’ current proficiency level (i.e. the linguistic difficulty (here lexical difficulty) of the input does not hinder general comprehension); and difficult (i+2) when it was two levels beyond learners’ current proficiency level (i.e. the linguistic difficulty of the input considerably hinders general comprehension). To reach these difficulty levels accurately, the procedure for corroborating the purported difficulty levels was twofold: modification of text features (readability and lexical coverage) and validation of those modifications in later pilot studies.
After three extracts had been selected from the storybooks, the Flesch–Kincaid readability score was calculated for each text. Text difficulty was modified by increasing or decreasing sentence lengths and exchanging short and long words to reach the required levels. This readability index can vary from 0 to 100, with lower values representing more difficult texts. Table 1 provides the readability scores for each text accompanied by the corresponding CEFR levels. The conversion of scores from the Flesch–Kincaid scale to CEFR proficiency levels is derived from https://linguapress.com/teachers/flesch-kincaid.htm (see also Flesch 1948).
Table 1. Readability indices with their parallel proficiency levels.
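As a rough illustration of how such an index is obtained, the sketch below assumes that the 0 to 100 score described above is the Flesch Reading Ease score (Flesch, 1948), which combines average sentence length with average syllables per word. The function name and the example counts are hypothetical, syllable counting is left to the caller, and the tool actually used in the study is not specified here.

```python
def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Flesch Reading Ease (Flesch, 1948): higher scores indicate easier texts."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts for two versions of a 400-word extract:
# longer sentences and longer words lower the score (i.e. a harder text).
easier_version = flesch_reading_ease(n_words=400, n_sentences=40, n_syllables=520)
harder_version = flesch_reading_ease(n_words=400, n_sentences=20, n_syllables=640)
print(round(easier_version, 1), round(harder_version, 1))
```

This mirrors the modification procedure described above: lengthening sentences and substituting longer words both push the score down toward the more difficult end of the scale.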
In the second step, based on Laufer’s (1989) finding that 95% coverage provides ‘adequate’ comprehension (i.e. a score of 55%), lexical coverage was set at 95% for the moderate text and, in the same vein, at 97% and 93% for the easy and difficult texts, respectively. To reach these difficulty levels and ensure that all participants lacked knowledge of a certain number of words at each difficulty level, some words in the texts were replaced with pseudowords (see Appendix 2). Moreover, since glossed items were selected from unknown present and past participles, the number of unknown words at each difficulty level was the number of glossed items plus the number of pseudowords (see Table 2).
Table 2. Descriptive statistics and measures of lexical coverage.
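The arithmetic behind this step can be sketched as follows. The sketch assumes that coverage is computed as known running words divided by total running words and that the twenty glossed items count as unknown; the 1,000-word passage length and the function name are hypothetical and are not taken from the study's texts.

```python
def pseudowords_needed(total_words: int, glossed_items: int,
                       target_coverage_pct: float) -> int:
    """Pseudowords to embed so that unknown words match the target coverage."""
    unknown_allowed = round(total_words * (100 - target_coverage_pct) / 100)
    needed = unknown_allowed - glossed_items
    if needed < 0:
        raise ValueError("glossed items alone already push coverage below the target")
    return needed


# Hypothetical 1,000-word passages, each containing 20 glossed items.
for level, coverage in [("easy", 97), ("moderate", 95), ("difficult", 93)]:
    print(level, pseudowords_needed(1000, 20, coverage))
```

Under these assumptions, lowering the coverage target by two percentage points adds twenty unknown tokens per thousand words, all of which must come from pseudowords because the number of glossed items is fixed.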
To ensure that all readers could understand the words in the passages other than glossed items and pseudowords, particularly in the easy and moderate texts, other difficult words were taught in advance during the course. The students’ knowledge of these words was assessed repeatedly prior to the experiment. Those students who had difficulty recalling some of the meanings were instructed again. Eventually, all participants in the experiment could easily remember these words and their meanings.
In the next phase, text difficulty was subjectively evaluated, and the readability and word coverage modifications were validated in a pilot study one semester before the implementation of the main experiment. Accordingly, fifteen pre-intermediate students in the same courses were taught in advance the vocabulary in the easy and moderate texts, except for pseudowords and glossed items. Then they were asked to read each passage and underline the unknown words. After reading each text, participants answered comprehension questions while being allowed to consult the text content. Finally, students subjectively rated ‘ease of comprehension’ (Schraw et al., 1995, p. 1) through two items (i.e. the passage was ……… to understand and the passage was ………… to recall) with six options in the post-reading questionnaires for each passage; the answers to these two items were significantly correlated. The analysis of participants’ ratings was entirely in line with the previously obtained readability and lexical coverage indices. Almost all A2-level participants in the sample rated the first reading as easy (99%) and the third as very difficult (98%) to understand. In the case of the second passage, their ratings fluctuated between the normal (74%) and the rather difficult (26%) options. In addition, participants underlined no word other than the target participles and pseudowords. They achieved comprehension scores of 26%, 58%, and 80% in the difficult, moderate, and easy texts, respectively.
The familiarity/unfamiliarity of the topics to the participants was checked through a post-reading questionnaire for each passage; the comprehension measures for the participants who had prior background knowledge of at least one of the topics were excluded from the following data analysis phase.
5 Reading comprehension measures
Various methods, such as cloze tests, written recall protocols, and multiple-choice tests, were employed to measure comprehension in previous studies. However, each of them had its own limitations for the purpose of the present study. For instance, the number of recalled ideas in production tests was very low in previous studies (i.e. around 33% in written recalls). On the other hand, using mere recognition tests such as multiple-choice questions could not cover all target ideas in the text because developing four options (one correct answer and three distractors) was not viable for some of the ideas (i.e. some options would have become redundant or could have led readers to the correct answers in the other questions). At the same time, reducing the number of multiple-choice options could increase the chance scores. Thus, various question types requiring both production and recognition, comprising short answer, multiple-choice, and true/false items, were employed to measure both local and global comprehension in this experiment (see Appendix 3 for examples). A 20-item test was developed for each individual passage. Half of the items in each test addressed the understanding of local concepts, represented by target items in the passage, and the other half addressed the global concepts, representing meanings and ideas that were not expressed by glossed items. Through a pilot-revise-pilot cycle, the quality of the reading comprehension items was examined in the same General English courses one semester before the main study. The internal consistency of the three tests was measured through Cronbach’s alpha (α1 = 0.89, α2 = 0.93, α3 = 0.84).
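For readers unfamiliar with the reliability index reported above, the following is a minimal sketch of the standard Cronbach's alpha computation, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The study itself used SPSS; the function name and the tiny score matrix below are hypothetical and serve only to show the calculation.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """scores[p][i] is participant p's score on item i (at least two items)."""
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with at least two items")
    # Variance of each item across participants.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of participants' total scores.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical right/wrong (1/0) answers of four participants on three items.
example_scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
alpha = cronbach_alpha(example_scores)
```

Alpha approaches 1 when items rise and fall together across participants, which is why values such as those reported for the three tests are read as evidence that each test's items measure a common construct.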
6 Questionnaires
Before the experiment, the participants answered a background questionnaire, which collected information about their demographic characteristics and prior English learning background. In addition, after each treatment session, they were asked to fill in a post-reading questionnaire in order to rate their perceived difficulty, as described previously, and indicate their familiarity with the topic of each passage.
7 Think-aloud protocols
Fourteen students in each group were prompted to produce concurrent verbal reports. First, they received explicit instructions to verbalize their thoughts while reading. Specifically, they were provided with examples of how to think aloud when they encountered unknown words. In addition, they were required to read the texts aloud in their own style and express anything that passed through their minds. They were specifically asked to avoid meta-talk, including explaining their thought and reading processes, giving justification for their choice of any reading strategy, and explaining their reasons for either attending or not attending to specific glosses, because this type of verbalization is more likely to affect reading performance (Bowles & Leow, 2005). Moreover, they were informed that they were allowed to verbalize their thoughts in either English or Persian.
8 Procedure
In the first session (week 1), all the participants took the placement test and completed the background questionnaire. In addition, to ensure their lack of knowledge of target participles, they took the same vocabulary test administered in the pilot study.
Three weeks later, they were randomly assigned to the L1-glossed and unglossed conditions and attended three sessions held in two successive weeks (weeks 4 and 5). They read one passage in every session and, after the passages had been collected, answered the reading comprehension questions followed by a post-reading questionnaire. The reading time for each passage was limited to 15 minutes. In both groups, half of the participants were instructed to think aloud while reading, and they had the same time (15 minutes) to complete their reading. Participants who thought aloud sat in different classrooms from the silent groups. They recorded their voices using headsets and voice recording programs on their cellphones. The examiners monitored their performance during reading and prompted participants who remained silent for a while to verbalize their thoughts and keep reading aloud. The reactivity of verbal reports on reading comprehension was investigated by submitting the comprehension scores of the glossed and unglossed conditions for the +TA and –TA groups to separate one-way ANOVAs. The results revealed that verbal reporting was not reactive on participants’ performance in comprehension tests in either the glossed or the unglossed condition. The text orders were counterbalanced (i.e. there were three orders for presenting texts within each of the glossed and unglossed conditions) to obviate ordering effects.
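The three presentation orders are not listed here. One common way to obtain three counterbalanced orders for three texts is a cyclic Latin square, in which each text appears once in each serial position; the sketch below illustrates that scheme as an assumption, not as the study's documented procedure. The function names and the round-robin assignment are hypothetical.

```python
def latin_square_orders(texts: list[str]) -> list[list[str]]:
    """Cyclic rotations: every text occupies every serial position exactly once."""
    return [texts[i:] + texts[:i] for i in range(len(texts))]


def order_for(participant_index: int, orders: list[list[str]]) -> list[str]:
    """Round-robin assignment of participants (within one condition) to orders."""
    return orders[participant_index % len(orders)]


orders = latin_square_orders(["easy", "moderate", "difficult"])
for idx in range(3):
    print(idx, order_for(idx, orders))
```

With three orders rather than all six permutations, position effects are balanced across texts, although immediate sequence (carry-over) effects are only partly controlled.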
To restrict potential contamination of results due to history effect, participants’ additional exposure to the target forms and embedded participles outside the experiment sessions, during the whole time in which the experiment was administered, was controlled through a post-reading questionnaire in the last session.
V Data analysis
1 Statistical analyses
The data were analysed using SPSS version 24 (IBM Corp., 2016). The reliability index for the different tests was calculated with Cronbach’s alpha. The level of significance for all statistical procedures was set at p < .05. The data were summarized using descriptive statistics, including means and standard deviations. To answer the research questions, several mixed-model ANOVAs were conducted.
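For readers who wish to reproduce the reliability index outside SPSS, the following is a minimal sketch of the standard Cronbach’s alpha formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the small score matrix are hypothetical and serve only as an illustration; they are not the study’s data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one row per participant,
    one column per test item), using sample variances."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item (column) across participants.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the participants' total scores.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical right/wrong (1/0) answers: 5 participants x 4 items.
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(f"alpha = {cronbach_alpha(scores):.3f}")
```

With real data, each row would hold one participant’s item-level scores on a given comprehension test.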
2 The think-aloud coding system
The think-aloud protocols were analysed with a coding system that represented reading processes at the two levels of lexical access and reading comprehension. The preliminary coding system was developed based on six verbal protocols drawn from the total of 39 protocols: two for the easy passage, two for the moderate passage, and two for the difficult passage. Two raters then independently coded all 39 concurrent verbal reports. The interrater agreement was 90%. The raters then discussed the 10% of comments that had been coded divergently in order to reach agreement.
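The interrater agreement figure can be illustrated with a simple percent-agreement calculation. The sketch below assumes that agreement was computed as the share of thought units assigned the same code by both raters; the text does not state which index was used, and the code labels and data are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of thought units given the same code by both raters."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both raters must code the same thought units")
    if not codes_a:
        raise ValueError("no thought units to compare")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


def disagreement_indices(codes_a, codes_b):
    """Positions of thought units the raters coded differently,
    i.e. the ones that would be taken to discussion."""
    return [i for i, (a, b) in enumerate(zip(codes_a, codes_b)) if a != b]


# Hypothetical codes for ten thought units (illustrative labels only).
rater_a = ["inference", "paraphrase", "reread", "skip", "inference",
           "link", "reread", "paraphrase", "skip", "inference"]
rater_b = ["inference", "paraphrase", "reread", "skip", "inference",
           "inference", "reread", "paraphrase", "skip", "inference"]

print(percent_agreement(rater_a, rater_b))     # 90.0
print(disagreement_indices(rater_a, rater_b))  # [5]
```

Percent agreement does not correct for chance; a chance-corrected index such as Cohen’s kappa would be an alternative, but it is not reported here.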
VI Results
Prior to addressing the research questions, although the participants were randomly assigned to both groups, the equivalence of groups in terms of proficiency scores was checked by conducting a one-way ANOVA. The results confirmed the absence of any difference between groups, F(1,56) = 0.1, p = .752.
Table 3 provides the descriptive statistics for comprehension scores by group. As can be seen, the percentages of mean scores in the unglossed condition (69.3%, 43%, and 21% for the easy, moderate, and difficult levels, respectively) are lower than those obtained in the pilot study. The reason for this decrease in the mean scores and their corresponding percentages might be that, in the main experiment, the texts were collected immediately after reading and were inaccessible to the participants while they were answering the comprehension questions. However, a reasonable difference between the mean scores is still observed across the three levels in the unglossed condition. Furthermore, the unglossed group performed better on global concepts than on local concepts at all three difficulty levels. Whether these observed differences are statistically significant and how glossing has affected them will be discussed in the following sections.
Table 3. Descriptive statistics for text comprehension.
1 The effects of glossing on reading outcomes in three levels of text difficulty
To answer the first research question, a mixed-model ANOVA was conducted on the scores of the three reading comprehension tests, with Glossing (i.e. glossed and un-glossed groups) as a between-participant variable and Text Difficulty (i.e. easy, moderate, and difficult) as a within-participant variable. The results yielded a significant main effect for Text Difficulty, F(2,112) = 423.889, p < .001, η2 = .883, a significant main effect for Glossing, F(1,56) = 43.778, p < .001, η2 = .439, and a significant interaction between Text Difficulty and Glossing, F(2,112) = 87.798, p < .001, η2 = .611. Bonferroni-adjusted post hoc tests were then conducted to locate the significant differences. They indicated that at two of the difficulty levels (i.e. the moderate and difficult passages, but not the easy passage, p = 1.000), the glossed group performed significantly better on the comprehension test than the unglossed group (p < 0.05). Also, although the differences between the mean scores at the different difficulty levels were statistically significant in the unglossed condition (p < 0.05), in the glossed condition no significant difference was found between participants’ performance on the comprehension measures of the easy and moderate texts (p = 1.000). Figure 1 illustrates the results obtained for text comprehension by glossing and difficulty level.

Figure 1. Interaction between glossing and textual difficulty on reading comprehension.
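The effect sizes reported for this analysis can be checked against the F ratios and their degrees of freedom. The sketch below uses the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error), which follows from the definition of F as a ratio of mean squares; the function name is an illustrative assumption, and the inputs are the values reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# (F, df_effect, df_error) as reported for the first research question.
reported = {
    "Text Difficulty": (423.889, 2, 112),
    "Glossing": (43.778, 1, 56),
    "Text Difficulty x Glossing": (87.798, 2, 112),
}

for effect, (f_value, df1, df2) in reported.items():
    print(f"{effect}: {partial_eta_squared(f_value, df1, df2):.3f}")
```

Rounded to three decimals, the computed values agree with the reported effect sizes (.883, .439, and .611), which suggests that the η2 values in this analysis are partial eta squared.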
2 The effects of glossing on reading comprehension in relation to local and global concepts
To answer the second research question, the scores on the three comprehension tests for the easy, moderate, and difficult texts were submitted to three separate ANOVAs. For each text, a mixed-model ANOVA with one between-participant variable (Glossing) and one within-participant variable (Type of Concept: Global vs. Local) was conducted on the reading comprehension scores. The findings are reported below in order of difficulty.
3 The easy text
The mixed-model ANOVA indicated no significant main effect for Glossing, F(1, 56) = 0.175, p = .677, partial η2 = .003, but a significant main effect for Type of Concept, F(1, 56) = 146.183, p < .001, partial η2 = .723, and a significant interaction between Type of Concept and Glossing, F(1, 56) = 77.155, p < .001, partial η2 = .579. The Bonferroni-adjusted post hoc tests indicated that (a) the glossed group outperformed the un-glossed group on the comprehension of local ideas (i.e. concepts that are conveyed by glossed items) (p < 0.05); (b) conversely, the un-glossed group outperformed the glossed group on the comprehension of global concepts (i.e. concepts that are not conveyed by glossed items) (p < 0.05); and (c) both the un-glossed and the glossed groups performed significantly better on global concepts than on local concepts (p < 0.05). Figure 2 illustrates the interaction between glossing and concept type on comprehension of the easy text.

Figure 2. Interaction between glossing and type of concept on reading comprehension for the easy text.
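The Bonferroni adjustment used in the post hoc comparisons can be sketched as follows. One common way of reporting it multiplies each unadjusted p value by the number of comparisons and caps the result at 1, which is why an adjusted value can appear as p = 1.000. The unadjusted p values below are hypothetical, since only adjusted results are reported in the text.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p values: each p is multiplied by the number
    of comparisons in the family and capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p values for a family of three pairwise comparisons.
raw_p = [0.01, 0.04, 0.50]
for raw, adjusted in zip(raw_p, bonferroni_adjust(raw_p)):
    verdict = "significant" if adjusted < 0.05 else "not significant"
    print(f"raw p = {raw:.3f}, adjusted p = {adjusted:.3f} ({verdict})")
```

Under this scheme, a comparison that is significant before adjustment (e.g. a raw p of .04) may no longer be significant afterwards, which makes the adjustment conservative.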
4 The moderate text
The mixed-model ANOVA for the moderate text indicated a significant main effect for Type of Concept, F(1, 56) = 8.287, p = .006, partial η2 = .129, a significant main effect for Glossing, F(1, 56) = 49.935, p < .001, partial η2 = .471, and a significant interaction between Type of Concept and Glossing, F(1, 56) = 121.147, p < .001, partial η2 = .684. The Bonferroni-adjusted post hoc tests yielded the following results. As with the easy text, (a) the glossed group outperformed the un-glossed group on the comprehension of local ideas (p < 0.05), and (b) the un-glossed group performed significantly better on global concepts than on local concepts (p < 0.05); but unlike with the easy text, (c) the glossed group performed significantly better on local concepts than on global concepts (p < 0.05). To elucidate these findings, the interaction pattern of glossing and type of concept on reading comprehension is illustrated in Figure 3.

Figure 3. Interaction between glossing and type of concept on reading comprehension for the moderate text.
5 The difficult text
The mixed-model ANOVA for the difficult text indicated no significant main effect for Type of Concept, F(1, 56) = 1.507, p = .225, partial η2 = .026, but a significant main effect for Glossing, F(1, 56) = 117.874, p < .001, partial η2 = .678, and a significant interaction between Type of Concept and Glossing, F(1, 56) = 29.906, p < .001, partial η2 = .348. The Bonferroni-adjusted post hoc tests revealed that (1) the experimental group significantly outperformed the control group in the comprehension of both local and global concepts (p < 0.05); and (2) the glossed group performed better on local than on global concepts, while the un-glossed group showed the opposite pattern (p < 0.05). The interaction pattern of glossing and concept type on reading comprehension is given in Figure 4.

Figure 4. Interaction between glossing and type of concept on reading comprehension for the difficult text.
In sum, glossing significantly boosted comprehension of local concepts at all three difficulty levels (i.e. easy, moderate, and difficult), which is consistent with previous studies that have generally reported favorable findings for glossing (Davis, 1989; Jacob, 1994; Ko, 2005; Bell and LeBlanc, 2000; Bowles, 2004; Guidi, 2009; Martínez-Fernández, 2010). Nonetheless, the effect of glossing on comprehension of global concepts is highly dependent on the difficulty level of the text being read: in the easy text, glossing had a detrimental impact on comprehension of global concepts, whereas in the difficult text, the comprehension of global concepts was significantly promoted in the experimental group. Moreover, local comprehension surpassed global comprehension in the glossed condition for the moderate and difficult texts.
6 Think-aloud protocols
To ensure the validity of the different text difficulty levels and to compare reading processes under the two glossing conditions (glossed vs. unglossed), the think-aloud protocols were transcribed and primarily examined for text processing at the level of lexical access (i.e. word identification). Two coders took part in interpreting what was stated, and at the same time what was not stated, in the thought units. The readers had rarely verbalized word identification processes, particularly when these were automatized. The remaining verbalizations were broken down into thought units; in addition to the previously discerned codes, eight codes were identified in the protocols at the level of comprehension processes. Each of these categories (at the levels of lexical access and comprehension) was explained in detail and often illustrated by an example to help the raters count instances of their occurrence accurately throughout the think-aloud protocols (see Table 4).
Table 4. Code frequency for verbal reports.
Notes. Processing level: A = Reading comprehension, B = Lexical access.
Overall, in the unglossed condition, the readers produced fewer instances of verbal report for the easy passage; at the same time, they frequently reported feeling exhausted and experiencing difficulty with the difficult passage. Moreover, the number of codes at the word identification level in the unglossed reading condition (n = 20, n = 141, and n = 193 for the easy, moderate, and difficult texts, respectively) validated the text difficulty manipulation, which aimed to increase the amount of attention dedicated to lexical access processing in working memory. In the glossed group, not surprisingly, the frequency of lower-level processes was noticeably higher than in the unglossed group, primarily because of reading glosses out loud and noticing target items. Interestingly, unlike in the easy and moderate texts, glossing induced more higher-level processes (especially inferential comprehension; n = 36) at the text difficulty level that was designed to be more challenging. In addition, for the code Expressing personal feelings, recalls for the easy passage yielded more comments than those for the moderate and difficult texts. Lastly, only when reading the easy text did participants report higher degrees of awareness (i.e. understanding) of target items.
VII Discussion
The results of this study revealed that glossing facilitates reading comprehension in difficult texts, which runs counter to the studies by Davis et al. (1997) and Jacobs et al. (1994), which suggested that glossing might lose its effect on comprehension when the readers’ proficiency lags far behind the difficulty level of the text. The effects found here can be explained with reference to the interplay of glossing with the specific type of processing that distinct genres call for. According to the Material Appropriate Difficulty (MAD) framework (McDaniel & Einstein, 1989, 2004), informationally-elaborated text types (such as expository or descriptive texts) call for individual-item processing (i.e. sweeping individual propositions and concepts in the text), and relationally-elaborated text types (such as narratives) call for organizational processing (i.e. a comprehensive understanding that results from connecting episodically related components of the text). In other words, in narratives, there is a consistent flowing coherence bolstered by massive relational propositions throughout the story, while in descriptions, abundant informational propositions might or might not connect to each other through intricate, diverse, and detached logical coherences. The text in the Davis et al. (1997) experiment, written in a diary form, was full of detailed descriptions and in essence informationally-elaborated. By contrast, the text in the present study, which narrated the story of a prisoner, contained abundant sequentially interconnected concepts and was by its very nature relationally rich. Thus, when the difficulty of a reading task increases (in terms of linguistic elements), the loose relations between propositions and the various types of coherence in informationally-elaborated texts do not allow the effects of glossing to extend to the understanding of other individual propositions.
In other words, the glossing effects are, at best, limited to understanding the meaning of target items, while the reading comprehension process is still entirely impaired by other diverse unknown propositions and vague coherences. According to MAD, these text types need individual-item processing, for which glossing provides neither sufficient background knowledge nor sufficient linguistic knowledge. On the other hand, in relationally-elaborated texts, the beneficial impact of annotated meanings combines with the consistent, integrated, and well-established relations in the text, steering the reader’s attention toward using cohesive ties in the passage to attribute meaning to the remaining unknown items. The reader does not need to understand all the peculiarities of the story; the main challenge is to connect the episodes. Once some relations are established with the help of the meanings provided in the marginal annotations, the reader uses these coordinated relations to integrate other individual pieces of information. The qualitative investigation of the think-aloud protocols also corroborated the beneficial effect of glossing in the difficult text: more than half of the participants in the control group (among those who were recording their voice while reading) submitted the passage without managing to read through the whole text even once and were stuck on understanding the first paragraph, while their attempts to use strategies such as skipping unknown concepts and rereading the first paragraph were in vain. To compensate for their lack of linguistic and background knowledge, most of them wildly guessed the main purpose of the passage. In contrast, in the experimental group, participants used both low- and high-level strategies more efficiently. Skipping, making inferences, rereading, linking ideas and contents, and paraphrasing were the most frequent strategies found in the glossed group.
Moreover, the finding for the second research question, namely that glossing significantly improved both local and global comprehension of the difficult text, lends further credence to the claim that item-based interventions can considerably facilitate comprehension processes in difficult texts that are relationally well-established.
The results obtained for the moderate text (one grade-level beyond the reader’s current proficiency) were comparable with Martínez-Fernández’s (2008, 2010) findings, which were obtained with a narrative text that was slightly difficult for participants. Interestingly enough, both studies yielded matching results, reporting a significant difference between the glossed and un-glossed groups in local comprehension but not in global comprehension, despite a trend favoring students who had glosses. The type of processing in narrative texts might also help to explain this observed discrepancy in the effects of glossing on comprehension of local and global concepts. The prevalence of relational processing in narratives invites readers to decode individual linguistic items in the text only minimally and to rely on the coherent organization of the text to form an integrated meaning representation. In this type of processing, glossing does not facilitate the comprehension process, because other familiar contextual elements and linguistic items are also available. Because of L2 readers’ limited attentional resources, attending to the glossed items shifted the locus of their attention from other linguistic cues to the target items in the text for making those relations. This means that although glossing is a facilitative strategy for recalling local items in narratives, it is unable to improve global comprehension in relationally-elaborated texts that are at or slightly above readers’ grade level. The participants in the unglossed condition ignored, skipped, wildly guessed, or even made wrong inferences about some target items, but their global comprehension was still not precluded by the text difficulty, as it was in the difficult passage.
The obtained findings for the easy text have the potential to reconcile some apparently contradictory results in this strand. On the one hand, the results on general outcome support studies that reported no effect for glossing on reading comprehension using relatively easy texts such as Jung (2017). On the other hand, the findings for local comprehension support the studies that have found an effect for glossing, particularly in the comprehension of local concepts (i.e. target items) like Guidi (2009).
Surprisingly enough, glossing was shown to significantly hinder the comprehension of global concepts at this difficulty level. In addition to the ceiling effect that might have concealed the between-group dissimilarities (Jung, 2016), the analysis of think-aloud protocols shed more light on this finding. While reading the easy passage, the control group experienced rapid and fluent reading (silent reading) with minimum reference to linguistic items. They all read the text once, without any instance of rereading or pausing because of a lack of either orthographical or phonological knowledge. Their think-aloud protocols made it evident that participants were reading for pleasure in this reading condition; that is, they were ‘able to read without the interruption of looking up words’ (Hu & Nation, 2000, p. 403), like many advanced-level learners. According to McLaughlin’s attention-processing model (McLaughlin, 1987), their focal attention was occupied with meaning and with generating a coherent discourse of the text; the unknown linguistic elements were peripheral, although reading L2 texts was still a ‘controlled’ process for them, that is, ‘capacity limited and temporary’ (McLaughlin, Rossman and McLeod, 1983, p. 142). They could effortlessly make the connections and relations needed for the relational processing of narratives. In the experimental group, however, there were two subgroups: those who did not notice the glossed items at all and proceeded to read like the control group, and those who started to bring the glossed items into their focal attention (i.e. they shifted the focus of their attentional resources from associations among propositions to individual unknown vocabulary items). In this relatively easy text, there was no linguistic demand on working memory intruding on semantic operations in the readers’ focal attention, and glossing itself acted as an extra burden on the limited capacity of attentional resources.
As a result, in the comprehension test, these participants recalled more local items at the expense of global ideas. In contrast, understanding the moderate and distinctively difficult texts was a demanding task for the readers; in consequence, their working memory and, in turn, their focal attention were engaged in processing the linguistic elements of the given content. In this circumstance, an adjunct micro-level aid such as glossing could reduce the cognitive burden on attentional resources and reserve them for more relational processing.
VIII Conclusions
This study attempts to address some debates on the role of glossing in reading comprehension in relation to reading conditions that vary in the degree of text difficulty. More precisely, contradictory postulates about how glossing may affect higher-order comprehension have simultaneously received empirical support in the current study. Unlike Guidi (2009) and Martínez-Fernández (2010), who did not support Ko’s (2005) claim about the effect of glossing on higher-level strategies, the present study partly bolsters Ko’s (2005) hypothesis by providing evidence that in moderate and distinctively difficult texts that are structurally coherent and well-organized, glossing has a beneficial impact on global understanding by promoting higher-level inferential comprehension. On the other hand, the present findings partly confirm Johnson’s (1982) claim about the deleterious role of glossing in global comprehension, but only in cases where the passage is deemed relatively easy by the readers; on such occasions, glossing functions as a distractor by bringing linguistic elements into the readers’ focal attention and occupying their limited attentional resources.
Pedagogically, the findings of the present investigation provide some support for the notion that students might benefit in different ways from glossed texts that are below, at, or above their grade levels (Snow, 2002). Accordingly, teachers, material developers, and researchers should pay attention to the interaction between text features and glossing for particular readers and particular purposes. For instance, difficult and challenging stories, when glossed, are capable of promoting the comprehension (i.e. learning-to-read) abilities of below-grade-level readers. In other words, glossing makes it possible to use difficult stories that might otherwise be left out of course books and classroom materials because of their difficulty or the absence of any instructional support. At the same time, glosses in such texts stimulate more processing of target items and, in consequence, raise readers’ awareness of those items to the shallow level of noticing, which might be conducive to the learning of item-based features such as vocabulary and lexicogrammatical forms (Jung, 2020). However, such difficult texts do not induce more attentive processing at the deeper level of awareness (i.e. understanding) needed for grammatical development (Martínez-Fernández, 2010). Meanwhile, the present study indicated that easy texts, when glossed, can induce attention at the level of understanding while reading.
Nonetheless, there are some limitations to the current study. First of all, the three passages with different difficulty levels were read under time constraints, which is fundamentally what learners often do when performing a reading comprehension task in the classroom. Although the findings of this study are applicable to many classroom settings where reading time is restricted, there may be concerns that the observed trade-off between reading comprehension and noticing of target items in the easy text was due to reading under time pressure. In this sense, future experiments could reinforce the current findings by investigating whether the demonstrated patterns of performance in each of the reading conditions would differ when reading under no time pressure. Another limitation is rooted in the choice of target items. Two criteria were applied: possessing a verbal base (i.e. being either a present or a past participle) and occurring only once in the texts. It would be interesting for future studies to consider other factors, such as essentialness for comprehension, abstractness, frequency, inferrability, and the target items’ position in the sentence. In the present study, the target items were revealed to possess low functional load. Another limitation lies in the repeated-measures design of this study. Although it provides a way of controlling between-participant differences, this type of design does not allow the effects of glossing in combination with text difficulty on L2 learning to be investigated. Future research can address this weakness by adopting non-overlapping groups. Finally, this study used a single, lower-level text feature manipulation, a narrative story text, and pre-intermediate Iranian first-year university students.
Further research is needed to shed light on whether our results can be generalized to other reading tasks with different cognitive loads at higher-level processes, as well as to other genres, native speakers, and other participant populations.
Appendix 1. Target vocabularies used in the text and Persian glosses
| No. | Text 1 (moderate): target words embedded in target structures | Text 1 (moderate): L1 translation | Text 2 (easy): target words embedded in target structures | Text 2 (easy): L1 translation | Text 3 (difficult): target words embedded in target structures | Text 3 (difficult): L1 translation |
|---|---|---|---|---|---|---|
| 1 | Supposed | که قرار بود | pampered | که ناز پرورده شده بودند | garbed in | که آراسته شده بودند در |
| 2 | perching | که نشسته بود | I ought to have expired | می بایست میمردم | violating | که داشتند تخطی میکرند از |
| 3 | you should have scolded me | تو می بایست بهمن ناسزا می گفتی | Bessie shouldn’t have complained | بسی نمی بایست شکایت میکرد | you should have incarcerated | شما می بایست زندانی میکردید |
| 4 | you shouldn’t have dared | تو نمی بایست جرات میکردی | You shouldn’t have quarreled | تو نمی بایست دعوا میکردی | You ought to have gas them | شما می بایست آنها را با گاز میکشتید |
| 5 | oppressing | که ظلم میکرد | inspiring | که الهام بخش بود | You shouldn’t have convened | شما نمی بایست جمع می شدید |
| 6 | I ought to have escaped | می بایست فرار میکردم | mildewed | که کپک زده بودند | grumbling | که داشت غر میزد |
| 7 | captivated | که مجذوب شده بود | leaking | که چکه میکردند | rallying | که داشتند اعتراض می کردند |
| 8 | I should have terminated | من می بایست خاتمه میدادم | enraged | که عصبانی شده بود | you shouldn’t have approached | تو نمی بایست نزدیک می شدی |
| 9 | meddling | که فضولی میکردند | Eliza oughtn’t to have assisted him | الیزا نمی بایست به او کمک می کرد | You ought to have denied | تو می بایست تکذیب میکردی |
| 10 | agitating | که آشفته میکرد | you should have deferred | تو می بایست احترام می گذاشتی | I should have corresponded | می بایست نامه می نوشتم |
| 11 | sealed | مهر و موم شده بود | you ought to have evaded | تو می بایست پرهیز میکردی | I ought to have apologized | می بایست معذرت خواهی میکردم |
| 12 | I oughtn’t to have squandered | نمی بایست هدر میدادم | He should have ceased | او می بایست متوقف می کرد | I oughtn’t to have hanged him | نمی بایست او را دار می زدم |
| 13 | perambulating | که پیاده روی میکرد | assaulting | که داشت حمله می کرد | compelled | که مجبور شده بود |
| 14 | devastated | که ویران شده بود | bleeding | که داشت خونریزی می کرد | I shouldn’t have lynched | نمی بایست می کشتم |
| 15 | I oughtn’t to have frittered away | نمی بایست هدر میدادم | She should have obeyed | او می بایست اطاعت میکرد | I shouldn’t have ruined | ای کاش از بین نمی بردم |
| 16 | I should have confessed | می بایست اقرار می کردم | She shouldn’t have beat him | نمی بایست او را میزد | abiding | که در انتظار بود |
| 17 | I should have appealed | می بایست درخواست میکردم | you ought to have accused John | شما می بایست جان را متهم می کردید | carved | که حکاکی شده بود |
| 18 | manufactured | که ساخته شده بود | I should have slaughtered him | می بایست او را می کشتم | inundated | که لبریز شده بود |
| 19 | I ought to have announced you | می بایست بهت میگفتم | blowing | که می وزید | fetching | که داشت می آورد |
| 20 | he should have informed me | او می بایست به من اطلاع میداد | devoted | که فدا شده بود | startled | که از وحشت یکه خورد |
Appendix 2. List of pseudo words used in this study
| No. | Difficult text: pseudo words | Difficult text: equivalent real words | Moderate text: pseudo words | Moderate text: equivalent real words |
|---|---|---|---|---|
| 1 | obene | edge | eyoti | faint |
| 2 | pros | jail | mincom | secret |
| 3 | taumplan | building | upworeri | property |
| 4 | pucollu | penalty | jeexta | coarse |
| 5 | kabl | Clan | nageri | misery |
| 6 | dasidise | criminal | derbe | horse |
| 7 | tellentic | comfortable | limena | stupid |
| 8 | diectims | supplies | difem | bride |
| 9 | kendmici | clemency | coilletr | creature |
| 10 | fatiol | lawyer | obull | truth |
| 11 | fete | feet | dopper | wicked |
| 12 | ostattel | envelope | duffe | house |
| 13 | milvers | letters | | |
| 14 | olouce | office | | |
| 15 | toksoms | victims | | |
| 16 | tanges | bushes | | |
| 17 | rogas | lives | | |
| 18 | lunolon | chaplin | | |
| 19 | neurv | guest | | |
| 20 | laplats | windows | | |
| 21 | urcellabie | helicopter | | |
| 22 | tophotr | witness | | |
| 23 | acemophoic | excitement | | |
| 24 | zighthit | entrance | | |
| 25 | abil | pity | | |
| 26 | breaters | guards | | |
Appendix 3
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
