Abstract
The research described in this article investigates test takers’ cognitive processing while completing onscreen IELTS (International English Language Testing System) reading test items. The research aims, among other things, to contribute to our ability to evaluate the cognitive validity of reading test items (Glaser, 1991; Field, in press).
The project focused on differences in reading behaviours of successful and unsuccessful candidates while completing IELTS test items. A group of Malaysian undergraduates (n = 71) took an onscreen test consisting of two IELTS reading passages with 11 test items. Eye movements of a random sample of these participants (n = 38) were tracked. Stimulated recall interview data was collected to assist in interpretation of the eye-tracking data.
Findings demonstrated significant differences between successful and unsuccessful test takers on a number of dimensions, including their ability to read expeditiously (Khalifa & Weir, 2009), and their focus on particular aspects of the test items and texts, while no observable difference was noted in other items. This offers new insights into the cognitive processes of candidates during reading tests. Findings will be of value to examination boards preparing reading tests, to teachers and learners, and also to researchers interested in the cognitive processes of readers.
This article describes a research project investigating onscreen IELTS (International English Language Testing System) reading test items in terms of candidates’ cognitive processing, in ways which can potentially assist in improving the cognitive validity (Glaser, 1991; Field, 2012) of these and similar reading test items. Demonstrating cognitive validity in a test of academic language proficiency implies demonstrating that it assesses candidates on the same range and types of cognitive operations as those required of students in the target programme of higher academic study. The research project reported here drew on established methods for investigating these areas, including stimulated retrospective recall interviews, and it also used eye-tracking technology in order to seek new insights into candidates’ cognitive operations.
Eye-tracking technology has been used extensively for several decades to investigate various forms of reading. However, since its use to research the particular type of reading employed during language tests is rare, one contribution which this research seeks to make is a methodological one, investigating the extent to which eye-tracking technology can offer new understanding of test takers’ reading behaviour. A second potential contribution, as suggested above, is to use eye-tracking to help assess the cognitive validity of elements of the IELTS reading paper. In terms of application, the research reported here can also assist the process of test design in the future, and can in addition inform language teachers and students seeking to prepare for reading tests such as the IELTS reading component.
Cognitive validity in reading tests
It has long been argued that tests should exhibit what is known as cognitive validity (Glaser, 1991; Baxter & Glaser, 1998), since cognitive interpretative claims are ‘not foregone conclusions, [but] need to be warranted conceptually and empirically’ (Ruiz-Primo, Shavelson, & Schultz, 2001, p. 100). As Alderson (2000, p. 97) argued, ‘[t]he validity of a test relates to the interpretation of the correct responses to items, so what matters is not what the test constructors believe an item to be testing, but which responses are considered correct, and what process underlies them.’ In short, understanding of the trait being measured requires an insight into the cognitive processing required for completion of the task.
Field has recently summarized the situation for language testing as follows: ‘we need to find out if the mental processes that a test elicits from a candidate resemble the processes that he/she would employ in non-test conditions’ (Field, 2012, emphasis in original). Field (in press) sets out three central questions which a language test must deal with in terms of its cognitive validity, namely Similarity of processing, Comprehensiveness, and Calibration. These considerations, which reinterpret Messick’s (1989) notions of construct-irrelevant variance and construct under-representation from a cognitive processing perspective, imply the need for better understanding of candidates’ cognitive processing in language tests. One contribution of this article, then, is to explore the potential of eye-tracking to help assess the cognitive validity of reading test items.
Modelling cognitive processing in the testing of reading
A recent model of reading which pays close attention to the role of readers’ cognitive operations is that set out by Khalifa and Weir (2009). This derives in turn from work by Urquhart and Weir (1998), which characterizes reading as taking place at the local or global level, and being in nature either careful or expeditious (defined as involving ‘quick, selective and efficient reading to access desired information in a text’; Weir, Hawkey, Green, & Devi, 2009, p. 160). Khalifa and Weir’s model describes cognitive processing in reading in terms of different levels of complexity, with, for example, lexical processing as the least complex, and intertextual reading as the most. Khalifa and Weir’s model is therefore particularly valuable in that it operationalizes the concept of cognitive processing in reading, and proposes in a way amenable to empirical investigation a hierarchy of cognitive processing complexity in reading. This hierarchy is summarized in brief in Table 1.
Table 1. Levels of cognitive processing in reading tests (adapted from Khalifa & Weir, 2009).
In terms of language testing the authors argue that for a high-level academic reading test to be cognitively valid it should test the full range of lower and higher cognitive processes. Research by Weir et al. (2009) then used this model to investigate the cognitive processes underlying the academic reading construct as measured by IELTS, using retrospective questionnaires and reports as the main research method. The present research project builds on that earlier work, but introduces eye-tracking in addition. It also differs from that study in that it focuses specifically on the lower level cognitive operations, levels 1–5 in Table 1.
Metacognitive strategies
Another related area of reading which has received research attention in recent years is readers’ metacognitive strategies, often defined as ‘one’s knowledge of strategies for learning from texts, and … the control readers have of their own actions while reading for different purposes’ (Carrell, 1989, p. 650, emphasis in original). Metacognition will be considered in this article because, as Carrell notes, successful students tend to make better use of planning (Carrell, 1989, p. 650). The implication of this is that good readers can be distinguished from weaker readers in terms of their conscious ‘direction of effort’, a dimension which will be of relevance to the research described below.
Eye-tracking in reading research
Eye-tracking and ‘default’ reading
This research builds on the work by Weir et al. (2009) described above, but also uses eye-tracking technology to gain further insight into readers’ cognitive processes while they complete IELTS reading test items. The use of eye-tracking to investigate reading is not new. Rayner (1998) reviews 100 years of research into reading using eye-tracking of various sorts, divided into three periods before we reach what Duchowski (2002) has called the current ‘fourth era’ distinguished by the possibility of interactivity.
However, most of this research examined native English speakers in what has been termed the ‘default’ mode of reading, that is, ‘when comprehension is proceeding without difficulty and the eyes are continuing to move forward along a line of text’ (Reichle et al., 2009, p. 9), in other words with relatively few regressions. By contrast, much eye movement behaviour on the part of second language (L2) readers during reading tests is significantly different from this, for example when a reader reads a test item first and then searches for relevant parts of a text. For these reasons many of the findings derived from previous eye movement research are not directly applicable to the research to be discussed here. Even so, it is useful briefly to review the main strands of such previous research in order to gather insights which could assist us in researching L2 test taker behaviour.
Rayner (1998) highlights some insights which eye-tracking offers for our understanding of (mainly ‘default’) reading. First, it is noted that eye fixations (when the eye dwells on a particular point) typically last about 200–250 ms, the mean saccade size (i.e. when the eye moves from one point to another) being 7–9 letter spaces (Rayner, 1998, p. 375). This is of importance in the present study when identifying individual words in a text which constitute the answer to a test item.
Furthermore, saccades can be distinguished according to their roles (Rayner, 1998; Rayner, Pollatsek, Ashby, & Clifton, 2012). In left-to-right languages rightward saccades typically drive onwards through the text, while four other types typically have the function (at least in ‘default’ L1 reading) of correcting ‘inefficient’ text processing (Rayner, 1998). The first of these, regressions, are defined as backwards motions for a distance of a few letters so as to reprocess a word which may not have been analysed properly previously. In Rayner’s view (1998) regressions of more than a few letters are indicative of the reader’s probable failure to understand the content.
The second type, termed return sweeps, consists of the eye’s return to a precise fixation point, probably recalled by the reader as a source of processing difficulty. Importantly for the research discussed in this article, it is usually surmised that higher-proficiency readers typically use return sweeps more efficiently, since they are able to determine and recall the position in the text which caused them difficulties.
By contrast, readers at lower-proficiency levels tend instead to adopt the third type, backtracking, in which they move back through the text less efficiently until they (re)discover the source of difficulty (Rayner, 1998). The fourth type, termed corrective saccades, consists of eye movements which tend successfully to re-identify text (Rayner, 1998), and is again considered a mark of higher-proficiency readers. These last two types are reminiscent of what Khalifa and Weir (2009) term expeditious reading, in which effective readers quickly find areas on which they need to focus.
It is also apparent that eye movements are influenced by textual and typographical variables, for example ‘as text becomes conceptually more difficult, fixation duration increases, saccade length decreases, and the frequency of regressions [where the eye moves back rather than forwards] increases’ (Rayner, 1998, p. 376). As Rayner et al. (2012) make clear, virtually all information is extracted through fixations; therefore more efficient readers are likely to have fewer but longer fixations, with longer saccades between them. However, since fixations give limited information, other factors also come into play, for example vocabulary and grammatical knowledge, and short-term memory. In addition, good readers tend to be more ‘strategic’ (Pang, 2008) and are therefore more likely to use longer saccades as they locate target areas of the text. This helps to frame the research questions for the current study, which anticipated finding a difference between the fixation patterns, and the search reading patterns, of more and less successful readers under test conditions, drawing largely on local measures for analysing fixation on particular aspects of the texts and global measures for looking at higher-order processing and larger chunks of text.
Eye-tracking and cognitive processing
An important question to be addressed is whether eye-tracking can assist in the identification and explanation of underlying cognitive processes. Rayner affirms that the basic theme of his historical review is indeed that ‘eye movement data reflect moment-to-moment cognitive processes’ (Rayner, 1998, p. 372). Recent studies concur as to the value of eye-tracking for researching cognitive processes in general (e.g. Bertram, 2011; Buscher, Biedert, Heinesch, & Dengel, 2010; Eger, Ball, Stevens, & Dodd, 2007). Spivey, Richardson and Dale offer a detailed discussion of how and why eye movements can be taken to be good indicators of cognitive processes, and term them ‘a window into language and cognition’ (2009, p. 225). The same metaphor is used by Salvucci and Goldberg, who see eye-tracking as ‘a window into observers’ visual and cognitive processes’ (2000, p. 71; see also Anson, Rashid Horn, & Schwegler, 2009).
However, recent computational models of eye movements, such as the latest versions of the E-Z Reader (Reichle et al., 2009; Rayner, Pollatsek, Ashby, & Clifton, 2012), are cautious about going beyond the lexical encoding level. The main reason for this is the limited evidence which eye-tracking can provide for higher-order processes when researching ‘default’ mode reading, in which the reader makes use of relatively few regressions, return sweeps, backtracking and corrective saccades, and therefore provides little evidence which could potentially give insight into processing above the lexical level.
However, in the few studies which focus on more ‘disrupted’ forms of reading, which for example cause readers to make more regressions, evidence does start to emerge which could shed light on post-lexical cognitive processes. For instance, studies of reading Finnish sentences containing long compound words (Hyönä & Pollatsek, 1998 and Pollatsek, Hyönä, & Bertram, 2000, cited in Reichle et al., 2009) were interpreted as giving evidence of post-lexical cognitive activity, and the E-Z Reader version 7 was adapted accordingly. The implication of this is that research into ‘non-default’ reading activity can potentially help to give insight into processes beyond the lexical level, because such reading provides different kinds of evidence of what readers are doing.
The research reported here, by contrast, offers eye-tracking evidence quite different from that offered in ‘default’ reading research. As was illustrated in the only previous study into cognitive processes in reading tests using eye-tracking technology (Bax & Weir, 2012), the very nature of language tests means that readers constantly jump between the text and test item, and repeatedly regress and jump forward in various ways in their search for answers, in ways quite different from ‘default’ reading patterns. This behaviour therefore potentially offers evidence concerning higher-order processes of a kind not available to other reading research. For example, if the answer to a language test item can be found only by reading through a whole paragraph using inferencing to deduce the answer, and eye-tracking data subsequently shows Reader X doing so, then answering the question correctly, and then the reader reports this in retrospective interview, it is legitimate to infer that this reader has probably used higher-order inferencing strategies. In other words, carefully constructed research into language test reading, making use of eye-tracking technology in conjunction with other research tools, can potentially offer insights into readers’ higher-order, post-lexical processing behaviour.
A further difference from default reading is that we are concerned here with second language and not first language readers. Eye-tracking research into L2 reading has unfortunately been limited almost exclusively to investigating online parsing procedures and detecting sensitivity to ungrammaticalities in isolation, as opposed to researching the reading of texts beyond the clause level (see, e.g., Verhagen, 2009; Keating, 2009). However, some research uses regressions and the total fixation time on words and phrases to draw inferences about later processing stages in ways which will be of value here (see Roberts, 2012 for a review).
Research methodology
Research questions
This article addresses three research questions, the first methodological:
To what extent and in what ways can eye-tracking technology shed light on the cognitive processing of participants completing onscreen reading test (IELTS) items?
To what extent and in what ways are successful readers differentiated from less successful readers in terms of their eye movements while completing onscreen reading test (IELTS) items?
To what extent and in what ways are successful readers differentiated from less successful readers in terms of their cognitive and metacognitive processing while completing onscreen reading test (IELTS) items, as evidenced from eye movement data and stimulated retrospective interview data?
Research approach, participants and instruments
In order to investigate these questions, the following approach was adopted. A group of Malaysian undergraduates (n = 71), with first languages including Bahasa Melayu, Tamil, Chinese and others, took an onscreen test consisting of two IELTS reading passages with a total of 11 test items which were considered to target the cognitive operations which this research seeks to investigate (see below). The students were first- and second-year undergraduates studying a B.Ed. at a UK university, with an average IELTS score of 6.5. The eye movements of a random sample of participants (n = 38) were tracked in ways described below. Their activities were also recorded using screen recording software.
Participants signed appropriate ethics forms and personal information forms. They were also asked to rate their familiarity with computers and onscreen tests. All reported extensive familiarity with computer technology and onscreen tests of various kinds.
In this study, for reasons of focus, it was decided specifically to investigate Careful Local reading and Expeditious Local reading, in Khalifa and Weir’s terms (2009). (See Table 2.) To this end, two task types were chosen. The first was a Sentence completion task of a type that ‘tests [the] ability to find detail/specific information in a text’, in other words testing Careful Local reading (Cambridge ESOL, n.d.). The second was a Matching task, which ‘assesses [the] ability to scan a text in order to find specific information’ (Cambridge ESOL, n.d.), thereby testing Expeditious Local reading. The particular texts and tasks had previously been piloted by Devi (2010) and were selected from the IELTS Practice Papers series (Cambridge University Press), having been developed and trialled by Cambridge ESOL (English for Speakers of Other Languages), the partner responsible for IELTS test production. They were therefore considered representative of genuine IELTS reading tests and items. Devi’s research identified some test items that were not functioning effectively, so these were dropped, with the 11 remaining items being those adopted for this research project.
Table 2. Characteristics of the selected reading test texts and items (Devi, 2010).
The texts and items were delivered onscreen using Adobe Flash, and linked to a database to allow for more efficient data processing. The font size was 11 point and the font was Arial. A screenshot of the test showing text, question and navigation layout can be found in Appendix 1. Each of the two passages was split across three pages, while questions for each passage remained on the same page to minimize navigation problems. Students could use the buttons at the bottom of the page for navigation, so scrolling was avoided. This was all explained to students in the video tutorial before the test.
The eye tracker used was a Tobii T60, with the tracking cameras hidden in the monitor casing to allow fully natural reading. The T60 has a sample rate of 60 Hz, typical accuracy of 0.5 degrees, typical spatial resolution of 0.35 degrees, temporal resolution of 60 frames per second, and a tracking distance of 50–80 cm, which allows detailed tracking of normal reading. It was set to a screen recording rate of 10 frames per second. (Full technical specifications can be found at: www.tobii.com.) In addition the device was furnished with binocular tracking, a user camera, and speakers for playing the tutorial soundtrack. Although this device offers somewhat lower accuracy than other devices, it compensates by allowing a fully natural experience for test takers, which was crucial in this research.
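To give a rough sense of what these specifications imply for reading research, the following sketch converts the quoted accuracy of 0.5 degrees into an on-screen distance and compares it with the nominal height of 11-point type. It is an illustration only, not part of the study: the 65 cm midpoint viewing distance, the function name `angle_to_mm` and the assumption that one point renders at exactly 1/72 inch on the monitor are all introduced here.

```python
import math


def angle_to_mm(angle_deg: float, distance_mm: float) -> float:
    """On-screen length subtended by a visual angle at a viewing distance."""
    return 2 * distance_mm * math.tan(math.radians(angle_deg) / 2)


SAMPLE_RATE_HZ = 60                         # from the device description
sample_interval_ms = 1000 / SAMPLE_RATE_HZ  # time between gaze samples

POINT_MM = 25.4 / 72             # assumption: 1 pt = 1/72 inch on screen
font_height_mm = 11 * POINT_MM   # nominal height of 11-point type

print(f"Interval between samples: {sample_interval_ms:.1f} ms")
print(f"Nominal 11-point height:  {font_height_mm:.2f} mm")

# 50 and 80 cm are the stated tracking limits; 65 cm is an assumed midpoint.
for distance_cm in (50, 65, 80):
    error_mm = angle_to_mm(0.5, distance_cm * 10)
    print(f"0.5 degrees at {distance_cm} cm is about {error_mm:.1f} mm")
```

Under these assumptions the typical accuracy corresponds to several millimetres on screen, which is of the same order as the height of a line of 11-point text. This is one reason for treating word-level fixation data with some caution.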
Procedure
After all personal information forms, consent forms and computer familiarity forms had been completed, the project consisted of the following steps:
Stimulated recall interviews
After the test, a sample of the eye-tracked candidates (n = 20) was randomly selected, for reasons of time, to undergo a stimulated recall interview, in which each viewed the video footage of their own test, observing their eye movements represented on the screen, for 40–60 minutes. Each was asked to describe their reading behaviour as they observed, and their analysis was recorded. The video was slowed, stopped and rewound at their request to allow them to view and comment freely, with additional prompt questions posed at various points. In this way readers offered a moment-by-moment stimulated commentary and explanation of why they read as they did, providing important evidence to explain the eye-tracking data.
Analysis and findings
The process of analysis included reliability item analysis, analysis of eye-tracking data both quantitatively and qualitatively, and analysis of the Stimulated Recall interview data, all of which will now be described.
Item analysis
As summarized in Table 3, item analysis was carried out to examine the reliability of the test items. The reliability coefficient for the 11 items was .722 (Cronbach’s Alpha), which is acceptable considering the limited number of items analysed here. The items seem on the evidence to be relatively easy for the tested population, but were nonetheless still targeting the participating students’ proficiency levels reasonably well, since the mean of the most difficult item (item 5) was .54 and that of the easiest item (item 9) was .87. However, Item 1 was not functioning adequately, the item–total correlation value being lower than .25 (Henning, 1987), so this was excluded from further analysis.
Table 3. Item analysis of the 11 reading items (N = 71).
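The three statistics used in the item analysis (item mean or facility, Cronbach’s Alpha, and item–total correlation) can be illustrated with a minimal sketch. The response matrix below is invented for illustration and is not the study’s data; the function names are hypothetical, and the use of a corrected item–total correlation (the item against the total of the remaining items) is an assumption, since the article does not specify which variant was used.

```python
from statistics import mean, variance


def item_facility(responses):
    """Proportion of candidates answering each item correctly."""
    n_items = len(responses[0])
    return [mean([row[i] for row in responses]) for i in range(n_items)]


def cronbach_alpha(responses):
    """Cronbach's alpha from a candidates-by-items matrix of 0/1 scores."""
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(xs, ys):
    """Pearson correlation; assumes both variables have non-zero variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def corrected_item_total(responses, item):
    """Correlate one item with the total score on all the other items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)


# Hypothetical scores: 6 candidates (rows) by 4 items (columns).
demo = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

print("Facility:", [round(f, 2) for f in item_facility(demo)])
print("Alpha:", round(cronbach_alpha(demo), 3))
for i in range(len(demo[0])):
    r = corrected_item_total(demo, i)
    flag = " (below .25, candidate for exclusion)" if r < 0.25 else ""
    print(f"Item {i + 1}: item-total r = {r:.2f}{flag}")
```

The .25 threshold in the final loop follows the criterion cited from Henning (1987); everything else in the example is illustrative.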
Analysis of the eye-tracking data
Quantitative analysis
The eye-tracking data consisted of full recordings of 38 participants’ complete eye movements. Analysis proceeded as follows:
2. To write out the human genome on paper would require ________ books.
This was analysed as requiring the reader to locate the following segment of the text (paragraph 3):
The human genome is the compendium of all these inherited genetic instructions. Written out along the double helix of DNA are the chemical letters of the genetic text. It is an extremely long text, for the human genome contains more than 3 billion letters. On the printed page it would fill about 7,000 volumes.
Further analysis of this item suggested that the reader, in order to answer the item correctly, would as a minimum need to use lexical matching – synonymy (in Khalifa and Weir’s terms, see Level 2 of Table 1), by identifying the elements ‘
By way of further exemplification, item 5, statistically the most difficult item according to the item analysis above, was as follows:
5. Research into genetic defects had its second success in the discovery of the cause of one form of____
This was analysed as requiring a higher level of cognitive processing in Khalifa and Weir’s terms than item 2, since the reader must of necessity read across sentences and use inference, in addition to lexical matching (synonymy) to obtain the answer. The target elements for eye-tracking analysis were identified as ‘second success’ and ‘the cause of’ in the question, and – in the text – ‘In 1989’ (required to identify this as the second success), ‘gives rise to’ (the idiom which implies causation) and the answer, ‘Cystic fibrosis’. Other items were analysed in the same way.
(i) to the text as a whole while seeking specifically to answer that test item (not including pre-reading or post-reading);
(ii) to key sections of the text at sentence level or beyond (e.g. on the correct page of the text, or on the paragraph containing the answer), while seeking specifically to answer that test item;
(iii) to those more specific sections (e.g. words or phrases) of the text and test question previously identified as targets for each test item.
To compare the behaviour of successful and unsuccessful test takers on each item, it was decided to use non-parametric tests, as the datasets did not meet the normality and homogeneity of variances assumptions. In each case, therefore, the Mann-Whitney U test was used rather than a parametric equivalent, and given the fact that this is an ordinal test, the median values are reported and discussed below alongside the means, as is common practice (Olsen, 2003; Pallant, 2010).
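For readers unfamiliar with the Mann-Whitney U test, the following minimal sketch shows how the statistic is obtained from two independent groups by ranking the pooled observations. The durations are hypothetical values invented for illustration, not data from this study, and the function names are introduced here.

```python
from statistics import median


def midranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two U statistics for two independent groups."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical time on page (seconds) for two small groups.
incorrect = [163.2, 210.5, 98.4, 175.0]
correct = [102.5, 88.1, 120.7, 95.3, 140.2]

print("Median (incorrect):", median(incorrect))
print("Median (correct):  ", median(correct))
print("U:", mann_whitney_u(incorrect, correct))
```

Because the test works on ranks rather than raw values, it makes no assumption of normality. In practice a tested library routine such as `scipy.stats.mannwhitneyu` would normally be preferred to a hand-written version.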
It was anticipated that of the three areas of investigation cited above, the first (i) might give insight into readers’ expeditious reading ability, since it was hypothesized that successful readers would be able more expeditiously to find the correct parts of the text so as to answer the question. The second dimension (ii) was intended to give insight into readers’ higher order processing of a kind which requires reading across segments of text beyond the sentence level. It was surmised that in cases where successful readers paid more attention to these larger segments of text, this might be explained as being necessary in order to carry out the required cognitive processes beyond the clause level. If weaker readers spend significantly longer, it could be assumed that it was because they either could not find the answer (again apparently a failure of expeditious reading) or else had problems for example with lexis. Such possibilities could then be explored through the Stimulated Recall Interview data.
It was anticipated that dimension (iii) could give insight into the extent to which better readers or weaker readers attended to key areas of lexis and syntax. If unsuccessful readers were shown to spend longer over a particular element, it could be surmised that they found it more taxing, and this could be confirmed or disconfirmed at interview; if by contrast successful candidates attended significantly more to a target element, it could be supposed that that element was crucial in their successful answering of the question, a point which could also be confirmed or disconfirmed by the interview data.
A similar approach was adopted for each test item, as will now be discussed in detail. Only those issues which were found to be statistically significant need be discussed here, and since for items 4, 6, 8, 9 and 11 no significant differences between the eye movements of successful and unsuccessful readers were found in the dimensions analysed, attention will focus on items 2, 3, 5, 7 and 10. (As reported above, item 1 was excluded.) The findings will first be reported for each test item individually, and then the overall picture is discussed in greater detail below. (The items in the test can be seen in Appendix 2.)
Item 2
Item 2 required readers, as a minimum, to identify lexical synonymy within a single sentence, a relatively low level of processing. Analysis of eye movements for item 2 showed a significant difference between successful and unsuccessful candidates on one measure, namely the amount of time spent (in seconds) on the correct page of the text. The students who answered incorrectly spent a median of 163.25 seconds on the correct page of the text, while those who answered correctly spent a median of only 102.55 seconds ([Incorrect group] N = 7, mean = 174.73, median = 163.25, SD = 69.01; [Correct group] N = 31, mean = 115.02, median = 102.55, SD = 68.86; Mann-Whitney U = 55.0, Z = −2.015, p = .044).
This suggests that although this was a relatively easy item (see Table 3 above), unsuccessful students were not able to read expeditiously so as to find the location of the answer, and therefore spent significantly more time looking (fruitlessly) than the successful readers. Interview data corroborated this impression, with five out of seven unsuccessful students making comments, as they watched their own eye movements, such as ‘I was trying here to find where the answer was’ and ‘Now I was looking for the answer up and down’, and so on.
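The test statistics reported for item 2 can be checked by hand using the large-sample normal approximation to the Mann-Whitney U distribution. The sketch below assumes no correction for ties or continuity; whether the original analysis applied such corrections is not stated, so this is a consistency check rather than a reconstruction of the authors’ procedure.

```python
import math


def u_to_z(u: float, n1: int, n2: int) -> float:
    """Standardize U using its mean and standard deviation under H0."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


def two_sided_p(z: float) -> float:
    """Two-tailed p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Values reported for item 2: U = 55.0, N = 7 (incorrect), N = 31 (correct).
z = u_to_z(55.0, 7, 31)
p = two_sided_p(z)
print(f"Z = {z:.3f}, p = {p:.3f}")
```

Under these assumptions the calculation reproduces the reported Z of −2.015 and p of .044.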
To illustrate the visual evidence used throughout the analysis, Figure 1 shows the GazePlot data, and Figure 2 shows the Heatmap output, from a successful candidate on item 2. This candidate has expeditiously identified the location of the answer, and answered correctly within 25 seconds.

Figure 1. Gazeplot output from successful candidate answering item 2

Figure 2. Heatmap output from successful candidate answering item 2
(In the Gazeplot in Figure 1 the circles represent the eye fixations on each part of the text, smaller for shorter fixations and larger for longer fixations, with lines representing the saccade movement between fixations. In the Heatmap in Figure 2, the same data is represented according to the intensity of fixation on different parts of the text, the darkest spot on the right (red in the original version) indicating greatest intensity of focus, the lightest shading (originally yellow) next and the majority grey (originally green) the weakest.)
By contrast Figures 3 and 4 show an unsuccessful candidate completing the same item, failing expeditiously to locate the place of the answer, and therefore spending much time wastefully scanning the page, taking more than 172 seconds, and answering incorrectly.

Figure 3. Gazeplot output from unsuccessful candidate answering item 2

Figure 4. Heatmap output from unsuccessful candidate answering item 2
Item 3
Item 3 was as follows:
3. A genetic problem cannot be treated with drugs because strictly speaking it is not a_____.
The answer can be found in this sentence:
None of the single-gene disorders is a disease in the conventional sense, for which it would be possible to administer a curative drug: the defect is pre-programmed into every cell of the sufferer’s body.
This required readers, as a minimum, to focus on specific terms in the question (drugs, genetic problem) and lexically match them with related terms in the correct part of the text (drug, single-gene disorders, disease). This is in itself a relatively low-level cognitive process in Khalifa and Weir’s terms (see Table 1). Unlike item 2, however, the lexis here is notably more complex and technical, and the lexical relation between terms in the question and terms in the text is more vague. In addition the reader first has to find the correct part of the text (using expeditious reading skills) and then to disambiguate the complex syntax, in order to identify the answer as ‘disease’.
There were no significant differences between successful and unsuccessful students in terms of expeditious reading, but numerous significant differences in the area of lexis. As can be seen in Table 4, significant differences were found between the successful and unsuccessful groups in terms of their Total Fixation Duration, Fixation Count, Visit Duration and Visit Count on one element in the question (the term genetic problem), and on two elements of the text (the terms drug and single-gene disorder, except that the Fixation count for the last element was not significantly different). In all cases the unsuccessful students showed greater focus on these elements than successful students.
Table 4. Eye-tracking statistics for item 3.
All significant at p < .05, n1 (incorrect) = 12, n2 (correct) = 26.
This is an interesting finding, pointing strongly in one direction. Interview data revealed that weaker students could not understand some of the terms, and – more importantly – could not with confidence match the elements of the answer with the appropriate elements in the text, nor disambiguate the syntax so as to arrive at the correct answer. One unsuccessful student said, for example ‘I am here trying to understand these words, looking at them a lot’, and eight others (out of 12 unsuccessful) made similar comments. This explains the significantly different dwell times and fixations on these particular elements of the question and text, as weaker students focused on them repeatedly in a vain attempt to answer. According to their retrospective reports, in short, the majority of them did not have the lexical knowledge, nor the syntactic knowledge, to disambiguate the target sentence appropriately. The eye-tracking data supports that analysis.
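The four measures compared for item 3 can be illustrated with a minimal sketch. It assumes the definitions common in eye-tracking software, which this section does not itself spell out: a visit runs from the first fixation inside an area of interest (AOI) until the next fixation outside it, and visit duration includes the saccades between fixations within a visit. The `Fixation` record, the `aoi_metrics` function and the sample timings are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixation:
    start: float          # onset time in seconds
    end: float            # offset time in seconds
    aoi: Optional[str]    # hypothetical AOI label, or None when outside every AOI


def aoi_metrics(fixations: List[Fixation], aoi: str) -> dict:
    """Summarise one participant's attention to one AOI (assumed definitions)."""
    total_fixation_duration = 0.0
    fixation_count = 0
    visits = []            # each visit is stored as [start, end]
    in_visit = False
    for fix in fixations:
        if fix.aoi == aoi:
            total_fixation_duration += fix.end - fix.start
            fixation_count += 1
            if in_visit:
                visits[-1][1] = fix.end             # extend the current visit
            else:
                visits.append([fix.start, fix.end])  # a new visit begins
                in_visit = True
        else:
            in_visit = False                         # gaze left the AOI
    return {
        "total_fixation_duration": total_fixation_duration,
        "fixation_count": fixation_count,
        "visit_count": len(visits),
        "total_visit_duration": sum(end - start for start, end in visits),
    }


# Invented sequence: two fixations on the AOI, one elsewhere, then a return.
sample = [
    Fixation(0.00, 0.25, "genetic problem"),
    Fixation(0.30, 0.50, "genetic problem"),
    Fixation(0.55, 0.80, None),
    Fixation(0.85, 1.10, "genetic problem"),
]
metrics = aoi_metrics(sample, "genetic problem")
print(metrics)
```

On this reading, repeated returns to a term that the reader cannot resolve raise all four measures at once, which is the pattern reported for the unsuccessful group.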
Item 5
It was noted above that item 5 required a higher level of cognitive processing in terms of inference and reading across sentences, as well as lexical knowledge. It is worth noting that statistically it was the most difficult item.
With this item, unsuccessful students looked significantly more at the whole text, as can be seen from the figures in Table 5. For example, the total fixation duration on the whole text for the unsuccessful students was on average (by median) 122.50 seconds, whilst that for the successful students was 65.39 seconds. This suggested that, as in item 2, they were unable to find the correct location of the answer efficiently, apparently another failure of expeditious reading.
Eye-tracking statistics for item 5.
All significant at p < .05.
However, with the same item successful students focused significantly more on a key element of the text. The question for this item asked readers to consider ‘the cause of’ one form of a disorder (cystic fibrosis, the answer to the item). In order to find the answer the reader therefore had to find and disambiguate the piece of the text which matched this, in this case the phrase ‘gives rise to’. As Table 6 shows, successful students did indeed focus more on this phrase in terms of fixation duration, fixation count and visit duration, indicating that they successfully identified it and worked on it as they proceeded towards the answer. Successful students spent an average (median) of 3.83 seconds fixating on this element while unsuccessful students spent only 0.9 seconds (median). This was supported by interview evidence; for example one student commented: ‘I was looking at that piece of the text a lot because I thought it fitted what the question wanted’; this reflected the comments of 15 of the 21 successful students. In short, the evidence from eye-tracking and interview seems to point to successful students’ cognitive processing of the relevant syntax in the text so as to come to the correct answer.
Eye-tracking statistics for item 5.
All significant at p < .05, n1 (incorrect) = 17, n2 (correct) = 21.
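The kind of group comparison reported for item 5 (group medians, with differences significant at p < .05) can be sketched as follows. This section does not name the statistical test used, so the choice of a Mann-Whitney U test is an assumption, made only because medians are reported. The function `mann_whitney_u` is a hypothetical helper, the durations are invented for illustration, and the normal approximation used for the p-value is only rough for groups this small.

```python
import math
from statistics import median


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using average ranks and a normal approximation."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1                      # pooled[i:j] share the same value
        avg_rank = (i + 1 + j) / 2.0    # mean of the ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u = min(u_x, n1 * n2 - u_x)
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0                   # every value identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Invented total fixation durations on the whole text, in seconds.
unsuccessful = [110.2, 131.0, 98.4, 140.7, 122.5, 119.3]
successful = [60.1, 72.8, 65.4, 58.9, 80.3, 66.0, 70.2]

u_stat, p_value = mann_whitney_u(unsuccessful, successful)
print(median(unsuccessful), median(successful), u_stat, p_value)
```

A rank-based test of this kind suits duration data, which are typically skewed by a few very long fixations or visits.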
Item 7
Unlike items 1–5, the later test items did not require the typing of a full word or number, but only the selection of a letter matching a given correct answer. As reported above (Table 2), these items were designed to make students read expeditiously for local information, this being achieved by scattering the answers across a lengthy text and obliging readers to look for particular words or phrases in order to identify the correct match.
With items 2 and 5 successful readers seemed better at expeditiously locating the place of a correct answer in the text. In those two items it was the weaker students who spent longer on larger areas of text (a page in one case and the whole text in the other), presumably because they were unable to locate the correct part quickly and then read the target text carefully. However, with item 7 it was successful readers who focused significantly more on a smaller, correct part of the text (see Table 7), suggesting that they were better able to identify the correct segment so as then to work more intensively on it. For example, successful students had significantly more fixations on the correct paragraph (median = 15.19) than unsuccessful students (median = 2.51).
Eye-tracking statistics for item 7.
All significant at p < .05.
In addition item 7 demonstrated a further significant difference between successful and unsuccessful students in terms of their focus on one element in the text which was key to the answer, namely the phrase ‘Olympic athletes’. As can be seen in Table 8, on four different measures successful students gave significantly more attention to this element, while unsuccessful students virtually ignored it, indeed more than half did not fixate on it at all (Visit count mean = 0.56, median = 0). The reason why this was significant to the successful candidates is that elsewhere in the text other athletes were mentioned, which served to distract the unsuccessful students, as revealed in their interview data. As the previous discussion shows, however, successful students demonstrated greater acuity in locating the correct paragraph, and then in addition focusing on the key element, in this case the mention of Olympic athletes specifically, which gave them the correct answer.
Eye-tracking statistics for item 7.
All significant at p < .05.
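The reading of the visit count figures for the phrase ‘Olympic athletes’ (mean = 0.56, median = 0) rests on a simple property of the median: if the median of a set of counts is zero, at least half of the values must be zero, even when the mean is above zero. A small sketch with invented counts, not the study data, makes this concrete.

```python
from statistics import mean, median

# Hypothetical visit counts for nine participants (not the study data).
visit_counts = [0, 0, 0, 0, 0, 1, 1, 1, 2]

share_zero = sum(1 for c in visit_counts if c == 0) / len(visit_counts)

print(round(mean(visit_counts), 2))   # mean pulled above zero by a few visitors
print(median(visit_counts))           # median stays at zero
print(round(share_zero, 2))           # proportion who never visited the element
```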
Item 10
As with item 7 above, this item achieved its aim of testing expeditious reading skills, in that successful students attended significantly more to the paragraph with the correct answer than unsuccessful students, reinforcing the evidence from earlier items suggesting that unsuccessful students lacked the appropriate expeditious reading ‘locating’ skills. Table 9 demonstrates this significantly greater focus, with successful students spending 13.62 seconds on the paragraph (visit duration median) in comparison with only 0.54 seconds for unsuccessful students.
Eye-tracking statistics for item 10.
All significant at p < .05, n1 (incorrect) = 12, n2 (correct) = 26.
Discussion
The eye-tracking data, although it showed significant differences in only five of the 10 items analysed, succeeded nonetheless in demonstrating in those items clear differentiation between proficient and less proficient students at three levels of cognitive processing in different test items, as shown in Table 10.
Summary of test results in terms of levels of processing.
Test items 3, 5 and 7 were therefore successful in distinguishing between proficient and less proficient students in terms of their cognitive processing at the lexical (word matching), lexical (synonymy), and grammatical levels (though the idiom ‘gives rise to’ is perhaps as much lexical as grammatical, and so could arguably be treated as level 2). Others also showed differences in terms of expeditious reading (items 2 and 10). There was no evident difference between successful and unsuccessful students at higher cognitive levels with this cohort, but this can be attributed to the fact that the test items were designed specifically to target specific information at a local level only.
Cognitive validity
In terms of cognitive validity, then, the findings from this study show the potential of eye-tracking to investigate cognitive validity in reading tests. Some of the IELTS items investigated here did prove capable of distinguishing, in cognitive processing terms, between successful and unsuccessful candidates. In particular, the use of eye tracking successfully gave insight into precisely which elements of a text and test item may be most significant in distinguishing between successful and unsuccessful responses, even at word level, in a way which no other previously used method could have done so convincingly; this is of crucial importance to test designers. In addition, the findings also demonstrated that successful candidates did employ the kinds of cognitive strategies which would be expected in real-life academic situations, while unsuccessful candidates did not, on several items.
Expeditious reading and metacognitive awareness
The results also showed significant differences in terms of expeditious reading, with unsuccessful students apparently unable to locate the site of a correct answer as effectively as successful students. They thus spent longer on larger chunks of text (items 2 and 5), while stronger students demonstrated that they could locate a smaller, particular part of a text and focus more expeditiously on it so as to extract the answer (items 7 and 10).
In the interview data for these items (2 and 5), successful candidates reported using conscious metacognitive strategies aiming to help them read expeditiously, for example ‘Here I was trying as quickly as I could to find the place where the answer was’. By contrast, unsuccessful students reported no such conscious strategies, but seemed rather to be searching almost at random, and with no strategic purpose. In other words, this area of expeditious reading seemed to be linked, in interesting ways, to metacognitive awareness, and seemed to distinguish successful from unsuccessful readers.
Other processes and strategies
In many of the test items (5 of 10), no significant differences were identified in the eye movement behaviour of those who were correct on each item and those who were not. The implication of this is that in these cases the more successful students did not use any one cognitive process significantly more than the less successful students in coming to the correct answer with these items, but perhaps used a variety of processes and strategies, or other faculties related for example to memory or lexical knowledge, none of which was predominant or traceable in the eye-tracking record. This in itself is of interest, suggesting that many test items – even items designed to target particular cognitive processes – might in practice be answered using a range of cognitive processes operating together. It also demonstrates a limitation in using eye-tracking to research this area.
However, since the research showed that with some test items there were clear and significant differences between the cognitive processes of successful and unsuccessful test takers, it can be concluded that – in terms of the first research question set out in the ‘Research questions’ subsection above – the value of using eye-tracking analysis to evaluate the cognitive processing of test takers in language tests has been established. At the same time the research successfully offered an answer to the second and third research questions, in demonstrating that when completing some test items, proficient readers in such tests do make use of significantly different eye movement behaviour from less proficient test takers, their behaviour presumably linked to different cognitive processing at a number of levels.
Implications for test design and education
The findings from this study suggest that to some extent it is effective for test designers to target specific cognitive processes, since some items in this research proved able to distinguish between successful and unsuccessful candidates in terms of their cognitive processing operations. Test item writers can therefore usefully draw on Khalifa and Weir (2009), for example, to plan the kinds of items they design, so as to test different levels of cognitive processing, with a view to achieving greater cognitive validity in their reading tests. In terms of these IELTS items in particular, it can be concluded that the eye-tracking data, so far as it goes, does show evidence that they are eliciting the cognitive operations which they are targeting, at least to some extent. However, the cognitive operations in question are limited to ‘lower level’ areas. Further research is needed into IELTS items which aim to target higher-order areas such as inferencing, building a mental model and whole-text function, if we are to conclude that the IELTS reading paper as a whole is covering the full range of cognitive processes.
The research also has implications for teachers and learners, since it is clear that the more successful candidates were those who made use of expeditious reading strategies, particularly to locate in the text the possible site of the correct answer as speedily as possible. Successful candidates also showed better abilities at the lexical level (in matching words in the question and the text, and in doing the same with synonyms), which suggests that it is valuable for learners to work on their lexical knowledge and their ability to identify lexical matches of various kinds. Successful students also showed better ability in terms of dealing with syntactic ambiguities, so it is useful for learners to learn to deal with grammatical ambiguity in reading test items which might obscure the correct answer.
As discussed earlier, research which uses eye tracking to investigate L2 reading is scarce, and what exists focuses mainly below the clause level and not on the reading of larger texts. It is worth noting that in this study nothing emerged which appeared peculiar to L2 readers, such was the proficiency of the participants; further research replicating this study with L1 readers would be needed to identify any salient differences.
Conclusion
In conclusion, this research project represents the first substantial research into cognitive processes in reading tests using eye-tracking technologies, and has successfully illuminated some of the darker corners of readers’ mental processing while they try to complete such tests. It has demonstrated the potential of this research tool, and also shown up some of its limitations, and offered insights into successful and unsuccessful readers’ behaviour in ways which can be of value to test designers, to students, to teachers and to researchers interested in the wider cognitive dimensions of reading.
Footnotes
Appendix 1: Layout of the text and test items
Appendix 2: Test items
| List of biometric systems | List of users to be matched by the test taker |
|---|---|
| A. fingerprint scanner | 6. sports students |
| B. hand scanner | 7. Olympic athletes |
| C. body odour | 8. airline passengers |
| D. voiceprint | 9. welfare claimants |
| E. face scanner | 10. home owners |
| F. typing pattern | 11. bank customers |
Appendix 3: Analysis of each test item in terms of anticipated cognitive processing
| Item | Level of text | Type of processing required | Target in question item – AOI | Target in the reading texts – AOI |
|---|---|---|---|---|
| 2. | Within one sentence | Lexical, Synonymy | Books; On paper | Printed page; 7000 volumes |
| 3. | Within one sentence | Lexical matching; Syntactic parsing | Drugs; Genetic defects | Drug (only occurrence in text); Single-gene disorders; disease |
| 4. | Across sentences | Inference (First); Lexical, Synonymy | First success; One form of | In 1986 (compared to 1989 later); One type of; Muscular dystrophy |
| 5. | Across sentences | Inference (dates); Lexical, Synonymy | Second success; The cause of | In 1989 (compared to 1986 earlier); Gives rise to; Cystic fibrosis |
| 6. | Within one sentence | Lexical matching; Synonymy | Sports students | Students, athletic (para A) |
| 7. | Within one sentence | Lexical matching | Olympic athletes | Olympic, athletes (para E) |
| 8. | Within one sentence | Lexical matching; Synonymy | Airline passengers | Passengers, airport (para F) |
| 9. | Within one sentence | Lexical matching | Welfare claimants | Welfare, welfare payments (para D) |
| 10. | Within one sentence | Synonymy | Home owners | Housing (para A) |
| 11. | Within one sentence | Lexical matching | Bank customers | Customers, Bank (para A) |
Acknowledgements
The author would like to thank colleagues at CRELLA, in particular Dr Fumiyo Nakatsuhara, for their comments on early drafts of this paper. Thanks also to Yun-ning Chen and Chihiro Inoue for their research assistance, and to staff and students at Canterbury Christ Church University for their kind cooperation.
Funding
The research reported in this article was supported in part by the ELT Research Award scheme funded by the British Council to promote innovation in English language teaching research. The views expressed are not necessarily those of the British Council.
References