Book review: Language Testing: Theories and Practices

Abstract

Q: What kind of a reader reads the series editor’s introduction to an edited volume of academic articles?

A: A review writer.

In the case of O’Sullivan’s edited volume, the series editor’s introduction is essential for orienting potential readers of the volume, including this reviewer. It identifies the target audience (postgraduate and research students, and researchers) and states that the book is intended to ‘offer the informed readership a conspectus of perspectives on key themes’ related to the present and future of language testing, as seen through the editor’s eyes. Candlin’s preface also presents the salient disclaimer that the book is not intended to serve as an overall guide or to provide exhaustive coverage of the subfields of language testing. In other words, few readers (bar a review writer) are likely to read the volume cover to cover. On the other hand, the book has something to offer many language testing students and professionals, which makes it essential for the libraries of institutions that offer postgraduate courses in language testing, and probably for the professors who teach such courses. Individual chapters will also interest researchers and postgraduate students who work on topics related to the key themes of the book: test validation, and the development and use of language tests.

The book is divided into four sections. The first is theoretical, the second covers the application of theory to practice, the third includes academic studies of practical test development projects, and the fourth covers studies of language test use. The first three sections include four articles each, while the last section includes two.

The theoretical section opens with a chapter by O’Sullivan and Weir on test validation. The authors argue for the usefulness of Weir’s socio-cognitive model of test validation on the grounds that it defines the scope and sequence of the validation work that examination boards can be expected to perform to support the quality of the tests they publish. They argue that the model represents an improvement over other recent theoretical frameworks and models of test validity and validation because it offers a workable agenda for test development boards. The chapter is at its best in its second half, where it describes Weir’s model and the rationale for its use. It is somewhat less clear in the case it builds against the alternative models that testing boards could follow in their validation work; that case ultimately seems to boil down to practicality. An interesting exercise for the reader would be to evaluate Weir’s model in the same way that O’Sullivan and Weir evaluate the set of ‘other models’ they consider for guiding validation work.

The second chapter, by Brian North, discusses the concept of proficiency levels and level descriptors. It offers an advanced, somewhat philosophical discussion of what the intuitive concept of proficiency levels actually means when analyzed more deeply, and it also provides some ‘insider’ information about the philosophy underlying the descriptors in the Common European Framework of Reference. North also discusses descriptor style and describes the process of descriptor development and validation. He acknowledges that level descriptors are summaries of perceptions, expressed in conventional wordings, rather than statements of learning theory. This chapter will interest researchers and practitioners who work on descriptor development.

The third chapter, by Alan Davies, analyzes whether theoretical arguments in language testing offer support for the Linguistic Relativity Principle, otherwise known as the Sapir-Whorf hypothesis. Davies introduces the Linguistic Relativity Principle and evaluates how the theory, in its strong or weak form, fares in relation to questions such as structural versus communicative language testing, unitary versus divisible competence, and ‘old’ versus ‘new’ validity. He concludes that work in language testing does not offer support for the Linguistic Relativity Principle. This chapter is valuable for serious students of language testing because it addresses theories that are seldom discussed and offers models of theory evaluation.

The final chapter in the theoretical section of the book, by Paul Joyce, presents a carefully designed and reported study of the influence of eight isolated (and relatively low-level) listening ‘subskill’ factors on overall listening test performance. Joyce’s discussion mostly addresses the surprisingly low contribution of vocabulary knowledge, both in the results of this study and in other studies. The chapter should prove interesting and informative to students researching listening comprehension.

The next four chapters in the book cover applications of theory to practice. These chapters follow the standard model of journal articles. In the first, Kantarcioglu and Papageorgiou discuss a comprehensive list of procedures for standard-setting design and implementation, especially in the context of the Common European Framework of Reference. They describe conceptual issues in a reader-friendly way. The chapter is especially useful for examination boards planning to conduct standard setting.

In the next chapter, Toshihiko Shiotsu discusses his development of a word difficulty index using self-assessment data from Japanese learners of English. Shiotsu modeled the difficulty of the words with Rasch analysis and correlated the results with frequency indicators from the British National Corpus and with an 8000-word list created specifically for Japanese students of English. He found moderate correlations, but he also found that the Rasch difficulty data did not predict the frequency data very well. This chapter will interest students doing vocabulary research as well as developers of vocabulary item banks.

In the third application chapter, Abdul Raof discusses a study in which he developed a scale for rating conference presentations on criteria that are relevant to content experts rather than language experts. He argues that such a real-life orientation can improve the validity of rating and testing, especially if the purpose of the test and the intended score use are to inform real-life decisions rather than language education decisions. This article is useful for practitioners working with ‘real-life’ specific-purpose assessments.

In the final application chapter, O’Sullivan and Nakatsuhara describe a way of implementing Conversation Analysis-inspired measures of topic initiation, topic ratification, and amount of talk for analyzing group oral tasks. The article includes good references on how the measures were derived, shows how the measures were actually applied in the analysis of group test discourse, and discusses implications for rating scale content. This article is useful for readers interested in speaking assessment.

For me, the next four chapters, which focus on practical test development projects, make the most significant contribution of the volume to the language testing literature. Their informed discussion of practical test development considerations throws light on the non-ivory-tower reality of much of language testing. In the first chapter, Anthony Green describes the real-life influences on the development of a test for placing students in English classes at different levels. The article examines the development through both Bachman’s validity argument for test use and Weir and O’Sullivan’s socio-cognitive framework for test validation. Practicality and acceptability emerge as significant additional influences on the shape of the final test. This article should prove interesting reading for test developers.

The next chapter, by Deirdre Burrell and colleagues, describes how a modern languages department in the UK developed a diagnostic assessment of the language knowledge of its trainee teachers. The project members identified gaps in essential knowledge by analyzing student-produced texts and interviewing stakeholders, constructed a diagnostic instrument and revised it based on pilot data, implemented the assessment, and evaluated its effects on the students and the program. The article should interest teacher educators especially, but also many others who want to implement assessments that support learning.

The third chapter, by Florescano and colleagues, reports on the development of an English language examination at the University of Veracruz in Mexico. The university assembled a team that, under the guidance of external advisors including O’Sullivan, developed a high-quality examination model that met the university’s needs yet was affordable. The resulting examination suited the local context better than a broad, international examination would have. At the same time, the team built the local expertise needed to develop and administer a large-scale examination. This chapter will interest language testers and administrators who are considering local examination development.

The fourth practical chapter, by Brown and Jaquith, describes the development and validation of an online rater training and marking system in the United Arab Emirates. Following a summary of research on online training and marking, the authors describe the development and implementation of their local marking system, including the design and technical difficulties they encountered and how they resolved them. The article should interest examination boards that are considering an online marking system for writing.

The last two chapters cover research on test use. In the first, Rea-Dickins, Kiely and Yu report on their study of the use of the IELTS examination in university admissions. After situating their study in existing research, they report the practices they found through document analysis, questionnaires, and interviews. They discuss the power of the language test score as a ‘hard’ criterion among more malleable, qualitative considerations, the misinterpretation of a language test score as a content qualification for a course of study, and the limited knowledge of the language test among many of the administrators who make admission decisions.

In the second chapter, O’Dwyer presents a case from a private university school in Turkey, where a new, formative assessment system was introduced into an existing system of language learning. He discusses the administrative and management decisions and procedures that were required to achieve the intended aims of the formative assessment system, from managing changes in learner expectations and developing staff expertise to securing resources that support both the assessment and the new learning model the assessment was intended to engender. This should prove interesting reading for researchers and practitioners who work on validation, especially on the consequences of test use.

All in all, O’Sullivan’s edited volume offers a range of perspectives on language testing and has something to offer many readers. It is particularly good at presenting cases of practical test development. The book is a useful resource for academic advisors and libraries, and individual chapters will interest a wide range of students and researchers.