Book Review: Statistical Analyses for Language Testers

Abstract

Green’s book is, by and large, an accessible introduction for language testers who are involved in developing or administering language tests. It is also intended for those studying towards an MA or PhD in language testing.

The book covers both Classical Test Theory (CTT) and Modern Test Theory (MTT) and is divided into 16 chapters, preceded by a two-page introduction that explains the difference between the two. The first two chapters address data entry and manipulation using the Statistical Package for the Social Sciences (SPSS), with instructions accompanied by many screenshots. Chapters 3 to 9 cover CTT topics and Chapters 10 to 16 cover MTT. Thirteen appendices provide additional exercises and instructions for writing data control files. Together, these make the book very much a hands-on experience in language test analysis.

To illustrate her points, the author uses three statistical packages that are widely used in language testing: SPSS, Winsteps and FACETS. (SPSS is produced by IBM; Winsteps and FACETS are produced by Mike Linacre.) Five symbols guide readers through each chapter: a boot, a compass, a road sign, a question mark and a pair of binoculars. The number of boots indicates the difficulty of a chapter; for example, the data entry chapter has one boot, whereas the chapter on factor analysis has four. The compass marks an explanation of new terminology, the road sign an explanation of a table or figure, the question mark a set of questions, and the binoculars the key to the exercises. These five symbols, along with 11 acronyms, are explained before Chapter 1. A page of references and further reading appears at the end of the book.

This book is innovative in the way it introduces applied statistics. The first two chapters deal purely with data entry and formatting within SPSS, with no attempt to explain the statistical concepts. One might assume that this involves no more than copying and pasting the data entry instructions from the SPSS manual. Whilst this is true to some extent, the data used to illustrate the procedures are language testing data, and the choices in data formatting are the most commonly used options in language test analyses. This helps analysts to familiarize themselves with the multiple screens/options within SPSS before being exposed to test analysis in the later chapters. Reading these two chapters and completing their accompanying exercises should prepare readers to focus on what needs to be analysed, rather than how the data should be prepared, once they turn to the subsequent chapters.

Chapters 3 to 9 cover CTT analyses. Each chapter starts with a brief description of the main statistical concepts and their importance to language test developers, followed by instructions on how to carry out the analyses in SPSS. Chapter 3 begins with item analysis, covering facility, mean, mode, median, score range, discrimination and reliability. Chapter 4 looks at descriptive statistics such as distribution, standard deviation, skew and kurtosis. Chapter 5 looks at questionnaire analysis using test-taker feedback data. Chapter 6 shows how to compare performance using means, scatterplots and correlations, whereas Chapter 7 shows how to compare the performance of two groups using parametric and non-parametric analyses. Chapter 8 extends the comparison to more than two groups and introduces ANOVA, and Chapter 9 deals with factor analysis. For practical reasons, Chapters 1 to 9 use only a selection of the test analysis options within SPSS; the choice of topics seems to be linked to the SPSS menus.
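For readers unfamiliar with the Chapter 3 statistics, the following minimal Python sketch computes three of them on a small, invented matrix of dichotomously scored responses. It assumes that facility is the proportion of test takers answering an item correctly, that discrimination is the corrected item-total (point-biserial) correlation, and that reliability is estimated with Cronbach’s alpha; the book may well use other variants (for example, upper-lower group discrimination), and the function names are hypothetical rather than anything produced by SPSS.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical, invented data: rows are test takers, columns are items (1 = correct, 0 = incorrect).
RESPONSES = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

def facility(responses, item):
    """Proportion of test takers who answered the item correctly."""
    scores = [row[item] for row in responses]
    return sum(scores) / len(scores)

def pearson(x, y):
    """Pearson correlation, using population (n) denominators throughout."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def discrimination(responses, item):
    """Corrected item-total correlation: item score vs. total score on the other items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)

def cronbach_alpha(responses):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(responses[0])
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

for i in range(len(RESPONSES[0])):
    print(f"Item {i + 1}: facility = {facility(RESPONSES, i):.2f}, "
          f"discrimination = {discrimination(RESPONSES, i):.2f}")
print(f"Cronbach's alpha = {cronbach_alpha(RESPONSES):.2f}")
```

The population (n) rather than the sample (n − 1) variance is used here; in both the correlation and alpha the choice cancels out, provided it is applied consistently.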

Chapters 10 to 16 deal with MTT, specifically item response theory (IRT) analyses. Chapter 10 is similar to Chapters 1 and 2 in that it explains how to write a control file for the Winsteps software. Chapters 11 and 12 introduce IRT using Winsteps: Chapter 11 covers convergence tables and variable maps and introduces the Rasch model, while Chapter 12 looks in more detail at item and person statistics and what they can offer test developers. Chapter 13 deals with distractor analysis for multiple-choice items. Whereas most other statistics books discuss this topic under CTT, its placement here is a welcome departure from the norm, as distractor analysis is meaningless without the concept of item difficulty, which is more appropriately addressed within an IRT framework. Moreover, Winsteps provides CTT statistics alongside its IRT output. Chapter 14 introduces the FACETS software but is very similar in approach to Chapter 11. Chapter 15 looks at iteration reports and the vertical ruler in FACETS, and Chapter 16 addresses item and rater measurement reports using FACETS.
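As background to the Rasch model introduced in Chapter 11, the following minimal Python sketch shows the dichotomous Rasch model, in which the probability of success depends only on the difference between person ability and item difficulty, both expressed in logits. The optional `severity` term is an assumption added to illustrate the many-facet extension used for rater data in FACETS; the function name and parameters are hypothetical, and the sketch does not reproduce how Winsteps or FACETS estimate their measures.

```python
import math

def rasch_probability(ability, difficulty, severity=0.0):
    """Probability of a successful response under the dichotomous Rasch model.

    All arguments are in logits. `severity` is a hypothetical extra facet (e.g. a
    rater) that, as in many-facet Rasch models, is subtracted from the log-odds.
    """
    log_odds = ability - difficulty - severity
    return 1.0 / (1.0 + math.exp(-log_odds))

# A test taker of ability 1.0 logits meeting items of increasing difficulty.
for b in (-1.0, 0.0, 1.0, 2.0):
    print(f"item difficulty {b:+.1f}: P(success) = {rasch_probability(1.0, b):.2f}")

# The same performance judged by a lenient and a severe (hypothetical) rater.
print(rasch_probability(1.0, 0.0, severity=-0.5), rasch_probability(1.0, 0.0, severity=0.5))
```

Under this model, a test taker whose ability equals an item’s difficulty has a probability of exactly 0.5 of succeeding, which is what it means for a person and an item to sit at the same point on a shared logit scale such as a variable map.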

General observations

The book deals with both CTT and MTT. While the author acknowledges that there is no shortage of statistics books written specifically for language testing readers (see, e.g., Alderson, Clapham, & Wall, 1995; Bachman, 1990, 2004; Brown, 2005; Henning, 1987; and McNamara, 2000), she notes that such books can be somewhat intimidating for the “everyday” language test developer or item writer. According to the author, the book “provides a ‘taster’ of what is out there – something to work through and decide whether you want to delve further into the mysteries of statistical analyses” (p. ix).

What sets Green’s book apart is its intended audience, which is mainly the less statistically informed test developer. Indeed, the author makes it clear from the outset that the book is not about theoretical statistics or mathematics; rather, it is about statistical analyses “applied” to language testing. This theme is adhered to throughout the chapters, where theoretical concepts are first explained simply and economically and then followed by step-by-step instructions on how to apply them to language test data using SPSS, Winsteps and FACETS. However, given that the book is also intended for postgraduate students, one must question the adequacy of the descriptions of statistical concepts in several chapters. Most descriptions are too short, and some read like bullet points. This would make the book less suitable for academically oriented language testing programmes. Of course, striking the right balance between explaining theoretical concepts and describing their practical application is always a challenge. Nevertheless, I believe that the statistical descriptions in the book lack depth, which sometimes leads me to describe it as more of a workbook than a textbook.

The coverage of statistical topics is broadly similar to that of the statistical textbooks in language testing mentioned above. Brown (2007), reviewing Bachman (2004), argued that statistics such as ANOVA may be more appropriate to a general quantitative language research book than to a language testing statistics book; nevertheless, Green makes a strong case for why this statistic is important for investigating certain aspects of test performance.

Conclusion

Most of the instructions in this book are based on practical exercises that the author has found useful over a decade of training language testers, both those delivering local and international projects and those attending her language testing courses at Lancaster University.

Alderson, in the Foreword, mentions that the book has been used in a summer course titled “Language Testing” at Lancaster. The way the materials are presented gives the impression that the book served as tutorial material for that course. In fact, the book could be a good complement to any language testing course, and it provides many practical exercises that illustrate a range of important and useful statistical concepts for test developers. The simple and clear instructions for three commonly used statistical packages for language test analysis make the book accessible to a less statistically informed audience. Indeed, as Alderson mentions, the book will help readers discover how statistics can reveal all sorts of interesting things about test items, test tasks and test scores. That said, a more in-depth discussion of key statistical analyses would have extended its readership to those in academic programmes as well.

The title of the book is strikingly similar to that of Bachman’s (2004) Statistical Analyses for Language Assessment, which inevitably invites comparison. Bachman presents statistical concepts in one book and practical test analysis exercises in a separate workbook (Bachman & Kunnan, 2005), whereas Green’s book mixes the two, giving priority to the latter. As a result, it lacks the depth of statistical discussion one would expect in an academic context, which makes it hard to recommend as a main textbook for MA and PhD programmes. In these contexts, it would perhaps work best alongside a more academically oriented textbook, such as Bachman’s.

When I first looked at Green’s book, I was in two minds about its usefulness. On the one hand, I was looking for more in-depth statistical discussion; on the other hand, I wanted it to be accessible to newcomers to language testing. The approach in this book reminded me of Weigle’s (1998) article on using FACETS to model rater training effects, which made the topic accessible to general language testing readers by showing how such effects could be investigated with FACETS. Green has done a similar job here but has applied it to a much broader range of statistical analysis techniques. This book clearly targets a new audience: language test developers and item writers. The choice of the three most commonly used statistical packages in language test analysis, together with clear instructions on how to run the analyses and interpret the results, makes statistical analysis very accessible to newcomers to language testing. The book covers an appropriate variety of test analyses for test development. The explanation that follows each statistical output, together with the signposting, will help readers identify what is important in the data and how to interpret it. I believe this book will have a special place in the library of many practitioners for years to come.

References

Alderson, J. C., Clapham, & Wall. (1995). Language test construction and evaluation. Cambridge: Cambridge University Press.

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.

Bachman, L. F. (2004). Statistical analyses for language assessment. Cambridge: Cambridge University Press.

Bachman, L. F., & Kunnan, A. J. (2005). Statistical analyses for language assessment workbook and CD ROM. Cambridge: Cambridge University Press.

Brown, J. D. (2005). Testing in language programs: A comprehensive guide to English language assessment (new edn). New York: McGraw-Hill College.

Brown, J. D. (2007). Book review: Statistical analyses for language assessment. Language Testing, 24(1), 129–135.

Henning. (1987). A guide to language testing: Development, evaluation and research. Cambridge, MA: Newbury House.

McNamara. (2000). Language testing. Oxford: Oxford University Press.

Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing, 15(2), 263–287.