Open Science in language assessment research contexts: A reply to Winke

Abstract

We read with interest the vision Professor Winke (2024) articulated in “Sharing, collaborating, and building trust: How Open Science advances language testing.” The viewpoint contribution brings language assessment into current discussions in applied linguistics, and scientific research more broadly, by arguing that Open Science practices will bolster the quality, clarity, and equity of language assessment research. The view is that these three goals would mark advances in the field to the benefit of all. The claim is that Open Science as defined by the practices Professor Winke outlined would contribute to research quality. The argument is that “higher caliber research” will result from increased “collaboration and sharing” of knowledge, methods, and data; the use of carefully planned methodologies; and “accountability” that leads to excellence. Open Science is also expected to promote “clarity of research methods and communication” (p. 846, original emphasis), thereby enhancing language assessment literacy for all, which is to provide the basis for trust in language tests across society. Finally, Open Science is credited as the means for fostering equity, which is predicted to occur when research about language learning and language testing becomes available to all, regardless of their geographical or financial circumstances.

The vision is universalist and utopian. What could go wrong?

In the interest of critical examination of Open Science for contributing to “quality, clarity, and equity” (p. 846) in language assessment research, we paint a more context-embedded view of the field. Our view of language testing as an area of study central to applied linguistics provides useful grounding because of reflections explicitly articulated about the identity of applied linguistics. For example, the authors of Mapping Applied Linguistics define applied linguistics as a mode of inquiry about language-related problems that takes into account the needs of people (e.g., learners, test-score users, and personnel in industry) while being responsive to local factors affecting research. Responsiveness in applied linguistics research typically involves collaborating with local actors in the design and evaluation of findings and recommendations (Hall et al., 2011).

This characterization of applied linguistics research resonates with our language assessment experience that straddles academics and testing programs. Like applied linguistics, we see language assessment as fundamentally engaged with real people and issues arising in political environments where academic perspectives and research goals have to be balanced against real-world needs, participants’ goals, local knowledge, and the existing beliefs and practices where language assessments are situated (Dimova et al., 2020; Fox & Artemeva, 2022). The intended benefits of Open Science need to be considered along with its potential unintended consequences for language assessment in the real world.

Open Science practices intended to increase the quality of language assessment research may actually have the opposite effect in the real world. Secondary research based on open data relies on metadata made available by the primary researcher to adhere to Open Science. For example, the primary researcher, knowing the limitations of the data collection conducted within the constraints of a school testing program, is able to use the data for the intended purpose. However, a secondary researcher using the metadata to conduct a bias analysis on the test may incorrectly conclude that the test is biased against students with particular first languages because the secondary researcher would have no way of knowing that the finding could be explained by circumstances of the test taking experience for particular students—not a biased test. Had the primary research been designed for a bias analysis, the researcher would have sought explanations that would lead to a context-informed interpretation of findings. Recognizing the need for context-informed research, when language assessment researchers use secondary data, they commonly work with language testing companies who contribute to the field by making carefully curated data available to researchers for conducting proposed research. In this situation, an ongoing communication channel is established between the data source and the researcher to support the appropriate use of the data.

In our experience when working with data compiled by entities other than testing organizations, it is not unusual to find data not adequately cleaned, insufficient information about the participants and the conditions of data collection, and no access to guidance from a responsible person. In other words, open data can lack qualities essential for credible interpretations. Even if crowd-sourced quality assurance is expected to mitigate inadequate data in the open data pool, credible interpretations depend on the research context and purpose. Moreover, the motivation to participate in Open Science practices may actually decrease the degree of contextual and qualitative data gathered in language assessment research in order to adhere to ethical and legal requirements for confidentiality of participants’ data.

The clarity of Open Science is also claimed to “ensure trust in the field . . . based on evidence gathered through rigorous and accessible peer-reviewed research” (Winke, 2024). But what would stop unethical researchers from simply making up the data that they use to obtain fictitious results? In this way, Open Science has the potential to lead to the proliferation of misinformation. Ethical researchers could unknowingly publish work based on erroneous data available in open data repositories. As a result, a few fallacious data sets provided by a small number of unethical researchers could have extremely detrimental effects on knowledge in the field of language assessment. We would suggest that trust building entails a range of practices in a community in which integrity is modeled and nurtured. Central to such communities are academic journals like Language Testing, whose editors and reviewers conscientiously uphold their responsibilities. In addition, high-profile conferences such as the Language Testing Research Colloquium (LTRC), where researchers openly challenge each other about their data collection procedures and assessment contexts, help to ensure this trust.

The clarity of Open Science is also claimed to “promote language assessment literacy.” However, in the real world of language assessment, it might actually have the opposite effect. Creating assessments and collecting data can be, and often is, a very challenging and time-consuming endeavor. If researchers can obtain publicly available language assessment data, they may not feel compelled to learn how to create assessments and collect data, or to analyze and interpret data for their assessment purpose. Being able to access other researchers’ data to conduct publishable research could lead to decreased knowledge of critical how-to aspects of language assessment literacy.

A final example of a potential unintended consequence of Open Science affects the epistemological underpinnings of the field. The field of language assessment has largely followed an a priori approach to research. In view of the interests and needs in language assessment contexts, researchers typically develop hypotheses based on theory and evidence and then evaluate hypotheses with various types of analyses. Open Science can be seen as an invitation for data mining instead of research based on theory, previous literature, and practical needs. Data mining can yield boundless meaningless statistical differences and relationships, which are subject to frivolous post hoc explanations that confuse rather than enlighten understanding of language assessment. We see a distinction between data mining and the current limited practice of researchers exploring data that they have collected themselves or obtained from a testing company. Researchers working with known data are more likely to increase their knowledge of dimensions within a research project about which they already have built an understanding and are therefore positioned to make appropriate interpretations of results.

The goals of quality, clarity, and equity, which were so pointedly defined and solved by Open Science in a universal language assessment culture, have to be recast as more relativistic entities in real-world language assessment contexts. What counts as quality in language assessment research has to be evaluated, in part, by its responsiveness to the context where the research needs were identified. Clarity in communicating language assessment research has to be judged by the intended audience, which can encompass a range of knowledge, needs, and interests. Equity is similarly contingent. In the real world where academic knowledge intersects with personal aspirations and business interests, equity inevitably requires consideration of equity for whom. Overall, the applied nature of language assessment research seems to us to distinguish it sufficiently from many types of science research to warrant a critical appraisal of the idea of embracing Open Science.

A critical appraisal must take into account the existing practices that have built the field. Language assessment professionals maintain a legacy of evolving practices that created and have moved the field forward over the past 50 years. These include research on language assessment using both quantitative and qualitative methods; ongoing dedication to high-quality, peer-reviewed journals; specializations in language assessment within applied linguistics graduate programs; participation of major international testing programs in the field; and an international organization for enacting the aspirations of the professional community. These practices have worked to support strong ethical values that are critical to research progress. For decades, language assessment researchers have largely succeeded in creating a research culture that instills high values beginning in training programs and then nurtures these values in the research community. It is difficult to see how Open Science would offer a better path forward than this well-established one. In short, Open Science practices would best be circumscribed to areas where they can improve research without attenuating the practices that have fueled the field for years.

Footnotes

Author contributions

Carol A. Chapelle: Conceptualization; Data curation; Writing—original draft; Writing—review & editing.

Gary J. Ockey: Conceptualization; Data curation; Writing—original draft; Writing—review & editing.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Carol A. Chapelle

References

Dimova, Yan, & Ginther (2020). Local language testing: Design, implementation and development. Routledge.

Fox & Artemeva (2022). Reconsidering context in language assessment: Transdisciplinary perspectives, social theories, and validity. Routledge.

Hall, C. J., Smith, P. H., & Wicaksono (2011). Mapping applied linguistics: A guide for students and practitioners. Routledge.

Winke (2024). Sharing, collaborating, and building trust: How Open Science advances language testing. Language Testing, 41(4), 845–859. https://doi.org/10.1177/02655322231211159