Open Science should be welcomed by test providers but grounded in pragmatic caution: A response to Winke

Keywords

Data sharing, IELTS, Open Science, test provider, transparency

The call for the field of language testing to adopt Open Science is a timely one, particularly as part of a welcome drive for greater transparency in research and assessment practices across the board.

The intention of this response is to support Open Science by offering some on-the-ground observations from researchers at a large-scale test provider (IELTS) and to highlight some of the challenges and opportunities that arise through its proposed adoption. Specifically, we will address what the move may mean for trust and risk around tests being used for high-stakes purposes, and our own experiences of data sharing thus far as part of the IELTS Joint-funded Research Programme and beyond. A series of recommendations (and supportive caveats) to help facilitate the move are included; we remain cognisant of the need to balance academic priorities with practical or commercial considerations, and believe that doing so remains a viable and desirable outcome. Bringing “pure” academic research and operational test provider validation research closer together through Open Science may help achieve this aim.

High-stakes testing contexts: Trust, risks, and suggested mitigations

“Open Science will hopefully remove trust in the field of language testing that is based on reputation without reasonable proof, and ensure trust in the field is based on evidence gathered through rigorous and accessible peer-reviewed research” (Winke, 2024, p. 6). This quote highlights the opportunity to maintain trust that is at risk of erosion if test providers’ claims are not supported by robust research evidence. All test providers’ validation evidence, such as claims of equivalence, of alignment to existing frameworks, or around test security, should be interrogated rather than taken at face value. This is especially important if existing standards regarding test use and fitness for purpose come under increasing pressure. For example, in the current climate those involved in decision-making cannot always be assumed to have the assessment literacy skills required to make a considered judgement (Gan & Lam, 2022). Calls for publicly available research evidence (i.e., datasets, or rigorously peer-reviewed journal articles with clearly articulated academic findings) could become central to maintaining standards. We believe that Open Science can play an increasingly valuable role here; it is imperative that commercial pressures, marketing tactics, and lobbying do not replace empirical evidence as the force that guides decision-making. Reputations for trust among test score users should be earned and maintained.

Furthermore, it is important not to conflate the implications of Open Science for all parties. Testing contexts differ in stakes and scale of administration, for example, and “pure” academic research is not necessarily the same as validation research. Each case should be considered separately, to avoid unexpected consequences. The sharing of datasets (described below) is a positive way to bring these forms of research together. However, for any test provider large or small, Open Science must not equate to “open season,” and shared datasets should not be misused or weaponised to suit a particular worldview, commercial agenda, or academic hegemony. If one test provider shares data more openly than another, for instance, this could prove problematic, especially if a competitor uses the data in a way that deliberately reflects badly on the provider that shared it. With increased sharing comes greater responsibility on the part of the researcher not to abuse it, or there may be a risk that test providers withdraw from sharing for reasons of self-protection. The scope for manipulation or abuse of Open Science (e.g., through secondary analysis) should be considered as part of implementation, as well as the possible mitigations.

Relatedly, publishing outcomes no matter what they show is a laudable academic practice, and an increase in research with null or negative results “seeing the light of day” (p. 6) would be an encouraging shift towards transparency. Of course, this may not always be viable for test providers with commercial interests. Although preregistration may offer an opportunity for compromise, under operational testing conditions it is important that some studies are allowed to evolve over time, and the approach to a research design may require increased flexibility compared to carefully pre-planned academic inquiry. The need to pivot to meet emerging demands, respond to incidents, or probe deeper into findings is ever-present, and doing so can have positive implications for a research study. Trialling some of the above aspects of Open Science with a large-scale test provider may help investigate the viability of compromise: IELTS research could be a good candidate for this.

Winke’s editorial piece also acknowledges that not all test materials can be shared (p. 8). Similarly, there will be validation research, data, and outcomes that must remain internal (e.g., for reasons of test security). Although Open Science may be considered “an accelerating, world-wide movement, on its way towards being a normative practice in many fields, including applied linguistics” (Winke, 2024, p. 3), not all fields of research are the same. Language testing can influence key decisions and have major consequences, and as such, sensitivities around data sharing and increased transparency must not be overlooked. The suggested “embargo” on research findings is a good idea, and could be extended to include the right to withdraw if needed, in line with participant consent in a research project, for example.

The notion that Open Science fosters equity in research is appealing, but some researchers will require support to learn the “research literacy” skills required (e.g., uploading preprints, using relevant tools or platforms, navigating public repositories). Although Winke (2024) offers clear steps that researchers can take, it is essential that these are straightforward to follow, especially if researchers who are short of time are not to be further disadvantaged. It is important that Open Science does not raise the bar for publication for those who are already struggling to participate. A re-evaluation of the lengthy academic peer review process may be welcome alongside Open Science reform, if the process is not to become overly time-consuming. The introduction of new types of article (e.g., brief reports) is conducive to sharing test provider research, and rigorous but streamlined reviews may further help document our studies while the content remains relevant for the reader in an industry evolving at pace.

Data sharing, access, and supporting researchers

“It may be transformative for language-testing grant programs to require a public registration of any successful grant proposal after grant review, but before funding is provided and data are collected” (Winke, 2024, p. 6).

Beyond maintaining trust in language testing, Open Science may allow us to further develop and refine our existing data-sharing capabilities. Some of the editorial suggestions are already functioning as part of IELTS research, while others could be trialled for implementation. The IELTS Joint Research Committee (JRC) manages funded projects, alongside our research collaborations with external researchers. These collaborations may be either commissioned to investigate a particular area (e.g., evaluation of test constructs) or non-commissioned projects of interest to all parties (such as an ongoing impact study looking at stakeholder perceptions of university admissions tests). Another responsibility of the JRC is data sharing, which follows a process designed to balance data access with ethical, security, or other practical considerations. Winke’s suggestion of public registration (p. 6) of projects is viable. Although we do log funded projects on the IELTS website, more information may indeed make researchers more accountable to their original plans, and ensure that funding organisations uphold the same integrity. However, experience suggests that researcher accountability has not been the biggest challenge; the practicalities or security concerns around sharing data linked to the EU and UK-adapted General Data Protection Regulation (GDPR), a privacy and security law for processing personal data, are perhaps more significant.

Winke also accepts that preregistration is less feasible for exploratory research. Stakeholder perspectives play a key role in the validation process, and different kinds of evidence alongside psychometric data are required to support both test development and validation. The IELTS Research Group uses exploratory and qualitative research for these purposes and would highlight the need for research in this vein to maintain an equal footing with quantitative methods as part of Open Science.

Development of a secure databank for research access

In recent years, the JRC has overseen the design of a secure and sizeable bank of IELTS speaking and writing test performance data for researchers to use. Datasets have been cleaned and all personally identifying information about candidates has been removed, in line with our consent procedures.

At present, its use remains largely limited to funded researchers for contractual reasons. However, we are interested in potentially making this more publicly accessible in line with Open Science requirements, and as Winke suggests, enhancing transparency around the procedures for borrowing data, the types of datasets available, and publishing data request forms. Currently, researchers interested in accessing IELTS data for independent research tend to email one of the IELTS teams rather than following a standardised protocol. Developing a series of guidelines around the use of shared datasets could be an important next step as part of a commitment to adopting Open Science principles, including respecting the context in which the original data were collected and the purpose for which they were collected. It is not immediately clear how Open Science can increase assessment literacy (Winke, 2024): further guidance on how we can help facilitate this would be welcome.

The use of non-disclosure agreements has permitted sharing more sensitive data in the past, as has the possibility for researchers to visit Cambridge and conduct supervised work with internal documents onsite. It should also be noted that external requests are often high in number and not always part of a robust research design. The need to support researchers to understand or use the data they are given does arise. Although we had never considered a fee for data sharing (Winke, 2024, p. 4), such a fee may contribute to the costs and help make the overall process more manageable.

In summary, we reiterate our optimism and concur with Winke (2024) that the adoption of Open Science in language testing has the potential to advance “quality, clarity and equity” (p. 846) in research. The IELTS Research Group is open to further developing our existing processes in line with Winke’s suggestions through Open Science. This should be done with cautious pragmatism, bearing in mind, as Winke reminds us, the specific requirements and challenges of operational testing research compared to purely research-based testing.

Footnotes

Author contributions

Tony Clark: Conceptualization; Writing – original draft; Writing – review & editing.

Emma Bruce: Conceptualization; Writing – original draft; Writing – review & editing.

Declaration of conflicting interests

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: IELTS is jointly owned by Cambridge University Press & Assessment, British Council and IDP Education. The first and second author are employed by Cambridge and the British Council, respectively. The views shared represent our individual perspectives and experiences rather than the organisations as a whole. As Chair of the IELTS Joint Research Committee, the first author oversees the sharing of data with researchers. These data are securely held in Cambridge.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

References

Gan, L., & Lam, R. (2022). A review on language assessment literacy: Trends, foci and contributions. Language Assessment Quarterly, 19(5), 503–525. https://doi.org/10.1080/15434303.2022.2128802

Winke, P. (2024). Sharing, collaborating, and building trust: How Open Science advances language testing. Language Testing, 41(4), 845–859. https://doi.org/10.1177/02655322231211159