Open Science practices in language assessment: Introducing the special issue

Abstract

The movement to promote Open Science (OS) practices in applied linguistics research is already more than a decade old, with instrument repositories for language research such as the IRIS (Instruments for Research Into Second languages) database launched in 2011 (Marsden & Mackey, 2014), individual scholars calling for increased methodological transparency (e.g., Marsden, 2019; Plonsky, 2014), and journals such as Language Learning being early adopters of Open Science Badges (https://www.cos.io/initiatives/badges). The movement has gained additional traction in recent years with a growing number of initiatives. Consequently, these practices are becoming more widespread and increasingly encouraged or sometimes even mandated by selected funders, journals, and governments. This has generally increased the salience of OS in the discourse of language testing research and applied linguistics more broadly, as seen in discussions of questionable research practices, conflict of interest declarations, and much more.

OS is an umbrella term for a range of scientific practices that go far beyond open access publication. The United Nations Educational, Scientific, and Cultural Organization (UNESCO) General Conference, which adopted its Recommendation on Open Science in 2021, defines it as

an inclusive construct that combines various movements and practices aiming to make multilingual scientific knowledge openly available, accessible and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community. It comprises all scientific disciplines and aspects of scholarly practices, including basic and applied sciences, natural and social sciences and the humanities, and it builds on the following key pillars: open scientific knowledge, open science infrastructures, science communication, open engagement of societal actors and open dialogue with other knowledge systems (UNESCO, 2021).

As such, these practices have the potential to make scholarship more transparent, inclusive, and accessible by performing, reporting, verifying, and assessing research openly (Al-Hoorie et al., 2024; Al-Hoorie & Hiver, 2024; Liu, 2023; Liu et al., 2023; see also Winke, 2024, published in this issue, for additional arguments for and benefits of OS). Language assessment has a longstanding history of individual Diamond Open Access journals, such as the journal Studies in Language Assessment (formerly Papers in Language Testing and Assessment, and Melbourne Papers in Language Testing before that), as well as of being (and self-identifying as being) at the forefront of methodological robustness (i.e., measurement and validation) considerations (Purpura et al., 2015). Beyond that, the most recent editors of the journal Language Testing (particularly Harding, Winke, and Isaacs) have implemented many initiatives to accelerate the move towards OS: adopting Open Science Badges, inviting authors to preregister studies and share materials and data openly (Harding & Winke, 2021), encouraging authors to publish open accessible summaries in the OASIS (Open Accessible Summaries In Language Studies) database alongside their articles (Marsden et al., 2018), continuing a freely available podcast on selected papers started by Glenn Fulcher many years ago (now as a vodcast), making test reviews freely accessible (Harding & Winke, 2021), diversifying the article type categories (Harding & Winke, 2021), encouraging post-prints by authors on public digital repositories such as the Open Science Framework (Isaacs & Winke, 2024), and requiring conflict of interest declarations from editors and authors (Isaacs & Winke, 2024).

Nonetheless, we had observed informally that there seemed to be a lack of broad uptake of OS practices in the field so far (see Liu et al., 2024, for systematic evidence on this), suggesting some hesitance. As the wider field of applied linguistics has shown, it takes time for these practices to become common: replication studies still seem few and far between; Language Testing has yet to publish a registered report (Isaacs & Winke, 2024); sharing of materials, data, and code is not standard practice; and other language assessment-related journals or language assessment associations have not championed OS in the same way as Language Testing. While there may be a trend towards more open access publishing in recent years, other important aspects of OS practices regarding methodological rigour, research quality, transparency, and reproducibility appear to remain largely unaddressed. There may be subfield-specific reasons for this. Some might think that language testing, particularly where the stakes are high, involves commercial or political interests that could be at odds with some OS practices, and that this might be one of the primary reasons for these practices only slowly becoming mainstream. While this may be the case to some degree, we have to acknowledge that a lot of language assessment research is not constrained by the circumstances surrounding large-scale high-stakes tests, and large-scale test providers may actually be more open to the idea of supporting OS practices than one might think (see responses to Winke’s, 2024, Viewpoint article). The reasons are thus likely more complex and multifaceted, ranging from a lack of systematic incentivization, infrastructure, and resources to, possibly, a simple lack of general awareness of the issue at hand.

The idea of having a Special Issue of Language Testing on OS practices in language assessment therefore seemed very timely, and not just as a way to document the status quo of the field. When we proposed the Special Issue in 2023, our hope was also to open up the discussion about field-specific challenges and affordances in the specific context of language testing research and, more importantly, to add to the momentum of the movement more widely. We had already toyed with this idea in informal online late-night/early-morning discussions between Innsbruck and Hawaiʻi in 2020, but it was only in 2023 that we felt the time was right to suggest it. The call resulted in a wide range of highly interesting papers, both examining and exemplifying OS practices. It might seem as though there is a lack of qualitative empirical studies, but for various reasons, not all manuscripts invited after the initial call made it to publication in this Special Issue. We are particularly excited by those which did, as they make this indeed a very special Special Issue. Unlike typical Special Issues, where papers are generally bound together by a certain topical focus, we additionally wanted to illustrate how implementing OS practices pertains to numerous and potentially very different aspects and topics of language assessment research, thereby providing useful examples for future research to follow.

After considering all 18 submitted proposals, we invited 10 authoring groups (representing 18 authors overall) to submit full drafts of their manuscripts. Eight papers were submitted for review and went through the journal’s standard procedures of external double-blind peer review, with five papers ultimately accepted for publication. In the spirit of OS, the review process was characterized by one optional feature. If authoring teams agreed, the reviewers were also asked whether they would like to opt into our trial of a Transparent Review process, at the end of which all reviewer comments and author responses would be published as supplementary material alongside the article, allowing for a transparent reconstruction of the paper’s genesis. Only if all parties involved agreed to the transparent review procedure did we go ahead with it. Of the five accepted papers, three feature this groundbreaking Transparent Review. In addition, one more empirical paper that exemplified OS practices but went through the normal, non-Special Issue submission process was added to the Special Issue at the recommendation of the Language Testing editors and with the authors’ agreement.

In the first paper, Meng Liu et al. (2024) provide a systematic review of the status and trends of OS practices in three journals of our field. They examined the prevalence of open manuscripts, open materials, open data, and open code in the flagship journals Language Testing, Language Assessment Quarterly, and also Language Testing in Asia, a newer journal with all articles published open access via article processing charges (i.e., Gold Open Access). What makes this paper even more interesting is that they investigate how the exhibited OS practices in our field are related to authors being from the Global South or the Global North. This article was part of the Transparent Review pilot, and readers can find a link to the full peer review record in the article’s supplemental materials.

Dylan Burton (2024), in his paper, probed the relationship of non-verbal behaviour, as recorded and analysed by facial recognition and machine learning software, to language proficiency ratings for fluency, vocabulary, grammar, and comprehensibility. Burton provides a model illustration of OS practices: the study was preregistered, and all data and code are available to readers. This article, too, has a Transparent Review record available as a supplement.

Amber Dudley et al. (2024) report on the development and initial validation of a test of high-frequency French vocabulary knowledge for use in the General Certificate of Secondary Education (GCSE) exams in England. Focusing on lower-proficiency learners and a language other than English, the tool simultaneously addresses two critical research gaps and extends the toolkit of researchers and educators, all the while being an exemplar of OS, as the resulting instrument is freely accessible.

Hitoshi Nishizawa’s (2024) piece in this Special Issue offers another resource as a result of the study. He compiled an openly available Fluency Corpus of Academic English Lectures (FCAEL) that aims to allow test developers to sample representative fluency features for creating authentic listening passages. He compared the corpus data to the academic lecture passages in two established proficiency tests on 14 temporal fluency measures, proposing tentative thresholds for each of these features for test and material developers. Again, the coding schemes, analysis code, and raw corpus data for this study are openly available.

The Brief Report by Junlan Pan and Emma Marsden (2024) describes the development of an openly accessible, internet-based foreign language aptitude test battery. The authors describe the theory and need for such a tool and offer data from an initial validation attempt. Again, the paper is exemplary of OS practices in that it describes an open research endeavour that even has the potential to serve as an open data pool for researchers. It also has a Transparent Review record available as a supplemental file.

Another Brief Report by Hung Tan Ha et al. (2024) examined factors that predict the difficulty of English vocabulary test items among Vietnamese learners. Motivated by recent research suggesting that non-frequency factors can meaningfully predict word difficulty for learners, the authors used random forest, a machine learning technique rarely used in language testing research, to explore how a wide range of other lexical variables might predict difficulty. This study includes open data, which allows interested readers to attempt to reproduce the results of Ha et al. or conduct their own exploratory analyses of the data.

One core component that we had envisaged from the start for this Special Issue was to have several scholars from both academia and industry openly discuss the affordances and potential challenges of transitioning to increased OS practices as a field. Given her experience as a past journal editor, in which she championed many OS initiatives, Paula Winke was the ideal candidate to initiate this discussion with the Viewpoint piece found in this Special Issue. We are grateful that many colleagues from a range of contexts and levels of seniority accepted our invitation to respond to the arguments put forward by Winke, resulting in the seven Letters to the Editors at hand (Chapelle & Ockey, 2024; Clark & Bruce, 2024; Gebril & Bali, 2024; Koizumi et al., 2024; LaFlair, 2024; Papageorgiou, 2024; Jin & Fan, 2024). We anticipate that this exchange will be amongst the key contributions of the Special Issue.

All these papers illustrate OS practices to varying degrees. To recap, in addition to the transparent review pilot, this Special Issue is groundbreaking in that, for the first time, all empirical papers in a Language Testing issue feature open data, and most also feature open analysis scripts. In addition, most papers are published open access. We are particularly grateful to SAGE for supporting access to the Viewpoint and response letters by removing the paywall to ensure that everyone in our field and beyond can access and participate in this critical discussion. As such, we hope that this Special Issue will make a significant contribution to the OS conversation and future action and be a landmark in both showcasing these practices and ringing in the next generation of language assessment research that will be more transparent, inclusive, and accessible for all.

Footnotes

Author contributions

Benjamin Kremmel: Conceptualization; Writing—original draft.

Daniel R. Isbell: Conceptualization; Writing—review & editing.

Declaration of conflicting interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Benjamin Kremmel is the Book Reviews Editor of Language Testing. Daniel R. Isbell is an Associate Editor of Language Learning.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Benjamin Kremmel

Daniel R. Isbell

References

Al-Hoorie, A. H., Cinaglia, Hiver, Huensch, Isbell, D. R., Leung, & Sudina (2024). Open science: Considerations and issues for TESOL research. TESOL Quarterly, 58(1), 537–556. https://doi.org/10.1002/tesq.3304

Al-Hoorie, A. H., & Hiver (2024). Open science in applied linguistics: An introduction to metascience. In Plonsky (Ed.), Open science in applied linguistics (pp. 18–49). Applied Linguistics Press.

Burton, J. D. (2024). Evaluating the impact of nonverbal behavior on language ability ratings. Language Testing, 41(4), 729–758. https://doi.org/10.31219/osf.io/tc3qg

Chapelle, & Ockey (2024). Open Science in language assessment research contexts: A reply to Winke. Language Testing, 41(4), 882–885. https://doi.org/10.1177/02655322241239377

Clark, & Bruce (2024). Open Science should be welcomed by test providers but grounded in pragmatic caution: A response to Winke. Language Testing, 41(4), 872–876. https://doi.org/10.1177/0265532223122310

Dudley, Marsden, E. J., & Bovolenta (2024). A context-aligned two thousand test: Towards estimating high-frequency French vocabulary knowledge for beginner-to-low intermediate proficiency adolescent learners in England. Language Testing, 41(4), 759–791. https://doi.org/10.31219/osf.io/x6bzs

Gebril, & Bali (2024). A Global South perspective on Open Science in language assessment: A response to Paula Winke. Language Testing, 41(4), 886–891. https://doi.org/10.1177/02655322241260121

Ha, H. T., Nguyen, D. T. B., & Stoeckel (2024). What is the best predictor of word difficulty? A case of data mining using random forest. Language Testing, 41(4), 828–844. https://doi.org/10.1177/02655322241263628

Harding, & Winke (2021). Editorial 2021. Language Testing, 38(1), 3–5. https://doi.org/10.1177/0265532220965757

Isaacs, & Winke, P. M. (2024). Purposeful turns for more equitable and transparent publishing in language testing and assessment. Language Testing, 41(1), 3–8. https://doi.org/10.1177/02655322231203234

Koizumi, Maie, Yanagisawa, & In’nami (2024). Considerations to promote and accelerate Open Science: A response to Winke. Language Testing, 41(4), 892–897. https://doi.org/10.1177/02655322241239379

LaFlair, G. T. (2024). An industry perspective on open science: A response to Winke. Language Testing, 41(4), 865–871. https://doi.org/10.1177/02655322241261716

Liu (2023). Whose open science are we talking about? From open science in psychology to open science in applied linguistics. Language Teaching, 56, 443–450. https://doi.org/10.1017/S0261444823000307

Liu, Al-Hoorie, A. H., & Hiver, P. V. (2024). Open access in language testing and assessment: The case of two flagship journals. Language Testing, 41(4), 703–728. https://doi.org/10.17605/osf.io/vbjd6

Liu, Chong, S. W., Marsden, McManus, Morgan-Short, Al-Hoorie, A. H., Plonsky, Bolibaugh, Hiver, Winke, Huensch, & Hui (2023). Open scholarship in applied linguistics: What, why, and how. Language Teaching, 56(3), 432–437. https://doi.org/10.1017/S0261444822000349

Marsden (2019). Methodological transparency and its consequences for the quality and scope of research. In McKinley & Rose (Eds.), The Routledge handbook of research methods in applied linguistics (1st ed., pp. 15–28). Routledge. https://doi.org/10.4324/9780367824471-2

Marsden, Alferink, Andringa, Bolibaugh, Collins, Jackson, Kasprowicz, O’Reilly, & Plonsky (2018). Open Accessible Summaries in Language Studies (OASIS) [Database]. https://www.oasis-database.org

Marsden, & Mackey (2014). IRIS: A new resource for second language research. Linguistic Approaches to Bilingualism, 4(1), 125–130. https://doi.org/10.1075/lab.4.1.05mar

Nishizawa (2024). Authenticity of academic lecture passages in high-stakes tests: A temporal fluency perspective. Language Testing, 41(4), 792–816. https://doi.org/10.1177/02655322241262453

Pan, & Marsden (2024). Developing internet-based Tests of Aptitude for Language Learning (TALL): An open research endeavour. Language Testing, 41(4), 817–827. https://doi.org/10.1177/02655322241241849

Papageorgiou (2024). Can language test providers do more to support open science? A response to Winke. Language Testing, 41(4), 860–864. https://doi.org/10.1177/02655322241232361

Plonsky (2014). Study quality in quantitative L2 research (1990–2010): A methodological synthesis and call for reform. The Modern Language Journal, 98(1), 450–470. https://doi.org/10.1111/j.1540-4781.2014.12058.x

Purpura, J. E., Brown, J. D., & Schoonen (2015). Improving the validity of quantitative measures in applied linguistics research. Language Learning, 65(Suppl. 1), 37–75. https://doi.org/10.1111/lang.12112

UNESCO. (2021). UNESCO recommendation on Open Science. https://doi.org/10.54677/MNMH8546

Winke (2024). Sharing, collaborating, and building trust: How Open Science advances language testing. Language Testing, 41(4), 845–859. https://doi.org/10.1177/02655322231211159

Jin, & Fan (2024). Open Science for language assessment research and practice in China: A response to Winke. Language Testing, 41(4), 877–881. https://doi.org/10.1177/02655322231223100