Universal tools activation in English language proficiency assessments: A comparison of Grades 1–12 English learners with and without disabilities

Abstract

English learners (ELs) comprise approximately 10% of kindergarten to Grade 12 students in US public schools, and about 15% of ELs are identified as having disabilities. English language proficiency (ELP) assessments must adhere to universal design principles and incorporate universal tools, which are designed to increase accessibility for all ELs, including those with disabilities. This two-phase mixed methods study examined the extent to which Grades 1–12 ELs with and without disabilities activated the following universal tools during an online ELP assessment: Color Overlay, Color Contrast, Help Tools, Line Guide, Highlighter, Magnifier, and Sticky Notes. In Phase 1, analyses were conducted on test and telemetry data (a record of keystrokes and clicks) from 1.25 million students. Phase 2 involved interviewing 55 ELs after test administration. Findings show that ELs activated the Line Guide, Highlighter, and Magnifier more frequently than the other tools, and that tool activation rates were higher in the listening and reading domains than in speaking and writing. A significantly higher percentage of ELs with disabilities than ELs without disabilities activated the tools, but effect sizes were small. Interview findings further revealed students’ rationales for tool use. Results indicate that ELs’ activation of universal tools differs by disability category and language domain, providing evidence for the usefulness of these tools.

Keywords

Accessibility; English language proficiency assessment; English learners; English learners with disabilities; K–12; universal tools

Introduction

With advances in technology, computer-based or online assessments have incorporated numerous accessibility features intended to help all students access test content and fully demonstrate their language abilities (Shafer Willner & Monroe, 2016), and to yield fair and valid outcomes. These accessibility features, also known as universal tools, are “selectable embedded features or hand-held instruments used to carry out a particular purpose” (Shafer Willner & Monroe, 2016, p. 3). They have the potential to better support test-takers, particularly kindergarten to 12th grade (K–12) English learners (ELs), as they engage with high-stakes English language proficiency (ELP) assessments in US school settings. We define ELs as students who use languages other than English at home and could benefit from support through language instruction educational programs.

Universal tools are often embedded into assessments as part of universal design. Universal design is an approach to test development that builds in accessibility and validity from the outset, with consideration for all intended students regardless of age, socioeconomic status, disability, or linguistic or cultural background (American Educational Research Association [AERA] et al., 2014; Hansen & Mislevy, 2006). Universal design principles include presenting content in multiple modalities, test items featuring prompts with supporting animations and graphics, embedded scaffolding, tasks broken into chunks, and modeling with task prototypes and guides (Shafer Willner & Monroe, 2016). They also encompass accessibility and accommodations: accessibility features are available to all students, whereas accommodations are limited to supports provided to students with disabilities. The aim of accessibility is to give all students an unrestricted opportunity to show their true performance on the measured construct (AERA et al., 2014), and to this end, students can be offered various supports (Shafer Willner & Monroe, 2016). Assessments employing universal design therefore promise to reduce accessibility barriers for all students, including ELs and students with disabilities (Hansen & Mislevy, 2006; Liu & Anderson, 2008). In the United States, the Department of Education requires that state assessments follow universal design principles and incorporate universal tools to enhance accessibility for all students (see U.S. Department of Education [USDE], 2018).

Computer-based testing makes it possible to incorporate built-in accessibility features (Abedi et al., 2020; Chia & Kachchaf, 2018; Thurlow et al., 2010; Wolf et al., 2022). Computer testing platforms are flexible enough to accommodate a variety of features, and their implementation can be more efficient (Chia & Kachchaf, 2018). For example, features can be item specific and, because of their interactive nature, better matched to test-taker needs (e.g., needs-based use) (Solano-Flores, 2022). In addition, computer-based testing allows test-takers to use multiple features simultaneously, with the intention of improving accessibility (Abedi et al., 2020; Wolf et al., 2022). Accessibility features embedded in ELP assessments differ in nature from those in content assessments. Because ELP assessments are designed to measure language ability, their features tend to support the test-taking experience rather than provide explicit linguistic support (e.g., translating the prompt from English into other languages).

Computer technology allows test developers to maintain and/or introduce in ELP assessments various universal tools related to processing skills and meaning-making (e.g., note-taking, outlining, or other “scratch paper” tools; highlighters; and line guides). In addition, computer-based testing enables test developers to track the extent to which accessibility features are used and to obtain data for exploring test-takers’ cognitive processing (e.g., response time) (Crotts-Roohr & Sireci, 2017; Wolf et al., 2022). However, there is limited evidence regarding the activation and effectiveness of universal tools in computer-based ELP assessments, particularly in K–12 settings. Without research on how young language learners engage with universal tools on language assessments, test developers cannot fully determine whether these tools provide the intended benefits in the language testing context, and the possibility remains that the tools do not function as intended for this population. If universal tools distract young language learners, drawing their attention away from the test content, they may pose a risk to the validity of the assessment.

K–12 ELs

In fall 2019, ELs constituted approximately 10% of US public-school students in kindergarten to Grade 12, and this share is expected to grow (Irwin et al., 2022). ELs are a diverse group of students that includes students with disabilities. The challenges ELs with disabilities face are multifaceted, involving both language- and disability-specific factors, which can be interconnected (Zehler et al., 2003).

ELs, including students with and without disabilities, have a legal right to receive appropriate language support as they learn academic content (see Castañeda v. Pickard, 1981; Lau v. Nichols, 1974). US federal law¹ also requires state and local education agencies to appropriately assess, place, and monitor the progress of ELs annually with ELP tests (Every Student Succeeds Act [ESSA], 2015, Section 1111(b)(2)(G)). The law not only mandates maximizing the inclusion of ELs with and without disabilities but also stresses the importance of accessibility in the design and development of all assessments, including ELP assessments (ESSA, 2015; USDE, 2014, 2018). Given this need for accessibility, test developers should consider employing universal design principles and incorporating necessary accessibility features (USDE, 2018). Accessibility features, such as virtual highlighters or online dictionaries, are intended to reduce the impact of extraneous and/or disability-related factors so that test-takers can demonstrate the abilities the assessments are intended to measure.

Previous research on universal tools for ELs

Although there has been a plethora of research on accommodations (see Abedi, 2002; Pennock-Roman & Rivera, 2011 for overviews), studies on accessibility features are still emerging (AERA et al., 2014). ELs with disabilities have often been ignored in research on accessibility (Thurlow & Kopriva, 2015), and ELs and students with disabilities have typically been examined separately (Albus & Thurlow, 2008). In addition, most research on computer-based accessibility features pertains to content assessments for ELs (Abedi, 2014; Abedi et al., 2012, 2020; Cohen et al., 2017; Crotts-Roohr & Sireci, 2017; De Backer et al., 2019; Wolf et al., 2022). Most of these studies focused on pop-up glossaries (Abedi et al., 2020; Cohen et al., 2017; Crotts-Roohr & Sireci, 2017; Kopriva et al., 2021; Wolf et al., 2022), paraphrasing tools (Crotts-Roohr & Sireci, 2017), and visualization features such as font manipulation (Abedi et al., 2012; Kopriva et al., 2021).

For example, Cohen et al. (2017) investigated the effectiveness and validity of a pop-up glossary feature within a computer-based math and English language arts (ELA) assessment given to Grades 3 and 7 ELs and non-ELs. The glossary benefited the EL population, but results were somewhat inconsistent. Although Grade 7 ELs’ performance on the ELA assessment improved when they had access to glossaries, their math achievement declined with glossary use. Similarly, Grade 3 students’ performance was slightly negatively affected by the tool, suggesting that the glossary may have been distracting. Such findings raise questions about the effectiveness of universal tools.

Using a randomized controlled design, Wolf et al. (2022) assigned 518 ELs and non-ELs taking a ninth-grade math test to one of three conditions: no support, linguistic modification, or an English glossary. Regression analysis results indicated that students’ reading proficiency, but not the universal tools, predicted math performance. Notably, the authors explored response time and telemetry data (a record of keystrokes and clicks), which revealed that both groups of students spent more time on the test when the test language was modified or a glossary was presented, although the differences were not statistically significant. The telemetry data on glossary activation showed that most students used this tool only once, and fewer than 15% used it again. The authors argue that “what matters is whether there are any students who use the accommodation, rather than whether there is a sizable number of students using the accommodation” (p. 43). Hence, significance tests or frequency analyses alone may not provide conclusive evidence about the effectiveness of the tools.

In another study, De Backer et al. (2019) investigated students’ perceptions of read-aloud, test translation, and test translation accompanied by read-aloud on a Dutch science test. A total of 752 fifth-grade students were randomly assigned to these conditions. The students completed a questionnaire gauging the usefulness of the tools, and 35 students were interviewed about their experience with them. The questionnaire results showed that students’ perceptions varied with the support provided. While more than half of the students found translations useful, 60% of those not receiving this support thought it would have been helpful. By contrast, 40% of the students perceived read-aloud as useful, and only 36% of those not receiving it wished they had this tool. The interview results showed that students’ use of the tools changed with item content and difficulty. The students also used the tools for a variety of purposes, such as gaining more information about the items, learning new vocabulary, or saving time for other test items. Meanwhile, students who did not use the tools indicated that they did not need them. The authors concluded that it is difficult to generalize about ELs’ tool use because “the perceived value of the accommodation does not necessarily depend on the frequency of use” (p. 437) for the highly heterogeneous EL student population.

Compared with the literature on accessibility features in content assessments, research on such tools in ELP assessments is limited. In a meta-analysis, Liu et al. (2020) identified only 11 studies conducted with ELs between 2010 and 2018, of which only a couple focused on ELP assessments. Existing studies focus on foreign language assessments in higher education rather than K–12 contexts (e.g., Choi & Cho, 2016; Frankenberg-Garcia, 2011). These studies typically explore linguistic tools in reading and writing assessments. For instance, Oh (2018) researched adult test-takers’ use of spelling, grammar, dictionary, and thesaurus tools in a writing assessment, comparing the performance of participants at different proficiency levels with and without access to the tools across different tasks. The findings provided evidence for the reliability of the assessment and the generalizability of score interpretations when test-takers were allowed access to universal tools, specifically spelling and reference tools. The study also demonstrated that including these tools increased the interactivity and authenticity of the tasks.

The use of universal tools in K–12 ELP assessments is a nascent area of research (e.g., Educational Testing Service, 2020; Guzman-Orth et al., 2020). For instance, Kim et al. (2022) investigated EL educators’ perceptions of universal tools and how their students utilized them. In a mixed methods study, the researchers surveyed K–12 educators from 30 states and conducted follow-up interviews. The authors indicated that educator perceptions of universal tools varied, and educators valued certain tools more than others. The tools educators valued most were the highlighter, line guide, and underlining tool. The educators also attributed the frequency of students’ use of universal tools to a variety of factors, including familiarity with computers, length of residence in the United States, grade level, and special needs. However, more research is needed to understand ELs’ actual use of the tools.

Overall, the accessibility of ELP assessments for K–12 students is not well understood, especially in the context of computer-delivered ELP assessments. In addition, the field has yet to examine how ELs with or without disabilities access these assessments. Computer-delivered assessments with embedded universal tools may provide an opportunity for ELs to better access the test content and demonstrate their language development.

Study purpose and research questions

The current research aims to address this gap in the literature by investigating how ELs use universal tools embedded in ELP assessments and whether these tools provide the intended support to this underserved population. Specifically, this study² examines the activation of universal tools embedded in an online ELP assessment by Grades 1–12 ELs with and without disabilities. In a two-phase mixed methods study, we explore which tools ELs activate most frequently and how activation patterns vary with students’ disability status. The Phase 1 quantitative findings from telemetry data provide an overview of students’ activation of universal tools, and the Phase 2 qualitative findings from student interviews help explain students’ rationale for using the tools and the tools’ perceived helpfulness.

Findings from the current study could inform the extent to which universal tools serve intended subgroups. Note that the effectiveness of the universal tools (i.e., whether they actually improve ELs’ test performance) is beyond the scope of the study. Given the available data, the study is mainly concerned with ELs’ tool activation behavior, which provides a baseline understanding of complex test-taking behaviors in a highly diverse test population. In addition, the interview data offer insights into students’ rationale for using the tools. The following research questions are addressed in the study:

To what extent do ELs, including ELs with and without disabilities, activate the universal tools embedded in an ELP assessment?

To what extent do ELs with and without disabilities differ in their activation of the universal tools?

Do ELs with disabilities vary in their activation of universal tools depending on their disability type?

What were the ELs’ rationales for activating the universal tools?

Research context

This study examines the extent to which Grades 1–12 ELs activate the accessibility features embedded in ACCESS for ELLs (hereafter ACCESS), an ELP assessment administered annually across 41 US states and territories. Although ACCESS is offered as both a paper and an online test, this study focuses on the online format, which is available for Grades 1–12. ACCESS Online, administered annually to approximately 1.5 million ELs, is developed by WIDA (2022c) and the Center for Applied Linguistics in extensive collaboration with educators. The test measures students’ academic English language development in the four language domains of listening, reading, speaking, and writing. By default, ACCESS Online is delivered entirely via computer, except for the Grades 1–3 writing test, in which students handwrite their responses. A recent study by Kim, Lee et al. (2019) found that students in these younger grades produce more writing and typically perform better when they handwrite rather than type on a keyboard. Grades 4–12 students also have the option to handwrite their responses if this accommodation is required. Therefore, fewer telemetry data are available for the writing domain than for the other domains.

Methods

We conducted a two-phase mixed methods study, analyzing both quantitative and qualitative data to understand ELs’ activation of universal tools. In Phase 1, we analyzed telemetry data (a record of keystrokes and clicks) to examine students’ activation of the tools. We define “activate” as turning on a given universal tool and applying it when responding to the test questions. The telemetry data used in the study do not allow us to determine students’ intentions or whether tool use was purposeful (e.g., a student applying the Highlighter to mark an important part of the text). For this reason, we conducted Phase 2 to better understand students’ intentions in using the universal tools (see Table 1 for a summary of the methods).

Table 1.

Summary of study methods.

	Phase 1 (quantitative)	Phase 2 (qualitative)
Data	Telemetry data of 1.25 million Grades 1–12 ELs who took ACCESS (including 12% of students with disabilities)	55 Grades 4–12 students (including 15 students with disabilities)
Procedures	• Descriptive and frequency analyses for each universal tool in the four language domain tests • Analyses conducted separately on ELs with and without disabilities	Qualitative analysis of: • The tools used • Their rationale for tool use • Suggestions for improving the tools

Data

For Phase 1, data consisted of students’ ACCESS telemetry data and demographic variables, most notably disability status and category. We analyzed data from approximately 1.25 million Grades 1–12 ELs from the 2016–2017 school year. Roughly 12% of the students were reported to have one or more disabilities.³ The majority of the participating ELs identified as Hispanic (64.2%), and there were slightly more males (53.9%) than females (44.4%). Telemetry data contain information on a variety of student actions, such as activation of universal tools, as well as the duration of those actions. Nine universal tools are embedded in the online test platform, and this study explored eight of them: Color Overlay, Color Contrast, Highlighter, Line Guide, Magnifier, Help (Help [General] and Help [Tools]), and Sticky Notes (see Table 2 and Figure 1). Most of these universal tools support ELs in processing the language input of the test. For example, the two Color tools, the Line Guide, and the Highlighter are intended to support students’ processing of the written input provided throughout the assessment, not just on the reading test. Except on the listening test, students must process written input throughout ACCESS, both as stimuli for the constructed-response tasks and as passages on the reading test. Sticky Notes are intended to support students as they plan and then respond to writing tasks: students may list key words, outline their response, or draft text for their final response. The universal tools are available throughout each domain and can be activated for individual items, except for Sticky Notes, which are available only in the writing domain. The only universal tool not included in this study was the test pause button, which contributes less directly to the accessibility of the assessment than the others. Students could activate the tools an unlimited number of times at any point during the test.

Table 2.

Description of universal tools examined in this study.

Tool	Description	Availability
Color Overlay	Changes only the color behind words and pictures; six predefined background colors	All four domains
Color Contrast	Changes both the color of words and the color behind them; six predefined combinations	All four domains
Highlighter	Marks parts of the presented text	All four domains
Line Guide	A horizontal line that test-takers drag across the lines of the presented text	All four domains
Magnifier	Enlarges graphics and text to 1.5 or 2.0 times the default size	All four domains
Help	Gives information about the universal tools, with two options: (1) “What’s This?” (referred to in this article as Help [General]), which describes how to use the Help tool, and (2) “Open Help” (referred to as Help [Tools]), which explains how to navigate the online test platform and activate the universal tools	All four domains
Sticky Notes	A free-writing space for organizing ideas and planning writing	Writing

Figure 1.

A sample test item in ACCESS Online showing universal tools at the bottom of the screen.

Phase 2 data consisted of interview and telemetry data from 55 ELs in Grades 4–12 (see Table 3).⁴ Of the 55 students, 15 were identified as having disabilities: all 15 had a primary disability of specific learning disability, and four also had a secondary disability of speech/language impairment. Data were collected from five school districts across three US states.

Table 3.

ELs’ reported use of universal tools.

	Color tools	Help tools	Highlighter	Magnifier	Line Guide	Sticky Notes
All ELs (n = 55)
Listening	1 (1.8%)	2 (3.6%)	4 (7.3%)	4 (7.3%)	2 (3.6%)	0 (0.0%)
Reading	0 (0.0%)	0 (0.0%)	13 (23.6%)	11 (20.0%)	4 (7.3%)	0 (0.0%)
Speaking	0 (0.0%)	1 (1.8%)	1 (1.8%)	3 (5.5%)	1 (1.8%)	0 (0.0%)
Writing	0 (0.0%)	1 (1.8%)	9 (16.4%)	3 (5.5%)	1 (1.8%)	5 (9.1%)
Total	1 (1.8%)	4 (7.3%)	27 (49.1%)	21 (38.2%)	8 (14.5%)	5 (9.1%)
ELs without disabilities (n = 40)
Listening	0 (0.0%)	2 (5.0%)	3 (7.5%)	2 (5.0%)	1 (2.5%)	0 (0.0%)
Reading	0 (0.0%)	0 (0.0%)	8 (20.0%)	9 (22.5%)	4 (10.0%)	0 (0.0%)
Speaking	0 (0.0%)	1 (2.5%)	0 (0.0%)	3 (7.5%)	1 (2.5%)	0 (0.0%)
Writing	0 (0.0%)	1 (2.5%)	9 (22.5%)	2 (5.0%)	1 (2.5%)	3 (7.5%)
Total	0 (0.0%)	4 (10.0%)	20 (50.0%)	16 (40.0%)	7 (17.5%)	3 (7.5%)
ELs with disabilities (n = 15)
Listening	1 (6.7%)	0 (0.0%)	1 (6.7%)	2 (13.3%)	1 (6.7%)	0 (0.0%)
Reading	0 (0.0%)	0 (0.0%)	5 (33.3%)	2 (13.3%)	0 (0.0%)	0 (0.0%)
Speaking	0 (0.0%)	0 (0.0%)	1 (6.7%)	0 (0.0%)	0 (0.0%)	0 (0.0%)
Writing	0 (0.0%)	0 (0.0%)	0 (0.0%)	1 (6.7%)	0 (0.0%)	2 (13.3%)
Total	1 (6.7%)	0 (0.0%)	7 (46.7%)	5 (33.3%)	1 (6.7%)	2 (13.3%)

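Note: EL: English learner. Cells show the number (percentage) of interviewed students in each group who reported using a tool in a given domain; Total rows are the sums of the four domain rows. Color tools combine Color Overlay and Color Contrast; Help tools combine Help (General) and Help (Tools).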

Procedures

For Phase 1, descriptive and frequency analyses were conducted for each universal tool in each of the four language domains to explore the extent to which students activated the universal tools built into ACCESS. Tool activations were item specific: once activated, a tool remained active until the student proceeded to the next item. We analyzed the number of activations of each tool in each domain for each student. The main focus was the percentage of students who activated a given tool at least once while completing a domain. To reiterate, activation was operationalized as turning on a given tool and applying it when responding to the item.
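The two quantities described above, the percentage of students who activated a tool at least once in a domain and the median number of activations among those students, can be sketched as follows. This is a minimal illustration under assumed inputs: the record layout, the function name `summarize_activation`, and the sample values are hypothetical and do not reflect the actual ACCESS telemetry format or the authors’ code.

```python
from collections import defaultdict
from statistics import median


# Hypothetical telemetry layout (not the ACCESS export format): one record per
# tool activation, as (student_id, domain, tool). Students who never activated a
# tool do not appear, so the denominator (students who completed each domain)
# is passed in separately.
def summarize_activation(events, students_per_domain):
    """Return {(domain, tool): (pct_students_activating, median_activations)}.

    pct_students_activating: percent of students completing the domain who
    activated the tool at least once.
    median_activations: median activation count among those students only.
    """
    counts = defaultdict(lambda: defaultdict(int))  # (domain, tool) -> student -> n
    for student, domain, tool in events:
        counts[(domain, tool)][student] += 1

    summary = {}
    for (domain, tool), per_student in counts.items():
        pct = 100 * len(per_student) / students_per_domain[domain]
        summary[(domain, tool)] = (round(pct, 1), median(per_student.values()))
    return summary


# Illustrative usage with made-up records (10 students completed each domain).
events = [
    ("s1", "Reading", "Highlighter"),
    ("s1", "Reading", "Highlighter"),
    ("s1", "Reading", "Highlighter"),
    ("s2", "Reading", "Highlighter"),
    ("s3", "Listening", "Magnifier"),
]
print(summarize_activation(events, {"Reading": 10, "Listening": 10}))
# {('Reading', 'Highlighter'): (20.0, 2.0), ('Listening', 'Magnifier'): (10.0, 1)}
```

Computing the median only among students who activated a tool keeps it interpretable even when most students have zero activations, which is typical of the skewed counts reported here.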

To examine variation in universal tool activation between ELs with and without disabilities, descriptive and frequency analyses were conducted separately for the two groups. Universal tool activation was also explored by disability category: autism, deaf–blindness, developmental delay, hearing impairment, intellectual disability, multiple disabilities, other health impairments, orthopedic impairment, serious emotional disability, specific learning disability, speech impairment, traumatic brain injury, and visual impairment.
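The subgroup comparisons described above amount to computing the same at-least-once activation rate within each group. A minimal sketch, assuming each student carries a single group label (e.g., a primary disability category, or a label for students without disabilities) and that every listed student completed the domain; the function name `activation_rate_by_group` and all inputs are hypothetical.

```python
from collections import defaultdict


# Hypothetical inputs (illustrative only):
#   activated: student_id -> set of (domain, tool) pairs activated at least once
#   groups:    student_id -> single subgroup label (assumed, e.g., primary category)
# Assumes every student in `groups` completed the domain being summarized.
def activation_rate_by_group(activated, groups, domain, tool):
    """Percentage of students in each subgroup who activated `tool` in `domain`."""
    n_in_group = defaultdict(int)
    n_activated = defaultdict(int)
    for student, group in groups.items():
        n_in_group[group] += 1
        if (domain, tool) in activated.get(student, set()):
            n_activated[group] += 1
    return {g: round(100 * n_activated[g] / n_in_group[g], 1) for g in n_in_group}


# Illustrative usage with made-up students.
groups = {
    "a": "specific learning disability",
    "b": "specific learning disability",
    "c": "no disability",
    "d": "no disability",
    "e": "no disability",
    "f": "no disability",
}
activated = {"a": {("Reading", "Highlighter")}, "c": {("Reading", "Highlighter")}}
print(activation_rate_by_group(activated, groups, "Reading", "Highlighter"))
# {'specific learning disability': 50.0, 'no disability': 25.0}
```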

For Phase 2, the research team interviewed students about their tool use after they took ACCESS (retrospective verbal protocol). The interviews were conducted individually in English via the Zoom video conferencing platform, lasted approximately 15–20 minutes each, and were video-recorded. Researchers first transcribed the interviews and then analyzed students’ verbal data according to (1) the tools they reported having used during the test, (2) their rationale for using or not using the tools, and (3) their suggestions for improving the tools. Two members of the research team coded the data set independently and then reviewed each other’s coding. Discrepancies in coding were resolved during a meeting with the entire research team.

Results

In this section, we present findings from the two phases of the study. Phase 1 focuses on the telemetry data—which tools were activated most frequently across the test domains, the differences in tool activation between ELs with and without disabilities, and tool activation according to disability types. Phase 2 results provide an overview of which tools the interviewees used, their rationale for using the tools, and their suggested enhancements to the tools.

Phase 1—Telemetry data

Activation of tools among all ELs

Before presenting the Phase 1 findings in detail, it is important to note that overall tool activation was low across the four domains, generally below 15% of students. In the listening domain (see Figure 2), more students activated the Magnifier (9.3%), Line Guide (8.7%), and Highlighter (4.7%) than the other tools. In addition, the mean and median numbers of activations were highest for the Highlighter and Magnifier. Comparatively few students activated the two Help and two Color tools, with the Help tools having the lowest activation rate (1.8%). Most activations of the Help and Color tools were also single occurrences.

Figure 2.

Frequency of universal tool activation among all ELs across four domains.

Figure 3.

Frequency of universal tool activation by ELs with and without disabilities.

In the reading test (see Figure 2), more students activated the Highlighter (11.1%; Med. = 9) than in the listening domain (4.7%). The percentage of students using the Magnifier remained the same as in listening, at 9.3%. Although the Line Guide was expected to attract more students in the reading domain than in listening, fewer ELs activated it in reading. However, its median number of activations was higher in reading (Med. = 2) than in listening (Med. = 1). The two Help tools were the least activated, with fewer students activating them (0.8%) than in the listening domain (1.8%).

Tool activation generally dropped in the speaking and writing domains compared with the listening and reading domains. In the speaking domain (see Figure 2), the activation rate was higher for the Magnifier (5%) and Line Guide (4.5%) than for the other tools. Highlighter activation dropped considerably to 3.8% (Med. = 5). The two Help tools were again the least activated features (about 1%), with single activations.

In the writing test (see Figure 2), the Highlighter was the most activated tool (5.3%), followed by the Magnifier (4.9%). The Line Guide and Sticky Notes were each activated by 4% of students (Med. = 2). Activation of Sticky Notes, which are available only in the writing domain, was lower than expected given the tool’s presumed usefulness as a place for students to organize their ideas before responding to the writing prompt. As in the speaking domain, the two Help features were the least activated tools (fewer than 1%).

To summarize the Phase 1 findings for all ELs, a higher percentage of students activated the Highlighter, Line Guide, and Magnifier than the other tools. The Highlighter was the most activated feature in both the reading and writing sections, whereas Sticky Notes activation in the writing domain was lower than expected. Notably, fewer students activated the tools as they progressed from the listening and reading domains to the speaking or writing domain. The continuous decrease in Help tool activation could be partially attributed to students’ increasing familiarity with the computer environment as they spent more time on the assessment.

Difference in tool activation between ELs with and without disabilities

Both ELs with and without disabilities showed tool activation patterns similar to those of the overall EL group (hereafter “all ELs”). However, the percentages of students activating the universal tools were generally slightly higher among students with disabilities than among students without disabilities across all domains. In the listening domain, a higher percentage of students with disabilities activated every tool (see Figure 3). Among students without disabilities, activation rates for the Magnifier and Line Guide were 2.8 and 1.6 percentage points lower, respectively. The Highlighter was the third most activated tool among students with disabilities, used by 5.9% (Med. = 6), which was 1.4 percentage points higher than the rate among students without disabilities. Activation of the Color and Help tools was more limited in both groups, with group differences ranging from 0.1 to 0.8 percentage points.

Significance testing of differences between students with and without disabilities was conducted for each accessibility feature, and effect sizes were calculated. Because tool activation was non-normally distributed (i.e., positively skewed) across the four domains and group sample sizes were unequal, the Mann–Whitney U test was used; parametric tests may not yield robust results when violations of distributional assumptions are combined with unbalanced sample sizes (Howell, 2013). With very large samples, small p-values and statistically significant results are to be expected, because even trivial differences are detected. Effect sizes are therefore more informative in this setting, as they take sample size into account.
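The point about sample size can be made concrete with the large-sample normal approximation to U. The article does not state which effect size formula it used; r = z/√N, a common choice for the Mann–Whitney U test, is assumed here, and ties are ignored in σ_U for simplicity. Rearranging gives z = r√N, so the same small effect (r = 0.04) fails to reach the conventional 1.96 threshold at N = 1,000 but is highly significant at N = 1,000,000.

```latex
\begin{aligned}
\mu_U &= \frac{n_1 n_2}{2}, \qquad
\sigma_U = \sqrt{\frac{n_1 n_2 (N + 1)}{12}}, \qquad N = n_1 + n_2,\\
z &= \frac{U - \mu_U}{\sigma_U}, \qquad r = \frac{z}{\sqrt{N}}
\;\;\Longrightarrow\;\; z = r\sqrt{N},\\
r = 0.04:\quad z &\approx 1.26 \ (N = 1{,}000), \quad
z = 4 \ (N = 10{,}000), \quad
z = 40 \ (N = 1{,}000{,}000).
\end{aligned}
```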

In the listening domain, the two groups differed significantly in their activation of the Color Overlay, Color Contrast, Line Guide, Highlighter, and Magnifier, yet effect sizes were small. For example, Highlighter activation was significantly higher among students with disabilities (mean rank = 27,213.38) than among students without disabilities (mean rank = 25,436), U = 197,236,889, z = 10.161, p < .001, but the effect size was negligible (effect size = 0.04).
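To make the statistics reported above concrete, the sketch below computes mean ranks, U, a tie-corrected normal-approximation z, a two-sided p-value, and an effect size using only the Python standard library. It is an illustration, not the authors’ analysis code: the article does not report its software or effect size formula (r = |z|/√N is assumed), no continuity correction is applied, and the zero-inflated activation counts are invented to mimic positively skewed data. Results can differ slightly from packaged routines that apply a continuity correction or an exact method by default (e.g., SciPy’s `mannwhitneyu`).

```python
import math


def average_ranks(values):
    """Return 1-based ranks (tied values share their mean rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over all sorted positions holding the same value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def mann_whitney(x, y):
    """Mann-Whitney U for sample x vs. sample y: tie-corrected normal approximation
    (no continuity correction) plus an assumed effect size r = |z| / sqrt(N)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        raise ValueError("all observations are tied; z is undefined")
    z = (u - mu_u) / math.sqrt(var_u)
    return {
        "mean_rank_x": rank_sum_x / n1,
        "mean_rank_y": (sum(ranks) - rank_sum_x) / n2,
        "U": u,
        "z": z,
        "p_two_sided": math.erfc(abs(z) / math.sqrt(2)),
        "r": abs(z) / math.sqrt(n),
    }


# Made-up activation counts per student (most students never activate a tool).
with_disabilities = [0, 0, 1, 3, 5]
without_disabilities = [0, 0, 0, 0, 1, 2]
result = mann_whitney(with_disabilities, without_disabilities)
print(round(result["U"], 1), round(result["z"], 3), round(result["r"], 3))
# 20.5 1.098 0.331
```

The tie correction matters for data like these: when most students have zero activations, a single large tie group shrinks the variance of U, and ignoring it would understate z.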

Activation of tools in the reading domain decreased relative to the listening domain for both students with and without disabilities, except for the Highlighter (see Figure 3). Roughly twice as many students with disabilities (11.4%) and students without disabilities (10%) activated the Highlighter in the reading test as in the listening test. As in the listening domain, more students with disabilities than without generally activated the tools. Mann–Whitney U tests for the reading domain showed a significant group effect for four of the universal tools: Color Contrast, Line Guide, Highlighter, and Magnifier. As in the listening domain, however, effect sizes were small.

Tool activation by students with and without disabilities generally decreased in the speaking and writing domains (see Figure 3), and group differences narrowed accordingly. In the speaking domain, for instance, the Magnifier was the most activated tool, with a group difference of only 1 percentage point. Group differences were smallest in the writing domain, all below 0.5 percentage points. In both domains, Mann–Whitney U tests showed significant group differences in activation of the Highlighter and Magnifier, but effect sizes were small.

Based on the Phase 1 data, universal tool activation was slightly higher among students with disabilities than students without disabilities. The group differences were more noticeable in the listening and reading domains than in the speaking and writing domains, especially for the Line Guide, Highlighter, and Magnifier. Although group differences were statistically significant for certain tools (e.g., Highlighter and Magnifier), effect sizes were very small, indicating the differences were not particularly meaningful.

Activation of universal tools among ELs with disabilities by disability category

We expected universal tool activation among ELs with disabilities to vary by disability category (see Table 4). The Phase 1 data show that, across all domains, the most common primary disability type was specific learning disability (18%), a disorder in the basic psychological processes involved in understanding or using language (e.g., dyslexia and developmental aphasia), followed by speech or language impairment (6.5%). Deaf–blindness (0.01%) and visual impairment (0.05%) were the least common primary disability types.
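The percentages in the paragraph above are reproduced, to the reported rounding, if the base is the total N of the reading-domain rows in Table 4 with the NA group included. Treating that total as the denominator (and NA as ELs whose disability type was not reported) is our inference, not something the text states; the sketch below only checks the arithmetic and adds no new data.

```python
# Reading-domain N values copied from Table 4. NA is assumed to be ELs with
# disabilities whose disability type was not reported.
READING_N = {
    "Autism spectrum disorder": 1908,
    "Deaf-blindness": 18,
    "Developmental delay": 2970,
    "Hearing impairment, including deafness": 483,
    "Intellectual disability": 1844,
    "Multiple disability": 438,
    "Other health impairments": 4673,
    "Orthopedic impairment": 176,
    "Serious emotional disability": 1260,
    "Specific learning disability": 26520,
    "Speech or language impairment": 9560,
    "Traumatic brain injury": 121,
    "Visual impairment, including blindness": 80,
    "NA": 97544,
}

total = sum(READING_N.values())
share_pct = {dtype: 100 * n / total for dtype, n in READING_N.items()}

# Rank the reported disability types (NA excluded) from least to most common.
ranked = sorted((n, dtype) for dtype, n in READING_N.items() if dtype != "NA")

print(total)  # 147595
for dtype in ("Specific learning disability", "Speech or language impairment",
              "Deaf-blindness", "Visual impairment, including blindness"):
    print(f"{dtype}: {share_pct[dtype]:.2f}%")
```

Rounded to one decimal place, the first two shares give 18.0% and 6.5%; rounded to two, the last two give 0.01% and 0.05%, matching the figures in the text.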

Table 4.

Universal tool activation by disability type in each language domain.

Disability type	Domain	N	Color Overlay (%)	Color Contrast (%)	Help (general) (%)	Help (tools) (%)	Line Guide (%)	Highlighter (%)	Magnifier (%)	Sticky Notes (%)
Autism spectrum disorder	L	1,913	2.4	2.0	4.0	2.6	9.8	6.4	11.5	–
	R	1,908	1.9	2.1	1.5	0.9	8.6	10.5	9.6	–
	S	1,838	1.5	1.7	1.4	1.4	5.8	5.8	6.3	–
	W	922	1.2	1.7	1.7	1.0	3.9	6.8	6.0	4.8
Deaf–blindness	L	16	0.0	0.0	0.0	0.0	6.3	0.0	18.8	–
	R	18	0.0	0.0	0.0	0.0	5.6	22.2	5.6	–
	S	11	9.1	0.0	0.0	0.0	9.1	0.0	0.0	–
	W	9	0.0	0.0	0.0	0.0	22.2	11.1	11.1	11.1
Developmental delay	L	2,979	1.2	1.2	1.6	1.0	6.1	3.7	7.7	–
	R	2,970	1.3	1.2	1.0	0.5	6.9	5.6	7.7	–
	S	2,915	0.7	0.8	0.4	0.4	3.3	2.5	4.7	–
	W	97	0.0	2.1	1.0	0.0	8.2	10.3	14.4	4.1
Hearing impairment, including deafness	L	461	3.0	3.0	2.6	1.7	9.8	3.9	11.7	–
	R	483	2.3	1.4	0.4	1.0	10.6	12.6	9.9	–
	S	446	2.0	1.6	1.1	1.3	4.0	5.4	5.8	–
	W	302	2.3	2.0	1.0	0.7	5.0	4.3	6.6	3.3
Intellectual disability	L	1,860	2.4	1.7	3.2	1.8	9.8	4.9	11.5	–
	R	1,844	1.7	1.4	0.9	0.5	6.7	9.2	10.0	–
	S	1,758	1.3	1.0	0.7	1.0	3.8	3.5	4.7	–
	W	1,374	0.6	0.5	0.4	0.4	3.6	4.0	4.4	2.5
Multiple disability	L	441	2.7	0.7	2.5	1.4	10.0	7.0	12.7	–
	R	438	1.6	1.1	0.5	0.9	8.0	9.4	11.9	–
	S	425	0.9	0.9	0.7	0.7	5.9	5.2	8.2	–
	W	296	1.0	2.0	1.0	0.3	4.7	5.1	6.4	2.4
Other health impairments	L	4,691	4.6	4.4	3.3	2.5	13.4	8.8	15.5	–
	R	4,673	3.0	3.3	1.3	1.0	11.1	13.2	13.3	–
	S	4,568	2.5	3.0	1.5	1.2	6.4	6.0	7.5	–
	W	3,103	2.8	3.1	0.9	0.7	4.9	6.1	5.6	5.5
Orthopedic impairment	L	176	2.3	1.7	1.7	3.4	11.4	5.1	9.7	–
	R	176	2.3	2.3	1.1	0.6	13.1	12.5	11.9	–
	S	173	0.6	1.7	1.7	1.7	4.6	2.9	3.5	–
	W	95	1.1	1.1	0.0	1.1	6.3	6.3	3.2	4.2
Serious emotional disability	L	1,281	5.7	5.2	4.3	3.7	17.3	9.8	16.7	–
	R	1,260	3.3	3.3	1.3	1.0	10.6	13.1	12.0	–
	S	1,196	3.9	3.7	1.9	1.7	8.7	9.9	9.7	–
	W	930	2.3	3.0	1.0	1.1	5.8	5.5	6.8	5.8
Specific learning disability	L	26,599	3.3	2.8	2.6	1.9	9.9	5.7	11.8	–
	R	26,520	2.5	2.6	0.9	0.6	9.2	12.2	11.7	–
	S	25,917	1.9	1.9	1.1	0.9	5.0	4.7	5.9	–
	W	19,911	1.7	1.7	0.8	0.6	3.6	5.0	5.1	4.0
Speech or language impairment	L	9,576	2.0	2.0	2.4	1.5	7.8	4.9	9.6	–
	R	9,560	1.8	1.9	1.0	0.6	8.6	10.0	11.1	–
	S	9,409	1.4	1.5	1.0	0.8	5.0	4.1	5.7	–
	W	3,295	2.0	2.1	1.3	0.8	5.7	7.1	6.7	5.8
Traumatic brain injury	L	123	0.0	0.8	3.3	0.0	6.5	7.3	13.8	–
	R	121	0.8	0.8	0.8	0.8	8.3	11.6	14.0	–
	S	118	0.8	0.8	0.0	0.0	4.2	1.7	3.4	–
	W	81	0.0	1.2	1.2	0.0	3.7	3.7	7.4	4.9
Visual impairment, including blindness	L	82	1.2	3.7	0.0	0.0	14.6	1.2	41.5	–
	R	80	3.8	3.8	1.3	0.0	11.3	8.8	51.2	–
	S	75	1.3	4.0	2.7	1.3	4.0	4.0	18.7	–
	W	49	2.0	6.1	2.0	0.0	6.1	2.0	51.0	0.0
NA	L	97,583	3.3	3.0	2.8	2.1	10.3	5.9	11.9	–
	R	97,544	2.5	2.4	1.0	0.7	8.4	11.5	10.9	–
	S	94,687	1.8	1.9	1.0	0.8	5.1	4.7	6.2	–
	W	62,292	2.0	2.1	0.8	0.6	4.2	5.6	5.4	4.4

Note. L = listening; R = reading; S = speaking; W = writing. – = not applicable (Sticky Notes are available only in the writing test).

In the listening domain (see Table 4), students with a serious emotional disability had the highest activation rate for every tool except the Magnifier, which was activated by 42% of students with a visual impairment and 19% of deaf–blind students. In the United States, individual states base their definitions of serious emotional disability on the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 2022), with most state definitions mentioning a functional impairment that may impact a child’s ability to engage with school activities and ability to learn. Students with other health impairments generally had the second-highest rates of tool activation. Meanwhile, students with deaf–blindness, developmental delays, and visual impairments generally had low tool activation rates. For example, students with deaf–blindness activated only the Line Guide and Magnifier.

In the reading domain (see Table 4), students with a visual impairment and students with a serious emotional disability generally showed high tool activation rates. Magnifier activation was highest among students with visual impairments (51.2%). Students with other health impairments also showed high activation of the Line Guide, Highlighter, and Magnifier. Notably, students with deaf–blindness had the highest Highlighter activation rate (22.2%), although this group comprised only 18 students. In contrast, students with a developmental delay or an intellectual disability generally had low rates of tool activation.

In the speaking domain (see Table 4), tool activation was generally higher among students with a serious emotional disability than among other disability groups. Students with a visual impairment were an exception, with a high percentage of these students activating the Magnifier (18.7%). Compared with the reading domain, however, Magnifier activation dropped among all disability groups in the speaking domain. Regarding other tools, students with deaf–blindness showed comparatively high activation of the Color Overlay (9.1%) and Line Guide (9.1%), each rate corresponding to 1 of the 11 students in this group, although they did not activate any other tools. Universal tool activation was lowest among students with developmental delays, intellectual disabilities, and traumatic brain injuries.

In the writing domain (see Table 4), as in the other domains, students identified with serious emotional disabilities or visual impairments had comparatively high rates of tool activation. Magnifier activation was particularly high among students with visual impairments: 51% of these students activated this tool in the writing section. Deaf–blind students showed high activation rates for the Line Guide (22.2%), Highlighter (11.1%), Magnifier (11.1%), and Sticky Notes (11.1%). Meanwhile, universal tool activation in the writing domain was generally lowest among students with an intellectual disability.

In sum, students with a serious emotional disability activated the tools at comparatively high rates across the four domains, and students with visual impairments showed comparatively high activation of the Color Contrast and Magnifier tools. By contrast, students with developmental delays, intellectual disabilities, and traumatic brain injuries showed low universal tool activation. In the listening domain, deaf–blind students showed a high rate of Magnifier activation, and in the reading domain, students with visual impairments activated the Magnifier more frequently than other students. In the speaking and writing domains, a high percentage of deaf–blind students activated the Line Guide. Because the deaf–blind group included only 9 to 18 students per domain, however, its percentages should be interpreted with caution.

Phase 2—Student interviews

Activation of universal tools: Perspectives of all ELs

The Phase 2 student interviews corroborated the Phase 1 telemetry findings (see Table 3). Overall, students reported using the Highlighter (49.1%), Magnifier (38.2%), and Line Guide (14.5%) more than other embedded tools. As observed in the telemetry findings, the Highlighter was the most activated feature in both the reading and writing tests. The Phase 2 findings did not indicate meaningful differences in tool use between students with and without disabilities, partly because of the small number of interviewees (see Table 3).
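The interview percentages line up with the counts of tool users reported later in this section (n = 27, 21, and 8 of the 55 interviewees). The short check below assumes all 55 interviewees form the base for each percentage; it is arithmetic only and introduces no new data.

```python
# Interviewees reporting each tool, as given in the Phase 2 text.
N_INTERVIEWED = 55
TOOL_USERS = {"Highlighter": 27, "Magnifier": 21, "Line Guide": 8}

shares = {tool: round(100 * n / N_INTERVIEWED, 1) for tool, n in TOOL_USERS.items()}
print(shares)  # {'Highlighter': 49.1, 'Magnifier': 38.2, 'Line Guide': 14.5}
```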

Students’ rationales for activating universal tools

The interview findings provide insight into why ELs (n = 55) did or did not activate the universal tools during the test. Given the relatively low activation rates discussed above, students’ reasons for not using the tools are as informative as their reasons for using them. The interviews showed that most students used the tools for their intended purposes. For example, per the WIDA (2022b) Accessibility and Accommodations Manual, “a sticky notes tool is built into the test platform for the Writing test . . . [this tool can be opened] to create a small box in which to type notes” (p. 7). ELs’ use of the tools generally aligned with the official guidance provided to educators in pre-testing materials and in student-facing test demonstrations and practice.

The Highlighter was the most popular tool, with 27 students reporting that they used it across the four domains (see Table 3). Students explained that their primary reason for using the Highlighter was to highlight key words or information on the screen (n = 21). A fifth grader with a specific learning disability recalled using the Highlighter during the reading test “to help me answer the questions . . . I could use the Highlighter to highlight stuff to help me [answer the question] and to help me with stuff I don’t know.” In addition, an eighth grader shared that she “used the Highlighter to highlight the words related to the question” while taking the reading test. Meanwhile, an eleventh grader described in detail how she used the Highlighter:

I basically looked through all the information they gave me, and based on the question, I go back to the reading and then try to find what it is for arguing. I looked through those, and I argue based on that information, so it matches the question that they are asking . . . It points out what I need to focus on instead of looking at the screen that has a lot of words on it. It’s basically like using elimination so that I don’t have to focus on things that does not matter.

These students all used the Highlighter in ways that helped them respond to various tasks in different domains. Although most students used the tools as intended, some reported activating them accidentally or out of curiosity. For instance, two students activated the Highlighter by accident; once they learned what the tools could do, they used them while taking the assessment.

The Magnifier was the second most frequently used tool (n = 21). Enlarging the text on the screen (n = 19) was the principal reason for using the Magnifier, and students used it most often during the reading test (n = 11). One fourth grader explained that she “couldn’t really read [the letters in the item stem],” so she used the Magnifier while taking the reading test. A sixth grader with an individualized education program (IEP) echoed this sentiment: she used the Magnifier during the reading test “because the words are small, so I couldn’t see them. I needed to make them bigger.” Another sixth grader activated the Magnifier during the speaking test to enlarge the words because “I couldn’t see the words.”

Per the interviews, fewer students used the Line Guide (n = 8) and Sticky Notes (n = 5) than the Highlighter and Magnifier; however, when ELs used these tools, they used them in expected ways. For example, students who activated the Line Guide did so to focus their attention on the text on the screen. One ninth grader used the Line Guide on the writing test because it allowed her to “focus on one sentence at a time,” while a sixth grader used the Line Guide on the reading test because if he got lost while reading, “it showed me where I needed to be.” In addition, the Sticky Notes, which are available only during the writing test, were also used as expected. One fifth grader shared that he used the Sticky Notes “to takes notes about how the story began or what happens” in the story. A seventh grader explained that she used the Sticky Notes to “make a list of what [the test narrator] suggested. Then I wrote it into a very short summary” before responding to the writing prompt.

Students’ rationales for limited activation of universal tools

Findings from both phases show that a high percentage of students did not activate the universal tools, so it was essential to examine why. According to the interviews, students were least likely to use the tools during the speaking test, with only six students activating them, followed by the listening test (n = 13) and the writing test (n = 19). The two most common reasons for not using the universal tools were not needing them and not knowing what they were. Across the interview questions about tool use in each domain test, ELs said 71 times that they did not need the tools and 16 times that they did not know what the tools were.

One fourth grader, who never used the tools, gave the same reason across all domains: she did not need them. She did not activate the tools during the listening test, for example, because she just “listened to what they were saying” and answered the questions. Likewise, for the reading test, she noted that she “could just read . . . and didn’t have to use the buttons (tools).” An eighth grader who never used the tools explained that he “didn’t need the Highlighter” for the reading test because he “knew the content.” He also did not activate the tools during the writing test because he “was busy writing his responses” to the prompts.

As mentioned earlier, some ELs were not aware of the tools and their functionality. A fourth-grade student recalled not using the universal tools on the listening and reading tests because he “wasn’t sure what they were going to do” if he had clicked on the buttons to activate the tools. Another fourth grader, with an IEP, shared that she did not activate the tools because “I didn’t know what they were, so I didn’t touch them. I just left them there.” A fifth-grade student “didn’t really know what they were in the beginning” when she was taking the listening and reading tests. However, she activated one tool (Magnifier) during the speaking test. Two middle school students conveyed that they did not know anything about the tools. One of these students, a sixth grader, explained that she did not use any of the tools because she “didn’t know what they were.” Finally, none of the high school students reported a lack of knowledge of the universal tools.

Students’ suggestions for improving the universal tools

The student interviews also provided insight into how the universal tools could be improved. Two themes emerged: (1) enhancing existing tools and (2) creating new tools. For enhancing existing tools, one student requested the option to click a word to highlight it instead of dragging the Highlighter across the words to be highlighted. Three students also recommended that the Highlighter and Sticky Notes be available in different colors so that colors could serve different purposes, such as assigning different colors of Sticky Notes to different topics. For the Magnifier, two students wanted more zoom options than the 1.5- or 2-times magnification offered in the drop-down menu, such as typing a zoom ratio or using an adjustable scale. Also, a student with an IEP requested automatic magnification of the screen for students who wear glasses. Other general recommendations included (1) enlarging the universal tools and moving them to the left or right side of the screen, (2) adding labels to the tools to make their intended purposes easier for test-takers to understand, and (3) possibly reducing the number of tools, as too many tools could confuse students who are new to learning English and taking ACCESS.

Regarding the second theme, creating new tools, one student found the Highlighter distracting and instead recommended an underlining tool that could be applied to any text on the screen. Several ELs also suggested adding an embedded dictionary across the domains, which would let students click on a key word and read its definition in a pop-up box. A handful of ELs requested that this dictionary be multilingual or serve as a translation tool for key words; however, such features may not be appropriate, because universal tools should not alter the language construct that the assessment measures. Finally, because the reading test is the only test in which students do not hear any of the words that appear on the screen, one student requested a pronunciation tool for this test, which would let students click on key words in the reading passages and hear them pronounced to aid comprehension.

Discussion

This study aimed to investigate Grades 1–12 ELs’ activation of universal tools embedded in an ELP assessment. It sheds light on the extent to which ELs activate the accessibility tools across test domains and on their rationale for tool activation. It also explores how disability status and disability type relate to students’ activation of the tools, and its findings suggest potential enhancements to universal tools. Although universal tools are intended to improve access to content, their availability in an assessment does not guarantee their activation. It is therefore important to analyze students’ interactions with these tools to understand their behavior. Examining ELs’ activation of universal tools could help researchers and educators better support ELs in accessing test content, thereby increasing the relevance of the tools to students and the tools’ effectiveness (Sireci & Faulkner-Bond, 2015). This study also uncovers essential design considerations for the presentation of the tools.

Before discussing key findings, we acknowledge several limitations. First, for Phase 2, it was difficult to conduct the interviews immediately after test administration because ACCESS is administered across multiple days in school settings, with students completing one or two domains per test session. Students were often interviewed within a week of completing ACCESS; a concurrent or immediate retrospective verbal protocol might capture students’ rationales for tool activation more accurately. Second, for Phase 1, disability types were not reported for many students, so results for ELs with a specific disability should be interpreted with caution. Disability-specific findings were based on reported data only and might have differed had disability type been available for all ELs with disabilities. Finally, the assessment consisted of different item types (e.g., multiple choice, open ended), but the findings presented are domain based rather than item based. Future research could examine whether certain item types trigger more tool activation among EL test-takers. Despite these limitations, the study findings are meaningful to the field, as discussed below. The authors recognize that releasing the data set underlying the quantitative analyses reported in this paper would be valuable to the field. Unfortunately, this is not possible due to the data use agreements in place between WIDA and the state education agencies that participate in the ACCESS assessment.

Students’ limited activation of universal tools

Both quantitative and qualitative findings from this study show that a limited number of students activated the universal tools across all domains of ACCESS and that tool activation generally dropped as students progressed to the latter domains (i.e., speaking and writing). These results align with previous studies, which revealed that online universal tools remain unused by the majority of test-takers. For example, Crotts-Roohr and Sireci (2017) observed that only one third of ELs in their study made use of the available tools, and students’ reliance on them decreased as they progressed through the assessment. Wolf et al. (2022) also documented that a small group of ELs took advantage of the glossary tool, and they spent more time on the assessment when they did so.

Likewise, the Phase 1 findings show that activation of the Help and Color tools decreased notably over the course of the test. The decrease in Help tool activation could be attributed to students’ growing familiarity with the test environment and the tools: because the Help tools show students how to navigate the other tools, students rely on them less once they become familiar with those functions. The reduction in Color tool activation might similarly indicate that most students found the default color and contrast settings of the assessment satisfactory. Sufficient contrast between background and text is important for readability (Liu & Anderson, 2008), and the two Color tools may be serving their purpose, given that a higher percentage of students with visual challenges activated them. Limited activation of a tool therefore may not signal that the tool is irrelevant or ineffective. As Wolf et al. (2022) suggested, “what matters is whether there are any students who use the accommodation, rather than whether there is a sizable number of students using the accommodation” (p. 43). More in-depth information must be gathered from students about their tool choices, intentions, and uses before stronger claims about tool use and effectiveness can be made.

Students’ rationale for activating universal tools

Interview findings from Phase 2 of the study suggest that students often did not use the universal tools because they did not need them or were unfamiliar with them. Abedi et al. (2020) also observed a negative impact of tool use on test scores and commented that “lack of effectiveness of language-based accommodations is [due to] students’ unfamiliarity with these accommodations because they are seldom used in classroom instruction and teacher assessments” (p. 48). WIDA (2022a) provides a series of resources that educators can share with their students before they take ACCESS, including a Test Demo that introduces students to the universal tools examined in this study. However, these resources may not be sufficient or convenient for students and their teachers. Similar to the findings from this study, Kim et al. (2022) found that K–12 language educators value the Line Guide and Highlighter tools more than other features, which may have influenced the use of these tools during in-class instruction. They argue that unless educators emphasize the importance of the tools, ELs may not become familiar with them or recognize the benefit of activating them. Hence, familiarity with the tools could affect tool activation and selection. In addition, tools such as Sticky Notes are rare in widely used word processors and may therefore be unfamiliar to students. Educators’ role in explaining universal tools through instruction and test demos is thus essential, especially for newcomer ELs and ELs with disabilities, who may benefit more from these tools.

Results indicate that ELs in this study activated the Line Guide, Highlighter, and Magnifier more than the Color and Help tools across all domains of the assessment. However, the relatively high activation of the Magnifier and Line Guide during the ACCESS speaking test (see the Phase 1 findings) is not easily explained, since the speaking domain includes relatively little text. Activation of these tools might reflect students’ efforts to navigate the visual aspects of the online test interface, which in turn may indicate that some students struggle with the default user design. Further research on optimal user design, including the presentation of graphics and text within the computer interface, may help shed light on any inherent user experience issues. Alternatively, these findings might point to some unplanned activation of the available tools. Computer-based assessments are more complex and multidimensional than paper-based assessments, and online tools could add complexity to the tests or introduce additional barriers (Crotts-Roohr & Sireci, 2017). The findings raise the possibility that the tools may distract some students; in fact, individual test-takers’ telemetry data revealed that some students activated certain tools very frequently, pulling the mean number of activations well above the median. Although computer-based testing allows for innovative design features, the relevance of these features to different domains should be carefully evaluated, and certain features or tools may need to be restricted, when necessary, to avoid disruptive use.
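The mean-versus-median pattern described above is what a right-skewed count distribution produces: a handful of very frequent activators inflate the mean while barely moving the median. The counts below are invented purely to illustrate this effect; they are not study data, and the helper name `center` is hypothetical.

```python
from statistics import mean, median

# Hypothetical activation counts for students who used a tool at least once.
typical_users = [1, 1, 2, 2, 3, 3, 4]
frequent_clickers = [60, 85]  # a few students toggling the tool repeatedly


def center(counts):
    """Return (mean, median) of a list of activation counts."""
    return mean(counts), median(counts)


print(center(typical_users))                      # mean about 2.29, median 2
print(center(typical_users + frequent_clickers))  # mean about 17.9, median 3
```

Two heavy users raise the mean nearly eightfold while the median moves by one activation, which is why the median is the more robust summary for such data.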

The results also suggest that ELs activated universal tools selectively. According to the Phase 1 findings, Highlighter activation increased sharply in the reading domain, and the Highlighter was also the most activated tool in the writing domain. Students’ selection of tools could thus be partially explained by domain requirements: the reading and writing domains require more text analysis than the other domains, which may have led to greater Highlighter activation. As Almond et al. (2010) suggested, accessibility is an interaction between the characteristics of an assessment and a test-taker rather than a static property of the test. The Phase 2 interview findings support these trends, as many students reported using the Highlighter to focus on key words and information to better respond to items.

Universal tool activation among ELs with disabilities

The findings offer preliminary evidence that the universal tools explored in this study could support ELs with disabilities, specifically those with visual disabilities and learning difficulties. The Phase 1 findings indicate that slightly more ELs with disabilities than ELs without disabilities activated the universal tools across all domains. ELs with visual impairments showed relatively high activation of four tools (Color Contrast, Color Overlay, Line Guide, and Magnifier), which suggests that these tools may help meet the needs of these students.

Furthermore, results show varying patterns of universal tool activation across disability subgroups. For instance, roughly half of the students with visual impairments activated the Magnifier during the listening, reading, and writing tests (41.5%, 51.2%, and 51.0%, respectively; see Table 4), suggesting that the tool may have been activated for its intended purpose. The findings also show that students with serious emotional disabilities activated the universal tools across all domains more than students with other disability types. It is possible that these students simply clicked through the tools without benefiting from them, given characteristics associated with their condition (e.g., reduced attention span). However, because of the nature of telemetry data, the findings cannot reveal students’ actual intentions when activating these tools or whether the tools benefited their test performance. More focused studies, likely involving cognitive labs with students with specific disabilities, will be needed to better understand how disability relates to the intentions behind universal tool activation.

Conclusion

The present study provides important baseline findings on the accessibility of a large-scale, high-stakes ELP assessment for ELs with and without disabilities, both of which were underrepresented in previous research. Given the growing prominence of universal design and tools for fairer and more accessible assessments, this study makes several contributions to accessibility research and practice. Few studies have used telemetry data to document ELs’ universal tool activation in the K–12 ELP context. The study also deliberately takes test-taker background variables such as disability status and disability category into account, thereby acknowledging the heterogeneity of EL populations.

Furthermore, the present study provides valuable insights into the design and development of online accessibility features embedded in ELP assessments. The Phase 1 descriptive findings on ELs’ tool activation inform readers about the role of universal tools, specifically the extent to which ELs might benefit from them depending on language domain or disability type. The Phase 2 interviews yielded students’ suggestions for further enhancing the universal tools, such as multiple Highlighter colors, the ability to enter a preferred Magnifier zoom ratio, and an embedded dictionary tool. The findings suggest that test developers should be deliberate about developing and including specific universal tools in ELP assessments and that these decisions should be informed by both theoretical considerations and student data (see Guzman-Orth et al., 2016).

The accessibility tools available to test-takers should have clear pedagogical and practical purposes (Kim et al., 2022). Multiple factors should be considered when developing universal tools, such as (1) students’ grade level, (2) students’ language proficiency, (3) the cognitive load of the test, (4) the number of tools presented, (5) students’ familiarity with the tools, (6) the relevance of the tools to the test content and format, and (7) students’ disability types. It should also be kept in mind that including too many universal tools in an online platform might inadvertently lead to inappropriate use of the tools, thereby negatively affecting students’ performance (Higgins et al., 2012).

The present research also suggests ways to further optimize the tools so that they are more beneficial to students. Tools geared toward processing information and supporting ELs' cognitive strategies (i.e., Highlighter, Sticky Notes) might be more relevant to ELs' need to understand information presented on the test than tools that manipulate the test environment (i.e., Color tools), while tools designed to address disability-related barriers (i.e., Magnifier, Line Guide) might be more beneficial for ELs with disabilities. In addition, limiting tool activation per item and presenting tools according to the requirements of each language domain might help avoid the apparently unintended overuse of the tools among some students.

There is also a need to bridge assessment and instruction with respect to universal tools. Integrating the tools into day-to-day classroom teaching could increase students' familiarity with them and the likelihood that students make more use of them during testing. This study indicates the importance of classroom teachers as conduits for preparing students for the test experience, reinforcing the role of educators as key assessment stakeholders. It is incumbent upon test developers to provide simple, easily accessible resources to educators and students and to introduce the universal tools to stakeholders before test administration.

Footnotes

Declaration of conflicting interests

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: At the time of the research project, the authors were affiliated (as either employees or interns) with WIDA at the University of Wisconsin–Madison, which develops the ACCESS for ELLs assessments.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Ahyoung Kim

Jason A. Kemp

Notes

References

Abedi (2002). Assessment and accommodations of English language learners: Issues, concerns and recommendations. Journal of School Improvement, 3(1), 83–89. http://files.eric.ed.gov/fulltext/ED458225.pdf

Abedi (2014). The use of computer technology in designing appropriate test accommodations for English language learners. Applied Measurement in Education, 27(4), 261–272. https://doi.org/10.1080/08957347.2014.944310

Abedi, Bayley, Ewers, Mundhenk, Leon, & Kan (2012). Accessible reading assessments for students with disabilities. International Journal of Disability, Development and Education, 59(1), 81–95. https://doi.org/10.1080/1034912X.2012.654965

Abedi, Zhang, Rowe, S. E., & Lee (2020). Examining effectiveness and validity of accommodations for English language learners in mathematics: An evidence-based computer accommodation decision system. Educational Measurement: Issues and Practice, 39(4), 41–52. https://doi.org/10.1111/emip.12328

Albus, & Thurlow, M. L. (2008). Accommodating students with disabilities on state English language proficiency assessments. Assessment for Effective Intervention, 33(3), 156–166. https://doi.org/10.1177/1534508407313241

Almond, Winter, Cameto, Russell, Sato, Clarke-Midura, Torres, Haertel, Dolan, Beddow, & Lazarus (2010). Technology-enabled and universally designed assessment: Considering access in measuring the achievement of students with disabilities—A foundation for research. Journal of Technology, Learning, and Assessment, 10(5), 1–52. https://ejournals.bc.edu/index.php/jtla/article/view/1605/1453

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association. https://www.testingstandards.net/uploads/7/6/6/4/76643089/9780935302356.pdf

American Psychiatric Association. (2022). Diagnostic and statistical manual of mental disorders (5th ed., text rev.). https://doi.org/10.1176/appi.books.9780890425787

Castañeda v. Pickard, 648 F.2d 989 (5th Cir. 1981).

Chia, & Kachchaf (2018). Designing, developing, and implementing an accessible computer-based national assessment system. In Elliott, R. J. Kettler, Beddow, & Kurtz (Eds.), Handbook of accessible instruction and testing practices: Issues, innovations, and applications (pp. 75–91). Springer. https://doi.org/10.1007/978-3-319-71126-3

Choi, & Cho (2016, April 9–12). The impact of spellchecker use during an English writing assessment: A case study [Paper presentation]. Annual Meeting of the American Association for Applied Linguistics, Orlando, FL, United States.

Cohen, Tracy, & Cohen (2017). On the effectiveness of pop-up English language glossary accommodations for EL students in large-scale assessments. Applied Measurement in Education, 30(4), 259–272. https://doi.org/10.1080/08957347.2017.1353986

Crotts-Roohr, & Sireci, S. G. (2017). Evaluating computer-based test accommodations for English learners. Educational Assessment, 22(1), 35–53. https://doi.org/10.1080/10627197.2016.1271704

De Backer, Baele, Van Avermaet, & Slembrouck (2019). Pupils' perceptions on accommodations in multilingual assessment of science. Language Assessment Quarterly, 16(4–5), 426–446. https://doi.org/10.1080/15434303.2019.1666847

Educational Testing Service. (2020). Accessibility and usability for the English language proficiency assessments for California: A cognitive lab study with students who are deaf or hard of hearing and students who are blind or have low vision. California Department of Education. https://www.cde.ca.gov/ta/tg/ep/documents/elpaccognitiverpt.pdf

Every Student Succeeds Act, 20 U.S.C. § 6301 (2015). https://www.congress.gov/bill/114th-congress/senate-bill/1177

Frankenberg-Garcia (2011). Beyond L1–L2 equivalents: Where do users of English as a foreign language turn for help? International Journal of Lexicography, 24(1), 97–123. https://doi.org/10.1093/ijl/ecq038

Guzman-Orth, Laitusis, Thurlow, & Christensen (2016). Conceptualizing accessibility for English language proficiency assessments (Research Report No. RR-16-07). Educational Testing Service. https://doi.org/10.1002/ets2.12093

Guzman-Orth, Sova, & Albee (2020). Accessibility considerations for English learners with disabilities in English language proficiency assessments. In M. K. Wolf (Ed.), Assessing English language proficiency in US K–12 schools (pp. 185–204). Routledge. https://doi.org/10.4324/9780429491689-10

Hansen, E. G., & Mislevy, R. J. (2006). Accessibility of computer-based testing for individuals with disabilities and English language learners within a validity framework. In Hricko & Howell (Eds.), Online assessment and measurement: Foundations and challenges (pp. 214–261). Information Sciences. https://doi.org/10.4018/978-1-59140-720-1.ch011

Higgins, Fedorchak, & Katz (2012). Assignment of accessibility tools for digitally delivered assessments: Key findings. Measured Progress.

Howell, D. C. (2013). Statistical methods for psychology (8th ed.). Cengage Learning.

Irwin, De La Rosa, Wang, Hein, Zhang, Burr, Roberts, Barmer, Bullock Mann, Dilig, & Parker (2022). Report on the condition of education 2022 (Report No. NCES 2022-144). National Center for Education Statistics. https://nces.ed.gov/pubs2022/2022144.pdf

Kim, A. A., Lee, Chapman, & Wilmes (2019). The effects of administration and response modes on grades 1–12 students' writing performance. TESOL Quarterly, 53(2), 482–513. https://doi.org/10.1002/tesq.495

Kim, A. A., Monroe, & Lee (2022). Examining K–12 educators' perception and instruction of online accessibility features. Computer Assisted Language Learning, 35(3), 437–468. https://doi.org/10.1080/09588221.2019.1705353

Kim, A. A., Yumsek, Chapman, & Cook, H. G. (2019). Investigating K-12 English learners' use of universal tools embedded in online language assessments (WIDA Technical Report No. TR-2019-2). WIDA at the Wisconsin Center for Education Research. https://wida.wisc.edu/sites/default/files/resource/investigating-k12-english-learners-use-universal-tools-embedded-online-language-assessments.pdf

Kopriva, R. J., Wright, Triscari, & Willner, L. S. (2021). Examining a multi-semiotic approach to measuring challenging content for English learners and others: Results from the ONPAR elementary and middle school science study. World Journal of Educational Research, 8(1), 1–25. https://doi.org/10.22158/wjer.v8n1p1

Lau v. Nichols, 414 U.S. 56 (1974).

Liu, K. K., & Anderson (2008). Universal design considerations for improving student achievement on English language proficiency tests. Assessment for Effective Intervention, 33(3), 167–176. https://doi.org/10.1177/1534508407313242

Liu, K. K., Lazarus, Thurlow, M. L., Stewart, & Larson (2020). A summary of the research on test accommodations for English learners and English learners with disabilities: 2010–2018. National Center on Educational Outcomes, University of Minnesota. https://files.eric.ed.gov/fulltext/ED605768.pdf

S. R. (2018). Investigating test-takers' use of linguistic tools in second language academic writing assessment (Publication No. 10747964) [Doctoral dissertation, Columbia University]. ProQuest Dissertations & Theses Global. https://doi.org/10.7916/D8B00HDQ

Pennock-Roman, & Rivera (2011). Mean effects of test accommodations for ELLs and non-ELLs: A meta-analysis of experimental studies. Educational Measurement: Issues and Practice, 30(3), 10–28. https://doi.org/10.1111/j.1745-3992.2011.00207.x

Shafer Willner, & Monroe (2016). The WIDA accessibility and accommodations framework: Considerations influencing the framework development. WIDA. https://wida.wisc.edu/sites/default/files/resource/WIDA-Accessibility-Accommodations-Framework.pdf

Sireci, S. G., & Faulkner-Bond (2015). Promoting validity in the assessment of English language learners. Review of Research in Education, 39, 215–252. https://doi.org/10.3102/0091732X14557003

Solano-Flores (2022). Fairness in testing: Designing, using, and evaluating test accommodations for English learners. In J. L. Jonson & K. F. Geisinger (Eds.), Fairness in educational and psychological testing: Examining theoretical, research, practice, and policy implications of the 2014 standards (pp. 271–292). American Educational Research Association. https://doi.org/10.2307/j.ctv2kzv0fw

Thurlow, M. L., & Kopriva, R. J. (2015). Advancing accessibility and accommodations in content assessments for students with disabilities and English learners. Review of Research in Education, 39(1), 331–369. https://doi.org/10.3102/0091732X14556076

Thurlow, M. L., Lazarus, S. S., Albus, & Hodgson (2010). Computer-based testing: Practices and considerations (Synthesis Report 78). National Center on Educational Outcomes, University of Minnesota. https://nceo.umn.edu/docs/onlinepubs/synthesis78/synthesis78.pdf

U.S. Department of Education. (2014). Questions and answers regarding the inclusion of English learners with disabilities in English language proficiency assessments and Title III annual measurable achievement objectives. http://www2.ed.gov/about/offices/list/osers/osep/index.html

U.S. Department of Education. (2018). A state's guide to the U.S. Department of Education's assessment peer review process. https://www2.ed.gov/admins/lead/account/saa/assessmentpeerreview.pdf

Wolf, M. K., Yoo, Guzman-Orth, & Abedi (2022). Investigating the effects of test accommodations with process data for English learners in a mathematics assessment. Educational Assessment, 27(1), 27–45. https://doi.org/10.1080/10627197.2021.1982693

WIDA. (2022a). ACCESS test practice and sample items. https://wida.wisc.edu/assess/access/preparing-students/practice

WIDA. (2022b). Accessibility and accommodations manual. https://wida.wisc.edu/sites/default/files/resource/Accessibility-Accommodations-Manual.pdf

WIDA. (2022c). Building a WIDA assessment. https://wida.wisc.edu/assess/building-wida-assessment

Zehler, A. M., Fleischman, H. L., Hopstock, P. J., Stephenson, T. G., Pendzick, M. L., & Sapru (2003). Descriptive study of services to LEP students and LEP students with disabilities. Development Associates, Inc. http://www.ncela.us/files/rcd/BE021195/policy_report.pdf