Abstract
In this article, we describe current research findings on assessment accommodations and universal design within the context of emerging interactive digital assessment tasks that employ simulations, such as those in science, technology, engineering, and mathematics (STEM). STEM education in many classrooms now includes digitally based activities such as science simulations and virtual laboratories that have been shown in some cases to promote learning gains. When such technologies are used in STEM assessments, a major challenge is to ensure assessments are accessible so all students can show what they know and can do. Federal laws and regulations including the Individuals with Disabilities Education Act, Elementary & Secondary Education Act, and Americans with Disabilities Act require that students with disabilities (SWD) be provided an opportunity to participate in educational programming and services available to nondisabled peers. In addition to implementing principles of universal design in assessment contexts, reasonable accommodations must be afforded to ensure accessibility. This article focuses on universal design and accommodations where the STEM construct is not adjusted or modified. Here, we employ synthesis of the research literature to document accessibility recommendations and practices around interactive assessment tasks, especially in STEM. We illustrate with an example and highlight directions that future development might take. The intention is to inform educators, school administrators, state and local policy makers, and assessment developers on the availability and use of accommodations in interactive assessment contexts such as simulation, and what is needed to ensure appropriate accessibility for SWD.
Introduction
New learning activities and assessments rapidly becoming available in science, technology, engineering, and mathematics (STEM) education often employ technologies such as digital devices, new media, and an extensive range of simulation and virtual interactivity. In assessments, evidentiary goals may include measuring student understanding of STEM concepts and complex problems, such as in virtual simulations. A major challenge for such activities when used in assessment is to ensure that tasks are accessible such that all students are able to show what they know and can do (Haertel et al., 2010; Hansen, Liu, Rogat, Hakkinen, & Darrah, 2016).
Increasingly, teachers want to engage students and get them excited about what they are learning in STEM classes through real-world STEM problem-solving (Jolly, 2017; Jurdak, 2016). Digital devices in STEM are used to simulate dynamic systems of objects in a real or imagined world (Akpan & Andre, 1999) for both learning and assessments. Simulations, video, interactivity, and other dynamic digital representations are particularly important in STEM to show students scientific phenomena that cannot be observed easily in real time (Scalise et al., 2011). For example, “They can allow students to see things in slow motion, such as the development of a wave, or speeded-up, such as erosion caused by a river. They are used to model phenomena that are invisible to the naked eye, such as the movement of molecules in a gas. Simulations are also employed in situations that require several repetitions of an experiment, each with varied parameters, within limited instructional time—for example, rolling a ball down a slope while varying mass, angle of inclination, or the coefficient of friction” (Scalise et al., 2011, p. 1053).
The research questions we ask here are broadly based but can be considered twofold: (1) To what extent does the research literature address making rich technology contexts accessible for STEM, especially in assessments intended to be aligned with this type of emerging instruction and application? (2) Do themes emerge in the research literature regarding understanding assessment accessibility for such contexts, and do gaps appear in the research literature regarding understanding accessibility for the themes that emerge?
The purpose of this investigation is a broad consideration of whether and how the research literature identifies accessibility solutions for assessment approaches in virtual labs and simulation software tasks for Grades K–12.
That said, several caveats are anticipated in advance and also discussed with the findings. First, research is somewhat sparse directly in STEM and some applicable work can be found in the so-called technology-related contexts that are applicable to STEM. We acknowledge that this is an empirical question for STEM and is investigated here in Research Question 1 by reporting the STEM contexts of the findings (see Table 1).
Table 1. Domain Representation for Sample.
Note. Specific application in more than one domain is counted as multiple attempts, so N exceeds 48 and percentages exceed 100%. STEM = science, technology, engineering, and mathematics.
Second, much of what is described here for virtual performance tasks and experimental simulation is acknowledged in advance as potentially having application outside of STEM. However, we feel that a resource specifically contextualized for STEM and that does a complete job through saturation evaluation (see Method section) of exploring the STEM literature if not all other contexts is both warranted and appropriate at this time. An investigation that explores the extent to which emerging STEM contexts have been addressed for accessibility is not yet available. We have included in the Conclusions section possible implications of some extensions to other domains.
Finally, to clarify a third caveat in a different direction, we do acknowledge that this work is specifically intended to be emergent, or to allow themes to emerge from the relevant literature, based on applying a set of criteria to a body of work, as described in the Method section. Table 2 in the Results section, for instance, shows special needs addressed by the accessibility studies, with Individuals with Disabilities Education Act (IDEA) categories used to identify specific disabilities. Subsets of the results could warrant separate papers in some cases. We believe an overview paper, however, is useful to identify findings, establish directions, and paint the broader perspective for STEM developers.
Table 2. Accessibility Needs Representation for Sample.
Note. Specific application for more than one need is counted as multiple presentations, so N exceeds total articles of 48 and total percentages exceed 100%.
In the next two sections, we take up (i) a set of definitions to be used in this article and (ii) a brief review of literature on existing accessibility resources used in STEM. Then, we present the method for this study’s investigation, followed by the results and discussion, and finally, the conclusions and some potential implications for future work.
Definitions
Assessment standards play a crucial role in ensuring students with disabilities (SWD) equal access to computer-based testing (CBT). The Standards for Educational and Psychological Testing explains that testing should be fair to all individuals, including individuals with disabilities (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014). Furthermore, Section 504 of the Rehabilitation Act of 1973 mandates the consideration and implementation of accommodations necessary for SWD. The appropriateness of accommodations includes their effective impact in “leveling the playing field” between students with and without disabilities (Byrnes, 2008; Harrison, Bunford, Evans, & Owens, 2013).
Technology-based assessments must include features that increase test content accessibility (U.S. Department of Education, 2004). Section 508 of the Rehabilitation Act requires electronic and information technology to be accessible to people with disabilities. Furthermore, recent federal regulations (e.g., No Child Left Behind Act of 2001, IDEA) require states to provide the link between instructional and assessment accommodations (Harrison et al., 2013; Lindstrom, 2010).
Here, a variety of types of STEM software in assessment are considered for accessibility. In the Results section, IDEA categories are used as one type of coding pattern for the special needs addressed by the accessibility studies. IDEA categories may not be fully familiar to all STEM audiences; under the IDEA, these are categories under which a student is eligible to receive the protections and services promised by this law. For the purposes of theme identification in this article, additional codes are also used here for other accessibility-related non-SWD needs, such as for language learners or for attentional supports, which are considerations in the technology enhancements.
In regard to defining STEM simulations, they generally fall into two main categories: virtual laboratories and simulations of phenomena (Scalise et al., 2011). Virtual laboratories attempt to simulate experiments digitally that may otherwise be performed in hands-on activities in STEM subjects. Virtual lab activities often employ data sets derived from prior scientific experimentation of real-world phenomena. They may simulate on-screen the materials, equipment, and tools with simplified changes.
Simulations of scientific phenomena, which we refer to in this article as “simulations,” by contrast may or may not be involved in an actual STEM virtual investigation but regardless they attempt to model phenomena not easily observed in real life, or where computer simulation may offer other advantages. The article employs the de Jong definition of simulation as “a program that contains a model of a system (natural or artificial; e.g. equipment) or a process” (de Jong & van Joolingen, 1998) along with two extensions to include representations as well as models of an event, object, or phenomenon (Thompson, Simonson, & Hargrave, 1996) to simulate dynamic systems of objects in a real or imagined world (Akpan & Andre, 1999).
Screens from an example science simulation virtual assessment task are shown in Figure 1a and b. Metadata for the alignment profile of the example task are shown in Figure 2. This high school task focuses on life science and earth science expectations in the Next Generation Science Standards through applying the science practice of analyzing and interpreting data and applying the crosscutting concept of patterns. In the task, students show how they can investigate the genetics of wild and domesticated organisms, in this case trees, to identify combinations of inheritance that produce edible or poisonous fruits. As part of the task, students create a computational simulation to illustrate relationships among management of natural resources, the sustainability of human populations, and biodiversity. Investigations in the task employ simulations, through which students provide evidence toward performance expectations.

Figure 1. (a) Dynamic assessment object in the example task employs a simulator and (b) dynamic assessment object in the example task employs an interactive representation.

Figure 2. Full alignment of the example task to the educational standard set used in this case, the Next Generation Science Standards. The figure shows the overall three-dimensional performance expectations of the simulation-based task and other information.
This article considers research on both symbolic and experiential simulations described in the literature (Gredler, 1996). Simulations are symbolic when the student is an observer such as of video, whereas an experiential simulation immerses the student as an involved agent in a complex, changing environment.
In accessibility for STEM assessments, one principle is to observe accessibility in classroom practices during learning interactions. Inquiry learning in STEM is defined here with the de Jong definition (2006) as “an approach to learning that involves a process of exploring the natural or material world, and that leads to asking questions, making discoveries, and rigorously testing these discoveries in search for new understanding.”
Brief Review of Literature Regarding Existing Accessibility Resources Used in STEM
As a brief review, we first examine four commonly used resources for accessibility of digital content in STEM: Universal Design for Learning (UDL), Web Content Accessibility Guidelines (WCAG), Assessing Special Education Students (ASES) accommodations manual, and Accessible Portable Item Protocol (APIP) conformance.
First, UDL originated as a concept in architecture but has expanded to many other fields including educational assessment (Thompson, Johnstone, & Thurlow, 2002). A goal has been to support participation of the widest possible range of students in assessments, whether digital or otherwise, in a way that produces evidence for valid inferences about the performance of all participants. Researchers have explored many aspects of UDL in educational contexts including item development (Johnstone, Thompson, Bottsford-Miller, & Thurlow, 2008), technology contexts (Haertel et al., 2012; Rose & Meyer, 2002), evidence-centered design (Haertel et al., 2010), and educational environments (McGuire, Scott, & Shaw, 2006; Rose & Meyer, 2006; Rose, Meyer, & Hitchcock, 2005). Universal design was defined by the Center for Universal Design (1997) as “the design of products and environments to be usable by all people, to the greatest extent possible, without the need for adaptation or specialized design.” Application in educational assessments was originally described as based on seven principles that have come to define the field (Thompson et al., 2002): supporting an inclusive assessment population; establishing precisely defined constructs; being amenable to accommodations; maximizing readability and comprehensibility; employing accessible, unbiased items; supporting simple and clear instructions; and maximizing legibility.
Second, in technology settings, developers also often employ the Instructional Management System Global Learning Consortium (IMS GLC) standards and the World Wide Web Consortium WCAG, which promote technology tools to improve participation and accessibility. WCAG Version 2.0, current in 2017, presents technical standards for accessibility of web content, developed through the Web Accessibility Initiative of the World Wide Web Consortium (Web Accessibility Initiative, 2005; updated 2017). WCAG addresses the goal of inclusivity in a digital setting, which is UDL Principle 1, and through technical specifications addresses UDL Principles 3–7. Principle 2, regarding precisely defined constructs, is not addressed in WCAG, which is primarily concerned with information technology. Principle 2 is often addressed through measurement science such as domain modeling, discussed here in the Results section.
The WCAG web-based standards have been applied for accessibility by many developers of digital content more generally. The guidelines are organized into four principles: perceivable, operable, understandable, and robust, as shown in Figure 3. Three levels of success criteria specified by WCAG are levels A, AA, and AAA. Each additional “A” adds testable success factors and stricter compliance.

Figure 3. Overview diagram summarizing the four principles of the Web Content Accessibility Guidelines.
Third, the ASES accommodations manual (Christensen, Carver, VanDeZande, & Lazarus, 2011) is used by practitioners engaged in educational assessment of SWD. ASES describes practices for selecting, administering, and evaluating accommodations in both instruction and assessment for SWD. Beginning with a discussion of applicable federal and state laws, ASES discusses UDL, accommodations, and modifications for instruction and assessment.
ASES defines accommodations as practices that do not reduce learning expectations but meet specific instruction and assessment needs of SWD, whereas modifications refer to practices that change, lower, or reduce learning expectations and may modify the underlying construct of an assessment. Note that our article focuses primarily on technology-enhanced assessment approaches that are not intended to modify the underlying constructs, and therefore not specifically on alternate assessments or other modifications.
ASES includes steps for administering accommodations during assessment including a Logistics Planning Checklist. ASES tools such as the Assessment Adaptation Grid are helpful for assessment design and cover many of the same topics as WCAG but with less specificity.
Fourth, APIP standardizes the interchange file format for digital assessment items and tasks (IMS GLC, 2014). This allows digital content to be exchanged among different APIP-compliant test item banks. In terms of accessibility, it also provides the test delivery interface with the information and resources needed to make content accessible for SWD. APIP expands the IMS Question and Test Interoperability specification with markup for accessible assessments. By comparison with WCAG, APIP focuses primarily on compatibility, which WCAG treats under its robust principle. APIP has three compliance categories for accessibility, and the alternate representations and adapted interactions that APIP supports exceed some aspects of WCAG. The three categories are: alternate representations, which include spoken language, braille, tactile, sign language, translation, language simplification, and alternate text; adapted presentations, which include magnification, reverse contrast, text/background color contrast, and color overlay for text; and adapted interactions, which include attentional supports such as masking answer choices, line reading tool, definitions (glossing), emphasis marking, auditory calming sound, additional testing time, breaks, and sometimes cognitive scaffolding such as hinting.
Method
While the four resources described in the previous section provide considerable utility, extensions for some technology-enhanced contexts may be needed (Bechard et al., 2010). In parallel with previous work on science simulations and virtual laboratories (Scalise et al., 2011), we apply similar methodology here to examine STEM accessibility through a systematic synthesis of the reviewed research using the Harris Cooper methodology in Synthesizing Research (3rd ed.; Cooper, 1998), along with saturation evaluation as described in the next section. The Cooper approach sets out systematic guidelines for synthesis in five stages: (1) problem formulation, establishing the research question that defines the scope of the project (in this case, the research questions described above); (2) data collection, utilizing basic tenets of sound data gathering to produce a sufficiently comprehensive integration of past research; (3) data validation, where clear methodology such as pattern identification and theming is used to assess and compare the evidence and results in the studies; (4) analysis and interpretation, where data are triangulated through synthesis techniques including qualitative or descriptive results if appropriate; and (5) dissemination of results, such as through research-based articles.
Saturation Evaluation
A saturation evaluation approach was used here as in prior work (Scalise & Felde, 2017; Scalise et al., 2011). Saturation evaluation selects a representative sample of the research base and examines where sources are converging and diverging, and whether sufficient replication is seen in the findings. As each new resource is added, the information function, or in other words, the new information obtained for each new resource, is typically rising but at some point when sufficient resources have been examined, the function begins to flatten. This indicates that a strong degree of reiteration is being captured, over the body of collected work.
The evaluation of successful convergence in the saturation technique here employed a small validity check sample to determine the range of new information still emerging after a selected proportion (approximately 90%) of the analysis was complete. This is done by retaining a subsample of about 10% of the papers from the identified sample to analyze subsequent to the initial coding. The goal of the 90–10% split is to determine the number of new patterns and trends still emerging after 90% of the evaluation is complete, and so to reflect on the degree to which the saturation is reasonably complete and sufficient replication is occurring.
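The saturation check described above can be sketched in a few lines of code. This is a minimal illustration rather than the procedure actually used in the study: the paper identifiers, the pattern codes, and the function names `information_curve` and `new_codes_in_holdout` are hypothetical, and the sketch assumes each paper has already been coded into a set of pattern labels.

```python
from typing import List, Set, Tuple

CodedPaper = Tuple[str, Set[str]]  # (paper id, pattern codes assigned to it)


def information_curve(papers: List[CodedPaper]) -> List[int]:
    """Cumulative number of distinct codes seen after each paper is added."""
    seen: Set[str] = set()
    curve: List[int] = []
    for _paper_id, codes in papers:
        seen |= codes
        curve.append(len(seen))
    return curve


def new_codes_in_holdout(main: List[CodedPaper],
                         holdout: List[CodedPaper]) -> Set[str]:
    """Codes that appear in the held-out papers but never in the main sample."""
    main_codes: Set[str] = set().union(*(codes for _id, codes in main))
    holdout_codes: Set[str] = set().union(*(codes for _id, codes in holdout))
    return holdout_codes - main_codes


# Hypothetical coded papers, for illustration only (not data from the study).
main_sample: List[CodedPaper] = [
    ("paper_01", {"framework", "presentation"}),
    ("paper_02", {"presentation", "language"}),
    ("paper_03", {"framework", "timing"}),
    ("paper_04", {"language"}),
    ("paper_05", {"timing", "presentation"}),
]
holdout_sample: List[CodedPaper] = [
    ("paper_06", {"framework", "language"}),
]

curve = information_curve(main_sample)
emerging = new_codes_in_holdout(main_sample, holdout_sample)
print("information curve:", curve)
print("new codes in validity check sample:", sorted(emerging))
```

A curve that stops rising, together with an empty set of new codes in the held-out papers, is the kind of evidence the saturation approach treats as reasonable convergence. Any code that surfaced only in the held-out papers would suggest the sample needs to be extended.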
Relevancy Criteria and Application to the Literature
We applied the following four criteria to judge the relevancy of the articles we reviewed, using the definitions described above: (1) the study addressed accessibility in assessments regarding SWD or other accessibility needs such as language simplification for students who are English learners (EL); (2) assessment content addressed involved or was applicable to primarily rich interactivity such as virtual labs and/or simulations for STEM education, and not primarily text-based or static graphic context; (3) the scope focused on Grades K–12 and not primarily postsecondary education; and (4) publication dates were in the specified scope of 2009–2016.
Several rounds of analysis were employed to identify studies that fit the synthesis criteria. First, study investigators were assigned to explore (i) electronic databases available to the researchers through the University of Oregon library, using key word searches bounded by date and scope criteria over 483 databases of publications contained in the library collection; (ii) ancestry and descendant searches focused on references in identified papers; and (iii) gray literature identified through presentations and other not-yet-published materials in a “snowball” sampling approach through researchers identified as working in the area. Titles and abstracts were read at this stage to identify papers that were likely to be the most relevant, based on the selection criteria, and to retain papers for which review of the full text was required.
Following determination of relevancy in the full text, papers were read by researchers and coded according to an initial set of patterns, for which codes are summarized in tables in the Results section. Codes were amplified where necessary to expand for elements identified during the initial reading. After identifying studies that fit the synthesis criteria, the articles were read and analyzed to categorize the various components, features, and results related to identifying accessibility solutions for assessment approaches employing rich interactivity such as virtual labs and simulation software. In this case, identified articles that were longer and more comprehensive were selected to represent the validity check sample. These were included in the full results but not analyzed prior to the initial pattern elicitation.
From this evidence, we next built filtering criteria for evaluating claims of accessibility in virtual and simulation-based e-learning products. We identified themes in the literature that were described as having a relationship with accessibility in these contexts, and we coded them into categories (Miles & Huberman, 1994). We call these categories “patterns” and use them to consider accessibility challenges and solutions identified in the research literature. Following processing and inclusion of the validity check sample, the results were generated. The goal was to strengthen possible approaches for accessibility in these challenging STEM contexts.
Following this, a subgroup of researchers reviewed the results for the patterns identified across papers that indicated emerging themes relevant to the research questions, created tables, and presented results to the full research group. Following a discussion of results to this point of analysis, themes were assigned to a subgroup of researchers who through a second reading of papers confirmed or discussed revision of the findings and developed the Results section.
Results and Discussion
Following the review of both formal and informal literature sources as discussed above, 151 research articles were identified as “possibly relevant” by University of Oregon researchers using the overall search process and the relevance criteria outlined previously.
After a secondary screening of the full-text articles, 59 of the original set of articles were initially determined to be relevant using the selection criteria described in the Method section, and 92 were deemed less relevant and removed from the subsequent analysis. A tertiary screening of the 59 articles replaced two articles and eliminated seven, for a total sample of 52.
The two replacements were made because an earlier article on similar work by the same authors was found to be more comprehensive or applicable; note that these two papers were therefore drawn from a slightly earlier date range. The seven articles eliminated comprised two papers that duplicated a similar paper by the same authors within the identified sample, one unpublished work from the gray literature search that could not be obtained (a dissertation), and four articles that on full review did not focus sufficiently on either accessibility or interactive content to meet the relevancy criteria in this article.
Following the tertiary screens, four articles, or about 8% of the sample, were identified as the subsample for the validity check, as described in the Method section.
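The screening counts reported above can be checked with simple arithmetic. The sketch below only restates the numbers given in the text; the variable names are illustrative and not part of the study.

```python
# Counts taken from the screening narrative; variable names are illustrative.
identified = 151                 # articles flagged as "possibly relevant"
removed_secondary = 92           # deemed less relevant at full-text screening
after_secondary = identified - removed_secondary

# Tertiary screening: replacements swap one article for another, so they do
# not change the count; only eliminations reduce it.
eliminated = {
    "duplicate of a similar paper": 2,
    "unobtainable dissertation": 1,
    "insufficient focus": 4,
}
total_sample = after_secondary - sum(eliminated.values())

validity_check = 4               # held out for the saturation validity check
main_sample = total_sample - validity_check
holdout_share = validity_check / total_sample

print(after_secondary, total_sample, main_sample, round(100 * holdout_share, 1))
```

The held-out share works out to just under 8% of the 52 articles, which is consistent with the "approximately 90%" and "about 10%" split described in the Method section.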
Relevant Literature Aggregation
Of the 48 relevant articles that remained in the study excluding the validity check subsample, 15 addressed accessibility in interactive assessments for elementary students, 25 for middle school, 7 for high school, and 18 for more general populations or where the K–12 age-group was not specifically identified. Some overlap exists among the categories where an article addressed more than one group, with proportions of overlap explained in each section below.
As shown in Table 1 regarding STEM domain, 22 articles addressed accessibility in mathematics, 8 in physical sciences, 9 in life sciences, 4 in earth and space sciences, 6 in science and technology or engineering, 1 in science and personal or social perspectives, and 1 in digital literacy, along with 17 that addressed accessibility in digital interactions more generally. Some studies considered more than one listed domain; on average, approximately one to two domains in Table 1 were identified per article reviewed. This means that most manuscripts in the sample either addressed examples in a single listed domain, such as mathematics, or considered two of the listed domains, such as assessment accessibility examples in both physical sciences and life sciences within a single manuscript.
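Because an article can be coded to more than one domain, the domain counts sum to more than the 48 articles and the percentages sum to more than 100%. The short sketch below works through this using the counts reported in the text. It assumes each percentage is taken against the 48 articles, as the note to Table 1 implies, and the computed values may differ in rounding from the published table, which is not reproduced here.

```python
# Domain counts as reported in the text; the dictionary keys are shorthand.
domain_counts = {
    "mathematics": 22,
    "physical sciences": 8,
    "life sciences": 9,
    "earth and space sciences": 4,
    "science and technology or engineering": 6,
    "science and personal or social perspectives": 1,
    "digital literacy": 1,
    "digital interactions (general)": 17,
}
n_articles = 48  # main sample, excluding the validity check subsample

total_codes = sum(domain_counts.values())
mean_domains_per_article = total_codes / n_articles

# Assumption: each percentage uses the number of articles as its denominator.
percentages = {d: 100 * c / n_articles for d, c in domain_counts.items()}
total_percentage = sum(percentages.values())

print(f"domain codes assigned: {total_codes}")
print(f"mean domains per article: {mean_domains_per_article:.2f}")
print(f"sum of percentages: {total_percentage:.1f}%")
```

The mean of roughly 1.4 domains per article matches the statement that about one to two domains were identified per article.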
Special needs addressed by the accessibility studies are shown in Table 2. IDEA categories identify specific disabilities and were used as described earlier in the article. Additional codes were used for other non-SWD needs such as for language learners or for attentional supports. Some studies explored more than one need; on average, approximately two codes in Table 2 were identified per article reviewed. This means that most manuscripts in the sample addressed special needs in approximately two categories in the table, such as students with deafness or hearing impairment.
Table 3 shows the types of methodologies used in the articles in the study and how often they were employed. Institute of Education Sciences (IES) categories were used initially for identifying study design, with aggregation of some categories as well as additional codes added as the research team encountered designs that were not specifically identified in the IES scheme.
Table 3. Study Design Representation for Sample.
a. Conceptual, such as policy brief, technical manual, guideline, or framework.
Overall, the most common study type, used in nearly a third of the papers reviewed, was quasi-experimental, usually with a two-group comparison of different accessibility avenues and, in a few papers, with cases randomly assigned to groups. Qualitative case studies and literature synthesis, at 17% each, accounted for slightly more than another third of the papers. The remaining approaches in Table 3 accounted for the remaining third.
Regarding the validity check sample, no new accessibility patterns emerged in the four papers reviewed subsequent to the coding of the 48 papers described in Tables 1–4. Hence, we conclude that the saturation evaluation is reasonably complete for the pattern identification undertaken, given the relevancy criteria. Note, however, that new research emerging subsequent to the 2016 end date does include at least one new solution, sonification.
Table 4. Summary of Five Patterns Identified for Sample.
Next, we report on five overarching patterns identified in the papers, summarized here as shown in Table 4: framework approaches to devising accessibility improvements, assignment and administration of accommodations, language simplification, presentation modalities, and approaches through setting, timing, or tools. A sixth pattern, validity, is discussed separately.
Pattern 1: Framework Approaches to Devising Accessibility Improvements
In addition to the resources listed in the prior literature review, evidence-centered design (ECD) and other domain-modeling approaches are recommended for STEM accessibility.
ECD
ECD provides a framework for identifying and making explicit the structures and supporting rationales of a given assessment or assessment system. The framework includes the inferences intended to be drawn based on student test scores as well as the evidence needed to validly base inferences (Mislevy, Steinberg, & Almond, 2003). Based on ECD and often applying Universal Design for Assessment (UDA) described previously, developers consider the varied ways that individuals interact with assessments to demonstrate proficiency (see Johnstone et al., 2008; McGuire et al., 2006; Thompson et al., 2002).
ECD and UDL/UDA have been coapplied in STEM including for the diverse population of students with various types and severity of identified disabilities. Recent large-scale collaborations have incorporated accessibility theory, including ECD and UDL/UDA, in the development of guiding principles and training and evaluation tools, for example, as a basis for broadening accessibility when assessments include reading (see Thurlow et al., 2009) and in alternate assessment with modified achievement standards contexts (see Test Accessibility and Modification Inventory and associated Accessibility Rating Matrix; Beddow, Elliott, & Kettler, 2010; Kettler, Elliott, & Beddow, 2009). Within the context of mathematics assessments based on alternate academic achievement standards for students with significant cognitive disabilities, Cameto, Haertel, DeBarger, and Morrison (2011) discussed the application of UDL/UDA principles within a broader ECD test/item development framework where the focus is test accessibility for students with severe limitations. Cameto and colleagues coapplied ECD and UDL/UDA principles to “consider systematically the [test] content, task, and learner characteristics that influence student performance” (p. 7).
Coapplying ECD and UDL, Haertel et al. (2010) put forth specifically for STEM the Principled Assessment Designs for Inquiry, an online assessment design for the purpose of developing blueprints for innovative assessment with a focus on science inquiry tasks. Haertel and colleagues offered guidelines to address the learning and test-taking needs of SWD in science and demonstrated how these guidelines have been applied to redesign middle-school science assessment tasks and improve test accessibility for SWD. Similarly, Quellmalz, Silberglitt, and Timms (2011) and Quellmalz, Timms, Silberglitt, and Buckley (2012) offered insight on applying UDL/UDA to innovative, technology-enhanced STEM assessments.
In earlier work, Hansen and Mislevy (2006) offered the ECD conceptual framework as a means to prescriptively build accessibility features for SWD and ELs into CBT systems. These features may enhance the validity of test results and associated inferences. Hansen and Mislevy describe ECD as a validity framework, discussing how certain CBT accommodations can be designed and applied to appropriately relate accessibility needs.
Pattern 2: Assignment and Administration of Accommodations/UDL
In the articles surveyed in the saturation evaluation, assignment and administration of accommodations, as compared to UDL, was a common theme for STEM. This forms Pattern 2.
Accommodation policy
Murphy (2012) summarized key aspects of the 2008 amendments to the Americans with Disabilities Act (ADA) impacting students identified as having attention-deficit hyperactivity disorder (ADHD). ADA was designed to bring greater flexibility to those applying for accommodations in high-stakes accountability testing contexts. Chief among the Murphy recommendations was for diagnosticians (and any other individuals involved in the identification and accommodation process) to provide rigorous and comprehensive professional documentation to support test accommodation requests. Other researchers have concluded that identifying accommodations for SWD is complex and nonlinear, and neither wholly systematic nor intuitive in nature (Cawthon, 2010, 2011; Young & King, 2008). Researchers conclude that professionals should document symptoms early to rule out alternative explanations for functional impairment in testing environments (Lazarus, Thurlow, Lail, & Christensen, 2009), so that accommodation requests can be appropriately approved and “provide equal access to the test and prevent discrimination” (Murphy, 2012, p. 13).
Practitioner understanding of accommodation administration
Wolf, Kao, Rivera, and Chang (2012) found that teachers, considered here the front line of implementation, were not well versed in how to utilize or implement accommodations, when needed beyond UDL. Wolf and colleagues found that teachers interviewed often did not know, for instance, who determined the selection of accommodations for EL students. Also, for STEM, of the 11 permitted accommodations on the state math assessment used in the study, eight were used less than 50% of the time and almost never during classroom-based assessments. Wolf, Kao, et al. (2012) found that teachers perceived both reading directions aloud in English and extending time as having a significant positive effect on mathematics scores in one state’s cohort, but not in the second. These disparities, Wolf and colleagues argued, imply considerable variation in understanding around accommodation selection criteria and types of assessments allowed, and thus, bring into question the adequacy of the accommodation implementation and the comparability of accommodated test results.
Administering specific accommodations
In terms of administering specific accommodations in large-scale testing contexts, much of the identified research in STEM focused on the special needs and accessibility of EL student populations, rather than SWD, so the EL topics for this pattern are reviewed next. Content-based accountability assessment of EL has gained much attention in the educational field over more than a decade (Abedi, 2009b). For the nearly 11 million linguistic minority students who are still acquiring English, access to test content is largely dependent upon the quality and appropriateness of the accommodations available and administered.
Factors such as familiarity with the specific accommodation, classroom and test contextual characteristics, educator perceptions of the implemented accommodations, and student-level differences, as well as student preferences for a particular accommodation, contribute to implementation success and appear as consistent themes (see, e.g., Cawthon, 2009; Rogers, Christian, & Thurlow, 2012; Thurlow & Kopriva, 2015; Wolf, Kim, & Kao, 2012).
A promising accommodation for ELs appears to be computer-delivered assessments that allow for built-in accommodations that are used at students’ discretion. According to Abedi (2009a), CBT was the most appropriate and effective accommodation for providing such accessible assessments for EL students. Simply stated, Abedi argued that CBT used in this way does not alter the difficulty of the content but rather provides greater access, with the validity of assessment results unaffected. In more recent research, Abedi (2014) recommended algorithmic solutions as a means to select the most appropriate embedded accommodations that reflect the level of English ability of the test taker and other student-level factors. It is reasonable and conceivable that such accommodations available in CBT environs offer the possibility of serving the testing needs of SWD as well, but limited STEM-specific research is available. Thus, substantial gaps in the research literature for SWD can be seen in this pattern, for such formats as simulation and virtual lab applications.
Pattern 3: Language Simplification
A third theme across the included literature for this synthesis involves a steady rise in investigating language-based accommodations for mathematics and science assessments (Elliott et al., 2010; Rogers et al., 2012). Language-based accommodations can involve providing simplified questioning of curricular content, read-alouds, glossaries and dictionaries, and other tools to account for the communicative needs of SWD and ELs (Glass & Oliveira, 2014). However, one of the gaps in the current literature includes a lack of data demonstrating how well these specific accommodations are working for students who are utilizing them in the classroom (Rogers et al., 2012; Thurlow & Kopriva, 2015).
Language-driven accommodations for SWD
Read-aloud accommodations are one of the most frequently used testing accommodations across contexts. Researchers believe that read-aloud accommodations provide accessible communication of scientific information to students and also promote student acquisition of the specialized language of science (Glass & Oliveira, 2014). These accommodations involve a teacher, for instance, reading directions to students, rereading subtask directions, restating questions with more appropriate vocabulary, and/or using simplified language in directions. Overall, the consensus in the limited literature suggests that these types of accommodations have improved scores on challenging math and science assessments for students with reading deficits and learning disabilities (Elliott, Kratochwill, McKevitt, & Malecki, 2009; Shelton, 2012; Spiel et al., 2016), without sacrificing measurement comparability when using strict methodological approaches to examine fairness and equity based on demographic group membership, including for SWD and EL (e.g., Huggins & Elbaum, 2013), or when compared to other accommodation types or standard test administration (see Kim, Schneider, & Siskind, 2009a, 2009b). However, findings are inconsistent across some of the studies on this topic. Elliott, Kratochwill, McKevitt, and Malecki (2009) found that SWD benefited differentially from these accommodations compared to their peers without disabilities, whereas others found that the groups had similar results (Freeland, Emerson, Curtis, & Fogarty, 2010).
Some researchers argue that technology will prove to be a viable option for read-aloud accommodations considering that reading aloud to students is a time-consuming, resource-intensive endeavor for teachers that is not routinely feasible. Developing CBT that has embedded text-to-speech features might not only increase the feasibility of implementation of read-alouds but also encourage independence and self-paced access to the assessment content (McMahon, Wright, Cihak, Moore, & Lamb, 2016).
Language-driven accommodations for EL
Once again, much of the identified literature on language simplification (Pattern 3) addressed the special needs and accessibility of EL students, which are discussed in this section. Because assessments typically use language to ask questions, the inferences made from ELs’ accountability assessment scores can be compromised. Researchers have shown a gap in performance between EL and native-English students for more linguistically complicated items within mathematics and science tests (Abedi, 2009a).
English dictionaries and glossaries
In their meta-analysis that empirically examined accommodations for EL on large-scale assessments, Kieffer, Lesaux, Rivera, and Francis (2009) found that English dictionaries and glossaries were the only accommodations with statistically significant and positive average effect sizes. In contrast, physical glossaries did not appear to have an effect on increasing EL math test scores in the small but randomized controlled study by Wolf, Kao, et al. (2012) and in Abedi’s (2009a) study. Abedi stated that under 8% of the fourth graders and 15% of the eighth graders had used an English dictionary in their classrooms.
Simplified English and extra time
Abedi (2009a) noted effective and valid results of an extended time accommodation on ELs’ performance. In contrast, Kieffer et al.’s (2009) meta-analysis found no evidence that extra time improved EL performance in a statistically significant manner.
Dual-native language
Kieffer et al.’s (2009) meta-analysis included studies using a variety of dual-native language accommodations. Their analysis indicated that none of the accommodations had a statistically significant average effect size.
Pattern 4: Presentation Modalities
Presentation accommodations (different format modalities) in the identified studies ranged from methods of content presentation to the context of presentation and methods for selecting test items. Generally speaking, studies for STEM accessibility are small and still emerging, with limited empirical evidence to validate the supports but with many promising supports presented.
Accommodations for students who are blind or have low vision
Recent research relevant to accommodating the needs of learners with vision impairments has largely focused on methods for rendering graphical or three-dimensional content accessible for blind students and those with low vision (Darrah, 2013; Goncu & Marriott, 2011; Hansen et al., 2016; Levy & Lahav, 2011; Sullivan, Sahasrabudhe, Liimatainen, & Hakkinen, 2014). The technologies tested in these studies include tactile or vibrotactile interfaces, haptic feedback devices, and audio interfaces describing visual content. Generally, the results of these studies indicate that these interfaces and related devices are effective in partially, but not completely, remediating the access barriers experienced by blind students and those with low vision when contacting graphical or three-dimensional information. Each of these three accommodations is time-consuming for the students (Zebehazy, Zigmond, & Zimmerman, 2012).
Accommodations for students who are deaf or hard of hearing
Russell, Kavanaugh, Masters, Higgins, and Hoffmann (2009) compared the effectiveness (on performance accuracy) and preferability of embedded signing support within a CBT using either a recorded human signer or an avatar. Results indicated that there were no statistically significant main effects of the modality of signing on test performance, either in terms of time to complete the assessment or in terms of accuracy of test responses. There were, however, consistent findings that students preferred the recorded human signer relative to the avatar. Students reported preferring both to externally mediated recorded signing instructions (i.e., via DVD).
There are several important considerations that emerge from this study. First, if the two modalities are truly comparable in terms of performance, the avatar-mediated signing accommodations are likely to be considerably less expensive over time. However, due to the critical nature of facial expressions in signed communication (Reilly, 2005; Reilly, McIntire, & Bellugi, 1990), the detail and clarity of avatar-mediated signing must be sufficient to facilitate this aspect of nonmanual communication. Further, if these nonmanual aspects of the signed communication are present but unnatural or difficult to discern, they will increase the working memory burden of test takers and accelerate fatigue over time, or interact with working memory deficits, especially for those with multiple disabilities (Cawthon & Leppo, 2013).
Accommodations for students with ADHD or emotional/behavioral disability (EBD)
We identified three articles with relevance to students with ADHD or EBD. Points of emphasis across these articles include: (a) a general call to emphasize motivating assessment practices with these populations (Cayton-Hodges et al., 2012), (b) an empirically demonstrated approach to increase intention to use CBT on the part of test takers (Terzis, Moridis, & Economides, 2012), and (c) the relevance of task choice to task completion and problem behavior reduction (Harrison et al., 2013). Generally, motivating students whose ADHD or EBD involves low task engagement and keeping them absorbed with test content are viewed as critical elements of successfully accommodating them during testing, but more research is needed (Ganguly, 2010).
Accommodations for students with learning disabilities (SWLD)
Research on format accommodations for CBT with SWLD included additional visual supports (Solano-Flores, Wang, Kachchaf, Soltero-Gonzalez, & Nguyen-Le, 2014; Wu, Kuo, Jen, & Hsu, 2015), item delivery with rich context (Bouck & Flanagan, 2009), response feedback (Golke, Dörfler, & Artelt, 2015), and compensatory supports (i.e., calculators, spoken text; Bouck & Flanagan, 2009; Quellmalz, Silberglitt, & Timms, 2011). Findings from the reviewed studies are generally limited for STEM and require additional research. While practices such as anchored instruction (rich content presentation) have generally demonstrated positive effects, the research across all of these areas indicates a need for additional study. Further, many accommodations studied in the reviewed articles have limited or no support for application with SWLD (i.e., visual supports), lack positive outcomes (i.e., response feedback), or lack a differentially positive effect favoring SWLD (i.e., calculators). Authors noted the highly engaging nature of simulation-based tests may facilitate concentration for those students with attention deficits or impaired executive functioning.
Accommodations in CBT in general
Quellmalz et al. (2012) reported the results of their study examining the technical adequacy and feasibility of the SimScientist simulation-based science assessments. The simulation-based assessments demonstrated less disparity between scores from either SWD or EL and their respective classification peers. This finding, Quellmalz and colleagues argued, indicated that the simulation-based tests facilitated access to the content being tested more effectively than the standard paper-and-pencil tests typically administered. Explicit accessibility features embedded within the simulation-based tests included integrated text-to-speech functionality, screen magnification, and support for extended testing time.
Pattern 5: Targeting Setting, Time, and Tools
Regarding specific aspects of setting, time, and tool availability, a review of state policies by Thurlow and Larson (2011) identified a total of 72 types of testing accommodations for SWD in the context of state reading assessments, many of which apply to this Pattern 5 topic in STEM as well. The Educational Testing Service lists the following accommodations as the most predominant in CBT contexts for SWD: (a) extended time, (b) separate room, (c) screen or human reader, (d) large print, (e) screen magnification, (f) calculator, (g) scribe or keyboard entry aide, (h) additional supervised break time, and (i) sign language (Hakkinen, 2015).
Oral administrations using assistive technologies, such as screen readers and computerized oral tests, and multiple-accommodation administrations that combine two or more of setting, time, and assistive tools (e.g., extended time with calculator) have also been implemented for SWD across testing contexts (Lindstrom, 2010).
Assistive technologies
Assistive technologies are critical tools for SWD to gain access to computer-based assessment content. According to the IDEA, assistive technology tools include “any item, piece of equipment, or product system, whether acquired commercially, off-the-shelf, or customized, that is used to increase, maintain, or improve functional capabilities of individuals with disabilities” (U.S. Department of Education, 1997, 2004). Assistive technologies have been commonly used in math assessment (Bouck & Flanagan, 2009). These assistive technologies can come built into phones or tablets (e.g., screen readers, magnifiers).
Disability-driven study contexts
The most common disabilities found in the literature for setting, time, and tools accommodations were visual and physical impairments, speech disorders, learning disabilities, ADHD, and language delays. However, accommodations for students who exhibit attentional/engagement-related disabilities (e.g., ADHD and EBD) with heterogeneous behavioral deficits (e.g., disruptive, off-task behavior) have not yet been adequately synthesized (Harrison et al., 2013), especially in STEM. Also, current research depicts a lack of existing guidance for decision-making for EL in CBT with rich assessment task contexts, especially those who exhibit disabilities coupled with unique linguistic learning needs (Wolf, Kao, Rivera, & Chang, 2012).
Age-related study contexts
Testing accommodations literature concerns students of all ages from kindergarten to college. Researchers in this review were found to most commonly address middle-school students (e.g., Lindstrom, 2010; McMahon et al., 2016) and, although outside the scope of this study, college students (e.g., Lovett & Leja, 2015). While most of the reviewed literature targeted middle school to pre-college students in the discussion of accommodation tools, various authors depict the need to further focus on earlier age ranges such as preschool and elementary school students (Harrison et al., 2013; Lindstrom, 2010).
Content-related contexts
Math and reading have been the most important focus of attention regarding tools for accommodation. Two studies (Lovett & Leja, 2015; McMahon et al., 2016) investigated the effects of two interventions (i.e., read-aloud using a digitized podcast and extended time) on the reading skills of SWD. Reading delays are a common characteristic of SWD, which applies in STEM contexts as well.
Accommodations provided within the subject of math have been widely studied, particularly among the population of students diagnosed with ADHD and for SWLD (e.g., Bouck & Flanagan, 2009; Cayton-Hodges et al., 2012; Lewandowski, Lovett, Parolin, Gordon, & Codding, 2007; Lindstrom, 2010). Cognitive-deficit areas (e.g., attention, processing, and memory) can impact academic functioning in STEM. While the science content area has been targeted by some accommodation research (e.g., Glass & Oliveira, 2014; Hansen et al., 2016; Levy & Lahav, 2011; McMahon et al., 2016; Quellmalz et al., 2011; Quellmalz, Timms, Silberglitt, & Buckley, 2012; Shelton, 2012), as CBT contexts become richer and more interactive, such content should be targeted for future research of tools for accommodation. The current body of literature for designing and implementing accommodations and appropriate tools in the area of CBT for science is lacking and calls for more research and informed practice around the implementation and utility of accommodations for SWD within this academic context.
Time-related accommodations
Extended time during testing is one of the most common accommodations for SWD, particularly for students with ADHD (Ranseen & Parks, 2005; Stretch & Osborne, 2005). Accommodations like extended time have been shown to benefit students with and without disabilities, not just SWD. Allowing only SWD access to such accommodations can be controversial in STEM (Lovett & Leja, 2015), in part because such practices raise questions about test bias and fairness and about the appropriate utility of accommodations during testing that could negatively impact the validity of the test results (Fuchs & Fuchs, 2001).
One indicator to consider for STEM in support of accommodations is the differential boost framework. Differential boost in this context justifies the use of accommodations when SWD benefit substantially more from them than students without disabilities (Fletcher et al., 2006; Fuchs, Fuchs, Eaton, & Hamlett, 1999). Several studies have shown that many students, with and without disabilities, scored higher on tests such as in math under extended time conditions (Elliott & Marquart, 2004; Lewandowski et al., 2007; Pariseau, Fabiano, Massetti, Hart, & Pelham, 2010), and this may also be expected in STEM more generally, although Pariseau, Fabiano, Massetti, Hart, and Pelham (2010) found that for elementary and middle-school students with ADHD, extended time was counterproductive in STEM, as rate of accurate mathematics completion increased in the presence of shorter time limits.
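The differential boost comparison described above can be sketched as a simple calculation: the score gain under an accommodation for SWD minus the corresponding gain for students without disabilities. The following is a minimal illustration only. The function name and all scores are hypothetical and are not drawn from any of the cited studies, and an actual analysis would also test whether the difference is statistically reliable.

```python
from statistics import mean


def differential_boost(swd_standard, swd_accommodated,
                       peer_standard, peer_accommodated):
    """Return (swd_gain, peer_gain, boost) in raw score points.

    boost > 0 means SWD gained more from the accommodation than
    their peers without disabilities did.
    """
    swd_gain = mean(swd_accommodated) - mean(swd_standard)
    peer_gain = mean(peer_accommodated) - mean(peer_standard)
    return swd_gain, peer_gain, swd_gain - peer_gain


# Hypothetical math scores under standard and extended-time conditions.
swd_standard = [40, 45, 50]
swd_accommodated = [52, 55, 58]
peer_standard = [60, 65, 70]
peer_accommodated = [63, 67, 71]

swd_gain, peer_gain, boost = differential_boost(
    swd_standard, swd_accommodated, peer_standard, peer_accommodated
)
print(f"SWD gain: {swd_gain}, peer gain: {peer_gain}, differential boost: {boost}")
```

In this made-up example both groups score higher with extended time, but SWD gain more. Under the differential boost framework, that pattern would support the accommodation, whereas equal gains for both groups would not.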
Validity Threats to Assessment Fairness/Bias Through Missing Not at Random
A major topic found missing from the identified STEM-related papers concerns when inferences are made based on assessments for which substantial missing data are present. Within any assessment context, missing data can arise from a participant’s refusal or inability to respond to an item, their failure to comply with protocol, an absence during or exclusion from the data collection, or in the case of a test-based accommodation, a breakdown in the delivery or receipt of the intended accommodation. In correlational studies, missing data negatively impact statistical power and, therefore, the detection of relationships if participants with incomplete responses are excluded from analysis. When missing data result in the nonrandom loss of participants from accountability reporting or a well-designed intervention study, estimates of unit performance or program efficacy will likely be biased. For example, in a randomized study of an instructional innovation for SWD, if student participants are nonrandomly responsive to the assessment modality, a group difference in performance outcomes may be due to the differential loss of student responses instead of the action of the intervention under study. A similar challenge arises within the context of school accountability reporting. If assessment procedures result in the nonrandom exclusion of students, performance estimates will be systematically biased.
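A small simulation can illustrate why nonrandom exclusion biases performance estimates while purely random loss does not. The sketch below is illustrative only. The score distribution, the cut point of 45, and the missingness rates are all assumptions chosen for the example and do not come from any study reviewed here.

```python
import random
from statistics import mean


def simulate_missingness(n=10000, seed=0):
    """Compare the mean of complete data with means after two kinds of loss.

    Returns (true_mean, random_loss_mean, nonrandom_loss_mean).
    """
    rng = random.Random(seed)
    # Hypothetical scale scores: mean 50, standard deviation 10.
    scores = [rng.gauss(50, 10) for _ in range(n)]

    # Loss unrelated to performance: every student has the same
    # 30% chance of a missing score.
    random_loss = [s for s in scores if rng.random() >= 0.30]

    # Missing not at random: students scoring below 45 are much more
    # likely to be missing (60%) than other students (10%), e.g., because
    # an intended accommodation was not delivered.
    nonrandom_loss = [
        s for s in scores
        if rng.random() >= (0.60 if s < 45 else 0.10)
    ]
    return mean(scores), mean(random_loss), mean(nonrandom_loss)


true_mean, random_mean, nonrandom_mean = simulate_missingness()
print(f"All students:        {true_mean:.2f}")
print(f"Random loss:         {random_mean:.2f}")
print(f"Nonrandom exclusion: {nonrandom_mean:.2f}")
```

Under these assumptions, the estimate based on randomly lost data stays close to the full-sample mean, while the estimate after nonrandom exclusion is inflated because lower-scoring students are underrepresented among the observed scores.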
Historically, the potential for missing data or attrition effects in the study of student, program, and school performance was nontrivial as schools exercised some control over the testing and reporting of scores from certain student populations (Abedi, 2004; Thurlow, 2004). For example, in one study that examined the performance outcomes of schools under a scenario where students receiving a test accommodation were included or excluded from accountability reporting, district and school achievement was generally higher and the performance of student groups more similar in the more restricted student sample (Zvoch & Stevens, 2005). More recently, with the focus on more inclusive assessment and reporting practice, missing assessment data may be somewhat less prevalent, but as outlined above, the concern about how the data were obtained and their construct validity remains. In particular, while the provision of test accommodations facilitates more universal assessment of English learners and SWD, variance in the administration of accommodations and, more generally, in students’ opportunity to learn (OTL) are increasingly ubiquitous validity threats. Furthermore, systematic study of OTL has revealed unequal and nonequitable OTL across multiple dimensions for SWD (Elliott, 2015). Without equal access to the assessed curriculum, concerns about the validity of inference necessarily shift from the nonrepresentativeness of the sample to either bias or alternatively assessment impact that can arise from the measurement of performance in domains where instructional exposure is variable.
Conclusion and Implications for Future Work
STEM educational practices both in the classroom and in assessments are increasingly employing richer digital resources, more interactivity, and an array of modalities and response types. The research questions we ask here pertain to what the research literature identifies as accessibility solutions for assessment approaches in virtual labs and simulation software tasks for Grades K–12.
To address the first research question, concerning the extent to which the research literature addresses making rich technology contexts accessible for STEM, this article describes current research findings on accessibility within the context of emerging interactive digital assessment tasks that employ simulations and virtual STEM activities. STEM education in many classrooms now includes more digitally based activities such as science simulations and virtual laboratories, which have been linked to some improved learning outcomes (Scalise et al., 2011). When such technologies are used in STEM assessments, a major challenge is to ensure assessments are accessible so all students can show what they know and can do (Haertel et al., 2010; Hansen et al., 2016).
Here, we employ synthesis of the research literature to document accessibility recommendations and practices around interactive assessment tasks, especially in STEM. From an initial draw of 151 research articles, 52 were identified as most relevant according to the relevancy guidelines of addressing accessibility in assessments regarding SWD or other special needs, involving or applicable to primarily rich interactivity such as virtual labs and/or simulations for STEM education, focusing on K–12, and within the specified date range.
Our next research question considered whether themes emerge in the research literature regarding understanding assessment accessibility for such contexts, and whether gaps appear in the research literature. In terms of themes, five overarching patterns were identified in the papers: framework approaches to devising accessibility improvements, assignment and administration of accommodations, language simplification, presentation modalities, and approaches through setting, timing, or tools. A sixth pattern, validity, was also identified and was discussed where applicable in the five pattern sections as an integrated picture, and at the conclusion of the pattern sections regarding the implications of missing data not at random and inferences of results.
Tables shown in the Results section summarize the research sample according to STEM domain, accessibility need(s) addressed, the study design and methodologies employed, and the accessibility patterns identified as themes for the synthesized findings. Additionally, the validity check subsample results concluded that few additional patterns or themes were identified, and therefore likely the saturation evaluation criteria were reasonably satisfied by the sample, for research at or prior to 2016.
Regarding the question of gaps in the research literature, we conclude that each of the pattern sections shows limited research in the context of STEM task accessibility to inform educators, school administrators, state and local policy makers, and assessment developers on the availability and use of accommodations in interactive assessment contexts such as simulation and virtual STEM activities. Framework approaches to devising accessibility improvements have been explored in STEM. A few limited examples of applying UDL/UDA within evidence-centered design for STEM are available.
Also, accommodation policy does exist, but the research literature identified here indicates that identifying accommodations is complex and nonlinear for educators, which can be especially true as assessments become more complex, including technologically. Practitioner understanding of accommodation administration for STEM shows considerable variation in understanding selection criteria and types of assessments allowed. Permitted accommodations were perceived by some teachers and not others as having a significant positive effect on STEM scores. One of the most important findings to be emphasized in this article is the need for preparation of teachers to implement and utilize testing accommodations, which is especially important in many inclusive STEM classrooms. Such teachers, especially in science and technology classes, can also have large class sizes in the middle and upper grades in some U.S. states, many demands for laboratory and individualized education program support, and few resources for STEM support and professional development in the elementary grades in many U.S. states.
In terms of administering specific accommodations in large-scale testing contexts, much of the identified research in STEM focused on accessibility and accommodation of EL student populations, with substantial gaps in the literature for SWD. As described in the findings, it is reasonable and conceivable that some of the accommodations available in CBT environs offer the possibility of serving the testing needs of SWD as well, but limited STEM-specific research is available. Thus, substantial gaps in the research literature for SWD can be seen in this pattern.
Language-based accommodations in particular formed such a substantial portion of the literature here that met the relevancy criteria that the topic was identified as its own pattern. These supports include simplified questioning of curricular content; read-alouds, glossaries, and dictionaries; and other tools to account for the communicative needs of SWD and ELs. Gaps particularly important for STEM technology enhancements include a lack of data demonstrating how well the specific accommodations are working for students who are utilizing them in STEM assessments.
Presentation accommodations/UDA (different format modalities) in the identified studies ranged from methods of content presentation to the context of presentation and methods for selecting test items. For the purposes of this article and because Table 2 in the Results section employed IDEA categories but needed to be extended for the results of the patterns identified in the papers, we attempted to organize the presentation accommodations in ways that would seem familiar to technology developers, as shown in the list below. Note that the fifth bullet point (accommodations for students qualifying for alternative assessments based on modified achievement standards) is technically out of scope for this study but was included because it appeared as one of several other topics identified in the relevant papers:
- Accommodations for students who are blind or have low vision
- Accommodations for students who are deaf or hard of hearing
- Accommodations for students with ADHD or EBD
- Accommodations for SWLD
- Accommodations for students qualifying for alternative assessments based on modified achievement standards
- Accommodations in CBT in general
Generally speaking, studies on presentation modalities for STEM accessibility were found to be small and still emerging, often organized by particular special needs as described above, and with limited empirical evidence to validate the utility of the supports, although many promising supports described in the literature are emerging. Some of the supports may be difficult to implement given current technology infrastructure for K–12 assessments, whether classroom based or large scale. Therefore, both for student outcomes and for school decision-making, more information based on evidence of utility will be helpful for future work.
Regarding additional implications for future work, it was anticipated in this article’s Introduction section that some of what was found in STEM might have potential application outside of STEM. To consider this, in the context of the findings, the articles selected based on the relevancy criteria did seem to yield a useful set of results regarding richly interactive technology–based contexts for assessments, especially when content employed simulation and virtual performance with use of observed phenomena and digital tools. In a sense, these aspects of content (simulation and virtual performance with use of observed phenomena and digital tools) could be seen as imposing needs and constraints that may be less applicable to non-STEM domains. For instance, Pattern 1, the need for more theoretical frameworks for development due to the complexity of the technology affordances, and Pattern 4, specific presentation modalities, might yield different findings in other domains if assessments are more text based, static, and less oriented to the need for simulation of real-world phenomena and manipulation of variables, models, and experimental setups such as in STEM.
Additionally, Pattern 3 regarding language simplification may differ in some aspects between STEM and other domains because of the extensive technical range of STEM terms, definitions, and discourse language. Language in STEM may be unfamiliar or even at odds with everyday usage of the same vocabulary. The degree to which assessment targets can be glossed, defined, provided in situ, or supported as text to speech may also differ, depending on the requirements of what is being measured. Pattern 5, approaches through setting, timing, or tools, may also vary to some degree across domains, such as in the time needed to complete complex tasks.
The two other patterns may have more commonality. For Pattern 2, assignment and administration of accommodations, most of the issues raised, such as consistent assignment and fidelity of administration, are likely to apply generally across domains. Implementation may, however, be less well understood and facilitated in STEM contexts. The sixth pattern, validity and exclusion, is a universal consideration. If assessment procedures result in the nonrandom exclusion of students, performance estimates are subject to concerns on the basis of fairness at the individual level and will also be systematically biased.
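The point about nonrandom exclusion can be illustrated with a small arithmetic sketch. All numbers below are hypothetical and are not drawn from any study reviewed here; the function names are likewise illustrative. The sketch also assumes, as a simplification, that excluded students within a group score on average the same as those who remain. Under those assumptions, excluding 6% of all students at random leaves the estimated mean unchanged, whereas concentrating the same number of exclusions in the lower-scoring group shifts the estimate upward.

```python
def weighted_mean(groups):
    """Mean score over a list of (student_count, mean_score) pairs."""
    total = sum(count for count, _ in groups)
    return sum(count * score for count, score in groups) / total


def estimate_after_exclusion(groups, exclusion_rates):
    """Mean score when the given fraction of each group is not assessed.

    Simplifying assumption: excluded students in a group have the same
    average score as the students in that group who are assessed.
    """
    assessed = [
        (count * (1 - rate), score)
        for (count, score), rate in zip(groups, exclusion_rates)
    ]
    return weighted_mean(assessed)


# Hypothetical population: (number of students, mean score) per group.
groups = [(850, 70.0), (150, 55.0)]

true_mean = weighted_mean(groups)

# 60 students excluded in both scenarios (6% of 1,000).
random_mean = estimate_after_exclusion(groups, [0.06, 0.06])
nonrandom_mean = estimate_after_exclusion(groups, [0.0, 0.40])

print(f"All students assessed:       {true_mean:.2f}")
print(f"Random exclusion (6% each):  {random_mean:.2f}")
print(f"Nonrandom exclusion:         {nonrandom_mean:.2f}")
```

In this hypothetical case the number of excluded students is identical in both scenarios; only who is excluded differs, which is why the concern is one of systematic bias rather than of reduced sample size alone.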
On the question raised in this article’s Introduction section of whether the emergent themes show coherence or should be presented in a series of separate publications, we acknowledge that the findings here are widely dispersed. Discerning readers are likely to have noted that the results point to implications for future work by others in several different areas. For technology developers in STEM, we believe an overview helps to provide context about the range of research so far in STEM and implications for future work. Extensions of the literature summarization could be especially helpful for (1) SWD and accommodations on global digital assessments generally across domains that take on richer interactive context; (2) SWD and ELs and language accommodations in digital assessment delivery; (3) disability-specific accommodations for digital assessments globally, especially considering Pattern 4 with regard to the student-facing displays and interactivities of the new technologies; (4) validity threats and the implementation of digital accommodations more universally, including the entire realm of process data obtained from the interactive assessments, which is unexplored here; and finally (5) modifications and alternate assessments that involve modifying aspects of the assessment frameworks themselves, in the context of richer interactive content.
Data such as percentages of students taking STEM-related digital assessments, outcomes of students, and prevalence of new technologies will be an important part of making the argument for improvements in accessibility. Fragmentation, implementation, and differential policies of data collection may obscure the need for better supports, if no sound tracking of these data exists.
Our final conclusion regarding STEM specifically is that sufficient maturation for accessibility will be a key consideration in whether such tasks and activities can be effectively deployed to realize their promise in K–12 education. This is true in both instruction and assessment; indeed, it is key for SWD because assessment must reflect the contexts of instruction. It is also both an encouragement and a caution to the field. As virtualization efforts for STEM education move forward through simulations and the addition of rich interactive inquiry activities that may improve learning outcomes, equal attention must be paid to STEM accessibility for all students, with both emerging solutions specific to STEM and better empirical evidence to support the desired inferences.
Acknowledgments
We wish to acknowledge the contributions of the Behavioral Research and Teaching Center at the University of Oregon and the provision of expertise toward this study from Gerald Tindal and Dan Farley.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
