Abstract
This study investigates the relationship between text-to-speech (TTS) usage and item-by-item performance in the 2017 eighth-grade National Assessment of Educational Progress (NAEP) math assessment, focusing on students with disabilities (SWDs), English language learners (ELLs), and their general education (GE) peers. Results indicate that all students use TTS more for longer and more difficult math items as well as for multiple-choice or short-response formats. Among SWDs and GE students, lower math proficiency and higher perceived time pressure are linked to higher TTS usage. Moreover, among GE students, factors such as male gender, minority status, lower math persistence, and higher math interest and effort during testing contribute to higher TTS usage. TTS usage is positively associated with item performance for SWDs and ELLs who received extended time accommodations but not for those who did not receive such accommodations or for general education students. The study suggests that the time constraints of speeded digital assessments may limit the potential benefits of TTS for SWDs and ELLs in math problem-solving.
Math assessments should accurately reflect learners’ mathematical problem-solving knowledge and abilities rather than their disabilities or language barriers. In compliance with federal law, learners with disabilities and English language learners must be included in large-scale academic assessments, and accommodations must be made available to ensure that their performance reflects their abilities. Traditional assessments can accommodate individual learning differences, but technological advancements can provide a more accurate and fair assessment of the achievement and progress of all learners, particularly those with disabilities or language barriers. The Universal Design for Assessment (UDA) framework promotes the design of digital assessments that reduce barriers and maximize students’ potential to showcase their knowledge and skills, irrespective of their abilities or disabilities (Rose et al., 2016). This framework includes features like text-to-speech (TTS) or speech-to-text options, visual aids, and other tools that cater to diverse learning needs.
TTS, a commonly used UDA tool in digital assessments, was made available to all students during the 2017 eighth-grade National Assessment of Educational Progress (NAEP) math assessment. TTS enables synthetic computer voices to read aloud math item texts, questions, and answer choices while students read along. TTS helps students with disabilities (SWDs) and English language learners (ELLs) overcome reading barriers, comprehend instructions, and improve math problem-solving. Despite the long history of using read-aloud as an accommodation, limited research has examined TTS as a UDA tool—particularly at the item level—and its impact on SWDs, ELLs, and general education (GE) students.
Theoretical Frameworks
The cognitive load theory suggests that TTS reduces cognitive load by providing an auditory channel for processing information, complementing the visual channel. Incorporating both visual and auditory working memory is more effective than relying solely on the visual channel (Mayer & Moreno, 2003). TTS aligns with the universal design principle of offering multiple means of representation, action, and expression (Rose et al., 2016). Additionally, TTS supports language acquisition by providing spoken-language input that is easier for ELLs to comprehend, bridging the gap between their current proficiency level and academic content (Moon, 2012). These theoretical frameworks lay the foundation for understanding the potential benefits of TTS for SWDs and ELLs, such as reducing cognitive load, promoting inclusive educational practices, and enhancing language acquisition. Moreover, the differential boost hypothesis further suggests that the use of TTS or other accessibility features should lead to greater improvements in test scores for SWDs or ELLs compared to their nondisabled or non-ELL peers (Abedi et al., 2020; Sireci & O’Riordan, 2020). However, the magnitude of this differential boost may vary across contexts and assessments.
Usage of TTS in Large-Scale Digital Assessments
Previous research on TTS usage in large-scale digital assessments indicates that the overall usage of TTS is generally low and varies among different student subgroups. Lee et al. (2021) examined TTS both as a test accommodation, available only to SWDs or ELLs, and as a designated support, available to all students identified by their educators. The study found that TTS usage was generally low, with only 61% of students approved to use TTS actually using it during a state digital assessment in the spring of 2018. The study also revealed variations in TTS usage among student subgroups: a higher percentage of SWDs (69%) and ELLs with disabilities (74%) used TTS compared to ELLs without disabilities (54%) and GE students (49%).
Recent studies have explored TTS as a UDA tool in digital assessments, available to all students without requiring prior approval. These studies found TTS usage is influenced by the test accommodation condition and item characteristics. For example, Lee et al. (2020) analyzed the 2017 NAEP fourth-grade math assessment data and found that students who received test accommodations used TTS more frequently than those who did not. The study also discovered that students used TTS more often on difficult text-heavy items and short items with a median difficulty level compared to other math test items. Moreover, TTS usage varied depending on the item location, with students using it less frequently toward the end of the exam (Crotts-Roohr & Sireci, 2017).
TTS Usage and Math Performance
More than half of US states provide TTS accommodations for students with dyslexia (Albus et al., 2020). Empirical studies have shown that using assistive technology such as TTS readers and speech-to-text writers can alleviate some of the adverse effects of dyslexia (Schneps et al., 2019; Wagner et al., 2022). According to a meta-analysis of 22 studies, TTS has an average effect size of 0.35 in improving reading comprehension for students with dyslexia (Wood et al., 2018). TTS has also been found to enhance self-efficacy (Svensson et al., 2019), attention, and time on task for students with dyslexia (Bonifacci et al., 2022). However, limited research has been conducted on the relationship between TTS usage and math performance. Two studies examining the use of TTS as a testing accommodation for students with dyslexia found improved scores in math concepts, application, and problem-solving (Fuchs et al., 2000; Tindal et al., 1998). A recent study analyzing the 2017 NAEP fourth-grade math assessment process and performance data found that TTS was associated with higher overall math performance among students who received test accommodations but not among those who did not (Hicks et al., 2021).
Previous studies have primarily focused on the effects of TTS or read-aloud on test performance among SWDs. In contrast, fewer studies have examined the impact of TTS on ELL performance, with mixed results. Several studies found no significant effect of TTS or read-aloud on ELL achievement scores (Abedi et al., 2020; Anderson et al., 2000; Wolf et al., 2009, 2012). Abedi et al. (2020) proposed three possible reasons for the lack of impact of TTS on improving test performance among ELLs: first, the assessment may have been simplified linguistically, limiting the effects of TTS on ELLs; second, teachers may not have adequately prepared ELLs for the assessment; and third, ELLs may be unfamiliar with TTS due to its infrequent use in classrooms.
Limitations of Previous Studies
Previous research has contributed to our understanding of the effectiveness and validity of TTS for SWDs and ELLs, but knowledge gaps remain. Specifically, while there is a considerable body of research on the impact of TTS on reading performance for SWDs or ELLs, fewer studies have specifically examined its impact on math performance, particularly among ELLs (Abedi et al., 2020). Additionally, most previous research has focused on overall performance rather than the item-level relationship among TTS usage, item characteristics, and math performance. It is important to note that TTS usage is item-specific, meaning usage pattern varies by item characteristics, and the impact of TTS on performance may vary across different test items. Indeed, Bolt and Thurlow (2007) noted the importance of examining TTS and math performance by item characteristics, finding that TTS was beneficial for difficult-to-read math word problems but not for computation questions.
Furthermore, few studies have explored whether receiving other accommodations, including extended time accommodation (ETA), mitigates the impact of TTS. Notably, Wei and Zhang (2023) observed that students with learning disabilities who were granted ETA tended to use TTS more frequently. Similarly, Knoop-van Campen et al. (2022) discovered that TTS usage extended reading time for both students with and without dyslexia. This leads us to hypothesize that TTS’s effectiveness might vary among students based on whether they received ETA. Our study aims to fill these research gaps by examining both item and student characteristics that influence the use of TTS at the item level. Additionally, we investigate how this item-level TTS usage correlates with math performance among SWDs, ELLs, and GE students.
The Unique Opportunity Afforded by NAEP Process Data
The 2017 eighth-grade NAEP math assessment process data offers a unique opportunity to examine students’ item-level TTS usage. NAEP process data capture TTS usage and timestamps as students interact with the digital math test. This study is the first to examine TTS usage and its relationship with performance at the item level using NAEP data. Specifically, this study posed the following research questions (RQs) for SWDs, ELLs, and GE students respectively:
RQ1: How is item-level TTS usage associated with item characteristics and student characteristics?
RQ2: Do the associations between item-level TTS usage and item correctness differ based on ETA status?
RQ3: Do the associations between item-level TTS usage and item response time differ based on ETA status?
Method
Data
The data for this study were obtained from the 2017 NAEP math assessment. NAEP is a nationwide, low-stakes test that evaluates U.S. student achievement. The assessment selects a representative sample of schools and students in each participating state using a deeply stratified multistage cluster sampling plan. In 2017, the NAEP math test included 144,900 eighth-grade students from 6,500 different schools.
NAEP aims to assess a diverse range of students, including SWDs and ELLs, in a consistent and inclusive manner. Approximately 90% of SWDs and ELLs in the fourth and eighth grades participated in the NAEP math assessments. Only students with the most significant cognitive disabilities or ELLs who had been enrolled in U.S. schools for less than one full academic year and were unable to access NAEP were eligible for exclusion. All students had access to built-in universal design tools, including TTS, zoom, color contrast, answer choice elimination, volume control, a scratch pad, and so on. Additional accommodations, such as ETA, were available upon request by school staff and had to be consistent with the student’s Individualized Education Program and/or 504 plan.
Study Sample
Students completed a single form on tablets, consisting of two 30-minute blocks and a 15-minute survey. This analysis focused on a specific block of 15 items released by NAEP. The position of this block varied across forms: it appeared first in half of the forms and second in the other half.
The study sample consisted of 2,750 SWDs, of which 1,650 were granted ETA and 1,100 were not granted ETA. Additionally, there were 1,400 ELLs, including 460 ELLs granted ETA and 940 without ETA. It is worth noting that approximately 10% of the sampled SWDs were also classified as ELLs. The sample of 1,400 ELLs included ELLs without any disabilities.
Measures
The study utilized response, process, and survey data from a released math test within the 2017 NAEP math assessment. The response data recorded students’ answers to each math item, while the process data captured their interactions, including clicks, entries, and timestamps, as they worked through the test items. Students’ demographic characteristics and responses to survey questions were included in the survey data.
Student-Level Variables
The demographic variables included gender, age at the time of testing, race/ethnicity (categorized as African American, Hispanic, white, or other), disability categories among SWDs, and ELL status. In addition to ETA, five other accommodations (breaks, cueing, bilingual dictionary, preferential seating, or separate sessions) were available for SWDs or ELLs.
The NAEP provides a report on total math scores. In addition to total scores, the NAEP categorizes students’ math proficiency into levels based on their overall performance across two math tests adjusting for item difficulty. The proficiency levels are classified on a scale from 1 to 4: Level 1 signifies “Below Basic” proficiency, Level 2 corresponds to “Basic,” Level 3 is indicative of “Proficient,” and Level 4 represents “Advanced” proficiency.
After completing the math assessment, students were asked to complete a survey. The survey included items to calculate a perseverance index score and a math interest or enjoyment index score. The responses were rated on a 5-point Likert scale ranging from 1 (not at all like me) to 5 (exactly like me). The index scores were estimated using an item-response theory partial-credit scaling model, with scores ranging from 0 to 20. The survey also included questions about perceived test effort, difficulty levels, and time pressure, rated on a 5-point Likert scale.
Item Characteristics
In our analysis, we focused on five key variables to describe the characteristics of the math items. The first variable is the word count of each item, which serves as an indicator of its length or complexity. Next, we considered item easiness, represented by the percentage of students who answered each item correctly, thereby reflecting its relative difficulty. The third variable is item type, categorized as multiple choice, short response, or constructed response. Additionally, we analyzed item location within the test, classified into three categories: the first 5 items, the middle 5 items, and the last 5 items of the test. For a more detailed breakdown of these variables, see online Appendix Table A1.
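The item characteristics described above can be derived with a few lines of code. The following is a minimal sketch, not the authors' procedure: the function names and the input formats (a list of 0/1 scores, a plain-text item stem, a 1-based item position) are assumptions made for illustration.

```python
def word_count(item_text):
    """Number of whitespace-separated words in the item text."""
    return len(item_text.split())


def item_easiness(binary_scores):
    """Percentage of students who answered the item correctly (scores are 0 or 1)."""
    return 100.0 * sum(binary_scores) / len(binary_scores)


def item_location(position, n_items=15):
    """Classify a 1-based item position as the first, middle, or last third of the block."""
    third = n_items // 3
    if position <= third:
        return "first"
    if position <= 2 * third:
        return "middle"
    return "last"
```

With a 15-item block, positions 1 to 5 map to "first", 6 to 10 to "middle", and 11 to 15 to "last", which matches the three location categories used in the analysis.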
TTS Usage
To utilize the TTS feature, students could click on the TTS icon located in the upper left corner of the screen (refer to Figure 1). Activating the TTS feature highlighted specific areas on the screen that could be read out loud. By tapping on these areas, students could listen to a computer-generated female voice reading the corresponding math item aloud. Deactivating the TTS was as simple as tapping the TTS button again. It is important to note that students had to turn off the TTS to provide their answers to the questions.

Figure 1. Demonstration of text-to-speech tool usage in the 2022 NAEP eighth grade mathematics assessment tutorial.
From the process data, two variables related to TTS usage were extracted. The first variable was binary, indicating whether a student used the TTS on an item. The second variable was a count representing the number of times a student tapped the areas eligible for audio reading on a specific math item. The number of times TTS was used served as a practical measure to capture the intensity of TTS usage. It is worth noting that TTS usage variables were recorded in the process data regardless of whether the students answered the item or omitted it.
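As an illustration of how the two TTS variables could be extracted, the sketch below counts TTS activations per item from a simplified event log. The event format, a sequence of (item_id, action) pairs, and the action label "tts_play" are hypothetical; the actual NAEP process data use their own schema.

```python
from collections import Counter


def tts_usage_by_item(events, item_ids):
    """Return, for each item, a binary 'used' flag and a 'count' of TTS activations.

    events: iterable of (item_id, action) pairs; 'tts_play' is a hypothetical
    label for a tap on an area eligible for audio reading.
    """
    counts = Counter(item for item, action in events if action == "tts_play")
    return {
        item: {"used": int(counts[item] > 0), "count": counts[item]}
        for item in item_ids
    }
```

Because every item in `item_ids` receives an entry, items a student omitted still carry TTS usage values, in line with the recording rule described above.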
Item Correctness
The released NAEP math block consists of 15 items covering various mathematical concepts. Six of the items were scored dichotomously, with 0 indicating a wrong answer and 1 indicating a correct answer. Eight items had a maximum score of 2, with 0 indicating an incorrect answer, 1 indicating a partially correct answer, and 2 indicating a fully correct answer. One item had a maximum score of 4, with 0 assigned to incorrect responses, a middle score for partially correct responses, and the maximum score for fully correct responses. To simplify interpretation and facilitate statistical modeling, the scores for the nine polytomous items were converted into binary outcomes. Incorrect and partially correct responses were combined and labeled as 0, while fully correct responses were labeled as 1. Therefore, item correctness was recorded as a binary variable, with 1 indicating a correct answer and 0 indicating an incorrect or partially correct answer. Omitted items were coded as incorrect responses for item correctness. The average percent correct by group and by TTS usage status is presented in online Appendix Table A2.
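The recoding rule can be stated compactly in code. This is a minimal sketch under the assumption that an omitted response is represented as `None`; the function name is illustrative.

```python
def to_binary_correct(score, max_score):
    """Collapse an item score to 0/1: only full credit counts as correct.

    Partial credit, incorrect responses, and omitted responses (None)
    are all coded 0.
    """
    if score is None:
        return 0
    return 1 if score == max_score else 0
```

For example, a score of 1 on a 2-point item is coded 0, and a score of 4 on the 4-point item is coded 1.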
Item Response Time and Total Response Time
The NAEP process data recorded both the total response time and item response time for the assessment. Since students were allowed to navigate between items, the time spent on each item was not a single data point but a cumulative measure across multiple visits. Item response time refers to the total time, in seconds, that a student spent working on an item, including initial visits and subsequent revisits. Total response time represented the total time, in seconds, that a student spent working on all 15 items in the math assessment. If a question was not visited, its item response time is recorded as 0 seconds. The average item response time by group and by TTS usage status is presented in Appendix Table A2.
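The cumulative nature of item response time can be sketched as follows. The visit format, (item_id, enter_seconds, leave_seconds), is an assumption for illustration, and every visited item is assumed to appear in `item_ids`.

```python
def item_response_times(visits, item_ids):
    """Sum time in seconds across all visits to each item; unvisited items get 0."""
    totals = {item: 0.0 for item in item_ids}
    for item, enter_sec, leave_sec in visits:
        totals[item] += leave_sec - enter_sec
    return totals


def total_response_time(item_times):
    """Total time in seconds spent across all items in the block."""
    return sum(item_times.values())
```

A student who spends 30 seconds on an item, moves on, and later returns for 15 seconds is credited with 45 seconds on that item.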
Statistical Analysis
Descriptive Analysis
All analyses used R version 4.1.0 (R Core Team, 2021). We describe the demographic characteristics of five groups and item-level and test-level TTS usage and performance.
Generalized Linear Mixed-Effects Models With Binary Outcomes or Count Outcomes
Generalized linear mixed-effects models (GLMMs) are commonly used statistical tools in social science and medical research to analyze non-normally distributed dependent variables. They are particularly useful when there are correlations within a dataset due to grouping structures (Jiang, 2007). GLMMs account for both fixed and random effects, enabling estimation of both item-level and student-level effects. The GLMMs were fitted using the glmer function in the R lme4 package.
To answer RQ1, the study employed two-level GLMMs with items at Level 1 nested within students at Level 2. We used a logit link function for the binary TTS variable (indicating whether TTS was used on item i by student j) and a Poisson model with a log link for the count of TTS usage (the number of times TTS was used on item i by student j). Specifically, the logit of whether TTS is used on item i by student j, $\eta_{ij}$, is given by the equation:

$$\eta_{ij} = \gamma_{00} + \beta_{01}\,\mathrm{Student}_{j} + \beta_{10}\,\mathrm{Item}_{i} + \mu_{j} + e_{ij}$$

where $\gamma_{00}$ is the overall intercept across items and students, $\beta_{01}$ is the fixed effect of student characteristics ($\mathrm{Student}_{j}$), $\beta_{10}$ is the fixed effect of item characteristics ($\mathrm{Item}_{i}$), $\mu_{j}$ is the random student effect, and $e_{ij}$ is the error term. To estimate the counts of TTS usage, the same linear predictor was used with the Poisson model for count data.
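To make the two outcome scales concrete, the sketch below maps a linear predictor to a probability (logit model) or an expected count (Poisson model, assuming the standard log link). It is an illustration of the model structure only, not the fitted models; the function names are hypothetical and the residual term is left out.

```python
import math


def linear_predictor(intercept, student_effect, item_effect, random_student_effect):
    """Sum of the overall intercept, fixed effects, and the random student intercept."""
    return intercept + student_effect + item_effect + random_student_effect


def prob_tts_used(eta):
    """Inverse logit: converts the linear predictor to a probability of using TTS."""
    return 1.0 / (1.0 + math.exp(-eta))


def expected_tts_count(eta):
    """Inverse log link: converts the linear predictor to an expected TTS count."""
    return math.exp(eta)
```

A linear predictor of 0 corresponds to a probability of 0.5 in the binary model and an expected count of 1 in the count model, which is why coefficients from the two models are not directly comparable in size.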
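To answer RQ2, GLMMs were used to estimate the likelihood of students correctly answering an item using a logit link function. For item i and student j, the logit of the correct response probability, $\eta_{ij}$, is given by the equation:

$$\eta_{ij} = \gamma_{00} + \beta_{01}\,\mathrm{Student}_{j} + \beta_{10}\,\mathrm{Item}_{i} + \beta_{20}\,\mathrm{TTS}_{ij} + \mu_{j} + e_{ij}$$

In this equation, $\mathrm{TTS}_{ij}$ takes the value of 1 if student j used the TTS tool on item i and 0 otherwise. Alternatively, $\mathrm{TTS}_{ij}$ can also take the value of the number of times TTS was used on an item. The remaining terms are the overall intercept $\gamma_{00}$, the fixed effects of student characteristics ($\beta_{01}$) and item characteristics ($\beta_{10}$), the random student effect $\mu_{j}$, and the error term $e_{ij}$. The fixed effect $\beta_{20}$ represented the association between TTS and item correctness. We excluded item 3 from the analysis for RQ2 due to its low usage of TTS (less than 3%). Item 3 was a multiplication problem that did not require the use of TTS for comprehension. However, we included performance on item 3 as a predictor to control for calculation skills.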
Hierarchical Linear Models With Continuous Outcome
To answer RQ3, hierarchical linear models were used to estimate the continuous outcome, item response time, while accounting for the nesting structure of the data. The item response time data were right-skewed, and commonly used test statistics are not robust to skewed distributions. To address this, we transformed the response time (RT) data using the formula logRT = log(RT + 1). This transformation reduced the skew, making logRT approximately symmetric. The “+1” in the formula ensures that the argument of the logarithm is always positive, as the logarithm of zero is undefined. This is crucial for instances of zero or near-zero response times, preventing undefined values in our analysis.
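The transformation is a one-liner in most languages. A minimal sketch in Python, using the standard library's `math.log1p`, which computes log(1 + x) directly; the function name `log_rt` is illustrative.

```python
import math


def log_rt(rt_seconds):
    """Return log(RT + 1) for a response time in seconds (natural logarithm)."""
    return math.log1p(rt_seconds)
```

An unvisited item with a response time of 0 seconds maps to 0 rather than to an undefined value.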
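The equation for the HLM is:

$$Y_{ij} = \gamma_{00} + \beta_{01}\,\mathrm{Student}_{j} + \beta_{10}\,\mathrm{Item}_{i} + \beta_{20}\,\mathrm{TTS}_{ij} + \mu_{j} + e_{ij}$$

where $Y_{ij}$ is the log-transformed item response time on item i for student j, and $\mathrm{TTS}_{ij}$ takes the value of 1 if student j used the TTS tool on item i and 0 otherwise, or it can represent the count of TTS usage on an item. The remaining terms are the overall intercept $\gamma_{00}$, the fixed effects of student characteristics ($\beta_{01}$) and item characteristics ($\beta_{10}$), the random student effect $\mu_{j}$, and the error term $e_{ij}$. The coefficient $\beta_{20}$ represented the estimated association between item response time and TTS usage. Roughly speaking, a one-unit increase in TTS multiplies the response time by $e^{\beta_{20}}$.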
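The multiplicative reading of the TTS coefficient follows from the log transformation of the outcome. As a short worked derivation, let RT_0 and RT_1 (notation introduced here for illustration only) be the predicted response times before and after a one-unit increase in TTS, holding all other predictors fixed:

```latex
\begin{align*}
\log(\mathrm{RT}_1 + 1) - \log(\mathrm{RT}_0 + 1) &= \beta_{20} \\
\frac{\mathrm{RT}_1 + 1}{\mathrm{RT}_0 + 1} &= e^{\beta_{20}} \\
\text{percent change in RT} &\approx 100\,\bigl(e^{\beta_{20}} - 1\bigr)
\end{align*}
```

The approximation in the last line holds when response times are large relative to the one-second offset, which is why the interpretation is stated only roughly.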
Results
Descriptive Analysis Results
Table 1 presents descriptive statistics for student-level variables across five groups: SWDs with ETA, SWDs without ETA, ELLs with ETA, ELLs without ETA, and GE students without ETA. Significant differences in student characteristics among these groups were identified using chi-square or ANOVA analyses. Specifically, the two SWD groups had a higher proportion of male students who were older and received test accommodations such as breaks, cueing, preferential seating, or separate sessions compared to the other three groups. They also reported lower levels of math persistence, enjoyment, and interest, and found the exam more challenging. Around a third of ELLs with ETA also received bilingual dictionary accommodation. Moreover, SWDs and ELLs who received extended testing time spent more time on the exam and reported lower levels of time pressure but scored lower than their counterparts who did not receive extra time.
Table 1. Descriptive Statistics of Student Variables by Group
Note. The average percentage or mean values (with associated standard deviations in brackets) are presented. SWDs = students with disabilities; ELLs = English language learners; ETA = extended time accommodation; NA = not applicable. About 10% of SWDs are ELLs.
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), “Response Process Data From the 2017 NAEP Grade 8 Mathematics Assessment.”
Table 2 presents descriptive statistics on item-level and test-level TTS usage and performance. The results show that a higher percentage of SWDs and ELLs used TTS compared to their GE peers on each item and throughout the test. Moreover, in the ETA group, a higher or equal proportion of SWDs and ELLs used TTS compared to the no-ETA group across all 15 math items. The frequency of TTS usage among those who used TTS was also examined. The results indicate that students in the ETA group used TTS equally or more frequently than their counterparts in the no-ETA group on most items. Among ELLs in the ETA group who utilized TTS, the average frequency of TTS usage across all 15 items was the highest compared to the other groups. Additionally, ELLs in the ETA group showed the highest TTS usage on difficult items located towards the end of the test (#6 and #10 through #15). In terms of item-level percent correct, SWDs and ELLs had lower rates compared to the GE students.
Table 2. Item-Level Text-To-Speech (TTS) Use and Math Performance by Group
Note. SWDs = students with disabilities; ELLs = English language learners; ETA = extended time accommodation.
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), “Response Process Data From the 2017 NAEP Grade 8 Mathematics Assessment.”
Association Between Item and Student Characteristics and TTS Usage
Table 3 presents the associations between item characteristics and TTS usage. The findings suggest that SWDs, ELLs, and GE students tend to use TTS more for math items that are longer and more difficult. Across these student groups, students were more likely to use TTS on multiple-choice or short-response items than on constructed-response items. The number of times TTS was used, however, varied among the groups. SWDs used TTS more times on multiple-choice or short-response items, whereas GE students used it more times on constructed-response items. Furthermore, the analysis reveals that SWDs generally used TTS more on the first five items of the test, while ELLs and GE students showed increased TTS usage on later test items.
Table 3. Generalized Linear Mixed-Effects Models Predicting Math Item-Level Text-To-Speech (TTS) Use
* p < .05, ** p < .01, *** p < .001
Note. NA = not applicable; SWDs = students with disabilities; ELLs = English language learners; ETA = extended time accommodation. The two-level HLMs nest items within students. Level 1 of the two-level HLM is the item. Level 2 of the HLM is the student. Coefficients and associated standard errors from the HLMs are presented in the table.
Source: U.S. Department of Education, National Center for Education Statistics, “Response Process Data from the NAEP 2017 Grade 8 Mathematics Assessment.”
Regarding the relationship between student characteristics and TTS usage, GE students who were male, African American, or Hispanic, and those who reported lower math persistence, higher math interest, more effort during the test, and more perceived time pressure, tended to use TTS more frequently. In addition, TTS usage was associated with higher perceived time pressure and lower math proficiency among SWDs.
Association Between TTS Usage and Math Performance
Table 4 suggests that TTS is associated with improved math item performance only among ELLs or SWDs who were granted ETA. ELLs with ETA who used TTS had 51% higher odds of answering an item correctly compared to those who did not use TTS (b = 0.41, p < .01, odds ratio = 1.51). Each additional use of TTS was associated with a 2% increase in the odds of answering an item correctly for ELLs with ETA (b = 0.02, p < .01, odds ratio = 1.02). Similarly, for SWDs who received extended time accommodations, each additional use of TTS was associated with a 1% increase in the odds of answering an item correctly (b = 0.008, p < .05, odds ratio = 1.01). However, there was no association between TTS usage and item performance for SWDs or ELLs who did not receive ETA or for GE students.
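The odds ratios reported above are the exponentiated logit coefficients. The sketch below reproduces that conversion and shows how an odds ratio translates into a change in probability at a given baseline. The function names are illustrative, and the baseline probability in the usage note is a hypothetical value, not an estimate from the study.

```python
import math


def odds_ratio(b):
    """Exponentiate a logit coefficient to obtain an odds ratio."""
    return math.exp(b)


def percent_change_in_odds(b):
    """Percent change in the odds associated with a one-unit increase in the predictor."""
    return 100.0 * (math.exp(b) - 1.0)


def apply_odds_ratio(baseline_prob, ratio):
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    odds = baseline_prob / (1.0 - baseline_prob) * ratio
    return odds / (1.0 + odds)
```

Rounding `odds_ratio(0.41)` to two decimals gives 1.51, matching the reported value. An odds ratio is not a ratio of probabilities: at a hypothetical baseline probability of 0.5, an odds ratio of 1.5 raises the probability to 0.6, a 20% relative increase rather than 50%.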
Table 4. Two-Level Hierarchical Linear Models (HLM) Predicting Math Item Accuracy or Item Response Time from Text-To-Speech (TTS) Use Controlling for Student and Item Characteristics
* p < .05, ** p < .01, *** p < .001.
Note: SWDs = students with disabilities; ELLs = English language learners; ETA = extended time accommodation. All models in this study incorporate controls for student calculation skills measured by performance on item 3, student background characteristics, survey variables, and item characteristics. This table presents the coefficients and standard errors associated with the TTS usage variables, while other predictors are not included in the tables to conserve space.
Source: U.S. Department of Education, National Center for Education Statistics, “Response Process Data from the NAEP 2017 Grade 8 Mathematics Assessment.”
Association Between TTS Usage and Item Response Time
The analysis indicates a significant trend: students who used TTS generally took more time to complete items compared to those who did not use TTS. This was consistent across all five groups studied. Specifically, on average, the use of TTS corresponded to an increase in item response time of 59% for SWDs with ETA, 79% for SWDs without ETA, 54% for ELLs with ETA, 129% for ELLs without ETA, and 79% for GE students. Additionally, our data show that each additional instance of TTS usage was associated with an incremental increase in time spent on the item: about 2% more for SWDs both with and without ETA, 1% more for ELLs with ETA, 3% more for ELLs without ETA, and 2% more for GE students.
Discussion
This study explores how eighth-grade students use TTS technology during the NAEP math assessment. We found that approximately 38% of SWDs with ETA used TTS, compared to 34% of SWDs without ETA, 33% of ELLs with ETA, and 28% of ELLs without ETA. Notably, these rates are lower than the 61% observed in a previous study by Lee et al. (2021) on a state digital assessment. Furthermore, only 21% of GE students used TTS, which is less than the 49% reported by Lee et al. (2021). One potential explanation for this discrepancy is that the prior studies focused on high-stakes exams, whereas this study examined the low-stakes NAEP test. Research has indicated that students may not be incentivized to perform well on low-stakes assessments (Wolf & Smith, 1995). As a result, even if TTS is available to them, students may opt not to use it.
Our study also examined how item and student characteristics influence TTS usage. We observed similarities and differences across SWDs, ELLs, and GE students. For example, all groups were more likely to use TTS on questions with longer text and higher difficulty. Moreover, across all groups, students were more likely to use TTS on multiple-choice or short-response items than on constructed-response items. However, the frequency with which TTS was used differed among the groups: GE students used it more frequently for constructed-response items, ELLs used it more frequently for constructed-response items than for short-response items, and SWDs used it most often for multiple-choice or short-response items.
In terms of item location and TTS usage patterns, our study yielded some contrasting results compared to previous research. Crotts-Roohr and Sireci (2017) reported a decline in TTS usage toward the end of the exam. However, our analysis revealed that SWDs tend to use TTS more frequently on the first five items compared to the rest of the items, while ELLs and GE students exhibit lower TTS usage on the first five items compared to the subsequent items. These nuanced findings call for further investigation to gain a deeper understanding of these factors and how they can effectively support students’ test-taking experiences.
Our findings also revealed significant associations between TTS usage and certain demographic and psychological factors among GE students. Specifically, male and minority students were more likely to use TTS, highlighting the need for targeted support and training to ensure equal access to assistive technologies. Furthermore, students who used TTS reported lower math persistence but higher math interest and greater effort, indicating that promoting interest and effort in math may increase TTS usage and potentially improve math performance. However, using TTS was also associated with higher perceived time pressure, suggesting that students who use this technology may need additional time management support during exams. Exam administrators may need to adjust the time limit or provide time management assistance to accommodate TTS users and optimize their performance.
These findings underscore the joint influence of item attributes and student characteristics in shaping the decision to employ TTS. They highlight the significance of understanding the intricate interplay between item or student characteristics and TTS usage to effectively address the diverse learning needs of students. Further research in this area will contribute to the advancement of educational practices that optimize the integration of TTS technology to enhance students’ test-taking experiences and promote equitable academic outcomes.
Our results add a cautionary note to the differential boost hypothesis of test accommodation, which holds that accommodations like extended time or read-aloud should improve test scores more for SWDs and ELLs than for GE students (Thurlow et al., 2003). While TTS reduces the impact of a student’s disability or linguistic barrier on test-taking and allows them to demonstrate their true ability, our findings suggest that time limitation is a critical factor to consider when evaluating the effectiveness of TTS in timed assessments. Specifically, a positive association between TTS use and math performance was found only among ELLs and SWDs who were given extra time, indicating that time limits can reduce the benefits of TTS use in digital learning and assessments. Therefore, further research is needed to explore ways of incorporating TTS into digital assessments that allow ELLs and SWDs to use this accommodation without being penalized for taking longer to complete the tasks. It is essential to provide these students with additional time or alternative testing formats that allow them to use TTS without being constrained by time limits. This may include adjusting test schedules, setting appropriate time limits, or developing new testing formats that allow for more flexible use of TTS. By doing so, educators and policymakers can ensure that all students have equal opportunities to demonstrate their knowledge and skills on assessments.
Our findings highlight the potential of TTS technology in leveling the playing field and providing support to students with diverse learning needs within digital learning and assessment systems. Specifically, our study revealed that among ELLs and SWDs in the ETA condition, those who utilized TTS or used it more frequently demonstrated higher math performance compared to their peers who either did not utilize TTS or used it less frequently. This observation underscores the effectiveness of TTS in reducing cognitive load, enhancing comprehension, and ultimately improving math performance, which aligns with the principles of cognitive load theory, universal design, and language acquisition theories.
To foster inclusion and ensure equal opportunities for students from disadvantaged backgrounds, it is imperative to integrate TTS accessibility features into digital learning and assessment systems. By incorporating TTS functionality, these systems can mitigate the challenges faced by students with diverse learning needs, enabling them to access and engage with content more effectively. Additionally, the integration of high-quality and culturally sensitive TTS into chatbots and AI-based conversational agents further enhances their educational capabilities (Okonkwo & Ade-Ibijola, 2021). Chatbots with TTS integration can convert text-based responses into spoken words, providing personalized assistance and catering to users who struggle with reading or prefer auditory information. This integration plays a crucial role in developing inclusive and equitable AI and chatbot systems, promoting enhanced engagement, accessibility, and improved learning outcomes for students from diverse backgrounds.
While our findings provide valuable insights into the use of TTS among SWDs, ELLs, and GE students, it is important to interpret them with caution due to several study limitations. Firstly, this study’s lack of random assignment introduces the possibility of treatment selection bias. Students who are more familiar with technology or have used TTS before may be more likely to use it, while others may choose not to use it. To address this limitation, future studies should consider randomly assigning students to receive TTS, stratified by their familiarity with technology. Additionally, this study focused on eighth-grade students who participated in the NAEP math assessment, so the results may not be generalizable to other populations or types of assessments.
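The stratified random assignment suggested above can be illustrated with a minimal sketch. It is not part of this study's method. The function name `stratified_assign`, the stratum labels, and the student identifiers are hypothetical, and the even split within each stratum is an assumption made for illustration.

```python
import random
from collections import defaultdict


def stratified_assign(students, seed=0):
    """Randomly assign students to 'TTS' or 'control' within each stratum.

    students: list of (student_id, stratum) pairs, where stratum is a
    hypothetical label for technology familiarity (e.g., 'low', 'high').
    Returns a dict mapping student_id to 'TTS' or 'control'.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible

    # Group student IDs by stratum.
    by_stratum = defaultdict(list)
    for student_id, stratum in students:
        by_stratum[stratum].append(student_id)

    assignment = {}
    for ids in by_stratum.values():
        rng.shuffle(ids)  # random order within the stratum
        half = len(ids) // 2
        # First half receives TTS; with an odd count, the extra student
        # goes to control (a simplification made in this sketch).
        for position, student_id in enumerate(ids):
            assignment[student_id] = "TTS" if position < half else "control"
    return assignment


# Hypothetical roster: 10 students with low and 10 with high familiarity.
roster = [(f"s{i}", "low" if i % 2 == 0 else "high") for i in range(20)]
groups = stratified_assign(roster, seed=1)
```

Randomizing within strata keeps the TTS and control groups balanced on technology familiarity, so familiarity cannot drive who ends up using TTS.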
Furthermore, this study relied solely on quantitative data analysis and did not include qualitative data or feedback from students or teachers. Including such feedback in future studies could provide additional insights into the effectiveness of TTS and how it can be improved. Lastly, while this study identified associations between TTS usage and specific student or item characteristics, and between TTS usage and math performance, correlation does not necessarily imply causation. Factors that were not measured or accounted for in this study could be influencing the results.
Despite these limitations, our study’s findings offer valuable insights into the use of TTS in math assessments and its association with student performance. By analyzing process data, we were able to identify which types of math items were associated with higher TTS usage and which groups of students benefited the most from TTS under which accommodation conditions. Our study is a starting point for future research that makes further use of process data to develop equitable and accessible math assessments for all students. By continuing to investigate the use of TTS and other UDA tools, we can strive to create assessments that reflect students’ mathematical abilities and knowledge rather than their disabilities or language barriers.
Supplemental Material
Supplemental material, sj-pdf-1-edr-10.3102_0013189X241232995 for Text-to-Speech Technology and Math Performance: A Comparative Study of Students With Disabilities, English Language Learners, and Their General Education Peers by Xin Wei in Educational Researcher