Abstract
In this study, we examined whether reading tests aloud provides a differential boost to youth with elevated symptoms of Attention-Deficit/Hyperactivity Disorder (ADHD) relative to same-aged peers. Participants were 36 youth, 44% with or at risk for ADHD, who participated in a week-long summer camp. Over the course of the week, youth attended five 45-min classroom periods followed by 10-min tests. Participants were randomized into one of two conditions (i.e., read aloud and silent) that alternated across 5 days. Results indicate that reading tests aloud in small groups significantly improved the testing performance of youth with or at risk for ADHD and provided a differential boost relative to youth without ADHD. Implications for special education practice and future research are discussed.
Youth with Attention-Deficit/Hyperactivity Disorder (ADHD) are at risk for severe and pervasive academic problems and often receive assistance as a result of difficulties in school (Loe & Feldman, 2007). Compared with students without ADHD, students with ADHD are at higher risk for poor grades, grade retention, and dropping out of school and often require additional services (Kent et al., 2011; Loe & Feldman, 2007). As a result of academic impairment, many youth with ADHD qualify for and receive interventions, accommodations, and modifications through special education or Section 504 of the Rehabilitation Act of 1973 (i.e., Section 504 Plans). One of the most common services provided for youth with ADHD is oral test administration (i.e., read aloud; Spiel, Evans, & Langberg, 2014) wherein students have tests or quizzes read to them rather than taking the test in a silent format.
Federal law mandates that services provided through special education are to be based on peer-reviewed research to the extent practicable (34 C.F.R. § 300.320). Recent reviews of special education services for students with ADHD raise doubts about whether current practices adhere to this standard (Harrison, Bunford, Evans, & Owens, 2013; Spiel et al., 2014). In fact, no published studies were found examining the efficacy of read aloud as a possible accommodation for students with ADHD. Currently, there is insufficient evidence to determine if reading tests aloud for youth with ADHD provides any benefit. If it does not, then using this service for children with or at risk for ADHD may be incompatible with federal guidelines and could potentially obstruct or delay the provision of services with known effectiveness. As such, research determining the benefits of reading tests aloud is needed.
ADHD and Education Service Provision
Approximately one quarter (28%; Bussing, Zima, Mason, Hou, & Wilson, 2005) to one half (57%; Reid, Maag, Vasa, & Wright, 1994) of youth with ADHD receive school-based services to address academic and behavioral impairment. Services may be provided through Section 504 of the Rehabilitation Act of 1973 or the Individuals With Disabilities Education Improvement Act (IDEIA). In fact, a large portion of all students who receive special education services are diagnosed with ADHD, including the majority of students in the Other Health Impaired (OHI; 66%) and Emotional Disturbance (ED; 58%) categories, and 20% and 21% of the students in the Learning Disability and Mental Retardation (now known as Intellectual Disability) categories, respectively (Schnoes, Reid, Wagner, & Marder, 2006). One service frequently listed on the Individualized Education Programs (IEPs) and 504 plans for students with ADHD is reading tests and quizzes aloud. In a review of services provided to 467 youth with ADHD receiving special education, Schnoes and colleagues (2006) found that read aloud was included in 53% of IEPs. In a separate study that investigated the content of 60 IEPs and 37 Section 504 plans for youth with ADHD, Spiel and colleagues (2014) reported that 70% of IEPs and 32% of Section 504 plans included reading aloud instructions, tests, and quizzes to the student. Having this service listed on the IEP or 504 Plan of students with ADHD entitles them to have teachers or instructional assistants read tests aloud to them, typically in a small group or one-on-one setting, rather than being tested using the standard, whole class, silent testing format.
Reading Tests Aloud
Reading tests and quizzes aloud is considered a presentation accommodation by Fuchs, Fuchs, Eaton, Hamlett, and Karns (2000) and others (e.g., Harrison et al., 2013) as it is hypothesized to remove construct-irrelevant variance, that is, variance related to test features rather than the content being measured. For example, when students have a reading disability, the variance in test performance on a math test that includes word problems might be partly attributed to reading level instead of mathematical knowledge and skill. Thus, in theory, reading the content aloud removes variance due to reading performance from the assessment and restricts the focus of the test to math performance.
The rationale for reading tests aloud to youth with ADHD is not as apparent as it is for youth with a reading disability and is not specifically stated in the literature. Evidence documenting the effects of reading tests aloud to students with ADHD is also lacking. To date, there are no published studies reporting the efficacy of reading tests aloud to students with ADHD. However, there are studies documenting (a) the effect of small group instruction on the performance of children with ADHD and (b) deficits associated with ADHD that could inform hypotheses on the effect reading tests aloud may have on youth with ADHD.
Small Group
As a consequence of the read aloud service only being provided to eligible students (e.g., students with IEPs), it is often provided either one-on-one or in a small group. On-task behavior may increase as a result of working with an adult in a small group setting. One study reported that students with ADHD displayed more on-task behavior during small group instruction compared with whole group instruction. However, students completed a greater proportion of work accurately in one-on-one conditions compared with small group or whole group. Thus, the small group aspect of the read aloud service may improve on-task behavior during instruction, but not necessarily accuracy (Hart, Massetti, Fabiano, Pariseau, & Pelham, 2011).
Deficits Associated With ADHD
There is evidence that students with ADHD have difficulties with behavioral inhibition (Luk, 1985; Oosterlaan & Sergeant, 1995), attention to detail (Tucha et al., 2009), and latency (Borger et al., 1999). In a testing situation, these deficits may result in students with ADHD rushing through test items, making careless errors, and being less productive compared with same-aged peers. Thus, we hypothesize that variance in test performance might be at least partially attributed to deficits associated with ADHD, and aspects of reading tests aloud (i.e., the presence of an adult, small group, content provided at a consistent rate, spoken content) could reduce this barrier so that knowledge of the content can be accurately assessed.
Read Aloud as an Accommodation
According to Harrison and colleagues (2013), a service must meet four criteria to be considered an accommodation: The service (a) constitutes a change to normal practice, (b) does not alter the standard of academic content, (c) mediates the impact of the disability on access to the general education curriculum, and (d) provides a differential boost to the student with the disability. To evaluate the research on read aloud as a potential accommodation, we will review the service as it pertains to each of the four criteria.
With regard to the first criterion, reading tests aloud could be a change depending on what is considered to be normal practice. If tests are normally administered without the questions and answers read aloud, then reading tests aloud meets this criterion. As for the second criterion, reading tests aloud may alter the content standards depending on the construct measured by the test. For example, if the purpose of the test is to measure reading comprehension, then it is possible that read aloud could alter the content standard. However, if the purpose of the test is to measure mathematics ability or knowledge of a specific content area, then reading the test aloud likely does not alter the content standards. Both the third and fourth criteria hinge on the magnitude of benefit for students. If an accommodation mediates the impact of the disability on the student’s ability to demonstrate knowledge acquired from the general education curriculum, then we would anticipate improved testing performance with the accommodation relative to testing performance without the accommodation. If an accommodation provides a differential boost, then we would expect improvement in testing performance only within a target population (i.e., students with a specific disability or disorder) and not all students in the general population (i.e., same-aged peers) when the accommodation is implemented. This criterion is critical as it distinguishes between good teaching practices that should have relatively similar benefits for all students and an accommodation that is intended to help a subset of students with a disability learn the curriculum and demonstrate their knowledge while compensating for any disadvantage associated with the disability (much like a ramp affords those in a wheelchair the opportunity to enter a building, but provides little to no benefit for an ambulatory individual).
Though there are examples of mixed findings in the literature, well-controlled trials with large sample sizes have suggested that there is a differential impact of read aloud as an accommodation for students with reading disabilities. For example, Laitusis (2010) evaluated a sample of 2,028 fourth- and eighth-grade students who were grouped as either having a reading-based Learning Disability (n = 903) or as a control group (n = 1,125). Fourth- and eighth-grade students completed two equivalent forms of a reading comprehension test in a randomized order. Laitusis reported that students with a reading disability had lower scores overall, but also received a differentially larger boost in the read aloud condition compared with students without a reading learning disability in the fourth grade (Cohen’s d = .57 and .14) and the eighth grade (.32 and .06). Furthermore, even after controlling for word reading ability, this differential boost remained. Others have found similar results (e.g., Crawford & Tindal, 2004); however, their sample included students with a wide variety of disabilities so it is difficult to determine how or if reading the tests aloud mediated the impact of the disability. Therefore, it appears that there may be a differential effect of a read aloud service on test scores for students with reading disabilities. However, in spite of the widespread use of this service with students with ADHD, there is a lack of evidence for its utility among this population. To know if this service is useful to students with ADHD, additional research is needed that specifically focuses on the differential boost criterion.
The purpose of this study was to determine if reading tests aloud improves the testing performance of students with ADHD and meets the full definition of an accommodation. To satisfy the first two criteria for an accommodation, we designed this study so that read aloud would be a change to normal practices (i.e., silent testing) and would not alter the standard of academic content. Thus, the focus of this study is on testing the third and fourth criteria for an accommodation. Specifically, we attempt to answer the following questions: (a) Does reading tests aloud in a small group improve the test scores of young adolescents with or at risk for ADHD as compared with a standard test condition? and (b) If so, is the improvement in scores for youth with or at risk for ADHD significantly greater than for youth without ADHD? This second question is addressed both for the entire sample and for a subsample that matches the testing performance of youth with or at risk for ADHD.
Method
Participants
Thirty-six youth were recruited from several public elementary and middle schools in Ohio to participate in a no-cost 1-week summer camp. At intake, participants were between 9 and 14 years of age (Mdn = 11.5; see Table 1). Approximately half of the participants (44%) met criteria for or were at risk for ADHD. Of the 16 participants with ADHD symptoms and related impairment, 56% had predominantly inattentive symptoms (i.e., ≥6 inattentive symptoms), 6% had predominantly hyperactive/impulsive symptoms (i.e., ≥6 hyperactive/impulsive symptoms), and 38% were reported to demonstrate elevated inattentive and hyperactive symptoms (i.e., ≥6 of both inattentive and hyperactive/impulsive symptoms). In addition, semi-structured clinical interviews with the primary caregiver suggested that 50% of the participants with ADHD symptoms (n = 8) and 5% of the participants without ADHD symptoms (n = 1) had elevated symptoms of oppositional defiant disorder (i.e., ODD symptoms ≥4).
Table 1. Participant Characteristics.
Note. ADHD = Attention-Deficit/Hyperactivity Disorder; ODD = Oppositional Defiant Disorder; CD = Conduct Disorder; WIAT-III = Wechsler Individualized Achievement Test–Third Edition; WASI-II = Wechsler Abbreviated Scale of Intelligence–Second Edition; FSIQ = full-scale IQ.
a. Chi-square test for race compared White with non-White. Significant results indicate greater variety of race in the control group.
b. Chi-square test examined group differences at/above and below the median (i.e., US$50,000–US$74,999).
c. Parent-reported symptoms on the Disruptive Behavior Disorder Rating Scale–Parent Version (DBD; Pelham, Gnagy, Greenslade, & Milich, 1992).
p < .05.
Procedures
Recruitment
In the spring of 2013, we distributed materials describing a no-cost summer camp for youth with and without ADHD to parents of all fourth- through seventh-grade students attending local elementary and middle schools, faculty at a local university, and individuals who had participated in studies conducted at the Center for Intervention Research in Schools (CIRS). Families were directed to call research staff at CIRS to obtain information about the summer camp. All parents who expressed interest in the summer camp were scheduled for an evaluation.
During the evaluation, written documentation of informed assent and consent was obtained. Subtests of the Wechsler Abbreviated Scale of Intelligence–Second Edition (WASI-II; Wechsler, 2011) and the Wechsler Individualized Achievement Test–Third Edition (WIAT-III; Wechsler, 2009) were administered to participants. The Children’s Interview for Psychiatric Syndromes–Parent Version (P-ChIPS; Weller, Weller, Fristad, Rooney, & Schecter, 2000) was administered to caregivers, who were also asked to complete the Disruptive Behavior Disorder Rating Scale–Parent Version (DBD; Pelham et al., 1992) and the Impairment Rating Scale–Parent Version (IRS; Fabiano et al., 2006). Caregivers were also asked if their child had ever been diagnosed with ADHD or a learning disability. Symptoms of ADHD were considered present if reported during the P-ChIPS interview (i.e., “yes” to symptoms questions) or on the DBD (i.e., endorsed as “pretty much present” or “very much present”). Impairment was considered present if endorsed on the P-ChIPS (i.e., “yes” to impairment questions) or on the IRS (i.e., a score of 3 or higher). The measures given to caregivers were used to classify youth as (a) with or at risk for ADHD or (b) without ADHD. Because we did not gather information from the teachers of these children to inform diagnoses, we did not make formal diagnoses of ADHD. Nevertheless, 16 participants met Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.; DSM-IV-TR; American Psychiatric Association, 2000) criteria for ADHD according to parent report on the diagnostic interview and rating scales. Youth with ≥6 inattentive and/or hyperactive symptoms and impairment reported in ≥2 settings were classified in this study as having ADHD symptoms and impairment. Of these 16 participants, 11 had a prior clinical diagnosis of ADHD. The remaining five were categorized as “at risk” for ADHD. Henceforth, groups are referred to as ADHD and control.
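The classification rule described above can be sketched in code. This is a minimal illustration only, not the authors' scoring procedure: the function names are hypothetical, the numeric DBD cutoff assumes that "pretty much present" corresponds to a rating of 2 on the 0 to 3 scale, and the way impaired settings were counted is not specified in the text, so it is taken as a ready-made input.

```python
from typing import Sequence

SYMPTOM_THRESHOLD = 6   # >=6 symptoms in a domain (stated in the text)
SETTINGS_THRESHOLD = 2  # impairment in >=2 settings (stated in the text)
DBD_PRESENT = 2         # ASSUMPTION: "pretty much" = 2, "very much" = 3


def symptom_present(pchips_yes: bool, dbd_rating: int) -> bool:
    """A symptom counts if endorsed on the interview OR the rating scale."""
    return pchips_yes or dbd_rating >= DBD_PRESENT


def count_symptoms(pchips: Sequence[bool], dbd: Sequence[int]) -> int:
    """Count symptoms in one domain across paired interview/scale items."""
    return sum(symptom_present(p, d) for p, d in zip(pchips, dbd))


def classify(inattentive: int, hyperactive: int, impaired_settings: int) -> str:
    """Return 'ADHD' (with or at risk) or 'control' from parent-report counts."""
    enough_symptoms = (inattentive >= SYMPTOM_THRESHOLD
                       or hyperactive >= SYMPTOM_THRESHOLD)
    if enough_symptoms and impaired_settings >= SETTINGS_THRESHOLD:
        return "ADHD"
    return "control"
```

For example, `classify(6, 2, 2)` returns `"ADHD"`, whereas a child with seven symptoms in each domain but impairment in only one setting would be classified as `"control"` under this sketch.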
Randomization
Prior to the study, participants were randomized into two classes balanced for gender, age, and ADHD symptoms (Randomization 1; see Figure 1). Next, participants within these classes were randomized into two groups balanced for classification (i.e., ADHD or control) and WIAT-III Word Reading T-scores (Group A/B and Group C/D; Randomization 2; see Figure 1). These groups alternated test administration conditions each day so that one day youth from Groups A and C participated in the read aloud condition and then participated in the silent condition the next day, with participants in Groups B and D following the opposite schedule. Thus, all participants completed 2 or 3 days in each condition.

Figure 1. Flow chart showing randomization procedures.
To control for the possible negative influence of disruptive participants and to create groups similar in size to those that may be used for the read aloud service in schools, participants in the read aloud condition were randomized daily (Randomizations 3.1–3.5; see Figure 1) into two smaller groups of four to five adolescents each. This randomization occurred each day so that the same groups of participants were not together each time they had tests read aloud.
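The alternating schedule and the daily small-group split can be sketched as follows. This is an illustration under stated assumptions rather than the study's actual procedure: the function names are hypothetical, the balanced Randomizations 1 and 2 are not reproduced, and the daily split is assumed to be a simple shuffle of one classroom's read aloud participants into two near-equal halves.

```python
import random
from typing import Dict, List, Sequence, Tuple

CONDITIONS = ("read aloud", "silent")


def daily_conditions(n_days: int = 5) -> Dict[str, List[str]]:
    """Alternate conditions by day; Groups A and C start with read aloud."""
    schedule = {}
    for group in "ABCD":
        start = 0 if group in "AC" else 1  # B and D follow the opposite schedule
        schedule[group] = [CONDITIONS[(start + day) % 2] for day in range(n_days)]
    return schedule


def split_small_groups(participants: Sequence[str],
                       rng: random.Random) -> Tuple[List[str], List[str]]:
    """ASSUMED procedure: shuffle, then cut into two near-equal small groups."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2
    return shuffled[:half], shuffled[half:]
```

Over 5 days, this schedule gives Groups A and C three read aloud days and two silent days, and Groups B and D the reverse, consistent with every participant completing 2 or 3 days in each condition. Re-running `split_small_groups` with a fresh shuffle each day keeps small-group membership from repeating systematically.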
Summer camp
Participants attended a no-cost 1-week summer camp that operated from 8:30 a.m. until 4:00 p.m. Monday through Friday at a local middle school and included daily activities such as art, recreation, and an academic learning center (ALC). A strict behavior management system was employed during all camp activities wherein participants earned points for positive behaviors (e.g., complimenting, contributing) and lost points or received a 5- to 10-min time-out for negative behaviors (e.g., teasing, physical aggression). All points earned could be redeemed for prizes (e.g., toys, snacks) at the end of each day at a camp store.
During the ALC, participants attended a 45-min interactive discussion followed by a 10-min period to complete a test designed to assess knowledge of the material presented by the teacher. The behavior management system implemented during the rest of camp operated during the ALC with the addition of a rule requiring youth to stay on-task. Rules were posted at the front of the classroom and youth were explicitly taught these rules on the first day of the ALC and reminded of these rules at the start of each activity.
Class discussions
The discussion portion of the ALC occurred in a classroom with two groups of 18 youth for 5 consecutive days. Within each classroom, children were seated in five rows of six in alphabetical order by first name. Each classroom was led by two teachers: one lead teacher who was responsible for delivering academic content and enforcing the behavior management system and one instructional assistant who helped the lead teacher enforce the behavior management system. The teachers remained consistent over the course of the camp. All teachers were clinical psychology graduate students with experience working in schools and consulting on education practices. Teachers received similar training, delivered content following detailed outlines, and were observed by trained research staff (see Integrity section below) to ensure fidelity to discussion content and equivalence across teachers and, therefore, classrooms. Discussion topics were focused on science content (i.e., outer space) and lasted for 45 min per day. Content was presented in a didactic format that included lecture, multi-media presentations, and opportunities for students to make contributory remarks.
Test administration
Following the class, participants completed a 20-item test during a 10-min period. Participants were administered tests under one of two conditions: silent or read aloud. In both conditions, tests were administered by one of the classroom teachers or another camp counselor, who instructed participants to observe classroom rules and to not ask questions until after the test. Participants in both conditions were informed that they were not allowed to leave the ALC or start another activity until the 10-min testing period ended. They were instructed to either check their answers or wait quietly if they completed the test early. All participants in both conditions completed the test within the 10-min period.
Silent condition
Following the class discussion, half of the participants remained in the ALC classroom and were administered a test in the silent condition. After tests were handed to participants and instructions were read aloud, participants were instructed to turn the tests over and begin. Participants completed the tests and administrators monitored the youth in silence until the end of the 10-min period. The test administrators reported zero instances of disruptive behavior during the silent testing condition across all 5 days of camp.
Small group read aloud condition
The other half of participants were divided into small groups of four or five youth and each small group was led into a separate classroom. After tests were handed to them and instructions were read aloud, participants were instructed to turn the tests over and begin. Test administrators read the test aloud at a rate of one question per 30 s. Each question (and answer choices for multiple-choice questions) was read twice. Similar to the silent testing condition, the test administrators reported zero instances of disruptive behavior during the read aloud testing condition across the 5 days of camp.
Integrity
Observations were conducted on 40% of discussions and test administrations to evaluate integrity to administration procedures and adequate coverage of content across instructors. These observations were split equally between both classrooms. Two out of the 5 days of the study were randomly selected and each observation was conducted by two independent observers to assess inter-observer agreement of integrity ratings. The observers remained at the back of the classroom and did not interact with the students during either the discussions or test administrations to minimize observer effects on student learning and behavior. The integrity observers were provided with copies of the detailed lecture outlines. Main ideas and important supporting details were bolded on these outlines. Observers recorded the number of bolded items covered in the lecture. Integrity for each observation was computed by dividing the number of bolded items covered in the lecture by the total number of bolded items and multiplying the quotient by 100. Inter-observer agreement was assessed using the percent agreement statistic (Cooper, Heron, & Heward, 2007), in which the total number of items on which observers agreed is divided by the total number of possible items and multiplied by 100. Integrity observations of the small group read aloud test administrations were also conducted by two independent observers who were provided with a test integrity checklist. This checklist included all 20 test items and required observers to rate the administrators’ adherence to three components of administration. Specifically, observers were asked to indicate if each question and all answer choices were read completely and understandably, read twice, and read at a pace of one question per 30 s. Inter-observer agreement was assessed using the percent agreement statistic described above. Integrity was observed to be 98% for the lectures and 100% for the tests. Inter-observer agreement was 100% for both lecture and test integrity observations.
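The two percentages used in the integrity observations can be illustrated with a short sketch. The function names and the example counts are hypothetical; only the two formulas (items covered divided by total items, and agreements divided by possible items, each multiplied by 100) come from the description above.

```python
from typing import Sequence


def integrity_percent(items_covered: int, total_items: int) -> float:
    """Percentage of outlined items that the observer recorded as covered."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return 100.0 * items_covered / total_items


def percent_agreement(observer_a: Sequence, observer_b: Sequence) -> float:
    """Percent agreement: items rated identically / all items, times 100."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("observers must rate the same non-empty item list")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * agreements / len(observer_a)
```

As a made-up example, an observer who records 49 of 50 outlined items as covered yields `integrity_percent(49, 50)` of 98.0, and two observers who disagree on one of four checklist ratings yield a `percent_agreement` of 75.0.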
Measures
P-ChIPS
The P-ChIPS (Weller et al., 2000) is a structured interview administered to parents about their children (ages 6–17) to screen for DSM-IV (American Psychiatric Association, 1994) disorders. Scoring of the P-ChIPS adheres strictly to the DSM-IV criteria for presence of the disorder, assessing for symptoms, duration, age of onset, and impairment. The psychometric properties of the P-ChIPS are well established, with good construct validity, sensitivity, and specificity (Fristad, Teare, Weller, Weller, & Salmon, 1998). The level of agreement between interviewers using the P-ChIPS is adequate with average kappa coefficients ranging from .122 to .596 across disorders. Sensitivity averages 87% across diagnostic categories and specificity averages 76% (Fristad et al., 1998).
DBD
The parent version of the DBD (Pelham et al., 1992) is a 45-item scale that assesses for the presence and frequency of symptoms of inattention, hyperactivity/impulsivity, ODD, and Conduct Disorder (CD) according to DSM-IV criteria. Parents rate items on a 4-point scale ranging from 0 (not at all present) to 3 (very much present). The ADHD subscales have adequate internal consistency for children and young adolescents (α = .86–.94; DuPaul, Power, McGoey, Ikeda, & Anastopoulos, 1998; Hartung, McCarthy, Milich, & Martin, 2005), which is consistent with internal consistencies in this sample (α = .93–.96). The measure has acceptable convergent (i.e., correlations with other ADHD rating scales and interviews) and discriminant validity (i.e., between children with and without ADHD; DuPaul et al., 1998; Hartung et al., 2005).
IRS
The IRS (Fabiano et al., 2006) is a seven-item rating scale that assesses parents’ perceptions of their child’s functioning in multiple domains (i.e., academics; family; relationships with peers, siblings, and parents; self-esteem; and overall). For each item (i.e., domain), parents place an “X” on a line to indicate their child’s level of impairment in each domain. This line represents a continuum of impairment and need for treatment from 0 (no problem/definitely does not need treatment) to 6 (extreme problem/definitely needs treatment). A score equal to or greater than 3 is associated with significant impairment and need for clinical services. The measure has acceptable cross-informant reliability (e.g., r > .47), convergent and divergent validity with other impairment scales, and predictive validity in identifying children with ADHD (Fabiano et al., 2006).
WASI-II
The WASI-II (Wechsler, 2011) is an individually administered tool that provides an estimate of intelligence for examinees ages 6 to 90 years. Four subtests on the WASI-II provide composite scores that estimate verbal comprehension (Verbal Comprehension Index; VCI) and perceptual reasoning (Perceptual Reasoning Index; PRI) abilities as well as overall cognitive functioning. Overall cognitive functioning can be estimated by computing a full-scale IQ (FSIQ) based on two or four subtests. For this study, eligibility is determined using the four-subtest FSIQ. Within child samples (ages 6–16), composite scores have strong internal consistency (r = .92–.96), temporal stability over a 10-day re-test interval (r = .85–.92), and evidence for convergent validity based on correlations with the Wechsler Intelligence Scale for Children–Fourth Edition (WISC-IV; r = .79–.88).
WIAT-III
The WIAT-III (Wechsler, 2009) is an individually administered assessment of achievement of children ages 4 years 0 months through 19 years 11 months. Sixteen subtests on the WIAT-III measure listening, speaking, reading, writing, and mathematics skills. In this study, the Word Reading subtest of the WIAT-III is used to control for the effects of reading on test performance in the analyses. The Word Reading subtest measures speed and accuracy of decontextualized word recognition. This subtest has strong internal consistency (r = .97), temporal stability across age bands over a 13-day re-test period (r = .89–.95), and evidence for validity based on significant correlations with general cognitive ability on the WISC-IV (r = .68).
Academic Content Tests
Each test contains 20 items (15 multiple-choice and five short-answer). The tests are rated at a second- to third-grade reading level (Flesch–Kincaid grade level ranged from 2.2 to 3.4), which is at or below the grade-based reading level of all participants based on the WIAT-III Word Reading (range = 2.9 to >12.9). Multiple-choice questions have four possible answers. Short-answer questions require participants to provide single word answers or short phrases. These questions require participants to recall explicitly stated facts from the class discussion. Test items assess for main ideas and important supporting details from the detailed lecture outlines. The score for each test is the number of items answered correctly divided by the total number of items.
Data Analyses
Before conducting the main analyses, we compared classes (Classrooms 1 and 2; see Figure 1) on parent-rated inattentive, hyperactive/impulsive, ODD, and CD symptoms, WIAT-III Word Reading T-score, gender, WASI-II FSIQ, parent-reported learning disability, test performance in the silent and small group read aloud conditions, and test scores using a series of independent samples t tests and chi-square tests. The Holm–Bonferroni method was used to control the familywise error rate and no significant differences were observed. Participants randomized to receive either the read aloud condition on the first day (group A/C; Randomization 2, see Figure 1) or the silent condition on the first day (group B/D; Randomization 2; see Figure 1) also did not significantly differ on any of these variables.
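For readers unfamiliar with the Holm–Bonferroni correction mentioned above, a minimal sketch of the step-down procedure follows. The function name and the example p values are hypothetical and are not taken from the study; the sketch only shows how the familywise error rate is controlled across a set of preliminary comparisons.

```python
from typing import List, Sequence


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each test, whether it is rejected under Holm's procedure."""
    m = len(p_values)
    # Visit p values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p values also fail
    return reject
```

With the illustrative input `[0.01, 0.04, 0.03, 0.005]`, the two smallest p values pass their thresholds (.0125 and about .0167), the third (.03) exceeds .025, and the procedure stops, so only the first and fourth tests are flagged as significant.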
To perform the main analyses, we mean-centered test scores for each day to account for variability in the difficulty of the tests across days. For each participant, we calculated average test scores separately for the tests administered in each condition (i.e., silent and read aloud). A two-way repeated measures ANCOVA was performed with condition (i.e., small group read aloud or silent) as the within subject factor, group (i.e., ADHD or control) as the between subject factor, and average test score as the dependent variable. Participants’ WIAT-III Word Reading T-scores were included as a covariate. Finally, we computed a difference score by subtracting the average silent score from the average small group read aloud score. This allowed for further inspection of the magnitude of differences between conditions across groups.
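The data preparation steps in this paragraph (daily mean-centering, per-condition averaging, and the difference score) can be sketched as follows. This is an illustration under assumptions: the data layout and function names are hypothetical, and centering is assumed to use the mean of all participants tested on a given day. The repeated measures ANCOVA itself is not reproduced here.

```python
from statistics import mean
from typing import Dict, Tuple

Scores = Dict[int, Dict[str, float]]  # {day: {participant: percent correct}}


def mean_center_by_day(scores: Scores) -> Scores:
    """Subtract each day's mean score so test difficulty is equated across days."""
    centered = {}
    for day, by_participant in scores.items():
        day_mean = mean(list(by_participant.values()))
        centered[day] = {pid: s - day_mean for pid, s in by_participant.items()}
    return centered


def condition_averages(centered: Scores,
                       condition_of: Dict[Tuple[int, str], str]
                       ) -> Dict[str, Dict[str, float]]:
    """Average each participant's centered scores within each condition."""
    grouped: Dict[str, Dict[str, list]] = {}
    for day, by_participant in centered.items():
        for pid, score in by_participant.items():
            cond = condition_of[(day, pid)]
            grouped.setdefault(pid, {}).setdefault(cond, []).append(score)
    return {pid: {c: mean(v) for c, v in conds.items()}
            for pid, conds in grouped.items()}


def difference_score(averages: Dict[str, float]) -> float:
    """Read aloud average minus silent average; positive means a boost."""
    return averages["read aloud"] - averages["silent"]
```

Under this definition, a positive difference score indicates that a participant scored higher when tests were read aloud than when they were taken silently.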
Exploratory analyses were conducted to examine the effect of reading tests aloud to youth who performed below average in the silent testing condition. Because the evidence reviewed in the introduction suggests that the performance of youth with learning disabilities improves when tests are read aloud, youth with parent-reported learning disabilities were excluded from the exploratory analysis only. Of the total sample, 16 participants (nine in the control and seven in the ADHD group) scored below the mean in the silent testing conditions and had no parent-reported learning disabilities. For this exploratory analysis, we conducted a two-way repeated measures ANOVA with condition (i.e., small group read aloud or silent) as the within subject factor and group (i.e., ADHD or control) as the between subject factor.
Results
After controlling for word reading ability, results of a two-way repeated measures ANCOVA indicate that the testing condition by group (i.e., ADHD or control) interaction is significant, F(1, 33) = 8.90, p = .005. Youth in the ADHD group receive significantly higher scores in the read aloud condition compared with the silent testing condition, F(1, 15) = 17.74, p = .001. On average, the scores of youth in the ADHD group rise 6.99 percentage points (SD = 6.64, Cohen’s d = 0.42) in the read aloud condition compared with the silent testing condition. The test scores of youth in the control group are not significantly different in the read aloud testing condition compared with the silent testing condition, F(1, 19) = .36, p = .56. On average, the scores of participants in the control group drop 1.2 percentage points (SD = 8.99) in the small group read aloud condition compared with the silent testing condition (see Figure 2). Whereas the performance of youth in the ADHD and control groups differs significantly in the silent testing condition, t(34) = −2.1, p < .05, there is no significant difference between the test scores of these groups in the small group read aloud condition, t(34) = −.32, p = .75. In addition, there is no significant difference between the mean test scores of the participants in the ADHD group in the read aloud condition and the mean test scores of the control group in the silent testing condition, t(34) = −.59, p = .55.

Figure 2. Average mean-centered test score by condition for the full sample of participants in the ADHD and control group (left) and among participants who performed below average in the silent testing condition and did not have parent-reported learning disabilities in the ADHD and control group (right).
For the ADHD group, difference scores (i.e., average small group read aloud score minus average silent score) range from −7.4 to 22.77 (M = 6.99, SD = 6.64), with one participant scoring lower in the read aloud condition compared with the silent condition and 15 participants scoring higher in the read aloud condition compared with the silent condition. For the control group, difference scores range from −16.94 to 15.74 (M = −1.2, SD = 8.98), with 12 participants scoring lower in the read aloud condition compared with the silent condition and eight participants scoring higher in the read aloud condition compared with the silent condition.
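The summary above can be sketched in a few lines of Python. The sketch takes each difference score as the read aloud average minus the silent average, so that positive values indicate higher scores in the read aloud condition, which is the direction consistent with the reported group means. The function name and the scores are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def summarize_differences(read_aloud, silent):
    """Summarize per-participant difference scores (read aloud minus silent)."""
    diffs = [r - s for r, s in zip(read_aloud, silent)]
    return {
        "min": min(diffs),
        "max": max(diffs),
        "mean": mean(diffs),
        "sd": stdev(diffs),                        # sample standard deviation
        "n_higher": sum(d > 0 for d in diffs),     # scored higher when read aloud
        "n_lower": sum(d < 0 for d in diffs),      # scored lower when read aloud
    }


# Hypothetical average percentage scores for four participants (NOT the study's data)
read_aloud_scores = [80.0, 75.0, 90.0, 60.0]
silent_scores = [70.0, 78.0, 85.0, 50.0]

summary = summarize_differences(read_aloud_scores, silent_scores)
print(summary)
```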
Given the discrepancy between the average performance of youth in the ADHD and control groups in the silent testing condition, we conducted an exploratory analysis to investigate whether reading tests aloud in a small group provides a differential boost to participants in the ADHD group or if it benefits all youth who perform poorly on tests. The testing condition by group interaction is not significant, F(1, 14) = 3.83, p = .07, although the trends suggest that youth in the ADHD group may improve more than the control group in the read aloud condition (see Figure 2). Among participants who scored below average in the silent testing condition, the average difference score of participants in the ADHD group is 7.77 percentage points greater than that of the control group (Cohen’s d = 1.00).
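For readers less familiar with the effect size reported above, the following Python sketch computes a between-groups Cohen's d as the difference in group means divided by the pooled standard deviation. The text does not state which variant of d was used, so the pooled-SD formula is an assumption, and the function name and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Between-groups Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_sd = sqrt(
        ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b))
        / (n_a + n_b - 2)
    )
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Small hypothetical example (NOT the study's data):
# means are 4 and 3, both sample variances are 4, so the pooled SD is 2
example_d = cohens_d([2, 4, 6], [1, 3, 5])
print(example_d)  # 0.5
```

Under this assumed formula, a 7.77-point difference in mean difference scores with d = 1.00 would correspond to a pooled standard deviation of roughly 7.8 percentage points.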
Discussion
The purpose of this study is to determine (a) if reading tests aloud in small groups improves the performance of youth with or at risk for ADHD as compared with a standard test condition (i.e., performance improvement) and (b) if the improvement in performance for youth with or at risk for ADHD is significantly greater than for youth without ADHD (i.e., differential boost). Results from this study provide evidence that reading tests aloud in small groups significantly improves the test performance of, and may provide a differential boost for, participants with or at risk for ADHD compared with typically developing peers (see Figure 2). To our knowledge, this is the first evidence of an accommodation for youth with or at risk for ADHD. That is, reading tests aloud in a small group may be a change to normal practice that does not alter the standard of academic content, but provides a differential boost for the population of students with the disability. This is in contrast to preliminary evidence suggesting that other commonly used techniques, such as extended time on tests, do not provide a differential boost and in fact may not benefit most students with ADHD (Harrison et al., 2013).
Among participants who performed below average in the silent testing condition and did not have a parent-reported learning disability, youth in the ADHD group appear to benefit from the read aloud condition more than youth in the control group. Although not statistically significant (i.e., p = .07), this finding suggests that reading tests aloud in a small group may have an effect specific to youth in the ADHD group rather than a general effect on all youth who underperform academically. Although there are limitations to this analysis (see Limitations section), the between-groups effect size for participants in this subsample was large (Cohen’s d = 1.00).
Our finding that the testing performance of youth in the ADHD group in the traditional silent reading condition is significantly below the testing performance of the control group is consistent with prior studies demonstrating that youth with ADHD typically perform lower on math and reading achievement tests and receive lower grades in core academic classes than their peers (Kent et al., 2011; Loe & Feldman, 2007). However, the finding that there is no significant difference between the mean testing performance of youth in the ADHD group in the read aloud condition and the mean testing performance of youth in the control group in the silent condition suggests that reading tests aloud in small groups may reduce method variance related to a test and normalize the performance of youth with ADHD.
Several hypotheses could explain the possible differential boost found with the ADHD group and not the control group. For example, completing the tests in the presence of an adult in a small group may provide a more controlled atmosphere than a group of 10 students (standard administration). Although this small difference in group size may play a role, we expect that this effect is probably minimal as all groups were smaller than typical classroom sizes. Receiving the testing information through multiple information routes (i.e., visual and auditory) may help to decrease errors that are common for children with ADHD due to difficulties sustaining attention to details over time. Reading testing items aloud at a steady pace may also reduce the likelihood of participants rushing through test questions or answer choices. This reduction in the impulsive and inattentive responding that is common among children with ADHD and contributes to their poor performance on tests and other academic tasks may mediate the impact of the children’s disability on their performance, resulting in a more accurate assessment of the students’ knowledge. In contrast, participants in the control group, who did not have these deficits, did not experience the same effect. Further research is needed to determine the specific mechanism of action for the read aloud accommodation. Understanding the causal pathway could lead to additional revisions to the accommodation that could further enhance its benefits to students.
Limitations
One factor that may limit the generalizability of these findings is that this study took place in a summer camp setting rather than an actual classroom. Though efforts were made to structure the ALC in a way that closely resembled a typical classroom setting, a number of notable differences from typical practice exist. First, the short-term nature of the camp limits the generalizability of these findings to typical practice, where it may be expected that the service be given across months or years. Future investigations can help determine whether potential gains are maintained over a longer period of time. Second, within this setting, the staff-to-student ratio was higher than that found in typical classrooms. Third, a rigid behavior management system was used to encourage participants’ attention during all instruction and testing periods. As such, replication of these findings in typical classrooms outside of a summer camp setting would be helpful to determine the generalizability of these results. Fourth, the sample used in this study was gathered using convenience sampling, and efforts would be needed to replicate these results with a nationally representative sample to help determine generalizability.
Another limitation involves the tests used in the ALC. Our tests were created by the authors and are not standardized tests. Thus, the results may not generalize to other academic performance measures, such as standardized achievement tests. This applies to both content and format, as some testing situations in typical practice may require multiple hours of testing in a single session and across multiple content domains. The size of the groups in the read aloud and silent conditions also differed throughout testing (i.e., 8–10 in the silent condition and 4–5 in the read aloud condition), and thus these results might be confounded by group size. There are also limitations to the implications that can be drawn from our exploratory analyses (i.e., differential boost among all underperforming participants without parent-reported learning disabilities). Specifically, by including only those youth who performed below average in the silent testing condition, our sample is cut in half, causing these analyses to be underpowered.
In addition, not all participants had a formal diagnosis of ADHD because symptom and impairment data were not collected from teachers. Although all participants were categorized based on parent report of ADHD symptoms and related impairment, studies with a diagnosed sample involving comprehensive evaluations, including teacher ratings, are needed to confirm the results reported in this study. Furthermore, participants were not comprehensively assessed for reading or listening ability. In the statistical analysis, WIAT-III Word Reading T-scores were entered as a covariate to control for participant word reading ability, but there may be other variables, such as reading comprehension and listening comprehension, that affect the impact of the small group read aloud accommodation on test performance.
Implications for Practice
The results from this study indicate that reading tests aloud in small groups may mediate the impact of the disability on the test performance of students with ADHD. However, they do not address the question of where this accommodation might fit in an education plan for students with ADHD. To answer this question, it is important to consider the long-term goal that is being pursued when providing services to youth with ADHD. For example, if the goal of services is to improve test scores for students with ADHD so that they pass a course, reading tests aloud in a small group setting may be appropriate and accomplish this goal. However, if the goal is to enhance the competencies of students so they can independently meet age-appropriate academic or behavioral expectations, reading tests aloud in a small group setting likely will not be helpful (see Evans, Owens, Mautone, DuPaul, & Power, 2014, for a discussion of this topic). Nevertheless, if a student who is identified with ADHD and is eligible for an IEP or Section 504 Plan does not respond adequately to interventions, we recommend reading tests aloud to youth with ADHD in small groups before implementing other accommodations that lack a research base.
Future Directions
This study helps to illustrate several directions for future research related to using the read aloud accommodation for youth with ADHD. Foremost is the differential benefit of read aloud across various tasks and settings. Because of variations in the length, content (e.g., reading, writing, mathematics), and response formats in tests (e.g., multiple-choice, short-answer, essay), a comparison of the performance of youth with ADHD under silent and read aloud conditions across various testing settings is needed to help inform education planning for these students. In addition, given the brevity of the tests administered in this study, it is unknown whether these results apply to longer tests. As youth with ADHD have difficulties with sustained attention that increase with task length (Hooks, Milich, & Lorch, 1994), it is possible that reading tests aloud in small group settings may have a larger effect on longer tests compared with shorter tests. Further research investigating the read aloud accommodation with longer tests is needed to measure this effect.
Youth with ADHD also vary on several factors related to academic performance, such as cognitive ability, academic achievement, presentation of ADHD symptoms, and comorbid disorders. It is possible that the effect of read aloud may be different for a student with an inattentive presentation of ADHD compared with a student with a hyperactive/impulsive presentation of ADHD. Studies investigating the potential differential effectiveness of reading tests aloud in small groups between children with variations of these possible moderators are needed to improve our understanding of for whom this accommodation might be most beneficial.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
