Abstract
This study examined the psychometric properties of test presentation and response formats that were modified to be accessible with the use of assistive technology (AT). First, the stability of psychometric properties was examined in 60 children, ages 6 to 12, with no significant physical or communicative impairments. Population-specific differences were then examined with samples that included 24 children with cerebral palsy and matched control peers. Children were administered standard and modified versions of tests. The type of AT access did not have a statistically significant effect on modified test scores. Measurement stability between the standard and modified versions of quadrant forced-choice format tests was sufficient. The findings support the potential use of AT and accessible procedures for some test instruments in the assessment of children with cerebral palsy.
Introduction
It is not unusual for children who receive special education support under health impairment or multiple disabilities classifications to have significant impairments in speech/articulation and/or manual dexterity that could preclude participation in traditional psychological and educational assessments (National Longitudinal Transition Study-2, 2008). The inaccessibility of test instruments is an obstacle to understanding the risks and needs associated with specific neurodevelopmental conditions, including cerebral palsy (CP), the most common cause of childhood physical disability. There have been limited efforts to create modified test procedures that potentially are accessible to children with multiple impairments. Techniques have included converting response options to a forced-choice format (Berninger, Gans, St James, & Connors, 1988; Sabbadini, Bombardi, Carlesimo, Rosato, & Pierro, 2002) and using event-related brain potentials (ERPs; Byrne, Dywan, & Connolly, 1995). Assistive technology (AT) computer access options involving direct selection or linear scanning potentially could be used for forced-choice format assessment if stimuli and response options were computerized (B. Wagner & Jackson, 2006).
Modifying standardized procedures to make them accessible may change the psychometric properties of tests, including construct validity (Hill-Briggs, Dial, Morere, & Joyce, 2007). The Standards for Psychological and Educational Testing (American Educational Research Association, 1999) identify specific test accommodations, including modifications in stimulus and response formats, time constraints, and test selection. The standards for testing individuals with specific impairments, however, include recommendations to pilot test the modifications prior to clinical use and to provide psychometric data regarding test validity as well as normative data regarding standard modification recommendations. One possible systematic approach can be to initially examine the effects of modifications on the psychometric properties in a sample of children who do not have significant impairments and can therefore participate in the standard and modified versions of tests. Subsequent research can examine the psychometric effects of modifications in target populations. In this vein, this study was conducted to examine the psychometric properties of accessible instruments in which standardized procedures were modified for use with AT to minimize oral speech and motor response demands. Instruments were selected to include proxy measures of intellect and academic achievement that use forced-choice format response arrays, as well as a test of phonological analysis and synthesis.
First, we examined the extent to which psychometric properties of modified versions of tests were maintained in a representative sample of children without physical or communicative impairments. Dependent variables were scores on the standard and modified versions of the selected instruments. Then, the performances of children with and without CP were compared. Specifically, standard and modified versions of tests were administered to a sample of children with CP who were matched with a subset of the initial representative sample on age, gender, and receptive vocabulary.
It was decided a priori that, to be clinically useful, the adapted instruments should meet the following criteria: (a) they should yield standardized scores that are not statistically significantly different from standard counterparts and (b) they should demonstrate intraclass correlation indexes of agreement with standard counterparts that would be at least .75 (Lee, Koh, & Ong, 1989). Preliminary analyses were conducted to compare the nomothetic span of test versions.
In the initial analyses with the sample of representative children without CP, the sample size was sufficient to examine Bland Altman plots for further evidence of measurement agreement and test bias. A priori criteria for interpretation of Bland Altman tests set the acceptable upper 95% confidence limit (UCL) and lower 95% confidence limit (LCL) of 1.96 SD of the differences between the methods (UCL1.96 SD,diff, LCL1.96 SD,diff) as equal to or smaller than the normative standard deviation (SD = 15; Bland & Altman, 1986). In addition, two types of AT access were examined with the hypothesis that type of access would not have a significant effect on psychometric properties. Population-specific effects of test modifications were examined under the general null hypothesis that there would not be significant group differences in the psychometric properties of standard and modified tests.
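The Bland Altman criterion described above can be illustrated with a short sketch. It is not the analysis code used in the study. The function names are hypothetical, and the check that both limits fall within one normative standard deviation (15 points) of zero is only one possible reading of the a priori criterion.

```python
from statistics import mean, stdev


def bland_altman_limits(standard, modified):
    """Return (bias, lower_limit, upper_limit) for paired standard scores."""
    if len(standard) != len(modified) or len(standard) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    # Difference for each child: modified score minus standard score.
    diffs = [m - s for s, m in zip(standard, modified)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the differences
    return bias, bias - spread, bias + spread


def meets_criterion(standard, modified, normative_sd=15.0):
    """Assumed reading of the criterion: both limits within +/- normative SD."""
    _, lower, upper = bland_altman_limits(standard, modified)
    return abs(lower) <= normative_sd and abs(upper) <= normative_sd
```

A nonzero bias with narrow limits would suggest a systematic test version effect, whereas wide limits centered near zero would suggest poor agreement without systematic bias.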
Method
Participants
Following institutional review board approval, participants were recruited through local community flyers and websites connected with two Midwest medical rehabilitation centers over a period of approximately 2 years. Participants with no significant physical or communicative impairments included 60 children between 6 and 12 years of age, mean 9.1 years (SD = 1.8), 56.4% female, and 91% Caucasian. Five children (9%) had been diagnosed with attention deficit/hyperactivity disorder. Twelve percent of the children in the study were born prematurely. Five children (9%) wore corrective lenses, and one child (2%) had a corrected hearing impairment. Seven children (12%) had special education services, though none received services for physical impairments. The mean Hollingshead Index was 3.1 (SD = 0.7).
To examine population-specific effects of modifications, the participants were 24 children with congenital CP, each matched on age, gender, and vocabulary with a control peer from the original sample of 60 children with no significant physical or communicative impairments. Specifically, children were matched by their scores on the standard version of the Peabody Picture Vocabulary Test–Third Edition (PPVT-III; SS within 10 points), age (within 2 years), and gender. In addition to the multivariate matching strategy, as a partial control for ability to use AT methods, only participants with IQ > 69 were included. This multivariate pairs matching was conducted from initial samples of 30 children with CP and 60 controls, as optimal two-group matching typically requires at least a 1:2 sample size ratio. The pooled sample had a mean age of 9.6 years (SD = 1.7), 56.3% female, with a mean PPVT-III of 104.6 (SD = 12.7).
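The matching constraints stated above (same gender, PPVT-III standard score within 10 points, age within 2 years) can be sketched as follows. This is a simplified greedy pairing for illustration only, not the optimal two-group matching procedure used in the study. The `Child` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Child:
    child_id: str
    age: float   # years
    gender: str
    ppvt: int    # PPVT-III standard score


def is_eligible(case, control, max_ppvt_gap=10, max_age_gap=2.0):
    """Apply the three matching constraints described in the text."""
    return (case.gender == control.gender
            and abs(case.ppvt - control.ppvt) <= max_ppvt_gap
            and abs(case.age - control.age) <= max_age_gap)


def greedy_match(cases, controls):
    """Pair each case with the closest unused eligible control (greedy, not optimal)."""
    available = list(controls)
    pairs = []
    for case in cases:
        candidates = [c for c in available if is_eligible(case, c)]
        if not candidates:
            continue  # this case stays unmatched
        best = min(candidates,
                   key=lambda c: (abs(case.ppvt - c.ppvt), abs(case.age - c.age)))
        pairs.append((case, best))
        available.remove(best)  # each control is used at most once
    return pairs
```

A greedy pass can leave cases unmatched that an optimal algorithm would pair, which is one reason a larger control pool is helpful.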
Inclusion criteria for the CP group included ability to make a reliable dichotomous choice with a raw score of 12 or better on the Dichotomous Choice Screen (DCS; Van Tubbergen, Warschausky, Birnholz, & Baker, 2008). Participation in the DCS also requires hearing and vision sufficient for test participation. Children with CP also were oral communicators and able to participate in both the standard and modified versions of the tests. Children were excluded if they were on unstable or frequently changing doses of medications that could affect cognitive function. Exclusion criteria also included history of an acquired brain injury or other major neurological or psychiatric condition (for children with CP, this refers to events subsequent to the onset and diagnosis of CP) or an inability of the parent or guardian to complete child history.
In the CP group, 73.9% of the sample exhibited spasticity. Functional levels using the Gross Motor Functional Classification System (Palisano et al., 1997) criteria were as follows: Level I (18) 75%, Level III (5) 20.8%, and Level IV (1) 4.2%. Manual Ability Classification System (Eliasson et al., 2006) levels included Level I (5) 20.8%, Level II (17) 70.8%, and Level III (2) 8.3%. Two thirds of the CP group had a history of prematurity with a mean gestation of 28.9 weeks (3.2) and a mean birth weight of 3.8 pounds (2.1). There was a history of seizures in 20.8% of the CP sample. There were significant group differences in socioeconomic status (SES) as indicated by the Hollingshead scores, with lower SES noted in the CP group, F(1, 47) = 7.31, p < .05, η2 = .17. There also were statistically significant group differences in special education certification, χ2 = 13.44, p < .01 (CP 69.6%, Control 16.7%). Clearly, the combination of including only those children with CP able to participate in both the test versions as well as the matching procedure resulted in a sample of children with CP with milder motor impairments and higher cognition than is noted in the general population with CP (Sigurdardottir et al., 2008).
Procedure
Parents and children provided informed written consent or witnessed assent for voluntary participation. Child participants completed the test battery while a parent or legal guardian (usually the mother; 92%) completed a set of survey instruments to describe the child’s demographic, medical, and educational characteristics.
Participants completed both the modified and standardized versions of the tests. Responses were coded in the standardized manner for both test versions. The order of test version administration (modified vs. standard) was randomized. Participants also were randomly assigned to one of two response modalities for the adapted versions of the tests: either Pressure Switch (Big Red®, 2010) or HeadMouse® (HeadMouse Extreme, 2010). In the pressure switch condition, linear scanning was used, and participants activated the pressure switch when the preferred choice was highlighted; autoscan speed was set between 1,000 and 1,500 ms, mean 1,230 ms (SD = 250 ms). Linear scanning of response options is done with a red framing border that sequentially appears as a frame around each of the response options. The pressure switch is 5 in. in diameter, and when the participant presses the switch, it stops the linear scanning, with the now static red framing border indicating the participant’s response. The HeadMouse has a wireless optical sensor that tracks a small target that is placed on the participant’s forehead. Head movements are translated into movements of the cursor on the screen. As the cursor is moved to a response option, a red framing border appears around that option. If the cursor dwells on that option for a preset duration, there is an alert signal, and the frame freezes around that option. HeadMouse dwell time for choice selection also was set between 1,000 and 1,500 ms, mean 1,340 ms (SD = 230 ms). Speed and dwell time were adjusted for child comfort. Group differences in AT device usage and autoscan speed were not significant. HeadMouse dwell time group differences were statistically significant, F(1, 47) = 10.03, p < .01, η2 = .31, with slower dwell time in the group with CP. No participants were excluded due to inability to use either modality.
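The two access methods can be illustrated with a minimal sketch of the selection logic. This is not the BoardMaker or HeadMouse implementation. The function names are hypothetical, and the sketch assumes that scanning starts on the first option at time zero, advances at a fixed interval, and wraps around after the last option.

```python
def scanned_option(press_time_ms, interval_ms, n_options):
    """Index of the option framed at the moment the pressure switch is pressed."""
    if press_time_ms < 0 or interval_ms <= 0 or n_options <= 0:
        raise ValueError("times must be non-negative and counts positive")
    # The frame advances one option per interval and wraps around (assumption).
    return int(press_time_ms // interval_ms) % n_options


def dwell_selection(samples, dwell_ms):
    """Return the first option held continuously for dwell_ms, else None.

    samples: time-ordered (timestamp_ms, option_index) cursor readings,
    with option_index set to None when the cursor is outside every region.
    """
    current, since = None, None
    for timestamp, option in samples:
        if option != current:
            # Cursor moved to a different region: restart the dwell timer.
            current, since = option, timestamp
        if current is not None and timestamp - since >= dwell_ms:
            return current
    return None
```

For example, with a 1,200 ms scan interval and four response options, a press at 2,600 ms selects the third option (index 2).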
Instruments
PPVT-III is an individually administered test designed to measure single-word receptive vocabulary through a forced-choice format (Dunn & Dunn, 1997). For each of the 204 test items, examinees are shown a page with four black and white illustrations, and a target word is presented orally. The child must identify the picture that best describes the word by either pointing to or saying the number of the correct picture. The PPVT-IIIAdapted uses items identical to the PPVT-III, but the stimuli have been computerized for use with BoardMaker Dynamically Speaking Pro™ software, allowing for responses to be made using a switch interface system. The stimuli are enlarged by 214% and presented in response boards as four distinct 12.5 cm × 12.5 cm response regions that are selected by responding with a pressure switch via linear scanning or a HeadMouse via direct selection devices. The PPVT-III has strong test–retest reliability, ranging from .91 to .94 over a 1-month interval, with a gain of 1.0 to 3.2 points, as well as concurrent validity through high correlations, ranging from .91 to .91 with the WISC-III Verbal IQ (Dunn & Dunn, 1997).
Raven’s Coloured Progressive Matrices (Raven’s CPM) were developed as a measure of Spearman’s g or general intellect (Raven, Raven, & Court, 1998). Individual items consist of a visual pattern with a missing element; the element that correctly completes the pattern must be selected from a set of six choices. Raven’s CPMAdapted uses the same 36 items, all of which have been digitized for use with BoardMaker Dynamically Speaking Pro™ software; responses are made using either the Pressure Switch or HeadMouse switch interface systems. The stimuli were altered with the target pattern box border enlarged by 206% to a 10.9 cm × 16.2 cm display box, and the six response option box borders were enlarged by 519% from 2.7 cm, 5.8 cm × 10.4 cm. This significant enlargement of the response options for the Raven’s was necessary due to the difficulties inherent in using the HeadMouse (direct selection) with very small activation regions. Psychometrics of Raven’s CPM include split-half reliability of .85 and test–retest reliability of .86 to .95, over a 10- to 21-day interval, and concurrent validity including correlations of .5 to .7 with Wechsler Full Scale IQs (Raven et al., 1998).
The Elision subtest of the Comprehensive Test of Phonological Processing (R. K. Wagner, Torgesen, & Rashotte, 1999) requires the participant to analyze phonological information. In the standard administration, children are asked to repeat a spoken word and then asked to remove a phonemic segment of the word, for example, to say “cat” without /k/. The correct response “at” demonstrates mastery of phonemic analysis skills. ElisionAdapted required significant changes to allow for an accessible forced-choice response format without the requirement of a word or a word segment utterance by the participant. Item prompt words (e.g., “cat”) and target words (e.g., “at”) were identical to those in the standard Elision test. In ElisionAdapted, however, the child first views a pictorial representation of the prompt word on a computer screen, which is verbally labeled by the examiner. The second screen shows three pictures, each on a virtual button. The box borders for all stimuli are 10.0 cm × 10.9 cm. The examiner points to and verbally labels each picture, for example, “This is ‘mat,’ this is ‘cap,’ and this is ‘at.’” Then, the examiner directs the child to “listen carefully. Show me ‘cat’ without /k/.” The examinee then uses AT to select the picture of the correct target word (“at”) from among the three choices. Foils were created that systematically altered the beginning, middle, or end of the target word and were selected using spoken vocabulary rated for familiarity at 70% or greater at the fourth-grade level or below (Dale & O’Rourke, 1976). The ElisionAdapted test was presented on a computer screen using BoardMaker™ software. The psychometrics of the Comprehensive Test of Phonological Processing include strong interrater reliability coefficients of .96 to .99 and a test–retest coefficient of .82 over a 2-week interval, as well as confirmatory factor analytic evidence showing high loadings on a Phonological Awareness factor.
The Peabody Individual Achievement Test Revised/Normative Update (PIAT-R/NU) is an individually administered achievement test designed to measure performance in six different academic domains through a multiple choice format (Markwardt, 1998). This study used the Reading Comprehension (PIAT-R RC) subtest. Children are instructed to read a sentence silently and are then presented with a choice of four pictures to describe the previously read statement. As with the PPVT-III, the original items of the PIAT-R RCAdapted were digitized for use with BoardMaker™ software. Each item in PIAT-R RCAdapted used a sequential presentation of stimuli; the first screen showed the sentence to be read, and the second screen presented the four possible pictorial responses to each sentence as presented in the standard administration. The box borders for response options were enlarged by 223% and presented within a 12.5 cm × 12.5 cm activation border with responding via HeadMouse or linear-scanning techniques. As reported by Markwardt (1998), the split-half reliability for the RC subtest is .90 to .96, and test–retest coefficients range from .78 to .94. Evidence of criterion prediction validity includes correlations of .67 to .97 with PIAT-R Reading Recognition and .54 to .75 with the PPVT-R.
Results
To initially examine the effects of modifications on the psychometric properties in a sample of children who do not have significant impairments and can therefore participate in the standard and modified versions of tests, a separate mixed models repeated-measure ANOVA was run for each instrument with test version (Standard, Modified) as the within-subject variable and AT device (HeadMouse, Pressure Switch) as the between-subject variable (Table 1). As the greater concern was with the Type II error, there was no correction in significance level for multiple comparisons. The main effects for test version were not statistically significant for the PPVT-III, Raven’s CPM, and the PIAT-R RC. Elision scores, however, were significantly higher with the modified version, F(1, 59) = 12.42, p < .001, η2 = .17. Virtually identical results were obtained when these analyses were repeated in the CP group.
Table 1. Means and Standard Deviations for Standard and Modified Tests in Typically Developing Children (N = 60)
Note. PPVT-III = Peabody Picture Vocabulary Test–Third Edition; Raven’s CPM = Raven’s Coloured Progressive Matrices; Elision = Comprehensive Test of Phonological Processing Elision subtest; PIAT-R RC = Peabody Individual Achievement Test–Revised Reading Comprehension.
p < .01.
Measurement agreement was examined by computing intraclass correlation coefficients (ICCs; Model 2.1; Shrout & Fleiss, 1979) and conducting Bland Altman tests (Table 2; Bland & Altman, 1986; Rankin & Stokes, 1998). PPVT-III and PIAT-R RC ICCs indicated very good to excellent agreement or high alternate-form reliability. Raven’s CPM and Elision ICCs were low, indicating insufficient measurement agreement.
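As an illustration of the agreement index, the following sketch computes the single-measure, two-way random-effects, absolute-agreement coefficient, ICC(2,1), from the mean squares defined by Shrout and Fleiss (1979). It is not the study's analysis code, and the function name and input layout (one row per child, one column per test version) are assumptions.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table of scores: one row per child, one column per version."""
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 rows and 2 columns of equal length")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)               # between-children mean square
    jms = ss_cols / (k - 1)               # between-versions mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
```

Because this form indexes absolute agreement, a constant shift between versions lowers the coefficient even when children keep the same rank order, so a systematic test version effect would count against agreement.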
Table 2. Repeatability and Measurement of Agreement: Intraclass Correlations and Bland Altman Test Results in Typically Developing Children (N = 60)
Note. ICC = intraclass correlation coefficients; CI = confidence interval;
To further examine measurement agreement, Bland Altman plots were constructed by plotting the difference between each individual’s test version scores against the mean of the two scores (Figure 1). The plots show excellent agreement for test versions of the PPVT-III and PIAT-R RC. The Elision plot indicates inadequate agreement characterized by test version bias. The Raven’s CPM plot indicates inadequate limits of agreement but no systematic bias.

Figure 1. Bland Altman test plots
Bivariate correlations within standard and modified versions were computed (Table 3). Coefficients were compared, using Fisher’s r to z transformations. There were significant bivariate correlations between the PIAT-R RC and the PPVT-III, Raven’s, and Elision scores with both test versions. The bivariate correlations between the PPVT-III and Elision scores were significant in both test versions, as well. The differences in the standard and modified test version correlation matrices were not statistically significant.
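A comparison of two correlation coefficients using Fisher’s r to z transformation can be sketched as follows. The sketch uses the textbook formula for two independent samples. Correlations from standard and modified versions given to the same children are dependent, so this is a simplification and may differ from the procedure used in the study. The function name is hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Return (z, two_tailed_p) for the difference between two correlations.

    Assumes the two correlations come from independent samples.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than three observations")
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r to z transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-tailed p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

A call such as `fisher_z_difference(0.60, 60, 0.50, 60)`, with made-up values, returns the z statistic and its two-tailed p value.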
Table 3. Bivariate Correlations for Standard and Modified Tests in Typically Developing Children (N = 60)
Note. Standard version above the diagonal and modified version below the diagonal. PPVT-III = Peabody Picture Vocabulary Test–Third Edition; Raven’s CPM = Raven’s Coloured Progressive Matrices; Elision = Comprehensive Test of Phonological Processing Elision subtest; PIAT-R RC = Peabody Individual Achievement Test–Revised Reading Comprehension.
*p < .05. **p < .01.
Measurement agreement between the standard and modified versions of each measure was examined by computing the ICCs. In the CP group, ICCs for standard and modified measures were adequate for the PPVT-III (.78), Raven’s CPM (.86), and the PIAT-R RC (.95) and just below criterion for Elision (.74). In the control group, ICCs were adequate only for the PPVT-III (.95) and the PIAT-R RC (.88). Raven’s CPM (.34) and Elision (.44) ICCs for the control group were low, indicating insufficient measurement agreement. Test administration order had been randomized; however, to further examine insufficiencies in measurement agreement, control group order effects for the Raven’s and Elision were examined, post hoc. Order effects were not statistically significant.
Bivariate correlations between standard and modified versions of tests were computed to provide preliminary data regarding the nomothetic spans of instruments. Coefficients were compared, using Fisher’s r to z transformations. Test version differences were noted only in the CP group, in the associations between the Raven’s CPM and Elision, with a significant correlation noted with the standard version, r = .54, p < .01, but not with the modified version, r = .18, n.s.
Discussion
This study was conducted to examine the psychometric properties of test instruments that had been modified for use with AT to minimize oral speech and motor response demands. Initial findings in a sample with no significant impairments provide preliminary evidence that the modification of existing quadrant forced-choice response instruments to allow access via AT does not result in significant change in scores or alter the nomothetic span. However, similar modifications of Raven’s CPM, with the standard six forced-choice responses, were not fully successful. In addition, an attempt to modify an existing phonemic awareness task resulted in significantly higher scores compared with scores on the standard version. The type of AT access, whether HeadMouse with direct selection or Pressure Switch with linear scanning, did not significantly affect scores on any of the instruments.
The modification of quadrant forced-choice instruments for accessible responding via AT does not alter the psychometric properties of the instruments in children with or without CP. There are group differences, however, in the psychometric effects of modifications of Raven’s CPM. Specifically, while Raven’s CPM measurement agreement is not adequate in the control group, it is adequate in the sample with CP. The reason for this is not clear at this time, and it may be a spurious finding.
The results support the potential to modify specific-item presentation and response formats of tests that use a quadrant choice format in ways that are accessible to children with disabilities who use AT, while maintaining the psychometric properties of the standardized versions. That said, modifications of a more complex test format with a larger response option set, as well as an initial attempt to create a multiple-choice pictorial response format for a test designed for verbal responses, clearly changed the psychometric properties of instruments.
There are a number of study limitations that affect the interpretation of the findings and the future applications. The samples were relatively small with nonrepresentative levels of intellect, including high average intellect in the initial control sample and average intellect in the sample with CP. Intellect, in particular, is related to fundamental choice-making capability, including a potential for AT use and level of participation in assessment (Van Tubbergen et al., 2008). Limited AT options were examined, and all participants were found to be novice users of these types of AT who did not have very significant sensorimotor impairments. All the selected instruments required adequate vision and, therefore, were not accessible to a significant number of children with disabilities. Finally, the AT adaptations in the current study offer a model for assessment that potentially could be used in school and clinic settings at a reasonable cost and with relatively brief training; currently, however, there are legal restrictions associated with copyrights that may preclude the use of the study instruments in clinical settings.
The current findings set the stage for further study of the psychometrics of accessible test procedures in children with disabilities, including children who are only able to participate in modified accessible testing. There is some complexity, however, to this next phase in research. In the population of children with disabilities, AT access tends to be tailored to each individual child’s optimal sensorimotor capabilities; two related issues then come to the fore. First, accurate motoric responding in children with motoric impairments may paradoxically require increased attentional and other cognitive resources so that AT creates access but at a general cognitive cost. Second, it is conceivable that AT access methods differ in cognitive load though relevant findings to date are quite limited (Dropik & Reichle, 2008; Mizuko & Esser, 1991; B. Wagner & Jackson, 2006). If indeed there are such differences, there is the potential for a mismatch between what is optimal for motor versus cognitive capabilities. A critical component of future studies will be to examine the cognitive load associated specifically with the demands of the accessible response technology. In the meantime, allowing children with CP to take tests such as the four-item forced-choice tests (e.g., PIAT-R RC and PPVT-III) in a computerized format with assistance of either a HeadMouse or button pressure switch appears to hold promise for adaptation of assessment procedures that may otherwise not be accessible to many of these children. Standardizing these adaptations in collaboration with test publishers in a way that is consistent with copyright law as well as professional standards regarding test security is a goal for future development.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interests with respect to the authorship and/or publication of this article.
Funding
This work was supported by a U.S. Department of Education, Office of Special Education Programs (OSEP) Model Demonstration Project Award H234M020077, NIH R21 HD052592-01A, NIH R21 HD057344-01; U.S. Department of Education, National Institute on Disability and Rehabilitation Research Award FI H133G070044; and a grant from The Mildred Swanson Foundation.
