Abstract
The Pediatric Evaluation of Disability Inventory-Computer Adaptive Test is an alternative method for describing the adaptive function of children and youth with disabilities using a computer-administered assessment. This study evaluated the performance of the Pediatric Evaluation of Disability Inventory-Computer Adaptive Test with a national sample of children and youth with autism spectrum disorders aged 3–21 years. Parents (n = 365) completed an online survey that included demographics, the Social Communication Questionnaire, and the Pediatric Evaluation of Disability Inventory-Computer Adaptive Test Social/Cognitive, Daily Activities, and Responsibility domains. Item response theory analysis confirmed items in each domain fit a unidimensional model and few items misfit. A large number of items in the Social/Cognitive domain showed differential item functioning, indicating a unique order of item difficulty in this population in this domain. Differences in item difficulty estimates were addressed through a parameter linking (equating) process. Simulations supported the accuracy and precision of the Computer Adaptive Test. Results suggest that the Pediatric Evaluation of Disability Inventory-Computer Adaptive Test, as modified for autism spectrum disorder, is an efficient and sound assessment for this population.
Introduction
The growing number of youth with autism spectrum disorders (ASDs) underscores the importance of efficient, sensitive, and well-targeted instruments to measure outcomes for this population. Measures are needed that can document variations in capacities and limitations that may distinguish meaningful subgroups, to examine the associations among symptoms, impairments, and function, and to evaluate the effectiveness of interventions for improving long-term outcomes. The marked heterogeneity of the population argues for the importance of conducting research with a variety of measures since some may be better suited to certain research questions or clinical purposes than others.
One important focus of research in ASDs is the effectiveness of interventions to improve functional and adaptive outcomes, that is, the capacity of the person to perform daily life skills that are expected of similar-aged peers including practical skills, social skills, and conceptual skills (Schalock et al., 2010). Achievement of these skills is associated with the eventual ability of the individual with ASD to transition to adult roles, including independent living, employment, and community participation (Esbensen et al., 2010; Farley et al., 2009; Taylor and Seltzer, 2011). Although a growing body of research highlights the significant functional difficulties and poor adult outcomes experienced by individuals on the autism spectrum (Carter et al., 1998; Howlin et al., 2013; Newman et al., 2009), investigations of the relation between underlying impairments and symptomology and functional performance have yielded inconsistent findings. For example, one study of the relationship between autism symptom severity and adaptive behavior found that severity of autism symptoms was negatively correlated with overall adaptive behavior (Perry et al., 2009), while another reported that symptoms and behaviors were correlated only for children classified as having high-functioning autism (Liss et al., 2001). Findings from research exploring the relation between measures of cognitive functioning and measures of adaptive function have also varied. Adaptive behavior was predicted by intelligence quotient (IQ) levels for children with ASDs classified as “low functioning,” while language and memory predicted adaptive behavior scores for children classified as “higher functioning” (Liss et al., 2001). However, in another study, all children with ASDs had adaptive behavior skills that were significantly lower than IQ (Gabriels et al., 2007).
Finally, studies have also reported varied findings across domains of adaptive behavior, particularly whether domains other than socialization are impaired (Carter et al., 1998; Liss et al., 2001; Stone et al., 1999).
The challenges and inconsistencies encountered when attempting to understand the mechanisms underlying successful life outcomes for individuals with ASDs and the interaction between symptom severity, underlying impairments, and functional performance suggest that the field may benefit from other approaches to measuring the functional performance of everyday activities. To date, most investigations have used instruments developed within the framework of adaptive behavior in the field of intellectual disabilities (Schalock et al., 2010), in particular the Vineland Adaptive Behavior Scales (VABS-II) (Sparrow et al., 2005) and the Adaptive Behavior Assessment System (ABAS-2) (Harrison and Oakland, 2003). This conceptualization of adaptive behavior identifies two general aspects, personal independence and social responsibility, with three clusters of domains: conceptual, social, and practical. Within adaptive behavior instruments, the domains and subdomains have generally followed the categories defined by the AAIDD (Tassé et al., 2012), including communication, community use, functional academics, independent living and daily living skills, health and safety, and leisure. The items in each category address skills associated with that aspect of daily life. However, a previous content analysis of the VABS-II (Gleason and Coster, 2012) suggested that social and communication impairments, in particular, might influence ratings on items, and therefore summary scores, in a variety of subdomains not explicitly seeking to measure communication, including the VABS-II Daily Activities subdomain. Alternative approaches to defining domains, in which items measure skills that depend primarily on a single underlying ability (e.g. motor coordination, communication), might improve our ability to find patterns and relationships between underlying symptomology, functional skills, and life outcomes.
The World Health Organization (WHO) has adopted an alternative framework for describing function in daily life. The International Classification of Functioning, Disability, and Health (ICF) (WHO, 2001) and the Children and Youth version (ICF-CY) (WHO, 2008) were designed to provide a universal language for describing health, disability, and function across populations. The ICF-CY distinguishes several components of functioning and disability. “Body structures and functions” include foundational psychological and intellectual functions such as adaptability, attention and orientation, and range and regulation of emotion, which are typically impaired in children and youth with ASDs. Impairments in body structures and functions may lead to difficulties with “Activity,” which is defined as the execution of a task or skill. Difficulties in activities are represented as functional limitations and can be described in terms of capacity (ability to perform in a standardized environment) or performance (typical behavior in one’s own usual environment). The broader ICF component of “Participation” describes the person’s involvement in life situations and reflects the person’s engagement in culturally relevant social roles and settings. This framework distinguishes underlying impairment from the person’s application of their abilities, as demonstrated by functional skills and participation, and provides a guiding structure to conceptualize measurement of each component. Nine chapters define the domains of activity and participation: learning and applying knowledge; general tasks and demands; communication; mobility; self-care; domestic life; interpersonal interactions and relationships; major life areas; and community, social, and civic life. The ICF framework also incorporates the influence of contextual factors (personal and environmental) on function. 
This framework has increasingly been utilized as a conceptual framework for outcomes measurement in health and disability research for a variety of clinical populations, including youth with developmental disabilities and autism (Bonanni et al., 2009; Castro et al., 2013; Poon, 2011).
The ICF’s emphasis on universal versus condition-specific description of health and function, along with growing adoption of modern test development approaches, has stimulated research into the extent to which instruments used to measure function can be applied across clinical populations. One focus of this research has been whether the underlying structure of function (that is, the relations among different dimensions of performance in a given domain or the relative level of ability required to perform a particular set of tasks) remains the same across populations, a property known as measurement invariance (Alguren et al., 2011; Kim et al., 2013). If not, then these differences pose a potentially significant threat to valid interpretation of scores since scores from the same instrument may represent different underlying patterns of function in different groups. While measurement invariance has been investigated for the most frequently used measures of adaptive behavior, these investigations have been limited to subgroups defined by age, gender, and socio-economic status (Harrison and Oakland, 2003; Sparrow et al., 2005). Comparable investigations of measurement invariance of these instruments across clinical populations are not yet common (although see Frazier et al., 2014, for a recent example).
This article reports results from such an investigation, in which we evaluate the structural validity of a new instrument developed within an ICF framework, the Pediatric Evaluation of Disability Inventory-Computer Adaptive Test (PEDI-CAT) for children and youth with symptoms of ASDs. The PEDI-CAT offers a potential alternative method for describing functional profiles for children and youth with ASDs. However, given that it is based on a different conceptual framework than other measures of adaptive behavior, empirical evidence is needed to verify that the assessment functions as intended in youth with ASDs. It is possible that the unique profile of strengths and limitations that characterize ASDs may be associated with systematic differences that could threaten the validity of assessment scores.
Specific research questions were as follows.
(1) Do items from the PEDI-CAT Daily Activities, Social/Cognitive, and Responsibility domains cohere to represent the same three unidimensional constructs identified in the standardization sample, suggesting the performance of youth with ASDs fits the conceptual model of the instrument?
(2) Do all items in each domain demonstrate acceptable fit along a unidimensional continuum, indicating that items reflect a common underlying construct in youth with ASDs?
(3) Do any items show differential item functioning (DIF) between the original PEDI-CAT sample and this population, indicating a lack of measurement invariance in this clinical population?
(4) To what extent do scores generated by the CAT administration correlate with scores generated from the full item set?
Methods
Participants
A convenience sample of parents of children and youth aged 3–21 years was recruited through local and national service, support, and advocacy groups for children with ASDs via flyers, listserv emails, and website announcements. Eligibility was determined via email or over the phone by positive responses to the following questions: (1) Has your child been diagnosed with ASD, including autism, Asperger’s syndrome, or pervasive developmental disorder-not otherwise specified (PDD-NOS)? (2) Do you and your child currently live in the United States? (3) Is your child between the ages of 3 years, 0 months and 21 years, 11 months? (4) Are you the child’s legal guardian (even if your child is above the age of 18)? The research team used two additional screening questions to reduce the risk of enrolling respondents purposefully misrepresenting their eligibility for participation, a common risk in internet research (Kramer et al., 2014).
Instruments
The original PEDI (Haley et al., 1992) is a parent- or clinician-report measure that has been used extensively in rehabilitation and special education programs as well as in clinical research with children with a variety of disabilities including cerebral palsy (Dolva et al., 2004; Hinderer and Gupta, 1996), Down syndrome (Dolva et al., 2004), osteogenesis imperfecta (Engelbert et al., 1997), and acquired brain injury (Kothari et al., 2003). It has been translated into multiple languages, and studies have demonstrated its reliability, validity, and responsiveness with various populations (Haley et al., 2011). Recently, the PEDI was revised to increase the age range covered (up through age 20), to revise and update the content and rating scales to make them maximally applicable across the diverse population of children and youth with disabilities, and to develop a computer-based assessment. The ICF-CY (WHO, 2008) served as a guiding conceptual framework when developing the PEDI-CAT domains and corresponding rating scales.
The PEDI-CAT assesses four domains of function: Daily Activities, Social/Cognitive, Mobility, and Responsibility. The Daily Activities, Social/Cognitive, and Mobility domains of the PEDI-CAT were designed to measure the performance aspect of the ICF construct of “activity,” that is, how the child or youth performs the activities in his or her usual daily environment. Items in the Daily Activities domain assess a child’s daily living skills such as eating, dressing, and grooming activities. The Daily Activities domain also includes items related to household maintenance and the operation of electronic devices. The Social/Cognitive domain assesses a child’s ability to interact with others in a community and to manage functional cognitive activities such as counting out change. Social/Cognitive items address communication, interaction, safety, behavior, play, attention, and problem-solving. The Mobility domain examines the child’s ability to perform a variety of functional movements including walking, getting in and out of a chair, climbing stairs, or running. The Daily Activities and Mobility items also include line drawings depicting the activity assessed in each item. In order to maximize item fit to a unidimensional measurement model, items were written as much as feasible to reflect the application of a single primary underlying ability: full-body movement for Mobility, use of arms and hands for Daily Activities, and cognition for Social/Cognitive. Item descriptions were worded to minimize reference to particular modes of performance and thus credit the capabilities of children and youth with disabilities who may use alternative methods to accomplish daily tasks successfully.
The child or youth’s performance of items in the PEDI-CAT Daily Activities, Social/Cognitive, and Mobility domains is rated using a 4-point Likert scale measuring the extent to which he or she has difficulty performing each activity: “Unable,” “Hard,” “A little hard,” and “Easy” (see Supplementary Table S1 in Appendix). Parents identify the most appropriate rating by considering their child’s typical performance while using usual supports such as alternative communication devices.
The Responsibility domain assesses the extent to which a young person is managing life tasks that are important for the transition to adulthood and independent living (e.g. fixing a meal, planning and following a weekly schedule). Thus, taking responsibility for managing life tasks supports participation in life situations and reflects the “participation” construct within the ICF. This domain contains content assessing health management and literacy, citizenship, safety, and community mobility and, like the activities dimension, reflects the person’s current typical behavior in his or her usual environment. A 5-point rating scale is used to assess the shift of responsibility for a life task from parents taking all responsibility, to shared responsibility, to the young person taking all responsibility. A 5-point scale was used because the shift of responsibility has a meaningful mid-point where parent and child equally share responsibility for managing a task. The ratings for the Responsibility items do not require the young person to perform each life task independently, but instead reflect overall independence in managing the task. Independence from this perspective may include requesting specific assistance or resources as needed or directing others (e.g. personal assistant) in order to accomplish the task.
Each domain of the PEDI-CAT can be completed separately. Scores computed include norm-based T-scores (mean of 50, standard deviation (SD) of 10), criterion-referenced scores (reported on a 20–80 scale), and percentile ranges. Criterion scores are the preferred scores to detect change over time in a context of overall delay because they capture changes in performance along the overall continuum of function represented by the items.
The PEDI-CAT was standardized with a stratified nationally representative (US) sample of 2205 children and youth without disabilities and 703 children and youth with heterogeneous disabilities aged 0–21 years, including 108 with ASDs (Haley et al., 2011). Item parameters obtained from the standardization sample were then used to construct the CAT. Results from a prospective field study of the CAT showed excellent re-test reliability (⩾0.95 for all domains) and differentiation between groups with and without disabilities (Dumas et al., 2012). Parents took an average of 12 min to complete all four PEDI-CAT domains.
Prior to this study, we conducted an extensive qualitative evaluation of the appropriateness of the content and rating scales of the PEDI-CAT for children and youth with ASDs (Kramer et al., 2012). First, a series of focus groups and cognitive interviews were conducted with professionals and parents of children with ASDs to identify whether aspects of the original PEDI-CAT might need modification. We focused on the Daily Activities, Social/Cognitive, and Responsibility domains, as they are most likely to be uniquely impacted by characteristics associated with autism. The findings were used to expand the item pools and modify some aspects of instructions to give directions for rating certain behavior patterns common in ASDs (e.g. inconsistent performance; see details in Supplementary Table S1 in Appendix). The Social/Cognitive domain was expanded with 4 new items, 5 items were reinstated after being dropped from the original PEDI-CAT, and 15 items were supplemented with additional directions to guide rating decisions for children with ASDs (see Supplementary Table S2). The Responsibility domain was expanded with six new items, one reinstated item, and two items supplemented with additional directions. The Daily Activities domain was expanded by reinstating eight items dropped from the original PEDI-CAT. This modified PEDI-CAT, which includes the modified instructions and minor item revisions and additions, was used in this study and for clarity will be referred to as the PEDI-CAT (ASD). One of the advantages of item response theory (IRT)–based scales, as discussed below, is that new items can be added to an existing domain and situated along the same measurement continuum based on findings from the IRT analysis. These additions do not alter the meaning of criterion scores obtained using the original set of items; rather they provide additional items that can be used to estimate the person’s location on the functional continuum (Haley et al., 2009).
The CAT approach to assessment
A CAT (Wainer et al., 2000) employs a simple form of artificial intelligence that selects questions directly tailored to the person’s estimated ability level, shortens or lengthens the test to achieve the desired precision, scores everyone on a standard metric so that results can be compared, and displays results instantly. CAT applications require a large set of items in any one domain, and those items should consistently scale along a unidimensional continuum from low to high functional proficiency.
The IRT measurement approach is used to obtain empirical evidence about the relationships of items within a domain. Item response theory approaches require that items meet two important assumptions: unidimensionality and local independence. Confirmation that these assumptions are met (as determined through IRT analysis) means that items from one domain tap a single common dimension. That is, the performance on different items is not related except through the shared underlying ability or trait (i.e. the continuum of function in that domain of the PEDI-CAT). The data obtained from the research sample are used to locate items along the underlying dimension; thus, each item and rating are associated with a specific position on that continuum and a corresponding score. Items that are located below a particular score are likely to be achieved by a respondent with that score, and items that appear above that score represent performance that most likely is still developing. Thus, the summary score is not strictly dependent on which items were administered, but rather represents the best estimate of the person’s ability (location on the continuum of function) based on his or her performance on the items completed. This unique approach to conceptualizing the measurement of abilities was first applied to large-scale educational assessments and in recent years has been adopted as the method of choice in medical and rehabilitation fields to develop more effective and efficient outcome measures (Cella et al., 2007; Velozo et al., 2012). One important feature of IRT-derived scales is that the resulting measurement model is expected to be sample invariant. That is, the item parameters derived from the calibration sample are expected to remain invariant across other samples from the same population. 
This is a critical difference from measures developed from a Classical Test Theory (CTT) approach, in which the psychometric properties of the instrument are tied to (and reflect) the distribution of the trait or characteristic being measured in the sample used to standardize the instrument (Streiner and Norman, 2008).
Administration of the CAT begins with a global item that is selected a priori on the basis of the range of performance or ability it covers; thus, all respondents answer the same first question. Based on the response to the first item, a score and standard error are estimated, and then the computer algorithm selects the next optimal item and a response is recorded. With the administration of the next item, the score is re-estimated. The computer algorithm determines whether a pre-determined stopping rule has been satisfied based on the magnitude of the error estimate. If satisfied, the assessment of that domain ends. If not satisfied, new items are administered in an iterative fashion until the stopping rule is satisfied and the final score is obtained (see Figure 1).

Figure 1. PEDI-CAT item selection process.
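The administration loop described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the PEDI-CAT software: the scoring method (expected a posteriori estimation on a grid with a standard normal prior), the selection rule (maximum item information at the current score), and all names (`run_cat`, `se_target`, `max_items`, the item dictionary layout) are assumptions introduced here, since the text specifies only the general sequence of a fixed first item, score re-estimation, item selection, and a stopping rule based on the error estimate.

```python
import math

GRID = [g / 10.0 for g in range(-40, 41)]  # hypothetical ability grid, -4 to 4


def _cumulative(theta, a, thresholds):
    """P(rating >= k) for k = 0..K under a graded response model."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def category_probs(theta, a, thresholds):
    """Probability of each rating category (0 = lowest rating)."""
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def item_information(theta, a, thresholds):
    """Fisher information of one graded item at ability theta."""
    cum = _cumulative(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        slope = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 1e-12:
            info += slope * slope / p
    return info


def estimate(responses, items):
    """Score and standard error as posterior mean and SD (assumed method)."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # standard normal prior (assumption)
        for item_id, rating in responses.items():
            a, thresholds = items[item_id]
            w *= category_probs(theta, a, thresholds)[rating]
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(items, first_item, answer, se_target=0.3, max_items=15):
    """Administer items until the error target or the item limit is reached.

    items: {item_id: (discrimination, ascending thresholds)}
    answer: callable returning the observed rating index for an item_id
    """
    responses = {first_item: answer(first_item)}  # same first item for everyone
    theta, se = estimate(responses, items)
    while se > se_target and len(responses) < min(max_items, len(items)):
        remaining = [i for i in items if i not in responses]
        nxt = max(remaining, key=lambda i: item_information(theta, *items[i]))
        responses[nxt] = answer(nxt)
        theta, se = estimate(responses, items)  # re-estimate after each item
    return theta, se, list(responses)
```

Calling `run_cat` with a dictionary of calibrated items, the identifier of the global first item, and a function that returns the respondent's rating yields the final score, its standard error, and the ordered list of items administered.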
Previous studies, including studies of the PEDI-CAT, suggest that after 15 items a well-designed CAT can provide scores that are as precise as scores obtained using a domain’s full item set (Dumas et al., 2012). CATs have several logistical benefits over paper and pencil assessments. Reducing the number of items each respondent must complete reduces administration time and respondent burden. CATs can be administered via the Internet, at an independent computer station, or on an electronic tablet, and scores are generated instantly, thus reducing the personnel resources usually associated with assessment administration. The computer-based administration provides additional benefits such as the ability to include pop-up directions and pictures that help respondents understand the question; such features help ensure a variety of respondents interpret the assessment in a consistent manner.
CATs are well suited to meet the challenges of measuring a population with a very broad range of functional abilities such as children and youth with ASDs. The CAT process of selecting the most relevant items from a pool of possible items enables researchers and clinicians to administer the same assessment to respondents with a range of skills or impairment severity. Essentially, each respondent completes a unique set of items that are targeted to their abilities; however, their resulting scores are located on the same overall scale and are highly precise. The location of the score along the continuum of items allows researchers and clinicians to quickly and accurately identify the expected performance on each item of a person with that ability level.
Procedures
Human ethics approval was received from the University Review Board before research activities were conducted. All data collection was completed online using a secure website hosted by a private company. After eligibility was determined, parents received a link and password to access the website. Parents provided informed consent online before proceeding to the online survey. They first completed a demographic questionnaire to gather descriptive information about the children and their family context and the Social Communication Questionnaire (SCQ)-Current. The SCQ Current and Lifetime were used to describe current and past symptom severity. The SCQ (Rutter et al., 2003) is a 40-item parent report used to assess the presence of specific autistic behaviors. Higher scores indicate more severe behaviors. The SCQ has established psychometric properties for persons aged 2–40 years, and a series of studies suggests that the SCQ has acceptable sensitivity and specificity to screen for autism based on a cut-off score of 15 (Berument et al., 1999; Chandler et al., 2007; Charman et al., 2007; Witwer and Lecavalier, 2007). Since our focus was to examine how the PEDI-CAT performs with individuals ranging in functional ability and symptoms, we did not exclude respondents based on SCQ scores.
Parents next answered all items in the Social/Cognitive (68 items), Daily Activities (76 items), and Responsibility (58 items) domains. Parents could exit the survey at any time and return within a 14-day period. After completing the PEDI-CAT items, parents had the option of completing the SCQ Lifetime.
Data were downloaded from the secure website into SPSS. Duplicate respondents and respondents who did not provide data beyond initial consent were removed from the database.
Analytical procedures
The structure and unidimensionality of the domains were evaluated using confirmatory factor analysis (CFA) and several indexes of fit. Comparative Fit Index (CFI) and Tucker Lewis index (TLI) values range from 0 to 1. Values of 0.90 or higher indicate acceptable fit and values above 0.95 indicate good fit. The root mean square error of approximation (RMSEA) was also examined; values less than 0.08 indicate acceptable fit and less than 0.05 indicate good fit.
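As a small illustration, the cut-offs stated above can be written as a lookup. The function name `classify_fit` and the label "poor" for values outside the stated ranges are assumptions made for this sketch; the thresholds themselves are those given in the text.

```python
def classify_fit(cfi, tli, rmsea):
    """Label each fit index using the cut-offs stated in the text."""

    def incremental(value):
        # CFI and TLI: above 0.95 is good, 0.90 or higher is acceptable.
        if value > 0.95:
            return "good"
        if value >= 0.90:
            return "acceptable"
        return "poor"  # label assumed for values below 0.90

    def absolute(value):
        # RMSEA: below 0.05 is good, below 0.08 is acceptable.
        if value < 0.05:
            return "good"
        if value < 0.08:
            return "acceptable"
        return "poor"  # label assumed for values of 0.08 or more

    return {
        "CFI": incremental(cfi),
        "TLI": incremental(tli),
        "RMSEA": absolute(rmsea),
    }
```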
An IRT approach was used to obtain item parameter estimates for each of the domain item pools for youth with ASDs. We used a graded response IRT model to obtain item parameters that reflect both item difficulty (location along the underlying continuum) and discrimination (the extent to which each item is sensitive to differences in respondent abilities). These item parameters provide the information for the computer algorithm driving the CAT. In a graded response model analysis, data must meet model assumptions of unidimensionality and fit. Item parameters, estimated using PARSCALE, were individually examined for fit to the construct. Likelihood ratio chi-square statistics were used to examine item fit parameters across the distribution of the construct; a p-value less than 0.05 indicated item misfit (Haley et al., 2011).
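To make the roles of the two kinds of item parameter concrete, the following sketch computes rating-category probabilities under a standard logistic graded response model. It is not drawn from PARSCALE; the function name, the parameter values in the usage line, and the coding of ratings from 0 (lowest) upward are illustrative assumptions.

```python
import math


def grm_category_probs(theta, discrimination, thresholds):
    """Probability of each rating category for one item.

    theta: respondent location on the underlying continuum
    discrimination: how sharply the item separates nearby ability levels
    thresholds: ascending difficulty locations between adjacent ratings
    """
    # Cumulative curves: probability of responding in category k or higher.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)
    # Category probability is the difference between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 4-category item with thresholds centred on the respondent.
probs = grm_category_probs(theta=0.0, discrimination=1.5, thresholds=[-1.0, 0.0, 1.0])
```

Raising the discrimination value concentrates probability on the ratings nearest the respondent's location, while shifting the thresholds upward makes the higher ratings less likely at any given ability level.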
Next, we used logistic regression to examine DIF to determine whether the item parameters derived from the ASD sample were significantly different from those derived from the standardization sample. DIF occurs when the difficulty of one item relative to another changes for respondents with different characteristics, although the item still fits along the underlying continuum. If DIF is present, the valid interpretation of scores is threatened due to shifts in locations of the items along the continuum (see Figure 2). The dependent variable in the regression was the response to an item, and the independent variables were participants’ total score, group membership (ASD sample or standardization), and an interaction term between the total score and group membership. The standardization sample was a subset of 525 youth without disabilities whose caregivers answered the full set of PEDI items in a particular domain during the original standardization.

Figure 2. An illustration of differential item functioning (DIF).
The analytic approach was to successively add total score, group membership, and interaction term into the model in three steps, and the procedure was repeated for each item. The test statistic was the −2 log likelihood difference between models, which is distributed as a chi-square with two degrees of freedom, and the effect size was the R² change between models. The following criterion was set for the DIF analysis: if the likelihood difference test was statistically significant and the R² change was greater than 0.035 for an item, that item was considered to exhibit DIF. If substantial DIF was found between samples, item parameters from the ASD sample would need to be adjusted and then linked with those in the existing PEDI-CAT so that domain criterion scores obtained from the two assessment versions remained comparable (T-scores reflect only the model derived from the normative sample and are not adjusted) (Embretson and Reise, 2000; Haley et al., 2009).
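The decision rule can be illustrated with a short sketch that takes the log likelihoods and R² values of two already-fitted models as inputs. Several points are assumptions made here rather than details reported in the text: that the two-degree-of-freedom comparison is between the total-score-only model and the model that also contains group membership and the interaction term, that significance is judged at 0.05, and the function and argument names. The p-value uses the fact that the chi-square survival function with two degrees of freedom equals exp(−x/2).

```python
import math


def likelihood_ratio_2df(loglik_reduced, loglik_full):
    """-2 log likelihood difference and its p-value on 2 degrees of freedom."""
    statistic = max(-2.0 * (loglik_reduced - loglik_full), 0.0)
    p_value = math.exp(-statistic / 2.0)  # exact for chi-square with 2 df
    return statistic, p_value


def shows_dif(loglik_reduced, loglik_full, r2_reduced, r2_full,
              alpha=0.05, r2_cutoff=0.035):
    """Flag an item when the test is significant AND the R-squared change is large.

    alpha is an assumed significance level; r2_cutoff is the 0.035
    effect-size criterion stated in the text.
    """
    _, p_value = likelihood_ratio_2df(loglik_reduced, loglik_full)
    r2_change = r2_full - r2_reduced
    return p_value < alpha and r2_change > r2_cutoff
```

Requiring both conditions means that a statistically significant difference with a negligible effect size, which is common in large samples, does not by itself flag an item.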
To address the fourth research question, simulations were conducted to examine the accuracy and precision of scores obtained from the PEDI-CAT (ASD) software relative to scores obtained from administration of all items in the domain. Simulations were completed to obtain estimated CAT scores for each respondent for each domain. As items were selected for administration in the simulation, responses were taken from the actual data. For these simulations, we established specific stop rules of 5, 10, and 15 items for each domain. This procedure produced one simulated record of responses for each respondent for a 5-, 10-, and 15-item CAT version. Results were evaluated based on strength of correlations between full item set scores and scores obtained from the three CATs. We also examined the percent of subjects with individual score reliability at two levels of confidence: r > 0.90 and r > 0.95.
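The two evaluation summaries can be sketched as follows. The Pearson correlation is standard; the conversion from a respondent's standard error to an individual score reliability (1 − SE², which presumes scores on a metric with unit variance) is an assumption made for this sketch, as the text does not state how individual reliability was computed. Function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percent_reliable(standard_errors, threshold):
    """Percent of respondents whose score reliability exceeds the threshold.

    Assumes reliability = 1 - SE^2 on a unit-variance score metric.
    """
    reliabilities = [1.0 - se ** 2 for se in standard_errors]
    hits = sum(1 for r in reliabilities if r > threshold)
    return 100.0 * hits / len(reliabilities)
```

In a simulation of this kind, `pearson_r` would be applied once per CAT length (5, 10, and 15 items) against the full item set scores, and `percent_reliable` once per length at each of the 0.90 and 0.95 thresholds.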
Results
Children and youth in the sample (n = 365) ranged in age from 3 years to 21 years (M = 11.9 years, SD = 4.67 years). The majority of respondents were mothers (93.7%) reporting about their male children (83.3%). Parents reported their child’s diagnosis as autism (52.1%), Asperger’s (25.5%), and PDD (22.2%). In all, 21.9% of parents indicated their child had a current or past diagnosis of intellectual disability. Table 1 contains additional demographic information about this sample.
Table 1. Sample demographics (n = 365).
IQ: intelligence quotient; SCQ: Social Communication Questionnaire.
A total of 33 respondents (9%) preferred not to report their income.
Optional: 300 respondents completed the SCQ Lifetime.
In all, 27% of respondents did not provide this information.
The number of responses available for each domain varied: Social/Cognitive (n = 365), Daily Activities (n = 359), and Responsibility (n = 356). Two respondents who were later determined not to meet inclusion criteria were included in the Social/Cognitive analysis and one was included in the Daily Activities and Responsibility analyses; analyses remain as reported because IRT methods are robust to such small variations.
Evaluation of unidimensionality and item fit
All three PEDI-CAT (ASD) domains showed good evidence of unidimensionality based on the CFA, supporting the appropriateness of the PEDI-CAT’s conceptual model for this population. The CFI, TLI, and RMSEA all indicated good fit to a unidimensional model for the Daily Activities and Responsibility domains and acceptable fit for the Social/Cognitive domain (Table 2).
Confirmatory factor analysis of PEDI-CAT (ASD) domains.
CFI: Comparative Fit Index; TLI: Tucker Lewis index; RMSEA: root mean square error of approximation; PEDI-CAT: Pediatric Evaluation of Disability Inventory-Computer Adaptive Test; ASD: autism spectrum disorder.
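For readers less familiar with the fit indices reported for the CFA, the following sketch shows their standard textbook definitions in terms of the model and baseline chi-square statistics. It is an illustration only: the function name and inputs are hypothetical, and software that fits categorical-indicator models typically uses scaled test statistics (and sometimes N rather than N − 1 in the RMSEA), so values will not necessarily match program output.

```python
import math


def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """Return (CFI, TLI, RMSEA) from model and baseline chi-square values.

    chi2_base/df_base refer to the baseline (independence) model;
    n is the sample size. Textbook formulas, not a specific program's output.
    """
    d_model = max(chi2_model - df_model, 0.0)   # model noncentrality estimate
    d_base = max(chi2_base - df_base, 0.0)      # baseline noncentrality estimate
    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    return cfi, tli, rmsea
```

Higher CFI and TLI values and lower RMSEA values indicate closer fit to the unidimensional model.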
The domain items also showed good fit to their respective underlying continuum. Only four items across the three domains had poor fit (see supplementary Table S3 in the Appendix). In the Social/Cognitive domain, all revised items and all new and reinstated items except one had acceptable fit in this sample. In the Daily Activities and Responsibility domains, all new, revised, and reinstated items had acceptable fit. We used a two-step approach to determine whether any of the misfitting items should be dropped: first, we examined item performance in the standardization sample and second, we examined the extent to which each item assessed a task of unique difficulty on the underlying continuum. Three of the four items had acceptable fit in the standardization sample and assessed unique tasks. In order to maximize similarity in the item sets between the two versions of the instrument, we decided to retain these items. Only one item from the Social/Cognitive scale was removed because it also had poor fit in the earlier analyses.
Evaluation of DIF
Four Responsibility items, 4 Daily Activities items, and 32 Social/Cognitive items demonstrated DIF between the sample without disabilities and the sample with ASDs. One Daily Activities item with poor fit also had DIF. Two Daily Activities items and one Responsibility item with DIF pertained to gender-specific self-care activities. Items from the Social/Cognitive domain with DIF represented all four content areas addressed within the domain (communication, interaction, everyday cognition, and self-management); however, the area of everyday cognition had the highest percentage of items with DIF (64%).
Given the very limited DIF in the Daily Activities and Responsibility domains, no modifications to the original CAT parameters were deemed necessary. However, the large number of items with DIF in the Social/Cognitive domain required an equating, or linking, procedure to ensure that criterion scores for youth with ASDs would remain a valid representation of youths’ functional performance while still being comparable to the original PEDI-CAT Social/Cognitive criterion scores. We followed recommended procedures for doing so (Embretson and Reise, 2000). First, we obtained item parameter estimates using the responses from the ASD sample. These unique item parameters reflect the relative difficulty of the items to one another along the underlying unidimensional continuum for youth with ASDs. Next, we statistically linked the ASD item parameters to the original PEDI-CAT scale (Embretson and Reise, 2000). The result of this linking is that criterion scores obtained using the ASD item parameters can be directly compared to criterion scores obtained from the original PEDI-CAT. Thus, for example, the overall estimated ability of a youth with ASD who obtains an ASD-CAT criterion score of 65 is the same as that of a youth with typical development who obtains the same score on the PEDI-CAT. We used a similar equating technique for several items with missing parameter estimates for the highest rating scale thresholds: one Social/Cognitive item, three Daily Activities items, and two Responsibility items. This was a conservative approach that assumed DIF in the absence of the ability to empirically evaluate DIF.
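The general idea of placing newly calibrated item parameters on an existing scale can be sketched as follows. The specific linking method used in this study is not detailed here, so the example substitutes one common technique, mean/sigma linking, and assumes that a set of anchor items (for instance, items without DIF) is available. The function names, the choice of anchors, and the dichotomous parameterization are illustrative assumptions.

```python
import numpy as np


def mean_sigma_link(b_new, b_ref, anchor):
    """Estimate linking constants A and B from anchor-item difficulties.

    b_new:  difficulties from the new calibration (e.g., the ASD sample)
    b_ref:  difficulties of the same items on the reference scale
    anchor: indices of the items assumed to function equivalently in both groups
    """
    A = np.std(b_ref[anchor], ddof=1) / np.std(b_new[anchor], ddof=1)
    B = np.mean(b_ref[anchor]) - A * np.mean(b_new[anchor])
    return float(A), float(B)


def rescale(a_new, b_new, A, B):
    """Express new-calibration parameters on the reference scale."""
    return a_new / A, A * b_new + B
```

After rescaling, ability estimates computed with the transformed parameters are expressed in the reference metric, which is what allows a given criterion score to carry the same meaning across the two versions of the instrument.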
Real-data simulations were then conducted by utilizing the original or linked item estimates as described above in the CAT software. Results showed high correlations (intraclass correlation coefficient (ICC) >0.95) between the scores obtained from the full item set and scores from all three CAT options (Table 3). These results are highly similar to results from the original PEDI-CAT simulations (Haley et al., 2011). Examination of individual score reliability indicated that substantially more subjects had individual score reliabilities at the designated levels with the 10-item and 15-item CATs compared to the 5-item CAT (Table 4).
Agreement (intraclass correlation coefficient, 95% CI) between scores from 5, 10, and 15-item CAT and full item bank for PEDI-CAT (ASD) domains.
CI: confidence interval; PEDI-CAT: Pediatric Evaluation of Disability Inventory-Computer Adaptive Test; ASD: autism spectrum disorder.
Percent of subjects with score reliability >0.90 and >0.95 for PEDI-CAT (ASD) domains.
PEDI-CAT: Pediatric Evaluation of Disability Inventory-Computer Adaptive Test.
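The percentage of subjects reaching a given individual score reliability can be computed from each respondent's standard error of measurement. The sketch below assumes the common conversion reliability = 1 − SE²/σ², where σ² is the variance of the ability scale (set to 1 by default); whether this is the exact conversion used to produce Table 4 is an assumption, and the function name is hypothetical.

```python
import numpy as np


def percent_reliable(se, thresholds=(0.90, 0.95), theta_var=1.0):
    """Percent of respondents whose individual score reliability exceeds each threshold.

    se:        per-respondent standard errors of the CAT score
    theta_var: variance of the ability scale (1.0 is an assumption)
    """
    reliability = 1.0 - np.asarray(se, dtype=float) ** 2 / theta_var
    return {t: float(100.0 * np.mean(reliability > t)) for t in thresholds}
```

Applied to the standard errors from the 5-, 10-, and 15-item simulations, a function of this kind would give one pair of percentages per CAT length and domain.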
Discussion
This study provides evidence that criterion scores of children and youth with ASDs obtained from administering the PEDI-CAT (ASD) can be compared to those obtained from other groups assessed with the standard PEDI-CAT. That is, domain criterion scores on the PEDI-CAT and PEDI-CAT (ASD) represent the same degree of function in that domain. Good unidimensionality for all three domain item sets, as indicated by CFA results and item fit analyses, suggests that the structure of these scales is appropriate to assess children and youth with a range of autism symptoms who are heterogeneous in age and function. The simulation results indicate that the 15-item PEDI-CAT (ASD) yields a criterion score that is statistically equivalent to administering the full item set for each domain. These results provide evidence that the PEDI-CAT (ASD) can provide an efficient measure of the functional performance of children and youth with ASD aged 3–21 years.
The items in the Daily Activities and Responsibility domains showed little DIF. These findings suggest that the order of skill acquisition and taking on responsibility for life tasks, as measured by the PEDI-CAT items, is not substantially different for children with heterogeneous ASDs. Results from a sample of 108 children and youth with ASD included in the development of the original PEDI-CAT found that they performed significantly lower than children without disabilities at the reference ages of 10 and 15 years, but not at 5 years (Kao et al., 2012). Other studies have reported that children with ASDs may acquire skills required to complete daily functional tasks at a slower rate than children without disabilities (Fisch et al., 2002; Liss et al., 2001). The lack of DIF in these domains in this study’s larger sample provides additional support for the idea that children with ASDs may present with delays but learn to perform increasingly complex daily activities and manage life responsibilities in approximately the same sequence as children and youth without disabilities. These findings provide additional information regarding the development of functional skills in this population and demonstrate the potential usefulness of the PEDI-CAT (ASD) for pursuing related lines of inquiry.
Conversely, the high number of items with DIF from the Social/Cognitive domain suggests that certain skills are uniquely easier or harder for children and youth with ASDs to achieve compared to youth without disabilities at the same ability level. This finding was partially anticipated given the communication, interaction, and meta-cognitive difficulties that are associated with autism. DIF can arise from several factors, for example, associated impairments that present unique challenges in certain types of activities. Further research is needed to determine what factors may account for the differences found in this study. It is important to note that although the difficulty parameters for the items with DIF suggest a unique influence, these items nevertheless still fit along the same underlying domain continuum, as indicated by item fit and CFA results. In other words, the items could all be located on the same continuum; however, their location on the continuum—that is, their relative order of difficulty—was consistently different from that of the standardization sample.
These results should be considered in the context of several limitations. For feasibility reasons, in order to obtain the large sample size required for IRT-based item calibration, we used a convenience sample and relied on parent report measures of cognitive level and symptoms of ASDs. Although participants were diverse in geographic location, they were mostly mothers, predominantly Caucasian, and with moderate to high family incomes. However, the resulting item parameters were compared to those from the nationally representative normative sample and were not substantially different for two of the three domains examined and for just under half of the items in the Social/Cognitive domain. Nevertheless, future studies should examine whether items would function in a similar manner for children with ASDs with other cultural backgrounds and family contexts and should examine the sensitivity and precision of the PEDI-CAT with children and youth with ASDs with more severe impairments.
This sample also included children and youth with SCQ Current scores that were below the published cut score of 15, and we did not have an independent verification of ASD diagnosis. However, this study was not designed to norm the instrument or to create a diagnostic profile, but rather to test the validity of the IRT measurement model of the original PEDI-CAT for the population of children and youth with symptoms of ASDs. For that purpose, heterogeneity in age, a broad range in function, and potential variation in patterns of function are desirable so that results are broadly applicable. The IRT assumption of sample independence allows us to apply our findings to youth with a similar range of symptomologies as captured by the SCQ.
This article examined an important question related to the construct validity of the PEDI-CAT for use with the ASD population. For instruments developed using Modern Test Theory approaches such as IRT, this analysis is a required first step before proceeding to traditional psychometric analyses (Velozo et al., 2012). A subsequent study investigating the properties of the PEDI-CAT (ASD) examines whether the high re-test reliability of the original PEDI-CAT is replicated for the PEDI-CAT (ASD) and the relation of PEDI-CAT (ASD) scores to a well-established measure of adaptive behavior (VABS-II). These results, as well as parents’ responses to the novel CAT format, will be reported in a separate paper.
Conclusion
This study provides initial support for the structural validity of the PEDI-CAT (ASD) for children and youth with ASDs. The criterion scores generated for the Daily Activities, Responsibility and equated (linked) Social/Cognitive domains take into account commonalities and differences in item parameters between children and youth with ASDs and children without disabilities. As a result, criterion scores account for the unique characteristics of children with autism symptoms while remaining comparable to scores generated from the original PEDI-CAT. The findings support the potential use of the PEDI-CAT (ASD) in practice and research.
Footnotes
Funding
This research was supported through a grant from the National Institutes of Health (NIH)/National Institute of Child Health and Human Development (NICHD) R21HD065281 to Boston University. Data featured in this manuscript are available through the NIH-supported National Database for Autism Research (NDAR). Collection ID 1880: http://ndar.nih.gov/data_from_labs.html?id=1880&showSingle=true
