Abstract

Spaulding, T. J., Szulga, M. S., & Figueroa, C. (2012). Using norm-referenced tests to determine severity of language impairment in children: Disconnect between U.S. policy makers and test developers. Language, Speech, and Hearing Services in Schools, 43, 176–190.
School districts typically require that speech-language pathologists (SLPs) use standardized norm-referenced tests to qualify children for services. SLPs may also be expected to classify children with language impairment into different severity categories as a way of determining the frequency and intensity of services a child should receive.
Researchers investigating the relationship between the degree of language skill in children with language impairment and their topics of investigation may use children’s performance on norm-referenced tests as a prime indicator of impairment severity. The goal of research typically is to describe similarities and differences among cohorts of children. In contrast, the goal of a clinical assessment is to obtain a comprehensive picture of the child’s linguistic skills and the impact that linguistic deficits may have on the child’s functioning in everyday environments. Therefore, to determine the severity of a child’s language impairment, clinicians may supplement norm-referenced tests with a more in-depth analysis of the child’s needs.
Certain assumptions are made when using norm-referenced tests to determine the severity of a child’s language impairment. One assumption is that the lower the children score on these tests, the greater their severity of impairment. Another is that the tests administered accurately distinguish among varying degrees of linguistic impairment. However, there is no evidence to support these assumptions. Norm-referenced tests typically are not developed to assess fine distinctions in language skill. Furthermore, severity ratings for children with language impairment have been found to change on different versions of the same test (Ballantyne, Spilkin, & Trauner, 2007). For example, children with language impairment were rated as more severely impaired on the older CELF-R (Clinical Evaluation of Language Fundamentals; Semel, Wiig, & Secord, 1995) than on the newer CELF-III (Semel, Wiig, & Secord, 2003). Although this difference may be partially attributable to the renorming of the test or to other changes in the normative population, it does not change the fact that children known to have a language impairment were still rated less severely on the newer edition of this test. In fact, some children with language impairment were rated as exhibiting normal language proficiency on the CELF-III when using the cutoff points recommended in the CELF-III manual. The lack of consistency in impairment severity determinations between these two tests, together with the normal language proficiency characterization assigned to some children with language impairment on one version, suggests that it is unlikely that all child language tests are useful for differentiating among levels of language impairment severity.
If severity determinations are used to make clinical decisions, then the usefulness of the severity categories depends on the accuracy with which they capture a child’s degree of language difficulty. If clinicians are required to use norm-referenced tests for this purpose, then the tests must be developed to be sensitive to different degrees of impairment and this information should be explicitly stated in the test manuals. A child’s norm-referenced score on a language test is an estimate of how that child’s language skill, in the domain(s) of language assessed, compares with that of his or her peers. Although knowing how a child’s performance compares with that of typically developing children may help to inform a clinician as to whether or not the child’s language skills are impaired, this type of information does not permit identification of the severity of the language deficit. If the purpose of administering a child language test is to gather information to determine the severity of a language impairment, rather than to determine if an impairment exists, a different reference population is required for comparison. This reference population would be children with language impairment. Comparing the performance of a child with language impairment to the performance of children with different severities of impairment would help a clinician to determine how similar the child is to children with different degrees of impairment.
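The effect of the reference population can be seen in a small worked example. The Python sketch below standardizes one raw score against two reference groups: a general-population sample and a sample of children with language impairment. The raw score, the means, and the standard deviations are all invented for illustration and do not come from any real test.

```python
def z_score(raw, mean, sd):
    """Standardize a raw score against a reference group's mean and SD."""
    return (raw - mean) / sd


# Invented reference statistics (hypothetical, not from any real test).
general_population = {"mean": 50.0, "sd": 10.0}
language_impaired = {"mean": 30.0, "sd": 8.0}

raw = 34.0  # invented raw score for one child

z_general = z_score(raw, **general_population)   # (34 - 50) / 10
z_impaired = z_score(raw, **language_impaired)   # (34 - 30) / 8

print(f"vs. general population: z = {z_general:+.2f}")
print(f"vs. children with language impairment: z = {z_impaired:+.2f}")
```

Under these made-up numbers, the child falls well below typically developing peers but slightly above the average of the impaired reference group, which is information that the first comparison alone cannot provide.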
Severity designations may have important implications for prognosis and service delivery. Children whose language impairments are designated as severe will likely be given a more guarded prognosis than children who exhibit a more mild degree of language impairment. A child with a severe language impairment may be offered more intervention than one whose language impairment is deemed to be less severe (Barker, Baldes, Jenkinson, & Wilson, 1982).
If clinicians use norm-referenced tests, then how do they map children’s scores on these assessments to severity categories? One way of doing this is to use cutoff points. A cutoff point is the minimum score required for a given performance level. Research in the area of child language has investigated the use of cutoff points on norm-referenced tests for differentiating children with impairments from typically developing children (e.g., Greenslade, Plante, & Vance, 2009; Gutiérrez-Clellen & Simon-Cereijido, 2007). However, once children are identified as having a language impairment, clinicians may also use cutoff points to differentiate among severity levels.
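As a rough illustration of how cutoff points map a score to a severity category, the following Python sketch assumes a standard-score scale with a mean of 100 and a standard deviation of 15, and places hypothetical cutoffs at 1, 1.5, and 2 standard deviations below the mean. The cutoff values, the category labels, and the function name `severity_category` are illustrative assumptions; they are not taken from any state guideline or test manual.

```python
# Hypothetical cutoffs as (minimum standard score, label), highest first.
# Illustrative values only; not drawn from any guideline or test manual.
HYPOTHETICAL_CUTOFFS = [
    (85.0, "within normal limits"),  # at or above -1 SD
    (77.5, "mild"),                  # at or above -1.5 SD
    (70.0, "moderate"),              # at or above -2 SD
]
LOWEST_LABEL = "severe"              # any score below the last cutoff


def severity_category(standard_score, cutoffs=HYPOTHETICAL_CUTOFFS, lowest=LOWEST_LABEL):
    """Return the label of the first cutoff that the score meets or exceeds."""
    for minimum, label in cutoffs:
        if standard_score >= minimum:
            return label
    return lowest


for score in (92, 80, 72, 65):
    print(score, severity_category(score))
```

Because each cutoff is a minimum, a score exactly at a boundary receives the less severe label in this sketch; a real guideline would need to state which side of the boundary such a score falls on.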
The Study
This study had two purposes:
To determine whether state education departments provide criteria for determining the severity of impairment in children with language difficulties and, in particular, whether those criteria involve the use of norm-referenced tests.
To investigate whether the purpose and characteristics of the tests support their use for identifying language impairment severity in children.
State Guidelines
The authors obtained state education departments’ guidelines specifying procedures for determining the severity of language impairment in children from a comprehensive search of state education department websites. Each state’s education department was contacted for one of two purposes.
For states in which guidelines were available online, the authors confirmed by phone that the posted guidelines were up to date.
For states without guidelines posted online, the state departments were contacted to confirm that written guidelines for clinical reference were not available.
For those states that did have written, published guidelines that addressed language severity, information was obtained regarding the number of levels of severity they used for describing the degree of language impairment, the labels for each severity level, and their operational definitions. In addition, data regarding the use of norm-referenced tests in this process were obtained. If norm-referenced test performance was included in making severity decisions, the authors asked if it was the sole factor in making severity determinations. If performance on norm-referenced tests was used to determine or to assist in determining the severity of a child’s language impairment, information was collected on what cutoff points each state used for differentiating the severity levels and whether states provided information on which norm-referenced tests were appropriate for this purpose. In addition, information was obtained as to whether state guidelines that identified norm-referenced tests as tools for making severity decisions provided specific procedures for determining the severity of impairment when more than one norm-referenced test was administered. Finally, information was obtained regarding how severity level determinations were used with reference to eligibility, treatment frequency and duration, and treatment priority decisions.
Test Review
The authors reviewed the latest editions of 45 norm-referenced test manuals of child language for use with school-age children between the ages of 5 and 17 years. Criterion-referenced tests and screenings were excluded from this review.
From the review, the authors determined:
Whether or not the test manuals indicated that the tests could be used for the purpose of identifying children’s language impairment severity.
Whether children with different severities of language impairment were included in the standardization sample.
Whether the manuals provided information to convert test performance to a severity category, the terms and operational definitions for the severity ratings, and the boundary cutoff points between the severity categories. In addition, they reviewed the manuals to identify how the specified cutoff points were derived, that is, whether discriminant analyses differentiating the severity categories were conducted and reported in the manual, so that users could judge how accurately the test differentiated the severity categories at the specified cutoff points.
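One way to picture what such supporting evidence would report is the agreement between the categories assigned by a cutoff and categories established independently. The Python sketch below is not a discriminant analysis; it is a simpler check that counts how often a single cutoff places children in the same group as a reference classification. The scores, the reference labels, the cutoff of 70, and the function name `cutoff_agreement` are all invented for illustration.

```python
def cutoff_agreement(records, cutoff, below_label="severe", above_label="moderate"):
    """Proportion of children whose cutoff-based label matches the reference label.

    records: list of (standard_score, reference_label) pairs.
    """
    if not records:
        raise ValueError("records must not be empty")
    matches = 0
    for score, reference in records:
        # Scores below the cutoff get the more severe label.
        assigned = below_label if score < cutoff else above_label
        if assigned == reference:
            matches += 1
    return matches / len(records)


# Invented sample: scores paired with an independent severity judgment.
sample = [
    (62, "severe"), (66, "severe"), (71, "severe"), (68, "moderate"),
    (73, "moderate"), (76, "moderate"), (69, "severe"), (74, "moderate"),
]

accuracy = cutoff_agreement(sample, cutoff=70)
print(f"agreement at cutoff 70: {accuracy:.2f}")
```

A manual that reported figures of this kind for each boundary would let a clinician see how often the published cutoff misplaces children on either side of it.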
Results
State Guidelines
Eight of 50 state education departments have adopted specific criteria for clinicians to use to determine the severity of children’s language impairment (Arkansas, Colorado, Illinois, Kentucky, Maine, North Dakota, Tennessee, and Virginia). All of these states included information on how to use norm-referenced tests to determine or to assist in determining children’s degree of language difficulty.
Six of these eight states indicated the importance of using alternative assessments, such as language sampling, observation, and checklists, to assist in determining the severity of impairment. One state specified that such alternative assessments could only be used when the validity of the norm-referenced test performance was in question.
One state specified that two or more diagnostic procedures/standardized tests needed to be employed to determine the severity of impairment but did not indicate what other diagnostic procedures aside from standardized tests were acceptable.
Each of the eight states provided cutoff-point criteria based on norm-referenced test performance to determine the severity of language impairment, but the cutoff points were not consistent across all states. Hence, a child’s eligibility for services could depend on the child’s state of residence.
Five of the eight states provided a clear weighting procedure for determining the relative contributions of norm-referenced tests and other informal methods of assessments to a severity determination.
With the exception of North Dakota, the guidelines provided by state education departments indicated specific boundary cutoff points for clinicians to employ without indicating the tests for which these boundaries are appropriate.
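The practical consequence of inconsistent cutoff points across states can be sketched with two sets of guideline boundaries applied to the same score. In the Python example below, "State A" and "State B" are placeholders rather than real states, and every cutoff value and label is invented; the source does not report the actual state boundaries in this summary.

```python
# Two invented sets of guideline cutoffs (minimum standard score per label).
# "State A" and "State B" are placeholders, not real states.
GUIDELINES = {
    "State A": [(85, "not eligible"), (78, "mild"), (70, "moderate")],
    "State B": [(81, "not eligible"), (74, "mild"), (66, "moderate")],
}


def classify(score, cutoffs, lowest="severe"):
    """Return the first label whose minimum score is met; otherwise `lowest`."""
    for minimum, label in cutoffs:
        if score >= minimum:
            return label
    return lowest


def compare(score):
    """Classify one score under every guideline set."""
    return {state: classify(score, cutoffs) for state, cutoffs in GUIDELINES.items()}


result = compare(82)
print(result)
```

With these invented boundaries, a standard score of 82 is labeled as a mild impairment under one set of guidelines and as not eligible under the other, even though the child's performance is identical.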
Test Review
Only the manual for the Test of Word Knowledge (Wiig & Secord, 1992) indicated that it could be used for the purpose of determining the severity of language impairment in children. However, no information was provided on how to calculate or describe the degree of language impairment.
Eleven test manuals provided tables to convert standard scores or scaled scores to language severity categories, although no manual indicated that this was a purpose of the test.
Only three test manuals used severity category terminology that aligns with those used by the eight state education departments. None of the test manuals that used severity categories provided a data-driven method for determining the severity of language impairment. In other words, no test manual provided statistical analyses to indicate how the severity categories and cutoff points that they provided for determining degree of impairment were empirically derived. Both cutoff points and category labels tended to vary based on which companies published the tests rather than on how children with language impairment performed on the tests.
Discussion
If SLPs are going to use norm-referenced tests to determine a child’s degree of language impairment, then the tests selected for use should be designed for that purpose. To be valid for identifying the severity of language impairment in children, the test should be standardized on a representative sample of the population for which it is intended, specifically, children with language impairment. Tests developed with a reference sample from the general population will likely have an insufficient number of items that assess the lower skill levels, which are necessary to differentiate the severity of language impairment. Conversely, tests developed to determine the severity of impairment by targeting those lower skill levels would be poor at determining whether an impairment exists because children with typical language would likely perform at or close to ceiling. This would result in less discrepancy between how typically developing children and children with language impairment perform on the assessment, thus decreasing the diagnostic accuracy of the assessment tool. Therefore, a test that is developed to be sensitive to identifying the presence of language impairment and one developed to be sensitive to differentiating the severity of language impairment would contain very different test items at different levels of difficulty.
SLPs must determine whether a norm-referenced test is useful for a particular clinical purpose. If the purpose is to determine the severity of impairment, it is important to make sure the norming sample aligns with this intended purpose. Tests designed to assess language severity should be sufficiently sensitive to differences in language ability at the difficulty level at which impaired children perform on the test, or in the case of severity categories, between the severity boundaries. If severity categories are used, discriminant analyses supporting the accuracy with which cutoff points identify the severity of impairment need to be included. Without this information, there is no indication that the severity ratings assigned to a child using these assessments are accurate representations of a child’s degree of language impairment. It is important to note, however, that even a well-constructed, norm-referenced test designed for determining language severity does not provide a holistic picture of a child’s degree of language impairment for several reasons:
Given the heterogeneity of developmental language impairment, clinical disorders manifest with relative strengths and weaknesses in different areas of language, many of which cannot be captured sufficiently by performance on norm-referenced tests.
Norm-referenced test performance is a static measure of current language functioning and does not represent a child’s potential for learning (Gillam & McFadden, 1994). Modifiability of a child’s language deficits would be an important factor to consider when making a severity determination.
Using standardized test performance and cutoff-point criteria for determining the severity of impairment corresponds to a medical view of disability. The more important factor for determining impairment severity should be the impact that the language disability has on the child’s functioning.
Children with language impairment exhibit interindividual and intraindividual variations in their language skills. This makes it challenging for SLPs to determine the severity of their language impairment. Ultimately, school-based clinicians must be aware of the child’s language skills, his or her potential for modifiability, the curricular and social demands, and the interaction among the four to make a severity decision. Norm-referenced test performance is insufficient for this purpose. If intervention decisions are based on severity determinations, then accurate measurement of language impairment severity is essential to sound clinical practice. Using tests for a purpose other than that for which they are intended is not sound practice. Although norm-referenced tests could potentially be developed to be more sensitive to differences in the severity of language impairment, the overwhelming majority currently indicate that this is not their intention.
This article did not address the appropriateness of using severity classifications to determine frequency or intensity of services. School districts are likely to provide the most services to those students with the highest severity ratings. Some of these students, however, may have low modifiability even with intense services. In contrast, students with lower severity ratings may be more modifiable, and would make substantial gains with more intense services. Hence, severity ratings should not provide the only basis for making decisions regarding frequency and intensity of services.
