Abstract
The Katzenberger Hebrew Language Assessment for Preschool Children (henceforth: the KHLA) is the first comprehensive, standardized language assessment tool developed in Hebrew specifically for older preschoolers (4;0–5;11 years). The KHLA is a norm-referenced, Hebrew-specific assessment, based on well-established psycholinguistic principles as well as on established knowledge of normal language development in the preschool years. The main goal of the study is to evaluate the KHLA as a tool for identifying language-impaired Hebrew-speaking preschoolers, that is, to find out whether the test distinguishes between children with typically developing language (TDL) and language-impaired children. The KHLA is applied in order to characterize the language skills of Hebrew-speaking children with specific language impairment (SLI). The tasks included in the assessment target areas of language skill that the literature considers sensitive and thus appropriate for assessing children with SLI. Participants included 454 (383 TDL and 71 SLI) mid–high SES, monolingual native speakers of Hebrew, aged 4;0–5;11 years. The assessment included six subtests (with a total of 171 items): Auditory Processing, Lexicon, Grammar, Phonological Awareness, Semantic Categorization, and Narration of Picture Series. The study focuses on the psychometric aspect of the test. The KHLA was found useful for distinguishing between TDL and SLI when identification is based on the total Z-score, or at least two of the subtest-specific Z-scores, falling below a cutoff point of −1.25 SD. The results provide a ranking order of the subtests for assessment: Grammar, Auditory Processing, Semantic Categorization, Narration of Picture Series/Lexicon, and Phonological Awareness. The main clinical implications of this study are to adopt −1.25 SD as the optimal cutoff point for diagnosing children with SLI and to apply the entire test for assessment. When a clinician decides to assess only two or three subtests, it is recommended that the ranking order described in the study be followed.
Language assessment in general and in preschool children in particular is performed by applying a variety of tools. Some tools are criterion referenced, others are norm referenced, and yet others are standardized formal tests of language (Paul, 2007). It has been well established that the specific language being learned has an influence on the manifestation of disordered performance (Hickmann, 2010; Leonard, 2001). Hence, it becomes mandatory to construct an assessment tool that is specifically sensitive, among its other features, to the rich Hebrew inflectional system that has been found to be more comparable to that of Italian or Spanish than to English (Leonard, 2001). Thus, a standardized norm-referenced test of language for Hebrew-speaking preschool children – namely, the Katzenberger Hebrew Language Assessment for Preschool Children (henceforth: the KHLA) – was developed and published (Katzenberger, 2009).
The KHLA is a norm-referenced, Hebrew-specific assessment, based on well-established psycholinguistic principles as well as on knowledge of normal language development in the preschool years. The tasks and items included in the test were developed on the basis of psycholinguistic knowledge of Hebrew language development (Berman, 2004) and target areas of language skill that the literature considers sensitive and thus appropriate for assessing children with SLI. This research focuses on the application of the KHLA test to a group of Hebrew-speaking children who have been diagnosed with SLI.
SLI is an exclusion-based description of children who have language impairment, but who also have normal cognitive abilities and no identifiable etiology for their difficulties. According to Paul (2007), the overall profile of language skills in a child with SLI may not resemble that of a child with normal language function at any point of the development of language. Pragmatic skills are generally better than lexical, grammatical, and morphological language skills. While the child might show ‘soft’ neurological signs such as clumsiness, poor attention, and/or mild motor difficulties, his or her social and behavioral functioning symptoms are considered to be secondary to the language difficulty.
Following the above description, the KHLA test focuses on areas mentioned as important for description and diagnosis of Hebrew-speaking children with SLI in the preschool years. The assessment and scoring categories will be described below.
Some language assessment tests being used in Hebrew are translated and adapted from English. Currently, there are only two norm-referenced language-screening tests developed originally in Hebrew for preschool children (Gorelnik, 1995; Tavor, 2008). Both screening tests are characterized by a wide age range (Gorelnik – 2;7–6;0, Tavor – 2;0–8;0) and a restricted number of items (each test includes about 50 items). The Gorelnik test assesses a wide range of language faculties, while the Tavor test assesses mainly picture naming. The first comprehensive, standardized language assessment tool developed in Hebrew specifically for older preschoolers is the KHLA applied in the present study.
The KHLA was developed as an in-depth assessment tool designed to assess children within a restricted age range (4;0–5;11 years) across a wide range of language constructs. The restricted age range allows the assessment of the specific language skills that children develop at a specific stage of development and can differentiate between achievers and under-achievers, in line with Bishop’s (1997) notion that language impairment manifests differently at each age range. A good example of this phenomenon was depicted by Berman (2004) in the following three phases of language development. (1) Emergent phase – Pre-grammatical: item based, situation-bound – the children rely critically on the mechanisms of rote learning and imitation. (2) Acquisition – Grammatical: structure-dependent, rules of morpho-syntax and lexicon. (3) Mastery of both linguistic structure and language use – conventionalized: context-oriented, discourse-motivated. Berman’s three phases of development stress the importance of assessing a restricted age range. For example, a morpho-lexical task that asks the participant to inflect plural forms with changing stems and/or irregular suffixes (e.g. from simla ‘dress’ to smalot ‘dresses’) is performed differently at different developmental stages in Hebrew: at ages 2–4 years by rote memory (smalot), at ages 3–6 years by applying the regular form (simlot), and at ages 6 years and above by applying lexical knowledge of the irregular forms (smalot). Both the Gorelnik and the Tavor tests, with their wider age range and the simple instructions required in a screening test, opt for a binary scoring method of correct or incorrect response, missing the opportunity to credit children for applying the regular form. The KHLA, with its restricted age range, allows for a differentiated scoring method, as described below.
In addition, children in the selected age range (4;0–5;11 years) are developmentally more adept at performing in a situation in which they are presented with pictures and asked to answer questions, as required in the tasks of the KHLA. The KHLA requires mostly short verbal responses (one or two words), with the exceptions of sentence retelling, narrative retelling, and spontaneous narrative. Bishop (1997) remarked that although most children with SLI have some impairment in comprehension, the impairment in production may be more salient. Accordingly, most of the KHLA tasks require production responses from the children, and in only a few items are participants asked to carry out instructions non-verbally.
The following is a description of the language subtests applied in the KHLA and the theoretical background for their choice. Since the focus of this study is on the psychometric aspect of the test, theoretical justification is limited.
Auditory Processing. According to a number of studies, the language learning abilities of most children with language impairment are simultaneously constrained by multiple factors affecting information processing (attention, speech perception, adequacy of phonological representations, central executive function, or general processing capacity) (Gillam & Hoffman, 2004). Thus, the adequacy of phonological representations and general processing capacity were assessed.
Grammar. The command of grammar is considered to be a significant diagnostic marker of language impairment in children (Leonard, Miller, & Gerber, 1999). Hebrew-speaking children with typical development acquire basic grammar between 2 and 5 years of age (Berman, 2004). Thus, appropriate grammar tasks for older Hebrew-speaking preschool children aim mainly to assess morphological, syntactical, morpho-syntactical, and morpho-lexical knowledge acquired towards the end of this period. This knowledge is considered to be the most informative towards the identification of children with language impairment (Paul, 2007).
Lexicon. During the preschool years the lexicon of children increases significantly (Berman, 2004). Thus the lexicon is a well-established criterion of language command. On the other hand, it is well known that tests of single-word vocabulary do not yield strong identification of children with SLI despite evidence of broader semantic deficits (Brackenbury & Pye, 2005). Accordingly, in the KHLA, the lexicon is assessed at two levels: by means of vocabulary (henceforth: lexicon) and by means of lexical categorization (henceforth: semantic categorization), which is essential for language development (Ravid, 2006).
Phonological Awareness. Phonological awareness is considered essential for reading and writing acquisition (Catts & Kamhi, 2005). Thus, it is important to assess the performance of preschool children who are at the stage of emergent literacy.
Narration of Picture Series. Narration is considered a complex ability. The storyteller must be in command of at least three kinds of knowledge: linguistic, textual, and narrative (Berman, 1995). The narration of picture series also requires the ability to translate static visual pictures into dynamic verbal expression and to interpret spatial arrays as temporally related sequences of events (Berman, 1995). Complexity makes narration very sensitive to the child’s language use.
The main goal of the study is to evaluate the KHLA as a tool for identifying Hebrew-speaking preschoolers with specific language impairment. The study was designed to test whether the KHLA is sensitive enough to distinguish between children with typical language development and those who manifest SLI in the age range of 4;0–5;11 years. This age range was chosen on the grounds that the diagnosis of SLI is restricted to ages 4 years and above (Rescorla & Lee, 2001) and that children older than 6 years are generally literate and thus outside the scope of the KHLA test. The TDL participants were organized in four age groups, based on Paul’s (2007) argument that the age range of 4–6 years is very sensitive to developmental changes.
Johnston (2007) argued that one of the goals of language assessment is to identify the scores that will effectively distinguish between children with language problems and typically developing children (cutoff point). Spaulding, Plante, & Farinella (2006) reviewed the data from 43 language assessment tests in English and stated that test specificity and sensitivity rates are crucial for its selection. Following Johnston (2007) and Spaulding et al. (2006), the performance of children independently identified by professional speech-language pathologists (SLPs) was compared with that of their normally developing age, sex, and SES background peers. The scores of the assessments of the two groups of children were analyzed applying three psychometric tools: cutoff point, sensitivity, and specificity.
The KHLA is a comprehensive tool allowing a focused, in-depth exploration of specific linguistic domains and the relationships among them. It has been argued that for each language area that children are presented with, knowledge in other domains would help acquisition (Christophe, Millotte, Brusini, & Cauvet, 2010). It is possible that the lexical, semantic categorization, and phonological awareness areas do not involve only language functions. It has also been argued that it is necessary to develop lexical knowledge so as to be able to perform at the level of adult typical language, in addition to the basic knowledge of syntactic and morphological rules (Berman, 2004). Ravid and Schiff (2009) further argued that in the case of Hebrew, morphological acquisition is paced by morphological, lexical and phonological factors. It is well known that the narrative area involves linguistic, textual, and narrative knowledge (Berman, 1995). Accordingly, a second goal of the study is to find relationships among the various language subtests, and between these subtests and the Auditory Processing subtest, which is considered to be a constraint on language performance (Montgomery, Magimairaj, & Finney, 2010) and may thus show interrelationships with all language areas.
An additional goal of the study is to find domains that are particularly sensitive among some or all of the language-impaired Hebrew-speaking preschoolers in the specific group studied, by means of the various tasks included in the assessment.
In sum, this study has three goals: (a) to evaluate the KHLA as a tool for identification of language-impaired Hebrew-speaking preschoolers by applying three psychometric tools: cutoff point, sensitivity and specificity; (b) to find relationships among the various subtests; and (c) to find domains that are particularly sensitive among some or all of the language-impaired population of Hebrew-speaking preschoolers, by means of the various tasks included in the assessment.
Description of the test
In order to meet the goals specified above, a large-scale research instrument (26 tasks comprising 205 items) was devised as a means of assessing the linguistic knowledge and language skills of Hebrew-speaking children aged 4;0–5;11 years. The reliability of the KHLA was measured by Cronbach’s α. The lowest value obtained was 0.7 and most values exceeded 0.8. A common procedure based on Cronbach’s α for deleting redundant items was applied, resulting in the removal of 34 items from the original list of 205. The current version of the test, described below, is based on the remaining 171 items.
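As a rough illustration of the reliability step, the sketch below computes Cronbach’s α for a small item-score matrix and the α that would result from dropping each item in turn. The source does not specify the exact item-deletion procedure, so the `alpha_if_item_deleted` helper, the function names, and the toy data are assumptions made purely for illustration.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants, each a list of item scores."""
    k = len(rows[0])                                   # number of items
    item_variances = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_variance = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item left out in turn (hypothetical helper)."""
    k = len(rows[0])
    return [
        cronbach_alpha([[x for j, x in enumerate(r) if j != drop] for r in rows])
        for drop in range(k)
    ]

# Toy data (invented): 5 participants x 4 items, each scored 0-2.
toy_scores = [
    [2, 2, 1, 2],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [2, 1, 2, 2],
    [0, 0, 0, 1],
]

alpha = cronbach_alpha(toy_scores)
alphas_without = alpha_if_item_deleted(toy_scores)
print(round(alpha, 3), [round(a, 3) for a in alphas_without])
```

An item whose removal leaves α unchanged or higher would be a candidate for deletion under this kind of procedure.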
The measures consisted of six subtests, as detailed below (Katzenberger, 2009) in the presentation order:
Auditory Processing (5 tasks, 28 items). Participants are asked to carry out instructions (e.g. to put their hand on their mouth and then on their belly) and to point to pictures (e.g. to point to a small white bear and a big black cat). Sentence imitation is known as a measure of syntax production; nevertheless, it also assesses general processing capacity. Thus, participants are required to repeat sentences of progressive length and complexity. The instructions are formulated in basic lexicon and simple grammatical structures. One of the tasks instructs participants to point to three or four objects (among six) having phonologically similar names (e.g. shaon ‘watch’; limon ‘lemon’; balon ‘balloon’). The number of information units is counted.
Lexicon (5 tasks, 40 items). Participants are asked to name items that are assumed to be familiar to children (i.e. not dependent on personal experience or on interest but tied to conceptual development). There is no formal list of lexical frequency in Hebrew; thus the common words presented in the test were judged as being frequent by mothers, preschool teachers, and SLPs involved in this study. The items tested are acquired before and after the age of 4 years (Johnston, 2007). The items include both closed-class items (grammatical functors such as prepositions and question words) and open-class content words (nouns, verbs, and adjectives). The participant is presented with pictures and asked to name a locative preposition (e.g. betox ‘inside’ and meaxorey ‘behind’) (Berman, 2004); to name parts of the body (e.g. lexi ‘cheek’ and berex ‘knee’) (Berman, 2004); to describe dimensions (e.g. raze ‘thin’ and namux ‘short’) (Clark, 2003); to answer information-questions (e.g. mi ‘who’ and lama ‘why’) (Clark, 2003); and to name specific verbs (e.g. kotev ‘writes = is writing’ and metsayer ‘draws = is drawing’). Participants might answer by using the adult typical Hebrew or a semantically correct answer that is not used by adults (see ‘Scoring’ section below). The quality of the answer is considered in the scoring.
Grammar (6 tasks, 34 items). The participant is presented with pictures and asked to answer questions intended to elicit verbs inflected in past and future tense. This measures the child’s ability to simultaneously mark person, number, and gender agreement (Dromi, Leonard, Adam, & Zadunaisky-Ehrlich, 1999). The participant is also requested to use the same verb in two verb-patterns (‘Binyanim’) (e.g. the verb li-rxots ‘to wash’ as raxats basic-transitive ‘wash something’ and hitraxets reflexive ‘wash oneself’); to inflect plural forms with changing stems and/or irregular suffixes (e.g. from simla ‘dress’ to smalot ‘dresses’) and to inflect singular forms from irregular plural suffixes (e.g. from magafaim ‘boots’ to magaf ‘boot’); to inflect the plural adjectives of irregular inflected nouns (e.g. nemala shxora ‘black ant’; noun and adjective in singular feminine suffix to nemalim shxorot ‘black ants’; noun in plural masculine suffix and adjective in plural feminine suffix); and to derive consequential adjectives from a verb (e.g. metukan ‘repaired’ from the verb le-taken ‘to repair’). The participant might use the adult typical Hebrew or a grammatically/semantically correct answer not used by adults (see ‘Scoring’ section below). The quality of the answer is considered in scoring.
Semantic Categorization (3 tasks, 14 items). This ability is assessed using a task requiring word retrieval out of given categories, with a time limit (cf. Semel, Wiig, & Secord, 1994), and word retrieval out of given categories without a time limit. Participants are also asked to explain the resemblance between items that are under a shared category. The number of correct items and quality of answers are considered in scoring (for relevant examples see Table 1).
Phonological Awareness (4 tasks, 19 items). Participants are presented with pictures of simple objects and the examiner names the objects. Participants are required to say what they hear at the beginning of the word and at the end of the word and to find the same rhyme unit of the last syllable between the names of the objects (e.g. aron ‘closet’ – xalon ‘window’). Children might answer with a syllable, a phoneme, or a letter’s name. The quality of the answer is considered in scoring. Phonemic isolation of the first consonant of nonsense CVC syllables is also tested because of its strong correlation with reading acquisition (cf. Shatil & Share, 2003).
Narration of Picture Series (3 tasks, 36 items). Because of their complexity, narration tasks are the last tasks to be assessed. The narration task, as opposed to all other tasks, is considered an open task; thus the data for scoring are taken from the text provided by the participant. The texts are analyzed following Berman and Katzenberger (1998) and Bornens (1990). The participant is asked to retell a story and to narrate two other stories. The narrated texts are analyzed according to the essential characteristics of narrative (relations between the visual stimulus and the story; and relations between text events) as well as its optional characteristics (conventional opening; a dynamic text; evaluation; script specific lexicon; protagonist maintenance by linguistic means; and adult-level picture sequencing). The number of appropriate items and quality of answers are considered in scoring.
Table 1. The task ‘explaining the resemblance between items’, its items, and scoring.
Note. Hebrew-speaking children use the term kley ‘tools’ in the meaning of group/family.
Scoring
The KHLA item scoring was determined according to the qualitative and/or developmental principles described above. Twenty adults, all mid–high SES native speakers of Hebrew with at least a high school education, participated as a control group. Their answers were considered a sample of conventional adult language and therefore received the maximal score. A partial score was given to a semantically/grammatically correct answer that is acceptable but not conventional. For example, the locative preposition meaxorey ‘behind’ received the maximal score, while the children’s version meaxora received a partial score; the plural form smalot ‘dresses’, which changes the stem of simla ‘dress’, received the maximal score, while the form simlot, inflected without changing the stem, received a partial score.
The maximal score of a task was determined according to its relative performance difficulty level. For example, the naming of specific or generic-level terms is considered an easier task than the naming of shared super-ordinate-category terms (Ravid, 2006). Thus, the maximal score for naming a body part (e.g. lexi ‘cheek’) is one point but the maximal score for naming a shared super-ordinate category (e.g. kley rexev/kley taxbura ‘tools-of-transportation/vehicles’) is two points. An example of the qualitative and developmental scoring of the answers for the task of explaining resemblance between items is presented in Table 1.
The purpose of the present study is to apply the KHLA to a group of preschool children who were clinically and independently evaluated by their SLPs as SLI (Rice, Warren, & Betz, 2005) and to find out whether the assessment distinguishes between typically developing and language-impaired children. Specifically, the study examines the composite criterion applied by Tomblin, Records, and Zhang (1996) for the EpiSLI criteria, under which a participant is identified as SLI if his or her total score or at least two of the subtest-specific scores fall below the cutoff point.
Method
Participants
Four hundred and fifty-four (454) monolingual native speakers of Hebrew aged 4;0–5;11 years participated in the study. Questionnaires given to the children’s parents and teachers showed that the children had no medical problems and no sensory, cognitive, or emotional impairments. The children’s articulation was typical for their age (articulation was not formally assessed, but clinicians’/student clinicians’ impressions of the child’s articulation during the interaction were collected), in accordance with Rice (2007), who stressed the importance of controlling for phonological/speech impairments in order to avoid confounding parameters of morphological impairments with phonological or speech impairments. Demographic information was gathered by means of the above-mentioned questionnaires. The children were from mid–high SES (as defined by mothers’ education – cf. Chiu & McBride-Chang, 2006), from the same ethnic background, and from central Israeli suburbs, and they attended regular public (state) kindergartens. The children’s parents had at least a high school education and were native speakers of Hebrew. The children’s parents gave written consent for the participation of their children in the study.
The children were classified into two groups: (a) 383 children defined by their teachers and parents as children with typically developing language (TDL); (b) 71 children defined by their SLPs as having specific language impairment (SLI).
All participants were taken from the same public educational system, in the same geographical and SES areas. Participants (half males and half females) were divided into four age groups as described in Table 2. All children completed the 205 KHLA items (comprising 26 tasks) included in the original assessment. The test was applied to 383 TDL participants and 71 SLI participants.
Table 2. Distribution of population by age (in half years) and groups.
Procedures
Each participant was tested individually in a quiet room. The participants completed the KHLA in one to three sittings. Children with typically developing language (TDL) were individually tested in their preschool setting by speech-language pathology students in their last year of education. The student clinicians were trained by the first author in the application of the KHLA and in the scoring procedure. The training consisted of an 8-hour workshop which included a presentation of the theoretical background of the test, the test manual, and scoring, as well as hands-on practice in administering the test and in the scoring procedure.
The SLI children were individually tested in a clinical setting by experienced SLPs. The SLPs had participated in workshops with the first author and have had work experience in the application of the KHLA as well as in the scoring procedure. The SLPs routinely administered the original version of the KHLA (205 items) during the first and/or second meeting with all the children referred to their clinic by teachers and educational psychologists. The SLPs then proceeded to informally assess the language skills of the children, applying informal criterion-referenced procedures, and language intervention was administered accordingly. Following a period of 4–9 months of weekly interventions the authors and the SLPs recruited for the study all children that fulfilled the following criteria: presented a clinical profile of primary language impairment (Paul, 2007), normal hearing, mid–high SES, monolingual native speakers of Hebrew, and had not received intervention prior to assessment by the KHLA. Seventy-one assessments were analyzed in the study.
Inter-rater reliability was obtained for each group. The second author re-scored 25% of the tests for each group, achieving inter-rater agreement of 95% in the TDL group and 90% in the SLI group. Disagreements were discussed and resolved.
Results
Statistical analysis was designed and performed to formalize the implementation of the following three goals.
Goal 1(a): Scoring and cutoff
Each participant received a raw score in each of the six subtests and an aggregate total raw score. Each of the seven raw scores was standardized (‘Z-score’) so as to yield a mean of 0 and a standard deviation of 1 for the TDL participants within each of the four age groups. The cutoff point (in Z-scores) differentiating between the two groups was set at −1.25 (1.25 SD below the mean for TDL participants). A participant was characterized as SLI if the total Z-score or at least two of the subtest-specific Z-scores fell below this cutoff.
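The standardization and composite rule just described can be sketched as follows. This is a minimal illustration, not the authors’ scoring software: the function names and the example scores are invented, and the use of the sample standard deviation (rather than the population one) is an assumption, since the source does not say which was used.

```python
from statistics import mean, stdev

CUTOFF = -1.25  # cutoff in Z-score units, as stated in the text

def standardize(raw, tdl_reference):
    """Z-score of a raw score relative to TDL peers in the same age group.

    Assumes the sample standard deviation; the source does not specify.
    """
    return (raw - mean(tdl_reference)) / stdev(tdl_reference)

def classified_as_sli(total_z, subtest_zs, cutoff=CUTOFF):
    """Composite rule: total Z below cutoff, or at least two subtest Zs below it."""
    low_subtests = sum(1 for z in subtest_zs if z < cutoff)
    return total_z < cutoff or low_subtests >= 2

# Invented raw total scores of TDL children in one age group.
tdl_totals = [120, 131, 125, 140, 118, 135, 128, 122]
child_total_z = standardize(101, tdl_totals)

# Invented subtest Z-scores for the same child (six subtests).
child_subtest_zs = [-0.4, -1.6, -0.9, 0.2, -1.3, -0.5]
print(classified_as_sli(child_total_z, child_subtest_zs))
```

Note that the total Z-score is obtained by standardizing the total raw score itself, not by summing the six subtest Z-scores.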
The choice of the total score as summary statistic
Correlation coefficients between subtest scores and total score were evaluated for TDL and SLI participants, as well as partial correlations between subtest scores for TDL participants, after removing the effect of the total score (see Tables 3, 4, and 5). It is evident from these data that subtest scores are quite independent of each other except for a common factor related to performance degree: excluding Phonological Awareness, off-diagonal entries in Tables 3 and 4 are close to each other, and off-diagonal entries in Table 5 are close to 0. It seems that the common factor explains a higher proportion of the variance in SLI than in TDL participants, indicating that SLI participants may have a more heterogeneous degree of performance. These findings justify the choice of subtests included in the KHLA test and support the summary total score as indicator of performance. The next two paragraphs provide further theoretical support for this choice.
Table 3. Correlation coefficients between subtest scores and total scores for TDL participants.
Table 4. Correlation coefficients between subtest scores and total scores for SLI participants.
Table 5. Partial correlation coefficients between subtest scores for TDL participants, after removing the effect of the total score.
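For readers unfamiliar with partial correlations, the sketch below applies the standard first-order partial correlation formula, which is presumably what underlies the partial correlations reported here (the source does not state the formula). The function name and the three input correlations are invented for illustration and are not values from the study.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Invented values: two subtests correlate 0.40 with each other and
# 0.65 and 0.60, respectively, with the total score.
r_partial = partial_correlation(0.40, 0.65, 0.60)
print(round(r_partial, 3))
```

When the correlation between two subtests is roughly the product of their correlations with the total score, the partial correlation is close to 0, which is the pattern the text describes as a single common factor.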
The inter-relationships between scores are expressed by the matrix of their correlation coefficients. This matrix is the data for the identification of the principal components behind the scores. The strength of influence of these components is measured by eigenvalues and the component weights themselves, by eigenvectors. The Kaiser criterion in Factor Analysis stipulates that the important components are those with an eigenvalue exceeding 1. Thus, the number of eigenvalues exceeding 1 is the dimension of the list of scores. In the case represented by Table 3, the maximal eigenvalue is 2.83 and the five other eigenvalues are similar to each other, ranging from 0.51 to 0.72. This is clear evidence that one properly weighted score will adequately represent the discriminating power of the list of scores. As it turns out, the entries of the principal component have similar size, ranging from 0.37 to 0.43. Hence, the equal-weights total score is an adequate summary statistic on which to base the test.
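The eigenvalue argument can be illustrated with a toy example. The sketch below builds a hypothetical correlation matrix in which all six subtests correlate equally (0.366) with one another; this is not Table 3, only a stand-in chosen so that the largest eigenvalue comes out near the reported 2.83. It uses plain power iteration with deflation, a generic textbook technique and not necessarily the method used by the authors.

```python
import math

def power_iteration(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    n = len(matrix)
    v = [1.0 + 0.01 * i for i in range(n)]            # arbitrary non-degenerate start
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))    # Rayleigh quotient
    return eigenvalue, v

def deflate(matrix, eigenvalue, vector):
    """Remove one eigen-component so the next-largest eigenvalue can be found."""
    n = len(matrix)
    return [[matrix[i][j] - eigenvalue * vector[i] * vector[j] for j in range(n)]
            for i in range(n)]

# Hypothetical correlation matrix: six subtests, every pair correlated 0.366.
r = 0.366
corr = [[1.0 if i == j else r for j in range(6)] for i in range(6)]

first, weights = power_iteration(corr)
second, _ = power_iteration(deflate(corr, first, weights))
print(round(first, 2), round(second, 3), [round(x, 2) for x in weights])
```

In this idealized case only the first eigenvalue exceeds 1, and the first component weights all six subtests equally, which mirrors the reasoning for an equal-weights total score.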
The summary score weights should have been determined, in principle, by some form of theoretical optimization (such as likelihood-ratio or minimal variance methods, perhaps taking age into account). The application of this approach sharpened in-sample discrimination between SLI and TDL participants only marginally, leading to a strong preference for the equal-weights total score, more robust than sample-dependent optimal solutions whose performance cannot be reliably cross validated on the available sample sizes.
The subtest of Phonological Awareness seems to behave differently from the other five subtests, as displayed in Tables 3, 4, and 5 (correlation with other subtests) and Tables 6 and 7 (performance compared to the other subtests). Its correlation coefficient with the total score is 0.81 for TDL participants (highest among the subtests), as compared with 0.39 for the SLI participants (lowest among the subtests). Correlation coefficients between Phonological Awareness and the other subtests follow the same pattern as with the total score (Tables 3 and 4). Table 5 shows that when the effect of the total score is removed, the other five subtests are uncorrelated with each other, but negatively correlated with Phonological Awareness. Tables 6 and 7 rank Phonological Awareness as the least discriminating subtest.
Table 6. Percentage of impairment in participants with SLI as a function of subtest and cutoff point.
Table 7. Estimated T-score of each subtest score.
Table 8. Rates of wrong identification of participants.
Statistical tests for effects of group and age
A two-way ANOVA (participant group by age group) showed highly significant differences between the TDL and the SLI groups (F = 445.266, df = (1,452), p < 0.001). T-tests for differences between TDL and SLI in each age group separately yielded p-values below 0.00001.
Significant interaction (F = 3.729, df = (3,452), p-value = 0.011) was found between participant groups and age groups, due to larger differences between SLI and TDL participants in the youngest age group (4;0–4;5 years) than in the other three groups.
Differences between age groups are highly significant as well (F = 40.961, df = (3,452), p < 0.001). Since the TDL groups are calibrated so as to have no differences, this implies that differences among age groups in the SLI group are highly significant.
The choice of cutoff
The KHLA was found useful for distinguishing between TDL and clinically identified SLI when identification is based on the total Z-score, or at least two of the subtest-specific Z-scores, falling below the −1.25 SD cutoff point. In addition, cutoff points of −1 SD and −1.50 SD were examined in order to determine the preferred cutoff point. Table 8 describes the rates of incorrect identification of participants for each of the three cutoff points.
Goal 1(b): Sensitivity and specificity
Table 8 shows that identification is not clear-cut. There is a price to pay for sensitivity (accuracy in identification of SLI) in terms of a higher proportion of TDL wrongly identified as SLI. It seems that a 5% error rate in SLI identification is excessive, but reducing this error rate below 1.6% yields excessive false positive results. Thus, the choice of the composite score based on −1.25 SD as cutoff seems a good compromise, with a sensitivity of 98.4% and a specificity of 82.2%.
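The composite rule and the two rates can be illustrated with a minimal sketch. It assumes that a child is flagged when the total Z-score falls below the cutoff or when at least two subtest Z-scores do. The strict inequality, the function names and the toy scores are illustrative assumptions, not data or procedures taken from the study.

```python
CUTOFF = -1.25  # cutoff in SD units of the TDL norm


def is_flagged(total_z, subtest_zs, cutoff=CUTOFF):
    """Flag a child if the total Z-score is below the cutoff,
    or if at least two subtest Z-scores are below it."""
    low_subtests = sum(1 for z in subtest_zs if z < cutoff)
    return total_z < cutoff or low_subtests >= 2


def sensitivity_specificity(children, cutoff=CUTOFF):
    """children: iterable of (is_sli, total_z, subtest_zs) tuples."""
    tp = fn = tn = fp = 0
    for is_sli, total_z, subtest_zs in children:
        flagged = is_flagged(total_z, subtest_zs, cutoff)
        if is_sli and flagged:
            tp += 1
        elif is_sli:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy sample (NOT study data): 4 SLI and 5 TDL children.
toy_children = [
    (True, -2.0, [-2.1, -1.9, -1.5, -2.2, -1.0, -1.8]),
    (True, -1.1, [-1.5, -1.4, -0.5, -0.9, -1.0, -0.8]),
    (True, -1.3, [-1.3, -1.2, -1.0, -1.6, -1.1, -0.9]),
    (True, -0.9, [-1.3, -0.8, -0.7, -1.0, -0.6, -0.9]),
    (False, 0.2, [0.1, 0.5, -0.3, 0.4, 0.0, 0.2]),
    (False, -0.5, [-1.3, -0.2, -0.4, -0.6, 0.1, -0.5]),
    (False, 1.0, [0.8, 1.2, 0.9, 1.1, 0.7, 1.0]),
    (False, -1.0, [-1.4, -1.3, -0.5, -0.8, -0.9, -1.0]),
    (False, 0.0, [0.3, -0.2, 0.1, -0.1, 0.0, 0.2]),
]

sensitivity, specificity = sensitivity_specificity(toy_children)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Passing a different `cutoff` value (for instance −1 or −1.50) to `sensitivity_specificity` is one way to explore how the two rates trade off against each other.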
Goal 2: Relationships among subtests
In order to find relations among the six subtests included in the assessment, K-means clustering was applied to the vectors of subtest scores of the 71 children with SLI. Clustering into three groups revealed three clusters differentiated by simultaneous severity across subtests. The method failed to identify compensatory patterns and yielded only a global degree of impairment. Under these conditions the global score is further justified as the main tool.
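As an illustration of this kind of analysis, the following sketch runs a plain Lloyd-style K-means on six-dimensional score vectors. The toy profiles, the choice of initial centres and the function name `kmeans` are assumptions made for the example; the software and settings actually used in the study are not stated here.

```python
import math


def kmeans(points, centers, iters=100):
    """Plain Lloyd's algorithm: assign each point to its nearest centre,
    then move each centre to the mean of its assigned points."""
    def nearest(p):
        return min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))

    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p)].append(p)
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[k]
            for k, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p) for p in points]
    return centers, labels


# Hypothetical Z-score profiles on six subtests (NOT study data). The same
# three profile shapes are shifted down by 0, 1 and 2 SD, so the groups
# differ only in overall severity.
base_profiles = [
    [-0.8, -0.7, -0.9, -0.6, -1.0, -0.8],
    [-0.9, -1.0, -0.7, -0.8, -0.6, -0.9],
    [-0.6, -0.8, -1.0, -0.9, -0.7, -0.7],
]
points = [[z - shift for z in profile]
          for shift in (0.0, 1.0, 2.0) for profile in base_profiles]

# Deterministic start: one point from each severity level.
centers, labels = kmeans(points, [points[0], points[3], points[6]])
severity = [sum(c) / len(c) for c in centers]  # mean Z-score of each centre
print(labels, [round(s, 2) for s in severity])
```

When the cluster centres differ mainly in their mean level rather than in the shape of the profile, as in this toy example, the clustering reflects global severity only, which corresponds to the pattern described above.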
Goal 3: Discrimination power of the various subtests
In this section, subtests will be ranked so as to yield the most discriminating single subtest, pair of subtests, and more generally, sets of subtests of given size.
In principle, raw and standardized scores can be computed for aggregate aspects, thus yielding, for instance, a ‘Grammar + Phonological Awareness score’. Sensitivity and specificity constitute a ‘vector’ score, which is difficult to apply as a comparison criterion. A measure of the performance of the KHLA test should be a ‘scalar’ index providing a one-number summary of the effect of five parameters: the mean (0) and SD (1) of TDL participants, the mean and SD of the SLI participants, and the proportion of SLI participants in the population. Various such likelihood-based indices proposed in the literature (e.g. the F-measure) reflect a compromise between sensitivity and specificity, based partially on the proportion of the two classes in the population, which should be ignored in this context. The T-score index applied here, which is not influenced by the proportion of SLI in the population, is the t-test-like statistic (mean(SLI) − mean(TDL)) / √(std(SLI)² + std(TDL)²), which in the present set-up (TDL mean 0 and SD 1) takes the simple form:
\[ T = \frac{\mathrm{mean}(\mathrm{SLI})}{\sqrt{1 + \mathrm{std}(\mathrm{SLI})^{2}}} \]
The rationale for choosing this index is that it determines the (error) probability that a randomly selected SLI individual will outperform a randomly selected TDL individual.
[For normally distributed within-group scores, this error probability is the standard normal cumulative distribution function evaluated at the T-score.]
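The link between the index and the error probability follows from the fact that the difference of two independent normal scores is itself normal, with mean equal to the difference of the means and variance equal to the sum of the variances. A minimal numerical sketch is given below. The SLI mean and SD are hypothetical, and the function names are chosen for the example. The T-scores in the text appear to be reported as positive magnitudes, so the sketch evaluates the cumulative distribution function at the negated value.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def t_index(mean_sli, sd_sli, mean_tdl=0.0, sd_tdl=1.0):
    """(mean(SLI) - mean(TDL)) / sqrt(sd(SLI)^2 + sd(TDL)^2)."""
    return (mean_sli - mean_tdl) / math.sqrt(sd_sli ** 2 + sd_tdl ** 2)


def error_probability(t):
    """P(random SLI child outscores random TDL child), assuming normality."""
    return normal_cdf(t)


# Hypothetical SLI group: mean Z-score -2.0, SD 1.0 (NOT study values).
t = t_index(-2.0, 1.0)    # -2 / sqrt(2)
p = error_probability(t)
print(round(t, 3), round(p, 4))

# A reported magnitude such as 1.67 enters with a negative sign.
p_pair = error_probability(-1.67)
print(round(p_pair, 4))
```

The value obtained for a magnitude of 1.67 is close to, but not identical with, the 4.66% quoted for the best pair of subtests, presumably because the T-score in the text is rounded to two decimals.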
Examining percentage of impairment as a function of language subtest (in Z-score) and cutoff points enables us to find the most sensitive subtest in the identification of children with SLI. Since Z-scores of TDL are standardized and nearly normally distributed, the proportion of TDL with Z-score below −1.25 SD is 10.5% and below −2 SD is 2.3%, in each of the subtests. Table 6 displays the corresponding proportion for the SLI group in total and in each of the six subtests.
Tables 6 and 7 display empirical subtest-wise percentages of impairment in participants with SLI and T-score error probabilities. Grammar is the most sensitive subtest (impairment 82.23%, error probability 8.74%) and Phonological Awareness is the least sensitive (impairment 55.63%, error probability 11.93%). In contrast, the total score achieves impairment 92.18%, with error probability 2.94%.
Tables 6 and 7 show that the ranking order of subtests in terms of discrimination power is Grammar, Auditory Processing, Semantic Categorization, Narration of Picture Series/Lexicon, and Phonological Awareness. The two tables lead to essentially identical ranks except for a switch between the Narration of Picture Series and Lexicon subtests.
These ranks refer to each subtest’s performance by itself. Thus, if only one aspect is to be tested, Grammar is the most informative. We now proceed to analyze ranks derived from progressive introduction of multiple subtests, as provided by the T-score parameter.
Although Auditory Processing and Semantic Categorization are individually almost as informative as Grammar, if only two aspects are to be tested, the most informative pair is Grammar + Narration of Picture Series, with an estimated T-score of 1.67 (error probability 4.66%). Details of less informative pairs (and in the sequel, triples, etc.) are omitted.
If only three aspects are to be tested, the most informative triple is Grammar + Narration of Picture Series + Phonological Awareness, with an estimated T-score of 1.78 (error probability 3.74%).
The most informative four aspects are Grammar + Narration of Picture Series + Phonological Awareness + Lexicon, with an estimated T-score of 1.85 (error probability 3.17%).
It is immaterial which of Auditory Processing and Semantic Categorization is tested next, with a five-aspect estimated T-score of 1.87 (error probability 3.03%).
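The search for the most informative set of a given size can be sketched as follows. The example assumes that a set of subtests is scored by the equal-weight sum of its subtest scores and that all sets of a given size are compared exhaustively. Both points are assumptions of the example, as are the toy data, in which subtests "A" and "B" share a common factor while "C" is independent.

```python
import itertools
import math
import random
import statistics


def t_index(sli_scores, tdl_scores):
    """(mean(SLI) - mean(TDL)) / sqrt(sd(SLI)^2 + sd(TDL)^2)."""
    diff = statistics.fmean(sli_scores) - statistics.fmean(tdl_scores)
    spread = math.sqrt(statistics.variance(sli_scores)
                       + statistics.variance(tdl_scores))
    return diff / spread


def composite(rows, subset):
    """Equal-weight sum of the chosen subtest scores for each child."""
    return [sum(row[name] for name in subset) for row in rows]


def best_subset(sli_rows, tdl_rows, names, size):
    """Score every subset of the given size; the most negative
    T-index marks the best separation between the groups."""
    candidates = []
    for subset in itertools.combinations(names, size):
        t = t_index(composite(sli_rows, subset), composite(tdl_rows, subset))
        candidates.append((t, subset))
    return min(candidates)


# Hypothetical simulated scores (NOT study data).
rng = random.Random(0)
NAMES = ("A", "B", "C")


def make_group(n, shift):
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # factor common to A and B
        rows.append({
            "A": shift + shared + rng.gauss(0.0, 0.3),
            "B": shift + shared + rng.gauss(0.0, 0.3),
            "C": 0.7 * shift + rng.gauss(0.0, 1.0),
        })
    return rows


tdl_rows = make_group(200, 0.0)
sli_rows = make_group(200, -2.0)

results = {size: best_subset(sli_rows, tdl_rows, NAMES, size) for size in (1, 2, 3)}
for size, (t, subset) in results.items():
    print(size, subset, round(t, 2))
```

Because the index is unchanged by shifting or rescaling the composite, the sum does not need to be re-standardized before the comparison.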
Table 9 reports differences in the performance of a subtest (result) among children that differed in their performance in another subtest (source).
Table 9. T-tests for differences between scores of PLI children in all subtests. Significant: p < 5%; highly significant: p < 1%.
For example, consider children below and above median performance in Grammar. These two groups of children differ highly significantly in their performance in Auditory Processing, Lexicon, and Semantic Categorization, but they do not differ in their performance in Phonological Awareness and Narration of Picture Series. High- and low-performing children in Phonological Awareness do not differ significantly in any other subtest, while high- and low-performing children in Narration of Picture Series differ significantly in all other subtests. These results also corroborate the previously shown ranking of the most informative subtests.
Discussion
The first and main goal of this study is to evaluate the KHLA as a tool for identification of language-impaired Hebrew-speaking preschoolers in the age range of 4;0–5;11 years. The KHLA was found to be a reliable and sensitive tool for distinguishing between TDL and clinically identified SLI when the identification is based on the total Z-score, or on at least two of the subtest-specific Z-scores, at the −1.25 SD cutoff point, with a sensitivity of 98.4% and a specificity of 82.2%.
Highly significant differences between the TDL participants and the SLI participants were found in each age group, with larger differences between SLI and TDL participants in the youngest age group than in the other three age groups. The youngest participants in this study (4;0–4;5 years) showed more profound impairment than the three other SLI age groups. Perhaps children of this age are clinically identified and treated earlier due to the severity of their impairment. Conti-Ramsden, Simkin, & Pickles (2006) showed that parents can be effective identifiers of impairment when the impairment is below 2 SDs from the population mean. In their study, the youngest and more impaired children were identified and referred to the SLP around the age of 4 years. In a previous longitudinal study, Bishop & Edmundson (1987) found that language impairment is more frequent and profound at age 4 than in the older age group: 37% of their 71 children who were identified as SLI at the age of 4 years were not identified as SLI at the age of 5;6 years in a later study (Stothard, Snowling, Bishop, Chipchase, & Kaplan, 1998). In the KHLA cross-sectional study, the children with SLI were clinically identified before the test was applied. The difference between the results of Bishop & Edmundson’s (1987) longitudinal study and those of this study may stem from the different sampling designs used.
The KHLA is a comprehensive tool that allows for focused, in-depth exploration of specific linguistic domains and relationships among them. The second goal of this research is to find relationships among the various areas included in the assessment. Theoretically, in addition to basic knowledge of syntactic and morphological rules, mastering the relationships between the language domains is necessary in order to perform at higher order language levels (Berman, 2004; Ravid & Schiff, 2009). Specific assessment of the performance in each of the areas is required in order to find this kind of relation, such as was found in Ravid & Schiff (2009), who tested 110 Hebrew-speaking first graders aged 6–7 years at the beginning and at the end of the school year, on plurals and phonological awareness. These researchers found that the use of nouns with non-changing stems and irregular suffixes and nouns with regular and irregular suffixes of changing stems improve significantly with age and are correlated with phonological awareness. Clustering of SLI participants in the present study in terms of area scores focused on global severity and failed to find participant groups differentiated by compensatory patterns that might have disclosed relationships between language areas. This is perhaps due to the younger age and the wider spectrum of language areas compared with those applied by Ravid & Schiff (2009).
The last goal was to find domains that are particularly sensitive among some or all of the language-impaired population of Hebrew-speaking preschoolers, by means of the various tasks included in the assessment. Notwithstanding the limited sample of language-impaired children included in the present study, it is possible to state that the ranking order of subtests as shown in the results provides a ranking order for assessment. As described, the ranking order obtained is: Grammar, Auditory Processing, Semantic Categorization, Narration of Picture Series/Lexicon, and Phonological Awareness. Thus, if only one subtest is to be tested, we recommend choosing the most informative subtest, Grammar, considered a specific diagnostic marker of language impairment in English-speaking children (Leonard, Miller, & Gerber, 1999; Spaulding et al., 2006).
Grammar, Auditory Processing, and Semantic Categorization subtests are highly correlated and the additional discrimination value provided by any one of the three in addition to the information provided by the other two is small. This is in spite of the fact that each of the three is individually informative. Our results support Bishop’s (2009) argument that the SLI population has grammatical difficulties and poor auditory processing in spite of the differences in the tasks applied in the two studies. Gillam & Hoffman (2004) stated that lexical categorization and retrieval abilities are most vulnerable to increased processing demands, as found in this study.
If only two aspects are to be tested, the most informative pair is Grammar + Narration of Picture Series. As stated above, Grammar is the best marker of SLI while Narration of Picture Series is not a good marker of SLI by itself but represents a wide range of linguistic and non-linguistic abilities. Narration of Picture Series was connected to all other subtests, manifesting the involvement of linguistic knowledge in narration (Berman, 1995) and the increased processing demands needed in performing a narration (Montgomery, Magimairaj, & Finney, 2010).
If only three aspects are to be tested, the most informative triple is Grammar + Narration of Picture Series + Phonological Awareness. The Phonological Awareness subtest adds information due to its independence of all other subtests. Results show that TDL children developed this area late and that SLI children show low performance that may imply later literacy impairment.
Clinical implications
The clinical recommendation of this study is to determine the cutoff point for diagnosis of SLI children as −1.25 SD when applying the KHLA and to apply the entire test for assessment in order to avoid higher error probability. However, there are clinical situations in which the clinician may decide to assess only two or three subtests. In those situations, it is recommended that Grammar + Narration of Picture Series or Grammar + Narration of Picture Series + Phonological Awareness be applied, as described in the study.
The choice of assessing only two subtests (Grammar and Narration of Picture Series) implies less than 5% wrong identification. The choice of assessing only three subtests (Grammar + Narration of Picture Series + Phonological Awareness) implies less than 4% wrong identification. Nevertheless, the assessment must be comprehensive and allow for focused, in-depth exploration of specific linguistic domains. The description of a child’s current comprehensive language competence enables the clinician to plan language intervention goals individualized for each child (Johnston, 2007). Thus, it is recommended that the complete KHLA be applied.
The KHLA represents an example of an assessment tool that is directly adapted to the language-specific structures and typology of Hebrew. The KHLA’s variety of language domains and well-established psycholinguistic principles could be usefully adapted for use in other languages as well. The authors are currently working with Arabic-speaking colleagues on the translation and adaptation of the KHLA to Palestinian Arabic. The shared Semitic roots of the two languages provide a basis for the translation and adaptation of the test.
Limitations of the study
The study is based on a sample size that is borderline adequate for establishing the discrimination power of the KHLA (as summarized in Table 8) but too small for the analysis of differential performance in subpopulations or for proper statistical error estimation by means of cross-validation. Future clinical monitoring of the application of the test should provide the data needed for proper threshold calibration and the study of finer questions.
Appendix
The inter-relationships in a list of scores are expressed by the matrix of their correlation coefficients. This matrix is the data for the identification of the principal components behind the scores. The strength of influence of these components is measured by eigenvalues and the component weights themselves, by eigenvectors. The Kaiser criterion in Factor Analysis stipulates that the important components are those with an eigenvalue exceeding 1. Thus, the number of eigenvalues exceeding 1 is the dimension of the list of scores. In the case represented by Table 3, the maximal eigenvalue is 2.83 and the five other eigenvalues are similar to each other, ranging from 0.51 to 0.72. This is clear evidence that one properly weighted score will adequately represent the discriminating power of the list of scores. As it turns out, the entries of the principal component have similar size, ranging from 0.37 to 0.43. Hence, the equal-weights total score is an adequate summary statistic on which to base the test.
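The eigenvalue argument can be illustrated with a small sketch. Since Table 3 is not reproduced here, the example uses a hypothetical 6 × 6 correlation matrix in which every pair of subtests has the same correlation, 0.366, a value chosen only so that the leading eigenvalue comes out near the reported 2.83. Eigenvalues are obtained by power iteration with deflation, a simple method suited to symmetric matrices with non-negative eigenvalues; the function names are assumptions of the example.

```python
import math
import random


def mat_vec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def leading_eigenpair(a, rng, iters=1000):
    """Power iteration from a random start vector."""
    v = [rng.uniform(-1.0, 1.0) for _ in a]
    for _ in range(iters):
        w = mat_vec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(a, v)))  # Rayleigh quotient
    return eigenvalue, v


def eigenpairs_symmetric(a, seed=0):
    """All eigenpairs of a symmetric positive semi-definite matrix,
    found one at a time by deflating each component after it is found."""
    rng = random.Random(seed)
    a = [row[:] for row in a]
    n = len(a)
    pairs = []
    for _ in range(n):
        lam, v = leading_eigenpair(a, rng)
        pairs.append((lam, v))
        for i in range(n):
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return pairs


# Hypothetical equicorrelation matrix (NOT the study's Table 3).
R = 0.366
corr = [[1.0 if i == j else R for j in range(6)] for i in range(6)]

pairs = eigenpairs_symmetric(corr)
eigenvalues = [lam for lam, _ in pairs]
n_components = sum(1 for lam in eigenvalues if lam > 1.0)  # Kaiser criterion
weights = [abs(w) for w in pairs[0][1]]                    # first component

print([round(lam, 3) for lam in eigenvalues])
print(n_components, [round(w, 3) for w in weights])
```

In this idealized case a single eigenvalue exceeds 1 and the first component weights all subtests equally, which is the situation in which an equal-weights total score is a natural summary.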
Acknowledgements
The authors would like to thank Ruth Berman from Tel Aviv University for her careful advice and valuable insights.
The statistical analysis was performed by the Tel Aviv University Statistical Laboratory.
The authors wish to thank all those who enabled this research: the students of the speech-language pathology program at Hadassah Academic College Jerusalem, the preschool teachers, the speech-language pathologists, the children, and their parents.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
