Abstract
This study aimed to consider collaborative practice in contributing to joint assessment and producing appropriate referral of children to speech and language therapy (SLT). Results of formal testing of selected comprehension skills are compared with functional/classroom performance as rated by class teachers. Thirty children aged 6.5–8.4 years, from three mainstream schools, were assessed using Clinical Evaluation of Language Fundamentals (CELF-4; Semel et al., 2006) subtests: ‘Understanding Spoken Paragraphs’ and ‘Concepts and Directions’. The children’s teachers completed the ‘listening’ component of the Observational Rating Scale. Combined scores for Understanding Spoken Paragraphs and Concepts and Directions subtests are significantly correlated with Observational Rating Scale scores, which reflect teachers’ ratings of comprehension in the classroom. The Concepts and Directions subtest scores alone correlated significantly with the teachers’ Observational Rating Scale ratings, but the Understanding Spoken Paragraphs scores did not. The findings suggest that teachers can fairly accurately identify the level of children’s functioning from informal observations, and these were corroborated through standardized speech and language therapy assessment. It is argued that holistic assessment and collaboration between health and education professionals can provide the basis for appropriate referral and effective assessment, contributing to accurate profiling and monitoring of intervention.
I Introduction
1 Language comprehension
Language comprehension (LC), whether in the classroom or other contexts, involves speech perception, semantics, understanding sentences and pragmatics, as well as working memory and metacognition. Considering the child in the context of a learning environment, assessment of LC is a complex process. Some areas of LC can be assessed formally/systematically, but evaluation also needs to consider pragmatics, listening and attention and functional comprehension. This includes assessing how the child uses the skills they have, and any strategies used to overcome difficulties as well as the overall impact upon self-confidence and learning. Effective and comprehensive assessment is therefore vital for inter-professional goal-setting and monitoring for intervention.
Wiig (2001) describes how the objectives, content, strategies and format of speech and language therapy (SLT) assessment in the 1990s changed from being ‘deficit-driven and reactive’ to more ‘strength driven, curriculum-based and proactive’. Paul (2000) emphasizes how assessment must encompass real-time processing, with new or unfamiliar information as encountered in the classroom. Hasson and Joffe (2007) agree, citing evidence from several authors that many current standardized tests may be inadequate to effectively assess children unless combined with more dynamic assessment. For example, Camilleri and Law (2007) found that factors such as poor attention skills, shyness, and lack of world experience negatively affect performance. The World Health Organization’s (2001) ‘International Classification of Functioning’ asserts that social aspects of disability must be taken into account (not just focusing on ‘medical’ or ‘biological’ dysfunction). Law et al. (2008a) suggested a move towards functional methods for SLT intervention, and therefore also assessment methods for LC that are functionally valid.
Owens (2004) reports that teachers are central to the functional assessment process, and this view is supported by Paul (2000), who reminds us that teachers’ judgements are based on weeks/months of everyday teaching, and are informed by the children’s overall learning outcomes. This source of information is vital in building a full picture of a child’s skills and difficulties to support true inclusion within the mainstream learning environment.
2 Standardized norm-referenced assessment
Once a child is referred to SLT, standardized assessment to inform differential diagnosis may take place. Different children may present functionally with the same difficulties, and standardized assessment can disentangle specific difficulties such as specific language impairment (SLI), auditory processing difficulties, and short term memory deficits. Such assessments are often theoretically driven, tapping into linguistic and cognitive-neuropsychological markers for differential diagnosis, and involve extensive standardization with a comparable population. They can determine the degree, nature and extent of a language disorder and therefore lead to effective SLT intervention (Wiig, 2001).
Furthermore, exclusionary criteria for entry procedures to specialized educational provision have traditionally been based on standardized test results (Botting et al., 1997). The Clinical Evaluation of Language Fundamentals assessment (CELF-4; Semel et al., 2006) is a very well-known standardized language assessment for SLT use. It can identify error patterns and highlight areas of strength or need, compared to the typical population (Semel et al., 2006). However, Semel et al. (1995) indicate that the CELF (3rd edition, UK: CELF-3) subtests cover primarily the ‘form’ and ‘content’ components of Lahey’s (1988) model of language, and do not address ‘use’ of language. The term ‘form’ covers morphological, syntactical and phonological skills, ‘content’ relates to semantic knowledge, and ‘use’ to pragmatic aspects of language and communication. As with most standardized assessments, subtests of the CELF-4 are usually carried out in a one-to-one context with minimal distraction. This is ideal to achieve the child’s optimal performance, but it may not be comparable to their performance in everyday environments, such as in the classroom. Completion of a full battery of subtests is not always possible, and individual subtests are sometimes used in isolation. Therefore, it would be useful to find out whether individual subtests are functionally valid, in order that results can be interpreted effectively.
Two standardized subtests exploring LC in the CELF-4 are Understanding Spoken Paragraphs (USP) and Concepts and Directions (CaD). Bishop (1997) suggests that these subtests examine two very different areas of LC and are very different in nature. The USP subtest aims to evaluate a child’s ability to listen to and comprehend paragraphs of increasing complexity (Semel et al., 2006) with some reliance on the child formulating a verbal response. Semel et al. (2006) suggest that this subtest relates to classroom activities for understanding stories, descriptions and directive materials. The CaD subtest measure of comprehension is instruction-based, assessing understanding of concept words and sequences without contextual/inferential background, and may be demanding on working memory. Semel et al. (2006) claim the CaD subtest evaluates the child’s interpretation of spoken directions of increasing length and complexity, whilst also following instructions and carrying-out related actions. They suggest that this relates to following teacher instructions, remembering homework, internalizing scripts and rules and general comprehension.
3 Comparison of standardized assessment and teacher observations
In practice, children sometimes perform poorly on standardized assessment of LC but manage to comprehend competently within the classroom context using coping strategies, peer support and contextual cues. Alternatively, a child is sometimes reported to be struggling to comprehend in the classroom but manages to score appropriately on one-to-one standardized assessment. Bishop (1997) quotes a parent commenting on their child:
The tests that people give him just don’t capture the difficulties he has in the real world, when people are talking rapidly, and one sentence follows another without any pause. (p. 155)
Observation of functional skills in naturalistic environments is an optimal adjunct to individual assessment and may indeed precede standardized SLT assessment as a path to identifying children with comprehension difficulties. Gilmore and Vance (2007) report several studies that demonstrate moderate correlations between teacher ratings and children’s performance. In their own study with 4–5-year-old children they asked teachers to complete a questionnaire to assess attention and listening and comprehension. They found a significant positive correlation with LC measured on standardized tests, including the Clinical Evaluation of Language Fundamentals: Pre-school (Wiig et al., 2000), concluding that teachers are accurate predictors of pupils’ abilities. Williams (2006) also reported that teachers of children aged 5–7 made similar judgements about children’s overall language skills when compared with standardized test scores, although no specific measures were reported for comprehension skills, and the standardized tests used were limited in number and variability.
Botting et al. (1997) explored concordance rates between standardized tests and clinical opinion for 242 children with language difficulties in UK language units. They examined how often specific areas of language difficulty were identified, relative to peers without SLI. Functional performance was assessed using teacher judgements. On the measure of syntax/morphology LC there was 27% agreement that there was no difficulty, and 24% agreement that there was a difficulty (where the cut-off for difficulties was the 5th percentile). Thus, 51% of the teachers’/therapists’ opinions agreed with the standardized assessments. Using a similar methodology, Semel et al. (1995), while measuring construct-validity for the CELF-3, explored the assessment’s ability to identify language difficulties compared to the school system’s identification of language difficulties. The CELF-3 identified 21% of children as having no language difficulty, despite the school system highlighting problems. Conversely, the CELF-3 signalled that 7.4% had a language difficulty when the school system reported no difficulty. Ultimately, 42.6% were correctly identified by both methods as having no difficulties, and for 28.7% both measures agreed that there were difficulties.
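The agreement figures quoted above are simple sums over the cells of a two-by-two concordance table. A minimal Python sketch of that arithmetic follows, using only the percentages reported in the text; the function and variable names are illustrative and do not come from either study.

```python
def overall_agreement(both_no_difficulty, both_difficulty):
    """Percentage of cases in which the two methods reach the same decision."""
    return round(both_no_difficulty + both_difficulty, 1)

# Botting et al. (1997): syntax/morphology LC measure.
botting_agreement = overall_agreement(27.0, 24.0)

# Semel et al. (1995): CELF-3 versus school-system identification.
semel_agreement = overall_agreement(42.6, 28.7)

# Cases where the CELF-3 and the school system disagreed (either direction).
semel_disagreement = round(21.0 + 7.4, 1)
```

On these figures the CELF-3 and the school system agreed for 71.3% of children and disagreed for 28.4%. The four reported cells sum to 99.7%, presumably because of rounding in the source.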
Boynton-Hauerwas and Addison-Stone (2000) explored correlations between teachers’ ratings of children’s language with standardized testing across typically-developing participants and participants with SLI. They found no significant correlations between teachers’ ratings and formal test scores for the typically-developing children, but for children with SLI teachers’ ratings were positively correlated. However, an unpublished, unvalidated questionnaire was used for teacher ratings, the reliability of which is therefore questionable.
Speake (2003) includes the AFASIC language checklists in her publication, and suggests that they are particularly valuable for identifying comprehension problems, as these children are sometimes previously unidentified, but struggle with the new challenges facing them when starting school. The checklists are described as providing a ‘sound basis for deciding whether or not to refer the child for an assessment as well as beginning to plan the kind of support that may be helpful’ (p. 10). Questions for the age 6–7-year checklist were prepared by a panel of professionals with high internal-reliability measures, but no test–retest reliability scores are available.
4 Clinical evaluation of language fundamentals subtests
This present study uses a published rating-scale tested for construct validity, from the CELF-4. The Observational Rating Scale (ORS; Semel et al., 1996, personal communication) is a checklist to quickly assess language functioning by gathering qualitative information from parents and teachers on children’s communication skills, including listening, speaking, reading, and writing. The ORS statements were developed from interviews with children with language impairments, their teachers, and parents, and this process was claimed to strengthen content validity (Semel et al., 1996, personal communication). Wiig and Secord (2006) suggest that the ORS has enabled the CELF-4 to incorporate a contextual lens, which creates a real-life picture of children’s behaviours and academic performances. Massa et al. (2008) explored these factors for 73 10-year-olds within their study of concordance rates between parents and teachers using the ORS. Overall, significant moderate correlations were found between teacher ratings and standardized assessment. Correlations were not presented for specific individual subtests or areas of language, and can therefore only be interpreted on an overall predictive level.
Semel et al. (1996, personal communication) also completed a field study exploring criterion-related validity for the ORS. They correlated parent, teacher, and student ORS ratings with CELF-3 receptive, expressive and total language performance in 1,208 typically-developing children and 117 children with language impairment, aged 6–16 years. Teachers’ ratings on the ‘listening’ component of the ORS when correlated with CELF-3 ‘comprehension’ total score produced a significant moderate coefficient of .33 for children with no language difficulty, and .38 for children with language difficulties. Again, no specific information is provided about individual standardized comprehension subtest correlations with the ORS. It is clear that more studies are needed to investigate specifically whether standardized assessment subtests of comprehension agree with clinical/environmental observations such as the ORS. Previously this relationship between individual comprehension subtests and functional LC for school-age children has not been explored directly. Such information would support decision-making when professionals have to be selective about assessment methods and standardized subtests, and assist in producing objective, yet clinically-valid descriptions of children with language difficulties.
As one measure of construct validity, Semel et al. (1995) explored inter-correlations between Listening to Paragraphs (equivalent to the USP subtest in this early edition of the test) and CaD and obtained a mild correlation coefficient of 0.2, suggesting that the two subtests are weakly associated, but assess very different aspects of comprehension and/or produce different scoring. Consequently, it is possible to assume that one of these subtests might be more predictive of classroom LC than the other. Similar correlations for the CELF-4 of the USP and CaD produced a higher coefficient (0.46), indicating more parity between the two subtests in the newer edition of the test (although it is not clear which changes/improvements produced this adjustment in the correlations).
5 Aims of the current study
This study investigated whether standardized assessment of LC is a valid measure of functional comprehension skills in the classroom context by comparing standardized assessment scores with teacher observation ratings. Children’s performance on two CELF-4 subtests (Semel et al., 2006) that differ in presentation and content, Understanding Spoken Paragraphs and Concepts and Directions, is compared with teacher completion of the Observational Rating Scale. This allows the exploration of whether these different subtests tap different functional LC skills and whether one is more indicative of the child’s functional LC within the classroom than the other.
II Methodology
1 Design
A correlational procedure was chosen in order that the relationships between individual participants’ performance on each of the measures could be analysed for direction, magnitude and significance. Correlation coefficients are one of the most useful statistical measures for describing the degree of relationship between variables (Trochim, 2006).
2 Participants
Full ethical approval was granted by the University of Sheffield’s Research Ethics Committee before commencing the study. Letters and consent forms were sent to seven schools inviting teachers of Year 2 and Year 3 classes (children aged 6.0–8.9 years) to participate in the project. For children of this age demands in the classroom often increase, and comprehension difficulties may become more evident and/or problematic, even when earlier delayed language development may have appeared to resolve. Three schools responded, each with one teacher consenting to participate (each teacher having several years of teaching experience). More than one school was involved in order to reduce the effect of setting-specific results. The researcher had no direct connection with the schools before the study. School details were acquired from the most recent Ofsted (The Office for Standards in Education, Children’s Services and Skills) reports (the outcome of mandatory regulatory school inspections within the UK) (see Table 1).
Table 1. Characteristics of participating schools.
Letters and consent forms were sent to all parents with a child in each participating teacher’s class. Consent was given for 30 children, all of whom were included in the study with no exclusions. Participant sample details of gender, ages and percentage consenting in each school are summarized in Table 2. No details were collected about children’s special educational needs or first language in order to maximize the sample size.
Table 2. Characteristics of sample (total n = 30).
3 Materials
CELF-4 subtests were used for standardized assessment. Understanding Spoken Paragraphs (USP) requires the child to answer five questions about each of three different test paragraphs read to them by the assessor. Concepts and Directions (CaD) requires the child to follow instructions of increasing grammatical and conceptual complexity read to them by the assessor.
Teachers’ ratings of children’s comprehension were collected using the Observational Rating Scale (ORS; Semel et al., 1996, personal communication). This scale comprises 40 descriptive statements that are categorized into four general areas. For this study only the ‘listening’ section was used as it relates directly to comprehension skills; e.g. ‘Has trouble following spoken directions’; ‘Has trouble looking at people when talking or listening.’ Written instructions ask teachers to rate each of the nine statements by indicating the frequency of the behaviour (never, sometimes, often, and always). Teachers were also able to add any additional concerns or observations that were not covered in the descriptive statements. Assessments were planned for the spring term so that participant teachers had known/taught the children for at least one term, providing more reliability to their ORS ratings.
4 Procedure
Consenting participants were allocated dates and times for standardized assessment in the afternoon sessions (based on convenience for the school/teachers), and parents were informed about assessment times but not invited. The same person (first author) carried out every assessment session to ensure consistency. Each session involved an initial general conversation with the child to familiarize the child with the researcher and to allow him or her to ask questions. Both subtests were carried out over one session in a quiet room at the child’s school and took an average time of 17 minutes to complete, with no major variation in this across participants/schools. Half of the participants completed the USP first and CaD second, the other half vice versa, so that order of presentation was not a confounding variable. Standardized assessment protocol for the CELF-4 as outlined in the test manual (including instructions and practice questions) was followed for all assessments.
Once participants had been assessed their teacher was given the ORS form (Appendix 1) to fill out, which ensured ORS results did not influence standardized assessment interpretation. The teacher was given no indication about how well each child had performed on standardized subtests when asked to complete the rating scale.
A score was then applied to ORS ratings as follows: never 4, sometimes 3, often 2 and always 1, so that higher numerical scores represented more positive ratings (i.e. absences of behaviours). The total ORS scores for each child were calculated (total possible score 36) to be used for analysis. Teachers were free to add additional comments about the children, which sometimes detailed special educational needs or first language; however, as this was not a standard request the information was not analysed specifically.
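The scoring rule described above can be sketched in a few lines of Python. The mapping and the maximum of 36 come from the text; the function name, the constant names and the example child are hypothetical and serve only as illustration.

```python
# Score assigned to each frequency rating, as described in the text.
RATING_SCORES = {"never": 4, "sometimes": 3, "often": 2, "always": 1}

# The 'listening' section of the ORS contains nine statements.
N_LISTENING_STATEMENTS = 9

def ors_total(ratings):
    """Sum the scores for one child's nine listening-statement ratings."""
    if len(ratings) != N_LISTENING_STATEMENTS:
        raise ValueError("expected nine ratings, one per listening statement")
    return sum(RATING_SCORES[rating.lower()] for rating in ratings)

# Hypothetical child: seven 'never' ratings, one 'sometimes', one 'often'.
example_ratings = ["never"] * 7 + ["sometimes", "often"]
example_total = ors_total(example_ratings)  # 7 * 4 + 3 + 2 = 33
```

A child rated 'never' on every statement receives the maximum of 36, and a child rated 'always' on every statement receives the minimum of 9.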
5 Analysis
Several comparative measures were carried out for each individual child’s results in order to be able to answer the research question and to inform the discussion. Correlation coefficients were calculated using the Spearman’s rho statistical test.
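For readers unfamiliar with the statistic, Spearman's rho can be computed as the Pearson correlation of the ranked scores, with tied values sharing the mean of their ranks. The sketch below is a plain-Python illustration of that definition, not the software used in the study; the function names and the five pairs of scores are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Hypothetical subtest raw scores and ORS totals for five children.
subtest_scores = [10, 14, 8, 17, 12]
ors_totals = [30, 33, 27, 36, 29]
rho = spearman_rho(subtest_scores, ors_totals)
```

Because the statistic depends only on rank order, it is less sensitive than Pearson's r to the ceiling effect that arises when several children receive the maximum ORS score.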
III Results
All 30 participants (age range 6.5–8.4 years) completed both standardized subtests without reluctance, discontinuation or abandonment. Teachers completed the ORS for all participants. The mean, standard deviation and range for each measure are presented in Table 3. The results for each measure show a wide range of performance from the lowest to highest score. The ORS rating showed the least proportionate range, with a skew in the mean towards the upper margin (see also Table 3), suggesting a possible lack of sensitivity in this measure at the upper end.
Table 3. Descriptive results for each measure.
Note. CaD = Concepts and Directions; ORS = Observational Rating Scale; USP = Understanding Spoken Paragraphs.
1 Distribution of data from each measure
For the ORS score a higher score indicates a more positive result (fewer occurrences of poor comprehension behaviours). The ORS produced a negative skew whereby 7 participants achieved 36 (maximum score). However, Kolmogorov–Smirnov analysis confirmed that all measures were normally distributed (USP: z = .341; CaD: z = .161; ORS: z = .145; p < .05 for all measures).
2 Correlations between measures
The combined scores for CaD and USP were significantly correlated with the total ORS score (r = .392, p = .016) with a medium effect size (see Table 4). Children who scored more highly on the two standardized subtests together were also rated more highly by their teacher on the ORS regarding functional comprehension skills. Children from the sample whose results did not fit this pattern are discussed in more depth as individual cases later. Performance on each of the subtests, USP and CaD, was then compared with the teacher ratings in turn. Scores on the CaD subtest were significantly positively correlated with the ORS rating. However, scores on the USP subtest were not significantly correlated with the ORS rating (see Table 4). The CaD and USP raw scores were significantly correlated (r = .565, p = .001) so the more highly children scored on one subtest the more highly they scored on the other.
Table 4. Correlation coefficients between USP and CaD scores and ORS.
Note. * Significant below the .05 level (one tailed); CaD = Concepts and Directions; ORS = Observational Rating Scale; USP = Understanding Spoken Paragraphs.
3 Analysis of ORS statements
Visual inspection of the data indicated that certain statements within the ORS seemed to elicit more variation in teachers’ ratings and generally lower ratings than other statements (see Table 5). Individual statement ratings from the ORS were then correlated with each of the two standardized subtest measures and, as this required multiple analyses, a Bonferroni correction was applied, adjusting the p-level to 0.005 (see Table 5). Using this as a measure of significance, only Statement 6 produced a significant positive correlation with each of the standardized subtests (significance to p = 0.001). Statement 7 also produced a medium correlation effect size with both standardized subtests individually, but this was not significant at the level of p < 0.005. The same applies to the CaD correlation with statements 4 and 5. Ratings on other statements were not significantly correlated with either of the standardized subtest raw scores.
Table 5. Median and standard deviation for ORS statement ratings.
Note. CaD = Concepts and Directions; ORS = Observational Rating Scale; USP = Understanding Spoken Paragraphs.
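A Bonferroni correction divides the nominal significance level by the number of comparisons. The short Python sketch below illustrates the idea; it assumes that the nine listening statements form the family of comparisons, which the text does not state explicitly. Dividing .05 by nine gives roughly .0056, close to the adjusted level of 0.005 reported for the statement-level analysis. The function names are hypothetical.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons

# Assumption: one comparison per ORS listening statement (nine in total).
adjusted_alpha = bonferroni_alpha(0.05, 9)  # roughly 0.0056

def is_significant(p_value, threshold=0.005):
    """Compare a p-value with the adjusted level reported in the text."""
    return p_value <= threshold
```

Under the reported threshold, a correlation with p = 0.001 counts as significant, whereas one with p = 0.01 does not, even though it would pass an uncorrected .05 criterion.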
4 Agreement across measures in individual participants
The data was further analysed at the level of individual performance with respect to variation across the three measures. The USP and CaD raw scores were converted to standard scores, which allowed investigation of whether children performed within the average range in comparison to their peer population. The average range for the standard scores was taken as between 8 and 12, using the CELF-4 recommendations, i.e. between the 25th and 75th centile. In the absence of standardized data for the ORS in the CELF-4 manual, the data from this study sample of 30 participants was used to calculate an ‘average range’ figure of 28–36 whereby a score of 27 or below on the ORS for this sample places a child below the 25th centile.
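The banding rule described above can be expressed as a small classification function. In the Python sketch below, the range boundaries are taken from the text (standard scores of 8 to 12 for the subtests, ORS totals of 28 to 36 for this sample); the function names and the example child are hypothetical.

```python
def classify(score, lower, upper):
    """Place a score below, within or above an inclusive average range."""
    if score < lower:
        return "below"
    if score > upper:
        return "above"
    return "within"

SUBTEST_RANGE = (8, 12)  # CELF-4 standard scores, 25th to 75th centile
ORS_RANGE = (28, 36)     # derived from this sample; 36 is the maximum score

def profile(usp, cad, ors):
    """Classify one child's three scores against the average ranges."""
    return {
        "USP": classify(usp, *SUBTEST_RANGE),
        "CaD": classify(cad, *SUBTEST_RANGE),
        "ORS": classify(ors, *ORS_RANGE),
    }

# Hypothetical child, not a participant from the study.
example_profile = profile(usp=9, cad=7, ors=27)
```

Because the upper bound of the ORS range equals the maximum possible score, no child can be classified as above average on the ORS under this rule.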
Agreements between measures were based on whether participants’ scores placed them within the ‘average range’ (inter-quartile range), not accounting for confidence intervals; 14/30 participants had scores within the average range on all three measures. The remaining 16 participants were identified as having scores out of the average range on one or more of the three measures using the criteria above, and these individuals were explored in more detail. Four of these showed average/above average scores across all three measures (3/4 were given the highest possible rating from their teacher on the ORS) (see Table 6).
Table 6. Participants scoring strongly / above average with agreement across measures.
Note. CaD = Concepts and Directions; ORS = Observational Rating Scale; USP = Understanding Spoken Paragraphs.
In summary, 19 participants in total (63%) showed general agreement in performance across all three measures and there were 11 participants for whom there was non-agreement in performance across two or more of the measures. Table 7 shows 5 participants with disagreement between scores on the USP and CaD (standardized measures). Participants 10 and 30 scored significantly below average on the USP subtest but average on the CaD subtest. Participants 16, 17 and 20 scored within the average range on the USP but lower than average on the CaD subtest. In these five cases the CaD score agreed with the ORS rating, and the USP disagreed with the ORS rating (except for participant 17) (see Table 7).
Table 7. Participants with disagreement between CaD and USP scores.
Note. ADHD = attention deficit hyperactivity disorder; CaD = Concepts and Directions; ORS = Observational Rating Scale; USP = Understanding Spoken Paragraphs.
The remaining 6 participants had scores on both USP and CaD (the standardized subtests) that disagreed with the ORS ratings (see Table 8). The teachers of participants 7 and 24 rated them as below the 25th centile, whereas standardized test performance places them within the average range. In contrast, teachers of participants 15 and 27 rated them as within the average range, whereas standardized test performance was below the 25th centile. Notably, participants 5 and 26 scored above the 75th centile on one or both of the standardized tests but were rated as below the 25th centile for their comprehension skills on the ORS, signalling wide non-agreement between the standardized test measures and the ORS rating.
Table 8. Participants with disagreement between standardized measures and ORS.
Note. CaD = Concepts and Directions; ORS = Observational Rating Scale; USP = Understanding Spoken Paragraphs.
IV Discussion
This article explores collaborative assessment with school-age children. The study addresses the question of whether standardized assessments of comprehension by SLT agree with teachers’ perceptions of functional comprehension skills in the classroom, as measured on a published rating scale.
1 Correlations/agreements
Initial analysis indicates that participants’ combined USP and CaD raw scores correlate to a moderate level with their ORS comprehension rating. This is comparable to findings of Semel et al. (1996, personal communication), although without using the whole battery of comprehension subtests. This implies that completing two subtests alone can give a reliable overall picture of functional performance when used summatively. This is particularly the case for children who have no comprehension difficulties: the 19 out of 30 (63%) in this sample who had full agreement across all measures all scored in the average or above average range. Botting et al. (1997) report a similar figure of 67% agreement between standardized comprehension scores and teacher opinion. The results challenge Massa et al.’s (2008) conclusion that the ORS domain scores lack specificity. However, Massa et al. used the ORS in its entirety, measuring speaking, listening, reading and writing, whereas this study focused on comprehension skills only (the ‘listening component’).
2 Non-agreements
There were six individual participants (20%) for whom performance on the two standardized subtests differed from the functional assessment rating. Participants 15 and 27 (6.7% of sample) were rated within the average range on the ORS but below average on both standardized subtests. Although not specifically investigating comprehension, Semel et al. (1995) report similar figures: 7.4% of the sample were identified as having no difficulties by the school system but scored below average on the CELF-3. Twenty-three percent of the SLI sample reported by Botting et al. (1997) also showed this profile. This implies that some children with language comprehension (LC) difficulties are unidentified by the teacher or that LC difficulties may not pose a problem functionally in the classroom. This may be because non-linguistic/contextual cues are available to aid comprehension (which holds implications for their future where demands may increase or where cues are unavailable). It is also important to acknowledge that written language comprehension was not assessed in this study but should be considered. For example, children who may manage functionally in the classroom using pragmatic skills and context may still struggle with understanding written language. A replication of this study which also measures written comprehension both with standardized and observational tests would contribute to this discussion.
Four participants (5, 7, 24 and 26: 13%) were rated below average on the ORS but scored within the average range on both standardized subtests. This would suggest that use of the USP and CaD subtests may miss comprehension difficulties that some children experience functionally in the classroom. There is consequently a need for a blind replication of this study with a sample of children with identified language difficulties in order to explore in more depth what these missing elements may be.
Participants 5 and 26 are unusual cases as they scored above average for the standardized subtests but below average on the ORS, showing a marked contrast. Both teachers added comments to their ratings for these children; one regarding fine motor skills, and the other regarding autistic spectrum issues. Here it is possible that the ORS taps into social and pragmatic comprehension difficulties that some children experience and that are not targeted by the standardized assessments. This issue could be explored further with a more specific sample and wider functional assessments (e.g. using SLT observations).
3 Implications for practice
It could be argued that observational ratings alone are insufficient for the identification of LC difficulties; as Paul (2000) identifies, comprehension signals are not comprehension itself. This study has highlighted how not all abilities identified on standardized assessment can be observed in the classroom (for 6 children the functional measure did not reflect the combined USP and CaD scores). Some difficulties may be less obvious in the classroom, but as demands change these could lead to later academic failure and potentially to behaviour difficulties or withdrawal. This highlights the need for both wider standardized and more observational types of assessment for identification purposes, or the use of more sophisticated observation schedules.
The predictive validity of standardized SLT assessments must also be considered. Such assessments tell us about a child’s abilities at a ‘static’ point in time, but skills and strengths change over time and through development (Law et al., 2008b). In the UK, the Department for Education (2011) reported that ‘We know that effective multi-agency assessments look at the child’s overall needs and are used as a dynamic process rather than representing a snapshot of their support needs’ (p. 36). Gardner et al. (2006) highlight the linguistic diversity in SLI, which can only be described in depth by using standardized assessment of the linguistic system. Similarly, standardized assessment helps us to better understand the roots of a comprehension difficulty, which could lead to better intervention.
Peets (2009) found that different classroom contexts have a varying impact upon turn taking, self-monitoring and language productivity/complexity. This suggests that – should they be able to allocate time to closer observation – teachers are the best informants of functional comprehension skills, as they see the children every day in different contexts/situations. The correlation between teachers’ ratings and standardized SLT assessment in this project was found using just nine ORS statements, taking approximately 10 minutes of teacher time per child. Justice et al. (2002) have highlighted that SLT services are likely to be too stretched to screen for language difficulties in mainstream schools, and therefore it would be good practice to make greater use of teachers’ knowledge of the children in their class to identify those who may require further assessment.
The two standardized subtests correlated significantly with each other (r = .565), and this is comparable to the correlation coefficient r = .46 found by the CELF-4 authors (Semel et al., 2006). This suggests that both subtests are tapping some of the same abilities in comprehension. When examining specificity, the two subtests are similarly effective at ruling out difficulties. The USP identified 80% of participants as not having any difficulties who also had ORS ratings within the average range, and this figure was 83% for the CaD scores and ORS rating. However, these figures do leave 17%–20% of participants with a mismatch between teacher rating and performance on standardized assessment, suggesting that reliance on ORS ratings might result in some children in need not being investigated further, or in children who may not need assessment being referred. In considering sensitivity, the USP subtest identified only 20% of children who showed below average scores on the ORS, and the CaD subtest identified 50% of children with below average ratings on the ORS. This would indicate a potential lack of identification of children with poorer functional communication. However, these percentages are based on a very small sample of 5 or 6 participants, and may not be very reliable.
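The sensitivity and specificity figures discussed here follow from a simple two-by-two comparison of a subtest against the teacher rating. The sketch below shows one way this could be computed in Python, treating the ORS rating as the reference, as the paragraph above does. The function name `sensitivity_specificity` and the ten ratings are hypothetical and purely illustrative; they are not the study data.

```python
def sensitivity_specificity(reference_flagged, test_flagged):
    """Compare a test against a reference rating, both given as lists of booleans.

    True means "below average" (a difficulty is flagged).
    Returns (sensitivity, specificity) as proportions, or None where undefined.
    """
    if len(reference_flagged) != len(test_flagged):
        raise ValueError("Both lists must describe the same children")

    pairs = list(zip(reference_flagged, test_flagged))
    true_pos = sum(1 for ref, test in pairs if ref and test)
    false_neg = sum(1 for ref, test in pairs if ref and not test)
    true_neg = sum(1 for ref, test in pairs if not ref and not test)
    false_pos = sum(1 for ref, test in pairs if not ref and test)

    flagged = true_pos + false_neg        # children flagged by the reference
    not_flagged = true_neg + false_pos    # children not flagged by the reference
    sensitivity = true_pos / flagged if flagged else None
    specificity = true_neg / not_flagged if not_flagged else None
    return sensitivity, specificity


# Hypothetical ratings for ten children (illustrative only, not the study data).
ors_below = [True, True, True, True, False, False, False, False, False, False]
subtest_below = [True, True, False, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(ors_below, subtest_below)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Because sensitivity is calculated only over the children flagged by the reference rating, a small flagged group makes the estimate unstable: with five flagged children, reclassifying a single child shifts sensitivity by 20 percentage points.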
Performance on the USP subtest is partly reliant on an expressive response, and for some children expressive language may be less developed than comprehension; this may explain why USP scores alone were not significantly correlated with ORS ratings. Additionally, CELF-4 reliability data indicated that split-half and test–retest reliability for the USP subtest was only .63, although Semel et al. (2006) suggest that this was possibly due to the subtest having relatively few items and a small scoring range. Performance on the CaD subtest was significantly correlated with ORS rating. This subtest is instruction-based and sequential in nature and so may reflect functional comprehension required in the classroom more closely, and difficulties with this subtest may reflect difficulties with understanding that are also identified by teachers.
V Conclusions
This study explores how children’s performance on two standardized comprehension subtests relates to functional classroom comprehension as rated by class teachers. The combined score across two language comprehension subtests of the CELF-4 appears to provide a functionally valid measure of comprehension skills for approximately 80% of children. The Concepts and Directions subtest may provide a more valid measure of functional comprehension than the Understanding Spoken Paragraphs subtest, as CaD scores were significantly correlated with teacher-rated classroom comprehension skills and USP scores were not. For screening purposes, functional assessment using the ORS may be useful. However, some children showed uneven profiles, with a mismatch in performance across the standardized subtests and between the subtests and the teachers’ observational rating. Reliance on the ORS for screening for language comprehension difficulties may result in some children with poorer skills remaining unidentified. Further investigations are warranted with a larger sample that includes children with known language difficulties. However, these preliminary findings provide some indications that the CaD subtest and the ORS are potentially useful, and that teachers and SLTs can collaborate to identify children in need of additional support for language.
Appendix
Teacher ratings record sheet.
Directions: The following statements describe comprehension problems that some children have. Tick the box beneath the appropriate heading (Never, Sometimes, Often or Always) that best describes how often each behaviour happens for this child. Try not to leave any blank. Fill them in to the best of your knowledge.
| | Never | Sometimes | Often | Always |
|---|---|---|---|---|
| Has trouble paying attention. | | | | |
| Has trouble following spoken directions. | | | | |
| Has trouble remembering things people say. | | | | |
| Has trouble understanding what people are saying. | | | | |
| Has to ask people to repeat what they have said. | | | | |
| Has trouble understanding the meaning of words. | | | | |
| Has trouble understanding new ideas. | | | | |
| Has trouble looking at people when talking or listening. | | | | |
| Has trouble understanding facial expressions, gestures or body language. | | | | |
For (roughly) how long have you been teaching this child?___________________
Is there anything else you would like to add to describe the child’s skills in the classroom?
Thank you for completing this questionnaire.
Source. Adapted from Semel et al., 2006.
Acknowledgements
The authors would like to thank all of the teachers and children who generously gave their time and effort to participate in the study. They would also like to thank Cornwall and Isles of Scilly SLT service who kindly allowed the use of their CELF-4 assessment for the duration of the study. This study was completed in partial fulfilment of MSc in Language and Communication Impairment in Children at The University of Sheffield.
Funding
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
