Abstract
Literature on “Thin Slice” ratings indicates that a number of personality characteristics and behaviors can be accurately predicted by ratings of very short segments (<5 min) of behavior. This study examined the utility of Thin Slice ratings of young children with autism spectrum disorder for predicting developmental skills and language gains over time. A total of 22 preschool-aged children with autism spectrum disorder participated in a battery of developmental assessments and a video-taped therapist–child interaction at Time 1. They then participated in follow-up testing of language skills and a second therapist–child interaction 6 months later (Time 2). Groups of approximately 25 naïve undergraduate students provided impression ratings (“Thin Slice ratings”) about each child’s skills and behaviors during 2-min segments taken from the therapist–child interaction videos at each time point. Thin Slice ratings at Time 1 were highly correlated with child scores on several developmental assessments at Time 1. In addition, Thin Slice ratings at Time 1 predicted gain in parent-reported expressive vocabulary over the course of 6 months, over and above the predictive utility of Time 1 vocabulary size. These findings provide preliminary evidence for the concurrent and predictive validity of Thin Slice ratings in young children with autism spectrum disorder.
Introduction
Most young children with autism spectrum disorder (ASD) experience delays in acquiring language skills (see Eigsti et al., 2011 for review), and earlier acquisition of functional language predicts better outcomes for these children (e.g. Szatmari et al., 2003). Several pre-linguistic skills, including imitation and joint attention (e.g. Thurm et al., 2007), have been identified as predictors of language growth over time. However, there is still large variability in language outcomes for children with ASD that is not well-predicted by commonly used developmental assessments. Thus, additional strategies for accurately predicting language outcomes over time for children with ASD are needed.
There is significant controversy in the fields of medicine and psychology about the utility of clinical inference for making diagnostic and treatment decisions. While many practitioners argue that clinical inference is a key part of effective practice, others contend that such subjective judgments are inferior to statistical prediction and prone to cognitive errors on the part of clinicians (see Grove et al., 2000). However, a growing body of literature on “Thin Slice” judgments (i.e. impressions based upon very short segments of dynamic verbal or non-verbal behavior) suggests that subjective ratings of behavior may contain useful information about a variety of clinically relevant behaviors. In past studies, trained and untrained raters have been able to predict personality traits and disorders, trait anxiety, depression, and suicidality after viewing very short (seconds to minutes long) audio or video clips (see Slepian et al., 2014 for review). In addition, these ratings have been found to have predictive value for therapeutic outcomes, such as treatment dropout and symptom improvement during cognitive therapy for depression (Sasso and Strunk, 2013). In most cases, non-verbal behavior appears to contribute more to the accuracy of these ratings than verbal behavior (Slepian et al., 2014).
Recently, several researchers have applied Thin Slice ratings to children with ASD. Grossman (2014) found that adult raters could accurately judge social awkwardness in children with ASD from clips as short as 1 s. In addition, Walton and Ingersoll (2012) found that Thin Slice ratings of 2-min video clips were able to detect subtle changes in interaction style between children with ASD and their siblings following a sibling-implemented intervention program. However, Thin Slice ratings have not been examined in detail in young children with ASD to determine how these ratings may relate to and predict gains in standardized developmental measures.
In this article, we examine the contribution of subjective ratings of brief (2 min) video clips of children with ASD interacting with a therapist for predicting gains in language skills over the course of 6 months. We hypothesize that Thin Slice ratings will correlate with children’s scores on standardized developmental measures. We will also explore whether these ratings contribute to predicting language gains over time in children with ASD, over and above the predictive power of standardized developmental measures.
Methods
Participants
Participants in this study were 22 children (20 boys, 2 girls) with ASD ranging in age from 22 to 47 months at the time of the first assessment. All participants were diagnosed with ASD by an outside professional and met cutoff for “Autism” or “autism spectrum” on Module 1 of the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 2000). See Table 1 for participant characteristics.
Table 1. Participant characteristics.
ADOS: Autism Diagnostic Observation Schedule; BSID-III: Bayley Scales of Infant and Toddler Development–3rd Edition; ESCS: Early Social Communication Scales; PLS-4: Preschool Language Scale–4th Edition; MCDI: MacArthur-Bates Communicative Development Inventory.
Measures
All children completed the following developmental assessments at Time 1 (T1).
Autism symptoms
The ADOS (Lord et al., 2000) is a play-based, clinician-rated assessment of autism symptoms. All children in this study were assessed using Module 1 of the ADOS (for children who are non-verbal or use single words). Communication + Social Interaction Total (C + SI) scores for Module 1 can range from 0 to 24, with higher scores indicating more autism symptoms. In combination with individual cutoffs for C and SI behaviors, scores of 12 or above on C + SI Total indicate an ADOS classification of “Autism” and scores between 7 and 11 indicate a classification of “autism spectrum.” The ADOS was used to confirm diagnosis of ASD and to estimate autism severity.
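The total-score bands described above can be sketched as a small lookup. This is an illustration only: the function name, the `meets_domain_cutoffs` flag, and the "below cutoff" label are hypothetical, and the individual C and SI cutoffs, which the text mentions but does not list, are not reproduced here.

```python
def classify_ados_module1(c_si_total, meets_domain_cutoffs=True):
    """Map a Module 1 C + SI Total onto the classification bands given in the text.

    meets_domain_cutoffs is a hypothetical flag standing in for the separate
    C and SI cutoffs, whose values are not stated in this article.
    """
    if not 0 <= c_si_total <= 24:
        raise ValueError("Module 1 C + SI Total ranges from 0 to 24")
    if not meets_domain_cutoffs:
        return "below cutoff"
    if c_si_total >= 12:
        return "Autism"
    if c_si_total >= 7:
        return "autism spectrum"
    return "below cutoff"
```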
Developmental level
The Bayley Scales of Infant and Toddler Development–3rd Edition (BSID-III; Bayley, 2006) is a standardized developmental assessment normed for children 1–42 months of age. Children were asked to complete a number of problem-solving tasks such as matching, puzzles, and finding hidden objects. As some children were outside the standard age range for the BSID-III, the age equivalent (AE) rather than the standard score was used in all analyses. The BSID-III cognitive subscale was used to estimate children’s non-verbal cognitive skills at T1.
Joint attention
The Early Social Communication Scales (ESCS; Siebert et al., 1982) is a play-based assessment of early social communication skills; administration procedures are standardized, but developmental norms are not available. The number of joint attention initiations (IJA; e.g. gaze switches, showing, pointing) and percentage of joint attention responses (RJA; i.e. following a pointing gesture) demonstrated by the child during the ESCS were used as measures of joint attention skills. Number of IJA can be as low as 0, with no set ceiling. Percentage of RJA can range from 0% to 100%.
Children completed the following assessments at T1 and Time 2 (T2), approximately 6 months later.
Imitation
The Unstructured Imitation Assessment (UIA; Ingersoll and Meyer, 2011) is a play-based assessment of a child’s ability to imitate in a social-interactive context. Administration procedures are standardized; however, developmental norms for this assessment are not available. The UIA consists of 20 imitation tasks presented in a play-based context in which toys are freely available. Imitation of each model is scored 0 for no imitation, 1 for partial or emerging imitation, and 2 for full imitation, with total scores ranging from 0 to 40. The UIA was used to measure children’s ability to imitate in a social-interactive context.
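As a minimal sketch of the scoring rule just described, the UIA total is the sum of 20 item scores, each coded 0, 1, or 2. The function name and the input format (a plain list of item scores) are assumptions made for illustration.

```python
def uia_total(item_scores):
    """Sum 20 imitation items scored 0 (none), 1 (partial/emerging), or 2 (full)."""
    if len(item_scores) != 20:
        raise ValueError("the UIA consists of 20 imitation tasks")
    if any(score not in (0, 1, 2) for score in item_scores):
        raise ValueError("each item is scored 0, 1, or 2")
    return sum(item_scores)  # possible range: 0 to 40
```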
Language
The Preschool Language Scale–4th Edition (PLS-4; Zimmerman et al., 2002) is a standardized measure of language development that is normed for children aged birth to 6 years, 11 months. Children were asked to complete a variety of tasks such as naming objects and pictures and answering simple questions. Many of the children in this study achieved standard scores at the floor of the assessment. Therefore, AEs on the Auditory Comprehension and Expressive Communication Scales were used to provide measures of children’s receptive and expressive language skills, as AE captured greater developmental variability than standard scores in this sample. The MacArthur-Bates Communicative Development Inventory (MCDI; Fenson et al., 2007) is a developmentally normed parent-report checklist containing 396 words (Words and Gestures) or 680 words (Words and Sentences) that are commonly found in the vocabularies of young children. Parents were asked to check off words that their child understands and says. Because many of the children in this sample scored at or near floor level when using age-based percentile scores, the number of words produced was used as a measure of expressive vocabulary.
Video clips
Continuous 2-min video segments of each child interacting with a clinician were used for the Thin Slice ratings. Each segment spanned minutes 2–4 of the UIA at both T1 and T2. This time window was chosen before any of the specific video clips were viewed so that it would sample the middle of the interaction, and the same continuous 2-min window was used for each child regardless of the child or clinician behaviors that occurred during that time. Clips included both audio and video.
Thin Slice ratings
Thin Slice ratings were completed by 255 undergraduate students drawn from a psychology research participation pool. Raters were told that they would be watching videos of children and adults playing together and would be asked to provide their opinions about the videos. They were not told any information about the study procedures or hypotheses, the children’s developmental status, or whether they were viewing videos from T1 or T2. Raters were split into 10 groups; each rater participated in only one group and the total number of raters for each clip ranged from 23 to 27. Each group viewed between three and six video clips. Each set of video clips was selected to include an approximately equal number of T1 and T2 clips and to include a maximum of one clip of any individual child. After viewing each 2-min video clip a single time, raters were instructed to score the clip on five separate statements using a 1–5 Likert-type scale. A score of 1 indicated “strongly disagree,” 3 indicated “neither agree nor disagree,” and 5 indicated “strongly agree.” The five statements the raters were asked to respond to were as follows: (1) the child imitates actions with toys modeled by the adult, (2) the child shows an interest in the adult, (3) the child plays with toys appropriately, (4) the child uses language appropriately, and (5) the child uses appropriate gestures in his or her own play and/or communication. Cronbach’s alpha for the items on the scale was 0.795, indicating good internal consistency. Therefore, scores for the five items were averaged to create a single score for each rater for each video. Then, scores from all raters of each video were averaged to create a single “Thin Slice rating” for each video clip that could range from 1 to 5. The reliabilities of each group of judges’ ratings were computed using intraclass correlations (ICCs) with a two-way random effects model for absolute agreement. 
ICCs for the different rater groups ranged from 0.884 to 0.989, with a mean ICC of 0.956 (see Table 2).
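The two-stage averaging and the internal-consistency check described above can be sketched as follows. This is a sketch under stated assumptions, not the authors' analysis script: the function names and data layout are hypothetical, and the text does not say which set of observations the reported alpha of 0.795 was computed over.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of observations, each a list of k item scores."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def thin_slice_rating(ratings_by_rater):
    """Average the five 1-5 item scores within each rater, then across raters.

    ratings_by_rater holds one list of five item scores per rater for one clip.
    """
    rater_means = [mean(items) for items in ratings_by_rater]
    return mean(rater_means)
```

Because every rater scores all five items, averaging within raters first and then across raters gives the same value as one pooled mean; the two steps are kept separate here only to mirror the description in the text.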
Table 2. Reliabilities of judges’ ratings.
Intraclass correlations for all judges and one judge using a two-way random effects model for absolute agreement.
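For readers unfamiliar with this reliability index, the sketch below computes the single-judge and all-judges ICCs for a two-way random effects model with absolute agreement, using the standard ANOVA mean-square formulas. The function name and the input layout (one row per clip, one column per judge, no missing ratings) are assumptions; the statistical software actually used is not specified in the text.

```python
def icc_two_way_random(ratings):
    """Return (single_judge_icc, all_judges_icc) for absolute agreement.

    ratings[i][j] is the score given to clip i by judge j; the matrix must be
    complete (every judge rates every clip).
    """
    n = len(ratings)        # number of clips (targets)
    k = len(ratings[0])     # number of judges
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    clip_means = [sum(row) / k for row in ratings]
    judge_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_clips = k * sum((m - grand_mean) ** 2 for m in clip_means)
    ss_judges = n * sum((m - grand_mean) ** 2 for m in judge_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_clips - ss_judges

    ms_clips = ss_clips / (n - 1)
    ms_judges = ss_judges / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_clips - ms_error) / (
        ms_clips + (k - 1) * ms_error + k * (ms_judges - ms_error) / n
    )
    average = (ms_clips - ms_error) / (ms_clips + (ms_judges - ms_error) / n)
    return single, average
```

The all-judges value describes the reliability of the averaged rating, which is the quantity used as the Thin Slice rating in this study; the single-judge value describes how reliable one rater would be alone.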
Results and discussion
Construct validity of Thin Slice ratings
To examine whether Thin Slice ratings were related to concurrent scores on developmental assessments, bivariate correlations between T1 Thin Slice ratings and T1 developmental assessment scores were calculated. Significant bivariate correlations were detected between a child’s T1 Thin Slice rating and his or her T1 scores for all of the developmental assessments: ADOS C + SI Score (r = −0.442, p < 0.05); BSID-III Cognitive AE (r = 0.739, p < 0.05); PLS-4 AE for Expressive Communication (r = 0.672, p < 0.01); PLS-4 AE for Auditory Comprehension (r = 0.553, p < 0.01); MCDI Expressive Vocabulary (r = 0.497, p < 0.05); UIA Total Score (r = 0.658, p < 0.01); and ESCS Responses to Joint Attention (r = 0.561, p < 0.01). Because many of these variables (e.g. number of words in expressive vocabulary, language AE) would be expected to increase significantly as child age increased, partial correlations controlling for child age were also examined. Partial correlations between the T1 Thin Slice ratings and scores on the following measures remained significant: BSID-III Cognitive AE (r = 0.590, p < 0.01); PLS-4 AE for Expressive Communication (r = 0.555, p < 0.01); and UIA Total Score (r = 0.580, p < 0.01). See Table 3 for correlations.
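The bivariate and age-partialled correlations reported above can be illustrated with a short sketch. It uses the usual first-order partial correlation formula built from the three pairwise Pearson correlations; the function names are hypothetical, and the inputs would be per-child lists (for example, Thin Slice rating, assessment score, and age in months).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation between x and y after removing the linear effect of z (e.g. age)."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

When both variables rise with age, part of their bivariate correlation reflects that shared trend, which is consistent with the partial correlations above being smaller than the corresponding bivariate values.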
Bivariate and partial (controlling for age) correlations among developmental measures and Thin Slice ratings at Time 1.
ADOS: Autism Diagnostic Observation Schedule; BSID-III: Bayley Scales of Infant and Toddler Development–3rd Edition Cognitive Age Equivalent; PLS-4 Exp.: Preschool Language Scale–4th Edition Expressive Communication Age Equivalent; PLS-4 Rec.: Preschool Language Scale–4th Edition Auditory Comprehension Age Equivalent; MCDI: MacArthur-Bates Communicative Development Inventory Number of Words Produced; UIA: Unstructured Imitation Assessment; IJA: Number of joint attention initiations on the Early Social Communication Scales; RJA: Percentage of joint attention responses on the Early Social Communication Scale.
*p < 0.05; **p < 0.01; ***p < 0.001.
Notably, many of the correlations between Thin Slice ratings and the developmental measures were similar in magnitude to the correlations among the different developmental measures. These strong correlations suggest that Thin Slice ratings captured meaningful developmental information that generalizes beyond this 2-min sample of behavior. This finding lends some credence to the idea that the clinical impressions of trained clinicians (which would be expected to be more accurate than the observations of untrained raters; Garb and Boyle, 2003) may have significant value in understanding a child’s developmental skills and needs. In addition, T1 and T2 Thin Slice ratings were significantly correlated with one another (r = 0.621, p < 0.01), suggesting some stability in this measure over time. While this information provides only very preliminary evidence of the construct validity of Thin Slice ratings for this population, it suggests that further study is warranted.
Thin Slice ratings as a predictor of language growth
To examine whether T1 Thin Slice ratings predicted growth in language skills, over and above the growth predicted by standardized developmental measures, a series of regression analyses was conducted. Change scores (T2−T1) for each language outcome (i.e. PLS-4, MCDI) were entered as dependent variables. Predictor variables were entered in a single step and included the T1 score for the dependent variable, T1 scores for all other developmental assessments, and the T1 Thin Slice rating. None of the T1 standardized developmental assessment scores emerged as significant predictors of change in MCDI or PLS-4 scores. Therefore, they were dropped from the final regression models.
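The structure of such a model can be sketched in a few lines. The example below assumes a final model that regresses the change score on the T1 score for the same measure and the T1 Thin Slice rating; the function names are hypothetical, the fit uses plain ordinary least squares via the normal equations, and no significance tests or standardized coefficients are computed.

```python
def ols(predictor_rows, outcome):
    """Ordinary least squares; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + [float(v) for v in row] for row in predictor_rows]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, outcome))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot_row = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        if abs(pivot) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]

def fit_change_model(t1_scores, t2_scores, thin_slice_ratings):
    """Regress change (T2 - T1) on the T1 score and the T1 Thin Slice rating."""
    change = [t2 - t1 for t1, t2 in zip(t1_scores, t2_scores)]
    predictors = list(zip(t1_scores, thin_slice_ratings))
    return ols(predictors, change)  # [intercept, b_t1_score, b_thin_slice]
```

Including the T1 score as a predictor is what allows the Thin Slice coefficient to be read as a contribution over and above the child's starting level.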
T1 Thin Slice rating emerged as a significant predictor of growth in MCDI Expressive Vocabulary (β = 0.38, p < 0.01). That is, a child with a T1 Thin Slice rating 1 standard deviation (SD) above the mean would be predicted to gain an additional 21 words of expressive vocabulary on the MCDI over 6 months compared to a child with a T1 Thin Slice rating at the mean. T1 Thin Slice ratings were not a significant predictor of change in PLS-4 scores (see Table 4). These results indicate that brief subjective ratings were able to predict changes in parent-reported child expressive vocabulary over time, even after controlling for T1 vocabulary size. This is particularly impressive, given that none of the standardized developmental assessments given at T1 emerged as significant predictors of change in vocabulary size or other language outcomes. However, the finding that Thin Slice ratings predicted gain in MCDI score, but not in PLS-4 score, is somewhat puzzling. One possible explanation for this discrepancy is that the MCDI represented a more change-sensitive measure of language skill. While the average gain in vocabulary size was 49 words (around 70% gain from pre-test), the average gain in PLS-4 AE was just 2.9 months for expressive language (about a 17% gain) and 3.6 months for receptive language (about a 21% gain). This relatively larger gain and greater variability in gain in MCDI scores may have increased the sensitivity for this measure compared to the PLS-4. It is also possible that a number of factors other than language skills (e.g. attention, motivation, compliance) heavily impacted children’s performance on the PLS-4 both at T1 and T2. These “test taking” factors are expected to be relatively stable and might mask language changes that occurred over the course of 6 months. As a parent-report measure, the MCDI would likely be less influenced by these “test taking” factors. 
Finally, it is possible that the specific behaviors detected by the Thin Slice ratings are differentially related to growth in vocabulary skills, as opposed to other more structural language skills.
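The step from the standardized coefficient to a gain expressed in words follows from the definition of a standardized β: a 1 SD increase in the predictor shifts the predicted outcome by β SDs of the outcome. The sketch below makes that arithmetic explicit. The function names are hypothetical, and the SD of the MCDI change score is not reported in this section, so the value of roughly 55 words is only what the reported β and the 21-word figure jointly imply; rounding in either number would change it.

```python
def raw_gain_per_sd(beta, sd_outcome):
    """Change in the outcome (raw units) for a 1 SD increase in the predictor."""
    return beta * sd_outcome

def implied_sd_outcome(beta, raw_gain):
    """Back-calculate the outcome SD consistent with a reported beta and raw gain."""
    return raw_gain / beta

# Back-calculated from the reported values; not a statistic given in the article.
implied_sd = implied_sd_outcome(beta=0.38, raw_gain=21)   # roughly 55 words
check = raw_gain_per_sd(beta=0.38, sd_outcome=implied_sd)  # recovers the 21 words
```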
Table 4. Linear regressions predicting change in language skills over time.
MCDI: MacArthur-Bates Communicative Development Inventory; PLS-4: Preschool Language Scale–4th Edition; T1: Time 1; T2: Time 2.
Sample statistic = F for overall models, t for individual predictors.
**p < 0.01; ***p < 0.001.
This study has several limitations that should be addressed in future research. The sample of children with ASD in this study was relatively small, primarily male, and composed primarily of children experiencing significant language and cognitive delays. Therefore, it is difficult to determine whether Thin Slice ratings would have similar concurrent or predictive validity in a more heterogeneous population of children with ASD. Second, this study examined ratings of children’s behaviors during a single interaction with one adult. While the adult’s behavior was somewhat standardized (in that all video clips were taken from the same semi-structured assessment), it is likely that cues contained in the adult’s behavior contributed to Thin Slice ratings. In previous studies examining thin slices, raters have detected teachers’ expectancies about students’ performance, as well as relationship or rapport between two individuals (Ambady et al., 2000). Therefore, it is possible that the expectancies or beliefs of the adult about the child’s skills or abilities were evident in the adult’s behavior during the video clip and may have influenced Thin Slice ratings. In future studies, it would be useful to examine concurrence among ratings of videotapes taken during interactions with different adults and in different contexts.
Furthermore, this study is unable to elucidate which specific features of children’s behavior contributed most strongly to Thin Slice ratings. Previous research on Thin Slice ratings indicates that ratings are strongly influenced by dynamic non-verbal behavior expressed through any channel of communication (i.e. body, face, voice) or their combination (Ambady et al., 2000). Children with ASD exhibit significant impairment in their use of non-verbal behavior (American Psychiatric Association, 2013). Therefore, it is possible that subtle vocal and non-verbal features associated with ASD (e.g. unusual vocal tone, asynchrony between verbal and non-verbal behaviors, unusual patterns of movement) may have significantly influenced Thin Slice ratings in this study. Future research that can identify specific non-verbal behaviors that influence these ratings and predict developmental outcomes would be highly useful for improving diagnostic models and outcome predictions for children with ASD. These findings provide preliminary evidence for the concurrent and predictive validity of Thin Slice ratings in young children with ASD. More extensive investigation of the utility of these types of ratings is warranted, as they may provide a relatively quick, easy-to-collect, and socially valid measure of behavior in young children with ASD.
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
