Abstract
Video modeling involves the learner viewing videos of a model demonstrating a target skill. According to the National Professional Development Center on Autism Spectrum Disorders (2011), video modeling is an evidence-based intervention for individuals with Autism Spectrum Disorder (ASD) in elementary through middle school. Little research exists evaluating video modeling for individuals with ASD in high school. This study examined the effectiveness of video modeling in facilitating the development of word recognition and pronunciation in three male high school students with ASD. A single-case multiple baseline experimental design across participants (i.e. video modeling sequentially implemented across three students) was used to evaluate the effectiveness of video modeling. Results indicate that video modeling was effective in facilitating word recognition and pronunciation. Findings suggest that video modeling may be a viable intervention to foster the reading development of adolescents with ASD.
I Introduction
Approximately one out of every 88 children is identified with an Autism Spectrum Disorder (ASD) (Centers for Disease Control and Prevention, 2012). ASD is characterized by several traits including social impairment, communication impairment, and rigidity in thinking (Moore et al., 2000). Over the past 10 years, the prevalence rates of ASD have increased by 78% (Insel, 2012). Although individuals with ASD can live fulfilling lives, research shows that only 4% to 12% of adults with ASD live independently (Billstedt et al., 2005).
Individuals with ASD may have difficulty developing functional academic skills, including reading. Although there is an assumption that individuals with ASD demonstrate a relative strength in reading, data regarding levels of reading in this population are lacking. One hypothesis is that because weak oral language skills place children at risk for reading difficulties, and individuals with ASD commonly have language impairments, difficulty learning to read can be expected (Nation et al., 2006). Further, it is common for individuals with ASD to have intellectual difficulties. In fact, some studies suggest that up to three quarters of individuals with ASD have also been identified as having an intellectual disability (Yeargin-Allsopp et al., 2003). Information is lacking regarding effective strategies for teaching phonics and phonemic awareness to students with intellectual disabilities; however, there is strong evidence for using systematic prompting to teach sight words (Browder et al., 2006). Identifying effective and efficient reading interventions for this population is crucial.
In contrast to the social, communication, and academic deficits exhibited by this population, individuals with ASD exhibit relative strengths in visual processing as compared to auditory processing (Hodgdon, 1995). Thus, researchers and educators are exploring instructional strategies that use this relative strength in visual processing to facilitate the acquisition of important social, communication, and behavioral skills. One instructional method that utilizes visual processing is modeling. Modeling is based on Albert Bandura’s social learning theory (1977), which proposes that children learn new behaviors and skills through observing models performing those behaviors. Many researchers have tested Bandura’s theory and found evidence of the importance of modeling for learning (e.g. Barry and Overmann, 1977; Charlop et al., 1983; Charlop and Walsh, 1986; Egel et al., 1981; Ingersoll et al., 2007).
Recently, researchers have examined the effectiveness of video modeling for teaching individuals with ASD. Video modeling involves the learner watching a video in which the target behavior is modeled. Children with ASD may learn target skills more rapidly with video modeling than with other types of instructional techniques (Charlop-Christy et al., 2000; Williams et al., 2002). Video modeling may be effective because it reduces the attention and language requirements necessary for success, does not require the individual to interact socially with the instructor, uses observational learning, and can be individualized for different students, skills, and settings (Delano, 2007; Sherer et al., 2001). Additionally, compared to in-vivo modeling, video modeling allows for better control over the modeling procedure, is more convenient, allows for more opportunities to respond, and can be reused (Thelen et al., 1979). Video modeling can be delivered on portable devices such as tablets, which educators report to be a motivating mode of instruction for individuals with ASD (Fisher et al., 2013).
Video modeling has been used to teach preschool children play skills (D’Ateno et al., 2003), compliment-giving (Apple et al., 2005), appropriate affective responding (Gena et al., 2005), conversation skills (Sherer et al., 2001), and prosocial behaviors such as initiating a social interaction, responding to an initiation or invitation to socialize, and maintaining a social interaction (Kroeger et al., 2007). A number of studies have demonstrated that video modeling can also be effective for older children with ASD. Video modeling has been shown to lead to increases in the conversational skills (Charlop and Milstein, 1989; Sherer et al., 2001), play-related comments (Taylor et al., 1999), and social initiation and reciprocal play skills (Nikopoulos and Keenan, 2004) of elementary and middle school-aged students with ASD.
Although video modeling has been shown to be effective for preschool, elementary, and middle school-aged children with ASD, research is needed to ascertain whether this intervention would also be effective for high school-aged learners (National Professional Development Center on Autism Spectrum Disorders, 2011). Little research exists exploring the use of video modeling to teach adolescents with ASD reading skills such as sight word recognition. The purpose of the present study was to examine the effectiveness of video modeling to teach word recognition and pronunciation to adolescents with ASD. It was hypothesized that participating in the video modeling intervention would result in increases in the participants’ word recognition and pronunciation.
II Method
1 Participants and setting
Three male high school students with ASD, Trevor, Jack, and Sal, participated in the study. Trevor was 18 years of age, while Jack and Sal were 17 years of age. Students attended a specialized school for the education of individuals with ASD in the Northeastern United States. Trevor and Jack could speak in 3–4-word sentences. Sal could speak in full sentences (7–8-word sentences). No information was available regarding participants’ reading levels. Students were a convenience sample selected for inclusion in the study based on teacher report that they had the prerequisite skills necessary for completion of the study (i.e. adequate attention and imitation skills). Trevor, Jack, and Sal’s scores on the Childhood Autism Rating Scale (Schopler et al., 2002) were 31.5, 33, and 30, respectively. Therefore, the severity of autistic symptoms of all three students fell in the mild to moderate range.
2 Materials
Students watched videos that were created by GemIIni, a company that provides commercially-available video modeling therapy sessions. Videos were watched on a desktop computer in the participants’ classroom. The intervention was individualized for each participant. Videos were selected from the GemIIni library to address each student’s unique needs. Classroom teachers identified each student’s needs.
Trevor’s teacher requested that he learn how to identify and pronounce five words he would need to know for his upcoming vocational training placement working in the school’s store. The videos he watched to help him learn each of the target words consisted of a similarly-aged, female, typically-developing peer model. The model was shown from the waist-up. The videos began with her stating the target word while the text of the word was shown on a white card at the bottom of the screen. Next the videos cut to a close-up of the model’s mouth stating the word again. Finally, the videos cut back to the model shown from the waist-up again, with text of the target word shown on a white card at the bottom of the screen, stating the target word one final time. The five videos shown to Trevor ranged from 9 to 11 seconds in duration. Jack’s teacher requested that he be taught to identify and state five sight words. Therefore, he was shown videos identical to those shown to Trevor but with different target words. Jack viewed videos that ranged from 9 to 12 seconds in duration.
For Jack and Trevor, word pronunciation was assessed in the participants’ classroom by showing each student five cards, each with one of the target words printed on it, and asking ‘What word?’ If the student was able to articulate the word, and the word was recognizable, it was scored correct, even if there was some mispronunciation. Word recognition was assessed for Jack and Trevor by placing the five cards in front of the students and asking them to non-verbally identify each of the target words. For example, to assess a student’s recognition of the word ‘also’, the tester would ask the student to ‘touch also’. If the student was able to touch the card with the word ‘also’ written on it, the item was scored correct.
Sal’s video modeling intervention differed from that of Trevor and Jack because his skills were more advanced. As with Trevor and Jack, the goal of the intervention was to teach him to identify and pronounce five new words, but his intervention had the added component of learning the definitions of the words. Sal’s videos consisted of a similarly-aged, male, typically-developing peer pronouncing the target word, providing an illustration of the meaning of the word, and then providing the definition of the word. For example, the video teaching the word ‘ambiguous’ began with the target word displayed in white text against a black background for approximately two seconds. Next, the peer model was shown from the waist-up against a white background stating the target word twice. The video then cut to a close-up of the model’s mouth stating the target word two additional times, followed by a cut to the model engaged in an interaction with an adult to illustrate the meaning of the word. This interaction was as follows:
‘Hi John. Do you want rice or potatoes for dinner?’
‘Okay.’
‘Okay? That’s an ambiguous answer. You have to choose, rice or potatoes.’
‘Potatoes.’
‘Okay, we’ll have potatoes.’
Finally, the video ended with the text of the definition of the word appearing on the screen along with an audio recording of the definition being read. Sal viewed videos that ranged from 19 to 43 seconds in duration.
To assess word pronunciation and understanding of what the word meant, Sal was shown a card with the word printed on it and asked, ‘What word?’ Next he was asked, ‘What does it mean?’ If he was able to both pronounce the word and provide its definition, then the item was scored correct. The criterion for a correct definition was that the substantive meaning of the word was accurately described. This procedure was repeated for the remaining four target words. Word recognition was assessed for Sal by placing the five cards in front of him, providing a definition of one of the words, and then asking him to touch the card that matched the definition. This procedure was repeated for the remaining four target words. Target words for each of the students are listed in Table 1.
Table 1. Word targets during video modeling intervention.
Interobserver agreement for the scoring of the word pronunciation and word recognition was collected for 33% of sessions; it was calculated using the point-by-point method in which the number of agreements was divided by the number of agreements plus disagreements and multiplied by 100%. Interobserver agreement was 97% (range, 90% to 100%) for Jack’s data; 100% for Trevor’s data; and 100% for Sal’s data.
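The point-by-point calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' scoring procedure: the function name `point_by_point_agreement` and both observers' item-level scores are hypothetical and are not data from the study.

```python
def point_by_point_agreement(observer_a, observer_b):
    """Percent agreement: agreements / (agreements + disagreements) * 100."""
    if len(observer_a) != len(observer_b):
        raise ValueError("Both observers must score the same items.")
    if not observer_a:
        raise ValueError("At least one scored item is required.")
    # An agreement is any item on which both observers recorded the same score.
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    disagreements = len(observer_a) - agreements
    return 100 * agreements / (agreements + disagreements)


# Hypothetical item-level scores (1 = correct, 0 = incorrect) covering two
# sessions of five target words each; these are NOT data from the study.
primary_scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
secondary_scores = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1]

ioa = point_by_point_agreement(primary_scores, secondary_scores)
print(f"Interobserver agreement: {ioa:.0f}%")
```

In this made-up example the observers disagree on one of ten items, so agreement is nine out of ten items.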
3 Design and procedure
A single-case multiple baseline across students experimental design (i.e. video modeling sequentially implemented across three students) was used to evaluate the effectiveness of video modeling to teach words to participants with ASD.
a Baseline
Baseline data collection consisted of testing the students’ word pronunciation and recognition of the target words. For Trevor, 3 baseline data points were collected. For Jack, 5 baseline data points were collected, and for Sal, 8 baseline data points were collected. The time interval between data collection points ranged from 1 day to 3 days (data were collected daily Monday through Friday but not collected over the weekend).
b Intervention
The intervention phase consisted of students watching videos for each target word 10 times per session, which took approximately 15 minutes. At the end of each session, students’ knowledge of target words was tested. The time interval between sessions ranged from 1 day to 3 days (data were collected daily Monday through Friday but not collected over the weekend). Trevor, Jack, and Sal received 14, 12, and 5 intervention sessions, respectively. Intervention data were collected across approximately three weeks.
c Follow-up
To assess maintenance of gains for participants, follow-up measurements were conducted three months after the completion of the intervention and were identical to measurements taken during the video modeling intervention. Videos were not viewed prior to these assessments. Follow-up data were collected daily across four days.
4 Social validity
After the completion of the study, four teachers (each student’s primary instructor in addition to the head teacher of the classroom) were asked to participate in a social validity assessment. Teachers returned the assessments to the first author’s mailbox with no identifying information. The researcher asked teachers to rate statements about the acceptability of video modeling on a Likert-type scale (1–4; 1 = strongly disagree, 4 = strongly agree). Statements included: ‘Student participation in video modeling was problematic’; ‘Student participation in video modeling was beneficial’; and ‘I would be willing to implement video modeling in the future.’ In addition, following the intervention, each student was asked by the first author, ‘Did you like watching the videos?’ and ‘Would you like to watch videos again to learn new words?’
III Results
1 Word recognition
Figure 1 shows the effects of video modeling on the participants’ word recognition. Word recognition accuracy ranged from 0 to 20% for Trevor during baseline data collection, with a mean word recognition accuracy during baseline of 7% (SD = 12%). During intervention, there was variability in Trevor’s word recognition accuracy, which ranged from 0 to 80% and showed an increasing trend following session 12. Trevor’s mean word recognition accuracy was 33% (SD = 30%) during intervention, and 75% (SD = 10%) during follow-up.

Figure 1. Percentage correct word recognition for Trevor, Jack, and Sal across baseline, intervention, and follow-up phases.
Sal’s word recognition accuracy ranged from 0 to 40% during baseline data collection, with a mean word recognition accuracy of 25% (SD = 18%). Following the start of intervention, Sal’s word recognition accuracy rose immediately to 100% and remained at this level across intervention and follow-up.
Across baseline, Jack’s word recognition accuracy ranged from 0 to 100%, with a mean word recognition accuracy across baseline of 37% (SD = 45%). During intervention, Jack’s word recognition accuracy ranged from 80 to 100% with a mean of 98% (SD = 6%).
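The phase summaries reported here (a mean and a standard deviation of percent-correct scores) can be reproduced with Python's standard library. The sketch below assumes the reported SDs are sample standard deviations (n − 1 denominator), which the text does not state. The three scores are illustrative values chosen to match a three-point baseline ranging from 0 to 20%; they are not the study's raw data, and `summarize_phase` is a hypothetical helper name.

```python
from statistics import mean, stdev


def summarize_phase(scores):
    """Return (mean, sample SD) of the percent-correct scores for one phase."""
    # stdev() uses the n - 1 denominator; this is an assumption about how
    # the SDs in the text were computed.
    return mean(scores), stdev(scores)


# Illustrative percent-correct scores for a three-session baseline.
# These values are assumed for the example and are NOT the raw study data.
baseline_scores = [0, 0, 20]

phase_mean, phase_sd = summarize_phase(baseline_scores)
print(f"M = {phase_mean:.0f}%, SD = {phase_sd:.0f}%")
```

With so few data points per phase, a single session can shift both statistics considerably, which is one reason single-case results are usually read alongside the plotted data paths.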
2 Word pronunciation
All participants’ word pronunciation accuracy increased between baseline and intervention, and between baseline and follow-up. Figure 2 shows the effects of video modeling on word pronunciation. Data paths for all three students were stable during baseline, with flat trends and no variability. Jack’s word pronunciation rapidly and continually increased when the intervention was introduced. Jack’s mean accuracy was 0% during baseline, 86% (SD = 24%) during intervention, and 85% (SD = 10%) during follow-up. Sal’s word pronunciation and definition accuracy increased between baseline and intervention. Sal’s mean accuracy was 0% during baseline, 84% (SD = 26%) during intervention, and 80% (SD = 16%) during follow-up.

Figure 2. Percentage correct word pronunciation for Trevor and Jack across baseline, intervention, and follow-up phases. Percentage correct word pronunciation and definition for Sal across baseline, intervention, and follow-up phases.
Intervention effects were less clear for Trevor, although some effects were identified. Trevor’s mean accuracy was 0% during baseline, 23% (SD = 25%) during intervention, and 30% (SD = 12%) during follow-up. However, Trevor’s word pronunciation increased during the third intervention session, which was when the intervention was introduced to Jack. Finally, word pronunciation was inconsistent and variable during intervention, indicating limited intervention effects.
3 Social validity
Four teachers completed the social validity assessment (each student’s primary instructor and the head teacher of the classroom) and results indicated that all deemed video modeling acceptable. Teacher responses to the social validity assessment are displayed in Table 2. In addition, following the intervention all of the students were asked, ‘Did you like watching the videos?’ and ‘Would you like to watch videos again to learn new words?’ All participants responded in the affirmative to both questions.
Table 2. Teacher responses to social validity assessment.
Notes. Teachers rated statements about the acceptability of video modeling on a Likert-type scale (1–4; 1 = strongly disagree, 4 = strongly agree).
IV Discussion
Findings are consistent with previous research that has demonstrated the effectiveness of video modeling for use with individuals with ASD (e.g. Apple et al., 2005; D’Ateno et al., 2003; Kroeger et al., 2007) and extend the literature by demonstrating an effect for video modeling in teaching word recognition and pronunciation to adolescents with ASD.
All three participants improved and maintained accuracy scores during the follow-up phase as compared to the baseline phase for both target behaviors (i.e. word recognition and word pronunciation). This study provides support for the use of video modeling to teach basic reading recognition skills to adolescents with ASD. Although there has been little research about reading abilities in the ASD population, some theories suggest that students with ASD are at risk for reading difficulties (Nation et al., 2006). Results of this study suggest that teachers and other educators can use video modeling to target reading skills.
Word recognition intervention effects were clear for Sal and Trevor. Sal’s mean word recognition accuracy increased by 75 percentage points (baseline = 25%; intervention = 100%). Sal maintained his growth with 100% accuracy at follow-up. Although Trevor’s mean word recognition accuracy appears to have increased by only 26 percentage points (baseline = 7%; intervention = 33%), by the final three sessions of the intervention Trevor was able to reach 80% accuracy, which was maintained at follow-up. This suggests that word recognition was established.
Effects were less clear for Jack. Jack’s word recognition began increasing and reached 100% accuracy during the baseline phase. There are three hypotheses for why this may have happened. First, the data collection procedure may have served as an intervention: being given a number of opportunities to see the words during data collection may have allowed Jack to develop word recognition. Second, further inspection of all participants’ data indicates an increase in word recognition after three exposures to the target words. Future research should investigate the relationship between exposure opportunities and learning effects. Third, Jack’s cognitive and/or reading ability may have been higher than Sal’s and Trevor’s, allowing him to learn the target words much more quickly than his peers. Regardless of learning effects during baseline, word recognition was established and maintained at 100% at follow-up.
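The gains quoted above are differences between phase means, that is, changes in percentage points rather than relative (percent) changes. The short sketch below makes the distinction explicit using the baseline and intervention means reported in the text; the function names are hypothetical and introduced only for illustration.

```python
def percentage_point_gain(baseline_mean, intervention_mean):
    """Absolute change between two percent-correct means, in percentage points."""
    return intervention_mean - baseline_mean


def relative_change(baseline_mean, intervention_mean):
    """Change expressed as a percent of the baseline mean."""
    if baseline_mean == 0:
        raise ValueError("Relative change is undefined for a baseline of 0%.")
    return 100 * (intervention_mean - baseline_mean) / baseline_mean


# Word recognition phase means reported in the text: (baseline, intervention).
phase_means = {"Sal": (25, 100), "Trevor": (7, 33)}

for student, (baseline, intervention) in phase_means.items():
    gain = percentage_point_gain(baseline, intervention)
    rel = relative_change(baseline, intervention)
    print(f"{student}: +{gain} percentage points ({rel:.0f}% relative to baseline)")
```

Reporting percentage points keeps the figures comparable across students, whereas relative change is inflated by low baselines and is undefined when the baseline mean is 0%, as it was for word pronunciation.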
Word pronunciation intervention effects were evident for all three students; however, the strength of the effects varied. Jack’s and Sal’s word pronunciation accuracy increased by 100 percentage points (baseline = 0%; intervention = 100%). The accuracy of Jack’s and Sal’s word pronunciation was 0% at baseline and rose to means of 85% and 80%, respectively, at follow-up. Although Trevor’s word pronunciation accuracy increased by 60 percentage points (baseline = 0%; intervention = 60%), intervention effects were delayed and only partially maintained at follow-up, with 30% accuracy. A delay in intervention effects is consistent with previous research, in which participants have failed to respond to a video modeling intervention until after 4 to 5 intervention sessions (e.g. Sherer et al., 2001; Taylor et al., 1999). Of more concern is that Trevor’s intervention data were inconsistent and variable throughout, even though he received the longest period of intervention. There are three hypotheses for why this may have happened. First, the intervention design may not have been appropriate. Previous research has demonstrated differential effects of video modeling when the videos are commercially made (i.e. with actors, unfamiliar settings) versus custom made (i.e. familiar model, familiar setting) (Rosenberg et al., 2010). The intervention may have been more effective if the videos had shown Trevor himself or a familiar model in a familiar setting, rather than an unfamiliar peer. Second, the intervention may not have been intensive enough to result in a significant increase in word pronunciation. Finally, although Trevor’s use of language may have been commensurate with that of Sal and Jack, his reading and/or cognitive levels may have been significantly lower and may have affected his ability to respond in a manner similar to his peers.
Although the current study extends the literature by examining the use of video modeling to teach foundational reading skills to adolescents with ASD, there are several limitations that should be noted. One limitation was that the study did not compare the video modeling intervention directly with other models of instruction; therefore, more research is needed to ascertain whether video modeling is more or less effective than other methods for the education of adolescents with ASD. Further, given the nature of single-case research, findings from this study may not be generalizable. Finally, it is possible that the methods used to collect accuracy scores may have influenced the effects of the intervention.
Future research replicating these findings with larger samples as well as comparing video modeling to other models of instruction is needed. The lack of information regarding student reading and/or cognitive ability made it difficult for the current researchers to explain variations in participant response to the intervention. Lastly, future research should explore the application of video modeling in teaching other reading skills for which research is lacking (i.e. phonics, phonemic awareness).
Footnotes
Declaration of conflicting interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
