Abstract
Interventions designed for children with disabilities and their families should be socially valid. Parent-Implemented Communication Strategies-Storybook (PiCSS) is an intervention package designed to coach parents on shared storybook reading and naturalistic teaching strategies. In the PiCSS program, the participating parents used the communication teaching strategies with high fidelity, and children responded more to their parents’ communication. To evaluate the social validity of PiCSS, we designed video-based rating surveys and collected data from masked raters (college students and practitioners). In total, 120 raters evaluated 12 1-min video clips. Quantitative analyses revealed that all raters scored coaching videos significantly higher than baseline videos for parent and child outcomes, indicating the PiCSS program had positive outcomes in parent–child interactions.
Parent-Implemented Interventions
Researchers have demonstrated that parents have the capacity to learn communication teaching strategies and implement them with high fidelity with their children with developmental disabilities (DD). Moreover, parents can generalize the strategies they have learned to other natural activities and provide their children with many learning and practice opportunities (Akamoglu, Meadan, & Burke, 2016; Kashinath, Woods, & Goldstein, 2006). Thus, parent-implemented communication interventions aim to teach parents specific communication teaching strategies so that they can create opportunities for their children to practice and learn communication skills. The effectiveness of parent-implemented interventions has been well documented for children with DD. In these studies, parents successfully learned and implemented strategies to increase communication and language skills in their children (Akamoglu & Meadan, 2018; Biggs & Meadan, 2018). For example, parents learned to use specific strategies, such as modeling (Gillett & LeBlanc, 2007; Meadan, Angell, Stoner, & Daczewitz, 2014), language-facilitating prompts (Kashinath et al., 2006), and language expansions (Moore, Barton, & Chironis, 2014).
Researchers reported positive results of parent-implemented interventions for children with DD (Kaiser & Roberts, 2013; Kashinath et al., 2006; Meadan, Angell, et al., 2014; Meadan et al., 2016). As a result of parents’ use of communication teaching strategies, children have shown increases in their rate of responses and initiations (Meadan, Angell, et al., 2014; Meadan et al., 2016), use of single and multiword utterances (Gillett & LeBlanc, 2007; Kashinath et al., 2006), and mean length of utterance (Kaiser, Hancock, & Nietfeld, 2000). Some parent-implemented intervention studies focus on training and coaching parents to teach targeted skills (Akamoglu & Meadan, 2018).
One of these intervention studies is the Parent-Implemented Communication Strategies-Storybook (PiCSS; Akamoglu & Meadan, 2019), a parent-implemented intervention program that incorporates parent training and coaching. In the PiCSS study, two mothers were taught and coached to implement storybook reading techniques (e.g., attention getters, feedback) and three evidence-based communication teaching strategies to support their children’s communication skills: (a) modeling, (b) mand-model, and (c) time delay. A multiple baseline design across four tiers (a set of book-reading techniques and three communication teaching strategies) was employed. The study consisted of four phases: baseline, post-training, coaching, and maintenance. The ultimate goal in PiCSS was to increase children’s participation in shared book reading and promote children’s communication skills such as responses and initiations. The mothers received in-person individualized coaching on each of the communication teaching strategies until they implemented each strategy with a fidelity score of 4 (the highest score). For example, to receive a fidelity score of 4 in modeling, the mother had to complete four steps: (a) establish joint attention; (b) present a verbal, vocal, or gestural model; (c) wait 2 to 3 s for the child to respond; and (d) respond to the child’s behavior (Meadan et al., 2016).
The data resulting from piloting the PiCSS intervention program were promising. After coaching, (a) the mothers used the modeling, mand-model, and time delay strategies with higher rates and fidelity; (b) the children produced more spoken words in their responses to their mothers’ communicative acts and initiated more communicative acts upon their mothers’ use of time delay; and (c) the parents reported improved social-communication skills and engagement of their children during reading and found the study procedures feasible. Both mothers used the strategies with high fidelity when coaching was provided, but not after training alone. The training alone was not sufficient to produce improvements in the mothers’ behaviors. Although the results were promising and the parents reported positive outcomes, a comprehensive assessment of social validity was warranted to strengthen claims regarding the effectiveness and importance of the outcomes.
Social Validity
Intervention importance and meaningfulness are referred to as “social validity,” which has its roots in applied behavior analysis (ABA). Social validity was developed from anecdotal reports of concerns about the social meaningfulness of interventions in ABA-based research (Wolf, 1978). Kazdin (1977) and Wolf (1978) responded to these concerns, defined the concept, and proposed ways to measure “social importance,” which is now referred to as social validity. Kazdin (1977) and Wolf (1978) suggested measuring: (a) the relevance of intervention goals, (b) the feasibility of intervention procedures, and (c) the meaningfulness and acceptability of the intervention outcomes. These questions are related to the social importance and meaningfulness of any intervention and extend beyond the effectiveness of the intervention. For example, in single-case research (SCR) studies, a functional relation between the intervention and target behavior can be present, but the goals, procedures, and outcomes might not be socially valid, which limits the importance of the intervention and the study itself.
Measuring Social Validity
The following are sample questions of evaluations across goals, procedures, and outcomes (Turan & Meadan, 2011): (a) Goals: What skills are important for the child to learn? (b) Procedures: Are these procedures acceptable in terms of cost, ethics, and practicality? Do they reflect a close contextual fit for those using them? and (c) Outcomes: Have the outcomes (e.g., target child’s behaviors) made meaningful changes in the child’s everyday functioning? To address these questions, researchers have used different methods and tools. Barton, Meadan, and Ledford (2018) divided these methods into two categories: (a) typical subjective measures of the social validity of an intervention, including interviews, questionnaires, and rating scales; and (b) measures that are less subjective, including normative comparisons, masked rating, measurement of maintenance, and participant preference measurement.
Most frequently, researchers use subjective measures including questionnaires, ranking scales (i.e., quantitative), or interviews (i.e., qualitative) for subjective evaluation (Hurley, 2012). Social validity interviews and surveys can be useful to understand the perspectives of the participants and other stakeholders (e.g., parents or teachers of the participant child) about the intervention goals, procedures, and outcomes (Snodgrass, Chung, Meadan, & Halle, 2018). For example, in a study of the social validity of an early reading intervention, participants rated themselves as having a high understanding of the intervention, but qualitative interview data indicated they actually had low levels of understanding (Lyst, Gabriel, O’Shaughnessy, Meyers, & Meyers, 2005). Likewise, Chung, Snodgrass, Meadan, Akamoglu, and Halle (2016) conducted an SCR study with three parents and their children who were minimally verbal. They found that their outcomes were not clearly aligned with the perceived intervention effects reported by the parents and coaches. Thus, they conducted a secondary analysis of the original data to determine if they could identify additional, socially valid effects using reliable and valid measures. The authors found that the parents had more gains in their use of communication teaching strategies and the children made more substantial progress in their communication behaviors.
Researchers who evaluate social validity also use measures and methods that are relatively less subject to bias and in some cases objective (Barton et al., 2018). These methods are considered to be more objective than the other social validity assessment methods and researchers use them to assess the importance and significance of their interventions (Ledford, Hall, Conder, & Lane, 2016). Ratings by masked raters—who are not involved in the study and unaware of the purpose, procedures, and outcomes—although subjective, are less likely to be biased than ratings by study participants or people involved in the study. Barton et al. (2018) described masked (i.e., blind) rating as follows: Blind ratings can be used to less subjectively determine whether participants’ behavior is rated as “different” before and after intervention or during baseline versus intervention phases (socially important outcomes) by people who do not know the status of the session they watched (e.g., pre- or postintervention, baseline or treatment phases) and/or the purpose of the study. (p. 146)
Researchers who used masked rating recruited external evaluators who were teachers, parents, and/or professionals who had expertise in the child’s disability but were naïve (masked) to the study (Foster & Mash, 1999; Meadan, Stoner, et al., 2014). Some researchers had external evaluators rate video clips of the study participants to determine whether masked raters could observe a difference in participants’ behavior from pre- to postintervention. For example, Meadan and colleagues (2014) had parents, speech language pathologists (SLPs), and early childhood special education (ECSE) teachers rate randomly selected 2-min video clips of parent–child interactions from pre- and postintervention sessions and complete a questionnaire about parent and child behaviors. Although all evaluators had higher ratings for postintervention videos, Meadan and her colleagues noted that teachers reported higher scores. Likewise, Jung, Sainato, and Davis (2008) showed 20 ECSE teachers three randomly ordered 2-min video clips (one from the baseline phase, and two from the intervention phase), so the teachers were masked to the study condition. Using a checklist to rate the videos, 60% of the teachers rated intervention procedures as easy and 85% rated the intervention as “useful” to improve social interactions between peers. In a social skills intervention, Storey, Danko, Ashworth, and Strain (1994) showed video clips taken from baseline, intervention, and generalization phases of their study to 12 children and 18 ECSE experts (undefined). Using a Likert-type scale, 68% of the raters rated intervention and generalization videos significantly higher than baseline videos. English, Goldstein, Shafer, and Kaczmarek (1997) also used masked raters. They had 11 early intervention (EI) professionals evaluate video clips from pre- and postintervention of the target children based on dimensions of quantity and quality of play.
These studies used masked raters to assess the significance of their interventions. In addition, other researchers found that raters with different backgrounds and experiences viewed or evaluated intervention outcomes differently (Meadan, Stoner, et al., 2014; Oke & Schreibman, 1990). These types of findings have implications for research and practice. Additional replications of research using masked raters are warranted to continue to examine valid methods for evaluating social validity.
Current Study
We used three separate social validity measurement procedures to evaluate the social validity of the PiCSS program comprehensively. These were as follows: (a) evaluation of parent and child outcomes using data obtained from our analysis within our SCR (i.e., behavioral observation) and formal and informal child language assessments, (b) participant interviews and short, pre- and postintervention questionnaires, and (c) an evaluation of the PiCSS program by masked raters. Because previous research analyzing the social validity of the PiCSS has focused on the first two procedures, and given the use of masked raters in social validity, we aimed to strengthen claims of effectiveness of the PiCSS program by complementing the data from the behavioral observations, interviews, and questionnaires with a masked rating social validity evaluation. We also compared the ratings between rater groups (students vs. practitioners).
The following research questions were investigated:
Method
Participants
Two mother–child dyads participated in the PiCSS intervention. The first dyad included Sarah and her daughter Emily (pseudonyms); Emily had autism spectrum disorder (ASD) and was 3 years 6 months old at the beginning of the study. The second dyad included Noelle and her son Mason; Mason was 5 years old at the beginning of the study and diagnosed with cerebral palsy. Both parents were White, had graduate degrees, and reported an annual household income of greater than US$65,000. Before the study, Emily primarily used sounds and gestures to communicate with others and had seven functional spoken words. Mason primarily used single-word utterances and occasionally multiword utterances (phrases or short sentences) to communicate with others. He had 201 functional spoken words before the study. During the coaching phases of the study, Emily produced 34 single-word utterances to respond to her mother, and Mason produced 103 multiword utterances to respond to his mother.
Masked Raters
Masked raters consisted of graduate and undergraduate students in special education and speech and hearing science departments, and practitioners (i.e., EI service providers, ECSE teachers, and SLPs) who worked in local programs or organizations. In total, 133 raters responded to the survey, with 120 respondents completing the entire survey (see Table 1). The other 13 respondents rated some of the videos, quit the survey, and left it incomplete. In the remaining 120 surveys, no data were missing because the survey required participants to rate each item and each video clip without the option of skipping. Students were recruited by sending e-mail messages to the student listserv of the special education department and contacting the speech and hearing science department at a large university in the Midwest. The recruitment message included a description of the study and a URL link to the survey. Practitioners were recruited by contacting the local speech and hearing clinics and early intervention agencies and organizations in a Midwestern state. As an incentive, respondents were given the option to enter their e-mail addresses for a drawing to win US$10 gift cards.
Characteristics of Participants.
Note. SPED = special education; SLP = speech and language pathologist.
Rater Survey
Two separate surveys, one for each parent–child dyad, were created on the university-licensed SurveyGizmo to evaluate the social validity of the intervention outcomes. Each survey included six 1-min-long, randomly selected video clips: three videos from the baseline phase and three videos from the coaching (intervention) phase. Masked raters, who were masked to the phase from which the clips were obtained, rated the parent–child interactions using a 10-item, 5-point Likert-type rating scale (1 = strongly disagree to 5 = strongly agree) for each video. See the appendix for rating items. The items were developed to reflect the intervention outcomes (parent and child dependent variables).
Video Clip Selection
Across all phases of the study (baseline, post-training, coaching, and maintenance), we videotaped 20 parent–child reading sessions for each parent–child dyad (40 in total). Of those 20, 5 were from the baseline phase, and 15 were from coaching sessions. During the baseline condition, the mothers were instructed to “read the book as you typically would.” During coaching, the researcher and mother set action goals for each session, the researcher observed the mother during reading, asked the mother to reflect on the sessions, and provided the mother with supportive and corrective feedback. For each strategy use, coaching sessions continued until the mother reached a preestablished performance criterion level (i.e., a score of Fidelity 4 on 80% of strategy use for two consecutive sessions).
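The performance criterion described above (Fidelity 4 on 80% of strategy uses for two consecutive sessions) can be expressed as a simple check. The following is a minimal sketch only: the function names and the per-session score lists are hypothetical, and the original study may have computed the criterion differently.

```python
from typing import List


def session_meets_criterion(fidelity_scores: List[int], threshold: float = 0.80) -> bool:
    """Return True when at least `threshold` of the strategy uses scored Fidelity 4."""
    if not fidelity_scores:
        return False
    share_at_four = sum(1 for score in fidelity_scores if score == 4) / len(fidelity_scores)
    return share_at_four >= threshold


def criterion_reached(sessions: List[List[int]], consecutive: int = 2) -> bool:
    """Return True once `consecutive` sessions in a row meet the criterion."""
    run = 0
    for scores in sessions:
        run = run + 1 if session_meets_criterion(scores) else 0
        if run >= consecutive:
            return True
    return False


# Hypothetical fidelity scores (one list per coaching session, one score per strategy use).
sessions = [
    [4, 3, 4, 2, 4],  # 60% at Fidelity 4: below criterion
    [4, 4, 4, 4, 3],  # 80% at Fidelity 4: meets criterion
    [4, 4, 4, 4, 4],  # 100% at Fidelity 4: meets criterion
]
reached = criterion_reached(sessions)
```

With these illustrative scores, the second and third sessions both meet the 80% threshold, so coaching on that strategy would stop after the third session.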
To identify videos for the social validity rating, for each parent–child dyad, we randomly selected three videos from baseline sessions and three from coaching sessions to represent pre- and postintervention parent–child interactions. The three coaching session videos were selected as follows: one from coaching on modeling, one from coaching on mand-model, and one from coaching on time delay. From these six sessions, we edited 16 1-min-long video clips for each parent–child dyad (32 in total): (a) eight video clips from the baseline phase and (b) eight video clips from the coaching phase. A research team consisting of the interventionist (first author), one faculty member (third author), and a doctoral student who helped with coding videos independently reviewed the 32 1-min-long, randomly selected video clips and completed a short video clip eligibility form for each parent–child dyad separately. The form included the following questions: (a) Does the session represent the corresponding phase (e.g., baseline, coaching)? Why or why not? and (b) Should we include the video clip in the survey? Why or why not? Each question was asked once for each of the video clips that were randomly chosen for that phase.
After the research team members reviewed the video clips independently and identified those with the clearest representations of each phase, the interventionist sent the commonly selected clips to the team members again to verify that each clip represented the phases. Using this process and after the research team reached a consensus, three video clips were identified and selected from the baseline phase and three from the coaching phase for each parent–child dyad. In videos from the baseline phase, typically the mothers were reading the book while the child was sitting passively. In videos from the coaching phases, typically the mothers were using reading techniques and communication teaching strategies to engage their children and the children actively participated in the reading. The coach was not present in the videos because while the mother and child were reading a book, the coach was observing the interaction and recording the video without interacting with the mother or child.
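The random session selection that preceded the team review (three baseline sessions, plus one coaching session per strategy) can be sketched as follows. This is an illustration under stated assumptions: the session identifiers, the even split of the 15 coaching sessions across the three strategies, and the seed are all hypothetical, and the source does not describe the tool used for randomization.

```python
import random
from typing import Dict, List, Optional, Tuple


def select_sessions(
    baseline_ids: List[str],
    coaching_ids_by_strategy: Dict[str, List[str]],
    n_baseline: int = 3,
    seed: Optional[int] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Randomly pick baseline sessions and one coaching session per strategy."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    baseline_pick = rng.sample(baseline_ids, n_baseline)  # without replacement
    coaching_pick = {
        strategy: rng.choice(ids)
        for strategy, ids in coaching_ids_by_strategy.items()
    }
    return baseline_pick, coaching_pick


# Hypothetical session identifiers for one dyad (5 baseline, 15 coaching sessions).
baseline_ids = [f"baseline_{i}" for i in range(1, 6)]
coaching_ids_by_strategy = {
    "modeling": [f"modeling_{i}" for i in range(1, 6)],
    "mand_model": [f"mand_model_{i}" for i in range(1, 6)],
    "time_delay": [f"time_delay_{i}" for i in range(1, 6)],
}
baseline_pick, coaching_pick = select_sessions(
    baseline_ids, coaching_ids_by_strategy, seed=0
)
```

Stratifying the coaching draw by strategy mirrors the description above, in which each of the three communication teaching strategies is represented by exactly one coaching session.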
Data Entry and Analysis
A quantitative approach to data analysis was employed to analyze the ratings. The first author downloaded the survey data from SurveyGizmo to an Excel spreadsheet and cleaned the data by omitting incomplete survey responses. To confirm accurate data entry, the second author reviewed the data file, verified that all data were entered accurately, and asked the first author to make any needed corrections. After the final corrections, the second author organized the data in the Excel sheet and entered them into SPSS version 24.0 for analysis. We examined the distribution of the scales to determine whether they were normally distributed. Because no previous information about the scale characteristics was available, we performed an exploratory factor analysis to check if any underlying factors should be considered. We included the scores from all 10 items of the social validity rating instrument in the analysis and used principal components as the extraction method with varimax rotation. We considered factors with an eigenvalue greater than 1 (Harman, 1976). The internal consistency of the instrument was assessed through Cronbach’s alpha, and the suitability of the data for factor analysis was assessed through (a) the Kaiser–Meyer–Olkin test with a recommended cutoff of 0.5 (Tabachnick & Fidell, 2007) and (b) Bartlett’s test of sphericity with a significance value of .05 (Kaiser, 1974).
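For readers unfamiliar with Cronbach’s alpha, the following minimal sketch shows the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), applied to a small ratings matrix. The data are hypothetical (five raters, three items) and are not taken from the study, which used SPSS for this analysis.

```python
from statistics import variance
from typing import List


def cronbach_alpha(ratings: List[List[float]]) -> float:
    """Cronbach's alpha, where ratings[r][i] is rater r's score on item i."""
    k = len(ratings[0])  # number of items
    # Sample variance of each item across raters.
    item_variances = [variance([row[i] for row in ratings]) for i in range(k)]
    # Sample variance of each rater's total score.
    total_variance = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-point Likert-type ratings: 5 raters x 3 items.
example_ratings = [
    [2, 2, 3],
    [1, 2, 1],
    [4, 5, 4],
    [3, 3, 4],
    [5, 4, 5],
]
alpha = cronbach_alpha(example_ratings)
```

Values close to 1 indicate that the items vary together across raters, which is the pattern that a single-factor scale would be expected to show.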
To compare the social validity ratings among the study variables, variance analyses were performed. Levene’s test was used to examine the homogeneity of variance assumption in all variance tests. When the assumption of homogeneity was violated, nonparametric Mann–Whitney tests were performed. We calculated effect sizes using Cohen’s d for parametric tests (independent and paired-samples t tests) and rank-biserial correlations for Mann–Whitney tests. Cohen’s d values for small, medium, and large effects correspond to 0.2, 0.5, and 0.8, respectively (Cohen, 1992), and the corresponding values for rank-biserial correlations, which can be interpreted as Pearson’s r, are .1, .3, and .5 (Goss-Sampson, 2018). Finally, the Bonferroni correction for multiple testing (Bender & Lange, 2001) was used to minimize Type I error. We divided the alpha value of .05 by the number of tests performed. Because we analyzed three different coaching phase videos and the average score, we divided the alpha by four, obtaining a corrected significance threshold of .013 for the paired-samples t tests. For the independent samples t tests that we performed between raters and between student majors on baseline, coaching, and difference scores, the corrected significance threshold was .017.
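The Bonferroni arithmetic and the effect size benchmarks above can be reproduced with a few lines of code. This is a sketch for illustration: the function names are hypothetical, the count of three tests behind the .017 threshold is inferred from the three score types (baseline, coaching, and difference), and the "negligible" label for values below 0.2 is an added convenience rather than a term from the source.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Divide the familywise alpha by the number of tests performed."""
    return alpha / n_tests


def label_cohens_d(d: float) -> str:
    """Label an effect size using the 0.2 / 0.5 / 0.8 benchmarks (Cohen, 1992)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed here; the source gives no name for d < 0.2


# Four paired-samples tests: three coaching videos plus the average score.
paired_alpha = bonferroni_alpha(0.05, 4)  # 0.0125, reported in the text as .013

# Three independent-samples tests (assumed: baseline, coaching, difference scores).
independent_alpha = bonferroni_alpha(0.05, 3)  # about 0.0167, reported as .017
```

A test is then treated as statistically significant only when its p value falls below the corrected threshold for its family of tests.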
We then performed descriptive and variance analyses. First, we conducted paired-samples t tests to compare raters’ ratings between baseline and coaching sessions. To do so, we calculated mean scores of each rater for each baseline and coaching session. Second, another paired-samples t test was conducted to examine the potential pre–post differences (baseline – coaching) at the item level. We calculated mean scores for each item across three baseline and three coaching video clips. Third, we conducted independent samples t tests to analyze the differences between students and practitioners. Finally, we calculated the difference scores (score at coaching minus the score at baseline) to obtain a gain score. We analyzed the differences between types of major and rater using the difference score as the dependent variable to examine the differences in gain between the groups of each variable (Huck & McLean, 1975; Thomas & Zumbo, 2012).
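The gain score and paired-samples comparison described above can be sketched as follows. The ratings are hypothetical, the function names are illustrative, and the effect size shown is one common paired-design convention (mean difference divided by the standard deviation of the differences); the source does not state which variant of Cohen’s d was computed in SPSS.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Tuple


def gain_scores(baseline: List[float], coaching: List[float]) -> List[float]:
    """Gain per rater: score at coaching minus score at baseline."""
    return [c - b for b, c in zip(baseline, coaching)]


def paired_t(baseline: List[float], coaching: List[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and its degrees of freedom (n - 1)."""
    diffs = gain_scores(baseline, coaching)
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def cohens_d_paired(baseline: List[float], coaching: List[float]) -> float:
    """Mean gain divided by the standard deviation of the gains (assumed convention)."""
    diffs = gain_scores(baseline, coaching)
    return mean(diffs) / stdev(diffs)


# Hypothetical mean ratings for five raters (not study data).
baseline = [2.0, 2.3, 1.7, 2.7, 2.0]
coaching = [4.0, 4.7, 3.3, 4.3, 4.0]

t_stat, df = paired_t(baseline, coaching)
d = cohens_d_paired(baseline, coaching)
```

Under this convention, the t statistic equals d multiplied by the square root of the number of raters, so a large d with 120 raters corresponds to a very large t value.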
Results
We conducted an exploratory factor analysis on the social validity scale. The Kaiser–Meyer–Olkin measure was 0.93, and Bartlett’s test of sphericity was statistically significant, χ²(45) = 1,940.64, p < .001. These results suggested that the data were adequate for factor analysis. The solution extracted one factor on which all 10 items had loadings above .75; this factor explained 83.37% of the variance.
The internal consistency of the baseline scores was optimum (α = .93), the interitem correlations ranged between .12 and .78 (M = .57, SD = .13), and the scale had a mean score of 2.09 (SD = 0.72). Cronbach’s alpha for coaching was also optimum (α = .98). Interitem correlations ranged between .62 and .95 (M = .81, SD = .08), and the scale had a mean score of 4.02 (SD = 0.92). The post–pre difference (coaching – baseline) ranged from −1.00 to 3.67 (M = 1.93, SD = 1.08). After analyzing the internal consistency of the scores, we analyzed the scores on baseline and coaching. As shown in Table 2, scores on all three coaching videos as well as the overall (combined) coaching had a similar mean score of 4 points in the 5-point scale, whereas baseline scores were slightly more than 2 points.
Descriptive Statistics for all Three Coaching Session Videos and Overall Baseline and Coaching Scores (N = 120).
Average score of scores on Videos 1, 2, and 3.
Because there were no differences among the three coaching phases (modeling, mand-model, and time delay), they were combined to generate one average coaching phase score. We then analyzed the differences between baseline and coaching. Results showed statistically significant differences, even after Bonferroni correction, with large effect sizes, between baseline and coaching in all 10 items. Coaching scores were higher in all cases (see Table 3). There were no statistically significant differences between the coaching on modeling and mand-model, t(119) = −0.18, p = .85, d = 0.02, between mand-model and time delay, t(119) = 1.00, p = .32, d = 0.09, and between modeling and time delay, t(119) = 0.62, p = .54, d = 0.06, indicating that the raters rated coaching on each communication strategy with high scores and observed differences between baseline and coaching scores overall but not among strategies.
Paired t Tests Between Baseline and Coaching Scores in all Three Strategies and the Overall Coaching Score.
Average score of scores on videos of Strategies 1, 2, and 3.
Similarly, we analyzed the differences between baseline and coaching scores by the type of rater and academic major. Table 4 shows that the differences between baseline and coaching scores were statistically significant with large effect sizes for each type of rater (student vs. practitioner) and academic major (SPED vs. SLP). For academic majors, differences were especially large for SLPs even after Bonferroni correction. For type of rater, practitioners also showed a large baseline–coaching difference of almost two and a half standard deviations as indicated by Cohen’s d.
Paired t Tests Between Baseline and Coaching Scores by Major and Type of Rater.
Note. SPED = special education; SLP = speech and language pathologist.
Item-Level Differences Among Raters
At the overall scale level (all 10 items), no statistically significant differences were found between students and practitioners in baseline and coaching phases when Bonferroni correction was applied. Only Item 2 (The child responds to the parent’s attempts to encourage social communication/interaction) showed a statistically significant difference, with practitioners having a larger baseline–coaching difference and a medium effect size (p = .003, d = 0.59).
Item-Level Differences Among Academic Majors
Regarding the differences between academic majors in baseline and coaching scores in all items of the scale, no statistically significant differences were found. In coaching videos, SLP majors seemed to rate all items higher than SPED majors, although these differences were statistically significant only in Item 7 (The parent is giving instructions, asking questions, and/or giving choices to elicit responses from the child) after Bonferroni correction (p = .001, d = 0.81) and Item 8, which is “The child is responding to the parent’s instructions, questions, and/or choices” (p = .003, d = 0.73). Finally, the baseline–coaching difference score showed small differences in most of the items in favor of the SLP majors, but these differences were only statistically significant, after Bonferroni correction, in Item 7 (p < .001, d = 0.47). These differences had the same direction, where raters with a major in SLP showed a larger difference. Consequently, academic major was the only variable, as shown in Table 5, showing a statistically significant difference in the overall score on social validity, owing to this pattern of responses shown by SLPs.
Comparison Between Groups of Raters and Majors in Baseline, Coaching, and Coaching–Baseline Difference Scores.
Note. Paired-samples t tests and Mann–Whitney results. The statistic column shows the Student’s t test and Mann–Whitney values. Effect size is given by Cohen’s d; for the Mann–Whitney test, effect size is given by the rank-biserial correlation. ES: Estimated score. SPED = special education; SLP = speech and language pathologist; SV = social validity.
Statistic and significance value from Mann–Whitney U test owing to the violation of the equal variance assumption (Levene’s test < .05).
Figures 1 and 2 summarize the different results of the baseline and coaching scores by type of rater and major presented above. The significance values represent the simple main effects for within-groups comparisons (near each line), the differences between groups in the baseline and coaching phases (square brackets in the graph), and the differences between groups in relation to the coaching–baseline difference score (square brackets in the legend).

Graphical summary of results on baseline and coaching comparisons by raters.

Graphical summary of results on baseline and coaching comparisons by student majors.
Discussion
Although the assessment of social validity has been regarded as a critical feature and indicator of quality in SCR methodology, few researchers have conducted rigorous analysis of social validity; our study expands the body of social validity literature related to parent-implemented communication interventions. The PiCSS study provided support for the effectiveness of delivering systematic parent training and coaching and extended the evidence for the PiCS (Meadan, Angell, et al., 2014) and i-PiCS programs (Chung et al., 2016; Meadan et al., 2016) as effective models of training and coaching to facilitate parent-implemented communication interventions across different family routines such as storybook reading. In the current study, we evaluated the program’s social validity to examine the acceptability and importance of the PiCSS program using masked raters. This information is especially important because socially valid interventions increase the likelihood of parents’ continued use of the strategies they learn during interventions (Kazdin, 2011; Turan & Meadan, 2011; Wolf, 1978). By seeking feedback from external evaluators (masked raters), we incorporated anonymity into the assessment of social validity. The feedback we received through social validity rating confirmed the social importance of the PiCSS program. All masked raters, regardless of major (SPED vs. SLP) or type (college students vs. practitioners), rated the coaching videos significantly higher than the baseline videos, suggesting that the PiCSS intervention resulted in improved parent use of communication strategies and child social-communication behavior. This finding is consistent with the qualitative (interview) social validity findings of the PiCSS study (Akamoglu & Meadan, 2019).
Moreover, these findings are consistent with previous research assessing the social validity of interventions by asking external evaluators from the community to evaluate videos of participants from various phases in the study (English et al., 1997; Jung et al., 2008; Meadan et al., 2014; Morrison, Sainato, Benchaaban, & Endo, 2002). These researchers combined social validity rating scales and/or checklists with video clips and showed them to masked raters. Although different aspects of the procedures such as the behavior of participating children (Oke & Schreibman, 1990) or the feasibility of the intervention (Jung et al., 2008) were evaluated, raters in all those studies had higher ratings for postintervention procedures and outcomes compared with preintervention. In their literature review on social validity, Ledford et al. (2016) reported that approximately half of the studies that used masked ratings reported positive results, which aligns with the findings of the current study.
We found statistical differences in the mean scores across types of raters. Practitioners scored baseline videos with lower and coaching videos with higher mean scores than the students. This difference between practitioners and students might be explained by teaching experience. Practitioners might have more experience with parent–child interactions and might be more sensitive to changes. There were statistically significant differences in the mean scores between academic majors. SLP majors scored baseline videos with lower and coaching videos with higher mean scores compared with SPED majors. SLPs are trained specifically on language, speech, and communication development and skills. This might be the reason for the differences between the majors. However, in a similar study (Meadan et al., 2014), ECSE teachers reported higher scores compared with SLPs in service. The authors hypothesized that the difference might be due to the ECSE teachers receiving training in naturalistic teaching strategies during preservice years and having more experience working with children with disabilities.
Overall, regarding the effectiveness of the PiCSS program, the findings are consistent with previous parent-implemented communication interventions (e.g., Brown & Woods, 2015; Kaiser & Roberts, 2013; Kashinath et al., 2006; Meadan, Angell, et al., 2014; Meadan et al., 2016). These studies have also shown that parents can be taught and coached to implement naturalistic teaching strategies and their implementation might lead to positive results in children’s social-communication skills.
Limitations
This study has three primary limitations. First, this was a small-scale study that focused on videos of only two mother–child dyads. The participants in the original PiCSS study were White, educated, and from high-income families, which limits the generalizability of the social validity findings. Future research should test similar interventions in large-scale studies and evaluate their social validity with diverse families from different ethnic, educational, and income backgrounds.
The second limitation is related to the participants in the social validity survey. In this study, only practitioners and preservice teachers were recruited to rate the videos. Because the participating families’ interview and survey data were reported in Akamoglu and Meadan’s (2019) article, those data were not included in this article. In addition, combining the families’ interview and survey data with the ratings would have required a mixed-methods design, which was not the goal of this article. To strengthen the results with different perspectives, other parents who have children with or without disabilities or delays could have been included. Future research should evaluate the social validity of such interventions with a diverse group of raters (practitioners, students, and parents). In addition, the data from the participant interviews and surveys could be combined with the external raters’ data using a mixed-methods design. Information from all stakeholders could provide a comprehensive and holistic picture of the social validity of the intervention.
The third limitation is related to the video selection. Each selected video was 1 min long, and it is possible that 1-min videos did not capture all changes in parent–child interactions. To provide a fuller picture, future research should consider ways to include longer videos for a better understanding of parent–child interactions.
Implications
This study adds to the literature on social validity assessment methods, and it has implications for research and practice. Feedback collected from external evaluators through social validity assessments (e.g., surveys) can be a valuable resource for researchers to facilitate buy-in for their work among families and practitioners and to translate it into practice (Hurley, 2012). Researchers can supplement their observational SCR data with quantitative data obtained from external evaluators via surveys (Meadan et al., 2014). Information gathered from people who are naïve to the study but have expertise or knowledge about parent–child interactions might be necessary to determine whether the magnitude of change in the dependent variable is socially valid. As pointed out by Meadan et al. (2014), researchers should identify “(a) appropriate evaluators, (b) an accurate and adequate sample, and (c) a valid instrument (e.g., survey)” (p. 417).
We also found that college students from special education and speech and hearing departments were capable of rating the parent–child interactions. Researchers in teacher and practitioner preparation programs can recruit students for social validity studies and seek feedback from them as external evaluators. In addition, in EI/ECSE, there is a need for professionals from different backgrounds and with different experiences to collaborate with each other and partner to support families and children. Therefore, professionals in preservice training should be taught how to collaborate and partner with families of children with DD. SPED and SLP majors should graduate with knowledge and experience in EI/ECSE, professional–family partnership, and different disability categories.
Our findings highlight two specific implications for practice. First, family values, goals, and priorities are important in identifying appropriate intervention goals and procedures. Social validity assessments provide an avenue for parents to make their voices heard by others. This provides an opportunity for practitioners to listen to parents’ and their children’s voices and understand their priorities, concerns, and goals for their children and families.
Second, social validity findings supplement observational data and can be used to facilitate the buy-in process with families. Many families seek evidence-based and/or recommended practices for their children with disabilities. Thus, practitioners can combine observational (e.g., single-case) and social validity findings to guide families toward practices that are feasible and effective (Ledford et al., 2016).
Conclusion
Wolf (1978) set out an imperative for the field of ABA: assessing the social validity of our interventions. The number of social validity assessments has been growing, especially since social validity was defined as a critical feature of SCR (Horner et al., 2005). The number of different methodologies for assessing social validity has increased as well. For example, quantitative rating scales could be used in a study to confirm that an intervention produced positive and socially valid outcomes for the participants. Researchers should continue to examine valid and comprehensive methods and tools for measuring social validity.
Supplemental Material
Supplemental material, Appendix_A_Rating for Using Masked Raters to Evaluate Social Validity of a Parent-Implemented Communication Intervention by Yusuf Akemoglu, Pau Garcia-Grau and Hedda Meadan in Topics in Early Childhood Special Education
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.