Abstract
A pivotal response treatment package consisting of clinician-delivered and parent-implemented strategies was recently found to be effective in improving language and social communication deficits in children with autism spectrum disorder. Reciprocal vocal contingency, an automated measure of vocal reciprocity, may provide stronger and complementary evidence of the effects of the pivotal response treatment package. Reciprocal vocal contingency is derived through an automated process from daylong audio samples from the child’s natural environment. Therefore, reciprocal vocal contingency is at lower risk for detection bias than parent report and brief parent–child interaction measures. Although differences were non-significant at baseline and after 12 weeks of intervention for the 48 children with autism spectrum disorder who were randomly assigned to the pivotal response treatment package or a delayed treatment control group, the pivotal response treatment package group had higher ranked reciprocal vocal contingency scores than the control group after 24 weeks (U = 125, p = .04). These findings are consistent with results from parent report and parent–child interaction measures obtained during the trial. The participants in the pivotal response treatment package exhibited greater vocal responsiveness to adult vocal responses to their vocalizations than the control group. Findings support the effectiveness of the pivotal response treatment package on vocal reciprocity of children with autism spectrum disorder, which may be a pivotal skill for language development.
Lay abstract
A recent randomized controlled trial found that children with autism spectrum disorder who received a pivotal response treatment package showed improved language and social communication skills following the intervention. The pivotal response treatment package includes clinician-delivered and parent-implemented strategies. Reciprocal vocal contingency is an automated measure of vocal reciprocity derived from daylong audio samples from the child’s natural environment. It may provide stronger and complementary evidence of the effects of the pivotal response treatment package because it is at lower risk for detection bias than parent report and brief parent–child interaction measures. The current study compared reciprocal vocal contingency for 24 children with autism spectrum disorder in the pivotal response treatment package group and 24 children with autism spectrum disorder in the control group. The pivotal response treatment package group received 24 weeks of the pivotal response treatment package intervention. The control group received their usual intervention services during that time. The groups did not differ in reciprocal vocal contingency when the intervention started or after 12 weeks of intervention. However, after 24 weeks the pivotal response treatment package group had higher ranked reciprocal vocal contingency scores than the control group. These findings are consistent with results from parent report and parent–child interaction measures obtained during the trial. The participants in the pivotal response treatment package exhibited greater vocal responsiveness to adult vocal responses to their vocalizations than the control group. Findings support the effectiveness of the pivotal response treatment package on vocal reciprocity of children with autism spectrum disorder, which may be a pivotal skill for language development.
One instantiation of naturalistic developmental behavioral interventions (NDBIs) combines clinician-delivered intervention and parent training. Pivotal response treatment (PRT) is an NDBI for children with autism spectrum disorder (ASD) that targets “pivotal” skills (e.g. motivation and responsivity) to facilitate broad changes across functional skills (e.g. Koegel et al., 2005). However, prior randomized controlled trials (RCTs) for PRT had investigated either clinician-delivered or parent-implemented PRT (e.g. Hardan et al., 2015; Mohammadzaheri et al., 2014).
A recent RCT (Gengoux et al., 2019) evaluated the effectiveness of combining PRT parent training with in-home, clinician-delivered PRT (PRT-Package or PRT-P) on communication skills of children with ASD aged 2 to 5 years old. The treatment group received 12 weeks of weekly parent training sessions and 10 hours per week of clinician-delivered in-home treatment, followed by 12 weeks of monthly parent training sessions and 5 hours per week of clinician-delivered in-home treatment (Gengoux et al., 2019). The delayed treatment control group received continued community services during the 24-week trial. Participants who received PRT-P (n = 23) exhibited greater gains in frequency of functional utterances during structured laboratory observations (SLO) and in parent-reported expressive vocabulary (MacArthur-Bates Communicative Development Inventory (MB-CDI); Fenson et al., 2007), Clinical Global Impressions–Improvement subscale (CGI-I; Guy, 1976), and Brief Observation of Social Communication Change (BOSCC, Grzadzinski et al., 2016) during a parent–child interaction relative to the control group (n = 20; Gengoux et al., 2019). Investigators and research staff blind to group assignment completed outcome measures requiring clinical assessments (CGI-I) or video scoring (SLO, BOSCC).
In this more stringent analysis, we test whether the PRT-P group outperforms the control group on a more objective measure that is less influenced by context and at lower risk of detection bias: reciprocal vocal contingency (RVC), derived from daylong vocal samples in the home environment. Two 16-hour weekend-day audio recordings were collected using Language ENvironment Analysis (LENA) devices for participants at baseline, intervention midpoint (12 weeks post study entry), and end of intervention (24 weeks post study entry). Briefly, RVC is the degree to which an adult vocal response to a child vocalization increases the likelihood of an immediately following child vocalization beyond chance (Harbison et al., 2018). RVC quantifies the extent to which child vocalizations occur after parent responses to child vocalizations more than after other event types or times (Harbison et al., 2018). It measures a three-event sequence (child vocalization→adult vocalization→child vocalization) rather than the two-event sequences (child vocalization→adult vocalization; adult vocalization→child vocalization) used by the LENA conversational turn variable.
RVC is an appealing outcome measure for the PRT trial for at least three reasons. First, RVC quantifies vocal reciprocity, which is related to communication and language development based on empirical and theoretical evidence in children with ASD (Harbison et al., 2018; Warlaumont et al., 2014). Second, its automated nature reduces bias risk because no human coding is used. With other measures, the coder’s lack of blindness to treatment assignment can increase the risk of overestimating the treatment group’s true skills relative to the control group, particularly when communication acts are marginally intelligible. Third, because RVC is calculated from audio recordings of daylong vocal samples in the child’s natural environment, it represents the child’s interaction with multiple adults across natural settings. Even on weekends, parents, grandparents, adult friends, and other visitors may talk with the child. Assuming that only one of these adults received parent training, some adults will not be using the trained interaction style. In this way, RVC is a measure of children’s generalized tendency to be vocally responsive to adults’ vocal responses. This characteristic of RVC is a large advantage over measures of parent–child interaction in the lab (e.g. SLO, BOSCC) because when parents are the adults eliciting the child behavior, the parents are playing the “examiner” role in the measurement context for the child dependent variable. When half of the parents are using an interaction style that immediately elicits child vocalizations, as is possibly the case for parents receiving PRT-P training, children in the PRT-P group are tested under importantly different conditions than children in the control group. Thus, child skills derived from parent–child interaction sessions in parent-implemented intervention trials are at risk of correlated measurement error (i.e. the true score is systematically overestimated for the predicted superior group and systematically underestimated for the control group; Yoder et al., 2018). RVC from daylong vocal samples in the natural environment is much less likely to be biased in this manner. Ultimately, PRT-P is designed to influence children’s generalized tendencies, not just their behaviors when interacting with their parents in a laboratory environment.
Purpose
This study evaluated whether participants in the PRT-P group exhibited higher RVC values than participants in the control group (community treatment) after 12 and 24 weeks of intervention. If the PRT package, which combines clinician-delivered and parent-implemented PRT, shows an effect on an automated measure of RVC derived from daylong recordings, a stronger argument for the effects of PRT-P could be made than has been previously documented.
Methods
Participants
Participants in this study were enrolled in the PRT-P RCT (Gengoux et al., 2019). They were referred by local professionals or recruited through flyers and word of mouth. Participants enrolled from December 2013 through July 2016. See Gengoux et al. (2019) for the CONSORT flow diagram. ASD diagnosis was based on meeting criteria on the Autism Diagnostic Interview–Revised (ADI-R; Lord et al., 1994), Autism Diagnostic Observation Schedule (ADOS-2; Lord et al., 2012), Diagnostic and Statistical Manual of Mental Disorders, fifth edition, diagnostic criteria for ASD (American Psychiatric Association, 2013), and expert clinical judgment. Inclusion criteria also included scoring at least one standard deviation below the mean for 2- and 3-year-olds, two standard deviations below the mean for 4-year-olds, and three standard deviations below the mean for 5-year-olds on the Preschool Language Scale–Fifth Edition Expressive Communication subscale (Zimmerman et al., 2011). Exclusion criteria included individual speech therapy for more than 1 hour per week, one-on-one applied behavior analysis treatment for more than 15 hours per week, severe psychiatric disorder, genetic abnormality or active medical disease, primary language other than English, or living more than 50 miles from the research center.
Forty-eight participants were randomized (PRT-P = 24; control = 24). Forty-three families completed the trial (PRT-P = 23; control = 20). Data for the current analysis (i.e. naturalistic audio recording with at least 60 min of “meaningful time”; LENA Research Foundation, 2015) were available for 40 participants (PRT-P = 20; control = 20). Meaningful time is defined consistent with LENA user guidelines as “usable, distinguishable speech that is included in the reported information” (LENA Research Foundation, 2015, p. 35).
Participating children were primarily male (88%). Participating parents were primarily female (79%) and college graduates (84%). The sample was ethnically diverse with 56% Asian, 28% Caucasian, 7% biracial/other, 7% Hispanic, and 2% Native Hawaiian. Median annual household income was reported in the $150,000–$200,000 range. Table 1 displays additional participant characteristics. Mann–Whitney U tests revealed a baseline group difference only for words understood on the MacArthur-Bates Communicative Development Inventory–Words and Gestures Form (MB-CDI WG; Fenson et al., 2007). However, because MB-CDI WG Receptive was not significantly correlated with RVC at Time 3 (Spearman’s rho = .25; p = .12), this baseline difference is unlikely to explain the treatment effect presented below.
Table 1. Participant characteristics at study initiation.
PRT-P: pivotal response treatment package including parent training and clinician-delivered intervention; control group: delayed treatment control group; mental age: mean age equivalent score across subtests of the Mullen Scales of Early Learning (MSEL; Mullen, 1995); developmental quotient: mental age divided by chronological age multiplied by 100; MB-CDI WG: MacArthur-Bates Communicative Development Inventory–Words and Gestures Form (Fenson et al., 2007); MSEL: Mullen Scales of Early Learning; MB-CDI WS: MacArthur-Bates Communicative Development Inventory–Words and Sentences Form (Fenson et al., 2007); U: test statistic for Mann–Whitney U test.
PRT-P and control conditions
The PRT-P included weekly individual parent training sessions and 10 hours per week of in-home intervention for 12 weeks, followed by 12 additional weeks of monthly parent training sessions and 5 hours per week of in-home intervention. Parent training sessions were conducted in a clinic and involved systematically introducing the motivational components of PRT to parents, clinician modeling of implementation with the child, and parent practicing directly with the child with in vivo clinician feedback. During in-home sessions, a trained clinician played with the child using child-preferred toys and implemented PRT strategies to encourage functional communication practice. See Gengoux et al. (2019) for a detailed description of the intervention procedures. Children in the delayed treatment control group continued stable community services during this study.
RVC
We quantified RVC using a three-event sequential analysis to calculate the degree to which children’s vocal responses were contingent on prior adult vocal responses to children’s immediately preceding vocalizations. RVC is independent of chance sequencing of events (Yoder et al., 2018). Because this analysis requires long vocal samples, we used two daylong LENA system audio recordings per participant per period. Recordings were collected at baseline and after 12 and 24 weeks of intervention. We used the LENA Pro computer program to segment and classify adult and child vocalizations and an open-sourced, cross-platform computer program (Version 1; https://github.com/HomeBankCode/LENA_contingencies) to compute RVC values for each daylong vocal sample using a 2-s pause duration (Yoder et al., 2016).
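The logic of a three-event sequential analysis can be illustrated with a minimal sketch. This is not the algorithm implemented in the LENA_contingencies program, and the index below (a simple difference in conditional probabilities) is only an assumed stand-in for the RVC statistic described by Yoder et al. (2018). The event codes ("C" for child vocalization, "A" for adult vocalization, "X" for any other event) and both function names are hypothetical.

```python
def three_event_table(events):
    """Tally a 2x2 table over every position that has two preceding events.

    Rows: was the preceding pair child -> adult ("C", "A")?
    Columns: is the current event a child vocalization ("C")?
    Returns (a, b, c, d) = (pair & C, pair & not C, no pair & C, no pair & not C).
    """
    a = b = c = d = 0
    for i in range(2, len(events)):
        antecedent = events[i - 2] == "C" and events[i - 1] == "A"
        target = events[i] == "C"
        if antecedent and target:
            a += 1
        elif antecedent:
            b += 1
        elif target:
            c += 1
        else:
            d += 1
    return a, b, c, d


def contingency_index(table):
    """Assumed index: P(C | child->adult pair) minus P(C | any other context).

    Returns None when either row of the table is empty, because the
    conditional probabilities are then undefined.
    """
    a, b, c, d = table
    if a + b == 0 or c + d == 0:
        return None
    return a / (a + b) - c / (c + d)


# Hypothetical toy sequence; real input would be thousands of segmented events.
toy_events = list("CACXCACAAC")
toy_table = three_event_table(toy_events)
toy_index = contingency_index(toy_table)
print(toy_table, round(toy_index, 3))
```

A positive value in this sketch means that a child vocalization is more likely after a child→adult pair than in other contexts. The None return shows one way an estimate can become undefined when cells of the table are empty.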
Results
The mean length of the audio recordings was 15 h, 25 min (SD = 1 h, 42 min) for the PRT-P group and 15 h, 2 min (SD = 2 h, 39 min) for the control group. Recording length did not differ by group (t = 0.55, p = .59). Two families contributed a single recording at Time 1 (PRT-P group = 0), seven did so at Time 2 (PRT-P group = 3), and eight at Time 3 (PRT-P group = 4). Three of the single recordings included less than 60 min of meaningful time and were therefore excluded. All participants with usable Time 3 recordings also had usable Time 1 and Time 2 recordings, except for one participant whose Time 2 RVC could not be computed because of zero values in multiple contingency table cells.
As a measure of RVC stability across recordings, we calculated intraclass correlation coefficients (ICCs) for two daylong audio recordings at each time point. The ICCs were 0.59, 0.77, and 0.63 at Times 1, 2, and 3 respectively. Mitchell (1979) describes ICCs of 0.7 as “very good.” The following RVC analyses used the sum of the two daylong recordings per participant per time point.
Because the population distribution of RVC is unknown, we used the nonparametric Mann–Whitney U test to assess between-group differences. Although group differences were non-significant at baseline (Time 1; n1 = 20 (PRT-P group); n2 = 20 (control group); ∑R1 = 458; ∑R2 = 362; U = 152; p = .20) and after 12 weeks of intervention (Time 2; n1 = 20; n2 = 19; ∑R1 = 449; ∑R2 = 331; U = 141; p = .18), children in the PRT-P group had higher ranked RVC scores than children in the control group after 24 weeks of treatment (Time 3; n1 = 20; n2 = 20; ∑R1 = 485; ∑R2 = 335; U = 125, p = .04). The percentage of all possible pairwise comparisons for which the PRT-P group exceeded the control group at Time 3 was 69%, which is a moderate effect size (Wuensch, 2015). Figure 1 displays the median and confidence intervals for Time 3 RVC by group.
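The reported U statistics and the 69% pairwise figure can be recovered from the group sizes and rank sums given above. The following sketch uses the standard identity U = ∑R − n(n + 1)/2 for each group; the function names are illustrative only and are not taken from the study's analysis software.

```python
def u_from_rank_sums(n1, r1, n2, r2):
    """Return (U1, U2, U) from group sizes and rank sums.

    U1 counts the pairs in which a group 1 score exceeds a group 2 score
    (ties count one half); U is the smaller of U1 and U2.
    """
    n = n1 + n2
    if r1 + r2 != n * (n + 1) / 2:
        raise ValueError("rank sums do not add up to n(n + 1)/2")
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    return u1, u2, min(u1, u2)


def pairwise_superiority(u_group, n1, n2):
    """Proportion of all n1 * n2 pairs in which the chosen group scores higher."""
    return u_group / (n1 * n2)


# Values reported in the text: (n1, sum R1, n2, sum R2) at each time point.
time1 = u_from_rank_sums(20, 458, 20, 362)
time2 = u_from_rank_sums(20, 449, 19, 331)
time3 = u_from_rank_sums(20, 485, 20, 335)
time3_effect = pairwise_superiority(time3[0], 20, 20)
print(time1[2], time2[2], time3[2], time3_effect)
```

For Time 3, the PRT-P group's U1 is 275 out of 400 possible pairs, that is, 0.6875, which rounds to the 69% reported.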

Figure 1. Reciprocal vocal contingency at the end of the trial.
Discussion
Compared with the control group, the PRT-P group exhibited significantly higher RVC values after 24 weeks of intervention. No significant difference was observed at baseline or intervention midpoint. The lack of an effect at the intervention midpoint is not explained by reduced stability because RVC stability was higher at the midpoint (Time 2) than at the endpoint (Time 3), for which an effect was identified. The findings support the effectiveness of PRT-P on vocal reciprocity of children with ASD, which may be a pivotal skill for language development. The RVC analysis results are consistent with results from expressive language measures obtained during the RCT (Gengoux et al., 2019) and in earlier PRT studies (e.g. Gengoux et al., 2015; Hardan et al., 2015; Mohammadzaheri et al., 2014). They provide additional support for PRT-P benefits. Arguably, RVC assesses a more generalized behavioral tendency and is at lower risk of detection bias than the measures underlying previously reported PRT-P benefits (e.g. parent-reported expressive vocabulary and social communication within parent–child interaction sessions). In addition, RVC is derived from vocal samples collected in the child’s natural environment, which often included multiple communication partners, not only the parent who had learned to implement the intervention.
Limitations
The identities of the adult communication partners in the daylong audio recordings are unknown. Therefore, we cannot differentiate participants’ RVC values with the parent who received PRT training from their RVC values with other individuals who did not receive PRT training. There is also some known error in the LENA system’s classification of acoustic events. These classification errors involve misidentifying the speaker (e.g. child vs adult) and the event type (e.g. speech-like child vocalization vs other child vocalization; Warren et al., 2010). Reports indicate that the LENA system algorithms are generally conservative relative to human coders (e.g. Warren et al., 2010). Collecting a large amount of data (i.e. multiple daylong samples) reduces the impact of these classification errors.
Future directions
Given the novel nature of the RVC treatment effect, replication is required. Continued investigation is also required to determine the specific time range and developmental level for which RVC is most useful.
Conclusion
In summary, participants in the PRT-P group exhibited a greater generalized tendency to be vocally responsive to adults’ vocal responses than participants in the control group. The identified treatment effect on vocal reciprocity as measured by RVC, which is at low risk for bias and evaluates children’s performance in naturalistic settings, lends further support to the broad-reaching effects of PRT. The finding supports continued development and evaluation of PRT for children with ASD.
Footnotes
Author contributions
J.M. participated in the study design, analyzed the data, and drafted the manuscript; P.Y. conceived the study, participated in the study design, helped interpret the data, and helped draft the manuscript; M.C. participated in the study design and analyzed the data; M.E.M. collected the data and helped interpret the data; C.M.A. collected the data and helped interpret the data; G.W.G. and A.Y.H. conceived the study, participated in the study design, supervised data collection, and helped interpret the data. All authors read and approved the final manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the National Institute on Deafness and Other Communication Disorders (NIDCD) (DC01368902) and supported by two US Department of Education grants (H325D140087 and H325D140077). We thank all of the children and families who participated to make this work possible.
