Abstract
Naturalistic developmental behavioral intervention is an emerging class of interventions for young children with autism spectrum disorder. The present article is a meta-analysis of outcomes of group-design studies (n = 27) testing interventions using naturalistic developmental behavioral intervention strategies. Small, significant positive effects of naturalistic developmental behavioral intervention were found for expressive language (g = 0.32), reduction in symptoms of autism spectrum disorder (g = −0.38), and play skills (g = 0.23). Larger effects were found for social engagement (g = 0.65) and overall cognitive development (g = 0.48). Marginal effects were found for joint attention (g = 0.14) and receptive language (g = 0.28). For joint attention, improvement was moderated by hours of professional involvement. Evidence of publication and reporting bias was present for language outcomes. This meta-analysis adds to the evidence base for naturalistic developmental behavioral interventions, particularly in the key areas of social engagement and cognition.
Autism spectrum disorder (ASD) is a neurodevelopmental disorder associated with deficits in social communication, as well as restricted interests and repetitive behaviors. ASD is frequently comorbid with impairments in cognitive development, adaptive behavior, and verbal and nonverbal communication (American Psychiatric Association [APA], 2013). ASD can be reliably diagnosed as young as 24 months, and gold-standard diagnostic tools are validated for children as young as 12 months (Luyster et al., 2009; Ozonoff et al., 2015). Early intensive behavioral intervention (EIBI) can improve communication, adaptive behavior, and cognitive development (Rogers & Vismara, 2008; Smith & Iadarola, 2015). However, change in core symptoms, such as social reciprocity, has been more elusive (Reichow, 2012; Smith & Iadarola, 2015). Developmental social-pragmatic (DSP) interventions focus more on adult facilitation of social interaction and use of naturally occurring teaching opportunities for addressing core social impairments, but have less evidence for efficacy, particularly in key areas such as cognitive development (Ingersoll, 2010).
Naturalistic developmental behavioral intervention
Naturalistic developmental behavioral intervention (NDBI) is an intervention model based on both behavioral and developmental principles, which incorporates naturally occurring contexts and contingencies, and shared control between the interventionist and student (Schreibman et al., 2015). In NDBIs, teaching in natural learning environments is strongly emphasized. Accordingly, decontextualized teaching (e.g. at a table or in other highly structured environments) is eschewed in favor of natural contexts such as play and daily routines. Another defining feature of NDBI models is their emphasis on encouraging spontaneous initiations, rather than responses to specific questions or prompts. Similarly, emphasis is placed on natural connections between the target skill and the consequence. In practice, this translates to using reinforcers that are contextually and logically linked to the expressed skill (e.g. social praise and affect for demonstration of social communication, or access to materials upon appropriate and relevant verbal request). Setting up opportunities for spontaneous, flexible responses is emphasized, rather than using repetitive cues (discriminative stimuli) which signal circumscribed, specific responses. What further distinguishes NDBI models from other models is the systematic application of behavioral principles and behavioral modification. Some models which emphasize the naturalistic features just described also endorse “universal acceptance” of child behaviors, with few or no provisions for corrective or contingent feedback. A final distinguishing feature is clear and explicit behavioral targets and monitoring of goals.
Although NDBIs vary in some specifics, they are all characterized by the same overarching principles (Schreibman et al., 2015). When considered as a class, NDBIs have accrued a robust and growing evidence base. Initial research efforts for many models used primarily single-subject designs. Reciprocal Imitation Training (RIT; Ingersoll & Schreibman, 2006), Pivotal Response Training (PRT; Koegel & Frea, 1993; Stahmer, 1995), and the Early Start Denver Model (ESDM; Vismara, Colombi, & Rogers, 2009) are examples of NDBIs that have acquired, and continue to acquire, strong single-subject design support. Recently, a wider array of group-design studies has been conducted. For many of the most widely used NDBIs, including PRT, JASPER (Joint Attention, Symbolic Play, Engagement, and Regulation), ESDM, and LEAP (Learning Experiences Alternative Program), there are a number of clinician-mediated, group-design studies with promising results (e.g. Kasari, Freeman, & Paparella, 2006; Mohammadzaheri, Koegel, Rezaee, & Rafiee, 2014; Strain & Bovey, 2011). Some positive results have also been reported for parent- and teacher-mediated treatments (Kasari et al., 2014; Wetherby & Woods, 2006), although findings are mixed (Rogers et al., 2012).
Rationale for the present study
The present study aims to quantitatively examine the existing NDBI evidence base by conducting a meta-analysis of published NDBI group-design studies. While many positive reports of NDBIs exist, further consolidation of existing literature will allow for clearer consensus on the effects of NDBI for specific outcomes (e.g. joint attention, cognitive development, expressive language). In addition, many NDBI studies examine low-intensity models and/or treatments that are implemented in community settings. This focus on effectiveness is a strength of the NDBI evidence base in many ways; however, it may lead to smaller and more variable effect sizes in comparison to larger and more tightly controlled efficacy studies. Meta-analysis is likely to be an effective way to integrate this literature base, and the combined weight of multiple low-intensity stakeholder-implemented studies will be a more powerful way to test for attenuated but possibly still present therapeutic effects.
Methods
Primary search procedure
A literature search was conducted in June 2016 and updated in January 2018 using PsycINFO, Dissertation Abstracts International, and MEDLINE. Since the term “NDBI” was not formalized until 2015, the following search terms were chosen to cast a wide net of possibly applicable intervention studies: autism AND (intervention OR treatment OR therapy) AND (development* OR behav* OR naturalistic) AND (social OR communication OR language). This initial search resulted in 2301 records.
Inclusion/exclusion process and secondary search procedures
Reports in languages other than English and reports with non-pediatric populations were excluded, and remaining abstracts and titles were reviewed for relevance. Of this initial search, 95 reports were identified as possibly relevant, and 25 additional reports were identified through a reference review of this group. This search was updated in January 2018, and an additional 45 reports were added, for a total of 165 possibly relevant reports.
Methodological review
Next, all identified reports were methodologically reviewed using the following inclusion criteria: (1) must use a group design with a control or comparison group; (2) participants must have ASD or be showing red flags for ASD; (3) mean age of participants must be under 6 years, 0 months; and (4) reported outcomes must be related to development or core features of ASD and must be reported at a post-treatment time point. Group-design studies were required, as other designs are incompatible with this quantitative approach (e.g. multiple baseline design studies) or not sensitive to maturational effects (pre–post design studies). NDBIs are typically explicitly geared for preschool populations with ASD. Accordingly, a mean age of under 6 and concern regarding symptoms of ASD were required for study participants. Efficacy for older children and for children with other developmental delays is an important but separate question. Studies which focused on parent variables (e.g. parental stress) or concerns not related to a diagnosis of ASD were excluded due to limitations of scope of the present analysis. Each study was double-coded by the authors on methodological criteria, and disagreements were resolved through discussion. Coders agreed on 90% of reviewed reports; 90 of the 165 reports were excluded.
Intervention review
Reports which passed the methodological review (n = 75) were reviewed to ascertain if they contained an experimental condition qualifying as NDBI-based. The authors used (1) qualitative descriptions of the core theoretical principles of NDBIs, as laid out by Schreibman et al. (2015), and (2) review of intervention manuals of six established NDBI models: JASPER, ImPACT (Improving Parents as Communication Teachers), RIT, PRT, ESDM, and SCERTS (Social Communication and Emotional Regulation Transactional Support) (Ingersoll, 2008; Ingersoll & Dvortcsak, 2009; Kasari, 2012; Koegel & Koegel, 2006; Prizant, Wetherby, Rubin, Laurent, & Rydell, 2005; Rogers & Dawson, 2010), in order to establish clear decision rules and inclusion/exclusion criteria for identifying whether interventions qualified as NDBI-based. Reviewers were the first and second authors, who collectively have formal training and practical experience in four manualized NDBIs, as well as Applied Behavior Analysis (ABA). Out of this process, the inclusion and exclusion criteria in Table 1 emerged, with four broad inclusion domains: (1) use of natural contexts, (2) use of natural contingencies, (3) integration of developmental principles, and (4) integration of behavioral principles.
Table 1. Inclusion and exclusion criteria for intervention review.
Identified reports (n = 26) that used interventions noted in the Schreibman et al. (2015) paper as exemplars of the NDBI models were automatically included, as the inclusion criteria for the present analysis were explicitly based on commonalities in their manuals. These identified exemplars included JASPER (n = 13), PRT (n = 6), ESDM (n = 4), RIT (n = 2), Incidental Teaching (n = 0), Enhanced Milieu Teaching (n = 0), ImPACT (n = 1), and SCERTS (n = 0). Remaining reports not using one of these identified models were double-coded for inclusion. In order to determine whether other interventions met criteria, authors used information from each research article, as well as other available published information such as other referenced papers or reports, published treatment manuals, and websites describing commercially available curricula. For the 49 reports which did not use a pre-included model, reviewers agreed on eligibility in 88% of cases. Disagreements were resolved by discussion. Studies examining use of telehealth models were not included, as the stated goals of these studies were typically to examine innovative service delivery formats, rather than intervention efficacy (n = 1). In addition, studies comparing two NDBI models to one another, or an NDBI model to another established evidence-based intervention such as Picture Exchange Communication System (PECS) or ABA-based interventions (n = 7 reports from five studies), were excluded, as effect size interpretation for these studies would be expected to differ substantially (i.e. an expectation of similar efficacy across treatment models) as compared to studies using an inactive control/comparison group. A total of 37 reports were excluded as a result of the intervention review, leaving 30 reports for inclusion. Finally, individual follow-up searches were conducted for each identified qualifying NDBI to fully capture relevant literature. Results of these searches yielded 20 additional studies. Six of these studies passed methodological and intervention review and were added to analyses for a final count of 37 included reports. Of these studies, seven were excluded from the analyses due to incompatible outcome measures or data which were redundant with another included study. This left 29 studies from 27 unique samples for inclusion. Included studies are marked with an asterisk in the reference section (see Figure 1).

Figure 1. Flowchart of study selection and inclusion.
Data analysis plan
Assessment of heterogeneity
Heterogeneity in reported outcomes can affect interpretability of results. We assessed heterogeneity in terms of variance in reported outcomes across studies (Q statistic) with significant p values indicating presence of significant heterogeneity (Cochran, 1954). We also assessed heterogeneity in terms of percentage of unexplained variance among study estimates (I²) with larger percentages indicating greater unexplained variance across study estimates (Huedo-Medina, Sánchez-Meca, Marín-Martínez, & Botella, 2006).
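As an illustration of the two heterogeneity statistics described above, the following minimal Python sketch computes Cochran's Q from inverse-variance weights and derives I² from Q using the common formula I² = max(0, (Q − df)/Q). The function names and example inputs are hypothetical, and the I² formula is an assumption; the software used for the original analyses may estimate I² differently (e.g. from an estimate of between-study variance).

```python
def cochran_q(effects, variances):
    """Return (Q, pooled fixed-effect estimate) for per-study effect sizes."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    q = sum(w * (g - pooled) ** 2 for w, g in zip(weights, effects))
    return q, pooled


def i_squared(q, k):
    """Percentage of variability beyond sampling error, given Q and k studies."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical example: two studies with equal sampling variances.
q, pooled = cochran_q([0.2, 0.4], [0.04, 0.04])
print(round(q, 3), round(pooled, 3))

# I-squared implied by a Q of 9.1 across five studies.
print(round(i_squared(9.1, 5), 1))
```

Under this formula, Q = 9.1 across five studies corresponds to roughly 56%, which agrees with the heterogeneity values reported for adaptive behavior in the Results. The p value for Q would come from a chi-square distribution with k − 1 degrees of freedom, which is omitted here to keep the sketch dependency-free.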
Systematic differences across studies can contribute to heterogeneity in reported effects and limit interpretability of resulting composites. Some reports include multiple measures within one domain (e.g. two expressive language outcomes). These measures often varied in their degree of standardization and proximity to the intervention context. We hypothesized that more proximal and less standardized measures would evidence larger effects compared to more standardized and distal outcome measures. To test this hypothesis, we used a subgroup meta-analysis to compare stakeholder-reported versus examiner-administered outcome measures. For social engagement, we grouped measures by whether they involved interaction with a partner who was directly involved in the intervention (e.g. a parent in a parent-mediated intervention or a teacher in a classroom-based intervention) versus a non-involved engagement partner (e.g. a novel research staff member or a parent in a classroom-based intervention). For studies that reported both a proximal and a distal measure for a given outcome, each measure was included separately in the subgroup meta-analysis. Measures in remaining outcome domains (IQ, adaptive behavior, autism symptoms) included fewer than four measures in each outcome category (i.e. proximal and distal), precluding use of subgroup meta-analysis for these outcomes.
Results of the subgroup meta-analysis by measure type guided the next analysis steps. When subgroup meta-analyses indicated significantly different effects between groups, the groups were analyzed separately to reduce systematic heterogeneity; otherwise, the groups were collapsed for the moderator analyses described below, as subgroup and moderator analyses cannot be conducted simultaneously.
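The subgroup comparison can be sketched as a between-groups Q test on two pooled estimates. The Python below is a simplified illustration, not the authors' procedure: it assumes exactly two subgroups, backs out standard errors from reported 95% confidence intervals, and uses hypothetical function names.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Back out a standard error from a 95% confidence interval."""
    return (upper - lower) / (2.0 * Z_95)


def q_between(g1, se1, g2, se2):
    """Q statistic and p value for the difference between two subgroup estimates."""
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    pooled = (w1 * g1 + w2 * g2) / (w1 + w2)
    q = w1 * (g1 - pooled) ** 2 + w2 * (g2 - pooled) ** 2
    # With two subgroups Q has 1 degree of freedom, so the chi-square
    # survival function reduces to erfc(sqrt(Q / 2)).
    p = math.erfc(math.sqrt(q / 2.0))
    return q, p


# Rounded social engagement subgroup estimates taken from the Results section.
proximal_se = se_from_ci(0.28, 1.11)
distal_se = se_from_ci(0.09, 0.71)
q, p = q_between(0.69, proximal_se, 0.40, distal_se)
print(round(q, 2), round(p, 2))
```

With these rounded inputs the sketch gives a Q of about 1.2, in the neighborhood of the reported Q = 1.28. Exact agreement is not expected, because the inputs are rounded and the original model may have handled between-study variance differently.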
We hypothesized that research setting (lab-based vs community-based) and implementer (research staff vs stakeholder) may also affect results. However, we were unable to examine this quantitatively, as the majority of studies used community settings and stakeholders as implementers: 21 and 24 of 29, respectively (see Appendix 1). Qualitatively, this finding underscores the emphasis NDBI models place on community-participatory models, which may benefit stakeholders to a greater degree long term.
Outcome selection
Studies were reviewed for relevant developmental and ASD symptom outcomes. Sufficient data (three or more data points) were available for expressive and receptive language, composite and nonverbal IQ, adaptive behavior, core symptoms of ASD (social communication and restricted, repetitive behavior), social engagement, initiating joint attention, and play. For studies reporting outcomes on multiple measures under one domain, a single measure was selected using the following criteria: (1) direct observation was preferred over informant report, (2) standardized measures were preferred over unstandardized measures, and (3) blinded assessors or non-interventionists were preferred over measures from involved parties. For illustrative purposes, all measures in the same domain for each study are visually displayed in forest plots (measures not selected for final analyses are indicated with an asterisk).
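The measure-selection rule above can be sketched as a priority ordering. The Python below assumes the three criteria are applied lexicographically (criterion 1 first, then 2, then 3), which the text implies but does not state outright. The `Measure` class and the example measures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    name: str
    direct_observation: bool  # criterion 1: preferred over informant report
    standardized: bool        # criterion 2: preferred over unstandardized
    blinded_assessor: bool    # criterion 3: preferred over involved parties


def select_measure(measures):
    """Pick one measure per domain; ties keep the first listed measure."""
    return max(
        measures,
        key=lambda m: (m.direct_observation, m.standardized, m.blinded_assessor),
    )


# Hypothetical candidate measures for one study and one outcome domain.
candidates = [
    Measure("parent questionnaire", False, True, False),
    Measure("blinded standardized observation", True, True, True),
    Measure("unblinded play sample", True, False, False),
]
print(select_measure(candidates).name)
```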
Moderation analyses
Following the subgroup meta-analysis described above, dosage and study quality were examined as moderators of outcome across all studies in each outcome group. Intervention dosage was defined as number of hours of supervision or intervention provided by trained research staff. We hypothesized that lower intensity of professional involvement would result in smaller effects. Study quality was assessed using the Evaluative Method for Determining Evidence-Based Practice in Autism (Reichow, Volkmar, & Cicchetti, 2008). This scale yields ratings of “weak,” “adequate,” and “strong,” depending on methodological rigor and thoroughness of reporting. Relatively few studies in the present sample received a designation of weak (n = 6); therefore, results were coded on a binary scale, with weak or adequate designations coded as zero and strong designations coded as one. Study quality was double-coded. Raters agreed in 83% of cases, and disagreements were resolved through discussion. If publication bias affected results, higher quality studies would evidence smaller effect sizes as compared to lower quality studies.
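A minimal sketch of the moderator coding and a single-moderator meta-regression follows. It is a deliberate simplification: it fits an inverse-variance weighted least squares slope under a fixed-effect model, whereas the analyses reported here are random-effects meta-regressions that also estimate between-study variance. Function names and example data are hypothetical.

```python
def code_quality(rating):
    """Binary coding described in the text: strong = 1, weak or adequate = 0."""
    rating = rating.strip().lower()
    if rating not in {"weak", "adequate", "strong"}:
        raise ValueError("unknown quality rating: " + rating)
    return 1 if rating == "strong" else 0


def weighted_slope(moderator, effects, variances):
    """Inverse-variance weighted slope of effect size on one moderator."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, moderator)) / total
    y_bar = sum(w * y for w, y in zip(weights, effects)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, moderator))
    sxy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, moderator, effects)
    )
    slope = sxy / sxx
    se = (1.0 / sxx) ** 0.5  # standard error when sampling variances are known
    return slope, se


# Hypothetical data: dosage in hours and effect sizes for three studies.
slope, se = weighted_slope([0, 10, 20], [0.1, 0.3, 0.5], [0.04, 0.04, 0.04])
print(round(slope, 3), round(se, 4))
print(code_quality("Strong"), code_quality("adequate"))
```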
Data extraction
Information necessary for each moderator and outcome was independently extracted by both authors to ensure accuracy. Effect sizes were computed using Hedges’ g. Hedges’ g was chosen over Cohen’s d due to its additional correction, meant to counteract bias in small sample sizes (Durlak, 2009). Magnitude of composite effect sizes is reported using conventional cut points: small <0.4, moderate ⩾0.4 and <0.8, and large ⩾0.8 (Cohen, 1977). The preferred calculation method for Hedges’ g was based on exit time point means, standard deviations, and sample sizes. If group differences on outcome measures existed at baseline (t-test p < 0.05), F values or p values from a repeated measures analysis of variance (ANOVA) group by time interaction were used (Chiang, Chu, & Lee, 2016: Joint Attention; Drew et al., 2002: Nonverbal IQ; Duifhuis et al., 2017: ASD). In the case that these values were not reported, study authors were contacted, and if relevant data were provided, these were incorporated into analyses.
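The preferred calculation from post-treatment means, standard deviations, and sample sizes can be sketched as follows. The small-sample correction uses the common approximation J = 1 − 3/(4df − 1), and the variance formula is the usual large-sample approximation; both are assumptions about the exact computation used here. Function names and example values are hypothetical.

```python
import math


def hedges_g(mean_tx, mean_ctl, sd_tx, sd_ctl, n_tx, n_ctl):
    """Return (g, variance of g) from post-treatment summary statistics."""
    df = n_tx + n_ctl - 2
    pooled_sd = math.sqrt(
        ((n_tx - 1) * sd_tx ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df
    )
    d = (mean_tx - mean_ctl) / pooled_sd  # Cohen's d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)      # small-sample correction factor
    var_d = (n_tx + n_ctl) / (n_tx * n_ctl) + d ** 2 / (2.0 * (n_tx + n_ctl))
    return j * d, j ** 2 * var_d


def magnitude(g):
    """Label an effect using the cut points stated in the text."""
    size = abs(g)
    if size < 0.4:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


# Hypothetical study: treatment mean 10, control mean 8, SD 2, n = 20 per group.
g, var_g = hedges_g(10.0, 8.0, 2.0, 2.0, 20, 20)
print(round(g, 3), round(var_g, 3), magnitude(g))
```

The classifier applies the stated cut points strictly to the absolute value of g, so the sign (e.g. a negative g for symptom reduction) does not affect the label.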
Assessment of publication bias
Publication bias was assessed using funnel plots. When studies with large positive effect sizes and large standard errors are visibly overrepresented, reporting or publication bias may be present. Visual analysis of funnel plots was augmented by the “trim and fill” method (Duval & Tweedie, 2000). This nonparametric data imputation procedure statistically imputes “missing” studies, such that point estimates in the funnel plot are symmetrically distributed around the pooled estimate. This provides a more conservative, and potentially more accurate, effect size estimate. Due to methodological compatibility and sample size (n > 10) constraints, this bias correction was conducted for only a subset of measures with sufficient data, without dosage or study quality as moderators.
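The logic of trim and fill can be sketched in a few steps: estimate how many studies appear to be missing, trim that many of the most extreme studies, re-estimate the center, and finally add mirror-image studies before re-pooling. The Python below is a simplified illustration rather than the procedure used here. It assumes the L0 estimator of Duval and Tweedie, fixed-effect pooling, and missing studies on the left side of the funnel; all function names and data are hypothetical.

```python
def pooled_fixed(effects, variances):
    """Inverse-variance weighted mean effect."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def average_ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def estimate_k0(effects, center):
    """L0 estimate of the number of missing studies, given a center."""
    n = len(effects)
    deviations = [y - center for y in effects]
    ranks = average_ranks([abs(d) for d in deviations])
    t_n = sum(r for r, d in zip(ranks, deviations) if d > 0)
    l0 = (4.0 * t_n - n * (n + 1)) / (2.0 * n - 1.0)
    return min(max(0, int(round(l0))), n - 1)


def trim_and_fill(effects, variances, max_iter=50):
    """Return (number of imputed studies, adjusted pooled estimate)."""
    order = sorted(range(len(effects)), key=lambda i: effects[i])
    ys = [effects[i] for i in order]
    vs = [variances[i] for i in order]
    n = len(ys)
    k0 = 0
    for _ in range(max_iter):
        # Trim the k0 largest effects, then re-estimate the center.
        center = pooled_fixed(ys[: n - k0], vs[: n - k0])
        new_k0 = estimate_k0(ys, center)
        if new_k0 == k0:
            break
        k0 = new_k0
    center = pooled_fixed(ys[: n - k0], vs[: n - k0])
    # Fill: mirror the trimmed studies around the trimmed center.
    filled_y = ys + [2.0 * center - y for y in ys[n - k0:]]
    filled_v = vs + vs[n - k0:]
    return k0, pooled_fixed(filled_y, filled_v)


# Hypothetical effect sizes with equal sampling variances.
k0, adjusted = trim_and_fill([0.0, 0.5, 0.6, 0.7, 0.8], [1.0] * 5)
print(k0, round(adjusted, 2))
```

In this toy example one study is imputed and the pooled estimate moves from 0.52 to 0.45, illustrating why the adjusted estimate is more conservative when the funnel is asymmetric.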
Results
Expressive language
A subgroup meta-analysis comparing stakeholder-completed measures (g = 0.36, p = 0.03, 95% confidence interval (CI) = 0.05 to 0.68) to researcher-administered measures (g = 0.28, p = 0.05, 95% CI = −0.001 to 0.55) indicated no significant differences based on outcome measure type (Q = 0.16, p = 0.69) (see Figure 2(a)). Meta-analysis of all studies combined indicated a statistically significant effect of NDBIs of small strength (g = 0.32, p = 0.01, 95% CI = 0.07 to 0.56). Heterogeneity was moderate in magnitude and statistically significant (Q = 22.9, p = 0.02, I² = 54%), indicating significant variance in effect size estimates among studies. Dosage did not moderate effects (β = 0.09, p = 0.35, 95% CI = −0.10 to 0.29). A marginal moderation effect was found for study quality, with higher quality studies associated with more modest effect sizes (β = −0.41, p = 0.065, 95% CI = −0.84 to 0.03). Consistent with this, visual examination of the funnel plot indicated possible publication or reporting bias. The “trim and fill” analysis added three imputed studies to the left side of the distribution, resulting in a more conservative, nonsignificant effect size estimate (g = 0.19, p = 0.14, 95% CI = −0.07 to 0.45) (see Figure 3(a)).

Figure 2. Meta-analyses forest plots for (a) expressive language, (b) receptive language, (c) cognitive development, (d) adaptive behavior, (e) reduction in symptoms of ASD, (f) joint attention, (g) social engagement, and (h) play skills.

Figure 3. Funnel plots to examine publication bias.
Receptive language
A subgroup meta-analysis indicated no significant differences in effect size for stakeholder-completed measures (g = 0.19, p = 0.30, 95% CI = −0.17 to 0.55) as compared to researcher-administered measures (g = 0.34, p = 0.15, 95% CI = −0.12 to 0.80) (Q = 0.26, p = 0.61). Therefore, a random-effects meta-regression was performed across all receptive language outcomes (n = 10); this analysis indicated marginally positive intervention effects of small magnitude (g = 0.28, p = 0.06, 95% CI = −0.02 to 0.58) (see Figure 2(b)). Residual heterogeneity was moderate-to-large in magnitude and statistically significant (Q = 24.9, p < 0.01, I² = 64%). When study quality and dosage were added as moderators, neither dosage (β = 0.15, p = 0.35, 95% CI = −0.16 to 0.46) nor study quality (β = −0.28, p = 0.36, 95% CI = −0.87 to 0.32) significantly moderated the effects. The “trim and fill” assessment of publication bias added three studies to the left side of the distribution, and the attenuated effect was nonsignificant (g = 0.12, p = 0.44, 95% CI = −0.19 to 0.43) (see Figure 3(b)).
Cognitive development
Results from a random-effects meta-analysis yielded a significant effect of moderate strength (g = 0.48, p < 0.01, 95% CI = 0.22 to 0.74) for composite IQ/cognitive development. Heterogeneity was nonsignificant and moderate in magnitude (Q = 5.3, p = 0.25, I² = 30%). Dosage (β = 0.06, p = 0.79, 95% CI = −0.36 to 0.47) and study quality (β = −0.09, p = 0.80, 95% CI = −0.82 to 0.63) did not significantly moderate the effects. For nonverbal IQ, the meta-analysis yielded a small but significant effect (g = 0.21, p = 0.04, 95% CI = 0.01 to 0.41; see Figure 2(c)). Heterogeneity was nonsignificant and small in magnitude (Q = 6.1, p = 0.40, I² < 1%). Neither study quality (β = 0.24, p = 0.25, 95% CI = −0.17 to 0.65) nor dosage (β = 0.18, p = 0.14, 95% CI = −0.06 to 0.42) significantly moderated the effects.
Adaptive behavior
A random-effects meta-regression found no effect of NDBI on adaptive behavior (g = 0.09, p = 0.60, 95% CI = −0.24 to 0.42; see Figure 2(d)). Heterogeneity was marginally significant and moderate in magnitude (Q = 9.1, p = 0.06, I² = 56%). Due to limited variability and power (n = 5), study quality was not examined as a moderator. A marginally significant effect was found for dosage; more professional contact hours were associated with more positive findings (β = 0.30, p = 0.06, 95% CI = −0.02 to 0.62). Residual heterogeneity was nonsignificant and small to moderate in magnitude (Q = 4.3, p = 0.22, I² = 31%) when the moderator was added.
Symptoms of ASD
A random-effects meta-regression found a significant effect of small to moderate strength for NDBIs on reduction of total symptoms of ASD (g = −0.38, p = 0.03, 95% CI = −0.71 to −0.04; see Figure 2(e)). For this set of outcomes, negative estimates indicate decreased symptom presence in the NDBI group relative to the control group. Heterogeneity was significant and of moderate to large magnitude (Q = 26.1, p < 0.01, I² = 67%), and thus impacts interpretability of results. Neither dosage (β = 0.13, p = 0.54, 95% CI = −0.28 to 0.54) nor study quality (β = −0.07, p = 0.86, 95% CI = −0.89 to 0.74) moderated the results.
Joint attention
A subgroup meta-analysis indicated that joint attention measures involving individuals directly involved with treatment (g = 0.51, p < 0.01, 95% CI = 0.14 to 0.89) as compared to measures involving novel contexts and individuals (g = 0.13, p = 0.15, 95% CI = −0.05 to 0.31) were not statistically different (Q = 3.1, p = 0.08), although results trended toward larger effect sizes for more proximal measures. A random-effects meta-analysis revealed a small, marginally significant effect on initiating joint attention (g = 0.14, p = 0.07, 95% CI = −0.01 to 0.28). Heterogeneity was nonsignificant and small in magnitude (Q = 16.0, p = 0.32, I² = 7%). Study quality did not moderate the effects (β = −0.09, p = 0.59, 95% CI = −0.40 to 0.23). Dosage significantly moderated the results such that increased hours of professional contact resulted in more positive joint attention outcomes (β = 0.17, p = 0.02, 95% CI = 0.02 to 0.32). Inclusion of these moderators resulted in further reduction of heterogeneity (Q = 10.1, p = 0.60, I² < 1%; see Figure 2(f)). Visual inspection of the funnel plot did not reveal evidence of publication bias. The trim and fill procedure added zero studies to the distribution, and accordingly results were unaffected (see Figure 3(c)).
Social engagement
A subgroup meta-analysis indicated that social engagement measures involving individuals directly involved with treatment (g = 0.69, p < 0.01, 95% CI = 0.28 to 1.11) as compared to measures involving novel contexts and individuals (g = 0.40, p < 0.01, 95% CI = 0.09 to 0.71) were not statistically different (Q = 1.28, p = 0.26), although results trended toward larger effect sizes for more proximal measures. A random-effects meta-analysis indicated significant effects of NDBI on social engagement (g = 0.65, p < 0.01, 95% CI = 0.37 to 0.93; see Figure 2(g)). Heterogeneity was significant and of moderate magnitude (Q = 34.2, p < 0.01, I² = 64%). Neither dosage (β = 0.17, p = 0.20, 95% CI = −0.09 to 0.44) nor study quality (β = 0.25, p = 0.42, 95% CI = −0.36 to 0.85) moderated the results. Some evidence of publication bias appears possible, although a “trim and fill” analysis added zero studies to the left side of the distribution, and consequently conclusions were unaffected (see Figure 3(d)).
Play
A random-effects meta-analysis revealed a small, significant effect of NDBI on play (g = 0.23, p = 0.02, 95% CI = 0.04 to 0.41). Heterogeneity was nonsignificant and small in magnitude (Q = 7.7, p = 0.36, I² = 11%). The sample contained too little variability in study quality to examine it as a moderator. Dosage did not significantly moderate the results (β = −0.11, p = 0.21, 95% CI = −0.27 to 0.06; see Figure 2(h)).
Discussion
Language outcomes
A small but significant intervention effect was found for expressive (g = 0.32) and a marginal effect for receptive language (g = 0.28). Although effect sizes are modest, it is important to remember that most studies included in this analysis use either parent- or teacher-training models, and were relatively short in duration/low in intensity (75% of studies with language outcomes involved less than 50 total treatment hours). With this in mind, even small effects on communication outcomes (a key deficit area for young children with ASD) may be promising and meaningful. In another meta-analysis of parent-mediated intervention for children with ASD, language and communication outcomes were statistically significant but small in magnitude (g = 0.16; Nevill, Lecavalier, & Stratis, 2018). These findings dovetail with the present meta-analysis, which found somewhat larger effects when including a mixture of parent and professional intervention models. Diversity in NDBI models reporting language outcomes tentatively suggests that many NDBI models as a class may positively affect language development, as results appear consistent across different models.
Potential publication bias is evident in funnel plots for both expressive and receptive language, and is further supported by the results of the study quality moderation analysis for expressive language. The “trim and fill” analyses resulted in smaller and statistically nonsignificant effect size estimates. These findings suggest that small studies (often exploratory, underpowered, and less well-controlled) with exceptionally positive outcomes may be overrepresented in published material.
Cognitive development and adaptive behavior
Many NDBI models target social engagement and language during play routines specifically (e.g. JASPER, ImPACT, RIT) and have some, but relatively few, provisions for generalization to other areas of development such as pre-academics and cognitive function. In addition, many of the studies in the current sample utilized short, low-intensity models, such that change to global development would not be anticipated. Accordingly, only five studies in the present sample reported cognitive development as an outcome; three of these studies were relatively high intensity (184, 792, and 1581 total hours), while two were relatively low intensity (12 and 26 h). However, effects were consistently positive, and the composite effect size for global development was statistically significant and of moderate strength (g = 0.48). The majority of studies in this sample used the ESDM model. Therefore, conclusions for other models are less clear, and it is likely that less comprehensive, more targeted intervention models (e.g. those which do not explicitly address pre-academic skills) may not be as efficacious in this domain. The smaller effect size when examining nonverbal cognitive development (g = 0.21) suggests that gains in overall IQ may be substantially driven by gains in the verbal domain.
Five studies reported adaptive behavior outcomes, and the resulting composite effect size estimate was nonsignificant. This result emerges despite the fact that three of the five studies reporting adaptive behavior outcomes were among the highest intensity interventions reviewed. While still preliminary given the low number of studies reporting outcomes in this domain, this finding suggests that NDBI procedures may be less effective for targeting adaptive skills.
Symptoms of ASD
Effects of NDBI on overall symptoms of ASD were small to moderate (g = −0.38) and significant. Of note, significant heterogeneity was unaccounted for by the included moderators, indicating that substantial variability existed between studies. One explanation for this may be the inadequacy of available measures at capturing change on this outcome (Smith & Iadarola, 2015). The Autism Diagnostic Observation Schedule, which is the primary measure reported here, was initially designed to be a diagnostic tool, not necessarily sensitive to incremental change as a result of intervention. While calibrated severity scores were later developed to track change over time, it is still unclear whether these are sensitive enough to capture intervention effects (Kitzerow, Teufel, Wilker, & Freitag, 2016). Change-sensitive measures for core symptoms of ASD are a noted area of need in the field (Anagnostou et al., 2015). Valid, sensitive measures in this domain must be established before firm conclusions regarding intervention efficacy (or inefficacy) are merited. Recently, emergent measures, such as the Brief Observation of Social Communication Change, hold promise (Kitzerow et al., 2016).
Social engagement, joint attention, and play
Engagement, joint attention, and play are stated proximal treatment targets in a majority of NDBIs. Within the NDBI framework, improvement in other outcomes is theoretically contingent on establishing shared attention to objects and people. Indeed, building communication skills and diversifying object interaction (and in turn cognitive development) hinge on first establishing reciprocal engagement (Schreibman et al., 2015). Among these outcomes, social engagement with a familiar social partner is perhaps the most directly targeted, proximal treatment outcome, particularly in parent- and teacher-training studies. Therefore, it is unsurprising that social engagement measures resulted in the largest and most consistent effect (g = 0.65). While not significantly different, larger effect sizes were observed for measures with a partner actively involved in intervention (g = 0.69) as opposed to a partner not directly involved (g = 0.40). A similar trend was observed for initiations of joint attention. This suggests that observed changes in social engagement and joint attention may be attributable to changes in adult behavior as well as child behavior. While this may represent a less conservative estimate of change in child behaviors, changes in interaction with familiar partners may also be particularly meaningful, as these familiar social partners are key players in a child’s daily social interactions.
In contrast to the relatively large effect size for social engagement, the composite effect sizes for joint attention (g = 0.14) and play (g = 0.23) were small in magnitude, and the effect size for joint attention was only marginally significant when more distal measures were selected for analysis. Unlike social engagement, many studies assessing joint attention and play use standardized assessments with unfamiliar examiners, rather than assessing during play with a familiar social partner. These assessments of skill use in more distal contexts pull for a generalization of treatment effects across contexts and people. In addition, both require spontaneous and relatively unprompted use of skills. Responses to specific, familiar prompts (such as those used on most standardized language or cognitive measures) may tap an area of relative strength for young children with ASD, who tend to perform better as a group when tasks are concrete (Qian & Lipkin, 2011). Furthermore, the requirement that intention be purely social for joint attention outcomes likely presents an even more challenging task for this population. In sum, initiating joint attention and spontaneous play are advanced skills for young children with ASD, and even small movement on these outcomes may be clinically meaningful. Joint attention was the only outcome for which a greater number of intervention hours was significantly associated with greater effect size, which may provide evidence for the validity of this measure, even if effects are small. Regarding play, the analyzed measures primarily examine functional play skills. Future studies should examine effects on other play outcomes (e.g. symbolic play, play diversity). JASPER studies predominate in the domains of engagement, play, and joint attention. However, estimates obtained from other intervention models in these domains appear similar to those contributed by JASPER, tentatively suggesting that NDBI models as a class may improve skills in these domains.
Future studies should examine whether NDBI models other than JASPER (e.g. RIT, ImPACT, ESDM) have an impact on these outcomes.
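The moderating role of intervention hours noted above for joint attention can be illustrated with a simple meta-regression. The Python sketch below fits an inverse-variance weighted regression of effect size on a single moderator. It rests on several assumptions that are not taken from this article: a fixed-effect model, hours entered on their raw scale, and entirely hypothetical input values. The function name `weighted_meta_regression` is likewise illustrative.

```python
def weighted_meta_regression(effects, variances, moderator):
    """Inverse-variance weighted fit of effect size on one moderator.

    Returns (intercept, slope, standard error of the slope) under a
    fixed-effect model in which sampling variances are treated as known.
    """
    if not (len(effects) == len(variances) == len(moderator)) or len(effects) < 3:
        raise ValueError("need three or more studies with matching inputs")
    w = [1.0 / v for v in variances]
    sw = sum(w)
    # Weighted means of the moderator and the effect sizes.
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    if sxx == 0:
        raise ValueError("moderator has no variation across studies")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, moderator, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = (1.0 / sxx) ** 0.5
    return intercept, slope, se_slope


# Hypothetical studies: effect size, sampling variance, professional hours.
g = [0.05, 0.10, 0.20, 0.45]
var = [0.05, 0.04, 0.06, 0.08]
hours = [10.0, 14.0, 40.0, 150.0]
b0, b1, se = weighted_meta_regression(g, var, hours)
print(f"slope per hour = {b1:.4f} (SE = {se:.4f}), z = {b1 / se:.2f}")
```

A slope whose ratio to its standard error exceeds the conventional critical value would indicate that effect size rises with hours, which is the pattern reported here for joint attention only.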
Limitations and future directions
The present study attempted to tightly operationalize a definition of what features constitute an NDBI. Other experts in the field may define this class of interventions differently. In addition, a conservative approach was taken to selection of outcome measures, with a preference for more distal, generalized, and standardized measures when available. While subgroup meta-analyses did not indicate systematic differences based on which reported outcome was selected for analysis, results generally trended toward larger effects for more proximal outcomes. Thus, estimates may be accordingly conservative. Similarly, use of post-intervention effect sizes for analyses unless clear group differences existed at baseline (using a relatively stringent definition; p < 0.05) was also relatively conservative. This field will benefit from additional review efforts using varied methodologies to identify convergent findings. In particular, additional moderators that may yield meaningful results in the future include baseline child characteristics such as age, symptom severity, and cognitive development. In addition, future meta-analyses may wish to examine the role of specific active ingredients to identify which constituent pieces are most efficacious, as well as examining studies comparing NDBI to other active, evidence-based interventions (e.g. EIBI, PECS). Finally, as is made clear by the small number of studies included for many outcomes, additional empirical evidence is an area of particular need.
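For readers less familiar with the effect size metric, the following Python sketch shows one standard way to compute Hedges' g and its sampling variance from post-intervention group means, standard deviations, and sample sizes. It uses the common small-sample correction J = 1 − 3/(4df − 1). The function name `hedges_g` and the example values are hypothetical, and the exact computational formulas used in the present analyses may differ in detail.

```python
import math


def hedges_g(mean_tx, sd_tx, n_tx, mean_ctl, sd_ctl, n_ctl):
    """Return (g, variance of g) for two independent groups at post-test."""
    if n_tx < 2 or n_ctl < 2:
        raise ValueError("each group needs at least two participants")
    df = n_tx + n_ctl - 2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(
        ((n_tx - 1) * sd_tx ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df
    )
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero")
    d = (mean_tx - mean_ctl) / pooled_sd
    # Small-sample correction factor applied to Cohen's d.
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    var_d = (n_tx + n_ctl) / (n_tx * n_ctl) + d ** 2 / (2.0 * (n_tx + n_ctl))
    return j * d, j ** 2 * var_d


# Hypothetical post-intervention scores for a treatment and a control group.
g, var_g = hedges_g(mean_tx=52.0, sd_tx=10.0, n_tx=24,
                    mean_ctl=48.0, sd_ctl=11.0, n_ctl=25)
print(f"g = {g:.2f}, SE = {math.sqrt(var_g):.2f}")
```

Because this estimate ignores baseline scores, it attributes any post-intervention difference to the intervention only when groups were comparable at baseline, which is why baseline differences were checked before post-intervention effect sizes were used.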
Conclusion
The present study used meta-analytic techniques to quantitatively examine the theoretical framework outlined in Schreibman et al.’s (2015) formative paper, which identified NDBI as a distinct and promising class of intervention for young children with ASD. This literature base is composed largely of studies which incorporate community-based elements such as stakeholder-mediated models, which take place in homes and schools. Long term, this likely facilitates feasibility and sustainability for communities. Tempered by evidence of publication bias, small but positive effects emerged for communication measures. Larger effects, less clearly impacted by publication bias, were identified for social engagement and cognitive development. Small but significant effects emerged for reduction in ASD symptoms and improvement in functional play, although interpretation of results regarding ASD symptoms was complicated by significant unexplained heterogeneity across studies. Effects were marginal for joint attention, but significantly moderated by hours of professional involvement in intervention. Clear evidence of improvement was not apparent for adaptive behavior. For joint attention and joint engagement outcomes, some evidence suggested that larger changes were evident on more proximal measures, as compared to more distal outcomes (e.g. standardized measures, interactions with less familiar play partners). This meta-analysis extends the growing evidence base for NDBIs, particularly in the key areas of social engagement and cognition.
Supplemental Material
Supplemental material, AUT836371_Lay_Abstract for Meta-analysis of naturalistic developmental behavioral interventions for young children with autism spectrum disorder by Gabrielle Tiede and Katherine M. Walton in Autism
Supplemental material, AUT836371_Supplemental_material for Meta-analysis of naturalistic developmental behavioral interventions for young children with autism spectrum disorder by Gabrielle Tiede and Katherine M. Walton in Autism
Footnotes
Appendix 1
Included Studies.
| Study | Model | Setting | Implementer | Professional hours | Quality |
|---|---|---|---|---|---|
| Colombi et al. (2018) | ESDM | Community | Educators | 156 | Weak |
| Dawson et al. (2010) | ESDM | Community | Research staff | 1581 | Adequate |
| Rogers et al. (2012) | ESDM | Lab-based | Caregivers | 12 | Strong |
| Vivanti et al. (2014) | ESDM | Community | Educators | 792 | Strong |
| Xu et al. (2018) | ESDM | Community | Caregivers | 40 | Strong |
| Zhou et al. (2018) | ESDM | Lab-based | Caregivers | 39 | Adequate |
| Wetherby and Woods (2006) | ESI | Community | Caregivers | 113 | Weak |
| Oosterling et al. (2010) | Focus | Community | Caregivers | 34 | Adequate |
| Van der Paelt, Warreyn, and Roeyers (2016) | JA/Imitation | Community | Educators | 16.8 | Strong |
| Warreyn and Roeyers (2014) | JA/Imitation | Community | Educators | 8 | Adequate |
| Goods, Ishijima, Chang, and Kasari (2013) | JASPER | Community | Research staff | 12 | Weak |
| Chang, Shire, Shih, Gelfand, and Kasari (2016) | JASPER | Community | Educators | 16 | Strong |
| Chiang, Chu, and Lee (2016) | JASPER | Not reported | Caregivers | 20 | Weak |
| Kaale, Fagerland, Martinsen, and Smith (2014) | JASPER | Community | Educators | 14 | Strong |
| Kaale, Smith, and Sponheim (2012) | JASPER | Community | Educators | 14 | Strong |
| Kasari, Freeman, and Paparella (2006) | JASPER | Community | Research staff | 13.75 | Strong |
| Kasari, Gulsrud, Paparella, Hellemann, and Berry (2015) | JASPER | Not reported | Caregivers | 10 | Strong |
| Kasari, Gulsrud, Wong, Kwon, and Locke (2010) | JASPER | Not reported | Caregivers | 12 | Strong |
| Kasari et al. (2014) | JASPER | Community | Caregivers | 21.6 | Strong |
| Kasari, Paparella, Freeman, and Jahromi (2008) | JASPER | Community | Research staff | 13.75 | Strong |
| Lawton and Kasari (2012) | JASPER | Community | Educators | 6 | Weak |
| Shire et al. (2017) | JASPER | Community | Educators | 81 | Strong |
| Wong (2013) | JASPER | Community | Educators | 9 | Weak |
| Boyd et al. (2014) | LEAP | Community | Educators | 585 | Strong |
| Strain and Bovey (2011) | LEAP | Community | Educators | 184 | Strong |
| Drew et al. (2002) | Parent training | Community | Caregivers | 26 | Adequate |
| Duifhuis et al. (2017) | PRT | Lab-based | Caregivers | 15 | Strong |
| Hardan et al. (2015) | PRT | Lab-based | Caregivers | 16 | Adequate |
| Ingersoll (2012) | RIT | Lab-based | Research staff | 30 | Adequate |
ESDM: Early Start Denver Model; ESI: Early Social Interaction Project; JA/Imitation: intervention emphasizing joint attention and imitation skill-building; JASPER: Joint Attention, Symbolic Play, Engagement, and Regulation; LEAP: Learning Experiences Alternative Program; PRT: Pivotal Response Training; RIT: Reciprocal Imitation Training.
Complete list of included studies and the study features examined in this meta-analysis. For a list of intervention models, see Appendix 2. Setting is defined as where intervention sessions took place. Implementer is defined as the primary interventionist: caregivers (typically parents), educators (teachers, preschool staff, or community-based clinicians), or research staff (individuals employed by or associated with the research team). Professional hours is defined as the number of hours of contact between research staff and participants. Study quality was assessed using The Evaluative Method for Determining Evidence-Based Practice in Autism (Reichow, Volkmar, & Cicchetti, 2008).
Appendix 2
Acknowledgements
Some portions of these results were previously presented at the conference of the American Association on Intellectual and Developmental Disabilities (2016) and the American Psychological Association Convention (2018). We would like to thank Luc Lecavalier, Theodore Beauchaine, and Andrea Witwer for their comments on a previous draft of this manuscript.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
