Abstract
Early intensive behavioral intervention (EIBI) is widely applied in young children with autism spectrum disorder. Little research has addressed the significance of adherence to EIBI practices for treatment outcomes. The York Measure of Quality of Intensive Behavioral Intervention (YMQI) was designed to assess EIBI quality delivery in Ontario, Canada. The objective of this study was to examine the cross-cultural validity of the YMQI in a clinical Swedish community sample of 30 boys and four girls with autism aged 2.5 to 6 years. Internal consistency was alpha = .87 for the full scale YMQI. Interrater reliability among three raters on 97 video-recorded therapy sequences was .71 (intraclass correlation coefficient [ICC]), and intrarater reliability of two raters re-scoring 15 sequences after 6 months was ICC = .87. The convergent validity of the YMQI with EIBI expert ratings was r = .49. Findings endorse the psychometric properties of the YMQI and its usability outside of Anglo-Saxon countries.
Autism spectrum disorder (ASD) is an early onset neurodevelopmental condition characterized by impairments in social communication and interaction, alongside excesses in repetitive and restricted behaviors and circumscribed interests (American Psychiatric Association [APA], 2013). More than 1% of the childhood population fulfills criteria for ASD (Baird et al., 2006; Idring et al., 2012), a mostly chronically disabling condition, without known cure (Bölte, 2014). Available interventions focus on increasing quality of life and functional skills, as well as reducing problem behaviors and preventing worse outcomes, with currently varying evidence base (Oono, Honey, & McConachie, 2013; Reichow, Barton, & Hume, 2012).
Early Intensive Behavioral Intervention (EIBI) is a widely applied teaching program based on applied behavior analysis (ABA) that has shown some evidence in the treatment of ASD in controlled trials (Eldevik et al., 2009; Klintwall, Eldevik, & Eikeseth, 2015; Peters-Scheffer, Didden, Korzilius, & Sturmey, 2011; Reichow et al., 2012; Virués-Ortega, 2010). EIBI is typically provided to children with ASD aged 4 years or younger, with a duration of 2 to 3 years. To reach optimal outcomes using EIBI, intensive training of 20 to 40 hr per week is usually recommended (Klintwall & Eikeseth, 2013; Klintwall et al., 2015). Treatment effects of EIBI are highly variable, with some children reported to reach average ranges of functional skills following training, whereas others show little to no progress (Howlin, Magiati, & Charman, 2009). Factors that are likely to affect outcome involve characteristics of the child (Eldevik, Hastings, Jahr, & Hughes, 2012; Flanagan, Perry, & Freeman, 2012; Perry, Blacklock, & Dunn Geier, 2013; Perry et al., 2011), family variables (Osborne, McHugh, Saunders, & Reed, 2008; Shine & Perry, 2010), and treatment components (Eldevik et al., 2012; Kasari, 2002; Klintwall et al., 2015; Penn, Prichard, & Perry, 2007; Virués-Ortega, 2010; Wolery & Garfinkle, 2002).
Among the child characteristics, the child’s IQ at baseline has the most robust empirical support in the literature for predicting outcomes, with more cognitively able children benefiting more from training than those in the low IQ range (Eldevik et al., 2012; Perry et al., 2013; Rogers & Vismara, 2008). In addition, although the evidence is less clear, low adaptive levels and high severity of symptoms predict less desirable EIBI outcomes (Perry et al., 2011). Another child factor that has been shown to be important is age at treatment onset, with earlier intervention correlating with better outcomes (Eldevik et al., 2012; Flanagan et al., 2012; Perry et al., 2013). A family factor that has been found to have a negative impact on EIBI treatment outcome is parental stress (Osborne et al., 2008; Shine & Perry, 2010). Stress is an important aspect to address, as parents are often extensively involved in delivering the treatment. Still, most predictive findings are as yet too inconsistent to derive personalized treatment decisions about the utility of EIBI for individual children (Eldevik et al., 2012; Kasari, 2002; Klintwall et al., 2015; Penn et al., 2007; Virués-Ortega, 2010; Wolery & Garfinkle, 2002).
Although both child and family factors are important in understanding the variable outcomes of EIBI, characteristics pertaining to the treatment itself and its delivery are at least equally significant to identify and are potentially easier to optimize. EIBI is a comprehensive treatment package that is diverse in techniques, content, and intensity. It has not yet been determined which of its elements are more or less effective in young children with ASD (Kasari, 2002). For instance, several studies have demonstrated a positive relationship between the intensity of EIBI (hours of training/week) and outcomes (Klintwall et al., 2015; Virués-Ortega, 2010; Virués-Ortega & Rodríguez, 2012). However, it is not always straightforward to control for or estimate the “true” intensity of training time spent on the child (Eldevik et al., 2012). Regarding EIBI content, there are currently few systematic studies comparing different elements of treatment and their relation to favorable outcomes in ASD (Kasari, 2002). Smith, Groen, and Wynn (2000) reported that training of early acquisition of verbal imitation and labeling was linked to better outcomes, pointing to the importance of language teaching formats. Generally, EIBI programs are recommended to be based on a progressive teaching curriculum, starting with the conveyance of early developing and basic skills and gradually moving to more complex and independent skills (Eikeseth, 2010). EIBI curricula are tailored to the individual child and are based on pretreatment assessment of deficits and excesses in relation to typical development. Several manuals have been published describing EIBI curricula relevant to ASD (Lovaas, 2002; Maurice, Green, & Luce, 1996).
Different training approaches such as discrete trial teaching, incidental teaching, and natural environment teaching are combined in EIBI to maximize the child’s learning opportunities (Eikeseth, 2010; Eldevik et al., 2009). A 1:1 trainer to child ratio has been used in early EIBI programs that use gradual transitions to teaching in larger groups and different environments (Eldevik et al., 2009). All approaches systematically use positive reinforcement as the pivotal technique to achieve behavior change and skill acquisition. Moreover, prompts and systematic fading of prompts are used to assist the child in understanding and mastering a task without having to experience failure (Lovaas, 2002; Maurice et al., 1996).
Several authors have expressed concern that EIBI is relatively often conducted by insufficiently knowledgeable and skilled trainers, who may also have ambiguous attitudes toward EIBI, which jeopardizes treatment integrity and accuracy (Klintwall, Gillberg, Bölte, & Fernell, 2011; McLeod, 2009). For instance, in the Scandinavian countries, EIBI is predominantly conducted by regular preschool staff, after a brief theoretical and practical introduction workshop led by behavior analysts. Recent studies have shown that knowledge of autism and EIBI is generally low and attitudes toward EIBI are rather negative among preschool and primary school teachers in Sweden (Långh, Hammar, Klintwall, & Bölte, 2016; Zakirova Engstrand & Roll-Pettersson, 2014). Therefore, when implementing an intervention as comprehensive and complex as EIBI in community settings, it is of paramount importance to examine adherence and the quality of delivered services, as well as to establish integrated and continuous EIBI control mechanisms (Denne, Thomas, Hastings, & Hughes, 2015; Wolery & Garfinkle, 2002).
Consequently, to improve EIBI delivery quality, that is, the accuracy of performing procedures and techniques according to best practice, regular supervision is considered mandatory, and recommendations have been published that describe how supervision should be designed to shape effective treatment (Bibby, Eikeseth, Martin, Mudford, & Reeves, 2002; Davis, Smith, & Donahoe, 2002; Eikeseth, 2010; Eikeseth, Hayward, Gale, Gitlesen, & Eldevik, 2009; Gibson, Grey, & Hastings, 2009; Grey, Honan, McClean, & Daly, 2005; Hastings & Symes, 2002; Jahr, 1998; LeBlanc & Luiselli, 2016; Symes, Remington, Brown, & Hastings, 2006). To protect consumers of behavior analysis services worldwide, the mission of the Behavior Analyst Certification Board (BACB; https://bacb.com/) is to officiate as a standard setting organization by systematically establishing, promoting, and disseminating professional standards. Therefore, the BACB has developed credentials, such as Board Certified Behavior Analyst (BCBA), Board Certified Assistant Behavior Analyst (BCaBA), and Registered Behavior Technician (RBT), in an effort to ensure a standard of quality of behavior analytic intervention. Despite a growing number of BCBAs internationally, BCBAs are still scarce in Scandinavia. Some centers in the United States and Canada require that staff have BACB credentials, but these thresholds are still out of reach for many countries, including Sweden. Thus, currently, EIBI quality management cannot be based on BACB guidelines, not least because the services are mostly municipality based (education services), with regulations and directions being different from those of counties (health care).
The significance of the quality of delivered EIBI has not been thoroughly studied or taken into account in many EIBI treatment studies (Penn et al., 2007), including two from Swedish community settings (Barnevik Olsson et al., 2016; Fernell et al., 2011). This is unfortunate, as conclusions about the usefulness and efficacy of the method itself are severely limited without control of the qualification and allegiance of trainers and treatment integrity.
One reason for the lack of research on the effect of delivered EIBI quality on treatment outcome may be that, until recently, no evaluated standardized tools were available to assess EIBI quality and facilitate overall quality management (Denne et al., 2015; Penn et al., 2007). The York Measure of Quality of Intensive Behavioral Intervention (YMQI) is relatively new and currently the only published and evaluated instrument designed to assess delivered EIBI quality regarding a broad range of treatment elements and targets (Penn et al., 2007; Perry, Flanagan, & Prichard, 2008; Whiteford, Blacklock, & Perry, 2012). Although the YMQI is a promising instrument for EIBI quality control and suits local requirements, it has not yet been comprehensively evaluated by an independent research group or in settings outside of Canada and the United Kingdom. Thus, the objective of this study was to evaluate the cross-cultural validity of the YMQI in a community setting in Sweden, also investigating additional psychometric properties.
Method
Instruments
The YMQI is a building block of the broader York System of Quality Assurance (YSQA), which was established on the basis of clinical experience, research, and expert opinion in the field of EIBI (Perry et al., 2008; Perry, Prichard, & Penn, 2006). The YSQA includes documents concerning programming, level of staff education, amount of supervision, and cooperation with parents. In this study, the YMQI was evaluated as a stand-alone scale, not combined with any other YSQA component. Possible combinations of the YMQI with other YSQA methods and detailed YMQI item descriptions are specified in its manual. The scale consists of 31 items and assesses different aspects of EIBI delivery quality grouped in nine categories (Perry et al., 2008) (see Table 1). The YMQI is rated using two 5-min sequences of videotaped trainer performance during regular EIBI sessions. Items are scored on a 5-point (1, 1.5, 2, 2.5, 3) Likert-type scale with “1” representing poor quality and “3” indicating excellent quality. On several items, it is also possible to code “not applicable” (N/A), for example, if the child does not exhibit a specified behavior problem or if no prompts are used. An EIBI summary score is calculated as the average of the summed ratings of each of the 2 × 5 min sequences (N/A not included). A summary score of less than 2.1 is considered to reflect poor quality, a score of 2.1 to 2.5 good quality, and a score of 2.5 to 3 excellent quality. Aside from the nine categories used to group the items, the final version also includes four additional derived subdomains (Organization, Pacing, Teaching level, Generalization; see Table 2). Here, only 15 of the 31 items are used to summarize trainer performance, and the quality criteria for which values are viewed as poor, good, and excellent differ across these subdomains (Perry et al., 2008).
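The scoring rule described above can be illustrated with a minimal sketch. It is not taken from the YMQI manual and rests on several assumptions: N/A is represented by the hypothetical marker `None`, the summary score is read as the mean of the two per-sequence means of applicable items, and a score of exactly 2.5 is assigned to the "excellent" band, because the published cut-offs overlap at that value.

```python
from statistics import mean

NA = None  # hypothetical marker for a "not applicable" rating


def sequence_mean(ratings):
    """Mean of the applicable (non-N/A) item ratings in one 5-min sequence."""
    scored = [r for r in ratings if r is not NA]
    if not scored:
        raise ValueError("sequence contains no applicable ratings")
    return mean(scored)


def summary_score(sequence_a, sequence_b):
    """Average of the two per-sequence means (assumed reading of the rule)."""
    return mean([sequence_mean(sequence_a), sequence_mean(sequence_b)])


def quality_band(score):
    """Map a summary score to the bands poor / good / excellent."""
    if score < 2.1:
        return "poor"
    if score < 2.5:  # assumption: exactly 2.5 is treated as excellent
        return "good"
    return "excellent"


if __name__ == "__main__":
    # Illustrative ratings only, not study data.
    first = [2, 2.5, NA, 3]
    second = [1.5, 2, 2.5, NA]
    score = summary_score(first, second)
    print(score, quality_band(score))
```

The subdomain scores use different cut-offs (Perry et al., 2008), which are not reproduced here.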
Table 1. The York Measure of Quality of Intensive Early Behavioral Intervention.
Note. SDS = discriminative stimuli.
Table 2. Subdomains of YMQI.
Note. YMQI = York Measure of Quality of Intensive Behavioral Intervention; SDS = discriminative stimuli.
Studies on the YMQI have demonstrated good psychometric properties (Denne et al., 2015; Penn et al., 2007; Perry et al., 2008; Whiteford et al., 2012). The final version of the YMQI (Perry et al., 2008) showed an internal consistency (Cronbach’s α) of rα = .82, and an interrater reliability (intraclass correlation coefficient [ICC]) of ricc = .68. In a study by Whiteford et al. (2012), internal consistency was rα = .77. Interrater reliability was reported as percent agreement across the 31 items, showing an average of 88.95%, with a range of 74% to 97%. Denne et al. (2015) used the YMQI in a study aiming to assess competencies in ABA and reported an average interobserver agreement on its items of 99% (range = 81%-100%).
The validity of the YMQI was examined in Perry et al. (2008) by correlating its scores with consensus ratings of trainer performance quality provided by dyads of EIBI experts reaching an overall convergent validity of r = .58. The expert scale consisted of nine quality factors, corresponding with the nine categories of the YMQI and was scored on a 5-point Likert-type scale (1, 1.5, 2, 2.5, 3) where “1” indicated little evidence of appropriateness (significant concerns), “2” generally appropriate (moderate concerns), and “3” consistently appropriate (no concerns). In addition to the nine quality areas, the experts also conducted an overall judgment of quality using a 7-point Likert-type scale (1 = extremely poor quality to 7 = exceptional quality). For the present study, the expert scale was translated to Swedish by a bilingual expert and approved by the original authors.
EIBI
In Sweden, a child with ASD has the right to receive free treatment from the so-called Habilitation Centers in their respective county. The type and comprehensiveness of the intervention are decided in cooperation with the parents, taking into account the child’s autism severity and developmental level as well as the parents’ and preschool’s ability to participate in the treatment. In the Swedish delivery model, EIBI requires involvement from both parents and the preschool, and a formal agreement of commitment is signed. As mentioned earlier, EIBI is conducted at preschools by regular staff or nursery assistants, who are supervised by personnel from the local Habilitation Centers. Usually the intervention starts with a basic and brief EIBI workshop for designated preschool trainers conducted by EIBI experts. Thereafter, training is continued using case supervision on treated children during meetings of EIBI experts and preschool teachers that take place every week or every second week for a period of approximately 2 years. Programming and supervision are based on national guidelines and recommendations (Föreningen Sveriges Habiliteringschefer, 2004), including learning objectives along with a description of different teaching styles and formats, such as discrete trial teaching, as well as naturalistic approaches.
EIBI Trainers and Supervisors
EIBI trainers were 34 preschool employees (32 females, two males) with different professional backgrounds recruited via local kindergartens that apply integrative approaches for special needs children. Trainers were preschool nurses (n = 19), preschool teachers (n = 10), and nursery assistants (n = 5). They were on average 38.9 years old (range = 20-62, SD = 11.9). Their professional experience varied between 1 and 37 years (M = 11.7, SD = 10.8), and their experience conducting EIBI between 2 weeks and 48 months (M = 8.9 months, SD = 13.1). The preschool staff was supervised by autism experts from local Habilitation Centers with different professional backgrounds (psychologists, speech and language pathologists, teachers in special education) and experience. To ensure high quality of supervision, the Habilitation Centers use a model of internal consultation, which means that a senior colleague with extensive EIBI experience accompanies more junior Habilitation colleagues when supervising preschool staff, and is also available for consultation at other times. Three of the four internal consultants were BCBAs.
Autism Sample
Thirty-four children with a clinical consensus diagnosis of ASD, assessed according to Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.; DSM-IV-TR; APA, 2000) or the International Classification of Diseases, 10th revision (ICD-10) criteria and following regional clinical guidelines for ASD assessment and treatment (Stockholms läns landsting, 2010; Stockholms läns landsting, Barn- och ungdomspsykiatri, 2015), were recruited during ongoing EIBI from the Autism Center for Young Children in Stockholm, Sweden. Participating children were 30 boys and four girls, aged 2.5 to 6 years.
Procedure
The study was conducted in a natural clinical setting, informed consent was obtained from both parents and preschool trainers, and the project was approved by the regional ethics review board in Stockholm, Sweden. Video clips of 20 min length were recorded at the child’s preschool by psychology students, who were otherwise excluded from the planning of the recording and selection of the training sequences, and were instructed not to interfere with or influence the training sessions. Children were videotaped 3 times at 2- to 3-month intervals during 2013 and 2014. Four children could not be recorded 3 times, resulting in 97 video clips in total. Two 5-min sequences (total of 10 min) were randomly selected from each video clip following the instructions in the YMQI manual and then rated for EIBI delivery quality using the YMQI.
YMQI Raters and EIBI Expert Raters
To examine the YMQI’s own technical properties for assessing EIBI quality, rather than a priori rater skills such as EIBI knowledge, three independent psychology master’s students (one female, two males) not involved in the ongoing EIBI delivery, and with little to no experience of EIBI, were recruited as YMQI raters. They were trained on scoring the YMQI using the instrument’s self-guided instruction DVD and had to reach at least 80% agreement with a given gold standard on three sample videos, as well as pass a theoretical test of 25 questions on basic ABA knowledge (achieving 80% correct answers, 20 of 25) before participating in the study. The original English scoring sheets, manual, and training material were used without translation into Swedish. The YMQI student raters were fluent in English and passed the basic YMQI training without difficulties.
Ten behavior modification experts completed the EIBI expert scale to evaluate convergent validity with the YMQI. This expert group (eight females, two males) consisted of behavior therapists (three psychologists, three teachers in special education, three speech and language pathologists, and one occupational therapist) with 10 to 15 years of experience in the field of EIBI.
Data Analysis
The statistical program R 3.2.3 was used to calculate the YMQI’s psychometric properties according to the principles of classical test theory. Item analysis comprised item difficulty (Ip) and item-total correlations (Ii − t, part-whole corrected). Ip values are provided in the form of mean YMQI item scores in the total sample and reflect how well the trainers, on average, performed on specific EIBI quality indicators. Ii − t values are calculated as Pearson product–moment correlations between YMQI item scores and the summary scores in the total sample, reflecting how well the performance on a specific EIBI quality indicator predicts the overall delivered EIBI quality. Reliability was determined using (a) internal consistency (rα) of the total scale and the four subdomains using Cronbach’s alpha; (b) interrater reliability (rrr) between two student raters on 97 EIBI training video sequences (one rater scored all 97 videos, the two other raters scored about half of the videos each), expressed as percentage agreement on item and subscale levels and as ICC for the YMQI summary score and the subscales; and (c) intrarater reliability (rtt) on 15 EIBI video sequences that were re-scored by two student raters after 6 months, expressed as percentage agreement and as ICC for the summary score and subdomains. In accordance with earlier studies on the YMQI (Penn et al., 2007; Perry et al., 2008; Whiteford et al., 2012), percentage agreement was measured with a tolerance of 0.5 points (i.e., a score of 2 is considered to agree with 1.5 and 2.5). The N/A ratings were included in the interrater and intrarater reliability analyses, but converted from the alphanumerical dummy variable “N/A” to the numerical dummy variable “0,” to enable agreement calculations between raters who might have coded the same situation as either N/A or with a score of 1 to 3.
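The tolerance-based agreement rule can be sketched as follows. This is an illustration in Python rather than the R code used in the study, and the function names are hypothetical. It assumes that two ratings agree when they differ by at most 0.5 after N/A has been recoded to 0.

```python
def to_numeric(rating):
    """Recode the string "N/A" to the numerical dummy value 0."""
    return 0.0 if rating == "N/A" else float(rating)


def percent_agreement(rater_a, rater_b, tolerance=0.5):
    """Percentage of paired ratings that differ by no more than `tolerance`."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of ratings")
    hits = sum(
        abs(to_numeric(a) - to_numeric(b)) <= tolerance
        for a, b in zip(rater_a, rater_b)
    )
    return 100.0 * hits / len(rater_a)


if __name__ == "__main__":
    # Illustrative ratings only, not study data.
    rater_1 = [2, 2.5, "N/A", 1, 3]
    rater_2 = [2.5, 1.5, "N/A", "N/A", 3]
    print(percent_agreement(rater_1, rater_2))
```

Because the lowest substantive score is 1, recoding N/A to 0 means that an N/A never agrees with a substantive score under a 0.5 tolerance, whereas two N/A codes always agree.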
The validity of the YMQI was examined using the concept of convergent validity in a sample of 30 videos. YMQI ratings by student raters (mean of two raters) were correlated with the scores of EIBI expert consensus ratings (consensus of two) using the general EIBI quality scale on the same EIBI videos. Expert ratings on EIBI quality were correlated with the YMQI summary score, the four domain scores, and the nine YMQI category scores using Pearson’s coefficient.
Results
Item Analysis
YMQI item difficulties, reflecting average trainer performance on single YMQI items, and item-totals, reflecting the predictive power of single YMQI items for the total performance on the YMQI, are summarized in Table 3. Difficulties were generally in a similar range (Ip = 2.04-2.67), reflecting average to low average difficulties. The most difficult item was “evidence of skill acquisition” (Item 20); the item with the lowest difficulty was “rapid reinforce delivery” (Item 3). Item-total correlations varied widely between Ii − t = −.13 and .74. Nine items had low item-totals (Ii − t ≤ .30), one of which (Item 6, “relation of reinforcers to the task”) had a negative value. Eight items reached high item-totals (Ii − t ≥ .60), with Item 29 “result of problem behavior” showing the best prediction of the total quality of EIBI delivery.
Table 3. YMQI Item Difficulties (Ip), Item-Totals (Ii − t), and Item Interrater and Intrarater Reliabilities.
Note. YMQI = York Measure of Quality of Intensive Behavioral Intervention; SDS = Discriminative Stimuli.
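The two item statistics reported in Table 3 can be illustrated with a short sketch. It is written in Python rather than the R used in the study, and it makes assumptions: the rating matrix is complete (no N/A codes), and the part-whole correction is implemented by correlating each item with the sum of the remaining items. The function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation; NaN if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def item_analysis(scores):
    """Return (difficulty, corrected item-total r) for every item.

    `scores` is a list of rows, one row per rated video, one column per item.
    """
    n_items = len(scores[0])
    results = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        # Part-whole correction: leave the item out of its own total.
        rest = [sum(row) - row[j] for row in scores]
        difficulty = sum(item) / len(item)
        results.append((difficulty, pearson(item, rest)))
    return results


if __name__ == "__main__":
    # Illustrative ratings only, not study data.
    ratings = [[1, 1.5, 1], [2, 2, 2], [3, 2.5, 3], [2, 2, 1.5]]
    for number, (ip, iit) in enumerate(item_analysis(ratings), start=1):
        print(f"Item {number}: Ip = {ip:.2f}, Ii-t = {iit:.2f}")
```

An item with almost no score variation yields an unstable or undefined correlation, which the sketch reports as NaN rather than a number.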
Reliability
The internal consistency of all YMQI items was rα = .87; for the subdomains, it was rα = .60 for “organization” (two items), rα = .80 for “pacing” (three items), rα = .53 for “teaching level” (two items), and rα = .48 for “generalization” (eight items). The interrater reliability for the YMQI summary score using ICC was .71. The ICCs for the four domains were .70 for organization, .70 for pacing, .44 for teaching, and .64 for generalization. Overall interrater reliability expressed as percentage agreement across items was 76%. Agreement on YMQI item level is presented in Table 3. Median item agreement was 78% (range = 56%-90%). On items where N/A is an option, agreement on whether these items could be scored was, on average, 83% (range = 62%-100%). Intrarater reliability for the YMQI summary score measured with ICC was .87. ICCs for the four domains were .75 for organization, .90 for pacing, .80 for teaching, and .89 for generalization. Intrarater reliability in terms of overall percentage agreement was 91%. Percentage agreement on item level is presented in Table 3.
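For readers who want to reproduce an internal consistency estimate of this kind, the standard formula for Cronbach’s alpha, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), can be sketched as below. The sketch is in Python rather than the R used in the study, assumes a complete rating matrix without N/A codes, and uses a hypothetical function name.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a rows-by-items matrix of ratings.

    `scores` is a list of rows, one row per rated video, one column per item.
    Sample variances (n - 1 denominator) are used throughout.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two rated videos")
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Illustrative ratings only, not study data.
    ratings = [[1, 1.5, 1], [2, 2, 2], [3, 2.5, 3], [2, 2, 1.5]]
    print(round(cronbach_alpha(ratings), 2))
```

With only two items, as in the “organization” and “teaching level” subdomains, alpha depends entirely on a single inter-item covariance, so low values for short subscales should be read with that in mind.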
Validity
The convergent correlation between the YMQI summary score and the expert rating scale total score was r = .49 (p = .006). Correlations between YMQI ratings and expert ratings of EIBI quality for the four subdomains were r = .27 (p = .15) for organization, r = .53 (p = .0025) for pacing, r = .52 (p = .003) for teaching level, and r = .73 (p < .0001) for generalization. For the nine YMQI categories (Penn et al., 2007; Perry et al., 2008), correlations between experts and raters are reported in Table 4. Organization showed the lowest convergence, r = .27 (p = .15), and Generalization the highest, r = .71 (p < .0001). The problem behavior category could not be estimated due to the large number of N/A observations.
Table 4. Convergent Validity of YMQI and Expert Ratings on Empirically Derived EIBI Quality Categories.
Note. YMQI = York Measure of Quality of Intensive Behavioral Intervention; EIBI = Early intensive behavioral intervention.
Discussion
Despite its apparent importance for intervention outcomes, the quality of delivered EIBI according to best practice has unfortunately been largely neglected in many studies on the efficacy and effectiveness of the method. Studies lacking EIBI quality monitoring must be viewed as having limited validity and generalizability. One significant aspect that might have previously hampered better control of EIBI quality was the shortage of available psychometrically sound standardized instruments for EIBI quality assessment. An exception is the YMQI, which has earlier demonstrated good reliability and validity (Denne et al., 2015; Perry et al., 2008; Whiteford et al., 2012). However, the YMQI has not been evaluated by an independent research group or comprehensively outside of Anglo-Saxon countries. The current study is therefore the first to examine the YMQI’s psychometrics independently and from a cross-cultural perspective.
In a naturalistic community setting in Scandinavia, overall, we found moderate to excellent indicators of reliability, as well as convergent validity, with EIBI expert ratings of delivered treatment quality. Item analyses showed mostly average to low average item difficulties in the examined trainer sample. This appears adequate for participants with fair EIBI expertise under expert supervision on a scale that is essentially criterion-oriented and implicitly assumes that all EIBI should be delivered with high quality; that is, one should be able to reach a high level on all items. Findings for item-total values were mostly acceptable to excellent, although a substantial minority of nine of 31 items showed low correlations with the overall quality of performance, and one item even correlated negatively with the summary score of quality. These low values indicate that the items are not measuring the same underlying construct measured by the majority of the other items included in the YMQI. The respective items mostly represent the quality of reinforcement and generalization. The internal consistency for the domain “generalization” was low, despite the relatively large number of items composing it. Our data suggest that the inclusion of these items, their administration, construction, or scoring should be reconsidered or further examined. However, measuring quality in EIBI is complex and probably in itself multidimensional; thus, one might not expect that a single construct should be measured with high fidelity in the YMQI. That is, the YMQI is rather a criterion-based scale, broadly inventorying and evaluating trainer skills, where the expectation is that ideally all trainers score maximally on all items. Compared with other psychological tools, the YMQI does not aim to differentiate individuals on a single quantitative construct. Therefore, some procedures of classical test theory might only offer incomplete information on the scale’s usefulness.
Consistently, some of the YMQI items showing low item-totals also showed very low raw score variation (i.e., there were no scores reflecting really high or really low quality). The latter is not atypical for rather complex criterion-oriented scales, limiting the potential of correlational analyses. The item “result of problem behavior” showed the highest correlation with the YMQI total score, which might reflect that adequate management of challenging behavior may be a good predictor of general trainer skills. Our detailed item analysis reveals novel psychometric data on the YMQI and its configuration of items that might be taken into account in case of a revision of the instrument. Moreover, our analyses on intrarater reliability provide new insights into the stability of YMQI ratings. Findings indicate good to excellent reliability for all items of the YMQI within raters over time.
Aside from our item analyses and intrarater reliability data, which entail characteristics of the YMQI that have not been previously investigated, the findings of the current study on internal consistency, interrater reliability, and convergent validity are mostly in line with earlier results (Penn et al., 2007; Perry et al., 2008; Whiteford et al., 2012), although interrater agreement for some items and overall validity were somewhat lower in our study. For example, Item 11 (“lack of prompting errors”) only yielded a percentage agreement of 56%. For rating this YMQI item, several aspects of prompting errors need to be considered, some of which are probably challenging in the face of limited experience of EIBI. Even in the study by Whiteford et al. (2012), Item 11 showed one of the lowest values for interrater agreement.
As we used rather naïve raters for the YMQI, one explanation for the moderately lower observer agreement might be differences in rater experience in using the instrument. There might also have been minor language and cultural differences in the interpretation of the manual, and cultural differences in the understanding of good EIBI practice, teaching, and supervision styles. We found that interrater reliability increased from the initial YMQI ratings to later ratings, indicating a practice effect. The DVD-based introduction course to the YMQI and the knowledge test may not be extensive enough to obtain high spontaneous reliability and validity. In the study by Whiteford et al. (2012), the raters received a booster session and a consensus discussion with a focus on items that were considered difficult to assess, while our coders did not receive any comparable extra education after initial training. However, the YMQI is constructed to be used by non-EIBI experts after having passed the DVD course. Thus, the current study might provide a more precise estimate of the YMQI’s pure or initial psychometric properties.
Some limitations of this study deserve to be addressed. First, the YMQI ratings in our study may have been influenced by the fact that the raters, for practical and economic reasons, had also collected the video clips. However, the raters were strictly excluded from any other procedures related to data collection (e.g., sequence selection), so the risk of related bias should be limited. Second, we used the English version of the manual, not a translated and back-translated version, which might have hampered an optimal administration of the instrument. Third, we only included three YMQI raters, whereas a more naturalistic sample of raters would include far more raters with different backgrounds and levels of experience.
As discussed previously by Denne et al. (2015), the YMQI focuses on the detailed competencies needed to perform EIBI in daily teaching, which might be deemed a narrow means of assessing quality. EIBI competence involves other aspects, such as cooperation with colleagues and parents, organizational skills, preference assessments, and collection of data on the child’s learning rate. The lack of correlation between the YMQI and other instruments in Denne et al. (2015) might imply that the YMQI measures a specified technical aspect of EIBI quality rather than its totality. Some items, such as “suitable task difficulty” and “teaching embedded in naturalistic activities,” may reflect the supervisor’s competence in programming rather than the trainer’s skills, indicating the importance of a broader quality assessment.
The objectives and teaching strategies chosen by the supervisors also affect the trainer’s probability of high scores on certain items. For example, “relation of reinforcers to the task” will reach higher quality scores if the trainer is working according to the practices of naturalistic interventions such as Natural Language Paradigm or Milieu Therapy. Conversely, the use of incidental procedures may lead to low scores on items assessing instructions, as the strategy here is to wait for the child’s initiation instead of creating teacher-initiated learning opportunities. Nowadays, incidental strategies are frequently used in EIBI programs; therefore, there is a risk that a trainer showing high quality in this technique might be underestimated due to the construction of the scale. A possibility to score N/A on the item “attending during SD,” along with an additional description of incidental teaching in the manual, would be helpful improvements for the YMQI. Furthermore, as our results show low agreement for some items regarding whether they could or should be scored (N/A vs. trainer performance score), additional clarifications in the manual about when to choose N/A would be beneficial.
There is a certain risk that the relatively limited video material (10 min) used to score the YMQI is not representative of longer extracts. A comparison to a YMQI evaluation of, for instance, 20 to 30 min of recorded trainer behavior would give useful information on this matter. Future research should also include descriptions of programs and guidelines used, as well as key material and data sheets. The broader YSQA involves live observations and assessments of other aspects not captured by the YMQI alone, and in quality assurance in clinical settings, it is advisable to complement the YMQI with these measures or other instruments like the one used in the study by Denne et al. (2015). Moreover, to be able to fully judge the validity of the YMQI, future research needs to address its prognostic validity using longitudinal designs, that is, to investigate to what extent the quality of EIBI is a predictor of treatment outcome. Such research should include detailed analyses of the prognostic power of single quality indicators (YMQI items) in relation to child characteristics (e.g., developmental level, adaptive skills, severity, and symptom profiles) that could inform trainer education and program development. Similarly, we suggest that the usefulness of the YMQI might be enhanced by using it not only for pure quality assessment but also for individual outcome improvement, as an education and supervision tool to improve trainer performance. As such, the video recordings offer valuable options to provide standardized and tailored feedback to trainers on how to improve intervention.
In conclusion, we found predominantly moderate to excellent psychometric properties for the YMQI for several parameters of reliability and for convergent validity with expert ratings of EIBI performance quality. Hence, this study endorses and enlarges the evidence base of the YMQI, especially related to its cross-cultural usability outside of Anglo-Saxon countries and its feasibility even for EIBI-inexperienced users. We also identified some possible areas of improvement of the YMQI concerning items to assess EIBI reinforcement and generalization. To the best of our knowledge, the YMQI is currently the only available standardized scale of EIBI quality measurement and control. As EIBI delivery quality has often been disregarded in research and community delivery settings, we hope the current study will further stimulate monitoring and assurance of EIBI quality in treatment trials and clinical practice of ASD.
Footnotes
Acknowledgements
We want to thank all children and trainers who participated in the present study and the leads of Habilitation and Health, Stockholm County Council, for the possibility to conduct the study in a routine clinical setting. We also sincerely thank Nina Milenkovic and Ragnar Nordqvist who acted as YMQI (York Measure of Quality of Intensive Behavioral Intervention) raters.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Sven Bölte discloses that he has in the last 3 years acted as an author, consultant, or lecturer for Shire, Medice, Roche, Eli Lilly, Prima Psychiatry, GL Group, System Analytic, Kompetento, Expo Medica, and Prophase, and receives royalties for text books and diagnostic tools from Huber/Hogrefe, Kohlhammer and Uni-Taschenbücher (UTB) publishers.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Ulrika Långh was supported by Frimurare Barnhuset, Stiftelsen Kempe-Carlgrenska fonden, Sällskapet Barnavård, Stiftelsen Claes Groschinkys Minnesfond, the Stockholm County Council, and Karolinska Institutet. Sven Bölte was supported by the Swedish Research Council.
