Abstract
More than a decade has passed since the National Research Council described the common elements of effective educational programs for young children with autism. Since that time, few studies have attempted to understand the mechanisms of change and factors affecting the effectiveness of research-supported interventions implemented in community settings. Using Dunst’s (2013) framework of implementation science, we examined the relationships between the fidelity of an implementation practice (i.e., a parent–teacher consultation called the Collaborative Model for Promoting Competence and Success; COMPASS), the fidelity of the intervention practice (i.e., teachers’ adherence to the intervention plans generated as a result of COMPASS), and child goal attainment outcomes using data from a randomized controlled trial. Results confirmed the predicted relationships between implementation fidelity, intervention practice fidelity, and child outcomes. Specifically, we replicated findings that two hypothesized mechanisms of change, individual education program (IEP) quality and teacher adherence, positively affected intervention practices directly and child outcomes indirectly.
More than 10 years ago, the U.S. Department of Special Education Programs asked the National Research Council (NRC; 2001) to summarize studies describing the science, theory, and policy for educating young children with autism. To accomplish this goal, the NRC invited a group of experts to form the Committee on Educational Interventions for Children with Autism, to address several issues including early intervention. The committee identified six critical features common across all effective programs: (a) early treatment that starts as soon as autism is suspected, (b) active engagement in instructional programming for a minimum of 25 hr a week, (c) planned and repeated instruction, (d) parent training, (e) low student–teacher ratios, and (f) ongoing assessment and progress monitoring. In addition to these fundamental characteristics, the committee also identified specific and prioritized domains of instruction that included the core features of autism—social and communication skills and engagement and flexibility in developmentally appropriate tasks and play, as well as other areas of fine and gross motor skills, cognitive skills, and behaviors that are the foundation to success in general education classrooms, such as independent organizational skills.
Since the publication of this report a decade ago, researchers have called for more studies documenting the mechanisms of change of effective interventions to help explain why particular approaches lead to positive child outcomes (Bernard-Opitz, Ing, & Kong, 2004; Kasari, 2002; Kasari et al., 2005; Schreibman, Dufek, & Cunningham, 2011; Wetherby & Woods, 2008). In this context, mechanisms of change refers to “the basis for the effect, i.e., the processes or events that are responsible for the change; the reasons why change occurred or how change came about” (Kazdin, 2007, p. 3). Although isolated studies have focused on specific interventions (cf. Yoder & Stone, 2006), information remains limited on active ingredients (Lerner, White, & McPartland, 2012), and even less has been learned about the implementation or adoption and uptake of the NRC recommendations within the policies of state departments of education or public school programs of students with autism.
Implementation Science
The omission of research on the adoption and uptake of best practices is consistent with the general failure to attend more carefully to the importance of implementation science (Odom, Cox, & Brock, 2013). Fixsen, Naoom, Blase, Friedman, and Wallace (2005) define implementation science as a “specified set of activities designed to put into practice an activity or program of known dimensions.” Dunst (2013) operationalizes an implementation framework (see Figure 1) that characterizes these “known dimensions” as three related, but distinct, components: the implementation practice, the intervention practice, and the associated outcomes. In this model, implementation practice refers to the methods used by implementation agents (e.g., program directors, consultants, coaches, trainers) to promote interventionists’ (e.g., teachers, therapists, parents) use of evidence-based intervention practices (e.g., applied behavior analysis) that should be associated with positive outcomes in targeted groups (e.g., children, students; Dunst & Trivette, 2012; Fixsen et al., 2005). A strength of the Dunst framework is the explicit differentiation of implementation and intervention components. Evaluation of the methods for transferring skills from the trainer (i.e., implementation practice) to the trainee (i.e., intervention practice) is especially appropriate within the educational environment, in which a trainer or consultant is often tasked with training teachers in a new intervention. Unfortunately, compared with what is known about available evidence-based practices, research on implementation practices is relatively rare.

Figure 2. Example of COMPASS Teaching Plan for a Social Goal
Although the lack of implementation science research is certainly not limited to interventions for children with autism (Fixsen et al., 2005), the potential negative consequences for not prioritizing such implementation research are pressing and serious. Recent estimates of the incidence of autism from the Centers for Disease Control and Prevention (2012) indicate that as many as 1 out of 88 children are affected, which translates into 1 of about 54 boys. Although the rates of autism are surging, the availability of high-quality and prepared teachers is not keeping pace. A study by Hess, Morrier, Heflin, and Ivey (2008) of 185 teachers revealed that fewer than 10% of the teaching methods used in the classroom had scientific support. Furthermore, parent-initiated litigation is growing among this group of students (Bitterman, Daley, Misra, Carlson, & Markowitz, 2008; Fogt, Miller, & Zirkel, 2003; Yell, Katsiyannis, Drasgow, & Herbst, 2003) and is disproportionately high when compared with other disabilities (Zirkel, 2011). At the same time, teacher burnout is high (Billingsley & McLeskey, 2004; Emery & Vandenberg, 2010), further exacerbating the national shortage of special educators (Boe, Cook, & Sunderland, 2008; McLeskey & Billingsley, 2008). Importantly, stressed teachers also are less effective (Ruble & McGrew, 2013). These are critical issues of public policy that point to the urgent need for focused attention on implementation science research in public schools.
Probably the best example to date describing the implementation process of a professional development intervention for teachers of students with autism was the Evidence-Based Individualized Program for Students With Autism (EBIPSA) provided by Odom and colleagues (Odom, Collet-Klingenberg, Rogers, & Hatton, 2010). Their model includes four stages: exploration, installation, initial implementation, and full implementation. A total of nine states were trained at the school/building/district levels in EBIPSA. Using descriptive evaluation data, the authors found significant changes in overall program quality (e.g., intervention fidelity) and program outcomes (e.g., student goals and parent satisfaction) using EBIPSA. With respect to intervention variables, there was a significant increase in both teachers’ use of evidence-based practices and fidelity over time. With respect to outcomes, teacher-rated goal attainment scaling of student progress at the end of the school year indicated student improvement for 98% of 420 goals assessed; of these, 79% were met or exceeded. For families, parent-reported satisfaction was high for the program. The data are quite promising and illustrate the potential importance of assessing both outcomes and intervention variables in helping understand the active ingredients/mechanisms of change for implementation models, such as the EBIPSA.
The Collaborative Model for Promoting Competence and Success (COMPASS)
The aim of the present study was to apply Dunst’s (2013) implementation science framework to examine the critical factors and mechanisms of change of an evidence-based consultation intervention called COMPASS (Ruble, Dalrymple, & McGrew, 2012), designed to improve the educational outcomes of children with autism. The COMPASS intervention uses a bottom-up approach that begins with the people who have the most frequent interactions with the child—parents and teachers. A critical feature is that it includes parents in the initial consultation for goal planning and setting and intervention development. The full COMPASS intervention consists of the initial consultation, which generates student educational goals and the associated intervention plans, followed by a series of four teacher coaching sessions spaced evenly throughout the school year (less than 10 hr total). It is classified as an implementation practice, because it intervenes indirectly rather than directly with the targeted “client,” the student, by attempting to improve the intervention practice (i.e., the teaching practices and skills of classroom teachers).
The COMPASS intervention has been shown to be effective in two randomized controlled trials (RCTs; Ruble, Dalrymple, & McGrew, 2010; Ruble, Dalrymple, & McGrew, 2012). In both trials, the design explicitly included an analysis of the implementation practice (what the consultant did with the teacher) and the intervention practice (what the teacher did with the child), as well as an objective measurement of the practice outcome by an independent observer unaware of group assignment (child goal attainment). Results of both trials indicated that COMPASS is effective in helping children attain their Individual Education Program (IEP) goals (Ruble, Dalrymple, & McGrew, 2010; Ruble, McGrew, Toland, Dalrymple, & Jung, 2013), as indicated by a very large effect size when applied face-to-face (d = 1.5 and 1.4, for RCTs 1 and 2, respectively), and a large effect size when coaching sessions were conducted using web-based videoconferencing (WEB) compared with a placebo-control condition (effect size, d = 1.1; Ruble et al., 2013). No statistically significant differences were found between the face-to-face (FF) and WEB conditions in student goal attainment, parent and teacher satisfaction, and coaching fidelity.
Consistent with the aforementioned NRC recommendations, COMPASS targets three core deficit areas associated with autism: social, communication, and independent learning skills (e.g., ability to start and complete a task independently). The goals reflect both parent and teacher priorities. Teaching plans are created for the three core areas, and personalized for each child, taking into account the child’s personal and environmental challenges and supports relevant to the targeted skill, which are then incorporated into the teaching plan. Research supported practices are integrated into the teaching plans by linking focused interventions (Odom et al., 2010) to the targeted skill. Figure 2 shows an example of a consultation report and teaching plan following the consultation. In this example, the child’s personal and environmental challenges and supports are discussed in relation to the goal of turn-taking. The teaching strategies are discussed and online resources of evidence-based practices are recommended. More details on the development of goals and teaching plans as well as case study examples are provided in a book-length manual on COMPASS (Ruble, Dalrymple, & McGrew, 2012).

Figure 1. Based on Dunst (2013) Framework for Implementation Science
After the team develops goals and teaching plans for each of the three targeted areas, teachers are asked to update the child’s IEP to reflect these specialized and personalized learning goals. Thus, COMPASS consultation serves as the platform for creating a comprehensive high-quality revision of the teaching plan, including measurable goals and clearly delineated evidence-based teaching strategies. In the first RCT (Ruble, Dalrymple, & McGrew, 2010), we hypothesized that improved IEP quality would be a key active ingredient of COMPASS consultation. To test this, we examined features of IEP quality expected to change as a result of receiving COMPASS (i.e., IEP targeted quality). As predicted, IEP quality was higher for the COMPASS group compared with the control group and was positively correlated with child goal attainment change. These findings suggest that IEP quality improves, as modified during COMPASS, and may be a critical active ingredient/mechanism of action of COMPASS effectiveness.
Another potential active ingredient of COMPASS consultation is adherence to the teaching plans, first as formulated originally following the initial consultation and later as revised throughout the year following the four additional coaching sessions. We measured teacher adherence to the teaching plans during each coaching session. Using data from the first RCT, preliminary evidence suggests that at least four coaching sessions are needed because teacher adherence in implementing the teaching plans improved over time and was associated with child goal attainment outcomes only for the last coaching session (Ruble, Dalrymple, & McGrew, 2010). Thus, there is preliminary evidence that intervention quality, as measured both by IEP quality and by teacher adherence (i.e., what the teacher does as a result of the implementation practice or COMPASS) affects outcomes. Moreover, for the present study, consistent with Dunst’s (2013) framework, we hypothesize further that COMPASS implementation quality (i.e., the quality of the initial consultation and subsequent coaching sessions as delivered by the consultant) should affect the intervention quality (i.e., IEP quality and teacher adherence, respectively).
The present study examines data from a second RCT in an attempt to replicate and extend results from Ruble, Dalrymple, and McGrew (2010) on the potential mechanisms of action of COMPASS using the implementation science framework as a guide. We had three hypotheses. First, we hypothesized that high-quality implementation practice, as indicated by consultant and coaching fidelity, would be positively associated with measures of intervention practice, that is, IEP quality and teacher adherence to the implementation of teaching plans for the four coaching sessions. The second hypothesis was that high-quality intervention practice would be positively associated with practice outcomes, that is, child goal attainment collected sequentially during each of the four coaching sessions and at the end of the school year by an independent observer. Specifically, as initially tested and shown in the original RCT, we expected that the active ingredients of the intervention practice (IEP quality and teacher adherence in implementing the intervention plans) would be associated with child goal attainment outcomes. Finally, also as initially tested and shown in the original RCT, we expected that features of IEP quality associated with the intervention and expected to change as a result of the intervention (i.e., IEP targeted quality) would be higher in the intervention groups compared with the placebo-control group (see Figure 1).
Method
Participant Characteristics
Teachers
A total of 44 special education teachers, one randomly selected student with autism from each teacher’s caseload (n = 44) and their parents (n = 44) participated. All but one of the teachers were female (n = 43). Fifty percent of the teachers had bachelor’s degrees and 43% had master’s degrees. Half of the teachers were from schools located in towns with fewer than 75,000 residents. Thirty percent of the teachers taught preschool, 18% taught in a resource room, and 48% taught in a special education classroom. The average class or caseload size was 12.4 (SD = 5.3). The majority of teachers reported having no formal coursework in autism (56%) or supervised field work (75%). Instead, most reported that their training came from other formal means (86%), informal training (80%), workshops or conferences (68%), reading (64%), informal consultation (61%), or the Internet (50%).
Children with autism
The mean age of the participating children was 6.0 years (SD = 1.6). Children were screened for autism with the Modified Checklist for Autism in Toddlers (M-CHAT; Robins, Fein, Barton, & Green, 2001), or the Social Communication Questionnaire (SCQ; Rutter, Bailey, & Lord, 2004), depending on age. All completed the Autism Diagnostic Observation Schedule (Lord et al., 2000). Eighty-four percent of students were male, 77% were White, 7% Black, 2% Asian, 7% other, and 7% were unidentified. Twenty-eight percent of participants had household incomes less than US$25,000; 36% were between US$25,000 and US$49,999; and 36% were above US$49,999 (eight parents did not report income).
About a third of the children (n = 16) were preschoolers taught in inclusive early childhood special education programs. Fourteen split their time between general education and special education classes, with 7 educated primarily in the general education classroom (5+ hr) and 7 educated primarily in special education classrooms. The remaining 11 were in a segregated resource room.
Sampling Procedure
Special education teachers who were responsible for the IEPs of students with autism between 3 and 8 years of age were recruited from 14 counties in two mid-southern states.
School administrators were approached first, followed by direct contact with teachers. If teachers had multiple students with autism, one student was randomly selected for recruitment. Teachers forwarded a letter asking the parent for permission to be contacted by the researchers. If the parent refused to participate, another child was randomly selected. A total of 44 parent–student dyads were randomized following the baseline evaluation into the following groups: 15 control, 14 face-to-face (FF), and 15 Internet-video-based (WEB) teacher–child participants. The 14 FF and 15 WEB participants comprised the experimental group. Both teachers and parents provided informed consent to participate.
Measures
Measures to establish sample equivalency
To ensure that random assignment worked properly, a combination of school, program, teacher, and child variables were compared (see Table 1). Demographic information about schools was collected. Teachers completed a questionnaire on each student’s program (e.g., time in general education, time spent 1:1 with special education teacher, and with teaching assistant) and their own experience and training in autism. For child comparisons, five commonly used reliable child measures were administered at Time 1 to verify equivalency: autism severity, language, cognition, adaptive behavior, and externalizing behavior. The Childhood Autism Rating Scale (CARS; Schopler, Reichler, DeVellis, & Daly, 1980) was used to determine severity of symptoms. Language was assessed with the Oral and Written Language Scales (OWLS; Carrow-Woolfolk, 1995). Cognitive level (IQ) was evaluated using the General Conceptual Ability (GCA) subscore of the Differential Abilities Scale (DAS; Elliott, 1990). Adaptive behavior was measured with the classroom edition of the Vineland Adaptive Behavior Scales (VABS; Sparrow, Cicchetti, & Balla, 2005). Externalizing behavior was measured using the Behavior Assessment System for Children (BASC-2; Reynolds & Kamphaus, 2004).
Table 1. Between-Group Analysis of Teacher and Child Variables by Group Assignment
Note. CARS = Childhood Autism Rating Scale; OWLS = Oral and Written Language Scales; DAS = Differential Abilities Scale; VABS = Vineland Adaptive Behavior Scales; BASC-2 = Behavior Assessment System for Children, 2nd ed. All tests are two-tailed.
Implementation fidelity
Two measures of implementation fidelity were administered. One focused on the initial consultation and the other on the coaching sessions.
Consultant adherence to the initial consultation
Consultant adherence to the initial consultation protocol (Consultation Fidelity) was assessed with a 25-item closed-ended (yes/no) checklist completed by teachers (α = .82).
Consultant adherence to coaching protocol
Consultant adherence to the coaching protocol (Coaching Fidelity) was assessed using a 7-item scale completed by the teachers. Teachers used a 4-point Likert-type scale ranging from 1 (not at all) to 4 (very much) to rate the degree to which the consultant followed the protocol with fidelity (α = .85).
Intervention fidelity
Two intervention fidelity measures were collected: independent ratings of IEP quality and consultant ratings of teacher adherence to the intervention plans obtained at each coaching session.
Individual Education Program (IEP) quality
Initial IEPs were collected prior to randomization as part of the Time 1 evaluation for all groups, and a second time, following the consultation for the two experimental groups, WEB and FF, because all teachers were asked to update the IEPs to include the objectives targeted by the COMPASS consultation. The IEP quality measure (Ruble, Dalrymple, McGrew, & Jung, 2010) was developed using standards from the Individuals With Disabilities Education Act (IDEA; 2004) and best practices from the NRC (2001). The measure assesses eight IDEA and nine NRC quality indicators. For the present study, six items assessing features of the IEP expected to change as the result of the COMPASS consultation were averaged to create the Targeted IEP Quality subscale and included three IDEA indicators (i.e., goal is measurable, criterion for skill accomplishment is provided in the goal description, conditions under which behavior is to occur is described) and three NRC quality indicators (i.e., IEP contains goals for social, communication, and learning skills). Items were rated using a 3-point Likert-type scale: 0 (no/not at all), 1 (somewhat), and 2 (yes/clearly evident). A copy of the entire measure is available in Ruble, Dalrymple, McGrew, and Jung (2010). IEP quality scores were computed at Time 1 and Time 2 with higher scores reflecting higher-quality IEPs. To help ensure objective ratings, the primary rater for IEP quality did not participate in the consultation or coaching sessions and was unaware of group assignment. Interrater agreement was calculated for 20% of the sample. The sample ICC for absolute agreement was .95.
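To illustrate how a Targeted IEP Quality score of the kind described above could be computed, the following minimal Python sketch averages six item ratings on the 0 to 2 scale. The item keys (e.g., goal_measurable) and the example ratings are hypothetical labels introduced here for illustration only; they are not the item names or data of the published measure.

```python
from statistics import mean
from typing import Mapping

# Hypothetical keys for the six targeted indicators (three IDEA, three NRC).
TARGETED_ITEMS = (
    "goal_measurable",        # IDEA: goal is measurable
    "criterion_stated",       # IDEA: criterion for skill accomplishment given
    "conditions_described",   # IDEA: conditions for the behavior described
    "social_goal",            # NRC: IEP contains a social skills goal
    "communication_goal",     # NRC: IEP contains a communication goal
    "learning_goal",          # NRC: IEP contains a learning skills goal
)
VALID_RATINGS = {0, 1, 2}     # 0 = no/not at all, 1 = somewhat, 2 = yes/clearly evident


def targeted_iep_quality(ratings: Mapping[str, int]) -> float:
    """Return the mean of the six targeted item ratings (range 0 to 2)."""
    for item in TARGETED_ITEMS:
        if item not in ratings:
            raise ValueError(f"missing rating for item: {item}")
        if ratings[item] not in VALID_RATINGS:
            raise ValueError(f"rating for {item} must be 0, 1, or 2")
    return mean(ratings[item] for item in TARGETED_ITEMS)


# Made-up ratings for one IEP, for illustration only.
example_ratings = {
    "goal_measurable": 2,
    "criterion_stated": 1,
    "conditions_described": 1,
    "social_goal": 2,
    "communication_goal": 2,
    "learning_goal": 0,
}
example_score = targeted_iep_quality(example_ratings)  # 8 / 6, about 1.33
```

Higher values indicate a higher-quality IEP on the targeted features; computing the same score at Time 1 and Time 2 gives the change of interest.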
Teacher adherence to intervention
Immediately following each coaching session, the consultants completed a 5-point Likert-type scale using one item (1 = not very, 5 = very much) to rate the degree to which the teacher followed the teaching plan recommendations. All sessions were audio-taped. Two raters independently rated 45% of the coaching sessions. Estimated percentage agreement, weighted kappa, and intraclass correlation (ICC) in the current sample were .79, .80, and .90, respectively.
Practice outcome
Psychometric Equivalence Tested Goal Attainment Scaling (PET-GAS; Ruble, McGrew, & Toland, 2012)
PET-GAS, an idiographic assessment system, was applied because each child had different goals, different baseline levels of skill associated with the goals, and different teaching plans. Several procedures were implemented to ensure reliable, valid, and group-equivalent outcome assessment using PET-GAS. Goal attainment scales were written using the following 5-point rating scale: −2 = child’s present levels of performance, −1 = progress, 0 = expected level of outcome, +1 = somewhat more than expected, +2 = much more than expected. Half-scores were allowed when raters observed skill level between two benchmarks. A score of zero represented improvement consistent with the actual description of the written IEP objective. To ensure goal equivalence, structured guidelines were utilized during the creation of the goals, and psychometric equivalence was tested following goal creation using three measures: level of difficulty (e.g., goals were selected that were expected to be attainable by most children but not easy), measurability (e.g., use of clear behavioral descriptions including specific wording concerning duration, frequency, and needed supports), and equivalence (an equivalence chart was created for percentage accuracy, frequency, number of prompts, and level of support needed in performing behaviors). Detailed descriptions are provided in Ruble, Dalrymple, and McGrew (2010) and in Ruble, Dalrymple, and McGrew (2012). When group non-equivalence was detected, first, goals were reformulated in an attempt to ensure equivalence, and second, if needed, covariate analyses were used to control for non-equivalence. The pre- and posttreatment GAS ratings used to determine COMPASS effectiveness were based on direct observations of teachers’ instructional activities by an observer unaware of group assignment.
For goals that represented generalization of skills across settings, the independent rater had to directly observe the skill as demonstrated at the time of the direct observation and confirm those observations with other records when needed (teacher data and teacher interview). As recommended, only raw scores were used in the analysis (MacKay, 1996; Schlosser, 2004). To assess interrater agreement, 39% of the teacher–student learning situations were videotaped as part of the final Time 2 evaluations and rated by two observers. Using raw scores, the sample single-measure ICC for interrater agreement of GAS scores was .82 for social, .86 for communication, and .91 for learning skills goals.
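The 5-point goal attainment scale with half-scores described above can be sketched in a few lines of Python. This is only an illustration of the scale's structure under the stated benchmarks; the function names are hypothetical and the sketch is not the authors' scoring software.

```python
import math

# Benchmarks of the 5-point goal attainment scale as given in the text.
GAS_BENCHMARKS = {
    -2: "present level of performance",
    -1: "progress",
    0: "expected level of outcome",
    1: "somewhat more than expected",
    2: "much more than expected",
}


def is_valid_gas_score(score: float) -> bool:
    """A raw GAS score is valid if it lies in [-2, 2] in steps of 0.5."""
    return -2 <= score <= 2 and score * 2 == int(score * 2)


def describe_gas_score(score: float) -> str:
    """Map a raw GAS score (including half-scores) to its benchmark label."""
    if not is_valid_gas_score(score):
        raise ValueError("GAS scores must be between -2 and +2 in steps of 0.5")
    if score == int(score):
        return GAS_BENCHMARKS[int(score)]
    lower = math.floor(score)  # half-scores fall between two benchmarks
    return f"between '{GAS_BENCHMARKS[lower]}' and '{GAS_BENCHMARKS[lower + 1]}'"
```

For example, `describe_gas_score(0)` returns the label for the expected level of outcome, and `describe_gas_score(-1.5)` reports a skill level between the child's present level and progress.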
Research Design
A randomized, single-blind, pre–post, control group design was applied. Teacher–child pairs were randomized into one of three groups: (a) teachers who received an online autism training that served as a placebo-control group; (b) teachers who received COMPASS consisting of conjoint parent–teacher consultation plus follow-up face-to-face (FF) teacher coaching sessions; and (c) teachers who received COMPASS consisting of conjoint parent–teacher consultation plus follow-up web-based (WEB) teacher coaching sessions.
Intervention
The intervention consisted of an initial 3-hr parent–teacher consultation and four 1.5-hr coaching sessions. Parents were invited but not required to attend the coaching sessions.
COMPASS consultation
The initial consultations were provided by the first author or a second consultant. All were conducted in-person at the school and occurred within the first 2 months of the start of the school year. Prior to the consultation, parents and teachers completed a COMPASS assessment questionnaire, which was collected and summarized into a joint form used for discussion about the child’s personal and environmental challenges and supports associated with social, communication, and independent/adaptive behavior skills at school and at home. A shared decision-making approach was used to identify and discuss areas of overlap and differences in parent and teacher observations and concerns. Goals related to the three core learning domains of communication, social skills, and independence were identified and prioritized, and each was translated into an IEP objective. Each objective was carefully written as a measurable skill that included the type and frequency of prompts to be used, the criterion for the skill to be demonstrated, and where or with whom the skill would occur. After the goal was specified, teaching plans were developed for each of the three skills to be targeted and tracked throughout the school year. The strategies described in the teaching plans were developed using evidence-based strategies summarized by the National Autism Professional Development Center on Autism Spectrum Disorder (NAPDC), customized for the specific context of each child. This “personalization” process is consistent with recommended procedures for implementing evidence-based practices in psychology as outlined by the American Psychological Association (APA Task Force on Evidence Based Practice, 2006), and with the COMPASS theoretical framework of identifying personal and environmental challenges and supports necessary for deriving a successful teaching plan and for ensuring a fit between the child’s characteristics and the educational context (see Figure 2).
At the end of the consultation, parents and teachers completed a checklist rating of perceptions of consultant adherence to the COMPASS implementation protocol and a satisfaction measure. The child’s IEP team met within 2 weeks following the initial consultation to update the IEP so that the goals identified in the consultation and targeted for follow-up coaching were reflected in the child’s program.
Prior to the first coaching session, the consultant created the GAS for each skill. The GAS was used for progress monitoring at the four subsequent teacher coaching sessions and at the final outcome assessment.
Teacher coaching
Coaching sessions took place approximately every 5 weeks. Typically, two sessions occurred during the fall semester, and two during the spring semester. Similar to the initial consultation, a written protocol was developed and followed for each coaching session that included: (a) observing a teacher-made videotape of instructing the child on the three targeted objectives and soliciting teacher feedback on what was observed, (b) scoring the child’s progress using the GAS form, and (c) discussing the teaching plans and making any adjustments to the plans based on discussion and review of the video. The same protocol was implemented for both the FF and WEB groups. An adherence checklist was used to ensure the consultant implemented all procedures similarly for both the FF and WEB groups. Within a week after each session, a 2- to 3-page summary was written and sent to both the teacher and parent. The report provided a description of the video observations, information discussed, measured progress using the GAS form, and recommendations prior to the next session.
Results
Group Equivalency
Table 1 provides a summary of between-group analyses for school, teacher, and child variables. For school variables, there were no differences between groups on school location, χ2(3) = 2.5, p = .47, or students who qualified for free or reduced lunch, χ2(3) = 3.0, p = .39. For program variables, there were no differences between groups in the number of hours of school attendance or in a general education classroom or for the number of minutes of 1:1 support from a special education teacher or teaching assistant. Teacher factors of experience with autism and training also were equivalent between groups. A higher percentage of teachers in the experimental group (n = 16) had master’s degrees compared with the comparison group, n = 3; χ2(1) = 5.3, p = .046. There were no differences between groups on any of the child variables.
Process Measures
Adherence measures
Consultant fidelity to COMPASS consultation and coaching protocols
Overall mean adherence for the initial COMPASS consultation was 22.6 (SD = 2.9) out of 25 as rated by teachers, meaning that about 90% of the COMPASS components were viewed as being implemented, and 20.0 (SD = 6.8) as rated by parents, meaning that 80% of the components were viewed as being implemented. Independent samples t-tests indicated no statistically significant difference in mean adherence scores between FF and WEB groups based on teacher and parent ratings (see Table 2). Overall mean adherence for coaching was 3.8 (SD = 0.19) for both the FF and WEB conditions, indicating that teachers rated consultants as adhering to the protocol “very much” (max rating = 4). An independent samples t-test showed no statistically significant difference in mean adherence scores between WEB (M = 3.8, SD = 0.19) and FF (M = 3.8, SD = 0.23) groups, t(20) = 0.83, p = .42, Cohen’s d = 0.0.
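The percentages quoted above follow directly from dividing the mean number of endorsed checklist items by the 25 items on the checklist. A minimal Python sketch of that arithmetic is given below; the function name is a hypothetical helper introduced for illustration.

```python
def percent_implemented(mean_items_endorsed: float, total_items: int = 25) -> float:
    """Convert a mean count of endorsed checklist items into a percentage."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return 100.0 * mean_items_endorsed / total_items


teacher_pct = percent_implemented(22.6)  # about 90.4, reported as about 90%
parent_pct = percent_implemented(20.0)   # 80.0, reported as 80%
```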
Table 2. Between-Group Analysis of Fidelity Ratings of FF and WEB Groups
Note. FF = face-to-face; WEB = web-based videoconferencing.
Teacher fidelity to teaching plans
The overall mean scores for Coaching Sessions 1 to 4 for the combined WEB and FF groups were 3.7 (SD = 1.1), 3.6 (SD = 1.1), 4.0 (SD = .8), and 4.2 (SD = .7), respectively, out of a possible score of 5.0. Results from the independent samples Mann–Whitney U tests showed no statistically significant difference between groups in consultants’ ratings of teacher fidelity for any of the four coaching sessions, z = 0, p = .99, z = −0.51, p = .61, z = −0.02, p = .98, z = 0, p = .99. Effect sizes using the r metric (i.e., r = |z|/√n) for each consecutive time point were 0, .09, .004, and 0. A Friedman test was also conducted to test differences in fidelity ratings across coaching sessions within each group. Results showed no statistically significant difference in fidelity ratings within the FF group, χ2(3) = 5.97, p = .11, Kendall’s W = .17, or within the WEB group, χ2(3) = 6.63, p = .09, Kendall’s W = .15, although both groups showed evidence of a trend toward increasing adherence. Ignoring group membership, however, a Friedman test showed a statistically significant improvement in fidelity ratings across coaching sessions, χ2(3) = 12.39, p = .006, Kendall’s W = .15. Specifically, the mean rank of the fidelity ratings improved across coaching sessions (mean ranks for Sessions 1, 2, 3, and 4 were 2.26, 2.06, 2.78, and 2.91, respectively).
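The r effect sizes for the Mann–Whitney U tests can be checked with a few lines of arithmetic. The sketch below applies r = |z|/√n to the reported z statistics. The sample size of 29 is an assumption taken from the size of the experimental sample; the number of teachers actually rated at each coaching session is not stated here and may have been smaller. The function name `effect_size_r` is hypothetical.

```python
import math


def effect_size_r(z: float, n: int) -> float:
    """Return the r effect size for a z statistic: r = |z| / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return abs(z) / math.sqrt(n)


# z statistics reported for Coaching Sessions 1 to 4.
reported_z = [0.0, -0.51, -0.02, 0.0]

# ASSUMPTION: all 29 experimental-group teachers contributed a rating
# at every session. The per-session n is not given in the text.
N_ASSUMED = 29

effect_sizes = [effect_size_r(z, N_ASSUMED) for z in reported_z]

for session, r in enumerate(effect_sizes, start=1):
    print(f"Coaching Session {session}: r = {r:.3f}")
```

Under that assumed n, the values for Sessions 2 and 3 round to .09 and .004, in line with the effect sizes reported above.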
Results for Primary Hypothesis
Implementation variables
Table 3 shows the intercorrelation matrix of the implementation, intervention, and practice outcome variables for the 29 experimental group participants. The WEB and FF groups were combined for the experimental group. We combined the two groups because they both represented the intervention and because we did not find differences in student-level outcomes (PET-GAS scores), in teacher (intervention practice) variables (such as fidelity of implementation of teaching plans), or in consultant (implementation practice) variables (fidelity of implementation of consultation and coaching procedures or parent/teacher satisfaction). We did find one difference between the WEB and FF groups at the child level: IQ was lower for children in the WEB group than for children in the FF group. But when we used IQ as a covariate in evaluating COMPASS outcomes, we found no difference in results (Ruble et al., 2013). The matrix provides tests of cross validation of the measures within the implementation practice variables and formal tests of association between implementation and intervention variables. As shown, the implementation variables were related to each other in expected ways, providing initial support for the validity of the implementation quality measures. That is, implementation quality of the coaching sessions (i.e., coaching adherence) was positively correlated with aspects of intervention practice expected to be affected by coaching (i.e., teacher adherence to implementing the teaching plans). This pattern held for Coaching Sessions 1, 2, and 4 but not for Session 3 (r = .44, p = .02; r = .44, p = .03; r = .09, p = .34; r = .40, p = .03, for Sessions 1 through 4, respectively).
Table 3. Intercorrelations Between the Implementation Practice, Intervention Practice, and Practice Outcome Variables for the Experimental Sample (n = 29)
Note. A Pearson correlation coefficient was estimated between Consultant Adherence, Coaching Adherence, IEP quality, Coach 1-4 GAS, and GAS Change. A polychoric correlation was estimated among all of the ordinal Teacher Adherence 1-4 variables. All other correlations were based on a polyserial correlation. All correlations were estimated in Mplus 7.11 using the unweighted least-squares with mean and variance adjustment (ULSMV). IEP = individual education program; GAS = goal attainment scale.
Source of Ratings: T = teacher; I = independent rater unaware of group assignment; C = consultant.
*p < .05, two-tailed. **p < .01, two-tailed. ***p < .001, two-tailed.
Intervention variables
With respect to the hypothesized associations between intervention variables and practice outcomes, the results were largely supportive. As predicted, there was a positive association between intervention quality (IEP quality scores) and child outcomes (GAS improvement) for the experimental group participants. Specifically, there was a positive linear correlation between final GAS ratings and targeted IEP quality (r = .46, p = .002), as well as between GAS ratings for Coaching Sessions 2 (r = .40, p = .01), 3 (r = .30, p = .02), and 4 (r = .35, p = .02) and targeted IEP quality.
Similarly, the hypothesis that intervention quality, as measured by teacher adherence to the COMPASS intervention, would be positively associated with GAS scores was supported. Not unexpectedly, intervention quality (teacher adherence) was related to child outcomes (GAS scores) only when both were measured contemporaneously (i.e., targeting the same time points). Similarly, adherence ratings collected at parallel coaching sessions for GAS tended to be positively related. The exception was Coaching Session 3, in which there was no significant relationship between observed adherence ratings and the corresponding coaching GAS scores, although the correlation was still strongest for the matched as compared to the non-matched sessions. It also was the only one of the four adherence ratings to fail to correlate significantly with coaching fidelity. Finally, the relationship between intervention quality (teacher adherence) and outcomes was also detected using summary measures created across the four coaching sessions. However, there was no statistically significant association between adherence at the final (fourth) coaching session and GAS change score.
Finally, the hypothesis that features of IEP quality targeted by COMPASS would be higher in the intervention groups compared with the placebo-control group was supported. Results from a 3 × 2 between (group) by within (time: before vs. after COMPASS consultation) fixed-effect ANOVA of targeted IEP quality showed a statistically significant interaction between group and time, F(2, 39) = 23.43, p < .001, η2partial = .55, MSE = 0.02. Although the main effects for group and time were statistically significant, F(2, 39) = 5.36, p = .009, η2partial = .22, and F(1, 39) = 82.43, p < .001, η2partial = .68, they were not interpreted because of the statistically significant interaction. Simple main effects tests were then conducted to examine differences among the three groups at Time 1 (before COMPASS initial consultation) and Time 2 (after COMPASS consultation). The analyses indicated a statistically significant difference among groups at Time 2, F(2, 39) = 18.57, p < .001, η2partial = .49, but not at Time 1, F(2, 39) = .84, p = .44, η2partial = .04. Planned comparisons showed that, as expected, the final mean IEP quality score for the control group (M = 1.16, SD = 0.27, n = 15) was statistically significantly lower than that of the FF group (M = 1.63, SD = 0.23, n = 13), t(39) = −4.94, p < .001 (two-tailed), Cohen’s d = 1.87, and the WEB group (M = 1.64, SD = 0.22, n = 14), t(39) = −5.18, p < .001 (two-tailed), Cohen’s d = 1.94. No difference was observed between the FF and WEB groups on final mean IEP quality scores, t(39) = −.095, p = .93 (two-tailed), Cohen’s d = 0.05.
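The effect sizes in this analysis can be approximately recovered from the reported summary statistics. The sketch below is illustrative only: it uses the standard identity that converts an F ratio and its degrees of freedom into partial eta squared, and it assumes that Cohen’s d was computed with a pooled standard deviation, which is not stated in the text. The function names are hypothetical, and the inputs are the rounded values reported above.

```python
import math


def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


def cohens_d_pooled(mean_a: float, sd_a: float, n_a: int,
                    mean_b: float, sd_b: float, n_b: int) -> float:
    """Cohen's d using the pooled standard deviation of two groups.

    ASSUMPTION: the pooled-SD form is used; the text does not say
    which variant of d the authors computed.
    """
    pooled_variance = (
        (n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2
    ) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_variance)


# Group-by-time interaction: F(2, 39) = 23.43.
eta_interaction = partial_eta_squared(23.43, 2, 39)

# Final IEP quality: FF vs. control and WEB vs. control,
# using the rounded means, SDs, and group sizes from the text.
d_ff_vs_control = cohens_d_pooled(1.63, 0.23, 13, 1.16, 0.27, 15)
d_web_vs_control = cohens_d_pooled(1.64, 0.22, 14, 1.16, 0.27, 15)

print(f"partial eta squared (interaction): {eta_interaction:.2f}")
print(f"Cohen's d, FF vs. control: {d_ff_vs_control:.2f}")
print(f"Cohen's d, WEB vs. control: {d_web_vs_control:.2f}")
```

With these rounded inputs the results come out close to the reported values (about .55 for the interaction and roughly 1.86 and 1.94 for the two comparisons). Small gaps, such as 1.86 versus the reported 1.87, are what one would expect when working from rounded means and standard deviations.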
Discussion
The purpose of this study was to examine the relationships between implementation practice, intervention practice, and practice outcomes using an implementation science framework as a guide. Results from this study provide some evidence for each of the hypothesized links in the Dunst (2013) model. That is, implementation practice (COMPASS fidelity) was directly related to intervention quality (teacher behavior), and intervention quality was directly related to child outcomes. Findings also were consistent with those reported by Ruble, Dalrymple, and McGrew (2010), in which intervention practice quality, as measured using the quality of the IEP, was higher in the COMPASS experimental groups and directly associated with child educational outcomes. These results were replicated in both the WEB and FF groups.
The replication in the present study of the prior finding that a measure of intervention quality (i.e., IEP quality) was directly related to outcomes, and is thus an active clinical ingredient, is noteworthy. IEPs are a required element and arguably the centerpiece of special education planning. The fact that outcomes may be affected by improving IEP quality is of potential importance for all special education practice, not just consultation involving COMPASS. However, although there is good evidence that IEP quality may be important, we still do not understand why IEP quality is critical, what happens during the implementation (i.e., consultant activity) to affect IEP quality, and what impact this may subsequently have on the intervention (i.e., teacher activity) to affect child outcomes. We offer some insights below.
Analysis of teacher ratings of how well COMPASS informed the IEP and intervention strategies provides some potential directions for understanding the finding and for further investigation. One possibility is that the activities needed to develop high-quality IEP goals, such as a focus on measurability and objectivity, also may be essential for establishing clarity in teacher goal-directed behaviors. Goals are integral for focused behavior (Snyder, 2000), and goal setting theory (Ryan, 1970) posits that goals affect action. Furthermore, clearly defined, well-specified, time-limited goals set the stage for task performance. For example, Locke and Latham (2002) describe four ways in which goals may affect action: (a) by drawing and maintaining effort toward activities associated with the goals, (b) by increasing effort, (c) by influencing persistence, and (d) by affecting indirect behaviors of excitement, discovery, and use of task-relevant knowledge and strategies (Wood & Locke, 1990). The goal setting activities within the COMPASS consultation embed these important actions of goal development and goal measurement. Moreover, the coaching sessions provide additional features critical for goal attainment. That is, for goals to be attained, feedback on progress is essential (Locke & Latham, 2002). Coaching sessions include performance feedback within the set activities of progress monitoring. However, future research is needed to carefully assess and test the degree to which COMPASS actually includes and promotes these aspects of goal setting and their impact on outcomes.
We found strong and consistent evidence that intervention quality, as measured by teacher adherence ratings, was related to outcomes, as measured by contemporaneous ratings of GAS improvement. That is, data from the present study suggest a nuanced and dynamic pattern of findings concerning the relationship between adherence and outcome over time. This pattern reflects the fact that both adherence and outcomes tend to vary over time. Specifically, adherence ratings collected at each coaching session correlated most strongly and significantly with GAS scores collected at the same session, and this was true for three of the four coaching sessions. Thus, adherence between baseline and Session 1 predicted GAS scores at Session 1, adherence between Session 1 and Session 2 predicted GAS scores at Session 2, and so on. The sole exception to this pattern was for teaching adherence at Session 3, which may have been problematic, given that it also was the only adherence measure that was not related to overall coaching fidelity. This suggests (a) that intervention quality (i.e., adherence, as well as outcomes) should be viewed as a dynamic process that can change and vary over the course of an intervention, and (b) that to the extent intervention quality and outcomes are related, they should be most closely related when their windows of measurement overlap (i.e., cover the same points in time). Thus, the inability to detect a statistically significant association between adherence and final goal attainment outcome may be due to the time-varying quality of teacher adherence and the need for simultaneous measurement of adherence and goal attainment. Given that the final GAS outcome assessment occurred several weeks following the last coaching session, this lack of contemporaneousness may explain the weaker, trend-level association between adherence at Coaching Session 4 and final GAS scores.
The results of the study should be viewed as preliminary. The findings are limited to a single consultation model, tested within a limited age range for special education (children between 3 and 8 years old) and delivered by consultants from the research team. Another concern is method variance. For our implementation measures, only teachers completed the ratings, and most measures were rating scales. Moreover, several of the measures were single items, which may limit content validity or construct coverage. It would be helpful to have more observational measures, multi-item measures, and implementation measures from other sources to verify these results. In addition, we were unable to assess reliability of the teacher measures, which is a further concern. Finally, the feasibility of a child-specific consultation intervention that can be delivered efficiently requires more evaluation. Future research would be helpful that tests whether all children in a classroom would benefit from individual COMPASS consultations, or whether teachers might acquire skills from a COMPASS consultation for one child that spill over to other students and improve their educational outcomes. It would also be important to investigate the possible influence of alternate variables that might account for correlations between intervention and practice outcomes, such as teacher burnout (Ruble & McGrew, 2013). Nevertheless, the findings demonstrate the potential usefulness of applying an implementation framework to begin to understand and identify mechanisms of change through analysis of relationships between implementation, intervention, and practice outcomes. As predicted, practice outcomes of child goal attainment were most closely associated with the intervention practices, and the intervention practices were most closely aligned with the implementation practice.
Footnotes
Acknowledgements
We are grateful to the teachers, families, and children who generously donated their time and effort. We extend our thanks to special education directors and principals for allowing their teachers to participate. We also want to acknowledge our research team members Nancy Dalrymple, co-developer of COMPASS, as well as Ryan Johnson, Rachel Aiello, Jessica Birdwhistell, Jennifer Hoffman, and Lauren Feltner for their efforts.
Lisa Ruble and Michael Toland, Department of Educational, School, and Counseling Psychology, University of Kentucky; John H. McGrew, Department of Psychology, Indiana University–Purdue University, Indianapolis.
This work was supported by Grant Numbers R34MH073071 and 1RC1MH089760 from the National Institute of Mental Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Mental Health or the National Institutes of Health.
