Abstract
Traditional professional development for teachers seldom results in program implementation with high fidelity or improved student outcomes. In this study, we evaluated the effects of performance feedback on the implementation of a class-wide, behavioral level system in four self-contained, secondary classrooms for students identified with emotional disturbance. Using a multiple-baseline across-participants design, we examined the effects of performance feedback on the treatment integrity of the level system, along with changes in student engagement and student disruptive behavior. Results indicated a clear functional relation between performance feedback and teachers’ treatment integrity, with less of a relation observed between performance feedback and students’ academic engagement or disruptive behaviors. Implications of these findings are discussed within the context of effective behavioral interventions for students with significant behavioral challenges.
Compared with other students receiving special education services, students identified with emotional disturbance (ED) demonstrate significantly more social and behavioral problems at an early age (Poulou, 2015; Wang & Fredricks, 2014), often leading to difficulties extending into adulthood (Copeland, Wolke, Shanahan, & Costello, 2015). Adding to the problem, teachers who work with students with ED often need extra support to develop and use the complex set of skills required to effectively manage students’ various behavioral difficulties (Prather-Jones, 2011).
Performance Feedback
Most students with ED receive their educational programming in self-contained classrooms, in settings removed from general education students (U.S. Department of Education, Office of Special Education Programs, 2013). Unfortunately, many of the educators in these classrooms do not receive sufficient preservice training on how to manage behavior (Oliver & Reschly, 2010), resulting in students’ continued low achievement or increases in students’ problem behavior over time (Hopman et al., 2018; McGrath & Van Bergen, 2015). In addition, because of the stressful nature of working with students with ED, educators in these classrooms may experience more burnout and stress than other special education teachers (Brunsting, Sreckovic, & Lane, 2014). Fortunately, it appears that educator attrition is improved when teachers are supported by district administrators and provided opportunities for professional development (Brunsting et al., 2014).
Lane, Jolivette, Conroy, Nelson, and Benner (2011) recommended school districts (a) ensure high fidelity of implementation of intervention practices, and (b) prepare teachers for the challenge of working with students with ED by providing high-quality professional development. Historically, educator training has taken the form of lecture-style workshops; however, this style of training, without continued follow-up support in the classroom, has proven largely unsuccessful (Darling-Hammond, 2009; Joyce & Showers, 2002). Without follow-up support, educators’ implementation efforts have lacked treatment integrity (TI), which is broadly defined as the extent to which an intervention is implemented as intended (Gresham, 2009). In general, educators demonstrate low levels of TI with new programs after a presentation-only or lecture-style format training (Joyce & Showers, 2002). However, when job-embedded coaching with opportunities for guided practice and feedback is added to a lecture-style training, teachers’ TI can be greatly increased (Fallon, Collier-Meek, Maggin, Sanetti, & Johnson, 2015).
One type of job-embedded support used to improve TI is performance feedback (PF). Using a PF model, a consultant provides feedback to implementers specifically to improve TI (Alvero, Bucklin, & Austin, 2001). In this context, PF is a process in which a consultant monitors specific behaviors related to the TI of an intervention and then uses the data to provide the consultee with timely feedback. In a recent systematic review, Fallon et al. (2015) identified PF as an evidence-based intervention to increase teachers’ TI to behavioral interventions. Nonetheless, only a few studies have investigated the use of PF to increase TI to class-wide interventions in ED classrooms. In two such examples, Sutherland, Wehby, and Copeland (2000) determined PF increased the frequency with which one teacher delivered behavior-specific praise to students, and Rathel, Drasgow, and Christle (2008) used PF to increase preservice teachers’ use of positive communication with students with ED.
Level Systems
Although there is minimal research investigating how classroom management strategies help improve students’ behavior specifically in self-contained classrooms, there are some research-based strategies available to educators (Maggin, Robertson, Oliver, Hollo, & Partin, 2010). For example, Simonsen, Fairbanks, Briesch, Myers, and Sugai (2008) determined that overall classroom structure is positively associated with student academic engagement. In addition, Cook, Landrum, Tankersley, and Kauffman (2003) indicated key features in creating a structured classroom include arranging the room to minimize distractions as well as posting, teaching, and reviewing expectations (i.e., antecedent interventions). Moreover, teacher delivery of contingent and specific praise, both within and independent of group contingencies for classroom management (e.g., level systems, token economies), is associated with increases in student on-task and prosocial behavior.
One research-based, class-wide intervention designed for use in ED classrooms incorporating the aforementioned strategies—and broadly speaking, one fostering more classroom structure—is a level system (Cancio & Johnson, 2007; Mastropieri, Jenne, & Scruggs, 1988). A level system is an independent group contingency in which students earn privileges (through a hierarchy of levels) based on the extent to which they engage in appropriate behaviors (Cooper, Heron, & Heward, 2007; Smith & Farrell, 1993). When implementing a level system, a teacher must (a) clearly delineate the classroom rules and expectations, (b) explain the privileges associated with each level, (c) explain the criteria for earning each level (e.g., number of points necessary), and (d) determine how often students will have the opportunity to change levels (Cooper et al., 2007; Mastropieri et al., 1988).
Although research supports the effectiveness of individual components of a level system (e.g., behavior-specific praise) with students with ED, few researchers have examined the overall impact of level systems on student outcomes in educational settings (Cook et al., 2003; Smith & Farrell, 1993). Moreover, despite the widespread use of level systems in classrooms with students with ED, Smith and Farrell (1993) expressed concern over their continued use due to the lack of empirical studies demonstrating their efficacy and called for continued investigation.
Interestingly, we identified only one peer-reviewed study from the past 30 years that evaluated a level system with students with ED. Mastropieri and colleagues (1988) trained a special education teacher to implement a level system intervention with 15 high school students with behavioral disorders. Students earned levels based on their performance related to following the rules, engaging in work, and remaining in their seats during work time. Employing a reversal design, Mastropieri and colleagues concluded that implementation of the intervention increased the number of assignments completed by students, the accuracy of completed work, and the amount of time students spent in their seats during the class period.
Purpose of the Current Study
For this study, our primary aim was to investigate the effects of using PF to enhance TI of a level system in self-contained classrooms for students with ED. To evaluate this, the present study addressed the following primary research question: When PF is provided for educators using a level system in a self-contained ED classroom, does their TI improve? Secondarily, given the lack of research on the effects of level systems on student behavior in ED classrooms, we also sought to examine whether student behavior would improve when teachers implemented the level system with high fidelity. As such, we proposed the following secondary research question: What is the effect of PF for implementing a class-wide level system on (a) student academic engagement and (b) student disruptive behavior?
Method
Participants and Setting
Our study took place during the 2014–2015 school year in the Southwest United States within a large, urban school district with approximately 40,000 students. All district-wide data described next are from the 2014–2015 school year. Approximately half of the students in the district qualified for free or reduced-price lunch, the truancy rate (defined as a student missing more than 30 min of instruction without an excuse three or more times during the school year) was 34%, and 4.9% of students in Grades 7 to 12 dropped out of school during the year prior to the study. Approximately 1,300 students in the county (3.3%) were eligible for special education with an ED eligibility, but only 90 middle and high school students identified with ED received educational services in the district’s public schools (the focus of our study). The other 1,210 students with ED in the county were elementary students, or middle and high school students who attended either specialized nonpublic or private schools. All students in the four classrooms in this study were previously found eligible for special education as students with ED under the eligibility criteria of the Individuals with Disabilities Education Act (IDEA, 2004).
There were six self-contained classrooms in the district for middle and high school students with ED who had difficulty participating in general education classrooms due to behavioral challenges. These classrooms were located on general education campuses and were reserved for students who were eligible for special education with ED. The district required all six teachers and staff to implement the intervention as initially trained by the researchers, and teachers from all six classrooms were invited to participate in this study (i.e., receive PF); however, in one school (with two self-contained ED classrooms), school administrators chose to provide follow-up coaching and support to the staff themselves and thus declined to participate in the study. The four teachers in the other schools opted to participate with follow-up coaching and support from the first author and were enlisted in the study.
Each of the four participating classrooms had two instructional assistants and one teacher. None of the four classrooms had implemented a class-wide behavior intervention during the previous school year. Teachers had an average of 16 years of teaching experience (range = 8–29), and the instructional assistants had an average of 12 years of experience (range = 2–30). Each classroom supported an average of eight students with ED (range = 6–11). Classrooms 1 and 3 supported students in middle school (Grades 6–8), and Classrooms 2 and 4 supported high school students (Grades 9–12).
The teacher in Classroom 1 was a White female with 11 years of teaching experience. One instructional assistant was a Latina female with 15 years of experience, and the other was a White female with 17 years of experience. There were eight male students enrolled in the class during the observed period. In Classroom 2, the teacher was a Black male with 8 years of teaching experience. One instructional assistant was a Latino male with 10 years of experience, and the other was a White female with 2 years of previous experience. Classroom 2 had six male and two female students during the observed period. The teacher in Classroom 3 was a Black female with 16 years of teaching experience. One of the instructional assistants was a Black female with 2 years of experience, and the other was a Latina female with 30 years of experience. There were 10 male and one female student in Classroom 3. The teacher in Classroom 4 was a White male with 29 years of teaching experience. One of the instructional assistants was a White male, and the other was a White female. Both had 10 years of experience. There were four male and two female students in Classroom 4.
At the beginning of the school year, district administrators sent a letter home with all students who received at least part of their educational services in the self-contained classrooms. The letter explained the nature of the class-wide behavior intervention, how it would be implemented in their child’s classroom, and whom to contact with questions about the intervention. It also explained the process by which the researchers would observe the teachers and students and stated that the researchers would not collect information on individual students or disseminate information about the students except for their age, grade level, eligibility for special education, and class-wide behavior data. Because the program was part of a district-wide adoption of a research-based practice, only passive parental consent was required per the district’s institutional review board.
Procedure
Before beginning the project, we gained approval from the institutional review boards at the participating school district and the first author’s affiliated university. The first author served as a behavior consultant to the district throughout the 2014–2015 school year, helping to support the district-mandated class-wide behavior intervention. The first author was a White female with a master’s degree in school psychology who had been a practicing school psychologist for the 6 years prior to the study. She was enrolled in a doctoral program in school psychology throughout this study.
Per the district’s request, the project began with the first author and faculty from the university providing a 2-day training (12 hr total) to the participating teachers and instructional assistants on central aspects of classroom management. Specifically, during the first day of training, we delivered a PowerPoint lecture with related activities on several core principles of applied behavior analysis (ABA), including (a) operant conditioning (i.e., positive/negative reinforcement, positive/negative punishment), (b) the three-term behavior contingency (i.e., antecedent–behavior–consequence), (c) differential reinforcement, (d) matching law, and (e) group contingency management in schools. During the second day of training, we emphasized the application of these principles to a level system. More specifically, we presented examples of the different types of level systems incorporating ABA strategies in the literature (e.g., Cruz & Cullinan, 2001; Mastropieri et al., 1988) and then outlined the specific components of the level system to be implemented. These components were based on Cancio and Johnson’s (2007) seven features of a point/level system: identifying target behaviors (e.g., behavioral expectations), determining point values for behaviors, determining opportunities to earn bonus points, deriving a continuum of levels (backup reinforcers) students can earn on a daily basis (see Table 1 for an example of level privileges developed in training), determining whether to employ another type of backup reinforcer (e.g., fun Friday), determining how to communicate points/levels with parents, and designing a method for monitoring student progress.
Table 1. Sample Level Privileges in One Secondary Classroom.
Note. Level 1 is most restrictive and Level 4 is least restrictive. Levels are changed daily based on the points earned by the student the previous day.
We then allowed staff time to develop the procedures and rules for their classrooms according to their needs. In other words, we provided the parameters for the level system but allowed teachers some flexibility in how to apply it in their classroom. For example, some staff wanted to use previously established rules. We specified that if the rules were stated in terms of behavioral expectations (i.e., the behavior in which they wanted the student to engage), they could continue to use them. We provided all attendees a copy of the PowerPoint presentation, examples of point sheets to monitor students’ rule-following, and sample “menus” identifying reinforcers for each level. In addition, workshop presenters explained the first author would be observing all staff’s TI to the level system in the classrooms and measuring the academic engagement and disruptive behavior of the students in their classes. Finally, we provided each staff member with the TI monitoring data sheet (described later).
We used the Microsoft Excel random number generator to determine the period during which each classroom would be observed for TI, academic engagement, and disruptive behavior. The breakdown of each classroom, period, and subject was as follows: Classroom 1—second period, History; Classroom 2—fifth period, English/Language Arts; Classroom 3—third period, Math; and Classroom 4—first period, Math.
Design
We used a concurrent multiple-baseline across-participants design (four classrooms) to address the research questions. It is important to note we were unable to collect data on certain weeks due to school-wide testing, teacher absences, field trips, or other disruptions (e.g., fire drill). In Classroom 1, we did not collect TI or student data during Week 1 (baseline phase). In Classroom 2, we did not collect data during Weeks 1 and 3 (baseline phase) and Weeks 10 and 12 (intervention phase). In Classroom 3, we were unable to collect data during Week 6 (baseline phase). And in Classroom 4, we did not collect data during Weeks 3 and 4 (baseline phase).
Baseline phase
During the baseline/no-feedback phase, we asked staff to implement the intervention as trained. The first author served as the primary consultant and was in each classroom 1 to 2 hr each week to assist with material development and help staff tally the level points correctly. The consultant collected TI data and student behavior data using the tools described in the Dependent Variables section. The observations for TI and student behavior lasted an average of 41.2 min (range = 39.4–43.8 min). In addition, the consultant met with all teaching staff once per week during their weekly professional development time. Site administrators arranged these meetings to review policies and program issues and to address staff questions. The consultant used approximately 20 to 30 min of this hour-long meeting to answer general questions and review general behavioral principles, but did not display the results of the TI observation or provide specific information about how the staff were implementing the level system. During this phase, staff completed a Microsoft Excel spreadsheet detailing the number of points students earned each day. The consultant used these data to compile weekly summaries, reviewed with the classroom teams, of the percentage of points earned per student by week and by month.
Performance feedback phase
We determined the order in which classrooms began receiving intervention based on a visual analysis of level and trend of the baseline TI data (i.e., a response-guided approach). After three data points were collected in a phase, we examined data for stability and trend. After the third data point was collected in Classroom 1, the baseline trend for TI was identified as stable. We reviewed the baseline data for each classroom on a weekly basis to determine when it was appropriate to begin intervention for other classrooms; once a relatively stable rate of responding was established, we initiated PF for each subsequent classroom.
During the intervention phase, we continued to collect data as described in the baseline phase; however, during the intervention phase, the primary consultant gave each classroom’s staff specific feedback about their TI. Due to concerns with interrupting class and staff scheduling, the consultant did not deliver PF immediately after the observation. However, the primary consultant observed the classrooms either Monday or Tuesday during the respective class periods (following the same schedule as baseline observations) and then presented the data for that week’s observation during staff meetings after school on Tuesday or Wednesday.
For each meeting, the consultant graphed the results from each observation using Microsoft Excel and displayed results as a percentage of TI in each area of the TI measurement system (permanent product, Direct Observation 1, Direct Observation 2, and praise) and overall TI (the total points divided by the total number of opportunities). Staff received printed graphs to review and the consultant (a) explained the four TI areas according to the measurement tool and (b) gave specific feedback as to why scores were high or low in each area. For components demonstrating less than 100% TI, the consultant provided specific suggestions for improvement (e.g., “levels need to be posted with descriptors of what students have access to at that level”). During each PF session, staff were also (a) able to view their overall TI over time; (b) told whether their TI was increasing, decreasing, or stable; and (c) given time to ask questions. The consultant used a timer to keep meetings under 15 min. On average, meetings lasted 9.5 min (range = 8.25–15 min). Although the consultant encouraged staff to email questions between meetings, no staff members emailed between meetings throughout the intervention phase.
Procedural Integrity
We gathered procedural integrity for all PF sessions. The consultant completed an eight-item checklist during each session, reflecting each component of a PF session. The items were (a) greeting members of the team and providing an overview of the meeting, (b) providing the most recent TI graphs for all members to review, (c) verbally providing information regarding percentage of TI achieved while referencing the information on the graph, (d) providing staff members with graphs measuring their TI progress over time, (e) verbally providing information on staff’s average percentage of integrity over time, (f) setting goals for the next meeting, (g) verbally answering all questions from staff, and (h) maintaining a 15-min time limit.
Items on the PF integrity checklist were rated dichotomously, as either occurred or not occurred. Procedural integrity self-ratings ranged from 87.50% to 100% (M = 92.71%). Procedural integrity was lower on days in which the consultant did not make enough copies of the graphs for each staff member to have their own. An outside observer conducted procedural fidelity ratings for 24% of PF sessions using the same checklist. To train the outside observer on the use of the PF checklist, the consultant reviewed the checklist and conducted a mock PF session, while the outside observer watched and scored. The outside observer was considered reliable after obtaining 90% or higher inter-observer agreement (IOA) with the consultant for two consecutive sessions. Depending on availability, the outside observer attended either the first or second PF session and then either the fourth or fifth session for each classroom. IOA was calculated as percent agreement (number of agreements divided by total number of agreements and disagreements) and resulted in 100% agreement across all observed sessions.
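The two percentages reported above follow from simple counts, and a minimal sketch may make the arithmetic concrete. The function names and the sample ratings below are hypothetical illustrations, not the study's data or scoring software; the sketch assumes each checklist item is coded as occurred (True) or not occurred (False).

```python
def checklist_integrity(ratings):
    """Percentage of dichotomous checklist items rated as 'occurred'."""
    if not ratings:
        raise ValueError("At least one checklist item is required")
    return 100.0 * sum(bool(r) for r in ratings) / len(ratings)


def percent_agreement(rater_a, rater_b):
    """Item-by-item IOA: agreements / (agreements + disagreements) * 100."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must score the same items")
    if not rater_a:
        raise ValueError("At least one checklist item is required")
    agreements = sum(bool(a) == bool(b) for a, b in zip(rater_a, rater_b))
    return 100.0 * agreements / len(rater_a)


# Hypothetical session: seven of the eight PF components occurred.
consultant = [True, False, True, True, True, True, True, True]
observer = [True, False, True, True, True, True, True, True]

print(checklist_integrity(consultant))          # 87.5
print(percent_agreement(consultant, observer))  # 100.0
```

With an eight-item checklist, a single missed component yields 87.5%, which corresponds to the lowest self-rating reported above.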
Dependent Variables
The three dependent variables were TI of the teacher-delivered level system, student academic engagement, and student disruptive behavior. To measure these variables, we developed two assessment tools, one for staff TI and the other for student behavior. We evaluated content validity for both measures via feedback from an expert review panel consisting of eight doctoral students in school psychology, three PhD faculty members in a master’s program for ABA, and three PhD faculty members in a school psychology doctoral program. We used the feedback to develop and modify all measures until all members of the expert panel felt the measures would validly assess the variables of interest.
During observations, TI data were always collected first and separately from student outcomes. The process of collecting data included (a) entering the room discreetly, (b) ensuring the lesson was not within the first 10 min or approaching the last 20 min of the class period (to allow for sufficient observation time), (c) collecting TI data, (d) collecting student data, and (e) leaving the room discreetly. Due to the complexity of the TI observation system (see description), we determined it would be too difficult to collect TI and student data simultaneously.
Treatment integrity
TI was the primary dependent variable in this study. We collected TI data using a researcher-created form with both permanent product and direct observation components. The TI tool consisted of four sections: (a) permanent product review, (b) Direct Observation 1, (c) Direct Observation 2, and (d) praise score. Items reflected concepts presented in the 2-day training and manualized components of the level system. The consultant conducted each observation during the middle of each class period, outside of the first and last 10 min of class. After each observation, the first author derived scores for each section and an overall TI score. See Table 2 for a list of the components in the observation tool and how each component was measured.
Table 2. Components of Level System Targeted for Measurement of Treatment Integrity.
Note. Direct Observation 1 measured the extent to which the staff utilized the components of the level system throughout the total observation period, whereas Direct Observation 2 measured staff’s response to specific student behaviors via partial interval recording.
For the permanent product review, the observer identified whether the various components of the level system in their classroom were on display. Specifically, the observer scored the following as either “present” or “not present” in the room: (a) four to eight class rules with clear definitions posted, (b) posted levels with descriptors of what students have access to in each level, and (c) student names posted under the corresponding level they earned for that day. In addition, the first author collected information from the teacher related to (a) staff-developed behavioral scripts used the previous week, (b) tallied point sheets from the previous week, and (c) letters sent home to parents with information about their child’s behavior and point levels during the previous week.
For Direct Observation 1, the observer determined the extent to which staff used behavior scripts and how the staff directed the students to engage with the level system during the classroom observation. For example, the observer marked “yes,” “no,” or “not applicable” to whether staff used behavioral scripts to teach expected behaviors (“not applicable” was used when there was no opportunity to observe a lesson in which staff were reviewing behavioral expectations). In addition, the observer indicated whether students had access to level privileges listed on the posted level descriptions throughout the observation.
The Direct Observation 2 section of the tool required the first author to observe how staff responded to appropriate and inappropriate student behavior in accordance with the level system during a 15-min direct observation period (conducted after Direct Observation 1). When completing the Direct Observation 2 section, the observer used 30-s partial interval recording to determine whether there was an opportunity for classroom staff to implement some aspect of the level system. Depending on how many staff were working with students during the observation, the consultant split up the time to observe each staff member equally. In other words, if there were three staff members working with students, each was observed for 5 min. For consistency, the consultant always started with the staff member closest to the door and then moved around the room to observe other staff in a clockwise manner. This process was consistent with our goal of evaluating the effects of PF across classrooms, each of which utilized three total staff members.
During Direct Observation 2, if there was an opportunity, the observer recorded whether the system was implemented as planned. For example, if one or more students met the stated criterion of an appropriate behavior within the 30-s observation interval (e.g., followed staff’s directions the first time given), the observer recorded whether the observed staff member responded according to the specified intervention plan. Because some of the students’ appropriate behaviors were not discrete (e.g., remaining on task), we tabulated scores for each 3-min segment of the total 15-min observation. For each 3-min segment, if at least one student was demonstrating appropriate behavior in the class, we indicated there was one opportunity for staff to engage in the correct response (provide behavior-specific praise and give the student a point on the tally sheet). Thus, there were a total of five opportunities (one per 3-min segment over 15 min) for staff to (a) verbally praise a student for engaging in appropriate behavior using behavior-specific praise, and (b) mark a point on his or her data sheet. In contrast, if one or more students met the stated criterion of an inappropriate behavior (e.g., left the classroom without permission), the observer recorded whether staff engaged in the correct response within that 30-s time period. Overall TI per observation session was scored as the number of correct staff responses divided by the number of observed opportunities to respond in all areas of the level system.
For the last section of the tool, the praise score, the observer used the data collected during Direct Observation 2. The observer divided the number of times staff provided praise for appropriate student behavior by the number of times staff provided reprimands for inappropriate student behavior. The staff collectively earned one point in this area (scored dichotomously 1 or 0) if the ratio was at least four praise statements to each reprimand (based on recommendations from Alberto & Troutman, 2003).
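The scoring rules for the praise score and overall TI can be illustrated with a short sketch. This is one possible reading of the description above, not the authors' scoring procedure: the function names, section labels, and counts are hypothetical, and the handling of a session with no reprimands is an assumption because the text does not address that case.

```python
def praise_score(praise_count, reprimand_count, min_ratio=4):
    """Return 1 if praise-to-reprimand ratio is at least min_ratio:1, else 0."""
    if praise_count < 0 or reprimand_count < 0:
        raise ValueError("Counts cannot be negative")
    if reprimand_count == 0:
        # ASSUMPTION (not stated in the source): with no reprimands, the
        # criterion is treated as met only if some praise was delivered.
        return 1 if praise_count > 0 else 0
    # Multiplying instead of dividing avoids floating-point comparisons.
    return 1 if praise_count >= min_ratio * reprimand_count else 0


def overall_ti(sections):
    """Overall TI (%) = total points earned / total opportunities * 100.

    `sections` maps a section label to a (points, opportunities) pair.
    """
    points = sum(p for p, _ in sections.values())
    opportunities = sum(o for _, o in sections.values())
    if opportunities == 0:
        raise ValueError("No opportunities were observed")
    return 100.0 * points / opportunities


# Hypothetical observation session (illustrative values only).
session = {
    "permanent_product": (5, 6),
    "direct_observation_1": (2, 2),
    "direct_observation_2": (6, 8),
    "praise": (praise_score(praise_count=9, reprimand_count=2), 1),
}
print(round(overall_ti(session), 1))  # 82.4
```

Keeping each section as a (points, opportunities) pair also allows section-level percentages to be reported alongside the overall score, as was done on the graphs shared during PF sessions.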
Student engagement and disruptive behavior
We measured student engagement and disruptive behavior because research has indicated these two variables are predictive of later outcomes for students (Wang & Fredricks, 2014). We collected student academic engagement and student disruptive behavior data on a researcher-created measure adapted from the Behavior Observation System of Students (BOSS; Shapiro, 2004) and the Planned Activity Check (PLA-check; Cataldo & Risley, 1973; Cooper et al., 2007). We chose to use the PLA-check as the basis for measuring student outcomes due to its relative ease of use and its accuracy in the assessment of group behavior (Dart, Radley, Briesch, Furlow, & Cavell, 2016). We conducted these observations immediately after the TI observation.
We adapted the BOSS by combining the definitions of active and passive engagement into one construct: academic engagement. We considered students academically engaged if they were either actively (e.g., writing, reading aloud, raising hand, talking to the teacher about assigned material, talking to a peer about the assigned material, looking up a word in the dictionary) or passively (e.g., listening to a lecture, looking at a worksheet, silently reading assigned material, looking at the board during teacher instruction, listening to a peer respond to a question) engaged in assigned work. We also extended the recording interval to 30 s from 15 s on the BOSS to better incorporate the PLA-check (described in the next paragraph). Like the BOSS, we coded academic engagement using momentary time sampling. Finally, we amended the off-task behavior section of the BOSS by changing it to partial interval sampling for disruptive behavior. Disruptive behavior was defined as any behavior that caused the teacher or other students to become distracted, look in the student’s direction, interrupted the flow of instruction, interrupted the teacher or other students due to noise or movement, or involved the student physically or verbally engaging with another person in the room that removed them from engaging in the task (e.g., making audible noises, calling out, making unauthorized comments, hitting others, throwing objects).
Following the PLA-check method, we coded student outcomes for the entire class rather than observing just one student, as is done with the BOSS. Observation sessions proceeded as follows: the consultant (a) stood at the back of the room with the observation tool on a clipboard; (b) set two interval timers, one for 15 s and one for 30 s; (c) recorded a “+” during the appropriate 15-s interval if there was any disruptive behavior in the classroom; and (d) scanned the room quickly from left to right to record the number of academically engaged students when the interval timer vibrated to signal a 30-s interval. These observation procedures were repeated until the end of the 15-min observation period. Cooper and colleagues (2007) supported the use of momentary time sampling for continuous activities, such as engagement, and emphasized that in this context, momentary time sampling approximates the true occurrence of the behavior.
Academic engagement was calculated by adding all the 30-s intervals during which at least 80% of the students were engaged and dividing the sum by 30 (the total number of 30-s intervals). Disruptive behavior was calculated by adding all the 15-s intervals during which at least one student engaged in disruptive behavior and dividing the sum by 60.
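As an illustration only, the two calculations described above can be sketched in a few lines of Python. The function names, the class size of 10, and the session data are hypothetical and are not drawn from the study's observation records; the sketch assumes a 15-min session yields 30 momentary time samples and 60 partial-interval records.

```python
def engagement_percentage(engaged_counts, class_size, criterion=0.80):
    """Percentage of 30-s samples in which at least `criterion` of students were engaged."""
    if not engaged_counts:
        raise ValueError("no intervals recorded")
    met = sum(1 for count in engaged_counts if count / class_size >= criterion)
    return 100.0 * met / len(engaged_counts)


def disruption_percentage(disruption_marks):
    """Percentage of 15-s intervals marked for any disruptive behavior (partial interval)."""
    if not disruption_marks:
        raise ValueError("no intervals recorded")
    return 100.0 * sum(1 for mark in disruption_marks if mark) / len(disruption_marks)


# Hypothetical 15-min session in a class of 10 students (illustrative data only).
engaged_counts = [9] * 12 + [7] * 18            # 30 momentary time samples
disruption_marks = [True] * 15 + [False] * 45   # 60 partial-interval records

print(engagement_percentage(engaged_counts, class_size=10))  # 40.0
print(disruption_percentage(disruption_marks))               # 25.0
```

In this hypothetical session, 12 of the 30 samples met the 80% criterion (40%), and 15 of the 60 intervals contained disruptive behavior (25%).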
Treatment Acceptability
Using the Intervention Rating Profile–15 (IRP-15; Martens, Witt, Elliott, & Darveaux, 1985), we measured staff treatment acceptability for both the level system intervention and the consultant’s role in providing PF. The IRP-15 assesses teacher acceptability of an intervention through 15 statements (e.g., “I liked the procedures used in this intervention”) to which participants respond on a 6-point Likert-type scale (1 = strongly disagree, 6 = strongly agree). The IRP-15 was designed to measure general acceptability, and in a principal components factor analysis, it yielded one primary factor with item loadings ranging from .82 to .95 and a Cronbach’s alpha of .98 (Martens et al., 1985). In the current study, Cronbach’s alpha for staff responses was .94 for the level intervention and .90 for the consultant’s use of PF.
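For readers less familiar with the reliability coefficient reported above, the following is a minimal sketch of the standard Cronbach's alpha formula. The function name and the ratings are hypothetical; the example uses 3 items rather than the 15 on the IRP-15 for brevity, and it does not reproduce the study's data or analysis software.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item ratings."""
    k = len(responses[0])  # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings on a 6-point scale: 4 respondents x 3 items.
ratings = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
]
print(round(cronbach_alpha(ratings), 2))
```

Values closer to 1 indicate that the items vary together across respondents, which is the sense in which the coefficients reported above reflect high internal consistency.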
Inter-Observer Agreement
We collected IOA for all dependent variables in the baseline phase during Weeks 2 to 4 in all classrooms and during Week 8 in Classroom 4 (M = 62.5% of baseline sessions). In addition, we collected IOA data across all classrooms during the intervention phase in Weeks 10 and 11 (M = 44.8% of intervention sessions). Observers were doctoral graduate students enrolled in a school psychology program. The consultant trained the observers on the measures using the following process: (a) the consultant explained the forms to the observers, (b) the observers observed the consultant coding each measure in nonparticipating classrooms, (c) the consultant and observers then independently coded behaviors in nonparticipating classrooms, and (d) IOA was calculated. This process continued until each observer obtained at least .80 IOA with the consultant for each measure across two consecutive sessions. IOA was then calculated interval by interval, comparing the primary and secondary observational data for each interval (Cooper et al., 2007) for the time-sampling portion of the TI observation tool and both variables (academic engagement and disruptive behavior) measured through the class-wide observation tool. For TI, the average IOA, calculated interval by interval, across classrooms was 87% (range = 76%–100%). IOA for disruptive behavior/student engagement across classrooms was 90% (range = 89%–92%).
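A minimal sketch of interval-by-interval IOA, assuming each observer's record is a list with one entry per interval and that an agreement means the two entries are identical. The function name and the records below are hypothetical, not the study's data.

```python
def interval_ioa(primary, secondary):
    """Interval-by-interval agreement (%) between two observers' records."""
    if len(primary) != len(secondary):
        raise ValueError("observers must record the same number of intervals")
    if not primary:
        raise ValueError("no intervals recorded")
    agreements = sum(1 for a, b in zip(primary, secondary) if a == b)
    return 100.0 * agreements / len(primary)


# Hypothetical disruptive-behavior records for ten 15-s intervals (1 = "+").
primary = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
secondary = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

print(interval_ioa(primary, secondary))  # 90.0
```

Here the observers disagree on one of ten intervals, giving 90% agreement, which would exceed the .80 training criterion described above.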
Data Analysis
Evidence of a functional relation between PF and (a) staff TI and (b) student academic engagement and disruptive behavior was examined using baseline logic and visual analysis (Cooper et al., 2007). Visual analysis involves examining (a) level, (b) trend, (c) variability, (d) immediacy of effect, (e) overlap, and (f) consistency of data patterns within and between phases (Kratochwill et al., 2012).
We supplemented visual analyses with Tau-U, a metric which allows for an analysis of the overlap and nonoverlap of data points between phases in single-case design (Parker, Vannest, Davis, & Sauber, 2011). We used Tau-U because other effect size calculations presented in the literature (e.g., percentage of nonoverlapping data or PND) may produce inaccurate results due to procedural sensitivities (Pustejovsky, 2018). In addition, unlike other commonly used metrics, such as nonoverlap of all pairs (NAP), Tau-U accounts for changes in level and trend between phases. Furthermore, in a review of all nonoverlap indices, Parker and colleagues (2011) determined Tau-U was the least likely to produce skewed outcomes due to a ceiling effect, and Tau-U is only slightly affected by autocorrelation of the data (Brossart, Vannest, Davis, & Patience, 2014; Vannest & Ninci, 2015). Vannest and Ninci (2015) reported, “Tau-U can address the issues that are problematic for the other [effect sizes] by controlling for the baseline trend, handling smaller data sets, discriminating well at the upper and lower limits, and correlating well with other indices” (p. 407). We calculated Tau-U (A vs. B − Trend A) via the single-case effect size calculator (Pustejovsky, 2017).
There is no consensus on numerical guidelines for interpreting Tau-U; rather, Brossart and colleagues (2014) suggested considering factors such as length of the intervention, comparison data from other studies, and social significance of the effect when interpreting intervention effects. The Tau-U metric can be interpreted as the percentage of data that do not overlap between the baseline and treatment conditions (Parker et al., 2011). Tau-U values range between −1 and 1, where negative numbers are associated with a decrease in level between baseline and treatment phases.
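To make the metric concrete, the sketch below implements one commonly cited form of Tau-U with baseline trend correction: Kendall's S for all baseline-treatment pairs, minus Kendall's S for the baseline trend, divided by the number of baseline-treatment pairs. This is an assumption about the formula, offered for illustration; it is not a reproduction of the calculator used in the study, and the function names and data are hypothetical.

```python
def kendall_s(pairs):
    """Sum of signs of (second - first) over the supplied pairs; ties count 0."""
    return sum((b > a) - (b < a) for a, b in pairs)


def tau_u(baseline, treatment):
    """Tau-U (A vs. B minus trend A) for two phases of a single-case series."""
    m, n = len(baseline), len(treatment)
    if m == 0 or n == 0:
        raise ValueError("both phases need at least one observation")
    # Every baseline point compared with every treatment point.
    s_ab = kendall_s((a, b) for a in baseline for b in treatment)
    # Every earlier baseline point compared with every later baseline point.
    s_a = kendall_s(
        (baseline[i], baseline[j]) for i in range(m) for j in range(i + 1, m)
    )
    return (s_ab - s_a) / (m * n)


# Hypothetical percentages (illustrative only).
print(tau_u([20, 20, 20], [60, 70, 80, 75]))          # 1.0: flat baseline, no overlap
print(round(tau_u([10, 20, 30], [40, 50, 60]), 2))    # 0.67: rising baseline is penalized
print(round(tau_u([30, 30, 30], [10, 10, 40]), 2))    # -0.33: mostly lower in treatment
```

The second example shows why the trend correction matters: even with no overlap between phases, an upward baseline trend lowers the value, because some of the apparent improvement was already under way before the intervention.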
Results
Effect of Performance Feedback on Treatment Integrity
Our primary research question focused on examining changes in TI upon introduction of PF across the four classrooms. Figure 1 shows the graphed TI data for each classroom.

Figure 1. Graphed results of treatment integrity.
Classroom 1
During baseline for Classroom 1, the mean percentage of TI was 20% (range = 17%–23%) with little variability across observations (i.e., the data path was relatively flat and stable). Upon implementation of PF, TI increased to 84% after the first session. The mean percentage of TI during the PF phase was 84% (range = 74%–92%), and the data path remained relatively stable with a slight upward trend. Tau-U for Classroom 1 was 1.0, indicating no overlap between baseline and intervention phases.
Classroom 2
In Classroom 2, the mean percentage of TI was 32% (range = 19%–36%) during baseline. During initial baseline observation in Classroom 2, TI was lower, at 19%; however, during the three subsequent observations, TI remained at 36%. Upon implementation of PF, TI immediately increased to 62%, where an upward trend was observed for the next two sessions until leveling off at 88%. The mean for TI during the PF phase was 78% (range = 62%–88%). Tau-U for Classroom 2 was 1.0, indicating no overlap between baseline and intervention phases.
Classroom 3
During the baseline phase in Classroom 3, average TI was 36% (range = 29%–48%). The data path was relatively flat for the first 3 weeks of observation with a slight downward trend. During Week 4 of baseline, TI increased to 48% and decreased to 40% by Week 7. Upon implementation of PF, TI increased immediately to 70% and continued to increase for two subsequent weeks, demonstrating an upward trend, until it peaked at 92%. However, during the fourth and last observation, TI decreased to 72%. Tau-U for Classroom 3 was 1.0, indicating no overlap between baseline and intervention phases.
Classroom 4
Average TI for Classroom 4 during baseline was 21% (range = 15%–30%) with a relatively flat, stable data path. Upon implementation of PF, TI immediately increased to 60%. During the PF phase, TI averaged 56.5% (range = 52%–60%). The data path during intervention remained stable with little variability observed. Tau-U for Classroom 4 was 1.0, indicating no overlap between baseline and intervention phases.
Effect of Performance Feedback on Academic Engagement
Regarding academic engaged time, all classrooms evidenced a change in level between baseline and treatment phases (see Figure 2). As with TI, the differences between baseline and intervention for academic engagement were mostly related to level, not trend. In two of the four classrooms (Classrooms 1 and 4), there was a complete separation (i.e., no overlap of data points) between baseline and treatment.

Figure 2. Graphed results of student academic engagement and disruptive behavior.
Classroom 1
The baseline average for academic engagement in Classroom 1 was 44% (range = 35%–50%). Academic engagement trended slightly downward during baseline, ending at the lowest of the three observation points at 35%. Upon implementation of PF, academic engagement immediately increased to 84%, with an average of 88% across all observations during the PF phase (range = 78%–100%). During the initial three observations during the PF phase, the data bounced between 84%, 100%, and back to 80%. However, the last four data points of the PF phase represent a stable, upward trend. Tau-U for Classroom 1 was 1.0.
Classroom 2
For Classroom 2, baseline average for academic engagement was 71% (range = 60%–90%). Academic engagement ranged between 60% and 70% except for one data point, during the third observation, at 90%. Upon implementation of PF, academic engagement jumped from 70% to 82%. Average academic engagement during the PF phase was 86% (range = 82%–90%), and the data trended slightly upward until the last data point. Tau-U for Classroom 2 was .65, indicating a moderate degree of overlap between baseline and intervention phases.
Classroom 3
Baseline data for Classroom 3 averaged 77% (range = 60%–90%) and demonstrated a slight upward trend. At the beginning of the PF phase, the first data point was 100%, up from 90% at the end of the baseline phase. During the PF phase, academic engagement averaged 95% (range = 90%–100%) and remained relatively flat and stable. Tau-U for Classroom 3 was .83, indicating some overlap between baseline and intervention phases.
Classroom 4
In Classroom 4, the baseline data for academic engagement were variable and averaged 20% (range = 0%–40%). Although academic engagement was measured as high as 40% during one observation, the last two data points in the baseline phase were 0% academic engagement. During the first observation of the PF phase, academic engagement jumped to 50% and averaged 48% across the PF phase (range = 40%–60%). The data path during the PF phase was stable, ending with a slight upward trend. Tau-U for Classroom 4 was .92, indicating little overlap between baseline and intervention phases.
Effect of Performance Feedback on Disruptive Behavior
For disruptive behavior (see Figure 2), the data do not suggest an obvious functional relation between the consultant’s use of PF and students’ disruptive behavior given the relatively inconsistent data and small to no improvement in behavior during the intervention phase.
Classroom 1
The baseline average for disruptive behavior in Classroom 1 was 32% (range = 26%–35%) with a relatively flat and stable data path. Between the last data point in baseline (35%) and the first data point in the PF phase (10%), there was evidence of the immediacy of the effect. During the PF condition, the data remained relatively stable; the average disruptive behavior during intervention was 14% (range = 10%–30%). Tau-U for Classroom 1 was –.90, indicating little overlap between baseline and intervention phases.
Classroom 2
For Classroom 2, the average disruptive behavior during baseline was 6% (range = 3%–10%). Little variability was noted during baseline. Between the last data point in the baseline phase and the first data point in the PF phase, the rate of disruptive behavior increased from 5% to 19%. The average disruptive behavior observed during the PF phase was 11% (range = 3%–19%), and the data path was more variable during the PF phase than during baseline. Tau-U was .45, indicating a large degree of overlap and small, nontherapeutic treatment effects.
Classroom 3
For disruptive behavior in Classroom 3, the average during baseline was 18% (range = 6%–30%). The data path in baseline initially began with an upward trend, peaking at 30% and then trending downward to a final observation at 10%. The initial data point in the PF phase was also 10%, indicating no immediate change. The average disruptive behavior in the PF phase was 10% (range = 0%–20%). Data were less variable in the PF phase; however, Tau-U was –.45, indicating a large degree of overlap of the data between baseline and PF.
Classroom 4
In Classroom 4, the data path for disruptive behavior was highly variable during baseline (M = 50%, range = 23%–100%) with a steep upward trend for the last three data points. With the introduction of PF, disruptive behavior decreased substantially from 100% to 50%. Average disruptive behavior during the PF phase was 34% (range = 20%–50%) with considerably less variability than during the baseline phase. Tau-U for Classroom 4 was –.58, indicating a large degree of overlap between baseline and intervention.
Treatment Acceptability
Acceptability was evaluated with an average score (per adult participant) across all items on the IRP-15. Across all teachers, the average rating of the level system was 4.1 (range = 2.0–5.0), and across instructional assistants, the average rating was 3.9 (range = 1.0–4.0). Across all teachers, the average rating of the consultation services that integrated PF was 4.5 (range = 2.0–5.0), and across instructional assistants, the average rating was 4.2 (range = 3.0–5.0). These data suggest, on average, both the level system and PF-based consultation achieved a basic level of acceptability (4 = slightly agree on each IRP item) for the instructional staff.
Discussion
The primary goal of this study was to examine the extent to which PF was associated with increases in staff’s TI to a class-wide level system intervention for students with ED. We also examined whether PF related to teachers’ TI had effects on student behavior. One key finding was that classroom staff demonstrated clear improvements in TI corresponding with the use of PF. Before beginning the PF intervention, even after a relatively intensive 2-day training on behavioral principles and classroom management, implementation of the level system averaged only 28% TI across the classroom staff, which unfortunately is consistent with previous research investigating teachers’ implementation integrity after a day or two of professional development (e.g., Joyce & Showers, 2002). However, once PF began, the staff in our study increased their TI to an average of 78%. There were obvious improvements across all four classrooms, and improvements in TI were sustained throughout the duration of the study. The improvement in TI in this study is congruent with findings from Fallon et al.’s (2015) recent synthesis.
Another key finding of this study was that upon introduction of PF, students’ academic engagement improved across all four classrooms. In Classrooms 1 and 4, the level of improvement was consistent with the level of improvement of TI. However, in Classrooms 2 and 3, the level of improvement in TI was greater than the improvement in student academic engagement. It is not necessarily surprising that student behavior in some classrooms did not mirror the relative improvements in teachers’ TI. In their meta-analysis, Solomon, Klein, and Politylo (2012) determined PF often produced greater change in teacher behavior than student behavior. They explained these findings by emphasizing PF specifically targets the fidelity with which the intervention is delivered, and not with how effective the intervention is itself.
Relatedly, we cannot fully explain the relation between improvements in TI and students’ academic engagement in our study because our research design focused on examining the relationship between PF and TI. Although our study contributes to the limited research on the level systems and student behavior, an alternative research design would better address the relationship between level systems and student behavior. Future research that explores teachers’ TI as a potential mediator of improved student behavior is warranted. This is especially the case in the context of a class-wide level system intervention for students with ED, as the research in this area is still limited.
Our third main finding was that although we observed a decrease in student disruptive behavior upon introduction of PF in some classes, effects were small and inconsistent in comparison with the effects observed with TI and students’ academic engagement. The lack of improvement in student disruptive behavior in this study may be due to several issues. First, floor effects existed in Classroom 2, where students engaged in disruptive behavior during fewer than 10% of intervals observed during baseline. Alternatively, it is possible that a level system intervention for secondary students with ED—even when well implemented—is not effective in decreasing disruptive behavior. Again, future research is needed to explore these possibilities.
Limitations
Although we observed a functional relation between PF and TI across all classrooms, the staff in Classroom 4 never achieved 80% integrity, even after four sessions of PF. Our study was not designed to provide an analysis of why certain classrooms achieved greater outcomes than others. However, because the teacher in Classroom 4 had the most experience (29 years), future research could explore the relation between teaching experience and willingness to implement new interventions. Regardless, external validity of findings should be strengthened through systematic replication of the effect across multiple studies (Horner et al., 2005). In addition, because we randomly chose the period in which to monitor TI, it may be important to determine whether level of TI is dependent on the type of class period or subject. Future research should explore the subject–TI relationship by controlling for subject across classrooms (e.g., measuring TI and student outcomes only during math class).
In addition, with respect to the specific procedures used in this study, due to periodic teacher absences and state testing days, scheduled observations had to be postponed on certain weeks. As a result, on certain weeks there were no observations, and we therefore conducted no PF. Research suggests the more consistently PF is conducted, and the shorter the time between behavioral observations and delivery of PF, the more effective it is as a behavior change intervention (Solomon et al., 2012). However, even with occasional weeks when PF could not be conducted, the effects of PF on teachers’ TI were strong and consistent across each of the classrooms. This finding is promising because, in practice, the educational support staff working to consult and provide PF to classroom personnel may also experience similar periodic gaps in support.
Another limitation is that the measures were created for the purposes of this study, and their psychometric properties have not been fully explored. In addition, the observation tool did not measure all types of TI discussed in the current literature. For example, according to Sanetti and Collier-Meek (2014), TI is a multidimensional construct that includes content, quantity, and process-related dimensions. The TI tool in our study included observational elements and a permanent product review, but there was no consideration for the level of adherence or the quality of intervention delivery.
Finally, as we did not collect maintenance or generalization data for the effects of PF on TI, we cannot speak to the effect of systematically fading PF on the level of TI to the intervention, nor can we assess the extent to which teachers continued to use this intervention over time in the same setting or adapted the intervention for use in other settings. Researchers have consistently highlighted the importance of these endeavors (e.g., Horner & Sugai, 2018), and future researchers should examine these issues when designing studies. In this way, researchers will be better able to speak to the lasting impact of the intervention on behavior change.
Implications for Research and Practice
The current study adds to the literature on empirically based class-wide management strategies for students with ED in self-contained classrooms. In contrast to previous studies, this investigation emphasized improving TI to the overall system, as well as examining the indirect effect of PF on student behavior. Findings that PF was associated with large improvements in TI and some improvements in student behavior are particularly relevant given calls to ensure high integrity of intervention practices (Lane et al., 2011). As a result, future research should, at minimum, incorporate TI monitoring to demonstrate an intervention is being delivered as intended. Researchers should continue to evaluate the effectiveness of interventions for students with ED, as there is currently no consensus on which types of programs produce the most effective and sustainable positive outcomes for this population. To this end, future research on PF should include student outcomes, not just TI, as dependent variables.
Despite the limitations of this study and the need for additional research, our findings offer some implications for current practice in PF and TI. For example, when school teams implement new classroom management procedures, they should consider supporting staff through PF. For class-wide behavioral interventions containing multiple components for these students, well-intentioned staff may struggle to implement the intervention without the appropriate support. In general, based on the results from this study, a coach could efficiently conduct observations and provide PF to help substantially improve TI in about 30 to 45 min per week.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
