Abstract
The purpose of this meta-analysis was to quantitatively summarize the single-case research (SCR) literature on the use of behavior contracts with children and youth. This study examined the efficacy of behavior contracts on problem behaviors and academic behaviors across 18 SCR studies. Academic and behavioral outcomes were examined for 58 children and youth ages 5 to 21 using the TauU effect size index. Results indicated a moderate overall effect of behavior contracts (ES = .57; 95% confidence interval [CI95] = [0.55, 0.58]), with a range of effects across studies (ES = .27 to ES = 1.00). Moderator analyses indicated that behavior contracts are beneficial for students regardless of grade level, gender, or disability status. Findings suggest that the intervention is more effective in reducing inappropriate behaviors than increasing appropriate behaviors, and that academic outcomes are positively affected by behavior contracting.
Behavior contracting is a behavioral intervention within the applied behavior analysis literature that has been used to influence behavior change for more than 45 years (e.g., Bailey, Wolf, & Phillips, 1970). Major features of behavior contracts (also referred to as contingency contracts) include (a) clearly stating the behavioral expectations regarding behavior change, (b) incorporating rewards for adhering to the contract, and (c) specifying consequences for not meeting agreed-upon expectations (Kidd & Saudargas, 1988). Behavior contracts offer advantages such as cost-effectiveness and flexibility for designing individualized interventions. In addition, they provide structure, allow for the establishment of clear expectations, and can be extended across settings (Houmanfar, Maglieri, Roman, & Ward, 2008). Behavior contracts were most likely first used as written agreements between therapists and clients in therapeutic settings (Janz, Becker, & Hartman, 1984). However, they also have been used in a wide range of other settings including K-12 classrooms (Gurrad, Weber, & McLaughlin, 2002), college-level courses (Bristol & Sloane, 1974), homes (Weathers & Liberman, 1975), prisons (Clements & McKee, 1968), mental health facilities/psychiatric hospitals (Levendusky, Berglas, Dooley, & Landau, 1983), as part of family therapy (Stumphauzer, 1985a), outpatient rehabilitation facilities (Leslie & Schuster, 1991), substance abuse treatment settings (Lash et al., 2007), and probation programs (Stumphauzer, 1985b).
Individuals with disabilities have been participants in much of the research investigating the effectiveness of behavior contracts. Research has included students with emotional and behavioral disorders (EBD; Ruth, 1996), autism (Mruzek, Cohen, & Smith, 2007), attention deficit hyperactivity disorder (ADHD; Gurrad et al., 2002), intellectual disabilities (Jenkins & Gorrafa, 1974), and traumatic brain injury (Hufford, Williams, Malec, & Cravotta, 2012). In addition, several mental health studies have used behavior contracts as part of treatment for borderline personality disorder (L. J. Miller, 1990), post-traumatic stress disorder (Otto, Reilly-Harrington, Kogan, & Winett, 2003), and eating disorders (Solanto, Jacobson, Heller, Golden, & Hertz, 1994). Furthermore, improvements across a range of outcomes have been found with the use of behavior contracts. They include academic gains (Kelley & Stokes, 1982), improved social behaviors (Arwood, Williams, & Long, 1974), weight control (Rotatori, Switzky, & Fox, 1981), adherence to an exercise program (Leslie & Schuster, 1991), medication adherence (Hillman & Miller, 2009), patient compliance (Janz et al., 1984), and a reduction in aggressive and violent behaviors (Wallace, Teigen, Liberman, & Baker, 1973).
Behavior Contracts in School and Home Settings
Behavior contracting has been used to address problem behaviors in both school (e.g., Homme, Csanyi, Gonzalez, & Rechs, 1969) and home settings (e.g., Wahler & Fox, 1980). Most of the research on the efficacy of behavior contracts in school settings has focused on reducing challenging and disruptive behavior, or promoting appropriate behavior. Behaviors addressed with contracting have included the following: remaining seated (Allen, Howard, Sweeney, & McLaughlin, 1993), engaging in appropriate social interactions with peers (Arwood et al., 1974), reducing disruptive behaviors (DeMartini-Scully, Bray, & Kehle, 2000), demonstrating on-task behavior (Flood & Wilder, 2002), decreasing loud vocalizations (Hawkins et al., 2011), following class rules (Mruzek et al., 2007), and attending school (MacDonald, Gallimore, & MacDonald, 1970). School settings in which behavior contracting has been used include general education (DeMartini-Scully et al., 2000) as well as special education classrooms (Diaddigo & Dickie, 1978). Behavior contracts have been used in studies investigating academic outcomes like writing skills (Newstrom, McLaughlin, & Sweeney, 1999), percent of items answered correctly (Kelley & Stokes, 1984), and homework completion (D. L. Miller & Kelley, 1994).
Likewise, behavior contracts have been used in home environments to address problem behaviors such as adhering to curfew (Weathers & Liberman, 1975; Welch, 1985), controlling one’s temper, and maintaining household chores and responsibilities (Welch, 1985). Also, the impact of behavior contracting on social behaviors during playtime (viz., reducing oppositional and aggressive behaviors) has been examined (Wahler & Fox, 1980).
Meta-Analysis and Single-Case Research (SCR)
Meta-analysis within SCR is increasingly being used to quantitatively synthesize studies and make statements regarding the external validity and generalizability of a given literature base (Shadish, Rindskopf, & Hedges, 2008). SCR has been used to identify a range of interventions among a wide range of participants. This method of inquiry has been identified as a valid approach to identifying practices that are evidence-based (Horner et al., 2005). The use of effect sizes to summarize literature within meta-analysis allows for determining the size or magnitude of academic or behavioral change (Parker & Hagan-Burke, 2007). Determining the magnitude of an intervention effect is an important aspect of identifying evidence-based practices (Council for Exceptional Children, 2008).
SCR is generally regarded as having strong internal validity (i.e., the ability to document intervention effects) through the use of designs that can document a functional relationship (e.g., reversal, multiple baseline, alternating treatment), and the replication of intervention effects (Horner et al., 2005). However, SCR also has weaker external validity and requires systematic replication to show the generalizability of an intervention. External validity in SCR is strengthened by the replication of the effects of an intervention through multiple studies that include a range of participants, conditions, outcome variables, settings, and researchers (Horner et al., 2005). Meta-analysis provides an effective and efficient method for systematically analyzing the replications within SCR (Parker & Hagan-Burke, 2007). Meta-analytic reviews are critical to establishing the evidence base for effective behavioral interventions and practices as they allow “researchers to arrive at conclusions that are more accurate and more credible than can be presented in any one primary study or in a non-quantitative, narrative review” (Rosenthal & DiMatteo, 2001, p. 61). Although studies using SCR designs are often excluded from meta-analyses (Allison & Gorman, 1993), it is becoming more common for data from SCR studies to be quantitatively summarized to determine evidence-based practices (Parker, Vannest, Davis, & Sauber, 2011).
Effect sizes provide a common metric for analyzing SCR data across studies (Parker & Hagan-Burke, 2007). As with studies using group designs, they are essential for the aggregation and comparison of results across studies. Confidence intervals complement effect size reporting and are needed for accurate interpretation (Cooper, 2011; Hunter, Schmidt, & Jackson, 1982; Thompson, 2002, 2007). An effect size, reported with its confidence interval, conveys the size of an effect relative to other treatments. It provides a standard metric for comparison and aggregation, thus providing data that are readily understood and interpretable through visual analysis (Parker et al., 2011). Moreover, effect sizes with confidence intervals are a preferred reporting approach by the American Psychological Association (APA; 2010; Wilkinson & APA Task Force on Statistical Inference, 1999).
Purpose and Research Questions
The purpose of this meta-analysis was to quantitatively summarize the SCR literature on the use of behavior contracts with children and youth. Behavior contracts have a robust history in applied behavior analysis. However, a meta-analysis has not been conducted to date, and reviews of literature on the intervention could not be found. This meta-analysis examined the effect of behavior and contingency contracts across SCR studies, as most of the behavior contracting research involving children and youth has used SCR designs. The current study was designed to contribute to the behavior contracting literature in several ways. In particular, this meta-analysis (a) is the first meta-analysis conducted on behavior contracts, (b) examined the impact of behavior contracts on multiple behaviors across settings, (c) examined the effectiveness of behavior contracts for children and youth with and without disabilities across grade levels, and (d) reported effect sizes with confidence intervals based on APA standards for reporting effect sizes (APA, 2010). Overall, two main research questions were addressed: (a) What are the overall effects of behavior contracts? and (b) What are the effects of potential moderators on academic as well as behavioral outcomes?
Method
Literature Search and Inclusion Criteria
An electronic search of the literature was conducted to identify relevant studies using the Education Full Text, Educational Resources Information Center (ERIC), PsycINFO, and MEDLINE databases. To help reduce publication bias, searches were not limited to peer-reviewed publications; the Theses/Dissertations database was included. Searches were not restricted to a date range. Search terms included behavior contract, contingency contract, and learning contract. These terms were searched independently and in combination with emotional disturbance, behavior disorder, attention deficit hyperactivity disorder, autism, conduct disorders, delinquents, and absenteeism. Following the searches, a review of both the title and abstract was conducted to determine if the article required further inspection. In addition, an ancestral search was conducted of possible articles located in the reference lists of the initial studies gathered from the database searches. Titles and abstracts were reviewed to decide whether further examination was necessary. A total of 250 articles were found and considered for inclusion. To be included in this meta-analysis, studies had to (a) use an SCR design; (b) implement a behavior contract to reduce problem behavior or increase appropriate or desired behavior; (c) involve elementary, middle, or high school–age participants; (d) be published (in a peer-reviewed journal) or conducted (as dissertation research or an unpublished article) between 1969 and 2013; and (e) be reported in English. Articles were excluded if they (a) were duplicate studies (viz., studies that appeared in more than one database search); (b) described home-school contracts; (c) focused on health- or mental health-related outcomes; (d) were case studies; (e) described studies but did not report data; or (f) only provided a description of behavior contracts. A total of 18 articles were found that met the inclusion criteria.
Publication Bias
Publication bias, the tendency for studies yielding statistically significant results to be published (Rosenthal & DiMatteo, 2001), was tested in WinPepi (Abramson, 2011) using Egger’s test (Egger, Smith, Schneider, & Minder, 1997). Although unpublished studies were sought for inclusion, the intercept for Egger’s test (2.09, 90% confidence interval [CI90] = [1.23, 2.94], p = .005) suggested publication bias. However, sensitivity analyses within a fixed effects model in WinPepi indicated that no single study had an undue impact on the findings. Heterogeneity was measured using Higgins and Thompson’s H and I² statistics (Higgins & Thompson, 2002), where H = 2.7 (CI95 = [2.3, 3.3]) and I² = 86.8% (CI95 = [80.5, 91.0]). Although these results indicate considerable heterogeneity, they should be interpreted with caution because the test has low power when few studies are included (Higgins & Thompson, 2002). The I² statistic indicated that most of the observed variance is reliable rather than the result of sampling error.
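The relationship between H and I² reported above can be illustrated with a short sketch. It assumes the standard definitions from Higgins and Thompson (2002), H² = Q/(k − 1) and I² = (H² − 1)/H², and a fixed-effect (inverse-variance) pooled estimate; it is not WinPepi's implementation, and the function names are hypothetical.

```python
import math


def i_squared_from_h(h):
    """Convert H to I^2 (percent), assuming I^2 = (H^2 - 1) / H^2."""
    h_squared = h ** 2
    # I^2 is set to zero when H^2 <= 1 (no excess variation beyond chance).
    if h_squared <= 1.0:
        return 0.0
    return (h_squared - 1.0) / h_squared * 100.0


def heterogeneity(effects, std_errors):
    """Return Cochran's Q, H, and I^2 (percent) for a set of study effects.

    Assumes a fixed-effect model with inverse-variance weights.
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two effects with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * es for w, es in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (es - pooled) ** 2 for w, es in zip(weights, effects))
    h = math.sqrt(q / (len(effects) - 1))
    return q, h, i_squared_from_h(h)
```

Under these assumptions, the rounded value H = 2.7 corresponds to an I² of roughly 86%, in line with the reported 86.8%; the small gap is consistent with H having been rounded before reporting.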
Coding Reliability and Intercoder Reliability
The first author operationally defined all variables and coded all 18 studies in an Excel spreadsheet. Two graduate students were trained on the codes and independently coded a group of studies across all variables using separate Excel spreadsheets. Thus, each study was double- or triple-coded. Reliability was calculated for a randomly selected 30% (n = 5) of the studies across the following 13 variables: number of participants, grade level, gender, ethnicity, disability, target behaviors, whether other interventions were used in conjunction with the behavior contract, intervention length (duration), interobserver agreement (IOA), fidelity, implementer, setting, and SCR design. The formula used for inter-coder reliability was (number of agreements / [number of agreements + number of disagreements]) × 100 (House, House, & Campbell, 1981). Initial agreement across all 13 variables was 81%. Disagreements were resolved after the graduate students reread and discussed articles, resulting in 100% final agreement across all codes. The coding guide is available from the first author on request.
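The agreement formula above can be sketched as follows. The function name and the example codes are hypothetical; the sketch assumes that two coders' entries for the same variables are stored as equal-length lists.

```python
def percent_agreement(coder_a, coder_b):
    """Percent agreement: agreements / (agreements + disagreements) x 100."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must rate the same set of items")
    if not coder_a:
        raise ValueError("at least one coded item is required")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    disagreements = len(coder_a) - agreements
    return agreements / (agreements + disagreements) * 100


# Hypothetical codes for four variables of one study (not data from this review).
first_coder = ["elementary", "male", "ADHD", "classroom"]
second_coder = ["elementary", "male", "EBD", "classroom"]
agreement = percent_agreement(first_coder, second_coder)  # 3 of 4 codes match
```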
Potential Moderators
Studies were coded across four potential moderator variables: (a) grade level, (b) gender, (c) disability status, and (d) target behavior. Variables were derived from the behavior contract literature base, and were selected because they have been used in previous meta-analyses examining students’ behavioral and academic outcomes (e.g., Bowman-Perrott, Burke, Zhang, & Zaini, in press; Reid, Gonzalez, Nordess, Trout, & Epstein, 2004; Rohrbeck, Ginsburg-Block, Fantuzzo, & Miller, 2003).
Grade level
Grade level was represented by two levels: “elementary” (or elementary-age participants) and “secondary” (or secondary-age participants). Elementary grades included kindergarten through Grade 6. Secondary grades represented middle and high school Grades 7 through 12. Secondary also included high school–age students identified as dropouts. Participants in studies conducted in homes were categorized based on the grade level or age reported in individual studies.
Gender
Participant gender was coded as “female” or “male” for the studies reporting these data.
Disability or at-risk status
Studies were coded based on participants’ reported primary disability. Studies that identified students with attention deficit hyperactivity disorder under the other health impairment (OHI) category were coded “ADHD.” Studies involving students with “EBD,” learning disabilities (“LD”), or autism or autism spectrum disorder (“ASD”) were coded as such. Students at-risk for EBD identification or continued involvement with the criminal justice system were coded as “at-risk.”
Target behavior
Behaviors were coded as “appropriate,” “inappropriate,” or “academic responses.” Examples of appropriate behaviors were appropriate social interactions, remaining seated, completing assignments, and completing household chores. Inappropriate behaviors included ignoring teachers’ requests, physical aggression, using profanity, and curfew non-compliance. Academic responses included homework and in-class work completion and percentage correct on work assignments.
Calculation of Effect Size
TauU
TauU is an effect size measure based on non-overlap between phases that can also control for confounding baseline trends (Parker et al., 2011). It is derived from Kendall’s Rank Correlation (an index of trend) and the Mann–Whitney U test between groups (an index of non-overlap; Parker et al., 2011). The proportion of pairwise comparisons that improve from Phase A to Phase B, or the percentage of non-overlapping data, is obtained with TauU. Data from all AB phases were analyzed, with the exception of maintenance phases. Advantages of TauU include the following: (a) it has more statistical power than many other non-overlap or non-parametric indices; (b) it is distribution-free and is suitable for ordinal and interval scaled scores; (c) it complements visual analysis; (d) it avoids ceiling effects; and (e) it controls for level, trend, and pre-existing baseline trend (Parker et al., 2011).
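The pairwise logic behind this index can be sketched in a few lines. This is a simplified illustration rather than the TauU calculator used in this study: it assumes the commonly cited form in which the A-versus-B pairwise score, optionally reduced by the Kendall-style trend score within baseline, is divided by the number of A-B pairs. It also assumes that higher scores represent improvement; for behaviors targeted for reduction, the sign would be reversed. Function names and data are hypothetical.

```python
def pairwise_score(first, second):
    """S between phases: pairs where second > first minus pairs where second < first."""
    return sum((b > a) - (b < a) for a in first for b in second)


def trend_score(series):
    """Kendall-style S within one phase: each point compared with every later point."""
    return sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(len(series))
        for j in range(i + 1, len(series))
    )


def tau_u(baseline, treatment, correct_baseline_trend=False):
    """Non-overlap between Phase A and Phase B, optionally adjusted for baseline trend."""
    if not baseline or not treatment:
        raise ValueError("both phases need at least one data point")
    s = pairwise_score(baseline, treatment)
    if correct_baseline_trend:
        # Assumption: baseline trend is removed by subtracting its S statistic.
        s -= trend_score(baseline)
    return s / (len(baseline) * len(treatment))


# Hypothetical AB series (e.g., percentage of intervals on task).
phase_a = [2, 3, 2, 4]
phase_b = [5, 6, 7, 6, 8]
tau_no_correction = tau_u(phase_a, phase_b)
tau_trend_corrected = tau_u(phase_a, phase_b, correct_baseline_trend=True)
```

In this made-up series every treatment point exceeds every baseline point, so the uncorrected value is at its maximum, and the mild upward baseline trend lowers the corrected value.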
TauU phase contrasts and data coding
The GetData Digitizer program (version 2.25) was used to scan and code graphed data. Graphed data from A and B phases were extracted from articles and transformed into raw numerical data by setting a scale based on the X and Y values for each phase. An effect size was calculated for each AB contrast (e.g., an effect size for the A1/B1 contrast and a separate effect size for the A2/B2 contrast). Digitized data values were then entered into the TauU calculator (Vannest, Parker, & Gonen, 2011) to obtain TauU and standard error of TauU (SETau) values. TauU and SETau values were entered into WinPepi (Abramson, 2011) using the meta-analysis function to aggregate the data and arrive at an effect size and confidence interval (CI) for each study. Specifically, the following data analysis functions were selected: (a) compare 2, (b) meta-analysis; analysis of stratified data, (c) others, or proportions or rates with effect sizes/CIs, and (d) also enter standard error. Each AB contrast, or “stratum,” was entered, and all contrasts were “combined” (Abramson, 2012). The effect size for each study was then entered into WinPepi to obtain an overall effect size across all studies.
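The aggregation step described above (combining TauU and SETau values across AB contrasts, then across studies) can be approximated with a fixed-effect, inverse-variance weighted average. The sketch below assumes that approach and a normal-theory confidence interval; it does not reproduce WinPepi's internal procedure, and the function name and input values are hypothetical.

```python
import math


def pool_fixed_effect(effects, std_errors, z=1.96):
    """Inverse-variance weighted mean effect, its standard error, and a CI.

    z = 1.96 gives an approximate 95% confidence interval.
    """
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("effects and standard errors must be non-empty and aligned")
    # More precise contrasts (smaller SE) receive larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * es for w, es in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)


# Hypothetical TauU and SETau values for the A1/B1 and A2/B2 contrasts of one study.
study_es, study_se, study_ci = pool_fixed_effect([0.4, 0.8], [0.1, 0.1])
```

The same function could then be applied to the study-level effect sizes and standard errors to obtain an overall estimate.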
Statistical significance
Statistical significance for TauU values was determined using CI95. When determining if change is reliable, a 90% to 95% confidence interval is standard (Nunnally & Bernstein, 1994), indicating a reasonable chance of 5% to 10% likelihood of error. Statistical significance between TauU values was determined by calculating CI83.4 to visually test for overlap of upper and lower limits between effect sizes. Visual comparison of two effect sizes with CI83.4 is equivalent to a p = .05, or 95% confidence-level, test between the two scores (Goldstein & Healy, 1995; Payton, Greenstone, & Schenker, 2003).
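The CI83.4 overlap comparison can be sketched as follows, assuming normal-theory intervals built from an effect size and its standard error. The function names are hypothetical, and the intervals are not truncated at the TauU bounds.

```python
from statistics import NormalDist


def confidence_interval(effect, std_error, level=0.834):
    """Two-sided normal-theory interval; level=0.834 gives CI83.4."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return effect - z * std_error, effect + z * std_error


def intervals_overlap(first, second):
    """True if two (lower, upper) intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


def differ_at_05(effect_1, se_1, effect_2, se_2):
    """Treat non-overlapping CI83.4 intervals as a difference at roughly p = .05."""
    return not intervals_overlap(
        confidence_interval(effect_1, se_1),
        confidence_interval(effect_2, se_2),
    )
```

The multiplier implied by an 83.4% interval is about 1.39 standard errors, compared with 1.96 for CI95, which is why two CI95 intervals can overlap slightly even when the CI83.4 comparison indicates a difference.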
Results
Participant Characteristics
A total of 58 participants ages 5 through 21 were included in the 18 SCR studies analyzed in this review. Participants in most studies were in kindergarten through Grade 12. Of note, Kelley and Stokes (1982) included older youth (e.g., age 21), as they targeted high school dropouts in their study. Almost an equal number of studies focused on secondary students (n = 11) and elementary school students (n = 9); two included both elementary and secondary students (Hawkins et al., 2010; Navarro, Aguilar, Aguilar, Alcalde, & Marchena, 2007). Fifteen studies reported participant gender; 3 did not. Of the studies reporting these data, 41 participants were male (71%) and 17 were female. Only five studies reported participant ethnicity. Ethnicities represented among those studies were Caucasian, African American, Hawaiian, and Hispanic.
Participants were described in several ways across the 18 studies. They were described as having trouble completing academic work or performing below average academically (n = 3), displaying disruptive or inappropriate behaviors (n = 3), being juvenile delinquents (e.g., engaging in robbery) or wards of the court (n = 2), dropouts (n = 2), being verbally and non-verbally aggressive (n = 1), and truant (n = 1). Students with ADHD were identified as participants in 3 studies, students with EBD in 2 studies, students with ASD in 5 studies, and students with LD in 1 study. Four studies identified youth at-risk for being identified as EBD or for long-term involvement with the criminal justice system. Participants in other studies were described as simply demonstrating inappropriate classroom behaviors (e.g., laughing, talking when they were not supposed to). For example, Williams, Long, and Yoakley (1972) said that participants in their study “would have been considered ideal in many schools, [but] they were selected as targets because the teacher deemed them to be the most disruptive students in her class” (p. 330).
Target Behaviors
Target behaviors measured across the 18 studies were analyzed by category: inappropriate behaviors, appropriate behaviors, and academic responses. Four studies addressed problem behaviors (e.g., destructive behaviors, off-task behavior, inappropriate social interactions); nine focused on appropriate behaviors (e.g., displaying appropriate social behavior, on-task behavior); and six studies examined the impact of behavior contracts on academic outcomes (e.g., homework completion, percentage of correct answers). One study examined both behavioral and academic outcomes. Three of the studies investigated the efficacy of behavior contracts with regard to school attendance/truancy and/or keeping curfews at home.
Intervention Characteristics
All studies used some form of behavior or contingency contracting. However, four studies described the use of behavior contracts with another intervention. DeMartini-Scully et al. (2000) used a mystery motivator and token economy, whereas Flood and Wilder (2002) incorporated Functional Communication Training (FCT). D. L. Miller and Kelley (1994) used goal setting, and Navarro et al. (2007) used a token economy system. Intervention length was reported in days, weeks, or months in 11 of the 18 studies. The average intervention length was 149 days (range 28-270 days), or approximately 4½ months. Only one study (Navarro et al., 2007) collected maintenance data.
Teachers implemented the behavior contracts in most of the studies (n = 12), whereas parents were implementers in two studies (D. L. Miller & Kelley, 1994; Welch, 1985). A teacher and researcher together carried out the intervention in another study (Newstrom et al., 1999), and a researcher administered the behavior contract in one study (Gurrad et al., 2002). A therapist or counselor implemented the intervention in two studies; the implementer in one of the studies was identified as an undergraduate psychology student (Flood & Wilder, 2002).
Behavior contract interventions generally occurred in classroom settings (n = 11). Other settings included the hallway outside of a classroom (n = 1), across school settings (n = 1), in homes (n = 4), and during a therapy session (n = 1). Interventions were carried out in rural (Allen et al., 1993), suburban (Martinez, 1986), and urban (DeMartini-Scully et al., 2000) settings.
Design Characteristics
Several SCR designs were used across studies. Eight studies used multiple baseline designs (across participants, tasks, or behaviors). Three studies used reversal designs (e.g., ABC, ABAB), six used multi-element designs (e.g., ABCB, ABCBAD), and one used a changing criterion design (Mruzek et al., 2007). IOA was reported in only 10 studies, 5 of which examined behavioral outcomes, 4 of which investigated academic outcomes; 2 reported data for both academics and behavior. Average IOA was 92% for the behavior studies and 98% for the academic studies. Seven of the 10 studies reporting IOA reported the percentage of observations or academic tasks for which reliability was coded (range 25%-100%; Kratochwill et al., 2010). Fidelity of implementation data were reported in only 2 studies; fidelity was 100% in each.
Overall Effect
In response to the first research question, the overall effect of the use of behavior contracts across all 18 studies was ES = .57 (SE = .01, CI95 = [0.55, 0.58]). Thus, a moderate overall effect can be attributed to the use of behavior contracts to decrease problem behaviors and increase appropriate behaviors in children and youth. Figure 1 illustrates the range of effect sizes and confidence intervals across all of the studies at a 95% confidence level. That is, there is a 95% certainty that the true value for the obtained effect size fell between the upper and lower limits of the calculated confidence interval. As illustrated, effect sizes for individual studies ranged from .27 (CI95 = [0.15, 0.39]) to 1.00 (CI95 = [0.66, 1.00]).

Figure 1. Forest plot for the effects of behavior contracts on participant behaviors.
Findings for Potential Moderators
Potential moderators examined were grade level, gender, disability or at-risk status, and target behaviors. Results address the second research question. Table 1 illustrates the range of effect sizes and CIs across the potential moderator variables at an 83.4% confidence level.
Table 1. Potential Moderator Variable.
Note. CI = confidence interval; LL = lower limit; ES = effect size (TauU); SE = standard error (SETau); UL = upper limit; LD = learning disabilities; EBD = emotional and behavioral disorders; ADHD = attention deficit hyperactivity disorder; ASD = autism spectrum disorder.
Grade level
Findings for grade level revealed similar results for elementary (ES = .55, SE = .01, CI95 = [0.53, 0.57]) and secondary students (ES = .54, SE = .01, CI95 = [0.53, 0.56]). The elementary group represents 27 participants; the secondary group represents 30 participants. Because the average age of participants in the Kelley and Stokes (1982, 1984) studies was 18 and 17, respectively, these studies were coded with the secondary grade-level data. Overlapping CI83.4 values indicate that there was no statistically significant difference in the effect of behavior contracts across grade levels.
Gender
In studies that reported data on participant gender, a slightly larger effect size was obtained for males (ES = .58, SE = .02, CI95 = [0.53, 0.63]) than females (ES = .51, SE = .04, CI95 = [0.43, 0.59]). Overlapping CI83.4 values indicate that there was no statistically significant difference between males and females. Males represented 44 participants; females represented 7 participants.
Disability and at-risk status
Eleven studies included students with identified disabilities: LD, EBD, ADHD, and ASD. Four studies included students at-risk for identification as having an EBD or for continued involvement in the criminal justice system. Children and youth with ADHD (ES = 1.00, SE = .18, CI95 = [0.64, 1.00]; n = 3) appeared to benefit most from behavior contracts, followed by those with EBD (ES = .86, SE = .17, CI95 = [0.52, 1.00]; n = 2), LD (ES = .78, SE = .07, CI95 = [0.64, 0.94]; n = 6), and ASD (ES = .65, SE = .07, CI95 = [0.52, 0.78]; n = 5). The effect size for participants identified as at-risk was ES = .66 (SE = .05, CI95 = [0.56, 0.76]). There was no statistically significant difference between participants in each of the disability groups or students identified as at-risk, as determined by CI83.4 overlap. Although there were few participants in each of the disability groups and the at-risk group, TauU weights the number of observations in each study by the inverse of the variance.
Target behaviors
Participants seemed to gain the most benefit from behavior contract interventions that examined academic responses (ES = .60, SE = .02, CI95 = [0.55, 0.65]; n = 6). A reduction in inappropriate behaviors (ES = .57, SE = .04, CI95 = [0.50, 0.64]; n = 4) obtained a larger effect size than an increase in appropriate behaviors (ES = .44, SE = .03, CI95 = [0.38, 0.51]; n = 9). The number of studies for target behaviors equals 19 because 1 study collected data for inappropriate and appropriate behavior; data were analyzed separately (see Table 2). Although there was no statistical difference between inappropriate and appropriate behaviors, non-overlapping CIs indicate that academic responses were significantly different from the other two types of behaviors.
Table 2. Behavior Type Moderator Data.
Discussion
Meta-analysis is an essential tool in establishing the evidence base for effective behavioral interventions, as it allows for the systematic synthesis of data across studies. The purpose of this meta-analysis was to quantitatively summarize the SCR literature on the use of behavior contracts with children and youth. Surprisingly, in more than 40 years of published studies on behavior contracting, relatively few studies were found. That is, very little research has been conducted on an intervention that has such a long history (e.g., Bailey et al., 1970) and that has been identified as an empirically supported practice (Simonsen, Fairbanks, Briesch, Myers, & Sugai, 2008). This study represents the first meta-analysis on behavior contracts. Two main research questions were posed. First, what are the overall effects of behavior contracts? Second, what are the effects of potential moderators on academic as well as behavioral outcomes?
Overall Effect
The overall effect size for behavior contracts across appropriate, inappropriate, and academic responses was .57 (SE = .01, CI95 = [0.55, 0.58]), indicating that a moderate effect on behavior change can be attributed to this intervention. Unfortunately, no previous meta-analyses or reviews of literature were found with which to compare the effect size results of the current study. These results are cautiously interpreted as indicating that behavior contracts have a modest overall effect and that not all students who receive the intervention may be responsive to it. A range of effect sizes was found (ES = .27-1.00), and it is clear that some studies found the intervention to be more effective than others. Some studies suggested behavior contracts may be most beneficial as part of a multi-component package. For example, two studies that found strong effects incorporated other interventions. FCT was used along with behavior contracting in the Flood and Wilder (2002) study and a token economy was used in the Navarro et al. (2007) study. Other studies used a functional assessment in conjunction with behavior contracts; this might also result in promoting improved behavioral outcomes (e.g., Gurrad et al., 2002; Mruzek et al., 2007).
Potential Moderators and Implications for Practice
Potential moderators examined were grade level, gender, disability or at-risk status, and target behavior. Although only target behaviors revealed a significant difference, practical significance can be attributed to each of the results. First, findings revealed that behavior contracts are as effective for elementary-age children as for secondary-age youth. Thus, behavior contracting can be used across age groups to promote prosocial behaviors (e.g., Arwood et al., 1974). A benefit of using behavior contracts in schools is that they can be easily used by teachers, school psychologists, and other educators (Downing, 1990). Moreover, they are cost-effective and flexible, and can be incorporated into behavior intervention plans. Second, results of this meta-analysis suggested that males benefitted slightly more from the use of behavior contracting than females. It is important to identify empirically supported interventions that promote positive behavior change for all children and youth. However, boys in particular are at greater risk of being identified with behavior problems (Loeber, Green, Lahey, Frick, & McBurnett, 2000), and are more likely to engage in disruptive behaviors (Loeber, Green, Keenan, & Lahey, 1995). In addition, there could be an interaction between gender and behavior type that mediates treatment outcomes. For example, males who exhibit aggressive behaviors may respond differently to and benefit more from the use of behavior contracts than males who display disruptive behaviors.
Third, it was encouraging to note that students in each of the disability categories and the at-risk group achieved some benefit from the intervention. Students with ADHD benefitted the most, followed by students with EBD, and students with LD. Behavioral interventions that hold promise of supporting students with ADHD, EBD, and LD in becoming more successful are needed, as these students often display challenging behaviors. This is a particularly important finding for these three groups of students because they have been consistently identified in the literature as being at greatest risk of disciplinary exclusion (Bowman-Perrott, Benz, Hsu, Kwok, Eisterhold, & Zhang, 2011). In addition, behavior contracting seems to be an intervention that could be incorporated into behavior intervention plans to improve either prosocial behavior or academic outcomes.
Fourth, the use of behavior contracts had a moderate effect on academic responses, demonstrating positive behavior change. It is encouraging to note that the use of behavior contracts can support academic behaviors, as it does appropriate behaviors. This finding is relevant because a number of children and youth with behavior challenges also experience academic problems (Algozzine, Wang, & Violette, 2011; Burke, Boon, Hatton, & Bowman-Perrott, in press). Fifth, with regard to target behavior, the intervention had a larger effect on inappropriate behaviors than on appropriate behaviors. This difference was statistically significant, as illustrated by non-overlapping CIs (see Table 1). This finding is similar to the results from the Good Behavior Game meta-analysis (Bowman-Perrott, Burke, Zaini, Zhang, & Vannest, in press), as this behavioral intervention also had a greater influence on reducing inappropriate behaviors than increasing appropriate behaviors. Three of the four studies that documented the largest effects (Gurrad et al., 2002; Flood & Wilder, 2002; Newstrom et al., 1999; Mruzek et al., 2007) focused on the reduction of challenging behaviors. It is possible that the settings in which these three studies took place (special education classrooms in two studies and a therapy session in the third) influenced the efficacy of behavior contracting. Perhaps the smaller or one-on-one setting and more individualized attention affected participants’ outcomes.
Limitations and Future Research
The findings of this meta-analysis should be considered in light of the following limitations and need for further research. First, findings may not be generalizable to settings other than schools, as most of the interventions took place in elementary or secondary schools. Second, many of the studies on the use of behavior contracts with children and youth are dated (e.g., Arwood et al., 1974; Williams et al., 1972). Date of publication may have an impact on study quality and outcomes, as quality standards for SCR have been developed fairly recently (Cooper, 2011; Horner & Kratochwill, 2012; Kratochwill et al., 2010). Third, because there were so few studies available, the design quality standards proposed by Kratochwill et al. (2010) were not used as an inclusion criterion. Thus, studies may vary in quality and in their ability to demonstrate experimental control (e.g., at least three demonstrations of the effect of the intervention at three points in time, phases with a minimum of three data points; Kratochwill et al., 2010). These proposed standards are now being applied to evaluate the quality of research in the behavioral literature in many areas (Maggin, Briesch, & Chafouleas, 2012).
Notwithstanding these limitations, the effects of behavior contracts were replicated across multiple research teams, several participants, and in varied locations (Horner et al., 2005). Results reflect, at a minimum, practical significance as participants generally experienced a reduction in inappropriate behaviors and an increase in appropriate and desirable academic responses. Additional areas of future research in behavior contracting include the following. First, there is a need to update the literature and place behavior contracting in the context of current approaches (e.g., positive behavioral interventions and supports). Second, follow-up or maintenance phases should be included in research designs to better determine the efficacy of behavior contracts. It would be beneficial to examine whether positive behavioral or academic gains gleaned from behavior contracting are long-term. Only one study in this analysis included a maintenance phase (Navarro et al., 2007). Third, IOA should be reported more consistently in future studies. IOA is an important and recommended aspect of conducting observational research (Kratochwill et al., 2010). In addition, fidelity of implementation should be a part of data collection and reporting practices. Fidelity is needed, in part, to help determine how the delivery of a strategy or intervention can be improved. These data can also aid in evaluating SCR designs (Kratochwill et al., 2010).
Other areas of future research might include considering the function of the behavior (e.g., Hawkins, 2011), the impact of students helping to develop the contract (e.g., Arwood et al., 1974), the potential role of implementer training on the efficacy of behavior contracts (Mruzek et al., 2007), the impact of fading reinforcement during certain phases of the study (DeMartini-Scully et al., 2000), and further investigating the influence of creating a contract with positive consequences for adherence to the expectations versus negative consequences for non-adherence (Kidd & Saudargas, 1988).
Intervention setting could not be investigated as a potential moderator in this study, as there were few studies conducted in non-school settings. Future meta-analyses should examine the potential influence of this variable, as the setting(s) in which the intervention takes place may moderate participant outcomes. For example, Hawkins (2011) reported teacher implementation of a behavior contract at school for students with ASD. However, when a component was added to allow the student to receive reinforcement at home (watching a favorite television program), inappropriate behaviors decreased even more. It is also possible that this “dual reinforcement” may serve as a motivator for students (with and without ASD). More frequent rewards have been shown to further decrease problem behavior for students with and without EBD (Bowman-Perrott, Burke, Zaini, Zhang, & Vannest, in press).
Improved academic responding was identified as a collateral benefit of the use of a contingency contract for behaviors in one study (Diaddigo & Dickie, 1978). Future research should investigate such collateral academic benefits in studies focused primarily on behavioral outcomes. In addition, research should continue to examine the efficacy of behavior contracts in homes. Finally, the potential moderating effect of the implementer (e.g., parents) should be investigated.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
