Abstract
Introduction. Group-randomized trials (GRTs) are one of the most rigorous methods for evaluating the effectiveness of group-based health risk prevention programs. Efficiently designing GRTs with a sample size that is sufficient for meeting the trial’s power and precision goals while not wasting resources exceeding them requires estimates of the intraclass correlation coefficient (ICC)—the degree to which outcomes of individuals clustered within groups (e.g., schools) are correlated. ICC estimates vary widely depending on outcome, population, and setting, and small changes in ICCs can have large effects on the sample size needed to estimate intervention effects. This study addresses a gap in the literature by providing estimates of ICCs for adolescent sexual risk-taking outcomes under a range of study conditions. Method. Multilevel regression analyses were applied to existing data from four federally funded GRTs of school-based HIV/STI/pregnancy prevention programs to obtain a variety of ICC estimates. Results. ICCs ranged from 0 to 0.15, with adjustment for covariates and repeated measurements reducing the ICC in the majority of cases. Minimum detectable effect sizes with 80% power and 0.05 significance levels ranged from small to medium Cohen’s d (0.13 to 0.53) assuming 20 schools of 100 students each. Conclusions. This study provides the first known set of ICC estimates for investigators to use when planning studies of school-based programs to prevent sexual risk behaviors in youth. The results provide further evidence of the importance of using the appropriate adjusted ICC estimate at the design stage to maximize resources in costly GRTs.
Group-randomized trials (GRTs), sometimes called cluster-randomized trials, are common in behavioral health intervention research in youth. They are considered one of the most scientifically rigorous methods for evaluations in which intervention and control conditions must be assigned to groups (Donner, Birkett, & Buck, 1981; Donner & Klar, 2000; Murray et al., 2004). Many GRTs are conducted in schools, which are seen as an important venue for interventions because most youth attend schools, making them efficient sites for program delivery (Davis & Bauman, 2013).
In school-based trials of behavioral health interventions, the GRT is used when interventions are delivered to whole schools or classrooms rather than at the individual level. This is commonly done when it is not feasible or cost effective to deliver the intervention at the individual level or when the intervention is inherently group based (Donner & Klar, 2000; Murray, 1998; Murray & Blitstein, 2003; Varnell, Murray, Janega, & Blitstein, 2004). While GRTs randomize groups to intervention and control conditions, most are focused on assessing the impact of the intervention on the outcomes of individuals, so that analyses are performed on observations measured at the individual, not group level. However, observations on individual members within the same group may be correlated, violating the assumption of independence of observations required by the statistical theories underlying classical techniques for trial planning and analysis. The correlation between outcomes on individual members of the same group must be accounted for to ensure the study being designed is not underpowered, and/or the study being analyzed does not have inflated Type I error rates (Donner et al., 1981; Donner & Klar, 2000; Goldstein, 1995; Murray, 1998). In particular, classical sample size determinations must be inflated by a factor of 1 + (n − 1) * ICC, where n is the size of each group, and ICC, or intraclass correlation coefficient, is a measure of the correlation between observations within groups (Cornfield, 1978; Donner et al., 1981; Kish, 1965). Application of this factor, termed the “design effect” (Snijders & Bosker, 1999) or “variance inflation factor” (Donner et al., 1981), is necessary to ensure that a GRT will have adequate power and precision for detecting and estimating intervention effects. 
The ICC can be thought of as a measure of the proportion of total variance in the outcome attributable to variation between groups (e.g., schools); for this reason some authors prefer the name “variance partition coefficient” to denote the ICC (Rasbash, Steele, Browne, & Goldstein, 2009).
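To illustrate how quickly the design effect defined above, 1 + (n − 1) × ICC, grows with the ICC, the following is a minimal sketch. The function name and the example ICC values are illustrative assumptions; the group size of 100 matches the planning scenario used throughout this article.

```python
def design_effect(group_size: float, icc: float) -> float:
    """Variance inflation factor 1 + (n - 1) * ICC for groups of size n."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return 1.0 + (group_size - 1.0) * icc


# Illustrative ICC values with 100 students per school.
for icc in (0.001, 0.01, 0.05):
    print(f"ICC = {icc}: design effect = {design_effect(100, icc):.3f}")
```

With 100 students per school, even an ICC of 0.01 roughly doubles the sample size required under simple random sampling of individuals.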
A small change in an ICC estimate can have a large impact on the required sample size needed to power a study with the desired accuracy to detect the effect size of interest. For example, with an ICC of 0.001, a researcher planning a school-based GRT would need to recruit nine schools, assuming average group sizes of 100 students recruited per school, 80% power, a .05 significance level and a 0.2 Cohen’s d minimum detectable effect size (equivalent to a change in a proportion outcome from 0.5 to 0.4); with an ICC of 0.01, the number of schools needed increases to 16 under the same assumptions; an ICC of 0.05 increases the number of schools to 63. Therefore it is important for researchers planning GRTs to have access to ICCs that best match their study design, primary outcomes, population, and setting; this will help ensure that a sample size is selected that is sufficient for meeting the trial’s power and precision goals while not wasting resources exceeding them.
Studies have demonstrated that ICCs can vary substantially depending on the outcome, population, setting and intervention (Donner & Klar, 2000; Murray, 1998; Resnicow et al., 2010; Rotondi & Donner, 2009). Additionally, estimates of ICCs vary depending on the model or method of analysis. For example, ICCs often decrease with the inclusion of covariates in an outcome model (Murray & Blitstein, 2003; Pals, Beaty, Posner, & Bull, 2009; Zhang et al., 2014), and may differ depending on whether repeated measures or cross-sectional analyses are applied (Murray & Blitstein, 2003). Published ICC estimates found for trials of community and school-based interventions focused on tobacco and alcohol prevention, and physical activity and nutrition promotion ranged from 0 to 0.03 (Murray, Birnbaum, Phillips, & Glenn, 2001; Murray & Blitstein, 2003; Murray et al., 2004; Murray et al., 2006; Murray & Short, 1996, 1997; Siddiqui, Hedeker, Flay, & Hu, 1996), while those found for community-based HIV/STI prevention trials ranged from 0.001 to 0.16 (Feldblum et al., 1999; Pals et al., 2009; Zhang et al., 2014); no ICC estimates were found for school-based HIV/STI/pregnancy prevention trials.
Estimates of ICCs for outcomes involving sexual risk-taking behaviors under a larger variety of conditions are sorely needed in the field of sexual and reproductive health (Pals et al., 2009; Resnicow et al., 2010; Zhang et al., 2014). The present study obtained estimates of ICCs for sexual risk-taking behavior outcomes using data from four published, school-based HIV/STI/pregnancy prevention trials for adolescents, and examined how the ICCs varied depending on the inclusion of covariates in analyses and whether the analyses accounted for repeated measures across time. We also examined the impact of these issues on sample size/power considerations. The purpose was to contribute needed data for trial planning in the field of school-based HIV/STI/pregnancy prevention interventions, providing researchers with a new array of parameter estimates from which to choose depending on their population, study design, outcomes, and planned analyses.
Method
Samples and Study Designs
The data sets were from four completed, federally funded GRTs of theory-based HIV/STI/pregnancy prevention programs. All the studies collected student self-report survey data on several sexual risk behaviors including sexual initiation, unprotected sex in the past 3 months, and condom and birth control use at last sex. Each study is described briefly and summarized in Table 1.
Table 1. Study Descriptions.
The original article also presents a fourth wave of survey data at 36-month follow-up. This final survey was not part of the original study and has more than 30% attrition. To minimize the effect of such large attrition on the ICC estimates, the 36-month follow-up was not analyzed for this article.
Study 1 (All4You2!) evaluated the individual and combined effects of a HIV/STI/pregnancy prevention classroom curriculum and a service learning component on sexual risk outcomes of high school-aged youth attending district-run alternative high schools in northern California; the classroom curriculum significantly reduced unprotected sex in the treatment group relative to control at 6-month follow-up (Coyle et al., 2013). Study 2 (All4You!) evaluated the combined effects of an intervention involving a classroom curriculum and a service-learning component on sexual risk-taking in youth attending county-run alternative high schools; the combined intervention had a statistically significant effect on condom use behaviors at the 6-month follow-up (Coyle et al., 2006). Study 3 (Safer Choices) evaluated an intervention involving classroom and school-level components to reduce sexual risk-taking behaviors among inner city high school students in Texas and California, and found a significant reduction in unprotected sex (Coyle et al., 2001). Finally, Study 4 (Draw the Line/Respect the Line) evaluated a classroom curriculum to reduce sexual risk behaviors in middle school youth and found a significant reduction in sexual initiation among boys but not girls (Coyle et al., 2004).
Measures
Outcome measures in this study were the main behavioral outcomes reported for the original studies. For the three high school studies (Studies 1-3) they included (a) number of times had unprotected sex in the past 3 months, (b) number of partners with whom had unprotected sex in the past 3 months, (c) condom used at last sex, (d) effective method of birth control (condom or hormonal) used at last sex. Given convergence problems associated with Poisson models in some of the original outcome studies, we treated count variables as continuous, assuming an underlying normal distribution. When count variables were highly skewed, we applied transformations to reduce skew or dichotomized to 0 times/partners and 1 or more times/partners.
For the first two outcomes, analyses were performed on “full samples” of all study participants, with respondents assigned a value of 0 if they had not had sex in the past 3 months or had never had sex. For the last two outcomes, analyses were conducted only on filtered samples of sexually active youth as in the original studies, to understand the impact of condom use separate from abstinence.
For Study 4 (Draw the Line/Respect the Line) in which the prevalence of sexual behaviors was much lower because of the younger population, the following outcome variables were selected: sexual initiation (was sex initiated between baseline and a given follow-up time point); a Likert-type scale measuring self-efficacy to refuse sex; and a Likert-type scale measuring sexual limits (e.g., “Imagine you are alone with someone you like very much. Would you let them . . . touch your private parts below the waist?”).
The covariates examined in the studies included age, gender, race, parent lived with most of the time (mother, father, both), whether cohabitated with partner, probation status, number of suspensions, language spoken at home (English primary or not), language spoken with friends, whether born in United States, self-reported grades, parent education level, years lived in United States, and student’s educational goal level. A large subset of these covariates was available for each study.
Analysis
Linear and logistic multilevel regression models (MLMs) were used to obtain all ICC estimates. MLMs partition the error variance into components associated with the data hierarchy and yield estimates of the standard errors of regression parameters that are adjusted for the presence of intraclass correlation (Goldstein, 1995). These, or equivalent models, should be used when analyzing data from GRTs (Murray, 1998; Murray et al., 2004).
Because the four studies described above were longitudinal GRTs, three-level repeated measures MLMs, with time (including baseline) as Level 1, student as Level 2, and school as Level 3, would make the most efficient use of all of the available data while accounting for all three levels of nesting (Baumler, Harrist, & Carvajal, 2003; Murray & Blitstein, 2003). However, ICCs estimated from these models would be most applicable to the planning of future GRTs that also are longitudinal with the same or a similar number of follow-up time points and that intend to utilize repeated measures MLMs for data analysis. Therefore, to broaden the utility of our results, we applied both two-level and three-level MLMs. The two-level models, referred to as “endpoint analysis” models, analyzed data from a single follow-up time point (the method of analysis currently required for Teen Pregnancy Prevention replication evaluations funded by the Office of Adolescent Health), with student as Level 1, school as Level 2, and the baseline value of the outcome as a covariate (Coyle et al., 2006). Random intercept MLMs were applied in all cases because the trial aims were to examine whether there was a difference in mean outcomes across study arms.
To assess the impact on ICCs of including an intervention indicator and various covariates as independent variables, the following models were run successively for each outcome (at each time point, for the endpoint analyses): (a) a “means model,” in which the outcome variable was the dependent variable and a constant was the independent variable; (b) the previous model with the addition of a fixed effect treatment indicator as an independent variable; and (c) the previous model with the addition, as fixed effect independent variables, of a set of covariates significantly related to the outcome variable at p < .15. Stata Version 12.1 was used to obtain estimates of all ICCs and their standard errors.
Computation of School-Level ICCs: Continuous Outcomes
Following the terminology of Hox (2002), Baumler et al. (2003), and Rasbash et al. (2009), for the two-level case, school-level ICC was defined as the school-level error variance (i.e., variance component) divided by the total error variance: $\mathrm{ICC}_{\mathrm{school}} = \sigma_u^2/(\sigma_u^2 + \sigma_e^2)$, where $\sigma_u^2$ is the component of variance in the outcome attributable to differences in school-level means (the Level 2 variance), and $\sigma_e^2$ is the component of variance in the outcome attributable to differences in student-level means (the Level 1 variance). For the three-level case, school-level ICC was defined as $\mathrm{ICC}_{\mathrm{school}} = \sigma_u^2/(\sigma_u^2 + \sigma_{v0}^2 + \sigma_e^2)$, where $\sigma_u^2$ is the component of variance in the outcome attributable to differences in school-level means (the Level 3 variance), $\sigma_{v0}^2$ is the component of variance attributable to differences in student-level means across time but within schools (a component of the Level 2 variance), and $\sigma_e^2$ is the component of variance attributable to differences in repeated observations within student (the Level 1 variance).
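The two definitions above reduce to simple ratios of estimated variance components. The following sketch shows the arithmetic only; the function names and the variance component values are hypothetical and are not estimates from the four studies.

```python
def icc_two_level(var_school: float, var_student: float) -> float:
    """School-level ICC for a two-level model (students within schools)."""
    return var_school / (var_school + var_student)


def icc_three_level(var_school: float, var_student: float,
                    var_occasion: float) -> float:
    """School-level ICC for a three-level model
    (repeated observations within students within schools)."""
    return var_school / (var_school + var_student + var_occasion)


# Hypothetical variance components, chosen only to illustrate the ratios.
print(icc_two_level(var_school=0.05, var_student=0.95))
print(icc_three_level(var_school=0.05, var_student=0.45, var_occasion=0.50))
```

In practice the variance components would be taken from the fitted multilevel model for the outcome, time point, and covariate set of interest.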
Computation of School-Level ICCs: Binary Outcomes
Given that the lowest level of error variance is fixed for binary outcomes, we used a latent variable formulation (sometimes referred to as a threshold model formulation) to calculate ICCs. Under this formulation, a school-level ICC for a two-level model is defined as $\mathrm{ICC}_{\mathrm{school}} = \sigma_u^2/(\sigma_u^2 + \pi^2/3)$, and for a three-level model is $\mathrm{ICC}_{\mathrm{school}} = \sigma_u^2/(\sigma_u^2 + \sigma_{v0}^2 + \pi^2/3)$ (Hox, 2002; Rodriguez & Elo, 2003).
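Under the latent variable formulation, the Level 1 variance is replaced by the variance of the standard logistic distribution, π²/3 (about 3.29). A minimal sketch of the calculation follows; the function names and the school-level variance of 0.2 on the logit scale are hypothetical.

```python
import math

# Variance of the standard logistic distribution (latent Level 1 residual).
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def icc_binary_two_level(var_school: float) -> float:
    """Latent-variable school-level ICC for a two-level logistic model."""
    return var_school / (var_school + LOGISTIC_RESIDUAL_VARIANCE)


def icc_binary_three_level(var_school: float, var_student: float) -> float:
    """Latent-variable school-level ICC for a three-level logistic model."""
    return var_school / (var_school + var_student + LOGISTIC_RESIDUAL_VARIANCE)


# Hypothetical school-level random intercept variance on the logit scale.
print(round(icc_binary_two_level(0.2), 4))
```

Because the latent residual variance is a constant, the ICC for a binary outcome depends only on the estimated random intercept variances.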
Sample Size/Minimum Detectable Effect Size
Minimum detectable effect size (MDES) estimates were computed instead of required sample size because, like the four studies included in the analyses, school-based GRTs are generally constrained by budgets and feasibility to a fixed sample size of schools, often 20 or fewer. Thus, for a range of ICCs we obtained a range of MDES estimates, using Cohen’s d effect size measure, which represents the standardized difference between group means (e.g., mean for treatment group minus mean for control group divided by pooled estimate of standard deviation; Cohen, 1992). We assumed a sample size of 20 schools, an average of 100 students per school, and 80% power to detect significance at p < .05, all conditions that are likely to occur in a real-world research setting of this type. Optimal Design Version 3.0 (Raudenbush, Bryk, Cheong, & Congdon, 2011; Spybrook, Bloom, Hill, Martinez, & Raudenbush, 2011) was used to compute MDES estimates.
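The relationship between the ICC and the MDES under these planning assumptions can be sketched with a widely used closed-form approximation for a balanced two-arm GRT without covariates: MDES ≈ M × sqrt(4[ICC + (1 − ICC)/n]/J), where J is the total number of schools, n is the number of students per school, and M is the sum of the t quantiles for the two-sided significance level and the desired power on J − 2 degrees of freedom. This is an assumption made here for illustration, not a reproduction of the Optimal Design computation; the function names are hypothetical, and the t quantiles are obtained numerically so that only the Python standard library is needed.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """CDF via Simpson's rule on [0, |x|], using symmetry around zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Quantile of Student's t distribution found by bisection on the CDF."""
    lo, hi = -50.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mdes(icc: float, n_schools: int = 20, students_per_school: int = 100,
         alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate MDES (Cohen's d) for a balanced two-arm GRT, no covariates."""
    df = n_schools - 2
    multiplier = t_quantile(1 - alpha / 2, df) + t_quantile(power, df)
    variance = 4 * (icc + (1 - icc) / students_per_school) / n_schools
    return multiplier * math.sqrt(variance)


for icc in (0.0, 0.15):
    print(f"ICC = {icc}: MDES = {mdes(icc):.2f}")
```

Under these assumptions the approximation gives values of about 0.13 and 0.53 at the two ends of the ICC range examined in this study (0 and 0.15), with 20 schools of 100 students each.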
Results
Table 1 shows demographic characteristics of students in each of the studies. Tables 2 and 3 show the ICC estimates for the two-level endpoint and three-level repeated measures models, respectively, under the various conditions and across the four studies. While standard errors are reported in the tables, it should be noted that for small ICCs they may not be reliable and should be interpreted with caution (Murray et al., 2006). Table 4 shows the MDESs across various conditions.
Table 2. School-Level ICCs for Two-Level Endpoint Models.
Note. ICC = intraclass correlation coefficient; SE = standard error.
a. Covariates include baseline value of the outcome variable unless model is for baseline time point or outcome is initiation since baseline.
b. Students who reported not being sexually active in the 3 months prior to the survey were given a score of 0, allowing the entire sample to be included in the analysis (i.e., full sample), unless otherwise indicated.
c. Because this variable was highly skewed, it was dichotomized to 0 times/partners and 1 or more times/partners.
d. ICCs are listed as 0 if the estimate obtained from Stata was <1.0e−10. In this case the accompanying SE and confidence interval limits were also always <1.0e−10.
Table 3. School-Level ICCs for Three-Level Longitudinal (Repeated Measures) Models.
Note. ICC = intraclass correlation coefficient; SE = standard error.
a. Students who reported not being sexually active in the 3 months prior to the survey were given a score of 0, allowing the entire sample to be included in the analysis (i.e., full sample), unless otherwise indicated.
b. Because this variable was highly skewed, it was dichotomized to 0 times/partners and 1 or more times/partners.
c. ICCs are listed as 0 if the estimate obtained from Stata was <1.0e−10. In this case the accompanying SE and confidence interval limits were also always <1.0e−10.
Table 4. Minimum Detectable Effect Size.a
Note. MDES = minimum detectable effect size; ICC = intraclass correlation coefficient.
a. The MDES for each ICC assumed a sample size of 20 schools, an average of 100 students per school, and 80% power to detect significance at p < .05.
ICC Estimates
Endpoint MLMs
Unadjusted ICCs from two-level endpoint MLMs ranged from 0 to 0.15 depending on the outcome, follow-up time point, and study population. The range for covariate-adjusted ICCs across outcomes, time points, and study populations was the same, although the majority of ICCs were substantially reduced after adjusting for covariates. In Studies 1 and 3, adjusting for covariates reduced the ICC for all outcomes at all follow-ups; in Study 3, ICCs were at least halved by the addition of covariates. In Study 4, ICCs remained about the same (with a few slight increases) with the addition of covariates. In Study 2, in several cases, mostly for baseline models, the ICCs actually increased with the addition of covariates to the MLM. This result is examined further in the Discussion section. The ranges of ICCs for the four sexual behavior outcomes measured across multiple studies were similar: 0 to 0.14 for both number of times had unprotected sex in the past 3 months and number of partners with whom had unprotected sex in the past 3 months, 0 to 0.15 for condom use at last sex, and 0 to 0.10 for birth control use at last sex.
Repeated Measures MLMs
The unadjusted ICCs from the three-level repeated measures MLMs ranged from 0.004 to 0.092 while covariate-adjusted ICCs ranged from 0 to 0.062. In both Study 1 and Study 3, ICCs were more than halved by the addition of covariates to the models. In Study 4, the addition of covariates to the MLMs also reduced ICCs but not by as much. Finally, similar to the ICCs estimated from the endpoint models, for Study 2, for three of the four outcomes examined, ICCs increased with the addition of covariates to the repeated measures MLMs.
Minimum Detectable Effect Sizes
Table 4 shows the minimum detectable effect sizes across the range of ICCs found. The MDES ranged from a small Cohen’s d of 0.13 to a medium d of 0.53.
Discussion
This study found that the range of ICCs from several group-randomized trials of school-based HIV/STI/pregnancy prevention interventions in the United States was 0 to 0.15. This is similar to the range (0.003 to 0.16) found in the only other published reports of ICC estimates for HIV/STI community prevention interventions, albeit with different populations and in different settings (Feldblum et al., 1999; Pals et al., 2009). This range is slightly wider than the range of ICCs reported for tobacco, alcohol, and nutrition health-related outcomes (range 0.01 to 0.03; Murray & Blitstein, 2003), and has much lower bounds than those reported for educational outcomes (range 0.15 to 0.25; Hedges & Hedberg, 2007). The fact that the results of this study, and others focused on sexual risk outcomes, differed from those found in studies of other health and education outcomes provides further evidence of the importance of utilizing ICC estimates from studies with similar outcomes. Furthermore, ICC estimates from this study differed depending on the type of sexual risk outcome, the population, and the type of analysis performed, indicating the need to select ICC estimates from studies closely aligned in population, design, primary outcome, and intended analysis method to maximize efficiency when planning GRTs.
In the majority of cases, estimates of ICCs at baseline were larger than those at follow-up time points. This may be because as the studies progressed in time the original student groupings within schools or classrooms shifted because of real-world factors such as attrition and migration, increasing the within-school variation. For two outcomes in the high school studies—condom use and birth control use at last sex—the ICC was appreciably larger at a later follow-up time point. However, for these variables analyses were conducted on filtered samples of only students who were sexually active in the past 3 months—a sample that would likely have changed in composition substantially as students get older. These results suggest that not only should an ICC estimate be from an outcome as closely matched as possible to that of the study being planned, but the specific follow-up time point also should be considered.
For a given study and outcome, the ICC estimated using a repeated measures model appeared to be roughly somewhere in the middle of the ICCs estimated using endpoint models for the individual time points. Again, results from this study further confirm the importance of ensuring that the type of analysis planned is taken into account when selecting ICCs at the design stage of a study.
In most cases, the addition of just the treatment indicator as an independent variable to the means model did not result in appreciable differences to the ICC estimates. This could be because in these studies the intervention did not account for school-level variation. On the other hand, including a set of demographic/risk-related covariates lowered the ICC in the vast majority of cases for both the endpoint and repeated measures analyses. While Murray and Blitstein (2003) noted the importance of understanding the distribution of covariates across the groups being randomized before assuming that the addition of covariates will decrease between-group variance, in the four studies examined here adjustment for a set of covariates associated with the outcome variable was almost always beneficial in terms of reducing the ICC. The one exception was Study 2 (the All4You! study involving county-run alternative schools), for which the ICCs increased appreciably with the addition of the set of covariates, particularly in the repeated measures cases. County-run alternative schools are often used as an option for students who are not succeeding in district-run alternative schools or for students with more serious behavioral issues; they tend to serve very specific communities, such as youth from a certain geographic area, and their relatively homogeneous composition may affect covariate-adjusted ICCs differently than it affects ICCs for other school environments.
Limitations
One of the major limitations of this study was the relatively large standard errors seen in the ICC estimates, in part a reflection of the relatively low number of schools available for estimating between-school variation in each study. Additionally, the reliability of the standard error estimates of ICCs is low (Murray et al., 2006). Another limitation of this study is that it obtained estimates of ICCs only for studies that were published; studies for which there were no significant impacts on any sexual behavior outcomes often are not published and may have different ICCs associated with them. However, this limitation is mitigated by the fact that we found ICC estimates were negligibly affected by the inclusion of treatment indicators in the models. Furthermore, we chose to present ICC estimates for the most commonly addressed outcomes in the literature regardless of whether these four programs positively affected those particular outcomes. Finally, we did not look at whether ICCs differed for treatment and control groups through separate subgroup analyses, but the finding that the addition of a treatment indicator variable in the models did not change ICCs suggests they would not differ. Despite these limitations, this study offers the first published data on ICC estimates for school-based HIV/STI/pregnancy prevention interventions, an important contribution to the literature.
Implications for Practice
This study provides one of the first sets of ICC estimates for researchers to use when planning school-based studies involving sexual behavior outcomes among adolescents. ICC estimates differed across studies, outcomes, follow-up time points, and method of analysis, providing further evidence of the importance of selecting ICC estimates that reflect the characteristics of the study being planned.
In general, ICCs ranged from 0 to 0.15, with adjusted ICCs for unprotected sex in the past 3 months, often the target outcome for studies of this type (e.g., replication studies for the Office of Adolescent Health’s Teen Pregnancy Prevention Initiative), estimated at 0.023 or less. With such an ICC, studies able to recruit at least 20 schools could detect a relatively small effect size of d = 0.23 or more. In the vast majority of cases examined across the four studies, controlling for a set of demographic and related covariates reduced the intraclass correlation coefficient; further research needs to be conducted to understand which particular covariates result in the biggest reductions in ICCs.
Footnotes
Authors’ Note
The authors dedicate this work to the memory of Dr. Ronald Harrist, whose warm optimism and practicality affirmed the value of our work.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was supported by the National Institute of Child Health and Human Development (HD# 068173).
