Abstract
Analysis of the differential treatment effects across targeted subgroups and contexts is a critical objective in many evaluations because it delineates for whom and under what conditions particular programs, therapies or treatments are effective. Unfortunately, it is unclear how to plan efficient and effective evaluations that include these moderated effects when the design includes partial nesting (i.e., disparate grouping structures across treatment conditions). In this study, we develop statistical power formulas to identify requisite sample sizes and guide the planning of evaluations probing moderation under two-level partially nested designs. The results suggest that the power to detect moderation effects in partially nested designs is substantially influenced by sample size, moderation effect size, and moderator variance structure (i.e., varies within groups only or within and between groups). We implement the power formulas in the R-Shiny application PowerUpRShiny and demonstrate their use to plan evaluations.
A common goal of experimental evaluations is determining the average effectiveness of a program, intervention, or policy (i.e., treatment). However, treatment effectiveness can depend on the individual (for whom) and contextual factors (under what conditions). Inclusion of moderator variables that capture the factors by which effects vary is a common technique to investigate treatment effect heterogeneity. Planning evaluations that consider treatment effect moderation is aided by the availability of power formulas for detecting moderation effects in various randomized designs, but these formulas are unavailable for partially nested designs in which the grouping structure of the treatment condition is different from that of the control condition. In this study, we develop, describe, and investigate power formulas for moderated effects in evaluations with partial nesting to ensure adequate sample sizes and improve evaluation planning.
Our analyses focus on experimental evaluations because of their ability to produce strong causal evidence regarding the effectiveness of a treatment, but we recognize the value of supplementary questions about how treatment effects vary. Understanding the variability of treatment effects across differing individual characteristics (e.g., race, gender, pretest score) or group characteristics (e.g., organization size or location) reveals for whom and under what conditions a treatment is effective. It also provides an avenue to investigate opposing treatment effects across groups that produce a null main effect (MacKinnon et al., 2011). For example, a near-zero treatment effect would be produced by an educational intervention that increases academic achievement among male students but decreases achievement among female students.
The value of capturing a more complete understanding of treatment effects is often reflected in evaluation literature through the use of moderator variables that detail variance in the effects of an intervention across different groups. For example, recent literature has used this approach to study an intervention aimed at reducing violent video game consumption across different lifestyles (Rivera et al., 2016); a treatment program for mental illness across race, gender, and age (Kenny et al., 2004); an online intervention for depression across education levels, attitudes toward online instruction, and willingness to change (Lüdtke et al., 2018); and an implementation intentions method to increase physical activity across levels of executive function (Hall et al., 2014). Funders and professional organizations also emphasize investigations that capture a more complete understanding of treatment effects (e.g., Institute of Education Sciences, 2016; Society for Research on Educational Effectiveness, 2012).
Alongside the increasing emphasis on moderated treatment effects, a growing literature details design and analysis techniques that support the inclusion of moderator variables. For example, existing research details the inclusion and analysis of different types of moderator variables (e.g., categorical or continuous) in various experimental study designs (Bloom, 2005; Dong et al., 2018; Jaciw et al., 2016; Spybrook et al., 2016). Many of these advancements have been implemented in software (e.g., Dong et al., 2016), expanding the capacity of evaluators to plan for and capture treatment effect moderation. This capacity does not, however, extend to designs with partial nesting, which occurs when there are disparate grouping or nesting structures across treatment conditions.
A wide variety of partial nesting structures occur in practice (e.g., Sterba et al., 2014). Such structures can occur when, for example, assignment to a treatment condition eliminates some nested structure (e.g., homeschooling treatment vs. typical schooling control condition) or when assignment induces or utilizes some nesting structure in the treatment condition that does not exist in the control condition. For example, a common design in the field of counseling involves randomly assigning individuals to therapy led by a counselor or to a waitlist control condition (e.g., Roberts & Roberts, 2005). In this setting, treatment individuals are nested within counselors while control individuals are not nested. Similar examples of partial nesting occur in a wide range of fields (e.g., Bauer et al., 2008; Lohr et al., 2014; Sterba et al., 2014). In the medical field, patients can receive novel treatments in a clinic setting or be assigned to receive the typical home-based care in the control condition (e.g., Morrell et al., 1998); in education settings, a treatment may consist of a school-based intervention and be compared to an individualized home-based intervention control condition (e.g., Roberts et al., 2011); and in psychotherapy, the treatment condition may involve a group therapy while the control condition utilizes individual therapy (e.g., Dishion et al., 2001).
Partial nesting occurs across these examples because assignment to the treatment condition creates a grouping structure that is dissimilar to the structure of the control condition. Those in the treatment condition of the provided examples experience the treatment as a group (e.g., classroom, organization, neighborhood) or share a common agent of implementation (e.g., patients with the same counselor or students with the same tutor). The result is a treatment condition with a two-level data structure brought on by treatment delivery but a control condition without any grouping structure. Our examples focus on these “two/one” partially nested designs that combine the grouping structure of a two-level cluster-randomized trial in the treatment condition with the single-level structure of an individual-randomized trial in the control condition. The two/one partially nested nomenclature stems from the design having two levels in the first treatment condition and one level in the other condition. These designs are common in education, public health, and other social science settings (e.g., Bauer et al., 2008; Lohr et al., 2014; Sanders, 2011).
For clarity, Figure 1 illustrates the two/one partially nested designs included in the scope of this investigation (see Panels C and D) along with the more common individual- (Panel A) and cluster-randomized design (Panel B). The purpose of Figure 1 is to show the differences and similarities in treatment and control condition grouping across these designs. Panel (A) in Figure 1 presents a typical individual-randomized design that does not have any grouping in the treatment or control condition. Panel (B) in Figure 1 displays a typical cluster-randomized design with intact groups (represented using circles) randomly assigned to the treatment or control condition. Panel (C) in Figure 1 displays a partially nested design that leverages individual assignment but whose treatment induces a two-level nesting structure in the treatment condition. We emphasize that this grouping is new (i.e., treatment-induced) using spikes around the group circles in the figure. Notice that the partially nested design combines an unclustered structure for the control condition with a two-level nesting structure for the treatment condition that was brought about by the nature of the treatment (e.g., sharing a therapist or teacher). Panel (D) in Figure 1 displays a similar type of partially nested design as in Panel (C) because it yields a two-level nesting structure in the treatment condition and an unclustered structure in the control condition. However, the clustering structure in the treatment condition of Panel (D) arises differently. In Panel (D), grouping is an artifact of a preexisting social structure (e.g., extant groups). To illustrate extant grouping, we have study participants (black figures) join others (gray figures) in existing groups (solid circles).
As the panels in Figure 1 show, the structures of the control conditions are identical across the partially nested designs (Panels C and D) and the individual-randomized design (Panel A) because all three designs leverage individual assignment that results in an ungrouped or unclustered control condition. Similarly, the structures of the treatment conditions in Panels B (cluster-randomized trial), C (two/one treatment-induced partial nesting), and D (two/one partial nesting with extant groups) are the same because each results in a two-level structure for the treatment condition. The distinguishing feature of the partially nested designs (C and D) is their use of different grouping structures in the treatment and control conditions. The two partially nested designs are in turn distinguished from each other by the mechanism that produces the grouping or clustering. Recall that in Panel (C) the clustering is generated by the nature of the treatment (e.g., individuals assigned to treatment are grouped to form a new therapy group), whereas in Panel (D) the clustering exists prior to the study (e.g., individuals are assigned to an existing therapy group).

Figure 1. Individual, cluster, and partially nested randomized designs. (A)….(D)
The unique data structure of partial nesting is often disregarded, and past research has widely documented the various problems this can introduce in terms of efficiency, bias in standard errors of the treatment effect, and bias in estimates of variance components. These problems lead to inaccurate results and incorrect inferences (Baldwin et al., 2011; Bauer et al., 2008; Candlish et al., 2018; Hedges & Citkowicz, 2015; Korendijk et al., 2012; Lee & Thompson, 2005; Sanders, 2011; Schweig & Pane, 2016). Increasing attention has focused on development of analytic approaches and design strategies to address these issues in partially nested studies. For example, multiple-arm multilevel models have been extended for partially nested data (e.g., Lachowicz et al., 2015; Lohr et al., 2014; Sterba et al., 2014), and several studies have investigated sample size considerations and the use of covariates in partially nested designs (e.g., Moerbeek & Wong, 2008; Roberts & Roberts, 2005).
The purpose of this study is to develop statistical power formulas for moderated effects in common two/one partially nested designs and investigate these formulas to provide guidance and recommendations for evaluation planning. We structure our analyses to address two/one partially nested designs in which (a) treatment assignment induces a nesting structure in the treatment condition such that individual-level moderators plausibly vary only within groups and (b) treatment assignment inserts individuals into an extant nesting structure in the treatment condition such that individual-level moderators plausibly vary within and between groups.
In the first design (see Panel C of Figure 1), random assignment introduces a nesting structure such that individual-level moderator variables vary within but not across groups. It may seem tenuous to assume that the variability of an individual-level moderator arises solely from differences among individuals and not from differences among groups. However, because the formation of treatment conditions in partially nested designs is frequently the specific feature that induces nesting that would not otherwise exist, the values of an individual-level pretreatment moderator variable are typically established before assignment and exposure to the cluster-inducing treatment. As a result, the values of individual-level moderators will typically not be clustered and will not have variation at the group level.
Consider an evaluation of a counseling therapy in which individuals are randomly assigned to participate in therapy with a therapist (treatment) or to remain on a waitlist (control). This design creates a partially nested structure because treatment individuals are nested within therapists whereas control individuals remain ungrouped on a waitlist. Now consider an individual-level moderator such as pretreatment mental health. With individuals randomly assigned to therapists, there is no reason to suspect that the therapist-level averages of pretreatment individual-level mental health will systematically differ across therapists. Pretreatment mental health should be equally dispersed across therapists because pretreatment mental health took on values before individuals were assigned to therapists.
In the second design (see Figure 1 Panel D), we consider evaluations in which individuals assigned to the treatment condition are inserted into a preexisting nested structure such that individual-level moderator variables plausibly vary within and across groups. In this design, treated individuals participate in extant groups rather than forming new groups. Consider an evaluation of a group therapy in which individuals are randomly assigned to group therapy sessions (treatment) in preexisting groups or remain on a control waitlist. Treatment condition structure in this design has individuals nested in therapy groups whereas the control condition continues to be unclustered.
For example, let us consider pretreatment assessment of individual mental health as a moderator of group therapy effectiveness. When groups are formed prior to treatment assignment, it is plausible that the groups differ in their average pretreatment levels of mental health. Pretreatment mental health may be clustered within groups because of, for example, the prior progress of the groups or the self-selection of individuals with similar mental health levels into a therapy group. These average differences in prior mental health may play important contextual roles that moderate the effectiveness of the therapy. It is possible that the therapy is highly effective for groups with high average pretreatment mental health but ineffective for groups with low average pretreatment mental health. As a result, the average pretreatment mental health of a group may play an important moderating role.
Below, we further detail two/one partially nested designs, setting a foundation for subsequent power formula development. We outline the analytic models, describe the error variance of the moderation effect, and then provide formulas for estimating statistical power in two/one partially nested designs with treatment conditions that induce nesting. We repeat this process for designs that assign treatment to extant groups. A probe of the newly developed power formulas investigates the feasibility of detecting moderator effects in evaluations with partial nesting. This is followed by an illustrative example to demonstrate the application of the formulas in evaluation planning. To conclude, we summarize results, discuss implications, note limitations, and provide recommendations.
Two/One Partial Nesting
In designs with two/one partial nesting, individuals are randomly assigned to a treatment or control condition, one of which has a two-level data structure while the other has a single-level data structure. Most often, individuals in the control condition are unaffected by treatment grouping. This creates a treatment condition with a two-level data structure and a control condition with a single-level structure. Our derivations also apply to partially nested designs with a two-level data structure in the control condition, but we focus on those with a two-level data structure in the treatment condition (as illustrated in Figure 1 Panels C and D).
We noted an example in counseling when individuals are randomly assigned to receive a treatment delivered by a therapist or placed on a waitlist. This two/one partially nested design has a treatment-induced nesting structure (patients within a therapist) with a waitlist control condition comprised of independent individuals (i.e., the control condition retains a single-level structure). Two/one partially nested designs can also arise when treatments use extant grouping structure. In the context of our counseling example, assigning individuals to group therapy sessions utilizes a preexisting two-level treatment (i.e., individuals within therapy groups) with wait-listed control individuals retaining a single-level data structure. In either case, the treatment condition has two levels while the control condition has one level.
We take up these two complementary types of two/one partial nesting separately because moderator variability likely differs under each type. We refer to the two types of partial nesting as (a) treatment-induced nesting (moderator plausibly varies only within groups) and (b) treatment assignment to extant nesting (moderator plausibly varies within and between groups). Below, we examine each scenario assuming that moderators are continuous variables, but the formulas are adaptable to binary moderators (see Binary Moderator in the Technical Supplemental Appendix).
Treatment Assignment–Induced Nesting Structure
Treatment-induced nesting structure results in a moderator that plausibly varies within groups only. We use a working example to help ground the analytic models, moderator effect variance formulas, and subsequent power formulas. This hypothetical evaluation in an educational setting investigates the effectiveness of a spatial intervention program on secondary student mathematics performance (Lowrie et al., 2019). The spatial intervention is implemented as a summer school program with a sample of students selected from those performing below proficient levels in mathematics. Students randomly assigned to the control condition will be placed on a waitlist and will continue with their summer as usual. Students assigned to the treatment condition will complete the classroom-based intervention program over 3 weeks in the summer. This evaluation has a two/one partially nested design with the treatment condition containing two levels (i.e., students nested within classrooms) and the control condition (i.e., wait-listed students) representing a single level.
In addition to considering the main effect of the spatial intervention program on secondary student mathematics performance, we include math anxiety as a possible moderator. Student math anxiety represents a typical individual-level continuous moderator. Math anxiety has a deleterious relationship with math performance (e.g., Ashcraft & Krause, 2007; Ashcraft & Moore, 2009), and it is possible that the spatial intervention program has differentiated effects based on a student’s math anxiety. Given random assignment of students to intervention groups, we can assume that math anxiety will not have variation at the group level. There is no reason with treatment-induced nesting to suspect that average math anxiety in the intervention groups will systematically differ across groups. Although math achievement (outcome) likely varies across individuals and groups, random assignment of individuals to groups ensures that in expectation pretreatment covariates such as math anxiety will be evenly distributed across groups.
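The reasoning above can be checked with a small simulation. The following is a minimal sketch, not part of the original analysis: it draws standardized pretreatment anxiety scores, randomly assigns students to newly formed groups, and estimates the moderator's intraclass correlation (ICC) with a one-way ANOVA estimator. The number of groups, group size, seed, and the function names are illustrative assumptions.

```python
import random
import statistics


def anova_icc(groups):
    """One-way ANOVA estimate of the ICC for equal-sized groups."""
    n = len(groups[0])  # individuals per group
    grand_mean = statistics.fmean(x for g in groups for x in g)
    group_means = [statistics.fmean(g) for g in groups]
    # Between-group mean square
    msb = n * sum((m - grand_mean) ** 2 for m in group_means) / (len(groups) - 1)
    # Pooled within-group mean square (equal group sizes)
    msw = statistics.fmean(statistics.variance(g) for g in groups)
    return (msb - msw) / (msb + (n - 1) * msw)


def simulate_induced_nesting(n_groups=50, group_size=20, seed=1):
    """Hypothetical setup: scores exist before grouping, then groups are formed at random."""
    rng = random.Random(seed)
    # Pretreatment math anxiety, standardized (an assumption for illustration)
    anxiety = [rng.gauss(0.0, 1.0) for _ in range(n_groups * group_size)]
    rng.shuffle(anxiety)  # random assignment of students to new intervention groups
    return [anxiety[k * group_size:(k + 1) * group_size] for k in range(n_groups)]


groups = simulate_induced_nesting()
icc = anova_icc(groups)
print(f"Estimated moderator ICC under induced nesting: {icc:.3f}")
```

Under these assumptions the estimate should fall close to zero (it can be slightly negative because of sampling error), which is consistent with treating the moderator as varying within groups only.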
Analytic Models
We use two analytic models to reflect the different treatment and control conditions and draw on the common multiple-arm multilevel framework for partially nested data (MA-PN). This approach makes power formulas more accessible (Spybrook et al., 2016). For the two-level treatment condition with an individual-level continuous moderator that only varies within groups, we have
The superscript t indicates the treatment arm, and subscripts i and j follow common multilevel model notation indicating individual and group, respectively. The outcome is represented by
For the single-level control arm, the outcome model is
Most variables (e.g., Y, M, X, and V) and parameters (e.g.,
Moderator Effect and Error Variance
Estimation of the moderator effect (ME) is possible by contrasting the coefficients capturing the relationship between the moderator and outcome with
A difference between
The novel contribution of our power formulas is a set of expressions that track the expected uncertainty of the moderator effect (
(see Moderator Effect and Error Variance in Technical Supplemental Appendix for expanded formulations). We unpack
with
Next, we have individual per group sample size (
The formulas for the variance of the moderated effect indicate which parameters and design components are necessary for an a priori power analysis in a two/one partially nested design. Evaluators must predict the magnitude of the moderated effect, outcome and moderator variance structure (i.e., intraclass correlation coefficients [ICCs]), proportion of variance explained by predictors, and several sample sizes (
To summarize, we developed a formulation for moderator effect variance in two/one partially nested designs with a treatment-induced nesting structure that plausibly limits moderator variance to within groups. The moderator effect variance formulation is suitable for power analysis in the planning stages of an evaluation with formula structure suggesting that greater variance of the outcome (
Treatment Assignment to Extant Nesting Structure
When the treatment assignment utilizes extant grouping, it is no longer tenable to assume that moderators vary within groups only. With preexisting groups, the average moderator values may systematically differ across groups. Consider a new working example evaluation that investigates the effectiveness of a group-based intensive lifestyle intervention on weight loss (e.g., Mayer-Davis et al., 2004). The intervention consists of weekly group sessions encouraging physical activity and proper nutrition. An evaluation design could randomly assign a pool of volunteers to attend ongoing group-based lifestyle intervention sessions (i.e., two-level treatment) or continue with their current care (i.e., single-level control). That is, volunteers are randomly assigned to join groups formed independently and before the onset of the study. Group-based intensive lifestyle interventions have demonstrated an ability to increase weight loss among participants (Mayer-Davis et al., 2004), but these effects have been shown to be moderated by personal characteristics such as optimism (e.g., Scheier & Carver, 1992; Van Nguyen et al., 2018). Optimism represents an individual-level continuous moderator that plausibly varies across groups under this design because the intervention groups were formed prior to assignment. The use of extant groups makes it plausible that groups will differ in their average level of optimism. These average differences may influence the effectiveness of the intensive lifestyle intervention on weight loss.
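To make the contrast with induced nesting concrete, the sketch below simulates a moderator such as optimism in preexisting groups and splits each score into a group mean and a within-group deviation. It is an illustration under stated assumptions, not the authors' procedure: the between-group variance share of 0.2, the number and size of groups, the seed, and all function names are hypothetical choices.

```python
import random
import statistics


def simulate_extant_groups(n_groups=200, group_size=20, between_var=0.2, seed=7):
    """Hypothetical extant groups: each group has its own average optimism level."""
    rng = random.Random(seed)
    within_sd = (1.0 - between_var) ** 0.5  # total moderator variance fixed at 1
    groups = []
    for _ in range(n_groups):
        group_level = rng.gauss(0.0, between_var ** 0.5)  # shared group component
        groups.append([group_level + rng.gauss(0.0, within_sd)
                       for _ in range(group_size)])
    return groups


def decompose(groups):
    """Split each score into its group mean and its deviation from that mean."""
    group_means = [statistics.fmean(g) for g in groups]
    deviations = [[x - m for x in g] for g, m in zip(groups, group_means)]
    return group_means, deviations


def variance_components(groups):
    """ANOVA-style estimates of between- and within-group moderator variance."""
    n = len(groups[0])
    group_means, _ = decompose(groups)
    within = statistics.fmean(statistics.variance(g) for g in groups)
    # Variance of observed group means overstates the true between-group
    # variance by within / n, so that sampling noise is subtracted.
    between = statistics.variance(group_means) - within / n
    return between, within


groups = simulate_extant_groups()
group_means, deviations = decompose(groups)
between, within = variance_components(groups)
icc = between / (between + within)
print(f"Between: {between:.3f}  Within: {within:.3f}  ICC: {icc:.3f}")
```

With extant groups simulated this way, the estimated ICC is clearly above zero, so the group means carry information that the within-group deviations do not.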
To consider situations, like our working example, in which a variable’s aggregate or average may moderate treatment effects, we adjust the treatment outcome model to allow the continuous moderator (accommodations for binary moderators remain unchanged) to vary within and between groups such that
When considering moderators that vary within and between groups, a moderation effect can occur at Level 1 (
Moderator Effect and Error Variance
The moderated effect is still estimated using differences between the treatment and control model coefficients associated with the moderator. The
The
The statistical significance test for the moderated effect and the power formula (see Test Statistic and Power Formula in Technical Supplemental Appendix) do not differ under the new analytic model. They do require the new moderation effect (see Equation 7) and a formulation of moderator effect variance that reflects
The formulation of
Several of the terms in the expanded version of
There are, however, two new variance components from the group level of the outcome and moderator models in the treatment arm (
Including covariates (that explain outcome variance) is an effective strategy for increasing power to detect the moderated effect. Reducing
To summarize, partially nested designs with moderators that only vary within groups (
Design Implications
After gathering evidence of formula accuracy (see Power Formula Accuracy Simulation in Technical Supplemental Appendix), we investigated the feasibility of detecting moderator effects in two/one partially nested designs when (a) the moderator varied within groups only (i.e., induced nesting design) and (b) the moderator varied within and between groups (i.e., extant nesting design). We considered moderator effects of
Variance structure conditions include individual-level variance of the outcome and moderator set at
The remaining parameters were held constant. Variance explained by predictors was set at 50% (
A fully crossed design with these factors produced 64 conditions. Results can help address common evaluation planning questions such as: How many intervention groups and individuals are necessary to consistently detect moderated treatment effects? Is the evaluation sample size large enough to consider moderators that vary within and between groups? and How will the size of the moderated effect influence the adequate sample of groups and individuals per group?
Results
Moderator Varies Only Within Groups
Table 1 presents power rates to detect a moderated effect in a two/one partially nested design when the individual-level continuous moderator only varies within groups. Larger moderation effects and larger sample sizes substantially (and predictably) increased power. These power rates remained constant under the different variance structures considered. Results indicate that adequate power to detect
Table 1. Power to Detect a Moderated Effect When the Moderator Varies Only Within Groups.
Note. Individual-level variance of the outcome and moderator are
Individual per group sample size has a substantial influence on power to detect the moderated effect (see Figure 2). This is a reasonable result given that the individual per group sample size directly reduces outcome variance, which in turn reduces the variance of the moderation effect (see Equation 5). This relationship is noteworthy because power to detect main effects in a typical group-randomized trial is driven by group sample size (e.g., Raudenbush, 1997). The result implies that increasing individual per group sample size is an effective design strategy for detecting these moderation effects. This is often less expensive than sampling additional groups. In our working example evaluation, we could sample more students per intervention group to increase the likelihood of detecting a moderation effect.
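The link between a smaller moderator effect variance and higher power can be sketched generically. The function below is a minimal illustration that assumes a large-sample normal approximation to a two-sided test; it is not the test statistic or power formula from the Technical Supplemental Appendix, and the standard errors passed to it are hypothetical values rather than outputs of the variance formulas in this article.

```python
from statistics import NormalDist


def approx_power(effect, std_error, alpha=0.05):
    """Two-sided power for an effect with a given standard error.

    Assumption: the estimate is approximately normal, so no degrees of
    freedom are needed. This simplifies the test described in the article.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    ratio = effect / std_error        # effect expressed in standard-error units
    # Probability of rejecting in the upper tail plus the lower tail
    return z.cdf(ratio - crit) + z.cdf(-ratio - crit)


# Hypothetical standard errors for a moderation effect of 0.10:
# power rises as the standard error shrinks.
for se in (0.10, 0.05, 0.035):
    print(f"SE = {se:.3f}  approximate power = {approx_power(0.10, se):.3f}")
```

Under this approximation, an effect of roughly 2.8 standard errors corresponds to about 80% power at a two-sided alpha of .05, so any design change that shrinks the standard error of the moderation effect moves the evaluation toward that threshold.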

Figure 2. Power to detect a moderated effect when the moderator varies within groups only by group and individual per group sample size.
Moderator Varies Within and Between Groups
Table 2 presents power rates to detect a moderated effect in a two/one partially nested design when the individual-level continuous moderator varies within and between groups. Larger total moderation effects (
Table 2. Power to Detect a Moderated Effect When the Moderator Varies Within and Between Groups.
Note. Individual-level variance of the outcome and moderator are
Increasing the sample of individuals per group does little to alleviate the need for a large sample of groups (see Figure 3). The power to detect

Figure 3. Power to detect a moderated effect when the moderator varies within and between groups by group and individual per group sample size.
In summary, simulation study results indicated that our formulas produced appropriate Type I error rates and accurately predicted power (see simulation results in Technical Supplemental Appendix). A probe of these formulas found that power to detect both
Adequate power (e.g., 80%) to detect moderator effects when the moderator only varied within groups was achievable using typical sample sizes for studies planned to detect main effects. Increasing individual per group sample size substantially influenced power to detect these effects. When the moderator varied within and between groups, achieving adequate power required larger sample sizes or larger moderated effects. A primary driver of these power rate differences was increased moderator effect variance caused by the inclusion of group-level outcome variance (
Illustrations
We now illustrate our results and the use of our formulas in the planning of evaluations with a two/one partially nested design. Our first evaluation example examined the effect of a spatial intervention program on math performance while considering the moderating effects of student math anxiety. Treatment assignment induced nesting (i.e., novel intervention groups were created) so we can assume intervention groups will have approximately equal levels of student math anxiety (i.e., the moderator varies within groups only).
Using the R-Shiny application PowerUpRShiny for moderated effects in partially nested designs (Bai et al., 2020; Bulus et al., 2019), we can predict the power rate to detect a moderated effect for this evaluation with a specific sample size or determine the adequate sample to detect a moderated effect with acceptable power (e.g., >80%). Several design parameter estimates are needed to conduct this type of evaluation planning. We assume a variance structure based on educational evaluations considering academic outcomes (e.g., Hedges & Hedberg, 2007) such that variance of math performance and math anxiety at the individual level is set at
Our evaluation takes place in a large school district so we have access to a large number of qualifying students and resources for approximately 50 intervention groups (i.e.,
Our second evaluation example examined the effect of an intensive lifestyle intervention program on weight loss while considering the moderating effects of optimism. Treatment participants were assigned to an existing intervention group (i.e., use of extant intervention groups) so intervention groups may vary on aggregate levels of optimism (i.e., the moderator varies within and between groups). Using the described parameter estimates and conditions, but with
Discussion
Experimentally designed evaluations provide strong evidence regarding average effectiveness of a treatment but treatment effects may depend on individual and contextual factors. Including moderators in evaluation design planning can help evaluators identify these differential treatment effects. Planning experimental evaluations that include moderation effects, however, has been limited in some cases. Specifically, partially nested designs pose a challenge because statistical power formulas for moderation effects have not yet been available.
In response, this study develops these formulas and investigates their properties and implications for practice. Our aim is to encourage the use of adequate sample sizes, to identify typical sample sizes necessary to detect moderated effects, and to determine the factors that influence these sample sizes. We first considered moderators that varied within groups only, which is likely in partially nested designs with treatment-induced nesting. Random assignment serves to eliminate systematic differences across groups on pretreatment moderator variables under this design. In a second set of formulas, we relaxed this assumption and allowed moderators to vary within and between groups, which is likely when using extant groups in the treatment condition.
The power formulas presented here improve evaluation efficiency. Evaluators can now determine the sample sizes necessary to detect a moderated effect in a proposed evaluation, avoiding wasted resources from oversampling. Evaluations that consider moderation effects also produce better evidence regarding treatment effects. When a significant effect is present, evaluators can examine for whom and under what conditions it applies. Conversely, evaluators can investigate nonsignificant results and determine whether the treatment was effective for some groups while counter-effective for others, resulting in a combined effect near zero (i.e., counteracting treatment effects).
The initial probe of the power formulas for moderated effects in partially nested designs found that, when the moderator varies only within groups, the sample sizes required to achieve adequate power are similar to those required to detect main and mediated effects (e.g., Kelcey et al., 2017). Evaluations examining this type of moderation effect are feasible because a sample size planned for other effects is likely to be adequate. We also found a particularly strong relationship between the number of individuals per group and power. This implies that evaluations with a limited number of groups can still consistently detect moderated effects if a large number of individuals per group is available. For example, an evaluation limited to 25 treatment groups has only an 18% chance of detecting a moderation effect of magnitude 0.1 with 10 individuals per group. However, if each group contains many easily accessible individuals (e.g., a school) such that the number of individuals per group can be 100, the power to detect the moderation effect is greater than 90%.
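The link between the number of individuals per group and power can be illustrated with a minimal sketch. It is not the power formula developed in this study or implemented in PowerUpRShiny. It uses a generic normal-approximation power calculation, and the standard error comes from a hypothetical simplification: the moderation effect is treated as a difference in regression slopes between two equally sized arms, with a moderator of unit variance, a residual variance of 1, and no clustering. The function names and default values are assumptions chosen for illustration, so the printed values should not be expected to match the power rates reported above.

```python
from statistics import NormalDist


def power_two_sided(effect, se, alpha=0.05):
    """Normal-approximation power of a two-sided test of effect / se."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    ncp = effect / se                 # noncentrality parameter
    # Probability of rejecting in either tail
    return z.cdf(ncp - crit) + z.cdf(-ncp - crit)


def illustrative_se(n_groups, n_per_group, residual_var=1.0, moderator_var=1.0):
    """Hypothetical standard error of a slope difference between two arms.

    Assumes (for illustration only) a control arm the same size as the
    treatment arm and ignores clustering in the treatment arm.
    """
    n_arm = n_groups * n_per_group
    return (residual_var / moderator_var * (1 / n_arm + 1 / n_arm)) ** 0.5


# Power for a 0.1 moderation effect with 25 groups as group size grows
for n_per_group in (10, 25, 50, 100):
    se = illustrative_se(25, n_per_group)
    print(n_per_group, round(power_two_sided(0.1, se), 3))
```

Under these simplifying assumptions, the standard error shrinks with the square root of the total number of individuals, which is why adding individuals per group raises power even when the number of groups is fixed.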
In comparison, when the moderator varies within and between groups, the additional variance from the group level can overpower any additional moderated effect from the aggregated moderator. Evaluations with this type of moderator will often require large group sample sizes with increases to individual per group sample size doing little to increase power once
Conclusion
To conclude, we highlight some limitations and opportunities for future research and summarize implications for practice. We limited the scope of our study to designs with a two-level data structure in one treatment arm and a single-level data structure in the other treatment arm (i.e., two/one partially nested designs). Many evaluations take place across a single entity (e.g., school district, state, company) with the treatment inducing nesting or using extant groups. However, many settings will have additional levels of nesting that should be considered for statistical or substantive reasons (e.g., students within schools within districts). These conditions require considerations beyond two levels. We encourage future research examining power in three/one and three/two partially nested designs.
Limited design conditions were used in our probe of the newly developed power formulas (e.g., sample sizes from 10 to 100). A more comprehensive set of sample size, moderator variance structure, and outcome variance structure combinations is needed. It would be informative to establish expected power rates for total and specific moderation effects (i.e., within-group moderation effects and between-group moderation effects) across a wider range of design conditions.
We noted that making assumptions about inputs to the power analysis was necessary because of the sparsity of literature reporting such values. Pilot studies are an excellent source for these values but are not always practical. We encourage future research to report the empirical values required for the power formulas presented here. Additionally, investigations into the robustness of predicted power to misspecified parameter values could indicate the degree of precision required for accurate predictions of power. Despite these limitations, this study enhances the set of tools that evaluators can use to plan evaluations. Considering moderated effects is increasingly relevant to policy and practice, and better planning of evaluations that generate evidence about for whom and under what conditions an intervention, program, or policy is effective is valuable across evaluation settings.
To close, we highlight several takeaway recommendations for evaluators interested in moderation effects with partially nested designs. First, consult existing evidence (e.g., literature or pilot study results) to identify moderators that are likely to have a large effect on the treatment–outcome relationship. This evidence should also be consulted to identify other parameter values required for the power analysis. Second, include covariates that reduce outcome variation. This is especially important when the moderator varies within and between groups as reductions in group-level outcome variance typically have a substantial influence on power. Third, when a moderator only varies within groups, increasing the sample of individuals per group is an effective strategy to increase power. Finally, if the moderator varies within and between groups, carefully consider the variance structure of the outcome as it has a substantial influence on evaluation feasibility and the relationship between power and sample size.
Supplemental Material
Supplemental Material, sj-pdf-1-aje-10.1177_1098214020977692 for Statistical Power for Detecting Moderation in Partially Nested Designs by Kyle Cox and Benjamin Kelcey in American Journal of Evaluation
Footnotes
Authors’ Note
The opinions expressed herein are those of the authors and not the funding agency.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article is based on work funded by the National Science Foundation [#1012665 and #1760884].
References
