Statistical Power in Three-Arm Cluster Randomized Trials

Abstract

This article shows how to compute statistical power for testing the main effect of treatment in three-arm cluster randomized trials. Using orthogonal coding, we derive the exact test statistic of the treatment effect and its non-central distribution. The non-centrality parameter in the omnibus test is found to be related to the non-centrality parameters in the contrast tests. A study of physician and pharmacist comanagement of patients’ blood pressure is used as an example to show the power computation in a three-arm cluster randomized trial.

Keywords

statistical power, sample size, cluster randomized trial

Cluster randomized trials have become very popular in medical research because individual patients often cannot be randomly assigned due to ethical and logistical concerns. It is especially true when the intervention is administered at different clinics and hospitals. For instance, different patient care management strategies can only be implemented at the hospital level rather than at the level of individual physicians working at the same hospital.

However, cluster randomized trials tend to lack sufficient statistical power for testing the treatment effect when compared to designs that randomly assign individual subjects. The design effect or the intraclass correlation can inflate the standard error of the treatment effect estimate in cluster randomized trials. As a result, the cluster randomized trial may suffer from insufficient statistical power (Donner & Klar, 2000; Murray, 1998). It is, therefore, crucial to conduct power analysis in planning a cluster randomized trial. The previous literature has provided approximate formulas for statistical power and sample size in a two-arm cluster randomized trial (Hsieh, 1988). Sometimes, the cluster randomized trial involves three arms, say, two treatment arms and one control arm (Perria et al., 2007; Zwar et al., 2010). The literature does not provide formulas for computing statistical power in three-arm cluster randomized trials.

The existing software packages for statistical power in multilevel designs (e.g., PinT and the Optimal Design program) do not accommodate more than two study conditions in cluster randomized trials. The overall test of any treatment differences among more than two study conditions is more difficult to compute than the test of two study conditions. The overall test requires special coding (i.e., orthogonal coding) to test multiple parameters in the relevant multilevel model. When there are three study conditions in the cluster randomized trial, the overall test corresponds to two parameters of two orthogonal contrasts among the three study conditions. The current software for multilevel modeling uses a Wald statistic to test multiple parameters, which is a chi-square statistic based on a large sample approximation. However, such a large sample approximation may not be appropriate when cluster randomized trials involve a limited number of clusters. Further, it is difficult to estimate the overall effect size among more than two study conditions for power analysis. There is no standard way to relate the pairwise treatment effects to the overall effect size among more than two study conditions, which is required for computing power in the overall test.

In this article, we will use Helmert coding to relate treatment mean differences to the overall effect size for more than two study conditions. We will use an F statistic for the overall test instead of the Wald statistic. The F statistic is an exact test, and it is more accurate than the Wald statistic. We will then use an example to show the power computation in a three-arm cluster randomized trial.

Statistical Power in Cluster Randomized Trials

We can analyze the data in cluster randomized trials, using linear mixed models. The linear mixed model for a cluster randomized trial with three arms in matrix notation is given by:

y_{j} = X_{j} β + Z_{j} u_{j} + e_{j}, j = 1, 2, \dots, J,

where the vector $y_{j}$ contains the individual observations nested in the jth cluster, $y_{j}^{'} = [Y_{1 j}, Y_{2 j}, \dots, Y_{n j}]$ ; $β^{'} = [β_{0}, β_{1}, β_{2}]$ ; $X_{j} = 1 \otimes [1, X_{1 j}, X_{2 j}]$ ; $X_{1 j}$ and $X_{2 j}$ are the indicator variables that differentiate the clusters among the three arms; $Z_{j} = 1$ ; the vector $u_{j}$ simplifies to a scalar $u_{0 j}$ (the random effect due to the jth cluster); and the error vector is $e_{j}^{'} = [e_{1 j}, e_{2 j}, \dots, e_{n j}]$ . We can simplify the matrix notation and represent the cluster randomized trial with three arms in one linear equation,

Y_{i j} = β_{0} + β_{1} X_{1 j} + β_{2} X_{2 j} + u_{0 j} + e_{i j}, (i = 1, 2, . . ., n; j = 1, 2, . . . J),

where $Y_{i j}$ is the ith outcome observation in the jth cluster. The indicator variable $X_{1 j}$ takes $\frac{1}{3}$ for the first treatment arm, $\frac{1}{3}$ for the second treatment arm, and $- \frac{2}{3}$ for the control condition. The second indicator $X_{2 j}$ assumes $\frac{1}{2}$ for the first treatment arm, $- \frac{1}{2}$ for the second treatment arm, and 0 for the control condition. Such orthogonal coding represents two independent Helmert comparisons. Although other orthogonal contrasts can be used, they will not change the power computation. We use Helmert coding because this coding scheme is frequently used in orthogonal comparisons.

The first indicator $X_{1 j}$ is used to compare the average outcome of the two treatment arms with that of the control arm. The second indicator $X_{2 j}$ is created to contrast the average outcome between the two treatment arms. The coefficient β₁ is the mean difference between the average of the two treatment conditions and that of the control condition,

β_{1} = \frac{μ_{T 1} + μ_{T 2}}{2} - μ_{T 3},

where $μ_{T 1}$ , $μ_{T 2}$ , and $μ_{T 3}$ represent the mean on outcome performance for the first treatment arm, the second treatment arm, and the control arm, respectively. The coefficient β₂ is the mean difference between the two treatment arms,

β_{2} = μ_{T 1} - μ_{T 2} .

The intercept β₀ is the grand mean. It is easy to see why the three coefficients correspond to the grand mean and the mean differences.

Taking expectation on both sides of Equation 2 produces the mean for each arm, that is,

μ_{T 1} = β_{0} + \frac{1}{3} β_{1} + \frac{1}{2} β_{2},

μ_{T 2} = β_{0} + \frac{1}{3} β_{1} - \frac{1}{2} β_{2},

μ_{T 3} = β_{0} - \frac{2}{3} β_{1} .

The above formulas are useful for power analysis because the mean differences between the treatment conditions are straightforward and intuitive. Researchers naturally think about mean differences when they attempt to estimate the effect size in comparing different treatments.
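The mapping between the arm means and the coefficients can be checked numerically. The following is a minimal sketch in Python, assuming hypothetical arm means of 120, 123, and 125; the function names `arm_means` and `coefficients` are ours and are not part of the article. Exact rational arithmetic is used so that the round trip can be verified without rounding error.

```python
from fractions import Fraction as Fr

# Orthogonal (Helmert) codes per arm, in the order T1, T2, T3 (control).
X1 = [Fr(1, 3), Fr(1, 3), Fr(-2, 3)]
X2 = [Fr(1, 2), Fr(-1, 2), Fr(0)]

def arm_means(b0, b1, b2):
    """Arm means implied by the intercept and the two contrast coefficients."""
    return [b0 + b1 * x1 + b2 * x2 for x1, x2 in zip(X1, X2)]

def coefficients(mu1, mu2, mu3):
    """Grand mean and the two Helmert contrasts implied by the arm means."""
    b0 = (mu1 + mu2 + mu3) / 3
    b1 = (mu1 + mu2) / 2 - mu3
    b2 = mu1 - mu2
    return b0, b1, b2

# Each code column sums to zero, and the two columns are orthogonal.
assert sum(X1) == 0 and sum(X2) == 0
assert sum(x1 * x2 for x1, x2 in zip(X1, X2)) == 0

# Round trip with hypothetical arm means.
mu = [Fr(120), Fr(123), Fr(125)]
b0, b1, b2 = coefficients(*mu)
assert arm_means(b0, b1, b2) == mu
```

With these hypothetical means, the first contrast is the average of the two treatment arms minus the control mean, and the second is the difference between the two treatment arms.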

The statistical test and power can be derived from the aggregated model based on cluster means. For planning purposes, we assume that the clusters are of equal size. The proposed approach also works when cluster sizes are not drastically different from each other. We therefore use a balanced design for the cluster randomized trial with equal cluster sizes. We can use one random term $e_{j}^{*}$ to represent the two random terms $u_{0 j}$ and ${\overset{ˉ}{e}}_{j}$ in the aggregated model (i.e., $e_{j}^{*} = u_{0 j} + {\overset{ˉ}{e}}_{j}$ ). The combined random term $e_{j}^{*}$ can be conceived of as having a constant variance $σ^{* 2}$ .

{\overset{ˉ}{Y}}_{j} = β_{0} + β_{1} X_{1 j} + β_{2} X_{2 j} + u_{0 j} + {\overset{ˉ}{e}}_{j} .

The cluster mean ${\overset{ˉ}{Y}}_{j}$ , therefore, has a normal distribution ${\overset{ˉ}{Y}}_{j} \sim N (β_{0} + β_{1} X_{1 j} + β_{2} X_{2 j}, σ^{* 2})$ . Further, the cluster mean is the maximum likelihood estimate of its population mean, that is, ${\overset{ˉ}{Y}}_{j} = {\hat{β}}_{0} + {\hat{β}}_{1} X_{1 j} + {\hat{β}}_{2} X_{2 j}$ . The maximum likelihood estimates ${\hat{β}}_{0}$ , ${\hat{β}}_{1}$ , and ${\hat{β}}_{2}$ can, therefore, be derived from the cluster means (McCulloch & Searle, 2001).

The complex error variance $σ^{* 2}$ is

σ^{* 2} = τ + σ^{2} / n .

The estimate of the variance $σ^{* 2}$ can be based on the pooled within-arm sum of squares of the cluster means divided by the sum of the within-arm degrees of freedom,

{\hat{σ}}^{* 2} = \frac{{S S W}_{T 1} + {S S W}_{T 2} + {S S W}_{T 3}}{J - 3} .

The term SSW represents the within-arm sum of squares of the cluster means ${\overset{ˉ}{Y}}_{j}$ . The subscripts $T 1$ , $T 2$ , and $T 3$ represent the two treatment arms and the control arm, respectively. For instance, $S S W_{T 1}$ is the sum of squares of the cluster means for the first treatment arm.

S S W_{T 1} = \sum_{j = 1}^{J / 3} {({\overset{ˉ}{Y}}_{T 1 j} - \sum_{j = 1}^{J / 3} {\overset{ˉ}{Y}}_{T 1 j} / (J / 3))}^{2},

where ${\overset{ˉ}{Y}}_{T 1 j}$ is the cluster mean in the first arm. The degrees of freedom for the estimated variance ${\hat{σ}}^{* 2}$ is $J - 3$ if the three arms in the cluster randomized trial have the same number of clusters (i.e., $J / 3$ ). In each treatment arm, the within-treatment degrees of freedom is $J / 3 - 1$ . Their sum is equal to $J - 3$ . The estimate ${\hat{σ}}^{* 2}$ is the restricted maximum likelihood estimate (McCulloch & Searle, 2001).
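As an illustration of the pooled estimate, the sketch below computes ${\hat{σ}}^{* 2}$ from a small set of hypothetical cluster means (nine clinics, three per arm). The data and the function name `pooled_cluster_variance` are assumptions made for this example only.

```python
from statistics import mean

def pooled_cluster_variance(arms):
    """Pooled within-arm variance of the cluster means.

    `arms` is a list of three lists, each holding the cluster means of
    one arm; the arms are assumed to contain J/3 clusters each.
    """
    ssw = 0.0
    n_clusters = 0
    for cluster_means in arms:
        arm_mean = mean(cluster_means)
        ssw += sum((y - arm_mean) ** 2 for y in cluster_means)
        n_clusters += len(cluster_means)
    return ssw / (n_clusters - len(arms))  # J - 3 degrees of freedom

# Hypothetical cluster means for J = 9 clinics (three per arm).
arms = [
    [118.0, 120.0, 122.0],  # T1
    [121.0, 123.0, 125.0],  # T2
    [124.0, 125.0, 126.0],  # T3 (control)
]
sigma_star2_hat = pooled_cluster_variance(arms)
print(sigma_star2_hat)
```

Here the within-arm sums of squares are 8, 8, and 2, so the pooled estimate is 18 divided by $J - 3 = 6$ .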

The estimate of the treatment contrast β₁ is the weighted sum of the outcome means for the three treatment arms with weights being $\frac{3}{2} X_{1 j}$ . In our case, the orthogonal coding produces the Helmert contrasts. The estimate of the first Helmert contrast is

{\hat{β}}_{1} = \frac{1}{2} {\overset{ˉ}{Y}}_{T 1} + \frac{1}{2} {\overset{ˉ}{Y}}_{T 2} - {\overset{ˉ}{Y}}_{T 3},

where ${\overset{ˉ}{Y}}_{T 1}$ , ${\overset{ˉ}{Y}}_{T 2}$ , and ${\overset{ˉ}{Y}}_{T 3}$ are the means of the three treatment arms. Specifically, ${\overset{ˉ}{Y}}_{T 1}$ is the average of the cluster means in the first treatment arm,

{\overset{ˉ}{Y}}_{T 1} = \frac{\sum_{j = 1}^{J / 3} {\overset{ˉ}{Y}}_{T 1 j}}{J / 3},

where ${\overset{ˉ}{Y}}_{T 1 j}$ is the jth cluster mean in the first treatment arm. The other two averages of cluster means for the second treatment and control arm are

{\overset{ˉ}{Y}}_{T 2} = \frac{\sum_{j = J / 3 + 1}^{2 J / 3} {\overset{ˉ}{Y}}_{T 2 j}}{J / 3} \quad \text{and} \quad {\overset{ˉ}{Y}}_{T 3} = \frac{\sum_{j = 2 J / 3 + 1}^{J} {\overset{ˉ}{Y}}_{T 3 j}}{J / 3} .

The average of the cluster means for each treatment arm has a common variance,

V a r ({\overset{ˉ}{Y}}_{T 1}) = V a r ({\overset{ˉ}{Y}}_{T 2}) = V a r ({\overset{ˉ}{Y}}_{T 3}) = \frac{σ^{* 2}}{J / 3} .

The variance of ${\hat{β}}_{1}$ becomes

\begin{array}{c} V a r ({\hat{β}}_{1}) = \frac{1}{4} V a r ({\overset{ˉ}{Y}}_{T 1}) + \frac{1}{4} V a r ({\overset{ˉ}{Y}}_{T 2}) + V a r ({\overset{ˉ}{Y}}_{T 3}) \\ = (\frac{1}{4} + \frac{1}{4} + 1) \frac{σ^{* 2}}{J / 3} = \frac{3}{2} \frac{σ^{* 2}}{J / 3} . \end{array}

Substituting ${\hat{σ}}^{* 2}$ (Equation 10) for $σ^{* 2}$ in the above equation, we obtain the estimated variance

\hat{V a r} ({\hat{β}}_{1}) = \frac{3}{2} \frac{{\hat{σ}}^{* 2}}{J / 3} .

In general, we can write $\hat{V a r} ({\hat{β}}_{1})$ in terms of the Helmert contrast coefficients $a_{1}$ , $a_{2}$ , and $a_{3}$ . The Helmert contrast coefficients are related to the values of the orthogonal coding (i.e., $a_{k} = \frac{3}{2} X_{1 j}$ ).

\begin{aligned} a_{k} = \frac{3}{2} X_{1 j} \\ a_{1} = \frac{3}{2} \times \frac{1}{3} = \frac{1}{2} \\ a_{2} = \frac{3}{2} \times \frac{1}{3} = \frac{1}{2} \\ a_{3} = \frac{3}{2} \times (- \frac{2}{3}) = - 1, \end{aligned}

where the multiplier $3 / 2$ is obtained by taking the reciprocal of $\sum X_{1 j}^{2} = (1 / 3)^{2} + (1 / 3)^{2} + (- 2 / 3)^{2}$ . The estimate of the Helmert contrast can be written as

{\hat{β}}_{1} = \sum_{k = 1}^{3} a_{k} {\overset{ˉ}{Y}}_{T k},

where ${\overset{ˉ}{Y}}_{T k}$ represents the mean of the kth treatment arm. The variance of ${\hat{β}}_{1}$ is

\begin{aligned} V a r ({\hat{β}}_{1}) & = \sum_{k = 1}^{3} a_{k}^{2} V a r ({\overset{ˉ}{Y}}_{T k}) \\ & = \sum_{k = 1}^{3} a_{k}^{2} \frac{σ^{* 2}}{J / 3} . \end{aligned}

The estimated variance is

\hat{V a r} ({\hat{β}}_{1}) = \sum_{k = 1}^{3} a_{k}^{2} \frac{{\hat{σ}}^{* 2}}{J / 3} .

The test for the first Helmert contrast can be based on a t statistic with $J - 3$ degrees of freedom,

T_{1} = \frac{{\hat{β}}_{1}}{\sqrt{\hat{V a r} ({\hat{β}}_{1})}} .

When the null hypothesis is true ( $H_{0} : β_{1} = 0$ ), the t statistic has a central t distribution. When the alternative hypothesis is true ( $H_{a} : β_{1} \neq 0$ ), the t statistic follows a non-central t distribution $T_{1}^{'}$ with $J - 3$ degrees of freedom and a non-centrality parameter $λ_{1}$ ,

\begin{aligned} λ_{1} & = \frac{β_{1}}{\sqrt{V a r ({\hat{β}}_{1})}} = \frac{β_{1}}{\sqrt{\sum_{k = 1}^{3} a_{k}^{2} \frac{σ^{* 2}}{J / 3}}} \\ & = \frac{β_{1}}{\sqrt{\frac{3 (τ + σ^{2} / n)}{J} \sum_{k = 1}^{3} a_{k}^{2}}} = \frac{β_{1}}{\sqrt{\frac{4.5 (τ + σ^{2} / n)}{J}}} . \end{aligned}

We can write the test for the second Helmert contrast $β_{2}$ in a similar way. The contrast coefficients $b_{1}$ , $b_{2}$ , and $b_{3}$ for the second Helmert contrast are 1, −1, and 0. We can use $2 X_{2 j}$ to derive the contrast coefficients for the means of the three treatment arms,

\begin{aligned} b_{k} & = 2 X_{2 j} \\ b_{1} & = 2 \times \frac{1}{2} = 1 \\ b_{2} & = 2 \times (- \frac{1}{2}) = - 1 \\ b_{3} & = 2 \times 0 = 0. \end{aligned}

The estimate of the second Helmert contrast is

{\hat{β}}_{2} = \sum_{k = 1}^{3} b_{k} {\overset{ˉ}{Y}}_{T k} .

Its variance is

V a r ({\hat{β}}_{2}) = \sum_{k = 1}^{3} b_{k}^{2} \frac{σ^{* 2}}{J / 3} .

The estimated variance of ${\hat{β}}_{2}$ is therefore

\hat{V a r} ({\hat{β}}_{2}) = \sum_{k = 1}^{3} b_{k}^{2} \frac{{\hat{σ}}^{* 2}}{J / 3} .

The test for the second contrast also uses a t statistic with the degrees of freedom $J - 3$ ,

T_{2} = \frac{{\hat{β}}_{2}}{\sqrt{\hat{V a r} ({\hat{β}}_{2})}} .

It has a central t distribution when the null hypothesis $H_{0} : β_{2} = 0$ is true. Under the alternative hypothesis $H_{a} : β_{2} \neq 0$ , the test statistic follows a non-central t distribution $T_{2}^{'}$ with the degrees of freedom $J - 3$ and a non-centrality parameter $λ_{2}$ ,

\begin{aligned} λ_{2} & = \frac{β_{2}}{\sqrt{V a r ({\hat{β}}_{2})}} = \frac{β_{2}}{\sqrt{\sum_{k = 1}^{3} b_{k}^{2} \frac{σ^{* 2}}{J / 3}}} \\ & = \frac{β_{2}}{\sqrt{\frac{6 (τ + σ^{2} / n)}{J}}} . \end{aligned}

It should be noted that the non-centrality parameter for a pairwise comparison between any two treatment arms takes the same form as $λ_{2}$ except that $β_{2}$ needs to be replaced by the corresponding mean difference between the two arms. For instance, if the mean of the first treatment arm is compared with that of the control arm, the pairwise comparison test uses a t statistic with $J - 3$ degrees of freedom and a non-centrality parameter $(μ_{T 1} - μ_{T 3}) / \sqrt{6 (τ + σ^{2} / n) / J}$ . The numerator $μ_{T 1} - μ_{T 3}$ in the non-centrality parameter is simply the mean difference between the first treatment arm and the control arm.
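The non-centrality parameters of the contrast tests can be computed directly from the contrast coefficients. The sketch below is one way to do so, assuming equal numbers of clusters per arm and equal cluster sizes; the helper names and the planning values ( $τ = 4$ , $σ^{2} = 50$ , $n = 25$ , $J = 30$ ) are hypothetical and chosen only for illustration.

```python
import math

def sigma_star2(tau, sigma2, n):
    """Variance of a cluster mean: tau + sigma^2 / n."""
    return tau + sigma2 / n

def contrast_ncp(effect, coefs, tau, sigma2, n, J):
    """Non-centrality parameter of the t test for a contrast of arm means.

    `effect` is the population value of the contrast and `coefs` holds
    its coefficients; each arm is assumed to have J/3 clusters of size n.
    """
    var = sum(c * c for c in coefs) * sigma_star2(tau, sigma2, n) / (J / 3)
    return effect / math.sqrt(var)

A = (0.5, 0.5, -1.0)  # first Helmert contrast
B = (1.0, -1.0, 0.0)  # second Helmert contrast
PAIR_T1_T3 = (1.0, 0.0, -1.0)  # pairwise comparison of T1 with control

# Hypothetical planning values.
tau, sigma2, n, J = 4.0, 50.0, 25, 30
lam1 = contrast_ncp(-3.5, A, tau, sigma2, n, J)
lam2 = contrast_ncp(-3.0, B, tau, sigma2, n, J)
lam_pair = contrast_ncp(-5.0, PAIR_T1_T3, tau, sigma2, n, J)
print(lam1, lam2, lam_pair)
```

Because the squared coefficients of any pairwise comparison sum to 2, the pairwise non-centrality parameter has the same denominator as $λ_{2}$ .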

Statistical power for testing the contrasts can be formulated in one equation with subscript $p$ indicating which contrast it is. The statistical power in a two-sided test is

\begin{aligned} P [|T_{p}^{'} (J - 3, λ_{p})| \geq t_{0}] & = 1 - P [T_{p}^{'} (J - 3, λ_{p}) < t_{0}] \\ & \quad + P [T_{p}^{'} (J - 3, λ_{p}) < - t_{0}], \end{aligned}

where $p = 1$ means the first contrast $β_{1}$ ; $p = 2$ means the second contrast $β_{2}$ ; and $t_{0}$ is the critical value. We can use the critical value $t_{1 - α / 2, J - 3}$ when the orthogonal contrasts are a priori and done without an omnibus F test.
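The two-sided power function can be evaluated with standard routines for the non-central t distribution. The following sketch assumes that SciPy is available and uses `scipy.stats.t.ppf` for the critical value and `scipy.stats.nct.cdf` for the non-central t distribution; the function name `contrast_power` and the non-centrality values in the loop are hypothetical.

```python
from scipy import stats

def contrast_power(ncp, J, alpha=0.05):
    """Two-sided power of a contrast t test with J - 3 degrees of freedom."""
    df = J - 3
    t0 = stats.t.ppf(1 - alpha / 2, df)  # critical value t_{1 - alpha/2, J - 3}
    # 1 - P[T' < t0] + P[T' < -t0], as in the power equation.
    return 1 - stats.nct.cdf(t0, df, ncp) + stats.nct.cdf(-t0, df, ncp)

# Hypothetical non-centrality parameters for a design with J = 30 clusters.
for ncp in (1.0, 2.0, 3.0):
    print(ncp, round(contrast_power(ncp, J=30), 4))
```

The power depends on the non-centrality parameter only through its absolute value, so the sign of the hypothesized contrast does not matter in a two-sided test.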

The omnibus F test examines the null hypothesis that the means of the three treatment arms are all equal $H_{0} : μ_{T 1} = μ_{T 2} = μ_{T 3}$ , which suggests that the orthogonal contrasts are simultaneously zero (i.e., $H_{0} : β_{1} = β_{2} = 0$ ). It can be proven that the overall F test is the average of the two squared t statistics for the two orthogonal contrast tests,

F = \frac{1}{2} (T_{1}^{2} + T_{2}^{2}) .

The F statistic has 2 degrees of freedom in the numerator and $J - 3$ degrees of freedom in the denominator.

It is intuitive to see why the two squared t statistics add up to an F statistic. Squaring $T_{1}$ produces an F statistic. By the definition of an F statistic, it is a ratio between two chi-squares,

T_{1}^{2} = F (1, J - 3) = \frac{χ_{1}^{2} / 1}{χ_{3}^{2} / (J - 3)} .

Likewise,

T_{2}^{2} = F (1, J - 3) = \frac{χ_{2}^{2} / 1}{χ_{3}^{2} / (J - 3)} .

Note that the two squared t statistics share a common $χ_{3}^{2}$ in the denominator and that $χ_{1}^{2}$ and $χ_{2}^{2}$ are independent due to orthogonality between the two contrasts. Adding two independent chi-square variables yields a chi-square with the degrees of freedom equal to the sum of their respective degrees of freedom. Thus, we have

\begin{aligned} \frac{1}{2} (T_{1}^{2} + T_{2}^{2}) & = \frac{(χ_{1}^{2} + χ_{2}^{2}) / 2}{χ_{3}^{2} / (J - 3)} \\ & = \frac{χ_{d f = 2}^{2} / 2}{χ_{3}^{2} / (J - 3)} \sim F (2, J - 3) . \end{aligned}
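The identity between the omnibus F and the average of the two squared t statistics can also be checked numerically. The sketch below uses hypothetical cluster means with three clusters per arm and compares the average of the squared contrast t statistics with the one-way analysis of variance F computed on the cluster means; all names and data are assumptions for illustration.

```python
from statistics import mean

def contrast_t(arm_means, coefs, var_hat, m):
    """t statistic for a contrast of arm means, with m clusters per arm."""
    estimate = sum(c * y for c, y in zip(coefs, arm_means))
    return estimate / (sum(c * c for c in coefs) * var_hat / m) ** 0.5

# Hypothetical cluster means, three clusters per arm (J = 9).
arms = [[118.0, 120.0, 122.0], [121.0, 123.0, 125.0], [124.0, 125.0, 126.0]]
m = len(arms[0])
J = 3 * m
arm_means = [mean(a) for a in arms]
grand = mean(arm_means)

# Pooled within-arm variance of the cluster means (J - 3 df).
ssw = sum((y - mu) ** 2 for a, mu in zip(arms, arm_means) for y in a)
var_hat = ssw / (J - 3)

t1 = contrast_t(arm_means, (0.5, 0.5, -1.0), var_hat, m)
t2 = contrast_t(arm_means, (1.0, -1.0, 0.0), var_hat, m)
f_from_t = 0.5 * (t1 ** 2 + t2 ** 2)

# One-way ANOVA F from the between-arm sum of squares (2 numerator df).
ssb = m * sum((mu - grand) ** 2 for mu in arm_means)
f_anova = (ssb / 2) / var_hat
print(f_from_t, f_anova)
```

The two quantities agree because the two Helmert contrasts split the between-arm sum of squares into two orthogonal pieces.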

The F statistic is more appropriate than the Wald statistic in testing multiple parameters in cluster randomized trials using multilevel modeling. The Wald statistic is a chi-square test based on a large sample approximation (Raudenbush & Bryk, 2002). If the number of clusters $J$ is fairly large, say, 200, then the approximation is good. In this case, it can be proven that $2 \times F_{2, J - 3}$ for large $J$ is approximately equal to the Wald statistic $χ_{2}^{2}$ , and the two tests are basically equivalent. We can verify this by checking the critical values for the two statistics. At the 5% significance level, the critical value for the Wald chi-square test is $χ_{.95, 2}^{2} = 5.991465$ . The critical value based on the F statistic is $2 \times F_{.95, 2, 200 - 3} = 6.083506$ . The two tests are almost the same. However, cluster randomized trials typically do not involve that many clusters in practice. If there are sixty clusters ( $J = 60$ ), the critical value for the chi-square test remains the same. The critical value based on the F statistic is now $2 \times F_{.95, 2, 60 - 3} = 6.317685$ . Because the Wald critical value is smaller, the Wald test is more liberal than the exact F test. Thus, we shall use the exact F statistic to test the overall treatment effect most of the time.
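The critical values quoted above can be reproduced without statistical software, because both distributions have closed-form tails when the test involves two parameters. The sketch below relies on two facts that are not stated in the article: the upper tail of a chi-square with 2 degrees of freedom is $e^{- x / 2}$ , and the upper tail of $F (2, d)$ is $(1 + 2 x / d)^{- d / 2}$ . The function names are ours, and the shortcut does not extend to tests with other numerator degrees of freedom.

```python
import math

def chi2_crit_2df(alpha):
    """Upper-alpha critical value of a chi-square with 2 df (closed form)."""
    return -2.0 * math.log(alpha)

def f_crit_2_num_df(alpha, dfd):
    """Upper-alpha critical value of F(2, dfd).

    Uses the closed-form tail P(F > x) = (1 + 2x/dfd) ** (-dfd/2),
    which holds only when the numerator has 2 degrees of freedom.
    """
    return (dfd / 2.0) * (alpha ** (-2.0 / dfd) - 1.0)

alpha = 0.05
print(chi2_crit_2df(alpha))  # Wald chi-square critical value
for J in (200, 60):
    # Twice the F critical value, comparable to the chi-square scale.
    print(J, 2 * f_crit_2_num_df(alpha, J - 3))
```

As $J$ grows, twice the F critical value decreases toward the chi-square critical value, which is why the two tests nearly coincide at $J = 200$ but differ more at $J = 60$ .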

When the means of the three treatment arms are not all equal, the alternative hypothesis holds ( $H_{a}$ : at least two of $μ_{T 1}$ , $μ_{T 2}$ , and $μ_{T 3}$ differ). The two independent chi-squares $χ_{1}^{2}$ and $χ_{2}^{2}$ have non-centrality parameters $λ_{1}^{2}$ and $λ_{2}^{2}$ , respectively. The non-centrality parameter for the non-central chi-square $χ_{d f = 2}^{2}$ is the sum of the squared non-centrality parameters for the orthogonal contrast tests because the non-centrality parameters of independent chi-squares are additive (Johnson, Kotz, & Balakrishnan, 1995). So the non-centrality parameter λ for $χ_{d f = 2}^{2}$ under $H_{a}$ is

λ = λ_{1}^{2} + λ_{2}^{2} .

Under $H_{a}$ , the F statistic has a non-central F distribution with 2 degrees of freedom in the numerator, $J - 3$ degrees of freedom in the denominator, and a non-centrality parameter λ (i.e., $F^{'} (2, J - 3, λ)$ ). Its statistical power function is

P [F^{'} (2, J - 3, λ) \geq F_{0}],

where $F_{0}$ is the critical value for the overall F test.
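The power of the omnibus test can be evaluated in the same way as the contrast tests. The sketch below assumes that SciPy is available and uses `scipy.stats.f.ppf` for the critical value $F_{0}$ and `scipy.stats.ncf.sf` for the upper tail of the non-central F distribution; the function name `omnibus_power` and the input values are hypothetical.

```python
from scipy import stats

def omnibus_power(lam1, lam2, J, alpha=0.05):
    """Power of the omnibus F test with 2 and J - 3 degrees of freedom.

    lam1 and lam2 are the non-centrality parameters of the two
    orthogonal contrast t tests; their squares add up to lambda.
    """
    lam = lam1 ** 2 + lam2 ** 2
    f0 = stats.f.ppf(1 - alpha, 2, J - 3)  # critical value F_0
    return stats.ncf.sf(f0, 2, J - 3, lam)  # P[F'(2, J - 3, lambda) >= F_0]

# Hypothetical non-centrality parameters for a design with J = 30 clusters.
print(omnibus_power(2.0, 1.5, J=30))
```

Only the squared non-centrality parameters enter λ, so the direction of each contrast has no bearing on the omnibus power.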

In estimating the values for β₁ and β₂, we can start with simple effects between the treatment arms and the control arm,

Δ_{1} = μ_{T 1} - μ_{T 3} \quad \text{and} \quad Δ_{2} = μ_{T 2} - μ_{T 3} .

We can then use the following relationship to convert the simple effects to the parameter values of β₁ and β₂:

β_{1} = \frac{μ_{T 1} + μ_{T 2}}{2} - μ_{T 3} = \frac{Δ_{1} + Δ_{2}}{2},

β_{2} = μ_{T 1} - μ_{T 2} = Δ_{1} - Δ_{2} .

Example

We use a study on physician and pharmacist cooperation in managing patients’ health care as an example. The intervention involves physicians and pharmacists following different protocols in managing systolic blood pressure among patients who suffer from chronic kidney disease. Effective management of chronic kidney disease emphasizes strict blood pressure control to lower cardiovascular risk and slow the progression of the disease (de Lusignan et al., 2009). All the physicians and pharmacists affiliated with the same clinic will use the same approach to managing patients’ care. In this case, it is not feasible to randomly assign individual physicians and pharmacists to different interventions. Clinics are, therefore, randomly assigned to use different kinds of physician and pharmacist management. The study forms a cluster randomized trial. The first treatment, $T 1$ , uses the physician–pharmacist collaborative model (PPCM), in which the patient’s personal physician delegates responsibility to the pharmacist to assist with achieving blood pressure control (Carter, 2010; Carter et al., 2010). The second treatment arm, $T 2$ , uses periodic physician and patient consultation to control blood pressure. The control arm, $T 3$ , follows the business-as-usual model in managing patients’ blood pressure.

The medical researcher hypothesizes that the first treatment arm reduces the mean systolic blood pressure by 5 mmHg ( $Δ_{1} = - 5$ ) when compared to the control arm, and that the second treatment arm will lower the mean blood pressure by 2 mmHg ( $Δ_{2} = - 2$ ) when compared to the control arm. The standard deviation of systolic blood pressure is 7.7 mmHg or $σ = 7.7$ (Muntner et al., 2011). Suppose that 50 patients will be recruited from each clinic (n = 50). The researcher is interested in the statistical power for the omnibus F test and the power for the contrast test between the first treatment arm and the control arm ( $μ_{T 1} - μ_{T 3}$ ) and the contrast test between the first treatment arm and the second treatment arm ( $μ_{T 1} - μ_{T 2}$ ). For planning purposes, we compute statistical power under four different intraclass correlations ρ (i.e., 0, .05, .08, .10). The site variance τ can be obtained by using the intraclass correlation ρ and the error variance $σ^{2}$ (i.e., $τ = ρ σ^{2} / (1 - ρ)$ ).
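The planning quantities implied by these inputs follow from the formulas given earlier. The sketch below converts the simple effects to $β_{1}$ and $β_{2}$ and computes τ, $σ^{* 2}$ , and the two contrast non-centrality parameters for each intraclass correlation. The number of clusters ( $J = 30$ ) is an assumption chosen only to illustrate the arithmetic, and the function name `planning_values` is ours.

```python
import math

delta1, delta2 = -5.0, -2.0  # simple effects versus control (mmHg)
sigma, n = 7.7, 50           # error SD and patients per clinic

beta1 = (delta1 + delta2) / 2  # average of the treatments versus control
beta2 = delta1 - delta2        # first versus second treatment

def planning_values(rho, J):
    """Return tau, sigma*^2, and the two contrast non-centrality parameters."""
    sigma2 = sigma ** 2
    tau = rho * sigma2 / (1 - rho)          # site variance from rho
    s2 = tau + sigma2 / n                   # variance of a cluster mean
    lam1 = beta1 / math.sqrt(4.5 * s2 / J)  # first Helmert contrast
    lam2 = beta2 / math.sqrt(6.0 * s2 / J)  # second Helmert contrast
    return tau, s2, lam1, lam2

# J = 30 is a hypothetical number of clinics used for illustration.
for rho in (0.0, 0.05, 0.08, 0.10):
    tau, s2, lam1, lam2 = planning_values(rho, J=30)
    print(rho, round(tau, 3), round(s2, 3), round(lam1, 3), round(lam2, 3))
```

The output shows how quickly the variance of a cluster mean grows with ρ: even a small intraclass correlation makes τ larger than the within-clinic component $σ^{2} / n$ .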

Table 1 lists statistical power for the omnibus F test and the two contrast tests. It is easy to see that statistical power is not evenly distributed among the omnibus test and the contrast tests. The omnibus F tends to have sufficient power. The power for the contrast test depends on the mean difference between the contrasted treatment arms. The contrast $μ_{T 1} - μ_{T 3}$ yields a larger effect than the contrast $μ_{T 1} - μ_{T 2}$ .

\begin{aligned} μ_{T 1} - μ_{T 3} = Δ_{1} = - 5 \\ μ_{T 1} - μ_{T 2} = Δ_{1} - Δ_{2} = - 3 \end{aligned}

Table 1.

Statistical Power and Number of Clusters J.

$ρ$	J	Omnibus F	$μ_{T 1} - μ_{T 3}$	$μ_{T 1} - μ_{T 2}$
0	6	0.6561	0.6665	0.2971
0	9	0.9742	0.9813	0.6609
0	12	0.9990	0.9994	0.8619
0.05	12	0.7344	0.7513	0.3169
0.05	15	0.8627	0.8761	0.4233
0.05	18	0.9332	0.9417	0.5210
0.05	21	0.9691	0.9738	0.6077
0.05	24	0.9862	0.9886	0.6828
0.05	27	0.9941	0.9952	0.7464
0.05	30	0.9975	0.9980	0.7993
0.08	12	0.5591	0.5685	0.2166
0.08	18	0.8048	0.8168	0.3634
0.08	24	0.9240	0.9313	0.4995
0.08	30	0.9731	0.9765	0.6171
0.08	36	0.9911	0.9925	0.7139
0.08	42	0.9972	0.9977	0.7905
0.08	48	0.9992	0.9993	0.8493
0.1	18	0.7154	0.7260	0.2975
0.1	24	0.8610	0.8700	0.4138
0.1	30	0.9370	0.9427	0.5209
0.1	36	0.9731	0.9762	0.6154
0.1	42	0.9891	0.9905	0.6961
0.1	48	0.9957	0.9964	0.7632
0.1	54	0.9984	0.9987	0.8177

Therefore, statistical power for testing $μ_{T 1} - μ_{T 3}$ is much higher than that for testing $μ_{T 1} - μ_{T 2}$ . If the intraclass correlation is small ( $ρ = .05$ ) and 27 clinics ( $J = 27$ ) are used, statistical power is 0.9952 for the first contrast test and 0.7464 for the second contrast test. These power values may be deemed sufficient if the first contrast is of primary interest and the second contrast is of lesser interest. If the intraclass correlation is assumed to be .10, then 48 clinics ( $J = 48$ ) are required to achieve comparable statistical power.

For comparison purposes, we can plot the power for the overall test in the three-arm cluster randomized trial and the power for the test in the two-arm cluster randomized trial (Figure 1). We use the effect sizes −5 and −3 in the two-arm cluster randomized trial. The number of clusters J is held equal between the two-arm and three-arm cluster randomized trials. The dashed line at the top of the figure is for the larger effect size −5; the dashed line at the bottom is for the smaller effect size −3. The solid line represents the statistical power for the overall test in the three-arm cluster randomized trial. The power in the three-arm cluster randomized trial is less than that for testing the larger pairwise effect size −5 but higher than that for testing the smaller pairwise effect size −3. The power for the overall test in the three-arm cluster randomized trial is between the powers of two separate cluster randomized trials with two arms. However, it requires more resources to run a two-arm cluster randomized trial twice to study more than two treatment conditions. If we add up the number of clusters in the two separate cluster randomized trials with two arms, the total number of clusters will be larger than that in the three-arm cluster randomized trial. In the former scenario, each two-arm cluster randomized trial includes a control arm. Overall, it is economical to combine the two-arm cluster randomized trials into one three-arm cluster randomized trial, as this can significantly reduce the study cost and time.

Figure 1.

Statistical power in two-arm and three-arm cluster randomized trial.

Discussion

Cluster randomized trials have been increasingly used in health and medical fields because intact social settings need to be randomly assigned to treatment conditions due to ethical and logistical concerns. Treatment interventions are directly given to the entire social units, emulating how the treatment is actually implemented in practice. In pharmacotherapy studies, clinics and hospitals are often randomly assigned to the treatment and control arm. Occasionally, there are two treatment arms and one control in a cluster randomized trial. It is essential to calculate statistical power for testing the main effect of treatment in a three-arm cluster randomized trial because the design often lacks statistical power. The previous literature provides a method to compute statistical power in a two-arm cluster randomized trial based on normal approximation.

The current article shows the exact statistics for the overall test, and the contrast tests, and their statistical power based on the non-central F and t distribution. The F statistic is an exact test, and it is proven to be more conservative than the Wald statistic in testing the overall effect of treatment among more than two study conditions. Using orthogonal coding, we demonstrate an easy way to relate treatment mean differences to the overall effect size among more than two study conditions. The overall effect size can be decomposed into simple effect sizes, which are always meaningful in the research context (Lenth, 2001). Researchers can think about the treatment mean differences among the study conditions and calculate the overall effect size for the omnibus test. This facilitates statistical power analysis in planning a three-arm cluster randomized trial. Although the example does not include any covariates, the described method can be adapted easily to accommodate covariates in the cluster randomized trials. To account for covariates, we can reduce the variances by a certain percentage based on the correlation between the covariates and the outcome (Raudenbush, Martinez, & Spybrook, 2007).

The current article is limited to a single design in multilevel modeling, although future research can include other designs in multilevel modeling. For example, longitudinal studies have been increasingly analyzed with random coefficients models. In clinical studies, patients are often randomly assigned to the treatment and control at different sites and they are followed over an extended period of time. Such a design will call for more complex power analysis because the design not only involves sample size choice but also duration and frequency of the study (Raudenbush & Liu, 2001). Additionally, future research can be extended to power analysis for dichotomous outcomes in cluster randomized trials. The data analysis with binary outcomes typically uses a different estimation strategy than that with continuous variables. Statistical power for continuous variables cannot be directly applied. Power analysis for binary outcomes requires a completely different paradigm from the start. Although there has been tremendous progress in estimation theory of binary outcomes, little has been done on the relevant power analysis for binary outcomes in multilevel modeling. More research is needed to understand statistical power in testing binary outcomes in multilevel modeling.

Declaration of Conflicting Interests

The author(s) declared no conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Carter, B. L. (2010). Designing quality health services research: Why comparative effectiveness studies are needed and why pharmacists should be involved. Pharmacotherapy, 30, 751–757.

Carter, Clarke, Ardery, Weber, James, Vander, … Collaboration Among Pharmacists Physicians To Improve Outcomes Now (CAPTION) Trial Investigators. (2010). A cluster-randomized effectiveness trial of a physician-pharmacist collaborative model to improve blood pressure control. Cardiovascular Quality and Outcomes, 3, 418–423.

de Lusignan, S., Gallagher, Chan, Thomas, Vlymen, Nation, … Harris (2009). The QICKD study protocol: A cluster randomized trial to compare quality improvement interventions to lower systolic BP in chronic kidney disease (CKD) in primary care. Implementation Science, 4, 1–15.

Donner, Klar (2000). Design and analysis of cluster randomization trials in health research. London, England: Arnold.

Hsieh, F. Y. (1988). Sample size formulae for intervention studies with the cluster as unit of randomization. Statistics in Medicine, 8, 1195–1201.

Johnson, Kotz, Balakrishnan (1995). Continuous univariate distributions (Vol. 2, 2nd ed.). New York, NY: Wiley.

Lenth (2001). Some practical guidelines for effective sample size determination. American Statistician, 55, 187–193.

McCulloch, C. E., Searle, S. R. (2001). Generalized, linear and mixed models. New York, NY: Wiley.

Muntner, Shimbo, Tonelli, Reynolds, Arnett, D. K., Oparil (2011). The relationship between visit-to-visit variability in systolic blood pressure and all-cause mortality in the general population. Hypertension, 57, 160–166.

Murray, D. M. (1998). Design and analysis of group-randomized trials. New York, NY: Oxford University Press.

Perria, Mandolini, Guerrera, Jefferson, Billi, Calzini, … Pasquarella (2007). Implementing a guideline for the treatment of type 2 diabetics: Results of a cluster-randomized controlled trial (C-RCT). BMC Health Services Research, 7, 1–9.

Raudenbush, Bryk (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Newbury Park, CA: Sage.

Raudenbush, S. W., Martinez, Spybrook (2007). Strategies for improving precision in group-randomized experiments. Educational Evaluation and Policy Analysis, 29, 5–29.

Raudenbush, S. W., Liu (2001). Effects of study duration, frequency of observation, and sample size on power in studies of group differences in polynomial change. Psychological Methods, 6, 387–401.

Zwar, Richmond, Halcomb, Furler, Smith, Hermiz, … Borland (2010). Quit in general practice: A cluster randomised trial of enhanced in-practice support for smoking cessation. BMC Family Practice, 11, 1–8.