Abstract
We propose a general approach based on the concept of overdispersion for specification of a random effects model (REM) in meta-analysis. This approach is similar to that used in generalized linear models, and includes the traditional REM as a particular case. A key feature of the model is the interpretation of the multiplicative factor as an intra-class correlation parameter. We provide several motivating examples, discuss statistical inference, and compare the new and standard methods on two examples of published meta-analyses. Estimation of the overdispersion parameter in the proposed model is compared in simulations to that of the traditional between-studies variance in the case of normal means. For small values of heterogeneity, the coverage of the confidence intervals for the overdispersion parameter is more stable.
1 Introduction
In a traditional random effects meta-analysis the variance of the estimated effect size parameter
where θ is the overall effect size. More generally, other distributional assumptions can be made for the ith study effect size estimate:
Examples of non-normal distributions F and G are F binomial and G normal, or F normal and G a skewed normal or skewed t (see Lee and Thompson, 2008). Note, that the variance
The model (1.1) is an additive random effects model (REM). However, in some circumstances, a multiplicative model
might be more appropriate. Note that the
The basic REM (1.1) is based on the assumption that the K studies having particular characteristics are randomly chosen from a population of similar studies. The assumption of randomness, though generally accepted, is not subject to any testing procedure. For example, Peto (1987) noted that non-randomness occurs in designs with chronologically consecutive studies.
The proposed model is constructed by replacing φ in (1.3) by
where a(ni) are known linear functions of the sample sizes ni, i = 1, …, K. This model has an advantage of permitting inflation or deflation (over- or underdispersion) of the variances, in contrast to the additive model in (1.1) which permits only inflation of the variances. A positive feature of this model is that γ can be interpreted as an intra-class correlation parameter or a transformation of it; see Section 3 for details.
The proposed overdispersion model (ODM) is motivated by overdispersion due to intra-class correlation in the normal model or in the binomial model. Some background on overdispersion is provided in Section 2. The proposed model is formally introduced in Section 3. Method of moments and maximum likelihood estimators of the correlation parameter γ are given in Section 4; two well-known examples are provided in Section 5; a simulation study for the case of normal means is described in Section 6; final comments are in Section 7.
2 Background on overdispersion
The idea of overdispersion arises naturally for data from a Poisson distribution. The mean and variance of a Poisson variable are equal, but in some cases the data may have a sample variance that is smaller or larger than the mean. The mean–variance specification has been used as a definition for the terms under- or overdispersion. The paper by Bliss (1953) is one of the earliest to discuss this concept. Cox (1983) describes a model in which the variance has an inflation or deflation factor. An extensive examination of overdispered binomial data is provided by Anderson (1988).
In the case of a one-parameter family, the notion of overdispersion is relatively clear because the variance is a known function of the mean (see Cox, 1983). When the postulated mean–variance relationship does not hold, overdispersion means inflation of the variance above the nominal value. For a beta-binomial model (see Section 3.2), the variance is inflated relative to the binomial variance. In general, the inflation may be by a constant multiplier or it may depend on covariates.
For a location-scale family, such as the normal distribution, the scale can take on any positive value. So there is no notion of data being overdispersed relative to such a model. However, in the case that the data is modelled by a distribution with a particular scale, it can be over- or underdispersed relative to that assumption. In the general meta-analytic context overdispersion means variance inflation relative to that of the fixed effects model. Thus, the REM (1.1) is overdispersed whenever the variance component τ2 is non-zero. Similarly, the REM (1.3) is overdispersed when ϕ > 1.
In the more general context, overdispersion can arise from omitted unobserved variables. Regardless of origin of overdispersion, the main interest lies in the resulting marginal models, i.e., the composite models in which the mean and correlation functions are modelled directly and separately (see Pryseley et al., 2011). In meta-analysis, two possible mechanisms are clustering in the population and mixing resulting in a compound distribution. Different modelling approaches can lead to the same marginal model, so in general it is difficult to infer the precise cause, or underlying process, leading to the overdispersion. See Kim (2006), Hinde and Demétrio (1998) and references therein for further details.
3 Overdispersed models in meta-analysis
The proposed ODM was motivated by overdispersed normal data and by overdispersed binomial data when overdispersion is due to intra-class correlation.
The following examples make use of what has been called the intra-class correlation (ICC) matrix, namely, R = (rjj′) with rjj= 1, rjj′ = ρ (j ≠ j′), j, j′ = 1, …, n, where ρ is the common correlation between two distinct observations. This n-dimensional matrix has one eigenvalue equal to 1 + (n – 1)ρ and n – 1 eigenvalues equal to 1 – ρ. Because R is positive semi-definite, ρ is constrained to the interval: –1/(n – 1) ≤ ρ ≤ 1. For even moderately sized samples the intra-class correlation is predominantly positive.
3.1 Overdispersed normal data
For participant j in study i, denote the outcome by Yij = θ + bi + εij, j = 1, …, Ki, i = 1, …, K. Further, let the random effect variables bi be normally and independently distributed,
where ρi is the intra-class correlation coefficient (ICC). The ni × ni covariance matrix Vi for vector
where
In this parameterization,
For ρi > 0, the hierarchical model
with parameters
with parameters
Assuming homogeneous correlations, i.e., ρi = ρ for all ICCs, we propose the meta-analytic REM of the one-sample study level statistics:
equivalently,
This model can also be called an overdispersed (for ρ > 0) or underdispersed (for ρ < 0) normal model, in which the estimate
3.2 Overdispersed binomial data and the beta-binomial model
Consider the hierarchical model in which the group-level sums Xi of Bernoulli random variables have parameter πi, i.e., Xi ∼ Bin(ni, πi); the probabilities πi have a common beta distribution with parameters α, β, 0 < α, β < 1; that is πi ∼ Beta(α, β). Then Xi has a beta-binomial distribution with the density
where B(u, v) is the ordinary beta function (Euler integral). Note that
Reparametrize (3.6) with π = α/(α + β) and θ = 1/(α + β+ 1), 0 < π, θ < 1, or equivalently, α = π (1 – θ)/θ, β= (1 – π)(1 – θ)/θ. Then E(Xi) = ni π and it follows that
Note the linear factor in the variance.
Alternatively, a marginal ICC-based model can be obtained as follows. Let Bernoulli variables Yij ∼ Bin(1, π) and corr(Yij, Yij′) = ρ Then for
(Prentice, 1986). The beta-binomial model is widely used in both frequentist and Bayesian meta-analysis of binomial data (see e.g., Young-Xu and Chan, 2008). For a general development of hierarchical and marginal approaches to the generalized linear models see Pryseley et al. (2011).
3.3 Comparative studies
In a two-arm study with a treatment (T) and a control (C), hierarchical models can be used to compare directly two-sample effect measures θi, i = 1, …, K. Examples of these measures are standardized mean differences, risk differences or differences of log-odds ratios. As we shall see, a variety of marginal arm-level models are possible, and there is no simple way to choose among them.
where µiC is a control arm mean, zk is an indicator variable for T or C, and the arm-level covariance matrices Vik are given by (3.1) with an additional subscript k to denote an arm of the ith study. Denote correlations within the respective arms by ρiC and ρiT. Additionally, let ρiCT be the correlation across the arms of the ith study. The studies are assumed to be independent.
Suppose the difference of means µT – µC is estimated by the difference
Both forms (3.9a) and (3.9b) provide different insights into the model. The standard REM has a random effect only in the treatment arm (see Thompson and Sharp (1999), models 6a,b), so that ρC = ρCT = 0, and (3.9a), (3.9b) reduce to
Another option is to assume independence between the arms (ρCT = 0), and equal correlations ρC = ρT = ρ within each arm, resulting in
Define R = nT/nC to be the allocation ratio, and
The term v(R) is a precision measure of the difference between sample means. It is a function of the allocation ratio
For
From (3.12),
The form (3.14b) is more useful for estimation because the sample variances
where ai = bni + c are linear functions of the sample sizes ni. When
Formulation (3.15) also holds for other standard two-sample effect measures used in meta-analysis, such as, the standardized mean difference, the log mean ratio and the binary data based effect measures, discussed in the following.
For binary outcomes with the probabilities pT and pC in the treatment and the control arm, denote the difference of risks by RD = pT – pC, the log relative risk by LRR = log(pT/pC), the log-odds ratio by LOR = log([pT/(1 – pT)]/[pC/(1 – pC)]), and the difference of arcsine-transformed risks by
For overdispersed binomial distributions in one or both arms, still assuming independence between the two arms, equations (3.13) and (3.14a) with v(R) for each effect measure given by equations (3.16) can be obtained by the standard delta-method.
In general, in the fixed effects meta-analysis summary effects
The proposed REM of meta-analysis for the two-sample study level statistics is
As before, the values ai = bini + ci are linear in the ni. In the nonnormal case there may be additional nuisance parameters. The values bi = b(Ri, θi, ψi) and ci = c(Ri, θi, ψi) depend on the allocation ratios Ri, parameters of interest θi and (vector) nuisance parameters ψi. The model
can also be called an over- or underdispersed normal model. The choice of ai = ni/vi(Ri) combined with the restriction γ ≥ 0 results in the standard REM.
In the model (3.17) γ is a nuisance parameter independent of study i and the effect of interest θ, and it appears in the variance function as a multiplicative factor. Because γ > –1/max(ai), underdispersion is also possible though usually not likely when sample sizes are moderate or large. For particular choices of coefficients bi and ci, the overdispersion in a meta-analysis can be interpreted through a marginal model involving the ICC. For binary effect measures, when the value of γ is less than 1, γ can be directly interpreted as the ICC ρ. For various measures based on the sample means for normal data, the parametrization γ = ρ/(1 – ρ) is more appropriate, cf (3.5b, 3.14b). Alternatively, for a simple ODM ai = ni can be used.
In (3.17), note that the variance of
4 Estimation of γ
For known γ, the weights for estimating θ are
The parameter γ can be estimated by a variety of methods, no one of which is uniformly the best, regardless of the criterion. We discuss here several alternatives. In this section the variances
4.1 Method of moments estimation of γ
For Cochran’s statistic
Under alternatives (3.17),
The exact non-null distribution of the Q-statistic is complicated. A number of approximations have been developed. One scheme is to approximate the distribution of
To find the moments of Q under alternatives (3.17), one needs the covariances
where
where
The estimator in the spirit of random effects estimates of variance (e.g., DerSimonian and Laird, 1986) is
underdispersion is present for Q < K – 1.
Alternatively, we can use a two-sided test of γ= 0 based on Q. Overall this model appears to be more flexible than the standard REM because Q < K – 1 in a large proportion of meta-analyses, which is the case of negative γ.
4.2 Confidence intervals for γ
Several approaches to obtain a confidence interval for γM are described in the following, though others are available. The first is based on the gamma distribution with two parameters r and λ. Denote the gamma distribution function by G(r, λ) with density
Following Biggerstaff and Tweedie (1997), the variance of Q under alternatives (3.17) is
For equal ai = a the variance (4.5) simplifies to var(Q) = 2(K – 1)(aγ + 1)2 in accordance with the scaled
One approximation for the distribution of Q when the ai differ is to match the first two moments of the gamma distribution:
For simplicity of notation we write r and λ for r(γ) and λ(γ). The estimator
This density function can be used to find a confidence interval for γ(Hedges and Olkin, 1985, pp. 54–56). Define the lower and upper functions L(γ) and U(γ) by
A 100(1 – α)% confidence interval for γ based on this approximate distribution is now obtained as a solution to L(γ) = α1 and U(γ) = αfor α1 + α2 = α.
A second approach is to invert the Q test. When the correct weights
Viechtbauer (2007) shows that the standard REM confidence intervals for τ2 based on this approach, named Q-profile, perform very well, better than the intervals using the method of moments and better than the restricted maximum likelihood confidence intervals described next.
4.3 Maximum likelihood estimation of γ
The likelihood function for the normal distribution with mean µ (omitting constants) is
The ML estimates of γ and µ are given by the solution of the equations
From (4.11b), the estimate of the mean is
For equal ai = a, (4.11b) simplifies and yields
4.4 Restricted maximum likelihood estimation (REML)
The restricted likelihood is
for
where
For K = 2 studies,
The REML confidence intervals are given by all values of γ which satisfy
where
4.5 Mandel–Paule (MP) method
Under alternatives γ ≠ 0, the Q-statistic with the new weights
This is a strictly decreasing function of
The MP algorithm was initially introduced in the context of inter-laboratory studies, but was then adopted for meta-analysis by Rukhin (2003) and DerSimonian and Kacker (2007). In the standard REM, the MP estimate of τ2 coincides with the REML estimate given that the variances are equal to sample variances. As noted by Rukhin, Biggerstaff and Vangel (2000), the MP algorithm is also a generalized Bayes procedure for the common parameter θ. Desirable statistical properties and computational simplicity make this method of estimation our favourite.
Confidence intervals are constructed by the Q-profile method (4.9). A further improvement is possible by replacing K − 1 by an improved approximation for E0(Q) correct to order 1/n. In general, improved approximations are effect-measure specific. The correction of Welch (1951) should be used for normal means; approximation by Kulinskaya, Dollinger and Bjørkestøl (2011) for standardized mean differences. Similarly, the Q-profile confidence intervals for γ should use improved approximations instead of the chi-square distribution.
The Mandel–Paule estimates of the heterogeneity parameter γ, the Q-profile confidence intervals (4.9) for γ and the resulting estimates of the combined effect
4.6 Estimating and testing γ in the case of underdispersion
As noted, the range of γ values is constrained from below by −1/amax. As mentioned previously, the Q*(γ) statistic is a strictly decreasing function of γ given by (4.14). For simplicity, let amax = a1. When γ= −1/a1, the corrected variance of the effect θ1 given by
The limiting value of the weighted mean when γ → −1/a1 is θ1, and the limiting value of Q is Qlim ≈ Q(−1/a1 + ε) for a small value of ε. The value of Qlim determines the estimates of θ and γ. A flowchart for different scenarios is provided in Table 1. In the table, amax = max(ai),
The upper confidence limit
When an improved approximation by a distribution F(⋅) is used to approximate the distribution of Q, the first moment of this distribution should be substituted for K − 1, and critical values of the distribution F(⋅) for the chi-square critical values in the first column of Table 1.
4.7 Confidence intervals for θ
For a weighted mean
An alternative suggested by Rukhin (2009) is to estimate the variance of
He shows that confidence intervals of the form
5 Examples
5.1 Effects of diuretics on pre-eclampsia
A well-known meta-analysis of nine trials evaluated an effect of diuretics on pre-eclampsia (Collins, Yusuf and Peto, 1985). These data have been studied repeatedly, as for example in Hardy and Thompson (1996), Biggerstaff and Tweedie (1997) and Viechtbauer (2007). The basic data with odds ratios, and their logs are provided in Table 2a. The effect sizes indicate considerable heterogeneity in the populations in these trials. This heterogeneity can be seen from the differences in the overall risk
Odds ratios (OR) of effect of diuretics on pre-eclampsia patients; n
C
and n
T
are the numbers of patients in the control and treatment (diuretics) groups, and x
C
and x
T
are the numbers of patients who developed pre-eclampsia in these groups. LOR is the log-odds ratio, and the pooled overall risk
Confidence intervals for log-odds ratios and for odds ratios for diuretics on pre-eclampsia example; ODM is the overdispersion model with a i = ni /2
The Q-statistic value is Q = 27.265, the total sample size N = 6942 and
To see the effect of these estimates of heterogeneity on the inference about the log-odds ratio, we compare the corresponding estimates for LOR and OR, and confidence intervals for LOR and OR in Table 2b. Note that all models yield confidence intervals that do not include zero for LOR or one for OR. The FEM yields the shortest confidence interval, as expected. The four REMs yield confidence intervals with widths ranging from 0.47 to 0.59 for OR and from 0.71 to 0.90 for LOR. The standard REM (with DL) and the ODM (with γM) yield the shortest confidence intervals.
Table 2c shows the weights attributed to each study using the five methods: FEM, REM (with DL), REM (with REML), ODM (with M) and ODM (with MP). As expected, the FEM shows a discrepancy with the four REMs. The ODM yields the largest estimated weights for studies 3, 4, 6 and 9, whereas for the REM studies 2, 5–7 and 9 yield the largest values. However, these differences are generally small, except for studies 5, 7 and 9.
The differences in weights are due to the fact that the precision of the studies 5 and 7 as measured by the inverse of the LOR variance is very low, whereas for study 9 it is rather high. These differences in precision have direct bearing on the ODM weights of the studies, whereas the REM weights of the studies are to large extent equalized.
of inverse variance weights (%) for diuretics on pre-eclampsia example. We compare three models, the FEM, the standard REM and the ODM. The estimators of heterogeneity used are given in the brackets: DerSimonian–Laird (DL), restricted maximum likelihood (REML), method of moments (M) and the Mandel–Paule (MP).
5.2 Cochrane logo data
This example is a meta-analysis of seven randomized controlled clinical trials of a corticosteroids treatment to reduce mortality in premature babies (Crowley, Chalmers and Keirse, 1990). The data is used in the Cochrane Collaboration logo.1 This example illustrates the case of line 2 in Table 1. The effect measure is the difference in risks Δ = pT − pC. Summary data is found in Table 3. For this data, Cochran’s Q-statistic is Q = 3.82 at 6 degrees of freedom, resulting in the p-value of 0.701. The DL and REML methods estimate the random variance component τ2 by zero. The FEM indicates that the treatment significantly reduces mortality. The inverse variance estimate of the risk difference is −0.0547 0.0245.
There is significant heterogeneity of risks of mortality in the treatment (Q = 23.361 at 6 df), but not in the control (Q = 4.358) arms, suggesting the model (3.10). The ai values are calculated from (3.13) with
Risk difference for the effect of corticosteroids on the mortality in premature babies (Cochrane logo data); n T and n C are the sample sizes in treatment (T) and control (C), and x T and x C are the numbers of events in the arms of each study; d is the estimated difference in proportions of events between the two arms of each study.
6 Simulation results
In this section we provide a simulation study of the properties of the Q-profile confidence intervals for the ICC parameter γ in comparison to those for the random effects variance τ2. Because the Q-profile confidence intervals for τ2 were found by Viechtbauer (2007) to perform the best in the standard REM we limit our study to these intervals.
In addition we present a simulation study of confidence intervals for the combined effect θ. We include both conventional and improved intervals based on the variance (equation 4.15) combined with critical values from the t-distribution (Rukhin, 2009) for both REM and ODM, when γ and τ2 are estimated by the moment and the MP methods. Also included are confidence intervals for γ obtained by the improved MP method based on the Welch correction (Welch, 1951) to the distribution of Q, denoted by
The size ni of the studies was generated from a normal distribution with mean n and variance n/4 rounded to the nearest integer and was left-truncated by 3. To make the two models comparable, we generate estimates
The following configurations from Viechtbauer (2007) are included in the simulations: K = (10; 20; 30; 50; 80), n = (10; 20; 40; 80; 160), γ between 0 and 1 in steps of 0.1, and θ is set to 0. A total of 10 000 iterations were run for each combination. Therefore the standard error of the empirical coverage probabilities is at most 0.005.
Achieved confidence levels of confidence intervals for γ and τ2 at the nominal confidence level of 0.95 for the number of studies K = 10, 30 and 80 are presented in Figure 1. The Q-profile confidence intervals for τ2 behave well for τ2 ≥ 0.1. Their actual coverage can be considerably lower (for small n) or higher (for large n) than the nominal confidence level under the null.
Achieved confidence levels of the γ and τ2 confidence intervals at the nominal confidence level of 0.95. The methods used are represented as follows: squares (Q-profile interval for τ2 based on χ2 distribution) triangles (Q-profile interval for γ based on χ2 distribution) and circles (Q-profile interval for γ with Welch correction). Here K is the number of studies and N is the total sample size of each study.
Confidence levels of the Q-profile confidence intervals for γ based on the χ2 approximation to the distribution of Q are considerably lower than the nominal level. The coverage improves for large sample sizes n.
The Q-profile confidence intervals for γ based on the Welch approximation to the distribution of Q are reasonably good across all values of γ even for small sample sizes, and are recommended. The main advantage of these intervals in the comparison to the Q-profile confidence intervals for τ2 is their stability when γ is small.
Achieved confidence levels of the confidence intervals for the combined effect θ at the nominal confidence level of 0.95 for the number of studies K = 10, 30 and 80 are presented in Figure 2. Coverage of the traditional confidence intervals with weights incorporating
Coverage is greatly improved for the confidence intervals obtained by the method of Rukhin (2009). These intervals, centered at the weighted means with the weights incorporating
7 Discussion
The tradition of reporting meta-analytic results is in terms of FEM or REMs. However, whereas the FEM is well defined, there can be alternative representation of the REMs. We have pursued a model that uses an inflationary or deflationary factor applied to the FEM variance, and demonstrated its effect on the estimation of several parameters. The main rationale of our model is the interpretation of the multiplicative factor through the intra-class correlation parameter. To examine more visibly the effects of using different models, we give results for two well-known data examples.
The multiplicative model (1.3) introduced by Thompson and Sharp (1999) was used in a number of publications (Moreno et al. 2009, 2012) to adjust for a possible small study bias. Moreno et al. (2012) argue that this is a FEM because the studies with larger sample sizes have larger weights. When all the studies are large, the traditional REM results in approximately equal weights. In contrast, our ODM apportions larger weights to studies with higher precision.
An important advantage of the ODM is the possibility of interpretation and modelling of the heterogeneity in meta-analysis through the ICC parameter as it relates to the studies and their ‘populations’. We assumed that the ICC parameter was constant between the studies. When the number of studies is limited, as often is the case in meta-analysis, this assumption is necessary for estimation. For a large number of studies, meta-regression can be used for modelling the disparities among the homogeneous in respect to ICC subgroups of studies. Such disparities constitute one of the possible but so far neglected reasons for heterogeneity between studies.
Achieved confidence levels of the combined effect θ confidence intervals at the nominal confidence level of 0.95. The variance estimators used in weights are represented as follows: open squares rhomboids open triangles reverse triangles filled triangles ( with the Rukhin (2009) correction) and filled squares with the Rukhin (2009) correction). Here K is the number of studies and N is the total sample size of the two arms of each study.
We have performed simulations to compare estimation quality for the overdispersion parameter in the new model versus the traditional variance component in the case of normal means. The results are encouraging: the estimation quality in our simulations is considerably better for small values of the overdispersion parameter. Our results agree with other simulations, such as, Viechtbauer (2007), with respect to the poor quality of traditional estimators of the between-studies variance τ2 for small values of heterogeneity. It is worth noting that we used an improved Welch approximation (Welch, 1951) in order to achieve satisfactory coverage of the confidence intervals for γ. Additional simulations need to be carried out for other popular effect measures, such as, the standardized mean difference and the odds ratio.
Footnotes
Acknowledgements
The authors thank two anonymous referees, an associate editor and the editor for helpful suggestions on the presentation of the contents of this article.
