Abstract
Meta-analyses of educational research findings frequently involve statistically dependent effect size estimates. Meta-analysts have often addressed dependence issues using ad hoc approaches that involve modifying the data to conform to the assumptions of models for independent effect size estimates, such as by aggregating estimates to obtain one summary estimate per study, conducting separate analyses of distinct subgroups of estimates, or combinations thereof. We show that these ad hoc approaches correspond exactly to certain multivariate models for dependent effect sizes. Specifically, we describe classes of multivariate random effects models that have likelihoods equivalent to those of models for effect sizes that have been averaged by study, classified into subgroups, or both. The equivalencies also apply to robust variance estimation methods.
1. Dependent Effect Sizes
Across fields ranging from education and psychology to clinical medicine and health sciences, it is very common that some of the primary studies included in a meta-analysis report more than one effect size estimate based on the same sample (Ahn et al., 2012; Page et al., 2015; Tipton et al., 2019). For example, in a meta-analysis of effects of an educational intervention on reading outcomes, some studies might have collected multiple measures of reading performance (each of which meets the inclusion criteria of the synthesis) or may have measured outcomes at multiple follow-up times. Some studies might have tested more than one version of an intervention, such that it is of interest to include effect sizes comparing each version to a no-intervention control condition. Studies might even have several of these features, potentially contributing a large number of relevant effect size estimates.
When a meta-analysis includes multiple effect size estimates from the same sample, the estimates will typically be dependent, or correlated, because they are based on data from the same set of individuals. The meta-analysis will therefore need to use analytic methods that account for the dependence structure. In a review of methods for handling dependent effect size estimates, López-López and colleagues (2018) delineated two types of approaches: “reductionist” strategies, which modify the data structure to conform to the assumptions of a model appropriate for independent effect size estimates, and “integrative” strategies, which involve developing models for the full data, including dependent effect sizes. In this article, we shall examine the connections between these two classes of approaches.
1.1. Reductionist Strategies
Meta-analysts have often handled dependent effect size estimates by making ad hoc modifications to the data in order to meet the assumptions of a simple model for independent effect size estimates. One way to do so is to average the dependent effect size estimates from each study, yielding a dataset with a single summary effect size estimate from each independent sample (Borenstein et al., 2009; Rosenthal & Rubin, 1986). Another pragmatic approach is to classify effect sizes from each sample into subgroups (e.g., based on the type of outcome measure), where each subgroup includes at most one effect size from each sample (Becker, 2000). For situations where it is not feasible to distinctly classify effect sizes from every sample, Cooper (1998, 2016) described a “shifting unit of analysis” strategy. This entails creating distinct subgroups to the extent possible, then averaging together any dependent effect size estimates within each subgroup. The analyst then either conducts separate analyses of each subgroup or estimates a single model to compare results across subgroups.
These reductionist strategies originated out of practical necessity, before the development of software tools that could be used to model dependent effects (Becker, 2000). Even after such tools became widely available, aggregating, subgrouping, and shifting the unit of analysis have remained in regular use. In a review of meta-analytic practices based on syntheses published in 2016, Tipton and colleagues (2019) found that averaging effect sizes and shifting the unit of analysis were used frequently across several fields. For example, Furenes et al. (2021) aggregated effect sizes to the study level in a meta-analysis examining differences in children’s learning outcomes when using digital versus paper books. Other recent meta-analytic reviews on topics within education and psychology have applied the shifting unit of analysis approach (Graham et al., 2020; Jung et al., 2020; Koenka et al., 2021). Thus, meta-analysts have continued to use reductionist strategies in reviews published within the last 5 years.
1.2. Integrative Strategies
Integrative strategies entail developing analytic models for the full, unmodified data, which necessitates using methods that accommodate dependent observations. Multivariate meta-analysis models for dependent effect size estimates were described early in the history of meta-analysis methodology (Hedges & Olkin, 1985, Chapter 10; Raudenbush et al., 1988) and further developed during the 1990s and early 2000s (Becker, 1992; Berkey et al., 1998; Kalaian & Raudenbush, 1996; van Houwelingen et al., 2002). These models required specifying the degree of correlation between effect size estimates drawn from the same study. Although formulas and methods for estimating such correlations are available (Gleser & Olkin, 2009; Wei & Higgins, 2013), their use requires knowing the correlation between outcomes in each sample, information that is infrequently reported in primary studies. This difficulty, combined with the lack of software for estimating multivariate models, long limited the use of multivariate meta-analytic models (Becker, 2000).
More recent methodological developments in meta-analysis have brought renewed interest in multivariate models that can accommodate dependent effect sizes (Jackson et al., 2011). Innovations such as robust variance estimation (RVE; Hedges et al., 2010), pseudo-likelihood methods (Chen et al., 2014; Hong et al., 2018), multilevel meta-analytic models (Konstantopoulos, 2011; Van den Noortgate et al., 2013, 2015), and combinations thereof (Fernández-Castilla et al., 2020; Pustejovsky & Tipton, 2022) provide tools for directly modeling the effect size estimates, without requiring ad hoc modifications to the data structure or statistical information that is seldom reported in primary studies. As a result, these approaches can help researchers to develop richer and more nuanced insights from meta-analytic reviews (Cheung, 2019). Although integrative methods have potential advantages over reductionist approaches, they are also more complex and less accessible to researchers without specialized training.
1.3. Aims
In light of the continued use of ad hoc approaches and the greater complexity of more recent innovations, meta-analysts might well have cause to wonder: Are newer methods like RVE or multilevel meta-analysis always necessary? Do they make obsolete the older approaches for handling dependent effects? Or are simpler approaches such as averaging effect sizes, subgrouping, or shifting the unit of analysis still sometimes useful and reasonable to apply?
In this article, we aim to shed some light on these questions by demonstrating equivalencies between older, reductionist approaches that involve aggregating or subgrouping of effect size estimates and newer, multivariate modeling approaches for dependent effect sizes. In Section 2, we describe conditions under which estimates from a random effects meta-regression of averaged effect size estimates correspond precisely to those from a certain multivariate model called the correlated effects model. In Section 3, we describe conditions under which the shifting unit of analysis approach is equivalent to the subgroup correlated effects model for dependent effect sizes, as described by Pustejovsky and Tipton (2022). These equivalence relations concern the likelihoods of the models, implying that full and restricted maximum likelihood estimates from the reductionist models correspond exactly to estimates from the integrative models. The equivalencies also apply to RVE methods for each model, as we detail in Section 4.
The equivalence relations provide a basis for guidance about when the simpler, reductionist methods may be validly applied in practice. The representations of the simpler approaches as multivariate models also provide a way to directly compare them to more complex models, which is useful both for further theoretical development and for practical application, such as in likelihood ratio tests for comparison of nested models. We discuss further implications in Section 5.
1.4. General Notation
Certain matrix operations will be essential in the subsequent development. We will use
We shall consider a collection of J studies, where study j contributes
In the following sections, we will make reference to the likelihood and restricted likelihood based on a model for normally distributed effect size estimates. With a multivariate normal random vector
For this model, the log of the likelihood is
where
Because maximum likelihood estimators of variance components are known to be biased, researchers often prefer to use variance estimators that maximize the restricted likelihood. The restricted likelihood can be understood as the likelihood of a linear combination of
where
2. Aggregating Effect Sizes
One reductionist method to handle the dependence between effect size estimates is to average—or aggregate—the effects within each study and then develop a study-level meta-regression model. Let
where
In many applications, the sampling variances for individual effect sizes within a given study j will all be equal (or nearly equal), so that one can assume
with sampling variance
The approach to aggregating effect sizes based on Equations 5 and 6 has been described in introductory meta-analysis texts (cf. Borenstein et al., 2009, Eq. 24.6) and implemented in software (e.g., the
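As a minimal sketch of the aggregation step (not the authors' code), the following Python fragment averages the estimates from one study in two ways: with the equal-variance, constant-correlation shortcut, and with general inverse-variance (generalized least squares) weights. The function names, the example estimates, the common sampling variance of 0.04, and the assumed correlation of 0.6 are all hypothetical values chosen for illustration.

```python
import numpy as np

def aggregate_equal_variance(estimates, v, rho):
    """Average k estimates that share sampling variance v and correlation rho."""
    t = np.asarray(estimates, dtype=float)
    k = t.size
    t_bar = t.mean()
    # Variance of the mean of k equicorrelated variables with common variance v
    v_bar = v * (1.0 + (k - 1) * rho) / k
    return t_bar, v_bar

def aggregate_gls(estimates, V):
    """Inverse-variance weighted average for a full sampling covariance matrix V."""
    t = np.asarray(estimates, dtype=float)
    ones = np.ones(t.size)
    w = np.linalg.solve(V, ones)      # V^{-1} 1
    v_bar = 1.0 / (ones @ w)          # (1' V^{-1} 1)^{-1}
    t_bar = v_bar * (w @ t)           # weighted average of the estimates
    return t_bar, v_bar

# Hypothetical study with three dependent estimates
t = [0.30, 0.10, 0.50]
v, rho = 0.04, 0.6
V = v * ((1 - rho) * np.eye(3) + rho * np.ones((3, 3)))

simple = aggregate_equal_variance(t, v, rho)
general = aggregate_gls(t, V)
print(simple, general)
```

When the sampling variances within a study are equal and the correlation is constant, the two routes coincide; the general version is the one to use when the sampling variances differ within a study.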
2.1. Meta-Regression Model
Once the meta-analyst has calculated study-level aggregated effect size estimates, they would then proceed to estimate summary meta-analysis or meta-regression models. Let
where $u_j$ is a random effect for study j, assumed to be normally distributed with mean zero and variance
Rather than averaging effect size estimates, the meta-analyst might estimate a multivariate meta-regression model for the raw effect size estimates. Consider the model
where $u_j$ is a normally distributed, between-study random effect with mean zero and variance
If the aggregated effect size estimates are calculated using inverse-variance weights as given in Equation 4, then the likelihood of the random effects meta-regression model in Equation 7 is equivalent to the likelihood of the correlated effects model given in Equation 8. By equivalent, we mean that the likelihoods are proportional. The equivalence of these models follows from Theorem 2 in Section 3.2. Equivalence implies that full maximum likelihood estimators and restricted maximum likelihood estimators based on Model (7) are, by definition, identical to the corresponding estimators based on Model (8). Model-based standard errors based on Model (7) are also equal to the standard errors based on (8). Furthermore, in a Bayesian inference framework, the posterior distributions of both models will be identical so long as the same priors for
Because of its equivalence with the correlated effects model, the strategy of aggregating effect size estimates by study can be understood not only as a reductionist strategy but also as a multivariate model for the raw effect size estimates, which can be compared to other multivariate models. Conversely, the correlated effects model can be understood as a model based on averaging effect size estimates to the study level. This suggests that one could use averaged effect sizes for the purposes of creating model diagnostics or graphical representations (forest plots, funnel plots, etc.) of meta-regression results based on the correlated effects model.
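The proportionality of the two likelihoods can be checked numerically. The sketch below is an illustration under simplifying assumptions, not a reproduction of the paper's analysis: it uses an intercept-only model, three hypothetical studies, and an assumed constant correlation of 0.6 among sampling errors. It evaluates the log-likelihood of the correlated effects model on the raw estimates and of the random effects model on the inverse-variance weighted averages at several parameter values, then confirms that their difference does not change with the parameters.

```python
import numpy as np

def mvn_loglik(y, mean, cov):
    """Log density of a multivariate normal vector."""
    r = y - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = r @ np.linalg.solve(cov, r)
    return -0.5 * (y.size * np.log(2 * np.pi) + logdet + quad)

def cs_matrix(v, rho):
    """Sampling covariance with variances v and constant correlation rho."""
    s = np.sqrt(v)
    return rho * np.outer(s, s) + (1 - rho) * np.diag(v)

# Hypothetical data: (estimates, sampling variances) for each study
studies = [
    (np.array([0.30, 0.10, 0.50]), np.array([0.04, 0.05, 0.03])),
    (np.array([0.20]), np.array([0.02])),
    (np.array([-0.10, 0.15]), np.array([0.06, 0.06])),
]
rho = 0.6  # assumed correlation, chosen for illustration only

def loglik_raw(mu, tau2):
    """Correlated effects model fitted to the raw estimates."""
    total = 0.0
    for t, v in studies:
        k = t.size
        cov = cs_matrix(v, rho) + tau2 * np.ones((k, k))
        total += mvn_loglik(t, np.full(k, mu), cov)
    return total

def loglik_aggregated(mu, tau2):
    """Random effects model fitted to inverse-variance weighted averages."""
    total = 0.0
    for t, v in studies:
        ones = np.ones(t.size)
        w = np.linalg.solve(cs_matrix(v, rho), ones)
        v_bar = 1.0 / (ones @ w)
        t_bar = v_bar * (w @ t)
        total += mvn_loglik(np.array([t_bar]), np.array([mu]),
                            np.array([[v_bar + tau2]]))
    return total

grid = [(0.0, 0.0), (0.2, 0.01), (0.5, 0.1), (-0.3, 0.5)]
diffs = [loglik_raw(m, t2) - loglik_aggregated(m, t2) for m, t2 in grid]
spread = max(diffs) - min(diffs)
print(diffs, spread)
```

Because the difference is constant over the parameter grid, both log-likelihoods are maximized at the same values of the mean and the between-study variance, which is the practical content of the equivalence.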
For equivalence between these models to hold exactly, the averaged effect sizes must be calculated using inverse-variance weights. If the analyst uses other weights or other methods for approximating the sampling variance of
The equivalence between Models (7) and (8) covers estimators based on the full or restricted likelihood but does not necessarily extend to other classes of estimators. Hedges et al. (2010) proposed a method of estimating the correlated effects model using a moment estimator of
2.2. More Complex Meta-Regression Models
When inverse-variance weights are used for averaging effect sizes, the equivalence between models extends to specifications that are more complex than the random effects meta-regression from Model (7). More generally, consider a model for the averaged effect sizes that has additional random effects terms:
where
This more general model encompasses a variety of different random effects structures that are of interest in different contexts (cf. Sera et al., 2019). For example, the J studies included in a synthesis might have been conducted by a smaller number of investigators or lab groups, each of which tends to use similar operational procedures across studies (e.g., similar experimental manipulations, preferred outcome measurement instruments, etc.). In such a situation, the meta-analyst might use a hierarchical random effects model (Konstantopoulos, 2011) to partition the between-study heterogeneity into between-lab variation and within-lab variation. In this specification, the random effects vector
Rather than averaging effect sizes before fitting Model (9), the meta-analyst might instead fit a multivariate model for the raw effect size estimates. By Theorem 2 in Section 3.2, Model (9) is equivalent to the model
where the predictors
2.3. An Illustration
We present an empirical example to illustrate the equivalence between random effects meta-regression models for aggregated effect size estimates and multivariate meta-regression models for raw effect size estimates, as well as to highlight some differences in auxiliary statistical results typically reported with univariate and multivariate models. Clinton-Lisell (2021) conducted a meta-analysis of studies investigating the effects of multitasking on reading performance and reading time. For the purposes of illustration, we reanalyzed the effects for reading performance outcomes, which were drawn from 20 primary studies. Effect size estimates were expressed as standardized mean differences with positive values representing improvement in reading performance. The dataset includes 31 effect size estimates because some studies report results for more than one measure of reading performance, resulting in dependent effect sizes. We estimated a model that includes two study-level predictors (out of five moderators examined in the original paper): pacing control (self-paced or experimenter-paced) and reading medium (screen or paper).
We used the
The top panel of Table 1 reports the estimates of the random effects meta-regression for aggregated effects (first column) and the multivariate correlated effects model (second column). It can be seen that the point estimates, standard errors, and confidence intervals for the fixed and random effects are identical across both models.
Table 1. Comparison of Aggregated Meta-Regression Model and Multivariate Meta-Regression Model
To illustrate that this equivalence extends to more complex models, we also estimated specifications where the between-study heterogeneity is allowed to vary as a function of pacing control (self-paced vs. experimenter-paced), as in a random effects location-scale model (Viechtbauer & López-López, 2022). The bottom panel of Table 1 reports the estimates of the location-scale model estimated using both the aggregated effect sizes and the multivariate equivalent. As expected, the regression coefficients and variance component estimates are all identical across both approaches, as are the associated standard errors and confidence intervals.
Although the parameter estimates are identical regardless of whether the models are estimated based on the aggregated or raw effect sizes, the log likelihoods and Q statistics differ depending on the specification. Theorem 2 shows that the difference in log likelihoods is a constant that does not depend on the model parameters. As a result, the difference in log likelihoods is the same in both the random effects specification (top panel) and the location-scale specification (bottom panel). A likelihood ratio test comparing the models with and without the scale predictor supports the simpler model in the top panel (
The difference in the Q statistics arises because the Q statistic for the aggregated random effects model captures only between-study heterogeneity, whereas the Q statistic for the correlated effects model captures both between-study and within-study heterogeneity in effect size estimates. The difference in Q statistics can therefore be interpreted as a measure of within-study heterogeneity. In this example, the large difference in Q statistics indicates that there is substantial excess heterogeneity in effect sizes drawn from the same sample (
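The decomposition of the Q statistic can be sketched numerically. The fragment below is an illustration under stated assumptions rather than the analysis reported here: it uses an intercept-only common-effect model, three hypothetical studies, and an assumed correlation of 0.6. It computes Q from the raw estimates, Q from the inverse-variance weighted averages, and the within-study component, and checks that the raw Q equals the sum of the other two.

```python
import numpy as np

def cs_matrix(v, rho):
    """Sampling covariance with variances v and constant correlation rho."""
    s = np.sqrt(v)
    return rho * np.outer(s, s) + (1 - rho) * np.diag(v)

# Hypothetical data: (estimates, sampling variances) for each study
studies = [
    (np.array([0.30, 0.10, 0.50]), np.array([0.04, 0.05, 0.03])),
    (np.array([0.20]), np.array([0.02])),
    (np.array([-0.10, 0.15]), np.array([0.06, 0.06])),
]
rho = 0.6  # assumed correlation, for illustration only

t_bar, v_bar, q_within = [], [], 0.0
for t, v in studies:
    V = cs_matrix(v, rho)
    ones = np.ones(t.size)
    w = np.linalg.solve(V, ones)
    vb = 1.0 / (ones @ w)
    tb = vb * (w @ t)
    d = t - tb                          # deviations from the study average
    q_within += d @ np.linalg.solve(V, d)
    t_bar.append(tb)
    v_bar.append(vb)
t_bar, v_bar = np.array(t_bar), np.array(v_bar)

# Common-effect estimate and Q based on the aggregated estimates
mu_hat = np.sum(t_bar / v_bar) / np.sum(1.0 / v_bar)
q_aggregated = np.sum((t_bar - mu_hat) ** 2 / v_bar)

# Q based on the raw estimates, using the same common-effect estimate
q_raw = 0.0
for t, v in studies:
    r = t - mu_hat
    q_raw += r @ np.linalg.solve(cs_matrix(v, rho), r)

# Degrees of freedom for the within-study component: sum of (k_j - 1)
df_within = sum(t.size - 1 for t, _ in studies)
print(q_raw, q_aggregated, q_within, df_within)
```

Under the assumed sampling covariance and no within-study heterogeneity, the within-study component would be referred to a chi-squared distribution with degrees of freedom equal to the number of effect sizes minus the number of studies.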
Another method of evaluating within-study heterogeneity is to fit a multivariate model that includes additional random effects for each estimate, as in the correlated-and-hierarchical-effects (CHE) model described by Pustejovsky and Tipton (2022). A likelihood ratio test comparing the CHE model to the correlated effects model supports the more complex CHE model (
The original analysis by Clinton-Lisell (2021) used a correlated effects model estimated using the robu function from the
3. Subgrouping and Shifting the Unit of Analysis
Meta-regression models based on aggregated effect size estimates are constrained to using study-level predictors, yet meta-analysts may also want to consider predictors that vary within studies. The meta-analyst might use subgrouping or shifting the unit of analysis to handle dependence when analyzing effect size estimates classified by a categorical predictor. For example, Koenka et al. (2021) reported a meta-analytic review of the effects of different forms of feedback on academic motivation. The authors examined a variety of moderators, several of which varied across effect sizes within a study. For instance, many primary studies reported effect sizes for multiple measures of motivation that differ in focus (i.e., internal vs. external focus). In order to investigate whether motivation focus explains variation in effects, the authors applied the shifting unit of analysis technique, aggregating across effect sizes from the same study that have the same focus, and then using meta-regression to examine differences in effects depending on whether the outcomes assessed internal or external focus.
Consider a categorical predictor with G categories and let
These strategies can be formally described using a set of indicator variables for each category of the predictor. Let
Following the same reasoning as in the previous section, the most efficient method of calculating weighted average effect size estimates is based on inverse-variance weighting of the effects within each category. With inverse-variance weighting, the weighted average effect size estimate for category g of study j is calculated as
with sampling variance given by
In the common situation where the meta-analyst assumes that effect size estimates from study j all have the same sampling variance
with sampling variance
If there is only a single effect size per category from any given study, then the category-specific estimates are merely a reindexing of the effect sizes in each study.
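The shifting unit of analysis step can be sketched as a group-wise aggregation. In the Python fragment below, the function name, the row layout (study, category, estimate, sampling variance), the example values, and the assumed correlation of 0.5 are hypothetical; the weights are the inverse-variance weights implied by a constant-correlation sampling covariance matrix.

```python
from collections import defaultdict
import numpy as np

def shift_unit_of_analysis(rows, rho):
    """Aggregate estimates within each (study, category) cell.

    rows: iterable of (study, category, estimate, sampling_variance).
    Returns a dict mapping (study, category) to (average, variance).
    """
    cells = defaultdict(list)
    for study, category, est, var in rows:
        cells[(study, category)].append((est, var))

    aggregated = {}
    for key, items in cells.items():
        t = np.array([e for e, _ in items])
        v = np.array([x for _, x in items])
        s = np.sqrt(v)
        V = rho * np.outer(s, s) + (1 - rho) * np.diag(v)
        ones = np.ones(t.size)
        w = np.linalg.solve(V, ones)
        v_bar = 1.0 / (ones @ w)
        aggregated[key] = (v_bar * (w @ t), v_bar)
    return aggregated

# Hypothetical effect sizes classified by motivation focus
rows = [
    ("A", "internal", 0.40, 0.05),
    ("A", "internal", 0.20, 0.05),
    ("A", "external", 0.10, 0.05),
    ("B", "internal", 0.30, 0.02),
]
result = shift_unit_of_analysis(rows, rho=0.5)
print(result)
```

Cells containing a single estimate pass through unchanged, which matches the reindexing described above.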
3.1. Subgroup Analysis
After subgrouping or shifting the unit of analysis, one analytic approach is to fit separate models based on the estimates from each category. Following this approach, the meta-analyst would fit the random effects meta-regression
using only the estimates from category g (i.e.,
Fitting separate models to each subgroup of effect size estimates has the disadvantage that it does not provide any easy way of making formal comparisons across categories. However, the approach described above can be represented as a single model, following the principles of composite marginal likelihood (Chen et al., 2014; Varin, 2008; Varin et al., 2011). Following the composite likelihood approach, one can estimate a single model using all of the data, but allowing for distinct parameters corresponding to each category of effects. The specification then becomes
where
The composite model representation is useful in that it allows for direct comparison of meta-regression coefficient estimates from different subgroups. This might seem counterintuitive because the dependence structure of the effect size estimates is misspecified—although aggregating removes the dependence of effect size estimates within each category, there may still be dependence between the effect size estimates across categories. However, RVE methods can be applied to account for this misspecification and provide valid assessments of uncertainty regarding the regression coefficients, even for comparisons between regression coefficients from different categories (Pustejovsky & Tipton, 2022). Chen et al. (2014) investigated this strategy in the context of multivariate meta-analysis where information about the correlations between effect size estimates is not available. Their simulation results showed that the approach provides well-calibrated standard errors and confidence intervals. We discuss RVE further in Section 4.
Let
Theorem 1 implies that full or restricted maximum likelihood estimators based on the composite Model (16) are identical by definition to those based on Model (15) for each subgroup; model-based standard errors and confidence intervals are also identical across specifications. We show in Section 4 that robust variance estimators based on Model (15) are identical to those based on Model (16).
3.2. Ignoring Cross-Category Dependence
The strategy of shifting the unit of analysis and then estimating a meta-regression of aggregated effect size estimates from each category is not limited to models with distinct parameters for each category of effects (i.e., models with separable parameter sets). Rather, a meta-analyst might want to estimate models in which some parameters are category-specific while others are shared across categories. For example, they might be willing to assume that the category-specific random effects in Model (16) have the same variance across all categories, so that
These specifications can be described by the model
where
Similar to Model (9), the random effects specification in Model (17) encompasses a broad set of possible structures, including hierarchical random effects models (Konstantopoulos, 2011), random effects location-scale models (Viechtbauer & López-López, 2022), phylogenetic meta-analytic models (Hadfield & Nakagawa, 2010; Lajeunesse, 2009), and cross-classified random effects models (Fernández-Castilla et al., 2019). Model (17) includes Model (16) as a special case, in which the parameters
If the category-specific average effect size estimates are calculated using the inverse-variance weights given in Equations 11 and 12, then Model (17) is equivalent to a model for the raw data that treats effect size estimates from different categories as independent. Let
Let
Now, consider the model
where the sampling errors for study j,
The following theorem describes the equivalence between the likelihoods of Model (17) and Model (18). We provide a proof in Online Appendix C.
The implication of Theorem 2 is that a meta-regression model estimated after shifting the unit of analysis with categories
Theorem 2 also establishes the equivalence between Models (7) and (8) and between Models (9) and (10) because all of these models are special cases with
3.3. Another Illustration
Roberts et al. (2022) reported a synthesis of experimental studies of the enactment effect, the phenomenon that physically performing an action enhances memory of a word or phrase more strongly than simply reading it. We reanalyzed the behavioral studies synthesized by Roberts et al. (2022), comprising 145 studies and 443 effects, in order to demonstrate the equivalencies described in Theorems 1 and 2. Data and code for replicating all calculations are available at https://osf.io/9u52e/.
The original authors investigated multiple moderator variables, including the comparison task, test format, and use of objects. For illustrative purposes, we examined differences in effect sizes based on the type of comparison task. The data include 33 studies with 98 effects involving experimenter-performed tasks (EPT), 17 studies with 29 effects involving imagery tasks (IT), and 123 studies with 316 effects involving verbal tasks (VT); 28 of the studies included effects from two distinct types of comparison tasks. The number of effects reported in each study ranges from 1 to 12 for EPT, from 1 to 4 for IT, and from 1 to 8 for VT, leading to dependence within the comparison task subgroups within some studies. Considering this dependence structure, a meta-analyst might follow the shifting unit-of-analysis strategy and aggregate effect sizes for each comparison task within each study.
Table 2 reports results from several different approaches to the moderator analysis. In the first column, we calculated subgroup-aggregated effect size estimates for each comparison task type in each study, assuming a constant correlation of
Table 2. Comparison of Aggregated Effects Subgroup Analysis, Aggregated Effects Composite Model, and Subgroup Correlated Effects (SCE) Model
Note. EPT = experimenter-performed tasks; IT = imagery task; VT = verbal task.
In the second column, we estimated average effect sizes for each subgroup using data from all three comparison tasks in a single composite model. The model included separate intercepts for each task type and allowed the between-study heterogeneity to differ by task type (i.e., a location-scale model). Because the estimates are based on a single model, we obtained only a single log-likelihood, equal to the sum of the log-likelihoods for the three task types reported in the first column. Similarly, we obtained a single Q statistic that is the sum of the Q statistics from the subgroup-specific models.
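The additivity of the log-likelihood across subgroups follows from treating aggregated estimates in different categories as independent. The sketch below illustrates this with hypothetical aggregated estimates, sampling variances, and parameter values (none of which are taken from Table 2): it evaluates the log-likelihood separately for each task type and once for the composite specification with category-specific means and heterogeneity variances.

```python
import numpy as np

def loglik_independent(t, v, mu, tau2):
    """Normal log-likelihood for independent estimates with variances v + tau2."""
    s2 = v + tau2
    return -0.5 * np.sum(np.log(2 * np.pi * s2) + (t - mu) ** 2 / s2)

# Hypothetical aggregated estimates after shifting the unit of analysis
category = np.array(["EPT", "EPT", "IT", "VT", "VT", "VT"])
t = np.array([0.80, 1.10, 0.40, 1.30, 0.90, 1.00])
v = np.array([0.05, 0.08, 0.06, 0.04, 0.07, 0.03])

# Hypothetical parameter values, one mean and one variance per category
mu = {"EPT": 0.9, "IT": 0.5, "VT": 1.1}
tau2 = {"EPT": 0.10, "IT": 0.02, "VT": 0.20}

# Subgroup analysis: one log-likelihood per category, then summed
separate = sum(
    loglik_independent(t[category == g], v[category == g], mu[g], tau2[g])
    for g in mu
)

# Composite model: a single log-likelihood with category-specific parameters
mu_vec = np.array([mu[g] for g in category])
tau2_vec = np.array([tau2[g] for g in category])
composite = loglik_independent(t, v, mu_vec, tau2_vec)
print(separate, composite)
```

Because the composite log-likelihood is a sum of terms that each involve only one category's parameters, maximizing it is the same as maximizing each subgroup's log-likelihood on its own.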
The point estimates and standard errors from the composite model are equal to those from the subgroup-specific models, as implied by Theorem 1. However, the confidence intervals for the regression coefficients differ slightly because they are calculated using different degrees of freedom. The degrees of freedom for the t tests in the subgroup analyses are equal to one less than the total number of studies reporting effect sizes in a given category, whereas those for the composite model are 170, equal to three less than the total number of studies across all three subgroup categories after shifting the unit of analysis. The profile-likelihood confidence intervals for between-study heterogeneity also differ slightly. The confidence intervals in the first column are calculated by profiling the subgroup-specific log likelihoods, whereas those in the second column are calculated by profiling the combined likelihood, a task that is more computationally demanding and that does not take advantage of the separable structure of the log-likelihood.
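The degrees of freedom quoted above can be reproduced from the study counts reported for the three comparison tasks. The short calculation below uses only counts stated in the text; the variable names are illustrative.

```python
# Study counts by comparison task, as reported in the text
studies_per_task = {"EPT": 33, "IT": 17, "VT": 123}
studies_with_two_tasks = 28

# Units of analysis after shifting: one aggregated effect per study per task
units_after_shifting = sum(studies_per_task.values())

# Each study contributing two task types is counted twice above
unique_studies = units_after_shifting - studies_with_two_tasks

# Subgroup analyses: one intercept per model
subgroup_df = {task: n - 1 for task, n in studies_per_task.items()}

# Composite model: three intercepts estimated from all units together
composite_df = units_after_shifting - len(studies_per_task)

print(unique_studies, subgroup_df, composite_df)
```

The composite model pools 173 aggregated effects across task types, which is why its reference distribution has many more degrees of freedom than any single subgroup analysis.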
In the third column of Table 2, we estimated a multivariate subgroup correlated effects model using the raw effect size estimates. Consistent with Theorem 2, the point estimates, standard errors, and confidence intervals match those from the composite model estimated on the subgroup aggregated effects. (One can obtain confidence intervals exactly equivalent to those from the subgroup analysis in the first column by manually setting the
As noted in Section 2.3, the discrepancy in the Q statistics can be interpreted as a measure of heterogeneity in the effect size estimates over which we have aggregated. Here, the large difference in Q statistics indicates that there is substantial excess heterogeneity in effect sizes drawn from the same subgroup and same sample (
4. Robust Variance Estimation
Many of the models that we have described are subject to the criticism that they involve arbitrary or potentially misspecified assumptions about the sampling errors or other aspects of the dependence structure. For instance, if primary studies fail to report correlations among outcome measures, the correlation used to calculate the sampling variance of the aggregated effect sizes based on Equation 6 or 14 might be selected based only on convention. If the assumed correlation is incorrect, then $V_j$ will also be incorrect, which will affect the estimation of the heterogeneity variance components, as well as the standard errors and confidence intervals for meta-regression coefficients. Further, if a meta-analyst uses the shifting unit of analysis approach and estimates a model that includes effect sizes from multiple categories (as in Equations 16 or 17), then the dependence structure of the model will be misspecified due to the correlation between effect size estimates from different categories in the same study.
To address the issues of potential misspecification, Hedges et al. (2010) proposed RVE methods for meta-regression models. RVE methods provide means to estimate standard errors, conduct hypothesis tests, and construct confidence intervals for meta-regression coefficients that do not rely on the assumed dependence structure being correctly specified. Instead, RVE uses sandwich estimators that work under the weaker assumption that the effect size estimates can be grouped into clusters of dependent observations, where effect sizes from distinct clusters are independent. The original form of RVE described by Hedges et al. (2010) involves approximations that require a large number of independent clusters. However, subsequent work (Tipton, 2015; Tipton & Pustejovsky, 2015) has provided small-sample corrections that reduce the bias of the variance estimators and improve the calibration of hypothesis tests and confidence intervals when the data have a limited number of clusters. RVE methods are therefore an attractive tool for inference in meta-regression models with dependent effect sizes.
4.1. Subgroup Correlated Effects
The equivalence between models for aggregated effect size estimates and models for disaggregated, dependent effect sizes extends to RVE methods, including methods that incorporate small-sample corrections as described by Hedges et al. (2010) and Tipton (2015). We now demonstrate this equivalence by showing the equality of RVE applied to Model (17) for effect sizes aggregated within subgroups and RVE applied to the subgroup correlated effects Model (18) for the disaggregated effect sizes.
Suppose that studies can be grouped into M independent clusters, where
A robust variance estimator for Model (17) is given by
where
where
Let
Let
and
Under quite general conditions,
Equivalencies of RVE applied to several of the other models we have described can be established as special cases of Theorem 3. Specifically, RVE applied to Model (7) is equal to RVE applied to Model (8). Similarly, RVE applied to Model (9) is equal to RVE applied to Model (10).
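The special case relating Models (7) and (8) can be illustrated with a basic sandwich estimator. The sketch below is a simplified illustration, not the estimators defined in this section: it uses the uncorrected cluster-robust form (often labeled CR0) with each study as its own cluster, a fixed between-study variance of 0.05, an assumed correlation of 0.6, and hypothetical data with one binary study-level predictor. It computes weighted least squares coefficients and sandwich variances from the raw estimates and from the inverse-variance weighted averages and compares them.

```python
import numpy as np

def cs_matrix(v, rho):
    """Sampling covariance with variances v and constant correlation rho."""
    s = np.sqrt(v)
    return rho * np.outer(s, s) + (1 - rho) * np.diag(v)

def wls_cr0(X_blocks, W_blocks, y_blocks):
    """Weighted least squares fit with an uncorrected sandwich variance.

    Each list holds one block per cluster (here, per study).
    """
    blocks = list(zip(X_blocks, W_blocks, y_blocks))
    bread = np.linalg.inv(sum(X.T @ W @ X for X, W, _ in blocks))
    beta = bread @ sum(X.T @ W @ y for X, W, y in blocks)
    meat = np.zeros_like(bread)
    for X, W, y in blocks:
        g = X.T @ W @ (y - X @ beta)    # cluster-level score contribution
        meat += np.outer(g, g)
    return beta, bread @ meat @ bread

# Hypothetical data: (estimates, sampling variances, study-level predictor)
studies = [
    (np.array([0.30, 0.10, 0.50]), np.array([0.04, 0.05, 0.03]), 0.0),
    (np.array([0.20]), np.array([0.02]), 1.0),
    (np.array([-0.10, 0.15]), np.array([0.06, 0.06]), 0.0),
    (np.array([0.45, 0.60]), np.array([0.03, 0.05]), 1.0),
    (np.array([0.05, 0.25, 0.10, 0.35]), np.array([0.08, 0.07, 0.09, 0.08]), 1.0),
]
rho, tau2 = 0.6, 0.05  # assumed values, for illustration only

raw_X, raw_W, raw_y, agg_X, agg_W, agg_y = [], [], [], [], [], []
for t, v, x in studies:
    k = t.size
    V = cs_matrix(v, rho)
    # Raw estimates with correlated effects working weights
    raw_X.append(np.column_stack([np.ones(k), np.full(k, x)]))
    raw_W.append(np.linalg.inv(V + tau2 * np.ones((k, k))))
    raw_y.append(t)
    # Aggregated estimates with random effects working weights
    ones = np.ones(k)
    w = np.linalg.solve(V, ones)
    v_bar = 1.0 / (ones @ w)
    agg_X.append(np.array([[1.0, x]]))
    agg_W.append(np.array([[1.0 / (v_bar + tau2)]]))
    agg_y.append(np.array([v_bar * (w @ t)]))

beta_raw, vcov_raw = wls_cr0(raw_X, raw_W, raw_y)
beta_agg, vcov_agg = wls_cr0(agg_X, agg_W, agg_y)
print(beta_raw, beta_agg)
print(vcov_raw, vcov_agg)
```

In practice the between-study variance would be estimated rather than fixed; given the likelihood equivalence, the same estimate would be obtained from either representation. The small-sample corrections discussed in this section are not implemented in this sketch.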
Somewhat restrictive conditions are required for the RVE matrices with small-sample correction
The second condition requiring that
Online Appendix F demonstrates Theorem 3 by applying the various forms of RVE to the empirical examples from Sections 2.3 and 3.3. In both examples, the sampling variances are not compound symmetric, leading to small discrepancies between the
4.2. Subgroup-Specific and Composite Models
Theorem 3 does not cover the relationship between the subgroup-specific Model (15) and the composite Model (16). Model (16) can be understood as a special case of Model (17), in which the regression coefficients and variance components are separable. More precisely, the composite meta-regression has predictors of the form
where
Applying RVE to the subgroup-specific model entails using only the averaged effect sizes in that category. Let
The implication of Theorem 4 is that applying RVE (with the
5. Discussion
In prior discussions of methods for meta-analysis of dependent effect size estimates, reductionist strategies such as aggregating effect sizes by study or shifting the unit of analysis have been characterized as “ad hoc” techniques (Becker, 2000). This may leave researchers with the impression that reductionist strategies are simply outdated or inferior to more recently developed integrative approaches, such as multilevel meta-analysis or RVE—strategies which might be interpreted as more advanced or sophisticated. Such casual impressions are belied by the theorems that we have presented, which show that reductionist methods are, in fact, exactly equivalent to certain multivariate models for dependent effect sizes.
Equivalencies between different representations of a model have proven to be useful in several related areas of meta-analysis. For example, Cheung (2013, 2014) demonstrated how a range of meta-analytic models can be expressed as structural equation models. Recognizing such connections is useful for estimating complex random effects models and for handling missingness in meta-regression predictors. As a further example, White et al. (2012) showed how inconsistency models for network meta-analysis can be represented as multivariate meta-regression models, which can be fit using standard software tools. Just as in these examples, the connections we have described between reductionist strategies and multivariate models provide both conceptual and practical insights.
Conceptually, the correspondence between reductionist strategies and multivariate models provides a unifying framework for meta-analysis of dependent effect size estimates. Rather than treating reductionist strategies as separate from and incommensurable with newer models and methods, Theorems 1 and 2 show that reductionist strategies can be understood as specific forms of multivariate models. Namely, aggregating effect sizes by study is equivalent to using a correlated effects model, and subgrouping or shifting the unit of analysis is equivalent to using a subgroup correlated effects model. Under the conditions we have described, the equivalencies apply to likelihood-based estimation methods and to RVE methods. Theorems 3 and 4 imply that reductionist strategies can be interpreted as working models, which can be combined with RVE methods to provide protection from model misspecification. Thus, for purposes of developing further theory and evaluating the properties of different approaches to meta-analysis of dependent effect sizes, it is sufficient to consider the framework of multivariate working models.
The multivariate representations of reductionist approaches are also useful for the purposes of model comparison. As demonstrated in the examples from Sections 2.3 and 3.3, the difference in Q statistics between models based on raw effect sizes and models based on aggregated effect sizes provides a measure for evaluating the appropriateness of strategies that involve aggregating effect size estimates. Alternately, likelihood ratio tests can be used to compare the correlated effects model to a model that also includes within-study heterogeneity. Because of their equivalence, a test of the correlated effects model reflects directly on the appropriateness of aggregating effects by study. Similarly, one could evaluate the appropriateness of shifting the unit of analysis by testing the subgroup correlated effects model against a model that allows for within-study, within-subgroup heterogeneity.
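One way such a difference in Q statistics could be computed is sketched below. The sketch assumes that Q is the weighted residual sum of squares from a fixed-effects fit using known sampling covariance matrices, and that predictors vary only at the study level; the exact definitions used in Sections 2.3 and 3.3 are not reproduced here, and the function name `q_decomposition` and the toy data are hypothetical. Under these assumptions, the difference equals a within-study Q statistic with degrees of freedom equal to the number of effect sizes minus the number of studies.

```python
import numpy as np

def q_decomposition(studies):
    """Return (Q_raw, Q_agg, df_within) for a fixed-effects meta-regression.

    studies: list of (T_j, V_j, x_j), where T_j is a (k_j,) vector of estimates,
    V_j is the (k_j, k_j) sampling covariance, and x_j is a (p,) predictor vector.
    """
    agg = []
    for T, V, x in studies:
        Vinv_1 = np.linalg.solve(V, np.ones(len(T)))
        s = Vinv_1.sum()                      # precision of the averaged estimate
        agg.append((Vinv_1 @ T / s, s, x))

    # Fixed-effects weighted least squares fit to the aggregated estimates.
    xwx = sum(s * np.outer(x, x) for _, s, x in agg)
    xwt = sum(s * t_bar * x for t_bar, s, x in agg)
    beta = np.linalg.solve(xwx, xwt)

    q_agg = sum(s * (t_bar - x @ beta) ** 2 for t_bar, s, x in agg)
    q_raw = 0.0
    for T, V, x in studies:
        resid = T - x @ beta                  # same fitted value for every row
        q_raw += resid @ np.linalg.solve(V, resid)
    df_within = sum(len(T) - 1 for T, _, _ in studies)
    return q_raw, q_agg, df_within

def cs_cov(v, rho):
    """Covariance matrix with variances v and a common assumed correlation rho."""
    V = rho * np.sqrt(np.outer(v, v))
    np.fill_diagonal(V, v)
    return V

# Hypothetical toy data: three studies, intercept-only model, assumed rho = 0.6.
toy = [
    (np.array([0.10, 0.45, 0.30]), cs_cov(np.array([0.04, 0.05, 0.04]), 0.6), np.array([1.0])),
    (np.array([0.25]), cs_cov(np.array([0.02]), 0.6), np.array([1.0])),
    (np.array([0.60, 0.20]), cs_cov(np.array([0.03, 0.03]), 0.6), np.array([1.0])),
]
q_raw, q_agg, df_within = q_decomposition(toy)
q_diff = q_raw - q_agg
```

If the assumed sampling covariances are correct and there is no within-study heterogeneity, `q_diff` could be referred to a chi-squared distribution with `df_within` degrees of freedom. A large value relative to that reference would cast doubt on aggregation.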
Comparisons among multivariate working models might be of particular interest in syntheses where different subgroups of effects can be conceptualized as a multivariate outcome. For instance, in a synthesis of intervention studies on second language learning, researchers might conceptualize outcomes as consisting of several dimensions, such as listening, speaking, and writing abilities. Currently, researchers often use the ad hoc methods of subgrouping or shifting the unit of analysis to analyze such data, conducting what amount to separate meta-analyses of available data from each dimension. Recognizing the connection between these approaches and the corresponding multivariate model, the subgroup correlated effects model, is useful because it can facilitate comparisons with other multivariate models. Rather than treating different subgroups of effects as independent, researchers might entertain a model in which the study-specific effect sizes from each subgroup follow a correlated, multivariate random effects distribution (Jackson et al., 2011; Sera et al., 2019). Further research is needed to understand the advantages and drawbacks of such multivariate random effects models relative to models that treat different dimensions as independent—especially for meta-analytic data where not every study includes findings for every dimension.
On a practical level, aggregating effect sizes, subgrouping, or shifting the unit of analysis might be appealing because they make an analysis easier to follow, because using them makes it easier to run diagnostics or create illustrations of the results, or because of software limitations. If the corresponding multivariate working model is appropriate, then the results that we have presented provide formal justification for using such reductionist strategies. For instance, a graphical representation of meta-analysis results such as a forest plot could be based on the aggregated effect size estimates rather than the raw estimates, reducing the complexity of the figure. Carrying out calculations on aggregated or subgroup-aggregated effect size estimates will also tend to be more computationally efficient than fitting the corresponding multivariate model. Although differences in computation time might be trivial for estimation of summary meta-analytic models with moderately sized datasets, computational efficiency might matter in practice for syntheses of large datasets and for more demanding calculations such as sensitivity analyses, profile likelihood confidence regions, bootstrap hypothesis tests (e.g., Du et al., 2023; Joshi et al., 2022), or applications of multiple imputation for missing predictor values (Lee & Beretvas, 2023).
The equivalencies that we have presented are defined under the premise that there is no missingness in the predictor variables included in the meta-regression model. In practice, it is common that the predictors of interest may be available only for a subset of included studies. Recent methodological research has examined various techniques for handling missing predictors, including complete-case analysis, shifting-case analysis, multiple imputation, and full information maximum likelihood (Cheung, 2008, 2014; Lee & Beretvas, 2023; Schauer et al., 2022). If the complete-case analysis or shifting-case analysis method is used to remove the effect sizes with missing predictors, the equivalencies between ad hoc strategies and multivariate models will hold for the subset of data with completely observed predictors. However, further research is needed to examine different representations of meta-regression models with missing predictors if multiple imputation or full information maximum likelihood (Cheung, 2008, 2014) methods are used to account for missingness. The equivalencies that we have described in the present article also highlight the need for missing data methods that work under the multilevel structure inherent in meta-analysis of dependent effect sizes.
The correspondences between models that we have described are limited in several respects. First, exact equivalence between reductionist approaches and the corresponding multivariate models holds only for estimation methods that are based on the likelihood or restricted likelihood of the models. Some other commonly used methods are not based on the model likelihood, such as moment estimators for variance components, confidence intervals based on generalized Q-statistics (Jackson, 2013), or the Hartung and Knapp (2001) correction to standard errors. They will not necessarily be equivalent (or even defined) across both representations. Further research is needed to better understand the relative strengths of likelihood-based estimation versus moment-estimation methods for models with dependent effect sizes, such as those proposed by Hedges et al. (2010) and Jackson et al. (2013).
Second, exact equivalence with multivariate models is limited to reductionist approaches that involve aggregating using inverse-variance weighted averages, with sampling variances calculated based on an assumed correlation between effect size estimates. Some meta-analysts might use more casual approximations for calculating aggregated effect sizes and variances, such as taking simple averages of effect size estimates and sampling variances by study. This could lead to differences between results based on aggregated effect size estimates and results based on the multivariate model, although we expect that such differences will usually be minor in practice.
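The contrast between the two aggregation rules can be made concrete with a short Python sketch. The function names, the toy estimates, and the assumed common correlation of 0.6 are illustrative assumptions, not values from the empirical examples.

```python
import numpy as np

def aggregate_ivw(T, V):
    """Inverse-variance weighted average of T and its sampling variance.

    T: (k,) effect size estimates from one study.
    V: (k, k) sampling covariance matrix built from an assumed correlation.
    """
    Vinv_1 = np.linalg.solve(V, np.ones(len(T)))
    s = Vinv_1.sum()                     # 1' V^{-1} 1
    return Vinv_1 @ T / s, 1.0 / s

def aggregate_simple(T, V):
    """Casual approximation: simple means of the estimates and of the variances."""
    return T.mean(), np.diag(V).mean()

# Hypothetical study with three correlated estimates and unequal variances.
T = np.array([0.10, 0.40, 0.25])
v = np.array([0.04, 0.09, 0.01])
rho = 0.6                                # assumed common correlation
V = rho * np.sqrt(np.outer(v, v))
np.fill_diagonal(V, v)

est_ivw, var_ivw = aggregate_ivw(T, V)
est_avg, var_avg = aggregate_simple(T, V)
```

When the sampling variances within a study are equal, the two rules return the same averaged estimate and differ only in its variance, since the simple mean of the variances ignores both the number of estimates and their correlation. With unequal variances, as in the toy study, the averaged estimates can differ as well.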
Finally, the equivalencies that we have described are limited to meta-regression models where the predictors vary only at the level of aggregation—that is, at the study level when aggregating effects by study or at the subgroup level when aggregating effects by subgroup within study. A key advantage of integrative models is that they permit inclusion of predictors that vary across effect sizes within the same subgroup or same study, allowing the meta-analyst to control for potential methodological confounding factors, differences between outcome measures, or other features that may explain within-study variation (López-López et al., 2018; Tipton et al., 2019). There remains a need to investigate the benefits and limitations of using the correlated effects or subgroup correlated effects working models for meta-regressions that include such effect-level predictors.
The array of tools and approaches for synthesizing dependent effect sizes has expanded substantially over the past 15 years. As meta-analysts continue to work with data that involve complex dependence structures, there is a need for further guidance about how to select an appropriate modeling strategy. Recognizing the equivalencies between reductionist strategies and multivariate working models will, we hope, help in establishing a more principled basis for such decisions.
Supplemental Material
Supplemental Material, sj-docx-1-jeb-10.3102_10769986241232524 for Equivalencies Between Ad Hoc Strategies and Multivariate Models for Meta-Analysis of Dependent Effect Sizes by James E. Pustejovsky and Man Chen in Journal of Educational and Behavioral Statistics
Footnotes
Authors’ Note
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research and/or authorship of this article.
References
