Abstract
One challenge in using multilevel models is determining how to report the amount of explained variance. In multilevel models, explained variance can be reported for each level or for the total model. Existing measures have been based primarily on the reduction of variance components across models. However, these measures have not been reported consistently because they have some undesirable properties. The present study is one of the first to evaluate the accuracy of these measures using Monte Carlo simulations. In addition, a measure based on the full partitioning of variance in multilevel models was examined. With the exception of the Level 2 explained variance measure, all measures performed well across our simulated conditions.
Multilevel modeling (MLM) has become an important tool for organizational scientists because it allows researchers to analyze data that reside at different levels. Organizations are inherently hierarchically structured. For example, employees work in teams, supervisors manage teams, and supervisors work in organizations that have their own unique culture. It is important to consider both individual-level variables and contextual variables when analyzing organizational relationships. Ignoring the influence of contextual variables discards important pieces of the puzzle of organizational behavior and may lead to erroneous conclusions about both micro- and macro-level relationships (Bliese & Hanges, 2004). In the past two decades, a number of articles (Bliese & Ployhart, 2002; Hofmann & Gavin, 1998; Kreft, de Leeuw, & Aiken, 1995; LaHuis & Ferguson, 2009; Maas & Hox, 2005), book chapters (Bliese, 2000; Long, 2012), and books (Hox, 2010; Klein & Kozlowski, 2000; Raudenbush & Bryk, 2002; Snijders & Bosker, 2012) have been dedicated to understanding both technical issues and the application of MLM in an organizational context. Researchers have utilized MLM to study topics such as organizational safety (Bacharach, Bamberger, & Doveh, 2008; Neal & Griffin, 2006), leadership and management (Atwater, Wang, Smither, & Fleenor, 2009; Chen, Kirkman, Kanfer, Allen, & Rosen, 2007), teams (Dierdorff, Rubin, & Morgeson, 2009), stress (Liu, Wang, Zhan, & Shi, 2009), and rater effects (Ng, Koh, Ang, Kennedy, & Chan, 2011). Furthermore, the use of MLM in organizational research is rising. Mathieu, Aguinis, Culpepper, and Chen (2012) conducted a thorough review of multilevel articles published in the Journal of Applied Psychology from 2000 to 2010. They found that there was an average of three published articles using MLM per year between 2000 and 2002. This number more than quadrupled to an average of 13 articles per year between 2008 and 2010.
There were similar trends in other prominent journals, such as Organizational Research Methods (Aguinis, Pierce, Bosco, & Muslin, 2009). However, MLM is a relatively new technique and there are still some technical issues that require the attention of researchers. Due to the ubiquitous nesting structure in organizations and the rising popularity of MLM, it is important for researchers to address these issues.
One challenge facing researchers using multilevel models is how to estimate and report effect sizes. Reporting effect sizes is important because they are useful in theory building and testing (Aguinis et al., 2010) and communicating results to others from different professions (Thompson, 2002). The most common effect size in multilevel modeling is explained variance (R²). However, the presence of multiple variance components in multilevel models complicates the estimation of these statistics. For example, in a basic model containing one Level 1 predictor, one Level 2 predictor, and a cross-level interaction, there are potentially three variance components (e.g., within-group variance, intercept variance, and slope variance) that a researcher can explain. Although a number of explained variance measures for multilevel models exist, there are some issues with these statistics. First, it is possible to find negative values for some of the measures. That is, adding predictors may actually decrease the amount of explained variance. In addition, the accuracy of these measures is unknown. The result of these issues is that researchers often do not report the amount of explained variance in multilevel studies. For example, we searched the 10 most prestigious journals identified by Zickar and Highhouse (2001) using the terms multilevel and hierarchical linear modeling and found 126 multilevel articles. Of these, only 76 (60%) contained some measure of explained variance. This is unfortunate given the importance of effect sizes and the known issues of lower power for some of the relationships in these models (Mathieu et al., 2012).
In the present study, we evaluate the accuracy of the existing measures of explained variance for multilevel models across a number of conditions. To our knowledge, this is the first study to examine the performance of the various effect size indices. Furthermore, we identify some critical issues researchers face when using these measures and seek to provide guidance on these issues based on Monte Carlo simulations. Our hope is that the current study will facilitate the use of explained variance measures in multilevel research. Before discussing effect size measures and the issues that may arise when using them, we will briefly describe a basic multilevel model to introduce concepts for those new to MLM.
Multilevel Models
For simplicity, we focus on a basic two-level multilevel model that includes two Level 1 predictors (X1, X2), one Level 2 predictor (Z), and a cross-level interaction (X1 × Z) predicting a Level 1 outcome (Y). It is often useful to conceptualize multilevel models as equations at different levels. The Level 1 equation regresses the outcome onto the Level 1 predictors:
$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2j} X_{2ij} + R_{ij} \tag{1}$$
In this equation, β0j is the intercept. β1j and β2j are the regression slopes predicting Yij using X1ij and X2ij, respectively. The Level 1 error is Rij and its variance (σ²) represents within-group variance not explained by the model. The subscripts i and j refer to the Level 1 and Level 2 units, respectively. The Level 2 equations model variance in β0j, β1j, and β2j:
$$\beta_{0j} = \gamma_{00} + \gamma_{01} Z_j + U_{0j} \tag{2}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11} Z_j + U_{1j} \tag{3}$$
$$\beta_{2j} = \gamma_{20} + U_{2j} \tag{4}$$
where Zj is a Level 2 predictor, the γs are Level 2 regression coefficients, and U0j, U1j, and U2j represent Level 2 errors. The regression coefficients are referred to as fixed effects; the Level 1 and Level 2 error terms are random effects.
A succinct way of expressing more complicated models in a single equation is:
$$Y_{ij} = \gamma_{00} + \sum_{h=1}^{p} \gamma_{h0} X_{hij} + \sum_{k=1}^{q} \gamma_{0k} Z_{kj} + \sum_{k=1}^{q} \sum_{h=1}^{p} \gamma_{hk} Z_{kj} X_{hij} + U_{0j} + \sum_{h=1}^{p} U_{hj} X_{hij} + R_{ij} \tag{5}$$
In Equation 5, there are p Level 1 predictors and q Level 2 predictors, and it is assumed that the Level 1 predictors Xhij do not correlate with the Level 2 predictors Zkj. There are p + 1 random parameters because there is one random intercept term and every Level 1 predictor has a random slope. In practice, researchers may only wish to specify random slopes for a subset of Level 1 predictors. The first four terms contain the fixed part of the model and contribute to the prediction of the outcome. The last three terms contain the random part of the model. The variance of U0j is τ00 and represents intercept variance not explained by Zj. Similarly, the variances of Uhj are τhh and represent the unexplained variance in the Level 1 slopes. It is useful to predict unexplained Level 1 slope variance with a Level 2 predictor, which is referred to as a cross-level interaction. It is common to allow the intercept and slopes to covary. A positive covariance between a random intercept and random slope means that groups with higher intercept values have stronger within-group effects, whereas a negative covariance suggests groups with higher intercept values have weaker within-group effects.
As previously mentioned, the presence of multiple variance components complicates the calculation of explained variance for multilevel models. In traditional ordinary least squares (OLS) regression, R² is the ratio of explained variance to total variance. However, in multilevel models, the total variance is partitioned into Level 1 and Level 2 variance components, so there is more than one variance that a model can explain.
Explained Variance Measures for Multilevel Models
For the most part, organizational researchers have used multilevel explained variance measures based on the reduction in variance components when adding predictors. These measures compare variance components from different models. A disadvantage of these measures is that they can result in negative explained variance estimates because of the way fixed effects and variance components are estimated. A few researchers (Hofmann, Morgeson, & Gerras, 2003; Vandenberghe et al., 2007; Wallace, Edwards, Arnold, Frazier, & Finch, 2009) reported explained variance from an OLS regression analysis, which is based on a variance partitioning approach. Although the OLS measure ignores the multilevel structure of the data, this will provide an unbiased estimate of explained variance because the fixed-effects estimates are typically estimated without bias. More recently, Nakagawa and Schielzeth (2013) proposed an explained variance measure based on variance partitioning that retains the multilevel structure of the model. In the following section, we describe the indices based on each approach in more detail.
Reduction in Variance Components-Based Measures
One of the first explained variance measures for multilevel models is based on the reduction of unexplained variance when predictors are added (Bryk & Raudenbush, 1992). These approximate R² values can be calculated at each level by first estimating the null model to determine the amount of variance at each level. Next, predictors are added to the model, which allows the researcher to compare the residual variance components to those from the null model. Level 1 explained variance can be calculated using the formula:
$$R^2_1(\text{approx}) = \frac{\sigma^2_{\text{null}} - \sigma^2_{\text{full}}}{\sigma^2_{\text{null}}} \tag{6}$$
Similarly, Level 2 explained variance is computed using
$$R^2_2(\text{approx}) = \frac{\tau_{00,\text{null}} - \tau_{00,\text{full}}}{\tau_{00,\text{null}}} \tag{7}$$
A major advantage of these measures is that they provide simple, intuitive measures of explained variance based on the logic of partitioning unexplained variance in two-level models. However, they can be problematic because they can result in negative values. That is, adding predictors may actually increase the amount of unexplained variance. This can occur because the between-group variance is a function of both Level 1 and Level 2 variances. Specifically, Snijders and Bosker (1994) show that the expected observed between-group variance is
$$\operatorname{Var}(\bar{Y}_{\cdot j}) = \tau_{00} + \frac{\sigma^2}{n_j} \tag{8}$$
where nj is the group size. Thus, if a Level 1 predictor is added and σ² decreases, but the between-group variability is not affected, then τ00 must increase to balance out the decrease in σ². That is, adding a Level 1 predictor would increase the unexplained Level 2 variance.
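To make this mechanism concrete, the following minimal Python sketch assumes the relation between-group variance = τ00 + σ²/nj, holds the between-group variance fixed, and shows how a drop in σ² forces the implied τ00 upward. The function names and all numeric values are illustrative assumptions, not values from the simulations.

```python
def implied_tau00(between_var, sigma2, n_j):
    """Solve between_var = tau00 + sigma2 / n_j for tau00."""
    return between_var - sigma2 / n_j


def r2_reduction(var_null, var_full):
    """Proportional reduction in a variance component (null vs. full model)."""
    return (var_null - var_full) / var_null


# Hypothetical values chosen only to illustrate the mechanism.
n_j = 5
between_var = 0.50   # observed between-group variance, assumed unchanged
sigma2_null = 1.00   # Level 1 variance in the null model
sigma2_full = 0.70   # Level 1 variance after adding a Level 1 predictor

tau00_null = implied_tau00(between_var, sigma2_null, n_j)
tau00_full = implied_tau00(between_var, sigma2_full, n_j)

r2_level1 = r2_reduction(sigma2_null, sigma2_full)   # positive
r2_level2 = r2_reduction(tau00_null, tau00_full)     # negative

print(round(tau00_null, 2), round(tau00_full, 2))
print(round(r2_level1, 2), round(r2_level2, 2))
```

In this example the implied τ00 rises from .30 to .36, so the Level 2 reduction-in-variance estimate is negative even though the Level 1 predictor explains 30% of the within-group variance.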
Snijders and Bosker (1994) offered alternative measures to address this issue of negative explained variance estimates. They originally described measures for Level 1 and Level 2 R² values; however, according to our review only the Level 1 version has been used because it reflects the total amount of explained variance. The Level 2 measure did not have a straightforward interpretation. Thus, we focused on the Level 1 version. They described explained variance as the proportional reduction in mean squared prediction error. To address the issue of the interplay between τ00 and σ², they used both to calculate explained variance. One measure provides an estimate of a total R² by using the σ² and τ00 estimates from the null and full models using the formula
$$R^2(\text{S\&B}) = 1 - \frac{\sigma^2_{\text{full}} + \tau_{00,\text{full}}}{\sigma^2_{\text{null}} + \tau_{00,\text{null}}} \tag{9}$$
Equations 6, 7, and 9 are appropriate for random-intercept models. When random slopes are present, the formulas for partitioning unexplained variance are more complicated as described below. Snijders and Bosker (1994) provided a formula for calculating
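As a worked illustration of the total measure for random-intercept models, the sketch below computes one minus the ratio of the summed full-model variance components to the summed null-model components. The function name and the input values are hypothetical.

```python
def r2_total(sigma2_null, tau00_null, sigma2_full, tau00_full):
    """Proportional reduction in total unexplained variance (sigma2 + tau00)."""
    return 1.0 - (sigma2_full + tau00_full) / (sigma2_null + tau00_null)


# Hypothetical components in which tau00 rises after adding a Level 1 predictor.
r2 = r2_total(sigma2_null=1.00, tau00_null=0.30,
              sigma2_full=0.70, tau00_full=0.36)
print(round(r2, 3))
```

Because both components enter the numerator and the denominator, an increase in τ00 that merely offsets part of a decrease in σ² does not by itself push this estimate below zero.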
Variance Partitioning–Based Measures
Several authors (Hofmann et al., 2003; Vandenberghe et al., 2007; Wallace et al., 2009) have used explained variance estimates from an OLS regression equation where Level 2 variables are disaggregated down to Level 1 units. R² (OLS) can be expressed in many ways. In the present context, it is useful to consider R² (OLS) in terms of how variance in the outcome is partitioned. In OLS, the variance of Y is partitioned into the variance of predicted scores and the residual variance:
$$R^2(\text{OLS}) = \frac{\operatorname{Var}(\hat{Y})}{\operatorname{Var}(\hat{Y}) + \operatorname{Var}(e)} \tag{10}$$
Equation 10 is the ratio of explained variance to total variance. Although OLS will result in biased standard errors for regression coefficients in multilevel contexts (Bliese & Hanges, 2004), it will usually produce unbiased estimates for the regression coefficients themselves. This will result in R 2 (OLS) being unbiased.
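The partitioning behind the OLS measure can be checked numerically. The sketch below fits a one-predictor OLS regression by hand on simulated data (the seed, sample size, and coefficient are arbitrary assumptions) and confirms that the variance of the predicted scores and the residual variance add up to the variance of the outcome.

```python
import random
from statistics import fmean, pvariance

random.seed(1)
x = [random.gauss(0, 1) for _ in range(500)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]

# Closed-form OLS estimates for a single predictor with an intercept.
mx, my = fmean(x), fmean(y)
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
intercept = my - slope * mx

pred = [intercept + slope * a for a in x]
resid = [b - p for b, p in zip(y, pred)]

# Explained variance as variance of predictions over total variance.
r2_ols = pvariance(pred) / (pvariance(pred) + pvariance(resid))
print(round(r2_ols, 3))
```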
Nakagawa and Schielzeth (2013) proposed an R² measure that considers the full partitioning of variance for multilevel models as described by Snijders and Bosker (2012). To simplify the presentation, we refer to the variance of predicted scores as σ²F. For a random-intercept model, the variance of the outcome is then
$$\operatorname{Var}(Y_{ij}) = \sigma^2_F + \tau_{00} + \sigma^2 \tag{11}$$
The last two terms are the unexplained variances at Level 2 and Level 1 as defined previously. From this equation, an explained variance measure based on multilevel variance partitioning, R² (MVP), can be calculated. The equation is
$$R^2(\text{MVP}) = \frac{\sigma^2_F}{\sigma^2_F + \tau_{00} + \sigma^2} \tag{12}$$
Put simply, R 2 (MVP) is the variance of the predicted scores divided by the variance of the predicted scores plus the Level 1 and Level 2 variance components. This is equivalent to the R 2 measure proposed by Nakagawa and Schielzeth (2013).
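The verbal definition above can be sketched in a few lines of Python. The fixed-effect values, the predictor values, and the variance components below are hypothetical; in practice they would come from a fitted random-intercept model.

```python
from statistics import pvariance


def r2_mvp(var_fixed, tau00, sigma2):
    """Variance of predicted scores over predicted plus Level 2 and Level 1 variance."""
    return var_fixed / (var_fixed + tau00 + sigma2)


# Hypothetical fixed effects and a tiny balanced data set (x and z uncorrelated).
gamma00, gamma10, gamma01 = 1.0, 0.5, 0.3
x = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
z = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]

# Predicted scores use the fixed part of the model only.
pred = [gamma00 + gamma10 * xi + gamma01 * zi for xi, zi in zip(x, z)]
var_fixed = pvariance(pred)

r2 = r2_mvp(var_fixed, tau00=0.20, sigma2=0.60)
print(round(var_fixed, 4), round(r2, 4))
```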
Nakagawa and Schielzeth (2013) did not consider models with random slopes. Consistent with Snijders and Bosker’s (2012) recommendation, they suggested simply basing the explained variance on a model without random slopes because they generally do not affect the amount of explained variance. However, incorporating random slopes in the partitioning of variance in Yij is straightforward. We again simplify the presentation somewhat by using
where μX(q) is a vector of predictor means, T10 is a vector of intercept-slope covariances, and T11 is the covariance matrix of random slopes. The term
Researchers may find Equation 14 somewhat daunting. However, the statistical program R provides the necessary components. As with
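One way to see how these components combine is the standard result for the variance of a random intercept plus random slopes when the random effects are independent of the predictors: τ00 + 2μ′T10 + μ′T11μ + trace(T11 Σ). The sketch below is a hedged illustration of that decomposition, not a transcription of the article's equations. The name sigma_x (the covariance matrix of the predictors that have random slopes) and all numeric inputs are assumptions introduced here.

```python
def random_part_variance(tau00, t10, t11, mu, sigma_x):
    """tau00 + 2*mu'T10 + mu'T11*mu + trace(T11 @ sigma_x), using plain lists."""
    k = len(mu)
    cross = 2.0 * sum(mu[h] * t10[h] for h in range(k))
    quad = sum(mu[h] * t11[h][m] * mu[m] for h in range(k) for m in range(k))
    trace = sum(t11[h][m] * sigma_x[m][h] for h in range(k) for m in range(k))
    return tau00 + cross + quad + trace


def r2_mvp_random_slopes(var_fixed, sigma2, tau00, t10, t11, mu, sigma_x):
    """Variance of predicted scores over total variance with random slopes."""
    random_var = random_part_variance(tau00, t10, t11, mu, sigma_x)
    return var_fixed / (var_fixed + random_var + sigma2)


# One random slope; hypothetical variance components.
centered = random_part_variance(0.20, [0.05], [[0.10]], [0.0], [[1.0]])
uncentered = random_part_variance(0.20, [0.05], [[0.10]], [1.0], [[1.0]])
r2 = r2_mvp_random_slopes(0.40, 0.60, 0.20, [0.05], [[0.10]], [0.0], [[1.0]])
print(round(centered, 2), round(uncentered, 2), round(r2, 4))
```

With a mean-centered predictor the terms involving the predictor mean vanish, which is why only the slope variance scaled by the predictor variance is added to τ00 in the first call.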
Potential Issues in Reporting Explained Variance in Multilevel Models
We see three major issues that researchers may encounter when reporting explained variance in multilevel models. The first issue concerns what measure to report. This will likely depend on whether researchers are interested in explaining level-specific or total model variance.
A second issue is how to deal with random slopes. Researchers may either ignore slope variance or incorporate it in the equations. Snijders and Bosker (2012) and Nakagawa and Schielzeth (2013) advocate ignoring slope variance and basing explained variance calculations on the random-intercept model. However, to our knowledge this suggestion has not been tested empirically under a variety of conditions. We compared these two approaches using Monte Carlo simulations to assess whether one of these approaches should be preferred.
A final issue is how to compute and interpret the change in explained variance between two nested models. In OLS, this is calculated by taking the difference between the R² values produced by the two models. An alternative approach for the explained variance measures based on the reduction in variance components is to compute explained variance using the variance component estimates. It is important to note that the two approaches provide estimates of different things. For example, if Model 1 and Model 2 were nested models, we could compute a change in
Thus, the first approach compares the change in variance components from Models 1 and 2 relative to the variance component from the null model. The latter approach compares the change in variance components against the Model 1 variance and is simply the ratio of the difference between Model 1 and Model 2 variance to the variance component from Model 1. Thus, the first approach expresses the amount of incremental explained variance relative to the total variance. This is consistent with the traditional interpretation of change in R². The second approach provides the amount of explained variance relative to the Model 1 residual variance. This applies to explained variance measures based on the reduction in variance components (
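The difference between the two approaches is easy to see with numbers. In the sketch below, the function names and the three variance components (null model, Model 1, Model 2) are hypothetical values for a single variance component.

```python
def delta_r2_vs_null(var_null, var_m1, var_m2):
    """Change in variance between Models 1 and 2, relative to the null model."""
    return (var_m1 - var_m2) / var_null


def delta_r2_vs_model1(var_m1, var_m2):
    """Change in variance between Models 1 and 2, relative to Model 1."""
    return (var_m1 - var_m2) / var_m1


var_null, var_m1, var_m2 = 1.00, 0.80, 0.60   # hypothetical estimates

first = delta_r2_vs_null(var_null, var_m1, var_m2)
second = delta_r2_vs_model1(var_m1, var_m2)

# The first approach equals the difference between the two models' R-squared values.
r2_m1 = (var_null - var_m1) / var_null
r2_m2 = (var_null - var_m2) / var_null
print(round(first, 2), round(second, 2), round(r2_m2 - r2_m1, 2))
```

Here the same reduction in variance is expressed as .20 of the total variance under the first approach and as .25 of the Model 1 residual variance under the second.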
The Present Study
The purpose of the current study is to shed light on the three issues we believe researchers are likely to encounter when calculating and reporting explained variance measures in multilevel models. The accuracy of these measures plays an integral role in resolving each issue. It is not clear how accurate any of the explained variance measures are or how they are affected by various characteristics of multilevel studies. To address this, we evaluated the performance of the explained variance measures using three criteria: unbiasedness, consistency, and efficiency. An estimator is unbiased when the expected value of the estimator equals the population value. In our case, an explained variance measure would be unbiased if the mean across samples equaled the population R². We adopted Hoogland and Boomsma’s (1998) criterion of .05 or less for acceptable parameter estimate bias. A consistent estimator is one whose estimates converge toward the population value as sample size increases. In the present study, an explained variance measure is consistent if it exhibits less bias as sample size increases. In the multilevel context, sample size refers to the number of groups, group size, or both. We evaluated consistency with respect to all three. Efficiency refers to the precision of an estimator. In the present study, we evaluated efficiency by computing the standard deviation of point estimates for each condition. These standard deviations can essentially be interpreted as standard errors because our generated data represent a population. Finally, effect size measures should be scaled properly given the question of interest (Kelley & Preacher, 2012). All of the measures in the present study seek to quantify the percentage of variance explained by the predictors. Thus, they should have a minimum of zero and a maximum of one. Negative estimates greatly complicate the interpretation of explained variance and of change in explained variance. We tracked the percentage of times an R² index returned a negative value.
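These evaluation criteria can be summarized for one simulated condition as in the following sketch. The function name, the example estimates, and the choice to apply the .05 cutoff to absolute bias are assumptions made for illustration.

```python
from statistics import fmean, stdev


def summarize_estimates(estimates, population_value, cutoff=0.05):
    """Bias, empirical SD, and percentage of negative estimates across replications."""
    bias = fmean(estimates) - population_value
    return {
        "bias": bias,
        "sd": stdev(estimates),
        "pct_negative": 100.0 * sum(e < 0 for e in estimates) / len(estimates),
        # Assumption: the cutoff is applied to the absolute bias.
        "acceptable_bias": abs(bias) <= cutoff,
    }


# Hypothetical R-squared estimates from five replications; population value .10.
summary = summarize_estimates([0.12, 0.08, -0.01, 0.11, 0.10], 0.10)
print(summary)
```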
Method
Design
To evaluate the various explained variance measures, we conducted two sets of Monte Carlo simulations that examined different generating multilevel models. We chose these models because they contain the possible effects one might encounter in multilevel models (cross-level effects, cross-level interactions, etc.). In our first set of simulations, we used a random-intercept model with two Level 1 predictors and one Level 2 predictor. In the second set of simulations, we added a random slope for a Level 1 predictor and a cross-level interaction to the previous model.
For both sets of simulations, we manipulated the number of groups, average group size, the intraclass correlation (ICC) in the outcome, and the amount of explained variance. We based these values on our literature review of multilevel articles in the organizational literature. For each article, we recorded the number of groups, group size, parameter estimates for fixed and random effects, and whether or not explained variance was reported. We derived the number of groups, average group sizes, and R² values based on the 25th, 50th, and 75th percentiles of those reported in the studies. Based on these results, we defined the number of groups to be 40, 70, or 160. We used unbalanced group sizes with average group sizes of 4, 8, or 21. Sizes for each group were sampled randomly from a vector ranging from 2 to 27 until the desired number of groups and average group sizes were reached. The R² values for each level were .1, .3, or .5 and reflected a broad range of explained variance values. We set the ICC for Y to either .15 or .3. In the random-intercept model simulations, we set the correlation between Level 1 predictors to .3. To simplify the data generation process for the random-slope models, we set the correlation between Level 1 predictors to 0.
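The exact sampling routine for the unbalanced group sizes is not spelled out, so the following is only one possible sketch. It swaps in a constrained sequential draw: each size is drawn from the 2 to 27 range, narrowed so that the remaining groups can still reach the target total. The function name and this procedure are assumptions, not the procedure used in the study.

```python
import random


def sample_group_sizes(n_groups, avg_size, low=2, high=27, rng=random):
    """Draw unbalanced sizes in [low, high] whose mean is exactly avg_size."""
    remaining_total = n_groups * avg_size
    sizes = []
    for i in range(n_groups):
        groups_left = n_groups - i - 1
        # Narrow the range so the remaining groups can still hit the total.
        lo = max(low, remaining_total - high * groups_left)
        hi = min(high, remaining_total - low * groups_left)
        size = rng.randint(lo, hi)
        sizes.append(size)
        remaining_total -= size
    rng.shuffle(sizes)  # remove the ordering induced by sequential draws
    return sizes


random.seed(42)
sizes = sample_group_sizes(n_groups=40, avg_size=4)
print(sum(sizes) / len(sizes), min(sizes), max(sizes))
```

A sequential draw of this kind tends to produce a few large groups and many small ones when the target average is near the lower bound, so the resulting size distribution should be inspected before use.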
Data Generation
For the random-intercept models, we generated data for basic two-level multilevel models using the software program R. Essentially, we specified Level 1 and Level 2 population correlation matrices resulting from the specified model and sampled randomly from the multivariate normal distribution using the mvrnorm command in R. The R code used for the data generation is available from the first author.
For the random-intercept model, the equations for the underlying model were
Table 1 presents the equations used to generate the correlation matrices. We generated the correlations between the outcome and predictors to correspond to the desired R 2 values at Level 1 based on the equations. That is, these values changed systematically to produce the desired population Level 1 R 2. A similar procedure was used for the Level 2 variable. We used the mvrnorm command and specified the population correlation matrix at Level 2. This included correlations between the between-group variance of the outcome and the Level 2 predictor. Again, the actual values of the correlation matrix varied to produce the desired between-group variances in the outcome and the predictors, as well as the desired Level 2 R 2. We then combined the Level 1 and Level 2 data sets to create the full multilevel data set.
Table 1. Data Generation for the Random-Intercept Model Simulations.
Note: τ00 equaled approximately .176 for the ICCY = .15 condition and .429 for the ICCY = .30 condition.
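The τ00 values in the note are what the ICC definition gives if the Level 1 variance is fixed at 1, which is an assumption made here for illustration: ICC = τ00 / (τ00 + σ²) implies τ00 = ICC × σ² / (1 − ICC).

```python
def tau00_from_icc(icc, sigma2=1.0):
    """Solve ICC = tau00 / (tau00 + sigma2) for tau00."""
    return icc * sigma2 / (1.0 - icc)


tau00_low = tau00_from_icc(0.15)
tau00_high = tau00_from_icc(0.30)
print(round(tau00_low, 3), round(tau00_high, 3))
```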
For the random-slope model, we used a different procedure to generate the data because specifying population Level 1 and Level 2 correlations is considerably more complicated with random-slope models. Instead, we generated the data directly from the equation:
Values for X1, X2, and Z were drawn randomly from standard normal distributions. Values for U0j were drawn from a normal distribution with a mean of 0 and a standard deviation equal to
Analyses
We analyzed the data using restricted maximum likelihood estimation as implemented in the lme4 package in R. A null model was first estimated to obtain baseline σ² and τ00 variance components. Next, we estimated the reduced and full models to obtain their respective variance components. We then calculated the various R² measures using the appropriate equations. We assessed the accuracy of the effect size measures by calculating mean bias and standard deviation of the point estimates. Bias was the mean difference between the estimated
Results
Accuracy of R 2 Measures
Random-Intercept Models
Table 2 presents the overall results for the random-intercept and random-slope models. For the random-intercept model, all measures displayed acceptable levels of bias with values ranging from –0.011 to 0.006 with
Table 2. Overall Results.
Note: Estimates averaged across 54,000 samples for random-intercept conditions and across 54,000 samples for random-slope conditions. Values in parentheses are based on estimates using only the random-intercept model. There is only one value for R 2 (OLS) because it is based on a disaggregated version of the multilevel model that does not consider Level 2 variance.
Table 3 presents the average bias and standard deviation values as a function of the conditions we manipulated for the random-intercept models. All measures were consistent in that they exhibited less bias as group size increased. The amount of between-group variance in the outcome had little influence on bias. There was more bias as population R 2 increased for all of the measures except R2(OLS) and R2(MVP), which became more accurate as the size of the effect increased.
Table 3. Simulation Results for the Random-Intercept Model Simulations.
Note: NGS = number of groups; NJ = cases per group; Estimates averaged across 18,000 samples for NGS, NJ, and R 2 conditions and across 27,000 samples for ICC Y conditions.
The overall results for the random-slope models are also displayed in Table 2.
Table 4 presents the bias and standard deviation values as a function of the manipulated conditions for random-slope models. In general, the manipulated conditions had little impact on bias and standard deviation values. For the most part, the measures were consistent and produced less bias as the sample size increased.
Table 4. Simulation Results for the Random-Slopes Model Simulations.
Note: NGS = number of groups; NJ = cases per group; estimates averaged across 36,000 samples for NGS, NJ, and R 2 conditions and across 54,000 samples for ICC Y and ICC X1 and X2 conditions.
Random-Slope Models
We were also interested in how researchers should deal with random slope variance. One solution is to incorporate it into the relevant formulas. Another solution is to base explained variance on a random-intercept model where the slopes are fixed. To compare these two approaches, we estimated two sets of models. One is the random-slope model with two Level 1 predictors and a cross-level interaction reported previously. The other random-slope model omitted the cross-level interaction, thereby resulting in more unexplained slope variance. We used two different approaches to calculate explained variance for each model. The first approach uses the parameters produced by the random-slope model to calculate the various explained variance estimates. The second approach uses the parameter estimates derived from the random-intercept version of the two models. As shown in Table 2, both approaches performed well in terms of bias and efficiency for most of the measures when the cross-level interaction was included. However, when the cross-level interaction was not included
Changes in R2
Table 5 presents the results for the changes in explained variance analyses. We first assessed the change in explained variance when a Level 1 predictor is added. We accomplished this by comparing the full random-intercept model previously described to a reduced model where one of the Level 1 predictors is omitted. Thus, the explained Level 1 variance in the reduced model is half that of the full model. For example, for an R² of .3 and an ICC of .15, the difference in Level 1 explained variance between the full and reduced models was .15. This would be the population value for change in
Table 5. Results for Changes in R².
Note: Estimates averaged across 54,000 samples for random-intercept conditions and across 54,000 samples for random-slope conditions.
We also assessed changes in R² due to adding a Level 2 predictor for random-intercept models. This was accomplished by comparing the full model to a reduced model where the Level 2 predictor Z was omitted. This resulted in an expected change in
Finally, we assessed changes in R 2 when a cross-level interaction was added to an equation containing two Level 1 predictors and one Level 2 predictor. Thus, we compared the full random-slope model with a random-slope model without the cross-level interaction. The population values for changes in R 2 were identical to those when a Level 1 predictor was added. As shown in Table 5, all of the measures demonstrated acceptable levels of bias and efficiency. All measures except R 2 (OLS) returned negative estimates, but the occurrence was rare (<1% of the time).
Supplemental Analyses
In the previous simulations, we found only small differences between the explained variance measures. One reason for this may have been the clean delineation between Level 1 and Level 2 effects. In practice, this may not be the case. For example, it is possible for a variable to have different Level 1 and Level 2 effects (Snijders & Bosker, 2012). When this is the case, OLS and MLM may produce biased estimates if that variable is uncentered or grand-mean centered. In contrast, group-mean centering the variable and adding the group mean as a Level 2 predictor will produce fairly accurate estimates. However, it is not clear how each of the explained variance measures will perform when this occurs.
We investigated this by conducting some smaller simulations to assess if unequal Level 1 and Level 2 effects produced different results. Given the results of the aforementioned analyses, we generated data with 70 groups, an average of 8 cases per group, and a .30 ICC for the outcome. In one simulation, we specified Level 1 and Level 2 coefficients to be opposite in sign using the same levels of effect size as described previously. For example, in one condition, at Level 1, the variable had a positive effect that explained 10% of the Level 1 variance, and at Level 2, the variable had a negative effect that explained 50% of the Level 2 variance. We fully crossed the signs and effect size levels to produce 18 conditions. In the other simulation, we specified different positive effects at Level 1 and Level 2. For example, one condition had a positive Level 1 effect that explained 10% of the Level 1 variance and a positive Level 2 effect that explained 50% of the Level 2 variance. For both simulations, we generated 1,000 data sets per condition. For each sample, we analyzed the data using grand-mean centering as well as group-mean centering with the group means added in as a Level 2 predictor.
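The two centering approaches compared here can be sketched as follows. The function names and the toy data are hypothetical; the point is only to show which variables enter the model under each approach.

```python
from collections import defaultdict
from statistics import fmean


def grand_mean_center(x):
    """Subtract the overall mean from every observation."""
    grand_mean = fmean(x)
    return [v - grand_mean for v in x]


def group_mean_center(x, groups):
    """Return within-group deviations and the group means as a Level 2 predictor."""
    by_group = defaultdict(list)
    for value, g in zip(x, groups):
        by_group[g].append(value)
    means = {g: fmean(values) for g, values in by_group.items()}
    centered = [value - means[g] for value, g in zip(x, groups)]
    group_means = [means[g] for g in groups]
    return centered, group_means


x = [1.0, 2.0, 3.0, 10.0, 20.0]
groups = ["a", "a", "a", "b", "b"]

x_within, x_between = group_mean_center(x, groups)
x_grand = grand_mean_center(x)
print(x_within, x_between)
```

Under group-mean centering, x_within carries only within-group variation and x_between carries only between-group variation, so the two effects can be estimated separately; the grand-mean centered variable mixes both sources.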
Table 6 presents the results. As expected, there were clear differences between the centering approaches, with the group-mean centering approach producing less biased estimates. The bias for grand-mean centering was greater as the discrepancy between the effect sizes grew. However, we again found little difference across explained variance measures, suggesting that centering choices and unequal effect sizes did not have different effects on the explained variance measures.
Table 6. Unequal Effect Sizes at Level 1 and Level 2.
Note: Estimates averaged across 18,000 samples for opposite sign different effect results and 6,000 samples for same sign different effect results. Values outside parentheses are group-mean centered; values in parentheses are grand-mean centered.
Discussion
A major challenge for researchers conducting multilevel analyses is to determine how to report explained variance. As such, the amount of explained variance is often not reported for multilevel analyses in organizational studies. Explained variance measures provide a useful summary of the magnitude of effects and may be particularly useful in multilevel studies where unstandardized coefficients are reported often. In the present study, we compared several existing multilevel explained variance measures and investigated several issues likely to occur when using these measures. A summary of the comparisons is presented in Table 7. Our results allow us to draw several conclusions about the measures and provide specific recommendations for their use.
Table 7. Comparison of Multilevel Explained Variance Measures.
Overall, most of the measures (
The conditions that we manipulated did not appear to have large effects on the majority of the explained variance measures. We manipulated the number of groups, group size, the amount of explained variance at each level, the amount of between-group variance in the outcome, and centering choices. None of these appeared to have large effects on the bias or standard deviations of the measures for either the random-intercept or random-slope models. Finally, most measures resulted in some negative estimates even when the population effect size was nonzero. This occurred only for the measures that relied on comparing variance components across models. These negative values can likely be attributed to estimation error when sample sizes are smaller.
The presence of random slopes in multilevel models can create problems for estimating explained variance because the partitioning of variance is considerably more tedious. As such, some (Nakagawa & Schielzeth, 2013; Snijders & Bosker, 2012) have suggested basing explained variance on results from a random-intercept model because random slopes do not generally affect the prediction of the outcome. We compared this approach with incorporating the random-slope variance into the measures when applicable. Our results suggest little difference between the two approaches when there was little slope variance but that
The results of the changes in explained variance analyses were generally consistent with those of the overall analyses. One exception was that negative values were found for R²(MVP). This primarily occurred because R²(MVP) was overestimated in the model without the cross-level interaction and is consistent with the notion that negative explained variance estimates from multilevel models are a result of model misspecification.
Recommendations
The results of the present study allow us to make some specific recommendations for reporting explained variance. Table 6 provides a summary of our recommendations. In general,
Researchers wishing to examine level-specific explained variance can use
Random slopes can complicate the partitioning of variance in multilevel models. Our results support Snijders and Bosker’s (1994) suggestion of using random-intercept models to calculate explained variance because random slopes do not generally predict the outcome. In fact, our results suggest that this approach is necessary for estimating
We suggest that researchers use changes in explained variance for expressing the magnitude of cross-level interaction effects.
Finally, we caution researchers to consider whether it is possible for variables to have different within- and between-group effects when calculating explained variance for multilevel models. When this is the case, OLS and MLM approaches may produce biased estimates unless group-mean centering is used and the group means are added as a Level 2 predictor. We recommend that researchers always compare results between grand-mean centered and group-mean centered analyses. In addition, although R²(OLS) can be used to assess explained variance in multilevel models, the OLS-based standard errors will be biased. Thus, researchers need to use MLM to calculate accurate significance levels and confidence intervals.
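As a minimal sketch of the centering step described above, the following code splits a Level 1 predictor into a group-mean centered (within-group) part and a group-mean (between-group) part that can be entered as a Level 2 predictor. The function name and the small data set are hypothetical and serve only as an illustration; any statistical package's own centering utilities could be used instead.

```python
from collections import defaultdict
from statistics import mean


def group_mean_center(groups, x):
    """Return (within, between) for a Level 1 predictor.

    within  -- each value minus the mean of its own group
    between -- the group mean, repeated for every member of the group
    """
    by_group = defaultdict(list)
    for g, v in zip(groups, x):
        by_group[g].append(v)
    means = {g: mean(vs) for g, vs in by_group.items()}
    within = [v - means[g] for g, v in zip(groups, x)]
    between = [means[g] for g in groups]
    return within, between


# Hypothetical data: two groups of three individuals each.
groups = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]

x_within, x_between = group_mean_center(groups, x)
print(x_within)   # within-group deviations (Level 1 predictor)
print(x_between)  # group means (Level 2 predictor)
```

Entering both parts in the model allows the within-group and between-group effects of the same variable to differ, rather than forcing them into a single coefficient.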
Limitations and Future Research
One limitation of the present study is that we examined a limited number of multilevel models. We assessed the performance of explained variance measures across multilevel models that were selected because they represent some of the common scenarios in multilevel research and because they contain most of the types of effects that will interest organizational researchers. We believe our results generalize to models with more predictors and/or random effects because these terms are easily added to the models. However, it is possible that this is not the case.
We did not manipulate between-group variance in the Level 1 predictors in our simulations. As such, our results generalize to group-mean centered models. In practice, the decision about which centering method to use ultimately depends on the research question. Our focus on group-mean centering is consistent with Mathieu et al. (2012) and the recommended approach of testing cross-level interactions (Hofmann & Gavin, 1998). In addition, when there is considerable between-group variance in Level 1 predictors, we recommend using group-mean centering and using the group means as Level 2 predictors. This allows researchers to follow Snijders and Bosker’s (2012) recommendation to consider the possibility of different within- and between-group effects.
Finally, we agree with Kelley and Preacher (2012) that effect sizes are estimates and should be reported with confidence intervals. Unfortunately, computing standard errors for multilevel R² measures in a single sample is not straightforward because they are partially a function of variance components that have standard errors with undesirable properties (Snijders & Bosker, 2012). The most logical approach would be to employ bootstrapping; however, there is some debate about whether researchers should use case or residual bootstrapping (Goldstein, 2011). Thus, the question of how to estimate standard errors for multilevel explained variance measures still needs to be addressed.
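For readers unfamiliar with the case bootstrap, the following sketch shows its general logic for clustered data: whole groups are resampled with replacement, the statistic is recomputed in each resample, and a percentile interval is taken. It is an illustration under stated assumptions and not a recommended solution to the question raised above. The function names, the simulated data, and the number of resamples are all hypothetical, and the statistic used here (a squared correlation) is only a simple stand-in for a multilevel explained variance measure, which would require refitting the multilevel model in every resample.

```python
import random


def r_squared(rows):
    """Squared Pearson correlation of (x, y) pairs; a stand-in statistic."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    syy = sum((y - my) ** 2 for _, y in rows)
    return 0.0 if sxx == 0 or syy == 0 else sxy ** 2 / (sxx * syy)


def case_bootstrap_ci(data_by_group, statistic, n_boot=500, alpha=0.05, seed=1):
    """Percentile interval obtained by resampling whole groups with replacement."""
    rng = random.Random(seed)
    group_ids = list(data_by_group)
    estimates = []
    for _ in range(n_boot):
        # Draw as many groups as in the original sample, with replacement.
        sampled = rng.choices(group_ids, k=len(group_ids))
        rows = [row for g in sampled for row in data_by_group[g]]
        estimates.append(statistic(rows))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated example data (hypothetical): 20 groups of 10 individuals,
# with a shared group effect u added to every outcome in the group.
sim = random.Random(0)
data = {}
for g in range(20):
    u = sim.gauss(0, 0.5)
    data[g] = []
    for _ in range(10):
        x = sim.gauss(0, 1)
        data[g].append((x, 0.5 * x + u + sim.gauss(0, 1)))

lower, upper = case_bootstrap_ci(data, r_squared)
print(f"95% percentile interval: [{lower:.3f}, {upper:.3f}]")
```

A residual bootstrap would instead resample estimated residuals at each level while holding the design fixed; the sketch above does not address which of the two approaches is preferable.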
Conclusion
In sum, our results suggest that researchers have several viable options for assessing the amount of explained variance in multilevel models. For the most part, the choice of measure can be a matter of preference, although several issues need to be considered in choosing a measure. Specifically, researchers need to consider the presence of random slopes and the possibility of different between- and within-group effects for a variable, as these will affect how each measure can be used.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
