An Expanded Decision-Making Procedure for Examining Cross-Level Interaction Effects With Multilevel Modeling

Abstract

Cross-level interaction effects lie at the heart of multilevel contingency and interactionism theories. Also, practitioners are particularly interested in such effects because they provide information on the contextual conditions and processes under which interventions focused on individuals (e.g., selection, leadership training, performance appraisal, and management) result in more or less positive outcomes. We derive a new intraclass correlation, ρ_β, to assess the degree of lower-level outcome variance that is attributed to higher-level differences in slope coefficients. We provide analytical and empirical evidence that ρ_β is an index of variance that differs from the traditional intraclass correlation ρ_α and use data from recently published articles to illustrate that ρ_α assesses differences across collectives and higher-level processes (e.g., teams, leadership styles, reward systems) but ignores the variance attributed to differences in lower-level relationships (e.g., individual-level job satisfaction and individual-level performance). Because ρ_α and ρ_β provide information on two different sources of variability in the data structure (i.e., differences in means and differences in relationships, respectively), our results suggest that researchers contemplating the use of multilevel modeling, as well as those who suspect nonindependence in their data structure, should expand the decision criteria for using multilevel approaches to include both types of intraclass correlations. To facilitate this process, we offer an illustrative data set and the iccbeta R package for computing ρ_β in single- and multiple-predictor situations and make them available through the Comprehensive R Archive Network (i.e., CRAN).

Keywords

cross-level analysis, interactions, measurement models, quantitative multilevel research

Researchers in organizational behavior, human resource management, entrepreneurship, strategy, and many other fields now explicitly recognize that lower-level entities are usually nested within higher-level collectives. For example, employees are nested within jobs (e.g., Taylor, Li, Shi, & Borman, 2008) and teams (e.g., Kim, Bhave, & Glomb, 2013), establishments within companies (e.g., Takeuchi, Chen, & Lepak, 2009), and firms within industries (e.g., Short, Ketchen, Bennett, & du Toit, 2006). Similarly, a nested data structure exists in studies involving longitudinal or repeated measures designs in which the lower level refers to observations and the higher level to the units (e.g., entrepreneurs, teams, firms) about which data have been collected over time (e.g., Uy, Foo, & Aguinis, 2010).

Covariation between higher-level variables and lower-level outcomes leads to errors of prediction if a researcher uses statistical approaches such as ordinary least squares (OLS) regression, which are not designed to model data structures that include dependence due to clustering (Aguinis, Gottfredson, & Culpepper, 2013; Heck, Thomas, & Tabata, 2010; Hox, 2010; Raudenbush & Bryk, 2002; Snijders & Bosker, 2012). In other words, dependence is “not adequately represented by the probability model of multiple linear regression analysis” (Snijders & Bosker, 2012, p. 3) and “the effect is generally not negligible” (Hox, 2010, p. 5).

Multilevel modeling, also referred to as hierarchical linear modeling (HLM) (Raudenbush & Bryk, 2002), mixed-effect models (Cao & Ramsay, 2010), random coefficient modeling (Longford, 1993), and covariance components models (e.g., Searle, Casella, & McCulloch, 1992), allows researchers to explicitly incorporate and model bias in standard errors and statistical tests resulting from the dependence of observations that occurs in nested data structures (Kenny, Korchmaros, & Bolger, 2003). Moreover, multilevel modeling allows researchers to assess three types of relationships (Mathieu, Aguinis, Culpepper, & Chen, 2012). First, it allows for tests of lower-level direct effects: whether a lower-level predictor X (i.e., Level 1 or L1 predictor) has an effect on a lower-level outcome variable Y (i.e., L1 outcome). For example, there may be an interest in assessing whether individual job satisfaction predicts individual job performance. Second, it allows for tests of cross-level direct effects: whether a higher-level predictor W (i.e., Level 2 or L2 predictor) is related to a L1 outcome variable Y. For example, a researcher may want to test whether team cohesion (an L2 variable) predicts individual job performance (an L1 outcome). Third, it allows for tests of cross-level interaction effects: whether the nature and/or strength of the relationship between two lower-level variables (e.g., L1 predictor X and L1 outcome Y) change as a function of a higher-level variable W. For example, a researcher may be interested in testing the hypothesis that the relationship between individual job satisfaction and individual performance may vary as a function of (i.e., is moderated by) the degree of team cohesion such that the relationship will be stronger for highly cohesive compared to less cohesive teams.

One of the three types of effects mentioned previously, cross-level interactions, is at the heart of modern-day contingency theories, person-environment fit models, and any theory that considers outcomes to be a result of combined influences emanating from different levels of analysis (Mathieu et al., 2012). In addition to their specific role in those theoretical models, cross-level interaction effects are important in general because they are indicative of the presence of moderator variables. Specifically, the extent to which we understand the presence of cross-level interactions is an indication of theoretical progress because such relationships inform us of the conditions under which relationships change in nature, strength, or both. Cross-level interaction effects are particularly useful for practice because they provide information on the situations when a given intervention may result in more or less positive outcomes. For example, practitioners are particularly interested in knowing whether pre-employment tests, leadership training and development programs, performance management and appraisal processes, and compensation systems are equally effective in terms of improving individual performance across different types of jobs and occupations, units of a firm (e.g., branches of a bank), and geographic locations (e.g., subsidiaries in different countries).

There is a fundamental question that all substantive researchers face prior to embarking on the search for cross-level interaction effects. Moreover, this fundamental question has remained unchanged since the very inception of multilevel modeling (e.g., Burstein, Linn, & Capell, 1978; Robinson, 1950) and, simply put, is: What is the degree of variability of a lower-level relationship across higher-order units? This has been and continues to be a critical question because its answer will dictate whether one should proceed with a formal test of cross-level moderator hypotheses. Stated differently, variability in the relationship between two variables across higher-level units is a precondition for the presence of moderator variables that could possibly account for this variability.

The goal of our article is to offer an expanded and more comprehensive approach to answering the question of whether there is sufficient variability in a lower-level relationship across higher-level units to warrant the search for cross-level interaction effects. The remainder of our article is organized as follows. First, we describe how researchers typically assess variability across higher-level collectives or contexts and clarify that this usual procedure is not informative regarding the possible presence of cross-level interaction effects. Second, we offer a general variance decomposition of L2 variability in lower-level scores. This section includes a description of the multilevel model, the typical procedure for assessing the presence of variability based on the intraclass correlation (ICC) ρ_α, and the derivation of a new index of variability in lower-level relationships across higher levels of analysis, which we label intraclass correlation ρ_β. Third, we describe a Monte Carlo study complementing analytical material in the previous section to provide evidence that ρ_α and ρ_β are indexes of orthogonal sources of variance. Fourth, we use data from recently published articles to illustrate the need for our recommended expanded procedure that includes ρ_β—and also demonstrate how decisions regarding the use of multilevel modeling improve as a consequence. Fifth, we compare and show the superiority of our newly proposed ρ_β to other indicators of variability that, although available in the statistical and methodological literature, are not usually implemented by organizational science researchers. For example, we describe that these indicators rely on significance testing procedures that require a large number of L2 units that is not frequently observed in management and organizational studies research (Mathieu et al., 2012). 
Sixth, we offer an illustrative data set and the R function icc_beta for computing ρ_β in single- and multiple-predictor situations and also make this package available through the Comprehensive R Archive Network (CRAN; http://cran.us.r-project.org). Finally, we close with recommendations regarding the expanded decision-making procedure for examining cross-level interaction effects and the possible presence of nonindependence in future empirical research, even if the particular research design and hypotheses do not include multilevel considerations explicitly.

Assessing Cross-Level Dependence and Variability

As is the case in all empirical research, theory considerations dictate the appropriateness of a particular data-analytic approach. Specifically regarding the possibility of using multilevel modeling in general and testing hypotheses about cross-level interactions in particular, there may be theory-based considerations that lead a researcher to suspect that dependence may be present in the data (i.e., variability based on a higher-level context or process). Moreover, as noted by Kenny and Judd (1996), “observations may be dependent, for instance, because they share some common feature, come from some common source, are affected by social interaction, or are arranged spatially or sequentially in time” (p. 138). Stated differently, the resulting data structure may include dependence of observations due to shared experiences even if there is no formal hierarchical structure such as individuals formally belonging to different teams.

Given theory-based considerations, there is a need to assess empirically the extent to which these shared experiences and context and, more generally, the clustering of entities within collectives have actually led to dependence. To do so, the consistent recommendation in the multilevel modeling literature is to assess the degree of dependence by computing the intraclass correlation ρ_α, which assesses the proportion of between-group variance relative to total variance in an outcome variable and can be interpreted as the correlation between two randomly selected members of the same group. This same recommendation is offered in many of the most influential and established textbooks addressing multilevel modeling (e.g., Heck et al., 2010; Hox, 2010; Raudenbush & Bryk, 2002; Snijders & Bosker, 2012). As summarized by Heck et al. (2010),

The first step in a multilevel analysis is partitioning the variance in an outcome variable into its within- and between-group components. If it turns out that there is little or no variation (perhaps less than 5%) in outcomes between groups, there would be no compelling need for conducting a multilevel analysis. (p. 6)

Not surprisingly, given this consistent recommendation in the methodological literature, substantive researchers compute and report results regarding ρ_α as evidence regarding the presence or absence of dependence and for justifying using multilevel modeling (or not) and subsequently testing cross-level interaction hypotheses. This is a pervasive and common practice that is reported in virtually all articles addressing multilevel issues (e.g., Halbesleben, Wheeler, & Paustian-Underdahl, 2013; Hu & Liden, 2013; Hülsheger, Alberts, Feinholdt, & Lang, 2013). Although some methodological sources have raised concerns about the sole reliance on ρ_α (e.g., Snijders & Bosker, 2012), and as we describe later in our article, some indicators of sources of L2 variance exist, using ρ_α is a well-established procedure for determining the possible presence of dependence and deciding whether the data structure requires the use of multilevel modeling. Next, we offer a general variance decomposition of L2 variability, which leads to the derivation of a new intraclass correlation ρ_β and analytical and empirical (i.e., Monte Carlo) evidence that this new index accounts for variance that is orthogonal to the variance assessed by ρ_α, which refers to differences across collectives and higher-level processes but ignores the variance attributed to differences in lower-level relationships.

General Variance Decomposition of Level 2 Variability in $y_{i j}$ Scores

Multilevel Model With a Single Predictor

The relationship between a predictor and a criterion at the lower level of a multilevel study is (Enders & Tofighi, 2007; Raudenbush & Bryk, 2002):

y_{i j} = β_{0 j} + β_{1 j} x_{i j} + r_{i j},

where $y_{i j}$ is the criterion score for the $i$th person in group j, $β_{0 j}$ is the intercept value for group j, $β_{1 j}$ is the slope for group j, $x_{i j}$ is the predictor score for the $i$th person in group j, and $r_{i j}$ is the L1 residual term such that $r_{i j} ~ N (0, σ^{2})$. The single predictor case involves estimating the following L2 intercept-only models (i.e., models that exclude L2 predictors) (Enders & Tofighi, 2007; Hofmann & Gavin, 1998; Hox, 2010):

β_{0 j} = γ_{00} + u_{0 j},

where $γ_{00}$ is the grand-mean intercept and the residual term $u_{0 j}$ describes how the intercept for group j deviates from it, and

β_{1 j} = γ_{10} + u_{1 j},

where $γ_{10}$ is the grand-mean slope and the residual term $u_{1 j}$ describes how the slope for group j deviates from it.

The usual assumption is that regression coefficients are distributed jointly as random normal variables (Hox, 2010; Raudenbush & Bryk, 2002),

[\begin{matrix} β_{0 j} \\ β_{1 j} \end{matrix}] ~ N_{2} ([\begin{matrix} γ_{00} \\ γ_{10} \end{matrix}], [\begin{matrix} τ_{00} & τ_{01} \\ τ_{01} & τ_{11} \end{matrix}]) .

That is, $τ_{00}$ and $τ_{11}$ are the variances of $β_{0 j}$ and $β_{1 j}$ , respectively; $τ_{01}$ is the covariance between $β_{0 j}$ and $β_{1 j}$ ; and $u_{0 j}$ and $u_{1 j}$ are residuals, or random effects, that capture group differences as mentioned previously. The average $y_{i j}$ across L2 units is $γ_{00}$ . Additionally, as mentioned previously, $γ_{10}$ is the grand-mean slope of $y_{i j}$ on $x_{i j}$ across L2 units. Equations 2 and 3 do not include L2 predictors to be able to quantify the total variance attributed to group differences in intercepts (Equation 2) and slopes (Equation 3). Thus, the L2 Equations 2 and 3 can be substituted into the L1 Equation 1 to yield the mixed-model version of the multilevel linear model:

y_{i j} = γ_{00} + γ_{10} x_{i j} + u_{0 j} + u_{1 j} x_{i j} + r_{i j} .
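As a minimal illustrative sketch (not part of the original analyses), the substitution of the L2 equations into the L1 equation can be checked numerically. The Python snippet below draws one group's scores from the single-predictor model and builds each score twice, once from the two-level form and once from the mixed-model form. The function name, the parameter values, and the simplifying assumption of uncorrelated random effects ($τ_{01} = 0$) are ours and are chosen only for illustration.

```python
import random


def simulate_group(n_j, gamma00, gamma10, tau00, tau11, sigma2, rng):
    """Draw one group's outcome scores from the single-predictor model.

    Assumption for this sketch: the random effects are uncorrelated (tau01 = 0).
    Returns (two_level, composite): the same scores built from the two-level
    equations and from the mixed-model equation.
    """
    u0 = rng.gauss(0.0, tau00 ** 0.5)  # intercept random effect u_0j
    u1 = rng.gauss(0.0, tau11 ** 0.5)  # slope random effect u_1j
    beta0 = gamma00 + u0               # L2 equation for the intercept
    beta1 = gamma10 + u1               # L2 equation for the slope
    two_level, composite = [], []
    for _ in range(n_j):
        x = rng.gauss(0.0, 1.0)            # L1 predictor score
        r = rng.gauss(0.0, sigma2 ** 0.5)  # L1 residual
        two_level.append(beta0 + beta1 * x + r)                    # L1 equation
        composite.append(gamma00 + gamma10 * x + u0 + u1 * x + r)  # mixed-model form
    return two_level, composite


# Hypothetical parameter values chosen only for illustration.
rng = random.Random(1)
two_level, composite = simulate_group(5, 0.0, 0.5, 0.5, 0.5, 1.0, rng)
max_gap = max(abs(a - b) for a, b in zip(two_level, composite))
print(max_gap < 1e-12)  # the two forms agree up to floating-point rounding
```

The two forms differ only in the order of the additions, so any discrepancy is due to floating-point rounding.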

General Multilevel Model

In matrix notation, the multilevel model with more than one predictor is the following (Hox, 2010; Raudenbush & Bryk, 2002):

y_{i j} = x_{i j}^{'} β_{j} + r_{i j},

where $x_{i j}^{'} = (x_{i j 0}, x_{i j 1}, \dots, x_{i j p})$ is a $p + 1$ dimensional row vector of predictors for person i in group j, $x_{i j 0} = 1$ for the intercept, $β_{j}$ is a $p + 1$ dimensional vector of coefficients (the first element is the intercept and the remaining are slopes), and $r_{i j}$ is an error term. Let ${\overset{ˉ}{x}}^{'} = ({\overset{ˉ}{x}}_{0}, {\overset{ˉ}{x}}_{1}, \dots, {\overset{ˉ}{x}}_{p})$ be a vector of grand means and ${\overset{ˉ}{x}}_{j}^{'} = ({\overset{ˉ}{x}}_{j 0}, {\overset{ˉ}{x}}_{j 1}, \dots, {\overset{ˉ}{x}}_{j p})$ a vector of predictor averages for group j, $N = \sum_{j = 1}^{J} N_{j}$ where $N_{j}$ is the sample size in group j (j = 1,…, J), and the overall sample mean is $\overset{ˉ}{y} = \sum_{j = 1}^{J} N_{j} {\overset{ˉ}{y}}_{j} / N$ . Also, let $X_{j}^{'} = (x_{1 j}, \dots, x_{N_{j} j})$ be a $(p + 1) \times N_{j}$ matrix of predictors for group j and $X^{'} = (X_{1}^{'}, \dots, X_{J}^{'})$ a $(p + 1) \times N$ matrix of predictors across the J groups.

Expanding on Equation 6, the typical multilevel model assumes that

r_{i j} ~ N (0, σ^{2}),

y_{i j} | x_{i j}, β_{j}, σ^{2} ~ N (x_{i j}^{'} β_{j}, σ^{2}),

β_{j} ~ N_{p + 1} (γ, T),

where $σ^{2}$ is the error variance conditioned on $x_{i j}^{'} β_{j}$, $γ$ is a vector of fixed effects, and T is the variance-covariance matrix of the regression coefficients. Note that the deviation of a score from the overall outcome variable mean can be decomposed as

y_{i j} - \overset{ˉ}{y} = (y_{i j} - {\overset{ˉ}{y}}_{j}) + ({\overset{ˉ}{y}}_{j} - \overset{ˉ}{y}),

where $y_{i j} - {\overset{ˉ}{y}}_{j}$ captures within-group variability whereas ${\overset{ˉ}{y}}_{j} - \overset{ˉ}{y}$ captures between-group variability. Accordingly, each source of variance is independent of the other.

In the next section, we derive ρ_β using the general variance decomposition of $y_{i j}$ in Equation 10. Before we do so, however, we note that rescaling predictors is common in multilevel modeling to help in the interpretation of results (Aguinis et al., 2013; Dalal & Zickar, 2012; Enders & Tofighi, 2007). The two main rescaling approaches are group-mean centering and grand-mean centering, but other options include a hybrid approach (i.e., group-mean centering for L1 predictors and using group-level means for L2 predictors) as well as no rescaling at all. In the context of offering best-practice recommendations for assessing cross-level interaction effects, Aguinis et al. (2013) suggested using group-mean centering in most cases because the use of grand-mean centering conflates the between-L2 and within-L2 effects. Moreover, Aguinis et al. (2013), Enders and Tofighi (2007), and Hofmann and Gavin (1998) concluded that if a researcher uses grand-mean centering for the L1 predictor, it is not possible to make an accurate, or even meaningful, interpretation of the cross-level interaction effect. However, because there are several rescaling approaches available, the model in Equation 6 is general and can accommodate any centering strategy by, for example, defining the vector of predictors as $x_{i j} - \overset{ˉ}{x}$ for grand-mean centering or $x_{i j} - {\overset{ˉ}{x}}_{j}$ for group-mean centering. Moreover, as we describe later in our article, the computation of ρ_β remains the same regardless of the particular rescaling strategy.

Analytical Evidence of Differences Between ρ_α and ρ_β

As shown in the Appendix, the sample variance $S^{2}$ can be partitioned as follows:

S^{2} = \frac{1}{N - 1} \sum_{j = 1}^{J} \sum_{i = 1}^{N_{j}} {(y_{i j} - \overset{ˉ}{y})}^{2} = S_{W}^{2} + S_{B}^{2},

where the between-group variance is $S_{B}^{2} = \frac{1}{N - 1} \sum_{j = 1}^{J} N_{j} {({\overset{ˉ}{y}}_{j} - \overset{ˉ}{y})}^{2}$ and the within-group variance is $S_{W}^{2} = \frac{1}{N - 1} \sum_{j = 1}^{J} \sum_{i = 1}^{N_{j}} {(y_{i j} - {\overset{ˉ}{y}}_{j})}^{2}$ . The traditional intraclass correlation ρ_α is defined as the portion of variance in the criterion (i.e., outcome variable) attributed to grouping or nesting (i.e., L2 variability), which is the relative size of $S_{B}^{2}$ to the total variance,

ρ_{α} = \frac{S_{B}^{2}}{S_{B}^{2} + S_{W}^{2}} .
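The partition of the sample variance into within- and between-group components, and the resulting descriptive ratio for ρ_α, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy scores are hypothetical, the function name is ours, and the ratio shown is the sample quantity $S_{B}^{2} / (S_{B}^{2} + S_{W}^{2})$ rather than a model-based estimate.

```python
def variance_partition(groups):
    """Partition the sample variance of y into within- and between-group parts.

    `groups` is a list of lists of outcome scores, one inner list per group.
    Returns (s2_total, s2_within, s2_between, rho_alpha), where every variance
    uses the N - 1 denominator.
    """
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_total = ss_within = ss_between = 0.0
    for g in groups:
        mean_j = sum(g) / len(g)
        ss_total += sum((y - grand_mean) ** 2 for y in g)    # deviations from grand mean
        ss_within += sum((y - mean_j) ** 2 for y in g)       # deviations from group mean
        ss_between += len(g) * (mean_j - grand_mean) ** 2    # group mean vs. grand mean
    s2_total = ss_total / (n_total - 1)
    s2_within = ss_within / (n_total - 1)
    s2_between = ss_between / (n_total - 1)
    return s2_total, s2_within, s2_between, s2_between / (s2_between + s2_within)


# Hypothetical toy data: two groups of three scores each.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
s2, s2_w, s2_b, rho_alpha = variance_partition(groups)
print(s2, s2_w, s2_b)       # 3.5 0.8 2.7, so S^2 = S_W^2 + S_B^2
print(round(rho_alpha, 4))  # 0.7714
```

In this toy example most of the variance lies between groups because the two group means (2 and 5) are far apart relative to the spread within each group.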

To compute ρ_α, we first estimate the following random-intercepts model, which is equivalent to a one-way random effects analysis of variance (Aguinis et al., 2013; Hox, 2010; Snijders & Bosker, 2012):

y_{i j} = γ_{00} + u_{0 j} + r_{0 i j},

where, as noted earlier, $γ_{00}$ is the grand mean of $y_{i j}$ , $u_{0 j}$ is the corresponding random effect, and $r_{0 i j}$ is the L1 error term. Based on the model in Equation 13, ρ_α is computed as follows (Raudenbush & Bryk, 2002):

ρ_{α} = \frac{τ_{00}}{σ_{Y}^{2}},

where $τ_{00}$ is the variance of $u_{0 j}$ and $σ_{Y}^{2}$ is the variance of $y_{i j}$ .

Equation 10 shows that ρ_α is orthogonal to within-group deviations from the mean, $y_{i j} - {\overset{ˉ}{y}}_{j}$ , and Equation 12 demonstrates that ρ_α measures the share of criterion variance attributed to between-group differences in criterion means. In fact, derivations in the Appendix demonstrate that the expected value of $S_{B}^{2}$ is a function of both between-group intercept and slope differences as follows:

E [S_{B}^{2} | X] = \frac{1}{N - 1} \sum_{j = 1}^{J} N_{j} \mathrm{tr} \{[(1 - \frac{N_{j}}{N}) T + γ γ^{'}] {\overset{ˉ}{x}}_{j} {\overset{ˉ}{x}}_{j}^{'}\} - \frac{N}{N - 1} \mathrm{tr} [γ γ^{'} \overset{ˉ}{x} {\overset{ˉ}{x}}^{'}] + \frac{J - 1}{N - 1} σ^{2},

where tr indicates a matrix trace (i.e., the sum of the elements along the diagonal).

In short, ρ_α is a function of between-group variability in intercepts and slopes as quantified by $T$. However, the extent to which slope differences affect variability in ${\overset{ˉ}{y}}_{j} - \overset{ˉ}{y}$ is a function of group predictor mean differences. For simplicity of exposition, consider the case where groups are equivalent on the predictors, so that ${\overset{ˉ}{x}}_{j} = \overset{ˉ}{x}$ for all j, and the last p elements of $\overset{ˉ}{x}$ are zero (e.g., because the predictors are grand-mean centered). In this case, only the first element of $\overset{ˉ}{x}$ is nonzero and $E [S_{B}^{2} | X]$ simplifies considerably. If, in addition, $N_{j} = N_{*}$ for all j, then $E [S_{B}^{2} | X]$ reduces to

\frac{N - N_{*}}{N - 1} T_{00} + \frac{J - 1}{N - 1} σ^{2},

where $T_{00} = \mathrm{Var} (β_{0 j} | X)$ and $N = J N_{*}$. Because this expression contains no slope variances, researchers can expect ρ_α to include variance attributed to group slope differences only in cases where groups vary in average predictor values.
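The reduction to this special case can be traced step by step. The LaTeX sketch below is our own restatement of the algebra, assuming ${\overset{ˉ}{x}}_{j} = \overset{ˉ}{x} = (1, 0, \dots, 0)^{'}$ and $N_{j} = N_{*}$ for all j; it writes the mean vectors with `\bar` for readability and uses $γ_{00}$ for the first element of $γ$.

```latex
% Assumptions: \bar{x}_j = \bar{x} = (1, 0, \dots, 0)' and N_j = N_* for all j, so N = J N_*.
\begin{align*}
\mathrm{tr}\{[(1 - \tfrac{N_*}{N}) T + \gamma\gamma'] \bar{x}\bar{x}'\}
  &= (1 - \tfrac{N_*}{N}) T_{00} + \gamma_{00}^{2}, \\
\frac{1}{N - 1} \sum_{j = 1}^{J} N_* [(1 - \tfrac{N_*}{N}) T_{00} + \gamma_{00}^{2}]
  &= \frac{N - N_*}{N - 1} T_{00} + \frac{N}{N - 1} \gamma_{00}^{2}, \\
\frac{N}{N - 1} \mathrm{tr}[\gamma\gamma' \bar{x}\bar{x}']
  &= \frac{N}{N - 1} \gamma_{00}^{2}, \\
E[S_{B}^{2} \mid X]
  &= \frac{N - N_*}{N - 1} T_{00} + \frac{J - 1}{N - 1} \sigma^{2} .
\end{align*}
```

The fixed-effect terms in the second and third lines cancel, which is why only the intercept variance and the error variance remain in the last line.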

The following derivations show that a portion of slope variability across groups also contributes to variability in $y_{i j} - {\overset{ˉ}{y}}_{j}$, which implies that ρ_α accounts for only a portion of variability attributed to slope differences. This finding is important because variability of $y_{i j} - {\overset{ˉ}{y}}_{j}$ is orthogonal to the traditional between-group variability as indexed by ${\overset{ˉ}{y}}_{j} - \overset{ˉ}{y}$.

Consider $S_{W}^{2}$ and note that the average criterion score for group j is,

{\overset{ˉ}{y}}_{j} = \frac{1}{N_{j}} \sum_{i = 1}^{N_{j}} (x_{i j}^{'} β_{j} + r_{i j}) = {\overset{ˉ}{x}}_{j}^{'} β_{j} + {\overset{ˉ}{r}}_{j},

where ${\overset{ˉ}{r}}_{j} = \frac{1}{N_{j}} \sum_{i = 1}^{N_{j}} r_{i j}$ . Consequently, the within-group deviation is,

y_{i j} - {\overset{ˉ}{y}}_{j} = x_{i j}^{'} β_{j} + r_{i j} - {\overset{ˉ}{x}}_{j}^{'} β_{j} - {\overset{ˉ}{r}}_{j} = {(x_{i j} - {\overset{ˉ}{x}}_{j})}^{'} β_{j} + r_{i j} - {\overset{ˉ}{r}}_{j} .

There are several important observations regarding Equation 18. First, partitioning variance in $y_{i j} - {\overset{ˉ}{y}}_{j}$ naturally leads to group-mean centering predictors because the within-group average on the outcome, ${\overset{ˉ}{y}}_{j}$ , is a function of ${\overset{ˉ}{x}}_{j}$ . This implication is consistent with the consensual recommendation mentioned earlier that group-mean centering be the preferred rescaling strategy within the specific context of assessing cross-level interaction effects and quantifying outcome variance attributed to slope heterogeneity. Second, in Equation 18 the first element of $x_{i j} - {\overset{ˉ}{x}}_{j}$ is zero because by definition $x_{i j 0} = {\overset{ˉ}{x}}_{j 0} = 1$ . Consequently, group j’s intercept does not influence $y_{i j} - {\overset{ˉ}{y}}_{j}$ . This result reinforces the finding in Equation 10 that the sources of variance captured by ρ_α and ρ_β are indeed orthogonal.

In order to find an estimator for ρ_β, we first must identify the expected value of the within-group variance, $S_{W}^{2}$ , given a matrix $X$ of predictors. In fact, the Appendix includes derivations that imply that $E [S_{W}^{2} | X]$ is defined as

E [S_{W}^{2} | X] = \mathrm{tr} [(T + γ γ^{'}) \frac{X_{c}^{'} X_{c}}{N - 1}] + \frac{N - J}{N - 1} σ^{2},

where, as shown in Equation 9, $T$ is the variance-covariance matrix of the random effects and $γ$ is a vector of fixed effects. Furthermore, $X_{c}^{'} X_{c} = \sum_{j = 1}^{J} \sum_{i = 1}^{N_{j}} (x_{i j} - {\overset{ˉ}{x}}_{j}) {(x_{i j} - {\overset{ˉ}{x}}_{j})}^{'}$, which by definition implies that $X_{c}$ is an $N \times (p + 1)$ matrix of group-mean centered predictors for all groups and individuals. Note that the first column of $X_{c}$ includes zeros for the intercept and $X_{c}^{'} X_{c} / (N - 1)$ represents the average within-group relationship among the predictors over groups. Equation 19 shows that $E [S_{W}^{2} | X]$ is a function of the random effects variance-covariance matrix, $T$, the relationships among group-mean centered predictors, $X_{c}^{'} X_{c} / (N - 1)$, the fixed effects, $γ$, and the within-group error variance, $σ^{2}$. Accordingly, the portion of $E [S_{W}^{2} | X]$ that is attributed to group slope differences is $\mathrm{tr} [T \frac{X_{c}^{'} X_{c}}{N - 1}]$ and the intraclass correlation that quantifies the share of within-group outcome variance attributed to slope differences is

ρ_{β} = \mathrm{tr} [T \frac{X_{c}^{'} X_{c}}{N - 1}] S^{- 2} .

One observation from Equation 20 is that the first row and column of $X_{c}^{'} X_{c}$ contain only zeros, which has the effect of removing the intercept variance and the covariances between intercepts and slopes when computing $\mathrm{tr} [T \frac{X_{c}^{'} X_{c}}{N - 1}]$. Stated differently, covariances between group intercepts and slopes do not contribute to variance quantified by ρ_β.
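To make the computation in Equation 20 concrete, the following Python sketch evaluates the trace expression for a small hypothetical data set. It is an illustration under stated assumptions and not the implementation in the R package mentioned in this article: the function name `rho_beta`, the toy data, and the values in `T` are ours, and `T` is simply taken as given rather than estimated from a fitted multilevel model.

```python
def rho_beta(groups, T):
    """Evaluate tr[T X_c'X_c / (N - 1)] / S^2 for a given matrix T.

    `groups` is a list of groups; each group is a list of (x, y) pairs, where
    x is a tuple of predictor scores (intercept excluded). `T` is a
    (p + 1) x (p + 1) nested list with the intercept in the first row/column.
    Returns (rho_beta_value, xcx), where xcx is the matrix X_c'X_c.
    """
    p1 = len(T)
    n_total = sum(len(g) for g in groups)
    ys = [y for g in groups for _, y in g]
    y_bar = sum(ys) / n_total
    s2 = sum((y - y_bar) ** 2 for y in ys) / (n_total - 1)  # total outcome variance
    xcx = [[0.0] * p1 for _ in range(p1)]
    for g in groups:
        rows = [[1.0] + list(x) for x, _ in g]               # prepend the intercept column
        means = [sum(col) / len(g) for col in zip(*rows)]    # group means per column
        for row in rows:
            c = [v - m for v, m in zip(row, means)]          # group-mean centered row
            for a in range(p1):
                for b in range(p1):
                    xcx[a][b] += c[a] * c[b]
    trace = sum(T[a][b] * xcx[b][a] for a in range(p1) for b in range(p1))
    return trace / (n_total - 1) / s2, xcx


# Hypothetical toy data: two groups, one predictor.
groups = [
    [((-1.0,), 0.0), ((0.0,), 1.0), ((1.0,), 2.0)],
    [((-1.0,), 3.0), ((0.0,), 3.0), ((1.0,), 3.0)],
]
T = [[0.5, 0.1], [0.1, 0.25]]  # assumed values, not estimates
value, xcx = rho_beta(groups, T)
print(xcx)    # [[0.0, 0.0], [0.0, 4.0]]: first row and column are zero
print(value)  # approximately 0.125 = (0.25 * 4 / 5) / 1.6
```

Because the first row and column of `xcx` are zero, changing the intercept variance or the intercept-slope covariance in `T` leaves the result unchanged.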

Monte Carlo Empirical Evidence of Differences Between ρ_α and ρ_β

We conducted a Monte Carlo simulation as a follow-up to the analytical results. We implemented 1,000 replications with number of groups $J$ = 30 and group size $N_{j}$ = 30 for all groups for each condition and then calculated the empirically derived ρ_α and ρ_β values. We manipulated two factors that, based on the derivations in the previous section, affect ρ_α and ρ_β: (a) the variability of group intercepts and slopes as indicated by $T$ and (b) the extent to which groups differ in averages on the predictor variables. We generated the outcome variable as normal with unit variance when conditioned on the predictors and regression coefficients,

y_{i j} | x_{i j}, β_{j} ~ N (x_{i j}^{'} β_{j}, 1) .

In addition, we considered two scenarios for $x_{i j}$ and the between-group predictor means ${\overset{ˉ}{x}}_{j}$ . The first scenario reflected the circumstance where groups did not differ in the population on predictor averages,

x_{i j} | {\overset{ˉ}{x}}_{j} ~ N_{3} ({\overset{ˉ}{x}}_{j}, [\begin{matrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}]), {\overset{ˉ}{x}}_{j} ~ N_{3} ([\begin{matrix} 1 \\ 0 \\ 0 \end{matrix}], [\begin{matrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}]) .

That is, ${\overset{ˉ}{x}}_{j}$ was assumed to originate from a population with constant values in each group where the first element equals 1 to represent the intercept and, consequently, all predictor variance is attributed to within-group differences. The second scenario for $x_{i j}$ and ${\overset{ˉ}{x}}_{j}$ incorporates some between-group predictor variance. Namely, the distributions for $x_{i j}$ and ${\overset{ˉ}{x}}_{j}$ are

x_{i j} | {\overset{ˉ}{x}}_{j} ~ N_{3} ({\overset{ˉ}{x}}_{j}, \frac{4}{5} [\begin{matrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}]), {\overset{ˉ}{x}}_{j} ~ N_{3} ([\begin{matrix} 1 \\ 0 \\ 0 \end{matrix}], \frac{1}{5} [\begin{matrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}]),

which reflects a circumstance where 20% of the variability in $x_{i j}$ is due to between-group variability. For both scenarios, the distribution of $x_{i j}$ conditioned on ${\overset{ˉ}{x}}_{j}$ is trivariate normal.

Also, we examined the effect of different values of T on ρ_α and ρ_β across three situations in which we fixed the means of $β_{j}$ in the population to $γ = (0, 2^{- 1}, 2^{- 1})$ and specified T so that $β_{j}$ is distributed as follows:

β_{j} ~ N_{3} (\frac{1}{2} [\begin{matrix} 0 \\ 1 \\ 1 \end{matrix}], T) .

The three scenarios for T are as follows:

T = \frac{1}{2} [\begin{matrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}], T = \frac{1}{2} [\begin{matrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}], T = \frac{1}{2} I_{3},

where $I_{3}$ denotes a three-dimensional identity matrix. In other words, the first scenario for T represents the case of no variability in slopes, the second scenario includes between-group variance only in group slopes, and the third scenario considers variability in intercepts and slopes.
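A single replication of this design can be sketched as follows. The Python code is a simplified illustration and not the simulation code used for the reported results: it assumes a diagonal $T$ (as in the three scenarios above), plugs the population $T$ into the trace expression for ρ_β instead of estimating it from a fitted model, and reports the descriptive ratio $S_{B}^{2} / S^{2}$ rather than a model-based estimate of ρ_α. All function and argument names are ours.

```python
import random


def one_replication(T_diag, between_share, J=30, n_j=30,
                    gamma=(0.0, 0.5, 0.5), seed=0):
    """Generate one data set; each row is (x1, x2, y).

    T_diag: variances of (intercept, slope 1, slope 2); off-diagonals assumed 0.
    between_share: share of each predictor's unit variance due to group means.
    """
    rng = random.Random(seed)
    sd_u = [t ** 0.5 for t in T_diag]
    sd_within = (1.0 - between_share) ** 0.5
    sd_between = between_share ** 0.5
    data = []
    for _ in range(J):
        beta = [g + rng.gauss(0.0, s) for g, s in zip(gamma, sd_u)]  # beta_j ~ N(gamma, T)
        xbar = [rng.gauss(0.0, sd_between) for _ in range(2)]        # group predictor means
        group = []
        for _ in range(n_j):
            x = [m + rng.gauss(0.0, sd_within) for m in xbar]
            y = beta[0] + beta[1] * x[0] + beta[2] * x[1] + rng.gauss(0.0, 1.0)
            group.append((x[0], x[1], y))
        data.append(group)
    return data


def summarize(data, T_diag):
    """Return (S_B^2 / S^2, tr[T X_c'X_c / (N - 1)] / S^2) for a diagonal T."""
    n = sum(len(g) for g in data)
    ys = [row[2] for g in data for row in g]
    y_bar = sum(ys) / n
    s2 = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    ss_between = 0.0
    ss_x = [0.0, 0.0]  # diagonal of X_c'X_c for the two predictors
    for g in data:
        m = len(g)
        y_j = sum(row[2] for row in g) / m
        ss_between += m * (y_j - y_bar) ** 2
        for k in range(2):
            x_j = sum(row[k] for row in g) / m
            ss_x[k] += sum((row[k] - x_j) ** 2 for row in g)
    ratio_alpha = ss_between / (n - 1) / s2
    ratio_beta = sum(t * s for t, s in zip(T_diag[1:], ss_x)) / (n - 1) / s2
    return ratio_alpha, ratio_beta


# T scenario 2 (slope variance only) with no between-group predictor differences.
data = one_replication(T_diag=(0.0, 0.5, 0.5), between_share=0.0, seed=0)
ratio_alpha, ratio_beta = summarize(data, (0.0, 0.5, 0.5))
print(round(ratio_alpha, 3), round(ratio_beta, 3))
```

With slope variance only and equivalent groups, the second ratio should be large and the first small, although a single replication with 30 groups is noisy; averaging over many seeds gives a steadier picture.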

Table 1 reports values for ${\hat{ρ}}_{α}$ and ${\hat{ρ}}_{β}$, which are averages of Monte Carlo estimates of ρ_α and ρ_β from 1,000 replications for the six conditions that fully cross (a) two conditions for between-group predictor differences and (b) three conditions for intercept and slope differences. For the sake of completeness, Table 1 also reports results regarding the likelihood ratio test (LRT) of non-zero $τ_{11}$ as well as ${\overset{ˉ}{λ}}_{1}$ and ${\overset{ˉ}{λ}}_{2}$, the average least squares slope reliabilities for the two simulated continuous predictors (we describe results pertaining to LRT and reliabilities later in our article).

Table 1.

Monte Carlo Simulation Results: Average $ρ_{α}$ and $ρ_{β}$ for Different Scenarios Manipulating Between-Group Predictor Differences and Intercept and Slope Differences.

T Scenarios	$x_{i j}$ Scenario 1							$x_{i j}$ Scenario 2
T Scenarios	${\hat{ρ}}_{α}$	${\hat{ρ}}_{β}$	$ρ_{α}$	$ρ_{β}$	LRT	${\overset{ˉ}{λ}}_{1}$	${\overset{ˉ}{λ}}_{2}$	${\hat{ρ}}_{α}$	${\hat{ρ}}_{β}$	$ρ_{α}$	$ρ_{β}$	LRT	${\overset{ˉ}{λ}}_{1}$	${\overset{ˉ}{λ}}_{2}$
1	.238	.004	.242	.000	.012	.097	.096	.287	.004	.292	.000	.010	.098	.103
2	.003	.373	.000	.400	1.000	.922	.920	.114	.297	.117	.320	1.000	.905	.904
3	.160	.312	.161	.333	1.000	.921	.922	.254	.252	.259	.267	1.000	.904	.903

Note: ${\hat{ρ}}_{α}$ and ${\hat{ρ}}_{β}$ are empirically derived Monte Carlo estimates of the theoretical values $ρ_{α}$ and $ρ_{β}$ using 1,000 replications. $x_{i j}$ scenario 1 corresponds to no between-group differences in the predictors, and scenario 2 represents the case of 20% variance in predictors attributed to group membership. The T scenarios are as follows: scenario 1, only intercept differences; scenario 2, only slope differences; and scenario 3, intercept and slope differences. “LRT” denotes the proportion of replications in which the likelihood ratio test rejected the null hypothesis, with the observed deviance compared with a $χ_{5}^{2}$ critical value (there are 5 degrees of freedom: two slope variances, one covariance between random slopes, and two covariances between slopes and the random intercept). The full-model “LRT” denotes Type I error rates for T scenario 1 and statistical power for T scenarios 2 and 3. ${\overset{ˉ}{λ}}_{1}$ and ${\overset{ˉ}{λ}}_{2}$ are the average least squares slope reliabilities for the two simulated continuous predictors.
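The theoretical ρ_β values in Table 1 can be approximated with simple arithmetic. The sketch below is our own large-sample reading of Equation 20, not a calculation taken from the original analyses: it assumes that $X_{c}^{'} X_{c} / (N - 1)$ is replaced by the population within-group predictor covariance matrix, that $S^{2}$ is replaced by the population variance of $y_{i j}$, that each predictor has zero mean and unit total variance, and that $T$ is diagonal. The function name is hypothetical.

```python
def theoretical_rho_beta(T_diag, within_share, gamma=(0.0, 0.5, 0.5), sigma2=1.0):
    """Large-sample approximation to rho_beta for a diagonal T.

    T_diag: variances of (intercept, slope 1, slope 2).
    within_share: share of each predictor's unit variance that lies within groups.
    """
    slope_part = within_share * sum(T_diag[1:])      # tr[T * within-group predictor covariance]
    fixed_part = sum(g * g for g in gamma[1:])       # gamma' Cov(x) gamma with unit variances
    total_var = fixed_part + sum(T_diag) + sigma2    # population variance of y
    return slope_part / total_var


T_scenarios = {1: (0.5, 0.0, 0.0), 2: (0.0, 0.5, 0.5), 3: (0.5, 0.5, 0.5)}
x_scenarios = {1: 1.0, 2: 0.8}  # within-group share of predictor variance

results = {}
for x_id, share in x_scenarios.items():
    for t_id, T_diag in T_scenarios.items():
        results[(x_id, t_id)] = round(theoretical_rho_beta(T_diag, share), 3)
        print(x_id, t_id, results[(x_id, t_id)])
# 1 1 0.0
# 1 2 0.4
# 1 3 0.333
# 2 1 0.0
# 2 2 0.32
# 2 3 0.267
```

Under these assumptions the six values agree with the theoretical ρ_β column of Table 1 (.000, .400, .333 for $x_{i j}$ scenario 1 and .000, .320, .267 for scenario 2).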

For scenario 1 for T and scenario 1 for $x_{i j}$ , all between-group variance is attributed to variability in intercepts only and groups are equivalent on the predictors. Table 1 reports Monte Carlo values of ${\hat{ρ}}_{α}$ = 0.238 and ${\hat{ρ}}_{β}$ = 0.004 and their theoretical counterparts are quite similar: ρ_α = 0.242 and ρ_β = 0.000. Introducing group predictor differences in scenario 2 results in a larger ${\hat{ρ}}_{α}$ = 0.287, which corresponds closely with the theoretical value of ρ_α = 0.292.

The second T scenario includes group slope variance and no intercept variance with average Monte Carlo values of ${\hat{ρ}}_{α}$ = 0.003 and ${\hat{ρ}}_{β}$ = 0.373 for the $x_{i j}$ scenario 1. The value of ${\hat{ρ}}_{α}$ = 0.003 mirrors the analytic value of ρ_α = 0.000 and demonstrates that ρ_β accounts for variance that is orthogonal to the variance captured by ρ_α.

Results regarding scenario 2 for $x_{i j}$ demonstrate another contribution of our study, which is that ρ_α includes variance attributed to slopes only when groups differ in average predictor values. The second scenario for $x_{i j}$ represents a situation where 20% of predictor variability is attributed to group mean differences, and in this case ${\hat{ρ}}_{α}$ = 0.114 (with a theoretical value of ρ_α = 0.117), which reflects the variance in criterion means that is attributed to slope differences. Table 1 also shows a decrease in ρ_β in the case of group differences in the predictors with ${\hat{ρ}}_{β}$ = 0.297 and ρ_β = 0.320.

In the third and final T scenario, groups differ in both intercepts and slopes with average Monte Carlo values of ${\hat{ρ}}_{α}$ = 0.160 and ${\hat{ρ}}_{β}$ = 0.312 with corresponding theoretical values of ρ_α = 0.161 and ρ_β = 0.333, respectively. As expected, introducing group predictor heterogeneity in scenario 2 for $x_{i j}$ results in a larger ${\hat{ρ}}_{α}$ = 0.254 and a smaller ${\hat{ρ}}_{β}$ = 0.252. This third T scenario further demonstrates that ρ_α and ρ_β capture two different sources of criterion variance.

In short, Monte Carlo simulation results confirmed the analytical evidence presented earlier: ρ_α and ρ_β are orthogonal and capture two different sources of between-group criterion variance. In addition, ρ_α includes variance attributed to slopes only when groups differ in average predictor values.

Implications for Substantive Research: Different Conclusions Based on the Use of ρ_α Versus ρ_β

Mathieu et al. (2012, Table 1) reported summary statistics for a set of articles addressing cross-level interaction effects. Mathieu et al. reported data for Studies 1 and 2 (based on Chen, Kirkman, Kanfer, Allen, & Rosen, 2007), Study 3 (based on Hofmann, Morgeson, & Gerras, 2003), Studies 4 and 5 (based on Liao & Rupp, 2005), and Studies 6 and 7 (based on Mathieu, Ahearne, & Taylor, 2007). We used those data from Mathieu et al. to compare and contrast the magnitude of ρ_β to the traditional ICC ρ_α.

Table 2 includes ρ_β values, which we calculated for each study. Across the seven studies, ρ_β ranged from 0.000 to 0.111. Stated differently, group slope differences accounted for 0.0% to 11.1% of the variance in the outcome variables, and the average variance attributed to between-group differences in slopes was 3.7%.

Table 2.

Illustration of Differences Between Traditional (i.e., $ρ_{α}$ ) and Newly Derived Intraclass Correlation (i.e., $ρ_{β}$ ) as Indicators of Higher-Level Variability.

Study	Variables	$ρ_{α}$	$ρ_{x}$	$τ_{11}$	$ρ_{β}$
1: Chen, Kirkman, Kanfer, Allen, and Rosen (2007)	X: Leader-member exchange W: Empowering leadership Y: Individual empowerment	0.000	0.040	0.064	0.057
2: Chen et al. (2007)	X: Individual empowerment W: Team empowerment Y: Individual performance	0.176	0.000	0.002	0.002
3: Hofmann, Morgeson, and Gerras (2003)	X: Leader-member exchange W: Safety climate Y: Safety role definitions	0.000	0.409	0.210	0.111
4: Liao and Rupp (2005)	X: Justice orientation W: Org.-focus PJ climate Y: Satisfaction with organization	0.107	0.111	0.011	0.008
5: Liao and Rupp (2005)	X: Justice orientation W: Sup.-focus PJ climate Y: Supervisor commitment	0.310	0.111	0.021	0.015
6: Mathieu, Ahearne, and Taylor (2007)	X: Work experience W: Empowering leadership Y: Technology self-efficacy	0.000	0.022	0.000	0.000
7: Mathieu et al. (2007)	X: Technology use W: Empowering leadership Y: Individual performance	0.326	0.037	0.114	0.066
Average		0.131	0.104	0.060	0.037

Note: $ρ_{α}$ = traditional intraclass correlation, $ρ_{x}$ = intraclass correlation for $x_{i j}$ , $τ_{11}$ = variance of $β_{1 j}$ , and $ρ_{β}$ = intraclass correlation indexing the proportion of between-group variance in an outcome variable due to differences in slopes across groups. Data source: Mathieu, Aguinis, Culpepper, and Chen’s (2012) Table 1, who calculated $τ_{11}$ after transforming the predictor and criterion to have unit variance. X: L1 predictor, W: L2 moderator, Y: L1 outcome, Org.-focus PJ: organization-focused procedural justice, Sup.-focus PJ: supervisor-focused procedural justice.

Table 2 also shows that for 2 of the 7 studies, the variance attributed to slope differences was larger than the variance associated with differences in outcome variable means as assessed by the traditional ICC (i.e., $ρ_{β} > ρ_{α}$ ). The values of ρ_α and ρ_β have implications for how researchers proceed with data analysis and model selection. Specifically, those who base judgment about the need to use multilevel modeling solely on ρ_α are likely to mistakenly ignore important between-group slope variance. For instance, consider Study 1 in Table 2. For Study 1, ρ_α = 0.00 whereas ρ_β = 0.057, which implies that groups do not differ in averages on the outcome variable and 5.7% of criterion variable variance is attributed to group slope differences. Consequently, researchers who rely solely on ρ_α may incorrectly conclude that it is appropriate to use OLS regression as opposed to multilevel modeling. Moreover, this decision would also result in not estimating cross-level interaction effects.
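The comparison described above can be re-derived directly from the rows of Table 2. The following is a minimal Python sketch (not part of the R package described in this article) that recomputes the column averages and identifies the studies in which $ρ_{β} > ρ_{α}$ ; the variable names are ours and purely illustrative.

```python
# Values transcribed from Table 2 (Studies 1 through 7, in order).
rho_alpha = [0.000, 0.176, 0.000, 0.107, 0.310, 0.000, 0.326]
rho_beta = [0.057, 0.002, 0.111, 0.008, 0.015, 0.000, 0.066]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


mean_rho_alpha = mean(rho_alpha)
mean_rho_beta = mean(rho_beta)

# Study numbers (1-indexed) where slope-based variance exceeds
# the variance captured by the traditional ICC.
slope_dominant = [
    study
    for study, (a, b) in enumerate(zip(rho_alpha, rho_beta), start=1)
    if b > a
]

print(f"mean rho_alpha = {mean_rho_alpha:.3f}")
print(f"mean rho_beta  = {mean_rho_beta:.3f}")
print(f"studies with rho_beta > rho_alpha: {slope_dominant}")
```

Note that Study 6 is not counted because both indices equal zero there, so the inequality is not strict.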

An additional consideration regarding results in Table 2 concerns the magnitude of ρ_β. Specifically, many of the values may seem small, perhaps leading to the conclusion that the search for cross-level interaction effects may be futile. However, this conclusion is not warranted for the following reasons. First, LeBreton and Senter (2008) argued that ICC values of about .05 represent a small to medium effect and “values as small as .05 may provide prima facie evidence of a group effect” (p. 838). Reinforcing this recommendation, recent evidence demonstrates that the effect size guidelines reported by Cohen (1988) are overestimates of the types of effects usually reported in management and organizational studies research (Bosco, Aguinis, Singh, Field, & Pierce, in press). Second, ICC values should be considered within specific contexts because small observed effects may result from inauspicious designs, studies that involve phenomena leading to obscure consequences, and studies that challenge fundamental assumptions (Cortina & Landis, 2009). Third, there are numerous methodological and statistical artifacts that decrease observed effect sizes compared to their true population counterparts, and this is particularly true for interaction effects (Aguinis, 2004; Aguinis, Beaty, Boik, & Pierce, 2005; Aguinis & Stone-Romero, 1997). Accordingly, it is likely that in many cases, due to methodological and statistical artifacts, variability in slopes as indicated by ρ_β is actually larger in the population than has been estimated. Fourth, in some cases, small effect sizes may be practically significant (Aguinis et al., 2010). Thus, when the phenomenon of interest has important implications for theory or practice, even small values for ρ_β may be considered an indication of the need to assess particular cross-level interaction hypotheses.

Comparison of ρ_β With Existing Tests and Statistics

Although the recommendation in most textbooks on multilevel modeling is to rely on the traditional intraclass correlation ρ_α and management and organizational studies researchers seem to follow this recommendation, there are tests and indices described in the statistical and methodological literature that researchers might employ for understanding the nature of group slope differences (LaHuis, Hartman, Hakoyama, & Clark, 2014). However, ρ_β offers a unique value-added contribution as an index for how group variability in slopes translates to differences in the outcome variable $y_{i j}$ . Furthermore, ρ_β provides researchers insight regarding the amount of variability that exists in $y_{i j}$ due to group differences in slopes and can guide researchers in the theory-based search for cross-level moderators. As such, ρ_β can be used to produce preliminary estimates of group slope differences based on pilot data or exploratory analyses. Next, we compare and contrast ρ_β to tests based on model likelihood and statistics based on the variance of group slopes and reliability indices.

Likelihood Ratio Tests of Non-Zero $τ_{11}$

Researchers could employ LRTs to statistically evaluate whether group slopes differ. Specifically, the LRT statistic is defined as the difference between the –2 log-likelihood values (–2LL) of a reduced and a full model. In the current context, $-2LL_{null}$ represents a random intercepts model with nonrandom slopes whereas $-2LL_{full}$ allows group slopes to differ. LRTs use deviance as a test statistic, where deviance = $-2LL_{null} - (-2LL_{full}) = 2LL_{full} - 2LL_{null}$ , which is asymptotically distributed as a χ² random variable (Aguinis et al., 2013).

It is possible to test the significance of adding a random slope to an existing random intercept model with a chi-square distribution with 2 degrees of freedom (i.e., $χ_{1 - α, 2}^{2}$ ): one for the new random slope and a second for the covariance between group slopes and intercepts. However, using $χ_{1 - α, 2}^{2}$ to test for group slope differences is asymptotically too conservative. Specifically, Self and Liang (1987) and Stram and Lee (1994) offered theoretical results showing that the preferred option is a chi-square distribution that is a 50:50 mixture of $χ_{1 - α, 1}^{2}$ and $χ_{1 - α, 2}^{2}$ distributions. Additional research has proposed a Monte Carlo procedure for comparing a calculated –2LL to an appropriate null distribution (Crainiceanu & Ruppert, 2004). Theoretical results hold for large sample sizes, but these are only rarely observed in typical management and organizational studies research, as documented by Mathieu et al.’s (2012) review.
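To make the difference between the two reference distributions concrete, the following is a minimal Python sketch (standard library only) that computes the deviance and compares the naive $χ_{2}^{2}$ p value with the p value from the 50:50 mixture of $χ_{1}^{2}$ and $χ_{2}^{2}$ . The function names and the two –2LL values are hypothetical and serve only as an illustration; the closed-form tail probabilities used here hold specifically for 1 and 2 degrees of freedom.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 df."""
    return math.exp(-x / 2.0)


def lrt_random_slope(neg2ll_null, neg2ll_full):
    """Deviance plus naive chi-square(2) and 50:50 mixture p values.

    neg2ll_null: -2 log-likelihood of the random-intercepts model.
    neg2ll_full: -2 log-likelihood of the model adding one random slope.
    """
    # Clamp at zero in case of small numerical differences between fits.
    deviance = max(0.0, neg2ll_null - neg2ll_full)
    p_naive = chi2_sf_df2(deviance)
    p_mixture = 0.5 * chi2_sf_df1(deviance) + 0.5 * chi2_sf_df2(deviance)
    return deviance, p_naive, p_mixture


# Hypothetical -2LL values chosen only for illustration.
deviance, p_naive, p_mixture = lrt_random_slope(2856.4, 2851.0)
print(f"deviance = {deviance:.2f}")
print(f"p (chi-square, 2 df) = {p_naive:.4f}")
print(f"p (50:50 mixture)    = {p_mixture:.4f}")
```

With these illustrative inputs the naive test falls short of the .05 level whereas the mixture-based test does not, which is the sense in which the $χ_{2}^{2}$ reference distribution is conservative.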

An additional drawback is that the LRT does not involve a meaningful effect size to assess the extent to which group slopes differ. Namely, LRTs serve as an omnibus test for the presence or absence of group slope differences and only provide researchers with guidance as to whether group slopes might differ in the population. In contrast, ρ_β quantifies the amount of group differences in $y_{i j}$ that is attributed to group differences in slopes. Consequently, unlike the LRT, ρ_β can be used as an index of the effect size of group differences in slopes on $y_{i j}$ .

We illustrate some of the aforementioned limitations regarding the use of null hypothesis significance testing to assess the presence of variability of slopes with data made available by Hofmann, Griffin, and Gavin (2000). We refer to this particular illustration because their chapter and data set are used by many instructors teaching multilevel modeling. In particular, a reanalysis of the Hofmann et al. data suggests a statistically significant slope variance with p < 10^-6 for mood whereas ${\hat{ρ}}_{β} = 0.003$ (note that ${\hat{ρ}}_{α} = 0.770$ ). In other words, ρ_β suggests that the variance in the outcome attributed to group-based slope differences in mood is relatively small, whereas an LRT (with a Monte Carlo procedure using the LRTSim R function) indicates that the slope variance differs from zero (and it is very unlikely that it is zero). This apparent discrepancy between conclusions based on the LRT and ρ_β is due to the fact that LRTs become more statistically powerful as sample size increases. In contrast, ρ_β is an effect size estimate that is independent of sample size. Specifically, the Hofmann et al. data set includes 1,000 observations (i.e., 50 groups and 20 individuals per group), which results in a statistically powerful test to find non-zero differences in slopes, even if they are small. On the other hand, ρ_β indicates that the variance in the outcome attributed to these slope differences is substantively small. In short, these results offer a good illustration of the fundamental differences between null hypothesis statistical tests and effect size estimates as they pertain to the assessment of slope variance across groups.

Finally, results included in Table 1 pertaining to the LRT offer additional insights regarding this test in relation to ρ_β. Specifically, Table 1 includes results based on a comparison of a full model with random intercepts and slopes to a model with only random intercepts. Statistical power for the LRT (i.e., T scenarios 2 and 3) was 1.00 across both scenarios for $x_{i j}$ . Clearly, knowing the precise amount of variance due to slope differences is more informative than concluding, across all of these conditions, that such variance is unlikely to be zero. In addition, Table 1 shows that using a critical value from a chi-square distribution with 5 degrees of freedom led to a statistically conservative test, as the null hypothesis was rejected in only 1.2% and 1.0% of replications in comparison to the nominal 5% rejection (i.e., a priori Type I error) level.
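One way to gauge whether the observed rejection rates of 1.2% and 1.0% could plausibly reflect Monte Carlo noise around the nominal 5% level is to compare them with the binomial standard error implied by 1,000 replications. The following Python sketch is our own back-of-the-envelope check and was not part of the simulation itself; the function name is illustrative.

```python
import math


def mc_standard_error(p, replications):
    """Binomial standard error of a Monte Carlo rejection proportion."""
    return math.sqrt(p * (1.0 - p) / replications)


nominal = 0.05               # a priori Type I error rate
replications = 1000          # replications per condition (Table 1 note)
observed = [0.012, 0.010]    # LRT rejection rates for T scenario 1 in Table 1

se = mc_standard_error(nominal, replications)
# Distance of each observed rate from the nominal level, in standard errors.
z_scores = [(rate - nominal) / se for rate in observed]

print(f"Monte Carlo SE at the nominal level: {se:.4f}")
for rate, z in zip(observed, z_scores):
    print(f"observed rate {rate:.3f} is {z:.1f} SEs from nominal")
```

Both observed rates lie more than five standard errors below .05, which is consistent with describing the test as conservative rather than attributing the shortfall to simulation error.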

Statistics Describing Group Slope Differences

There are additional strategies for interpreting the size and nature of group slope differences. These include confidence intervals and reliability coefficients for group slopes.

First, the variance of slopes could be used to gauge the degree of group slope differences in the population. Specifically, $τ_{11}$ is the variance of the slope random effect, and larger values imply greater group-based differences. Furthermore, it is possible to use $τ_{11}$ to construct a confidence interval or range of plausible values for group slopes in repeated sampling from the population (Raudenbush & Bryk, 2002). For instance, in larger samples it is possible to compute $γ_{10} \pm 2 \sqrt{τ_{11}}$ (i.e., the average slope plus or minus two slope standard deviations) to indicate the approximately 95% range of slope values in the population. That is, wider plausible value ranges are associated with greater group differences in slopes.
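As a brief illustration of the plausible value range, the following Python sketch assumes normally distributed group slopes and builds the range from the slope standard deviation (i.e., the square root of $τ_{11}$ ) with a multiplier of 1.96. The value of $γ_{10}$ is hypothetical, and $τ_{11}$ = 0.210 is borrowed from Study 3 in Table 2 purely as an example.

```python
import math


def plausible_slope_range(gamma_10, tau_11, z=1.96):
    """Approximate 95% range of group slopes under normality.

    gamma_10: average slope across groups.
    tau_11:   variance of the group slopes (must be non-negative).
    """
    if tau_11 < 0:
        raise ValueError("tau_11 is a variance and cannot be negative")
    half_width = z * math.sqrt(tau_11)
    return gamma_10 - half_width, gamma_10 + half_width


# gamma_10 = 0.30 is a hypothetical average slope used for illustration.
lower, upper = plausible_slope_range(gamma_10=0.30, tau_11=0.210)
print(f"plausible slope range: [{lower:.3f}, {upper:.3f}]")
```

In this illustration the range spans zero, meaning that the relationship could be positive in some groups and negative in others even though the average slope is positive.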

Confidence intervals involving $τ_{11}$ provide a practical understanding of group differences in the population. However, $τ_{11}$ and any computed confidence intervals do not convey how much variance in $y_{i j}$ is attributed to group differences in slopes. Specifically, $τ_{11}$ is on the metric of the relationship between $x_{i j}$ and $y_{i j}$ rather than on the metric of the variance of $y_{i j}$ . In contrast, ρ_β is a function of $τ_{11}$ and offers an index for translating observed slope differences into the degree to which group slope differences contribute to differences in $y_{i j}$ .

As an additional approach, Raudenbush and Bryk (2002) described equations for estimating the reliability of group slopes as an index for the extent to which groups differ in $β_{1 j}$ , but not $y_{i j}$ . Specifically, the reliability of $β_{1 j}$ across groups is defined as,

λ_{1} = \frac{1}{J} \sum_{j = 1}^{J} \frac{τ_{11}}{τ_{11} + υ_{1 j}},

where J is the number of groups and $υ_{1 j}$ is the sampling error variance of the least squares estimate of $β_{1 j}$ . Note that $τ_{11} + υ_{1 j}$ represents the “total variance” of $β_{1 j}$ , so $λ_{1}$ represents the average proportion of true slope variability across J groups and, more precisely, quantifies the proportion of variance in the least squares group slopes that is attributed to true between-group slope differences. Although $λ_{1}$ is a valuable index, it does not indicate how group slope differences relate to variability in $y_{i j}$ . Furthermore, $λ_{1}$ provides information about a single predictor only and does not yield information concerning the overall reliability of group slope differences. In contrast, ρ_β is an index of criterion variance attributed to group slope differences.
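The reliability formula above translates directly into a few lines of code. The following Python sketch takes $τ_{11}$ and a list of group-specific sampling variances $υ_{1 j}$ and returns $λ_{1}$ ; the function name and the input values are hypothetical and chosen only to illustrate the computation.

```python
def slope_reliability(tau_11, sampling_variances):
    """Average reliability of least squares group slopes.

    tau_11:             true between-group variance of the slopes.
    sampling_variances: one sampling error variance per group (upsilon_1j).
    """
    if not sampling_variances:
        raise ValueError("at least one group is required")
    ratios = []
    for v in sampling_variances:
        total = tau_11 + v  # "total variance" of the group's slope estimate
        if total <= 0:
            raise ValueError("tau_11 + sampling variance must be positive")
        ratios.append(tau_11 / total)
    return sum(ratios) / len(ratios)


# Hypothetical inputs: three groups with small sampling variances.
lambda_1 = slope_reliability(0.05, [0.004, 0.006, 0.005])
print(f"lambda_1 = {lambda_1:.3f}")

# When sampling variance equals true variance, reliability is one half.
print(slope_reliability(0.05, [0.05, 0.05]))
```

Because $υ_{1 j}$ shrinks as group size grows, $λ_{1}$ can approach 1 with large groups even when $τ_{11}$ is small, which is one reason it does not convey how much criterion variance the slope differences account for.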

Referring back to Table 1, results offer additional insights regarding the reliability of slopes as an index of slope variability in relationship with ρ_β. For example, Table 1 shows that the average slope reliability was approximately 0.1 in the absence of slope differences (i.e., T scenario 1) and 0.9 in the presence of slope differences (i.e., T scenarios 2 and 3). Clearly, $λ_{1}$ and $λ_{2}$ provide an indication as to the share of slope variance that is true rather than sampling error, but slope reliability does not explicitly describe criterion variance attributed to slope differences.

Illustrative Data Set for Computing ρ_β

We created a data set to illustrate the computation of ρ_β. The data include the following simulated variables: “1,” which is a column of 1s, “X1” (L1 predictor), “X2” (L1 predictor), and “Y” (criterion). The R code we used for all calculations is included as a vignette in the documentation for the icc_beta R package, which can be freely downloaded through CRAN. We make this code and data available so that they can be used for calculating ρ_β for substantive as well as instructional purposes. Note that the code can be used in situations including any number of predictors. The data were simulated using the same code as for the simulation, with the population model defined as:

y_{i j} \mid x_{i j}, β_{j} \sim N\left(x_{i j}^{'} β_{j}, 1\right),

x_{i j} \mid \bar{x}_{j} \sim N_{3}\left(\bar{x}_{j}, \frac{49}{50} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\right),

\bar{x}_{j} \sim N_{3}\left(\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \frac{1}{50} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\right),

β_{j} \sim N_{3}\left(\frac{1}{2} \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \frac{1}{20} I_{3}\right).
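For readers who wish to see the data-generating process spelled out, the following is a minimal Python sketch of the population model above. It is not the R code distributed with the package. The number of groups, the group size, and the seed are assumptions made for illustration, and the column names "group" and "one" are ours.

```python
import random


def simulate_dataset(n_groups=50, group_size=20, seed=1):
    """Draw one data set from the population model (illustrative sketch).

    n_groups, group_size, and seed are assumed values, not taken from
    the article. Returns a list of dicts, one per lower-level observation.
    """
    rng = random.Random(seed)
    sd_between_x = (1 / 50) ** 0.5   # SD of group means of X1 and X2
    sd_within_x = (49 / 50) ** 0.5   # SD of predictors around group means
    sd_beta = (1 / 20) ** 0.5        # SD of each group coefficient

    rows = []
    for j in range(n_groups):
        # Group means of the two predictors (the intercept column is fixed at 1).
        xbar = [rng.gauss(0.0, sd_between_x) for _ in range(2)]
        # Group coefficients: intercept mean 0, slope means 1/2, independent.
        beta = [rng.gauss(m, sd_beta) for m in (0.0, 0.5, 0.5)]
        for _ in range(group_size):
            x1, x2 = (rng.gauss(m, sd_within_x) for m in xbar)
            expected_y = beta[0] * 1.0 + beta[1] * x1 + beta[2] * x2
            y = rng.gauss(expected_y, 1.0)  # unit residual variance
            rows.append({"group": j, "one": 1.0, "X1": x1, "X2": x2, "Y": y})
    return rows


data = simulate_dataset()
print(len(data), "observations;", "first row:", data[0])
```

Because both predictors vary within groups and all three coefficients vary across groups, a data set drawn this way contains intercept differences as well as slope differences for X1 and X2.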

Implementation of the R code using the illustrative data set leads to the following results and conclusions (note that we used a random-intercepts model to calculate ρ_α). First, ${\hat{ρ}}_{α}$ = 0.033, which implies that 3.3% of the variability in the criterion is associated with group mean differences. Second, we estimated a random-slopes multilevel model that included both X1 and X2 as predictors of Y. Results show that ${\hat{ρ}}_{β}$ = 0.075, which means that 7.5% of the variability in Y is accounted for by group differences in slopes. In conclusion, sole reliance on ρ_α, which is small, would have led to the conclusion that multilevel modeling may not be necessary and, consequently, to a missed opportunity to investigate cross-level interaction effects. In contrast, the result based on ρ_β that 7.5% of variance is attributed to slope differences across groups leads to the conclusion that there is a need to understand which particular higher-level moderators explain this variance.
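For readers without access to multilevel software, ρ_α can be roughly approximated with the one-way ANOVA estimator of the intraclass correlation. The following Python sketch uses that estimator, which is a swapped-in technique and not the random-intercepts model used for the results reported above; it also assumes balanced groups. The function name and the toy scores are hypothetical.

```python
def icc1_anova(groups):
    """One-way ANOVA estimator of the intraclass correlation.

    groups: list of lists of outcome scores, one inner list per group,
            all of the same size (balanced design, at least 2 per group).
    """
    n_groups = len(groups)
    size = len(groups[0])
    if n_groups < 2 or size < 2:
        raise ValueError("need at least 2 groups and 2 observations per group")
    if any(len(g) != size for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    grand_mean = sum(sum(g) for g in groups) / (n_groups * size)
    group_means = [sum(g) / size for g in groups]

    # Mean squares between and within groups.
    ms_between = size * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    ) / (n_groups * (size - 1))

    return (ms_between - ms_within) / (ms_between + (size - 1) * ms_within)


# Toy scores for three groups of three members each (illustration only).
toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(f"ANOVA-based ICC = {icc1_anova(toy):.3f}")
```

The estimator can take negative values when group means differ less than expected by chance. In either form, this index is based on differences in group means and does not directly assess between-group differences in slopes.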

Discussion

There is an increased awareness regarding the need to understand the nature of cross-level interaction effects, that is, the extent to which relationships at a lower level of analysis (e.g., two individual-level variables) vary across higher-level units (e.g., groups, units, firms). This need is central for making progress in modern-day contingency theories, person-environment fit models, and any theory that considers outcomes to be a result of combined influences emanating from different levels of analysis. In addition, a better understanding of cross-level interaction effects offers information that practitioners can use in planning and implementing interventions in specific contexts because such knowledge allows them to anticipate the relative effectiveness of such interventions given certain contextual (i.e., higher-level) factors. Thus, knowledge about cross-level interaction effects allows practitioners to enhance the effectiveness of interventions. For these reasons, there has been an exponential growth in the literature on multilevel modeling (Aguinis, Pierce, Bosco, & Muslin, 2009; Mathieu et al., 2012), which is a data-analytic approach that considers and models data dependence and such cross-level interaction effects explicitly.

In spite of the increased diversity and complexity in the methodological literature, there is a common challenge that permeates all types of multilevel modeling: the need to understand the degree of variability of a lower-level relationship across higher-order units, processes, or contexts. Although theory-based considerations initially dictate whether multilevel modeling may be the preferred data-analytic approach, the consistent recommendation in the methodological literature is that researchers first compute the intraclass correlation, ρ_α, as a criterion in deciding whether to use multilevel modeling (e.g., Aguinis et al., 2013). Not surprisingly, researchers use this criterion for deciding whether the use of multilevel modeling is appropriate. If the intraclass correlation is not sufficiently high, then multilevel modeling is not considered necessary and there is also not sufficient justification for assessing cross-level interaction effect hypotheses. Conversely, multilevel modeling is the preferred approach if the intraclass correlation ρ_α is sufficiently high. The reason for this recommendation, which is offered in most major textbooks on multilevel modeling, is that the intraclass correlation ρ_α assesses the proportion of between-group variance relative to total variance in an outcome variable, and therefore, it signals the presence of nonindependence in the data structure.

Our article showed analytically and via simulation that the current conceptualization and estimation of the intraclass correlation captures across-group variability due to intercept differences and only a portion of variability attributed to slope differences. Thus, the intraclass correlation ρ_α is, using psychometric terminology, a deficient index of dependence. In other words, across-group variability may also exist due to slope differences across groups, but this is not reflected in the intraclass correlation as currently conceptualized and calculated. In contrast, the newly derived intraclass correlation ρ_β is an index of the proportion of variance in criterion scores due to group differences, but the source of this variability is group differences in slopes.

We used data reported in several articles addressing substantive theories and research domains to illustrate that using the traditional intraclass correlation ρ_α as an index of group differences ignores the variance attributed to group slope differences and reduces the total reported variance attributed to group differences. In some cases, using ρ_α as the sole criterion for understanding the degree of across-group variability may lead researchers to miss an opportunity to study cross-level interaction effects that, as noted by Mathieu et al. (2012), “lay at the heart of modern-day contingency theories, person-environment fit models, and any theory that considers outcomes to be a result of combined influences emanating from different levels of analysis” (p. 952).

There is an additional use for ρ_β that has implications for future theory development. Because ρ_β is expressed in a standardized metric, its value is not dependent on the particular scales used in a particular study. Accordingly, similar to a Pearson’s correlation coefficient, ρ_β can be used in meta-analytic reviews, and such research can open up new lines of investigation. For example, assume that a meta-analysis of the literature on the relationship between job satisfaction and task performance results in a larger mean value for ρ_β compared to the estimate based on the relationship between job satisfaction and organizational citizenship behavior (OCB). This result implies that there is greater cross-level heterogeneity for the satisfaction–task performance relationship compared to the satisfaction-OCB relationship. Accordingly, this result indicates that it would be more fruitful to conduct primary-level research investigating cross-level moderating effects of the satisfaction–task performance relationship compared to the satisfaction-OCB relationship. Alternatively, conducting meta-analyses based on ρ_β values can also result in information that would be useful in terms of deciding not to search for cross-level interaction effects in certain domains. Given the proliferation of management and organizational studies theories and the need to engage in theory pruning (Leavitt, Mitchell, & Peterson, 2010), using ρ_β as an index of where one should not search for cross-level interaction effects could be just as useful as, or even more useful than, using it as an index of areas where such effects are more likely to be found.

In terms of yet additional uses of ρ_β, our discussion thus far focused on research designs in which units are nested within collectives such as individuals within groups, groups within firms, or firms within industries. However, ρ_β can also be computed within the context of longitudinal designs where repeated measurements are collected for units (e.g., individuals, firms) and time as well as time-varying predictors are included in the model. In terms of a multilevel model conceptualization, the lower level refers to observations and the higher level to the units (e.g., entrepreneurs, teams, firms) about which data have been collected over time. Referring back to Equation 10, in these types of designs, within-unit variability is captured by $y_{i t} - {\overset{ˉ}{y}}_{i}$ where $i$ indexes units and $t$ indexes time. ρ_β can be particularly useful in studies adopting longitudinal designs because it quantifies variance in an outcome over time that is attributed to unit-based differences in slopes. The presence of such variance can then lead to testing specific hypotheses about moderator variables that may account for slope differences.

Conclusion

ρ_α and ρ_β index different sources of variance in $y_{i j}$ . Because ρ_α and ρ_β reflect two different sources of group-based differences, we suggest that researchers contemplating the use of multilevel modeling, as well as those who suspect nonindependence in their data structure, expand the decision criteria for using such a data-analytic approach to include both types of intraclass correlations. Continued use of ρ_α as the sole decision criterion may lead to the inappropriate use of data-analytic approaches that require independence among observations and also lead to an opportunity cost in terms of testing precise and specific cross-level interaction effect hypotheses. In contrast, using both ρ_α and ρ_β improves the decision-making procedures for using multilevel modeling and assessing cross-level interaction effects.

Acknowledgment

We thank James LeBreton and two Organizational Research Methods anonymous reviewers for providing us with highly constructive and useful feedback that allowed us to improve our manuscript substantially. Also, we thank Kyle Bradley and Harry Joo for their assistance testing the icc_beta R package described in our article.

Authors’ Note

Both authors contributed equally to this research.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Aguinis

(2004). Regression analysis for categorical moderators. New York, NY: Guilford.

Aguinis

Beaty

J. C.

Boik

R. J.

Pierce

C. A.

(2005). Effect size and power in assessing moderating effects of categorical variables using multiple regression: A 30-year review. Journal of Applied Psychology, 90, 94–107.

Aguinis

Gottfredson

R. K.

Culpepper

S. A.

(2013). Best-practice recommendations for estimating cross-level interaction effects using multilevel modeling. Journal of Management, 39, 1490–1528.

Aguinis

Pierce

C. A.

Bosco

F. A.

Muslin

I. S.

(2009). First decade of Organizational Research Methods: Trends in design, measurement, and data-analysis topics. Organizational Research Methods, 12, 69–112.

Aguinis

Stone-Romero

E. F.

(1997). Methodological artifacts in moderated multiple regression and their effects on statistical power. Journal of Applied Psychology, 82, 192–206.

Aguinis

Werner

Abbott

J. L.

Angert

Park

J. H.

Kohlhausen

(2010). Customer-centric science: Reporting significant research results with rigor, relevance, and practical impact in mind. Organizational Research Methods, 13, 515–539.

Bosco

F. A.

Aguinis

Singh

Field

J. G.

Pierce

C. A.

(in press). Correlational effect size benchmarks. Journal of Applied Psychology. doi: https://dx-doi-org.web.bisu.edu.cn/10.1037/a0038047

Burstein

Linn

R. L.

Capell

F. J.

(1978). Analyzing multilevel data in the presence of heterogeneous within-class regressions. Journal of Educational Statistics, 3, 347–383.

Cao

Ramsay

J. O.

(2010). Linear mixed-effects modeling by parameter cascading. Journal of the American Statistical Association, 105, 365–374.

10.

Chen

Kirkman

B. L.

Kanfer

Allen

Rosen

(2007). A multilevel study of leadership, empowerment, and performance in teams. Journal of Applied Psychology, 92, 331–346.

11.

Cohen

(1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

12.

Cortina

J. M.

Landis

R. S.

(2009). When small effect sizes tell a big story, and when large effect sizes don’t. In Lance

C. E.

Vandenberg

R. J.

(Eds.), Statistical and methodological myths and urban legends: Doctrine, verity, and fable in the organizational and social sciences (pp. 287–308). New York, NY: Routledge.

13.

Crainiceanu

Ruppert

(2004). Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society-B, 66, 165–185.

14.

Dalal

D. K.

Zickar

M. J.

(2012). Some common myths about centering predictor variables in moderated multiple regression and polynomial regression. Organizational Research Methods, 15, 339–362.

15.

Enders

C. K.

Tofighi

(2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychological Methods, 12, 121–138.

16.

Halbesleben

J. R. B.

Wheeler

A. R.

Paustian-Underdahl

S. C.

(2013). The impact of furloughs on emotional exhaustion, self-rated performance, and recovery experiences. Journal of Applied Psychology, 98, 492–503.

17.

Heck

R. H.

Thomas

S. L.

Tabata

L. N.

(2010). Multilevel and longitudinal modeling with IBM SPSS. New York, NY: Routledge.

18.

Hofmann

D. A.

Gavin

M. B.

(1998). Centering decisions in hierarchical linear models: Theoretical and methodological implications for organizational science. Journal of Management, 24, 623–641.

19.

Hofmann

D.A.

Griffin

M. A.

Gavin

M.B.

(2000). The application of hierarchical linear modeling to management research. In Klein

K. J.

Kozlowski

S. W. J.

(Eds.), Multilevel theory, research, and methods in organizations: Foundations, extensions, and new directions (pp. 467–511). Hoboken, NJ: Jossey-Bass.

20.

Hofmann

D. A.

Morgeson

F. P.

Gerras

S. J.

(2003). Climate as a moderator of the relationship between leader-member exchange and content specific citizenship: Safety climate as an exemplar. Journal of Applied Psychology, 88, 170–178.

21.

Hox

J. J.

(2010). Multilevel analysis: Techniques and applications (2nd ed.). New York, NY: Routledge.

22.

Liden

R. C.

(2013). Relative leader–member exchange within team contexts: How and when social comparison impacts individual effectiveness. Personnel Psychology, 66, 127–172.

23.

Hülsheger

U. R.

Alberts

H. J. E. M.

Feinholdt

Lang

J. W. B.

(2013). Benefits of mindfulness at work: The role of mindfulness in emotion regulation, emotional exhaustion, and job satisfaction. Journal of Applied Psychology, 98, 310–325.

24.

Kenny

D. A.

Judd

C. M.

(1996). A general procedure for the estimation of interdependence. Psychological Bulletin, 119, 138–148.

25. Kenny, D. A., Korchmaros, J. D., & Bolger, N. (2003). Lower level mediation in multilevel models. Psychological Methods, 8, 115–128.

26. Kim, E., Bhave, D. P., & Glomb, T. M. (2013). Emotion regulation in workgroups: The roles of demographic diversity and relational work context. Personnel Psychology, 66, 613–644.

27. LaHuis, D. M., Hartman, M. J., Hakoyama, S., & Clark, P. C. (2014). Explained variance measures for multilevel models. Organizational Research Methods, 17, 433–451.

28. Leavitt, K., Mitchell, T. R., & Peterson, J. (2010). Theory pruning: Strategies to reduce our dense theoretical landscape. Organizational Research Methods, 13, 644–667.

29. LeBreton, J. M., & Senter, J. L. (2008). Answers to twenty questions about interrater reliability and interrater agreement. Organizational Research Methods, 11, 815–852.

30. Liao, H., & Rupp, D. E. (2005). The impact of justice climate and justice orientation on work outcomes: A cross-level multifoci framework. Journal of Applied Psychology, 90, 242–256.

31. Longford, N. T. (1993). Random coefficient modeling. Oxford, UK: Clarendon.

32. Mathieu, J. E., Aguinis, H., Culpepper, S. A., & Chen, G. (2012). Understanding and estimating the power to detect cross-level interaction effects in multilevel modeling. Journal of Applied Psychology, 97, 951–966.

33. Mathieu, J., Ahearne, M., & Taylor, S. R. (2007). A longitudinal cross-level model of leader and salesperson influences on sales force technology use and performance. Journal of Applied Psychology, 92, 528–537.

34. Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage Publications.

35. Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351–357.

36. Searle, S. R., Casella, G., & McCulloch, C. E. (1992). Variance components. New York, NY: Wiley.

37. Self, S. G., & Liang, K.-Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association, 82, 605–610.

38. Short, J. C., Ketchen, D. J., Bennett, N., & du Toit, M. (2006). An examination of firm, industry, and time effects on performance using random coefficients modeling. Organizational Research Methods, 9, 259–284.

39. Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). Thousand Oaks, CA: Sage.

40. Stram, D. O., & Lee, J. W. (1994). Variance components testing in the longitudinal mixed effects model. Biometrics, 50, 1171–1177.

41. Takeuchi, R., Chen, G., & Lepak, D. P. (2009). Through the looking glass of a social system: Cross-level effects of high-performance work systems on employees’ attitudes. Personnel Psychology, 62, 1–29.

42. Taylor, P. J., Li, W. D., Shi, K., & Borman, W. C. (2008). The transportability of job information across countries. Personnel Psychology, 61, 69–111.

43. Uy, M. A., Foo, M. D., & Aguinis, H. (2010). Using experience sampling methodology to advance entrepreneurship theory and research. Organizational Research Methods, 13, 31–54.
