From Nuisance to Novel Research Questions: Using Multilevel Models to Predict Heterogeneous Variances

Abstract

Constructs that reflect differences in variability are of interest to many researchers studying workplace phenomena. The aggregation methods typically used to investigate “variability-based” constructs suffer from several limitations, including the inability to include Level 1 predictors and a failure to account for uncertainty in the variability estimates. We demonstrate how mixed-effects location-scale (MELS) and heterogeneous variance models, which are direct extensions of traditional mixed-effects (or multilevel) models, can be used to test mean (location)- and variability (scale)-related hypotheses simultaneously. The aims of this article are to demonstrate (a) how the MELS and heterogeneous variance models can be estimated with both nested cross-sectional and longitudinal data to answer novel research questions about constructs of interest to organizational researchers, (b) how a Bayesian approach allows for the inclusion of random intercepts and slopes when predicting both variability and mean levels, and finally (c) how researchers can use a multilevel approach to predict between-group heterogeneous variances. In doing so, this article highlights the added value of viewing variability as more than a statistical nuisance in organizational research.

Keywords

multilevel models, mixed-effects location-scale models, heterogeneous variance models, variability-related hypotheses, Bayesian

Organizational researchers commonly use the amount (e.g., mean) of a variable as both a predictor and an outcome of substantive interest, while examining variability for instrumental purposes. For example, researchers using direct consensus or referent shift approaches (Chan, 1998) calculate agreement among members of a group to justify the aggregation of a Level 1 variable (James, Demaree, & Wolf, 1984, 1993). Similarly, before proceeding with their analysis, researchers using general linear models assess whether residuals follow the assumed normal distribution with a homogeneous variance (e.g., Kutner, Nachtsheim, Neter, & Li, 2004). In such cases, researchers may view variability as a nuisance that interferes with obtaining appropriate estimates and inferences regarding the mean. However, theoretically meaningful interpretations of variability exist in many areas of organizational research (e.g., personality consistency, performance variability, leader-member exchange [LMX] differentiation, strategic consensus, pay dispersion) and researchers need to utilize methods designed for studying variability to advance the understanding of these workplace phenomena.

Currently, it is common for researchers publishing in top organizational journals to use single-level aggregation methods in analyses predicting variability (e.g., calculating the within-group standard deviation and using it as an outcome in analyses). However, we caution against this common practice due to important limitations, including a failure to account for uncertainty in the variability estimates and the inability to appropriately incorporate Level 1 predictors in analyses. Instead, we propose that the study of variability is ripe for multilevel investigation and explain how multilevel approaches enable researchers to address new research questions that cannot be answered using aggregation and single-level analyses. Specifically, we demonstrate how models that are direct extensions of traditional mixed-effects (or multilevel) models may be used to test mean (location)- and variability (scale)-related hypotheses simultaneously. The most general of these models, mixed-effects location-scale (MELS) models, allow for random effects on the location side and the scale side of the model, while the more restricted heterogeneous variance models allow for random effects on the location side of the model and fixed effects on the scale side of the model (see the glossary in Appendix A for definitions of all bold terms).

We are not the first to discuss MELS or heterogeneous variance models. MELS models are a relatively recent development in the long history of different approaches to modeling heterogeneous variances (e.g., Aitkin, 1987; Bryk & Raudenbush, 1988; Culpepper, 2010; Goldstein, 2011; Harvey, 1976; Leonard, 1975; Lindley, 1971; Pinheiro & Bates, 2000; Raudenbush, 1988). Researchers have discussed the development of these models in statistical journals and texts (e.g., Goldstein, 2011; Hedeker, Mermelstein, & Demirtas, 2008; Lin, Mermelstein, & Hedeker, 2018a, 2018b; Walters, Hoffman, & Templin, 2018) and applied these models (most often) in fields where the collection of intensive longitudinal data is more common (e.g., medicine; Pugach, Hedeker, Richmond, Sokolovsky, & Mermelstein, 2014). For example, Watts, Walters, Hoffman, and Templin (2016) examined whether time-invariant predictors (Level 2 predictors; e.g., gender, age, Alzheimer’s disease status) as well as time-varying predictors (Level 1 predictors; e.g., day monitor worn) were associated with individual differences in mean level (location side) as well as intraindividual variability (scale side) of physical activity. Although less common, researchers also apply these models to nested cross-sectional data. For example, this approach has been used in an educational context to assess whether the variability in academic achievement within a school is affected by children’s socioeconomic status (SES; Leckie, French, Charlton, & Browne, 2014). By using a Bayesian approach to include a random effect for SES on the scale side of the model, Leckie et al. (2014) demonstrated that the effect of SES on variability in academic achievement within schools varied by private versus public sector (i.e., a cross-level interaction). Culpepper (2010) employed a similar approach to reveal that whites and females demonstrate more predictable academic performance than their male or racial/ethnic minority counterparts.

The first use of these models within the organizational sciences occurred recently, with an important and innovative application to the study of consensus emergence by Lang, Bliese, and de Voogt (2018). With this exception, we suspect MELS and heterogeneous variance models are underutilized in the organizational research literature for at least two reasons. First, it is relatively rare for organizational researchers to have intensive longitudinal data. Second, most articles have been published in statistical and medical journals that organizational scholars tend not to read, and these articles may not be accessible to all readers (e.g., they often assume a depth of mathematical training beyond the level typically provided in graduate programs training organizational researchers).

The objective of this article is to broaden the use of multilevel approaches for studying variability in the organizational sciences. We begin by identifying examples of variance-based constructs that are of interest to organizational researchers, and throughout the article we illustrate the types of research questions that multilevel approaches for predicting heterogeneous variances can answer, including questions that cannot be asked or adequately addressed using typical methodological practices (i.e., aggregation and single-level analyses). Then, using simulated data, we first demonstrate the estimation of a MELS model predicting heterogeneous within-group variance for a nested cross-sectional design.¹ Second, we demonstrate the estimation of a heterogeneous variance model for predicting between-group variances (i.e., intercept and slope) for a longitudinal design using a multigroup approach. In addition to predicting heterogeneous between-group variances for the intercept and slope, this model also allows for a heterogeneous covariance between these two random effects.

The flexibility of modeling the scale side with random effects does have computational costs, especially as the number of random effects increases (Asparouhov & Muthén, 2016). Researchers will likely need to use a Bayesian analysis to estimate more complex MELS models (e.g., those with both random intercepts and slopes on the location and scale sides of the model as is done in the nested cross-sectional example). Throughout the article, as we demonstrate this approach, we aim to strike a balance between providing the technical detail and practical guidance researchers need to use these models in their work. To that end, we use an open-source statistical programming language to conduct the analyses, provide all code used to estimate the models in our examples, and offer detailed interpretations of the model output. Finally, we discuss additional practical considerations for researchers applying multilevel models to predict heterogeneous variance.

Review of the Organizational Literature

In this section, we present the results of a literature review examining the study of theoretically meaningful interpretations of variability in organizational research (i.e., variability-based constructs). We start our review in the year in which Hedeker and colleagues (2008) demonstrated in Biometrics how to estimate the random intercept MELS model using readily available software, thereby making this method more accessible to researchers. We identified articles by searching for the terms variability, dispersion, consensus, and consistency in article titles, keywords, and abstracts of the following top journals: Academy of Management Journal, Administrative Science Quarterly, Journal of Applied Psychology, Journal of Management Studies, Journal of Management, Management Science, Organization Science, Personnel Psychology, and Strategic Management Journal. We identified over 60 articles in which variability-based constructs were discussed (conceptually) or examined as the substantive focus of the article (e.g., as predictor, dependent variable, moderator). While not exhaustive (e.g., researchers may use other terms to refer to variance-based constructs), this search allows us to illustrate the interest in and methods used to examine variability-based constructs in organizational research. We found that organizational researchers have not widely adopted MELS models or similar approaches.

Examples of Variability-Based Constructs in Organizational Research

Across the micro-macro organizational research continuum, we identified many examples of variability-based constructs. These constructs reflect between-person, -team, or -organization differences in variability across time/situations (e.g., stability of preferences, attitudes, behaviors) as well as variability in the characteristics, perceptions, or behaviors of lower-level entities (e.g., people, but also teams, business units) that compose a higher-level entity (e.g., team, occupation, organization, or industry). Table 1 provides a description of these constructs.

Table 1.

Examples of Variability-Based Constructs in Organizational Research.

Differences in the Heterogeneity of the Entities Composing Teams, Business Units, Occupations, Organizations, or Industries
Strategic consensus	The shared understanding of strategic priorities (i.e., agreement) among managers at the top, middle, or operating levels of the organization (Kellermanns et al., 2005).
Investor sentiment agreement	Heterogeneous perceptions of aggregate economic growth among investors (Cen, Lu, & Yang, 2013).
Competency rating consensus	Agreement among job incumbents on the ratings of job competencies (Lievens, Sanchez, Bartram, & Brown, 2010).
Justice climate strength	Variability in team members’ justice ratings (Roberson & Williamson, 2012).
Group consensus	Shared perceptions and feelings or climates within groups (e.g., job satisfaction; Lang et al., 2018). Applicable to a variety of cognitive, motivational, and affective emergent states among group members (Ilgen, Hollenbeck, Johnson, & Jundt, 2005; Marks, Mathieu, & Zaccaro, 2001).
Leader-member exchange differentiation	Differentiated exchanges leaders create with subordinates within the same workgroup (Erdogan & Bauer, 2010; Gooty & Yammarino, 2016; Henderson et al., 2009; Liden, Erdogan, Wayne, & Sparrowe, 2006).
Occupational heterogeneity	Variability in knowledge, skills, abilities, and other competencies of people in the same occupation (Ployhart, Weekley, & Baughman, 2006; Sitzmann, Ployhart, & Kim, 2019).
Team personality/need dispersion	Variability in team members’ personalities (Gonzalez-Mulé, DeGeest, McCormick, Seong, & Brown, 2014) or the need for power, achievement, and affiliation (Chun & Choi, 2014).
Efficacy dispersion	Within-team variability in perceptions of team efficacy (DeRue, Hollenbeck, Ilgen, & Feltz, 2010).
Pay dispersion	Within-team variability in compensation (e.g., among top managers; Carnahan, Agarwal, & Campbell, 2012; Chin & Semadeni, 2017; Fredrickson, Davis-Blake, & Sanders, 2010; Jaskiewicz et al., 2017; Lim, 2018; Messersmith, Guthrie, Ji, & Lee, 2011; Trevor, Reilly, & Gerhart, 2012).
Decision-making heterogeneity	Variability of group members’ decision-making preferences (Melkonyan & Safra, 2016).
Client heterogeneity	Variability in the characteristics of clients served (e.g., patients’ physical and psychological characteristics; Chowdhury & Endres, 2010).
Implementation variability	Variability in the implementation of organizational changes, human resource management systems, or other policies and procedures across organizational subunits (e.g., the implementation of high performance work systems; Pak & Kim, 2018).
Innovation heterogeneity	Within-organization variability in innovation output (e.g., variability in the forward citations of patents submitted by an organization, Patel, Kohtamäki, Parida, & Wincent, 2015; variability in the innovativeness of subsidiaries, Figueiredo, 2011).
Performance heterogeneity	Within-industry variability in firms’ performance or profitability (Balasubramanian & Lieberman, 2010; Lenox, Rockart, & Lewin, 2010) or within-occupation variability in individuals’ job performance (Bidwell & Keller, 2014).
Between-Person, -Team, or -Organization Differences in Variability Over Time or Across Situations
Decision-making consistency	Within-person variability in decisions made in the face of similar choice occasions (Melkonyan & Safra, 2016).
Transfer of training consistency	Within-person variability in the application of newly acquired knowledge and skills to the job context (Huang, Ford, & Ryan, 2017).
Personality consistency	Within-person variability in expression of personality traits or other individual differences across situations (i.e., more general intraindividual variability, Dalal et al., 2015; e.g., conscientiousness, Minbashian, Wood, & Beckmann, 2010; sociability and dutifulness, Lievens et al., 2018; mastery goal orientation, Huang et al., 2017).
Emotional labor consistency	Within-person variability in the use of surface acting and deep acting across situations (over time; Gabriel & Diefendorff, 2015; Scott, Barnes, & Wagner, 2012).
Job satisfaction variability	Within-person variability in person-organization and person-job fit perceptions (Gabriel, Diefendorff, Chandler, Moran, & Greguras, 2014).
Justice variability	Within-person stability of fairness over time (Matta, Scott, Colquitt, Koopman, & Passantino, 2017).
Performance variability	Within-person variability of performance over time (Barnes, Reb, & Ang, 2012; Dalal, Bhave, & Fiset, 2014; Minbashian & Luppino, 2014; Reb & Greguras, 2010); within-business unit (Guo, 2017) or within-firm (Wales, Patel, & Lumpkin, 2013) variability of performance over time; within-firm variability of corporate social performance over time (Wang & Choi, 2013), business acquisition rate or success (Kim, Finkelstein, & Haleblian, 2015; Laamanen & Keil, 2008), and provision of services to a customer over time (Sriram, Chintagunta, & Manchanda, 2015).
Turnover volatility	Within-firm variability of employee turnover over time (Hausknecht & Holwerda, 2013).

Many variability-based constructs have broad relevance across levels and areas of focus. For example, researchers have examined performance variability for firms over time (e.g., Wales, Patel, & Lumpkin, 2013; Wang & Choi, 2013), within an industry (e.g., Balasubramanian & Lieberman, 2010; Lenox, Rockart, & Lewin, 2010), across organizational subsidiaries (Figueiredo, 2011), across people who occupy the same type of job (Bidwell & Keller, 2014), and for employees over time (Barnes, Reb, & Ang, 2012; Reb & Greguras, 2010). Similarly, pay dispersion has been discussed in reference to labor markets (Cobb & Stevens, 2017), organizations as a whole (Carnahan, Agarwal, & Campbell, 2012), and perhaps most commonly, top management teams (e.g., Chin & Semadeni, 2017; Fredrickson, Davis-Blake, & Sanders, 2010; Jaskiewicz, Block, Miller, & Combs, 2017; Lim, 2018).

Several constructs also reflect variability in characteristics of group members (e.g., personality heterogeneity) or the extent to which there is agreement among members in the group (e.g., justice climate, efficacy dispersion). For example, strategic consensus regarding the firm’s priorities is commonly assessed for managers in the top team, but also at different levels and in different parts of the organization (Kellermanns, Walter, Lechner, & Floyd, 2005). The variability of supervisor behavior across time (e.g., justice variability; Matta, Scott, Colquitt, Koopman, & Passantino, 2017) and when interacting with different members of the workgroup (LMX differentiation; Gooty & Yammarino, 2016; Henderson, Liden, Glibkowski, & Chaudhry, 2009) has also been a topic of study.

It is important to note that most prior research has either conceptually discussed the nature of variance-based constructs or aimed to establish relationships between the construct and important workplace outcomes. For example, Ployhart, Weekley, and Baughman (2006) found that human capital emergence, conceptualized as consisting of both personality level (aggregate mean personality) and personality homogeneity (aggregate standard deviation of personality) within jobs and within organizations, predicted employee job satisfaction and performance. Further, research focusing on within-person personality consistency has found that the relationship between the level of personality traits and job performance is stronger when people are less variable in their personality expression across time/situations (Dalal et al., 2015). Other research has linked variability in the quality of relationships between a supervisor and his or her subordinates (i.e., LMX differentiation) to job satisfaction, performance, and other outcomes (e.g., Erdogan & Bauer, 2010; Schyns, 2006). In the literature on workplace groups and teams, researchers have proposed that team efficacy dispersion will predict team effectiveness above and beyond the average level of efficacy within the team (DeRue, Hollenbeck, Ilgen, & Feltz, 2010). At the department or organization level, researchers have found a stronger relationship between climate level (i.e., mean rating of perceptions of the workplace) and outcomes of interest (e.g., work satisfaction and organizational commitment) when employees view the climate similarly (i.e., there is a strong climate in place; González-Romá, Peiró, & Tordera, 2002). Finally, at the firm level, firms with higher trading variability have been found to have lower expected returns (Chordia, Subrahmanyam, & Anshuman, 2001).

These examples are part of the mounting evidence suggesting that variability (in addition to the average amount) of workplace phenomena predicts important workplace outcomes. Thus, there is a need to understand the antecedents of variability-based constructs. In the next section, we describe the smaller subset of studies (13) that have begun to examine these antecedents (see Table 2), including the limitations of approaches that have been applied previously. We explain how MELS and heterogeneous variance models address these limitations and expand the potential questions researchers may address when predicting variability as a substantive construct of interest.

Table 2.

Prior Organizational Research Predicting Variability-Based Constructs.

Citation	Research Description	Analysis	Type of Variability
Fredrickson, Davis-Blake, and Sanders (2010)	Examined if the percentage of the team on the board, variation in team members’ stock ownership, and members’ average tenure predicted variability of pay among members of the top management team.	Fixed effect regression was used to predict the coefficient of variation for pay.	Variability within a team
Lenox, Rockart, and Lewin (2010)	Examined if the extent of interdependencies among firms predicted variability of firm profits within an industry.	Mixed (multilevel) model, where the standard deviation of Tobin’s q was used as the outcome.	Variability of firm profits within an industry
Lievens et al. (2010)	Examined whether aspects of the occupation (e.g., complexity, context, work activities) predicted the agreement (less variability) in job incumbent ratings of the job competencies.	Generalizability coefficient (representing agreement among raters) predicted by nature of the occupation in a regression.	Variability in job incumbents’ ratings
González-Benito, Aguinis, Boyd, and Suárez-González (2012)	Examined if strategic consensus regarding competitive methods predicted strategic variability regarding managers’ perceptions of strategic priorities (i.e., strategic consensus regarding objectives).	Distance measure calculated for each firm and used subsequently in a structural equation model.	Variability in managers’ perceptions within a firm
Roberson and Williamson (2012)	Examined if team network density and self-monitoring behavior predicted variability in team members’ perceptions of justice (i.e., justice climate strength).	Calculated standard deviation of justice climate perceptions and used as a dependent variable in a regression.	Variability within a team
Scott, Barnes, and Wagner (2012)	Examined if self-monitoring predicted emotional labor variability (for both surface and deep acting).	Calculated standard deviations for emotional labor variables over two weeks for each person and used that measure to calculate partial correlations with self-monitoring.	A person’s variability over time
Patel et al. (2015)	Examined if entrepreneurial orientation and absorptive capacity predicted variability in forward citations of a patent approved and/or filed (i.e., innovation) within a firm.	Calculated the standard deviation of the forward citations (after adjusting for industry, etc.) to reflect variability and predicted within a structural equation modeling framework.	Variability within a firm
Chin and Semadeni (2017)	Examined if CEO liberalism predicted the variability of pay among members of the top management team (not including the CEO).	Generalized estimating equations (GEE) where the inverse coefficient of variation for pay was used as the outcome.	Variability within a team
Jaskiewicz et al. (2017)	Examined if the type of firm ownership (i.e., founder, family, or later generation owner) predicted the variability of pay among members of the top management team (not including the CEO).	Used aggregate measure of variation (100 × the coefficient of variation for top four managers) for total compensation of four top non-CEO managers as the outcome in a regression.	Variability within a team
Matta, Scott, Colquitt, Koopman, and Passantino (2017)	Examined if supervisors’ self-control predicted variability in overall justice perceived by subordinates.	Used multilevel path analysis but included the standard deviation of justice perceptions across time as a Level 2 variable.	A person’s variability over time
Lang, Bliese, and de Voogt (2018)	Examined whether collective efficacy predicted the trajectory of consensus in job satisfaction over time (Study 1) and whether leaders show different patterns of consensus than other group members (Study 2).	Extended a standard multilevel methodology by examining residual variances within a growth model to account for change in group consensus and included group and individual-level predictors to explain the emergence of greater within-group agreement over time.	Variability within a team over time
Lievens et al. (2018)	Examined if self-rated self-monitoring and functional flexibility predicted variability in respondents’ answers to a situational judgment test examining the traits of sociability and dutifulness.	Used an IRT variance partitioning approach. In the second study, used the within-person standard deviation of personality states as the outcome and predicted it in a regression using intraindividual variability on the SJT, among other things, as a predictor. A footnote mentioned the use of multilevel modeling approaches in supplementary analyses, but the specific method and results were not reported.	A person’s variability across situations
Sitzmann, Ployhart, and Kim (2018)	Examined whether aspects of occupational strength (e.g., task significance) predicted the variability of incumbents’ personalities within occupations.	Used an aggregate measure of heterogeneity (mean Euclidean distance, replicated with standard deviation) in a structural equation model.	Variability within an occupation

Use and Limitations of Aggregation Approaches for Predicting Differences in Variability

Our review found that researchers commonly calculate and use aggregate measures of variability as an outcome in single-level regressions or similar analyses. This approach has important limitations, which researchers can overcome by using multilevel approaches for predicting differences in variability.

First, this approach allows researchers to utilize only Level 2 predictors because Level 2 outcome variables (e.g., the standard deviation estimate most commonly used) do not have any Level 1 variability (Leckie et al., 2014). Thus, researchers cannot incorporate Level 1 variables as predictors of variability-based constructs and instead often create Level 2 versions of Level 1 predictors and utilize them in a single-level regression (Hofmann, 2002). This approach loses information that is likely of interest to researchers. In the sections that follow, we demonstrate how the ability to incorporate Level 1 predictors offers new avenues for organizational researchers.
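The loss of Level 1 information under aggregation can be made concrete with a small sketch. The data below are simulated purely for illustration, and the variable names (`team`, `x`, `y`, `sd_y`, `mean_x`) are hypothetical rather than taken from any study in Table 2.

```python
import random
import statistics

random.seed(1)

# Hypothetical nested data: 3 teams x 4 members. Each member has a Level 1
# predictor x and an outcome y (both simulated, for illustration only).
rows = []
for team in range(3):
    for _ in range(4):
        x = random.gauss(0, 1)
        y = random.gauss(0, 1 + team)  # outcome spread grows with team index
        rows.append({"team": team, "x": x, "y": y})

# Aggregation step: compute one within-team standard deviation per team.
aggregated = []
for team in sorted({r["team"] for r in rows}):
    members = [r for r in rows if r["team"] == team]
    aggregated.append({
        "team": team,
        "sd_y": statistics.stdev([r["y"] for r in members]),
        # Member-level x values cannot be carried along; only a team-level
        # summary of x (here, the team mean) can enter the regression.
        "mean_x": statistics.mean([r["x"] for r in members]),
    })

print(len(rows), "Level 1 rows ->", len(aggregated), "Level 2 rows")
```

After aggregation, the outcome has one value per team, so any member-level predictor can only appear as a team summary such as `mean_x`.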

Second, the aggregation approach does not account for uncertainty in the variability estimates. The standard deviation or variance of a sample is itself a random quantity, so there is uncertainty associated with that statistic. Further, the standard deviation of the variance estimate (i.e., approximately $\sqrt{2} σ^{2} / \sqrt{n}$ for normally distributed data) is potentially much larger than the standard deviation of the mean (i.e., the standard error of the mean, $σ / \sqrt{n}$; Pawitan, 2001, p. 59). This uncertainty is important because aggregation approaches treat all variability outcomes as equally precise regardless of the number of measurements per Level 2 entity (e.g., person or group), resulting in an underestimation of sources of variation and therefore inflated Type I error rates. In contrast, MELS and heterogeneous variance models appropriately weight each Level 2 estimate according to the amount of data at Level 1, thereby resulting in more appropriate variance estimates and inferences.
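A short simulation can illustrate how imprecise a variance estimate is when it rests on few Level 1 observations. This is a minimal sketch, assuming normally distributed data with $σ = 1$; under that assumption the sampling standard deviation of the unbiased sample variance is $σ^{2} \sqrt{2 / (n - 1)}$ and that of the mean is $σ / \sqrt{n}$. The group sizes (5 and 50) and the number of replications are arbitrary choices for illustration.

```python
import math
import random

random.seed(42)

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sampling_sds(n, sigma=1.0, reps=5000):
    """Empirical SD of the sample mean and of the sample variance
    across `reps` simulated groups of size n."""
    means, variances = [], []
    for _ in range(reps):
        sample = [random.gauss(0.0, sigma) for _ in range(n)]
        means.append(mean(sample))
        variances.append(sample_variance(sample))
    return (math.sqrt(sample_variance(means)),
            math.sqrt(sample_variance(variances)))

results = {}
for n in (5, 50):
    sd_mean, sd_var = sampling_sds(n)
    results[n] = (sd_mean, sd_var)
    print(f"n={n:2d}  SD of mean: {sd_mean:.3f} "
          f"(theory {1 / math.sqrt(n):.3f})  "
          f"SD of variance: {sd_var:.3f} "
          f"(theory {math.sqrt(2 / (n - 1)):.3f})")
```

A group of 5 yields a far noisier variance estimate than a group of 50, yet a single-level regression on aggregated standard deviations would treat both estimates as equally precise.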

Explaining Sources of Variance Versus Predicting Heterogeneous Variances

The use of multilevel models for predicting the location (mean) is common across many areas in the organizational sciences. These models are often “mixed-effects” models because they include both fixed and random effects. For example, a random intercept multilevel model (i.e., the simplest multilevel model) is a “mixed” model that allows each Level 2 unit to have its own predicted mean (e.g., Bliese & Ployhart, 2002; Pinheiro & Bates, 2000). Adding fixed or random effects enables researchers to investigate change in mean level based on either Level 1 or Level 2 characteristics. For example, it is common for researchers to employ growth models, such as those discussed by Bliese and Ployhart (2002), with data gathered using experience sampling methods in which participants respond to prompts on many occasions (Bolger & Laurenceau, 2013). When researchers use these models, the first step involves variance partitioning (i.e., quantifying the amount of variability that exists at higher and lower levels of analysis), and the subsequent step involves explaining some portion of the variance at each level (by predicting differential mean levels with variables of interest). The variance partitioning in these models is the same for everyone in the sample (e.g., one random intercept variance, one random slope variance, and one residual variance).
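As a brief illustration of variance partitioning, the random intercept model and the resulting intraclass correlation can be written as follows. The symbols $\sigma^{2}_{u_0}$ and $\sigma^{2}_{e}$ are labels assumed here for the Level 2 and Level 1 variance components; they may not match the notation in Figure 1.

```latex
% Random intercept model for individual i nested within group j
Y_{ij} = \gamma_{00} + u_{0j} + e_{ij}, \qquad
u_{0j} \sim N\left(0, \sigma^{2}_{u_0}\right), \quad
e_{ij} \sim N\left(0, \sigma^{2}_{e}\right)

% Variance partitioning: proportion of total variance located at Level 2
\mathrm{ICC} = \frac{\sigma^{2}_{u_0}}{\sigma^{2}_{u_0} + \sigma^{2}_{e}}
```

Note that $\sigma^{2}_{e}$ carries no $i$ or $j$ subscript: a single residual variance applies to every individual in every group, which is the restriction that heterogeneous variance and MELS models relax.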

The characteristic that differentiates MELS and heterogeneous variance models from traditional multilevel models is the explicit modeling of variability (i.e., heterogeneous residual variances). The location side (mean portion) of the model is the same and can be expressed using the notation popularized by Raudenbush and Bryk (2002). The equation on the scale side (i.e., residual variability portion) takes a similar form but uses a different notation to make it easier to keep track of the side of the model on which a parameter is located. Thus, typical growth models (and other traditional multilevel models) are a special case of MELS and heterogeneous variance models where the scale side of the model contains only the fixed intercept, $τ_{00}$. To clarify the difference between these types of models, see example equations for traditional regression, multilevel, heterogeneous variance, and MELS models provided in Figure 1 (compare Equation 2 for the traditional multilevel model to Equation 3 for the heterogeneous variance model and Equation 4 for the MELS model). Figures 2 and 3 further demonstrate this conceptual distinction between explaining sources of variance and predicting heterogeneous variances (i.e., differences in variability). The latter may be used to operationalize and predict variance-based constructs in the organizational literature.

Figure 1.

Comparison of regression, multilevel, heterogeneous variance, and mixed-effects location-scale model equations. The location or mean side of the regression equation is presented in a multilevel form, although it contains no additional random effects and is therefore not a multilevel model. This notation is used to facilitate the comparison of the equations across the various model specifications. Note that there is no Level 1 residual term on the scale side of the model (i.e., no e, which is similar to a generalized linear model [e.g., logistic regression] that also lacks a Level 1 residual). In MELS and heterogeneous variance models, the log of the variance (or standard deviation) is used as the outcome to ensure that the predicted variance is positive.

Figure 2.

Regression plot illustrating regression assumptions. The left panel contains a typical regression with a positive slope and a homogeneous variance. Y_i ∼ N(bx_i, σ²) is the common notation used to indicate that Y_i is normally distributed with a mean that depends upon the individual’s x_i value and a variance that is constant (σ²). The right panel displays a heterogeneous variance model where the mean does not change as a function of x. The Y_i ∼ N(0, τx_i) notation indicates that the mean is zero for everyone in the sample and the variability depends upon the x_i value.

Figure 3.

Conceptual figure depicting predicted variances for two different groups in a MELS model. The black circles represent three different predictors (two at Level 2 and one at Level 1). The other circles represent the variance partitioning between Level 1 (white) and Level 2 (gray). The overlap between circles represents the variance explained. The size of the circle reflects group-specific residual variance estimates. Group 2 has less random intercept variance (top circle) but more random slope (middle circle) and Level 1 (bottom circle) variance as compared to Group 1. In a traditional random slope multilevel model there are only three variance components for everyone in the entire sample. In other words, the size of the circles would be the same for all groups. Although not depicted here, the same principles apply to individuals.

Multilevel Models for Predicting Heterogeneous Variances

Above we discussed the conceptual differences between traditional multilevel models and multilevel models for predicting heterogeneous variances. In this section, we describe the notation presented in Figure 1 for the MELS model. In these descriptions, we use individuals nested within groups to make the interpretations more concrete. However, these interpretations would be similar for different nested cross-sectional and longitudinal designs. Also, please note that the interpretation for each coefficient is the same when the coefficients appear in the other more restricted models presented in Figure 1.

First, the location (mean) side of the model in Equation 4 (see Figure 1) is just like any other multilevel model: $Y_{i j}$ represents the outcome for individual i nested within group j, $X_{i j}$ represents a Level 1 predictor variable (centered at the group mean, ${\bar{X}}_{. j}$), and $W_{j}$ represents a Level 2 predictor (centered at the grand mean, ${\bar{W}}_{.}$). Please note that the predictors do not have to be the same for the two sides of the model. As usual, $β_{0 j}$ and $β_{1 j}$ are placeholders for the intercepts and slopes, respectively, and are defined in terms of fixed and random effects (i.e., γ’s and u’s). For interpretation purposes, individual-level (Level 1) predictors predict differential mean outcome values between individuals (that could vary across groups depending on random slopes), whereas group-level (Level 2) variables predict differential mean outcomes between groups. Each unit increase in a predictor with a positive fixed effect will result in an increase in the average amount of the outcome.

Second, the scale (variability) side of the MELS model is a log-linear model in which the log of the residual variance or standard deviation (we use the standard deviation parameterization) is predicted by a linear function using predictors of interest. The log-linear model is used to ensure that the predicted standard deviation is positive. As shown on the scale side of Equation 4, each group (j) and individual within that group (i) has its own residual standard deviation estimate, $σ_{e_{i j}}$ (i.e., heterogeneous). The interpretation of the model effects on the scale side is similar to the interpretation of the model effects on the location side (assuming the interpretation is made on the log standard deviation scale). Here, $σ_{e_{i j}}$ denotes the residual standard deviation for a particular individual in a particular group, and $α_{0 j}$ and $α_{1 j}$ are placeholders for the intercepts and slopes, respectively, that are defined by the fixed (i.e., τ’s) and random (i.e., ν’s) effects. For interpretation purposes, individual-level (Level 1) predictors predict differential within-group variability between individuals, whereas group-level (Level 2) predictors predict differential variability between groups. Each unit increase in a predictor with a positive fixed effect will result in increased (log) variability, indicating more variable, or less consistent, group members.

Finally, this model generally assumes all random effects ν’s and u’s are normally distributed with means of 0 and random effect standard deviations $σ_{ν_{q}}$ and $σ_{u_{q}}$, respectively. Further, covariances between all location and scale random effects can be estimated (e.g., $σ_{u_{q}, ν_{s}}$). The examination of covariances allows researchers to answer questions such as whether groups that have higher means also tend to have smaller within-group residual standard deviations. Equation 5 contains the assumptions regarding the distribution of the random effects (i.e., multivariate normal with a 0 mean column vector, and an unstructured covariance matrix that allows all possible covariances between the random effects). Allowing the random effects to covary across sides of the model is a unique aspect of the MELS model that cannot be estimated using the aggregation approach described previously. We provide a full interpretation of these effects in the supplemental material. Please note that the parameterization utilizes standard deviations and correlations as opposed to variances and covariances.

$$\begin{bmatrix} u_{0 j} \\ u_{1 j} \\ ν_{0 j} \\ ν_{1 j} \end{bmatrix} \sim \mathrm{MVNormal}\left( \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} σ_{u_{0}}^{2} & σ_{u_{0}, u_{1}} & σ_{u_{0}, ν_{0}} & σ_{u_{0}, ν_{1}} \\ σ_{u_{1}, u_{0}} & σ_{u_{1}}^{2} & σ_{u_{1}, ν_{0}} & σ_{u_{1}, ν_{1}} \\ σ_{ν_{0}, u_{0}} & σ_{ν_{0}, u_{1}} & σ_{ν_{0}}^{2} & σ_{ν_{0}, ν_{1}} \\ σ_{ν_{1}, u_{0}} & σ_{ν_{1}, u_{1}} & σ_{ν_{1}, ν_{0}} & σ_{ν_{1}}^{2} \end{bmatrix} \right)$$
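To make the random-effects assumption concrete, the following minimal sketch (in Python with NumPy, not the brms code used for the analyses in this article) builds the 4 × 4 covariance matrix from standard deviations and correlations, draws the four random effects for each group, and generates outcomes whose residual standard deviation follows the log-linear scale model. The random-effect standard deviations and correlations echo the True Value column of Table 3; the fixed effects, the sample sizes, and all variable names are assumptions chosen only for illustration, and the Level 2 predictor is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Random-effect SDs in the order (u0, u1, nu0, nu1); values echo Table 3's true values.
sds = np.array([0.23, 0.22, 0.10, 0.10])

# Correlations among the random effects, same order as above.
R = np.array([
    [ 1.0, -0.3, -0.3,  0.0],
    [-0.3,  1.0,  0.3, -0.3],
    [-0.3,  0.3,  1.0, -0.3],
    [ 0.0, -0.3, -0.3,  1.0],
])

# Covariance matrix: Sigma = D R D, with D the diagonal matrix of SDs.
Sigma = np.diag(sds) @ R @ np.diag(sds)

# Hypothetical design: 200 groups of 10 individuals each.
n_groups, n_per_group = 200, 10
ranef = rng.multivariate_normal(np.zeros(4), Sigma, size=n_groups)
u0, u1, nu0, nu1 = ranef.T

# Hypothetical fixed effects (assumed for this sketch only).
gamma00, gamma10 = 0.0, 0.5    # location intercept and slope
tau00, tau10 = -0.14, -0.15    # scale intercept and slope (log-SD metric)

# Level 1 predictor, centered at the group mean.
x = rng.normal(size=(n_groups, n_per_group))
x = x - x.mean(axis=1, keepdims=True)

# Location side: group-specific intercepts and slopes for the mean.
mu = (gamma00 + u0)[:, None] + (gamma10 + u1)[:, None] * x

# Scale side: group-specific intercepts and slopes for the log residual SD.
log_sd = (tau00 + nu0)[:, None] + (tau10 + nu1)[:, None] * x
sigma_e = np.exp(log_sd)       # exponentiating guarantees a positive SD

y = rng.normal(mu, sigma_e)    # one outcome per individual
```

Because the scale side is modeled on the log metric, every element of `sigma_e` is positive no matter which values the random effects take.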

If the scale side of the model contains fixed, but not random, effects of predictors (i.e., no $ν_{0 j}$ or $ν_{1 j}$), then the model is considered to be a heterogeneous variance model. As described previously, heterogeneous variance models are a restricted version of the MELS model (compare Equation 3 to Equation 4 in Figure 1). This comparison is similar to how linear regression is a restricted version of multilevel models. Heterogeneous variance models, by their inclusion of only fixed effects on the scale side, predict variances that differ for systematic reasons (e.g., males are more variable than females). Recently, Lang et al. (2018) demonstrated the use of an extended multilevel model for heterogeneous variances to predict the emergence of within-group consensus over time. In their article, they estimated several models with random slopes on the location side and the fixed effect of time on the scale side. Some of these models included additional moderators of the effect of time on the scale side (e.g., leadership status and group readiness).

We build on this work by Lang and colleagues (2018) by demonstrating how random slopes may also be included on the scale side of the model. This extension is important because, in a different context, Leckie et al. (2014) and Walters et al. (2018) found that ignoring random effects on the scale side can lead to inflated Type I error rates when predicting heterogeneous within-group variances, specifically when testing hypotheses regarding Level 2 variables. Thus, heterogeneous variance models are susceptible to inflated Type I error rates if additional random effects are erroneously omitted. We want to emphasize that we are not saying the application of heterogeneous variance models is wrong, but rather using these models assumes a systematically varying cross-level interaction (i.e., the variances are heterogeneous due to time as well as other fixed effect interaction terms; Davidian & Giltinan, 1995; Hoffman, 2007; Raudenbush & Bryk, 2002). It is possible to test this assumption by examining the scale-model random effects in the more general MELS models.

A Bayesian Approach

Several years ago, Kruschke, Aguinis, and Joo (2012) outlined a case for the need to apply Bayesian methods in the organizational sciences (see Appendix B for a brief introduction to Bayesian statistics). Kruschke et al. recommend five steps researchers should follow when presenting the findings of a Bayesian analysis. We use these step-by-step guidelines as we make a case for the use of Bayesian methods to estimate MELS and heterogeneous variance models and present the results from our illustrative examples.

First, it is important for researchers to motivate the use of Bayesian methods. While researchers may estimate simple versions of multilevel models predicting heterogeneous variances using maximum likelihood estimation, convergence issues are likely to occur as the number of random effects increases on the location and scale sides of the model (Asparouhov & Muthén, 2016). Thus, we use Bayesian analysis to “open the door to extensive new realms of modeling possibilities that were previously inaccessible” (Kruschke et al., 2012, p. 723). In other words, Bayesian approaches allow researchers to estimate complex models that would be extremely difficult if not impossible to estimate using frequentist approaches (e.g., MELS models that have random intercepts and slopes on both the location and scale sides of the model).

Second, researchers should describe the model and its parameters in detail. We provide this information for both of our illustrative examples in Appendices D and E as well as in the supplemental material. Third, researchers should justify the use of the prior in their analysis. The priors we used were intended to be uninformed (i.e., should not affect the posterior distribution). Thus, for each example, we justify the selection of our prior by performing a sensitivity analysis to demonstrate that the use of different priors did not affect the results.

Fourth, Kruschke et al. (2012) encouraged researchers to describe the Markov chain Monte Carlo (MCMC) process in detail (see Appendix C). The models are estimated using Stan (Carpenter et al., 2017) through the brms R package (Bürkner, 2017). Stan is a powerful probabilistic programming language that allows researchers to estimate many different types of complex models. The Stan programming language is similar to other open-source Bayesian software like OpenBUGS (Spiegelhalter, Thomas, Best, & Lunn, 2007) or JAGS (Plummer, 2013). However, working with the Stan code directly is more difficult than working with these other software programs or typical R packages. The brms package was written to reduce this burden (Bürkner, 2017) and may be used by researchers to generate efficient Stan code. The brms package requests that you provide code that is similar to typical multilevel model code in R (e.g., see the lme4 R package; Bates, Maechler, Bolker, & Walker, 2015). The brms code is then translated into the Stan language (i.e., brms creates the Stan code as well as the Stan data file).

Estimating a Bayesian MELS or heterogeneous variance model in Stan offers at least two important advantages over frequentist approaches. First, these models can partition the variability multiple times on both the location and the scale side of the model. Thus, inevitably, some of the variance components are going to become small and likely inestimable using a frequentist framework (Asparouhov & Muthén, 2016). As mentioned previously, a Bayesian approach allows researchers to estimate models with small variance components. Second, Stan uses a different sort of Bayesian sampling procedure (i.e., Hamiltonian Monte Carlo estimation, specifically the No-U-Turn Sampler; Hoffman & Gelman, 2014; see Appendix C for additional explanation) that is more efficient than Metropolis-Hastings sampling and more flexible than Gibbs sampling (i.e., other commonly used sampling procedures in Bayesian analyses).

In addition to describing this estimation procedure, researchers should provide evidence that the sampler has converged and adequately sampled throughout the posterior distribution (Kruschke et al., 2012). For each example, we provide evidence that the different starting values (i.e., different chains) converge to the same distribution and that autocorrelation is not playing too large of a role in the results. Fifth and finally, for each example, we provide detailed interpretations of the posterior distribution. This step-by-step process should serve as a guide for researchers estimating Bayesian MELS and heterogeneous variance models.

Illustrative Examples

In this section, we provide illustrative examples for nested cross-sectional and longitudinal designs. These examples utilize simulated data (see Appendix D for details regarding data generation) to demonstrate the sorts of research questions that can be addressed using MELS and heterogeneous variance models. Given that the purpose of this article is to demonstrate the flexibility of these methods, the illustrative examples involve relatively complex models that incorporate random slopes. In Appendix E, we provide model alternatives that gradually increase in complexity from an empty linear regression model to a MELS model with random slopes on both sides of the model. These buildup analyses are presented using general notation (i.e., not in the context of the illustrative examples). In the remainder of this section, we briefly describe illustrative research questions and then present the analysis and results as would be typical in a published empirical study. In the supplemental material, we provide the code needed to generate the data that serve as the basis for our illustrative examples as well as the code needed to analyze these data. We also provide interpretations of all parameter estimates (see “Readme.doc” for the intended use of each file). The level of detail provided in the supplemental material would not typically be provided in published articles; this information is meant to aid researchers as they move from output interpretation to writing up the results of their own study.

Nested Cross-Sectional Example: LMX Differentiation

This first example illustrates how the MELS model can be used with nested cross-sectional data to provide new insights into the nature of leader-member exchange (LMX) differentiation. It is common for supervisors to develop different quality relationships with the subordinates they supervise (i.e., who are nested within the leader’s group) resulting in a pattern of LMX that can be described by central tendency (e.g., the mean), variation (e.g., the standard deviation), and relative position (i.e., how does an individual’s LMX compare to others with the same leader; Martin, Thomas, Legood, & Dello Russo, 2018). The MELS model allows researchers to predict these three properties in a single model, while appropriately considering model uncertainty and avoiding the use of aggregation and its associated problems. In this illustrative example, we consider the effects of transformational leadership (i.e., a Level 2 predictor) and a leader’s perceived similarity with each subordinate (i.e., a Level 1 predictor) on LMX reported by subordinates.

Previous research has found that perceived similarity is predictive of higher-quality LMX relationships (Liden, Wayne, & Stilwell, 1993). As a Level 1 variable, perceived similarity can have between, within (relative), and contextual effects (Enders & Tofighi, 2007; Kreft, De Leeuw, & Aiken, 1995) on either the location or the scale side of the model.² The within-group level of the model (i.e., Level 1 or subordinate level), which applies to both the location side and the scale side of the model, predicts the relative position piece of the LMX outcomes (see Equation D1 in Appendix D). In this example, γ₁₀ denotes the relative similarity effect on the location side of the model, and τ₁₀ denotes the same effect on the scale side.

Henderson et al. (2009) proposed that transformational leadership may combine with perceived similarity to predict LMX differentiation. They suggest that more transformational leaders (e.g., Leader A in Figure 4) will be less affected by perceived relative similarity. Specifically, we expect that more transformational leaders will have higher mean LMX ratings and that their LMX ratings will be less variable regardless of perceived similarity. In contrast, we hypothesize that less transformational leaders will have lower mean LMX and that their ratings will be more variable (i.e., see differentiation or distance between the data and line for the subordinates who report to Leader B in Figure 4). For these leaders, relative similarity will be more likely to affect relative LMX ratings in terms of the predicted mean LMX as well as in terms of the variation around that predicted mean. Leaders low on transformational leadership will have higher and more consistent LMX with similar group members. These effects manifest themselves in Figure 4 for Leader B as a positive slope of relative similarity on relative LMX and a shrinking distance between the x symbols and the predicted mean (demarked by the line) as similarity increases. We examine these assertions in our illustrative analysis.

Figure 4.

Depiction of hypothetical leaders who differ in transformational leadership. The two lines depicted in this figure represent different leaders (A & B). The location of the subordinates (depicted by o) around Leader A’s line (who is high in transformational leadership) indicates that this leader has high and consistent leader-member exchange (LMX) relationships with subordinates regardless of the level of similarity between the leader and the subordinate. Consistency is indicated by the equal distance of the o symbols from the line. The location of the subordinates (depicted by x) around Leader B’s line (who is low in transformational leadership) indicates that LMX for this leader depends upon similarity such that the leader has higher LMX with subordinates who are more similar to the leader. Further, the distance of the xs from the line indicates that subordinates who are more similar to the leader are treated more consistently.

Analysis and Results

Prior Justification

To assess the sensitivity of the inferences to the prior, we utilized multiple sets of priors (i.e., two noninformative or vague options and one informed option). The results did not vary greatly. Thus, we present the results using the default uninformed priors. We provide the specific priors used in these analyses below and refer readers to McElreath (2016) and Bürkner (2017) for more information regarding the choice of priors. One advantage of using brms with uninformed priors is that the user does not have to specify the priors. As a default, priors designed not to influence the final results are incorporated into the analysis. In the supplemental material, we illustrate that in less complex models where both the frequentist and Bayesian models are estimated, the results match to two or three decimal places when using similar priors. Thus, these priors are not affecting the final results. In this analysis, improper flat priors were used for the fixed effects (i.e., the distribution does not integrate to one, and every possible parameter value is treated as equally likely; Bürkner, 2017). The default prior of a half Student-t distribution with 3 degrees of freedom and location and scale parameters of 0 and 10, respectively, was used for the standard deviations of the random effects. The prior for the 4 × 4 correlation matrix is LKJcorr(1) (Lewandowski, Kurowicka, & Joe, 2009), which is a flat prior over all valid correlation matrices. The analysis model is the same as the model that was used to generate the data (see Appendix D).
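One way to see why a half Student-t prior with 3 degrees of freedom and a scale of 10 is vague for this example is to evaluate its density directly. The sketch below does so in plain Python from the standard Student-t density formula; the function name is hypothetical and the code is not taken from brms or Stan.

```python
import math

def half_student_t_pdf(x, df=3.0, scale=10.0):
    """Density of a Student-t(df, 0, scale) distribution truncated to x >= 0."""
    if x < 0:
        return 0.0
    z = x / scale
    # Normalizing constant of the standard Student-t density.
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    # Factor of 2 renormalizes after discarding the negative half.
    return 2.0 / scale * norm * (1 + z * z / df) ** (-(df + 1) / 2)

# Density relative to its value at zero for several candidate SD values.
density_at_zero = half_student_t_pdf(0.0)
relative_density = {
    sd: half_student_t_pdf(sd) / density_at_zero for sd in (0.25, 1.0, 10.0)
}
for sd, ratio in relative_density.items():
    print(f"SD = {sd:>5}: density relative to SD = 0 is {ratio:.4f}")
```

Over the range in which the random-effect standard deviations of this example fall (roughly 0 to 1), the density changes by less than 1%, so the prior is close to flat there and contributes little information.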

Model Convergence

Before interpreting the results from the MCMC process, we assessed whether the sampler converged and adequately sampled throughout the posterior distribution. If this is not the case, then the results cannot be interpreted. Two potential problems that could prohibit reaching these goals are start value sensitivity and high autocorrelation among the sampled parameter estimates. Like any estimation or sampling procedure, the MCMC sampler used by Stan (i.e., the No-U-Turn Sampler) needs starting values; thus, four chains with different starting values are used (see Appendix B for more information about Bayesian statistics in general and Appendix C for the No-U-Turn Sampler in particular). Moreover, to ensure that potentially poor starting values did not influence the posterior distribution estimates, the initial estimates were discarded (i.e., the first half, or 2,500, of the 5,000 samples taken from each chain in this study). The potential scale reduction factor $(\hat{R})$ was used to assess whether the chains converged to the same posterior distribution; $\hat{R}$ was less than 1.05 for all parameters. Thus, the chains are considered to have converged (Asparouhov & Muthén, 2010; see Table 3).
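The logic of the potential scale reduction factor can be sketched in a few lines. The function below implements the classic version of the statistic (the square root of the estimated total variance divided by the within-chain variance) in Python with NumPy; it is an illustrative sketch with hypothetical names, and software such as Stan reports a refined split-chain variant rather than this exact formula. The simulated chains are artificial stand-ins for real sampler output.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Classic R-hat for one parameter; chains has shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n_draws = chains.shape[1]
    # Average of the within-chain variances.
    within = chains.var(axis=1, ddof=1).mean()
    # Between-chain variance, scaled by the number of draws per chain.
    between = n_draws * chains.mean(axis=1).var(ddof=1)
    # Pooled estimate of the total posterior variance.
    total = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(total / within))

rng = np.random.default_rng(1)

# Four chains of 2,500 retained draws that all target the same distribution.
mixed = rng.normal(0.0, 1.0, size=(4, 2500))

# Four chains stuck in different regions (each shifted by a different constant).
stuck = mixed + np.array([[0.0], [1.0], [2.0], [3.0]])

rhat_mixed = potential_scale_reduction(mixed)
rhat_stuck = potential_scale_reduction(stuck)
print(f"well-mixed chains: {rhat_mixed:.3f}; stuck chains: {rhat_stuck:.3f}")
```

When the chains agree, the between-chain component adds almost nothing and the ratio is close to 1; when the chains disagree, the total variance exceeds the within-chain variance and the statistic rises well above the 1.05 threshold.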

Table 3.

Random and Fixed Effect Output for the Nested Cross-Sectional Leader-Member Exchange Differentiation Example.

Random Effect Standard Deviations and Correlations		brms Output Label	$\hat{μ}$	$\hat{σ}$	Q_.025	True Value	Q_.975	Cover	Effective Sample Size
${\hat{σ}}_{u_{0}}$		sd(Intercept)	0.25	0.03	0.18	0.23	0.32	1	3567
${\hat{σ}}_{u_{1}}$		sd(similarL1_groupmc)	0.22	0.04	0.14	0.22	0.30	1	2775
${\hat{σ}}_{ν_{0}}$		sd(sigma_Intercept)	0.09	0.04	0.01	0.10	0.17	1	1075
${\hat{σ}}_{ν_{1}}$		sd(sigma_similarL1_groupmc)	0.05	0.03	0.00	0.10	0.12	1	3019
${\hat{σ}}_{u_{0}, u_{1}}$		cor(Intercept, similarL1_groupmc)	-0.24	0.19	-0.60	-0.30	0.16	1	2577
${\hat{σ}}_{u_{0}, ν_{0}}$		cor(Intercept, sigma_Intercept)	0.02	0.32	-0.62	-0.30	0.65	1	4394
${\hat{σ}}_{u_{1}, ν_{0}}$		cor(similarL1_groupmc, sigma_Intercept)	0.18	0.36	-0.60	0.30	0.81	1	2918
${\hat{σ}}_{u_{0}, ν_{1}}$		cor(Intercept, sigma_similarL1_groupmc)	0.25	0.41	-0.64	0.00	0.88	1	6928
${\hat{σ}}_{u_{1}, ν_{1}}$		cor(similarL1_groupmc, sigma_similarL1_groupmc)	-0.06	0.42	-0.81	-0.30	0.76	1	8784
${\hat{σ}}_{ν_{0}, ν_{1}}$		cor(sigma_Intercept, sigma_similarL1_groupmc)	0.05	0.43	-0.78	-0.30	0.82	1	6853
Fixed or Population-Level Effects		brms Output Label	$\hat{μ}$	$\hat{σ}$	Q_.025	True Value	Q_.975	Cover	Effective Sample Size
${\hat{γ}}_{00}$	Location Intercept	Intercept	-0.05	0.03	-0.11	0.00	-0.00	0	8890
${\hat{τ}}_{00}$	Scale Intercept	sigma_Intercept	-0.14	0.02	-0.19	-0.14	-0.10	1	3980
${\hat{γ}}_{10}$	Similarity location	similar_groupmc	0.57	0.03	0.51	0.56	0.63	1	9411
${\hat{γ}}_{01}$	Similarity Level-2 location	similarL2	0.64	0.06	0.53	0.56	0.76	1	9743
${\hat{γ}}_{02}$	Transformational location	transform_grandmc	0.37	0.03	0.32	0.37	0.42	1	8897
${\hat{γ}}_{11}$	Location Interaction	similarL1_groupmc:transform_grandmc	-0.27	0.03	-0.33	-0.28	-0.21	1	9926
${\hat{τ}}_{10}$	Similarity scale	sigma_similarL1_groupmc	-0.17	0.03	-0.22	-0.15	-0.12	1	6364
${\hat{τ}}_{01}$	Similarity Level-2 scale	sigma_similarL2	-0.16	0.04	-0.25	-0.15	-0.07	1	12878
${\hat{τ}}_{02}$	Transformational scale	sigma_transform_grandmc	-0.07	0.02	-0.10	-0.10	-0.03	1	9830
${\hat{τ}}_{11}$	Scale Interaction	sigma_similarL1_groupmc:transform_grandmc	0.10	0.02	0.06	0.08	0.15	1	10634

Note: The brms Output Label would not typically be provided in this kind of table; however, we provide it here so that readers can easily link the output from the analysis to the information presented in the table. $\hat{μ}$ is the posterior mean, $\hat{σ}$ is the posterior standard deviation, Q_.025 is the .025 quantile of the posterior distribution or the lower limit of the 95% credible interval, and Q_.975 is the .975 quantile of the posterior distribution or the upper limit of the 95% credible interval. True Value refers to the value that was used to generate the data. Effective Sample Size is an estimate of the amount of independent information in the autocorrelated samples. $\hat{R}$ is the square root of the total variance divided by the within-chain variance and is equal to 1.00 for all parameter estimates. Cover (coverage) indicates whether the interval defined by the two quantiles contains the true value; 1 indicates that the interval does contain the true value. σs correspond to standard deviations when they contain only a single subscript. For example, ${\hat{σ}}_{ν_{1}}$ is the standard deviation of the scale side random slope for similarity. σs with double subscripts are used to indicate correlations. For example, ${\hat{σ}}_{u_{1}, ν_{1}}$ is the estimated correlation between the location side and the scale side random slopes.

To assess whether autocorrelation is prohibiting adequate sampling from the entire posterior distribution, we investigated two sources of information (i.e., trace plots and the effective sample size). The trace plots displaying the sampled estimates for each iteration have a slope of approximately zero; thus, there is no systematic pattern to the autocorrelation (Figure 5). In other words, the plot is indicative of random noise. Additionally, the effective sample size reveals that autocorrelation was not problematic as the ratio of the number of effective samples to the total number of samples taken was greater than .1 (Kruschke, 2015). An additional diagnostic that supports the claim that the sampler performed well is that the posterior predictive plot (Figure 6) also revealed that the model reproduced the data indicating adequate model “fit.”
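The effective sample size criterion can be checked directly from the values reported in Table 3. The sketch below, in plain Python, assumes the total number of retained samples is 4 chains × 2,500 post-warmup draws = 10,000, as implied by the sampling description; the variable names are hypothetical.

```python
# Effective sample sizes copied from the last column of Table 3,
# keyed by the brms output label.
effective_sample_size = {
    "sd(Intercept)": 3567,
    "sd(similarL1_groupmc)": 2775,
    "sd(sigma_Intercept)": 1075,
    "sd(sigma_similarL1_groupmc)": 3019,
    "cor(Intercept, similarL1_groupmc)": 2577,
    "cor(Intercept, sigma_Intercept)": 4394,
    "cor(similarL1_groupmc, sigma_Intercept)": 2918,
    "cor(Intercept, sigma_similarL1_groupmc)": 6928,
    "cor(similarL1_groupmc, sigma_similarL1_groupmc)": 8784,
    "cor(sigma_Intercept, sigma_similarL1_groupmc)": 6853,
    "Intercept": 8890,
    "sigma_Intercept": 3980,
    "similar_groupmc": 9411,
    "similarL2": 9743,
    "transform_grandmc": 8897,
    "similarL1_groupmc:transform_grandmc": 9926,
    "sigma_similarL1_groupmc": 6364,
    "sigma_similarL2": 12878,
    "sigma_transform_grandmc": 9830,
    "sigma_similarL1_groupmc:transform_grandmc": 10634,
}

# Four chains, 5,000 samples each, first 2,500 of each discarded as warmup.
n_chains, samples_per_chain, warmup = 4, 5000, 2500
total_draws = n_chains * (samples_per_chain - warmup)

# Ratio of effective to total samples for every parameter.
ratios = {label: ess / total_draws for label, ess in effective_sample_size.items()}

worst_label = min(ratios, key=ratios.get)
all_above_threshold = all(ratio > 0.1 for ratio in ratios.values())
print(f"smallest ratio: {ratios[worst_label]:.4f} ({worst_label})")
print(f"all ratios above .1: {all_above_threshold}")
```

The smallest ratio belongs to the scale-side random intercept standard deviation, which is consistent with small variance components being the hardest quantities to sample efficiently.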

Figure 5.

Density and trace plot for the Markov chain Monte Carlo samples for the nested cross-sectional leader-member exchange differentiation example.

Figure 6.

Posterior predictive plot for the nested cross-sectional leader-member exchange differentiation example. The observed data are in black, and the data generated from the model are in gray.

The final indication that the model is performing as expected is only available because we simulated the data. The credible intervals contained the data-generating parameters for all the parameters except for the location intercept. Thus, the 95% credible intervals contained the data-generating (true) value for 19 of the 20 estimated parameters (i.e., 95%). Also note that the model was able to detect all nonzero fixed effects and the nonzero random effect standard deviations; the ability to detect nonzero effects is important, because having credible intervals that contain all parameters may be an indication that the intervals are too wide. This does not appear to be the case. However, there were some effects that the model did not detect; none of the nonzero correlations between the scale model random effects were significant. Thus, a larger sample size is needed to detect these correlations.

Interpretation of Posterior Distribution

Both the mean and the variance sides of the model contain the similarity-by-transformational leadership cross-level interaction, the associated simple main effects, and the Level 2 similarity effect. The location intercept, ${\hat{γ}}_{00} = -.05$, CI(–.11, –.00), is the predicted mean value when all predictors are zero (i.e., for a subordinate at the group mean for perceived similarity, in a group that has an average amount of similarity and a leader with an average amount of transformational leadership). The simple main effect of transformational leadership indicates that when similarity is at the group mean, a one-unit increase in transformational leadership results in an expected .37 increase in the LMX mean, ${\hat{γ}}_{02} = .37$, CI(.32, .42). The Level 1 similarity effect on the location side of the model is estimated to be ${\hat{γ}}_{10} = .57$, CI(.51, .63), indicating that subordinates with a leader who has an average amount of transformational leadership are predicted to have a .57 unit mean increase in LMX for each unit increase in similarity. This effect is weakened by the interaction term, ${\hat{γ}}_{11} = -.27$, CI(–.33, –.21) (see Figure 7). That is, for each additional unit increase in transformational leadership, the effect of similarity on the LMX mean becomes smaller by .27.
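The moderation can be made concrete by writing out the simple slope of similarity implied by the posterior means in Table 3. In the worked equation below, $T_{j}$ is our own shorthand (not notation from Figure 1) for grand-mean-centered transformational leadership, and the three values of $T_{j}$ are chosen only for illustration.

```latex
\begin{aligned}
\text{slope}(T_{j}) &= \hat{\gamma}_{10} + \hat{\gamma}_{11} T_{j} = .57 - .27\,T_{j} \\
T_{j} = -1: \quad & .57 - .27(-1) = .84 \\
T_{j} = 0: \quad & .57 - .27(0) = .57 \\
T_{j} = +1: \quad & .57 - .27(1) = .30
\end{aligned}
```

Under these estimates, similarity still predicts higher LMX for a leader one unit above the grand mean, but the slope is roughly half as steep as for an average leader.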

Figure 7.

Cross-level interaction plot demonstrating how transformational leadership moderates the effect of similarity on leader-member exchange for the location side of the model (i.e., the predicted mean). The bands reflect the 95% confidence interval.

The scale side of the model is interpreted similarly. The scale intercept, ${\hat{τ}}_{00} = -.14$, CI(–.19, –.10), is the predicted log residual standard deviation for an employee who is at the group mean for perceived similarity, is in a group that has an average amount of similarity, and has a leader with an average amount of transformational leadership. In other words, like any intercept, ${\hat{τ}}_{00}$ represents the predicted value when all of the predictors are zero. The simple main effect of transformational leadership indicates that when similarity is at the group mean, a one-unit increase in transformational leadership results in a .07 decrease in the LMX log residual standard deviation, ${\hat{τ}}_{02} = -.07$, CI(–.10, –.03). In other words, transformational leaders tend to have more consistent (or less variable) LMX relationships with their subordinates. The similarity effect, ${\hat{τ}}_{10} = -.17$, CI(–.22, –.12), indicates that when transformational leadership is at its grand mean, a one-unit increase in similarity is expected to decrease the log residual LMX standard deviation by .17. The cross-level interaction effect, ${\hat{τ}}_{11} = .10$, CI(.06, .15), weakens the similarity effect by .10 for each one-unit increase in transformational leadership. As seen in Figure 8, the slope of similarity is weaker (i.e., less negative) for those higher in transformational leadership.

Figure 8.

Cross-level interaction plot demonstrating how transformational leadership moderates the effect of similarity on leader-member exchange for the scale side of the model. This plot presents the results for the log residual standard deviation.

In addition to interpreting the scale side of the model on the log metric, the model can also be interpreted in terms of standard deviations (Lang et al., 2018). These standard deviations are the predicted residual standard deviation (i.e., square root of the residual variance) given specific predictor values. Computing the predicted standard deviation is performed by exponentiating the coefficients; the standard deviations can then be converted into a percentage change that provides a meaningful effect size metric. See Appendix F for a more detailed explanation for this calculation. For example, assuming average similarity (i.e., both the Level 2 and Level 1 predictor values are zero), the predicted standard deviation for a group with a leader that is one unit above average on transformational leadership is calculated as $e^{-.14 - .07}$. Thus, their predicted standard deviation value is approximately .81. If the leader had an average level of transformational leadership, then the predicted value would be $e^{-.14}$, which is approximately .87. The percentage change is computed as (.81 – .87)/.87, which is approximately –.07 or a 7% decrease. Thus, for a one-unit increase in transformational leadership the standard deviation of LMX is reduced by approximately 7%.
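The calculation just described can be reproduced in a few lines of plain Python. The sketch uses the posterior means from Table 3; the variable names are hypothetical.

```python
import math

tau00 = -0.14   # scale intercept (log residual SD at average predictor values)
tau02 = -0.07   # simple main effect of transformational leadership (log metric)

# Predicted residual SD for a leader with average transformational leadership.
sd_average = math.exp(tau00)

# Predicted residual SD for a leader one unit above average.
sd_plus_one = math.exp(tau00 + tau02)

# Percentage change in the predicted SD for the one-unit increase.
pct_change = (sd_plus_one - sd_average) / sd_average * 100

# The intercept cancels, so the same quantity follows from the slope alone.
pct_change_from_slope = (math.exp(tau02) - 1) * 100

print(f"SD at average: {sd_average:.2f}; SD at +1 unit: {sd_plus_one:.2f}")
print(f"percentage change: {pct_change:.2f}%")
```

Working from unrounded values gives a change of about –6.8%, which agrees with the approximately 7% decrease obtained from the rounded standard deviations. Because the intercept cancels, the percentage change for a one-unit increase depends only on the coefficient being exponentiated.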

A similar interpretation could be made for the similarity effect for those in a group with a leader that has an average level of transformational leadership and an average group similarity. A one-unit increase in perceived similarity results in a percentage change of $[(e^{-.14 - .17} - e^{-.14})/e^{-.14}] \times 100 = -15.63$, that is, a 15.63% decrease in the predicted standard deviation of LMX. The percentage change is smaller for more transformational leaders. Further, Figures 8 and 9 demonstrate how the slope for similarity as a predictor becomes smaller as transformational leadership increases.
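To illustrate how the percentage change shrinks for more transformational leaders, the worked equation below applies the same calculation at two levels of transformational leadership, using the posterior means from Table 3. Here $T_{j}$ is our own shorthand for grand-mean-centered transformational leadership, and the choice of $T_{j} = 1$ is only an example.

```latex
\begin{aligned}
\%\Delta(T_{j}) &= \left( e^{\hat{\tau}_{10} + \hat{\tau}_{11} T_{j}} - 1 \right) \times 100 \\
T_{j} = 0: \quad & \left( e^{-.17} - 1 \right) \times 100 \approx -15.63 \\
T_{j} = 1: \quad & \left( e^{-.17 + .10} - 1 \right) \times 100 = \left( e^{-.07} - 1 \right) \times 100 \approx -6.76
\end{aligned}
```

For a leader one unit above the grand mean, a one-unit increase in similarity therefore reduces the predicted standard deviation by roughly 7% rather than roughly 16%.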

Figure 9.

Cross-level interaction plot demonstrating how transformational leadership moderates the effect of similarity on leader-member exchange for the scale side of the model. This plot is rescaled to represent the residual standard deviation.

In summary, LMX relationship quality is affected less by perceived similarity when leaders are higher in transformational leadership. In terms of central tendency, a positive simple main effect of relative similarity (i.e., γ₁₀) indicates that those with relatively more similarity are expected to have higher mean levels of LMX, but that effect decreases as transformational leadership increases. In terms of variation, relatively high similarity results in more consistent LMX relationships. This effect decreases as transformational leadership increases, indicating that transformational leaders treat subordinates consistently regardless of similarity. Groups with similar members have higher mean levels of LMX and less differentiation in LMX ratings.

Longitudinal Example: Between Firm Performance Variability

This second example illustrates how a heterogeneous variance model could be used with longitudinal data to provide new insights into between-firm performance variability differences based on the strategy enacted by different firms. Imitation has traditionally been assumed to reduce between-firm performance heterogeneity in an industry. However, Posen and Martignoni (2018) challenged that assumption with a computational model that illustrates the conditions in which imitation can increase between-firm performance heterogeneity. They theorize that due to limited information, imitating firms will be unable to duplicate the desired strategy completely and obtain the desired performance. Instead, “copying” firms will have to engage in post-imitation learning where the initial imitation strategy is adjusted. Firms may vary widely in their post-imitation learning ability. Thus, imitation results in both an increased likelihood of achieving the desired performance and an increased likelihood of performing far worse. Consequently, there will be increased between-firm performance variability over time when firms in an industry adopt an imitation strategy.

The data generated in this hypothetical example are meant to emulate monthly performance data across a fiscal year (see Appendix D for details). Thus, time is nested within firm, and each firm is measured on 12 equally spaced occasions. We compare an “imitator” set of firms to a “business as usual” set of firms (i.e., those that continue to perform their preexisting strategies). The outcome is defined generically as firm performance. In terms of heterogeneous variance models, this specific model contains a cross-level interaction on the location side (i.e., the same model as seen in the nested cross-sectional example) and a heterogeneous pattern of Level 2 covariance matrices (i.e., covariance matrices that vary systematically with the strategy adopted). The easiest way to estimate this model in brms (rather than in Stan directly) is to employ a multiple-group approach in which each group (i.e., imitators versus business as usual) is allowed to have its own covariance matrix. This approach is similar to some of the approaches described by Kuppens and Yzerbyt (2014). The location side of the model includes a random intercept and a random linear slope for time, as well as their covariance. In other words, the model allows firms to differ in their initial performance (i.e., at the beginning of the fiscal year) and in how much their performance changes from month to month. The heterogeneous between-group covariance matrices are based on the following rationale.

The decision to imitate the industry leader may be a radical change from existing practices, and the decision-makers in a firm are likely to resist such radical change. This drastic step would not be taken unless the firm is performing poorly. Based on this rationale, at the beginning of the fiscal year, the imitating firms are hypothesized to have consistently poor performance (i.e., smaller between-group variability in the intercepts). The location slope variability is hypothesized to be larger in the imitating firms for the reasons described previously. Finally, the correlation (i.e., standardized covariance) between the intercept and slope is generated to be larger in the imitation group than in the business as usual group. This hypothesis is based on the rationale that very poor initial performance is expected to be associated with a lack of resources or ability to compete in the industry. In other words, such firms are unlikely to succeed no matter which strategy they choose.
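The data-generating structure described in this section can be sketched as follows. This is a minimal Python illustration, not the code from Appendix D: the parameter values are taken from the True Value column of Table 4, whereas the number of firms per group, the 0 to 11 coding of time, and all function and variable names are assumptions made for the example.

```python
import math
import random

# Values from the "True Value" column of Table 4; everything else is assumed.
FIXED = {"intercept": 0.0, "time": 0.0, "strat": -1.0, "time_x_strat": 0.0}
RANDOM = {
    0: {"sd_int": 1.41, "sd_slope": 1.00, "cor": 0.30},  # business as usual
    1: {"sd_int": 1.00, "sd_slope": 1.73, "cor": 0.50},  # imitators
}
SIGMA = 1.90  # residual standard deviation, common to both groups


def simulate_firms(n_firms_per_group=100, n_months=12, seed=1):
    """Return rows (firm, strat, time, y) plus each firm's random effects."""
    rng = random.Random(seed)
    rows, effects = [], []
    firm = 0
    for strat, par in RANDOM.items():
        for _ in range(n_firms_per_group):
            # Correlated intercept and slope deviations for this firm,
            # built from two independent standard normal draws.
            z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
            u0 = par["sd_int"] * z0
            u1 = par["sd_slope"] * (
                par["cor"] * z0 + math.sqrt(1 - par["cor"] ** 2) * z1
            )
            effects.append((strat, u0, u1))
            for t in range(n_months):  # time coded 0..11 (assumed)
                mean = (
                    FIXED["intercept"] + u0
                    + (FIXED["time"] + u1) * t
                    + FIXED["strat"] * strat
                    + FIXED["time_x_strat"] * t * strat
                )
                rows.append((firm, strat, t, mean + rng.gauss(0, SIGMA)))
            firm += 1
    return rows, effects


rows, effects = simulate_firms()
print(len(rows))  # 2 groups x 100 firms x 12 months = 2400
```

Because each group draws its random effects from its own covariance matrix, the simulated imitators start out more alike but diverge more over time than the business as usual firms, which is the pattern the multiple-group model is meant to recover.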

Analysis and Results

Prior Justification

As in the first illustrative example, the analysis model is the same as the data-generating model, and we use noninformative default priors. The default noninformative priors for the fixed effects are improper (i.e., the distribution does not integrate to one) flat priors (Bürkner, 2017). The standard deviations of the random effects in brms have a default prior of a half Student-t distribution with 3 degrees of freedom and location and scale parameters of 0 and 10, respectively. The prior for each of the 2 × 2 correlation matrices is the LKJcorr(1) distribution (Lewandowski et al., 2009), which is a flat prior over all valid correlation matrices.
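To make these priors concrete, the following minimal sketch evaluates a half Student-t density with 3 degrees of freedom, location 0, and scale 10, and checks numerically that it integrates to roughly one over the nonnegative real line. For a 2 × 2 correlation matrix, a flat prior over valid correlation matrices corresponds to a uniform density on the single correlation over (-1, 1). The function names are hypothetical, and the code is a standard-library illustration rather than the brms or Stan implementation.

```python
import math


def half_student_t_pdf(x, df=3.0, scale=10.0):
    """Density of a half Student-t with location 0, evaluated at x."""
    if x < 0:
        return 0.0
    z = x / scale
    norm = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    # The factor 2 folds the symmetric t density onto the nonnegative half-line.
    return 2.0 * norm * (1 + z * z / df) ** (-(df + 1) / 2) / scale


def lkj1_pdf_2x2(r):
    """Flat density for the single correlation of a 2 x 2 correlation matrix."""
    return 0.5 if -1 < r < 1 else 0.0


def trapezoid(f, lower, upper, steps):
    """Plain trapezoidal rule, adequate for this smooth density."""
    h = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    for i in range(1, steps):
        total += f(lower + i * h)
    return total * h


# Integrate far into the heavy tail; the remaining mass is negligible.
mass = trapezoid(half_student_t_pdf, 0.0, 5000.0, 200_000)
print(round(mass, 3))
```

The heavy tail of the half Student-t is what makes this prior weakly informative: it places most of its mass on moderate standard deviations while still allowing very large values.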

Model Convergence

The $\hat{R}$ convergence diagnostic is less than 1.05 for all parameter estimates, which suggests that the sampler has converged (see Table 4). The effective sample size ratio, computed as the number of effective samples divided by the total number of samples (i.e., n_eff/10,000), is larger than .1 (equivalently, n_eff > 1,000) for all parameters, which suggests that autocorrelation among the sampled estimates is not causing the sampling procedure to get stuck in any portion of the posterior (see Table 4). The trace plots (Figure 10) indicate no problems with autocorrelation because the lines show no trend (i.e., their slope is zero) and the plots resemble random noise. Finally, the posterior predictive plot (Figure 11) reveals that the model reproduces the observed data, indicating that the model “fits” the data. The final indication that the model is performing as expected is available only because we simulated the data: the credible intervals contained the data-generating values for all the parameters.
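These convergence checks can be sketched in a few lines. The following minimal Python example computes a basic (non-split) version of $\hat{R}$, following the description in the note to Table 4 (the square root of the total variance divided by the within-chain variance), and applies the effective sample size ratio threshold of .1. It uses simulated chains and hypothetical function names, and the split into four chains of 2,500 draws is an assumption; Stan and brms compute more refined versions of these diagnostics.

```python
import math
import random
import statistics


def r_hat(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    within = statistics.mean([statistics.variance(c) for c in chains])
    between = n * statistics.variance([statistics.mean(c) for c in chains])
    total = (n - 1) / n * within + between / n  # pooled variance estimate
    return math.sqrt(total / within)


def ess_ratio_ok(n_eff, n_total=10_000, threshold=0.1):
    """True when the effective sample size ratio exceeds the threshold."""
    return n_eff / n_total > threshold


rng = random.Random(42)
# Four well-mixed chains drawn from the same distribution (simulated stand-ins).
mixed = [[rng.gauss(1.46, 0.10) for _ in range(2500)] for _ in range(4)]
# Four chains centered at different locations, mimicking non-convergence.
stuck = [[rng.gauss(1.0 + 0.3 * k, 0.10) for _ in range(2500)] for k in range(4)]

print(r_hat(mixed) < 1.05, r_hat(stuck) < 1.05)
print(ess_ratio_ok(1400))  # 1400 is the smallest effective sample size in Table 4
```

When the chains explore the same distribution, the between-chain variance adds almost nothing to the within-chain variance and $\hat{R}$ stays near 1; chains centered at different values inflate the total variance and push $\hat{R}$ well above the 1.05 cutoff.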

Figure 10.

Density and trace plot for the Markov chain Monte Carlo samples for the longitudinal between firm performance variability example.

Figure 11.

Posterior predictive plot for the longitudinal between firm performance variability example. Note that the observed data are in black and the data generated from the model are in gray.

Table 4.

Fixed and Random Effect Output for Longitudinal Between Firm Performance Variability Example.

Effect	Parameter	Group/Term	brms Output Label	$\hat{μ}$	$\hat{σ}$	Q_.025	True Value	Q_.975	Coverage	Effective Sample Size
Random	${\hat{σ}}_{u_{0}}$	Control	sd(Intercept: strat0)	1.46	0.10	1.27	1.41	1.65	1	6251
	${\hat{σ}}_{u_{1}}$	Control	sd(time: strat0)	1.02	0.05	0.93	1.00	1.11	1	4378
	${\hat{σ}}_{u_{0}}$	Imitator	sd(Intercept: strat1)	0.99	0.10	0.81	1.00	1.18	1	4303
	${\hat{σ}}_{u_{1}}$	Imitator	sd(time: strat1)	1.77	0.08	1.62	1.73	1.94	1	3629
	${\hat{σ}}_{u_{0}, u_{1}}$	Control	cor(Intercept: strat0, time: strat0)	0.41	0.07	0.26	0.30	0.54	1	1400
	${\hat{σ}}_{u_{0}, u_{1}}$	Imitator	cor(Intercept: strat1, time: strat1)	0.55	0.08	0.39	0.50	0.71	1	1616
	$σ_{e_{ij}}$		sigma	1.89	0.02	1.85	1.90	1.93	1	11125
Fixed	${\hat{γ}}_{00}$	Location Intercept	Intercept	0.00	0.11	-0.22	0	0.23	1	8965
	${\hat{γ}}_{10}$	Location Time	time	-0.05	0.06	-0.17	0	0.08	1	4179
	${\hat{γ}}_{01}$	Location Imitator	strat	-0.91	0.15	-1.19	-1.00	-0.62	1	9244
	${\hat{γ}}_{11}$	Location Interaction	time: strat	0.19	0.13	-0.08	0	0.44	1	5423

Note: The brms output label would not typically be provided in this kind of table; however, we provide it here so that readers can easily link the output from the analysis to the information presented in the table. $\hat{μ}$ is the posterior mean, $\hat{σ}$ is the posterior standard deviation, Q_.025 is the .025 quantile of the posterior distribution (the lower limit of the 95% credible interval), and Q_.975 is the .975 quantile of the posterior distribution (the upper limit of the 95% credible interval). True value refers to the value that was used to generate the data. Effective sample size is an estimate of the amount of independent information in the autocorrelated samples. $\hat{R}$ is the square root of the total variance divided by the within-chain variance and is equal to 1.00 for all parameter estimates. Coverage indicates whether the interval between the two quantiles contains the true value; 1 indicates that it does. σs with a single subscript correspond to standard deviations. For example, ${\hat{σ}}_{u_{1}}$ is the standard deviation of the location side random slope for time. σs with double subscripts indicate correlations. For example, ${\hat{σ}}_{u_{0}, u_{1}}$ is the estimated correlation between random intercepts and slopes.

Interpretation of Posterior Distribution

The interpretation for the location side of the model is the same as for a typical multilevel model with a cross-level interaction. The only nonzero effect in this example was the simple main effect of being an imitator. In line with the first hypothesis, at the beginning of the study, the imitators are predicted to perform worse than the business as usual condition, ${\hat{γ}}_{01} = -.91$, CI (–1.19, –.62); see Figure 12. The scale side of the model in this example is incorporated through a multigroup analysis, and the variability differences between the groups are of primary substantive interest. As seen in Table 4, the intercept variability is smaller for the imitator group (32.19% smaller; [.99 – 1.46]/1.46), and the slope variability is larger (73.53% larger; [1.77 – 1.02]/1.02). One-tailed hypothesis tests also indicate that the intercept variability is smaller and the slope variability is larger in the imitator group (i.e., the credible intervals for the differences did not contain zero; see Table 5). Support for the hypothesized correlation difference was not found because the credible interval contained zero. Thus, the results matched our expectations for the standard deviations but not for the correlations. At the beginning of the study, the imitators performed consistently worse. Throughout the year, some of the imitating firms improved substantially, while others did not (see the relatively large slope variability in Figure 13).
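The percentage differences reported in this paragraph follow directly from the posterior means in Table 4. A minimal sketch of the arithmetic (the function name is illustrative):

```python
def pct_change(imitator_sd, control_sd):
    """Percentage change in a standard deviation relative to the control group."""
    return 100 * (imitator_sd - control_sd) / control_sd


# Posterior means from Table 4
intercept_change = pct_change(0.99, 1.46)  # random intercept SD
slope_change = pct_change(1.77, 1.02)      # random slope SD

print(f"{intercept_change:.2f}%")  # -32.19%
print(f"{slope_change:.2f}%")      # 73.53%
```

Expressing the difference relative to the control group's standard deviation gives a scale-free summary that can be compared across parameters measured in different units.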

Figure 12.

Cross-level interaction plot demonstrating how strategy moderates the change in average performance over time. The shaded regions or bands reflect the 95% credible interval.

Figure 13.

Bar chart displaying random intercept and slope standard deviations by strategy.

Table 5.

Variability Magnitude Hypotheses.

Hypothesis	${\hat{μ}}_{diff}$	${\hat{σ}}_{diff}$	Q_.05	Q_.95
${\hat{σ}}_{u_{0}}$ Control – ${\hat{σ}}_{u_{0}}$ Imitators > 0	0.47	0.15	0.25	∞
${\hat{σ}}_{u_{1}}$ Control – ${\hat{σ}}_{u_{1}}$ Imitators < 0	–0.75	0.09	–∞	–0.60
${\hat{σ}}_{u_{0}, u_{1}}$ Control – ${\hat{σ}}_{u_{0}, u_{1}}$ Imitators < 0	–0.15	0.11	–∞	0.03

Note: ${\hat{μ}}_{diff}$ is the posterior mean of the difference between the two estimates; ${\hat{σ}}_{diff}$ is the posterior standard deviation of the difference. These are one-sided 95% credible intervals, so only one limit of each interval is of interest: when evaluating a less-than hypothesis only the upper limit (Q_.95) is needed, and when evaluating a greater-than hypothesis only the lower limit (Q_.05) is needed. The credible intervals exclude zero for the two standard deviation comparisons but include zero for the correlation comparison.
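A one-sided comparison of this kind amounts to summarizing the posterior distribution of a difference. The sketch below illustrates the logic with simulated draws: independent normal draws whose means and standard deviations match the Table 4 summaries for the two random intercept standard deviations stand in for the actual posterior samples, which are not reproduced here. The function and variable names are hypothetical, and the result only approximates the first row of Table 5.

```python
import random
import statistics


def one_sided_summary(draws_a, draws_b, direction):
    """Summarize draws of (a - b) for a one-sided hypothesis."""
    diff = sorted(a - b for a, b in zip(draws_a, draws_b))
    n = len(diff)
    if direction == ">":   # H: a - b > 0, so report the lower limit (Q_.05)
        bound = diff[int(0.05 * n)]
        supported = bound > 0
    else:                  # H: a - b < 0, so report the upper limit (Q_.95)
        bound = diff[int(0.95 * n) - 1]
        supported = bound < 0
    return statistics.mean(diff), statistics.stdev(diff), bound, supported


rng = random.Random(2020)
# Stand-in draws (assumed normal and independent), not the fitted posterior.
control = [rng.gauss(1.46, 0.10) for _ in range(10_000)]
imitator = [rng.gauss(0.99, 0.10) for _ in range(10_000)]

mean_diff, sd_diff, lower, supported = one_sided_summary(control, imitator, ">")
print(round(mean_diff, 2), round(sd_diff, 2), round(lower, 2), supported)
```

Working with the draws of the difference, rather than comparing two separate intervals, is what allows the uncertainty in both estimates to be carried into the test.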

Discussion

Interest in predicting variance-based constructs is increasing among organizational researchers. Lang et al. (2018) recently illustrated the use of a multilevel model to predict changes in consensus (agreement) among members of a group over time. We build upon this first application of multilevel models for predicting heterogeneous residual variances in the organizational sciences by highlighting the flexible nature of MELS and heterogeneous variance models. Researchers can utilize these models in both micro and macro organizational research with different designs (nested cross-sectional or longitudinal) to model different portions of the variability (within- or between-group variability). Below we summarize the advantages of using the most general MELS models.

Advantages of Using MELS

The MELS model offers several statistical advantages over other approaches, including heterogeneous variance models. First, by including random effects (particularly the random intercept on the scale side of the model), it is possible for researchers to test the assumption that all Level 2 units have the same amount of within-group/person variability (i.e., homogeneous variances) and avoid inflated Type I error rates (Leckie et al., 2014; Walters et al., 2018). Moreover, this model allows researchers to avoid the limitations of aggregation (i.e., computing a standard deviation for each group and predicting it in a single-level regression), namely the inability to include Level 1 predictors and the disregard of uncertainty in the aggregated variability estimates. The MELS model accounts for this uncertainty.

As mentioned previously, applying MELS models to predict a construct of substantive interest increases the types of research questions that researchers can address. To this point, organizational researchers have predominately focused on hypotheses about the amount (mean) of responses. However, researchers should consider whether there are sound theoretical reasons to propose hypotheses aimed at predicting variability. Our illustrative example using nested cross-sectional LMX data exemplifies the types of questions that can be addressed with multilevel models but are not possible to examine using an aggregation approach. In the example, cross-level interaction research questions with random slopes for both the mean and the residual standard deviation were addressed using data that were generated based upon a substantive example. In the context of LMX and LMX differentiation, perceived similarity was allowed to have random slopes when predicting LMX mean (location) and LMX differentiation (scale); perceived similarity was also allowed to have Level 2 effects on both sides of the model. The effect of perceived similarity was qualified by transformational leadership as a moderator.

Researchers may also examine questions that consider the amount (mean) and the variability in combination. For example, researchers studying the results of an intervention may find that a particular treatment not only increases the mean of a construct (e.g., employee wellbeing) but also decreases the variability of ratings over time. As another example, researchers might propose that the intervention treatment group would start with a mix (high variability) in wellbeing scores, but over time, all individuals in the treatment group would both improve and become more homogeneous. In contrast, a less advantageous treatment might increase the mean but also increase the variability. In this case, although wellbeing may be improving on average, the results also suggest that the treatment is creating a disparity between people. In other words, the treatment works for some but not for others. This finding would be similar to the results of the second illustrative example examining firm performance. The ability to predict the mean and variability in the same model makes it possible to test these sorts of combination hypotheses.

Practical Recommendations

Bayesian estimation is appropriate and likely necessary when estimating complex MELS models because of the large number of random effects. The Bayesian approach utilized in this article, in conjunction with brms (Bürkner, 2017) and Stan (Carpenter et al., 2017), allows for the estimation of random slopes as well as cross-level interactions on both sides of the model. In terms of software, the combination of brms and Stan is the most widely available option. The only other software that has been utilized to estimate this model is Stat-JR, which is also a Bayesian program and is distributed with MLwiN (Browne, 2017). Neither MIXREGLS (Hedeker & Nordgren, 2013) nor MixWILD allows for random slopes on the scale side, and as of version 8.0, Mplus (Muthén & Muthén, 1998–2017) can specify heterogeneous variances, but it is unclear how a random slope could be incorporated on the scale side. Perhaps the dynamic structural equation modeling framework (Asparouhov, Hamaker, & Muthén, 2018), which incorporates time series analysis into a structural equation modeling framework, could provide an alternative.

Bayesian analyses offer several statistical advantages; however, these approaches are also unfamiliar to many researchers. Providing priors is an extra step in conducting an analysis, and researchers must ensure that their choices regarding priors do not influence the final result inappropriately. While brms helps reduce the burden on researchers by supplying default priors that seek not to influence the final results, this is still an issue that researchers should consider. Convergence issues can occur in Bayesian analysis, and when they do, researchers must resolve these issues by investigating various convergence diagnostics. Particularly, researchers may face challenges when estimating random slopes on both sides of the model. We provide code to perform the analyses but do not provide a lengthy discussion of the process of addressing convergence issues. We encourage researchers to consult accessible resources, including Kruschke (2015) and McElreath (2016), as this process is very data/model dependent.

To demonstrate the flexibility provided by utilizing a Bayesian approach to testing variability-related hypotheses, we present a complex MELS model (i.e., with random slopes on the location and scale sides). However, in some cases, less complex models may be more aligned with the questions of substantive interest. We recommend readers use Appendix E as a tool to consider the different types of models they may estimate and to pick the model that best fits their theory and research questions.

One very important takeaway from our review of the existing literature is that precise language is necessary when interpreting complex multilevel models. We found instances where researchers were conducting traditional multilevel models (e.g., growth models) and interpreting the results as if they were predicting heterogeneous variances. Language such as “we predicted variability” may lead readers to think that the study is predicting heterogeneous residual variances when the study is actually using a traditional multilevel model to explain a portion of a single variance estimate that the model provides for everyone in the study (e.g., a Level 2 predictor explains some random intercept variance; see Figure 3). Thus, we recommend that researchers use more precise language when describing their analysis. For example, researchers applying these models should clarify that they used a MELS or heterogeneous variance model to predict heterogeneous residual variances. This language should make it clear that the outcome of interest is a variance-based construct and that these analyses are distinct from explaining variability as is done in a typical regression analysis.

Researchers may have questions regarding centering decisions in MELS and heterogeneous variance models. To our knowledge, researchers have not examined this issue with these specific models. However, given that the scale side of the model is a log-linear model, researchers may interpret the τs on the scale side as they would the βs. Enders and Tofighi (2007) advise that centering decisions should be based on the research question of interest. Researchers who are interested in Level 1 predictors could include the group-mean-centered version of the Level 1 predictor without including the Level 2 variable. However, if this approach is used, it is important to remember that all Level 2 variability is being ignored. When researchers are interested in both Level 1 and Level 2 versions of the same predictor, either group-mean or grand-mean centering can be used to provide the Level 1, the Level 2, and the contextual effect. Either approach will provide two of these three effects, and the third effect can be calculated from the other two (Enders & Tofighi, 2007). Further, for researchers interested in a Level 2 predictor only, centering decisions are not overly complicated, and advice concerning single-level regression analyses applies.
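The relationship among the three effects can be made concrete. The following minimal sketch (with made-up data and hypothetical names) group-mean centers a Level 1 predictor and shows that, given the within-group and between-group coefficients, the contextual effect is their difference, consistent with Enders and Tofighi (2007). The coefficient values are arbitrary placeholders rather than estimates from this article.

```python
from collections import defaultdict


def group_mean_center(values, groups):
    """Split a Level 1 predictor into group means and within-group deviations."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    deviations = [v - means[g] for v, g in zip(values, groups)]
    return means, deviations


def contextual_effect(between, within):
    """Contextual effect implied by the between- and within-group coefficients."""
    return between - within


# Made-up similarity ratings for two leaders' groups (illustration only)
similarity = [2.0, 3.0, 4.0, 4.0, 5.0, 6.0]
leader = ["A", "A", "A", "B", "B", "B"]
means, deviations = group_mean_center(similarity, leader)

print(means)       # {'A': 3.0, 'B': 5.0}
print(deviations)  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
print(contextual_effect(between=0.75, within=0.5))  # 0.25
```

The deviations carry only within-group information and the group means carry only between-group information, which is why dropping the group means from the model discards all Level 2 variability in the predictor.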

Researchers may be interested in providing effect size calculations for models with random slopes on the scale side. Pseudo-R² values could be calculated for each of the variance components in this model, and they have been shown to work well with models that include random intercepts on both the location side and the scale side (Walters et al., 2018). However, these analyses have yet to be extended to the random slope case. Until this research is conducted, researchers may consider using the intuitive effect size estimate obtained by calculating the proportional increase/decrease in the standard deviation (Lang et al., 2018).

Researchers should be aware that a mediation version of the MELS model has also not yet been developed and evaluated. Many of the hypotheses discussed in this article could easily be expanded to include mediated relationships. For example, the LMX example could be extended to view LMX differentiation as a mediator and add group performance as the distal outcome (Henderson et al., 2009). To address this hypothesis without the shortcomings of aggregation, a mediation version of the model would need to be developed.

Power analyses have also not yet been conducted for the full MELS model presented in this article (i.e., random slopes on both sides of the model). However, recent research examining less complicated models has begun to provide some insight into power issues with these analyses. For example, Walters et al. (2018) found that the power to detect the scale-model random intercept variance and the effect of an individual-level predictor of residual variability increased with the number of individuals and occasions of measurement (and provided power curves that may help researchers planning to conduct a similar analysis).

Research on longitudinal modeling has demonstrated the benefits of building models in a step-up fashion, in which the models start simple and become increasingly complex. For example, polynomials of increasing order are often fit to obtain the best model for improvement over time before adding any additional predictors to the model. Within a given side of the model, it makes sense to model the effect of time correctly before adding any additional predictors. However, whether researchers should model the location (mean) or scale (variance) side of the model first remains an open question for future research. As methodological work on these models proceeds, researchers should ground their model building in theoretical arguments and also test multiple competing models. These models can be compared using the Watanabe-Akaike information criterion (WAIC; Gelman, Hwang, & Vehtari, 2014), which is an extension of the commonly applied Akaike information criterion (AIC; Akaike, 1998) that does not assume that the posterior distribution is approximately multivariate normal. We provide examples illustrating how to conduct model comparisons in the supplemental material.
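For readers who want to see what the WAIC computes, the following minimal sketch implements the definition given by Gelman, Hwang, and Vehtari (2014) from a matrix of pointwise log-likelihood values (posterior draws in rows, observations in columns). The toy data, the two sets of stand-in "posterior" draws, and all names are assumptions for illustration; in practice the log-likelihood matrix would come from the fitted models.

```python
import math
import random
import statistics


def normal_logpdf(y, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)


def waic(log_lik):
    """WAIC on the deviance scale from log_lik[draw][observation]."""
    n_draws, n_obs = len(log_lik), len(log_lik[0])
    lppd, p_waic = 0.0, 0.0
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        top = max(column)  # log-sum-exp for numerical stability
        lppd += top + math.log(sum(math.exp(v - top) for v in column) / n_draws)
        p_waic += statistics.variance(column)  # effective number of parameters
    return -2 * (lppd - p_waic)


rng = random.Random(11)
y = [rng.gauss(0.0, 1.0) for _ in range(50)]          # toy data
draws_a = [rng.gauss(0.0, 0.1) for _ in range(500)]   # stand-in draws near the true mean
draws_b = [rng.gauss(2.0, 0.1) for _ in range(500)]   # stand-in draws from a poorer model

waic_a = waic([[normal_logpdf(v, mu, 1.0) for v in y] for mu in draws_a])
waic_b = waic([[normal_logpdf(v, mu, 1.0) for v in y] for mu in draws_b])
print(waic_a < waic_b)  # the model with the lower WAIC is preferred
```

Because the criterion is built from the full set of posterior draws rather than a point estimate, it can be applied to models such as the MELS model, for which the posterior need not be close to multivariate normal.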

Conclusion

Variability is no longer viewed as something to be averaged over, ignored, or tricked into fitting the assumptions of a typical general linear model. It is substantively interesting to researchers working in many different areas. Regression models produce normal distributions that are conditional on the values of the predictors and their associated regression coefficients. Like any normal distribution, these distributions are characterized by both the mean and the variance. Thus, every effort that is used to model the mean correctly should be used to model the variability correctly as well. By expanding the multilevel repertoire to include MELS and heterogeneous variance models, researchers can appropriately analyze mean and variability-related hypotheses at multiple levels simultaneously.

Authors’ Note

We would like to thank Justin Jones as well as special feature editor Rory Eckardt, the other editors, and anonymous reviewers for the Feature Topic on New Approaches to Multilevel Methods and Statistics for their comments and suggestions throughout the review of this article.

Houston F. Lester is also affiliated with Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey VA Medical Center, Houston, Texas.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially funded by the first author’s VA Health Services Research & Development (HSR&D) Postdoctoral Fellowship: HX-17-014.

References

Aitkin

(1987). Modeling variance heterogeneity in normal regression using GLIM. Applied Statistics, 36(3), 332–339. doi:10.2307/2347792

Akaike

(1998). Information theory and an extension of the maximum likelihood principle. In Selected papers of Hirotugu Akaike (pp. 199–213). New York, NY: Springer. doi.org/10.1007/978-1-4612-1694-0_15

Asparouhov

Hamaker

E. L.

Muthén

(2018). Dynamic structural equation models. Structural Equation Modeling, 25(3), 359–388. doi:10.1080/10705511.2017.1406803

Asparouhov

Muthén

(2010). Bayesian analysis using Mplus: Technical implementation (Tech. Rep. 3). Retrieved from www.statmodel.com/download/Bayes3.pdf

Asparouhov

Muthén

(2016). General random effect latent variable modeling: Random subjects, items, contexts, and parameters. In Harring

J. R.

Stapleton

L. M.

Beretvas

S. N.

(Eds.), Advances in multilevel modeling for educational research: Addressing practical issues found in real-world applications (pp. 163–192). Charlotte, NC: Information Age.

Balasubramanian

Lieberman

M. B.

(2010). Industry learning environments and the heterogeneity of firm performance. Strategic Management Journal, 31(4), 390–412. doi:10.1002/smj.816

Barnes

C. M.

Reb

Ang

(2012). More than just the mean: Moving to a dynamic view of performance-based compensation. Journal of Applied Psychology, 97(3), 711–718. doi:10.1037/a0026927

Bates

D. M.

Maechler

Bolker

Walker

(2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67. doi:10.18637/jss.v067.i01

Bayes

(1763). LII. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, FRS communicated by Mr. Price, in a letter to John Canton, AMFR S. Philosophical Transactions of the Royal Society of London, 53, 370–418.

10.

Bidwell

Keller

J. R.

(2014). Within or without? How firms combine internal and external labor markets to fill jobs. Academy of Management Journal, 57(4), 1035–1055. doi:10.5465/amj.2012.0119

11.

Bliese

P. D.

Ployhart

R. E.

(2002). Growth modeling using random coefficient models: Model building, testing, and illustrations. Organizational Research Methods, 5(4), 362–387. doi:10.1177/109442802237116

12.

Bolger

Laurenceau

J. P.

(2013). Intensive longitudinal methods: An introduction to diary and experience sampling research. New York, NY: Guilford.

13.

Browne

W. J.

(2017). MCMC Estimation in MLwiN v3.00. University of Bristol, Centre for Multilevel Modelling.

14.

Bryk

A. S.

Raudenbush

S. W.

(1988). Heterogeneity of variance in experimental studies: A challenge to conventional interpretations. Psychological Bulletin, 104(3), 396–404. doi:10.1037//0033-2909.104.3.396

15.

Bürkner

P. C.

(2017). Brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80. doi:10.18637/jss.v080.i01

16.

Carnahan

Agarwal

Campbell

B. A.

(2012). Heterogeneity in turnover: The effect of relative compensation dispersion of firms on the mobility and entrepreneurship of extreme performers. Strategic Management Journal, 33(12), 1411–1430. doi:10.1002/smj.1991

17.

Carpenter

Gelman

Hoffman

M. D.

Lee

Goodrich

Betancourt

… Riddell

(2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76. doi:10.18637/jss.v076.i01

18.

Casella

Berger

R. L.

(2002). Statistical inference (2nd ed.). Pacific Grove, CA: Wadsworth.

19.

Cen

Yang

(2013). Investor sentiment, disagreement, and the breadth–return relationship. Management Science, 59(5), 1076–1091. doi:10.1287/mnsc.1120.1633

20.

Chan

(1998). Functional relations among constructs in the same content domain at different levels of analysis. Journal of Applied Psychology, 83(2), 234–246. doi:10.1037//0021-9010.83.2.234

21.

Chin

M. K.

Semadeni

(2017). CEO political ideologies and pay egalitarianism within top management teams. Strategic Management Journal, 38(8), 1608–1625. doi:10.1002/smj.2608

22.

Chordia

Subrahmanyam

Anshuman

V. R.

(2001). Trading activity and expected stock returns. Journal of Financial Economics, 59(1), 3–32. doi:10.1016/S0304-405X(00)00080-5

23.

Chowdhury

S. K.

Endres

M. L.

(2010). The impact of client variability on nurses’ occupational strain and injury: Cross-level moderation by safety climate. Academy of Management Journal, 53(1), 182–198. doi:10.5465/amj.2010.48037720

24.

Chun

J. S.

Choi

J. N.

(2014). Members’ needs, intragroup conflict, and group performance. Journal of Applied Psychology, 99(3), 437–450. doi:10.1037/a0036363

25.

Cobb

J. A.

Stevens

F. G.

(2017). These unequal states: Corporate organization and income inequality in the United States. Administrative Science Quarterly, 62(2), 304–340. doi:10.1177/0001839216673823

26.

Culpepper

S. A.

(2010). Studying individual differences in predictability with gamma regression and nonlinear multilevel models. Multivariate Behavioral Research, 45(1), 153–185.

27.

Dalal

R. S.

Bhave

D. P.

Fiset

(2014). Within-person variability in job performance: A theoretical review and research agenda. Journal of Management, 40(5), 1396–1436. doi:10.1177/0149206314532691

28.

Dalal

R. S.

Meyer

R. D.

Bradshaw

R. P.

Green

J. P.

Kelly

E. D.

Zhu

(2015). Personality strength and situational influences on behavior: A conceptual review and research agenda. Journal of Management, 41(1), 261–287. doi:10.1177/0149206314557524

29.

Davidian

Giltinan

(1995). Nonlinear models for repeated measurement data. Monographs on Statistics and Applied Probability. London, UK: Chapman & Hall.

30.

Depaoli

van de Schoot

(2015). Improving transparency and replication in Bayesian statistics: The WAMBS-checklist. Psychological Methods, 22(2), 240–261. doi:10.1037/met0000065

31.

DeRue

D. S.

Hollenbeck

Ilgen

Feltz

(2010). Efficacy dispersion in teams: Moving beyond agreement and aggregation. Personnel Psychology, 63(1), 1–40. doi:10.1111/j.1744-6570.2009.01161.x

32.

Dulebohn

J. H.

Bommer

W. H.

Liden

R. C.

Brouer

R. L.

Ferris

G. R.

(2012). A meta-analysis of antecedents and consequences of leader-member exchange: Integrating the past with an eye toward the future. Journal of Management, 38(6), 1715–1759. doi:10.1177/0149206311415280

33.

Enders

C. K.

Tofighi

(2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychological Methods, 12(2), 121–138. doi:10.1037/1082-989x.12.2.121

34.

Erdogan

Bauer

T. N.

(2010). Differentiated leader–member exchanges: The buffering role of justice climate. Journal of Applied Psychology, 95(6), 1104–1120. doi:10.1037/a0020578

35.

Figueiredo

P. N.

(2011). The role of dual embeddedness in the innovative performance of MNE subsidiaries: Evidence from Brazil. Journal of Management Studies, 48(2), 417–440. doi:10.1111/j.1467-6486.2010.00965.x

36.

Fredrickson

J. W.

Davis-Blake

Sander

W. G

. (2010). Sharing the wealth: Social comparisons and pay dispersion in the CEO’s top team. Strategic Management Journal, 31(10), 1031–1053. doi:10.1002/smj.848

37.

Gabriel

A. S.

Diefendorff

J. M.

(2015). Emotional labor dynamics: A momentary approach. Academy of Management Journal, 58(6), 1804–1825. doi:10.5465/amj.2013.1135

38.

Gabriel

A. S.

Diefendorff

J. M.

Chandler

M. M.

Moran

C. M.

Greguras

G. J.

(2014). The dynamic relationships of work affect and job satisfaction with perceptions of fit. Personnel Psychology, 67(2), 389–420. doi:10.1111/peps.12042

39.

Gelman

Hill

(2007). Data analysis using regression and multilevel/hierarchical models. Cambridge, UK: Cambridge University Press.

40.

Gelman

Hwang

Vehtari

(2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24(6), 997–1016. doi:10.1007/s11222-013-9416-2

41.

Goldstein

(2011). Multilevel statistical models (Vol. 922). John Wiley.

42.

Gonzalez-Mulé

DeGeest

D. S.

McCormick

B. W.

Seong

J. Y.

Brown

K. G.

(2014). Can we get some cooperation around here? The mediating role of group norms on the relationship between team personality and individual helping behaviors. Journal of Applied Psychology, 99(5), 988–999. doi:/10.1037/a0037278

43.

González-Romá

Peiró

J. M.

Tordera

(2002). An examination of the antecedents and moderator influences of climate strength. Journal of Applied Psychology, 87(3), 465–473. doi:10.1037//0021-9010.87.3.465

44. Gooty, & Yammarino, F. J. (2016). The leader–member exchange relationship: A multisource, cross-level investigation. Journal of Management, 42(4), 915–935. doi:10.1177/0149206313503009

45. Greenland, S. G. (2006). Bayesian perspectives for epidemiological research: I. Foundations and basic methods. International Journal of Epidemiology, 35(3), 765–775. doi:10.1093/ije/dyi312

46. Guo (2017). Demystifying variance in performance: A longitudinal multilevel perspective. Strategic Management Journal, 38(6), 1327–1342. doi:10.1002/smj.2555

47. Harvey, A. C. (1976). Estimating regression models with multiplicative heteroscedasticity. Econometrica, 44(3), 461–466. doi:10.2307/1913974

48. Hausknecht, J. P., & Holwerda, J. A. (2013). When does employee turnover matter? Dynamic member configurations, productive capacity, and collective performance. Organization Science, 24(1), 210–225. doi:10.1287/orsc.1110.0720

49. Hedeker, Mermelstein, R. J., & Demirtas (2008). An application of a mixed-effects location scale model for analysis of ecological momentary assessment (EMA) data. Biometrics, 64(2), 627–634. doi:10.1111/j.1541-0420.2007.00924.x

50. Hedeker, & Nordgren (2013). MIXREGLS: A program for mixed-effects location scale analysis. Journal of Statistical Software, 52, 1–38. doi:10.18637/jss.v052.i12

51. Henderson, D. J., Liden, R. C., Glibkowski, B. C., & Chaudhry (2009). LMX differentiation: A multilevel review and examination of its antecedents and outcomes. Leadership Quarterly, 20(4), 517–534. doi:10.1016/j.leaqua.2009.04.003

52. Hoffman (2007). Multilevel models for examining individual differences in within-person variation and covariation over time. Multivariate Behavioral Research, 42(4), 609–629. doi:10.1080/00273170701710072

53. Hoffman, M. D., & Gelman (2014). The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593–1623.

54. Hofmann, D. A. (2002). Issues in multilevel research: Theory development, measurement, and analysis. In S. G. Rogelberg (Ed.), Handbook of research methods in industrial and organizational psychology (pp. 247–274). Malden, MA: Blackwell.

55. Huang, J. L., Ford, J. K., & Ryan, A. M. (2017). Ignored no more: Within-person variability enables better understanding of training transfer. Personnel Psychology, 70(3), 557–596. doi:10.1111/peps.12155

56. Ilgen, D. R., Hollenbeck, J. R., Johnson, & Jundt (2005). Teams in organizations: From input-process-output models to IMOI models. Annual Review of Psychology, 56, 517–543. doi:10.1146/annurev.psych.56.091103.070250

57. James, L. R., Demaree, R. G., & Wolf (1984). Estimating within-group interrater reliability with and without response bias. Journal of Applied Psychology, 69(1), 85–98. doi:10.1037//0021-9010.69.1.85

58. James, L. R., Demaree, R. G., & Wolf (1993). rwg: An assessment of within-group interrater agreement. Journal of Applied Psychology, 78(2), 306–309. doi:10.1037//0021-9010.78.2.306

59. Jaskiewicz, Block, J. H., Miller, & Combs, J. G. (2017). Founder versus family owners’ impact on pay dispersion among non-CEO top managers: Implications for firm performance. Journal of Management, 43(5), 1524–1552. doi:10.1177/0149206314558487

60. Kaplan (2014). Bayesian statistics for the social sciences. New York, NY: Guilford.

61. Kellermanns, F. W., Walter, Lechner, & Floyd, S. W. (2005). The lack of consensus about strategic consensus: Advancing theory and research. Journal of Management, 31(5), 719–737. doi:10.1177/0149206305279114

62. Kim, J. Y., Finkelstein, & Haleblian (2015). All aspirations are not created equal: The differential effects of historical and social aspirations on acquisition behavior. Academy of Management Journal, 58(5), 1361–1388. doi:10.5465/amj.2012.1102

63. Kreft, I. G., De Leeuw, & Aiken, L. S. (1995). The effect of different forms of centering in hierarchical linear models. Multivariate Behavioral Research, 30(1), 1–21. doi:10.1207/s15327906mbr3001_1

64. Kruschke, J. K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). New York, NY: Academic Press.

65. Kruschke, J. K., Aguinis, & Joo (2012). The time has come: Bayesian methods for data analysis in the organizational sciences. Organizational Research Methods, 15(4), 722–752. doi:10.1177/1094428112457829

66. Kuppens, & Yzerbyt, V. Y. (2014). Predicting variability: Using multilevel models to assess differences in variances. European Journal of Social Psychology, 44(7), 691–700. doi:10.1002/ejsp.2028

67. Kutner, Nachtsheim, & Neter (2004). Applied linear statistical models (5th ed.). New York, NY: McGraw-Hill.

68. Laamanen, & Keil (2008). Performance of serial acquirers: Toward an acquisition program perspective. Strategic Management Journal, 29(6), 663–672. doi:10.1002/smj.670

69. Lang, J. W. B., Bliese, P. D., & de Voogt (2018). Modeling consensus emergence in groups using longitudinal multilevel methods. Personnel Psychology, 71(2), 255–281. doi:10.1111/peps.12260

70. Leckie, French, Charlton, & Browne (2014). Modeling heterogeneous variance–covariance components in two-level models. Journal of Educational and Behavioral Statistics, 39(5), 307–332. doi:10.3102/1076998614546494

71. Lenox, M. J., Rockart, S. F., & Lewin, A. Y. (2010). Does interdependency affect firm and industry profitability? An empirical test. Strategic Management Journal, 31(2), 121–139. doi:10.1002/smj.811

72. Leonard (1975). A Bayesian approach to the linear model with unequal variances. Technometrics, 17(1), 95–105. doi:10.2307/1268006

73. Lewandowski, Kurowicka, & Joe (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100(9), 1989–2001. doi:10.1016/j.jmva.2009.04.008

74. Liden, R. C., Erdogan, Wayne, S. J., & Sparrowe, R. T. (2006). Leader-member exchange, differentiation, and task interdependence: Implications for individual and group performance. Journal of Organizational Behavior, 27(6), 723–746. doi:10.1002/job.409

75. Liden, R. C., Wayne, S. J., & Stilwell (1993). A longitudinal study on the early development of leader-member exchanges. Journal of Applied Psychology, 78, 662–674. doi:10.1037//0021-9010.78.4.662

76. Lievens, Lang, J. W., De Fruyt, Corstjens, Van de Vijver, & Bledow (2018). The predictive power of people’s intraindividual variability across situations: Implementing whole trait theory in assessment. Journal of Applied Psychology, 103(7), 753–771. doi:10.1037/apl0000280

77. Lievens, Sanchez, J. I., Bartram, & Brown (2010). Lack of consensus among competency ratings of the same occupation: Noise or substance? Journal of Applied Psychology, 95(3), 562–571. doi:10.1037/a0018035

78. Lim (2018). Attainment discrepancy and new geographic market entry: The moderating roles of vertical pay disparity and horizontal pay dispersion. Journal of Management Studies. Advance online publication. doi:10.1111/joms.12430

79. Lin, Mermelstein, R. J., & Hedeker (2018a). A shared parameter location scale mixed effect model for EMA data subject to informative missing. Health Services and Outcomes Research Methodology, 18(4), 227–243. doi:10.1007/s10742-018-0184-5

80. Lin, Mermelstein, R. J., & Hedeker (2018b). A three level Bayesian mixed effects location scale model with an application to ecological momentary assessment (EMA) data. Statistics in Medicine, 31(26), 2108–2119. doi:10.1002/sim.7627

81. Lindley, D. V. (1971). The estimation of many parameters. In V. P. Godambe & D. A. Sprott (Eds.), Foundations of statistical inference (pp. 435–455). Toronto, Canada: Holt, Rinehart, and Winston.

82. Lynch, S. M. (2007). Introduction to applied Bayesian statistics and estimation for social scientists. New York, NY: Springer.

83. Marks, M. A., Mathieu, J. E., & Zaccaro, S. J. (2001). A temporally based framework and taxonomy of team processes. Academy of Management Review, 26(3), 356–376. doi:10.5465/amr.2001.4845785

84. Martin, Thomas, Legood, & Dello Russo (2018). Leader–member exchange (LMX) differentiation and work outcomes: Conceptual clarification and critical review. Journal of Organizational Behavior, 39(2), 151–168. doi:10.1002/job.2202

85. Matta, F. K., Scott, B. A., Colquitt, J. A., Koopman, & Passantino, L. G. (2017). Is consistently unfair better than sporadically fair? An investigation of justice variability and stress. Academy of Management Journal, 60(2), 743–770. doi:10.5465/amj.2014.0455

86. McElreath (2016). Statistical rethinking: A Bayesian course with examples in R and Stan (Vol. 122). Boca Raton, FL: CRC Press.

87. Melkonyan, & Safra (2016). Intrinsic variability in group and individual decision making. Management Science, 62(9), 2651–2667. doi:10.1287/mnsc.2015.2255

88. Messersmith, J. G., Guthrie, J. P., Y. Y., & Lee, J. Y. (2011). Executive turnover: The influence of dispersion and other pay system characteristics. Journal of Applied Psychology, 96(3), 457–469. doi:10.1037/a0021654

89. Minbashian, & Luppino (2014). Short-term and long-term within-person variability in performance: An integrative model. Journal of Applied Psychology, 99(5), 898–914. doi:10.1037/a0037402

90. Minbashian, Wood, R. E., & Beckmann (2010). Task-contingent conscientiousness as a unit of personality at work. Journal of Applied Psychology, 95(5), 793–806.

91. Muthén, L. K., & Muthén, B. O. (1998–2017). Mplus user’s guide (8th ed.). Los Angeles, CA: Muthén & Muthén.

92. Neal, R. M. (2010). MCMC using Hamiltonian dynamics. In Brooks, Gelman, G. L. Jones, & X.-L. Meng (Eds.), The handbook of Markov Chain Monte Carlo (pp. 113–162). Boca Raton, FL: Chapman & Hall/CRC Press.

93. Pak, & Kim (2018). Team manager’s implementation, high performance work systems intensity, and performance: A multilevel investigation. Journal of Management, 44(7), 2690–2715. doi:10.1177/0149206316646829

94. Patel, P. C., Kohtamäki, Parida, & Wincent (2015). Entrepreneurial orientation-as-experimentation and firm performance: The enabling role of absorptive capacity. Strategic Management Journal, 36, 1739–1749. doi:10.1002/smj.2310

95. Pawitan (2001). In all likelihood: Statistical modelling and inference using likelihood. Oxford, UK: Oxford University Press.

96. Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S and S-PLUS. New York, NY: Springer.

97. Ployhart, R. E., Weekley, J. A., & Baughman (2006). The structure and function of human capital emergence: A multilevel examination of the attraction-selection-attrition model. Academy of Management Journal, 49(4), 661–677. doi:10.5465/amj.2006.22083023

98. Plummer (2013). JAGS: Just another Gibbs sampler. Retrieved from http://mcmc-jags.sourceforge.net/

99. Posen, H. E., & Martignoni (2018). Revisiting the imitation assumption: Why imitation may increase, rather than decrease, performance heterogeneity. Strategic Management Journal, 39(5), 1350–1369. doi:10.1002/smj.2751

100. Pugach, Hedeker, Richmond, M. J., Sokolovsky, & Mermelstein (2014). A bivariate mixed-effects location-scale model with application to ecological momentary assessment (EMA) data. Health Services and Outcomes Research Methodology, 14(4), 194–212. doi:10.1007/s10742-014-0126-9

101. Raudenbush, S. W. (1988). Estimating change in dispersion. Journal of Educational Statistics, 13(2), 148–171. doi:10.3102/10769986013002148

102. Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (Vol. 1). Thousand Oaks, CA: Sage.

103. Reb, & Greguras, G. J. (2010). Understanding performance ratings: Dynamic performance, attributions, and rating purpose. Journal of Applied Psychology, 95(1), 213–220. doi:10.1037/a0017237

104. Roberson, Q. M., & Williamson, I. O. (2012). Justice in self-managing teams: The role of social networks in the emergence of procedural justice climates. Academy of Management Journal, 55(3), 685–701. doi:10.5465/amj.2009.0491

105. Schyns (2006). Are group consensus in leader-member exchange (LMX) and shared work values related to organizational outcomes? Small Group Research, 37(1), 20–35. doi:10.1177/1046496405281770

106. Scott, B. A., Barnes, C. M., & Wagner, D. T. (2012). Chameleonic or consistent? A multilevel investigation of emotional labor variability and self-monitoring. Academy of Management Journal, 55, 905–926. doi:10.5465/amj.2010.1050

107. Sitzmann, Ployhart, R. E., & Kim (2019). A process model linking occupational strength to attitudes and behaviors: The explanatory role of occupational personality heterogeneity. Journal of Applied Psychology, 104, 247–269. doi:10.1037/apl0000352

108. Spiegelhalter, Thomas, Best, & Lunn (2007). OpenBUGS user manual, version 3.0.2. Cambridge, UK: MRC Biostatistics Unit.

109. Sriram, Chintagunta, P. K., & Manchanda (2015). Service quality variability and termination behavior. Management Science, 61, 2739–2759. doi:10.1287/mnsc.2014.2105

110. Trevor, C. O., Reilly, & Gerhart (2012). Reconsidering pay dispersion’s effect on the performance of interdependent work: Reconciling sorting and pay inequality. Academy of Management Journal, 55, 585–610. doi:10.5465/amj.2006.0127

111. Wales, W. J., Patel, P. C., & Lumpkin, G. T. (2013). In pursuit of greatness: CEO narcissism, entrepreneurial orientation, and firm performance variance. Journal of Management Studies, 50(6), 1041–1069. doi:10.1111/joms.12034

112. Walters, R. W., Hoffman, & Templin (2018). The power to detect and predict individual differences in intra-individual variability using the mixed-effects location-scale model. Multivariate Behavioral Research, 53(3), 360–374. doi:10.1080/00273171.2018.1449628

113. Wang, & Choi (2013). A new look at the corporate social–financial performance relationship: The moderating roles of temporal and interdomain consistency in corporate social performance. Journal of Management, 39(2), 416–441. doi:10.1177/0149206310375850

114. Watts, Walters, R. W., Hoffman, & Templin (2016). Intra-individual variability of physical activity in older adults with and without mild Alzheimer’s disease. PLOS ONE, 11(4). doi:10.1371/journal.pone.0153898