Abstract
Maximum likelihood estimation of multilevel structural equation model (MLSEM) parameters is a preferred approach to probe theories involving latent variables in multilevel settings. Although maximum likelihood has many desirable properties, a major limitation is that it often fails to converge and can incur significant bias when implemented in studies with a small to moderate multilevel sample (e.g., fewer than 100 organizations with 10 or less individuals/organization). To address similar limitations in single-level SEM, literature has developed Croon’s bias-corrected factor score path analysis estimator that converges more regularly than maximum likelihood and delivers less biased parameter estimates with small to moderate sample sizes. We derive extensions to this framework for MLSEMs and probe the degree to which the estimator retains these advantages with small to moderate multilevel samples. The estimator emerges as a useful alternative or complement to maximum likelihood because it often outperforms maximum likelihood in small to moderate multilevel samples in terms of convergence, bias, error variance, and power. The proposed estimator is implemented as a function in R using lavaan and is illustrated using a multilevel mediation example.
Keywords
Organizational research often involves multilevel theories that probe complex systems of latent variables using observable but fallible indicators subject to measurement error. A common approach to operationalize such multilevel structures and address the measurement error associated with fallible indicators is to construct and connect multiple indicator factor models using multilevel structural equation models (MLSEMs; e.g., Li & Beretvas, 2013). Full information methods such as maximum likelihood (ML) estimation of all parameters has been shown to be an effective strategy to address the deleterious effects of measurement error (e.g., bias, inaccurate inferences, erroneous estimates of power) in large sample settings (e.g., Cheung & Lau, 2008, 2017; Cole & Preacher, 2014; Ledgerwood & Shrout, 2011; Li & Beretvas, 2013).
MLSEMs are however often heavily parameterized and, as a result, ML estimation typically requires large sample sizes at each level to dependably deliver stable and admissible solutions that provide unbiased estimates of the parameters and their standard errors (Li & Beretvas, 2013; McNeish, 2017). For instance, research on mediation with MLSEM has demonstrated that ML frequently encounters convergence or estimation errors and produces biased parameter estimates and standard errors even in simple models with samples of 40 organizations and 20 individuals per organization (e.g., Li & Beretvas, 2013). This research has established that a minimum of 80 to 100 organizations are required to ensure model convergence with simple multilevel models and suggested even larger samples are needed for ML to dependably deliver nearly unbiased parameter estimates and standard errors across common conditions (Gagne & Hancock, 2006; Hox & Maas, 2001; Li & Beretvas, 2013; McNeish, 2017).
Yet, substantive literature has emphasized that the benefits of studying theories, interventions, and intermediate processes are not limited to large-scale studies—small- to moderate-scale studies offer critical contributions to theory and social issues when they are well executed (e.g., Bodner & Bliese, 2018; Walton, 2014). In many areas of research, samples of fewer than 80 organizations are typical and samples greater than 80 may be considered prohibitively large (e.g., Schochet, 2011; Spybrook, Shi, & Kelcey, 2016). Perhaps because of the mismatch between the scale of many multilevel studies and the large-scale requirements of ML estimation in MLSEM, literature reviews have reported a widespread absence in the appropriate adjustment for measurement error (e.g., Aguinis, Edwards, & Bradley, 2017). This lack of treatment of measurement error may be in part attributable to the lack of suitable methods available for small- to moderate-scale studies rather than researcher oversight.
A historical alternative to ML estimation of MLSEM parameters is a type of limited information factor score path analysis. This approach reduces MLSEMs to the more parsimonious multilevel path models by using single equation ML to estimate each of the factor models and then substitutes the resulting factor scores in place of the measurement models to estimate the structural model parameters (Cheung & Lau, 2017; Devlieger, Mayer, & Rosseel, 2016; Devlieger & Rosseel, 2017; Hoshino & Bentler, 2013; Li & Beretvas, 2013; Lu, Kwan, Thomas, & Cedzynski, 2011; McNeish, 2017). This type of multilevel path analysis with factor scores has been shown to outperform ML in MLSEM in terms of power and in the stability and convergence of estimates but is well known to produce biased estimates of path coefficients and sampling variance when measurement error is present (Li & Beretvas, 2013).
Contemporary advances in single-level designs have, however, shown that the nature of this bias can be tracked and corrected by taking into account the results of the measurement model error associated with each latent construct (Croon, 2002). One recent development in this framework is Croon’s bias-corrected version of factor score path analysis (Devlieger et al., 2016; Devlieger & Rosseel, 2017).
Croon’s bias-corrected factor score path analysis can be implemented through four general steps: (a) separately estimate factor models and scores for each latent variable (e.g., for covariates, mediator, outcome) using ML; (b) estimate the variance-covariance matrix of the factor scores (e.g., covariates, mediator, outcome) and any observed variables (e.g., treatment status); (c) use the results of the measurement models in (a) to construct a method of moments type correction and use it to adjust the variance-covariance matrix; and (d) estimate the path model using the corrected variance-covariance matrix.
This approach parallels the conventional but suboptimal factor score path analysis method. However, it differs in one key way—the conventional path analysis approach uses the factor score variance-covariance matrix to estimate structural relationships whereas Croon’s approach introduces a method of moments adjustment based on the measurement model—in step (c) above—to correct the factor score variance-covariance matrix and obtain a consistent estimate of the variance-covariance matrix.
Recent developments have provided a theoretical and empirical basis for the application of Croon’s bias-corrected estimator to a broad range of settings and have identified the estimator as an important small to moderate sample estimation alternative for SEMs (Devlieger et al., 2016; Devlieger & Rosseel, 2017; Loncke et al., 2018). For instance, past literature with single-level structural equation models has shown that Croon’s estimator outperforms ML in a wide variety of settings including small to moderate samples, missing data, measurement misspecifications, structural misspecifications, and correlated error terms (Devlieger et al., 2016; Devlieger & Rosseel, 2017; Hayes & Usami, 2019; Kelcey, 2019; Loncke et al., 2018; Lu et al., 2011). Given the potential analytic advantages of Croon’s limited information estimator in buttressing evidence from small- to moderate-scale studies, an open set of questions is if and how these methods can be extended to multilevel settings where governing sample sizes (i.e., the number of organizations) tend to be small to moderate and model complexity tends to be high.
In this study, we extend the scope of Croon’s bias-corrected estimator by deriving its corrections for multilevel studies. We detail the estimation of relationships within a multilevel system. We then take up a case study of multilevel mediation to anchor and apply the developments and provide an R function to implement Croon’s method using the lavaan package. Finally we assess of the absolute and relative performance of the estimator with multilevel mediation models using a Monte Carlo simulation and conclude with a discussion of the results, implications and limitations.
Working Example
We detail the development of the multilevel extension of Croon’s estimator through its implementation in a multilevel mediation application that is graphically depicted in the top-left frame of Figure 1 (notation and models subsequently detailed). We note that although we describe the estimator within the context of a common multilevel mediation model, the framework can accommodate a much wider range of mediation models and MLSEMs. In general, the estimator and method we develop can be applied to a broad range of standard MLSEMs, including, for example, sequential mediation models, multiple mediator models, models with multiple outcomes, models with multiple interventions, and models with combinations of these and other features. We outline some of the limitations and future work of the framework in the discussion section.

Conceptual representation of a 2-1-1 MLSEM mediation model when the mediator, outcome, and covariate are subject to measurement error (top panel) and the steps to implementing this model using the bias-corrected estimator; panels (a), (b), (c), and (d) capture the respective measurement models for each latent trait in the system, while panel (e) captures the corrected path analysis.
Our working example draws on multilevel mediation as it represents one prevalent type of MLSEM in organizational research and in research across almost every area of the social sciences (e.g., Aarons & Sawitzky, 2006; Allen, Herst, Bruck, & Sutton, 2000; Brough & O’Driscoll, 2005; Brough, O’Driscoll, Kalliath, Cooper, & Poelmans, 2009; Brown et al., 2014; Eby, Casper, Lockwood, Bordeaux, & Brinley, 2005; Kelcey, Hill, & Chin, 2019; Ketelaars, 2017; Sampson, Raudenbush, & Earls, 1997; Song, Tsui, & Law, 2009). There are of course many other types of MLSEMs that are common in the organizational sciences and for which the proposed estimator is suitable. However, we focus on multilevel mediation because it provides a clear introduction to the corrections in multilevel settings and it allows us to provide a specific and targeted initial simulation assessment of the performance of the estimator.
Suppose we would like to assess the mediation effect of an organizational policy promoting a family-friendly workplace culture (organization-level intervention) on employee self-reported organizational commitment (individual-level outcome) as it operates through employee self-reported work-life balance (individual-level mediator; e.g., Brough et al., 2009; Eby et al., 2005). Let us consider an experiment that randomly assigns the family-friendly policy to organizations and is designed to assess the extent to which the policy improves organizational commitment by improving work-life balance—both of which are assessed with error using individual-level indicators.
Although random assignment of the policy directly addresses the potential for confounding variables regarding the main effect of the policy on employee organizational commitment, it does not address the potential for variables that confound the relationship between the mediator and the outcome (i.e., sequential ignorability; e.g., VanderWeele, 2010). Omission of variables that confound the mediator-outcome relationship can bias path coefficients and this bias can be further amplified when ignoring measurement error (Fritz, Kenny, & MacKinnon, 2016). In order to plausibly identify the mediation effect, we must adjust for measurement error and for variables that confound the relationship between the mediator and outcome. For this reason, our example additionally includes latent covariates at both levels—an organization-level covariate assessed with error using organization-level indicators (W) and an individual-level covariate measured with error using individual-level indicators (X). Figure 1 provides an illustration that corresponds to this working example. Our subsequent analysis makes the assumption that conditioning on these covariates resolves confounding bias and further correctly identifies the mediation effect. However, we note that this assumption requires careful consideration in practice, and we probe the extent to which our results are sensitive to this assumption in the simulation.
Model
We draw on a MLSEM formulation that uses multilevel and single-level common factor models (e.g., Preacher, Zyphur, & Zhang, 2010). We use the single-level common factor model (described below) for the organization-level latent covariate (
For variation among organizations (
We use b as the path coefficient for the organization-level component of the latent mediator (
Within this framework, we use the a path coefficient from the mediator model to quantify the changes in work-life balance brought about by a new family-friendly workplace policy and the b coefficient to quantify how changes in work-life balance and the new organizational culture are conditionally associated with improved organizational commitment. In our example, we assess the mediation effect (ME) that quantifies the shared organization-level covariance among the organizational policy, work-life balance (mediator) and organizational commitment (outcome) at the organization-level. The effect is typically estimated using the product of the organization-level path coefficients for the intervention-mediator path (a) and mediator-outcome (b) paths:
Croon’s Bias-Corrected Estimator
We next provide a step by step outline of our developments along with an example analysis using the lavaan package (Rosseel, 2012). We first describe the measurement models and factor scores, and considerations for extending Croon’s estimator to a multilevel framework—steps (a) and (b) above. We then detail the resulting multilevel corrections—step (c) above. Last, we discuss the path analysis with the bias-corrected estimator—step (d) above—and end with inferential considerations.
Measurement Models
We begin by delineating step (a)—we detail how to fit separate factor models to each latent variable using ML and then follow with the prediction of the respective factor scores (or the covariance matrix of the factor scores; see a-d in Figure 1). Within the context of multilevel structures, we describe two primary types of measurement models that arise—single-level and multilevel factor models. In our two-level setting, single-level factor models are primarily used when we draw on organization-level indicators (i.e., constant within an organization) to measure a latent variable that varies only among organizations whereas multilevel factor models are employed when we draw on individual-level indicators that vary within and across organizations in order to track a latent variable that potentially varies both within and across organizations. We outline these models below and the considerations that help us to subsequently operationalize the bias-corrected estimator in multilevel settings.
Organization-level variables
We start by considering a single-level common factor model for organization-level latent variables that draws on observed indicators (w). For instance, our measurement model for the latent covariate (
We use j to index organizations, w
j
as the observed indicators for the latent covariate ηW
, Λ
W
as the factor loadings,
Individual-level variables
Now consider an individual-level latent variable such as the outcome (
We use y
ij
as the indicators of individual i in organization j for the latent variable,
To operationalize the proposed bias-corrected estimator in multilevel settings, we need to estimate the covariances of the factor scores at the organization- and individual-level for pairs of constructs. For variables that capture variation solely among organizations (e.g., the organization-level covariate W), we can use single-level factor scoring methods. However, for variables that vary among and within organizations, we must take into account the organizational or clustered nature of its observed indicators. Conceptually, the organization-level factor scores for a latent variable (e.g., outcome) that varies both within an organization and among organizations can be expressed as
Here we use
with
When predicting organization-level factor scores for the organization-level component (
There are multiple univariate and multivariate approaches one could adopt to predict the organization-level values of indicators. Multivariate approaches that simultaneously decompose all of a factor’s indicators potentially strengthen predictions because they leverage the interrelations of the indicators. However, such approaches are potentially susceptible (perhaps to a lesser degree) to the same types of convergence and estimation errors associated with ML because of their complexity relative to sample size. Below we outline univariate approaches that accept issues of factor score indeterminacy and uncertainty in exchange for less complexity and potentially fewer convergence issues. However, the function also allows for multivariate approaches.
We considered two univariate predictors of the organization-level components of indicators—organizational means and empirical Bayes means. Organizational means simply estimate the cluster-level components of the indicators with the observed organization-specific means (denoted
An alternative approach is to predict the organization components of the indicators using an empirical Bayes estimator. Empirical Bayes prediction forms precision weighted means on the basis of the reliability of organizational means (e.g., Raudenbush & Bryk, 2002). For instance, for the first indicator, an empirical Bayes prediction (
where
Multivariate versions of this approach can also be used. For example, for a given set of indicators (e.g., y
1, y
2, and y
3), we can estimate the organization-level covariance matrix (
with empirical Bayes multivariate predictions (
Corrections
We next detail the core machinery of the proposed estimator—the covariance-based corrections. The factor score-based covariance matrices developed above will be biased because they neglect the uncertainty introduced by representing the latent constructs and indicators with factor scores (e.g., Bollen, 1989). We derive a Croon (2002) type method of moments correction that adjusts the covariances for the unaccounted measurement error to produce consistent estimates of the covariance matrices at each level.
The proposed estimator is similar in character to two-stage ML approaches (e.g., Savalei & Bentler, 2009), stepwise estimators developed for latent class models (e.g., Bakk & Vermunt, 2015), or the (improved) regression calibration approaches (e.g., Skrondal & Kuha, 2012) because it introduces a correction factor that accounts for the attenuating effects of the uncertainty in the factor scores stemming from both the measurement and the multilevel components. More conceptually, the corrections detailed below parallel classical test theory disattenuation formulas for the correlation between two unreliably measured constructs—principally the corrections we derive leverage the unreliabilities of the latent variables to adjust for the impact of measurement error on a covariance (Spearman, 1904). For example, the classical test theory adjustment disattenuates the correlation between two measurement error prone variables by dividing it by the square root of the product of the variable reliabilities. In MLSEM these reliabilities (and their uncertainties) are unknown and must be estimated through complex functions of the parameters in the measurement models. The corrections we propose in this study conceptually extend the classical test theory style adjustments because they first identify the expected relationship among the covariance of latent variables, the covariance of their factor scores, and the unreliabilities of the factors, and second develop empirical estimates of those unreliabilities using the estimated factor models. In turn, the proposed estimator blends the estimation stability of factor score path analysis with the measurement error-corrections typically associated with ML.
Below we outline the nature of these corrections (see Supplemental Material Part 2 for details) as classified by the levels of pairs of variables. We outline the corrections for (a) the covariance between two organization-level variables, (b) the covariance between an organization- and individual-level variable, (c) the organization-level covariance between two individual-level variables, and (d) the individual-level covariance between two individual-level variables.
Organization-level variables
The first case we consider is the correction associated with the covariance between two organization-level latent variables. Consider the covariance between two organization-level latent variables (e.g.,
where
In our working 2-1-1 mediation example (Figure 1), consider the covariance between the organization-level latent covariate (
Organization- with individual-level variables
Next, we detail the corrections for cross-level covariances—that is, the covariance between an organization- and individual-level latent variable. In our working 2-1-1 mediation example (Figure 1), consider the covariance between the organization component of outcome latent variable (
Here
Conceptually, the results simply indicate that in addition to adjusting for the unreliability of the factors (as in the case for single-level or with two organization-level variables), we must further correct for the unreliability of the indicators (e.g., using
Organization-level component of individual-level variables
We next consider the organization-level covariance between two latent variables measured at the individual level. When considering the covariance of the organization-level components of an outcome and mediator assessed at the individual level (denoted as
Here
Individual-level component of individual-level variables
In complement to previous type of organization-level covariance, we can consider the individual-level covariance between two latent variables measured at the individual level. This covariance arises when, for example, we are interested in assessing the within organization relationship between the latent mediator and outcome (labeled b
1). The correction for the covariance of the individual-level components of an outcome and mediator assessed at the individual level (
Having corrected the covariances, we can then adjust the variance of each latent variable. When setting the scale of a latent variable by fixing its variance to unity, we can replace the variance of the factor scores for a given latent variable with this scaling (i.e., set factor variances to one). When fixing the loading of the first indicator to unity, we replace the variance of the factor scores with the latent variance estimate obtained in the factor model (e.g., for the outcome, we use the estimated level two variance from model (4)).
Bias-Corrected Path Analysis
Having formed corrected covariance matrices, we can then separately estimate the organization- and individual-level structural models described in Equation 1 using separate conventional single-level path analyses or a combined multilevel path analysis on the corrected covariance matrices.
Inference
In order to draw statistical inferences regarding parameter values, we need to track their error variance. Standard model-based approaches that use the information matrix to estimate the error variance fail in the current context because they do not account for the additional uncertainty that arises from measurement error and its corrections. To address this limitation, we can use bootstrap-based methods (e.g., Efron & Tibshirani, 1993). There are several potentially applicable implementations of the bootstrap in the current context, but one simple approach is a parametric bootstrap based on the estimates obtained in the original analysis.
The flow of this resampling approach is as follows: (a) sample new values of the latent variables based on multivariate normality with covariances taken from the corrected covariance matrix, (b) sample new values of the latent variable indicators using the estimated measurement model parameters, (c) reestimate the MLSEM using Croon’s method, and (d) replicate (a) to (c) a sufficient number of times to establish stable parameter distributions. We then draw on percentile-based confidence intervals to test hypotheses or form confidence intervals.
Illustration
Continuing with our working example, suppose we would like to assess the indirect effect of an organizational policy promoting a family-friendly workplace culture (organization-level intervention) on employee self-reported organizational commitment (individual-level outcome) as it operates through employee self-reported work-life balance (individual-level mediator; e.g., Brough et al., 2009; Eby et al., 2005). Let us consider data that draws on a sample of 30 organizations with 5 employees per organization (150 total individuals).
To begin our example analysis in R, we download call the bcfspa function and download the data (see Supplemental Material) and read it into R using
We then specify the model depicted in Figure 1 using lavaan syntax. We identify the scales of the latent variables by fixing the loading of the first indictor to one yielding a specification of
Alternatively, we could free the loading of the first indicator for each factor at both levels and constrain the variance for each factor at both levels to be one. Having specified the model, we can then estimate the parameters using ML with
To implement the multilevel version of Croon’s estimator detailed in this study, we can call the
A summary of the organization-level coefficients can be obtained using, for example,
The first argument in the
In addition, a
A different specification draws on metric invariance across levels by constraining indicator loadings to be equal across levels. In this context, indicators are equally good at discriminating among organizations and individuals on the latent variables. We can extend the syntax using the standard lavaan code that introducing labels such that
Here, labels such as
Results
The resulting organization-level estimates under the first specification (
Organization-Level Measurement and Structural Model Results for 2-1-1 Illustrative Example.
Note: ML is concurrent estimation of all parameters using maximum likelihood. FS-EB is factor score path analysis using empirical Bayes. Croon-EB is Croon’s bias-corrected method using empirical Bayes.
Although the methods differentially incurred bias across structural coefficients, Croon’s method had the lowest average absolute bias at 0.10 (relative bias of 21%). In contrast, ML took on nearly four times that with an average absolute bias of 0.35 (relative bias of 71%) while the uncorrected approach took on an average absolute bias of 0.22 (relative bias of 44%). Although this example was purposefully chosen to illustrate the differences among the approaches, these observed discrepancies align well with the subsequent simulation results. More generally, the theoretical and subsequent simulation results of our study and other studies suggest that such discrepancies are common in small to moderate samples.
Simulation
To provide an initial assessment of the value and precision of the methods developed, we conducted Monte Carlo simulations that outline performance along several criteria. We generated data under the 2-1-1 mediation model described in Equations 1 and 2 and their respective multilevel and single-level measurement models. We first considered an organization-level sample size of 30, an individual-level sample size of 5 per organization, unit error variances for indicators with an intraclass correlation coefficients of 0.20 for multilevel factors, factor loadings of 0.50, 3 indicators per latent variable, and the following standardized path coefficients: a = 0.20, b = 0.28, c′ = 0.14, γ 1 = 0.27, γ 2 = 0.27, π 1 = 0.40, and π 2 = 0.40, and ζ 1 = 0.50, β1 = 0.50, and b 1 = 0.50. We then held constant these parameter values and examined the influence of variation in the organization-level sample size (ranging from 20 to 100), individual-level sample size (ranging from 5 to 100), intraclass correlation coefficients for the indicator error terms (ranging from 0.10 to 0.70), factor loadings (ranging from 0.3 to 0.7), and the number of indicators per latent variable (3 and 5). We assessed the accuracy of the estimators (Croon’s, ML, and uncorrected factor score path analysis) under a standardized approach across 1,000 samples in terms of bias, error variance, power, and sensitivity to misspecifications.
The results of our analyses are outlined in Figures 2 to 5 and in the Supplemental Material Part 1 Tables S1 and S2. Our analyses suggested that under many common conditions the properties of the estimators were similar with larger samples. Convergence and bias were similar for ML and Croon’s once the sample sizes reach about 50 organizations or the number of individuals per organization exceeded 20 with at least 30 organizations; however, Croon’s approach had lower root mean square error and higher power until samples exceeded 100 organizations with at least 20 individuals per organization. Similarly, differences among the estimators in terms of individual-level coefficients were typically quite small across samples. For these reasons, we outline the results with larger samples in the Supplemental Material (Part 1) and probe the results for organization-level coefficients with small to moderate samples (i.e., 30 to 100 organizations with 5 to 100 individuals per organization) below.

Convergence rate (gray) and average absolute bias (black) by estimator as a function of (a) organizations, (b) individual per organization, (c) factor loadings, (d) number of indicators, and (e) intraclass correlation coefficients of the indicator error terms.

Convergence rate (gray) and root mean square error (black) by estimator as a function of (a) organizations, (b) individual per organization, (c) factor loadings, (d) number of indicators, and (e) intraclass correlation coefficients of the indicator error terms.

Power as a function of organization sample size (x-axis) by path (indicated on y-axis along with path coefficient). Black symbols indicate an individual-level sample size of 20, while gray symbols indicate an individual-level sample size of 5.

Convergence rate (gray) and average absolute bias (black) by estimator and (a) measurement misspecification and (b) omitted covariate.
Convergence
Figure 2 outlines the proportion of samples that converged (gray) for each condition by estimator. For instance, the gray markers in Figure 2 panel (a) identify the convergence rates for each estimator for samples of 20, 30, 40, 50, 60, 80, and 100 organizations (that are separated by vertical dashed lines) holding constant the other parameter values. Likewise, the gray markers in Figure 2 panel (b) identify the convergence rates for each estimator for individual-level sample sizes of 5, 10, 20, 50, and 100 organizations.
The results replicate prior literature and underscore a key advantage of Croon’s method—it routinely converges in small to moderate samples when ML fails to converge. For example, ML converged in just 36% of the drawn samples when using 30 organizations and 5 individuals whereas Croon’s method converged in 70% of these samples (Figure 2). The differences in convergence rates receded as a function of sample size; however, even when we double the number of organizations to 60, Croon’s method (90% convergence) still notably outperformed ML (77% convergence).
A closer look at the results further reveals that Croon’s method continued to yield the same relatively low bias parameter estimates when ML sustained estimation or convergence issues. With 30 organizations, for instance, of the 636 (of 1,000) samples that ML failed to converge in, Croon’s method converged in 406 of them (64%). That is, in a large majority of the samples where ML encountered convergence problems, Croon’s estimator was able to avoid such problems. Increased sample size depreciates this advantage but even with 60 organizations, of the 233 (of 1,000) samples ML failed to converge in, Croon’s method converged in 182 of them (78%). The cross-tabulations for the number of samples the estimators converged in were
Similar advantages in convergence rates were also observed for uncorrected factor score path analysis. With 30 organizations it converged 90% of the time and with 60 organizations it converged in 100% of the samples. Moreover, the uncorrected approach converged in 90% of the samples ML failed in with 30 organizations and in 99% with 60 organizations.
Discrepancies in convergence among the estimators were heavily influenced by sample size but mostly disappeared once the organization-level sample size reached about 50 or the number of individuals per organization exceeded 20 with at least 30 organizations. However, the intraclass correlation coefficients of the indicator error terms, factor loadings and the number of indicators also influenced convergence rates. As the intraclass correlation coefficients of the indicator error terms approached more extreme values of zero and one, convergence rates dipped. As factor loadings increased, the convergence rate for each estimator improved but did not impact the relative rank order of the estimators (Figure 2). In contrast, increasing the number of indicators reduced the convergence rate for Croon’s but improved the convergence rate for ML (Figure 2).
Bias
Figure 2 further outlines the average absolute bias (black) of the path coefficients by estimator for small to moderate samples (see Supplemental Material Part 1 Tables S1-S2 for additional results). For example, the black markers in Figure 2 panel (a) identify the average absolute bias for each estimator for samples of 20, 30, 40, 50, 60, 80, and 100 organizations (that are separated by vertical dashed lines) holding constant the other parameter values.
The results extended prior work with Croon’s method in single-level settings in that Croon’s method often outperformed ML and the uncorrected approach in terms of bias in small to moderate samples. Croon’s method incurred some bias but regularly maintained the lowest level of bias across small to moderate samples. The bias of the ML estimator was plagued by convergence issues and intermittent extreme parameter estimates in small to moderate samples that faded in larger samples. Uncorrected factor score path analysis often returned estimates that were in a similar neighborhood as ML and Croon’s but was typically more biased because of its neglect of measurement error.
An important practical consideration is the extent to which Croon’s method can produce nearly unbiased estimates in samples where ML failed to converge. More specifically, the aforementioned convergence results suggested that a key utility of Croon’s method is that it frequently circumvents convergence issues in samples where ML fails to converge. Further probing the intersection of the convergence and bias results indicated that Croon’s method retained the same relatively low level of bias in samples where it converged but ML did not.
Returning to our earlier example with 30 organizations—of the 406 samples where ML did not converge but Croon did, the average absolute bias of Croon’s method was identical to the overall absolute average bias for all samples (0.09). Similarly, with 60 organizations, of the 182 samples where ML failed to converge but Croon did, Croon’s average absolute bias was still only 0.04—again, identical to the overall average absolute bias for all samples. Uncorrected factor score path analysis also maintained its bias level (e.g., 0.13 for 30 organizations) for the subset of samples in which ML (and/or Croon’s) failed to converge but these higher convergence rates typically came at the price of increased bias.
Bias across all estimators was influenced by sample size at both levels but this relationship was much more pronounced for ML than for the uncorrected and Croon methods. For instance, the average absolute bias of the path coefficients in samples of 30 organizations and 5 individuals was 0.97 for ML and 0.13 for the uncorrected approach whereas for Croon’s method it was only 0.09 (Figure 2). Croon’s bias advantage declined as a function of sample size—for instance, with 60 organizations the bias associated with Croon’s method was reduced to 0.04 whereas it was reduced to 0.06 with ML and 0.08 with uncorrected factor score path analysis. Paralleling the convergence results, once the organization–level sample size reached about 50 or the number of individuals per organization exceeded 20 with at least 30 organizations, Croon’s and ML approached near zero bias while the uncorrected method retained bias proportional to the unreliability of the latent variables.
To a lesser extent, differences in bias among the estimators were also influenced by the other parameters considered but the degree and direction of the influence was not necessarily consistent across estimators (Figure 2). Regarding the different multilevel factor scoring approaches developed, the results suggested that the bias of the univariate empirical Bayes and the organization means approaches were largely indistinguishable. For factor loadings, increases tended to yield lower bias on average. Changes in the intraclass correlation coefficients of the indicator error terms and the number of indicators produced less notable and less consistent changes in bias across the estimators (Figure 2).
Mean square error
The results also suggested that Croon’s method was more efficient than the ML estimator in samples up to 100 organizations but that Croon’s was less efficient when compared to the uncorrected approach. The reduction in bias associated with Croon’s method was bought with an increase in uncertainty relative to the uncorrected factor score path analysis approach. Because Croon’s method pairs bias reduction with additional error variance, we describe the bias-variance trade-off using the root mean square error of the estimators in Figure 3. Like previous plots, the black markers in Figure 3 panel (a) identify the root mean square error for each estimator for samples of 20, 30, 40, 50, 60, 80, and 100 organizations (that are separated by vertical dashed lines).
Two results emerged. First, Croon’s method outperformed ML in terms of root mean square error. The distribution of path coefficient estimates under Croon’s method was consistently more concentrated around the true values compared to those of ML and with the exception of sample size, the values of other parameters (e.g., factor loadings) appeared to have little influence. Second, the uncorrected approach consistently outperformed Croon’s method in terms of root mean square error. That is, although the uncorrected approach returned biased estimates, its dispersion is significantly smaller than that of ML or Croon’s. In contrast to the bias results, the root mean square error of the estimators was not comparable until the number of organizations exceeded 100 and 20 individuals per organization.
Power
We surveyed the power with which each method could detect the presence of a nonzero relationship for each path (e.g., Kelcey, Dong, Spybrook, & Cox, 2017; Preacher & Selig, 2012). Figure 4 displays power as a function of organization-level sample size for an individual-level sample size of 5 (gray) and 20 (black). The analyses suggested two findings that parallel the previously outlined results. First, Croon’s method was typically more powerful than ML in small to moderate samples but this depended on the path and the sample sizes. However, the uncorrected method was more powerful than Croon’s and ML. Second, from an absolute perspective, designing well-powered studies will minimally require moderate sample sizes at both levels (e.g., 60 organizations and 20 individuals per organization) and medium to large effects (e.g., 0.3 to 0.5 on a standardized scale)
Misspecifications
Prior work in single-level SEMs has suggested that Croon’s method can be often less sensitive than ML to certain types of model misspecifications because of its stepwise nature (e.g., Devlieger & Rosseel, 2017). We probed two such misspecifications. The first was a misspecification in the measurement model for covariate X such that the first indicator (e.g., x 1) for the covariate ηx was driven by ηx and ηw (i.e., a type of cross-loading). The results suggested that Croon’s method was no more or less sensitive to this type of misspecification than ML. An example summarizing the impact of the misspecification on convergence and bias is presented in Figure 5 for 30 organizations and 20 individuals/organization.
The second scenario we considered was missing variable bias. A key concern in mediation analyses involving latent variables is the potential for bias due to uncontrolled confounding (Fritz et al., 2016). We examined the sensitivity of the estimators to violations of the sequential ignorability assumption in the presence of measurement error by excluding the confounding organization-level latent covariate (
Discussion
Recent reviews of the literature have suggested that measurement error rarely receives appropriate treatment, and we posited that this oversight may often owe to the mismatch between sample requirements of extant analytic methods and the complexity and scale of many studies. Yet, literature has clearly documented the deleterious and widespread effects of measurement error. For this reason, we developed multilevel extensions to Croon’s bias-corrected factor score-based path analysis estimator that is suitable for MLSEM.
The results of our study and the results of other studies suggest that Croon’s estimator is a viable alternative or complementary estimator for small to moderate sample MLSEMs for three primary reasons. First, initial evidence suggests that Croon’s estimator is likely to provide less biased estimates of path coefficients relative to ML and the uncorrected method in many common small- to moderate-scale studies.
Second, Croon’s estimator more frequently converged in small to moderate samples than ML while also providing the least biased estimates in samples where ML failed to converge. In this way, Croon’s estimator emerges as a key alternative or complementary estimator for settings in which ML estimation fails. The results also suggested that the uncorrected approach can play a similar complementary or backstop role—the uncorrected approach had the most reliable convergence rate but did so at a cost of more bias.
Third, Croon’s estimator had less sampling variability, lower root mean square error, and tended to be more powerful when compared with ML. This result was somewhat unanticipated because past research has shown that in many cases ML is still quite efficient in finite sample sizes (e.g., Lüdtke et al., 2008; Wooldridge, 2002). However, the results of this study and another recent study have suggested that in finite sample sizes Croon’s estimator can be more efficient (Kelcey, 2019). Comparisons between Croon’s and the uncorrected approach were more equivocal on this front because while the uncorrected approach returned biased estimates, it had several desirable properties including lower root mean square error and higher power.
Despite the promising initial performance of the proposed methods, this work thus far has several limitations. From an absolute perspective, Croon’s method may often retain slightly more power than concurrent ML but may still be underpowered with small to moderate sample sizes depending on the type and size of coefficient targeted and variable reliabilities. Moreover, both of these methods were clearly less powerful than the uncorrected approach. Future research should probe sample size limits and guidelines and the conditions that moderate the absolute and relative levels of power for the estimators.
A second limitation is that the proposed approach needs additional augmentations to generalize to settings with more complicated measurement models. The form of Croon’s estimator we detailed is built on a factor-wise implementation—the current approach leverages the ML estimates of the individual measurement models. However, MLSEM can accommodate much more complicated measurement models such as those with within indicator multidimensionality or correlated residual errors (e.g., Hayes & Usami, 2019). Future research is needed to further develop and probe Croon-like corrections for a broad range of measurement structures to detail the extent to which Croon’s method represents a versatile strategy.
Similarly, the current implementation of the proposed estimator is limited in the types of structural models it can serve. For instance, the proposed approach does not immediately accommodate random slopes (e.g., 1-1-1 mediation with random slopes). Estimation of several or more random slopes within the context of MLSEM can be data intensive and will typically require more than the number of organizations we have considered. Extensions to the proposed bias-corrected estimator that can relax the large sample requirements of random slope models may be particularly valuable in many settings.
A third limitation is that, like ML, Croon’s estimator is still subject to bias introduced through the omission of confounding variables. From a relative sense, our simulations suggested that in the presence of measurement error and an omitted confounding variable, Croon’s incurs the same level of additional bias as the ML estimator. However, from an absolute sense, Croon’s and ML still cannot fully correct for confounding variables that have been omitted from the model.
A final limitation of the proposed approach is that it currently lacks indices to assess the global fit of the model. More specifically, a key utility in MLSEM is the ability to test the extent to which the proposed model is plausibly consistent with the data and the degree to which it accounts for the observed covariances among variables. Recent work has begun to address this gap for single-level settings but more work is needed to develop a broad range of fit statistics that is comparable to those of conventional methods (Devlieger, Talloen, & Rosseel, 2019).
Even with these limitations, the results of our study provide important initial support for the utility of Croon’s estimator in multilevel settings. The results suggest that the estimator potentially serves as an important alternative or complement to ML in small- to moderate-scale multilevel models, and more generally they encourage the continued development of this approach.
Supplemental Material
Supplemental Material, 211_example - Croon’s Bias-Corrected Factor Score Path Analysis for Small- to Moderate-Sample Multilevel Structural Equation Models
Supplemental Material, 211_example for Croon’s Bias-Corrected Factor Score Path Analysis for Small- to Moderate-Sample Multilevel Structural Equation Models by Benjamin Kelcey, Kyle Cox and Nianbo Dong in Organizational Research Methods
Supplemental Material
Supplemental Material, 211_example_script - Croon’s Bias-Corrected Factor Score Path Analysis for Small- to Moderate-Sample Multilevel Structural Equation Models
Supplemental Material, 211_example_script for Croon’s Bias-Corrected Factor Score Path Analysis for Small- to Moderate-Sample Multilevel Structural Equation Models by Benjamin Kelcey, Kyle Cox and Nianbo Dong in Organizational Research Methods
Supplemental Material
Supplemental Material, bcfspa - Croon’s Bias-Corrected Factor Score Path Analysis for Small- to Moderate-Sample Multilevel Structural Equation Models
Supplemental Material, bcfspa for Croon’s Bias-Corrected Factor Score Path Analysis for Small- to Moderate-Sample Multilevel Structural Equation Models by Benjamin Kelcey, Kyle Cox and Nianbo Dong in Organizational Research Methods
Supplemental Material
Supplemental Material, simulation_script - Croon’s Bias-Corrected Factor Score Path Analysis for Small- to Moderate-Sample Multilevel Structural Equation Models
Supplemental Material, simulation_script for Croon’s Bias-Corrected Factor Score Path Analysis for Small- to Moderate-Sample Multilevel Structural Equation Models by Benjamin Kelcey, Kyle Cox and Nianbo Dong in Organizational Research Methods
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the National Science Foundation (grant numbers 1552535, 1760884). The National Science Foundation had no role or involvement in the conduct of the research or the preparation of the manuscript.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
