Testing the Performance of Level-Specific Fit Evaluation in MCFA Models With Different Factor Structures Across Levels

Abstract

A Monte Carlo study was conducted to compare the performance of a level-specific (LS) fit evaluation with that of a simultaneous (SI) fit evaluation in multilevel confirmatory factor analysis (MCFA) models. We extended previous studies by examining their performance under MCFA models with different factor structures across levels. In addition, various design factors and interaction effects between intraclass correlation (ICC) and misspecification type (MT) on their performance were considered. The simulation results demonstrate that the LS outperformed the SI in detecting model misspecification at the between-group level even in the MCFA model with different factor structures across levels. In particular, the performance of LS fit indices depended on the ICC, group size (GS), or MT. More specifically, the results are as follows. First, the performance of root mean square error of approximation (RMSEA) was more promising in detecting misspecified between-level models as GS or ICC increased. Second, the effect of ICC on the performance of comparative fit index (CFI) or Tucker–Lewis index (TLI) depended on the MT. Third, the performance of standardized root mean squared residual (SRMR) improved as ICC increased, and this pattern was clearer in structure misspecification than in measurement misspecification. Finally, the summary and implications of the results are discussed.

Keywords

multilevel confirmatory factor analysis (MCFA), model evaluation method, level-specific (LS) fit evaluation, simultaneous (SI) fit evaluation, partially saturated model (PS) method, Monte Carlo simulation study

Introduction

The multilevel structural equation model (MSEM) is extensively used in behavioral and social sciences. For MSEM, the model fit evaluation is a primary methodological issue because model fit determines the degree to which a model matches the observed data. Many researchers have used the traditional simultaneous (SI) approach to evaluate MSEM. However, the SI approach has a potential limitation in locating the source of lack of model fit, especially at the between-group level (Ryu, 2014; Ryu & West, 2009). Because sample size is generally much larger at the within-group level than at the between-group level, the model fit is expected to be dominated by the within-group level. Therefore, the SI approach may not be sensitive to model misspecification at the between-group level.

To overcome limitations of the SI approach, the level-specific (LS) approaches, such as the partially saturated model (PS; Ryu & West, 2009) and the segregating (SEG) methods (Yuan & Bentler, 2007), have been proposed. The PS method uses a saturated model to obtain chi-square test statistics and the degrees of freedom for each level. In contrast, the SEG method requires the following two steps: (a) computing estimates of unrestricted covariance matrices at each level, and (b) conducting single-level covariance structure analysis with estimated covariance matrices as input data. Comparing the two methods, the PS method was reported to be superior to the SEG method in detecting model misspecification (Jung, 2016; Ryu & West, 2009). In addition, it was reported to perform better than the SEG method in terms of convergence rates and Type I error rates regardless of the sample size (Ryu & West, 2009).

Recently, Yuan et al. (2016) proposed equivalence testing with adjusted fit indexes that does not depend on conventional model fit criteria. The equivalence testing was considered as a way to advance the inferential nature of structural equation modeling (SEM) as a confirmatory tool. Some simulation studies (Finch & French, 2018; Marcoulides & Yuan, 2017) showed that the equivalence testing provides additional information for assessing the fit of SEM. Marcoulides and Yuan (2020) attempted to extend the equivalence testing to MSEM, but they only used empirical data under very limited conditions (e.g., a group size [GS] of 31). Thus, more studies are needed before equivalence testing can be considered a better alternative to the PS method. Moreover, in Marcoulides and Yuan’s (2020) study, the PS method and equivalence testing performed very similarly in detecting model misspecification at each level. In addition, a recent study (Rappaport et al., 2020) also recommended the use of the PS method and provided an implementation of the algorithm within the open-source SEM package OpenMx for PS method users. Therefore, this study focuses on the PS method (Ryu & West, 2009) to calculate LS fit indices.

Since Ryu and West (2009) proposed the LS approach based on the PS method, many simulation studies have compared the performance of the SI approach with that of the LS approach (Boulton, 2011; Jung, 2016; Ryu & West, 2009). Most studies demonstrated that the LS approach was superior to the SI approach for detecting model misspecification, particularly at the between-group level. In addition, the previous studies have examined the LS approach’s performance under certain design factors, such as intraclass correlation (ICC) and sample size. In particular, the ICC condition was considered the most important factor for the performance of the LS approach. Boulton (2011) and Hsu et al. (2016) verified the performance of the LS approach by diversifying the ICC conditions. Boulton (2011) considered small ICC values below .30 and Hsu et al. (2016) considered ICCs ranging from .09 to .50. They showed that the LS approach’s performance was more promising in detecting misspecified between-level models as ICC increased. In contrast, the effect of sample size on its performance was inconsistent. Boulton (2011) demonstrated that the LS approach’s performance improved as the sample size decreased, whereas others (Hsu et al., 2015; Ryu & West, 2009) reported that the effect of sample size on its performance was trivial. Some researchers proposed to consider unbalanced GSs (Boulton, 2011; Hsu et al., 2015; Schermelleh-Engel et al., 2014). However, only Jung (2016) considered unbalanced GSs, and she showed that convergence rates were generally higher with equal GSs than with unequal GSs. Therefore, the LS approach’s performance across different group balance (GB) conditions needs further testing, because empirical studies use unbalanced groups as well as balanced ones.

Hsu et al. (2015) investigated the performance of the fit indices across misspecification type (MT) conditions. They showed that the performance of the standardized root mean squared residual for the between-group level ( $SRMR_{B}$ ) changed across MT conditions, such as measurement misspecification (MM) or structure misspecification (SM), which is consistent with the results of Jung (2016). Hsu et al. (2015) and Jung (2016) were interested in MT conditions as a design factor, but they considered only $SRMR_{B}$ rather than LS fit indices. Separately, Boulton (2011) demonstrated that the performance of the LS approach depended on the severity of misspecification. In this context, MT conditions might be critical in determining the performance of the LS approach. Moreover, MT conditions could be one moderating factor for the effect of ICC on the performance of LS fit indices. Boulton (2011) found that the effect of ICC on the performance of LS fit indices differed depending on the severity of misspecification. For example, the performance of the LS comparative fit index (CFI) and Tucker–Lewis index (TLI) improved with an increase in ICC when the severity of misspecification was low, whereas their performance deteriorated with an increase in ICC when the severity of misspecification was high. Jung (2016) also showed that the effect of ICC on the performance of the LS approach may be affected by MT conditions such as MM or SM. However, she did not present any interpretation or discussion of these results. Taken together, GB and MT conditions have been considered critical design factors for the performance of the LS approach, but they were insufficiently examined in previous simulation studies. To reflect the variety of situations in real data, additional design factors such as GB, MT, and the interaction effect of ICC and MT must be explored.

Despite the richness of previous simulation studies, the form of the analysis model is still limited. Previous studies have focused on two-factor multilevel confirmatory factor analysis (MCFA) models only with an identical factor structure across levels (Boulton, 2011; Hsu et al., 2015, 2016; Jung, 2016; Ryu, 2011; Ryu & West, 2009; Schermelleh-Engel et al., 2014). Jung (2016) tested a three-factor MCFA model, but it also had the same factor structure across levels. The models used in previous studies could be inappropriate when the meaning and conceptual attributes of the construct differ depending on the analysis level (Dunn et al., 2015; Huang & Cornell, 2016). Such models may be difficult to apply in real data settings. In this context, Ryu and West (2009) suggested that it is necessary to investigate the performance of the LS approach in MCFA models with different factor structures across levels. Accordingly, this study extends the evaluation of the LS approach to MCFA models with different factor structures across levels.

In summary, this study focuses on the performance of the LS fit evaluation for MCFA models with different types of factor structures across levels. To this end, five design factors are examined: ICC, number of groups (NG), GS, GB, and MT. Taken together, the results of this study help fill the gap in the literature by considering different types of MCFA models and additional design factors. Furthermore, we aim to present useful guidelines for empirical researchers on using the LS approach under various conditions. The following research questions are addressed using a simulation study:

Research Question 1 (RQ1): How does the performance of the LS approach compare with that of the SI approach for three types of MCFA models with the same or different factor models across levels?

Research Question 2 (RQ2): How does the performance of the LS approach compare with that of the SI approach across five design factors?

Research Question 3 (RQ3): What are the interaction effects between ICC and MT on the performance of the LS approach?

Literature Review

Multilevel Structural Equation Model

For MSEM, the group is a simple random sample from a population, whereas the individual is a simple random sample within each group. Therefore, the individual observations are not completely independent. Assuming that the data are collected from $N$ ( $i = 1, 2, \dots, N$ ) individuals nested within $J$ ( $j = 1, 2, \dots, J$ ) groups, the number of individuals in the $j$th group is represented as $n_{j}$ . The total number of individual observations is $N = \sum_{j = 1}^{J} n_{j}$ .

Let $y_{ij}$ represent a $p \times 1$ data vector of individual-level variables for individual $i$ in the $j$th group, where $p$ is the number of individual-level variables. Then, the random variation of individual-level variables ( $y_{ij}$ ) consists of between-group differences ( $y_{Bj}$ ) and within-group differences ( $y_{Wij}$ ), as shown in

y_{ij} = y_{Bj} + y_{Wij}

(1)

The covariance structure of $y_{ij}$ can be derived under two assumptions. The first assumption is that the between-group random components are uncorrelated with the within-group random components. The second assumption is that the covariance structure at the between-group level is the same for all groups. Under these two assumptions, the covariance of $y_{ij}$ is expressed as

Cov (y_{ij}) = Cov (y_{Bj} + y_{Wij}) = Cov (y_{Bj}) + Cov (y_{Wij})

(2)

Σ_{T} = Σ_{B} + Σ_{W}

(3)

where $Σ_{B}$ and $Σ_{W}$ are the covariance matrices for the between- and within-group levels, respectively, and $Σ_{T}$ is the covariance matrix of the total population.

For estimation, we used maximum likelihood estimation (MLE), which estimates the parameters of a probability distribution by maximizing the likelihood function. It assumes that the observed variables follow a multivariate normal distribution. Specifically, the two assumptions are as follows (Liang & Bentler, 2004; Ryu & West, 2009). The first assumption is that the observed variables consist of two uncorrelated random components: within-group variation $y_{Wij}$ and between-group variation $y_{Bj}$ of individual-level variables. The second assumption is that the two random components are independent and follow a normal distribution at each level. Based on these two assumptions, the maximum likelihood fitting function for two-level covariance structure analysis is represented as

\begin{matrix} F_{ML} = \sum_{j = 1}^{J} (n_{j} - 1) [\log | Σ_{W} (θ) | + tr (Σ_{W}^{-1} (θ) S_{yWj})] + \sum_{j = 1}^{J} [\log | Σ_{gj} (θ) | + tr (Σ_{gj}^{-1} (θ) S_{gj})] \\ = \sum_{j = 1}^{J} (n_{j} - 1) f_{1j} (θ) + \sum_{j = 1}^{J} f_{2j} (θ) \end{matrix}

(4)

where $S_{yWj} = (n_{j} - 1)^{-1} \sum_{i = 1}^{n_{j}} (y_{ij} - \bar{y}_{j}) (y_{ij} - \bar{y}_{j})'$ , $\bar{y}_{j} = n_{j}^{-1} \sum_{i = 1}^{n_{j}} y_{ij}$ , $\bar{y} = N^{-1} \sum_{j = 1}^{J} \sum_{i = 1}^{n_{j}} y_{ij}$ , $Σ_{gj} (θ) = Σ_{B} (θ) + n_{j}^{-1} Σ_{W} (θ)$ , and $S_{gj} = n_{j} (\bar{y}_{j} - \bar{y}) (\bar{y}_{j} - \bar{y})'$ . Here, $θ$ is the vector of model parameters.
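To illustrate how the terms in Equation 4 fit together, the following minimal Python sketch evaluates the fitting function for given model-implied matrices. It is not the estimation routine used in Mplus; the function name `two_level_fml`, the list-of-arrays data layout, and the requirement that every group has at least two members are assumptions made for this example. The sketch follows the definitions of $S_{yWj}$, $Σ_{gj}$, and $S_{gj}$ as stated above.

```python
import numpy as np

def two_level_fml(groups, sigma_w, sigma_b):
    """Evaluate Equation 4 for supplied Sigma_W(theta) and Sigma_B(theta).

    groups: list of (n_j x p) arrays, one per group, each with n_j >= 2
            (assumed layout for this sketch).
    """
    sigma_w = np.asarray(sigma_w, dtype=float)
    sigma_b = np.asarray(sigma_b, dtype=float)
    grand_mean = np.vstack(groups).mean(axis=0)      # y-bar over all N individuals
    _, logdet_w = np.linalg.slogdet(sigma_w)
    sigma_w_inv = np.linalg.inv(sigma_w)

    total = 0.0
    for y in groups:
        n_j = y.shape[0]
        mean_j = y.mean(axis=0)                      # y-bar_j
        dev = y - mean_j
        s_w = dev.T @ dev / (n_j - 1)                # S_yWj
        f1 = logdet_w + np.trace(sigma_w_inv @ s_w)  # f_1j(theta)

        sigma_g = sigma_b + sigma_w / n_j            # Sigma_gj(theta)
        d = (mean_j - grand_mean)[:, None]
        s_g = n_j * (d @ d.T)                        # S_gj as defined in the text
        _, logdet_g = np.linalg.slogdet(sigma_g)
        f2 = logdet_g + np.trace(np.linalg.inv(sigma_g) @ s_g)  # f_2j(theta)

        total += (n_j - 1) * f1 + f2
    return total
```

In an actual analysis, an optimizer would search over $θ$ to minimize this value; here the two matrices are simply supplied by the caller.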

Simultaneous Fit Evaluation

The SI approach is based on the model fit evaluation for single-level SEM. It evaluates the model fit for the entire model across both levels. The following paragraphs describe the formulation and features of chi-square test statistics and fit indices (i.e., $RMSEA_{SI}$ , $CFI_{SI}$ , and $TLI_{SI}$ ) derived from the SI approach.

Simultaneous Chi-Square Test Statistics

The chi-square test is a statistical method for evaluating the goodness of fit of a model. When the observed variables follow a multivariate normal distribution and the sample size is large enough, the product of $f_{ML}$ (maximum likelihood fitting function) and $n - 1$ (sample size minus 1) follows the $χ_{df}^{2}$ distribution. The null hypothesis ( $H_{0}$ ) of the $χ^{2}$ test is $Σ = Σ (θ)$ , where $Σ$ is the population covariance matrix and $Σ (θ)$ is the covariance matrix implied by the research model. If the null hypothesis of the $χ^{2}$ test is not rejected, the model is interpreted to fit the data well. As the value of $χ^{2}$ increases, the model fit worsens. When the value of $χ^{2}$ approaches 0, the model fit improves. If the value of $χ^{2}$ equals zero, it is a perfect fit. In this study, the $χ^{2}$ based on the SI approach is expressed as $χ_{SI}^{2}$ .

Simultaneous RMSEA

$RMSEA_{SI}$ (Browne & Cudeck, 1992), an absolute fit index, is based on the noncentral $χ^{2}$ distribution. It is sensitive to the number of estimated parameters and is regarded as one of the most informative fit indices. According to conventional model fit criteria, an $RMSEA_{SI}$ below .06 indicates a good fit (Hu & Bentler, 1999). The formula is represented as

RMSEA_{SI} = \sqrt{\frac{χ_{H}^{2} - df_{H}}{df_{H} (n - 1)}}

(5)

where $χ_{H}^{2}$ is the value of $χ^{2}$ and $df_{H}$ is the degrees of freedom in the hypothesized model; $n$ is the sample size.

Simultaneous CFI

$CFI_{SI}$ (Bentler, 1990), a relative fit index, is based on the noncentral $χ^{2}$ distribution. It evaluates the model fit by comparing the fit of a hypothesized model with that of an independence model. The values of $CFI_{SI}$ range from 0 to 1, indicating a good fit for the model when the value exceeds .95 (Hu & Bentler, 1999). The formula is represented as

CFI_{SI} = 1 - \frac{Max (χ_{H}^{2} - df_{H}, 0)}{Max (χ_{H}^{2} - df_{H}, χ_{I}^{2} - df_{I})}

(6)

where $χ_{H}^{2}$ is the value of $χ^{2}$ and $df_{H}$ is the degrees of freedom in the hypothesized model, and $χ_{I}^{2}$ is the value of $χ^{2}$ and $df_{I}$ is the degrees of freedom in the independence model.

Simultaneous TLI

The $TLI_{SI}$ (Tucker & Lewis, 1973) considers the parsimony of the model. Therefore, if the fit indices of two models are similar, a simpler model (i.e., greater degrees of freedom) is chosen. $TLI_{SI}$ is not normed, so it can take a value less than 0 or greater than 1. It indicates a good fit for the model when the value exceeds .95 (Hu & Bentler, 1999). The formula is represented as

TLI_{SI} = \frac{χ_{I}^{2} / df_{I} - χ_{H}^{2} / df_{H}}{χ_{I}^{2} / df_{I} - 1}

(7)
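Equations 5 to 7 can be sketched as small Python functions. The function names and the numerical inputs are hypothetical and serve only as an illustration. Two guards are added as assumptions that do not appear in the equations: the RMSEA numerator is floored at zero so that the square root is defined, and the CFI returns 1 when its denominator is not positive.

```python
import math

def rmsea_si(chi2_h, df_h, n):
    """Equation 5; the floor at zero is an added assumption."""
    return math.sqrt(max(chi2_h - df_h, 0.0) / (df_h * (n - 1)))

def cfi_si(chi2_h, df_h, chi2_i, df_i):
    """Equation 6, comparing hypothesized (H) and independence (I) models."""
    numerator = max(chi2_h - df_h, 0.0)
    denominator = max(chi2_h - df_h, chi2_i - df_i)
    if denominator <= 0.0:
        return 1.0  # assumption: no measurable misfit in either model
    return 1.0 - numerator / denominator

def tli_si(chi2_h, df_h, chi2_i, df_i):
    """Equation 7; not normed, so values can fall outside [0, 1]."""
    ratio_i = chi2_i / df_i
    return (ratio_i - chi2_h / df_h) / (ratio_i - 1.0)

# Hypothetical chi-square values, for illustration only
print(rmsea_si(20.0, 8, 201))
print(cfi_si(20.0, 8, 500.0, 15))
print(tli_si(20.0, 8, 500.0, 15))
```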

Level-Specific Fit Evaluation

To implement the LS approach, the PS (Ryu & West, 2009), the SEG (Yuan & Bentler, 2007), and equivalence testing (Marcoulides & Yuan, 2020) methods have been proposed. We focus on the PS method to calculate the LS fit indices because the good performance of the PS method has been repeatedly verified and many researchers have recommended the PS method for evaluating model fit in MSEM (Jung, 2016; Marcoulides & Yuan, 2020; Rappaport et al., 2020; Ryu & West, 2009). In the PS method, the model fit statistics are calculated using a saturated model that freely estimates the covariances among all observed variables. The following paragraphs describe the formulation and features of the chi-square test statistics and fit indices derived from the PS method. Note that the subscripts PS_B and PS_W are used to denote LS statistics derived from the PS method.

Level-Specific Chi-Square Statistics

The value of $χ^{2}$ can be calculated separately for each level. In the model where the hypothesized model is set at the between-group level and the saturated model is set at the within-group level, the value of $χ_{PS\_B}^{2}$ can be used to detect misspecification in the between-level model. The value of $χ_{PS\_B}^{2}$ is represented as

χ_{PS\_B}^{2} = F_{ML} [Σ_{B} (\hat{θ}), Σ_{W} (\hat{θ}_{S})] - F_{ML} [Σ_{B} (\hat{θ}_{S}), Σ_{W} (\hat{θ}_{S})]

(8)

where $θ$ is the vector of parameters in the hypothesized model and $θ_{S}$ is the vector of parameters in the saturated model. $F_{ML} [Σ_{B} (\hat{θ}), Σ_{W} (\hat{θ}_{S})]$ is the fitting-function value when the hypothesized model is specified at the between-group level and the saturated model at the within-group level, whereas $F_{ML} [Σ_{B} (\hat{θ}_{S}), Σ_{W} (\hat{θ}_{S})]$ is the fitting-function value when the model is saturated at both levels (fully saturated model). The degrees of freedom are the difference between the number of parameters in the fully saturated model ( $df_{W, S} + df_{B, S}$ ) and the number of parameters in the PS model ( $df_{W, S} + df_{B}$ ). It can be computed as

df_{PS\_B} = (df_{W, S} + df_{B, S}) - (df_{W, S} + df_{B}) = df_{B, S} - df_{B}

(9)

Similarly, misspecification in the within-level model can be detected. The hypothesized model is set at the within-group level and the saturated model is set at the between-group level. The value of $χ_{PS\_W}^{2}$ is represented as

χ_{PS\_W}^{2} = F_{ML} [Σ_{B} (\hat{θ}_{S}), Σ_{W} (\hat{θ})] - F_{ML} [Σ_{B} (\hat{θ}_{S}), Σ_{W} (\hat{θ}_{S})]

(10)

$F_{ML} [Σ_{B} (\hat{θ}_{S}), Σ_{W} (\hat{θ})]$ is the fitting-function value when the saturated model is specified at the between-group level and the hypothesized model at the within-group level, whereas $F_{ML} [Σ_{B} (\hat{θ}_{S}), Σ_{W} (\hat{θ}_{S})]$ is the fitting-function value when the model is saturated at both levels (fully saturated model). The degrees of freedom are the difference between the number of parameters in the fully saturated model ( $df_{W, S} + df_{B, S}$ ) and the number of parameters in the PS model ( $df_{W} + df_{B, S}$ ). It can be computed as

df_{PS\_W} = (df_{W, S} + df_{B, S}) - (df_{W} + df_{B, S}) = df_{W, S} - df_{W}

(11)
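The logic of Equations 8 to 11 reduces to two subtractions, which the following Python sketch makes explicit. The function name, argument names, and numerical inputs are hypothetical; in particular, the fitting-function values would come from two separate model runs (partially saturated and fully saturated), and the parameter counts shown assume six observed variables at the level of interest.

```python
def level_specific_chi2(f_partially_saturated, f_fully_saturated,
                        n_params_saturated_level, n_params_hypothesized_level):
    """Level-specific chi-square and df from the PS method (Equations 8-11).

    f_partially_saturated: fitting-function value with the hypothesized model
        at the level of interest and a saturated model at the other level.
    f_fully_saturated: fitting-function value with both levels saturated.
    The two parameter counts refer only to the level of interest, because the
    counts for the other (saturated) level cancel out.
    """
    chi2 = f_partially_saturated - f_fully_saturated
    df = n_params_saturated_level - n_params_hypothesized_level
    return chi2, df

# Hypothetical between-level example: 21 free covariance parameters in the
# saturated between model versus 14 in the hypothesized between model.
chi2_ps_b, df_ps_b = level_specific_chi2(1250.0, 1235.5, 21, 14)
print(chi2_ps_b, df_ps_b)
```

The same function serves for the within-group level by swapping which level carries the hypothesized model.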

Level-Specific RMSEAs

With $χ_{PS\_B}^{2}$ and its corresponding $df_{PS\_B}$ , RMSEA at the between-group level ( $RMSEA_{PS\_B}$ ) is expressed as

RMSEA_{PS\_B} = \sqrt{MAX \left( \frac{χ_{PS\_B}^{2} - df_{PS\_B}}{df_{PS\_B} \, J}, 0 \right)}

(12)

where $J$ is the number of groups (NG), and $χ_{PS\_B}^{2} - df_{PS\_B}$ is an unbiased estimate of the noncentrality parameter.

$RMSEA_{PS\_B}$ includes $J$ in the denominator as a penalty, which offsets the growth of $χ_{PS\_B}^{2}$ with the number of groups. If $χ_{PS\_B}^{2}$ is less than $df_{PS\_B}$ ( $χ_{PS\_B}^{2} - df_{PS\_B} < 0$ ), $RMSEA_{PS\_B}$ is zero. With $χ_{PS\_W}^{2}$ and its corresponding $df_{PS\_W}$ , RMSEA at the within-group level ( $RMSEA_{PS\_W}$ ) is expressed as

RMSEA_{PS\_W} = \sqrt{MAX \left( \frac{χ_{PS\_W}^{2} - df_{PS\_W}}{df_{PS\_W} (N - J)}, 0 \right)}

(13)
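A minimal Python sketch of Equations 12 and 13 follows. The function names and the chi-square values are hypothetical and chosen only to show how the two denominators differ.

```python
import math

def rmsea_ps_b(chi2_ps_b, df_ps_b, n_groups):
    """Equation 12: between-level RMSEA, scaled by the number of groups J."""
    return math.sqrt(max((chi2_ps_b - df_ps_b) / (df_ps_b * n_groups), 0.0))

def rmsea_ps_w(chi2_ps_w, df_ps_w, n_total, n_groups):
    """Equation 13: within-level RMSEA, scaled by N - J."""
    return math.sqrt(max((chi2_ps_w - df_ps_w) / (df_ps_w * (n_total - n_groups)), 0.0))

# Hypothetical inputs: the same excess chi-square at each level
print(rmsea_ps_b(14.5, 7, 100))         # J = 100 groups
print(rmsea_ps_w(14.5, 7, 5000, 100))   # N = 5,000 individuals in 100 groups
print(rmsea_ps_b(5.0, 7, 100))          # chi-square below df gives zero
```

Because $N - J$ is usually far larger than $J$, the same excess of $χ^{2}$ over its degrees of freedom translates into a much smaller RMSEA at the within-group level than at the between-group level.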

Level-Specific CFIs

$χ_{PS\_B}^{2}$ , $χ_{PS\_W}^{2}$ , $df_{PS\_B}$ , and $df_{PS\_W}$ are used to calculate LS CFIs for each level. $χ_{I\_B, S\_W}^{2}$ represents the value of $χ^{2}$ for the model with the saturated model at the within-group level and the independence model at the between-group level ( $[Σ_{B} (\hat{θ}_{I}), Σ_{W} (\hat{θ}_{S})]$ , where $θ_{I}$ is the vector of parameters in the independence model). $df_{I\_B, S\_W}$ is defined as the difference between $df_{B, S}$ and $df_{B, I}$ . The formula is shown in

CFI_{PS\_B} = 1 - \frac{Max [(χ_{PS\_B}^{2} - df_{PS\_B}), 0]}{Max [(χ_{I\_B, S\_W}^{2} - df_{I\_B, S\_W}), 0]}

(14)

The calculation method for $CFI_{PS\_W}$ is similar to that of $CFI_{PS\_B}$ . $χ_{S\_B, I\_W}^{2}$ represents the value of $χ^{2}$ for the model with the independence model at the within-group level and the saturated model at the between-group level ( $[Σ_{B} (\hat{θ}_{S}), Σ_{W} (\hat{θ}_{I})]$ , where $θ_{I}$ is the vector of parameters in the independence model). $df_{S\_B, I\_W}$ is defined as the difference between $df_{W, S}$ and $df_{W, I}$ . The formula is shown in

CFI_{PS\_W} = 1 - \frac{Max [(χ_{PS\_W}^{2} - df_{PS\_W}), 0]}{Max [(χ_{S\_B, I\_W}^{2} - df_{S\_B, I\_W}), 0]}

(15)

Level-Specific TLIs

TLI can evaluate the model fit by comparing the independence model with the hypothesized model. The methods for calculating $TLI_{PS\_B}$ and $TLI_{PS\_W}$ are shown in

TLI_{PS\_B} = \frac{(χ_{I\_B, S\_W}^{2} / df_{I\_B, S\_W}) - (χ_{PS\_B}^{2} / df_{PS\_B})}{(χ_{I\_B, S\_W}^{2} / df_{I\_B, S\_W}) - 1}

(16)

TLI_{PS\_W} = \frac{(χ_{S\_B, I\_W}^{2} / df_{S\_B, I\_W}) - (χ_{PS\_W}^{2} / df_{PS\_W})}{(χ_{S\_B, I\_W}^{2} / df_{S\_B, I\_W}) - 1}

(17)
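Because Equations 14 to 17 have the same form at both levels, a single pair of Python functions can illustrate them. The names and inputs below are hypothetical. Returning 1 when the CFI denominator is zero is an added assumption, since the equations do not define that case.

```python
def cfi_ls(chi2_ps, df_ps, chi2_indep, df_indep):
    """Level-specific CFI (Equations 14 and 15).

    chi2_indep, df_indep: chi-square and df for the model with the independence
    model at the level of interest and a saturated model at the other level.
    """
    numerator = max(chi2_ps - df_ps, 0.0)
    denominator = max(chi2_indep - df_indep, 0.0)
    if denominator == 0.0:
        return 1.0  # assumption: undefined case treated as perfect relative fit
    return 1.0 - numerator / denominator

def tli_ls(chi2_ps, df_ps, chi2_indep, df_indep):
    """Level-specific TLI (Equations 16 and 17)."""
    ratio_indep = chi2_indep / df_indep
    return (ratio_indep - chi2_ps / df_ps) / (ratio_indep - 1.0)

# Hypothetical between-level values, for illustration only
print(cfi_ls(14.5, 7, 165.0, 15))
print(tli_ls(14.5, 7, 165.0, 15))
```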

Level-Specific Standardized Root Mean Squared Residuals (SRMRs)

SRMR is an alternative LS fit index produced by Mplus, namely, SRMR for the within-group level ( $SRMR_{W}$ ) and between-group level ( $SRMR_{B}$ ). It can be represented by

SRMR = \sqrt{\sum_{i = 1}^{p} \sum_{j = 1}^{i} \frac{[r_{ij} / (\sqrt{Var_{i}} \sqrt{Var_{j}})]^{2}}{p (p + 1) / 2}}

(18)

where $r_{ij}$ is the element in the $i$th row and $j$th column of the residual matrix and $p$ is the number of observed variables. $Var_{i}$ and $Var_{j}$ represent the variances of the $i$th and $j$th variables in the covariance matrix, respectively.
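Equation 18 can be sketched in Python with NumPy as follows. The function name `srmr` is hypothetical, and the sketch assumes that the residual matrix is the sample covariance matrix minus the model-implied covariance matrix and that the variances are taken from the sample covariance matrix. It illustrates the formula and is not necessarily identical to the Mplus computation.

```python
import numpy as np

def srmr(sample_cov, implied_cov):
    """Equation 18 applied to one level's covariance matrices."""
    s = np.asarray(sample_cov, dtype=float)
    sigma = np.asarray(implied_cov, dtype=float)
    p = s.shape[0]
    residual = s - sigma                         # r_ij (assumed: sample minus implied)
    sd = np.sqrt(np.diag(s))                     # sqrt(Var_i), from the sample matrix
    standardized = residual / np.outer(sd, sd)
    rows, cols = np.tril_indices(p)              # j <= i, diagonal included
    return float(np.sqrt(np.sum(standardized[rows, cols] ** 2) / (p * (p + 1) / 2)))

# Hypothetical two-variable example
print(srmr([[1.0, 0.5], [0.5, 1.0]], [[1.0, 0.3], [0.3, 1.0]]))
```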

Previous Simulation Studies

Many studies demonstrated the effectiveness of the LS approach to detect the lack of model fit at any level in MSEM, whereas the SI approach failed to detect misspecified between-level models (Boulton, 2011; Hsu et al., 2015, 2016; Jung, 2016; Ryu, 2011; Ryu & West, 2009; Schermelleh-Engel et al., 2014; Sessoms, 2019). These studies examined the performance of LS and SI fit evaluation relating to the analysis model and design factor. The details are as follows.

Studies have considered several analysis models such as a two-factor MCFA model (Boulton, 2011; Hsu et al., 2015, 2016; Ryu & West, 2009), a three-factor MCFA model (Jung, 2016), and a moderating effect model (Schermelleh-Engel et al., 2014). Although the performance of the LS approach was verified in several analysis models, they focused only on the MCFA model with an identical factor structure across levels. However, the conceptual characteristics of a construct might have different meanings depending on the level of analysis in real data. Specifically, the collective variables, such as positive values (Huang & Cornell, 2016), school environment (Dunn et al., 2015), or collective efficacy (Huang et al., 2015) are reported to have different interpretations across levels. For example, Huang and Cornell (2016) found that positive values have a two-factor structure at the within-group level and a one-factor structure at the between-group level. Therefore, the performance of the LS approach for MCFA models with different factor structures across levels should be investigated.

Regarding design factors, previous studies have considered ICC, sample size, GB, and MT. First, ICC was related to the performance of LS fit indices (Boulton, 2011; Hsu et al., 2016; Jung, 2016). LS fit indices were more promising for detecting misspecified between-level models with an increase in ICC (Boulton, 2011; Hsu et al., 2016). However, the ICC factor did not affect the performance of SI fit indices.

Second, several studies have considered sample size as a design factor (Boulton, 2011; Hsu et al., 2015; Ryu & West, 2009). Boulton (2011) adopted five levels of sample size ranging from 1,000 to 2,500 and showed that the performance of both SI and LS approaches improved with a decrease in sample size. In addition, if the NG increased with a specific GS (i.e., 20), the performance of both SI and LS fit indices improved. However, other studies found that their performance did not change across different sample sizes (Hsu et al., 2015; Ryu & West, 2009). These studies focused only on large sample sizes (e.g., 2,500–50,000); therefore, a wider range of sample sizes should be investigated.

Third, GB might be related to the performance of SI and LS approaches (Hox et al., 2010; Hox & Maas, 2001; Jung, 2016). Based on reported results, convergence rates were generally higher with equal GSs than with unequal GSs (Hox & Maas, 2001; Jung, 2016). Furthermore, the chi-square model test was more accurate in the balanced group than in the unbalanced group for an MCFA model (Hox & Maas, 2001). Recently, Jung (2016) showed that the performance of SI and LS fit indices was consistent regardless of GB. However, she examined only a limited GB condition with small GSs (e.g., 5–75). Thus, further research is needed to explore the effect of GB with a wider range of GSs.

Finally, MT, which is defined as MM or SM, affected the performance of LS and SI approaches. According to Hsu et al. (2015) and Jung (2016), the SI fit indices were more sensitive to MM than to SM. In contrast, SRMRs were more sensitive to SM than to MM at either level. Moreover, the effect of ICC on the LS approach’s performance might depend on MT. Boulton (2011) showed that the performance of $RMSEA_{PS\_B}$ improved with an increase in ICC for most levels of misspecification severity. In the case of $CFI_{PS\_B}$ or $TLI_{PS\_B}$ , their performance improved with an increase in ICC when the misspecification severity was low, whereas their performance improved with a decrease in ICC when the misspecification severity was high. Boulton (2011) suggested that MT might play a critical role in the effect of ICC on the performance of LS fit indices.

Method

Data Generation

Data were generated based on three types of population models (Figure 1). Model₁ was set as a two-factor MCFA model with the same factor structure across levels, whereas Model_2A was set as a two-factor structure at the within-group level and a one-factor structure at the between-group level. In addition, Model_2B was set as a one-factor structure at the within-group level and a two-factor structure at the between-group level. All three models have six observed variables ( $y_{1}$ to $y_{6}$ ) at each level, and cross-loadings (a, c) were added to set model misspecification conditions.

Figure 1.

(A) Model₁, (B) Model_2A, and (C) Model_2B

Table 1 provides all population parameter values for the three models. For three ICC levels (.10, .20, and .30), factor loadings and residual variances were manipulated. Residual variances were uncorrelated. In addition, the factor variance was fixed at 1, and covariance was fixed at .5 for all cases, based on previous studies (Hsu et al., 2015; Jung, 2016; Sessoms, 2019).

Table 1.

Population Parameter Values for Data Generation

Model	Factor	Variable	ICC = .10		ICC = .20		ICC = .30	
			$λ$	$γ$	$λ$	$γ$	$λ$	$γ$
Model₁	$F_{W_{1}}$	$y_{1}$ - $y_{3}$	.900	.990	.850	.878	.800	.760
	$F_{W_{1}}$	$y_{4}$	.700	.950	.650	.870	.600	.790
	$F_{W_{2}}$		.600	.950	.550	.870	.500	.790
	$F_{W_{2}}$	$y_{5}$ - $y_{6}$	.900	.990	.850	.878	.800	.760
	$F_{b_{1}}$	$y_{1}$ - $y_{3}$	.350	.080	.550	.100	.700	.110
	$F_{b_{1}}$	$y_{4}$	.270	.075	.400	.140	.500	.190
	$F_{b_{2}}$		.230	.075	.315	.140	.400	.190
	$F_{b_{2}}$	$y_{5}$ - $y_{6}$	.350	.080	.550	.100	.700	.110
Model_2A	$F_{W_{1}}$	$y_{1}$ - $y_{3}$	.900	.990	.850	.878	.800	.760
	$F_{W_{1}}$	$y_{4}$	.700	.950	.650	.870	.600	.790
	$F_{W_{2}}$		.600	.950	.550	.870	.500	.790
	$F_{W_{2}}$	$y_{5}$ - $y_{6}$	.900	.990	.850	.878	.800	.760
	$F_{b_{1}}$	$y_{1}$ - $y_{6}$	.350	.080	.550	.100	.700	.110
Model_2B	$F_{W_{1}}$	$y_{1}$ - $y_{6}$	.900	.990	.850	.878	.800	.760
	$F_{b_{1}}$	$y_{1}$ - $y_{3}$	.350	.080	.550	.100	.700	.110
	$F_{b_{1}}$	$y_{4}$	.270	.075	.400	.140	.500	.190
	$F_{b_{2}}$		.230	.075	.315	.140	.400	.190
	$F_{b_{2}}$	$y_{5}$ - $y_{6}$	.350	.080	.550	.100	.700	.110

Note. ICC = intraclass correlation; $λ$ = factor loading; $γ$ = residual variance.
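The parameter values in Table 1 can be checked against the target ICCs. The sketch below assumes the usual variance-ratio definition of an indicator-level ICC (between-level variance divided by total variance) for an indicator that loads on a single factor with variance 1; this definition is not spelled out in the text, and the function name `implied_icc` is hypothetical. The values are those listed for $y_{1}$ to $y_{3}$ in Model₁.

```python
def implied_icc(lambda_w, resid_w, lambda_b, resid_b, factor_var=1.0):
    """Indicator-level ICC implied by single-factor loadings and residual variances."""
    within_var = lambda_w ** 2 * factor_var + resid_w
    between_var = lambda_b ** 2 * factor_var + resid_b
    return between_var / (within_var + between_var)

# (within loading, within residual, between loading, between residual)
# for y1-y3 in Model 1, keyed by the target ICC from Table 1
table1_y1_to_y3 = {
    0.10: (0.900, 0.990, 0.350, 0.080),
    0.20: (0.850, 0.878, 0.550, 0.100),
    0.30: (0.800, 0.760, 0.700, 0.110),
}

for target, params in table1_y1_to_y3.items():
    print(target, round(implied_icc(*params), 3))
```

Under these assumptions, the implied values are approximately .101, .201, and .300, which agree with the three intended ICC levels.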

For data generation, a Monte Carlo simulation was conducted using Mplus 8.3 (Muthén & Muthén, 1998–2019), with TYPE = TWOLEVEL. Data sets were generated based on multivariate normality in all simulations and ML estimation was used.

Design Factors

Five design factors for simulations were considered: ICC, NG, GS, GB, and MT. The details of simulation conditions are described below.

Intraclass Correlation

A low ICC was related to low convergence rates and biased parameter estimates (Hox & Maas, 2001; Hsu et al., 2016). In a review of multilevel factor analysis applications (Kim et al., 2016), the ICC ranged from .13 to .34 on average. Previous simulation studies considered ICC levels ranging from .05 to .30 (Boulton, 2011; Hsu et al., 2016; Sessoms, 2019). In some cases, ICC was set at .50 (Hsu et al., 2016; Ryu & West, 2009), but this ICC level might be unusual for real data. This study assumed that ICC is generally less than .20 and rarely exceeds .30 in real data (Hox & Maas, 2001; Lüdtke et al., 2008). Accordingly, three levels of ICC (i.e., .10, .20, and .30) were chosen for this study.

Number of Groups

NG was related to the stability of the model estimation and convergence rates at the between-group level (Meuleman & Billiet, 2009; Wu et al., 2017). It has been reported that at least 40 to 60 groups are required for accurate estimation (Hox, 2010; Meuleman & Billiet, 2009). In previous simulation studies, NG generally ranged from 20 to 1,000 (Jung, 2016; Sessoms, 2019); however, NG = 1,000 is unrealistic for practical studies. Furthermore, the model estimation at the between-group level is sufficiently stable when NG exceeds 100 (Hox & Maas, 2001); therefore, we adopted three values of NG (20, 50, and 100).

Group Size

Although GS did not affect the accuracy of the parameter estimates and standard errors (Hox & Maas, 2001), it could be a critical design factor in detecting the misfit of MSEM (Jung, 2016). In Kim et al.’s (2016) literature review of multilevel factor analysis applications, the median and mean of GS were found to be about 19 and 26, respectively. Most previous simulation studies have examined GS in the range of 10 to 60 (Hsu et al., 2015; Wu & Kwok, 2012), while a large GS (GS = 100; Hsu et al., 2016; Ryu & West, 2009) was also considered. This study adopted three GSs (20, 50, and 100) based on previous studies.

Group Balance

The convergence rates were higher in the balanced case with equal GS than in the unbalanced case with unequal GS (Hox & Maas, 2001; Jung, 2016). In previous studies, the unbalanced case was manipulated with half of the groups having a small GS and the other half having a large GS (Hox & Maas, 2001; Jung, 2016). Furthermore, the large GS was set to be 3 times larger than the small GS to maximize the unbalanced effect (Hox et al., 2010; Hox & Maas, 2001). In line with previous studies, we manipulated the unbalanced conditions to (10, 30), (25, 75), and (50, 150).
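The balanced and unbalanced group-size patterns can be sketched as follows. The function name `group_sizes` and the ordering of small and large groups are assumptions for illustration. The sketch relies on the observation that each unbalanced pair, (10, 30), (25, 75), and (50, 150), averages to the corresponding balanced GS of 20, 50, or 100.

```python
def group_sizes(n_groups, mean_size, balanced=True):
    """Return a list of group sizes for one simulation condition.

    Unbalanced case: half of the groups are small and half are large,
    with the large size 3 times the small size and the same average size.
    """
    if balanced:
        return [mean_size] * n_groups
    small = mean_size // 2          # e.g., 20 -> 10
    large = 3 * small               # e.g., 10 -> 30
    half = n_groups // 2
    return [small] * half + [large] * (n_groups - half)

for gs in (20, 50, 100):
    sizes = group_sizes(20, gs, balanced=False)
    print(gs, sorted(set(sizes)), sum(sizes))
```

With an even number of groups, the total sample size is therefore the same in the balanced and unbalanced versions of a condition.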

Misspecification Type

This study manipulated the values of a, b, c, and d to create the misspecified models (Figure 1). For Model₁ (Figure 1A), five types of models were compared: (a) the correct model, which was the same as the population model; (b) MM in the within-level model only (a = 0 in Figure 1A); (c) SM in the within-level model only (b = 0 in Figure 1A); (d) MM in the between-level model only (c = 0 in Figure 1A); and (e) SM in the between-level model only (d = 0 in Figure 1A). For Model_2A, three types of models were explored: the correct model and MM or SM in the within-level model only (a = 0 or b = 0 in Figure 1B). In contrast, for Model_2B, the three types were the correct model and MM or SM in the between-level model only (c = 0 or d = 0 in Figure 1C).

Analysis

As stated above, 270 conditions (3 ICC × 3 NG × 3 GS × 2 GB × 5 MT) were manipulated for Model₁, whereas 162 conditions (3 ICC × 3 NG × 3 GS × 2 GB × 3 MT) were manipulated for each of Model_2A and Model_2B. For each of the 594 conditions, 500 replications were generated, and 500 replications without convergence problems were included for further analysis.
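The condition counts can be checked by enumerating the fully crossed design. In the sketch below, the factor levels are taken from the design description, while the string labels for GB and MT are hypothetical names chosen for readability.

```python
from itertools import product

ICC = (0.10, 0.20, 0.30)
NG = (20, 50, 100)
GS = (20, 50, 100)
GB = ("balanced", "unbalanced")

# Misspecification types per analysis model (labels are illustrative).
MT = {
    "Model_1": ("correct", "MM_within", "SM_within", "MM_between", "SM_between"),
    "Model_2A": ("correct", "MM_within", "SM_within"),
    "Model_2B": ("correct", "MM_between", "SM_between"),
}


def design_cells(model):
    """All combinations of design factor levels for one analysis model."""
    return list(product(ICC, NG, GS, GB, MT[model]))


counts = {model: len(design_cells(model)) for model in MT}
total = sum(counts.values())
print(counts, total)
```

The enumeration gives 270 cells for Model₁ and 162 for each of the other two models, for 594 conditions in total.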

The performance of SI and LS approaches was evaluated by reporting convergence rates, $\chi^{2}$ test statistics, and fit indices. For $\chi^{2}$ test statistics, rejection rates at a significance level of .05 were considered (Hsu et al., 2016; Jung, 2016; Ryu & West, 2009). For fit indices, LS fit indices ($RMSEA_{PS\_W}$, $RMSEA_{PS\_B}$, $CFI_{PS\_W}$, $CFI_{PS\_B}$, $TLI_{PS\_W}$, and $TLI_{PS\_B}$) based on the PS method (Ryu & West, 2009) and SI fit indices (i.e., $RMSEA_{SI}$, $CFI_{SI}$, and $TLI_{SI}$) were considered. In addition, $SRMR_{W}$ and $SRMR_{B}$ produced by Mplus were presented.

If needed, two-way analysis of variance (ANOVA) was conducted to verify the interaction effects between ICC and MT on the performance of LS fit indices. The dependent variable was the fit index mean difference (i.e., the fit index of the correct model minus that of the misspecified model; Jung, 2016). That is, a large absolute value of the dependent variable was taken to indicate that a fit index was more promising for detecting misspecified between-level models (i.e., better performance of the fit index). In addition, the total sum of squares (SOS) of each fit index reflected the variability of the corresponding fit index under a specific condition. Furthermore, the partial eta-squared measure ($\eta_{p}^{2}$) was the effect size indicating the proportion of variance accounted for by a specific design factor effect or interaction effect term; $\eta_{p}^{2}$ was obtained by dividing the Type III SOS of the effect by the corrected total SOS. It was interpreted using Cohen’s (1988) guidelines (small: .01, medium: .06, and large: .14).
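The effect size computation described above can be sketched for a balanced two-way layout, in which the sequential and Type III sums of squares coincide. The function names are hypothetical, and the toy values are made up purely to exercise the code; they are not results from this study. The ratio follows the description in the text (effect SOS divided by the corrected total SOS).

```python
def two_way_ss(cells):
    """Sums of squares for a balanced two-way layout.

    cells[i][j] is the list of replicate values for level i of factor A
    (e.g., ICC) and level j of factor B (e.g., MT).
    """
    n_a, n_b, n = len(cells), len(cells[0]), len(cells[0][0])
    values = [y for row in cells for cell in row for y in cell]
    grand = sum(values) / len(values)
    cell_mean = [[sum(cell) / n for cell in row] for row in cells]
    a_mean = [sum(row) / n_b for row in cell_mean]
    b_mean = [sum(cell_mean[i][j] for i in range(n_a)) / n_a for j in range(n_b)]
    ss_a = n_b * n * sum((m - grand) ** 2 for m in a_mean)
    ss_b = n_a * n * sum((m - grand) ** 2 for m in b_mean)
    ss_ab = n * sum(
        (cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
        for i in range(n_a) for j in range(n_b)
    )
    ss_error = sum(
        (y - cell_mean[i][j]) ** 2
        for i in range(n_a) for j in range(n_b) for y in cells[i][j]
    )
    ss_total = sum((y - grand) ** 2 for y in values)  # corrected total SOS
    return {"A": ss_a, "B": ss_b, "AxB": ss_ab, "error": ss_error, "total": ss_total}


def cohen_label(effect_size):
    """Label an effect size with the thresholds cited in the text (Cohen, 1988)."""
    if effect_size >= 0.14:
        return "large"
    if effect_size >= 0.06:
        return "medium"
    if effect_size >= 0.01:
        return "small"
    return "negligible"


# Made-up fit index differences: 3 ICC levels x 2 MT levels x 2 replicates.
toy = [
    [[0.06, 0.07], [0.07, 0.08]],
    [[0.09, 0.10], [0.05, 0.06]],
    [[0.10, 0.11], [0.04, 0.05]],
]
ss = two_way_ss(toy)
interaction_ratio = ss["AxB"] / ss["total"]
print(f"interaction SOS / total SOS = {interaction_ratio:.3f} "
      f"({cohen_label(interaction_ratio)})")
```

Applied to the reported interaction effect sizes, `cohen_label` classifies .436, .619, and .623 as large and .139 as medium, since .139 falls just below the .14 threshold.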

Results

The performance of SI and LS approaches was evaluated for convergence rates, chi-square test statistics, and fit indices for all the analysis models. The pattern of results was generally consistent across the three models (i.e., Model₁, Model_2A, and Model_2B), so we focused on the results common to those models. Details of the results are presented in Lee (2020).

Convergence Rates

For all three correct models, the LS approach generally had higher convergence rates than the SI approach. The convergence rates of the SI approach were positively associated with NG or GS for the three models, while those of the LS approach were close to 1 across all design factors.

Nonconvergence occurred when the NG was 20 for all three models, consistent with the results of Jung (2016). Therefore, a total of 500 replications without convergence problems were included in the analyses. Note that the convergence rates of misspecified models were lower than those of the correct models.

Means of χ² Test Statistics and Fit Indices

Correct Model

For all correct models, the means of all $\chi^{2}$ test statistics ($\chi_{SI}^{2}$, $\chi_{PS\_W}^{2}$, and $\chi_{PS\_B}^{2}$) were close to the degrees of freedom, as expected when the model is correctly specified. When the criterion alpha level was .05, the rejection rates of $\chi_{PS\_W}^{2}$ were the lowest compared with those of $\chi_{SI}^{2}$ and $\chi_{PS\_B}^{2}$. Across all design factors, the patterns of all $\chi^{2}$ test statistics were consistent. However, the rejection rates of $\chi_{PS\_B}^{2}$ were more inflated (i.e., more likely to increase the Type I error rate) than those of $\chi_{PS\_W}^{2}$ or $\chi_{SI}^{2}$ across all design factors.
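The two benchmarks used here, a mean test statistic near the degrees of freedom and a rejection rate near the nominal .05, can be illustrated with a small simulation from the reference distribution. The degrees of freedom (8) and the function names are assumptions made for this sketch only; 15.507 is the upper 5% point of the chi-square distribution with 8 degrees of freedom.

```python
import random


def simulate_chi_square(df, n_reps, seed=1):
    """Draw chi-square variates as Gamma(shape=df/2, scale=2)."""
    rng = random.Random(seed)
    return [rng.gammavariate(df / 2.0, 2.0) for _ in range(n_reps)]


def rejection_rate(statistics, critical_value):
    """Share of replications whose statistic exceeds the critical value."""
    return sum(s > critical_value for s in statistics) / len(statistics)


DF = 8                # hypothetical model degrees of freedom
CRITICAL_05 = 15.507  # chi-square critical value for df = 8, alpha = .05

draws = simulate_chi_square(DF, n_reps=20000)
mean_stat = sum(draws) / len(draws)
rate = rejection_rate(draws, CRITICAL_05)
print(f"mean statistic = {mean_stat:.2f} (df = {DF}), rejection rate = {rate:.3f}")
```

A test statistic that follows its reference distribution has a mean close to the degrees of freedom and rejects about 5% of the time, so rejection rates well above .05 for a correct model signal an inflated Type I error rate.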

The results showed that both SI and LS fit indices indicated a good model fit across all design factors. Specifically, the means of $CFI_{SI}$ and $TLI_{SI}$ were greater than .999, whereas the mean of $RMSEA_{SI}$ was smaller than .008 across all design factors for the three models. For LS fit indices, the means of $CFI_{PS\_W}$ and $TLI_{PS\_W}$ were equal to 1, and the means of $RMSEA_{PS\_W}$ and $SRMR_{W}$ were smaller than .012 for the three models. In addition, the means of $CFI_{PS\_B}$ and $TLI_{PS\_B}$ exceeded .988, and the mean of $RMSEA_{PS\_B}$ was smaller than .048 for the three models. The mean of $SRMR_{B}$ was smaller than .049, but it increased slightly with a decrease in NG or ICC.

Misspecified Within-Level Model (Model₁, Model_2A)

For all types of misspecified models, the means of $\chi_{SI}^{2}$ were more inflated than those of $\chi_{PS\_W}^{2}$. In contrast, the rejection rates of both $\chi_{SI}^{2}$ and $\chi_{PS\_W}^{2}$ were equal to 1 across all design factors. Considering design factors, the means of $\chi_{SI}^{2}$ and $\chi_{PS\_W}^{2}$ were more inflated (i.e., more likely to detect the misspecified within-level model) with an increase in NG or GS. Furthermore, the means of $\chi_{SI}^{2}$ and $\chi_{PS\_W}^{2}$ were more inflated in SM than in MM. However, the effects of ICC or GB on the means of $\chi_{SI}^{2}$ or $\chi_{PS\_W}^{2}$ were negligible.

Tables 2 and 3 show the performance of SI and LS fit indices across design factors. No difference was found between SI and LS fit indices in detecting the misspecified within-level model. That is, both SI and LS fit indices were sensitive to model misspecification at the within-group level. In addition, the performance of SI and LS fit indices was generally consistent across all design factors. However, both were more sensitive (i.e., more likely to detect the misspecified within-level model) to SM than to MM.

Table 2.

Means of SI and LS Fit Indices in Misspecified Within-Level Model (Model₁)

			NG/GS
			50/20		100/20		50/50		100/50		50/100		100/100
	ICC	MT	SI	LS_W	SI	LS_W	SI	LS_W	SI	LS_W	SI	LS_W	SI	LS_W
RMSEA	.10	MM	—	—	.069	.097	.070	.097	.070	.097	.070	.097	.070	.097
		SM	—	—	.086	.121	.087	.120	.087	.121	.088	.121	.088	.121
	.20	MM	.068	.096	.069	.097	.070	.097	.070	.097	.071	.097	.071	.097
		SM	.086	.121	.086	.121	.087	.121	.087	.121	.088	.121	.088	.121
	.30	MM	.069	.097	.070	.098	.071	.099	.071	.098	.072	.098	.072	.099
		SM	.087	.122	.087	.122	.088	.122	.088	.122	.089	.122	.089	.122
CFI	.10	MM	—	—	.952	.949	.950	.948	.950	.949	.949	.949	.949	.949
		SM	—	—	.926	.920	.923	.921	.923	.920	.921	.920	.922	.920
	.20	MM	.954	.949	.954	.948	.950	.948	.950	.948	.949	.948	.949	.948
		SM	.927	.919	.928	.919	.923	.919	.923	.919	.921	.919	.921	.919
	.30	MM	.954	.948	.954	.947	.949	.946	.950	.946	.948	.946	.948	.946
		SM	.928	.917	.929	.918	.923	.918	.923	.918	.920	.918	.920	.918
TLI	.10	MM	—	—	.905	.904	.900	.903	.901	.904	.899	.904	.899	.904
		SM	—	—	.852	.850	.846	.851	.845	.850	.843	.850	.843	.850
	.20	MM	.908	.904	.907	.903	.900	.902	.901	.902	.898	.902	.898	.902
		SM	.854	.847	.855	.848	.846	.848	.846	.848	.842	.848	.842	.848
	.30	MM	.908	.902	.908	.900	.899	.899	.900	.900	.896	.900	.896	.899
		SM	.857	.845	.858	.846	.846	.846	.846	.846	.841	.846	.841	.846
SRMR	.10	MM		—		.046		.046		.045		.045		.045
		SM		—		.130		.130		.130		.130		.130
	.20	MM		.046		.046		.046		.045		.045		.045
		SM		.131		.130		.130		.130		.130		.130
	.30	MM		.047		.046		.046		.046		.046		.046
		SM		.132		.132		.131		.132		.132		.131

Note. SI = simultaneous; LS = level-specific; NG = number of groups; GS = group size; ICC = intraclass correlation; MT = misspecification type; MM = measurement misspecification; SM = structure misspecification; — = nonconvergence; RMSEA = root mean square error of approximation; CFI = comparative fit index; TLI = Tucker–Lewis index; SRMR = standardized root mean squared residual.

Table 3.

Means of SI and LS Fit Indices in Misspecified Within-Level Model (Model_2A)

			NG/GS
			50/20		100/20		50/50		100/50		50/100		100/100
	ICC	MT	SI	LS_W	SI	LS_W	SI	LS_W	SI	LS_W	SI	LS_W	SI	LS_W
RMSEA	.10	MM	.064	.096	.065	.096	.066	.097	.066	.097	.066	.097	.066	.097
		SM	.081	.121	.081	.121	.082	.121	.082	.121	.082	.121	.082	.121
	.20	MM	.064	.096	.065	.097	.066	.097	.066	.097	.066	.097	.066	.097
		SM	.081	.121	.081	.121	.082	.121	.082	.121	.083	.121	.083	.121
	.30	MM	.065	.098	.065	.098	.067	.098	.067	.098	.067	.098	.067	.099
		SM	.082	.122	.082	.122	.083	.122	.083	.122	.083	.122	.083	.122
CFI	.10	MM	.953	.949	.953	.949	.950	.949	.951	.949	.950	.949	.950	.949
		SM	.925	.919	.926	.920	.923	.920	.923	.920	.922	.920	.922	.920
	.20	MM	.955	.949	.955	.948	.951	.948	.951	.948	.949	.948	.949	.948
		SM	.929	.919	.929	.919	.924	.919	.924	.919	.921	.919	.922	.919
	.30	MM	.956	.947	.956	.947	.951	.946	.951	.946	.949	.946	.949	.946
		SM	.931	.917	.931	.918	.924	.918	.924	.918	.921	.918	.921	.918
TLI	.10	MM	.917	.947	.917	.947	.912	.946	.913	.946	.911	.946	.911	.946
		SM	.868	.849	.869	.850	.864	.850	.864	.850	.862	.850	.862	.850
	.20	MM	.920	.904	.920	.903	.913	.902	.914	.902	.911	.902	.911	.902
		SM	.874	.847	.875	.848	.866	.848	.866	.848	.861	.848	.862	.848
	.30	MM	.922	.901	.922	.900	.913	.900	.913	.900	.909	.900	.909	.899
		SM	.878	.845	.879	.846	.866	.846	.866	.846	.861	.846	.861	.846
SRMR	.10	MM		.046		.045		.045		.045		.045		.045
		SM		.131		.130		.130		.130		.130		.130
	.20	MM		.046		.046		.046		.045		.045		.045
		SM		.131		.130		.130		.130		.130		.130
	.30	MM		.047		.046		.046		.046		.046		.046
		SM		.132		.132		.132		.132		.132		.131

Note. SI = simultaneous; LS = level-specific; NG = number of groups; GS = group size; ICC = intraclass correlation; MT = misspecification type; MM = measurement misspecification; SM = structure misspecification; RMSEA = root mean square error of approximation; CFI = comparative fit index; TLI = Tucker–Lewis index; SRMR = standardized root mean squared residual.

Misspecified Between-Level Model (Model₁, Model_2B)

For the misspecified models, the means of $\chi_{SI}^{2}$ were more inflated than those of $\chi_{PS\_B}^{2}$. The rejection rates of $\chi_{SI}^{2}$ and $\chi_{PS\_B}^{2}$ exceeded .257 across all design factors. Considering the design factors, the means of $\chi_{SI}^{2}$ and $\chi_{PS\_B}^{2}$ were more inflated (i.e., more likely to detect the misspecified between-level model) with an increase in NG, GS, or ICC. Furthermore, the means of $\chi_{SI}^{2}$ and $\chi_{PS\_B}^{2}$ were more inflated in MM than in SM when ICC ranged from .20 to .30. However, the effect of GB on the means of $\chi_{SI}^{2}$ or $\chi_{PS\_B}^{2}$ was negligible. The rejection rates of $\chi_{SI}^{2}$ and $\chi_{PS\_B}^{2}$ increased with an increase in ICC, and they generally increased with an increase in NG or GS.

Tables 4 and 5 show the performance of SI and LS fit indices across design factors. The results showed that none of the design factors affected the performance of SI fit indices, whereas they did affect that of LS fit indices. Specifically, the performance of LS fit indices depended on MT. For MM, the means of $RMSEA_{PS\_B}$, $CFI_{PS\_B}$, $TLI_{PS\_B}$, and $SRMR_{B}$ ranged from .091 to .247, .896 to .932, .805 to .873, and .063 to .101, respectively, whereas for SM they ranged from .101 to .176, .890 to .952, .794 to .910, and .158 to .214, respectively. In addition, the mean of $RMSEA_{PS\_B}$ increased (i.e., more likely to detect the misspecified between-level model) with an increase in ICC or GS for both types of misspecification. However, the effect of ICC on the mean of $CFI_{PS\_B}$ or $TLI_{PS\_B}$ differed across types of MT. For MM, the means of $CFI_{PS\_B}$ and $TLI_{PS\_B}$ decreased (i.e., more likely to detect the misspecified between-level model) with an increase in ICC, whereas for SM, the means of $CFI_{PS\_B}$ and $TLI_{PS\_B}$ increased with an increase in ICC. The mean of $SRMR_{B}$ increased (i.e., more likely to detect the misspecified between-level model) with an increase in ICC, and it was more sensitive to SM than to MM. In summary, ICC, GS, or MT affected the performance of LS fit indices, but the effect of NG or GB on their performance was negligible.

Table 4.

Means of SI and LS Fit Indices in Misspecified Between-Level Model (Model₁)

			NG/GS
			50/20		100/20		50/50		100/50		50/100		100/100
	ICC	MT	SI	LS_B	SI	LS_B	SI	LS_B	SI	LS_B	SI	LS_B	SI	LS_B
RMSEA	.10	MM	—	—	.013	.094	.011	.124	.012	.122	.009	.138	.009	.136
		SM	—	—	.015	.105	.012	.127	.012	.126	.009	.141	.009	.135
	.20	MM	.025	.168	.027	.169	.019	.197	.020	.196	.014	.208	.015	.207
		SM	.021	.146	.022	.142	.015	.156	.015	.155	.011	.164	.011	.159
	.30	MM	.032	.213	.035	.216	.024	.240	.024	.239	.017	.247	.018	.245
		SM	.024	.161	.025	.158	.016	.167	.017	.166	.012	.173	.012	.169
CFI	.10	MM	—	—	.997	.912	.998	.924	.998	.927	.999	.927	.999	.928
		SM	—	—	.997	.890	.998	.921	.998	.922	.999	.924	.999	.929
	.20	MM	.993	.900	.993	.899	.996	.905	.996	.906	.998	.905	.998	.905
		SM	.994	.925	.995	.929	.997	.940	.997	.941	.998	.941	.999	.944
	.30	MM	.989	.898	.988	.896	.994	.896	.994	.897	.997	.897	.997	.898
		SM	.993	.941	.994	.944	.997	.949	.997	.950	.998	.949	.998	.952
TLI	.10	MM	—	—	.995	.835	.997	.857	.997	.863	.998	.862	.998	.866
		SM	—	—	.994	.794	.996	.852	.997	.853	.998	.857	.998	.868
	.20	MM	.986	.813	.985	.811	.992	.821	.992	.823	.995	.823	.995	.823
		SM	.989	.859	.990	.866	.995	.888	.995	.890	.997	.890	.997	.895
	.30	MM	.978	.808	.977	.805	.988	.805	.988	.806	.994	.807	.993	.809
		SM	.987	.890	.987	.895	.994	.905	.994	.907	.997	.905	.997	.910
SRMR	.10	MM		—		.072		.076		.065		.070		.063
		SM		—		.168		.165		.160		.170		.158
	.20	MM		.088		.083		.089		.081		.087		.083
		SM		.196		.194		.196		.192		.192		.192
	.30	MM		.097		.093		.101		.097		.100		.099
		SM		.208		.208		.206		.207		.208		.207

Table 5.

Means of SI and LS Fit Indices in Misspecified Between-Level Model (Model_2B)

			NG/GS
			50/20		100/20		50/50		100/50		50/100		100/100
	ICC	MT	SI	LS_B	SI	LS_B	SI	LS_B	SI	LS_B	SI	LS_B	SI	LS_B
RMSEA	.10	MM	—	—	.014	.102	.011	.129	.011	.127	.008	.139	.009	.139
		SM	—	—	.014	.105	.011	.130	.011	.128	.009	.143	.009	.136
	.20	MM	.025	.176	.026	.178	.019	.201	.019	.200	.014	.209	.014	.209
		SM	.020	.148	.021	.144	.014	.157	.014	.156	.010	.165	.011	.160
	.30	MM	.032	.220	.033	.222	.023	.243	.023	.242	.016	.248	.017	.247
		SM	.022	.163	.023	.159	.015	.168	.016	.167	.011	.174	.011	.169
CFI	.10	MM	—	—	.998	.897	.998	.917	.999	.918	.999	.924	.999	.924
		SM	—	—	.997	.891	.998	.915	.999	.917	.999	.921	.999	.927
	.20	MM	.994	.887	.994	.886	.996	.899	.997	.900	.998	.903	.998	.903
		SM	.995	.920	.996	.925	.998	.938	.998	.939	.999	.940	.999	.943
	.30	MM	.990	.888	.990	.888	.995	.892	.995	.893	.997	.896	.997	.896
		SM	.995	.939	.995	.942	.998	.948	.998	.949	.999	.949	.999	.951
TLI	.10	MM	—	—	.996	.807	.997	.844	.997	.846	.998	.858	.998	.858
		SM	—	—	.996	.795	.997	.841	.997	.845	.998	.851	.999	.863
	.20	MM	.989	.787	.989	.787	.994	.810	.994	.813	.997	.818	.997	.818
		SM	.992	.850	.992	.860	.996	.884	.996	.886	.998	.887	.998	.893
	.30	MM	.983	.790	.983	.789	.991	.798	.991	.800	.995	.804	.995	.806
		SM	.991	.885	.991	.891	.996	.903	.996	.905	.998	.904	.998	.909
SRMR	.10	MM		—		.071		.075		.064		.070		.063
		SM		—		.173		.167		.162		.164		.159
	.20	MM		.090		.084		.090		.082		.088		.083
		SM		.198		.195		.192		.193		.193		.192
	.30	MM		.100		.097		.103		.099		.101		.099
		SM		.210		.208		.206		.207		.208		.207

As discussed above, ICC and MT influenced the performance of LS fit indices. We additionally conducted two-way ANOVA to verify the interaction effects of ICC and MT on the performance of LS fit indices (i.e., fit index of correct model minus that of misspecified model: $\Delta RMSEA_{PS\_B}$, $\Delta CFI_{PS\_B}$, $\Delta TLI_{PS\_B}$, and $\Delta SRMR_{B}$). The results showed that the interaction effects of ICC and MT were statistically significant at α = .05 for all LS fit indices. The effect size ($\eta_{p}^{2}$) was .436 for $\Delta RMSEA_{PS\_B}$, .619 for $\Delta CFI_{PS\_B}$, .623 for $\Delta TLI_{PS\_B}$, and .139 for $\Delta SRMR_{B}$.

Figure 2 shows the patterns of the interaction effects for the performance of LS fit indices. The results for each fit index are as follows. First, when ICC was equal to .10, the performance of $RMSEA_{PS\_B}$ was similar in both MM and SM. In contrast, when ICC was equal to .30, $RMSEA_{PS\_B}$ was more sensitive (i.e., more promising for detecting misspecified between-level models) to MM than to SM. Second, the effect of ICC on the performance of $CFI_{PS\_B}$ or $TLI_{PS\_B}$ depended on MT. Specifically, the performance of $CFI_{PS\_B}$ and $TLI_{PS\_B}$ improved (i.e., more promising for detecting misspecified between-level models) with an increase in ICC for MM, whereas their performance deteriorated with an increase in ICC for SM. Finally, $SRMR_{B}$ was more promising for detecting misspecified between-level models in the case of SM than MM with an increase in ICC (range = .10–.20).
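The opposite ICC trends for MM and SM can be seen directly in the raw between-level CFI means reported in Table 4 (Model₁). The sketch below averages the LS_B column over the NG/GS conditions that converged in each ICC × MT cell. It works with raw means rather than the correct-minus-misspecified differences used in the ANOVA, so it illustrates the pattern and does not reproduce that analysis.

```python
# Between-level CFI (LS_B) means from Table 4; nonconverged cells are omitted.
TABLE4_CFI_PS_B = {
    (0.10, "MM"): [0.912, 0.924, 0.927, 0.927, 0.928],
    (0.10, "SM"): [0.890, 0.921, 0.922, 0.924, 0.929],
    (0.20, "MM"): [0.900, 0.899, 0.905, 0.906, 0.905, 0.905],
    (0.20, "SM"): [0.925, 0.929, 0.940, 0.941, 0.941, 0.944],
    (0.30, "MM"): [0.898, 0.896, 0.896, 0.897, 0.897, 0.898],
    (0.30, "SM"): [0.941, 0.944, 0.949, 0.950, 0.949, 0.952],
}

cell_means = {
    key: sum(values) / len(values) for key, values in TABLE4_CFI_PS_B.items()
}

for mt in ("MM", "SM"):
    trend = ", ".join(
        f"ICC {icc:.2f}: {cell_means[(icc, mt)]:.3f}" for icc in (0.10, 0.20, 0.30)
    )
    print(f"{mt}: {trend}")
```

Because a lower CFI for a misspecified model means the misfit is easier to detect, the falling MM means and rising SM means correspond to improving and deteriorating detection, respectively, as ICC increases.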

Figure 2.

The Effect of ICC × MT on the Performance of LS Fit Indices for the Misspecified Between-Level Model

Discussion

Summary

This study examined the performance of SI and LS fit indices across various design factors, such as ICC, NG, GS, GB, and MT, for three different MCFA models (Model₁, Model_2A, and Model_2B). The results were as follows. First, the convergence rates of the SI approach generally increased with an increase in ICC or GS, whereas those of the LS approach were close to 1 across all design factors for correct models. This result is consistent with Jung (2016), who considered the three-factor MCFA model with the same factor structure across levels. Accordingly, this study replicated previous findings for MCFA models with different factor structures across levels.

Second, both SI and LS fit indices were sensitive to misspecified within-level models, and design factors rarely affected their performance. This finding is consistent with previous studies in which both SI and LS fit indices performed equally well at the within-group level for MCFA models regardless of design factors such as ICC, sample size, and GB (Hsu et al., 2015, 2016; Jung, 2016; Ryu & West, 2009). This study confirmed that the performance of SI and LS fit indices was consistently excellent for detecting misspecified within-level models even in the MCFA model with different factor structures across levels.

Third, $RMSEA_{PS\_B}$ detected the misspecified between-level models well for MCFA models with either identical or different factor structures across levels (Boulton, 2011; Hsu et al., 2016). Specifically, the performance of $RMSEA_{PS\_B}$ improved with an increase in ICC or GS for both types of misspecification (MM and SM). That is, $RMSEA_{PS\_B}$ was more promising for detecting misspecified between-level models when ICC or GS increased.

Fourth, $CFI_{PS\_B}$ and $TLI_{PS\_B}$ were also good at detecting the misspecified between-level models, and the ICC affected their performance. This is congruent with previous studies (Boulton, 2011; Hsu et al., 2016; Jung, 2016) reporting that the performance of LS fit indices varied with ICC levels. Furthermore, MT moderated the effect of ICC on the performance of $CFI_{PS\_B}$ or $TLI_{PS\_B}$. Specifically, their performance improved with an increase in ICC for MM, whereas it deteriorated with an increase in ICC for SM. Similarly, Boulton (2011) examined the severity of misspecification as a design factor and verified the interaction effect between ICC and severity of misspecification on the performance of $CFI_{PS\_B}$ or $TLI_{PS\_B}$. These results suggest that the MT condition might moderate the effect of ICC on the performance of LS fit indices. However, no previous study directly supported these results. Therefore, further research is needed to generalize this finding.

Fifth, $SRMR_{B}$ was sensitive to misspecified between-level models, especially with an increase in ICC, as shown in previous studies (Boulton, 2011; Hsu et al., 2015). Furthermore, $SRMR_{B}$ performed better in SM than in MM, in line with findings by Hsu et al. (2015) and Jung (2016). Moreover, this study found a significant interaction effect between ICC and MT. Based on the findings, the effect of ICC on the performance of $SRMR_{B}$ was stronger in SM than in MM. However, a moderating effect of MT on the performance of $SRMR_{B}$ was found only for ICC levels ranging from .10 to .20. Therefore, caution should be exercised when generalizing these findings.

Finally, the effect of GS, NG, or GB on the performance of LS fit indices was comparatively trivial. In particular, the impact of GS or NG was unclear, which is partly consistent with previous studies (Hsu et al., 2015; Jung, 2016; Ryu & West, 2009) in which no effect of sample size on the performance of LS fit indices appeared for the MCFA model. Based on the findings from previous studies and this study, the performance of LS fit indices seems to be generally consistent across sample sizes exceeding 1,000. In addition, the effect of GB on the performance of LS fit indices was relatively trivial, which is consistent with Jung’s (2016) study. This study focused only on the ratio of 1 to 3 (3 times) in the unbalanced group by considering practical issues. To generalize the results of this study, a wider range of unbalanced conditions for the GB factor should be explored in further studies.

In summary, the results showed that (a) the LS fit indices could detect the lack of model fit at any level for MCFA models with different factor structures across levels, (b) the performance of $RMSEA_{PS\_B}$ or $SRMR_{B}$ improved with an increase in ICC, and (c) the effect of ICC on the performance of $CFI_{PS\_B}$ or $TLI_{PS\_B}$ depended on MT (MM or SM).

Implications and Recommendations

The results of this study have several implications. First, this study verified the performance of the LS approach in the MCFA model even with different factor structures across levels. Previous studies (Boulton, 2011; Hsu et al., 2015, 2016; Jung, 2016; Ryu, 2011; Ryu & West, 2009; Schermelleh-Engel et al., 2014; Sessoms, 2019) have only demonstrated its performance using the MCFA model with an identical factor structure across levels. Therefore, we recommend the LS approach to empirical researchers who are considering MCFA models with different as well as identical factor structures across levels.

Second, this study fills a gap in the literature by considering additional design factors such as GB and MT that were not sufficiently examined in previous studies. We found that the effect of GB on the performance of LS fit indices was trivial, whereas the effect of MT was verified in the MCFA model with different factor structures across levels. This provides useful information to researchers because previous studies have not examined the effect of MT on LS fit indices except for $SRMR_{B}$ (Hsu et al., 2015; Jung, 2016).

Finally, this study demonstrated the interaction effect between ICC and MT on the performance of the LS fit indices. This is partly consistent with Boulton’s (2011) finding that the effect of ICC on the performance of the LS fit indices increased as the misspecification severity (i.e., manipulating a value of the between-group latent factor correlation) increased. In addition, the inconsistent performance of $SRMR_{B}$ across ICC levels reported by previous studies (Hsu et al., 2015, 2016; Jung, 2016) could be explained by considering the model misspecification factor. In summary, this study suggests that researchers need to pay attention to the role of model misspecification when examining the effect of ICC on the performance of LS fit indices.

Based on the results of this study, recommendations for empirical researchers using MCFA models are as follows. First, either LS or SI fit indices are recommended for model fit evaluation in studies focusing on the within-level model. Both tend to perform well in detecting misspecified within-level models regardless of the ICC, NG, GS, and GB levels. Second, researchers are highly encouraged to use LS fit indices ($RMSEA_{PS\_B}$, $CFI_{PS\_B}$, and $TLI_{PS\_B}$) when their focus is on the between-level model. In particular, with an increase in ICC (range = .10–.30), LS fit indices are more promising than SI fit indices in detecting misspecified between-level models. In addition, $RMSEA_{PS\_B}$ performed much better than $RMSEA_{SI}$ with an increase in GS (range = 20–100). In conclusion, even with small ICC or GS (ICC = .10 or GS = 20), the LS fit indices are good at detecting the misspecified between-level models with traditional cutoff values ($RMSEA \leq .06$; $CFI, TLI \geq .95$; Hu & Bentler, 1999).
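To make the cutoff-based recommendation concrete, the sketch below applies the conventional thresholds (misfit flagged when RMSEA exceeds .06 or when CFI or TLI falls below .95) to one condition from Table 4: Model₁, ICC = .10, MM, NG/GS = 100/20. The function name is hypothetical, and treating each index as a separate flag is a simplification made for illustration.

```python
def flags_misfit(rmsea, cfi, tli):
    """Flag each index that falls outside the conventional cutoff for good fit."""
    return {
        "RMSEA": rmsea > 0.06,
        "CFI": cfi < 0.95,
        "TLI": tli < 0.95,
    }


# Means from Table 4 (misspecified between-level model, ICC = .10, MM, 100/20).
ls_b = flags_misfit(rmsea=0.094, cfi=0.912, tli=0.835)  # level-specific, between
si = flags_misfit(rmsea=0.013, cfi=0.997, tli=0.995)    # simultaneous

print("LS_B flags:", ls_b)
print("SI flags:  ", si)
```

In this condition every between-level LS index flags the misspecification, whereas none of the SI indices does, even though ICC and GS are at their smallest levels.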

Limitations and Future Directions

Future directions based on the limitations of this study are as follows. First, this study considered a limited number of levels for some design factors such as NG and GB. For example, we only adopted two levels of NG (50 or 100) and only explored a ratio of 1 to 3 for the unbalanced group (the large GS was set to be 3 times larger than the small GS). To generalize the results, additional scenarios using different NG or GB levels are required in future studies. Second, this study considered a simple MCFA model with a limited number of latent factors. This type of model might not be reasonable with real data, although it is frequently used for simulation research. To generalize these findings, further research could consider different models (e.g., structural models) suitable for practical research. Third, this study did not manipulate multivariate normality as a design factor. However, the assumption of multivariate normality might not always be met in real data. Therefore, future research on the performance of the LS approach under the violation of multivariate normality is needed. Fourth, this study confirmed the interaction effect between ICC and MT on the performance of LS fit indices. However, to generalize the results, these findings should be verified again under various models and conditions. Furthermore, additional MT conditions (e.g., ignoring the nonzero residual correlations or fitting a single-factor model to two-factor data) should be considered. Finally, future research comparing the performance of the PS method and equivalence testing is needed. Marcoulides and Yuan (2020) attempted to show the performance of equivalence testing for MSEM, but they only used empirical data. Therefore, the performance of this method needs to be verified using simulated data.

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Wonsook Sohn

References

Bentler

P. M.

(1990). Comparative fit indexes in structural models. Psychological Bulletin, 107(2), 238–246. https://doi.org/10.1037/0033-2909.107.2.238

Boulton

A. J.

(2011). Fit Index sensitivity in multilevel structural equation modeling [Unpublished master’s thesis]. University of Kansas.

Browne

M. W.

Cudeck

(1992). Alternative ways of assessing model fit. Sociological Methods & Research, 21(2), 230–258. https://doi.org/10.1177/0049124192021002005

Cohen

(1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum.

Dunn

E. C.

Masyn

K. E.

Jones

S. M.

Subramanian

S. V.

Koenen

K. C.

(2015). Measuring psychosocial environments using individual responses: An application of multilevel factor analysis to examining students in schools. Prevention Science, 16(5), 718–733. https://doi.org/10.1007/s11121-014-0523-x

Finch

W. H.

French

B. F.

(2018). A simulation investigation of the performance of invariance assessment using equivalence testing procedures. Structural Equation Modeling: A Multidisciplinary Journal, 25(5), 673–686. https://doi.org/10.1080/10705511.2018.1431781

Hox

J. J.

(2010). Multilevel analysis: Techniques and applications (2nd ed.). Routledge

Hox

J. J.

Maas

C. J.

(2001). The accuracy of multilevel structural equation modeling with pseudobalanced groups and small samples. Structural Equation Modeling, 8(2), 157–174. https://doi.org/10.1207/s15328007sem0802_1

Hox

J. J.

Maas

C. J.

Brinkhuis

M. J.

(2010). The effect of estimation method and sample size in multilevel structural equation modeling. Statistica Neerlandica, 64(2), 157–170. https://doi.org/10.1111/j.1467-9574.2009.00445.x

10.

Hsu

H. Y.

Kwok

O. M.

Lin

J. H.

Acosta

(2015). Detecting misspecified multilevel structural equation models with common Fit Indices: A Monte Carlo Study. Multivariate Behavioral Research, 50(2), 197–215. https://doi.org/10.1080/00273171.2014.977429

11.

Hsu

H. Y.

Lin

J. H.

Kwok

O. M.

Acosta

Willson

(2016). The impact of intraclass correlation on the effectiveness of level-specific fit indices in multilevel structural equation modeling: A Monte Carlo Study. Educational and Psychological Measurement, 77(1), 5–31. https://doi.org/10.1177/0013164416642823

12.

L. T.

Bentler

P. M.

(1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

13.

Huang

F. L.

Cornell

D. G.

(2016). Using multilevel factor analysis with clustered data: Investigating the factor structure of the Positive Values Scale. Journal of Psychoeducational Assessment, 34(1), 3–14. https://doi.org/10.1177/0734282915570278

14.

Huang

F. L.

Cornell

D. G.

Konold

Meyer

J. P.

Lacey

Nekvasil

E. K.

Heilbrun

Shukla

K. D.

(2015). Multilevel factor structure and concurrent validity of the teacher version of the Authoritative School Climate Survey. Journal of School Health, 85(12), 843–851. https://doi.org/10.1111/josh.12340

15.

Jung

(2016). Comparison of model fit evaluation approaches in multilevel structural equation modeling [Unpublished doctoral dissertation]. University of Korea.

16.

Kim

E. S.

Dedrick

R. F.

Cao

Ferron

J. M.

(2016). Multilevel factor analysis: Reporting guidelines and a review of reporting practices. Multivariate Behavioral Research, 51, 881–898. https://doi.org/10.1080/00273171.2016.1228042

17.

Lee

(2020). Testing the performance of level-specific fit evaluation in MCFA models with different factor structures across levels [Unpublished doctoral dissertation]. University of Kyungpook.

18.

Liang

Bentler

P. M.

(2004). An EM algorithm for fitting two-level structural equation models. Psychometrika, 69(1), 101–122. https://doi.org/10.1007/bf02295842

19. Lüdtke, Marsh, H. W., Robitzsch, Trautwein, Asparouhov, & Muthén (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13(3), 203–229. https://doi.org/10.1037/a0012869

20. Marcoulides, K. M., & Yuan, K. H. (2017). New ways to evaluate goodness of fit: A note on using equivalence testing to assess structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 24(1), 148–153. https://doi.org/10.1080/10705511.2016.1225260

21. Marcoulides, K. M., & Yuan, K. H. (2020). Using equivalence testing to evaluate goodness of fit in multilevel structural equation models. International Journal of Research & Method in Education, 43(4), 431–443. https://doi.org/10.1080/1743727x.2020.1795113

22. Meuleman, & Billiet (2009). A Monte Carlo sample size study: How many countries are needed for accurate multilevel SEM? Survey Research Methods, 3(1), 45–58. https://doi.org/10.18148/srm/2009.v3i1.666

23. Muthén, L. K., & Muthén, B. O. (1998–2019). Mplus 8.3 [Computer software].

24. Rappaport, L. M., Amstadter, A. B., & Neale, M. C. (2020). Model fit estimation for multilevel structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 27(2), 318–329. https://doi.org/10.1080/10705511.2019.1620109

25. Ryu (2011). Effects of skewness and kurtosis on normal-theory based maximum likelihood test statistic in multilevel structural equation modeling. Behavior Research Methods, 43(4), 1066–1074. https://doi.org/10.3758/s13428-011-0115-7

26. Ryu (2014). Model fit evaluation in multilevel structural equation models. Frontiers in Psychology, 5, Article 81. https://doi.org/10.3389/fpsyg.2014.00081

27. Ryu, & West, S. G. (2009). Level-specific evaluation of model fit in multilevel structural equation modeling. Structural Equation Modeling, 16(4), 583–601. https://doi.org/10.1080/10705510903203466

28. Schermelleh-Engel, Kerwer, & Klein, A. G. (2014). Evaluation of model fit in nonlinear multilevel structural equation modeling. Frontiers in Psychology, 5, Article 181. https://doi.org/10.3389/fpsyg.2014.00181

29. Sessoms, J. C. L. (2019). Level-specific fit index performance with diagonally weighted least squares estimation of multilevel structural equation models [Unpublished doctoral dissertation]. University of North Carolina.

30. Tucker, L. R., & Lewis (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38(1), 1–10. https://doi.org/10.1007/bf02291170

31. Wu, J. Y., & Kwok, O. M. (2012). Using SEM to analyze complex survey data: A comparison between design-based single-level and model-based multilevel approaches. Structural Equation Modeling: A Multidisciplinary Journal, 19(1), 16–35. https://doi.org/10.1080/10705511.2012.634703

32. Wu, J. Y., Lin, J. J., Nian, M. W., & Hsiao, Y. C. (2017). A solution to modeling multilevel confirmatory factor analysis with data obtained from complex survey sampling to avoid conflated parameter estimates. Frontiers in Psychology, 8, Article 1464. https://doi.org/10.3389/fpsyg.2017.01464

33. Yuan, K. H., & Bentler, P. M. (2007). 3. Multilevel covariance structure analysis by fitting multiple single-level models. Sociological Methodology, 37(1), 53–82. https://doi.org/10.1111/j.1467-9531.2007.00182.x

34. Yuan, K. H., Chan, Marcoulides, G. A., & Bentler, P. M. (2016). Assessing structural equation models by equivalence testing with adjusted fit indexes. Structural Equation Modeling: A Multidisciplinary Journal, 23(3), 319–330. https://doi.org/10.1080/10705511.2015.1065414
