Abstract
Determining the number of factors in exploratory factor analysis (EFA) is crucial because it affects the rest of the analysis and the conclusions of the study. Researchers have developed various methods for deciding the number of factors to retain in EFA, but this remains one of the most difficult decisions in the EFA. The purpose of this study is to compare the parallel analysis with the performance of fit indices that researchers have started using as another strategy for determining the optimal number of factors in EFA. The Monte Carlo simulation was conducted with ordered categorical items because there are mixed results in previous simulation studies, and ordered categorical items are common in behavioral science. The results of this study indicate that the parallel analysis and the root mean square error of approximation (RMSEA) performed well in most conditions, followed by the Tucker–Lewis index (TLI) and then by the comparative fit index (CFI). The robust corrections of CFI, TLI, and RMSEA performed better in detecting misfit underfactored models than the original fit indices. However, they did not produce satisfactory results in dichotomous data with a small sample size. Implications, limitations of this study, and future research directions are discussed.
Keywords
Exploratory factor analysis (EFA) is widely used in social sciences research. EFA enables discovering the factor structure underlying a set of variables and developing scientific theories and measurements. When researchers perform EFA, determining the number of factors is the first step. This step is important because deciding an optimal number of factors directly influences the results of the study (Hayton et al., 2004; Henson & Roberts, 2006). Despite the importance the decisions on the number of factors and extensive research on methods for making the decisions, it remains one of the most challenging decisions in EFA (Hayton et al., 2004). Researchers have suggested various methods to determine the optimal number of factors to be retained, but different criteria often lead to different results. Also, since choosing the number of factors in EFA affects the results of the studies, researchers are recommended to apply multiple methods and reasoned reflection (Henson & Roberts, 2006). This study aims to compare the performance of parallel analysis and fit indices in determining the number of factors in EFA.
Traditional Methods to Identify the Number of Factors in EFA
Traditionally, researchers suggested multiple methods for estimating the number of factors in EFA. Scree test (Cattell, 1966) involves an examination of discontinuities of the eigenvalue plot. The eigenvalues of the correlation matrix are computed and plotted in descending order of the values. The “break” of the graph where the last substantial drop in the magnitude of the eigenvalues is identified, and factors of the left of this point are retained. This method can be helpful but has been criticized because of its subjectivity and ambiguity, especially when there are zero or more than two distinct substantial drop points (Fabrigar et al., 1999; Hayton et al., 2004).
In eigenvalue-greater-than-one rule (K1 rule; Kaiser, 1960), the number of eigenvalues greater than 1.0 for the correlation matrix is the number of factors to include. This method is simple and objective, but not recommended because it sometimes leads to inaccurate results, including overfactoring (i.e., extracting too many factors; Fabrigar et al., 1999).
Parallel Analysis
Parallel analysis (PA) by Horn (1965) is one of the most accurate factor retention methods (Fabrigar et al., 1999; Hayton et al., 2004; Henson & Roberts, 2006; Zwick & Velicer, 1986). PA compares the eigenvalues of a sample correlation matrix with the eigenvalues obtained from repeatedly generated random correlation matrices, which are reference eigenvalues. Each of the random correlation matrices is generated with the same sample size and the same number of uncorrelated variables as the actual correlation matrix. Then, the mean of the eigenvalues of each matrix is calculated, and factors are retained when sample eigenvalues exceed the corresponding reference eigenvalues.
Table 1 reviews the studies investigating PA for ordered categorical items. These studies investigated three factors that may affect the performance of PA: (a) observed eigenvalues (known as principal component analysis, PCA) versus model-implied eigenvalues (known as principal axis factoring, PAF), (b) mean versus 95th (or 99th) percentiles of the eigenvalues across comparison data sets, (c) Pearson versus polychoric correlation. Both PCA and PAF methods in PA performed well in general (e.g., Garrido et al., 2013, 2016). Since PCA performed better in unidimensional conditions while PAF was more accurate in high factor correlation conditions in a previous study (Golino et al., 2020), we utilized PAF in this study. The original PA utilized the mean of the eigenvalues across repeatedly generated random correlation matrices (Horn, 1965). However, the 95th (or 99th) percentiles of the eigenvalues as a more stringent criterion were employed in previous studies (e.g., Glorfeld, 1995; Green et al., 2016; Lim & Jahng, 2019), and the 95th and 99th percentiles criterion showed higher accuracy than the mean criterion for dichotomous items (Weng & Cheng, 2005). Thus, this study chose the 95th percentile criterion for PA.
Previous Studies on Performance of PA in Identifying Number of Factors in EFA Using Ordered Categorical Variables
Note. r = Pearson correlation,
There are two options for the sample correlations of ordered categorical items: Pearson correlation and polychoric correlation. Tetrachoric correlation is a type of polychoric correlation for dichotomous data; the term polychoric is used throughout. Some previous studies found that the PA based on Pearson correlation performed similarly or better than the PA based on polychoric correlation (Cho et al., 2009; Weng & Cheng, 2005). Weng and Cheng (2005) found that a PA based on Pearson and polychoric correlations performed well in deciding the number of factors for dichotomous items from their simulation study based on a one-factor model. Cho et al. (2009) reported that PA based on Pearson correlations performed similarly to or better than PA based on polychoric correlations in identifying the correct number of factors. In addition, PA using both Pearson and polychoric correlation showed a tendency to extract too few factors when the correlation among the factors was large (.7). Garrido and colleagues (2013, 2016) conducted simulation studies including more comprehensive conditions such as levels of factor loadings, variables per factor, number of factors, factor correlations, sample sizes, number of response categories, and item skewness. Garrido et al. (2013) showed PA based on Pearson and polychoric correlation performed similarly except for the highly skewed condition (±1.50 to ±2.00) with medium (.55) and high factor loadings (.70). In another study, they found that PA based on polychoric correlation using weighted least square mean and variance adjusted T statistic (WLSMV) estimator had satisfactory accuracy to select the correct number of factors (Garrido et al., 2016). Yang and Xia (2015) found that PA based on polychoric correlations produced accurate results when the factors were not correlated, with 95% or above accuracy in most conditions in selecting the correct number of factors. However, the accuracy of PA decreased when factors were correlated. Lim and Jahng (2019) conducted a simulation study using PA based on polychoric correlation for ordered categorical variables, and they suggested that the PA should be conducted when the number of factors is small, and the sample size is large because the accuracy of PA was affected by these conditions. Finch (2020) conducted a simulation to identify the optimal number of factors to retain in EFA, including PA based on Pearson correlation for normally distributed items and polychoric correlation for ordinal and dichotomous variables. PA performed well with normally distributed items when factor loadings were moderate (.5) to large (.7) but not as accurate with categorical items or small factor loadings (.35) (Finch, 2020).
In sum, the performance of PA has been reported as satisfactory in selecting the optimal number of factors except for some conditions, such as high factor correlation. However, previous studies revealed inconsistency regarding the superiority of PA based on Pearson versus polychoric correlation in determining the number of factors in EFA with ordinal data. In addition to our studied conditions, Green et al. (2012, 2015, 2016, 2018) developed the revised PA (R-PA), in which the comparison data of the R-PA method are created assuming (k– 1) factors underlie the observed data (Green et al., 2012, 2015, 2016, 2018). While this method outperformed the traditional PA in many conditions, it did not perform uniformly better in every condition (Green et al., 2015, 2016, 2018). A more recent study (Lim & Jahng, 2019) reported that the traditional PA performed better in general than its variants, such as R-PA. Thus, this study utilized traditional PA.
Model Fit Statistics to Identify the Number of Factors in EFA
Recently, researchers have been studying the performance of fit indices that are widely used in confirmatory factor analysis (CFA) to identify the number of factors to be retained in EFA (Clark & Bowles, 2018; Finch, 2020; Garrido et al., 2016; Yang & Xia, 2015). Numerous model fit indices have been proposed in the context of CFA as measures of the degree of fit (Hu & Bentler, 1999). The most used model fit indices are the comparative fit index (CFI; Bentler, 1990), the Tucker–Lewis index (TLI; Tucker & Lewis, 1973), the root mean square error of approximation (RMSEA; Steiger & Lind, 1980), and the standardized root mean squared residual (SRMR; Bentler, 1995).
Researchers utilize these fit indices to identify the optimal number of latent factors in EFA in a manner analogous to that in CFA. Common cutoff values are used to determine the models’ suitability, and showing satisfactory results in fit indices is one of the criteria for selecting an optimal number of factors in EFA with other criteria, such as PA.
CFI and TLI
Both CFI and TLI are incremental fit indexes that assess the degree to which the specified model is superior to an alternative baseline model in reproducing the observed covariance matrix. The CFI is normed between 0 and 1 (higher value indicates better fit), while TLI is not normed and bounded (higher value indicates better fit). The TLI imposes a penalty on models that involve the estimation of many parameters (West et al., 2012, 2023). As we utilized WLSMV as an estimation method, scaled CFI and scaled TLI were used, which can be described by the formulas below (Xia & Yang, 2019):
where
RMSEA
The RMSEA measures the discrepancy between the observed and estimated covariance matrix scaled by the degrees of freedom (Browne & Cudeck, 1992). The RMSEA ranges from 0 to infinity (higher values indicate a worse fit). It penalizes model complexity (e.g., overfactoring) by its df in the denominator (West et al., 2012, 2023). The scaled RMSEA is defined by the following equation (Xia & Yang, 2019):
SRMR
The SRMR is a badness-of-fit index, and 0 indicates a perfect fitting model. It does not penalize the model complexity and is sensitive to the sample size (West et al., 2012, 2023). The SRMR is calculated by the formula below (Bentler, 1995; Hu & Bentler, 1999):
where p refers to the number of observed variables, and
Robust Corrections to Fit Statistics
The commonly used fit indices mentioned above were originally developed within the framework of continuous data (Savalei, 2021). Compared with what would have been observed when continuous data, the fit indices for categorical data with WLSMV tend to show better fit than if the data remained in its original continuous form (Savalei, 2021). Therefore, robust corrections to CFI, TLI, and RMSEA, which make these indices comparable to the values by normal-theory maximum likelihood estimation, were developed (Savalei, 2018, 2021).The formula of CFI and RMSEA of robust correction is as follows (Savalei, 2021):
where cML is the normal-theory maximum likelihood fit function incorporating polychoric correlation as input,
To the best of our knowledge, the robust fit indices have not been studied in EFA. In this study, we explored the results of robust corrections of CFI, TLI, and RMSEA using the same traditional cutoff values of CFI, TLI, and RMSEA. These robust corrections are implemented in the lavaan package version 0.6.15 (Rosseel, 2012).
Previous Studies on Model Fit Statistics to Determine the Number of Factors in EFA
We reviewed simulation studies that are conducted to examine model fit statistics in identifying the optimal number of factors in EFA with ordered categorical data (Table 2). To the best of our knowledge, studies so far investigated the performance of traditional fit indices, and none of the studies assessed fit indices with robust corrections. We focused on studies involving WLSMV estimation, which can produce unbiased estimates and T statistic with correct Type I error rate under small sample size (N = 250; Beauducel & Herzberg, 2006) and was preferred estimation method for the factor analysis with ordered categorical data (Barendse et al., 2015).
Previous Studies on Performance of Fit Statistics in Identifying Number of Factors in EFA Using Ordered Categorical Variables
Note. ML = maximum likelihood; MLR = robust maximum likelihood; WLSMV = weighted least square mean and variance adjusted; WLS = weighted least squares; CFI = comparative fit index; TLI = Tucker–Lewis index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual.
Barendse et al. (2015) investigated the performance of T statistic, RMSEA, and SRMR with both continuous and ordered categorical items and across different estimators including WLSMV. They recommended the use of RMSEA when researchers are interested in major factors with ordered categorical items (e.g., selection rates of three-factor model 61.0% for small sample, 74.1% for large sample condition), and reported that WLSMV was the preferred method among the five different estimation methods (e.g., ML, robust ML, WLSMV, ML of Polychoric correlations, and weighted least squares).
Yang and Xia’s (2015) simulation study found that, using WLSMV estimation, the RMSEA with a cutoff (<.06) performed well to identify the correct number of factors for three-factor model (correct model) for both dichotomous and 4-point scale items (mean RMSEAs < .06 across all conditions) but did not reject underfactored model in some conditions in dichotomous items, especially small sample size and high factor correlation conditions.
Garrido and colleagues (2016) compared PA based on polychoric correlation and CFI, TLI, and RMSEA using WLSMV. Their results showed that PA had higher accuracy (.86 mean accuracy rate across conditions) than CFI (.80), TLI (.79), RMSEA (.70), and SRMR (.57) in identifying the correct number of factors across categorical item conditions.
Clark and Bowles (2018) studied CFI, TLI, and RMSEA with the popular cutoff values by Hu and Bentler (1999; CFI and TLI ≥ .90 and .95, RMSEA ≤ .08 and .05) using WLSMV estimation to identify the number of factors when the items were dichotomous. The CFI and TLI showed high accuracy rates (45%–100% for CFI, 43%–100% for TLI using .95 cutoff value in three-factor model) to identify the correct number of factors, except for low factor loading (.35) and low factor correlation (.20) condition (29% for CFI, 26% TLI). The CFI (9%–100%) and TLI (13%–100%) with .95 cutoff value were more accurate than RMSEA (0%) with .05 cutoff value in rejecting underfactored models. Thus, Clark and Bowles (2018) recommended being cautious to use these conventional fit cutoff rules for evaluating models with ordered categorical items because their performance was contingent on data characteristics such as the number of items and factor correlations.
Finch (2020) conducted a simulation to compare the PA based on polychoric correlation for ordered items and CFI, TLI, RMSEA, and SRMR using WLSMV estimation. Unlike previous studies mentioned earlier (Clark & Bowles, 2018; Garrido et al., 2016), RMSEA outperformed CFI and TLI in selecting the correct number of factors. The RMSEA performed as well as PA in many conditions, and even better than PA when the factor loading was small (.35) or medium (.50) or the sample size was small (100 and 200) in both dichotomous and ordered items. In sum, there were mixed findings on the performance of model fit statistics in choosing the number of factors in EFA across different simulation designs and conditions. There was no consensus about how to utilize the fit indices for the best accuracy to select the number of factors in EFA.
Purpose of This Study
A simulation study is conducted to compare the performance between PA based on both Pearson and polychoric correlations and the most commonly used fit indices, (robust) CFI, (robust) TLI, (robust) RMSEA, and SRMR, using WLSMV estimation.
Methods
A simulation study was conducted to assess the accuracy of the factor retention methods in various conditions. Five factors were manipulated (e.g., factor loading, factor correlation, sample size, number of response categories, and skewness) which were reported to affect the performance of factor retention methods (Clark & Bowles, 2018; Finch, 2020; Garrido et al., 2016; Yang & Xia, 2015). The population model of this study was a correlated two-factor model with five items per factor.
Factor Loading
The factor loadings were simulated to be .30, .50, and .70 representing low, medium, and high, respectively (e.g., Clark & Bowles, 2018; Garrido et al., 2016). The factor loadings were the same for all ten items in each replication.
Factor Correlation
For the factor correlation between the two latent variables, the levels of .20, .50, and .70 were included. These represent weak, moderate, and strong correlations, respectively (e.g., Clark & Bowles, 2018).
Sample Size
Three conditions for the sample sizes were chosen, which were 200, 500, and 1,000. These values represent studies with small, medium, and large sizes, respectively (e.g., Garrido et al., 2016).
Number of Response Categories
This study included two and five levels for the number of response categories. Both dichotomous and polytomous items with five categories were selected because these are frequently used in social sciences research (Rhemtulla et al., 2012).
Distribution of Data and Data Generation
Since nonnormality in the form of asymmetry distribution is common in applied studies, skewness levels of 0.0, 0.74, 1.22, and 2.02 from the previous study (Muthén & Kaplan, 1985) were included, and these values represent a symmetric distribution, a slightly asymmetric distribution, a moderately asymmetric distribution, and a highly asymmetric distribution respectively. To generate ordered categorical items with the four types of distributions, the multivariate normal distribution items were first generated based on the population covariance matrix. Then sets of thresholds (z scores) were employed to categorize the continuous data into ordered categorical data for model fitting. To produce dichotomous items, one threshold was used which were 0.0 for a symmetric distribution, .4495 for a slightly asymmetric distribution, .706 for a moderately asymmetric distribution, and 1.06 for a highly asymmetric distribution. For the five-category condition, sets of four thresholds from the previous study (Muthén & Kaplan, 1985) were used. The threshold values were [1.645, .643, −.643, −1.645] for a symmetric distribution, [1.645, 1.036, .385, −.385] for a slightly asymmetric distribution, [1.881, 1.341, .772, −.050] for a moderately asymmetric distribution, and [1.645, 1.282, 1.036, .674] for a highly asymmetric distribution. The plots depicting these four distributions in both dichotomous and five-category items are presented in Figure 1 in the Supplemental Material.
In sum, there are 3 (factor loadings) × 3 (factor correlation) × 3 (sample size) × 2 (number of response categories) × 4 (distribution of data) = 216 total conditions in this study. For every condition, 1,000 unique data sets were generated.
Data Analysis
The R software (R Core Team, 2023) was used to generate data and conduct PA and EFA. The fa.parallel function in the psych package version 2.4.1 (Revelle, 2024) was utilized to conduct PA based on both Pearson and polychoric correlations. Principal axis factoring for the factor extraction method and 95th percentiles of the eigenvalue criterion were employed for the PA. The fa.function function offers two methods to generate random comparison data: random data generation and random sorting (resampling; Revelle, 2024). Random sorting means every value of each item was randomly rearranged, such that the expected correlation of any pairs of items would be zero while maintaining the univariate distribution of each item. Revelle (2024) comments that the two methods produce very similar results, and we chose the random data generation method. For evaluation, the proportion of choosing the correct number of factors was calculated in each condition.
The efa function in the lavaan package version 0.6.15 (Rosseel, 2012) was used to conduct EFA and produce model fit indices. The EFA was conducted using an oblique geomin rotation with the WLSMV estimation method. A one-factor solution, a two-factor solution (the true model), and a three-factor solution were fitted for each population model. The fit statistics, CFI, TLI, RMSEA, SRMR, and their robust corrections of each model across the conditions were produced, and the mean and standard deviation of those fit statistics were computed. Then, the proportion of (robust) CFI and (robust) TLI of each model higher than .90 and .95, and the proportion of (robust) RMSEA and SRMR of each model below .08 and .05 were computed. These cutoff values represent “acceptable” and “excellent” fit, respectively (Browne & Cudeck, 1992; Hu & Bentler, 1999; West et al., 2012). The models which did not converge were not included when the proportion was computed. The models that had negative residual variances, which were “Heywood Cases,” were not excluded as a previous study did (Clark & Bowles, 2018) because it did not affect the fit (Briggs & MacCallum, 2003).
Results
Parallel Analysis (PA)
The convergence rate of PA based on Pearson correlation was 100% across conditions. When polychoric correlation was utilized for PA analysis, the convergence rate was 100% for dichotomous items but 94.4% (range 40.9%–100%) for five-category items due to low convergence rates for small sample size conditions (see Supplemental Table 87 for details).
Table 3 demonstrates factor retention results based on PA utilizing both Pearson and polychoric correlations for dichotomous and polytomous items with five categories across conditions when the sample size is big (n = 1,000). The results of PA based on both Pearson and polychoric correlation demonstrated a 100% accuracy rate across conditions under moderate and high factor loading conditions. When the factor loadings were low (.3), the accuracy rate was still nearly 100% in most conditions, but it was lowered to 90.2% and 38.5% for dichotomous items for Pearson and polychoric correlation, respectively, under highly asymmetric distribution and high factor correlation condition.
Factor Retention (as Percentages) Based on Parallel Analysis (n = 1,000)
Note. Cor = Correlation; r = Pearson correlation,
When the sample size was medium (n = 500), accuracy rates were nearly 100% under moderate and high factor loading conditions in both PA based on Pearson and polychoric correlation. The accuracy rates decreased under low factor loading conditions when items were dichotomous, especially in high factor correlation and asymmetric data. The accuracy rates of PA based on Pearson correlation were 78.4% to 96.9%, and PA based on polychoric correlation was 19.9% to 64.1% under highly asymmetric, low factor loading conditions. When the sample size was small, the accuracy rates still showed almost 100% in most moderate and high factor loading conditions in both PA based on Pearson and polychoric correlation. However, the accuracy rates lowered under low factor loading, high factor correlation, and highly asymmetric conditions, especially in dichotomous data. This tendency became more pronounced when utilizing PA based on polychoric correlation, resulting in a notable decrease in accuracy from 10.7% to 23.4% under highly asymmetric, low factor loading conditions in dichotomous data. In contrast, PA based on Pearson correlation exhibited a comparatively lower drop from 59.3% to 86.5% (see Supplemental Tables 1 and 2).
Overall, PA based on Pearson correlation yielded accurate results in most conditions and showed a tendency that the accuracy rate was higher when factor loadings were increased, factor correlations were smaller, the sample size was bigger, and the data was symmetric. In addition, PA based on Pearson correlations produced a higher accuracy rate than PA based on polychoric correlations in general, even though PA based on both correlations showed almost 100% accuracy under the high factor loading and symmetric conditions.
EFA
The convergence rates were calculated by the proportion of the fit indices produced by lavaan output and 100% for most conditions in one-factor and two-factor solutions. There were a few conditions under high factor correlation that showed less than 100%, but the lowest convergence rate was 97.8%. The robust corrections to CFI, TLI, and RMSEA showed similar results in polytomous data. However, they yielded lower convergence rates in some dichotomous item conditions, especially when the sample size was small, and the distribution was nonnormal. The convergence rate of robust fit indices dropped to 4.3% under highly skewed small sample conditions in dichotomous data. The three-factor solution, an overfactored model, yielded a mean of 81.7% for CFI, TLI, and RMSEA, 89.4% convergence rate for SRMR, and 75.0% for the robust corrections to CFI, TLI, and RMSEA across conditions. The lowest convergence rate of the robust corrections to the fit indices, 3.4%, was produced under a small sample size and highly asymmetric conditions in dichotomous items. The convergence results across conditions for two- and three-factor solutions are presented in the Supplemental Materials.
The percentage in cutoff values of each fit index across conditions were presented for EFA results. The CFI, TLI, and RMSEA were discussed first with their corresponding robust corrections, and then SRMR was presented. The results with a big sample size with conservative cutoff values (.95 for CFI and TLI and .05 for RMSEA and SRMR) of both dichotomous and polytomous items were presented in the tables. The results of other conditions, as well as liberal cutoff values (.90 for CFI and TLI and .08 for RMSEA and SRMR), are provided in the Supplemental Material with the mean and standard deviation of each fit index.
CFI and Robust CFI
Table 4 illustrates the percentages of replications in a cutoff value of .95 for CFI in a big sample size across conditions. The CFI demonstrated a 100% accuracy rate in a two-factor solution (correct model) across conditions with both dichotomous and polytomous items (M = .997–1.00, SD = .000–.004). However, the CFI could not detect the three-factor solution and showed a 0% accuracy rate as it resulted in a perfect fit (M = 1.00, SD = .000–.001). In a one-factor solution, the accuracy rates were almost 100% in the low or moderate factor correlation as none of the replications showed higher than .95, but the accuracy rates became low when the factor correlation was high (i.e., 9.5%–91.2% for dichotomous items, 1.6%–100% for polytomous items). As it is shown in Table 5, the robust CFI produced noticeably more accurate results even under high factor correlation conditions. The accuracy rates were above 93.4% and 100% in both one- and two-factor solutions for dichotomous and polytomous items, respectively.
Percentages of Replication in Cutoff Values (>.95) for CFI (n = 1,000)
Note. # of Cat = Number of categories; Low loadings = low factor loadings (.3); Moderate Loadings = moderate factor loading (.5); High loadings = high factor loadings (.7); Low r = low factor correlations (.2); Mod r = moderate factor correlations (.5); High r = high factor correlations (.7); Sym = symmetric distribution; Asy1 = slightly asymmetric distribution; Asy2 = moderately asymmetric distribution; Asy3 = highly asymmetric distribution.
Percentages of Replication in Cutoff Values (>.95) for Robust CFI (n = 1,000)
Note. # of Cat = Number of categories; Low loadings = low factor loadings (.3); Moderate Loadings = moderate factor loading (.5); High loadings = high factor loadings (.7); Low r = low factor correlations (.2); Mod r = moderate factor correlations (.5); High r = high factor correlations (.7); Sym = symmetric distribution; Asy1 = slightly asymmetric distribution; Asy2 = moderately asymmetric distribution; Asy3 = highly asymmetric distribution.
Although the results were not presented in this paper because of space (see Supplemental Material), when the cutoff value of .90 was applied, the CFI still showed 100% accuracy for a two-factor solution across conditions. However, the CFI showed a very low accuracy rate in the one-factor solution, especially under high factor correlation conditions (0.0%–1.3% for dichotomous items, 0.0%–7.5% for polytomous items in big sample size conditions). In addition, when sample sizes were small and medium (n = 200, 500), CFI yielded very similar to those in Table 4. For robust CFI, the accuracy rate at the .95 cutoff decreased when the sample size was smaller. Specifically, the accuracy rates of one-factor solution and two-factor solution were above 92.0% and above 73.3% in medium sample size and above 89.3% and above 36.8% in small sample size. However, for five-category items, the accuracy rates at the .95 level were above 98.5% and 97.7% in one-factor and two-factor solutions, respectively, in medium sample size and above 92.9% and 70.5% in small sample size conditions.
TLI and Robust TLI
Table 6 presents the percentages higher than the .95 cutoff value for TLI when the sample size is big (n = 1,000). The TLI showed a 100% accuracy rate for a two-factor solution in both dichotomous and polytomous data (M = 1.00, SD = .000–.011). Similar to CFI, the TLI could not detect the three-factor solution and showed a 0% accuracy rate (M = 1.00–1.01, SD = .001–.010). The TLI showed a 100% accuracy rate in most conditions in the underfactored model except for the high factor correlation condition (51.6-99.9% for dichotomous items, 61.0%–100% for polytomous items). The robust TLI results are shown in Table 7. The accuracy rates slightly decreased in the two-factor solution to 83.2% for the dichotomous items and 99.5% for five-category items. However, the robust TLI yielded better results than TLI for a one-factor solution, as the accuracy rates were 100% across all conditions.
Percentages of Replication in Cutoff Values (>.95) for TLI (n = 1,000)
Note. # of Cat = Number of categories; Low loadings = low factor loadings (.3); Moderate Loadings = moderate factor loading (.5); High loadings = high factor loadings (.7); Low r = low factor correlations (.2); Mod r = moderate factor correlations (.5); High r = high factor correlations (.7); Sym = symmetric distribution; Asy1 = slightly asymmetric distribution; Asy2 = moderately asymmetric distribution; Asy3 = highly asymmetric distribution.
Percentages of Replication in Cutoff Values (>.95) for Robust TLI (n = 1,000)
Note. # of Cat = Number of categories; Low loadings = low factor loadings (.3); Moderate Loadings = moderate factor loading (.5); High loadings = high factor loadings (.7); Low r = low factor correlations (.2); Mod r = moderate factor correlations (.5); High r = high factor correlations (.7); Sym = symmetric distribution; Asy1 = slightly asymmetric distribution; Asy2 = moderately asymmetric distribution; Asy3 = highly asymmetric distribution.
When the cutoff value of .90 was used, the TLI showed a nearly 100% accuracy rate for the two-factor solution but presented a very low accuracy rate under high factor correlation conditions in the one-factor solution (0.0%–10.2% for dichotomous items, 0.0%–53.3% for polytomous items in big sample size condition). While the TLI showed a similar pattern when the sample size was small or medium, the accuracy rates were slightly lower in a low factor loading condition of two-factor solutions (82.0%–98.3% for dichotomous item, 92.1%–100% for polytomous item at .95 cutoff level) and in high factor loading and high factor correlation conditions of the one-factor solution (25.0%–42.4% for dichotomous item, 41.8%–56.3% for polytomous item at .95 cutoff level) when the sample size was small. The robust TLI showed noticeably better results than TLI in the underfactored model (90.5%–100% for dichotomous item, 96.0%–100% for polytomous item at .95 cutoff level) in small sample size conditions. However, the accuracy rates of robust TLI were lower than TLI in a two-factor solution (31.1%–100% for dichotomous item, 56.8%–100% for polytomous item at a .95 cutoff level, see Supplemental Material for details).
RMSEA and Robust RMSEA
Table 8 illustrates the percentage of replications for an RMSEA value that shows an excellent fit (<.05) under big sample size conditions. Like TLI and CFI, the results showed 100% accuracy (M = .006–.007, SD = .008–.009) when the data were fitted with a two-factor solution. When the model was fitted with a one-factor model, the RMSEA showed a nearly 100% accuracy rate in most conditions except for high factor correlation and low factor loading conditions in dichotomous items (33.0%–99.1%). However, RMSEA also could not detect the overfactored model, and it showed an even better fit than the two-factor model (M = .001–.002, SD = .003–.005). The robust RMSEA results yielded 100% accuracy for the underfactored model, as presented in Table 9. However, the accuracy rates were lowered for a two-factor solution from 47.7% to 97.3% for dichotomous items and 86.3% to 100% for polytomous items.
Percentages of Replication in Cutoff Values (<.05) for RMSEA (n = 1,000)
Note. # of Cat = Number of categories; Low loadings = low factor loadings (.3); Moderate Loadings = moderate factor loading (.5); High loadings = high factor loadings (.7); Low r = low factor correlations (.2); Mod r = moderate factor correlations (.5); High r = high factor correlations (.7); Sym = symmetric distribution; Asy1 = slightly asymmetric distribution; Asy2 = moderately asymmetric distribution; Asy3 = highly asymmetric distribution.
Percentages of Replication in Cutoff Values (<.05) for Robust RMSEA (n = 1,000)
Note. # of Cat = Number of categories; Low loadings = low factor loadings (.3); Moderate Loadings = moderate factor loading (.5); High loadings = high factor loadings (.7); Low r = low factor correlations (.2); Mod r = moderate factor correlations (.5); High r = high factor correlations (.7); Sym = symmetric distribution; Asy1 = slightly asymmetric distribution; Asy2 = moderately asymmetric distribution; Asy3 = highly asymmetric distribution.
When the cutoff value of .08 was applied, the accuracy rate of the two-factor solution was 100% for both dichotomous and polytomous items. Although the accuracy rates of the underfactored model were also 100% for most conditions, the accuracy dropped to 7.1% and 0% when the factor loadings were low and factor correlation was high in asymmetric data in dichotomous and polytomous data, respectively. The RMSEA yielded similar results when the sample size was small or medium. Specifically, the accuracy rates of the one-factor solution were slightly lowered in some small or medium sample size conditions. The range of accuracy rate was 35.4%–100% for dichotomous items and 82.6%–100% for polytomous items at a .05 cutoff level. The robust RMSEA produced better or similar results than RMSEA in an underfactored model across conditions, but accuracy rates were lower than RMSEA in two-factor solution in small and medium sample size conditions, especially in dichotomous items. Specifically, the range of accuracy rates were 34.7%–100% and 97.8%–100% for a one-factor solution and 28.0%–77.9% and 49.8%–86.9% for two-factor solution in dichotomous and polytomous items, respectively, at .05 cutoff level (see Supplemental Material for details).
SRMR
Table 10 demonstrates the percentage of replications for an SRMR value that shows an excellent fit (<.05) when the sample size was large. The accuracy rates were almost 100% across all conditions in both one- and two-factor solutions (one-factor solution: M = .128–.180, SD = .005-.031, two-factor solution: M = .006–.035, SD = .001–.005) regardless of number of item categories. However, SRMR also could not detect three-factor models and shows a very good fit (M = .005–.025, SD = .001–.004).
Percentages of Replication in Cutoff Values (<.05) for SRMR (n = 1,000)
Note. # of Cat = Number of categories; Low loadings = low factor loadings (.3); Moderate Loadings = moderate factor loading (.5); High loadings = high factor loadings (.7); Low r = low factor correlations (.2); Mod r = moderate factor correlations (.5); High r = high factor correlations (.7); Sym = symmetric distribution; Asy1 = slightly asymmetric distribution; Asy2 = moderately asymmetric distribution; Asy3 = highly asymmetric distribution.
The results were similar in medium sample size conditions, and SRMR detected a one-factor solution with a high accuracy rate in small sample size conditions (99.5%–100%). However, the accuracy rates were lower in the two-factor solution when the sample size was small in both dichotomous and five-category items to 0.7%–100% and 44.5%–100%, respectively. The accuracy rates were low, especially in low factor loading and highly asymmetric data in small sample size conditions (see Supplemental Material for details).
Discussion
Determining the optimal number of factors is critical in factor-analytic studies. Common methods were developed in continuous variable contexts, and most simulation studies were conducted with continuous items, although studies dealing with categorical or nonnormal data are common in behavioral science. Therefore, this study conducted a Monte Carlo simulation to examine the performance of PA and model fit statistics of EFA in ordered categorical items under diverse conditions.
First, PA produced accurate results in most conditions, and PA based on Pearson correlation yielded better results under low factor loading conditions and similar results under moderate and high factor loading conditions than PA based on polychoric correlation. This is consistent with previous research that reported the PA based on Pearson correlation performed better or similarly than PA based on polychoric correlation (Cho et al., 2009; Weng & Cheng, 2005). The accuracy rate lowered when the factor correlation was high, as previous research also reported (Cho et al., 2009; Lim & Jahng, 2019; Yang & Xia, 2015), but this tendency was noticeable only in the low factor loading conditions, especially in dichotomous items with asymmetric data. When the distribution of data was more skewed, the accuracy rate of PA tended to be lowered, as earlier research reported (Garrido et al., 2013). In addition, the accuracy rate of PA decreased when the sample size was smaller, especially under low factor loading conditions, and this is consistent with a previous study that reported PA was not effective when the sample size was small (Yang & Xia, 2015).
Second, the RMSEA demonstrated comparable accuracy rates at .05 cutoff values with PA across conditions. The RMSEA detected the underfactored model 100% in most conditions except for both high factor correlation and low factor loading conditions. These results were inconsistent with the previous research, which demonstrated the RMSEA could not detect underfactored models at all, except in one condition at a .05 level (Clark & Bowles, 2018). This discrepancy can be originated from the simulation conditions and models that were analyzed. For example, this study examined a two-factor model from symmetric distributions to highly asymmetric distributions of the item, but a previous study (Clark & Bowles, 2018) employed a three-factor structure as a population model, and symmetrical and asymmetrical items were mixed in each condition. Further study is needed to investigate the performance of RMSEA in underfactored models under broader conditions.
The robust RMSEA showed better results than RMSEA in underfactored models because it produced higher accuracy rates even in low factor loading and high factor correlation conditions. However, the accuracy rate of the robust RMSEA of the two-factor model was lowered at .05 cutoff level, and it showed poor accuracy in small sample size, especially in dichotomous items. This is consistent with the previous study mentioned that the robust fit indices did not perform well in dichotomous data with small sample size (Savalei, 2021).
Third, the SRMR demonstrated perfect accuracy rates in both underfactored model and true model especially when the sample size was big. However, when the sample size became smaller, the accuracy of the SRMR decreased especially in low factor loading conditions. A previous study indicated unsatisfactory results for SRMR for categorical data (Garrido et al., 2016), but this study found that this is only the case when the sample size was small.
Fourth, the CFI and TLI produced analogous results. Both showed high accuracy rates in detecting the true model, but lower accuracy rates in detecting underfactored model than the RMSEA in high factor correlation conditions. This result is consistent with the previous study that reported the CFI and TLI were less likely to be rejected in the underfactored model when the factor correlations were increased (Clark & Bowles, 2018). As Clark and Bowles (2018) stated in their study, high factor correlation indicates that the factors strongly overlap, and it becomes difficult to distinguish as different factors in the model. Therefore, it is insensitive to detect underfactored models under high factor correlation conditions. The robust CFI and robust TLI showed a lot better results in detecting the one-factor solution than CFI and TLI. However, the accuracy rates of two factor model (i.e., true model) were lowered in both robust CFI and robust TLI indices.
Fifth, a previous study reported that the fit indices yielded considerably poorer results when the data was skewed (Garrido et al., 2016), and this study demonstrated similar findings. The accuracy rates of the number of latent factors were not affected by the skewness of the data in most conditions, but there was a tendency that the accuracy rates decreased in some conditions, such as small factor loadings and/or high factor correlation conditions, when the data are highly skewed in the underfactored model.
Sixth, all fit statistics that were used in this study (i.e., [robust] CFI, [robust] TLI, [robust] RMSEA, as well as SRMR) could not detect overfactored models at all and even produced better fit results in overfactored models than the two-factor models. This result is consistent with previous research (Clark & Bowles, 2018; Yang & Xia, 2015), and as the authors of the previous study mentioned, this is because overfactored models reproduce the observed data matrix more accurately (Clark & Bowles, 2018).
In summary, the simulation results described in this study demonstrated that PA performed better than the fit indices in general. However, under most conditions, both the RMSEA with all sample sizes and the SRMR with large sample sizes performed similarly to the Parallel Analysis (PA) method at a cutoff level of .05. In particular, they demonstrated superior performance in detecting underfactored and correct models, especially when employing five-category items. The TLI showed a lower accuracy rate in the underfactored model than RMSEA but higher than CFI in general. This result is in keeping with a recent study (Finch, 2020) in that the RMSEA was more accurate than the CFI or TLI in determining the optimal number of factors in EFA. However, these results are inconsistent with other previous research (Clark & Bowles, 2018; Garrido et al., 2016) that demonstrated the CFI and TLI performed better than RMSEA in deciding the number of factors in EFA.
Implications
When estimating the number of factors in EFA with ordinal variables, conducting PA based on Pearson correlation is recommended because it showed high accuracy rates in most conditions. However, PA is not recommended when the sample size is small, items are highly skewed, factor correlation is high, factor loadings are low, and the number of response categories is small.
The SRMR and RMSEA results were comparable with PA in most conditions. Specifically, the SRMR is recommended for selecting the number of factors in EFA with ordered categorical items when the sample size is big. However, SRMR should not be utilized when the sample size is small, especially in dichotomous data. Moreover, RMSEA can be a good option in general. However, RMSEA could not detect the underfactored model when the factor loadings are low and factor correlation is high, especially when the number of categories is small. While the robust RMSEA is not recommended in general, this would be a better option low factor loading and high factor correlation conditions for polytomous items with liberal cutoff value (.08) as it showed much better results in detecting the underfactored model than RMSEA.
Although TLI and CFI could not detect underfactored model when the factor correlation was high, TLI and CFI can be employed in EFA with ordered categorical items in small to moderate factor correlation conditions. Thus, the robust CFI and robust TLI were recommended for high factor correlation conditions because these indices detected the underfactored model very well. The robust fit indices showed better accuracy rates in underfactored model in general, but it showed lower convergence rates and lower accuracy rates in the correct model than the general fit indices in this study. The low convergence might be from the cML fit function used in the robust fit indices formula, which could not be calculated non-pd matrix as input as it was mentioned as a limitation in a previous study (Savalei, 2021). Also, robust fit indices were not recommended in dichotomous data with a small sample size, as the previous study reported (Savalei, 2021). In addition, as Clark and Bowles (2018) pointed out in their study, researchers should be aware that the fit indices cannot detect overfactored models in EFA.
Finally, the result of this study indicates that the common cutoff values for RMSEA of .08 and both CFI and TLI of .90 are likely too liberal when used in EFA with ordered categorical items. The cutoff values of .05 for RMSEA and .95 for CFI and TLI are recommended for determining the optimal number of factors in EFA. For robust CFI and TLI, while .95 is recommended in most conditions, the cutoff can be more liberal (i.e., .90) in small sample size because of the low accuracy rate of the two-factor solution at the .95 cutoff level. For the robust RMSEA, a liberal cutoff level (.08) showed better accuracy rates than .05 level in most conditions.
Limitations and Future Research
This study extended previous studies on the performance of PA and fit statistics in EFA with ordered categorical items through Monte Carlo simulation under broader conditions including robust fit indices which were understudied. However, the limitations of this study and future directions should be mentioned. Although the conditions of this study were chosen to represent a wide variety of situations as possible, including the four levels of skewness for items, there were limitations in covering every possible condition of research. For example, this study employed a two-factor model with 10 variables and analyzed data with dichotomous and five-category items. Therefore, further research is needed with a wider variety of conditions with more complex models.
In addition, this study employed WLSMV as an estimation method. However, previous research indicated that the difference between WLSMV and (robust) ULS was small (Forero et al., 2009; Yang-Wallentin et al., 2010), but ULS outperformed WLSMV in the accuracy of estimation, whereas the WLSMV outperforms ULS in convergence rates (Forero et al., 2009). Thus, simulation studies utilizing other estimation methods, such as ULS or robust ULS, are needed to investigate if the results show differences depending on the estimation methods.
There are other methods in choosing the number of optimal factors in EFA including exploratory graph analysis (EGA) (Golino & Epskamp, 2017). As the EGA method was reported to be more accurate than the PA or other traditional methods in determining the number of latent factors (Golino & Epskamp, 2017; Golino et al., 2020), future studies comparing the performance of EGA and fit statistics in various factor models with ordinal items are needed.
In conclusion, this study examined the performance of PA and fit indices including robust fit indices in EFA with ordered categorical data. The PA and fit indices, especially RMSEA, demonstrated a high accuracy rate in most conditions. While further studies with more diverse factor models should be conducted to generalize the results, we recommend employing the PA based on Pearson correlation and fit indices in determining the number of factors in EFA with ordered categorical data, contingent upon the conditions of the data and models such as sample size, factor correlation, factor loading, and item distribution.
Supplemental Material
sj-docx-1-epm-10.1177_00131644241240435 – Supplemental material for Comparing Accuracy of Parallel Analysis and Fit Statistics for Estimating the Number of Factors With Ordered Categorical Data in Exploratory Factor Analysis
Supplemental material, sj-docx-1-epm-10.1177_00131644241240435 for Comparing Accuracy of Parallel Analysis and Fit Statistics for Estimating the Number of Factors With Ordered Categorical Data in Exploratory Factor Analysis by Hyunjung Lee and Heining Cham in Educational and Psychological Measurement
Footnotes
Authors’ Note
The portion of this paper was presented at 2023 NCME Annual meeting, Chicago, IL Virtual session.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Lee was supported by the R01 and R56 grants from the National Institute on Aging (NIA) (R56AG075744, R01AG065110), and Cham was support by the R01 and R56 grants from the National Institute on Aging (NIA) (R56AG075744, R01AG065110), R01 grants from the National Institute on Minority Health and Health Disparities (NIMHD) (R01MD015763, R01MD015715), R21 grant from the National Institute of Mental Health (NIMH) (R21MH124902), and the grant from California Department of Public Health (#22-10079). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.
Supplemental Material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
