Accounting for model misspecification in Bayesian structural equation models is an active area of research. We present a uniquely Bayesian approach to misspecification that models the degree of misspecification as a parameter—a parameter akin to the correlation root mean squared residual. The misspecification parameter can be interpreted on its own terms as a measure of absolute model fit and allows for comparing different models fit to the same data. By estimating the degree of misspecification simultaneously with structural parameters, the uncertainty about structural parameters reflects the degree of model misspecification. This results in a model that produces more reliable inference than extant Bayesian structural equation modeling. In addition, the approach estimates the residual covariance matrix that can be the basis for diagnosing misspecifications and updating a hypothesized model. These features are confirmed using simulation studies. Demonstrations with a variety of real-world examples show additional properties of the approach.
Structural equation modeling (SEM) is a popular statistical method for modeling covariance matrices. SEMs usually return a set of structural parameters that are configured in a way that is substantively interesting to the data analyst, that is, the relation between observed indicators and unobserved factors in confirmatory factor models or directional relations between latent factors in latent regression models. Theoretically, this configuration is simply a structured covariance matrix (— being a substantively interesting parameter vector) which is often different from the true covariance matrix, . Restated, the information in the true covariance matrix can be summarized by configured in a way as to have substantively interesting interpretations—SEMs are an attempt at data reduction. If the data reduction is to be believed, then should not be too distinct from . It is for this reason that model misspecification plays a big role in SEMs, and the gap between and often forms the basis for SEM goodness-of-fit indices, of which there are many.
Bayesian SEMs (BSEM, Levy & Mislevy, 2016; Song & Lee, 2012) are an approach to SEMs formulated on the basis of Bayesian probability that often relies on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution of . BSEMs are increasingly popular for a wide variety of reasons: a general increase in the accessibility and use of Bayesian methods, and for a number of advantages BSEMs have over traditional or frequentist SEMs—the most commonly cited being the ability to estimate even when there is limited information in the data to estimate (e.g., Dunson, 2000; Lee et al., 2007; Scheines et al., 1999).
However, given the relative recency of BSEMs, model misspecification is an area of active research, and a number of traditional fit indices have been adapted to the Bayesian context. Posterior predictive p values (Levy, 2011) are akin to the frequentist test of absolute fit in a Bayesian context. Levy (2011) also proposed a Bayesian standardized root mean squared residual (SRMR)—an effect size style fit index. Hoofs et al. (2018) noted that posterior predictive p-values may be sensitive to trivial forms of misspecification in large samples and developed a Bayesian variant of the root mean square error of approximation (RMSEA) for use in large samples. And the work of Garnier-Villarreal and Jorgensen (2020) is the most extensive adaption of several frequentist fit indices for BSEM. Finally, generic information criteria-based fit indices are also used for comparing different BSEMs fit to the same data (Cain & Zhang, 2019; Merkle & Rosseel, 2018).
In this article, we present an approach to BSEMs with a saturated mean structure that models the degree of model misspecification as a parameter. Although modeling misspecification as a parameter has been accomplished in the frequentist literature (via modeling the RMSEA, Wu & Browne, 2015), our approach is uniquely Bayesian in that it leverages Bayesian techniques and approaches to quantifying uncertainty. The approach permits: (a) investigation of model fit in absolute terms; (b) model comparisons; and (c) investigation of potential modifications to the model. The approach accomplishes (a) by returning a misspecification parameter that is similar to the correlation root mean squared residual (CRMR, Ogasawara, 2001), which can be substantively interpreted on its own terms (Maydeu-Olivares, 2017). Model comparisons may then be performed using the posterior distribution of the returned fit index. Finally, one can investigate potential model modifications by focusing on the residual covariance matrix estimated by the approach.
Similar to Wu and Browne (2015), we are not simply intent on presenting another approach for evaluating the fit of SEMs. Our approach models misspecification simultaneously with structural parameters such that uncertainty due to model misspecification is then reflected in uncertainty about structural parameters. This aspect of our work is an advantage over other BSEMs which ignore this source of uncertainty. Practically, our approach will often result in structural parameters with wider intervals than a more typical BSEM.
In the next section, we elaborate on the approach we recommend. Afterward, we conduct simulation studies that evaluate the validity of the approach and compare the approach to extant methods. Then we demonstrate the approach with three datasets to further explore the behavior of the approach and conclude with a discussion and steps for further development.
All code for simulation studies, data analyses, Stan scripts, and supplementary materials are available at https://osf.io/29ydb/.
Misspecification as a Model Parameter
Wu and Browne (2015) developed a frequentist approach for modeling misspecification as a parameter such that uncertainty about structural parameters reflects model misspecification. Ideally, their work would be a starting point for a similar Bayesian endeavor as ours. However, we approach the same goal somewhat differently primarily because their approach estimates the RMSEA as a measure of the model misfit. We prefer a different approach due to challenges with interpreting the RMSEA (e.g., Chen et al., 2008; Savalei, 2012). That said, if a researcher is primarily interested in having uncertainty about structural parameters reflect the degree of model misspecification, then the approach of Wu and Browne (2015) should be sufficient and is readily extendable to the Bayesian context.
We lay out our proposal using the confirmatory factor analysis (CFA) model, although the proposal is valid for SEMs where the covariance matrix is sufficient. Our starting point is a Bayesian CFA model presented by Muthén and Asparouhov (2012), hereafter MA (2012):
where is the loading matrix with some elements set to 0 on the basis of theoretical considerations and model identification constraints, is the interfactor correlation matrix, is a diagonal matrix of residual variances, and is a full residual covariance matrix. Estimating without regularization will cause the model to be under-identified. MA (2012) estimate with a highly informative inverse-Wishart prior that implies —the vector of off-diagonal elements in —has a mean of 0 and a very small variance.
Substantively, this model assumes that major theoretical factors cause observed indicators to be correlated in a configuration assumed known a priori , while minor factors cause indicators to be trivially correlated in ways assumed unknown a priori, . This model is practically useful because it relaxes the assumption of local independence—often unrealistic for real-world data—while permitting simple structure. Under a sound theory for observed indicators, the correlations induced by the minor factors should be trivial suggesting that a simple structure is indeed reasonable for the indicators. When the correlations induced by minor factors are non-trivial, the minor factors are not minor casting doubt over the hypothesized configuration of the observed indicators.
The preceding paragraph forms the basis for our approach. In MA (2012)’s original formulation, the modeler sets the variance of by specifying a known inverse-Wishart prior on . In our formulation, we assume this variance is unknown and attempt to learn the variance from the available data.
Prior Strategies for
forms the residual covariance matrix, so options for identifying must similarly address the estimation of . We consider two approaches for estimating both matrices. In addition, we set priors for the data under the assumption that items are scaled with a total variance of about one.
Method 1: Separation Strategy
Let such that is a diagonal matrix of residual standard deviations and is the residual correlation matrix. We model both and separately (separation strategy, Barnard et al., 2000) which has the benefit over Wishart-type priors of setting the priors differently for the residual standard deviations and the residual correlations.
Prior Distributions
Under this strategy, we assume that , that is, the residual standard deviations have a weakly informative (Gelman et al., 2008; Lemoine, 2019) half- prior. And we assume , that is, the correlation matrix is LKJ-distributed (Lewandowski et al., 2009) with shape parameter .1 Accordingly, the marginal distribution of each residual correlation, , in is , where is the number of items. Practically, the residual correlations are assumed to have a mean of 0 and standard deviation, , that is, as . The main parameter to be learned from the data is , and we intend for it to be free to be as large as possible. Hence we assume , such that there is sufficient prior probability on very large values of .
A Standardized Metric for Model Fit
helps assess model fit as it is the root mean squared error (RMSE) of residual correlations with the location assumed to be 0. As , we can assume that residual correlations are increasingly trivial. However, is on the scale of residual standard deviations. To be able to use to compare different models fit to the same data, must be on the scale of the total variance. Accordingly, we rescale : , where is the -th diagonal element of such that is the RMSE of residual covariances. Finally, to create a standardized metric of model fit, we standardize : , where is the -th diagonal element of . Hence, is the root mean squared error of standardized residual covariances (SRCs).
Method 2: Hierarchical Estimation of Residual Covariances
As an alternative, a simple approach for estimating residual covariances in , , is to standardize the residual covariances and assume the SRCs to be normally distributed: , where is the -th diagonal element of and the scale parameter, , is learned from the data: .2 Hence, functions as the RMSE of SRCs with location assumed to be 0, and the prior for is weakly informative as SRCs will often be much lower than 1. We set the diagonal of to , and similar to the first method, we assume that the residual standard deviations are half-: .
The rationale for this approach is that the SRCs are on average 0, but each SRC may differ from 0 in continuous gradations. Hence, this approach can be seen as a ridge penalty (Park & Casella, 2008) for SRCs where is the shrinkage parameter. The connection to ridge regression opens up the possibility of other priors for SRCs. For example, sparsity may be desirable under the assumption that some SRCs are indeed 0 leading to alternative priors for SRCs such as double-exponential (lasso, Park & Casella, 2008) or global-local approaches that constrain some SRCs to 0 while allowing other SRCs to escape this constraint (e.g., horseshoe, Carvalho et al., 2009). Broadly, Bayesian estimation accommodates a variety of options for estimating SRCs. We believe the assumption that SRCs have a center of 0 with continuous gradations away from 0 is the most realistic, hence we maintain the normal distribution for SRCs above.
Similarities and Differences Between Both Methods
Both methods reflect an important observation about SEMs. It is commonplace in the SEM literature to assume that the population covariance matrix underlying a study is equivalent to a structured covariance matrix, that is, . Misspecified SEMs are then generated by “wrongly” setting some structural parameters to incorrect values, that is, assuming a cross-loading of 0.3 is 0. Differently, both methods above assume that misspecification (due to minor factors) is already present at the level of the population covariance matrix, that is, includes –MacCallum and Tucker (1991) suggested such a formulation for SEMs. Hence, even with a correct hypothesized structure, there is still error separate from sampling error. In both methods, contains this error and is determined by a parameter – in Method 1 and in Method 2. The key gain over MA ( 2012) is that this parameter can be estimated from an observed covariance matrix such that the parameter provides guidance on the magnitude of the effect of minor factors. In addition, different studies with identical values of either or will have different residual covariances at the population level, that is, the exact error due to minor factors is random, as opposed to being fixed.
Another similarity between both methods is that they produce fit indices, and or generically , which are similar to the CRMR. Although focuses on standardized residual covariances (like the SRMR), ignores the diagonal of the gap between the population covariance matrix and model-implied covariance matrix. In that respect, is more similar to the CRMR.
With regard to differences, the first approach based on the separation strategy represents a more credible data generation process, as it guarantees positive-definite population covariance matrices,3 while the second approach will generate non-positive-definite matrices as gets increasingly large. However, that the second approach is not always credible for data generation does not imply it has no value for estimation.
Proposed Bayesian Models
We now lay out the Bayesian models we use throughout this article unless otherwise specified and explain the model in the paragraphs that follow:
where is the sample covariance matrix assumed Wishart with degrees of freedom set to sample size , and scale matrix assumed to be the population covariance matrix, , rescaled by the degrees of freedom. are non-structurally zero elements of , with a scaling parameter that is learned from the data. The residual covariance matrices are summed to . The root of the diagonal of represents residual standard deviations and are assumed half-. Off-diagonal elements in , , represent purposely specified or hypothesized residual covariances. are standardized using the residual standard deviations , such that any hypothesized residual correlations are assumed beta-distributed with a boundary avoiding prior. Hence, our proposed models allow for both the a priori specification of known residual covariances in and the influence of minor factors in .
Permitting off-diagonal elements in both and complicates Method 1 slightly. Precisely, we permit off-diagonal residual covariances in via parameter expansion (Merkle & Rosseel, 2018; Palomo et al., 2007), such that is a diagonal matrix of residual variances after accounting for hypothesized residual covariances in . The remaining parameters and priors for Methods 1 and 2 remain unchanged from their earlier presentation. Taken all together, these priors are mostly weakly informative (Lemoine, 2019) for indicators with total variances close to 1.
Guidelines for Model Evaluation
The methods return which is similar to the CRMR. In the instance that the mean of SRCs is not 0, that is, the modeler fixes loadings such that the model implied covariances are systematically biased, both parameters estimate the RMSE of SRCs not the standard deviation of SRCs, i.e. such bias would be reflected in the returned index.
In the remainder of the article, we follow guidelines in section 10 of Maydeu-Olivares, (2017) for describing the magnitude of SRCs. If all SRCs are in , we decide the SRCs are negligible. And we classify SRCs exceeding the interval as non-trivial. We also acknowledge that the data analyst is free to set their benchmark in a given application with justification. We now identify some uses for with regard to model misspecification.
Benchmark for Acceptable Model Fit
When , most ( of) SRCs will be in the interval suggesting that most SRCs are negligible. When , most SRCs will be in the interval suggesting that most SRCs are small. Although these approximations are based on the normal distribution (Method 2), the normal distribution reasonably approximates the beta distribution (Method 1) when the beta shape parameters are equal (assumed for method 1) and exceed 10—a condition that will hold in the real world most datasets.
Relative Model Comparisons
So far, we have suggested that larger values of mean that minor factors account for more of the relation between indicators. However, a hypothesized model could also be incorrect such that reflects both the influence of minor factors and misspecification in the structured portion of the covariance matrix. Thus two models fit to the same data would differ in if one model better reflects the true structured covariance matrix underlying the data, with the more correct hypothesized structure having the lower . Differences or ratios for two models alongside credible intervals (based on the posterior distributions) provide an effect size of differences in model misspecification.
In addition to , examining allows for detecting large residual covariances, that is, relations where “minor” factors play an out-sized role. First, we standardize this matrix to ease interpretations: , where is a diagonal matrix with elements: , such that is the SRC matrix. SRCs whose 90% (or 95%) credible intervals (CI) are fully contained within may be judged to be trivially influenced by minor factors. SRCs whose 90% (or 95%) CI exclude 0 may be judged to be influenced by minor factors, though the influence may still be trivial depending on the estimate of the SRC. Absolute SRC estimates >0.1 pose a challenge to the theory. SRCs with 90% intervals that both exceed and contain 0 lead to an inconclusive state—they are neither clearly small nor distinct from 0.
When SRCs cannot be judged trivial, it is necessary to identify the cause of the non-trivial SRCs. Clusters of SRCs may point to modifications of the hypothesized structure. See Saris et al. (2009) for a discussion of the next steps on identifying model misspecifications.
We now present results from simulation studies to test the validity of the proposed approaches.
Simulation Studies
We conducted three simulation studies and summarize their results. In Study 1, we compared our proposed approach to extant approaches (standard BSEM, approximate-zero; Muthén & Asparouhov, 2012) under a relatively simple data generation process that assumed varying levels of influence of minor factors on the population covariance matrix. We found that the different approaches yielded negligible parameter bias for structural parameters, . However, only the proposed approaches correctly quantified uncertainty about across conditions. Based on the results of this study, we opted for the hierarchical estimation of residual covariances as our preferred method in the remainder of the article.
In Study 2, we further examined our proposed approach under a more typical data generation process that assumed varying levels of influence of minor factors on the population covariance matrix. In addition, we were interested in the ability of to select between a correct and a wrong model (that wrongly assumed unidimensionality of a two-factor model). We found that our proposed approach yielded adequate parameter recovery (bias and inference) for and was capable of distinguishing between a correct and a wrong model, especially for larger samples.
Finally in Study 3, we assessed the ability of the proposed approach to identify overly large residual covariances in in the data-generating process that were not prespecified by the modeler in . The proposed approach was able to identify such residual covariances in ; thus, the approach is capable of identifying notable missing residual covariances. Moreover, the proposed approach is capable of accurately estimating hypothesized residual covariances in simultaneously with .
With each simulation study, each design condition was replicated 1,000 times. The MCMC engine was Stan (Carpenter et al., 2017); for each model, there were 1,000 warm-up iterations, then 1,000 iterations were retained for inference across three chains. We retrieved (Vehtari et al., 2021)—it exceeded 1.05 less than 1% of the time for all parameters in each design condition for each model within each study suggesting MCMC convergence for parameters across almost all replications across studies.
Simulation Study 1
Data Generation Process
We set out to demonstrate the relative advantage of our approach over extant SEM estimation approaches. We assumed the separation strategy as the true model for the data as this model is both realistic—it includes the influence of minor factors—and produces positive-definite covariance matrices. The data generation process (DGP) assumed parallel indicators for each factor:
where , there were two correlated factors, the true total variance of indicators was 1, and the input data for analysis was . As can be seen from the DGP, the exact residual covariances are random, that is, vary across studies given a set level of misspecification represented by .
Conditions
We varied two factors in this study: (a) sample size, , by small (100), typical (300), large (1,000), and very large (5,000); (b) the size of minor factor influences or misspecification parameter, . Recalling the heuristic that most SRCs will fall in , these choices for correspond to models where most SRCs are negligible (within ), acceptable (within ), and several are nontrivial (exceeding ). Based on the relations in the separation-strategy section, , that is, corresponds to .
Models
We estimated the following models within each replication:
A standard frequentist model: parallel-indicators CFA with all residual covariances assumed null .
A standard Bayesian model (hereafter baseline): the corresponding Bayesian model with priors: , where is the residual variance.
An approximate-zero (hereafter AZ) model: tau-equivalent CFA using the framework defined by MA ( 2012). This model had the same priors as the baseline model. The extra parameter, , which estimates the influence of minor factor, was assumed to be inverse-Wishart with degrees of freedom set to , which implies a known standard deviation for residual covariances of about 0.1. Although we constrained the diagonal of to be identical, this model cannot have parallel indicators because there is no way to constrain the diagonal of to be identical across items using the inverse-Wishart distribution.
The separation-strategy (hereafter LKJ) approach corresponds to the DGP, with method specific priors as in line Method 1 of equation 2 and shared priors as in the baseline model.
The hierarchical estimation of residual covariances (hereafter normal) approach, with method-specific priors as in line Method 2 of Equation 2 and shared priors as in the baseline model.
Estimands and Performance Metrics
We were interested in the recovery of the structural parameters: , and the misspecification parameter, —we assessed all these parameters for the LKJ and normal models. For the frequentist model, we assessed all the structural parameters and computed the unbiased CRMR (Maydeu-Olivares, 2017). For the baseline model, we assessed all the structural parameters and computed the root mean squared error of realized SRCs using the approach laid out by Levy (2011). For the AZ approach, we assessed only and since the residual variance was split between two matrices. There is no prescribed RMR-type metric for this model.
We had two primary assessment metrics: bias and the empirical coverage rate (ECR) of the 90% confidence interval. We transformed bias to relative bias deeming relative bias within % as ideal and % as acceptable. We assessed ECR of the 90% credible intervals. We set (87.5%, 92.5%) and (85%, 95%) as ideal and acceptable limits for the 90% ECR, respectively.
Simulation Study 1 Results
Relative Bias for Structural Parameters
As shown in Figure 1, relative bias was ideal () for almost all parameters and acceptable () for all parameters across conditions and methods. The interfactor correlation was very slightly downwardly biased at a small sample size—this bias was negligible at a large sample size.
Study 1: Relative Bias of Structural Parameters.
Relative bias for
The most obvious problem as shown in Figure 2 was that computing the misspecification parameter based on the distribution of realized values (Levy, 2011) in the baseline model resulted in highly biased estimates for the misspecification parameter. This bias was only acceptable at and high misspecification . On the contrary, both the LKJ and normal methods returned acceptable bias across conditions with one exception: the LKJ approach had about 16% relative bias for the combination of small sample size and small misspecification .4 Hence both proposed approaches appear consistent and rarely biased. The unbiased CRMR was downwardly biased when except when the sample size was large.
Study 1: Relative Bias of
ECR for Structural Parameters and
As shown in Figure 3, the frequentist and baseline models resulted in severe under-coverage (ECR ) relatively often. This problem increased with larger values of and larger sample sizes. The AZ approach also demonstrated under-coverage for the loading and interfactor correlation when and . Finally, the ECR was adequate for both structural parameters and for both proposed approaches.
Study 1: Empirical Coverage Rate of 90% CI of Structural Parameters and
Discussion of Simulation Study 1
Results suggests adequate parameter recovery and inference for the proposed models in a relatively simple CFA scenario under the assumption that minor factors influence the population covariance matrix. The only exception was notable upward bias for at small sample size and small influence of minor factors. For this reason, of the proposed approaches, we continue with evaluation of the approach based on hierarchical estimation of the residual covariances (normal) approach.
As one might expect, ignoring the influence of minor factors—as the frequentist and baseline models do—leads to incorrect inference. Although structural parameter estimates remain unbiased, their uncertainty is underestimated (exact results in section C of the online supplement). This problem is magnified for larger samples or larger influence of minor factors. We find a similar undesirable result for the AZ model which assumes the size of the influence of a minor factor is known. This model also underestimates the uncertainty about structural parameters leading to poor inference.
An additional finding is the downward bias of the unbiased CRMR under low misspecification for smaller samples—this finding was reported in the original presentation of the unbiased CRMR (Maydeu-Olivares, 2017, Table 1). Importantly, this combination of conditions returns acceptable bias for returned by the normal model. This suggests that our method produces a fit index that is robust under certain scenarios that may affect the unbiased CRMR.
Finally, given the inability of extant approaches to produce adequate inference, we do not evaluate them going forward.
Simulation Study 2
We set out to study the ability of the proposed approach to choose between a correct and wrong model, again under the assumption of minor factor influences. We modified the DGP from Study 1 slightly:
The interfactor correlation is now .7; the true total variance of indicators remained 1.
We varied the same factors from study 1: (a) sample size, ; (b) the size of minor factor influences, . Based on the relations in the separation-strategy section, , where is the mean of the diagonal of that is, corresponds to .
Models
We estimated two models in this study, both based on the hierarchical estimation of residual covariances or normal model:
A correct model: according to Equation 2 (Method 2) that assumed the correct factor configuration.
A wrong model: according to Equation 2 (Method 2) that wrongly assumed a unidimensional factor instead of two correlated factors.
Wrongly assuming a unidimensional factor is a nontrivial misspecification. Given the DGP in Equation 4, the wrong model should have expected values of 0.083, 0.093 and 0.112 given the true population values of 0.025, 0.05, and 0.08. Hence, the wrong model would always be considered non-trivially misspecified. Ideally, should identify the better-fitting model and should be able to choose between both models.
Estimands and Performance Metrics
We were interested in: (a) the recovery of the structural parameters and in the correct model and (b) the ability of to distinguish both models. For assessing parameter recovery, we maintained the same metrics and benchmarks from Study 1: relative bias and ECR. For assessing the ability of to distinguish both models, we subtracted for the correct model from for the wrong model and checked the proportion of times the fifth percentile of this difference exceeded 0.
Simulation Study 2 Results
Relative Bias and ECR for Structural Parameters and
Relative bias was acceptable () for all parameters across conditions with two exceptions: was underestimated by 10.5% for both and . The ECR was acceptable () for most parameters across conditions with a few exceptions where the ECR > 95%. Complete relative bias and ECR results are in Section C of the online supplement.
Distinguishing Both Models Using
As shown in Figure 4, was able to distinguish both models to varying degrees, depending on sample size (more ability at large ) and the size of the influence of minor factors (more ability at lower ). When the influence of minor factors was low, was almost always successful at identifying the correct model as the right model. Substantively, when the influence of minor factors is large, it can be difficult to conduct informative model comparisons. Importantly, never led to the wrong conclusion, in fact was never lower for the wrong model. We do not expect this result to hold when the degree of misspecification for the major factors is less severe.
Study 2: Ability to Distinguish Correct From Wrong Model Using
Summary of Simulation Study 2
The results lead to two conclusions. The normal model is able to recover structural parameters and for a fairly typical CFA problem. In addition, is able to distinguish a correct model from a wrong model fit to the same data, assuming some influence of minor factors.
Simulation Study 3
We were interested in the ability of the normal model to identify unduly large residual covariances when the influence of minor factors is trivial. We modified the DGP from study 2:
Within each iteration, we randomly picked two pairs of residual covariances in , in the DGP, where each member of the pair belonged to a different factor. In half the conditions, the pairs were non-null: one residual correlation set to and the other set to , hereafter non-null delta condition. In the other half of conditions, the pairs were null, hereafter null delta condition. In the non-null delta condition, the absolute values of residual covariances in range from 0.069 (when ) to 0.21 (when ). We assumed such that residual covariances due to minor factors mostly lie in the interval, hence both pairs of residual covariances specified in should be larger than the elements in in the non-null delta condition.5
As noted in Equation 2, we estimate off-diagonal residual covariances in via parameter expansion (Merkle & Rosseel, 2018; Palomo et al., 2007), such that is a diagonal matrix of residual variances after accounting for hypothesized residual covariances in .
In addition to varying whether the residual covariances identified in were null or non-null, we varied the sample size, .
Models
We estimated two models in this study, both based on the normal model:
A complex model: according to equation 2 (method 2) that assumed the correct factor configuration, and pre-specified the two randomly selected residual covariances in .
A simple model: according to equation 2 (method 2) that also assumed the correct factor configuration, but did not pre-specify the two randomly selected residual covariances in .
In the null-delta condition, the simple model is correct while the complex model has two null parameters. In the non-null delta condition, the simple model is missing two parameters, while the complex model is correct. However, since the simple model estimates , both parameters should stand out in .
Estimands and Performance Metrics
We were interested in: (a) the ability of the simple model to identify the pairs of randomly selected elements in as the largest SRCs in (standardized ) in the non-null delta condition; (b) the ability of the complex model to correctly estimate the randomly selected elements in ; and (c) the ability of to distinguish both models. For assessing aim (a), we examined the proportion of times the simple model identified the randomly selected pairs in as the largest SRCs in . For assessing aim (b) we assessed the bias and ECR for both residual correlations estimated in in the complex model. For assessing aim (c), we subtracted for the complex model from for the simple model and checked the proportion of times the fifth percentile of this difference exceeded 0.
Simple Model: Identification of Large SRCs
In the null delta condition, the model identifies the pairs as the two largest SRCs in at a low rate, see left panel of Figure 5A. In the non-null delta condition, the model identifies at least one of the pair at a very high rate, while the ability to identify both pairs increases with sample size, see right panel of Figure 5A. In addition, the distribution of SRCs is always centered around 0 in the null delta condition (Figure 5B left panel), with variation that is a function of and sample size. The two pairs of randomly selected residual covariances result in estimates different from 0 in the non-null delta condition (Figure 5B right panel). Finally, the size of the SRC estimates increases with sample size, because the estimation of SRCs is regularized (shrunken) by their normal prior. These results suggest that can be used to identify overly large residual covariances that were not prespecified in .
Study 3: Detecting Large SRCs in Simple Model
Simulation Study 3 Results
Complex Model: Recovery of Residual Covariance Parameters
In the null-delta condition, the complex model estimates two redundant elements in with population parameter of 0, hence we are unable to compute their relative bias. These estimates had negligible bias, range = (–0.0049, 0.0024). In the non-null delta condition, the relative bias of both residual correlations ranged from -7.9% to 0.6%, suggesting acceptable bias for the residual correlations. Hence, the model is capable of accurately estimating prespecified residual correlations in alongside . In addition, the relative bias and ECR for structural parameters and in the complex model were almost always acceptable (section C of the online supplement).
Distinguishing Both Models Using
In the null delta condition, the 90% interval for the difference in for both models always included 0 across all conditions. Hence an investigator would rightly conclude that the simple model was just as acceptable as the complex model leading to a parsimonious solution. In the non-null delta condition, the 90% interval for the difference in for both models excluded 0: 1.9%, 30.9%, 68.3%, and 84.3% of the time for , respectively – was lower for the complex model. The ability to distinguish both models is markedly low for small samples.
Summary of Simulation Study 3 Results
These results demonstrate the ability to perform model diagnostics using when the influence of minor factors is assumed. And as one might expect, sample size plays a major role in the ability of the approach to yield informative diagnostics. When overly large SRCs are missing from a model (as in the simple model), the approach is able to identify them, especially when analyzing larger sample sizes. Moreover, the approach is able to estimate prespecified residual covariances alongside the full residual covariance matrix as in the complex model. However, the ability to detect missing residual covariances was very low at small sample sizes.
Demonstrations
Having shown that the approach produces reliable results, we now demonstrate the approach with real-world data. We fit the hierarchical estimation of residual covariances approach. As with the simulation studies, the MCMC engine was Stan. For each model, there were 2,000 warmup iterations, then 2,000 iterations were retained for inference across four chains. Sampler-specific diagnostics (Betancourt, 2017) were adequate for all estimated models. Across all models and parameters, the maximum never exceeded 1.01 suggesting parameter convergence for all parameters across models.
Holzinger-Swineford Example
We demonstrate the approach with the reduced Holzinger-Swineford dataset (n = 301, Rosseel, 2012). We assumed the following factor structure as a baseline model: , where represents estimated loadings and . are loadings constrained to 0. We assumed all factors (namely visual, verbal, and speed factors) were correlated. was 0.064, 95% CI [0.039, 0.098], such that one might expect some SRCs to exceed the desired interval. The unbiased CRMR (Maydeu-Olivares, 2017) for the corresponding frequentist model fit with lavaan (Rosseel, 2012) was 0.065, 95% CI [.047, .083]. The CRMR frequentist-CI is narrower because it ignores uncertainty due to minor factors. We inspected patterns in the SRCs (strict lower triangular part of Figure 6). We preferred to modify the factor structure as opposed to including residual correlations. Based on the consistently positive residual relation between Test 9 (discrimination straight and curved capitals) and Tests 1–3 (visual factor), we included Test 9 as an indicator of the visual factor that is, —this modification makes sense on face value.6 For the modified model, was 0.037, 95% CI [0.014, 0.064], suggesting SRCs were mostly contained within the desired interval. The unbiased CRMR for the corresponding frequentist model was 0.039, 95% CI [0.023, 0.055]. In addition, based on a comparison of posterior distributions for , there was 92% chance that reduced from the baseline model to the modified model. We inspected patterns in the SRCs (strict upper triangular part of Figure 6) and almost all SRC estimates were within with 90% CIs almost fully contained in the (−0.1, 0.1) interval. Given these results, we judged the effect of minor factors to be trivial. For comparison, we also fit a model that ignored the effect of minor factors (i.e., no matrix)—a standard BSEM. Both the modified and standard models had highly similar point estimates for all parameters, but the standard model had narrower credible intervals (Figure 7). These results reflect the findings from simulation Study 1, that is, accounting for minor factors rightly increases uncertainty about structural parameters.
Holzinger-Swineford Model: Standardized Residual Covariances in Matrix
Final Fitted Holzinger-Swineford Model Showing Parameter Estimates and 95% CI Across Fitted Models,
Political Democracy Example
We demonstrate the approach with the Political Democracy dataset (n = 75, Bollen, 1989). There were 11 items: four ratings data from 1960, four ratings data from 1965, and three economic measures from 1960. We assumed the following structured model: , for latent regression, and estimated factor (residual) variances and item residual variances—the Bayesian model is in section B of the online supplement. The ratings data were divided by 3 so that their variances were closer to 1. was 0.020, 95% CI [0.001, 0.043], and suggesting that all SRCs were within the desired interval. The unbiased CRMR for the corresponding frequentist model was 0.038, 95% CI [0.014, 0.061]. These frequentist values are notably larger than . However, the frequentist tests of exact (based on the CRMR) and close fit (based on the unbiased CRMR) were not statistically significant, , respectively. Hence in a frequentist context, there would be insufficient statistical power to claim misfit on the basis of this family of fit statistics. We also inspected patterns in the SRCs (section D of the online supplement), and all SRC estimates were within with 90% intervals contained in (−0.1, 0.1). And we conclude this model acceptable.
For comparison, we fit a model that ignored the effect of minor factors (i.e., no matrix)—a standard BSEM. Both models had highly similar point estimates for all parameters, and the standard model did not always have narrower credible intervals (Figure 8), especially when compared to the first demonstration. This matches the simulation study 1 finding that: when the influence of minor factors is smaller (smaller ) and sample size is small, the increase in uncertainty about structural parameters due to accounting for minor factor is lessened.
Political Democracy Model Showing Parameter Estimates and 95% CI Across Fitted Models,
Hospital Anxiety and Depression Scale Example
For our final example, we demonstrate the approach with the Hospital Anxiety and Depression scale Zigmond & Snaith, 1983). We use the pooled correlation matrix computed by Norton et al. (2013) in the context of a meta-analytic CFA as the input data, . We assumed the 14 items followed a bifactor structure: , representing general, anxiety and depression factors—all constrained orthogonal. was 0.028, 95% CI [0.023, 0.033], and all SRC intervals were fully contained the desired interval (section D of the online supplement). The unbiased CRMR for the corresponding frequentist model was 0.023, 95% CI [0.023, 0.024]. Hence, we conclude this model acceptable. For comparison, we fit a standard BSEM. Both models had highly similar point estimates for all parameters, but the standard model had markedly narrower credible intervals (Figure 9). This matches the simulation study 1 finding that for very large samples, the increase in uncertainty about structural parameters due to accounting for minor factor is magnified.
Hospital Anxiety and Depression Scale Bifactor Model Showing Parameter Estimates and 95% CI Across Fitted Models,
Discussion
We have presented a uniquely Bayesian approach to model fit, demonstrated the approach using real-world data, and shown that the approach is valid using simulation studies. The approach leverages Bayesian computation to model misspecification as a parameter, and permits the detection of non-trivial residual covariances. Practically, our approach would require fitting a new class of BSEMs beyond what is available in prepackaged BSEM software. For this reason, we have provided code to support adoption.
A potentially unappealing feature of the proposed approach is the increased uncertainty about estimates—an increase that is more marked at large sample sizes. However, we see this feature as a positive. A typical SEM implicitly assumes that the model configuration is correct. Uncertainty associated with specifying an incorrect model configuration is not reflected in the uncertainty about structural parameters. Given that models are at best approximations to reality, the typical SEM can be said to have overly optimistic uncertainty about structural parameters. MA (2012) capture this misspecification in the model by introducing , which reflects the influence of minor factors, under the assumption that the parameters underlying are known. We relax this assumption and estimate such that the size of the influence of minor factors is estimated by the model. Simulation results demonstrated that our proposed approach yields more reliable inference for structural parameters than fixing the size of the influence of minor factors a priori. In this regard, our approach is similar to the frequentist approach of Wu and Browne (2015) which models the RMSEA as a parameter. Both approaches produce structural parameters that reflect uncertainty due to model misspecification Alternatively stated: our proposed approach models the degree of incorrectness of the model. Hence, the uncertainty associated with specifying an incorrect model (which always exists in practice) is reflected in the uncertainty about structural parameters. Moreover, one application of our approach is the creation of realistic misspecified covariance structures, with varying levels of misspecification.
Another aspect of our work is that the proposed approach returns a misspecification parameter, , that is highly similar to the CRMR. Given that had acceptable bias and reliable inference across simulation conditions, may be considered as a credible fit index in Bayesian SEMs. Based on the simulation results, the variety of conditions under which is reliable is wider than that of the unbiased CRMR which is less reliable for the combination of small sample size and low influence of minor factors (Maydeu-Olivares, 2017). Moreover, the uncertainty about the unbiased CRMR does not reflect the uncertainty due to model misspecification as does.
A general concern with Bayesian methods is the influence priors have on the substantive results. In this article, there are two sets of priors: (a) priors for structural parameters and (b) priors related to the estimation of in both the normal and LKJ models. For structural parameters, we have opted for weakly-informative priors that are reasonable defaults when the observed items are scaled to have a variance of 1. Sometimes, researchers utilize more informative priors to augment relatively weak information in the data. Our expectation is that the use of informative priors will affect the proposed approach in ways that depend on the quality of the prior information (McNeish, 2016). When the informative priors increase efficiency about structural parameters, we can expect corresponding gains in the efficiency of and standardized residual covariances; and vice-versa. We do not expect any gains to be obtained from using extremely diffuse priors unless the data are very informative (Smid & Winter, 2020). For priors related to the estimation of for both the normal and the LKJ-models (for ), we hypothesize that alternative choices may yield improved results. Reasonable alternative priors include gamma, inverse-gamma, Student-, and log-normal distributions and other distributions with positive-only support. The choice of prior for is a question of optimizing the approach and we leave this question to future research.
Given the promise of Bayesian modeling of misspecification and as a fit index in this initial presentation, we consider some potential extensions. We intend to explore applications to Bayesian SEMs with non-continuous indicators such as for models with binary, ordinal or mixed indicators. Moreover, distributions with scale matrices are also candidates for extension, that is, a multivariate- extension would be reasonable for handling indicators with outliers. In addition, we hope to extend the work of Wu and Browne (2015) to a Bayesian context, after which the approach can be formally compared to our proposed approach. Finally, when presenting our approach, we mentioned the possibility of sparsity-inducing methods (e.g., lasso, global-local approaches) applied to the estimation of residual covariances—we also intend to explore this in a future study.
One limitation of our proposed approach is that does a poor job of distinguishing models at small sample sizes. One way to resolve this is to use generic information criteria (IC) for model selection as IC have already been applied to Bayesian SEMs (e.g., Cain & Zhang, 2019; Merkle & Rosseel, 2018).7 We consider using approximate Bayesian leave-one-out cross-validation to distinguish between proposed models. The results are promising—see implementation details and simulation results in section A of the online supplement.
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
ORCID iD
James Ohisei Uanhoro
Supplemental Material
All code for simulation studies, data analyses, Stan scripts and supplementary materials are available at
Notes
References
1.
BarnardJ.McCullochR.MengX.-L. (2000). Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage. Statistica Sinica, 10(4), 1281–1311.
2.
BetancourtM. (2017, January). A conceptual introduction to Hamiltonian Monte Carlo (arXiv: 1701.02434). https://arxiv.org/abs/1701.02434
CainM. K.ZhangZ. (2019). Fit for a Bayesian: An evaluation of PPP and DIC for structural equation modeling. Structural Equation Modeling, 26(1), 39–50. https://doi.org/10.1080/10705511.2018.1490648
5.
CarpenterB.GelmanA.HoffmanM. D.LeeD.GoodrichB.BetancourtM.BrubakerM.GuoJ.LiP.RiddellA. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 1–32. https://doi.org/10.18637/jss.v076.i01
6.
CarvalhoC. M.PolsonN. G.ScottJ. G. (2009, April). Handling sparsity via the horseshoe. In Proceedings of the twelfth international conference on artificial intelligence and statistics (pp. 73–80). PMLR. http://proceedings.mlr.press/v5/carvalho09a.html
7.
ChenF.CurranP. J.BollenK. A.KirbyJ.PaxtonP. (2008, May). An empirical evaluation of the use of fixed cutoff points in RMSEA test statistic in structural equation models. Sociological Methods & Research, 36(4), 462–494. https://doi.org/10.1177/0049124108314720
8.
DunsonD. B. (2000). Bayesian latent variable models for clustered mixed outcomes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(2), 355–366. https://doi.org/10.1111/1467-9868.00236
9.
Garnier-VillarrealM.JorgensenT. D. (2020, February). Adapting fit indices for Bayesian structural equation modeling: Comparison to maximum likelihood. Psychological Methods, 25(1), 46–70. https://doi.org/10.1037/met0000224
10.
GelmanA.JakulinA.PittauM. G.SuY.-S. (2008, December). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360–1383. https://doi.org/10.1214/08-AOAS191
11.
HoofsH.van de SchootR.JansenN. W. H.KantI. (2018, August). Evaluating model fit in Bayesian confirmatory factor analysis with large samples: Simulation study introducing the BRMSEA. Educational and Psychological Measurement, 78(4), 537–568. https://doi.org/10.1177/0013164417709314
12.
LeeS. Y.SongX. Y.TangN. S. (2007, July). Bayesian methods for analyzing structural equation models with covariates, interaction, and quadratic latent variables. Structural Equation Modeling: A Multidisciplinary Journal, 14(3), 404–434. https://doi.org/10.1080/10705510701301511
13.
LemoineN. P. (2019). Moving beyond noninformative priors: Why and how to choose weakly informative priors in Bayesian analyses. Oikos, 128(7), 912–928. https://doi.org/10.1111/oik.05985
14.
LevyR. (2011, October). Bayesian data-model fit assessment for structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 18(4), 663–685. https://doi.org/10.1080/10705511.2011.607723
15.
LevyR.MislevyR. J. (2016). Bayesian psychometric modeling (p. 466). Chapman and Hall/CRC.
16.
LewandowskiD.KurowickaD.JoeH. (2009, October). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100(9), 1989–2001. https://doi.org/10.1016/J.JMVA.2009.04.008
17.
MacCallumR. C.TuckerL. R. (1991). Representing sources of error in the common-factor model: Implications for theory and practice. Psychological Bulletin, 109, 502–511. https://doi.org/10.1037/0033-2909.109.3.502
18.
Maydeu-OlivaresA. (2017). Assessing the size of model misfit in structural equation models. Psychometrika, 82(3), 533–558. https://doi.org/10.1007/s11336-016-9552-7
MerkleE. C.RosseelY. (2018, June). blavaan: Bayesian structural equation models via parameter expansion. Journal of Statistical Software, 85(4), 1–30. https://doi.org/10.18637/jss.v085.i04
21.
MuthénB. O.AsparouhovT. (2012). Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychological Methods, 17(3), 313–335. https://doi.org/10.1037/a0026802
22.
NortonS.CoscoT.DoyleF.DoneJ.SackerA. (2013, January). The hospital anxiety and depression scale: A meta confirmatory factor analysis. Journal of Psychosomatic Research, 74(1), 74–81. https://doi.org/10.1016/j.jpsychores.2012.10.010
23.
OgasawaraH. (2001, September). Standard errors of fit indices using residuals in structural equation modeling. Psychometrika, 66(3), 421–436. https://doi.org/10.1007/BF02294443
24.
PalomoJ.DunsonD. B.BollenK. (2007). Bayesian structural equation modeling. In LeeS.-Y. (Ed.), Handbook of latent variable and related models (pp. 163–188). Elsevier/North-Holland.
RosseelY. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–20. https://doi.org/10.18637/jss.v048.i02
27.
SarisW. E.SatorraA.van der VeldW. M. (2009). Testing structural equation models or detection of misspecifications?Structural Equation Modeling: A Multidisciplinary Journal, 16(4), 561–582. https://doi.org/10.1080/10705510903203433
28.
SavaleiV. (2012, December). The relationship between root mean square error of approximation and model misspecification in confirmatory factor analysis models. Educational and Psychological Measurement, 72(6), 910–932. https://doi.org/10.1177/0013164412452564
29.
ScheinesR.HoijtinkH.BoomsmaA. (1999, March). Bayesian estimation and testing of structural equation models. Psychometrika, 64(1), 37–52. https://doi.org/10.1007/BF02294318
30.
SmidS. C.WinterS. D. (2020, December). Dangers of the defaults: A tutorial on the impact of default priors when using Bayesian SEM with small samples. Frontiers in Psychology, 11, 611963. https://doi.org/10.3389/fpsyg.2020.611963
VehtariA.GelmanA.SimpsonD.CarpenterB.BürknerP.-C. (2021). Rank-normalization, folding, and localization: An improved for assessing convergence of MCMC. Bayesian Analysis, 16(2), 667–718. https://doi.org/10.1214/20-BA1221
33.
WuH.BrowneM. W. (2015, September). Quantifying adventitious error in a covariance structure as a random effect. Psychometrika, 80(3), 571–600. https://doi.org/10.1007/s11336-015-9451-3
34.
ZigmondA. S.SnaithR. P. (1983). The hospital anxiety and depression scale. Acta Psychiatrica Scandinavica, 67(6), 361–370.