Abstract
Model-based multiple imputation has become an indispensable method in the educational and behavioral sciences. Mean and covariance structure models are often fitted to multiply imputed data sets. However, the presence of multiple random imputations complicates model fit testing, which is an important aspect of mean and covariance structure modeling. Extending the logic developed by Yuan and Bentler, Cai, and Cai and Lee, we propose an alternative method for conducting multiple imputation–based inference for mean and covariance structure modeling. In addition to computational simplicity, our method naturally leads to an asymptotically chi-square model fit test statistic. Using simulations, we show that our new method is well calibrated, and we illustrate it with analyses of three real data sets. A SAS macro implementing this method is also provided.
Introduction
Model-based multiple imputation is widely accepted as one of the most flexible methods for handling missing data in a variety of applied research settings (Allison, 2001; Little & Rubin, 1987; Schafer, 1997). The formulas that combine point estimates and standard errors across multiple imputations have become familiar sights to educational and behavior scientists. The concept of replacing unobserved values by random draws from their proper posterior predictive distributions has not only fundamentally changed how methodologists and statisticians deal with missing observations in data collection and study design but also exerted far-reaching influence on statistical modeling and computation involving latent variables. After all, latent variables and missing data are synonymous.
As an inferential framework, multiple model-based imputation is extremely general. This generality stems from the usefulness of a missing data formulation popularized by the seminal paper on expectation-maximization (EM) algorithm by Dempster, Laird, and Rubin (1977). Unlike full likelihood–based methods that often require significant amount of tailored development for each new specialized condition, with the availability of proper multiple imputations, the complete data modeling and estimation problems are often remarkably simple. This is most naturally illustrated by the multiple imputation methods for the treatment of survey nonresponse, but it is one of a number of contexts in which imputation is used as a critical device to obtain statistical adjustments that would otherwise be too complicated for routine application. We mention below two seemingly unrelated examples simply to highlight the fact that even though a bulk of our subsequent theoretical and empirical investigations will focus on missing observations in mean and covariance structure modeling, the approach we took can be equally palatable to other contexts in which imputations are required, even if no observation is apparently missing.
First, consider the plausible value methodology (see e.g., Mislevy, Beaton, Kaplan, & Sheehan, 1992), which is one of the backbones of the statistical framework employed in such large-scale educational assessment systems as the National Assessment of Educational Progress (NAEP) or Program for International Student Assessment (PISA). These assessments tend to focus on group-level (e.g., gender, ethnicity, country, etc.) inferences, but the individual students that make up the groups are usually administered few test items due to the complex study designs that are geared toward obtaining more efficient group-level inferences. This results in substantial uncertainty in the individual students’ test scores. The simple aggregation of student scores yields statistically inconsistent group-level estimates. Glossing over a number of important details, at the end, the plausible values are multiple imputations based on a regression model that properly reflects the uncertainty in individual students’ scores so that consistent group-level inferences can be drawn. Proper analysis of data sets containing plausible values requires the same tools as in multiple imputation for handling missing observations.
Next, consider Stuart and Rubin’s (2008) matching method for estimating causal effects from observational data. In their approach, they proposed that one may construct a matched control group from multiple sources of control units when the original control group does not provide enough overlap with the treated group on observed covariates. Having more than one source of control units may introduce bias in estimated average treatment effect. To combat this potential bias, Stuart and Rubin relied on a regression-based multiple imputation procedure. Again, the analysis of the multiple sets of data requires the same basic statistical tools for combining multiple imputations.
Rather independently of the multiple imputation literature (though exceptions do exist, e.g., Rubin & Thayer, 1982), mean and covariance structure modeling (Browne & Arminger, 1995; Yuan & Bentler, 2007) has become one of the most widely used statistical techniques in social and behavioral sciences. However, when mean and covariance structure models must be fitted to multiply imputed data sets, a number of difficulties arise. First, the standard multiple imputation inferential procedure of analyzing each imputation data set separately and combining the point estimates and standard errors at the very end is cumbersome at best. In practice, the researcher usually must fit a series of models to explore specification, to compare their fit, and to examine their relative substantive interpretability. In each step, a variety of statistical and heuristic indices are examined to guide the next move. If there are 20 imputation data sets, the researcher must carry 20 replications in each step of the search for model specification, a daunting task especially when some of the indices consist of entire matrices of numbers (e.g., residual correlations). Automated procedures (e.g., PROC MIANALYZE in SAS) for combining multiple imputation results may only alleviate some of the burden. Second, even setting the cumbersomeness aside, the standard procedure does not provide an overall model fit statistic, which forms the basis of model fit assessment in mean and covariance structure modeling. Current suggestions (e.g., Allison, 2001; Meng & Rubin, 1992) are either not accurate enough (see e.g., Allison, 2003) or require computations that are cumbersome and nonstandard, at least insofar as mean and covariance structure modeling is concerned. Moreover, the performances of those suggestions have not been evaluated in the context of mean and covariance structure modeling. Indeed, we view, as a serious drawback of the standard multiple imputation inferential procedure, the lack of a principled way for computing popular fit indices such as root mean square error of approximation (RMSEA; Browne & Cudeck, 1993), or incremental indices such as Tucker–Lewis index (TLI; Tucker & Lewis, 1973). Third, the current multiple imputation procedures are focused on obtaining corrected final parameter estimates and standard errors and do not encourage publishing enough intermediate results (e.g., mean vector and covariance matrix), so that the other researchers can replicate or meta-analyze the findings more easily. Cai and Lee (2009) also make this point.
In response to the technical difficulties, we propose a new two-stage procedure for conducting multiple imputation inference for mean and covariance structure modeling. We note that this research is not on how to build proper imputation models to produce multiple imputations. In the subsequent theoretical derivations, we also purposively make no distinction between the kinds of imputations involved, for example, missing observations, plausible values, adjustments to potential outcomes, and so on. We will treat the imputations as given and focus on statistical inference with the multiple imputations.
The guiding insight of our new approach is that at least for standard mean and covariance structure modeling, the combination of multiple imputations can occur in the beginning, before any structural models are even fitted. The underlying statistical theory is a direct extension of the results obtained by Yuan and Bentler (2000), Cai (2008), and Cai and Lee (2009) for the deterministic EM algorithm. The new procedure is computationally simple because once the imputations are combined, the structural equation modeler is back in the familiar territory of working directly with summary statistics such as the means and covariances. We develop an asymptotically chi-square distributed overall model fit statistic based on Browne’s (1984) residual-based test (Proposition 4) that can also be used naturally as a basis for additional fit indices.
Sometimes the use of multiple imputations is unavoidable (e.g., dealing with plausible values), but even when there is a choice, we argue that the flexibility afforded by multiple imputation can be decidedly advantageous. Full-information maximum likelihood (FIML; Anderson, 1957; Arbuckle, 1996) is an often recommended alternative estimation strategy for mean and covariance structure modeling when some observations are missing. Under Rubin’s (1976) classification system of the types of missing data, FIML enjoys desirable large sample optimal properties when data are missing at random and the missing data mechanism is ignorable. Despite the asymptotic optimality, we show that our new procedure performs practically as well as the asymptotically optimal FIML estimator in finite sample sizes. In contrast to FIML, one of the key benefits of multiple imputation lies in the relative ease of including into the imputation model variables that are not part of the structural model, but are related to the missing data mechanism. Therefore, when data are not missing at random and the missing data mechanism is non-ignorable, which is arguably more realistic, our new procedure can easily reap the benefit of alternative multiple imputation systems that directly model the missing data mechanism without requiring significant change to how mean and covariance structure analysis is conducted in practice.
The remainder of the article is organized as follows. We will begin with some technical development to clarify the details of the new estimator and the assumptions that we are making. We will then use simulation studies to illustrate the performance of the new procedure relative to the FIML estimator and the standard multiple imputation inferential procedure in the context of missing responses. We apply the new procedure to the analysis of several well-known real data sets. We will conclude by noting the limitations of our new approach.
Notation and Existing Methods
A Mean and Covariance Structure Model
Throughout this article, we will assume that we are working with a multivariate normal data matrix
where
A typical example of a mean and covariance structure model is the extended factor analytic simultaneous equation model employed in the LISREL framework (Jöreskog & Sörbom, 2001). In LISREL, one may represent the measurement model for the ith case as
where
where
Taking expectations, we see that the model implies a linear mean structure
The implied covariance structure model is
The joint mean and covariance structure model is obviously
The exact form of the structural equation model is not essential here. Equation 7 is merely one of the many instantiations of the general mean and covariance structure model given in Equation 1. Other formulations (e.g., the Bentler-Weeks model) can be used. For our purposes, the model in Equation 7 is sufficient.
Estimation and Inference Under Standard Conditions
If there are no missing data in
and
respectively. We define
In practice, estimation is typically accomplished by invoking the multivariate normality assumption of
The minimizer of
Under appropriate conditions, it is consistent, asymptotically normal, and asymptotically efficient. Furthermore, the statistic
is distributed in large samples as a central chi-square variable with
Conventional Multiple Imputation Estimation and Inference
If missing values are present, or if some or all variables in
In brief, consider the
Standard multiple imputation formulas can be used to combine the point estimates and variability information (Schafer, 1997). The final point estimate is the average of the multiple imputation estimates:
For combining likelihood ratio goodness-of-fit test statistics, we may use the two statistics
Conventional FIML Estimator
As mentioned earlier, full-information maximum likelihood (Arbuckle, 1996) is a popular estimation method for mean and covariance structure models under ignorable missing data. In normal theory FIML estimation, an individual case’s contribution to the log likelihood can be defined as
where
This is obtained by removing the second row of a
The FIML estimator directly maximizes the log-likelihood function in Equation 14 to obtain the structural model parameter estimates. Let this maximum be denoted
The FIML estimator also naturally leads to a chi-square distributed fit statistic based on likelihood ratio comparisons. The above FIML log likelihood can be used to obtain an unstructured/saturated maximum likelihood estimate of the mean vector and covariance matrix under missing data. Let the saturated estimate of the mean vector be denoted
is asymptotically distributed as a central chi-square variable with
The New Two-Stage Estimator
The conventional multiple imputation procedure outlined previously is ideal for such modeling frameworks as linear regression analysis where the focus is on estimating and testing parameters. For reasons discussed earlier in the introduction section, it can be ill-suited to the practice of mean and covariance structure modeling. We shall call our new estimator the Multiple Imputation Two Stage (MI2S) estimator. As opposed to the standard practice of conducting separate structural analysis before combining the inferences, in MI2S, we combine the multiple imputations first and then estimate the structural parameters. By adapting a theorem (Proposition 4) proposed by Browne (1984), an asymptotically chi-square goodness-of-fit test statistic can be constructed.
Stage One: Combining Multiple Imputations
We assume the availability of
Consider imputation
The above equation is nothing more than a direct application of the standard multiple imputation combination rule to the unstructured/saturated first- and second-order moments of the complete data.
In essence, Equation 16 is a stochastic counterpart of the deterministic EM algorithm for handling missing data in the multivariate normal model (Dempster et al., 1977). EM produces maximum likelihood estimates of the unstructured/saturated first and second order moments based on the observed data. Here, the estimation of
Under multivariate normality of the complete data model, we can obtain the following estimate of the complete data Fisher information matrix based on imputation
where
The form of
A basic tenet of the theory behind multiple imputation is that if the imputations are proper, and if
Stage Two: Estimation and Inference
More formally, for the mean and covariance structure model
where
A comparison of Equations 11 and 19 clearly reveals the fact that the multiple imputation estimate of means and covariances
is similarly distributed as a chi-square variable, but as Yuan and Bentler (2000) noted, in general this is not true. Due to the presence of multiple imputations, the asymptotic covariance matrix of
A related problem is that the standard errors based on the second derivatives of
To remedy the situation, we apply Browne’s (1984) Proposition 4 to the context of multiple imputation inference. Proposition 4 contains a residual-based test statistic that is asymptotically chi-square for any consistent and asymptotically normal estimator of
Under MI2S estimation, let
is asymptotically distributed as a central chi-square variable with
Following a result in Browne and Arminger (1995), the limiting covariance matrix of the MI2S estimator
Simulation Studies
The simulation studies involve a series of comparisons involving the FIML estimator, the standard multiple imputation inferential procedure, with the newly developed MI2S estimator in the context of missing observations. For MI2S, the multiple imputations are generated under the multivariate normal model with the MCMC method. Specifically, we implement a data augmented Gibbs sampler described by Schafer (1997). The starting values of the Gibbs sampler are obtained by running an EM algorithm that produces the maximum likelihood estimates of the unstructured means and covariances. The burn-in period of the Gibbs sampler is set to 1,000 iterations. Throughout this section,
Type I Error Rates
We conduct this study to show that the proposed MI2S fit test statistic
where the underlined values are considered as fixed for the scale identification of the latent factors. And the factor covariance matrix is
The unique factor variances are given by
Complete multivariate normal data having the generating factor analysis covariance structure are first simulated. To simulate missingness, the complete data sets are then subject to two kinds of missing data mechanisms: missing completely at random (MCAR) and missing at random (MAR). For the MCAR condition, for each row in the simulated data matrix, a fair dice is rolled to decide whether there should be any missing values. Next, for a case that is chosen to contain missing observations, the values for the last three indicators are set to missing. This leaves about 16% of all observations missing. This pattern may occur in practice when a subset of individuals do not complete the questionnaire by study design (planned missingness), for example, due to the high cost of measuring all variables for all individuals.
For the MAR condition, the probability of missingness of the last three manifest variables is set to be linearly related to the mean of the first six manifest variables, say,
The three missing data conditions (NOMIS, MCAR, and MAR) are crossed with three sample sizes:
Because the fitted model is correctly specified, our hypothesis is that
Table 1 presents a summary of the results of this simulation study. First, under NOMIS, both
Type I Error Rates
Note: The entries in the “Converged” column refer to the number of converged replications in each condition.
Power to Detect Model Misspecification
In this simulation, we deliberately mis-specify the fitted model so that we may investigate the power of the MI2S statistic. The generating model is essentially the same as the model used in the previous simulation study. The only difference is that the generating model contains an extra parameter—a cross-loading for the ninth indicator on the first factor
The missing data conditions remain the same as in the previous simulation study (NOMIS, MCAR, and MAR). There are three sample size conditions: 100, 300, and 500. In each of the nine conditions, 1,000 replications are attempted. For each replication, we compare the rejection rates of
Table 2 presents a summary of the simulation results. The trend is quite clear. The test based on
Power to Detect Model Misspecification
Note: The entries in the “Converged” column refer to the number of converged replications in each condition.
Bias and Variability
To take a closer look at the MI2S estimator, we examine the estimated relative bias (RB) and root mean square deviation (RMSD) of the estimates from true generating parameter values. We continue to use the same CFA data generating model as in the first simulation study. The fitted model is correctly specified, so in effect, we are examining parameter recovery. For a generic parameter
where
Table 3 presents the true parameter values, RB, and RMSD for key parameters (factor loadings and correlations) under MCAR when
Bias and Variability of Parameter Estimates Under MCAR (
Applications to Real Data
In this section, we illustrate the performance of the MI2S estimator with applications to three real data sets. We wrote a SAS macro fully implementing the MI2S estimator and used it to obtain the results reported here. The macro and data sets are freely available from http://lcai.bol.ucla.edu/. Its usage is documented in Appendix A. The first two applications highlight the fact that the MI2S estimator can provide an asymptotically chi-square distributed fit test statistic, missing information adjusted standard error estimates, as well as additional model fit indices such as RMSEA (with confidence interval) and TLI. The third application shows that the MI2S estimator can conveniently handle mean and covariance structure modeling for data sets containing multiple sets of plausible values. When we have to create multiple imputations using MCMC sampling, we generally use a burn-in period of 1,000 iterations and subsampling intervals of 200.
CFA
We first apply the MI2S procedure to the well-known Open-Book Closed-Book data set in Mardia, Kent, and Bibby (1979). The original data set contains test scores on five subject areas obtained from 88 examinees, that is,
The model specification is as follows. The factor loading matrix is
reflecting the testing mode (open-book vs. closed-book), and the factors are correlated
The unique factor covariance matrix is given by
To illustrate the newly proposed MI2S estimator, we artificially created MAR data by applying the same procedure used in Cai and Lee (2009), wherein the scores on the last three variables were set to missing if the sum of the first two was less than 80, resulting in 28 cases with missing values. Next, based on 20 imputations
The empirical results mirror the simulation study. More specifically, with 4 degrees of freedom, the MI2S test statistic
For the purpose of comparison, we fitted the model using the FIML estimator and obtained
We also calculated adjusted standard error estimates for the structural parameter estimates. The results are presented in Table 4. The entries in the “Adjusted” column show the standard error estimates produced by MI2S estimator and the entries in the “Unadjusted” column show the naive standard error estimates obtained by treating the combined estimate of the means and the covariance matrix as if it comes from complete data. Notice that the unadjusted standard error estimates are in general smaller than the MI2S standard error estimates. It is an artifact caused by neglecting the fraction of missing information in the combined estimates. In contrast, the MI2S standard errors are properly inflated, accounting for the missing information. For the purpose of comparisons, the standard error estimates from the standard MI estimator and FIML estimator are presented in the columns of “Standard” and “FIML,” respectively. It can be seen that these standard error estimates are not only properly inflated adjusting for the missing information but also very close to those from the MI2S estimator.
Standard Errors of Structural Parameter Estimates for the Open-Book Closed-Book Data
Note: The entries in the “Adjusted” column show missing information adjusted standard error estimates produced by MI2S estimator, whereas the entries in the “Unadjusted” column show the incorrect standard error estimates when the combined estimate
Conditional Latent Curve Analysis
In the second application, the data from a symposium at the 1997 meeting of the Society for Research on Child Development (Curran, 1997) were analyzed. The data set contains four repeated measures of
The goal is to fit a linear latent curve model (Bollen & Curran, 2006) to the repeated measurements of aggressive behavior. We included gender and mother’s age as time-invariant covariates, making
The MI2S test statistic and the FIML test statistic come out to be very similar, that is,
Analysis Involving Plausible Values: PISA
As the third application, the MI2S estimator is applied in the mean and covariance structure modeling of a data set containing multiple sets of plausible values: PISA. Conducted triennially by the Organization for Economic Co-operation and Development (OECD), PISA is a system of international assessments that measures 15-year-olds’ literacy in reading, mathematics, and science. The PISA data also include numerous items on student characteristics, student family background, and student perceptions, just to name a few. More information about PISA can be found at http://www.pisa.oecd.org/.
A distinctive characteristic of the PISA data set is that student performance on literacy scales and subscales are reported as
In this particular application, we use data from the assessment conducted in 2006, focusing on the United States school sample. PISA 2006 uses a two-stage sampling design in which schools are first sampled and then students are sampled in the participating schools. Therefore, it is required to use sampling weights or to model the clustered data structure explicitly (or maybe both) for sound statistical analyses. 1 However, for a simple illustration of the MI2S estimator under the presence of multiple sets of plausible values, we ignored the sampling weights.
First, purely for the sake of illustration, we developed a hypothetical structural equation model in which a student’s science literacy (SCI) is regressed on his or her mathematics literacy (MATH) and the value of science (VoSCI) that he or she holds. Figure 1 presents a conceptual path diagram of our structural equation model. MATH is an observed variable consisting of five sets of plausible values. SCI is a latent variable, measured by three subscale variables: explaining phenomena scientifically (EPS), using scientific evidence (USE), and identifying scientific issues (ISI). EPS is a scaling indicator with a fixed factor loading of 1.0. Each of the science subscale variables contains five sets of plausible values. VoSCI is a latent variable, measured by two manifest variables: the general value of science (GSCI) and perceptions of the personal value of science (PSCI). GSCI is the scaling indicator for VoSCI. Neither GSCI nor PSCI contains plausible values. We freely estimated the covariance of a student’s mathematics literacy (MATH) and his/her value of science (VosCI). To aid interpretation, we standardized all observed variables. The analysis sample includes cases with complete data for all variables

A conceptual path diagram for the hypothetical structural equation model for PISA data. Only the key structural parameters are shown.
Next, we combined the
Structural Parameter Estimates and Standard Errors for the PISA Data
Note: The standard errors are missing information adjusted standard error estimates produced by MI2S estimator.
Discussions
In this article, we proposed a new two-stage estimator and inferential tools (MI2S) for mean and covariance structure modeling under the presence of multiple imputations. While standard multiple imputation theory dictates that one should fit a mean and covariance structure model
Our simulation studies and empirical data analysis show that the MI2S estimator performs as well as the existing FIML estimator under MCAR and MAR conditions not only in terms of Type I error rates and statistical power of overall model fit statistics but also in terms of bias and variability of parameter estimates.
The MI2S estimator is potentially more flexible than FIML. For example, handling plausible values is natural in the MI2S estimator, whereas FIML estimator will not be of much help because there is technically no missingness in the data sets. In addition, inclusion of missing-data-relevant covariates is very straightforward because MI2S is within a multiple imputation framework. In contrast, though it is possible to include missing data relevant covariates for the FIML estimator (Collins, Schafer, & Kam, 2001; Graham, 2003), significantly more efforts are required if the missing data mechanism needs to be explicitly modeled. Moreover, as the number of missing-data-relevant covariates increases, model estimation and identification becomes increasingly problematic for FIML (Savalei & Bentler, 2009, p. 494).
Compared with the standard multiple imputation procedures, the MI2S estimator is computationally more efficient (i.e., a single model-fitting vs. multiple model-fittings) and produces a chi-square distributed test statistic
Our research is not without limitations. First, the derivations are exclusively based on multivariate normal theory. The multivariate normality assumption implies that in higher order moments, interactions or nonlinearities among variables are not modeled in the imputation process. Therefore, if such complex associations are to be a crucial part of the analysis, biased consequences may be obtained due to the inconsistencies between the model used in the imputations and the model used in subsequent analyses of the imputed data sets. This remains an inherent issue for multiple imputation based estimators, MI2S included.
When the variables are clearly not normal (e.g., categorical variables or design variables), a different imputation approach can be employed based on log-linear models or general location models (Schafer, 1997) or sequential generalized regression models (Raghunathan, Lepkowski, Van Hoewyk, & Solenberger, 2001). More research should be conducted in the future to explore the possible existence of non-normal versions of two-stage estimators that work under multiple imputation.
Second, our simulation studies only examined a small set of conditions. In particular, a closer examination of Table 1 reveals that, as sample size increases, Type I error rates of
Third, we only implemented the proposed methods in SAS because of its widespread use in both academic and nonacademic settings. Though our macro is freely available, it is nevertheless restricted to a single software environment. We will explore the possibility of implementing the method in free statistical software such as R.
Footnotes
Appendix A
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Part of this research is made possible by grants from the Institute of Education Sciences (R305B080016 and R305D100039) and grants from the National Institute on Drug Abuse (R01DA026943 and R01DA030466).
