Handling Missing Data in Growth Mixture Models

Abstract

A Monte Carlo simulation was performed to compare methods for handling missing data in growth mixture models. The methods considered in the current study were (a) a fully Bayesian approach using a Gibbs sampler, (b) full information maximum likelihood using the expectation–maximization algorithm, (c) multiple imputation, (d) a two-stage multiple imputation method, and (e) listwise deletion. Of the five methods, it was found that the Bayesian approach and two-stage multiple imputation methods generally produce less biased parameter estimates compared to maximum likelihood or single imputation methods, although key differences were observed. Similarities and disparities among methods are highlighted and general recommendations articulated.

Keywords

growth mixture models, missing data, multiple imputation, Bayesian

In longitudinal data analytic settings, direct applications (Bauer, 2007; Harring & Hodis, 2016) of finite mixture models are commonplace in the empirical literature, in which the primary goal is to identify unobserved (latent) heterogeneous subpopulations of growth trajectories (Jung & Wickrama, 2007). As a hypothetical example, researchers may follow students’ self-concept of ability (SCA) in math scores over time and wish to identify which students in the sample are “constantly high” in SCA, “slow decliners” in SCA, or “rapid decliners” in SCA (Musu-Gillette et al., 2015). These subpopulations of students are latent and are not identified a priori like groups manifest in the data as observed categorical independent variables (e.g., gender, socioeconomic status, experimental condition). Instead, their existence must be inferred from unobserved clustering embedded in the growth patterns themselves. Growth mixture models (GMMs; Muthén, 2001; Muthén & Shedden, 1999) are commonly employed to model repeated measures data for investigating this type of population heterogeneity in individuals’ longitudinal profiles although other types of longitudinal mixture models exist (see, e.g., McNeish & Harring, 2020; Nagin, 1999).

The increased popularity of GMMs in recent years kindled a number of methodological studies that focused on issues pertaining to using the model under varying real-world conditions. Much of the methodological research has dealt with testing indices for data-model fit (Henson et al., 2007; Nylund et al., 2007), evaluating methods that help decide on the number of subgroups (Bauer & Curran, 2003; McNeish & Harring, 2017; Nylund et al., 2007; Tofighi & Enders, 2007), examining the classification accuracy of models (Enders & Tofighi, 2008; Peugh & Fan, 2012), comparing estimation methods for finite mixtures (Hipp & Bauer, 2006; McLachlan & Krishnan, 2007), or showing how differing settings other than the default settings in popular software for estimating these models can impact parameter recovery and precision (Li et al., 2014).

Despite the research made available to help inform and refine the use of GMMs, not much attention has been devoted to testing GMMs in the presence of missing data. In the social and behavioral sciences, missing data analysis merits special attention because repeated measures data used for fitting GMMs are typically rife with missing data due to various issues like data collection error, participant nonresponse to specific items, drop out, or failure to participate in at least one wave of data collection.

Studies on handling missing data in the context of GMMs recommend using full-information maximum likelihood (FI) over single-stage multiple imputation (MI; Enders, 2011; McLachlan & Peel, 2000; Sterba, 2016) because MI requires the grouping information to be known a priori in order to correctly impute data (Enders, 2010) for each group. Since the grouping information for mixture models is latent, the use of MI with mixture models is problematic. Failure to specify a grouping variable when using MI has been shown to produce biased parameter estimates and incorrect identification of classes (Enders, 2011; Sterba, 2016).

Some recent studies have also turned to Bayesian methods to estimate GMMs with missing data using Markov chain Monte Carlo (MCMC) algorithms such as the Gibbs sampler. In a simulation study by Depaoli (2013), parameter recovery for a three-class GMM was investigated under varying degrees of class separation, sample size (SS), class proportion, and method of analysis (FI vs. a fully Bayesian [FB] approach using different priors). It was shown that a Bayesian approach with informative priors produced less biased estimates of the means (slopes and intercepts) than when using maximum likelihood (ML) estimation, particularly when separation among classes was larger and SSs were modest. Parameter variances, however, were not well recovered under all conditions. One limitation of the study, however, was that missing data were not considered. Other studies such as those by Song and Lee (2002), Cai et al. (2010), and Lu et al. (2011) proposed using Bayesian methods to address missing data that were missing at random (MAR) when using GMMs, finding favorable outcomes to using such models.

Harel (2007, 2009) also suggested using a two-stage multiple imputation (2M) approach to remedy the problems of using single-stage MI with mixture models. The original idea proposed by Shen (2000) as a way to impute data of different types (Rubin, 2003; Schafer & Graham, 2002) was extended to the context of mixture regression models. According to Harel (2009), the 2M method allows unbiased parameter estimates that are similar to those obtained from FI estimation. Nevertheless, the extent to which the method will work and how well it will perform in the context of GMMs and the different conditions of missingness as compared to these other proposed approaches (FI, MI, and FB methods) is unclear.

Despite the vast methodological work done in the area of GMMs and missing data, no study has brought these methods together and compared their performance under varying conditions specific to GMMs and missing data. The purpose of this study is to fill this important gap in the literature by comparing different methods—those that have been suggested for handling missing data in the context of GMMs as well as those not yet extended to this longitudinal context.

To outline the structure of this article, we first overview the generic latent growth model (LGM) and demonstrate how it extends to GMMs. In a subsequent section, missing data and missingness mechanisms are outlined followed by a review of estimation methods used in the simulation. Details regarding the single and two-stage multiple imputation methods are also provided. We provide a Monte Carlo simulation study to highlight the performance of these methods across manipulated conditions thought to occur in practical research scenarios and those that have been shown to impact results in past GMM simulation studies. Specifically, we explore convergence, classification accuracy, relative bias of parameter estimates, and precision of the standard errors (SEs).

Overview of LGMs and GMMs

Latent Growth Models

Several options exist for analyzing data from longitudinal studies with a continuous repeatedly measured outcome. The latent curve model (McArdle, 1986; Meredith & Tisak, 1990), or more generally, LGM (see, e.g., Hancock et al., 2013), an important subclass within structural equation modeling (SEM), is commonly employed to analyze continuous repeated measures data of this type. Theoretically, this approach hypothesizes the existence of latent trajectories capturing an underlying change process that can only be observed indirectly via the repeated measures (Bollen & Curran, 2006). The LGM permits the correlational structure of the repeated measures to be separated into within-individual variability as well as between-individual variability in individual subjects’ growth characteristics across time (see, e.g., Preacher et al., 2008).

Consider the conventional linear LGM with r time-invariant covariates, which takes the form of a restricted confirmatory factor analytic model such that

y_{i} = Λ_{i} η_{i} + ε_{i},

η_{i} = α + Γ x_{i} + ζ_{i} .

In Equation 1, $y_{i} = (y_{i 1}, y_{i 2}, \dots, y_{i T_{i}})^{'}$ is a $T_{i} \times 1$ vector of responses, where $T_{i}$ is the number of observations for the ith individual. The i subscript permits each individual to be measured at possibly different times and allows for missing data. Elements of the $T_{i} \times p$ matrix, $Λ_{i}$ , are factor loadings, where p is the number of growth factors and the loadings are commonly, but not always (Meredith & Tisak, 1990), specified a priori to fit a specific type of growth trajectory¹; $η_{i}$ is a $p \times 1$ vector of individual-specific intercept and linear slope growth factor scores for individual i; and $ε_{i}$ is the $T_{i} \times 1$ vector of time-specific residuals that capture the misfit between an individual’s data and the fitted linear function. These residuals are assumed to be normally distributed with zero mean vector and covariance matrix, $Θ_{i}$ [i.e., $ε_{i} \sim N_{T_{i}} (0, Θ_{i})$ ], where $Θ_{i}$ depends on i only through its dimension, although this assumption can be relaxed (see, e.g., McNeish & Harring, 2020).

In Equation 2, individual linear growth factors are decomposed into the $p \times 1$ vector of growth factor means, α (conditional on $x_{i}$ ), a $p \times r$ matrix of time-invariant coefficients in $Γ$ capturing the effects of r time-invariant covariates in $x_{i}$ , and the $p \times 1$ vector of random effects, $ζ_{i}$ . It is assumed the random effects come from a p-dimensional normal distribution, $ζ_{i} \sim N_{p} (0, Ψ)$ , where $Ψ$ is a $p \times p$ matrix of the variances and covariances of the growth factors. The random effects, $ζ_{i}$ , are assumed to be uncorrelated with the time-specific residuals $ε_{i}$ (i.e., $cov (ζ_{i}, ε_{i}^{'}) = 0$ ). Given these distributional assumptions, the model implied mean $μ_{i}$ and covariance $Σ_{i}$ of the repeated measures are

E [y_{i}] = μ_{i} = Λ_{i} (α + Γ κ),

var [y_{i}] = Σ_{i} = Λ_{i} (Γ Φ Γ^{'} + Ψ) Λ_{i}^{'} + Θ_{i},

where $κ$ and $Φ$ are the mean vector and covariance matrix of the static covariates, $x_{i}$ , respectively.
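As a numerical illustration of Equations 3 and 4, the following minimal sketch computes the model-implied mean vector and covariance matrix for a linear LGM with four occasions. The time scores (0, 1, 2, 3), the single covariate, and the values of Γ, κ, and Φ are assumptions chosen only for illustration; the growth factor and residual values mirror those reported later in Table 1.

```python
import numpy as np

# Assumed time scores 0..3 for four occasions (intercept and linear slope).
Lambda = np.column_stack([np.ones(4), np.arange(4.0)])   # T x p, here 4 x 2

alpha = np.array([48.0, 3.0])          # growth factor means (intercept, slope)
Psi = np.array([[18.0, 1.2],
                [1.2, 2.0]])           # growth factor covariance matrix
Theta = 15.0 * np.eye(4)               # time-specific residual covariance

# One hypothetical time-invariant covariate (r = 1); values are illustrative.
Gamma = np.array([[0.5], [0.1]])       # p x r regression coefficients
kappa = np.array([0.0])                # covariate mean
Phi = np.array([[1.0]])                # covariate variance


def implied_moments(Lambda, alpha, Gamma, kappa, Phi, Psi, Theta):
    """Return (mu, Sigma) following Equations 3 and 4."""
    mu = Lambda @ (alpha + Gamma @ kappa)
    Sigma = Lambda @ (Gamma @ Phi @ Gamma.T + Psi) @ Lambda.T + Theta
    return mu, Sigma


mu, Sigma = implied_moments(Lambda, alpha, Gamma, kappa, Phi, Psi, Theta)
print(mu)       # implied means at the four occasions
print(Sigma)    # implied 4 x 4 covariance matrix
```

Because the covariate mean is set to zero here, the implied means reduce to the intercept plus the slope times the time score.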

Multiple group LGMs

Populations often comprise heterogeneous subpopulations that may follow different growth trajectories. These subpopulations can sometimes be identified by discrete covariates that are manifested in the population (e.g., gender, categorized age groups). In this case, parameters can be estimated separately for each group with what is referred to as a multiple-group model (e.g., Muthén & Curran, 1997), where Equations 1–4 would be augmented with a g ( $g = 1, \dots, G$ ) subscript denoting to which group a set of parameters summarizing growth belongs. Multiple group models account for population heterogeneity only when the grouping variable is a measured variable in the dataset. More often, however, heterogeneous subpopulations exist but group membership is latent and not known a priori (Sterba, 2014), which leads to the use of finite mixture models (McLachlan & Peel, 2000) to classify individuals into trajectory types.

Growth Mixture Models

GMMs are a generalization of the multiple group modeling framework, in which group membership is latent and must be inferred via the characteristics of within-individual profiles (Muthén, 2001; Muthén & Shedden, 1999; Nagin, 1999). Instead of a known value for group membership, each observation receives a probability of membership in each of the estimated latent classes. Assuming multivariate normality, the composite probability density of a vector of continuous outcome variables, $y_{i}$ , for the ith individual can be expressed as

f (y_{i} | π, θ) = \sum_{k = 1}^{K} π_{k} f_{k} (y_{i} | θ_{k}),

where K represents the number of latent classes specified by the researcher (indexed by $k = 1, \dots, K$ ), $f_{k}$ denotes the marginal density for class k, $θ_{k}$ is a parameter vector comprising the growth parameters in class k, and $π_{k}$ represents the mixing proportion of the sample belonging to class k, with the constraints that $0 < π_{k} < 1$ and $\sum_{k = 1}^{K} π_{k} = 1$ , so that $π_{K} = 1 - \sum_{j = 1}^{K - 1} π_{j}$ . Given class k, the LGM in Equations 1 and 2 can be extended in a straightforward manner to accommodate the inclusion of latent classes

y_{i} | k = Λ_{i} η_{i} + ε_{i},

η_{i} | k = α_{k} + Γ_{k} x_{i} + ζ_{i},

with the assumptions that the residuals and random effects for individual i follow distinct multivariate normal distributions, $ε_{i} | k \sim N_{T_{i}} (0, Θ_{i k})$ and $ζ_{i} | k \sim N_{p} (0, Ψ_{k})$ , respectively. With these distributional assumptions, the model-implied mean vector and model-implied covariance matrix from Equations 6 and 7 can be written as

μ_{i} | k = Λ_{i} (α_{k} + Γ_{k} κ),

Σ_{i} | k = Λ_{i} (Γ_{k} Φ Γ_{k}^{'} + Ψ_{k}) Λ_{i}^{'} + Θ_{i k} .
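To make Equations 5, 8, and 9 concrete, the sketch below evaluates the mixture density and the resulting posterior class probabilities for a single complete response vector. It assumes two classes, time scores 0 to 3, no covariates (so that $Γ_{k} = 0$ ), equal mixing proportions, and a made-up response vector; the function names are ours and purely illustrative.

```python
import numpy as np


def mvn_logpdf(y, mu, Sigma):
    """Log density of a multivariate normal, computed via a Cholesky factor."""
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, y - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + log_det + z @ z)


def mixture_density(y, pis, mus, Sigmas):
    """Equation 5: weighted sum of class-specific normal densities."""
    return sum(pi * np.exp(mvn_logpdf(y, mu, S))
               for pi, mu, S in zip(pis, mus, Sigmas))


def posterior_probs(y, pis, mus, Sigmas):
    """Posterior class membership probabilities for one individual."""
    log_w = np.array([np.log(pi) + mvn_logpdf(y, mu, S)
                      for pi, mu, S in zip(pis, mus, Sigmas)])
    log_w -= log_w.max()                 # guard against numerical underflow
    w = np.exp(log_w)
    return w / w.sum()


# Illustrative two-class setup (assumed time scores 0..3, no covariates).
Lambda = np.column_stack([np.ones(4), np.arange(4.0)])
Psi = np.array([[18.0, 1.2], [1.2, 2.0]])
Theta = 15.0 * np.eye(4)
alphas = [np.array([48.0, 3.0]), np.array([40.82, 4.0])]
mus = [Lambda @ a for a in alphas]                          # Equation 8
Sigmas = [Lambda @ Psi @ Lambda.T + Theta for _ in alphas]  # Equation 9
pis = [0.5, 0.5]

y = np.array([47.0, 52.0, 53.0, 58.0])   # a made-up response vector
print(mixture_density(y, pis, mus, Sigmas))
print(posterior_probs(y, pis, mus, Sigmas))
```

Working on the log scale before normalizing is a design choice that keeps the posterior probabilities stable when the class densities are very small.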

In sum, GMMs are a natural extension of LGMs that utilize a discrete latent variable with a specific number of categories permitting clusters of longitudinal change to be captured by distinctly parameterized growth trajectories. As McNeish and Harring (2020) point out, this discrete latent variable serves as a moderator for the whole (LGM) model, allowing parameter estimates to differ for the different categories of the discrete latent variable. Whether these latent classes are theorized a priori or whether they emerge from an exploration of the data, fitting GMMs to many longitudinal datasets can be severely hampered by the real possibility of the presence of missing data. This is discussed next.

Missing Data Mechanisms

Rubin (1976) regarded missing data as random variables with a probability distribution. Missing data were classified as having one of three mechanisms: MAR, missing completely at random (MCAR), and missing not at random (MNAR). Briefly, for data that are MAR, the missingness is independent of the variable with missingness, but dependent on other observed variables. That is, if I represents the set of indicators of missing data in a dataset, then MAR may be expressed as $Pr (I | Y_{comp}) = Pr (I | Y_{obs})$ , where $Y_{comp} = (Y_{obs}, Y_{miss})$ is the complete data consisting of observed and missing data. MCAR is the mechanism in which the missingness is independent of both the missing and observed variables. This type of missingness can be regarded as unsystematic, which amounts to $Pr (I | Y_{comp}) = Pr (I)$ . The third mechanism, MNAR, is often called nonignorable and is thought to be operating when the missingness is dependent on the variable with missing data. That is, $Pr (I | Y_{comp}) = Pr (I | Y_{obs}, Y_{miss})$ cannot be simplified further. Rubin (1976) called MCAR and MAR missingness ignorable because likelihood-based estimation methods produce unbiased estimates even if the model for the missingness is ignored, which has to do with how the missing data can be integrated out of the overall likelihood.

Extended Ignorability

Harel (2003) and Harel (2009) extended the meaning of ignorability by decomposing the missing data into different types. If the missing data, $Y_{miss}$ , is regarded as the decomposition of two types of missing data— $Y_{miss} = (Y_{miss}^{A}, Y_{miss}^{B})$ —and the missing data indicator $I^{+}$ represents the missingness indicator matrix for the two types of missing data, then the missingness is regarded to be MAR $^{+}$ if

Pr (I^{+} | Y_{obs}, Y_{miss}^{A}, Y_{miss}^{B}) = Pr (I^{+} | Y_{obs}) .

According to Rubin (1987), under the conventional MAR assumptions, imputations for any missing data can be drawn from $Pr (Y_{miss} | Y_{obs})$ , meaning that the missingness pattern, I , can be ignored. Analogously, Harel (2003, p. 37) showed that $I^{+}$ can be ignored, and the following relations can be established when MAR $^{+}$ holds:

Pr (Y_{miss}^{A} | Y_{obs}, I^{+}) = Pr (Y_{miss}^{A} | Y_{obs}),

and

Pr (Y_{miss}^{B} | Y_{obs}, Y_{miss}^{A}, I^{+}) = Pr (Y_{miss}^{B} | Y_{obs}, Y_{miss}^{A}) .

A weaker ignorability condition based on these relations, and one that is pertinent to GMMs, where the missingness in the measured variables might be conditioned on the completely unobserved latent class variable (e.g., missing measured variable, $Y_{miss}^{B}$ , is dependent on completely unobserved latent class variable, $Y_{miss}^{A}$ ), is the conditional extended ignorability condition. Harel (2003) defined this conditional extended ignorability assumption, or CMAR $^{+}$ , as the case in which $I^{+}$ is conditional on the observed data and the other group of missing data. That is

Pr (I^{+} | Y_{obs}, Y_{miss}^{A}, Y_{miss}^{B}) = Pr (I^{+} | Y_{obs}, Y_{miss}^{A}) .

If the CMAR $^{+}$ condition can be assumed, then the following relation is also true:

Pr (Y_{miss}^{B} | Y_{obs}, Y_{miss}^{A}, I^{+}) = Pr (Y_{miss}^{B} | Y_{obs}, Y_{miss}^{A}),

which parallels the result, under Rubin’s original assumption of MAR, that imputations can be drawn from $Pr (Y_{miss} | Y_{obs})$ . Under CMAR $^{+}$ , imputations for a second set of missing data can be drawn from $Pr (Y_{miss}^{B} | Y_{obs}, Y_{miss}^{A})$ as long as the mechanism that produced the missingness in $Y_{miss}^{B}$ is not related to $Y_{miss}^{B}$ itself. For example, suppose $Y_{miss}^{A}$ represents the missing class indicators (completely unobserved) and $Y_{miss}^{B}$ represents the missing values of the measured variables. If the mechanism that created $Y_{miss}^{B}$ does not depend on $Y_{miss}^{A}$ or $Y_{miss}^{B}$ , then the missingness is considered MAR $^{+}$ . If the mechanism that created $Y_{miss}^{B}$ depends only on $Y_{miss}^{A}$ and possibly other observed measured variables, a possibility in GMMs where the missingness can depend on the latent class, then this constitutes CMAR $^{+}$ according to Harel (2003, p. 51).

Several options are available for addressing missing data under the four assumptions of missingness just reviewed. These include listwise deletion (LD), FI, MI methods, and Bayesian estimation using MCMC. These are discussed in the context of previous research on finite mixture models followed by the research questions driving the simulation study.

Approaches for Addressing Missing Data

Like finite mixture models in other contexts, GMMs may be estimated using various approaches, but ML estimation via the expectation–maximization (EM) algorithm (Harring, 2012; Muthén & Shedden, 1999) and a Bayesian framework using MCMC methods (e.g., Depaoli, 2014; Kohli et al., 2015; Lock et al., 2018) seem to stand out in the literature. These two methods are popular because they are able to handle the latent, unobserved parameters that are part of the GMM, as well as datasets that contain missing observations. Both can estimate parameters even with missing data, albeit under strict assumptions about the missingness. These methods are discussed briefly next.

FI via the EM Algorithm

Aside from being relied on for estimating parameters, FI estimation is widely accepted as a way to deal with missing data (Dempster et al., 1977). In general, the essence of FI estimation is to use each individual’s available data to find the parameters across individuals that maximize the likelihood function. As long as the missingness can be assumed to be MAR, the ML estimates can be found using the marginal probability of the individuals’ observed responses. In other words, FI uses any and all of the available data for each individual, without the need to alter any of the data, for example, by deletion or imputation. The set of parameter estimates that maximizes the likelihood is obtained through the EM algorithm (McLachlan & Peel, 2000). The EM algorithm alternates between an E-step and an M-step to arrive at a solution that maximizes such a likelihood function with missing information. Dempster et al. (1977) demonstrated how this likelihood could be treated as similar to other missing data problems. Essentially, the likelihood can be rewritten with the assumption that the class parameter is observed and the maximization of the log-likelihood becomes conditional only on the observed data and candidates of these “observed” parameters. The E-step amounts to computing a posterior probability for each individual evaluated at current parameter values, and during the M-step, posterior probabilities from the E-step are used to maximize the conditional expectations by replacing the unknown class indicators as well as computing other model parameters. Then, these new parameters are carried over to the next iteration, where the back-and-forth mechanism continues until parameters and class proportions that maximize the likelihood are found based on certain convergence criteria (e.g., the difference in log-likelihood values between successive iterations is negligible).
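The way FI uses only the available data in the E-step can be sketched as follows. This is a minimal illustration under assumptions, not the routine used in the study: missing occasions are coded as NaN, each individual's class-specific likelihood is evaluated on the observed entries of the mean vector and covariance matrix only, and the M-step is omitted. All function and variable names are hypothetical.

```python
import numpy as np


def observed_logpdf(y, mu, Sigma):
    """Normal log density using only the observed (non-NaN) entries of y."""
    obs = ~np.isnan(y)
    if not obs.any():
        return 0.0                        # no observed data: no likelihood contribution
    d = y[obs] - mu[obs]
    S = Sigma[np.ix_(obs, obs)]           # rows/columns for the observed occasions
    L = np.linalg.cholesky(S)
    z = np.linalg.solve(L, d)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (obs.sum() * np.log(2.0 * np.pi) + log_det + z @ z)


def e_step(Y, pis, mus, Sigmas):
    """Return posterior class probabilities (n x K) and the observed-data log-likelihood."""
    log_w = np.array([[np.log(pi) + observed_logpdf(y, mu, S)
                       for pi, mu, S in zip(pis, mus, Sigmas)] for y in Y])
    row_max = log_w.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(np.exp(log_w - row_max).sum(axis=1, keepdims=True))
    post = np.exp(log_w - log_norm)
    return post, float(log_norm.sum())


# Illustrative two-class setup (assumed time scores 0..3, no covariates).
Lambda = np.column_stack([np.ones(4), np.arange(4.0)])
Psi = np.array([[18.0, 1.2], [1.2, 2.0]])
Theta = 15.0 * np.eye(4)
mus = [Lambda @ np.array([48.0, 3.0]), Lambda @ np.array([40.82, 4.0])]
Sigmas = [Lambda @ Psi @ Lambda.T + Theta] * 2
pis = [0.5, 0.5]

# Made-up data: one intermittent gap, one dropout after wave 1, one empty record.
Y = np.array([[47.0, 52.0, np.nan, 58.0],
              [41.0, np.nan, np.nan, np.nan],
              [np.nan, np.nan, np.nan, np.nan]])
post, loglik = e_step(Y, pis, mus, Sigmas)
print(post)
print(loglik)
```

An individual with no observed occasions simply receives the current mixing proportions as posterior probabilities, which shows why such records carry no information about class membership.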

FB via MCMC

An FB approach using an MCMC simulation technique is another possible approach for obtaining parameter estimates for the GMM (Asparouhov & Muthén, 2010; Gelman et al., 1995), and it can also be implemented to handle missing data. Although a number of nuanced MCMC methods have been developed, the Gibbs sampler is one of the more widely used (Geman & Geman, 1984; Muthén et al., 2010). Within the Gibbs sampler, a sequence of values for the unknown parameters, latent variables, and missing observations is iteratively obtained in order to create posterior distributions for the unknown parameters and missing data. This is done by conditioning on the observed data and prior information and iteratively updating one parameter or missing value at a time. Parameters are continuously updated by conditioning not only on the priors and observed data but also on the values from the previous iteration. See Asparouhov and Muthén (2010) for a more comprehensive, step-by-step guide to the mechanics underlying the Gibbs sampler.
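The alternation between imputing missing values and updating parameters can be illustrated with a deliberately simplified Gibbs sampler. The sketch below is not the GMM sampler used in the study; it assumes a univariate normal model with known standard deviation and a flat prior on the mean, and it treats `None` entries as missing. All names and values are illustrative.

```python
import random
import statistics


def gibbs_normal_mean(y, sigma=1.0, n_iter=5000, burn_in=1000, seed=1):
    """Gibbs sampler for a normal mean with known sigma and a flat prior.

    Entries of y equal to None are treated as missing and redrawn each iteration.
    """
    rng = random.Random(seed)
    n = len(y)
    observed = [v for v in y if v is not None]
    mu = statistics.fmean(observed)          # starting value
    draws = []
    for it in range(n_iter):
        # Step 1: impute each missing value from its full conditional, N(mu, sigma^2).
        completed = [v if v is not None else rng.gauss(mu, sigma) for v in y]
        # Step 2: draw mu from its full conditional given the completed data.
        mu = rng.gauss(statistics.fmean(completed), sigma / n ** 0.5)
        if it >= burn_in:
            draws.append(mu)
    return draws


# Made-up data: 100 values with roughly 30% set to missing completely at random.
data_rng = random.Random(0)
y = [data_rng.gauss(50.0, 1.0) for _ in range(100)]
y = [None if data_rng.random() < 0.3 else v for v in y]

draws = gibbs_normal_mean(y)
print(statistics.fmean(draws))               # posterior mean of mu
print(statistics.stdev(draws))               # posterior SD of mu
```

In this toy model the retained draws center on the mean of the observed values, since the imputed values add no information beyond what the observed data already provide.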

Multiple Imputation

As a way to retain the advantages of single imputation methods and model the sources of uncertainty mentioned before, Rubin (1987) recommended using MI. MI fills in values for the missing data in three general steps. First, multiple complete datasets are created by the imputation of plausible values in the cells where missing data are present. Then, the multiple complete datasets are analyzed. Finally, the results from the multiple analyses are aggregated. The underrepresented uncertainty of parameter estimates is addressed by creating several versions of complete data that produce multiple parameter estimates. These estimates are then averaged to produce a final parameter estimate. Special rules derived by Rubin (1987) are applied in the aggregation stage of SEs to account for between-imputation variance and within-imputation variance.
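The aggregation step can be sketched with Rubin's (1987) rules for a single scalar parameter. The function name and the example numbers below are illustrative only.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared SEs with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    w = statistics.fmean(variances)          # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (m - 1 divisor)
    t = w + (1.0 + 1.0 / m) * b              # total variance
    return q_bar, t ** 0.5


# Made-up estimates of one parameter from m = 3 imputed datasets.
estimate, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, se)
```

The factor $(1 + 1 / m)$ inflates the between-imputation variance to reflect that only a finite number of imputations was used.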

Filling the missing data cells in the first step is typically the most crucial part of the process. Several imputation methods have been suggested, but in general, these methods can be classified into two frameworks: joint modeling (Rubin, 1987) and fully conditional specification (van Buuren et al., 2006). While these two approaches differ in how they impute the missing data, they are both grounded in Bayesian methods, which entail three elements: (a) specifying probability distribution functions (PDFs) for the parameters of interest [typically referred to as the prior distributions, or simply prior(s)]; (b) specifying a likelihood function, that is, a conditional PDF of the data given the parameters, which uses the data to provide evidence about the parameters; and (c) creating a posterior distribution from the prior(s) and likelihood function to describe the distribution of the parameters in light of the data. The three elements are brought together by Bayes’s theorem, which expresses the posterior distribution as the product of the prior distribution and the likelihood function divided by the marginal distribution of the data.

Two-Stage MI

Another option for addressing the missing data problem in GMMs is the 2M approach introduced by Harel (2003). The 2M method is feasible on the assumption of extended ignorability. In the 2M method, m sets of $Y_{miss}^{A}$ , or class indicators, are first generated and retained from an MCMC chain in the I-step. That is

Y_{miss}^{A (j)} \sim Pr (Y_{miss}^{A} | Y_{obs}),

where $j = 1, 2, ..., m$ . Then, for each of the m sets of latent classes, treating the latent classes as fixed and known, $n - 1$ sets of $Y_{miss}^{B}$ are generated. That is

Y_{miss}^{B (j, k)} \sim Pr (Y_{miss}^{B} | Y_{obs}, Y_{miss}^{A (j)}),

where $k = 1, 2, ..., n$ . Here, there are $n - 1$ sets because the first set is generated from the first stage of imputations. This will result in a total of $m n$ complete datasets that are to be analyzed.

After analysis, the resulting $m n$ parameter estimates are averaged to obtain the final estimates and SEs are combined using a nuanced version of Rubin’s rules derived by Shen (2000). Analogous to Rubin’s rules, Shen’s rules formulate the resulting SEs to be a function of the within-imputation variability, the between-imputation variability, and the complete-data variance.
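A minimal sketch of the nested combining rules is given below, using the form commonly stated in the two-stage MI literature: total variance equals the average complete-data variance plus $(1 + 1 / m)$ times the between-nest variance plus $(1 - 1 / n)$ times the within-nest variance. Whether this matches the exact implementation used in the study is an assumption, and the function name and example values are illustrative.

```python
import statistics


def pool_nested(estimates, variances):
    """Pool an m x n nested set of estimates and squared SEs.

    estimates[j][k] is the estimate from imputation k within nest j,
    and variances[j][k] is its squared standard error.
    """
    m, n = len(estimates), len(estimates[0])
    nest_means = [statistics.fmean(row) for row in estimates]
    q_bar = statistics.fmean(nest_means)                      # overall point estimate
    u_bar = statistics.fmean([v for row in variances for v in row])  # complete-data variance
    # Between-nest variance of the nest means.
    b = sum((qj - q_bar) ** 2 for qj in nest_means) / (m - 1)
    # Within-nest variance of estimates around their nest mean.
    w = sum((q - qj) ** 2
            for row, qj in zip(estimates, nest_means)
            for q in row) / (m * (n - 1))
    t = u_bar + (1.0 + 1.0 / m) * b + (1.0 - 1.0 / n) * w
    return q_bar, t ** 0.5


# Made-up example with m = 2 nests and n = 2 imputations per nest.
estimate, se = pool_nested([[1.0, 2.0], [3.0, 4.0]],
                           [[1.0, 1.0], [1.0, 1.0]])
print(estimate, se)
```

With a single imputation per nest the within-nest term is undefined, so the sketch presumes $n \geq 2$ and $m \geq 2$.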

The 2M method has been mainly applied in situations where missing data can be thought of as being of distinct types of missingness, such as surveys containing planned and unplanned missing data (Graham et al., 2001; Rhemtulla & Hancock, 2016), longitudinal models with ignorable intermittent missing data points (Hedeker & Gibbons, 1997), and nonignorable dropout missing (Harel, 2003). The method has also been applied with latent variable models that involve analysis with missing data and unobserved latent variables, such as latent class regression models (Harel et al., 2013) and latent class contingency tables (Winship et al., 2002).

Although empirical studies have used the 2M method to address missing data, no studies to date have tested the use of the method through simulation, let alone on different conditions of missingness or GMMs. Thus, while the method has been theoretically justified and applied in several different scenarios, the extent to which the method will work and how well it will perform relative to other missing data handling methods, such as FI, MI, or FB methods remains ambiguous.

Purpose of the Study

The purpose of this study is to extend the use of the several existing methods presented to address the issue of missing data in the context of GMMs. A simulation study will be conducted to compare these methods, including FI, FB using Gibbs sampling, and the 2M approach. In typical GMM simulation studies, varying SSs, number of latent classes, and latent class separation are among the conditions explored. Other important considerations that have been the focus of studies on GMMs such as class enumeration and fit evaluation criteria (see, e.g., Bauer, 2007; Bauer & Curran, 2003) will not be considered in order to focus on the effects of missingness on parameter estimation. In typical missing data studies, varying rates of missing data and the missingness mechanism are common conditions of interest, and both will be explored in this study. Specifically, the current research will compare the performance of LD, FI, MI, 2M, and an FB approach for estimating GMMs when there are missing data at various occasions under varying conditions common in the GMM literature as well as different missing data conditions.

Specifically, the study will attempt to address the following.

When the goal is to classify individuals into groups, for which method will the growth model provide the highest percentage of classification accuracy for varying rates of missing data, SS, class separation, and missingness mechanism?

For what rate of missing data and class separations will the GMM produce biased parameter estimates when utilizing the different missing data handling methods under conditions of MAR (or CMAR $^{+}$ ) missingness and MCAR missingness?

Which missing data handling methods produce SE estimates that are most accurate?

Are there any advantages to using any of the missing data handling methods in the context of GMMs? Particularly, is using a more complicated method worth the extra effort for producing unbiased parameter estimates?

To answer these questions, a Monte Carlo simulation study was conducted using the conditions that were mentioned previously.

Method

Simulation

A Monte Carlo simulation was used to compare LD, FI, MI, FB, and 2M for handling missing data in the context of GMMs. The study involved testing conditions that are known to impact the recovery of the population parameters. These conditions include missingness mechanism, missing rate (MR), SS, and class separation.

Data generation

Data for four timepoints were generated from a two-class linear growth model with random intercepts and slopes corresponding to Equations 6 and 7. A two-group factor analytic model was parameterized such that the loadings were fixed to model linear growth with four timepoints. The number of classes was chosen and kept fixed in order to keep the study focused on the missing data handling issue. The motivation for the number of timepoints came from recommendations by Muthén and Curran (1997), as well as past studies justifying this number (see, e.g., Depaoli, 2013; Tueller & Lubke, 2010). Data sets consisting of SSs of 100, 200, 500, and 1000 with an equal number of observations in each of the two classes were used and chosen based on these previous studies.
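The data-generating process just described can be sketched as follows. This is an illustration under assumptions, not the generation code used in the study: the time scores are taken to be 0, 1, 2, 3, the parameter values are the high-separation values from Table 1, and the function name and seed are hypothetical.

```python
import numpy as np


def simulate_two_class(n_per_class, alphas, Psi, theta_var, times, seed=0):
    """Draw repeated measures from a two-class linear growth model."""
    rng = np.random.default_rng(seed)
    Lambda = np.column_stack([np.ones(len(times)), times])    # fixed linear loadings
    ys, classes = [], []
    for k, alpha in enumerate(alphas):
        # Individual intercepts and slopes for class k.
        eta = rng.multivariate_normal(alpha, Psi, size=n_per_class)
        # Independent time-specific residuals with common variance.
        eps = rng.normal(0.0, np.sqrt(theta_var), size=(n_per_class, len(times)))
        ys.append(eta @ Lambda.T + eps)
        classes.append(np.full(n_per_class, k))
    return np.vstack(ys), np.concatenate(classes)


alphas = [np.array([48.0, 3.0]), np.array([40.82, 4.0])]      # Table 1, high separation
Psi = np.array([[18.0, 1.2], [1.2, 2.0]])
times = np.arange(4.0)                                        # assumed time scores

# SS of 500 with an equal number of observations in each class.
Y, z = simulate_two_class(250, alphas, Psi, theta_var=15.0, times=times)
print(Y.shape)
print(Y[z == 0].mean(axis=0), Y[z == 1].mean(axis=0))
```

The class labels returned alongside the data would be discarded before model fitting and retained only for evaluating classification accuracy.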

The means of the latent growth factors for the two classes were manipulated to achieve varying degrees of class separation using the Mahalanobis distance (MD), a measure often used to quantify the separation between two multivariate distributions. The MD can be defined as follows:

MD = [(μ^{(1)} - μ^{(2)})^{'} Σ^{- 1} (μ^{(1)} - μ^{(2)})]^{1 / 2},

where the $μ$ terms represent the means of the growth factors (intercept means and slope means of the two classes) and $Σ^{- 1}$ represents the inverse of the common variance/covariance matrix of the growth parameters for all classes. The MD was manipulated by varying the values of the population latent growth intercept and slope means for each class. In accordance with previous studies, MDs of 0.5, 1.0, 1.5, and 2.0 were tested to represent very poor, poor, moderate, and high separation, respectively (see, e.g., Lubke & Neale, 2006). The class 1 parameters were fixed based on parameters found in a study by Kaplan (2002) that investigated the trajectories of reading development among kindergartners and first graders, similar to the parameters used by Depaoli (2013). Growth parameter variances and covariances, as well as time-specific residual variances, were held fixed. Population values that were used for the data generation for each distance condition are presented in Table 1.

Table 1.

Data Generating Parameters for Each Distance Condition

Parameter Means	High Separation (MD $\approx 2.0$ )	Moderate Separation (MD $\approx 1.5$ )	Poor Separation (MD $\approx 1.0$ )	Very Poor Separation (MD $\approx 0.5$ )
Class 1 Intercept	48.00	48.00	48.00	48.00
Class 1 Slope	3.00	3.00	3.00	3.00
Class 2 Intercept	40.82	43.10	45.66	47.34
Class 2 Slope	4.00	4.00	4.00	3.60
Growth parameter variances and covariances: $Ψ = \begin{bmatrix} 18.00 & \\ 1.20 & 2.00 \end{bmatrix}$ (lower triangle shown). Time-specific residuals: $Θ = 15 I_{4}$ .

Note. MD = Mahalanobis distance.
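As a check on the separation conditions, the MD formula can be applied directly to the growth factor means and the common covariance matrix in Table 1. The sketch below writes out the inverse of the 2 x 2 matrix by hand; the function name is illustrative.

```python
import math

PSI = ((18.0, 1.2), (1.2, 2.0))   # common growth factor covariance matrix (Table 1)


def mahalanobis(mu1, mu2, cov=PSI):
    """Mahalanobis distance between two 2-vectors for a 2 x 2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    d0, d1 = mu1[0] - mu2[0], mu1[1] - mu2[1]
    # Quadratic form diff' * inverse(cov) * diff, with the 2 x 2 inverse written out.
    quad = (d * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det
    return math.sqrt(quad)


class1 = (48.00, 3.00)
class2 = {"high": (40.82, 4.00), "moderate": (43.10, 4.00),
          "poor": (45.66, 4.00), "very poor": (47.34, 3.60)}

for label, mu2 in class2.items():
    print(f"{label:>9}: MD = {mahalanobis(class1, mu2):.2f}")
```

The computed distances fall within about 0.01 of the nominal values of 2.0, 1.5, 1.0, and 0.5 given in the column headers.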

To simulate an MAR missingness mechanism, or the CMAR $^{+}$ , the probability that individual i had data point $y_{i t}$ at time t missing was calculated based on a logistic function, $Pr (R_{i t} = 1) = \frac{exp (m)}{1 + exp (m)}$ , where $R_{i t}$ is the missingness indicator, such that when $R_{i t} = 1$ , individual i’s observation at time t is missing. Also, $m = β Z_{i} + c$ , where $Z_{i}$ is the unobserved class, $β$ is the coefficient used to control the relation between class and the propensity to be missing at timepoint 1, and c controls the rate of missingness. The coefficients presented in Table 2 were used to create the desired rates of missingness and correlation with the class variable.

Table 2.

Coefficients for Creating Missing Data

Missing Rate (%)	$ρ$	$β$	c
5	.25	4.0	2.0
10	.34	4.0	1.3
20	.36	2.0	0.5
30	.37	1.7	−0.1
40	.36	1.5	−0.6

Note. The correlation term $ρ$ here is an approximation of the point-biserial correlation typically used to quantify the relationship between a continuous variable (missingness at timepoint 1) and a dichotomous variable (class).
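
The class-dependent mechanism can be sketched in a few lines. This is a minimal illustration of drawing the indicator $R_{it}$ from the logistic function of $m = β Z_{i} + c$. The 0/1 coding of $Z_{i}$, the equal class proportions, and the coefficient values are all assumptions made for the example, so the resulting rate is not meant to reproduce any row of Table 2.

```python
import math
import random

def logistic(m):
    """Inverse logit: exp(m) / (1 + exp(m))."""
    return 1.0 / (1.0 + math.exp(-m))

def draw_indicator(z, beta, c, rng):
    """Draw R_it for one person: 1 with probability logistic(beta * z + c), else 0."""
    return 1 if rng.random() < logistic(beta * z + c) else 0

# Hypothetical coefficients for illustration only (not taken from Table 2).
BETA, C = 1.0, -2.0
rng = random.Random(2023)

# Assumed class coding: Z_i in {0, 1} with equal class proportions.
classes = [rng.randint(0, 1) for _ in range(20000)]
r = [draw_indicator(z, BETA, C, rng) for z in classes]

# Rate implied by the two class-specific probabilities under equal proportions.
expected = 0.5 * logistic(C) + 0.5 * logistic(BETA + C)
observed = sum(r) / len(r)
print(f"implied rate {expected:.3f}, simulated rate {observed:.3f}")
```

Because the probability depends on the latent class rather than on observed data, the indicator is related to class membership whenever $β \neq 0$.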

To create MCAR missingness, for each outcome variable (the outcome at each timepoint), a value drawn from a uniform distribution on 0 to 1 was compared against a specified threshold chosen to produce the desired overall rate of missingness. In this way, the missingness was completely haphazard and adhered to the definition of MCAR. Missing rates of 5%, 10%, 20%, 30%, and 40% were investigated for both missingness conditions.
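
The threshold comparison for MCAR can be illustrated as follows. This is a sketch only: the placeholder data, the seed, and the function name are hypothetical, and four timepoints are assumed from the $4 \times 4$ residual matrix in Table 1.

```python
import random

def apply_mcar(rows, rate, rng):
    """Return a copy of `rows` in which each value is set to None with probability `rate`."""
    # A uniform draw below the threshold marks the value as missing.
    return [[None if rng.random() < rate else y for y in row] for row in rows]

rng = random.Random(7)

# Hypothetical complete data: 5000 people by 4 timepoints (placeholder values).
complete = [[float(t) for t in range(4)] for _ in range(5000)]
incomplete = apply_mcar(complete, rate=0.20, rng=rng)

n_missing = sum(value is None for row in incomplete for value in row)
missing_rate = n_missing / (len(complete) * 4)
print(f"overall missing rate: {missing_rate:.3f}")
```

Each value is deleted independently of class, timepoint, and outcome, which is what makes the mechanism MCAR.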

Data analysis and other common simulation issues

After creating data with the different types of missingness, the five missing data handling approaches (LD, FI, MI, 2M, and FB) were used to analyze each dataset. These methods were implemented using Mplus Version 8.1 (Muthén & Muthén, 1998–2017) and R (R Development Core Team, 2008). A total of 800 conditions were replicated 100 times. The number of replications was chosen as the minimum number needed for the estimates of the means and variances to converge in the cell with the least favorable combination of conditions (40% missing, LD, an SS of 100, and very poor separation).
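
The count of 800 conditions is consistent with fully crossing the five manipulated factors. The enumeration below is a sketch that assumes a full factorial design with the factor levels reported in the tables and text of this article; the variable names are illustrative.

```python
from itertools import product

METHODS = ["LD", "FI", "MI", "2M", "FB"]
SAMPLE_SIZES = [100, 200, 500, 1000]
SEPARATIONS = [0.5, 1.0, 1.5, 2.0]      # Mahalanobis distances
MECHANISMS = ["MCAR", "CMAR+"]
MISSING_RATES = [5, 10, 20, 30, 40]     # percent

# One tuple per cell of the assumed fully crossed design.
cells = list(product(METHODS, SAMPLE_SIZES, SEPARATIONS, MECHANISMS, MISSING_RATES))
print(len(cells))  # 5 * 4 * 4 * 2 * 5 = 800
```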

Convergence

To avoid local maxima solutions, the final likelihoods were replicated by varying the number of starting values and setting a maximum number of replicated likelihoods, as per Tueller and Lubke (2010). For the current study, 100 replications with perturbed starting values were conducted, and a minimum criterion of 10 optimized likelihoods was used as a sign of convergence to the global maximum. For the Bayesian methods, convergence was assessed using the proportional scale reduction (PSR) factor, where a PSR of 1 with a tolerance of $\pm 0.5$ was considered converged. For both methods, replications that produced negative variance estimates, extremely large SEs, and/or SEs equal to zero were not counted as a replication. The final convergence rate for any given cell was the total number of fully converged data sets divided by the total number of replications that were needed to reach the desired number of 100 replications.
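
The two likelihood-based criteria can be expressed compactly. The sketch below reads the criterion as requiring that the best final log-likelihood be reproduced by at least 10 of the perturbed start sets. The matching tolerance, the example log-likelihoods, and the attempt count are hypothetical.

```python
def replicated_best_loglik(logliks, min_replicates=10, tol=1e-2):
    """True if the best final log-likelihood was reached by at least `min_replicates` start sets."""
    best = max(logliks)
    # `tol` is an assumed tolerance for treating two log-likelihoods as equal.
    return sum(1 for ll in logliks if best - ll <= tol) >= min_replicates

def convergence_rate(n_converged, n_attempted):
    """Converged data sets divided by all data sets that had to be generated."""
    return n_converged / n_attempted

# Hypothetical final log-likelihoods from 100 perturbed start sets.
logliks = [-5210.347] * 12 + [-5213.9] * 40 + [-5220.1] * 48
print(replicated_best_loglik(logliks))

# Hypothetical cell: 357 data sets were needed to obtain 100 converged ones.
print(round(convergence_rate(100, 357), 2))
```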

Label switching

Label switching, a known issue in simulation studies involving finite mixture models (Tueller et al., 2011), was also addressed. For the FB method, within-chain label switching was avoided by constraining the prior for the intercept of one class to be greater than the prior for the intercept of the other class. This prevented the Gibbs sampler from switching labels midchain. Between-chain label switching was avoided by using a single chain. Label switching between replications for the MLEM method was avoided by constraining the class 1 mean intercept to be greater than the class 2 mean intercept and the class 1 mean slope to be less than the class 2 mean slope (Cassiday et al., 2021).
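
The ordering that these constraints enforce can also be checked after estimation. The sketch below shows a post hoc relabeling by intercept mean, which is a different technique from the constraint-based approach described above and is included only to make the ordering rule concrete. The data layout and values are hypothetical.

```python
def relabel_by_intercept(estimates):
    """Order classes so that class 1 has the larger intercept mean.

    `estimates` is a list of two dicts, one per class (hypothetical layout).
    """
    return sorted(estimates, key=lambda cls: cls["intercept"], reverse=True)

# Hypothetical replication whose class labels came out swapped.
swapped = [
    {"intercept": 43.4, "slope": 3.9, "proportion": 0.48},
    {"intercept": 47.8, "slope": 3.1, "proportion": 0.52},
]
class1, class2 = relabel_by_intercept(swapped)
print(class1["intercept"], class2["intercept"])
```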

Choice of priors

In certain situations, variances of growth factors are especially susceptible to prior misspecification (Depaoli, 2013; McNeish, 2016). To investigate the kind of priors that should be used for the growth model in the current simulation study, a preliminary simulation was conducted on the full data (without any of the missing data conditions) to see how different priors would affect variance parameter estimation. The simulation involved comparing MLEM and a Bayesian MCMC method using prior specifications for the variance parameters as suggested by McNeish (2016) for LGMs. Priors for the means were not specified and were left diffuse because past studies found these parameters to be invariant across different priors. The same priors tested by McNeish (2016) were also tested in this preliminary study: an improper, noninformative inverse Wishart prior (the Mplus default), a proper, noninformative inverse Wishart prior, and a data-driven prior using estimates from an initial MLEM run. The results from the complete data study showed that the improper, noninformative (inverse Wishart) prior for the variances actually performed well compared to the other priors. Although these findings were inconsistent with suggestions from past studies, the improper, noninformative priors seemed to be the best choice for this particular situation for the missing data handling methods that were compared.

Number of imputations

It has been suggested that the number of imputations for MI be set at $M = 40$ in order to avoid any power issues (Graham, 2009). McGinniss and Harel (2016) showed that under MAR conditions and smaller rates of missing information (less than 40%), increasing the number of imputations for multiple-stage MI models did not produce significantly different bias or coverage rates among varying combinations of number of imputations at each stage, especially for problems involving estimation of point estimates and variances. Therefore, they recommended using as few as $N = 10$ imputations in the first stage and $M = 2$ imputations in the second stage. To be conservative, for the 2M approach, $N = 20$ imputations were used for the first stage and $M = 4$ imputations were used for the second stage for a total of $80$ imputations per dataset.

Evaluation of Outcomes

Each cell in the simulation was evaluated based on convergence, classification accuracy, relative bias of the parameter estimates, and precision of the SEs. To pinpoint any effects of the manipulated conditions on the parameter estimates, a split-plot factorial analysis of variance (ANOVA) was conducted. In this design, the missing data handling method was regarded as a within-subject factor, and all other conditions were considered between-subject factors. All main effects (SS, class separation, missing mechanism, MR, and missing data handling method) and up to three-way interactions were considered on the relative bias and SE bias of the estimates of the means, variances, and class proportion parameters (nine parameters in total: two intercept means, two slope means, intercept and slope variances, the intercept-slope covariance, residual variances, and two class proportions). Only factors and interactions that were statistically significant ($p \leq .05$) and had effect sizes of $η^{2} \geq .06$ (Cohen, 1988) were investigated further.

Classification accuracy

To answer the first research question of interest, classification accuracy was evaluated based on the percentage of correct and incorrect classifications. High classification accuracy meant that a high percentage of individuals were classified into the class to which they belong in the population and a low percentage were classified into a different class. The most likely class membership for each individual was obtained from the posterior class probabilities after estimation. These class assignments were then compared to the individual’s true class assigned at the time of data generation.
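
The accuracy calculation amounts to modal assignment followed by a comparison with the generating class. The sketch below uses hypothetical posterior probabilities and zero-based class labels; the function names are illustrative.

```python
def modal_class(posteriors):
    """Index of the class with the highest posterior probability for one individual."""
    return max(range(len(posteriors)), key=lambda k: posteriors[k])

def classification_accuracy(posterior_matrix, true_classes):
    """Proportion of individuals whose modal class equals their generating class."""
    assigned = [modal_class(p) for p in posterior_matrix]
    hits = sum(a == t for a, t in zip(assigned, true_classes))
    return hits / len(true_classes)

# Hypothetical posterior class probabilities for five individuals and two classes.
posterior_matrix = [
    [0.81, 0.19],
    [0.45, 0.55],
    [0.62, 0.38],
    [0.30, 0.70],
    [0.52, 0.48],
]
true_classes = [0, 1, 1, 1, 0]   # classes assigned at data generation

accuracy = classification_accuracy(posterior_matrix, true_classes)
print(accuracy)  # four of the five individuals are classified correctly
```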

Parameter recovery

To answer the second research question of interest, for each condition, parameter recovery of the fixed effects and respective random effects across the set number of replications was assessed using relative bias. That is,

$$\hat{θ}_{RB} = \frac{\left(\frac{1}{\mathit{reps}} \sum_{b = 1}^{\mathit{reps}} \hat{θ}_{b}\right) - θ_{0}}{θ_{0}},$$

where $\hat{θ}_{b}$ is the parameter estimate from replication $b$, $θ_{0}$ is the population value for $θ$, and $\mathit{reps}$ is the number of replications. Ideally, this value would be close to zero and no more than 10% in either direction (see, e.g., Curran et al., 1996; Kaplan, 1989).
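
A short numerical illustration of the relative bias formula follows. The five estimates are hypothetical, and the population value of 2.00 is the slope variance from Table 1.

```python
def relative_bias(estimates, theta0):
    """(Mean of the replication estimates - population value) / population value."""
    mean_estimate = sum(estimates) / len(estimates)
    return (mean_estimate - theta0) / theta0

# Hypothetical slope-variance estimates from five replications.
estimates = [1.62, 1.71, 1.55, 1.80, 1.67]
rb = relative_bias(estimates, theta0=2.00)

# Flag the parameter if the bias exceeds 10% in either direction.
exceeds_threshold = abs(rb) > 0.10
print(f"relative bias = {rb:.3f}, exceeds 10% threshold: {exceeds_threshold}")
```

In this example the mean estimate is 1.67, so the relative bias is about −16.5%, which would fall outside the acceptable range.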

To answer the third research question of interest, bias of the SEs of the estimates, or the analogous posterior standard deviations (SDs) of the estimates for the Bayesian method, was evaluated by the ratio of the square root of the mean estimated sampling variance of $\hat{θ}$ to the SD of the 100 parameter estimates. That is,

$$\text{SE Bias} = \frac{SE(\hat{θ})}{SD_{1}(\hat{θ})},$$

where

$$SE(\hat{θ}) = \sqrt{100^{-1} \sum_{b = 1}^{100} [SE_{b}(\hat{θ})]^{2}},$$

and $SD_{1}(\hat{θ}) = SD(\hat{θ}) \cdot \sqrt{99/100}$, which is the corrected sample SD of the 100 parameter estimates. SEs were considered accurate when this ratio was close to 1. SE bias values greater than 1 indicated overestimated SEs, which lead to increased Type II error rates, and values less than 1 indicated underestimated SEs, which lead to increased Type I error rates.
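
The SE bias ratio can be sketched for an arbitrary number of replications $n$, with $n = 100$ in the study. The sketch relies on the fact that multiplying the usual sample SD by $\sqrt{(n - 1)/n}$ gives the SD with an $n$ denominator, which is what `statistics.pstdev` returns. The input values are hypothetical.

```python
import math
import statistics

def se_bias(standard_errors, estimates):
    """Root-mean-square SE divided by the SD (n denominator) of the estimates."""
    n = len(standard_errors)
    rms_se = math.sqrt(sum(se ** 2 for se in standard_errors) / n)
    sd1 = statistics.pstdev(estimates)   # equals stdev * sqrt((n - 1) / n)
    return rms_se / sd1

# Hypothetical estimates and SEs from four replications.
estimates = [2.9, 3.1, 3.0, 3.2]
standard_errors = [0.10, 0.12, 0.11, 0.09]

ratio = se_bias(standard_errors, estimates)
print(f"SE bias ratio = {ratio:.2f}")
```

A ratio below 1, as in this example, corresponds to SEs that understate the empirical variability of the estimates.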

Results

Table 3 summarizes the results of the factorial ANOVAs that were conducted on the outcome measures. A nonempty cell in the table indicates that differences among the conditions tested were significant. For example, the table shows that the different missing data handling methods (ME) produced differences in the rates of convergence, accuracy, relative bias, and SE/SD ratios. A three-way interaction between method, SS, and MR was also flagged for the relative bias of the residual variances. In addition, a two-way interaction was flagged between method and SS for the convergence rates, the relative bias of the intercept variance and slope variance estimates, and the SE/SD ratio of all the variance estimates. This table was used to pinpoint specific conditions for which the outcomes were significantly different. While many differences in the conditions could have been reported, we concentrate on a few in order to answer the research questions of interest. Specific results that are not discussed in this study are available from the first author on request.

Table 3.

Summary of Significant Outcomes From Factorial Analysis of Variance of Conditions Tested

Factor	Convergence	Accuracy	Relative Bias	SE/SD Ratio
ME $\times$ SS $\times$ MR	—	—	$Θ$	—
ME $\times$ SS	Significant	—	$Ψ_{1}$ , $Ψ_{2}$	$Ψ_{1}$ , $Ψ_{2}$ , $Ψ_{12}$ , $Θ$
ME $\times$ MR	—	—	—	$Ψ_{2}$
ME $\times$ MM	—	—	—	—
ME	Significant	Significant	Most	Most
SS	Significant	—	$Ψ_{2}$ , $Ψ_{12}$	—
SP	—	Significant	Most	—
MM	—	—	—	—
MR	Significant	—	—	—

Note. $Ψ_{1}$ = intercept variance; $Ψ_{2}$ = slope variance; $Ψ_{12}$ = intercept slope covariance; $Θ$ = residual variance; ME = missing data handling method; SS = sample size; MR = missing rate; SP = class separation; MM = missing mechanism.

In addition to the missing data analyses, full-data GMM analyses were conducted in order to establish appropriate comparisons for the different missing data analyses. The full-data analyses involved running a two-class GMM analysis on the data without any missing data, estimated two different ways. The first method of estimation was using the MLEM approach. The second approach was using a Bayesian estimation method using the Mplus default inverse Wishart (IW) prior specification for the variances of the growth factors. In a separate study, these priors produced the least biased estimates and were considered a good baseline for comparison. Results for the full-data analyses are not presented and are made available only by request.

Convergence and Accuracy

The convergence rates were generally lower for the missing data analyses (LD, FI, MI, FB, and 2M) than the rates obtained from the full-data analyses (MLEM and IW), as can be seen in Table 4, which shows the convergence rates for each method across all the SSs tested. The LD method produced the lowest convergence rates, which were as low as 28%. The FI method produced relatively low convergence rates compared to using MI, FB, and 2M, with a convergence rate as low as 39% when the SS was 100. The Bayesian approach produced the highest rates of convergence, even when SSs were as small as 100, never going below 80%. The 2M method also produced relatively high rates of convergence, although not as high as the FB approach, staying between 81% and 89% convergence rates.

Table 4.

Convergence Rates Across Method and Sample Size Using Missing Data Methods

Method	N = 100	N = 200	N = 500	N = 1000
MLEM	.48	.63	.79	.77
IW	.82	.90	.90	.91
LD	.28	.40	.57	.67
FI	.39	.52	.66	.75
MI	.67	.79	.85	.89
FB	.90	.90	.84	.82
2M	.89	.89	.84	.81

Note. MLEM = maximum likelihood via expectation-maximization; IW = inverse Wishart; LD = listwise deletion; FI = full-information ML via EM; MI = single-stage multiple imputation; FB = fully Bayesian; 2M = two-stage multiple imputation.

Although no significant interaction was detected for differences in the classification accuracy rates, classification accuracy decreased uniformly as separation decreased, as can be seen in Table 5. The FB approach maintained accuracy rates closest to those of its full-data counterpart. Accuracy rates decreased when the 2M method was used. Using LD, FI, and MI had a very minimal impact on the classification accuracy rates compared to the rates produced when using full-data MLEM. For the FB method, accuracy rates went from 53% when the separation was very poor to 66% when the separation was high. In the very poor separation condition, all the methods produced the same poor results of around 51% to 53% accuracy. The classification accuracy rates were also slightly better for the Bayesian method and the imputation methods across all separations, with the largest difference observed between FB and FI when the class separation was moderate. When the separation was high, FI, MI, and 2M produced similar classification accuracy rates, but the FB method was around 6% higher than all other methods. As expected, the missing data conditions lowered the classification accuracy rates across all methods.

Table 5.

Classification Accuracy Rates Across Method and Separation Using Missing Data Methods

Method	Very Poor	Poor	Moderate	High
MLEM	.51	.54	.57	.62
IW	.54	.58	.64	.68
LD	.51	.53	.57	.60
FI	.51	.53	.56	.60
MI	.52	.54	.57	.61
FB	.53	.57	.62	.66
2M	.52	.54	.56	.60

Parameter Recovery

The discrepancies across methods in the overall absolute relative bias measures can be clearly seen in Figure 1. Overall, the relative bias values of the FB and 2M methods were closest to those of their full-data counterpart (IW) and were all under the 10% acceptable threshold. This is evidence that the FB and 2M methods produced estimates similar to those obtained when the data were not missing, regardless of how much data were missing or which missing data mechanism was used. The large discrepancies between the means and medians of the absolute relative bias measures indicate that the overall bias may be isolated to a few parameters that produced extremely biased estimates. The LD, FI, and single-stage MI methods produced higher mean absolute relative bias values than the MLEM method, although the medians were lower.

Figure 1.

Left: Overall absolute relative bias across methods (ME). Right: Overall SE/SD ratios across methods (ME).

Examining the relative bias of all the estimated parameters for each method (Figure 1) suggests that, for the Bayesian methods, the spikes in the means were likely due to the large relative bias produced by the class 1 and class 2 slope means, the slope variances, and the intercept slope covariances. This is also seen in Table 6, where the relative bias passed the 10% threshold when missing data were incorporated and analyzed using the FB method in the estimation of the intercept variances (from 8% to 10%) and the intercept slope covariances (from −6% to −19%). For the MLEM-based methods, the large bias measures were attributed to class 1 and class 2 slopes, as well as intercept variances and intercept slope covariances. Table 6 shows how the slope means, intercept variances, slope variances, and intercept slope covariances were generally downward biased for the MLEM-based methods. For some parameters, like the intercept slope covariance parameter, LD and FI exacerbated the bias, increasing the relative bias from −20% to −45% for the LD method and to −26% for the FI method.

Table 6.

Relative Bias for Select Parameters Across Methods

Method	$μ_{s l o p_{C 1}}$	$μ_{s l o p_{C 2}}$	$Ψ_{1}$	$Ψ_{2}$	$Ψ_{12}$
MLEM	−.25	.14	−.05	−.19	−.20
LD	−.25	.14	−.19	−.12	−.45
FI	−.26	.21	−.06	−.16	−.26
MI	−.21	.14	−.09	−.25	.11
IW	−.04	.03	.08	−.04	−.06
FB	−.06	.03	.10	.00	−.19
2M	−.09	.05	.07	−.06	−.04

Note. Values in bold indicate relative bias values exceeding the 10% acceptable threshold in either direction (Curran et al., 1996). $μ_{s l o p_{C 1}}$ = class 1 slope mean; $μ_{s l o p_{C 2}}$ = class 2 slope mean; $Ψ_{1}$ = intercept variance; $Ψ_{2}$ = slope variance; $Ψ_{12}$ = intercept slope covariance; MLEM = maximum likelihood via expectation-maximization; IW = inverse Wishart; LD = listwise deletion; FI = full-information ML via EM; MI = single-stage multiple imputation; FB = fully Bayesian; 2M = two-stage multiple imputation.

Overall, very poor and high separation conditions produced a select number of parameters with large relative bias. Table 7 shows that the largest measures of relative bias occur for the estimates of the slope means, intercept variances, slope variances, and intercept slope covariances. Estimates of the slope means and slope variances were most negatively biased (22% and 21%, respectively) when separation was very poor (MD = 0.5) and improved as class separation increased to high (MD = 2.0), reaching negative relative bias measures of −13% for the slope means and −12% for the slope variances. The reverse occurred for the intercept slope covariance estimates, where the bias increased from −21% to −40% when class separation went from MD = 0.5 to MD = 2.0.

Table 7.

Relative Bias for Select Parameters Across Class Separation

Condition	$μ_{i n t_{C 1}}$	$μ_{i n t_{C 2}}$	$μ_{s l o p_{C 1}}$	$μ_{s l o p_{C 2}}$	$Ψ_{1}$	$Ψ_{2}$	$Ψ_{12}$	$Θ$	$π_{c 1}$	$π_{c 2}$
MD = 0.5	.06	−.04	−.22	.16	−.21	−.14	.21	−.01	−.05	.05
MD = 1.0	.04	−.02	−.18	.10	−.16	−.10	−.04	−.01	−.03	.03
MD = 1.5	.02	.00	−.16	.10	.02	−.11	−.36	−.01	−.01	.01
MD = 2.0	.00	.02	−.13	.10	.21	−.12	−.48	.00	.03	−.03

Note. Values in bold indicate relative bias values exceeding the 10% acceptable threshold in either direction (Curran et al., 1996). $μ_{i n t_{C 1}}$ = class 1 intercept mean; $μ_{i n t_{C 2}}$ = class 2 intercept mean; $μ_{s l o p_{C 1}}$ = class 1 slope mean; $μ_{s l o p_{C 2}}$ = class 2 slope mean; $Ψ_{1}$ = intercept variance; $Ψ_{2}$ = slope variance; $Ψ_{12}$ = intercept slope covariance.

A closer look at the two-way interaction between SS and method for the intercept variances, slope variances, and residual variances shown in Figure 2 reveals different patterns of relative bias across SSs and methods. For example, larger SSs in general produced relative bias measures close to zero and within the 10% criterion. For MLEM-based methods, the bias increased uniformly downward as SS increased but was magnified when using the LD method followed by the MI method. The LD method produced values below the threshold even at sample sizes of 500. The Bayesian methods performed better at all SSs above 200, where the FB method produced estimates with positive relative bias right above the 10% threshold. At SSs of 100, all methods including the full-data method produced biased estimates of the intercept variance.

Figure 2.

Interaction plots for ME $\times$ SS of the relative bias measures. Clockwise from top left: Intercept variances, slope variances, and residual variances. Note. SS = sample size.

The slope variances were severely underestimated by the MLEM method to begin with, and with the addition of missing data conditions, methods like MI produced even more severely underestimated parameters across all SS conditions. The Bayesian approaches were mostly within the boundaries of the acceptable bias range except when the SSs were 100 or 1000. Estimates were already underestimated for the full-data counterpart, and when missing data conditions were added, the FB and 2M methods exacerbated the negative bias for SSs of 500 and 1000. At SSs of 100 and 200, the relative bias shifted in the opposite direction, with the relative bias estimates moving upward.

Finally, the three-way interaction between SS, method, and MR can be easily observed in Figure 3. Although the residual variances were generally not severely biased, the plot for SS 100 shows that the relative bias was extremely underestimated when the MR was 40% and LD was used. In general, the bias observed across methods and MR was as expected: larger missing data rates and smaller SSs produced more biased parameter estimates of the residual variances.

Figure 3.

Three-way interaction ME $\times$ SS $\times$ MR for the relative bias of the residual variances. Note. SS = sample size; MR = missing rate.

Discussion

In this study, several different methods for handling missing data in the context of GMMs were examined across realistic analytic conditions. Findings from the simulation study showed both expected and unexpected results. The MI method produced higher rates of convergence than even the full-data ML method. This may have been an artifact of the MI method being a quasi-Bayesian method in that imputations were drawn from a data augmentation process that is FB. This may be supported by the fact that the Bayesian-based methods had the highest rates of convergence (greater than 80% across SSs) and that the MI method produced rates approaching these values.

The accuracy rates across methods when missing data conditions were introduced were mainly indistinguishable for very poor class separation. The FB method, however, produced the highest accuracy rates with poor, moderate, and high class separation conditions, while the other methods produced similar accuracy rates. The lower accuracy rates that were observed for the 2M method were somewhat surprising, given that the parameter estimates were quite comparable to the FB method. In retrospect, this may have been because of how these accuracy rates were collected for the 2M method during the simulation, which was different from how accuracy rates were collected for nonimputation methods. In any case, this issue warrants a deeper investigation into how the accuracy parameters were collected during the simulation.

Overall, MLEM-based methods produced higher relative bias measures than Bayesian and Bayesian-based methods. The discrepancies between the means and medians of the overall absolute relative bias pointed to certain parameters as the main culprits of this bias. For all methods, including methods using the full data, slope means, intercept variances, and intercept slope covariances were always problematic, and the slope variance was also problematic for MLEM-based methods. Specifically, the slope variance and intercept slope covariances were consistently underestimated. These results were consistent with the growth modeling literature, which has pointed to the issue that even with complete data, some ML methods under a growth modeling SEM framework will underestimate factor variances and SEs (Browne & Draper, 2006), even though large SSs should correct for these discrepancies. The issue is that in an SEM framework, variance parameters are estimated around a fixed effect, which is more difficult to accomplish with smaller SSs. In this regard, Browne and Draper (2006) and McNeish and Harring (2017), who conducted a similar study of LGMs with a focus on samples of fewer than 100, recommend using a restricted version of ML estimation, which uses a separate process to estimate the fixed effects parameters, to obtain unbiased parameter estimates. The bias observed in this study was also partially supported by the results presented in Depaoli (2013), which showed that slope variances and sometimes the intercept and intercept slope variances were consistently underestimated when using the standard MLEM method (the default in Mplus). Bayesian-based methods also seem to have remedied this issue, as was seen by the relatively smaller measures of relative bias for all the variance terms.

The slope means were also consistently under- or overestimated, regardless of whether MLEM or a Bayesian method was used. These results generally corroborated results from Depaoli (2013), which showed biased slope means regardless of the prior that was used (unless the true parameters were used, which is unrealistic).

The SEs were consistently overestimated for the 2M method, while in general, the FB method produced SE to SD ratios closest to 1, albeit with overestimated SEs when SSs were 100 and 200. The LD, MI, and FI methods, which were grounded in MLEM, produced underestimated SEs. These results were consistent with what was observed by McNeish (2016), where the Bayesian MCMC methods using noninformative priors produced overly wide coverage intervals (overestimated SEs), while FI consistently had coverage intervals that were too short (underestimated SEs). The reverse impact that the priors had on the SEs of the intercept means was interesting (SE ratios converged to 1 as SSs increased using noninformative priors, while they deviated upward from 1 as SSs increased using data-driven priors). In addition, SEs were severely underestimated for the intercept means when using data-driven priors, which may have been an artifact of the priors specified for the variance parameters since priors were not specified for any of the means.

The LD and MI methods performed relatively poorly compared to FI, FB, and 2M, which was consistent with Enders (2011), who suggested avoiding MI for mixture models altogether. However, a firm recommendation is impossible to make based on these results because the MI approach used MLEM to analyze the imputed datasets, which, as was observed, produced relatively poor results. It is impossible to tell from these results whether the bias occurred because of the missing data imputation process or because of the estimation method applied to the imputed datasets. In contrast, 2M seemed to be a viable alternative, albeit one producing poor estimates of the SEs.

One interesting observation from this study is that the differences when imposing an MCAR as opposed to a CMAR⁺ missingness mechanism were minimal, and whether the missing data approaches did anything to improve estimation when missingness was the more severe CMAR⁺ missingness was not very clear from the results. With multiple group models, Enders (2011) and Sterba (2016) suggested that the missingness mechanism was inconsequential to how poorly the MI method would perform. In this regard, it made sense that there was very little difference observed between the estimates from the two missingness mechanisms. However, these results suggest that additional investigations are needed because the MLEM estimates were already producing poor estimates to begin with, making the MI method incomparable to the 2M method.

Based on the findings from the simulation study, the Bayesian inference and 2M methods may serve as viable alternatives to FI and MI in terms of producing less biased parameter estimates for GMMs. However, this comparison warrants further investigation, given that the full-data analysis step of the MI method was conducted using MLEM, which produced relatively more biased estimates compared to the Bayesian method. Although choosing priors can be challenging, the advantages in terms of producing higher convergence rates, higher classification accuracy rates, less biased parameter estimates, and more accurate SEs (for the Bayesian approach) are certainly worth the additional preliminary investigation prior to the analysis. One disadvantage of using the 2M method is that although the tools are available in Mplus, the setup is more involved and requires the researcher to take additional steps outside of Mplus. The steps that were taken to conduct the two-stage imputation method for this study are fully disclosed in the online Appendix.

Even with the use of noninformative, diffuse priors, the FB approach was superior with regard to the evaluation criteria that were considered, especially in the estimation of the SEs. While the 2M method was relatively better than the MI and FI methods in terms of recovering parameter estimates, the SEs were generally overestimated, which in practical applications would lead to inflated Type II error rates. These recommendations should be taken with caution since many conditions that vary in real-life settings were held constant, a rare occurrence in any real data scenario.

Supplemental Material

Supplemental Material, sj-docx-1-jeb-10.3102_10769986221149140 for Handling Missing Data in Growth Mixture Models by Daniel Y. Lee and Jeffrey R. Harring in Journal of Educational and Behavioral Statistics

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Asparouhov

Muthén

B. O

. (2010). Bayesian analysis using Mplus: Technical implementation. Manuscript submitted for publication.

Bauer

D. J.

(2007). Observations on the use of growth mixture models in psychological research. Multivariate Behavioral Research, 42, 757–786.

Bauer

D. J.

Curran

P. J.

(2003). Distributional assumptions of growth mixture models: Implications for overextraction of latent trajectory classes. Psychological Methods, 8, 338–363.

Bollen

K. A.

Curran

P. J.

(2006). Latent curve models: A structural equation perspective. Wiley.

Browne

W. J.

Draper

(2006). A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Analysis, 1(3), 473–514.

Cai

J.-H.

Song

X.-Y.

Hser

Y.-I.

(2010). A Bayesian analysis of mixture structural equation models with non-ignorable missing responses with covariates. Statistics in Medicine, 29, 1861–1874.

Cassiday

Cho

Harring

J. R.

(2021). A comparison of label switching algorithms for finite mixture models. Educational and Psychological Measurement, 81, 668–697.

Codd

C. L.

Cudeck

(2014). Nonlinear random-effects mixture models for repeated measures. Psychometrika, 79, 60–83.

Cohen

(1988). Statistical power analysis for the behavioral sciences. Erlbaum.

10.

Curran

P. J.

West

S. G.

Finch

J. F.

(1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1(1), 16–29.

11.

Dempster

A. P.

Laird

N. M.

Rubin

D. B.

(1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39, 1–38.

12.

Depaoli

(2013). Mixture class recovery in GMM under varying degrees of class separation: Frequentist versus Bayesian estimation. Psychological Methods, 18, 186–219.

13.

Depaoli

(2014). The impact of inaccurate “informative” priors for growth parameters in Bayesian growth mixture modeling. Structural Equation Modeling: A Multidisciplinary Journal, 21, 239–252.

14.

Enders

C. K.

(2010). Applied missing data analysis. SAGE.

15.

Enders

C. K.

(2011). Missing not at random models for latent growth curve analyses. Psychological Methods, 16, 1–16.

16.

Enders

C. K.

Tofighi

(2008). The impact of misspecifying class-specific residual variances in growth mixture models. Structural Equation Modeling: A Multidisciplinary Journal, 15, 75–95.

17.

Gelman

Carlin

J. B.

Stern

H. S.

Rubin

D. B

. (1995). Bayesian data analysis. Chapman and Hall.

18.

Geman

(1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.

19. Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60(1), 549–576.

20. Graham, J. W., Taylor, B. J., & Cumsille, P. E. (2001). Planned missing-data designs in analysis of change. In Collins, L. M., & Sayer, A. G. (Eds.), New methods for the analysis of change (pp. 335–353). American Psychological Association.

21. Hancock, G. R., Harring, J. R., & Lawrence, F. R. (2013). Using latent growth modeling to evaluate longitudinal change. In Hancock, G. R., & Mueller (Eds.), Structural equation modeling: A second course (2nd ed., pp. 309–341). Information Age Publishing, Inc.

22. Harel (2003). Strategies for data analysis with two types of missing values [Unpublished doctoral dissertation]. Pennsylvania State University, State College, PA.

23. Harel (2007). Inferences on missing information under multiple imputation and two-stage multiple imputation. Statistical Methodology, 4, 75–89.

24. Harel (2009). The estimation of R² and adjusted R² in incomplete data sets using multiple imputation. Journal of Applied Statistics, 36, 1109–1118.

25. Harel, Chung, & Miglioretti (2013). Latent class regression: Inference and estimation with two-stage multiple imputation: Two-stage MI for LCR. Biometrical Journal, 55(4), 541–553.

26. Harring, J. R. (2012). Finite mixtures of nonlinear mixed effects models. In Harring, J. R., & Hancock, G. R. (Eds.), Advances in longitudinal methods in the social and behavioral sciences (pp. 159–192). Information Age Publishing, Inc.

27. Harring, J. R., & Hodis (2016). Applications of mixture modeling in educational psychology. Educational Psychologist, 51, 354–367.

28. Hedeker & Gibbons, R. D. (1997). Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods, 2, 64–78.

29. Henson, J. M., Reise, S. P., & Kim, K. H. (2007). Detecting mixtures from structural model differences using latent variable mixture modeling: A comparison of relative model fit statistics. Structural Equation Modeling: A Multidisciplinary Journal, 14, 202–226.

30. Hipp, J. R., & Bauer, D. J. (2006). Local solutions in the estimation of growth mixture models. Psychological Methods, 11, 36–53.

31. Jung & Wickrama, K. A. S. (2007). An introduction to latent class growth analysis and growth mixture modeling. Social and Personality Psychology Compass, 2, 302–317.

32. Kaplan (1989). A study of the sampling variability and z-values of parameter estimates from misspecified structural equation models. Multivariate Behavioral Research, 24(1), 41–57.

33. Kaplan (2002). Methodological advances in the analysis of individual growth with relevance to education policy. Peabody Journal of Education, 77(4), 189–215.

34. Kohli, Hughes, Wang, Davison, M. L., & Zopluoglu (2015). Fitting a linear–linear piecewise growth mixture model with unknown knots: A comparison of two common approaches to inference. Psychological Methods, 20, 259–275.

35. Harring, J. R., & Macready, G. B. (2014). Investigating the feasibility of using Mplus in the estimation of growth mixture models. Journal of Modern Applied Statistical Methods, 13, 31–50.

36. Lock, E. F., Kohli, & Bose (2018). Detecting multiple random changepoints in Bayesian piecewise growth mixture models. Psychometrika, 83, 733–750.

37. Z. L., Zhang, & Lubke (2011). Bayesian inference for growth mixture models with latent class dependent missing data. Multivariate Behavioral Research, 46, 567–597.

38. Lubke & Neale, M. C. (2006). Distinguishing between latent classes and continuous factors: Resolution by maximum likelihood? Multivariate Behavioral Research, 41(4), 499–532.

39. McArdle, J. J. (1986). Latent variable growth within behavior genetic models. Behavior Genetics, 16, 163–200.

40. McGinniss & Harel (2016). Multiple imputation in three or more stages. Journal of Statistical Planning and Inference, 176, 33–51.

41. McLachlan & Krishnan (2007). The EM algorithm and extensions (Vol. 382). John Wiley & Sons.

42. McLachlan & Peel (2000). Finite mixture models. Wiley.

43. McNeish, D. M. (2016). Using data-dependent priors to mitigate small sample bias in latent growth models: A discussion and illustration using Mplus. Journal of Educational and Behavioral Statistics, 41(1), 27–56.

44. McNeish, D. M., & Harring, J. R. (2017). The effect of model misspecification on growth mixture model class enumeration. Journal of Classification, 34, 223–248.

45. McNeish, D. M., & Harring, J. R. (2020). Covariance pattern mixture models: Eliminating random effects to improve convergence and performance. Behavior Research Methods, 52, 947–979.

46. McNeish, D. M., & Matta (2018). Differentiating between mixed-effects and latent-curve approaches to growth modeling. Behavior Research Methods, 50, 1398–1414.

47. Mehta, P. D., & West, S. G. (2000). Putting the individual back into individual growth curves. Psychological Methods, 5, 23–43.

48. Meredith & Tisak (1990). Latent curve analysis. Psychometrika, 55, 107–122.

49. Musu-Gillette, L. E., Wigfield, Harring, J. R., & Eccles, J. S. (2015). Trajectories of change in students’ self-concepts of ability and values in math and college major choice. Educational Research and Evaluation, 21, 343–370.

50. Muthén, B. O. (2001). Latent variable mixture modeling. In Marcoulides, G. A., & Schumacker, R. E. (Eds.), New developments and techniques in structural equation modeling (pp. 1–33). Lawrence Erlbaum Associates.

51. Muthén, B. O., & Curran, P. J. (1997). General longitudinal modeling of individual differences in experimental designs: A latent variable framework for analysis and power estimation. Psychological Methods, 2, 371–402.

52. Muthén, B. O., Muthén, L. K., & Asparouhov (2010). Bayesian analysis using Mplus. www.statmodel.com.

53. Muthén, B. O., & Shedden (1999). Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics, 55, 463–469.

54. Muthén, L. K., & Muthén, B. O. (1998–2017). Mplus user’s guide (8th ed.). Muthén & Muthén.

55. Nagin, D. S. (1999). Analyzing developmental trajectories: A semiparametric, group-based approach. Psychological Methods, 4, 139–157.

56. Nylund, K. L., Asparouhov, & Muthén (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14, 535–569.

57. Peugh, J. L., & Fan (2012). How well does growth mixture modeling identify heterogeneous growth trajectories? A simulation study examining GMM’s performance characteristics. Structural Equation Modeling: A Multidisciplinary Journal, 19, 204–226.

58. Preacher, K. J., Wichman, A. L., MacCallum, R. C., & Briggs, N. E. (2008). Latent growth curve modeling. SAGE.

59. R Development Core Team. (2008). R: A language and environment for statistical computing (Tech. Rep.). R Foundation for Statistical Computing. http://www.R-project.org

60. Rhemtulla & Hancock, G. R. (2016). Planned missing data designs in educational psychology research. Educational Psychologist, 51, 305–316.

61. Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.

62. Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. Wiley.

63. Rubin, D. B. (2003). Nested multiple imputation of NMES via partially incompatible MCMC. Statistica Neerlandica, 57, 3–18.

64. Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.

65. Shen (2000). Nested multiple imputation [Unpublished doctoral dissertation]. Harvard University, Department of Statistics.

66. Song, X.-Y., & Lee, S.-Y. (2002). Analysis of structural equation models with ignorable missing continuous and polytomous data. Psychometrika, 67, 261–288.

67. Sterba, S. K. (2014). Fitting nonlinear latent growth curve models with individually varying time points. Structural Equation Modeling: A Multidisciplinary Journal, 21, 630–647.

68. Sterba, S. K. (2016). Cautions on the use of multiple imputation when selecting between latent categorical versus continuous models for psychological constructs. Journal of Clinical Child and Adolescent Psychology, 45, 167–175.

69. Tofighi & Enders, C. K. (2007). Identifying the correct number of classes in growth mixture models. In Hancock, G. R. (Ed.), Advances in latent variable mixture models (pp. 317–341). Information Age Publishing, Inc.

70. Tueller, S. J., Drotar, & Lubke, G. H. (2011). Addressing the problem of switched class labels in latent variable mixture model simulation studies. Structural Equation Modeling: A Multidisciplinary Journal, 18(1), 110–131.

71. Tueller, S. J., & Lubke, G. H. (2010). Evaluation of structural equation mixture models: Parameter estimates and correct class assignment. Structural Equation Modeling: A Multidisciplinary Journal, 17(2), 165–192.

72. van Buuren, Brand, J. P., Groothuis-Oudshoorn, C. G., & Rubin, D. B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049–1064.

73. Winship, Mare, R. D., & Warren, J. R. (2002). Latent class models for contingency tables with missing data. In Hagenaars, J. A., & McCutcheon, A. L. (Eds.), Applied latent class analysis. Cambridge University Press.

