Growth Mixture Modeling With Nonnormal Distributions: Implications for Data Transformation

Abstract

This study investigated the extent to which class-specific parameter estimates are biased by the within-class normality assumption in nonnormal growth mixture modeling (GMM). Monte Carlo simulations for nonnormal GMM were conducted to analyze and compare two strategies for obtaining unbiased parameter estimates: relaxing the within-class normality assumption and transforming the repeated measures. Based on an unconditional GMM with two latent trajectories, data were generated under different sample sizes (300, 800, and 1,500), skewness (0.7, 1.2, and 1.6), and kurtosis (2 and 4) of the outcomes, numbers of time points (4 and 8), and class proportions (0.5:0.5 and 0.25:0.75). Of the four distributions examined (normal, t, skew-normal, and skew-t), skew-t GMM had the highest accuracy in parameter estimation. In GMM based on data transformations, the adjusted logarithmic method was more effective in obtaining unbiased parameter estimates than van der Waerden quantile normal scores. Although adjusted logarithmic transformation in nonnormal GMM reduced computation time, skew-t GMM produced much more accurate estimates and was more robust across the simulation conditions. This study is significant in that it considers different levels of kurtosis and class proportions, which have not been investigated in depth in previous studies. It is also meaningful in that it investigates the applicability of data transformation to nonnormal GMM.

Keywords

nonnormal growth mixture modeling; data transformation; unbiased parameter estimate; Monte Carlo simulation study

Introduction

Growth mixture modeling (GMM) is a form of finite mixture modeling that combines conventional random effects modeling with latent trajectory classes (Muthén & Asparouhov, 2015). GMM has been commonly used to detect whether qualitatively different subgroups of developmental paths exist within a population (Bauer & Curran, 2003; Muthén, 2004). Because it also offers more flexibility in modeling longitudinal data than traditional methods (e.g., hierarchical linear modeling and repeated measures analysis of variance), GMM has been frequently employed in the social sciences over the past two decades (Guerra-Peña & Steinley, 2016).

GMM accommodates unobserved heterogeneity in growth trajectories by employing latent classes derived from growth factors (e.g., intercept and slope) in a latent growth curve model. By including latent classes, this methodology fully captures both within-class variation (i.e., interindividual variation in each latent class) and between-class variation (i.e., variation between latent subgroups; Jung & Wickrama, 2008). Allowance of such variation in a model generally rests on the assumption of a normal distribution. This means GMM assumes that variables within each latent class are normally distributed (i.e., within-class normality; Feldman et al., 2009; Muthén & Asparouhov, 2015). In other words, the estimation of a model and its identification in GMM depend on the assumption of a mix of normal distributions, which allows for the existence of subpopulations captured by latent trajectory classes (Bauer & Curran, 2003; Wickrama et al., 2016).

Since nonnormal distributions with various degrees of skewness and kurtosis are readily found in real data, the within-class normality assumption of GMM is often considered a limiting feature (Son et al., 2019). Because of this assumption, GMM tends to favor models that include spurious classes over the true model as a way of accounting for the observed distributions, particularly when the outcomes are strongly nonnormal (Bauer & Curran, 2004; Muthén & Asparouhov, 2015). These overextracted spurious classes should not be interpreted as subgroups of the population with distinct developmental patterns. Although class-specific parameters can still be computed from the latent class solution, they are likely to be artificial and not meaningful (Brandt & Klein, 2015). Nevertheless, GMM is frequently misused in practice despite the high likelihood of overextracting spurious subgroups under certain model designs, because nonnormality has often been ignored as a matter of course even though it frequently occurs in mixture modeling analyses (McLachlan & Peel, 2000). This is also why there has been little simulation research on nonnormal GMM.

To deal with nonnormality, recent articles have investigated the use of skew-t distributions for structural equation models (SEMs) and mixture models (e.g., Asparouhov & Muthén, 2016; Frühwirth-Schnatter & Pyne, 2010; Lee & McLachlan, 2014; Lin et al., 2007). GMMs with distributions generated from multiple levels of nonnormality have also been examined in previous simulation studies (e.g., Bauer & Curran, 2003; Guerra-Peña & Steinley, 2016; Jung & Wickrama, 2008; Muthén & Asparouhov, 2015; Son et al., 2019). However, most of this research has focused on how to determine the number of latent classes, evaluating the performance of fit indices, such as the Akaike information criterion (AIC; Akaike, 1974), the Bayesian information criterion (BIC; Schwarz, 1978), and the sample-size adjusted BIC (SBIC; Sclove, 1987), and of likelihood ratio tests, such as the Lo–Mendell–Rubin adjusted likelihood ratio test (LMR-LRT; Lo et al., 2001) and the bootstrap likelihood ratio test (BLRT; McLachlan & Peel, 2000). Only a few studies (Depaoli et al., 2019; Muthén & Asparouhov, 2015; Son et al., 2019) have focused on the extent to which class-specific parameter estimates are biased by the within-class normality assumption in GMM.

Furthermore, Muthén and Asparouhov (2015) investigated the degree of parameter bias due to nonnormality under only one true model with limited conditions, which makes it difficult to generalize their results to real situations. Depaoli et al. (2019) extended Muthén and Asparouhov's (2015) study with a different simulation design, examining a variety of model fit and assessment measures, extending the types of model misspecification, and exposing estimation difficulties of skew-t GMM. However, that study left several points open, such as including various skewness or kurtosis levels of nonnormal distributions and investigating the t distribution, which also belongs to the skew-t family of continuous distributions. Son et al. (2019) attempted to extend these models by considering more diverse skewness conditions as well as four types of skew-t family distributions (i.e., normal, t, skew-normal, and skew-t), but their study remained limited in terms of kurtosis levels and class proportions within the population. They considered only one level of kurtosis and one class proportion (i.e., a balanced proportion between two latent classes), so their results cannot fully inform the efficiency of skew-t distributions in GMM under the various nonnormal conditions that researchers frequently encounter in real data. Thus, simulations that extend previous studies by exploring more practical conditions covering various types of nonnormality are still needed.

In social science, data transformation has traditionally been used to ensure that nonnormal variables follow a normal distribution. Recently, questions have been raised about whether data normalization approaches (e.g., logarithmic transformation, Box–Cox transformation, etc.) can be applied to SEMs and mixture models (Bauer & Curran, 2003; Kline, 2016; Yuan et al., 2000).

For example, Morgan et al. (2016) tested the robustness of a normalizing transformation (van der Waerden quantile normal scores) against nonnormality of the outcome variables (i.e., indicators) when deciding how many latent classes exist within a population in latent profile analysis (LPA). According to their results, this method increased the accuracy of class enumeration. However, that simulation study was limited in that only one type of transformation was adopted. Additionally, the applicability of this normalizing method to nonnormal GMMs has not yet been examined.

Logarithmic transformation has also been employed in applied research (e.g., Boers et al., 2010; Stanley et al., 2017) to normalize the target variables so that the designed models can be estimated smoothly. Kupek (2005) examined the suitability of log-linear transformation for binary variables in SEM; in that study, SEM with the sum of squares and cross-products matrix obtained by the log-linear model correctly identified the structural relations between the variables.

In contrast, there has been little simulation research examining the effectiveness and meaningfulness of data transformation in GMM. Furthermore, the two main strategies for nonnormal GMM, which differ in whether they modify the raw data (employing a skew-t family distribution for the untransformed outcomes versus fitting a model to normalized variables), have not yet been compared to determine which performs better across nonnormal distributions with different levels of skewness and kurtosis.

Therefore, in the present study, simulations were conducted to compare GMM performance with different types of distribution (i.e., normal, t, skew-normal, and skew-t) under a diverse range of nonnormality in the outcome variables. Additionally, the study investigates how effectively two data transformation methods (van der Waerden quantile normal scores and adjusted logarithmic transformation) perform in nonnormal GMM. The two strategies (relaxing the within-class normality assumption and transforming the repeated measures) are then compared to determine which better improves the accuracy of parameter estimation across the various nonnormal conditions.

Multivariate Nonnormal Distributions

Nonnormal mixture models can fit nonnormal data considerably better than normal mixture models (Wickrama et al., 2016). By accommodating nonnormality directly, they reduce the risk of extracting excess latent classes and thus yield more parsimonious solutions. Three distributions can be specified under this approach: t, skew-normal, and skew-t. The t distribution is a heavy-tailed distribution with excess kurtosis (McLachlan & Peel, 2000), the skew-normal distribution accommodates strongly skewed observations (Azzalini & Valle, 1996), and the skew-t distribution accounts for high levels of both skewness and kurtosis (Lee & McLachlan, 2014). Because the normal, t, and skew-normal distributions all belong to the skew-t family of continuous distributions, the skew-t distribution is considered the most general form (Asparouhov & Muthén, 2016).

Unlike fitting a normal distribution, which yields estimates of the means and variance–covariance parameters, fitting the data to a skew-t distribution produces additional information on skewness and kurtosis (Asparouhov & Muthén, 2016; Son et al., 2019). Modeling a skew-t distribution is more complicated than simply matching skewness and kurtosis because the skew parameters are important components of the variance–covariance matrix (Muthén & Asparouhov, 2015).

Considering the model complexity introduced by additional skewness and kurtosis parameters, this study focused on linear GMM, which is less complex than quadratic or cubic GMM. Accordingly, the latent growth factors of the linear GMM are the intercept and slope. In latent class $K$ ($C = 1, \dots, K$), the trajectory of individual $i$ can be expressed as

$$Y_i \mid_{C_i = K} = g_K(A_K β_{ij}) + ε_i, \tag{1}$$

where $Y_i \mid_{C_i = K}$ denotes the vector of observed outcomes for the $i$th subject at times $t_{ij}$ ($i = 1, \dots, n$; $j = 1, \dots, n_i$); $n$ is the number of subjects, and $n_i$ is the number of repeated measurements for the $i$th individual. The mean function of the true trajectory for the $i$th individual is denoted $g_K$, with an unknown probability $π_K = P(C_i = K)$ satisfying $\sum_{k=1}^{K} π_k = 1$. $A_K$ is a known square indicator matrix whose diagonal elements are either 0 or 1 and whose off-diagonal elements are all 0. $A_K$ is introduced because the mean functions $g_K$ may involve only different subsets of $β_{ij}$: the product $A_K β_{ij}$ sets the elements of $β_{ij}$ that are unrelated to the $K$th latent class to zero. The individual parameter vector can be expressed as $β_{ij} = d(β, ζ_i)$, where $d$ represents an $s$-dimensional linear function and $β = (β_1, \dots, β_r)^T$ is a vector of population parameters. Following Lu and Huang (2014), $ζ_i = (ζ_{i1}, \dots, ζ_{iq})^T$ is a vector of random effects that follows

$$ζ_i \overset{\text{iid}}{\sim} N_q(0, Σ), \tag{2}$$

in which $Σ$ ($q \times q$) is an unknown variance–covariance matrix, with $r \geq s \geq q$.

GMM generally assumes that both $ε_i$ and $ζ_i$ follow normal distributions with a mean of 0 and within-class variance–covariance matrices. When the data are nonnormal, however, it is desirable for the model to represent that nonnormality explicitly. Following Lu and Huang (2014), this article retains a normal distribution for $ζ_i$ and applies the skew-t distribution to $ε_i$, because the observed variables are nonnormally distributed. In other words, the normality assumption is relaxed only for $ε_i$, while $ζ_i$ remains normal, because skew-t distributions for both $ε_i$ and $ζ_i$ cannot be identified at the same time (Muthén & Asparouhov, 2015). Therefore, in addition to Equation (1), $ε_i$ is assumed to follow a skew-t distribution, expressed as

$$ε_i \overset{\text{iid}}{\sim} ST_{n_i, v}\left(-J(v)\,δ_i,\; σ^2 I_{n_i},\; Δ_i\right) \tag{3}$$

where $J(v) = (v/π)^{1/2}\, Γ[(v - 1)/2] / Γ(v/2)$ and $Γ$ is the gamma function. The vector of random errors $ε_i = (ε_{i1}, \dots, ε_{in_i})^T$ follows a multivariate skew-t distribution with $v$ degrees of freedom, unknown variance parameter $σ^2$, and skewness diagonal matrix $Δ_i = \mathrm{diag}(δ_{i1}, \dots, δ_{in_i})$ with skewness parameter vector $δ_i = (δ_{i1}, \dots, δ_{in_i})^T$ (see Lu & Huang, 2014, for details). This study focuses on the skewness and kurtosis of the overall data set, so all skewness parameters are set equal, $δ_{i1} = \dots = δ_{in_i} = δ$, giving $Δ_i = δ I_{n_i}$ and $δ_i = δ 1_{n_i}$ with $1_{n_i} = (1, \dots, 1)^T$. Thus, $Y_i$ is conditionally distributed as

$$Y_i \mid_{C_i = K} \sim ST_{n_i, v}\left(g_K(A_K β_{ij}) - J(v)\,δ\,1_{n_i},\; σ^2 I_{n_i},\; δ I_{n_i}\right) \tag{4}$$

and marginally

$$Y_i \sim \sum_{k=1}^{K} π_k \, ST_{n_i, v}\left(g_k(A_k β_{ij}) - J(v)\,δ\,1_{n_i},\; σ^2 I_{n_i},\; δ I_{n_i}\right) \tag{5}$$

where the vector of mixture probabilities $π = (π_1, \dots, π_K)^T$ can also be viewed as the mixture weights of all plausible components within the GMM framework. As long as the components of the model are all identifiable and distinguishable from each other, Equation (5) is identified. If the components are identifiable but not distinguishable, constraints may be added to make Equation (5) identifiable.

When applying a skew-t distribution in this way, it is important to note that the skew parameter $δ$ must be held equal across time; otherwise, the means of $Y$ would not follow the structure imposed by the random-effect means but would also vary as a function of the skew and degrees-of-freedom parameters (Lu & Huang, 2014; Muthén & Asparouhov, 2015).
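As a rough numerical check of Equation (3), the sketch below draws skew-t errors from one common stochastic representation of the multivariate skew-t (an assumption here; Lu & Huang, 2014, give the exact definition used in the model) and shows that the location shift $-J(v)δ$ centers each error at zero. The names `J` and `sample_skew_t_errors` are illustrative, not part of any package, and NumPy is used only for convenience.

```python
import math
import numpy as np


def J(v):
    """J(v) = (v/pi)^(1/2) * Gamma((v - 1)/2) / Gamma(v/2), defined for v > 1.

    Log-gamma is used so that large v does not overflow.
    """
    if v <= 1:
        raise ValueError("J(v) requires v > 1")
    return math.sqrt(v / math.pi) * math.exp(math.lgamma((v - 1) / 2) - math.lgamma(v / 2))


def sample_skew_t_errors(n_subjects, n_times, v, sigma2, delta, rng):
    """Draw eps_i ~ ST_{n_i,v}(-J(v) delta 1, sigma2 I, delta I), one row per subject.

    Assumed stochastic representation:
        eps = -J(v) delta + (delta |X0| + X1) / sqrt(tau),
    with tau ~ Gamma(shape=v/2, rate=v/2) shared across a subject's time points,
    X0 ~ N(0, I), and X1 ~ N(0, sigma2 I) independent.
    E[|X0| / sqrt(tau)] = J(v), so the shift gives each error mean 0 (needs v > 1).
    """
    tau = rng.gamma(shape=v / 2, scale=2 / v, size=(n_subjects, 1))
    x0_abs = np.abs(rng.standard_normal((n_subjects, n_times)))
    x1 = rng.normal(0.0, math.sqrt(sigma2), size=(n_subjects, n_times))
    return -J(v) * delta + (delta * x0_abs + x1) / np.sqrt(tau)


rng = np.random.default_rng(2024)
eps = sample_skew_t_errors(200_000, 4, v=5, sigma2=1.0, delta=2.0, rng=rng)
print(eps.mean(axis=0))  # each entry close to 0
```

Under this representation, `delta=0` gives a symmetric multivariate t, and as `v` grows `tau` concentrates at 1 and $J(v) \to \sqrt{2/π}$, giving a skew-normal; this mirrors how the t and skew-normal sit inside the skew-t family. The same `delta` is used at every time point, matching the equal-skew restriction described above.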

Data Transformation for Normalization

Data transformation is a conventional approach to normalizing data (Kline, 2016; Svolba, 2006) that has been employed in a large body of statistical research. Transformations such as the logarithmic and Box–Cox transformations have recently been discussed in terms of their suitability for dealing with nonnormality in SEMs and mixture models (Bauer & Curran, 2003; Kline, 2016; Yuan et al., 2000). This study focuses on two types of transformation that have been applied in previous research: van der Waerden quantile normal scores, which were tested in LPA by Morgan et al. (2016), and logarithmic normalization, which has been employed in applied research (e.g., Boers et al., 2010; Stanley et al., 2017).

Van der Waerden quantile normal scores were proposed by van der Waerden (1952) to normalize indicator distributions. All values of the original variables are transformed into normal scores $Z_{ij}$, computed as

$$Z_{ij} = Φ^{-1}\left(\frac{R(X_{ij})}{N + 1}\right), \tag{6}$$

where $Φ$ is the cumulative distribution function of the standard normal distribution, $R(X_{ij})$ is the rank of the value for the $i$th subject on the $j$th repeated measure ($i = 1, \dots, n$; $j = 1, \dots, n_i$), and $N$ is the sample size. Ranks are assigned in ascending order, so the lowest score receives a rank of 1; when values are tied, mid-ranks are used.
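A minimal sketch of Equation (6) in plain Python, assuming each repeated measure (column) is transformed separately and ties receive mid-ranks as described above. The helper names `mid_ranks`, `van_der_waerden`, and `transform_by_time` are hypothetical.

```python
from statistics import NormalDist


def mid_ranks(values):
    """Ascending ranks 1..N; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mid = (start + end) / 2 + 1  # mean of ranks start+1, ..., end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mid
        start = end + 1
    return ranks


def van_der_waerden(values):
    """Equation (6): Z = Phi^{-1}(R(X) / (N + 1)) for one repeated measure."""
    n = len(values)
    inv_cdf = NormalDist().inv_cdf
    return [inv_cdf(r / (n + 1)) for r in mid_ranks(values)]


def transform_by_time(data):
    """Apply the scores separately to each time point (column) of subject-by-time data."""
    columns = [van_der_waerden(list(col)) for col in zip(*data)]
    return [list(row) for row in zip(*columns)]


print(van_der_waerden([3.0, 1.0, 2.0]))  # approximately [0.674, -0.674, 0.0]
```

Dividing by $N + 1$ rather than $N$ keeps every argument of $Φ^{-1}$ strictly between 0 and 1, so even the highest-ranked value receives a finite score.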

Logarithmic transformation, which can make data approximately conform to normality, has been the most widely used approach in practice. However, the conventional logarithmic transformation can be applied only to positive values (Feng et al., 2013). When data contain zero or negative values (e.g., temperatures, eyesight test scores, and bank balances), the adjusted logarithmic transformation is a useful alternative. The transformed variable is $W_{it} = \log\{1 + Y_{it} - \min(Y_t)\}$, where $Y_{it}$ denotes the outcome variable (i.e., each indicator at time $t$) in this study, and $\min(Y_t)$ is the minimum value of $Y_{it}$ at time $t$. Subtracting the minimum and adding 1 makes the argument of the logarithm at least 1, so $W_{it} \geq 0$ for every observation.
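The adjusted logarithmic transformation can be sketched the same way. This assumes the natural logarithm (the base is not stated in the text) and takes the minimum separately at each time point, as in the definition of $W_{it}$; the function names are hypothetical.

```python
import math


def adjusted_log(column):
    """W_t = log(1 + Y_t - min(Y_t)) for the values observed at one time point.

    The natural log is an assumption. Every argument is >= 1, so the
    smallest observation maps to 0 and no value is undefined.
    """
    lowest = min(column)
    return [math.log(1 + y - lowest) for y in column]


def adjusted_log_by_time(data):
    """Apply adjusted_log separately to each time point (column) of subject-by-time data."""
    columns = [adjusted_log(list(col)) for col in zip(*data)]
    return [list(row) for row in zip(*columns)]


print(adjusted_log([0.0, -2.0, 3.0]))  # values equal to [log(3), 0.0, log(6)]
```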

Simulation Studies

Data Generation

In this article, two Monte Carlo simulation studies were conducted. An unconditional baseline GMM with two latent trajectories was first generated using Mplus 7.4 (Muthén & Muthén, 1998–2017). Only two latent subgroups were used because mixture models are known to have ill-behaved likelihood functions that can lead to local solutions or nonconvergence (McLachlan & Peel, 2000), which may produce incorrect estimates (Son et al., 2019). The residual variance of the observations is 0.5 at all time points. The means of the intercept and slope are (4, 1) for Class 1 and (0, 0) for Class 2. The variances of the intercept and slope are 1 and 0.7, respectively, allowing individual variability in the latent growth factors. In addition, the covariance between the intercept and slope is set to 0 for both Class 1 and Class 2.
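To make the population model concrete, the following sketch simulates one normal data set from the baseline two-class linear GMM with the parameter values given above. It uses Python with NumPy rather than Mplus, and two details are assumptions not stated in the text: time scores $0, 1, \dots, T - 1$ and class sizes fixed exactly at the target proportion. Nonnormality would then be introduced with Fleishman's cubic transformation (Equation 7). The function name is hypothetical.

```python
import numpy as np


def generate_two_class_linear_gmm(n, n_times, prop_class1, rng):
    """Simulate one data set from the baseline two-class linear GMM (normal errors).

    Population values from the text: growth-factor means (4, 1) for Class 1 and
    (0, 0) for Class 2; intercept variance 1, slope variance 0.7, covariance 0;
    residual variance 0.5 at every time point.
    Assumptions: time scores 0, 1, ..., n_times - 1 and exact class sizes.
    """
    means = np.array([[4.0, 1.0],   # Class 1: intercept, slope
                      [0.0, 0.0]])  # Class 2
    growth_cov = np.diag([1.0, 0.7])
    residual_sd = np.sqrt(0.5)
    times = np.arange(n_times, dtype=float)

    n1 = int(round(n * prop_class1))
    cls = np.repeat(np.array([0, 1]), [n1, n - n1])  # 0 = Class 1, 1 = Class 2

    # Individual intercepts and slopes: class mean plus normal random effects.
    eta = means[cls] + rng.multivariate_normal(np.zeros(2), growth_cov, size=n)
    # Linear trajectory plus independent residuals at each time point.
    y = eta[:, [0]] + eta[:, [1]] * times
    y = y + rng.normal(0.0, residual_sd, size=(n, n_times))
    return y, cls + 1  # class labels 1 and 2


rng = np.random.default_rng(1)
y, labels = generate_two_class_linear_gmm(300, 4, 0.25, rng)
print(y.shape, (labels == 1).sum())  # (300, 4) 75
```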

As illustrated in Figure 1, Study 1 is designed to assess the performance of each distribution used in GMM. Therefore, GMMs with nonnormal outcome variables are fitted under each distribution, with the nonnormality left untransformed. The base model used in Study 2 is the same as that in Study 1, except for the type of distribution and the use of data transformation. In contrast to Study 1, a normal distribution is assumed in Study 2, as in conventional GMM, and the outcome variables are instead normalized using the data transformation methods.

Figure 1.

Population growth mixture modeling. (a) Study 1. (b) Study 2.

Design Factors for Simulation

Five conditions are manipulated in both Study 1 and Study 2: sample size (SS), the skewness of the outcome variables (SK), the kurtosis of the outcome variables (KT), the number of time points (NT), and the class proportion of each latent trajectory (CP). To determine the values of these conditions, previous simulation research was consulted.

Three sample sizes (300, 800, and 1,500) are considered in this study, covering the range frequently observed in related mixture model research (e.g., Brandt & Klein, 2015; Morgan et al., 2016; Son et al., 2019). The skewness levels are set to 0.7, 1.2, and 1.6, and the kurtosis levels to 2 and 4. In previous simulation studies dealing with nonnormality, the skewness of the observed variables ranged from 0 to 2.28 (e.g., 0 to 1.5 in Bauer & Curran, 2003; 0 to 1.25 in Flora & Curran, 2004; 0 to 1.6 in Guerra-Peña & Steinley, 2016; 2.19 to 2.28 in Lu & Huang, 2014; 0 to 1.25 in Morgan et al., 2016), and the kurtosis ranged from 0 to 6 (e.g., 0 to 6 in Bauer & Curran, 2003; 0 to 4 in Guerra-Peña & Steinley, 2016; 0 to 3.75 in Flora & Curran, 2004; 0 to 3.75 in Morgan et al., 2016). The skewness and kurtosis levels were configured with these ranges of nonnormality in SEMs and mixture models in mind. The number of time points is set at 4 and 8 in accordance with Son et al. (2019). For the class proportion of each latent trajectory, two conditions (0.5:0.5 and 0.25:0.75) are employed based on Bauer's (2007) simulation.

A summary of the simulation process and the design factors is presented in Table 1. A total of 72 conditions (3 × 3 × 2 × 2 × 2) are configured for data generation, with 100 replications for each condition. In Study 1, GMMs with each of the four distributions are fitted under all 72 conditions, whereas in Study 2 the two forms of data transformation are tested under the same 72 conditions.
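The condition count follows from fully crossing the five factors. A short enumeration makes the arithmetic explicit; the dictionary keys reuse the abbreviations of Table 1, and the variable names are illustrative.

```python
from itertools import product

# Levels of the five manipulated factors (Table 1).
design = {
    "SS": (300, 800, 1500),
    "SK": (0.7, 1.2, 1.6),
    "KT": (2, 4),
    "NT": (4, 8),
    "CP": ((0.5, 0.5), (0.25, 0.75)),
}
REPLICATIONS = 100

# Full crossing of the factors: one dict per simulation condition.
conditions = [dict(zip(design, levels)) for levels in product(*design.values())]

print(len(conditions))                 # 72
print(len(conditions) * REPLICATIONS)  # 7200 generated data sets
```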

Table 1.

Manipulated Design Factors for the Simulations.

Factor	Level
Sample size (SS)	300/800/1,500
Skewness of the outcome variables (SK)	0.7/1.2/1.6
Kurtosis of the outcome variables (KT)	2/4
The number of time points (NT)	4/8
Class proportion of each latent trajectory (CP)	(0.5:0.5)/(0.25:0.75)

Total conditions = 3×3×2×2×2 = 72

Note. All conditions are tested with 100 replications. CP represents the ratio of each latent class (Class 1: Class 2).

Data Transformation for Nonnormality

To generate nonnormality in the distributions of outcomes, Fleishman’s cubic transformation (Fleishman, 1978; Morgan et al., 2016; Vale & Maurelli, 1983) is applied to the data sets. This form of cubic transformation is used to obtain a specific nonnormal distribution with a certain level of skewness and kurtosis and is defined by the following polynomial expression:

$$Y = a + bZ + cZ^2 + dZ^3, \tag{7}$$

where $Y$ is the nonnormal variable with the target skewness and kurtosis, obtained by transforming a standard normal variate $Z$ with the coefficient vector $(a, b, c, d)$.

These coefficients can be calculated using SAS/IML software (Fan et al., 2002), so SAS 9.4 is employed to generate the nonnormal GMMs in the present study. The obtained values, which derive from the skewness and kurtosis values set in the simulation design, are presented in Table 2.

Table 2.

Fleishman’s Transformation Cubic Coefficients.

Skewness	Kurtosis	$b$	$c$	$d$
0.7	2	0.8654	0.0932	0.0403
	4	0.7554	0.0780	0.0740
1.2	2	0.9506	0.2027	0.0025
	4	0.7976	0.1477	0.0568
1.6	2	0.9625	0.4321	−0.0527
	4	0.8713	0.2452	0.0218

Note. In SAS software, a normal distribution has a skewness of 0 and a kurtosis of 0. In the polynomial formula for Fleishman’s transformation cubic coefficients, $a$ equals $- c$ (thus, $a$ is not presented in this table). The $b$ , $c$ , and $d$ coefficients are rounded to four decimal places in this table, but nine decimal places are used in the actual transformation for the simulations.
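The sketch below applies Equation (7) with $a = -c$ and checks the first row of Table 2 against Fleishman's (1978) moment equations for variance, skewness, and excess kurtosis. Because the tabled coefficients are rounded to four decimals, the recovered moments match the targets only approximately. The helper names are hypothetical, and NumPy is used only to draw a normal sample.

```python
import numpy as np


def fleishman_moments(b, c, d):
    """Fleishman (1978) moments of Y = a + bZ + cZ^2 + dZ^3 with a = -c, Z ~ N(0, 1).

    Returns (variance, skewness, excess kurtosis); the skewness and kurtosis
    expressions assume the coefficients give unit variance.
    """
    variance = b**2 + 6*b*d + 2*c**2 + 15*d**2
    skewness = 2*c*(b**2 + 24*b*d + 105*d**2 + 2)
    kurtosis = 24*(b*d + c**2*(1 + b**2 + 28*b*d)
                   + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2))
    return variance, skewness, kurtosis


def fleishman_transform(z, b, c, d):
    """Equation (7) with a = -c, which keeps the transformed mean at 0."""
    return -c + b*z + c*z**2 + d*z**3


# First row of Table 2 (target skewness 0.7, excess kurtosis 2).
b, c, d = 0.8654, 0.0932, 0.0403
print(fleishman_moments(b, c, d))  # approximately (1.0, 0.70, 2.0)

rng = np.random.default_rng(0)
y = fleishman_transform(rng.standard_normal(500_000), b, c, d)
print(y.mean(), y.var())  # close to 0 and 1
```

Setting $a = -c$ is what makes the mean vanish: $E(cZ^2) = c$ is exactly offset by the constant term.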

Evaluation Criteria

To evaluate and compare the accuracy and reliability of the GMMs, four criteria are examined in this study: the convergence rate, parameter bias, mean square error (MSE), and coverage. The convergence rate is the proportion of replications in which the analysis model is estimated successfully. Nonconvergence indicates that no solution could be produced for the specified model or that the solution is mathematically implausible because of problems such as a negative variance or a singular matrix. The parameter examined for bias in the present study is the logit for the probability of belonging to Class 1. The relative bias of this parameter is computed as follows (Bandalos, 2006; Muthén & Muthén, 2000):

$$\mathrm{Bias}(\hat{θ}) = \frac{1}{R} \sum_{r=1}^{R} \frac{\hat{θ}_r - θ}{θ} \tag{8}$$

where $θ$ is the true value of the parameter, $\hat{θ}_r$ is its estimate for Class 1 in the $r$th replication, and $R$ is the number of replications. An absolute bias below 0.10 is considered tolerable (Finch et al., 1997; Kaplan, 1988), and this criterion is also employed in the present study. MSE indicates how consistently and accurately the parameter is estimated and is expressed as

$$\mathrm{MSE}(\hat{θ}) = \left[\mathrm{Bias}(\hat{θ})\right]^2 + \mathrm{Var}(\hat{θ}) \tag{9}$$

where $\hat{θ}$ denotes the estimated logit for the probability of belonging to Class 1 across the $R$ replications. A smaller MSE indicates more accurate and reliable estimates. Coverage is the proportion of replications in which the confidence interval includes the true parameter value. The present study uses 95% confidence intervals, so, nominally, about 95 of the 100 replications should yield an interval that covers the true parameter value (Agresti & Finlay, 2009). Coverage between .925 and .975 is considered acceptable for a parameter (Bandalos, 2006; Muthén & Muthén, 2000).
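The evaluation criteria can be sketched as follows. Two choices are assumptions rather than details from the text: Equation (8) is undefined when $θ = 0$ (the logit for a 0.5:0.5 split), so the sketch falls back to raw bias in that case, and coverage is computed from symmetric Wald intervals $\hat{θ}_r \pm z \, SE_r$. The MSE is computed directly as the mean squared deviation from $θ$, which equals squared raw bias plus variance (with divisor $R$). All function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def relative_bias(estimates, theta):
    """Equation (8): average of (theta_hat_r - theta) / theta over R replications.

    Assumption: Equation (8) is undefined for theta = 0, so raw bias is returned then.
    """
    if theta == 0:
        return fmean(estimates)
    return fmean((est - theta) / theta for est in estimates)


def mse(estimates, theta):
    """Mean squared deviation from theta (= squared raw bias + variance with divisor R)."""
    return fmean((est - theta) ** 2 for est in estimates)


def coverage(estimates, std_errors, theta, level=0.95):
    """Share of replications whose Wald interval est +/- z*SE contains theta (assumed CI form)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    hits = sum(est - z * se <= theta <= est + z * se
               for est, se in zip(estimates, std_errors))
    return hits / len(estimates)


def bias_tolerable(bias):
    """Absolute bias below 0.10, the criterion used in this study."""
    return abs(bias) < 0.10


def coverage_acceptable(cov):
    """Coverage within .925 to .975."""
    return 0.925 <= cov <= 0.975


# True logits for the Class 1 probability: 0 for 0.5:0.5 and log(0.25/0.75) for 0.25:0.75.
theta_balanced = 0.0
theta_unbalanced = math.log(0.25 / 0.75)  # about -1.099, as in the Table 4 note
```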

Results

Study 1

Study 1 investigates how the bias of the parameter estimates (i.e., the logit for the probability of belonging to Class 1) is affected by the level of nonnormality in the repeated measures under four types of distribution: normal, t, skew-normal, and skew-t. That is, it explores which distribution yields the most accurate parameter estimates. The GMMs are evaluated based on four criteria: convergence rate, parameter bias, MSE, and coverage.

Convergence Rate

The convergence rates for Study 1 by sample size are presented in Table 3. Under all conditions, GMMs with normal and t distributions exhibit good convergence rates, with all replications used in the estimation process except in a few cases. Skew-normal GMMs have the lowest convergence rates under many of the conditions, with convergence especially low for a sample size of 300, though it improves as the sample size increases. Additionally, on average, skew-normal GMMs with a CP of 0.25:0.75 have lower convergence rates than those with a CP of 0.5:0.5. Skew-t GMMs show slightly better convergence rates than skew-normal GMMs, but these remain lower than those of GMMs with normal and t distributions. In general, regardless of the distribution type, the GMMs converge more often when the sample size is larger, the class proportions are balanced, and the degree of nonnormality is lower.

Table 3.

Convergence Rates in Study 1.

				SS = 300				SS = 800				SS = 1,500
CP	NT	SK	KT	Normal	t	Skew-normal	Skew-t	Normal	t	Skew-normal	Skew-t	Normal	t	Skew-normal	Skew-t
0.5:0.5	4	0	0	1.00	0.92	0.93	0.60	1.00	0.99	0.99	0.56	1.00	1.00	1.00	0.60
		0.7	2	1.00	1.00	0.87	1.00	1.00	1.00	0.97	0.99	1.00	1.00	1.00	0.98
			4	1.00	0.96	0.81	1.00	1.00	1.00	0.98	0.99	1.00	1.00	1.00	0.99
		1.2	2	1.00	1.00	0.73	0.97	1.00	1.00	0.86	0.96	1.00	1.00	0.90	1.00
			4	1.00	0.98	0.80	0.98	1.00	1.00	0.98	0.98	1.00	1.00	0.98	0.98
		1.6	2	1.00	1.00	0.94	1.00	1.00	1.00	0.92	0.98	1.00	1.00	1.00	0.98
			4	1.00	1.00	0.73	0.95	1.00	1.00	0.92	0.98	1.00	1.00	0.98	0.97
	8	0	0	1.00	0.99	0.90	0.75	1.00	1.00	0.99	0.80	1.00	0.99	1.00	0.80
		0.7	2	1.00	1.00	0.40	0.83	0.98	1.00	0.70	0.92	1.00	1.00	0.84	0.96
			4	1.00	1.00	0.74	0.93	0.98	1.00	0.83	0.91	1.00	1.00	0.97	0.83
		1.2	2	1.00	1.00	0.98	0.94	1.00	1.00	0.97	0.97	1.00	1.00	0.99	0.94
			4	1.00	1.00	0.62	0.78	0.98	1.00	0.90	0.88	1.00	1.00	0.95	0.84
		1.6	2	1.00	1.00	0.75	0.90	0.99	1.00	0.94	0.87	1.00	1.00	0.98	0.91
			4	1.00	1.00	1.00	0.77	1.00	1.00	0.98	0.84	1.00	1.00	1.00	0.83
0.25:0.75	4	0	0	1.00	0.80	0.92	0.44	1.00	0.87	0.97	0.74	1.00	0.90	1.00	0.85
		0.7	2	1.00	1.00	0.64	0.98	1.00	1.00	0.88	1.00	1.00	1.00	0.98	0.98
			4	1.00	0.97	0.72	0.99	1.00	1.00	0.95	0.98	1.00	1.00	0.98	0.98
		1.2	2	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.97	1.00	1.00	0.98	0.99
			4	1.00	1.00	0.61	0.97	1.00	1.00	0.87	0.99	1.00	1.00	0.98	1.00
		1.6	2	1.00	1.00	0.50	0.96	1.00	1.00	0.86	1.00	1.00	1.00	0.95	0.99
			4	1.00	0.99	0.38	0.99	1.00	1.00	0.57	0.99	1.00	1.00	0.61	1.00
	8	0	0	1.00	1.00	0.83	0.84	1.00	1.00	0.99	0.87	1.00	0.99	1.00	0.89
		0.7	2	1.00	1.00	0.43	0.87	1.00	1.00	0.98	0.88	1.00	1.00	0.74	0.91
			4	1.00	1.00	0.53	0.95	1.00	1.00	0.81	0.95	0.99	1.00	0.92	0.96
		1.2	2	1.00	1.00	0.92	0.89	1.00	1.00	0.99	0.91	1.00	1.00	0.84	0.79
			4	1.00	1.00	0.52	0.99	1.00	1.00	0.98	0.92	0.99	1.00	0.85	0.97
		1.6	2	1.00	1.00	0.97	0.97	1.00	1.00	0.95	0.90	1.00	1.00	0.79	0.93
			4	1.00	1.00	0.95	0.82	0.99	1.00	1.00	0.66	1.00	1.00	0.95	0.68

Note. Normally distributed condition (skewness and kurtosis of 0) is also included for convenience. SS = sample size; CP = class proportion of each latent trajectory; NT = the number of time points; SK = skewness of the outcomes; KT = kurtosis of the outcomes.

Parameter Bias

According to Table 4, the normal GMMs generate seriously biased parameter estimates. Only under certain nonnormal conditions (SK = 1.6 and KT = 2) does the bias fall between −0.10 and 0.10, the tolerable range (Finch et al., 1997; Kaplan, 1988). Similarly, the t GMMs perform poorly in obtaining unbiased parameter estimates, producing somewhat accurate estimates only when NT = 4 and KT = 2. On the other hand, the skew-normal GMMs produce unbiased estimates over a wider range of conditions, performing best at a sample size of 800. Unlike in the GMMs with normal or t distributions, skewness affects the degree of bias more than kurtosis does for these models. Overall, however, the skew-t GMMs obtain the most accurate parameter estimates, which is consistent with past simulation research (e.g., Muthén & Asparouhov, 2015; Son et al., 2019). Under most conditions, the absolute bias for this distribution is less than 0.1, but skew-t GMMs may be partially affected by kurtosis, because most of the cases where the absolute bias exceeds 0.1 have KT = 4 in common.

Table 4.

Bias of the Logit for the Probability of Belonging to Class 1 in Study 1.

				SS = 300				SS = 800				SS = 1,500
CP	NT	SK	KT	Normal	t	Skew-normal	Skew-t	Normal	t	Skew-normal	Skew-t	Normal	t	Skew-normal	Skew-t
0.5:0.5	4	0	0	0.011	0.013	0.135	0.030	−0.017	−0.015	0.053	0.002	−0.009	−0.007	0.043	0.003
		0.7	2	0.836	0.101	−0.189	0.018	0.797	0.057	−0.232	−0.012	0.796	0.060	−0.222	−0.003
			4	1.120	0.474	−0.222	0.017	1.131	0.424	−0.243	−0.013	1.142	0.428	−0.242	−0.006
		1.2	2	0.352	0.082	−0.236	0.017	0.298	0.046	−0.283	−0.014	0.306	0.051	−0.272	−0.006
			4	0.986	0.325	−0.214	0.018	0.982	0.288	−0.240	−0.012	0.997	0.295	−0.241	−0.007
		1.6	2	0.058	0.079	0.007	0.018	0.024	0.043	−0.051	−0.013	0.030	0.052	−0.297	−0.005
			4	0.659	0.355	−0.303	0.023	0.611	0.310	−0.354	−0.004	0.613	0.319	−0.344	−0.005
	8	0	0	0.011	0.022	0.198	0.019	−0.026	−0.012	0.111	−0.020	−0.019	−0.004	0.098	−0.087
		0.7	2	1.464	1.127	−0.119	0.011	1.458	1.091	−0.123	−0.012	1.489	1.094	−0.126	−0.006
			4	1.624	1.319	−0.103	0.035	1.607	1.423	3.186	−0.011	1.641	1.442	−0.142	−0.006
		1.2	2	0.953	0.417	−0.196	0.015	0.931	0.370	−0.270	−0.014	0.946	0.382	−0.327	−0.006
			4	1.537	1.352	−0.050	0.011	1.536	1.317	−0.094	−0.014	1.560	1.319	−0.109	−0.008
		1.6	2	0.071	0.185	−1.492	0.016	0.034	0.152	1.413	−0.018	0.037	0.162	0.700	−0.007
			4	1.266	1.292	−0.145	0.007	1.257	1.229	−0.241	−0.011	1.271	1.241	−0.443	0.043
0.25:0.75	4	0	0	0.009	−0.001	0.105	−0.255	−0.012	−0.017	0.050	−0.136	−0.003	−0.006	0.047	−0.106
		0.7	2	0.532	0.151	−0.269	0.021	0.493	0.114	−0.296	−0.084	0.499	0.113	−0.296	−0.060
			4	0.724	0.888	−0.340	0.012	0.715	0.680	−0.356	−0.067	0.738	0.655	−0.356	−0.146
		1.2	2	0.256	0.118	−0.652	0.318	0.202	0.094	−0.382	0.013	0.217	0.099	−0.339	0.008
			4	0.635	0.598	−0.319	0.182	0.615	0.471	−0.338	0.370	0.625	0.462	−0.335	−0.025
		1.6	2	0.035	0.124	−0.080	0.027	0.013	0.098	−0.463	−0.008	0.018	0.107	−0.515	−0.002
			4	0.434	0.640	−0.374	0.283	0.393	0.586	−0.457	0.037	0.401	0.591	−0.464	0.046
	8	0	0	0.013	0.027	0.151	0.005	−0.019	0.001	0.088	−0.012	−0.011	0.010	0.092	0.046
		0.7	2	0.964	1.216	−0.256	0.022	0.946	1.349	−0.046	−0.008	0.961	1.429	−0.262	−0.001
			4	1.127	0.686	−0.612	0.126	1.105	0.705	−0.854	0.018	1.144	0.790	−1.008	0.031
		1.2	2	0.566	0.693	−0.677	0.016	0.542	0.633	0.020	−0.008	0.545	0.655	0.288	−0.005
			4	1.046	0.942	−0.507	0.077	1.030	1.046	−0.097	0.029	1.048	1.042	−0.381	0.077
		1.6	2	0.057	0.307	0.016	0.016	0.028	0.270	−0.008	−0.001	0.033	0.279	−0.613	0.002
			4	0.814	1.438	−0.752	0.030	0.802	1.791	−0.023	0.011	0.815	1.771	−0.741	0.310

Note. Absolute values lower than 0.1 are marked in boldface. Normally distributed condition (skewness and kurtosis of 0) is also included for convenience. The logit for 0.5:0.5 is 0 and for 0.25:0.75 is −1.099. SS = sample size; CP = class proportion of each latent trajectory; NT = the number of time points; SK = skewness of the outcomes; KT = kurtosis of the outcomes.
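The bias values in Table 4 refer to the logit of the probability of belonging to Class 1, so the population values are ln(0.5/0.5) = 0 and ln(0.25/0.75) ≈ −1.099. The Python sketch below shows this computation and the ±0.10 tolerance check. It assumes bias is the mean estimate across replications minus the population logit (the formula is not restated in this section), and the five estimates are hypothetical, not results from the study.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a class probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def bias(estimates: list[float], true_value: float) -> float:
    """Assumed definition: mean estimate across replications minus the population value."""
    return sum(estimates) / len(estimates) - true_value

# Population logits for the two class-proportion conditions of the design.
true_balanced = logit(0.5)     # 0.5:0.5 -> 0
true_unbalanced = logit(0.25)  # 0.25:0.75 -> ln(1/3), about -1.099

# Hypothetical logit estimates from five replications (illustration only).
estimates = [-1.05, -1.12, -1.08, -1.15, -1.02]
b = bias(estimates, true_unbalanced)
tolerable = abs(b) < 0.10  # tolerance used in the text (Finch et al., 1997; Kaplan, 1988)
print(f"true logit = {true_unbalanced:.3f}, bias = {b:.3f}, tolerable = {tolerable}")
```

Because the logit is unbounded, the same ±0.10 tolerance can be applied to both class-proportion conditions even though their population values differ.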

Mean Square Error

In the normal GMMs, sample size appears to have no distinguishable effect on the MSE, which is instead affected by the remaining factors (Table 5). On average, the MSE of the t GMMs decreases slightly as the sample size increases. The skewness of the outcome distribution does not have a major effect on the MSE of the t GMMs, but higher levels of kurtosis lead to higher MSE, except under the conditions where CP = 0.25:0.75, NT = 8, and SK = 0.7. In general, the MSE is low for the skew-normal GMMs; apart from a few cases, it is lower when the class proportions are equal. The MSE is lowest for the skew-t GMMs. For these models, a larger sample size leads to a lower MSE under almost every condition, but when the class proportion is 0.25:0.75, the skew-t GMMs have a larger MSE, and estimation accuracy deteriorates seriously under several conditions.

Table 5.

Mean Square Error in Study 1.

				SS = 300				SS = 800				SS = 1,500
CP	NT	SK	KT	Normal	t	Skew-normal	Skew-t	Normal	t	Skew-normal	Skew-t	Normal	t	Skew-normal	Skew-t
0.5:0.5	4	0	0	0.019	0.015	0.054	0.020	0.006	0.006	0.013	0.006	0.003	0.003	0.010	0.003
		0.7	2	0.762	0.035	0.060	0.015	0.665	0.011	0.063	0.005	0.649	0.007	0.055	0.003
			4	1.327	0.276	0.065	0.015	1.304	0.196	0.067	0.006	1.322	0.189	0.062	0.003
		1.2	2	0.149	0.024	0.076	0.015	0.099	0.009	0.088	0.005	0.098	0.006	0.078	0.003
			4	1.035	0.148	0.061	0.015	0.993	0.098	0.065	0.005	1.010	0.093	0.062	0.002
		1.6	2	0.019	0.026	0.046	0.015	0.006	0.009	0.130	0.006	0.004	0.006	0.102	0.003
			4	0.480	0.154	0.113	0.015	0.394	0.107	0.133	0.018	0.384	0.107	0.123	0.002
	8	0	0	0.017	0.019	0.076	0.167	0.007	0.006	0.028	0.169	0.003	0.003	0.022	0.011
		0.7	2	2.269	1.336	0.069	0.014	2.158	1.216	0.030	0.006	2.235	1.209	0.023	0.003
			4	2.768	1.934	0.621	0.030	2.611	2.068	1.823	0.006	2.711	2.103	0.151	0.003
		1.2	2	0.970	0.202	0.247	0.015	0.892	0.147	0.265	0.005	0.904	0.151	0.282	0.002
			4	2.481	1.909	0.046	0.014	2.386	1.764	0.060	0.006	2.450	1.752	0.037	0.003
		1.6	2	0.022	0.054	2.226	0.015	0.007	0.030	2.777	0.006	0.004	0.029	0.491	0.003
			4	1.675	1.736	0.227	0.014	1.610	1.534	0.231	0.006	1.630	1.551	0.205	0.003
0.25:0.75	4	0	0	0.021	0.021	0.057	0.487	0.008	0.008	0.018	0.220	0.004	0.004	0.011	0.173
		0.7	2	0.327	0.072	0.097	0.317	0.263	0.028	0.099	0.077	0.258	0.019	0.094	0.043
			4	0.572	1.078	0.138	0.491	0.536	0.513	0.135	0.155	0.557	0.444	0.131	0.188
		1.2	2	0.092	0.045	0.646	6.641	0.050	0.018	0.158	0.051	0.051	0.015	0.125	0.011
			4	0.446	0.564	0.118	3.169	0.398	0.255	0.123	14.90	0.400	0.226	0.117	0.051
		1.6	2	0.022	0.046	0.039	0.063	0.007	0.018	0.279	0.007	0.004	0.016	0.317	0.003
			4	0.223	0.504	0.152	6.625	0.169	0.370	0.221	0.053	0.167	0.362	0.221	0.155
	8	0	0	0.024	0.022	0.071	0.023	0.008	0.008	0.023	0.008	0.004	0.004	0.018	0.097
		0.7	2	1.012	1.810	0.089	0.101	0.930	1.961	0.071	0.009	0.943	2.103	0.073	0.003
			4	1.380	1.506	0.727	0.408	1.258	1.489	1.133	0.068	1.331	1.355	1.472	0.055
		1.2	2	0.383	0.537	0.552	0.022	0.312	0.419	0.035	0.007	0.307	0.439	0.083	0.004
			4	1.183	1.607	0.590	0.230	1.093	1.695	0.156	0.045	1.116	1.585	0.259	0.037
		1.6	2	0.027	0.129	0.022	0.021	0.008	0.084	0.007	0.006	0.004	0.083	1.751	0.003
			4	0.733	3.948	0.639	0.021	0.676	3.325	0.033	0.008	0.676	3.515	0.588	5.917

Note. The logit for 0.5:0.5 is 0 and for 0.25:0.75 is −1.099. Normally distributed condition (skewness and kurtosis of 0) is also included for convenience. SS = sample size; CP = class proportion of each latent trajectory; NT = the number of time points; SK = skewness of the outcomes; KT = kurtosis of the outcomes.
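MSE combines bias and sampling variability, which is why a model with small bias can still show a large MSE (e.g., some skew-t conditions with CP = 0.25:0.75 in Table 5). The sketch below assumes the usual definition, the mean squared deviation of the replication estimates from the population logit, and checks numerically that it equals the squared bias plus the variance of the estimates (population divisor). The estimates are hypothetical.

```python
import statistics

def mse(estimates, true_value):
    """Assumed definition: mean squared deviation of estimates from the population value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

# Hypothetical logit estimates for a balanced (0.5:0.5) condition, true logit = 0.
estimates = [0.12, -0.05, 0.30, 0.08, -0.10]
true_value = 0.0

m = mse(estimates, true_value)
b = statistics.fmean(estimates) - true_value  # bias
var = statistics.pvariance(estimates)          # variance with divisor R (replications)

# Decomposition: MSE = bias^2 + variance.
print(f"MSE = {m:.5f}, bias^2 + variance = {b ** 2 + var:.5f}")
```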

Coverage

Overall, the normal GMMs exhibit the worst coverage (Table 6). Only when SK = 1.6 and KT = 2 do the coverage values for a sample size of 800 fall within the range of .925 to .975 (Bandalos, 2006; Muthén & Muthén, 2000). The t GMMs generally produce better coverage than the normal GMMs, but their performance is not particularly impressive. The skew-normal GMMs generally produce better coverage than both the normal and t GMMs, but their coverage values fall within the range of .925 to .975 only when the sample size is 1,500. In contrast, the skew-t GMMs offer the most conditions under which the coverage meets the cut-off criterion, and increasing the sample size further improves their coverage.

Table 6.

Coverage for a 95% Confidence Interval in Study 1.

				SS = 300				SS = 800				SS = 1,500
CP	NT	SK	KT	Normal	t	Skew-normal	Skew-t	Normal	t	Skew-normal	Skew-t	Normal	t	Skew-normal	Skew-t
0.5:0.5	4	0	0	0.940	0.967	0.871	0.867	0.930	0.929	0.970	0.893	0.970	0.970	0.950	0.950
		0.7	2	0.050	0.920	0.674	0.920	0.020	0.920	0.240	0.939	0.010	0.900	0.950	0.949
			4	0.040	0.396	0.654	0.920	0.010	0.050	0.163	0.929	0.011	0.834	0.950	0.949
		1.2	2	0.320	0.890	0.603	0.918	0.060	0.910	0.116	0.938	0.073	0.850	0.950	0.950
			4	0.020	0.629	0.613	0.918	0.031	0.360	0.173	0.939	0.034	0.040	0.950	0.959
		1.6	2	0.920	0.920	0.796	0.920	0.940	0.930	0.824	0.929	0.910	0.870	0.880	0.949
			4	0.070	0.420	0.466	0.926	0.020	0.120	0.011	0.929	0.018	0.020	0.949	0.959
	8	0	0	0.930	0.899	0.811	0.947	0.950	0.940	0.909	0.762	0.950	0.949	0.810	0.587
		0.7	2	0.153	0.020	0.800	0.940	0.010	0.387	0.700	0.935	0.010	0.482	0.488	0.969
			4	0.010	0.130	0.811	0.903	0.044	0.020	0.646	0.945	0.098	0.010	0.417	0.952
		1.2	2	0.040	0.240	0.281	0.926	0.010	0.050	0.072	0.938	0.058	0.582	0.010	0.947
			4	0.020	0.010	0.852	0.936	0.021	0.206	0.500	0.955	0.037	0.193	0.351	0.964
		1.6	2	0.910	0.730	0.378	0.922	0.949	0.620	0.258	0.931	0.910	0.230	0.165	0.945
			4	0.030	0.369	0.224	0.935	0.078	0.033	0.041	0.917	0.029	0.266	0.950	0.917
0.25:0.75	4	0	0	0.920	0.925	0.880	0.750	0.900	0.931	0.959	0.838	0.960	0.956	0.910	0.918
		0.7	2	0.190	0.900	0.578	0.857	0.010	0.890	0.136	0.910	0.132	0.770	0.940	0.929
			4	0.040	0.340	0.338	0.808	0.020	0.060	0.021	0.867	0.084	0.010	0.850	0.878
		1.2	2	0.630	0.880	0.290	0.910	0.400	0.860	0.250	0.907	0.100	0.690	0.949	0.960
			4	0.080	0.510	0.450	0.876	0.010	0.210	0.023	0.879	0.010	0.040	0.100	0.950
		1.6	2	0.920	0.880	0.800	0.906	0.930	0.830	0.131	0.920	0.960	0.630	0.165	0.970
			4	0.300	0.313	0.333	0.869	0.070	0.020	0.018	0.899	0.028	0.712	0.950	0.940
	8	0	0	0.920	0.930	0.867	0.905	0.910	0.930	0.869	0.908	0.960	0.939	0.750	0.955
		0.7	2	0.060	0.170	0.605	0.931	0.040	0.070	0.908	0.932	0.010	0.010	0.949	0.967
			4	0.010	0.510	0.404	0.884	0.016	0.560	0.138	0.958	0.010	0.540	0.835	0.958
		1.2	2	0.210	0.130	0.154	0.899	0.060	0.193	0.889	0.912	0.065	0.559	0.970	0.975
			4	0.010	0.440	0.462	0.909	0.020	0.390	0.878	0.946	0.010	0.360	0.870	0.959
		1.6	2	0.910	0.600	0.906	0.907	0.930	0.280	0.926	0.922	0.920	0.030	0.977	0.978
			4	0.090	0.070	0.032	0.909	0.040	0.010	0.910	0.894	0.030	0.381	0.970	0.956

Note. Coverage within a range of .925 to .975 is marked in boldface. Normally distributed condition (skewness and kurtosis of 0) is also included for convenience. SS = sample size; CP = class proportion of each latent trajectory; NT = the number of time points; SK = skewness of the outcomes; KT = kurtosis of the outcomes.
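Coverage is the proportion of replications whose 95% confidence interval contains the population value. The sketch below assumes symmetric Wald intervals (estimate ± 1.96 × SE), which is typical for maximum likelihood output but is not stated in this section, and applies the .925 to .975 criterion. The four replications are hypothetical.

```python
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval

def wald_interval(estimate, se, z=Z_975):
    """Assumed interval form: estimate +/- z * SE."""
    return estimate - z * se, estimate + z * se

def coverage_rate(results, true_value):
    """Share of replications whose interval contains the population value."""
    hits = 0
    for estimate, se in results:
        lower, upper = wald_interval(estimate, se)
        hits += lower <= true_value <= upper
    return hits / len(results)

def acceptable(rate, low=0.925, high=0.975):
    """Cut-off range used in the text (Bandalos, 2006; Muthen & Muthen, 2000)."""
    return low <= rate <= high

TRUE_LOGIT = -1.099  # population logit for 0.25:0.75
# Hypothetical (estimate, standard error) pairs from four replications.
replications = [(-1.05, 0.05), (-1.20, 0.04), (-1.10, 0.10), (-0.95, 0.06)]
rate = coverage_rate(replications, TRUE_LOGIT)
print(rate, acceptable(rate))
```

With only 100 replications per condition, each replication shifts coverage by .01, so a condition needs between 93 and 97 covering intervals to meet the criterion.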

Study 2

In Study 2, the within-class normal distribution of conventional GMMs is retained, and nonnormality is instead treated by applying one of two data transformations to the repeated measures. Study 2 therefore investigates whether van der Waerden quantile normal scores or the adjusted logarithmic transformation produces more accurate parameter estimates. The GMMs are evaluated on the same four criteria as in Study 1: convergence rate, parameter bias, MSE, and coverage.
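The two transformations can be sketched in Python as follows. The van der Waerden score of an observation with rank r among n values is Φ⁻¹(r/(n + 1)); this sketch gives tied values their average rank. The exact form of the adjusted logarithmic transformation is not given in this section, so the function below assumes a common variant, ln(x − min(x) + c), with a hypothetical shift c = 1 that keeps the argument of the logarithm positive. The scores are hypothetical.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def van_der_waerden(values):
    """Quantile normal scores: inverse standard normal CDF of rank / (n + 1)."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in average_ranks(values)]

def adjusted_log(values, shift=1.0):
    """Assumed form ln(x - min(x) + shift); 'shift' is a hypothetical constant."""
    m = min(values)
    return [math.log(x - m + shift) for x in values]

scores = [1.0, 2.0, 2.0, 5.0, 13.0]  # hypothetical right-skewed scores at one wave
print([round(z, 3) for z in van_der_waerden(scores)])
print([round(v, 3) for v in adjusted_log(scores)])
```

The two approaches differ in what they retain: van der Waerden scores keep only the rank order, whereas the logarithmic transformation keeps the relative spacing of values while compressing the long right tail.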

Convergence Rate

Generally, the GMMs based on van der Waerden quantile normal scores (VW-GMMs) converge successfully under almost all conditions, with higher convergence rates for larger sample sizes (Table 7). The GMMs based on the adjusted logarithmic transformation (AL-GMMs) show slightly better convergence rates than the VW-GMMs on average, and their convergence rates remain consistently high regardless of the simulation design.

Table 7.

Convergence Rates in Study 2.

				SS = 300		SS = 800		SS = 1,500
CP	NT	SK	KT	VW	AL	VW	AL	VW	AL
0.5:0.5	4	0	0	0.94	1.00	0.98	1.00	0.97	1.00
		0.7	2	0.97	0.99	0.98	1.00	0.97	1.00
			4	0.98	0.89	0.98	0.96	0.97	1.00
		1.2	2	0.99	1.00	1.00	1.00	1.00	1.00
			4	0.98	0.98	0.98	1.00	0.97	1.00
		1.6	2	1.00	1.00	1.00	1.00	1.00	1.00
			4	0.98	1.00	0.99	1.00	1.00	1.00
	8	0	0	0.93	1.00	1.00	1.00	1.00	1.00
		0.7	2	0.92	1.00	0.99	1.00	1.00	1.00
			4	0.93	0.99	0.99	1.00	1.00	1.00
		1.2	2	0.98	1.00	1.00	1.00	1.00	1.00
			4	0.91	1.00	0.99	1.00	1.00	1.00
		1.6	2	1.00	0.97	1.00	1.00	1.00	0.91
			4	0.95	1.00	0.99	1.00	1.00	1.00
0.25:0.75	4	0	0	0.98	1.00	0.99	1.00	1.00	1.00
		0.7	2	0.98	1.00	1.00	1.00	1.00	1.00
			4	0.99	0.95	1.00	1.00	1.00	1.00
		1.2	2	1.00	0.99	1.00	1.00	1.00	1.00
			4	0.99	1.00	1.00	1.00	1.00	1.00
		1.6	2	1.00	1.00	1.00	1.00	1.00	1.00
			4	1.00	1.00	1.00	1.00	1.00	1.00
	8	0	0	0.94	1.00	1.00	1.00	1.00	1.00
		0.7	2	0.95	1.00	1.00	0.99	1.00	0.99
			4	0.92	1.00	1.00	1.00	1.00	1.00
		1.2	2	1.00	1.00	1.00	1.00	1.00	1.00
			4	0.94	1.00	1.00	1.00	1.00	1.00
		1.6	2	1.00	0.96	1.00	0.96	1.00	0.88
			4	0.99	0.99	1.00	1.00	1.00	1.00

Note. Normally distributed condition (skewness and kurtosis of 0) is also included for convenience. SS = sample size; CP = class proportion of each latent trajectory; NT = number of time points; SK = skewness of the outcomes; KT = kurtosis of the outcomes; VW = van der Waerden quantile normal scores; AL = adjusted logarithmic transformation.

Parameter Bias

As seen in Table 8, the VW-GMMs produce seriously biased parameter estimates: the bias does not fall within the tolerable range of −0.10 to 0.10 (Finch et al., 1997; Kaplan, 1988) for any condition. In more detail, larger sample sizes generally lead to greater bias, except when SS = 300 and CP = 0.5:0.5. In terms of class proportion, unbalanced latent trajectories generally yield more biased estimates; the exception is SS = 300 and NT = 4, where unbalanced trajectories lead to lower bias. The AL-GMMs produce unbiased estimates under a wider range of conditions, but only four combinations of conditions meet the cut-off criterion at every sample size (CP = 0.5:0.5, NT = 4, SK = 1.2, KT = 2; CP = 0.5:0.5, NT = 4, SK = 1.6, KT = 4; CP = 0.5:0.5, NT = 8, SK = 1.6, KT = 4; and CP = 0.25:0.75, NT = 4, SK = 1.2, KT = 2). The accuracy of the estimates falls slightly when NT = 8 compared with NT = 4, but at a skewness level of 1.6, a larger number of time points reduces the logit bias. When KT = 2, the bias decreases as skewness rises from 0.7 to 1.2 and then increases again at SK = 1.6.

Table 8.

Bias of the Logit for the Probability of Belonging to Class 1 in Study 2.

				SS = 300		SS = 800		SS = 1,500
CP	NT	SK	KT	VW	AL	VW	AL	VW	AL
0.5:0.5	4	0	0	1.994	0.246	1.515	0.225	1.251	0.231
		0.7	2	1.946	−0.179	1.316	−0.351	1.050	−0.374
			4	1.979	−0.527	1.331	−0.490	1.064	−0.435
		1.2	2	0.945	0.044	0.352	0.009	−0.625	0.014
			4	1.983	−0.355	1.333	−0.453	1.067	−0.437
		1.6	2	1.238	1.643	1.143	1.618	0.962	1.707
			4	1.569	−0.030	0.507	−0.068	0.031	−0.094
	8	0	0	−1.722	0.036	−3.663	0.020	−3.953	−0.056
		0.7	2	−1.732	−0.421	−3.669	−0.856	−3.876	−0.781
			4	−1.660	−1.192	−3.651	−0.873	−3.888	−0.361
		1.2	2	−0.825	0.198	−0.995	0.167	−1.033	0.171
			4	−1.707	−0.533	−3.657	−0.797	−3.886	−0.714
		1.6	2	1.031	0.464	1.043	0.838	1.049	1.036
			4	−0.932	0.034	−2.752	−0.013	−3.272	−0.019
0.25:0.75	4	0	0	0.418	−0.280	−0.906	0.161	−1.096	0.236
		0.7	2	0.164	−0.503	−1.081	−0.633	−1.106	−0.597
			4	0.175	−0.783	−1.083	−0.667	−1.142	−0.610
		1.2	2	1.236	0.004	−1.507	0.001	−1.505	0.005
			4	0.211	−0.627	−1.078	−0.662	−1.115	−0.625
		1.6	2	1.243	1.486	1.550	1.287	1.716	1.511
			4	−0.809	−0.219	−1.721	−0.425	−1.758	−0.794
	8	0	0	−2.530	−1.429	−4.172	−1.558	−4.305	−1.607
		0.7	2	−2.288	−0.685	−4.044	−0.889	−4.264	−0.843
			4	−2.333	−0.952	−4.070	−0.862	−4.254	−0.836
		1.2	2	−0.996	0.122	−1.214	0.114	−1.227	0.117
			4	−2.243	−0.819	−4.087	−0.868	−4.261	−0.853
		1.6	2	0.720	0.308	0.766	0.267	0.827	0.710
			4	−2.002	−0.182	−3.344	−0.255	−3.574	−0.220

Note. The logit for 0.5:0.5 is 0 and for 0.25:0.75 is −1.099. Absolute values lower than 0.1 are marked in boldface. Normally distributed condition (skewness and kurtosis of 0) is also included for convenience. SS = sample size; CP = class proportion of each latent trajectory; NT = the number of time points; SK = skewness of the outcomes; KT = kurtosis of the outcomes; VW = van der Waerden quantile normal scores; AL = adjusted logarithmic transformation.

Mean Square Error

On average, when the latent trajectories are unbalanced, an increase in the sample size tends to raise the MSE of the VW-GMMs (Table 9). With unbalanced class proportions, the VW-GMMs have lower MSE when NT = 4, whereas with balanced proportions, lower MSE is obtained by models with more time points. Kurtosis influences the MSE of the VW-GMMs more than skewness does, with higher kurtosis yielding larger MSE. Compared with the VW-GMMs, the AL-GMMs generally have lower MSE, apart from a few cases where SK = 1.6 and KT = 2. However, larger sample sizes are associated with larger MSE for the AL-GMMs. AL-GMMs with unbalanced latent trajectories have a slightly higher MSE when NT = 4, except in a few cases. In terms of nonnormality, for KT = 2 the MSE is on average lower at SK = 1.2 and then rises at SK = 1.6. At the higher kurtosis level, an increase in skewness reduces the MSE, but this trend weakens as the sample size rises. In contrast to skewness, the kurtosis of the outcome distribution does not have a clear effect on the MSE of the AL-GMMs.

Table 9.

Mean Square Error in Study 2.

				SS = 300		SS = 800		SS = 1,500
CP	NT	SK	KT	VW	AL	VW	AL	VW	AL
0.5:0.5	4	0	0	5.651	0.087	4.332	0.060	3.377	0.058
		0.7	2	5.447	0.101	3.625	0.164	2.921	0.159
			4	5.596	0.830	3.710	0.528	2.967	0.219
		1.2	2	4.783	0.018	3.302	0.006	2.271	0.003
			4	5.619	0.187	3.718	0.223	2.973	0.199
		1.6	2	2.873	4.264	2.526	4.679	1.867	4.574
			4	5.459	0.024	3.558	0.019	3.885	0.050
	8	0	0	11.004	0.610	14.929	0.515	2.218	0.594
		0.7	2	10.829	0.863	14.631	1.915	15.523	1.295
			4	10.665	3.225	14.497	2.408	15.626	1.013
		1.2	2	1.343	0.067	1.053	0.041	1.074	0.036
			4	10.935	0.768	14.549	1.173	15.609	0.793
		1.6	2	1.122	3.922	1.105	2.748	1.109	2.649
			4	8.522	0.020	9.842	0.007	10.836	0.007
0.25:0.75	4	0	0	2.829	1.218	2.147	0.222	2.218	0.098
		0.7	2	2.727	0.353	2.347	0.419	2.218	0.364
			4	2.768	1.102	2.405	0.485	2.360	0.379
		1.2	2	2.354	0.028	2.317	0.007	2.282	0.004
			4	2.850	0.462	2.392	0.451	2.312	0.397
		1.6	2	2.720	3.633	3.616	2.960	4.131	2.345
			4	3.151	0.326	3.178	0.713	3.138	1.385
	8	0	0	13.697	4.166	18.302	4.304	18.696	4.183
		0.7	2	12.786	0.601	17.737	0.809	18.331	0.721
			4	12.826	1.265	17.707	0.760	18.242	0.708
		1.2	2	1.537	0.071	1.496	0.024	1.518	0.019
			4	12.658	0.741	17.852	0.770	18.297	0.740
		1.6	2	0.622	2.324	0.647	1.823	0.692	1.077
			4	9.897	0.310	12.314	0.365	12.846	0.287

Coverage

The VW-GMMs produce the worst coverage overall (Table 10), never exceeding 0.440 (the value for SS = 300, CP = 0.5:0.5, NT = 4, SK = 1.6, and KT = 2). Although the AL-GMMs exhibit better coverage overall, only a few conditions yield coverage within the range of .925 to .975 (Bandalos, 2006; Muthén & Muthén, 2000), which indicates that their coverage is still insufficient. In fact, the AL-GMMs have coverage between .940 and .970 in only five cases.

Table 10.

Coverage for a 95% Confidence Interval in Study 2.

				SS = 300		SS = 800		SS = 1,500
CP	NT	SK	KT	VW	AL	VW	AL	VW	AL
0.5:0.5	4	0	0	0.191	0.710	0.327	0.340	0.433	0.110
		0.7	2	0.189	0.707	0.347	0.250	0.423	0.180
			4	0.208	0.348	0.327	0.062	0.433	0.000
		1.2	2	0.184	0.920	0.160	0.970	0.090	0.960
			4	0.198	0.429	0.337	0.060	0.423	0.020
		1.6	2	0.440	0.060	0.220	0.020	0.170	0.010
			4	0.309	0.910	0.323	0.900	0.170	0.850
	8	0	0	0.075	0.450	0.000	0.150	0.000	0.040
		0.7	2	0.066	0.660	0.000	0.290	0.000	0.120
			4	0.076	0.172	0.000	0.030	0.000	0.000
		1.2	2	0.173	0.700	0.010	0.650	0.000	0.400
			4	0.044	0.440	0.000	0.100	0.000	0.020
		1.6	2	0.040	0.041	0.010	0.030	0.000	0.019
			4	0.064	0.940	0.020	0.920	0.000	0.920
0.25:0.75	4	0	0	0.398	0.640	0.263	0.340	0.160	0.100
		0.7	2	0.357	0.330	0.250	0.020	0.140	0.000
			4	0.378	0.095	0.230	0.040	0.110	0.000
		1.2	2	0.090	0.900	0.000	0.970	0.000	0.960
			4	0.354	0.160	0.230	0.000	0.120	0.000
		1.6	2	0.430	0.100	0.290	0.062	0.260	0.010
			4	0.160	0.840	0.030	0.720	0.000	0.490
	8	0	0	0.085	0.220	0.000	0.100	0.000	0.010
		0.7	2	0.074	0.250	0.020	0.071	0.010	0.030
			4	0.076	0.030	0.010	0.000	0.000	0.000
		1.2	2	0.080	0.840	0.000	0.780	0.000	0.630
			4	0.053	0.090	0.010	0.010	0.000	0.010
		1.6	2	0.120	0.177	0.110	0.146	0.020	0.095
			4	0.141	0.788	0.000	0.820	0.000	0.840

Note. Coverage within a range of .925 to .975 is marked in boldface. Normally distributed condition (skewness and kurtosis of 0) is also included for convenience. SS = sample size; CP = class proportion of each latent trajectory; NT = the number of time points; SK = skewness of the outcomes; KT = kurtosis of the outcomes; VW = van der Waerden quantile normal scores; AL = adjusted logarithmic transformation.

Overall, based on the number of conditions under which all four evaluation criteria (convergence rate, parameter bias, MSE, and coverage) are satisfied or reach allowable levels, the skew-t GMMs outperform the AL-GMMs. For the skew-t GMMs, the number of conditions that satisfy all four criteria increases as the sample size rises. In addition, balanced skew-t GMMs are preferable to their unbalanced counterparts in terms of meeting all four criteria. The AL-GMMs meet all four standards under only a single condition when the sample size is 300 and under only two conditions each for sample sizes of 800 and 1,500. This suggests that the skew-t distribution approach may be more appropriate for accurate parameter estimation regardless of the model structure (e.g., sample size, the subgroup size of each latent trajectory, and the number of outcome variables, i.e., time points) and the level of nonnormality in the outcomes (e.g., skewness and kurtosis).
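The tally of AL-GMM conditions can be reproduced from the tables. This sketch encodes, from Table 8 (bias) and Table 10 (coverage), only the four conditions whose AL bias is tolerable at every sample size; every other nonnormal AL condition already fails the bias cut-off in Table 8. It checks only the two criteria with explicit cut-offs in the text (absolute bias below 0.10 and coverage within .925 to .975). Convergence for these four conditions is at least .99 in Table 7, and no numeric MSE cut-off is stated, so neither is encoded.

```python
# AL-GMM results copied from Table 8 (bias) and Table 10 (coverage).
# Key: (CP, NT, SK, KT); value: {sample size: (bias, coverage)}.
AL_RESULTS = {
    ("0.5:0.5", 4, 1.2, 2): {300: (0.044, 0.920), 800: (0.009, 0.970), 1500: (0.014, 0.960)},
    ("0.5:0.5", 4, 1.6, 4): {300: (-0.030, 0.910), 800: (-0.068, 0.900), 1500: (-0.094, 0.850)},
    ("0.5:0.5", 8, 1.6, 4): {300: (0.034, 0.940), 800: (-0.013, 0.920), 1500: (-0.019, 0.920)},
    ("0.25:0.75", 4, 1.2, 2): {300: (0.004, 0.900), 800: (0.001, 0.970), 1500: (0.005, 0.960)},
}

def meets_cutoffs(bias, coverage, max_abs_bias=0.10, cov_range=(0.925, 0.975)):
    """True if bias is tolerable and coverage lies inside the cut-off range."""
    low, high = cov_range
    return abs(bias) < max_abs_bias and low <= coverage <= high

def count_passing(results):
    """Number of conditions meeting both cut-offs, per sample size."""
    counts = {}
    for by_size in results.values():
        for size, (bias, coverage) in by_size.items():
            counts[size] = counts.get(size, 0) + int(meets_cutoffs(bias, coverage))
    return counts

print(count_passing(AL_RESULTS))  # {300: 1, 800: 2, 1500: 2}
```

The counts agree with the single condition at SS = 300 and the two conditions each at SS = 800 and 1,500 reported above.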

However, in terms of efficiency, the skew-t GMMs require more computing time for parameter estimation than the AL-GMMs do. This disadvantage has been discussed in previous simulation research (e.g., Muthén & Asparouhov, 2015; Son et al., 2019): skew-t GMMs are slower than other approaches, particularly for larger sample sizes, because they must process the raw data at every step (Muthén & Asparouhov, 2015). Similarly, the skew-t GMMs in the present study require approximately 25 to 30 minutes of computation, whereas the AL-GMMs require only about 3 to 5 minutes, irrespective of the simulation conditions.

An Application to Reading Scores From the ECLS-K Database

This section applies a nonnormal linear growth model to the development of reading scores from kindergarten to first grade, using the Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLS-K) database (National Center for Education Statistics, 1998). Because Depaoli et al. (2019) have already illustrated how skewed these reading scores (i.e., the repeated measures) are, this study also uses these data to show the performance of growth modeling under a skew-t distribution.

The reading scores were scaled using item response theory (IRT) across four time points in the 1998 to 2000 panels of ECLS-K; note that the scores were reestimated based on an expanded item set that includes all items used from kindergarten through eighth grade. Children's reading achievement was assessed in the fall and spring of both kindergarten and first grade, spanning 18 months. Because the assessments were not equally spaced, the growth models reflected the unequal spacing of waves. For the purpose of this example, cases with missing data were removed, leaving a total sample of 4,176 children with IRT-scaled reading scores at all four time points. The nonnormality of the four repeated measures is shown in Figure 2.
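Figure 2 shows the nonnormality graphically. As a numeric complement, the sketch below computes moment-based sample skewness and excess kurtosis for each wave, assuming the convention in which a normal distribution has skewness 0 and kurtosis 0 (as the normal condition is coded in the simulation tables). The scores are hypothetical placeholders, not ECLS-K data.

```python
import statistics

def central_moment(xs, order):
    """Population-style central moment of the given order."""
    mean = statistics.fmean(xs)
    return sum((x - mean) ** order for x in xs) / len(xs)

def sample_skewness(xs):
    """Moment estimator g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def sample_excess_kurtosis(xs):
    """Moment estimator g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

# Hypothetical scores at two waves (illustration only).
waves = {
    "Time 1": [12.0, 14.0, 15.0, 15.0, 16.0, 18.0, 21.0, 27.0, 35.0, 48.0],
    "Time 4": [30.0, 34.0, 36.0, 38.0, 40.0, 40.0, 42.0, 44.0, 46.0, 50.0],
}
for label, scores in waves.items():
    print(label, round(sample_skewness(scores), 2), round(sample_excess_kurtosis(scores), 2))
```

These are the uncorrected moment estimators; small-sample-adjusted versions differ slightly but are negligible at n = 4,176.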

Figure 2.

Four time points of reading score histograms. (a) Time 1 (fall 1998). (b) Time 2 (spring 1999). (c) Time 3 (fall 1999). (d) Time 4 (spring 2000).

Following previous literature (Guerra-Peña & Steinley, 2016; Muthén & Asparouhov, 2015), BICs were compared across six approaches (i.e., four skew-t family distributions and two normalizing transformations) for different numbers of latent classes. As Figure 3 shows, the skew-t model had the lowest BIC values among the four skew-t family distributions (i.e., the normal, t, skew-normal, and skew-t). This result supports the present study and previous simulation studies (Depaoli et al., 2019; Guerra-Peña & Steinley, 2016; Muthén & Asparouhov, 2015; Son et al., 2019) in showing that relaxing the within-class normality assumption and substituting a skew-t distribution improves the performance of GMM for nonnormally distributed data. Additionally, the large gap between the BIC values of the one- and two-class models under normal growth modeling suggests that conventional growth modeling is highly likely to produce spurious classes under nonnormality. Meanwhile, the models based on the adjusted logarithmic transformation had lower BICs than those based on van der Waerden quantile normal scores, meaning that the adjusted logarithmic transformation performed better with these data. This result also echoes the findings of Study 2.
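BIC trades off fit against complexity as BIC = −2 ln L + p ln(n), where L is the maximized likelihood, p the number of free parameters, and n the sample size; the model with the smallest BIC is preferred. The sketch below applies this to choosing the number of latent classes with n = 4,176. The log-likelihoods and parameter counts are hypothetical and are not the values behind Figure 3.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """BIC = -2 lnL + p ln(n); smaller values are preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

N_CHILDREN = 4176  # analytic sample size reported in the text

# Hypothetical (log-likelihood, number of free parameters) per number of classes.
candidates = {
    1: (-60000.0, 10),
    2: (-59980.0, 21),
}
scores = {k: bic(ll, p, N_CHILDREN) for k, (ll, p) in candidates.items()}
best = min(scores, key=scores.get)
print({k: round(v, 1) for k, v in scores.items()}, "selected classes:", best)
```

In this illustration, the second class improves the log-likelihood by 20 but adds 11 parameters, each penalized by ln(4,176) ≈ 8.34, so the one-class model is retained.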

Figure 3.

Bayesian information criterion (BIC) of growth modeling for the reading scores. (a) Under four types of skew-t family distribution. (b) Under two types of normalizing transformations.

Based on the BIC results of the skew-t and adjusted logarithmic approaches, the one-class model was selected, because the BICs of the two-class models differed little from those of the one-class models. Unlike the simulation studies, which focused on the bias of parameter estimates (i.e., the bias of the logit for the probability of belonging to each latent class), only one latent class was found in these reading score data. Therefore, following the real-data illustration in Depaoli et al. (2019), the distance from the 50th percentile (i.e., median) of the observed scores to the estimated growth trajectory was calculated at each time point (see Figure 4). The distance was standardized to ease comparison, because the two transformation-based models (i.e., adjusted logarithmic and van der Waerden quantile normal scores) change the scale and units of the data. Figure 4 shows that skew-t modeling generally yielded growth means nearest to the median at each time point. At the fourth time point, where the reading scores were almost normally distributed (see Figure 2d), the skew-t growth model was farther from the median than the other models. In contrast, the normal and skew-normal models produced the nearest growth means only at the fourth time point. Meanwhile, both data transformation models were far from the median. Overall, growth modeling for the reading scores from the ECLS-K database was well specified with a skew-t distribution.
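For a one-class linear growth model, the model-implied mean at wave t is the intercept mean plus the slope mean times the time score of that wave, so unequal spacing enters only through the time scores. The sketch below computes these means and a standardized distance to the observed median at each wave. The standardization used for Figure 4 is not specified here, so dividing by the observed standard deviation of each wave is an assumption, and all inputs (time scores, growth estimates, scores) are hypothetical.

```python
import statistics

def implied_means(intercept, slope, time_scores):
    """Model-implied means of a linear growth model: intercept + slope * time score."""
    return [intercept + slope * t for t in time_scores]

def standardized_distances(observed_by_wave, model_means):
    """Assumed standardization: |model mean - observed median| / observed SD per wave."""
    distances = []
    for scores, mu in zip(observed_by_wave, model_means):
        median = statistics.median(scores)
        sd = statistics.stdev(scores)
        distances.append(abs(mu - median) / sd)
    return distances

# Hypothetical inputs (illustration only).
time_scores = [0.0, 0.6, 1.0, 1.6]  # unequally spaced waves, in years
observed = [
    [18.0, 20.0, 22.0, 25.0, 31.0],
    [25.0, 28.0, 30.0, 34.0, 41.0],
    [30.0, 33.0, 36.0, 40.0, 48.0],
    [45.0, 50.0, 55.0, 60.0, 65.0],
]
means = implied_means(intercept=22.5, slope=20.0, time_scores=time_scores)
for wave, d in enumerate(standardized_distances(observed, means), start=1):
    print(f"Time {wave}: standardized distance = {d:.2f}")
```

For the transformation-based models, the implied means and the observed medians would both be on the transformed scale, which is why a scale-free distance is needed to compare them with the untransformed models.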

Figure 4.

Standardized distance from the 50th percentile (median) of the data.

Summary

In the present study, five factors were manipulated: sample size (300, 800, and 1,500), the skewness of the outcome variables (0.7, 1.2, and 1.6), the kurtosis of the outcome variables (2 and 4), the number of time points (4 and 8), and the class proportion of each latent trajectory (0.5:0.5 and 0.25:0.75). These factors were applied to the simulation models in both Study 1 and Study 2 to represent more realistic and diverse data conditions, especially in terms of the degree of kurtosis and unbalanced class proportions of latent trajectories. In total, 72 simulation conditions (3 × 3 × 2 × 2 × 2) were generated, and 100 replications were run for each. Based on these simulation models, the following conclusions can be drawn.
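The factorial design can be enumerated directly as a check on the count of 72 conditions. This sketch assumes a fully crossed design with 100 replications per cell, as stated above; the normal reference condition (SK = 0, KT = 0) shown in the tables is not one of the 72 cells.

```python
from itertools import product

SAMPLE_SIZES = (300, 800, 1500)
SKEWNESS = (0.7, 1.2, 1.6)
KURTOSIS = (2, 4)  # kurtosis levels as coded in the tables
TIME_POINTS = (4, 8)
CLASS_PROPORTIONS = ("0.5:0.5", "0.25:0.75")
REPLICATIONS = 100

# Fully crossed design: one tuple per simulation condition.
conditions = list(product(SAMPLE_SIZES, SKEWNESS, KURTOSIS, TIME_POINTS, CLASS_PROPORTIONS))
datasets_per_study = len(conditions) * REPLICATIONS
print(len(conditions), datasets_per_study)
```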

First, of the four distributions tested in Study 1, the skew-t distribution was the most appropriate for nonnormal GMMs in terms of obtaining unbiased logits of class probability when two subgroups were assumed to exist in the true population. The present study demonstrated that skew-t GMMs produced the most accurate parameter estimates, echoing previous simulation studies (see, e.g., Depaoli et al., 2019; Muthén & Asparouhov, 2015; Son et al., 2019). Consistent with Son et al. (2019), the absolute bias was less than 0.1 in most conditions, and the MSE was lower than for the other three distributions. Additionally, the number of conditions meeting the cut-off criteria was highest for skew-t GMMs. The skew-t GMMs also obtained much better results in general when the sample size was larger (i.e., SS = 1,500), which supports the results of Depaoli et al. (2019) and Son et al. (2019). Because previous studies paid less attention to the extent to which class proportions affect estimation bias, the finding that skew-t GMMs performed better when the class proportion in the population was balanced is a notable contribution of this study. Meanwhile, skew-t GMMs performed reasonably well with normally distributed outcome variables, whereas normal GMMs performed poorly with nonnormally distributed outcome variables (see Tables 3–6 for details), which is also similar to the results of Depaoli et al. (2019).

Even with the advantages of using a skew-t distribution in nonnormal GMM, caution is required under certain circumstances. When the sample size is small (see, e.g., Depaoli et al., 2019; Son et al., 2019) and the class proportion of the latent trajectories is unbalanced, the estimates produced by skew-t GMMs may be partially affected by high levels of kurtosis. The MSE can also be higher, and the coverage is unlikely to fall within the range of .925 to .975. Furthermore, it should be noted that the convergence rate of skew-t GMMs falls as the nonnormality level increases when there are unbalanced latent class sizes. Consistent with the results of Depaoli et al. (2019), fitting the model to a skew-t distribution requires much greater computational resources than other distributions (i.e., the normal or t in this article). Nevertheless, in terms of the overall results, skew-t GMMs may prove useful for obtaining accurate parameter estimates when the outcome variables follow a nonnormal distribution.

Second, the performance of the models based on adjusted logarithmic transformation was generally better than that of the models based on van der Waerden quantile normal scores. The AL-GMMs consistently produced high rates of convergence regardless of the simulation design. Compared with VW-GMMs, more conditions produced unbiased estimates, smaller MSEs, and allowable coverage when using AL-GMMs.

Third, adopting a skew-t distribution as an approach to nonnormal GMM can be recommended for two primary reasons. First, the skew-t GMMs overwhelmingly outperformed the AL-GMMs, with much more accurate estimation and greater robustness under various simulated conditions. Second, the results arising from data transformation approaches (e.g., the mean and variance of the intercept and slope) are known to be difficult to interpret (Agresti & Finlay, 2009). The computational efficiency of AL-GMMs may not compensate for this issue of unclear interpretation.

Discussion

By investigating more diverse simulation conditions that better reflect real-life data sets, it is expected that the present study will offer useful guidelines for applied researchers in social science when approaching nonnormal GMM. The main implications of this simulation study are as follows.

First, this article demonstrated that the most accurate parameter estimation comes from skew-t GMMs, which supports the findings of previous simulation studies (see, e.g., Depaoli et al., 2019; Muthén & Asparouhov, 2015; Son et al., 2019), although there is one main difference in how nonnormality is modeled: this article dealt with nonnormally distributed manifest variables, whereas those three articles dealt with nonnormally distributed latent growth factors. Most results of this simulation study agree with those of previous studies, for example, the better performance of skew-t GMMs with larger sample sizes (Depaoli et al., 2019; Son et al., 2019) and the greater computing time required by skew-t GMMs (Depaoli et al., 2019). Accordingly, one contribution of this article lies in considering different levels of kurtosis in the repeated measures and more practical class proportions. This study examined the performance of skew-t GMMs over a wider range of data in terms of estimation accuracy, particularly for the logit of class probability, thus expanding previous research on biased parameter estimates caused by nonnormality in GMM. It newly revealed that a high degree of kurtosis in the distribution of outcomes and unbalanced subgroup sizes can reduce the accuracy of skew-t GMMs.

Second, the applicability of data transformation to nonnormal GMM was investigated. Although a few normalizing methods have been tested with other mixture models, such as LPA, no previous research has investigated data transformation in simulations of nonnormal GMM. This article focused on two methods proposed in other studies (i.e., the adjusted logarithmic transformation and van der Waerden quantile normal scores) and found that the adjusted logarithmic transformation may be more appropriate for nonnormal repeated measures in GMM than van der Waerden quantile normal scores. Specifically, in contrast to the nonnormal LPA results of Morgan et al. (2016), van der Waerden quantile normal scores produced inaccurate parameter estimates in nonnormal GMM. This difference might arise from whether growth factors are modeled. Meanwhile, the results were similar to those of Kupek (2005) regarding the suitability of log-linear transformation within the SEM framework. However, the aim of normalization differs substantially: Kupek (2005) normalized binary variables for linearization, whereas this article focused on nonnormally distributed outcomes. The modeling approach to the outcomes (i.e., longitudinal or not) also differs from that of Kupek (2005), so this simulation study contributes by examining the appropriateness of the logarithmic transformation in nonnormal GMM.

Third, this study compared the performance of an alternative distribution approach (i.e., skew-t GMMs) with that of a data transformation approach (i.e., AL-GMMs). It was found that fitting a model to normalized outcomes can reduce computation time, but it does not guarantee accurate parameter estimates or interpretable results. Therefore, this study offers guidelines for applied researchers on which approach is more suitable when repeated measures in GMM follow a nonnormal distribution: in this article, most evaluation criteria favored skew-t GMMs over AL-GMMs across various forms of nonnormality in the outcomes.
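
One way to see why estimates on a transformed scale can be harder to interpret is that averaging on the log scale and back-transforming does not recover the arithmetic mean of the original outcome. The minimal sketch below assumes a shifted log transform ln(x + c), where c is a hypothetical shift constant; with c = 0 the back-transformed value is the geometric mean, and by Jensen's inequality it never exceeds the arithmetic mean.

```python
import math
from statistics import fmean


def back_transformed_mean(values: list[float], shift: float = 0.0) -> float:
    """Average on the ln(x + shift) scale, then map back with exp(.) - shift.

    Assumes x + shift > 0 for every value. With shift = 0 this is the geometric
    mean, which never exceeds the arithmetic mean of positive data.
    """
    log_mean = fmean([math.log(x + shift) for x in values])
    return math.exp(log_mean) - shift


# Hypothetical right-skewed outcome values
outcome = [1.0, 10.0, 100.0]
print(fmean(outcome))                  # arithmetic mean on the original scale
print(back_transformed_mean(outcome))  # back-transformed log-scale mean
```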

As with any Monte Carlo simulation study, these findings and implications generalize only to the conditions examined. Even though the design factors and manipulated conditions were based on previous empirical research, the results may not apply directly to every situation, because the design could not cover every case. Accordingly, this article has certain limitations, and the associated recommendations for future research should be noted.

First, future research should extend the present findings by considering conditions that were not included in this simulation design but are characteristic of real data. For example, the kurtosis of the repeated measures could be increased to cover more extreme nonnormality (e.g., adolescent alcohol use, which had a skewness of 1.99 to 10.69 in D’Amico et al., 2016). Only two levels of kurtosis were manipulated in this study, so assessing higher levels could provide further insight. Additionally, this article focused only on positively skewed data, so caution is needed when applying its results to other situations such as negative skewness. Future research could address this point by considering various types of skewed data.
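
For checking how far observed repeated measures depart from normality, a minimal sketch of the moment-based sample skewness g1 = m3 / m2^(3/2) and excess kurtosis g2 = m4 / m2^2 - 3 is given below, where m_k is the k-th central moment. By convention it reports excess kurtosis (0 for a normal distribution); whether the kurtosis levels in this design (2 and 4) are on the excess scale is an assumption not restated in this section. Statistical software often applies small-sample bias corrections, so its values can differ slightly from these.

```python
from statistics import fmean


def central_moment(values: list[float], order: int) -> float:
    """k-th central moment with divisor n: mean of (x - mean)**k."""
    mean = fmean(values)
    return fmean([(x - mean) ** order for x in values])


def skewness(values: list[float]) -> float:
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values: list[float]) -> float:
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical right-skewed scores at one time point
scores = [0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 3.0, 10.0]
print(round(skewness(scores), 3), round(excess_kurtosis(scores), 3))
```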

Second, it would be helpful to analyze more complex GMMs. For instance, only two latent classes were assumed in this study; future research could include more subgroups in the true population (e.g., three or four classes, which are common in applied research). Additionally, the latent growth factors could include higher order terms, such as quadratic or cubic factors, to track the developmental path of the variable of interest more precisely.
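
Higher order terms enter the model-implied class means as a polynomial in the time scores. As a minimal sketch with hypothetical growth-factor means and time scores coded 0, 1, 2, 3 (neither taken from the article), the class-specific mean trajectory sums each growth-factor mean times the time score raised to the matching power, and the population mean trajectory weights each class trajectory by its class proportion.

```python
def class_mean_trajectory(growth_means: list[float], time_scores: list[float]) -> list[float]:
    """Model-implied class mean at each time point: sum over p of growth_means[p] * t**p.

    growth_means = [intercept, linear] gives the linear model; appending a
    quadratic (and then cubic) mean adds the higher order terms.
    """
    return [
        sum(coef * t ** power for power, coef in enumerate(growth_means))
        for t in time_scores
    ]


def mixture_mean_trajectory(class_props: list[float], class_trajectories: list[list[float]]) -> list[float]:
    """Population mean trajectory: proportion-weighted average of class trajectories."""
    n_times = len(class_trajectories[0])
    return [
        sum(p * traj[j] for p, traj in zip(class_props, class_trajectories))
        for j in range(n_times)
    ]


# Hypothetical growth-factor means for two classes observed at four time points
times = [0.0, 1.0, 2.0, 3.0]
linear_class = class_mean_trajectory([1.0, 2.0], times)
quadratic_class = class_mean_trajectory([1.0, 2.0, 0.5], times)
print(linear_class, quadratic_class)
print(mixture_mean_trajectory([0.25, 0.75], [linear_class, quadratic_class]))
```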

Third, class enumeration in nonnormal GMM should be reviewed thoroughly along with parameter bias. Correctly deciding on the number of latent classes is also important because this number may not be known a priori. The present study highlighted a comparison of parameter bias across the four distribution types and the data transformation approaches. Even though Muthén and Asparouhov (2015) demonstrated that skew-t GMMs can detect the correct number of latent classes based on BIC under nonnormal data, other information criteria and likelihood ratio tests (e.g., AIC, SBIC, LMR-LRT, and BLRT) also need to be examined for skew-t GMMs with nonnormal outcomes.
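
The information criteria named above share one form: -2 times the log-likelihood plus a penalty on the number of free parameters p, which depends on the sample size n for BIC and SBIC. A minimal sketch follows, assuming SBIC denotes the sample-size adjusted BIC of Sclove (1987) with penalty p ln((n + 2)/24); lower values indicate the preferred model. The log-likelihoods and parameter counts in the usage lines are made up for illustration, and the likelihood ratio tests (LMR-LRT and BLRT), which require additional computation, are not sketched here.

```python
import math


def aic(loglik: float, n_params: int) -> float:
    """Akaike (1974): -2 log L + 2p."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Schwarz (1978): -2 log L + p ln(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def sabic(loglik: float, n_params: int, n_obs: int) -> float:
    """Sample-size adjusted BIC (Sclove, 1987): -2 log L + p ln((n + 2) / 24)."""
    return -2.0 * loglik + n_params * math.log((n_obs + 2) / 24)


# Hypothetical fits of a one-class and a two-class model to n = 800 cases
# (log-likelihoods and parameter counts are invented for illustration)
candidates = {"1 class": (-5000.0, 10), "2 classes": (-4980.0, 16)}
for label, (ll, p) in candidates.items():
    print(label, round(aic(ll, p), 1), round(bic(ll, p, 800), 1), round(sabic(ll, p, 800), 1))
```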

In conclusion, the findings of this study are expected to help researchers gain confidence in selecting the most appropriate approach for nonnormal outcome variables in growth mixture modeling.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Yeji Nam

References

Agresti & Finlay (2009). Statistical methods for the social sciences. Pearson Prentice Hall.

Akaike (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723. https://doi.org/10.1109/TAC.1974.1100705

Asparouhov & Muthén, B. O. (2016). Structural equation models and mixture models with continuous nonnormal skewed distributions. Structural Equation Modeling: A Multidisciplinary Journal, 23(1), 1-19. https://doi.org/10.1080/10705511.2014.947375

Azzalini & Dalla Valle, A. (1996). The multivariate skew-normal distribution. Biometrika, 83, 715-726. https://doi.org/10.1093/biomet/83.4.715

Bandalos, D. L. (2006). The use of Monte Carlo studies in structural equation modeling research. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (pp. 385-462). Information Age.

Bauer, D. J. (2007). Observations on the use of growth mixture models in psychological research. Multivariate Behavioral Research, 42(4), 757-786. https://doi.org/10.1080/00273170701710338

Bauer, D. J., & Curran, P. J. (2003). Distributional assumptions of growth mixture models: Implications for overextraction of latent trajectory classes. Psychological Methods, 8(3), 338-363. https://doi.org/10.1037/1082-989X.8.3.338

Bauer, D. J., & Curran, P. J. (2004). The integration of continuous and discrete latent variable models: Potential problems and promising opportunities. Psychological Methods, 9(1), 3-29. https://doi.org/10.1037/1082-989X.9.1.3

Boers, Reinecke, Seddig, & Mariotti (2010). Explaining the development of adolescent violent delinquency. European Journal of Criminology, 7(6), 499-520. https://doi.org/10.1177/1477370810376572

Brandt & Klein, A. G. (2015). A heterogeneous growth curve model for nonnormal data. Multivariate Behavioral Research, 50(4), 416-435. https://doi.org/10.1080/00273171.2015.1022639

D’Amico, E. J., Tucker, J. S., Miles, J. N., Ewing, B. A., Shih, R. A., & Pedersen, E. R. (2016). Alcohol and marijuana use trajectories in a diverse longitudinal sample of adolescents: Examining use patterns from age 11 to 17 years. Addiction, 111(10), 1825-1835. https://doi.org/10.1111/add.13442

Depaoli, Winter, S. D., Lai, & Guerra-Peña (2019). Implementing continuous non-normal skewed distributions in latent growth mixture modeling: An assessment of specification errors and class enumeration. Multivariate Behavioral Research, 54(6), 795-821. https://doi.org/10.1080/00273171.2019.1593813

Fan, Felsővályi, Á., Sivo, S. A., & Keenan, S. C. (2002). SAS for Monte Carlo studies: A guide for quantitative researchers. SAS Institute.

Feldman, B. J., Masyn, K. E., & Conger, R. D. (2009). New approaches to studying problem behaviors: A comparison of methods for modeling longitudinal, categorical adolescent drinking data. Developmental Psychology, 45(3), 652-676. https://doi.org/10.1037/a0014851

Feng & Wang, X. M. (2013). Log transformation: Application and interpretation in biomedical research. Statistics in Medicine, 32(2), 230-239. https://doi.org/10.1002/sim.5486

Finch, J. F., West, S. G., & MacKinnon, D. P. (1997). Effects of sample size and nonnormality on the estimation of mediated effects in latent variable models. Structural Equation Modeling, 4(2), 87-107. https://doi.org/10.1080/10705519709540063

Fleishman (1978). A method for simulating non-normal distributions. Psychometrika, 43(4), 521-532. https://doi.org/10.1007/BF02293811

Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9(4), 466-491. https://doi.org/10.1037/1082-989X.9.4.466

Frühwirth-Schnatter & Pyne (2010). Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions. Biostatistics, 11(2), 317-336. https://doi.org/10.1093/biostatistics/kxp062

Guerra-Peña & Steinley (2016). Extracting spurious latent classes in growth mixture modeling with nonnormal errors. Educational and Psychological Measurement, 76(6), 933-953. https://doi.org/10.1177/0013164416633735

Jung & Wickrama, K. A. S. (2008). An introduction to latent class growth analysis and growth mixture modeling. Social and Personality Psychology Compass, 2(1), 302-317. https://doi.org/10.1111/j.1751-9004.2007.00054.x

Kaplan (1988). The impact of specification error on the estimation, testing and improvement of structural equation models. Multivariate Behavioral Research, 23(1), 69-86. https://doi.org/10.1207/s15327906mbr2301_4

Kline, R. B. (2016). Data preparation and psychometrics review. In Principles and practice of structural equation modeling (4th ed., pp. 64-96). Guilford Press.

Kupek (2005). Log-linear transformation of binary variables: A suitable input for SEM. Structural Equation Modeling, 12(1), 28-40. https://doi.org/10.1207/s15328007sem1201_2

Lee & McLachlan, G. J. (2014). Finite mixtures of multivariate skew t-distributions: Some recent and new results. Statistics and Computing, 24(2), 181-202. https://doi.org/10.1007/s11222-012-9362-4

Lin, T. I., Lee, J. C., & Hsieh, W. J. (2007). Robust mixture modeling using the skew-t distribution. Statistics and Computing, 17(2), 81-92. https://doi.org/10.1007/s11222-006-9005-8

Lo, Y., Mendell, N. R., & Rubin, D. B. (2001). Testing the number of components in a normal mixture. Biometrika, 88(3), 767-778. https://doi.org/10.1093/biomet/88.3.767

Huang (2014). Bayesian analysis of nonlinear mixed-effects mixture models for longitudinal data with heterogeneity and skewness. Statistics in Medicine, 33(16), 2701-2880. https://doi.org/10.1002/sim.6136

McLachlan, G. J., & Peel (2000). Finite mixture models. Wiley.

Morgan, G. B., Hodge, K. J., & Baggett, A. R. (2016). Latent profile analysis with nonnormal mixtures: A Monte Carlo examination of model selection using fit indices. Computational Statistics & Data Analysis, 93, 146-161. https://doi.org/10.1016/j.csda.2015.02.019

Muthén, B. O. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (Ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Sage.

Muthén, B. O., & Asparouhov (2015). Growth mixture modeling with non-normal distributions. Statistics in Medicine, 34(6), 1041-1058. https://doi.org/10.1002/sim.6388

Muthén, B. O., & Muthén, L. K. (1998-2017). Mplus user’s guide (8th ed.). https://www.statmodel.com/download/usersguide/MplusUserGuideVer_8.pdf

Muthén, B. O., & Muthén, L. K. (2000). Integrating person-centered and variable-centered analyses: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical and Experimental Research, 24(6), 882-891. https://doi.org/10.1111/j.1530-0277.2000.tb02070.x

National Center for Education Statistics. (1998). National Education Longitudinal Study of 1988. U.S. Department of Education.

Schwarz (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461-464. https://doi.org/10.1214/aos/1176344136

Sclove, S. L. (1987). Application of model-selection criteria to some problems in multivariate analysis. Psychometrika, 52(3), 333-343. https://doi.org/10.1007/BF02294360

Svolba (2006). Data mart coding and content. In Data preparation for analytics: Using SAS (pp. 105-160). SAS Institute.

Son, Lee, Jang, Yang, & Hong (2019). A comparison of different nonnormal distributions in growth mixture models. Educational and Psychological Measurement, 79(3), 577-597. https://doi.org/10.1177/0013164418823865

Stanley, Kellermanns, F. W., & Zellweger, T. M. (2017). Latent profile analysis: Understanding family firm profiles. Family Business Review, 30(1), 84-102. https://doi.org/10.1177/0894486516677426

Vale & Maurelli (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48(3), 465-471. https://doi.org/10.1007/BF02293687

van der Waerden, B. L. (1952). Order tests for the two-sample problem and their power. Indagationes Mathematicae, 14, 453-458. https://doi.org/10.1016/S1385-7258(52)50063-5

Wickrama, K. A. S., Lee, T. K., O’Neal, C. W., & Lorenz, F. O. (2016). An introduction to growth mixture models (GMMs). In Higher-order growth curves and mixture modeling with Mplus: A practical guide (pp. 209-226). Routledge.

Yuan, K. H., Chan, & Bentler, P. M. (2000). Robust transformation with applications to structural equation modelling. British Journal of Mathematical and Statistical Psychology, 53(1), 31-50. https://doi.org/10.1348/000711000159169