Abstract
This study proposes transformation functions and matrices between coefficients in the original and reparameterized parameter spaces for an existing linear-linear piecewise model to derive the interpretable coefficients directly related to the underlying change pattern. Additionally, the study extends the existing model to allow individual measurement occasions and investigates predictors for individual differences in change patterns. We present the proposed methods with simulation studies and a real-world data analysis. Our simulation study demonstrates that the method can generally provide an unbiased and accurate point estimate and appropriate confidence interval coverage for each parameter. The empirical analysis shows that the model can estimate the growth factor coefficients and path coefficients directly related to the underlying developmental process, thereby providing meaningful interpretation.
Keywords
1. Introduction
Longitudinal studies of change are popular in various disciplines to evaluate individual growth over time. If a process under investigation is followed for a long enough time duration, it is likely to exhibit some degree of nonlinear change in which the curve has a nonconstant relationship to recorded time. One possible model to examine nonlinear individual change patterns is a piecewise linear latent growth model (Chou et al., 2004; Cudeck & Harring, 2010; Harring et al., 2006; Kohli, 2011; Kohli & Harring, 2013; Kohli et al., 2013; Sterba, 2014), also known as the linear spline growth model (Grimm et al., 2016, Chapter 11). By using at least two linear pieces, the linear spline (or piecewise linear) functional form can approximate more complex underlying change patterns. Previous studies have demonstrated broad application potential for linear spline trajectory models. For example, researchers have employed the model to assess postsurgical rehabilitation (Dumenci et al., 2019; Riddle et al., 2015), the learning process of a specific task (Cudeck & Klebe, 2002), the onset of alcohol abuse (Li et al., 2001), and intellectual development (Marcoulides, 2018).
One major challenge of analyzing linear piecewise change patterns is to decide the location of the change-point or “knot” that indicates the occurrence of the change in the developmental process. Although empirical researchers can specify the knot location a priori driven by domain knowledge, for example, Dumenci et al. (2019), Flora (2008), Riddle et al. (2015), and Sterba (2014), the knot is unknown in most cases. Marcoulides (2018) proposed to fit a pool of candidate models with knots at different locations and select the “best” one using the Bayesian information criterion (BIC). By building a multiphase mixed-effects model, Cudeck and Klebe (2002) demonstrated that the change-point can also be a random coefficient that allows individuals to transition from one phase to another at different time points. Earlier studies have demonstrated how to estimate knots using frequentist (Cudeck & du Toit, 2003; Harring et al., 2006; Kohli, 2011; Kohli & Harring, 2013; Kohli et al., 2013; Kwok et al., 2010; Preacher & Hancock, 2015) and Bayesian (Dominicus et al., 2008; Kohli, Hughes, et al., 2015; Lock et al., 2018; McArdle & Wang, 2008; Muniz Terrera et al., 2011; Wang & McArdle, 2008) mixed-effects models and growth models.
One intuitive longitudinal model with a linear spline functional form is a bilinear spline growth model (or linear-linear piecewise model) with an unknown knot. It can approximate complex nonlinear change patterns to answer specific research questions about when one developmental stage ends and the other one starts, as well as the development rate of each phase (Kohli, Hughes, et al., 2015; Kohli, Sullivan, et al., 2015; Sullivan et al., 2017). The piecewise linear-linear growth curve model can be fit in the mixed-effects modeling framework and the latent growth curve (LGC) modeling framework to estimate a knot. In the mixed-effects modeling framework, Kohli et al. (2016) estimated the knot with both fixed and random effects along with coefficients of the initial status and the rate of change and extended the model to the finite mixture modeling framework by implementing an R routine fitPMM (Zopluoglu et al., 2014) constructed on the marginal maximum likelihood estimation (MLE) procedure described in Du Toit and Cudeck (2009). Note that the marginal MLE procedure was proposed because the linear-linear piecewise growth model with a random knot is a nonlinear random coefficient model (NRCM), of which the joint likelihood function does not have a closed form and therefore cannot be calculated explicitly (Du Toit & Cudeck, 2009; Kohli et al., 2016).
In the LGC framework, Harring et al. (2006) developed a linear-linear piecewise model with an unknown fixed knot with a unified functional form of two linear pieces through reparameterization, which is realized by reexpressing growth factors as linear combinations of their original forms. Kohli (2011), Kohli et al. (2013), and Grimm et al. (2016, Chapter 11) derived the growth coefficients that are directly related to the underlying change patterns from the estimated coefficients. By implementing this model, Kohli et al. (2013) examined the development in each stage and estimated a fixed knot for procedural learning task research.
Preacher and Hancock (2015) extended the model with an unknown fixed knot to estimate the knot’s variance simultaneously, in which the knot location is allowed to vary and then is viewed as an additional growth factor besides the intercept and two slopes in the LGC modeling framework. More importantly, they considered incorporating obesity status to predict varying knot locations. They demonstrated that the linear-linear functional form with an unknown random knot is useful by reexamining a data set containing longitudinal records of plasma phosphate. To specify the NRCM in the structural equation modeling (SEM) framework, Preacher and Hancock (2015) linearized the model by reparameterizing growth factors through the Taylor series expansion (Browne & du Toit, 1991). One drawback of this linearized model lies in that it is fit in a reparameterized framework. Some reparameterized coefficients were no longer directly related to the specific research questions answered by employing such functional form and therefore lacked meaningful, substantive interpretation.
Some existing methodological studies, for example, Bauer (2003) and Curran (2003), have shown that the LGC modeling framework is mathematically equivalent to the mixed-effects model in most cases. However, researchers implemented different approximation methods on the NRCMs in the two frameworks: the marginal MLE procedure and the Taylor series expansion for mixed-effects and LGC modeling framework, respectively, as stated above. Specifically, the marginal MLE is utilized during the optimization process. The Taylor series expansion is employed to rewrite the repeated outcome as a linear combination of growth factors (i.e., prior to the optimization process), so that the normal theory of MLE is still applicable, which allows for manageable extensions, for example, adding covariates to account for variability in growth factors. 1 Based on this consideration, we propose inverse-transformation matrices for the model proposed by Preacher and Hancock (2015) to derive the growth factor coefficients in the original frame so that all coefficients are related to the specific research interest answered by utilizing the linear-linear piecewise functional form and interpretable, although these coefficients can be directly estimated in the mixed-effects modeling framework. In addition, we extend the model proposed by Preacher and Hancock (2015) by allowing time-invariant covariates (TICs) to predict individual differences in the initial status and rate-of-change of each stage, so that these covariates may account for the heterogeneity in trajectories instead of that only in knot locations. More importantly, we conducted simulation studies to assess how well the Taylor series approximation, which is necessary for reparameterization, works under various conditions.
Last, we propose the model in the framework of individual measurement occasions (Cook & Ware, 1983; Finkel et al., 2003; Mehta & West, 2000). This can occur when time is measured precisely (e.g., exact date and time of measurement) or self-initiated or randomly assigned responses. For instance, in an adolescent smoking study, participants were asked to complete a questionnaire on pocket computers immediately after smoking (Hedeker et al., 2006). Earlier studies, for example, Preacher and Hancock (2015) and Sterba (2014), have demonstrated that one possible way to fit growth models with individual measurement occasions is the definition variable approach, in which the “definition variables” are defined as observed variables that are employed to adjust model parameters to individual-specific values (Mehta & Neale, 2005; Mehta & West, 2000). In our case, these individual-specific values are individual measurement occasions.
The remainder of this article is organized as follows. In the Method section, we first present the model specification of the bilinear spline growth model with TICs to estimate a knot with its variability (the full model). Next, we introduce how to reparameterize growth factors and the corresponding path coefficients from the TICs to the growth factors to make them estimable in the LGC modeling framework and transform them back after estimation for interpretation purposes. Following that, we propose one parsimonious model to estimate the knot without variability (the reduced model). We then describe the model estimation as well as the Monte Carlo simulation design for model evaluation. In the Result section, we evaluate the model performance with regard to the nonconvergence rate, the proportion of improper solutions, as well as the performance measures, including the relative bias, the empirical standard error (SE), the relative root mean squared error (RMSE), and the empirical coverage probabilities for a nominal 95% confidence interval (CI) of each parameter of interest. In the Application section, by applying the proposed model to a longitudinal data set of mathematics achievement scores from the Early Childhood Longitudinal Study, Kindergarten Class of 2010–2011 (ECLS-K: 2011), we give a set of feasible recommendations for real-world practice. Finally, a broad discussion is presented regarding practical considerations, methodological considerations, and future directions.
2. Method
2.1. Model Specification
This section briefly describes a linear-linear growth curve model with an unknown random knot and TICs used to analyze nonlinear change patterns, estimate the knot with its variability, and predict individual differences in trajectories. As shown in Figure 1, the linear-linear piecewise latent growth model, which is an extension of the linear growth model, specifies a separate linear function for each of the two stages of the developmental process for each individual. We can express the measure for the ith individual at the
where

Within-individual change over time with bilinear spline functional form.
Note that we cannot fit the linear-linear piecewise change pattern specified in Equation 1 in the LGC modeling framework for two reasons. First, the outcome variable
where
and
respectively, where
where
2.2. Transformation and Inverse-Transformation Between Two Parameter Spaces
When fitting a model, especially a complex model, in the SEM framework, selecting a proper set of starting values can improve the likelihood of convergence and accelerate the computational process. Generally, descriptive statistics and visualization are tools for researchers to decide suitable starting values for the parameters of interest. However, it may not be straightforward for the reparameterized coefficients. Accordingly, the transformation from parameters in the original frame to those in the reparameterized setting help decide appropriate starting values. More importantly, some reparameterized coefficients are not straightforward to relate to developmental theory and then need to be transformed back to be interpretable after estimation. So the inverse transformation that helps derive the coefficients in the original frame is also needed.
By using the multivariate delta method (Lehmann & Casella, 1998, Chapter 1), we can transform the point estimates (that are the MLEs as we use full information maximum likelihood [FIML]) of coefficients in the two parameter-spaces. Suppose
where
Similarly, suppose
where
Based on Equations 6 and 7, we can make the transformation between the growth factor means of two parameter spaces by
and
respectively.
When regressing growth factors on the individual-level covariates as we did in Equation 5, we also need to rereparameterize the growth factor intercepts and path coefficients. We center the covariates to simplify the calculation. On the one hand, the vector
We obtain the SEs around the points estimates for the parameters in the original frame (i.e., the derived parameters) as follows. We first have the point estimates and the corresponding SEs of the coefficients in the reparameterized frame (i.e., the free parameters) directly. Note that these SEs are based on Fisher Information since we employ FIML. The inverse-transformation function and matrix described earlier define the relationships between the free parameters and the derived parameters. There are two common approaches to get the SEs around the point estimates for the derived parameters: an analytical approach and the delta method. Note that these two methods are equivalent if all the defined relationships are linear. We present how to calculate the SEs for the derived parameters in Online Appendix A.4. Additionally, SEM computer programs, such as Mplus and the R package OpenMx, are capable of providing the point estimates and corresponding SEs for the derived parameters automatically when we define the relationships.
When fitting the model using software such as the R package OpenMx, which allows for matrix calculation on those estimates from the model by the function mxAlgebra() (Boker et al., 2020), we only need to provide
2.3. Model Estimation
For the ith individual, the expected mean vector and the variance-covariance structure of the repeated measurements of the model given in Equations 2 and 5 can be expressed as
and
where
The parameters in the model given in Equations 2 and 5 include the mean vector and unexplained variance-covariance matrix of reparameterized growth factors, the reexpressed path coefficients as well as the means and the variance-covariance structure of covariates. Then, we can obtain the parameters in the original setting, as shown in Section 2.2.
and
list the parameters in the original and reparameterized frames, respectively.
We then use FIML to estimate
and
respectively, where C is a constant, n is the number of individuals, and
2.4. Reduced Model
With an assumption that the knot of each individual is roughly similar, we can fix the between-individual differences in the knot to 0 and construct a reduced model to estimate a knot without considering variability. We can specify the reduced model as
where
and
The growth factor intercepts
For the ith individual, the expected mean vector and the variance-covariance matrix of the repeated outcomes of this model are
and
respectively. For this reduced model,
and
and they list the parameters in the original and reparameterized settings, respectively. By replacing
3. Model Evaluation
We evaluated the proposed method using a Monte Carlo simulation study with two goals. The first goal is to assess how the approximation based on the Taylor series expansion affects the performance measures, including the relative bias, the empirical SE, the relative RMSE, and the empirical coverage probability (CP) of a nominal 95% CI of each parameter. We provide the definitions and estimates of these performance metrics in Table 1. The second goal is to examine how the reduced model performed sufficiently well compared to the full model.
Performance Metric: Definitions and Estimates
Note.
In the simulation study, we followed Morris et al. (2019) and decided the number of replications S = 1,000 by an empirical approach. We conducted a pilot simulation run and found that SEs of all coefficients except the unexplained intercept variance (i.e.,
3.1. Design of Simulation Study
We list all simulation conditions in Table 2. As mentioned earlier, the parameters of the most interest in the full model are the knot, its variance, and the path coefficients to the knot. The conditions hypothesized to influence the estimation of these knot parameters, along with other model parameters, include the number of individuals, the number of repeated measurements, the knot locations, the knot variances, shapes of trajectories, measurement precision, and the proportion of growth factor variances explained by TICs. Accordingly, we did not investigate some conditions that presumably would not affect the model performance meaningfully. For instance, the mean vector and variance-covariance matrix of the first three growth factors (i.e., the intercept and two slopes) usually change with the measurement scale and time scale in practice, and there is little theoretical reason to expect different time scales to have an impact on estimation (Biesanz et al., 2004). Accordingly, in the simulation design, we fixed the mean of intercept and first slope and only changed that of the second slope. We also fixed the variance-covariance matrix of the first three growth factors and kept the index of dispersion (
Simulation Design for the Models
Note. TICs = time-invariant covariates.
For a model to analyze longitudinal data, the factor that we are most interested in is the number of repeated measures. Generally, the model should perform better if the study duration is longer, which we wanted to investigate by the simulation study. Another important factor for the linear-linear trajectory is the knot location. Intuitively, the model should perform the best if the knot is in the middle of study duration; it is our interest to test this conjecture as well. To this end, we chose two different levels of the number of measurements: six and 10. We selected six as the minimum number of repeated outcomes to make the full model fully identified.
4
For the conditions with six repeated measures, we set the knot at halfway of study duration (
In order to fit the full model in the SEM framework and obtain interpretable coefficients, the approximation was introduced by the Taylor series expansion. As shown in Online Appendix A.2, one necessary condition to apply this approximation is that the knot variance approaches 0. We then investigated three levels of between-person differences in the knot to evaluate how the introduced approximation affects the model performance and compare the full model versus the reduced model. We set the knot standard deviation (SD) as 0, 0.3, and 0.6 as the zero, medium, and large difference in the knot location. Note that the magnitude of the knot SD is relative to the scale of measurement occasions.
Additionally, we examined how the shape of trajectories quantified by the standardized difference between two slopes affects the full model. We considered both positive and negative differences (
3.2. Data Generation and Simulation Step
For each condition that is listed in Table 2, the general steps of the simulation study for the models are outlined as follows:
Generate data for growth factors and TICs simultaneously using the R package MASS (Venables & Ripley, 2002; details are provided in Online Appendix A.6).
Generate the time structure with J scaled and equally spaced waves tj
and obtain individual measurement occasions:
Calculate factor loadings, which are functions of individual measurement occasions and the knot, for each individual.
Calculate values of the repeated measurements based on growth factors, factor loadings, and residual variances.
Implement the full model and the reduced model on the simulated data, estimate the parameters, and construct corresponding 95% Wald’s CIs.
Repeat the above steps until after obtaining
The simulation study was conducted on a Linux Beowulf Cluster with R Version 3.4 and OpenMx Version 2.12.2.
4. Results
4.1. Model Convergence and Proper Solution
In this section, we first examined the convergence rate and component fit measures of each condition. The convergence
6
is defined as to achieve OpenMx status code 0, which suggests a successful optimization, until up to 10 attempts with different sets of starting values (Neale et al., 2016). Convergence rate of each condition was investigated. Based on our simulation studies, the full model and its reduced version converged satisfactorily (the convergence rate of the full model achieved at least 95% while that of the reduced model was 100%). Of a total of 576 conditions, 490 conditions reported 100% convergence rate and 67 conditions reported convergence rate of 99% to 100%. The worst scenario in terms of the nonconvergence rate across all conditions was
We also conducted diagnostics to investigate “improper solutions” (referred to the estimates that are impossible in the population) following Bollen and Curran (2005, Chapter 2), such as estimates of growth factor variances less than 0 and/or estimates of correlations between growth factors beyond
Number of Improper Solutions Among 1,000 Convergent Replications Under Conditions With 10 Repeated Measures and Midway Knot, 13% Explained Variance
a
For the scenarios including knots with variability, we observed that the proper solution rate has positive associations with the magnitude of standardized difference between two slopes, the measurement precision, the sample size, and the knot SD. The sign of dz , the shifted knot locations, or the proportion of explained variance of growth factors only affected the number of improper solutions slightly, although reduced follow-up time inflated this number. When such improper solutions occurred, we replaced the full model with its reduced version for the model evaluation.
4.2. Performance Measures
In this section, we present simulation results in terms of performance measures, including the relative bias, empirical SE, relative RMSE, and empirical CP for a nominal 95% CI for each parameter of interest. In general, the estimated coefficients from the full model were unbiased and accurate, with target coverage probabilities. We first provide the summary statistics (specifically, median and range) for each performance metric of each parameter across the conditions with 10 repeated measurements and the midway knot. Then, we discuss how the simulation conditions affect these performance metrics.
Tables 4 and 5 present the median (range) of the relative bias and empirical SE, respectively, for each parameter of interest across all conditions with 10 repeated measurements and the halfway knot for the full model and its reduced version. We first calculated each parameter’s relative bias/empirical SE across
Median (Range) of Relative Bias of Each Parameter Across Conditions With 10 Repeated Measures and Midway Knot
a— indicates that the relative biases are not available from the reduced model. b NA indicates that the bounds of relative bias is not available. The model performance under the conditions with zero population value of difference in knot is of interest where the relative bias of those knot coefficients would go infinity.
Median (Range) of Empirical Standard Error of Each Parameter Across Conditions With 10 Repeated Measures and Midway Knot
a— indicates that the empirical standard errors are not available from the reduced model.
We plotted the relative bias under each condition with 10 repeated measures and the midway knot for

Relative biases of path coefficients and residuals of knot under conditions with 10 repeated measures and midway knot.
From Table 5, estimates from the full model and its reduced version were precise: The magnitude of empirical SEs of slope or knot parameters were around 0.10, although those values of intercept parameters were relatively large: the empirical SE of the intercept mean and that of the unexplained intercept variance achieved 0.35 and 2.35, respectively.
Table 6 lists the median (range) of relative RMSE of each parameter for both models under the conditions with 10 repeated measures and the midway knot, which combines bias and precision to examine the point estimates holistically. From the table, both models estimated parameters accurately. The magnitude of relative RMSEs of growth factor means was under 0.05, and the median RMSE value of path coefficients to the intercept or slopes was around 0.20.
Median (Range) of Relative Root Mean Squared Error (RMSE) of Each Parameter Across Conditions With 10 Repeated Measures and Midway Knot
a— indicates that the relative RMSEs are not available from the reduced model. b NA indicates that the upper bound of relative RMSE is not available. The model performance under the conditions with zero population value of difference in knot is of interest where the relative RMSE of those knot coefficients would go infinity.
Table 7 presents the median (range) of the CP for the full model and its reduced version under the conditions with 10 repeated measurements and the midway knot. Under the majority of conditions, the full model performed well in terms of empirical coverage as the median values of coverage probabilities of all parameters were around 0.95. We noticed that the coverage probabilities of knot variance and path coefficients to the knot could be conservative under the conditions with 0.6 knot SD.
Median (Range) of Coverage Probability of Each Parameter Across Conditions With 10 Repeated Measures and Midway Knot
a For the full model, the reported coverage probabilities have four decimals since we calculated the coverage probabilities only based on the replications with proper solutions. b— indicates that the coverage probabilities are not available from the reduced model.
In summary, based on our simulation study, the full model performed better than the reduced model in terms of performance measures and generated unbiased estimates with small variance and proper target CP. As pointed out earlier, one necessary condition for applying the Taylor series approximation is that the knot variance approaches 0. Through the simulation study, we found that the full model was robust under the conditions with the medium level of knot SD (i.e.,
5. Application
This section demonstrates the use of the full model to approximate nonlinear trajectories and estimate a knot that varies across individuals. This application has two goals. Although researchers can employ the proposed model to estimate a knot and its variance of a developmental process directly, we realize that a considerable number of studies that examine the underlying change patterns might start from an exploratory stage, where researchers only have vague hypotheses in terms of the proper functional form of trajectories. One common approach for such exploratory studies is to fit a pool of candidate models and select the “best” one by prespecified criteria such as information criteria or the fit between the model implied trajectory and a smooth line of observed trajectories. We want to demonstrate this process in our application. Accordingly, we first fit the full and reduced LGC models with bilinear spline functional form with three common parametric change patterns: linear, quadratic, and Jenss-Bayley growth curve, demonstrate how to select the model, and then show how to interpret the coefficients obtained from the proposed model. Second, there are two possible time scales available in the data set: students’ age and their grade-in-school. We use children’s age (in months) to have individual measurement occasions in the main analysis and conduct a sensitivity analysis where students’ grade-in-school was used as the time scale. We hope to gain insights on how different time structures would affect estimation in practice through the sensitivity analysis. We randomly selected 400 students from the Early Childhood Longitudinal Study Kindergarten Cohort: 2010-11 (ECLS-K: 2011) with complete records of repeated mathematics item response theory (IRT) scaled scores, demographic information (sex, race, and age at each wave), school information (baseline school location and baseline school type), and socioeconomic status (baseline family income and the highest education level between parents). 7
ECLS-K: 2011 is a nationally representative longitudinal sample of U.S. children enrolled in about 900 kindergarten programs starting from 2010−2011 school year. Children’s mathematics ability was assessed in nine waves: fall and spring of kindergarten (2010−2011), first (2011−2012) and second (2012−2013) grade, respectively, as well as spring of third (2014), fourth (2015), and fifth (2016), respectively. Only about 30% students were sampled in the fall of 2011 and 2012 (Lê et al., 2011). In the subsample, 51% of children were male, and 49% of children were female. Additionally, 44% of students were White, 5% were Black, 38% were Latinx, 8% were Asian, and 5% were others. We dichotomized the variable race to be White (44%) and others (56%). At the start of the study, 11% and 89% students were in private and public schools, respectively. We treated the variables school location (ranged between one and four), highest parents’ education (ranged between zero and eigth), and family income (ranged between one and 18) as continuous variables, and corresponding mean (SD) was
5.1. Main Analysis
In this section, we use children’s age (in months) as the time metric to have individual measurement occasions. We first fit the full and reduced linear-linear piecewise models, as well as three parametric growth curve functions, that is, linear, quadratic, and Jenss-Bayley. The graphical representations of the model implied curves together with the smooth line of the observed mathematics IRT scores are shown in Figure 3. From the figure, the nonlinear functional forms fit better than the linear function generally; and the models with bilinear spline change patterns fit better than the parametric nonlinear models to the first two or three measurement occasions. As shown in Figure 3, both the quadratic growth curve and the Jenss-Bayley tended to underestimate the mathematics achievement in the early stage.

Model implied trajectory and smooth line of observed mathematics IRT scores.
Table 8 provides the obtained estimated likelihood, information criteria (including Akaike infromation criterion [AIC] and Bayesian Information Criterion [BIC]), and residuals of each LGC model. From the table, the model with the linear-linear functional form and a random knot has the largest estimated likelihood, smallest AIC, and smallest residual though its BIC is slightly larger than the LGC with the quadratic function. Moreover, from Table 8, the full model consistently performs better than its reduced version for this empirical example, indicating that we need to consider the variability of the knot, although the model implied trajectory of both captures the overall observed data well as shown in Figure 3. We then fit the full model to investigate the impact of covariates, such as sex, race, parents’ highest education (baseline), family income (baseline), school type (baseline), and school location (baseline), on the knot variance (and other growth factor variances). The model fit information is also provided in Table 8.
Summary of Model Fit Information for the Models
Table 9 presents the estimates of parameters of interest. It is noticed that postknot development in mathematics skills slowed down substantially (on average 1.731 and 0.688 per month in the pre- and postknot stage, respectively). The knot was estimated at 102 months (i.e., 8.5 years old) on average. We noted that the initial mathematics performance (60-month old in our case) was positively associated with family income, private school, and parents’ highest education. Moreover, the variable family income also can explain individual differences in the development rate of mathematics IRT scores in the first stage. Further, boys improved more rapidly than girls in terms of mathematics ability in the first stage. Additionally, students coming from schools in more urban areas arrived at the change-point earlier.
Estimates of the Parameters of Mathematics Trajectories
* Statistical significance at .05 level.
5.2. Sensitivity Analysis
In this section, we use students’ grade-in-school as the time scale, where the time metric takes on discrete values (e.g., 0 is for the kindergarten fall semester and 0.5 in years or 6 in months is for the kindergarten spring semester). We list the estimated likelihood, AIC, BIC, and the residual variance of the linear-linear piecewise growth model in Table 8. 8 Upon further examination, the mathematics ability slowed down between the sixth and the seventh wave (around the fall semester of Grade 3), matching the estimate in the main analysis: students in Grade 3 usually aged between 8- and 9-year old. The effect size of the estimates of the path coefficients to the knot increased slightly, but the direction of the estimates stayed the same.
6. Discussion
In this article, we extended an existing linear-linear piecewise growth model with an unknown random knot to the framework of individual measurement occasions to account for the heterogeneity of the measurement time. We explored the predictors to explain the between-individual differences in the within-individual change patterns. We also proposed using the multivariate delta method to derive the mean vector and variance-covariance matrix of growth factors as well as the path coefficients directly related to the underlying change patterns. More importantly, we examined the proposed method by extensive simulation studies to investigate whether the approximation introduced by the Taylor series expansion affects the model performance.
To demonstrate how well the introduced approximation works, we compared the full model (that requires the Taylor series approximation) with the reduced model and found that the full model performed better than the reduced one in terms of the performance measures under the conditions with varying knot locations. The simulation study also showed that the full model performed well under the conditions with a medium-level knot SD, which aligns with the essential condition of using the Taylor series expansion in our case. We also demonstrate the use of the proposed method on a subset with
6.1. Practical Considerations
Based on the simulation study and the real-world data analysis, we provide a set of notes for empirical researchers. First, it has been shown that the bilinear latent growth model can be identified with at least five measurements with a prespecified knot at the halfway of study duration (i.e., there are two measures before and after the knot, respectively), although the information about model identification of linear-linear growth model with an unknown knot is scarce. Accordingly, we recommend visualizing the trajectories to examine the number of measurement occasions pre- and postknot. Second, for an empirical study, we still recommend examining the issue of improper solutions. As shown in the simulation study, greater than 80% of improper solutions were observed when we overspecified the model. Based on this result, an improper variance and/or correlation may simply suggest that the knot location is roughly similar across all individuals. We recommend employing the reduced model to estimate a fixed knot. Third, based on the simulation studies, the proposed model performed well in general. However, in cases with challenging conditions such as the large knot SD (i.e., 0.6), the small standardized difference between two slopes (i.e.,
It also suggests that, in practice, we need to interpret these knot-related estimates carefully. As shown in the simulation study, the unexplained knot variance was underestimated when the knot SD was 0.6 but could be overestimated when it was 0.3 when considering scaled measurement waves. In the Application section, the mathematics ability was evaluated every 6 months (the first six measurements) or every 12 months (the last three measurements). Given that the knot SD should be greater than 6.5 (
The proposed model is computationally complex, and thus, a set of proper starting values help improve the likelihood of convergence and lighten the computational burden. We first decide on an appropriate scale for each parameter. Summary statistics and visualization techniques are helpful in deciding the proper scales of growth factor means. In the Application section, we set the scales of the starting values of the growth factor variance and covariance to 1 and 0, respectively. The determination of covariates-related starting values is straightforward since we standardize those covariates: We set the means of covariates to 0 and the variance-covariance matrix of covariates to the observed values. Additionally, the scale of starting values of path coefficients was set as 0.5. For a complex model, we may need to further search starting values around the scales. One common approach for fine-tuning is the “grid search” method, where we create a list of starting values around the scale of each parameter, run the model with different combinations of the starting values across the parameter space, and see from which set we can obtain a converged solution. However, we do not suggest employing this method, given the size of the parameters. Instead, we recommend using the function mxTryHard() in the R package OpenMx that can help to obtain convergent solutions, as we did in the simulation study and the real-world data analysis. The function mxTryHard() allows for multiple attempts to fit a model until generating an acceptable solution. At each attempt, each starting value is multiplied by a random draw from a prespecified uniform distribution. With these random starts, OpenMx is often able to find a convergent solution in our experience.
Additionally, the proposed model can be employed for either theoretical or empirical consideration. Theoretically, as stated earlier, the proposed models allow for estimating the growth rate of each of two stages and the transition time from one stage to another. We recommend using either the full model or the reduced model if these growth coefficients are of research interest. When analyzing trajectories in an exploratory stage, where researchers only have a vague assumption about the underlying change patterns, the functional form can be decided by an empirical approach as we did in the Application section. To demonstrate the influence of the time scale on the estimates, we conducted a sensitivity analysis in the Application section. The fixed effects of growth factors obtained from the model with different time scales were almost the same, though the estimated effect size of path coefficients increased when we used the grade-in-school as the time scale. This difference is not surprising as students in the same grade-in-school could be of different ages. In practice, we recommend selecting the time scale by research questions instead of based on statistical criteria.
6.2. Methodological Considerations and Future Directions
This article focused on the growth model with a bilinear spline functional form because it is straightforward, intuitive, and useful. It can be generalized to a linear spline with multiple knots or a nonlinear spline (such as a linear-polynomial piecewise), where the corresponding (inverse-) transformation matrices can also be derived. Besides, it is feasible to extend the current work to address longitudinal studies with dropout under the assumption of missing at random thanks to the FIML technique. Additionally, we focused on TICs in the current work, which is helpful to identify baseline characteristics to explain the variability of growth factors. It can also be extended to investigate time-varying covariates to explain individual differences in the trajectories.
From the simulation studies, we observed that the improper solutions might also be generated under some challenging conditions such as the small standardized difference between two slopes (i.e.,
The (inverse-) transformation between growth factors in two parameter spaces is valuable for the reparameterized longitudinal model. Preacher and Hancock (2012, 2015) proposed to employ reparameterization to estimate parameters of the greatest research interest directly (e.g., a knot and its variance). One drawback of the reparameterized longitudinal model initially was that other reparameterized coefficients were no longer related to the developmental theory and therefore lacked meaningful, substantive interpretation, thus requiring a back transformation of the model parameters. In the linear-linear piecewise growth model with an unknown random knot, this transformation is at the individual level as each individual has a set of “personal” growth factors. We then proposed to use the multivariate delta method to simplify the calculation. The matrix form is especially convenient when we use the R package OpenMx that can carry out matrix multiplication automatically. The transformation functions and matrices for other reparameterized longitudinal models can also be proposed.
We construct the proposed model in the LGC modeling framework, which allows us to extend the model further. For example, we can extend the proposed model to examine a joint development of multiple repeated outcomes. There are at least two possible directions to realize this aim. One possible extension is a parallel process, and correlated growth model (McArdle, 1988), where each repeated outcome takes the linear-linear functional form with an unknown random knot. By utilizing this model, we can examine a joint development of reading ability and mathematics ability. The other possible extension is a second-order growth model (Hancock et al., 2001), where the factor-level change pattern is supposed to take a linear-linear functional form. To demonstrate our point, suppose we want to investigate the development of students’ overall academic performance measured by their reading, mathematics, and science ability. To construct the model, we consider the measurement model’s factors as first-order factors since they are immediately above the measured reading, mathematics, and science skills. Additionally, the growth factors (i.e., the intercept, two slopes, and the knot) can be viewed as second-order factors because they are two levels above the measured skills. In this model, the second-order growth factors are employed to account for the within-individual change and between-individual differences in the first-order factors.
6.3. Concluding Remarks
In this article, we propose transformation functions and matrices between coefficients in the original and reparameterized parameter spaces to derive interpretable coefficients related to developmental theory for an existing reparameterized longitudinal model in the LGC framework. The results of the simulation study and the real-world data analysis demonstrate the model’s valuable capabilities of estimating the knot and the path coefficients to the knot in the framework of individual measurement occasions and providing the interpretable coefficients in the original parameter space. Furthermore, as discussed above, the flexibility of the SEM framework and the ability to use latent variables in the LGC model allows for further extensions for the proposed model.
Supplemental Material
Supplemental Material, sj-docx-1-jeb-10.3102_10769986211052009 - Obtaining Interpretable Parameters From Reparameterized Longitudinal Models: Transformation Matrices Between Growth Factors in Two Parameter Spaces
Supplemental Material, sj-docx-1-jeb-10.3102_10769986211052009 for Obtaining Interpretable Parameters From Reparameterized Longitudinal Models: Transformation Matrices Between Growth Factors in Two Parameter Spaces by Jin Liu, Robert A. Perera, Le Kang, Roy T. Sabo and Robert M. Kirkpatrick in Journal of Educational and Behavioral Statistics
Supplemental Material
Supplemental Material, sj-pdf-1-jeb-10.3102_10769986211052009 - Obtaining Interpretable Parameters From Reparameterized Longitudinal Models: Transformation Matrices Between Growth Factors in Two Parameter Spaces
Supplemental Material, sj-pdf-1-jeb-10.3102_10769986211052009 for Obtaining Interpretable Parameters From Reparameterized Longitudinal Models: Transformation Matrices Between Growth Factors in Two Parameter Spaces by Jin Liu, Robert A. Perera, Le Kang, Roy T. Sabo and Robert M. Kirkpatrick in Journal of Educational and Behavioral Statistics
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
