Abstract
Piecewise mixed-effects models are useful for analyzing longitudinal educational and psychological data sets to model segmented change over time. These models offer an attractive alternative to commonly used quadratic and higher-order polynomial models because the coefficients obtained from fitting the model have meaningful substantive interpretation. The current study thus focuses on the estimation of piecewise mixed-effects model with unknown random change points using maximum likelihood (ML) as described in Du Toit and Cudeck (2009). Previous simulation work (Wang & McArdle, 2008) showed that Bayesian estimation produced reliable parameter estimates for the piecewise model in comparison to frequentist procedures (i.e., first-order Taylor expansion and the adaptive Gaussian quadrature) across all simulation conditions. In the current article a small Monte Carlo simulation study was conducted to assess the performance of the ML approach, a frequentist procedure, and the Bayesian approach for fitting linear–linear piecewise mixed-effects model. The obtained findings show that ML estimation approach produces reliable and accurate estimates under the conditions of small residual variance of the observed variables, and that the size of the residual variance had the most impact on the quality of model parameter estimates. Second, neither ML nor Bayesian estimation procedures performed well under all manipulated conditions with respect to the accuracy and precision of the estimated model parameters.
Introduction
Longitudinal data analytic methods have significantly advanced our understanding of various cognitive and behavioral developmental processes (e.g., change in aptitude or onset of alcohol abuse) across a particular span of time. The overall scientific objective is to effectively describe patterns of change for each individual as well as for a population as a whole. Measuring change over time necessarily requires a longitudinal perspective where repeated measurements are gathered for individual subjects. Such an understanding is especially valuable as it affords researchers and practitioners the opportunity to individualize treatment, and achieve more equity with respect to desirable outcomes. To be concrete, for example, developmental psychologists, educators, policy makers, and researchers can enrich their understanding of the development of students’ diverse learning trajectories, their understanding of whether the change process in cognitive and/or behavioral development of students can be predicted from covariates (e.g., from students’ sociodemographic characteristics, teacher characteristics, and the classroom environment), and their understanding of whether the teaching, generally, as well as specific interventions have desirable effects on students’ development over time (e.g., Grimm, 2008; Jordan, Hanich, & Kaplan, 2003; Miles & Miles, 1992; Shin, Davison, Long, Chan, & Heistad, 2013).
It is often seen in educational and psychological longitudinal data sets that the overall growth or development pattern over time in observed repeated measurements consists of distinct segments (phases) of growth (e.g., Kreisman, 2003; Paris, 2005; Silverman, Speece, Harring, & Ritchey, 2012). For example, in studies of interventions (e.g., evaluating the effectiveness of Head Start programs) the children usually do not improve much in the months before the intervention and that after the intervention they show rapid improvement; thus, the rate of growth is different pre- and post- intervention (Kreisman, 2003). Thus, the overall growth trajectory in this scenario incorporates two distinct segments of change, one segment that characterizes a slower acquisition trend that gives way to a later rapid acquisition trend. Because the rate of change is different in different segments, piecewise mixed-effects models (PMEs) offer a flexible framework to analyze this kind of segmented data set. These models allow the specification of separate growth profiles (functional forms) corresponding to multiple developmental stages of the overall change process from which repeated observations are made, such that each segment does not have to follow the same functional form (Chou, Yang, Pentz, & Hser, 2004).
One of the most interesting features of a PME model is the change point (knot), the value of the predictor—time at which the response function switches from one segment to another. For instance, in research related to the development of reading-related skills among elementary school students researchers have found that the ability of students to accurately and automatically decode fluently increases in the beginning of the second grade, but then it slows down somewhere in the middle of the third grade (Silverman et al., 2012). Hence, in this case the change point is located around the middle of the third grade. The location of change point can be known a priori or can be estimated. Additionally, the change point can be treated as a fixed parameter, indicating that each individual’s change point is assumed to be the same, or random, in which case the change point is an estimable subject-specific parameter. A PME model with unknown random change points is a type of nonlinear random-effects model (Cudeck & Klebe, 2002; Du Toit & Cudeck, 2009; Wang & McArdle, 2008). The estimation of the PME model with unknown random change points is computationally very challenging because of the nonlinear random parameter (i.e., change point) where we estimate both its mean and its variance (Kohli, Harring, & Zopluoglu, 2016). The two common approaches for estimation of longitudinal models are: maximum likelihood (ML) estimation and Markov chain Monte Carlo (MCMC) for Bayesian estimation.
The primary objective of the current study is to extend the prior simulation-based literature related to the estimation of PME models with unknown random change points. Researchers have shown that Bayesian estimation produces more reliable parameter estimates for PME models with unknown random change points in comparison to frequentist procedures of first-order Taylor expansion and adaptive Gaussian quadrature across all the simulation conditions (Wang & McArdle, 2008). In our study we intend to employ a more recent version of modified ML estimation approach proposed by Du Toit and Cudeck (2009) which seems to provide better estimation of nonlinear mixed-effects models, including PME models. Thus, in this study we aim to demonstrate the performance of the ML estimation as described by Du Toit and Cudeck (2009) with respect to the accuracy and precision of estimated PME model parameters via a small Monte Carlo simulation study. Moreover, to provide a more complete and comprehensive discussion, we also present Bayesian estimation results using MCMC as implemented in the rjags package for R (Plummer, 2016). A secondary objective of this article is to promote the implementation of PME models with random change points. We provide an (online) annotated Supplemental Appendix B that describes the R code for the ML model estimation used in this study. The remainder of this paper is organized as follows. Earlier simulation studies on the estimation of PME models with unknown random change points are briefly reviewed and discussed. In a subsequent method section, the PME model equation and the design of the simulation study are described. In the final section, the results from the simulation study are presented.
Literature review
To date there are few prior studies that have used Monte Carlo simulation techniques to study the performance of different estimation techniques for estimating PME model parameters with unknown random change points. The earliest simulation study on this topic was by Morrell, Pearson, Carter, and Brant (1995). The authors demonstrated the development and the utility of the PME model with random change points for examining change in prostate-specific antigen (PSA) levels among men before their prostate cancers were detected clinically. The estimation algorithm employed by them was a combination of least squares estimators for nonlinear mixed-effects models and ML estimators for a linear mixed-effects model. Based on the results of the simulation study, they reported that the estimates of fixed-effects parameters were close to the true (population) values. The asymptotic standard errors of the fixed-effects parameter estimates, however, were underestimated resulting in shorter confidence intervals and hence small coverage rates. The estimated value of the residual variance was close to its true value, but the estimated variance-covariance matrix of the random-effects was underestimated.
In Dominicus, Ripatti, Pedersen, and Palmgren (2008), the authors used the PME model with random change points to analyze the change in cognitive function during old age. The estimation method used was MCMC simulation using Gibbs sampling. Additionally, the authors executed a small simulation study to investigate the performance of the MCMC Bayesian estimation approach. The simulation results showed that the estimated model parameters, including the estimated fixed-effects parameters, residual variance parameter and variance-covariance matrix of the random effects, were all close to the true parameter values.
In Wang and McArdle (2008), the authors compared the performance of Bayesian estimation to frequentist first-order Taylor series expansion and adaptive Gaussian quadrature for estimating PME models with unknown random change points. They reported that Bayesian estimation yields more accurate and precise parameter estimates as compared to the frequentist methods across all the manipulated simulation conditions. The model convergence rates were also reported to be low for the frequentist methods of estimation.
This leads to the purpose of the current study. That is, to implement the ML approach as described in Du Toit and Cudeck (2009) for estimating linear–linear PME model parameters with unknown random change points. The choice to incorporate the linear–linear function was made because it is a frequently applied function in many educational and psychological research settings. Moreover, its estimation is expected to be more challenging with respect to model convergence as well as the accuracy and precision of parameter estimates (see Appendix A). Using the estimation approach of Du Toit and Cudeck (2009), the parameters of a nonlinear mixed-effects model are separated into linear parameters and intrinsically nonlinear parameters. The dimension of the integral for computing the ML of the marginal distribution of
Method
Data generation & simulation design
The following two-phase linear–linear piecewise function was used to generate longitudinal data sets for the purpose of the current study
where
where β
1i
and β
2i
are the subject-specific intercept and slope parameters of the first phase, β
3i
is the subject-specific slope parameter of the second phase, and γi
is the subject-specific change point for the ith individual. The distribution of random errors is assumed to be multivariate normal with a zero mean vector and a covariance structure of
The simulation design (including the population values for parameters chosen) for this study is close to reported values of previous simulation design used in the literature (Wang & McArdle, 2008). The total number of measurement occasions was eight where individuals were assumed to be measured at time points of 0, 10, 20, 30, 40, 50, 60, and 70. The manipulated conditions in the simulation design were sample size, residual variance, and location of change point. The two levels of the sample size condition were: small (N = 100) and medium (N = 500); the two levels of the change point location were: early occurrence of change point (γ = 20) and change point occurring somewhere in the middle (γ = 35); and for the residual variance the two levels were: small (σe 2 = 1) and large (σe 2 = 9). The manipulated conditions were fully crossed, yielding a total of eight conditions in the simulation. Each simulated condition contained 100 replications. Across all simulated conditions, the set of fixed regression parameters were specified as β 1 = 80, β 2 = 0.5, and β 3 = 0.1 and the covariance structure of random effects was specified as a diagonal matrix.
Model estimation
The parameters of the PME model under the ML framework were estimated using an R routine 1 (Zopluoglu, Harring, & Kohli, 2014) built upon the ML estimation procedure described in Du Toit and Cudeck (2009). The main advantage of this procedure is that it integrates out the linear parameters (β 1i , β 2i , β 3i ) and reduces the dimension of the integral to the number of intrinsically nonlinear parameter(s), which is one in this particular model (i.e., γ), when computing the likelihood of the marginal distribution. In contrast, the adaptive Gaussian quadrature method as implemented in SAS PROC NLMIXED approximates a four-dimensional integral. The R routine based on Du Toit and Cudeck’s (2009) procedure also combines the idea of using multiple sets of random starting values as suggested by Hipp and Bauer (2006) in order to avoid local convergence or non-convergence and thereby, increasing the chance of reaching the global maxima. The number of random starting value sets for the initial stage iterations were fixed to 75.
As a first step, each set of random starting values was generated by fitting a piecewise regression model with unknown change point location to each individual’s data using the segmented package in R (Muggeo, 2008). The individual estimates for β 1, β 2, β 3, γ, and σe obtained from the first step subsequently gave us the distribution of each fixed parameter in the model. In the second step, the first and the third quartiles of each fixed parameter distribution were computed. These two values were used as the lower and upper boundaries of the uniform distribution corresponding to respective fixed model parameters, and then a random number was drawn from this uniform distribution as the starting value for these parameters. The starting value for the residual variance was the average of the estimated residual variance across all the individuals (i.e., the rows of the generated data sets). To get the starting values for the variances of β 1, β 2, and β 3 following Hipp and Bauer (2006), a random number was drawn from a uniform distribution whose minimum value was 0.05 and the maximum value was the corresponding variance estimate for the respective parameter distribution obtained from the first step. The starting values for growth factor covariances were randomly drawn from a uniform distribution such that the correlations were between −0.75 and 0.75. Following these aforementioned steps we generated multiple 75 random initial starting value sets.
The initial stage iterations were run for each set of the random starting value sets. The initial stage iterations were stopped when either the number of iterations exceeded 15 or the difference in the log-likelihood values between two successive iterations was smaller than 0.001. For the final stage iterations, the best set obtained from the initial stage iterations was re-estimated until the convergence criterion was met. That is, the final stage iterations were stopped when either the number of iterations exceeded 2000 or the difference in the log-likelihood values between two successive iterations was smaller than 0.0001. Furthermore, for both initial and final stage iterations, 20 Gauss–Hermite quadrature points were used to approximate the integration.
For the Bayesian estimation framework, non-informative prior distributions were considered for all model parameters, including the fixed mean parameters, random error, and the variance-covariance matrix of random effects. The prior distribution for the fixed linear parameters was normal with mean zero and variance defined by a diagonal matrix with the value of 100 on the diagonal. The prior distribution for the random error was inverse gamma with shape and rate parameters equal to 0.001. The prior distribution for the inverse of the linear random effects covariance matrix was a Wishart distribution with a 3×3 identity matrix as the scale matrix and 3 degrees of freedom. The prior distribution for the change point was a uniform distribution in the interval [10, 60] to avoid model identifiability issues that may occur at the boundaries of the time points (i.e., 0 or 70). And the prior distribution for the standard deviation of the random effect of the change point was uniform distribution in the interval [0, 35/2] to warrant the range of time points as feasible values of the random change point. To elaborate, the time points range between 0 and 70 and we have two simulated change point values (γ = 20, 35) whose variation must lie within such range. Taking the middle point of the range and dividing it by two (35/2) yields a reasonable value for the upper bound interval of the uniform distribution to assure that the values of the random change point lay within [0, 70]. In addition, given previous findings related to Bayesian estimation being not sensitive to starting values (Wang & McArdle, 2008), initial values for MCMC were randomly generated by rjags.
Outcome measures
Two outcome measures were computed for both estimation frameworks: average parameter bias and coverage rate. The average parameter bias is equal to the average difference between the respective estimated parameter and the corresponding true value. It was computed using the formula
where
In ML, assuming estimation errors are normally distributed, the coverage rate for a 95% confidence interval was computed. One would like (1 − α)100 of one hundred confidence intervals for a given parameter to contain the true value of that parameter, i.e. one desire a coverage rate of (1 − α)100% (Kohli, Hughes, Wang, Zopluoglu, & Davison, 2015). In Bayesian framework coverage rate is computed based on credible intervals, which are analogous to confidence intervals in ML framework. Thus, for the Bayesian approach coverage rate using credible intervals according to the frequentist convention was computed. Note that the coverage rate was computed using only successfully converged replications in each cell.
Results of simulation study
Model convergence rate
Table 1 shows the model convergence rate across the different manipulated conditions for both the ML and the Bayesian estimation procedures. The rate of converged replications for the parameters of the PME model across all the manipulated conditions for the ML estimation (total number of simulation conditions was 8) was found to be between 65% and 96%. At first, it seems that the lower bound of the range is small (i.e., 65%), however, upon closer inspection there is a reason why the convergence rate is small for some of the manipulated conditions. For conditions with large values of residual variance (i.e., σe 2 = 9) the convergence rate is between 65% and 76%, and for the conditions with small values of residual variance (i.e., σe 2 = 1) the convergence rate is between 83% and 96%. Thus, we can conclude that overall large residual variance has an adverse effect on the model convergence rate, which is consistent with the results from prior simulation studies related to the estimation of PME models (see, e.g., Kohli & Harring, 2013; Wang & McArdle, 2008).
Model convergence rate across all conditions.
Model convergence rate for the Bayesian estimation approach was, in general, higher than the ML estimation approach. It was found to be between 97% and 100%. The lower bound of convergence rate corresponds to high residual variance and early occurrence of change point (i.e., σe 2 = 9 and γ =20). However, these results (related to high convergence rate) should be taken with caution because as we will see in the following section, higher convergence rate for the Bayesian estimation framework did not necessarily translate to more accuracy and precision with respect to the estimation of some of the model parameters.
Average parameter bias
Table 2 presents the average bias for each parameter across all the manipulated conditions for both the ML and the Bayesian estimation procedures. As shown in Table 2, the size of the average bias for the mean parameter estimates (i.e.,
Average parameter bias across all conditions.
Upon closer inspection, we found that the estimation of change point variance was poor for the manipulated conditions where the value of residual variance was large (i.e., σe 2 = 9). There may be a possible explanation for this finding. The estimation of intrinsically nonlinear models (e.g., PME models) can run into the issue of a flat likelihood function, especially when signal to noise ratio is small (e.g., σe 2 = 9), which may result in a large average bias. Thus, it may explain the poor recovery of the true value of the change point variance parameter.
Average parameter bias for the Bayesian estimation approach was decent for all parameter estimates, except for the variances of the two linear slopes. The average bias for variances of slope parameters ranges from 0.01 to 0.03. This means that relative to true parameter values, a relative bias of up to 33% and 700% for variances of slope parameters of the first and second phases, respectively, was found using a Bayesian estimation approach. The average bias for the variance of the change point (i.e.,
In sum, both ML and Bayesian estimation approaches had challenges with respect to the recovery of some of the model parameters.
Parameter coverage rate
Table 3 shows the average coverage rates of the 95% confidence interval and credible intervals across all the model parameters. The ML procedure provided average coverage rates for the mean parameter estimates (i.e.,
Coverage rate across all conditions.
The Bayesian estimation approach provided a high coverage rate of 90–100% in cells pertaining to the estimation of parameters:
Overall, we can conclude that the ML estimation approach in this study did a good job of estimating model parameters under the condition of small residual variance of the observed variables on the outcome measures—convergence rate, average parameter bias, and coverage rate. Additionally, we found that, in general, Bayesian estimation performed poorly with respect to the recovery of the variances of slope parameters across all simulated conditions, but it performed better with respect to the estimation of the variance of the change point parameter.
One of the main findings of this study is that the ML estimation approach performs satisfactorily under the condition of small residual variance (i.e., σe 2 = 1). Upon closer inspection of the simulated data, we found some general guidelines for when a researcher may take ML estimation results with caution. We computed the mean ratio of the residual variance to the total variance of the response variable across time points for each data set and averaged this index over the 100 replications within each simulated condition. That is, we computed the mean error variance to overall variance ratio for each data set using the formula
where
Error variance and overall variance ratio.
Discussion
In summary, the current study expanded on prior simulation-based literature related to the estimation of PME models with unknown random change points. We used and evaluated the performance of the ML estimation using the approach proposed by Du Toit and Cudeck (2009) and the MCMC for Bayesian estimation with respect to the accuracy and precision of estimated linear–linear PME model parameters via a small Monte Carlo simulation study. The key findings from the simulation study were twofold. First, neither ML nor Bayesian estimation performed well under all manipulated conditions with respect to the accuracy and precision of the estimated model parameters, which is inconsistent with previous findings (Wang & McArdle, 2008). Wang and McArdle (2008) concluded that across all the manipulated simulation conditions frequentist first-order Taylor series expansion and adaptive Gaussian quadrature performed poorly in comparison to the Bayesian approach with respect to the quality of parameter estimates as well as the model convergence rates. In our study, results were not conclusive in favor of either estimation procedures. Second, we learned that under the condition of small residual variance of the observed variables the ML estimation approach did recover reliable and precise model parameter estimates, and that the size of the residual variance had the most impact on the quality of parameter estimates. To put this finding in the context of real data setting, often it is not possible to know the reliability of data collected beforehand. Some research settings by default produce data that has lots of noise, thus has large residual variance in the observed variable. For example, in the research area of Curriculum-Based Measurement of Oral Reading or CBM-R data collected weekly from second grade students across several weeks tend to have small signal to noise ratio, in general. But in other research settings, such as longitudinal controlled clinical trials, the data may have less noise and thus be more reliable. The decision to use ML estimation procedure for fitting PMEs should be based on several factors, such as sample size, residual variance, complexity of the models, and so on. An aspect to consider when deciding on the quality of the estimation results using the ML framework is the ratio of the mean error variance to overall variance. If this ratio is equal to 0.12 or higher, estimation results should be viewed cautiously. Last, the potential occurrence of flat likelihood function in the event of small signal to noise ratio when estimating intrinsically nonlinear parameter(s) is a well-known issue.
Overall, PME models provide a flexible framework to researchers and practitioners that enables them to characterize individual behavior that exhibits distinct phases of development in each segment, and thereby allow researchers to address key questions such as developmental studies or studies seeking to measure the effect of treatment/ intervention, and so forth such as when individuals may need to seek professional services at the timing when mental ability decreases. The estimation of different variations of PME models is an active area of research, where both frequentist and Bayesian procedures have been used for model fitting purposes.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Note
Appendix A
The estimation of linear–linear piecewise function is more challenging relative to the estimation of quadratic-linear piecewise function used in the Wang and McArdle (2008) study. To elaborate on this point, let’s take a quadratic-linear piecewise function expressed as
where yij is the observation for the ith individual at the jth time point, xij refers to the time point at which the ith individual is observed (the time scale includes eight time points—0, 10, 20, 30, 40, 50, 60 and 70), τi is the subject-specific nonlinear change point, and
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
