Abstract
Group-based trajectory models are used to investigate population differences in the developmental courses of behaviors or outcomes. This note introduces a new Stata command, traj, for fitting finite (discrete) mixture models to longitudinal data. These models are designed to identify clusters of individuals following similar progressions of some behavior or outcome over age or time. Normal, censored normal, Poisson, zero-inflated Poisson, and logistic distributions are supported.
Introduction
A developmental trajectory measures the course of an outcome over age or time. This note introduces a Stata plugin for estimating group-based trajectory models. The plugin adapts to the Stata platform a well-established Statistical Analysis System (SAS)-based procedure for estimating group-based trajectory models, which was demonstrated in two prior articles in this journal (Jones, Nagin, and Roeder 2001; Jones and Nagin 2007).
Using finite mixtures of suitably defined probability distributions, the group-based approach for modeling developmental trajectories is intended to provide a flexible and easily applied method for identifying distinctive clusters of individual trajectories within the population and for profiling the characteristics of individuals within the clusters. Thus, whereas the hierarchical and latent curve methodologies model population variability in growth with multivariate continuous distribution functions, the group-based approach uses a multinomial modeling strategy. Technically, the group-based trajectory model is an example of a finite mixture model, and its parameters are estimated by maximum likelihood. For a recent review of applications of group-based trajectory modeling, see Nagin and Odgers (2010); for an extended discussion of the method, including technical details, see Nagin (2005).
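As a point of reference, the generic finite mixture likelihood that maximum likelihood estimation works with can be sketched as follows. The notation is assumed here for illustration and is not taken from this note: J is the number of groups, \pi_j is the probability of membership in group j, P^j(Y_i) is the probability of individual i's observed sequence Y_i given membership in group j, and N is the number of individuals.

```latex
P(Y_i) = \sum_{j=1}^{J} \pi_j \, P^{j}(Y_i),
\qquad \sum_{j=1}^{J} \pi_j = 1,
\qquad
\log L = \sum_{i=1}^{N} \log \Bigl[ \sum_{j=1}^{J} \pi_j \, P^{j}(Y_i) \Bigr]
```

The group membership probabilities \pi_j are the multinomial component of the model: each individual is treated as belonging to exactly one of the J groups, with the group itself unobserved.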
The fundamental concept of interest is the distribution of outcomes conditional on age (or time); that is, the distribution of outcome trajectories denoted by
The software provides three alternative specifications of p(.): the censored normal distribution (also known as the Tobit model), the zero-inflated Poisson distribution, and the binary logistic distribution. The censored normal distribution is designed for the analysis of repeatedly measured, (approximately) continuous scales that may be censored by a scale minimum, a scale maximum, or both (e.g., longitudinal data on a scale of depression symptoms). The uncensored normal distribution is the special case of a scale or other outcome variable with no minimum or maximum. The zero-inflated Poisson distribution is designed for the analysis of longitudinal count data (e.g., arrests by age). The Poisson distribution is the special case with no zero inflation. The binary logistic distribution is available for the analysis of longitudinal data on a dichotomous outcome variable (e.g., whether or not an individual was hospitalized in year t).
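To make the role of p(.) concrete, the following is a minimal sketch of the mixture log-likelihood under the binary logistic specification. It is an illustration under stated assumptions, not the traj implementation: the function names are hypothetical, the logit of each group's probability is assumed to be a polynomial in age, the data are assumed to be complete, and an individual's repeated measurements are assumed to be independent conditional on group membership.

```python
import numpy as np

def logistic_traj_prob(age, beta):
    """P(y = 1) at each age for one group; the logit is a polynomial in age."""
    # beta = (b0, b1, b2, ...) gives b0 + b1*age + b2*age**2 + ...
    eta = np.polynomial.polynomial.polyval(np.asarray(age, float), beta)
    return 1.0 / (1.0 + np.exp(-eta))

def mixture_loglik(y, age, betas, pis):
    """Log-likelihood of a J-group logistic trajectory mixture.

    y, age : arrays of shape (N, T) with binary outcomes and ages
    betas  : list of J coefficient vectors, one per group
    pis    : list of J group membership probabilities summing to 1
    """
    y = np.asarray(y, float)
    lik = np.zeros(y.shape[0])
    for beta, pi in zip(betas, pis):
        p = logistic_traj_prob(age, beta)
        # Assumed conditional independence: multiply over the T periods.
        seq_lik = np.prod(np.where(y == 1, p, 1.0 - p), axis=1)
        lik += pi * seq_lik          # weight by group membership probability
    return float(np.sum(np.log(lik)))

# Hypothetical data: two individuals observed at four ages (age centered at 14).
y = [[0, 0, 1, 1], [0, 0, 0, 0]]
age = [[-3, -1, 1, 3], [-3, -1, 1, 3]]
ll = mixture_loglik(y, age, betas=[[-3.0, 0.0], [0.0, 1.0]], pis=[0.7, 0.3])
```

An optimizer would maximize this quantity over the coefficient vectors and the membership probabilities. The censored normal and zero-inflated Poisson specifications would replace only the per-period probability inside the loop.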
The model can also estimate the effect of time-stable covariates on the probability of group membership and the effect of time-dependent covariates on the trajectory itself. Let
Installation
The traj command can be installed by issuing the following commands within Stata. An additional command, trajplot, supports plotting the results.
An Example
Figure 1 illustrates an application of the method to data on self-reported delinquent group membership from age 11 to 17 in a large Montreal-based longitudinal study of over 1,000 males. The self-report is in the form of a binary indicator variable (yes = 1, no = 0). The model was estimated with the logistic specification of p(.), and therefore the trajectories are defined by the probability of delinquent group membership over age. A three-group model was found to be best based on the Bayesian information criterion. One group, estimated to account for 74.3 percent of the sampled population, followed a trajectory of no involvement in delinquent groups. A second group, estimated to account for 13.1 percent of the population, followed a trajectory of rising delinquent group membership. A third group of about equal size followed a trajectory of declining delinquent group membership. For details of this application, see Lacourse et al. (2002).
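The comparison of models with different numbers of groups can be sketched as follows. This is an illustration only: it assumes the form BIC = log L - 0.5 k log n, in which the largest (least negative) value is preferred, and it uses made-up log-likelihoods that are not results from the Montreal study. The standard form, -2 log L + k log n, is this quantity multiplied by -2 and yields the same ranking with the direction reversed. The function names and the choice of n as the number of individuals are assumptions.

```python
import math

def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' form: log L - 0.5 * k * log(n)."""
    return loglik - 0.5 * n_params * math.log(n_obs)

def best_by_bic(candidates, n_obs):
    """candidates maps a label to (loglik, n_params); returns (best label, scores)."""
    scores = {label: bic(ll, k, n_obs) for label, (ll, k) in candidates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical fits of 2-, 3-, and 4-group models with quadratic logits:
# 3 coefficients per group plus (groups - 1) free membership probabilities.
fits = {2: (-2100.0, 7), 3: (-2050.0, 11), 4: (-2046.0, 15)}
best, scores = best_by_bic(fits, n_obs=1000)
```

In this made-up example the four-group model improves the log-likelihood only slightly over the three-group model, so the penalty for its additional parameters outweighs the gain.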
Figure 1. Trajectories of delinquent group membership.
The solid lines in Figure 1 are based on the parameter estimates of the model itself. The dashed lines form a 95 percent confidence interval on the estimated probabilities of delinquent group membership. The dots are computed from the observed data, with each individual’s responses weighted by that individual’s posterior probabilities of group membership. This figure is the product of plotting software that is installed along with the estimation software.
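The weighting behind the dots can be sketched as follows. This is a minimal illustration rather than the plotting software's actual code: the function names are hypothetical, the data are assumed to be complete, and the posterior probability of membership in group j is assumed to be computed by Bayes' rule from the group membership probabilities and the likelihood of each individual's sequence under each group.

```python
import numpy as np

def posterior_probs(seq_lik, pis):
    """Posterior group membership probabilities, shape (N, J).

    seq_lik : (N, J) likelihood of each individual's sequence under each group
    pis     : length-J group membership probabilities
    """
    joint = np.asarray(seq_lik, float) * np.asarray(pis, float)
    return joint / joint.sum(axis=1, keepdims=True)

def weighted_observed(y, post):
    """Posterior-weighted mean of y at each period for each group, shape (J, T)."""
    y = np.asarray(y, float)
    # Row j: sum_i post[i, j] * y[i, t], divided by sum_i post[i, j].
    return (post.T @ y) / post.sum(axis=0)[:, None]

# Hypothetical example: three individuals, two groups, four periods.
seq_lik = [[0.08, 0.01], [0.02, 0.06], [0.05, 0.05]]
post = posterior_probs(seq_lik, pis=[0.6, 0.4])
y = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 0, 1, 0]]
dots = weighted_observed(y, post)   # one row of plotted points per group
```

For a binary outcome, each entry of the result is a weighted proportion of individuals reporting the outcome at that period, which can be compared with the model-based probability for the same group and period.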
Documentation
Documentation on the use of both the Stata plugin and the original SAS-based software is available at www.andrew.cmu.edu/user/bjones.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was generously supported by National Science Foundation Grants SES-102459 and SES-0647576.
