Abstract
In this article, the authors suggest a profile-likelihood approach for estimating complex models by maximum likelihood (ML) using standard software and minimal programming. The method works whenever setting some of the parameters of the model to known constants turns the model into a standard model. An important class of models that can be estimated this way is generalized linear mixed models with factor structures. Such models are useful in educational research, for example, for estimation of value-added teacher or school effects with persistence parameters and for analysis of large-scale assessment data using multilevel item response models with discrimination parameters. The authors describe the profile-likelihood approach, implement it in the R software, and apply the method to longitudinal data and binary item response data. Simulation studies and comparison with
Keywords
1. Introduction
In this article, we suggest a profile-likelihood approach for estimating complex models by maximum likelihood (ML) using standard software and minimal programming. The method works whenever setting some of the parameters of the model to known constants turns the model into a standard model. An important class of models that can be estimated this way are generalized linear mixed models with factor structures.
Generalized linear mixed models, also known as multilevel or hierarchical generalized linear models (Goldstein, 2003; Raudenbush & Bryk, 2002), are popular models for longitudinal data as well as cross-sectional data with units nested in clusters, the canonical example being students nested in schools. Models with crossed random effects (e.g., Goldstein, 1987; McCaffrey, Lockwood, Koretz, Louis, & Hamilton, 2004; Raudenbush, 1993) to handle data with two or more nonnested classifications such as schools and neighborhoods are also becoming increasingly popular. These models can be estimated in a wide range of general purpose and specialized software packages. In this article, we consider an extension of these models to include factor structures and propose a method for estimating the extended models by ML.
To define this extension, we start by considering a standard two-level generalized linear mixed model with M random effects. The linear predictor for unit i in cluster j can be written as follows:
On the right-hand side of Equation (1), we see that
The reason for the name “factor loading” is that a confirmatory factor model can be specified as follows: The indicators or responses
We have described how a two-level linear mixed model with factor structures can be used to specify a confirmatory factor model. By changing the link function to a logit link, we obtain an item response model, where the factor loadings are now called discrimination parameters. If persons are nested in clusters, such as schools, a three-level mixed model with factor structures can be used to specify a multilevel factor or item response model. Since factor structures are not usually accommodated in software for generalized linear mixed models, researchers wanting to use such software to estimate multilevel measurement models have had to set the factor loadings to known constants (e.g., Raudenbush, Rowan, & Kang, 1991; Raudenbush & Sampson, 1999) or specify one-parameter item response models, with discrimination parameters set to one (e.g., Kamata, 2001; Maier, 2001).
Three general software packages for multilevel and latent variable modeling allow ML estimation of mixed models with factor structures:
We provide two examples handled by our approach. The first example is a random effects model for longitudinal data on students who are nested in middle schools for the first two occasions and high schools for the next two occasions where middle schools are cross-classified with high schools. The effects of middle schools and high schools on student outcomes are latent variables whose impacts change over time. The relative contribution of the school effects each year can be captured by associated factor loadings or persistence parameters. Such crossed random effects models with persistence parameters have been used for value-added assessments of school and teacher effects (e.g., McCaffrey et al., 2004). The second example is a multilevel two-parameter logistic (2PL) item response model for binary responses. In this example, the responses to the items in a test are viewed as nested within students (Adams, Wilson, & Wu, 1997; Mellenbergh, 1994). Since students are nested within schools, the resulting model is a three-level generalized linear mixed model with factor loadings or discrimination parameters. It is straightforward to incorporate covariates for students and schools. Such models are important for analyzing large-scale survey assessments such as the Programme for International Student Assessment (PISA) and the National Assessment of Educational Progress (NAEP) (e.g., Li, Oranje, & Jiang, 2009).
The outline of the article is as follows. In Section 2, we present our estimation approach in detail, including estimation of standard errors. The two examples follow in Section 3. For each example, a real dataset is analyzed and a small simulation study is performed. We briefly outline alternative models that can be handled by our approach in Section 4 and end with concluding remarks in Section 5. R code for the first application is presented in Appendix A (see the online Appendix, available at http://jeb.sagepub.com/supplemental).
2. Estimation
2.1. Profile-Likelihood Method
Suppose there are two sets of parameters,
We have written an R program to implement the nested maximizations described above. The
2.2. Standard Errors for
The covariance matrix of the parameter estimates is usually estimated by the inverse of the observed Fisher information matrix (minus the Hessian of the log-likelihood function evaluated at the ML estimates).
The required information matrix
2.3. Standard Errors for
Unlike for
In this section, we discuss three ways of estimating adjusted standard errors for
2.3.1. Hessian Matrix
The most straightforward way to obtain the correct standard errors for
2.3.2. Likelihood Ratio
The second method is based on the fact that the difference in twice the log likelihood between two nested models approximately equals the corresponding Wald statistic. In models where the log likelihood is quadratic in the parameters, such as linear mixed models with known variance parameters, the approximation becomes exact.
Suppose L and
A similar method was proposed by Miettinen (1976) who used any chi-square statistic (such as a Mantel–Haenszel statistic) for g in Equation (3). This approach was criticized by Halperin (1977) and Greenland (1984) because, for many parameters, the approximation works only when the null hypothesis is true. Here we assume, as is typically done in generalized linear mixed models, that the standard errors of regression coefficients do not depend on the true values of the coefficients (we will not apply this method to variance–covariance parameters). Also, by specifying a null value near the estimate (instead of a null hypothesis of “no association” as in Miettinen, 1976), we assume only that the log likelihood is quadratic near the mode, an assumption inherent in the estimation of asymptotic standard errors.
Notice that the likelihood ratio method requires p maximizations.
2.3.3. Delta Method
It follows from Parke (1986) that the asymptotic covariance matrix of
This method requires q maximizations if we use
Both the likelihood ratio and delta methods produce similar results in similar computation time. They both also involve a choice of d in Equations (3) and (4) that could affect the results. For the delta method, we can avoid an arbitrary choice of d using an established method for numerical derivatives (e.g., R package
3. Applications
The profile-likelihood method allows us to estimate a wide range of generalized (crossed) random effects models with factor structures. In this section, we describe two applications of this method.
3.1. Crossed Random Effects Model With Persistence Parameters
The first example is a study of school effects using longitudinal data where students are in middle school during the first two waves and in high school in Waves 3 and 4. The response variable could be vertically scaled achievement scores or psychological scales such as self-esteem where the questions do not change over time. Figure 1 shows the structure of the data as a diagram similar to those suggested by Browne, Goldstein, and Rasbash (2001).

Diagram for the longitudinal cross-classified data.
In the figure, rectangles represent units or clusters. A single arrow represents a nested relationship and unconnected rectangles at the same height represent a cross-classified relationship. The four repeated measures (Time 1 to Time 4) at the bottom level are nested within students. Students are nested both within middle school and high school but middle and high schools are crossed. The dashed arrows indicate that the first two measures (Time 1 and Time 2) are nested within middle school but the last two measures (Time 3 and Time 4) are nested within high school.
The required model is a crossed random effects model because students belong to a cross-classification of middle school and high school across time (e.g., see Goldstein, 1987; Raudenbush, 1993). We will fit the model
The model has occasion-specific parameters,
Figure 2 represents the model by adapting the diagram notation of Rabe-Hesketh et al. (2004) to models with nonnested random effects. Circles represent latent variables and rectangles observed variables. The frames represent the levels represent at which variables within them vary. Variables located within the frame labeled “MS” vary between middle schools and variables located within the frame labeled “HS” vary between high schools. The latent variable δ s for students is located in the intersection of these frames because subjects are cross-classified by middle schools and high schools. Arrows connecting latent and/or observed variables represent linear relations. We see that the middle school latent variable δ m affects all four responses, whereas the high school latent variable affects only Responses 3 and 4. The short arrows pointing at the responses from below represent the residuals etsmh .

Diagram for the crossed random effects model with persistence parameters.
Using the framework of Equation (2), we can write the factor structures as
This model is similar to multiple membership models that include weights for school effects in proportion to the amount of time spent in each school (e.g., Goldstein, Burgess, & McConnell, 2007; Hill & Goldstein, 1998). Unlike multiple membership models, however, we treat the weights (or impacts) of school effects as parameters to be estimated.
McCaffrey et al. (2004) employed a similar model to investigate teacher contributions to student achievement. They considered situations where students were cross-classified by the different teachers who taught them in different grades. Teacher effects were modeled as random effects and the varying impacts of the teacher effects (called persistence parameters) were estimated by ML. They used an unstructured covariance matrix for the Level-1 residuals e
tsmh
. While such a specification is currently not possible in
Because of the computational difficulty with ML estimation, Lockwood et al. (2007) suggested a Bayesian formulation of their earlier model (McCaffrey et al., 2004) and its extensions and used Markov chain Monte Carlo (MCMC). Lockwood et al. (2007) developed an MCMC algorithm in C since WinBUGS (Spiegelhalter et al., 1996) was prohibitively slow for their large datasets.
3.1.1. Empirical Study
We applied this model to the Korea Youth Panel Survey (KYPS) that sampled middle schools in the first stage and then randomly selected one class per school in the second stage. Students and parents were interviewed every year from 2003 to 2008. We analyzed data on 3,281 students observed in 104 middle schools at Waves 1 and 2 and 2,924 students followed up after dispersing into 860 high schools at Waves 3 and 4.
About 2.7% students switched their school membership during the middle school or high school years. Including them would necessitate making assumptions regarding the effects of the first and second middle schools in the second and subsequent waves of data and about the effects of the first and second high schools in the fourth wave of data. Since the portion of those students was small, we excluded them from the data for simplicity. In addition, 559 students with missing school identifiers were deleted. We handled the drop in student numbers after transition to high school by analyzing all available data under the missing at random (MAR) assumption (except for deleting 31 students who dropped out in Waves 2 and 4, although they could have easily been included). Missing school identifiers could alternatively be handled by assigning students to dummy schools as suggested by Lockwood et al. (2007).
There are an average of 34 students per middle school and 4 per high school, implying a highly sparse cross-classification between middle school and high school. The number of middle schools per high school ranges from 1 to 5, whereas the number of high schools per middle school ranges from 2 to 17. The response variable is self-esteem which is a mean-composite variable computed from six 5-point Likert-type scale items. The mean (standard deviation) of self-esteem is 3.16 (0.62), 3.26 (0.62), 3.31 (0.60), and 3.33 (0.61) at Waves 1, 2, 3, and 4, respectively. The internal consistency of the measures (Cronbach’s α) is on average 0.734.
Table 1
lists the result of fitting the model in Equation (5) to the data. All parameters were estimated using the profile-likelihood method described in the previous section. We used the delta method described in Section 2.3 to estimate the adjusted standard errors for the regression coefficients
Parameter Estimates and Standard Errors for Crossed Random Effects Model With Persistence Parameters
In the fixed part,
In the random part, we observe that the estimated within-student and between-student variation (
We do not provide the standard errors for variance parameters because the use of the standard errors for Wald-type tests and confidence intervals is inappropriate for these parameters (e.g., Berkhof & Snijders, 2001).
3.1.2. Simulation Study
Keeping the structure of the data as in the empirical application, we simulated new responses from the model fitted in the previous section with parameters set to the estimates in Table 1. This parametric bootstrapping procedure allows us to assess properties of the estimator. Table 2 summarizes the results for 100 replicates. The standard errors for the regression coefficients were obtained using the delta method.
Results of the Simulation Study for Crossed Random Effects Model With Persistence Parameters Estimated Using the Profile-Likelihood Method
Note. θ are the true parameters,
The estimated bias (
3.2. Multilevel 2PL Item Response Model
Another useful application is the multilevel 2PL item response model. One-parameter item response models can be viewed as two-level logistic regression models for binary responses Yip
to item i by person p, nested in persons, where the person ability is a random intercept and item difficulties are regression coefficients of item dummy variables (e.g., Adams et al., 1997; Mellenbergh, 1994)
Unlike one-parameter models, two-parameter models are no longer standard generalized linear mixed models due to the item discrimination parameters multiplying the latent variable. The two-parameter model is usually written as
Multilevel versions of the two-parameter model for student p nested in school k have been suggested by Fox & Glas (2001) and Rabe-Hesketh et al. (2004). The model we will consider here can be written as
Figure 3 shows a path diagram for the two-parameter multilevel item response model in Equation (8) using the path diagram conventions from Rabe-Hesketh et al. (2004). Here, the paths no longer represent linear relations and the short arrows represent Bernoulli variability instead of additive errors.

Diagram for the multilevel two-parameter item response model.
Multilevel two-parameter item response models have seldom been used due to computational obstacles. Standard software for item response models cannot handle latent variables at higher levels, whereas software for mixed models cannot estimate factor loadings or discrimination parameters. The general packages
In this section, we show how a multilevel two-parameter item response model can be fitted using the profile-likelihood method. Unlike linear mixed models for continuous responses, generalized linear mixed models for binary responses do not have a closed-form likelihood because the integrals over the random effects or latent variables are intractable. Whereas some software uses adaptive quadrature to evaluate the integrals numerically, the
As described in Section 4, more complex measurement models can also be estimated using the profile-likelihood approach.
3.2.1. Empirical Study
We analyzed data collected by Doolaard (1999), and previously analyzed by Cho and Rabe-Hesketh (2011), Fox and Glas (2001), and Vermunt (2007). (The data can be downloaded from http://www.statisticalinnovations.com/products/latentgold_datasets.html.) The data are from a Dutch primary school mathematics test that includes 18 items taken by 2,156 students who attended 97 schools in the Netherlands. Equation (8) was fitted using the profile-likelihood method. To estimate the adjusted standard errors for the fixed-effect parameters, the delta method was used. Differences between the delta and likelihood ratio methods were smaller than 0.001. The naive standard errors from
For comparison, the model was also fitted using
Parameter Estimates and Standard Errors for Multilevel Two-Parameter Item Response Model Using gllamm and the Profile-Likelihood Method
Note.
The last columns of Table 3 show the relative difference for all parameters and standard errors defined as the estimate using profile-likelihood minus the estimate using
3.2.2. Simulation Study
A small simulation study was carried out to evaluate the profile-likelihood method for Equation (8). We generated 50 datasets in which there are 10 dichotomous items for 1,000 students in 50 schools with 20 students per school. For each dataset, we fitted the model using both the profile-likelihood method and the
Summary of the Simulation Study for Multilevel Two-Parameter Item Response Model Using the Profile-Likelihood Method and gllamm
Note. θ are the true parameters,
The results for the profile-likelihood method and
4. Other Models That Can Be Estimated Using the Profile-Likelihood Approach
Any model that becomes a standard model when some parameters are set to known constants can be fitted using the profile-likelihood idea. Most of the models discussed in this section include factor structures and become standard generalized mixed models when the factor loadings are known.
An obvious extension of the multilevel two-parameter item response model discussed in Section 3 is to include more hierarchical levels and cross-classified levels, such as students nested in schools cross-classified with neighborhoods. To our knowledge, such a model has not been fitted before. It would also be straightforward to estimate multidimensional multilevel item response models (see, e.g., Goldstein, Bonnet, & Rocher, 2007) as outlined for the single-level case in the introduction. An example would be a bifactor model (Cai, Yang, & Hansen, 2011; Gibbons & Hedeker, 1992; Jeon, Rijmen, & Rabe-Hesketh, in press) where the entire test can be viewed as Level 3, whereas the item bundles or testlets are at Level 2, and the items are at Level 1.
Covariates could be added to the measurement model in several ways. Adding person or school covariates
Interactions between person covariates and item dummies can be added to the model to accommodate and test for differential item functioning (DIF); if such an interaction is included in the
We can also structure the difficulty parameters as a linear combination of item covariates (LLTM; De Boeck & Wilson, 2004; Fisher, 1983). When there is no discrimination parameter, the model is a standard generalized linear mixed model. Structuring the discrimination parameters this way in addition to the difficulty parameters, as suggested by Embretson (1999), we obtain (in the single-level case)
As discussed by De Boeck et al. (2011), models for polytomous responses with a continuation ratio logit link can be estimated by software for binary responses such as
For longitudinal data, Meredith and Tisak (1988) model nonlinear growth by specifying a growth curve model where the random coefficient does not multiply the known times associated with the measurement occasions, as in linear growth curve models, but unknown factor loadings to be estimated. The model has a similar structure to our model with persistence parameters, but it also includes a random intercept. For identification, two of the factor loadings are typically set to zero and one.
Factor loadings can also be used to relax the homoscedasticity assumption for random intercepts and random coefficients. For example, to let the variance of the random coefficient δ1j
of a covariate zij
differ between males and females, specify
Factor structures are not the only model extension that can be handled using the profile-likelihood approach. We could also estimate extensions of generalized linear (mixed) models with nonlinear terms such as
The above list of models that can be estimated using the profile-likelihood method clearly does not exhaust all possibilities but should give a flavor of the power of this approach.
5. Concluding Remarks
In this article, we have developed a simple approach for estimating complex models by ML using standard software and minimal programming. The method works whenever setting some of the parameters of the model to known constants turns the model into a standard model. An important class of models that can be estimated this way are generalized linear mixed models with factor loadings. Such models include random effects or latent variables weighted by some unknown parameters which are called factor loadings, persistence parameters, or discrimination parameters depending on the context.
We have described two applications in this article. Crossed random effects models with persistence parameters are useful for the assessment of value-added effects for teachers and schools. Multilevel two-parameter item response models are important for analysis of large-scale assessment studies such as NAEP and PISA.
We implemented the profile-likelihood method using the
Our implementation of the profile-likelihood method shares advantages as well as disadvantages of
The profile-likelihood method could also be implemented in other programming environments such as
Finally, as described in Section 4, we emphasize that the possible applications go further than the supplied cases. Much broader classes of models can be estimated using the proposed profile-likelihood approach.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
