Abstract
Mplus is a powerful latent variable modeling software program that has become an increasingly popular choice for fitting complex item response theory models. In this short note, we demonstrate that the two-parameter logistic testlet model can be estimated as a constrained bifactor model in Mplus with three estimators encompassing limited- and full-information estimation methods.
Introduction
The testlet model (e.g., Bradlow, Wang, & Wainer, 1999) is a popular approach used by item response theory (IRT) researchers and practitioners to address the issue of local item dependence (LID) caused by a cluster of items sharing the same stimulus, which can be a reading comprehension passage, a scenario, a graph, and so on. Estimation of this model has usually been conducted with full-information methods, including Markov chain Monte Carlo (MCMC) algorithms (e.g., Koziol, 2016; Li, Bolt, & Fu, 2006) and marginal maximum likelihood estimation (MMLE; e.g., Jiao, Wang, & He, 2013; Li, Li, & Wang, 2010), although Bolt (2005) showed that both full- and limited-information methods can be applied to multidimensional item response theory (MIRT; Reckase, 2009) models. As different estimation methods are implemented in different software programs, researchers and practitioners may have to learn an unfamiliar software program in order to use a particular estimation method. For example, SCORIGHT (Wang, Bradlow, & Wainer, 2004) and WinBUGS (Lunn, Thomas, Best, & Spiegelhalter, 2000) are software programs implementing MCMC that are often used to estimate the testlet model; for implementation of MMLE, TESTFACT (Wilson, Wood, & Gibbons, 1991), ConQuest (Wu, Adams, Wilson, & Haldane, 2007), and the SAS NLMIXED procedure (SAS Institute, 2015) are popular choices. IRTPRO (Cai, Thissen, & du Toit, 2015) is a relatively recent software program that has the flexibility to implement both MCMC and MMLE estimation methods for the testlet model.
Mplus (L. Muthén & Muthén, 1998-2012) is a popular statistical software program for latent variable modeling that is known for its provision of a large number of estimators. In this article, we show that the two-parameter logistic (2PL) testlet model, a special case of the MIRT model, can be estimated in Mplus with three different estimators, namely the robust weighted least square estimator (WLSMV; B. Muthén, du Toit, & Spisic, 1997), the maximum likelihood estimator with robust standard errors (MLR), and the Bayes estimator. Among these three estimators, WLSMV is a limited-information estimation method (the dominant approach in structural equation modeling) because it uses only bivariate information, whereas the other two are full-information estimation methods that are commonly used in IRT.
The 2PL Testlet Model
The 2PL testlet model is given as
where
The bifactor model (Gibbons & Hedeker, 1992) takes the following form:
where
It has been shown that the 2PL testlet model is a special case of the more general bifactor model in that the discrimination parameter on a testlet trait is constrained to be the product of
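For orientation, the relationship between the two models can be sketched in notation that is common in the testlet and bifactor literature. The symbols below ($a_i$, $b_i$, $\theta_j$, $\gamma_{jd(i)}$, $a_{i0}$, $a_{id}$, $d_i$, $\sigma_{\gamma_{d(i)}}$) and the sign conventions are assumptions made for illustration; they are not a reproduction of Equations (1) and (2).

```latex
% 2PL testlet model in assumed notation: person j, item i, testlet d(i) containing item i
P(y_{ij}=1 \mid \theta_j, \gamma_{jd(i)}) =
  \frac{\exp\left[a_i\left(\theta_j - b_i + \gamma_{jd(i)}\right)\right]}
       {1 + \exp\left[a_i\left(\theta_j - b_i + \gamma_{jd(i)}\right)\right]},
\qquad \theta_j \sim N(0,1), \quad \gamma_{jd(i)} \sim N\left(0, \sigma^2_{\gamma_{d(i)}}\right)

% Bifactor model with a standardized specific factor \gamma^{*}_{jd(i)} \sim N(0,1)
P(y_{ij}=1 \mid \theta_j, \gamma^{*}_{jd(i)}) =
  \frac{\exp\left(a_{i0}\theta_j + a_{id}\gamma^{*}_{jd(i)} + d_i\right)}
       {1 + \exp\left(a_{i0}\theta_j + a_{id}\gamma^{*}_{jd(i)} + d_i\right)}

% Substituting \gamma_{jd(i)} = \sigma_{\gamma_{d(i)}} \gamma^{*}_{jd(i)} into the testlet model gives
a_{i0} = a_i, \qquad a_{id} = a_i\,\sigma_{\gamma_{d(i)}}, \qquad d_i = -a_i b_i
```

Under this reading, fixing the testlet-factor slope to $a_i$ and freeing the testlet-factor variance is an equivalent way of imposing the same constraint.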
Estimation of the 2PL Testlet Model in Mplus
Estimating a 2PL testlet model in Mplus is analogous to conducting a confirmatory factor analysis (CFA) with a restricted bifactor model. Regardless of the estimator used, one constrains each item's factor loading on the secondary factor representing its testlet effect to equal its loading on the general factor. For model identification, the variance of the general factor is fixed to 1, and the variances of the secondary factors are freely estimated.
To convert the estimated factor loadings and threshold parameters to the item parameters in Equation (1), different estimators require slightly different procedures. If WLSMV is the estimator (with the default Delta parameterization), the conversion can be directly conducted with the following two equations (McDonald, 1999):
where
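As a rough illustration of this kind of conversion, the Python sketch below applies the standard normal-ogive relationship between Delta-parameterization estimates and IRT parameters. It is an assumption-laden sketch rather than a transcription of Equations (3) and (4): the function name `delta_to_irt`, the scaling constant `scale=1.7`, and the difficulty formula (threshold divided by the general-factor loading) are choices made here for illustration.

```python
import math


def delta_to_irt(loadings, cov, threshold, scale=1.7):
    """Convert one item's Delta-parameterization estimates to IRT parameters.

    loadings:  loadings on [general factor, testlet factor 1, ..., testlet factor K]
    cov:       factor covariance matrix as a list of lists, same ordering
    threshold: the item's threshold
    scale:     assumed constant linking the probit and logistic metrics
    """
    n = len(loadings)
    # Explained variance of the latent response: lambda' * Phi * lambda.
    explained = sum(
        loadings[r] * cov[r][c] * loadings[c] for r in range(n) for c in range(n)
    )
    unique = 1.0 - explained  # Delta parameterization: total latent variance is 1
    if unique <= 0.0:
        raise ValueError("explained variance must be below 1 under Delta")
    a = scale * loadings[0] / math.sqrt(unique)  # discrimination
    b = threshold / loadings[0]                  # difficulty
    return a, b


# Hypothetical item in testlet 1 of 2, with equal loadings and unit testlet variance.
a_hat, b_hat = delta_to_irt(
    loadings=[0.5, 0.5, 0.0],
    cov=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    threshold=0.25,
)
print(a_hat, b_hat)
```

Setting `scale=1.0` keeps the discrimination in the probit (normal-ogive) metric instead.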
When the estimator is MLR or Bayes, in order to use Equations (3) and (4) we need to convert the factor loadings and thresholds estimated with MLR or Bayes to be on the same scale as those estimated with WLSMV. This conversion process requires the estimated unique variance of each item, which is not in the default output if either MLR or Bayes is used. In order to obtain the estimated item unique variance, we request Mplus to produce R2 (the proportion of explained variance in each item), which can be subtracted from 1 to obtain the estimated item unique variance. According to Asparouhov and Muthén (2016), the following equations can be used to convert the factor loadings and item thresholds estimated using MLR or Bayes to those estimated using WLSMV with the default Delta parameterization:
where
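The rescaling step described above can be sketched in Python as follows. The sketch assumes that the conversion multiplies each loading and threshold by the square root of the item's unique variance, 1 - R2, which is one reading of the description and not a reproduction of Equations (5) and (6). The function name `to_delta_scale` and the example values are hypothetical.

```python
import math


def to_delta_scale(loading, threshold, r_squared):
    """Rescale MLR (probit) or Bayes estimates to the Delta metric used by WLSMV.

    Assumes unique variance = 1 - R2 and rescaling by its square root.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R2 must lie in [0, 1)")
    factor = math.sqrt(1.0 - r_squared)
    return loading * factor, threshold * factor


# Hypothetical item: probit loading 1.0 on both the general and the testlet factor,
# testlet variance 1, residual variance 1, so R2 = 2 / (2 + 1).
loading_delta, threshold_delta = to_delta_scale(1.0, 0.5, 2.0 / 3.0)
print(loading_delta, threshold_delta)
```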
A Demonstration With Simulated Data
In this section, we demonstrate with a simulated data set that the 2PL testlet model can be estimated in Mplus with three different estimators. We also show that the estimated item parameters are comparable to those obtained with NOHARM (normal ogive harmonic analysis robust method; Fraser & McDonald, 1988), a well-known IRT program for estimating MIRT models (e.g., Knol & Berger, 1991; Svetina & Levy, 2016). To simulate data, we assumed a 30-item reading comprehension test taken by 2,000 examinees whose abilities were randomly drawn from a standard normal distribution. The 30 items were evenly distributed across six testlets of five items each, and all six testlets had testlet variances equal to 1. Plugging the item parameters listed in the columns under “TRUE” in Table 3 into Equation (1), we generated a data set for the subsequent analyses in this section.
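A data-generating step of this kind might look like the following Python sketch. It follows the design stated above (2,000 examinees, 30 items, six testlets of five items, testlet variances of 1), but the item parameters are hypothetical placeholders because the Table 3 values are not reproduced here. The logistic form with `theta - b + gamma`, the function name, and the seed are likewise assumptions.

```python
import math
import random


def simulate_testlet_data(a, b, testlet_of, testlet_var, n_persons, seed=0):
    """Simulate 0/1 responses from a 2PL testlet model (assumed logistic form)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # ability from a standard normal distribution
        # One testlet effect per testlet for this person.
        gamma = [rng.gauss(0.0, math.sqrt(v)) for v in testlet_var]
        row = []
        for i in range(len(a)):
            logit = a[i] * (theta - b[i] + gamma[testlet_of[i]])
            p = 1.0 / (1.0 + math.exp(-logit))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


# Hypothetical generating values; the article's true values are those in Table 3.
N_ITEMS = 30
a_true = [0.8 + 0.04 * i for i in range(N_ITEMS)]
b_true = [-1.5 + 0.1 * i for i in range(N_ITEMS)]
testlet_of = [i // 5 for i in range(N_ITEMS)]  # items 0-4 in testlet 0, and so on
responses = simulate_testlet_data(
    a_true, b_true, testlet_of, [1.0] * 6, n_persons=2000, seed=2018
)
print(len(responses), len(responses[0]))
```

Writing each row as space-separated values would produce a file that Mplus can read through the DATA command.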
We fit the 2PL testlet model to the generated data set using the three sets of Mplus syntaxes in Figures 1 to 3. Figure 1 shows the Mplus syntax for the WLSMV estimator. Under the

Figure 1. Mplus syntax for two-parameter logistic (2PL) testlet model estimation with robust weighted least square estimator (WLSMV).
Figure 2. Mplus syntax for two-parameter logistic (2PL) testlet model estimation with maximum likelihood estimator with robust standard errors (MLR).
Figure 3. Mplus syntax for two-parameter logistic (2PL) testlet model estimation with Bayes estimator.
Within the parenthesis (line 6) we name the loadings of the 30 items on the general factor
Figure 2 shows the Mplus syntax for the MLR estimator. As this syntax is largely the same as that provided in Figure 1, we focus on the differences. One noticeable difference is that the standardized solution is requested in the OUTPUT section for the MLR estimator (line 21), which provides the R2 estimate for each item. With the WLSMV estimator, Mplus provides the R2 estimate automatically, so it is not necessary to request it in the OUTPUT section. In addition, in the Mplus syntax for the MLR estimator, we specify the link function to be probit (line 6). As mentioned earlier, the default link function with the MLR estimator in Mplus is logit, and we change it to probit to be consistent with WLSMV and Bayes.
Figure 3 shows the Mplus syntax for the Bayes estimator. Same as the syntax for the MLR estimator, the standardized solution is also requested in the OUTPUT section for the Bayes estimator, as well as
Table 1 lists the testlet factor variance estimates obtained in Mplus with the three estimators and in NOHARM. As the general factor was constrained to follow a standard normal distribution for model identification, a practice also common in the IRT framework, these estimates can be directly interpreted as testlet variance estimates without additional transformation. As can be seen, all three estimators in Mplus produce factor variance estimates close to the generating value of 1 and similar to those in NOHARM.
Table 1. Testlet Variance Estimates.
Note. NOHARM = normal ogive harmonic analysis robust method; WLSMV = robust weighted least square estimator; MLR = maximum likelihood estimator with robust standard errors.
Table 2 lists the factor loadings, item thresholds, and R2 estimates obtained with the three estimators from the Mplus output. As can be seen, the factor loading and item threshold estimates from the WLSMV estimator are not on the same scale as those from the MLR and Bayes estimators.
Table 2. Factor Loading and Item Threshold Estimates.
Note. NOHARM = normal ogive harmonic analysis robust method; WLSMV = robust weighted least square estimator; MLR = maximum likelihood estimator with robust standard errors.
To convert the factor loadings and item thresholds to item discrimination and difficulty parameters, we apply Equations (3) and (4) directly to the WLSMV estimates. For the MLR and Bayes estimates, we first apply Equations (5) and (6) to place them on the same scale as the WLSMV estimates and then plug the results into Equations (3) and (4) to obtain the corresponding parameters in the IRT framework. In the following, we use the first item as an example to illustrate how to convert the factor loadings and item thresholds obtained with the MLR estimator to their IRT analogs. For this item,
As this item belongs to the first testlet and its loadings on the other testlet factors are 0, its factor loading vector
and the factor covariance matrix
Plugging the above values into Equations (3) and (4), we obtain the item parameter values for the first item listed under “MLR” in Table 3. As can be seen, the item parameter estimates with the three estimators in Mplus are similar to those in NOHARM and close to the true generating item parameter values.
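The two-step conversion can also be checked algebraically. The derivation below is a sketch under stated assumptions: the probit residual variance is 1 under MLR and Bayes, R2 is the explained share of the latent response variance, and the conversions take the standard forms (rescaling by $\sqrt{1-R^2}$, then dividing by the Delta-scale unique standard deviation). The symbols $\lambda_i$, $\Phi$, $\tau_i$, and the superscript $\Delta$ are assumed notation and are not copied from Equations (3) to (6).

```latex
% Item i with loading vector \lambda_i, factor covariance matrix \Phi, residual variance 1
R_i^2 = \frac{\lambda_i^{\top}\Phi\lambda_i}{\lambda_i^{\top}\Phi\lambda_i + 1}
\quad\Longrightarrow\quad
1 - R_i^2 = \frac{1}{\lambda_i^{\top}\Phi\lambda_i + 1}

% Step 1: rescale the MLR or Bayes estimates to the Delta metric
\lambda_i^{\Delta} = \lambda_i\sqrt{1 - R_i^2}, \qquad
\tau_i^{\Delta} = \tau_i\sqrt{1 - R_i^2}

% The unique variance on the Delta metric is again 1 - R_i^2
1 - \lambda_i^{\Delta\top}\Phi\lambda_i^{\Delta}
  = 1 - \left(1 - R_i^2\right)\lambda_i^{\top}\Phi\lambda_i
  = 1 - R_i^2

% Step 2: convert to probit-metric IRT parameters (general-factor loading \lambda_{i0})
a_i^{\text{probit}} = \frac{\lambda_{i0}^{\Delta}}{\sqrt{1 - R_i^2}} = \lambda_{i0}, \qquad
b_i = \frac{\tau_i^{\Delta}}{\lambda_{i0}^{\Delta}} = \frac{\tau_i}{\lambda_{i0}}
```

Under these assumptions, the round trip returns the probit-metric loading itself, which offers a quick sanity check on hand calculations; a logistic-metric discrimination would additionally carry the usual scaling constant of about 1.7.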
Table 3. True and Converted Item Parameter Estimates.
Note. NOHARM = normal ogive harmonic analysis robust method; WLSMV = robust weighted least square estimator; MLR = maximum likelihood estimator with robust standard errors.
Conclusions
The testlet model is a popular statistical model frequently used in the psychometric literature to address LID due to a cluster of items sharing the same stimulus, and it has been estimated with either MCMC algorithms or MMLE implemented in various software programs. To our knowledge, there are no tutorials in the psychometric literature that show how to estimate testlet models in Mplus, a popular latent variable modeling software program that has been used to estimate various complex IRT models (e.g., Finch & Bolin, 2017; Huggins-Manley & Algina, 2015). As a result, many Mplus users may have to resort to unfamiliar statistical software programs if they would like to investigate possible LID in their data with the testlet model. This article adds to the current literature by showing that Mplus can be a viable tool for the estimation of testlet models with its provision of both limited- and full-information estimation methods.
Specifically, we demonstrated with a simulated data set that three Mplus estimators encompassing both limited- and full-information estimation methods, namely WLSMV, MLR, and Bayes, can be used to estimate the 2PL testlet model, and that the results were comparable to those from NOHARM, a well-established MIRT software program. (Although Mplus can estimate the three-parameter logistic unidimensional IRT model, its testlet analog cannot be estimated.) We did not focus on the one-parameter logistic (1PL) testlet model, but as a special case of the 2PL testlet model it can be estimated with a slight modification of the Mplus syntax provided in Figures 1 to 3. In the appendix, we provide example Mplus syntax for the 1PL testlet model with the WLSMV estimator. As the 1PL testlet model differs from the 2PL testlet model in that all the item discrimination parameters are the same, we constrain the loadings of all items on both the general factor and the specific testlet factors to equality by naming them as
As the current article is didactic in nature, only one data set was simulated; consequently, we did not compute bias, standard error, and root mean square error, the indices commonly used in simulation studies with multiple replications to evaluate parameter recovery. For applied researchers and practitioners who use Mplus and are interested in applying the testlet model, the provided Mplus code is readily usable, and there is no need to migrate to other IRT software programs that may take time and resources to learn. Methodological researchers may also benefit from the provided Mplus syntax, which can be used in simulation studies to investigate the statistical and psychometric properties of the testlet model. Because the three estimators illustrated in this article encompass different estimation methods, comparison studies of estimation methods can be conducted within Mplus without additional software programs.
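For readers planning such a simulation study, the recovery indices mentioned above could be computed along the following lines. This is a minimal Python sketch under stated assumptions: the standard error is taken as the empirical standard deviation of the estimates across replications (with an n - 1 denominator), the function name `recovery_indices` is hypothetical, and the example estimates are made-up numbers, not results from this article.

```python
import math


def recovery_indices(estimates, true_value):
    """Return (bias, empirical SE, RMSE) for one parameter across replications."""
    n = len(estimates)
    if n < 2:
        raise ValueError("at least two replications are needed")
    mean_est = sum(estimates) / n
    bias = mean_est - true_value
    # Empirical standard error: standard deviation of the estimates.
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (n - 1))
    # Root mean square error around the generating value.
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, se, rmse


# Made-up discrimination estimates from four hypothetical replications.
bias, se, rmse = recovery_indices([0.9, 1.1, 1.0, 1.2], true_value=1.0)
print(bias, se, rmse)
```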
Appendix
Mplus Syntax for 1PL Testlet Model With WLSMV Estimator
    TITLE: 1PL Testlet Model Estimation
    DATA: FILE IS r1.txt;
    VARIABLE: NAMES ARE u1-u30;
      CATEGORICAL ARE u1-u30;
    ANALYSIS: ESTIMATOR = WLSMV;
    MODEL: f by u1-u30* (l1);    ! general factor; shared label l1 equates loadings
      f1 by u1-u5* (l1);         ! testlet factors reuse the same label
      f2 by u6-u10* (l1);
      f3 by u11-u15* (l1);
      f4 by u16-u20* (l1);
      f5 by u21-u25* (l1);
      f6 by u26-u30* (l1);
      f@1; [f@0];                ! general factor: variance 1, mean 0
      f with f1@0 f2@0 f3@0 f4@0 f5@0 f6@0;    ! all factors uncorrelated
      f1 with f2@0 f3@0 f4@0 f5@0 f6@0;
      f2 with f3@0 f4@0 f5@0 f6@0;
      f3 with f4@0 f5@0 f6@0;
      f4 with f5@0 f6@0;
      f5 with f6@0;
Note. WLSMV = robust weighted least square estimator; 1PL = one-parameter logistic.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
