Abstract
Researchers have developed a characteristic curve procedure to estimate the parameter scale transformation coefficients in test equating under the nominal response model. In this study, the delta method was applied to derive expressions for the standard errors of the estimated parameter scale transformation coefficients. This brief report presents the results of a simulation study that examined the accuracy of the derived formulas and compared the performance of this analytical method with that of the multiple imputation method. The results indicated that the standard errors produced by the delta method were very close to the criterion standard errors, as well as to those yielded by the multiple imputation method, under all the simulation conditions.
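The nominal response model (NRM; Bock, 1972) is an item response theory (IRT; Baker & Kim, 2004) model for analyzing items that elicit nominal or categorical responses. It has recently been used to examine the functioning of categories for items with ordered categories (e.g., Preston et al., 2011; Preston et al., 2015). Under the NRM, the probability of a respondent with ability level $\theta$ selecting category $k$ of item $j$, which has $m_j$ categories, is

$$P_{jk}(\theta) = \frac{\exp(a_{jk}\theta + c_{jk})}{\sum_{h=1}^{m_j} \exp(a_{jh}\theta + c_{jh})}, \tag{1}$$

where $a_{jk}$ and $c_{jk}$ are the slope and intercept parameters for category $k$ of item $j$.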
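As a small illustration of the model just described, the sketch below computes NRM category probabilities for a single item. It assumes the common slope-intercept parameterization, in which category $k$ has logit $a_k\theta + c_k$; the function name `nrm_probs` and the parameter values are hypothetical and are not taken from the study.

```python
import math

def nrm_probs(theta, slopes, intercepts):
    """Category probabilities for one item under the nominal response model.

    Assumes the slope-intercept form: logit_k = a_k * theta + c_k.
    """
    logits = [a * theta + c for a, c in zip(slopes, intercepts)]
    m = max(logits)  # subtract the maximum for numerical stability
    expv = [math.exp(z - m) for z in logits]
    total = sum(expv)
    return [e / total for e in expv]

# Hypothetical three-category item (illustrative values only)
slopes = [-0.8, 0.1, 0.7]
intercepts = [0.3, 0.2, -0.5]

probs = nrm_probs(0.5, slopes, intercepts)
print([round(p, 3) for p in probs])
```

The probabilities sum to one at any ability level, and as $\theta$ increases the category with the largest slope becomes the most likely response.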
One aspect of IRT equating is to place the parameter estimates for the items on the tests to be equated onto the same scale (Kolen & Brennan, 2014). This is normally accomplished by applying linear transformations to the item parameter estimates. Suppose that the parameters for the items in Test Y are transformed to the scale of the parameters for the items in Test X. In test equating under the NRM, the item category slope ($a_{jk}$) and intercept ($c_{jk}$) parameters of Test Y are transformed by

$$a_{jk}^{*} = \frac{a_{jk}}{A} \tag{2}$$

and

$$c_{jk}^{*} = c_{jk} - \frac{B}{A}\,a_{jk}, \tag{3}$$

where A and B in Equations 2 and 3 are the two parameter scale transformation coefficients. A characteristic curve procedure for obtaining A and B in test equating under the NRM was proposed by Baker (1993) and later revised by Kim and Hanson (2002).
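The role of A and B can be sketched in code. The example below assumes the slope-intercept form of the NRM and the ability transformation $\theta^{*} = A\theta + B$, under which slopes are divided by A and intercepts are shifted by $-(B/A)$ times the slope. The `criterion` function is a simplified, one-directional sum of squared differences between category response curves; the published procedures may differ in weighting and symmetry, and all names and values here are hypothetical.

```python
import math

def nrm_probs(theta, slopes, intercepts):
    """NRM category probabilities (slope-intercept form, assumed)."""
    logits = [a * theta + c for a, c in zip(slopes, intercepts)]
    m = max(logits)
    expv = [math.exp(z - m) for z in logits]
    total = sum(expv)
    return [e / total for e in expv]

def transform_item(slopes, intercepts, A, B):
    """Place Test Y category parameters on the Test X scale."""
    new_slopes = [a / A for a in slopes]
    new_intercepts = [c - (B / A) * a for a, c in zip(slopes, intercepts)]
    return new_slopes, new_intercepts

def criterion(A, B, items_x, items_y, thetas):
    """One-directional characteristic curve loss (simplified sketch)."""
    total = 0.0
    for (ax, cx), (ay, cy) in zip(items_x, items_y):
        ay_t, cy_t = transform_item(ay, cy, A, B)
        for theta in thetas:
            px = nrm_probs(theta, ax, cx)
            py = nrm_probs(theta, ay_t, cy_t)
            total += sum((p - q) ** 2 for p, q in zip(px, py))
    return total

# Hypothetical common items on the Test X scale: (slopes, intercepts)
items_x = [([-0.8, 0.1, 0.7], [0.3, 0.2, -0.5]),
           ([-0.5, -0.2, 0.7], [0.1, 0.4, -0.5])]

# Build error-free Test Y parameters by inverting the transformation
A_true, B_true = 1.2, 0.5
items_y = [([A_true * a for a in ax],
            [c + B_true * a for a, c in zip(ax, cx)])
           for ax, cx in items_x]

thetas = [-4 + 0.5 * i for i in range(17)]  # evenly spaced ability points
loss_true = criterion(A_true, B_true, items_x, items_y, thetas)
loss_identity = criterion(1.0, 0.0, items_x, items_y, thetas)
print(loss_true, loss_identity)
```

With error-free parameters the loss vanishes at the true coefficients and is positive elsewhere. In practice the parameters are estimates, so A and B are taken as the values that minimize the loss.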
The reporting of standard errors of equating has been advocated as standard practice when conducting test equating (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014). Despite the availability of the characteristic curve procedure for finding A and B, approaches to estimating the standard errors for the estimates of A and B under the NRM have not yet been developed or examined. There are at least three approaches to obtaining the standard errors for the estimates of A and B in test equating under IRT models: the bootstrap method (Kolen & Brennan, 2014), the delta method (Andersson, 2018; Ogasawara, 2001; Zhang, 2020a), and the multiple imputation method (Zhang & Zhao, 2019). The bootstrap method is a resampling approach. It requires an IRT calibration on each bootstrap sample, which makes it very computationally intensive and time-consuming (Kolen & Brennan, 2014). Therefore, it was not considered in this study. The multiple imputation method uses multiple sets of random imputations of the item parameter estimates to compute the standard errors of the equating coefficients, and it is substantially less computationally demanding than the bootstrap method (Zhang, 2020b; Zhang & Zhao, 2019). The delta method is an analytical approach whose application depends on deriving relatively complicated mathematical formulas for the standard error expressions (e.g., Andersson, 2018; Ogasawara, 2001; Zhang, 2020a). However, once the formulas have been derived, the delta method is much more efficient than both the bootstrap method and the multiple imputation method. More detailed comparisons of the three standard error estimation procedures can be found elsewhere (e.g., Kolen & Brennan, 2014; Zhang, 2020b; Zhang & Zhao, 2019).
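The contrast between the delta method and the multiple imputation method can be sketched on a toy problem. In the example below, the function `g` is a simple ratio of two parameters standing in for the mapping from item parameter estimates to a transformation coefficient; it is not the characteristic curve solution, and the estimates and covariance matrix are hypothetical. The delta method uses the first-order approximation $\mathrm{Var}[g(\hat{\beta})] \approx \nabla g^{\top} \Sigma \, \nabla g$, with a numerical gradient here in place of the analytical derivatives. The imputation-style approach draws parameter vectors from a normal distribution centered at the estimates and takes the standard deviation of the recomputed values.

```python
import math
import random
import statistics

def numerical_gradient(g, beta, h=1e-6):
    """Central-difference gradient of g at beta."""
    grad = []
    for i in range(len(beta)):
        up, down = list(beta), list(beta)
        up[i] += h
        down[i] -= h
        grad.append((g(up) - g(down)) / (2 * h))
    return grad

def delta_se(g, beta_hat, cov):
    """First-order delta-method standard error of g(beta_hat)."""
    grad = numerical_gradient(g, beta_hat)
    n = len(beta_hat)
    var = sum(grad[i] * cov[i][j] * grad[j]
              for i in range(n) for j in range(n))
    return math.sqrt(var)

def cholesky(cov):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L

def mi_se(g, beta_hat, cov, n_draws=4000, seed=1):
    """SD of g over normal draws centered at beta_hat (imputation-style)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    n = len(beta_hat)
    values = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draw = [beta_hat[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                for i in range(n)]
        values.append(g(draw))
    return statistics.stdev(values)

def g(beta):
    """Toy stand-in for a transformation coefficient (hypothetical)."""
    return beta[0] / beta[1]

beta_hat = [1.2, 0.8]                         # hypothetical estimates
cov = [[0.0025, 0.0005], [0.0005, 0.0016]]    # hypothetical covariance

se_delta = delta_se(g, beta_hat, cov)
se_mi = mi_se(g, beta_hat, cov)
print(se_delta, se_mi)
```

The delta-method value needs only one gradient evaluation, whereas the imputation-style value requires recomputing `g` for every draw. This is the source of the efficiency difference described above when `g` is an iterative minimization.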
The delta method has been applied to obtain the standard errors for the estimates of the parameter scale transformation coefficients (i.e., A and B) in test equating under the dichotomous IRT models (e.g., Ogasawara, 2001) and some polytomous IRT models (e.g., Andersson, 2018; Zhang, 2020a). In this study, it was used to derive expressions for the asymptotic standard errors of the estimates of A and B obtained with the characteristic curve method in test equating under the NRM (Baker, 1993; Kim & Hanson, 2002). The derivations of the formulas can be found in the supplementary file. This report presents the results of applying the formulated delta method to simulated data.
Simulation Study
Method
Similar to the research designs employed in the studies of Baker (1993) and Kim and Hanson (2002), for convenience but without loss of generality, it is assumed that a common test which is composed of 20 items (i.e.,
Three manipulated factors were considered in the simulation study: sample size, number of categories, and group difference in ability distributions. Three levels of sample size were simulated:
The model calibrations were performed in the R programming language, using the package mirt (Chalmers, 2012) to fit the NRM to the simulated data. The sandwich estimator (Chalmers, 2018) was used to estimate the variance–covariance matrices for the item parameter estimates. How the variance–covariance matrices were transformed under different parameterizations is described in the supplementary file.
The R programming language was used to write code for estimating A and B and implementing the delta method to calculate the standard errors for the two coefficients. The R code can be made available upon request. For comparison, the multiple imputation method (Zhang & Zhao, 2019) was also applied to produce the standard errors for the two coefficients. The criterion standard errors were the empirical standard deviations of the estimated values of A and B calculated over 10,000 replications (Paek & Cai, 2014; Zhang & Zhao, 2019).
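The evaluation logic described above can be sketched as follows: the criterion standard error is the empirical standard deviation of the estimates across replications, and it is compared with the mean of the standard errors estimated within each replication. The example uses a deliberately simple stand-in (a sample mean and its usual standard error) rather than the equating coefficients. The function name `summarize`, the seed, and the replication settings are hypothetical, and the use of the n - 1 divisor for the standard deviation is an assumption.

```python
import math
import random
import statistics

def summarize(estimates, estimated_ses):
    """Criterion SE plus mean and SD of the estimated SEs."""
    return {
        "criterion_se": statistics.stdev(estimates),  # n - 1 divisor (assumed)
        "mean_se": statistics.mean(estimated_ses),
        "sd_se": statistics.stdev(estimated_ses),
    }

rng = random.Random(2024)
n, n_reps = 50, 500          # illustrative settings, not those of the study
estimates, estimated_ses = [], []
for _ in range(n_reps):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates.append(statistics.mean(sample))                       # point estimate
    estimated_ses.append(statistics.stdev(sample) / math.sqrt(n))   # its estimated SE

summary = summarize(estimates, estimated_ses)
print(summary)
```

An accurate standard error formula should yield a mean estimated standard error close to the criterion value, with a small standard deviation across replications.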
Results
Table 1 provides a summary of the criterion standard errors as well as the means and standard deviations of the estimated standard errors for the estimates of A and B calculated over the 300 replications. The results across the 12 simulated conditions consistently indicate that the standard errors produced by the delta method are nearly identical or extremely close to the criterion standard errors as well as those yielded by the multiple imputation method.
Table 1. Descriptive Statistics of the Standard Errors for the Estimates of the Parameter Scale Transformation Coefficients Calculated Over the 300 Replications.
Note. Values within brackets are standard deviations. Empirical = criterion standard errors; MI = multiple imputation method;
Summary
The results of the simulation study suggest that the delta approach formulated in the current study is a viable way to determine the variability of the parameter scale transformation coefficients in test equating under the NRM. The delta method is more attractive than the bootstrap method and the multiple imputation method because it is substantially less computationally intensive and time-consuming. For example, in this simulation study, the delta method could be up to 8 times faster than the multiple imputation method, which has itself been shown to be more computationally efficient than the bootstrap method (Zhang, 2020b; Zhang & Zhao, 2019). Hence, the delta method will be much more appealing than the multiple imputation method and the bootstrap method when the sample size is very large or when a complex linking design involving multiple test forms is employed (e.g., chain equating; Battauz, 2013). The delta method developed in the study offers researchers and practitioners a more feasible and computationally efficient procedure for reporting the standard errors of the parameter scale transformation coefficients in practical test equating involving the NRM.
Supplemental Material
Supplemental material, Supplemental_Material for Asymptotic Standard Errors of Parameter Scale Transformation Coefficients in Test Equating Under the Nominal Response Model by Zhonghua Zhang in Applied Psychological Measurement
Footnotes
Acknowledgements
The author would like to thank the Editor, Dr. John R. Donoghue, and the anonymous reviewers for their helpful and constructive comments.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
