Abstract
When considering the two-parameter or the three-parameter logistic model for item responses from a multiple-choice test, one may want to assess whether the lower asymptote parameters in the item response function are needed, and thus whether the three-parameter item response model is warranted. This study reports the sensitivity of the overall model fit test M2 in detecting the presence of nonzero lower asymptotes in the item response function under normal and nonnormal ability distribution conditions.
When item response theory (IRT) is used in research and practice, the two-parameter logistic (2PL) and three-parameter logistic (3PL) models are popular, typically estimated by marginal maximum likelihood (MML) under the assumption that the population ability distribution is normal (MML-normal for short). If item responses come from a multiple-choice test, one may want to assess whether pseudo-guessing parameters are needed in the item response function (IRF), and thus whether the 3PL model is warranted, given that the 3PL model is more difficult to estimate than the 2PL model. This study uses simulation to examine the sensitivity of the overall model–data fit test M2 (Joe & Maydeu-Olivares, 2010; Maydeu-Olivares & Joe, 2005, 2006) in detecting nonzero lower asymptotes in the IRF. (Whereas the popular likelihood ratio test for nested models is a relative model fit test (Haberman, 1977), M2 is an overall, or absolute, model fit test. It is a limited-information test, proposed to overcome the problem of sparse data in well-known full-information overall model–data fit tests such as Pearson’s X².)
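For reference, the two item response functions can be sketched as follows, assuming the common logistic parameterization without the 1.7 scaling constant (the section does not state which parameterization was used). The 2PL IRF is simply the 3PL IRF with the lower asymptote c fixed at 0; the function names are illustrative only.

```python
import numpy as np

def irf_3pl(theta, a, b, c):
    """3PL item response function: probability of a correct response.

    a: discrimination, b: difficulty, c: lower (pseudo-guessing) asymptote.
    Assumes the logistic metric without the D = 1.7 scaling constant.
    """
    theta = np.asarray(theta, dtype=float)
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def irf_2pl(theta, a, b):
    """2PL item response function: the 3PL with the lower asymptote fixed at 0."""
    return irf_3pl(theta, a, b, 0.0)

# At theta = b the 2PL gives 0.5, whereas the 3PL gives c + (1 - c) / 2.
print(irf_2pl(0.0, 1.2, 0.0))        # 0.5
print(irf_3pl(0.0, 1.2, 0.0, 0.2))   # c + (1 - c) / 2
# Far below b, the 3PL approaches c rather than 0.
print(irf_3pl(-6.0, 1.2, 0.0, 0.2))  # close to c
```

The last line shows why the asymptote matters mainly for low-ability examinees: it is only well below b that the two curves differ by roughly c.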
The key to determining how well M2 detects nonzero lower asymptotes, when the 2PL model is fitted to data generated from the 3PL model, is how much discrepancy the omitted asymptotes create between the observed and model-expected first- and second-order marginals. Typically, fewer examinees fall in the low or very low ability range than in the middle ability range. Consequently, under an ability distribution such as the normal, the discrepancy due to nonzero asymptotes may not manifest clearly enough for M2 to have high power. It is conjectured that M2 could detect nonzero lower asymptotes better under a positively skewed (PS) ability distribution than under a normal or a negatively skewed (NS) ability distribution.
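To make the first- and second-order marginals concrete, the sketch below computes the model-implied univariate marginals P(X_j = 1) and bivariate marginals P(X_j = 1, X_k = 1) under a standard normal ability distribution using Gauss-Hermite quadrature. It is a hypothetical illustration, not the authors' code: it compares the 3PL marginals with those of a 2PL that keeps the same a and b. A fitted 2PL would re-estimate a and b and absorb part of the difference, so this raw difference overstates what M2 actually sees.

```python
import numpy as np

def irf_3pl(theta, a, b, c):
    # 3PL IRF in the logistic metric (no D = 1.7 constant); c = 0 gives the 2PL.
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def lower_order_marginals(a, b, c, n_nodes=61):
    """Model-implied first- and second-order marginals under N(0, 1) ability.

    first[j]     = P(X_j = 1)
    second[j, k] = P(X_j = 1, X_k = 1) for j != k (the diagonal is not a marginal)
    """
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    # Nodes and weights for the weight function exp(-x^2 / 2), i.e., the normal kernel.
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    weights = weights / weights.sum()           # normalize to a probability mass
    P = irf_3pl(nodes[:, None], a, b, c)        # shape (n_nodes, n_items)
    first = weights @ P                         # sum_q w_q P_qj
    second = (P * weights[:, None]).T @ P       # sum_q w_q P_qj P_qk
    return first, second

# Hypothetical items: easy, medium, and hard.
a = [1.0, 1.5, 1.0]
b = [-1.0, 0.0, 1.5]
f2, s2 = lower_order_marginals(a, b, [0.0, 0.0, 0.0])
f3, s3 = lower_order_marginals(a, b, [0.2, 0.2, 0.2])
print(np.round(f3 - f2, 3))  # the shift in P(X_j = 1) grows with item difficulty
```

Replacing the standard normal nodes and weights with those of a skewed distribution would show how the ability distribution changes the size of these discrepancies.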
Study Design
The data simulation factors were test length (TL = 15 and 30), sample size (N = 500 and 1,500), IRF form (2PL or 3PL model), and five ability distributions: positively skewed nonnormal2 (PS-nonnormal2), positively skewed nonnormal (PS-nonnormal), standard normal, negatively skewed nonnormal (NS-nonnormal), and negatively skewed nonnormal2 (NS-nonnormal2). Crossing the four factors yielded 40 data generation conditions (2 × 2 × 2 × 5).
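The fully crossed design can be enumerated as a quick check of the condition count. The sketch below only lists the factor levels named in the text; the string labels are illustrative.

```python
from itertools import product

test_lengths = [15, 30]
sample_sizes = [500, 1500]
irf_forms = ["2PL", "3PL"]
ability_dists = ["PS-nonnormal2", "PS-nonnormal", "normal",
                 "NS-nonnormal", "NS-nonnormal2"]

# Every combination of the four factors is one data generation condition.
conditions = list(product(test_lengths, sample_sizes, irf_forms, ability_dists))
print(len(conditions))  # 2 * 2 * 2 * 5 = 40
```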
Mean, Standard Deviation (SD), Skewness, and Excess Kurtosis for Simulated Abilities.
Note. PS = positively skewed; NS = negatively skewed.
Results
Model estimation converged for all replicated data sets in most conditions, with solutions passing the second-order test, which indicates whether the solution is a local maximum. Six of the 40 conditions had between one and four nonconverged replications out of 500. These nonconverged replications were excluded from the calculation of the M2 rejection rates, which are given in Table 2. (At a reviewer’s request, a separate simulation under the same data-generating conditions was conducted for the likelihood ratio test for nested models comparing the 2PL and 3PL models. Its results are not included here but are available from the authors upon request.)
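The rejection-rate calculation described above can be sketched as follows. The inputs (one M2 p-value and one convergence flag per replication) and the function name are hypothetical; the Monte Carlo standard error is an addition not reported in the text, included because it helps judge whether a rate is "around" the nominal level.

```python
import math

def rejection_rate(p_values, converged, alpha=0.05):
    """Share of converged replications with p < alpha, plus its Monte Carlo SE.

    Nonconverged replications are dropped before the rate is computed.
    """
    kept = [p for p, ok in zip(p_values, converged) if ok]
    if not kept:
        raise ValueError("no converged replications")
    rate = sum(p < alpha for p in kept) / len(kept)
    mc_se = math.sqrt(rate * (1.0 - rate) / len(kept))
    return rate, mc_se, len(kept)

# Toy input: the third replication did not converge and is excluded.
p_vals = [0.01, 0.20, 0.04, 0.70, 0.03]
ok = [True, True, False, True, True]
rate, se, n = rejection_rate(p_vals, ok)
print(rate, se, n)
```

With 500 replications and a true rejection rate of .05, the Monte Carlo SE is sqrt(.05 × .95 / 500) ≈ .0097, so observed rates within roughly two percentage points of 5% are consistent with the nominal level.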
Rejection Rates of M2.
Note. PL = parameter logistic; IRF = item response function; PS = positively skewed; NS = negatively skewed.
The power of M2 to detect nonnormality was poor: rejection rates were around the nominal significance level of 5% for most conditions and never exceeded 10% (see the results for the 2PL IRF data in Table 2). The power of M2 to detect nonzero lower asymptotes (i.e., the results for the 3PL IRF data in Table 2) was unsatisfactory overall. Strictly speaking, the rejection rates for the 3PL IRF data reflect both the nonnormality of the ability distributions and the nonzero lower asymptotes; however, because nonnormality alone had little impact, we interpret these rejection rates as driven mainly by the nonzero lower asymptotes. The rejection rates free of any confounding by nonnormal ability are those from the normal condition with 3PL IRF data. Only two conditions yielded more than 50% power with the nonzero lower asymptote data: TL = 30 and N = 1,500, with either the normal or the PS-nonnormal ability distribution.
The weak power of M2 to detect nonnormality is consistent with the previously cited studies, and our study extends that finding to additional nonnormal ability distribution conditions. The low power of M2 in this scenario (i.e., zero lower asymptotes with nonnormal ability) may reflect the robustness of the 2PL model in reproducing the first- and second-order marginal subtables. Future studies that examine the differences in these lower order marginals in detail could corroborate this conjecture. Regarding the main focus of this study, the sensitivity of M2 to nonzero lower asymptotes, the present findings suggest that M2 needs a large sample and a long test to achieve satisfactory power. For M2 to have better than an even chance of detecting nonzero lower asymptotes when ability is normally distributed, a test of at least 30 items and a sample of at least 1,500 examinees appear necessary, based on the present results. Note that the lower asymptotes were simulated from a truncated normal distribution with a mean of .23; if the average lower asymptote is smaller, a much longer test and a much larger sample would be required. Finally, the conjecture that a PS ability distribution aids the detection of nonzero lower asymptotes was supported when the test was long (TL = 30): the PS-nonnormal condition then showed higher power than the normal condition. However, in the PS-nonnormal2 condition, the positive skew did not raise the power of M2 to detect nonzero lower asymptotes.
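The expectation that smaller asymptotes require longer tests and larger samples can be motivated with a simple identity. Holding a_j and b_j fixed (a simplification, since a fitted 2PL would re-estimate them), and writing g(θ) for the ability density and π_j for the first-order marginal P(X_j = 1), the 3PL marginal differs from the 2PL one by an amount proportional to c_j:

```latex
\pi_j^{\mathrm{3PL}}
  = \int \left[ c_j + (1 - c_j)\, P_j^{\mathrm{2PL}}(\theta) \right] g(\theta)\, d\theta
  = c_j + (1 - c_j)\, \pi_j^{\mathrm{2PL}},
\qquad
\pi_j^{\mathrm{3PL}} - \pi_j^{\mathrm{2PL}} = c_j \left( 1 - \pi_j^{\mathrm{2PL}} \right).
```

For example, with c_j = .23 and π_j^2PL = .5 the shift is .115; halving c_j halves it, leaving less discrepancy for M2 to detect.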
Although PS-nonnormal2 had the same degree of nonnormality as PS-nonnormal in terms of skewness and kurtosis, its mean and SD differed from 0 and 1, respectively. This difference has a design effect practically equivalent to generating more difficult and more discriminating items than in the conditions with a mean of 0 and an SD of 1. Thus, in the PS-nonnormal2 condition, the effect of the PS distribution seems to have been offset by this design effect with respect to how the guessing effect manifests in the lower order marginals. Adjustments to the M2 test, or alternative tests and procedures that improve the detection of nonzero lower asymptotes in IRT applications, could be explored in future research.
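The design-effect argument follows from the scale indeterminacy of the logistic IRF. Writing the generating ability as θ = μ + σθ*, where θ* has mean 0 and SD 1 (μ and σ are generic symbols here, not the specific values used for PS-nonnormal2), the logit of the IRF can be rewritten on the standardized scale that MML-normal estimation assumes:

```latex
a_j(\theta - b_j)
  = a_j(\mu + \sigma\theta^{*} - b_j)
  = (a_j\sigma)\left( \theta^{*} - \frac{b_j - \mu}{\sigma} \right),
\qquad
a_j^{*} = a_j\sigma, \quad b_j^{*} = \frac{b_j - \mu}{\sigma}, \quad c_j^{*} = c_j .
```

A non-standard mean and SD are therefore equivalent to a standardized ability distribution with rescaled discriminations and difficulties, while the lower asymptotes are unchanged.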
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
