Item-Weighted Likelihood Method for Ability Estimation in Tests Composed of Both Dichotomous and Polytomous Items

Abstract

For mixed-type tests composed of both dichotomous and polytomous items, polytomous items often yield more information than dichotomous ones. To reflect the difference between the two types of items, polytomous items are usually preassigned larger weights. We propose an item-weighted likelihood method to better assess examinees' ability levels. Simulation results show that the ability estimates from the new procedure are more consistent with examinees' true abilities than those from the usual maximum likelihood method.

Keywords

bias, dichotomous response, generalized partial credit model, item-weighted likelihood, mixed-type test, polytomous response

Introduction

A scaling process of educational or psychological testing often involves integration of information from multiple sources of items. In general, a well-constructed test can be characterized by a set of carefully sampled items across different subdomains. It is often the case that the scoring process involves assigning different scoring weights to different item types that have different emphases. For example, in a recent large-scale state reading assessment test, each operational multiple-choice item carries a weight of 0.8 while each constructed response item (scored 0–4) carries a weight of 1.2.

The underlying rationale of assigning different weights is easy to understand. In a mixed-type test composed of both dichotomous and polytomous items, polytomous items usually carry more information concerning the level of the latent trait than dichotomous items (see, e.g., Donoghue, 1994; Embretson & Reise, 2000, p. 95; Jodoin, 2003; Penfield & Bergeron, 2005). Hence, assigning larger weights to polytomous items should lead to more accurate estimates of the latent traits than weighting all items equally.

Further progress, both theoretical and empirical, is possible on how item weights should be used in ability estimation. More specifically, how can the weight information be incorporated into a new estimation procedure so that the accuracy and precision of ability estimation are improved? The objective of this article is to develop an item weighting scheme according to different item types in mixed-type tests to achieve more accurate latent ability estimation. We propose an item-weighted likelihood estimation (IWLE) procedure that combines the three-parameter logistic (3PL) model and the generalized partial credit model (GPCM) under the assumption that item parameters are known. Furthermore, we consider a bias-reduced IWLE (BR-IWLE) method by utilizing Warm’s (1989) technique. The Fisher scoring iteration equations are theoretically derived. Simulation results indicate that the estimated trait levels given by the new IWLE procedure are more consistent with the true trait levels than those given by the usual maximum likelihood procedure.

Before the IWLE procedure is fully explained, it is important to clarify the essential difference between the weighting rationale of IWLE and those of the existing item weighting methods. For instance, Lord (1980, p. 74) considered the problem of optimal item weights for dichotomously scored items, in which the weights can differ from item to item. The focus of IWLE is different: it differentiates the information gained from different item types. Instead of assigning different weights to different items, one only needs to assign different weights to different item types, and items of the same type all have the same weight. Our goal is to rigorously inspect the maximum likelihood equation according to given item-type weights and derive the corresponding Fisher scoring function.

It is worth mentioning that assigning weights to items has always been an important application in item response theory (IRT). For example, Linacre and Wright (1995) described several ways in which weights can be implemented with Rasch computer programs (e.g., WINSTEPS/BIGSTEPS). Warm (1989) proposed a weighted likelihood estimation (WLE) method for the dichotomous IRT model that provides a bias correction to the maximum likelihood method by solving a weighted log-likelihood equation. Warm’s method has been implemented in the PARSCALE (Muraki & Bock, 2003) program, where it is selected by specifying the WML method of estimating scale scores. Recently, Penfield and Bergeron (2005) extended Warm’s correction to the case of GPCM. IWLE thus joins this family of weighting methods.

This paper is organized into two parts. The first part begins with an introduction to the IWLE procedure and then compares the performance of IWLE with that of the usual maximum likelihood estimation (MLE) by a simulation study. The second part investigates the possibility of further bias reduction by combining Warm’s (1989) WLE method with the IWLE procedure. The performance of three procedures (MLE, WLE, and BR-IWLE) is compared. Finally, a real data set from a large-scale reading assessment is used to demonstrate the estimation difference between the MLE and the BR-IWLE. The detailed derivations for the components used in the Newton–Raphson equations for the IWLE procedure are given in the Appendix.

Method

In a mixed-type test consisting of dichotomous and polytomous items, a polytomous IRT model may be sufficient for practical fitting needs, because a dichotomous item can be treated as a special case of a polytomous item. However, many studies have shown that polytomous items provide more information concerning the level of the latent trait than dichotomous items (see, e.g., Donoghue, 1994; Jodoin, 2003; Penfield & Bergeron, 2005). To reflect the difference between the two types of items and ultimately to improve latent trait estimation, we combine dichotomous and polytomous IRT models rather than just fitting a polytomous model.

3PL Model and GPCM

Let us consider a mixed-type test that consists of n items in which m are dichotomous and n − m are polytomous, and we assume that the 3PL (Birnbaum, 1968) and GPCM (Muraki, 1992, 1997) models fit the data well. To simplify the notation, the examinee subscript will not be shown in the following derivations. Then, the probability of the correct response on dichotomously scored item i at ability level θ is defined by

P_{i} (θ) = c_{i} + \frac{1 - c_{i}}{1 + exp [- D α_{i} (θ - β_{i})]},

where D is the scaling constant equal to 1.702, and α_i, β_i, and c_i are the discrimination parameter, difficulty parameter, and guessing parameter of item i, respectively. As to the polytomous items, the probability of selecting response j (where j = 0, 1,…, J) of polytomous item i at ability level θ is

P_{i j} (θ) = \frac{exp [\sum_{v = 0}^{j} D a_{i} (θ - b_{i v})]}{\sum_{v = 0}^{J} exp [\sum_{k = 0}^{v} D a_{i} (θ - b_{i k})]},

where a_i is the discrimination parameter of item i and b_iv is the location parameter of category v.
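The two response models above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `p_3pl` and `p_gpcm` and the example parameter values are assumptions made here, not part of the original study. In the GPCM, the v = 0 term of the inner sum is common to every category and cancels between numerator and denominator, so the sketch sets it to 0.

```python
import math

D = 1.702  # scaling constant used in the text


def p_3pl(theta, alpha, beta, c=0.0):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * alpha * (theta - beta)))


def p_gpcm(theta, a, b):
    """Category probabilities [P_i0, ..., P_iJ] under the GPCM.

    `b` lists the location parameters b_i1..b_iJ. The v = 0 term of the
    inner sum is common to every category and cancels, so it is set to 0.
    """
    z = [0.0]
    for b_v in b:
        z.append(z[-1] + D * a * (theta - b_v))
    z_max = max(z)  # shift exponents for numerical stability
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


# Made-up parameters: one dichotomous item and one 5-category item.
print(p_3pl(0.0, alpha=1.0, beta=0.0, c=0.2))
print(p_gpcm(0.0, a=1.0, b=[-1.5, -0.5, 0.5, 1.5]))
```

With a single location parameter and c = 0, the second category probability returned by `p_gpcm` coincides with `p_3pl`, which reflects the remark that a dichotomous item is a special case of a polytomous one.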

Based on the above 3PL model and GPCM, we consider the problem of likelihood estimation of ability. Note that the likelihood of response can be written as the product of two types of likelihood functions:

L (θ| U) = (\prod_{i = 1}^{m} P_{i} (θ)^{u_{i}} Q_{i} (θ)^{1 - u_{i}}) \cdot (\prod_{i = m + 1}^{n} \prod_{j = 0}^{J} P_{i j} (θ)^{u_{i j}}),

where Q_i (θ) =1 − P_i (θ), and the response matrix U contains the responses of dichotomous items

u_{i} = \begin{cases} 1, & \text{if the examinee gives a correct response on dichotomous item } i, \\ 0, & \text{otherwise,} \end{cases}

for i = 1, . . ., m, and the responses of polytomous items

u_{i j} = \begin{cases} 1, & \text{if the response to polytomous item } i \text{ is in the } j\text{th category}, \\ 0, & \text{otherwise,} \end{cases}

for i = m + 1, . . ., n and j = 0, 1, . . ., J.

Next, to increase the precision of latent trait estimation, we consider the following item-weighted likelihood:

I W L (θ | U) = (\prod_{i = 1}^{m} {\{P_{i} (θ)^{u_{i}} Q_{i} (θ)^{1 - u_{i}}\}}^{{\tilde{w}}_{i}}) \cdot (\prod_{i = m + 1}^{n} {\{\prod_{j = 0}^{J} P_{i j} (θ)^{u_{i j}}\}}^{{\tilde{w}}_{i}}),

where the weights $({\tilde{w}}_{1}, {\tilde{w}}_{2}, \ldots, {\tilde{w}}_{n})$ satisfy ${\tilde{w}}_{i} > 0$ for each i and $\sum_{i = 1}^{n} {\tilde{w}}_{i} = n$. Here, it is worth noting that this special likelihood (Equation 4) is no longer the usual one because of the power terms ${\tilde{w}}_{i}$. The idea of constructing such a weighted likelihood function originated from a technical report of Hu and Zidek (1995). Since then, the weighted likelihood method has been further developed both in theory and application (see, e.g., Hu, 1997; Hu & Rosenberger, 2000; X. Wang, van Eeden, & Zidek, 2004; X. Wang & Zidek, 2005).

Now, a key issue is to determine how to assign different weights to the items of different types. A general rationale is that high-quality items should carry larger weights whereas low-quality items should carry smaller weights. It seems to be the best strategy to assign weights to the items according to their information. In this article, we investigate a common practice currently endorsed by many state assessments: Polytomous items carry a larger weight whereas dichotomous items carry a smaller weight. Our main objective is to theoretically derive an item weighting scheme for the MLE procedure such that it generates more accurate latent trait estimates.

Taking the natural logarithm on both sides of Equation 4 gives

log [IWL (θ | U)] = \sum_{i = 1}^{m} {\tilde{w}}_{i} [u_{i} \log P_{i} (θ) + (1 - u_{i}) \log Q_{i} (θ)] + \sum_{i = m + 1}^{n} {\tilde{w}}_{i} [\sum_{j = 0}^{J} u_{i j} \log P_{i j} (θ)] .

Note that the overall log-likelihood function (Equation 5) can be separated into two additive components. One is about the dichotomous items, denoted by

L_{1} \hat{=} \sum_{i = 1}^{m} {\tilde{w}}_{i} [u_{i} \log P_{i} (θ) + (1 - u_{i}) \log Q_{i} (θ)],

and the other is about the polytomous items, denoted by

L_{2} \hat{=} \sum_{i = m + 1}^{n} {\tilde{w}}_{i} [\sum_{j = 0}^{J} u_{i j} \log P_{i j} (θ)] .
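As a rough illustration of the two additive components, the following Python sketch evaluates the item-weighted log-likelihood with one common weight per item type. The function name `iwl_loglik`, the tuple layout of the item parameters, and the example values are assumptions introduced for this sketch only.

```python
import math

D = 1.702


def p_3pl(theta, alpha, beta, c=0.0):
    return c + (1.0 - c) / (1.0 + math.exp(-D * alpha * (theta - beta)))


def p_gpcm(theta, a, b):
    z = [0.0]
    for b_v in b:
        z.append(z[-1] + D * a * (theta - b_v))
    z_max = max(z)
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def iwl_loglik(theta, dich_items, dich_resp, poly_items, poly_resp,
               w_dich=1.0, w_poly=1.0):
    """log IWL(theta | U) = L1 + L2 with one common weight per item type."""
    l1 = 0.0
    for (alpha, beta, c), u in zip(dich_items, dich_resp):
        p = p_3pl(theta, alpha, beta, c)
        l1 += w_dich * (u * math.log(p) + (1 - u) * math.log(1.0 - p))
    l2 = 0.0
    for (a, b), j in zip(poly_items, poly_resp):
        # u_ij equals 1 only for the observed category j, so one term survives.
        l2 += w_poly * math.log(p_gpcm(theta, a, b)[j])
    return l1 + l2


# Made-up example: two dichotomous items and one 5-category item, with
# weights 0.6 and 1.8 so that the weights sum to n = 3.
dich_items = [(1.0, 0.0, 0.0), (0.8, 0.5, 0.0)]
poly_items = [(1.2, [-1.5, -0.5, 0.5, 1.5])]
print(iwl_loglik(0.5, dich_items, [1, 0], poly_items, [3],
                 w_dich=0.6, w_poly=1.8))
```

Setting both weights to 1 recovers the usual log-likelihood, which is a convenient sanity check for any implementation.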

The Newton–Raphson equation (or Fisher scoring equation) for estimating ability at iteration t is given by

{[\hat{θ}]}_{t} = {[\hat{θ}]}_{t - 1} - \frac{L_{1}^{'} + L_{2}^{'}}{L_{1}^{''} + L_{2}^{''}},

where the components $L_{1}^{'}$, $L_{2}^{'}$, $L_{1}^{''}$, and $L_{2}^{''}$ are defined in the Appendix. See Equations A1 through A4 for detailed derivations.
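The iteration in Equation 8 can be illustrated with a small Python sketch. Because the analytic derivatives of Equations A1 through A4 are not reproduced here, the sketch approximates the first and second derivatives by central finite differences, which is a stand-in and not the authors' implementation. The function name `newton_raphson`, the step cap, the search bounds, and the toy items are all assumptions made for illustration.

```python
import math

D = 1.702


def newton_raphson(loglik, theta0=0.0, h=1e-3, tol=1e-6, max_iter=100,
                   max_step=1.0, bounds=(-6.0, 6.0)):
    """Maximize loglik(theta) by Newton-Raphson with numerical derivatives."""
    lo, hi = bounds
    theta = theta0
    for _ in range(max_iter):
        f_plus, f_mid, f_minus = loglik(theta + h), loglik(theta), loglik(theta - h)
        d1 = (f_plus - f_minus) / (2.0 * h)              # approximates L'
        d2 = (f_plus - 2.0 * f_mid + f_minus) / (h * h)  # approximates L''
        if d2 < 0.0:
            step = -d1 / d2
        else:
            # Not locally concave: move uphill by a fixed amount instead.
            step = max_step if d1 > 0.0 else -max_step
        step = max(-max_step, min(max_step, step))  # cap the step size
        new_theta = max(lo, min(hi, theta + step))  # keep theta in bounds
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


# Toy example with made-up values: three items with c = 0 and item weights
# 0.6, 0.6, and 1.8 (summing to n = 3); responses are 1, 1, 0.
items = [(1.0, -0.5), (0.8, 0.0), (1.2, 0.5)]
weights = [0.6, 0.6, 1.8]
responses = [1, 1, 0]


def toy_loglik(theta):
    total = 0.0
    for (alpha, beta), w, u in zip(items, weights, responses):
        p = 1.0 / (1.0 + math.exp(-D * alpha * (theta - beta)))
        total += w * (u * math.log(p) + (1 - u) * math.log(1.0 - p))
    return total


theta_hat = newton_raphson(toy_loglik)
print(theta_hat)
```

The step cap and the bounds are practical safeguards for response patterns (such as all-correct patterns) for which the likelihood has no finite maximizer; they are a design choice of this sketch rather than something prescribed by the method.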

Simulation Study (I)

To evaluate the performance of the proposed IWLE procedure, a simulation study was conducted. We artificially constructed a 30-item test consisting of both dichotomous and polytomous items. Note that tests with mixed item types are commonly used in large-scale assessments in the United States, for instance in the National Assessment of Educational Progress (NAEP). Both theoretical and empirical evidence has shown that polytomous items in general provide more information than dichotomous items regarding the precision of trait level estimation (e.g., Donoghue, 1994; Embretson & Reise, 2000, p. 95; Jodoin, 2003; Penfield & Bergeron, 2005). A natural way to allocate weights is to assign higher weights to polytomous items and lower weights to dichotomous items. In practice, the allocation of weights is usually determined by test developers prior to administration of the tests.

Here are some explanations about the assignment of weights in our simulation study. Note that the item-weighted likelihood defined in Equation 4 will reduce to the usual likelihood when all ${\tilde{w}}_{i}$ are equal to 1. Hence, “1” should be taken as a benchmark of the magnitude of weights carried by items. As a result, items with relatively higher quality should carry weights larger than 1, while items with relatively lower quality should carry weights smaller than 1. However, the difference between the larger and smaller weights should not be too big. In practice, both larger and smaller weights should be in an appropriate range (say, (1,2) for larger weights and (0,1) for smaller ones).

To explore the effect of the assignment of weights, an intensive simulation study was conducted to cover a wide range of conditions, such as the total number of items, the proportions of dichotomous and polytomous items in the mixed-type test, and the sizes of the two types of weights. In general, the simulation results exhibit similar trends. Here, because of space limitations, only simulation results based on the following scenario are discussed: the mixed-type test consists of 20 dichotomous items and 10 polytomous items, each dichotomous item has a weight of 0.6, and each polytomous item has a weight of 1.8, so that the weights sum to 20(0.6) + 10(1.8) = 30 = n, as required by the constraint following Equation 4. These two particular weights were assigned just for illustrative purposes.

Item parameters

All the discrimination parameters were randomly generated from the uniform distribution on [0.5, 1.5]. The difficulty parameters of the dichotomous items were randomly generated from the standard normal distribution N(0, 1), while the guessing parameters were set to 0 in this part of the simulation. In simulation study (II), we also consider the case in which the guessing parameters are randomly generated from the uniform distribution on [0.1, 0.3]. The location parameters of each polytomous item were randomly generated from four normal distributions: $b_{i 1} \sim N(-1.5, 1)$, $b_{i 2} \sim N(-0.5, 1)$, $b_{i 3} \sim N(0.5, 1)$, and $b_{i 4} \sim N(1.5, 1)$, for i = 21, . . ., 30.

Ability parameters

In the simulation, 17 equally spaced θ values were considered, ranging from −4.0 to +4.0 in increments of 0.5. At each θ, N = 2,000 replications were performed. At each replication, the dichotomous item responses were simulated according to the 3PL model, and the polytomous item responses were simulated according to the GPCM. The same item responses were used for both the MLE and IWLE procedures.
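The data-generating design described above might be sketched as follows. The helper names `generate_items` and `simulate_responses`, the seed, and the use of Python's `random` module are assumptions of this sketch; the original study does not state which software was used.

```python
import math
import random

D = 1.702


def p_3pl(theta, alpha, beta, c=0.0):
    return c + (1.0 - c) / (1.0 + math.exp(-D * alpha * (theta - beta)))


def p_gpcm(theta, a, b):
    z = [0.0]
    for b_v in b:
        z.append(z[-1] + D * a * (theta - b_v))
    z_max = max(z)
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def generate_items(rng, n_dich=20, n_poly=10):
    """Draw item parameters following the design in the text (c = 0)."""
    dich = [(rng.uniform(0.5, 1.5), rng.gauss(0.0, 1.0), 0.0)
            for _ in range(n_dich)]
    # N(mu, 1) has variance 1, hence standard deviation 1.
    poly = [(rng.uniform(0.5, 1.5),
             [rng.gauss(mu, 1.0) for mu in (-1.5, -0.5, 0.5, 1.5)])
            for _ in range(n_poly)]
    return dich, poly


def simulate_responses(theta, dich_items, poly_items, rng):
    """Simulate one response pattern at ability theta."""
    dich_resp = [1 if rng.random() < p_3pl(theta, *item) else 0
                 for item in dich_items]
    poly_resp = [rng.choices(range(len(b) + 1), weights=p_gpcm(theta, a, b))[0]
                 for a, b in poly_items]
    return dich_resp, poly_resp


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
theta_grid = [-4.0 + 0.5 * k for k in range(17)]
dich_items, poly_items = generate_items(rng)
dich_resp, poly_resp = simulate_responses(0.0, dich_items, poly_items, rng)
print(dich_resp, poly_resp)
```

In a full replication, `simulate_responses` would be called 2,000 times at each value in `theta_grid`, and the same simulated patterns would be passed to every estimator being compared.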

Evaluation criteria

Accuracy, that is, mean bias (Bias), and precision, that is, root mean squared error (RMSE) of the ability estimates, were used to evaluate all the procedures.

Bias and RMSE

Bias was estimated using Equation 9. For Equation 9, we let θ be the true ability value and ${\hat{θ}}_{l}$ be the corresponding ability estimate for the lth replication. Then the estimated bias is computed as

\mathrm{Bias} = \frac{1}{N} \sum_{l = 1}^{N} ({\hat{θ}}_{l} - θ) .

RMSE is calculated using Equation 10:

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{l = 1}^{N} {({\hat{θ}}_{l} - θ)}^{2}} .

That is, Bias is the mean signed difference between the ability estimate and the true ability, and RMSE is the square root of the mean squared difference between the ability estimate and the true ability.
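Equations 9 and 10 translate directly into code. The sketch below is a minimal Python version; the function names `bias` and `rmse` and the example estimates are made up for illustration.

```python
import math


def bias(estimates, theta):
    """Mean signed error of the estimates around the true ability theta."""
    return sum(e - theta for e in estimates) / len(estimates)


def rmse(estimates, theta):
    """Root mean squared error of the estimates around theta."""
    return math.sqrt(sum((e - theta) ** 2 for e in estimates) / len(estimates))


# Made-up estimates from four replications at a true ability of 1.0.
estimates = [0.8, 1.1, 1.3, 0.9]
print(bias(estimates, 1.0), rmse(estimates, 1.0))
```

A positive bias means the procedure overestimates ability on average at that θ, while RMSE combines bias and sampling variability into a single measure.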

Results of simulation

Figure 1 shows the results of bias, absolute bias, and RMSE calculated from a pilot study based on the setting described above.

Figure 1.

Comparisons of the bias, absolute bias, and root mean squared error (RMSE) between the maximum likelihood estimation (MLE) and the item-weighted likelihood estimation (IWLE): 20 dichotomous items with common weight 0.6 and 10 polytomous items with common weight 1.8.

The simulation results show that IWLE outperforms MLE in reducing bias, especially at extreme levels of the latent trait. However, the bias reduction achieved by IWLE is smaller than expected when the levels of the latent trait are in the intermediate range (say, $|θ| < 2.0$ ). Therefore, the next section focuses on bias reduction for the intermediate levels of the latent trait.

Bias Reduction

Many methods of bias reduction have been proposed (see, e.g., Lord, 1983; Warm, 1989; Firth, 1993; T. Wang, Hanson, & Lau, 1999; S. Wang & Wang, 2001). In this section, we investigate the feasibility of bias reduction of the IWLE procedure using Warm’s (1989) WLE technique.

To simplify the notation, let P_i (θ) = P_i, Q_i = 1 − P_i, W_i = P_iQ_i , and P_ij (θ) = P_ij . Note that

P_{i} = c_{i} + \frac{1 - c_{i}}{1 + exp [- D α_{i} (θ - β_{i})]},

can be expressed as

\frac{P_{i} - c_{i}}{1 - c_{i}} = \frac{1}{1 + exp [- D α_{i} (θ - β_{i})]} \hat{=} P_{i}^{*} .

Then we have

Q_{i}^{*} \hat{=} \frac{Q_{i}}{1 - c_{i}} = 1 - P_{i}^{*} .

The test information function for mixed-type test (denoted as I(θ)) is composed of two parts, the dichotomous part (denoted as I₁(θ)) and the polytomous part (denoted as I₂(θ)). As given in Baker (1992) and Baker and Kim (2004), the test information for the dichotomous items is

I_{1} (θ) = \sum_{i = 1}^{m} D^{2} α_{i}^{2} W_{i} {[\frac{P_{i}^{*}}{P_{i}}]}^{2} .

Furthermore, as given in Muraki (1993) and Donoghue (1994), the test information for the polytomous items is

I_{2} (θ) = \sum_{i = m + 1}^{n} D^{2} a_{i}^{2} [\sum_{j = 0}^{J} j^{2} P_{i j} - {(\sum_{j = 0}^{J} j \cdot P_{i j})}^{2}] .

Warm (1989) proposed a WLE method for the dichotomous IRT model, which provides a correction to the maximum likelihood method by solving an adjusted, or weighted, log-likelihood equation. In light of the superior performance of the WLE method in reducing bias, Penfield and Bergeron (2005) extended this correction to the case of GPCM.

Following the idea of Warm (1989) and Penfield and Bergeron (2005), a class of estimators, θ^*, may be defined as the value of θ that maximizes the following Equation 14:

\text{BR-IWL} (θ | U) = f (θ) (\prod_{i = 1}^{m} \{P_{i} (θ)^{u_{i}} Q_{i} (θ)^{1 - u_{i}}\}) \cdot (\prod_{i = m + 1}^{n} \{\prod_{j = 0}^{J} P_{i j} (θ)^{u_{i j}}\}),

where f(θ) is the square root of the test information, that is, $f (θ) = \sqrt{I (θ)} = \sqrt{\sum_{i = 1}^{n} I_{i} (θ)}$, and $I_{i} (θ)$ denotes the information function of item i.

Taking the natural logarithm on both sides of Equation 14 gives

\log [\text{BR-IWL} (θ | U)] = \log f (θ) + L_{1} + L_{2},

where $L_{1}$ and $L_{2}$ are given by Equations 6 and 7 with ${\tilde{w}}_{i} = 1$ for i = 1, . . ., n.

Let $B = \log f (θ)$ . The Newton–Raphson equation (or Fisher scoring equation) for estimating ability at iteration t is given by

{[\hat{θ}]}_{t} = {[\hat{θ}]}_{t - 1} - \frac{B^{'} + L_{1}^{'} + L_{2}^{'}}{B^{''} + L_{1}^{''} + L_{2}^{''}},

where the components $L_{1}^{'}$, $L_{2}^{'}$, $L_{1}^{''}$, and $L_{2}^{''}$ have been defined above, while the components $B^{'}$ and $B^{''}$ are given in the Appendix. See Equations A5 and A6 for detailed derivations.

Simulation Study (II)

In this section, the performance of three methods (the usual MLE, the WLE, and the bias-reduced IWLE, BR-IWLE) is compared.

Figure 2 shows the bias, absolute bias, and RMSE calculated from a pilot study based on the setting described in the simulation study (I).

Figure 2.

Comparisons of the bias, absolute bias, and root mean squared error (RMSE) among the maximum likelihood estimation (MLE), the weighted likelihood estimate (WLE), and the bias-reduced item-weighted likelihood estimation (BR-IWLE): 20 dichotomous items with common weight 0.6 and 10 polytomous items with common weight 1.8. The guessing parameters in the 3PL model were set to 0.

The simulation results show that BR-IWLE outperforms both WLE and MLE regarding reduction in both Bias and RMSE.

True score

Besides the two evaluation criteria (Bias and RMSE), an additional criterion (True Score) was used to evaluate all three procedures. The estimated true scores are computed to account for measurement error. The true score for the mixed-type test is composed of two parts (i.e., a dichotomous part and a polytomous part) as follows:

T_{True} (θ) = \sum_{i = 1}^{m} P_{i} (θ) + \sum_{i = m + 1}^{n} \sum_{j = 0}^{J} j \cdot P_{i j} (θ) .
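The true-score formula is simply the expected total raw score at a given θ, which the following Python sketch makes explicit. The function name `true_score` and the example items are assumptions of this sketch.

```python
import math

D = 1.702


def p_3pl(theta, alpha, beta, c=0.0):
    return c + (1.0 - c) / (1.0 + math.exp(-D * alpha * (theta - beta)))


def p_gpcm(theta, a, b):
    z = [0.0]
    for b_v in b:
        z.append(z[-1] + D * a * (theta - b_v))
    z_max = max(z)
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def true_score(theta, dich_items, poly_items):
    """Expected raw score: sum of P_i plus sum of expected category scores."""
    dich_part = sum(p_3pl(theta, *item) for item in dich_items)
    poly_part = sum(
        sum(j * p for j, p in enumerate(p_gpcm(theta, a, b)))
        for a, b in poly_items
    )
    return dich_part + poly_part


# Made-up test: two dichotomous items and one 5-category item.
dich_items = [(1.0, 0.0, 0.0), (0.8, 0.5, 0.0)]
poly_items = [(1.2, [-1.5, -0.5, 0.5, 1.5])]
for theta in (-2.0, 0.0, 2.0):
    print(theta, true_score(theta, dich_items, poly_items))
```

Evaluating `true_score` at an ability estimate instead of the true θ gives the corresponding estimated true score, so the absolute difference between the two values can be used to compare estimators on the raw-score scale.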

Then three estimated true scores based on MLE, WLE, and BR-IWLE, respectively, at each θ can be obtained by replacing θ with the corresponding estimate, namely ${\hat{θ}}_{MLE}$, ${\hat{θ}}_{WLE}$, and ${\hat{θ}}_{BR-IWLE}$. Since $T_{True} (θ)$ is not known in real applications, its differences from $T_{MLE} (θ)$, $T_{WLE} (θ)$, and $T_{BR-IWLE} (θ)$, respectively, are investigated in the simulation study.

Figure 3 shows that the estimated true scores based on BR-IWLE are generally closer to the true scores than those based on MLE and WLE. Furthermore, Table 1 shows that $|T_{True} (θ) - T_{BR-IWLE} (θ)|$ is smaller than $|T_{True} (θ) - T_{MLE} (θ)|$ at every θ, and is no larger than $|T_{True} (θ) - T_{WLE} (θ)|$ at every θ except θ = 1.5, where the WLE difference is slightly smaller (0.0073 vs. 0.0085).

Figure 3.

True score vs. estimated true scores: 20 dichotomous items with common weight 0.6 and 10 polytomous items with common weight 1.8. MLE = maximum likelihood estimation; WLE = weighted likelihood estimate; BR-IWLE = bias-reduced item-weighted likelihood estimation.

Table 1.

Differences Between the True Score and the Estimated True Scores: 20 Dichotomous Items With Common Weight 0.6 and 10 Polytomous Items With Common Weight 1.8

θ	$|T_{True} (θ) - T_{MLE} (θ)|$	$|T_{True} (θ) - T_{WLE} (θ)|$	$|T_{True} (θ) - T_{BR-IWLE} (θ)|$
−4.0	1.2864	1.6746	0.9817
−3.5	1.0254	1.4030	0.7887
−3.0	0.5794	0.9402	0.4535
−2.5	0.3083	0.6282	0.2477
−2.0	0.1640	0.3185	0.1375
−1.5	0.0234	0.1254	0.0111
−1.0	0.0931	0.0852	0.0852
−0.5	0.0070	0.0049	0.0049
0	0.0222	0.0057	0.0057
0.5	0.0215	0.0055	0.0055
1.0	0.0324	0.0118	0.0118
1.5	0.0280	0.0073	0.0085
2.0	0.0898	0.0475	0.0113
2.5	0.4698	0.8036	0.2011
3.0	0.8785	1.2569	0.3020
3.5	1.1781	1.5906	0.7553
4.0	1.2819	1.7203	0.7125

Note. MLE = maximum likelihood estimation; WLE = weighted likelihood estimate; BR-IWLE = bias-reduced item-weighted likelihood estimation.

Finally, to investigate the effect of the guessing parameters, Figure 4 shows the bias, absolute bias, and RMSE when the guessing parameters were randomly generated from the uniform distribution on [0.1, 0.3]. From Figure 4, we can see that, at extremely low levels of the latent trait (say, θ < −3.0), the performance of BR-IWLE is almost indistinguishable from that of MLE. However, BR-IWLE outperforms MLE at higher levels of the latent trait (say, θ > 2.0).

Figure 4.

Pilot Study Based on Real Data

To study the applicability of the IWLE method in operational large-scale assessments, a pilot study on a sample of 2,000 examinees was conducted. The test studied is from a recent state reading assessment consisting of 50 dichotomous items and 1 polytomous item (5-category). The weights were preassigned by the state testing board: 1.0 for dichotomous items and 1.39 for the 5-category polytomous item. After standardization with respect to the constraint $(1 / 51) \sum_{i = 1}^{51} {\tilde{w}}_{i} = 1$ , we obtain the weights for all the items: ${\tilde{w}}_{i}$ = 0.9924 for i = 1, . . ., 50; and ${\tilde{w}}_{51}$ = 1.3795. Based on the usual MLE and the BR-IWLE procedures, we obtained the estimates of ability levels of 2,000 examinees. The total absolute difference and the total relative difference of estimated abilities based on the two procedures are respectively,

κ = \sum |{\hat{θ}}_{B R - I W L E} - {\hat{θ}}_{M L E}| = 76.77

and

ℜ = \sqrt{\sum {(\frac{{\hat{θ}}_{B R - I W L E} - {\hat{θ}}_{M L E}}{{\hat{θ}}_{B R - I W L E}})}^{2}} = 19.29.
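The weight standardization and the two summary differences used above can be sketched in Python as follows. The function names are assumptions of this sketch, and the two short lists of ability estimates are made-up placeholders, since the examinee-level estimates from the pilot study are not available here.

```python
import math


def standardize_weights(raw_weights):
    """Rescale weights so that they average 1 (equivalently, sum to n)."""
    n = len(raw_weights)
    total = sum(raw_weights)
    return [w * n / total for w in raw_weights]


def total_abs_diff(theta_a, theta_b):
    """Sum of absolute differences between two sets of estimates."""
    return sum(abs(a - b) for a, b in zip(theta_a, theta_b))


def total_rel_diff(theta_a, theta_b):
    """Square root of the summed squared differences relative to theta_a."""
    return math.sqrt(sum(((a - b) / a) ** 2 for a, b in zip(theta_a, theta_b)))


# Preassigned weights from the text: 50 dichotomous items at 1.0 and one
# 5-category item at 1.39.
weights = standardize_weights([1.0] * 50 + [1.39])
print(round(weights[0], 4), round(weights[50], 4))

# Made-up estimates for three examinees, for illustration only.
theta_br_iwle = [0.52, -1.10, 1.95]
theta_mle = [0.50, -1.15, 2.05]
print(total_abs_diff(theta_br_iwle, theta_mle))
print(total_rel_diff(theta_br_iwle, theta_mle))
```

Note that the relative difference divides by the BR-IWLE estimate, so it can become very large for examinees whose estimate is close to 0; this should be kept in mind when interpreting its magnitude.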

From the computational results, we can see that the estimates from MLE and BR-IWLE differ noticeably even though only 1 of the 51 items is polytomous. We expect the difference to become larger as the number of polytomous items increases.

Conclusions and Discussion

In many testing programs, it is often the case that different weights are assigned to different items according to certain practical emphases. Incorporating the weights in θ estimation is therefore important, because an estimate that ignores the scoring weights may not capture all the information and emphases the test provides. In this article, we proposed the IWLE procedure, which incorporates item weights into θ estimation. We theoretically derived the formulas for the Newton–Raphson iterations used to obtain the IWL estimates. With Equations 8 and 16, the proposed IWLE method can be easily implemented by practitioners.

A simulation study was conducted under various conditions, such as different item-weight settings and different combinations of dichotomous and polytomous items. The results from the pilot simulation study demonstrate that the proposed IWLE method outperformed the usual MLE in terms of controlling bias and RMSE when polytomous items carry relatively higher weights and dichotomous items carry lower weights. However, the conclusion may not hold if higher weights are assigned to dichotomous items and lower weights to polytomous items.

Improving latent trait estimation is always important in assessment and evaluation. Since most state assessments employ true-score equating and linking, the application of the proposed method may reduce linking error and thus increase assessment reliability. Furthermore, the proposed item-weighted procedure is expected to have a broad range of applications, particularly in computerized testing. It can be combined with item selection procedures not only to lower item exposure rates but also to improve ability estimation. Although in the current study the IWLE method is used only with a combination of the three-parameter logistic (3PL) model and the GPCM, the method can be easily generalized to other models such as the Rasch model, the Partial Credit Model (PCM), and the Graded Response Model.

Footnotes

This research was partially supported by the Fundamental Research Funds for the Central Universities (10JCXK007), the Natural Science Foundation of Jilin Province (201115005), the Training Fund of NENU'S Scientific Innovation Project (NENU-STC07002) and National Natural Science and Social Science Foundations of China (Grant Nos. 10931002, 10828102, 10871037, and 07JZD0031). The authors gratefully acknowledge the helpful comments of the two anonymous reviewers and the Editor.

Appendix

References

Baker, F. B. (1992). Item response theory: Parameter estimation techniques. New York, NY: Marcel Dekker.

Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques. Revised and expanded (2nd ed.). New York, NY: Marcel Dekker.

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–472). Reading, MA: Addison-Wesley.

Donoghue, J. R. (1994). An empirical examination of the IRT information of polytomously scored reading items under the generalized partial credit model. Journal of Educational Measurement, 41, 295–311.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80, 27–38.

Jodoin, M. G. (2003). Measurement efficiency of innovative item formats in computer-based testing. Journal of Educational Measurement, 40, 1–15.

Hu, F. (1997). The asymptotic properties of the maximum-relevance weighted likelihood estimators. The Canadian Journal of Statistics, 30, 45–59.

Hu, F., & Rosenberger, W. F. (2000). Analysis of time trends in adaptive designs with application to a neurophysiology experiment. Statistics in Medicine, 19, 2067–2075.

Hu, F., & Zidek, J. V. (1995). Incorporating relevant sample information using the likelihood (Technical Report No. 161). Vancouver, British Columbia, Canada: Department of Statistics, The University of British Columbia.

Linacre, J. M., & Wright, B. D. (1995). BIGSTEPS (Version 2.57) Rasch-model computer program [Computer software]. Chicago, IL: MESA Press.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

Lord, F. M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability. Psychometrika, 48, 233–245.

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.

Muraki, E. (1993). Information functions of the generalized partial credit model. Applied Psychological Measurement, 17, 351–363.

Muraki, E. (1997). A generalized partial credit model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 153–164). New York, NY: Springer.

Muraki, E., & Bock, R. D. (2003). PARSCALE (Version 4.1) [Computer program]. Lincolnwood, IL: Scientific Software International.

Penfield, R. D., & Bergeron, J. M. (2005). Applying a weighted maximum likelihood latent trait estimator to the generalized partial credit model. Applied Psychological Measurement, 29, 218–233.

Wang, S., & Wang, T. (2001). Precision of Warm’s weighted likelihood estimates for a polytomous model in computerized adaptive testing. Applied Psychological Measurement, 25, 317–331.

Wang, T., Hanson, B. A., & Lau, C. A. (1999). Reducing bias in CAT ability estimation: A comparison of approaches. Applied Psychological Measurement, 23, 263–278.

Wang, X., van Eeden, C., & Zidek, J. V. (2004). Asymptotic properties of maximum weighted likelihood estimators. Journal of Statistical Planning and Inference, 119, 37–54.

Wang, X., & Zidek, J. V. (2005). Choosing likelihood weights by cross-validation. The Annals of Statistics, 33, 463–500.

Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427–450.