Item Parameter Estimation With the General Hyperbolic Cosine Ideal Point IRT Model

Abstract

Over the last decade, researchers have come to recognize the benefits of ideal point item response theory (IRT) models for noncognitive measurement. Although most applied studies have utilized the Generalized Graded Unfolding Model (GGUM), many others have been developed. Most notably, David Andrich and colleagues published a series of papers comparing dominance and ideal point measurement perspectives, and they proposed ideal point models for dichotomous and polytomous single-stimulus responses, known as the Hyperbolic Cosine Model (HCM) and the General Hyperbolic Cosine Model (GHCM), respectively. These models have item response functions resembling the GGUM and its more constrained forms, but they are mathematically simpler. Despite the apparent impact of Andrich’s work on ensuing investigations, the HCM and GHCM have been largely overlooked by applied researchers. This may stem from questions about the compatibility of the parameter metric with other ideal point estimation and model-data fit software or seemingly unrealistic parameter estimates sometimes produced by the original joint maximum likelihood (JML) estimation software. Given the growing list of ideal point applications and variations in sample and scale characteristics, the authors believe these HCMs warrant renewed consideration. To address this need and overcome potential JML estimation difficulties, this study developed a marginal maximum likelihood (MML) estimation algorithm for the GHCM and explored parameter estimation requirements in a Monte Carlo study manipulating sample size, scale length, and data types. The authors found a sample size of 400 was adequate for parameter estimation and, in accordance with GGUM studies, estimation was superior in polytomous conditions.

Keywords

General Hyperbolic Cosine Model, ideal point, IRT, MML-EM

Over the last decade, researchers have come to recognize the benefits of ideal point item response theory (IRT) models for noncognitive scale construction and scoring. Ideal point models have been applied in areas such as attitude (e.g., Andrich, 1988, 1989; Andrich & Styles, 1998; Roberts & Laughlin, 1996), health (Andrich & Van Schoubroeck, 1989), personality (Carter et al., 2014; Chernyshenko, Stark, Chan, Drasgow, & Williams, 2001; Stark, Chernyshenko, Drasgow, & Williams, 2006), vocational interest (Tay, Ali, Drasgow, & Williams, 2011), job satisfaction (Carter & Dalal, 2010), and performance (Borman et al., 2001) measurement.

To date, the vast majority of applied studies have utilized the Generalized Graded Unfolding Model (GGUM; Roberts, Donoghue, & Laughlin, 2000), even though many other ideal point models have been developed. Most notably, David Andrich and colleagues published a series of papers comparing dominance and ideal point measurement perspectives (Andrich, 1978, 1995, 1996), and they proposed ideal point models for dichotomous and polytomous single-stimulus responses, known as the Hyperbolic Cosine Model (HCM; Andrich & Luo, 1993) and the General Hyperbolic Cosine Model (GHCM; Andrich, 1996), respectively. These models have item response functions (IRFs) resembling the GGUM and its submodels, particularly the “partial credit model” which constrains discrimination parameters to 1 (Roberts & Laughlin, 1996), yet the GHCM probability equation is somewhat simpler, making it easier to compute derivatives analytically for parameter estimation and generalize the model for multidimensional forced-choice (MFC) applications (e.g., Seybert, 2013).

Despite the apparent impact of Andrich’s work on ensuing model development (e.g., Roberts et al., 2000; Stark, Chernyshenko, & Drasgow, 2005), model-data fit (Chernyshenko et al., 2001; Stark et al., 2006), and scale construction (Chernyshenko, Stark, Drasgow, & Roberts, 2007; Roberts, Laughlin, & Wedell, 1999) investigations, the HCM and GHCM have been largely overlooked by applied researchers. This may be due to questions about the compatibility of the parameter metric with other ideal point estimation and model-data fit software or concerns about some seemingly unrealistic parameter estimates produced by the early joint maximum likelihood (JML) estimation software (e.g., location parameter estimates well beyond the −3 to +3 range observed with most models; Stark, Chernyshenko, Lee, & Drasgow, 2000).

Given the growing list of ideal point applications and the variations in sample and scale characteristics in applied research, the authors believe that these HCMs warrant renewed consideration. To address this need and avoid potential difficulties with JML estimation, this study developed a new marginal maximum likelihood expectation-maximization (MML-EM) algorithm for the GHCM and explored parameter estimation requirements in a Monte Carlo study manipulating sample size, scale length, and data types. To illustrate the properties of this model, this study shows GHCM IRFs and item information functions (IIFs), and uses the MML-EM algorithm to calibrate responses to a 20-item Orderliness scale (Chernyshenko, 2002).

The General Hyperbolic Cosine Model (GHCM)

Andrich (1996) proposed the GHCM for ordered categorical responses. It is an extension of the earlier HCM for dichotomous data. The probability function for the GHCM is

P (Y_{i j} = y; y < M) = \frac{\exp (κ_{y j}) 2 \cosh [(M - y) (θ_{i} - δ_{j})]}{γ_{i j}},

P (Y_{i j} = M) = \frac{\exp (κ_{M j})}{γ_{i j}},

where

γ_{i j} = \sum_{m = 0}^{M - 1} \exp (κ_{m j}) 2 \cosh [(M - m) (θ_{i} - δ_{j})] + \exp (κ_{M j}) .

Note that $Y_{i j}$ is the observed response by examinee i to item j, $θ_{i}$ is the latent trait score of examinee i, $δ_{j}$ is the location parameter of item j, $κ_{y j} = - \sum_{k = 0}^{y} τ_{j k}$, where $τ_{j k}$ is the $k$th subjective response category threshold parameter of item j ($τ_{j 0}$ is assumed to be 0), and M is the maximum observed score for the ordered categorical item.
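The probability function above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' program (which was written in R): the function name `ghcm_probs` and the example parameter values are hypothetical, and the thresholds are passed as $τ_{j 1}, …, τ_{j M}$ with $τ_{j 0} = 0$ added internally.

```python
import math

def ghcm_probs(theta, delta, taus):
    """Category probabilities P(Y = 0), ..., P(Y = M) for one GHCM item.

    taus holds tau_1..tau_M; tau_0 = 0 is prepended, as assumed in the text.
    """
    tau = [0.0] + list(taus)
    M = len(taus)
    # kappa_y = -(tau_0 + ... + tau_y)
    kappa = [-sum(tau[: y + 1]) for y in range(M + 1)]
    d = theta - delta
    # Numerators for y < M carry the 2*cosh factor
    terms = [math.exp(kappa[y]) * 2.0 * math.cosh((M - y) * d) for y in range(M)]
    # The top category (y = M) has no cosh factor
    terms.append(math.exp(kappa[M]))
    gamma = sum(terms)  # normalizing constant
    return [t / gamma for t in terms]

# Hypothetical four-option item located at delta = 0
probs = ghcm_probs(theta=0.5, delta=0.0, taus=[-1.5, -1.0, -0.5])
```

Because the hyperbolic cosine is an even function, the returned probabilities depend only on the distance between $θ_{i}$ and $δ_{j}$, which is the defining feature of an ideal point model.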

Figure 1 presents illustrative GHCM category response functions (CRFs) and IRFs (i.e., expected value functions) for two four-option polytomous items. Panels (a) and (b) present the CRFs and IRF for the first item, and Panels (c) and (d) present the CRFs and IRF for the second item. Note that the CRFs are symmetric around the item location parameters ($δ$), the probability of strongest disagreement ($P_{Y = 0}$) is highest at the largest distances shown, where $θ - δ$ = −3 or 3, and the probability of strongest agreement ($P_{Y = 3}$) is highest when $θ - δ$ = 0. Accordingly, the IRFs in Panels (b) and (d) show that examinees possessing intermediate trait levels have the highest probability of agreeing with these items. Finally, note that the second item, which has closer successive thresholds ($τ_{k}$), is more discriminating than the first item, which has wider successive thresholds.

Figure 1.

Illustrative GHCM category response functions and item response functions.

Item Information

Previous studies have derived item information for the GHCM using Samejima’s (1969) definition of information for polytomous IRT models (e.g., Luo, 2001; Luo & Andrich, 2005). Let $I_{j k} (θ)$ denote the amount of information provided by category k (k = 0, 1, …, M) of item j. The information provided by item j at ability level $θ$ is

I_{j} (θ) = \sum_{k = 0}^{M} I_{j k} (θ) P_{j k} (θ),

where Fisher information $I_{j k} (θ)$ is

I_{j k} (θ) = - \frac{\partial^{2} \ln P_{j k} (θ)}{\partial θ^{2}} = - \frac{\partial}{\partial θ} (\frac{1}{P_{j k} (θ)} \frac{\partial P_{j k} (θ)}{\partial θ}) = \frac{1}{{[P_{j k} (θ)]}^{2}} [{(\frac{\partial P_{j k} (θ)}{\partial θ})}^{2} - (\frac{\partial^{2} P_{j k} (θ)}{\partial θ^{2}}) P_{j k} (θ)] .

By substitution, item information can be rewritten as

I_{j} (θ) = \sum_{k = 0}^{M} [\frac{1}{P_{j k} (θ)} {(\frac{\partial P_{j k} (θ)}{\partial θ})}^{2} - \frac{\partial^{2} P_{j k} (θ)}{\partial θ^{2}}],

where $P_{j k} (θ)$ is the GHCM probability of an examinee endorsing category k of item j, and, for k < M, the first and second partial derivatives with respect to $θ$ are (for k = M, the factor $2 \cosh (\cdot)$ is replaced by 1 and the $\sinh$ terms vanish)

\frac{\partial P_{j k} (θ)}{\partial θ} = \frac{\exp (κ_{k j}) 2 \sinh ((M - k) (θ - δ_{j})) (M - k)}{γ_{j}} - \frac{\exp (κ_{k j}) 2 \cosh ((M - k) (θ - δ_{j})) \frac{\partial γ_{j}}{\partial θ}}{γ_{j}^{2}},

\begin{array}{l} \frac{\partial^{2} P_{j k} (θ)}{\partial θ^{2}} = \frac{\exp (κ_{k j}) 2 \cosh ((M - k) (θ - δ_{j})) {(M - k)}^{2}}{γ_{j}} - \frac{\exp (κ_{k j}) 4 \sinh ((M - k) (θ - δ_{j})) (M - k) \frac{\partial γ_{j}}{\partial θ}}{γ_{j}^{2}} \\ + \frac{\exp (κ_{k j}) 4 \cosh ((M - k) (θ - δ_{j})) {(\frac{\partial γ_{j}}{\partial θ})}^{2}}{γ_{j}^{3}} - \frac{\exp (κ_{k j}) 2 \cosh ((M - k) (θ - δ_{j})) \frac{\partial^{2} γ_{j}}{\partial θ^{2}}}{γ_{j}^{2}}, \end{array}

where

\frac{\partial γ_{j}}{\partial θ} = \sum_{m = 0}^{M - 1} \exp (κ_{m j}) 2 \sinh ((M - m) (θ - δ_{j})) (M - m), and

\frac{\partial^{2} γ_{j}}{\partial θ^{2}} = \sum_{m = 0}^{M - 1} \exp (κ_{m j}) 2 \cosh ((M - m) (θ - δ_{j})) {(M - m)}^{2} .

Figure 2 presents GHCM IIFs for two- and four-option items. Consistent with IIFs of the GGUM, the GHCM IIFs are bimodal and symmetric, and item information is zero at $θ - δ$ = 0, as explained by Andrich (1995). Also, as shown in Figure 2, the polytomous items are more informative than the dichotomous items across the ($θ - δ$) continuum except at the point where $θ - δ$ = 0. Note that more informative items would result in more accurate item parameter estimates.

Figure 2.

Illustrative GHCM item information functions.
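The item information computation can be illustrated with a short Python sketch. It relies on one simplification that follows from the category probabilities summing to 1: the second-derivative terms sum to zero across categories, so item information reduces to the sum of $(\partial P_{j k} / \partial θ)^{2} / P_{j k}$. The function name `ghcm_item_information` and the parameter values are hypothetical and serve only as an example.

```python
import math

def ghcm_item_information(theta, delta, taus):
    """Item information for one GHCM item at trait level theta.

    Uses the analytic first derivative of each category probability.
    The second-derivative terms sum to zero over categories and are omitted.
    """
    tau = [0.0] + list(taus)          # tau_0 = 0
    M = len(taus)
    kappa = [-sum(tau[: k + 1]) for k in range(M + 1)]
    d = theta - delta
    # Numerators of the category probabilities and their theta-derivatives
    num = [math.exp(kappa[k]) * 2.0 * math.cosh((M - k) * d) for k in range(M)]
    dnum = [math.exp(kappa[k]) * 2.0 * math.sinh((M - k) * d) * (M - k)
            for k in range(M)]
    num.append(math.exp(kappa[M]))    # top category: constant numerator
    dnum.append(0.0)
    gamma = sum(num)
    dgamma = sum(dnum)
    info = 0.0
    for n, dn in zip(num, dnum):
        p = n / gamma
        dp = dn / gamma - n * dgamma / gamma ** 2   # quotient rule
        info += dp * dp / p
    return info

taus = [-1.5, -1.0, -0.5]             # hypothetical thresholds
info_at_location = ghcm_item_information(0.0, 0.0, taus)
info_one_unit_away = ghcm_item_information(1.0, 0.0, taus)
```

In this sketch the information is exactly zero when $θ = δ$, because every hyperbolic sine term vanishes there, and it is symmetric in $θ - δ$, in line with the bimodal shape described above.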

Item Parameter Estimation

An advantage of MML estimation over JML estimation is that the MML item parameter estimates are consistent: the number of parameters to estimate does not increase as sample size increases because person parameters are integrated out of the likelihood function. In this research, MML estimation of GHCM item parameters was accomplished using an expectation-maximization (EM) algorithm (Bock & Aitkin, 1981; Roberts et al., 2000). The necessary equations for the GHCM were derived as follows.

Let $Y_{s}$ be one of S distinct response vectors for a given dataset (i.e., $Y_{s}$ = [$Y_{s 1}$, $Y_{s 2}$, …, $Y_{s J}$], where $Y_{s j}$ refers to the $j$th element of $Y_{s}$) and let $U_{s j k}$ be a dummy variable such that $U_{s j k} = 1$ if $Y_{s j} = k$ and $U_{s j k} = 0$ otherwise. Under the assumption of local independence, the conditional probability of observing a particular response vector $Y_{s}$ given $θ$ is

P (Y_{s} | θ) = \prod_{j}^{J} \prod_{k}^{M} {[P_{j k} (Y_{s j} | θ)]}^{U_{s j k}},

and

\ln P (Y_{s} | θ) = \sum_{j}^{J} \sum_{k}^{M} U_{s j k} \ln P_{j k} (Y_{s j} | θ) .

For examinees randomly sampled from a population with trait distribution $g (θ)$ , the marginal likelihood of $Y_{s}$ is

P (Y_{s}) = \int P (Y_{s} | θ) g (θ) d θ .

Let N represent the total number of response patterns (i.e., examinees), S represent the number of unique response patterns (s = 1, 2, …, S), and $r_{s}$ be the frequency of the unique pattern $Y_{s}$. Then $r$ = [$r_{1}$, $r_{2}$, …, $r_{S}$] follows a multinomial distribution. The likelihood of the frequency distribution of response patterns is

L = \frac{N!}{r_{1}! r_{2}! \cdots r_{S}!} \prod_{s = 1}^{S} {[P (Y_{s})]}^{r_{s}}

and the marginal log likelihood is

\ln L = \ln N! - \sum_{s = 1}^{S} \ln r_{s}! + \sum_{s = 1}^{S} r_{s} \ln P (Y_{s}) .

To estimate GHCM item parameters, the partial derivatives of Equation 12 with respect to $δ_{j}$ and $τ_{j k}$ are needed:

\begin{array}{l} \frac{\partial}{\partial δ_{j}} \ln L = \sum_{s}^{S} \frac{r_{s}}{P (Y_{s})} \frac{\partial}{\partial δ_{j}} P (Y_{s}) \\ = \sum_{s}^{S} \frac{r_{s}}{P (Y_{s})} \int [\frac{\partial}{\partial δ_{j}} P (Y_{s} | θ)] g (θ) d θ \\ = \sum_{s}^{S} \frac{r_{s}}{P (Y_{s})} \int [(\frac{\partial}{\partial δ_{j}} \ln P (Y_{s} | θ)) P (Y_{s} | θ)] g (θ) d θ \\ = \sum_{s}^{S} r_{s} \int [\frac{\partial}{\partial δ_{j}} \ln P (Y_{s} | θ)] \frac{P (Y_{s} | θ) g (θ)}{P (Y_{s})} d θ \\ = \sum_{s}^{S} r_{s} \int [\frac{\partial}{\partial δ_{j}} \ln P (Y_{s} | θ)] P (θ | Y_{s}) d θ . \end{array}

Furthermore, $\frac{\partial}{\partial δ_{j}} \ln P (Y_{s} | θ)$ in Equation 13 can be rewritten using Equation 9:

\frac{\partial}{\partial δ_{j}} \ln P (Y_{s} | θ) = \frac{\partial}{\partial δ_{j}} \sum_{j}^{J} \sum_{k}^{M} U_{s j k} \ln P_{j k} (Y_{s j} | θ) = \sum_{j}^{J} \sum_{k}^{M} \frac{U_{s j k}}{P_{j k} (Y_{s j} | θ)} \frac{\partial P_{j k} (Y_{s j} | θ)}{\partial δ_{j}} .

For a specific item j, the response probabilities of every other item $j'$, where $j' \neq j$, do not depend on $δ_{j}$, so their derivatives vanish and the summation over j can be omitted. Combining Equations 13 and 14 yields

\frac{\partial}{\partial δ_{j}} \ln L = \sum_{s}^{S} \int [\sum_{k}^{M} \frac{r_{s} U_{s j k}}{P_{j k} (Y_{s j} | θ)} \frac{\partial P_{j k} (Y_{s j} | θ)}{\partial δ_{j}}] P (θ | Y_{s}) d θ .

Similarly, the derivative of the marginal log likelihood with respect to $τ_{j k}$ is

\frac{\partial}{\partial τ_{j k}} \ln L = \sum_{s}^{S} \int [\sum_{k}^{M} \frac{r_{s} U_{s j k}}{P_{j k} (Y_{s j} | θ)} \frac{\partial P_{j k} (Y_{s j} | θ)}{\partial τ_{j k}}] P (θ | Y_{s}) d θ .

The integrations in these equations can be performed numerically using Gaussian quadrature (Bock & Aitkin, 1981; Bock & Lieberman, 1970; Mislevy & Bock, 1990). Assuming $g (θ)$ is a standard normal distribution approximated by equally spaced points, $X_{q}$ for q = 1, …, Q with weights $A (X_{q})$ , the posterior term in Equations 15 and 16 can be rewritten as

P (X_{q} | Y_{s}) = \frac{L_{s} (X_{q}) A (X_{q})}{\sum_{q}^{Q} L_{s} (X_{q}) A (X_{q})},

where $L_{s} (X_{q}) = \prod_{j}^{J} \prod_{k}^{M} {[P_{j k} (Y_{s j} | X_{q})]}^{U_{s j k}}$ and $\sum_{q = 1}^{Q} A (X_{q}) = 1$.
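To make the posterior term concrete, the following Python sketch evaluates $P (X_{q} | Y_{s})$ for one response pattern on a grid of equally spaced nodes. It is an illustration only: the helper names (`ghcm_probs`, `normal_grid`, `posterior_over_nodes`), the two-item example, and the choice to normalize standard normal densities so the weights sum to 1 are assumptions made for this example.

```python
import math

def ghcm_probs(theta, delta, taus):
    """GHCM category probabilities for one item (tau_0 = 0 is prepended)."""
    tau = [0.0] + list(taus)
    M = len(taus)
    kappa = [-sum(tau[: y + 1]) for y in range(M + 1)]
    d = theta - delta
    terms = [math.exp(kappa[y]) * 2.0 * math.cosh((M - y) * d) for y in range(M)]
    terms.append(math.exp(kappa[M]))
    gamma = sum(terms)
    return [t / gamma for t in terms]

def normal_grid(n_points=61, lo=-3.0, hi=3.0):
    """Equally spaced nodes X_q with weights A(X_q) that sum to 1."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + q * step for q in range(n_points)]
    dens = [math.exp(-0.5 * x * x) for x in nodes]
    total = sum(dens)
    return nodes, [w / total for w in dens]

def posterior_over_nodes(pattern, deltas, taus_list, nodes, weights):
    """P(X_q | Y_s): pattern likelihood times weight, normalized over nodes."""
    joint = []
    for x, a in zip(nodes, weights):
        like = 1.0
        for y, delta, taus in zip(pattern, deltas, taus_list):
            like *= ghcm_probs(x, delta, taus)[y]   # local independence
        joint.append(like * a)
    marginal = sum(joint)                           # approximates P(Y_s)
    return [v / marginal for v in joint]

nodes, weights = normal_grid()
# Hypothetical dichotomous items: agree with the item at -1, disagree with the item at +1
post = posterior_over_nodes([1, 0], [-1.0, 1.0], [[-1.0], [-1.0]], nodes, weights)
post_mean = sum(x * p for x, p in zip(nodes, post))
```

For this pattern the posterior mass shifts toward negative trait values, since agreeing with the item located at −1 and disagreeing with the item located at +1 both favor a respondent located below zero.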

Substituting into Equations 15 and 16 yields the likelihood equations which must be solved for the GHCM item parameter estimates, ${\hat{δ}}_{j}$ and ${\hat{τ}}_{j k}$ :

\frac{\partial}{\partial δ_{j}} \ln L = \sum_{s}^{S} \sum_{q}^{Q} [\sum_{k}^{M} \frac{r_{s} U_{s j k}}{P_{j k} (Y_{s j} | X_{q})} \frac{\partial P_{j k} (Y_{s j} | X_{q})}{\partial δ_{j}}] P (X_{q} | Y_{s}),

\frac{\partial}{\partial τ_{j k}} \ln L = \sum_{s}^{S} \sum_{q}^{Q} [\sum_{k}^{M} \frac{r_{s} U_{s j k}}{P_{j k} (Y_{s j} | X_{q})} \frac{\partial P_{j k} (Y_{s j} | X_{q})}{\partial τ_{j k}}] P (X_{q} | Y_{s}) .

In the expectation step (E-step) of the EM algorithm, one must choose initial values for the item parameter estimates and compute the expected frequency of each item response at each quadrature point, ${\bar{r}}_{j k q}$ :

{\bar{r}}_{j k q} = \sum_{s}^{S} \frac{r_{s} U_{s j k} L_{s} (X_{q}) A (X_{q})}{\sum_{q}^{Q} L_{s} (X_{q}) A (X_{q})} .
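The expected frequencies in Equation 20 can be accumulated pattern by pattern, as in the hedged Python sketch below. The function names, the three response patterns with their frequencies, and the coarse five-node grid are hypothetical and were chosen only to keep the example small.

```python
import math

def ghcm_probs(theta, delta, taus):
    """GHCM category probabilities for one item (tau_0 = 0 is prepended)."""
    tau = [0.0] + list(taus)
    M = len(taus)
    kappa = [-sum(tau[: y + 1]) for y in range(M + 1)]
    d = theta - delta
    terms = [math.exp(kappa[y]) * 2.0 * math.cosh((M - y) * d) for y in range(M)]
    terms.append(math.exp(kappa[M]))
    gamma = sum(terms)
    return [t / gamma for t in terms]

def e_step(patterns, counts, deltas, taus_list, nodes, weights):
    """Return rbar[j][k][q], the expected frequency of category k of item j at node q."""
    J, Q = len(deltas), len(nodes)
    rbar = [[[0.0] * Q for _ in range(len(taus_list[j]) + 1)] for j in range(J)]
    for pattern, r_s in zip(patterns, counts):
        # L_s(X_q) * A(X_q) at every node
        joint = []
        for x, a in zip(nodes, weights):
            like = 1.0
            for y, delta, taus in zip(pattern, deltas, taus_list):
                like *= ghcm_probs(x, delta, taus)[y]
            joint.append(like * a)
        marginal = sum(joint)
        for q in range(Q):
            post = joint[q] / marginal
            # U_sjk picks out the observed category of each item
            for j, y in enumerate(pattern):
                rbar[j][y][q] += r_s * post
    return rbar

# Hypothetical data: three unique patterns on two dichotomous items
patterns = [[1, 0], [0, 1], [1, 1]]
counts = [5, 3, 2]
nodes = [-2.0, -1.0, 0.0, 1.0, 2.0]
dens = [math.exp(-0.5 * x * x) for x in nodes]
weights = [w / sum(dens) for w in dens]
rbar = e_step(patterns, counts, [-1.0, 1.0], [[-1.0], [-1.0]], nodes, weights)
```

A useful sanity check is that, for each item, the expected frequencies summed over categories and nodes reproduce the sample size N, because the posterior weights for each pattern sum to 1.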

Equations 18 and 19 thus become

\frac{\partial}{\partial δ_{j}} \ln L = \sum_{q}^{Q} \sum_{k}^{M} \frac{{\bar{r}}_{j k q}}{P_{j k} (Y_{s j} | X_{q})} \frac{\partial P_{j k} (Y_{s j} | X_{q})}{\partial δ_{j}},

\frac{\partial}{\partial τ_{j k}} \ln L = \sum_{q}^{Q} \sum_{k}^{M} \frac{{\bar{r}}_{j k q}}{P_{j k} (Y_{s j} | X_{q})} \frac{\partial P_{j k} (Y_{s j} | X_{q})}{\partial τ_{j k}},

where the first-order partial derivatives with respect to $δ_{j}$ and $τ_{j k}$ are

\begin{array}{l} \frac{\partial P_{j k} (Y_{s j} | X_{q})}{\partial δ_{j}} = \frac{\exp (κ_{Y_{s j}}) 2 \sinh ((M - Y_{s j}) (X_{q} - δ_{j})) (- M + Y_{s j})}{γ_{j}} \\ - \frac{\exp (κ_{Y_{s j}}) 2 \cosh ((M - Y_{s j}) (X_{q} - δ_{j})) \frac{\partial γ_{j}}{\partial δ_{j}}}{γ_{j}^{2}}, \end{array}

\begin{array}{l} \frac{\partial P_{j k} (Y_{s j} | X_{q})}{\partial τ_{j k}} = \frac{\exp (κ_{Y_{s j}}) 2 (- Y_{s j} - 1) \cosh ((M - Y_{s j}) (X_{q} - δ_{j}))}{γ_{j}} \\ - \frac{\exp (κ_{Y_{s j}}) 2 \cosh ((M - Y_{s j}) (X_{q} - δ_{j})) \frac{\partial γ_{j}}{\partial τ_{j k}}}{γ_{j}^{2}}, \end{array}

and

\frac{\partial γ_{j}}{\partial δ_{j}} = \sum_{m = 0}^{M - 1} \exp (κ_{m j}) 2 \sinh ((M - m) (X_{q} - δ_{j})) (- M + m),

\frac{\partial γ_{j}}{\partial τ_{j k}} = \sum_{m = 0}^{M - 1} \exp (κ_{m j}) 2 (- m - 1) \cosh ((M - m) (X_{q} - δ_{j})) + (- M - 1) \exp (κ_{M j}) .

After the E-step, a maximization step (M-step) is performed: The likelihood equations (21 and 22) for each item are set to zero and solved for their roots ( ${\hat{δ}}_{j}$ , ${\hat{τ}}_{j 1}$ , ${\hat{τ}}_{j 2}$ , …, ${\hat{τ}}_{j M}$ ). These provisional estimates are used as starting values for the next E-step, and this process continues until a maximum number of EM cycles has been performed or the change in log likelihood or individual item parameter estimates between cycles is sufficiently small.
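One way to read the M-step, offered here as an explanatory note rather than as the authors' notation, is that Equations 21 and 22 are the gradient of an item-level expected log likelihood in which the expected frequencies ${\bar{r}}_{j k q}$ are held fixed at their E-step values. Writing this function as $\ell_{j}$ (a label introduced only for this note) and $P_{j k} (X_{q})$ for the probability of category k of item j at node $X_{q}$,

\ell_{j} (δ_{j}, τ_{j 1}, …, τ_{j M}) = \sum_{q}^{Q} \sum_{k}^{M} {\bar{r}}_{j k q} \ln P_{j k} (X_{q}) .

Differentiating with the chain rule gives

\frac{\partial \ell_{j}}{\partial δ_{j}} = \sum_{q}^{Q} \sum_{k}^{M} \frac{{\bar{r}}_{j k q}}{P_{j k} (X_{q})} \frac{\partial P_{j k} (X_{q})}{\partial δ_{j}},

which has the same form as Equation 21, and the derivative with respect to each threshold has the same form as Equation 22. Solving the likelihood equations for item j is therefore equivalent to maximizing $\ell_{j}$, or minimizing $- \ell_{j}$, separately for each item.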

Method

Simulation Study

The authors conducted a Monte Carlo simulation to examine parameter estimation requirements for the GHCM based on the MML-EM approach. Three factors were manipulated in a fully-crossed 5 × 2 × 2 experimental design: (a) sample size (50, 100, 200, 400, 800); (b) number of items (10, 20); (c) data type (dichotomous, four-option polytomous). Fifty replications were performed in each experimental condition.

Parameters for response data generation were selected based on previous studies with ideal point models (e.g., Carter & Zickar, 2011; Joo, Lee, & Stark, 2017; Koenig & Roberts, 2007; Roberts, Donoghue, & Laughlin, 2002; Roberts & Thompson, 2011). Location parameters ($δ_{j}$) were evenly distributed across the trait continuum on the interval [−2, 2]. For dichotomous conditions, threshold parameters ($τ_{j k}$) were randomly sampled from a uniform distribution on [−3, −0.5]. For polytomous conditions, the highest threshold parameter for each item ($τ_{j K}$) was randomly sampled from a uniform distribution on [−1.5, −0.5]; then successively lower thresholds $τ_{j (k - 1)}$ were generated using a recursive formula, wherein a constant of −.25 and a random error sampled from N(0, 0.04) were added to the preceding $τ_{j k}$. Once the 20 sets of item parameters were generated, the odd-numbered sets were selected to form the 10-item measures. (All items were used for the 20-item measures.) The items in the respective 10- and 20-item measures were then sorted in ascending order based on the location parameters. Finally, GHCM response data were generated in the customary manner by comparing randomly sampled uniform numbers to the response probabilities for the respective items.
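The final data generation step can be sketched as follows, assuming that "comparing uniform numbers to the response probabilities" means comparing each uniform draw with the cumulative category probabilities. The function names, the seed, the sample of 50 simulees, the equally spaced locations, and the fixed thresholds are hypothetical choices for this example and do not reproduce the study's generating parameters.

```python
import math
import random

def ghcm_probs(theta, delta, taus):
    """GHCM category probabilities for one item (tau_0 = 0 is prepended)."""
    tau = [0.0] + list(taus)
    M = len(taus)
    kappa = [-sum(tau[: y + 1]) for y in range(M + 1)]
    d = theta - delta
    terms = [math.exp(kappa[y]) * 2.0 * math.cosh((M - y) * d) for y in range(M)]
    terms.append(math.exp(kappa[M]))
    gamma = sum(terms)
    return [t / gamma for t in terms]

def simulate_response(theta, delta, taus, rng):
    """Draw one response by comparing a uniform number with cumulative probabilities."""
    u = rng.random()
    cum = 0.0
    probs = ghcm_probs(theta, delta, taus)
    for y, p in enumerate(probs):
        cum += p
        if u < cum:
            return y
    return len(probs) - 1   # guard against floating-point rounding

rng = random.Random(2024)                                  # arbitrary seed
J = 10
deltas = [-2.0 + 4.0 * j / (J - 1) for j in range(J)]      # equally spaced on [-2, 2]
taus_list = [[-1.5, -1.0, -0.5] for _ in range(J)]         # illustrative thresholds
thetas = [rng.gauss(0.0, 1.0) for _ in range(50)]          # N(0, 1) person parameters
data = [[simulate_response(t, d, tau, rng) for d, tau in zip(deltas, taus_list)]
        for t in thetas]
```

Each row of `data` is one simulee's response vector, with entries coded 0 to 3 for the four-option format.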

EM Implementation

Initial values for the GHCM item parameters were chosen in accordance with previous ideal point model simulation studies (e.g., Andrich & Luo, 1993; de la Torre, Stark, & Chernyshenko, 2006; Roberts et al., 2000). Because the items in each measure were already sorted by their location parameters, and person parameters were sampled from a N(0,1) distribution, the initial location parameter for item 1 was set at −3 and successive items were assigned initial location values in increments of 6 / (J − 1), where J is the total number of items in the measure. All initial threshold parameters were set at −1.
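The starting-value rule described above amounts to spreading the initial locations evenly from −3 to +3. A minimal Python sketch, with a hypothetical function name, is:

```python
def initial_values(n_items, n_thresholds):
    """Starting values: locations from -3 in steps of 6 / (J - 1), thresholds at -1."""
    step = 6.0 / (n_items - 1)
    deltas = [-3.0 + j * step for j in range(n_items)]
    taus = [[-1.0] * n_thresholds for _ in range(n_items)]
    return deltas, taus

# Ten four-option items, each with three estimated thresholds
deltas0, taus0 = initial_values(n_items=10, n_thresholds=3)
```

With J items the last starting location lands at +3, so the starting values span the same interval as the quadrature grid used in the E-step.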

To perform the numerical integration in the E-step, 61 equally spaced quadrature points on the interval [−3, 3] were used. The likelihood of the response data was computed using the ${\bar{r}}_{j k q}$ (Equation 20) from the E-step and the initial item parameter estimates. In the M-step, the likelihood equations were solved using the bound constrained Broyden–Fletcher–Goldfarb–Shanno (constrained BFGS) optimization method (Byrd, Lu, Nocedal, & Zhu, 1995). Note that constrained BFGS optimization allows lower and upper bounds to be set for each estimated parameter. The lower and upper bounds were set at −5 and +5, respectively, based on the item parameters in previous ideal point studies (e.g., Andrich & Luo, 1993; Roberts et al., 2000). Because BFGS is a minimization algorithm, the log likelihood was multiplied by −1. The likelihood equations were solved one item at a time by repeating the E- and M- steps until the largest change of all individual item parameter estimates between cycles was less than .0005 or a maximum of 999 EM iterations was performed (de la Torre, 2009; Roberts et al., 2000).
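The quadrature grid and stopping rule can be sketched in Python as shown below. The weights are taken to be standard normal densities rescaled to sum to 1, which is one common way to satisfy the constraint on $A (X_{q})$ and is an assumption here. The function names are hypothetical, and the sketch omits the bound constrained optimization itself.

```python
import math

def normal_grid(n_points=61, lo=-3.0, hi=3.0):
    """Equally spaced quadrature nodes with normalized standard normal weights."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + q * step for q in range(n_points)]
    dens = [math.exp(-0.5 * x * x) for x in nodes]
    total = sum(dens)
    return nodes, [w / total for w in dens]

def converged(old_params, new_params, tol=0.0005):
    """True when the largest absolute parameter change between cycles is below tol."""
    return max(abs(a - b) for a, b in zip(old_params, new_params)) < tol

nodes, weights = normal_grid()
small_change = converged([0.10, -1.00], [0.1003, -1.0001])   # changes below .0005
large_change = converged([0.10, -1.00], [0.20, -1.00])       # change of .10
```

In a full implementation this check would be evaluated after each M-step, alongside a counter that stops the loop at the maximum number of EM cycles.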

Person Parameter Estimation

The expected a posteriori (EAP; Bock & Mislevy, 1982) method was chosen for estimating person parameters of the GHCM. In EAP, the person parameter estimates are obtained for each respondent by taking the mean of the posterior density of $θ$ . The EAP estimate $({\hat{θ}}_{E A P})$ and its variance are

{\hat{θ}}_{E A P} = E (θ | X) = \frac{\int θ L (X | θ) f (θ) d θ}{\int L (X | θ) f (θ) d θ},

Var (θ | X) = \frac{\int {(θ - {\hat{θ}}_{E A P})}^{2} L (X | θ) f (θ) d θ}{\int L (X | θ) f (θ) d θ},

where X is a matrix of item responses, $L (X | θ)$ is the likelihood of the response data across items, based on the chosen item response model, and $f (θ)$ is a prior distribution for $θ$ . The posterior standard deviation (PSD) of the EAP estimate is obtained by taking the square root of $Var (θ | X)$ . Note that the standard normal density function for $f (θ)$ was used in the current study.
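The EAP estimate and its PSD can be approximated on the same kind of quadrature grid used for item calibration. The Python sketch below is illustrative only: the function names, the three-item example, and the use of 61 equally spaced nodes for scoring are assumptions made for this example.

```python
import math

def ghcm_probs(theta, delta, taus):
    """GHCM category probabilities for one item (tau_0 = 0 is prepended)."""
    tau = [0.0] + list(taus)
    M = len(taus)
    kappa = [-sum(tau[: y + 1]) for y in range(M + 1)]
    d = theta - delta
    terms = [math.exp(kappa[y]) * 2.0 * math.cosh((M - y) * d) for y in range(M)]
    terms.append(math.exp(kappa[M]))
    gamma = sum(terms)
    return [t / gamma for t in terms]

def normal_grid(n_points=61, lo=-3.0, hi=3.0):
    """Equally spaced nodes with normalized standard normal weights."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + q * step for q in range(n_points)]
    dens = [math.exp(-0.5 * x * x) for x in nodes]
    total = sum(dens)
    return nodes, [w / total for w in dens]

def eap(pattern, deltas, taus_list, nodes, weights):
    """Return (EAP estimate, posterior standard deviation) for one response pattern."""
    joint = []
    for x, a in zip(nodes, weights):
        like = 1.0
        for y, delta, taus in zip(pattern, deltas, taus_list):
            like *= ghcm_probs(x, delta, taus)[y]
        joint.append(like * a)                     # likelihood times prior weight
    marginal = sum(joint)
    mean = sum(x * v for x, v in zip(nodes, joint)) / marginal
    var = sum((x - mean) ** 2 * v for x, v in zip(nodes, joint)) / marginal
    return mean, math.sqrt(var)

nodes, weights = normal_grid()
taus = [-1.5, -1.0, -0.5]                          # hypothetical thresholds
theta_hat, psd = eap([3, 2, 0], [-1.0, 0.0, 1.5], [taus] * 3, nodes, weights)
```

Because the GHCM likelihood depends on $θ$ only through its distance from each item location, a pattern answered on items that all share one location cannot indicate on which side of that location the respondent lies, and the estimate is then pulled to the prior mean.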

Analysis

To assess the efficacy of GHCM parameter estimation, root mean square error (RMSE = $\sqrt{\sum_{i} {({\hat{μ}}_{i} - μ_{i})}^{2} / I}$) and average absolute bias (BIAS = $\sum_{i} | {\hat{μ}}_{i} - μ_{i} | / I$) were computed, where $μ_{i}$ represents the generating parameters, ${\hat{μ}}_{i}$ represents the estimated parameters, and I represents the total number of items or simulees. RMSE and BIAS were computed for each replication and then averaged across replications. In addition, the standard error (SE) of each item parameter estimate was computed by taking the square root of the diagonal element of the approximated inverse Hessian matrix of the log likelihood function, produced by the constrained BFGS optimization on the last EM cycle. For person parameter estimation, the PSD was used as the standard error estimate. Data generation, parameter estimation, and analysis of the item parameter estimates were accomplished using an R v3.2.3 program (R Core Team, 2016) that is available as online supplemental material.
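The two recovery statistics defined above translate directly into code. In this Python sketch the function names and the three-element example vectors are hypothetical.

```python
import math

def rmse(estimates, generating):
    """Root mean square error between estimated and generating parameters."""
    n = len(generating)
    return math.sqrt(sum((e - g) ** 2 for e, g in zip(estimates, generating)) / n)

def abs_bias(estimates, generating):
    """Average absolute difference between estimated and generating parameters."""
    n = len(generating)
    return sum(abs(e - g) for e, g in zip(estimates, generating)) / n

# Hypothetical example: two parameters recovered exactly, one off by 2
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_bias = abs_bias([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Here the RMSE is the square root of 4/3 and the average absolute bias is 2/3, which illustrates that RMSE penalizes a single large error more heavily than the absolute bias does.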

Results

Table 1 presents the overall parameter recovery statistics for the individual simulation conditions. For the four-option polytomous conditions, note that RMSE, BIAS, and SE of $τ_{1}$ , $τ_{2}$ , and $τ_{3}$ were computed separately, then averaged to allow direct comparisons with the dichotomous findings.

Table 1.

GHCM Monte Carlo Study Results.

Response type	Items	N	δ RMSE	δ BIAS	δ SE	τ RMSE	τ BIAS	τ SE	θ RMSE	θ BIAS	θ PSD
Dichotomous	10	50	.44	.40	.40	.66	.64	.33	.65	.54	.65
		100	.36	.29	.30	.50	.45	.23	.62	.51	.63
		200	.28	.21	.20	.37	.31	.16	.68	.56	.68
		400	.15	.11	.15	.17	.13	.11	.66	.55	.65
		800	.16	.11	.10	.15	.10	.08	.68	.56	.67
	20	50	.34	.28	.42	.62	.58	.34	.57	.46	.59
		100	.23	.19	.29	.46	.32	.23	.60	.49	.59
		200	.21	.15	.21	.32	.16	.16	.55	.45	.55
		400	.17	.14	.15	.13	.09	.11	.57	.46	.57
		800	.07	.05	.10	.08	.05	.08	.56	.45	.56
Polytomous	10	50	.76	.40	.20	.40	.32	.47	.41	.33	.42
		100	.19	.14	.15	.14	.12	.33	.42	.33	.42
		200	.16	.11	.10	.11	.09	.23	.42	.34	.42
		400	.06	.05	.07	.06	.05	.16	.42	.34	.42
		800	.04	.04	.05	.03	.02	.11	.42	.34	.42
	20	50	.81	.46	.19	.23	.19	.44	.31	.24	.31
		100	.67	.35	.14	.21	.17	.31	.30	.24	.30
		200	.37	.15	.10	.17	.10	.22	.31	.25	.31
		400	.10	.05	.07	.05	.04	.15	.31	.24	.31
		800	.03	.03	.05	.02	.02	.11	.31	.24	.31

Note. Threshold (τ) parameters were averaged across response categories for the polytomous condition. RMSE = root mean square error; BIAS = absolute bias; SE = standard error; PSD = posterior standard deviation.

The detailed results in Table 1 show that the overall accuracy of GHCM item parameter estimates was a clear function of sample size. The magnitude of the three error measures (RMSE, BIAS, and SE) decreased as sample size increased, regardless of the item parameter in question. It is important to note that after the sample size reached 400, the increased precision afforded by larger samples began to wane. Note also that, with the sample size of 400, the parameter estimates of the GHCM were recovered well. For example, with 400 simulees, the marginal values of RMSE, BIAS, and SE across simulation conditions for $δ$ were .10, .07, and .09, respectively; and the corresponding values for $τ$ were .09, .06, and .12, respectively.

Overall item parameters were estimated better in polytomous conditions than in dichotomous conditions with equal sample size. For example, when the sample size was 400 and the number of items was 20, the BIAS of $δ$ and $τ$ in the dichotomous condition were .14 and .09, respectively, whereas they were .05 and .04 in the corresponding polytomous condition. This suggests that increasing the number of response options tends to improve item parameter estimation, a finding which is consistent with previous ideal point simulation studies (e.g., Joo et al., 2017; Roberts et al., 2002; Roberts & Thompson, 2011). Furthermore, although increasing the number of items from 10 to 20 produced only negligible improvements in polytomous conditions, estimation improved somewhat in the dichotomous conditions.

In addition, person parameters were most accurately estimated with the 20-item polytomous data. As shown in Table 1, the accuracy of the person parameter estimates increased as the number of items increased, and better results were obtained with polytomous data than with dichotomous data. However, person parameter estimates were not influenced by sample size, which suggests robustness to item parameter estimation error as reported in previous research (e.g., Stark, Chernyshenko, & Guenole, 2011).

Real Data Example

To illustrate the application of the GHCM based on the MML-EM algorithm, the authors analyzed 493 responses to a four-option (0 = Strongly Disagree, 1 = Disagree, 2 = Agree, 3 = Strongly Agree), 20-item Orderliness scale (Chernyshenko, 2002). They expected that a sample size of 493 would be adequate for estimation based on the simulation findings. The initial location parameters were generated based on the item responses using the method described by Roberts and Laughlin (1996) and the initial threshold parameters were set at −1. For more information about the initial values for location parameters, readers may refer to the Appendix in Roberts and Laughlin (1996). The parameter estimates and their SEs for the complete set of items are presented in Table 2.

Table 2.

GHCM Item Parameters and Standard Errors for 20-Item Orderliness Scale.

Item	Content	Orderliness (N = 493)
		δ	(SE)	τ₁	(SE)	τ₂	(SE)	τ₃	(SE)
1	I spend a lot of time looking for objects that I’ve misplaced.	−2.36	(.07)	−3.63	(.14)	−1.66	(.10)	−0.37	(.19)
2	I find myself unprepared in most situations.	−2.67	(.08)	−3.68	(.12)	−0.69	(.14)	0.94	(.51)
3	Most of the time my room is in complete disarray.	−2.98	(.06)	−3.46	(.12)	−2.03	(.11)	−1.21	(.18)
4	Taking care of every detail is a waste of time and effort.	−2.48	(.07)	−3.47	(.12)	−0.78	(.13)	−0.18	(.26)
5	I prefer to keep my options open and rarely plan in advance.	−1.41	(.07)	−3.71	(.17)	−1.10	(.10)	0.64	(.19)
6	I seldom make detailed “to do” lists.	−1.17	(.06)	−2.24	(.12)	−1.98	(.10)	−0.10	(.18)
7	Being neat is not exactly my strength.	−2.71	(.06)	−3.86	(.13)	−2.23	(.10)	−0.95	(.17)
8	I do pretty standard maintenance of my property and possessions.	1.51	(.09)	−4.53	(.30)	−3.42	(.13)	0.25	(.14)
9	Most jobs do not require much planning.	−0.76	(.09)	−2.65	(.14)	0.17	(.11)	1.89	(.36)
10	My planning skills are about average.	−0.30	(.10)	−3.02	(.28)	−2.20	(.12)	1.83	(.21)
11	I try to keep everything in its place, but that doesn’t always work for me.	−0.99	(.08)	−3.30	(.23)	−2.39	(.12)	0.75	(.15)
12	I have a daily organizer, but I have a hard time keeping it up to date.	−2.61	(.06)	−3.98	(.15)	−3.10	(.10)	−0.95	(.14)
13	I try to balance my checkbook at the end of each month.	0.77	(.06)	−1.78	(.13)	−0.97	(.10)	0.16	(.14)
14	I have a daily routine and stick to it.	1.62	(.07)	−4.24	(.18)	−1.32	(.10)	0.57	(.20)
15	I prefer to do things in a logical order.	1.93	(.08)	−4.59	(.26)	−3.94	(.14)	−0.34	(.13)
16	Organization is a key component of most things I do.	1.86	(.06)	−4.66	(.19)	−2.49	(.11)	−0.55	(.13)
17	I rarely deviate from my morning routines.	1.10	(.07)	−3.30	(.18)	−1.30	(.10)	0.86	(.18)
18	I become annoyed when things around me are disorganized.	1.71	(.07)	−4.66	(.22)	−2.80	(.11)	−0.23	(.13)
19	I hate when people are sloppy.	1.74	(.07)	−4.55	(.25)	−3.34	(.13)	−0.71	(.11)
20	Every item in my room and on my desk has a designated place.	2.44	(.06)	−4.04	(.15)	−2.90	(.10)	−0.65	(.15)

As shown in Table 2, the estimated values of $δ$ across the 20 items ranged from −2.98 to 2.44, which is consistent with the use of a standard normal prior distribution. The average SE for $δ$ was .07. $τ$ estimates also fell within the range observed in the simulation, with $τ_{1}$ ranging from −4.66 to −1.78, $τ_{2}$ ranging from −3.94 to .17, and $τ_{3}$ ranging from −1.21 to 1.89. Their average SE was .16. Interestingly, the estimated SEs for $δ$ and $τ$ were nearly identical to those in the most similar simulation condition (i.e., four-option, 20-item, N = 400). GHCM person parameters were also estimated using the EAP method. The person parameter estimates ranged from −2.85 to 2.54, and the distribution was nearly symmetric (skewness = .14) and normal with a mean of –.03 and a standard deviation of .81. Normality of the person parameter estimates provides evidence that the parameter estimates of the GHCM were reliable given that MML estimation was based on the normality assumption. As shown in Roberts et al. (2002), if the person parameter estimates were positively or negatively skewed, the accuracy of parameter estimates would tend to decrease and the interpretation might not be accurate.

In addition, selected IRFs from this analysis are presented in Figure 3. In accordance with expectations, the IRFs for Items 9 and 13, which had moderate location parameters (–.76 and .77, respectively), exhibit marked folding (somewhat bell-shaped), whereas the IRFs for Items 4 and 18, which had more extreme location parameters (−2.48 and 1.71, respectively) exhibit only moderate folding at one end of the depicted trait range.

Figure 3.

Selected IRFs from real data analysis using 20-item orderliness scale.

Discussion

This research was conducted to redirect attention toward a potentially useful ideal point model for noncognitive measurement in educational and organizational contexts. The authors developed an MML-EM algorithm for GHCM item parameter estimation and explored item properties and parameter recovery to address concerns raised in early ideal point personality research (Stark et al., 2000). They also presented example IRFs and IIFs to explore GHCM similarity to models that are more mathematically complex (e.g., GGUM; Roberts et al., 2000). Consistent with research on the GGUM (Roberts et al., 2002), they found that GHCM parameters were estimated better with polytomous data than with dichotomous data. However, improvements in item parameter estimation accuracy with samples larger than 400 were fairly small, a finding that should be encouraging to practitioners. In addition, their real data analysis showed that the estimation algorithm produced parameter estimates and SEs in accordance with their simulation findings, as well as with previous ideal point calibration studies using similar prior distributions (e.g., de la Torre et al., 2006).

Comparison With the GGUM

As noted in the introduction, the GHCM and GGUM have a number of features in common. As shown in Figures 1 and 2, the GHCM has IRFs and IIFs resembling the GGUM (Roberts et al., 2000). The IRFs of both models are bell-shaped and symmetric, and their IIFs are bimodal and symmetric around $θ - δ$ = 0. In addition, both the GGUM and the GHCM are flexible enough to fit dichotomous and ordered-polytomous responses, and simulation studies with both models have found better estimation results with polytomous responses.

To explore the similarity of the two models further, the authors examined the correspondence between GHCM and GGUM trait scores in their large sample (N = 800) conditions. They generated GGUM dichotomous and four-option polytomous response data using person parameters (thetas) sampled from a standard normal distribution and item parameters similar to those in previously published studies (e.g., Roberts et al., 2002). They fitted the GGUM and GHCM to those data and compared the resulting EAP theta estimates. Scatterplots showing the correspondence between the GGUM and GHCM theta estimates are presented in Figure 4. Across conditions, the correlations ranged from .76 to .96 and increased substantially as the number of items and categories increased, suggesting that the GHCM may be a viable alternative to the GGUM for measuring noncognitive constructs.

Figure 4.

Correspondence between GHCM and GGUM person parameter (Theta) estimates.
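The scoring and correlation steps behind a comparison of this kind can be sketched as follows. This is not a replication of the analysis above: it assumes the dichotomous HCM form of Andrich and Luo (1993), treats item parameters as known rather than estimated, and correlates EAP estimates with the generating thetas instead of with GGUM-based estimates. All names, the quadrature grid, the seed, and the item parameter values are hypothetical.

```python
import math
import random


def hcm_prob(theta, delta, lam):
    """Probability of endorsement under the assumed dichotomous HCM form."""
    return math.exp(lam) / (math.exp(lam) + 2.0 * math.cosh(theta - delta))


def eap_score(responses, deltas, lams, nodes, prior):
    """Posterior mean of theta over a fixed quadrature grid."""
    num = 0.0
    den = 0.0
    for theta, weight in zip(nodes, prior):
        like = 1.0
        for x, d, l in zip(responses, deltas, lams):
            p = hcm_prob(theta, d, l)
            like *= p if x == 1 else 1.0 - p
        num += theta * like * weight
        den += like * weight
    return num / den


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


rng = random.Random(2018)  # arbitrary seed for reproducibility
n_items, n_persons = 20, 300
deltas = [-2.5 + 5.0 * j / (n_items - 1) for j in range(n_items)]  # hypothetical
lams = [1.0] * n_items  # hypothetical
nodes = [-4.0 + 0.2 * q for q in range(41)]
prior = [math.exp(-0.5 * t * t) for t in nodes]  # unnormalized N(0, 1) weights

true_thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
estimates = []
for theta in true_thetas:
    # Simulate one dichotomous response pattern, then score it.
    row = [1 if rng.random() < hcm_prob(theta, d, l) else 0
           for d, l in zip(deltas, lams)]
    estimates.append(eap_score(row, deltas, lams, nodes, prior))

r = pearson(true_thetas, estimates)
print(round(r, 2))
```

The prior weights are left unnormalized because the normalizing constant cancels in the ratio that defines the posterior mean.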

In terms of parameterization, however, the two models differ in an important way. The GGUM includes a discrimination parameter along with location and threshold parameters, whereas the GHCM contains only location and threshold parameters. The GGUM is thus more flexible and can be seen as the more general model. In future research, it would be interesting to extend the GHCM by adding a discrimination parameter and to compare its fit with that of the GGUM under various conditions.

Limitations and Future Research

Based on the findings of this study, the authors recommend that researchers use polytomous response formats and a calibration sample size of 400 when item parameter accuracy is a priority (e.g., in measurement equivalence studies). These recommendations are made while recognizing the study’s limitations. First, this study and others have shown that ideal point model estimation is generally better with polytomous formats, but the authors compared performance with only two and four response options. Future research might consider expanding the number of options to five, six, or seven to cover the range of formats typically seen in applied research. Second, the authors used an MML-EM method to examine GHCM calibration requirements, but future studies may explore whether Markov chain Monte Carlo (MCMC; Patz & Junker, 1999) methods, which do not require first- and second-order partial derivatives, are more effective with samples of 400 and smaller. Alternatively, a marginal maximum a posteriori (MMAP) method could be developed for the GHCM, given that the MMAP strategy for the GGUM reduces the tradeoffs that are often observed between location and threshold parameters (Roberts & Thompson, 2011). Third, because this study presented GHCM IIFs, it may be useful to examine the GHCM as a basis for computerized adaptive testing (CAT).

In summary, the authors hope that this study serves as a catalyst for future ideal point modeling and application research. They encourage practitioners to explore the GHCM as an alternative to other ideal point models for ordered categorical responses and perhaps develop generalizations to address faking and other types of aberrant responding associated with noncognitive measurement.

Supplemental Material

Supplemental material, APM-17-03-060.R3_Online_Supplemental_Material_(1) for Item Parameter Estimation With the General Hyperbolic Cosine Ideal Point IRT Model by Seang-Hwane Joo, Seokjoon Chun, Stephen Stark, and Olexander S. Chernyshenko in Applied Psychological Measurement

Footnotes

Authors’ Note

Olexander S. Chernyshenko is now affiliated with the University of Western Australia, Australia.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Andrich (1978). Relationships between Thurstone and Rasch approaches to item scaling. Applied Psychological Measurement, 2, 449-460.

Andrich (1988). The application of an unfolding model of the PIRT type to the measurement of attitude. Applied Psychological Measurement, 12, 33-51.

Andrich (1989). A probabilistic IRT model for unfolding preference data. Applied Psychological Measurement, 13, 193-216.

Andrich (1995). Distinctive and incompatible properties of two common classes of IRT models for graded responses. Applied Psychological Measurement, 19, 101-119.

Andrich (1996). A hyperbolic cosine latent trait model for unfolding polytomous responses: Reconciling Thurstone and Likert methodologies. British Journal of Mathematical and Statistical Psychology, 49, 347-365.

Andrich & Luo (1993). A hyperbolic cosine latent trait model for unfolding dichotomous single-stimulus responses. Applied Psychological Measurement, 17, 253-276.

Andrich & Styles, I. M. (1998). The structural relationship between attitude and behavior statements from the unfolding perspective. Psychological Methods, 3, 454-469.

Andrich & Van Schoubroeck (1989). The General Health Questionnaire: A psychometric analysis using latent trait theory. Psychological Medicine, 19, 469-485.

Bock, R. D., & Aitkin (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.

Bock, R. D., & Lieberman (1970). Fitting a response model for dichotomously scored items. Psychometrika, 35, 179-197.

Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431-444.

Borman, W. C., Buck, Hanson, M. A., Motowidlo, S. J., Stark, & Drasgow (2001). An examination of the comparative reliability, validity, and accuracy of performance ratings made using computerized adaptive rating scales. Journal of Applied Psychology, 86, 965-973.

Byrd, R. H., Nocedal, & Zhu (1995). A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16, 1190-1208.

Carter, N. T., & Dalal, D. K. (2010). An ideal point account of the JDI work satisfaction scale. Personality and Individual Differences, 49, 743-748.

Carter, N. T., Dalal, D. K., Boyce, A. S., O’Connell, M. S., Kung, M. C., & Delgado, K. M. (2014). Uncovering curvilinear relationships between conscientiousness and job performance: How theoretically appropriate measurement makes an empirical difference. Journal of Applied Psychology, 99, 564-586.

Carter, N. T., & Zickar, M. J. (2011). The influence of dimensionality on parameter estimation accuracy in the generalized graded unfolding model. Educational and Psychological Measurement, 71, 765-788.

Chernyshenko, O. S. (2002). Applications of ideal point approaches to scale construction and scoring in personality measurement: The development of a six-faceted measure of conscientiousness (Unpublished doctoral dissertation). University of Illinois at Urbana-Champaign.

Chernyshenko, O. S., Stark, Chan, K. Y., Drasgow, & Williams, B. A. (2001). Fitting item response theory models to two personality inventories: Issues and insights. Multivariate Behavioral Research, 36, 523-562.

Chernyshenko, O. S., Stark, Drasgow, & Roberts, B. W. (2007). Constructing personality scales under the assumptions of an ideal point response process: Toward increasing the flexibility of personality measures. Psychological Assessment, 19, 88-106.

de la Torre (2009). DINA model and parameter estimation: A didactic. Journal of Educational and Behavioral Statistics, 34, 115-130.

de la Torre, Stark, & Chernyshenko, O. S. (2006). Markov chain Monte Carlo estimation of item parameters for the generalized graded unfolding model. Applied Psychological Measurement, 30, 216-232.

Joo, S. H., Lee, & Stark (2017). Evaluating anchor-item designs for concurrent calibration with the GGUM. Applied Psychological Measurement, 41, 83-96.

Koenig, J. A., & Roberts, J. S. (2007). Linking parameters estimated with the generalized graded unfolding model: A comparison of the accuracy of characteristic curve methods. Applied Psychological Measurement, 31, 504-524.

Luo (2001). A class of probabilistic unfolding models for polytomous responses. Journal of Mathematical Psychology, 45, 224-248.

Luo & Andrich (2005). Item information functions for the general unfolding dichotomous model. In Alagumalai, Curtis, D. D., & Hungi (Eds.), Applied Rasch measurement: A book of exemplars (pp. 308-328). New York, NY: Springer.

Mislevy, R. J., & Bock, R. D. (1990). BILOG 3: Item analysis and test scoring with binary logistic models [Computer program]. Mooresville, IN: Scientific Software.

Patz, R. J., & Junker, B. W. (1999). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24, 342-366.

R Core Team. (2016). R: A language and environment for statistical computing [Computer software]. Vienna, Austria: R Foundation for Statistical Computing.

Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2000). A general item response theory model for unfolding unidimensional polytomous responses. Applied Psychological Measurement, 24, 3-32.

Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2002). Characteristics of MML/EAP parameter estimates in the generalized graded unfolding model. Applied Psychological Measurement, 26, 192-207.

Roberts, J. S., & Laughlin, J. E. (1996). A unidimensional item response model for unfolding responses from a graded disagree-agree response scale. Applied Psychological Measurement, 20, 231-255.

Roberts, J. S., Laughlin, J. E., & Wedell, D. H. (1999). Validity issues in the Likert and Thurstone approaches to attitude measurement. Educational and Psychological Measurement, 59, 211-233.

Roberts, J. S., & Thompson, V. M. (2011). Marginal maximum a posteriori item parameter estimation for the generalized graded unfolding model. Applied Psychological Measurement, 35, 259-279.

Samejima (1969). Estimation of latent ability using a response pattern of graded scores (Psychometric Monograph No. 17). Richmond, VA: Psychometric Society.

Seybert (2013). A new item response theory model for estimating person ability and item parameters for multidimensional rank order responses (Unpublished doctoral dissertation). University of South Florida, Tampa.

Stark, Chernyshenko, O. S., & Drasgow (2005). An IRT approach to constructing and scoring pairwise preference items involving stimuli on different dimensions: The multi-unidimensional pairwise-preference model. Applied Psychological Measurement, 29, 184-203.

Stark, Chernyshenko, O. S., Drasgow, & Williams, B. A. (2006). Examining assumptions about item responding in personality assessment: Should ideal point methods be considered for scale development and scoring? Journal of Applied Psychology, 91, 25-39.

Stark, Chernyshenko, O. S., & Guenole (2011). Can subject matter expert ratings of statement extremity be used to streamline the development of unidimensional pairwise preference scales? Organizational Research Methods, 14, 256-278.

Stark, Chernyshenko, O. S., Lee, W. C., & Drasgow (2000, April). New insights in personality measurement: Application of an ideal point IRT model. Paper presented at the 15th annual conference for the Society of Industrial and Organizational Psychology, New Orleans, LA.

Tay, Ali, U. S., Drasgow, & Williams, B. A. (2011). Fitting simulated dichotomous and polytomous data: Examining the difference between ideal point and dominance models. Applied Psychological Measurement, 35, 280-295.
