Uncertainty in Latent Trait Models

Abstract

A model that extends the Rasch model and the Partial Credit Model to account for subject-specific uncertainty when responding to items is proposed. It is demonstrated that ignoring the subject-specific uncertainty may yield biased estimates of model parameters. In the extended version of the model, uncertainty and the underlying trait are linked to explanatory variables. The parameterization makes it possible to identify subgroups that differ in uncertainty and the underlying trait. The modeling approach is illustrated using data on the confidence of citizens in public institutions.

Keywords

Rasch model, partial credit model, rating scales, response styles, ordinal data, heterogeneity, dispersion

Introduction

Individual-specific tendencies to respond to items irrespective of content can affect the reliability and validity of scale scores. In particular, response styles and their impact on reliability have been thoroughly investigated. Response styles include a tendency to middle or extreme categories, a tendency to agree with items regardless of content, or a tendency to respond to items carelessly. An overview that includes more response styles was given by Van Vaerenbergh and Thomas (2013).

Various methods for investigating response styles in latent trait theory have been proposed. One method uses multi-trait models, which assume that there are several distinct traits that influence category selection, one or more of the traits representing response styles; see, for example, Bolt and Johnson (2009), Johnson and Bolt (2010), Bolt and Newton (2011), Wetzel and Carstensen (2017), and Falk and Cai (2016). Johnson (2003) considered a cumulative type model for extreme response styles, Wetzel and Carstensen (2017) and Plieninger (2016) proposed partial credit models (PCMs) that account for specific response styles, and Jin and Wang (2014) and Tutz et al. (2018) extended the PCM to accommodate the extreme response style. An alternative strategy for measuring response style is the use of finite mixtures, which was introduced by Rost et al. (1996). It is assumed that the observed response is a mixture of a finite number of latent responses, that is, the whole population can be divided into disjoint latent classes. After classes have been identified, it is investigated whether item characteristics differ between classes, potentially revealing differing response styles; see, for example, Eid and Rauber (2000), Gollwitzer et al. (2005), Maij-de Meij et al. (2008), Moors (2010), and Van Rosmalen et al. (2010). An instructive overview of mixture-distribution and HYBRID Rasch models was given by von Davier and Yamamoto (2007). More recently, tree-based methods to investigate response styles have been proposed; see Böckenholt (2017) and Böckenholt and Meiser (2017).

This article investigates a specific response behavior that does not amount to a response style in the traditional sense although it shares similarities with the noncontingent response style (NCR), which is found if persons have a tendency to respond to items carelessly, randomly, or nonpurposefully (Baumgartner & Steenkamp, 2001; Van Vaerenbergh & Thomas, 2013). The response behavior that is considered is characterized by varying degrees of uncertainty. That is, respondents may respond in a deliberate way, knowing exactly which category they prefer, or suffer from a high degree of uncertainty, responding nonpurposefully. Traditionally, the term “response style” is used to describe an individual’s tendency to choose a certain kind of response category, for example, extreme or middle categories, irrespective of item content and the individual’s trait value. Modeling of potential uncertainty is somewhat different. If a person is very certain about the category he or she prefers, the person will have a very high probability of choosing a specific category for any given item; however, the chosen categories will be different over items if the item parameters differ across items. Therefore, the person will not prefer a specific kind of response category and thus the behavior is not a response style in the traditional sense. Although one could consider it as a response style in a wider sense because the person has a specific way of responding to items that is not driven by content, the authors will not refer to it as a response style to avoid confusion with the term response style as it is commonly used.

In recent years, the inclusion of uncertainty in ordinal regression has been investigated by Piccolo (2003), Iannario and Piccolo (2016), Gottard et al. (2016), Tutz et al. (2017), and Simone and Tutz (2018); a comprehensive overview has been given by Piccolo and Simone (2019). The basic assumption behind the so-called CUB models, where CUB stands for Combination of a Uniform and a shifted Binomial distribution, is that the choice of a response category is determined by a mixture of a distinct preference and uncertainty. The latter is represented by a uniform distribution over the response categories. However, CUB models are designed as regression models without assuming repeated measurements. They are not latent trait models, uncertainty is linked to explanatory variables, and they do not account for subject-specific response styles.

The modeling strategy proposed in the following is the explicit modeling of uncertainty by introducing subject-specific parameters that are consistent throughout items and might be determined by external explanatory variables. The proposed model explicitly aims at modeling the heterogeneity in the population. The authors consider in detail extensions of the PCM (Masters, 1982), a model for polytomous data that reduces to the binary Rasch model when applied to dichotomous data.

Subject-specific factors for binary models were considered before. The approach proposed by Reise (2000) has been critically discussed by Conijn et al. (2011). The latter investigated in particular problems with the representation as a multilevel logistic regression model. More recently, Ferrando (2016) proposed a normal-ogive model that contains item and person discrimination parameters. The presence of two factors makes difficult estimation procedures necessary. Therefore, Ferrando (2016) proposed a two-step approach, which works only under rather specific assumptions. The model proposed here differs from the models proposed by Ferrando and others in several respects. The authors consider extensions of the PCM, not the graded response model. Moreover, the authors include explanatory variables and use marginal estimation methods that allow the slope parameters to be correlated with content-related parameters.

In section “Unobserved Heterogeneity and the Occurrence of Invalid Parameters,” it is demonstrated that ignoring heterogeneity in variance over subpopulations may yield strongly misleading parameter estimates. In section “Heterogeneity in Uncertainty,” models are proposed that account for the heterogeneity by including subject-specific parameters. After investigating the properties of the parameter estimates in a simulation study (section “Simulation Study”), an application is given. In section “Alternative Item Response Models,” it is briefly shown that multiplicative effects function quite differently in different models.

Unobserved Heterogeneity and the Occurrence of Invalid Parameters

In the following, a specific form of unobserved heterogeneity that can cause severe problems in latent trait models is considered. It is of interest because it can be seen as one of the sources of uncertainty and a motivation for the model that is proposed. For simplicity, the authors consider the binary Rasch model although the same problems are found in latent trait models with more than two response categories. The binary Rasch model assumes that the response $Y_{pi} \in \{0, 1\}$ of person $p$ when responding to item $i$ is determined by:

P (Y_{pi} = 1) = \frac{\exp (θ_{p} - δ_{i})}{1 + \exp (θ_{p} - δ_{i})}, p = 1, \dots, P, i = 1, \dots, I .

In achievement tests, $θ_{p}$ typically represents the ability of the person and $δ_{i}$ the difficulty of the item. In questionnaires, $θ_{p}$ may represent the attitude and $δ_{i}$ an item-specific threshold on the latent scale. In both cases, it is assumed that the parameters are on the same latent scale. For the identification of problems that may arise when using the Rasch model, it is instructive to consider the derivation of the model from the assumption of latent random variables. When person $p$ meets item $i$ one assumes:

The ability or attitude is determined by the continuous random variable $Y_{pi}^{*} = θ_{p} + σ_{ε} ε_{pi}$ , where $θ_{p}$ is a fixed parameter linked to the person, $ε_{pi}$ is a random variable that represents the variability of the response, and $σ_{ε}$ is a dispersion parameter.

The link between the unobserved variable $Y_{pi}^{*}$ and the observed response is given by $Y_{pi} = 1$ if $Y_{pi}^{*} \geq δ_{i}$ , which means that one observes $Y_{pi} = 1$ if the latent variable is larger than the item-specific threshold $δ_{i}$ .

If one assumes that the noise variable $ε_{pi}$ has the logistic distribution function $F (η) = \exp (η) / (1 + \exp (η))$ , it is straightforward to derive the model:

P (Y_{pi} = 1) = \frac{\exp ((θ_{p} - δ_{i}) / σ_{ε})}{1 + \exp ((θ_{p} - δ_{i}) / σ_{ε})}, p = 1, \dots, P, i = 1, \dots, I .

In this representation parameters are not identifiable; therefore, constraints on the parameters are needed. Typically, one uses the scale constraint $σ_{ε} = 1$ . Then, the model is equivalent to the common Rasch model in which parameters are identified up to an additive constant. To obtain identified parameters, it is common to use a sum normalization so that the sum of item difficulties is assumed to be zero. Alternatively, an arbitrary item can be fixed to a constant value.
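The derivation from latent variables can be checked numerically. The following minimal sketch (plain Python, standard library only) draws $Y_{pi}^{*} = θ_{p} + σ_{ε} ε_{pi}$ with logistic noise, dichotomizes at $δ_{i}$ , and compares the observed share of ones with the closed-form probability. The function names and the parameter values are illustrative assumptions and are not taken from the article.

```python
import math
import random

def rasch_prob(theta, delta, sigma=1.0):
    """Closed-form P(Y = 1) for the latent-variable model with dispersion sigma."""
    eta = (theta - delta) / sigma
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_share(theta, delta, sigma, n, rng):
    """Share of draws with Y* = theta + sigma * eps >= delta, eps standard logistic."""
    hits = 0
    for _ in range(n):
        u = rng.random()
        while u == 0.0:          # avoid log(0) in the inverse-CDF transform
            u = rng.random()
        eps = math.log(u / (1.0 - u))
        if theta + sigma * eps >= delta:
            hits += 1
    return hits / n

rng = random.Random(1)
theta, delta, sigma = 0.5, -0.3, 2.0   # illustrative values
closed_form = rasch_prob(theta, delta, sigma)
monte_carlo = simulate_share(theta, delta, sigma, 100_000, rng)
print(closed_form, monte_carlo)

# Without a scale constraint the parameters are not identified:
# multiplying theta, delta, and sigma by the same constant leaves P(Y = 1) unchanged.
same_prob = rasch_prob(2 * theta, 2 * delta, 2 * sigma)
```

The last line illustrates why a constraint such as $σ_{ε} = 1$ is needed: rescaled parameter triples produce identical response probabilities.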

The derivation implicitly assumes that the dispersion parameter $σ_{ε}$ is the same for all persons and items. However, this is a strong assumption that does not have to hold. Let us assume more generally that the latent variable is given by $Y_{pi}^{*} = θ_{p} + σ_{p} ε_{pi}$ with person-specific dispersion $σ_{p}$ . To keep things simple, let us first consider the case where the dispersion takes only two values, depending on a binary trait like gender or age group (young/old). This can be represented by $σ_{p} = \exp (x_{p} γ)$ , where $x_{p}$ is a group indicator with values $x_{p} \in \{0, 1\}$ . Then, one obtains $σ_{p} = \exp (γ)$ if $x_{p} = 1$ and $σ_{p} = 1$ if $x_{p} = 0$ . If one derives the observed response in the same way as previously as a dichotomized version of latent variables, one obtains different parameters for the two groups. More concretely, one obtains,

\log (\frac{P (Y_{pi} = 1)}{P (Y_{pi} = 0)}) = θ_{p} - δ_{i} if x_{p} = 0, \log (\frac{P (Y_{pi} = 1)}{P (Y_{pi} = 0)}) = \frac{θ_{p}}{e^{γ}} - \frac{δ_{i}}{e^{γ}} if x_{p} = 1 .

This entails peculiar effects if one wants to compare parameters. Actually, one has two Rasch models, one that holds in the subpopulation $x_{p} = 0$ and one in the subpopulation $x_{p} = 1$ . Formally, these can be given by:

\log (\frac{P (Y_{pi} = 1)}{P (Y_{pi} = 0)}) = θ_{p}^{(s)} - δ_{i}^{(s)}, s = 0, 1,

(1)

with $s = 0$ representing $x_{p} = 0$ and $s = 1$ representing $x_{p} = 1$ . In the group $x_{p} = 0$ , one has the original parameters $θ_{p}^{(0)} = θ_{p}$ , $p = 1, \dots, P$ , and $δ_{i}^{(0)} = δ_{i}$ , $i = 1, \dots, I$ , whereas in the group $x_{p} = 1$ , one has the parameters $θ_{p}^{(1)} = θ_{p} / e^{γ}$ , $p = 1, \dots, P$ , and $δ_{i}^{(1)} = δ_{i} / e^{γ}$ , $i = 1, \dots, I$ . It is essential that in both subpopulations simple Rasch models hold. However, comparison of parameters between groups may be strongly misleading. For illustration, let $x_{p}$ refer to gender with $x_{p} = 1$ coding females and $x_{p} = 0$ males. Let us consider two persons, one female with parameter $θ_{f}$ and one male with parameter $θ_{m}$ , who have the same strength parameter, that is, $θ_{f} = θ_{m}$ . If one compares the Rasch model parameters of the two persons, one obtains,

\frac{θ_{m}^{(0)}}{θ_{f}^{(1)}} = \frac{θ_{m}}{θ_{f}} e^{γ} = e^{γ} .

That means, if $γ > 0$ , although the underlying abilities are the same ( $θ_{f} = θ_{m}$ ), the comparison of abilities as measured by the Rasch model parameters $θ_{p}^{(s)}$ indicates that the ability of the male person is larger than the ability of the female person. The reason is that the female person is confronted with “simpler” items $δ_{i} / e^{γ}$ than the male person.
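A small numeric illustration of this rescaling is sketched below. It computes the group-specific Rasch parameters $θ_{p}^{(s)}$ and $δ_{i}^{(s)}$ for a male and a female person with identical underlying trait values. The value of $γ$ , the item difficulties, and the helper name `group_parameters` are assumptions chosen only for illustration.

```python
import math

def group_parameters(theta, deltas, gamma, x):
    """Rasch parameters that hold within group x when sigma_p = exp(x * gamma)."""
    scale = math.exp(x * gamma)
    return theta / scale, [d / scale for d in deltas]

gamma = 0.7                   # illustrative dispersion effect (gamma > 0)
deltas = [-1.0, 0.0, 1.0]     # illustrative item difficulties
theta_m = theta_f = 1.2       # identical underlying trait values

theta_m0, deltas_0 = group_parameters(theta_m, deltas, gamma, x=0)  # males
theta_f1, deltas_1 = group_parameters(theta_f, deltas, gamma, x=1)  # females

ratio = theta_m0 / theta_f1   # equals exp(gamma) although theta_m == theta_f
print(ratio, math.exp(gamma))
print(deltas_0, deltas_1)
```

Both the person parameter and the item difficulties of the female group are shrunk toward zero by the factor $e^{γ}$ , which is why the within-group models are valid while cross-group comparisons are not.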

It should be noted that the Rasch model does not hold in the total population. However, it holds in each subpopulation and can be legitimately fitted within subpopulations. But parameters (and parameter estimates) cannot be compared because parameters in each subpopulation are scaled using the scale constraint $σ_{ε} = 1$ in each subpopulation.

Even if one does not want to compare parameter estimates, it is obvious that one runs into problems if one ignores heterogeneity and fits a simple Rasch model to the total population. The heterogeneity of the person parameters is less severe because although the persons come from different subpopulations, each person has his or her own parameter. However, estimates of item parameters tend to be biased because persons from different subpopulations respond to items with different difficulty parameters. For males, the difficulties are $δ_{i}$ , and for females, $δ_{i} / e^{γ}$ .

Similar problems with unobserved heterogeneity have been found for binary and ordinal regression models, and Allison (1999) showed that misleading parameter estimates can occur if one fits a binary logit model in separate groups. Some methods to correct parameter estimates in regression were considered by Williams (2009), Mood (2010), Karlson et al. (2012), Breen et al. (2014), and Tutz (2018). In item response theory mixture-distribution approaches can be used to this end, for an overview see von Davier and Rost (2016).

Heterogeneity in Uncertainty

In the following, the authors consider models that are able to avoid the occurrence of biased estimates caused by unobserved heterogeneity. The family of models that is considered is the Rasch model family represented by the PCM.

The PCM

Let $Y_{pi} \in \{0, 1, \dots, k\}$ , $p = 1, \dots, P$ , $i = 1, \dots, I$ , denote the ordinal response of person $p$ on item $i$ . The PCM, which was considered, among others, by Masters (1982) and Masters and Wright (1984), assumes for the probabilities:

P (Y_{pi} = r) = \frac{\exp (\sum_{l = 1}^{r} (θ_{p} - δ_{il}))}{\sum_{s = 0}^{k} \exp (\sum_{l = 1}^{s} (θ_{p} - δ_{il}))}, r = 1, \dots, k,

where $θ_{p}$ is the person parameter and $(δ_{i 1}, \dots, δ_{ik})$ are the item parameters of item $i$ . For notational convenience, the definition of the model implicitly uses $\sum_{l = 1}^{0} (θ_{p} - δ_{il}) = 0$ .

The defining property of the PCM is seen if one considers adjacent categories. The resulting presentation,

\log (\frac{P (Y_{pi} = r)}{P (Y_{pi} = r - 1)}) = θ_{p} - δ_{ir}, r = 1, \dots, k,

shows that the model is locally (given response categories $r - 1$ , $r$ ) a binary Rasch model with person parameter $θ_{p}$ and item difficulty $δ_{ir}$ . It is immediately seen that for $θ_{p} = δ_{ir}$ , the probabilities of adjacent categories are equal, that is, $P (Y_{pi} = r) = P (Y_{pi} = r - 1)$ .
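The adjacent-categories property can be verified with a few lines of code. The sketch below computes the PCM category probabilities from cumulative sums of $θ_{p} - δ_{il}$ and then recovers the local log-odds. The function name `pcm_probs` and the numeric values are illustrative assumptions.

```python
import math

def pcm_probs(theta, deltas):
    """P(Y = 0), ..., P(Y = k) for one item with thresholds deltas = [delta_1, ..., delta_k]."""
    cum = [0.0]                       # empty sum for category 0
    for d in deltas:
        cum.append(cum[-1] + (theta - d))
    top = max(cum)                    # subtract the maximum for numerical stability
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]

theta = 0.4                           # illustrative person parameter
deltas = [-1.0, 0.2, 1.5]             # illustrative thresholds (k = 3)
probs = pcm_probs(theta, deltas)

# Local log-odds between categories r and r - 1 should equal theta - delta_r.
adjacent_logits = [math.log(probs[r] / probs[r - 1]) for r in range(1, len(probs))]
print(probs)
print(adjacent_logits)
```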

An Extended PCM

The extended version of the PCM that is proposed has the form:

P (Y_{pi} = r) = \frac{\exp (\sum_{l = 1}^{r} e^{α_{p}} (θ_{p} - δ_{il}))}{\sum_{s = 0}^{k} \exp (\sum_{l = 1}^{s} e^{α_{p}} (θ_{p} - δ_{il}))}, r = 1, \dots, k .

Thus, the usual predictor in the PCM, $η_{pir} = θ_{p} - δ_{ir}$ , which distinguishes between category $r - 1$ and $r$ , is replaced by the more general predictor:

η_{pir} = e^{α_{p}} (θ_{p} - δ_{ir}), r = 1, \dots, k,

(2)

which contains the additional subject-specific parameter $α_{p}$ . As shown in the following, the parameter $α_{p}$ determines the observed response in a specific way.

Interpretation of subject-specific parameters

Let us start with the simplest case of a binary response ( $k = 1$ ). Then, it is easily seen that the following holds:

If $α_{p} = 0$ for all $p$ , one obtains the binary Rasch model.

If $α_{p} > 0$ , the person $p$ is a strong discriminator, he or she has a distinct preference for one of the two categories depending on $θ_{p}$ . For $α_{p} \to \infty$ , one obtains $P (Y_{pi} = 1) = 1$ if $θ_{p} > δ_{i 1}$ and $P (Y_{pi} = 0) = 1$ if $θ_{p} < δ_{i 1}$ .

If $α_{p} < 0$ , the person $p$ is a weak discriminator. For $α_{p} \to - \infty$ , one obtains $P (Y_{pi} = 1) = 0.5$ for all abilities/attitudes $θ_{p}$ . The person shows a NCR, which means he or she has a tendency to respond to items randomly, or nonpurposefully.

In the general PCM, one has to distinguish between two cases, ordered thresholds and unordered thresholds. In the case of ordered thresholds $(δ_{ir} \leq δ_{i, r + 1})$ , one obtains the following:

If $α_{p} = 0$ for all $p$ , one obtains the traditional PCM.

For $α_{p} \to \infty$ , one obtains for a person with $θ_{p} \in (δ_{ir}, δ_{i, r + 1})$ the probability $P (Y_{pi} = r) = 1$ , one observes a distinct response, the person knows exactly which category he or she prefers. The property holds for all $k$ if one defines in addition $δ_{i 0} = - \infty$ , $δ_{i, k + 1} = \infty$ .

For $α_{p} \to - \infty$ , one obtains $P (Y_{pi} = r) = 1 / (k + 1)$ for all abilities/attitudes $θ_{p}$ . The person’s response has a discrete uniform distribution over the response categories, which means random responding.
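The limiting behavior for ordered thresholds can be reproduced numerically. The following sketch evaluates the extended PCM for a strongly negative, a zero, and a strongly positive value of $α_{p}$ . The function name `upcm_probs`, the thresholds, and the chosen values of $α_{p}$ are assumptions for illustration; very large absolute values stand in for the limits.

```python
import math

def upcm_probs(theta, deltas, alpha):
    """Category probabilities of the extended PCM with predictor exp(alpha) * (theta - delta_r)."""
    scale = math.exp(alpha)
    cum = [0.0]
    for d in deltas:
        cum.append(cum[-1] + scale * (theta - d))
    top = max(cum)                    # stabilizes the exponentials for large alpha
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]

deltas = [-1.0, 0.0, 1.0]   # ordered thresholds, k = 3 (illustrative)
theta = 0.5                 # lies between delta_2 and delta_3

near_uniform = upcm_probs(theta, deltas, alpha=-10.0)  # approximates alpha -> -infinity
standard_pcm = upcm_probs(theta, deltas, alpha=0.0)    # traditional PCM
distinct = upcm_probs(theta, deltas, alpha=10.0)       # approximates alpha -> +infinity

print(near_uniform)
print(standard_pcm)
print(distinct)
```

With $θ_{p}$ between $δ_{i 2}$ and $δ_{i 3}$ , almost all probability mass moves to category 2 as $α_{p}$ grows, while the probabilities approach $1 / (k + 1)$ as $α_{p}$ decreases.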

For illustration, the impact of the parameter $α_{p}$ is visualized in Figure 1. It shows the response probabilities for a PCM with four categories for five different values of $α_{p}$ . For $α_{p} = 0$ , one obtains the response probabilities given in the middle, which represent the response probabilities for the traditional PCM without subject-specific heterogeneity in discrimination. It is seen that for decreasing $α_{p}$ , one comes closer to a uniform distribution across categories, whatever the parameter $θ_{p}$ is; for increasing $α_{p}$ , the preference for categories becomes very distinct depending on the value of $θ_{p}$ . The chosen parameters are rather large/small so that the impact becomes obvious.

Figure 1.

Response probabilities in an extended PCM for four values of $α_{p}$ (ordered thresholds).

In the case of three response categories ( $k = 2$ ), which are considered for simplicity, and reverse thresholds $δ_{i 2} < δ_{i 1}$ , the following behavior is found. For $α_{p} \to \infty$ , one obtains:

For all persons $P (Y_{pi} = 1) = 0$ , that is, the middle category is never chosen.

For persons with $θ_{p} < (δ_{i 1} + δ_{i 2}) / 2$ one has $P (Y_{pi} = 0) = 1$ .

For persons with $θ_{p} > (δ_{i 1} + δ_{i 2}) / 2$ one has $P (Y_{pi} = 2) = 1$ .

Thus, the inverse structure of thresholds yields a more distinct avoidance of the middle category than the traditional PCM.

For $α_{p} \to - \infty$ , one obtains again $P (Y_{pi} = r) = 1 / (k + 1)$ for all abilities/attitudes $θ_{p}$ . The person has a discrete uniform distribution over the response categories, which means random responding.
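The case of reversed thresholds with three categories can be illustrated in the same way. The sketch below uses thresholds with $δ_{i 2} < δ_{i 1}$ and evaluates the model for persons below and above the midpoint $(δ_{i 1} + δ_{i 2}) / 2$ . The threshold values and the large value of $α_{p}$ used to approximate the limit are illustrative assumptions.

```python
import math

def upcm_probs(theta, deltas, alpha):
    """Category probabilities of the extended PCM with predictor exp(alpha) * (theta - delta_r)."""
    scale = math.exp(alpha)
    cum = [0.0]
    for d in deltas:
        cum.append(cum[-1] + scale * (theta - d))
    top = max(cum)
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]

deltas_rev = [1.0, -1.0]                         # reversed thresholds, k = 2
midpoint = (deltas_rev[0] + deltas_rev[1]) / 2   # equals 0.0 here

below = upcm_probs(midpoint - 0.5, deltas_rev, alpha=8.0)   # large alpha, theta below midpoint
above = upcm_probs(midpoint + 0.5, deltas_rev, alpha=8.0)   # large alpha, theta above midpoint
pcm_above = upcm_probs(midpoint + 0.5, deltas_rev, alpha=0.0)  # traditional PCM for comparison
random_resp = upcm_probs(midpoint + 0.5, deltas_rev, alpha=-8.0)

print(below, above, pcm_above, random_resp)
```

In the traditional PCM ( $α_{p} = 0$ ) the middle category keeps a positive probability, whereas for large $α_{p}$ it is practically never chosen.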

As has been demonstrated, the parameter $α_{p}$ can be seen as modeling the subject-specific decisiveness or discriminatory power. For large $α_{p}$ , the person has distinct preferences; for small $α_{p}$ , the person tends to choose one of the response categories at random. As it is not possible to determine if indecisiveness or carelessness is the reason, the authors will, more generally, refer to the subject-specific effect $e^{α_{p}}$ as uncertainty effect. Although the terminology used primarily refers to attitude measurement or personality questionnaires, uncertainty may also come into play in achievement tests. The uncertainty may refer to a nonpurposeful response reflecting whether a person works in a concentrated or a distracted manner. Without specifying the specific source, the term $e^{α_{p}}$ is considered as representing uncertainty and the extended model is called the uncertainty partial credit model (UPCM).

The uncertainty can also explain the occurrence of response patterns that are unlikely in a unidimensional model in which uncertainty is ignored. The responses of a person with high uncertainty are hardly predictable because he or she shows random behavior.

It should be noted that the uncertainty parameter $α_{p}$ is a function of the unobserved heterogeneity considered in section “Unobserved Heterogeneity and the Occurrence of Invalid Parameters.” In the special case of binary responses ( $k = 1$ ), the unobserved dispersion $σ_{p}$ in the latent variable $Y_{pi}^{*} = θ_{p} + σ_{p} ε_{pi}$ is given by $σ_{p} = e^{- α_{p}}$ . This interpretation as the heterogeneity in discrimination resulting from unobserved variability is also possible in the general PCM. If one derives the PCM from latent variables locally (given categories $r - 1$ , $r$ ), the same reasoning applies as for the binary Rasch model. While $e^{α_{p}}$ represents the distinctness of person $p$ , $e^{- α_{p}}$ represents the person-specific dispersion.

The UPCM extends the PCM to account for uncertainty of respondents. In the same way, the generalized PCM considered by Muraki (1992, 1997) can be extended to the uncertainty generalized partial credit model (UGPCM):

\log (\frac{P (Y_{pi} = r)}{P (Y_{pi} = r - 1)}) = e^{α_{p}} a_{i} (θ_{p} - δ_{ir}), r = 1, \dots, k,

which includes an item-specific slope parameter $a_{i}$ . In the generalized PCM, the items have differing discriminatory powers. The additional subject-specific uncertainty parameter means that discriminatory power varies also across persons.

The UPCM and the UGPCM extend the PCM and its generalized version to include the uncertainty of respondents. The parameterization is quite different from the extensions of the PCM considered by Jin and Wang (2014) and Tutz et al. (2018). The latter aims at modeling extreme response styles and assumes that the distance between thresholds of adjacent categories is subject-specific.

Including Subject-Specific Characteristics

In the UPCM, each person has his or her own uncertainty parameter $α_{p}$ , which yields a large number of parameters. Thus, for estimation, it is useful to assume that they are random effects. It is of special interest to investigate if dispersion heterogeneity is determined by subject-specific covariates. To this end, let the uncertainty parameter depend on a vector of subject-specific covariates $x_{p}$ in the form $α_{p} = α_{p 0} + x_{p}^{T} α$ and assume that $α_{p 0}$ is a random effect that follows the normal distribution $N (0, σ^{2})$ . In the same way, one can include explanatory variables for the trait parameter using $θ_{p} = θ_{p 0} + x_{p}^{T} ξ$ . Thus, the general UPCM is given by,

P (Y_{pi} = r) = \frac{\exp (\sum_{l = 1}^{r} e^{α_{p 0} + x_{p}^{T} α} (θ_{p 0} + x_{p}^{T} ξ - δ_{il}))}{\sum_{s = 0}^{k} \exp (\sum_{l = 1}^{s} e^{α_{p 0} + x_{p}^{T} α} (θ_{p 0} + x_{p}^{T} ξ - δ_{il}))}, r = 1, \dots, k .

Figure 2 shows the resulting response probabilities if a binary predictor (male: $x_{p} = 1$ , female: $x_{p} = 0$ ) is included in the location part and the dispersion part of a UPCM with parameters $α = 1.5$ and $ξ = 0$ for the middle row of the plot and parameters $α = 0$ and $ξ = 1.5$ for the bottom row. The first row shows the effect of $α_{p 0}$ : larger values increase the distinctness, and smaller values decrease it. In the middle row, one sees the probabilities resulting from an additional dispersion effect $α = 1.5$ , which makes all responses more distinct in the male population. In the third row, the location/trait effect $ξ = 1.5$ is visualized. It increases the probabilities for higher categories because the trait is stronger in the male population.
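The two kinds of covariate effects can be contrasted with a short computation. The sketch below evaluates the general UPCM for a single binary covariate and compares a female respondent ( $x_{p} = 0$ ) with a male respondent ( $x_{p} = 1$ ) under a pure dispersion effect ( $α = 1.5$ , $ξ = 0$ ) and under a pure location effect ( $α = 0$ , $ξ = 1.5$ ). The thresholds, the value of $θ_{p 0}$ , and the function names are assumptions for illustration and do not reproduce the exact settings behind Figure 2.

```python
import math

def general_upcm_probs(theta0, alpha0, x, xi, alpha, deltas):
    """Category probabilities with theta_p = theta0 + x * xi and alpha_p = alpha0 + x * alpha."""
    theta_p = theta0 + x * xi
    scale = math.exp(alpha0 + x * alpha)
    cum = [0.0]
    for d in deltas:
        cum.append(cum[-1] + scale * (theta_p - d))
    top = max(cum)
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]

def expected_category(probs):
    """Mean response category under the given probabilities."""
    return sum(r * p for r, p in enumerate(probs))

deltas = [-1.0, 0.0, 1.0]     # illustrative thresholds
theta0, alpha0 = 0.3, 0.0     # illustrative person-specific values

female = general_upcm_probs(theta0, alpha0, x=0, xi=0.0, alpha=0.0, deltas=deltas)
male_dispersion = general_upcm_probs(theta0, alpha0, x=1, xi=0.0, alpha=1.5, deltas=deltas)
male_location = general_upcm_probs(theta0, alpha0, x=1, xi=1.5, alpha=0.0, deltas=deltas)

print(max(female), max(male_dispersion))                        # distinctness of the response
print(expected_category(female), expected_category(male_location))  # shift toward higher categories
```

The dispersion effect sharpens the distribution around the preferred category, while the location effect shifts probability mass toward higher categories.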

Figure 2.

Response probabilities in an extended PCM with a binary predictor and varying parameters $α_{p 0}, α, ξ$ .

Parameters are estimated by marginal likelihood where it is assumed that the person parameters $θ_{p 0}, α_{p 0}$ follow a normal distribution, $N (0, Σ)$ . The diagonals of the matrix $Σ$ contain the variance of the person effects $σ_{θ}^{2}$ and the variance of the uncertainty parameters $σ_{α}^{2}$ , and the off diagonals are the covariances between uncertainty and location effects, $\mathrm{cov}_{θ α}$ . Details of the estimation procedure are given in an online appendix.

Simulation Study

A variety of simulations has been conducted to evaluate the performance of the method and the possible consequences of ignoring uncertainty. Both model versions the authors propose (UPCM and UGPCM) are compared with their simpler counterparts PCM and GPCM, respectively. The number of observations ( $n = 200$ or $n = 500$ ) and the number of items ( $I = 10$ or $I = 30$ ) were varied over the simulation settings. The data were simulated under the assumption that the UPCM or the UGPCM holds. The number of categories of the response variables was fixed to $k = 5$ . As explanatory variables, the authors used one binary variable and one continuous variable; the latter was drawn from a standard normal distribution. Scenarios in which these variables had an effect as well as scenarios in which no effects were present are considered. Also, the uncertainty effects vary across the simulation settings. In the cases in which these effects were present, the respective parameters were set to $ξ_{1}^{T} = (0, 0)$ or $ξ_{2}^{T} = (0.2, - 0.1)$ for the trait effects, and $α_{1}^{T} = (0, 0)$ or $α_{2}^{T} = (- 0.2, 0.1)$ for the uncertainty parameters. Furthermore, the covariance matrix of the random effects was fixed to either:

Σ_{1} = (\begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix}) or Σ_{2} = (\begin{matrix} 1 & 0.1 \\ 0.1 & 0.5 \end{matrix})

The item parameters $δ_{ir}$ were generated randomly using a standard normal distribution. To avoid pathological situations, within each item the item parameters were ordered increasing in size.
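A data generating process of this kind could be sketched as follows. The code draws ordered standard normal thresholds, correlated random effects $(θ_{p 0}, α_{p 0})$ from $N (0, Σ_{2})$ , one binary and one standard normal covariate, and then samples responses from the UPCM. Several details are assumptions made only for this sketch: the binary covariate is drawn with probability 0.5, the number of thresholds per item is a free argument (set to four here, matching the four thresholds shown in Figure 4), and all function names are hypothetical. It is not the authors' simulation code.

```python
import math
import random

def upcm_probs(theta, deltas, alpha):
    """Category probabilities of the UPCM for one person-item pair."""
    scale = math.exp(alpha)
    cum = [0.0]
    for d in deltas:
        cum.append(cum[-1] + scale * (theta - d))
    top = max(cum)
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]

def simulate_upcm(n, n_items, n_thresholds, xi, alpha, sigma, rng):
    """Simulate responses under the UPCM with two covariates and 2x2 covariance sigma."""
    # Thresholds: standard normal draws, ordered increasingly within each item.
    deltas = [sorted(rng.gauss(0.0, 1.0) for _ in range(n_thresholds))
              for _ in range(n_items)]
    # Cholesky factor of the 2x2 covariance matrix of (theta_p0, alpha_p0).
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[0][1] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    categories = range(n_thresholds + 1)
    responses = []
    for _ in range(n):
        # Binary covariate (assumed Bernoulli(0.5)) and standard normal covariate.
        x = [float(rng.random() < 0.5), rng.gauss(0.0, 1.0)]
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        theta_p = l11 * z1 + sum(a * b for a, b in zip(x, xi))
        alpha_p = l21 * z1 + l22 * z2 + sum(a * b for a, b in zip(x, alpha))
        row = [rng.choices(categories, weights=upcm_probs(theta_p, d, alpha_p))[0]
               for d in deltas]
        responses.append(row)
    return deltas, responses

rng = random.Random(2024)
xi_2 = (0.2, -0.1)                     # trait effects from the text
alpha_2 = (-0.2, 0.1)                  # uncertainty effects from the text
sigma_2 = [[1.0, 0.1], [0.1, 0.5]]     # covariance matrix Sigma_2 from the text
deltas, responses = simulate_upcm(200, 10, 4, xi_2, alpha_2, sigma_2, rng)
print(len(responses), len(responses[0]))
```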

Overall, nine different simulation settings were inspected; each setting was conducted with 100 replications. Figure 3 displays boxplots of the mean squared error (MSE) of the threshold parameters $δ_{ir}$ , separately for each simulation setting based on the UPCM as data generating process. It is seen that in the cases in which no uncertainty effects were present and no covariates included (first column), the UPCM performs slightly worse than the PCM. This was to be expected because the PCM is the data generating model and the UPCM fits superfluous parameters. If uncertainty parameters are not equal to zero (second and third columns), the UPCM dominates the PCM; the dominance is particularly distinct when explanatory variables are included (third column).

Figure 3.

Boxplots illustrating the MSE (on log-scale) for estimates of the threshold parameters $δ_{ir}$ , separately for nine different simulation settings and both for the regular PCM model and the UPCM model.

As an example, one specific simulation setting (UPCM with $n = 500$ , $I = 30$ , $ξ_{2}$ and $Σ_{2}$ , compare bottom right panel of Figure 3) is considered in more detail. Figure 4 compares the estimates of the threshold parameters of a regular PCM with the item parameter estimates obtained for the UPCM. For reasons of clarity, presentation is restricted to the first six items. The boxplots show the respective estimates together with the true values, separately for each item and separately for PCM and UPCM. True values are highlighted by (red) crosses. It can be seen that, in contrast to the UPCM, the regular PCM estimates are biased. In particular, the estimates for the first and the last thresholds appear to be even more strongly biased than the two middle thresholds.

Figure 4.

Boxplots for estimates of the four threshold parameters $δ_{ir}$ together with true values (red crosses) for Items 1 to 6. Estimates are displayed separately for all items and both for the regular PCM model and the UPCM model.

Figure 5 displays the estimates of the random effects covariance matrix $Σ_{2}$ . Again, the estimates can be compared with estimates from the regular PCM; however, obviously the PCM only provides estimates for the random effect of the trait. The PCM clearly underestimates the variance of the trait effects. When using the UPCM, the bias is weaker and the other parameters are estimated reasonably well. Figure 6 displays the estimates of all covariate effects, both for trait effects and uncertainty parameters. All effects are estimated rather well by the UPCM model.

Figure 5.

Boxplots for estimates of random effects covariance parameters from covariance $Σ$ together with true values (red crosses). PCM only entails a random effect for trait effects, the respective estimates are shown for comparison.

Figure 6.

Boxplots for estimates of covariate effects $ξ$ for trait effects and $α$ for uncertainty parameters together with true values (red crosses) and separately for both explanatory variables.

Figure 7 displays boxplots of the MSEs of the threshold parameters $δ_{ir}$ , separately for each simulation setting when the UGPCM is the data generating model. The results are very similar to the comparison of the PCM and the UPCM. If uncertainty (and covariate) effects are present, the UGPCM performs much better than the simple GPCM. However, when compared with Figure 3, it is seen that more outliers occur if the model contains item-specific slopes.

Figure 7.

Boxplots illustrating the MSE (on log-scale) for estimates of the threshold parameters $δ_{ir}$ , separately for nine different simulation settings and both for the regular GPCM model and the UGPCM model.

An Application

For illustration, data from the ALLBUS, the general survey of social science carried out by the German institute GESIS (http://www.gesis.org/allbus), are considered. The data contain the answers of 2,535 respondents from the questionnaire in 2012. In particular, the authors consider eight items that refer to the degree of confidence the participants have in public institutions and organizations. These institutions are the federal court, the Bundestag (parliament), the justice system, TV, press, government, police, and political parties. The items are measured on a scale from 1 (no confidence at all) to 7 (excessive confidence). As explanatory variables for the trait effects and for the uncertainty effects the following person characteristics were used: Age: age of participant in years; Gender: 0: male; 1: female; Income: Income of participant in Euros; WestEast: 1: East Germany/former GDR; 0: West Germany/former FRG.

To ensure that all covariate effects are comparable in their size, all variables were standardized. Both a simple PCM and the UPCM were fitted to the data. The variance of the random effect for the trait parameters in the PCM was estimated to be ${\hat{σ}}^{2} = 0.736$ ; when fitting the UPCM, the covariance matrix was estimated as:

\hat{Σ} = (\begin{matrix} 0.927 & 0.037 \\ 0.037 & 0.422 \end{matrix}) .

Although there seems to be no correlation between the two random effects, it seems that the random uncertainty effect with an estimate of ${\hat{σ}}_{α}^{2} = 0.422$ (standard error $0.025$ ) cannot be neglected. The estimate for the variance of the trait effects ${\hat{σ}}_{θ}^{2} = 0.927$ (standard error $0.042$ ) is slightly larger than the corresponding estimate from the PCM. In addition, the UPCM without covariates was also fitted. The obtained covariance,

\hat{Σ} = (\begin{matrix} 0.934 & 0.055 \\ 0.055 & 0.419 \end{matrix}),

is very similar to the covariance obtained for the UPCM with covariates.
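The size of the dependence between the two random effects is easier to judge on the correlation scale. The short sketch below converts the two estimated covariance matrices reported above into correlations. The helper name `correlation` is an assumption, and the computation is only a convenience for the reader, not part of the reported analysis.

```python
import math

def correlation(cov):
    """Correlation implied by a 2x2 covariance matrix."""
    return cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])

sigma_with_covariates = [[0.927, 0.037], [0.037, 0.422]]
sigma_without_covariates = [[0.934, 0.055], [0.055, 0.419]]

corr_with = correlation(sigma_with_covariates)
corr_without = correlation(sigma_without_covariates)
print(round(corr_with, 3), round(corr_without, 3))
```

Both implied correlations are below 0.1, which is in line with the remark that the random trait effect and the random uncertainty effect are nearly uncorrelated.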

Figure 8 displays the estimates of the item parameters of both the simple PCM and the proposed UPCM. It can be seen that in particular the estimates for the exterior thresholds differ between both models while the estimates for the inner thresholds are rather similar. For the version of the UPCM without covariates, the estimates of the item parameters are very similar to the estimates from the regular UPCM and therefore are not shown.

Figure 8.

Item parameter estimates for confidence data, separately for simple PCM and the proposed UPCM.

Table 1 collects the estimated effects of the explanatory variables on the trait and on uncertainty, together with the corresponding standard errors. With the exception of the gender and age effects on the trait (confidence), all effects turned out to be significant at the 0.05 level.

Table 1.

Parameter Estimates for Effects of Explanatory Variables (Together with Standard Errors), Both for Trait Effects $ξ$ and for Uncertainty Effects $α$ .

Variable	$ξ$ (Trait)	$α$ (Uncertainty)
Income	0.051 (0.004)	0.056 (0.020)
Gender	−0.003 (0.020)	0.044 (0.019)
Age	−0.035 (0.020)	−0.055 (0.018)
WestEast	−0.155 (0.021)	−0.039 (0.019)
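The significance statements can be reproduced from Table 1 alone. The sketch below assumes two-sided Wald tests based on a normal approximation; the text does not state which test was used, so this is an assumption, and the names `TABLE_1` and `wald_test` are introduced here only for illustration.

```python
import math

# (estimate, standard error) pairs copied from Table 1.
TABLE_1 = {
    "Income":   {"xi": (0.051, 0.004),  "alpha": (0.056, 0.020)},
    "Gender":   {"xi": (-0.003, 0.020), "alpha": (0.044, 0.019)},
    "Age":      {"xi": (-0.035, 0.020), "alpha": (-0.055, 0.018)},
    "WestEast": {"xi": (-0.155, 0.021), "alpha": (-0.039, 0.019)},
}

def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) written via the complementary error function
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

not_significant = set()
for variable, effects in TABLE_1.items():
    for effect_name, (estimate, std_error) in effects.items():
        z, p_value = wald_test(estimate, std_error)
        if p_value >= 0.05:
            not_significant.add((variable, effect_name))
        print(f"{variable:9s} {effect_name:5s} z = {z:7.2f}  p = {p_value:.4f}")
```

Under this assumption, only the trait effects of Gender and Age fail to reach the 0.05 level, which agrees with the description of Table 1.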

For the interpretation of the effects, we propose a visualization tool that is particularly helpful when many explanatory variables are available. As motivation, consider again the UPCM, which can be written as:

\log (\frac{P (Y_{pi} = r)}{P (Y_{pi} = r - 1)}) = e^{α_{p 0} + x_{p}^{T} α} (θ_{p 0} + x_{p}^{T} ξ - δ_{ir}) .

From this representation, it is seen that the person and item parameters determine the log-odds of observing category $r$ rather than category $r - 1$ . One obtains the following:

a multiplicative effect: the factor $e^{α_{p 0} + x_{p}^{T} α}$ is multiplied by $e^{α_{j}}$ if the $j$th variable increases by one unit, and

a location effect that shifts the second part of the predictor by $ξ_{j}$ if the $j$th variable increases by one unit.
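To make the two kinds of effects concrete, the adjacent-category representation above can be turned into category probabilities. The following is a minimal sketch for a single person and item, assuming categories $0, \dots, k$ and taking `theta` and `alpha` to be the complete linear predictors $θ_{p 0} + x_{p}^{T} ξ$ and $α_{p 0} + x_{p}^{T} α$; the function name `upcm_probs` and the example values are hypothetical and not taken from the authors' software.

```python
import math

def upcm_probs(theta, alpha, deltas):
    """Category probabilities P(Y = 0), ..., P(Y = k) for one person and item.

    theta  : trait predictor (theta_p0 + x_p' xi)
    alpha  : uncertainty predictor (alpha_p0 + x_p' alpha)
    deltas : item thresholds delta_i1, ..., delta_ik
    """
    scale = math.exp(alpha)
    # Cumulative sums of the scaled adjacent-category logits;
    # category 0 is the reference with score 0.
    scores = [0.0]
    for delta in deltas:
        scores.append(scores[-1] + scale * (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical thresholds and person parameters, for illustration only.
example_deltas = [-1.0, 0.0, 1.2]
distinct = upcm_probs(theta=0.3, alpha=0.5, deltas=example_deltas)
uncertain = upcm_probs(theta=0.3, alpha=-20.0, deltas=example_deltas)

print([round(p, 3) for p in distinct])
print([round(p, 3) for p in uncertain])
```

With a strongly negative uncertainty predictor the factor $e^{α}$ is close to zero, all adjacent-category logits vanish, and the response distribution becomes approximately uniform, whereas larger values sharpen the distribution around the categories favored by the trait.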

For each variable, we plot the effect point $(ξ_{j}, e^{α_{j}})$ together with 95% confidence intervals in both directions, which yields stars (Figure 9). The no-effects reference point is $(0, e^{0}) = (0, 1)$ . The abscissa represents the effect on the trait: values on the right (larger than zero) indicate that the trait increases with increasing variable values, and values on the left (below zero) indicate that the trait decreases when the variable increases. It is seen that higher income increases confidence in public institutions, and that people living in the former East (WestEast = 1) tend to have reduced confidence. Age also tends to reduce confidence, although the effect is not significant at the 0.05 level because the star crosses the zero line $ξ_{j} = 0$ . The ordinate represents the uncertainty or random behavior: large values (above 1) indicate distinctness of the response, and small values (below 1) indicate indecision. Income increases distinctness, and females also tend toward a more distinct response. Increasing age reduces the distinctness of the response, and people from the former East (WestEast = 1) also show higher uncertainty. In summary, only income and WestEast have significant effects on the general trait level, while all variables show significant effects with respect to uncertainty.

Figure 9.

(Exponential) effects of explanatory variables in ALLBUS data together with confidence intervals both for trait effects $ξ$ and uncertainty effects $α$ .
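The coordinates of such stars can be computed directly from the estimates and standard errors in Table 1. The sketch below assumes Wald-type 95% intervals and obtains the vertical interval by exponentiating the interval for $α_{j}$; the text does not specify how the intervals in Figure 9 were constructed, so both choices are assumptions, and `effect_star` is a hypothetical helper name.

```python
import math

Z_95 = 1.959964  # 0.975 quantile of the standard normal distribution

def effect_star(xi, se_xi, alpha, se_alpha, z=Z_95):
    """Center and interval end points of one star (assumed Wald intervals)."""
    return {
        "center": (xi, math.exp(alpha)),
        # Horizontal arm: interval for the trait effect xi_j.
        "horizontal": (xi - z * se_xi, xi + z * se_xi),
        # Vertical arm: interval for alpha_j, mapped to the exp scale.
        "vertical": (math.exp(alpha - z * se_alpha),
                     math.exp(alpha + z * se_alpha)),
    }

# Estimates and standard errors from Table 1: (xi, se_xi, alpha, se_alpha).
estimates = {
    "Income":   (0.051, 0.004, 0.056, 0.020),
    "Gender":   (-0.003, 0.020, 0.044, 0.019),
    "Age":      (-0.035, 0.020, -0.055, 0.018),
    "WestEast": (-0.155, 0.021, -0.039, 0.019),
}

stars = {name: effect_star(*values) for name, values in estimates.items()}
for name, star in stars.items():
    print(name, star)
```

A star whose horizontal arm covers 0 corresponds to a non-significant trait effect, and a star whose vertical arm covers 1 corresponds to a non-significant uncertainty effect.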

Alternative Item Response Models

The PCM is an extension of the binary Rasch model, but not the only one. Samejima’s graded response model (Samejima, 1997) and the sequential model (Tutz, 1989; Verhelst et al., 1997) are also extensions of the binary model that contain the Rasch model as a special case. In the same way as the PCM, these models can be extended to contain an additional subject-specific uncertainty component. The extension is straightforward in the sequential model, which assumes that items are solved stepwise, with every step specified as a dichotomous IRT model. The graded response model, which works well in personality questionnaires and attitude scales, can also be derived from an underlying latent trait. It has the form $P (Y_{pi} \geq r) = F (θ_{p} - δ_{ir})$ , $r = 1, \dots, k$ , where $F (\cdot)$ again is a cumulative distribution function. The extended version assumes $P (Y_{pi} \geq r) = F (e^{α_{p}} (θ_{p} - δ_{ir}))$ , $r = 1, \dots, k$ , with $e^{α_{p}}$ representing the subject-specific factor. However, some caution is warranted when interpreting the subject-specific term, because it differs from the corresponding term in the PCM. The way the subject-specific term modifies the response probabilities is seen best when looking at the extreme cases. One obtains the following properties:

For $α_{p} = 0$ , one obtains the traditional graded response model.

For $α_{p} \to \infty$ , one obtains for a person with $θ_{p} \in (δ_{ir}, δ_{i, r + 1})$ the probability $P (Y_{pi} = r) = 1$ , that is, the person knows exactly what he or she wants.

For $α_{p} \to - \infty$ , one obtains $P (Y_{pi} = 0) = P (Y_{pi} = k) = 0.5$ .

In particular, the last case ( $α_{p} \to - \infty$ ) shows that the subject-specific term has a different meaning in the graded response model. Persons with $α_{p} \to - \infty$ choose one of the extreme categories, which means they show what is called an extreme response style (ERS). Thus, when going from $α_{p} = - \infty$ to $α_{p} = \infty$ , one covers the continuum between an extreme response style and a distinct response. For the PCM with a subject-specific term, by contrast, one covers the continuum between a uniform distribution, which means uncertainty, and a distinct response.

The difference in interpretation is caused by the specific property of the PCM that modification of the local responses (given $Y \in \{r - 1, r\}$ ) automatically modifies all the other response probabilities. The extended graded response model is of interest in itself but refers to a different response style and is not investigated further here. The graded response model with a subject-specific factor was considered previously by Ferrando (2009); for alternative models, see also Ferrando (2014).
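The three limiting cases of the extended graded response model can be verified numerically. The following sketch assumes a logistic distribution function for $F$ (the text only requires a cumulative distribution function) and ordered thresholds; `extended_grm_probs` and the example values are hypothetical and serve illustration only.

```python
import math

def logistic(x):
    """Logistic distribution function, evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def extended_grm_probs(theta, alpha, deltas):
    """P(Y = 0), ..., P(Y = k) under P(Y >= r) = F(exp(alpha) (theta - delta_r)).

    deltas must be sorted in increasing order.
    """
    scale = math.exp(alpha)
    # Cumulative probabilities P(Y >= 0) = 1, P(Y >= 1), ..., P(Y >= k), and 0.
    cumulative = [1.0] + [logistic(scale * (theta - d)) for d in deltas] + [0.0]
    return [cumulative[r] - cumulative[r + 1] for r in range(len(deltas) + 1)]

# Hypothetical thresholds; theta lies between the second and third threshold.
grm_deltas = [-1.5, -0.5, 0.5, 1.5]
grm_theta = 0.1

standard = extended_grm_probs(grm_theta, 0.0, grm_deltas)      # alpha = 0
distinct_grm = extended_grm_probs(grm_theta, 20.0, grm_deltas)  # alpha large
extreme = extended_grm_probs(grm_theta, -20.0, grm_deltas)      # alpha very negative

for label, probs in [("alpha = 0", standard),
                     ("alpha = 20", distinct_grm),
                     ("alpha = -20", extreme)]:
    print(label, [round(p, 3) for p in probs])
```

For a large positive $α_{p}$ nearly all mass falls on the category whose thresholds enclose $θ_{p}$, whereas for a strongly negative $α_{p}$ the mass splits almost evenly between categories $0$ and $k$, which is the extreme response style pattern described above.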

Concluding Remarks

The extended UPCM that is proposed adds a subject-specific uncertainty component to the traditional PCM. It can in particular be used to investigate if uncertainty is determined by person characteristics. Ignoring the uncertainty component can yield biased estimates. Subject-specific uncertainty is not a response style in the traditional sense, but can be seen as a response style in a wider sense, representing a consistent pattern of response behavior.

The proposed models (both UPCM and UGPCM) are implemented in the statistical software R (R Core Team, 2019). The implementation is available from the authors and will be available from CRAN soon. Further details on the estimation procedure can be found in the online appendix.

Supplemental Material

Supplemental material for Uncertainty in Latent Trait Models by Gerhard Tutz and Gunther Schauberger in Applied Psychological Measurement is available online.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Allison, P. D. (1999). Comparing logit and probit coefficients across groups. Sociological Methods & Research, 28(2), 186–208.

Baumgartner & Steenkamp, J.-B. E. (2001). Response styles in marketing research: A cross-national investigation. Journal of Marketing Research, 38(2), 143–156.

Böckenholt (2017). Measuring response styles in Likert items. Psychological Methods, 22(1), 69–83.

Böckenholt & Meiser (2017). Response style analysis with threshold and multi-process IRT models: A review and tutorial. British Journal of Mathematical and Statistical Psychology, 70(1), 159–181.

Bolt, D. M., & Johnson, T. R. (2009). Addressing score bias and differential item functioning due to individual differences in response style. Applied Psychological Measurement, 33(5), 335–352.

Bolt, D. M., & Newton, J. R. (2011). Multiscale measurement of extreme response style. Educational and Psychological Measurement, 71(5), 814–833.

Breen, Holm, & Karlson, K. B. (2014). Correlations and nonlinear probability models. Sociological Methods & Research, 43(4), 571–605.

Conijn, J. M., Emons, W. H., van Assen, M. A., & Sijtsma (2011). On the usefulness of a multilevel logistic regression approach to person-fit analysis. Multivariate Behavioral Research, 46(2), 365–388.

Eid & Rauber (2000). Detecting measurement invariance in organizational surveys. European Journal of Psychological Assessment, 16(1), 20–30.

Falk, C. F., & Cai (2016). A flexible full-information approach to the modeling of response styles. Psychological Methods, 21(3), 328–347.

Ferrando, P. J. (2009). A graded response model for measuring person reliability. British Journal of Mathematical and Statistical Psychology, 62(3), 641–662.

Ferrando, P. J. (2014). A factor-analytic model for assessing individual differences in response scale usage. Multivariate Behavioral Research, 49(4), 390–405.

Ferrando, P. J. (2016). An IRT modeling approach for assessing item and person discrimination in binary personality responses. Applied Psychological Measurement, 40(3), 218–232.

Gollwitzer, Eid, & Jürgensen (2005). Response styles in the assessment of anger expression. Psychological Assessment, 17(1), 56–69.

Gottard, Iannario, & Piccolo (2016). Varying uncertainty in CUB. Advances in Data Analysis and Classification, 10(2), 225–244.

Iannario & Piccolo (2016). A comprehensive framework of regression models for ordinal data. Metron, 74(2), 233–252.

Jin, K.-Y., & Wang, W.-C. (2014). Generalized IRT models for extreme response style. Educational and Psychological Measurement, 74(1), 116–138.

Johnson, T. R. (2003). On the use of heterogeneous thresholds ordinal regression models to account for individual differences in response style. Psychometrika, 68(4), 563–583.

Johnson, T. R., & Bolt, D. M. (2010). On the use of factor-analytic multinomial logit item response models to account for individual differences in response style. Journal of Educational and Behavioral Statistics, 35(1), 92–114.

Karlson, K. B., Holm, & Breen (2012). Comparing regression coefficients between same-sample nested models using logit and probit: A new method. Sociological Methodology, 42(1), 286–313.

Maij-de Meij, A. M., Kelderman, & van der Flier (2008). Fitting a mixture item response theory model to personality questionnaire data: Characterizing latent classes and investigating possibilities for improving prediction. Applied Psychological Measurement, 32(8), 611–631.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174.

Masters, G. N., & Wright (1984). The essential process in a family of measurement models. Psychometrika, 49(4), 529–544.

Mood (2010). Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European Sociological Review, 26(1), 67–82.

Moors (2010). Ranking the ratings: A latent-class regression model to control for overall agreement in opinion research. International Journal of Public Opinion Research, 22(1), 93–119.

Muraki (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176.

Muraki (1997). A generalized partial credit model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 153–164). Springer.

Piccolo (2003). On the moments of a mixture of uniform and shifted binomial random variables. Quaderni di Statistica, 5, 85–104.

Piccolo & Simone (2019). The class of CUB models: Statistical foundations, inferential issues and empirical evidence. Statistical Methods & Applications, 28(3), 389–435.

Plieninger (2016). Mountain or molehill? A simulation study on the impact of response styles. Educational and Psychological Measurement, 77(1), 32–53.

R Core Team. (2019). R: A language and environment for statistical computing [Computer software manual].

Reise, S. P. (2000). Using multilevel logistic regression to evaluate person-fit in IRT models. Multivariate Behavioral Research, 35(4), 543–568.

Rost, Carstensen, & von Davier (1996). Applying the mixed Rasch model to personality questionnaires. In Rost & Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences (pp. 324–332). Waxmann.

Samejima (1997). Graded response model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 85–100). Springer.

Simone & Tutz (2018). Modelling uncertainty and response styles in ordinal data. Statistica Neerlandica, 72(3), 224–245.

Tutz (1989). Sequential item response models with an ordered response. British Journal of Mathematical and Statistical Psychology, 43(1), 39–55.

Tutz (2018). Binary response models with underlying heterogeneity: Identification and interpretation of effects. European Sociological Review, 34(2), 211–221.

Tutz, Schauberger, & Berger (2018). Response styles in the partial credit model. Applied Psychological Measurement, 42(6), 407–427.

Tutz, Schneider, Iannario, & Piccolo (2017). Mixture models for ordinal responses to account for uncertainty of choice. Advances in Data Analysis and Classification, 11(2), 281–305.

Van Rosmalen, Van Herk, & Groenen (2010). Identifying response styles: A latent-class bilinear multinomial logit model. Journal of Marketing Research, 47(1), 157–172.

Van Vaerenbergh & Thomas, T. D. (2013). Response styles in survey research: A literature review of antecedents, consequences, and remedies. International Journal of Public Opinion Research, 25(2), 195–217.

Verhelst, N. D., Glas, C. A. W., & de Vries, H. H. (1997). A steps model to analyze partial credit. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 123–138). Springer.

von Davier & Rost (2016). Logistic mixture-distribution response models. In W. J. van der Linden (Ed.), Handbook of item response theory: Volume 1 (pp. 421–434). Chapman & Hall/CRC.

von Davier & Yamamoto (2007). Mixture-distribution and hybrid Rasch models. In von Davier & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models (pp. 99–115). Springer.

Wetzel & Carstensen, C. H. (2017). Multidimensional modeling of traits and response styles. European Journal of Psychological Assessment, 33, 352–364.

Williams (2009). Using heterogeneous choice models to compare logit and probit coefficients across groups. Sociological Methods & Research, 37(4), 531–559.
