Explaining Variability in Response Style Traits: A Covariate-Adjusted IRTree

Abstract

Contamination of responses due to extreme and midpoint response styles can confound the interpretation of scores, threatening the validity of inferences made from survey responses. This study incorporated person-level covariates in the multidimensional item response tree model to explain heterogeneity in response style. We include an empirical example and two simulation studies to support the use and interpretation of the model: parameter recovery using Markov chain Monte Carlo (MCMC) estimation and performance of the model under conditions with and without response styles present. Mean bias and root mean square error of the item intercepts were small at all sample sizes. Mean bias and root mean square error of the item discriminations were also small but tended to be smaller when covariates were unrelated to, or had a weak relationship with, the latent traits. Item and regression parameters are estimated with sufficient accuracy when sample sizes are greater than approximately 1,000 and MCMC estimation with the Gibbs sampler is used. The empirical example uses the sexual knowledge scale from the National Longitudinal Study of Adolescent to Adult Health. Meaningful predictors associated with high levels of the extreme response latent trait included being non-White, being male, and having high levels of parental support and relationships. Meaningful predictors associated with high levels of the midpoint response latent trait included having low levels of parental support and relationships. Item-level covariates indicate the response style pseudo-items were less easy to endorse for self-oriented items, whereas the trait of interest pseudo-items were easier to endorse for self-oriented items.

Keywords

response style, multidimensionality, item response tree, explanatory item response theory

Survey research methodology is a multidisciplinary field that integrates principles and practices from statistics, sampling theory, psychology, computer science, and many other fields. One important aspect of evaluating the validity of inferences made from surveys is an understanding of individuals’ cognitive response processes. Early work (e.g., Strack & Martin, 1987; Tourangeau, 1984; Tourangeau & Rasinski, 1988) laid out four general stages in the cognitive response process for answering survey questions: (1) interpreting the question, (2) retrieving relevant inputs or a prior judgment from memory, (3) integrating information to form a judgment, and (4) reporting a response. Survey analysis typically assumes that one substantive trait of interest (TOI), the focus of traditional total-score or latent trait methodologies, drives responses at each of these stages. However, responses to ordinal survey items may be driven by factors beyond the latent TOI, which introduces construct-irrelevant variance and limits the validity of inferences drawn from item and scale scores.

Research into the sources of construct-irrelevant variance has seen growth in the area of individuals’ response style, which is the tendency of a respondent to use a response scale in a systematic way, regardless of the content of the items and survey (Paulhus, 1991; Plieninger & Meiser, 2014). Baumgartner and Steenkamp (2001) recognized and summarized seven important response styles. Of these, extreme response style (ERS; Cronbach, 1946) has attracted the most attention in the response style literature (Jin & Wang, 2014). ERS denotes a systematic tendency to endorse the outermost, or extreme, options on either end of the scale. For example, on a 5-point ordinal scale ranging from 1 = strongly disagree through 5 = strongly agree, Options 1 and 5 would be considered extreme. Options 2 = disagree, 3 = neither agree nor disagree, and 4 = agree are nonextreme. An individual who consistently chooses an extreme option, regardless of whether they are high or low on the substantive trait, would have high levels of ERS. In contrast, midpoint response style (MRS) is the systematic tendency to use the midpoint response category. The middle category of a 5-point scale is 3 = neither agree nor disagree, and the consistent selection of this midpoint would indicate high levels of MRS.

All response styles, such as ERS and MRS, are important considerations because the presence of response style can confound interpretation of scores (Bolt & Johnson, 2009; Jin & Wang, 2014) by introducing bias in the total score with regard to the primary substantive TOI (Leventhal & Stone, 2018). Specific potential consequences of ignored ERS and MRS include score inflation, reordering individuals along the scale, spurious correlations between constructs, distorted factor structures, and differential item functioning not related to content of the item (Thissen-Roe & Thissen, 2013). Therefore, the consequences of unaccounted-for response styles may mask, or over-state, true relationships among measures and affect the substantive conclusions in survey analysis.

Few studies have documented the empirical effects of failing to incorporate response style. van Vlimmeren et al. (2017) illustrate the importance of accounting for response style, showing that latent TOI levels differed substantially before and after controlling for response style. In another study, Danner et al. (2015) found that failing to account for response style systematically affects the variance of personality items and biases the association with criterion variables. A third study found the presence of acquiescence response style can distort the intended factorial structure of a questionnaire by introducing bias to the item variances and covariances (Rammstedt et al., 2010). Jin and Wang (2014) reported that ignoring ERS by fitting standard item response theory (IRT) models resulted in biased parameter estimates.

Another study (Adams et al., 2019) made use of a multidimensional nominal response model (MNRM; Bolt & Johnson, 2009) to examine response styles at the level of the individual respondent. They found considerable heterogeneity of response styles in an empirical example, cautioning that unexpected (i.e., neither identified nor addressed) response styles can confound measurement of the identified response style(s) and the TOI. Furthermore, accounting for response styles can influence the precision with which the TOI is estimated (Adams et al., 2019). Collectively, the impact of response styles on measurement of the TOI is clear. What is lacking is an understanding of the correlates and predictors of response styles, at the item and person levels, and of whether such an understanding can help mitigate the effects of response style contamination.

Measurement of Response Style

Jackson and Messick (1958) stressed the need to develop and evaluate methods that account for response styles. Multiple approaches exist to measure and assess response styles (see Van Vaerenbergh & Thomas, 2013). Classical methods count how often a participant selects a particular response category, such as the extreme categories. The primary limitation of this approach is that it confounds the measurement of response styles and the trait(s) of interest. For example, a person providing many high scores on a set of items could be an extreme responder but could also be very high on the TOI. The classical approach cannot disentangle response style and the substantive trait; as such, there is no direct way of correcting the TOI measure for response style bias in the classical approach (Bolt et al., 2014). Multidimensional item response theory (MIRT) models simultaneously measure the TOI and response styles, such as ERS and MRS, to address these shortcomings. MIRT approaches to response style can be categorized as either threshold-based approaches or response–process approaches (Böckenholt & Meiser, 2017).

Threshold-based approaches to ordinal items assume that individuals with higher levels of the underlying TOI latent trait(s) will endorse higher response categories across survey items. The threshold-based approach to modeling response styles captures individual differences in the use of response categories by varying the thresholds for each individual (Böckenholt & Meiser, 2017). The threshold approach has been extended to multidimensional models. For example, the MNRM (Bolt & Johnson, 2009) builds on a unidimensional threshold-based approach to account for response styles. Based on the nominal response model, the MNRM includes a second latent trait specified as a response style (Bolt & Newton, 2011) and can simultaneously accommodate multiple response styles.

Item Response Trees for Response Style Measurement

Alternatively, item response trees (IRTrees) are a form of noncompensatory, multidimensional IRT model. They are a flexible approach that explicitly models a multiple-stage decision-making process (Böckenholt, 2012). For a five-category ordinal item, one hypothesized decision-making process is that an individual responds using three distinct decisions, as illustrated in Figure 1. Each decision is driven by a separate trait ( $θ_{MRS}$ , $θ_{TOI}$ , and $θ_{ERS}$ for MRS, TOI, and ERS, respectively; Böckenholt, 2012; De Boeck & Partchev, 2012). The IRTree model easily adapts to fit different conceptualizations and different numbers of categories. For the three-decision IRTree (see Figure 1), respondent i first enters the MRS stage of item j and endorses the middle category, neither agree nor disagree, with probability $m_{ij}$ . If the respondent does not endorse the midpoint category (with probability $1 - m_{ij}$ ), the TOI is endorsed with probability $t_{ij}$ (or not endorsed with probability $1 - t_{ij}$ ) in the second stage. In the third stage, the ERS stage, the extreme response is endorsed with probability $e_{ij}$ , or not endorsed with probability $1 - e_{ij}$ (Böckenholt, 2012; Plieninger & Heck, 2018). Each oval in Figure 1 represents a decision stage, driven by a separate latent trait. While there are two ovals, or nodes, associated with the ERS trait in Figure 1, only one ERS trait is assumed to drive the decision stage. That is, there are not separate ERS traits for low TOI responders and high TOI responders. One example of separate ERS traits is provided in Park and Wu (2019); the decision of whether to have a single ERS trait or multiple ERS traits is up to the researcher, based on the hypothesized response process.

Figure 1.

Item response tree (IRTree) model accounting for midpoint response style (MRS), the trait of interest (TOI), and extreme response style (ERS) for a 5-point ordinal item. Observed responses are denoted by rectangles and decision stages by ovals. In each decision stage, the latent trait underlying the decision is denoted by the corresponding theta symbol.

The probability of each Likert-type item response category can be specified by tracing the path in Figure 1. For example, the probability of person i responding to item j in the lowest category, strongly disagree ( $Y_{ij} = 1$ ), is given by:

\Pr (Y_{ij} = 1 | θ_{MRS}, θ_{TOI}, θ_{ERS}, ω_{j}) = (1 - m_{ij}) * (1 - t_{ij}) * (e_{ij}),

(1)

where $ω_{j}$ represents the item parameters. In the three-decision model applied to a 5-point Likert-type item, such as the one in Figure 1, each decision is dichotomous (i.e., endorse or not endorse). The original ordinal-scale data can be recoded into three dichotomous pseudo-items, $Y_{ijk}^{*}$ , for the kth decision stage (k = MRS, TOI, ERS). This pseudo-item approach is also helpful for visualizing the model (see Table 1). Returning to the lowest category strongly disagree example, an individual would have a pseudo-item response profile of [001] to represent a decision process of failing to endorse the midpoint and TOI pseudo-items but endorsing the extreme response pseudo-item, for item j. Another example is the second-highest category, agree, for which an individual would have a response profile of [010] to represent a decision process of failing to endorse the midpoint pseudo-item, endorsing the TOI pseudo-item, and failing to endorse the extreme response pseudo-item. A two-parameter logistic (2PL) IRT model models each decision branch (i.e., dichotomous pseudo-item) of the tree. Thus, for a survey with 16 ordinal items, there are 48 dichotomous pseudo-items.

Table 1.

Pseudo-Item Coding Matrix.

Original scale (numeric response)	$Y_{MRS}^{*}$	$Y_{TOI}^{*}$	$Y_{ERS}^{*}$
Strongly agree (5)	0	1	1
Agree (4)	0	1	0
Neither agree nor disagree (3)	1	–	–
Disagree (2)	0	0	0
Strongly disagree (1)	0	0	1

Note. $Y_{k}^{*}$ is the pseudo-item for the kth decision stage, where k = MRS, ERS, or TOI. MRS = midpoint response style; TOI = trait of interest; ERS = extreme response style.
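The recoding in Table 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the names `PSEUDO_ITEM_CODES`, `recode_response`, and `recode_person` are hypothetical, and `None` is an assumed stand-in for the structurally missing TOI and ERS pseudo-items that follow a midpoint response.

```python
# Pseudo-item coding from Table 1, as (MRS, TOI, ERS) triples.
# None marks a pseudo-item that is missing by design (assumed representation).
PSEUDO_ITEM_CODES = {
    5: (0, 1, 1),        # strongly agree
    4: (0, 1, 0),        # agree
    3: (1, None, None),  # neither agree nor disagree
    2: (0, 0, 0),        # disagree
    1: (0, 0, 1),        # strongly disagree
}


def recode_response(response):
    """Return the (MRS, TOI, ERS) pseudo-item triple for one 1-5 response."""
    if response not in PSEUDO_ITEM_CODES:
        raise ValueError(f"expected a response from 1 to 5, got {response!r}")
    return PSEUDO_ITEM_CODES[response]


def recode_person(responses):
    """Recode one respondent's item responses into a flat list of pseudo-items."""
    flat = []
    for response in responses:
        flat.extend(recode_response(response))
    return flat
```

Applied to a respondent with 16 ordinal responses, `recode_person` returns 48 entries, matching the count of pseudo-items given in the text.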

Utilizing the unidimensional 2PL, a generalized IRTree model (Jeon & De Boeck, 2016), for person i endorsing the pseudo-item j in the kth decision stage, is given by:

\Pr (Y_{ijk}^{*} = 1 | θ_{ik}) = g^{- 1} (a_{jk} θ_{ik} + b_{jk}),

(2)

where $g^{-1}$ is the inverse of the logit link function $g$ (i.e., the logistic function). Equation 2 explicitly models each of the decision stages k (i.e., the dichotomous pseudo-items). The person parameter, $θ_{ik}$ , represents person i’s latent trait at decision stage k. For example, person i’s latent propensity for ERS would be denoted as $θ_{iERS}$ . The item parameter $b_{jk}$ represents the jth item easiness, or intercept, at decision stage k. For example, $b_{jERS}$ represents item j’s easiness in eliciting an extreme response. Large, positive values indicate the item elicits more extreme responses and negative values indicate the item elicits fewer extreme responses. The slope parameter, $a_{jk}$ , can be interpreted as how well the item discriminates between individuals who are low on the latent trait in decision stage k and those who are high on the latent trait in decision stage k (de Jong et al., 2008). For example, the parameter $a_{jERS}$ represents how well the item discriminates between individuals who are low in ERS and those who are high in ERS. This model is unconditional because there are no person or item covariates included. However, the response in each decision stage is conditional on the responses to the pseudo-items in previous decision stages. That is, a respondent cannot endorse both the midpoint pseudo-item and the TOI pseudo-item.
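To make the link between Equations 1 and 2 concrete, the following sketch traces the tree in Figure 1 to obtain all five category probabilities from the three node probabilities. It assumes a logistic inverse link for each node; the function names and the parameter values are hypothetical and chosen only for illustration, not taken from the study.

```python
import math


def inverse_logit(x):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def node_probability(a, theta, b):
    """Equation 2: probability of endorsing one pseudo-item (slope-intercept form)."""
    return inverse_logit(a * theta + b)


def category_probabilities(theta, a, b):
    """Trace the tree to get probabilities for response categories 1-5.

    theta, a, and b are dicts keyed by "MRS", "TOI", and "ERS".
    """
    m = node_probability(a["MRS"], theta["MRS"], b["MRS"])
    t = node_probability(a["TOI"], theta["TOI"], b["TOI"])
    e = node_probability(a["ERS"], theta["ERS"], b["ERS"])
    return {
        1: (1 - m) * (1 - t) * e,        # strongly disagree (Equation 1)
        2: (1 - m) * (1 - t) * (1 - e),  # disagree
        3: m,                            # neither agree nor disagree
        4: (1 - m) * t * (1 - e),        # agree
        5: (1 - m) * t * e,              # strongly agree
    }


# Illustrative values only (assumed, not estimates from the study).
theta = {"MRS": 0.0, "TOI": 0.5, "ERS": -0.5}
a = {"MRS": 1.0, "TOI": 1.2, "ERS": 0.8}
b = {"MRS": -1.0, "TOI": 0.3, "ERS": 0.0}
probs = category_probabilities(theta, a, b)
```

Because every path through the tree ends in exactly one category, the five probabilities sum to one for any choice of parameters.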

Heterogeneity in Response Styles

One potential advantage of flexible IRTree models is that they can simultaneously estimate relations among response style and predictors, which allows for the use of covariates to explain heterogeneity in response styles. Previous research found response styles differ across geographic regions, ethnicities, and nationalities (de Jong et al., 2008; Thissen-Roe & Thissen, 2013) and may be attributable to differences in language, education level, interpretation of response scale anchors, and cultural customs (Harzing, 2006). For instance, de Jong et al. (2008) used a tree-like IRT model and found ERS is positively related to national-cultural individualism and masculinity. Individual respondent characteristics have also been examined for their relationship to response styles, but the results have been inconsistent. In one example, women tended to have higher levels of extreme response tendencies than men (e.g., de Jong et al., 2008), but Peterson et al. (2014) found males tended to have higher extreme response tendencies than females. Other positive correlates with the propensity for extreme responding include both younger and older individuals (de Jong et al., 2008), taking a survey face-to-face rather than web based (Liu et al., 2016), responding to survey items quickly, and simplistic thinking tendencies (Naemi et al., 2009). In the United States, studies have shown that African Americans and Hispanics tend to give more extreme responses than Caucasians (e.g., Bachman & O’Malley, 1984). Jin and Wang (2014) found that model fit improved with the addition of person covariates explaining response styles of a generalized IRT model and that number of siblings was a significant predictor of ERS.

There has been less research into the correlates with MRS. In two studies, respondents showed a higher propensity toward choosing the midpoint response options when the survey was delivered in their second language (Harzing, 2006; Gibbons et al., 1999) and a higher propensity toward extreme responses when the survey was delivered in the respondents’ native language. In addition, paper-and-pencil and web-based surveys tended to elicit more midpoint responses when compared to telephone surveys (Harzing, 2006). East Asian respondents have been shown to give more midpoint responses than North American respondents (Culpepper & Lowery, 2002).

Fewer studies have explored item-level characteristics that may explain MRS and ERS. Park and Wu (2019) found that individuals tended to avoid using the lower extreme category regardless of whether the item was positively or negatively worded. Bandalos et al. (2019) examined whether item labeling, with complete or endpoint-only categorical labels, affected extreme and midpoint responses. They found that items with endpoint-only labels tended to elicit more extreme responses as well as more midpoint responses. Böckenholt (2019) reported improved model fit with the addition of an item-level covariate.

Covariate IRTrees for Response Style Heterogeneity

The research settings and populations differ across these studies, as do the methodologies used to measure response styles. Many of the studies that explored response style heterogeneity relied on the classical approach, performing a post hoc comparison of response style levels rather than incorporating predictors directly into the model. While both the threshold-based and tree-like approaches can theoretically incorporate covariates, only a few studies have attempted to explain response style heterogeneity in a generalized IRT framework (e.g., Jin & Wang, 2014), which includes IRTrees. Jin and Wang’s (2014) approach is limited because their model explores only one response style at a time (ERS) with only person covariates. In the IRTree framework, a few studies incorporated item-level covariates (e.g., Bandalos et al., 2019; Park & Wu, 2019), but not item and person covariates in the same model. There is a need to evaluate IRTree model performance with both person and item covariates for multiple response style traits.

To expand on previous work, the response process model in Equation 2 extends to incorporate item- and person-level covariates. The covariate IRTree (Jeon & De Boeck, 2016), conceptually an extension of the linear logistic test model (Fischer, 1973), is

\begin{matrix} g (\Pr (Y_{ijk}^{*} = 1 | θ_{ik})) = \sum_{q = 1}^{Q} a_{jkq} X_{jkq} θ_{ik} + \sum_{l = 1}^{L} b_{jkl} W_{jkl}, \\ θ_{ik} = \sum_{s = 1}^{S} γ_{ks} Z_{iks} + {θ'}_{ik}, \end{matrix}

(3)

where $a_{jkq}$ is the regression coefficient for the qth covariate $X_{jkq}$ ( $q = 1, \dots, Q$ ) that explains the discrimination parameters in decision stage k; $b_{jkl}$ is the regression coefficient for the lth covariate $W_{jkl}$ ( $l = 1, \dots, L$ ; $L \geq J$ ) that explains the easiness, or intercept, parameters in decision stage k; and $γ_{ks}$ is the regression coefficient for the sth covariate $Z_{iks}$ ( $s = 1, \dots, S$ ) that explains the latent trait at decision stage k. As specified in the covariate IRTree in Equation 3, the effects of the item-level covariates are not invariant across dimensions (i.e., across MRS, TOI, and ERS), but the covariate values themselves are. For example, if the item-level covariate is a dichotomous indicator of positive versus negative wording, the covariate is always 1 for item 1 if item 1 is a positively worded item. However, $b_{1kl}$ varies by decision stage because the pseudo-item varies by decision stage. Both $X_{jkq}$ and $W_{jkl}$ contain item indicators such that $X_{jkq}$ and $W_{jkl}$ equal 1 for item j and 0 otherwise. Thus, the regression coefficients on the item indicators represent the item discrimination, $a_{jk}$ , and item intercept, $b_{jk}$ , for item j in decision stage k. In addition, $X_{jkq}$ and $W_{jkl}$ can both contain the covariates for item discrimination and easiness, respectively, which effectively decompose the unconditional parameters. For the person traits, $γ_{ks}$ is the regression slope for the sth person covariate in $Z_{iks}$ and $θ'_{ik}$ is the residual latent trait not explained by person covariates. As Jeon and De Boeck (2016) point out, when the measurement of ERS, MRS, or TOI is the primary purpose of the analysis, person covariates should not be used. However, they can be useful in explaining the varying levels of the latent traits found in the unconditional model in Equation 2.

We illustrate the model with a simple example for item j in the k = ERS decision stage, a single person-level covariate used to explain the ERS latent trait level, and a single item-level covariate used to explain easiness of endorsing the ERS pseudo-item. The person covariate is dummy coded, representing, in this example, whether the person’s biological sex is male (coded 0) or female (coded 1). The item covariate is continuous, representing the number of words in the item prompt. Because we are only focusing on one item for illustrative purposes and single person- and item-level covariates, the summation symbols are dropped.

\begin{matrix} g (\Pr (Y_{ijERS}^{*} = 1 | θ_{iERS})) = a_{jERS} θ_{iERS} + b_{jERS} + b_{jERS, Words} * W_{jERS, Words}, \\ θ_{iERS} = γ_{ERS, female} Z_{iERS, female} + θ'_{iERS}, \end{matrix}

(4)

In Equation 4, $b_{jERS}$ is item j’s easiness in endorsing extreme responses; positive values of $b_{jERS}$ indicate, in general, that extreme responses are easy to endorse for item j. $γ_{ERS, female}$ is the regression weight on the person covariate, $Z_{iERS, female}$ , indicating the difference in average levels of ERS between males and females. A positive $γ_{ERS, female}$ would indicate females have higher levels of ERS, on average, than males. A positive $b_{jERS, Words}$ would indicate that items with more words tend to be easier to endorse for ERS.
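A small numerical sketch may help show how the two parts of Equation 4 fit together. It assumes a logistic inverse link; the function name `ers_endorsement_probability` and every parameter value are hypothetical and were picked only to illustrate the direction of the effects described above.

```python
import math


def ers_endorsement_probability(a, b, b_words, n_words,
                                gamma_female, female, theta_resid):
    """Equation 4 for one person and one item at the ERS decision stage."""
    # Person side: the latent trait is the covariate effect plus a residual.
    theta_ers = gamma_female * female + theta_resid
    # Item side: easiness is the item intercept plus the word-count effect.
    linear = a * theta_ers + b + b_words * n_words
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical values chosen only for illustration.
params = dict(a=1.0, b=-0.5, b_words=0.02, n_words=15,
              gamma_female=0.3, theta_resid=0.0)
p_male = ers_endorsement_probability(female=0, **params)
p_female = ers_endorsement_probability(female=1, **params)
```

With a positive `gamma_female`, `p_female` exceeds `p_male` for respondents who share the same residual trait level, which is the interpretation given in the text.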

This study demonstrates use of the covariate IRTree model using multiple person covariates, extending work from Jin and Wang (2014) and Jeon and De Boeck (2016), neither of which used covariates to explain latent trait parameters. We also present a simulation study evaluating adequacy of the Bayesian Markov chain Monte Carlo (MCMC) estimation method for parameter recovery. After demonstrating adequate parameter recovery, we provide an empirical example using the National Longitudinal Study of Adolescent to Adult Health (Add Health; Harris et al., 2009) six-item sexual knowledge scale with multiple covariates to predict item easiness. The specific research questions are as follows:

Research Question 1: What are the effects of sample size, correlation among the latent traits, relation among latent traits and a covariate, and the specific latent trait related to the covariate on covariate IRTree item and regression parameter recovery?

Research Question 2: What are the covariates that explain heterogeneity in the substantive trait and response styles of the Add Health sexual knowledge scale?

Method and Results

Simulation: Parameter Recovery

A relatively well-known limitation of noncompensatory MIRT models is the inability to accurately estimate model parameters (Wang & Nydick, 2015). Thus, it was important to ensure adequate recovery of parameters prior to interpretation of parameter estimates. Jin and Wang (2014), using MCMC estimation, performed one of the more comprehensive parameter recovery studies for response style models. However, their model contained only one response style (i.e., ERS) and was not a tree-like model. Jeon and De Boeck (2016) conducted a small simulation study evaluating a two-dimensional generalized IRTree fit to 24 three-category responses (N = 316) using maximum-likelihood estimation and found that item parameters were recovered quite well. Plieninger and Heck (2018) compared their proposed acquiescence model to a 1PL version of the unconditional generalized three-dimensional IRTree model presented earlier (N = 250). They found adequate recovery of difficulty parameters using MCMC estimation. Researchers evaluating a noncompensatory MIRT model—similar to the generalized three-dimensional IRTree model, albeit a 1PL parameterization—found MCMC estimation outperformed Bayesian and non-Bayesian specifications of the Metropolis–Hastings Robbins–Monro algorithm, especially in conditions with high intertrait correlations (Wang & Nydick, 2015). Given these findings and our relatively high intertrait correlations, we elected to use MCMC estimation for the current study.

In addition to evaluating estimation algorithms, Wang and Nydick (2015) suggested that sample sizes of 1,000 are required for adequate parameter recovery of noncompensatory MIRT models without missing data. Recoding of items into pseudo-items results in missing data in the TOI and ERS pseudo-items (see Table 1) when the midpoint pseudo-item is endorsed, and recovery of item parameters may therefore be compromised. Accordingly, we considered the following sample sizes: N = 1,000, 2,000, and 4,000.

The magnitude of the correlations among the latent traits can affect parameter recovery (Wang & Nydick, 2015). Previous empirical studies (including the current study) have generally found similar correlations among traits: a weak to moderate negative relation between MRS and TOI, a strong negative relation between MRS and ERS, and a weak positive relation between TOI and ERS (e.g., Ames & Myers, 2020; Böckenholt, 2017; Plieninger & Heck, 2018). Nonetheless, others have found near-zero correlations among latent traits (Myers & Ames, 2019); thus, we evaluated conditions with uncorrelated and correlated latent traits.

We further considered the magnitude of relations among the continuous covariate and the latent traits that represented no, moderate, and strong relations ( $γ_{ks} = 0.00, 0.30, 0.50$ ) and the latent trait related to the covariate ( $k = MRS, TOI, ERS$ ). In addition, we evaluated two conditions where the covariate was related to all latent traits ( $γ_{MRS, S} = - 0.30, γ_{TOI, S} = 0.50, γ_{ERS, S} = 0.30$ or $γ_{MRS, S} = - 0.50, γ_{TOI, S} = 0.30, γ_{ERS, S} = 0.50$ ). The number of items was fixed to 20 across all conditions, which is consistent with previous studies involving IRTrees (Jeon & De Boeck, 2016; Plieninger & Heck, 2018). A total of 66 conditions were evaluated with 100 replicates each.
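One way to arrive at the reported total of 66 conditions is sketched below. The decomposition is an assumption on our part (3 sample sizes, 2 correlation structures, and 11 covariate patterns made up of 3 traits by 3 magnitudes plus the 2 all-trait patterns); the text reports only the total, and the variable names are hypothetical.

```python
from itertools import product

sample_sizes = [1000, 2000, 4000]
trait_correlations = ["uncorrelated", "correlated"]
traits = ["MRS", "TOI", "ERS"]
magnitudes = [0.00, 0.30, 0.50]

# Single-trait patterns: the covariate relates to one target trait at one
# magnitude, and its slope on the other two traits is zero.
covariate_patterns = [
    {t: (g if t == target else 0.0) for t in traits}
    for target, g in product(traits, magnitudes)
]

# Two patterns in which the covariate relates to all three traits.
covariate_patterns += [
    {"MRS": -0.30, "TOI": 0.50, "ERS": 0.30},
    {"MRS": -0.50, "TOI": 0.30, "ERS": 0.50},
]

# Fully crossed design under the assumed decomposition.
conditions = [
    {"n": n, "correlation": r, "gamma": pattern}
    for n, r, pattern in product(sample_sizes, trait_correlations, covariate_patterns)
]
print(len(conditions))  # 66
```

Under this reading, the zero-magnitude pattern is counted once per target trait, which is what brings the number of covariate patterns to 11.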

Person parameters and the continuous covariate were simulated from a multivariate standard normal distribution, $MVN (0, \Sigma)$ , where the diagonal elements of $\Sigma$ were ones and the off-diagonal elements were zeros or $r_{MRS, TOI} = -.26$ , $r_{MRS, ERS} = -.55$ , and $r_{ERS, TOI} = .11$ , with slope ( $γ_{ks}$ ) parameters as previously listed. There are few published studies with empirical IRTree parameters, and the IRTree simulation studies have been limited almost entirely to the empirical example explored in the study. Thus, we had little guidance on potential generating item parameters and what would be realistic and relevant. The generating item parameters (see Table 2) were selected to represent a range of plausible discrimination and intercept parameters typically found in IRT studies and guided by an empirical data set of responses to a national survey on food insecurity with a very large sample size. Item parameters were fixed at each replication of the simulation because little is known about potential distributions of tree-like item parameters.
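As a rough illustration of the correlated-traits condition, the sketch below draws latent trait vectors from a multivariate normal distribution with the correlations listed above, using a Cholesky factor. It covers only the three latent traits; how the covariate and the slopes enter the generating process is not shown here, and the function names and the seed are hypothetical.

```python
import math
import random

# Correlation matrix in the order MRS, TOI, ERS (values from the text).
CORR = [
    [1.00, -0.26, -0.55],
    [-0.26, 1.00, 0.11],
    [-0.55, 0.11, 1.00],
]


def cholesky(matrix):
    """Return lower-triangular L with L L^T = matrix; fail if not positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = matrix[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def draw_traits(n_persons, corr, seed=0):
    """Draw n_persons trait vectors from MVN(0, corr)."""
    rng = random.Random(seed)
    lower = cholesky(corr)
    dim = len(corr)
    draws = []
    for _ in range(n_persons):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Correlate the independent standard normal draws: theta = L z.
        draws.append([sum(lower[i][k] * z[k] for k in range(i + 1))
                      for i in range(dim)])
    return draws


traits = draw_traits(1000, CORR)
```

The Cholesky step also serves as a check that the chosen correlations form a valid (positive definite) matrix, which they do.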

Table 2.

Generating Item Parameters for IRTree Model.

	Item parameters
Item	$a_{MRS}$	$a_{TOI}$	$a_{ERS}$	$b_{MRS}$	$b_{TOI}$	$b_{ERS}$
1	0.49	1.06	1.33	1.79	−0.93	−0.03
2	0.96	0.93	0.94	−0.48	−0.60	0.73
3	0.80	0.84	0.88	−1.32	0.74	1.55
4	1.21	1.35	1.13	0.29	0.34	0.02
5	1.04	0.96	0.77	−0.24	0.83	−0.75
6	1.05	1.18	1.38	0.93	1.19	−1.04
7	1.10	1.10	0.69	−0.28	−0.47	0.41
8	1.19	1.36	1.96	0.52	1.41	−0.39
9	0.66	0.82	0.66	1.53	−1.46	−0.75
10	0.99	1.67	1.84	0.05	−0.17	−0.47
11	1.00	0.58	0.78	−0.07	−1.58	0.12
12	1.49	1.60	1.29	0.22	0.47	−0.67
13	1.23	0.70	1.27	1.35	0.67	0.47
14	0.72	0.99	1.62	1.03	−0.62	−0.30
15	1.81	1.50	1.26	−1.04	0.12	1.44
16	1.17	1.38	1.15	−0.68	−0.33	−0.11
17	0.81	1.29	1.04	−0.83	−0.56	0.76
18	1.55	1.10	0.56	0.00	0.84	2.00
19	1.62	1.82	1.48	−0.63	0.05	1.13
20	1.32	0.60	0.95	0.51	0.29	−1.64

Note. $a_{k}$ and $b_{k}$ represent discrimination and (negative) intercept parameters. Discrimination parameters are in the probit metric (i.e., $a_{probit} ≅ a_{logit} / 1.702$ ). IRTree = item response tree; MRS = midpoint response style; TOI = trait of interest; ERS = extreme response style.

The covariate IRTree model was fit with Mplus using MCMC estimation with the Gibbs sampler (Muthén & Muthén, 1998-2017). For model identification, latent trait means are fixed to zero and latent trait variances are fixed to one. Two chains were estimated, with the first 25,000 iterations from each chain discarded as burn-in; the posterior distribution consisted of the last 25,000 iterations from each chain. Convergence was assessed via visualization of trace plots and evaluation of the potential scale reduction factor convergence criterion (Gelman & Rubin, 1992). The potential scale reduction factor convergence criterion was set to 1.02, which corresponds to less than 4% of the variability in the posterior being between-chain variability. Weakly informative priors were used for the item and regression parameters: $a_{jk}$ , $b_{jk}$ , and $γ_{ks} \sim N (0, σ^{2} = 2)$ . The latent trait vector was assumed multivariate standard normal, and an inverse-Wishart prior was placed on the covariance structure: $\Sigma \sim IW (0, 4)$ .
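The correspondence between the 1.02 criterion and the 4% figure can be checked with a short calculation. The sketch assumes the potential scale reduction factor is defined as the square root of total variance over within-chain variance, PSR = sqrt((W + B) / W); under that assumption the between-chain share of total variance is 1 - 1 / PSR^2. The function name is hypothetical.

```python
def between_chain_share(psr):
    """Share of total posterior variance that is between-chain, B / (W + B).

    Assumes PSR = sqrt((W + B) / W), so that B / (W + B) = 1 - 1 / PSR**2.
    """
    if psr < 1.0:
        raise ValueError("under this definition PSR cannot be below 1")
    return 1.0 - 1.0 / psr ** 2


share = between_chain_share(1.02)  # a little under 0.04
```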

The accuracy of parameter estimates was evaluated via coverage, average bias, root mean square error (RMSE), and relative bias. Coverage was computed as the proportion of 95% credible intervals that contained the generating parameter value. Average bias of the item parameters, within each of the $k = 3$ traits, was calculated as: $\frac{1}{R * J} \sum_{r = 1}^{R} \sum_{j = 1}^{J} ({\hat{δ}}_{r j} - δ_{j})$ , where ${\hat{δ}}_{rj}$ and $δ_{j}$ represent the estimated and generating parameter values for replication $r = 1, \dots, R$ and item $j = 1, \dots, J$ . RMSE of the item parameters, within each of the $k = 3$ traits, was calculated as $\sqrt{\frac{1}{R * J} \sum_{r = 1}^{R} \sum_{j = 1}^{J} {({\hat{δ}}_{rj} - δ_{j})}^{2}}$ . Relative bias was calculated for each trait as $\frac{1}{R * J} \sum_{r = 1}^{R} \sum_{j = 1}^{J} (({\hat{δ}}_{rj} - δ_{j}) / δ_{j})$ .
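The four accuracy measures can be computed as in the following sketch, which follows the formulas above for a single parameter type within one trait. The function names and the data layout (a list of replications, each a list of per-item values) are assumptions made for illustration; relative bias as written is undefined when a generating value is exactly zero, so the sketch assumes nonzero generating values.

```python
import math


def accuracy_metrics(estimates, generating):
    """Average bias, RMSE, and relative bias over replications and items.

    estimates[r][j] is the estimate for replication r and item j;
    generating[j] is the (nonzero) generating value for item j.
    """
    n = len(estimates) * len(generating)
    errors = [est - true
              for rep in estimates
              for est, true in zip(rep, generating)]
    relative = [(est - true) / true
                for rep in estimates
                for est, true in zip(rep, generating)]
    return {
        "bias": sum(errors) / n,
        "rmse": math.sqrt(sum(err * err for err in errors) / n),
        "relative_bias": sum(relative) / n,
    }


def coverage(intervals, generating):
    """Proportion of (lower, upper) credible intervals containing the generating value."""
    hits = sum(lower <= true <= upper
               for rep in intervals
               for (lower, upper), true in zip(rep, generating))
    return hits / (len(intervals) * len(generating))
```

For example, with generating values `[1.0, 2.0]` and estimates `[[1.1, 1.8], [0.9, 2.2]]`, the errors cancel, so the average bias is zero while the RMSE is not; this is why the two measures are reported together.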

Results of Simulation Study

Item Parameter Recovery

A series of between-subjects factorial analyses of variance were conducted to evaluate the effect of the simulation conditions on estimation bias. Because the purpose of the study was to identify the conditions that influenced parameter recovery to the greatest extent, we did not evaluate the statistical significance of the analyses of variance but rather the effect size $η_{p}^{2}$ , which can be interpreted as the proportion of variance accounted for by an effect after controlling for the other effects in the analysis. The magnitude of the latent trait correlations accounted for 0% ( $η_{p}^{2} = 0.00$ ) of the variability in bias associated with each parameter (e.g., $a_{jMRS}, b_{jTOI}$ ) and thus had negligible influence on item parameter estimation bias. Results were collapsed across the correlated and uncorrelated conditions to facilitate interpretation. Furthermore, the number of latent traits the covariate was related to accounted for 0% ( $η_{p}^{2} = 0.00$ ) of the variability in parameter bias and thus had negligible effect on item parameter estimation bias. Thus, item parameter recovery was evaluated across sample sizes, whether the latent trait was related to the covariate, and the magnitude of the relations (absolute value) between the latent traits and the covariate.

Discrimination

The trait related to the covariate, the magnitude of the relation with the covariate ($γ_{ks}$: no = 0.00, small = 0.30, large = 0.50), and their interaction accounted for the most variability in discrimination parameter estimation bias across the MRS, TOI, and ERS dimensions (see Table 3). The average bias was approximately 0 when the covariate was not related to the latent trait; when the covariate was related to the latent trait, the average bias was $-0.04$ and $-0.14$ for the small and large magnitude conditions, respectively. Thus, the degree to which the parameters were underestimated increased as the magnitude of the relation to the covariate increased. A similar pattern was found with regard to RMSE; however, RMSE values associated with the MRS trait were slightly lower than those associated with the TOI and ERS traits (see Figure 2). This finding is likely attributable to the missing data associated with TOI and ERS pseudo-items. Note that the proportions of variance explained by simulation conditions were smaller for the TOI and ERS dimensions than for the MRS dimension. This finding also likely reflects the missing data associated with the TOI and ERS pseudo-items; that is, the missing data likely account for a portion of the variance in parameter estimation bias.

Table 3.

Proportion of Variance ( $η_{p}^{2}$ ) in Bias Attributed to Simulation Condition.

	Parameter
Condition	$a_{jMRS}$	$a_{jTOI}$	$a_{jERS}$	$b_{jMRS}$	$b_{jTOI}$	$b_{jERS}$	$γ_{ks}$
Sample size (SS)	.00	.00	.00	.01	.01	.01	.00
Trait-covariate relation (TCL)	.13	.07	.06	.00	.00	.00	.02
Covariate magnitude (CM)	.11	.06	.05	.00	.00	.00	.02
SS × TCL	.00	.00	.00	.00	.00	.00	.00
SS × CM	.00	.00	.00	.00	.00	.00	.00
TCL × CM	.11	.06	.05	.00	.00	.00	.00
Total variance explained	.31	.20	.16	.01	.01	.00	.02

Note. MRS = midpoint response style; TOI = trait of interest; ERS = extreme response style.

Figure 2.

Item parameter average bias, relative bias, and root mean square error (RMSE) across sample sizes, latent traits, and magnitude of slope coefficient ( $γ_{ks}$ ).

In terms of relative bias, the magnitude of the relation with the covariate had the largest effect on discrimination parameter estimates, with the large magnitude condition influencing parameter estimates the most and the no-relation condition influencing them the least. Nonetheless, discrimination parameter estimates were within or near the commonly used ±10% relative bias heuristic for small bias across all conditions. Coverage of the discrimination parameters decreased as the magnitude of the relation to the covariate and the sample size increased, and it was essentially equivalent across dimensions. In conditions where the covariate was related to the relevant latent trait, average coverage across dimensions was $0.92$ and $0.69$ ($N = 1{,}000$), $0.90$ and $0.55$ ($N = 2{,}000$), and $0.84$ and $0.40$ ($N = 4{,}000$) for the small and large magnitude conditions, respectively. This finding corresponds with the pattern found among the empirical standard errors (i.e., standard deviations of estimates across replications): the empirical standard errors tended to decrease as the magnitude of the relation with the covariate increased and to decrease slightly as the sample size increased.

Intercept

Sample size accounted for 1% of the variance in intercept parameter estimation bias across the MRS, TOI, and ERS dimensions. Nonetheless, the average bias among intercept parameters was approximately 0 across all conditions and dimensions. The largest relative bias (~5%) was associated with the TOI intercepts in the $N = 1{,}000$ sample size condition. The magnitude of the relation with the covariate had negligible influence on intercept parameter RMSE values. The MRS trait was associated with the smallest RMSE values, followed by the TOI trait, and the ERS trait was associated with the largest RMSE values. This ordering was similar across sample sizes, and RMSE values decreased as sample size increased. Coverage of the intercept parameters was greater than $0.95$ in all conditions except the $N = 4{,}000$ large magnitude condition, where coverage was $0.93$, $0.94$, and $0.91$ for the MRS, TOI, and ERS traits, respectively.

Covariate

The slopes relating the latent traits to the covariate ($γ_{ks}$) were recovered well across all conditions in terms of average bias and RMSE. The trait to which the covariate was related and the magnitude of the relation each accounted for approximately 2% of the variability in estimation bias after accounting for the other conditions and interactions ($η_{p}^{2} = 0.02$). The average bias was approximately $0$ (RMSE $= 0.02$) when the covariate was related to the MRS and TOI traits. However, when the covariate was related to the ERS trait, the average bias (RMSE) was $0.00$ ($0.04$), $-0.03$ ($0.06$), and $-0.05$ ($0.05$) across the $0.00$, $0.30$, and $0.50$ magnitude conditions, collapsing across sample size conditions. The negative average bias indicated that the degree of underestimation increased as the magnitude of the slope parameters increased. Although parameter estimation was adequate with regard to average bias and RMSE, 95% coverage of slopes in the $0.30$ magnitude condition was less than adequate in the $N = 1{,}000$ ($0.86, 0.91, 0.73$), $N = 2{,}000$ ($0.77, 0.84, 0.60$), and $N = 4{,}000$ ($0.58, 0.82, 0.28$) conditions, for the MRS, TOI, and ERS traits, respectively. Coverage of slopes in the $0.50$ magnitude and ERS trait conditions was less than $0.95$ in the $N = 1{,}000$ ($0.84$), $N = 2{,}000$ ($0.77$), and $N = 4{,}000$ ($0.59$) conditions. The relatively low coverage reflects the combination of relatively small empirical standard errors across all conditions (~0.02) and nonzero average bias in the corresponding conditions. The estimation bias and reduced coverage were most evident when the covariate was related to the ERS trait, which is likely related to the missing data associated with ERS pseudo-items.
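The link between small standard errors, modest bias, and low coverage can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a normally distributed estimator and a symmetric interval of half-width 1.96 standard errors. The study used posterior credible intervals, so this is a heuristic only, and the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_coverage(bias, se, z_crit=1.96):
    """Coverage of estimate +/- z_crit * se when estimate ~ N(truth + bias, se^2).

    The interval covers the truth when the standardized error lies in
    [-z_crit, z_crit]; bias shifts that error by bias / se.
    """
    shift = bias / se
    return normal_cdf(z_crit - shift) - normal_cdf(-z_crit - shift)

# Unbiased estimator: nominal coverage regardless of the standard error.
print(round(expected_coverage(0.0, 0.02), 3))

# Same small standard error, but bias of -0.05 (2.5 standard errors):
# the interval is narrow and centered in the wrong place.
print(round(expected_coverage(-0.05, 0.02), 3))
```

Under these assumptions, a bias that looks negligible on the parameter scale can still move most intervals off the generating value once the standard error is small, which is consistent with coverage dropping as sample size grows.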

Empirical Example

We use data from the National Longitudinal Study of Adolescent to Adult Health (Add Health) survey (Harris et al., 2009). Add Health began with an in-school questionnaire administered to a nationally representative sample of students in Grades 7 to 12 in 1994-1995 (Wave 1). Our primary focus is on a six-item scale of sexual knowledge (see Table 4 for items and Add Health database variable names, with means and category percentages). Each item combines a subject (self, friend) and a birth control topic (condom, rhythm method, withdrawal method). For example, for the withdrawal topic, the friend-item reads, “Your closest friends are quite knowledgeable about the withdrawal method of birth control,” whereas the self-item reads, “You are quite knowledgeable about the withdrawal method of birth control.” Responses are given on a 5-point rating scale ranging from strongly disagree = 1 to strongly agree = 5. We chose to include both the self- and friend-items because of the close link in adolescence between friends’ behavior and an individual’s sexual behavior (Smith et al., 1985), implying a strong positive relationship between an adolescent’s self-reported sexual behavior and the sexual behavior of their friends.

Table 4.

Sexual Knowledge Scale of the Add Health Survey.

Subject	Topic	Add Health variable name	M	Percent extreme	Percent midpoint	M
Self	Condom	H1PF6	4.2101	46.62	7.31	4.21
Self	Rhythm	H1PF11	3.5870	21.53	28.99	3.59
Self	Withdrawal	H1PF22	3.4415	21.46	19.99	3.44
Friend	Condom	H1PF12	3.9167	28.54	19.09	3.92
Friend	Rhythm	H1PF17	3.3485	14.25	33.72	3.35
Friend	Withdrawal	H1PF9	3.6650	22.24	18.53	3.67

Description of the Sample

All respondents in the random sample had complete data at Wave 1 for the sexual knowledge scale and covariates (N = 2,666 respondents). Response percentages are provided in Figure 3. The item with the lowest mean (3.35) was the friend-rhythm item, which also had the lowest percentage of extreme responses (14.25%) and the highest percentage of midpoint responses (33.72%). The item with the highest mean (4.21) was the self-condom item, which also had the highest percentage of extreme responses (46.62%) and the lowest percentage of midpoint responses (7.31%).

Figure 3.

Percentage of respondents in each category.

Covariates

We chose whether the respondent was Caucasian (variable name H1GI6A) and biological sex (variable name BIO_SEX) as dichotomous person-level covariates. We use the parental support and relationship scale score (also from Add Health) as a continuous person-level covariate, computed as the average of nine items that ask respondents their level of agreement with statements such as “Most of the time, your father is warm and loving toward you.” Higher scale scores indicate higher perceived levels of parental support and a closer relationship with parents. The sample was 71.94% White and 49.89% female, with an average age of 15.75 years and an average parental relationship scale score of 4.09 out of 5.00. The item-level covariate is whether the item is self-oriented or friend-oriented.

Estimation

All data analyses and model-fitting were performed using MCMC in the SAS (SAS Institute Inc., 2017) MCMC procedure. Prior distributions for the item parameters were weakly informative (i.e., $b \sim N(0, σ^{2} = 3)$; $a \sim N(0, σ^{2} = 2)$) and the prior on the vector of latent abilities was multivariate normal (i.e., $θ \sim MVN(μ, Σ)$). Priors on the individual elements of $μ$ were $μ_{k} \sim N(0, σ^{2} = 2)$, and priors for the diagonal elements of $Σ$ were truncated normal ($σ_{k} \sim N(0, σ^{2} = 2, lower = -2, upper = 2)$). To estimate the models in Equations 2 and 3, the data were placed into “long” form, as detailed in De Boeck and Wilson (2004). An example of this format can be found in Figure 4, which illustrates that the item responses serve as the dependent variables and the covariates as their predictors, for two respondents. Person 1 is male and Person 2666 is female in this fictional representation.
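The restructuring into long form can be sketched as follows. This is an illustrative assumption rather than the authors' SAS data step: it uses the common midpoint-first tree coding, in which a midpoint response leaves the TOI and ERS pseudo-items structurally missing, and the column names (`person`, `female`, `self_item`, `node`, `response`) are hypothetical stand-ins that may differ from the layout in Figure 4.

```python
def pseudo_items(response):
    """Map a 1-5 rating to binary pseudo-items; None marks structural missingness.

    Assumed coding: MRS = chose the midpoint; TOI = agree side (4 or 5) versus
    disagree side (1 or 2); ERS = extreme category (1 or 5) versus 2 or 4.
    """
    if response not in (1, 2, 3, 4, 5):
        raise ValueError("response must be an integer from 1 to 5")
    if response == 3:
        return {"MRS": 1, "TOI": None, "ERS": None}
    return {
        "MRS": 0,
        "TOI": 1 if response > 3 else 0,
        "ERS": 1 if response in (1, 5) else 0,
    }

def to_long(person_id, female, responses):
    """Return one row per non-missing pseudo-item for a single respondent."""
    rows = []
    for item, rating in responses.items():
        for node, value in pseudo_items(rating).items():
            if value is None:
                continue  # missing pseudo-items contribute no row
            rows.append({
                "person": person_id,
                "female": female,
                "item": item,
                "self_item": int(item.startswith("self")),
                "node": node,
                "response": value,
            })
    return rows

# Fictional respondent: strongly agrees on one item, midpoint on another.
rows = to_long(1, 0, {"self-condom": 5, "friend-rhythm": 3})
for row in rows:
    print(row)
```

Under this coding a midpoint response yields a single row, whereas any other response yields three, which is why the TOI and ERS pseudo-items carry missing data and the MRS pseudo-items do not.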

Figure 4.

Long form example data with one person covariate (female), one item covariate (self item), the observed response (response), and item indicators (self-condom through friend-withdraw).

Results, Empirical Example

Unconditional IRTree

To investigate the presence of ERS and MRS in the Add Health responses, we first fit an unconditional (i.e., no covariates) IRTree to the Add Health data (see Table 5). In the ERS decision stage, the discrimination parameters ranged from 1.946 to 3.629, indicating the presence of ERS in the data. The item easiness parameters in the ERS decision stage indicated the items were relatively difficult to endorse for the extreme options with the exception of Item 6. Item 6 had the highest percentage of extreme responses (45.7%; see Table 2), reflected in $b_{6 ERS}$ being the easiest item in that decision stage ( $b_{6 ERS} = - 0.069$ ).

In the MRS decision stage, the discrimination parameters ranged from 1.591 to 2.649, indicating the presence of MRS in the data. The item easiness parameters in the MRS decision stage indicated that the midpoint option was relatively difficult to endorse for all items. In this decision stage, Item 6 was the item for which midpoint responses were most difficult to elicit ($b_{6 MRS} = -3.121$), consistent with Item 6 having the lowest percentage of midpoint responses (9.2%). The TOI decision stage indicated that most items were easy to endorse in terms of sexual knowledge, which is reflected in average item scores greater than 3. Item 6 was the easiest to endorse ($b_{6 TOI} = 3.598$), with 82.2% of respondents choosing either agree or strongly agree.

Table 5.

Add Health Results: Unconditional IRTree.

Decision stage	Item	$a_{jk} EAP$	$a_{jk} HPD$ $(lower)$	$a_{jk} HPD$ $(upper)$	$b_{jk} EAP$	$b_{jk} HPD$ $(lower)$	$b_{jk} HPD$ $(upper)$
ERS	1	2.321	2.041	2.620	−1.577	−1.362	−1.786
	2	3.259	2.812	3.747	−2.866	−2.490	−3.268
	3	3.629	3.126	4.155	−1.661	−1.387	−1.953
	4	3.319	2.902	3.789	−2.181	−1.880	−2.490
	5	3.220	2.805	3.655	−2.036	−1.734	−2.321
	6	1.946	1.715	2.167	−0.069	0.058	−0.199
MRS	1	2.029	1.800	2.274	−1.404	−1.243	−1.570
	2	2.649	2.306	2.993	−1.284	−1.084	−1.474
	3	2.191	1.914	2.465	−2.302	−2.064	−2.527
	4	2.217	1.943	2.490	−2.233	−2.000	−2.451
	5	1.929	1.699	2.170	−2.046	−1.852	−2.242
	6	1.591	1.345	1.826	−3.121	−2.850	−3.372
TOI	1	2.518	2.172	2.896	2.187	2.450	1.921
	2	4.275	3.558	5.071	2.231	2.648	1.832
	3	2.924	2.477	3.403	3.708	4.156	3.258
	4	3.986	3.358	4.619	2.991	3.429	2.543
	5	4.002	3.363	4.659	2.374	2.754	1.990
	6	2.335	1.990	2.703	3.598	3.981	3.227

Note. HPD is the 95% highest posterior density with (lower) as the lower limit and (upper) as the upper limit. $a_{jk}$ and $b_{jk}$ are the item discrimination and easiness, respectively. IRTree = item response tree; EAP = expected a posteriori estimate; MRS = midpoint response style; TOI = trait of interest; ERS = extreme response style.

Covariate IRTree

We used person-level covariates to predict person parameters and an item-level covariate to predict item easiness. Table 6 presents the estimated covariate regression weights from this analysis. The deviance information criterion (DIC) was 32,299.84 for the unconditional model and 31,495.72 for the covariate IRTree model, a difference of 804.12, indicating that the covariate IRTree provided a better fit to the data (Zhang et al., 2019). We computed DIC using the marginal likelihood, which does not treat the latent variables as parameters “in focus” of the computations (Merkle et al., 2019).

Table 6.

Covariate Weights.

				95% HPD Interval
Covariate	Node	M	SD	Lower	Upper
White	MRS	0.0356	0.0517	−0.0663	0.1344
Female	MRS	0.0353	0.0470	−0.0580	0.1269
Parental support and relationship	MRS	−0.0969^a	0.0357	−0.1627	−0.0261
Self	MRS	−0.1635^a	0.0632	−0.2890	−0.0498
White	ERS	−0.1036^a	0.0498	−0.2012	−0.0086
Female	ERS	−0.1779^a	0.0446	−0.2690	−0.0918
Parental support and relationship	ERS	0.2686^a	0.0347	0.2006	0.3357
Self	ERS	−0.3273^a	0.0496	−0.4293	−0.2393
White	TOI	0.0017	0.0501	−0.0938	0.1033
Female	TOI	−0.2018^a	0.0481	−0.2954	−0.1065
Parental support and relationship	TOI	0.1913^a	0.0339	0.1184	0.2554
Self	TOI	0.3231^a	0.0560	0.2111	0.4351

Note. HPD = highest posterior density, MRS = midpoint response style, ERS = extreme response style, TOI = trait of interest.

^a HPD does not cross 0.

In the midpoint decision stage, being White had a negligible impact on levels of the MRS trait (expected a posteriori [EAP] = 0.0356), as did being female (EAP = 0.0353). We define an effect as negligible when its 95% HPD interval includes zero. Higher levels of parental support and relationship were associated with slightly lower levels of the MRS trait (EAP = −0.0969, 95% HPD: −0.1627 to −0.0261). Endorsing the midpoint category was more difficult for the self-oriented items than for the friend-oriented items (EAP = −0.1635, 95% HPD: −0.2890 to −0.0498). In the extreme response decision stage, White respondents tended to have lower levels of the ERS trait (EAP = −0.1036, 95% HPD: −0.2012 to −0.0086) than non-White respondents. Female respondents tended to have lower levels of the ERS trait (EAP = −0.1779, 95% HPD: −0.2690 to −0.0918) than males. Higher levels of parental support and relationship were associated with higher levels of the ERS trait (EAP = 0.2686, 95% HPD: 0.2006 to 0.3357). Endorsing the extreme categories was more difficult for the self-oriented items than for the friend-oriented items (EAP = −0.3273, 95% HPD: −0.4293 to −0.2393).

In the sexual knowledge TOI stage, being White had a negligible impact on levels of the TOI trait (EAP = 0.0017). Female respondents tended to have lower levels of the TOI trait (EAP = −0.2018, 95% HPD: −0.2954 to −0.1065). Higher levels of parental support and relationship were associated with higher levels of the TOI trait (EAP = 0.1913, 95% HPD: 0.1184 to 0.2554). Endorsing the sexual knowledge TOI was easier in the self-oriented items than the friend-oriented items (EAP = 0.3231, 95% HPD: 0.2111 to 0.4351).
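The "HPD crosses zero" rule used in this section can be sketched with a simple empirical HPD routine, which takes the shortest interval containing the requested share of sorted posterior draws. The function names are hypothetical, and the simulated normal draws are placeholders standing in for real MCMC output. They borrow the posterior means and standard deviations reported in Table 6 purely for illustration, so the resulting limits will not reproduce the table.

```python
import math
import random

def hpd_interval(draws, mass=0.95):
    """Shortest interval containing `mass` of the draws (empirical HPD)."""
    ordered = sorted(draws)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of draws the interval must hold
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]

def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0

random.seed(1)
# Placeholder draws: parental support on ERS (Table 6: M = 0.2686, SD = 0.0347).
support_ers = [random.gauss(0.2686, 0.0347) for _ in range(4000)]
# Placeholder draws: White on MRS (Table 6: M = 0.0356, SD = 0.0517).
white_mrs = [random.gauss(0.0356, 0.0517) for _ in range(4000)]

for label, draws in [("support -> ERS", support_ers), ("White -> MRS", white_mrs)]:
    low, high = hpd_interval(draws)
    print(label, round(low, 3), round(high, 3), excludes_zero(low, high))
```

For a unimodal, roughly symmetric posterior the HPD interval is close to the equal-tailed interval; the two differ mainly when the posterior is skewed.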

Discussion

This study has presented a covariate IRTree as a method to estimate and explain response styles. We conducted one parameter recovery simulation, which demonstrated adequate parameter recovery. Applied researchers can expect item and regression parameters to be estimated with sufficient accuracy when sample sizes are greater than approximately 1,000 and MCMC estimation with the Gibbs sampler is used. Nonetheless, it would be prudent to evaluate conditions not explored in the current study. With regard to parameter estimation, the biggest area of concern is likely the missing data associated with the TOI and ERS pseudo-items. Parameters associated with the TOI and ERS traits were generally estimated less precisely than parameters associated with the MRS trait, because the pseudo-item coding does not result in missing MRS pseudo-items. Altogether, these findings suggest that the Bayesian estimation option in Mplus provides a fast, flexible, and user-friendly approach to estimating the unconditional and covariate IRTrees. Mplus syntax for estimating this three-dimensional IRTree model is included in the Supplemental Appendix (available online).

The empirical example illustrated the flexibility of the model, incorporating person- and item-level covariates. In addition, a mix of continuous and dichotomous predictors was used. Meaningful predictors of higher levels of the ERS latent trait included being non-White, being male, and having high levels of parental support and relationships. Previous research found mixed results on the role of gender and race in the proclivity for providing extreme responses (Thissen-Roe & Thissen, 2013), but these findings agree with Bachman et al. (2010) and Meisenberg and Williams (2008), who found that males provided more extreme responses than females. One concern with using parental support and relationships scale scores is that the scale is also prone to response style contamination. To investigate this possibility, we estimated an unconditional IRTree for the parental support and relationships scale and correlated the ERS traits across the sexual knowledge and parental support scales. The resulting correlation is large and significant ($r = 0.712, p < .001$), indicating that individuals with high ERS on one scale tend to have high levels of ERS on the other. The positive estimate of the parental support scale predictor on ERS may therefore be due to the positive relationship between ERS across the scales.

Meaningful predictors of higher levels of the MRS latent trait included having low levels of parental support and relationships. Finally, higher levels of sexual knowledge were associated with being male and having high levels of parental support and relationships. Focusing on the gender predictor, the findings are consistent with previous studies. For example, Leland and Barth (1992) found that males knew more about using condoms correctly and about the role of condom use in preventing sexually transmitted diseases. Previous research has found that young adults who discuss sexual activity and contraception use with a parent or guardian are more likely to use contraception consistently (Amialchuk & Gerhardinger, 2015; Crosby et al., 2002), affirming the positive relationship found between parental support and relationships and the sexual knowledge TOI.

Item-level covariates indicate that the response style pseudo-items were less easy to endorse for self-oriented items, whereas the TOI pseudo-items were easier to endorse for self-oriented items. The difference between self- and friend-oriented items indicates that self-oriented items elicit fewer extreme and midpoint responses. There are a few potential explanations. Respondents’ projections of their friends’ behavior have been shown to overestimate the similarity between the friends’ behavior and their own (Mirande, 1968), which may explain the relative ease of endorsing the extreme pseudo-items, particularly at the higher end. It is relatively easier for respondents to endorse the TOI for the self-items once response style contamination is removed, possibly indicating that the projection bias other researchers have noted might be attributable to response style differences between friend- and self-items.

Limitations

This study did not use covariates to predict or explain the item discrimination parameters. Incorporating item-level discrimination covariates, such as whether the item is negatively worded or self-oriented, could provide survey development guidance. If item features can be used to explain, for example, whether an item is able to discriminate between individuals with high and low response styles, then the inclusion or exclusion of those features could limit the effects of response style contamination through intentional survey design. Future research should intentionally manipulate features such as response option labels and item wording, among other features, to accomplish this task.

Another limitation is the noncompensatory nature of the IRTree model framework used in this article. In compensatory models, the latent traits combine such that low levels of one latent trait can be offset by higher levels of another latent trait. By contrast, noncompensatory models do not allow this kind of trade-off: low levels of one latent trait cannot be completely offset by an increase in other traits. For example, consider the strongly disagree and strongly agree categories. The noncompensatory nature of the IRTree model implies that these extreme responses cannot be obtained through high, or low, levels of the TOI alone; the respondent must also have high levels of ERS in order to select these categories. This assumption may be unreasonable. That is, the relation between TOI and ERS may be compensatory in nature. Myers and Ames (2019) provide an initial exploration into the compensatory model for response styles, but there is still considerable work to be done in that area.
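The noncompensatory structure can be made concrete with a small sketch of the category probabilities implied by a three-node tree. It assumes a two-parameter logistic form with logit $a θ + b$ at each node and the midpoint-first node ordering; these are illustrative assumptions and may differ in detail from the article's Equations 2 and 3. The function names are hypothetical, and the item values are the Item 1 EAP estimates from Table 5.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def category_probs(theta, a, b):
    """Probabilities of categories 1-5 as products of node probabilities."""
    p_mid = logistic(a["MRS"] * theta["MRS"] + b["MRS"])    # choose midpoint
    p_agree = logistic(a["TOI"] * theta["TOI"] + b["TOI"])  # agree direction
    p_ext = logistic(a["ERS"] * theta["ERS"] + b["ERS"])    # extreme category
    return {
        1: (1 - p_mid) * (1 - p_agree) * p_ext,
        2: (1 - p_mid) * (1 - p_agree) * (1 - p_ext),
        3: p_mid,
        4: (1 - p_mid) * p_agree * (1 - p_ext),
        5: (1 - p_mid) * p_agree * p_ext,
    }

# Item 1 EAP estimates from Table 5 (unconditional IRTree).
a = {"MRS": 2.029, "TOI": 2.518, "ERS": 2.321}
b = {"MRS": -1.404, "TOI": 2.187, "ERS": -1.577}

# Very high trait of interest, but low extreme response style.
theta = {"MRS": 0.0, "TOI": 3.0, "ERS": -2.0}
probs = category_probs(theta, a, b)
for category, p in probs.items():
    print(category, round(p, 4))
```

Because the probability of "strongly agree" is a product that includes the ERS node probability, it can never exceed that node probability, however high the TOI. A compensatory alternative would instead combine the traits additively inside a single logit.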

Despite these limitations, this study presented a generalized IRTree with covariates that can be useful for investigating the multiple decision stages of respondents to ordinal items. The empirical application to the Add Health data set demonstrated the presence of MRS and ERS response styles in the data. For those respondents with high response style scores, using a total score or latent trait from a unidimensional IRT model, such as the graded response model, will be misleading. In the context of Add Health, the presence of high levels of ERS and MRS could influence the findings on the effectiveness of interventions and bias the conclusions regarding correlates with sexual knowledge. The covariate IRTree is an important methodological tool that explains response style heterogeneity and detects the presence of response styles in the data. The model is flexible and can account for covariates at the item and respondent levels, proving useful in a variety of applications.

Supplemental Material

Supplemental material, Supplementary_File_Appendix for Explaining Variability in Response Style Traits: A Covariate-Adjusted IRTree by Allison J. Ames and Aaron J. Myers in Educational and Psychological Measurement

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Allison J. Ames

References

Adams

D. J.

Bolt

D. M.

Deng

Smith

S. S.

Baker

T. B.

(2019). Using multidimensional item response theory to evaluate how response styles impact measurement. British Journal of Mathematical and Statistical Psychology, 72(3), 466-485. https://doi.org/10.1111/bmsp.12169

Amialchuk

Gerhardinger

(2015). Contraceptive use and pregnancies in adolescents’ romantic relationships: Role of relationship activities and parental attitudes and communication. Journal of Developmental & Behavioral Pediatrics, 36(2), 86-97. https://doi.org/10.1097/DBP.0000000000000125

Ames

A. J.

Myers

A. J.

(2020, April 4-8). Explaining variability in extreme response style traits: A covariate-adjusted IRTree [Paper presentation]. Annual Meeting of the National Council on Measurement in Education, Toronto, Ontario, Canada.

Bachman

J. G.

O’Malley

P. M.

(1984). Black-white differences in self-esteem: Are they affected by response styles? American Journal of Sociology, 90(3), 624-639. https://doi.org/10.1086/228120

Bachman

J. G.

O’Malley

P. M.

Freedman-Doan

(2010). Response styles revisited: Racial/ethnic and gender differences in extreme responding (Monitoring the Future Occasional Paper No. 72). Institute for Social Research.

Bandalos

Ames

Spratto

(2019, April 4-8). Effects of category labeling on response choice: An IRTree analysis [Paper presentation]. National Council on Measurement in Education 2019 Annual Meeting, Toronto, Ontario, Canada.

Baumgartner

Steenkamp

J.-B. E. M

. (2001). Response styles in marketing research: A cross-national investigation. Journal of Marketing Research, 38(2), 143-156. https://doi.org/10.1509/jmkr.38.2.143.18840

Böckenholt

(2012). Modeling multiple response processes in judgment and choice. Psychological Methods, 17(4), 665-678. https://doi.org/10.1037/a0028111

Böckenholt

(2017). Measuring response styles in Likert items. Psychological Methods, 22(1), 69-83. https://doi.org/10.1037/met0000106

10.

Böckenholt

(2019). Assessing item-feature effects with item response tree models. British Journal of Mathematical and Statistical Psychology, 72(3), 486-500. https://doi.org/10.1111/bmsp.12163

11.

Böckenholt

Meiser

(2017). Response style analysis with threshold and multi-process IRT models: A review and tutorial. British Journal of Mathematical and Statistical Psychology, 70(1), 159-181. https://doi.org/10.1111/bmsp.12086

12.

Bolt

D. M.

Johnson

T. R.

(2009). Addressing score bias and differential item functioning due to individual differences in response style. Applied Psychological Measurement, 33(5), 335-352. https://doi.org/10.1177/0146621608329891

13.

Bolt

D. M.

Kim

J. S.

(2014). Measurement and control of response styles using anchoring vignettes: A model-based approach. Psychological Methods, 19(4), 528-541. https://doi.org/10.1037/met0000016

14.

Bolt

D. M.

Newton

J. R.

(2011). Multiscale measurement of extreme response style. Educational and Psychological Measurement, 71, 814-833. https://doi.org/10.1177/0013164410388411

15.

Cronbach

L. J.

(1946). Response sets and test validity. Educational and Psychological Measurement, 6(4), 475-494. https://doi.org/10.1177/001316444600600405

16.

Crosby

DiClemente

Wingood

Harrington

Davies

& Hook

M. K.

(2002). Low parental monitoring predicts subsequent pregnancy among African-American adolescent females. Journal of Pediatric and Adolescent Gynecology, 15(1), 43-46. https://doi.org/10.1016/S1083-3188(01)00138-3

17.

Culpepper

Lowery

(2002). Survey response bias among Chinese managers. Academy of Management Proceedings, 2002(1), J1-J6. https://doi.org/10.5465/apbpp.2002.7516876

18.

Danner

Aichholzer

Rammstedt

(2015). Acquiescence in personality questionnaires: Relevance, domain specificity, and stability. Journal of Research in Personality, 57, 119-130. https://doi.org/10.1016/j.jrp.2015.05.004

19.

De Boeck

Partchev

. (2012). IRTrees: Tree-based item response models of the GLMm family. Journal of Statistical Software, 48(1), 1-28. https://doi.org/10.18637/jss.v048.c01

20.

De Boeck

Wilson

. (2004). Explanatory response models. In De Boeck

Wilson

(Eds.), Explanatory item response models (pp. 565-580). Springer. https://doi.org/10.1007/978-1-4757-3990-9

21.

de Jong

M. G.

Steenkamp

J.-B. E. M.

Fox

J.-P.

Baumgartner

. (2008). Using item response theory to measure extreme response style in marketing research: A global investigation. Journal of Marketing Research, 45(1), 104-115. https://doi.org/10.1509/jmkr.45.1.104

22.

Fischer

(1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37(6), 359-74. https://doi.org/10.1016/0001-6918(73)90003-6

23.

Gelman

Rubin

D. B.

(1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457-472. https://doi.org/10.1214/ss/1177011136

24.

Gibbons

Zelner

Reduk

(1999). Effects of language and meaningfulness on the use of extreme response styles by Spanish-English bilinguals. Cross-Cultural Research, 33(4), 369-381. https://doi.org/10.1177/106939719903300404

25.

Harris

. (2009). The National Longitudinal Study of Adolescent to Adult Health (Add Health), Waves I & II, 1994-1996; Wave III, 2001-2002; Wave IV, 2007-2009 [machine-readable data file and documentation]. Carolina Population Center, University of North Carolina at Chapel Hill.

26.

Harzing

(2006). Response styles in cross-national survey research: A 26-country study. International Journal of Cross-Cultural Management, 6(2), 243-266. https://doi.org/10.1177/1470595806066332

27.

Jackson

D. N.

Messick

(1958). Content and style in personality assessment. Psychological Bulletin, 55(4), 243-252. https://doi.org/10.1037/h0045996

28.

Jeon

De Boeck

(2016). A generalized item response tree model for psychological assessments. Behavioral Research Methods, 48(3), 1070-1085. https://doi.org/10.3758/s13428-015-0631-y

29.

Jin

K.-Y.

Wang

W.-C.

(2014). Generalized IRT models for extreme response style. Educational and Psychological Measurement, 74(1), 116-138. https://doi.org/10.1177/0013164413498876

30.

Leland

Barth

(1992). Gender differences in knowledge, intentions, and behaviors concerning pregnancy and sexually transmitted disease prevention among adolescents. Journal of Adolescent Health, 13(7), 589-599. https://doi.org/10.1016/1054-139X(92)90373-J

31.

Leventhal

Stone

(2018). Bayesian analysis of multidimensional item response theory models: A discussion and illustration of three response style models. Measurement: Interdisciplinary Research and Perspectives, 16(2), 114-128. https://doi.org/10.1080/15366367.2018.1437306

32.

Liu

Conrad

F. G.

Lee

(2016). Comparing acquiescent and extreme response styles in face-to-face and web surveys. Quality & Quantity, 51, 941-958. https://doi.org/10.1007/s11135-016-0320-7

33.

Meisenberg

Williams

(2008). Are acquiescent and extreme response styles related to low intelligence and education? Personality and Individual Differences, 44(7), 1539-1550. https://doi.org/10.1016/j.paid.2008.01.010

34.

Merkle

E. C.

Furr

Rabe-Hesketh

(2019). Bayesian comparison of latent variable models: Conditional versus marginal likelihoods. Psychometrika, 84(3), 802-829. https://doi.org/10.1007/s11336-019-09679-0

35.

Mirande

(1968). Reference group theory and adolescent sexual behavior. Journal of Marriage and the Family, 30(4), 572-577. https://doi.org/10.2307/349497

36.

Muthén

L. K.

Muthén

B. O.

(1998-2017). Mplus user’s guide. (8th ed.). Muthén & Muthén.

37.

Myers

A. J.

Ames

A. J.

(2019, July 15-19). Multilevel item response tree for examining heterogeneity in response styles [Paper presentation]. Annual International Meeting of the Psychometric Society, Santiago, Chile.

38.

Naemi

B. D.

Beal

D. J.

Payne

S. C.

(2009). Personality predictors of extreme response style. Journal of Personality, 77(1), 261-286. https://doi.org/10.1111/j.1467-6494.2008.00545.x

39.

Park

(2019). Item response tree models to investigate acquiescence and extreme response styles in Likert-type rating scales. Educational and Psychological Measurement, 79(5), 911-930. https://doi.org/10.1177/0013164419829855

40. Paulhus, D. L. (1991). Measurement and control of response bias. In Robinson, J. P., Shaver, P. R., & Wrightsman, L. S. (Eds.), Measures of social psychological attitudes. Vol. 1: Measures of personality and social psychological attitudes (pp. 17-59). Academic Press. https://doi.org/10.1016/B978-0-12-590241-0.50006-X

41. Plieninger & Heck (2018). A new model for acquiescence at the interface of psychometrics and cognitive psychology. Multivariate Behavioral Research, 53(5), 633-654. https://doi.org/10.1080/00273171.2018.1469966

42. Plieninger & Meiser (2014). Validity of multiprocess IRT models for separating content and response styles. Educational and Psychological Measurement, 74(5), 875-899. https://doi.org/10.1177/0013164413514998

43. Peterson, R. A., Rhi-Perez, & Albaum (2014). A cross-national comparison of extreme response style measures. International Journal of Market Research, 56(1), 89-110. https://doi.org/10.2501/IJMR-2014-005

44. Rammstedt, Goldberg, L. R., & Borg (2010). The measurement equivalence of Big Five factor markers for persons with different levels of education. Journal of Research in Personality, 44(1), 53-61. https://doi.org/10.1016/j.jrp.2009.10.005

45. SAS Institute Inc. (2017). Base SAS® 9.4 procedures guide (7th ed.).

46. Smith, Udry, & Morris (1985). Pubertal development and friends: A biosocial explanation of adolescent sexual behavior. Journal of Health and Social Behavior, 26(3), 183-192.

47. Strack & Martin, L. L. (1987). Thinking, judging, and communicating: A process account of context effects in attitude surveys. In Hippler, H. J., Schwarz, & Sudman (Eds.), Social information processing and survey methodology (pp. 123-148). Springer. https://doi.org/10.1007/978-1-4612-4798-2_7

48. Thissen-Roe & Thissen (2013). A two-decision model for responses to Likert-type items. Journal of Educational and Behavioral Statistics, 38, 522-547. https://doi.org/10.3102/1076998613481500

49. Tourangeau (1984). Cognitive aspects of survey methodology: Building a bridge between disciplines. In Jabine, T. B., Straf, M. L., Tanur, J. M., & Tourangeau (Eds.), Cognitive sciences and survey methods (pp. 73-100). National Academies Press.

50. Tourangeau & Rasinski, K. A. (1988). Cognitive processes underlying context effects in attitude measurement. Psychological Bulletin, 103(3), 299-314. https://doi.org/10.1037/0033-2909.103.3.299

51. Van Vaerenbergh & Thomas, T. D. (2013). Response styles in survey research: A literature review of antecedents, consequences, and remedies. International Journal of Public Opinion Research, 25(2), 195-217. https://doi.org/10.1093/ijpor/eds021

52. van Vlimmeren, Moors, G. B. D., & Gelissen, J. P. T. M. (2017). Clusters of cultures: Diversity in meaning of family value and gender role items across Europe. Quality & Quantity, 51, 2737-2760. https://doi.org/10.1007/s11135-016-0422-2

53. Wang & Nydick, S. W. (2015). Comparing two algorithms for calibrating the restricted non-compensatory multidimensional IRT model. Applied Psychological Measurement, 39(2), 119-134. https://doi.org/10.1177/0146621614545983

54. Zhang, Tao, Wang, & Shi, N.-Z. (2019). Bayesian model selection methods for multilevel IRT models: A comparison of five DIC-based indices. Journal of Educational Measurement, 56(1), 3-27. https://doi.org/10.1111/jedm.12197

Supplementary Material

Please find the following supplemental material available below.

