Assessing Dimensionality of the Ideal Point Item Response Theory Model Using Posterior Predictive Model Checking

Abstract

Although the use of ideal point item response theory (IRT) models for organizational research has increased over the last decade, the assessment of construct dimensionality of ideal point scales has been overlooked in previous research. In this study, we developed and evaluated dimensionality assessment methods for an ideal point IRT model under the Bayesian framework. We applied the posterior predictive model checking (PPMC) approach to the most widely used ideal point IRT model, the generalized graded unfolding model (GGUM). We conducted a Monte Carlo simulation to compare the performance of item pair discrepancy statistics and to evaluate the Type I error and power rates of the methods. The simulation results indicated that the Bayesian dimensionality detection method controlled Type I errors reasonably well across the conditions. In addition, the proposed method showed better performance than existing methods, yielding acceptable power when 20% of the items were generated from the secondary dimension. Organizational implications and limitations of the study are further discussed.

Keywords

dimensionality assessment posterior predictive model checking item response theory ideal point generalized graded unfolding model Monte Carlo simulation

In industrial-organizational psychology, there has been growing interest in measuring noncognitive constructs, such as attitude, personality, emotional intelligence, and vocational interests, which aim to predict valued employee outcomes, such as training performance (Barrick et al., 2001; Schmidt & Hunter, 1998), adaptive performance (Huang et al., 2014), motivation (Judge & Ilies, 2002), and overall job performance (Barrick et al., 2001; Judge et al., 2013). Researchers also have shown that noncognitive constructs reduce the adverse impacts typically associated with measures of cognitive ability (Hough et al., 2001; Ployhart & Holtz, 2008).

To construct reliable and valid noncognitive measures and evaluate their quality, an item response theory (IRT) methodology has been widely utilized. Based on item response process assumptions, researchers have developed and implemented two types of IRT models: dominance and ideal point. Dominance IRT models assume that the item response probability monotonically increases as an examinee's latent trait increases. This assumption is well accepted in cognitive and noncognitive assessments where examinees’ maximum performance and abilities are measured. Ideal point IRT models assume nonmonotonic item response probability, where an examinee endorses or agrees with an item only if the examinee's latent trait is close to the location of the item. Ideal point models have recently gained attraction in organizational research because some research suggested that the ideal point models characterize the examinee's endorsement process better than dominance models for measures that require introspection and self-reflection (Drasgow et al., 2010; Nye et al., 2020).

Although the use of ideal point models for organizational research has increased over the last decade (Foster et al., 2017), the assessment of construct dimensionality of ideal point scales has been overlooked in previous research. For example, researchers presumably applied conventional methods for assessing dimensionality developed based on the dominance response process to ideal point scales. However, this approach may not be appropriate, because it could result in an artificial scale dimension from ideal point scales (Davison, 1977; Tay & Drasgow, 2012a). Furthermore, the dimensionality assessment methods for ideal point scales have not been systematically evaluated. Only a few studies attempted to develop and evaluate the methods under various conditions, but the techniques require manual statistical programming and intensive computation (Carter & Zickar, 2011). User-friendly software for assessing the dimensionality of ideal point scales is also not readily available for organizational researchers.

In principle, an evaluation of dimensionality is a prerequisite for constructing a scale, because the reliability and validity of measures are heavily dependent on the construct dimensionality of the scales. Therefore, we aim to develop appropriate dimensionality assessment methods for ideal point scales and examine the efficacy of the methods under various conditions via a Monte Carlo simulation. In addition, we show the feasibility of the developed dimensionality assessment methods using an empirical example and demonstrate the procedure for assessing the dimensionality of ideal point scales.

Ideal Point Modeling in Organizational Research

Ideal point IRT models have been used in many organizational studies (Foster et al., 2017; Nye et al., 2020). Ideal point IRT modeling has several advantages for addressing important issues. For example, ideal point models are flexible for measuring a wide range of latent traits, including positive (or negative) and intermediate items. Intermediate items, especially, have shown several advantages for noncognitive assessment in organizational settings because these items provide more information for examinees who have moderate latent traits and increase the reliability of the scores (Chernyshenko et al., 2007; LaPalme et al., 2018). Intermediate items refer to statements designed to measure a neutral value on the construct latent trait continuum (Drasgow et al., 2010). An example of intermediate item on the extraversion scale in a personality assessment is “I don’t mind attending social events”. This item could be endorsed by moderate-level introverts and extraverts because the statement does not explicitly describe the extreme extraversion trait, and moderate-level examinees would perceive the statement as describing themselves accurately (Stark et al., 2006). Intermediate items are generally known to be less discriminating and have lower scale reliability under the dominance modeling framework. However, with ideal point modeling, intermediate items can be analyzed with high reliability and better model fit (Cao et al., 2015).

Moreover, intermediate items have been proven to be more fake-resistant than extreme positive or negative items (Liu & Zhang, 2020; Zhang et al., 2020). In organizational settings, faking has been a major issue: Researchers have debated whether it occurs and whether it matters (Birkeland et al., 2006; Donovan et al., 2014). Faking is defined as examinees’ attempt to distort their latent traits to be more favorably perceived (Kuncel & Borneman, 2007). Although there are mixed results in the literature (Ellingson et al., 2007), faking responses generally have negative impacts on the psychometric properties of scales, rank order in personnel selection, and the overall utility of noncognitive measures (Birkeland et al., 2006; Mueller-Hanson et al., 2003; Rosse et al., 1998). Several methods have been suggested for preventing faking in high-stakes settings, and researchers have found that including intermediate items can effectively prevent faking (Liu & Zhang, 2020; O'Brien & LaHuis, 2011; Scherbaum et al., 2013; Zhang et al., 2020). For example, Harris et al. (2021) empirically showed that motivated test takers (i.e., the faking condition) perceived intermediate items were more difficult to respond to in the “socially desirable” way than dominance items (i.e., positive or negative items). In accordance with these advantages, several empirical and simulation studies utilized ideal point models and showed the benefits of ideal point modeling for noncognitive assessments (Cao et al., 2015; Carter & Dalal, 2010; Drasgow et al., 2010; LaPalme et al., 2018; Stark et al., 2006; Tay et al., 2009).

Generalized Graded Unfolding Model

In applied settings, the generalized graded unfolding model (GGUM; Roberts et al., 2000, 2002) is the most widely used ideal point IRT model because the model is flexible enough to fit dichotomous and polytomous responses, and user-friendly software, GGUM2004 (Roberts et al., 2006), is publicly available. To describe the generalized formulation of the GGUM, let $X_{i j}$ be the observed item response from the i^th examinee to the j^th item (ranging from 0 to $C$ ). Then, the GGUM probability function is
$P (X_{i j} = x | θ_{i}) = \frac{\exp (α_{j} [x (θ_{i} - δ_{j}) - \sum_{k = 0}^{x} τ_{j k}]) + \exp (α_{j} [(M - x) (θ_{i} - δ_{j}) - \sum_{k = 0}^{x} τ_{j k}])}{\sum_{w = 0}^{C} \exp (α_{j} [w (θ_{i} - δ_{j}) - \sum_{k = 0}^{w} τ_{j k}]) + \exp (α_{j} [(M - w) (θ_{i} - δ_{j}) - \sum_{k = 0}^{w} τ_{j k}])}$
(1)
where $θ_{i}$ is the latent trait of the i^th examinee, $α_{j}$ is discrimination, $δ_{j}$ is the location, and $τ_{j k}$ is the k^th subjective response category threshold parameters of the j^th item. M is defined as $2 C + 1$ . The GGUM response probability increases as the distance between the latent trait and the location parameters $(| θ_{i} - δ_{j} |)$ decreases, and the maximum probability is observed at $θ_{i} = δ_{j}$ . Similarly, the GGUM response probability decreases as $| θ_{i} - δ_{j} |$ increases, and the minimum probability is reached when $| θ_{i} - δ_{j} |$ is the largest. For a more detailed explanation of GGUM probability functions and general ideal point IRT models, readers are referred to a series of GGUM works (Roberts et al., 2000, 2002).

Practical Issues with Ideal Point Scale Dimensionality

One of the major assumptions for a construct assessment is the unidimensionality of the scale. Violation of the assumption can substantially distort the reliability and the validity of scales and decrease the precision of parameter estimation, resulting in substantial bias and measurement error in the model. Conventional dimensionality assessment methods, such as factor analysis (FA) or principal component analysis (PCA), have been considered in organizational settings; however, these methods might not be appropriate for ideal point scales because the conventional methods assume a linear, monotonically increasing relationship between latent traits and the response probability (Carter & Zickar, 2011; Tay & Drasgow, 2012a). Previous literature has discussed the inappropriateness of conventional dimensionality assessment methods for ideal point scales due to the unfolding/nonmonotonic nature (Davison, 1977; Maraun & Rossi, 2001; van Schuur & Kiers, 1994). For example, it has been shown that when unidimensional ideal point scales are analyzed with conventional dimensionality assessment methods that inherently assume the dominance response process, an additional factor can be yielded (Spector et al., 1997; Tay & Drasgow, 2012a).

To illustrate how conventional methods could mislead the dimensionality assessment results for ideal point scales (also according to the suggestion of an anonymous reviewer), two unidimensional data based on dominance and ideal point processes were generated. The dominance data were generated using the two-parameter logistic (2PL) model, and the ideal point data were generated using the GGUM¹. The data included 20 items and 1,000 examinees. PCA was applied to the data, and scree plots were created using the principal components. Figure 1 shows the scree plot results for unidimensional dominance and ideal point data. The figure clearly shows that for the dominance data, one strong factor was found and explained more than 25% of the total variance. In contrast, for the ideal point data, two major principal components were found, and both explained more than 10% of the total variance. This illustration emphasizes previous research where an additional artificial factor can be found in dimensionality assessment with ideal point scales. Tay and Drasgow (2012a), for example, showed that when scales are constructed to measure positive and negative traits on the same latent trait continuum, the midpoint-level items are endorsed by positive and negative trait responders. Thus, these midpoint-level items are categorized into the latent variable that is distinct from the positivity-to-negativity dimension. Figure 2 further illustrates the PCA results for multidimensional data. Similarly, multidimensional data were generated based on the assumption of dominance and ideal point processes, and 40% of the total items were generated from the secondary dimension, which correlates the primary dimension with .40. As shown in Figure 2, two factors were detected with PCA from the dominance data, whereas three factors were detected from the ideal point data. Specifically, it was more challenging to identify the distinct factors for ideal point data because the remaining principal components still explained a large proportion of the total variance. This illustration highlights that conventional dimensionality assessment methods not only fail to detect the secondary construct dimension but also could lead to a misinterpretation of the dimensionality assessment for ideal point data.

Figure 1.
Scree plots from principal component analysis (PCA) for unidimensional dominance and ideal point data.

Figure 2.
Scree plots from principal component analysis (PCA) for multidimensional dominance and ideal point data.

Another example of the misuse of dimensionality assessment for ideal point scales is related to attitude assessment. Specifically, misinterpretation of ideal point dimensionality may have amplified the controversy between the bipolar model and the bivariate model in attitude assessment. After Thurstone (1928) first conceptualized that attitude can be measured, the attitude dimension was well understood with the bipolarity structure (e.g., sadness to happiness, conservative to liberal, and evil to good). However, researchers argued that the bipolarity structure of attitudes assumes a reciprocal relationship between positive and negative traits (i.e., mutually exclusive) and does not explicitly explain attitudinal ambivalence (Cacioppo & Berntson, 1994). Ambivalence occurs when an examinee makes positive and negative evaluations of an attitude object; thus, the examinee tends to agree on two opposite attitude statements (McGrane, 2019). One of the reasons researchers advocated the bivariate model is that ambivalence can be explained by imposing an additional dimension. Researchers provided psychometric evidence that supports the bivariate model with studies in which an additional dimension was found with the factor analytic approach (Cacioppo et al., 1997). However, given that studies showed attitudes can be better explained using the ideal point model (Andrich & Styles, 1998), dimensionality assessment based on the dominance assumption, such as FA, could have led to the misinterpretation of ideal point scale dimensionality. The additional dimension triggered by conventional methods may not appropriately explain attitudinal ambivalence but was created artificially. Recent studies also have shown that coendorsement of positive and negative statements can better be explained by the bipolarity structure of attitudes using the unidimensional ideal point model, especially for problems associated with the dimensional structure of affect (Tay & Kuykendall, 2017) and the construct continuum specification (Tay & Jebb, 2018). However, we carefully note that the goal of the present study is not to convince readers of the bipolarity structure of attitudes but rather focus on the methodological issue of ideal point scale dimensionality in the context of organizational research.

Assessment of Ideal Point Scale Dimensionality

Although the conventional methods were mainly developed based on the dominance assumption, several studies attempted to develop dimensionality assessment methods for ideal point scales. Habing et al. (2005), for example, previously investigated the performance of Q3 statistics (Yen, 1984) and proposed a modified version of the Q3 statistic for the GGUM. The modified Q3 statistic corrects the direction of residuals for extreme items (i.e., high or low location items) that could produce the opposite sign due to the nonmonotonicity of the GGUM. The simulation results indicated that the modified Q3 statistic controls Type I error and produces high power to detect the violation of unidimensionality when extreme location items are used. However, the performance of the modified Q3 statistic is still in question for nonextreme items because of the bias in the modified Q3 statistic (Habing et al., 2005). In addition, Carter and Zickar (2011) investigated the impact of a violation of unidimensionality with the GGUM and showed that item and latent trait parameters are severely biased if the scale includes secondary dimension items. Wang and Wu (2016) also developed a confirmatory multidimensional GGUM and showed the efficacy of the model. However, previous studies mainly focused on the impact of the violation of unidimensionality and the development of a confirmatory multidimensional model, and a method for assessing dimensionality with the GGUM is still in its infancy. Habing et al.’s (2005) study was limited to Q3 statistics, and the results were not favorable for intermediate-level items. Many statistics other than Q3 are also available for dimensionality assessment, and it would be useful for organizational researchers if more than one-dimensionality assessment method were developed and compared for various contexts. Tay and Drasgow also emphasized that “we need to develop models of nonlinear factor analysis that go beyond common link functions (e.g., log, logit, or inverse) to take into account the unfolding relationship between indicators and the latent variable” (2012a, p. 380).

To achieve this goal, in the present study, we developed dimensionality assessment methods for the GGUM under the Bayesian framework. Specifically, we adapted the posterior predictive model checking (PPMC) procedure to assess the unidimensionality of the GGUM scale. In the next section, we present the PPMC method and its discrepancy statistics, and describe the design, procedure, and analysis methods of the simulation study.

Posterior Predictive Model Checking

The PPMC method (Gelman et al., 1996) was originally developed to evaluate model fit by computing the discrepancy between observed data and the fitted model using the posterior predictive distribution. The posterior predictive distribution refers to the predictive distribution of replicated data conditional on the observed data (Sinharay et al., 2006). If the observed data fit the model, the replicated data generated under the model should look similar to the observed data (Gelman et al., 1996).

One of the advantages of the PPMC method is that any type of conventional statistic can be used to evaluate the discrepancy between the observed data and the model. Specifically, using the selected discrepancy statistic D, one can conduct a specific diagnostic analysis with the observed data (Berkhof et al., 2004; Levy et al., 2009). For example, item pair statistics, such as Q3 or the odds ratio (OR), can be implemented as discrepancy statistics to assess the unidimensionality of the scale because they are sensitive in detecting multidimensionality and local dependence among items (Chen & Thissen, 1997). Furthermore, researchers can make statistical inferences about population parameters by computing posterior predictive p-values (PPP). A PPP value closer to .5 indicates the observed data fit the model adequately, and a value closer to 0 or 1 indicates the model does not fit the observed data adequately. To implement the feature of p-value in classical hypothesis testing, it has been suggested using $α / 2$ and $1 - α / 2$ as tail-area probabilities for rejecting the null hypothesis of the adequate model fit.

Based on previous studies (Drasgow et al., 1995; Habing et al., 2005; Levy et al., 2009; Nye et al., 2020), we chose the following discrepancy statistics to assess the scale dimensionality of the GGUM: chi-squares $(X^{2} and G^{2})$ , OR, covariance (COV), model-based covariance (MBC; Reckase, 1997), and Q3 (Yen, 1984). Detailed mathematical descriptions of the PPMC method and discrepancy statistics are provided in the Appendix.

We considered the two types of chi-square statistics $(X^{2} and G^{2})$ as discrepancy statistics because they are commonly used model fit statistics in IRT literature. For this study, we specifically chose the item pair chi-squares because they were developed for detecting local dependency under the classical statistics framework (Chen & Thissen, 1997). However, chi-square statistics are generally sensitive to large sample sizes and tend to inflate Type I error under the null condition (Tay & Drasgow, 2012b). This may not be the case for the PPMC dimensionality methods because the conventional features of model fit statistics do not hold for PPMC in general. Specifically, in PPMC, the discrepancy statistics characterize the replicated and observed data separately, and statistical inference is made using the empirical posterior predictive distribution from the replicated data instead of the theoretical sampling distribution. Therefore, increasing the sample size does not influence the empirical posterior predictive distribution, and the statistical inference remains consistent (Levy et al., 2009; Nye et al., 2020).

In this study, we also considered the OR, COV, and MBC as the discrepancy statistics. Previous studies with the PPMC method found that the OR is effective for detecting model-data misfit induced by local dependencies among item pairs (Sinharay et al., 2006). The OR produces high power to detect item misfit under the PPMC framework. One advantage of using the OR statistic and the COV matrix as discrepancy statistics for PPMC is that they do not require item parameter estimates for computing PPP. This implies that the statistical inference with the OR and the COV matrix would be consistent regardless of the accuracy of the item parameter estimates.

In addition, the COV and MBC matrices are useful statistics for detecting multidimensionality by computing the association among item pairs. The two approaches differ in how they compute the expected frequency. The COV matrix computes the expected frequency from the mean of the observed response counts. The MBC matrix, however, computes the expected frequency based on the fitted model using Equation 1. This difference plays an important role in the ideal point IRT model. Habing et al. (2005) have discussed that dimensionality assessment methods which use the conditional covariance matrix (e.g., DIMTEST) may not be appropriate for the ideal point model because the procedure assumes a monotonic relationship between the latent trait and the item response (Stout, 1990; Zhang & Stout, 1999). The conditional covariance matrix may distort the measure of item pair association for intermediate items because the mean of the observed response counts does not properly reflect the expected frequency. To address this issue, we incorporated the MBC matrix where the expected frequency can be computed directly from the fitted model.

Finally, we included Q3 as the discrepancy statistic in this study. Q3 is a widely used statistic for assessing dimensionality under the classical statistic framework. One criticism of the Q3 statistic is that there is no proper sampling distribution for making statistical inferences. Yen (1984) suggested using Fisher's z-transformation because the Q3 statistic with the z-transformation asymptotically follows the standard normal distribution if the model is correct. However, Chen and Thissen (1997) provided empirical evidence that Q3 with the z-transformation does not approximate the standard normal distribution, especially in the tail probabilities, and Type I errors inflate when the sample size increases. However, the lack of a sampling distribution for Q3 can also be overcome with PPMC by empirically constructing the posterior predictive distribution (Sinharay et al., 2006).

Methods

Simulation Design

To evaluate the performance of PPMC dimensionality assessment methods for the GGUM, we conducted a Monte Carlo simulation study. We included various simulation designs by manipulating the data-generating conditions: (a) sample size (500, 1,000, and 2,000), (b) test length (10 and 20), (c) dimensional correlation (0, 0.3, and 0.6), (d) range of discrimination parameter (low, medium, and high), and (e) proportion of multidimensionality (20% and 40%). A total of 3 × 2 × 3 × 3 × 2 = 108 simulation conditions were considered, and for each condition, 50 replicated data were generated². We fixed the response scale as dichotomous in this study. The rationale for the specific values in each condition is described as follows.

Sample Size and Test Length

The sample size was varied across 500, 1,000, and 2,000, and the test length was varied from 10 to 20 items. The smallest sample size was 500 because previous studies have shown that item parameters of the GGUM are accurately estimated as low as 500. With the Bayesian Markov chain Monte Carlo (MCMC) estimation method, previous simulation studies reported that a sample size of 400 can produce minimal bias for item parameter estimates of the GGUM (de la Torre et al., 2006; Joo et al., 2017; Roberts & Thompson, 2011). Given that the accuracy of item parameter estimation improves as the sample size increases, we expect that the efficacy of the PPMC dimensionality assessment methods would be most apparent in large sample size conditions. We also varied the number of items on a test from 10 to 20. In practical settings, many noncognitive tests are relatively short, and typical measures developed with the GGUM have test lengths ranging from 10 to 20 (Roberts et al., 1999; Tay & Drasgow, 2012a, 2012b). We believe the range of the sample size and the test length considered in this study is commonly observed in practice, and the simulation results can be generalized to applied settings.

Dimensional Correlation

To investigate the impact of the dimensional correlation of the multidimensional latent trait structure, we varied the dimensional correlations from 0 to .3 and .6. Levy et al. (2009) illustrated that the dimensional correlation has a significant impact on dimensionality assessment, and the increasing dimensional correlation tends to diminish the effect of multidimensionality. This is especially true if the multidimensional scales are created with secondary dimension items in addition to primary dimension items. Previous studies (Carter & Zickar, 2011; Habing et al., 2005) also empirically supported that increasing the dimensional correlation of the latent trait structure substantially decreases the impact of multidimensionality. For example, Carter and Zickar (2011) showed that the bias of the GGUM item and latent trait parameters decreased considerably as the dimensional correlation increased from 0 to 0.9. They also found that the parameter estimation results in the correlation of the 0.9 conditions were similar to those for unidimensional scales. Based on these results, we expect that the power to detect the violation of unidimensionality improves as the dimensional correlation decreases.

Discrimination Parameters

We varied the range of discrimination parameters in the simulation design from low to medium and high. For each of the three conditions, the alpha parameters of the GGUM (i.e., the discrimination parameters) were randomly drawn from U(0.5, 1), U(1, 1.5), and U(1.5, 2), respectively, for each replication. The values in the uniform distributions were selected based on empirical GGUM studies (Cao et al., 2015; Carter & Dalal, 2010; Tay & Ng, 2018). Note that items with high discrimination parameters produce latent trait parameters that are more accurate and have lower measurement errors. Similarly, it is expected that multidimensional scales with high discrimination parameters have a higher impact than multidimensional scales with low discrimination parameters. Zhang and Stout (1999) and Levy et al. (2009) discussed that increasing discrimination parameter values leads to an increase in the magnitude of the conditional COV structure. We expect that increasing discrimination parameters will increase the detection rate of the dimensionality assessment methods.

Proportion of Multidimensionality

We also varied the proportion of the items exhibiting multidimensionality from 20% to 40%. Based on previous studies (Carter & Zickar, 2011; Levy et al., 2009), the proportion of the secondary dimension items affects the detection rate considerably. When the number of secondary dimension items is relatively small, the latent trait vectors from two dimensions are highly distinguishable, and the angle between two vectors becomes large. In contrast, as the number of secondary dimension items increases, the distinction (i.e., angle) between two latent trait vectors diminishes (Levy et al., 2009; Zhang & Stout, 1999). As a result, the Type I error rate (i.e., falsely detecting multidimensionality) would increase, and the power rate (i.e., correctly detecting multidimensionality) would decrease. We chose 20% and 40% as the simulation conditions because these values were typically observed in several studies (Carter & Zickar, 2011; Habing et al., 2005; Levy et al., 2009). Therefore, the performance of the dimensionality detection methods from the present study can be compared directly.

Data Generation

For the simulation study, we benchmarked generating the parameters of the GGUM from previous studies (Carter & Zickar, 2011; de la Torre et al., 2006; Joo et al., 2017; Roberts & Thompson, 2011; Roberts et al., 2000, 2002). The discrimination parameters $(α)$ for all items were randomly generated in accordance with the simulation conditions (e.g., low [U(0.5, 1.0)], medium [U(1.0, 1.5)], and high [U(1.5, 2.0)] discrimination parameter condition). The location parameters $(δ)$ were simulated to range from −2.0 to + 2.0 with equal spacing, making the first item location parameter −2.0 and the last item location parameter 2.0. The range for generating location parameters was determined based on Thurstone’s (1928) principle of attitude scale construction strategy. In addition, the threshold parameters $(τ)$ were randomly generated from U(−3.0, −0.5), reflecting substantive studies with the GGUM (Carter & Zickar, 2011; Stark et al., 2006; Tay et al., 2009; Zhang et al., 2020).

To generate the multidimensional scale, we followed the procedure in previous studies that investigated scale dimensionality (Carter & Zickar, 2011; Levy, 2011; Levy et al., 2009). First, we generated a two-dimensional latent trait matrix from a multivariate normal distribution with a zero mean vector and a variance–covariance matrix. The off-diagonal elements of the variance–covariance matrix were set to 0, and the diagonal elements of the variance–covariance matrix were varied depending on the simulation study conditions (e.g., the dimensional correlation equaled 0, 0.3, or 0.6). The first column of the latent trait matrix was considered the primary dimension, and the second column of the latent trait matrix was considered the secondary dimension. Then, the item responses were generated based on the latent trait dimensions.

The secondary dimension items selected for the GGUM varied based on the simulation study design³. For the 20% multidimensionality with the 10-item test condition, we randomly selected two items as the secondary dimension items and selected the remaining eight items as the primary dimension items. For the 40% multidimensionality with the 10-item test condition, we randomly selected four items as the secondary dimension items. Similarly, we selected four items as the secondary dimension items in the 20% multidimensionality with the 20-item test condition and randomly selected eight items as the secondary dimension items in the 40% multidimensionality with the 20-item test condition. The random selection of the secondary dimension items varied across replications to reduce the impact of generating the item parameters of the GGUM.

Parameter Estimation and PPMC Implementation

The parameters of the GGUM were estimated with the MCMC estimation procedure (Patz & Junker, 1999). Specifically, the Metropolis–Hastings within Gibbs sampling method was implemented. The four-parameter beta distribution, B(a, b, u, v), was specified as a prior distribution for the item parameters of the GGUM. Note that four-parameter beta distribution can mimic widely used distributions, such as normal and log-normal distributions, by manipulating its shape (a, b) and range (u, v) parameters (Zeng, 1997). As used by de la Torre et al. (2006) and Joo et al. (2017), the four-parameter beta distributions B(1.50, 1.50, 0.25, 4.00), B(2, 2, −5, 5), and B(2, 2, −6, 6) were used for the alpha, delta, and tau parameters of the GGUM. For the latent trait parameter, standard normal distribution, N(0, 1) was specified for the MCMC estimation algorithm.

The Metropolis–Hastings with Gibbs sampling method also requires specifying initial values for the parameters to be estimated. The initial values were generated separately for each parameter of the GGUM. For the alpha parameters, we set 1 as the initial value across all items. For the delta parameters, we set −3 as the initial value for the first item and specified values in an increment of 6/(J−1), where J is the number of items, for the successive items, such that the initial value for the last item was specified as 3. For the tau parameters, we fixed the initial value at −1 for all items. For the latent trait parameters, we randomly generated samples from N(0, 1) and specified the samples as the initial values for all parameters. The strategy for the initial value specification is consistent with previous ideal point IRT model studies (de la Torre et al., 2006; Joo et al., 2019).

After specifying initial values, we estimated the item and latent trait parameters of the GGUM simultaneously with the MCMC algorithm. We considered multiple chains (C = 3) in this study. For each chain, a total of 20,000 iterations were performed, and the first 10,000 samples were discarded as the burn-in period. After the burn-in period, samples across the chains were used to compute the parameter estimates. The number of iterations, burn-in period, and chains were determined based on the convergence check computed from $\hat{R}$ statistics (Gelman & Rubin, 1992). The $\hat{R}$ statistic was computed for all parameters and evaluated with the practical criterion where $\hat{R}$ less than 1.2 is considered as convergence. We took the average from the posterior samples to compute the parameter estimates and the standard deviation of the posterior samples to compute the posterior standard deviation (PSD) of the estimates.

For the dimensionality assessment, we implemented PPMC by adapting the method described by Sinharay et al. (2006) in which the model is evaluated using the posterior predictive distribution. The computation of the posterior predictive distribution is described as follows.
After the burn-in period, we drew Markov chain samples of the item and latent trait parameters from the posterior distribution.

We generated replicated response data, $X^{r e p}$ , using the item and latent trait parameters sampled from the posterior distributions, denoted as $ω^{r e p}$ .

We computed the discrepancy statistics using the observed data $D^{o b s} = D (X, ω^{r e p})$ and the replicated data $D^{r e p} = D (X^{r e p}, ω^{r e p})$ . As an example, suppose D represents the chi-square statistic. Then, we obtained the chi-squares from each data set $X^{r e p}$ and $X^{o b s}$ for computing $D^{r e p}$ and $D^{o b s}$ , respectively. In the computation of the chi-squares, we used the GGUM item and latent trait parameters sampled from the posterior distributions.

Finally, we compared the discrepancy statistics $D^{r e p}$ and $D^{o b s}$ . We computed the PPP values as the proportion of the discrepancy statistics for the replicated data that were greater than the corresponding statistics for the observed data across the posterior samples. Extreme PPP values (i.e., less than .025 or greater than .975) indicated model-data misfit (Sinharay et al., 2006).

Evaluation Criteria

To evaluate the performance of the PPMC dimensionality detection methods, we computed Type I error and power rates. The Type I error rate was obtained by computing the proportion of replications in which PPP falsely produced the extreme value for item pairs that involved the primary dimension. Similarly, the power rates were obtained by computing the proportion of replications in which PPP correctly produced the extreme value for item pairs that involved the secondary dimension. In addition, to investigate the impact of the violation of unidimensionality on the parameter estimates, we computed the bias and the root mean squared error (RMSE) for the item and latent trait parameter estimates and the correlation between generating and estimated (CORR) for the latent trait parameters. We computed the bias and the RMSE as follows:
$Bias = \sum_{r} \frac{ω_{r} - {\hat{ω}}_{r}}{R}$
(2)
$RMSE = \sqrt{\sum_{r} \frac{{(ω_{r} - {\hat{ω}}_{r})}^{2}}{R}}$
(3)
where R is the total number of replications, $ω_{r}$ is the generating parameters, and ${\hat{ω}}_{r}$ is the estimated parameters of the GGUM for the r^th replicated data set. We computed the bias, RMSE, and CORR for each parameter estimate and averaged them across the items (i.e., alpha, delta, and tau) and latent trait values (i.e., theta) to summarize the results.

In addition, we compared the performance of PPMC with that of existing methods. The existing methods can serve as the baseline for evaluating the performance of the PPMC dimensionality assessment methods. We considered commonly used model fit statistics developed for dimensionality assessment. For example, Drasgow et al. (1995) previously introduced several model fit statistics using chi-squares (i.e., singles, doubles, and triples) for dominance IRT models. However, they noted that the chi-square statistics do not approximate the sampling distribution and are sensitive to a large sample. Alternatively, Drasgow et al. (1995) suggested using the chi-square degrees of freedom ratio (χ²/df) for assessing model-data fit. To address the sensitivity of chi-squares to large sample sizes, they also recommended adjusting chi-squares with a fixed sample size (e.g., 3,000; Tay & Drasgow, 2012b). As a heuristic diagnosis, Drasgow et al. (1995) suggested using a cutoff value of 3 for detecting model misfit. Later, Tay and Drasgow (2012b) showed adjusted χ²/df doubles and triples with the cutoff value of 3 are effective for detecting model-data misfit but have little power to detect moderate multidimensionality. Based on the previous findings, we included the adjusted χ²/df doubles in this study.

We also included Yen's Q3 (Yen, 1984; we refer to this statistic as YQ3 to distinguish it from Q3 in the PPMC method) and modified Q3 (MQ3; Habing et al., 2005) as the existing methods. Given that YQ3 statistics with the z-transformation do not accurately approximate the standard normal distribution, we alternatively used .20 as the heuristic cutoff value for detecting multidimensionality following Chen and Thissen’s (1997) recommendation. This cutoff has been used in applied settings (Makransky & Bilenberg, 2014; Reeve et al., 2007) and found to be effective for controlling Type I error under the null condition (Christensen et al., 2017). Data generation, MCMC estimation, PPMC computation, and evaluation of simulation results were accomplished using the statistical program R (R Core Team 2020).

Results

Type I Error Rates

Tables 1 and 2 show the Type I error results for the PPMC dimensionality assessment methods for the GGUM. Note that the Type I error rate is computed as the proportion of incorrectly detecting primary dimension items as secondary dimension items across replications. As shown in Tables 1 and 2, the Type I error rates were well controlled for the 20% multidimensional item conditions and were lower than or equal to the nominal level of .05. The acceptable Type I error rates (< .05) were consistent across conditions, including the sample size, the number of items, the dimensional correlation, and the level of the discrimination parameters. In terms of the discrepancy statistics, MBC and Q3 revealed higher Type I errors than the other statistics. For the 40% multidimensional item conditions; however, the Type I error increased considerably, and we observed values higher than the nominal level in some conditions. As the sample size, the number of items, and the level of the discrimination parameters increased, the Type I error rates increased substantially, and we observed unacceptable Type I error in the 20-item conditions. In addition, the Type I errors of OR, COV, MBC, and Q3 tended to be higher than those of the chi-square statistics (X2 and G2). MBC and Q3 produced the highest Type I error (.27) in the 2,000 sample size, 20-item test, high discrimination parameters, 0.6-dimensional correlation, and 40% multidimensionality item condition.

Table 1.
Type I Error Rates of Posterior Predictive Model Checking (PPMC) Dimensionality Assessment Methods for the 10-Item Conditions.

20% Multidimensional Items 40% Multidimensional Items

Disc COR N X2 G2 OR COV MBC Q3 X2 G2 OR COV MBC Q3

Low 0 500 .00 .00 .01 .01 .01 .01 .01 .01 .03 .03 .03 .03

1,000 .00 .00 .01 .00 .02 .02 .01 .01 .05 .05 .05 .05

2,000 .00 .00 .01 .00 .01 .01 .03 .03 .06 .06 .07 .07

0.3 500 .00 .00 .02 .02 .02 .02 .01 .01 .04 .03 .04 .04

1,000 .00 .00 .00 .00 .02 .02 .01 .01 .04 .04 .05 .05

2,000 .00 .00 .01 .00 .01 .02 .03 .04 .08 .07 .09 .09

0.6 500 .00 .00 .02 .02 .02 .02 .00 .00 .02 .02 .02 .02

1,000 .00 .00 .01 .01 .02 .02 .00 .00 .03 .03 .05 .05

2,000 .00 .00 .01 .00 .01 .01 .00 .00 .03 .01 .05 .05

Medium 0 500 .00 .00 .01 .01 .01 .01 .01 .01 .04 .04 .04 .04

1,000 .00 .00 .01 .01 .02 .02 .04 .04 .07 .07 .08 .08

2,000 .00 .00 .00 .00 .01 .01 .04 .04 .06 .06 .07 .07

0.3 500 .00 .00 .01 .01 .02 .02 .00 .00 .03 .03 .03 .03

1,000 .00 .00 .01 .01 .02 .02 .03 .03 .06 .06 .08 .08

2,000 .00 .00 .01 .00 .02 .02 .06 .07 .10 .09 .12 .12

0.6 500 .00 .00 .01 .01 .02 .02 .00 .00 .03 .03 .03 .03

1,000 .00 .00 .00 .00 .01 .01 .00 .00 .03 .03 .06 .06

2,000 .00 .00 .00 .00 .02 .02 .01 .01 .05 .03 .11 .11

High 0 500 .00 .00 .01 .01 .02 .02 .02 .02 .07 .07 .08 .08

1,000 .00 .00 .00 .00 .01 .01 .05 .05 .07 .07 .08 .08

2,000 .00 .00 .00 .00 .01 .01 .09 .09 .12 .12 .13 .13

0.3 500 .00 .00 .01 .01 .02 .02 .01 .01 .05 .05 .05 .05

1,000 .00 .00 .01 .01 .02 .02 .06 .06 .11 .11 .15 .15

2,000 .00 .00 .00 .00 .02 .02 .07 .07 .12 .12 .17 .17

0.6 500 .00 .00 .01 .01 .02 .02 .00 .01 .04 .04 .05 .05

1,000 .00 .00 .01 .01 .02 .02 .02 .02 .10 .10 .15 .15

2,000 .00 .00 .01 .00 .02 .02 .02 .02 .08 .06 .15 .15

Note: Disc = item discrimination; COR = theta dimension correlation; COV = covariance; MBC = model-based covariance; OR = odds ratio. Values in bold are Type I error greater than .05.

Table 2.
Type I Error Rates of Posterior Predictive Model Checking (PPMC) Dimensionality Assessment Methods for the 20-Item Conditions.

20% Multidimensional Items 40% Multidimensional Items

Disc COR N X2 G2 OR COV MBC Q3 X2 G2 OR COV MBC Q3

Low 0 500 .00 .00 .03 .03 .04 .04 .01 .01 .08 .08 .09 .09

1,000 .00 .00 .01 .01 .05 .05 .04 .04 .10 .09 .12 .12

2,000 .00 .00 .01 .01 .04 .04 .04 .04 .07 .06 .08 .08

0.3 500 .00 .00 .03 .03 .04 .04 .01 .01 .05 .05 .07 .06

1,000 .00 .00 .01 .01 .04 .04 .02 .02 .07 .07 .12 .11

2,000 .00 .00 .01 .01 .04 .05 .02 .02 .08 .07 .12 .12

0.6 500 .00 .00 .04 .04 .05 .05 .00 .00 .04 .04 .05 .05

1,000 .00 .00 .02 .01 .05 .05 .00 .00 .03 .03 .09 .09

2,000 .00 .00 .01 .01 .04 .05 .00 .00 .05 .04 .12 .12

Medium 0 500 .00 .00 .03 .03 .05 .05 .03 .03 .10 .10 .11 .11

1,000 .00 .00 .01 .01 .05 .05 .08 .08 .13 .13 .15 .15

2,000 .00 .00 .01 .01 .04 .04 .03 .03 .06 .06 .09 .09

0.3 500 .00 .00 .03 .03 .05 .05 .02 .02 .09 .09 .11 .11

1,000 .00 .00 .01 .01 .04 .04 .02 .02 .08 .08 .14 .14

2,000 .00 .00 .01 .01 .05 .05 .05 .05 .11 .10 .16 .16

0.6 500 .00 .00 .03 .03 .05 .05 .01 .01 .06 .06 .08 .08

1,000 .00 .00 .02 .02 .05 .05 .01 .01 .07 .06 .13 .13

2,000 .00 .00 .01 .01 .05 .05 .01 .01 .09 .07 .18 .18

High 0 500 .00 .00 .03 .03 .04 .04 .07 .07 .16 .16 .17 .17

1,000 .00 .00 .01 .01 .05 .05 .08 .08 .12 .12 .14 .14

2,000 .00 .00 .01 .01 .04 .04 .06 .06 .10 .09 .12 .12

0.3 500 .00 .00 .03 .03 .05 .05 .03 .03 .13 .13 .15 .15

1,000 .00 .00 .01 .01 .04 .04 .08 .09 .17 .17 .23 .23

2,000 .00 .00 .01 .01 .04 .04 .05 .05 .13 .12 .21 .21

0.6 500 .00 .00 .03 .03 .04 .04 .00 .00 .06 .06 .09 .09

1,000 .00 .00 .02 .02 .05 .05 .01 .01 .09 .09 .18 .18

2,000 .00 .00 .01 .01 .05 .05 .02 .03 .14 .13 .27 .27

Note: Disc = item discrimination; COR = theta dimension correlation; COV = covariance; MBC = model-based covariance; OR = odds ratio. Values in bold are Type I error greater than .05.

Power Rates

Tables 3 and 4 show the power rates of the PPMC dimensionality assessment methods across the simulation conditions. Overall, we observed a pattern consistent with previous dimensionality studies (Carter & Zickar, 2011; Levy, 2011). As expected, increasing the sample size substantially increased the power rates of the dimensionality assessment methods. For example, the marginal power rates for each sample size (i.e., average power rates across simulation conditions) for X2 were .12, .45, and .63 for the 500, 1,000, and 2,000 sample sizes, respectively. In addition, the power rates notably increased as the discrimination parameters increased. When the low discrimination parameters were considered, the marginal power rates for X2, OR, and MBC were .28, .44, and .46, whereas the corresponding values were .53, .66, and .68 when high discrimination parameters were considered. Regarding the dimensional correlation of the latent trait structure, higher correlation resulted in lower power rates for the methods across conditions. The decrease in the power rates was more prominent when the dimensional correlation increased from 0.3 to 0.6. For example, the marginal power rates of G2, COV, and Q3 were .50, .60, and .62 for the 0 correlation conditions. However, for the 0.3 correlation conditions, the corresponding values decreased to .45, .58, and .61, respectively, and for the 0.6 correlations, the corresponding values decreased to .27, .43, and .49. Consistent with the Type I error rates, increasing the number of secondary dimension items decreased the overall power rates. When 20% of the items were generated from the secondary dimension, the marginal power rates for OR and Q3 were .66 and .67, whereas the corresponding values decreased to .44 and .47 when 40% of the items were generated from the secondary dimension. Last, MBC and Q3 overall showed the highest power rates across the discrepancy statistics. For example, the average power rates across the conditions for X2, G2, OR, COV, MBC, and Q3 were .40, .41, .55, .54, .57, and .57, respectively. These results support a previous PPMC dimensionality study (Levy et al., 2009), where the PPP distributions of MBC and Q3 showed more uniformity than the other statistics.

Table 3.
Power Rates of Posterior Predictive Model Checking (PPMC) Dimensionality Assessment Methods for the 10-Item Conditions.

20% Multidimensional Items 40% Multidimensional Items

Disc COR N X2 G2 OR COV MBC Q3 X2 G2 OR COV MBC Q3

Low 0 500 .14 .14 .28 .28 .28 .28 .02 .02 .11 .11 .11 .11

1,000 .50 .52 .70 .68 .72 .70 .29 .30 .41 .41 .41 .41

2,000 .78 .80 .86 .84 .84 .84 .44 .46 .62 .60 .61 .62

0.3 500 .04 .04 .22 .22 .22 .22 .02 .02 .07 .08 .08 .08

1,000 .46 .46 .64 .64 .68 .68 .20 .20 .36 .36 .40 .39

2,000 .72 .76 .90 .88 .90 .90 .38 .39 .54 .51 .56 .56

0.6 500 .04 .04 .12 .12 .12 .12 .01 .01 .05 .06 .07 .07

1,000 .04 .08 .38 .32 .42 .38 .03 .03 .13 .13 .20 .19

2,000 .24 .24 .44 .40 .56 .56 .13 .14 .40 .36 .46 .46

Medium 0 500 .22 .22 .50 .50 .52 .52 .06 .07 .21 .20 .21 .21

1,000 .70 .70 .76 .76 .78 .78 .38 .38 .49 .48 .49 .49

2,000 .84 .84 .92 .90 .92 .92 .65 .65 .76 .74 .74 .75

0.3 500 .04 .04 .34 .32 .34 .32 .06 .06 .17 .17 .17 .18

1,000 .72 .72 .86 .84 .82 .82 .31 .32 .51 .51 .55 .55

2,000 .74 .76 .82 .80 .84 .86 .49 .50 .64 .63 .64 .64

0.6 500 .02 .00 .18 .18 .20 .20 .00 .00 .08 .08 .08 .08

1,000 .46 .48 .70 .66 .72 .72 .10 .10 .28 .27 .38 .38

2,000 .62 .62 .74 .74 .78 .78 .33 .33 .53 .49 .59 .59

High 0 500 .32 .34 .64 .64 .66 .66 .15 .15 .26 .26 .25 .25

1,000 .78 .80 .86 .86 .86 .86 .56 .56 .66 .65 .63 .63

2,000 .86 .88 .92 .90 .92 .92 .64 .65 .72 .70 .73 .73

0.3 500 .28 .28 .60 .58 .60 .60 .14 .14 .27 .27 .28 .29

1,000 .72 .72 .82 .80 .84 .84 .55 .55 .66 .64 .67 .67

2,000 .86 .86 .94 .92 .94 .94 .65 .65 .76 .75 .77 .77

0.6 500 .10 .08 .52 .52 .56 .56 .02 .02 .14 .14 .16 .16

1,000 .62 .62 .78 .74 .76 .76 .27 .27 .50 .50 .57 .58

2,000 .72 .72 .80 .76 .82 .84 .50 .50 .71 .67 .77 .77

Note: Disc = item discrimination; COR = theta dimension correlation; COV = covariance; MBC = model-based covariance; OR = odds ratio. Values in bold are power greater than or equal to .80.

Table 4.
Power Rates of Posterior Predictive Model Checking (PPMC) Dimensionality Assessment Methods for the 20-Item Conditions.

20% Multidimensional Items 40% Multidimensional Items

Disc COR N X2 G2 OR COV MBC Q3 X2 G2 OR COV MBC Q3

Low 0 500 .17 .17 .39 .39 .41 .41 .03 .03 .14 .14 .15 .15

1,000 .56 .56 .70 .66 .71 .72 .30 .30 .47 .45 .46 .47

2,000 .81 .81 .87 .86 .87 .87 .57 .59 .70 .68 .69 .69

0.3 500 .11 .11 .28 .28 .30 .31 .02 .02 .12 .12 .14 .14

1,000 .57 .57 .74 .72 .75 .75 .24 .24 .46 .45 .50 .50

2,000 .69 .69 .83 .79 .82 .82 .50 .51 .70 .67 .72 .73

0.6 500 .04 .04 .20 .20 .23 .23 .00 .00 .07 .07 .10 .10

1,000 .22 .23 .46 .45 .51 .51 .04 .04 .22 .21 .30 .30

2,000 .54 .55 .76 .73 .76 .76 .15 .18 .45 .41 .55 .55

Medium 0 500 .31 .32 .53 .52 .54 .55 .14 .14 .28 .27 .29 .29

1,000 .74 .75 .82 .81 .85 .85 .41 .41 .54 .52 .55 .55

2,000 .83 .83 .90 .87 .88 .88 .68 .69 .79 .77 .79 .79

0.3 500 .25 .26 .46 .46 .49 .49 .04 .04 .17 .17 .20 .20

1,000 .66 .66 .78 .77 .79 .79 .45 .46 .64 .62 .66 .67

2,000 .79 .80 .85 .85 .88 .88 .66 .67 .78 .76 .80 .80

0.6 500 .08 .08 .26 .26 .31 .30 .01 .01 .09 .09 .12 .12

1,000 .47 .48 .65 .62 .71 .71 .12 .12 .36 .34 .45 .45

2,000 .72 .73 .81 .80 .85 .85 .36 .37 .64 .60 .71 .71

High 0 500 .48 .48 .63 .62 .64 .63 .22 .22 .35 .35 .36 .36

1,000 .86 .86 .89 .88 .90 .90 .62 .62 .71 .70 .72 .73

2,000 .93 .93 .92 .91 .91 .91 .78 .78 .84 .83 .85 .85

0.3 500 .49 .49 .66 .66 .68 .68 .11 .11 .28 .28 .31 .31

1,000 .87 .87 .91 .91 .92 .92 .57 .58 .73 .71 .76 .76

2,000 .90 .90 .93 .91 .92 .92 .80 .80 .87 .85 .88 .88

0.6 500 .17 .17 .40 .39 .43 .42 .02 .02 .14 .14 .17 .18

1,000 .63 .63 .74 .74 .78 .78 .33 .33 .58 .56 .66 .66

2,000 .79 .80 .85 .84 .87 .88 .62 .62 .78 .76 .83 .83

Note: Disc = item discrimination; COR = theta dimension correlation; COV = covariance; MBC = model-based covariance; OR = odds ratio. Values in bold are power greater than or equal to .80.

Comparison with the Existing Methods

Table 5 shows the Type I error rates of the existing dimensionality assessment methods across the simulation conditions. Overall, the results show that the existing dimensionality methods controlled Type I error reasonably well, and YQ3 and MQ3 generally outperformed the χ²/df ratio. For the 20% multidimensional item conditions, the Type I error rates were well controlled across all methods. However, for the 40% multidimensional item conditions, the χ²/df ratio showed inflated Type I error results when the dimensional correlations were 0 and .3. YQ3 and MQ3 also showed inflated Type I error results when the number of items was 10, the dimensional correlation was 0, and the discrimination parameters were high.

Table 5.
Type I Error Rates of Adjusted χ2/df Ratio, Yen's Q3, and Modified Q3 Dimensionality Assessment Methods.

10 Items 20 Items

20% MD Items 40% MD Items 20% MD Items 40% MD Items

Disc COR N χ² YQ3 MQ3 χ² YQ3 MQ3 χ² YQ3 MQ3 χ² YQ3 MQ3

Low 0 500 .03 .01 .01 .20 .01 .01 .02 .00 .00 .13 .00 .00

1,000 .01 .00 .00 .17 .01 .01 .01 .00 .00 .13 .00 .00

2,000 .00 .00 .00 .16 .00 .00 .00 .00 .00 .09 .00 .00

0.3 500 .02 .01 .01 .10 .00 .00 .02 .00 .00 .04 .00 .00

1,000 .01 .00 .00 .02 .00 .00 .01 .00 .00 .05 .00 .00

2,000 .00 .00 .00 .01 .00 .00 .00 .00 .00 .03 .00 .00

0.6 500 .03 .01 .01 .04 .00 .00 .03 .00 .00 .03 .00 .00

1,000 .01 .00 .00 .02 .00 .00 .01 .00 .00 .02 .00 .00

2,000 .00 .00 .00 .01 .00 .00 .00 .00 .00 .01 .00 .00

Medium 0 500 .01 .03 .03 .15 .04 .03 .02 .00 .00 .16 .01 .00

1,000 .01 .02 .02 .15 .04 .04 .01 .00 .00 .23 .01 .00

2,000 .00 .01 .01 .12 .03 .03 .00 .00 .00 .14 .00 .00

0.3 500 .01 .03 .03 .05 .03 .02 .01 .00 .00 .05 .00 .00

1,000 .00 .01 .01 .05 .01 .01 .01 .00 .00 .04 .00 .00

2,000 .00 .01 .01 .07 .00 .00 .00 .00 .00 .09 .00 .00

0.6 500 .01 .03 .03 .02 .01 .01 .01 .00 .00 .02 .00 .00

1,000 .00 .01 .01 .01 .00 .00 .01 .00 .00 .02 .00 .00

2,000 .00 .00 .00 .03 .00 .00 .00 .00 .00 .02 .00 .00

High 0 500 .01 .05 .05 .13 .11 .09 .01 .00 .00 .12 .02 .01

1,000 .00 .04 .04 .08 .12 .11 .01 .00 .00 .13 .02 .01

2,000 .00 .03 .03 .19 .10 .09 .00 .00 .00 .13 .02 .00

0.3 500 .00 .06 .06 .07 .06 .04 .01 .00 .00 .06 .01 .00

1,000 .01 .04 .04 .09 .03 .04 .00 .00 .00 .08 .00 .00

2,000 .00 .02 .03 .09 .03 .03 .00 .00 .00 .07 .00 .00

0.6 500 .01 .04 .04 .02 .02 .02 .01 .00 .00 .03 .00 .00

1,000 .00 .02 .02 .02 .01 .01 .00 .00 .00 .04 .00 .00

2,000 .00 .02 .02 .03 .01 .01 .00 .00 .00 .04 .00 .00

Note: χ² = adjusted chi-square ratio, YQ3 = Yen's Q3, MQ3 = modified Q3. Values in bold are Type I error greater than .05.

Figure 3 shows the power rates of detecting multidimensionality across the methods for the 20% multidimensional item condition. Given that the 40% multidimensional item condition showed an inflated Type I error, we present only the 20% multidimensional item condition for the 20-item test. In Figure 3, we also include PPMC with Q3 and MBC to directly compare the performances. As shown in the figure, PPMC with Q3 and MBC outperformed existing methods across the conditions. The power rates of PPMC with Q3 and MBC were consistently higher than those of existing methods, and the power difference was more evident as the dimensional correlation increased. YQ3 and MQ3 showed minimal effectiveness, and the χ²/df ratio showed somewhat better performance for detecting multidimensionality. As the sample size increased, the performance of the χ²/df ratio and the PPMC methods with Q3 and MBC increased, whereas that of YQ3 and MQ3 remained consistent.

Figure 3.
Power rates of detecting generalized graded unfolding model (GGUM) multidimensionality across methods for the 20% multidimensional item condition.

Item and Latent Trait Parameter Estimates

To investigate the impact of multidimensional scales on the GGUM, we computed the bias, RMSE, and CORR for the item and latent trait parameter estimates across simulation conditions. The item and latent trait parameter estimates were computed by fitting the unidimensional GGUM. Figures 4 and 5 show the bias and the RMSE of the item parameter estimates across the sample size and dimensional correlation conditions, respectively. In Figure 4, panels a and c (left column) show the bias and RMSE results for the primary dimension item parameters, and panels b and d (right column) show the bias and RMSE results for the secondary dimension item parameters. Item parameter estimates for the primary dimension showed minimal bias, whereas those for the secondary dimension showed substantial bias across different sample sizes. The bias of the primary dimension items was close to 0, but that of the secondary dimension items ranged from −.75 to 0. The bias of the alpha parameters increased as the sample size increased, and the bias of the tau parameters decreased as the sample size increased. The bias of the delta parameters remained 0 across all sample sizes. Similarly, the RMSEs for the primary dimension item parameters were relatively small, whereas the RMSEs for the secondary dimension items were relatively large. As the sample size increased, the overall RMSEs decreased for the primary dimension item parameters, and the values were consistent with previous GGUM estimation studies (Roberts & Thompson, 2011; Roberts et al., 2000). For the secondary dimension items, the RMSEs of the alpha parameters increased substantially as the sample size increased, and the RMSEs of the delta parameters decreased as the sample size increased. The RMSEs of the tau parameters showed consistent results across the sample size conditions.

Figure 4.
Bias and root mean squared error (RMSE) of the generalized graded unfolding model (GGUM) item parameter estimation for primary and secondary dimension items across sample size.

Figure 5.
Bias and root mean squared error (RMSE) of the generalized graded unfolding model (GGUM) item parameter estimation for primary and secondary dimension items across dimension correlations.

Figure 5 also shows the bias and the RMSE for the GGUM item parameter estimates, but the results are summarized across the dimensional correlation conditions. Consistently, the item parameter estimates for the primary dimension showed minimal bias across the dimensional correlation conditions, and the corresponding RMSE values were close to 0, similar to the results illustrated in Figure 4. The item parameter estimates for the secondary dimension showed nonignorable bias, especially for the alpha parameters, and the RMSE values were considerably high. Consistent with the power rate results as shown in Tables 3 and 4, increasing dimensional correlation decreased the bias and the RMSE of the secondary dimension item parameter estimates. The delta parameters showed the highest RMSE values when the dimensional correlation condition was 0, but as the dimensional correlation increased, the RMSE decreased, and the values were similar to those of the alpha and tau parameters. However, regardless of the conditions, the RMSEs of the secondary dimension item parameters were substantially higher than those of the primary dimension item parameters.

Figure 6 shows the RMSE and the CORR of the latent trait parameter estimates across the sample size (left column) and the number of items (right column). As expected, increasing the sample size and the number of items overall improved the accuracy of the GGUM latent trait parameter estimates. However, as the proportion of the secondary items increased, accuracy decreased notably. As shown in Figure 6, regardless of the sample size and the number of items, the RMSEs of the latent trait parameter estimates increased, and the CORRs of the estimates decreased consistently as the number of items for the secondary dimension increased from 20% to 40%.

Figure 6.
Root mean squared error (RMSE) and CORR of the generalized graded unfolding model (GGUM) latent trait estimation for the proportion of the secondary dimension items across sample size and the number of items.

Empirical Example

To demonstrate the feasibility of PPMC dimensionality assessment methods, we applied the methods to an open-source personality data set (openpsychometric.org). The Big Five personality measure includes items from the International Personality Item Pool (IPIP; Goldberg, 1992) and consists of 50 items (i.e., 10 items per dimension). To illustrate the performance of dimensionality assessment methods, we purposely selected eight items from a conscientiousness scale and two items from an openness scale. Thus, the conscientiousness trait is considered the primary dimension, and the openness trait is considered the secondary dimension for the example scale. Based on the simulation results, 2,000 examinees were randomly drawn from the total data set. The Big Five personality data set was originally coded as 5-point scales, but for this demonstration, we dichotomized the responses such that any response less than or is equal to 3 was recoded as 0, and 1 otherwise. We first analyzed the response data with the GGUM by calibrating the item and person parameters via MCMC estimation⁴. After we obtained the GGUM parameters, we used the PPMC dimensionality assessment to analyze the scale dimensionality. The dichotomized item response data set and the estimated item parameters were used for the GGUM dimensionality analysis. The R code for this analysis is available in the online Supplemental Material.

The item parameter estimates, and their PSD results are shown in Table 6. Overall, the items were well estimated, and the GGUM parameters were similar to values commonly observed in applied settings. The discrimination parameter estimates ranged from .30 to 2.23 and an average of 1.26. We also found that the item discrimination estimates from the openness trait showed low values. For the location parameters, the estimates showed a pattern consistent with their item directions. That is, positive items produced positive location parameter values, and negative items produced negative location parameter values. The range of the location parameters was also within the range commonly observed in noncognitive measurement.

Table 6.
Generalized Graded Unfolding Model (GGUM) Item Parameter Estimates and Standard Error from the Empirical Example.

Item Dimension Est(α) Est(δ) Est(τ) SE(α) SE(δ) SE(τ)

1 C 1.22 1.69 −1.70 .13 .31 .29

2 C 1.67 −1.38 −1.71 .19 .28 .27

3 C 1.05 1.42 −0.09 .22 .43 .17

4 C 1.70 −1.05 −1.94 .15 .11 .12

5 C 1.34 1.64 −2.51 .12 .21 .21

6 C 2.23 −1.39 −1.74 .26 .27 .26

7 C 1.37 1.75 −1.29 .32 .29 .27

8 C 1.23 −0.93 −2.46 .12 .13 .16

9 O 0.49 0.43 −0.60 .08 .25 .18

10 O 0.30 3.70 −0.77 .02 .21 .15

Dimension Direction Statements

C Pos I am always prepared.

C Neg I leave my belongings around.

C Pos I pay attention to details.

C Neg I make a mess of things.

C Pos I get chores done right away.

C Neg I often forget to put things back in their proper place.

C Pos I like order.

C Neg I shirk my duties.

O Pos I have a rich vocabulary.

O Pos I have a vivid imagination.

Note: C = conscientiousness; O = openness; Pos = positive keyed; Neg = negative keyed.

Table 7 shows the dimensionality assessment results for the empirical data. We computed the item pair PPP values based on the Q3 and MBC statistics. Note that extreme PPP values (less than .025 or greater than .975) indicate item misfit from the multidimensionality of the scale. As shown in Table 7, only one item pair (items 9 and 10) was detected as extreme PPP from the Q3 and MBC statistics, and the other item pairs showed moderate PPP values ranging between .13 and .50 for MBC and .06 and .94 for Q3. As intended, items 9 and 10 were correctly detected from the PPMC dimensionality methods.

Table 7.
Posterior Predictive P-Value (PPP) of Item Pairs Using PPMC-MBC and PPMC-Q3 from the Empirical Example.

PPP Item Pair Matrix (MBC)

Items 1 2 3 4 5 6 7 8 9

1 –

2 .47 –

3 .21 .37 –

4 .43 .26 .34 –

5 .30 .50 .29 .48 –

6 .48 .25 .42 .25 .50 –

7 .29 .45 .24 .45 .33 .48 –

8 .44 .31 .35 .28 .45 .31 .46 –

9 .24 .27 .14 .24 .34 .30 .27 .28 –

10 .29 .28 .13 .20 .36 .26 .29 .27 .00

PPP Item Pair Matrix (Q3)

Items 1 2 3 4 5 6 7 8 9

1 –

2 .93 –

3 .07 .92 –

4 .92 .09 .91 –

5 .07 .93 .09 .90 –

6 .93 .06 .94 .09 .91 –

7 .08 .93 .09 .91 .10 .94 –

8 .90 .11 .87 .10 .84 .12 .86 –

9 .06 .13 .07 .33 .40 .75 .11 .27 –

10 .24 .11 .10 .08 .78 .08 .26 .20 .00

Note: MBC = model-based covariance; PPMC = posterior predictive model checking. Values greater than .975 or less than .025 are in bold.

Discussion

The purpose of this study was to develop dimensionality assessment tools for the most widely used ideal point IRT model, GGUM, using the Bayesian PPMC method, and explore the efficacy of different discrepancy statistics in conjunction with the sample size, the number of items, the level of the discrimination parameters, the dimensional correlation, and the proportion of multidimensionality. We evaluated the power and Type I error of the developed methods and investigated the impact of the item and latent trait parameters of the GGUM under the violation of unidimensionality. We also compared the PPMC dimensionality methods with existing methods. Finally, we demonstrated the PPMC dimensionality assessment procedure with the empirical example data set.

The results first showed Type I error rates were well controlled at the nominal level when 20% of the total items were generated from the secondary dimension. However, when the proportion of secondary dimension items reached 40%, the Type I error rates were inflated in the multidimensionality detection methods. This finding is expected, because if almost half of the items were generated from the secondary dimension, the magnitude of unidimensionality for the scale would diminish, resulting in a reduction in item pairs that reflect the same dimension (Levy et al., 2009; Zhang & Stout, 1999). Second, the sample size, the item discrimination, and the dimensional correlation had a significant impact on detecting the multidimensional scales. We found acceptable power rates under the conditions of 2,000 sample size, high discrimination parameters (i.e., GGUM alpha greater than 1.50), and zero-dimensional correlation. However, similar to the Type I error results, power diminished notably when the proportion of secondary dimension items reached 40% and low discrimination parameters were used. Third, the discrepancy statistics MBC and Q3 performed best, yielding the highest power while controlling the Type I error rates. Fourth, the PPMC dimensionality assessment methods outperformed the existing methods (adjusted χ²/df ratio, YQ3, and MQ3). Although the χ²/df ratio, YQ3, and MQ3 showed well-controlled Type I error rates for the 20% multidimensional conditions, and Type I error was inflated as the proportion of multidimensionality increased. More importantly, the power to detect multidimensional items for the existing methods was unacceptably lower than that for the PPMC methods. Fifth, the PPMC dimensionality assessment methods correctly detected the secondary dimensional items from the personality example data set. This finding assured that the PPMC dimensionality assessment methods can be utilized in practical settings. Finally, the violation of unidimensionality of scales substantially biased the GGUM item parameters, especially the discrimination parameter.

Implications for Organizational Researchers

The present study has several important implications for noncognitive measurement research. First, the developed methods can be readily applied in the development of ideal point scales. Previously, researchers either ignored assessing the dimensionality of scales or used conventional dimensional assessment methods, such as FA or PCA (Cao et al., 2015; Tay & Drasgow, 2012a). However, conventional methods are inappropriate for ideal point scales due to the nonmonotonic nature of the response process; therefore, applying conventional methods to ideal point scales may distort the results. As illustrated in the introduction, PCA incorrectly identified the ideal point scale as two dimensions when the scale was generated as unidimensional. As Tay and Drasgow (2012a) noted, PCA does not take the intermediate latent trait continuum into account, but instead, incorrectly assumes two construct dimensions for the positive and negative ends of the latent trait continuum. Moreover, previously developed methods (e.g., adjusted χ²/df ratio and modified Yen's Q3) were not very effective at detecting the violation of unidimensionality for the ideal point scales (Habing et al., 2005; Tay & Drasgow, 2012b). In the present study, we have shown the effectiveness of the PPMC dimensionality assessment method for the ideal point scales and the newly developed method consistently outperformed the existing methods across the simulation conditions. We posit that the program developed for the present study can be a useful tool for organizational researchers to check the dimensionality of ideal point scales, given that the program was specifically developed for the ideal point IRT model (the R code is available for applied researchers in the online Supplemental Material).

However, it is worthwhile to note that the PPMC method did not show sufficient power in some conditions. Although the PPMC method consistently outperformed the existing methods, the PPMC method still showed limited power. For example, when 40% of the items were generated from the secondary dimension, the highest power only reached .88 using MBC and Q3 as the discrepancy statistics. In addition, when the discrimination parameter values were relatively low (e.g., less than 1) for the 40% multidimensionality condition, no discrepancy statistic from the PPMC method showed sufficient power. Although this finding is consistent with previous studies (e.g., Levy et al., 2009) and the 40% multidimensionality is too extreme and may not be considered as a unidimensional scale, we recommend organizational researchers carefully investigate and interpret dimensionality assessment results with the PPMC method. Furthermore, highly discriminating items with a well-constructed scale would produce the best result for the PPMC dimensionality assessment.

Second, the developed dimensionality assessment methods for the ideal point model can be implemented in person-organization (P-O) fit measures. In principle, ideal point models have been used to develop and evaluate P-O fit measures because the ideal point response process is more appropriate for P-O measurement than the dominance response process. Specifically, examinees are highly likely to endorse P-O items if their job matches the level of the examinees’ latent traits. Chernyshenko et al. (2009), for example, developed a P-O fit measure using the ideal point IRT model and provided incremental validity evidence in predicting intentions to leave the job. However, in their study, the dimensionality of P-O measures was not investigated, and evidence for scale unidimensionality has not been reported. In future, researchers can investigate the dimensionality of P-O fit measures using the PPMC dimensionality assessment methods and improve the validity of the measurement by detecting and removing items that possibly measure an irrelevant construct dimension.

Third, the GGUM dimensionality assessment methods can also be applied for the development of multidimensional forced choice (MFC). Note that MFC measures consist of two or more statements representing different construct dimensions within an item, and examinees are asked to choose or rank the statements based on their preference. Over the last few decades, new IRT models have been exclusively developed for MFC measures to reduce the ipsative problem associated with the traditional scoring method (Brown & Maydeu-Olivares, 2011; Joo et al., 2018; Lee et al., 2019), and one of the most popular IRT models based on the ideal point assumption is multi-unidimensional pairwise preference (MUPP; Stark et al., 2005). The MUPP model has been implemented in industrial applications, including the Tailored Adaptive Personality Assessment System (TAPAS; Drasgow et al., 2012; Stark et al., 2014). In practice, MUPP item calibration and scoring are performed based on the GGUM statement probability function, and the GGUM statement parameters play an important role in test development in terms of designing and constructing MFC items (Joo et al., 2020). For example, test developers would like to match similar statement extremities within an MFC item to create fake-resistant MFC measures, and to do so, accurate and valid statement parameter estimates should be obtained in the GGUM item calibration step (Stark et al., 2005). In addition, it is also critical for test developers to ensure the unidimensionality of statements to increase the validity of MFC measures. However, to date, no statistical software was available for the GGUM dimensionality assessment, and the construct validity of MFC statements for MUPP has been heavily dependent on the item writers’ subjective judgment. Therefore, the present study contributes to MFC research by providing options for test developers to statistically evaluate the statement level construct validity and unidimensionality.

Limitations and Future Research

The present study also has several limitations. First, the simulation study did not include polytomous responses in the design; the simulation explored only the dichotomous responses of the GGUM, and dimensionality assessment methods were specifically developed for the dichotomous responses only. However, the PPMC method can be easily extended for the polytomous responses of the GGUM by including discrepancy statistics for polytomous responses. For example, the global OR for polytomous responses (Penfield, 2007) can be implemented as discrepancy statistics in the PPMC method, and one can evaluate the performance of the method under various conditions. In addition, the polytomous version of chi-square statistics (Drasgow et al., 1995) can be applied in the future. Second, based on the simulation results, the PPMC method still requires a large number of examinees to obtain the acceptable power rate (e.g., power greater than or equal to .80). Although the PPMC method has shown better performance than existing methods, organizational researchers still need a large number of examinees to meet the statistical standards. This may be challenging for some organizational studies where a relatively smaller number of examinees are generally collected (e.g., employee or student data in research setting). However, large-scale personality assessments such as college admission tests or military selections generally have a large number of examinees (or applicants) and ideal point IRT modeling and the dimensionality assessment be easily applied. A future study should investigate and develop more efficient dimensionality assessment methods that meet the statistical standards and require a fewer number of examinees. Third, the proposed PPMC dimensionality assessment methods depend on the specific ideal point IRT model, the GGUM. That is, the proposed PPMC dimensionality methods would perform effectively if two assumptions are met: (a) The observed response data fit the GGUM adequately and (b) the GGUM item parameters are accurately estimated. Again, these assumptions may not easily hold in organizational studies due to various reasons, and a future study should develop dimensionality assessment methods that do not depend on a specific model (i.e., the nonparametric approach).

Despite these limitations, this study contributes to and advances research on dimensionality assessment with ideal point measures. The PPMC dimensionality methods described in this paper have many potential extensions for ideal point scales, and more investigation is needed to refine the methods and applications. We posit that the methods and program developed for this study will help organizational researchers apply IRT scaling for noncognitive constructs more gainfully in research and practice.

Supplemental Material

sj-docx-1-orm-10.1177_10944281211050609 - Supplemental material for Assessing Dimensionality of the Ideal Point Item Response Theory Model Using Posterior Predictive Model Checking

Supplemental material, sj-docx-1-orm-10.1177_10944281211050609 for Assessing Dimensionality of the Ideal Point Item Response Theory Model Using Posterior Predictive Model Checking by Seang-Hwane Joo, Philseok Lee, Jung Yeon Park and Stephen Stark in Organizational Research Methods

			20% Multidimensional Items	40% Multidimensional Items
Low	0	500	.00	.00	.01	.01	.01	.01	.01	.01	.03	.03	.03	.03
		1,000	.00	.00	.01	.00	.02	.02	.01	.01	.05	.05	.05	.05
		2,000	.00	.00	.01	.00	.01	.01	.03	.03	.06	.06	.07	.07
	0.3	500	.00	.00	.02	.02	.02	.02	.01	.01	.04	.03	.04	.04
		1,000	.00	.00	.00	.00	.02	.02	.01	.01	.04	.04	.05	.05
		2,000	.00	.00	.01	.00	.01	.02	.03	.04	.08	.07	.09	.09
	0.6	500	.00	.00	.02	.02	.02	.02	.00	.00	.02	.02	.02	.02
		1,000	.00	.00	.01	.01	.02	.02	.00	.00	.03	.03	.05	.05
		2,000	.00	.00	.01	.00	.01	.01	.00	.00	.03	.01	.05	.05
Medium	0	500	.00	.00	.01	.01	.01	.01	.01	.01	.04	.04	.04	.04
		1,000	.00	.00	.01	.01	.02	.02	.04	.04	.07	.07	.08	.08
		2,000	.00	.00	.00	.00	.01	.01	.04	.04	.06	.06	.07	.07
	0.3	500	.00	.00	.01	.01	.02	.02	.00	.00	.03	.03	.03	.03
		1,000	.00	.00	.01	.01	.02	.02	.03	.03	.06	.06	.08	.08
		2,000	.00	.00	.01	.00	.02	.02	.06	.07	.10	.09	.12	.12
	0.6	500	.00	.00	.01	.01	.02	.02	.00	.00	.03	.03	.03	.03
		1,000	.00	.00	.00	.00	.01	.01	.00	.00	.03	.03	.06	.06
		2,000	.00	.00	.00	.00	.02	.02	.01	.01	.05	.03	.11	.11
High	0	500	.00	.00	.01	.01	.02	.02	.02	.02	.07	.07	.08	.08
		1,000	.00	.00	.00	.00	.01	.01	.05	.05	.07	.07	.08	.08
		2,000	.00	.00	.00	.00	.01	.01	.09	.09	.12	.12	.13	.13
	0.3	500	.00	.00	.01	.01	.02	.02	.01	.01	.05	.05	.05	.05
		1,000	.00	.00	.01	.01	.02	.02	.06	.06	.11	.11	.15	.15
		2,000	.00	.00	.00	.00	.02	.02	.07	.07	.12	.12	.17	.17
	0.6	500	.00	.00	.01	.01	.02	.02	.00	.01	.04	.04	.05	.05
		1,000	.00	.00	.01	.01	.02	.02	.02	.02	.10	.10	.15	.15
		2,000	.00	.00	.01	.00	.02	.02	.02	.02	.08	.06	.15	.15

			20% Multidimensional Items	40% Multidimensional Items
Low	0	500	.00	.00	.03	.03	.04	.04	.01	.01	.08	.08	.09	.09
		1,000	.00	.00	.01	.01	.05	.05	.04	.04	.10	.09	.12	.12
		2,000	.00	.00	.01	.01	.04	.04	.04	.04	.07	.06	.08	.08
	0.3	500	.00	.00	.03	.03	.04	.04	.01	.01	.05	.05	.07	.06
		1,000	.00	.00	.01	.01	.04	.04	.02	.02	.07	.07	.12	.11
		2,000	.00	.00	.01	.01	.04	.05	.02	.02	.08	.07	.12	.12
	0.6	500	.00	.00	.04	.04	.05	.05	.00	.00	.04	.04	.05	.05
		1,000	.00	.00	.02	.01	.05	.05	.00	.00	.03	.03	.09	.09
		2,000	.00	.00	.01	.01	.04	.05	.00	.00	.05	.04	.12	.12
Medium	0	500	.00	.00	.03	.03	.05	.05	.03	.03	.10	.10	.11	.11
		1,000	.00	.00	.01	.01	.05	.05	.08	.08	.13	.13	.15	.15
		2,000	.00	.00	.01	.01	.04	.04	.03	.03	.06	.06	.09	.09
	0.3	500	.00	.00	.03	.03	.05	.05	.02	.02	.09	.09	.11	.11
		1,000	.00	.00	.01	.01	.04	.04	.02	.02	.08	.08	.14	.14
		2,000	.00	.00	.01	.01	.05	.05	.05	.05	.11	.10	.16	.16
	0.6	500	.00	.00	.03	.03	.05	.05	.01	.01	.06	.06	.08	.08
		1,000	.00	.00	.02	.02	.05	.05	.01	.01	.07	.06	.13	.13
		2,000	.00	.00	.01	.01	.05	.05	.01	.01	.09	.07	.18	.18
High	0	500	.00	.00	.03	.03	.04	.04	.07	.07	.16	.16	.17	.17
		1,000	.00	.00	.01	.01	.05	.05	.08	.08	.12	.12	.14	.14
		2,000	.00	.00	.01	.01	.04	.04	.06	.06	.10	.09	.12	.12
	0.3	500	.00	.00	.03	.03	.05	.05	.03	.03	.13	.13	.15	.15
		1,000	.00	.00	.01	.01	.04	.04	.08	.09	.17	.17	.23	.23
		2,000	.00	.00	.01	.01	.04	.04	.05	.05	.13	.12	.21	.21
	0.6	500	.00	.00	.03	.03	.04	.04	.00	.00	.06	.06	.09	.09
		1,000	.00	.00	.02	.02	.05	.05	.01	.01	.09	.09	.18	.18
		2,000	.00	.00	.01	.01	.05	.05	.02	.03	.14	.13	.27	.27

			20% Multidimensional Items	40% Multidimensional Items
Low	0	500	.14	.14	.28	.28	.28	.28	.02	.02	.11	.11	.11	.11
		1,000	.50	.52	.70	.68	.72	.70	.29	.30	.41	.41	.41	.41
		2,000	.78	.80	.86	.84	.84	.84	.44	.46	.62	.60	.61	.62
	0.3	500	.04	.04	.22	.22	.22	.22	.02	.02	.07	.08	.08	.08
		1,000	.46	.46	.64	.64	.68	.68	.20	.20	.36	.36	.40	.39
		2,000	.72	.76	.90	.88	.90	.90	.38	.39	.54	.51	.56	.56
	0.6	500	.04	.04	.12	.12	.12	.12	.01	.01	.05	.06	.07	.07
		1,000	.04	.08	.38	.32	.42	.38	.03	.03	.13	.13	.20	.19
		2,000	.24	.24	.44	.40	.56	.56	.13	.14	.40	.36	.46	.46
Medium	0	500	.22	.22	.50	.50	.52	.52	.06	.07	.21	.20	.21	.21
		1,000	.70	.70	.76	.76	.78	.78	.38	.38	.49	.48	.49	.49
		2,000	.84	.84	.92	.90	.92	.92	.65	.65	.76	.74	.74	.75
	0.3	500	.04	.04	.34	.32	.34	.32	.06	.06	.17	.17	.17	.18
		1,000	.72	.72	.86	.84	.82	.82	.31	.32	.51	.51	.55	.55
		2,000	.74	.76	.82	.80	.84	.86	.49	.50	.64	.63	.64	.64
	0.6	500	.02	.00	.18	.18	.20	.20	.00	.00	.08	.08	.08	.08
		1,000	.46	.48	.70	.66	.72	.72	.10	.10	.28	.27	.38	.38
		2,000	.62	.62	.74	.74	.78	.78	.33	.33	.53	.49	.59	.59
High	0	500	.32	.34	.64	.64	.66	.66	.15	.15	.26	.26	.25	.25
		1,000	.78	.80	.86	.86	.86	.86	.56	.56	.66	.65	.63	.63
		2,000	.86	.88	.92	.90	.92	.92	.64	.65	.72	.70	.73	.73
	0.3	500	.28	.28	.60	.58	.60	.60	.14	.14	.27	.27	.28	.29
		1,000	.72	.72	.82	.80	.84	.84	.55	.55	.66	.64	.67	.67
		2,000	.86	.86	.94	.92	.94	.94	.65	.65	.76	.75	.77	.77
	0.6	500	.10	.08	.52	.52	.56	.56	.02	.02	.14	.14	.16	.16
		1,000	.62	.62	.78	.74	.76	.76	.27	.27	.50	.50	.57	.58
		2,000	.72	.72	.80	.76	.82	.84	.50	.50	.71	.67	.77	.77

			20% Multidimensional Items	40% Multidimensional Items
Low	0	500	.17	.17	.39	.39	.41	.41	.03	.03	.14	.14	.15	.15
		1,000	.56	.56	.70	.66	.71	.72	.30	.30	.47	.45	.46	.47
		2,000	.81	.81	.87	.86	.87	.87	.57	.59	.70	.68	.69	.69
	0.3	500	.11	.11	.28	.28	.30	.31	.02	.02	.12	.12	.14	.14
		1,000	.57	.57	.74	.72	.75	.75	.24	.24	.46	.45	.50	.50
		2,000	.69	.69	.83	.79	.82	.82	.50	.51	.70	.67	.72	.73
	0.6	500	.04	.04	.20	.20	.23	.23	.00	.00	.07	.07	.10	.10
		1,000	.22	.23	.46	.45	.51	.51	.04	.04	.22	.21	.30	.30
		2,000	.54	.55	.76	.73	.76	.76	.15	.18	.45	.41	.55	.55
Medium	0	500	.31	.32	.53	.52	.54	.55	.14	.14	.28	.27	.29	.29
		1,000	.74	.75	.82	.81	.85	.85	.41	.41	.54	.52	.55	.55
		2,000	.83	.83	.90	.87	.88	.88	.68	.69	.79	.77	.79	.79
	0.3	500	.25	.26	.46	.46	.49	.49	.04	.04	.17	.17	.20	.20
		1,000	.66	.66	.78	.77	.79	.79	.45	.46	.64	.62	.66	.67
		2,000	.79	.80	.85	.85	.88	.88	.66	.67	.78	.76	.80	.80
	0.6	500	.08	.08	.26	.26	.31	.30	.01	.01	.09	.09	.12	.12
		1,000	.47	.48	.65	.62	.71	.71	.12	.12	.36	.34	.45	.45
		2,000	.72	.73	.81	.80	.85	.85	.36	.37	.64	.60	.71	.71
High	0	500	.48	.48	.63	.62	.64	.63	.22	.22	.35	.35	.36	.36
		1,000	.86	.86	.89	.88	.90	.90	.62	.62	.71	.70	.72	.73
		2,000	.93	.93	.92	.91	.91	.91	.78	.78	.84	.83	.85	.85
	0.3	500	.49	.49	.66	.66	.68	.68	.11	.11	.28	.28	.31	.31
		1,000	.87	.87	.91	.91	.92	.92	.57	.58	.73	.71	.76	.76
		2,000	.90	.90	.93	.91	.92	.92	.80	.80	.87	.85	.88	.88
	0.6	500	.17	.17	.40	.39	.43	.42	.02	.02	.14	.14	.17	.18
		1,000	.63	.63	.74	.74	.78	.78	.33	.33	.58	.56	.66	.66
		2,000	.79	.80	.85	.84	.87	.88	.62	.62	.78	.76	.83	.83

			10 Items	20 Items
Low	0	500	.03	.01	.01	.20	.01	.01	.02	.00	.00	.13	.00	.00
		1,000	.01	.00	.00	.17	.01	.01	.01	.00	.00	.13	.00	.00
		2,000	.00	.00	.00	.16	.00	.00	.00	.00	.00	.09	.00	.00
	0.3	500	.02	.01	.01	.10	.00	.00	.02	.00	.00	.04	.00	.00
		1,000	.01	.00	.00	.02	.00	.00	.01	.00	.00	.05	.00	.00
		2,000	.00	.00	.00	.01	.00	.00	.00	.00	.00	.03	.00	.00
	0.6	500	.03	.01	.01	.04	.00	.00	.03	.00	.00	.03	.00	.00
		1,000	.01	.00	.00	.02	.00	.00	.01	.00	.00	.02	.00	.00
		2,000	.00	.00	.00	.01	.00	.00	.00	.00	.00	.01	.00	.00
Medium	0	500	.01	.03	.03	.15	.04	.03	.02	.00	.00	.16	.01	.00
		1,000	.01	.02	.02	.15	.04	.04	.01	.00	.00	.23	.01	.00
		2,000	.00	.01	.01	.12	.03	.03	.00	.00	.00	.14	.00	.00
	0.3	500	.01	.03	.03	.05	.03	.02	.01	.00	.00	.05	.00	.00
		1,000	.00	.01	.01	.05	.01	.01	.01	.00	.00	.04	.00	.00
		2,000	.00	.01	.01	.07	.00	.00	.00	.00	.00	.09	.00	.00
	0.6	500	.01	.03	.03	.02	.01	.01	.01	.00	.00	.02	.00	.00
		1,000	.00	.01	.01	.01	.00	.00	.01	.00	.00	.02	.00	.00
		2,000	.00	.00	.00	.03	.00	.00	.00	.00	.00	.02	.00	.00
High	0	500	.01	.05	.05	.13	.11	.09	.01	.00	.00	.12	.02	.01
		1,000	.00	.04	.04	.08	.12	.11	.01	.00	.00	.13	.02	.01
		2,000	.00	.03	.03	.19	.10	.09	.00	.00	.00	.13	.02	.00
	0.3	500	.00	.06	.06	.07	.06	.04	.01	.00	.00	.06	.01	.00
		1,000	.01	.04	.04	.09	.03	.04	.00	.00	.00	.08	.00	.00
		2,000	.00	.02	.03	.09	.03	.03	.00	.00	.00	.07	.00	.00
	0.6	500	.01	.04	.04	.02	.02	.02	.01	.00	.00	.03	.00	.00
		1,000	.00	.02	.02	.02	.01	.01	.00	.00	.00	.04	.00	.00
		2,000	.00	.02	.02	.03	.01	.01	.00	.00	.00	.04	.00	.00

Item	Dimension	Est(α)	Est(δ)	Est(τ)	SE(α)	SE(δ)	SE(τ)
1	C	1.22	1.69	−1.70	.13	.31	.29
2	C	1.67	−1.38	−1.71	.19	.28	.27
3	C	1.05	1.42	−0.09	.22	.43	.17
4	C	1.70	−1.05	−1.94	.15	.11	.12
5	C	1.34	1.64	−2.51	.12	.21	.21
6	C	2.23	−1.39	−1.74	.26	.27	.26
7	C	1.37	1.75	−1.29	.32	.29	.27
8	C	1.23	−0.93	−2.46	.12	.13	.16
9	O	0.49	0.43	−0.60	.08	.25	.18
10	O	0.30	3.70	−0.77	.02	.21	.15
Dimension	Direction	Statements
C	Pos	I am always prepared.
C	Neg	I leave my belongings around.
C	Pos	I pay attention to details.
C	Neg	I make a mess of things.
C	Pos	I get chores done right away.
C	Neg	I often forget to put things back in their proper place.
C	Pos	I like order.
C	Neg	I shirk my duties.
O	Pos	I have a rich vocabulary.
O	Pos	I have a vivid imagination.

PPP Item Pair Matrix (MBC)
1	–
2	.47	–
3	.21	.37	–
4	.43	.26	.34	–
5	.30	.50	.29	.48	–
6	.48	.25	.42	.25	.50	–
7	.29	.45	.24	.45	.33	.48	–
8	.44	.31	.35	.28	.45	.31	.46	–
9	.24	.27	.14	.24	.34	.30	.27	.28	–
10	.29	.28	.13	.20	.36	.26	.29	.27	.00
PPP Item Pair Matrix (Q3)
Items	1	2	3	4	5	6	7	8	9
1	–
2	.93	–
3	.07	.92	–
4	.92	.09	.91	–
5	.07	.93	.09	.90	–
6	.93	.06	.94	.09	.91	–
7	.08	.93	.09	.91	.10	.94	–
8	.90	.11	.87	.10	.84	.12	.86	–
9	.06	.13	.07	.33	.40	.75	.11	.27	–
10	.24	.11	.10	.08	.78	.08	.26	.20	.00

Footnotes

Appendix

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Seang-Hwane Joo

Philseok Lee

Supplemental Material

Supplemental material for this article is available online.

Notes

Author Biographies

Seang-Hwane Joo is an assistant professor in the department of educational psychology at the University of Kansas. His research focuses on item response theory, psychometrics, and multilevel modeling.

Philseok Lee is an assistant professor in the department of psychology at George Mason University. His research focuses on the applications of modern psychometric modeling and personnel assessment.

Jung Yeon Park is an assistant professor in the division of educational psychology and research methods. Her research focuses on latent variable modeling, longitudinal data analysis, and Bayesian data analysis.

Stephen Stark is a professor and chair of the department of psychology at the University of South Florida. His research focuses on the development and application of psychometric methods to practical problems in industrial organizational and educational settings.

References

Andrich

Styles

I. M.

(1998). The structural relationship between attitude and behavior statements from the unfolding perspective. Psychological Methods, 3, 454-469.

Barrick

M. R.

Mount

M. K.

Judge

T. A.

(2001). Personality and performance at the beginning of the new millennium: What do we know and where do we go next? International Journal of Selection and Assessment, 9, 9-30.

Berkhof

van Mechelen

Gelman

(2004). Enhancing the performance of a posterior predictive check (IAP statistics network tech. Rep. 0350). Interuniversity Attraction Pole, Katholieke Universiteit Leuven.

Birkeland

S. A.

Manson

T. M.

Kisamore

J. L.

Brannick

M. T.

Smith

M. A.

(2006). A meta-analytic investigation of job applicant faking on personality measures. International Journal of Selection and Assessment, 14, 317-335.

Brown

Maydeu-Olivares

(2011). Item response modeling of forced-choice questionnaires. Educational and Psychological Measurement, 71, 460-502.

Cacioppo

J. T.

Berntson

G. G.

(1994). Relationship between attitudes and evaluative space: A critical review, with emphasis on the separability of positive and negative substrates. Psychological Bulletin, 115, 401-423.

Cacioppo

J. T.

Gardner

W. L.

Berntson

G. G.

(1997). Beyond bipolar conceptualizations and measures: The case of attitudes and evaluative space. Personality and Social Psychology Review, 1, 3-25.

Cao

Drasgow

Cho

(2015). Developing ideal intermediate personality items for the ideal point model. Organizational Research Methods, 18, 252-275.

Carter

N. T.

Dalal

D. K.

(2010). An ideal point account of the JDI work satisfaction scale. Personality and Individual Differences, 49, 743-748.

10.

Carter

N. T.

Zickar

M. J.

(2011). The influence of dimensionality on parameter estimation accuracy in the generalized graded unfolding model. Educational and Psychological Measurement, 71, 765-788.

11.

Chen

W. H.

Thissen

(1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22, 265-289.

12.

Chernyshenko

O. S.

Stark

Drasgow

Roberts

B. W.

(2007). Constructing personality scales under the assumptions of an ideal point response process: Toward increasing the flexibility of personality measures. Psychological Assessment, 19, 88-106.

13.

Chernyshenko

O. S.

Stark

Williams

(2009). Latent trait theory approach to measuring person-organization fit: Conceptual rationale and empirical evaluation. International Journal of Testing, 9, 358-380.

14.

Christensen

K. B.

Makransky

Horton

(2017). Critical values for yen’s Q₃: Identification of local dependence in the rasch model using residual correlations. Applied Psychological Measurement, 41, 178-194.

15.

Davison

M. L.

(1977). On a metric, unidimensional unfolding model for attitudinal and developmental data. Psychometrika, 42, 523-548.

16.

de la Torre

Stark

Chernyshenko

O. S.

(2006). Markov chain monte carlo estimation of item parameters for the generalized graded unfolding model. Applied Psychological Measurement, 30, 216-232.

17.

Donovan

J. J.

Dwight

S. A.

Schneider

(2014). The impact of applicant faking on selection measures, hiring decisions, and employee performance. Journal of Business and Psychology, 29, 479-493.

18.

Drasgow

Chernyshenko

O. S.

Stark

(2010). 75 Years after Likert: Thurstone was right!. Industrial and Organizational Psychology, 3, 465-476.

19.

Drasgow

Levine

M. V.

Tsien

Williams

Mead

A. D.

(1995). Fitting polytomous item response theory models to multiple-choice tests. Applied Psychological Measurement, 19, 143-166.

20.

Drasgow

Stark

Chernyshenko

O. S.

Nye

C. D.

Hulin

C. L.

White

L. A.

(2012). Development of the tailored adaptive personality assessment system (TAPAS) to support army personnel selection and classification decisions. Drasgow Consulting Group Urbana IL.

21.

Ellingson

J. E.

Sackett

P. R.

Connelly

B. S.

(2007). Personality assessment across selection and development contexts: Insights into response distortion. Journal of Applied Psychology, 92, 386-395.

22.

Foster

G. C.

Min

Zickar

M. J.

(2017). Review of item response theory practices in organizational research: Lessons learned and paths forward. Organizational Research Methods, 20, 465-486.

23.

Gelman

Meng

X. L.

Stern

(1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733-760.

24.

Gelman

Rubin

D. B.

(1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457-472.

25.

Goldberg

L. R.

(1992). The development of markers for the Big-five factor structure. Psychological Assessment, 4, 26-42.

26.

Harris

A. M.

McMillan

J. T.

Carter

N. T.

(2021). Test-taker reactions to ideal point measures of personality. Journal of Business & Psychology, 36, 513-532.

27.

Habing

Finch

Roberts

J. S.

(2005). A Q3 statistic for unfolding item response theory models: Assessment of unidimensionality with two factors and simple structure. Applied Psychological Measurement, 29, 457-471.

28.

Hough

L. M.

Oswald

F. L.

Ployhart

R. E.

(2001). Determinants, detection and amelioration of adverse impact in personnel selection procedures: Issues, evidence and lessons learned. International Journal of Selection and Assessment, 9, 152-194.

29.

Huang

J. L.

Ryan

A. M.

Zabel

K. L.

Palmer

(2014). Personality and adaptive performance at work: A meta-analytic investigation. Journal of Applied Psychology, 99, 162-179.

30.

Joo

S. H.

Chun

Stark

Chernyshenko

O. S.

(2019). Item parameter estimation with the general hyperbolic cosine ideal point IRT model. Applied Psychological Measurement, 43, 18-33.

31.

Joo

S. H.

Lee

Stark

(2017). Evaluating anchor-item designs for concurrent calibration with the GGUM. Applied Psychological Measurement, 41, 83-96.

32.

Joo

S. H.

Lee

Stark

(2018). Development of information functions and indices for the GGUM-RANK multidimensional forced choice IRT model. Journal of Educational Measurement, 55, 357-372.

33.

Joo

S. H.

Lee

Stark

(2020). Adaptive testing with the GGUM-RANK multidimensional forced choice model: Comparison of pair, triplet, and tetrad scoring. Behavior Research Methods, 52, 761-772.

34.

Judge

T. A.

Ilies

(2002). Relationship of personality to performance motivation: A meta-analytic review. Journal of Applied Psychology, 87, 797-807.

35.

Judge

T. A.

Rodell

J. B.

Klinger

R. L.

Simon

L. S.

Crawford

E. R.

(2013). Hierarchical representations of the five-factor model of personality in predicting job performance: Integrating three organizing frameworks with two theoretical perspectives. Journal of Applied Psychology, 98, 875-925.

36.

Kuncel

N. R.

Borneman

M. J.

(2007). Toward a new method of detecting deliberately faked personality tests: The use of idiosyncratic item responses. International Journal of Selection and Assessment, 15, 220-231.

37.

LaPalme

Tay

Wang

(2018). A within-person examination of the ideal-point response process. Psychological Assessment, 30, 567-581.

38.

Lee

Joo

S. H.

Stark

Chernyshenko

O. S.

(2019). GGUM-RANK statement and person parameter estimation with multidimensional forced choice triplets. Applied Psychological Measurement, 43, 226-240.

39.

Levy

(2011). Posterior predictive model checking for conjunctive multidimensionality in item response theory. Journal of Educational and Behavioral Statistics, 36, 672-694.

40.

Levy

Mislevy

R. J.

Sinharay

(2009). Posterior predictive model checking for multidimensionality in item response theory. Applied Psychological Measurement, 33, 519-537.

41.

Liu

Zhang

(2020). An item-level analysis for detecting faking on personality tests: Appropriateness of ideal point item response theory models. Frontiers in Psychology, 10, 3090.

42.

Makransky

Bilenberg

(2014). Psychometric properties of the parent and teacher ADHD rating scale (ADHD-RS): Measurement invariance across gender, age, and informant. Assessment, 21, 694-705.

43.

Maraun

M. D.

Rossi

N. T.

(2001). The extra-factor phenomenon revisited: Unidimensional unfolding as quadratic factor analysis. Applied Psychological Measurement, 25, 77-87.

44.

McGrane

J. A.

(2019). The bipolarity of attitudes: Unfolding the implications of ambivalence. Applied Psychological Measurement, 43, 211-225.

45.

Mueller-Hanson

Heggestad

E. D.

Thornton

G. C.

III . (2003). Faking and selection: Considering the use of personality from select-in and select-out perspectives. Journal of Applied Psychology, 88, 348-355.

46.

Nye

C. D.

Joo

S. H.

Zhang

Stark

(2020). Advancing and evaluating IRT model data fit indices in organizational research. Organizational Research Methods, 23, 457-486.

47.

O'Brien

LaHuis

D. M.

(2011). Do applicants and incumbents respond to personality items similarly? A comparison of dominance and ideal point response models. International Journal of Selection and Assessment, 19, 109-118.

48.

Patz

R. J.

Junker

B. W.

(1999). A straightforward approach to markov chain monte carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24, 146-178.

49.

Penfield

R. D.

(2007). Assessing differential step functioning in polytomous items using a common odds ratio estimator. Journal of Educational Measurement, 44, 187-210.

50.

Ployhart

R. E.

Holtz

B. C.

(2008). The diversity–validity dilemma: Strategies for reducing racioethnic and sex subgroup differences and adverse impact in selection. Personnel Psychology, 61, 153-172.

51.

R Core Team (2020): R: A language and environment for statistical computing [Computer software]. Vienna, Austria.

52.

Reckase

M. D.

(1997). A linear logistic multidimensional model. In van der Linden

W. J.

Hambleton

R. K.

(Eds.), Handbook of modern item response theory (pp. 271-286). Springer-Verlag.

53.

Reeve

B. B.

Hays

R. D.

Bjorner

J. B.

Cook

K. F.

Crane

P. K.

Teresi

J. A.

, Thissen, D., Revicki, D. A., Weiss, D. J., Hambleton, R. K., Liu, H., Gershon, R., Reise, S. P., Lai, J., & Cella, D. (2007). Psychometric evaluation and calibration of health-related quality of life item banks. Medical Care, 45, S22-S31.

54.

Roberts

J. S.

Donoghue

J. R.

Laughlin

J. E.

(2000). A general item response theory model for unfolding unidimensional polytomous responses. Applied Psychological Measurement, 24, 3-32.

55.

Roberts

J. S.

Donoghue

J. R.

Laughlin

J. E.

(2002). Characteristics of MML/EAP parameter estimates in the generalized graded unfolding model. Applied Psychological Measurement, 26, 192-207.

56.

Roberts

J. S.

Fang

H. R.

Cui

Wang

(2006). GGUM2004: A windows-based program to estimate parameters in the generalized graded unfolding model. Applied Psychological Measurement, 31, 64-65.

57.

Roberts

J. S.

Laughlin

J. E.

Wedell

D. H.

(1999). Validity issues in the Likert and thurstone approaches to attitude measurement. Educational and Psychological Measurement, 59, 211-233.

58.

Roberts

J. S.

Thompson

V. M.

(2011). Marginal maximum a posteriori item parameter estimation for the generalized graded unfolding model. Applied Psychological Measurement, 35, 259-279.

59.

Rosse

J. G.

Stecher

M. D.

Miller

J. L.

Levin

R. A.

(1998). The impact of response distortion on preemployment personality testing and hiring decisions. Journal of Applied Psychology, 83, 634-644.

60.

Scherbaum

C. A.

Sabet

Kern

M. J.

Agnello

(2013). Examining faking on personality inventories using unfolding item response theory models. Journal of Personality Assessment, 95, 207-216.

61.

Schmidt

F. L.

Hunter

J. E.

(1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262-274.

62.

Sinharay

Johnson

M. S.

Stern

H. S.

(2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30, 298-321.

63.

Spector

P. E.

Van Katwyk

P. T.

Brannick

M. T.

Chen

P. Y.

(1997). When two factors don't reflect two constructs: How item characteristics can produce artifactual factors. Journal of Management, 23, 659-677.

64.

Stark

Chernyshenko

O. S.

Drasgow

(2005). An IRT approach to constructing and scoring pairwise preference items involving stimuli on different dimensions: The multi-unidimensional pairwise-preference model. Applied Psychological Measurement, 29, 184-203.

65.

Stark

Chernyshenko

O. S.

Drasgow

Nye

C. D.

White

L. A.

Heffner

Farmer

W. L.

(2014). From ABLE to TAPAS: A new generation of personality tests to support military selection and classification decisions. Military Psychology, 26, 153-164.

66.

Stark

Chernyshenko

O. S.

Drasgow

Williams

B. A.

(2006). Examining assumptions about item responding in personality assessment: Should ideal point methods be considered for scale development and scoring? Journal of Applied Psychology, 91, 25-39.

67.

Stout

W. F.

(1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293-325.

68.

Tay

Drasgow

(2012a). Theoretical, statistical, and substantive issues in the assessment of construct dimensionality: Accounting for the item response process. Organizational Research Methods, 15, 363-384.

69.

Tay

Drasgow

(2012b). Adjusting the adjusted χ2/df ratio statistic for dichotomous item response theory analyses: Does the model fit? Educational and Psychological Measurement, 72, 510-528.

70.

Tay

Drasgow

Rounds

Williams

B. A.

(2009). Fitting measurement models to vocational interest data: Are dominance models ideal? Journal of Applied Psychology, 94, 1287-1304.

71.

Tay

Jebb

A. T.

(2018). Establishing construct continua in construct validation: The process of continuum specification. Advances in Methods and Practices in Psychological Science, 1, 375-388.

72.

Tay

Kuykendall

(2017). Why self-reports of happiness and sadness may not necessarily contradict bipolarity: A psychometric review and proposal. Emotion Review, 9, 146-154.

73.

Tay

(2018). Ideal point modeling of Non-cognitive constructs: Review and recommendations for research. Frontiers in Psychology, 9, 2423.

74.

Thurstone

L. L.

(1928). Attitudes can be measured. American Journal of Psychology, 33, 529-554.

75.

van Schuur

W. H.

Kiers

H. A.

(1994). Why factor analysis often is the incorrect model for analyzing bipolar concepts, and what model to use instead. Applied Psychological Measurement, 18, 97-110.

76.

Wang

W. C.

S. L.

(2016). Confirmatory multidimensional IRT unfolding models for graded-response items. Applied Psychological Measurement, 40, 56-72.

77.

Yen

W. M.

(1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125-145.

78.

Zeng

(1997). Implementation of marginal Bayesian estimation with four-parameter beta prior distributions. Applied Psychological Measurement, 21, 143-156.

79.

Zhang

Cao

Tay

Luo

Drasgow

(2020). Examining the item response process to personality measures in high-stakes situations: Issues of measurement validity and predictive validity. Personnel Psychology, 73, 305-332.

80.

Zhang

Stout

(1999). Conditional covariance structure of generalized compensatory multidimensional items. Psychometrika, 64, 129-152.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.02 MB

			20% Multidimensional Items						40% Multidimensional Items
Disc	COR	N	X2	G2	OR	COV	MBC	Q3	X2	G2	OR	COV	MBC	Q3
Low	0	500	.00	.00	.01	.01	.01	.01	.01	.01	.03	.03	.03	.03
		1,000	.00	.00	.01	.00	.02	.02	.01	.01	.05	.05	.05	.05
		2,000	.00	.00	.01	.00	.01	.01	.03	.03	.06	.06	.07	.07
	0.3	500	.00	.00	.02	.02	.02	.02	.01	.01	.04	.03	.04	.04
		1,000	.00	.00	.00	.00	.02	.02	.01	.01	.04	.04	.05	.05
		2,000	.00	.00	.01	.00	.01	.02	.03	.04	.08	.07	.09	.09
	0.6	500	.00	.00	.02	.02	.02	.02	.00	.00	.02	.02	.02	.02
		1,000	.00	.00	.01	.01	.02	.02	.00	.00	.03	.03	.05	.05
		2,000	.00	.00	.01	.00	.01	.01	.00	.00	.03	.01	.05	.05
Medium	0	500	.00	.00	.01	.01	.01	.01	.01	.01	.04	.04	.04	.04
		1,000	.00	.00	.01	.01	.02	.02	.04	.04	.07	.07	.08	.08
		2,000	.00	.00	.00	.00	.01	.01	.04	.04	.06	.06	.07	.07
	0.3	500	.00	.00	.01	.01	.02	.02	.00	.00	.03	.03	.03	.03
		1,000	.00	.00	.01	.01	.02	.02	.03	.03	.06	.06	.08	.08
		2,000	.00	.00	.01	.00	.02	.02	.06	.07	.10	.09	.12	.12
	0.6	500	.00	.00	.01	.01	.02	.02	.00	.00	.03	.03	.03	.03
		1,000	.00	.00	.00	.00	.01	.01	.00	.00	.03	.03	.06	.06
		2,000	.00	.00	.00	.00	.02	.02	.01	.01	.05	.03	.11	.11
High	0	500	.00	.00	.01	.01	.02	.02	.02	.02	.07	.07	.08	.08
		1,000	.00	.00	.00	.00	.01	.01	.05	.05	.07	.07	.08	.08
		2,000	.00	.00	.00	.00	.01	.01	.09	.09	.12	.12	.13	.13
	0.3	500	.00	.00	.01	.01	.02	.02	.01	.01	.05	.05	.05	.05
		1,000	.00	.00	.01	.01	.02	.02	.06	.06	.11	.11	.15	.15
		2,000	.00	.00	.00	.00	.02	.02	.07	.07	.12	.12	.17	.17
	0.6	500	.00	.00	.01	.01	.02	.02	.00	.01	.04	.04	.05	.05
		1,000	.00	.00	.01	.01	.02	.02	.02	.02	.10	.10	.15	.15
		2,000	.00	.00	.01	.00	.02	.02	.02	.02	.08	.06	.15	.15

			10 Items						20 Items
			20% MD Items			40% MD Items			20% MD Items			40% MD Items
Disc	COR	N	χ²	YQ3	MQ3	χ²	YQ3	MQ3	χ²	YQ3	MQ3	χ²	YQ3	MQ3
Low	0	500	.03	.01	.01	.20	.01	.01	.02	.00	.00	.13	.00	.00
		1,000	.01	.00	.00	.17	.01	.01	.01	.00	.00	.13	.00	.00
		2,000	.00	.00	.00	.16	.00	.00	.00	.00	.00	.09	.00	.00
	0.3	500	.02	.01	.01	.10	.00	.00	.02	.00	.00	.04	.00	.00
		1,000	.01	.00	.00	.02	.00	.00	.01	.00	.00	.05	.00	.00
		2,000	.00	.00	.00	.01	.00	.00	.00	.00	.00	.03	.00	.00
	0.6	500	.03	.01	.01	.04	.00	.00	.03	.00	.00	.03	.00	.00
		1,000	.01	.00	.00	.02	.00	.00	.01	.00	.00	.02	.00	.00
		2,000	.00	.00	.00	.01	.00	.00	.00	.00	.00	.01	.00	.00
Medium	0	500	.01	.03	.03	.15	.04	.03	.02	.00	.00	.16	.01	.00
		1,000	.01	.02	.02	.15	.04	.04	.01	.00	.00	.23	.01	.00
		2,000	.00	.01	.01	.12	.03	.03	.00	.00	.00	.14	.00	.00
	0.3	500	.01	.03	.03	.05	.03	.02	.01	.00	.00	.05	.00	.00
		1,000	.00	.01	.01	.05	.01	.01	.01	.00	.00	.04	.00	.00
		2,000	.00	.01	.01	.07	.00	.00	.00	.00	.00	.09	.00	.00
	0.6	500	.01	.03	.03	.02	.01	.01	.01	.00	.00	.02	.00	.00
		1,000	.00	.01	.01	.01	.00	.00	.01	.00	.00	.02	.00	.00
		2,000	.00	.00	.00	.03	.00	.00	.00	.00	.00	.02	.00	.00
High	0	500	.01	.05	.05	.13	.11	.09	.01	.00	.00	.12	.02	.01
		1,000	.00	.04	.04	.08	.12	.11	.01	.00	.00	.13	.02	.01
		2,000	.00	.03	.03	.19	.10	.09	.00	.00	.00	.13	.02	.00
	0.3	500	.00	.06	.06	.07	.06	.04	.01	.00	.00	.06	.01	.00
		1,000	.01	.04	.04	.09	.03	.04	.00	.00	.00	.08	.00	.00
		2,000	.00	.02	.03	.09	.03	.03	.00	.00	.00	.07	.00	.00
	0.6	500	.01	.04	.04	.02	.02	.02	.01	.00	.00	.03	.00	.00
		1,000	.00	.02	.02	.02	.01	.01	.00	.00	.00	.04	.00	.00
		2,000	.00	.02	.02	.03	.01	.01	.00	.00	.00	.04	.00	.00