Abstract
Can one accurately infer the dimensionality of constructs such as emotions (i.e., happy–sad), work–family spillover (i.e., positive–negative), or job performance (i.e., organizational citizenship behaviors and counterproductive work behaviors) with commonly used methods? In this article, the authors show how the misapplication of commonly used methods (e.g., factor analysis [FA]) to data originating from an ideal point response process (i.e., self-reported typical behaviors: attitudes, personality, emotions, or interests) can lead to incorrect theoretical and statistical inferences. The authors demonstrate that principal components analysis (PCA) produces an additional spurious dimension despite Likert scaling procedures (i.e., reverse scoring and excluding items with low item-total correlations to improve scale reliability). This incorrectly leads to a conclusion against bipolarity. The authors illustrate the substantive implications for organizational research with emotions data showing that the misapplication of FA could underlie the longstanding debate on the bipolarity of affect. To circumvent this potential problem, the authors propose analytic steps to determine if the recovered constructs are spurious. Additionally, the authors lay out specific issues that need to be considered when evaluating the bipolarity of self-reported typical behavior constructs such as work–family spillover and job performance.
Organizational scientists and psychologists alike hold dear the oft-quoted adage “There is nothing so practical as a good theory” (Lewin, 1951). Indeed, the development of theory informs and guides practice and is critical for the progress of organizational research. Nevertheless, theory itself is underpinned by the methods used to elucidate key psychological constructs. In making a case for the significance of scientific methods, Cacioppo and Berntson (1994) recounted a poignant vignette from the Philosophy of Science (Eddington, 1939). In this story, a hypothetical scientist sought to determine the size of fish in the sea by using a 2-inch net. After trawling the waters extensively, he found that there was not a single fish smaller than 2 inches and firmly concluded that fish smaller than 2 inches do not exist. The ironic conclusion highlights the theoretical error that follows from applying a method inappropriately. In the same spirit, we bring attention to the problems associated with applying dominance models (e.g., factor analysis and principal component analysis) when assessing key psychological constructs such as attitudes, personality, emotions, and interests.
Recent studies have reexamined the appropriateness of the statistical models used to describe various types of behaviors (Chernyshenko, Stark, Chan, Drasgow, & Williams, 2001; Chernyshenko, Stark, Drasgow, & Roberts, 2007; Roberts, Laughlin, & Wedell, 1999; Stark, Chernyshenko, Drasgow, & Williams, 2006; Tay, Drasgow, Rounds, & Williams, 2009). Dominance models like factor analysis and traditional item response theory (IRT) models (e.g., the two-parameter logistic IRT model) assume that the probability of item endorsement (p) is directly related to the difference between the latent trait (θ) and the item location (δ), formulated as p = f(θ − δ).
In this article, we show that the importance of specifying the correct measurement model for different types of behaviors cannot be overstated. There are several key contributions over past research. First, we review past research on this issue and update the current psychological theory (Tay et al., 2009) underlying these statistical models. Second, some previous studies have found that the misapplication of dominance models to ideal point data does not create substantial problems. For example, the rank ordering of individuals did not change drastically, and the misapplication had little impact on selection decisions (Stark et al., 2006). Although in vocational interest research, ideal point models (unlike dominance models) can locate individuals in the middle of the bipolar People–Things dimension when they have dual interests (Tay et al., 2009), this advantage is moot because recent research shows that People and Things are better construed as two distinct constructs (Tay, Su, & Rounds, 2011). At first blush, it appears that misapplication of models does not hinder scientific progress. Nevertheless, we review additional findings on how applying dominance models to ideal point data leads to spurious dimensions. We present new evidence showing that this problem persists even when Likert scaling (e.g., reverse scoring and excluding items with low item-total correlations to improve scale reliability) is applied and when commonly used techniques such as multidimensional scaling (MDS) are used.
Third, we illustrate the importance of the issue to substantive topics by showing that the longstanding controversy of the bipolarity of happiness and sadness (see the review by Russell & Carroll, 1999) may be due to an incorrect application of dominance models to ideal point responses. To our knowledge, this is the first article to present empirical evidence for how ideal point responding can directly affect our theoretical inferences. This has direct implications for constructs that are of interest to organizational science such as work–family spillover (i.e., positive and negative) and job performance (i.e., organizational citizenship behavior and counterproductive work behavior). Analytic issues pertaining to these types of constructs are elaborated in our discussion.
Theory and Measurement
Historically, there were sharp distinctions in the measurement techniques used to examine the latent structure of maximal and self-reported typical behaviors. Cronbach (1949), for example, elucidated the distinction between behavior types. As a forerunner in the field of measurement, Thurstone (1927a, 1927b, 1928, 1945) developed dominance and ideal point approaches to measure mental abilities and attitudes, respectively, because of stark differences in the processes underlying item responding. From a theoretical standpoint, an appropriate measurement model should be used for the type of behavior one attempts to measure. For example, if one attempts to accurately understand and measure individual responses to a multiple-choice examination, it is important to model guessing. This goes beyond the issue of measurement accuracy of individuals. It implies that all correct responses are not simply a result of the latent trait, but also contingent on the item format (e.g., two or four options) and chance. Similarly, dominance and ideal point models statistically depict how a psychological construct is understood and the process by which responses are produced.
Drawing from the work of Coombs (1964), Cronbach (1949), Fiske and Butler (1963), and Terman (1924), an integrative theoretical framework developed by Tay and his colleagues (2009) showed the conceptual correspondences between types of behaviors and measurement models. In this article, we describe: (a) the nature of the latent trait, (b) the meaning of the latent continuum, and (c) the item response process. We extend the prior framework by presenting other distinguishing features such as (d) the role of motivation and (e) the polarity of constructs. These characteristics are intertwined, but for ease of presentation, we discuss each aspect in turn.
Latent Trait
In measuring maximal behaviors, the aim is to determine the threshold or capacity limits of individual attributes like cognitive or physical ability. The assessment is set up to determine the maximal output of the physiological, psychological, or physical system. For example, repeatedly lifting a 50-pound weight tests when physical strength is exhausted. By contrast, self-reported typical behaviors encompass habitual characteristics, typical modes of action, or descriptive states. For instance, an individual endorsing a statement about personality (e.g., “I am a happy person”) or emotions (e.g., “I was happy over the past week”) reveals a general predisposition or a descriptive account of his or her (or a target’s) psychological state, respectively.
Latent Continuum
For maximal behaviors, the latent continuum is marked by increasing levels of difficulty. Therefore, item locations may be comprehended as difficulty parameters as seen from the tradition of IRT dominance model parameterizations. From another perspective, the latent continuum may also indicate the degree of desirability more generally. For example, higher levels of achievement are valued and even advocated for in organizations. This may not hold true for self-reported typical behavior like attitudes or personality; more is not necessarily better. For example, inordinate attitudes, personality attributes, or emotion states may border on extremism or psychological disorders. Additionally, item placements on the continuum are simply understood as locations in ideal point models, with no connotation of difficulty. The continuum itself is marked generally by non–value-laden terms like valence or high-low.
Item Response Process
As we alluded to earlier, in testing for maximal behaviors, items are pitted against individuals. It is assumed that individuals either overcome items or fail in the attempt. For example, in classic Guttman scaling, the point where an item is overcome marks the threshold ability for an individual. Hence, from the perspective of the examinee, items are viewed as hurdles or obstacles to overcome. We view this as a hurdle algorithm of responding, in which the resulting item response process is monotonic; that is, given the item difficulty, higher levels of the latent trait are associated with increased probability of success. However, in self-reported typical behaviors, individuals determine whether items are characteristic of themselves. Self-characterization on the measured dimension occurs because individuals draw from self-perceptions and past experiences in deciding which terms are the most appropriate self-descriptors. Therefore, the response process follows a matching algorithm of responding. This corresponds to an item response process that is non-monotonic: individuals endorse an item to the extent that it matches their own characteristic or behavior, or that of a target.
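To make the contrast between the hurdle and matching algorithms concrete, the following Python sketch compares a monotonic dominance curve with a single-peaked ideal point curve. The two-parameter logistic form is standard for dominance models; the Gaussian-shaped ideal point function, the function names, and all parameter values are illustrative assumptions and are not the models estimated in this article.

```python
import numpy as np


def dominance_prob(theta, delta, a=1.0):
    """Two-parameter logistic curve: endorsement rises monotonically with theta - delta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - delta)))


def ideal_point_prob(theta, delta, width=1.0):
    """Illustrative single-peaked curve (an assumption, not a fitted model):
    endorsement is highest when theta equals delta and falls off on both sides."""
    return np.exp(-((theta - delta) ** 2) / (2.0 * width ** 2))


# Trait values from -3 to 3 in steps of 0.5, and one item located at 0.5.
theta = np.linspace(-3.0, 3.0, 13)
delta = 0.5

dom = dominance_prob(theta, delta)
ideal = ideal_point_prob(theta, delta)

for t, p_dom, p_ideal in zip(theta, dom, ideal):
    print(f"theta={t:+.1f}  dominance={p_dom:.3f}  ideal_point={p_ideal:.3f}")
```

Under the dominance curve, a respondent far above the item location almost always endorses the item. Under the ideal point curve, that same respondent tends to reject it because the item no longer matches his or her standing on the trait.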
Motivation
Motivation arouses psychological and physical resources and creates more effort directed toward explicit requirements of the assessment task (cf. Campbell & Pritchard, 1983). Because assessment procedures for maximal performance behaviors challenge the capacity limits of individuals, increased motivation enhances test performance, leading to higher scores. By contrast, assessment procedures for self-reported typical behaviors require individuals to determine the correspondences between the administered item and themselves. Therefore, increased motivation serves to enhance accuracy of responding on self-reported typical behaviors, leading to more accurate scores. But under high-stakes testing conditions, motivation for self-presentation may lead to increased scores on desired attributes (Morgeson et al., 2007). This may lead to greater appearance of dominance responding on desired attributes (e.g., O’Brien & LaHuis, 2011).
Polarity of Constructs
By definition, maximal performance behavior constructs are unipolar because they reflect physical, physiological, and psychological capacities. Higher scores reflect a higher quantity of an attribute and lower scores reflect a lack of the attribute. Therefore, the latent continuum for maximal behaviors ranges from low through moderate to high. On the other hand, self-reported typical behaviors are not necessarily constrained as unipolar or bipolar. For example, discrete emotions can be measured as two unipolar dimensions, such as happiness and sadness (e.g., “not at all happy/sad” to “very happy/sad”). Or they may be conceptualized as emotional valences and hence measured as single bipolar dimensions ranging from sadness to happiness (e.g., “very sad” to “very happy”; Killingsworth & Gilbert, 2010).
Does Applying an Inappropriate Measurement Model Matter?
Despite the theoretical significance of choosing an appropriate measurement model for a particular type of behavior, a critical question is whether incorrectly specifying a model affects scientific practices or hinders other epistemic endeavors within organizational science. Indeed, although there is recognition that ideal point models are appropriate for self-reported typical behaviors (Drasgow, Chernyshenko, & Stark, 2010), many are not convinced that dominance models applied to self-reported typical behaviors pose substantial problems for the reliability or validity of scales (e.g., Oswald & Schell, 2010; Reise, 2010; Waples, Weyhrauch, Connell, & Satoris, 2010).
In this article, we focus on the issue of whether an inappropriate measurement model could affect inferences of construct dimensionality. This is a critical issue because many theoretical models of typical behaviors—within organizational science and the social sciences in general—have been built on factor-analytic approaches. These include broad areas of personality, attitudes, interests, and emotions, which are important predictors of job performance and arguably necessary to understand in their own right as individual differences (Ackerman & Humphreys, 1990; Schmitt, Cortina, Ingerick, & Wiechmann, 2003).
Past research shows that unidimensional ideal point data yield an additional factor when analyzed by a dominance methodology such as factor analysis or principal components analysis (Coombs, 1975; Coombs & Kao, 1960; Davison, 1977; Spector, Van Katwyk, Brannick, & Chen, 1997; van Schuur & Kiers, 1994).
Although Clyde Coombs examined this phenomenon, it was in the context of preferential choice data, where one seeks to extract dimensions underlying preference choices (e.g., product preference).
Relating directly to rating scales, Spector and colleagues (1997) presented extreme negative, moderate negative, moderate positive, and extreme positive items to individuals and used classical test theory (CTT) techniques to show that patterns of responses appear to have ideal point characteristics. Using the pattern of observed responses (assumed to be ideal point), they additionally simulated several data sets and applied confirmatory factor models to show that a two-factor model generally fit better.
van Schuur and Kiers (1994) and Davison (1977) demonstrated this more precisely by simulating unidimensional ideal point data and showing that two dimensions were recovered with principal components analysis, and moderate or neutral items loaded highly on an unrotated artifactual factor, as seen in Figure 1a.

Figure 1. Principal components analysis (PCA) of unidimensional ideal point data: Factor loading plots. (a) PCA including neutral/moderate items. (b) PCA excluding neutral/moderate items.
Nevertheless, past research has been silent on three important issues. First, it is now known that moderate or neutral items tend to load highly on an artifactual factor when individuals respond by an ideal point process as shown in Figure 1a. Given that most scales are created using scaling procedures where moderate or neutral items are generally excluded (due to low item-total correlations), to what extent would dimension recovery be a problem? As shown in Figure 1b, if such items are excluded, the factors extracted would more closely reflect unidimensionality. To the extent that most of the intermediate items are excluded, dimension recovery with dominance procedures would fare well. Indeed, it is believed that current scaling procedures—such as Likert scaling—exclude middling items for which ideal point response models are most suited (e.g., Oswald & Schell, 2010; Waples et al., 2010). Therefore, more light needs to be shed on this topic.
Second, little is known about dimension recovery for ideal point data with methods other than principal components analysis (or factor analysis). Is there a simple alternative (such as multidimensional scaling) that organizational researchers can use to address this issue? Multidimensional scaling has also been used by organizational researchers for uncovering data dimensionality. It has been used to examine individual differences (Mount, Barrick, Scullen, & Rounds, 2005; Rounds & Tracey, 1993), cognitive representations of conflict (Gelfand et al., 2001) and team models (Mohammed, Klimoski, & Rentsch, 2000), workplace behaviors (Robinson & Bennett, 1995), and reactions to job dissatisfaction (Farrell, 1983). In MDS procedures, the dimensionality of the data is commonly inferred from the fit of the model and item configurations. Yet, this alternative technique has not been applied to examine ideal point response data.
Third, there has been speculation that ideal point responding can lead to erroneous substantive conclusions (e.g., van Schuur & Kiers, 1994), but little evidence has been presented to support this claim. To demonstrate the importance of this issue for organizational research, we present an illustrative example showing that measurement issues could underlie the decades of ongoing debate on whether lexically opposed emotions—happiness and sadness—are part of a single bipolar valence dimension or two separate dimensions (Cropanzano, Weiss, Hale, & Reb, 2003; Russell & Carroll, 1999). We examined how an emotion scale—that was created along a theoretic sad-to-happy continuum—fared using various data processing and analytic techniques. We show that an additional dimension may be recovered with such techniques, and substantive interpretations may be misleading. Specifically, we obtain happy and sad factors, which create the impression of two orthogonal constructs. Yet, a unidimensional ideal point model produced good fit to the data.
Summary of Study Goals
In this article, we expand on the theoretical differences between maximal performance behaviors and self-reported typical behaviors and the need to apply the appropriate measurement model. Although past research has shown that misapplying dominance models to ideal point data does not necessarily create substantial problems, we provide evidence for how misapplying dominance models—such as principal components analysis (PCA) or factor analysis—to unidimensional data can yield an additional spurious dimension. Unfortunately, earlier studies have not demonstrated whether our current scaling procedures can alleviate these problems, and it is not known if other frequently used procedures such as MDS can correctly uncover dimensionality. To this end, a simulation study was conducted to demonstrate that this problem persists and is not easily resolvable with data preprocessing. A second study illustrates how two factors may be recovered from a sad-to-happy bipolar continuum despite a good fit for a unidimensional ideal point model. To help researchers avert potential problems, we present a decision table for evaluating construct dimensionality. We also discuss analytic issues that need to be considered when attempting to examine the dimensionality of constructs such as work-family spillover and job performance.
Study 1
Dimension Recovery With PCA and MDS
In the first study, we used two statistical techniques to determine whether certain types of procedures can recover the correct dimensions underlying ideal point data; these approaches include PCA and nonmetric MDS. Furthermore, because such analyses are frequently run on data preprocessed using Likert scaling, we applied these techniques to raw and Likert-scaled data to determine if data processing can help recover the correct dimensionality. In this article, we refer to Likert scaling as a classical test theory procedure of reverse scoring negatively worded terms and excluding items with low item-total correlations to meet standards for scale reliability as advocated for by Rensis Likert (1932). As mentioned earlier, it is commonly believed that this procedure effectively eliminates middling items. Likert scaling does not refer to the response scale.
Method
We examined conditions reflecting those typically used to study construct dimensionality. We focus only on the unidimensional case because if unidimensionality cannot be recovered, then it is unlikely that applying current techniques to multidimensional ideal point data would yield the correct number of dimensions. Sample size (250, 500, 1,000), number of item response options (2 and 5), number of items (9 and 12), and skewness level (0 and –1) were manipulated, giving a total of 3 × 2 × 2 × 2 = 24 conditions. Within each condition, 200 replications were undertaken. For each replication, data were generated under ideal point assumptions; then, data processing techniques—none and Likert scaling—were performed. Because it is well known that PCA recovers an additional spurious dimension when zero-order correlations are extracted from dichotomous data, tetrachoric correlations were also examined in this instance (Carroll, 1945); for polytomous data, PCA performs well on the zero-order correlations. Nonmetric MDS was applied to the respective correlations (c) that were converted to dissimilarities (D), where
Data Generation
In each replication, simulees were generated with latent trait values drawn from a standard normal distribution for the no skew condition. In the skew condition, latent trait values were drawn from a moderately skewed distribution (mean = 0; variance = 1; skewness = 1). Ideal point data were simulated using a generalized graded unfolding model (GGUM) parameterization (Roberts, Donoghue, & Laughlin, 2000). For dichotomous data, the endorsement probability is given by
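A minimal Python sketch of the dichotomous data-generation step follows. The probability function uses the two-category form of the GGUM described by Roberts et al. (2000), P(Z = 1 | θ) = [exp(α[(θ − δ) − τ]) + exp(α[2(θ − δ) − τ])] / [1 + exp(3α(θ − δ)) + exp(α[(θ − δ) − τ]) + exp(α[2(θ − δ) − τ])]. The item locations, the discrimination (alpha) and threshold (tau) values, the sample size, and the function names are illustrative assumptions, not the values used in the study, and Python is used here only for illustration.

```python
import numpy as np


def ggum_prob_binary(theta, delta, alpha=1.0, tau=-1.0):
    """Probability of endorsing a dichotomous item under the GGUM.

    theta: latent trait, delta: item location,
    alpha: discrimination, tau: threshold (typically negative).
    """
    d = theta - delta
    # Two subjective responses map onto "agree" (from below and from above the item).
    agree = np.exp(alpha * (d - tau)) + np.exp(alpha * (2.0 * d - tau))
    # Two subjective responses map onto "disagree".
    disagree = 1.0 + np.exp(alpha * 3.0 * d)
    return agree / (agree + disagree)


def simulate_binary_responses(n_persons, deltas, alpha=1.0, tau=-1.0, seed=0):
    """Draw standard normal trait values and 0/1 responses for every item."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_persons)
    p = ggum_prob_binary(theta[:, None], np.asarray(deltas)[None, :], alpha, tau)
    responses = (rng.random(p.shape) < p).astype(int)
    return responses, theta


# Assumed design: nine items spread evenly along the continuum.
deltas = np.linspace(-2.0, 2.0, 9)
responses, theta = simulate_binary_responses(1000, deltas, alpha=2.0)

# Zero-order correlations and their eigenvalues, as a PCA would use them.
corr = np.corrcoef(responses, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
print(np.round(eigenvalues, 2))
```

Because endorsement peaks where θ equals δ and declines symmetrically on either side, items at opposite ends of the continuum correlate negatively, while items in the middle relate only weakly to both ends.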
Data Processing
A third of all items, specifically, the items on the negative end of the continuum, were first reverse scored. Then, item-total biserial correlations were evaluated and further reverse scoring was undertaken for items with negative item-total correlations. After the reverse scoring procedure was complete, the data were evaluated to determine if excluding items improved scale reliability. If the scale reliability was lower than .75, a scale purification process was undertaken to emulate how researchers might improve scale reliabilities; this procedure was particularly important for dichotomous items because of their lower reliabilities: (a) The item whose removal produced the largest improvement in scale reliability was dropped; (b) scale reliability was reevaluated after removing the item; (c) steps (a) and (b) were repeated until no further improvements in reliability could be made or until a minimum scale length of three items was reached. The number of items excluded in each replication was recorded.
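The purification steps (a) through (c) can be sketched in Python as follows. This is one reading of the procedure, assuming that reverse scoring has already been applied, that reliability is Cronbach's alpha, and that the loop, once triggered by an alpha below .75, runs until no deletion improves alpha or three items remain. The function names and the demonstration data are hypothetical.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a persons-by-items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1.0)) * (1.0 - item_var_sum / total_var)


def purify_scale(items, target=0.75, min_items=3):
    """Drop items one at a time while doing so raises alpha.

    Returns the indices of the retained items and the final alpha.
    """
    items = np.asarray(items, dtype=float)
    kept = list(range(items.shape[1]))
    alpha = cronbach_alpha(items[:, kept])
    if alpha >= target:
        return kept, alpha  # reliability already acceptable; nothing is dropped
    while len(kept) > min_items:
        # (a) find the item whose removal gives the largest alpha
        trials = [
            (cronbach_alpha(items[:, [j for j in kept if j != drop]]), drop)
            for drop in kept
        ]
        best_alpha, best_drop = max(trials)
        if best_alpha <= alpha:
            break  # (c) no further improvement is possible
        # (b) remove the item and reevaluate reliability
        kept.remove(best_drop)
        alpha = best_alpha
    return kept, alpha


# Hypothetical demonstration: four items share a common trait, two are pure noise.
rng = np.random.default_rng(1)
latent = rng.standard_normal((2000, 1))
good_items = latent + 1.5 * rng.standard_normal((2000, 4))
noise_items = 1.8 * rng.standard_normal((2000, 2))
data = np.hstack([good_items, noise_items])

start_alpha = cronbach_alpha(data)
kept, final_alpha = purify_scale(data)
print(f"alpha before: {start_alpha:.2f}, after: {final_alpha:.2f}, kept: {kept}")
```

Items that are unrelated to the total score are the first to be removed. This is the same mechanism by which middling ideal point items, which have low or negative item-total correlations, tend to be excluded.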
Dimensional Analyses
Two statistical techniques were applied to determine dimensionality. For PCA, Horn’s (1965) parallel analysis was used to evaluate the number of components to retain. In this technique, PCA eigenvalues are compared to the 95th percentile eigenvalue across 100 randomly generated data sets; PCA is performed on each random data set, which has the same data dimensions (i.e., number of simulees and scale length) as the observed data and whose variables are assumed to be uncorrelated. In our results, we examined the proportion of replications (out of 200) in which parallel analysis identified a single component or two components. In the case of nonmetric MDS, we fit models with one to five dimensions and recorded the variance accounted for (VAF). The VAF values, averaged across the 200 replications, can then be compared to determine the number of dimensions needed to account for the variability in the data. In general, researchers have used an “inverse-scree” to evaluate the location for the smallest rate of change in the VAF as the number of dimensions increases (see Borg & Groenen, 2005). In our results, we examine the average VAF for one and two dimensions.
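The parallel analysis step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the random comparison data are drawn from independent standard normal variables, components are counted from the first eigenvalue until one fails to exceed its 95th percentile threshold, and the function name and demonstration data are hypothetical rather than the authors' implementation.

```python
import numpy as np


def parallel_analysis(data, n_sets=100, percentile=95, seed=0):
    """Horn-style parallel analysis on the correlation matrix of `data`.

    Returns the number of components retained, the observed eigenvalues,
    and the percentile thresholds from the random data sets.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n, k = data.shape

    # Observed eigenvalues, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    # Eigenvalues from uncorrelated random data with the same dimensions.
    random_eigs = np.empty((n_sets, k))
    for i in range(n_sets):
        noise = rng.standard_normal((n, k))
        eigs = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        random_eigs[i] = np.sort(eigs)[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)

    # Retain leading components whose eigenvalue exceeds the threshold.
    n_retain = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break
        n_retain += 1
    return n_retain, observed, threshold


# Hypothetical demonstration: six items driven by one common factor.
rng = np.random.default_rng(42)
factor = rng.standard_normal((500, 1))
one_factor = factor + rng.standard_normal((500, 6))

n_components, observed, threshold = parallel_analysis(one_factor)
print(n_components, np.round(observed, 2), np.round(threshold, 2))
```

For dominance-type data such as this demonstration set, the procedure retains a single component. The question examined in this study is whether it does the same for unidimensional ideal point data.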
Software
All simulations and analyses were implemented in R 2.9.0. The R package utilized for MDS procedures was “smacof” (de Leeuw & Mair, 2009). We recommend that the updated version of “smacof” v. 1.2-0 published in May 2011 be used for estimating unidimensional nonmetric MDS solutions as previous versions had more problems reaching convergence.
Results
The simulation results for dimension recovery across various data processes and analytic techniques are displayed in Table 1. Because the pattern of results for the no skew and skew conditions was very similar, we present only results for the no skew condition. We focus on whether Likert scaling can help correctly identify unidimensionality and whether a unidimensional MDS solution can account for most of the variance between the items.
Table 1. Apparent Dimensionality of Ideal Point Data as a Function of Data, Data Processing Techniques, and Analytic Strategies
Note: #RO = the number of response options simulated; N = sample size; SL = scale length; α = Cronbach’s alpha for final Likert scale; d = average number of items dropped from original scale.
a. Principal components analysis: Columns denote the proportion of replications in which parallel analysis retained one or two components; numbers in parentheses denote results using tetrachoric correlations. Proportions may at times add up to more than 1.00 because of rounding. On occasion, parallel analysis retained more than two components, but for space considerations we present only the results for one and two components.
b. Nonmetric multidimensional scaling: Columns denote the average proportion of variance accounted for by the given number of dimensions.
Reliability and Scale Length After Likert Scaling
In scale construction, negatively worded items are commonly reverse scored and items with low item-total correlations are excluded. From Table 1, we see that Likert scaling led to Cronbach’s alphas in the range of .55 to .85. Scale reliabilities were naturally higher for polytomous data than for dichotomous data. From the table, we also see that on average, about three to four items were excluded from the scale. An examination of the types of items dropped showed that items in the middle of the continuum tended to be excluded: these items tended to have low or negative item-total point-biserial correlations, so removing them improved scale reliability. Of interest is that more items were excluded for dichotomous data than for polytomous data. The implication is that application of PCA to dichotomous data would be more likely to recover a single dimension, as expected in Figure 1b. On the other hand, polytomous data would more likely recover two dimensions because fewer middling items were excluded.¹
PCA
For dichotomous data, parallel analysis of non–Likert-scaled data led to the retention of one dimension a substantial proportion of the time when sample sizes were small; however, this trended to two dimensions as sample size and test length increased. Likert-scaled data improved the recovery of a single dimension for dichotomous data as parallel analysis recovered one dimension more than 90% of the time. Interestingly, when parallel analysis was applied to tetrachoric correlations for dichotomous data, it led to more frequent recovery of two dimensions rather than one, indicating that tetrachoric correlations were not useful for examining ideal point dimensionality. Tetrachoric correlations assume a dominance model underlying item responses, and apparently violations of their strong assumptions lead to problematic estimates of correlations. Importantly, as the number of simulees increased, the proportion of runs that identified two dimensions systematically increased.
Parallel analysis of the polytomous non–Likert-scaled data using PCA consistently and incorrectly identified two dimensions for the truly unidimensional ideal point data. As noted earlier, because Likert scaling procedures were less likely to remove middling polytomous items that affect dimensionality recovery, parallel analysis of Likert-scaled data produced similar results in that two dimensions were identified instead of one.
MDS
For dichotomous data, when nonmetric MDS was applied to non–Likert-scaled data, a single dimension was sufficient because the proportion of variance accounted for was close to 1.00. However, Likert scaling resulted in a lower VAF when a single dimension was specified, and instead two dimensions were often suggested. Unlike PCA for dichotomous data, MDS did not perform better when Likert scaling was applied.
For polytomous non–Likert-scaled data, a unidimensional MDS solution could sufficiently account for most of the variance among the items. After Likert scaling, a single dimension could also account for a substantial amount of variance, although these values were slightly attenuated compared to the non–Likert-scaled data.
Discussion
This study presents evidence for how unidimensionality may be obscured because of a mismatch of analytic techniques to item responding. We highlight some important results. Foremost, PCA in general showed that two dimensions were identified for raw data. Even after Likert scaling techniques were applied, two dimensions were still retained with polytomous data. Although dimensionality was reduced to a single component for dichotomous data with Likert scaling, recovery of two components was likely as the sample size increased. This suggests that if individuals respond to items in an ideal point manner, the application of Likert scaling techniques (i.e., reverse scoring and dropping items with low item-total correlations) can still result in a spurious dimension.
Interestingly, the Likert scaling procedure was effective for dichotomous items because it excluded items in the middle of the continuum and unidimensionality was effectively recovered. However, our analysis shows that Likert scaling was an ineffective procedure for polytomous data and that Likert scaling would not automatically exclude middling and moderate items. Therefore, the current belief that our scaling procedures effectively exclude ideal point items in most instances is incorrect.
Are there possible solutions given that Likert scaling procedures are not always effective? A possible solution is to generate only extreme items during scale creation. However, this requires prior knowledge of what is extreme for a subpopulation of interest (see Tay et al., 2009). Further, the use of only extreme items would not provide sufficient discrimination between individuals with moderate trait values, and ideal point responding can lead to a rejection of both extremities. For instance, an individual may reject both “very happy” and “very sad” because they do not describe his or her neutral mood state. It should also be noted that for Likert scaling to be effective, a balance of positive and negative items is required so that middling items, which are marked by low item-total correlations, can be identified. Clearly, there are limitations in using current classical test methods for identifying and excluding ideal point items. For effective item identification and exclusion, item analysis using ideal point modeling is needed.
We note that in our simulation we focused on the PCA procedure because it has been the method of interest in past ideal point modeling studies. Yet, the PCA procedure functions as a good proxy for factor analysis (FA) because there are hardly any practical differences in dimension recovery (see review by Velicer & Jackson, 1990). Nevertheless, we acknowledge that there are technical differences between the two procedures, such as variance partitioning, that may lead to lower loadings for FA compared to PCA (see review by Conway & Huffcutt, 2003). Our simulation results show the limitation of applying PCA and, by extension, FA to ideal point data.
Second, nonmetric MDS pointed to unidimensionality for raw data; this result was invariant across different types of response processes and robust to different numbers of item response options. Thus, given unprocessed data, nonmetric MDS may be a trustworthy method for studying dimensionality of a scale. On the other hand, when Likert scaling was used, MDS produced a worse fit compared to the use of unprocessed data. This shows that MDS dimensionality recovery was only effective for raw data but not Likert-scaled data.
To summarize, PCA revealed two dimensions with the exception of Likert-scaled dichotomous data. However, MDS revealed one dimension for unprocessed data.
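The spurious-dimension result can be illustrated with a minimal simulation sketch. It is not the simulation design used in this article: the Gaussian-shaped single-peaked response function, the item locations, the sample size, and the function names `simulate_ideal_point` and `pca_eigenvalues` are all assumptions made for illustration, and the response function stands in for the GGUM.

```python
import numpy as np

def simulate_ideal_point(n_persons=5000, n_items=10, seed=0):
    """Yes/no responses generated from ONE latent trait with a single-peaked curve."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_persons)    # person locations (assumed standard normal)
    delta = np.linspace(-2.0, 2.0, n_items)   # item locations (assumed evenly spaced)
    # Assumed unfolding form: endorsement is most likely when theta is near delta.
    p_yes = np.exp(-0.5 * (theta[:, None] - delta[None, :]) ** 2)
    return (rng.random(p_yes.shape) < p_yes).astype(int)

def pca_eigenvalues(responses):
    """Eigenvalues of the inter-item correlation matrix, largest first."""
    corr = np.corrcoef(responses, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

responses = simulate_ideal_point()
eigenvalues = pca_eigenvalues(responses)
print(np.round(eigenvalues[:3], 2))
```

Under these assumptions the second eigenvalue exceeds 1 even though a single latent trait generated the data, which is the pattern that would lead an eigenvalue-based rule to retain an extra component.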
Study 2
The first study showed that spurious dimensions may be recovered despite the application of scaling procedures, particularly for PCA. Yet, it may be difficult to recognize how spurious dimension recovery can affect our theoretical inferences. To anchor these issues conceptually, we analyze an emotions data set. Importantly, the structure of emotions has been debated for almost half a century (see review by Cropanzano et al., 2003) and apparently has not been resolved. Specifically, are happiness and sadness bipolar ends of a single continuum (see Russell & Carroll, 1999)? Or are they fundamentally two unipolar dimensions that have moderate inverse correlations (see Larsen, McGraw, & Cacioppo, 2001)? We suggest that individuals use ideal point responding to emotion terms and factor-analytic techniques may incorrectly reveal two ostensible orthogonal dimensions rather than a single dimension.
We used descriptors of emotional valence to describe a conceptual bipolar continuum from “very sad” to “very happy.” This conceptually conforms to the core affect space model (Russell & Carroll, 1999). We chose to use this unique format because it is recognized that conventional response formats such as “strongly disagree” to “strongly agree” are often interpreted as bipolar by the participants (Russell & Carroll, 1999); that is, the neutral point occurs close to the middle response option (e.g., “neither agree nor disagree”). Therefore, current research evaluating bipolarity uses a screening item like “Are you experiencing any happiness at all?” Respondents who endorse yes on the screening item would then proceed to rate the extent they feel happiness (e.g., Larsen et al., 2001). However, this specific procedure is not amenable to common psychometric procedures such as those evaluated here.
Instead, it is more straightforward to use a dichotomous response format with multiple items along a conceptual bipolar continuum. Moreover, because dichotomous items only have a single threshold, they are easier to interpret (e.g., the item “slightly happy” is located at a point above the item “slightly sad”). By contrast, for example, an item such as “happy” with five response options would have several thresholds for options such as “not at all happy,” “slightly happy,” “moderately happy,” and so forth, with no specific location per se for the item itself; and these response options have been found to range from negativity to positivity (Segura & Gonzalez-Roma, 2003).
Method
Participants were 224 students (46% male; 54% female) from the psychology subject pool in a large midwestern university. The average age of participants was 20.28 (SD = 11.47). Demography by race (71% White, 5% Black, 16% Asian, 8% Others) and ethnicity (95% non-Hispanic or Latino) showed some diversity. As part of a larger online survey, items were created to measure emotions along a proposed bipolar continuum. These included 10 graded adverbs of happy and sad (e.g., “very happy” to “slightly happy” to “slightly sad” to “very sad”). The list of emotion terms is presented in Table 2. All items were randomly presented and only one item was presented at a time. Participants were asked if the presented emotion term described their current feelings. Responses were recorded using a yes/no format.
Table 2. Emotions Scale: Descriptive Statistics, Generalized Graded Item Response Parameters, and Principal Components Analysis
Results
Validating Response Assumptions
A dichotomous dominance model, the two-parameter logistic model (2PLM), was fit to the data. The 2PLM is commonly used for self-reported typical behaviors such as personality (see Chernyshenko et al., 2001). This model was compared to an ideal point model, the GGUM. Initial analyses showed that the item “very sad” was not well estimated in either the ideal point or the dominance model due to low item endorsement (4%). This item was excluded from the analysis. To assess IRT fit, the adjusted χ2/df ratios (adjusted for sample size) were examined (Drasgow, Levine, Tsien, Williams, & Mead, 1995). This statistic reflects the difference between observed and model-based frequencies, and recent research shows that lower values of doubles and triples adjusted χ2/df ratios indicate relatively better IRT fit (Tay, Ali, Drasgow, & Williams, 2011). The doubles χ2 is computed from a contingency table using pairs of items i and i′, and the triples χ2 is computed using triples of items.
The 2PLM had an average adjusted χ2/df ratio of 19.35 (doubles) and 28.78 (triples), whereas the GGUM had an average adjusted χ2/df ratio of 6.52 (doubles) and 10.85 (triples). This indicated that the GGUM fit the data better than the 2PLM, providing evidence that respondents most likely used ideal point responding. The absolute fit values for the GGUM were substantially larger than 3 because the adjusted χ2/df index is sensitive to pairs of items that are very similar in content (see Tay et al., 2009). The two largest doubles adjusted χ2/df values, in both the 2PLM and GGUM, were for (a) “fairly happy” and “moderately happy” and (b) “slightly happy” and “a little happy.” When the items “fairly happy” and “slightly happy” were excluded, the average adjusted χ2/df ratio for the GGUM was 3.72 (doubles) and 3.89 (triples), whereas the 2PLM had average adjusted χ2/df ratios of 6.08 (doubles) and 8.34 (triples).
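For readers unfamiliar with the sample-size adjustment, the following sketch shows one common way the adjusted ratio is computed. The formula is an assumption here: it is the rescaling to a reference sample of 3,000 usually attributed to Drasgow et al. (1995), and this article does not spell it out. The function name and the input values are hypothetical and are not taken from the present data.

```python
def adjusted_chi2_df(chi2, df, n, reference_n=3000):
    """Rescale a chi-square to a reference sample size, then divide by df.

    Assumed formula: adjusted chi2 = reference_n * (chi2 - df) / n + df.
    """
    if df <= 0 or n <= 0:
        raise ValueError("df and n must be positive")
    adjusted_chi2 = reference_n * (chi2 - df) / n + df
    return adjusted_chi2 / df

# Hypothetical doubles chi-square of 12.0 on 4 df from 224 respondents.
print(round(adjusted_chi2_df(12.0, 4, 224), 2))
```

Because the excess of χ2 over its degrees of freedom is scaled up when the sample is smaller than the reference size, modest raw misfit in a small sample can translate into a large adjusted ratio.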
Table 2 shows the estimated graded response item parameters. Clearly, the item locations (δ) correspond with the grading adverbs where more extreme emotions are located at the end poles and less extreme emotions are closer to the middle of the continuum. This closely corresponds with the first component from the PCA solution, as would be expected from ideal point data.
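The contrast between the two response models can be made concrete with a short sketch of their item response functions. The dichotomous GGUM expression follows the standard published form of the model with discrimination α, location δ, and threshold τ; the parameter values and function names below are hypothetical and are not estimates from Table 2.

```python
import math

def p_2pl(theta, a, b):
    """Dominance model: endorsement probability rises monotonically with theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_ggum(theta, alpha, delta, tau):
    """Dichotomous GGUM: endorsement probability peaks where theta equals delta."""
    d = theta - delta
    # "Agree" from below and from above the item location.
    agree = math.exp(alpha * (d - tau)) + math.exp(alpha * (2 * d - tau))
    # "Disagree" from below and from above the item location.
    disagree = 1.0 + math.exp(3 * alpha * d)
    return agree / (agree + disagree)

# Hypothetical item located at the middle of the continuum.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(theta, round(p_2pl(theta, 1.5, 0.0), 3), round(p_ggum(theta, 1.5, 0.0, -1.0), 3))
```

The 2PLM curve keeps increasing as θ increases, whereas the GGUM curve is symmetric around δ, so respondents far above the item location are as unlikely to endorse it as respondents far below it.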
Table 3. Estimated Item Configurations From Confirmatory Factor Analysis (CFA) and Multidimensional Scaling (MDS)
Scale Reliability
Reliability analyses were conducted on the emotion scale excluding the item “very sad.” The reliability of the raw data was .48. Likert scaling increased the reliability to .75. No items were excluded from the process.
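The effect of reverse scoring on reliability can be sketched as follows. The coefficient is Cronbach's alpha computed from item and total-score variances; the tiny yes/no data set (two "happy" items and one "sad" item) and the helper names are hypothetical and serve only to show the mechanics, not to reproduce the values reported above.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha from a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def reverse_score(rows, reversed_items, low=0, high=1):
    """Reverse-key the listed item indices on a low..high response scale."""
    return [[(low + high - x) if j in reversed_items else x
             for j, x in enumerate(row)] for row in rows]

# Hypothetical responses: columns are happy_1, happy_2, sad_1 (1 = yes, 0 = no).
raw = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [1, 0, 0]]
scaled = reverse_score(raw, reversed_items={2})

alpha_raw = cronbach_alpha(raw)
alpha_scaled = cronbach_alpha(scaled)
print(round(alpha_raw, 2), round(alpha_scaled, 2))
```

Reverse scoring aligns the direction of the negatively keyed item with the others, which raises the total-score variance relative to the item variances and therefore raises alpha.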
PCA and Factor Analysis
The PCA eigenvalues for the first three components were 3.17, 2.18, and 1.05, respectively. As expected of unidimensional ideal point data, parallel analyses determined two components for extraction. Confirmatory factor analysis (CFA) was applied to raw data because Likert scaling implicitly assumes that positive and negative emotions are reflections of the same construct. When a one-dimensional model was fit to the data, the analysis did not converge, which suggests that the model was misspecified. A two-dimensional model of positive and negative valence produced moderate fit (χ2 (26) = 130.95, Comparative Fit Index [CFI] = .94, Tucker-Lewis Index [TLI] = .91, root mean square error of approximation [RMSEA] = .134). As shown in Table 3, the factor loadings revealed that the item “very happy” (λ = .22) did not load highly on the positive valence factor whereas all the other loadings were higher than .82. Excluding this item led to an excellent fit (χ2 (19) = 29.89, CFI = .99, TLI = .99, RMSEA = .051). Importantly, the correlation between positive and negative valence was –.31 in the initial model and –.27 in the final model. This would lead to an interpretation that positive and negative valence indicators reflect two unipolar constructs, whereas the IRT analyses—which take into account the item response process—clearly indicate one latent dimension.
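The parallel analysis step can be sketched as below. This is a generic version of Horn's procedure that compares observed eigenvalues with a percentile of eigenvalues from random normal data of the same size; the number of random draws, the 95th percentile criterion, and the demonstration data (two independent factors with four indicators each) are assumptions for illustration. For dichotomous items such as those analyzed here, comparison data that match the item margins may be more appropriate.

```python
import numpy as np

def sorted_eigenvalues(data):
    """Eigenvalues of the correlation matrix, largest first."""
    return np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

def parallel_analysis(data, n_draws=200, percentile=95, seed=0):
    """Count leading components whose eigenvalues exceed the random-data criterion."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = sorted_eigenvalues(data)
    random_eigs = np.array([sorted_eigenvalues(rng.standard_normal((n, p)))
                            for _ in range(n_draws)])
    threshold = np.percentile(random_eigs, percentile, axis=0)
    exceeds = observed > threshold
    # Stop at the first component that fails to beat its random counterpart.
    n_keep = p if exceeds.all() else int(np.argmin(exceeds))
    return n_keep, observed, threshold

# Hypothetical demonstration data: 2 factors, 4 indicators each, loadings of .8.
rng = np.random.default_rng(1)
loadings = np.zeros((8, 2))
loadings[:4, 0] = 0.8
loadings[4:, 1] = 0.8
factors = rng.standard_normal((1000, 2))
demo = factors @ loadings.T + 0.6 * rng.standard_normal((1000, 8))

n_keep, observed, threshold = parallel_analysis(demo)
print(n_keep)
```

The procedure answers only how many components exceed chance; as the analyses in this section show, it cannot tell whether a retained component reflects a substantive dimension or an artifact of the response process.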
MDS
On the other hand, MDS showed that a single dimension may be sufficient for accounting for the data. The variance accounted for by a single dimension was .86, whereas two dimensions accounted for all the variance in the data. This pattern matches the results from our simulation study. Although MDS pointed to a single dimension, the resultant spatial configuration was incongruous with our conceptual expectations. As seen in Table 3, although positive and negative valence items were located on opposite ends, more extreme positive and negative valenced items were closer spatially as compared to less extreme positive and negative valenced items.
General Discussion
This study contributes to organizational and psychological research in showing that ignoring the item response process can lead to theoretical, statistical, and substantive problems. Foremost, we have argued that psychological constructs that are self-reports of typical behaviors generally conform to an ideal point response process and it is important to use the appropriate statistical model that provides a theoretical match. However, the severity of the statistical and substantive implications has been questioned. This is because it is believed our current scaling procedures exclude most items that are located in the middle of the continuum, which are most characteristic of the ideal point model. Yet, we showed that incorrect inference of dimensionality can occur and continue to persist even after we apply Likert scaling procedures, particularly for PCA, which may be generalized to factor-analytic procedures. Substantively, we presented empirical evidence that the ideal point response process may underlie the longstanding debate on the bipolarity of affect.
In this section, we present a decision table to circumvent this potential problem of recovering a spurious dimension that can affect our substantive interpretations. We present potential areas for research and conclude by calling for the development of new analytic techniques to assess data dimensionality.
Detecting the Potential Spurious Dimension: A Decision Table
To circumvent the potential problem of recovering a spurious dimension, we advocate some simple steps for researchers to undertake, as summarized in Table 4. First, we propose that researchers evaluate whether the construct of interest falls under the class of maximal performance behaviors or self-reported typical behaviors using the theoretical framework proposed in the introduction. Given the latter behavior class, the recovery of a spurious dimension may be a potential problem because individuals would likely use an ideal point response process. We note that dominance dichotomous data analyzed using factor-analytic approaches could potentially lead to a difficulty factor, but this is a result of an inappropriate analytic technique for dichotomous data rather than the response process.
Table 4. Proposed Analytic Steps for Determining Whether Recovered Dimension Is Spurious
Second, it is important to determine whether the end poles of the construct are recovered as two separate dimensions. As mentioned in our current theoretical framework, self-reported typical behaviors may be either bipolar or unipolar constructs. Interestingly, although some constructs are defined as unipolar—that is, a construct that ranges from nonoccurrence (or nonexistent) to frequent occurrence (or extreme)—these scales have reverse-worded items that may render such constructs to be operationally bipolar. For conceptually bipolar and operationally bipolar constructs, we propose that if the application of PCA or FA with oblique rotation yields two dimensions, which are defined by the conceptual ends of our bipolar construct, there is a likelihood that the number of factors extracted may be incorrect. For a unipolar construct, if the application of PCA or FA with oblique rotation yields two dimensions that are defined by the conceptual ends of the unipolar construct (i.e., nonoccurrence and frequent occurrence; nonexistent and extreme), it is also likely that the number of factors extracted may be incorrect. The number of factors extracted may also depend on the number of items at each end of the continuum. For example, when there is only one reverse-worded item on a scale, there may be little evidence for such items loading on a separate factor.
Third, we can examine the component or factor correlation for disconfirming evidence of bipolarity. If it is close to zero or negative, it is possible that the extracted factors could be part of a single continuum. If it is moderately or strongly positive, it is unlikely that the factors are part of a single continuum and they may be substantively different. For example, it has been shown that although vocational interest can be classified as self-reported typical behaviors (Tay et al., 2009), and that two factors are extracted from a conceptual bipolar People–Things dimension, these factors are positively correlated and better construed as two separate factors (Tay, Su, et al., 2011).
Fourth, we can further examine the unrotated PCA loading plot to determine whether we recover a semicircular factor loading plot as shown in Figure 1. We would expect middling items (i.e., moderate items for unipolar constructs; neutral items for bipolar constructs) to have higher loadings on the second dimension, which is likely spurious. On the other hand, we would expect more extreme items of the construct to have high opposing loadings on the first dimension. For example, a unipolar construct like happiness may have negative loadings for “slightly happy” and positive loadings for “very happy,” whereas a bipolar valence construct may have negative loadings for “very sad” and positive loadings for “very happy.” This would be a strong indication that the additional recovered dimension is spurious.
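The fourth step can be sketched by computing unrotated component loadings and listing them by item location. The data here are simulated under assumptions that are not part of this article: a single latent trait, a Gaussian-shaped single-peaked response function in place of the GGUM, and 10 evenly spaced item locations. With real data the item locations are unknown, so the conceptual ordering of the items would take the place of `delta`.

```python
import numpy as np

# Simulated yes/no ideal point responses from one latent trait (assumed model).
rng = np.random.default_rng(0)
theta = rng.standard_normal(5000)
delta = np.linspace(-2.0, 2.0, 10)
p_yes = np.exp(-0.5 * (theta[:, None] - delta[None, :]) ** 2)
responses = (rng.random(p_yes.shape) < p_yes).astype(int)

# Unrotated loadings: eigenvectors scaled by the square root of their eigenvalues.
corr = np.corrcoef(responses, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
top_two = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, top_two] * np.sqrt(eigvals[top_two])

for location, (pc1, pc2) in zip(delta, loadings):
    print(f"delta={location:+.2f}  PC1={pc1:+.2f}  PC2={pc2:+.2f}")
```

Plotting the first column against the second gives the loading plot described in this step. In this sketch the two most extreme items take opposite signs on the first component; the sign of each component as a whole is arbitrary, so only the relative pattern across items should be interpreted.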
Finally, if all the prior steps indicate a possible spurious dimension, we can confirm the item response process by comparing the fit of an ideal point response model with a dominance model. This follows past procedures (e.g., Stark et al., 2006) and a simulation study that shows the validity of such an approach (Tay, Ali, et al., 2011). Finding a better fit for an ideal point response model would confirm that an ideal point response process was used and resulted in the recovery of a spurious dimension when PCA or FA was applied. When comparing model fit, it is also helpful to know whether individuals responded to scale items under evaluative conditions (e.g., selection settings) because they may be motivated toward self-presentation, which may lead to dominance-like responding (O’Brien & LaHuis, 2011). Under nonevaluative conditions, it is important for participants to be motivated to respond accurately to uncover ideal point responding.
Potential Areas of Application in Organization Science
The application of these procedures can be used to examine various constructs within organization science, psychology, or other fields that use self-reports of typical behaviors. For example, it has been suggested that the ostensible orthogonality of political attitudes may be due to ideal point responding (e.g., van Schuur & Kiers, 1994). Within organization science, we suggest two areas for investigation: work–family interface and job performance. These constructs are a sample of a wide variety of constructs that can be further examined using ideal point modeling and more innovative methodologies. In the following, we present some issues that researchers can consider when examining these constructs.
Recent work on work–family interface (Allen, in press) has focused on the positive and negative interdependencies between work and family, termed positive and negative spillover, respectively. Greenhaus and Powell (2006) found that the average correlation between positive and negative spillover from 15 studies was very small in magnitude, suggesting that positive and negative spillover should be construed as independent constructs rather than bipolar. Yet, this current conceptualization may not be accurate if ideal point responding underlies responses to positive and negative spillover.
A careful examination of dimensionality may require revisiting current work–family scales to determine whether positive and negative spillover reflect oppositional processes on the same set of psychological resources. Specifically, Greenhaus and Powell (2006) presented a list of resources that can spill over between roles. These include skills and perspectives, psychological and physical resources, social-capital resources, flexibility, and material resources. Despite a clear demarcation of resources, existing scales can potentially confound different resources with spillover valence (positive or negative). For instance, in a national longitudinal study of Americans (MIDUS), the work–family scale for positive (or negative) spillover does not have an antithetical equivalent for negative (or positive) spillover. More concretely, a negative work-to-family spillover item, “Your job makes you feel too tired to do the things that need attention at home,” does not have a positive work-to-family spillover item asking about whether a job enhances energy or motivation to do things that need attention at home. We propose that for more definitive examinations of construct dimensionality—particularly bipolarity—it is necessary to use theory to focus on a single operating process rather than multiple processes.
Scholars have also been interested in the relationship between organizational citizenship behaviors (OCBs) and counterproductive work behaviors (CWBs) (see the review by Dalal, 2005). Specifically, are OCBs the opposite of CWBs, or are they separate constructs? The current thought is that OCBs and CWBs are likely to be unrelated (e.g., Spector, Bauer, & Fox, 2010). Interestingly, the meta-analytic finding has been that OCBs and CWBs have an overall inverse relation of –.32, with values ranging from –.55 to –.23 depending on the response format. These correlations are similar to values found in the past when examining the bipolarity of emotions. For example, Green, Goldman, and Salovey (1993) found that the observed correlations for happy and sad scales ranged from –.69 to –.25.
Aside from conceptual issues underlying OCBs and CWBs (see Spector & Fox, 2010), the type of response format appears to affect the degree to which OCBs and CWBs are related. Scales with a behavioral frequency format have lower average correlations (r = –.23) as compared to scales with an agreement format (r = –.55) (Dalal, 2005). One interpretation is that frequency ratings contain fewer halo effects as compared to agreement ratings because they focus less on the rater’s attitude toward the behavior than on the specific behavior itself (Spector et al., 2010). Another plausible interpretation is that frequency ratings are based on a unipolar response format (“not at all” to “very frequently”), which lowers the correlation between two antithetical items. Even when OCBs and CWBs are bipolar and mutually exclusive, consider that not engaging in OCBs (“not at all”) still allows for endorsements of any frequency of CWBs (“not at all” to “very frequently”) and vice versa. This leads to an L-shaped bivariate response pattern between two variables, resulting in an attenuated correlation (Russell & Carroll, 1999). On the other hand, an agreement format has a bipolar response format and does not have the aforementioned problem. However, these formats may inherently assume bipolarity when bipolarity is something that needs to be tested (Russell & Carroll, 1999). Clearly, more research needs to distinguish substantive effects from scale response format effects.
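The attenuation produced by an L-shaped response pattern can be shown with a small simulation sketch. It assumes a single, normally distributed bipolar continuum and two unipolar ratings that each register only one side of it; these assumptions, the variable names, and the sample size are illustrative and are not taken from the studies cited above.

```python
import random

def pearson(x, y):
    """Pearson correlation computed from raw sums."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(0)
latent = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # one bipolar continuum
ocb = [max(v, 0.0) for v in latent]    # unipolar rating of the positive pole
cwb = [max(-v, 0.0) for v in latent]   # unipolar rating of the negative pole

r = pearson(ocb, cwb)
print(round(r, 2))
```

Although the two ratings are perfectly bipolar and mutually exclusive by construction, their correlation is only about –.47 under these assumptions rather than –1, so a moderate negative correlation between unipolar ratings is not by itself evidence against bipolarity.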
In the following, we summarize the common themes and highlight other related issues when evaluating construct dimensionality:
A clear definition of the nature of the latent continuum needs to be developed and antithetical scale items should not be confounded by other factors.
Relatedly, researchers need to use innovative methods to develop scale items that sample the continuum in order to determine whether the continuum is indeed bipolar or not. Strong evidence of bipolarity would empirically demonstrate that the continuum is bipolar. Indicators along this continuum reveal the nature of bipolarity. For example, we show that intensity differentiates points along the emotion valence continuum.
Scale format can create potential problems when inferring bipolarity using correlations.
When two established constructs are being tested for bipolarity, we generally do not Likert scale all the indicators because this presumes unidimensionality. Instead, we encourage researchers to use the decision table to determine dimensionality.
It is insufficient to show that inversely related constructs are distinct because they have different antecedents and consequences. If both constructs are part of a latent continuum so that where one ends the other begins, incremental predictive validity is expected.
At the item level, we propose that researchers should develop a conceptual and methodological basis to determine whether reverse-worded items match our construct of interest. In so doing, tests of item bipolarity can be developed.
Call for Alternative Analytic Techniques
As presented in the simulations and the empirical analysis, currently available techniques have limited accuracy for ascertaining dimensionality when the data are truly ideal point. The simulation study showed that nonmetric MDS may be robust in determining data dimensionality. However, when empirical data were examined, the spatial configuration of the nonmetric MDS was conceptually incorrect. Nevertheless, this research is particularly suggestive of the use of MDS and other related procedures such as unfolding MDS. We are currently conducting additional simulation studies to examine the differences between the recovered loadings for ideal point and dominance data and how they map on to underlying item characteristics.
Although we have presented analytic steps to help researchers circumvent such problems when examining unidimensional constructs, less is known about the accuracy of dominance models in recovering the correct number of dimensions when one seeks to extract multiple dimensions simultaneously. For example, there have been different factor-analytic solutions proposed for personality, such as a five-factor solution (Digman, 1990), a six-factor solution (Ashton et al., 2004), or a seven-factor solution (Benet & Waller, 1995). Clearly, this is an important impetus for developing multidimensional ideal point models.
Certainly, there have been some heuristics developed to assess ideal point data unidimensionality (e.g., Davison, 1977; Maraun & Rossi, 2001; van Schuur & Kiers, 1994), but there has not been a rigorous statistic developed for this purpose. Recently, there have been proposals to use a Q3 statistic for the GGUM that shows some promise as a test for unidimensionality and local independence (Habing, Finch, & Roberts, 2005). Another possibility is the modification of the conditional covariance statistic, utilized in item response theory, to assess data unidimensionality (e.g., Guo & Tay, 2010). An anonymous reviewer also kindly pointed out the potential use of nonlinear factor analysis (McDonald, 1967) and correspondence analysis (CA) for ideal point data (Polak, Heiser, & de Rooij, 2009). Indeed, these are potential areas that may be fruitful in creating a new set of tools to assess the dimensionality of self-reported typical behaviors. Yet, more work remains to be done. For instance, we need to develop models of nonlinear factor analysis that go beyond common link functions (e.g., log, logit, or inverse) to take into account the unfolding relationship between indicators and the latent variable. Because CA scales both individuals and items simultaneously, it has also been recognized that CA is “unconventional” and requires a visual interpretation to determine a one-dimensional solution similar to PCA (Polak et al., 2009, p. 3117). Given the importance of factorial validity as part of construct validity, we call for more research in this area.
Conclusion
Our article draws on three perspectives—theoretical, statistical, and substantive—to demonstrate the importance of accounting for the response process. Although an additional dimension may be recovered from unidimensional self-reported typical behaviors, we present analytic steps to help researchers circumvent this problem. We also suggest how these findings and proposed procedures can be applied to assess the dimensionality of constructs in organizational research. We also call for the development of analytic techniques to assess the dimensionality of self-reported typical behaviors. We hope that this article contributes to the measurement literature by raising awareness about the limitations of current techniques (particularly for PCA and FA) and renewing the theoretical and methodological interest in scale construction and analysis of psychological and organizational constructs.
Footnotes
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The author(s) received no financial support for the research, authorship, and/or publication of this article.
