Abstract
The assessment of the number of dimensions and the dimensionality structure of questionnaire data is important in scale evaluation. In this study, the authors evaluate two dimensionality assessment procedures in the context of Mokken scale analysis (MSA), using a so-called fixed lowerbound. The comparative simulation study, covering various theoretically and empirically relevant conditions, indicates that the MSA procedures may result in scales that are inconsistent with the dimensionality of the data set at hand. That is, a single Mokken scale can be multidimensional, and two Mokken scales can pertain to a single dimension. In an illustrative evaluation, MSA using a range of lowerbounds, rather than a fixed lowerbound, was shown to have some benefits, but not to solve all limitations. The results of this study imply that MSA is perfectly suitable to create Mokken scales. However, MSA appears of limited value as a dimensionality assessment method.
An essential part of the psychometric analyses of questionnaire data is the assessment of the dimensionality. The dimensionality is defined as the minimum number of latent traits that is needed to describe the statistical dependency in the data (Zhang & Stout, 1999a). Because the dimensionality refers to the data, it is dependent on the characteristics of the items and the respondents (Reckase, 2009). Many researchers are interested in the dimensionality of a questionnaire in a population. By applying a specific dimensionality assessment procedure on sample data they try to get an indication of the dimensionality in the population of interest. When the data are described by more than a single dimension, researchers are usually also interested in the dimensionality structure of their data (Zhang & Stout, 1999b), that is, which items are related to which trait(s). In what follows, the general term dimensionality will be used to indicate the number of latent traits involved and the dimensionality structure.
To assess the dimensionality of questionnaire data, multiple methods have been developed. In recent years, there has been an increasing interest in the use of nonparametric item response theory (IRT; see Embretson & Reise, 2000) methods, such as exploratory Mokken scale analysis (MSA; e.g., Sijtsma, Meijer, & van der Ark, 2011). MSA (Mokken, 1971) has been used as a dimensionality assessment method in different research areas in the social sciences, political sciences, psychiatry, and marketing (Zijlstra, van der Ark, & Sijtsma, 2011). For instance, MSA was used to assess the dimensionality of the Social Participation Questionnaire (SPQ; Koster, Timmerman, Nakken, Pijl, & van Houten, 2009), the Brief Symptom Inventory–18 (BSI-18; Meijer, de Vries, & van Bruggen, 2011), the Hospital Anxiety and Depression Scale (HADS; Emons, Sijtsma, & Pedersen, 2010), and the SERVQUAL instrument (Paas & Sijtsma, 2008).
The popularity of nonparametric methods, such as MSA, in empirical practice is not surprising. First, nonparametric methods are generally based on less restrictive assumptions than parametric methods (Junker & Sijtsma, 2001), and therefore, they are more likely to fit empirical data than parametric methods. Furthermore, nonparametric methods are less sensitive to small sample sizes than parametric methods (Stout, 2001). For a recent overview of MSA and its assumptions, see Sijtsma et al. (2011).
The comparative performance of different nonparametric dimensionality assessment methods has been investigated in various simulation studies. Those studies suggest that MSA is inferior to alternative nonparametric methods in recovering the correct dimensionality structure (Mroch & Bolt, 2006; van Abswoude, van der Ark, & Sijtsma, 2004). In particular, those studies revealed that MSA does not function well in conditions in which the traits correlate (Mroch & Bolt, 2006; van Abswoude et al., 2004), or in which the items load on more than one trait (van Abswoude et al., 2004). Although those test conditions are the rule rather than the exception in empirical practice (van Abswoude, Vermunt, & Hemker, 2007), after publication of these articles, numerous researchers have still used MSA as a dimensionality assessment tool (e.g., Bech, Fava, Trivedi, Wisniewski, & Rush, 2011; Chen, Tseng, Hu, & Koh, 2010; Doyle, Conroy, McGee, & Delaney, 2010; Emons et al., 2010; Koster et al., 2009; Meijer et al., 2011; Ordoñez, Ponsoda, Abad, & Romero, 2009; Roorda, Houwink, Smits, Molenaar, & Geurts, 2011; Sousa et al., 2010). Furthermore, MSA is still recommended to assess unidimensionality (Sijtsma et al., 2011). The authors suspect that one of the reasons for this continued use and recommendation of MSA as a dimensionality assessment method is that previous studies have mainly focused on the comparison of different nonparametric IRT methods, and to a lesser degree on the mechanisms behind MSA that cause MSA to be insufficient for dimensionality assessment. In particular, the authors conjecture that the continued use of MSA as a dimensionality assessment method might be due to a misunderstanding about what a Mokken scale reflects, namely, that it would necessarily consist of items that measure a distinct single underlying trait. The latter would be necessary for MSA to function well as a dimensionality assessment method.
Therefore, in this study, the authors focus on the mechanisms of MSA that underlie its underperformance as a dimensionality assessment tool.
Overview
The aim of the present article is to examine and explain the performance of MSA as a dimensionality assessment method in a range of conditions that are relevant from a theoretical and/or empirical perspective. For the latter, the authors focus on data structures that have been shown to be particularly relevant in the personality field. The authors extend the literature on the performance of MSA by discussing the mechanisms that underlie its underperformance as a dimensionality assessment tool and by including a recently proposed method that is based on a genetic algorithm (GA; Straat, van der Ark, & Sijtsma, in press).
The remainder of this article is organized as follows. First, the Monotone Homogeneity Model (MHM) and the scalability coefficient H on which MSA is based are introduced. Second, the conditions that must be satisfied for a set of items to form a Mokken scale are presented, and the most commonly used item selection methods to create Mokken scales are discussed. Third, the meaning of a Mokken scale is explained. Fourth, the simulation study is presented, in which the performance of MSA as a dimensionality assessment method is examined, using a fixed lowerbound. A range of conditions is included in which it is speculated that MSA has difficulties in recovering the dimensionality structure. Fifth, the value of MSA using a range of lowerbounds to assess the dimensionality is considered. For illustrative purposes, an empirical example is presented in which the use of MSA, with a fixed lowerbound and a range of lowerbounds, as a dimensionality assessment method appears less suitable than the use of an alternative method, namely, factor modeling. Finally, the implications of the findings are discussed.
MSA
MHM
MSA is based on the MHM. The MHM is based on the assumptions of unidimensionality, local independence, and monotonicity of the item response function (IRF). An IRF describes the probability of an affirmative response to (an answer category of) an item, as a function of the latent trait scores. If the MHM holds, persons can be ordered on the latent trait using their ordering on the total score of the questionnaire.
Scalability Coefficients
To explore whether the MHM fits for a given set of items, the scalability coefficient H is commonly used. There are three versions of the scalability coefficient, namely, between two items $X_i$ and $X_j$ ($H_{ij}$), between an item $X_i$ and the remaining items of the item set ($H_i$), and for the item set ($H$).
The scalability coefficient $H_{ij}$ of items $X_i$ and $X_j$ is defined as
$$H_{ij} = \frac{\mathrm{Cov}(X_i, X_j)}{\mathrm{Cov}(X_i, X_j)_{\max}},$$
where $\mathrm{Cov}(X_i, X_j)$ is the covariance between items $X_i$ and $X_j$; $\mathrm{Cov}(X_i, X_j)_{\max}$ is the maximum covariance between items $X_i$ and $X_j$, given the marginal distributions of the items $X_i$ and $X_j$; and hence, $H_{ij}$ is the ratio between these two.
The scalability coefficient $H_i$ of a single item $X_i$ with respect to the other items in the item set is defined as
$$H_i = \frac{\mathrm{Cov}(X_i, R_{(i)})}{\mathrm{Cov}(X_i, R_{(i)})_{\max}},$$
where $R_{(i)}$ is the total rest score (the unweighted total score minus the score of item $X_i$), $\mathrm{Cov}(X_i, R_{(i)})$ is the covariance between item $X_i$ and the total rest score $R_{(i)}$, $\mathrm{Cov}(X_i, R_{(i)})_{\max}$ is the maximum covariance between item $X_i$ and the total rest score $R_{(i)}$, and hence, $H_i$ is the ratio between these two.
The scalability coefficient $H$ is defined as the aggregated item scalability coefficients over the item set, namely, as
$$H = \frac{\sum_{i} \mathrm{Cov}(X_i, R_{(i)})}{\sum_{i} \mathrm{Cov}(X_i, R_{(i)})_{\max}}.$$
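The three coefficients can be illustrated with a minimal Python sketch. It is not the implementation used in MSP5.0 or in the R package; the function names `max_cov` and `scalability` are hypothetical. The sketch assumes that the maximum covariance given the marginals is obtained by pairing the sorted scores of the two items, and, following common convention (an assumption here), that the maximum covariance with the rest score is taken as the sum of the pairwise maximum covariances. Items are assumed to have nonzero variance.

```python
import numpy as np

def max_cov(x, y):
    """Largest covariance attainable given both marginals: pair the sorted scores."""
    return np.cov(np.sort(x), np.sort(y))[0, 1]

def scalability(X):
    """Return (H_ij matrix, H_i vector, H) for an n-by-k matrix of item scores."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    cov = np.cov(X, rowvar=False)
    cmax = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            cmax[i, j] = cmax[j, i] = max_cov(X[:, i], X[:, j])
    off = ~np.eye(k, dtype=bool)             # mask for the off-diagonal cells
    Hij = np.full((k, k), np.nan)             # diagonal is left undefined
    Hij[off] = cov[off] / cmax[off]
    # Cov(X_i, R_(i)) equals the sum of the covariances of item i with all other items.
    num = np.where(off, cov, 0.0).sum(axis=1)
    den = cmax.sum(axis=1)                    # assumed convention: sum of pairwise maxima
    Hi = num / den
    H = num.sum() / den.sum()
    return Hij, Hi, H
```

For a perfect Guttman pattern all three coefficients equal 1, whereas for two statistically independent items they equal 0.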
Mokken Scale Conditions
MSA aims at finding Mokken scales. Mokken (1971) defines a set of items as a Mokken scale if it meets two requirements:
1. The covariances between the items are nonnegative: $\mathrm{Cov}(X_i, X_j) \geq 0$ for all $i \neq j$.
2. The scalability coefficients for all items are larger than or equal to a positive lowerbound c: $H_i \geq c > 0$ for all $i$.
The rationale behind those two requirements is as follows: The first Mokken scale condition stems from the requirement that items must comply with the MHM. As under the MHM the covariance of each item pair is nonnegative in the population, negative covariances are in conflict with the MHM (Sijtsma & Molenaar, 2002). The second Mokken scale condition stems from the desire to construct scales with items that discriminate sufficiently between persons. As items with positive but small $H_i$ values contribute little to a reliable person ordering, $H_i$ values are required to be larger than or equal to a specified lowerbound c (Sijtsma & Molenaar, 2002).
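As a rough illustration, the two scale conditions can be checked on sample data as sketched below. The helper names `pairwise_cov_tables` and `is_mokken_scale` are hypothetical, the check is purely descriptive (no significance testing), and the item coefficient is computed from pairwise covariances and pairwise maximum covariances, which is an assumed convention rather than something stated in the text.

```python
import numpy as np

def pairwise_cov_tables(X):
    """Observed and maximum pairwise covariances, with zero diagonals."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    cov = np.cov(X, rowvar=False)
    np.fill_diagonal(cov, 0.0)
    cmax = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            # Maximum covariance given the marginals: pair the sorted scores.
            cmax[i, j] = cmax[j, i] = np.cov(np.sort(X[:, i]), np.sort(X[:, j]))[0, 1]
    return cov, cmax

def is_mokken_scale(X, c=0.3):
    """Check both scale conditions for the items in X; returns (verdict, H_i values)."""
    cov, cmax = pairwise_cov_tables(X)
    off = ~np.eye(cov.shape[0], dtype=bool)
    cond1 = bool(np.all(cov[off] >= 0))        # Condition 1: nonnegative covariances
    Hi = cov.sum(axis=1) / cmax.sum(axis=1)    # item scalability coefficients
    cond2 = bool(np.all(Hi >= c))              # Condition 2: every H_i at least c
    return cond1 and cond2, Hi
```

With the default c = .3, a perfect Guttman pattern passes both conditions, whereas two independent items satisfy the first condition but fail the second.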
The Aim of MSA
The aim of MSA is to find, from an item set, one or more subsets of items that each meet the Mokken scale conditions. Such subsets of items are commonly denoted as Mokken scales. The Mokken scale conditions are necessary but insufficient conditions to create unique Mokken scales. That is, there may be multiple subsets of items that meet the Mokken scale conditions. Therefore, further conditions must be specified to ensure the uniqueness of the partitioning of items into subscales. One such additional condition is the scale length. Mokken (1971) proposed to select as many items as possible in a scale that satisfy the scale conditions. Nonetheless, the condition of scale length is still insufficient to obtain a unique partitioning. To address this issue, different so-called item selection methods have been proposed.
Item Selection Methods
The available item selection methods can be divided into two main classes, namely, heuristic procedures and procedures based on a specific objective function. The latter may differ with respect to the algorithm used to optimize the objective function.
The most popular item selection method is the automated item selection procedure, here referred to as MSA-AISP. MSA-AISP is implemented in MSP5.0 (Molenaar & Sijtsma, 2000) and in R (Straat et al., in press; van der Ark, 2007). The MSA-AISP is a sequential item selection method. The procedure consists of the following steps:
1. The item pair with the highest positive $H_{ij}$ value is selected.
2. From the pool of the remaining items, the item is selected that
   (a) covaries positively with the items already selected (Scale Condition 1),
   (b) has an $H_i$ value with respect to the already selected items that is significantly larger than zero and is equal to or larger than the predefined lowerbound c (Scale Condition 2), and
   (c) maximizes the total H, considering the items already selected.
Subsequently, Step 2 is repeated until there are no items left that fulfill the conditions for being selected. From the remaining items, if any, a new scale is formed, following the same steps as described above. This forming of additional scales continues until there are no items left that meet Conditions 2a through 2c. Any remaining items are denoted as being unscalable (Sijtsma & Molenaar, 2002).
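The sequential logic of the steps above can be sketched in Python as follows. This is a simplified illustration, not the MSP5.0 or R implementation: the function names are hypothetical, the significance test in Condition 2b is omitted, and the starting pair is additionally required to reach the lowerbound c (an assumption, so that the two-item scale already satisfies Scale Condition 2). Items are assumed to have nonzero variance.

```python
import numpy as np

def cov_tables(X):
    """Observed and maximum pairwise covariances (diagonals set to zero)."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    cov = np.cov(X, rowvar=False)
    np.fill_diagonal(cov, 0.0)
    cmax = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            cmax[i, j] = cmax[j, i] = np.cov(np.sort(X[:, i]), np.sort(X[:, j]))[0, 1]
    return cov, cmax

def aisp_sketch(X, c=0.3):
    """Return one label per item: 1, 2, ... for successive scales, 0 for unscalable."""
    cov, cmax = cov_tables(X)
    labels = np.zeros(cov.shape[0], dtype=int)
    remaining = list(range(cov.shape[0]))
    scale = 0
    while len(remaining) >= 2:
        # Step 1: starting pair with the highest H_ij among the remaining items.
        pairs = [(cov[i, j] / cmax[i, j], i, j)
                 for a, i in enumerate(remaining) for j in remaining[a + 1:]]
        h_best, i, j = max(pairs)
        if h_best <= 0 or h_best < c:
            break
        selected = [i, j]
        while True:
            best = None
            for cand in remaining:
                if cand in selected:
                    continue
                # 2a: positive covariance with every item already selected.
                if np.any(cov[cand, selected] <= 0):
                    continue
                # 2b: H_i of the candidate relative to the selected items is at least c.
                if cov[cand, selected].sum() / cmax[cand, selected].sum() < c:
                    continue
                # 2c: keep the candidate that maximizes the total H of the scale.
                trial = selected + [cand]
                idx = np.ix_(trial, trial)
                h_total = cov[idx].sum() / cmax[idx].sum()
                if best is None or h_total > best[0]:
                    best = (h_total, cand)
            if best is None:
                break
            selected.append(best[1])
        scale += 1
        labels[selected] = scale
        remaining = [r for r in remaining if r not in selected]
    return labels
```

Because each scale grows from a single starting pair and items are never removed once selected, the sketch makes visible why the outcome depends heavily on the starting set of items.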
A disadvantage of the heuristic MSA-AISP is that it does not consider all possible combinations of item pairs and, therefore, it relies heavily on the starting set of items. As a consequence, the MSA-AISP can yield suboptimal results in the sense that it can form scales that do not have the maximum possible H coefficient and/or are not of maximal length, given all possible partitions of the item set (van Abswoude et al., 2004). As an alternative, item selection methods based on an objective function can be used.
A recently proposed item selection method based on an objective function uses a GA to find the optimal scales (Straat et al., in press; van der Ark, 2007), to be denoted here as MSA-GA. The aim of the MSA-GA is to select from all possible partitionings of items, the partitioning for which the largest scale contains the maximum possible number of items, the second largest scale contains the second maximum possible number of items, and so on.
To explain MSA-GA further, suppose there are F possible partitions that meet the Mokken scale conditions. Let
where
The Meaning of a Mokken Scale
Irrespective of the item selection method applied, an MSA is performed with the aim of creating Mokken scales. As discussed earlier, a Mokken scale has to satisfy the two scale conditions and is furthermore required to be of maximum scale length. In contrast to those clear properties of a Mokken scale, its meaning does not always seem to be clear.
A correct interpretation of a Mokken scale is that it consists of items that “a) measure a common trait . . . with b) reasonable discriminative power determined by lowerbound c” (Sijtsma & Molenaar, 2002, p. 68). From those two properties, it is sometimes deduced that MSA “reveals the dimensionality structure of the data by selecting one or more subsets of items” (Sijtsma & Molenaar, 2002, p. 71). The authors conjecture that this deduction is not necessarily true. Note that MSA reveals the dimensionality only if it yields either a single unidimensional scale or several scales that are each unidimensional and mutually distinct. The authors conjecture that MSA may yield scales that do not satisfy those requirements, because MSA could produce a Mokken scale that is multidimensional, or several unidimensional Mokken scales that are not distinct. A simulation study was performed to assess under which conditions this may occur.
Simulation Study
Generation of Simulated Data
To mimic a realistic data structure, each data set was simulated to have item scores on 30 items, each on a 5-point Likert-type scale. The simulated data were generated according to a common factor model. If
where
To relate the continuous latent score of subject i on item j ($y_{ij}^{*}$) to the observed ordered polytomous data, prespecified thresholds were used for variable
For all items in the simulation study, the thresholds were specified such that one third (i.e., 10) is symmetrically distributed, and two thirds (i.e., 20) are skewed, of which half (i.e., 10) in the opposite direction, so as to mimic differences in item difficulty across variables. The thresholds were chosen such that the expected proportion of observations in each of the five categories (
Conditions Simulation Study
The data were simulated under a number of conditions in which it is speculated that MSA may create Mokken scales that are not necessarily distinct and unidimensional; furthermore, the sensitivity to sampling fluctuations was assessed. In particular, the following three factors were manipulated:
Sample size (at N = 250 and 1,000)
Percentage common variance (
Dimensionality structure as manipulated via the percentage of common variance attributed to the general factor g (
The dimensionality structure was manipulated via a loading matrix that complied with a bifactor model, while considering uncorrelated factors (i.e., taking
The bifactor model (Holzinger & Swineford, 1937) is a model in which each item may load on a general factor and on one or more specific factors, while the factors are uncorrelated. Commonly, an item loads on at most one specific factor. The bifactor model is very attractive in the personality and psychopathology domain, because in this context, one typically aims at measuring complex constructs, with an interest in the general trait and in specific aspects of the trait (e.g., Reise, Morizot, & Hays, 2007).
In this simulation study, a bifactor model with three specific factors was used. This bifactor loading matrix can be depicted as
with
The four levels of the dimensional structure were defined in terms of the percentage of common variance attributed to the general factor g. The four levels lead to structures with the following properties:
The condition with g = 100.0 was achieved by taking in
The condition with g = 0.0 was achieved by taking in
In the conditions with
The specific values for a and b were chosen such that they are in accordance with the required percentage common variances.
To relate the conditions with a weak and a strong general factor (i.e.,
As an example, consider the condition with the common variance equal to 75.0%, and the weak general factor. This is achieved by taking a = .433 and b = .750 in
where
Now, the four levels of the dimensional structure are explained, and what the true dimensionality is can be indicated. The number of dimensions involved is one for
In what follows, the four dimensionality structures will be referred to as a unidimensional structure (
A fully crossed design with 500 replications per condition was used, yielding 2 (sample size) × 5 (percentage common variance) × 4 (dimensionality structure) × 500 (replications) = 20,000 generated data sets.
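The data-generating scheme can be illustrated with a hedged Python sketch. It assumes standard-normal, uncorrelated factors and unique parts, latent item responses with unit variance, and a single hypothetical set of symmetric thresholds applied to all items; the study itself mixed symmetric and skewed items, and its actual thresholds are not reproduced here. The function names and the parameterization through `common_var` and `g_share` are illustrative. The reported example values are consistent with this parameterization: a = .433 and b = .750 give a² + b² ≈ .75 (the common variance), of which a²/(a² + b²) ≈ .25 is attributable to the general factor.

```python
import numpy as np

def bifactor_loadings(common_var, g_share, n_items=30, n_specific=3):
    """Loading matrix: column 0 is the general factor, the rest are specific factors."""
    a = np.sqrt(common_var * g_share)          # loading on the general factor
    b = np.sqrt(common_var * (1.0 - g_share))  # loading on the item's specific factor
    L = np.zeros((n_items, 1 + n_specific))
    L[:, 0] = a
    block = n_items // n_specific
    for s in range(n_specific):
        L[s * block:(s + 1) * block, 1 + s] = b
    return L

def simulate_items(L, n, thresholds, rng):
    """Draw n respondents and cut the latent responses into ordered categories 0..m."""
    factors = rng.standard_normal((n, L.shape[1]))
    unique_sd = np.sqrt(np.clip(1.0 - (L ** 2).sum(axis=1), 0.0, None))
    y_star = factors @ L.T + rng.standard_normal((n, L.shape[0])) * unique_sd
    return np.digitize(y_star, thresholds)

# Hypothetical usage: 75% common variance, one quarter of it due to the general factor.
L = bifactor_loadings(common_var=0.75, g_share=0.25)
X = simulate_items(L, n=250, thresholds=[-1.5, -0.5, 0.5, 1.5],
                   rng=np.random.default_rng(0))
```

Setting `g_share` to 1.0 or 0.0 removes the specific or the general loadings, respectively, which corresponds to the two extreme levels of the dimensionality structure.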
Analysis of Simulated Data
Each of the 20,000 simulated data sets was analyzed with MSA in the statistical program R (R Development Core Team, 2011; Mokken package in R, see van der Ark, 2007). The simulated data sets were analyzed using the MSA-AISP and the MSA-GA (Straat et al., in press; van der Ark, 2010). Because most researchers use the default setting for a lowerbound for the scalability coefficient of c = .3 (Zijlstra et al., 2011), all simulated data sets were analyzed with c = .3.
Recovery Measures
An MSA results in a percentage of unscalable items and a partitioning vector that indicates which items are allocated to which subscales. The MSA results were compared with the simulated data structures with respect to the number of dimensions, the dimensionality structure, and the percentage of unscalable items.
Number of dimensions
To assess to what extent the partitioning of MSA deviates from the correct partitioning, the absolute deviation was evaluated between the number of subscales revealed by MSA and the true number of dimensions (i.e., one for
Dimensionality structure
To assess to what extent the partitioning of MSA corresponds to the simulated data structure, the Rand Index (Rand, 1971) was considered between the partitioning of the scalable items revealed by MSA and the true partitioning (i.e., the unidimensional structure for
Percentage of unscalable items
To assess how many items MSA indicates as unscalable and, thus, how many items are included in the partitioning of MSA, the percentage of unscalable items was considered.
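The three recovery measures can be computed from a partitioning vector as in the following minimal sketch. The coding of unscalable items as 0 and the function names are assumptions made for illustration. The Rand Index is computed directly from its definition, as the proportion of item pairs on which the two partitionings agree (both place the pair in the same scale, or both place it in different scales).

```python
from itertools import combinations

def rand_index(part_a, part_b):
    """Proportion of item pairs that both partitionings treat the same way."""
    pairs = list(combinations(range(len(part_a)), 2))
    agree = sum((part_a[i] == part_a[j]) == (part_b[i] == part_b[j])
                for i, j in pairs)
    return agree / len(pairs)

def recovery_measures(msa_labels, true_labels):
    """msa_labels: scale number per item, with 0 marking an unscalable item."""
    scalable = [i for i, s in enumerate(msa_labels) if s != 0]
    est = [msa_labels[i] for i in scalable]
    tru = [true_labels[i] for i in scalable]
    n_items = len(msa_labels)
    return {
        # Absolute deviation between the number of scales and the true number of dimensions.
        "abs_dev": abs(len(set(est)) - len(set(true_labels))),
        # Rand Index over the scalable items only (undefined with fewer than two items).
        "rand": rand_index(est, tru) if len(scalable) > 1 else float("nan"),
        "pct_unscalable": 100.0 * (n_items - len(scalable)) / n_items,
    }
```

For example, if six items truly form two scales of three items each and MSA places all six in one scale, the absolute deviation is 1 and the Rand Index is 6/15 = .40.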
Results
Performance of MSA to indicate the number of dimensions
Figures 1a to 1d depict the mean absolute deviation (MAD) between the number of subscales revealed by MSA and the true number of dimensions, as a function of the dimensionality structure and the common variance, for each combination of sample size and item selection method. Note that a MAD of zero indicates that MSA revealed the correct number of dimensions across all replicates in this condition. In Figures 1a to 1d, one can distinguish four general patterns in performances:

Figure 1. Mean absolute deviation (MAD) from the correct number of dimensions, based on the scalable items.
If the percentage common variance is low (i.e., for
If the percentage common variance is moderate to large (i.e., for
MSA-AISP indicates the number of dimensions well if the percentage common variance is moderate to large (i.e., for
MSA-GA fails to correctly indicate the number of dimensions if the percentage common variance is moderate to large (i.e., for
To obtain further insight into the performance of MSA in indicating the number of dimensions, Table 1 presents the percentage under-, over-, and correct estimation of the number of dimensions by MSA-AISP and MSA-GA, as a function of the dimensionality structure and the common variance, at a sample size of 1,000. The results at a sample size of 250 are not presented because they appeared to be rather similar to the ones in Table 1. As can be seen in Table 1, MSA-AISP tends to overestimate the number of dimensions if the percentage common variance is low (i.e., for
Table 1. The Percentages of Under-, Over-, and Correct Estimation of the Number of Dimensions by MSA for the Data Sets With a Sample Size of 1,000 per Condition Considered
Note: MSA = Mokken scale analysis; 1-dim = one dimensional; 3-dim-sc = three-dimensional strongly correlated; 3-dim-wc = three-dimensional weakly correlated; 3-dim-unc = three-dimensional uncorrelated; AISP = automated item selection procedure; GA = genetic algorithm. Percentage correct estimation is underlined. Cells representing 50% or more of the data sets are printed in boldface.
Performance of MSA to indicate the dimensionality structure
Figure 2 depicts the Rand Index between the MSA partitioning of the scalable items and the true dimensionality structure (of the scalable items), as a function of the dimensionality structure and the common variance, for each combination of sample size and item selection method. Note that a higher Rand Index indicates a better recovery, with a value equal to 1 indicating that MSA revealed the correct partitioning.

Figure 2. Mean Rand Index (MRI) over the scalable items between the Mokken partition and the correct partition.
Figure 2 shows that the performance of MSA in indicating the dimensionality structure follows a pattern highly comparable to that of its performance in indicating the number of dimensions (Figure 1). That is, the four general patterns in performance discussed previously also hold for the recovery of the dimensionality structure.
Percentage of unscalable items of MSA-AISP and MSA-GA
Figures 3 and 4 depict boxplots of the percentages of unscalable items indicated by MSA-AISP and MSA-GA, as a function of the dimensionality structure, the common variance, and the sample size. For MSA-AISP (see Figure 3), the percentages of unscalable items decrease with increasing percentages of common variance. This is not surprising, because an unscalable item indicates a weak item, and the percentage of common variance expresses the strength of an item. For MSA-GA (see Figure 4), the percentages of unscalable items also decrease with increasing percentages of common variance, but only up to a common variance of 50.0%; for common variances larger than 62.5%, the percentages of unscalable items appear to increase.

Figure 3. Percentage of unscalable items, method MSA-AISP.

Figure 4. Percentage of unscalable items, method MSA-GA.
For MSA-AISP and MSA-GA, in the low common variance condition (i.e., for
Discussion
The results of the simulation study showed that the performance of MSA as a dimensionality assessment method appears to be modest. That is, in this study, MSA-AISP indicates the correct dimensionality for unidimensional and three-dimensional data with uncorrelated or lowly correlated factors, in conditions with reasonable common variance (i.e., for
In the conditions in which MSA (MSA-AISP or MSA-GA) creates Mokken scales that do not correspond to the true dimensionality of the data, MSA may have created a single Mokken scale with a truly multidimensional structure (yielding underestimation of the number of dimensions), and/or several Mokken scales that are truly unidimensional, but nondistinct (yielding overestimation of the number of dimensions). Below, these two situations are elaborated further.
One Mokken scale with a multidimensional structure
MSA can create a single Mokken scale with a truly multidimensional structure only in case the subscales do have sufficient “in common.” In the simulation experiment, this is particularly shown in the condition with correlated factors (i.e., for
The reason that MSA can create one multidimensional Mokken scale stems from the aim of MSA to select the maximum number of items that indicate a common trait and that discriminate well between persons. That is, if items discriminate well enough (i.e.,
Several nondistinct unidimensional Mokken scales
In the case of a truly unidimensional structure, MSA may create various unidimensional, but nondistinct Mokken scales. In this simulation study, this is illustrated in the condition with a unidimensional structure with a low common variance. This phenomenon can be understood as follows. MSA requires the items in a subscale to measure a common trait, with sufficient discriminative power. In the case of a low common variance, the unique parts of the items may happen to correlate, just because of sampling fluctuations. Those correlations may be just strong enough for certain subsets of items to let MSA define different subscales. This phenomenon becomes more likely with decreasing sample sizes, as is also shown in this simulation study, where the overestimation appeared to be more severe in the conditions with a sample size of 250 than in those with a sample size of 1,000.
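The role of sampling fluctuations can be made concrete with a small Python sketch. It is an illustration only and does not reproduce the simulation study: it draws 30 mutually independent standard-normal variables, standing in for the unique parts of 30 items, and records the largest absolute sample correlation among them. The function name, the number of repetitions, and the seed are arbitrary choices.

```python
import numpy as np

def mean_max_abs_corr(n, k=30, reps=20, seed=0):
    """Average, over repetitions, of the largest absolute correlation among
    k truly independent variables observed on n respondents."""
    rng = np.random.default_rng(seed)
    off = ~np.eye(k, dtype=bool)               # ignore the unit diagonal
    maxima = []
    for _ in range(reps):
        R = np.corrcoef(rng.standard_normal((n, k)), rowvar=False)
        maxima.append(np.abs(R[off]).max())
    return float(np.mean(maxima))

small_sample = mean_max_abs_corr(250)
large_sample = mean_max_abs_corr(1000)
```

Because the standard error of a sample correlation shrinks roughly with the square root of the sample size, the largest chance correlation is expected to be clearly larger at N = 250 than at N = 1,000, which is in line with the more severe overestimation in the smaller samples.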
MSA With a Range of Lowerbounds
In the simulation study, the performance of MSA with a fixed lowerbound in assessing the dimensionality was considered. Alternatively, one could apply MSA using a range of lowerbounds, as proposed by Hemker, Sijtsma, and Molenaar (1995). The pattern of Mokken scales over increasing lowerbounds would then reveal the dimensionality. In particular, a multidimensional scale would be associated with the following typical pattern of Mokken scales with increasing lowerbounds: “(1) most or all items are in one scale; (2) two or more unidimensional scales are formed; and (3) two or more smaller scales are formed and several items are rejected” (Hemker et al., 1995, p. 350). The scales in the second stage reflect the final solution of the so-called separate unidimensional scales. A unidimensional scale would be typically associated with the following pattern with increasing lowerbounds: “(1) most or all items are in one scale; (2) one smaller scale is found; and (3) one or a few small scales are found and several items are rejected” (Hemker et al., 1995, p. 350). The scale in the first stage reflects the unidimensional scale.
The fact that the lowerbound c plays an important role in the partitioning of items into Mokken scales is not surprising. As discussed, Mokken scales are scales that are of maximal length, given a minimal strength of each item in the scale. The lowerbound c tips the balance between item strength and scale length, and thus affects the item partitioning. According to Hemker et al. (1995), the course of item partitionings over different values of c could be indicative of the dimensionality of the data. Because an extensive evaluation of the use of MSA with a range of lowerbounds is beyond the scope of this article, the “range-of-lowerbounds procedure” is examined here only briefly.
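A range-of-lowerbounds scan can be organized as in the sketch below. The function `scan_lowerbounds` is hypothetical and deliberately agnostic about the item selection method: it expects a user-supplied `partition_fn(X, c)` that returns one scale label per item, with 0 for unscalable items (an assumed coding). The default grid from .0 to .5 in steps of .1 is the one used in the empirical example of this article. The sketch only tabulates the partitionings; judging the pattern over increasing c remains a matter of visual inspection.

```python
def scan_lowerbounds(X, partition_fn, lowerbounds=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Run an item selection procedure at each lowerbound and summarize the result."""
    rows = []
    for c in lowerbounds:
        labels = list(partition_fn(X, c))
        scales = {}
        for item, s in enumerate(labels, start=1):   # items are numbered from 1
            if s != 0:
                scales.setdefault(s, []).append(item)
        rows.append({
            "c": c,
            "n_scales": len(scales),
            "scales": [scales[s] for s in sorted(scales)],
            "n_unscalable": sum(1 for s in labels if s == 0),
        })
    return rows
```

Each row then lists, for one lowerbound, the number of scales, the items per scale, and the number of unscalable items, which is the information needed to compare the observed course of partitionings with the typical patterns described by Hemker et al. (1995).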
To illustrate the possible benefits and limitations of the range-of-lowerbounds procedure, the authors applied it to a number of selected data sets. Note that the procedure requires a visual inspection of the course of item partitionings over increasing lowerbounds, which precludes its implementation in a simulation study. “Easy” and “difficult” conditions were selected, that is, conditions in which the range-of-lowerbounds procedure is expected to perform better than a fixed lowerbound, and conditions in which the range-of-lowerbounds procedure may give results from which the dimensionality of the data is hard to disentangle.
The results of the range-of-lowerbounds procedure are presented for MSA-AISP. In a preliminary analysis, the procedure was also applied with MSA-GA. Similar results were observed, although MSA-GA has a stronger tendency to join items into the first scale(s). This tendency is completely in line with the objective function of MSA-GA, which heavily rewards long first scales. The authors now discuss how the MSA-AISP range-of-lowerbounds procedure performs.
First, the condition that appeared relatively difficult for MSA with a fixed boundary in the simulation study, namely, the condition with a low common variance (
For Examples 1 to 4, the MSA scales for each lowerbound are presented in Table 2. The judgment of the number of dimensions involved, made on the basis of the typical patterns for uni- and multidimensionality described by Hemker et al. (1995), is printed in boldface. As can be seen in Table 2, the range-of-lowerbounds procedure indicates the correct structure of the data in Examples 1, 3, and 4, where the data have a unidimensional structure, a three-dimensional structure with uncorrelated factors, and a three-dimensional structure with weakly correlated factors. However, if the factors are strongly correlated, as in Example 2, the range-of-lowerbounds procedure does not seem to offer a solution because it incorrectly indicates the three-dimensional data structure as being two dimensional.
Table 2. Six Illustrative Examples (Ex.) of the Range-of-Lowerbounds Procedure Using MSA-AISP
Note: MSA = Mokken scale analysis; AISP = automated item selection procedure; 1-dim = one dimensional; 3-dim-sc = three-dimensional strongly correlated; 3-dim-wc = three-dimensional weakly correlated; 3-dim-unc = three-dimensional uncorrelated. For the conditions with three dimensions, the true partitioning is Items 1 to 10, 11 to 20, and 21 to 30. The numbers in boldface indicate the judgment of the number of dimensions involved on the basis of the range-of-lowerbounds procedure.
Next, the performance of the range-of-lowerbounds procedure in two conditions in which the item strengths are allowed to differ across and within the scales (rather than be equal as in the previous Examples) is illustrated. In Example 5, the condition of a three-dimensional uncorrelated dimensionality structure was considered, with a moderately high overall common variance (
In Example 5 (see Table 2), MSA creates only two coherent Mokken scales, corresponding to the first two subscales, of which the items discriminate well. Thus, the range-of-lowerbounds procedure incorrectly indicates a three-dimensional data structure as being two dimensional. Whether the unscalable items do belong to one scale or to multiple scales is not deducible from this result. In Example 6, the range-of-lowerbounds procedure provides a solution that is difficult to interpret. Given the pattern of Mokken scales over increasing c, researchers will probably denote the solution of c = .1 as the final solution and, thus, will incorrectly ascribe too many items of two separate scales as belonging to the same scale. If the solution of c = .2 is interpreted as the final solution, the third scale is very small, and, given the different partitionings of Items 21 to 30 over increasing c, this third scale would be difficult to interpret. In either case, the correct dimensionality structure of the data is hard to deduce from the course of partitionings of Mokken scales over different c.
In sum, the application of MSA with a range of lowerbounds to a limited number of data sets has illustrated that MSA with a range of lowerbounds seems to have some benefits over the use of a fixed lowerbound c. However, MSA with a range of lowerbounds does not solve all limitations of MSA as a dimensionality assessment method, because the general mechanisms of MSA also apply to this procedure. That is, MSA creates Mokken scales, scales that are of maximal length, given a minimal strength of each item in the scale. If, as in real-data applications, the strength of the items differs within and between scales, then the dimensionality structure of the data can be difficult to disentangle from MSA, whether a fixed lowerbound or a range of lowerbounds is used.
Empirical Example
To illustrate the difficulties in disentangling the dimensionality of a data set on the basis of MSA, MSA was applied to an empirical data set. The data came from a study by Koster et al. (2009), who examined the psychometric qualities of the SPQ. The SPQ was designed to assess four key themes of social participation of pupils with special needs in regular primary education. Koster et al. (2009) hypothesized that the four key themes would be represented by four subscales of the SPQ. In addition, the total SPQ scale was hypothesized to reflect social participation in general.
The data of all items of the SPQ were reanalyzed with MSA-AISP and MSA-GA, using a fixed lowerbound of c = .3 and a range of lowerbounds from .0 to .5 with steps of .1. The number of Mokken scales indicated and the structure and strength of the items were evaluated. Furthermore, a bifactor model (a full hierarchical model; see Chen, West, & Sousa, 2006) was fitted, in which each item loads on a general factor and on a single specific factor, according to the subscale the item was hypothesized to belong to. The bifactor model was fitted using weighted least squares mean- and variance-adjusted (WLSMV) estimation (suitable for polytomous data) in the Mplus program (Muthén & Muthén, 1998-2007). To determine the model fit of the bifactor model, the root mean square error of approximation (RMSEA; Browne & Cudeck, 1993; Steiger, 1990) and the comparative fit index (CFI; Bentler, 1990) were considered. The prevailing convention is that an RMSEA value of .08 or smaller and a CFI value of .95 or larger indicate an acceptable fit (e.g., Schermelleh-Engel, Moosbrugger, & Müller, 2003).
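As a small illustration of the analysis settings just described, the Python sketch below generates the grid of lowerbounds and encodes the fit convention as a simple check. The function names are hypothetical, and the fit values passed in at the end are the ones reported for the bifactor model later in this section; the sketch does not reproduce any part of the Mplus analysis itself.

```python
def lowerbound_grid(start=0.0, stop=0.5, step=0.1):
    """Lowerbounds c from start to stop (inclusive).

    Values are rounded so that floating-point drift (e.g., 0.30000000000000004)
    does not leak into the grid.
    """
    n_steps = round((stop - start) / step)
    return [round(start + k * step, 10) for k in range(n_steps + 1)]


def acceptable_fit(rmsea, cfi, rmsea_max=0.08, cfi_min=0.95):
    """Apply the conventional cutoffs: RMSEA <= .08 and CFI >= .95."""
    return rmsea <= rmsea_max and cfi >= cfi_min


print(lowerbound_grid())
print(acceptable_fit(rmsea=0.069, cfi=0.974))
```

Both cutoffs must hold for the fit to count as acceptable under this convention; treating them as function arguments makes it easy to apply a stricter or more lenient convention if desired.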
The MSA-AISP and the MSA-GA with a fixed lowerbound of c = .3 indicated a single Mokken scale with no unscalable items. The H value of this scale was .46, indicating a moderately strong scale. As can be seen in Table 3, the strengths of the individual items range from .33 to .59.
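The verbal label attached to an H value can be expressed as a simple lookup. The sketch below assumes the rules of thumb commonly attributed to Mokken (values below .3 unscalable, .3 to .4 weak, .4 to .5 medium, .5 and above strong); these cutoffs are not listed in this section and are included here as an assumption, and the function name is hypothetical.

```python
def classify_h(h):
    """Label a scalability coefficient using commonly cited rules of thumb.

    Assumed cutoffs: < .3 unscalable, [.3, .4) weak, [.4, .5) medium, >= .5 strong.
    """
    if h < 0.3:
        return "unscalable"
    if h < 0.4:
        return "weak"
    if h < 0.5:
        return "medium"
    return "strong"


# Scale value and the extremes of the item values reported in the text.
for value in (0.46, 0.33, 0.59):
    print(value, classify_h(value))
```

Under these assumed cutoffs, the scale value of .46 falls in the medium category, while the individual items span the weak to strong categories.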
The SPQ Items, With Loadings on the General (
Note: SPQ = Social Participation Questionnaire; MSA = Mokken scale analysis. Unscalable items are indicated with 0. Loadings >.30 in absolute value are indicated in boldface.
The range-of-lowerbounds procedure indicated a partitioning to which both a unidimensional pattern and a multidimensional pattern appear to apply. As can be seen in Table 3, with increasing c, (a) all items are in one scale (up to c = .3), (b) four scales are formed, one of which is larger than the others (with c = .4), and (c) four smaller scales are formed and several items are rejected (with c = .5). This fits the description of multidimensionality, with the partitioning at c = .4 as the final solution.
Confusingly, this partitioning also fits the description of unidimensionality, as “(1) most or all items are in one scale (up to c = .3); (2) one smaller scale is found (with c = .4); and (3) one or a few small scales are found and several items are rejected (with c = .5)” (Hemker et al., 1995, p. 350). This would indicate one dimension, with the partitioning at c = .0, which includes all items, as the final solution. Because the partitioning fits both a unidimensional pattern and a multidimensional pattern, the authors conclude that the dimensionality structure of the data cannot be deduced from the range-of-lowerbounds procedure.
The bifactor analysis revealed that the bifactor model considered fitted adequately (RMSEA = .069, CFI = .974). Table 3 shows the items of the SPQ with their factor loadings on the general and specific factors of the bifactor model. As can be seen in Table 3, all items of the SPQ load moderately high on the general factor (loadings ranging from .47 to .92). This suggests that the items measure a common trait, albeit to a different extent. Furthermore, the loadings of the items on their specific factor range—in absolute value—from .06 to .73, suggesting that the items differ widely in their capability to measure a specific key theme. As can be seen in Table 3, the loadings on the specific factor for the key theme “Contacts/Interactions” are rather low (i.e., <.30 in absolute value) for six out of nine items, implying that those six items do not measure anything substantial beyond the common trait. For the remaining three key themes, the loadings on their specific factors are moderate to large (>.27), suggesting that those key themes can be sensibly distinguished from the common trait. On the basis of the bifactor analysis, the authors conclude that the SPQ–Revised measures one general trait, social participation, and, in addition, that the four key themes of social participation can be sensibly distinguished.
General Discussion
The aim of this study was to evaluate how well MSA performs as a dimensionality assessment method. To this end, the authors performed a simulation study in which they examined, for a range of conditions, how well MSA reveals the correct number of dimensions and the correct dimensionality structure. Their results indicate that MSA does not necessarily create scales that reveal the dimensionality of a data set. Rather, MSA creates Mokken scales: scales that are of maximal length, given a minimal strength of each item in the scale. This implies that a single Mokken scale can reflect unidimensional data and multidimensional data with strong items. Analogously, multiple Mokken scales can reflect multidimensional data and unidimensional data with weak items. As one does not know whether empirical data comply with the conditions in which MSA generally performs well, the empirical value of MSA as a dimensionality assessment method appears to be limited.
Applying MSA with a range of lowerbounds seems to be a good alternative if the factors are not strongly correlated and the items do not differ substantially in item strength. If the items differ considerably in item strength, this application does not seem to be beneficial. The authors considered only a limited number of data sets, and further research could reveal in more detail the benefits and limitations of the range-of-lowerbounds procedure.
In this study, the authors considered only a constrained variant of the bifactor model, namely, one in which the factor loadings of the bifactor model could be rotated to a model without a general factor, but with correlated factors (here a correlated three-factor model). This was done to ease the interpretation of the results and to make a connection to earlier studies (e.g., Mroch & Bolt, 2006; van Abswoude et al., 2004). However, the current findings do generalize to more complicated bifactor models, as the authors verified in a small simulation study.
This study indicated that MSA is not generally applicable as a dimensionality assessment method. This result raises the question of which alternative dimensionality assessment methods are available. If one wishes to stay with nonparametric methods, possible alternatives are DIMTEST (Nandakumar & Stout, 1993; Stout, 1987; Stout, Goodwin Froelich, & Gao, 2001), DETECT (Zhang & Stout, 1999b), or HCA/CCPROX (Roussos, Stout, & Marden, 1998). More research should be conducted on these different methods to indicate which of these nonparametric alternatives are most viable for dimensionality assessment.
If empirical data appear to comply to a reasonable extent with a parametric model, the authors would advocate the use of a parametric model for dimensionality assessment. The use of parametric models has the advantage that it gives a sparse representation of the dimensionality structure of the data. In addition, a parametric model provides an estimate of the uncertainty of the model estimates. An example of a useful parametric model is a bifactor model, which is currently very popular in the personality and psychopathology domain (e.g., Reise et al., 2007). Because of the intimate relationship between factor models and IRT-based models (see, for example, Takane & de Leeuw, 1987), factor models such as the bifactor structure can be used in the general multidimensional item response theory (MIRT)–based models as well.
Footnotes
Acknowledgements
The authors would like to thank Marloes Koster, Sip Jan Pijl, and Els J. van Houten for sharing their data.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
