Abstract
An automated item selection procedure in Mokken scale analysis partitions a set of items into one or more Mokken scales, if the data allow. Two algorithms are available that pursue the same goal of selecting Mokken scales of maximum length: Mokken’s original automated item selection procedure (AISP) and a genetic algorithm (GA). Minimum sample size requirements for the two algorithms to obtain stable, replicable results have not yet been established. In practical scale construction reported in the literature, we found that researchers used sample sizes ranging from 133 to 15,022 respondents. We investigated the effect of sample size on the assignment of items to the correct scales. Using a misclassification of 5% as a criterion, we found that the AISP and the GA algorithms minimally required 250 to 500 respondents when item quality was high and 1,250 to 1,750 respondents when item quality was low.
Introduction
Mokken scale analysis (MSA; Mokken, 1971; Sijtsma & Molenaar, 2002; Van Schuur, 2011) is a popular scaling method in many research areas. Google Scholar reports approximately 1,900 hits for the search terms “Mokken scale”, “Mokken scaling”, “Mokken model”, and “monotone homogeneity”. Key references using MSA can be found in marketing (Paas & Molenaar, 2005), medicine and health (Paap et al., 2011), political science (Van Schuur, 2003), psychology (Alterman, Cacciola, Habing, & Lynch, 2011), and sociology (Gow, Watson, Whiteman, & Deary, 2011). An important question for researchers using complex multivariate methods such as MSA is what minimum sample size they need to obtain stable, replicable results. For MSA, and in particular for its automated item selection algorithm, the minimum sample size is unknown; in this article we provide guidelines.
The automated item selection algorithm of MSA has two versions pursuing the same goal using different algorithms (Straat, Van der Ark, & Sijtsma, 2013a). The goal is to produce a division of an item set into one or more Mokken scales, if the data allow. Like most multivariate procedures, the item selection procedure optimizes a particular formal criterion to arrive at a result, and in doing so risks capitalizing on chance. The danger of chance capitalization raises the question of which minimum sample size is needed to produce results that are replicable in a new sample of data. A literature search of recent MSA applications revealed sample sizes ranging from 133 (Adler & Brodin, 2011) to 15,022 observations (Prince et al., 2010). For the latter sample size, we expect stable results, but for the former sample size stability is not evident. Knowledge of the minimally required sample size would therefore help in assessing the stability of results obtained in smaller samples. Researchers have a limited amount of time and finances to collect data (Hedeker, Gibbons, & Waternaux, 1999) but wish to be able to replicate their findings in future studies (Jackson, 2003) and to have adequate statistical power to find the effects they pursue (Hedeker et al., 1999). These requirements call for knowledge of the minimum sample size needed to construct scales.
Many studies investigated the minimally required sample size for other statistical methods such as regression analysis (e.g., Cohen, 1988; Green, 1991), factor analysis (e.g., Guadagnoli & Velicer, 1988; MacCallum, Widaman, Preacher, & Hong, 2002; Mundfrom, Shaw, & Ke, 2005; Velicer & Fava, 1998), multilevel analysis (e.g., Cohen, 2005; Hedeker et al., 1999; Snijders & Bosker, 1993), structural equation modeling (e.g., Bentler & Yuan, 1999; Jackson, 2003; Muthén & Muthén, 2002), and item response theory (e.g., Chuah, Drasgow, & Leucht, 2006; Hambleton & Jones, 1994; Hulin, Lissak, & Drasgow, 1982; Reise & Yu, 1990). Thus far, no such studies have been done for MSA. The studies cited above investigated the sample size that is minimally required to obtain precise parameter estimates. For MSA, the question does not concern parameter estimation but the correct partitioning of the items into scales.
This article is organized as follows. First, we discuss the monotone homogeneity model (MHM; Mokken, 1971; Sijtsma & Molenaar, 2002). Second, we discuss the two automated item selection methods for MSA. Third, we study the minimally required sample size to find the correct partitioning of the items. Fourth, we provide recommendations about minimum sample sizes required for item selection in MSA.
Monotone Homogeneity Model
The MHM (Mokken, 1971, chap. 4; also see Sijtsma & Molenaar, 2002; Van Schuur, 2011) is a nonparametric item response theory (IRT) model that is at the basis of MSA. The MHM for dichotomous items, scored (0, 1), is defined by three assumptions: The latent variable
Like the MHM, many parametric IRT models assume unidimensionality and local independence but unlike the MHM, they require parametric restrictions on the item response functions. IRT models for dichotomous item scores such as the Rasch (1960) model and the 2-parameter logistic model (Birnbaum, 1968) are special cases of the MHM. Hemker, Van der Ark, and Sijtsma (2001) showed that the polytomous-item MHM encompasses well-known parametric IRT models such as the partial credit model (Masters, 1982) and the graded response model (Samejima, 1969). Next, we discuss two automated item selection methods in MSA that can be used to partition items into clusters that satisfy the definition of a Mokken scale and approximate the requirements of the MHM.
Mokken Scale Analysis
Let
the scalability coefficient for item
and the scalability coefficient for the total scale score is defined as
A set of items forms a Mokken scale (Sijtsma & Molenaar, 2002, pp. 67-69) if (a) all interitem correlations are positive and (b) all coefficients
The objective of MSA’s automated item selection methods is to select a first Mokken scale containing as many items as possible, then from the unselected items, if any remain, to select a second Mokken scale containing as many items as possible, and so on until there are no items left or until items remain that are unscalable (Mokken, 1971; Straat et al., 2013a). The R package mokken (Van der Ark, 2012) contains two item selection algorithms that pursue this objective. One algorithm is the automated item selection procedure (AISP; Sijtsma & Molenaar, 2002) and the other is the genetic algorithm (GA; Straat et al., 2013a). We briefly describe the two item selection procedures. For a more extensive description, see Straat et al. (2013a).
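Both procedures rely on the scalability coefficients introduced earlier in this section. As an illustration only, the following Python sketch computes the item-pair, item, and total-scale coefficients using the standard definition of each coefficient as a ratio of observed covariance to the maximum covariance attainable given the item marginals. The function names and the toy data are assumptions made for this sketch; they are not taken from the R package mokken.

```python
from itertools import combinations

def covariance(x, y):
    """Population covariance of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

def max_covariance(x, y):
    """Largest covariance attainable with the observed marginals.

    Pairing the sorted scores of both items gives the maximum.
    """
    return covariance(sorted(x), sorted(y))

def scalability(items):
    """Return (pairwise H, item H, total H) for a list of item score lists.

    Assumes every item has nonzero variance; otherwise a ratio is undefined.
    """
    pairs = list(combinations(range(len(items)), 2))
    cov = {p: covariance(items[p[0]], items[p[1]]) for p in pairs}
    cmax = {p: max_covariance(items[p[0]], items[p[1]]) for p in pairs}
    h_pair = {p: cov[p] / cmax[p] for p in pairs}
    h_item = []
    for j in range(len(items)):
        own = [p for p in pairs if j in p]  # pairs that involve item j
        h_item.append(sum(cov[p] for p in own) / sum(cmax[p] for p in own))
    h_total = sum(cov.values()) / sum(cmax.values())
    return h_pair, h_item, h_total

# Hypothetical toy data: four respondents, three dichotomous items
# forming a perfect Guttman pattern, so every coefficient equals 1.
guttman = [[0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
h_pair, h_item, h_total = scalability(guttman)
print(h_total)  # 1.0
```

With real data, sampling error in these ratios is what makes the item assignment unstable in small samples.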
Automated Item Selection Procedure
AISP is a bottom-up item selection procedure. AISP starts with selecting from all ½
Genetic Algorithm
Procedure GA uses a genetic algorithm to pursue the same goal as AISP. GA mimics an evolutionary process to search among all possible partitionings for the one that best satisfies MSA’s scaling objective (Straat et al., 2013a). First, GA generates random partitionings and evaluates each partitioning with respect to the scaling objective. Second, the better a partitioning represents the scaling objective, the more likely it is to be selected into a new, second population of partitionings that is drawn with replacement from the first population. The partitionings in the second population are randomly paired and subjected to crossovers and mutations, so that some of them come to differ from the partitionings in the first population. Next, GA evaluates the partitionings in the second population and produces a third population following the same rules that were used to produce the second population. After the formation of each population, GA records the best partitioning found so far. If the best partitioning remains the same after a pre-specified number of populations has been produced, this partitioning is reported as final.
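The loop described above can be sketched in a few lines of Python. This is a generic skeleton under stated assumptions, not the implementation in the R package mokken: the fitness function is a toy stand-in for the scaling objective, and single-point crossover, the mutation rate, the population size, and all names (`run_ga`, `toy_fitness`, `patience`) are hypothetical choices made for illustration.

```python
import random

def run_ga(n_items, n_scales, fitness, pop_size=30, patience=20,
           p_mutation=0.05, seed=0):
    """Search for a high-fitness partitioning; label 0 means 'unselected'.

    fitness must return a nonnegative number; pop_size must be even.
    """
    rng = random.Random(seed)
    population = [[rng.randrange(n_scales + 1) for _ in range(n_items)]
                  for _ in range(pop_size)]
    best, best_fit, unchanged = None, float("-inf"), 0
    while unchanged < patience:
        scores = [fitness(p) for p in population]
        top = max(range(pop_size), key=scores.__getitem__)
        if scores[top] > best_fit:  # record the best partitioning so far
            best, best_fit, unchanged = population[top][:], scores[top], 0
        else:
            unchanged += 1
        # Draw the next population with replacement; fitter partitionings
        # are more likely to be drawn.
        weights = [s + 1e-9 for s in scores]
        parents = rng.choices(population, weights=weights, k=pop_size)
        rng.shuffle(parents)  # random pairing
        population = []
        for a, b in zip(parents[0::2], parents[1::2]):
            cut = rng.randrange(1, n_items)  # single-point crossover (assumed)
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                population.append([
                    rng.randrange(n_scales + 1) if rng.random() < p_mutation
                    else label
                    for label in child
                ])
    return best, best_fit

# Toy fitness, NOT the Mokken scaling objective: number of items whose
# label matches a hypothetical target partitioning.
TARGET = [1, 1, 1, 2, 2, 2]

def toy_fitness(partitioning):
    return sum(a == b for a, b in zip(partitioning, TARGET))

best, best_fit = run_ga(n_items=6, n_scales=2, fitness=toy_fitness)
print(best, best_fit)
```

The stopping rule mirrors the description above: the search ends once the best partitioning has not improved for `patience` consecutive populations.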
Method
Because no analytical method exists for deriving the minimally required sample size for procedures AISP and GA, we investigated it in a simulation study with two stages. In the first stage, we studied the effect of sample size (16 levels, ranging from 50 to 3,500) on the assignment of items to scales. In the second stage, we searched for the minimally required sample sizes to obtain at least 80%, 90%, 95%, and 99% correct item assignment.
We also included independent variables in our design that may interact with the effect of sample size on the assignment of items. In exploratory factor analysis, Hogarty, Hines, Kromrey, Ferron, and Mumford (2005) found that in addition to sample size, size of the factor loadings, test length, and correlation between factors may have an effect on assigning items to scales based on the outcome of the exploratory factor analysis. We used the same design characteristics, replacing size of factor loadings by size of
Simulation Model
For the data simulation, we assumed a test consisting of
Design
In the design of the simulation study, six characteristics were fixed: (a) the distribution of the latent variables was bivariate standard normal; (b) the number of latent variables equaled 2; (c) the number of answer categories equaled 5; (d) lower bound
Sample Size
We investigated 16 different sample sizes: N = 50, 100, 250, 500, 750, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500. A pilot study showed that AISP and GA produced stable partitionings for sample sizes larger than 3,500, so studying larger sample sizes was superfluous.
Test Length
We investigated short tests (
Correlation Between Latent Variables
The correlation between the latent variables
Value
A higher item discrimination causes a higher
Range of
Item Selection Procedure
We used procedures AISP and GA to analyze each data set. We used the R package mokken (Van der Ark, 2012) to run AISP and GA.
Dependent Variable
The Adjusted Rand Index (Hubert & Arabie, 1985; also see, e.g., Steinley, 2004) is the most commonly used index expressing the similarity of two partitionings. It can be used for evaluating a partitioning of an item set into one or more Mokken scales, and does this by comparing the obtained partitioning with a baseline partitioning. However, if all items are correctly selected into the same scale, the denominator of the Adjusted Rand Index equals 0; hence, the index is not defined. Especially when a single scale is the correct partitioning, we expect this situation to occur frequently. Thus, we decided not to use this popular index.
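The undefined case can be checked directly. The sketch below computes the numerator and denominator of the Adjusted Rand Index from the pair counts in the Hubert and Arabie (1985) formulation; the function name `ari_parts` and the example partitionings are assumptions made for this illustration.

```python
from collections import Counter
from math import comb

def ari_parts(part_a, part_b):
    """Return (numerator, denominator) of the Adjusted Rand Index.

    part_a and part_b give a cluster label for each item, in the same order.
    """
    n = len(part_a)
    # Number of item pairs placed together in both partitionings.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(part_a, part_b)).values())
    # Number of item pairs placed together within each partitioning.
    pairs_a = sum(comb(c, 2) for c in Counter(part_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(part_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    return pairs_joint - expected, maximum - expected

# All six items in one scale in both the obtained and the baseline
# partitioning: the denominator is 0, so the index is undefined.
print(ari_parts([1] * 6, [1] * 6))  # (0.0, 0.0)
```

When both partitionings place all items in one cluster, the pair counts, their expected value, and their maximum all equal the total number of item pairs, which is why the denominator vanishes.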
The partitioning of an item set into subsets can also be assessed by means of the Per Element Accuracy (
To obtain baseline partitionings, in each design cell we used 1,000,000 observations to determine the “true” item partitioning. The baseline partitionings were the following. For
In the first stage, in each design cell
Results
The minimally required sample sizes for the four pre-specified levels of
Minimum Sample Size Requirements for AISP and GA Procedures for Four Different Levels of PEA (Per Element Accuracy).
Note. .80 < PEA ≤ .90: mediocre; .90 < PEA ≤ .95: adequate; .95 < PEA ≤ .99: good; PEA > .99: excellent.
Per Element Accuracy Results for AISP.
Note. N = sample size; J = number of items.
The minimally required sample size for different
The minimally required sample size was larger if the
Discussion
The minimum sample size the AISP and GA procedures require to obtain stable results depends mainly on the positive difference between the
Due to this rather awkward situation, detailed sample-size guidelines cannot be provided prior to data collection. This situation is not unique to MSA; it is also found with other methods. For example, for factor analysis the minimally required sample size depends on a complex interplay of many aspects, including the estimated factor loadings (Hogarty et al., 2005; also see Muthén & Muthén, 2002). For structural equation models, Muthén and Muthén (2002) suggested that the minimally required sample size should be estimated by means of Monte Carlo studies in which, for different sample sizes, one simulates data sets using the hypothesized parameter values. If the model parameters are recovered well from the data, then the sample size is probably sufficient. For MSA, the situation is simpler because the minimally required sample size depends predominantly on a single factor: the distance between the observed and criterion values of the scalability coefficients. Because we have studied this factor extensively, we believe researchers need not resort to Monte Carlo studies to obtain the minimally required sample size for the AISP and GA procedures; the following three guidelines are sufficient.
First, one should always choose
Second, if possible test constructors should use a well-founded theory of the attribute of interest to develop tests, and carefully choose items that are well-discriminating indicators of the attribute, expressed by relatively high
Third, one should use prior information. In some cases, previous studies or pilot studies can be used to predict the
The guidelines in Table 2 can also be used as a posterior check to evaluate previous research. For example, Watson, Wang, Ski, and Thompson (2012) used MSA to analyze data obtained by administering a Chinese version of the MIDAS to a sample of 180 respondents, and reported a unidimensional scale with
Minimally required sample sizes have already been established for exploratory factor analysis and depend on a complex interplay between the item loadings on a factor and the number of items per factor. In contrast, the effects of test length and number of latent variables on the minimally required sample size were negligible for the AISP and GA procedures. The decision rule for the minimally required sample size for the AISP and GA procedures is therefore simpler: For high-quality items, the AISP and GA procedures perform well for small sample sizes.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
