Abstract
An overview of popular software packages for conducting dimensionality assessment in multidimensional models is presented. Specifically, five popular software packages are described in terms of their capabilities to conduct dimensionality assessment with respect to the nature of analysis (exploratory or confirmatory), types of data (dichotomous, ordered polytomous, continuous, missingness), technical details, and statistics used for dimensionality assessment. Following descriptions of existing software packages, several promising and potentially broadly applicable approaches are described that have been proposed but are not yet implemented in widely available software.
The development and emergence of applications of multidimensional item response theory (MIRT) models that specify multiple latent variables (e.g., Ackerman, 1996; Adams, Wilson, & Wang, 1997; Béguin & Glas, 2001; Bolt & Lall, 2003; Embretson, 1997; McDonald, 1997; Reckase, 1997; Walker & Beretvas, 2003; Yao & Boughton, 2007) bring with them the need for supporting data-model fit procedures in such contexts. Included among the areas of data-model fit for MIRT models is dimensionality analysis. Many choices exist with respect to procedures and software for dimensionality assessment of item response data. In this article, the authors review five popular software packages for dimensionality assessment that have established guidelines for use in terms of evaluating test statistics or referencing recommended cutoff values. Each software package is described with regard to the features that can be handled within the standard options of the software. Specifically, each software package is described in terms of its capabilities to conduct dimensionality assessment with respect to the
nature of the analysis, where at the exploratory end of the spectrum the number of latent variables is not specified, and at the confirmatory end of the spectrum the number of latent variables and the pattern of dependence of item responses on the latent variables are specified;
types of data, including dichotomous, ordered polytomous, continuous, and missing data (assuming missing at random);
technical details, including estimation, rotation, and model restrictions and allowances (e.g., of a lower asymptote parameter for dichotomous data); and
statistics used for dimensionality assessment with a focus on their use for multidimensional modeling.
Following descriptions of existing software packages, several promising and potentially broadly applicable approaches are described that have been proposed but are not yet implemented in widely available software.
Software Description
Mplus
Nature of Analysis
One of the most popular programs for latent variable modeling, Mplus (Muthén & Muthén, 1998-2010) can be used to conduct either exploratory or confirmatory factor analysis for dimensionality assessment.
Types of Data
Mplus is one of the most flexible software packages available to researchers with respect to the types of data. It can handle dichotomous, polytomous, and continuous data; various types of correlational matrices; and missing data.
Technical Details
Exploratory or confirmatory factor analysis may be conducted using least squares or maximum-likelihood-based estimators. In addition, Mplus contains options for orthogonal (Varimax) and oblique (Promax) rotations of exploratory factor solutions. Mplus does not allow for lower asymptote parameters either in the computation of the tetrachoric (or polychoric) correlations or in the estimation of the model parameters in an exploratory or confirmatory approach. If the correlation matrix is not positive definite (as a result of negative factor or residual variances, out-of-bound correlations, or linear dependency between factors), Mplus will issue a warning and a solution will not be produced.
Statistics for Dimensionality Assessment
In exploratory factor analysis, the user specifies the number of latent variables to be extracted. Relevant Mplus output for each exploratory dimensionality analysis includes eigenvalues from the tetrachoric or polychoric correlation matrix, the (rotated) solution, the residual correlation matrix, the root mean square residual (RMSR), a χ2 statistic with associated degrees of freedom, and the root mean square error of approximation (RMSEA). Although Mplus does not determine the number of dimensions in exploratory mode directly, researchers can use it for this purpose by fitting models with different numbers of dimensions. Because Mplus produces the output listed above for each requested solution, a user can determine dimensionality by applying a chosen criterion. For example, a user may wish to compare the fit of the M and (M + 1) factor solutions, where M is the number of latent variables requested to be extracted; the better fitting model may then be retained on the basis of a χ2 difference test. Alternatively, a user may examine dimensionality by comparing eigenvalues produced by Mplus with those obtained via parallel analysis (Horn, 1965). In such situations, the number of retained factors equals the number of eigenvalues produced by Mplus that are greater than the mean of simulated random eigenvalues from parallel analysis (e.g., see Crawford et al., 2010; Glorfeld, 1995; and Zwick & Velicer, 1986, for descriptions of various approaches for parallel analysis).
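The eigenvalue comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not part of Mplus: the function name `retained_factors` and all numeric values are hypothetical, and the sketch assumes that the simulated random eigenvalues have already been generated elsewhere. It also assumes the common convention of stopping at the first observed eigenvalue that fails to exceed its reference value.

```python
def retained_factors(observed, simulated_sets):
    """Count leading observed eigenvalues that exceed the mean simulated eigenvalue.

    observed       : eigenvalues from the observed correlation matrix, largest first
    simulated_sets : list of eigenvalue lists (largest first), one per random data set
    """
    n_sets = len(simulated_sets)
    # Reference value for each position: mean of the simulated eigenvalues
    reference = [
        sum(sim[k] for sim in simulated_sets) / n_sets
        for k in range(len(observed))
    ]
    count = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break  # assumption: stop at the first eigenvalue that does not exceed its reference
        count += 1
    return count, reference


# Hypothetical eigenvalues, for illustration only
observed = [3.2, 1.4, 0.8, 0.5]
simulated_sets = [
    [1.3, 1.1, 0.9, 0.7],
    [1.5, 1.1, 0.9, 0.5],
]
n_factors, reference = retained_factors(observed, simulated_sets)
print(n_factors, reference)
```

Other reference values (e.g., an upper percentile of the simulated eigenvalues rather than the mean) could be substituted by changing how `reference` is computed.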
The procedures and statistics produced in confirmatory analyses come primarily from those in structural equation modeling traditions for continuous, polytomous, or dichotomous data. For continuous data, researchers have developed an overall model χ2 test statistic (e.g., Bollen, 1989) and a number of model fit indices (e.g., Hu & Bentler, 1999). For discrete data, scaling corrections are available for the usual χ2 statistics using likelihood or least squares estimation (Muthén & Muthén, 1998-2010; Satorra & Bentler, 1994). C. Yu and Muthén (2002) introduced a weighted root mean square residual (WRMR) for use with discrete data, suggesting that values <1 indicate adequate fit (Finney & DiStefano, 2006). At the local level, residual correlations or covariances serve to indicate the adequacy with which the model accounts for the item-pair associations. Mplus can produce all these indices based on estimating confirmatory models using least squares estimators based on tetrachoric and polychoric correlations or using full-information maximum likelihood techniques. Last, with the addition of Bayesian modeling via Markov chain Monte Carlo estimation in Mplus, users are able to examine dimensionality via posterior predictive model checking analyses (Gelman, Meng, & Stern, 1996) using likelihood ratio statistics.
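As a rough illustration of how a posterior predictive check is summarized, the sketch below computes a posterior predictive p value as the proportion of MCMC draws in which the discrepancy for the replicated data meets or exceeds the discrepancy for the observed data. The function name and the discrepancy values are hypothetical; in practice, the values would come from the estimation software.

```python
def posterior_predictive_p(realized, replicated):
    """Proportion of draws where the replicated discrepancy >= the realized discrepancy.

    realized   : discrepancy of the observed data, evaluated at each posterior draw
    replicated : discrepancy of data replicated from the model at the same draws
    """
    if len(realized) != len(replicated):
        raise ValueError("one realized and one replicated value are needed per draw")
    exceed = sum(1 for obs, rep in zip(realized, replicated) if rep >= obs)
    return exceed / len(realized)


# Hypothetical discrepancy values for four posterior draws
realized = [1.0, 2.0, 3.0, 4.0]
replicated = [2.0, 1.0, 4.0, 3.0]
print(posterior_predictive_p(realized, replicated))
```

Values near 0 or 1 suggest that the observed discrepancy is extreme relative to what the model predicts, whereas values near .5 suggest that the model reproduces that feature of the data.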
TESTFACT
Nature of Analysis
The TESTFACT software package (Bock et al., 1999) can be used in either exploratory or confirmatory factor analysis. However, confirmatory factor analysis is limited to bifactor structures only.
Types of Data
TESTFACT supports analyses of polytomous and dichotomous data, permitting missingness, and proceeds by computing the tetrachoric correlation matrix. Because tetrachoric correlation matrices need not be positive definite, TESTFACT will produce a smoothed tetrachoric correlation matrix from which it extracts eigenvalues.
Technical Details
TESTFACT uses a marginal maximum likelihood estimation procedure and supports Varimax and Promax rotations. The model supports the presence of lower asymptote parameters; however, they need to be supplied to the software, which requires that they be estimated outside the program (e.g., in BILOG; Mislevy & Bock, 1982).
Statistics for Dimensionality Assessment
In addition to eigenvalues, TESTFACT output yields a χ2 statistic treated as a χ2 variate with associated degrees of freedom. Exploratory dimensionality analysis can proceed in a model comparison framework by sequentially fitting models with additional latent variables and testing for the improvement in fit. Specifically, the test can be conducted by examining the difference in the χ2 statistics from models with M and (M+ 1) latent variables and evaluating that difference in terms of (a) a central χ2 distribution with degrees of freedom equal to the difference in the degrees of freedom for the two models or (b) twice the degrees of freedom (Haberman, 1977; Schilling & Bock, 2005). Similarly, a model comparison approach to determining the number of latent variables may proceed by fitting models with increasing number of latent variables and selecting the number of latent variables based on the model that yields the smallest value of likelihood-based information criteria (e.g., Akaike information criterion [AIC]; Akaike, 1987).
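The two model comparison strategies described above can be sketched as follows. This is an illustration under stated assumptions rather than TESTFACT functionality: the function names, the `df_multiplier` argument (one reading of option b, in which the reference distribution has twice the difference in degrees of freedom), and all numeric inputs are hypothetical. The χ2 tail probability is computed with the standard recurrence for the regularized incomplete gamma function so that only the Python standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a central chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # Q(1/2, z)
    while a < df / 2.0:
        # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi2_difference_test(chi2_m, df_m, chi2_m1, df_m1, df_multiplier=1):
    """Compare models with M and (M + 1) latent variables.

    df_multiplier=1 corresponds to option (a); df_multiplier=2 is an assumed
    reading of option (b).
    """
    diff = chi2_m - chi2_m1
    ddf = (df_m - df_m1) * df_multiplier
    return diff, ddf, chi2_sf(diff, ddf)


def select_by_aic(models):
    """models maps number of latent variables -> (log-likelihood, number of parameters)."""
    aic = {m: -2.0 * loglik + 2.0 * k for m, (loglik, k) in models.items()}
    return min(aic, key=aic.get), aic


# Hypothetical fit results, for illustration only
print(chi2_difference_test(250.0, 90, 200.0, 76))
print(chi2_difference_test(250.0, 90, 200.0, 76, df_multiplier=2))
print(select_by_aic({1: (-5200.0, 40), 2: (-5150.0, 59), 3: (-5145.0, 77)}))
```

A small p value from `chi2_difference_test` would favor the model with the additional latent variable; `select_by_aic` returns the number of latent variables with the smallest AIC along with the AIC values themselves.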
The Normal-Ogive Harmonic Analysis Robust Method (NOHARM)
Nature of Analysis
NOHARM, developed by McDonald (1962, 1967, 1981, 1997, 2000) and implemented in the program of the same name (Fraser & McDonald, 1988), uses a polynomial approximation to the normal-ogive MIRT model for either exploratory or confirmatory analysis. When the exploratory option is selected, the user specifies the number of latent variables to be extracted (as in Mplus), and the pattern of coefficients is obtained. Unlike confirmatory analysis in TESTFACT, NOHARM is not limited to bifactor structures in its confirmatory analysis.
Types of Data
NOHARM supports dichotomous data only (i.e., not polytomous), and it does not allow for missing data. However, the user may input a product moment matrix, and therefore if missingness is present, an adjustment for missingness in estimating the product moment matrix may be used.
Technical Details
NOHARM uses least squares estimates based on the first- and second-order product moments, and, like TESTFACT and Mplus, it allows for Varimax and Promax rotations of the solution in exploratory approaches. Like TESTFACT, NOHARM allows for the input of lower asymptote parameters estimated elsewhere (e.g., BILOG).
Statistics for Dimensionality Assessment
Relevant NOHARM output for exploratory and confirmatory analyses of dimensionality includes the residual matrix for unique pairings of items, the sum of the squares of the residuals, RMSR, Tanaka’s goodness-of-fit index (GFI), as well as associated (rotated) factor loadings. In addition to the above-mentioned output, several statistics have been proposed that use the results of NOHARM, including
Several exploratory approaches to determine the optimal number of latent variables have been proposed. Finch and Habing (2005) suggested using tests of
To evaluate model fit in confirmatory analysis, Fraser and McDonald (1988) proposed that if the RMSR is of the order of four divided by the square root of sample size, then a test of significance would not reject the hypothesized model. In addition, to conduct a test of the fit of a NOHARM model,
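The RMSR rule of thumb attributed above to Fraser and McDonald (1988) can be illustrated with a short sketch. The function names and residual values are hypothetical, and treating "of the order of" as a strict threshold is a simplifying assumption made only for illustration.

```python
import math


def rmsr(residuals):
    """Root mean square of the residuals for the unique item pairs."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def rmsr_reference(sample_size):
    """Reference value: four divided by the square root of the sample size."""
    return 4.0 / math.sqrt(sample_size)


def rmsr_suggests_fit(residuals, sample_size):
    # Assumption: the rule of thumb is applied as a simple cutoff
    return rmsr(residuals) <= rmsr_reference(sample_size)


# Hypothetical residuals for two item pairs and a sample of 1,600 examinees
residuals = [0.03, -0.04]
print(rmsr(residuals), rmsr_reference(1600), rmsr_suggests_fit(residuals, 1600))
```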
Dimensionality Evaluation to Enumerate Contributing Traits (DETECT)
Nature of Analysis
The DETECT procedure and software (H. R. Kim, 1994; Stout et al., 1996; Zhang, 2007; Zhang & Stout, 1999b) is primarily an exploratory technique that seeks to identify dimensionally homogeneous clusters of items, thereby characterizing the amount of multidimensional approximate simple structure present in the data. Conceptually, DETECT searches for a partitioning of the items into clusters where within-cluster conditional covariances are positive and between-cluster conditional covariances are negative.
Although mostly applied in its exploratory mode, a confirmatory DETECT analysis can be run by specifying the desired partition rather than searching for the optimal partition; characteristics and capabilities of exploratory DETECT resemble those when DETECT is used in a confirmatory mode.
Types of Data
DETECT can only handle dichotomously scored data and does not support missing data (see Zhang, 2007, on theory for supporting missingness). The procedure has been extended theoretically, and software has been produced to handle polytomous data (F. Yu & Nandakumar, 2001; Zhang, 2007) and missingness (Zhang, 2007). However, the software packages supporting these analyses are not available to the public.
Technical Details
DETECT does not fit a model per se; rather, as a nonparametric procedure, DETECT conditions on a number-correct-score-based estimate of the dimension of best measurement for the total test (Zhang & Stout, 1999a, 1999b; see Zhang, 2007, for the use of other conditioning variables supporting missing data) and searches for clusters of homogeneous items.
Statistics for Dimensionality Assessment
DETECT provides several relevant pieces of information for exploring multidimensionality. First, DETECT provides an estimate of the amount of multidimensionality that may be operationalized via an analysis of all the data (yielding
If multidimensionality is deemed present, the tenability of the assumption of approximate simple structure may be evaluated by considering the percentage of the signs of the conditional covariances that achieve the goal of having all within-cluster conditional covariances positive and all between-cluster signs be negative (approximate simple structure index, reported in DETECT as the IDN index). Alternatively, the ratio (R) of
If the hypothesis of approximate simple structure is supported, the solution may be interpreted in terms of the number of homogeneous item clusters as the number of dominant latent variables and the assignment of items to those clusters. To the extent that there are clusters with few items or if approximate simple structure does not hold, inferring the number of dominant latent variables should be done with caution (Jang & Roussos, 2007).
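To make the logic of conditional covariances and sign patterns concrete, the sketch below evaluates a user-specified partition, as in a confirmatory analysis. It is a simplified illustration and not the DETECT program: it conditions only on the number-correct score on the remaining items, it does not search over partitions, it applies no rescaling, and the estimator used by the software differs in its details. All function names and the small data set are hypothetical.

```python
from itertools import combinations


def conditional_covariance(data, i, j):
    """Weighted average of Cov(X_i, X_j | rest score), where the rest score is
    the number-correct score on all items other than i and j."""
    groups = {}
    for row in data:
        rest = sum(row) - row[i] - row[j]
        groups.setdefault(rest, []).append((row[i], row[j]))
    n_total = len(data)
    ccov = 0.0
    for pairs in groups.values():
        n = len(pairs)
        mean_i = sum(x for x, _ in pairs) / n
        mean_j = sum(y for _, y in pairs) / n
        cov = sum(x * y for x, y in pairs) / n - mean_i * mean_j
        ccov += (n / n_total) * cov   # weight each score group by its size
    return ccov


def partition_summary(data, partition):
    """Return (average signed conditional covariance, proportion of matching signs).

    Within-cluster pairs are expected to be positive (sign +1) and
    between-cluster pairs negative (sign -1).
    """
    cluster_of = {item: c for c, items in enumerate(partition) for item in items}
    total, matches, n_pairs = 0.0, 0, 0
    for i, j in combinations(range(len(data[0])), 2):
        delta = 1 if cluster_of[i] == cluster_of[j] else -1
        ccov = conditional_covariance(data, i, j)
        total += delta * ccov
        matches += 1 if delta * ccov > 0 else 0
        n_pairs += 1
    return total / n_pairs, matches / n_pairs


# Hypothetical dichotomous responses: items 0-1 and items 2-3 form two clusters
data = [[1, 1, 0, 0]] * 2 + [[0, 0, 1, 1]] * 2 + [[1, 1, 1, 1], [0, 0, 0, 0]]
print(partition_summary(data, [[0, 1], [2, 3]]))   # partition matching the structure
print(partition_summary(data, [[0, 2], [1, 3]]))   # mismatched partition
```

In this toy example, the partition that matches the structure yields a larger average signed conditional covariance and a higher proportion of matching signs than the mismatched partition, which mirrors the reasoning behind the sign-pattern index described above.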
DIMTEST
Nature of Analysis
The DIMTEST program (Stout, Douglas, Junker, & Roussos, 1993) implements Stout’s (1987, 1990) test for essential unidimensionality: a confirmatory approach to hypothesis testing of a model of one latent variable.
Types of Data
DIMTEST, like DETECT, can only handle dichotomously scored data and does not allow for missingness. Extensions to polytomous data have been implemented in software (Li, Habing, & Roussos, 2010, 2011; Nandakumar, Yu, Li, & Stout, 1998); however, the software is not currently commercially available.
Technical Details
Although DIMTEST was explicitly built for testing assumptions of unidimensionality, Stout et al. (1996) proposed a straightforward approach to assess assumed multidimensional simple structure by using the assumed groupings of items to define the partitioning test (PT) and the assessment test (AT; see Stout et al., 1996, for full example and application). DIMTEST allows for the inputting of a single estimate of a lower asymptote parameter applied to all items.
Statistics for Dimensionality Assessment
As a nonparametric procedure, DIMTEST forms its test statistic by aggregating the conditional covariances among a set of suspect items, AT, conditional on the remaining items, PT. The items constituting the AT may be declared by the user or may be chosen by an exploratory process in DIMTEST.
Under the null hypothesis of essential unidimensionality, the resulting statistic is approximately distributed as a standard normal variable. In finite tests, the statistic is biased; current versions of DIMTEST correct for this bias by generating unidimensional data using a nonparametric model based on the original data (Stout, Froelich, & Gao, 2001). Currently, a more sophisticated version of DIMTEST is being developed, which could improve the subset selection for dichotomous and polytomous versions of DIMTEST (Li, 2011; Li et al., 2010, 2011).
Create-Your-Own Software
There are a number of other promising dimensionality assessment procedures that have not yet been implemented in widely available software; researchers wishing to use them are left to do their own programming. For example, parallel analysis (Horn, 1965) has been shown to be one of the most effective tools for conducting exploratory dimensionality analysis in factor analysis of continuous data (e.g., Buja & Eyuboglu, 1992; Glorfeld, 1995; Velicer, Eaton, & Fava, 2000; Zwick & Velicer, 1986). To date, only a few studies have examined the performance of parallel analysis in discrete data (e.g., Cheng & Weng, 2005; Tran & Formann, 2009). Importantly, software for conducting parallel analysis based on the results of exploratory factor analysis (e.g., O’Connor, 2000; Watkins, 2006) has focused on continuous data. Researchers wishing to implement parallel analysis for discrete data may need to write their own software or program the desired computations in a general statistical computing environment.
As another example, a researcher may consider the body of work surrounding local dependence indices, such as Yen’s (1984) Q3, Reckase’s (1997) model-based covariance, odds ratios and their logarithmic transformations and standardized values (Chen & Thissen, 1997), and χ2 and G2 statistics drawn from contingency table analyses (Agresti, 2002); see Ip (2001), Chen and Thissen (1997), and Levy, Mislevy, and Sinharay (2009) for descriptions and studies of these and related indices. The use of these indices for evaluating dimensionality trades on the connections between the assumptions of local independence and dimensionality (Ip, 2001; Levy & Svetina, 2011; Nandakumar & Ackerman, 2004). They have been used for confirmatory approaches to dimensionality assessment, almost exclusively in assessing unidimensionality (Chen & Thissen, 1997; Levy, 2011; Levy et al., 2009). Levy and Svetina (2011) examined several of these in the context of multidimensional models, and building off of them they developed a generalized dimensionality discrepancy measure for use in confirmatory dimensionality assessment when fitting multidimensional models. These indices and approaches hold promise in that they appear flexible enough to handle dichotomous, polytomous, and continuous data; missingness; complex as well as (approximate) simple structures; and models with lower asymptote parameters for dichotomously scored items.
Despite this promise, and in some cases a long-standing popularity of these indices (e.g., Q3), they are not included in any widely available software. Rather, analysts are left to do the programming on their own via stand-alone software targeting specific applications (Chen, 1998; S.-H. Kim, Cohen, & Lin, 2006) or by writing their own code in a general statistical computing environment (Levy & Svetina, 2011), none of which is yet widely available to the mainstream research community. An additional complication is the necessary integration with additional software that conducts model estimation. This is not the case with Mplus, TESTFACT, and NOHARM, which do estimation within the software, or with DETECT and DIMTEST, which do not require estimation but use an observed score approximation to a single latent dimension.
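As an illustration of the kind of programming involved, the sketch below computes Yen's (1984) Q3 for every item pair as the correlation between item residuals, where a residual is the observed response minus the model-implied expected response. It assumes that the expected responses have already been obtained from separate estimation software; the function names and the small data set are hypothetical.

```python
import math
from itertools import combinations


def pearson(u, v):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def q3_matrix(responses, expected):
    """Q3 for each item pair.

    responses : examinee-by-item observed scores
    expected  : examinee-by-item model-implied expected scores (estimated elsewhere)
    """
    n_items = len(responses[0])
    residuals = [
        [x - p for x, p in zip(row_x, row_p)]
        for row_x, row_p in zip(responses, expected)
    ]
    columns = [[row[i] for row in residuals] for i in range(n_items)]
    return {
        (i, j): pearson(columns[i], columns[j])
        for i, j in combinations(range(n_items), 2)
    }


# Hypothetical responses and expected values for four examinees and three items
responses = [[1, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 1]]
expected = [[0.5, 0.5, 0.5] for _ in responses]
print(q3_matrix(responses, expected))
```

Large positive or negative values for an item pair indicate associations that the fitted model does not account for, which is the sense in which such indices bear on dimensionality.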
Summary
The purpose of this article was to describe popular and available software and procedures for the assessment of multidimensionality that may be useful to researchers and practitioners across disciplines. These procedures have been successfully applied, and some simulation studies have been conducted that focus on certain aspects or situations (e.g., Finch & Habing, 2005, 2007; Levy & Svetina, 2011; Svetina, 2011); however, more simulation studies comparing them are warranted, particularly given the increased use of multidimensional psychometric models and the relative lack of work evidencing the procedures’ relative strengths and weaknesses.
In addition, it is important to note that each of the programs has advantages and disadvantages associated with it. In some programs or procedures, determining the number of dimensions may seem a rather straightforward process. For example, in DETECT, a researcher may infer the number of dimensions as the number of nonoverlapping clusters output by DETECT. In other programs, such as Mplus, TESTFACT, or NOHARM, the user must specify the number of factors in fitting the model and investigate further to determine the optimal solution. In other words, there is no direct count of dimensions of the response data at hand. Rather, a researcher must first determine the number of factors to fit in an exploratory factor analysis. This could mean that a researcher would fit single-, two-, and three-factor exploratory models in NOHARM. Then, using some criterion (e.g.,
Finally, although the purpose of this article is not to make specific recommendations about which method or software should be used in particular situations (see Jasper, 2010; Levy & Svetina, 2010; Nandakumar & Ackerman, 2004), the authors emphasize that dimensionality assessment requires heavy involvement on the part of the researcher regardless of the procedures used, including those that output a count of the optimal number of dimensions. In evaluating the dimensionality of the item responses at hand, researchers should carefully examine any solution provided directly or indirectly from a procedure.
Acknowledgements
The authors would like to thank the editor and two anonymous reviewers for their useful comments and suggestions regarding earlier drafts of the manuscript.
Authors’ Note
The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work of the second author was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305D100021 to Arizona State University.
