Abstract
Most computerized adaptive tests (CATs) have been studied using the framework of unidimensional item response theory. However, many psychological variables are multidimensional and might benefit from using a multidimensional approach to CATs. This study investigated the accuracy, fidelity, and efficiency of a fully multidimensional CAT algorithm (MCAT) with a bifactor model using simulated data. Four item selection methods in MCAT were examined for three bifactor pattern designs using two multidimensional item response theory models. To compare MCAT item selection and estimation methods, a fixed test length was used. The Ds-optimality item selection improved θ estimates with respect to a general factor, and either D- or A-optimality improved estimates of the group factors in three bifactor pattern designs under two multidimensional item response theory models. The MCAT model without a guessing parameter functioned better than the MCAT model with a guessing parameter. The MAP (maximum a posteriori) estimation method provided more accurate θ estimates than the EAP (expected a posteriori) method under most conditions, and MAP showed lower observed standard errors than EAP under most conditions, except for a general factor condition using Ds-optimality item selection.
Introduction
In the past several decades, research has repeatedly demonstrated that a computerized adaptive test (CAT) can be, on average, at least 50% shorter than a paper-and-pencil test with equal or better measurement precision (Chang & van der Linden, 2003; Gibbons et al., 2008; Kingsbury & Weiss, 1980, 1983; van der Linden, 1998; Weiss, 1982, 1985). However, CAT primarily has been studied using the framework of unidimensional item response theory (UIRT), which is widely used in educational and psychological research to model how examinees respond to test items. Although most item response theory (IRT) models that are currently used assume that test items measure a single dominant latent trait, it is not always practical to assume that a test measures only a single latent trait (Reise, Morizot, & Hays, 2007). Most personality inventories in psychology are designed to measure multidimensional latent traits rather than a single latent trait (e.g., the NEO Personality Inventory; Costa & McCrae, 1992). Therefore, it is appropriate to introduce a multidimensional latent space, in which multidimensional IRT (MIRT) modeling would be adopted beyond the framework of unidimensionality in search of a more generalizable model to fit real data.
Several studies have examined the effects on item parameter estimation if UIRT is applied to multidimensional data (Ackerman, 1989; Ansley & Forsyth, 1985; Reckase, 1974). A general finding from these studies is that if there is a predominant general factor in the data, the presence of multidimensionality has little effect on the estimation of item and trait parameters. However, if the data have strong secondary factors beyond the primary factor, the application of a UIRT model results in a serious distortion of the measurement characteristics of the instrument. Indeed, the validity of UIRT applications (linking, model-fit, parameter estimation, scoring, and CAT) would also be questioned in a situation in which it is reasonable to hypothesize a multidimensional latent space. Consequently, CAT might not guarantee an optimal test for individual examinees unless IRT parameter estimates are accurately prespecified given an appropriate model for the data.
Through both adaptive and conventional testing with dichotomous three-parameter logistic (3PL) IRT, Folk and Green (1989) demonstrated that a unidimensional model applied to two-dimensional data affects item parameter estimates. Their study demonstrated that if nondominant factors do not affect scale scores, the trait (θ) can be estimated with an assumption of a unidimensional latent trait underlying the data in conventional testing. However, if the two-dimensional latent traits are relatively uncorrelated and dominant in the data, using one or the other trait creates a large difference in θ estimates. Folk and Green (1989) concluded that the difference between θ estimates in CAT was greater than in conventional testing because UIRT item discrimination parameter estimates are used for both the item selection and θ estimation procedures in CAT.
Since Bock and Aitkin (1981) extended the IRT model to a multidimensional case, many researchers have studied CAT using a bank of items calibrated under MIRT models. Initially, Bloxom and Vale (1987) developed multidimensional adaptive estimation procedures. They extended the multivariate analysis of Owen’s (1975) sequential Bayesian adaptive updating algorithm. Then, Tam (1992) evaluated a multidimensional adaptive estimation procedure through precision, test information, and computational time. However, these studies (Bloxom & Vale, 1987; Tam, 1992) considered the implementation of CAT with respect to only θ estimation methods. They did not address the procedure for multidimensional adaptive item selection that considers prior knowledge of a multivariate distribution of θ.
Although Bloxom and Vale (1987) and Tam (1992) initially developed multidimensional CAT (MCAT), their MCAT failed to demonstrate advantages over unidimensional adaptive testing (Segall, 1996). Therefore, Segall (1996) developed multidimensional Bayesian item selection and θ estimation procedures and demonstrated that his MCAT was more efficient than a unidimensional CAT (UCAT) in terms of test length and precision. In addition to gaining efficiency, his MCAT could be used as an instrument to measure various content traits for examinees from nine different subtests of the Armed Services Vocational Aptitude Battery (Moreno & Segall, 1992). Luecht (1996) also demonstrated the efficiency of MCAT. He observed that an MCAT with content constraints could achieve approximately the same precision with 25% to 40% fewer items than were required in UCAT with regard to the measurement of latent traits.
Furthermore, Li and Schafer (2005) showed that UCAT and MCAT, with constraints on item exposure rates, were capable of producing accurate estimates of reading and math abilities. Specifically, compared with UCAT, MCAT slightly increased the accuracy of θ estimates for examinees at the low and high end of the θ scale in both reading and math tests. Therefore, MCAT appears to be an efficient method for ensuring adequate coverage of content in adaptive testing and provides a separate multidimensional vector of estimated θs for each examinee.
Segall (1996) used a confirmatory simple structure to implement MCAT, in which items within one scale were assumed to measure the same latent trait, and each item was loaded on only one latent trait. For that reason, this confirmatory simple structure MIRT model is called a “multi-unidimensional” IRT (multi-UIRT) model or a “between-item” MIRT model (Wang & Chen, 2004). However, these models confine each item to measuring a single latent trait similar to the multiple scale procedure, which is not realistic for many multidimensional constructs. Usually, the constraint of dimensional independence among factors is not appropriate for correlated data because latent traits are generally correlated with each other. Although MIRT models (e.g., Ackerman, 1989; Bock & Aitkin, 1981; Browne, 2001) allow latent traits to correlate with each other, they do so from an exploratory factor analytic perspective so that their latent traits are not readily interpretable.
To eliminate this constraint, a multidimensional item response model is needed that (a) measures more than one latent trait, (b) yields readily interpretable latent traits, and (c) directly estimates item and person parameters jointly. As a response to the need, the bifactor model was applied in CAT (Weiss & Gibbons, 2007), and the second-order factor model also was used in CAT (Huang, Chen, & Wang, 2012). Figure 1a describes the UIRT model, and Figure 1b represents the multi-UIRT model (Segall, 1996; Wang & Chen, 2004). However, these models have not been applied to empirical data analysis to investigate latent trait structures, such as intelligence (e.g., Horn, 1986). Therefore, many researchers have described the simple models by adding factors between the test-specific factors and the general factor. These models can be formulated within the framework of second-order factor analysis, which is similar to the multi-UIRT model. Figure 1c illustrates a path-analytic representation of a simple second-order factor model, involving six observed variables (X1-X6), two first-order factors (F1 and F2), and a general factor based on the correlation between these two factors. The effect of the general factor on the observed variables is mediated by a particular first-order factor, and the effect size is proportional to the loading of the first-order factor on the general factor. This second-order factor model is different from the preceding models that are characterized by group factors or a single broad construct. A major advantage of the second-order factor approach is to simultaneously identify first-order factors and second-order factors. Huang et al. (2012) have implemented the second-order IRT model in MCAT. However, second-order factors are conceptually abstract constructs and have different interpretations from first-order factors because the higher-order factors are not directly related to observed variables (Chen, West, & Sousa, 2006). 
Additionally, the use of MCAT with the second-order IRT model was limited to the number of first-order latent traits because of the inefficiency and complexity of estimating parameters of the second-order IRT model (Huang et al., 2012). Figure 1d presents the bifactor model of the six observed variables, which is simply an extension of Spearman’s two-factor model. In a theoretical framework, a general factor in the bifactor model contributes to all variables with group factors. Because a general factor and group factors are all first-order factors, it is not more complicated to apply the bifactor model in MCAT than UCAT. Additionally, the general factor can be interpreted as an essentially unidimensional trait in an IRT model if the general factor loadings are more dominant than the group factor loadings (Reise et al., 2007).

Figure 1. Four types of IRT models based on a confirmatory factor analytic perspective.
Bifactor Models
Holzinger and Swineford (1937) originally applied the term bifactor to a test measuring psychological traits. They defined the bifactor pattern as a theoretical framework in which all variables are explained by a general factor and group factors, both as first-order factors. This bifactor pattern assumes that uncorrelated group factors are independent of the general factor. The bifactor model allows only one of the k = 2, . . . , p values of λik (group factor loadings) to be nonzero, in addition to λi1 (the general factor loading). For example, the theoretical bifactor pattern with one general factor and two group factors for six items can be described as

$$
\begin{bmatrix}
\lambda_{11} & \lambda_{12} & 0 \\
\lambda_{21} & \lambda_{22} & 0 \\
\lambda_{31} & \lambda_{32} & 0 \\
\lambda_{41} & 0 & \lambda_{43} \\
\lambda_{51} & 0 & \lambda_{53} \\
\lambda_{61} & 0 & \lambda_{63}
\end{bmatrix} \tag{1}
$$
The first column is the general factor, and the other columns are the group factors in the factor pattern matrix. Cai, Yang, and Hansen (2011) proposed a bifactor-like structure if an item always loads on the general factor and is permitted to load on at most one specific factor. For example, if Items 3 and 4 did not load on a specific group factor, λ32 and λ43 would be zero in Equation 1.
The Schmid-Leiman solution allows the bifactor model to build an unrestricted model (exploratory bifactor model) from a polychoric correlation matrix; it is implemented in the R function “schmid” (“psych” package; Revelle, 2015) and in SAS or SPSS macros (Wolff & Preising, 2005). Target pattern rotation is another way to estimate a less restricted model for the bifactor model (Browne, 2001). A confirmatory bifactor model can be fitted directly in R (“sem” package), Mplus (Muthén & Muthén, 2004), and LISREL (Jöreskog & Sörbom, 1995). Jennrich and Bentler (2012) introduced an exploratory bifactor analysis using a bifactor rotation criterion. However, those methods are based on a linear factor model using a factor correlation matrix. In these applications, the models are used only to identify the dimensions, not to obtain examinees’ scores.
Gibbons and Hedeker (1992) specified the full-information item bifactor analysis (FIIBFA) model as combining a bifactor model with the multi-UIRT model representing simple structure, meaning that each item is related to a general trait and one group trait only. In the two-dimensional computation in the bifactor model, the primary dimension should be considered first, and then the second dimension can be considered to estimate the probability of a correct response. Consequently, the conditional probability of the item response
where the latent variable
Chen et al. (2006) differentiated the bifactor model from a second-order factor model. A bifactor model is potentially applicable if there are multiple domain group factors, each of which is hypothesized to account for the unique influence of the specific domain over and above the general factor. However, a second-order factor model is potentially applicable if the first-order factors are substantially correlated with each other, and there is a second-order factor that is based on the correlations among the first-order factors. The second-order factor model can be employed to test whether a second-order factor explains the first-order factors.
Reise et al. (2007) demonstrated that 16 items in the consumer assessment of health care providers and systems (CAHPS 2.0) were fit well by both the bifactor model and a second-order factor model. However, they showed that the major part of the common variance was explained by a general factor in the bifactor model, which is a main conceptual difference from the second-order factor in a second-order factor model. A second-order factor represents a qualitatively different dimension from first-order factors because a second-order factor explains the common variance among first-order factors, not among observed variables. In contrast, a general factor in the bifactor model is on the same conceptual level as the group factors. Consequently, Reise et al. (2007) argued that although both the bifactor model and the second-order factor model can provide the same fit to data, the second-order factor model does not directly address whether the data are unidimensional or multidimensional, whereas the bifactor model can do so directly.
A number of researchers have demonstrated that the bifactor model provides an excellent framework to measure multidimensional traits containing a primary construct (Gustafsson & Aberg-Bengtsson, 2010; Reise, Moore, & Haviland, 2010). However, the bifactor model is still poorly understood and seldom used by applied researchers. Initially, Weiss and Gibbons (2007) implemented a CAT algorithm with the bifactor model using the completed 616 items of the 626 items in “The Mood-Anxiety Spectrum Scales” (Cassano et al., 1997) and evaluated the efficiency and precision of the performance of CAT with the bifactor model. However, Weiss and Gibbons’ (2007) algorithm was still based on unidimensional testing. Therefore, there is a need for an algorithm of item selection and scoring that is truly multidimensional for CAT with a bifactor model. The objective of this study was to evaluate appropriate multidimensional item selection and
Method
Four factors that reflect realistic testing situations and could affect the precision of CAT were considered: (1) two MIRT models, (2) four item selection methods, (3) three bifactor pattern designs, and (4) two θ estimation methods. The comparison was based on three dependent variables, including the correlation between true θ and estimated θ (
Response Generation
To approximate the condition of equal measurement precision throughout the θ range, item banks contained 400 dichotomous items for the bifactor model and 600 items for a “bifactor-like” model, which were approximately the numbers of items in the Mood-Anxiety Spectrum Scales for which Weiss and Gibbons (2007) implemented a CAT algorithm with the bifactor model. The item responses were generated according to the bifactor model using an R program (R Development Core Team, 2012). In the Monte Carlo simulation study, IRT parameters that could be transformed into factor analytic parameters were specified. The equation for the probability of a correct response for a 3PL bifactor IRT model is
where
The item responses for this study were generated given the true θs and item parameters using Equation 3. The first step in the data generation process was to generate 400 or 600 random numbers from U[0, 1] for each examinee. The probability of a correct response under the 2PL or 3PL bifactor IRT model was obtained for each item, conditional on θ. These model-based probabilities were compared with the random numbers to obtain the item responses. If the model-based probability was greater than the random number, the response to that item was recorded as correct (1). Conversely, if the model-based probability was less than the random number, the item response was recorded as incorrect (0). This process was repeated for each item to obtain the full item response matrix for the 400 or 600 items for each simulated examinee. To reduce the variance of the dependent variables, a total of 1,000 simulees were generated within each of the three bifactor pattern designs.
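The generation steps described above can be sketched in a few lines. This is a minimal illustration, not the author's R program: it assumes a slope-intercept logistic form for the 3PL bifactor model (with c = 0 giving the 2PL case) and independent standard normal true θs, and the function names, item values, and seed are hypothetical.

```python
import math
import random

def p_correct(a_g, a_s, d, c, theta_g, theta_s):
    """Assumed 3PL bifactor form: guessing c plus a logistic in the
    general and group-factor traits. Setting c = 0 gives the 2PL case."""
    z = a_g * theta_g + a_s * theta_s + d
    return c + (1.0 - c) / (1.0 + math.exp(-z))

def simulate_examinee(items, theta, rng):
    """items: (a_g, a_s, d, c, group) tuples, group = 1, 2, ...
    theta: [general, group 1, group 2, ...] true traits."""
    responses = []
    for a_g, a_s, d, c, group in items:
        p = p_correct(a_g, a_s, d, c, theta[0], theta[group])
        u = rng.random()  # one uniform draw per item
        # Correct (1) if the model probability exceeds the random number.
        responses.append(1 if p > u else 0)
    return responses

rng = random.Random(1)  # hypothetical seed
# Two illustrative items, one per group factor (values are made up).
items = [(1.0, 0.3, 0.0, 0.2, 1), (1.0, 0.3, -0.5, 0.2, 2)]
# Assumption: independent standard normal traits for each simulee.
theta = [rng.gauss(0.0, 1.0) for _ in range(3)]
print(simulate_examinee(items, theta, rng))
```

Repeating `simulate_examinee` for each of the 1,000 simulees would give the full response matrix for one pattern design.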
Bifactor Pattern Designs
Each item was assigned a vector of discrimination parameters
A bifactor model is especially appropriate if researchers have instruments with a dominant general factor (Reise et al., 2007). Reise et al. (2007) stated that if items tend to have small loadings on the general factor and large loadings on the group factor, the multi-UIRT model should be used, and if the general factor loadings are larger than the group factor loadings, the bifactor model should be used to measure traits. Table 1 shows three pattern designs that represent typical bifactor patterns and a bifactor-like pattern. The purpose of the three pattern designs was to examine the effect of the bifactor pattern on the estimates of the general and group factors. All three pattern designs were reasonable when translated into factor loadings, with no Heywood cases (Heywood, 1931), and all values were positive. To generate standardized general and group factor discrimination parameters, values of the multidimensional discrimination index (MDISC; Reckase, 1985) were drawn from a log-normal distribution with a mean of zero and standard deviation of 0.20. In the traditional bifactor pattern with low group factor discrimination parameters, the discrimination values were then calculated from MDISC such that the first 200 items had an angle of 15 degrees with the general factor axis and 75 degrees with the first group factor axis, whereas Items 201 to 400 had an angle of 15 degrees with the general factor axis and 75 degrees with the second group factor axis. With the specified angles between the general and group factors, the general factor had some items that weighted more heavily in its direction, which reduced the indeterminacy of which factor would likely be the general factor (DeMars, 2007). 
In the traditional bifactor pattern with high group factor discrimination parameters, the discrimination values were calculated from MDISC such that the first 200 items had an angle of 30 degrees with the general factor axis and 60 degrees with the first group factor axis, whereas Items 201 to 400 had an angle of 30 degrees with the general factor axis and 60 degrees with the second group factor axis. In the bifactor-like pattern, the discrimination values were calculated from MDISC such that the first 200 items had an angle of 30 degrees with the general factor axis and 60 degrees with the first group factor axis, Items 201 to 400 had an angle of 30 degrees with the general factor axis and 60 degrees with the second group factor axis, and Items 401 to 600 loaded on only the general factor, with a discrimination equal to MDISC.
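The conversion from MDISC and angles to discrimination parameters can be illustrated with direction cosines: each discrimination is MDISC times the cosine of the angle with the corresponding axis. The sketch below is an assumption-laden illustration, not the author's code: it treats the stated mean of zero and standard deviation of 0.20 as log-scale parameters, and the function name and seed are hypothetical.

```python
import math
import random

def discriminations(mdisc, angle_general_deg):
    """Split MDISC into general and group discriminations.
    The angle with the group axis is 90 degrees minus the angle
    with the general axis, so its cosine is the sine of the latter."""
    alpha = math.radians(angle_general_deg)
    a_general = mdisc * math.cos(alpha)
    a_group = mdisc * math.sin(alpha)
    return a_general, a_group

rng = random.Random(2015)  # hypothetical seed
bank = []
for item in range(400):
    # Assumption: mean 0 and SD 0.20 are on the log scale.
    mdisc = rng.lognormvariate(0.0, 0.20)
    a_g, a_s = discriminations(mdisc, 15.0)  # low group discrimination design
    group = 1 if item < 200 else 2           # Items 1-200 vs. Items 201-400
    bank.append((a_g, a_s, group))

print(discriminations(1.0, 15.0))  # general and group slopes for MDISC = 1
print(discriminations(1.0, 30.0))
```

Because the two squared direction cosines sum to one, the squared discriminations of each item sum to its squared MDISC, so a 30-degree angle shifts discrimination from the general factor to the group factor without changing overall item discrimination.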
Table 1. Three Discrimination Parameter Conditions for the Bifactor Models.
Note. G = general factor;
For an item bank providing equal measurement precision across θs, item difficulty parameters,
True θs
Although the bifactor model can be used as an exploratory type of analysis using a bifactor rotation criterion (Jennrich & Bentler, 2012), this study applied a confirmatory bifactor analysis within an IRT framework. Because the bifactor model in this study was constructed so that uncorrelated group factors were independent of the general factor (Holzinger & Swineford, 1937), it was not necessary to consider the inter-correlations of traits. Therefore, each examinee had latent traits (
Estimation
The multidimensional maximum a posteriori (MAP; Bock & Aitkin, 1981) method was used to estimate
In the bifactor model, Gibbons et al. (2007) simplified the expected a posteriori (EAP) method to estimate the primary latent variable
where Pl is the unconditional probability of observing response pattern
The EAP estimate of
where
These integrals can be approximated reasonably well using Gauss-Hermite quadrature nodes and weights (see Stroud & Sechrest, 1996).
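As an illustration of how quadrature makes the bifactor EAP tractable, the sketch below approximates the EAP estimate of the general factor with a 5-point Gauss-Hermite rule rescaled to a standard normal density. It is a simplified sketch under stated assumptions, not the study's implementation: a 2PL slope-intercept logistic model, standard normal priors on all traits, and hypothetical function names. The key step is the dimension reduction: at each general-factor node, each group factor is integrated out in its own one-dimensional sum.

```python
import math

# 5-point Gauss-Hermite rule rescaled so the weights integrate
# against a standard normal density (weights sum to 1).
NODES = [-2.8569700, -1.3556262, 0.0, 1.3556262, 2.8569700]
WEIGHTS = [0.0112574, 0.2220759, 0.5333333, 0.2220759, 0.0112574]

def p_correct(a_g, a_s, d, th_g, th_s):
    """Assumed 2PL bifactor probability (slope-intercept form)."""
    return 1.0 / (1.0 + math.exp(-(a_g * th_g + a_s * th_s + d)))

def eap_general(items, responses):
    """items: (a_g, a_s, d, group) tuples; responses: 0/1 scores.
    Returns the EAP estimate and posterior SD of the general factor."""
    groups = sorted({item[3] for item in items})
    posterior = []
    for x_g, w_g in zip(NODES, WEIGHTS):
        likelihood = 1.0
        for grp in groups:
            # One-dimensional integral over this group factor only.
            inner = 0.0
            for x_s, w_s in zip(NODES, WEIGHTS):
                like = 1.0
                for (a_g, a_s, d, g), u in zip(items, responses):
                    if g != grp:
                        continue
                    p = p_correct(a_g, a_s, d, x_g, x_s)
                    like *= p if u == 1 else 1.0 - p
                inner += w_s * like
            likelihood *= inner
        posterior.append(w_g * likelihood)
    total = sum(posterior)
    mean = sum(x * p for x, p in zip(NODES, posterior)) / total
    var = sum((x - mean) ** 2 * p for x, p in zip(NODES, posterior)) / total
    return mean, math.sqrt(var)

items = [(1.0, 0.5, 0.0, 1), (1.0, 0.5, 0.0, 2)]  # illustrative values
print(eap_general(items, [1, 1]))
```

With G group factors and Q nodes per dimension, this evaluates on the order of G × Q² points rather than Q^(G+1), which is why the bifactor structure keeps EAP feasible. A production implementation would use more quadrature points than the five used here.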
Item Selection for MCAT
After estimating current
D-Optimality
Because this criterion maximizes the determinant in Equation 9, it is called D-optimality. In UCAT, items can be selected on the basis of item information. Likewise, in MCAT, the provisional trait estimate vector,
where
where
where
Ds-Optimality
Mulder and van der Linden (2009) stated that Ds-optimality (Silvey, 1980) reflects the optimal item selection for MCAT if the first ability of the ability vector
A Bayesian version of Ds-optimality was applied in this study by adding the inverse of a prior covariance matrix to Equation 11.
Mulder and van der Linden (2009) showed that this criterion generally selects items that highly discriminate with respect to the “intentional” ability, θ1, except if the amount of information about the “nuisance” abilities is relatively low.
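The contrast between D- and Ds-optimality can be made concrete with a small numerical sketch. It assumes a 2PL compensatory model in slope-intercept form, an identity prior covariance matrix, and θ1 (the general factor) as the single intentional ability; the helper names and item values are hypothetical. The D criterion is the determinant of the posterior information matrix, and the Ds criterion divides that determinant by the determinant of the block for the nuisance abilities, which is equivalent to minimizing the posterior variance of θ1.

```python
import math

def det(m):
    """Determinant by Laplace expansion (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total

def item_info(a, d, theta):
    """2PL item information matrix: P(1 - P) * a a'."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    p = 1.0 / (1.0 + math.exp(-z))
    w = p * (1.0 - p)
    return [[w * ai * aj for aj in a] for ai in a]

def posterior_info(base, a, d, theta):
    """Add a candidate item's information to the current matrix."""
    info = item_info(a, d, theta)
    n = len(a)
    return [[base[i][j] + info[i][j] for j in range(n)] for i in range(n)]

def d_criterion(m):
    return det(m)

def ds_criterion(m):
    """Intentional ability is the first one; the rest are nuisance."""
    nuisance = [row[1:] for row in m[1:]]
    return det(m) / det(nuisance)

# Assumption: identity prior covariance, so its inverse is also identity.
prior_precision = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
theta = [0.0, 0.0, 0.0]
candidates = {
    "A": ([1.5, 0.2, 0.0], 0.0),  # mostly general factor
    "B": ([0.2, 1.5, 0.0], 0.0),  # mostly first group factor
}
for name, (a, d) in candidates.items():
    m = posterior_info(prior_precision, a, d, theta)
    print(name, round(d_criterion(m), 4), round(ds_criterion(m), 4))
```

In this constructed example the two candidates are tied under D-optimality, whereas Ds-optimality prefers the item that discriminates on the general factor, in line with the behavior Mulder and van der Linden (2009) describe.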
A-Optimality
The A-optimality method minimizes the sum of the asymptotic variances of the estimates, resulting in selection of the item that minimizes the traces of the inverse to the information matrix,
Because A-optimality results in an item selection criterion that contains the determinant of the information matrix as an important factor, it is similar to the D-optimality method but is different from the Ds-optimality method (Mulder & van der Linden, 2009). A Bayesian version of A-optimality was applied in this study by adding the inverse of a prior covariance matrix to Equation 13:
E-Optimality
The criterion of E-optimality maximizes the smallest eigenvalues of the information matrix. The minimum of the eigenvalues of
Implementing the MCAT Algorithm
In the MCAT algorithm, θ estimation and item selection proceeded for all dimensions simultaneously. To compare the four item selection methods and two θ estimation methods, a fixed test length was used. Weiss and Gibbons (2007) showed that the mean number of items administered in CAT ranged from approximately 20 to 50 items per scale to recover each scale score with a correlation greater than .90 for all group factor scales. Therefore, MCAT in the present study terminated after 40 items were administered in the bifactor model with two group factors. The response pattern, current θ estimates, and OSE for each examinee were saved after each item was administered. Because no commercial software was available to implement the MCAT algorithm with the bifactor model, the MCAT algorithms were developed in the R language (R Development Core Team, 2012) by the author. To validate the program, θ estimates obtained by the two estimation methods for the full-length MCAT algorithm were compared with those computed by the “mirt” R package (Chalmers, 2012). The two sets of factor score matrices were identical to two decimal places.
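The overall flow of a fixed-length MCAT can be sketched as follows. This is a simplified illustration rather than the author's R program: it assumes a 2PL slope-intercept model with one general and two group factors, an identity prior covariance, Bayesian D-optimality for item selection, and Newton-Raphson MAP estimation capped at 10 iterations. The bank size, difficulty range, seeds, and all function names are hypothetical.

```python
import math
import random

def prob(a, d, theta):
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, b):
    """Solve m x = b by Cramer's rule (fine for a 3 x 3 system)."""
    dm = det3(m)
    cols = []
    for k in range(3):
        mk = [[b[i] if j == k else m[i][j] for j in range(3)] for i in range(3)]
        cols.append(det3(mk) / dm)
    return cols

def posterior_info(items, administered, theta):
    """Identity prior precision plus information of administered items."""
    m = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for idx in administered:
        a, d = items[idx]
        p = prob(a, d, theta)
        for i in range(3):
            for j in range(3):
                m[i][j] += p * (1.0 - p) * a[i] * a[j]
    return m

def map_estimate(items, administered, responses, max_iter=10):
    theta = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        grad = [-t for t in theta]  # standard normal prior term
        for idx, u in zip(administered, responses):
            a, d = items[idx]
            p = prob(a, d, theta)
            for k in range(3):
                grad[k] += a[k] * (u - p)
        step = solve3(posterior_info(items, administered, theta), grad)
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < 0.001:
            break
    return theta

def run_mcat(items, true_theta, test_length, rng):
    administered, responses = [], []
    theta = [0.0, 0.0, 0.0]
    for _ in range(test_length):
        base = posterior_info(items, administered, theta)
        best, best_val = None, -1.0
        for idx, (a, d) in enumerate(items):
            if idx in administered:
                continue
            p = prob(a, d, theta)
            w = p * (1.0 - p)
            cand = [[base[i][j] + w * a[i] * a[j] for j in range(3)]
                    for i in range(3)]
            val = det3(cand)  # Bayesian D-optimality criterion
            if val > best_val:
                best, best_val = idx, val
        a, d = items[best]
        responses.append(1 if prob(a, d, true_theta) > rng.random() else 0)
        administered.append(best)
        theta = map_estimate(items, administered, responses)
    return theta, administered, responses

def make_bank(rng, n_items=60):
    """Hypothetical small bank: 15-degree angle, difficulties in [-2, 2]."""
    bank = []
    for i in range(n_items):
        mdisc = rng.lognormvariate(0.0, 0.20)
        a_g = mdisc * math.cos(math.radians(15.0))
        a_s = mdisc * math.sin(math.radians(15.0))
        a = [a_g, a_s, 0.0] if i < n_items // 2 else [a_g, 0.0, a_s]
        bank.append((a, rng.uniform(-2.0, 2.0)))
    return bank

bank = make_bank(random.Random(3))
theta_hat, used, scored = run_mcat(bank, [0.5, -0.3, 0.2], 40, random.Random(7))
print([round(t, 2) for t in theta_hat], len(used))
```

The prior precision keeps the matrix in `posterior_info` positive definite from the first item onward, so the Newton step is always defined. Swapping the `det3(cand)` line for another criterion is the only change needed to compare item selection methods within the same loop.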
Evaluative Criteria
Four factors were examined: (1) three bifactor patterns, (2) four item selection methods, (3) two θ estimation methods, and (4) two MIRT models. The
where j is an examinee, k is each factor, and N is the number of examinees. The efficiency of
The OSE of the EAP method in the MCAT was obtained by taking the square root of the posterior variances in Equations 5 and 7. These indices provided descriptive information about the recovery of
Results
Estimation Issues
The first operational problem for the θ estimation methods in the MCAT with the bifactor model was the difficulty of meeting the convergence criterion of .001 simultaneously for all latent traits. Therefore, this study used θ estimation algorithms with a maximum of 10 iterations instead of the convergence criterion of .001. To check convergence of the 1,000 simulees’ estimates in each condition, the convergence criterion of each factor was examined with an R program. The average convergence criteria for each latent trait across the 1,000 simulees were less than .001 for all latent traits when the maximum number of iterations was set to 10. The second computational issue was finding MLE estimates in the MCAT algorithm with the bifactor model. MLE in the bifactor model has a computational problem in that the Hessian matrix becomes singular during updating
The bifactor model in this study was constructed so that uncorrelated group factors were independent of the general factor (Holzinger & Swineford, 1937). According to the model assumption,
Correlations Among
Note. MAP = maximum a posteriori; EAP = expected a posteriori.
Fidelity, Accuracy, and Efficiency
Tables 3 and 4 summarize
Note. MAP = maximum a posteriori; EAP = expected a posteriori. G is the general factor,
Table 4. RMSEs Across Four Item Selection Algorithms for Three Bifactor Pattern Designs.
Note. MAP = maximum a posteriori; EAP = expected a posteriori; RMSE = root mean square error. G is the general factor,
The precision of
MAP estimation showed higher
Table 5 displays the average OSEs of the
Table 5. OSEs of Estimates Across Four Item Selection Algorithms for Three Bifactor Pattern Designs.
Note. OSE = observed standard error; MAP = maximum a posteriori; EAP = expected a posteriori. G is the general factor,
Similar to the results for correlation and RMSE, OSEs for the group factors in the bifactor pattern with high group factor discrimination parameters were smaller than in the other bifactor pattern designs across all conditions, whereas the general factor OSEs in the bifactor design with low group factor discrimination were smaller than in the other bifactor pattern designs across all conditions. As the average group factor discrimination parameters increased, the OSEs of the group factors decreased. Because higher item information lowers OSEs, the increase in the group factor discrimination parameters produced low OSEs for the group factors in MCAT with the bifactor model. In the model comparison, the 2PL bifactor model provided lower OSEs than the 3PL bifactor model across all conditions.
Discussion and Conclusions
This study applied the bifactor model to MCAT under varied bifactor pattern designs using two MIRT models, three θ estimation methods, and four MCAT item selection methods. Results showed that MLE cannot be applied within the bifactor model because of singularity of the information matrix. Therefore, only MAP and EAP θ estimation methods were analyzed. By examining the correlations among estimated θs, this study demonstrated that the MCAT algorithm satisfied the independence of the general factor and the group factors. An interesting finding in this study was that Ds-optimality item selection worked well with EAP estimation for the general factor scores. For the group factors, however, D- or A-optimality item selection improved both the accuracy and efficiency of
High group factor discrimination parameters contributed to higher precision and better efficiency of only group factor
Although development of MIRT began many years ago, there are few practical applications of MIRT to CAT because of its complexity. The advantages of MCAT with the bifactor model are that (1) it has higher fidelity and accuracy with respect to a k-dimensional vector of traits for each examinee as compared with using successive unidimensional CATs and (2) it satisfies independence of the general factor θ estimates and the group factor θ estimates (the general factor score can be interpreted as the unidimensional trait in UCAT). In the future, therefore, continued research on MCAT with the bifactor model is warranted and the analytic techniques in this study should help researchers who would implement MCAT with the bifactor model find the systematic and optimal item selection and estimation methods that are appropriate under certain circumstances for practical application.
However, some obvious issues are raised by this study for future research. This study used the bifactor model to implement MCAT as one specific model among many possible multidimensional models. By considering the effect of multidimensionality on IRT item parameter estimates, the bifactor model estimates a substantively common trait as well as capturing multiple constructs (Reise et al., 2007). However, the bifactor model is not versatile for all psychological data. Gibbons et al. (2008) mentioned some limitations of the bifactor model. First, the bifactor model specification relies on prior information to indicate the relationships between items and factors. Second, a primary (i.e., general) dimension is assumed to exist. If the test items show a strong simple factor structure, the bifactor model will not be useful. Under these circumstances, it would be appropriate to use a unidimensional model for each factor structure rather than the bifactor model. This model comparison is empirically testable by comparing the fit of the bifactor model with corresponding unidimensional models and unrestricted multidimensional models by using TESTFACT (Wood et al., 2003) or the “mirt” R package (Chalmers, 2012). Third, the bifactor model requires each item to load on a primary dimension and on no more than one subdomain. If items are related to multiple subdomains, they will not be appropriate for the bifactor model. Therefore, the bifactor model can be adapted to MCAT only if the researcher has strong theoretical beliefs concerning the structure of a domain or has evidence concerning the actual dimensional structure of a test, and after demonstrating that the bifactor model fits better than alternative models.
In this study, only a small subset of bifactor structures was examined to investigate the quality of MCAT with the bifactor model. In this regard, this study has several limitations that could be of interest in future studies. First, this study generated the data using specific instances of the bifactor structure. This means that the same mean and standard deviation of group factor discrimination parameters were applied to every group factor when generating group factor discrimination parameters for each item. Second, this study used only three bifactor loading conditions (with low group factor loadings, high group factor loadings, and bifactor-like loadings). Future studies should investigate whether the same conclusions are reached when the conditions are further varied. Furthermore, additional multidimensional item selection methods can be considered, such as maximizing the Kullback-Leibler information (Veldkamp & van der Linden, 2002). Yao (2013) recommended the Kullback-Leibler method for variable-length MCAT. Possible directions include using the complete information matrix to capture more information rather than applying the determinant alone, or a simpler algorithm that reduces the computational load. A new item selection method designed for the bifactor model needs to be developed and compared with the usual multidimensional item selection methods in future research. Additionally, there are other potential research issues in MCAT with the bifactor model, such as how to select the first item and how to control item exposure. MCAT with the bifactor model could resolve the content-balance issue in practical testing. Because each item in the bifactor model loads on only one specific group factor, MCAT with the bifactor model alternated items that loaded on each group factor, which functioned as content balancing. 
MCAT with the bifactor model administered a relatively equal number of items from each of the group factor scales. This yields content-balanced θ estimates based on various mixtures of group factor scales, which is an additional advantage of MCAT with the bifactor model.
Finally, researchers should consider some computational issues when MCAT with the bifactor is applied to practical data. First, MLE did not operate in the MCAT with the bifactor model. As mentioned above, the Hessian matrix (p×p symmetric matrix) of second derivatives (the negative information matrix) evaluated at
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
