Abstract
The present study examined the factor structure of the Differential Ability Scales–Second Edition (DAS-II) core subtests from the standardization sample via confirmatory factor analysis (CFA) using methods (bifactor modeling and variance partitioning) and procedures (robust model estimation due to nonnormal subtest score distributions) recommended but not included in the DAS-II Introductory and Technical Handbook. CFAs were conducted with the three DAS-II standardization sample age groups (lower early years [age = 2:6–3:5 years], upper early years [age = 3:6–6:11 years], school age [7:0–17:11 years]) using standardization sample raw data provided by NCS Pearson, Inc. Although most DAS-II core subtests were properly associated with the theoretically proposed group factors, both the higher order and bifactor models indicated that the g factor accounted for large portions of total and common variance, whereas the group factors (Verbal, Nonverbal, Spatial) accounted for small portions of total and common variance. The DAS-II core battery provides strong measurement of general intelligence, and clinical interpretation should be primarily, if not exclusively, at that level.
The Differential Ability Scales–Second Edition (DAS-II; Elliott, 2007a) is a popular battery of cognitive tests for assessing the intelligence of children and adolescents aged 2 to 17 years. Although it is becoming somewhat dated (the norms are now more than 12 years old), it is still used by practitioners and is included in omnibus interpretive systems such as Cross-Battery Assessment. The DAS-II is a revision of the DAS (Elliott, 1990), an adaptation of the British Ability Scales (Elliott et al., 1979) that was standardized for use in the United States. There are three age-related levels, lower early years (2:6–3:5 years), upper early years (3:6–6:11 years), and school age (7:0–17:11 years), and each level contains a different configuration of the 10 core subtests appropriate for that age. These subtests combine to yield a General Conceptual Ability (GCA) score, a higher-order composite score thought to measure psychometric g (Spearman, 1927). There are also three first-order composite scores, called cluster scores (Verbal Ability [V], Nonverbal Reasoning Ability [NV], and Spatial Ability [SP]), that are hypothesized to reflect more specific and diverse aptitudes. In addition, the DAS-II provides nine supplementary subtests across the various age brackets, which contribute to three diagnostic cluster scores (Processing Speed, Working Memory, and School Readiness). However, these indicators do not contribute to the GCA or the three primary cluster scores and thus were not the focus of the present investigation.
Although the Introductory and Technical Handbook (Elliott, 2007b) indicated that the DAS-II development was not driven by a single theory of cognitive ability, the content and structure of the DAS-II were heavily influenced by the Cattell–Horn–Carroll (CHC) model of cognitive abilities (Carroll, 1993, 2003; Cattell & Horn, 1978; Horn, 1991; Schneider & McGrew, 2018). This model also guided the assessment of DAS-II structural validity and serves as the primary framework for score interpretation.
The Introductory and Technical Handbook suggests that users interpret DAS-II scores in a stepwise fashion, beginning with the GCA and then proceeding to more specific measures (e.g., clusters and subtests). However, Elliott (2007b) suggested that the profile of strengths and weaknesses generated at the cluster and subtest levels is of more value than the information provided by the GCA, especially when considerable variability across the cluster scores is observed, and the Introductory and Technical Handbook outlines detailed procedures for evaluating scatter among the cluster and subtest scores. According to Elliott, “the most satisfactory description of a child’s abilities is nearly always at the level of profile analysis” (p. 87). Such prescriptive statements, however, are rarely justified in applied practice and require adherence to standards of empirical evidence (Marley & Levin, 2011). More recently, McGill et al. (2018) reported both an absence of supportive evidence and the presence of negative evidence for such profile analyses since the seminal review by Watkins (2000).
Interpretation of test scores and score comparisons must be guided by strong, replicated empirical evidence derived from structural validity and from relationships with external variables, including incremental validity and diagnostic and treatment utility, as noted in the Standards for Educational and Psychological Testing (American Educational Research Association et al., 2014). An important starting point for such evidence is the test structure, because structural validity is a requisite property of broader construct validity (Keith & Kranzler, 1999).
The DAS-II Introductory and Technical Handbook did not report results of exploratory factor analysis (EFA) in examining construct validity, nor did it disclose the proportions of variance accounted for by the higher order g factor and the proposed first-order group factors, subtest g loadings, subtest specificity estimates, or incremental predictive validity estimates for the factor and subtest scores. Without this information, clinicians cannot independently determine the importance of factor and subtest scores relative to the GCA score. Factor or subtest scores that fail to capture meaningful portions of non-g true score variance will likely be of limited clinical utility. The omission of incremental predictive validity results is particularly troubling because users are encouraged to interpret the DAS-II beyond the GCA level, yet DAS-II cluster scores, like all such scores, conflate general intelligence variance and group factor variance. Youngstrom et al. (1999) examined the incremental validity of the original DAS and found that interpretation beyond the GCA was not supported.
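Subtest specificity, one of the quantities not reported in the Introductory and Technical Handbook, is conventionally estimated as the reliable variance a subtest does not share with the common factors, that is, reliability minus communality. The following Python sketch shows that arithmetic under stated assumptions: scores are standardized, a reliability coefficient and a communality (h2) from some factor solution are already available, and the function name and example values are hypothetical rather than DAS-II estimates.

```python
def partition_subtest_variance(reliability: float, communality: float) -> dict:
    """Split a standardized subtest's variance into common, specific, and error parts.

    Assumes standardized scores (total variance = 1), a reliability coefficient
    for the subtest, and a communality (h2) taken from a factor solution.
    """
    if not 0.0 <= communality <= reliability <= 1.0:
        raise ValueError("expected 0 <= communality <= reliability <= 1")
    return {
        "common": communality,                  # shared with g and the group factors
        "specific": reliability - communality,  # reliable variance unique to the subtest
        "error": 1.0 - reliability,             # measurement error variance
    }


# Hypothetical values for illustration only (not DAS-II estimates).
parts = partition_subtest_variance(reliability=0.85, communality=0.60)
print({k: round(v, 2) for k, v in parts.items()})
```

Because the three parts sum to 1, a subtest with high communality but modest reliability can have little or no specific variance left to interpret.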
Structural Validity Investigations of the DAS-II
Confirmatory factor analyses (CFAs) of the DAS-II hierarchical structure were reported in the DAS-II Introductory and Technical Handbook (Elliott, 2007b). Figures 8.1, 8.2, 8.3, and 8.4 of the Handbook illustrate the standardized validation models for, respectively, seven core and diagnostic subtests (2:6–3:5 years) with two first-order factors; 11 core and diagnostic subtests (4:0–5:11 years) with five first-order factors; 14 core and diagnostic subtests (6:0–12:11 years) with seven first-order factors; and 12 core and diagnostic subtests with six first-order factors. In these models, several first-order factors that are not scored in the published DAS-II were specified (e.g., auditory processing, visual–verbal memory, and verbal short-term memory). In addition, the auditory processing and visual–verbal memory factors in the final validation models for ages 6 to 17 years were each defined by a single indicator and therefore reflect an empirically underidentified dimension. Although single-indicator variables can be included in CFA, variables assessed by a single measure should not be interpreted as factors because they do not possess any shared variance from multiple indicators (Brown, 2015).
Keith et al. (2010) examined measurement invariance of the DAS-II core and diagnostic subtest structure and reported support for a six-factor hierarchical model that corresponded closely with CHC theory, with general intelligence at the apex. However, the final validation model required the specification of a cross-loading for the Verbal Comprehension measure on the Crystallized Ability and Fluid Reasoning factors. Although Keith et al. provided residualized subtest factor loadings from their DAS-II CFAs, the clinical utility of these results is limited because they were derived from a hypothesized first-order latent structure that deviates substantially from the structure suggested in the Introductory and Technical Handbook (Elliott, 2007b). Neither Elliott (2007b) nor Keith et al. reported univariate or multivariate skewness or kurtosis estimates for the scales used as indicators in their CFA models, which could have implications for proper model estimation if the data were nonnormally distributed. Also missing from the CFAs conducted by Elliott and Keith et al. were comparisons with rival bifactor structures as explanations of the DAS-II structure.
Until recently, independent factor analytic investigations of the DAS-II, as well as the validation study results reported in the Introductory and Technical Handbook, relied exclusively on CFA procedures applied to various configurations of the core and supplementary subtests to produce different hierarchical models consistent with CHC theory. However, using these results to ascertain what the core battery measures is problematic because those models are not structurally equivalent (Cattell, 1978). In recognition of these limitations, Canivez and McGill (2016) conducted hierarchical EFA and variance decomposition using the Schmid and Leiman (1957) procedure. Canivez and McGill found that although the DAS-II core subtests measured the general intelligence dimension (estimated by the GCA) well, as evidenced by high omega-hierarchical (ωH) coefficients, the DAS-II group factors (V, NV, SP) did not contribute sufficient unique variance, as evidenced by low and inadequate omega-hierarchical subscale (ωHS) coefficients. These results suggest that clinical interpretation of the DAS-II should likely be restricted to the GCA level and that any interpretation of other scores or comparisons beyond the GCA should be done with caution and in light of additional external validation evidence.
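The Schmid and Leiman (1957) procedure re-expresses a higher order solution as orthogonal loadings on g and on residualized group factors. The Python sketch below is a minimal illustration, assuming a single second-order factor and loadings that have already been estimated; the function name and numbers are hypothetical, and this is not the software used by Canivez and McGill (2016). Each subtest's g loading is the product of its first-order loading and that factor's second-order loading, and the residualized group loading keeps only the part of the first-order factor that g does not explain.

```python
import math
from typing import Dict, Tuple


def schmid_leiman(
    first_order: Dict[str, Dict[str, float]],
    second_order: Dict[str, float],
) -> Dict[str, Tuple[float, Dict[str, float]]]:
    """Orthogonalize a solution with one second-order (g) factor.

    first_order: subtest -> {first-order factor: loading}
    second_order: first-order factor -> loading on g
    Returns subtest -> (g loading, {factor: residualized group loading}).
    """
    result = {}
    for subtest, loadings in first_order.items():
        # g loading: indirect effect of g through each first-order factor.
        g_loading = sum(lam * second_order[f] for f, lam in loadings.items())
        # Group loading: the part of each first-order factor not explained by g.
        residual = {
            f: lam * math.sqrt(1.0 - second_order[f] ** 2)
            for f, lam in loadings.items()
        }
        result[subtest] = (g_loading, residual)
    return result


# Hypothetical loadings for two subtests on a Verbal factor (illustration only).
sl = schmid_leiman({"A": {"V": 0.80}, "B": {"V": 0.70}}, {"V": 0.90})
for name, (g, grp) in sl.items():
    print(name, round(g, 3), {f: round(v, 3) for f, v in grp.items()})
```

With a second-order loading of .90, for example, each residualized Verbal loading shrinks to about 44% of its first-order value (√(1 − .81) ≈ .436), which is why group factors that correlate strongly with g retain little unique variance.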
To further examine the DAS-II structure, Dombrowski, McGill, Canivez, and Peterson (2019) used similar EFA procedures to analyze the total battery with standardization sample data from the 5- to 8-year-old age range, in order to determine the degree to which the DAS-II theoretical structure proposed in the Introductory and Technical Handbook, and later refined by Keith et al. (2010), could be replicated. Results suggested a six-factor solution that was generally consistent with the CHC-based structure suggested by the publisher, with desired simple structure attained. However, two subtests (Picture Similarities and Early Number Concepts) did not saliently load on any group factor. Dombrowski, McGill, and Morgan (2019) used Monte Carlo simulation, resampling the standardization sample correlation matrices 1,000 times, and analyzed the structure of the DAS-II total battery using maximum likelihood CFA. Both studies generally supported the theoretical structure posited in the Introductory and Technical Handbook. However, as in Canivez and McGill (2016), large portions of subtest variance in both studies were apportioned to the general intelligence dimension, resulting in large ωH coefficients, whereas small portions of unique variance were apportioned to the first-order group factors, as evidenced by small ωHS coefficients, indicating inadequate unique measurement by the group factors.
Alternatively, Dombrowski, Golay, et al. (2018) used Bayesian structural equation modeling (BSEM) to examine the latent factor structure of the DAS-II core subtests using standardization sample raw data for the 7 to 17 years age range. This approach allows estimation of small, nonzero parameters that are often fixed to zero in traditional CFA, a practice that can inflate factor covariances and potentially distort model results. Results revealed the plausibility of the hypothesized three-factor model, consistent with publisher theory, expressed as either a higher order (HO) or a bifactor (BF; Holzinger & Swineford, 1937) model. However, the best BSEM model fit was obtained for an alternative structure: a bifactor model with two group factors (V, SP), in which Matrices (MAT) and Sequential and Quantitative Reasoning (SQR) loaded on g only and there was no NV group factor. As with Canivez and McGill (2016); Dombrowski, McGill, Canivez, and Peterson (2019); and Dombrowski, McGill, and Morgan (2019), the general intelligence factor dominated subtest variance and had high ωH coefficients, but the DAS-II group factors (V, NV, SP) did not contribute sufficient unique variance, as shown by low and inadequate ωHS coefficients.
Although the latent structure of the DAS-II core subtests was examined via CFA in the Introductory and Technical Handbook, the standardized solutions for these analyses were not provided, and rival models (e.g., bifactor) were not evaluated. Inspection of the goodness-of-fit results in Table 8.4 indicates that a three-factor hierarchical model provided the best solution for the core subtests across the age spans, with fairly robust improvements in fit compared with competing one-factor and hierarchical two-factor models. In general, CFAs have supported a hierarchical model with general intelligence at the apex and three first-order factors for the core subtests, but bifactor structure has thus far been examined only by Dombrowski, Golay, et al. (2018), using a recently rediscovered approach to latent variable modeling, and those results did not support a three-factor structure for the core subtests.
Purpose of the Current Study
The CFAs for the DAS-II core subtests reported in the Introductory and Technical Handbook were not sufficiently explicated, did not recognize or account for nonnormal distributions, and did not disclose the portions of variance accounted for by the various factors. Furthermore, Elliott (2007b) did not provide standardized parameter estimates for the core subtest models or examine rival bifactor representations, which might provide better and more parsimonious fit to the standardization sample data. Accordingly, the purpose of the present investigation was to extend the results of the Canivez and McGill (2016) EFA study and to address the limitations of the CFAs reported in the DAS-II Introductory and Technical Handbook. Specifically, the present study examined the factor structure of the DAS-II core subtests through CFA, with disclosure of the variance contributions of the latent factors, using the normative sample raw data across the three test levels (i.e., lower early years, upper early years, and school age). The results of the present investigation are expected to inform how the DAS-II core battery should be interpreted in clinical practice.
Method
Participants
Participants were members of the DAS-II standardization sample, a total of 3,460 individuals ranging in age from 2:6 to 17:11 years. Age groups included lower early years (2:6- to 3:5-year-olds; N = 352), upper early years (3:6- to 6:11-year-olds; N = 920), and school age (7:0- to 17:11-year-olds; N = 2,188). Detailed demographic characteristics are provided in the DAS-II Introductory and Technical Handbook (Elliott, 2007b). The standardization sample was obtained using stratified proportional sampling across the key demographic variables of age, sex, race/ethnicity, parent educational level, and geographic region, and examination of the demographic results reported in the Introductory and Technical Handbook reveals close correspondence across the stratification variables to the October 2002 U.S. census estimates.
Table S1 (see supplemental material) presents DAS-II core subtest correlation matrices and descriptive statistics for the three age groups, which indicate some departure from normality (Onwuegbuzie & Daniel, 2002; West et al., 1995). Univariate skewness estimates across the three age groups ranged from −0.911 to 0.733, whereas univariate kurtosis estimates ranged from 0.556 to 3.195. Mardia’s (1970) normalized multivariate kurtosis estimates for the lower early years sample (ages 2:6–3:5; z = 11.37), the upper early years sample (ages 3:6–6:11; z = 26.49), and the school-age sample (ages 7:0–17:11; z = 33.86) indicated statistically significant (p < .05) multivariate nonnormality in all three age groups (Cain et al., 2017), which has implications for CFA model estimation and fit statistics.
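For readers who wish to check multivariate normality in their own data, the following Python sketch computes Mardia’s (1970) multivariate kurtosis b2,p and its normalized z statistic using only the standard library. It assumes complete data and the maximum likelihood covariance matrix, and it uses the asymptotic formula z = (b2,p − p(p + 2)) / √(8p(p + 2)/N). EQS and other programs may apply small-sample adjustments, so values from this sketch will not necessarily match the estimates reported above; the function names and sample data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _invert(m: List[List[float]]) -> List[List[float]]:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mardia_kurtosis(data: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (b2p, z) for Mardia's multivariate kurtosis; rows are cases."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    dev = [[row[j] - means[j] for j in range(p)] for row in data]
    # Maximum likelihood covariance matrix (divisor n).
    cov = [[sum(d[a] * d[b] for d in dev) / n for b in range(p)] for a in range(p)]
    inv = _invert(cov)
    # Squared Mahalanobis distance of each case from the centroid.
    d2 = [sum(d[a] * inv[a][b] * d[b] for a in range(p) for b in range(p)) for d in dev]
    b2p = sum(x * x for x in d2) / n
    expected = p * (p + 2)  # value of b2p expected under multivariate normality
    z = (b2p - expected) / math.sqrt(8 * expected / n)
    return b2p, z


# Tiny hypothetical data set (rows = examinees, columns = subtests).
sample = [[10, 12], [9, 11], [11, 13], [8, 8], [12, 15], [10, 9], [7, 10], [13, 14]]
b2p, z = mardia_kurtosis(sample)
print(f"b2,p = {b2p:.3f}, z = {z:.2f}")
```

Positive z values indicate heavier tails than the multivariate normal, the situation that motivates robust estimation in the analyses that follow.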
Instrument
The DAS-II uses different combinations of the 10 core subtests to produce the GCA score at different points in the age span. The GCA score is composed of four core subtests at ages 2:6 through 3:5 years and of six core subtests from ages 3:6 through 17:11 years. The core subtests combine to form three primary cognitive clusters at the first-order level, each composed of two subtests. Verbal Ability (V) and Nonverbal Reasoning Ability (NV) clusters are provided for all ages, and an additional Spatial Ability (SP) cluster is available from ages 3:6 through 17:11 years. Supplemental diagnostic subtests can be combined to yield additional first-order clusters (e.g., Working Memory, Processing Speed, and School Readiness); however, these measures are not used to calculate the higher order GCA composite or its lower order cognitive clusters, and they cannot be substituted for the core subtests.
Procedure and Analyses
NCS Pearson, Inc. provided standardization sample raw data for independent analyses. The models suggested by the Canivez and McGill (2016) EFA and those promoted by Elliott (2007b) and the publisher (see Table 8.4) were examined and compared. Whereas Elliott reported only oblique models for the core subtests, the present study examined oblique, higher order, and bifactor structures to evaluate their fit to these data.
CFAs with maximum likelihood estimation were conducted using EQS 6.3 (Bentler & Wu, 2016). Each of the three latent group factors defined by the DAS-II core subtests (V, NV, SP) has only two observed indicators and is thus empirically underidentified. Consequently, to ensure identification of the CFA bifactor models, the loadings of the two subtests on each group factor were constrained to equality (Little et al., 1999). Given the significant multivariate kurtosis observed in all three age groups, robust maximum likelihood estimation with the Satorra and Bentler (S-B; 2001) corrected chi-square was applied. Byrne (2006) indicated that “the S-B χ2 has been shown to be the most reliable test statistic for evaluating mean and covariance structure models under various distributions and sample sizes” (p. 138). Because Elliott (2007b) did not disclose univariate or multivariate normality estimates or apply the S-B–corrected χ2, the present results may differ from those presented in the Introductory and Technical Handbook. Some models reported in Table 8.4 included cross-loadings of the Picture Similarities and Matrices subtests on multiple factors, but these were not examined here because cross-loadings abandon desired simple structure. Notably, previous EFA studies (Canivez & McGill, 2016; Dombrowski, Golay, et al., 2018; Dombrowski, McGill, Canivez, & Peterson, 2019) did not support specification of these parameters, and Keith et al. (2010) did not include cross-loadings of the Picture Similarities and Matrices subtests in their initial calibration, reference variable, or final validation CFA models. Furthermore, those parameters deviate from the theoretical structure of the test, which is based on desired simple structure.
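Comparing nested models estimated with S-B scaling cannot be done by simply subtracting the scaled χ2 values, because that difference is not χ2 distributed; Satorra and Bentler (2001) proposed a scaled difference test for this purpose. The Python sketch below implements that formula under the assumption that each model’s ordinary ML χ2 and its scaling correction factor (ML χ2 divided by S-B χ2) are available from the software output. The function name and the numbers in the usage example are hypothetical.

```python
from typing import Tuple


def sb_scaled_difference(t0: float, df0: int, c0: float,
                         t1: float, df1: int, c1: float) -> Tuple[float, int]:
    """Satorra-Bentler (2001) scaled chi-square difference test.

    Model 0 is nested in (more restricted than) model 1. t0 and t1 are the
    ordinary ML chi-squares; c0 and c1 are the scaling correction factors.
    Returns the scaled difference statistic and its degrees of freedom.
    """
    if df0 <= df1:
        raise ValueError("model 0 must have more degrees of freedom than model 1")
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)  # scaling factor for the difference
    if cd <= 0:
        raise ValueError("non-positive difference scaling factor; test not interpretable")
    return (t0 - t1) / cd, df0 - df1


# Hypothetical values: a restricted model versus a less restricted one.
stat, ddf = sb_scaled_difference(t0=120.0, df0=9, c0=1.20, t1=60.0, df1=6, c1=1.10)
print(f"scaled difference chi-square = {stat:.2f} on {ddf} df")
```

The resulting statistic is referred to a χ2 distribution with the difference in degrees of freedom.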
Because the large sample size could unduly influence the χ2 value (Kline, 2016), approximate fit indices were used to aid model evaluation and selection. Although criterion values for approximate fit indices are not universally accepted (McDonald, 2010), the comparative fit index (CFI), Tucker–Lewis index (TLI), and root mean square error of approximation (RMSEA) were used to evaluate global model fit. Higher values indicate better fit for the CFI and TLI, whereas lower values indicate better fit for the RMSEA. The combinatorial heuristics of Hu and Bentler (1999) were applied: CFI and TLI ≥ .90 and RMSEA ≤ .08 were criteria for adequate model fit, whereas CFI and TLI ≥ .95 and RMSEA ≤ .06 were criteria for well-fitting models. Marsh et al. (2004), however, cautioned against overgeneralizing such heuristics, which could result in the incorrect rejection of an acceptable model (Type I error). The Akaike Information Criterion (AIC) was also considered. Because AIC does not have a meaningful scale, the model with the smallest AIC value was preferred as most likely to replicate (Kline, 2016). Superior models required adequate to good overall fit and meaningfully better fit (ΔCFI > .01, ΔRMSEA > .015, ΔAIC > 10) than alternative models (Burnham & Anderson, 2004; Chen, 2007; Cheung & Rensvold, 2002). Local fit was considered in addition to global fit because models should never be retained “solely on global fit testing” (Kline, 2016, p. 461). The large sample size also provides statistical power sufficient to detect even small differences, as well as more precise model parameter estimates.
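The global fit rules described above can be written out explicitly. This Python sketch computes point estimates of RMSEA, CFI, and TLI from model and baseline χ2 values and then applies the Hu and Bentler (1999) combinations and the difference criteria used in this study. It is illustrative only: the function names are hypothetical, RMSEA is computed with N − 1 (some programs use N), and plugging in S-B scaled χ2 values yields the conventional scaled indices, which may differ from the robust versions reported by some software.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA; uses N - 1 (some programs use N)."""
    if df <= 0:
        raise ValueError("RMSEA is undefined for df <= 0")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index relative to a baseline (independence) model."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0.0 else 1.0 - d_m / d_b


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis index (non-normed; can exceed 1)."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


def hu_bentler(cfi_v: float, tli_v: float, rmsea_v: float) -> str:
    """Combinatorial cutoffs as stated in the text."""
    if cfi_v >= 0.95 and tli_v >= 0.95 and rmsea_v <= 0.06:
        return "good"
    if cfi_v >= 0.90 and tli_v >= 0.90 and rmsea_v <= 0.08:
        return "adequate"
    return "inadequate"


def improvement(better: dict, worse: dict) -> dict:
    """Report which of the text's difference criteria a candidate model meets."""
    return {
        "dCFI > .01": better["cfi"] - worse["cfi"] > 0.01,
        "dRMSEA > .015": worse["rmsea"] - better["rmsea"] > 0.015,
        "dAIC > 10": worse["aic"] - better["aic"] > 10,
    }


# Hypothetical chi-square values for a model and its baseline model.
c, t, r = cfi(20, 10, 1010, 10), tli(20, 10, 1010, 10), rmsea(20, 10, 920)
print(round(c, 3), round(t, 3), round(r, 3), hu_bentler(c, t, r))
```

Returning each difference criterion separately, rather than a single verdict, leaves the judgment about how to combine them with local fit to the analyst.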
Coefficients omega-hierarchical (ωH) and omega-hierarchical subscale (ωHS) were estimated; they provide model-based estimates of the proportion of variance in a unit-weighted composite score, composed of the subtests associated with a specific factor, that is attributable to that factor (Reise, 2012; Rodriguez et al., 2016a, 2016b; Watkins, 2017). The ωH coefficient estimates the unique general intelligence factor variance with variance from the group factors removed. The ωHS coefficient estimates the unique group factor variance with variance from all other group factors and the general factor removed (Brunner et al., 2012; Reise, 2012). Omega estimates (ωH and ωHS) are calculated from CFA bifactor solutions or from decomposed variance estimates of higher order models and were obtained using the Omega program (Watkins, 2013), which is based on the work of Zinbarg et al. (2005, 2006) and the Brunner et al. (2012) tutorial. Although standards for the acceptability of omega coefficients for clinical use are not universally accepted, it has been suggested that ωH and ωHS coefficients should exceed .50, with .75 possibly preferred (Reise, 2012; Reise et al., 2013; Rodriguez et al., 2016a, 2016b). Reise et al. (2013) and Rodriguez et al. (2016a, 2016b) illustrated that indicators can be meaningfully attributed to a latent general or group factor when the majority of their unique variability is present in that factor, hence the minimum criterion of .50. The Hancock and Mueller (2001) construct reliability or construct replicability coefficient (H) supplemented the omega coefficients and estimated how adequately each latent construct was represented by its indicators, using a criterion value of .70 (Hancock & Mueller, 2001; Rodriguez et al., 2016). H coefficients were also produced by the Omega program (Watkins, 2013).
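To make these definitions concrete, the Python sketch below computes ω, ωH, ωHS, H, explained common variance (ECV), and the percentage of uncontaminated correlations (PUC) from standardized bifactor loadings, using the standard formulas summarized by Rodriguez et al. (2016a, 2016b). It is a hypothetical illustration, not the Omega program (Watkins, 2013). It assumes each subtest loads on g and on at most one group factor, and the example loadings are invented.

```python
from typing import Dict, List, Optional


def construct_h(loadings: List[float]) -> float:
    """Hancock-Mueller H for one factor from its standardized loadings."""
    if any(abs(lam) >= 1.0 for lam in loadings):
        raise ValueError("standardized loadings must be strictly between -1 and 1")
    s = sum(lam * lam / (1.0 - lam * lam) for lam in loadings)
    return s / (1.0 + s)


def bifactor_indices(g: List[float], s: List[float],
                     group_of: List[Optional[str]]) -> Dict[str, float]:
    """Model-based indices from a standardized bifactor solution.

    g[i]: loading of subtest i on g; s[i]: loading on its group factor
    (0.0 if it has none); group_of[i]: name of that group factor or None.
    """
    u2 = [1.0 - gi * gi - si * si for gi, si in zip(g, s)]  # uniquenesses
    names = sorted({k for k in group_of if k is not None})
    members = {k: [i for i, name in enumerate(group_of) if name == k] for k in names}

    g_sq = sum(g) ** 2
    grp_sq = sum(sum(s[i] for i in idx) ** 2 for idx in members.values())
    total = g_sq + grp_sq + sum(u2)  # variance of the unit-weighted total composite
    sum_g2 = sum(x * x for x in g)
    out = {
        "omega": (g_sq + grp_sq) / total,
        "omegaH": g_sq / total,
        "H_g": construct_h(g),
        "ECV": sum_g2 / (sum_g2 + sum(x * x for x in s)),
    }
    for k, idx in members.items():
        gk = sum(g[i] for i in idx) ** 2
        sk = sum(s[i] for i in idx) ** 2
        out["omegaHS_" + k] = sk / (gk + sk + sum(u2[i] for i in idx))
        out["H_" + k] = construct_h([s[i] for i in idx])
    # PUC: share of subtest pairs that do not share a group factor.
    p = len(g)
    pairs = p * (p - 1) / 2
    within = sum(len(idx) * (len(idx) - 1) / 2 for idx in members.values())
    out["PUC"] = (pairs - within) / pairs
    return out


# Invented loadings for four subtests in two groups (illustration only).
res = bifactor_indices([0.7] * 4, [0.3] * 4, ["A", "A", "B", "B"])
print({k: round(v, 3) for k, v in res.items()})
```

With these invented loadings (.70 on g and .30 on a group factor for every subtest), ωH is about .77 while each ωHS is about .11, illustrating how a strong general factor can leave little reliable unique variance for group factors. For six subtests arranged in three pairs, as in the DAS-II core battery from age 3:6, PUC is 12/15 = .80.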
Results
Lower Early Years (Age = 2:6–3:5)
Table 1 presents fit statistics for the only two models that could be tested. Both the unidimensional g model (Model 1; see Figure 1) and the oblique V and NV model (Model 2; see Figure 2) fit the standardization sample data well. Fit statistics showed no statistically significant or meaningful differences between the two models, and given the extremely high V–NV covariance (.936), the unidimensional g model was judged the best, most parsimonious representation of DAS-II measurement for this age group. Higher-order and bifactor models would be mathematically equivalent to Model 2.
Table 1. CFA Fit Statistics for DAS-II Core Subtests for the Total Standardization Samples.
Note. CFA = confirmatory factor analysis; DAS-II = Differential Ability Scales–Second Edition; S-B = Satorra–Bentler; TLI = Tucker–Lewis index; CFI = comparative fit index; RMSEA = root mean square error of approximation; CI = confidence interval; AIC = Akaike’s Information Criterion; g = general intelligence; V = Verbal; NV = Nonverbal; SP = Spatial.
Bold text reflects best and preferred model.
Factor 2 (Verbal) disturbance was linearly dependent on other parameters so EQS set disturbance variance to zero for model estimation.
Matrices and Picture Similarities subtests had negative path coefficients on the NV group factor.
Model respecified with Matrices and Picture Similarities subtests with only g paths and no NV group factor paths.
The higher order model AIC is presented in the table; the bifactor model AIC was slightly higher (39,579.54) but not meaningfully different.
EQS condition code noted the NV and g factors were linearly dependent on other parameters.
Matrices and Sequential and Quantitative Reasoning subtests had small negative path coefficients (−.01 and −.05, respectively) on NV group factor.
The higher order model AIC is presented in the table; the bifactor model AIC was slightly higher (92,042.29) but not meaningfully different from that of the higher order model, and the standardized group factor path values for the Matrices and Sequential and Quantitative Reasoning subtests were 0 and thus not statistically significant.
Bifactor model respecified with Matrices and Sequential and Quantitative Reasoning subtests having only g paths and no NV group factor paths (equivalent to removing Matrices and Sequential and Quantitative Reasoning subtests NV group factor paths in Model 5).

Figure 1. Unidimensional measurement model with standardized coefficients for the four DAS-II core subtests for ages 2:6 to 3:5, N = 352.

Figure 2. Two-factor oblique measurement model with standardized coefficients for the four DAS-II core subtests for ages 2:6 to 3:5, N = 352.
Upper Early Years (Age = 3:6–6:11)
Table 1 presents fit statistics for the models tested for the 3:6 to 6:11 age group. According to the combinatorial heuristics of Hu and Bentler (1999), Model 1 (g) was inadequate, with too low a TLI and too high an RMSEA. Model 2 (oblique V and NV) provided adequate fit to the standardization sample data, but Model 3 (oblique V, NV, SP) fit the data well and better than Model 2 (higher TLI and CFI; lower RMSEA and AIC). However, given the significant covariances among the three group factors, it was necessary to examine higher-order and bifactor representations of that model. Model 4 (higher-order with V and NV) produced adequate to good fit but contained a local fit problem: the lower-order Verbal factor disturbance was linearly dependent on other parameters and had to be fixed to zero to allow model estimation. Model 5a (bifactor with V and NV) provided better fit than Model 4 (higher TLI and CFI; lower RMSEA and AIC), but the Matrices and Picture Similarities subtests had negative standardized path coefficients on the NV group factor, so the model was respecified as Model 5b (following Dombrowski, Golay, et al., 2018), with the Matrices (MAT) and Picture Similarities (PS) subtests having standardized path coefficients only on g and none on the NV group factor. Because each group factor had only two indicators, the higher-order (see Figure 3) and bifactor (see Figure 4) representations of Model 6 (V, NV, and SP) were mathematically equivalent; both fit the standardization sample data well, and neither produced local fit problems. Accordingly, both representations of Model 6 are further explicated in Tables 2 and 3 to illustrate decomposed sources of variance and model-based reliability estimates.
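The statement that the higher-order and bifactor versions of Model 6 are mathematically equivalent can be partly checked by counting free parameters. The Python sketch below assumes one particular parameterization (factor variances fixed at 1, one marker loading per first-order factor in the higher-order model, and equality-constrained loadings for each two-indicator group factor in the bifactor model). Under those assumptions, the oblique, higher-order, and bifactor three-factor models for six subtests all have 6 degrees of freedom. Equal degrees of freedom are necessary but not sufficient for equivalence, so this is a consistency check rather than a proof; the function names are hypothetical.

```python
def n_moments(p: int) -> int:
    """Number of distinct variances and covariances among p observed subtests."""
    return p * (p + 1) // 2


def df_single_factor(p: int) -> int:
    # p loadings + p residual variances (factor variance fixed at 1)
    return n_moments(p) - 2 * p


def df_oblique(p: int, k: int) -> int:
    # p loadings + p residuals + k(k - 1)/2 factor correlations
    return n_moments(p) - (2 * p + k * (k - 1) // 2)


def df_higher_order(p: int, k: int) -> int:
    # (p - k) free first-order loadings (one marker per factor), p residuals,
    # k first-order disturbances, k second-order loadings (g variance fixed at 1)
    return n_moments(p) - ((p - k) + p + k + k)


def df_bifactor_equal_pairs(p: int, k: int) -> int:
    # p g loadings + k group loadings (one per equality-constrained pair) + p residuals
    return n_moments(p) - (p + k + p)


print("Four subtests, g model:", df_single_factor(4))
print("Four subtests, oblique V and NV:", df_oblique(4, 2))
print("Six subtests, three factors:",
      df_oblique(6, 3), df_higher_order(6, 3), df_bifactor_equal_pairs(6, 3))
print("Six subtests, bifactor with two group factors:", df_bifactor_equal_pairs(6, 2))
```

The last line corresponds to a configuration like the school-age Model 7, in which two subtests load only on g; dropping one group factor frees one degree of freedom.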

Figure 3. Higher-order measurement model with standardized coefficients for the six DAS-II core subtests for ages 3:6 to 6:11, N = 920.

Figure 4. Bifactor measurement model with standardized coefficients for the six DAS-II core subtests for ages 3:6 to 6:11, N = 920.
Table 2. Sources of Variance in the DAS-II Core Subtests for the Total Standardization Sample Ages 3:6 to 6:11 (N = 920) According to a CFA Higher-Order Model (Figure 3).
Note. DAS-II = Differential Ability Scales–Second Edition; CFA = confirmatory factor analysis; b = standardized loading of subtest on factor; S2 = variance explained in the subtest; h2 = communality; u2 = uniqueness; ECV = explained common variance; ω = omega; ωH = omega-hierarchical (general factor); ωHS = omega-hierarchical subscale (group factors); H = construct reliability or replicability index; PUC = percentage of uncontaminated correlations.
Table 3. Sources of Variance in the DAS-II Core Subtests for the Total Standardization Sample Ages 3:6 to 6:11 (N = 920) According to a CFA Bifactor Model (Figure 4).
Note. DAS-II = Differential Ability Scales–Second Edition; CFA = confirmatory factor analysis; b = standardized loading of subtest on factor; S2 = variance explained in the subtest; h2 = communality; u2 = uniqueness; ECV = explained common variance; ω = omega; ωH = omega-hierarchical (general factor); ωHS = omega-hierarchical subscale (group factors); H = construct reliability or replicability index; PUC = percentage of uncontaminated correlations.
The general intelligence dimension accounted for most of the DAS-II subtest variance, and substantially smaller portions of subtest variance were uniquely associated with the three DAS-II group factors (V, NV, SP). Omega-hierarchical and omega-hierarchical subscale coefficients estimated from the bifactor results in Table 3 showed that the ωH coefficient for general intelligence (.748) was high, indicating that 74.8% of the variance in a unit-weighted composite score based on the six subtest indicators would be attributable to the general factor. The ωHS coefficients for the three DAS-II group factors (V, NV, SP) were considerably lower, ranging from .072 (NV) to .210 (V). Thus, unit-weighted composite scores for the three DAS-II first-order factors possess too little unique true score variance to support confident clinical interpretation (Reise, 2012; Reise et al., 2013). Table 3 also presents H coefficients, which reflect correlations between the latent factors and optimally weighted composite scores (Rodriguez et al., 2016). The H coefficient for the general factor (.785) indicated that the general factor was well defined by the six DAS-II subtest indicators, but the H coefficients for the three DAS-II group factors ranged from .096 to .281, indicating that they were not adequately defined by their subtest indicators. Results were identical or nearly identical for the higher-order representation of the DAS-II (see Table 2).
Table 4. Sources of Variance in the DAS-II Core Subtests for the Total Standardization Sample Ages 7:0 to 17:11 (N = 2,188) According to a CFA Higher-Order Model (Figure 5).
Note. DAS-II = Differential Ability Scales–Second Edition; CFA = confirmatory factor analysis; b = standardized loading of subtest on factor; S2 = variance explained in the subtest; h2 = communality; u2 = uniqueness; ECV = explained common variance; ω = omega; ωH = omega-hierarchical (general factor); ωHS = omega-hierarchical subscale (group factors); H = construct reliability or replicability index; PUC = percentage of uncontaminated correlations.
School Age (Age = 7:0–17:11)
Table 1 presents fit statistics for the models tested for the 7:0 to 17:11 age group. Model 1 (g) was inadequate (too low TLI and CFI, too high RMSEA). Model 2 (oblique V and NV) provided adequate to good fit, but Model 3 (oblique V, NV, SP) fit the standardization sample data well and was superior to Models 1 and 2 (higher TLI and CFI; lower RMSEA and AIC). Because of the significant covariances among the three group factors (V, NV, SP), higher-order and bifactor models were necessary. Model 4 (higher-order with V and NV) produced good fit but contained a local fit problem: the NV and g factors were linearly dependent on other parameters. Model 5 (bifactor with V and NV) fit the standardization sample data well and was superior to Model 4 (higher TLI and CFI; lower RMSEA and AIC), but it contained local fit problems: the Matrices (MAT) and Sequential and Quantitative Reasoning (SQR) subtests had small negative path coefficients (−.01 and −.05, respectively) on the NV group factor. Because each group factor had only two indicators, the higher-order (see Figure 5) and bifactor (see Figure 6) representations of Model 6 (V, NV, and SP) were mathematically equivalent and provided good fit to the standardization sample data. The higher-order version of Model 6 contained a local fit problem, a standardized path coefficient of 1.0 between g and NV (see Figure 5), and the bifactor version of Model 6 contained a local fit problem of standardized path coefficients of 0 between the NV group factor and the MAT and SQR subtests. As a result, the NV group factor was deleted from the bifactor model, and Model 7 (see Figure 7) was estimated as a bifactor model with only the V and SP group factors, in which the MAT and SQR subtests had standardized path coefficients only with g.
Both higher-order (Figure 5) and bifactor (Figures 6 and 7) representations of Models 6 and 7 are further explicated in Table 4 (higher-order) and Table 5 (bifactor) to illustrate decomposed sources of variance and model-based validity estimates.

Higher-order measurement model with standardized coefficients, for the six DAS-II core subtests for ages 7:0 to 17:11, N = 2,188.

Bifactor measurement model with standardized coefficients, for the six DAS-II core subtests for ages 7:0 to 17:11, N = 2,188.

Final bifactor measurement model with standardized coefficients, for the six DAS-II core subtests for ages 7:0 to 17:11, N = 2,188.
Sources of Variance in the DAS-II Core Subtests for the Total Standardization Sample Ages 7:0 to 17:11 (N = 2,188) According to a CFA Bifactor Model (Figure 6).
Note. DAS-II = Differential Ability Scales–Second Edition; CFA = confirmatory factor analysis; b = standardized loading of subtest on factor; S2 = variance explained in the subtest; h2 = communality; u2 = uniqueness; ECV = explained common variance; ω = omega; ωH = omega-hierarchical (general factor); ωHS = omega-hierarchical subscale (group factors); H = construct reliability or replicability index; PUC = percentage of uncontaminated correlations.
In both the higher-order (Figure 5) and bifactor (Figure 6) representations of Model 6, the general intelligence dimension accounted for most of the DAS-II subtest variance, and substantially smaller portions of subtest variance were uniquely associated with the three DAS-II group factors (V, NV, SP). Omega-hierarchical and omega-hierarchical subscale coefficients estimated from the bifactor results in Table 5 showed that the ωH coefficient for general intelligence (.834) was high, indicating that 83.4% of the variance in a unit-weighted composite score based on the six subtest indicators would be true score variance attributable to g. The ωHS coefficients for the three DAS-II group factors (V, NV, SP) were considerably lower, ranging from .000 (NV) to .268 (V). Thus, unit-weighted composite scores for the three DAS-II first-order factors possess too little unique true score variance to recommend clinical interpretation (Reise, 2012; Reise et al., 2013). Table 5 also presents H coefficients, which reflect correlations between the latent factors and optimally weighted composite scores (Rodriguez et al., 2016). The H coefficient for the general factor (.871) indicated that the general factor was well defined by the six DAS-II subtest indicators and essentially unidimensional, but the H coefficients for the three group factors ranged from .000 to .365, indicating that these factors were not adequately defined by their subtest indicators. Results were nearly identical for the higher-order representation of the DAS-II (see Table 4).
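The indices in Tables 4 and 5 are simple functions of the standardized loadings. The sketch below computes ECV, ωH, ωHS, and H using the standard definitions summarized by Rodriguez et al. (2016). It assumes an orthogonal bifactor model, standardized subtests, and each subtest loading on exactly one group factor. The loadings are hypothetical, with NV set to zero to mirror the pattern described above; they are not the Table 5 estimates.

```python
import numpy as np

def h_index(loadings):
    """Construct replicability H (Hancock & Mueller, 2001): S / (1 + S),
    where S = sum(l^2 / (1 - l^2)). Returns 0 when every loading is 0."""
    l2 = np.asarray(loadings, dtype=float) ** 2
    s = float((l2 / (1.0 - l2)).sum())
    return s / (1.0 + s)

def bifactor_indices(gen, grp, groups):
    """ECV, omegaH, omegaHS, and H from a standardized orthogonal bifactor solution.

    gen    : loading of each subtest on g
    grp    : loading of each subtest on its (single) group factor
    groups : group-factor label of each subtest
    """
    gen = np.asarray(gen, dtype=float)
    grp = np.asarray(grp, dtype=float)
    groups = np.asarray(groups).astype(str)
    uniq = 1.0 - gen**2 - grp**2                  # u2 = 1 - h2 for standardized subtests
    labels = list(dict.fromkeys(groups.tolist()))  # group labels in first-seen order

    # variance of the unit-weighted total composite: g + each group factor + uniqueness
    total_var = (gen.sum() ** 2
                 + sum(grp[groups == k].sum() ** 2 for k in labels)
                 + uniq.sum())
    out = {
        "ECV": (gen**2).sum() / ((gen**2).sum() + (grp**2).sum()),
        "omegaH": gen.sum() ** 2 / total_var,
        "H_g": h_index(gen),
    }
    for k in labels:
        m = groups == k
        sub_var = gen[m].sum() ** 2 + grp[m].sum() ** 2 + uniq[m].sum()
        # share of the subscale composite due to its own group factor, net of g
        out["omegaHS_" + k] = grp[m].sum() ** 2 / sub_var
        out["H_" + k] = h_index(grp[m])
    return out

# Hypothetical loadings for six subtests (two per group factor), NV set to zero
gen = [0.75, 0.72, 0.78, 0.76, 0.70, 0.68]
grp = [0.40, 0.38, 0.00, 0.00, 0.30, 0.28]
groups = ["V", "V", "NV", "NV", "SP", "SP"]

indices = bifactor_indices(gen, grp, groups)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

A group factor with no unique loadings yields ωHS = 0 and H = 0, which is the numerical signature of an absent factor such as NV at ages 7:0 to 17:11.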
Discussion
The present study provided an independent analysis of the factor structure of the DAS-II core subtests with the three age groups in the standardization sample using best practice CFA methods. The DAS-II Introductory and Technical Handbook (Elliott, 2007b) does not report the CFA procedures and analyses necessary to adequately support reported construct validity. Lack of disclosure of univariate and multivariate nonnormality among DAS-II core subtests in the standardization sample and apparent lack of robust model estimation in CFA reported in the DAS-II Introductory and Technical Handbook resulted in misestimation of model fit statistics and parameter estimates. Furthermore, the lack of reporting portions of variance captured by the various dimensions prohibits users of the DAS-II from determining which scores contain sufficient unique true score variance necessary for individual decision making. The present study attempted to overcome these shortcomings using the standardization sample raw data provided by NCS Pearson, Inc. for independent assessment.
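The point about multivariate nonnormality can be made concrete with Mardia's multivariate kurtosis, one common check that motivates robust estimation. The sketch below uses simulated data, not the DAS-II standardization data. It uses the asymptotic normal-theory z statistic and is not necessarily the procedure used in the present study.

```python
import numpy as np

def mardia_kurtosis(X):
    """Mardia's multivariate kurtosis b2,p with an asymptotic z statistic.

    Under multivariate normality b2,p is close to p(p + 2); a large positive z
    indicates heavier-than-normal multivariate tails.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                                        # ML covariance (divide by n)
    d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S), Xc)  # squared Mahalanobis distances
    b2p = float(np.mean(d2**2))
    z = (b2p - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    return b2p, z

rng = np.random.default_rng(0)
normal_scores = rng.standard_normal((20_000, 3))
heavy_tailed = rng.standard_t(df=5, size=(20_000, 3))  # hypothetical nonnormal scores

print(mardia_kurtosis(normal_scores))  # b2,p near 15 = 3 * (3 + 2)
print(mardia_kurtosis(heavy_tailed))   # b2,p and z well above normal-theory values
```

When z is large, normal-theory maximum likelihood standard errors and chi-square statistics are suspect, which is why robust (scaled) estimation is recommended.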
Results of the present study closely paralleled the EFA results for the DAS-II core subtests reported by Canivez and McGill (2016), which indicated that the DAS-II core subtests measured general intelligence well and that, although the subtests generally had associations with theoretically linked first-order factors (V, NV, SP), the unique contributions of true score variance in the first-order group factors were universally low, prohibiting confident individual clinical interpretation. The present CFA results for the three DAS-II age groups (lower early years [2:6–3:5 years], upper early years [3:6–6:11 years], school age [7:0–17:11 years]) showed that although most subtests were generally aligned with their theoretical first-order group factors (V, NV, SP), most of the reliable subtest variance was associated with an overall general intelligence factor (g), regardless of model expression (higher-order vs. bifactor). The dominance of the general intelligence factor and the limited unique measurement of the three group factors are evident in the apportionment of subtest variance: the general factor accounted for at least 6.84 (3:6–6:11 years) and 6.88 (7:0–17:11 years) times as much common subtest variance as any individual DAS-II group factor, and about 3 (3:6–6:11 years) and about 4.7 (7:0–17:11 years) times as much common subtest variance as all three DAS-II group factors combined. Similar results were reported by Cucina and Howardson (2017) with the original DAS (Elliott, 1990).
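These ratios follow directly from summed squared loadings. The sketch below uses hypothetical loadings, not the DAS-II estimates. It shows the two comparisons: g against the largest single group factor, and g against all group factors combined. The second ratio equals ECV / (1 − ECV), so, for instance, an ECV of .75 implies a 3:1 ratio.

```python
def variance_ratios(gen, grp, groups):
    """Ratio of g common variance to (a) the largest group factor and (b) all group factors."""
    g_var = sum(l**2 for l in gen)
    group_var = {}
    for l, k in zip(grp, groups):
        group_var[k] = group_var.get(k, 0.0) + l**2
    return g_var / max(group_var.values()), g_var / sum(group_var.values())

def ecv_to_ratio(ecv):
    """g-to-all-group-factors ratio implied by an explained common variance value."""
    return ecv / (1.0 - ecv)

# Hypothetical standardized bifactor loadings (not the values in Tables 4 and 5)
gen = [0.70] * 6
grp = [0.30, 0.30, 0.20, 0.20, 0.10, 0.10]
groups = ["V", "V", "NV", "NV", "SP", "SP"]

vs_largest, vs_all = variance_ratios(gen, grp, groups)
print(round(vs_largest, 2), round(vs_all, 2))
print(ecv_to_ratio(0.75))  # 3.0
```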
The omega coefficients (ωH and ωHS) and construct reliability or construct replicability coefficients (H) from the bifactor (and higher-order) CFA results indicated that although the broad g factor allows for confident individual interpretation of the GCA, the ωHS and H estimates for the three DAS-II group factors were unacceptably low (see Tables 3–5). The group factors are thus extremely limited for measuring the unique cognitive constructs (Brunner et al., 2012; Hancock & Mueller, 2001; Reise, 2012; Rodriguez et al., 2016) supposedly quantified by the DAS-II cluster scores. Most disconcerting is the observation that for ages 7:0 to 17:11, the NV factor appears completely absent (a result suggested by Dombrowski, Golay, et al., 2018), yet an NV cluster score is provided for interpretation by the publisher. Such results indicate that “to interpret subscale scores as representing the precise measurement of some latent variable that is unique or different from the general factor, clearly, is misguided” (Rodriguez et al., 2016, p. 225). Had variance apportionments been reported in the DAS-II Introductory and Technical Handbook, this problem would have been disclosed, and users of the DAS-II would have been better able to decide whether there was little to nothing to report beyond the GCA.
The present results, like those reported by Cucina and Howardson (2017) with the original DAS, challenge the CHC-inspired interpretive model preferred by the test publisher (see also Canivez & Youngstrom, 2019), in that the portions of unique variance conveyed by the broad ability clusters (V, NV, SP) are quite small and thus likely to be of little consequence, whereas the variance contributed by g is quite large and of primary importance. Thus, these results appear to provide ample support for Carroll’s conceptualization of the structure of intelligence but not for those of Cattell and Horn or McGrew, who have de-emphasized psychometric g and focused on the group factors (Horn & Blankson, 2005; Horn & Noll, 1997; McGrew, 2018). An additional theoretical implication is a preference for the bifactor model when attempting to estimate or account for domain-specific abilities (Murray & Johnson, 2013), something done explicitly in DAS-II interpretations of V, NV, and SP scores and their comparisons. Users of the DAS-II must consider the empirical evidence of how well the group factor cluster scores (domain-specific) uniquely measure their represented constructs independent of the general intelligence (g) factor (GCA) score (Chen et al., 2006, 2012). Bifactor models contain a general factor but permit multidimensionality, which some consider an advantage relative to the higher-order model for determining the group factor contributions independent of the general intelligence factor (Reise et al., 2010).
Reynolds and Keith (2013) have questioned the appropriateness of the bifactor model, stating that “we believe that higher-order models are theoretically more defensible, more consistent with relevant intelligence theory (e.g., Jensen, 1998), than are less constrained hierarchical [bifactor] models” (p. 66). However, in comparing bifactor and higher-order models, Gignac (2006, 2008) argued that general intelligence is the most substantive factor of a battery of cognitive tests, so g should be modeled directly, and that it is the higher-order model that requires explicit theoretical justification for the full mediation of general intelligence by the group factors. Carroll (1993, 1995) empirically illustrated that variation in subtest scores reflects both general and more specific group factor variance. Thus, although subtest scores may appear reliable, in most cases that reliability is primarily due to the influence of the general factor and not the specific group factor (Carretta & Ree, 2001). Others have argued that Spearman’s (1927) and Carroll’s (1993) conceptualizations of intelligence are better represented by the bifactor model (Beaujean, 2015; Brunner et al., 2012; Frisby & Beaujean, 2015; Gignac, 2006, 2008; Gignac & Watkins, 2013; Gustafsson & Balke, 1993). For example, Beaujean (2015) suggested that Spearman’s conception of general intelligence was of a factor “that was directly involved in all cognitive performances, not indirectly involved through, or mediated by, other factors” (p. 130) and also noted that “Carroll was explicit in noting that a bi-factor model best represents his theory” (p. 130).
The question of whether the general factor of intelligence actually represents a legitimate psychological dimension continues to be adjudicated, and some respected intelligence scholars contend that g is nothing more than a statistical artifact. Recently, Kovacs and Conway (2016, 2019a, 2019b) presented their process overlap theory (POT), which attempts to merge psychometric aspects of intelligence with cognitive psychology and neuroscience. They suggest that g is an emergent property (not the cause) of domain-general executive functions. Their effort was intended to provide a unified theory of general intelligence, but Gottfredson (2016) pointed out a number of misconceptions and misattributions of g theory, noting that “the g theory they portray is not the one to which g theorists actually subscribe” (p. 210). Gottfredson welcomed the attempt to merge the disparate fields but illustrated how, at different levels of analysis, Kovacs and Conway’s account is consistent with, rather than contrary to, g theory.
Even so, the substantially greater total and common variance associated with general intelligence among DAS-II core subtests has been observed in numerous other studies examining the latent factor structure of intelligence or cognitive ability tests using both EFA and CFA procedures (Bodin et al., 2009; Canivez, 2008, 2014; Canivez & Watkins, 2010a, 2010b; Canivez et al., 2009, 2016, 2017; DiStefano & Dombrowski, 2006; Dombrowski, 2013, 2014a, 2014b; Dombrowski & Watkins, 2013; Dombrowski et al., 2009; Gignac & Watkins, 2013; Nelson & Canivez, 2012; Nelson et al., 2007, 2013; Watkins, 2006, 2010; Watkins & Beaujean, 2014; Watkins et al., 2006, 2013). These results continue to support the dominance of psychometric g and are consistent with the literature regarding the practical importance of general intelligence (Deary, 2013; Gottfredson, 2008; Jensen, 1998; Lubinski, 2000; Ree et al., 2003). Among highly gifted (precocious) individuals, spatial abilities and intraindividual differences (higher verbal or higher quantitative abilities) appear to have additional effects on excelling in the humanities or in science, technology, engineering, and math (STEM) domains (Kell, Lubinski, & Benbow, 2013; Kell, Lubinski, Benbow, & Steiger, 2013; Lubinski, 2016; Makel et al., 2016), such that g accounts for less variance in these circumstances; even so, g still typically accounts for the most variance. This phenomenon is described by Spearman’s law of diminishing returns, which Reynolds et al. (2011) examined specifically in the DAS-II. They found less g variance in most subtests for the high-ability group than for the low-ability group, but g variance still exceeded broad ability variance for most DAS-II subtests.
As such, the principal interpretation of the DAS-II core subtests should be at the level of the GCA, the estimate of g, although other factors might be of value for intellectually gifted individuals. The dominance of g variance captured by the DAS-II core subtests is a likely reason that methods for determining how many factors to extract and retain in EFA, such as parallel analysis and minimum average partials, suggest the DAS-II might be sufficiently represented by only one factor. It is also a likely reason for the inability to locate the posited NV factor consistently in the present results (Crawford et al., 2010).
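Parallel analysis retains only those factors whose observed eigenvalues exceed eigenvalues obtained from random data of the same size. The compact NumPy sketch below runs it on simulated one-factor data with six indicators. The loadings, the sample size, the 95th-percentile rule, and the use of unreduced correlation-matrix eigenvalues are illustrative assumptions, since implementations differ (for example, mean vs. percentile criteria). Minimum average partials is not shown.

```python
import numpy as np

def parallel_analysis(X, n_iter=100, quantile=95, seed=0):
    """Horn's parallel analysis using eigenvalues of the correlation matrix.

    Retains the leading factors whose observed eigenvalue exceeds the chosen
    percentile of eigenvalues from random normal data of the same size.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]  # descending
    rng = np.random.default_rng(seed)
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        R = np.corrcoef(rng.standard_normal((n, p)), rowvar=False)
        random_eigs[i] = np.linalg.eigvalsh(R)[::-1]
    threshold = np.percentile(random_eigs, quantile, axis=0)
    exceeds = observed > threshold
    # count leading eigenvalues above threshold, stopping at the first one that is not
    return p if exceeds.all() else int(np.argmin(exceeds))

# Hypothetical data: six standardized indicators of a single factor (loadings .80)
rng = np.random.default_rng(1)
n, loadings = 2000, np.full(6, 0.8)
g = rng.standard_normal((n, 1))
scores = g * loadings + rng.standard_normal((n, 6)) * np.sqrt(1 - loadings**2)

print(parallel_analysis(scores, n_iter=50))
```

When one dominant factor drives the correlations, every eigenvalue after the first falls below the random-data threshold, so the procedure recommends a single factor.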
Relatedly, the confidence intervals provided for the DAS-II factor scores are considerably narrower (because of conflated general intelligence variance) than they would be if only the unique true score variance of the factor scores were used. The poor incremental validity of intelligence test group factors in accounting for meaningful portions of achievement variance beyond that provided by the omnibus composite IQ score in many contemporary intelligence tests (e.g., Canivez, 2013; Canivez et al., 2014; Glutting et al., 2006; McGill, 2015) may be the result of the small amounts of unique variance captured by first-order factors, as observed in the present study. In assessing the incremental validity of DAS factor scores as predictors of achievement beyond the GCA, Youngstrom et al. (1999) found that interpretation of broad factor scores was not supported. Although the incremental validity of DAS-II cluster scores above and beyond the GCA does not yet appear to have been investigated, given the current results it is hard to imagine that these specific group factors would provide useful incremental information when predicting academic achievement or other external criteria.
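Incremental validity is usually tested with hierarchical regression. The global composite is entered first, then the cluster score, and the change in R² is examined. The sketch below uses simulated data in which achievement depends only on g. All variable names and coefficients are hypothetical. It is neither a reanalysis of DAS-II data nor a reproduction of Youngstrom et al. (1999).

```python
import numpy as np

def r_squared(y, *predictors):
    """OLS R^2 of y on the given predictors (with an intercept)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(2)
n = 5_000
g = rng.standard_normal(n)
specific = rng.standard_normal(n)                      # hypothetical cluster-specific ability
gca = g + 0.3 * rng.standard_normal(n)                 # hypothetical global composite
cluster = 0.8 * g + 0.4 * specific + 0.45 * rng.standard_normal(n)
achievement = 0.6 * g + 0.8 * rng.standard_normal(n)  # depends on g only in this simulation

r2_gca = r_squared(achievement, gca)
r2_full = r_squared(achievement, gca, cluster)
print(f"R2 (GCA) = {r2_gca:.3f}, delta R2 (cluster) = {r2_full - r2_gca:.3f}")
```

Because the simulated cluster score is largely another indicator of g, it adds almost nothing beyond the composite. This is the pattern expected when group factors carry little unique variance.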
Another problem for DAS-II interpretation is the recommended practice of identifying factor-based cognitive strengths and weaknesses through ipsative comparisons, because differences between DAS-II factor scores at the observed score level conflate g variance and specific group factor (Verbal, Nonverbal, Spatial) variance. The same is true of analyses of subtest-based processing strengths and weaknesses (PSWs). Because these sources of variance cannot be disaggregated for individuals, it is impossible to know how much of the variance in performance is due to the general factor, the specific group factor, or the narrow subtest ability. These concerns are in addition to the long-standing problems identified for ipsative score comparisons (McDermott et al., 1990, 1992; McDermott & Glutting, 1997) and suggest that these interpretive practices should probably be eschewed. In addition, neither the longitudinal stability of such PSWs (see Watkins & Canivez, 2004) nor the diagnostic and treatment utility of DAS-II PSWs in particular has yet been demonstrated. Although these types of profile analysis methods remain popular in clinical practice, compelling empirical support for their validity is presently lacking (e.g., Glutting et al., 2003; Macmann & Barnett, 1997; McDermott et al., 1990, 1992; McDermott & Glutting, 1997; McGill et al., 2018; Miciak et al., 2014; Watkins, 2000; Watkins et al., 2007).
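A population-level decomposition shows why an observed cluster difference mixes sources of variance. Under an orthogonal bifactor model with standardized subtests, the variance of the difference between two unit-weighted cluster composites splits into three parts: g, the two group factors, and unique (specific plus error) variance. The sketch below uses hypothetical loadings, not DAS-II estimates. These model-based shares describe the population and, as noted above, cannot be recovered for an individual examinee.

```python
import numpy as np

def difference_score_variance(gen_a, grp_a, gen_b, grp_b):
    """Shares of Var(A - B) for two unit-weighted cluster composites built from
    disjoint sets of standardized subtests under an orthogonal bifactor model."""
    gen_a, grp_a, gen_b, grp_b = (np.asarray(v, dtype=float)
                                  for v in (gen_a, grp_a, gen_b, grp_b))
    g_part = (gen_a.sum() - gen_b.sum()) ** 2          # g cancels only if summed g loadings match
    group_part = grp_a.sum() ** 2 + grp_b.sum() ** 2   # each cluster's own group factor
    unique_part = ((1 - gen_a**2 - grp_a**2).sum()
                   + (1 - gen_b**2 - grp_b**2).sum())  # subtest specificity plus error
    total = g_part + group_part + unique_part
    return {"g": g_part / total, "group": group_part / total, "unique": unique_part / total}

# Hypothetical loadings for a two-subtest Verbal and a two-subtest Spatial cluster
shares = difference_score_variance([0.75, 0.72], [0.40, 0.38], [0.60, 0.58], [0.30, 0.28])
print({k: round(v, 3) for k, v in shares.items()})
```

With modest group loadings, most of the variance in a cluster difference is unique variance, and the part due to g depends on how unequal the clusters' g loadings are.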
Finally, it should be noted that these results are not unique to the DAS-II. Indeed, a host of independent CFA and EFA studies of other major tests of intelligence such as the Wechsler Intelligence Scale for Children–Fourth Edition (WISC-IV; Bodin et al., 2009; Canivez, 2014; Keith, 2005; Watkins, 2006, 2010; Watkins et al., 2006), Wechsler Intelligence Scale for Children–Fifth Edition (WISC-V; Canivez et al., 2016, 2017, 2020; Dombrowski, Canivez, & Watkins, 2017), Wechsler Adult Intelligence Scale–Fourth Edition (WAIS-IV; Canivez & Watkins, 2010a, 2010b; Nelson et al., 2013), Wechsler Preschool and Primary Scale of Intelligence–Fourth Edition (WPPSI-IV; Watkins & Beaujean, 2014), Woodcock-Johnson–Third Edition (WJ III; Cucina & Howardson, 2017; Dombrowski, 2013, 2014a, 2014b; Dombrowski & Watkins, 2013; Strickland et al., 2015), Woodcock-Johnson–Fourth Edition (WJ IV; Dombrowski, McGill, & Canivez, 2017, 2018a, 2018b), Stanford–Binet Intelligence Scale: Fifth Edition (SB-5; Canivez, 2008; DiStefano & Dombrowski, 2006), Kaufman Assessment Battery for Children (KABC; Cucina & Howardson, 2017), Kaufman Assessment Battery for Children–Second Edition (KABC-2; McGill & Dombrowski, 2018), Kaufman Adolescent and Adult Intelligence Test (KAIT; Cucina & Howardson, 2017), and Reynolds Intellectual Assessment Scales (RIAS; Dombrowski et al., 2009; Nelson & Canivez, 2012; Nelson et al., 2007) have reached similar conclusions about what commercial ability tests measure. We encourage practitioners to consider these results along with the psychometric meta-analysis conducted by Dombrowski, McGill, and Morgan (2019) when making decisions about how these measures should be interpreted and utilized in clinical practice.
Limitations
The results of the present study pertain only to the latent factor structure of the DAS-II core subtests and do not fully test all aspects of construct validity. In fact, as emphasized by Bonifay et al. (2017), bifactor (and other) structures must be examined for adequacy against external criteria in theoretical validation. Latent profile analysis might be useful for determining whether the DAS-II is able to identify various diagnostic groups that might be expected to differ from normative samples. As previously mentioned, studies examining relations of DAS-II scores with external criteria, such as examinations of incremental predictive validity (Youngstrom et al., 1999) or of the effects of extreme cluster score variability on DAS-II prediction of academic achievement (Kotz et al., 2008), should also be conducted. In addition to observed scores, DAS-II latent factor scores could also be examined for contributions to the explanation of academic achievement (see Glutting et al., 2006; Kranzler et al., 2015). Diagnostic utility of DAS-II cluster scores should also be examined to determine whether they offer utility for correct classification of individuals within specific groups or for differential treatment response (see Canivez, 2013b).
Conclusion
The present CFA results reinforce the admonition of extreme caution for any interpretation of DAS-II scores beyond the GCA (Canivez & McGill, 2016; Dombrowski, Golay, et al., 2018, 2019), including PSW assessments. Because of the very small portions of unique true score variance provided by cluster scores and the inability to locate the NV factor consistently across the age span of the test, such scores and their comparisons are potentially misleading. Better measurement of the posited DAS-II first-order dimensions as distinct from g will likely require the creation and inclusion of more or better indicators, as has been suggested for other general intelligence tests (Canivez et al., 2016, 2017; Dombrowski, McGill, & Canivez, 2017, 2018a, 2018b). These results, together with the advantages of bifactor modeling in aiding our understanding of test structure (Canivez, 2016; Cucina & Byle, 2017; Gignac, 2008; Reise, 2012), indicate that comparisons of bifactor and higher-order representations are likely needed to fully understand what cognitive tests such as the DAS-II measure. Given that “the ultimate responsibility for appropriate test use and interpretation lies predominantly with the test user” (American Educational Research Association et al., 2014, p. 141), consideration of the present results and other independent DAS-II studies allows users to “know what their tests can do and act accordingly” (Weiner, 1989, p. 829).
Supplemental Material
Supplemental material, Supplemental_Material for Factor Structure of the Differential Ability Scales–Second Edition Core Subtests: Standardization Sample Confirmatory Factor Analyses by Gary L. Canivez, Ryan J. McGill and Stefan C. Dombrowski in Journal of Psychoeducational Assessment
Footnotes
Authors’ Note
Preliminary results were presented at the 2017 Annual Convention of the American Psychological Association, Washington, D.C. Standardization data from the Differential Ability Scales–Second Edition (DAS-II). Copyright© 1998, 2000, 2004, 2007 NCS Pearson, Inc. and Colin D. Elliott. Normative data copyright© 2007 NCS Pearson, Inc. Used with permission. All rights reserved.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Notes
Supplemental material
Supplemental material for this article is available online.
References
