Abstract
With an age range from 3 to 18 years, the Kaufman Assessment Battery for Children–Second Edition (KABC-II) offers an appealing option for assessing cognitive abilities in children. Although independent research has provided evidence of the construct validity of the KABC-II for school-age children, previous studies have rarely examined preschool-age children. This study used confirmatory factor analysis to investigate the structure of the KABC-II in preschool children in relation to the Cattell-Horn-Carroll (CHC) theory of intelligence. By examining competing models based on CHC theory, this study found that the constructs measured by the KABC-II subtests matched those specified in the manual for children ages 4 and 5. In addition, the results suggest that fluid reasoning (Gf) and visual processing (Gv) are distinct broad abilities for this age group. These findings support the presence of differentiated cognitive abilities in young children.
From yearly high-stakes testing to weekly curriculum-based measures, testing has become a pervasive part of schooling (Brown-Chidsey, 2005). Although test results can inform decisions ranging from teachers' salaries to grade retention to reading-group placement, a primary use of individual testing is to determine eligibility for special education. Increasingly, schools are also recognizing the importance of identifying children with special needs during the preschool years, before formal schooling begins (Ford & Dahinten, 2005).
As part of a comprehensive battery, a measure of intelligence (or cognitive abilities) is an essential element of the early identification process (Keith, 2015, chap. 15). With an age range from 3 to 18, the Kaufman Assessment Battery for Children–Second Edition (KABC-II; Kaufman & Kaufman, 2004) is an appealing candidate for measuring cognitive abilities in this preschool population.
The KABC-II allows for interpretation through the Cattell-Horn-Carroll (CHC) theory of intelligence (Kaufman & Kaufman, 2004). CHC theory is hierarchical: like Carroll's (1993) three-stratum model, it posits an overall measure of ability (g) at the top, a middle level of broad abilities, and a lowest level of narrow abilities (Schneider & McGrew, 2012).
Although independent studies have investigated both the stratum structure and the broad abilities measured by the KABC-II in older children (Reynolds, Keith, Fine, Fisher, & Low, 2007), the only study to examine the KABC-II with preschool children focused on the degree of differentiation of cognitive abilities. Comparing a CHC three-stratum model with a two-stratum model (one that includes narrow and broad abilities but no overall measure of ability, g), Morgan, Rothlisberg, McIntosh, and Hunt (2009) found that a model without an overarching g may provide a better fit for young children.
In contrast, the KABC-II manual presented confirmatory factor analytic (CFA) evidence supporting a CHC-based structure for the test, including multiple broad abilities, across all ages (Kaufman & Kaufman, 2004). The manual sets forth a model for ages 4 through 6 that includes an overarching g factor and four broad abilities: crystallized intelligence (Gc), visual processing (Gv), long-term retrieval (Glr), and short-term memory (Gsm). The biggest difference in suggested structure and interpretation for younger versus older children is that there is no separate fluid/novel reasoning (Gf) factor for younger children. Instead, tasks that appear to measure Gf in older children are classified as measuring Gv in younger children, a not-uncommon finding with individual cognitive batteries (Flanagan, Mascolo, & Genshaft, 2004). It is unclear whether this finding is a result of less differentiation in intelligence for younger children or simply a difficulty measuring Gf in younger children (Carroll, 1993, chap. 17).
Analytic results presented in the manual also suggest that the Gc and Gv factors (and the corresponding scales) might be combined. The publishers justified separating Gc and Gv on the basis of the distinct content of the subtests that load on each factor, but they noted the high correlation between these factors. This strong correlation is consistent with findings from other studies (Blaga et al., 2009; Tideman & Gustafsson, 2004; Ward, Rothlisberg, McIntosh, & Bradley, 2011). Given evidence that intelligence tests for young children generally identify fewer factors than are found with older children and adults (Blaga et al., 2009; Tideman & Gustafsson, 2004; Ward et al., 2011), these results raise the question of whether Gf, Gv, and Gc actually represent distinct factors within the KABC-II battery (Kaufman & Kaufman, 2004).
The manual noted two additional questions about the factor structure for this age range (Kaufman & Kaufman, 2004). Although designed to measure Gv, the Conceptual Thinking subtest also showed strong loadings on Gc, raising the question of whether it should be allowed to load on both factors (see Table 1 for a description of each subtest). In addition, the Face Recognition subtest showed a low factor loading (.27) on Gv. Face Recognition follows a format similar to the facial recognition subtests of the NEPSY-II (Korkman, Kirk, & Kemp, 2007) and TOMAL-II (Reynolds & Voress, 2007), both of which treat these tasks as measures of short-term visual memory. The memory demands of this subtest, combined with its weak loading on Gv, suggest that Face Recognition might load on both Gv and Gsm.
Table 1. Kaufman Assessment Battery for Children–Second Edition Subtest Descriptions.
Note. Gc = crystallized intelligence; Gv = visual processing; Glr = long-term retrieval.
Beyond the issues raised by the manual, an earlier investigation of the factor structure for ages 6 to 18 found that minor alterations significantly improved model fit (Reynolds et al., 2007). Because that study also found evidence that the variance/covariance matrices were invariant across age groups, similar alterations may also improve model fit for ages 4 to 5. More specifically, Reynolds and colleagues (2007) suggested that allowing Gestalt Closure to load on both Gv and Gc, and allowing Hand Movements to load on both Gf and Gsm, may significantly improve model fit. That study also showed that Conceptual Thinking, along with Pattern Reasoning, likely measures Gf in younger children (Reynolds et al., 2007).
The present investigation is designed to address these questions and to clarify the nature of the constructs measured by the KABC-II for children in the 4- to 5-year-old age range. In particular, this research will address the question of whether younger children show less differentiation in intelligence. Building on the initial studies presented in the manual, this research will use CFA to compare the factor structure of the KABC-II with competing models and, ultimately, to add to knowledge of the construct validity of the KABC-II.
Method
Instrument
The KABC-II is an individually administered test designed to measure processing and cognitive abilities in children ages 3 years 0 months (3:0) through 18 years 11 months (18:11). Subtests and items administered are determined by each child’s age and ability level; subtests administered to 4- and 5-year-olds are described briefly in Table 1. The test, its components, and its interpretation are described in detail in the KABC-II manual (Kaufman & Kaufman, 2004). An illustration of the model supported by the test’s authors and used as a scoring structure is provided in Figure 1.

Figure 1. Baseline higher order model of the structure of the KABC-II for 4- and 5-year-old children.
Participants
To focus on preschool-age children, the sample for the present study included all 4- and 5-year-olds from the KABC-II standardization sample (Kaufman & Kaufman, 2004). Three-year-olds were excluded because the scoring structure for that age does not support the three-stratum model suggested for 4- and 5-year-olds. According to the KABC-II manual, the standardization sample was representative of U.S. children with respect to sex, racial/ethnic group, parental education, geographic region, and educational placement. Sample characteristics are shown in Table 2.
Table 2. Demographic Characteristics of the Standardization Sample for Ages 4:0-5:11.
Models and Analysis
All questions were tested via CFA by comparing competing models. Data were prepared in IBM SPSS (Version 21), and models were analyzed in IBM Amos (Version 21; Arbuckle, 2013). Maximum likelihood estimation was used for all analyses, and full information maximum likelihood was used to handle the small amount of missing data (less than 1%, on a single variable).
Model fit was assessed and compared using multiple indices. The root mean square error of approximation (RMSEA) and the comparative fit index (CFI) were used to assess the fit of individual models; RMSEA values below .05 (Browne & Cudeck, 1993) and CFI values above .95 (Hu & Bentler, 1999) were taken to indicate good fit. Chi-square (χ²) and its degrees of freedom were calculated for each model, and for nested models a statistically significant change in chi-square (Δχ²) was taken to indicate a significant change in model fit. For nonnested models, the Akaike Information Criterion (AIC), supplemented by the sample-size adjusted Bayes Information Criterion (aBIC), was used to compare models, with smaller AIC and aBIC values indicating better fit (Keith, 2015). For samples larger than about 175 cases, the aBIC includes a larger parsimony adjustment than the AIC.
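A minimal sketch of the single-model indices and information criteria named above, assuming the common point-estimate formulas: RMSEA with N − 1 in the denominator, CFI relative to the independence (null) model, and χ²-based versions of the AIC (penalty of 2 per free parameter) and of Sclove's sample-size adjusted BIC (penalty of ln((N + 2)/24) per free parameter). Programs differ in such details (for example, AIC computed from χ² versus from −2 log-likelihood; the two forms differ by a constant that cancels when models fitted to the same data are compared), so this is not a reproduction of Amos output. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA for a single-group model fitted to n cases."""
    # Misfit in excess of its expected value (df), per degree of freedom and per case.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2: float, df: int, chi2_null: float, df_null: int) -> float:
    """Comparative fit index, relative to the independence (null) model."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model)  # never smaller than the model's misfit
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null


def aic(chi2: float, n_params: int) -> float:
    """Chi-square-based AIC: a penalty of 2 per free parameter."""
    return chi2 + 2 * n_params


def abic(chi2: float, n_params: int, n: int) -> float:
    """Chi-square-based sample-size adjusted BIC: ln((n + 2) / 24) per free parameter."""
    return chi2 + n_params * math.log((n + 2) / 24)


# Hypothetical values, for illustration only (not results from this study).
print(round(rmsea(150.0, 100, 201), 3))        # 0.05
print(round(cfi(150.0, 100, 1100.0, 120), 3))  # 0.949
```

Smaller AIC or aBIC favors a model; because the aBIC penalty per parameter grows with N, the two criteria can disagree when one model has more free parameters.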
Initial model and modifications
The model supported in the test manual (Figure 1) served as the baseline model, and it was compared with similar models with adjustments and constraints imposed to address each hypothesis. Details regarding competing models are included in subsequent sections.
Stratum differentiation: Two-stratum models
CFA was used to compare competing two-stratum models with the three-stratum manual model.
Age differentiation: Does the KABC-II measure Gf for young children?
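The KABC-II structure for young children combines Gf and Gv tests into a single Gv factor. It is difficult to obtain a separate Gf factor when there is only one measure of Gf for an age group (as happens with 4-year-olds on the KABC-II), because a factor with a single indicator cannot be distinguished from that indicator's unique variance without additional constraints.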
A reference-variable approach can be useful for testing the invariance of factor structures when the tests administered differ across groups, including when a test is given at one age but not another. With this approach, a subtest not given at one age is treated as a latent variable for that age and as a measured variable for the ages at which it is administered. Equality constraints (on factor loadings, error variances, and, if necessary, other parameters) hold the relevant parameters equal across ages (for more detail, see Keith & Reynolds, 2012; for an example, see Reynolds, Keith, Flanagan, & Alfonso, 2013). For the reference-variable analyses, we specified two standardized models, one for each age group, and estimated them simultaneously with equality constraints across groups. This method allowed us to include several subtests administered to the 5-year-olds but not to the 4-year-olds. The first model, a first-order factor model for the 5-year-old group, included all subtests administered to that group, with Conceptual Thinking and Pattern Reasoning loading on a Gf factor, as they would for older children (Reynolds et al., 2007). This model also included the delayed recall versions of the Glr measures, which are administered to 5-year-olds but not 4-year-olds. The second model, a first-order factor model for the 4-year-old group, had the same structure as the first, but with the tests not administered at age 4 represented as latent variables. The models are displayed in Figure 3.
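The bookkeeping behind the reference-variable approach can be sketched as a per-group model specification. This is a hypothetical illustration, not Amos syntax: it uses only a small subset of the subtests named in this article (not the full KABC-II battery), assumes from the text that Pattern Reasoning is not given to 4-year-olds, and represents an equality constraint as a parameter label shared across groups. Additional constraints needed for identification (such as those on the Gf loadings) are not shown.

```python
from dataclasses import dataclass

# Hypothetical, illustrative subset of subtests named in the text, with the factor
# each loads on in the 5-year-old model. NOT the full KABC-II subtest list.
FACTOR_OF = {
    "Conceptual Thinking": "Gf",
    "Pattern Reasoning": "Gf",
    "Face Recognition": "Gv",
    "Gestalt Closure": "Gv",
    "Word Order": "Gsm",
    "Hand Movements": "Gsm",
}


@dataclass(frozen=True)
class Indicator:
    subtest: str
    factor: str
    role: str           # "observed" if administered to the group, else "latent"
    loading_label: str  # identical labels across groups = equality constraint
    error_label: str    # error (or, for a latent stand-in, residual) variance


def build_specs(administered_by_group: dict[str, set[str]]) -> dict[str, list[Indicator]]:
    """Build one indicator list per age group for a reference-variable model."""
    # A subtest missing from any group becomes a reference variable: its loading and
    # error variance are tied across groups, so the latent stand-in in the group that
    # did not take the test borrows the estimates from the group that did.
    missing_somewhere = {
        s for s in FACTOR_OF
        if any(s not in administered for administered in administered_by_group.values())
    }
    specs = {}
    for group, administered in administered_by_group.items():
        rows = []
        for subtest, factor in FACTOR_OF.items():
            key = subtest.replace(" ", "_")
            # Group-specific labels leave the other parameters free (configural model).
            suffix = "" if subtest in missing_somewhere else f"_{group}"
            rows.append(Indicator(
                subtest=subtest,
                factor=factor,
                role="observed" if subtest in administered else "latent",
                loading_label=f"lambda_{key}{suffix}",
                error_label=f"theta_{key}{suffix}",
            ))
        specs[group] = rows
    return specs


specs = build_specs({
    "age4": set(FACTOR_OF) - {"Pattern Reasoning"},
    "age5": set(FACTOR_OF),
})
for row in specs["age4"]:
    print(row.subtest, row.role, row.loading_label)
```

Sharing a label is how many SEM programs express an equality constraint; in this sketch only the missing subtest's parameters are tied, which corresponds to the configural starting point described above.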
Once the pattern of loadings was established as equivalent across groups (configural invariance), additional model constraints were added to test the equivalence of the Gf, Gv, and other factors across the ages, and then to test substantive hypotheses about the Gf and Gv factors (i.e., whether they are separable). Model fit was evaluated as described above; RMSEA was corrected for the number of groups.
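The correction of RMSEA for the number of groups can be sketched as follows. The article does not give its exact formula, so this assumes one commonly used correction: the single-sample RMSEA, computed from the pooled sample size, multiplied by the square root of the number of groups. Programs differ on details such as whether N − 1 or N − G appears in the denominator; the function name is hypothetical.

```python
import math


def rmsea_multigroup(chi2: float, df: int, n_total: int, n_groups: int) -> float:
    """RMSEA for a multiple-group model, corrected for the number of groups.

    Assumption: the correction multiplies the single-sample RMSEA (computed from
    the pooled sample size) by sqrt(n_groups)."""
    single_sample = math.sqrt(max(chi2 - df, 0.0) / (df * (n_total - 1)))
    return math.sqrt(n_groups) * single_sample


# Hypothetical two-group model, for illustration only.
print(round(rmsea_multigroup(150.0, 100, 201, 2), 4))
```

Without the correction, treating the pooled N as a single sample would make a two-group model look better fitting than the same misfit would in one group.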
Results
Initial Model and Modifications
The fit indices for the baseline model, shown in Table 3, suggest that this original model reflects a plausible structure for the KABC-II, lending support to the theoretical and scoring structure of the test. The first- and second-order factor loadings indicate that the subtests generally represent good measures of the broad abilities on which they load and that the broad abilities represent good measures of g. The standardized loadings are shown in Figure 1. The Face Recognition subtest, however, showed a low (.309) but statistically significant loading on Gv. Of the subtest loadings, Word Order showed the strongest loading as a measure of Gsm (.857). Of the broad abilities, Gv showed the strongest loading onto g (.951).
Table 3. Summary of Minor Modifications and Two- versus Three-Stratum Comparisons.
Note. All models compared to the baseline model unless otherwise noted. CFI = comparative fit index; RMSEA = root mean square error of approximation; AIC = Akaike Information Criterion; aBIC = adjusted Bayes Information Criterion; Gc = crystallized intelligence; Gv = visual processing; Gsm = short-term memory; g = overall measure of ability.
Compared with a first-order model shown here as Model 7.
Several hypotheses were tested by comparing alternative models against this baseline model using a chi-square difference test. This approach was used for the five analyses described below.
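The chi-square difference test used for these comparisons can be sketched in plain Python. The helper names are hypothetical; the upper-tail chi-square probability is computed with the standard closed-form recurrence for integer degrees of freedom, so no statistics library is needed. In the analyses below, the baseline is the constrained model (each added cross-loading is fixed at zero), so a nonsignificant result favors the baseline.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Builds up from the df = 1 or df = 2 closed form with the recurrence
    Q(x; k) = Q(x; k - 2) + (x/2)**(k/2 - 1) * exp(-x/2) / Gamma(k/2)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        k += 2
        # Each term is computed in log space to avoid overflow for larger df.
        q += math.exp((k / 2 - 1) * math.log(x / 2) - x / 2 - math.lgamma(k / 2))
    return q


def chi2_difference_test(chi2_constrained: float, df_constrained: int,
                         chi2_free: float, df_free: int, alpha: float = 0.05):
    """Compare two nested models; the constrained model must have more df."""
    d_chi2 = chi2_constrained - chi2_free
    d_df = df_constrained - df_free
    if d_df <= 0:
        raise ValueError("the constrained model must have more degrees of freedom")
    p = chi2_sf(d_chi2, d_df)
    # A significant p means the constraint worsens fit, so the freer model is preferred;
    # otherwise the more parsimonious (constrained) model is retained.
    return d_chi2, d_df, p, p < alpha


# Hypothetical values: baseline (no cross-loading) vs. the same model plus one cross-loading.
print(chi2_difference_test(120.4, 60, 119.1, 59))
```

A nonsignificant p means the added parameter does not buy a statistically reliable improvement in fit, which is the logic behind retaining the baseline model in each of the analyses below.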
Conceptual thinking
The first analysis tested whether Conceptual Thinking measures both Gv and Gc by allowing this test to load on both broad abilities. This change produced no statistically significant improvement in model fit (Δχ², Table 3). Because freeing the cross-loading did not improve fit, the more parsimonious baseline model (with more degrees of freedom) is preferred.
Face recognition
The second analysis tested whether Face Recognition measures both Gv and Gsm by allowing it to load on both factors. This change did not produce a statistically significant Δχ² (Table 3), favoring the baseline model.
Hand movements
The third analysis tested whether Hand Movements measures both Gsm and Gv. This change resulted in no statistically significant improvement in model fit (Table 3), supporting the baseline model over this alternative.
Gestalt closure
The fourth analysis tested whether Gestalt Closure measures both Gv and Gc. As with the previous models, this change resulted in no statistically significant improvement in model fit (Table 3), again supporting the baseline model.
Gv and Gc
The fifth analysis examined whether Gv and Gc are distinct constructs. A first-order model with no g factor and a single factor representing both Gv and Gc was used to examine this question. This constrained model produced a statistically significant increase in χ², suggesting that Gv and Gc do, in fact, measure distinct constructs.
Stratum Differentiation: Two-Stratum Models
The first two-stratum model examined whether the broad abilities reflect an overarching g factor. We constructed a two-stratum model with subtests loading on broad abilities but with no overarching g factor. Neither Δχ² nor AIC indicated that removing the g factor improved model fit, thus supporting the baseline higher order model.
The second two-stratum model examined whether a model with less differentiation represented a better fit by loading all subtests directly onto an overarching g factor. The fit indices showed a deterioration in model fit, suggesting, once again, that the baseline three-stratum model represents a better structure for the data.
The third two-stratum model tested was a bifactor model, a hierarchical model in which both the broad abilities and a more general ability are specified as first-order factors, and thus as two strata rather than three. Although CHC theory is generally conceived of as a higher order model, and the KABC-II is based on a higher order structure, there is also considerable recent interest in bifactor models as an alternative to higher order models of intelligence (Murray & Johnson, 2013; Reise, 2012). The standardized results from a bifactor version of the KABC-II model for 4- and 5-year-olds are shown in Figure 2. The model used unit variance identification (i.e., a standardized model; Keith & Reynolds, 2012), and the two Glr subtest loadings were constrained to be equal to achieve identification. Although bifactor models generally fit as well as or better than higher order models (because they can be considered less-constrained versions of those models; Keith & Reynolds, 2012; Reynolds & Keith, 2013), for the younger children in the KABC-II sample the difference in fit was not statistically significant (Table 3). Thus, for all three two-stratum models, the higher order model was supported on empirical grounds and, in our opinion, on theoretical grounds.
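Why a bifactor model can be viewed as a less-constrained version of a higher order model can be checked numerically. The sketch below uses the Schmid-Leiman re-expression (a standard transformation, named here as an illustration; the article does not report using it) with hypothetical standardized loadings, not estimates from this study. A higher order model maps onto a bifactor model whose general and specific loadings are proportional within each broad ability; a freely estimated bifactor model drops that proportionality constraint.

```python
import math
from itertools import combinations

# Hypothetical standardized loadings, for illustration only (not estimates from this study).
FIRST_ORDER = {"Gv": [0.7, 0.6], "Gsm": [0.8, 0.5]}  # subtests on broad abilities
SECOND_ORDER = {"Gv": 0.9, "Gsm": 0.7}               # broad abilities on g

INDICATORS = [(f, i) for f, ls in FIRST_ORDER.items() for i in range(len(ls))]


def higher_order_corr(a, b):
    """Model-implied correlation of two subtests under the higher order model."""
    (fa, i), (fb, j) = a, b
    r = FIRST_ORDER[fa][i] * FIRST_ORDER[fb][j]
    if fa != fb:
        # Subtests on different broad abilities are linked only through g.
        r *= SECOND_ORDER[fa] * SECOND_ORDER[fb]
    return r


def to_bifactor():
    """Schmid-Leiman re-expression: general and specific loadings for each subtest."""
    general, specific = {}, {}
    for f, ls in FIRST_ORDER.items():
        gamma = SECOND_ORDER[f]
        general[f] = [lam * gamma for lam in ls]
        # The broad factor's residual (disturbance) variance is 1 - gamma**2.
        specific[f] = [lam * math.sqrt(1 - gamma * gamma) for lam in ls]
    return general, specific


def bifactor_corr(general, specific, a, b):
    """Model-implied correlation of two subtests under a bifactor model."""
    (fa, i), (fb, j) = a, b
    r = general[fa][i] * general[fb][j]          # shared general factor
    if fa == fb:
        r += specific[fa][i] * specific[fb][j]   # shared specific (broad) factor
    return r


general, specific = to_bifactor()
for a, b in combinations(INDICATORS, 2):
    print(a, b, round(higher_order_corr(a, b), 4), round(bifactor_corr(general, specific, a, b), 4))
```

The two columns printed for each pair agree, so the higher order model is a special case of the bifactor model; freeing the proportional loadings can only keep or improve fit, which is why a nonsignificant difference favors the more constrained higher order model.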

Figure 2. Bifactor KABC-II model.
Age Differentiation: Gf and Gv
The general strategy for the reference-variable age-related model comparisons was first to establish configural invariance across ages (Model 1) and then to constrain, step by step, aspects of the model to equivalence across ages: factor loadings (metric invariance), factor variances, and factor covariances (Models 2-8; Keith, 2015; Keith & Reynolds, 2012). Intercept invariance was not of interest in this research. The fit of the models tested is shown in Table 4. Given our particular interest in the Gv and Gf factors, these factors were constrained before other parts of the model. Once the equivalence of measurement and structure across groups was established, subsequent models tested the separability of the Gv and Gf factors for both age groups (Models 9-10).
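The sequence of invariance models can be sketched as cumulative sets of cross-group equality constraints. The labels are hypothetical shorthand for this sketch, not parameter names from Amos or any other program. Each model keeps every earlier constraint and adds its own, so each model is nested in the one before it and each adjacent pair in Table 4 can be compared with a chi-square difference test.

```python
# Cumulative cross-group equality constraints for Models 1-8, as described above.
# Labels are hypothetical shorthand, not parameter names from any SEM program.
STEPS = [
    ("Model 1", set()),  # configural: only identification constraints
    ("Model 2", {"Gv loadings"}),
    ("Model 3", {"all other loadings"}),
    ("Model 4", {"Gv variance"}),
    ("Model 5", {"other factor variances"}),
    ("Model 6", {"Gf-Gv covariance"}),
    ("Model 7", {"covariances of other factors with Gf and Gv"}),
    ("Model 8", {"all remaining covariances"}),
]


def cumulative_models(steps):
    """Each model keeps every earlier constraint and adds its own."""
    models, active = [], frozenset()
    for name, added in steps:
        active = active | added
        models.append((name, active))
    return models


models = cumulative_models(STEPS)
for (prev_name, prev), (name, cur) in zip(models, models[1:]):
    # prev is a subset of cur, so cur is nested in prev and a chi-square
    # difference test between the two is appropriate.
    assert prev <= cur
    print(f"{name} vs {prev_name}: adds {sorted(cur - prev)}")
```

Models 9 and 10 continue the same pattern by adding constraints on top of Model 8, which is what makes the Δχ² comparisons for the Gf-Gv tests valid.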
Table 4. Summary of Gf-Gv Models.
Note. All models compared with the previous model, unless otherwise noted. RMSEA is corrected for the number of groups. Gf = fluid/novel reasoning; Gv = visual processing; CFI = comparative fit index; RMSEA = root mean square error of approximation; AIC = Akaike Information Criterion; aBIC = adjusted Bayes Information Criterion.
Compared with Model 8.
The configural model, with no constraints beyond those necessary for identification and estimation, showed a generally good fit to the data (Figure 3 shows this model). The next step tested for metric invariance across age groups by constraining factor loadings to be equal across groups, first for the factor of interest (Gv, Model 2) and then for all factors (Model 3). Metric invariance was not tested separately for Gf because its loadings were already constrained in the configural model to allow for identification. Neither model produced a statistically significant change in χ² (Table 4), suggesting that metric invariance holds across age groups.

Figure 3. Reference-variable analysis of the consistency of measurement of constructs for 4- and 5-year-olds.
The next step tested the equivalence of the Gv variance across age groups (Model 4), followed by the variances of the other factors (Model 5). The Gf-Gv covariance was then constrained to be equal across ages (Model 6), followed by the covariances of the other factors with Gf and Gv (Model 7) and all remaining covariances (Model 8). As shown in Table 4, none of these constraints degraded model fit to a statistically significant degree, indicating that the factor variances and covariances (and thus the correlations) involving Gf and Gv can be treated as equal across these ages. These constraints on the structural model prepared for the tests of the separability of the Gf and Gv factors.
Models 9 and 10 tested whether the Gf and Gv factors are separable for the two age groups. In Model 9, the correlation between Gf and Gv (within a standardized version of the model) was set to 1 for both ages. This constraint produced a statistically significant Δχ², indicating that the two factors are separable. The second aspect of factor equivalence, constraining the correlations of Gf with the other factors to equal the corresponding correlations of Gv with the other factors (Keith & Reynolds, 2012), was tested in Model 10. As shown in Table 4, these constraints did not further degrade model fit relative to Model 9, but the two sets of constraints together still produced a statistically significantly worse fit than the model in which all parameters were simply equal across groups (Model 8). AIC led to the same conclusions as Δχ², but aBIC (with its larger penalty for model complexity) favored a model with Gf and Gv combined. Because the models are nested and Δχ² was our primary criterion for comparing nested models, we concluded that Gf and Gv are separable. This conclusion is also supported by the magnitude of the Gf-Gv correlation in Model 8 (.826), which is high but not excessively so (the Gv-Glr correlation was larger, .877). Thus, the KABC-II measures distinct Gf and Gv factors for both 4- and 5-year-old children, in line with findings for school-age children (Reynolds et al., 2007).
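The disagreement between AIC and aBIC noted above follows from how each criterion penalizes the parameters saved by a constraint. Write C for the constrained model (e.g., Gf and Gv combined) and F for the freer model. Assuming the usual χ²-based (equivalently, −2 log-likelihood) forms, with Sclove's ln((N + 2)/24) adjustment for aBIC and N taken as the total sample size (an assumption for the multiple-group case), the comparisons reduce to:

```latex
\begin{aligned}
\Delta\chi^2 &= \chi^2_{C} - \chi^2_{F}, \qquad \Delta df = df_{C} - df_{F} \\
\mathrm{AIC}_{C} - \mathrm{AIC}_{F} &= \Delta\chi^2 - 2\,\Delta df \\
\mathrm{aBIC}_{C} - \mathrm{aBIC}_{F} &= \Delta\chi^2 - \Delta df\,\ln\!\left(\frac{N + 2}{24}\right)
\end{aligned}
```

The Δχ² test favors F when Δχ² exceeds the critical value of χ² with Δdf degrees of freedom; AIC favors F when Δχ² > 2Δdf; aBIC favors F only when Δχ² > Δdf · ln((N + 2)/24). Because ln((N + 2)/24) exceeds 2 once N is larger than about 175, aBIC demands more misfit before rejecting a constraint, so in large samples it can favor the combined model even when Δχ² is statistically significant.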
Discussion
With the increasing emphasis on early identification of learning difficulties, practitioners urgently need valid tools for assessing preschoolers. These findings should lend confidence to psychologists using the KABC-II to assess preschool children. Users of this measure should feel comfortable interpreting subtest scores according to the classifications presented in the manual. Because the subtests included in this study appear to be generally strong measures of the single broad abilities on which they load, psychologists should also be able to use the KABC-II as part of a cross-battery assessment, drawing on its subtests as measures of specific broad abilities.
In addition, uncertainty in the literature about the development of differentiated cognitive abilities during the preschool years makes independent validation of the factor structure of the KABC-II even more essential. Our analyses supported the three-stratum model set forth in the manual and consistent with CHC theory. Our findings, however, contradict previous research suggesting that, for young children, intelligence is better represented by a two-stratum model or by a model with fewer factors than the more complex structure found for older children (Blaga et al., 2009; Morgan et al., 2009; Tideman & Gustafsson, 2004; Ward et al., 2011). Consistent with other independent research (Reynolds et al., 2007), our results suggest that a model with both broad abilities and an overarching g factor provides the best fit for the data in this age range. In fact, our analysis of the distinctness of the Gf and Gv factors supported further differentiation within this age group, suggesting that the data are best represented by five, rather than four, broad ability factors.
Limitations
This study was restricted by the data available in the KABC-II standardization sample. Most significantly, this restriction led to the exclusion of two subtests, Block Counting and Pattern Reasoning, from the initial factor structure analyses, even though both are included in the theoretical and scoring structure of the KABC-II for 5-year-olds. The lack of consistent data (i.e., the same subtests administered across the entire preschool age range) also led to the exclusion of 3-year-olds from this study. Future studies with consistent data across this age range would be better equipped to examine both the questions posed in this study and the development of intelligence during this crucial period.
Furthermore, our study was restricted to information gathered from a single intelligence test, the KABC-II. Particularly with regard to the question of whether intelligence is less differentiated in young children, studies utilizing different measures have come to different conclusions. This variation suggests that future research should examine results across multiple measures of intelligence, using CFA to better determine the structure of intelligence within this age range.
Despite these caveats, this research supported the consistency of the KABC-II factor structure for 4- and 5-year-old children with the scoring structure presented in the test manual. Such outside evaluations provide an important test of construct validity. Our findings also suggest, however, that the KABC-II measures distinct Gv and Gf abilities at these ages. These findings add to the literature on the development of differentiated cognitive abilities in young children. Although downward (preschool) extensions of school-age intelligence measures commonly provide less differentiation, these results support the existence and measurability of distinct broad abilities in this age group. In addition, support for a Gf broad ability factor in young children may spur test developers and psychologists to develop further subtests capable of assessing this ability in this age range.
Acknowledgements
We are grateful to Alan Kaufman and Pearson Clinical Assessment for access to the data used in this research.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
