Abstract
The construct validity of the International Cognitive Ability Resource (ICAR) has yet to be investigated using a gold-standard individually administered intelligence battery. The present study used a convenience sample of 97 students to examine the respective relations between the ICAR16 and overall intelligence (g) and the Cattell–Horn–Carroll broad abilities measured by the WAIS-IV. Large correlations were observed between the observed overall scores (ICAR16 total score and full-scale IQ: r = .81, p < .001) and between the CFA-estimated general factors (r = .94, p < .001). Evidence from confirmatory factor models suggests that the ICAR letter–number series task measures fluid reasoning (Gf), while the matrix reasoning, verbal reasoning, and three-dimensional rotation tasks measure visual–spatial processing (Gv). Findings support the ICAR16 as a valid brief measure of nonverbal intelligence; however, replications in larger samples are needed.
Introduction
Flexible, well-validated tools that are freely available in the public domain facilitate cognitive ability measurement for researchers across fields. The International Cognitive Ability Resource (ICAR) is the first nonproprietary resource that freely distributes items to qualified researchers and encourages the development and contribution of items for external validation and use (Dworak, Revelle, Doebler et al., 2020). One of the main barriers to the adoption of the ICAR is that no research to date has been conducted on its construct validity using a gold-standard, individually administered cognitive battery.
The aim of the current study was to examine the construct validity of the ICAR Sample Test (ICAR16) through the framework of the most well-supported theory of intelligence, Cattell–Horn–Carroll (CHC) theory (McGrew, 2009). We examined the respective relations between the ICAR16 and overall intelligence (g) estimated by the WAIS-IV. Following the model suggested by Weiss, Keith, Zhu et al. (2013), we also examined the relations between the ICAR16 subtests and the five CHC broad abilities measured by the WAIS-IV: comprehension–knowledge (Gc), fluid reasoning (Gf), visual–spatial processing (Gv), short-term working memory (Gsm), and processing speed (Gs). Bivariate correlations and confirmatory factor analytic methods were used to examine the following research questions: (1) How does the ICAR16 compare to the WAIS-IV as an overall estimate of general intelligence? (2) How do the respective ICAR16 item types relate to the CHC broad abilities?
Method and Materials
Comparison of Demographic Information of Present and Respective Norming Samples.
Standardized Parameter Estimates (SE) for the Five-Factor Model.
Note. Gc = comprehension–knowledge, Gf = fluid reasoning, Gv = visual–spatial processing, Gsm = short-term working memory, and Gs = processing speed. *p < .05. **p < .01. ***p < .001.
The 16-item subset of the ICAR (ICAR16) was published by Condon and Revelle (2014) as a sample test. The ICAR16 includes four items from each of the four item types: letter–number series (LN16), matrix reasoning (MX16), verbal reasoning (VR16), and three-dimensional rotation (R3D16). Internal consistency was acceptable for three of the four item types (.59 < α < .74) but only marginally acceptable for MX16 (α = .52). The average reliability of the overall ICAR16 was adequate (α = .81; ω total = .83; ω hierarchical = .66).
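For readers less familiar with the α coefficients cited above, the following is a minimal sketch of how Cronbach's alpha is computed from item-level scores. The response matrix is hypothetical toy data, not data from this study or from Condon and Revelle (2014), and the function name is illustrative only.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# HYPOTHETICAL 0/1 responses for five people on one four-item subtest.
toy = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 2))
```

With only four items per item type, alpha is bounded by the short test length, which is one reason subtest-level coefficients can fall well below the coefficient for the 16-item total.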
All participants were administered the WAIS-IV by supervised graduate student examiners at a university-based assessment center. Participants in the clinical sample sought evaluation services through the assessment center and were administered the 10 core subtests of the WAIS-IV within a larger battery of neuropsychological tests. Students in the volunteer sample were only administered the 10 core subtests of the WAIS-IV. All participants completed the self-administered, untimed items of the ICAR16 on-site within the same testing session as the WAIS-IV.
Confirmatory factor analyses (CFA) were conducted in Mplus 7.4 (Muthén & Muthén, 2012) using maximum likelihood (ML) estimation. To address the first research question, the correlation between the CFA-estimated general factors derived from the ICAR16 subtests and WAIS-IV composite indices (VCI, PRI, WMI, and PSI) was examined (Figure 1). Bivariate correlations between the ICAR16 total score and FSIQ were also examined. Correction methods for restriction of range (Alexander, 1990) and reliability (Murphy & Davidshofer, 1988) were replicated from the initial ICAR validation study (Condon & Revelle, 2014).
Correlated general factor model.
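The reliability correction mentioned above can be illustrated with a minimal sketch, assuming it takes the form of the classical (Spearman) correction for attenuation. The FSIQ reliability used here is an assumed placeholder that is not reported in this article, and the range-restriction step (Alexander, 1990) is deliberately omitted, so the result is not expected to reproduce the corrected value obtained in the study.

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

r_observed = 0.62   # uncorrected ICAR16-FSIQ correlation reported in the Results
rel_icar16 = 0.81   # ICAR16 alpha reported in the Method section
rel_fsiq = 0.98     # ASSUMED placeholder; not a value reported in this article

r_corrected = disattenuate(r_observed, rel_icar16, rel_fsiq)
print(round(r_corrected, 2))
```

Because both reliabilities are below 1, the corrected coefficient is always at least as large as the observed one. The size of the adjustment depends directly on the reliability values supplied.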
To address the second research question, correlations between observed ICAR subtest scores and latent CHC constructs were inspected through confirmatory factor models based largely on the model proposed by Weiss et al. (2013). The magnitudes of the correlations between ICAR16 observed scores and the latent factors (Figure 2) were used to inform a final CFA model (Figure 3). Multiple criteria were considered to evaluate model fit based on cut-off scores described by Keith (2019, p. 327), including the Comparative Fit Index (CFI), Tucker Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA), Standardized Root Mean Square Residual (SRMR), Akaike Information Criterion (AIC), and Sample-Size Adjusted Bayesian Information Criterion (SABIC).
Five-factor model with correlated International Cognitive Ability Resource–observed scores.
Note. WAIS-IV subtest error terms and correlations between respective latent variables and ICAR16 subtests are not displayed for model clarity.
Final structure suggested by data.
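As an illustration of one of the fit indices listed above, the sketch below computes the RMSEA point estimate from a model chi-square, its degrees of freedom, and the sample size. It uses the common N − 1 convention; some programs use N instead, so small discrepancies from software output are possible. The first call uses the chi-square and degrees of freedom reported in the Results for the correlated general factor model; the second call uses a hypothetical chi-square for contrast.

```python
import math

def rmsea(chi2, df, n):
    """RMSEA point estimate using the N - 1 convention (some programs use N)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Reported model: chi-square is below its degrees of freedom, so the estimate is zero.
print(rmsea(14.25, 19, 97))

# HYPOTHETICAL poorer-fitting model with the same df and N, for comparison.
print(rmsea(30.0, 19, 97))
```

Whenever the chi-square is smaller than its degrees of freedom, the numerator is truncated at zero, which is consistent with the reported RMSEA < .001.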

Results
To address research question one, general factor models were created for the respective tests (Figure 1). The model fit the data well (χ2(19) = 14.25, p = .77; RMSEA < .001; 90% C.I. (.00, .06); CFI = 1.00; TLI = 1.05; SRMR = .04; AIC = 4255.33; SABIC = 4240.76) and revealed a large correlation between the general factors (r = .94, p < .001). The uncorrected correlation between the respective overall observed scores, the FSIQ and ICAR16 total score, was moderate in magnitude (rFSIQ,ICAR16 = .62, p < .001), while the range- and reliability-corrected correlation was large (
Comparison of Final Structural Models.
Note. Gf = fluid reasoning, Gv = visual–spatial processing, CFI = comparative fit index, TLI = Tucker Lewis index, RMSEA = root mean square error of approximation, SRMR = standardized root mean square residual, AIC = Akaike information criterion, SABIC = sample-size adjusted Bayesian information criterion.
Final best fitting model depicted in Figure 3.
Given that MX16 demonstrated similarly sized correlations with both Gf and Gv (Table 2), this subtest was initially allowed to cross-load on both constructs. Loading MX16 on only the Gf factor provided a significantly worse fit than the cross-loaded model (∆χ2(1) = 4.81, p = .03). Alternatively, loading MX16 on the Gv factor only did not significantly impact model fit (∆χ2(1) = 2.02, p = .16). Findings suggest that loading MX16 on the Gv construct only provides the best-fitting and most parsimonious model.
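The p values attached to these chi-square difference tests can be checked with a short sketch. It relies on the identity that a chi-square variable with 1 degree of freedom is a squared standard normal, so it applies only to single-parameter comparisons such as the two reported here. The function name is illustrative.

```python
import math

def chi2_diff_p_df1(delta_chi2):
    """Upper-tail p value for a chi-square difference with 1 degree of freedom."""
    # P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    return math.erfc(math.sqrt(delta_chi2 / 2.0))

p_gf_only = chi2_diff_p_df1(4.81)   # MX16 on Gf only vs. cross-loaded model
p_gv_only = chi2_diff_p_df1(2.02)   # MX16 on Gv only vs. cross-loaded model
print(round(p_gf_only, 2), round(p_gv_only, 2))
```

Only the first difference exceeds the 3.84 critical value for 1 degree of freedom at α = .05, which is why dropping the Gv loading worsens fit while dropping the Gf loading does not.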
Discussion and Limitations
Based on data from a convenience sample of 97 university students, findings suggest the ICAR16 correlates with general ability estimates on the WAIS-IV at a magnitude similar to other brief measures of intelligence (e.g., Salthouse, 2014). As expected, the ICAR16 subtests demonstrated the largest correlations with fluid reasoning and visual–spatial processing constructs and the smallest correlations with processing speed and working memory. The best-fitting confirmatory factor model suggests the letter–number series subtest measures Gf and the three remaining ICAR16 subtests measure Gv.
These findings are somewhat surprising, given that the content of the matrix reasoning and verbal reasoning tasks is more consistent with traditional measures of Gf. In regard to the former, some research has demonstrated that Gv is more important in matrix tasks than previously believed, and in some groups, equally important as Gf (Waschl, Nettelbeck, Jackson et al., 2016). As the ICAR16 matrix task demonstrated large correlations with both factors, it is likely that it calls on both Gf and Gv. It should also be noted that the internal consistency of the ICAR matrix task is questionable (Condon & Revelle, 2014), which may confound correlations with the latent factors.
The relation between the verbal reasoning task and Gv is less easily explained. Although Gf and Gv are strongly related, their differentiating quality is Gv’s reliance on visual stimuli, which is completely absent in the ICAR verbal reasoning task. It is possible this finding stems from imprecision in the WAIS-IV indicators of Gv; at least one study indicated the visual puzzles subtest relies on multiple abilities beyond Gv in a mixed clinical sample (Fallows & Hilsabeck, 2012). Given the inconsistency with the theory, the strength of the relation between Gv and the verbal reasoning task may be a statistical artifact or product of the small sample, and replications are necessary.
The principal limitation of the present study is the nature and size of the convenience sample. The sample did not meet the minimum size recommended for CFA with ML estimation (N > 100) (Anderson & Gerbing, 1984), and thus, estimates may be biased and the quality of goodness of fit statistics is questionable. However, model convergence and lack of improper solutions in model estimation mitigate the risk of bias (Chen et al., 2001), and findings can be used to inform future research on the tool.
Generalizability of findings is also limited by the sample homogeneity in regard to age, race/ethnicity, and education, and the inclusion of a clinical subsample. Given the sample limitations, present analyses are considered preliminary and should be replicated in larger, more diverse samples.
Conclusions
The present study provides evidence that the ICAR16 is a valid brief measure of nonverbal intelligence. Findings suggest the letter–number series measures fluid reasoning, and matrix reasoning, verbal reasoning, and three-dimensional rotation measure visual–spatial abilities; however, results are somewhat inconsistent with theory and must be replicated. Overall, preliminary results support the use of the ICAR16 as a brief measure of collective nonverbal abilities (Gf and Gv), but present evidence does not support its use as a multidimensional measure of cognitive abilities. Findings provide a foundation for future validation research on the ICAR in larger samples and with more diverse item sets.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by a Richard W. Woodcock Dissertation Research Award from the Woodcock Institute for the Advancement of Neurocognitive Research and Applied Practice.
