Abstract
The Reynolds Intellectual Assessment Scales (RIAS) measures general intelligence and its two main components, verbal and nonverbal intelligence, each comprising two subtests. The RIAS has recently been standardized in Denmark, Germany, Switzerland, and Spain. Using the standardization samples of the U.S. (n = 2,438), Danish (n = 983), German (n = 2,103), and Spanish (n = 1,933) versions of the RIAS, this study examined measurement invariance across these four language groups for a single-factor structure, an oblique two-factor structure with a verbal and a nonverbal factor, and a bifactor structure with a general, a verbal, and a nonverbal factor. Single-group confirmatory factor analysis (CFA) supported the oblique two-factor and bifactor structures for each language group but not the single-factor structure. The bifactor analysis revealed that the general factor accounted for the largest proportion of common variance in each language group, while the amount of variance accounted for by the two specific factors was small and their reliabilities were low. Multiple-group CFA supported scalar invariance in both the oblique two-factor and the bifactor structure.
Intelligence is one of the most frequently studied psychological constructs (e.g., Goldstein, Princiotta, & Naglieri, 2015). A more recently introduced intelligence test is the Reynolds Intellectual Assessment Scales (RIAS), developed in the United States by Reynolds and Kamphaus (2003) and normed for individuals between the ages of 3 and 94 years. Over the last decade, several studies have demonstrated the convergent, discriminant, and predictive validity of the RIAS (Beaujean, Firmin, Michonski, Berry, & Johnson, 2010; Edwards & Paulin, 2007; Krach, Loe, Jones, & Farrally, 2009; Nelson & Canivez, 2012). Moreover, the RIAS has been evaluated to be independent of motor coordination, visual-motor speed, and reading skills, and has been found to be user friendly in terms of its administration, scoring, and interpretation (Andrews, 2007; Dombrowski & Mrazik, 2008; Elliot, 2004). Recently, the RIAS was successfully adapted to three other languages: the Danish version (Hartmann & Andresen, 2011) standardized in Denmark, the German version (Hagmann-von Arx & Grob, 2014) standardized in Switzerland and Germany, and the Spanish version (Santamaría & Fernández Pinto, 2009) standardized in Spain.
The theoretically proposed factor structure of the RIAS is based on Carroll’s (1993) three-stratum theory. The RIAS consists of four intelligence subtests at the first stratum and two factors, the verbal intelligence index (VIX) and the nonverbal intelligence index (NIX), at the second stratum. These factors serve as indicators of crystallized and fluid intelligence, which, in turn, are combined into a composite intelligence index (CIX) at the third stratum reflecting general intelligence (g). There is also a composite memory index (CMX) at the second stratum, which is composed of scores from two supplemental subtests and is not integrated into the measure of general intelligence (Dombrowski, Watkins, & Brogan, 2009; Nelson & Canivez, 2012; Nelson, Canivez, Lindstrom, & Hatt, 2007). Based on these theoretical assumptions, Reynolds and Kamphaus (2003) evaluated a series of factor structures for the RIAS and posited an oblique two-factor structure with four subtests measuring two correlated factors, verbal and nonverbal intelligence. Two alternative structures may be of special interest. One is a single-factor solution with general intelligence. The other is a bifactor structure with one general factor measuring general intelligence and two specific factors measuring verbal and nonverbal intelligence. These three structures are shown in Figure 1. The two memory subtests have been found to be separate from the other RIAS subtests (e.g., Nelson et al., 2007).

Figure 1. The single-factor structure (a), two-factor structure (b), and bifactor structure (c) of the RIAS.
Assessing the factor structure of psychological tests is important for ensuring that test scores can be interpreted according to the posited test structure (e.g., Widaman & Reise, 1997). The RIAS factor structure proposed by Reynolds and Kamphaus (2003; Figure 1b) has been repeatedly supported in the United States and in the three other language groups in which the RIAS is available (i.e., Danish, German, and Spanish), using standardization samples (Hagmann-von Arx & Grob, 2014; Hartmann & Andresen, 2011; Reynolds & Kamphaus, 2003; Santamaría & Fernández Pinto, 2009). The factor structure was further supported in an independent study with a referred U.S. student sample (Beaujean, McGlaughlin, & Margulies, 2009). Studies have also found evidence for a single-factor structure (Figure 1a) for the U.S. RIAS standardization sample (Dombrowski et al., 2009) and for referred samples (Beaujean & McGlaughlin, 2014; Nelson & Canivez, 2012; Nelson et al., 2007). Combining the RIAS subtests with other tests of crystallized and fluid intelligence, Nelson and Canivez (2012) found in a clinical sample that the oblique two-factor structure was superior to the single-factor structure. Using the Schmid and Leiman (1957) procedure, which contains proportionality constraints, Dombrowski et al. (2009), Nelson and Canivez (2012), and Nelson et al. (2007) also found support for a higher order factor structure with general intelligence as a second-order factor, which accounted for the largest proportion of common variance.
Although the proposed RIAS structure with two main components was supported in each individual language group providing evidence for configural invariance of the RIAS (Meredith, 1993), it is unknown whether higher levels of factorial (measurement) invariance hold across the four language groups. The higher levels of measurement invariance are (e.g., Meredith, 1993; Widaman & Reise, 1997) metric invariance (invariant factor loadings across groups), scalar invariance (invariant factor loadings and intercepts across groups), and invariance of residual variances (invariant factor loadings, intercepts, and residual variances across groups). Configural invariance indicates that the factor structure is the same across groups but does not warrant that the observed variables measure the latent construct in the same way. Metric invariance implies that the observed variables are related to the latent variable equivalently across groups. This level of invariance allows a comparison of the groups in terms of path coefficients and covariances between observed and latent variables (Chen, Sousa, & West, 2005) and predictive or convergent validity from one group can be transferred to other groups (Widaman & Reise, 1997). Scalar invariance implies that group differences in observed means are due to a difference in latent means and, so, permits a comparison of the groups in terms of factor and observed means and variances (Widaman & Reise, 1997). This level of invariance allows practitioners to compare results of different language versions and assess individuals with a migration background in their native language when they have insufficient language skills in the country of resettlement (Sattler, 2001), which has become increasingly important as migration has increased worldwide in recent decades (Eurostat, 2014; U.S. Department of Homeland Security, 2013). Finally, invariance of residual variances allows a comparison of residual variances across groups (Chen et al., 2005).
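The nested levels of invariance described above can be written compactly as constraints on a multiple-group factor model. The notation below is a sketch chosen for illustration and is not taken from the article: g indexes the language group, Λ_g holds the factor loadings, τ_g the intercepts, Θ_g the residual variances, Φ_g the factor covariance matrix, and κ_g the latent means.

```latex
% Multiple-group common factor model for the subtest scores x in group g
\begin{align*}
  x_g &= \tau_g + \Lambda_g \xi_g + \varepsilon_g, \\
  \mu_g &= \tau_g + \Lambda_g \kappa_g, \qquad
  \Sigma_g = \Lambda_g \Phi_g \Lambda_g^{\top} + \Theta_g .
\end{align*}
% Each level keeps the constraints of the previous one and adds a new one
\begin{align*}
  \text{Configural:} &\quad \text{same pattern of fixed and free elements in } \Lambda_g \text{ for all } g \\
  \text{Metric:}     &\quad \Lambda_g = \Lambda \\
  \text{Scalar:}     &\quad \Lambda_g = \Lambda, \ \tau_g = \tau \\
  \text{Residual:}   &\quad \Lambda_g = \Lambda, \ \tau_g = \tau, \ \Theta_g = \Theta
\end{align*}
```

Under the scalar constraints, differences in the observed means μ_g can arise only through the latent means κ_g, which is why this level is the one that licenses comparisons of means across language versions.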
The goal of the present study was to assess measurement invariance of the U.S., Danish, German, and Spanish versions of the RIAS for a single-factor structure, an oblique two-factor structure with a verbal and a nonverbal factor, and a bifactor structure with a general, a verbal, and a nonverbal factor, using standardized scores from the standardization samples. The analysis of standardized data is desirable in situations where nonidentical tests are administered (e.g., Cudeck, 1989; Kim & Ferree, 1981) and is most relevant for practice, where standardized test scores are very often calculated and interpreted. Because invariance of residual variances rarely holds (e.g., Van De Schoot, Schmidt, De Beuckelaer, Lek, & Zondervan-Zwijnenburg, 2015), results were expected to provide evidence for scalar invariance but not for invariance of residual variances.
Method
Participants
The study included a total of 7,457 individuals from the RIAS standardization samples. The U.S. standardization sample includes 2,438 individuals from the United States, the Danish standardization sample includes 983 individuals from Denmark, the German standardization sample includes 2,103 individuals from Switzerland and Germany, and the Spanish standardization sample includes 1,933 individuals from Spain. All standardization samples are equally distributed within each age group with respect to participants’ gender (50% female) and are meant to match the Census data on educational attainment (years of education completed in the United States and level of education in the other samples) of the respective countries. The U.S. sample also considered ethnicity and geographic region of the participants. The U.S. RIAS version is standardized for individuals aged 3 to 94 years (M = 24.07 years, SD = 22.42). The Danish (M = 20.83 years, SD = 19.53) and German (M = 19.56 years, SD = 20.33) versions are standardized for individuals aged 3 to 99 years. The Spanish RIAS version is standardized for individuals aged 3 to 94 years (M = 21.75 years, SD = 20.51). Demographic characteristics are provided in the Technical Manuals (Hagmann-von Arx & Grob, 2014; Hartmann & Andresen, 2011; Reynolds & Kamphaus, 2003; Santamaría & Fernández Pinto, 2009).
Measures
The adaptations of the RIAS for Denmark, Germany and Switzerland, and Spain were carried out in several steps. First, items were translated by experienced translators and individuals with psychological education. Second, all items were examined for their suitability for the respective language group. Items referring specifically to American culture were modified so that they were appropriate for the culture of the respective language group (e.g., the American newspaper machine was replaced by a machine used in European countries). In addition, the items were empirically pretested and the order of the items was adjusted to ensure an order of ascending difficulty (e.g., questions regarding baseball were easier for Americans while questions regarding soccer were easier for Europeans). Finally, the adaptations were standardized using a representative sample of the respective language group.
The RIAS yields three factors: CIX, VIX, and NIX. VIX and NIX comprise two subtests each. The verbal subtests are Guess What (GWH; identifying an object or a concept through the use of verbally presented clues) and Verbal Reasoning (VRZ; completing verbal analogies). The nonverbal subtests are Odd-Item Out (OIO; identifying a picture that does not go with the others) and What’s Missing (WHM; identifying the missing element of a presented picture). The CIX is calculated from the sum of the T scores (M = 50; SD = 10) of the four intelligence subtests. The RIAS also includes a conormed, supplemental CMX comprising two subtests, Verbal Memory (VRM; reproducing verbally presented sentences and short stories), and Nonverbal Memory (NVM; recognizing visually presented objects). In line with suggestions of others (e.g., Nelson & Canivez, 2012), the memory subtests were not included in the analysis.
Data Analysis
The software package lavaan (Rosseel, 2012) in R and maximum likelihood estimation were used to assess measurement invariance of the RIAS across the four language groups. Correlation matrices of the subtest T scores (M = 50; SD = 10) with means and standard deviations were used as data input (see Online Supplemental Table 1), because the publisher of the U.S. RIAS version refused to make the raw data available. First, as suggested by Meade, Johnson, and Braddy (2008), a single-group confirmatory factor analysis (CFA) was conducted to test the three models presented in Figure 1 for each language group separately. It is worth noting that these models are scale-invariant, allowing an appropriate analysis of standardized data (Cudeck, 1989). For the bifactor model, the explained common variance (ECV; Ten Berge & Sočan, 2004), coefficient omega hierarchical (ωH; McDonald, 1999), which has been found to be a more appropriate measure of reliability than coefficient alpha (e.g., Watkins, 2017), and construct replicability H (Hancock & Mueller, 2001) were calculated for the general factor and the two specific factors using the software Omega (Watkins, 2013). Second, a multiple-group CFA was conducted across the four language groups. Finally, the original U.S. version was compared with each of the other three language versions.
The different levels of measurement invariance were tested using procedures proposed by Meredith (1993), Milfont and Fischer (2010), Steenkamp and Baumgartner (1998), and Widaman and Reise (1997). First, configural invariance (Model 1) was tested. In all three models, the factor loading of GWH was set to 1. In the oblique two-factor model, the factor loading of OIO was also set to 1. In the bifactor model, the loadings on the verbal and nonverbal factors were all set to 1. Because a bifactor model with only two indicators per specific factor is not identified, the variances of the verbal factor and the nonverbal factor were constrained to be equal. In doing so, this model parallels the model estimated by Beaujean and McGlaughlin (2014). Finally, the means of the general factor, verbal factor, and nonverbal factor were fixed to zero. Next, metric invariance (Model 2), scalar invariance with free latent factor means for the Danish, German, and Spanish versions (Model 3), and invariance of residual variances (Model 4) were assessed.
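The identification constraints above can be checked with a simple parameter count. The sketch below is illustrative only and rests on stated assumptions: it counts covariance-structure parameters for four indicators in a single group, treats the bifactor factors as mutually uncorrelated, and ignores the mean structure (four free intercepts reproduce four observed means and add no degrees of freedom). The helper name `cfa_df` is hypothetical.

```python
def cfa_df(n_indicators, n_free_params):
    """Degrees of freedom of a single-group covariance-structure CFA."""
    # Number of unique variances and covariances among the indicators
    n_moments = n_indicators * (n_indicators + 1) // 2
    return n_moments - n_free_params


# Free parameters under the constraints described in the text (4 subtests).
free_params = {
    # 3 loadings (GWH fixed to 1) + 1 factor variance + 4 residual variances
    "single-factor": 3 + 1 + 4,
    # 2 loadings (GWH and OIO fixed to 1) + 2 factor variances
    # + 1 factor covariance + 4 residual variances
    "oblique two-factor": 2 + 2 + 1 + 4,
    # 3 general loadings (GWH fixed to 1) + 1 general variance
    # + 1 shared specific-factor variance (loadings fixed to 1, variances
    #   constrained equal) + 4 residual variances
    "bifactor": 3 + 1 + 1 + 4,
}

df_by_model = {name: cfa_df(4, k) for name, k in free_params.items()}

for name, df in df_by_model.items():
    print(f"{name}: {free_params[name]} free parameters, df = {df}")
```

Under these assumptions the oblique two-factor and the constrained bifactor model use the same number of free parameters, which is consistent with their being indistinguishable in single-group analysis.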
To assess model fit, the following goodness-of-fit indices and criteria were used: comparative fit index (CFI) of ≥.95, McDonald’s noncentrality index (Mc) of ≥.90, and root mean square error of approximation (RMSEA) of ≤.06 (Hu & Bentler, 1999). Because of the large sample size, the chi-square test statistic was not considered (Meade et al., 2008); instead, the ΔCFI and ΔMc were calculated, with values of ≤.002 for ΔCFI and ≤.005 for ΔMc indicating a nonsignificant reduction in model fit (Meade et al., 2008).
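For readers who want to see how these indices relate to the chi-square statistic, the following sketch uses common textbook definitions. It is an assumption that these match the exact formulas of any particular software: some programs use N rather than N - 1, and the sqrt(number of groups) scaling of the RMSEA in multiple-group models is included here as an optional convention. All input values are hypothetical and are not results from this study.

```python
import math


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index from model and baseline chi-square values."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0.0 else 1.0 - d_model / d_base


def mcdonald_nci(chi2, df, n):
    """McDonald's noncentrality index (Mc), using N - 1 (assumed convention)."""
    return math.exp(-0.5 * (chi2 - df) / (n - 1))


def rmsea(chi2, df, n, n_groups=1):
    """RMSEA; the sqrt(n_groups) factor is an assumed multiple-group scaling."""
    return math.sqrt(n_groups) * math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def meets_cutoffs(cfi_value, mc_value, rmsea_value):
    """Cutoffs named in the text: CFI >= .95, Mc >= .90, RMSEA <= .06."""
    return cfi_value >= 0.95 and mc_value >= 0.90 and rmsea_value <= 0.06


# Hypothetical single-group example (values invented for illustration)
example = {"chi2": 5.0, "df": 1, "chi2_base": 3000.0, "df_base": 6, "n": 2438}
example_cfi = cfi(example["chi2"], example["df"], example["chi2_base"], example["df_base"])
example_mc = mcdonald_nci(example["chi2"], example["df"], example["n"])
example_rmsea = rmsea(example["chi2"], example["df"], example["n"])
print(round(example_cfi, 4), round(example_mc, 4), round(example_rmsea, 4))
print(meets_cutoffs(example_cfi, example_mc, example_rmsea))
```

Because all three indices are functions of the chi-square noncentrality (chi-square minus df), they tend to move together, but they weight sample size and model complexity differently, which is one reason to report more than one of them.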
Results
Single-Group CFA
For each language group, each factor model was tested with the four intelligence subtests as indicators. It is critical to note that the oblique two-factor model and the bifactor model are statistically equivalent (Canivez, 2016; Canivez & Watkins, 2016; Lee & Hershberger, 1990; Reise, 2012) and statistically not distinguishable from one another using single-group CFA. The fit estimates of these models are presented in Table 1. As can be seen, all language groups yielded a good fit for the oblique two-factor model and the bifactor model. For the single-factor model, the RMSEA indicated a poor fit for all four language groups.
Table 1. Fit Indices for Single-Group Confirmatory Factor Analyses for the Four Language Groups.
Note. United States n = 2,438; Danish n = 983; German n = 2,103; Spanish n = 1,933. df = degrees of freedom; CFI = comparative fit index; Mc = McDonald’s noncentrality index; RMSEA = root mean square error of approximation; CI = confidence interval.
For the bifactor model, ECV, ωH, and construct replicability H for the general factor and the two specific factors are given in Table 2. The estimates of the ECV show that more than 70% of the common variance is attributable to the general factor in each language group. For the two specific factors, the amount of variance accounted for was small and their reliabilities were low. The standardized factor loadings for the oblique two-factor and bifactor models are provided in Online Supplemental Table 2.
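To make the three bifactor statistics concrete, the sketch below computes them from standardized loadings using their usual definitions for orthogonal factors. The loadings are hypothetical values invented for illustration and are not the estimates reported in this study; the function names are likewise assumptions and do not reproduce the Omega software.

```python
# Hypothetical standardized bifactor loadings (g = general, v = verbal,
# nv = nonverbal); NOT the values estimated in the study.
loadings = {
    "GWH": {"g": 0.70, "v": 0.40},
    "VRZ": {"g": 0.70, "v": 0.40},
    "OIO": {"g": 0.60, "nv": 0.30},
    "WHM": {"g": 0.60, "nv": 0.30},
}


def explained_common_variance(loadings):
    """Share of common variance attributable to each factor."""
    totals = {}
    for item in loadings.values():
        for factor, lam in item.items():
            totals[factor] = totals.get(factor, 0.0) + lam ** 2
    common = sum(totals.values())
    return {factor: total / common for factor, total in totals.items()}


def omega_hierarchical(loadings, general="g"):
    """Proportion of total-score variance due to the general factor."""
    sum_by_factor = {}
    residual = 0.0
    for item in loadings.values():
        for factor, lam in item.items():
            sum_by_factor[factor] = sum_by_factor.get(factor, 0.0) + lam
        # Residual variance of a standardized indicator: 1 - communality
        residual += 1.0 - sum(lam ** 2 for lam in item.values())
    total_variance = sum(s ** 2 for s in sum_by_factor.values()) + residual
    return sum_by_factor[general] ** 2 / total_variance


def construct_replicability_h(loadings, factor):
    """Construct replicability H for one factor from its loadings."""
    ratio = sum(
        item[factor] ** 2 / (1.0 - item[factor] ** 2)
        for item in loadings.values()
        if factor in item
    )
    return 1.0 / (1.0 + 1.0 / ratio)


ecv = explained_common_variance(loadings)
omega_h = omega_hierarchical(loadings)
h_values = {f: construct_replicability_h(loadings, f) for f in ("g", "v", "nv")}
print({f: round(v, 3) for f, v in ecv.items()})
print(round(omega_h, 3), {f: round(v, 3) for f, v in h_values.items()})
```

With only two indicators per specific factor, H for those factors is bounded by two modest loadings, which illustrates why specific factors of this kind tend to show low replicability.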
Table 2. ECV, ωH, and Replicability H.
Note. ECV = explained common variance; ωH = omega hierarchical; g = general intelligence; v = verbal intelligence; nv = nonverbal intelligence.
Multiple-Group CFA
Next, measurement invariance across all groups using multiple-group CFA was assessed for the oblique two-factor model and the bifactor model but not for the single-factor model due to its poor fit. Four models were tested implying configural invariance, metric invariance, scalar invariance with free latent factor means for the Danish, German, and Spanish versions, and invariance of residual variances. Fit estimates for the oblique two-factor model (Figure 1b) are provided in Table 3. As expected, Model 1 (configural invariance), Model 2 (metric invariance), and Model 3 (scalar invariance) showed a good fit. Comparing Model 1 with Model 2 and Model 2 with Model 3, the ΔCFI and ΔMc revealed no significant reduction in model fit, indicating that scalar invariance holds across all four groups. That is, the factor structure, the loadings, and the intercepts did not differ significantly across the four groups. Model 4 implying invariance of residual variances yielded a poor fit, meaning that the variances of the error terms varied significantly across groups.
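The stepwise logic of comparing each model with the previous, less constrained one can be sketched as follows. This is a simplified illustration under stated assumptions: a level counts as supported only if the model meets the absolute cutoffs and neither ΔCFI nor ΔMc exceeds its threshold, and the fit values are hypothetical numbers chosen to mimic the qualitative pattern described, not the values in Table 3.

```python
LEVELS = ["configural", "metric", "scalar", "residual"]


def highest_invariance_level(fits, max_d_cfi=0.002, max_d_mc=0.005):
    """Return the most restrictive invariance level supported, or None.

    fits maps each level name to a dict with keys 'cfi', 'mc', 'rmsea'.
    """
    supported = None
    previous = None
    for level in LEVELS:
        fit = fits[level]
        # Absolute fit criteria named in the text
        if not (fit["cfi"] >= 0.95 and fit["mc"] >= 0.90 and fit["rmsea"] <= 0.06):
            break
        # Change in fit relative to the previous, less constrained model
        if previous is not None:
            if previous["cfi"] - fit["cfi"] > max_d_cfi:
                break
            if previous["mc"] - fit["mc"] > max_d_mc:
                break
        supported = level
        previous = fit
    return supported


# Hypothetical fit values (invented for illustration only)
hypothetical_fits = {
    "configural": {"cfi": 0.999, "mc": 0.999, "rmsea": 0.030},
    "metric": {"cfi": 0.998, "mc": 0.997, "rmsea": 0.030},
    "scalar": {"cfi": 0.997, "mc": 0.995, "rmsea": 0.035},
    "residual": {"cfi": 0.960, "mc": 0.930, "rmsea": 0.090},
}

print(highest_invariance_level(hypothetical_fits))
```

Stopping at the first failing level reflects the nested structure of the models: a more restrictive level is interpretable only if every less restrictive level holds.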
Table 3. Fit Indices for Multiple-Group Confirmatory Factor Analyses Evaluating Measurement Invariance of the Oblique Two-Factor Structure.
Note. N = 7,457; United States n = 2,438; Danish n = 983; German n = 2,103; Spanish n = 1,933. df = degrees of freedom; CFI = comparative fit index; Mc = McDonald’s noncentrality index; RMSEA = root mean square error of approximation; CI = confidence interval.
Results of the bifactor model (Figure 1c) are given in Table 4. In contrast to the single-group analysis, the fit statistics for the oblique two-factor model and the bifactor model were not identical in the multiple-group analysis. The configural model, metric model, and scalar model showed a good fit, but the ΔCFI indicated a significant reduction in fit of the metric model compared with the configural model. The model implying invariance of residual variances yielded a poor fit. Comparing the fit of the bifactor model with the fit of the oblique two-factor model, the values of the CFI, Mc, and RMSEA were all in favor of the oblique two-factor model.
Table 4. Fit Indices for Multiple-Group Confirmatory Factor Analyses Evaluating Measurement Invariance of the Bifactor Structure.
Note. N = 7,457; United States n = 2,438; Danish n = 983; German n = 2,103; Spanish n = 1,933. df = degrees of freedom; CFI = comparative fit index; Mc = McDonald’s noncentrality index; RMSEA = root mean square error of approximation; CI = confidence interval.
The results of the comparison of the original U.S. version with each of the other language versions are shown in Table 5 for the oblique two-factor solution and in Table 6 for the bifactor solution. For the oblique two-factor structure, invariance of residual variances was supported for the German version, indicating that the factor structure, loadings, intercepts, and residual variances were not significantly different across the two groups. Scalar invariance was supported for the Danish and Spanish versions. For the bifactor structure, invariance of residual variances held for the German and Spanish versions, but the ΔCFI indicated a significant reduction in fit for the Spanish version between the scalar model and the model implying invariance of residual variances. For the Danish version, only configural invariance held, meaning that the U.S. version and the Danish version differed in the factor loadings, intercepts, and residual variances.
Table 5. Fit Indices for Multiple-Group Confirmatory Factor Analyses Evaluating Measurement Invariance of the Oblique Two-Factor Structure Comparing the U.S. Group to the Danish, German, and Spanish Groups.
Note. N = 7,457; United States n = 2,438; Danish n = 983; German n = 2,103; Spanish n = 1,933. df = degrees of freedom; CFI = comparative fit index; Mc = McDonald’s noncentrality index; RMSEA = root mean square error of approximation; CI = confidence interval.
Table 6. Fit Indices for Multiple-Group Confirmatory Factor Analyses Evaluating Measurement Invariance of the Bifactor Structure Comparing the U.S. Group to the Danish, German, and Spanish Groups.
Note. N = 7,457; United States n = 2,438; Danish n = 983; German n = 2,103; Spanish n = 1,933. df = degrees of freedom; CFI = comparative fit index; Mc = McDonald’s noncentrality index; RMSEA = root mean square error of approximation; CI = confidence interval.
Discussion
This study assessed measurement invariance of the RIAS across the U.S., Danish, German, and Spanish versions for a single-factor structure, an oblique two-factor structure, and a bifactor structure using the standardization samples. Employing CFA techniques, results supported the oblique two-factor structure and the bifactor structure for all four versions. The bifactor structure tested in this article is statistically equivalent to the two-factor structure using single-group analysis and has been recognized as the best representation of Carroll’s (1993) three-stratum theory (Beaujean, 2015; Carroll, 2003; Cucina & Howardson, 2016). The results of the bifactor analysis support previous findings that the general factor accounted for the largest proportion of common variance in each language group (e.g., Dombrowski et al., 2009; Nelson & Canivez, 2012; Nelson et al., 2007). The finding that the amount of variance accounted for by the two specific factors was small and that the reliabilities tended to be low in each language group impeded a confident interpretation of these two factors. This last finding is not surprising because with only two indicators per specific factor it is difficult to capture substantial unique variance in a bifactor structure. No evidence was found supporting the single-factor structure with one general factor, indicating that this structure is too parsimonious for the RIAS.
Using multiple-group CFA, the findings of the current study provide evidence for scalar invariance in both the oblique two-factor model and the bifactor model. However, the oblique two-factor model fitted the data slightly better than the bifactor model, a finding that corroborates the notion that the oblique two-factor structure is theoretically more accepted for the RIAS (e.g., Reynolds & Kamphaus, 2003). This is further in line with previous studies providing evidence for the oblique two-factor structure (Beaujean et al., 2009; Hagmann-von Arx & Grob, 2014; Hartmann & Andresen, 2011; Reynolds & Kamphaus, 2003; Santamaría & Fernández Pinto, 2009). Nevertheless, the results of the bifactor analysis revealed a strong general factor and a good model fit, even though there are only two indicators per specific factor, which requires that the loadings of the specific factors be set to 1 and the variances of the specific factors be constrained to equality to identify the model. Invariance of residual variances did not hold across the four language groups (i.e., residual variances varied across groups), which may be due to varying reliabilities of the scales across groups (DeShon, 2004). Invariance of residual variances was found only between the U.S. version and the German version and, to a lesser degree, between the U.S. version and the Spanish version.
This study has several strengths and some limitations. One strength is the evaluation of the RIAS factor structure across all language groups in which the RIAS currently is available. The large samples and the employment of confirmatory methods are further strengths. It should be noted, though, that analyses were conducted using standardized data of the standardization samples because raw data were not made available by all publishers of the RIAS. The use of standardized data, which express the extent to which a person deviates from the norm, is a common practice to test psychometric properties of intelligence tests (e.g., Beaujean, Freeman, Youngstrom, & Carlson, 2012; Beaujean & McGlaughlin, 2014; Benson, 2007; Benson, Beaujean, & Taub, 2015; Benson, Hulac, & Kranzler, 2010; Bialer, 1974; Kranzler, Benson, & Floyd, 2015; Kranzler, Floyd, Benson, Zaboski, & Thibodaux, 2016; Naglieri, Taddei, & Williams, 2013; Nelson & Canivez, 2012; Taub & Benson, 2013). However, future studies may use raw scores (i.e., number of questions answered correctly), which would enable researchers to address additional questions, such as whether general populations differ in their factor means. Due to the wide age range of the RIAS, the raw scores are typically highly correlated with age (Reynolds & Kamphaus, 2003). Therefore, analyses using raw scores would require the assessment of age-homogeneous groups (cf. Bialer, 1974). A second limitation of the current study is the use of samples representing the general populations, and, thus, it remains unknown whether measurement invariance holds for specific subpopulations. There is one study focusing on referred students, which found evidence that the U.S. RIAS factor structure holds across samples of referred students and the norming data of the RIAS (Beaujean et al., 2009). 
Furthermore, the brevity of the RIAS is a clear strength as it allows for efficient testing (Andrews, 2007; Dombrowski & Mrazik, 2008; Elliot, 2004) but limits the statistical evaluation of alternative factor structures. For example, additional measures of verbal and nonverbal abilities could be added that would allow the estimation of a joint bifactor model with free factor variances (see, for example, Canivez, Konold, Collins, & Wilson, 2009; Nelson & Canivez, 2012). Finally, invariance was examined at the level of subtests assuming similar loadings of the items across the language groups. Future research may analyze measurement invariance at the item-level using raw scores.
In conclusion, the current study strongly supports the oblique two-factor structure and the bifactor structure with a strong general factor. Both factor structures showed scalar measurement invariance across all four language groups, meaning that the RIAS test structure is comparable across individuals from the United States, Denmark, Switzerland and Germany, and Spain. No evidence was found for the single-factor structure, which seems to be too simple for the RIAS.
Supplemental Material
Supplemental material, Supplement_Material_RIAS for The Reynolds Intellectual Assessment Scales: Measurement Invariance Across Four Language Groups by Jasmin T. Gygi, Thomas Ledermann, Alexander Grob, Myriam Rudaz and Priska Hagmann-von Arx in Journal of Psychoeducational Assessment
Footnotes
Acknowledgements
We are grateful to Hogrefe Psykologisk Forlag A/S, Virum, Denmark, for providing the data on the Danish RIAS standardization sample, and TEA Ediciones, Madrid for providing the data on the Spanish RIAS standardization sample. We thank Cecil R. Reynolds for helpful discussions on this topic.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material is available for this article online.