Abstract
The theory of triarchic intelligence posits that, in addition to the widely acknowledged analytical reasoning abilities, creative and practical abilities should be included in the assessments of intellectual capacities and identification of gifted students. To find support for such an approach, the present study examined the psychometric properties of the Aurora-a Assessment Battery of triarchic abilities in the upper primary grades. To assess the dimensional structure of the Aurora-a Assessment Battery, we analyzed subtest scores of 499 elementary school children. Correlational and factor analyses showed a poor fit between Aurora-a subtest scores and the triarchic theory of intelligence, indicating deficiencies in either the triarchic theory or the design of the Aurora-a Battery. Researchers should sustain their current efforts to evaluate the validity of various theories of intelligence and to develop theory-based assessment instruments.
The most frequently used tools to assess cognitive abilities of children are standardized achievement tests and IQ (intelligence quotient) tests (McClain & Pfeiffer, 2012). However, the majority of states in the United States of America require the use of a multiple-criteria model to assess cognitive abilities of children (National Association for Gifted Children & Council of State Directors of Programs for the Gifted, 2015). This requirement is in line with the triarchic theory of intelligence, which states that assessments of cognitive abilities should address analytical, creative, and practical abilities (Sternberg, 2011; Sternberg & Grigorenko, 2002). The a-part of the Aurora Assessment Battery attempts to assess triarchic intellectual abilities in upper primary school children (Chart, Grigorenko, & Sternberg, 2008). Although the Aurora-a Battery is used in various U.S. states, as well as in European and Middle Eastern countries (Tan et al., 2009), its triarchic structure has been assumed rather than thoroughly examined in previous studies. To date, it is unclear whether Aurora-a subtests indeed reflect the three types of intellectual abilities. Therefore, the present study examined whether the Aurora-a Battery can discriminate analytical, creative, and practical abilities in Dutch upper elementary school children.
Cognitive abilities are at the foundation of most theories of intelligence ever since the introduction of a general intelligence factor (i.e., g-factor) by Spearman (1904). Current theories of intelligence, however, assume intelligence to comprise a broad range of cognitive aspects (Ziegler & Heller, 2000). The Cattell–Horn–Carroll model of intelligence (CHC model; McGrew, 1997), for example, incorporates Cattell’s theory on fluid and crystallized intelligence and Carroll’s three-stratum theory. The CHC model proposes a number of broad abilities that are on the one hand related to general intelligence and on the other hand to a great variety of narrow abilities. In contrast, Guilford (1959) made a distinction between two types of intelligence: convergent thinking and divergent thinking. Sternberg’s (2011) theory of triarchic intelligence also emphasized the role of divergent thinking abilities next to analytical abilities, although he referred to it as creativity. In contrast to other theories of intelligence, however, the triarchic theory of intelligence assumed a third type of ability to be of equal importance: practical abilities.
Practical ability can be defined as “the ability to adapt to, shape, and select environments” (Sternberg et al., 2000, p. 1) so that these align better with an individual’s needs, abilities, and desires. In contrast to the formal and declarative academic knowledge that is represented as analytical abilities, practical abilities involve the use of tacit and procedural knowledge. More specifically, analytical and creative abilities are used to come up with solutions for real-life problems, yet practical abilities involve implementation of these solutions in the context via strategies that are often acquired implicitly. That is, strategies are learned without explicit instruction and are therefore also referred to as tacit knowledge (Cianciolo et al., 2006).
The assessment of this third type of ability calls for tacit knowledge tests or practical ability inventories (Cianciolo et al., 2006; Sternberg, 2011). In these kinds of tests, participants have to find a solution for common problem situations, either in real-life tests or via paper-and-pencil assignments. As an indicator of practical intellectual abilities, participants make a situational judgment by rating the usefulness of various responses to these situations.
Although instruments for the assessment of analytical, creative, or practical abilities are available, practical and creative assessment instruments are used only to a limited extent. A national survey of state policies and practices in the United States showed that potential cognitive abilities were often identified by standardized IQ test and achievement test scores (McClain & Pfeiffer, 2012). As a consequence, children with abilities that are not recognized by these traditional assessments are underrepresented in gifted programs, as are minority children and children from low–socioeconomic status (SES) backgrounds (Chart et al., 2008). Assessment of a broader range of cognitive abilities might especially benefit minority and economically disadvantaged students (Stemler, Grigorenko, Jarvin, & Sternberg, 2006). The Aurora Assessment Battery (Chart et al., 2008) is designed to recognize children with analytical, creative, or practical talents so that a more diverse population of children gains access to gifted programs. Especially for triarchic enrichment programs, in which teachers provide analytical, creative, and practical assignments (e.g., Aljughaiman & Ayoub, 2012), insight into children’s intellectual profiles might help teachers align their teaching to the individual ability levels of the children.
The Aurora Battery consists of two parts, both of which are group-administered, paper-and-pencil tests. The Aurora a-part is grounded in the theory of triarchic intelligence and comprises analytical, creative, and practical subtests. Subtests are balanced across a verbal, figural, and numerical domain to allow students to demonstrate multiple and varied types of abilities. Whereas a supplemental Aurora g-part assesses conventional g-factor cognitive abilities (Chart et al., 2008), our study was concerned only with the assessment of triarchic abilities with the Aurora-a part.
Thus far, only four studies have been conducted with regard to the psychometric qualities of the Aurora-a subtests. Only one of these studies, however, examined whether the underlying structure of the Aurora-a Battery matched the triarchic theory of intelligence. In a first study, Kornilov, Tan, Elliott, Sternberg, and Grigorenko (2011) found Aurora-a subtest scores to be substantially and positively related to conventional English achievement tests (i.e., median r = .50 for MidYIS and median r = .43 for Key Stage 1 and 2). However, only 10% to 20% of children classified as gifted based on achievement test scores were also classified as gifted based on their Aurora scores. Similarly, Mandelman, Barbot, Tan, and Grigorenko (2013) found classification agreement rates of 38.5% for analytical abilities, 15.1% for creative abilities, and 61.5% for practical abilities between the TerraNova test for academic achievement and the Aurora-a Battery. A study conducted by Mandelman, Tan, Kornilov, Sternberg, and Grigorenko (2010) examined the association between children’s self-reports of triarchic abilities and their scores on analytical, practical, and creative subtests as examined with the Aurora-a. Their results showed statistically significant, yet small, correlations between the two types of assessment of triarchic intellectual abilities. However, analytical self-concept scores were also statistically significantly related to practical ability scores, as were practical self-concept and analytical ability scores. All three studies assumed the three-factor structure to be present in this test battery without analyzing this a priori on an item or subtest level. Although reliability statistics on subscale levels suggested high internal consistency between items within the three ability and three domain subscales (Mandelman et al., 2010), it was not examined whether item scores indeed coherently added up to subtest scores.
In a fourth study, Aljughaiman and Ayoub (2012) attempted to check whether the data of the Aurora-a Battery reflected the triarchic structure. To do so, they calculated analytical, creative, and practical subtest scores. Moderate Cronbach’s alpha values were reported for analytical (α = .71) and creative abilities (α = .67), as well as for practical abilities (α = .68). However, such alpha values can be found in both unifactorial and multifactorial test batteries (Drenth & Sijtsma, 2006) and thus cannot be used as an indicator of the underlying structure of a test. Next, Aljughaiman and Ayoub (2012) split the ability scores into verbal, figural, and numerical scores so that nine ability-domain subscale scores (e.g., analytical-verbal, analytical-numerical) were calculated. These nine subscale scores were included as dependent variables in a confirmatory factor analysis (CFA). Results showed high factor loadings (.64–.85) for all nine ability-domain subscales. Based on these results, the authors concluded that Aurora-a Battery scores adequately fitted the theory of triarchic intelligence. However, this latter study has the methodological drawback that the CFA was performed at the level of combined subtests. Combining scores like this is a form of subtest parceling, which reduces the uniqueness of constituent subtests and inflates fit statistics in CFAs and structural equation modeling (SEM; Bandalos, 2002; Sass & Smith, 2006).
To sum up, it is clear that even though the theory of triarchic intelligence is rich and full of potential for practical applications (Grigorenko, Jarvin, & Sternberg, 2002; Sternberg & Clinkenbeard, 1995), it needs more data to support the claims. To date, especially research on the assessment of triarchic abilities in primary school children is rather limited. The Aurora-a Battery was developed to assess analytical, creative, and practical abilities in U.S. elementary and middle school children (Chart et al., 2008). In three of the studies on the psychometric qualities of the Aurora-a Battery conducted so far, the underlying factor structure was assumed but not examined. Moreover, no attempts have been made to examine whether item scores indeed coherently added up to subtest scores. In the only attempt to explore the underlying structure, Aljughaiman and Ayoub (2012) included combined subtest scores and not single subtest scores of children in Saudi Arabia. In the present study, we investigated the psychometric qualities of the Dutch version of the Aurora-a Battery. Because the Aurora was developed for American children, we started from item-level analysis to prevent biases due to differences in the cultural and linguistic environment of American and Dutch elementary school children. Next, we used correlational and factor analyses to examine the underlying triarchic structure of the Aurora-a Battery.
Method
Participants
To obtain a sample of 500 participants, we sent invitation letters to all primary schools located in three Dutch municipalities (i.e., Ede, Zeist, and Oss) in the central/south part of the Netherlands. Of these 86 schools, we included the first 6 schools that agreed to participate in the present study. Schools that responded later were informed that full participation had been reached and were invited to participate in a follow-up study. Children attending the schools that responded to our invitation mostly came from high-SES backgrounds. Because the number of children was close to 500, we did not approach the remaining schools.
Participants were 499 children from fourth (six classes, n = 149), fifth (six classes, n = 195), and sixth grade (six classes, n = 155). The average age of all participants was 11 years and 1 month, and 48.1% were boys. Parents of all children provided consent for participation.
Materials
The Aurora-a Battery (Chart et al., 2008) comprises 17 subtests divided over three domains (visual-spatial, verbal, and numerical) and three abilities (analytical, creative, and practical). Table 1 presents subtest names for all nine ability-domain combinations. The developers of the Aurora-a gave consent to translate the subtests into Dutch and provided us with all the necessary materials. For all subtests, we translated the instructions as strictly as possible. Except for the general instructions, the items of the visual and numerical subtests involved little or no language and were thus a one-to-one translation into Dutch. The translation of the verbal subtests was more complex. Because the items concerned children’s knowledge of certain linguistic or contextual characteristics, items had to be adapted to suit the level of knowledge of Dutch children. Translators discussed any doubts with regard to the content and level of difficulty of the translated version with the developers, a consortium of international Aurora researchers, and Dutch elementary school teachers. The verbal-practical subtest Headlines involved figurative language, which is only incidentally used in Dutch. Because it was problematic to maintain equivalencies with respect to meaning, psychometric construct, and item difficulty, the subtest was not translated into Dutch and not included in the present study.
Table 1. The Subtests of the Aurora Divided Over the Three Intellectual Abilities and Domains.
Note. MC = multiple-choice; SA = short answer; ES = essay. The verbal-practical subtest Headlines was not included in the present study.
The answering format of the subtests was either open-ended or multiple-choice. The open-ended items required children to write down either an essay or a short answer (i.e., one word or a number). Coders polytomously rated 20% of the essays using the original Aurora-a Battery scoring manual. This manual provides extensive lists of examples of answers given by children together with their corresponding ratings. To get acquainted with the Aurora and its scoring manual, coders first rated data of a pilot study. Raters reviewed their ratings and discussed ambiguities until the interrater correlations were .70 or higher. We again discussed any doubts with regard to the interpretation of criteria with the international consortium of Aurora researchers. Subsequently, multiple raters rated items for at least 90 children per subtest.
Interrater correlations were high (.72 ≤ rs ≤ .95, n ≥ 30, ps < .001). The short open-ended answers were dichotomously scored (incorrect = 0, correct = 1), as were the multiple-choice answers.
The following six subtests from the Aurora-a Battery assessed children’s analytical intellectual abilities:
Boats. This subtest presented 10 photographs displaying toy boats, which were connected to one another by a cord. Boats could float around on the water but always stayed connected in the same way. Children had to choose which out of four possible photographs displayed an impossible position of toy boats. Every correct answer rendered 1 point.
Shapes. This subtest assessed analytical abilities by presenting 10 figures of a broken shape with one piece missing. Children had to figure out which of four possible pieces would complete the broken shape, earning 1 point for every correct answer.
Homophones. This subtest consisted of two parts. In Part A, children had to complete nine sentences by filling in two words sounding the same but having different meanings, for example, wear–where. In Part B, children had to complete six sentences by filling in two words with reversed orders of strings, for example, desserts–stressed. Children earned 1 point for every correct pair of words. Because the words in this subtest had to be homophones, we could not include a translation of the English words; thus, other words were included in the Dutch version.
Metaphors. In this subtest, children had to finish nine metaphorical sentences by elaborating on the similarities between two objects. Raters coded the answers according to two criteria: (a) to what degree the child is able to think comparatively and (b) to what degree the child is able to identify common elements with clear, specific, and imaginative language. The mean percentage of agreement between raters was 72.5%.
Letter Math. This subtest presented five math problems, consisting of imaginary cards with a letter on one side and a number on the other. Children had to figure out which number should appear on the back of each letter card to correctly solve the math problem. A maximum of 11 points could be earned by replacing letters with the correct numbers.
Algebra. This subtest comprised five numerical story problems that had to be solved by careful reading and calculating. In some problems, more than one answer should be given, so that a total of eight points could be earned.
The following five subtests assessed creativity:
Book Covers. This subtest intended to measure creativity by presenting five images that had to be interpreted as book covers. Children had to write down, thereby expressing their creativity, what the imaginary books could be about. Raters coded their answers according to two criteria: (a) the degree to which the child conducted the task adequately and (b) the degree to which the child created an original and substantial story accompanying the picture. The mean percentage of agreement between raters was 66.0%.
Multiple Uses. In this subtest, children had to write down as many unusual uses of five common objects (e.g., chalkboard eraser and hammer) as they could make up. Coders rated (a) the degree to which the child expressed a clear list of multiple atypical uses and (b) the degree to which answers were detailed and original. The mean percentage of agreement between raters was 77.4%.
Conversations. In this subtest, children had to write down conversations between two common objects (e.g., fork/knife and mountain/ocean). Coders rated (a) the degree to which the child expressed substantial dialogues and (b) the degree to which a dialogue identified both characters in novel exchange. The mean percentage of agreement between raters was 74.8%.
Figuratives. This subtest comprised 12 sentences, each containing a figurative element. Children had to choose which out of four alternatives would best fit within the story following the given sentence. Children earned 1 point for every correctly marked answer. In the Dutch version, we included figurative expressions that we assumed upper elementary school children to be familiar with.
Cartoon Numbers. In this subtest, children had to write down a conversation between two numbers within seven given scenarios. Coders rated (a) the degree to which a social element was included and (b) the degree to which responses incorporated both knowledge of numeric values and personification of numbers within a social situation. The mean percentage of agreement between raters was 72.5%.
The following five subtests assessed children’s practical intellectual abilities:
Toy Shadows. This subtest presented eight photographs of a light shining on a toy placed in front of a screen. Children had to indicate which out of four photographs showed the exact shadow that would be projected on the screen. Every correct answer yielded 1 point.
Paper Cutting. Children saw 10 photographs of folded pieces of paper. In these photographs, an area was shaded to indicate which part should be imagined as cut out. Children had to indicate which out of four photographs of cut-out, unfolded papers displayed the correct answer. Every correct answer earned 1 point.
Decisions. This subtest presented three scenarios. Children had to designate whether statements were pro or con arguments for a decision within the given scenario. Irrelevant statements had to be left out. All correctly designated statements were worth 1 point so that a total of 17 points could be earned.
Money. This subtest consisted of five scenarios in which a number of persons had to divide a bill, thereby also taking into account debts from previous transactions. Children had to write down the expenses of 13 persons, for a maximum of 13 points.
Maps. In this subtest, children had to draw a line showing the shortest route to the movie theatre for 10 items, picking up a couple of friends from their homes along the route. Every fully correct route was worth 2 points; partly correct routes were worth 1 point.
Procedure
We group-administered the Dutch version of the Aurora-a Battery to all children in the 18 participating classrooms in multiple sessions. The subtests were randomly distributed over either two or three test booklets. The 45- to 60-minute sessions took place over 1 or 2 days, depending on the preferences of the teacher, always with a total of 120 minutes to complete the Aurora-a Battery.
Statistical Analyses
We examined the structure of the Aurora-a Battery from two perspectives. First, we used test and item analyses to evaluate the psychometric quality of the Aurora-a items and subtests. We computed the rit value for each item, and in addition, we estimated reliability statistics for each subtest. The rit value is the correlation between the item score and subtest score. Because rit values are inflated by item overlap, we corrected values by subtracting the item variance and replacing this with the best estimate of common variance (i.e., the squared multiple correlation). Negative rit values are indicative of poor item qualities and therefore problematic. Values between .00 and .19 indicate that the item does not discriminate well, values between .20 and .29 indicate sufficient discrimination, and values of .30 and above indicate good discrimination (Ebel & Frisbie, 1991). We estimated reliability in terms of the greatest lower bound (GLB) and Guttman’s lambda2 because these measures underestimate the actual level of reliability less severely than Cronbach’s alpha (Sijtsma, 2009; Ten Berge & Sočan, 2004). Following guidelines suggested by Sijtsma, Lucassen, Meijer, and Evers (2010), we considered reliability coefficients higher than .80 to be good and values below .70 to be insufficient.
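To illustrate the item analysis described above, the following Python sketch computes an item-rest correlation for each item, classifies it against the Ebel and Frisbie (1991) thresholds, and estimates Guttman's lambda2 from the item covariances. It is a minimal sketch and not the analysis code used in this study. The item-rest correlation (item score against the sum of the remaining items) is a simpler overlap correction that stands in for the squared-multiple-correlation correction, the GLB is omitted because it requires numerical optimization, and all function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_rest_correlations(scores):
    """scores: one row per child, one column per item of a single subtest.

    Assumption: overlap is removed by correlating each item with the sum
    of the OTHER items, not by the SMC-based correction named in the text.
    """
    n_items = len(scores[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]
        result.append(pearson(item, rest))
    return result


def classify_discrimination(r):
    """Thresholds as reported in the text (Ebel & Frisbie, 1991)."""
    if r < 0:
        return "problematic"
    if r < 0.20:
        return "poor"
    if r < 0.30:
        return "sufficient"
    return "good"


def guttman_lambda2(scores):
    """Guttman's lambda2 computed from the item covariance matrix."""
    n, k = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in scores) / (n - 1)
            for j in range(k)] for i in range(k)]
    total_var = sum(sum(row) for row in cov)          # variance of the sum score
    item_var = sum(cov[i][i] for i in range(k))       # sum of item variances
    off_sq = sum(cov[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
    return (total_var - item_var + math.sqrt(k / (k - 1) * off_sq)) / total_var


# Hypothetical toy data: four children, two perfectly consistent items.
toy = [[1, 1], [0, 0], [1, 1], [0, 0]]
labels = [classify_discrimination(r) for r in item_rest_correlations(toy)]
```

With real data, each subtest would be passed separately, and items labeled "problematic" or "poor" would be candidates for inspection rather than automatic removal.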
Second, we calculated correlations between all Aurora-a subtests. In addition to the original correlations between subtests, we calculated disattenuated correlations (Osborne, 2003) in an attempt to be more realistic in our estimation of correlations. In the correction for attenuation, we used the GLB to get the most conservative estimation of the latent correlation. Original correlations served as input for subsequent factor analyses.
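The correction for attenuation mentioned above divides the observed correlation by the square root of the product of the two reliabilities. The sketch below shows that formula in Python. The function name and the numerical values are hypothetical and serve only to illustrate why a higher reliability estimate such as the GLB leads to a smaller, and thus more conservative, correction.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values: the same observed correlation corrected with a
# higher and a lower reliability estimate for both subtests.
with_high_reliability = disattenuate(0.30, 0.80, 0.70)
with_low_reliability = disattenuate(0.30, 0.70, 0.60)
```

Because the denominator grows with the reliability estimates, the first call returns the smaller corrected correlation.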
We used a CFA to examine whether the triarchic structure was present in the data regarding the 16 Aurora-a subtests. Subtests were classified over three latent factors, corresponding with the three types of abilities as suggested by the triarchic theory of intelligence. We allowed factors to correlate because the theoretical model posits the three aspects of intelligence to be distinct but related abilities (Kornilov et al., 2011). We used guidelines by Hu and Bentler (1999) to evaluate the fit between the model and the data. Although these guidelines are not free from imperfections (e.g., Fan & Sivo, 2005), Hu and Bentler’s (1999) comparative fit index (CFI) should exceed .95 for the model to accurately fit the data. The root mean square error of approximation value should not exceed .06 as an indicator of discrepancies between observed and predicted covariances.
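For readers less familiar with these fit indices, the sketch below shows how the CFI and RMSEA are commonly computed from model and baseline chi-square values and how the Hu and Bentler (1999) cutoffs named above can be applied. This is an illustrative sketch with hypothetical inputs, not output from LISREL. It assumes the common RMSEA variant with N − 1 in the denominator, whereas some programs use N.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 variant, an assumption)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def adequate_fit(cfi_value, rmsea_value):
    """Cutoffs as stated in the text: CFI above .95 and RMSEA at most .06."""
    return cfi_value > 0.95 and rmsea_value <= 0.06


# Hypothetical chi-square values, not the values reported in Table 5.
example_cfi = cfi(100.0, 50, 1066.0, 66)
example_rmsea = rmsea(100.0, 50, 201)
fits = adequate_fit(example_cfi, example_rmsea)
```

In this hypothetical case the CFI sits exactly at the cutoff and the RMSEA exceeds .06, so the model would not be judged to fit adequately.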
We used the R package “psych” (Revelle, 2015) to conduct test and item analyses and LISREL Version 9.1 (Jöreskog & Sörbom, 2012) to conduct the CFA. Although the analyses are rather straightforward, missing data complicated the situation. The number of missing values ranged from 1% for Boats to 44% for Cartoon Numbers. Our approach to handling the problem of missing data entailed the computation of maximum likelihood (ML) estimates of the mean vector and covariance matrix for the variables of interest (see, e.g., Little & Rubin, 1987). The estimates were obtained using the expectation–maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977). Application of the EM algorithm results in a mean vector and covariance matrix based on all collateral information available (Cudeck, 2000). The ML estimates of the means and covariances can be directly used in any multivariate analysis, but for practical reasons, we produced a single data set with imputed values based on the ML estimates. For each missing value, the point estimate was filled in on the basis of the ML estimates of the means and covariances (see Truxillo, 2005). To use the EM algorithm, we assumed that the data were multivariate normal and that the data were missing at random (MAR). Although simulations suggest the EM algorithm to be quite robust to violations of the multivariate normality assumption (e.g., Allison, 2006; Enders, 2001; Graham, Hofer, & MacKinnon, 1996; Graham & Schafer, 1999), we checked the skewness and kurtosis of the score distributions. As can be seen from Table 2, almost all of the univariate distributions had a skewness and kurtosis between −1.5 and +1.5. This means that the distributions can be considered sufficiently close to normal (George & Mallery, 2010; Kline, 2005; Tabachnick & Fidell, 2013).
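The normality screening described above can be sketched as follows. The Python code computes moment-based skewness and excess kurtosis and checks both against the ±1.5 bounds. It is an illustration under stated assumptions: statistical packages often report bias-adjusted variants of these statistics, so values may differ slightly from those in Table 2, and the function names and example scores are hypothetical.

```python
def skewness_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def close_to_normal(values, limit=1.5):
    """True when skewness and excess kurtosis both lie within +/- limit."""
    skew, kurt = skewness_kurtosis(values)
    return abs(skew) < limit and abs(kurt) < limit


# Hypothetical subtest scores: one symmetric and one strongly skewed set.
symmetric_scores = [1, 2, 3, 4, 5]
skewed_scores = [0] * 9 + [10]
```

For the symmetric example the skewness is 0 and the excess kurtosis is −1.3, so the check passes, whereas the skewed example fails it.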
Table 2. Descriptive Statistics for the Item–Total Correlations.
Note. P10 = 10th percentile score; P90 = 90th percentile score.
It is more difficult to check the MAR assumption. If there is no serious reason to assume nonrandomness, an erroneous assumption of MAR often has only a minor impact (Collins, Schafer, & Kam, 2001); nevertheless, we checked whether the participants with missing values differed from the participants without missing values. We compared the means of responders and nonresponders on each subtest in a series of t tests and used the Bonferroni–Holm step-down procedure to adjust the p values for multiple testing. In only 3% of the cases did the two groups differ significantly from each other. We therefore saw no reason to assume that the MAR assumption does not hold.
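The Bonferroni–Holm step-down adjustment used in this check can be sketched in a few lines of Python. The sketch takes a list of unadjusted p values, such as those from the responder versus nonresponder t tests, and returns adjusted values in the original order. The function name and the example p values are hypothetical, and the t tests themselves are not shown.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical unadjusted p values from three comparisons.
adjusted = holm_adjust([0.01, 0.04, 0.03])
share_significant = sum(p < 0.05 for p in adjusted) / len(adjusted)
```

The share of adjusted p values below .05 corresponds to the percentage of significantly different comparisons reported in the text.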
Results
Table 2 reports statistics with regard to the rit value, skewness, and kurtosis for all subtests. Because correlations have a skew distribution, the arithmetic mean of the item–total correlations of a subtest is not an appropriate reflection of the average correlation. Therefore, we first transformed rit values into Fisher’s Z-values. Next, we calculated the mean of these transformed values and subsequently transformed these back to a mean rit value.
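The averaging procedure described above corresponds to the following short Python sketch, in which each correlation is transformed with Fisher's Z (the inverse hyperbolic tangent), the transformed values are averaged, and the mean is transformed back. The function name and the example values are hypothetical.

```python
import math


def mean_correlation(correlations):
    """Average correlations via Fisher's Z transformation."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


# Hypothetical item-total correlations for one subtest.
example_mean = mean_correlation([0.10, 0.90])
```

For these two values the arithmetic mean would be .50, whereas the Fisher-based mean is approximately .66, which shows how the transformation gives more weight to correlations near the bounds.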
Because the American Homophones and Figuratives subtests could not be used with Dutch children, new sentences had to be created in translating these subtests. One of the Dutch Homophones items was too complex for the 8- to 13-year-old children participating in this study. This item involved the low-frequency word “to stare” (in Dutch: staren) to be filled in the blanks, whereas the other Homophones items involved high-frequency words (Dutch Word Frequency List, 2014). In addition, one Figuratives item showed a low rit value. With 42% of children answering the item correctly, this item was not too difficult. However, the low rit value indicated that this item did not map onto the same ability as the other items of this subtest. For both Paper Cutting and Toy Shadows, one item correlated very low (rit < .05) with subtest total scores. For Paper Cutting, correctly answering that item required children to realize that the unfolded papers were held by a person. This was a crucial element, because the cut-out pieces of paper would fall to the ground and would thus no longer be visible. The discarded item of Toy Shadows did not differ from the other items in terms of content. However, one of the multiple-choice alternatives resembled the correct answer so closely that many children chose this incorrect alternative. Because of low item–total correlations, we excluded five items of the subtest Decisions. Three of these items were irrelevant arguments that children should ignore when answering. Apparently, upper primary school children were not able to leave out these irrelevant statements. The other two excluded arguments were too ambiguous for the children to interpret. In total, we thus excluded nine items from further analyses.
Table 3 shows reliability coefficients for all Aurora-a subtests. The reliability coefficient for the analytical subtest Shapes was low (GLB = .39; λ2 = .42). This low reliability could be due to a high level of difficulty of some of the items. For 4 out of 10 items, performances were below or at chance level. We excluded the subtest Shapes from further analyses. Reliability coefficients for the other Aurora-a subtests were acceptable to good.
Table 3. Descriptive Statistics of the Aurora Subtests.
Note. λ2 = Guttman’s Lambda 2; GLB = greatest lower bound.
We excluded the analytical subtest Shapes from analyses due to the low GLB.
Table 3 furthermore presents descriptive statistics for fourth-, fifth-, and sixth-grade children separately. The percentage of missing values ranged from 44% (Cartoon Numbers) to 1% (Boats). The percentage of missing values was highest in creative subtests. We expect this to be due to the unusual format of these subtests. Especially for Cartoon Numbers, the assignment involved the unusual situation of numbers placed in a social context. In the Netherlands, however, arithmetic is taught according to the idea that mathematics must be connected to reality, stay close to children’s experience, and be relevant to society (Van den Heuvel, 2000). The Cartoon Numbers subtest might have differed too much from this format for children to answer the questions. Because a previous study showed ceiling effects in some of the Dutch subtests (Gubbels, Segers, & Verhoeven, 2014), we performed further frequency analyses. According to Terwee et al. (2007), a ceiling effect is present if more than 15% of all respondents achieved the highest possible score. Frequency analyses on the 15 Aurora subtests showed ceiling effects for the subtests Decisions, Toy Shadows, and Boats, with 29.7%, 27.0%, and 16.5%, respectively, of all children achieving the highest possible score.
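The ceiling-effect criterion of Terwee et al. (2007) cited above amounts to a simple proportion check, sketched here in Python. The function names and the example scores are hypothetical and do not reproduce the study data.

```python
def ceiling_share(scores, max_score):
    """Proportion of respondents who obtained the highest possible score."""
    return sum(1 for s in scores if s == max_score) / len(scores)


def has_ceiling_effect(scores, max_score, threshold=0.15):
    """Criterion from the text: more than 15% at the maximum score."""
    return ceiling_share(scores, max_score) > threshold


# Hypothetical total scores on a 10-point subtest.
many_at_max = [10, 10, 5, 5, 5, 5, 5, 5, 5, 5]   # 20% at the maximum
few_at_max = [10, 5, 5, 5, 5, 5, 5, 5, 5, 5]     # 10% at the maximum
```

Applied per subtest, this check would flag any subtest in which the share of maximum scores exceeds the 15% threshold.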
Table 4 presents Pearson’s correlations and disattenuated correlations between Aurora-a subtests. Based on the triarchic theory of intelligence, we expected substantial correlations between subtests within each of the three abilities. Generally, however, correlation coefficients between Aurora-a subtests were low.
Note. rs ≤ −.09 and rs ≥ .09 are significant. Pearson correlations are presented above the diagonal; disattenuated correlations are presented below the diagonal.
A CFA with three factors yielded an inadequate model fit (see Table 5). Figure 1 shows the standardized pattern coefficients for the 15 subtests. Standardized path coefficients ranged from .31 to .64 (M = .54) for the analytical, .14 to .68 (M = .38) for the creative, and .39 to .60 (M = .51) for the practical subtests. Moreover, the figure shows that the correlation between the latent factors comprising analytical and practical subtests was 1.00. In addition, the creative factor also correlated substantially (r = .83) with the analytical and the practical factor (r = .79). Results of the CFA furthermore showed high levels of error variance for all subtests. Altogether, results of the CFA indicated that the model based on Aurora-a subtest scores deviated substantially from the suggested triarchic model.
Table 5. Goodness-of-Fit Statistics for the Confirmatory Factor Analysis Models.
Note. df = degrees of freedom; CFI = comparative fit index; RMSEA = root mean square error of approximation.
Models are compared to the single-factor model.
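As a reminder of how the two indices named in the note are commonly computed from chi-square values, the sketch below uses widely cited textbook formulas. Software packages differ in details (for example, N versus N − 1 in the RMSEA denominator), and the chi-square and df values in the example are hypothetical, not the values from Table 5; only the sample size of 499 comes from the study.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """CFI = 1 - max(d_model, 0) / max(d_base, d_model, 0), with d = chi2 - df."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model, 0.0)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


# Hypothetical chi-square values for a target model and a baseline model.
print(round(rmsea(300.0, 87, 499), 3))
print(round(cfi(300.0, 87, 1200.0, 105), 3))
```

Both indices depend on how far chi-square exceeds its degrees of freedom, which is why they can remain poor even when a chi-square difference test favors one model over another.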

Figure 1. Standardized path coefficients for the 15 Aurora-a subtests in the three-factor triarchic model.
To further explicate the underlying structure of the data, we next examined the fit for a two-factor model. In light of the high correlation between the analytical and practical factors in the three-factor model, we combined analytical and practical subtests into one factor. Again, the analytical/practical and creative latent factors correlated substantially (r = .81). The fitted two-factor model showed standardized path coefficients ranging from .30 to .64 (M = .52) for the analytical/practical factor. For the creative factor, standardized path coefficients ranged from .14 to .68 (M = .38). Finally, we examined the fit for a single-factor model. Standardized path coefficients now ranged from .10 to .64 (M = .46), with the lowest standardized path coefficients found for the creative subtests. The goodness-of-fit statistics showed that neither the two-factor model nor the single-factor model adequately fit the data (see Table 5).
A chi-square difference test in which the three- and two-factor models were compared with the single-factor model revealed a significant improvement in fit for both models (see Table 5). However, the goodness-of-fit statistics did not substantially differ for the three models. In addition, results for the three-factor model indicated a very large overlap between the analytical and practical factors. Although a high correlation between the analytical/practical factor and the creative factor was found in the two-factor model as well, the low factor loadings for the creative subtests in the single-factor model seem to indicate that a second factor might be involved. Therefore, we inspected modification indices of the two-factor model’s factor loadings and changed the parameters if they resulted in a considerable improvement in fit. The fit of the adapted two-factor model presented in Figure 2 improved, although the model still did not fit the data adequately (see Table 5). The correlation between the analytical/practical and creative factors was .42. With regard to subtests’ factor loadings, the originally analytical subtest Metaphors was found to load more strongly on the creative factor. In addition, the subtests Homophones, Decisions, Book Covers, and Figuratives loaded on both the creative factor and the analytical factor.
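The chi-square difference test used for these nested comparisons can be sketched as follows. This is a minimal illustration that assumes an ordinary (unscaled) chi-square statistic; the function names and example values are hypothetical and do not reproduce Table 5.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Start from the closed forms for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # ... and climb with Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_difference_test(chi2_restricted, df_restricted, chi2_full, df_full):
    """Compare a restricted model (e.g., one factor) with a less restricted one."""
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Hypothetical fit values for a single-factor versus a three-factor model.
print(chi2_difference_test(340.0, 90, 300.0, 87))
```

A significant difference only shows that the less restricted model fits better than the restricted one; as the text notes, it does not imply that either model fits adequately in absolute terms.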

Figure 2. Standardized path coefficients for the 15 Aurora-a subtests in the adapted two-factor model.
Discussion
The aim of the present study was to investigate whether triarchic intellectual abilities can be discriminated in upper primary school children using the paper-and-pencil test of the Aurora-a Battery. Low correlations between subtests indicated that the categorization of subtests did not correspond with the original classification of analytical, creative, and practical abilities. In addition, correlations between subtests within the verbal, numerical, and spatial domains were also low. The only earlier study that addressed the structure of the Aurora-a with a CFA found an excellent fit between Aurora-a data and the triarchic model of intelligence (Aljughaiman & Ayoub, 2012). In addition, they found high factor loadings for all combined ability-domain subscales. In contrast, we performed the CFA on a subtest level and did not find support for a triarchic factor structure in the Dutch version of the Aurora-a. An adapted model with an analytical/practical and creative factor fitted the data best. An explanation for these results might be found in the design and adaptation of the subtests, yet might also indicate flaws in the underlying triarchic theory of intelligence.
First, results of the three-factor model indicated a very high correlation between the analytical and practical factors. In addition, creativity factor scores also correlated substantially with analytical and practical factor scores in both the three- and two-factor models. The high correlations between factors might be due to our sampling procedure: Our research sample mainly comprised children from high-SES backgrounds. Previous research has shown, however, that minority and economically disadvantaged students in particular profit from assessment batteries addressing a broad range of cognitive skills (Stemler et al., 2006). In a study sample with children from more diverse backgrounds, the Aurora-a might thus be found to differentiate the three types of abilities more strongly than in the current sample. However, the extremely high correlations among the three latent factors might also indicate that the theory of triarchic intelligence is flawed. Although triarchic theory considers practical abilities as essential as analytical and creative abilities, we did not find any evidence for practical abilities as a distinct type of ability.
After some adaptations were made based on modification indices, the fit of the two-factor model was better than that of the three-factor model, yet still inadequate. The substantial factor loadings found for analytical and creative subtests were supportive of Guilford’s (1959) distinction between convergent and divergent thinking. This distinction is also acknowledged in many of the more recent models of intelligence and giftedness. In the CHC model of intelligence (McGrew, 1997), for example, some of the narrow abilities in the long-term storage and retrieval component are related to divergent thinking abilities. Similarly, the differentiated model of giftedness and talent of Gagné (2004) considers creativity and intellectual abilities as two types of natural abilities or gifts. The models of Renzulli (1986) and Mönks and van Boxtel (1985), on the other hand, include creativity and intellectual abilities as distinct, yet associated determinants in reaching gifted levels of performance. Although the terminology and exact role of both types of abilities vary across theories, most theories acknowledge that both analytical and creative abilities play a role in the intellectual development of children. The role of practical abilities is, however, less well evidenced and was not supported in the present study.
With regard to the design of the subtests, the type of assessment might diverge from the targeted ability. Researchers often use tasks in which children have to find as many responses as possible in a limited time as an assessment of children’s creative abilities (Lubart, Pacteau, Jacquet, & Caroff, 2010). In our results, substantial correlations were found between Aurora-a subtests that resembled these divergent thinking tasks. In the Multiple Uses subtest, children had to write down as many applications of common objects as they could think of. Similarly, the subtests Metaphors and Conversations required children to find as many similarities and conversational expressions as possible, respectively. Although Metaphors originally belonged to the analytical domain, this subtest strongly resembled the Unusual Uses subtest from the Torrance Tests of Creative Thinking, which was designed to assess creative abilities (Sternberg, 1998). Practical abilities, on the other hand, can be assessed with either tacit knowledge tests or practical ability inventories. In previous studies using these inventories, both Heng (2000) and Cianciolo et al. (2006) found an overlap between general academic abilities and practical abilities. Of the practical subtests, Toy Shadows and Decisions were the only subtests that partly matched the tacit-knowledge format of judging real-life situations. With regard to Paper Cutting, a similar subtest is included in the well-established Stanford–Binet Intelligence Scales (Terman & Merrill, 1960) as an assessment of abstract visual reasoning. The design of the practical subtests might thus have resembled general intelligence test formats too closely to discriminate between the two types of abilities.
As a final point for discussion, it should be noted that there was little room for improvement in some children. High subtest scores were found earlier in a Dutch sample of gifted upper primary school children (Gubbels et al., 2014). The present study extends these findings, showing maximum scores in a number of children in a more heterogeneous sample including 10- to 12-year-old children of all intelligence levels. None of the earlier studies evaluating the Aurora-a Battery reported descriptive statistics, so it is unclear how children in other countries scored on the 17 Aurora-a subtests. Further research is needed to examine the presence of ceiling effects in non-Dutch children and the role of cultural differences herein.
The present study has some limitations. First, the current study addressed scores of a Dutch translation of the Aurora-a. Although we tried to translate subtests as strictly as possible, we cannot be sure that the Dutch version was similar to the original version or the Arabic version used by Aljughaiman and Ayoub (2012) with respect to psychometric qualities. In addition, cultural differences between the Dutch and Arabic samples might limit the comparability of both studies. Second, we did not take into account individual variation in ability profiles. A study by Kornilov et al. (2011) demonstrated that some individuals show rather flat intelligence patterns with no apparent strengths or weaknesses, whereas others clearly excel in one of the abilities. In addition, Lohman, Gambrell, and Lakin (2008) showed that high-ability children show extreme discrepancies in abilities more often than average-ability children. Third, we did not include any school achievement tests, so the association between intellectual abilities and academic performance cannot be elucidated. Finally, data were cross-sectional, so conclusions on the development of intellectual abilities across ages could not be drawn. The development of these abilities and their relation to school achievement could be addressed in future research using a longitudinal design.
To conclude, results from the present study showed that the Dutch version of the Aurora-a Battery did not accurately represent the underlying triarchic theory of intelligence, yet did differentiate analytical and creative ability scores. These findings might indicate deficiencies either in the triarchic theory or in the design of the Aurora-a Battery. Researchers should sustain their current efforts to evaluate the validity of various theories of intelligence and develop theory-based assessment instruments.
Footnotes
Acknowledgements
We would like to thank Dr. Elena Grigorenko and her laboratory at the Yale Child Study Center for sharing the battery and offering their support. Furthermore, we are especially grateful to Dr. Els Schrover from the Centre for the Study of Giftedness for translating all Aurora-a subtests to Dutch.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Netherlands Organisation for Scientific Research (Grant No. 411-12-621).
