Abstract
A popular conception of the “intelligence,” or g, thought to be measured by IQ tests, is that of a cognitive “strength” variable that facilitates complex cognition such as reasoning and problem solving. Yet test items seem remarkably un-complex when compared with everyday cognition. Here, typical verbal and non-verbal test items are examined and arguments asserting their complexity are challenged. In contrast, several lines of research indicate how “real life” cognition is much more complex than that required by such items. The claim that an IQ-job performance correlation is stronger for more complex jobs is also challenged. This leads to the suggestion that other sources of variance, including cultural, affective, and other non-cognitive factors, may explain differences in test performance. An alternative explanation for the still-puzzling “Flynn effect” is proposed, with the idea that IQ differences reflect cultural “distance” (from possibly equal, but different, complexities) rather than a universal cognitive “strength.”
Individual differences in IQ test performance are often thought to reflect variation in ability for complex cognition. Such ability is itself thought to differentiate as a kind of cognitive energy, capacity, or “strength” variable, the metaphor underlying the concept of “g”. For example, Gottfredson (2002) suggests that a “useful working definition of g for understanding everyday competence is therefore the ability to deal with complexity” (p. 29). More recently, Gottfredson (2007, “Measure vs. the construct”, para. 1) claimed that “The active ingredient in tests of intelligence is the complexity of their items … that makes some more difficult than others (more abstract, more distracting information, require inferences, etc.).” She provides examples of such complexity, as seen in test items: e.g., solving a 3x3 matrix item compared with a 2x2 matrix; reproducing a nine-block pattern compared with a four-block pattern; describing the similarity of the words “seed – egg,” compared with “pear – apple.”
But how sure can we be that individual differences found in IQ test performances really are of complex cognitive ability, rather than, say, equal, but different, complexities, perhaps reflecting differences in knowledge, cognitive structures, or other learned resources acquired in different cultures or contexts? To be sure, this is not a question that challenges the existence of g, a concept arising from correlations among test performance; rather it’s about how to describe and characterize the true source of such correlations (and thus the nature of g). It is assumed that the “g-theory” of IQ test performance is so well-known and widely disseminated (e.g., Mackintosh, 2011) as to not require detailed description here. But, as with any science, theoretical progress depends on progressive clarification of concepts. That the area sorely needs it is seen, for example, in Deary’s remark in a well-known introductory text on intelligence, that “There is no such thing as a theory of human intelligence differences—not in the way that grown-up sciences like physics or chemistry have theories” (Deary, 2001, p. ix). A problem constantly overlooked, for example, is that the “complexity” ingredient is not obvious in all the contents of the most-used IQ tests, like the Stanford–Binet and the Wechsler scales. Indeed, typical IQ test items seem remarkably un-complex in their cognitive demands compared with, say, the cognitive demands of ordinary social life and other everyday activities that the vast majority of children and adults can meet.
For example, to take the most extreme cases, the tests include numerous simple questions like “What is the boiling point of water,” “Who wrote Hamlet,” “In what continent is Egypt,” and so on. These demand little more than rote reproduction of factual knowledge most likely acquired from experience at home or by deliberate teaching in school. Much research indicates how opportunities and pressures for acquiring such valued pieces of information—from books in the home to parents’ interests and educational level—are more likely to be found in middle-class than working-class or ethnic minority homes. This helps explain why differences in home background correlate highly with school performance (Mackintosh, 2011). So, individual differences on such items are more likely to stem from differences in cultural and family background than ability for complex cognition. In other words, the variance underlying different responses to such items could be described as reflecting cognitive “distance” (from the specific learning demanded by the items) rather than a more general cognitive “strength” needed to deal with item complexity. It seems interesting and important to check whether such doubts about test complexity apply to other typical test items. Such checks could not, of course, be covered exhaustively in a single short paper. Those that follow merely illustrate some possibilities for further discussion.
Complexity in verbal items?
It can be argued that individual performances on common verbal test items, such as vocabulary (word definitions), similarities (describing how two things are the same), and comprehension (explaining common cultural phenomena such as why doctors need more training) most clearly reflect culturally acquired knowledge or processing as the real source of variance. Gottfredson (1997), however, argues that there is more than meets the eye in such simple-looking test items. For example, she argues that even a simple vocabulary test is one of “a highly general capacity for comprehending and manipulating information … a process of distinguishing and generalizing concepts” (p. 94). However, that argument must surely apply to all word definitions, not just those devised by an item designer from a specific culture. After all, items are selected for inclusion in a test not on the basis of the complexity of cognition they demand, but by virtue of agreement, on initial trials, with other test items (Anastasi, 1990). That is, they are selected because they tend to favour testees who also do well on other items—for whatever reason. This is complexity by imputation not demonstration.
In other words, we need to know in what ways the few items chosen demand greater cognitive complexity than that evident in other words in other cultures. As an example, take the Scots word “canny” (meaning shrewd), which has subtle differences from the same word used in north-east England (meaning alright, “OK”). In these different contexts children are assimilating equally complex conceptual meanings, yet the test item would elicit correct answers from one group, incorrect from the other. Again the true source of variance in apparent complex cognition is really one of cognitive distance rather than relative cognitive strength.
Likewise with “similarities” items. Gottfredson (1997) suggests that being able to say why a pair of words like “fly – tree” are alike tests the learning of more abstract knowledge than the pair “orange – banana,” and that such learning has required a greater ability for complex cognition. That sounds reasonable, but it doesn’t necessarily mean that children who fail that item are incapable of using the taxonomically higher category. Such conceptual hierarchies develop in nearly all children, as well as adults. Rosch (1978) described the development of taxonomic (subordinate, basic, and superordinate) schemes of object concepts from a young age. For example, virtually all children from grades 3 and 5 were able to sort pictures according to superordinate categories. Similarly, Walsh, Richardson, and Faulkner (1993) found that even 4-year-olds paired objects from triads on the basis of their superordinate taxonomic category (e.g., pictures of swan and elephant) rather than perceptual similarity (e.g., pictures of elephant and kettle), a tendency that increased with age. This suggests that differences between children in whether or not they can bring such conceptual structure to bear in describing object similarities are deeply experiential and, therefore, possibly culture bound. Of course, it is possible that some children will acquire the superordinate-level concepts earlier or faster than others, but this needs to be objectively established on culturally appropriate items.
Other IQ test items require that culturally distinct forms of language have been acquired, rather than a distinct ability for complex cognition as such. Take, for example, analogical reasoning items like: black is to white as night is to … ? (day).
Such items have been taken to illustrate Spearman’s (1927) theory of intelligence as the “eduction of relations and correlates” (p. 172; a relationship is induced from the first part and the answer deduced by mapping the relationship onto the second part). However, as Goswami (1992, 2007) discovered, failure on these items could arise from lack of knowledge of the relations being tested. The solution is, she suggests, to design items based on relations equally familiar to all group(s) being tested. Then we will know that they are truly being tested for complexity of cognition and not culturally related experience. When this is done, indeed, it transpires that even very young children are capable of “complex” analogical reasoning (Goswami, 2007).
Such items also presume culturally specific ways of linguistically coding the semantic relations that constrain ways of engaging with and thinking about problems. Children who fail such a task may do so because of lack of familiarity with the linguistic form “is to … as,” rather than being unable to realize that day is opposite to night in the way that white is opposite to black (Olson, 2005).
So a mere assertion of complexity about IQ test items based on a surface appearance may be too simplistic. As Gottfredson (1997) indeed admits, “We lack systematic task analyses of IQ tests, partly because their development has traditionally been guided by empirical procedures (e.g., what discriminates best between individuals who are considered gifted, average, and retarded), rather than by theoretical considerations” (p. 96). All other discussion of complexity in this context, and of the source of variance underlying it, is informal or impressionistic, usually without regard for cultural contextualization and differentiation.
Complexity in non-verbal items?
One attempted solution to these problems with verbal items has been to use non-verbal test items. The classic examples, in this regard, are probably the Raven’s Progressive Matrices (or just the “Raven”), often described as a test of “pure g” (Jensen, 1980). The example in Figure 1 illustrates the induction of a relationship (a covariation) across the rows, and the relationship being conditioned down the rows. The conditioned covariation can then be used to identify the missing element from the given options (requiring what is sometimes called conditional reasoning). The nature of the covariations (usually called “rules”) varies from item to item, and can vary in complexity in the form of conditional depth or abstraction: e.g., more elements changing, or nonlinearities in the conditioning.

Figure 1. A matrix item typical of a test like the Raven.
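The logic of such an item can be sketched in a few lines of code. The toy item below is hypothetical and is not taken from the Raven: each cell is reduced to a count of elements, the count changes by a constant step across a row (the covariation), and the step itself changes from one row to the next (the conditioning). The function name, the grid, and the answer options are all assumptions made for illustration.

```python
def solve_matrix(grid, options):
    """Complete a 3x3 grid of counts whose bottom-right cell is missing.

    Assumed rule structure (illustrative only): within a row the count
    changes by a constant step, and that step changes by a constant
    amount from each row to the next.
    """
    # Induce the step across each row from its first two cells.
    steps = [row[1] - row[0] for row in grid]

    # Check that the two complete rows really follow a constant step.
    for r in range(2):
        if grid[r][2] - grid[r][1] != steps[r]:
            raise ValueError("row %d does not follow a constant step" % r)

    # Induce how the step is conditioned by row position.
    step_change = steps[1] - steps[0]
    if steps[2] != steps[1] + step_change:
        raise ValueError("third row does not follow the conditioned step")

    # Map the conditioned covariation onto the incomplete row.
    predicted = grid[2][1] + steps[2]
    if predicted not in options:
        raise ValueError("predicted value is not among the options")
    return predicted


# Hypothetical item: steps of 1, 2, and 3 in successive rows.
item = [[1, 2, 3],
        [2, 4, 6],
        [3, 6, None]]
options = [7, 8, 9, 12]
answer = solve_matrix(item, options)
```

Even with the conditioning included, the whole procedure involves two inductions and one prediction over a handful of values, which gives some sense of how limited the formal demands of such an item are.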
As with other IQ test items, there has actually been little in-depth analysis of the cognitive demands of the Raven. What there has been suggests that, again, the Raven is surprisingly un-complex in its cognitive demands, compared with everyday cognition. For example, in perhaps the most thorough analysis to date, Carpenter, Just, and Shell (1990) found little evidence that “level of abstraction,” as in the complexity of rules, influences item difficulty.
One factor in discriminating between subjects appeared to be numbers of different rules involved. This is sometimes explained in terms of another construct, “working memory.” As the number of rules increases, working memory load and “goal management” demands are said to increase: that is, more rules must be maintained in working memory while the conditioning is abstracted and the best fit then predicted. Indeed, Embretson (1995, 1998), using factor analysis of scores, claimed that “working memory load” (using a formula based on numbers of rules in items) and “general control processes” accounted for 92% of the variance in test performances. A large body of other data has been interpreted as supporting a substantial relationship between working memory capacity and g (Ackerman, Beier, & Boyle, 2005; Nisbett et al., 2012; Primi, 2001; Shipstead, Redick, & Engle, 2010). So the source of variance in Raven test performance is described as the capacity to manage sub-goals in multiple channels of processing simultaneously.
On the other hand, using a design aimed at separating sources of variance, Primi (2001) found little evidence for the role of goal management or “central executive processes.” Unsworth and Engle (2005) found that the association between working memory span and Raven scores was fairly constant across levels of item difficulty, memory load, and rule type. Thus they concluded that “something other” than the number of things that can be held in memory is involved in performance on the Raven. Some possibilities will be discussed below. This illustrates the point that there is a great deal of confusion about the nature of working memory. Indeed, Nisbett et al. (2012) concluded, in their review, that the definition of working memory remains disputed and relations with IQ or g are still uncertain.
Complex cognition in real life
One problem surrounding descriptions of “what IQ tests test” is uncertainty about how to describe complex problems and the true nature of the cognition needed to deal with them. As Eysenck and Keane (1990) put it: “There is a strong sense in which thinking research has failed to capture the dynamic qualities of everyday thought” (p. 462). Richardson (2010, 2013) has suggested an information theoretic approach in which complexity can be equated with depth of covariation between increasing numbers of spatio-temporal variables, or so-called “mutual information” (see also Prokopenko, Boschetti, & Ryan, 2009). Systems for abstraction of such complexity appeared very early in biological evolution. Absence of such clear understanding means that, as Eysenck (2004) put it, “There is an apparent contradiction between our ability to deal effectively with our everyday environment, and our failure to perform well on many laboratory reasoning tasks” (p. 104).
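The notion of “depth of covariation” can be made concrete with a small sketch. The code below uses the standard Shannon definition of mutual information, which is an assumption here and not necessarily the exact measure used in the works cited; the variable names and the toy data are hypothetical. In the example, y is unrelated to x taken alone, but is fully determined once x is conditioned by a third variable z.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Shannon mutual information, in bits, between the members of each pair."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (count / n) * log2((count / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), count in joint.items()
    )


# Hypothetical observations (x, z, y): y covaries with x only conditionally on z.
samples = [(x, z, x ^ z) for x in (0, 1) for z in (0, 1)]

# Information about y carried by x alone (the "shallow" covariation).
shallow = mutual_information([(x, y) for x, z, y in samples])

# Information about y carried by x and z jointly (the "deeper" covariation).
deep = mutual_information([((x, z), y) for x, z, y in samples])
```

Here x alone carries 0 bits of information about y, while x and z together carry 1 bit: all of the predictable structure lies in the conditioned relationship. Everyday environments plausibly involve many more variables and deeper conditioning than this two-level toy case.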
On the other hand abundant cognitive research suggests that everyday, “real life” problem solving, carried out by the vast majority of people, especially in social-cooperative situations, is a great deal more complex than that required by IQ test items, including those in the Raven. Indeed, it seems that the environments experienced by all living things are highly changeable and complex in a more dynamic sense than static test items would suggest, combining many more variables, changing over time, and interacting (Green & Sadedin, 2005). Accordingly, the eduction of relations and correlates is exhibited in quite simple brains like those of honeybees and fruit flies (Miller, 2009; for review see Richardson, 2010). In humans, all ordinary social life demands highly complex cognition because “other individuals are among the most complex, novel, changing, active, demanding, and unpredictable objects in our environments” (Gottfredson, 1997, p. 107). In other words, highly complex cognition is intrinsic to everyday human social life and is experienced in children from birth.
Unsurprisingly, therefore, complex cognition is evident even in very young children. Schulz and Gopnik (2004), for example, presented problems to preschool children requiring construction of a complex mental matrix of conditioned covariations, sub-goal management in working memory, and executive functions. “Children in the test condition had to keep track of a complex pattern of conditional dependence and independence” (p. 166); yet they made the correct inferences most of the time. More recently it has been shown that even young infants are capable of such complex conditional reasoning (Gweon & Schulz, 2011; Téglás et al., 2011). It is not at all clear, therefore, in what ways variation in the Raven—or other test—performances can, in older children and adults, be attributed to differences in complex cognition, when these complex processes seem pedestrian in infancy.
Compounding this paradox is the evidence in adults that IQ scores may be unrelated to the ability to carry out complex problem solving in social and practical situations. In a study of betting at a race course, Ceci and Liker (1986) found punters’ predictions of odds to be a sophisticated cognitive process, entailing values on up to 11 variables, with non-linearities and complex interactions between them. They found individuals’ accuracy at such predictions to be unrelated to IQ test scores. Studies with artificial grammars, in which participants must learn deeply abstract rules in order to make accurate predictions, revealed little relationship between performance and IQ (Gebauer & Mackintosh, 2007; Reber, Walkenfeld, & Hernstadt, 1991). Similar negative results have emerged from studies using simulated factory-production or public-service systems, involving large numbers of variables organized according to complex equations, mostly opaque to the subject who has to regulate the system (for review see Beckmann & Guthke, 1995). Sternberg, Grigorenko, and Bundy (2001) refer to several studies with managers, salespersons, and university professors, showing that IQ-type test scores have little if any correlation with performances on the kinds of tasks they regularly encounter in their jobs. In a more recent review of such studies, Wenke, Frensch, and Funke (2005) conclude that “no convincing empirical evidence exists that would support a causal relation between complex explicit or implicit problem solving competence, on the one hand, and global intelligence on the other hand” (p. 181).
IQ and job complexity
Note that, like others, Gottfredson (1997) relies heavily on correlations between IQ (type) test performances and “job complexity” (e.g., manual, clerical, professional) on the grounds that jobs vary in “the complexity of their information processing demands” (Gottfredson, 2002, p. 30). This conclusion is based on Hunter and Hunter’s (1984) meta-analysis of studies carried out before the 1970s. Although often repeated in the literature, the conclusion has been challenged on a number of grounds, especially the merits of the corrections applied to the raw data (see contributions in Murphy, 2003). In particular, the subsequent meta-analyses by Hartigan and Wigdor (1989), on more recent data, found the IQ-job performance correlations not only to be weaker generally, but also to covary much more weakly with job complexity (see Table 1).
Table 1. Correlations from older and more recent meta-analyses.
UC = uncorrected; C = corrected (data from Hartigan & Wigdor, 1989; Hunter & Hunter, 1984). In Hunter’s two classification schemes I is “precision setup” group (e.g., machinist, cabinetmaker, metal fabricator); II is “feeding-offbearing” group (e.g., shrimp picker, corn-husking machine operator, cannery worker); III is “high complexity” (e.g., retail food manager, fish and game warden, biologist, city circulation manager); IV is “medium complexity” (e.g., automotive mechanic, radiologic technician, automotive parts counterman, high school teacher); V is “low complexity” (e.g. assembler, insulating machine operator, forklift truck operator).
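To show why the gap between uncorrected and corrected coefficients matters, two standard psychometric corrections are given below. They are textbook formulas offered as an assumption about the kind of correction at issue; the exact procedures and parameter values used in the cited meta-analyses are not reproduced here. In the notation, r_xy is the observed test-performance correlation, r_yy is the assumed reliability of the job-performance criterion, and U is the ratio of the unrestricted to the restricted standard deviation of test scores.

```latex
% Correction for unreliability of the criterion (attenuation)
r_{c} = \frac{r_{xy}}{\sqrt{r_{yy}}}

% Correction for direct range restriction on the predictor
r_{c} = \frac{U \, r_{xy}}{\sqrt{1 + r_{xy}^{2} \, (U^{2} - 1)}},
\qquad U = \frac{S_{x}}{s_{x}}
```

With purely hypothetical values, an observed correlation of 0.25 and an assumed criterion reliability of 0.60 give a corrected value of 0.25 / √0.60, or about 0.32. The corrected figure therefore depends directly on values that usually have to be estimated or assumed, not measured in the samples themselves.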
Of course, a (debatable) correlation between IQ and job performance that itself increases with job complexity doesn’t necessarily mean that the true source of such covariance is causal; nor does the correlation describe its true source. As we have seen, the demands of IQ tests seem very un-complex compared with nearly all jobs, suggesting that there may be other sources of variance underlying differences in test performance. Some possibilities are examined in the next section.
Learned cognitive structures
A little further analysis suggests that, as with “vocabulary” or “similarities” sub-tests, what the Raven may actually test is the presence of knowledge structures more familiar to some (sub)cultures than others. The manipulation of symbols (e.g., words, numbers) in two-dimensional arrays on paper is a “cultural tool” in the western world. It is applied to record sheets, tables with rows and columns of totals and subtotals, spreadsheets, timetables, and so on, as well as textual material. These nearly all require the reading of symbols from top left to bottom right, additions, subtractions, and substitutions of numbers or other symbols across columns and down rows, and the deduction of new information from them. As the analyses of Carpenter et al. (1990) show, these are precisely the kinds of manipulations (or “rules”) built into the Raven’s items (see Figure 1).
Experience with, or exposure to, such tools varies significantly across cultures and social classes, differentially preparing individuals for performance on items (including the Raven) reflecting them. As Luria (1976) explained, “The structure of thought depends upon the structure of the dominant types of activity in different cultures” (pp. xiv–xv). Or, as Nisbett (2003) puts it, “Differences in thought stem from differences in social practices” (p. 203).
Dasen and Mishra (2013) show that social class is related to culture in a deeper sense than the physical and social contexts in which the child lives. Families and sub-cultures vary in their exposure to, and usage of, the tools of literacy, numeracy, and other “ways of thinking,” that will prepare children for the culturally more specific content of IQ tests (Cole & Cagigas, 2010; Malda, Van de Vijver, & Temane, 2010). We now also know from brain imaging and other studies, how background experience with specific cultural tools or procedures results in changes in brain networks that differentially prepare individuals for given cognitive tasks (Han & Northoff, 2008; May, 2011; Woollett & Maguire, 2011). Such effects have been taken to explain the importance of social class context to cognitive demands (Hackman, Farah, & Meaney, 2010).
Occasionally, such cultural dependence is acknowledged by test users, as when Raven test items administered to participants in Kuwait “were transposed to read from right to left following the custom of Arabic writing” (Abdel-Khalek & Raven, 2006). And, as we shall see below, changes in cognitive aspects of, for example, mathematics teaching in schools are thought to be related to massive average gains in Raven scores, as in the so-called Flynn effect. Performance on the Raven’s test, in other words, is a question, not of inducing rules from meaningless symbols de novo, but of recruiting ones already acquired, or not, through the cognitive activities of some cultures more than others.
Again, in this view, rather than IQ being a “strength” measure of a context-free computational device, it is really a “distance” metric, expressing individuals’ socio-cultural parity/disparity with the cognitive structures incorporated by test designers into the test items. Bridging such a distance is probably what happens when children are adopted from lower class to middle or upper class homes and they gain as much as 12–18 IQ points (Nisbett et al., 2012), or move into middle class occupations, or gain other exposure to test-related cultural tools, and exhibit the “Flynn effect” (see below).
Fixed cognitive complexity?
Although IQ is often posed as a measure of a durable individual ability for complex cognition, evidence now suggests that this is not the case. Of course there is a modest average correlation of IQs at one age with IQs at a later age, depending on the gap between measures. But this may be no more than a measure of the degree to which individuals’ circumstances have remained unchanged (exactly the same may be said about language dialect, or personal antibody profiles, for example). We now know that sizable IQ changes can take place within individuals.
For example, Sigelman and Rider (2011) reported the IQs of one group of children tested at regular intervals between the ages of 2 years and 17 years. The average difference between a child’s highest and lowest scores was 28.5 points, with almost one third showing changes of more than 30 points (on a scale with a mean of 100). Ramsden, Richardson, Josse, and Thomas (2011) showed how changes in individual IQs across the teenage years, in their sample, ranged from minus 18 to plus 21, with 39% of the total sample showing statistically significant change.
Consistent with such observations is the frequent demonstration of the trainability of IQ, even on “fluid” (roughly non-verbal) intelligence, such as working memory tasks, thought to best reflect levels of g, and, therefore, expected to be relatively stable. For example, Jaeggi, Buschkuehl, Jonides, and Perrig (2008) trained adults (mean age 26 years) on a working memory (WM) task, requiring the ability to hold and manipulate information in the mind for a short period of time. They found a substantial “dose-dependent” transfer to performance on a matrices test like the Raven. This triggered other studies, although critical challenges have arisen over aspects of study designs and inconsistencies of results. For example, in a thorough experiment, Redick et al. (2013) found little improvement in Raven’s Progressive Matrices (RPM) scores following WM training. However, in a more recent, better designed study, Jaeggi, Buschkuehl, Jonides, and Sha (2012) confirmed their original results, but emphasized how transfer effects are critically dependent upon initial training performance. It is also intriguing that the WM training group in the Redick et al. (2013) study reported enhanced everyday cognitive functioning compared with control groups.
This debate goes on. But then, as Redick et al. (2013) say, the problem is that we don’t really know what is being trained in WM training (see also Shipstead, Redick, & Engle, 2012). Interestingly, Morrison and Chein (2011) showed how “core” training in WM, emphasizing attentional focus and avoidance of distraction (ignoring irrelevant information), produced “far reaching” transfer effects. Redick et al. (2013) found that, in subjects where transfer did occur, transfer did not depend on initial WM or fluid intelligence scores, and so suggested that “perhaps our lack of transfer represented a lack of motivation by our participants … (which) would severely impact our ability to detect improvements as a function of training” (p. 41). And a number of studies show how factors, even physical exercise, that improve sense of well-being also improve memory and cognitive test scores (e.g., Hillman, Erickson, & Kramer, 2008). In other words, much of the WM training variance could be affective rather than cognitive in origin. As discussed further below, affective factors will influence distractibility, attention, persistence, and other aspects of performance on all training and testing.
Other improvements in IQ related to relevant complex experience have been reported. For example, in one study, a single repeated experience with an abstract reasoning test produced an average score gain of 0.38 SDs (about 6 IQ points; Dunlop, Morrison, & Cordery, 2011). Salthouse (2013) shows that cognitive test experience at any age improves performance at a later age, irrespective of age of participants (from 18–80 years). Moreno et al. (2011) showed that even computerized training on music in preschool children boosted IQ test performance in over 90% of the sample. Such improvements in experience-dependent test performance may offer further clues as to the variance basis of IQ. Again, such results could indicate that something other than a general ability for complex cognition accounts for much of the variance in IQ test performance.
Non-cognitive sources of variance
It is possible that much of the source of variance in IQ test performance, and, therefore in g, does not stem from ability for complex cognition at all. As Mackintosh (2004) reminded us, “g reflects no more (and no less) than the indisputable fact that scores on all IQ tests are positively correlated. Equally indisputably, however, we have little idea of the reason(s) for this positive manifold” (p. 217). Jensen (1998) concurs that “g tells us little if anything about contents” (p. 92). Spearman’s original hypothesis about g was based on inter-correlations between school attainments on various subjects. But surveys consistently show that the biggest influence on school performance is parental support. Parental interest and encouragement, their educational and occupational aspirations for their children, maternal involvement in play, provision of opportunities for learning, and so on, show correlations of 0.6–0.7 with IQ (Mackintosh, 2011). A child who is being motivated or “pushed” by parents will tend to put in above-average effort in all subjects, and conversely for those who are poorly motivated. This factor alone could explain inter-correlations across subjects.
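The claim that a single motivational factor could produce inter-correlations across subjects can be illustrated with a toy simulation. The model below is an assumption made purely for illustration: each child’s score in a subject is the sum of one shared “effort” term and independent subject-specific noise, with no cognitive ability term at all. The function names, sample size, and variances are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_subject_scores(n_children=2000, n_subjects=5, seed=0):
    """Return one list of scores per subject under the assumed toy model."""
    rng = random.Random(seed)
    scores = [[] for _ in range(n_subjects)]
    for _ in range(n_children):
        effort = rng.gauss(0, 1)  # shared, non-cognitive factor
        for subject in range(n_subjects):
            # Subject-specific influences, independent of effort.
            scores[subject].append(effort + rng.gauss(0, 1))
    return scores


scores = simulate_subject_scores()
correlations = [
    pearson(scores[a], scores[b])
    for a, b in combinations(range(len(scores)), 2)
]
print(min(correlations), max(correlations))
```

With equal variances for effort and noise, every pairwise correlation is expected to be close to 0.5, so a uniformly positive pattern of correlations emerges from a factor that has nothing to do with complex cognition. This shows only that such a factor would be sufficient in principle, not that it is the actual source of the correlations observed in real data.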
In addition, various affective factors can determine engagement with, and performance on, cognitive tests. Levels of self-confidence, stress, motivation, and anxiety, and general physical and mental vigor, all affect education, job, and cognitive test performances which will, therefore, tend to inter-correlate (Derakshan & Eysenck, 2009; Wilkinson & Pickett, 2009). Coy, O’Brien, Tabaczynski, Northern, and Carels (2011) show how anxiety in test takers reduces performance through cognitive interference (see also Lang & Lang, 2011). Experiments by Zanto and Gazzaley (2009) show that the primary determinant of working memory performance (often assumed to be the basis of differences in IQ test performance) is not maintenance and manipulation of information; rather it is suppression of irrelevant information, or distraction, which is highly sensitive to anxiety and level of motivation.
Motivation has been shown to be a significant factor in other research. Duckworth, Quinn, Lynam, Loeber, and Stouthamer-Loeber (2011) found that “incentives increased IQ scores by an average of 0.64 SD, with larger effects for individuals with lower baseline IQ scores” and that, after adjusting for the influence of test motivation, “the predictive validity of intelligence for life outcomes was significantly diminished, particularly for nonacademic outcomes” (p. 7716). Lovaglia, Lucas, Houser, Thye, and Markovsky (1998) predicted that status consciousness alone will produce differences in IQ test performance between high-status and low-status individuals. In three controlled experiments, participants were randomly assigned to low or high status groups. The latter consistently scored up to seven points higher on the Raven test.
As with knowledge structures, mentioned above, such affective factors are closely related to social class background. Schaffer (1996) argued that class/cultural experiences create specific “belief systems” in parents that determine parental rearing practices. Such belief systems include powerful self-evaluations of personal cognitive competence, or cognitive self-efficacy beliefs (Bandura, 1997; Ward, 2012). More to the point, these tendencies stem from perceptions of place in a social order and extent of control in, and cognitive engagement with, that order.
Those who lack power tend to feel lack of control over superordinate goals and values, think in a less abstract way than they are capable of, and merely “view themselves as the means for other people’s goals” (Smith, Jostmann, Galinsky, & Van Dijk, 2008, p. 446), also resulting in loss of motivation. Such feelings and perceptions are, in turn, closely related to the extent of inequality in societies, such that the greater the degree of social inequality the more disparate the self-perceptions (Loughnan et al., 2011; Wilkinson & Pickett, 2009). In a series of experiments with working-memory-type tasks, Smith et al. (2008) showed how a sense of powerlessness tends to make individuals more vulnerable to performance decrements during complex executive tasks. Such depression of social power engagement (or its converse, the inflated self-perceptions of more powerful classes) is said to explain the link between socio-economic status (SES) and cognitive and brain development (Hackman et al., 2010; Loughnan et al., 2011).
The cognitive consequences of social power relations in families are likely to be (socially) transmitted to children in two senses. Parents who have socially acquired doubts about their own abilities are less likely to encourage those abilities in their children (Dweck, 2008). The children are then more likely to avoid cognitive engagement with unfamiliar or challenging problem-solving situations (Ahmavaara & Houston, 2007). Moreover, biological effects of stress on parents, as in future response and avoidance tendencies, may also be transmitted epigenetically to children by altering the transcription of genes involved in physiological stress management, so that offspring will tend to underperform in stressful situations (Daxinger & Whitelaw, 2012).
Such power-based affective factors are rarely taken into account when explaining variance in IQ scores, or seeking causes in IQ correlations. Yet they may constitute much of the general factor (that is, g) arising from inter-correlations of test scores. Consistent with this point is that inter-correlations between tests are stronger in the lower score ranges than in the higher (Evans, 2000), whereas the opposite might be expected. Because, according to this view, IQ score differences are, in effect, merely a re-description of the distribution of social power, with its cognitive-affective consequences, such power-based affective factors may also explain well-known correlations between IQ and many other consequences of that power structure, such as social class, health, life expectancy, crime, and so on (Gottfredson & Deary, 2004).
Note that this “power factor,” and its relationship with social class, is unlikely to be controlled or factored out from correlations by indices of SES, which tend to be rather crude. Such attempts have nearly all used father’s occupation at a particular time in life as an index of social class during development. For example, Gottfredson and Deary (2004) merely use occupation or relative affluence of area of residence at mid-life, which may be little guide to the real psychology of families and to changes over time. Marriages are often cross-SES: it is usually the father’s occupation that is taken as the family SES, but it is mothers’ belief systems that are most related to their children’s cognitive development. Families of the same SES living in the same neighborhood can vary enormously in terms of such belief systems.
Moreover, social mobility is such that current SES may be a relatively recent status. In the (British) National Child Development Study (Fogelman, 1983) 16% of fathers had changed their SES in the short period from their children being 7 to 11 years of age. And children in the same family may be treated quite differently from one another. As Noble, Farah, and McCandliss (2006, p. 363) say, SES is itself a complex construct with largely enigmatic interactions with other influences on cognitive development and school attainments. All these social dynamics will render SES only a weak index of the social “power” factor most related to IQ variance, and reduce the IQ-SES correlation. In fact, it could be that IQ is itself a more subtle index of the social class/cultural influences that really matter in cognitive development. This could explain why the association between SES and academic achievement is slight when a measure of cognitive ability is partialled out (Diniz, Pocinho, & Almeida, 2011).
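The notion of “partialling out” a variable can be made concrete with the standard first-order partial correlation formula. The following is only a minimal sketch: the function name and the three correlation values are hypothetical, chosen for illustration, and are not taken from Diniz et al. (2011) or any other cited study.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y, controlling for z.

    Standard formula: (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical correlations, for illustration only (not from any cited study).
R_SES_ACHIEVEMENT = 0.30      # x = SES, y = academic achievement
R_SES_ABILITY = 0.40          # x = SES, z = cognitive ability score
R_ACHIEVEMENT_ABILITY = 0.60  # y = academic achievement, z = cognitive ability score

residual = partial_correlation(
    R_SES_ACHIEVEMENT, R_SES_ABILITY, R_ACHIEVEMENT_ABILITY
)
print(f"SES-achievement correlation with ability partialled out: {residual:.3f}")
```

With these illustrative inputs the SES-achievement association falls from .30 to below .10 once the ability score is controlled. On the argument above, that pattern would be expected if the ability score itself carries much of the social class/cultural variance that a crude SES index misses.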
The Flynn Effect and complex cognition
The “Flynn Effect” describes the steep increase in average IQ test scores over time in a wide range of countries: an increase of about 15 points, or one standard deviation, per generation (Flynn, 1998). It is most pronounced on non-verbal tests like the Raven, on which average performance in Britain gained 27.5 points between 1947 and 2002 (maximum score 60; Flynn, 2007). The phenomenon remains puzzling to g theorists because it could not be due to genetic changes or other improvements in a simple speed, capacity, or strength variable over the short period in question (equivalent, say, to a physiological or physical parameter like physical strength or metabolic efficiency showing a population-wide gain of 50% over two generations). And, as Flynn (1998) noted, “IQ gains have not been accompanied by an escalation of real world cognitive skills … an evolution from widespread retardation to normalcy, or from normalcy to widespread giftedness” (p. 61).
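The size of these gains can be illustrated with some simple arithmetic. The sketch below assumes normally distributed scores on the conventional IQ scale (mean 100, SD 15) and a constant gain of one SD per generation, as quoted above. The function and constant names are hypothetical, and the calculation is an illustration rather than a reanalysis of Flynn’s data.

```python
from statistics import NormalDist

IQ_MEAN = 100.0               # conventional IQ scaling
IQ_SD = 15.0                  # 15 points = 1 SD, as quoted in the text
GAIN_SD_PER_GENERATION = 1.0  # assumed constant rate, taken from the text


def percentile_on_earlier_norms(generations_later: float) -> float:
    """Percentile rank that the average member of a later cohort would obtain
    on the norms of a cohort `generations_later` generations earlier.
    Assumes normal score distributions with equal SDs."""
    shift_in_sd = generations_later * GAIN_SD_PER_GENERATION
    return 100.0 * NormalDist().cdf(shift_in_sd)


def earlier_mean_on_current_norms(generations_earlier: float) -> float:
    """IQ that the average member of an earlier cohort would obtain
    if scored against current norms (same assumptions)."""
    return IQ_MEAN - generations_earlier * GAIN_SD_PER_GENERATION * IQ_SD


# Raven raw-score gain reported for Britain: 27.5 points between 1947 and 2002.
raven_points_per_year = 27.5 / (2002 - 1947)

print(f"Percentile one generation back:  {percentile_on_earlier_norms(1):.1f}")
print(f"Percentile two generations back: {percentile_on_earlier_norms(2):.1f}")
print(f"Mean of two generations ago on current norms: {earlier_mean_on_current_norms(2):.0f}")
print(f"Raven raw-score gain per year: {raven_points_per_year:.2f}")
```

Under these assumptions, today’s average test-taker would stand at about the 84th percentile of the previous generation’s norms, and the average of two generations ago would score about 70 on current norms. That is the implausible “widespread retardation” reading that Flynn rejects.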
So it is generally recognized that the effect must, at least predominantly, be due to environmental effects. Yet these remain enigmatic. Improved nourishment and general health, increasing test familiarity, and cryptic genetic effects have all been invoked. However, empirical support for these ideas has been mixed: for example, there is little regional or temporal correspondence between nutritional improvements and rising scores (Flynn, 2007).
Flynn believes that the effect is due to elevated demands for the kinds of abstract problem solving skills, in scientifically advanced cultures, that are also tapped in IQ test items. The effect also coincides with the fact that more children in most countries are spending longer periods of time in institutional education emphasizing a scientific approach to reasoning with an attendant emphasis on classification and logical analysis. Flynn (2007) refers to this as the spread of “scientific spectacles” (p. 143). According to Nisbett et al. (2012), massive gains on tests like the Raven show “that reasoning about abstractions has improved in a very real way” (p. 141).
Certainly schooling has a definite and substantial effect on IQ test performance (Winship & Korenman, 1997). However, such explanations tend to envisage a single, vertical axis of “abstract thinking,” reflecting the predominant “strength” model of g. As Fox and Mitchum (2013) note, they “rest on a conceptual metaphor of the Flynn effect as an increase in some psychological quantity that is already possessed in greater or lesser amounts by every person in every population” (p. 979).
There are several problems with that view. First, as described above, the cognitive demands of items like similarities, analogies, and Raven’s matrices compare poorly with those of everyday life. Indeed, there is no objective or generally accepted measure of “abstractness”: the view of what is more or less abstract tends to be impressionistic rather than objective. So IQ test performances may not, themselves, be good reflections of what is more or less abstract. They may simply reflect equally abstract, but different, forms of reasoning ability unequally tapped. As Dasen and Mishra (2013) put it, “cultural differences occur in cognitive styles rather than in the presence or absence of particular cognitive processes” (p. 1). Such styles can only be properly compared in terms of their “distance” from each other, in the way that the concept of “linguistic distance” is used to measure how different one language or dialect is from another (e.g., Chiswick & Miller, 2005).
Of course, it would always be possible to view one such language as a standard of language ability and to construct a universal language “test” based upon it. But it would soon be realized that such a test would be measuring (horizontal) language distance and not (vertical) strength. More to the point, as more people acquired that particular form, perhaps through migration or other kinds of exposure and familiarity, the average “score” would tend to rise over time, though without any improvements in general language ability as such.
We suggest that something like this has happened with IQ scores across generations to produce the Flynn effect. The leaps in average IQs do correspond with the demographic swelling of the middle classes over the decades in question and movement of more individuals to new levels in the social power structure. On the one hand, this means greater familiarity with the cultural tools of test designers, including the specific conceptual and classification schemes, text and number literacy, and so on, that are embedded in IQ test items. The importance of such familiarity—rather than cognitive complexity as such—was illustrated above with particular reference to the cognitive demands of the Raven test. On the other hand, such class mobility means improved sense of place within the power structure together with improved self-esteem, self-confidence, self-efficacy beliefs, and so on, all leading to improved test performance.
This idea is consistent with the data. Across countries, as noted by Nisbett et al. (2012), “Gains differ as a function of the degree of modernity that characterizes different nations” (p. 142): relatively modest for those where changes in class structure are already past a peak; spectacular for those (such as African countries) where they are more recent. As mentioned above, it is such a distance that is probably being bridged when children are adopted from lower class to middle class homes and gain up to 12 IQ points relative to those not so promoted. In other words, the Flynn effect further reveals IQ to be a distance rather than a strength metric: that is, a measure of proximity to a specific set of socio-cognitive structures and to affective test preparedness.
Conclusion
IQ is almost unique in the field of scientific measurement for absence of agreement, over most of a century, about what is being measured. Today, a glance at virtually any collection of views on the subject will demonstrate how these disagreements persist. In the Cambridge Handbook of Intelligence, for example, Davidson and Kemp (2011) note that “Few constructs are as mysterious and controversial as human intelligence,” and that “there is little consensus on what exactly it means … for one person to be more intelligent than another” (p. 59).
A widespread, though sometimes intuitive, understanding is that differences in IQ test performance are really differences in the ability for complex cognition, a position reflected in the concept of g. A few psychologists have attempted to justify that position; generally, though, it has not been subjected to close scrutiny. Here the position was examined through analysis of some typical verbal and non-verbal items. As a result it was suggested that the tests are remarkably un-complex cognitively, especially in relation to the complexities of cognition most people exhibit in everyday tasks and social interaction. Examples of the latter were considered for illustration, as was the reported association between IQ and (complexity of) job performance. Alternative sources of IQ differences were then considered, including differential preparedness for the test stemming from (a) differences in cognitive structures developed from differences in social practice in different (sub-)cultures and social classes and (b) non-cognitive, or affective, differences arising from the power structure of western societies. Reflections of these in the lability and “trainability” of individual test performances were then considered, and a new explanation for the Flynn effect was proposed. It was concluded that IQ differences, and apparent differences in ability for complex cognition, arise, at least in part, from differences in cognitive “distance” between multiple cognitive styles rather than in a singular cognitive “strength.”
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
