Abstract
Intelligence is an important human trait on which people differ. Few studies have examined the stability of intelligence differences from childhood or youth to older age using the same test. The longest such studies are those that have followed up on some of the participants of the Scottish Mental Surveys of 1932 and 1947. Their results suggest that around half of the individual differences in intelligence are stable across most of the human life course. This is valuable information because it can be used as a guide to how much of people’s cognitive-aging differences might be amenable to alleviation.
People differ in their ability to solve mental problems. This ability has associations with success in life. People who are more cognitively able—more intelligent—tend to stay longer in full-time education, to have more professional and higher-income occupations, and to be healthier and live longer (Deary, 2000, 2012, 2013; Deary, Weiss, & Batty, 2010; Strenze, 2007). In the present article, I address why it is important to know the long-term stability of individual differences in this important human trait.
Two Meanings of Stability
One type of stability is that of mean levels. Some cognitive skills show a steady decline in average scores after young adulthood (Salthouse, 2010). These include processing speed, reasoning, spatial ability, and some aspects of memory. Other cognitive skills’ mean levels decline later and less, if at all, before very old age. These include vocabulary and other verbal abilities, general knowledge, and some number skills. Research on the stability of mean cognitive levels is based on cross-sectional, longitudinal, and cross-sequential designs (Schaie, 2005). The field has lively debates about how much, whether, and when certain cognitive abilities change with age (Salthouse, 2009).
The present article is based on a second type of stability: the stability of individual differences. Think about any human trait that is distributed along a continuum, such as height, weight, extraversion, or intelligence. We can ask whether people retain their relative ranks in the continuum as they change with age: Are heavier people in a group at one age still heavier in that group later on? A simple way to visualize this stability is by using a scatter plot. This is a plot in which the values of the trait for each person at Time 1 are placed on the x-axis and the values measured at Time 2 are placed on the y-axis. A numerical value can be given to this type of stability using a correlation coefficient (Deary, 2000). This is a statistic whose values can range from −1 through 0 to +1. Perfect stability of individual differences—that is, the retention of everyone’s relative position in a group—from one time to the next would be indicated by a value of or near to +1, and no stability at all by a value of 0.
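As a minimal sketch of the distinction between the two meanings of stability, the following computes a correlation coefficient for a small set of scores. The scores are hypothetical and are not data from any study discussed here; the helper name `pearson_r` is likewise only illustrative. The second calculation lowers every Time 2 score by the same amount, which changes the mean level but leaves the stability of individual differences untouched.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for the same six people at Time 1 and Time 2.
time1 = [85, 92, 100, 104, 111, 120]
time2 = [88, 90, 103, 99, 115, 118]

stability = pearson_r(time1, time2)

# Lower everyone's Time 2 score by the same amount: the mean level drops,
# but relative positions, and therefore the correlation, are unchanged.
time2_lower_mean = [score - 10 for score in time2]
stability_lower_mean = pearson_r(time1, time2_lower_mean)

print(f"stability of individual differences: {stability:.2f}")
print(f"after a uniform 10-point drop:       {stability_lower_mean:.2f}")
```

A uniform decline in scores is therefore compatible with perfect or near-perfect stability of individual differences, which is why the two types of stability need to be examined separately.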
It is important to know how stable intelligence differences are across the human life course. If we find out that intelligence differences are somewhat unstable between youth and old age, it could mean, for example, that some people who start off scoring low on cognitive tests do better later, and vice versa, or that people who have the same cognitive scores as youngsters diverge over time as they age. Next, we should want to discover variables that cause individual differences in changes in cognitive function. These causes could be ingredients toward a recipe for understanding healthier cognitive aging.
Measuring Intelligence
If we are interested in the stability of intelligence, which test or tests should be used? There are many cognitive abilities, and an even larger number of cognitive tests, and there are differences in the ways that different schools of psychology think about cognitive tests. In neuropsychology and experimental psychology, the classification of cognitive tests is primarily based on their putative functions, with evidence informed by cognitive theories and drawn from sources such as brain-damage and brain-imaging studies (Lezak, Howieson, & Loring, 2004). In differential psychology, the classification of cognitive tests tends to be based on their correlations (Carroll, 1993; Deary, 2000). The correlations among cognitive tests form a hierarchy. Tests that assess functions within a cognitive domain (e.g., memory or processing speed) tend to have stronger correlations with each other than with tests that assess different domains. This means that people can be given scores on individual tests and also on a broader cognitive domain. There is an even more general regularity: Scores on cognitive domains all tend to be positively correlated. This common ability across all cognitive domains is usually referred to as general cognitive ability, general intelligence, or just g (Spearman, 1927). These three levels of individual differences in cognitive abilities are shown and explained in Figure 1.

Fig. 1. The hierarchical model of intelligence variance. At Level 1, people differ in scores on specific tests that assess the various cognitive domains. Scores on all the tests correlate positively. At Level 2, there are especially strong correlations among tests measuring the same domain, so a latent trait at the domain level can be extracted to represent this common variance. At Level 3, people who do well in one domain also tend to do well in the other domains, so a general cognitive latent trait (g) can be extracted. This model allows researchers to partition cognitive-performance variance into these different levels. They can then explore the causes and consequences of variance at different levels of cognitive specificity or generality. For example, there are genetic and aging effects on g and on some specific domains, such as memory and processing speed. Note that the specific-test-level variance contains variation in the performance of skills that are specific to the individual test and also contains error variance. Reprinted from “Intelligence,” by I. J. Deary, 2013, Current Biology, 23, p. R674. Copyright 2013 by Elsevier. Reprinted with permission.
General intelligence can be assessed by administering a group of diverse cognitive tests to a sample of people. Then, a statistical procedure such as principal component analysis or factor analysis can be used to provide a score for each person that represents his or her general cognitive ability. If a group of diverse tests has not been administered, one can assess overall cognitive ability by using a test that is known to relate strongly to general intelligence. Tests such as these (e.g., some IQ-type tests) often have a variety of types of items.
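The first of these approaches can be sketched as follows, under stated assumptions. The data are simulated (four hypothetical tests that each load .7 on a single common ability), the function names are illustrative, and power iteration is used here only as a dependency-free way to obtain the first principal component; a statistics package would normally be used instead.

```python
import math
import random

def standardize(column):
    """Convert a list of raw scores to z-scores (mean 0, SD 1)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def first_component(z_columns, iterations=200):
    """Return (weights, per-person scores) for the first principal component."""
    k = len(z_columns)
    n = len(z_columns[0])
    # Correlation matrix of the standardized tests.
    corr = [[sum(a * b for a, b in zip(z_columns[i], z_columns[j])) / (n - 1)
             for j in range(k)] for i in range(k)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    weights = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        product = [sum(corr[i][j] * weights[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(p * p for p in product))
        weights = [p / norm for p in product]
    scores = [sum(weights[j] * z_columns[j][person] for j in range(k))
              for person in range(n)]
    return weights, scores

# Simulated data: an assumption for illustration, not real test scores.
random.seed(1)
n_people, n_tests, loading = 500, 4, 0.7
true_g = [random.gauss(0, 1) for _ in range(n_people)]
noise_sd = math.sqrt(1 - loading ** 2)
tests = [[loading * g + random.gauss(0, noise_sd) for g in true_g]
         for _ in range(n_tests)]

z_tests = [standardize(t) for t in tests]
weights, g_scores = first_component(z_tests)

# How well do the component scores recover the simulated general ability?
z_true, z_est = standardize(true_g), standardize(g_scores)
recovery = sum(a * b for a, b in zip(z_true, z_est)) / (n_people - 1)
print("component weights:", [round(w, 2) for w in weights])
print(f"correlation with simulated g: {recovery:.2f}")
```

All weights come out positive because every test correlates positively with every other, which is the regularity that the general factor summarizes.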
The Scottish Mental Surveys of 1932 and 1947
Estimating the stability of individual differences in intelligence across most of the human life course depends on an unusual set of factors. The same people should be tested several decades apart. Ideally, they should be tested on the same test with the same instructions. The test should be capable of providing valid assessments of cognitive differences at both ages.
Scotland is the only country in the world, as far as I am aware, in which the intelligence of almost the whole population has been tested (Deary, Whalley, & Starr, 2009). On June 1, 1932, the Scottish Council for Research in Education tested the intelligence of almost every child who had been born in 1921 and was attending school in Scotland. The council tested 87,498 children, about 95% of the surviving 1921-born population. It did the same on June 4, 1947, testing 70,805 children who had been born in 1936. These tests were the Scottish Mental Surveys of 1932 and 1947. On both occasions, the council used the same version of the Moray House Test No. 12, a group-administered paper-and-pencil test. Teachers read out instructions, and the children were given 45 minutes to complete the test. This test has a preponderance of verbal-reasoning items, but also some numerical items and other types of items; these are described in Deary et al. (2009). In both Mental Surveys, validation samples of about 1,000 children in each of the populations were tested, and the correlation between the Moray House Test and individually administered Binet test scores was about .8 (Scottish Council for Research in Education, 1933, 1949).
The Scottish Council for Research in Education retained the Mental Surveys’ data. We discovered these in the 1990s (Deary et al., 2009). We decided to use the childhood data as baselines from which to study the factors across the life course that contributed to more or less healthy cognitive aging. We traced and recruited people—by then in old age—who had taken part in the Mental Surveys. We conducted follow-up studies in the cities of Aberdeen and Edinburgh and thereby began, respectively, the Aberdeen and Lothian (Edinburgh and its surrounding area) Birth Cohorts of 1921 and 1936 (Deary, Gow, Pattie, & Starr, 2012; Whalley et al., 2011). By retesting these individuals, we were in a position to examine the stability of intelligence across more of the human life course than had been studied previously, from childhood to older age.
The Stability of Intelligence…
…in others’ studies
The studies considered here are limited to those that used the same intelligence test at initial measurement and at follow-up. If the question is the stability of intelligence, then there is an advantage in using the same test on all occasions within a study. If a different test is used at follow-up, then the estimated stability is limited by the correlation between the two tests when they are given contemporaneously. For example, one study retested 930 men 50 years after they had taken the Army General Classification Test at army induction, this time using the Telephone Interview for Cognitive Status, and found a correlation of .457 (Plassman et al., 1995). This investigation of the long-term stability of cognition was limited by the contemporaneous correlation between the two tests.
There are also possible limitations if one uses the same mental test on both occasions. The test might not be appropriate for subjects at different ages. For example, it might be too easy or too hard at one of the two ages, producing what are called, respectively, ceiling and floor effects in the test scores. These will lead to an underestimation of the correlation between the two occasions, because ceiling and floor effects prevent the expression of the full range of individual differences. The test also might have content that has become archaic or that is highly memorable, such that some people might recall answers on the second occasion rather than figuring them out. These possible problems should be kept in mind. One way to answer the question of whether the test is appropriate for the ages at which it has been applied is to compute its correlation with other well-validated tests given contemporaneously at each age.
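The attenuating effect of a ceiling can be illustrated with a small simulation. Everything here is an assumption made for illustration: the scores are simulated with a true correlation of .7, and the test is made "too easy" on the second occasion only, by capping scores at the average so that roughly half of the simulated sample reaches the maximum.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Simulate standardized scores on two occasions with a true correlation of .7.
random.seed(7)
true_r = 0.7
n = 5000
occasion1, occasion2 = [], []
for _ in range(n):
    first = random.gauss(0, 1)
    second = true_r * first + math.sqrt(1 - true_r ** 2) * random.gauss(0, 1)
    occasion1.append(first)
    occasion2.append(second)

# Hypothetical ceiling: on the second occasion nobody can score above the mean.
ceiling = 0.0
occasion2_capped = [min(score, ceiling) for score in occasion2]

full_range_r = pearson_r(occasion1, occasion2)
ceiling_r = pearson_r(occasion1, occasion2_capped)
print(f"correlation with full range of scores: {full_range_r:.2f}")
print(f"correlation with a ceiling at Time 2:  {ceiling_r:.2f}")
```

The capped scores cannot distinguish among the more able half of the sample, so the observed correlation falls below the value that was built into the simulation.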
Before the follow-up studies of the Scottish Mental Surveys were begun, there were others that reported correlations between scores on the same test across decades. Owens (1966) reported a study in which 96 men took the Army Alpha Form 6 intelligence test as freshmen at Iowa State University at a mean age of 19 years and then took the test again at age 61. The correlation for the total score over that 42-year gap was .78.
Schwartzman, Gold, Andres, Arbuckle, and Chaikelson (1987) reported a study in which 260 male Canadian World War II veterans took the Revised Examination “M” test of intelligence twice. The first time was at army induction, when the men ranged from 17 to 41 years old. The second time was at a follow-up about 40 years later, when the men were 52 to 81 years old. The correlation for the total score over that approximately 40-year gap was .78, the same as that reported by Owens (1966).
…in the Scottish Mental Surveys
To the valuable, four-decades-long studies described above, the follow-up studies of the Scottish Mental Surveys have added between two and four extra decades of follow-up (Deary, Pattie, & Starr, 2013; Deary, Whalley, Lemmon, Crawford, & Starr, 2000; Gow et al., 2011). There are two notable aspects of the design of these studies. First, the subjects are all from the same birth year and were tested on the same day. Therefore, there is little variation in age at the first and second tests. Second, the subjects were children at the first test and in old age at the second test. In the Owens (1966) and Schwartzman et al. (1987) studies, the subjects were adults at the first test and were mostly younger than subjects in the Scottish studies at the follow-up test.
The follow-up studies of the Moray House Test, originally conducted when participants had a mean age of 11 years, have taken place when the subjects had mean ages of 70, 77, 79, 87, and 90 years, with Ns ranging from about 1,000 to 100 (Table 1). The raw correlation from age 11 to age 70—the shortest follow-up period—is .67 (Gow et al., 2011). The raw correlation from age 11 to age 90—the longest follow-up period—is .54, which is reduced to .45 when people with reported cognitive pathology or indications of it are removed (Deary et al., 2013).
Table 1. Correlations Between Moray House Test No. 12 (MHT) Scores at Approximately 11 Years of Age and Older Ages
Note: WAIS-III-UK = Wechsler Adult Intelligence Scale - Third UK Edition; CI = confidence interval.
The estimate of the correlation obtained after taking into account the sample’s restricted range of Moray House Test scores by comparing them with those of the whole population tested in the Scottish Mental Surveys of 1932 and 1947.
After the exclusion of two subjects with dementia and three with possible dementia, the correlation was .51. After the exclusion of two more subjects with low scores on the Mini-Mental State Examination (which is often used as a screening test for possible dementia), the correlation was .45, as shown.
This component was formed from five nonverbal tests from the Wechsler Adult Intelligence Scale - Third UK Edition: Matrix Reasoning, Letter-Number Sequencing, Block Design, Digit-Symbol Coding, and Symbol Search.
Raven’s Standard Progressive Matrices were administered with a 20-minute time limit.
What do these findings tell us about the proportion of people’s differences in intelligence that are stable from childhood to old age? Some considerations should be made before any conclusions are drawn. The correlations should not be taken at face value, for at least four reasons.
First, the correlations are estimates based on specific numbers of subjects. Therefore, each correlation gives an idea of the association but comes with uncertainty about how well it estimates the true value. Table 1 shows the reported 95% confidence intervals for two of the correlations, both from the smaller samples. Larger samples have narrower confidence intervals.
Second, the range of intelligence in all of the samples in Table 1 is more restricted than it was in the population at age 11. One valuable aspect of the Scottish Mental Surveys is that they tested almost the whole population. Therefore, it is known by how much the samples’ ranges are restricted. The raw correlations can be recalculated to estimate what they would be if a sample with the whole range of intelligence had been retested. The estimates of the disattenuated correlations in Table 1 are about .1 higher than the raw correlations.
Third, no cognitive test is perfectly reliable. Even if the second administration of the test took place one day after the first—never mind almost 80 years later—the correlation would not be a perfect 1.0. The correlations in Table 1 are not adjusted for the Moray House Test’s internal consistency, or “period-free” reliability. This is another reason that the raw correlations in Table 1 are probably underestimations of the true values.
Fourth, there is no guarantee that the Moray House Test is appropriate for assessing the range of intelligence found in both childhood and old age. The test was validated in childhood against the individually administered Binet test, with a correlation of around .8 (Scottish Council for Research in Education, 1933, 1949). Its correlations with other cognitive tests in older ages are shown in Table 1. Most of these are around .7, which demonstrates that the test does validly assess cognitive differences in older age. The test shows, therefore, concurrent validity both in childhood and in older age.
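The first three of these considerations each correspond to a standard textbook calculation, sketched below. These are not necessarily the formulas used in the cited studies: the sketch assumes Fisher's z transformation for the confidence interval, Thorndike's Case 2 formula for direct range restriction, and Spearman's correction for attenuation. The function names and all input values (sample sizes, standard deviations, reliabilities) are hypothetical and chosen only to show the direction of each effect.

```python
import math

def fisher_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def correct_range_restriction(r, sd_population, sd_sample):
    """Thorndike Case 2 estimate of the correlation in the unrestricted population."""
    u = sd_population / sd_sample
    return r * u / math.sqrt(1 + r ** 2 * (u ** 2 - 1))

def correct_unreliability(r, reliability_x, reliability_y):
    """Spearman's correction for attenuation due to measurement error."""
    return r / math.sqrt(reliability_x * reliability_y)

# 1. Sampling uncertainty: the same correlation in a small and a large sample.
small_sample_ci = fisher_confidence_interval(0.6, n=100)
large_sample_ci = fisher_confidence_interval(0.6, n=1000)
print("CI with N = 100: ", [round(v, 2) for v in small_sample_ci])
print("CI with N = 1000:", [round(v, 2) for v in large_sample_ci])

# 2. Range restriction: a sample whose childhood scores vary less than
#    those of the whole population (hypothetical SDs of 11 versus 15).
corrected_for_range = correct_range_restriction(0.67, sd_population=15, sd_sample=11)
print(f"corrected for range restriction: {corrected_for_range:.2f}")

# 3. Unreliability: hypothetical reliabilities of .9 on both occasions.
corrected_for_reliability = correct_unreliability(0.63, 0.9, 0.9)
print(f"corrected for unreliability: {corrected_for_reliability:.2f}")
```

Both corrections move the estimate upward, which is consistent with the argument that the raw correlations probably underestimate the true stability.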
Typically, one estimates the proportion of the individual differences that are shared by squaring the correlation coefficient. Some argue that in the situation we have here, we should instead use the actual correlation (almost r = .7), which would obviously give a larger value. There is still debate about this (Johnson, 2011). However, if we are conservative and use the square of the correlation, and if we do not apply the full correction for the restriction of the sample range, then about half (i.e., .7 × .7 = .49) of the differences in intelligence at age 70 can be traced back to age 11, and about a third of the differences in intelligence at age 90 can be traced back to age 11. Given that we should expect there to be factors that cause people to change relative to each other in intelligence between childhood and early adulthood, across adulthood, and from late adulthood and into old age, the stability appears quite high. We subsequently found that variation in DNA was a strong factor driving this lifetime stability, and we estimated that environmental factors were more important determinants of individual differences in lifetime cognitive changes (Deary, Yang, et al., 2012).
Influences on Instability
The long-term stability of intelligence was not the principal topic of interest for the Aberdeen and Lothian Birth Cohort studies. In conducting these studies, we aimed to discover the factors that reduced the stability coefficient. The research design we used was typically to have cognitive ability in older age as the outcome variable and to ask, “What contributes to people’s differences in intelligence in old age after we take into account their cognitive level from childhood?” This research was aimed at finding protective factors that mitigated cognitive decline and risk factors that appeared to speed it up. Examples of risk factors were the e4 allele of the gene for apolipoprotein E, which is also a risk factor for dementia (Deary et al., 2002), and smoking (Corley, Gow, Starr, & Deary, 2012). Examples of protective factors were physical fitness and activity (Gow, Corley, Starr, & Deary, 2012) and education (Ritchie, Bates, Der, Starr, & Deary, 2013).
One possibility we tested was whether a high level of intelligence in childhood is itself protective against the rate of decline in cognitive ability in older age. This question is sometimes phrased as, “Is age kinder to the initially more able?” From our results to date, the answer is no (Gow et al., 2011). We found no differences in the rates of change of cognitive ability in older age among those who had different intelligence scores as children.
There were some other surprising results that made it clear how valuable it was to have intelligence scores from childhood. We found that people with higher intelligence in old age engaged more in intellectual activities, drank more coffee and red wine, and had lower levels of inflammatory biomarkers in their bloodstream (Corley et al., 2010; Gow et al., 2012; Luciano, Marioni, Gow, Starr, & Deary, 2009). Had we stopped there, we might have thought we had found some clues to healthier cognitive aging. However, when we adjusted for subjects’ scores on the Moray House Test at age 11—about 60 years earlier—these associations fell to almost nothing. We concluded that these factors were not influencing rates of cognitive aging. We had found examples of confounds or possible reverse causation. That is, brighter children tend to become brighter older adults, and they also take part in more intellectual activities, drink more coffee and red wine, and have less-inflamed blood. We continue to search for factors that protect against or are risk factors for cognitive aging. Across this research field as a whole, few such factors have been replicated (Plassman, Williams, Burke, Holsinger, & Benjamin, 2010).
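The logic of this adjustment can be sketched with simulated data. In the simulation, which is an assumption made purely for illustration, childhood intelligence influences both old-age intelligence and a later-life activity, and the activity has no effect of its own on old-age intelligence. The adjustment is represented here by a first-order partial correlation; the published analyses may have used other models, and the variable names and effect sizes are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing what each shares with z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Simulated cohort: childhood score drives both later variables.
random.seed(11)
n = 20000
childhood, old_age, activity = [], [], []
for _ in range(n):
    child = random.gauss(0, 1)
    childhood.append(child)
    old_age.append(0.7 * child + math.sqrt(1 - 0.7 ** 2) * random.gauss(0, 1))
    activity.append(0.4 * child + math.sqrt(1 - 0.4 ** 2) * random.gauss(0, 1))

raw_r = pearson_r(activity, old_age)
adjusted_r = partial_r(raw_r,
                       pearson_r(activity, childhood),
                       pearson_r(old_age, childhood))
print(f"activity vs. old-age score, unadjusted:        {raw_r:.2f}")
print(f"activity vs. old-age score, adjusted for age 11: {adjusted_r:.2f}")
```

Without the childhood score, the unadjusted association could easily be mistaken for an effect of the activity on cognitive aging.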
Stability and Change in Intelligence: Conclusions
Here, I have emphasized the rarity and value of those studies that tested intelligence in the same people, using the same test, on occasions that were decades apart. There is curiosity value in knowing how stable this important human trait is across so many years. More importantly, the stability gives a baseline from which to reckon the amount of change, and then to start the process of finding the contributors to that change. Some of that change will be stochastic, but some will have discoverable—and, we hope, remediable—determinants. There are other ways to study instability in intelligence, and these complement the information offered here (Nisbett et al., 2010). The aim of our Scottish follow-up studies is to ask: Setting aside someone’s intelligence score in childhood, what else contributes to the score in older age? An understanding of the stability of intelligence forms the foundation for the study of its lifelong changes.
Footnotes
Declaration of Conflicting Interests
The author declared no conflicts of interest with respect to the authorship or the publication of this article.
Funding
I. J. Deary is supported by Grant MR/K026992/1 from the UK Medical Research Council and the Biotechnology and Biological Sciences Research Council for the University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology, part of the cross-council Lifelong Health and Wellbeing initiative. Funding from the Biotechnology and Biological Sciences Research Council and the Medical Research Council is gratefully acknowledged.
