Abstract
As a foundation for studies of human cognitive aging, it is important to know the stability of individual differences in cognitive ability across the life course. Few studies of cognitive ability have tested the same individuals in youth and old age. We examined the stability and concurrent and predictive validity of individual differences in the same intelligence test administered to the same individuals (the Lothian Birth Cohort of 1921, N = 106) at ages 11 and 90 years. The correlation of Moray House Test scores between age 11 and age 90 was .54 (.67 when corrected for range restriction). This is a valuable foundation for estimating the extent to which cognitive-ability differences in very old age are accounted for by the lifelong stable trait and by the causes of cognitive change across the life course. Moray House Test scores showed strong concurrent and predictive validity with “gold standard” cognitive tests at ages 11 and 90.
Demographic changes in societies have brought about more research interest in human aging (Lutz, Sanderson, & Scherbov, 2008; Martin, 2011). Although age affects many aspects of functioning, high priority has been given to cognitive functions, and especially how they might be retained better as people grow older (Plassman, Williams, Burke, Holsinger, & Benjamin, 2010). Decline of cognitive functions with age is associated with, for example, a loss of independence in carrying out everyday tasks (Tucker-Drob, 2011). A fundamental goal for cognitive-aging studies would be to determine how much of the variance in human intelligence is stable across the life course, which also would indicate how much changes. Knowing the long-term stability of individual differences in cognitive abilities will allow researchers to estimate how much variance can be accounted for by factors that might influence age-related cognitive changes.
Such understanding will be especially important for the concept of cognitive reserve, the idea that higher prior cognitive ability confers some protection from the effects of brain pathology on cognitive functions in old age (Singh-Manoux et al., 2011; Staff, 2012). Testing this idea formally requires knowledge of how strongly prior, youthful cognitive ability itself is associated with cognitive functions at various stages of old age. Another value of knowing the long-term stability of intelligence is that some of the putative health determinants of cognitive-aging differences are themselves predicted by long-lasting differences in intelligence (Deary, Weiss, & Batty, 2010; Gottfredson, 2004; Lubinski, 2009). Therefore, it is valuable to have a measure of intelligence prior to the development of such medical problems, and prior to the attainment of occupational status in adulthood, which also relates to health differences.
There have been few reports of correlations between intelligence-test scores obtained in youth and those obtained when the same participants reach maturity and older ages. One study found that intelligence scores at around age 11 and age 40 correlated at or above .7 (McCall, 1977). Correlations of about this magnitude have also been found when cognitive-ability tests have been administered in the late teens or early 20s and then again in the early to mid-60s (Owens, 1966; Schwartzman, Gold, Andres, Arbuckle, & Chaikelson, 1987). A smaller effect size (.46) was found when a cruder estimate of cognitive ability was obtained at a 50-year follow-up from young adulthood to old age (Plassman et al., 1995). Here, we offer an estimate of the stability of intelligence as measured by the same test in childhood and in old-old age.
We have previously reported the correlations (raw, uncorrected Pearson’s rs) between Moray House Test (MHT) cognitive-ability scores at age 11 and scores on the same test at ages 70 (r = .67; Lothian Birth Cohort of 1936), 77 (r = .63; Aberdeen Birth Cohort of 1921), 79 (r = .66; Lothian Birth Cohort of 1921), and 87 (r = .51; Lothian Birth Cohort of 1921; Deary, Whalley, Lemmon, Crawford, & Starr, 2000; Deary, Whiteman, Starr, Whalley, & Fox, 2004; Gow et al., 2011). We judge that the present report is a valuable and unusual addition to these prior reports for the following reasons. First, we provide the association (with its confidence interval) between scores from the same intelligence test at ages 11 and 90 years. This is likely to be the longest follow-up study of intelligence differences that will be undertaken. It provides information about the stability of intelligence differences across as much of the life course as will ever be tested in a sample large enough not to have wide confidence intervals. The sample whose data and results we report is still moderate in size, with an N of around 100. Second, the data we used are from a new wave of data collection and have not been reported previously. Third, we present results for several other standard cognitive tests taken at age 90. These demonstrate concurrent validity for the MHT at age 90, to complement the concurrent validity that has already been shown at age 11 (Deary, Whalley, & Starr, 2009, chap. 1; Scottish Council for Research in Education, 1933). These new data also provided an opportunity formally to test predictive validity, that is, how scores on the various cognitive tests in old-old age correlate with MHT scores from almost 80 years earlier. This was a rare opportunity to discover the extent to which scores on diverse cognitive tests in old-old age are dependent on cognitive ability in childhood. 
Our analyses provide insight into the mix of fluid and crystallized skills that contribute to performance on different cognitive tests.
The principal aim of the present study was to examine the stability of individual differences in intelligence from age 11 years to age 90 years in a narrow-age sample. The secondary aim was to investigate how scores on the same test taken at age 11 and age 90 correlate with scores on several other tests, taken at age 90, that are commonly used as indices of functioning in important cognitive domains.
Method
Participants
The study participants were members of the Lothian Birth Cohort of 1921 (LBC1921). Most of the cohort had taken part in the Scottish Mental Survey of 1932 (Scottish Council for Research in Education, 1933; Deary et al., 2009), which, on June 1 of that year, administered a validated test of general mental ability (MHT No. 12) to almost all children who had been born in 1921 and were attending school in Scotland. From 1999 to 2001, people living in Edinburgh and the surrounding area of Scotland were invited to take part in a study of cognitive aging. The 550 recruits, who were all born in 1921, formed Wave 1 of the LBC1921. The tracing, recruitment, and testing in the first three waves of the LBC1921 study were described in previous reports (Deary, Gow, Pattie, & Starr, 2012; Deary et al., 2004). The present study’s analyses are based on data collected during Wave 4 of the study, which took place during 2011 and 2012. Examinations took place as close to the participants’ 90th birthdays as was possible. The majority of the participants were examined at the Wellcome Trust Clinical Research Facility at the Western General Hospital in Edinburgh. Some, because of mobility problems or distance from the research clinic, were tested at home. The number of participants at each wave of testing and the causes of attrition are shown in Figure 1. The principal analyses in this report concern MHT scores from childhood and Wave 4, as well as scores on other cognitive tests at Wave 4. Although there was a small range of ages on both testing occasions, we refer to the ages of participants at these two waves as 11 and 90 years.

Fig. 1. Number and mean age of participants and causes of attrition at each wave of testing in the Lothian Birth Cohort of 1921. “Withdrew” refers to participants who did not wish to continue to participate. “Not well enough/no longer eligible” refers to participants who withdrew for health reasons, including being too frail to travel, and participants who had severe visual impairments or dementia (except at Wave 4, when 11 people with dementia were examined at home by a physician, but not given tests of cognitive functioning). “Other reasons” for leaving the study include, for example, living too far away to travel to the testing site, caring for a spouse, refusal by the participant’s general medical practitioner, and missing an appointment.
Cognitive measures
Moray House Test No. 12
The same MHT was administered to participants at ages 11 and 90 years. The MHT is a paper-and-pencil test of general mental ability. The time limit is 45 min, and the maximum possible score is 76. There is a preponderance of verbal-reasoning items, with some numerical and other types of items, as we have described elsewhere (Deary et al., 2004). The test is reprinted in full in the monograph on the Scottish Mental Survey of 1932 (Scottish Council for Research in Education, 1933). The correlation between scores on the MHT and the Stanford revision of the Binet test in 1932, when the subjects were 11 years old, was .81 in boys and .78 in girls (n = 500 for each; Scottish Council for Research in Education, 1933, p. 100). When participants were age 11, the MHT was administered to groups in school. When they were age 90, it was administered individually. On both occasions, it was self-completed.
National Adult Reading Test (NART)
The NART is widely used to estimate peak prior or premorbid cognitive ability (Nelson & Willison, 1991). The participant is asked to pronounce 50 words that are irregular in grapheme-phoneme associations, stress, or both.
Mini-Mental State Examination (MMSE)
This short interview-based test is often used as a screening test for dementia (Folstein, Folstein, & McHugh, 1975). The maximum score is 30. A score of less than 24 is often used as an indicator of possible dementia.
Raven’s Progressive Matrices
This paper-and-pencil test requires the participant to choose which of a number of options correctly provides the missing piece of an abstract visual pattern (Raven, Court, & Raven, 1977). It is a test of nonverbal reasoning and includes 60 items. A time limit of 20 min was applied.
Wechsler Logical Memory test
This test from the Wechsler Memory Scale—Revised (Wechsler, 1987) is used as a test of verbal declarative memory. It requires the participant to recall as much information as possible from two stories that the tester reads aloud. Each story has 25 idea units. Recall takes place immediately after each story and after a delay during which other mental tests are completed. Immediate- and delayed-recall scores correlated very highly, so we used only the latter score in our principal component analysis of the data.
Wechsler Letter-Number Sequencing test
In each item of this test from the Wechsler Adult Intelligence Scale-III UK, the participant is asked to listen to a jumbled series of numbers and letters read aloud by the tester (Wechsler, 1998). The participant then tries to recall the series, but reports first the numbers in ascending order and then the letters in alphabetical order. The score is the number of correct items. This is used as a test of working memory.
Verbal-fluency test
In this test, the participant is asked to name as many words as possible that begin with the letters C, F, and L (Lezak, 1995). The time limit for each letter is 1 min. The score is the total number of words over all three letters. This is used as a test of executive functioning.
Wechsler Digit Symbol test
This paper-and-pencil test from the Wechsler Adult Intelligence Scale—Revised requires the participant to place symbols below rows of single-digit numbers according to a given code, which is displayed explicitly on the answer sheet during the test (Wechsler, 1981). Participants are asked to complete as many items as possible in 2 min without making errors. This is used as a test of processing speed.
4-Choice reaction time
Choice reaction time was tested using a stand-alone device with a small LCD screen and response buttons (Deary, Der, & Ford, 2001). On each of 40 trials, a number (1, 2, 3, or 4) appeared on the screen, and the participant was asked to press the corresponding response button as quickly as possible. There were 8 practice trials, and the interstimulus interval varied between 1 and 3 s. The score used was the mean reaction time in milliseconds.
Demographic and health measures
Each participant’s sex and age (in days, at the time of testing at age 11 and age 90) were recorded. The following measures were recorded at the age-79 interview: number of years of full-time education, parental occupational social class, and social class of the participant’s own main occupation during his or her working life. The two measures of social class were graded on a scale from 1 to 5, with 1 representing the most professional occupations and 5 the most manual (General Register Office, 1956). Possession of the ε4 allele of the gene for apolipoprotein E (APOE) was determined by genotyping DNA extracted from venous blood (Wenham, Price, & Blundell, 1991). Smoking status (i.e., whether the participant had never smoked, was an ex-smoker, or was a current smoker) was recorded at the age-90 interview. Medical history for the following key chronic illnesses was also recorded at that interview: hypertension, cardiovascular disease, diabetes, cerebrovascular disease, cancer, dementia, thyroid disease, and arthritis. Subjects rated their health on a 5-point scale as “poor,” “fair,” “good,” “very good,” or “excellent.” Depressive-mood symptoms were recorded using the Hospital Anxiety and Depression Scale (HADS; Zigmond & Snaith, 1983).
Results
Of the 129 LBC1921 subjects tested (Fig. 1), 106 (49 men, 57 women) had MHT scores from childhood and at Wave 4 of the LBC1921 study; 79 were tested at the Wellcome Trust Clinical Research Facility, and 27 were tested at home. On average, participants were 10.9 years old (SD = 0.3) at the 1932 MHT testing and 90.1 years old (SD = 0.2) at Wave 4. MHT scores were significantly higher at age 90 (M = 51.9, SD = 14.4) than at age 11 (M = 49.0, SD = 11.1), t(105) = −2.43, p = .017, Cohen’s d = 0.23. MHT scores for these same 106 participants were lower and more spread at age 90 than they had been when participants were mean ages of 79 (M = 63.4, SD = 8.2) and 87 (M = 57.8, SD = 11.1).
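The mean comparison above can be checked from the reported summary statistics. The following is a minimal sketch, not the authors' analysis code. It assumes that Cohen's d was computed with the root mean square of the two standard deviations (the text does not say which variant was used) and that the raw age-11/age-90 correlation of .55 reported later in this section applies to the same 106 participants. The function names are illustrative.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference using the root mean square of the two SDs.

    Assumption: the text does not state which variant of d was used.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


def paired_t(mean_a, sd_a, mean_b, sd_b, r, n):
    """Paired-samples t (df = n - 1) rebuilt from summary statistics.

    The variance of the difference scores is derived from the two SDs
    and the correlation r between the paired measurements.
    """
    var_diff = sd_a ** 2 + sd_b ** 2 - 2.0 * r * sd_a * sd_b
    se_diff = math.sqrt(var_diff / n)
    return (mean_a - mean_b) / se_diff


# Age 90 minus age 11 for d; age 11 minus age 90 for t, matching the reported sign.
d = cohens_d(51.9, 14.4, 49.0, 11.1)
t = paired_t(49.0, 11.1, 51.9, 14.4, r=0.55, n=106)

print(f"d = {d:.2f}")       # d = 0.23
print(f"t(105) = {t:.2f}")
```

Because the inputs are rounded to one or two decimals, the reconstructed t should land close to, but not exactly on, the reported value of −2.43.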
Of the 106 LBC1921 participants with MHT data at age 11 and age 90, 54 had never smoked, 47 were ex-smokers, and 5 were still smoking at 90. No participants rated their current health as poor, 14 rated it as fair, 35 rated it as good, 44 rated it as very good, and 13 rated it as excellent. The numbers with medical conditions were as follows: hypertension, 71; cardiovascular disease, 51; diabetes, 3; cerebrovascular disease, 13; cancer, 22; dementia, 2 (the status of 3 individuals was uncertain because their medical history was contradicted by the medication they took or information provided by other people); thyroid disease, 15; and arthritis, 53. Five participants had a HADS depression score greater than 8, and none had a score greater than 11. Four had an MMSE score less than 24. The mean number of years of full-time education was 11.5 (SD = 2.7). The distribution for childhood social class in the five categories from most professional to most manual (for participants with these data) was 18, 26, 37, 14, and 6. The distribution for adult social class in these five categories was 35, 34, 35, 0, and 2. Of the 104 participants who had given blood for genetic testing, there were 22 APOE ε4 carriers and 82 noncarriers.
The correlation between the raw MHT scores at age 11 and age 90 for the 106 participants was .55 (95% confidence interval, CI = [.39, .69]; CI obtained by bootstrapping based on 1,000 samples; Fig. 2). For these same participants, the correlation between MHT scores at age 90 and age 79 was .73, and the correlation between MHT scores at age 90 and age 87 was .84 (both ps < .001). Age in days at the age-11 MHT testing in 1932 correlated .28 (p < .01) with raw MHT score at that time. Age in days at the age-90 testing correlated –.06 (p = .52) with raw MHT score at that time. MHT scores at ages 11 and 90 were adjusted for age (in days) at the respective testing occasion by regressing MHT scores on age in days and saving the standardized residuals. The correlation of age-11 and age-90 MHT scores was .54 (95% CI = [.38, .68]) when both scores were age corrected and was also .54 (95% CI = [.37, .67]) when only the age-11 MHT scores were age corrected. After we omitted the 2 subjects with a history of dementia and the 3 whose medical history of dementia was uncertain, the correlation (adjusted for age at both testing occasions) was .51 (95% CI = [.34, .66]; n = 101). After we omitted 2 additional participants whose MMSE score was less than 24, the correlation was .45 (95% CI = [.29, .58]; n = 99). Among the LBC1921 sample members who were tested at age 90, age-11 MHT scores were less spread (SD = 11.1) than they are in the Scottish population as a whole (SD = 15.5; Maxwell, 1961). After we applied Equation 1 from Wiberg and Sundström (2009) to the correlation of .54 to correct for this restriction of range, the estimate of the age-11/age-90 MHT correlation in an unrestricted sample was .67.
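The range-restriction correction can be illustrated with a short calculation. This is a sketch under a stated assumption: that Equation 1 in Wiberg and Sundström (2009) is the standard univariate correction for direct range restriction (often called Thorndike's Case 2), which uses only the observed correlation and the ratio of the unrestricted to the restricted standard deviation. The function name is illustrative.

```python
import math


def correct_range_restriction(r, sd_restricted, sd_unrestricted):
    """Correct a correlation for direct range restriction on one variable.

    Assumed formula (Thorndike Case 2):
        R = r * k / sqrt(1 - r^2 + r^2 * k^2),  k = SD_unrestricted / SD_restricted
    """
    k = sd_unrestricted / sd_restricted
    return r * k / math.sqrt(1.0 - r ** 2 + (r ** 2) * (k ** 2))


# Values reported in the text: r = .54, sample SD = 11.1, population SD = 15.5.
r_corrected = correct_range_restriction(0.54, 11.1, 15.5)
print(f"{r_corrected:.2f}")  # 0.67
```

With these inputs the formula reproduces the reported estimate of .67. When the two standard deviations are equal, the ratio k is 1 and the correlation is returned unchanged.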

Fig. 2. Moray House Test raw scores for the Lothian Birth Cohort of 1921 at ages 11 and 90 years.
Correlations between the age-11 (age-corrected) and age-90 MHT scores and age-90 scores on the other cognitive tests are shown in Table 1. The MHT scores at both ages correlated significantly with the scores on all of these other tests. The tests whose age-90 scores had the highest correlations with age-11 MHT score (i.e., r > .3) were the NART (.57), Raven’s Progressive Matrices (.50), the MMSE (.40), and the Digit Symbol test (.32). The tests whose age-90 scores had the highest correlations with age-90 MHT score were Raven’s Progressive Matrices (.75), the Digit Symbol test (.68), the MMSE (.63), and the Letter-Number Sequencing test (.61); the verbal-fluency test had the lowest correlation with age-90 MHT score (.38). We also examined the difference between each of these other tests’ correlations with age-11 and age-90 MHT scores (Table 1). In all cases, the correlation with the contemporaneous MHT score (i.e., at age 90) was higher than the correlation with the MHT score at age 11. The only tests for which the difference in the correlations was not significant were the NART and the verbal-fluency test, and the effect sizes for these two tests’ correlations with MHT at age 11 and age 90 were similar despite the almost-80-year lag in testing.
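The text does not say which test was used to compare each test's correlation with age-11 MHT score against its correlation with age-90 MHT score. One common choice for two dependent correlations that share a variable is Williams's t, sketched below as an assumption rather than as the authors' method. The example uses the reported values for Raven's Progressive Matrices (.75 with age-90 MHT, .50 with age-11 MHT) and the age-corrected age-11/age-90 MHT correlation of .54. The sample size of 101 is also an assumption, taken from the lower end of the range of ns reported for Table 1.

```python
import math


def williams_t(r_xy, r_xz, r_yz, n):
    """Williams's t for H0: rho_xy == rho_xz, where y and z are both correlated with x.

    r_yz is the correlation between the two competing predictors.
    Returns (t, df) with df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix of x, y, and z.
    det = 1.0 - r_xy ** 2 - r_xz ** 2 - r_yz ** 2 + 2.0 * r_xy * r_xz * r_yz
    r_bar = (r_xy + r_xz) / 2.0
    denom = 2.0 * ((n - 1) / (n - 3)) * det + r_bar ** 2 * (1.0 - r_yz) ** 3
    t = (r_xy - r_xz) * math.sqrt((n - 1) * (1.0 + r_yz) / denom)
    return t, n - 3


# x = Raven's at age 90, y = MHT at age 90, z = MHT at age 11 (n = 101 is assumed).
t_stat, df = williams_t(r_xy=0.75, r_xz=0.50, r_yz=0.54, n=101)
print(f"t({df}) = {t_stat:.2f}")
```

The resulting statistic would be referred to a t distribution with n − 3 degrees of freedom. A value well above 2 is in line with the statement that the difference for Raven's Progressive Matrices was significant.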
Table 1. Pearson Correlations Between Moray House Test (MHT) Scores at Ages 11 and 90 and Other Cognitive-Test Scores at Age 90 and Results of a Principal Component Analysis of These Scores
Note: See the Method section for details concerning the tests administered. Loadings greater than |.3| are indicated in boldface.
(a) All correlations were significant at the level of p < .01, with the exception of the correlation between delayed recall on the Logical Memory test and age-11 MHT score, which was significant at the level of p < .05 (ns ranged from 101 to 106). (b) This analysis was based on data from 93 participants with full data on all variables. Immediate recall on the Logical Memory test was not included in this analysis because of its high correlation (r = .906) with delayed recall on the same test.
We explored the correlations among age-11 MHT score, age-90 MHT score, and the other age-90 cognitive-test scores further using principal component analysis (Table 1). All tests had high (> .5) loadings on the first unrotated principal component. Eigenvalues suggested the extraction of two components. The first accounted for 45.6% of the total variance, and the second accounted for 11.8%. These components were rotated using direct oblimin rotation. Only two tests had loadings greater than .35 on Rotated Component 2: the MHT at age 11 (.89) and the NART at age 90 (.83). This appears to be a prior (crystallized) intelligence component. All of the other tests except the test of verbal fluency (.38) had loadings greater than .58 on Rotated Component 1. This appears to be a current (fluid) ability component. The verbal-fluency test had similar and modest loadings on the two rotated components. The correlation between the two rotated components was .40.
Discussion
The correlation between intelligence—as measured using MHT No. 12—tested at age 11 and age 90 years was .54 (95% CI = [.37, .67]). When corrected for restriction of range, the estimate was .67. Whether the correlation itself or its square is the effect size best suited to estimating the shared variation across the life course is a complex issue (Johnson, 2011, especially her Fig. 6). The mean score at age 90 was still between a quarter and a fifth of a standard deviation higher than the mean score at age 11, and considerably lower and more spread than the mean for the same participants at ages 79 and 87. MHT scores at ages 11 and 90 correlated significantly with scores on all other cognitive tests given at age 90. However, most of these other tests’ correlations with age-90 MHT scores were significantly higher than their correlations with age-11 MHT scores; this was true for tests of general cognitive status, nonverbal reasoning, verbal declarative memory, processing speed, and working memory. Age-11 and age-90 MHT scores had similar correlations with scores on the NART, a test of irregular-word pronunciation, at age 90. At age 11, much of the vocabulary represented in the NART would not yet have developed.
Our main aim was to provide a correlation between scores on the same validated test from testing occasions about as far apart in the human life course as is ever likely to be examined. The most up-to-date data available indicate that about 14% of men and 24% of women from the LBC1921 generation in Scotland survived to age 90 (Office of National Statistics, 2011). We have already published results for the LBC1921 at age 87 (Gow et al., 2011). However, we envisage that this age-90 report—showing mean MHT scores similar to what they were at age 11—will be the last such report with a moderately sized sample (N ~ 100), and we do not foresee reports from other cohorts that will have such a longitudinal spread. We have also provided new and substantial evidence for the concurrent validity of the MHT in old age. It has a high correlation with multiple domains of fluid-type cognitive ability, especially as measured by Raven’s Progressive Matrices, which assesses nonverbal reasoning and has a high loading on fluid general cognitive ability (Carroll, 1993, p. 597; Jensen, 1998, p. 38). The MHT has long been known to have concurrent validity at age 11; its correlation with the Stanford revision of the Binet test was approximately .80 in a sample drawn from the same background population (i.e., children born in 1921 and attending schools in Scotland in June 1932; Scottish Council for Research in Education, 1933). The present study’s longitudinal analyses also provide evidence of the relative predictive validity of the MHT in childhood for scores on numerous cognitive tests at age 90.
The extensive medical information we have reported here shows that by age 90, the LBC1921 exhibited considerable pathology. We did not model the data to examine how medical and other factors might contribute to change in cognitive ability from age 11 to age 90. Our aim was to report the stability of cognitive differences in the acknowledged presence of—and despite—this heterogeneity. In future reports, we will examine contributions to cognitive changes in analyses that use data collected in all four waves of the LBC1921, which will have greater power to detect such contributions. To the extent that lower mental ability in youth is associated with greater cognitive decline in older age, including the few participants with possible early dementia could have inflated our estimate of the true correlation between age-11 and age-90 cognitive ability. However, there is as yet no unequivocal evidence that mental ability in youth is associated with individual differences in the amount of normal cognitive aging (Gow et al., 2012), or with the likelihood of developing the most common form of dementia (McGurn, Deary, & Starr, 2008).
Another factor that must be acknowledged is that the age-11 MHT scores of the LBC1921 sample members who were tested at age 90 were higher and less spread (M = 49.0, SD = 11.1) than the scores of the Scottish population (M = 34.5, SD = 15.5; Maxwell, 1961). We provided an estimate of what the age-11/age-90 correlation might be in an unrestricted sample (i.e., .67). We are cautious, though, in suggesting that it is possible to recruit a sample representative of the original population or that one should necessarily correct the age-11/age-90 MHT correlation, given this restriction of range. It is well documented that the Edinburgh area is the highest MHT-scoring area in Scotland, so the LBC1921 sample’s mean childhood MHT score is not the mean for the Scottish population (Deary et al., 2009, p. 26). Also, score on the MHT at age 11 is a predictor of survival to old age (Deary et al., 2004; Whalley & Deary, 2001). Therefore, older samples will unavoidably have higher means and less spread, so when examining a “correction” of the correlation based on the childhood standard deviation, one must remember that older samples no longer have that same distribution of original cognitive ability, irrespective of how comprehensively they have been recruited.
Nevertheless, there are probably still some factors that make the raw estimate of the age-11/age-90 MHT correlation an underestimate. For example, lower cognitive ability is associated with attrition from longitudinal studies (Dykiert, Gale, & Deary, 2009; Nishiwaki, Clark, Morton, & Leon, 2005). However, we note that among the LBC1921 participants who had MHT scores at age 11 and age 90, the standard deviation was larger at age 90 (14.4) than at age 11 (11.1). It is possible that whatever led to the increased variance in MHT scores—chronic illnesses, for example—was associated with MHT scores at age 11. It is known that childhood intelligence is associated with diverse illness and health outcomes (Deary et al., 2010; Gottfredson, 2004; Lubinski, 2009). However, if there were such a magnifying effect on the correlation over time, one would also expect to see an association between childhood cognitive ability and cognitive change within old age, and we have not found such an association to date (Gow et al., 2012; Gow et al., 2011). Overall, then, the raw correlation of .54 offers, with these caveats, a lower-bound estimate of the association between cognitive ability in childhood and old-old age among relatively healthy (i.e., free from acute, severe illness), largely nondemented, mostly community-dwelling people at age 90.
In conclusion, individual differences in intelligence as assessed by the MHT show moderately high stability from childhood to old-old age. Although one should bear in mind the presence of measurement error and the possible attenuation of the effect size caused by restriction of range in the sample, the nonstable variance provides a possibly sizable target for researchers seeking the causes of cognitive change across the life course.
Acknowledgements
We thank the participants of the Lothian Birth Cohort of 1921 and the research nurses and staff of the Wellcome Trust Clinical Research Facility. We thank Paul Redmond for database management. We are grateful to the Scottish Council for Research in Education for access to the Scottish Mental Survey of 1932.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
The data collection for the Lothian Birth Cohort of 1921 at age 90 (Wave 4) was funded by the Scottish Government’s Chief Scientist Office (Grant No. ETM/55). The work was undertaken in The University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology, which is part of the cross-council Lifelong Health and Wellbeing Initiative (Grant No. G0700704/84698). Funding for the Centre from the Biotechnology and Biological Sciences Research Council (BBSRC), Engineering and Physical Sciences Research Council (EPSRC), Economic and Social Research Council (ESRC), and Medical Research Council (MRC) is gratefully acknowledged.
