Abstract
Taking into account both the severity and the consistency of performances obtained on memory tests by patients with amnestic mild cognitive impairment (aMCI) could improve the power to predict their progression to Alzheimer’s disease. For this purpose, we constructed the Episodic Memory Score (EMS), which is obtained by subdividing into tertiles the performances obtained at baseline in verbal (RAVLT) and visual episodic memory (Rey-Osterrieth Figure-delayed recall) and assigning a score ranging from 1 (worst result) to 3 (best result) to results falling within each tertile. The EMS was computed for each patient by summing the tertile score obtained on each memory task, so that the total score ranged from 4 (worst performance) to 12 (best performance). The aMCI sample consisted of 198 subjects who completed the two-year follow-up, at the end of which 55 subjects had converted to dementia. The mean EMS score obtained by aMCI converters was significantly lower than that of aMCI-stable patients. In detecting conversion to dementia, the EMS was compared with the individual memory scores obtained at baseline by computing ROC curves and estimating the respective area under the curve (AUC). The EMS had a larger AUC than the individual memory scores. At baseline, aMCI converters performed worse than non-converters not only on memory tasks, but also on executive function tasks. However, in a multiple variables logistic regression analysis in which all scores showing statistically significant differences between aMCI-converters and aMCI-stable were entered, the EMS was the only reliable predictor of progression from aMCI to dementia.
INTRODUCTION
Petersen et al. [1] proposed the construct of mild cognitive impairment (MCI) to identify patients in the pre-clinical stage of dementia who are at risk of developing Alzheimer’s disease (AD). These authors provided the following simple set of criteria to identify MCI patients: (a) the presence of subjective and objective memory disorders; (b) the absence of obvious impairments in other cognitive domains; (c) the absence of deficiencies in daily living activities; (d) the absence of dementia. However, even if subjects identified as MCI (and in particular as amnestic MCI (aMCI) patients) on the basis of these criteria had an increased risk of progressing to dementia, most remained stable or even reverted from MCI to a ‘normal’ state. Furthermore, estimates of the annual conversion rate to dementia were very variable, ranging from less than 4% [2–4] to 20–30% [5–8]. Therefore, several neuropsychological investigations tried to single out the best methods of identifying MCI patients who have a higher risk of converting to AD. These investigations, recently reviewed by Gainotti et al. [9], showed the following: (a) measures of delayed recall are the best neuropsychological predictors of conversion from MCI to AD [10–18]; (b) pathological scores on more than one memory test at baseline increase the accuracy of this prediction [19, 20]; and (c) the more stringent the measures of memory impairment, the better the prediction of conversion to dementia [21, 22].
In the present study, we focused on the advantage of integrating the results obtained on more than one long-term memory task in improving the prediction of conversion from aMCI to AD. Both previous investigations, i.e., Perri et al. [19] and Loewenstein et al. [20], had indeed focused more on return to normality than on conversion to AD. In fact, these authors showed that subjects classified as aMCI on the basis of only one memory test at baseline rarely converted to AD and frequently returned to normality, whereas MCI patients who had obtained pathological scores on more than one memory test at baseline frequently converted to AD and rarely showed a return to normality at the follow-up. These data demonstrated that the association of results obtained on two or more long-term memory tasks increases the ability to predict the progression of patients with an amnestic form of MCI. Furthermore, they suggested that the integration (rather than the simple association) of these results might further improve the prognostic ability of the individual tests.
To check this hypothesis, in a follow-up study conducted in a large group of carefully diagnosed aMCI patients, we took into account results obtained at baseline on four long-term episodic memory tasks, i.e., immediate recall, delayed recall, and delayed recognition of Rey’s Auditory Verbal Learning Test (RAVLT) [23] and the delayed reproduction of the Rey-Osterrieth Complex Figure [24]. We surmised that the ability to predict conversion from aMCI to AD with a score integrating results obtained on these four tasks would be greater than the predictive capacity shown by each individual long-term memory task.
MATERIALS AND METHODS
Subjects
The study sample consisted of consecutively enrolled patients who had been referred to the Memory Clinic of the Catholic University and the Behavioral Neurology laboratory of the IRCCS Santa Lucia with the complaint of memory disturbances between January 1, 2010 and December 31, 2011. All subjects had to: 1) fulfill the current criteria for MCI [1]; and 2) perform below the 5th percentile (i.e., 1.67 standard deviations [SD] below the mean) of the corresponding Italian normative population [25] on the RAVLT delayed recall. Exclusion criteria were the presence of definite central nervous system diseases; evidence of chronic vascular leucoencephalopathy (Fazekas score ≥2) [26]; previous traumatic brain injury; previous prolonged loss of consciousness due to anoxic damage; alcohol abuse; medical conditions potentially associated with cognitive disturbances (i.e., renal or hepatic failure; thyroid dysfunction; folate and/or vitamin B12 deficiency); treatment with psychotropic drugs (antidepressants, antipsychotics, systematic therapy with benzodiazepines).
To exclude major psychiatric diseases (mood disorders, psychosis, severe anxiety-spectrum disorders) patients satisfying the above-mentioned criteria underwent an unstructured psychiatric interview, which was conducted separately with the patient and his/her principal caregiver.
Furthermore, 60 healthy subjects (HS) were also recruited; the sample was composed of proxies (mainly spouses) of MCI subjects. None had memory disturbances and all performed normally on the neuropsychological assessment; healthy subjects also had to satisfy all of the exclusion criteria for the MCI group.
All subjects gave their informed consent to participate in this study. The study was approved by local ethical committees and was performed in accordance with the Declaration of Helsinki.
Neuropsychological assessment
Subjects underwent a baseline neuropsychological examination, which included the Mini-Mental State Examination (MMSE) [27] and a comprehensive test battery [25]. The latter included the RAVLT (immediate and delayed recall; delayed recognition) [23]; the Rey-Osterrieth Figure (ROCF; copy and delayed recall) [24]; the Raven’s Progressive Matrices (PM’47) [28]; the Stroop’s test – short version [29]; the Multiple Features Target Cancellation (MFTC) [30, 31] test; the Phonological Verbal Fluency (PVF) [25] test; the Semantic Verbal Fluency (SVF) test [32]; copying drawings without and with landmarks [25]; the objects and actions naming task [33]; and the digit span forward and backward [34].
Follow-up
The study design included a two-year follow-up. During this period, the patients underwent a complete neurological and medical examination and a neuropsychological assessment every six months. Diagnoses at the follow-up examination were made by two neurology specialists who were blinded as to results of the baseline neuropsychological examination. The diagnosis of dementia was verified using the DSM-IV-TR criteria [35]. Furthermore, the diagnosis of a specific type of dementia was made in keeping with the corresponding clinical criteria; the following dementias were taken into account: AD [36], dementia with Lewy bodies [37], frontotemporal dementia [38], and primary progressive aphasia [39].
The Episodic Memory Score
The EMS was obtained by subdividing into tertiles the scores obtained by the sample in verbal (RAVLT-immediate and delayed recall; delayed recognition) and visual episodic memory (Rey-Osterrieth Figure-delayed recall). A score equal to the corresponding tertile was assigned for each task (e.g., the 1st tertile corresponded to a score of 1), and the EMS was computed by summing the tertile scores obtained on the four memory tests; thus, the total score ranged from 4 (worst performance) to 12 (best performance). Table 1 reports the boundaries of each tertile of the episodic memory tasks that were used to obtain the EMS.
Tertiles were preferred to quartiles mainly because the distribution of scores in our study sample was mainly positively skewed. With quartiles, the EMS would have ranged from 4 to 16 and only a few subjects would have fallen among the lowest scores of the EMS distribution, which would have made the interpretation of results based on a quartile split even more hazardous. Nevertheless, we repeated the same analyses using a quartile split and the final results were similar. See the Supplementary Material for the analyses with quartiles.
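The scoring procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the rank-based rule used to derive the tertile boundaries, the handling of ties (a value equal to a boundary goes to the lower tertile), the function names, and the toy data are all assumptions made for the example. The boundaries actually used in the study are those reported in Table 1.

```python
from typing import Dict, List, Sequence, Tuple


def tertile_bounds(values: Sequence[float]) -> Tuple[float, float]:
    """Return (lower, upper) cut points splitting a sample into three groups.

    Assumption: cut points are the sample values at ranks ceil(n/3) and
    ceil(2n/3); the study's own boundaries are given in its Table 1.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[(n + 2) // 3 - 1]      # rank ceil(n/3), zero-based index
    upper = ordered[(2 * n + 2) // 3 - 1]  # rank ceil(2n/3), zero-based index
    return lower, upper


def tertile_score(value: float, bounds: Tuple[float, float]) -> int:
    """Map a raw score to 1 (worst tertile), 2, or 3 (best tertile)."""
    lower, upper = bounds
    if value <= lower:
        return 1
    if value <= upper:
        return 2
    return 3


def episodic_memory_scores(sample: Dict[str, List[float]]) -> List[int]:
    """Sum the tertile scores across tasks for every subject.

    `sample` maps each task name to the raw scores of all subjects,
    listed in the same subject order for every task.
    """
    bounds = {task: tertile_bounds(scores) for task, scores in sample.items()}
    n_subjects = len(next(iter(sample.values())))
    return [
        sum(tertile_score(sample[task][i], bounds[task]) for task in sample)
        for i in range(n_subjects)
    ]


if __name__ == "__main__":
    # Hypothetical raw scores for six subjects (not data from the study).
    toy_sample = {
        "ravlt_immediate": [18, 22, 25, 27, 30, 35],
        "ravlt_delayed": [0, 1, 2, 3, 4, 6],
        "ravlt_recognition": [0.60, 0.70, 0.72, 0.80, 0.85, 0.95],
        "rocf_delayed": [1.0, 6.0, 2.5, 4.0, 9.0, 3.0],
    }
    print(episodic_memory_scores(toy_sample))
```

With four tasks, every subject receives a total between 4 and 12, which matches the range stated for the EMS.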
Statistical analysis
Group comparisons were performed by means of t-tests for independent groups, with Levene’s test for equality of variance, and Pearson’s χ2 with Fisher’s exact test when required.
The predictive role of the EMS and other neuropsychological measures was assessed by means of a ‘multiple variables logistic regression model’, which included all variables that showed statistically significant differences in univariate statistics. In order to provide a more reliable prediction model, a bootstrap approach was applied.
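The general idea of bootstrapping a logistic regression can be illustrated as follows. This is only a sketch under strong simplifying assumptions: it uses a single predictor (an EMS-like score) rather than the multiple variables entered in the study, fits the model with a hand-written Newton-Raphson routine with a very small ridge term added for numerical stability, resamples subjects with replacement, and summarizes the slope with a percentile interval. The source does not specify the software or the exact bootstrap procedure, so none of these choices should be read as the authors' method, and the data are invented.

```python
import math
import random


def fit_logistic(x, y, n_iter=25, ridge=1e-3):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; return (b0, b1).

    The small ridge term is an assumption added only to keep the 2x2
    system solvable when a resample is (nearly) perfectly separated.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0, g1 = -ridge * b0, -ridge * b1   # gradient of penalised log-likelihood
        h00, h01, h11 = ridge, 0.0, ridge   # negative Hessian (symmetric 2x2)
        for xi, yi in zip(x, y):
            z = max(-30.0, min(30.0, b0 + b1 * xi))  # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-8:
            break
    return b0, b1


def bootstrap_slope_ci(x, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope b1 (cases resampled)."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if 0 < sum(ys) < n:  # both outcomes must be present to fit the model
            slopes.append(fit_logistic([x[i] for i in idx], ys)[1])
    slopes.sort()
    lo = slopes[int((alpha / 2) * n_boot)]
    hi = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical scores: 12 converters (y=1) and 18 stable subjects (y=0).
    ems = [4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9,
           5, 6, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 12, 12, 12]
    converted = [1] * 12 + [0] * 18
    print("slope:", fit_logistic(ems, converted)[1])
    print("95% percentile interval:", bootstrap_slope_ci(ems, converted, n_boot=1000))
```

A predictor would be considered stable in this sketch when its bootstrap interval excludes zero; a negative slope means that lower scores are associated with a higher probability of conversion.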
RESULTS
Sample characteristics
The initial aMCI sample consisted of 242 subjects (female subjects: 116; 47.9%); mean age was 72.22 years (SD = 7.635), mean literacy, 10.63 years (SD = 4.731), and mean time elapsed from the onset of memory disturbances was 27.95 months (SD = 21.906). Of the initial sample, 198 subjects (82%) completed the two-year follow-up. The reasons for dropping out were mainly unwillingness to undergo further neuropsychological evaluations (18 subjects); development of medical conditions that made the neuropsychological evaluation for the purpose of this study difficult or unsuitable (11 subjects); onset of exclusionary conditions (i.e., stroke, traumatic brain injury; 10 subjects); moving to another city (3 subjects); and death (2 subjects).
Table 2 summarizes the demographic features and neuropsychological performances at baseline of the aMCI patients who completed the follow-up in comparison with the subjects who dropped out. As shown, demographic features (age, literacy, and gender) were not significantly different between the two groups; as for memory disturbances, there was a trend for subjects who remained in the study to have a longer time since onset compared to patients who were lost at follow-up (p = 0.06); furthermore, subjects who remained in the study obtained slightly higher scores on the copy of figures (p = 0.081) and object naming tasks (p = 0.069).
Results obtained at the baseline by aMCI and healthy subjects
The following analyses were carried out taking into account only the aMCI subjects who remained in the study.
As expected, compared to HS, the aMCI group performed worse on all the episodic memory tests (RAVLT immediate recall: 24.83 ± 6.514 versus 38.02 ± 7.866; |t|256 = 13.06; p < 0.001; RAVLT delayed recall: 2.07 ± 1.913 versus 8.58 ± 2.465; |t|256 = 21.52; p < 0.001; RAVLT recognition accuracy: 0.74 ± 0.106 versus 0.91 ± 0.067; |t|256 = 11.76; p < 0.001; ROCF delayed recall: 4.68 ± 3.769 versus 12.41 ± 8.256; |t|256 = 10.16; p < 0.001). Furthermore, aMCI subjects performed worse than HS on the SVF (13.70 ± 2.726 versus 15.08 ± 3.810; |t|256 = 2.75; p = 0.006), objects naming (28.31 ± 2.537 versus 29.80 ± 2.110; |t|256 = 5.14; p < 0.001), the Stroop’s test (errors on interference: 2.22 ± 2.303 versus 0.90 ± 1.674; |t|256 = 4.11; p < 0.001), and the MFTC (accuracy: 0.92 ± 0.071 versus 0.94 ± 0.06; |t|256 = 2.64; p = 0.009).
Episodic Memory Score: Comparison between aMCI and healthy subjects
The mean EMS of aMCI subjects was significantly lower than that of HS (7.51 ± 2.268 versus 11.53 ± 0.892; |t|256 = 13.41; p < 0.001). Furthermore, HS had a narrower distribution of scores compared to aMCI (see Fig. 1). As shown, the distribution of EMS had a bimodal appearance in aMCI subjects, whereas there was a clear negative skew in HS.
Figure 1 displays the frequency of individual EMS in aMCI and HS. As shown, none of the HS obtained an EMS lower than 8, and only 2 subjects obtained an EMS lower than 10. The distribution of scores was highly heterogeneous in aMCI. Overall, there was a large, statistically significant difference in the distribution of scores (χ2 = 149.93; p < 0.001).
Comparison between converters and subjects with stable forms of aMCI
At the two-year follow-up, 55 subjects (27.8% of the aMCI sample) converted to dementia; all of them were affected by AD.
Table 3 reports the comparison of demographic features and neuropsychological performances between aMCI-converters and aMCI-stable. As shown, aMCI-converters were older than aMCI-stable and at the baseline examination they performed worse on most of the neuropsychological tests (i.e., MMSE, RAVLT, ROCF, copy of figures, digit span forward and backward, the Stroop test, PVF, SVF, and actions naming).
Episodic Memory Score: Comparison between converters and subjects with stable forms of aMCI
The mean EMS score obtained by aMCI converters was significantly lower than that of aMCI-stable subjects (6.00 ± 1.440 versus 8.09 ± 2.264; |t|196 = 7.710; p < 0.001). The lower part of Table 4 displays the distribution of individual EMS scores in aMCI-converters and aMCI-stable subjects. As shown, none of the aMCI subjects who converted to dementia obtained an EMS score higher than 9, with over 80% scoring lower than 8. The distribution of scores was significantly different between the two groups (χ2 = 36.54; p < 0.001).
Table 4 reports the sensitivity, specificity, positive predictive value and negative predictive value of all possible cut-off points of the EMS.
Comparison between the EMS and individual memory scores in predicting the conversion
Table 4 reports the comparison between EMS and individual memory scores obtained at baseline in detecting conversion to dementia. The scores obtained on RAVLT and ROCF delayed recall were corrected for age and literacy according to the respective normative studies [25, 38]. For most of the possible cut-off points, the EMS showed high sensitivity, with at least acceptable levels of specificity, which were always higher than those observed in the individual memory tasks.
The classification accuracy of the EMS and memory tasks was also compared by computing ROC curves and estimating the respective area under the curve (AUC). The EMS had the largest AUC (0.767; 95% CI: 0.7017–0.8327), which was significantly higher than the AUC of RAVLT–delayed recall (0.618; 95% CI: 0.5303–0.7048; χ2 = 10.86; p = 0.001), RAVLT–delayed recognition accuracy (0.651; 95% CI: 0.5678–0.7347; χ2 = 9.88; p = 0.002), and ROCF–delayed recall (0.537; 95% CI: 0.4504–0.6240; χ2 = 22.33; p < 0.001). The comparison between the AUCs of the EMS and RAVLT–immediate recall showed a near-significant difference (0.687; 95% CI: 0.6099–0.7648; χ2 = 3.11; p = 0.078).
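For readers less familiar with the AUC, it can be read as the probability that a randomly chosen converter has a lower score than a randomly chosen stable patient, with ties counted as one half. The sketch below computes it by comparing all pairs. It is a generic illustration on invented scores, assuming that lower values indicate higher risk; it does not reproduce the confidence intervals or the χ2 comparison between AUCs reported above, and the function name is hypothetical.

```python
from typing import Sequence


def auc_lower_is_riskier(converter_scores: Sequence[float],
                         stable_scores: Sequence[float]) -> float:
    """Area under the ROC curve when LOW scores predict conversion.

    Equivalent to the Mann-Whitney statistic divided by the number of
    converter/stable pairs: each pair counts 1 if the converter scored
    lower, 0.5 if the two scores are tied, and 0 otherwise.
    """
    wins = 0.0
    for c in converter_scores:
        for s in stable_scores:
            if c < s:
                wins += 1.0
            elif c == s:
                wins += 0.5
    return wins / (len(converter_scores) * len(stable_scores))


if __name__ == "__main__":
    # Hypothetical EMS values (not data from the study).
    converters = [4, 5, 6]
    stable = [6, 8, 10]
    print(auc_lower_is_riskier(converters, stable))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.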
We did not provide a cut-off point indicating the most appropriate levels of EMS, because apart from the general observation that the lowest EMS values (which corresponded to the most pathological levels of the memory scores) gave the highest percentage of correct classifications, with the increase of the EMS values there was a trade-off between sensitivity and specificity. In fact, sensitivity was very low (16%) and specificity was very high (94%) when the cut-off was placed between 4 and 5 of the EMS values, whereas more balanced values of sensitivity (73%) and specificity (71%) were obtained when the cut-off was placed between 6 and 7 of the EMS values, even though the percentage of correct classifications was almost the same (72.7% for a cut-off <5 and 71.2% for a cut-off <7). With increasing EMS values the sensitivity still increased, but lower levels of specificity and of prognostic accuracy were obtained. For instance, when the cut-off was placed between 8 and 9, the sensitivity increased to 92%, but the specificity decreased to 45% and the percentage of correct classifications did not reach 60%.
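The trade-off described above follows directly from how these indices are defined. The sketch below, offered only as an illustration, computes sensitivity, specificity, predictive values, and the percentage of correct classifications for a given cut-off (a subject is classified as a likely converter when the score falls below the cut-off), and also recomputes overall accuracy from a sensitivity/specificity pair and the group sizes. The function names and the toy scores are assumptions; the group sizes of 55 converters and 143 stable subjects (198 minus 55) come from the text.

```python
from typing import Dict, Sequence


def classification_metrics(converter_scores: Sequence[float],
                           stable_scores: Sequence[float],
                           cutoff: float) -> Dict[str, float]:
    """Diagnostic indices when 'score < cutoff' predicts conversion."""
    tp = sum(1 for s in converter_scores if s < cutoff)  # converters flagged
    fn = len(converter_scores) - tp                      # converters missed
    fp = sum(1 for s in stable_scores if s < cutoff)     # stable flagged
    tn = len(stable_scores) - fp                         # stable cleared

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
    }


def accuracy_from_rates(sensitivity: float, specificity: float,
                        n_converters: int, n_stable: int) -> float:
    """Overall proportion of correct classifications from the two rates."""
    correct = sensitivity * n_converters + specificity * n_stable
    return correct / (n_converters + n_stable)


if __name__ == "__main__":
    # Hypothetical EMS values (not data from the study).
    toy_converters = [4, 5, 6, 6, 7, 9]
    toy_stable = [5, 7, 8, 8, 9, 10, 11, 12]
    print(classification_metrics(toy_converters, toy_stable, cutoff=7))

    # Rounded rates quoted in the text, with 55 converters and 143 stable.
    for sens, spec in [(0.16, 0.94), (0.73, 0.71), (0.92, 0.45)]:
        print(sens, spec, round(accuracy_from_rates(sens, spec, 55, 143), 3))
```

Applied to the rounded percentages quoted in the text, the second function gives accuracies of approximately 0.72, 0.72, and 0.58 for the three cut-offs, in line with the reported pattern; the small differences from the published figures presumably reflect rounding of the sensitivity and specificity values.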
Multiple variables logistic regression
The EMS was entered as an independent variable in a multiple variables logistic regression analysis in which all variables that showed statistically significant differences between aMCI-converters and aMCI-stable were entered (namely, MMSE, digit span forward and backward, Stroop’s test, phonological and semantic verbal fluency, and verbs naming), along with age. In order to provide a stable model, a bootstrap resampling technique was used in which 5000 iterations were performed. As shown in Table 5, the EMS was the only reliable predictor of progression from aMCI to dementia.
DISCUSSION
Our research substantially confirms that integrating the results obtained on verbal and non-verbal episodic memory tasks improves the prediction of conversion from aMCI to AD dementia compared with the results of individual memory tasks. The advantage of an integrated analysis of memory scores over the analysis of a single test rests on several theoretical reasons: a) it allows the final evaluation to include the different factors that could determine performance on memory tests (i.e., learning, forgetting, recollection ability, and visual versus verbal memory discrepancies); b) it provides the opportunity to grade the severity of memory performances and to weight it accordingly, beyond the criterion simply supplied by the cut-off; and c) it reduces the chance error that is clearly more frequent when only one test is administered. However, if we continue to analyze the results obtained on individual tests, we must also acknowledge that the difference between converters and non-converters involves, in addition to scores obtained on the EMS, scores of general mental impairment (MMSE) and scores denoting an impairment of executive functions (e.g., ROCF: copy; Digit span backward; Stroop’s test – interference time and errors; PVF), as many previous investigations [e.g., 40–45] have already shown. Nevertheless, we would like to stress that when the EMS was entered as an independent variable in a multiple variables logistic regression analysis, together with all the variables that showed statistically significant differences between aMCI-converters and aMCI-stable subjects (i.e., MMSE, digit span forward and backward, Stroop’s test, phonological and semantic verbal fluency, verbs naming, and age), the EMS was the only reliable predictor of progression from aMCI to dementia.
Another noteworthy finding of our research is that among the individual memory scores the one which best distinguished converters from non-converters was the immediate (rather than the delayed) recall of the RAVLT. This is probably because the scores obtained on the delayed recall of the RAVLT had been used (with a hard cut-off point) to identify aMCI patients. There was, therefore, a floor effect which made it difficult to distinguish between converters and non-converters using delayed recall of the RAVLT, whereas scores obtained on immediate recall were much more widely distributed and, for the first and the intermediate part of the list, also came from the long-term memory store [23, 47].
It is worth noting that our purpose was more to propose a method than to provide a diagnostic data set. Specifically, the purpose of our paper is to suggest a method of scoring that encompasses both the severity and the consistency of memory performances, providing adjunctive information about the risk of progression from MCI to AD. As a matter of fact, our method could also be used by putting together categorized scores obtained on the free and cued, immediate and delayed recall of the ‘Grober & Buschke Selective reminding test’ or of the ‘California Verbal learning test’, since it has been shown that providing support for the semantic encoding of the memoranda and category cues at the time of retrieval does not improve results obtained on recall by AD patients (see Gainotti et al. [9] for a review). Thus, the use of ‘global memory scores’, from whatever memory test procedure, could also help to measure the severity and the consistency of the performances, which are probably the most important factors to be taken into account in predicting the progression to dementia.
A final important point that must be discussed concerns the cut-off points to be chosen for practical application of the EMS values, because we have seen that the greatest prognostic accuracy is obtained using low cut-off points, but that when they increase there is a trade-off between greater sensitivity and lower specificity. Perhaps the best way to use the EMS values is to integrate them with the other information available to the clinician. If this information is specific, but not sensitive, a rather high cut-off could be selected, whereas a very low cut-off could be chosen if this information is sensitive, but not very specific. For instance, the use of radioligands for amyloid, which are sensitive but not specific markers of conversion to AD [48], could be combined with low EMS values, which are very specific but not very sensitive markers of the same process. On the other hand, the use of 18F-FDG PET, which is a marker more specific than sensitive [49], could be combined with high EMS values, which are more sensitive than specific markers of conversion from aMCI to AD.
