Abstract
Background:
Episodic memory testing is fundamental for the diagnosis of Alzheimer’s disease (AD). Although the Free and Cued Selective Reminding Test (FCSRT) is widely used for this purpose, it may not be sensitive enough for early detection of subtle decline in preclinical AD. The Memory Binding Test (MBT) intends to overcome this limitation.
Objectives:
To analyze the test-retest reliability of the MBT and its convergent validity with the FCSRT.
Methods:
Thirty-six cognitively healthy participants of the ALFA Study, aged 45 to 65, were included in the test-retest study and 69 in the convergent validity analysis. They were assessed twice, 6 ± 2 weeks apart. Test-retest reliability was determined by calculating the intra-class correlation coefficient (ICC). Score differences were studied by computing the mean percentage of score variation between visits and visualized with Bland-Altman plots. Convergent validity was determined by Pearson’s correlations.
Results:
ICC values in the test-retest reliability analysis of the MBT direct scores ranged from 0.64 to 0.76. Subjects showed consistent practice effects, with mean score increases between 10% and 26%. Pearson correlations between MBT and FCSRT direct scores showed r values between 0.40 and 0.53. The FCSRT displayed ceiling effects not observed in the MBT.
Conclusions:
The MBT shows adequate test-retest reliability and overall moderate convergent validity with the FCSRT. Unlike the FCSRT, the MBT does not have ceiling effects and it may therefore be especially useful in longitudinal studies, facilitating the measurement of subtle memory performance decline and the detection of very early AD.
INTRODUCTION
In contrast with age-related memory impairment, Alzheimer’s disease (AD) is characterized by a recall deficit that does not significantly improve with cueing or recognition procedures after effective encoding of information, which has been defined as the “amnestic syndrome of the hippocampal type” [1]. Recent diagnostic criteria for AD include the detection of this type of memory impairment [2, 3]. Certain memory tests, such as the Free and Cued Selective Reminding Test (FCSRT) [4], sensitively detect memory features indicative of subtle hippocampal damage [5, 6]. Therefore, the FCSRT is commonly used as a screening measure in clinical trials targeting prodromal AD [7, 8]. However, neither the FCSRT nor the FCSRT-IR, which adds immediate recall during the learning process [7, 9], can detect preclinical memory impairment. In this scenario, the Memory Binding Test (MBT), previously referred to as the Memory Capacity Test (MCT), was designed to improve the detection of early memory impairment when memory performance is still within the normal range [10].
In the FCSRT, participants learn one list of 16 items in relation to 16 category cues and are then asked to recall those items both in an unstructured manner (free recall) and in relation to the category cues (cued recall). Free recall is the most sensitive measure to early AD; ceiling effects limit the sensitivity of cued recall [6, 11]. In the MBT, participants learn not one but two lists of 16 words each in relation to the same set of 16 category cues. Participants are ultimately asked to recall both items for each category cue. Recalling the separately learned exemplars in relation to their shared category cue requires binding, that is, the ability to form associations, which is critical for memory formation [12]. Thus, the MBT mainly differs from the FCSRT in its reliance on cued recall over free recall, in the elimination of ceiling effects by providing two exemplars for each cue and, finally, in its requirement for associative binding.
Associative binding represents a conjunctive association that can facilitate familiarity and within-domain associations [13]. Deficits in associative binding, also referred to as relational binding, have been related to damage in the hippocampus and related medial temporal lobe structures [14–16]. Effective binding should represent an advantage to learn words that, despite belonging to different lists, share the same semantic category in pairs. Persons in the earliest stages of AD may have reductions in memory binding ability [17] showing greater deficits in tests that require binding than in others that require the recollection of unrelated information. In healthy elderly, this pattern is not observed [18]. Therefore, the MBT may be sensitive to early memory impairment [10].
Although there is an extensive literature on normative data and the psychometric properties of the FCSRT, such as the influence of demographic factors [19–21], test-retest reliability and construct validity [7, 9], and diagnostic and predictive utility [22, 23], few studies have tested the MBT. These studies have shown that the MBT is more sensitive than other memory measures for the detection of subtle memory impairment in individuals in the preclinical stage of AD and that it correlates better with brain amyloid deposition [24–27]. Further, we recently contributed reference data and correction factors for sociodemographic characteristics of the adapted Spanish form A of the MBT in a midlife Spanish sample mainly consisting of first-degree descendants of AD patients [28] (form B is currently under development).
Nevertheless, the psychometric properties of the MBT are still to be characterized. It is important to analyze the MBT test-retest reliability to provide an estimate of the correlation between two scores from the same test administered at two different time points [29]. Moreover, since both the FCSRT and the MBT have been designed to detect what has been referred to as “genuine” memory impairment of the Alzheimer type [4], it is of high interest to study their convergent validity. In this scenario, the present study has two main aims: first, to analyze the test-retest reliability of the MBT and, second, to describe the convergent validity of the MBT with the FCSRT.
MATERIALS AND METHODS
Participants
Participants were members of a wider research infrastructure: the ALFA Study, for ALzheimer and FAmilies (Clinicaltrials.gov Identifier: NCT01835717). For more information and details on inclusion and exclusion criteria, please refer to the above-mentioned registry. Briefly, to become a member of the ALFA Study, participants had to be between 45 and 75 years of age and were excluded if they did not pass a neuropsychological screening (detailed below), presented any medical condition that could interfere with cognition or with the results of the study or had relevant neurological conditions or major psychiatric disorders.
The study was approved by the Ethics Committee of the “Parc de Salut Mar” (Barcelona, Spain) and conducted in accordance with the directives of the Spanish Law 14/2007, of 3rd of July, on Biomedical Research. All participants accepted the study procedures by signing an informed consent and had a close relative, who also signed an informed consent, volunteering to participate in the functional assessment of the participant.
Procedure and materials
Both studies (test-retest reliability and convergent validity) consisted of two visits separated by a period of 6 ± 2 weeks. This interval was chosen to be short enough to ensure the cognitive stability of the participants but, at the same time, long enough to minimize the learning and retention effects of the test material. Visit 1 was the baseline visit of the parent study that, together with other procedures (such as obtaining the clinical history, basic sociodemographic data, and a blood sample for further genetic characterization), included a neuropsychological screening and a cognitive test battery in which either the MBT or the FCSRT was administered. Visit 2 solely consisted of the administration of the neuropsychological screening and the cognitive test battery.
The neuropsychological screening consisted of the following tests (their corresponding exclusion cutoffs are indicated): Mini-Mental State Examination (MMSE) [30, 31], cutoff <26; Memory Impairment Screen (MIS) [32, 33], cutoff <6; verbal semantic fluency (naming animals) [34, 35], cutoff <12; Time Orientation of the Test Barcelona II (TO TB-II) [36], cutoff <68; Clinical Dementia Rating Scale (CDR) [37], cutoff >0; and Goldberg Anxiety and Depression Scale (GADS) [38, 39]. The GADS was used for mood state screening purposes. Whenever the scores were over the cutoffs defined for suspected disorder (i.e., anxiety >3; depression >1), the rater checked whether the participant met DSM-IV criteria for Generalized Anxiety Disorder or Major Depressive Episode and, if this was the case, they were excluded from the study.
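The exclusion rules listed above can be written down as a short check. The following is only a sketch: the class and function names are hypothetical, and it assumes that a score strictly beyond a cutoff leads to exclusion. The GADS is left out on purpose, because exclusion on mood grounds depended on a rater's DSM-IV assessment rather than on the score alone.

```python
from dataclasses import dataclass


@dataclass
class ScreeningScores:
    """Scores from the screening battery (field names are illustrative)."""
    mmse: int              # Mini-Mental State Examination
    mis: int               # Memory Impairment Screen
    semantic_fluency: int  # animals named
    to_tb2: int            # Time Orientation, Test Barcelona II
    cdr: float             # Clinical Dementia Rating


def failed_criteria(s: ScreeningScores) -> list[str]:
    """Return the names of the screening tests whose exclusion cutoff is met."""
    failed = []
    if s.mmse < 26:
        failed.append("MMSE")
    if s.mis < 6:
        failed.append("MIS")
    if s.semantic_fluency < 12:
        failed.append("semantic fluency")
    if s.to_tb2 < 68:
        failed.append("TO TB-II")
    if s.cdr > 0:
        failed.append("CDR")
    return failed


# Hypothetical participants, not data from the study.
passes = ScreeningScores(mmse=29, mis=8, semantic_fluency=21, to_tb2=69, cdr=0)
fails = ScreeningScores(mmse=25, mis=8, semantic_fluency=21, to_tb2=69, cdr=0.5)
print(failed_criteria(passes))
print(failed_criteria(fails))
```

An empty list means that none of the score-based exclusion criteria applies to that participant.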
The test-retest reliability study included 36 participants who were administered the MBT in both visits. To counterbalance the order of administration of the tests, the participants in the convergent validity study were sequentially assigned to one of two possible protocols. Individuals assigned to protocol 1 were administered the MBT in the first visit and the FCSRT in the second one. Conversely, individuals assigned to protocol 2 were administered the FCSRT in the first visit and the MBT in the second one. Due to the similar nature of these two tests, they should never be administered in the same testing session. Initially, 73 participants were included in the convergent validity study, but two cases were lost to follow-up and two other cases were considered outliers (with an associated Z-score of –4.2 in the FCSRT) and, therefore, excluded from the analysis. Out of 69 valid cases, 35 participants were assigned to protocol 1 and 34 to protocol 2.
Materials and instructions to administer the MBT and the FCSRT were provided by their author (Dr. Herman Buschke) and the Albert Einstein College of Medicine of Yeshiva University of New York (AECOM). The Spanish version of the MBT was obtained through a translation and transcultural adaptation process according to the linguistic criteria followed in the original version, which are explained in more detail in a previous paper [28].
The FCSRT consists of the learning and retention of a list of 16 semantically unrelated words through a controlled learning process with semantic encoding. During the learning phase the participant is asked to read and associate 16 printed words with their corresponding semantic cue (e.g., “Which is the fruit?”), consecutively presented in sets of four. After this initial learning and encoding procedure, three recall trials are performed, each one preceded by a brief distraction task (i.e., backwards calculation for 20 s). Each recall trial has two parts: free recall and cued recall for the words not spontaneously retrieved, providing selective reminding when the corresponding word has not been recalled after cueing. The same encoding cues used in the learning phase are used again in the cued recall condition (e.g., “Which was the fruit?”). A delayed free and cued trial is performed 30 min (± 5 min) later.
The MBT consists of the learning and retention of two lists of 16 words belonging to 16 different semantic categories presented in the same order. Each list is learned by reading and identifying each item in an array of printed words shown in sets of four when their category cue is presented (e.g., “Which is the insect?”). Immediately after the 16 words on the first list have been correctly associated with their corresponding semantic cue, the cued recall of these words is assessed. Next, and without distraction or delay, the same procedure is followed with the second list. Both lists share semantic categories in pairs of words (e.g., the semantic cue “insect” applies to “flea” in one list and to “ant” in the other). Immediately afterwards, paired cued recall is tested. The category cues are presented again, in the same order, to assess binding by recall of both items together for each category cue (e.g., “Now, from both lists, which were the insects?”). Next, and again without any delay or interference, free recall is assessed by asking the subject to state all the words that he/she can remember from both lists. Finally, delayed free recall and paired cued recall are tested 30 min (± 5 min) later.
During the in-between lapse of time for the delayed recall, either for the FCSRT or for the MBT, other cognitive tests, without verbal content to avoid interference, were administered.
Variables analyzed
Of the numerous variables that both the FCSRT and the MBT yield, we selected those that are most representative of the main features of this kind of test (i.e., free and cued and immediate and delayed recall) and that are most comparable between both tests. Those variables that directly result from the addition of correct words given under each condition were tagged as direct variables, while those that arise from a ratio were considered derived variables. Table 1 shows a description of what is measured by each of the variables and their correspondence between both tests. Two variables of the MBT, namely Semantic Proactive Interference (SPI) and Paired Recall Pairs (PRP), represent a novel contribution with respect to the FCSRT. Therefore, they do not have a counterpart in the convergent analysis. Descriptive statistics were used to analyze the demographic characteristics of the two study samples.
Test-retest reliability analysis
First, as suggested by recent literature on serial assessments [40], raw descriptive data for the test and retest assessments of the MBT are shown with their corresponding correlation values. Performance change was expressed by computing the mean percentage of score variation in Visit 2 as compared to that in Visit 1. Test-retest reliability was determined by the calculation of the intra-class correlation coefficient (ICC) and its associated 99% confidence interval. To control for Type I errors related to multiple comparisons, a restrictive confidence interval was used (alpha = 0.01). To illustrate the interpretation of the correlation, the retest scores were plotted against the test scores, including the line of equality and showing the R² (R-squared). We also carried out the Bland-Altman analysis [41], which includes a scatterplot of the differences between test and retest against test.
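The quantities described above can be illustrated with a small sketch. Several points are assumptions rather than details taken from the text: the ICC form is not specified, so the two-way, absolute-agreement, single-measure ICC is used here; the percentage change is computed on the group means; the limits of agreement use the conventional bias ± 1.96 SD; and the scores are invented for illustration. Confidence intervals are not computed.

```python
from statistics import mean, stdev


def icc_absolute_agreement(test, retest):
    """Two-way, absolute-agreement, single-measure ICC for two sessions."""
    n, k = len(test), 2
    rows = list(zip(test, retest))
    grand = mean(list(test) + list(retest))
    row_means = [mean(r) for r in rows]
    col_means = [mean(test), mean(retest)]
    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (x - row_means[i] - col_means[j] + grand) ** 2
        for i, r in enumerate(rows)
        for j, x in enumerate(r)
    )
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def bland_altman(test, retest):
    """Return the mean difference (bias) and the 95% limits of agreement."""
    diffs = [b - a for a, b in zip(test, retest)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical scores for five participants at Visit 1 and Visit 2.
visit1 = [20, 24, 26, 22, 28]
visit2 = [23, 26, 29, 24, 31]

icc = icc_absolute_agreement(visit1, visit2)
bias, lower, upper = bland_altman(visit1, visit2)
pct_change = 100 * (mean(visit2) - mean(visit1)) / mean(visit1)

print(f"ICC = {icc:.2f}")
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
print(f"mean change = {pct_change:.1f}%")
```

Because this ICC form penalizes systematic shifts between sessions, a uniform practice effect lowers it even when participants keep their rank order; a consistency-type ICC would not show that drop.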
Convergent validity analysis
The convergent validity of the MBT and FCSRT variables was determined by the calculation of Pearson’s correlation coefficient (r) with a restrictive 99% confidence interval. For Total Free Recall (TFR) and Total Delayed Free Recall (TDFR), scatterplots were generated to illustrate the linear relationship, together with the squared correlation (R²), which indicates the proportion of shared variance. Visual inspection of the scatterplots revealed a curvilinear (nonlinear) relationship in Total Recall (TR) against Total Paired Recall (TPR) and in Total Delayed Recall (TDR) against Total Delayed Paired Recall (TDPR), as well as ceiling effects (scores close to the highest level of performance) in the FCSRT variables. For these reasons, segmented (piecewise) analyses were carried out. The MBT variables (independent) were clustered into two different groups because they exhibited different relationships with the FCSRT variables (dependent) in these segments. A “segmented quadratic model with plateau” was fitted to estimate both segment models: the first segment was fitted with a quadratic equation (curvilinear) and the second with a horizontal line (plateau), with an unknown join point (or knot). The NLIN procedure of SAS was used to find the optimum model and estimate the parameters [42].
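A quadratic-plus-plateau model of the kind described above can be sketched as follows. This is not the SAS NLIN fit used in the study: it assumes the curve joins the plateau smoothly, so that the model is y = p + c(x − x0)² below the knot x0 and y = p from the knot onwards, and it replaces iterative nonlinear estimation with a grid search over candidate knots combined with ordinary least squares for p and c. The function name and the data are hypothetical.

```python
def fit_quadratic_plateau(x, y, candidate_knots):
    """Fit y = p + c * min(x - x0, 0)**2 by grid search over the knot x0.

    For a fixed knot the model is linear in the plateau p and the
    curvature c, so both follow from simple linear regression on z.
    """
    n = len(x)
    y_bar = sum(y) / n
    best = None
    for x0 in candidate_knots:
        z = [min(xi - x0, 0.0) ** 2 for xi in x]
        z_bar = sum(z) / n
        s_zz = sum((zi - z_bar) ** 2 for zi in z)
        if s_zz == 0:
            continue  # every point lies on the plateau; c is not identifiable
        s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
        c = s_zy / s_zz
        p = y_bar - c * z_bar
        sse = sum((yi - (p + c * zi)) ** 2 for zi, yi in zip(z, y))
        if best is None or sse < best["sse"]:
            best = {"knot": x0, "plateau": p, "curvature": c, "sse": sse}
    return best


# Noise-free synthetic data: the dependent score rises with the
# independent score and stays flat at 46 from x = 26 onwards.
xs = [float(v) for v in range(16, 33)]
ys = [46.0 - 0.05 * min(v - 26.0, 0.0) ** 2 for v in xs]
knots = [18.0 + 0.5 * i for i in range(27)]  # 18.0, 18.5, ..., 31.0

fit = fit_quadratic_plateau(xs, ys, knots)
print(fit["knot"], round(fit["plateau"], 3), round(fit["curvature"], 3))
```

In this parameterization the fitted plateau plays the role of the ceiling of the dependent variable, and the knot marks the score on the independent variable beyond which the dependent variable no longer discriminates.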
RESULTS
The sociodemographic characteristics of the evaluable sample for each of the studies, including the descriptive data for the neuropsychological screening tests, are shown in Table 2.
Test-retest reliability
Raw descriptive data for the test-retest of the MBT together with the corresponding correlation values for the main variables are shown in Table 3. A global practice effect for direct variables, evidenced by higher scores in the retest, is observed. The mean increase in the retest ranges from 10% to 26% of the words recalled. For derived variables, the mean percentage of change is minor, ranging from –0.6% to 3%. Correlations between test and retest were moderate to high (ICC ranging from 0.64 to 0.76) for direct variables and low, even negative, for derived ones.
Figure 1 shows the scatterplots for the test and retest scores of the MBT and the Bland-Altman plots showing the difference between the two measures for the variables Total Paired Recall (TPR) and Total Delayed Paired Recall (TDPR).
Convergent validity
The descriptive data for the FCSRT and the MBT variables as well as their correlation values are shown in Table 4. Analysis of the raw data shows that performance in the FCSRT tends to be higher than in the MBT. This is shown by comparing the main variable of the FCSRT (TR) and its counterpart in the MBT (TPR): the mean score in the FCSRT-TR is 44.7, which represents 93% of the maximum possible score (48), whereas the MBT-TPR mean represents only 79% (25.4 out of 32). With regard to the delayed recall total scores, the percentages are 95% for the FCSRT-TDR and 78% for the MBT-TDPR. Figure 2 shows the relationship between the main variables of the FCSRT and the MBT. Both the total immediate recall (FCSRT-TR and MBT-TPR; Fig. 2A) and the total delayed recall (FCSRT-TDR and MBT-TDPR; Fig. 2B) scatterplots show a ceiling effect in the FCSRT with respect to the MBT. These best performers in the FCSRT show a wide range of variability in their MBT performance. Segmented models allow more than one pattern in the relationship between the data to be studied at the same time, by identifying the join point that maximizes the explained variance of the model. In our model, a plateau (horizontal line) was imposed on the basis of the FCSRT ceiling effect. The plateau was found at a theoretical FCSRT-TR score of 45.9 for the immediate recall and at 15.5 for the delayed recall (FCSRT-TDR). Both scores are very close to the maximum possible scores of the variables (48 and 16, respectively).
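The percentages quoted above for the immediate recall totals follow directly from the reported means and maximum scores. A minimal check, using only those reported figures (the helper name is illustrative):

```python
def percent_of_max(mean_score, max_score):
    """Express a mean score as a percentage of the maximum possible score."""
    return 100.0 * mean_score / max_score


fcsrt_tr = percent_of_max(44.7, 48)  # FCSRT Total Recall
mbt_tpr = percent_of_max(25.4, 32)   # MBT Total Paired Recall

print(f"FCSRT-TR: {fcsrt_tr:.1f}% of maximum")
print(f"MBT-TPR:  {mbt_tpr:.1f}% of maximum")
```

The gap between the two percentages is what leaves the MBT with more room at the top of the scale to separate high performers.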
DISCUSSION
The MBT is a test designed to detect early pre-symptomatic memory impairment due to AD and to overcome some limitations of the widely used FCSRT for the detection of subtle hippocampal damage [5, 10]. In this study, the test-retest reliability of the MBT and the convergent validity with the FCSRT have been analyzed.
Test-retest reliability
For the direct variables, the ICC ranged from 0.64 to 0.76, which can be considered moderate to high (Table 3). Despite these good reliability results, we observed a consistent and significant improvement in the mean scores obtained by the participants in their second assessment. Moreover, the magnitude of this improvement was also noticeable in terms of raw scores. For TPR, a mean increase of 2.6 points represented an increase of about 10% in performance, while for TDFR the improvement exceeded 25% of the initial score. This tendency to improve in the second assessment is not surprising, since practice effects are known to be especially challenging for serial memory testing. In addition, the variability in practice effects has been suggested to reduce the magnitude of test-retest reliability correlations [43]. However, although practice effects lead to an increase in retest scores, test-retest reliability is only partially threatened by this phenomenon: if improvement in the task is a universal trend (e.g., all subjects obtain one additional point), the correlation between test and retest scores remains unchanged. Subsequent exposures to the same memory stimuli represent additional learning trials that facilitate both the storage of the material and the effect of procedural learning, that is, benefiting from knowing how to approach the task more effectively [44]. In fact, not showing a certain degree of learning in subsequent retest assessments could even be used as an indicator of which subjects will suffer cognitive decline in the future [45, 46].
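The point about a universal practice effect can be checked numerically. The sketch below uses invented scores and a hand-written Pearson correlation: adding the same constant to every retest score leaves r unchanged, whereas gains that differ from one participant to another lower it.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson product-moment correlation between two score lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical scores for five participants.
test = [20, 24, 26, 22, 28]
retest = [21, 23, 27, 22, 30]

uniform_gain = [score + 3 for score in retest]  # everyone gains 3 more points
variable_gain = [25, 24, 27, 28, 29]            # gains differ across people

r_base = pearson_r(test, retest)
r_uniform = pearson_r(test, uniform_gain)
r_variable = pearson_r(test, variable_gain)

print(f"baseline r      = {r_base:.3f}")
print(f"uniform gain r  = {r_uniform:.3f}")
print(f"variable gain r = {r_variable:.3f}")
```

This invariance holds for Pearson's r and for consistency-type reliability coefficients; agreement-type coefficients are sensitive to a uniform shift in scores.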
A previous report of the test-retest reliability of the FCSRT [7] showed slightly higher correlation coefficients (0.76 for free recall and 0.83 for total recall). However, the sample in that study was composed of subjects enrolled in a prodromal AD clinical trial and parallel forms of the test were used. Therefore, our results are not comparable to theirs. However, despite the theoretical predictive ability of the absence of learning effects for cognitive impairment, serial assessments often aim to track the real performance of an individual in a given domain. For such an aim, one would ideally use a learning-free test. A common practice to address this is with the use of parallel forms. Parallel forms, or alternate versions, try to minimize the influence of a previous exposure to the task by substituting its items with equivalent ones in terms of difficulty. Although declarative memory, i.e., the content of the task, can be controlled with parallel forms, the procedural learning derived from initial exposure still remains [47]. In any case, having parallel versions available has been demonstrated to be useful in serial assessments and an alternate version of the MBT would be of high interest.
With regard to retention indexes, we found almost no correlation between the test and retest of the MBT (Table 3). A negative correlation between the two measurements of the retention index even arose when free recall was taken into account (DFRR). Taken as a group, the mean percentage of change is certainly small, about 3%, but there is high inconsistency when the analysis is performed at the subject level. Again, it seems that there is an intrinsic intra-individual variability in the ability to freely retrieve the already learned material. Such variability deserves further analysis to establish the utility of retention indices based on free recall. On the other hand, when cued recall is taken into account (DPRR), the low correlation can be explained (as was the case in the convergent analysis) by the reduced data variability. Indeed, the score range in this case was very narrow in the first assessment and even narrower in the retest. The unstable performance in free recall versus the consistently retained information shown by cued recall can be explained by the classical distinction between the information that is accessible (through free recall) and what is really available in memory storage (through cued recall) or, in other words, the distinction between recall and retention [48]. In any case, the ICC reflects the variation of scores between test and retest. Thus, DPRR can be considered a robust variable that will probably be specifically sensitive to cognitive impairment.
The MBT also allows the assessment of semantic proactive interference, which occurs when previous learning interferes with new learning [49]. The vulnerability to proactive interference may be useful for predicting future progression to dementia [50] and can be best observed when the subject has to learn two competing lists of targets that share semantic categories [50, 51]. With regard to semantic proactive interference, the results show a modest ICC with no relevant change between the two assessments, which can be interpreted as a fairly stable measure. This is consistently described in associative learning research [52]. The effect of contextual cues in acquisition and/or retrieval tends to disrupt the recency effect, that is, the first trained association (list 1) tends to prevail, affecting the retrieval of the most recent one (list 2). This is an interesting effect that deserves to be monitored in longitudinal assessments, as it is believed that, although proactive interference susceptibility increases with age [53], exceeding a certain vulnerability threshold could be an early manifestation of progression to dementia [54].
Convergent validity
The FCSRT and the MBT displayed moderate positive correlation coefficients for the direct variables, the highest being that between TR of the FCSRT and TPR of the MBT (r = 0.535). The remaining coefficients for direct variables range between this value and r = 0.404 (Table 4). As mentioned before, in the FCSRT free and cued recalls are alternated, while in the MBT free recall is not assessed until cued recall of both lists has been assessed. However, the delayed recall trials follow the same pattern in both tests: first free and then cued recall (paired for the MBT). Therefore, although the variables of both tests have been matched by their essential conceptualization, that is, free or cued recall, they differ not only in their demand (one versus two 16-word lists) but also in the administration procedure. This could partially explain why the correlation between both tests is only moderate. Moreover, the MBT intends, from its inception, to overcome some limitations of the FCSRT. As such, it was designed to avoid any ceiling effects in cognitively healthy subjects. This has been corroborated in a recent study showing that the MBT (referred to as the MCT in that study) provides increased performance variance in the assessment of clinically normal older adults, where memory changes are only subtle [27]. This absence of a ceiling effect can be specifically observed in the present work in the segmented analysis plots (Fig. 2), which show a range of variability in the upper scores of the MBT that is not captured by the FCSRT. Many subjects performing at the upper end of the FCSRT (i.e., scores of 47-48 in TR, or 15-16 in TDR) showed a wider range in the comparable MBT variables (i.e., scores ranging from 20 to 32 in TPR, or 19 to 32 in TDPR). This is the expected behavior for a test that is more sensitive to subtle subclinical changes than the FCSRT.
This result shows the MBT’s enhanced capacity, relative to the FCSRT, to distinguish among different memory performances within the normal range of memory ability.
With regard to the derived variables, that is, the retention indexes, we found weak or even no correlation between the MBT and FCSRT indexes. Low Pearson correlations mirror intra-subject variability in the amount of information successfully retrieved in delayed trials as compared to the amount of information acquired during the learning process. However, such variability is not high in terms of real magnitude. Despite the lack of correlation, mean percentages of retention were similar across both tests, being around 100%. The low dispersion of the data also supports this notion. Dispersion, as measured by SD, is small for both memory tasks in both tests (approximately 20% for DFRR and 5% for DTRR/DPRR), suggesting that, although participants globally vary in their retention ability at different assessments, this variation lies in a narrow and well-defined band of good retention performance. Individuals clearly tended to retain around 100% of the learned material over the delayed period. In fact, these phenomena are also consistently observed in the test-retest reliability analysis of the MBT (see above). Therefore, both the FCSRT and the MBT show that, in cognitively healthy participants, robust retention ability, which is not captured by free recall, is expected. In fact, both tests are based on cued recall and on the encoding specificity principle described by Tulving and Thomson [55], which maximizes the recall of the truly encoded information: in its broadest form the principle asserts that only what has been stored can be retrieved and that how it can be retrieved depends on how it was stored (p. 359). The results presented in our manuscript support the idea that an index of rapid forgetting can be used as a predictor of progression to dementia among elderly individuals [56].
Limitations and future directions
In the future, the longitudinal follow-up of the participants and the study of clinical samples will allow us to determine other essential and desirable MBT psychometric characteristics such as, for example, its predictive value. We hypothesize that longitudinal assessments will allow us to capture subtle variations in memory performance through the MBT among subjects that still perform at a top level in the FCSRT. To confirm this, follow-up assessments of the ALFA population are underway. Moreover, a reliable parallel version (form B), which our group is already preparing, would make repeated assessments more feasible.
Conclusions
The MBT is a novel test to assess episodic memory function that shows good test-retest reliability, although it is not exempt from learning effects. On the other hand, the MBT overcomes certain limitations of the FCSRT in the detection of subtle memory impairment. This is supported by the absence of a ceiling effect in the performance of the MBT among subjects that obtain the maximum possible scores in the FCSRT. Therefore, the MBT may be especially useful in longitudinal studies, facilitating the identification of subtle decline in memory performance and the detection of very early AD.
ACKNOWLEDGMENTS
Albert Einstein College of Medicine owns the copyright for this test and makes it available as a service to the research community but charges for commercial use. For permission requests contact the AECOM at: biotech@einstein.yu.edu.
The research leading to these results has received funding from “la Caixa” Foundation. Additional funding was obtained from: Fondo de Investigación Sanitaria (FIS), Instituto de Salud Carlos III (ISC-III) under grant PI12/00326; the Einstein Aging Study under grant P01 AG03949. Juan D. Gispert holds a ‘Ramón y Cajal’ fellowship (RYC-2013-13054).
First of all, we want to express our gratitude to all the volunteers participating in this study. We thank the raters and nurse for data collection: Berta Blasco, Anna Brugulat, Iris Cáceres, Susana Castrillo, Juan C. Cejudo, Xavier Gotsens, Nuria Leiva, Paula Marne, Lidia Medina, Carolina Rodríguez, Emili Rodríguez, Alicia Sabio, Laia Tenas, and Tania Menchón; and the neuropsychology students who contributed to data collection, of the Master of Neuropsychology of the Autonomous University of Barcelona – Hospital del Mar (Marta Almería, Sonia Arribas, Carlota Arrondo, Cristina Borràs, Mercedes Florido, Juan Luís García, Alba Gómez, Natalia Martín, Beatriz Pereira, Carmen Pérez, Mireia Rivero, Anna Suades, Adrià Tort, with a special acknowledgement for their dedication to Greta García, Lorena Grau and Joana Piqué), and of the Master of Diagnostic and Rehabilitation in Neuropsychology of the Autonomous University of Barcelona – Hospital Sant Pau (Sheila Aguilar, Xavier Borrell, Carmen Corte, Carla Dalmau, Isabel Leiva, Alicia Ribas, Júlia Vázquez, Muriel Vicent and Natàlia Vilamajó). We thank Carolina Minguillón for text editing and continuous support and Andreia Carvalho for her logistic support.
In memory of Mrs. Maria Thos i Negre, the authors would like to express their gratitude for her donation to the Pasqual Maragall Foundation for research on Alzheimer’s prevention.
