Abstract
Background:
Visual rating scales for medial temporal lobe atrophy (MTA) and posterior atrophy (PA) have been reported to be useful for Alzheimer’s disease diagnosis in routine clinical practice.
Objective:
To investigate the efficacy of combined MTA and PA visual rating scales to discriminate amnestic mild cognitive impairment (aMCI) patients from healthy controls.
Methods:
This study included T1-weighted MRI images from two different cohorts. In the first cohort, we recruited 73 patients with aMCI and 48 group-matched cognitively normal controls for training and validation. Visual assessments of MTA and PA were carried out for each participant. Global gray matter volume and density were estimated using voxel-based morphometry analysis as the objective reference. We investigated the discriminative power of a single visual rating scale and the combination of the MTA and PA rating scales for identifying aMCI. The second cohort, consisting of 33 aMCI patients and 45 controls, was used to verify the reliability of the visual assessments.
Results:
Compared with either visual rating scale alone, the combination of MTA and PA exhibited the best discriminative power, with an AUC of 0.818±0.041, which was similar to that of the gray matter volumetric measures. The discriminative power of the combined MTA and PA was verified in the second cohort (AUC 0.824±0.058).
Conclusion:
The combined MTA and PA rating scales demonstrated practical diagnostic value for distinguishing aMCI patients from controls, suggesting their potential to serve as a convenient and reproducible method for assessing the degree of atrophy in clinical settings.
INTRODUCTION
Alzheimer’s disease (AD) is the most common neurodegenerative disorder leading to dementia. Mild cognitive impairment (MCI) occurs along a continuum from normal cognition to dementia [1]. The amnestic subtype of MCI carries a high risk of progression to AD and may constitute a prodromal stage of the disorder [2]. Fischer et al. reported that patients with amnestic mild cognitive impairment (aMCI) have a 2.5-year overall conversion rate of 48.7% [3]. Given the lack of effective treatments for AD dementia, early identification of aMCI may offer an opportunity for interventions that prevent or delay the progression of AD [4].
Traditional psychological assessments have several limitations in the diagnosis of aMCI, such as limited diagnostic accuracy, lack of unified assessment tools, and subjectivity [5, 6]. In contrast, neuroimaging techniques are objective and much more stable. Currently, valuable neuroimaging biomarkers supportive of AD pathology are becoming available [7–9]. Structural magnetic resonance imaging (MRI), characterized by its relative noninvasiveness and feasibility, has been widely employed to reveal global and regional morphological brain changes in individuals with AD and aMCI [10–12]. For instance, researchers reported a significant gray matter (GM) volume reduction of the medial temporal cortices (e.g., hippocampus and entorhinal cortex) in aMCI patients [13, 14]. The National Institute on Aging-Alzheimer’s Association (NIA-AA) has also recommended structural MRI for assisting in the detection of MCI [11]. In addition, compared with psychological assessments, structural MRI serves as a sensitive biomarker for distinguishing aMCI converters from nonconverters. Previous studies have found that brain structural abnormalities predict the longitudinal progression of aMCI subjects [13, 15]. However, due to the long processing time and dependency on specific algorithms in MRI quantitative analysis, brain morphological measurements have not yet been widely applied in routine clinical work [16].
Alternatively, visual rating scales may serve as a useful and cost-effective diagnostic tool in clinical settings. In 1992, Scheltens et al. first reported the diagnostic value of visually rated medial temporal lobe atrophy (MTA) for AD patients [17]. Using this semiquantitative method, AD patients showed a significantly higher degree of MTA than controls. Subsequently, other studies have also confirmed the feasibility of MTA in discriminating AD and MCI from healthy elderly controls [18–20]. For instance, Westman et al. reported a prediction accuracy of 81% for distinguishing AD from controls based on subjectively assessed MTA, which was similar to predictions based on the volume of the manually segmented hippocampus [20]. However, MTA can also be observed in aging, frontotemporal lobar degeneration, and vascular dementia, suggesting its limited ability to distinguish AD from other types of dementia and even normal aging [21, 22]. Moreover, it has been reported that in addition to MTA, AD patients also present significant posterior cingulate gyrus and temporoparietal cortex atrophy, especially in early-onset individuals [23–26]. A visual rating scale for posterior atrophy (PA), proposed by Koedam et al. in 2011, has been described as a valuable tool in the daily assessment of dementia [23]. Möller et al. further confirmed that PA could be quantitatively validated and reflected GM atrophy in parietal regions [27]. Previous studies have demonstrated that combining the MTA and PA visual rating scales improves accuracy in distinguishing patients with AD from normal aging [28]. Nevertheless, no available studies have assessed the discriminative power of combined visual MTA and PA rating scales for distinguishing aMCI from controls.
Currently, two major methods have been proposed to define aMCI: the conventional Petersen/Winblad criteria and the actuarial neuropsychological Jak/Bondi criteria [6]. In this study, we aimed to investigate the effectiveness of combined MTA and PA visual rating scales to discriminate aMCI from normal controls (NCs). To prove the robustness of these visual rating scales, we used two sources of datasets with the above two criteria for aMCI diagnosis. Patients with aMCI in the first cohort were diagnosed according to the Petersen criteria, while aMCI patients in the second cohort were identified through the Jak/Bondi criteria. First, based on
METHODS
Participants
We recruited 73 patients with a diagnosis of aMCI and 48 group-matched NCs in the first cohort (cohort A). Both patients and controls were recruited from the Memory Clinic of the Department of Neurology, Xuanwu Hospital of Capital Medical University, Beijing, China from September 2009 to December 2015. Patients with aMCI were diagnosed according to the criteria proposed by Petersen et al. [29, 30], which included: 1) memory loss complaint confirmed by an informant; 2) objective cognitive impairment in single or multiple domains, adjusted for age and education; 3) preserved general cognitive function; 4) failure to meet the criteria for dementia; 5) the clinical dementia rating (CDR) score is 0.5. The inclusion criteria for the control group included: 1) no complaint of memory loss; 2) CDR score is 0; 3) no severe visual or auditory impairment. The general exclusion criteria for both groups included: 1) a history of stroke; 2) major depression, with Hamilton Depression Rating Scale score >24 points; 3) other central nervous system diseases that may cause cognitive impairment, such as Parkinson’s disease, tumors, encephalitis, and epilepsy; 4) cognitive impairment caused by traumatic brain injury; 5) systemic diseases, such as thyroid dysfunction, syphilis, and HIV; 6) a history of psychosis or congenital mental developmental delay. In cohort A, 58 aMCI subjects were followed up. Within the follow-up period of four years (median, 22 months), 36 aMCI subjects converted to AD [9], and 22 were nonconverters.
Participants in cohort B, including 33 aMCI patients and 45 controls, were recruited mainly through standardized public advertisements and referrals from general physicians or informants from March 2017 to April 2019. The definition of MCI was in accordance with the criteria proposed by Jak and Bondi in 2014, which are mainly based on regular neuropsychological tests [6]. Participants were diagnosed with MCI if they met any one of the following three criteria and failed to meet the criteria for dementia: 1) having impaired scores (defined as >1 SD below the age-corrected normative means) on both measures in at least one cognitive domain (memory, language, or speed/executive function); 2) having impaired scores in each of the three cognitive domains sampled (memory, language, and speed/executive function); 3) a Functional Activities Questionnaire score ≥9. Furthermore, in this study, individuals with memory complaints and objective memory decline were considered aMCI patients, and those with only language or executive function deficits were excluded. Group-matched cognitively normal older adults were included in the control group. The general exclusion criteria were consistent with those used for the first cohort.
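The three-criterion rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `meets_mci_criteria`, the domain keys, and the input layout (two impairment flags per domain) are hypothetical, criterion 2 is read as "at least one impaired score in every domain", and the separate requirement of not meeting dementia criteria is not modeled.

```python
from typing import Dict, Tuple

# Hypothetical domain keys; each domain is sampled with two tests.
DOMAINS = ("memory", "language", "executive")


def meets_mci_criteria(impaired: Dict[str, Tuple[bool, bool]], faq_score: int) -> bool:
    """Return True if any one of the three criteria is met.

    impaired[domain] holds one flag per test in that domain; a flag is True
    when the score is >1 SD below the age-corrected normative mean.
    """
    # Criterion 1: both measures impaired within at least one domain.
    both_in_one_domain = any(all(impaired[d]) for d in DOMAINS)
    # Criterion 2 (assumed reading): at least one impaired score in every domain.
    one_in_every_domain = all(any(impaired[d]) for d in DOMAINS)
    # Criterion 3: Functional Activities Questionnaire score of 9 or more.
    functional_decline = faq_score >= 9
    return both_in_one_domain or one_in_every_domain or functional_decline


# Hypothetical participant: both memory tests impaired, other domains intact.
example = {
    "memory": (True, True),
    "language": (False, False),
    "executive": (False, False),
}
print(meets_mci_criteria(example, faq_score=2))
```

In this sketch a participant flagged by criterion 1 through the memory domain would also satisfy the study's additional requirement of objective memory decline for the amnestic subtype; that second filter would need its own check in practice.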
The research activities in this study were conducted in accordance with the ethical standards of the Helsinki Declaration and were approved by the Medical Research Ethics Committee and Institutional Review Board of Xuanwu Hospital of Capital Medical University. All participants took part voluntarily and provided written informed consent.
Neuropsychological assessment
Participants in cohort A underwent regular neuropsychological tests, including the Montreal Cognitive Assessment (MoCA) (Beijing version), the Auditory Verbal Learning Test (AVLT)-traditional version, and the CDR. Based on the number of educational years, the cut-off points of MoCA were: 13 (no formal education), 19 (1 to 6 years of education), and 24 (7 or more years of education) [31]. For the memory domain, the cut-off point of the AVLT-long delayed memory item was 6. The CDR score for aMCI was 0.5.
Participants in cohort B completed the following neuropsychological tests, which covered three cognitive domains and general cognition: 1) Memory domain: the Auditory Verbal Learning Test-HuaShan version (AVLT-H) [32], including AVLT-long delayed memory, with cut-off points of 5 (50–59 years old), 4 (60–69 years old), and 3 (70–79 years old), and AVLT-recognition, with cut-off points of 20 (50–59 years old), 19 (60–69 years old), and 18 (70–79 years old); 2) Language domain: the Animal Fluency Test (AFT), with cut-off points of 12 (junior middle school), 13 (high school), and 14 (college) [33], and the 30-item Boston Naming Test (BNT), with cut-off points of 19 (junior middle school), 21 (high school), and 22 (college) [34]; 3) Speed/executive domain: the Shape Trail Test Part A (STT-A), with cut-off points of 70 s (50–59 years old), 80 s (60–69 years old), and 100 s (70–79 years old), and the Shape Trail Test Part B (STT-B), with cut-off points of 180 s (50–59 years old), 200 s (60–69 years old), and 240 s (70–79 years old) [35]; and 4) General cognition: the MoCA-Basic (MoCA-B), with cut-off points of 19 (no formal education or elementary school), 22 (junior middle school or high school), and 24 (college) [36].
MRI data acquisition
Structural MRI scanning in cohort A was performed on a 3.0 Tesla Siemens Trio scanner (Siemens, Erlangen, Germany) at Xuanwu Hospital of Capital Medical University. All the participants were examined using a standard dementia MRI protocol, which included the following sequences: three-dimensional (3D) magnetization-prepared rapid gradient echo (MPRAGE) T1-weighted sequence (parameters: TR = 1900 ms, TE = 2.2 ms, TI = 900 ms, FA = 9°, FOV = 256×224 mm², slice number = 176, slice thickness = 1 mm, voxel size = 1×1×1 mm³, and matrix = 256×256). Multiplanar reconstructions (MPRs) of 3D T1-weighted sequences were performed in sagittal (1 mm) and oblique–coronal orientations (1 mm slice perpendicular to the long axis of the hippocampus).
The second cohort underwent structural MRI scanning with an integrated simultaneous 3.0 T TOF PET/MR (SIGNA PET/MR, GE Healthcare, Milwaukee, WI, USA) at Xuanwu Hospital of Capital Medical University. The parameters for acquiring T1-weighted 3D brain structural images were as follows: SPGR sequence, FOV = 256×256 mm², matrix = 256×256, slice thickness = 1 mm, gap = 0, slice number = 192, TR = 6.9 ms, TE = 2.98 ms, TI = 450 ms, flip angle = 12°, voxel size = 1×1×1 mm³.
Visual rating assessment
In this study, we selected T1-weighted structural MRI data for the visual rating assessments of both MTA and PA. Visual rating of the entire study sample in each cohort was performed by three trained raters with an average of 8 years of clinical experience. In cohort A, raters 1 (CS) and 2 (JL) were neurologists, and rater 3 (DP) was a rehabilitation physician, whereas in cohort B, raters 1 (XB), 2 (XW), and 3 (YL) were all neurologists. All raters were blinded to the demographic information, clinical diagnoses, and neuropsychological assessments of the participants.
MTA in the left and right hemispheres was separately rated on the oblique-coronal reconstructed sections of the 3D T1-weighted sequence, using a 5-point rating scale (0–4) previously described by Scheltens et al. [17]. The medial temporal region of each hemisphere was manually parcellated based on the height of the hippocampal formation and the width of the choroid fissure and the temporal horn. MTA was defined as 0 points for no atrophy, 1 point for minimal atrophy, 2 points for mild atrophy, 3 points for moderate atrophy, and 4 points for severe atrophy. PA was also assessed for the left and right hemispheres separately based on the sagittal, axial, and oblique-coronal MPRs of the T1-weighted sequence [23]. In the event of discrepant scores on different orientations (e.g., 0 on the axial orientation and 1 on the coronal orientation), the highest score was selected. Based on the anatomical regions of the posterior cingulate sulcus, the parietooccipital sulcus, the cortex of the parietal lobes, and the precuneus, PA was rated using a 4-point scale (0 point = absent; 1 point = mild widening of sulcus and mild atrophy; 2 points = substantial widening and substantial atrophy; and 3 points = evident widening of sulci and knife-blade atrophy). For both MTA and PA, the mean values of the left and right hemisphere scores were also calculated. Figure 1 shows the scoring of the visual rating scales for MTA and PA.
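The two aggregation rules described above (take the highest PA score when orientations disagree, then average the left and right hemisphere scores) can be illustrated with a short Python sketch. The function names and the dictionary layout are hypothetical and are used only for illustration.

```python
from typing import Dict


def hemisphere_pa_score(scores_by_orientation: Dict[str, int]) -> int:
    """Resolve discrepant orientation scores by selecting the highest one."""
    for orientation, score in scores_by_orientation.items():
        if score not in (0, 1, 2, 3):  # PA is rated on a 4-point scale (0-3)
            raise ValueError(f"invalid PA score {score} for {orientation}")
    return max(scores_by_orientation.values())


def mean_of_hemispheres(left: int, right: int) -> float:
    """Mean of the left and right hemisphere scores (used for MTA and PA)."""
    return (left + right) / 2


# Hypothetical ratings for one participant.
left_pa = hemisphere_pa_score({"sagittal": 0, "axial": 0, "oblique_coronal": 1})
right_pa = hemisphere_pa_score({"sagittal": 2, "axial": 2, "oblique_coronal": 2})
mean_pa = mean_of_hemispheres(left_pa, right_pa)
print(left_pa, right_pa, mean_pa)  # 1 2 1.5
```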

Fig. 1. The scoring of visual rating scales for medial temporal lobe atrophy (MTA) and posterior atrophy (PA). A) Visual rating scales for MTA, with examples of no (grade 0), minimal (grade 1), mild (grade 2), moderate (grade 3), and severe (grade 4) atrophy. B) Visual rating scales for posterior atrophy (PA). Each panel is an example of no (grade 0), mild (grade 1), moderate (grade 2), and severe (grade 3) posterior brain atrophy. PAR, parietal lobe; PCS, posterior cingulate sulcus; POS, parieto-occipital sulcus; PRE, precuneus.
Inter- and intrarater agreement analysis
To quantify inter- and intrarater agreement, we calculated intraclass correlation coefficient (ICC) values between any two raters and between the first and second sessions of each of the three raters, separately for both cohorts [37]. The reference points for the categorization of ICC were: 1) <0.4 (relatively low agreement); 2) 0.4–0.75 (moderate agreement); and 3) >0.75 (relatively high agreement). To evaluate the intrarater agreement, we randomly selected twenty MRI images from all participants in each dataset and conducted random sampling 500 times.
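As an illustration of the agreement analysis, the Python sketch below computes an ICC from a scans-by-raters table and maps it to the categories listed above. The text does not state which ICC form was used; the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), is an assumption here, and the function names and example scores are hypothetical.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to scan i by rater (or session) j.
    """
    n = len(ratings)        # number of scans
    k = len(ratings[0])     # number of raters or sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def agreement_category(icc: float) -> str:
    """Map an ICC value to the reference categories used in the text."""
    if icc < 0.4:
        return "relatively low"
    if icc <= 0.75:
        return "moderate"
    return "relatively high"


# Hypothetical mean MTA scores from two raters for six scans.
two_raters = [[0.0, 0.5], [1.0, 1.0], [2.0, 1.5], [3.0, 3.0], [1.5, 1.5], [0.5, 1.0]]
value = icc_2_1(two_raters)
print(round(value, 3), agreement_category(value))
```

For the interrater analysis the columns would be two raters; for the intrarater analysis they would be the first and second sessions of the same rater.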
Voxel-based morphometry
Due to the lack of a precise anatomical atlas for extracting the regions expressing MTA and PA used for the visual assessments, we compared our results with objective GM volume and density measurements based on the whole brain. All T1-weighted structural MRI images were preprocessed using voxel-based morphometry (VBM) analysis in SPM12 (Statistical Parametric Mapping, Version 12, https://www.fil.ion.ucl.ac.uk/spm/software/spm12/) and MATLAB 2014b. To address the variability in the scanning parameters, the MRI scans and Dartel imported scans were registered into stereotaxic space by applying rigid-body transformations and the Dartel nonlinear image registration procedure. GM, white matter, and cerebrospinal fluid (CSF) tissue probability maps were obtained with the unified segmentation algorithm, using a priori tissue maps as references. Subsequently, GM maps were normalized to the Montreal Neurological Institute (MNI) International Consortium for Brain Mapping 152 (ICBM152) template. Finally, the normalized GM images were smoothed with a 4 mm full-width at half-maximum (FWHM) Gaussian kernel to improve the signal-to-noise ratio and reduce the error caused by spatial normalization across individuals. The GM measures, including whole-brain GM volume (absolute and relative volume) and density of each individual, were calculated for further comparison. The GM relative volume and GM density measures were corrected for the total intracranial volume of each individual.
ROC analysis
To determine whether the visual rating scales have the potential to distinguish aMCI from controls, receiver operating characteristic (ROC) curves were generated. The ROC curves were obtained based on the values of sensitivity and specificity for each of the visual rating scales and the GM volumetric measures. Then, the area under the ROC curve (AUC) was used to quantitatively assess the discriminative power of these measures. Furthermore, to estimate the discriminative power of the combined MTA and PA rating scales, a multivariable-based ROC analysis was employed. Using logistic regression, new predicted probabilities were calculated by combining the MTA and PA visual rating measures. The AUC was also used to assess the discriminative power of this predicted probability. The logistic regression model has the general form:
$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 \cdot \mathrm{MTA} + \beta_2 \cdot \mathrm{PA}$$
where $P$ is the predicted probability of aMCI, and $\beta_0$, $\beta_1$, and $\beta_2$ are the fitted intercept and regression coefficients.
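As a rough illustration of the multivariable ROC step, the Python sketch below fits a two-predictor logistic model of the form logit(P) = b0 + b1·MTA + b2·PA and compares the AUC of the predicted probability with the AUC of each scale alone. Everything in it is an assumption made for illustration: the toy scores are invented, the names `auc`, `fit_logistic`, and `predict_proba` are hypothetical, and plain gradient ascent stands in for the fitting routines of the statistical packages used in the study.

```python
import math
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random case outscores a random control
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, n_iter: int = 5000) -> List[float]:
    """Fit logit(P) = b0 + b1*x1 + b2*x2 + ... by batch gradient ascent."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for row, target in zip(X, y):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
            err = target - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


def predict_proba(beta: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Predicted probability of the positive class for each row of X."""
    return [1.0 / (1.0 + math.exp(-(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))))
            for row in X]


# Invented toy data: (mean MTA, mean PA) per participant; 1 = aMCI, 0 = control.
X = [(2.0, 1.0), (1.5, 2.0), (0.5, 2.0), (2.5, 0.5), (1.0, 1.0),
     (0.5, 0.5), (1.0, 0.0), (0.0, 1.0), (1.5, 0.5), (0.5, 1.5)]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

beta = fit_logistic(X, y)
auc_mta = auc([row[0] for row in X], y)
auc_pa = auc([row[1] for row in X], y)
auc_combined = auc(predict_proba(beta, X), y)
print(auc_mta, auc_pa, auc_combined)
```

Because the AUC depends only on the ranking of the scores, the predicted probability and the linear predictor b0 + b1·MTA + b2·PA give the same AUC; the logistic step matters because it chooses the weights b1 and b2.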
Statistical analysis
The software G*Power 3.1 was used to estimate the sample size in our study. Demographic and neuropsychological assessments were compared using two-sample t tests, the Mann–Whitney U test, or the chi-squared test, as appropriate. To facilitate analysis of group differences in the visual rating measures, the scores of rater 1 were used. We compared group differences in the visual rating measures using general linear regression, with and without adjustment for the effects of age, gender, and years of education. When evaluating the correlation between the visual rating scores and the GM volumetric measures, partial correlation analysis was used, with adjustments for age, gender, and years of education. To reduce the risk of overfitting in the ROC analysis, a standard permutation test was employed using 1000 random resamplings of the data, and the results were averaged to produce a final classification performance with mean and standard deviation (SD) values. The ROC curves were compared with DeLong’s method [38]. In addition, we assessed the correlation between visual rating assessments and neuropsychological scores using partial correlation analysis, with age, gender, and years of education as covariates. All data processing and analyses were performed using SPSS 22.0 and R 3.6.1. p < 0.05 was considered significant.
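The resampling step that yields an AUC reported as mean ± SD can be sketched as follows. The exact resampling scheme is not specified in the text, so this Python sketch assumes sampling participants with replacement (a bootstrap); the function names, the seed, and the toy scores are hypothetical.

```python
import random
from statistics import mean, stdev
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC; ties between a case and a control count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def resampled_auc(scores: Sequence[float], labels: Sequence[int],
                  n_resamples: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Mean and SD of the AUC over random resamples drawn with replacement."""
    rng = random.Random(seed)
    indices = range(len(scores))
    aucs = []
    while len(aucs) < n_resamples:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # a resample needs both cases and controls
            aucs.append(auc([scores[i] for i in sample], ys))
    return mean(aucs), stdev(aucs)


# Invented visual rating scores; 1 = aMCI, 0 = control.
scores = [3.0, 2.0, 2.0, 1.0, 2.5, 1.0, 0.0, 2.0, 1.0, 0.5]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
auc_mean, auc_sd = resampled_auc(scores, labels, n_resamples=1000, seed=0)
print(f"AUC {auc_mean:.3f}±{auc_sd:.3f}")
```

Fixing the seed makes the summary reproducible; a different seed would give slightly different mean and SD values.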
RESULTS
Demographic and neuropsychological characteristics
There were no significant differences in age, gender, or years of education between aMCI patients and controls in either cohort A or cohort B. However, compared with controls, aMCI patients in cohort A showed significantly lower scores on the MoCA, AVLT-immediate recall (AVLT-I), AVLT-delayed recall (AVLT-D), and AVLT-recognition (AVLT-R) (p < 0.001). In cohort B, aMCI patients exhibited significant decline on the MoCA-B, AVLT-D (long), AVLT-R, STT-A, STT-B, AFT, and BNT. Detailed information is shown in Table 1. Baseline demographic and neuropsychological assessments for aMCI converters and aMCI nonconverters in cohort A are shown in Supplementary Table 1.
Table 1. Demographic and neuropsychological assessments for all participants
Normally distributed data are presented as the mean±SD; non-normally distributed data are presented as the median (IQR). aMCI, amnestic mild cognitive impairment; NC, normal control; MoCA, Montreal Cognitive Assessment; AVLT-I, auditory verbal learning test-immediate recall; AVLT-D (long), auditory verbal learning test-long delayed recall; AVLT-R, auditory verbal learning test-recognition; MoCA-B, Montreal Cognitive Assessment-Basic; STT-A, Shape Trail Test Part A; STT-B, Shape Trail Test Part B; AFT, Animal Fluency Test; BNT, Boston Naming Test. *, MoCA (Beijing version) was used in cohort A and MoCA-B was used in cohort B; a, two-sample t test; b, Pearson chi-square test; c, Mann–Whitney U test.
Visual ratings of MTA and PA
Six visual rating measures, including those of left MTA, right MTA, mean MTA, left PA, right PA, and mean PA, were assessed separately for each participant in the two cohorts. The visual rating scores of the aMCI patients were higher than those of the controls both with and without adjustment, suggesting more pronounced regional brain atrophy of the medial temporal lobe and parietal areas in aMCI. However, although there was a statistically significant difference in the left PA score between aMCI patients and controls in cohort B (p = 0.022), the difference was less significant than that of the other visual rating measures (Table 2A). Compared with nonconverters, aMCI converters showed significantly greater left MTA at baseline (p = 0.034, unadjusted; Supplementary Table 2).
Table 2A. Group differences of visual rating scales between aMCI and NC subjects
Data are presented as the median (IQR); p*, group differences were compared using Mann-Whitney U test, without adjustment; p#, group differences were compared using general linear model, adjusted for age; p&, group differences were compared using general linear model, adjusted for age, gender, and years of education. aMCI, amnestic mild cognitive impairment; NC, normal control; MTA, medial temporal lobe atrophy; PA, posterior atrophy.
Table 2B. Group differences of GM volumetric measures between aMCI and NC subjects
GM, gray matter; aMCI, amnestic mild cognitive impairment; NC, normal control.
Based on the Mann–Whitney test with α = 0.05, the statistical power for comparing MTA between the aMCI group and the control group was 99.9% in the two cohorts. For the PA visual rating scale, the statistical power was 99.47% in cohort A and 89.96% in cohort B.
Inter- and intrarater reliability
In cohort A, the interrater agreement ranged from 0.761 to 0.916, and the intrarater agreement for all visual rating measures ranged from 0.735 to 0.922, indicating relatively good consistency. In cohort B, interrater agreement for MTA was highest between raters 1 and 2 (0.900), followed by raters 1 and 3 (0.833), and lowest between raters 2 and 3 (0.802). Intrarater agreement for MTA varied between 0.719 and 0.884. Interrater agreement for PA was relatively high between raters 1 and 2, with a value of 0.845, whereas the values for intrarater agreement ranged from 0.709 to 0.832. Detailed information is shown in Table 3.
Table 3. Inter- and intrarater agreement for visual rating of MTA and PA in cohorts A and B
MTA, medial temporal lobe atrophy; PA, posterior atrophy. Statistical significance was set at p < 0.05.
Voxel-based morphometry
For the GM images from all participants, we computed the whole-brain GM volumetric measures using VBM analysis. The results revealed that aMCI patients showed lower GM volumetric measures (GM volumes and density) than controls (p < 0.001), except for GM relative volume in cohort B (Table 2B and Fig. 2). In addition, whole-brain GM measures were negatively correlated with the MTA and PA visual rating scores (Fig. 3). These results indicated that the visual assessments of specific brain regions reflected GM volume loss.

Fig. 2. Lower GM volumetric measures in aMCI patients than in controls in cohort A (p < 0.001). A) GM relative volumes. B) GM density.

Fig. 3. Negative correlations between whole-brain GM measures and visual rating scores. A) The negative correlation between GM relative volume and mean MTA. B) The negative correlation between GM relative volume and left MTA. C) The negative correlation between GM relative volume and right MTA. D) The negative correlation between GM relative volume and mean PA. E) The negative correlation between GM relative volume and left PA. F) The negative correlation between GM relative volume and right PA.
Visual rating and GM volume-based classification analysis
Using the ROC analysis approach, we first estimated the discriminative power of each of the visual rating scales and the GM volumetric measures in distinguishing patients with aMCI from controls (Table 4 and Fig. 4). In cohort A, the visual rating scales for MTA and PA showed discriminative power, with AUCs of 0.776±0.044 and 0.725±0.045, respectively. GM relative volume showed relatively good discriminative power, followed by GM density, with AUCs of 0.839±0.034 (p < 0.001, DeLong’s test) and 0.783±0.042 (p < 0.001, DeLong’s test), respectively.
Table 4. ROC analysis of visual rating scales and MRI image measures
Combined GM volume measures means the combination of GM absolute volume, GM relative volume and GM density.

Fig. 4. ROC of visual rating scales and GM measures in cohorts A and B. A) ROC of single and combined visual rating scales in cohort A. B) ROC of single and combined visual rating scales in cohort B. C) ROC of single and combined GM measures in cohort A. D) ROC of single and combined GM measures in cohort B.
Furthermore, we calculated the discriminative power of the combination of the MTA and PA, as well as that of the combined GM volumetric measures, for distinguishing aMCI from controls. Compared with the single visual rating scales, the combination of the MTA and PA visual rating scales showed higher classification accuracy, with an AUC of 0.818±0.041. The discriminative power of the combined GM measures was high, with an AUC of 0.857±0.034 (p = 0.016, DeLong’s test). These results suggested that the combination of multiple visual rating scales was beneficial for optimizing the classification accuracy of aMCI. Similar findings were also demonstrated in cohort B. Compared with the individual rating scales, the combination of the rating scales for MTA and PA had increased discriminative power, with an AUC of 0.824±0.058.
Furthermore, the combination of the MTA and PA visual rating scales showed an AUC of 0.683 (95% CI: 0.547–0.819) in distinguishing aMCI converters from aMCI nonconverters. When we combined the psychological assessments, visual rating scales, and GM measures, the discriminative power further increased, with an AUC of 0.801 (95% CI: 0.686–0.917) (Supplementary Table 3).
Correlation between cognitive scores and visual rating measures in aMCI patients
We used Spearman partial correlation analysis to investigate the relationship between all six visual rating measures and the cognitive assessments in aMCI patients. The correlation results in cohort A and cohort B are summarized in Table 5. In cohort A, the left MTA measure had a significant negative correlation with AVLT-I (R = –0.293, p = 0.015), AVLT-D (R = –0.265, p = 0.028), and AVLT-R (R = –0.248, p = 0.040), while the mean MTA measure was negatively correlated with AVLT-I (R = –0.260, p = 0.031). The visual rating scale measures for PA were not correlated with any of the cognitive scores. In cohort B, there was a positive correlation between the PA measure and STT-A.
Table 5. The correlation between visual rating measures and cognitive assessments in aMCI
*Statistical significance was set at p < 0.05.
DISCUSSION
In the present study, we investigated the effectiveness of visual rating scales of MTA and PA in identifying patients with aMCI and found that: 1) the combination of visual rating scales for MTA and PA achieved greater discriminative power between aMCI patients and controls than the individual visual rating scales; and 2) similar discriminative power was verified in a second cohort, indicating the repeatability and consistency of the visual assessments. Taken together, our findings demonstrated apparent regional brain atrophy in both the medial temporal lobe and posterior areas among aMCI patients. The combination of multiple visual rating scales appeared to be more effective in identifying aMCI than a single visual rating scale, and it has the potential to be widely used in clinical practice due to its convenience and speed.
Visual rating characteristics of MTA and PA in aMCI
The medial temporal lobe, a critical component of the typical AD-related brain regions, shows structural alterations in patients with both AD and aMCI [39]. In comparison to complicated GM volumetric measurements, the visual rating scale for MTA is considered a quick method in routine clinical practice [40]. Shen et al. reported that, compared with hippocampal volumetric measures, visual MTA provided even better discriminative power in distinguishing aMCI or AD from healthy controls [41]. Similarly, this study demonstrated a significantly higher degree of MTA in aMCI patients than in controls. In addition, individuals with aMCI who ultimately converted to AD dementia initially presented GM volume loss in the medial temporal lobe, including the hippocampus and entorhinal cortex [13]. Atrophy in the medial temporal lobe has been recommended as a topographical biomarker indicating the progression of MCI. In this study, compared with aMCI nonconverters, aMCI converters showed a significantly greater degree of atrophy in the left MTA at baseline (p = 0.034), which indicated that atrophy in the medial temporal lobe might be a topographical biomarker of conversion to AD for aMCI patients. Furthermore, a negative correlation between MTA and the AVLT score was also shown in this study, supporting the close association between disruption of the medial temporal lobe and memory loss. Similarly, previous structural MRI-based quantitative analyses also demonstrated an association between hippocampal atrophy and cognitive impairment [42, 43]. In other words, visual MTA assessment appears to be similarly effective in reflecting memory decline.
PA may be underrecognized in clinical practice. It has been increasingly acknowledged that MTA is associated not only with AD but also with other types of dementia, such as frontotemporal lobar degeneration, vascular dementia, and semantic dementia, indicating that MTA may be less effective in distinguishing AD from other types of dementia [21, 45]. Additionally, not every AD patient presents with MTA. Indeed, an AD subtype called no-atrophy AD has been reported in previous studies [24]. Furthermore, current studies have emphasized a prominent posterior (parietal) atrophy pattern in AD, and approximately 30% of AD patients showed PA without MTA [16, 28]. Posterior cerebral atrophy, generally affecting the posterior cingulate gyrus, precuneus, and parietal lobes, has been confirmed in a large cohort of patients with AD, particularly in younger individuals [46]. Similarly, in our study, aMCI patients showed higher levels of PA than controls. PA is also considered to be associated with worse performance in visuospatial and executive functions [47], which is consistent with the positive correlation between PA and STT-A in cohort B. It is noteworthy that regional brain atrophy in posterior cortices has low specificity in discriminating AD from other forms of dementia [16]. Several other neurodegenerative disorders also display posterior brain atrophy, such as behavioral variant frontotemporal dementia due to C9orf72 expansion, posterior cortical atrophy, and sporadic non-amnestic AD [48–50]. Thus, we considered that combined visual rating assessments may provide more diagnostic evidence than a single visual rating of PA.
In addition to higher visual rating scores, aMCI patients showed relatively lower GM volumetric measures than the controls. The negative correlation between GM volumetric measures and visual rating scores suggested that visual assessments of specific brain regions could reflect GM volume loss.
The reliability of combined visual rating scales
Previous studies have suggested that GM loss in aMCI is initially focused on the medial temporal lobes and subsequently extends into the posterior regions, including the parietal lobe and the temporoparietal association cortices [13]. This dynamically changing atrophy pattern reveals the importance of combining medial temporal areas and posterior cortices in comprehensively evaluating structural changes at the aMCI stage. Thus, owing to the complementary morphological information provided by the different visual rating scales, the combination of MTA and PA appeared to achieve a more effective diagnosis than a single visual rating measure. Koedam et al. emphasized the increased diagnostic sensitivity for AD when combining MTA and PA [23]. Another study also reported that adding PA to MTA could further improve the discrimination of AD from controls (AUC 0.87), although the discrimination abilities of the individual MTA and PA scales were already good (AUC 0.80 and 0.74, respectively) [28]. The present study demonstrated the advantage of integrating the MTA and PA ratings in identifying aMCI, with a significantly higher discriminative power than that of the individual rating measures (AUC 0.818±0.041), which was similar to the results obtained with the combined multiple GM measures. Our main findings were verified in cohort B (AUC 0.824±0.058), indicating the repeatability and robustness of the results in different clinical settings.
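As an illustration of how the discriminative power of a single rating scale can be compared with that of a combined rating, the following minimal Python sketch computes a rank-based AUC (the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counted as one half). The ratings are made up for illustration and are not data from this study, the unweighted sum of the two scores is an assumed combination rule that is not necessarily the one used here, and the helper names `auc` and `column` are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Rank-based (Mann-Whitney) AUC: fraction of patient/control pairs in
    which the patient has the higher score, counting ties as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def column(rows, i):
    """Extract the i-th rating from each (MTA, PA) pair."""
    return [r[i] for r in rows]


# Made-up (MTA, PA) ratings, for illustration only; NOT data from this study.
patients = [(2, 1), (1, 2), (3, 1), (0, 2), (2, 0), (1, 1)]
controls = [(0, 0), (1, 0), (0, 1), (1, 1), (0, 0), (2, 0)]

auc_mta = auc(column(patients, 0), column(controls, 0))
auc_pa = auc(column(patients, 1), column(controls, 1))
# Assumed combination rule: unweighted sum of the two ratings.
auc_sum = auc([m + p for m, p in patients], [m + p for m, p in controls])

print(f"MTA alone: {auc_mta:.3f}")
print(f"PA alone:  {auc_pa:.3f}")
print(f"MTA + PA:  {auc_sum:.3f}")
```

In this toy example, patients with a low MTA score tend to have a high PA score and vice versa, so the summed score separates the groups better than either scale alone. This is the kind of complementary information the combined rating is intended to capture.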
Various types of cortical morphological features, such as cortical thickness, GM volume, metric distortion, and GM density, have shown promising results in the classification of aMCI versus controls [51–53]. Our study revealed a relatively high classification accuracy for each of the parameters extracted from the whole brain (GM volume and density), which is in line with previous studies. For example, based on cortical thickness, the classification accuracy in discriminating aMCI from controls was shown to be 78% in the left hemisphere and 60% in the right hemisphere [54]. Moreover, given that different morphological features derived from structural MRI have unique neuropathological characteristics and contribute differently to discriminating aMCI from cognitively healthy controls [5, 56], the integration of multiple features may be beneficial in improving the diagnostic accuracy for aMCI [57, 58]. Li et al. extracted six cortical features for each aMCI subject and demonstrated the best discriminative power (84%) by combining the metric distortion and cortical thickness features in the left hemisphere [54]. Xiao et al. also reported a relatively high classification accuracy of 86.11% for aMCI based on the combination of texture features and morphometric features [53], indicating that the combination of multiple features was better than single features. Our study exhibited similar classification power via the combination of multiple GM measures, with an AUC of 0.857±0.034.
However, the discrimination accuracy of the combined GM measures was notably higher in cohort B, with a mean AUC of 0.957±0.038. This may reflect overfitting in the ROC analysis, even though random permutation was performed 1000 times and the results were averaged to produce the final classification performance. In summary, the combined visual assessments were clinically useful and yielded a diagnostic accuracy close to that of the quantitative MRI measures.
Limitations and future directions
Several limitations need to be considered. First, although they are fast and convenient for evaluating morphological alterations, visual rating scales are subjective, semiquantitative methods when compared with traditional quantitative MRI analysis. Second, this study was based on small clinical cohorts. Although the reliability of our current findings was confirmed with cohort B, a larger sample from multiple centers will be needed to provide more evidence in future studies. Third, the different sources of participants in the two cohorts may have introduced differences in disease severity. In cohort A, participants were recruited from the memory clinic. Several studies have revealed a relationship between the degree of cognitive concern and the severity of AD-related pathology in aging cohorts of cognitively normal adults [59, 60]. Participants in cohort A actively sought medical help because of their concerns about cognitive decline, suggesting that they might be in a later stage of aMCI. Meanwhile, participants in cohort B were recruited mainly through community-based advertisements, and they might be in an earlier stage of aMCI. In our study, the visual rating scores of the aMCI patients in cohort B were lower than those in cohort A. In addition, structural MRI scanning in the two cohorts was performed on different machines. In the future, strategies should be implemented to ensure consistency in multicenter, cross-machine data collection. Finally, for the patients with aMCI in this study, evidence of AD-related pathologic features, such as Aβ positivity, pathologic tau, and altered glucose metabolism, was not available. Future studies should focus on aMCI with etiological confirmation of AD, including biomarkers derived from the CSF or positron emission tomography (PET), which might provide higher accuracy in the diagnosis of prodromal AD.
Conclusions
The discriminative power of visual rating scales for distinguishing aMCI patients from cognitively normal controls was preliminarily assessed in this study. Based on two datasets with different criteria for MCI diagnosis, the combination of the MTA and PA visual rating scales discriminated aMCI from controls more effectively than the individual rating scales, suggesting its repeatability and diagnostic value as a neuroimaging biomarker in routine clinical practice. Although semiquantitative and subjective, visual rating scales remain the primary method for extracting diagnostically useful information in clinical settings.
Trial registration
ClinicalTrials.gov Identifier: NCT02353884 and NCT03370744
Footnotes
ACKNOWLEDGMENTS
We would like to thank all the participants in this study. We also acknowledge Dr. Chunxiu Wang for her assistance in statistical analysis. This article was supported by the National Key Research and Development Program of China (2016YFC1306300, 2018YFC1312001), National Natural Science Foundation of China (61633018, 81801052), Beijing Natural Science Foundation (7161009), China Postdoctoral Science Foundation (2018M641414), and Beijing Municipal Commission of Health and Family Planning (PXM2019_026283_000002).
