Abstract
Background:
Early detection of amyloid-β (Aβ) positivity is essential for an accurate diagnosis and treatment of Alzheimer’s disease (AD), but it is currently costly and/or invasive.
Objective:
We aimed to classify Aβ positivity (Aβ+) using morphometric features from magnetic resonance imaging (MRI), a more accessible and non-invasive technique, in two clinical population scenarios: one containing AD, mild cognitive impairment (MCI) and cognitively normal (CN) subjects, and another only cognitively impaired subjects (AD and MCI).
Methods:
Demographic, cognitive (Mini-Mental State Examination [MMSE] scores), regional morphometry MRI (volumes, areas, and thicknesses), and derived morphometric graph theory (GT) features from all subjects (302 Aβ+, age: 73.3±7.2, 150 male; 246 Aβ–, age: 71.1±7.1, 131 male) were combined in different feature sets. We implemented a machine learning workflow to find the best Aβ+ classification model.
Results:
In an AD+MCI+CN population scenario, the best-performing model selected 120 features (107 GT features, 12 regional morphometric features and the MMSE total score) and achieved a negative predictive value adjusted for prevalence (NPVadj) of 68.4%, and a balanced accuracy (BAC) of 66.9%. In an AD+MCI scenario, the best model obtained NPVadj of 71.6%, and BAC of 70.7%, using 180 regional morphometric features (98 volumes, 53 areas and 29 thicknesses from temporal, parietal, and frontal brain regions).
Conclusions:
Although their clinical applicability is currently limited, regional MRI morphometric features show potential clinical usefulness for detecting Aβ status, which may be augmented by combining them with cognitive data when cognitively normal subjects make up a substantial part of the population presenting for diagnosis.
INTRODUCTION
Abnormally high amyloid-β (Aβ) plaque deposition (i.e. Aβ positivity) is an early sign of Alzheimer’s disease (AD). 1 However, approximately 40% of amnestic mild cognitive impairment (MCI) individuals and 12% of AD patients have a normal Aβ deposition. 2 Additionally, Aβ positivity is also observed in cognitively normal (CN) older adults. 3 Considering this heterogeneity among groups, a research framework wherein Aβ positivity is a criterion for the pre-clinical stage of AD diagnosis, regardless of the current diagnosis, was previously proposed. 4
Aβ positivity is currently assessed through cerebrospinal fluid (CSF) sampling or positron emission tomography (PET), which are invasive, costly, and relatively unavailable in clinical settings. 5 Previous studies using magnetic resonance imaging (MRI), a more affordable and available technique, have found a correlation between Aβ positivity and morphometric changes in the grey matter (GM) 6, medial temporal cortical and hippocampal atrophy, and ventricular enlargement. 7–9 Furthermore, the atrophy of the brain regions known for early high Aβ deposition (namely, the basal neocortex and hippocampus) can be detected years before the onset of dementia. 10
Recently, graph theory (GT) metrics have been used to study the properties and the organization of brain networks, propelling knowledge regarding the etiology of AD and MCI 11–13 and the influence of abnormal Aβ deposition on brain networks in these pathologies, as we 13 and others 14–17 have reported. Only a few studies have used GT metrics derived from regional morphometric MRI features (e.g., volumes, areas, and thicknesses) to analyze morphometric relationships between brain regions. They have found abnormal morphometric relationships in GM associated with Aβ positivity and subsequent clinical manifestations 14,16–18, which suggests that such changes may precede the more easily detectable regional morphometric changes. 14 These abnormal morphometric relationships have been shown to occur both at global 14 and local levels 16,17, specifically involving regions related to memory loss, such as the middle frontal gyrus, the precuneus, and the medial temporal gyrus. 15,19 (For further familiarization with GT metrics, please see the Supplementary Methods and 20.) Abnormal GT metrics involving the precuneus, the fusiform gyrus, and the hippocampus have also been associated with cognitive decline. 11,18,21 Given these findings, MRI morphometric GT metrics have been surprisingly little explored as a potential biomarker for Aβ positivity, especially given their potential clinical usefulness (non-invasiveness, low cost, and high accessibility compared to PET/CSF exams), at least as a preliminary “rule-out” tool.
Machine learning (ML) algorithms using MRI features have also been shown to be useful in classifying Aβ status, with accuracies ranging from 67.0% to 86.0% (sensitivities of 68.7% – 92.0% and specificities of 61.0% – 87.0%), and in providing information regarding relevant brain regions associated with Aβ positivity. 22–29 Overall, ML studies have shown that regional morphometry data has not improved models’ performances over models using demographic, genetic (Apolipoprotein E), and neuropsychological data, suggesting that more complex features may be warranted. 22,26
Considering the research gaps mentioned above, in this study, we aimed: 1) to provide a model for classifying Aβ status using T1-weighted MRI data in two scenarios: in one we used a dataset with AD, MCI, and CN individuals (Scenario 1), and, in the other, a dataset with AD and MCI individuals (Scenario 2), regardless of what diagnosis is given or suspected (thus, a more clinically relevant and ‘real-world’ scenario); 2) to assess the potential of morphometric GT features to classify Aβ status; 3) to identify and discuss the most relevant features and brain regions in Aβ status classification in terms of etiological interpretability, and 4) to test the potential for clinical applicability of the best models for each scenario.
MATERIALS AND METHODS
Data sample
In this study, we used neuroimaging (3T T1-Weighted MRI scan features and the standardized uptake value (SUV) from PET) and cognitive data from a total of 548 participants (97 AD, 257 MCI, 194 CN) available in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. Data collection has been approved by the local ethics committees of the participating sites (subjects provided written informed consent at the time of inclusion in ADNI). ADNI is a landmark and longitudinal multicenter study that acquires subjects’ data to identify AD progression biomarkers. For more up-to-date information on ADNI, please visit http://adni.loni.usc.edu. 30 We selected the subjects based on the criteria described in the Supplementary Methods. MRI images were pre-processed using Freesurfer 6.0 (http://surfer.nmr.mgh.harvard.edu). To identify the Aβ status of each subject, we used the relative SUV (rSUV) provided by the ADNI database as computed in the Supplementary Methods, and the threshold of 1.1 for Aβ PET positivity (i.e. rSUV > 1.1 is Aβ+). 31
Regional morphometry and morphometric features using graph theory
Regional morphometric data (volumes, areas and thicknesses) from cortical, subcortical and other brain regions (enumerated in the Supplementary Methods) were extracted using Freesurfer. 32 The volumes of these regions were normalized by the estimated total intracranial volume to make them comparable across subjects. Then, we created a brain network (matrix) for each subject, where the relationship (edge) between two brain regions (i and j) is given by the normalized ratio of the regional morphometric features (Equation 1, and Supplementary Methods), which we have previously proposed 33 and applied to AD classification. 34
In this study, we only focused on finding patterns in the topological changes of brain regions and not in the value of the ratio. Given that, we binarized the matrices, which also reduces the complexity of interpreting the GT. We explored four thresholds to binarize matrices: 0.3, 0.5, 0.7 and 0.9, because there is no threshold proposed in the literature and we wanted to identify the one that leads to the highest model performance (details in the Supplementary Methods).
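The network construction and binarization steps described above can be sketched as follows. Equation 1 is not reproduced in this text, so the edge weight used here (the smaller of two regional values divided by the larger) is only an assumed stand-in for the normalized ratio. The rule that an edge is kept when its weight exceeds the threshold, the function names, and the toy values are likewise assumptions made for illustration.

```python
import numpy as np

def ratio_network(features):
    """Weighted network from one morphometric measure (one value per region).

    ASSUMPTION: the edge weight is min(f_i, f_j) / max(f_i, f_j), a stand-in
    for the normalized ratio of Equation 1, which is not shown in this text.
    """
    f = np.asarray(features, dtype=float)
    lo = np.minimum.outer(f, f)
    hi = np.maximum.outer(f, f)
    w = lo / hi                      # values in (0, 1] for positive inputs
    np.fill_diagonal(w, 0.0)         # no self-connections
    return w

def binarize(weights, threshold):
    """ASSUMPTION: keep an edge (1) where the weight exceeds the threshold."""
    a = (weights > threshold).astype(int)
    np.fill_diagonal(a, 0)
    return a

# Toy example: four regions with hypothetical normalized volumes,
# binarized at the four thresholds named in the text.
volumes = [4.0, 3.6, 2.0, 1.0]
w = ratio_network(volumes)
networks = {t: binarize(w, t) for t in (0.3, 0.5, 0.7, 0.9)}
```

Under these assumptions, a higher threshold can only remove edges, so the four binary networks of a subject are nested from densest (0.3) to sparsest (0.9).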
For each subject, we computed 12 brain networks, meaning four networks (one for each threshold) for each of the three morphometric measures (volume, area, and thickness); and we computed several GT metrics to extract features characterizing the relationships between brain regions, using the Brain Connectivity Toolbox for Python 20 (bctpy; https://pypi.org/project/bctpy/) (the extracted GT metrics are listed in the Supplementary Methods).
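To make two of the GT metrics named later in the Results more concrete, the sketch below computes node degree and a matching index on a small binary undirected network using plain NumPy. It is an illustration only and does not call bctpy: the matching index is written here as twice the number of shared neighbors divided by the summed neighbor counts of the two nodes (ignoring the pair itself), and the toolbox implementation may use a different normalization. The function names and the toy graph are hypothetical.

```python
import numpy as np

def degrees(adj):
    """Number of edges attached to each node of a binary undirected graph."""
    return np.asarray(adj).sum(axis=1)

def matching_index(adj):
    """Pairwise overlap of neighborhoods in a binary undirected graph.

    For nodes i and j: 2 * (shared neighbors) / (neighbors of i + neighbors
    of j), ignoring i and j themselves. Pairs with no other neighbors get 0.
    """
    a = np.asarray(adj, dtype=int)
    n = a.shape[0]
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            use = np.ones(n, dtype=bool)
            use[[i, j]] = False              # exclude the pair itself
            shared = np.sum(a[i, use] & a[j, use])
            total = a[i, use].sum() + a[j, use].sum()
            m[i, j] = m[j, i] = 2 * shared / total if total else 0.0
    return m

# Toy graph: nodes 0 and 1 are both linked to nodes 2 and 3 only.
adj = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [1, 1, 0, 0],
                [1, 1, 0, 0]])
deg = degrees(adj)
mi = matching_index(adj)
```

In this toy graph, nodes 0 and 1 are not connected to each other but share all of their neighbors, so their matching index is maximal; this is the sense in which the metric can link regions that are not directly related.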
Aβ status classification: Feature selection and classification
We performed the classification of Aβ status (Aβ+ versus Aβ–) in two scenarios: considering all diagnoses (AD, MCI, and CN) – (Scenario 1); and patients only (AD&MCI) – (Scenario 2). In both scenarios, the dataset was shuffled and randomly split only once into 70% for the training set and 30% for the test set with the same proportion of diagnoses (Supplementary Table 3 details this split). To examine if the train and test sets were age- and sex-matched, we used the Mann-Whitney test and the Chi-Square test, respectively. We also assessed their effect size with Cohen’s d and confidence intervals (CI). 35–37
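A single shuffled 70/30 split that preserves the proportion of diagnoses can be sketched as below. This is a minimal illustration rather than the study code: the function name, the seed, and the rounding rule are assumptions, so the resulting set sizes need not match those reported in Supplementary Table 3.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Shuffle once and split indices so each label keeps its proportion."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for value in np.unique(labels):
        # Shuffle the subjects of this diagnosis, then cut off the test share.
        idx = rng.permutation(np.flatnonzero(labels == value))
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.sort(train_idx), np.sort(test_idx)

# Diagnosis counts taken from the Data sample section (97 AD, 257 MCI, 194 CN).
diagnosis = np.array(["AD"] * 97 + ["MCI"] * 257 + ["CN"] * 194)
train, test = stratified_split(diagnosis)
```

Splitting within each diagnosis, rather than over the pooled sample, is what keeps the AD, MCI and CN shares the same in both sets.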
In both scenarios, we developed a grid-search cross-validation pipeline (GridSearchCV) with five stratified folds (Fig. 1) for seven different classifiers: logistic regression, support vector machine (linear and non-linear kernel), decision trees, random forest, extra trees and K-nearest neighbors (Supplementary Methods). All classifiers used the same train and test sets, and feature selection, hyperparameter optimization and classifier fitting were applied only to the train set. To reduce the number of features and to identify which ones best allowed for the classification of Aβ status, the pipeline removed features with zero variance and ranked the remaining ones by ANOVA F-score. We defined the number of features as a hyperparameter for the GridSearchCV pipeline, starting from 5 features and increasing in steps of 5 up to the total number of training samples rounded to the nearest hundred (400 for the 383 training samples of Scenario 1; and 200 for the 165 training samples of Scenario 2). Because before feature selection each scenario had a number of features far greater than the number of observations (271 regional morphometry, two demographic, two cognitive and 29393 GT features for each GT threshold), we decided to drastically reduce the number of features to lower the risk of biased classifications. We conducted the GridSearchCV using the Matthews correlation coefficient (MCC) score (the higher, the better) to optimize the hyperparameters of the feature selection and of the classifiers (the hyperparameter space used is in Supplementary Table 4).
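The pipeline described above maps naturally onto scikit-learn, as in the following sketch. It is an assumption-laden illustration, not the authors' code: the data are synthetic, only one of the seven classifiers (logistic regression) is shown, and the grid values are placeholders rather than those of Supplementary Table 4.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (NOT study data): 120 subjects, 60 features,
# of which only the first three carry signal about the binary label.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 60)
X = rng.normal(size=(120, 60))
X[:, :3] += y[:, None] * 1.5
X[:, -1] = 1.0                      # a zero-variance column to be dropped

pipe = Pipeline([
    ("variance", VarianceThreshold(threshold=0.0)),   # drop constant features
    ("anova", SelectKBest(score_func=f_classif)),     # rank by ANOVA F-score
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# The number of kept features is tuned like any other hyperparameter.
param_grid = {
    "anova__k": [5, 10, 15, 20],          # illustrative range, step of 5
    "clf__C": [0.1, 1.0],                 # illustrative values only
}

search = GridSearchCV(
    pipe,
    param_grid,
    scoring=make_scorer(matthews_corrcoef),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
best_k = search.best_params_["anova__k"]
best_mcc = search.best_score_
```

Placing the feature selection steps inside the pipeline means they are refitted on the training portion of every fold, so the validation folds never influence which features are kept.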

Fig. 1. Machine learning pipeline for Aβ classification, including feature selection, training and final model testing.
We also used a re-weighting technique to prevent biased classifications resulting from the slightly higher number of Aβ+ than Aβ– subjects (Scenario 1: 55.1% versus 44.9%; Scenario 2: 51.5% versus 48.5%; the same proportions held in the full dataset and in the training and test sets) by giving less weight to the former. 38 All code was written in Python and is available from the authors upon reasonable request.
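One common way to implement such re-weighting is inverse-frequency ("balanced") class weights, sketched below. Whether this is the exact scheme of reference 38 is an assumption; the function name and the plain-text class labels are hypothetical.

```python
def balanced_class_weights(counts):
    """Weight each class by n_total / (n_classes * n_class)."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}

# Scenario 1 counts reported in the Methods: 302 Aβ+ and 246 Aβ– subjects.
weights = balanced_class_weights({"Abeta+": 302, "Abeta-": 246})
```

With these weights the majority class (Aβ+) receives a weight below 1 and the minority class a weight above 1, so both classes contribute equally to the training loss in total.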
Evaluation of model performance, feature interpretability, and clinical applicability
To assess the potential of different types of features in classifying Aβ status, we created 15 feature sets, as shown in Table 1. As we obtained seven models (one for each classifier) from GridSearchCV for each feature set, we defined the best model for that feature set as the one with the highest MCC score (Fig. 1). Thus, we ended up with 15 models (M0 to M14) in each classification scenario, one for each feature set. Then, we defined the final best model in classifying Aβ positivity among the 15 feature sets as the one with the highest balanced accuracy, and consequently higher MCC. We also analyzed the best model in terms of the negative predictive value adjusted to prevalence, NPVprev (described below), where a higher value means fewer false negatives, aiming for a putative clinical use as a first-line tool for “ruling-out” individuals testing negative.
Table 1. Initial feature sets in each model (before GridSearchCV)
GT, graph theory; MMSE, Mini-Mental State Examination.
We also evaluated and compared the performance of the best models in the test set using ROC-AUC, sensitivity, specificity, positive predictive value (PPV), NPV, MCC, and confusion matrices. We computed the confidence interval for all metrics using 2000 bootstrapped samples from the test set, and we evaluated the ROC-AUC as reported elsewhere. 39 We also tested the best models for statistically significant differences in ROC-AUC using the DeLong Test (at p < 0.05) implemented in R. To assess the existence of overfitting or underfitting in the best 15 models, we also analyzed the learning curve in the entire training set with MCC score versus the number of training samples.
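The bootstrapped confidence intervals mentioned above can be illustrated with a percentile bootstrap over test-set subjects. The text does not state which bootstrap variant was used, so the percentile method, the function names, and the toy predictions below are assumptions.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sens = np.mean(y_pred[y_true == 1] == 1)
    spec = np.mean(y_pred[y_true == 0] == 0)
    return (sens + spec) / 2

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling test subjects."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # sample with replacement
        if len(np.unique(y_true[idx])) < 2:     # skip one-class resamples
            continue
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Toy predictions (hypothetical, not study results).
y_true = np.array([1] * 50 + [0] * 50)
y_pred = np.array([1] * 35 + [0] * 15 + [0] * 35 + [1] * 15)
point = balanced_accuracy(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, balanced_accuracy)
```

The same `bootstrap_ci` helper accepts any metric with a `(y_true, y_pred)` signature, so sensitivity, specificity, or MCC could be passed in its place.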
The PPV and NPV are measures that have greater clinical meaning if calculated using the prevalence of the classes being distinguished by the biomarker. Therefore, both PPV and NPV were calculated using: (i) the contingency table of the model (PPVmod and NPVmod); (ii) the prevalence of Aβ positivity (PPVprev and NPVprev); and (iii) a prevalence of 50%, the “standardized predictive value” (PPVstan and NPVstan), as advised for the sake of comparison with other studies. 40 To our knowledge, the prevalence of Aβ positivity in the general population has not yet been reported. Therefore, we estimated an Aβ positivity prevalence of 46.7% in a dementia-typical clinical setting based on previous meta-analysis data 2,3 (details in the Supplementary Methods). Additionally, we calculated the diagnostic odds ratio (DOR = (sensitivity/(1-sensitivity))/((1-specificity)/specificity)). 41 To improve the interpretability of the features in a clinical context and to quantify how strongly they correlate with Aβ status, we used Spearman’s rank correlation to measure the association between the rSUV concentration and each selected feature of the best model. We also used the Mann-Whitney U test to assess statistically significant differences between Aβ+ and Aβ– subjects in the features selected by the best model in each scenario (at p < 0.05). Finally, we used a clinical applicability scale we previously developed 42 to assess the clinical applicability of the final best models and respective selected features.
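The prevalence-adjusted predictive values and the DOR follow directly from sensitivity, specificity and prevalence via Bayes' theorem, as sketched below. The function names and the example sensitivity and specificity are hypothetical; only the 46.7% and 50% prevalence values and the DOR formula come from the text.

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV for a given prevalence, via Bayes' theorem."""
    tp = sens * prevalence                  # expected true-positive share
    fp = (1 - spec) * (1 - prevalence)      # expected false-positive share
    tn = spec * (1 - prevalence)            # expected true-negative share
    fn = (1 - sens) * prevalence            # expected false-negative share
    return tp / (tp + fp), tn / (tn + fn)

def diagnostic_odds_ratio(sens, spec):
    """DOR = (sens / (1 - sens)) / ((1 - spec) / spec)."""
    return (sens / (1 - sens)) / ((1 - spec) / spec)

# Hypothetical sensitivity and specificity, for illustration only.
sens, spec = 0.80, 0.70
ppv_prev, npv_prev = predictive_values(sens, spec, prevalence=0.467)
ppv_stan, npv_stan = predictive_values(sens, spec, prevalence=0.50)
dor = diagnostic_odds_ratio(sens, spec)
```

Because the assumed clinical prevalence (46.7%) is below 50%, the prevalence-adjusted NPV comes out slightly higher, and the PPV slightly lower, than their standardized counterparts for the same sensitivity and specificity.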
RESULTS
Sample description
The demographic, clinical, and cognitive data of Scenario 1 and Scenario 2 are shown in Table 2. The number of AD and MCI subjects differs between scenarios because these subjects were randomly selected to create age- and sex-matched sets. In Scenario 1, Aβ+ and Aβ– showed statistically significant differences in age (Mann-Whitney U = 28790.5 [p < 0.01]; r = 0.20 CI95 %: [0.12;0.28], Aβ+ age > Aβ– age) and in MMSE (Mann-Whitney U = 21788.0 [p < 0.01]; r = 0.36 CI95 %:[0.28;0.43], Aβ+ MMSE-total < Aβ– MMSE-total). The groups did not statistically differ in sex (χ²(1) = 0.692 [p = 0.40]). In Scenario 2, Aβ+ and Aβ– only showed statistically significant differences in MMSE (Mann-Whitney U = 4087.0 [p < 0.01], r = 0.37 CI95 %:[0.25;0.47], Aβ+ MMSE-total < Aβ– MMSE-total). Age and sex did not show statistically significant differences (age: Mann-Whitney U = 14050.5 [p = 0.38], sex: χ²(1) = 0.198 [p = 0.66]). The training and test sets did not show any statistically significant differences in demographic and cognitive data (Supplementary Table 5).
Table 2. Subject demographic, cognitive data, diagnoses, and Aβ status in Scenario 1 and Scenario 2
AD, Alzheimer’s disease; Aβ+, amyloid-beta positive; Aβ–, amyloid-beta negative; CI, confidence interval; MCI, mild cognitive impairment; MMSE-total, Mini-Mental State Examination total score; CN, cognitively normal; NA, not available.
Classification of Aβ status using all diagnoses – Scenario 1
The results of the Aβ status classification in the test set of Scenario 1 are depicted in Table 3 for the M0 (demographic and cognitive data only), and the two best models (M14 and M5), and the respective learning curves in Fig. 2. For conciseness, the remaining results are in Supplementary Table 6. The model M14 achieved the highest balanced accuracy (66.9% CI95 %:[59.6;70.0]), but not the highest NPVprev (68.4% CI95 %:[61.9;75.2]). The model M3 (balanced accuracy = 64.6% CI95 %:[57.4;70.0]) achieved the highest NPVprev (69.7% CI95 %:[61.8;77.7]). Model M3 also scored the highest ROC-AUC (0.703 CI95 %:[0.622;0.779]). Only M4 versus M14, M10 versus M14, and M6 versus M10 showed statistically significant differences in the ROC-AUC (DeLong Test: p < 0.05, Supplementary Table 6). Models inputted with cognitive data (M2 and M11 to M14) used MMSE-total score to classify Aβ status. The models using regional morphometric and morphometric GT features (M7 to M14) mostly selected matching index and topological overlap features and excluded all the surface areas of brain structures. The models M2 and M11 to M14 also excluded all demographic features (age and sex).
Table 3. Performance of the demographic and clinical (M0), and best models differentiating Aβ status in all diagnoses (Scenario 1) and in patients only (Scenario 2)
Acc, accuracy; BAC, balanced accuracy; CI, confidence interval; knn, k-nearest neighbors; MCC, Matthews correlation coefficient; MMSE-total, Mini-Mental State Examination total score; FP, false positives; FN, false negatives; NPV, negative predictive value; NPVstan, negative predictive value standardized; NPVprev, negative predictive value considering the Aβ prevalence; PPV, positive predictive value; PPVstan, positive predictive value standardized; PPVprev, positive predictive value considering the Aβ prevalence; Sens, sensitivity; Spec, specificity; svm-rbf, support vector machine with Radial Basis Function; ROC-AUC, receiver operating characteristic – area under the curve; TP, true positives; TN, true negatives. BAC, Sens, Spec, PPV, and NPV are expressed in %.

Fig. 2. Learning curves of M0 and best models regarding the Matthews correlation coefficient (MCC) for Scenario 1 and Scenario 2. Each plot shows the mean and the standard deviation of the training and validation scores for the final GridSearch model of each group of initial features. The MCC score of the test set is also represented in the plot for comparison with the validation score of the model when the entire training sample is used.
Classification of Aβ status in cognitively impaired patients (AD and MCI) – Scenario 2
We also obtained 15 best models in Scenario 2, one for each initial feature set. The results for M0 (demographic and cognitive data only) and the two best models (M1 and M7) are depicted in Table 3, and the respective learning curves in Fig. 2 (the remaining results are in Supplementary Table 8). Model M7 achieved the highest balanced accuracy (72.0% CI95 %:[61.8;80.0]) and the highest MCC score (0.448 CI95 %:[0.241;0.642]). Models M1 and M2 showed NPVprev values (71.6% CI95 %:[61.5;81.8] and 71.8% CI95 %:[61.0;83.2], respectively) similar to that of model M7 (71.4% CI95 %:[62.6;81.8]). The learning curves (Fig. 2 and Supplementary Figure 2) showed that the test-set score of model M7 was higher than the validation interval (mean and standard deviation of the validation score), which did not occur with models M1 and M2, indicating a lack of robustness for M7. The 15 models did not show statistically significant differences in the ROC-AUC (DeLong Test: p > 0.05, Supplementary Table 9). In this scenario, the models inputted with cognitive data (M2 and M11 to M14) also used the MMSE score to classify Aβ status and excluded demographic features (the M2 model also selected the MMSE-DR scores).
Relevant features and their interpretability in Aβ classification
Because our models selected more than 100 features, we describe here the main findings and report the full details in Supplementary Table 10 (Scenario 1) and Supplementary Table 11 (Scenario 2).
We considered M14 the best model to classify Aβ status in Scenario 1, as it achieved the highest balanced accuracy. This model needed 120 features: 107 morphometric GT features (58 matching index, 47 topological overlap, and 2 degree); 12 regional morphometric features (10 volumes and two thicknesses); and one cognitive feature (MMSE-total score). Only 29 features showed statistically significant differences between Aβ+ and Aβ– and were significantly correlated with the rSUV concentration. We enumerate here the 5 features with the highest correlation coefficient. Negative correlation: thickness of the left entorhinal and MMSE-total score. Positive correlation: matching index and topological overlap between the volumes of the left middle temporal and the right insula, and matching index between the thickness of the right rostral anterior cingulate and the left entorhinal.
For Scenario 2, we selected the best model, M1, based on its balanced accuracy value and on the difference between the MCC score in the test set and the validation interval in the learning curve (the smaller the difference, the better). This model only used regional morphometric features: the volumes were more numerous (98 of 180) than areas (53 of 180) and thicknesses (29 of 180). From the total of 180 features, only 45 showed statistically significant differences between Aβ+ and Aβ– and were statistically significantly correlated with the rSUV concentration. We also show here the 5 features with the highest Spearman correlation coefficient. Negative correlation: area of the left inferior parietal, mean left thicknesses, the thicknesses of left and right banks of the superior temporal sulci, left caudal middle frontal. Positive correlation: area of the right precentral.
Clinical applicability of the best Aβ classification models
Evaluating clinical applicability is the first step towards assessing the clinical usefulness of a biomarker and can be done according to the framework we have previously proposed. 42 Briefly, ‘clinical applicability’ can be estimated with the combination of 2 parameters: the ‘Quality of Evidence’ (statistical strength and reproducibility) and the ‘Effect Size’ (predictive power) of a biomarker. 42
We obtained a DOR of 4.15 and 5.97 for models M14 (Scenario 1) and M1 (Scenario 2), respectively. These DOR values correspond to a (large) Effect Size score of 4 (out of 4). 42 Given that the present study did not test an a priori and precisely defined biomarker, as it included morphometric GT metrics not yet explored for Aβ classification, the biomarker scored 1 (out of 4) in Quality of Evidence. Thus, this biomarker presented a clinical applicability sum score of 5 (out of 8), which is below the cut-off of 6 required for clinical usefulness to be considered. 42
DISCUSSION
In this study, we analyzed two scenarios for Aβ status classification: i) using all subjects (AD, MCI, and CN), and ii) using only patients showing cognitive impairment (AD and MCI). The major findings of our study were: i) classifiers achieved higher discrimination power in the classification of Aβ status in Scenario 2 than in the dataset comprising more diagnostic variability (Scenario 1); ii) although presenting higher performance in Scenario 1, models using morphometric GT features showed no statistically significant differences in ROC-AUC in comparison with models considering regional morphometric features, suggesting that the former may be unnecessary in this task; iii) morphometric GT features measuring overlap and similarity of neighboring regions between two brain regions were the most selected for Aβ status classification in Scenario 1; iv) models using regional morphometric features achieved higher performance in Scenario 2 than models using morphometric GT features; v) the developed models showed limited potential for clinical applicability.
Comparison with previous MRI models of Aβ classification
To our knowledge, none of the studies previously reported in the literature had attempted to design ML models to detect Aβ positivity using morphometric GT metrics (Table 4). In general, published models using MRI-derived features together with additional cognitive, demographic, tau protein, and/or genetic data seem to perform slightly better (Table 4) than those using MRI-derived features alone: ROC-AUC from 0.74 CI95 %:[0.66, 0.82] to 0.91 CI95 %:[0.81, 0.99] in Aβ status classification of CN 29,43; and ROC-AUC 0.71 to 0.73 in Aβ status classification of MCI subjects. 22 Most studies mainly focused on discriminating Aβ status in MCI subjects (Table 4), and only two studies addressed the same classification scenarios as we performed in this study.
Table 4. Aβ status classification using machine learning applied to regional morphometry or morphometric graph theory features from MRI data: Comparison between previous studies and with the present study
Aβ, amyloid-beta; Aβ+, amyloid-beta positive; Aβ–, amyloid-beta negative; Acc, accuracy; ADNI, Alzheimer’s Disease Neuroimaging Initiative; BAC, Balanced accuracy; ADAS-cog, 11- item Alzheimer’s Disease Assessment Scale cognitive subscale; APOE4, apolipoprotein E4; BDNF, brain-derived neurotrophic factor; CDR-SB, Clinical Dementia Rating Scale– Sum of Boxes; CI, confidence interval; CN, cognitively normal; CSF, cerebrospinal fluid; FLAIR, Fluid-attenuated inversion recovery; FS, feature selection; IL-6R, interleukin 6 receptor; IL-13, interleukin 13; MMSE-total, Mini-Mental State Examination total score; MMSE-MI, Mini-Mental State Examination – activity Memory Immediate; MRI, magnetic resonance imaging; NfL, Neurofilament light protein; NR, not reported; PiB, Pittsburgh compound-B; ptau181, plasma levels of phosphorylated tau181; SMC, subjective memory complaints; Sens, sensitivity; Spe, specificity; SVM, support vector machine; TNF-α, tumor necrosis factor α; *Not reported in the study, but the value is defined here as (sensitivity+specificity)/2.
Kate et al. 24 have followed an approach similar to ours (Scenario 1, model M14), albeit with a different set of regional morphometry feature types extracted from several MRI sequences: T1-weighted, FLAIR, SWI and/or T2* MRI scan. Although surpassing our model (average balanced accuracy 77.0% versus 66.9% (M14)), their model had the disadvantage of requiring the analysis of different MRI sequences, and needing to resort to a visual appraisal, which is prone to inter-rater variability.
Classifying Aβ status within a cognitively impaired population is relevant to better understand each subject’s pathology and facilitate monitoring and treatment of the disease. For instance, MCI Aβ+ subjects are more prone to convert to AD. 44 Our best model in this scenario (Scenario 2, model M1) achieved higher balanced accuracy than the one proposed by Tosun et al. 29 (70.7% versus 66.5%), which used an MRI score (a probability between 0 and 1 of being Aβ positive, based on a deep learning model) as the feature to discriminate Aβ status in a dataset of cognitively impaired subjects. They increased their performance in classifying Aβ status by adding subjects’ demographic data, APOE ε4 status and other neuropsychological tests to the MRI-score model, which then surpassed our model (balanced accuracy: 86.0% versus 70.7%). Our results contradict their findings because our models that also used MRI and additional features had lower performance than the model using only regional morphometric data (if excluding model M7, which may have biased results).
Clinical interpretability of selected features for Aβ detection
By combining all diagnoses in both Aβ+ and Aβ– groups, we made the classification task of Scenario 1 more demanding than Scenario 2. Despite having different Aβ statuses, the subjects we used, especially the CN subjects, may share similar MRI features, as previous studies have reported opposite outcomes. Some have observed no regional morphometric differences between CN subjects despite their Aβ status 45,46 and have suggested that, in some cases, neurodegeneration may occur before Aβ positivity 46, while others have found reduced hippocampus volume 47 and cortical atrophy associated with Aβ positivity in CN. 48,49
By selecting morphometric GT features, our model M14 (Scenario 1) is consistent with previous group analyses 16,17,50 and extends those findings by showing that not yet reported GT metrics (matching index and topological overlap) were selected for classification and correlated with rSUV concentrations. These metrics, which measure the similarity/overlap of the neighbors of two brain regions, confirm that regional morphometric changes involve both neighboring and distant brain regions (these GT metrics encompassed regions located in different lobes) instead of isolated regions, as also previously reported. 16
The results from Scenario 2 suggest that MCI Aβ+ and AD Aβ+ have similar unique characteristics, differing from MCI Aβ– and AD Aβ–. Although our Aβ+ and Aβ– groups did show statistically significant differences in MMSE, this cognitive test was not considered for the classification, contradicting previous results (Table 4) and suggesting that regional morphometric features may be more relevant to ‘rule-out’ Aβ– within a sample with only cognitively impaired subjects. Our model selected the right surface area of the pars opercularis, which has been reported to be predictive of a more significant decline in MCI. 51 We also confirmed an association between Aβ+ and reduced middle temporal thickness (Spearman, p < 0.01) and paracentral (Spearman, p < 0.05) volume, which are regions known to have high amyloid plaque deposition 52, even in CN subjects. 53
Clinical usefulness of our best model (an MRI graph theory-based biomarker) for Aβ detection
Following our own published framework for evaluating the clinical applicability of a biomarker in psychiatry 42, extendable to neurology, we concluded that the clinical applicability of our best-performing models in both scenarios achieved a score of 5 (out of 8), not reaching our clinical applicability threshold (≥6). If these models attain clinical applicability in the future, an assessment of their clinical usefulness will need to include PPV and NPV estimations 54, besides quantification and balancing of risks, side effects, cost, convenience, number-needed-to-assess and relevance of outcome. 42 The model M14 (Scenario 1) shows an NPVprev of 68.4% (i.e., 68.4% of subjects having an Aβ negative result in our test will truly be Aβ negative). Although limited, its NPVprev is higher than its PPVprev (65.8%) and thus carries more potential to be used to “rule-out” (rather than “rule-in”) Aβ positivity. The same is true for the best model (M1) in Scenario 2 (NPVprev of 71.6%, PPVprev of 70.3%). A “rule-out” aiding tool could increase cost-effectiveness as a first-line assessment: negative Aβ results could be considered with relatively higher certainty, whereas individuals classified as Aβ positive would need further testing (e.g., CSF or PET).
Strengths and limitations
This study has strengths in comparison with previous studies. First, our best models are more closely representative of a clinical setting/trial as they detect Aβ positivity regardless of the individual’s diagnosis, which could be advantageous in supporting enrolment in clinical trials and earlier diagnosis. Second, the models only needed data from MRI T1-weighted images (and MMSE-total in Scenario 1), which are less expensive and more readily available than the current gold standards. This study also has some limitations. First, our calculation of Aβ positivity prevalence based on meta-analysis and longitudinal studies data 2, used to calculate the PPVprev and NPVprev of our models, aimed to reflect as much as possible what is found in a typical clinical consultation setting receiving patients with cognitive complaints, taking US dementia centers as a proxy. 55 Nonetheless, the diagnostic diversity within this population may differ by culture, hospital and primary care site, which could slightly change the calculated values. Second, the number of Aβ– subjects with AD was lower than the number of Aβ– subjects with MCI or CN. Although we used a weighting technique to mitigate this imbalance, the results may change or improve if more subjects’ data are provided to the classifier in training. Lastly, we used data from only one database. Therefore, independent datasets are needed to validate and assess the generalizability of the present results.
Conclusion
Aβ status determination is critical for confident early-stage diagnosis and clinical trial enrolment. In this study, we assessed the potential of morphometric GT to classify Aβ status and “rule out” Aβ positivity in two real-world scenarios: one in which AD, MCI and CN subjects may present to the clinic, and one in which only AD and MCI patients may. We also showed that adding MRI features to demographic and cognitive models can make models in the first scenario slightly more robust and accurate, while offering anatomical interpretability at the brain level. For the second scenario, morphometric GT features were less useful than regional morphometric features. Such findings may contribute to the body of knowledge regarding the pathophysiology of Aβ positivity and could support future personalized treatments for Aβ positive patients.
AUTHOR CONTRIBUTIONS
Helena Pereira (Formal analysis; Writing – original draft); Vasco Diogo (Methodology; Writing – review & editing); Diana Prata (Funding acquisition; Supervision; Writing – review & editing); Hugo Ferreira (Conceptualization; Methodology; Writing – review & editing).
ACKNOWLEDGMENTS
We thank Fundação para a Ciência e Tecnologia (FCT), the European Commission Seventh Framework Programme Marie Curie Career Integration, the Alzheimer’s Disease Neuroimaging Initiative (ADNI) as well as all subjects contributing data to it.
Data collection and sharing for this project was funded by the ADNI (National Institutes of Health Grant U01 AG02418) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; GE Healthcare; Innogenetics, N.V.; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
FUNDING
This work was supported by Fundação para a Ciência e Tecnologia (FCT), via grants: UIDB/00645/2020 (institutional grant to IBEB, FCUL), SAICTPAC/0010/2015 (project grant to HF), 2021.08306.BD (PhD fellowship to HP), 2020.06310.BD (PhD fellowship to VD), DSAIPA/DS/0065/2018 (project grant to DP) and 2022.00586.CEECIND (research contract to DP).
CONFLICT OF INTEREST
Diana Prata and Hugo Alexandre Ferreira are co-founders and shareholders of the neuroimaging research services company NeuroPsyAI, Ltd. However, this had no influence on any contents of the present work.
DATA AVAILABILITY
The data used in this study are from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (https://adni.loni.usc.edu/), which is available upon request.
