Abstract
Alzheimer’s disease (AD) has a long pathological process, with an approximate lead-time of 20 years. During the early stages of the disease process, little evidence of the building pathology is identifiable without cerebrospinal fluid and/or imaging analyses. Clinical manifestations of AD do not present until irreversible pathological changes have occurred. Given an opportunity to provide treatment prior to irreversible pathological change, this study aims to identify a subgroup of cognitively normal (CN) participants from the Australian Imaging, Biomarker & Lifestyle Flagship Study of Ageing (AIBL), where subtle changes in cognition are indicative of early AD-related pathology. Using a Bayesian method for unsupervised clustering via mixture models, we define an aggregate measure of posterior probabilities (AMPP score) establishing the likelihood of pre-clinical AD. From Baseline through to 54 months, visuo-spatial function had the greatest contribution to the AMPP score, followed by attention and processing speed and visual memory. Participants with the highest AMPP scores had both increasing neo-cortical amyloid burden and decreasing hippocampus volume over 54 months, compared to those in the lowest category with stable amyloid burden and hippocampus volume. The identification of a possible pre-clinical stage in CN participants via this method, without the aid of disease specific biomarkers, represents an important step in utilizing the strength of cognitive composite scores for the early detection of AD pathology.
INTRODUCTION
Current neurobiological models of Alzheimer’s disease (AD) show that accumulation of amyloid-β (Aβ) begins 15–25 years prior to the clinical classification of dementia [1]. As Aβ accumulation progresses, subtle declines in cognition, including memory, language, and executive function, become evident and generally occur without the insight of the individual [2]. The process by which accumulation of Aβ begins to influence the different aspects of cognition in the earliest stages of AD is still not well understood, but cognitive decline in the presence of Aβ aggregates is most likely due to the combination of the accumulation of tau proteins and neuronal death [3]. Once cognitive difficulties become sufficient in magnitude to be detected on clinical examination, and are recognized by the individual or a close confidant, a classification of mild cognitive impairment (MCI) is made, which in the presence of abnormally high levels of Aβ is considered prodromal AD. Given the centrality of cognitive impairment to early clinical manifestations of AD, there is a need to optimize both the measurement and analysis of cognition in this stage of the disease.
Research groups conducting large prospective studies of clinical and biomarker measures such as the Australian Imaging, Biomarker & Lifestyle Flagship Study of Ageing (AIBL) [4], Alzheimer’s Disease Neuroimaging Initiative, AddNeuroMed, and Orange County Aging Study utilize batteries of neuropsychological tests combined with assessment of subjective cognitive state to characterize the nature and magnitude of cognitive decline in preclinical AD. While the actual neuropsychological tests themselves vary between these large studies, most are combined to yield sets of domain-specific composite scores, which themselves are consistent across studies. Compared to performance scores on individual tests, cognitive composite scores provide indices of impairment and change that are more sensitive to deterioration in cognitive function over time [5, 6], yield stronger associations with AD biomarkers [7, 8], and are more clinically relevant [9]. Changes in cognition over time defined using composite scores provide a strong understanding of the course of AD from the preclinical, prodromal stages into clinically defined dementia.
To date, most studies seeking to determine the utility of cognitive composite scores in measuring AD related cognitive impairment or change have begun their investigation by defining individuals according to some AD relevant category, for example, based on clinical status (i.e., cognitively normal (CN) versus MCI), the presence of a biomarker (abnormal versus normal Aβ level), or the presence of a genetic risk factor (APOE ɛ4 carrier versus non-carrier). In these contexts, the utility of the composite score is determined by the nature and magnitude of the differences between the categorical groups in their cross-sectional cognitive performance or in their changes over time. When conducted in individuals who are CN, such approaches generally find that in the absence of known AD risk factors (either AD biomarkers or genetic risk factors), cognitive function remains stable over time [10]. In contrast, the AD risk factors, alone or in combination, are associated with a substantial decline in these same cognitive domains.
The broad agreement that, in CN adults, AD risk factors are associated with cognitive decline provides a strong foundation for the development of models of cognitive decline that can be optimized in their own right and then used to identify AD risk factors. The ability to utilize decline in cognition as a marker of disease pathology in a CN population would provide a non-invasive and relatively cost-effective method for identifying AD risk, which otherwise often requires more expensive, time consuming, and invasive procedures. One way to approach this issue is to investigate cognitive data of CN individuals without reference to information about any AD risk status, then utilize statistical techniques to identify those aspects of cognition whose change can predict the presence of different AD risk factors.
In the current study, we examined the extent to which a Bayesian approach for the estimation of Gaussian mixture models, which estimates the distribution of the number of groups in an unsupervised manner, would be able to both determine which cognitive composite scores are associated with the early signs of AD pathology, and then decipher which participants are likely to have these early biological risk factors. This rationale has advantages over the mainstream classical supervised methods that utilize clinical classification labels, by taking an agnostic approach to cognitive composite scores, without the aid of pathological biomarkers such as tau, Aβ, or APOE ɛ4, to identify those who are likely to develop disease. Standard supervised statistical methods that use clinical AD markers are likely to find differences in most if not all cognitive composite scores; however, this would not be a novel finding. The aim of the current study was therefore to identify those cognitive composite scores which naturally have separate distributions for each AD risk factor status and then determine the utility of these distributions for classifying risk in individual adults.
While the individuals in this study are considered CN, we postulate that a small sub-population may exhibit some decline or divergence from normal aging in certain domains of cognition. The method chosen to assess this hypothesis was designed to provide participants with a posterior probability of being in a normal aging or a previously unknown pre-clinical AD group. It then combines these posterior probabilities from informative cognitive composite scores to create a score that can be tested against known pathological markers. By using this unsupervised Bayesian approach, which is agnostic to the classical disease pathology biomarkers, this study aims to capture patterns of divergence from normal aging by studying only CN individuals within a large-scale longitudinal study of aging.
MATERIALS AND METHODS
Sample selection
The sample consisted of CN participants who were recruited at inception of the AIBL study and did not meet criteria for MCI [4]. Participants were assessed with neuropsychological evaluations at four time points: Baseline (N = 761), 18 (N = 697), 36 (N = 613), and 54 (N = 415) months. To investigate how well the predicted groups performed against two AD related pathological parameters, we used the subsets of participants with hippocampal volume, as measured by magnetic resonance imaging (MRI; Baseline N = 168, 18 months N = 151, 36 months N = 120, 54 months N = 116), and with measures of Aβ, evaluated as the standardized uptake value ratio (SUVR) from positron emission tomography with 11C-Pittsburgh compound B (11C-PiB, PiB-PET; Baseline N = 184, 18 months N = 162, 36 months N = 134, 54 months N = 116). Further information regarding sample demographics is shown in Table 1.
Table 1. Cognitively normal sample demographics over 54 months
Neuropsychological evaluation and composite scores
Neuropsychological assessment was performed (as described previously in [4]) at Baseline, 18, 36, and 54 months. Data from the AIBL neuropsychological tests were organized into specific cognitive domains as per [8]: verbal episodic memory (Logical Memory 2, CVLT-II Long Delay Free Recall and CVLT-II d), visual memory (RCFT Short Delay Recall, RCFT Long Delay Recall and RCFT Recognition), executive function (Stroop C/D, Letter Fluency (FAS) and Category Switching (fruit/furniture, D-KEFS)), language (Category Fluency (animals/boys’ names) and the Boston Naming Test), attention and processing speed (Digit Span [forward & back], Stroop Dots and Digit Symbol-Coding), and visuo-spatial functioning (RCFT Copy and Clock Drawing). The organization of the composite scores was based on the decision of the AIBL expert panel of neuropsychologists as to which neuropsychological tests would best represent each cognitive domain. For each of these six cognitive domains, data from the neuropsychological tests were organized into a composite score, following the method of [11].
Briefly, estimation of each composite score involved the following steps: A) model-based estimates were derived by regressing the average of the set of cognitive scores that make up the composite (dependent variable) on age, education, gender, and collection point (independent variables) for the CN group only; B) using these model estimates, predicted values were derived for the complete data set (CN, MCI, and AD) using the predict() function in R; and C) the data were standardized by subtracting the predicted value from (B) and the mean for the CN group, and then dividing by the standard deviation. Thus, for each cognitive domain and each time point, the composite score for the CN group had a mean of zero and standard deviation of one. Consequently, composite score values that fall below this mean at a given time point may belong to participants whose cognitive ability differs from that of the others measured in the study.
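Steps A to C can be sketched in a few lines of code. The example below is illustrative only: it is written in Python using the standard library rather than the R code used in the study, the function and argument names (`fit_ols`, `composite_scores`, `raw`, `is_cn`) are hypothetical, and it assumes that the standard deviation in step C is that of the CN group, which is what gives the CN group unit variance.

```python
from statistics import mean, stdev


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(covariates, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    z = [[1.0] + list(row) for row in covariates]
    p = len(z[0])
    xtx = [[sum(r[i] * r[j] for r in z) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(z, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(beta, covariates):
    """Fitted values for every row, given intercept beta[0] and slopes beta[1:]."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in covariates]


def composite_scores(raw, covariates, is_cn):
    """raw: average of the domain's test scores per participant.
    covariates: rows of [age, education, gender, collection point].
    is_cn: True for cognitively normal participants."""
    cn_x = [x for x, f in zip(covariates, is_cn) if f]
    cn_y = [r for r, f in zip(raw, is_cn) if f]
    beta = fit_ols(cn_x, cn_y)                       # step A: fit on CN only
    fitted = predict(beta, covariates)               # step B: predict for everyone
    resid = [r - p for r, p in zip(raw, fitted)]
    cn_resid = [e for e, f in zip(resid, is_cn) if f]
    mu, sd = mean(cn_resid), stdev(cn_resid)         # step C (assumed: CN mean and SD)
    return [(e - mu) / sd for e in resid]
```

By construction, the scores returned for the CN participants have mean zero and standard deviation one, while participants outside the CN group are expressed on the same scale.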
MRI and PiB-PET imaging data collection
MRI and neocortical Aβ imaging with PiB-PET measures were calculated as previously described [12]. Briefly, high resolution MRI scans (3D T1 & T2 weighted) were taken for all participants, with images segmented into white matter, grey matter, and cerebrospinal fluid using the expectation maximization algorithm used in [13]. Hippocampal volume was derived using manual curation based on the Montreal Neurological Institute template. Total hippocampal volume (expressed as cm³) was calculated by adding left and right hemisphere values, and dividing by the intracranial volume specific to each time point. PiB-PET images were spatially normalized using CapAIBL [14] with an adaptive atlas. Using a volume of interest template to create a quantitative value for amyloid burden, images were summed and spatially normalized to derive an SUV. This value was normalized by the cerebellar cortex to create the SUVR.
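The two normalizations described above reduce to simple ratios. The following Python sketch shows only that arithmetic; the function and argument names are hypothetical, and the segmentation and image processing steps that produce the inputs are not represented.

```python
def normalized_hippocampal_volume(left, right, intracranial):
    """Sum of left and right hippocampal volumes divided by the
    intracranial volume measured at the same time point."""
    return (left + right) / intracranial


def suvr(target_suv, cerebellar_cortex_suv):
    """Standardized uptake value ratio: target-region SUV divided by
    the SUV of the cerebellar cortex reference region."""
    return target_suv / cerebellar_cortex_suv
```

Both functions assume that the numerator and denominator are expressed in the same units.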
Statistical methodology
Unsupervised clustering of each composite score at each time point was performed by fitting overfitted univariate finite mixture models with the Zmix algorithm [15]. Zmix is an unsupervised algorithm for performing model-based clustering by mixture models when the number of groups is unknown. Essentially, a larger number of groups than thought necessary is included in the mixture model. The algorithm causes groups that are not needed to model the data to be assigned very low weights, such that unnecessary groups have a high probability of being empty. The distribution of the number of occupied groups is then used to estimate the number of groups supported by the data.
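The idea of reading the number of groups off the occupied components can be illustrated with a short sketch. This is not the Zmix implementation: it is a Python example that assumes the sampler's group allocations are already available as one list of labels per iteration, and the name `occupied_group_distribution` is hypothetical.

```python
from collections import Counter


def occupied_group_distribution(allocation_draws):
    """allocation_draws: one list of group labels per sampler iteration,
    with one label per participant. Returns the proportion of iterations
    in which exactly k groups were non-empty, for each observed k."""
    counts = Counter(len(set(draw)) for draw in allocation_draws)
    total = len(allocation_draws)
    return {k: c / total for k, c in sorted(counts.items())}


# Four iterations of a five-group model fitted to four participants.
draws = [[0, 0, 1, 1], [0, 0, 0, 0], [2, 2, 2, 1], [3, 3, 3, 3]]
print(occupied_group_distribution(draws))  # {1: 0.5, 2: 0.5}
```

In this toy example the upper bound of five groups is never reached: half of the iterations occupy one group and half occupy two, so the data would support at most two groups.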
The approach consists of two main steps: 1) analyzing composite score data via Zmix and 2) computing an aggregate measure of posterior probabilities (AMPP) for each individual in the study. In the first step, univariate Gaussian mixture models containing five groups (the chosen upper bound) were fit independently to each composite score and time point, comprising 24 datasets in total (six composite scores for four time points). Using the tools available in the Zmix R package [15], the distribution of the number of occupied groups, the allocation probabilities to the identified groups, and the distribution of all parameter estimates (i.e., group weights, means, and variances) were extracted for each time point.
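As a rough illustration of how per-participant allocation probabilities can be obtained from sampler output, the Python sketch below counts how often each participant is assigned to each group. It is not the Zmix R package interface: the function name is hypothetical, and it assumes that group labels are already consistent across iterations (that is, any label switching has been resolved beforehand).

```python
def allocation_probabilities(allocation_draws, n_groups):
    """allocation_draws: one list of group labels (0 .. n_groups - 1) per
    sampler iteration, one label per participant. Returns, for each
    participant, the proportion of iterations spent in each group."""
    n_iter = len(allocation_draws)
    n_participants = len(allocation_draws[0])
    counts = [[0] * n_groups for _ in range(n_participants)]
    for draw in allocation_draws:
        for i, group in enumerate(draw):
            counts[i][group] += 1
    return [[c / n_iter for c in row] for row in counts]
```

Each returned row sums to one, and groups that are never occupied receive probability zero for every participant.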
In the second step, an aggregate measure of posterior probabilities (AMPP score) was computed to obtain a global measurement of the probability of an individual being persistently allocated to a group with a negative mean, over all scores and time points. This identifies individuals who consistently diverge from the majority of the CN individuals, utilizing all the information available from the results of the first step. For each individual, the AMPP score is the cumulative sum of the probability of being allocated to a group for which the mean was below zero with a probability greater than 0.95. This was normalized to assist interpretation. The AMPP score was used to classify the individuals into three subgroups (defined by AMPP scores <2: low, 2-3: medium, and >3: high, chosen via visualization of the distribution) that putatively relate to their underlying risk of showing signs of decreased cognition that could be flagged for follow up with the treating specialist. Further details about the algorithm and other methods utilized are presented in the Supplementary Material.
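A minimal sketch of the AMPP calculation is given below, in Python. It rests on several assumptions that go beyond the description above: the 0.95 threshold is read as applying to the posterior probability that a group mean is below zero; the normalization step is omitted because its form is not specified here; and the medium category is taken to include the boundary values 2 and 3. The data layout and function names are hypothetical.

```python
def ampp_score(participant, threshold=0.95):
    """participant: one entry per selected composite score and time point.
    Each entry is a list of (allocation_probability, prob_mean_below_zero)
    pairs, one pair per group identified for that data set.
    Sums the allocation probabilities to groups whose mean is below zero
    with posterior probability greater than `threshold` (assumed reading).
    No normalization is applied in this sketch."""
    total = 0.0
    for groups in participant:
        total += sum(p_alloc for p_alloc, p_neg in groups if p_neg > threshold)
    return total


def ampp_group(score):
    """Three-way classification using the cut-offs quoted in the text."""
    if score < 2:
        return "low"
    if score <= 3:
        return "medium"
    return "high"
```

For example, a participant with allocation probabilities of 0.9 and 0.3 to clearly negative-mean groups in two data sets would receive a score of about 1.2 and fall in the low category.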
As no ‘gold standard’ exists to validate the AMPP score for this study, a post-hoc analysis exploring the association of high AMPP scores with PiB-PET and MRI measures was performed to provide a validation of the results. Sample sizes for AMPP groups across the complete data sample, the PiB-PET and MRI samples are shown in Table 2.
Table 2. AMPP group frequencies per time point
RESULTS
Population sample demographic comparisons
Average age for the CN population at Baseline was 70.4 years (SD: 6.8). The proportions of males to females, of participants with less than versus more than 12 years of education, and of participants with versus without at least one APOE ɛ4 allele did not differ significantly between time points (p > 0.05, Table 1). Comparing those who dropped out at each time point to those who remained, there was no significant difference in APOE ɛ4 allele status or gender (p > 0.05), except at 18 months, where slightly more males than females dropped out (p = 0.02). MMSE and CDR values were stable across all time points (MMSE: 29, CDR score: 0). At all time points, those who dropped out were significantly older than those who remained (p < 0.05). Supplementary Table 1 shows the mean (± SD) for each composite score at each of the four time points. Given the smaller sample size for the imaging group, we tested the differences between the imaging group and the whole group for each composite score at baseline to ensure that the imaging group was representative of the overall sample. No differences were seen for any of the composite scores (p > 0.05).
Assessment of composite scores via Zmix
At each of the four time points, visuo-spatial function resulted in a 50% or greater probability of two or more groups (model estimated composite group means are shown in Supplementary Table 2). Attention and processing speed (18 and 36 months) and visual memory (Baseline) also yielded probabilities greater than 10% (36, 23, and 19%), suggesting a moderate capability for group separation. The participant-wise posterior probabilities for the cognitive composite scores where more than one group was found are shown in Fig. 1 using eight pairs of plots. Each pair contains, on the left, a boxplot of the distribution of the group means for the composite score/time point in question and, on the right, a plot of the ordered group membership probabilities of allocation for each participant. Darker shading represents those participants with a high posterior probability of being in either group. For visuo-spatial function, the plots show that group 2 is usually quite small, ranging from 13% of the total sample at baseline to 30% at 54 months (Supplementary Table 2). Attention and processing speed and visual memory, however, show a much larger group 2, ranging from 41% for visual memory at 18 months to 62% for attention and processing speed at 36 months (Supplementary Table 2).

Fig. 1. Participant distribution for the top composite scores for each time point. Each subplot includes, on the left, a box and whisker plot of the mean model predicted composite score per predicted group and, on the right, the ordered posterior probability for each participant to be included in the group. For plots on the right, the x-axis represents participants, with darker shading corresponding to a higher posterior probability (closer to 1) and lighter shading to a lower probability (closer to 0). The y-axis represents the number of groups that were found via Zmix. Plots with a small p(K = 2) on the box and whisker plot demonstrate that, while two groups were found, the probability that there are actually two groups is small, while the probability of there being only one group is large.
Aggregate measure of posterior probabilities (AMPP) score
The AMPP score was used to quantify the overall probability for each individual participant, across all scores and times, of being allocated to a possible pre-clinical group. With a posterior probability of there being more than one group of over 0.2, visuo-spatial function at all time points, attention and processing speed at 18 and 36 months, and visual memory at baseline were chosen to create the AMPP score (distribution of AMPP scores shown in Supplementary Figure 1). Assessing each of the composite scores at Baseline as compared to 18 months with respect to AMPP score groups across all participants, we found that AMPP groups were not consistent across all composite scores, with only those scores that contributed towards the AMPP score showing changes between baseline and 18 months (Fig. 2). By looking at individual participant identification numbers, it can be seen that many of the participants do not perform poorly across all scores, and may only have deficits in one or two cognitive domains. For those scores that did show participant separation into two groups (visuo-spatial functioning, attention and processing speed, and visual memory), the majority of the participants with high AMPP scores had composite score values less than zero, demonstrated by the points having a clear shift to the left in Fig. 2 (non-green points), indicating a decrease in cognitive performance at 18 months.

Fig. 2. Scatter plot of six composite scores by AMPP score at Baseline and 18 months. Baseline is on the x-axis and 18 months on the y-axis. Each point is colored and scaled according to the AMPP score associated with the individual. Points are plotted as their participant ID such that identification numbers can be seen easily in each plot. A) Verbal episodic memory; B) Visual memory; C) Executive function; D) Language; E) Attention and processing speed; F) Visuo-spatial functioning.
Further characterizing the utility of dividing the AMPP score into three groups, Fig. 3 shows a clear separation in mean composite score over time between AMPP groups for visuo-spatial functioning, visual memory, and attention and processing speed. Differences in other composite scores were less visible. It should be noted here that the derivation of these composite scores included adjustment for time point, such that each individual collection of data was a raw and independent representation of participants’ cognitive state at that time, allowing direct comparison between time points and removing any bias due to within-participant correlations. As such, very little slope is shown over time for all composite scores, and differences between AMPP groups are consistent over time.

Fig. 3. Line and error bar plots of six cognitive composite scores over 54 months between AMPP groups. Dots represent the mean cognitive composite value at each time point, and error bars represent the standard error of the mean. A) Verbal episodic memory; B) Visual memory; C) Executive function; D) Language; E) Attention and processing speed; F) Visuo-spatial functioning.
Investigating the same groups for those participants for whom PET SUVR and hippocampal volume were available (Table 2), we saw both a general increase in SUVR (Fig. 4A) and a general decrease in hippocampal volume (Fig. 4B) over time for those participants in the high AMPP group as compared with the low/medium groups. While a trend is apparent and corroborates the AMPP score’s ability to identify individuals potentially at risk, the sample size of the high AMPP group with imaging data at the later time points (Table 2) was insufficient to test this trend formally for statistical significance.

Fig. 4. Mean SUVR and hippocampal volume by AMPP score group over 54 months. The figure contains data from only those participants who were measured for either PiB-PET SUVR or hippocampal volume. The low and medium grouping corresponds to individuals with AMPP scores of ≤3, while the high group corresponds to AMPP scores of >3.
DISCUSSION
This study aimed to determine whether CN participants enrolled in the AIBL study whose deterioration in cognition is inconsistent with normal aging, and may therefore reflect pre-clinical AD, could be identified on the basis of their cognitive composite scores. This was achieved using a Bayesian method for the estimation of the number of groups in an unsupervised manner [15]. The rationale for this study was first to define those cognitive composite scores that had two or more natural distributions (groups) which may represent underlying disease pathology, and then to determine whether participants with a high AMPP score were likely to have early disease-specific cognitive changes. Results from this study showed that data from only a few cognitive composite scores had a high probability of allocating individuals’ scores into two or more groups. While most classical methods would have searched for statistical differences in each cognitive composite score between disease biomarker status groups (PET amyloid), the unsupervised Bayesian Gaussian approach was aimed at defining those CN individuals who would have disease related cognitive changes different to those in normal aging CN individuals.
Considering all six cognitive composite scores from the AIBL neuropsychological battery, changes in visuo-spatial function provided the best separation of CN participants into potentially pertinent, “progression” or “pre-clinical” groups whose mean scores were less than those of the majority of CN participants and therefore indicated a declining trajectory. Other composite scores, such as attention and processing speed, and visual memory, also performed well in separating participants into possibly pertinent groups. The visuo-spatial function construct consists of two measures, RCFT Copy and Clock Drawing, two aspects of cognition known to become disordered in AD dementia, with poor scores on these tests reflecting possibly greater cortical atrophy as compared to participants with late onset AD (LOAD) [16, 17].
The AMPP score, which was used to express the probability of group membership to either normal or declining trajectory groups, showed that a small subset of participants were declining at a faster rate as compared with the normal group. These participants, characterized by high AMPP scores, also had higher increases in SUVR over 54 months as compared to those in the low AMPP group, and faster hippocampal volume atrophy over 54 months as compared to those in the low or medium AMPP groups. While it is known that the accumulation of Aβ plaques is highly relevant to an AD diagnosis, it is possible that decreases in hippocampal volume over time seen in the high AMPP score group may also reflect other non-AD related diseases.
Determination of the ultimate set of non-invasive markers that classify pre-clinical disease is one of the most important research questions facing clinical trial designs today. Studies such as [18–20] highlight the importance of early detection of AD pathology, and show that composite scores provide greater sensitivity to amyloid-related cognitive decline and impairment in non-demented individuals as compared to scores from the individual cognitive tests upon which they are based. Furthermore, Langbaum and colleagues [21] showed that composite scores are less sensitive to measurement error as compared with single cognitive tests, and are preferable to use in the detection of cognitive decline in pre-clinical AD. In the past, such studies have organized data from individual tests into composite scores, and then compared these directly with established disease risk factors using a supervised statistical approach. The importance and sensitivity of different composite scores are then determined on the basis of their association with these risk factors. While we agree that this is an important and necessary step, the current study applied a completely unsupervised approach (blind to PET amyloid status and other risk factors) to neuropsychological data to determine whether there were in fact separate groups in a CN population that may represent a pre-clinical disease state independent of biomarker status.
Using the combination of the unsupervised clustering method and the AMPP score with prospective data from the AIBL study, this research has uncovered a small subset of CN participants who tend to have low cognitive scores in three cognitive domains, most commonly visuo-spatial functioning. Our results support the hypothesis that while the majority of CN participants will have a slow and uniform decline in cognition (as reviewed in [22]), there is a small group with AD pathology that demonstrates faster deterioration in cognition. Given that this approach is fundamentally cross-sectional, and that longitudinal follow up on the same participants was suggestive of pre-clinical AD groupings, this method may be useful for others without longitudinal follow up to identify those with pre-clinical pathology.
From the six cognitive domains assessed within this study, the composite measure of visuo-spatial function provided the greatest sensitivity to cognitive decline in the older adults, defining a small group of participants who declined on this score. Other composite measures, including attention and processing speed, visual memory, and verbal episodic memory, were also able to separate these participants, albeit with a weaker posterior probability of classifying individuals into two groups. This finding is consistent with other studies that have observed deterioration in visuo-spatial function up to five years prior to clinical diagnosis of AD [23–25], while decline in verbal episodic memory has been detected up to eight years prior to diagnosis [26]. More recently, a review paper [27] recommended that visuo-spatial tests of block design, clock drawing, and complex figure recall have some of the greatest diagnostic potential for early stage damage caused by AD, supporting the findings from this research.
The approach used here was motivated partially by the lack of knowledge of the underlying processes which may lead to changes in cognition in the early stages of AD. The data vary widely between individuals, tests, and time points, and such variation may be related to early AD via complex pathways which may not lead to a neat or even consistent decline. Therefore, instead of fitting a single large, complex model which attempts to encompass all sources of variability and correlation, an alternative approach was chosen which places few assumptions on the data, fitting a mixture model to each score and time point independently. While the mixture modelling itself does not account for repeated and correlated measures, we perform no parametric hypothesis testing, and only place probabilities that participants are likely to be in either a “normal” or “pre-clinical” group. Furthermore, we remove correlation bias by adjusting for collection point in the derivation of the composite scores, so that the design was able to assess whether the same relationship found at Baseline is validated at repeated time points.
One possible limitation of this study is the small sample size of the imaged group, with only a very small number with available PET image data out to three time points. Thus, results from the unsupervised cognitive modelling, although aligning with what we would expect and not different to the overall cohort at baseline, need to be interpreted with caution. Furthermore, the large proportion of missing data over subsequent time points meant that the sample size was considerably reduced by 54 months. Participants from the AIBL study were strictly screened over multiple time points to rule out dementia from other sources such as vascular dementia, Lewy body dementia, and Parkinson’s disease dementia; however, it is possible that CN participants who may be in the early stages of any of these non-AD dementias are yet to exhibit the clinical symptoms of these other dementias. In this way, some of the variation seen in the cognitive composite scores may be due to other undiagnosed dementias.
In summary, we evaluated six cognitive composite scores using an unsupervised mixture method to derive clinically meaningful groups. The data from the AIBL cohort showed that, at consecutive time points, visuo-spatial function was able to identify a small subgroup of CN participants with AD pathology. Using the univariate approach to model composite scores at each time point allowed the assessment of individual composite scores at baseline, as well as determination of whether those composite scores could discern the same groups at later time points. Further research is needed to assess visuo-spatial functioning in other cohorts and to ascertain a possible time line for the identification of pre-clinical disease.
