Abstract
Increasingly, clinical trials for Alzheimer’s disease (AD) are being conducted earlier in the disease phase and with biomarker confirmation using in vivo amyloid PET imaging or CSF tau and Aβ measures to quantify pathology. However, making such a pre-clinical AD diagnosis is relatively costly and the screening failure rate is likely to be high. Having a blood-based marker that would reduce such costs and accelerate clinical trials through identifying potential participants with likely pre-clinical AD would be a substantial advance. In order to seek such a candidate biomarker, discovery phase proteomic analyses using 2DGE and gel-free LC-MS/MS for high and low molecular weight analytes were conducted on longitudinal plasma samples collected over a 12-year period from non-demented older individuals who exhibited a range of 11C-PiB PET measures of amyloid load. We then sought to extend our discovery findings by investigating whether our candidate biomarkers were also associated with brain amyloid burden in disease, in an independent cohort. Seven plasma proteins, including A2M, Apo-A1, and multiple complement proteins, were identified as pre-clinical biomarkers of amyloid burden and were consistent across three time points (p < 0.05). Five of these proteins also correlated with brain amyloid measures at different stages of the disease (q < 0.1). Here we show that it is possible to detect a plasma-based biomarker signature indicative of AD pathology at a stage long before the onset of clinical disease manifestation. As in previous studies, acute phase reactants and inflammatory markers dominate this signature.
INTRODUCTION
Although there have been considerable advances in our understanding of Alzheimer’s disease (AD), the results of clinical trials testing AD treatments have so far been disappointing [1, 2]. A recent study suggested about 99.6% of trials aimed at preventing or reversing the disease between 2002 and 2012 have failed [3]. Only a few pharmaceutical interventions have been approved to treat AD, and these only temporarily treat symptoms and are not disease modifying. Potential explanations for the failure of disease-modifying therapies to date include i) the relatively advanced stage of disease targeted in clinical trials and ii) the high rate of absence of AD pathology in participants included in these trials. This suggests that trials will increasingly target the pre-clinical AD population, necessitating the use of biomarkers such as positron emission tomography (PET) imaging or cerebrospinal fluid (CSF) measures to confirm AD pathology. However, such measures are relatively invasive and costly and current screening failure rates are high. Having a blood marker that identifies participants with likely pre-clinical AD would provide a pre-screening filter for PET imaging or CSF measures, reducing screening failures and accelerating clinical trials. Moreover, given the ease of repeated sampling of blood, such a measure might find utility as a companion biomarker in clinical trials.
Research has suggested that there is a peripheral signal of disease in the blood of people with clinically diagnosed AD [4, 5]. Moreover, such a signal has also been shown to reflect AD brain pathology [6–8]. Therefore a peripheral measure may also be a valuable biomarker in the preclinical stages. In 2013, for example, we reported a protein signature in non-demented elderly people from the Baltimore Longitudinal Study of Ageing (BLSA) that could differentiate between individuals who would go on to show amyloid pathology using PET imaging ten years later [9]. In the present study, we aimed to expand upon this work by increasing the coverage of the proteome investigated and by identifying proteins that are stable in their ability across time to reflect amyloid as a continuous measure in cognitively healthy individuals.
The present study used two complementary proteomic techniques for the discovery of biologically relevant plasma biomarkers of pre-clinical AD. Two-dimensional gel electrophoresis (2DGE) was used to identify larger proteins (∼20–250 kDa), whereas gel-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) methodology was tailored to focus on the identification of low molecular weight (LMW) proteins (<30 kDa). Plasma proteins were assessed for their relationship to global brain amyloid load at three time points: 12 years before in vivo amyloid PET imaging, 6 years before amyloid PET imaging, and concurrently with amyloid PET imaging. Proteins that were consistently associated with amyloid PET at all three time points were investigated further as candidate biomarkers throughout the disease in an independent cohort, using pre-existing data generated by Ashton et al. [7].
MATERIAL AND METHODS
Discovery cohort: Baltimore Longitudinal Study of Ageing neuroimaging sub-study (BLSA-NI)
The BLSA was initiated in 1958 as a prospective longitudinal study of healthy human aging [10–14]. The primary objective of the study is to understand the physical and cognitive changes associated with normal ageing. The BLSA neuroimaging sub-study is currently in its 19th year and includes a subset of BLSA volunteers who return annually for imaging and clinical evaluations. The neuroimaging sub-study is described in detail by Resnick et al. [15].
Discovery cohort: assessments, blood collection, and processing
In the initial 2DGE experiments (discovery phase 1) we examined plasma samples from 54 subjects from the BLSA-NI, whose demographics are shown in Table 1. All subjects had serial blood sampling over a 12-year period prior to a 11C-PiB PET scan. Blood samples were obtained in the morning after an overnight fast, centrifuged at 3000 rpm for 30 min at 4°C, and plasma collected and stored in 1.5 ml aliquots at -80°C. This study was approved by the local institutional review board. All participants provided written informed consent.
Plasma samples from three time points approximately 6 years apart were selected for proteomic analysis: time point 1 = 12±2 years prior to the PET scan, time point 2 = 6±2 years prior to the PET scan, and time point 3 = concurrent with the PET scan (Fig. 1). Plasma samples from a subset of 25 of these subjects, matched for age and gender, were further analyzed for the discovery of low molecular weight proteins as biomarkers (discovery phase 2).
Extensive neuropsychological testing was completed at each time-point and apolipoprotein E genotyping was performed for each participant [16]. All subjects remained cognitively normal throughout the 12-year period of this study.
BLSA-NI 11C-PiB PET
The PiB PET imaging methodology of the BLSA-NI study is detailed elsewhere [17]. The mean cortical distribution volume ratio (DVR) was calculated by averaging values from orbitofrontal, prefrontal, superior frontal, parietal, lateral temporal, occipital, and anterior and posterior cingulate regions. Each of the 54 subjects had one 11C-PiB PET scan taken concurrently to their final plasma sample (time point 3).
Discovery study, phase 1: High molecular weight studies using 2DGE
2DGE analysis was performed as detailed in Hye et al. [4]. Each sample was analyzed in duplicate. Gel images were analyzed using Progenesis SameSpots software (Non-linear Dynamics) and normalized volumes for each of the protein spots were exported. As each sample had been analyzed in duplicate, the coefficient of variation (CV) and average normalized volume were calculated for every duplicate spot. Average volumes were used for statistical analysis unless the CV was >40%, in which case the duplicate spot closest in value to the overall normalized spot volume mean was selected. A script was written in R (version 3.0.0) to complete the above pre-processing.
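The duplicate-handling rule described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' R script, which is not reproduced in the text. It assumes that the CV is computed from the sample standard deviation of the two replicates and that the "overall normalized spot volume mean" is the mean of all replicate volumes of that spot across samples; the function names and the input layout are hypothetical.

```python
from statistics import mean, stdev

CV_THRESHOLD = 40.0  # percent, the cut-off stated in the text


def coefficient_of_variation(a, b):
    """Percent CV of a duplicate pair (assumed: sample SD / mean * 100)."""
    m = mean([a, b])
    if m == 0:
        return 0.0
    return stdev([a, b]) / m * 100.0


def summarise_spot(duplicates):
    """Collapse duplicate volumes of ONE spot to a single value per sample.

    duplicates: dict mapping sample_id -> (volume_a, volume_b).
    Returns a dict mapping sample_id -> selected volume.
    """
    # Assumed interpretation of the "overall" mean: all replicate
    # volumes of this spot, pooled across samples.
    overall_mean = mean(v for pair in duplicates.values() for v in pair)
    selected = {}
    for sample_id, (a, b) in duplicates.items():
        if coefficient_of_variation(a, b) > CV_THRESHOLD:
            # Poorly reproduced pair: keep the replicate nearest the overall mean.
            selected[sample_id] = min((a, b), key=lambda v: abs(v - overall_mean))
        else:
            # Well reproduced pair: use the average of the two replicates.
            selected[sample_id] = (a + b) / 2.0
    return selected
```

With this rule, a well-reproduced pair contributes its average, while a discordant pair contributes whichever replicate is more typical of the spot.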
Protein identification by liquid chromatography – tandem mass spectrometry (LC-MS/MS)
2DGE spots that had statistically significant changes in volume associated with 11C-PiB PET DVR were manually excised from 2DGE preparative gels. The proteins present in a given spot were trypsin digested and the resulting peptides were subjected to LC-MS/MS for protein identification. A Thermo EASY n-LC II system (Thermo Scientific) was coupled to an LTQ Orbitrap Velos Pro for LC-MS/MS protein identification. Peptides were resolved by reverse phase chromatography (35 min) using 20x collision-induced dissociation (CID) scans following each Fourier transform mass spectrometry (FTMS) scan (2x μScans at 30,000 resolving power @ 400 m/z). CID was carried out on 20 of the most intense ions from each FTMS scan.
Pre-processing of LC-MS/MS data
Raw data files produced in Xcalibur software (Thermo Scientific) were processed using Proteome Discoverer V1.3 (Thermo Scientific) and peptide spectral matches and subsequent protein identifications were determined using Mascot (v2.3; http://www.matrixscience.com).
Discovery study, phase 2: Low molecular weight (LMW) studies using gel-free LC-MS/MS
Gel-free LC-MS/MS was performed to provide a complementary evaluation of the proteome to that already conducted using 2DGE. Prior depletion of abundant and high MW plasma proteins allowed for the extraction and identification of proteins with a LMW (<30 kDa). Samples were first passed through an albumin and IgG depletion column (Proteoprep Immunoaffinity Albumin and IgG Depletion Kit from Sigma), then remaining large proteins were removed using a 30 kDa cut off filter (Amicon Ultrafree-MC centrifugal filter tubes from Merck Millipore). The LMW filtrate was then trypsin digested and LC-MS/MS analyzed, as described previously, with triplicate injections.
LMW LC-MS/MS data pre-processing
Raw data was processed as previously described in discovery phase 1. Data was then loaded into Scaffold 4 (Proteome Software Inc., USA) allowing the merging of triplicate sample injections into one BioSample. Proteins were filtered using the following cut off values: protein threshold = 95%, minimum number of peptides = 2, peptide threshold = 80%. Proteins with more than 50% missing data, decoy proteins, and any contaminants (keratin and any trypsin) were removed. To enable comparisons across samples the Normalized Total Spectra quantitative value was selected for statistical analysis.
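The final filtering step (removal of proteins with more than 50% missing data, decoys, and keratin or trypsin contaminants) can be sketched as below. This is an illustration only: the filtering in the study was carried out on Scaffold output, and the table layout, the use of `None` for missing values, and the "decoy" naming convention are assumptions made here. Names are matched on whole tokens so that a genuine plasma protein such as alpha-1-antitrypsin is not discarded as trypsin.

```python
import re

MAX_MISSING_FRACTION = 0.5  # proteins with MORE than 50% missing data are removed


def is_contaminant_or_decoy(name):
    """Flag decoys and keratin/trypsin contaminants by protein name."""
    lowered = name.lower()
    tokens = re.split(r"[^a-z0-9]+", lowered)
    if "decoy" in lowered:  # hypothetical decoy naming convention
        return True
    if any(token.startswith("keratin") for token in tokens):
        return True
    # Exact token match, so "antitrypsin" is not treated as trypsin.
    return "trypsin" in tokens


def filter_proteins(table):
    """table: dict mapping protein name -> list of values (None = missing)."""
    kept = {}
    for name, values in table.items():
        if not values or is_contaminant_or_decoy(name):
            continue
        missing = sum(1 for v in values if v is None)
        if missing / len(values) > MAX_MISSING_FRACTION:
            continue
        kept[name] = values
    return kept
```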
Extension cohort: The Australian Imaging, Biomarkers and Lifestyle Flagship Study of Ageing (AIBL)
The AIBL study, launched in 2006, is a prospective longitudinal study aiming to discover biomarkers, cognitive characteristics, and health and lifestyle factors contributing to the subsequent development of AD (http://www.aibl.csiro.au/). The study design, including additional subject recruitment, diagnosis, and PET imaging, has been described previously [18].
Extension cohort: assessments, blood collection, and processing
This study further examined pre-existing proteomic data from 78 subjects from the AIBL study, generated by Ashton et al. [7], which included 19 healthy controls, 31 subjects with subjective memory complaints (SMC), 22 with mild cognitive impairment (MCI), and 6 AD patients, all of whom had undergone 11C-PiB PET scans. Detailed diagnostic criteria and further cohort information are available in Ellis et al. [18]. Subjects had been selected to be enriched for clear cases of PiB negativity (PiB-) and positivity (PiB+) based upon a standardized uptake value ratio (SUVR) cut off of 1.3. Standardized clinical assessments included Mini-Mental State Examination (MMSE), and apolipoprotein E (APOE) genotypes were available. The details of blood collection and sample processing have been previously reported [8]. Demographics and PiB group distributions are shown in Table 2.
AIBL 11C-PiB PET
The PiB PET imaging methodology of the AIBL study is detailed elsewhere [19]. The SUVR values were calculated using the cerebellar grey matter as the reference region and neocortical amyloid burden (NAB) was determined using the average SUVR of the mean of frontal, superior parietal, lateral temporal, lateral occipital, and anterior and posterior cingulate regions.
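As a simple worked illustration of the NAB summary measure and the SUVR cut off of 1.3 used to define PiB status in this cohort, the sketch below averages regional SUVR values and applies the threshold. It assumes equal weighting of the regions, treats the anterior and posterior cingulate as separate regions, and treats a value exactly at the cut off as positive; none of these details is specified in the text, and the region keys and function names are hypothetical.

```python
# Regions named in the text; equal weighting is an assumption.
NAB_REGIONS = (
    "frontal",
    "superior_parietal",
    "lateral_temporal",
    "lateral_occipital",
    "anterior_cingulate",
    "posterior_cingulate",
)

SUVR_CUTOFF = 1.3  # cut off reported for the AIBL extension cohort


def neocortical_amyloid_burden(regional_suvr):
    """Average the regional SUVR values (cerebellar grey matter reference)."""
    return sum(regional_suvr[region] for region in NAB_REGIONS) / len(NAB_REGIONS)


def pib_status(nab, cutoff=SUVR_CUTOFF):
    """Classify a subject; '>=' at the boundary is an assumption."""
    return "PiB+" if nab >= cutoff else "PiB-"
```

For example, a subject whose regional values all equal 1.1 would be classed as PiB-, and one whose values all equal 2.3 as PiB+.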
Extension Study: Tandem Mass Tag LC-MS/MS
Tandem mass tag (TMT) labeling enables a high throughput multiplex approach for LC-MS/MS analysis. TMT 6Plex (Thermo Scientific) combined with LC-MS/MS therefore allows the use of the same methodology as in the gel-free LC-MS/MS discovery phase experiments, but now multiplexed with up to six samples, including a reference sample, measured simultaneously. Ashton et al. [7] had previously performed TMT 6Plex LC-MS/MS on 78 samples from the AIBL study, and this pre-existing dataset was further analyzed here. Protein identifications observed in different gel fractions were considered as separate entities defined as protein MW isoforms.
Statistical analysis; discovery and extension studies
All statistical analyses were performed using R (version 3.0.0). Discovery data was analyzed using partial correlation; proteomic data was correlated with 11C-PiB PET DVR while controlling for six covariates (age, gender, education (years), body mass index (BMI), cholesterol, and APOE ɛ4 status). For 2DGE data, a penalized regression analysis using an elastic-net penalty [20] with leave-one-out cross validation (LOOCV) was additionally used to select a smaller subset of important spots which could be picked for LC-MS/MS analysis. Collinearity is common in 2DGE datasets, possibly because different forms of the same protein may be present within multiple 2DGE spots. An elastic-net regression model was chosen as it deals with overfitting and accounts for multi-collinearity between predictors (spots) by encouraging a grouping effect where strongly correlated predictors tend to be in or out of the model together. This type of regression model is also preferable for high-dimensional data where the number of explanatory variables is larger than the sample size (p > n). LOOCV was selected as the most appropriate cross validation approach for the sample size.
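To make the partial correlation step concrete, the sketch below computes the correlation between a protein measure and PET DVR after controlling for a list of covariates. It is written in plain Python for illustration and is not the authors' R code. It uses the standard recursive formula, removing one covariate at a time, and assumes Pearson correlation and complete data; the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, covariates):
    """Correlation of x and y controlling for every sequence in covariates.

    Recursion: r_xy.Z = (r_xy.R - r_xz.R * r_yz.R)
                        / sqrt((1 - r_xz.R**2) * (1 - r_yz.R**2)),
    where z is the first covariate and R is the list of remaining ones.
    """
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[0], covariates[1:]
    r_xy = partial_correlation(x, y, rest)
    r_xz = partial_correlation(x, z, rest)
    r_yz = partial_correlation(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

In the discovery analysis, `x` would correspond to a spot volume or spectral count, `y` to the 11C-PiB PET DVR, and `covariates` to the six covariates listed above. The recursion is convenient for a handful of covariates; regression on the covariates followed by correlation of the residuals gives the same result.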
Extension study data was obtained from a pre-existing dataset generated by Ashton et al. (2015) [7]. Briefly, protein MW isoforms had undergone the following statistical tests: Mann-Whitney U test and logistic regression (PiB+ and PiB- group comparison), and Spearman Rank Correlation (SRC) and linear regression (associations with PiB retention as a continuous measure). Each test had been completed for both the mean and median protein values, resulting in a total of eight statistical tests per protein MW isoform. For logistic and linear regression, age, gender, and APOE ɛ4 status were used as covariates. For the purpose of this study, we targeted only the protein MW isoforms of interest for statistical analyses and re-calculated false discovery rate corrections for this subset of data.
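Re-calculating the false discovery rate correction on a targeted subset amounts to recomputing q-values from only the p-values of interest. The text does not name the FDR procedure used, so the sketch below assumes the widely used Benjamini-Hochberg step-up method; the function name is hypothetical and the code is an illustration rather than the authors' implementation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values).

    The output is in the same order as the input list.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values
```

Because the adjustment depends on the number of tests, restricting the analysis to a smaller targeted set of isoforms changes the q-values relative to a correction over the full dataset. Isoforms with q < 0.1 would then be retained, in line with the threshold used in this study.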
RESULTS: DISCOVERY
We aimed in this study to identify blood-based markers indicative of brain amyloid in cognitively healthy people using samples obtained from the BLSA. Furthermore we sought to identify proteins that remained stable in their biomarker ability over time. We then examined whether these proteins were also associated with PET amyloid in disease using samples and data obtained from the AIBL study. In both cases we predicated plasma studies not on clinical status but on PET imaging using in vivo 11C-PiB PET derived measures of global brain amyloid burden. In order to explore the large dynamic range of proteins detected by MS/MS, we utilized a combination of gel based and gel free methods of protein separation and also depleted the samples of high molecular weight proteins.
Discovery phase 1 : 2DGE identification of high MW plasma protein markers of global 11C-PiB PET DVR
We first performed 2DGE on 54 participants in the BLSA-NI, all without dementia at each assessment at three time points; 12 years before PET imaging, 6 years before PET imaging, and concurrently with PET PiB imaging. In total, 1386 discrete silver stained protein spots, each representing proteins with similar molecular weight and charge characteristics, were identified as being well defined and present on each gel.
Partial correlation was performed to identify 2DGE protein spots that were significantly related to cerebral amyloid load whilst controlling for age, gender, education, BMI, cholesterol, and APOE ɛ4 status. Normalized spot volumes correlating with 11C-PiB PET DVR at each time point, whilst controlling for covariates were identified (119, 47, and 73 protein spots nominally significant at baseline, time point 2, and time point 3, respectively, p < 0.05). 2DGE protein spots showing significant correlation were visually re-assessed to verify suitability for identification by mass spectrometry. Spots which were too small or faint to be accurately located by eye were excluded from further analysis. After these visual inspections the number of spots remaining were: 53 spots for time point 1, 6 spots for time point 2, and 33 spots for time point 3. These proteins were taken forward for further regression analysis. The above steps are summarized in Fig. 2.
For each time point, an elastic-net penalized regression model with LOOCV was performed on the remaining spots with seven covariates included: age, gender, BMI, cholesterol, education (years), APOE ɛ4 status, and prediction time (years between plasma sample and 11C-PiB PET scan). The elastic-net model with the minimum cross-validated Mean Squared Error (MSE) was selected as the model of interest. To obtain a measure of how well these elastic-net protein models were predicting PET amyloid load, multiple linear regressions were performed. The resulting regression models could explain 83%, 53%, and 90% of the variance in 11C-PiB PET DVR levels 12 years later, 6 years later, and concurrently, respectively (Table 3).
The protein spots on 2DGE gels that were included in each regression model and hence had some value in predicting PiB amyloid load were selected for identification with mass spectrometry. Manual spot picking of significant spots was performed as described in Hye et al. [4]. The proteins found to be abundantly present in the 2DGE spots significantly related to 11C-PiB PET DVR at each of the three time points are listed in Supplementary Tables 1, 2, and 3.
Discovery phase 2: LC-MS/MS identification of LMW plasma protein markers of global PiB PET DVR
We then performed a gel-free tandem mass spectrometry experiment using a subset of the same sample set in order to supplement the 2DGE/MS data that preferentially examined abundant high molecular weight proteins. The analysis of the resulting data followed broadly the same workflow, first using partial correlation with 11C-PiB PET DVR while controlling for covariates (age, gender, education (years), BMI, cholesterol, and APOE ɛ4 status) to identify proteins which have a relationship with amyloid load. Partial correlations at baseline, time point 2, and time point 3 revealed 5, 2, and 2 proteins, respectively, which nominally significantly correlated with 11C-PiB PET DVR 12 years later, 6 years later, and concurrently (p < 0.05) (data shown in Supplementary Tables 4, 5, and 6).
Combining data from the two discovery approaches
Table 4 combines all of the proteins identified as candidate biomarkers in the two discovery phases. Seven proteins were found as candidate biomarkers for 11C-PiB PET DVR consistently across all three time points: alpha-2-macroglobulin, apolipoprotein A-I, complement C3, complement C4B, haptoglobin, Ig kappa chain C region, and serum albumin.
RESULTS: EXTENSION STUDY
TMT LC-MS/MS performed on AIBL subjects
Proteins identified as candidate biomarkers across all three time points were taken forward for further investigation. We then wanted to measure each of these discovery phase proteins simultaneously in multiple samples from an entirely independent cohort to allow for relative quantitation in a replication phase study. TMT combined with LC-MS/MS is an approach that allows the use of the same methodology to analyze proteins, but now multiplexed with up to six samples, including a reference sample, simultaneously. This approach had previously been performed by Ashton et al. [7] on 78 samples from the AIBL study; TMT LC-MS/MS had been performed on all samples and, after a stringent data clean up procedure (detailed in Ashton et al. [7]), 381 protein MW isoforms were confidently measured, representing 116 unique protein groups. Of our 7 candidate biomarkers from the discovery phase analyses, all but one (complement C4b) were detected in this dataset. As protein identifications observed in different gel MW fractions were considered as separate entities, the 6 detected proteins consisted of 45 protein MW isoforms, which were then targeted for statistical analyses.
A total of 15 protein MW isoforms passed at least one statistical test (uncorrected p < 0.05). After false discovery rate (FDR) correction, 8 protein MW isoforms passed at least one statistical test (q < 0.1) (shown in Table 5). These 8 protein MW isoforms represented α2M, serum albumin, Apo-A1, C3, and haptoglobin. This result adds support to the discovery findings that these 5 proteins perform as biomarkers of PET amyloid load, and importantly indicates that their ability remains stable throughout different stages of AD.
DISCUSSION
It would be of considerable benefit to AD clinical trials if markers in peripheral fluids could be found that correlated with pathology specific markers derived from PET or CSF measurements. Such markers might be less specific than direct measures of pathological proteins but might make a substantial contribution to reducing the costs of screen failure and accelerate identification of potential participants to interventional trials. As part of efforts to search for such markers, plasma proteins were assessed for their relationship to amyloid load 12 years prior, 6 years prior, and concurrently to estimation of cerebral amyloid pathology using a 11C-PiB PET scan.
This study used two complementary proteomic techniques for the discovery of plasma biomarkers of pre-clinical AD pathology. Together these two techniques provide a relatively comprehensive proteome analysis of plasma; a tissue challenging for biomarker discovery both because of its complexity and the dynamic range of protein concentration and size. Each technique identified unique candidate biomarkers and three candidate proteins overlapped between the two techniques, providing a form of technical replication.
Seven proteins correlated with amyloid burden at all three time points and were taken forward to extension experiments in an independent cohort comprised of cognitively healthy individuals as well as those with subjective memory concerns, mild cognitive impairments, and AD. Of these proteins, complement C4b was the only protein not detected using the multiplexing analysis, possibly because of a failure of TMT labeling or because C4b has high peptide sequence homology with another protein detected, such as complement C4a, and consequently peptides may have been misidentified. Five of the six proteins detected in these extension studies showed a significant association with PET amyloid load even when FDR corrections were applied. These five proteins are α2M, serum albumin, Apo-A1, C3, and haptoglobin. Together these results provide strong support for the ability of these proteins to reflect amyloid burden throughout the disease course, from a pre-clinical phase through to established clinical syndromes.
All five of these proteins have been previously identified in plasma as candidate biomarkers associated with AD disease status and/or pathology [4, 22–27]. In fact α2M and C3 are two of the most reproducible plasma protein markers of AD, associating with AD-related phenotypes in five independent cohorts [5]. It is encouraging to see the same proteins consistently reported as plasma AD biomarkers across studies. The findings here add support for their utility as AD biomarkers in the preclinical stage, and also demonstrate that their biomarker abilities remain stable over time.
In the discovery phase experiments, a number of proteins were identified as biomarkers of PET amyloid load at 1 or 2 time points. Due to the statistical approach used (penalized regression model), it is not possible to rule out that these proteins may have also performed as biomarkers at all three time points but were omitted from the regression model due to collinearity with another protein. Of particular interest are fibrinogen chains alpha, beta, and gamma, all of which were significantly associated with PET amyloid load at two time points, and have been identified in previous studies as biomarkers of PET amyloid load in AD cohorts [7, 27]. Furthermore fibrinogen has been shown to cross the blood-brain barrier and co-localize with amyloid plaques in AD brain tissue and AD animal models [28–30]. The results of this study therefore replicate the potential biomarker ability of fibrinogen and furthermore indicate its possible utility during the preclinical stage of the disease.
The proteins found here to be implicated in pre-clinical AD are involved in a variety of systems, including complement and inflammation, coagulation, hemostasis, fibrinolysis, and blood-brain barrier integrity. Previous research has implicated these systems in AD pathogenesis [31–34]. Such systems may be causally related to AD onset, or be affected as a result of the disease. Additionally, the involvement of these systems in AD pathology may be inter-linked or they may act independently; further work is needed to clarify these relationships. It may be possible to identify the resulting contribution of some of these pathways (e.g., chronic inflammation) as high risk factors for the development of AD in healthy individuals, comparable to that of more-established risk factors such as APOE ɛ4 status. At the very least, the presence of these biomarkers in healthy elderly individuals indicates potential early intervention targets worthy of further investigation. Further work examining how these plasma proteins may be associated with the longitudinal cognitive decline and brain atrophy data available within the BLSA-NI would also be beneficial to see whether similar biological systems are again highlighted.
We acknowledge the limitations of this study, largely caused by the relatively small cohort of subjects used in this investigation. However, we believe the advantages of the longitudinal replication achieved in the discovery phases compensate for this and enable the detection of proteins robust in their biomarker ability over time. The unbalanced discovery phase cohorts are also a limitation of this work; however, this design was necessary due to sample availability and cost limitations. Validation within other healthy elderly cohorts with amyloid PET scans is also critically needed to assess the candidate markers' ability to perform across cohorts and studies. We believe the proteomic markers identified in our study will serve as a useful starting point for future validation studies in such cohorts.
A further limitation of this study may be the lack of standardization in the approaches used for the amyloid PET quantitative analyses between the discovery study (DVR) and the extension study (SUVR). In some cases differing DVR and SUVR quantitative approaches may impact upon the classification of amyloid positivity; however, we believe the effects of using the two different approaches to be minimal in this study. In the discovery phase analyses, subjects were not dichotomized into groups, and the statistical analysis applied identified proteins that could predict continuous amyloid burden. As DVR and SUVR values are often reported to correlate, the resulting markers should be minimally affected by the use of either method. The extension study did use an SUVR cut off (1.3) to classify subjects as amyloid negative/positive; however, the cohort was enriched for clear cases of PiB negativity and positivity (mean negative SUVR = 1.1, mean positive SUVR = 2.3). Therefore the likelihood of these subjects being identically classified as amyloid negative/positive via an alternative analysis approach is high.
In conclusion, this study adds further evidence that plasma is a rich resource for AD biomarkers. Biological variation and the dynamic range of plasma complicate discovery, but through the use of complementary techniques these issues can be minimized. Here we show it is also possible to detect these plasma-based biomarkers at a stage long before the onset of clinical disease manifestations. We identified numerous candidate biomarkers, many of which have previously been reported as AD biomarkers in the clinical stage of the disease, providing further evidence for such biomarkers and supporting their utility as biomarkers longitudinally throughout the disease. Quantitative assays are now needed to validate the proteins reported here as candidate biomarkers in the pre-clinical stage. If validated, these proteins may, independently or in combination, provide a blood-based screening measure to enrich populations for clinical trial recruitment, and allow repeated testing as companion biomarkers in such clinical trials.
Footnotes
ACKNOWLEDGMENTS
We are grateful for grant funding from the Alzheimer’s Society, Alzheimer’s Research UK, and the NIHR Biomedical Research Centre for Mental Health and Biomedical Research Unit for Dementia at the South London and Maudsley NHS Foundation Trust and King’s College London. We are grateful to participants in the Baltimore Longitudinal Study of Aging for their invaluable contribution. This research was supported in part by the Intramural Research Program of the NIH, National Institute on Aging. A portion of this work was funded by GE Healthcare and Janssen R&D. SJK is supported by an MRC Career Development Award in Biostatistics (MR/L011859/1). The authors acknowledge data used in the preparation of this article obtained from The Australian Imaging, Biomarkers and Lifestyle Study of Ageing (AIBL), which is a collaboration between CSIRO, Edith Cowan University (ECU), The Florey Institute of Neuroscience and Mental Health (FINMH), National Ageing Research Institute (NARI), and Austin Health.
