Abstract
Background
Regulatory approval of new investigational Alzheimer's disease (AD) therapies could be accelerated if reasonably likely surrogate endpoints could be used. Neurofilament light chain (NfL) has potential utility as a prognostic biomarker of neurodegeneration in AD.
Objective
To synthesize available evidence on the relationship between baseline NfL levels and longitudinal clinical decline.
Methods
A systematic literature review identified 19 eligible studies, contributing 37 longitudinal statistical models evaluating the association between baseline NfL (plasma or cerebrospinal fluid [CSF]) and subsequent clinical decline based on validated clinical scales including Mini-Mental State Examination (MMSE), Alzheimer's Disease Assessment Scale-Cognitive Subscale, and Clinical Dementia Rating. Results were pooled via meta-analysis, using partial correlation coefficients (PCC), separately for patient sub-groups (mild cognitive impairment, AD or combined).
Results
Across the AD continuum, higher baseline NfL levels were consistently associated with greater cognitive and global clinical decline in most analyses. This pattern was consistent for both plasma (pooled PCC = −0.17 [95% CI = −0.22, −0.12] for MMSE, any AD population) and CSF NfL (pooled PCC = −0.14 [95% CI = −0.24, −0.04] for MMSE, any AD population). The strength of association across multiple clinical endpoints and populations, measured by absolute value of pooled PCC, ranged from 0.13 to 0.25.
Conclusions
The results support the utility of NfL as a predictive biomarker for progression of clinical decline in AD patients.
Keywords
Introduction
Alzheimer's disease (AD) is a progressive brain disease resulting from neuronal damage, starting in brain areas responsible for memory, language, and thinking. 1 In 2018, the National Institute on Aging and Alzheimer's Association (NIA-AA) developed a research framework towards a biological definition of AD. 2 This framework has since been revised to provide criteria for staging of AD. 3 Based on this framework, AD is an aggregate of neuropathological changes defined by specific in vivo biomarkers. Biologic hallmarks characteristic of AD include amyloid-β (Aβ) plaques and intracellular tau neurofibrillary tangles, as well as neuronal degeneration.1–4 AD-related brain changes occur at least 20 years before symptom onset and can be detected with various biomarkers, through neuroimaging (i.e., magnetic resonance imaging and positron emission tomography [PET]) and through markers in cerebrospinal fluid (CSF) and plasma (or serum). 5 Recent advances in the development of more accurate AD-specific blood tests have enabled clinical trials and specialized memory care centers to incorporate the use of blood tests to help identify AD-related brain changes,6,7 with ongoing development of standardized and validated blood tests for use in routine clinical practice. 7
The updated NIA-AA diagnostic scheme is based on tau and beta-amyloid biomarkers which are closely associated with amyloid pathology. 3 In this scheme, the combination of the two proteinopathies is considered the defining feature for a diagnosis of AD. It is nevertheless acknowledged that a positive amyloid PET scan is a frequent finding in cognitively unimpaired individuals, whereas a high level of tau detected by a tau PET scan is invariably associated with neurodegeneration and cognitive symptoms. 3 This highlights the importance of neurodegeneration as a key downstream indicator of clinically relevant disease. Neurofilament light chain (NfL), originally regarded as a necessary marker of neurodegeneration and neuroaxonal injury, 2 is now classified as a tool for staging, prognosis and an indicator of a biological treatment effect in AD. 3 NfL is a non-specific marker of neuronal damage, that is increasingly used to evaluate disease progression (demonstrated by increased levels of NfL in the blood and CSF) across a wide variety of neurological conditions, including multiple sclerosis (MS), spinal muscular atrophy, and AD.8–12 In AD, pathological aggregation of tau protein results both in a loss of microtubule stability 13 and direct neuronal and synaptic toxicity14,15 with neuritic tau pathology strongly associated with clinical decline.16,17 NfL is released into plasma because of loss of structural integrity of neurons resulting from tau aggregation pathology. 18 As a result, plasma NfL concentration is correlated with the load of tau aggregation pathology18–20 and disease severity. 20 Therefore, there is a reasonable mechanistic basis for the potential utility of measurement of change in plasma levels of NfL as a downstream biomarker for the neurodegenerative process that underlies AD.
Elevated NfL levels in blood in AD reflect ongoing neurodegeneration. This makes it a good indicator of disease severity and progression without being reliant on the presence of any one pathology. Increased plasma NfL levels are present from pre-symptomatic 21 through to advanced stages of AD. 22 Multiple studies have now shown that plasma NfL levels correlate with disease severity and predict disease progression from mild cognitive impairment (MCI) to AD 23 or progression through stages of AD.18,24,25 Importantly, these relationships are evident even in early-stage AD patients with MCI, supporting the utility of NfL to detect and track disease progression across the continuum of AD. 21 Based on this accumulated evidence, NfL has become recognized as a core biological marker of neurodegeneration and clinical progression in AD. 3
Several independent studies have recently reported rates of change in plasma NfL levels correlating with changes in cognitive decline and neuroimaging hallmarks of AD, supporting the use of NfL concentration as a biomarker for disease progression.18,22,24,26,27 Increased plasma NfL concentration has also been associated with poorer cognitive scores in MCI-AD and AD,28,29 with longitudinal change in plasma NfL shown to correlate with measures of cognitive decline. 25 While there is evidence from the literature that independently supports the prognostic value of plasma NfL for disease progression and severity in AD, the relationship between blood-based biomarkers including NfL and clinically relevant outcomes such as cognition requires further validation and has been identified as a research priority by the Alzheimer's Association. 11
The regulatory context for the present study is that NfL has been used as a surrogate endpoint in several neurodegenerative diseases. The most recent guidelines from the Consortium of Multiple Sclerosis Centers support the use of NfL as a blood biomarker to provide clinically useful information about prognosis and therapeutic efficacy for MS. 30 The FDA has recently approved a treatment for amyotrophic lateral sclerosis under the accelerated approval pathway based upon the surrogate endpoint of reduction of NfL in plasma. In December 2024, the FDA further validated NfL's utility by agreeing that reductions in NfL measured in CSF may serve as supportive evidence of therapeutic benefit for AMT-130 in Huntington's disease under the accelerated approval pathway. 31 However, to date NfL has not been used as the basis for regulatory approval in the treatment of MCI due to AD or AD dementia. Recent revisions to guidance from the FDA have further clarified the framework for evaluation of surrogate endpoints, emphasizing the need for convergence of evidence from multiple sources, including preclinical animal models, epidemiological data, and relevant clinical data from randomized controlled clinical trials (RCTs) demonstrating that an effect on the surrogate is reasonably likely to predict clinical benefit. 32 The only data available from RCTs linking NfL levels with clinically relevant outcomes showed significant correlations with integrated Alzheimer's Disease Rating Scale (iADRS) and with progression of brain atrophy,33,34 whereas plasma levels of P-tau 217, GFAP and amyloid-β 42/40 showed no significant correlation.33,34
The donanemab phase 2 RCT 34 met statistical significance on its primary endpoint, the iADRS. Although it did not reach statistical significance for plasma NfL, it demonstrated a statistically significant correlation between change in NfL and change in iADRS (r = −0.182, p = 0.02), which supports the view that plasma NfL is reasonably likely to predict a clinical benefit. Nonetheless, it is important to further demonstrate that the biomarker is predictive of longitudinal clinical outcome. To our knowledge, there has not been a systematic review investigating the relationship between baseline NfL levels and longitudinal change in cognitive or global function in AD. We have therefore undertaken this study to determine whether NfL has the potential to function as a predictive biomarker in MCI due to AD and AD dementia. To this end, we have identified previously published studies in MCI due to AD and AD dementia that have investigated statistical relationships between baseline NfL and change in clinical outcomes and used meta-analytic methods to determine the relative predictive utility of NfL.
Methods
A systematic literature review (SLR) was performed in accordance with the Cochrane Handbook for Systematic Reviews of Interventions and reported in alignment with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.35–37 This SLR was not registered, and a protocol was not prepared.
Search strategy
The search strategy was developed and tested through an iterative process by a medical information specialist and was peer-reviewed independently by another senior medical information specialist before execution using the Peer Review of Electronic Search Strategies (PRESS) checklist. 38 The search strategy was developed based on pre-defined PICOS criteria (Supplemental Table 1).
Using the Ovid® search interface, the following databases were searched on October 4, 2023: Ovid MEDLINE® (including Epub Ahead of Print, In-Process, In-Data-Review & Other Non-Indexed Citations and Daily), Embase, Cochrane Database of Systematic Reviews, and Cochrane Central Register of Controlled Trials. Search strategies utilized a combination of controlled vocabulary and keywords (e.g., “Alzheimer Disease”) (Supplemental Table 2). Modified versions of the Cochrane Highly Sensitive Search Strategy filter for randomized controlled trials (RCTs) and a filter for observational & non-randomized studies were applied. The search strategy was not restricted by language. Animal-only and opinion pieces were removed from the results. Conference abstracts were limited to those available in the last two years from the date of the database search (i.e., October 2021 to October 2023). Gray literature searches were also conducted for the following: websites of five key clinical conferences confirmed not to be indexed within Embase (limited to the last two years from the date of the database search), clinical trial registries, reference lists of previously published reviews, and relevant technology appraisals from health technology assessment (HTA) agencies and regulatory agencies (Supplemental Table 3).
Study selection
Records identified from the electronic database searches were de-duplicated prior to exporting to DistillerSR (Evidence Partners, Ottawa, Canada) systematic review software for study selection. Study selection was conducted by two reviewers who independently reviewed the study records, citation titles, and abstracts to assess eligibility based on the pre-defined eligibility criteria. Duplicates were quarantined from the final screening list prior to study selection. Reviewers documented their reasons for exclusion and any discrepancies between the two reviewers were resolved by consensus or were referred to and resolved by a third independent reviewer not involved in the study selection process.
Records considered to describe potentially eligible studies were independently reviewed by two reviewers in full-text form for formal inclusion in the review. Records that did not meet the eligibility criteria were excluded and the reason for exclusion was recorded. Any discrepancies between the two reviewers were resolved by consensus or were resolved by a third independent reviewer not involved in the study selection process. Included full-text articles were further validated for inclusion during the data extraction phase. This involved reviewing the study design details, baseline population characteristics, and statistical analyses of the outcomes of interest in detail. Searches and study selection of all gray literature sources described above were conducted by a single reviewer.
Inclusion and exclusion criteria
The specific inclusion criteria were formulated as follows: 1) adult patients with diagnosed AD, or adult patients with dementia due to AD, or adult patients with MCI due to AD; 2) studies reporting a statistical correlation between baseline NfL measurement (in either blood or CSF) and at least one relevant AD clinical outcome as assessed by the Mini-Mental State Examination (MMSE), the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog), Clinical Dementia Rating (CDR) Global Rating Scale, Clinical Dementia Rating Scale Sum of Boxes (CDR-SB), the Alzheimer's Disease Cooperative Study – Activities of Daily Living (ADCS-ADL) scale, or the Alzheimer's Disease Neuroimaging Initiative (ADNI) assessment scale for memory (ADNI-MEM) or executive function (ADNI-EF). We have designated the ADNI scales as ADNI-S for convenience. Studies were not restricted based on whether they included an intervention or comparator.
Conversely, studies were excluded if they: 1) included patients ≤18 years; 2) had populations not specifically diagnosed with AD, or had mixed dementia (i.e., AD and at least one other form of dementia), or reported aggregated data for mixed dementia populations that included <80% of patients with pure AD, or focused on patients with AD and specific comorbidities; 3) were based on non-human research; 4) were published in languages other than English; 6) were in the form of case reports, protocols, comments, letters, editorials, narrative reviews, or systematic literature reviews.
Data extraction
Data were extracted by two independent reviewers into a standardized form in Microsoft® Excel (Microsoft Corporation, Seattle, US). A list of specific data elements extracted from included studies is presented in Supplemental Table 4. A risk of bias assessment of each included study was conducted using a modified version of the Downs and Black checklist for the assessment of the methodological quality of both randomized and non-randomized studies (Supplemental Table 5).39–41 Only full-text publications were assessed for quality since conference abstracts often lack sufficient methodological data to assess study quality.
Data synthesis and analysis
Synthesis of correlations. A random effects meta-analysis was used to synthesize published associations between baseline NfL and change in clinical status. Given the methodological heterogeneity present in the evidence base, published results were transformed into a partial correlation coefficient (PCC) prior to synthesis. 42 The PCC is a measure of association between an outcome of interest (change in clinical outcome) and a focal covariate (baseline NfL) after accounting for the explanatory power of other model covariates (e.g., age, sex and years of education; Supplemental Table 11). Converting the reported estimates to PCCs helps to reduce the heterogeneity introduced by conditioning on different sets of covariates and by reporting results on different scales (e.g., standardized versus unstandardized β).
Summary of the population groups, baseline neurofilament light chain detection method and clinical assessment tool of the selected models.
Combined population includes models with participants from both the MCI and AD populations.
Unclear from study description if the ADAS-Cog11 or ADAS-Cog13 was used. AD, Alzheimer's disease; ADAS-Cog 11, Cognitive abilities with the 11-question Alzheimer's Disease Assessment Scale; ADAS-Cog 13, Cognitive abilities with the 13-question Alzheimer's Disease Assessment Scale; ADCS-ADL23, Functional abilities with the 23-item Alzheimer's Disease Cooperative Study – Activities of Daily Living; ADNI, Alzheimer's Disease Neuroimaging Initiative; ADNI-EF, A composite score for executive function validated in Alzheimer's Disease Neuroimaging Initiative participants; ADNI-MEM, A composite memory score from the Alzheimer's Disease Neuroimaging Initiative neuropsychological battery; CDR, Clinical Dementia Rating; CDR-SB, Clinical Dementia Rating Sum of Boxes; CSF, cerebrospinal fluid; MCI, mild cognitive impairment; MMSE, Mini-Mental State Examination; NfL, neurofilament light chain.
Derivation of partial correlation coefficients. The PCC was calculated using the formula
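A conventional way to obtain a PCC from regression output, assumed here for illustration rather than taken from the included studies, is PCC = t / sqrt(t² + df), where t is the t statistic for the focal covariate (baseline NfL) and df is the residual degrees of freedom of the model. The Python sketch below applies that conversion. The function names and the df convention (sample size minus number of predictors minus one for the intercept) are hypothetical choices made for this example.

```python
import math


def residual_df(n_participants: int, n_predictors: int) -> int:
    """Residual degrees of freedom for a linear model with an intercept.

    Assumption for this sketch: df = n - (number of predictors) - 1.
    """
    return n_participants - n_predictors - 1


def pcc_from_t(t_stat: float, df: int) -> float:
    """Partial correlation from a t statistic and residual df.

    Uses the conventional conversion r = t / sqrt(t^2 + df).
    """
    if df <= 0:
        raise ValueError("df must be positive")
    return t_stat / math.sqrt(t_stat ** 2 + df)


def pcc_from_beta(beta: float, std_error: float, df: int) -> float:
    """Partial correlation when a model reports a coefficient and its SE."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    return pcc_from_t(beta / std_error, df)


if __name__ == "__main__":
    # Hypothetical model: 100 participants, NfL plus two adjustment covariates.
    df = residual_df(n_participants=100, n_predictors=3)
    print(round(pcc_from_t(-2.0, df), 3))
```

Because the conversion needs only t (or β and its standard error) and the degrees of freedom, it can be applied to models that do not report R², which is consistent with the data availability described in the next paragraph.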
Other types of correlations were considered for synthesis but not pursued further because the additional data required to derive them were not available in a sufficient number of identified models. For instance, while the semi-partial correlation and standardized slope represent suitable alternatives, both require information on R², which was reported in only approximately 39% of models eligible for the meta-analysis.
Statistical analysis. Meta-analyses were performed separately for the following populations of interest: 1) any population (i.e., models from patients with MCI only, models from patients with AD only, and models that had a combined population of MCI and AD); 2) patients with MCI only; 3) patients with AD only; 4) combined population (i.e., models that had a combination of patients with MCI and AD). For each of the populations of interest, meta-analyses were performed separately for each of the following outcomes: 1) MMSE; 2) ADAS-Cog; 3) CDR; 4) ADNI-S; 5) ADCS-ADL. It should be noted that for the purposes of the meta-analysis, some clinical endpoints were combined within a respective outcome category to ensure a sufficient sample size was available for synthesis. For example, synthesis of results for ADAS-Cog included results for both ADAS-Cog 11 and ADAS-Cog 13, synthesis of results for CDR included results for both CDR global score and CDR-SB, and the synthesis of results for ADNI-S included results for both ADNI-MEM and ADNI-EF. For each population and outcome, meta-analyses were performed separately for each NfL detection method: 1) plasma; 2) serum; 3) CSF. In addition, results were also pooled across NfL detection methods (i.e., any NfL measure).
Meta-analytic models were fitted separately for each population of interest (any, MCI, AD, or combined MCI and AD population), clinical outcome (MMSE, ADAS, CDR, ADNI-S, ADCS-ADL), and NfL detection method (any, plasma, serum, or CSF), provided there were at least two studies reporting at least three model results.
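As a rough illustration of this eligibility rule, the Python sketch below groups model estimates by population, outcome, and NfL detection method and keeps only subgroups with at least two distinct studies and at least three model results. The record layout, field names, and the reading of the rule (both thresholds applied to the subgroup as a whole) are assumptions made for this example.

```python
from collections import defaultdict


def eligible_subgroups(estimates, min_studies=2, min_estimates=3):
    """Return subgroups with enough studies and model results to pool.

    `estimates` is a list of dicts with hypothetical keys:
    "study", "population", "outcome", "nfl_method".
    The result maps each eligible subgroup key to a tuple
    (number of distinct studies, number of model estimates).
    """
    studies_by_group = defaultdict(list)
    for est in estimates:
        key = (est["population"], est["outcome"], est["nfl_method"])
        studies_by_group[key].append(est["study"])

    eligible = {}
    for key, studies in studies_by_group.items():
        n_studies = len(set(studies))
        n_estimates = len(studies)
        if n_studies >= min_studies and n_estimates >= min_estimates:
            eligible[key] = (n_studies, n_estimates)
    return eligible


if __name__ == "__main__":
    # Hypothetical records, not data from the included studies.
    demo = [
        {"study": "A", "population": "MCI", "outcome": "MMSE", "nfl_method": "plasma"},
        {"study": "A", "population": "MCI", "outcome": "MMSE", "nfl_method": "plasma"},
        {"study": "B", "population": "MCI", "outcome": "MMSE", "nfl_method": "plasma"},
        {"study": "C", "population": "AD", "outcome": "CDR", "nfl_method": "CSF"},
    ]
    print(eligible_subgroups(demo))
```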
As the evidence base was characterized by clustering of results within studies (i.e., reporting of multiple model results from individual studies), a three-level random effects meta-analysis was employed instead of a standard random effects model to account for correlation of effect sizes within studies. 43 A standard random effects model assumes that all correlations are independent with observed values parameterized by a common correlation and two terms, sampling error and between-study heterogeneity. This approach does not account for possible dependence of multiple estimates within studies and can therefore lead to biased estimates, under-estimated standard errors, and higher risk of false positives. In contrast, the three-level model considers the same two terms as the standard random effects model as well as a third term to account for within-study dependence in observed correlations, thereby producing more accurate pooled estimates.
This model was fitted for each subgroup of PCC estimates categorized by study population, clinical measurement, and baseline NfL detection method. For subgroups where each study only reported one PCC estimate, the third level was dropped since there was no within study variance to account for. Uncertainty of the pooled PCCs was characterized using 95% confidence intervals (CI) and prediction intervals.44,45 Heterogeneity was assessed using the I² statistic 46 and classified according to thresholds reported in Cochrane guidelines: 43 0% to 40%: might not be important; 30% to 60%: may represent moderate heterogeneity; 50% to 90%: may represent substantial heterogeneity; 75% to 100%: considerable heterogeneity.
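To make the pooling and heterogeneity steps concrete, the Python sketch below pools correlations with a standard two-level random effects model on the Fisher z scale (DerSimonian-Laird estimate of between-study variance) and reports I². It is a deliberate simplification and not the three-level model described above: it ignores clustering of estimates within studies, and it uses the usual 1/(n − 3) variance for Fisher's z without adjusting for the number of covariates partialled out. The function name and the input values are hypothetical.

```python
import math


def pool_correlations(correlations, sample_sizes, z_crit=1.959964):
    """Two-level random effects pooling of correlations (simplified sketch).

    Steps: Fisher z transform, inverse-variance weights, DerSimonian-Laird
    tau^2, random effects estimate with a 95% CI, back-transform to r.
    I^2 is computed from Cochran's Q as max(0, (Q - df) / Q) * 100.
    """
    if len(correlations) != len(sample_sizes) or len(correlations) < 2:
        raise ValueError("need at least two correlations with matching sizes")
    if any(n <= 3 for n in sample_sizes):
        raise ValueError("each sample size must exceed 3")

    z = [math.atanh(r) for r in correlations]
    var = [1.0 / (n - 3) for n in sample_sizes]
    w = [1.0 / v for v in var]

    # Fixed effect estimate and Cochran's Q.
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))
    df = len(z) - 1

    # DerSimonian-Laird between-study variance and I^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random effects estimate on the z scale, then back-transform.
    w_re = [1.0 / (v + tau2) for v in var]
    z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": math.tanh(z_re),
        "ci": (math.tanh(z_re - z_crit * se), math.tanh(z_re + z_crit * se)),
        "tau2": tau2,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical PCCs and sample sizes, not values from the included studies.
    print(pool_correlations([-0.10, -0.20, -0.25], [150, 300, 220]))
```

The resulting I² value can then be read against the overlapping Cochrane ranges listed above. Because those ranges overlap, the sketch returns the number rather than assigning a single label.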
All analyses were conducted in R version 4.3.2 (R Core Team, Vienna, Austria). The three-level random effects meta-analyses were fitted using the meta package (metacor function) version 6.5-0. 47
Results
Systematic literature review
In total, 1432 records were identified by the database searches (Figure 1). After duplicates were removed, 924 citations remained for title and abstract screening; 704 records did not meet inclusion criteria and were excluded. The subsequent full text review resulted in the exclusion of an additional 201 records. Gray literature searches identified 266 records, and all identified records were excluded during screening. No additional studies were identified through bibliography review of SLRs, given that no SLRs met the pre-defined eligibility criteria. In total, 19 records met the pre-defined eligibility criteria.

PRISMA flow diagram. AAIC, Alzheimer's Association International Conference; AD/PD, Alzheimer's Disease/Parkinson's Disease; ANZCTR, Australian New Zealand Clinical Trials Registry; CADTH, Canadian Agency for Drugs and Technologies in Health; CDSR, Cochrane Database of Systematic Reviews; CENTRAL, Cochrane Central Register of Controlled Trials; CTAD, Clinical Trials on Alzheimer's Disease; EMA, European Medicines Agency; FDA, Food and Drug Administration; HTA, Health Technology Assessment; ICTRP, International Clinical Trials Registry Platform; ISPOR EU, International Society for Pharmacoeconomics and Outcomes Research Europe; IQWiG, Institute for Quality and Efficiency in Health Care; MA, meta-analysis; MEDLINE, Medical Literature Analysis and Retrieval System Online; n, number; NICE, National Institute for Health and Care Excellence; NMA, network meta-analysis; PBAC, Pharmaceutical Benefits Advisory Committee; SLR, systematic literature review; TLV, Dental and Pharmaceutical Benefits Agency.
Of the 19 included studies, one was a conference abstract and was not eligible for risk of bias assessment. Of the 18 studies eligible for quality assessment using the modified version of the Downs and Black Questionnaire,39–41 17 studies were rated as poor quality and one study was given a rating of fair (Supplemental Table 6); it should be noted, however, that the maximum score for uncontrolled (20 points) and non-randomized (25 points) studies is inherently lower than for randomized controlled studies (28 points) with use of this questionnaire. 41 For transparency, sources of funding and conflicts of interest in the included studies were assessed. Only one of the included studies was industry sponsored. 33 One study did not report any information about funding 48 and one study did not report any information about funding or conflicts of interest. 49
Characteristics of included studies
Nineteen unique publications met the inclusion criteria (Supplemental Table 7). As noted above, of the 19 unique publications that met the inclusion criteria, 18 were full-text publications and one was a conference abstract. 33 All the included studies were published within the past eight years (range = 2016–2023). Two of the included studies were RCTs,33,50 and the remainder were observational studies. The included studies used data from participants enrolled in several different databases including various ADNI databases (e.g., ADNI-1, ADNI-2, ADNI-3, ADNI-GO),23,51–58 BioFINDER databases (including BioFINDER-1 and BioFINDER-2),53,57 the Australian Imaging, Biomarkers and Lifestyle study of ageing (AIBL),59,60 Prospective Dementia Registry Austria (PRODEM-Austria), 61 IRCCS San Raffaele Scientific Institute, 49 CYTOCOGMA cohort, 48 EMIF Multi-modal biomarker discovery (EMIF-AD MBD) study, 62 and Boston University Alzheimer's Disease Center Clinical Core Registry. 26
Each of the included studies was reviewed for statistical analyses that investigated the relationship between baseline NfL and one of the clinical outcome measures identified in the eligibility criteria. Each of these statistical analyses was represented as a unique “model” estimate for data extraction. The 19 included studies provided a total of 37 model estimates that were identified as meeting the inclusion criteria for full data extraction.
A summary of baseline characteristics for the participants included in the selected models is available in Supplemental Table 8. Mean age across the available models ranged from 69.8 62 to 78.0 48 years and the proportion of females included ranged from 28.0% 53 to 75.0%. 48 Twenty-one models reported the baseline MMSE score for participants, with mean scores ranging from 14.7 63 to 28.3, 64 highlighting the spectrum of AD disease severity among included studies. Most models (n = 30) reported baseline NfL levels in either plasma, serum, or CSF.
A summary of the population groups, baseline NfL detection methods and clinical assessment tools in the selected models is available in Table 1. A summary of the description of the included models is available in Supplemental Table 9; follow-up time for the included models ranged from 6.0 33 months to 78.0 59 months.
Thirty-three of the included models adjusted for multiple covariates (ranging from two 63 to ten 52 covariates adjusted for per model) while four models were univariate. There was significant heterogeneity in the sample sizes for the included models, which ranged from 17 33 to 1477 51 participants. There was also a large degree of heterogeneity in the statistical estimates that were reported from the selected models: thirty models reported p-values, eighteen models reported β-values, twelve models reported R², nine reported Rho, six reported the t statistic, and four reported SE. A summary of the statistical data available for the selected models is available in Supplemental Table 10.
Synthesis of correlations and study selection for meta-analysis
Overall, 19 studies contributing 37 model results were available for synthesis. Of these studies, 17 reported statistical associations as t statistics or β-values from regression models that conditioned on other covariates, such as age, gender, education, and other baseline characteristics (Supplemental Tables 10 and 11). Two further studies reported univariate Pearson correlation coefficients without adjustment for other covariates. Upon review of available data, six model estimates (16a, 16b, 17, 18a, 18b, 19a) from four primary studies26,49,56,60 were excluded from the meta-analysis. All six estimates were eliminated due to insufficient information to permit derivation of the PCC. After exclusion of these studies, a total of 31 estimates from 15 studies were considered for the meta-analysis. A summary of the PCCs for each model included in the meta-analysis is reported in Table 1.
Given that there was an insufficient number of models across each population group and NfL detection method for ADCS-ADL, no meta-analytic models were run for this outcome measure. After converting published results into PCCs, the number of unique studies contributing statistical associations and the number of model estimates that were available for meta-analysis are reported by endpoint and population of interest in Table 2.
Number of studies (and number of estimates) available for meta-analysis and summary of pooled partial correlations from meta-analysis by study population group, clinical outcome measure, and neurofilament light chain detection method.
Any population includes studies with patients in MCI population, AD population, or combined (MCI and AD) population.
Combined population includes studies with participants from both the MCI and AD populations, with results presented for both groups combined.
Any NfL measurement included studies using plasma, serum or CSF to measure baseline NfL levels.
*All available data for this outcome is for the MCI population; therefore, a meta-analysis was performed in the MCI population only.
**All available data for this outcome is for plasma measurements; therefore, a meta-analysis was performed for plasma NfL only.
∓ Baseline NfL is associated with change in clinical outcome, with a nominal p-value <0.05 and 95% confidence intervals that exclude 0 (null value of no correlation). AD, Alzheimer's disease; ADAS, Alzheimer's Disease Assessment Scale; ADNI, Alzheimer's Disease Neuroimaging Initiative; CDR, Clinical Dementia Rating; CI, confidence interval; CSF, cerebrospinal fluid; MCI, mild cognitive impairment; NfL, neurofilament light chain.
Summary of the meta-analysis results
A summary of the pooled partial correlations derived from the three-level random effects model is reported in Table 2 for all study populations, cognition and global outcomes, and NfL measurements of interest. Results consistently imply a significant correlation across most analyses performed. The estimated 95% confidence intervals imply correlation coefficients that range in absolute value from 0.06 to 0.44.
It should be noted that the signs of the pooled partial correlation coefficients differ according to the clinical assessment tool used. For example, results for ADAS and CDR appear as a positive value, whereas those for MMSE and ADNI-S appear as a negative value. This difference arises from the directionality of the association between cognitive function and total scores on the clinical assessment tool, with higher MMSE and ADNI-S scores being associated with improved cognition, while higher ADAS and CDR scores are associated with poorer cognition or global function.
Mini-Mental state examination
Any population (mild cognitive impairment, Alzheimer's disease, or combined). Pooled PCCs between baseline NfL (any measurement) and MMSE in any population are reported based on fifteen model estimates from nine studies (Figure 2). The point estimates and confidence intervals indicate that higher NfL levels are significantly associated with a decrease in cognitive function as measured by the MMSE (pooled PCC: −0.16, 95% CI: −0.21, −0.11, I²: 49%). Similar results were derived for both plasma (pooled PCC: −0.17, 95% CI: −0.22, −0.12; Figure 3) and CSF NfL measurements (pooled PCC: −0.14, 95% CI: −0.24, −0.04; Supplemental Figure 1).

Forest plot for the meta-analysis of the relationship between any baseline neurofilament light chain (a) and change in Mini-Mental State Examination score in any population (b). (a) Included models using any detection method (plasma, serum, or CSF) to assess baseline NfL. (b) Any population includes models with patients in MCI population only, AD population only, or combined (MCI and AD) population. CI, confidence interval; COR, correlation.

Forest plot for the meta-analysis of the relationship between baseline plasma neurofilament light chain and change in Mini-Mental State Examination score in any population (a). (a) Any population includes models with patients in MCI population only, AD population only, or combined (MCI and AD) population. CI, confidence interval; COR, correlation.
Mild cognitive impairment. Pooled PCCs between baseline NfL (any measurement) and MMSE in patients with MCI are reported based on eight model estimates from six studies (Supplemental Figure 2). The point estimates and confidence intervals indicate that higher NfL levels are significantly associated with a decrease in cognitive function as measured by the MMSE (pooled PCC: −0.14, 95% CI: −0.18, −0.10). Similar results were derived for both plasma (pooled PCC: −0.15, 95% CI: −0.22, −0.09; Supplemental Figure 3) and CSF NfL measurements (pooled PCC: −0.14, 95% CI: −0.19, −0.08; Supplemental Figure 4).
Alzheimer's disease. Pooled PCCs between baseline NfL (any measurement) and MMSE in patients with AD are reported based on five model estimates from four studies (Supplemental Figure 5). The point estimates and confidence intervals indicate that higher NfL levels are significantly associated with a decrease in cognitive function as measured by the MMSE (pooled PCC: −0.16, 95% CI: −0.28, −0.04). By contrast, correlations derived for baseline CSF NfL and change in MMSE score were not statistically significant (pooled PCC: −0.13, 95% CI: −0.33, 0.07) (Supplemental Figure 6).
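As an illustration of how pooled estimates and confidence intervals of this kind can be obtained, the following is a minimal sketch of random-effects pooling of correlation coefficients. It is not the authors' code: the DerSimonian-Laird estimator, the Fisher z transformation, the approximate variance 1/(n − 3) (which ignores the number of adjusted covariates), and the function name `pool_correlations` are all assumptions made for illustration, since this section does not state the estimator used.

```python
import math

def pool_correlations(correlations, sample_sizes, z_crit=1.96):
    """Pool correlations with a DerSimonian-Laird random-effects model.

    Assumption: each correlation is analysed on the Fisher z scale with
    variance 1 / (n - 3); covariate adjustment is ignored for simplicity.
    Returns (pooled_r, ci_lower, ci_upper) back-transformed to the r scale.
    """
    z = [math.atanh(r) for r in correlations]
    v = [1.0 / (n - 3) for n in sample_sizes]
    w = [1.0 / vi for vi in v]

    # Fixed-effect estimate and Cochran's Q, needed to estimate tau^2.
    fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - fixed) ** 2 for wi, zi in zip(w, z))
    k = len(z)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add the between-study variance tau^2.
    w_re = [1.0 / (vi + tau2) for vi in v]
    pooled_z = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    lower, upper = pooled_z - z_crit * se, pooled_z + z_crit * se
    return math.tanh(pooled_z), math.tanh(lower), math.tanh(upper)

# Hypothetical inputs, not data from the included studies.
pooled, lower, upper = pool_correlations([-0.20, -0.20, -0.20], [100, 200, 300])
```

A confidence interval that excludes zero, as in the MCI analysis above, corresponds to a statistically significant pooled association; an interval that spans zero, as in the CSF NfL analysis in AD, does not.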
Alzheimer's disease assessment scale
Any population (mild cognitive impairment, Alzheimer's disease, or combined). Pooled PCCs between baseline NfL (any measurement) and ADAS-cog in any population are reported based on five model estimates from five studies (Figure 4). The point estimates and confidence intervals indicate that higher NfL levels are significantly associated with a decrease in cognitive function as measured by the ADAS-cog (pooled PCC: 0.19, 95% CI: 0.10, 0.27, I²: 79%). Similar results were derived for plasma NfL (pooled PCC: 0.20, 95% CI: 0.08, 0.31; Supplemental Figure 7).

Figure 4. Forest plot for the meta-analysis of the relationship between any baseline neurofilament light chainᵃ and change in Alzheimer's disease assessment scale score in any populationᵇ. ᵃIncluded models using any detection method (plasma, serum, or CSF) to assess baseline NfL. ᵇAny population includes studies with patients in MCI population, AD population, or combined (MCI and AD) population. CI, confidence interval; COR, correlation.
Clinical dementia rating
Any population (mild cognitive impairment, Alzheimer's disease, or combined). Pooled PCCs between baseline plasma NfL and CDR in any population are reported based on four model estimates from three studies (Figure 5). The point estimates and confidence intervals suggest that higher NfL levels are significantly associated with a decrease in global function as measured by the CDR (pooled PCC: 0.23, 95% CI: 0.12, 0.34, I²: 53%).

Figure 5. Forest plot for the meta-analysis of the relationship between baseline plasma neurofilament light chain and change in clinical dementia rating score in any population groupᵃ. ᵃAny population includes studies with patients in MCI population, AD population, or combined (MCI and AD) population. CI, confidence interval; COR, correlation.
Mild cognitive impairment. Pooled PCCs between baseline plasma NfL and CDR in patients with MCI are reported based on three model estimates from two studies (Supplemental Figure 8). The point estimates and confidence intervals suggest that higher NfL levels are significantly associated with a decrease in global function as measured by the CDR (pooled PCC: 0.25, 95% CI: 0.06, 0.44).
Alzheimer's disease neuroimaging initiative
Any population (mild cognitive impairment, Alzheimer's disease, or combined). Pooled PCCs between baseline plasma NfL and ADNI-S in any population are reported based on five model estimates from three studies (Figure 6). The point estimates and confidence intervals indicate that higher NfL levels are significantly associated with a decrease in cognitive function as measured by the ADNI-S score (pooled PCC: −0.18, 95% CI: −0.25, −0.11).

Figure 6. Forest plot for the meta-analysis of the relationship between baseline plasma neurofilament light chain measurement and change in the Alzheimer's disease neuroimaging initiative score in any population groupᵃ. ᵃAny population includes studies with patients in MCI population, AD population, or combined (MCI and AD) population.
Combined population (mild cognitive impairment and Alzheimer's disease). Pooled PCCs between baseline plasma NfL and ADNI-S in studies of combined AD and MCI populations are reported based on four model estimates from two studies (Supplemental Figure 9). The point estimates and confidence intervals indicate that higher NfL levels are significantly associated with a decrease in cognitive function as measured by the ADNI-S score (pooled PCC: −0.19, 95% CI: −0.29, −0.10).
Discussion
We report the results of a meta-analysis of nineteen unique publications providing a total of thirty-seven unique model estimates investigating the statistical correlation between baseline NfL levels and longitudinal measures of clinical change using MMSE, ADAS-cog, CDR, ADNI-S and ADCS-ADL in patients with MCI due to AD, AD dementia or populations with a combination of patients with MCI and AD.23,26,33,48,49,51–62,65
Given the heterogeneity in the reporting of results, a PCC was calculated for each unique model estimate. Due to a lack of available data permitting calculation of a PCC in two studies, thirty-one models from fifteen unique studies were included in the meta-analysis. The MMSE was the clinical assessment tool used most frequently, while there was insufficient evidence for ADCS-ADL to permit development of a meta-analytic model for this measure. Independent of the population group (MCI only, AD only, or combined MCI and AD) and assessment tools used (MMSE, ADAS-cog, CDR, ADNI-S) the results demonstrated consistent statistically significant correlations between baseline NfL levels and longitudinal measures of clinical function in most of the analyses, whereby increased levels of NfL at baseline were correlated with long-term cognitive and global clinical decline. Though NfL is a non-specific marker of neuronal damage, NfL levels in blood are consistently predictive of longitudinal change in clinical outcomes, supporting NfL as a blood biomarker with prognostic value.66–70 NfL levels are currently used to evaluate disease progression across a number of neurological conditions, including MS and spinal muscular atrophy.8–11,71 Moreover, the non-specific nature of NfL as a biomarker of neurodegeneration has not precluded consensus recommendations for its use alongside neuroimaging and clinical measures of disease progression as a decision-making tool for the management of MS patients in clinical practice. 30 Nonetheless, the present results further add to the epidemiological support linking NfL with clinical disease stage and clinical progression.18–20
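To illustrate how a PCC can be derived from heterogeneously reported regression results, the sketch below uses the commonly applied conversion from a coefficient's t-statistic and residual degrees of freedom. It is shown under the assumption that a t-statistic and degrees of freedom are available for each model; the exact conversion applied to each included model is not described in this section, and the function names and input values are hypothetical.

```python
import math

def pcc_from_t(t_stat: float, df: int) -> float:
    """Partial correlation from a regression t-statistic and residual df.

    Uses r = t / sqrt(t^2 + df); the sign of r follows the sign of t.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    return t_stat / math.sqrt(t_stat ** 2 + df)

def pcc_standard_error(pcc: float, df: int) -> float:
    """Approximate standard error of a partial correlation coefficient."""
    if df <= 0:
        raise ValueError("df must be positive")
    return math.sqrt((1.0 - pcc ** 2) / df)

# Hypothetical model: NfL coefficient with t = -2.5 and 200 residual df.
r = pcc_from_t(-2.5, 200)
se = pcc_standard_error(r, 200)
```

With these definitions the ratio of the PCC to its standard error reproduces the original t-statistic, so the conversion preserves the significance of the underlying model estimate while placing estimates from different model types on a common scale.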
In the meta-analyses conducted previously, the majority of included studies were cross-sectional and focused on the relationship between NfL levels in the blood with MMSE scores in MCI and AD populations at a single point in time.72,73 The present study provides further information by investigating the relationship between baseline NfL concentration and longitudinal change in clinical outcomes. We show that higher baseline NfL levels are associated with increased longitudinal clinical decline in MCI and AD populations. In addition to MMSE, the present study also investigated the correlations between baseline NfL and additional clinical assessment tools including CDR, ADAS-cog, and ADNI-S.
As dementia is characterized by clinical impairment across a combination of cognitive and functional domains, 2 one strength of this study is that in contrast to the previous meta-analyses that have been conducted using only MMSE, correlations between NfL and both cognitive and global assessments were included in the meta-analyses. There is a lack of general consensus regarding the most appropriate assessment tools for determining diagnosis and severity of AD. While several countries have guidelines that specify that the MMSE is recommended for screening dementia and evaluating the cognitive function of patients,74,75 most guidelines do not recommend a specific assessment tool and provide only a recommendation for brief clinical assessments in general. 4 Although the MMSE is the most frequently used tool in clinical practice,76,77 other tools such as CDR, CDR-SB, ADCS-ADL23, ADAS-Cog11, ADAS-Cog13, ADNI-MEM, and ADNI-EF measure different domains of clinical functioning (e.g., cognition, daily function, working memory, executive function).76–80 While these tools are less frequently used in clinical practice settings, these scales are commonly used in research settings as an outcome measure for clinical trials.76–80 Due to the lack of consensus regarding evaluation of clinical function in AD populations, this study was designed to include a variety of assessment tools that are often used either in clinical practice or in controlled trials for AD.
One limitation of the present analysis is the small number of studies available, particularly when stratified across different clinical outcomes and population groups. In the present analysis, only the MMSE had a sufficient sample size for synthesis as a standalone cognitive tool (i.e., it met the requirement to have at least two studies reporting at least three model results). Therefore, a decision was taken to combine results across modified versions or sub-scales of the different standardized assessment tools. Specifically, ADAS-Cog11 and ADAS-Cog13 were combined into ADAS-Cog, CDR global score and CDR-SB were combined into CDR, and ADNI-EF and ADNI-MEM were combined into ADNI-S. This permitted conduct of additional exploratory analyses. Previous research has looked at the correlation between various subscale variants in patients with MCI and AD. Using the ADAS-Cog subscale variants, Podhorna and colleagues found that the signal-to-noise ratio did not differ substantially among ADAS-Cog variants, and thus concluded that the addition or reduction of items to the original ADAS-Cog test added limited value, particularly in MCI. 81 Similarly, O’Bryant and colleagues found that CDR-SB and CDR global score compared well for staging patients with AD, although they noted that the CDR-SB had better utility in tracking changes in dementia severity. 82 Therefore, by combining results the present analysis allowed for investigation of the impact across tools assessing different domains of clinical function. Given that the present results were directionally consistent across all categories of assessment, there is likely a general alignment across all measured domains of cognitive and global function.
Another limitation in the present analysis arises from potential discrepancies between the various assessment tools used to determine clinical staging across the AD continuum between the various databases, as well as the inclusion criteria for participants in each of the included studies. Assessment tools used to evaluate dementia vary in sensitivity and specificity, as well as domains assessed (e.g., cognition, function, behavior) and cut-off scores used to assess clinical staging (MCI, mild AD, moderate AD, severe AD).83,84 Due to the nature of AD as a continuum, and a lack of consensus in how to categorize the clinical stages of AD across various tools and criteria, it is likely there was some overlap between patients categorized as cognitively impaired with MCI and AD diagnoses in the databases from which the data were sourced for the studies included in this analysis. In an attempt to mitigate this limitation, analyses were conducted in separate groups (i.e., MCI only, AD only), in a combined group (i.e., studies that contained both MCI and AD), and in any population (i.e., comprised of MCI only, AD only, and combined groups). The results were consistent across all approaches to subgrouping.
Among the reported statistical estimates, it was noted that there was a fair amount of heterogeneity, particularly for the ADAS and CDR outcomes where the I² was >50% for all estimates. Given that the meta-analysis was based on observational study designs that used different statistical modeling approaches (e.g., multivariable regression, mixed modeling, non-parametric correlations), different covariate sets, and different model specifications, this heterogeneity was not unexpected. Despite this heterogeneity, it was noted that the reported statistical estimates were directionally consistent across a variety of assessment tools and population groups. Notably, of the 31 models, 30 reported point estimates consistent with higher NfL levels being associated with greater clinical decline.
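For readers less familiar with the heterogeneity statistic discussed above, the following minimal sketch shows how I² is conventionally computed from Cochran's Q. The function names and input values are hypothetical and are not taken from the included studies.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))

def i_squared(q: float, k: int) -> float:
    """I^2 as a percentage: share of total variation due to heterogeneity.

    k is the number of estimates; I^2 is truncated at zero when Q < k - 1.
    """
    if k < 2:
        raise ValueError("at least two estimates are required")
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100.0

# Hypothetical example: two estimates with equal variances.
q = cochran_q([0.10, 0.30], [0.01, 0.01])
i2 = i_squared(q, k=2)
```

In this hypothetical example Q equals 2 with one degree of freedom, giving an I² of 50%, the threshold referred to in the paragraph above.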
In the present study, most of the included estimates used plasma measurements to assess baseline NfL. This is likely due to the fact that plasma NfL is considered to be more clinically suitable and a more feasible option to measure changes in AD at the population level compared to CSF NfL.85,86 A systematic review and meta-analysis evaluating the correlation between CSF and blood NfL reported that, because NfL levels in blood are significantly lower than in CSF, care should be taken when choosing an assay, as assays differ in analytic sensitivity and therefore in their applicability to CSF and plasma measurement; it is suggested that Simoa should be used in preference to other assays (e.g., ELISA) when assessing blood NfL, as it provides more accurate measurements. 85 The same study demonstrated that while blood and CSF NfL concentrations correlate moderately with each other, the correlation is not perfect, and the authors advise that the limitations of the detection assay used should be taken into account when interpreting blood NfL concentrations. 85 Together these results suggest that there may be differences between the measurement of blood and CSF NfL, as well as in their association with other measures, and that this should be taken into account when combining plasma and CSF into meta-analytic models. In the present study, these limitations were minimized by conducting separate analyses for plasma and CSF NfL in addition to analyses that combined models with blood (plasma and serum) and CSF NfL.
Most of the included studies were observational utilizing data from large databases inclusive of patients with MCI due to AD and AD dementia (e.g., ADNI, BioFINDER, AIBL). Given that data collection was not standardized across databases and that multiple NfL detection methods were used (i.e., plasma, serum, and CSF), it is likely that the included studies utilized a variety of different commercially available assays and analytical platforms to quantify NfL levels. In a comparative study between two commercially available assays for measuring plasma NfL, it was found that the enzyme-linked lectin assay (ELLA) measured higher NfL levels than the single-molecule array (Simoa) technology and that there was also more variation in the distribution of NfL values with ELLA compared to Simoa. 87 Despite this, plasma NfL levels for the study population were strongly correlated between the two assays (r = 0.94, p < 0.0001). 87 Another comparative study evaluating three different analytic platforms for the quantification of NfL in serum and CSF samples found similar results. 88 Levels of NfL in CSF measured using a conventional ELISA, electrochemiluminescence (ECL)-based assay, and Simoa technology were highly correlated (r = 1.0, p < 0.001 for all pair-wise comparisons). 88 Serum NfL measurements were also found to be highly correlated between the ECL assay and Simoa (r = 0.86, p < 0.001), 88 with smaller correlations found between ELISA and ECL assays (r = 0.41, p = 0.018) and between ELISA and Simoa assays (r = 0.43, p = 0.013). 88 Overall, the authors concluded that blood NfL can be measured reproducibly using different assay platforms with sufficient sensitivity, further supporting the use of blood NfL measurements as a biomarker of neuronal damage. 88 While it is possible that combining different assays for NfL within meta-analyses may introduce a potential limitation, there is evidence to suggest that there is a high degree of correlation between the various measurement platforms.
Early diagnosis of AD is desirable as it allows for better disease monitoring and management. 89 Biomarker testing has the potential to allow for earlier and more accurate diagnosis in primary care settings but is currently only recommended for use in specialized clinics.7,89 For NfL to be considered an acceptable biomarker for use in primary care settings, there is a need for the development of consensus guidelines for standardized measurement techniques and age-specific cut-off values for both MCI and AD populations. 11 Incorporation of blood-based NfL biomarker testing as part of the routine diagnostic work-up would have significant impact on the diagnosis of early neurodegeneration, and could allow for an earlier diagnosis of AD to be made at the MCI stage. 2 Moreover, as blood-based biomarker testing is more cost effective than PET imaging, this could have a positive impact on the economic burden to the healthcare system. Given the fact that a strong public health response has the potential to mitigate the future impact of AD, the Lancet Neurology Commission has acknowledged a call for a significantly larger investment in AD treatment research and a broad public health approach towards disease prevention.90,91
Conclusion
Overall, the results of the meta-analyses show that higher baseline NfL concentration was associated with greater longitudinal decline in cognitive and global clinical function, as measured by MMSE, ADAS-cog, CDR, and ADNI-S scales. This pattern was observed in models with any population of AD (MCI, AD or combined) as well as across various subgroups (MCI population only, AD population only, and combined MCI and AD population). Point estimates and confidence intervals for plasma NfL implied a range of correlations (in absolute value) of 0.06 to 0.44. The results of the present meta-analyses together with the evidence from the donanemab phase 2 RCT demonstrate that a beneficial treatment effect on NfL levels is reasonably likely to predict clinical benefit in patients with MCI due to AD or AD dementia and provide supportive evidence for the use of NfL as a surrogate endpoint in AD clinical trials. Prospective RCTs directly evaluating the treatment effects of targeted AD therapies, particularly treatments targeting tau aggregation pathology, on NfL levels and clinical outcomes are required to further validate NfL as a surrogate endpoint in MCI due to AD and AD dementia. Consensus guidelines are also required to standardize the use of NfL measurements.
Supplemental Material
Supplemental material (sj-docx-1-alr-10.1177_25424823251379878, sj-docx-2-alr-10.1177_25424823251379878, and sj-docx-3-alr-10.1177_25424823251379878) for Neurofilament light (NfL) chain levels predict clinical decline in Alzheimer's disease: A systematic review and meta-analysis by Kim Thomas, Paul Spin, Nikita Sir, Kevin Hou, Nicholas J Ashton, Henrik Zetterberg, Sonya Miller, Carol Pringle, Richard Stefanacci, Claude M Wischik and Serge Gauthier in Journal of Alzheimer's Disease Reports
Footnotes
Acknowledgements
The authors are grateful for the contributions of Joanna Bielecki and Becky Hooper of EVERSANA for their assistance with development of the search strategy, PRESS checklist review, and conducting of the database searches. HZ is a Wallenberg Scholar and a Distinguished Professor at the Swedish Research Council supported by grants from the Swedish Research Council (#2023-00356; #2022-01018 and #2019-02397), the European Union's Horizon Europe research and innovation programme under grant agreement No 101053962, Swedish State Support for Clinical Research (#ALFGBG-71320), the Alzheimer Drug Discovery Foundation (ADDF), USA (#201809-2016862), the AD Strategic Fund and the Alzheimer's Association (#ADSF-21-831376-C, #ADSF-21-831381-C, #ADSF-21-831377-C, and #ADSF-24-1284328-C), the European Partnership on Metrology, co-financed from the European Union's Horizon Europe Research and Innovation Programme and by the Participating States (NEuroBioStand, #22HLT07), the Bluefield Project, Cure Alzheimer's Fund, the Olav Thon Foundation, the Erling-Persson Family Foundation, Familjen Rönströms Stiftelse, Stiftelsen för Gamla Tjänarinnor, Hjärnfonden, Sweden (#FO2022-0270), the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 860197 (MIRIADE), the European Union Joint Programme – Neurodegenerative Disease Research (JPND2021-00694), the National Institute for Health and Care Research University College London Hospitals Biomedical Research Centre, and the UK Dementia Research Institute at UCL (UKDRI-1003).
Ethical considerations
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Author contribution(s)
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by TauRx Therapeutics Ltd.
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Kimberly Thomas, Paul Spin, Nikita Sir, and Kevin Hou are employees of EVERSANA™. EVERSANA receives consultancy fees from pharmaceutical and device companies, including TauRx Therapeutics Ltd. Claude Wischik and Richard Stefanacci are salaried employees of TauRx Therapeutics Ltd. Tay Siew Choon is the deputy chairman of TauRx Therapeutics Ltd. Henrik Zetterberg has served at scientific advisory boards and/or as a consultant for Abbvie, Acumen, Alector, Alzinova, ALZPath, Amylyx, Annexon, Apellis, Artery Therapeutics, AZTherapies, Cognito Therapeutics, CogRx, Denali, Eisai, LabCorp, Merry Life, Nervgen, Novo Nordisk, Optoceutics, Passage Bio, Pinteon Therapeutics, Prothena, Red Abbey Labs, reMYND, Roche, Samumed, Siemens Healthineers, Triplet Therapeutics, and Wave, has given lectures in symposia sponsored by Alzecure, Biogen, Cellectricon, Fujirebio, Lilly, Novo Nordisk, and Roche, and is a co-founder of Brain Biomarker Solutions in Gothenburg AB (BBS), which is a part of the GU Ventures Incubator Program (outside submitted work). Serge Gauthier has provided scientific advice to AbbVie, ADvantage, Alzheon, AmyriAD, Eisai, Lilly, Meilleur Technologies, Novo Nordisk, Otsuka, and TauRx. We acknowledge Claire Hull of TauRx for administrative assistance in submitting the manuscript on behalf of the authors. The listed authors confirm that they have authorized Claire Hull to submit the manuscript on their behalf and have approved all statements and declarations, including disclosures of conflicting interests and funding.
Data availability statement
The data that support the findings of this work are publicly available.
Supplemental material
Supplemental material for this article is available online.
References
