Abstract
Brain aging leads to difficulties in functional independence. Mitigating these difficulties can benefit from technology that predicts, monitors, and modifies brain aging. Translational research prioritizes solutions that can be causally linked to specific pathophysiologies while also demonstrating improvements in impactful real-world outcome measures. This poses a challenge for brain aging technology, which needs to address the tension between mechanism-driven precision and clinical relevance. In this opinion paper, by synthesizing emerging mechanistic, translational, and clinical research frameworks, together with our own development of technology-driven brain aging research, we suggest incorporating four desiderata of explainability (causality, informativeness, transferability, and fairness) into early-stage research that designs and tests brain aging technology. As an illustration of our proposed approach, we draw on a series of our studies of electrocardiography-based “peripheral” neuroplasticity markers. We believe this novel approach will promote the development and adoption of brain aging technology that links and addresses brain pathophysiology and functional independence in the field of translational research.
CHALLENGES AND OPPORTUNITIES IN ADDRESSING THE TENSION BETWEEN MECHANISM-DRIVEN PRECISION AND CLINICAL RELEVANCE IN TRANSLATIONAL RESEARCH ON BRAIN AGING
Intervening in brain aging to maintain or improve functional independence is among the most important goals for translational research on brain aging, and relies on understanding 1) the proposed causal biological pathways that lead to brain aging disorders in a sensitive (the disorder always acts via the pathway) and specific (the pathway does not lead to other disorders and is not present in healthy individuals) manner and 2) how these biological changes result in cognitive and functional deficits. Unfortunately, the two aspects of this goal are often in tension, as highlighted by the recent controversy regarding the FDA approval of aducanumab, a drug that showed success in reducing amyloid-β plaques [1] (a biomarker thought to be part of the causal mechanism of Alzheimer’s disease; AD) in largely white, well-educated participants without clear evidence of downstream effects on clinical outcomes of interest (e.g., cognitive decline [2]). This can be contrasted with research on physical exercise interventions, which show broad improvements in cognition across multiple brain aging disorders, often via mechanisms not specific to brain aging [3]. For example, exercise has been shown to lower chronic inflammation, which is a risk factor for AD as well as for a host of other physiological and neurological disorders, making this mechanism alone non-specific to AD. This divide reflects a broader disagreement in the human brain aging field over whether AD should be predominantly diagnosed according to biological markers thought to be involved in the causal process that leads to AD [4] or according to clinical outcomes, as slowing functional decline is the ultimate goal in translational AD research [5].
This tension between mechanism-driven precision and clinical relevance is partially driven by difficulties in studying brain aging in humans: early-stage research that is used to establish internal validity (i.e., establish specific and sensitive causal links to pathophysiology) uses expensive measures in controlled settings and is challenging to translate, or more specifically to scale, into research with humans in the real world, who are the target of clinical tools. This has led to a disconnect between basic research studying the biological bases of brain aging and clinical research attempting to understand and improve real-world functions influenced by brain aging in humans.
Technology-driven tools used in the human brain aging field can be generally categorized into 1) brain imaging measures for understanding temporal and spatial domains of brain pathophysiology, 2) brain modulation devices for modifying brain integrity via various pathways, and 3) digital biomarkers indirectly measuring neurocognitive status (see Table 1). Compared to traditional clinical tools, these technology-driven tools can leverage autonomous, consistent data acquisition and artificial intelligence (AI) to clarify the linkage between brain and behavior and to aid in allocating resources to support independence [6]. Therefore, these tools have the potential to improve 1) prediction of brain aging, characterizing who is at risk earlier and with more precision, 2) monitoring of brain aging via downstream indicators and risk factors to know when resources are needed, and 3) modification of brain aging, establishing how to slow or prevent brain aging in individuals. However, to promote the development of brain aging-related technology-driven tools with both clinical relevance and mechanism-driven precision, we need to acknowledge that: 1) brain aging is highly complex, with clear links between proposed disease mechanisms and clinical outcomes remaining elusive, 2) clinical outcomes are highly heterogeneous, meaning generalizability from small, homogeneous samples is especially poor, and 3) methods required to establish internal validity are costly and lack scalability. The field of research aiming to develop technologies to predict, monitor, and modify brain aging is in its infancy, providing a key opportunity to improve research guidance and avoid the same mistakes and fractionation as in the broader AD literature, particularly given that many technology developers may not appreciate the specific requirements of clinical research on brain aging.
Table 1. Literature on technology-driven human brain aging research
DTI, diffusion tensor imaging; EEG, electroencephalography; fNIRS, functional near-infrared spectroscopy; MEG, magnetoencephalography; MRI, magnetic resonance imaging; PET, positron emission tomography; tDCS, transcranial direct current stimulation; TMS, transcranial magnetic stimulation.
In the current opinion paper, we emphasize a multi-dimensional understanding of explainability to guide and appraise the development of technology used in clinical research on brain aging. By leveraging our appreciation of emerging clinical research frameworks aimed at uncovering causal mechanisms (e.g., precision medicine [7]) or at intervention development (e.g., the NIH stage model [8], pragmatic trial design principles [9]), we suggest that incorporating multiple desiderata of explainability into the earliest stage of technology development and testing will produce tools with both clinical relevance and mechanism-driven precision. Our goal is to promote the development and adoption of brain aging technology that links and addresses brain pathophysiology and functional independence.
THE ROLE OF EXPLAINABILITY IN UNDERSTANDING TENSIONS IN TECHNOLOGY-DRIVEN BRAIN AGING RESEARCH
Explainability (or interpretability) refers to the extent to which the reasons why an algorithm arrived at a particular decision can be understood by humans. There is often a trade-off between predictive power and explainability: allowing AI algorithms to find the best models for predicting outcomes from input data, without limiting their complexity or requiring their workings to be fully transparent to their human users, often leads to improvements in their predictive power. This trade-off means that it is tempting, particularly for the developers of AI, to sacrifice explainability to maximize predictive power. However, while explainable algorithms may be more limited in predictive accuracy in the short term, there are long-term benefits to developing them. Notably, in clinical research, knowing which features drive predictions is critical to improving knowledge, detecting biases, facilitating social interactions, and meeting clinical standards [10]. Algorithms lacking explainability have the potential to generate predictions based on aspects of their input data that are either non-transferable or ethically problematic. For example, Watson’s AI for oncology [11] showed good accuracy in controlled environments but made incorrect or even dangerous decisions when attempts were made to expand its usage into real-world settings, where the data showed complex interactions that were not present during training. Knowing which features drove its predictions could have avoided this issue. Additionally, these issues cause a lack of trust among medical professionals, who are required to communicate decisions to patients, and among patients themselves: improving explainability can improve trust among stakeholders throughout the development process, improving adoption and increasing the clinical impact of technology.
Explainability can be facilitated using inherently explainable models (e.g., transparent algorithms) or by applying post-hoc approaches to reverse-engineer solutions from more complicated algorithms (see Box 1). In translational research, these methods have to be embedded within theories of disease action to have a real impact. Understanding how different neurodegenerative diseases impact behavior in sensitive and specific ways is at the core of translational brain aging research (e.g., [4]). Thus, regulatory bodies and clinicians often prioritize explainable solutions that fit within our understanding of specific diseases when deciding on tools to develop and adopt. In neuroscience and brain aging research, in addition to the input (e.g., neuroimages) and the output (e.g., the disease diagnosis), several other variables are involved, including demographics, behavioral and cognitive test scores, and genetic information, and a combination of inherent and post-hoc algorithms is needed to generate appropriate and reliable explanations that link to biological pathways. For example, it is necessary to demonstrate that predictions are driven by neuroimaging signals reflecting causal biological pathways, not by biases in the demographics of the training data. Ensuring that techniques provide explainable solutions, therefore, requires additional work on the part of researchers to demonstrate not only that their algorithm is performing accurately, but also that the features driving predictions can be uncovered and explained.
Box 1. Getting explainable solutions from machine learning
To provide a concrete example, AD diagnosis can be predicted with 70–90% accuracy on the basis of structural connectivity [12]. Follow-up analyses reveal which specific brain regions are driving predictions and visualize them in a manner comparable to traditional brain imaging approaches. Doing so highlights that predictions are driven predominantly by regions known to accumulate, early in the disease, proteins hypothesized to cause AD [4]. A similar analysis could be performed in the context of novel deep learning algorithms [13], using both inherent and post-hoc explanations to interpret the effect of sex, age, and other external variables on the features and predictive power of the model, to ensure that these factors, which lack mechanism-driven precision, are not driving predictions [14]. This provides confidence both in the links between AD pathology and structural connectivity and in the algorithm itself. This information is helpful to researchers, clinicians, and regulatory bodies making decisions about the development, approval, and adoption of similar technologies.
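As a rough illustration of the kind of post-hoc check described in this box, the sketch below uses permutation importance: each feature is shuffled in turn, and the resulting drop in accuracy indicates how much the model relies on it. This is a minimal sketch under stated assumptions, not the method used in the cited studies. The feature names (`connectivity`, `age`), the toy data, and `toy_model` are hypothetical and chosen only to show the logic.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the correct label."""
    correct = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return correct / len(rows)

def permutation_importance(model, rows, labels, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the feature under test.
        permuted = [dict(row, **{feature: v}) for row, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats

# Hypothetical toy data: one imaging-derived feature and one demographic feature.
data_rng = random.Random(1)
rows = [{"connectivity": data_rng.random(), "age": data_rng.uniform(60, 90)}
        for _ in range(200)]
# In this toy example the label depends on connectivity only (an assumption).
labels = [int(row["connectivity"] < 0.5) for row in rows]

def toy_model(row):
    """Stand-in for a trained classifier; it ignores age by construction."""
    return int(row["connectivity"] < 0.5)

importance = {
    feature: permutation_importance(toy_model, rows, labels, feature)
    for feature in ("connectivity", "age")
}
```

In this toy setting, a large importance for `connectivity` and a near-zero importance for `age` would be consistent with predictions being driven by the imaging feature rather than by the demographic variable. With a real model, a non-trivial importance for a demographic variable would be a prompt for further scrutiny rather than proof of bias.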
IMPROVING EXPLAINABILITY AS A MEANS OF DEVELOPING IMPROVED BRAIN AGING TECHNOLOGY
Importantly, explainability is a multi-dimensional construct. While the emergence of techniques such as those mentioned above improves the explainability of findings identified using AI, there is no systematic way of quantifying explainability or of determining whether explanations are sufficient to specify the relationships between biological pathways of brain aging and cognitive and functional decline. For example, a cognitive neuroscience explanation of where in the brain [15] deficits of structural connectivity are most strongly related to cognitive decline may not be sufficient for a clinical evaluation of causality, which is essential for knowing, for example, whether modifying structural connectivity would lead to improvement in cognitive function. The current tensions between mechanism-driven precision and clinical relevance in brain aging research can be understood as a mismatch in which aspects of explainability are being prioritized by stakeholders in the field [4, 5]. Lipton [16] proposes four desiderata (causality, informativeness, transferability, and fairness) that are critical for understanding this tension (see Table 2). Conflicts arising from mismatched explainability priorities are almost inevitable because developing tools that meet all desiderata for explainability is particularly challenging in the field of brain aging: the types of data prioritized at different stages of research differ in their ability to meet different desiderata. This means that even when tools meet one or two explainability desiderata at one stage of research, the other desiderata are likely to be missed.
Table 2. Overview of explainability desiderata in brain aging research
Early-stage research in controlled environments can establish causal and informative measures of brain aging pathophysiology, and link these to laboratory measures of behavior, using techniques that acquire signals directly from the brain. The most common type of causal research uses animals, leveraging the ability to intervene directly on the brain. This capacity allows researchers to know how the specific mechanism that they are altering causally affects 1) pathophysiology and 2) behavior, and to link these two core aspects of clinical brain aging research. However, transferring mechanisms from animals to humans, as well as from laboratory to clinical settings, is often challenging, particularly when comparing laboratory measures with the complex interplay of real-world cognitive, social, and emotional experiences involved in clinically relevant real-world behavior. Limits on both the scalability of the measures and the generalizability of the findings lead to a lack of translation between studies [17]. Intervening directly on the brain in humans is more challenging and only allowable in rare cases. This makes causal links between proposed mechanisms and 1) pathophysiology and 2) behavior much more difficult to establish in humans. PET imaging can be used in early-stage research to identify potential mechanisms that cause brain aging or link brain aging with behavioral decline in humans; however, PET is expensive and requires the injection of tracers, resulting in small, demographically homogeneous samples that are currently challenging to combine [18], limiting transferability and fairness. Genetic research can also establish potential causal pathways by identifying biological risk factors for specific brain aging disorders, but endophenotypes (e.g., measures from neuroimaging) are often required to link these informatively to clinically relevant behavior [19].
Whether the mechanisms identified in early-stage research causally link pathophysiology and behavior in humans, therefore, needs to be established using progressive steps that steadily translate findings from controlled to real-world settings. In clinical research, this is done most commonly using randomized controlled trials (RCTs), first in research settings, then in clinical settings, then in the real world. However, to be informative and maintain causal mechanistic links to pathophysiology, these studies require the use of neuroimaging data, including PET, EEG, and MRI, that can provide insight into whether changes seen in the behavioral outcomes are being modified via the proposed causal biological pathways. A recent perspective [20] proposed a causality continuum by which different neuroimaging studies can be evaluated, and highlighted that studies that combine multiple neuroimaging modalities (including those that involve experimental manipulation of the brain, e.g., brain stimulation) and can demonstrate “coherence” of findings across modalities provide the best evidence for causal mechanisms. These requirements to demonstrate causality place a large burden on RCTs in the field of brain aging, and the samples are often limited in size or demographic heterogeneity, as neuroimaging measures are expensive, particularly when they need to be collected over multiple timepoints as in an RCT design. This can also lead to type II errors due to insufficient power, making this type of research high-risk and leading many neuroimaging investigators to prioritize research in large, publicly available datasets using methods that are lower on the causality continuum (e.g., resting state fMRI). Additionally, behavioral measurements in this type of research are commonly limited to laboratory measures (e.g., [21]) that do not generalize in clear ways to real-world behavior [22].
On the other hand, research in real-world settings can leverage autonomous, continuous behavioral data collection (e.g., from wearable devices, smartphone applications, sensors [23]) to develop transferable predictions that are more easily testable for fairness [24], but it is not able to provide informative insight into the causal biological pathways that these mechanisms are involved in at the level of the brain.
To overcome this disconnect, we believe that technology developed in early-stage research needs to be appraised not only on whether it is able to establish causality and informativeness, but also on whether it has the potential to demonstrate that its predictions/decisions are transferable and fair when it moves into later-stage research. On the other hand, researchers developing technology aimed at real-world settings need to be made aware of the importance of causality and informativeness (in addition to transferability and fairness), and should attempt to link the improvements seen in clinical outcomes to our understanding of the biological mechanisms of brain aging. Researchers should use this general checklist to appraise the explainability of their technology, and modify their current and future research so that it meets, or has the potential to meet, all four desiderata of explainability (see Table 2). We believe that these additional checks on whether research using brain aging technology has the capacity to develop explainable tools across all four desiderata will help prioritize funding for higher-impact research, allowing more researchers to utilize clinically robust research designs. For example, research using technology within an RCT that 1) causally intervenes on mechanisms with known pathways linking them to pathophysiology (informed by animal models or human PET imaging), 2) includes multimodal neuroimaging markers to inform how the mechanism acts via these causal pathways, and 3) uses scalable measures of behavior and indicators of neural mechanisms that can be expanded to real-world research to test the transferability and fairness of these mechanisms would score highly and should be prioritized.
INCORPORATING MULTIPLE DESIDERATA OF EXPLAINABILITY INTO EARLY-STAGE TECHNOLOGY DEVELOPMENT AND TESTING IN BRAIN AGING RESEARCH
We suggest that early-stage research utilize technology that can be scaled up to demonstrate transferability and fairness, enabling real-world research to be linked causally and informatively to the mechanisms of brain aging. Using the NIH stage model framework [8] as an example (see Table 3), the early stages refer to identifying components/mechanisms on which the technology will be based (Stage 0), developing technology (Stage 1), and establishing efficacy in the laboratory (Stage 2), as opposed to the later stages of establishing efficacy in clinical settings (Stage 3), showing effectiveness in the real world (Stage 4), or disseminating into the real world (Stage 5). Accordingly, several practical strategies can be applied during early-stage research: 1) research should be guided by a clear clinical premise, aimed at predicting, monitoring, or modifying a specific and sensitive brain aging pathway informed by Stage 0 research, and 2) attempts should be made to incorporate real-world measures into early-stage research, to identify coherence between data modalities that can establish internal validity (e.g., neuroimaging) and those that can be used to establish external validity (e.g., wearable devices). Using the four desiderata of explainability can help researchers to judge whether their research is fostering, or will be able to foster, explainability that can be embedded into theories of disease action: can features driving predictions be causally and informatively linked to both pathophysiology and behavior that is real-world relevant, and are they transferable and fair? In early stages, it is appropriate to prioritize causality and informativeness; however, researchers need to consider whether it will be feasible to ascertain the transferability and fairness of their technology at later stages, and incorporating scalable data modalities into early-stage research is the best way of ensuring this.
Features based on these scalable modalities can then be moved through to late-stage research, knowing that they have demonstrable links to pathophysiology. Similarly, researchers who have identified transferable and fair behavioral markers should work on building these back into early-stage research to determine causal and informative links to pathology.
Table 3. Proposed research stages (Note: there is significant feed-forward and feed-back between stages, and research does not necessarily have to follow the stages precisely)
Model predictions should be clearly linked to the clinical features that drive them, facilitated by the use of inherently explainable algorithms or post-hoc approaches to reverse-engineering explanations [10]. Research designs should prioritize maximizing explainability desiderata: e.g., using RCTs and referring to the causality continuum in neuroimaging studies [20] to maximize causality; basing hypotheses on, and attempting to interpret, potential biological mechanisms to improve informativeness; using training, validation, and test methods in large, diverse samples with real-world measures to improve transferability; and collecting demographically diverse samples and specifically accounting for potential socioeconomic or ethnic biases to ensure fairness. Meeting all desiderata of explainability would benefit from collaborative research between experts from multiple fields within brain aging research. Animal models, for example, are essential for Stage 0 research to identify novel candidate mechanisms on which technology can be built. Clinical scientists and psychiatrists are best able to design and implement RCTs in order to establish the causal basis for these technologies in difficult-to-study patient groups, and neuroscientists will be needed to maximize the informativeness of the mechanisms that the technology leverages. This will require a systems neuroscience approach, building teams with expertise at different levels of understanding of the same biological pathways to establish coherence between the different levels of measurement that are required for the four desiderata of explainability. Additionally, ethicists and social scientists will be needed to ensure that studies testing transferability and fairness take into account all potential sources of bias.
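One simple way to operationalize the fairness check described above is to compare predictive accuracy across demographic subgroups on held-out data. The following is a minimal sketch under stated assumptions: the predictions, labels, and group codes ("A", "B") are hypothetical, and accuracy parity is only one of several possible fairness criteria.

```python
from collections import defaultdict

def subgroup_accuracy(predictions, labels, groups):
    """Accuracy computed separately within each demographic subgroup."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for pred, label, group in zip(predictions, labels, groups):
        totals[group] += 1
        correct[group] += int(pred == label)
    return {group: correct[group] / totals[group] for group in totals}

def max_accuracy_gap(per_group):
    """Largest difference in accuracy between any two subgroups."""
    return max(per_group.values()) - min(per_group.values())

# Hypothetical held-out test-set predictions, true labels, and group membership.
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
labels      = [1, 0, 1, 0, 0, 1, 1, 1]
groups      = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = subgroup_accuracy(predictions, labels, groups)
gap = max_accuracy_gap(per_group)
```

A large gap between subgroups would suggest that the tool may not transfer equally well to all populations. What counts as an acceptable gap is a judgment that depends on the clinical context and is not settled by this calculation.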
To facilitate this, researchers should also strive to meet the FAIR guiding principles for scientific data management and stewardship [25], allowing researchers to foster explainability collaboratively, adding causal and informative links to real-world biomarkers, and developing scalable markers to help establish the transferability and fairness of behavioral mechanisms linked to known indicators of pathology.
EXAMPLE RESEARCH: TECHNOLOGY-MEASURED ADAPTATION CAPACITY AS A TARGET FOR COGNITIVE TRAINING
To provide a practical example of incorporating explainability into the early stages of designing and testing a technology-driven brain aging tool, we outline, using research from our lab, how research into a specific brain aging process (adaptation capacity) can be leveraged to develop technology to modify brain aging with comprehensive explainability (see Table 2). We highlight how we believe our approach helps to improve the clinical relevance of this particular mechanism, while maintaining mechanism-driven precision.
Early-stage research (e.g., mechanistic studies from Stage 0) has established that the connectivity of specific regions of the brain, particularly those in the cingulate cortex, is maintained in older adults with superior memory [26], and that this connectivity may confer protection against AD pathology [27]. Further Stage 0 research has shown that autonomic nervous system (ANS) function relates to cingulate cortex function [28], and that ANS responses to cognitive training are important for determining whether individuals show improvement following the intervention [29]. This establishes a biological pathway via which ANS responses to cognitive training may intervene on brain networks known to be important in AD, allowing technology developed to leverage this pathway to have mechanism-driven precision, if future research can establish how this pathway modifies AD pathophysiology. Importantly, measurement of the ANS using ECG is highly scalable and can be done with wearable devices, giving this potential intervention high clinical relevance if it can be leveraged effectively to improve cognitive training outcomes.
This research, therefore, inspired intervention development and pilot testing research (Stage 1) to test whether cognitive training can improve ANS function via this shared mechanism involving the cingulate cortex [30], and whether improving ANS function can further strengthen the effect of cognitive training in cognitive aging [31]. Separately, in a Stage 2 trial, we also revealed that selected patterns of ANS function can predict neuroplasticity in the cingulate cortex following training [32]. This study used an inherently explainable machine-learning approach called shapelet analysis with a limited number of theory-derived features to provide clear explanations for predictions that could be integrated with findings from Stage 0 research to develop more comprehensive explainability. This knowledge can feed back into the development of new tools and cognitive training paradigms: e.g., early-stage research directly monitoring and modifying adaptation capacity in individualized cognitive training [33]. Additional research is required to understand the precise mechanism by which ANS function relates to AD pathophysiology and how interventions that act on adaptation capacity might causally alter AD pathophysiology, potentially using an RCT design alongside multimodal neuroimaging. The fact that ANS measures are scalable allows for later-stage research, potentially leveraging app-based cognitive training approaches alongside wearable ANS measures, to determine the transferability and fairness of these biomarkers. Attention will need to be paid to the fact that some wearable ANS measures are known to be less accurate in individuals with darker skin tones [24], to ensure this biomarker is effective in ethnic minority populations and does not widen healthcare outcome disparities.
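To give a sense of why shapelet-based features are readily interpretable, the sketch below computes the standard shapelet distance: the minimum Euclidean distance between a short template (the shapelet) and any equal-length window of a time series. This is a generic illustration under stated assumptions, not the pipeline used in [32]. The template and the two traces are hypothetical values in arbitrary units.

```python
import math

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    equal-length contiguous window of the series."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best

# Hypothetical template: a brief rise and recovery in an ANS-derived signal.
shapelet = [0.0, 1.0, 0.0]

# Hypothetical traces from two participants (arbitrary units).
trace_with_pattern = [0.1, 0.0, 1.0, 0.0, 0.1]
trace_without_pattern = [0.1, 0.1, 0.1, 0.1, 0.1]

d_with = shapelet_distance(trace_with_pattern, shapelet)
d_without = shapelet_distance(trace_without_pattern, shapelet)
```

A small distance means the pattern occurs somewhere in the trace, so a classifier built on such distances can be explained by pointing to the matching segment of the recording. That is the property that makes the approach attractive when predictions must be related back to physiology.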
If successful, research based on theories surrounding adaptation capacity could result in scalable technology to modify brain aging that meets all desiderata for explainability: cognitive training that improves cognition based on mechanisms with 1) causal, informative links to the resistance to AD pathology and 2) transferability and fairness that ensure this technology is equally successful in the real world in all individuals. If research is carried out according to this framework, this technology would have both mechanism-driven precision (an understanding of the biological pathways via which it causally alters AD pathophysiology) and clinical relevance (fair and transferable effects that improve cognitive function in the real world in older adults at risk for AD).
CONCLUSION
We urge the field of translational research on brain aging to recognize the tension between mechanism-driven precision and clinical relevance in existing research trying to predict, monitor, and modify brain aging and associated functional outcomes. Technology-driven tools have the potential to help link these goals, which have become disconnected in the broader AD literature, but only if individuals developing technology are aware of, and take active steps to mitigate, this tension. Utilizing the four desiderata of explainability to judge the potential of clinical technology early in the design process can help to identify explainability issues that would otherwise emerge later, and may therefore help promote the dissemination of cost-effective technology that promotes successful brain aging.
