Abstract
Background:
Comprehensive treatment of Alzheimer’s disease and related dementias (ADRD) requires not only pharmacologic treatment but also management of existing medical conditions and lifestyle modifications including diet, cognitive training, and exercise. Personalized, multimodal therapies are needed to best prevent and treat Alzheimer’s disease (AD).
Objective:
The Coaching for Cognition in Alzheimer’s (COCOA) trial was a prospective randomized controlled trial to test the hypothesis that a remotely coached multimodal lifestyle intervention would improve early-stage AD.
Methods:
Participants with early-stage AD were randomized into two arms. Arm 1 (N = 24) received standard of care. Arm 2 (N = 31) additionally received telephonic personalized coaching for multiple lifestyle interventions. The primary outcome was a test of the hypothesis that the Memory Performance Index (MPI) change over time would be better in the intervention arm than in the control arm. The Functional Assessment Staging Test was assessed for a secondary outcome. COCOA collected psychometric, clinical, lifestyle, genomic, proteomic, metabolomic, and microbiome data at multiple timepoints (dynamic dense data) across two years for each participant.
Results:
The intervention arm ameliorated 2.1 [1.0] MPI points (mean [SD], p = 0.016) compared to the control over the two-year intervention. No important adverse events or side effects were observed.
Conclusion:
Multimodal lifestyle interventions are effective for ameliorating cognitive decline and have a larger effect size than pharmacological interventions. Dietary changes and exercise are likely to be beneficial components of multimodal interventions in many individuals. Remote coaching is an effective intervention for early stage ADRD. Remote interventions were effective during the COVID pandemic.
Keywords
Trial registration: ClinicalTrials.gov: NCT03424200 (clinicaltrials.gov/ct2/show/NCT03424200), registered January 10, 2018
INTRODUCTION
Complex diseases require complex therapies [1]. The COCOA trial was designed to test the hypothesis that multimodal interventions can ameliorate (slow, halt, or reverse) the course of Alzheimer’s disease and related dementias (ADRD). Over the last several decades, clinical experience (e.g., [2]) and basic research (e.g., [3]) have converged on the possibility that multiple mechanisms within an individual drive disease pathogenesis and may underpin treatment. This view of Alzheimer’s disease (AD) as a complex disease can be contrasted with the view that AD is caused solely by an increase in amyloid driven by a single mechanism and can only be treated by reversing that increase [4]. Epidemiological studies have identified risk factors for AD, many of which may be modifiable by lifestyle interventions (e.g., [5]). Results from previous and contemporaneous studies (e.g., [6–9]) suggested that our central hypothesis is true, namely that lifestyle modifications ameliorate cognitive and functional trajectories across the AD spectrum, and motivated further validation and testing of this hypothesis. Individuals vary in their ability to undertake and tolerate lifestyle interventions and in their responses to them (e.g., [10]). Therefore, the COCOA trial used longitudinal adaptive personalized coaching to match types and intensities of lifestyle interventions to individual circumstances. COCOA complements and extends results from the FINGER trial [11], which showed benefit from an intervention combining diet, exercise, and cognitive training in a cohort of aging individuals at high risk or in the earliest stages of the AD spectrum. COCOA’s intervention was delivered by telephonic coaching. This coaching leveraged knowledge from previous studies, notably those underpinning US public health guidelines and the MIND diet [12], to choose interventions expected to ameliorate cognition, including diet, exercise, and cognitive training.
We use the word ‘ameliorate’ to indicate improvement with respect to the control arm. Relative to baseline, this can mean improvement, stability, or slower deterioration than in the control arm. COCOA inclusion criteria encompassed individuals on the AD spectrum, centering on those with mild cognitive impairment (MCI).
METHODS
Methods for the COCOA trial have been published previously [13]. In brief, COCOA was a prospective randomized clinical trial (RCT) to test the hypothesis that coached multimodal interventions ameliorate cognitive decline. The main inclusion criteria for participants were to be at least 50 years of age and to have a Memory Performance Index (MPI) of 65 or less (embic.us/application/clinical). The main exclusion criterion was an existing diagnosis of a non-AD neurodegenerative disorder.
The coaching intervention was designed based on previously tested multimodal interventions such as that of FINGER [14]. Our personalized coaching approach has been described in detail previously [13, 15]. Lifestyle coaching was provided to each participant by registered dietitians or certified nutritionists. Coaching was delivered via monthly telephone calls as well as email and text messages between calls. All coaches received four weeks of training on the general program before working directly with clients. At a high level, the training consisted of a review of all the data provided to each client (clinical laboratories, genetics, gut microbiome, questionnaires), evidence-based action items for out-of-range biomarkers, an overview of how genetic data is used appropriately in combination with phenomic biomarkers, an overview of behavioral intervention techniques, and procedural and administrative aspects of the program. Coaches received additional training on AD, the study protocol, delivery of BrainHQ, and evidence-based lifestyle interventions specific to brain health. Because all coaches already had extensive academic training and clinical experience in nutrition and exercise interventions, these were not a major focus of training. The training was designed to be experiential rather than solely didactic, so coaches had substantial opportunity to practice both designing and delivering personalized action plans. Coaching calls were recorded, with participant permission, and regularly reviewed by both coaching supervisors and clinical staff for general coaching and clinical quality as well as fidelity to the intervention goals.
Personalization was based on clinical biomarkers, genetic data, and questionnaire data, with consideration for participant preferences. For example, as described previously [15], reduction of dietary saturated fat might be recommended for anyone with elevated LDL-cholesterol; however, the specific behavioral plan would be personalized to each participant’s lifestyle (e.g., for one person it might be reducing meat intake and for another it might be switching from whole milk to skim milk in their coffee). With genetic data available, coaches could additionally personalize the action plan based on the individual’s polygenic risk score for LDL-cholesterol, which, if high, might call for a more intensive lifestyle intervention. The final personalized action plan for each participant was developed collaboratively during each coaching call by the coach, drawing on their clinical expertise, and the participant. The extent to which this personalization approach was successful in improving clinical biomarkers and optimizing health trajectories has been reported by Zubair et al. [15].
The genetic data provided in the study was focused on lifestyle and wellness genetics, such as genetics related to nutrition, exercise, sleep, and stress, as well as genetics related to common risk biomarkers associated with those lifestyle factors (e.g., genetic risk for elevated cholesterol or glucose). No medical genetics were provided or coached to, including APOE genetics. Coaches did not make recommendations solely based on genetic risk, although they might take genetics into account when developing a behavioral action plan to address an out-of-range biomarker.
Although tailored to each participant, the COCOA coaching intervention always included dietary recommendations based on the MIND diet [12], physical activity recommendations based on US public health guidelines, and recommendations for sleep and stress management. Individualized recommendations for enhancing social interactions and brain-stimulating activities were also implemented as appropriate. Cognitive training was administered to all participants with BrainHQ (Posit, Inc). The behavioral aspect of the intervention was grounded in social cognitive theory [16], and coaches were also trained in other evidence-based behavioral techniques such as motivational interviewing [17] and cognitive-behavioral strategies [18].
MPI testing was performed by trained staff who were blinded to participant arm. The nature of the intervention precluded participant blinding. Participants were recruited from a high-volume memory clinic in Orange County, Southern California. These participants reflect the demographics of their age cohort in Orange County. Awareness of memory care resources is high in Orange County as a result of the Orange County Vital Brain Aging Program [19], which was a cognitive health awareness program implemented at community sites throughout the county. Therefore, COCOA participants are drawn from a population with a heightened awareness of memory-care public-health messaging. Potential participants were evaluated by an experienced neurologist practicing a standardized diagnostic approach and received diagnoses according to standard criteria [20], including cerebrospinal fluid or imaging biomarker confirmation if indicated.
Participants were randomized into two arms using block randomization with variable-sized small blocks. Informed consent was obtained from all participants. The trial protocol as approved by the Western Institutional Review Board (WIRB; protocol #20172152) is included in Supplementary Material. A CONSORT checklist for clinical trials is also included in Supplementary Material (http://www.equator-network.org/reporting-guidelines/consort).
Arm allocation
The COCOA project encompasses two initiatives. The first initiative, the COCOA trial described in this manuscript, was to conduct a test of the null hypothesis that multimodal intervention is not better than standard of care. The second initiative was to generate dense longitudinal omics data to enable mechanistic systems biology analyses, including analyses of COCOA data combined with other cohort data sets. Systems analyses benefit from data that samples many regions of state space. We hypothesized that participants in the coached intervention arm would be more likely to implement lifestyle modifications than those in the control arm, that different participants in the intervention arm would implement different amounts of each intervention, and that therefore participants in the intervention arm would sample more diverse regions of state space than individuals in the control arm. Therefore, we sought to maximize the number of individuals in the intervention arm without sacrificing much power to test the hypothesis of the COCOA trial. We chose a 3 : 2 ratio of participants in the intervention arm to control arm as a compromise between these goals.
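As an illustration of the allocation scheme described above, the following is a minimal sketch of permuted-block randomization with variable block sizes and a 3 : 2 intervention-to-control ratio. It is not the procedure used in the trial: the function name, the block-size multipliers (blocks of 5 or 10), and the seed are assumptions chosen only for illustration.

```python
import random

def block_randomize(n_participants, ratio=(3, 2), multiples=(1, 2), seed=0):
    """Assign arms using permuted blocks whose size varies at random.

    ratio: (intervention, control) allocation within every block, here 3:2.
    multiples: hypothetical block-size multipliers, giving blocks of 5 or 10.
    """
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_participants:
        m = rng.choice(multiples)  # pick the size of the next block
        block = ["intervention"] * (ratio[0] * m) + ["control"] * (ratio[1] * m)
        rng.shuffle(block)         # random order within the block
        assignments.extend(block)
    # The final block may be only partly used if enrollment stops mid-block.
    return assignments[:n_participants]

arms = block_randomize(55)
print(arms.count("intervention"), arms.count("control"))
```

Every completed block is exactly 3 : 2, so imbalance can arise only from a partly filled final block. Varying the block size makes the next assignment harder to predict than with fixed blocks.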
Choice of primary outcome measure
There are many outcome measures used in AD research, and these have evolved and improved over time. They include measures of cognition, function, and frailty, as well as composite measures that combine subsets of these and other domains. The MoCA [21], a proprietary measure in wide use earlier in this century, has lower sensitivity and precision than newer measures [22, 23]. Modern AD research increasingly emphasizes evaluation of individuals in the early stages of the AD spectrum (e.g., [24]); for such research, a tool with higher sensitivity than the MoCA is needed. Improving sensitivity and precision also increases power. Previous multimodal lifestyle trials, such as FINGER, have employed trial-specific “sum of z-scores” composites as primary outcome metrics [25, 26]. These composites have the advantage of assessing multiple domains and can be useful if different individuals in the trial are improving in different ways. They reduce sensitivity to change in any specific domain, but also reduce the effect of noise in any one domain by averaging across all domains. In practice, these metrics are rarely reported in enough detail to be replicated by other groups. Furthermore, because the z-scores are determined by reference to other individuals in the study, the meaning of these scores is population specific and can change as individuals join or leave the study.
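The population dependence of a sum-of-z-scores composite can be shown with a small worked example. The sketch below uses invented scores on two hypothetical tests; it is not the FINGER composite or any published metric. It only shows that a participant’s composite can change when another participant leaves the cohort, even though that participant’s raw scores are unchanged.

```python
from statistics import mean, pstdev

def composite_z(scores_by_test, participant):
    """Sum of z-scores for one participant, standardized against the cohort."""
    total = 0.0
    for cohort_scores in scores_by_test.values():
        values = list(cohort_scores.values())
        mu, sigma = mean(values), pstdev(values)
        total += (cohort_scores[participant] - mu) / sigma
    return total

# Hypothetical raw scores on two tests for four participants.
cohort = {
    "memory":    {"A": 20, "B": 24, "C": 28, "D": 32},
    "executive": {"A": 10, "B": 12, "C": 14, "D": 16},
}
before = composite_z(cohort, "B")

# Participant D leaves the study; B's raw scores are unchanged.
smaller = {test: {p: s for p, s in scores.items() if p != "D"}
           for test, scores in cohort.items()}
after = composite_z(smaller, "B")

print(round(before, 3), round(after, 3))
```

In this toy cohort, participant B moves from below the cohort mean to exactly at the mean solely because the reference population changed.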
The Clinical Dementia Rating scale (CDR) has been used in many AD trials [27], notably very large trials funded by the pharmaceutical industry (e.g., [28]). Pharmaceutical trials aim for the broadest possible FDA approval, and so have an incentive to use a multidomain measure. Reductions in power due to the low precision of these measures are compensated by very high enrollments enabled by large budgets. The global CDR is a staging scale with five possible scores: 0, 0.5, 1, 2, and 3. Many trials now use the “CDR sum of boxes” (CDR-SB) [29, 30], in part because of improved sensitivity over the CDR global score [31]. By summing component values across six domains, the CDR-SB has an increased range: 0–18. Even so, much of this range spans advanced dementia, so few of the discrete CDR-SB values correspond to early stages of the AD spectrum. This constrains the maximum precision, forces stark tradeoffs between sensitivity and specificity, and leaves no ability to resolve subtle changes. As with sums of z-scores, combining scores across CDR domains reduces sensitivity to change in any particular domain but also reduces the effect of noise in any one domain by averaging across all domains. The CDR is subjective but has sufficient interrater reliability to be embraced by many studies, particularly those focused on late-stage dementia [32]. Other drawbacks of the CDR include a lengthy rater certification process and long administration time, which can raise administration costs and burden participants [33]. The cognitive portion of the Alzheimer’s Disease Assessment Scale (ADAS-Cog) [34] is employed as an endpoint by some trials, but its high variability requires large sample sizes [35].
We chose the proprietary MPI as our primary outcome measure for the COCOA trial. We needed a metric that is precise, sensitive, reproducible, not subjective, inexpensive, has no learning effect, and can be compared across time and space with other studies. The range of the MPI is [0, 100]. The MPI accounts for age, race, education, gender, method of test administration, and word list used. In a study of 121,481 normal or cognitively impaired subjects, the proportion of variance explained by MPI was greater than other common tests of cognition, including the ADAS-Cog and the Rey Auditory Verbal Learning Test [36]. The MPI has been validated to have 97% overall accuracy for discriminating between normal aging and MCI in both English and Japanese, and in primary care, academic, and insurance settings [37–40]. The MPI is free for research use. Logistics of administration are quick and inexpensive. We desired frequent assessments to increase power and to best capture individual cognitive trajectories; the MPI has an exceptionally small learning effect, if any. The MPI can be administered remotely, which at the time of trial design was not considered an important criterion but became important during the COVID pandemic.
We recognize that the AD field is still searching for a universal outcome measure that is sufficiently perfect that it could be used by all studies, enhancing meta-analyses. It may be that no such measure is discoverable. Even should such a measure be discovered, there will likely remain instances where other measures, such as the MPI and CDR-SB, are more appropriate. Evaluating existing measures and developing improved measures remains an important current area of research (e.g., [41]).
Power
Power calculations have been detailed previously [13]. In brief, we planned to employ a linear mixed effects model (LMM) to show that arm assignment significantly affected MPI. We evaluated power over a range of possible values, using established methods to assess sample sizes for correlated longitudinal data [42, 43]. For a postulated effect size of 3 MPI points over two years, using conservative parameter estimates, we estimated a need for 84 participants (N) to achieve 80% power (alpha = 0.05). This estimate shaped our pre-COVID goal of recruiting 200 participants. However, we recognized our parameter choices were conservative and expected actual power to be higher. Therefore, we also explored less conservative parameter choices derived from a real-world study [44]. With these parameter choices, for an effect size of 1 MPI point over two years, COCOA was predicted to achieve 80% power with N = 36.
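To make the relationship between effect size, variability, and allocation ratio concrete, the sketch below uses the textbook normal approximation for comparing mean change between two arms with unequal allocation. This is a deliberately simplified stand-in and is not the longitudinal method of [42, 43]; it ignores repeated measures and within-participant correlation, so it is not expected to reproduce the sample sizes quoted above. The standard deviation of two-year change (sigma), the one-sided default, and the function name are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def total_sample_size(delta, sigma, frac_intervention=0.6,
                      alpha=0.05, power=0.80, one_sided=True):
    """Total N for a two-arm comparison of mean change (normal approximation).

    delta: postulated difference in change between arms (MPI points).
    sigma: assumed SD of change within each arm (hypothetical value).
    frac_intervention: share of participants in the intervention arm
        (0.6 corresponds to a 3:2 allocation).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha if one_sided else 1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    r = frac_intervention
    n = (z_alpha + z_beta) ** 2 * sigma ** 2 / (delta ** 2 * r * (1 - r))
    return ceil(n)

# Hypothetical sigma of 5 MPI points; compare allocations and effect sizes.
for delta in (3, 1):
    print(delta, total_sample_size(delta, sigma=5),
          total_sample_size(delta, sigma=5, frac_intervention=0.5))
```

Under this approximation the required N scales with 1/(r(1 − r)), so moving from 1 : 1 to 3 : 2 allocation costs only about 4% more participants. This is in keeping with the choice of 3 : 2 as a compromise that sacrifices little power.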
Event driven procedural modifications
In April 2019, Arivale, our primary provider of trial logistics, ceased operations. Most operations previously contracted through Arivale were then brought in house. Enrollment ceased as a result of the Arivale closure. Circa March 2020, the COVID pandemic began, increasing the risk of in-person activities. Assessments were rescheduled as necessary, in some cases delaying a planned assessment or causing it to be skipped, so not all assessments were performed precisely at the preplanned intervals. Using time as a continuous variable in analyses made conclusions robust to unsynchronized data. The intervention, telephonic coaching, was exceptionally robust (almost presciently so) to COVID disruptions.
Unforeseen circumstances are likely in any long-term clinical trial, and COCOA’s design proved resilient to them. First, as mentioned above, COCOA’s interventions were driven by telephonic coaching, which was robust to the physical distancing (also known as “social distancing”) required by public health measures during the pandemic. Second, in response to the closure of Arivale, we hired coaching staff from Arivale to maintain provider continuity. Third, many of the COCOA assessments were obtained remotely, or could be, so forced remote interactions caused little interruption or variation in them. The MPI is a summary statistic of the MCI Screen (MCIS), which can be performed remotely.
One result of the event-driven changes was the reduction from a planned 200 participants (original trial design) to 55 participants (trial completion). Of these 55, not all completed all planned assessments; some participants dropped out of the trial early and some participants skipped some assessments. Reasons for leaving the trial early included relocating out of state. No withdrawals were related to COVID, as far as we could ascertain. Most reasons for skipping assessments were because of physical-distancing needs and/or lack of health care staff to perform these assessments during intense COVID waves.
Statistical analysis
All randomized participants who completed a baseline visit were included in a modified intention-to-treat analysis. Missing data were not imputed. As prespecified at trial design, repeated measurements of the primary outcome measure (MPI) from baseline through the last trial visit were analyzed using an LMM. Arm, time (a continuous variable measured in days since enrollment), and sex were included in the LMM. The LMM used 254 data points from 54 participants. Data were collected from enrollment in the study until departure from the study. For three participants, due to pandemic delays, the last timepoint was more than 2.2 years after enrollment, with the most prolonged timepoint at 2.7 years after enrollment. Annotated R code and data are provided in the Supplementary Material.
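The sketch below shows why treating time as a continuous variable tolerates unsynchronized visits. It is not the prespecified LMM and not the R code in the Supplementary Material. It swaps in a simpler two-stage summary approach: an ordinary least-squares slope per participant, followed by a comparison of mean slopes between arms, with no sex covariate and no random effects. All records are invented for illustration.

```python
from statistics import mean

def slope_per_year(days, scores):
    """Ordinary least-squares slope of score on time, in points per year."""
    t = [d / 365.25 for d in days]
    t_bar, y_bar = mean(t), mean(scores)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, scores))
    return sxy / sxx

# Hypothetical records: arm, days since enrollment, MPI at each visit.
# Visit dates differ between participants and some visits are missing.
records = {
    "p01": ("control",      [0, 130, 250, 380, 760], [60, 59, 58, 57, 54]),
    "p02": ("control",      [0, 120, 365],           [55, 54, 52]),
    "p03": ("intervention", [0, 125, 240, 370, 800], [58, 58, 57, 58, 57]),
    "p04": ("intervention", [0, 118, 362, 735],      [62, 62, 61, 61]),
}

slopes = {"control": [], "intervention": []}
for arm, days, mpi in records.values():
    if len(days) >= 2:  # a slope needs at least two assessments
        slopes[arm].append(slope_per_year(days, mpi))

difference = mean(slopes["intervention"]) - mean(slopes["control"])
print(f"arm difference in MPI slope: {difference:.2f} points/year")
```

Unlike this two-stage summary, an LMM weights participants by the amount of data they contribute and estimates the arm effect and covariates in one model, which is one reason it was the prespecified analysis.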
RESULTS
Participants
COCOA recruited, enrolled, and randomized 55 participants over 16 months; attrition reduced the number to 35 who completed all 24 months of active participation in the trial. However, most individuals who did not complete 24 months participated long enough to be included in one or more of the analyses presented in this manuscript; N is indicated for each analysis in the corresponding sections (CONSORT flow diagram, Fig. 1). There were 8 females and 14 males in the control arm and 10 females and 21 males in the coaching arm. For this population, weight was 171 [35] pounds (mean [SD]); body mass index was 26 [4]. Baseline demographics are presented in Table 1. The first participant was recruited in January 2018. The last participant was recruited in April 2019. Data collection from participants ended in summer 2021, slightly longer than 24 months after the last participant started the trial; some assay collections were delayed due to the COVID pandemic.

Fig. 1. CONSORT flow diagram for the COCOA Trial. Although there was some attrition during the trial, most participants stayed active long enough to contribute molecular data to at least two timepoints.
Table 1. Baseline participant characteristics. Demographic attributes were balanced between the two arms of the COCOA trial (N = 53 completed baseline assessments). ApoE genotype was available for only a subset of participants.
Retention and adherence
One of the 55 enrolled participants stopped receiving assessments prior to baseline, so the modified intention-to-treat number for these analyses was 54. Of these 54 participants, 28 continued assessments for the full two years. Of the 29 participants starting in the intervention arm, 6 remained actively engaged with their coach after two years (Supplementary Figure 1). We and others have postulated that much of the benefit of coaching for lifestyle interventions occurs during the first year (or less) of coaching, and these benefits persist many years after coaching ceases (e.g., [45–48]). Most of the participants (25/29 = 86%) in the intervention arm remained actively engaged with their coach longer than one year. The majority of participants (48/55 = 87%) received at least two MPI assessments. Overall, six MPI assessments were planned (baseline and then at 4, 8, 12, 18, and 24 months) for each of the 54 participants; the majority of these (235/324 = 73%) were obtained.
Cognition (MPI) is significantly ameliorated in the intervention arm
Cognition (MPI) was significantly ameliorated in the intervention arm (Fig. 2; Supplementary Figure 2). A prespecified LMM, including sex as a covariate, showed a 2.1 [1.0] MPI point benefit (one-tailed p = 0.016; N = 54) to the intervention (coached) arm compared to the control (standard of care) arm over the two-year course of the intervention; we therefore reject the null hypothesis that the COCOA intervention has no effect. We report one-sided significance; we pre-declared our hypothesis that the intervention arm would have a better outcome than the control arm, consistent with previous research results, such as FINGER.

Fig. 2. Linearized aggregate MPI trajectories of COCOA participants. The coaching arm has less decline in cognition than the standard-of-care arm. Although a linear fit does not fully capture the dynamics of the trajectories, it illustrates a traditional statistical approach for testing the significance of the primary outcome measure. Some final MPI assessments were delayed past 24 months due to the COVID pandemic. Change from baseline is graphed. 90% confidence intervals are shaded.
We performed a similar analysis with the MoCA. MoCA ameliorates in the intervention arm, but not significantly (Supplementary Figure 3). We expected less power for MoCA, as we measure MoCA less frequently and MoCA has less precision than MPI, so this lack of MoCA significance is not surprising [49].
The average change in MPI score between enrollment and end of the trial (withdrawal, death, or completion), for those individuals with at least two assessments, was –4.6 [3.2] (N = 19) in the control arm and –1.7 [3.1] (N = 29) in the intervention arm, so we estimate an effect size due to the COCOA intervention of 63% less cognitive loss. This effect size eclipses known effect sizes from pharmaceutical interventions (the best typically 10–20%) and exceeds most reported effect sizes from non-pharmaceutical interventions (the best typically 46–59%) [50]. We make this claim very cautiously but recognize that effect sizes are widely reported in professional literature and the popular press. Effect-size comparisons between trials are fraught because of differences in outcome measures and other trial aspects. Our reported effect size, as well as the reported effect sizes we cite for comparison, may be inflated [51]; however, publication bias can also deflate published effect sizes. Furthermore, if trajectories are non-linear and/or have oppositely signed slopes for different arms, it is not clear that discussion of “effect size” is a meaningful way to present results, or at least it becomes important to use tailored definitions of effect sizes for each trial. Tailored effect sizes can be based on time delay [52] or area between trajectories. We present effect size results in this manuscript for two reasons: 1) CONSORT guidelines require effect-size reporting, and 2) power calculations for clinical trials, including COCOA, often draw upon a concept of effect sizes. For some medical conditions with simple univariate easily measurable outcomes, a simple concept of effect size may be a useful tool. For ADRD, and other complex diseases, the concept of ‘effect size’ may be less useful. Effect size reports should be deconstructed as we have done in this paragraph. Effect sizes should be reported outside of technical literature with caution, if at all. 
Alternatives to power calculations that require a simple concept of effect size should be sought. At the least, ADRD effect sizes should be multidimensional concepts.
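The 63% figure quoted above is a ratio of mean changes, and the arithmetic can be checked directly. The sketch below uses the mean MPI changes reported in this section. The function name is ours, and the second call uses invented numbers solely to illustrate the caveat about small or oppositely signed changes.

```python
def relative_reduction(control_change, intervention_change):
    """Fraction of the control arm's mean decline avoided in the intervention arm."""
    return (control_change - intervention_change) / control_change

# Mean MPI change from enrollment to end of trial, as reported above.
mpi_effect = relative_reduction(-4.6, -1.7)
print(f"{mpi_effect:.0%}")

# Hypothetical case: a near-zero control decline and an improving intervention
# arm. The ratio exceeds 100% and is no longer a meaningful summary.
unstable = relative_reduction(-0.1, 0.5)
print(f"{unstable:.0%}")
```

Because the control-arm change sits in the denominator, this ratio is sensitive to noise whenever the control decline is small. This is one concrete reason to treat cross-trial comparisons of percentage effect sizes with caution.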
Effects of sex, age, and education
Sex, age, and education influence cognitive assays in many contexts. We performed linear regression including terms for sex, age, and education. Of these, only sex is an independent predictor of MPI, as modeled by the LMM, in COCOA (two-sided p-value = 0.0064), with female sex having an aggregate beneficial effect (2.2 MPI point benefit). COCOA is underpowered to observe effects of age (p = 0.29) and education (p = 0.29), if any, on MPI change over time, as modeled by the LMM. The interaction term between sex and arm is not significant, indicating either that the effects of the COCOA intervention and sex are independent or that we lack power to detect an interaction. There are no large differences in the distributions of sex, age, or education between arms (Table 1).
Function (FAST) is significantly ameliorated in the intervention arm
Function as measured by the Functional Assessment Staging Test (FAST) is also significantly ameliorated in the intervention arm: linear regression showed a benefit to FAST in the coached arm compared to control (one-tailed p = 0.030; Fig. 3). Among individuals with at least two assessments, those in the control arm deteriorated (i.e., increased) by 0.53 [1.2] FAST score points (N = 15) over the course of the trial, whereas those in the intervention arm deteriorated by only 0.33 [1.1] FAST score points (N = 27), very approximately a 40% reduction. Therefore, both a measure of cognition (MPI) and a measure of function (FAST) were ameliorated by the COCOA intervention. This suggests that the COCOA intervention has broad, multifaceted benefits across multiple systems and is beneficial for overall brain health and for patient-oriented outcomes relevant to dementia.

Fig. 3. Linearized FAST trajectories of COCOA participants. The coaching arm has less decline in function than the standard-of-care arm. Trajectories are computed with linear regression. 90% confidence intervals are shaded.
The robustness of the FAST result is mediocre. For example, if the three participants with the largest absolute changes in FAST score are removed from the analysis, the p-value for the improvement in the intervention arm versus the control arm would no longer be nominally significant. Although the significance is not robust, the improvement is: one has to remove the 10 participants (out of 42 with at least two FAST measurements) with the largest absolute changes to eliminate aggregate improvement. This robustness implies that the largest benefits from the intervention may be accrued by a relatively few individuals, that there may be a small benefit enjoyed by many participants, and that at least some individuals in the intervention group may receive no benefit. It is also possible that our FAST improvement is a chimera resulting from type I error.
FAST is a low-precision patient-oriented instrument with large differences between consecutive categorizations. Substantial pathophysiology likely separates FAST stages. For example, FAST stage 2 is “subjective functional deficit”; stage 3 is “objective functional deficit interferes with a person’s most complex tasks”; and stage 4 is “instrumental activities of daily living (IADLs) become affected, such as bill paying, cooking, cleaning, traveling”. In untreated AD, it typically takes several years to degenerate from FAST stage 3 to 4, and two years to degenerate from FAST stage 4 to 5 [53]. It is rare for FAST to improve. It is therefore particularly interesting to compare the number of observations of FAST improvement between the intervention arm and the control arm. We considered all pairs of consecutive FAST assessments of the same individual. In control participants, 1 of 32 (3.1%) such intervals showed improvement; in intervention participants, 6 of 65 intervals (9.2%) showed improvement (Supplementary Figure 4). This difference is not significant by Fisher’s exact test, but the magnitude of the difference suggests that the intervention may occasionally reverse the process of disease (or at least reverse the decline of patient-oriented outcomes) in some individuals.
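The Fisher’s exact test mentioned above can be reproduced from the interval counts given in the text (1 of 32 control intervals and 6 of 65 intervention intervals showing improvement). The sketch below is a generic standard-library implementation of the two-sided test, not the authors’ code. It treats intervals as independent observations, which is a simplification because several intervals can come from the same participant.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x "successes" in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Rows: control, intervention. Columns: improved, not improved.
p_value = fisher_exact_two_sided(1, 31, 6, 59)
print(f"two-sided p = {p_value:.2f}")
```

With only seven improving intervals in total, the test has little power, so the lack of significance does not by itself rule out a real difference in the rate of improvement.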
Comparison to ADNI trajectories
Our primary outcome linear model for COCOA estimates a change of –0.2 MPI points per year in the intervention arm and –2.8 MPI points per year in the control arm. In the Alzheimer’s Disease Neuroimaging Initiative Database (ADNI) cohort, in aggregate, MoCA scores change by about –2.2 points per year in AD participants (Supplementary Figure 5). In COCOA data, the MoCA score and MPI score are correlated: a one-point change in MPI corresponds to a 0.31-point change in MoCA (data not shown). This coefficient fits intuition, since the maximum MoCA score is 30 and the maximum MPI score is 100. Using this coefficient to convert COCOA’s MPI estimates to MoCA equivalents, our primary outcome linear model estimates a change of –0.1 MoCA points per year in the intervention arm and –0.9 MoCA points per year in the control arm. COCOA controls receive standard of care. ADNI participants presumably receive standard of care, although ADNI is not an interventional study. Therefore, both groups would be expected to have similar aggregate decline. However, the difference between COCOA controls’ –0.9 and ADNI’s –2.2 MoCA points per year is noticeable. This difference may be within our comparison’s margin of error. It is also possible that, on average, COCOA controls do better than ADNI participants. Possible explanations are that COCOA participants receive care at specialty memory clinics and/or benefit from the specific environment and demographics of Orange County (e.g., affluence), whereas ADNI participants receive care from diverse providers, including those not specializing in memory care, and reside in diverse locales.
External datasets like ADNI are useful for validation. The ADNI comparison suggests that COCOA results are consistent with other research results [54]. The difference in the regression coefficient for cognitive decline between COCOA controls and ADNI participants highlights one fallacy of categorizing individuals in an external dataset as “controls”. Unless controls are drawn from the same population as the intervention arm, an overt or hidden confounder may distort comparisons. In theory, an accurate systems model could completely account for confounders, but 1) we do not yet have such a model, and 2) if such a model existed, it is not clear why further research requiring external controls would be needed. The exception would be a model that is fully accurate under control conditions but unable to model the system under intervention conditions, so use cases would be limited.
Minimal clinically important difference
The minimal clinically important difference (MCID) is the smallest change in a treatment outcome that an individual patient would identify as important or would indicate a change in the patient’s management. Wu et al. [55] estimate the MCID for MoCA as 1.2 points. From the above analysis, over the two-year course of COCOA, we expect a 1.6-point difference in MoCA between arms, which would be clinically meaningful by this metric. We have no reason to believe that COCOA has optimized coached multimodal lifestyle interventions; as more RCTs are conducted and as dementia care knowledge accumulates, we expect the clinical importance of such interventions to grow.
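The comparison against the MCID can be reproduced with a short sketch. It assumes, as the text implies, that the annual MoCA-equivalent slopes stay constant over the two-year trial; the slopes and the 1.2-point MCID come from the text, while the variable names are illustrative.

```python
# MCID for MoCA, attributed in the text to Wu et al. [55].
MCID_MOCA = 1.2
TRIAL_YEARS = 2

# MoCA-equivalent annual slopes quoted in the text (MoCA points per year).
moca_slope_per_year = {"intervention": -0.1, "control": -0.9}

# Between-arm gap per year, then extrapolated linearly over the trial duration.
annual_gap = moca_slope_per_year["intervention"] - moca_slope_per_year["control"]
two_year_gap = round(annual_gap * TRIAL_YEARS, 1)

exceeds_mcid = two_year_gap > MCID_MOCA
print(f"Two-year between-arm difference: {two_year_gap} MoCA points")
print(f"Exceeds MCID of {MCID_MOCA}: {exceeds_mcid}")
```

This is a point-estimate comparison only; it does not carry the uncertainty of the slopes or of the MPI-to-MoCA coefficient.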
DISCUSSION
We have shown that personalized multimodal lifestyle interventions can ameliorate cognitive decline over time in individuals on the AD spectrum. Our results have two broad implications. First, COCOA served well as a pilot trial. More trials, such as the PREVENTION trial [56], should be modeled after COCOA’s trial design. Such designs can generate new information, both testing well-defined hypotheses and expanding the scope of inquiry into areas of pathophysiology that are less explored [1]. These trials should generate dense data and empower artificial intelligence (AI) analysis. Second, lifestyle interventions should be foundational to clinical care guidelines for all individuals on the ADRD spectrum.
Duration of effect
COCOA was a two-year study and maintained its effect for at least a year. Many pharmaceutical interventions show transient benefit, so duration of effect may be an additional advantage that lifestyle interventions would have in a head-to-head comparison with a drug. Persistent cognitive benefits have been seen in other multimodal intervention trials (e.g., FINGER). Many published pharmaceutical trials end after six months to a year; stakeholders have a disincentive to test a drug for longer than six months. Donepezil was approved by the FDA based largely on six-month data; any longer trial of a new drug might risk showing inferiority. Despite this, a few trials of small-molecule pharmaceuticals have extended as long as two years (e.g., [57]). These have established limited benefits for some small molecules [58]. Trials of monoclonal antibodies have typically lasted longer than small-molecule trials. Notably, the DIAN trial for dominantly inherited AD lasted up to seven years (but the intervention did not slow cognitive decline) [59]. These trials have recognized the benefit of relatively early intervention [60] and the possibility that transient non-linear effects might fade after a few months. To avoid distraction by transient effects, comparative reviews and meta-analyses of intervention trials for AD should focus on trials lasting at least 12 months; COCOA was designed to enable inclusion in such comparisons.
Limitations
COCOA results may not be generalizable to other populations, as patients were recruited from a small area and participants mostly self-described as White. However, COCOA results likely are generalizable because: 1) other trials have seen similar results (e.g., [14]), 2) lifestyle interventions can be adapted to many cultures, 3) hypothesized mechanisms touch upon universal human physiology, and 4) the personalized nature of the intervention automatically adapts to population differences. As a pilot study, COCOA has demonstrated the feasibility of including many diverse individuals in trials designed with dense data collection and analyzed with systems epistemology. We conclude that more resources should be devoted to fund larger multi-center dense-data trials, not only of AD but also other diseases, that can be inclusive of the full diversity of the global population. To this end, inclusion and exclusion criteria should be made as loose as other trial-design constraints allow.
Differences between participants at baseline are a limitation of COCOA and all other clinical trials. Because these differences are a strength as well as a limitation, it may be unreasonable to attempt to eliminate them completely in trial designs, even if it were possible. Differences between participants produce differences between arms in the baseline distributions of variables, and these could affect the interpretation of trial results: such differences, and not the intervention, could be responsible for the difference in outcome between arms. One particular concern would be a subtle difference in mean baseline MPI score (Table 1). A higher baseline MPI score in the control arm than in the intervention arm would be predicted to result in less MPI loss in controls, since cognitive decline tends to accelerate as it progresses. Since we observe the opposite, we doubt this baseline MPI difference had any impact on our overall conclusions. If anything, it might suggest the effect of the intervention is slightly stronger than we report.
Attrition from a trial, whether biased by the intervention or not, can also result in differences between outcome and other variables. We did not detect any significant differences in attributes of individuals withdrawing from COCOA [49]. In particular, neither arm assignment nor MPI score at the time of withdrawal had a significant impact on retention.
Yet another limitation of COCOA is outcome measurement variability. In the context of high variability, aggregate results are not necessarily applicable to any given individual (Supplementary Figure 2). Cognitive outcome measures with less error are a general need for studies of neurodegeneration and further development of such measures should be a research funding priority. Also, since cognitive performance can fluctuate over the course of a day or any short time period, studies will also benefit from frequent if not nearly continuous assessment to better power tests of long-term change in cognitive baseline (see also [49]).
At the time of the COCOA trial, obtaining confirmation of amyloid/tau biomarker positivity in all individuals screened for a trial without pharmaceutical funding was prohibitively expensive. Therefore, biomarker positivity was known for participants only if that information was present in clinical records. All individuals in COCOA who had been tested for biomarker positivity were positive. However, it is possible that some of the remaining participants were not. This can be considered either a limitation or a strength of the trial. Under the NIA-AA definition of AD developed for research purposes [61], it is possible that not all COCOA participants have AD, and therefore, if biomarker subgroups could be identified, one subgroup might benefit more than another; this would be seen as a limitation. However, many individuals both within the United States and around the world who are at risk for AD do not know their current or eventual biomarker status. Therefore, in a real-world setting, the COCOA results may be generally applicable to ADRD and those at risk for ADRD; this would be considered a strength. A full discussion of the definition of AD and its relationship to COCOA inclusion criteria can be found in the Supplementary Material.
Comparisons with other case reports, studies, and trials
The general utility of multimodal interventions for ADRD is now widely accepted, as opposed to a few years ago when greater skepticism prevailed [50]. Nevertheless, areas of uncertainty, skepticism, and controversy remain, mostly concerning exactly which modes should be included and the details of those modes. One area of uncertainty: should cognitive training be included, and if so, what specific exercises and regimen? Another: although it is generally accepted that diet is important, exactly which diet is best for each individual, and what supplements, if any, should be included? Given the increasing acceptance of multimodal interventions (to the point where they may now be considered the most effective intervention for ADRD), we do not thoroughly review the literature in this manuscript. The FINGER trial is the most recognized trial demonstrating the effectiveness of multimodal interventions [14]. More recent studies, including the Chicago Health and Aging Project (CHAP) and the Rush Memory and Aging Project (MAP), validate these findings (e.g., [62]). Several other case reports and case series similarly support the importance and effect size of multimodal interventions (e.g., [7, 63]). Even dietary interventions alone have been reported with similar effect sizes, although not all diets and studies are equal [64]. Neither the significance nor the effect size of the COCOA multimodal intervention should be surprising in light of these reports. The COCOA effect size is similar to those observed in other studies. We look forward to the results of related ongoing and future trials (e.g., [56, 65]).
Effect size can be used to evaluate therapeutic interventions. For AD pharmaceuticals, the relative difference of the primary outcome between arms of an RCT has been used to determine effect size. Lecanemab, among the best AD pharmaceuticals, has a 27% effect size at 18 months [66]. The COCOA intervention has a 63% effect size, more than twice that of lecanemab. It is unlikely that pharmaceutical interventions and lifestyle interventions share exactly the same mediators and mechanisms; therefore, we expect multimodal therapies that include pharmaceuticals to be better in some individuals than multimodal interventions restricted to lifestyle. However, the COCOA trial was not designed to test such a combination.
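The relative-difference measure mentioned above can be illustrated with a minimal sketch. The formula (control change minus treated change, divided by control change) is one common reading of “relative difference between arms” and is an assumption here, as is the function name. The lecanemab inputs are the 18-month CDR-SB changes as commonly cited from the published phase 3 trial; they are not stated in this article and should be checked against [66]. The inputs behind COCOA’s 63% figure are not given in this section, so that figure is not recomputed.

```python
def relative_effect_size(control_change: float, treated_change: float) -> float:
    """Fraction of the control-arm change that is avoided in the treated arm."""
    if control_change == 0:
        raise ValueError("control arm shows no change; relative difference is undefined")
    return (control_change - treated_change) / control_change


# ASSUMPTION: 18-month CDR-SB worsening of 1.66 (placebo) and 1.21 (lecanemab),
# as commonly cited from the phase 3 trial; verify against reference [66].
lecanemab_pct = round(100 * relative_effect_size(control_change=1.66, treated_change=1.21))
print(f"lecanemab: {lecanemab_pct}% less worsening than placebo")
```

Because the measure is a ratio to the control-arm change, it is sensitive to how much the control arm declines, which is one reason comparisons across trials with different outcome scales and populations should be read cautiously.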
Clinical insights
Many if not most dementia patients in the United States are not receiving the best current standard of care. Many if not most people at risk for dementia are not receiving the best preventive care. Standard of care includes screening for and treating known causes of dementia, optimizing diet and exercise, and cognitive or social engagement (e.g., [67]). Both the cognitive (MPI) and functional (FAST) outcomes for COCOA were significant. The coherence of these outcomes suggests that the intervention operates on fundamental mechanisms and improves real-world outcomes. As discussed above, COCOA controls likely do better than ADNI participants. One likely explanation is that COCOA controls received better standard of care than ADNI participants. ADNI participants themselves probably receive better care than average, as they participate in an academic clinical study. However, they are unlikely to receive care at specialty memory clinics and may not consistently receive care that follows modern guidelines, particularly as such care is typically not fully reimbursed by payers. It is possible that even the specialty memory clinics providing standard of care to COCOA participants are not fully following modern recommendations. In that case, one possible interpretation of the beneficial result of the COCOA intervention is that COCOA is “merely” doing a better-than-usual job of implementing standard of care. If so, our analysis points urgently to the need to better implement standard of care, and to better emphasize multimodal lifestyle interventions in training and published guidelines. Two intertwined translational conclusions result: 1) substantial improvements can be made to the nation’s cognitive health by democratizing known standard-of-care measures, and 2) the benefits of multimodal lifestyle interventions when deployed nationally are likely to be even greater than observed in COCOA because the standard-of-care baseline for COCOA controls was so high.
The feasibility of a telephonic (or other remote communication modality) coaching intervention is very high, and the cost to implement is very low. This is particularly true if coaching can be fully or partially implemented or supported with artificial intelligence [68]. While training coaches on the multiple data types and evidence-based interventions can be time-consuming, considerable time is saved by hiring individuals (e.g., Registered Dietitian Nutritionists) who already have academic and clinical training in key areas of the intervention such as nutrition and exercise science. To scale this intervention, it is certainly possible for lay coaches, with supervision from licensed allied health professionals, to deliver much of the follow-up intervention once the personalized action plans have been developed. Differences between reimbursement rates for pharmaceuticals and coaching can be immense, so that even if coaching is orders of magnitude less expensive than pharmaceuticals, the cost to the patient can be higher. Therefore, we recommend advocacy for changes in how payment for dementia care is implemented on a national level. Also, we believe that group coaching, although not tested by COCOA, will further decrease costs and possibly have additional social engagement benefits [5]. Many participants leaving COCOA, including those in the control arm who received coaching after the trial period, wished to continue with the coaching intervention. However, it was difficult to find existing coaching providers. Therefore, in conjunction with changes in payer reimbursement policies, entrepreneurial opportunities may exist for coaching. Compliance in real-world situations may be higher than in COCOA, as patients will not necessarily have burdensome in-person assessments and blood draws and should find it easier to continue remote coaching even if they move to other locations.
Furthermore, patients should be able to choose from a variety of coaching styles and implementations that best match their needs. Even without such flexibility, most participants in the COCOA intervention arm were highly engaged with coaching for at least a year. This suggests that many people are likely to persistently engage with coaching interventions, and that even for those who do not, much of the benefit of coaching on cognition can be conferred over a short time, consistent with the results of other similar trials, such as FINGER [14]. From the data presented in this manuscript, considered in isolation and assuming no knowledge of previous research in neurodegeneration or physiology, one should remain open to the possibility that all aspects of the COCOA intervention contribute to cognitive and functional amelioration. Such aspects include social interaction and increased attention to health issues, as well as the diet, exercise, and cognitive training elements of the multimodal intervention. At present, we cannot recommend that any one of these elements be subtracted from multimodal interventions.
Future directions
Future COCOA analyses will focus on the dense molecular data obtained for the cohort. Analyses of the molecular and other dense data generated from COCOA (e.g., [69]) may help resolve mechanisms and individual contributions of the multimodal intervention. Analyses to date, as presented here, were designed to test the single hypothesis that personalized multimodal intervention drawing on multiple lifestyle domains would ameliorate dementia. These analyses cannot by themselves inform other hypotheses, such as whether restricting intervention to a subset of these interventions would have performed as well as the full multimodal intervention. Nor can these analyses determine the relative value or contribution of each of these interventions in the cohort or in individuals. However, prior information supports a strong contribution of exercise (e.g., [70]), and based on such prior knowledge we postulate that exercise may be the most important intervention for many individuals. Larger datasets incorporating longitudinal omics data interpreted in the context of mechanistic models are likely necessary in order to resolve the contributions of components of multimodal interventions, as there are an infinite number of possible combinations of doses and types of interventions, limiting the utility of univariate approaches. Results of trials such as COCOA should help modify and refine practical real-world multimodal clinical therapies for individuals with or at risk for AD, and should help with the design of AI-ready clinical trials to further improve clinical recommendations and advance basic biomedical understanding of AD.
ACKNOWLEDGMENTS
COCOA would not have been possible without the participants, for whom we reserve the greatest acknowledgement and appreciation. Institutional support from Hoag Memorial Hospital Presbyterian was vital. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at adni.loni.usc.edu.
FUNDING
This work was supported by the Alzheimer’s Translational Pillar of Providence St. Joseph Health.
CONFLICT OF INTEREST
Dr. Shankle is an employee of EMBIC Corporation. Dr. Hara owns stock in EMBIC Corporation. There are no other conflicts of interest for any of the authors.
DATA AVAILABILITY
The data supporting the findings of this study are available within the article and/or its supplementary material.
