Abstract
Background:
Multiple algorithms with variable performance have been developed to identify dementia using combinations of billing codes and medication data that are widely available from electronic health records (EHR). If the characteristics of misclassified patients are clearly identified, modifying existing algorithms to improve performance may be possible.
Objective:
To examine the performance of a code-based algorithm to identify dementia cases in the population-based Mayo Clinic Study of Aging (MCSA) where dementia diagnosis (i.e., reference standard) is actively assessed through routine follow-up and describe the characteristics of persons incorrectly categorized.
Methods:
There were 5,316 participants (age at baseline (mean (SD)): 73.3 (9.68) years; 50.7% male) without dementia at baseline and available EHR data. ICD-9/10 codes and prescription medications for dementia were extracted between baseline and one year after an MCSA dementia diagnosis or last follow-up. Fisher’s exact or Kruskal-Wallis tests were used to compare characteristics between groups.
Results:
Algorithm sensitivity and specificity were 0.70 (95% CI: 0.67, 0.74) and 0.95 (95% CI: 0.95, 0.96). False positives (i.e., participants falsely diagnosed with dementia by the algorithm) were older, with higher Charlson comorbidity index, more likely to have mild cognitive impairment (MCI), and longer follow-up (versus true negatives). False negatives (versus true positives) were older, more likely to have MCI, or have more functional limitations.
Conclusions:
We observed a moderate-high performance of the code-based diagnosis method against the population-based MCSA reference standard dementia diagnosis. Older participants and those with MCI at baseline were more likely to be misclassified.
INTRODUCTION
Dementia prevalence is expected to rise due to the aging of the population, potentially reaching over 130 million cases worldwide in 2050 [1]. This dementia burden forecast is a significant public health concern worldwide [2], as dementia has devastating consequences for the individual and significantly impacts families, communities, and healthcare economies [1, 3]. A considerable portion of patients who would meet diagnostic criteria are not diagnosed with dementia by a physician [4], resulting in missed opportunities to potentially reduce some of dementia’s consequences (e.g., early referral to care providers, evaluation for reversible causes of cognitive difficulties, and access to social or legal services) [5–7]. Patients and families may also have less time to prepare for the future early in the course of the disease, leading to delays in nursing home care, or in finding other resources to help older adults to age in place with resulting cost savings and better quality of life [5–7]. Missed opportunities to detect cognitive impairment may compromise patient safety, medication compliance, and comorbidities management [8]. Thus, identifying the characteristics of those with missed or delayed dementia diagnosis is important to improving timely dementia diagnoses in all patients.
Standard screening for dementia is not routinely performed in clinical practice, but algorithms have been developed to identify patients with dementia using combinations of billing codes and medication data that are widely available from health systems [9–11]. Such algorithms are appealing because they can be readily applied across health systems using existing billing and electronic health record (EHR) data. However, algorithms that use structured data have shown variable sensitivity for accurately identifying patients with dementia [2]. Additionally, there is limited information about the characteristics of patients incorrectly identified by these algorithms [6]. If the characteristics of misclassified patients are clearly identified, modifying existing algorithms to improve performance may be possible.
The aim of the study was to examine the performance of a slightly adjusted code- and prescription-based electronic algorithm developed based on previous work [9, 11], to identify missing cases of dementia in a routinely collected EHR database of participants of a population-based cohort where dementia was actively assessed through routine follow-up. In addition, we describe the characteristics of persons incorrectly categorized.
MATERIALS AND METHODS
Study population
Our analyses included participants in the Mayo Clinic Study of Aging (MCSA; ≥50 years old), a population-based, longitudinal study of cognitive aging, mild cognitive impairment (MCI), and dementia [12, 13]. In MCSA, participants are invited for follow-up visits every 15 months, following the same baseline evaluation protocol. Trained nurses review participants’ medical records to abstract data on chronic conditions, such as diabetes, hypertension, dyslipidemia, coronary artery disease, atrial fibrillation, congestive heart failure, stroke, and depression.
Study approval was obtained from the Institutional Review Boards of the Mayo Clinic and Olmsted Medical Center in Rochester, Minnesota. Participants of the MCSA provided written informed consent before participation. In the case of participants with cognitive impairment sufficient to interfere with capacity, assent was obtained from a legally authorized representative.
Identification of dementia in the MCSA cohort (reference standard)
At each MCSA visit, a study coordinator collected sociodemographic factors, and asked questions on memory, neuropsychiatric symptoms, and medications; the Clinical Dementia Rating scale [14] and the Functional Activities Questionnaire [15] were administered to an informant. A physician reviewed the medical history and performed a neurological examination, and nine neuropsychological tests were administered by a psychometrist to assess cognitive performance in four domains: memory, attention/executive function, language, and visuospatial skills [12]. After reviewing each participant’s information, the final diagnosis (cognitively unimpaired, MCI, or dementia) at each MCSA visit was adjudicated by consensus between the study coordinator, the physician, and a neuropsychologist. Evaluators were blind to any previous cognitive diagnosis. Individuals were diagnosed with dementia if they met the Diagnostic and Statistical Manual of Mental Disorders-IV criteria for dementia [16]. Individuals who performed in the normal range and did not meet the criteria for MCI [17] or dementia [16] were classified as cognitively unimpaired. An extensive medical record review occurred every five years for participants lost to follow-up to ascertain dementia when they reached 70 years of age. At baseline, the apolipoprotein E (APOE) genotype was assessed from blood drawn using standard methods [18]. The APOE ɛ4 carriers included participants with one or two copies of the ɛ4 allele (i.e., ɛ2/ɛ4, ɛ3/ɛ4, ɛ4/ɛ4).
Covariates
We estimated socioeconomic status using the HOUSES measure [19]. HOUSES is a single measure constructed using 4 items (number of bedrooms, number of bathrooms, square footage of the unit, and estimated building value of the unit) ascertained from the Olmsted County Assessor’s office [19]. The HOUSES index was calculated for participant address at baseline and was dichotomized into the lower Olmsted County quartile (Q1: lowest socioeconomic status) and the remaining Q2–Q4.
A severity weighted index of disease burden at baseline was calculated using a modified Charlson comorbidity index (Charlson Index) [20] score based on electronic diagnosis codes. BMI was calculated using height and weight [BMI = weight (kg) / [height (m)]2], both measured at MCSA baseline and was categorized into underweight (BMI < 18.5), healthy weight (BMI≥ 18.5 and BMI < 25), overweight (BMI≥ 25 and BMI < 30), and obesity (BMI≥ 30). The Beck Depression Inventory II [21] and the Beck Anxiety Inventory [22] were self-reported, and each consisted of 21 items. The BDI-II measures common symptoms of depression over the past two weeks, and the BAI measures common anxiety symptoms over the past week. An ordinal scale ranging from 0 to 3 is used to rate the severity of each item (total score 0 to 63); a higher score indicates higher severity of symptoms.
The Clinical Dementia Rating scale assesses impairment caused by cognitive loss. The Clinical Dementia Rating scale Sum of Boxes (CDR-SB) was computed by summing the scores for each of the domain boxes (score ranges from 0 to 18) [23]. The Functional Activities Questionnaire is a 10-item questionnaire that assesses instrumental activities of daily living (e.g., assembling tax records, shopping alone for groceries, writing checks, working on a hobby, turning off the stove after use, traveling out of the neighborhood, preparing a balanced diet; score ranges from 0 to 30). The Functional Activities Questionnaire was considered complete if 70% of the questions were answered. Higher scores for both measures indicated greater impairment.
Code-based algorithm for identification of dementia
We studied the performance of a slightly adjusted algorithm developed based on previous work to identify dementia from electronic health record data [9, 11].
Briefly, the algorithm included at least one of the following: 1) an International Classification of Diseases Ninth edition codes (290, 797, 2900, 2901, 2902, 2903, 2904, 2941, 2942, 2948, 3310, 3311, 29000, 29010, 29011, 29012, 29013, 29020, 29021, 29030, 29040, 29041, 29042, 29043, 2908, 29089, 29410, 29411, 29420, 29421, 29480, 331, 33111, 33119, 33182) or ICD-10 codes for dementia (F0150, F0151, F0280, F0281, F0390, F0391, F061, G300, G301, G308, G309, G3101, G3109, G3183),
For the current study, the EHR-derived billing codes and codes for prescription medication were extracted for the study population using the Rochester Epidemiology Project medical records linkage system indices [24] for MCSA participants without dementia at baseline and no dementia codes or medication within one year of MCSA baseline. Codes and medications were extracted between the MCSA baseline and one year after an MCSA dementia diagnosis or last follow-up/medical record review (in persons without a dementia diagnosis by MCSA).
Comparing MCSA and code-based algorithm
We compared the code-based dementia algorithm diagnosis to the reference standard MCSA dementia diagnosis to assess the sensitivity and specificity of the code-based algorithm, as well as the positive and negative predictive value. Participants were classified as positive by the code-based algorithm if they had a qualifying ICD-9/10 code or dementia medication between MCSA baseline and one year after an MCSA dementia diagnosis or last follow-up/medical record review (in persons without a dementia diagnosis by MCSA).
Statistical analyses
Participant characteristics were summarized using descriptive statistics (mean, standard deviation, median, interquartile range, count, percentage).
We calculated the percentage of MCSA participants that were classified as having dementia using the electronic codes algorithm and estimated the sensitivity, specificity, positive predictive value, and negative predictive value of the algorithm to identify dementia, using the MCSA diagnosis as the reference standard. We compared the characteristics of persons identified incorrectly (either as positive or negative cases) to persons identified correctly using Chi-Square or Kruskal Wallis tests, as appropriate.
Analyses were considered statistically significant at a p-value <0.05 and were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC).
RESULTS
Baseline characteristics
Characteristics of the study population are shown in Table 1 (N = 5,316; mean age (SD): 73.3 (9.68) years; 50.7% male). Most participants were white (98.0%) and not Hispanic or Latino (99.1 %), and 18.4% had a HOUSES score in the lowest socioeconomic status quartile. Overall, 739 persons (14.4%) in the cohort developed dementia during follow-up (follow-up time including time to last medical record review; (mean (SD): 7.2 (3.83) years), as assessed by MCSA. Two-thirds (68.4%) of the participants had hypertension, 78.3% had dyslipidemia, 29.6% had coronary artery disease, and 17.9% had diabetes. One-third (33.5%) of the participants were obese at the MCSA baseline, and 16.7% had a Functional Activities Questionnaire total score greater than zero at baseline. Only 5.7% of the participants (n = 303) had less than one year follow-up.
Baseline characteristics of study participants
N (%) unless otherwise specified. {n}, number of participants missing measurement; BDI-II, Beck Depression Inventory-II; BAI, Beck Anxiety Inventory; APOE, Apolipoprotein E; FAQ, Functional Activities Questionnaire; CDR, Clinical Dementia Rating. ∧From baseline to last in-person follow-up. ∼Either until last active follow-up visit or last medical record review date for participants who dropped from active follow-up. For participants diagnosed with dementia last medical record review is the date of dementia diagnosis. *Vascular disease burden includes diabetes mellitus, hypertension, stroke, history of atrial fibrillation, coronary heart disease, congestive heart failure, dyslipidemia, and peripheral vascular disease with a range of 0–8.
Performance of code-based dementia algorithm
Five hundred and twenty participants were correctly diagnosed as having dementia (i.e., true positives) and 4,369 as not having dementia (i.e., true negatives) (Table 2) compared to the reference standard diagnosis. Overall, 219 persons were diagnosed as having dementia by MCSA, but this diagnosis was not captured by the algorithm (i.e., false negatives), and 208 were considered free of dementia by MCSA but were identified as having dementia by the algorithm (i.e., false positive). The sensitivity of the code-based algorithm was 0.70 (95% CI: 0.67, 0.74), and the positive predictive value was 0.71 (95% CI: 0.68, 0.75). Specificity and negative predictive value were high (both, 0.95, 95% CI: 0.95, 0.96). The MCSA diagnosis and the code-based algorithm diagnosis agreed in 4,889 participants [92%; (true positives + true negatives)/(total number)].
Identification of dementia using the code-based algorithm compared to the Mayo Clinic Study of Aging diagnosis (reference standard)
CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value. Participants are positive by the code-based algorithm if they had a qualifying event between MCSA baseline and MCSA dementia diagnosis/last follow-up + 1 year; participants were free of dementia at MCSA baseline with no qualifying ICD9/10 codes or dementia medications within 1 year of baseline.
All four measures did not change appreciably when estimated for males and females or when we only included participants 65 years old or older (Table 2).
Differences in characteristics of correctly and incorrectly identified participants
We observed that the algorithm failed to identify participants with dementia (i.e., were false negatives) in persons who were older, had less education, higher frequency of MCI at baseline, and worse Functional Activities Questionnaire scores and CDR-SB scores compared to those who were correctly identified (i.e., were true positives; Table 3).
Baseline characteristics of MCSA participants by the code-based algorithm and MCSA diagnosis
N(%) unless otherwise specified. {n}, number of participants missing measurement; BDI-II, Beck Depression Inventory-II; BAI, Beck Anxiety Inventory; FAQ, Functional Activities Questionnaire; CDR, Clinical Dementia Rating. ∧Participants are positive by the code-based algorithm if they had a qualifying event between MCSA baseline and MCSA dementia diagnosis/last follow-up + 1 year; participants were free of dementia at MCSA baseline with no qualifying ICD9/10 codes or dementia medications within 1 year of baseline. *Kruskal-Wallis or Chi-Square test p-value. #Vascular disease burden including diabetes mellitus, hypertension, stroke, history of atrial fibrillation, coronary heart disease, congestive heart failure, dyslipidemia, and peripheral vascular disease with a range of 0–8. ±From baseline to last in-person follow-up. ∼Either until last active follow-up visit or last medical record review date for participants who dropped from active follow-up. For participants diagnosed with dementia last medical record review is the date of dementia diagnosis.
On the other hand, persons who were incorrectly identified as having dementia by the code-based algorithm (i.e., were false positives) were older, more likely to have MCI, had a higher Beck Anxiety Inventory score, and had more comorbidities than true negative participants (Table 3).
DISCUSSION
Using the dementia diagnosis of the population-based MCSA as the reference standard, the code-based algorithm had a sensitivity of 70%, and the positive predictive value was 71%, while specificity and negative predictive value were both 95%. Almost 30% of the dementia cases (219 of 739) were not captured by the algorithm. Individuals falsely diagnosed as having dementia by the algorithm (i.e., false positives) were older and more likely to have MCI at baseline and higher comorbidities burden than true negative participants. In addition, false negative participants were older, were more likely to have MCI at baseline, higher Functional Activities Questionnaire score, and Clinical Dementia Rating sum of boxes at baseline than those who were correctly identified with dementia (i.e., true positives).
Previous studies that validated dementia codes in routinely collected healthcare data had variable findings reflecting differences in methodologies, settings, and datasets used [2]. In a recent systematic review of 27 studies (using hospital, primary care, insurance, or prescription data) [2], positive predictive value for all-cause dementia ranged from 33% to 100% (with 16 of 27 studies having a positive predictive value greater than 75%) and sensitivity ranged from 21% to 86%; the US insurance data provided the highest sensitivity (vs. clinical dementia as assessed by the ADAMS cohort in 2001–2003). In our study, there was a little higher positive predictive value in females, although the present study in agreement with others [6], did not find a statistically significant difference in sex between the comparison groups.
Present findings partially agree with findings from the ADAMS study [25] (a substudy of the Health and Retirement Study (HRS) that represented the US population aged 71 and older between 2001–2003). The ADAMS study found that those with undiagnosed dementia were older, with fewer years of education, more likely male, unmarried and had less severe dementia than those with a dementia diagnosis [25]. Similarly, the NHATS study (a population-based, nationally representative cohort of Medicare beneficiaries aged ≥65 in the continental USA) [6], found that undiagnosed patients (39.5% of those with probable dementia) have lower education and less functional impairment. However, they did not find differences in dementia underdiagnosis with sex. Variability in findings could also be attributed to differences both in the assessment of dementia or probable dementia diagnosis and/or the comparison groups chosen (e.g., informant recollection or Medicare claim-based diagnosis) [6, 25].
A systematic literature review and meta-analysis, including 23 studies from around the world reporting the proportion of undetected dementia and/or its determinants, suggested that male sex, younger age (<70 years), living in a community setting (vs. residential/nursing home), and diagnosis by a general practitioner, were associated with a higher rate of dementia underdetection while using the Mini-Mental State Examination (MMSE) diagnosis criteria was associated with a lower underdetection rate [26].
A timely dementia diagnosis can improve patient outcomes, reduce safety risks, and provide the opportunity to enroll in clinical trials or receive disease-modifying therapies as they become available, prevent complications, and allow time for patients and families for future planning [6]. In addition, understanding the characteristics of patients failing to be diagnosed with dementia could help target specific populations, educate providers and families, and support patients and their caregivers better [6].
Our study compared the performance of a code-based electronic algorithm to identify dementia with a population-based cohort where dementia is actively assessed through routine study follow-up. The evaluation revealed a moderate-high performance of the code-based diagnosis method. The moderate sensitivity and positive predictive value indicated potential EHR data quality issues related to the coding and documentation of dementia cases during routine practice care. Omitted diagnosis could be a documentation mistake or a deliberate decision, and often we cannot assess whether documented dementia cases were timely diagnosed [7], and reveal the nature of the real-world EHR environment that data documentation and collection purpose is for billing rather than clinical research. Thus, it is not possible to completely eliminate data missingness in EHRs. However, we need to note that a high positive predictive value is desirable, as a high proportion of those identified as persons with dementia in routinely collected EHR data would be true dementia cases; to maximize statistical power and minimize selection bias in case ascertainment, a high sensitivity is also preferred [2].
A strength of our study was application of an electronic algorithm to real-world EHR data on a cohort of participants with actively assessed dementia. Overall, this algorithm demonstrated a moderately-high validity in identifying dementia cases. When a reference standard data set such as MCSA is available, data quality assessment and statistical bias adjustment can be conducted to improve the validity of study outcomes from EHR data. For example, Tong et al. [27] proposed a statistical method to reduce data quality bias by combining the strengths of a small reference-standard validation set (e.g., MCSA cohort) of patients and large real-world EHR data, but it is potentially error-prone. When ICD codes are not routinely coded and/or underdiagnosed and underreported, text information such as patient clinical notes can be leveraged to improve the detection of conditions in clinical practice [28].
MCSA is one of the few large population-based cohort studies with detailed serial neurocognitive evaluations that randomly selects participants from a geographically defined area covered completely by the Rochester Epidemiology Project medical record linkage systems in Olmsted Cunty (MN), which qualifies MCSA diagnosis as the reference standard for validating EHR dementia diagnosis and a major strength of the current study.
Limitations of the study should also be noted. Cases that received medication for other than dementia conditions (e.g., memantine use in anxiety disorders) [9] may have been captured; thus, classification error by the present algorithm could have occurred. Omissions in documentation by the care provider in the EHR might result in the misclassification of individuals. In the event of a research diagnosis of dementia, MCSA informs the participants’ primary care team, and this information is likely to influence further investigation by the primary care team and possibly result in more true positive cases. Most of the study participants were White (98%) and not Hispanic or Latino (99.1%). Therefore, assessing the performance of the algorithm in diverse populations is warranted. A longer follow-up will capture more dementia cases in this cohort of older adults; therefore, our estimates of the algorithm performance precision might be conservative and we would expect the confidence intervals to narrow as additional cases accumulate over time.
Building dementia detection EHR algorithms is an area of research that is still underdeveloped [2]. We observed a moderate-high performance of the code-based diagnosis method against the population-based MCSA reference standard dementia diagnosis; inaccurately categorized (false positive or false negative) participants were more likely to be older and with MCI. Future algorithms could also take into consideration the characteristics of misclassified participants, and algorithms may need to be developed specifically for populations for which the algorithm performed less well (e.g., dementia cases were missed more often in persons with stroke or more functional limitations). Additional methodologies like the natural language processing [29] using free-text in EHR could complement code-based algorithms and help increase the accuracy of a dementia diagnosis in routinely collected EHR data. As routinely collected EHR data are becoming more available, studies need to examine further the characteristics of incorrectly identified patients, the barriers to accurate EHR documentation, and the development of effective dementia detection EHR algorithms.
Footnotes
ACKNOWLEDGMENTS
The study was supported by the National Institution of Aging (R01 AG068007). The Mayo Clinic Study of Aging was supported by the National Institutes of Health (U01 AG006786, P30 AG062677, R37 AG011378, R01 AG041851, R01 NS097495), the Alexander Family Alzheimer’s Disease Research Professorship of the Mayo Clinic, the Mayo Foundation for Medical Education and Research, the Liston Award, the GHR Foundation, the Schuler Foundation, and used the resources of the Rochester Epidemiology Project (REP) medical records linkage system, which is supported by the National Institute on Aging (NIA: AG058738), by the Mayo Clinic Research Committee, and by fees paid annually by REP users. The content of this article is solely the responsibility of the authors and does not represent the official views of the National Institutes of Health (NIH) or the Mayo Clinic. The funding sources had no role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; and in the decision to submit the article for publication.
CONFLICT OF INTEREST
Sunyang Fu, Muskan Garg, and Luke R. Christenson report no disclosures. Jennifer St. Sauver and Sunghwan Sohn receive research support from the NIH. Maria Vassilaki has received research funding from F. Hoffmann-La Roche Ltd and Biogen in the past and consulted for F. Hoffmann-La Roche Ltd; she receives currently research funding from NIH and has equity ownership in Johnson and Johnson, Medtronic, Merck, and Amgen; Maria Vassilaki is an Editorial Board Member of this journal but was not involved in the peer-review process nor had access to any information regarding its peer-review. Ronald C. Petersen serves as a consultant for Roche, Inc., Eisai, Inc., Genentech, Inc. Eli Lilly, Inc., and Nestle, Inc., served on a DSMB for Genentech, receives royalties from Oxford University Press and UpToDate, and receives NIH funding.
DATA AVAILABILITY
The study makes de-identified data available to qualified researchers upon reasonable request.
