Abstract
Introduction
ADHD is the most common childhood neurodevelopmental disorder, with a prevalence of 5% to 10% among 4- to 17-year-olds in the United States and 5% worldwide (Polanczyk, Willcutt, Salum, Kieling, & Rohde, 2014; Thomas, Sanders, Doust, Beller, & Glasziou, 2015). ADHD is currently defined by persistent patterns of inattention and/or hyperactivity/impulsivity that are inconsistent with one’s developmental level, usually continue across the life span, and lead to impairments in social, educational, and work activities (American Psychiatric Association, 2013; Barkley, 1997; Barkley, Fischer, Smallish, & Fletcher, 2002; Levy, 2014).
The current best practice method for diagnosing ADHD in childhood is a comprehensive approach that combines information from multiple methods and informants, including parent and teacher ratings and clinical information derived from parent interviews (DuPaul et al., 2016; Goldman, Genel, Bezman, & Slanetz, 1998; Willcutt, 2012). These methods, however, generally are not feasible to implement consistently in high-volume primary care settings (Epstein et al., 2014), which may preclude their use in large-scale clinical or population-based studies of ADHD. Such studies, which play an important role in addressing key ADHD-related research questions, frequently use existing administrative databases such as medical or insurance claims to identify patients with ADHD or to classify ADHD status among study patients (Chen et al., 2014; Christensen, Sasané, Hodgkins, Harley, & Tetali, 2010; Guevara, Lozano, Wickizer, Mell, & Gephart, 2001, 2002; Leibson, Katusic, Barbaresi, Ransom, & Brien, 2001; Mandell, Guevara, Rostain, & Hadley, 2003; Zima et al., 2010). These databases are efficient research resources: they contain diagnostic and health service information on large numbers of individuals, are quick and relatively inexpensive to access, and can be linked with other data sets to examine relationships between health services and outcomes (Burke et al., 2014; Leibson et al., 2001; Losina, Barrett, Baron, & Katz, 2003; Mandell et al., 2003; Roos, Walld, Wajda, Bond, & Hartford, 1996; Virnig & McBean, 2001). In particular, with the advent of the “meaningful use” criteria released by the Department of Health and Human Services and recent government investment to incentivize the use of electronic health records (EHRs; Jha, 2010), the amount of EHR data readily available for research purposes is growing rapidly and is likely to increase substantially in the future.
One potential limitation of using EHR data for research is the uncertain validity of these data: they are not collected for research purposes, may not be collected systematically, and may be incomplete or contain coding errors (Van Walraven & Austin, 2012; Virnig & McBean, 2001; Walkup & Yanos, 2005). To address this concern, several prior studies have examined different administrative databases (Medicaid and Medicare claims, pharmacy claims, or institution-level EHRs) to assess the validity of diagnoses of psychiatric conditions (Walkup & Yanos, 2005), including depression (Rawson, Malcolm, & D’Arcy, 1997; Spettell et al., 2003), schizophrenia (Lurie, Popkin, Dysken, Moscovice, & Finch, 1992; Rawson et al., 1997), and autism (Burke et al., 2014). Results have been mixed: some studies found case definitions with high specificity but low sensitivity and positive predictive value (PPV; Lurie et al., 1992; Spettell et al., 2003), whereas others found high PPVs or a high proportion of agreement between hospital and medical chart data (Burke et al., 2014; Rawson et al., 1997). As with ADHD, these mental health conditions are diagnosed in community settings using labor-intensive best practice methods that are difficult to implement in high-volume primary care settings. Validation studies of administrative data are therefore vital for understanding potential misclassification of ADHD status and for allowing researchers to estimate the magnitude and direction of misclassification bias in applied studies.
Despite the potential utility of EHR data in epidemiologic studies of ADHD, only two studies have assessed the utility of EHRs in identifying children diagnosed with ADHD (Daley et al., 2014; Guevara et al., 2001). Daley et al. (2014) calculated confirmation rates, that is, the percentage of ADHD diagnoses indicated in EHRs that were confirmed via medical record review. Within the context of a larger study determining the cost of health care for those with ADHD, Guevara et al. (2001) reviewed the medical charts of 70 randomly selected patients with an ADHD diagnosis to assess the accuracy of the diagnosis and to conduct sensitivity analyses. To our knowledge, no study has examined the sensitivity and specificity of an algorithm to identify ADHD cases using existing administrative EHR data. Thus, our objective was to validate an EHR-based algorithm to classify ADHD status among a cohort of primary care patients in the Children’s Hospital of Philadelphia (CHOP) pediatric health care network. This internal validation study was part of a larger retrospective cohort study that aimed to examine the association between ADHD and driving outcomes among adolescents.
Method
CHOP EHR
The CHOP health care network encompasses more than 50 locations throughout southeastern Pennsylvania and southern New Jersey, including 31 primary care centers, 14 specialty care centers, a 535-bed inpatient hospital, two emergency departments, and two urgent care centers supporting more than one million visits annually. The network serves a socioeconomically and racially diverse population and accepts most insurance plans, including Medicaid. CHOP first implemented its EHR system (EpicCare®, Epic Systems, Inc, Madison, WI) in 2001, and it is currently used for all aspects of clinical care in all outpatient locations and within the hospital. For every hospital encounter or office visit, the CHOP EHR captures (a) identifiable data (e.g., name, date of birth, address); (b) demographic data (e.g., sex, race/ethnicity); (c) International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis codes that indicate the reason(s) for the visit (required for a provider to bill for services under that code); (d) ICD-9-CM diagnosis codes that document ongoing or historical problems (known as the “problem list”); (e) medications prescribed by CHOP or non-CHOP providers; and (f) provider notes, letters, communication with non-CHOP health care providers, and other qualitative descriptions of each visit that may provide important supplemental information on ADHD diagnosis (e.g., results of diagnostic testing, documentation of parent and teacher reports).
ADHD Classification: Algorithm Development
We first conducted an initial analysis of EHR data to develop an algorithm to identify whether each patient had been diagnosed with ADHD. To this end, we identified all patients of the CHOP health care network who were born in 1987-1995 and were New Jersey residents at their last CHOP visit (N = 113,966). We then identified those who met the following hierarchical criteria and, at each level, extracted a random sample of records for manual chart abstraction: (a) had an ICD-9-CM diagnosis code beginning with “314” (i.e., indicative of ADHD) at a CHOP office or hospital visit (n = 7,567; 172 sampled), (b) had an ICD-9-CM diagnosis code beginning with “314” on their problem list (n = 302; 30 sampled), (c) were prescribed a medication with a primary indication for ADHD in the absence of an ADHD diagnostic code (n = 880; 88 sampled), and (d) had selected ADHD-related keywords (e.g., “ADHD,” “hyperactivity”) in provider progress notes or qualitative descriptions of visits in the absence of an ADHD diagnostic code (n = 3,055; 161 sampled). This final step was a search for key terms within provider notes, not a formal natural language processing evaluation. With oversight by two study authors (BY, TP), two trained abstractors manually reviewed each sampled patient’s entire EHR to search for independent sources of information that confirmed ADHD case or non-case status. These sources included letters, patient history, evaluations, appointment notes, telephone notes, problem list addendums, uploaded media (e.g., scanned prescriptions, external evaluations), laboratory results, emergency department notes, and office or hospital visits occurring on a different calendar date or with a different physician (i.e., a separate visit) from the visit at which the ICD-9-CM code was originally entered. Very few patients selected based on visit- or problem list-level ICD-9-CM codes were found to be non-cases (<2% and 0%, respectively).
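The hierarchical criteria (a) through (d) amount to a first-match rule: each patient falls into the highest level whose condition they meet. The Python sketch below illustrates that logic under stated assumptions; it is not the study's extraction code. The `PatientSummary` fields, the function name `development_stratum`, and the two-keyword list are hypothetical (the text gives “ADHD” and “hyperactivity” only as examples), and the keyword step is plain case-insensitive substring matching, consistent with the note that it was not a natural language processing evaluation. The per-level counts are those reported above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

ADHD_PREFIX = "314"                   # ICD-9-CM codes indicative of ADHD
KEYWORDS = ("adhd", "hyperactivity")  # hypothetical subset; the full keyword list is not given


@dataclass
class PatientSummary:
    """Hypothetical per-patient summary; field names are not the CHOP EHR schema."""
    visit_codes: List[str] = field(default_factory=list)         # codes on office/hospital visits
    problem_list_codes: List[str] = field(default_factory=list)  # codes on the problem list
    adhd_medication: bool = False  # prescribed a drug whose primary indication is ADHD
    note_text: str = ""            # concatenated provider notes and visit descriptions


def development_stratum(p: PatientSummary) -> Optional[str]:
    """Return the first hierarchical level (a)-(d) the patient meets, or None."""
    if any(c.startswith(ADHD_PREFIX) for c in p.visit_codes):
        return "a"
    if any(c.startswith(ADHD_PREFIX) for c in p.problem_list_codes):
        return "b"
    # Levels (c) and (d) are reached only when no 314.x code exists anywhere.
    if p.adhd_medication:
        return "c"
    text = p.note_text.lower()
    if any(k in text for k in KEYWORDS):
        return "d"
    return None


# (patients meeting the level, records sampled), as reported in the text
SAMPLED: Dict[str, Tuple[int, int]] = {
    "a": (7567, 172), "b": (302, 30), "c": (880, 88), "d": (3055, 161),
}

for level, (n, k) in SAMPLED.items():
    print(f"level {level}: sampled {k}/{n} = {k / n:.1%}")
```

Running the loop prints sampling fractions of about 2.3%, 9.9%, 10.0%, and 5.3%, so the large visit-coded level was sampled much more sparsely than the smaller levels.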
In addition, neither our keyword search in provider notes nor a medication prescription with primary indication for ADHD in the absence of ICD-9-CM diagnostic codes yielded additional confirmed cases of ADHD. Based on these results, we developed the following final algorithm to identify patients who had an ADHD diagnosis in our cohort study: Patients were classified as having ADHD if their EHR indicated an ICD-9-CM diagnostic code beginning with “314” either at a CHOP office or hospital visit or on their problem list.
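The final rule is simple enough to state as code. This Python sketch assumes that diagnosis codes are stored as strings such as `"314.01"`, with or without the decimal point; the function names are hypothetical. Medication and keyword evidence are deliberately ignored, reflecting the finding that they yielded no additional confirmed cases.

```python
from typing import Iterable

ADHD_PREFIX = "314"  # ICD-9-CM category 314: hyperkinetic syndrome of childhood


def has_adhd_code(codes: Iterable[str]) -> bool:
    """True if any ICD-9-CM code begins with '314' (whitespace and decimal point ignored)."""
    return any(c.strip().replace(".", "").startswith(ADHD_PREFIX) for c in codes)


def classify_adhd(visit_codes: Iterable[str], problem_list_codes: Iterable[str]) -> bool:
    """ADHD if a 314.x code appears on any office/hospital visit OR on the problem list."""
    return has_adhd_code(visit_codes) or has_adhd_code(problem_list_codes)


print(classify_adhd(["V20.2", "314.01"], []))  # True: visit-level code
print(classify_adhd([], ["314.9"]))            # True: problem-list code only
print(classify_adhd(["313.81", "315.9"], []))  # False: comorbid codes only
```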
Formal Internal Validation Study
The aim of this validation study was to develop an applied method to classify ADHD status for a retrospective cohort study examining the association between ADHD and motor vehicle crash outcomes. Eligibility for this formal validation study mirrored the criteria for the larger applied study. Eligible patients were born in 1987-1995, had two or more office visits at any of the six CHOP primary care practices located in New Jersey, were New Jersey residents at the time of their last CHOP network visit, were not identified as having an intellectual disability, and had their last primary care visit at age 12 years or older, to ensure that each individual was seen by a primary care provider at an age old enough for ADHD status to be established (Diagnostic and Statistical Manual of Mental Disorders [5th ed.; DSM-5; American Psychiatric Association, 2013] criteria call for ADHD symptom onset by age 12). The final study cohort included 15,609 patients (Figure 1). Using the algorithm described above, 2,030 (13.0%) of these patients were classified as having ADHD.
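The eligibility criteria translate directly into a filter. In this Python sketch the `Candidate` record and its field names are hypothetical; a real implementation would derive each field from the EHR extract.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-patient fields needed for the cohort criteria."""
    birth_year: int
    nj_primary_care_visits: int            # office visits at the six NJ CHOP primary care practices
    nj_resident_at_last_visit: bool        # residence at the last CHOP network visit
    intellectual_disability: bool
    age_at_last_primary_care_visit: float  # years


def is_eligible(c: Candidate) -> bool:
    """Apply the formal validation study's inclusion criteria."""
    return (
        1987 <= c.birth_year <= 1995
        and c.nj_primary_care_visits >= 2
        and c.nj_resident_at_last_visit
        and not c.intellectual_disability
        and c.age_at_last_primary_care_visit >= 12  # DSM-5 onset criterion: by age 12
    )


# Share of the final cohort flagged by the 314.x algorithm, as reported in the text
print(f"{2030 / 15609:.1%}")  # 13.0%
```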

Figure 1. Flowchart depicting selection of study cohort.
We conducted an EHR review for all patients who were classified with ADHD and for a weighted random sample of patients classified as non-ADHD, with the goal of confirming the presence or absence of ADHD in this cohort. As in the algorithm development phase, two trained abstractors reviewed patient EHRs with oversight by two study authors (BY, TP). The review process for the 2,030 patients classified as having ADHD is shown in Figure 2. Using EHR data, the following patients were automatically confirmed as having ADHD: 940 patients who had three or more ADHD-related CHOP visits and an additional 284 who were prescribed an ADHD medication by a CHOP provider (as medication was an independent source of confirmation). The remaining 806 patients had either one or two ADHD-related CHOP visits or a problem list diagnosis, and no history of being prescribed an ADHD medication. For each of these patients, abstractors were instructed to manually review the entire EHR to locate any of the aforementioned independent sources confirming ADHD case status. When the EHR contained contradictory information, abstractors gave greater weight to letters, evaluations, and appointment notes in making a final diagnostic decision. A patient was determined to be a “confirmed ADHD case” when at least one independent source confirming the ADHD diagnosis was recorded. Similarly, if an independent source indicated that the patient did not have ADHD, the patient was determined to be a “confirmed non-ADHD case,” with the most recent independent source given greater weight. Only one patient had information confirming both ADHD and non-ADHD status; this patient was classified as a confirmed non-ADHD case because one letter and two patient forms documented no history of ADHD. Finally, patients for whom no source confirming either the presence or the absence of ADHD was located were considered to have an unknown case status.
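The decision sequence for the ADHD cohort can be summarized as a small function. This Python sketch is a simplification under stated assumptions: the `Source` record and its fields are hypothetical, and reducing abstractor judgment to “prefer letters, evaluations, and appointment notes, then the most recent source” is only one possible encoding of the weighting described above.

```python
from enum import Enum
from typing import List, NamedTuple


class Status(Enum):
    ADHD = "confirmed ADHD"
    NON_ADHD = "confirmed non-ADHD"
    UNKNOWN = "unknown"


class Source(NamedTuple):
    """Hypothetical independent source located during manual review."""
    year: int
    kind: str            # e.g., "letter", "evaluation", "telephone note"
    confirms_adhd: bool  # True: supports ADHD; False: documents no ADHD


PRIORITY_KINDS = {"letter", "evaluation", "appointment note"}


def review_status(adhd_visits: int, chop_adhd_medication: bool,
                  sources: List[Source]) -> Status:
    # Automatic confirmation: >= 3 ADHD-related visits or a CHOP ADHD prescription.
    if adhd_visits >= 3 or chop_adhd_medication:
        return Status.ADHD
    if not sources:
        return Status.UNKNOWN  # no independent source either way
    # Conflicts: priority source types first, then the most recent source.
    decisive = max(sources, key=lambda s: (s.kind in PRIORITY_KINDS, s.year))
    return Status.ADHD if decisive.confirms_adhd else Status.NON_ADHD


REMAINING = 2030 - 940 - 284  # patients sent to manual review
print(REMAINING)  # 806
```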
Twenty-four percent of records were reviewed by both abstractors to assess interrater reliability. All disagreements were reviewed at consensus meetings, and final decisions were made by the two co-authors (BY, TP).
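Interrater reliability of this kind is commonly summarized with Cohen’s kappa, which the Results report alongside raw agreement. A minimal Python version for two raters and any number of categories follows; the ratings in the example are made up for illustration and are not study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater1: Sequence[Hashable], rater2: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters on the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Agreement expected if each rater assigned categories independently at their own rates
    p_expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category: kappa undefined, treated as 1 here
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings for 10 records (not study data)
r1 = ["ADHD"] * 6 + ["inconclusive"] * 4
r2 = ["ADHD"] * 5 + ["inconclusive"] * 5
print(round(cohens_kappa(r1, r2), 2))  # 0.8: 90% raw agreement, 50% expected by chance
```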

Figure 2. Flowchart depicting the manual review process used to confirm case status in the ADHD cohort.
We also reviewed a sample of 807 patients in the non-ADHD cohort to determine whether there were sources in their medical records indicating the presence of an ADHD diagnosis. To optimize the rigor of our assessment, we oversampled patients with visit- or problem list-level ICD-9-CM codes for disruptive behavior disorder (“313.81” or “312.X”) and learning disorder (“315.X”), conditions frequently comorbid among children with ADHD; approximately 70% of patients in each of these groups were sampled. In addition, we reviewed a 3.5% sample of the 13,005 patients who had none of these comorbid conditions. Sample sizes were based on power calculations; for each sampled group, we had at least 80% power to detect a 5% misclassification rate with a confidence interval (CI) width of 2%.
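Because the groups were sampled at very different rates (about 70% vs. 3.5%), reviewed records cannot simply be pooled; one standard approach weights each record by the inverse of its group’s sampling fraction before projecting to the full non-ADHD cohort. The Python sketch below shows this inverse-probability (Horvitz-Thompson-style) estimate. The stratum counts are invented round numbers chosen to make the arithmetic easy to follow; they are not the study’s strata.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stratum:
    name: str
    population: int  # algorithm-negative patients in this stratum
    sampled: int     # records manually reviewed
    found_adhd: int  # reviewed records documenting an ADHD diagnosis


def projected_missed_cases(strata: List[Stratum]) -> float:
    """Each reviewed record stands for population / sampled patients in its stratum."""
    return sum(s.found_adhd * s.population / s.sampled for s in strata)


toy = [
    Stratum("comorbid code (oversampled)", population=200, sampled=140, found_adhd=7),
    Stratum("no comorbid code", population=1000, sampled=100, found_adhd=2),
]
missed = projected_missed_cases(toy)
total = sum(s.population for s in toy)
print(missed, f"{missed / total:.1%}")  # 30.0 2.5%
```

Pooling the toy reviews naively would give 9/240 = 3.75%, overstating misclassification because the oversampled comorbid stratum would count as much as the far larger unsampled remainder.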
Statistical Analysis
We estimated the proportion of patients with ADHD whom our classification algorithm correctly identified as having ADHD (i.e., sensitivity), the proportion of patients without ADHD whom the algorithm correctly identified as not having ADHD (i.e., specificity), the proportion of patients classified as ADHD by the algorithm who had a confirmed ADHD diagnosis (i.e., PPV), and the proportion of patients classified as non-ADHD by the algorithm who were confirmed non-ADHD patients (i.e., negative predictive value [NPV]), along with exact 95% CIs calculated using standard methods for proportions. Among patients classified as having “unknown ADHD case status” during the review of our ADHD cohort, we identified two distinct groups. The first comprised patients whose last CHOP primary care visit occurred in or before 2005, the year the EHR was fully implemented in New Jersey primary care practices; these patients were considered to have “incomplete records” because their EHR contained very little information beyond visit dates and diagnosis codes. The second group comprised patients who had visits after 2005 but whose EHR contained no information confirming the presence or absence of ADHD; we classified these patients as “inconclusive.” In analyses, patients with incomplete records were assumed to have the same distribution of ADHD, non-ADHD, and inconclusive status as patients for whom a case status had been determined. Validation analyses were conducted under two assumptions about patients with inconclusive status, representing the extremes: (a) all were true ADHD cases, and (b) all were true non-ADHD cases. We then weighted the manual review results for the sample of non-ADHD patients to the entire non-ADHD cohort. In addition to estimating validity for the overall sample, we estimated the PPV separately for each origin of diagnostic codes (visit-level vs. problem list).
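The handling of unknown case status and the two extreme scenarios can be made concrete in a few lines. In this Python sketch, incomplete-record patients are spread across the determined categories in proportion to their sizes, and inconclusive patients are then counted as true positives (scenario a) or false positives (scenario b). The inputs come from the Results: 1,600 confirmed, 25 non-ADHD, and 405 unknown (split here as approximately 105 incomplete and 300 inconclusive from the reported 26% and 74%), plus a weighted estimate of 67 missed cases among the 13,579 patients the algorithm classified as non-ADHD (15,609 minus 2,030). The outputs fall within about 0.01 of the published estimates; the small differences presumably reflect the approximate split and computational details not described in the text, so this is an illustration of the logic rather than a reproduction. Function names are hypothetical.

```python
from typing import Dict


def redistribute_incomplete(adhd: float, non_adhd: float, inconclusive: float,
                            incomplete: float) -> Dict[str, float]:
    """Spread incomplete-record patients across determined statuses in proportion to their sizes."""
    scale = 1 + incomplete / (adhd + non_adhd + inconclusive)
    return {"adhd": adhd * scale, "non_adhd": non_adhd * scale,
            "inconclusive": inconclusive * scale}


def validity(tp: float, fp: float, fn: float, tn: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from (possibly weighted) 2x2 counts."""
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp), "npv": tn / (tn + fn)}


def scenario(algo_positive: Dict[str, float], fn: float, tn: float,
             inconclusive_is_adhd: bool) -> Dict[str, float]:
    """Count inconclusive patients as true positives (a) or as false positives (b)."""
    incon = algo_positive["inconclusive"]
    tp = algo_positive["adhd"] + (incon if inconclusive_is_adhd else 0.0)
    fp = algo_positive["non_adhd"] + (0.0 if inconclusive_is_adhd else incon)
    return validity(tp, fp, fn, tn)


# Algorithm-positive cohort (n = 2,030); 105/300 is an approximate split of the 405 unknowns
pos = redistribute_incomplete(adhd=1600, non_adhd=25, inconclusive=300, incomplete=105)
FN = 67          # weighted estimate of ADHD patients the algorithm classified as non-ADHD
TN = 13579 - FN  # remaining algorithm-negative patients

for label, flag in (("(a) inconclusive = ADHD", True), ("(b) inconclusive = non-ADHD", False)):
    print(label, {k: round(v, 3) for k, v in scenario(pos, FN, TN, flag).items()})
```

Sensitivity barely moves between the scenarios (about 0.968 vs. 0.962), whereas PPV falls from about 0.987 to 0.831, which mirrors the pattern in the Results: the inconclusive group matters mainly for PPV.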
Analyses were conducted in SAS Version 9.3 (SAS Institute, Inc., Cary, NC). This study was approved by the CHOP institutional review board.
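The analyses were run in SAS; outside SAS, the exact (Clopper-Pearson) interval for a binomial proportion, which is what exact CIs for a proportion usually denote, can be computed with the standard library alone by inverting the binomial tail probabilities. This Python sketch assumes integer counts; how intervals were obtained for weighted or redistributed counts is not described in the text, so it is illustrative only.

```python
import math
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summing terms in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_n_fact = math.lgamma(n + 1)
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        total += math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                          + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def solve_increasing(g: Callable[[float], float], iters: int = 60) -> float:
    """Bisection on (0, 1) for an increasing g that changes sign from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) CI for a proportion with x successes out of n."""
    # Lower bound solves P(X >= x | p) = alpha/2; this tail probability increases with p.
    lower = 0.0 if x == 0 else solve_increasing(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2; this tail probability decreases with p.
    upper = 1.0 if x == n else solve_increasing(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


print(clopper_pearson(5, 10))  # approximately (0.187, 0.813)
```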
Results
Overall, patients were last seen at CHOP primary care at a median age of 17.9 years (interquartile range [IQR] = 15.9, 19.1 years; Table 1). Patients in the ADHD cohort had a median of 27 (16, 44) office visits to the CHOP network compared with 18 (9, 31) for patients in the non-ADHD cohort. The majority of patients in both cohorts were non-Hispanic White (ADHD: 69%; non-ADHD: 59%), and patients with ADHD were more likely to be male (72% vs. 47%, p < .001).
Table 1. Demographic and Relevant Characteristics for Study Participants in ADHD and Non-ADHD Cohorts (N = 15,609).
Note. Unless otherwise indicated, data are expressed as number (percentage) of participants. Denominators for percentages exclude missing data. Percentages are rounded and may not total to 100. IQR = interquartile range.
A total of 806 EHRs of patients classified as having ADHD were manually reviewed (Figure 2). Of these, we were able to confirm the presence of ADHD in 376; combined with the patients automatically confirmed on the basis of three or more visits or medication use, a total of 1,600 of the 2,030 (79%) were confirmed cases of ADHD. Only 1% of patients were confirmed as non-ADHD cases (n = 25), and the remaining 405 (20%) patients had an unknown case status. Of these 405 patients, 26% had incomplete records and 74% were labeled inconclusive.
Of the 807 patients in the non-ADHD group who were manually reviewed, 9 (1%) were misclassified as non-ADHD patients when in fact they had been diagnosed with ADHD. After weighting these results to the full non-ADHD cohort, we estimated that 67 (0.5%) patients in the non-ADHD cohort had in fact been diagnosed with ADHD.
The overall interrater agreement for ADHD case classification was 96.4%, with a kappa of 0.87 (95% CI = [0.75, 0.99]). We also estimated the sensitivity, specificity, PPV, and NPV of the ADHD algorithm under the two assumptions (Table 2). When all inconclusive cases were assumed to be true ADHD cases, the overall sensitivity was 0.97 (95% CI = [0.96, 0.97]), the specificity was 0.99 (95% CI = [0.99, 0.99]), and the PPV was 0.98 (95% CI = [0.98, 0.99]). Estimates among patients classified via visit-level codes and those classified via problem list codes were similar. When all inconclusive cases were assumed to be true non-ADHD cases, the sensitivity was 0.96 (95% CI = [0.95, 0.97]), the specificity was 0.98 (95% CI = [0.97, 0.98]), and the PPV was 0.83 (95% CI = [0.81, 0.85]). The estimated PPV among patients classified using problem list codes was lower (0.67; 95% CI = [0.58, 0.75]) than among those classified using visit-level codes (0.84; 95% CI = [0.83, 0.86]). Notably, we found no evidence that validity varied appreciably by site; estimated sensitivity and specificity were greater than 92% at all sites.
Table 2. ADHD Status Classified via Algorithm and After Manual Review of Electronic Health Records (i.e., True Case Status), Along With SE, SP, PPV, and NPV (95% CI) of Algorithm, Overall and Stratified by Origin of Diagnostic Code (N = 15,609).
Note. SE = sensitivity; SP = specificity; PPV = positive predictive value; NPV = negative predictive value; CI = confidence interval; ICD-9-CM = International Classification of Diseases, Ninth Revision, Clinical Modification.
True case status estimated based on manual review of sample of records.
Case status determined by ICD-9-CM diagnosis codes documented on an office or hospital visit that indicate the reason(s) for the visit.
Case status determined by ICD-9-CM diagnosis codes documented on a list of ongoing or historical problems.
Discussion
This study developed and validated an EHR-based algorithm that classified ADHD status among a large cohort of primary care patients in a regional health care network. Overall, the findings demonstrate that an algorithm using only ICD-9-CM codes contained in visit-level diagnosis fields or in the problem list can capture ADHD case status among primary care patients with a high degree of sensitivity. In addition, a very low proportion of patients without ADHD diagnosis codes had in fact been diagnosed with ADHD, resulting in high specificity for the algorithm. Notably, we were unable to use EHRs to independently confirm the presence or absence of an ADHD diagnosis for one in five patients with ADHD diagnosis codes; this likely reflects a lack of documentation of how ADHD diagnoses were made. Despite this limitation, the selected ICD-based algorithm identified the presence of ADHD with a high degree of accuracy.
Our findings establish the utility of EHRs in classifying ADHD status in large population-based studies and quality improvement projects and extend the results of two prior studies. Daley et al. (2014) used EHR data to identify potential cases of incident (i.e., newly diagnosed) ADHD among 3- to 9-year-old patients in 10 large health care organizations. To assess the accuracy of the EHR diagnosis data, the authors randomly selected 40 patients within each site and reviewed their medical charts to confirm incident ADHD. The investigators used five definitions of a confirmed case of ADHD and, for each, calculated the proportion of potential ADHD cases that were confirmed (Daley et al., 2014). When both incident and prevalent cases were included, the proportion of confirmed ADHD cases was high (89.8%-94.2%); however, the proportions were lower when a more conservative definition was used (48.9%-59.1%). To assess the validity of ADHD diagnoses in claims data and evaluate the health care costs of ADHD, Guevara et al. (2001) conducted a sensitivity analysis within the context of a larger study. The investigators randomly selected 70 patients, reviewed their medical charts, and categorized ADHD case status using different definitions: probable, possible, and doubtful diagnoses. Seventy-two percent of the 70 patients were classified as having a probable diagnosis, 21% a possible diagnosis, and 7% a doubtful diagnosis (Guevara et al., 2001). Although both studies estimated the PPV of an existing ADHD diagnosis in health records, neither included patients without ADHD; thus, neither could fully assess the quality of its classification by calculating sensitivity or specificity.
Although PPV is an important measure of classification quality and can be used to quantitatively estimate, and correct for, misclassification bias in applied studies, its dependence on the prevalence of the condition makes it problematic to transfer across studies of different populations (Lash, Fox, & Fink, 2009). To our knowledge, this study is the first to estimate the sensitivity and specificity of ADHD diagnosis codes; because these measures do not depend directly on prevalence, they can be used to quantify a classification’s performance across populations. More generally, validation studies make it possible to conduct quantitative bias analyses, allowing researchers to account for potential bias within an applied study and ultimately enhance the robustness of study results. Although epidemiology methodologists strongly encourage bias analysis (Lash et al., 2009), it is not often used in applied clinical research.
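The prevalence dependence follows from Bayes’ theorem: with sensitivity and specificity held fixed, PPV = Se*p / [Se*p + (1 - Sp)*(1 - p)] for prevalence p. The short Python sketch below evaluates this for a hypothetical classifier with Se = 0.96 and Sp = 0.98 (values chosen to lie within the ranges reported in the Results) at several hypothetical prevalences.

```python
def ppv(se: float, sp: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence (Bayes' theorem)."""
    true_pos = se * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(se: float, sp: float, prevalence: float) -> float:
    """Negative predictive value under the same assumptions."""
    true_neg = sp * (1 - prevalence)
    false_neg = (1 - se) * prevalence
    return true_neg / (true_neg + false_neg)


# Same hypothetical classifier applied to populations with different ADHD prevalence
for prev in (0.02, 0.05, 0.13, 0.30):
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.96, 0.98, prev):.2f}, "
          f"NPV = {npv(0.96, 0.98, prev):.3f}")
```

PPV climbs from roughly 0.49 at 2% prevalence to about 0.95 at 30% even though the classifier is unchanged, which is why sensitivity and specificity, rather than PPV, are the quantities to carry into a different population.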
There are several limitations to this study. First, this method is limited by the quality of diagnostic information in medical records, which was highly variable; some records contained parent and teacher rating scales and the results of evaluations conducted by external mental health experts, whereas others contained no information beyond the visit date and diagnosis code. Although ADHD diagnoses in the EHR accurately reflect real-world diagnostic decisions made by providers in primary care, these decisions may differ to some extent from those made by expert mental health providers (e.g., psychologists, psychiatrists, developmental pediatricians) applying best practice methods for assessing ADHD in a systematic manner. Furthermore, because diagnostic decision making for mental health conditions relies on clinical judgment, some uncertainty remains in a small percentage of cases even when best practice methods are systematically applied. Accordingly, although this study can draw conclusions about the extent to which an EHR-based algorithm distinguishes patients who have and have not been diagnosed with ADHD in clinical settings, it was not designed to identify “true” ADHD status. Another potential limitation is that we examined EHR data from only a single health care network, raising questions about generalizability. However, the network includes a variety of providers across multiple settings and serves a broad range of patients with regard to race/ethnicity and socioeconomic status. Furthermore, like other large health care systems, the network uses a single EHR system, which promotes standardized practice. In addition, although there may have been some variability in how providers diagnosed ADHD across sites, we found no appreciable difference in accuracy across the six practices.
As such, the practices, providers, and patients represented in this study likely reflect those in other large health care systems, although replications in other systems are needed to confirm generalizability. Study findings are likely more generalizable to studies conducted within large health care systems, in particular those that standardize practices, than to studies that combine data from multiple institutions. In addition, abstractors were not blinded to the ADHD status of the patients. This knowledge may have biased the reviewers; however, we expect that any such bias was minimized because we developed strict criteria for information considered “confirmatory,” included an “inconclusive” classification category, and ensured that final results were reviewed by two licensed psychologists (BY, TP). Finally, this method does not capture patients who have ADHD but either did not seek care or were not recognized by CHOP providers.
Conclusion
Health service and applied clinical studies have used ICD-9-CM codes derived from EHRs to classify ADHD status; however, the internal validity of study results depends on the extent to which diagnosis codes accurately classify status. The current study describes the development of an EHR-based approach to classify ADHD status among primary care patients and, for the first time, demonstrates that such an algorithm can be used in large epidemiologic and clinical studies with high sensitivity and specificity. Estimates of sensitivity and specificity from the current study can be used by future applied studies to correct measures of association (e.g., odds ratios, risk ratios) for misclassification of ADHD status.
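As an illustration of such a correction, the simple bias analysis described by Lash et al. (2009) back-calculates the expected true number of exposed patients in each outcome group from sensitivity and specificity, assuming misclassification is nondifferential. Since the observed exposed count is a = Se*A + (1 - Sp)*(n - A), the true count is A = [a - (1 - Sp)*n] / (Se + Sp - 1). In the Python sketch below, ADHD is the exposure and a crash is the outcome; the 2×2 counts are hypothetical, Se = 0.96 and Sp = 0.98 are chosen from within the ranges reported above, and the function names are illustrative.

```python
def corrected_exposed(observed_exposed: float, total: float, se: float, sp: float) -> float:
    """Expected true exposed count, given an observed count and nondifferential Se/Sp."""
    denom = se + sp - 1
    if denom <= 0:
        raise ValueError("Se + Sp must exceed 1")
    return (observed_exposed - (1 - sp) * total) / denom


def corrected_odds_ratio(a: float, b: float, c: float, d: float, se: float, sp: float) -> float:
    """
    a = outcome+, classified ADHD    b = outcome+, classified non-ADHD
    c = outcome-, classified ADHD    d = outcome-, classified non-ADHD
    """
    A = corrected_exposed(a, a + b, se, sp)
    C = corrected_exposed(c, c + d, se, sp)
    B, D = (a + b) - A, (c + d) - C
    if min(A, B, C, D) <= 0:
        raise ValueError("Se/Sp incompatible with observed counts (non-positive corrected cell)")
    return (A * D) / (B * C)


# Hypothetical counts (not study data)
observed = (60 * 9000) / (240 * 1000)
corrected = corrected_odds_ratio(60, 240, 1000, 9000, se=0.96, sp=0.98)
print(round(observed, 2), round(corrected, 2))  # 2.25 2.55
```

Here the correction moves the odds ratio away from the null, as expected for nondifferential exposure misclassification. If Se and Sp are too low for the observed counts, a corrected cell becomes non-positive and the function raises an error rather than returning a meaningless estimate.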
Footnotes
Acknowledgements
The authors thank Melissa Pfeiffer for her expertise in data management and for her review of the manuscript and Michael Elliott for his guidance on statistical analysis. In addition, the authors thank Meghan Kirk and Sayaka Ogawa for their assistance reviewing electronic health records.
Authors’ Note
The views presented are those of the authors and not necessarily the views of The Children’s Hospital of Philadelphia (CHOP) or National Institutes of Health (NIH).
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development at the National Institutes of Health (Grant R01HD079398, Principal Investigator: Curry).
