Abstract
Surveillance of chronic hepatitis C virus (HCV) cases faces limitations that result in delays and underreporting. With increasing use of electronic health records (EHRs), the authors evaluated the predictive value of using International Classification of Diseases, Ninth Revision (ICD-9) codes to identify chronic HCV cases from EHR data. Longitudinal EHR data from 4 health care systems during 2006–2012 were evaluated. Using chart abstraction and review to confirm chronic HCV cases (“gold standard” definition), the authors calculated the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 2 case definitions: (1) ≥2 ICD-9 codes separated by ≥6 months and (2) ≥1 positive HCV RNA (ribonucleic acid) test. Among 2,718,995 patients, 20,779 (0.8%) with ICD-9 codes indicating a likely diagnosis of chronic HCV infection were identified; 13,595 (65.4%) of these were randomly selected for review. Case definition 1 (≥2 ICD-9 codes separated by ≥6 months) had 70.3% sensitivity, 91.9% PPV, 99.9% specificity, and 99.9% NPV while case definition 2 (≥1 positive HCV RNA test) had 74.1% sensitivity, 97.4% PPV, 99.9% specificity, and 99.9% NPV. The predictive values of these alternate EHR-derived ICD-9 code-based case definitions suggest that these measures may be useful in capturing the burden of diagnosed chronic HCV infections. Their use can augment current chronic HCV case surveillance efforts; however, their accuracy may vary by length of observation and completeness of EHR data.
Background
Viral hepatitis surveillance is critical to accurately estimating the true burden of HCV infection in the United States. 4 However, the true prevalence of chronic HCV infection cannot be determined from national surveillance data 11,12 because current surveillance is based on a passive reporting system (ie, a system wherein state and local health departments have variable ability to detect and report viral hepatitis A, B, and C cases through the National Notifiable Diseases Surveillance System [NNDS]). 4,5 The surveillance infrastructure to monitor chronic HCV infections at the local and state level is inadequate. 13 Chronic HCV infection surveillance has been limited by the lack of resources and personnel at the local and state levels to assess and follow up HCV case reports to determine chronicity, resulting in considerable delay, under-ascertainment, and underreporting of cases. Such efforts also are labor intensive for public health department staff because, for example, an average of 4 documents per case may need to be evaluated to distinguish among acute, resolved, and chronic HCV case status. 14 Thus, in 2013, chronic HCV surveillance cases were reported by only 37 of the 50 states. 4 Accordingly, the National Health and Nutrition Examination Survey (NHANES) has mainly been used to estimate the burden of chronic HCV in the United States. 1,15 NHANES is a random survey of US residents and obtains representative data on the health and nutritional status of the population. Though it has been used to estimate national HCV infection prevalence, 1 the NHANES sample is a low-risk, low-prevalence population, consequently limiting its generalizability. 1,16 NHANES also includes surveillance data from persons who are currently infected and those ever infected with HCV, and therefore may not accurately capture cases of chronic HCV infection. 1
Adoption of electronic health records (EHRs) has increased significantly over the past decade in the United States. 17 With the increasing availability of EHRs, the use of the International Classification of Diseases, Ninth Revision (ICD-9) codes to complement public health surveillance has been proposed. 18 The ICD-9 coding system assigns specific codes to clinical diagnoses and procedures that are associated with health care utilization in the United States. Prior work has demonstrated that ICD-9 codes have good sensitivity and specificity for public health surveillance of some disease conditions. 18–21 For example, one study showed that EHR-derived ICD-9 codes for acute hepatitis B had good sensitivity and specificity when compared to chart abstractor-confirmed diagnoses and can be used for disease surveillance. 18 Other studies have shown that EHR-derived ICD-9 codes can be effective for the public health surveillance of influenza-like illnesses, Clostridium difficile infections, community-acquired pneumonia, and other acute communicable diseases. 19,20,22,23 There are limited data on the sensitivity and specificity of ICD-9 or ICD-10 codes for chronic HCV infection 24 and their utility in the surveillance of chronic HCV infections despite the aforementioned barriers to current surveillance strategies.
With increasing use of EHRs in the United States, the predictive values of ICD-9 and ICD-10 codes for identifying true cases of chronic HCV infection warrant evaluation. The use of EHR-derived ICD-9 and ICD-10 codes to identify true cases of chronic HCV infection can further complement surveillance data obtained from NNDS and NHANES. In this study, data from the Chronic Hepatitis Cohort Study (CHeCS) were used to examine the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 2 alternate EHR-derived case definitions, (1) ≥2 hepatitis C ICD-9 codes separated by ≥6 months and (2) ≥1 positive HCV RNA (ribonucleic acid) test, when compared to true cases of chronic HCV infection ascertained by abstractor chart review (“gold standard” case definition).
Methods
CHeCS is a cohort study drawing patients from the EHR and claims databases of 4 large integrated health systems with data on more than 2.7 million patients. This analysis includes patients aged 18 years or older who used any health service between January 1, 2006, and December 31, 2012. 25 The 4 integrated health systems include Geisinger Health System (GHS), Danville, PA; Henry Ford Health System (HFHS), Detroit, MI; Kaiser Permanente-Hawaii (KPHI), Honolulu, HI; and Kaiser Permanente-Northwest (KPNW), Portland, OR. GHS serves approximately 2.6 million Pennsylvania residents. HFHS provides health care services to more than 1 million southeast Michigan residents. KPHI provides health care services to about 220,000 health plan members in Hawaii, while KPNW provides health care services to about 500,000 health plan members in Oregon.
Eligible patients for this study were selected from the CHeCS cohort between January 1, 2006, and December 31, 2012 because chronic HCV infections could be confirmed by abstractor chart review (gold standard) during this time period. The study team obtained retrospective and prospective longitudinal EHR data related to patients' demographic characteristics, medical diagnoses, and laboratory results starting from the time of the patients' first visit to the health system when EHRs were available in order to identify chronic HCV infections. Data were evaluated both retrospectively and prospectively through the end of 2013. Given that CHeCS patients' records have received ICD-9 coding to date, patients with at least 1 of the following were considered eligible for further evaluation: any acute or chronic HCV ICD-9 code (070.44, 070.54, 070.70, 070.71, 070.41, 070.51), a positive HCV antibody test, or a positive HCV RNA test (nucleic acid test [NAT]; qualitative, quantitative, or genotype test). Each year, patients' records were evaluated to determine whether the cohort inclusion criteria were met, and patients who did not meet the cohort inclusion criteria in one year were reevaluated for cohort inclusion in subsequent years. Trained medical abstractors reviewed the EHRs of patients meeting cohort inclusion criteria to collect summary data and confirm chronic HCV infection status. Because they might in fact be true chronic HCV cases, charts lacking documentation that a specialist had diagnosed chronic HCV or with documentation of acute HCV were flagged and reviewed under the supervision of a physician using hepatologist-developed criteria. 25 Flagged cases for which chronic viral hepatitis infection could not be confirmed were excluded from the study.
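The eligibility screen described above can be illustrated with a minimal Python sketch. This is not the study team's SAS code; the `PatientRecord` structure and its field names are hypothetical, and only the list of ICD-9 codes and the "at least 1 of the following" rule are taken from the text.

```python
from dataclasses import dataclass, field

# HCV ICD-9 codes listed in the Methods text.
HCV_ICD9_CODES = frozenset(
    {"070.41", "070.44", "070.51", "070.54", "070.70", "070.71"}
)


@dataclass
class PatientRecord:
    """Hypothetical per-patient EHR summary; field names are illustrative."""
    icd9_codes: list = field(default_factory=list)
    antibody_positive: bool = False  # any positive HCV antibody test
    rna_positive: bool = False       # any positive HCV RNA test (NAT)


def is_eligible(record: PatientRecord) -> bool:
    """Return True if the record meets at least 1 of the 3 screening criteria."""
    has_hcv_code = any(code in HCV_ICD9_CODES for code in record.icd9_codes)
    return has_hcv_code or record.antibody_positive or record.rna_positive
```

Under this sketch, a patient with a single 070.54 code and no laboratory results would still be passed on for chart review, which mirrors the broad screen described in the text.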
The demographics of patients with chronic HCV infection meeting cohort inclusion criteria and confirmed by trained medical abstractors (the gold standard) were compared to alternate “chronically infected cases” identified solely by EHR-derived ICD-9 codes (≥2 hepatitis C ICD-9 codes separated by ≥6 months) or laboratory results (≥1 positive HCV RNA test). The sensitivity, specificity, PPV, and NPV of the 2 alternate case definitions were calculated against the gold standard. All analyses were conducted using SAS v.9.3 (SAS Institute, Inc., Cary, NC).
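The 2 alternate case definitions can be expressed as simple rules. The Python sketch below is illustrative only and is not the authors' implementation. It assumes that "separated by ≥6 months" means that any 2 HCV ICD-9 codes are at least 6 months apart (equivalently, that the earliest and latest code dates are), and it approximates 6 months as 182 days; the text specifies neither detail. The function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: 6 months is approximated as 182 days (not specified in the text).
SIX_MONTHS = timedelta(days=182)


def meets_icd9_definition(code_dates):
    """Case definition 1: >=2 HCV ICD-9 codes separated by >=6 months.

    code_dates holds the dates on which any HCV ICD-9 code was recorded.
    If any pair of dates is >=6 months apart, the earliest and latest are too.
    """
    if len(code_dates) < 2:
        return False
    return max(code_dates) - min(code_dates) >= SIX_MONTHS


def meets_rna_definition(rna_results):
    """Case definition 2: >=1 positive HCV RNA test.

    rna_results is a sequence of booleans, True for a positive result.
    """
    return any(rna_results)
```

For example, codes recorded on January 10 and September 1 of the same year satisfy definition 1 under these assumptions, whereas codes recorded on January 10 and March 1 do not.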
Results
Figure 1 is a flowchart that shows CHeCS cohort eligibility and chronic HCV case identification. Of 2,718,995 eligible health system patients, the study team identified 20,779 (0.8% of the total health system population) unique individuals with a likely diagnosis of chronic HCV infection (≥1 positive anti-HCV or NAT, ≥2 HCV ICD-9 codes, or 1 HCV ICD-9 code plus a positive anti-HCV or NAT). Because of funding restrictions, 13,595 (65.4%) of these likely cases were randomly selected for review by trained medical abstractors. Numbers of confirmed chronic HCV infections and chronic HCV infections ruled out after medical abstractor chart review are shown in Figure 1. Fewer than 10% of patients had insufficient evidence (primarily the lack of an HCV RNA test) to determine chronic HCV infection status.

Figure 1. Flow chart for cohort eligibility and chronic HCV case identification. HCV, hepatitis C virus; ICD-9, International Classification of Diseases, Ninth Revision; RNA, ribonucleic acid.
The study team compared the demographic characteristics of the 11,273 abstractor-confirmed (gold standard) chronic HCV cases with the 7928 confirmed chronic HCV cases identified using the alternate EHR-derived ICD-9 case definition, ≥2 HCV ICD-9 codes separated by ≥6 months, and the 8349 confirmed chronic HCV cases identified using the EHR-derived laboratory-based case definition, ≥1 positive HCV RNA test. The age and sex composition of each of the 3 case definitions (gold standard case definition and the 2 alternate case definitions) is similar (Table 1). There are slight differences in the composition of White, non-Hispanic and Black, non-Hispanic participants by case definition. White, non-Hispanic participants accounted for between 58.8% and 68.4% of respondents by case definition, while Black, non-Hispanic participants accounted for between 19.4% and 29.7% of respondents. The composition of Asian, non-Hispanic, Native American, Hawaiian/Pacific Islander, and unknown race respondents was similar across all 3 case definitions. HFHS accounted for the greatest proportion of respondents identified using the gold standard case definition (abstractor-confirmed chronic HCV infection cases) and the EHR-derived laboratory-based case definition, ≥1 positive HCV RNA test, while KPNW accounted for the greatest proportion of respondents identified using the alternate EHR-derived ICD-9 case definition, ≥2 HCV ICD-9 codes separated by ≥6 months (Table 1).
HCV, hepatitis C virus; RNA, ribonucleic acid.
Chronic HCV case definition: ≥2 HCV ICD-9 codes separated by ≥6 months
The case definition, ≥2 HCV ICD-9 codes separated by ≥6 months, identified 8622 chronic HCV infection cases (Fig. 1A). However, this definition did not identify 29.7% of true chronic HCV cases (3345 of the 11,273 cases identified by the gold standard of abstractor chart review). Numbers of confirmed chronic HCV infections (true positives) and those ruled out as chronic HCV infections (false positives) are shown in Figure 1A.
Table 2 shows the sensitivity, specificity, PPV, and NPV of this case definition. Using retrospective longitudinal data, the sensitivity of this case definition (its ability to identify true chronic HCV infections) was 70.3%, while its specificity (its ability to correctly identify non–chronic HCV infections) was 99.9%. The PPV describes the proportion of chronic HCV infections identified by this case definition that are confirmed chronic HCV infections, while the NPV describes the proportion of patients classified as not chronically infected by this case definition in whom chronic HCV infection is ruled out. The PPV of this case definition was 91.9% and the NPV was 99.9% (Table 2).
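For reference, the 4 measures can be written in terms of a standard 2×2 comparison against the gold standard. The symbols below are notation introduced here for illustration and do not appear in the original tables: TP and FP are patients flagged by the case definition who were, respectively, confirmed or ruled out by chart review; FN and TN are patients not flagged by the case definition who were, respectively, confirmed or ruled out.

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, \\[4pt]
\text{PPV} &= \frac{TP}{TP + FP}, &
\text{NPV} &= \frac{TN}{TN + FN}.
\end{align*}
```

Sensitivity and specificity condition on the gold standard result, whereas PPV and NPV condition on the result of the case definition.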
HCV ICD-9 codes: 070.44, 070.54, 070.70, 070.71, 070.41, 070.51.
EHR, electronic health record; HCV, hepatitis C virus; ICD-9, International Classification of Diseases, Ninth Revision; RNA, ribonucleic acid.
Chronic HCV case definition: ≥1 positive HCV RNA test
The case definition, ≥1 positive HCV RNA test, identified 8574 chronic HCV cases (Fig. 1B). This definition did not identify 25.9% of true chronic HCV infections (2924 of the 11,273 cases identified by the gold standard of abstractor chart review). Numbers of confirmed chronic HCV infections (true positives) and those ruled out as chronic HCV infections (false positives) are shown in Figure 1B. The reasons for ruling out chronic HCV infection among these cases included errors in lab data and spontaneous clearance of infection.
Table 2 shows the sensitivity, specificity, PPV, and NPV of this case definition. The sensitivity of this case definition was 74.1% and the specificity was 99.9%. The PPV of this case definition was 97.4% and the NPV was 99.9% (Table 2).
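The reported sensitivity and PPV of both case definitions can be checked from counts given in the text. The Python sketch below is an illustration, not the authors' analysis. It assumes that the counts of confirmed cases identified by each definition (7928 and 8349) are the true positives, that the totals identified by each definition (8622 and 8574) are all flagged patients, and that the 11,273 abstractor-confirmed cases are all true cases. Specificity and NPV are not computed because the text does not give the number of true negatives.

```python
GOLD_STANDARD_CASES = 11273  # abstractor-confirmed chronic HCV cases

# Assumed mapping: (patients flagged by the definition, flagged and confirmed).
DEFINITIONS = {
    "icd9": (8622, 7928),  # >=2 HCV ICD-9 codes separated by >=6 months
    "rna": (8574, 8349),   # >=1 positive HCV RNA test
}


def summarize(flagged, true_positives, gold_total=GOLD_STANDARD_CASES):
    """Return sensitivity, PPV, and the number of true cases missed."""
    return {
        "sensitivity": true_positives / gold_total,
        "ppv": true_positives / flagged,
        "missed_cases": gold_total - true_positives,
    }


results = {name: summarize(*counts) for name, counts in DEFINITIONS.items()}

for name, row in results.items():
    print(
        f"{name}: sensitivity={row['sensitivity']:.3f}, "
        f"ppv={row['ppv']:.3f}, missed={row['missed_cases']}"
    )
```

Under these assumptions, the computed values agree with the sensitivity and PPV reported for each case definition to within rounding.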
Discussion
Currently, passive national surveillance conducted through the NNDS only captures a small percentage of all chronic HCV cases 5; hence, this study explored the utility of EHRs and found some evidence that their use could supplement current HCV infection surveillance efforts. The findings of this analysis suggest that using the 2 alternate EHR-derived case definitions in the longitudinal data set (ie, ≥2 HCV ICD-9 codes separated by ≥6 months, or ≥1 positive HCV RNA test) to identify chronic HCV infections yielded moderate but varying sensitivity and PPV. The sensitivity and PPV of using ≥1 positive HCV RNA test to identify chronic HCV infection were greater than those of using ≥2 HCV ICD-9 codes separated by ≥6 months. However, the 2 alternate case definitions had the same specificity and NPV. The greater sensitivity and PPV associated with using ≥1 positive HCV RNA test as a case definition may be related to the fact that patients who have an HCV RNA test are those who may have previously received a positive antibody test to screen for HCV infection, may have elicited a greater index of clinical suspicion from their physicians, or may be more clinically symptomatic. Hence, these patients may be more likely to have true chronic HCV infections than those who only have HCV ICD-9 codes. Among persons with ≥2 HCV ICD-9 codes separated by ≥6 months, the first HCV ICD-9 code might be a “rule out” diagnosis code and may not reflect a true diagnosis, consequently affecting the sensitivity of this case definition. The lower sensitivity associated with using ≥2 HCV ICD-9 codes separated by ≥6 months also may reflect the minimum time interval (≥6 months) that this case definition requires, whereas the case definition of ≥1 positive HCV RNA test imposes no time requirement.
Additionally, because chronic HCV infection is often an underlying disease condition, physicians may fail to include this diagnosis at every clinic visit, further decreasing the sensitivity of the alternate case definition, ≥2 HCV ICD-9 codes separated by ≥6 months.
Despite the increased sensitivity and PPV of using ≥1 positive HCV RNA test, this case definition is limited by the fact that it cannot completely distinguish between acute and chronic HCV infections. HCV RNA can be detected in blood as early as 2 to 3 weeks after infection and laboratory records alone do not provide a comprehensive clinical picture of chronicity. 26 The increased PPV of using ≥1 positive HCV RNA may be attributed to the difference in prevalence of confirmed chronic HCV cases identified by both case definitions. The case definition, ≥1 positive HCV RNA, identified more confirmed chronic HCV infections than the other case definition, ≥2 HCV ICD-9 codes separated by ≥6 months.
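The dependence of PPV on prevalence noted above follows from Bayes' theorem. As a sketch in notation introduced here (not used elsewhere in the article), let Se and Sp denote the sensitivity and specificity of a case definition and let p denote the prevalence of true chronic HCV infection among the patients evaluated:

```latex
\[
\mathrm{PPV} \;=\; \frac{Se \cdot p}{Se \cdot p + (1 - Sp)\,(1 - p)}
\]
```

With specificity held fixed, PPV increases as either p or Se increases, so 2 case definitions with the same specificity can still differ in PPV.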
This analysis is subject to several limitations. These findings may not be generalizable to other populations because the prevalence of chronic HCV infection in this sample may be higher than in the general population 1 and the data used were obtained from the EHRs of 4 health systems that primarily provided health care to certain catchment areas in 4 US states. Using EHR laboratory data to distinguish true from questionable cases or noncases of HCV infection required very extensive data processing and selected chart review, primarily because laboratory test codes and reporting formats are not standardized between laboratories, and these codes are subject to change over time. This analysis did not evaluate whether the time for data capture affects the sensitivity and specificity estimates of the case definitions. Especially for chronic conditions such as chronic HCV infection, a longer time for data capture may increase the number of true cases identified.
Another limitation of this study is that its findings were based on ICD-9 codes that have recently been replaced by ICD-10 codes. Despite the improved specificity of ICD-10 codes, the study team expects the conclusions to remain the same because the ICD-9 codes the EHR-derived case definition was based on are consistent with the new ICD-10 codes. It is also possible that some true cases were missed among the patients excluded from chart review who had only a single ICD-9 code for HCV but no positive anti-HCV or RNA tests. This analysis was limited to patients who had access to health care. Undiagnosed cases also could not be ascertained. 2 The study cohort was geographically and demographically diverse and as such may not reflect the demographic composition of patients in other geographic regions, hence limiting its generalizability. However, this patient population reflects real-life clinical care at 4 sites across the United States with broad catchment areas. There also are human and economic limitations regarding the feasibility of using ICD codes for public health surveillance that can be attributed to the extensive data processing and chart review that are required to ensure data completeness and accuracy.
In conclusion, the predictive values of these alternate EHR-derived ICD-9 code-based case definitions suggest that these measures may be useful in complementing current strategies to capture the burden of diagnosed chronic HCV infections. Although results should be interpreted cautiously as reliability may vary greatly according to the completeness of EHR data and EHR data observation period, their use offers a timely chronic HCV surveillance approach that can augment, but not replace, current chronic HCV infection surveillance efforts that are based on data from NNDS and NHANES.
Footnotes
Author Disclosure Statement
Drs. Abara, Zhong, Collier, Gordon, Boscarino, Schmidt, Trinacty, and Holmberg, Ms. Moorman, and Ms. Rupp declared no conflicts of interest with respect to the research, authorship, and/or publication of this article. CHeCS was funded by the CDC Foundation, which received grants from AbbVie; Genentech, a Member of the Roche Group; Gilead Sciences; Janssen Pharmaceuticals, Inc.; and Vertex Pharmaceuticals. Past partial funders include Bristol-Myers Squibb. Granting corporations do not have access to CHeCS data and do not contribute to data analysis or writing of manuscripts. The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
