Abstract
Objective:
Lyme disease (LD) is the most commonly reported tick-borne illness in North America. To improve LD surveillance, we explored claims data as an adjunct data source for monitoring trends in Lyme disease incidence.
Methods:
We retrospectively analyzed claims from a nationwide US health insurance plan, identifying patients with newly diagnosed LD in 13 high-prevalence states over two time periods, 2004–2006 and 2010–2012.
Results:
The average LD case incidence estimated from claims data in 2010–2012 (75.67 per 100,000 person-years, n = 3474) was 1.50 times higher than in 2004–2006 (50.25 per 100,000 person-years, n = 1965) (p < 0.001) and higher than the incidence reported by the states to the Centers for Disease Control and Prevention. Among the 13 highest-prevalence states, 11 showed increased LD incidence over time.
Conclusions:
Surveillance systems should explore a fusion of data sources, combining payer claims, which appear to be highly sensitive but have limitations, with electronic laboratory data, which afford high specificity but appear to miss cases.
Introduction
LD is usually diagnosed on the basis of clinical manifestations and a history of tick bite and can be confirmed by a serological test or supported by residence in or travel to endemic areas. LD is underreported (Coyle et al. 1996, Meek et al. 1996, Johnson et al. 2011, Jones et al. 2012, Muller et al. 2012, Nelson et al. 2013). The initial diagnosis of LD may be difficult if the patient is unaware of the tick bite, if the erythema migrans (EM) rash does not develop or is not noticed, or if the presentation is atypical (Smith et al. 2002, Bratton et al. 2008, Shapiro 2014). The Centers for Disease Control and Prevention (CDC) recommends a two-tiered algorithm for LD serologic testing, using an enzyme immunoassay (EIA) as a first test, followed by western immunoblot (WB) for immunoglobulin M (IgM)/IgG if the first-tier EIA is positive or equivocal (Centers for Disease Control and Prevention 1995). The limitations of passive surveillance (Doyle et al. 2002) result in underascertainment and underreporting. In addition, the conventional CDC two-tiered algorithm is commonly used incorrectly (Nelson et al. 2013), which can lead to misinterpretation and misclassification (Steere et al. 2008, Branda et al. 2010, Schoen 2013).
Infectious disease surveillance in the United States depends on health care providers reporting cases with sufficient information for case classification, and it may suffer from various forms of selection bias (McCarthy and Giesecke 1999, Bailey et al. 2005) and incomplete reporting (Doyle et al. 2002). Claims data are available in an electronic format, contain consistent fields, and provide timely information with population coverage (Virnig and McBean 2001). The low cost and timeliness of availability make claims data a potential supplementary source of data for infectious disease surveillance (Allen-Dicker and Klompas 2012, Jones et al. 2013, Marder et al. 2015).
We used insurance claims from a major nationwide employer-provided health insurance plan in the United States to estimate the change in incidence of LD cases over time in 13 high-prevalence states.
Methods
Study population and data
We conducted a population-based retrospective cohort study using medical insurance claims data from a nationwide health insurance plan in the United States. The Boston Children's Hospital Institutional Review Board approved the study, granting a waiver of consent.
For comparison, datasets were constructed for two time periods, 2004–2006 and 2010–2012. Insurance claims generated by individuals from the 13 states with high-prevalence LD (Connecticut, Delaware, Maine, Maryland, Massachusetts, Minnesota, New Hampshire, New Jersey, New York, Pennsylvania, Vermont, Virginia, and Wisconsin) (Centers for Disease Control and Prevention 2013) were included in this study. Location of infection in these states was based on the zip code of the provider who made the first LD diagnosis. The LD case counts in the 13 states based on the provider's zip code and on the member's zip code were strongly correlated (r = 0.96, p < 0.001).
The database covers records of outpatient and inpatient visits, drug prescriptions, and laboratory orders. Every outpatient or inpatient visit was coded with one principal and up to three secondary International Classification of Diseases, Ninth Revision (ICD-9) codes and one zip code associated with the provider's address. Prescription-filling data include the date, National Drug Code, and quantity dispensed (in days). Laboratory orders were coded with the Current Procedural Terminology (CPT) code. If the time interval between two claims with the same diagnosis codes was shorter than 30 days (a 30-day persistence window), they were merged into a so-called “condition era,” a means to apply consistent rules for medical conditions to infer distinct episodes in care (Ryan 2010). Similarly, prescription fills for an individual drug were treated as drug eras based on a 30-day persistence window (Ryan 2010): if a prescription was refilled within 30 days of the end date of a previous one, the two prescriptions were merged into a single era and treated as continuous therapy.
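The era construction described above can be sketched as a simple interval merge. This is a minimal illustration, not the authors' implementation: the function name `build_eras`, the tuple layout, and the inclusive handling of a gap of exactly 30 days are assumptions made for the example.

```python
from datetime import date

def build_eras(intervals, persistence_days=30):
    """Merge (start, end) date intervals into eras.

    Two intervals fall into the same era when the gap between the end of
    the running era and the start of the next interval is at most
    `persistence_days` (boundary handling is an assumption).
    """
    eras = []
    for start, end in sorted(intervals):
        if eras and (start - eras[-1][1]).days <= persistence_days:
            # Extend the running era, keeping the later end date.
            eras[-1] = (eras[-1][0], max(eras[-1][1], end))
        else:
            # Gap exceeds the persistence window: start a new era.
            eras.append((start, end))
    return eras

# Hypothetical diagnosis claims (single-day visits) for one patient.
visits = [
    (date(2011, 6, 1), date(2011, 6, 1)),
    (date(2011, 6, 20), date(2011, 6, 20)),  # 19 days later: same era
    (date(2011, 9, 1), date(2011, 9, 1)),    # 73 days later: new era
]
condition_eras = build_eras(visits)

# Hypothetical fills of one drug: a 14-day supply refilled 10 days after it ran out.
fills = [
    (date(2011, 6, 2), date(2011, 6, 15)),
    (date(2011, 6, 25), date(2011, 7, 8)),
]
drug_eras = build_eras(fills)
```

The same function serves both condition eras and drug eras because both rest on the same 30-day persistence rule; only the input intervals differ.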
Case definition
To be included in the cohort of patients with newly diagnosed LD, at least one occurrence of the LD-specific ICD-9 code 088.81 was required between January 1, 2004, and December 31, 2006, or between January 1, 2010, and December 31, 2012. Any patient having LD ICD-9 codes in the year prior to the start date of each study period was excluded. Furthermore, any patient having LD ICD-9 codes in both periods was counted once, in 2004–2006.
To improve the specificity of our case identification, inclusion also required antibiotic treatment for LD and serologic testing for B. burgdorferi (CPT code: 86617 and 86618 for EIA and western blotting, respectively). In 2004–2006 and 2010–2012, only 287 and 637 patients, respectively, had PCR tests coded, without serologic testing; they were excluded from the analysis. Treatment for LD is defined as a 14-day or longer course of one of the antibiotics recommended for the treatment of LD by the Infectious Diseases Society of America (IDSA) (doxycycline, amoxicillin, cefuroxime axetil, ceftriaxone, cefotaxime, penicillin G, and azithromycin, clarithromycin, erythromycin for adult patients intolerant of amoxicillin, doxycycline, and cefuroxime axetil) (Wormser et al. 2006), provided that treatment began within 30 days before or after the first LD condition era. At least one serologic test order was required within 90 days before or after the first LD condition era.
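The timing rules in this case definition can be expressed as a small predicate. The sketch below is illustrative only: the function name, the argument layout, the use of the era start date as the reference point, and inclusive day counting are assumptions, and the antibiotic eras passed in are presumed to have been filtered already to the IDSA-recommended drugs.

```python
from datetime import date

def meets_case_definition(first_ld_era_start, antibiotic_eras, serology_dates,
                          min_treatment_days=14, treatment_window=30,
                          test_window=90):
    """Return True when a patient with an LD diagnosis code also has
    qualifying treatment and a serologic test order.

    antibiotic_eras: list of (start, end) dates for recommended antibiotics.
    serology_dates:  list of dates of Lyme serologic test orders.
    """
    treated = any(
        # Course length counted inclusively (assumption).
        (end - start).days + 1 >= min_treatment_days
        # Treatment must begin within the window around the first LD era.
        and abs((start - first_ld_era_start).days) <= treatment_window
        for start, end in antibiotic_eras
    )
    tested = any(
        abs((ordered - first_ld_era_start).days) <= test_window
        for ordered in serology_dates
    )
    return treated and tested

# Hypothetical patient: diagnosed June 1, 21-day course from June 3,
# serology ordered May 30.
is_case = meets_case_definition(
    date(2011, 6, 1),
    [(date(2011, 6, 3), date(2011, 6, 23))],
    [date(2011, 5, 30)],
)
```

Requiring all three elements trades sensitivity for specificity: a patient treated empirically for EM without a test order would not satisfy this predicate.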
Statistical analysis
Incidence rates with 95% confidence intervals (CI) based on the Poisson distribution were calculated. The incidence rates from CDC-reported data were calculated using confirmed LD cases, based on the confirmed LD case numbers and incidence rates for the 13 high-prevalence states published on CDC's website (Centers for Disease Control and Prevention 2013). Significance for comparison of rates was tested with a Z statistic, using a two-sided approach. All analyses were performed using R software (v. 3.1.0, R Foundation for Statistical Computing,
Results
LD serologic test orders and LD diagnoses
Serologic test orders, LD diagnoses, and antibiotic treatment for LD are described in Figure 1. A total of 51,008 patients had at least one LD serologic test in 2004–2006. Of those, 4785 (9.4%) also had at least one LD diagnosis code and 1965 (3.8%) were treated for 14 or more days. About half of the LD cases (49.8%) had both antibody tests recommended by the CDC (EIA and western blotting) done, whereas the remainder had only one of these tests done. In 2010–2012, 79,836 patients were tested for antibodies to B. burgdorferi. The number of patients newly diagnosed with LD and the number meeting the case definition were 6844 (8.6%) and 3474 (4.4%), respectively; 55.2% of the LD cases had two-tiered antibody testing.

Figure 1. Construction of the 2004–2006 and 2010–2012 cohorts. Case definition required Lyme serologic test, at least one Lyme disease (LD) diagnosis code 088.81, and 14 or more days of antibiotic treatment with Infectious Disease Society of America–recommended antibiotics. Two-tiered Lyme serologic test: Using an enzyme immunoassay (EIA) as a first test, followed by western immunoblot for immunoglobulin M (IgM)/IgG if the first-tier EIA is positive or equivocal.
Incidence rate of LD
In the 2004–2006 period, 1,676,095 insured individuals (3,910,249 person-years) resided in the 13 states with a high prevalence of LD. Of these, 7213 had at least one principal ICD-9 diagnosis code of LD and 1965 individuals met our full inclusion criteria. In the 2010–2012 period, 1,926,022 insured individuals (4,590,875 person-years) resided in these 13 states. Of these, 10,512 were diagnosed with LD, and 3474 met our inclusion criteria. The demographic characteristics of LD cases in the two study periods are described in Table 1.
The mean annual incidence rates of LD cases estimated from claims data in the two study periods are shown in Figure 2. The estimated incidence rate in 2010–2012 (75.67 cases per 100,000 per year) was 1.50 times higher than in 2004–2006 (50.25 cases per 100,000 per year) (p < 0.001). For comparison, the rates computed from the confirmed case reports from state health departments to CDC for the same periods in the 13 high-prevalence states are also shown in Figure 2. The geographic patterns and incidence changes over the two study periods in each state are shown in Figure 3. The estimated incidence rates increased over time in 11 of 13 states. Only two states (Connecticut and Delaware) had lower LD incidence rates in 2010–2012.
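As a check on the arithmetic, the incidence rates can be recomputed from the case counts and person-years reported above. The sketch below is an illustration rather than the authors' R code: the log-scale normal approximation for the Poisson confidence interval and the log rate ratio form of the Z statistic are assumptions, since the text specifies only "Poisson distribution" and "Z statistic."

```python
import math

def incidence_per_100k(cases, person_years):
    """Crude incidence rate per 100,000 person-years."""
    return cases / person_years * 100_000

def poisson_rate_ci(cases, person_years, z=1.96):
    """Approximate 95% CI for a Poisson rate (normal approximation on the
    log scale; an exact interval could differ slightly)."""
    rate = incidence_per_100k(cases, person_years)
    half_width = z / math.sqrt(cases)
    return rate * math.exp(-half_width), rate * math.exp(half_width)

def rate_ratio_z_test(cases_a, py_a, cases_b, py_b):
    """Two-sided Z test comparing two Poisson rates via the log rate ratio."""
    ratio = (cases_b / py_b) / (cases_a / py_a)
    se_log_ratio = math.sqrt(1 / cases_a + 1 / cases_b)
    z = math.log(ratio) / se_log_ratio
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return ratio, z, p_value

# Case counts and person-years as reported for the two study periods.
rate_early = incidence_per_100k(1965, 3_910_249)   # 2004-2006
rate_late = incidence_per_100k(3474, 4_590_875)    # 2010-2012
ci_early = poisson_rate_ci(1965, 3_910_249)
ci_late = poisson_rate_ci(3474, 4_590_875)
ratio, z, p_value = rate_ratio_z_test(1965, 3_910_249, 3474, 4_590_875)

print(f"2004-2006: {rate_early:.2f} per 100,000 person-years")
print(f"2010-2012: {rate_late:.2f} per 100,000 person-years")
print(f"rate ratio = {ratio:.3f}, z = {z:.1f}, p < 0.001: {p_value < 0.001}")
```

Under these assumptions the recomputed rates round to the reported 50.25 and 75.67 per 100,000 person-years, and the two-sided p value falls well below 0.001.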

Figure 2. Estimated annual incidence rates of Lyme disease in 2004–2006 and 2010–2012 (per 100,000 insured patients per year) represented by circles with the Centers for Disease Control and Prevention (CDC) reported incidence (per 100,000 population) of confirmed cases in 13 high-prevalence states represented by boxes for comparison (Centers for Disease Control and Prevention 2013).

Figure 3. Geographic distribution and incidence changes over time of Lyme disease cases in 2004–2006 and 2010–2012.
Antibiotic treatment for LD
Of the LD patients treated for 14 or more days with antibiotics, most were treated with doxycycline in both 2004–2006 (78.7%) and 2010–2012 (78.6%). The average treatment length based on claim prescriptions for the two periods was 31.0 (95% CI 29.7–32.3) and 31.7 (95% CI 30.7–32.7) days, respectively. The average number of refills in both study periods was 1.4. The percentage of patients treated with more than two antibiotics in 2010–2012 (26.9%) was higher than in 2004–2006 (14.6%) (p < 0.001). There were more patients treated for 14 or more days with antibiotics for LD in 2010–2012 (33.0% of patients who had at least one LD diagnosis code) than in 2004–2006 (27.2%) (p < 0.001).
Discussion
There was a substantial increase in the incidence of LD between 2004–2006 and 2010–2012, as ascertained from insurance claims data. In both time periods, our estimates of LD incidence were higher than the incidence computed from CDC reports of confirmed LD cases. LD incidence reported by CDC is based on disease reported to state and local public health departments, who then classify cases according to a CDC case definition developed by the Council of State and Territorial Epidemiologists (CSTE) before transmitting data to CDC. Underreporting is well-recognized in public health surveillance systems (Doyle et al. 2002) in general and for LD specifically. Physician surveys suggested only 8.2% (Coyle et al. 1996) to 16% (Meek et al. 1996) of LD cases are reported. A prior study estimated there were 288,000 newly infected patients in the United States in 2008 based on data from seven large commercial laboratories that perform more than three quarters of diagnostic tests for LD on patients in Connecticut, Maryland, Minnesota, and New York (Hinckley et al. 2014).
Our estimates of LD incidence based on claims data are subject to limitations. First, claims data from a single health care insurer capture only persons enrolled in the health plan, excluding uninsured persons, military personnel, and Medicaid/Medicare enrollees. However, a growing number of states have established all-payer claims databases to collect and concatenate medical claims from public and private payers. Additionally, an ICD-9 code for LD may be present in the claims data even in situations where LD was eventually excluded as a diagnosis (Sickbert-Bennett et al. 2010). Moreover, diagnosis codes have been found to have variable accuracy because of coding and physician errors (Sickbert-Bennett et al. 2010).
Claims data include tests billed for but not test results. Hence, patients with negative tests are still counted as cases if they meet the other criteria. A previous study found that about 12% of specimens submitted to seven large laboratories were estimated to represent true infections (Hinckley et al. 2014). Serologic testing may be overused or used incorrectly, yield false positives in cases with a low prior probability of disease, or yield a false-negative result in early-stage infection (Steere et al. 2008), leading to misinterpretation and misclassification (Tugwell et al. 1997, Ramsey et al. 2004, Steere et al. 2008, Muller et al. 2012, Hinckley et al. 2014). We found that over 90% of patients who had serologic tests did not have an LD ICD-9 code; these patients might have had negative serologic test results, or might have had LD that was coded under its disseminated manifestations. Furthermore, LD can be diagnosed clinically, without serologic testing, in patients with EM who live in or have traveled to endemic areas (Wormser et al. 2006). Because serologic testing is part of our LD case definition, patients with early LD who were not tested would be missed.
Claims data do not record the indication for treatment, so antibiotics prescribed for other conditions may be counted as LD treatment. Although the LD case definition needs to be validated, we believe we have minimized these effects by using a strict case definition that combined a test order, an LD diagnostic code, and antibiotic treatment as criteria. In contrast, surveillance systems using laboratory results to identify cases will miss cases where serology was falsely negative or where patients were treated empirically, although such cases could be identified by physician reporting. Another limitation is that, because claims data lack inpatient prescriptions, cases treated primarily in an inpatient setting may be undercounted. Furthermore, we found that less than 1% of our LD cases were treated with intravenous (i.v.) antibiotics as outpatients, lower than the percentage of LD cases with apparent late-stage manifestations, such as neurologic and cardiac symptoms (Bacon et al. 2008). We suspect this low rate may be due in part to out-of-pocket payment when insurance prior approval of therapy is refused.
Public health surveillance in the United States is based on the collection of data at the local and state levels, followed by the classification of cases on the basis of standard case definitions developed by CSTE and adopted by the CDC for purposes of national notification and national counts. This system depends on health care providers reporting cases with sufficient information for case classification and/or the capture of positive laboratory test results with sufficient follow-up for case classification. It is a substantial surveillance burden and hard to sustain. Providers may not report cases, early clinical LD may be diagnosed prior to the seroconversion to a positive laboratory test, and adequate follow-up of positive laboratory results may not be possible. All of these factors contribute to underreporting (Hinckley et al. 2014). The discordance in LD trend over the two study periods between the claims data and cases identified by CDC through the traditional surveillance system suggests that underreporting is becoming more common, perhaps due to “reporting fatigue.”
Because we analyzed only confirmed LD cases from CDC-reported statistics, the changes among the 1996, 2008, and 2011 case definitions should have minimal impact. The 2008 change introduced a category of “probable” case that essentially allowed for the expansion of clinical criteria. Across all three definitions, if a case had EM, laboratory confirmation is required for persons with no known exposure. One difference is that in the 2008 and 2011 definitions, a single-tier IgG immunoblot seropositivity is an option for meeting laboratory criteria.
The average treatment lengths based on claims prescription data were longer than the treatment length recommended by the IDSA (Wormser et al. 2006), which is 14–21 days. Whereas multiple studies concluded that extended antibiotic therapy provides no meaningful benefit (Klempner et al. 2001, 2013), many patients may continue antibiotics past 14 days for persistent symptoms or other perceived need.
Conclusions
Surveillance based on payer claims data analysis suggests underreporting of LD to current public health surveillance systems. These payer claims data could be a useful data source for LD surveillance and estimation of acute and chronic disease burden and trends (Allen-Dicker and Klompas 2012, Jones et al. 2013, Nelson et al. 2013, Mor et al. 2014, Marder et al. 2015). Surveillance efforts should focus on a fusion of data sources including payer claims that appear to be highly sensitive, with electronic laboratory data that afford high specificity, but appear to miss cases.
Footnotes
Acknowledgments
We thank Dara Manoach for suggesting that we develop predictive algorithms for chronic Lyme disease diagnoses, which led us to study the incidence of Lyme disease, and Karen Olson for preparing the dataset for analysis. This work was supported by a contract from the Massachusetts Department of Public Health and by the National Science Council, Taiwan (NSC 103-2917-I-564-063 to Y.J.T.). The National Science Council was not involved in designing or conducting the study, data collection, management, analysis, or interpretation, nor the preparation, review, and approval of the manuscript.
Author Disclosure Statement
No competing financial interests exist.
