Abstract
Purpose:
We sought to examine issues of generalizability in research on adolescent and young adult (AYA) cancer survivorship that relies on using community-based healthcare delivery system data.
Methods:
Individuals aged 15 to 39 diagnosed with cancer between 1992 and 2006 were identified using data from community-based healthcare systems in California and Seattle. Loss to follow-up was defined as the first disenrollment (i.e., the end of membership) from the healthcare systems after cancer diagnosis. Censoring occurred at death or study end (2009). We used Kaplan–Meier analysis to quantify follow-up, and multiple Cox regression to examine the association of loss to follow-up with demographic and cancer characteristics.
Results:
Of 6828 eligible AYAs, most (93%) were aged between 20 and 39 years at diagnosis; 62% were female and 39% were non-White. Solid tumors accounted for 81% of diagnoses. The majority (89%) of patients continued to be members of the healthcare systems and available for follow-up 1 year after diagnosis. Approximately 60% remained enrolled 5 years after diagnosis. Loss to follow-up was associated with younger age at diagnosis, male gender, and African American or Hispanic race/ethnicity.
Conclusion:
Data from community-based healthcare delivery systems offer an efficient way to identify large and diverse samples of AYA-onset cancer survivors. Differential loss to follow-up can threaten the generalizability of results from these studies and should be assessed quantitatively. Healthcare system data offer an alternative to studies requiring direct contact with participants.
In contrast, studies using community-based healthcare delivery system data can include all available survivors of AYA cancer, thus minimizing the impact of participation bias on generalizability. These systems enroll individuals into comprehensive insurance plans, with the bulk of medical care for enrolled members provided by clinicians affiliated with the system and in facilities owned and operated by the system. The membership of the systems generally reflects that of their local community. 7 This integration of insurance and medical care is unusual in the United States, where insurance coverage and sources of care typically involve separate entities that interact only as it relates to payment for medical care.
Administration of community-based healthcare delivery systems requires the capture of an extensive amount of medical data. These data can then be used to identify and follow a population of survivors of AYA cancer who are representative of the local community. However, the generalizability of the results will diminish if disenrollment varies by demographic or cancer characteristics, a problem akin to loss to follow-up in clinical trials. Our two goals were (1) to examine the extent of disenrollment among AYAs diagnosed with cancer, and (2) to determine whether disenrollment varied by demographic and cancer characteristics.
Methods
Setting
Our study used data from two community-based healthcare delivery systems: Group Health (GH) in Seattle, Washington, and Kaiser Permanente Northern California (KPNC) in Oakland, California. The organization and function of these two healthcare delivery systems are very similar. Both have joined with other healthcare delivery systems to form the Cancer Research Network (CRN). 8 Funded by the National Cancer Institute (NCI), the CRN consists of the research programs, enrollee populations, and databases of 9 community-based healthcare delivery systems. The CRN's ultimate goal is to conduct collaborative research examining the effectiveness of preventive, curative, and supportive cancer interventions among diverse populations. The CRN has been a useful mechanism for studying survivors of adult cancer. 9 The GH Institutional Review Board provided approval for both delivery systems. Per institutional requirements, the Wake Forest University Institutional Review Board also approved the study. The need for informed consent was waived.
Population
Using a virtual data warehouse designed to facilitate access to electronic data by standardizing formats across multiple healthcare systems, 10 we identified a cohort of individuals enrolled in GH or KPNC who were aged between 15 and 39 years when diagnosed with an incident invasive cancer (other than nonmelanoma skin cancer) while enrolled in one of the systems. This choice of age range was driven by recommendations from an NCI-sponsored expert group. 1 To maximize sample size, patient identification began when enrollment records and cancer registry data were first computerized (1992 for GH, and 1996 for KPNC). Patient identification continued through 2006 for both systems. Patients were identified from enrollment and cancer registry data, which are described in the Data section.
Data
Disenrollment, defined as the first postcancer membership gap lasting at least 90 days, served as a proxy for loss to follow-up. The term “gap” refers to an indication in administrative data that an individual's enrollment may have ended. Gaps of fewer than 90 days are frequently due to delays in processing coverage changes rather than true disenrollment. Gaps were identified from enrollment data that are maintained for administrative purposes.
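The gap rule can be made concrete with a short sketch. It assumes enrollment is stored as a list of inclusive (start, end) date periods per person and that the disenrollment date is recorded as the last enrolled day before the first qualifying gap; the function and constant names are hypothetical, and treating uncovered time at the end of follow-up as a gap is our assumption, not a rule stated in the text.

```python
from datetime import date

# Gaps shorter than this are treated as processing delays, not disenrollment.
MIN_GAP_DAYS = 90


def first_disenrollment(spans, diagnosis, followup_end, min_gap_days=MIN_GAP_DAYS):
    """Return the last enrolled day before the first post-diagnosis gap of at
    least `min_gap_days`, or None if no such gap occurs before `followup_end`.

    spans        -- list of (start, end) enrollment periods, dates inclusive
    diagnosis    -- cancer diagnosis date (person assumed enrolled on that date)
    followup_end -- earlier of death date and study end date
    (Hypothetical helper; the 90-day threshold comes from the text.)
    """
    current_end = None
    for start, end in sorted(spans):
        if end < diagnosis or start > followup_end:
            continue  # period lies entirely before diagnosis or after follow-up
        if current_end is not None:
            uncovered = (start - current_end).days - 1  # days with no coverage
            if uncovered >= min_gap_days:
                return current_end
        current_end = end if current_end is None else max(current_end, end)
    if current_end is None:
        raise ValueError("no enrollment on or after the diagnosis date")
    # Assumption: coverage that stops before follow-up ends also counts as a gap.
    if (followup_end - current_end).days >= min_gap_days:
        return current_end
    return None


# Example: a 31-day gap is bridged; the later 152-day gap is a disenrollment.
spans = [
    (date(2000, 1, 1), date(2001, 6, 30)),
    (date(2001, 8, 1), date(2003, 12, 31)),
    (date(2004, 6, 1), date(2009, 6, 30)),
]
print(first_disenrollment(spans, date(2000, 5, 1), date(2009, 6, 30)))  # 2003-12-31
```

Bridging short gaps before looking for the first long one mirrors the rationale above: brief breaks usually reflect processing delays rather than a real change in coverage.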
Cancer registry data were the source of demographic and cancer characteristics that we believed a priori might be associated with disenrollment. Both GH and KPNC maintain these data as part of their compliance with legal mandates requiring they report cancer diagnoses to the NCI's Surveillance, Epidemiology, and End Results (SEER) program. Thus, the quality and completeness of the cancer registry data are consistent with data available directly from SEER. Demographic characteristics of interest included age, gender, and race/ethnicity. Age was categorized into 5-year groups to facilitate examination of age-related trends in disenrollment. Cancer characteristics included cancer type and year of diagnosis. Cancer type was categorized into hematologic or solid tumor using the AYA categorization scheme used by SEER.11,12 There were no temporal differences in our results, so we grouped year of diagnosis into two periods (1992 to 1999 and 2000 to 2006; as noted under Population, data were first available in 1992 for GH, and in 1996 for KPNC).
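The age and period groupings described above can be sketched as two small mapping functions. This assumes age at diagnosis is recorded in whole years; the function names are hypothetical, and the SEER AYA site recode used for hematologic versus solid tumors is not reproduced here.

```python
def age_group(age):
    """Map age at diagnosis (whole years, 15-39) to a 5-year group label."""
    age = int(age)
    if not 15 <= age <= 39:
        raise ValueError(f"age {age} is outside the 15-39 AYA range")
    lower = 15 + 5 * ((age - 15) // 5)  # 15, 20, 25, 30, or 35
    return f"{lower}-{lower + 4}"


def diagnosis_period(year):
    """Collapse year of diagnosis into the two periods used in the analysis."""
    if 1992 <= year <= 1999:
        return "1992-1999"
    if 2000 <= year <= 2006:
        return "2000-2006"
    raise ValueError(f"year {year} is outside the 1992-2006 study window")


print([age_group(a) for a in (15, 19, 20, 24, 39)])
# ['15-19', '15-19', '20-24', '20-24', '35-39']
```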
We also used cancer registry data to identify deaths. As previously noted, both healthcare delivery systems are required to meet SEER standards, including those for long-term follow-up. Thus, the quality and completeness of the death data are consistent with data available directly from SEER.
Statistical analysis
We used the Kaplan–Meier method to tabulate disenrollment and the cumulative incidence of death. Follow-up began at cancer diagnosis and ended at the first disenrollment after cancer diagnosis, a proxy for loss to follow-up. Data from later reenrollments were disregarded. Subjects who did not disenroll were censored, and their follow-up ended on the date of death or at the end of study follow-up on June 30, 2009. After tabulating frequencies for the characteristics possibly associated with disenrollment, we used multiple Cox regression adjusted for healthcare system to assess the association of those characteristics with disenrollment. We found no violations of the assumptions underlying Cox regression models, with one exception. Evaluation of the cumulative martingale residual plot documented violation of the proportional hazards assumption for the 15- to 19-year-old age group. Evidence of this violation is shown in Figure 1, in which the solid line for the 15- to 19-year-old age group is of a different shape and crosses the lines for the other age groups. The remaining age groups did not violate this assumption, so models were stratified by age group (15–19 and 20–39). A p-value of 0.05 or lower was considered statistically significant. All analyses were conducted using SAS version 9.2 (SAS Institute, Inc., Cary, NC).
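The censoring rule and the Kaplan–Meier estimator can be illustrated with a minimal sketch. The study itself used SAS; this Python version is only an illustration, and the function names are hypothetical. It assumes dates are `datetime.date` values and that a disenrollment falling on the same day as death or study end is treated as censored.

```python
from datetime import date

STUDY_END = date(2009, 6, 30)  # end of follow-up stated in the text


def followup_record(diagnosis, disenroll=None, death=None, study_end=STUDY_END):
    """Return (years of follow-up, event flag) for one subject.

    event = 1 means the subject disenrolled (the event of interest);
    event = 0 means follow-up was censored at death or at study end.
    """
    end, event = study_end, 0
    if death is not None and death < end:
        end = death
    if disenroll is not None and disenroll < end:
        end, event = disenroll, 1
    return (end - diagnosis).days / 365.25, event


def kaplan_meier(times, events):
    """Product-limit estimate of S(t), the probability of remaining enrolled.

    Returns [(t, S(t))] at each distinct time with at least one event.
    Subjects censored at t are counted in the risk set at t.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0  # events and censorings tied at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= d + c
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 0, 1, 1, 0]))
# approximately [(1, 0.8), (2, 0.6), (3, 0.3)]
```

Running the same estimator with death as the event (and no censoring at disenrollment, since registry follow-up continues after disenrollment) gives 1 − S(t) as the cumulative incidence of death.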

Figure 1. Kaplan–Meier plot of postcancer disenrollment from two integrated healthcare delivery systems by adolescents and young adults diagnosed with cancer from 1992 to 2006 and followed through June 30, 2009 (N=6828).
Results
We identified 6828 AYA cancer patients, 93% of whom were aged between 20 and 39 years at cancer diagnosis (Table 1). The group was 62% female and 39% non-White. Solid tumors accounted for 81% of diagnoses; the three most common solid cancers were genital tract (1456 total; 747 female and 709 male), breast (1305), and thyroid (843). The three most common hematologic cancers were Hodgkin lymphoma (532), non-Hodgkin lymphoma (360), and leukemia (317). Because data for one healthcare delivery system were not available until the late 1990s, almost two-thirds of the cases were diagnosed between 2000 and 2006.
Groups are mutually exclusive. Percentages may not sum to 100% because of rounding.
Hematologic cancers (n): Hodgkin lymphoma (532), non-Hodgkin lymphoma (360), leukemia (317), and other (92).
Solid cancers (n): genital tract (1456 total; 747 female and 709 male), breast (1305), thyroid (843), colorectal (388), central nervous system (298), soft tissue sarcoma (211), oral cavity (210), and other (816).
The cumulative incidence of death was 30% or higher for those aged 20 to 24 and 35 to 39 at diagnosis, versus roughly 20% for the remaining age groups (Table 2). Approximately 90% of patients of all ages remained enrolled and available for follow-up 1 year after diagnosis. In subsequent years, the proportion of patients available for follow-up varied by age (Fig. 1, Table 2). For example, loss to follow-up at 5 years was notably higher for the 20–24-year age group. At 10 years, higher proportions of both that group and those aged 15 to 19 years had been lost to follow-up.
All subjects in the original cohort are accounted for at each time point. That is, the total sample size of the cohort will equal the total of those who are alive, deceased, or have reached the end of the follow-up period. The percentage remaining enrolled is based on the number of enrolled subjects divided by the number of living subjects (regardless of whether or not they remained enrolled).
This captures all deaths throughout follow-up, which extended beyond 10 years for patients still alive at 10 years. The percentage is calculated relative to the total number of patients diagnosed, regardless of whether they disenrolled.
Regression analysis
Using disenrollment as a proxy for loss to follow-up, patients diagnosed between 20 and 39 years of age were more likely to be lost if they were younger (adjusted hazard ratio [HR]=1.9; 95% confidence interval [CI]: 1.7–2.2 for 20–24 year olds vs. 35–39 year olds), male (HR=1.2; CI: 1.1–1.3), or African American or Hispanic (vs. White: HR=1.4; CI: 1.2–1.7 and HR=1.2; CI: 1.1–1.4, respectively) (Table 3). Cancer type and year of diagnosis were not associated with potential loss to follow-up. We found similar results in the 15–19-year-old group, although only Hispanic race/ethnicity remained statistically significant (vs. White, HR=1.6; CI: 1.1–2.2). These results were stratified because the proportional hazards assumption did not hold for the youngest age group.
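To show what a stratified Cox model estimates, here is a minimal Newton-Raphson sketch with a single covariate and the Breslow approximation for ties. It is not the SAS model used in the study, which adjusted for several covariates plus healthcare system; the function name is hypothetical, and the sketch assumes the covariate varies within at least one risk set (otherwise the information is zero) and that the data are not completely separated.

```python
import math


def cox_stratified(times, events, x, strata, max_iter=50, tol=1e-10):
    """Fit a stratified Cox model with one covariate by Newton-Raphson.

    Ties use the Breslow approximation: each event's risk set is everyone in
    the same stratum whose time is >= the event time. Returns the hazard ratio
    exp(beta) and a Wald 95% confidence interval.
    """
    groups = {}
    for t, d, xi, s in zip(times, events, x, strata):
        groups.setdefault(s, []).append((t, d, xi))

    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for rows in groups.values():  # each stratum has its own baseline hazard
            for ti, di, xi in rows:
                if not di:
                    continue
                risk = [xj for tj, _, xj in rows if tj >= ti]
                w = [math.exp(beta * xj) for xj in risk]
                s0 = sum(w)
                s1 = sum(wj * xj for wj, xj in zip(w, risk))
                s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
                score += xi - s1 / s0                 # gradient of partial log-likelihood
                info += s2 / s0 - (s1 / s0) ** 2      # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break

    se = 1.0 / math.sqrt(info)
    return math.exp(beta), (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))


# Toy data: x = 1 might code male sex; strata mimic the 15-19 vs 20-39 split.
hr, ci = cox_stratified(
    times=[1, 2, 3, 4, 5, 6, 1.5, 2.5, 3.5, 4.5],
    events=[1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    x=[1, 1, 0, 1, 0, 0, 1, 0, 0, 1],
    strata=["20-39"] * 6 + ["15-19"] * 4,
)
print(hr, ci)
```

Stratifying, rather than entering age group as a covariate, lets the baseline hazard take a different shape in each stratum, which is why it sidesteps the proportional hazards violation for the youngest group while still yielding one hazard ratio per covariate.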
Adjusted for all variables shown, plus healthcare system.
Groups are mutually exclusive.
CI, confidence interval; HR, hazard ratio; ref, reference category.
Discussion
Among nearly 7000 AYAs diagnosed with cancer from 1992 to 2006 in two community-based healthcare systems, most remained enrolled and available for follow-up 1 year after diagnosis. Approximately two-thirds were available for follow-up at 5 years after diagnosis and about 40% remained available at 10 years. Survivors who disenrolled and thus were potentially lost to follow-up were more likely to be younger, male, and African American or Hispanic. Using secondary data in this manner may be a viable option for certain types of research into survivorship after AYA cancer. In the following paragraphs, we assess the relative strengths and limitations of primary data collection from study subjects and secondary data analyses.
The AYA HOPE Study gathered data directly from participants. After a recruiting process involving multiple mailings, telephone calls, and contact tracing, the response rate was 43% at 6–15 months after diagnosis. 6 In comparison, 89% of eligible subjects in our study remained enrolled 1 year after diagnosis and would have been available for outcomes studies using data from community-based healthcare systems similar to the two included in this study (i.e., members of the NCI-funded CRN). Thus, it is likely that the results from secondary analysis studies using data from community-based healthcare systems will be more generalizable than those from studies requiring primary data collection.
The availability of subjects for secondary analysis in our study declined after 1 year postdiagnosis. At 5 years, the proportion of individuals still enrolled was just over half of those diagnosed at younger ages and about two-thirds of individuals diagnosed at older ages. In contrast, follow-up in the AYA HOPE Study was more complete, with more than 90% of participants returning surveys 21–49 months after diagnosis. 6 It appears follow-up may be more complete when participants are actively followed as part of a study versus followed passively using electronic healthcare system data.
Our findings suggest that loss to follow-up via disenrollment is not random. While we identified no difference by cancer type, disenrollment was more common among males and among African American and Hispanic individuals. Disenrollment was also more common in younger individuals, whose insurance may be less stable as they age out of parental coverage and/or leave the area for school or work. Similar patterns of differential response by age, gender, and race/ethnicity were observed for the AYA HOPE Study. 6 Thus, differential loss to follow-up may be a concern in all types of observational studies.
Lessons for research on survivors of AYA cancers may also be drawn from the Childhood Cancer Survivor Study (CCSS), 13 whose participants were contacted 5 or more years after their diagnosis of a childhood cancer, when many were in their AYA years. As in both our study and the AYA HOPE Study, participation in the CCSS differed by age, gender, and race/ethnicity. 13 Resource-intensive recruitment efforts resulted in a completion rate of 62%, 14 which is comparable to the proportion of individuals in our study who were available for follow-up at 5 years and lower than the follow-up survey return rate in the AYA HOPE Study.
We expect that studies in which contact is initiated from the community-based healthcare system that treated an individual would have similar or better success than the AYA HOPE Study and CCSS. Response rates for studies conducted by healthcare systems are comparable to, or better than, those of studies conducted by other groups. For example, in a study of lung and colorectal cancer patients, the response rate for CRN sites was 61%, compared to 58% for a cancer registry-based site and 49–64% for five university-based sites. 15 In addition, community-based systems offer unique advantages for tracing survivors of AYA cancers. Some AYAs who disenroll from a system later reenroll in that same system, and thus current contact information may be readily available. Potential subjects covered under a parent's or guardian's insurance policy at the time of diagnosis might be located through parents or guardians who remain enrolled. Prospective cohort studies use similar approaches, so this follow-up could presumably be structured to meet human subject protection requirements.
Whereas it is common practice to make a qualitative assessment of bias in studies actively following participants, studies relying on electronic data should use quantitative methods 16 to assess the impact of loss to follow-up on study results. For example, as was done in our analysis, subjects are typically censored at their first disenrollment, with data from later periods of reenrollment disregarded. Alternatively, data from reenrollment periods could be incorporated into analyses and missing data imputed for periods of disenrollment. In a preliminary assessment among subjects in one system, 25% of those who disenrolled subsequently reenrolled (pers. comm., J. Chubak, February 2012). Another alternative is to use statistical methods, such as inverse probability-of-attrition weighting, 17 to estimate what the results would have been had those lost to follow-up remained available for analysis. In this method, subjects who remain under follow-up are weighted by the inverse of their estimated probability of remaining, which accounts for differential loss to follow-up. Comparing the results of traditional analyses with these two approaches provides an opportunity to assess quantitatively how results have been influenced by loss to follow-up.
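The weighting idea can be illustrated with a deliberately simplified, cell-based sketch. It assumes that, within each covariate cell (for example, age group by sex), those lost to follow-up resemble those retained; published implementations usually estimate retention probabilities with a regression model and may update them over time. The function names are hypothetical.

```python
from collections import defaultdict


def ipaw_weights(cells, retained):
    """Inverse probability-of-attrition weights estimated within covariate cells.

    cells    -- hashable covariate pattern per subject (e.g. ("20-24", "M"))
    retained -- True if the subject is still enrolled at the time of interest
    Retained subjects get weight 1 / P(retained | cell); lost subjects get 0.
    """
    n = defaultdict(int)
    kept = defaultdict(int)
    for c, r in zip(cells, retained):
        n[c] += 1
        kept[c] += r
    return [n[c] / kept[c] if r else 0.0 for c, r in zip(cells, retained)]


def weighted_mean(values, weights):
    """Weighted mean over subjects with nonzero weight (lost subjects may be None)."""
    pairs = [(v, w) for v, w in zip(values, weights) if w]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


cells = ["young", "young", "young", "young", "old", "old"]
retained = [True, False, False, True, True, True]
outcome = [1, None, None, 1, 0, 0]  # outcome observed only for retained subjects

w = ipaw_weights(cells, retained)
print(w)                          # [2.0, 0.0, 0.0, 2.0, 1.0, 1.0]
print(weighted_mean(outcome, w))  # weighted estimate, about 0.667
```

In this toy cohort the unweighted mean among retained subjects is 0.5, while the weighted estimate is about 0.67, because the weights restore the share of the younger cell, which lost half of its members. Comparing the two is the kind of quantitative check described above.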
Our study has both strengths and limitations. Funding considerations restricted data collection to only two healthcare delivery systems. The two systems were selected based on investigator interest and data availability. We have no reason to believe that the results would substantively differ had other systems been included. Our results may overestimate loss to follow-up in future studies, as higher proportions of young adults will probably remain enrolled for longer periods of time due to recent insurance reforms in the United States requiring insurers to offer parental coverage for dependents up to the age of 26. 18 As is true of any cancer registry, including SEER, it was not possible to limit the study to individuals who had completed therapy. The key strength of our study is the availability and completeness of demographic, cancer, enrollment, and death data during a contemporary treatment era. Future studies using existing data in community-based healthcare systems will be further strengthened by access to electronic medical records not yet available at the time of this study.
Conclusion
Existing electronic data from community-based healthcare systems offer an efficient approach to assembling samples of survivors of AYA cancer and obtaining information about those individuals before, during, and after their cancer diagnosis. Retrospective studies using these types of data are uniquely positioned to provide information about the outcomes of AYA survivors over a longer period and more rapidly than is possible in studies relying on participant contact. An important limitation of studies using healthcare data, loss to follow-up, can be quantified and systematically assessed using emerging statistical methods.16,17 Healthcare system data offer an alternative to studies requiring direct contact with participants.
Footnotes
Acknowledgments
We thank Bill Tolbert and Maqdooda Merchant for assistance with data collection, and Drs. Ed Wagner and Larry Kushi for their support of this work. We also appreciate Dr. Jessica Chubak's helpful comments on a draft of this manuscript. Our research was supported by the National Cancer Institute at the National Institutes of Health (U19 CA 79689, Ed Wagner, Principal Investigator).
Author Disclosure Statement
No competing financial interests exist.
