Abstract
Background:
A widely representative health system cohort with longitudinal specimen collection can serve as an efficient clinical biobank resource for multiple studies. Because the full scope of a health system cohort can include both health care workers and patients, enrollment and biobanking efforts may be designed to engage these specific participant populations.
Methods:
For a multisite health system cohort that initially enrolled health care workers and then expanded to enroll patients, we evaluated the relative success of initiatives that specifically targeted enrollment of various health care worker and patient populations. We also compared enrollment rates by engagement type (active vs. passive), modality (in-person vs. virtual), and venue (clinical-based vs. community-based). Across each method of engagement, we compared the conversion rate from study consent to biospecimen collection.
Results:
For recruitment activities involving health care workers, enrollment rates varied based on active versus passive (62% vs. 0.8%) and in-person versus virtual (9.6% vs. 0.8%) engagement as well as clinical-based versus community-based (65% vs. 3.9%) venues (p < 0.001 for all). For health care workers, the overall conversion rate from consent to biospecimen collection was 87%. For recruitment activities involving patients, enrollment rates also varied based on active versus passive (53% vs. 0.8%) and in-person versus virtual (62% vs. 0.8%) engagement, as well as clinical-based versus community-based (70% vs. 41%) venues (p < 0.001 for all). For patients, the overall conversion rate from consent to biospecimen collection was 75%.
Conclusions:
For studies aiming to build a biorepository resource involving both health care worker and patient participants, active rather than passive engagement methods are likely to achieve not only a higher rate of enrollment among those contacted but also a higher rate of conversion from consent to biospecimen collection. Further studies are needed to guide resource planning around biorepository-building capacity for specific study designs.
Introduction
Health systems are optimally positioned to engage and enroll two complementary types of potential participants into longitudinal cohort studies with biobanking: health care workers and patients. A cohort study design deliberately focused on enrolling health care workers and patients has several advantages. First, health care workers represent a relatively healthy, ambulatory subset of the general population and thus can readily serve as controls in case–control or case–cohort designs focused on a particular disease phenotype or outcome that would be enriched among patients, while accounting for potential confounders.1 Second, both health care workers with established employment and patients with established care at a given health system are likely to be readily available for repeated research engagement, including serial specimen collection over time. This includes the potential for collecting remnant samples,2,3 whether from employee health-related blood tests among health care workers or from clinically indicated blood tests among patients. Third, both health care workers and patients at a given health system are likely to have prospective clinical outcomes data captured within the electronic health record (EHR) of the same health system. This is because employee health insurance plans for health care workers tend to favor receiving care within the same health system, and because patients who consent to participate in a research study hosted at a given institution are more likely to return to the same institution for longitudinal and outcomes-related care.4,5 Notwithstanding these strategic advantages, most prior longitudinal cohort studies have focused predominantly on either health care workers or patients as the target study population.
Recognizing that these two populations differ intrinsically in how each might ideally be engaged, we evaluated various approaches to enrolling both populations into a single blended cohort, with the aim of informing how similar study designs and approaches could best be implemented in the future.
Methods
The overarching cohort study was originally established in May 2020 as a longitudinal study of risk, recovery, and resilience throughout the COVID-19 pandemic6–9 and now provides an infrastructure for a range of ancillary studies, including those focused on healthy versus unhealthy aging trajectories amid varying degrees of preexisting risk factor burden. All participants are recruited under a single set of study protocols designed to engage the broad range of potential study participants from across the multicenter Cedars-Sinai Health System, which serves an urban catchment area of ∼2 million residents of metropolitan Los Angeles, California. Given the intentionally broad study design, the enrollment protocols did not exclude participants based on the presence or absence of any particular clinical characteristic. All protocols were reviewed and approved by the Cedars-Sinai Medical Center institutional review board.
Recruitment procedures
For all aspects of cohort study development, recruitment and retention activities use a variety of approaches applied across a range of venues, as outlined below and in Figures 1 and 2.

Figure 1. Study operational overview.

Figure 2. Study operational workflow.
For health care worker participants, passive recruitment efforts have included printed letters sent in a mass mailing, letters sent via e-mail, and notices posted in periodic e-mail news bulletins sent to all employees. Active recruitment efforts have included engaging health care workers who were attending on-site employee wellness events or seasonal influenza and COVID-19 vaccination stations established by the employee health department, and partnering with in-hospital unit and ward staff supervisors to engage nurses, respiratory technicians, and other staff at or near their workstations, particularly around shift changes.
For patient participants, passive recruitment efforts have included printed letters sent in mass mailings on behalf of the study team or on behalf of collaborating clinical providers. The first wave of these mass mailings was distributed in late 2020 and early 2021. Active recruitment efforts have included engaging ambulatory patients attending community-based or regional work site wellness events. In addition, for clinical programs run by partnering or collaborating clinical providers, we preidentified patients eligible for enrollment and called them before their scheduled appointments to arrange in-person engagement in the clinic waiting areas on the day of their appointments. In the clinics run by these same providers, we have also approached patients de novo in the waiting areas. The first wave of this in-clinic engagement began in late 2021 and early 2022.
Health data and biospecimen collection
For all recruited and enrolled participants, the initial study encounter involves completion of an electronic survey regarding exposures, medical history, and health outcomes, along with a venous blood draw into two 10-mL ethylenediaminetetraacetic acid (EDTA) tubes that are processed for storage of EDTA plasma and buffy coat. Additional biophysical profiling includes noninvasive anthropometric, cardiac, metabolic, neurocognitive, musculoskeletal, and vascular assessments. Completion of surveys, biophysical profiling, and biospecimen collection is monitored through our electronic tracking platform, which consists of a custom-built integration between REDCap and LabVantage. Our infrastructure continually curates, validates, and harmonizes data from multiple sources, including standardized survey instruments, the EHR, research cores, pharmacy databases, and outcomes registries. All of these integrated data are managed by analytical workflows optimized to facilitate further expansion, ancillary studies, and new research questions (Fig. 2).
Statistical analyses
We analyzed all data collected on recruitment, enrollment, and biospecimen collection since the start of the cohort study. We specifically examined rates of successfully completed enrollment for any venue where the total number of potentially eligible enrollees could be determined or estimated. We defined completed enrollment as a signed consent form. For all participants successfully enrolled, we also examined rates of successful collection of at least one blood biospecimen, calculated as the rate of conversion from consent to biospecimen collection. We analyzed enrollment rates and conversion-to-biospecimen rates separately for health care worker participants and for patient participants. Within each of these two target groups, we also analyzed enrollment and conversion-to-biospecimen rates by the general approaches to enrollment outlined above. The primary focus was on active compared with passive strategies, and the secondary focus was on in-person compared with remote (i.e., virtual) engagement strategies and on clinical-based compared with community-based venues for engagement. For all enrolled participants, we additionally analyzed rates of retention, defined as continued engagement with the study in any capacity (e.g., through a follow-up blood draw, survey completion, or contribution of clinical data) at any timepoint after the initial date of enrollment, without withdrawal from the study. In addition to comparing enrollment rates between methods of engagement, we evaluated the relative success of different methods of engagement across subsets of the eligible study population. In particular, we compared enrollment rates by method of engagement between health care workers and patients, between older and younger individuals, and across racial/ethnic groups.
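The three outcome definitions above can be made concrete with a short sketch: enrollment among those contacted, conversion from consent to at least one blood biospecimen, and retention without withdrawal. The sketch assumes a hypothetical per-participant record. The `Enrollee` fields are illustrative only and are not the study's REDCap or LabVantage schema. It also assumes a separate count of individuals contacted per group and approach, which for passive strategies may be an estimate.

```python
import math
from dataclasses import dataclass


@dataclass
class Enrollee:
    """One consented participant (hypothetical schema, for illustration only)."""
    group: str                 # e.g., "health care worker" or "patient"
    approach: str              # e.g., "active" or "passive"
    blood_samples: int         # blood biospecimens collected so far
    followup_engagements: int  # post-enrollment draws, surveys, or data contributions
    withdrawn: bool


def safe_rate(numerator: int, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def summarize(enrollees, contacted, group, approach):
    """Enrollment, conversion, and retention rates for one group and approach.

    `contacted` maps (group, approach) to the number of potentially eligible
    individuals contacted; for passive strategies this may be an estimate.
    """
    enrolled = [e for e in enrollees if e.group == group and e.approach == approach]
    n_enrolled = len(enrolled)
    # Conversion: consented participants with at least one blood biospecimen.
    n_sampled = sum(1 for e in enrolled if e.blood_samples >= 1)
    # Retention: any engagement after enrollment and no withdrawal.
    n_retained = sum(1 for e in enrolled
                     if e.followup_engagements >= 1 and not e.withdrawn)
    return {
        "enrollment_rate": safe_rate(n_enrolled, contacted.get((group, approach), 0)),
        "conversion_rate": safe_rate(n_sampled, n_enrolled),
        "retention_rate": safe_rate(n_retained, n_enrolled),
    }


# Hypothetical example: 10 health care workers contacted actively, 4 consented.
example = [
    Enrollee("health care worker", "active", 1, 2, False),
    Enrollee("health care worker", "active", 0, 1, False),
    Enrollee("health care worker", "active", 2, 0, False),
    Enrollee("health care worker", "active", 1, 3, True),
]
print(summarize(example, {("health care worker", "active"): 10},
                "health care worker", "active"))
# → {'enrollment_rate': 0.4, 'conversion_rate': 0.75, 'retention_rate': 0.5}
```

Returning NaN for an empty denominator means that a subgroup with no enrollees does not raise an error when the same function is applied across age or racial/ethnic strata.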
We used the two-proportion Z-test to conduct between-group comparisons. We conducted all statistical analyses using the R statistical programming software (v4.3.1) and considered a two-tailed p < 0.05 statistically significant.
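For reference, the two-proportion Z-test can be written in a few lines. This sketch assumes the standard pooled-variance form without continuity correction; the text does not state which variant was used. In R, `prop.test(x, n, correct = FALSE)` reports a chi-square statistic equal to the square of this Z, with the same two-tailed p value. The counts in the example are hypothetical.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion Z-test (no continuity correction), two-tailed.

    x1/n1 and x2/n2 are, e.g., consents out of contacts in the two groups.
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 of equal proportions, pool both groups to estimate the variance.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    # Two-tailed p = 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (not study data): 60 of 100 vs. 40 of 100 enrolled.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.3f}, p = {p:.4f}")  # → z = 2.828, p = 0.0047
```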
Results
Overall characteristics of the study cohort are shown in Table 1. Compared with the patient subcohort, the health care worker subcohort was younger (46 ± 13 vs. 56 ± 16 years) and included a higher proportion of females (68% vs. 52%).
Table 1. Total Combined Cohort Characteristics
SD, standard deviation.
For health care worker participants, active engagement strategies were more effective than passive engagement strategies (Table 2 and Fig. 3), both for initial enrollment rates (62% vs. 0.8% enrolled of all contacted, p < 0.001) and for conversion from consent to collection of a biospecimen (88% vs. 59% sampled of all enrolled, p < 0.001). We also found greater success when using in-person compared with virtual methods of communication (9.6% vs. 0.8% enrolled of all contacted, p < 0.001; 77% vs. 59% sampled of all enrolled, p < 0.001) and when engaging in clinical-based compared with community-based venues (65% vs. 3.9% enrolled of all contacted, p < 0.001; 100% vs. 37% sampled of all enrolled, p < 0.001). The overall retention rate for health care workers was 99% (as of July 11, 2025).

Figure 3. Rates of enrollment (Panel A) and of conversion from consent to biospecimen collection (Panel B) are shown by engagement approach.
Table 2. Recruitment and Retention Rates by Method of Potential Participant Engagement
*p < 0.001 for all comparisons made using the two-proportion Z-test of enrollment rate differences between the two methods indicated.
For patient participants, active engagement strategies were more effective than passive engagement strategies (Table 2 and Fig. 3), both for initial enrollment rates (53% vs. 0.8% enrolled of all contacted, p < 0.001) and for conversion from consent to collection of a biospecimen (92% vs. 60% sampled of all enrolled, p < 0.001). We also found greater success when using in-person compared with virtual methods of communication (62% vs. 0.8% enrolled of all contacted, p < 0.001; 93% vs. 60% sampled of all enrolled, p < 0.001) and when engaging in clinical-based compared with community-based venues (70% vs. 41% enrolled of all contacted, p < 0.001; 96% vs. 76% sampled of all enrolled, p < 0.001). Less intensive engagement methods yielded lower patient enrollment rates overall. Even so, the absolute number of patients enrolled through passive approaches was substantial, given the large size of the patient source population (Fig. 4, Panel A). This in turn contributed a majority of the total biospecimens collected, particularly in the later years of study enrollment (Fig. 4, Panel B). The overall retention rate for patients was 95% (as of July 11, 2025).

Figure 4. Total cumulative enrollment by study population and by engagement approach is shown by total number of participants enrolled (Panel A) and by total number of blood samples collected (Panel B).
When comparing the relative success of engagement methods between health care workers and patients, we found no significant between-group difference in the relatively low enrollment rates attained through passive engagement methods (Table 3). However, active engagement methods achieved somewhat higher enrollment rates among health care workers than among patients (61.8% vs. 52.5%, p < 0.001). Virtual communication methods were similarly low yield in both health care workers and patients. In contrast, in-person communication methods led to substantially higher enrollment rates in patients than in health care workers (62.4% vs. 9.6%, p < 0.001). Clinical-based venues led to moderately high enrollment rates in both participant groups, although these rates were somewhat higher in patients than in health care workers (69.6% vs. 65.0%, p = 0.004). Community-based venues achieved much higher enrollment rates in patients than in health care workers (40.8% vs. 3.9%, p < 0.001).
Table 3. Comparison of Enrollment Rates by Method of Engagement and by Participant Group
p values were obtained from the two-proportion Z-test comparing enrollment rates between the health care worker and patient cohorts for the same method.
In analyses of enrollment rates by age group and race/ethnicity, results were generally similar, with some exceptions (Supplementary Table S1). Health care workers aged 50 years or older were more successfully engaged than their younger counterparts through active methods (75.1% vs. 56.4%, p < 0.001). Similarly, patients aged 50 years or older were more successfully engaged than their younger counterparts through in-person communication methods (65.1% vs. 58.4%, p < 0.001). For both health care workers and patients, non-Hispanic White individuals were generally more successfully engaged than individuals of other racial/ethnic groups (Supplementary Table S2). However, individuals of other racial/ethnic groups were enrolled at greater rates through in-person communication, clinical-based venues, and community-based venues (p < 0.001 for all).
Discussion
Our study has four main findings. First, active recruitment strategies were approximately 70–80 times more likely than passive strategies to lead to successful enrollment of either a health care worker or a patient participant, with slightly greater rates of engagement overall among health care workers. Second, in-person methods of engagement were roughly 10-fold more successful than virtual methods in health care workers and roughly 80-fold more successful in patients. Third, clinical-based settings for engaging in research were more effective than community-based settings, especially for health care workers but also for patients. Last, the conversion rate from consent to biospecimen collection was high (88%–92%) among actively engaged participants. It was lower (59%–60%) among participants whose initial contact with the study team was through a passive strategy or a remote or virtual method of communication.
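Fold differences such as these are ratios of two enrollment proportions. Using the rounded health care worker rates from the Results, 62%/0.8% ≈ 78 and 9.6%/0.8% = 12. The study does not report confidence intervals for these ratios. The sketch below computes one such ratio with an approximate 95% interval. It assumes the Katz log method for the interval and uses hypothetical counts rather than study data.

```python
import math


def rate_ratio(x1: int, n1: int, x2: int, n2: int, z_crit: float = 1.96):
    """Ratio of proportions (x1/n1) / (x2/n2) with a log-scale (Katz) Wald CI.

    x1 and x2 must be nonzero. z_crit = 1.96 gives an approximate 95% interval.
    Returns (ratio, lower, upper).
    """
    if x1 == 0 or x2 == 0:
        raise ValueError("zero counts make the log-scale interval undefined")
    ratio = (x1 / n1) / (x2 / n2)
    # Standard error of log(ratio) under the Katz log method.
    se_log = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    lower = math.exp(math.log(ratio) - z_crit * se_log)
    upper = math.exp(math.log(ratio) + z_crit * se_log)
    return ratio, lower, upper


# Hypothetical counts (not study data): 620 of 1,000 vs. 8 of 1,000 enrolled.
ratio, lower, upper = rate_ratio(620, 1000, 8, 1000)
print(f"rate ratio = {ratio:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

Because the passive arm has so few successes, the interval around a large ratio is wide. This is one reason rounded fold differences are best read as approximate.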
Our study extends prior reports of cohort-building efforts in several ways. Most prior large cohort studies have focused predominantly on a single type of participant population (e.g., health care workers, health system patients, or community-dwelling adults).10–12 In contrast, we developed our health system-based platform to engage two major eligible groups: the patients served and those employed to deliver or support their care. Many prior studies have also relied predominantly on one method of participant recruitment (e.g., mail survey studies, patient registration-based recruitment, or clinic encounter-based recruitment).12–14 In contrast, we implemented a multimodality strategy for enrolling across our two target participant populations. We deliberately engaged each population separately, in parallel, to tailor communications to the interests of each particular group (e.g., health care workers employed in a particular setting, such as the intensive care unit, or patients with a particular disease condition, such as advanced heart disease). Nonetheless, our multimodality approaches also shared common elements. They combined more resource-intensive active engagement strategies with passive strategies that are less resource intensive but expected to be lower yield. The magnitude of the difference in effectiveness between these strategies, which is needed to inform resource planning, was not clear until the current analysis.
The finding that active approaches were overall 70–80 times more effective than passive approaches is not surprising. In some settings, this advantage may still be outweighed by the relative cost of implementing an active rather than a passive operation. Indeed, in many situations, a passive strategy may be not only sufficient but preferable, particularly under severe time or cost constraints. Our results suggest that the difference between a more active and a more passive approach to study engagement is relatively similar for health care workers and patients. Health care workers may nonetheless be more readily enrolled, given their generally higher levels of health literacy, intrinsic trust in the medical establishment, and strong appreciation for a biomedical research mission.15–17 By contrast, the method of communication used to engage health care workers versus patients appears to make a difference. In-person or “live” methods of communication were 10 times more effective than virtual methods for health care workers but 80 times more effective for patients. This difference could reflect that health care workers in our study were, on average, younger than patients and thus perhaps more comfortable with and responsive to virtual modes of communication. It is well known that efforts to engage individuals in research are more likely to succeed when additional time and resources are directed toward optimizing rapport, trust, and transparency.18 Technology literacy, access, and familiarity are also well-recognized challenges when engaging older individuals, particularly those with comorbidities.15,16,19,20 Our finding that study engagement was more effective in clinical rather than community settings is also not surprising, particularly given that engagement occurred directly in or adjacent to the workplaces of the health care workers.
Importantly, conversion rates from consent to biospecimen collection were high across most modalities of engagement, a finding that has been similarly reported in prior studies.21,22 However, conversion was especially high when the initial engagement was through an active rather than a passive strategy. This finding can be used to guide resource planning, particularly for studies in which the primary aim is biospecimen collection, such as building a biorepository for a biobank cohort resource.
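As an illustration of this kind of resource planning, consider the expected yield per contact. It is the enrollment rate multiplied by the consent-to-biospecimen conversion rate, so the number of contacts needed for a target count of sampled participants is the target divided by that product. The sketch below applies the rounded health care worker rates reported in the Results to a hypothetical target of 1,000 sampled participants. It is expected-value arithmetic only and ignores per-contact costs, which the study does not report.

```python
import math


def contacts_needed(target_sampled: int, enrollment_rate: float,
                    conversion_rate: float) -> int:
    """Contacts needed so the expected number of participants with at least one
    collected biospecimen reaches target_sampled (expected-value arithmetic only)."""
    yield_per_contact = enrollment_rate * conversion_rate
    if yield_per_contact <= 0:
        raise ValueError("enrollment and conversion rates must be positive")
    return math.ceil(target_sampled / yield_per_contact)


# Rounded health care worker rates from the Results: active 62% enrolled and
# 88% sampled; passive 0.8% enrolled and 59% sampled. Target is hypothetical.
active = contacts_needed(1000, 0.62, 0.88)
passive = contacts_needed(1000, 0.008, 0.59)
print(active, passive)  # → 1833 211865
```

Under these assumptions, the same target would take roughly 1,800 active contacts versus more than 200,000 passive contacts. The comparison that ultimately matters for planning is this gap weighed against the cost per contact of each strategy.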
We were unable to analyze enrollment rates by disease group, given the broadly inclusive design of the study without any prespecified clinical eligibility criteria, but we did analyze enrollment rates by demographic characteristics. Results were generally consistent across age and race/ethnicity subgroups, with two exceptions. Older individuals were more successfully engaged than younger individuals by active or in-person communication methods. Non-White or Hispanic individuals were more successfully engaged through in-person communication as well as through clinical-based and community-based venues. These findings underscore that investment in certain resource-intensive methods is often needed to ensure or augment enrollment among groups of eligible participants who may be at risk of under-representation in clinical research for a variety of reasons.
Several limitations of the study merit consideration. Our cohort study design leveraged electronic systems that are available at our institution and may not be available in a similar capacity at other institutions. Our main study protocol is relatively streamlined, involving a short initial survey and a small number of EDTA blood tubes collected at study entry and at follow-up timepoints. This protocol was intentionally designed to minimize participant burden and maximize recruitment and retention while still allowing most immunological and molecular assays to be performed. However, such a streamlined protocol may not fit the needs of all studies, including alternate biobanking or biorepository protocols. Differences in actual or perceived participant burden could therefore limit the generalizability of our results. Our study was also initiated at the onset of the COVID-19 pandemic, although enrollment has continued into non-pandemic-related ancillary studies using the same study infrastructure and approaches to recruitment and retention.
Our findings support the feasibility and effectiveness of a multimodality engagement approach to building a combined cohort of health care workers and patients within a single health system. Health care worker and patient participants contribute different and complementary assets and scientific value to a cohort, and they also represent distinct populations with different preferences regarding study engagement. Active engagement methods were more effective than passive methods for both participant types. However, such resource-intensive methods are not always more efficient, depending on the balance between operational costs and the desired absolute number of participants enrolled. Nonetheless, for study designs that prioritize biospecimen collection, more active engagement methods are likely to yield a higher conversion rate from consent to biospecimen collection. Additional studies are needed to further understand the relative return on research investment and to guide resource planning around biorepository-building capacity for both general and specific study designs.
Authors’ Contributions
B.K. and Y.H.K. participated in the methodology, software, validation, formal analysis, investigation, data curation, writing—review and editing, visualization, supervision, and project administration. M.W. participated in software, validation, formal analysis, data curation, writing—review and editing, visualization, and project administration. W.W. participated in software, validation, investigation, data curation, writing—review and editing, visualization, and project administration. S.C.L. participated in software, validation, investigation, data curation, writing—review and editing, visualization, supervision, and project administration. M.B. participated in validation, investigation, writing—review and editing, visualization, supervision, and project administration. M.M. participated in validation, investigation, data curation, writing—review and editing, visualization, supervision, and project administration. J.L.K. participated in validation, investigation, data curation, writing—review and editing, visualization, and project administration. C.T. participated in validation, investigation, writing—review and editing, supervision, and project administration. G.A.M. participated in software, validation, investigation, data curation, writing—review and editing, and project administration. B.S. participated in investigation, writing—review and editing, supervision, and project administration. P.B. participated in software, validation, formal analysis, and writing—review and editing. T.T.N., B.F., K.G., M.B., Y.L., M.R., K.R., T.A.T., and E.W. participated in investigation, writing—review and editing, and project administration. J.N. participated in software, validation, formal analysis, data curation, and writing—review and editing. N.S. participated in software, validation, data curation, and writing—review and editing. M.W. participated in software, validation, formal analysis, data curation, writing—review and editing, visualization, and supervision. A.C.K. and J.E.E. participated in conceptualization, methodology, writing—review and editing, supervision, and project administration. K.S. participated in conceptualization, methodology, resources, writing—review and editing, project administration, and funding acquisition. S.C. and S.Y.J. participated in conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing—original draft, writing—review and editing, visualization, supervision, project administration, and funding acquisition.
Footnotes
Acknowledgments
The authors are grateful to all their patients and to the frontline health care workers in the Cedars-Sinai Medical Center health system who continue to be dedicated to delivering the highest quality care. The authors would like to thank staff of the Research Informatics and Scientific Computing Core, the COVID Recovery Program, the Comprehensive Transplant Center, the Advanced Heart Disease Center, the Center for Weight Management and Metabolic Health Clinic, the Angeles Clinic and Research Institute, the Cedars-Sinai Cardiology Medical Group, the Department of Radiation Oncology, the Cedars-Sinai Samuel Oschin Comprehensive Cancer Institute, the Center for Rheumatology, and the Institute for Research on Healthy Aging.
Author Disclosure Statement
S.Y.J. has served as a consultant for Sapient Bioanalytics, a company that supported the collection and processing of samples for this study. The remaining authors have no relevant potential conflicts.
Funding Information
This work was supported in part by Cedars-Sinai Medical Center, the Erika J. Glazer Family Foundation, and Sapient Bioanalytics, LLC.
Role of the Sponsors
The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the article; and decision to submit the article for publication.
Data Availability Statement
Requests for deidentified data may be directed to the corresponding authors (S.C. and S.Y.J.) and will be reviewed by the Office of Research Administration at Cedars-Sinai Medical Center prior to issuance of data sharing agreements, which are designed to ensure patient and participant confidentiality.
