Abstract
Introduction:
Biospecimens are an important part of conducting population-based research as they allow the linkage of biological information to other important clinical, social, and environmental factors, providing a more robust understanding of cancer prevention, treatment, and care options. It can be costly and labor-intensive to collect and process these biospecimens, making the use of preexisting, banked biospecimens an appealing option for researchers.
Objective:
This study examines the use of existing biospecimens in National Cancer Institute (NCI)-funded population-based cancer control research grants managed by the Division of Cancer Control and Population Sciences (DCCPS) as of January 2024.
Methods:
A total of 104 grants managed by DCCPS, NCI that involved the use of existing biospecimens were included in this analysis. Information that was abstracted from the grants included cancer type, biospecimen type, study design (intervention or observational), cancer continuum category (risk/etiology or survivorship), and named biospecimen resource.
Results:
The most commonly used biospecimens were blood products (64.4%), DNA (59.6%), tissue (53.8%), and RNA (26.9%). Risk-focused studies mainly used blood products, while survivorship studies favored tissue and RNA. There was also a notable difference in biospecimen use between studies of common versus rare cancers, with rare cancer studies using tissue and RNA samples less frequently than studies of common cancers.
Conclusion:
The variety of biospecimen types being used to examine a breadth of hypotheses related to cancer risk and survivorship emphasizes the value of biospecimen resources across the cancer continuum.
Introduction
Biospecimens are an important part of conducting population-based research as they allow the linkage of biological information to other important clinical, lifestyle, social, environmental, and molecular factors, providing a more robust understanding of cancer prevention, treatment, and care options. Biospecimens can also enable the discovery and validation of key biomarkers, increase our understanding of disease mechanisms, and contribute to improving treatment outcomes in personalized medicine. Biospecimen analysis techniques are rapidly advancing,1,2 further opening the possibilities of their use to elucidate knowledge gaps in cancer research. In light of these potential applications, biospecimens are often collected as part of population-based studies with two functions in mind: to address current investigator-initiated (prospective) research specific aims and to bank additional supplies for future, to-be-determined use. It can be costly and labor-intensive to collect and process the large number of biospecimens that are often needed in population-based research, making the use of preexisting, banked biospecimens an appealing option for researchers.
In 2014, Carrick et al. 3 published an analysis of National Cancer Institute (NCI)-funded grants active as of July 2012 and managed by the Division of Cancer Control and Population Sciences (DCCPS) to investigate the use of biological specimens in population-based cancer control research. In that article, the authors reported that 19.4% of the grants funded by NCI and managed by DCCPS utilized preexisting biospecimens (and did not collect new biospecimens), with DNA, tissue, and serum/plasma being the most commonly utilized biospecimen types among these grants. Through that analysis, several areas of improvement were identified to facilitate population-based cancer research with cost- and time-effective approaches to biospecimen use in mind. This included expanding the use of biospecimens in behavioral research studies, increasing the collection of rarer specimens and specimens from rarer cancer types, conducting validation studies of biospecimen collections to ensure stored samples are suitable, and continuing to foster the sharing of biospecimens.
Since then, bioinformatic and technological advances have empowered researchers to ask novel questions and permitted the use of a wider variety of biospecimen types, enabling a more comprehensive understanding of how molecular, clinical and environmental factors interact in cancer risk and outcomes. For example, spatial omics technologies have been developed, enabling analysis of tumor heterogeneity, profiles, evolution, and microenvironment. 1 Additionally, integration of tumor immunology with germline genetics and exposures has facilitated integrated analyses of peripheral blood circulating cells, cell-free plasma, tissue, feces, and urine in research aimed at better understanding strategies for cancer prevention and management. 2
The goals of this analysis were to (1) describe the current use of existing biospecimens in NCI-funded grants active as of January 2024 and managed by DCCPS, NCI, and (2) determine whether the areas identified 10 years ago to improve the use of preexisting biospecimens have been addressed.
Methods
In January 2024, the Information for Management, Planning, Analysis, and Coordination (IMPAC II) records (the National Institutes of Health’s [NIH] proprietary system containing information about all NIH extramural research projects) were queried using NCI’s Portfolio Management Application to identify active grants (excluding supplements) managed by NCI’s DCCPS. Data obtained from IMPAC II for this analysis included grant type, grant mechanism, and the fiscal year in which the competing grant application was awarded. Additionally, information available on biospecimen use was obtained for the identified grants. The biospecimen-related data are available for all active DCCPS grants starting in 2014 and are based on an annual review of each grant’s abstract, aims, research design, and methods. Biospecimen-related variables available include whether the grant application describes the collection of new biospecimens and/or the utilization of existing biospecimens (i.e., specimens that have been collected previously for other purposes) as well as the biospecimen type. Biospecimen type categories are DNA, cell lines/strains (includes Epstein–Barr virus transformed lymphocytes, but not purchased cell lines), red blood cells (RBC), feces, leukocytes/buffy coat, urine, peripheral blood lymphocytes (PBL; includes peripheral blood mononuclear cells [PBMC], isolated T cells, etc.), buccal/saliva/oral swab for DNA, Guthrie cards/blood spot, cervical swab, blood (whole blood, unprocessed blood components), serum or plasma, tissue culture, tissue (fresh, frozen, paraffin-embedded tissue blocks, slides, bone marrow, tissue microarray), RNA, and saliva for biomarkers. For this analysis, whole blood, serum/plasma, RBCs, PBLs, and PBMCs were combined into a single “blood products” variable. The tissue category mostly includes formalin-fixed paraffin-embedded (FFPE) tissue.
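The recoding into a single “blood products” variable can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the category labels and the function names `derive_blood_products` and `summarize_grant` are hypothetical, and only the grouping itself (whole blood, serum/plasma, RBCs, PBLs, and PBMCs) comes from the text above.

```python
# Hypothetical category labels; only the grouping itself comes from the text.
BLOOD_PRODUCT_CATEGORIES = frozenset({
    "whole blood",
    "serum/plasma",
    "red blood cells",
    "peripheral blood lymphocytes",
    "peripheral blood mononuclear cells",
})


def derive_blood_products(specimen_types):
    """Return True if any listed specimen type is folded into 'blood products'."""
    return any(s in BLOOD_PRODUCT_CATEGORIES for s in specimen_types)


def summarize_grant(specimen_types):
    """Collapse a grant's specimen list into analysis-level categories."""
    types = set(specimen_types)
    # Keep every category that is not part of the combined variable.
    summary = types - BLOOD_PRODUCT_CATEGORIES
    if derive_blood_products(types):
        summary.add("blood products")
    return sorted(summary)


if __name__ == "__main__":
    # A made-up grant record, used only to show the recoding.
    print(summarize_grant(["serum/plasma", "DNA", "whole blood"]))
```

Under this sketch a grant using both serum/plasma and whole blood counts once toward blood products, which is how a combined variable would be expected to behave when computing the share of grants using each specimen type.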
As of January 2024, DCCPS managed 1039 active grants. The focus of this analysis was on population-based cancer research studies that used only existing biospecimens. We defined existing biospecimens as biological materials that were collected prior to the funding period of the active grant included in the present analysis. These existing biospecimens were collected from individuals for research or clinical use and were stored in ways that allow researchers to request those stored materials for new studies. After excluding grants that did not include biospecimens (i.e., no biospecimens at all were included in the grant), those that included new collection of biospecimens for the purposes of the grant, or those that did not use existing biospecimens to test a population-based cancer research hypothesis (i.e., the existing biospecimens were used for methods development, etc.), 104 grants remained (Fig. 1) and are listed in Supplementary Table S1. Note that the analysis included all grant types (including exploratory investigator-initiated awards [e.g., NIH R21 activity code], small research investigator-initiated awards [e.g., NIH R03 activity code], program project awards [e.g., NIH P01 activity code], independent investigator-initiated awards [e.g., NIH R01 activity code]).

Fig. 1. Biospecimen grant review process. One thousand and thirty-nine grants were included. Of these, 652 were initially excluded because they did not include biospecimens (i.e., no biospecimens at all were included in the grant) and 257 were excluded because they involved new collection of biospecimens. Upon further review, 26 additional grants were excluded because while they included existing biospecimens, the existing specimens were not being used to test a population-based cancer research hypothesis (i.e., they were used for methods development, etc.). One hundred and four grants that only used existing biospecimens for population-based cancer research were the focus of this article.
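The counts in this review process can be checked with a short script. The sketch below is illustrative only: the function name `apply_exclusions` and the shortened reason labels are assumptions, while the counts are those reported in the text.

```python
def apply_exclusions(total, exclusions):
    """Subtract each exclusion in order and record the count remaining after each step."""
    remaining = total
    steps = []
    for reason, count in exclusions:
        remaining -= count
        steps.append((reason, count, remaining))
    return steps


TOTAL_ACTIVE_GRANTS = 1039
EXCLUSIONS = [
    ("no biospecimens", 652),
    ("new collection of biospecimens", 257),
    ("existing biospecimens not used to test a population-based hypothesis", 26),
]

steps = apply_exclusions(TOTAL_ACTIVE_GRANTS, EXCLUSIONS)
included = steps[-1][2]
# Share of all active grants that used existing biospecimens only.
share = round(100 * included / TOTAL_ACTIVE_GRANTS, 1)

if __name__ == "__main__":
    for reason, count, remaining in steps:
        print(f"excluded {count:>4} ({reason}); {remaining} remain")
    print(f"included: {included} grants ({share}% of {TOTAL_ACTIVE_GRANTS})")
```

Applying the three exclusions in sequence leaves 104 grants, or 10.0% of the 1039 active grants.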
Each of the 104 grants was coded independently by two coauthors who abstracted the following information from the study aims: cancer type, study design (intervention or observational), cancer continuum category (risk/etiology [defined as pre-diagnosis and referred to in this article as “risk”] or survivorship [defined as focused on the health and well-being of a person with cancer from the time of diagnosis until the end of life]), sex (male or female), and named biospecimen resource. If a consortium was named as the resource used, only the name of the consortium was abstracted. Discrepancies in coding were resolved by consensus or a third reviewer. For analysis purposes, rare cancers were defined as those with an age-adjusted incidence rate of less than 15 per 100,000 in the population using the NCI-supported Surveillance, Epidemiology, and End Results (SEER) Program Incidence Data, November 2023 submission (1975–2021). 4 This study did not require NIH institutional review board review because it is not human subjects research: the analyses were restricted to summaries of existing studies, did not include identifiable private information or biospecimens, and involved no interaction or intervention with human participants.
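The rare-cancer definition above amounts to a simple threshold rule, sketched below. The function name `classify_cancer` and the example rates are hypothetical placeholders introduced for illustration; they are not SEER values, and only the threshold of 15 per 100,000 comes from the text.

```python
RARE_THRESHOLD_PER_100K = 15.0  # threshold stated in the definition above


def classify_cancer(age_adjusted_rate_per_100k, threshold=RARE_THRESHOLD_PER_100K):
    """Label a cancer type 'rare' if its age-adjusted incidence rate is below the threshold."""
    if age_adjusted_rate_per_100k < 0:
        raise ValueError("incidence rate cannot be negative")
    return "rare" if age_adjusted_rate_per_100k < threshold else "common"


# Placeholder rates for illustration only; these are NOT SEER values.
example_rates = {
    "cancer_type_a": 4.2,
    "cancer_type_b": 15.0,   # exactly at the threshold, so not "less than 15"
    "cancer_type_c": 120.5,
}
labels = {name: classify_cancer(rate) for name, rate in example_rates.items()}

if __name__ == "__main__":
    for name, label in labels.items():
        print(name, label)
```

Because the definition is "less than 15 per 100,000", a rate of exactly 15 is treated as common in this sketch.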
Results
Of the 1039 active grants managed by DCCPS, 652 were excluded because they did not include any biospecimens (i.e., no biospecimens at all were included in the grant), 257 were excluded because they included new collection of biospecimens for the purposes of the grant, and another 26 were excluded because they did not use existing biospecimens to test a population-based cancer research hypothesis (i.e., the existing biospecimens were used for methods development, etc.). The remaining 104 grants (10.0%) used preexisting biospecimens only (Fig. 1). The 104 grants spanned multiple funding mechanisms and DCCPS program areas, 5 with the majority being R01s (63.5%; an R01 is the most common type of research grant awarded by the NIH for independent investigator-initiated research projects) and aligned with the “Epidemiology and Genomics” program area (94.2%; Table 1). Most of the grants utilizing existing biospecimens were risk-focused grants (53.8%), with 29.8% focused on survival or survivorship, and 16.3% focused on both risk and survivorship. The grants spanned many different cancer types, with breast cancer being the most common cancer of focus (24%), followed by colorectal cancer (11.5%) and blood cancers (11.5%). Of the 104 grants, 2.9% used biospecimens from childhood cancer survivors, 97.1% from adults, 31.7% from females only, and 57.7% from both males and females.
Table 1. Existing Biospecimens Utilized in Division of Cancer Control and Population Sciences Grants Active as of January 2024 (n = 104), by Study Characteristic (Row Percents)
Note: Blood products include whole blood, serum, plasma, red blood cells, peripheral blood lymphocytes, and peripheral blood mononuclear cells.
The four most commonly used biospecimens were blood products (64.4%), DNA (59.6%), tissue (53.8%), and RNA (26.9%) (Table 1). Other less frequently utilized biospecimens included urine (5.7%), cell lines/strains (4.8%), buccal cells/saliva for DNA (4.8%), Guthrie cards/blood spots (2.9%), cervical swabs (1.0%), tissue culture (1.9%), and saliva for biomarkers (1.9%) (Supplementary Table S2). No grant utilized existing fecal samples.
Use of blood products was higher among risk-focused only grants (75.0%) compared with those focused on survivorship (54.8%) or both risk and survivorship (47.1%), while use of DNA or RNA was higher among survivorship grants compared with the other cancer continuum type grants. Furthermore, use of tissue was higher among grants with a survivorship component (67.7%) versus risk-focused only grants (41.1%).
A similar percentage of the common versus rare cancer-focused grants utilized blood products and DNA; however, a higher percentage of the common cancer-focused versus the rare cancer-focused grants utilized tissue (68.9% vs. 33.3%) and RNA (32.8% vs. 19.4%). Blood product and RNA use were higher among grants focused on hematologic cancers versus those focused on solid tumors.
Among grants that referenced their source of biospecimens, there were over 150 uniquely named resources, with about half of the grants utilizing more than one resource (Supplementary Table S3). Biospecimen resources named in three or more grants (n = 18) are shown in Table 2. The Southern Community Cohort Study was named in almost one-fifth of the grants included in this portfolio analysis (17.3%). Other frequently named resources were the Multiethnic Cohort (n = 12), the Women’s Health Initiative (n = 11), the Nurses’ Health Study (n = 9), and the Black Women’s Health Study (n = 7).
Table 2. Biospecimen Resources Named in Three or More Active Division of Cancer Control and Population Sciences Grants Utilizing Existing Biospecimens, January 2024
Discussion
Rather than collecting new biospecimens for a study, which can be costly and time-intensive, using existing biospecimens for cancer control and population science research can be an appealing and effective option for researchers. Although it is often challenging to identify preexisting biospecimens that have the appropriate accompanying clinical, lifestyle, social, environmental, and/or molecular data, once biospecimens are identified, this information has the potential to address many key cancer-related hypotheses. This analysis of grants managed by DCCPS, NCI was conducted to understand the use of existing biospecimens in population-based cancer research. Most of the grants using biospecimens supported Epidemiology and Genomics research (94.2%, Table 1) and represented aspects of research across the cancer continuum including risk and survivorship.
The 2014 analysis described in Carrick et al. 3 found that 19.4% of active NCI DCCPS grants leveraged preexisting biospecimens only. Our new analysis found that 10.0% of grants active as of January 2024 used preexisting biospecimens only. The difference may be attributed to a slight increase in the overall percentage of active NCI DCCPS grants not using any biospecimens (Supplementary Fig. S1). The proportion of newly funded population-based cancer grants not utilizing any biospecimens has increased slightly, from around 53% in 2015 to around 65% between 2017 and 2024. One possible explanation for this trend is the increasing emphasis on the secondary use of existing data. NIH has encouraged the use of existing data, and starting in 2017, the NCI issued funding opportunities for secondary analysis and integration of existing data to elucidate cancer risk and related outcomes; these funding opportunities for investigator-initiated cancer-related research using existing data included NCI notices PA-17-239, PA-17-243, PAR-25-095, PAR-25-096, PAR-20-276, and PAR-20-277. What is clear is that biospecimens continue to be used in a substantial portion of grants across the entire analysis period, emphasizing the need to continue supporting initiatives for researchers to use both retrospectively and prospectively collected materials.
Recent advances in molecular technologies have allowed researchers to test a variety of previously unexplored hypotheses in cancer control research. In the 2014 analysis, 3 DNA was the most common biospecimen included in grants using existing biospecimens (72.0%). In the current analysis, blood products were the most common (64.4%), whereas approximately 52.0% of grants in the 2014 analysis used existing blood (serum, plasma, or blood). This increase might reflect development of novel technologies allowing the measurement of analytes in preserved blood specimens. For instance, one recent population-based study measured cotinine in previously collected maternal serum as an indication of smoking during pregnancy and the potential impact on testicular cancer risk in offspring. 6 Advances in technologies have also spurred a growing interest in integrating transcriptomics with genomics data to gain a better understanding of cancer risk, identify novel biomarkers, and better characterize tumor subtypes. Our current analysis found that existing RNA specimens were used in 26.9% of the grants, whereas the 2014 analysis 3 found that fewer than 5.0% of the grants used existing RNA specimens, which may support the idea of increasing interest in transcriptomics and genomics in population-based cancer research. Some examples of this use of existing RNA specimens were identified in articles resulting from grants included in this current analysis; for example, RNA was used for long noncoding RNA profiling as a potential regulator of tumor aggressiveness and prognosis 7 and for gene expression profiling of inflammatory factors as potential mediators of accelerated aging in breast cancer patients. 8
While rare cancers account for 25% of all new cases of adult cancers each year in the United States, it can be difficult to conduct epidemiologic studies of rare cancers due to challenges in achieving sufficient sample sizes. 9 This analysis provided information about the types of biospecimens used by funded rare cancer grants. Studies of rare cancers tended to use mostly blood products (63.9%), DNA (58.3%), tissue (33.3%), or RNA (19.4%). While the percentage of rare cancer studies utilizing blood products and DNA was similar to the common cancer studies, there were notable differences between rare and common cancer grants utilizing existing tissues (33.3% of rare cancer studies vs. 68.9% of common cancer studies) and existing RNA (19.4% vs. 32.8%). These differences are likely due to the availability (or lack thereof) of existing tissue and RNA resources for population-based rare cancer versus common cancer grants.
The current analysis showed that tissue is one of the most utilized biospecimen types (53.8% of grants utilized tissue). A recent effort to help facilitate future activities involving tissues for population-based cancer research is the SEER Virtual Tissue Repository (VTR).10,11 In 2024, the NCI established the SEER VTR as an infrastructure for researchers to obtain de-identified, but linked, tissue and data on cancer cases ascertained by the SEER Program cancer registries.10,11 This program is anticipated to facilitate future studies of cancer types that have traditionally been difficult to study in population-based studies, such as early onset cancers, rare cancers, cancer cases with unusual cancer outcomes, and cases with multiple diagnoses of a primary cancer.
In this analysis, the most common sources of existing biospecimens mentioned were long-standing (at least 20 years) epidemiology cohort studies with at least 50,000 enrollees at baseline. Three of the most commonly referenced sources were the Southern Community Cohort Study, the Multiethnic Cohort Study, and the Women’s Health Initiative (WHI) cohort, together with the WHI Life and Longevity After Cancer cohort. These are large, longitudinal studies with well-annotated specimens that have existed for many years, making them rich biospecimen resources relevant to population-based cancer research. In addition to large epidemiologic cohort studies, other sources of existing biospecimens included case–control studies, randomized controlled trials, consortiums, networks, and hospital or other specialized biobanks (e.g., Komen Tissue Bank, Children’s Hospital of Philadelphia Biobank). The sources of existing biospecimens were utilized by a combination of investigators who originally developed the resource and investigators unaffiliated with the original resource. Although these older collections of existing biospecimens have certain limitations, such as biospecimen quality considerations, limited accompanying data (such as epidemiologic, comorbidity, and treatment data), and data on potentially outdated clinical practices, treatments, and/or exposures that may have changed over time, they represent an important potential source of biospecimens. Researchers need to be cognizant of these potential issues when selecting biospecimens and designing their studies. Researchers also need to consider these limitations when selecting a study design, such as making appropriate case–control selections (e.g., matching cases with controls from the same time period), considering changes in exposure variables over time using longitudinal or multipoint analysis models, and utilizing biospecimens that are paired with robust data on specimen handling and storage and that come from biobanks with quality control validation data on biospecimen suitability.
Our previous analysis in 2014 noted several areas of improvement to better facilitate population-based cancer research using preexisting biospecimens. 3 Noted areas included expanding the use of biospecimens in behavioral research studies, increasing collection of rarer specimens and specimens from rarer cancer types, conducting validation studies of biospecimen collections to ensure stored samples are suitable, and continuing to foster sharing of biospecimens. Progress has been made on some, but not all, gap areas that were identified in 2014. In 2014, we speculated that behavioral cancer research studies would benefit from greater use of biospecimens as analytical methods evolve. For example, behavioral studies have benefited from the development of polygenic scores, which rely on DNA. 12 In this current analysis, we observed a slight increase in biospecimen use among NCI population science behavioral grants (grants that were managed primarily by the Behavioral Research Program within NCI DCCPS) from 24.1% in 2014 to 30.9% in 2024. As another example of the breadth of research disciplines utilizing biospecimens, we also identified a health care delivery research project that involved analysis of existing serum biospecimens to measure prostate-specific antigen levels, with the overall objective of optimizing prostate cancer screening by determining which men should be screened, when screening should occur, and how frequently screening should be performed. 13
Another area for improvement identified in 2014 was increasing the collection of rarer specimen types (such as feces and cervical swabs). Given the small number of grants utilizing them in the current analysis, it is possible that prospectively collecting these specimens on an as-needed basis is more efficient or that the current science for utilizing these samples for research purposes is still being developed, such as biospecimen collection/preservation techniques for fecal analyses14–16 and the development of specific collection strategies for gynecological biospecimens. 17 Another recommended improvement was to increase biospecimen collections for rare cancer types, which takes more concerted and coordinated efforts than the collection of more common cancer types. Significant individual initiatives have advanced rare tumor research capacity in part via increased biospecimen collection, including the My Pediatric and Adult Rare Tumor network, which was designed to conduct a comprehensive longitudinal Natural History Study of Rare Solid Tumors, 18 and the previously mentioned SEER Virtual Tissue Repository, which could be a rich resource for rare cancer tissue. The need for rare tumor biospecimen collections also seems to be increasingly recognized by the larger research community,19,20 aligning with recent noted trends in increasing U.S. and global rare cancer research efforts as evidenced by publications (reviewed in Vivelo et al. 21 ). Taken together, these recent efforts may indicate an increasing recognition of the importance of collecting and pooling rare cancer resources including biospecimens.
In 2014, we recommended that validation studies be conducted regularly to ensure stored samples are suitable for the desired analyses, thus increasing the potential utility of existing biospecimens. NCI and the International Society for Biological and Environmental Repositories have published best practices for biospecimens,22,23 which include establishing and implementing quality control protocols to ensure quality and reliability of the biospecimens for research. With the increasing number of molecular analysis platforms, a broader and more fitting recommendation may be to ensure that the samples are “fit for purpose”—stored in manners that are suitable and will yield reproducible data for their intended research question and with the selected technology or assay.24,25 Consulting the scientific literature or other resources such as the NCI Biospecimen Research Database26,27 or individual biobanks for studies that have used the biospecimens from the specific biobanks for similar assays/technologies/study questions can give insight as to whether the biospecimens could be fit for purpose. Biobanks may also facilitate pilot studies using a small subset of samples at no or minimal cost to ensure the stored specimens are fit for the desired purpose (H. Eliassen and J. Neville-Golden, personal communication, September 4, 2025).
Over the past 10 years, progress has also been made in terms of fostering the sharing of biospecimens. In an effort to facilitate the identification of potential sources of existing biospecimens and data, the NCI maintains a Specimen Resource Locator (SRL 28 ) and a database of cancer epidemiology cohorts (Cancer Epidemiology Descriptive Cohort Database [CEDCD] 29 ). The SRL is a database of biospecimen resources designed to help researchers locate possible samples for their investigational use, while the CEDCD has descriptive information about cohort studies, including types of biospecimens and enrollment counts. As of 2025, more than 55 cancer epidemiology cohorts in the CEDCD 29 have blood, buccal/saliva, tissue, urine, and/or feces in their repositories and have established processes in place for researchers to request access to these existing biospecimens and data. The CEDCD and SRL allow users to search by specimen type and other key variables to identify collections with their desired criteria. More recently, NCI DCCPS developed suggested biospecimen management and sharing plan elements for population science projects 30 to help foster sharing of existing biospecimens.
The observation that only 10.0% of NCI DCCPS-funded grants rely solely on existing biospecimens that were already collected and stored suggests a need to reconsider the biobanking model typically used in population-based studies. Biobanking efforts developed for population-based research studies typically focus on collecting specimens to meet planned goals but also place importance on storing unused specimens for unknown future use by investigators. These efforts tend to be associated with large epidemiology cohort studies that follow large groups of participants over time for the incidence of new cancers and cancer-related outcomes. Utilization of the stored biospecimens can facilitate new cancer research studies in a cost- and time-efficient manner. 3 However, too often, biobanked specimens are underutilized, leading to increased costs. 31 Understanding the reasons for underutilization of preexisting samples and the need for collecting new biospecimens could help inform future biobanking efforts. Reasons why investigators may opt to collect new specimens rather than use existing specimens could include administrative and regulatory requirements for accessing existing stored specimens and data that are perceived as excessive. New study designs are being explored to collect and store biospecimens from certain subsets of participants. 32 Shifting to more of a prospective biobank model (where biospecimens are collected at the time they are requested by researchers) instead of a classic biobank model (where residual biospecimens are collected for potential future uses) could be considered in some instances. 33 The SEER VTR was developed with this idea in mind; collecting and storing residual SEER FFPE tissues for potential future uses was not sustainable, and thus, the idea of a prospective SEER FFPE tissue request and distribution model was developed.
Conclusion
Through this analysis, we determined and described types of and typical uses for existing biospecimens in population-based cancer control studies. The variety of biospecimen types being used to examine a breadth of hypotheses related to cancer risk and survivorship demonstrates the value of biospecimen resources. Our analysis focused on a small set of grants—those using existing biospecimens in classic model biobanks. Biospecimens continue to be important to the aims of a substantial portion of grants across the entire analysis period, emphasizing the need to continue supporting initiatives for researchers to use both retrospectively and prospectively collected materials. The grants included in this analysis utilized many different sources of existing biospecimens. Given the potentially overwhelming number of sources of appropriate specimens and data and the need to facilitate sharing of existing biospecimens, we encourage investigators to include their resource descriptions in centralized databases like the SRL or CEDCD, conduct validation studies of stored specimens and make the results easily accessible, lower the administrative burdens (perceived or real) for accessing stored specimens, and consider implementing different models for biobanking.
Authors’ Contributions
R.H.: Conceptualization, methodology, data curation, writing—original draft, writing—review and editing, and project administration. L.G.: Conceptualization, methodology, data curation, writing—original draft, writing—review and editing, visualization, and formal analysis. D.D.: Conceptualization, methodology, data curation, writing—original draft, and writing—review and editing. S.M.: Conceptualization, methodology, data curation, writing—original draft, and writing—review and editing. S.N.: Conceptualization, methodology, data curation, writing—original draft, and writing—review and editing. D.M.C.: Conceptualization, methodology, data curation, visualization, writing—original draft, writing—review and editing, and supervision. All authors have reviewed and approved the article prior to submission.
Footnotes
Acknowledgments
The authors thank Scott Rogers for help with the initial identification of grants for this portfolio analysis.
Author Disclosure Statement
No conflicting financial interests exist.
Funding Information
The authors’ salaries were provided by the National Institutes of Health. No other funding was provided for this research.
Disclaimer
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Supplemental Material
References
