Abstract
Biospecimens are critical in driving health research. There is increased demand for scale and quality of biospecimens that in turn drives biobanking operational costs, influences utilization, and threatens the sustainability of individual biobanks. Biospecimen research has begun to inform the details of new biobanking standards and the steps of the biobanking process that are most important to focus on to achieve higher quality. This focus on quality is currently centered mostly on intrinsic features of biospecimens and their annotating data. This review highlights additional quality features that are important to researchers in determining the fit for purpose in their research. First, we define complex qualities as those that are mostly extrinsic to the individual biospecimen and data, and second, we provide data on the growth in demand for biospecimens with this type of quality in cancer research biobanks. Finally, we discuss why biospecimen complexity is a challenge for biobanks and utilization of existing collections, and provide examples of strategies biobanks can consider to improve their focus on this aspect of quality, as we predict that researcher demand for complex biospecimens will continue to expand in the future.
Introduction
Biospecimens, such as tissue and blood samples, are critical fuels that drive research to create better health. Biospecimens are also central to the realization of personalized treatment and management decisions in clinical practice. The importance of biospecimens in health research and the expansion of biobanks have risen steadily as research demand has increased. At the same time, there has also been a gradual appreciation of the need for not just more biospecimens but also better quality biospecimens. 1
The biobank community has addressed this need for higher quality by generating evidence through biospecimen science that highlights the effects of preanalytical variables and delineating best practices and standards for biobanking. Efforts to further refine these standards and implement programs to communicate standards are ongoing. 2 Biobanks, however, are just one of many infrastructure components within the broad arena of health research, and so the investment in biobanks in the public/academic sector is finite. For individual biobanks, the dual pressures of meeting demand for quantity and quality of biospecimens present a significant challenge to sustainability.
In the context of the research system as a whole, these increased requirements can be met only by redeploying existing resources, whether from within the biobank sector or from other sectors of the research system, or by increasing overall system costs. Alternatively, it may be possible to develop new models to support the cost of biobanks, such as cost sharing with stakeholders in health care, which may be considered as research and clinical biobanking begin to merge.
But for now, at the level of an individual biobank, these efforts to implement standards and improve quality clearly represent an additional immediate fiscal pressure on sustainability, on top of the operational pressures of increased demand and the social pressures of changing ethical and legal requirements. 3 Therefore, it is now more important than ever for biobanks to consider the value of the different types of standards they adopt, and to strike a balance between the numbers and types of biospecimens to collect and the aspects of quality in which to invest.
Standards are specifications that should lay out the characteristics or rules not just for a product but also a process to achieve or be recognized for a type of quality. Quality is an attribute or characteristic of something measured against similar entities that denotes the degree to which something approaches excellence or addresses the intended purpose. The intended purpose of biospecimens in driving health research is always evolving. Therefore, standards cannot be rigid; they should be founded on the best available knowledge and change as this evolves, and they should be framed with an eye to costs and benefits of health research. Similarly, quality in biobanking should be relevant to the research users and incorporate all relevant characteristics that relate to the research use.
The emergence of biospecimen science and the knowledge generated continues to play an important role in shaping standards to achieve high-quality biospecimens. However, there has been a tendency among biobankers to limit the focus to those aspects of quality that are integral to or directly related to the individual biospecimen. These intrinsic qualities include measures of the integrity and the status of the biospecimen, biobank data relating to the collection, handling, processing and storage of the biospecimen, and patient data relating to the host condition. For example, a recent draft of proposed ISO standards for biobanking was almost entirely focused on these intrinsic qualities. 4
But beyond intrinsic aspects of quality there are many other aspects that could be considered as features of quality that determine fitness for research purpose, and these characteristics are of equal or greater importance to the research user.
We have previously introduced the concept for researchers and proposed the term “complex” to distinguish these latter extrinsic qualities from the former “simple” intrinsic qualities (Fig. 1). 5 In this article, we expand further on the definitions of simple and complex quality, provide data supporting the growth in use of complex biospecimens, consider why biospecimen complexity is a challenge for biobanks, and provide examples of strategies biobanks can consider to improve their focus on this aspect of quality.

Figure 1. Facets of biospecimen quality. Diagram illustrates the different layers and categories of simple/intrinsic quality features and complex/extrinsic quality features associated with a biospecimen.
Quality: Simple and Complex Features
We have previously defined simple quality as the intrinsic features of a sample and its annotating data that determine its fitness for research purpose. 5 This sample can be a single biospecimen and the data can include patient diagnosis, sample type and collection information, specimen composition, and any of these parameters may be less or more comprehensive in terms of scope and density. For example, there may be several sample aliquots and many data fields concerning the patient, biospecimen collection and composition, the pathology represented in different areas of the biospecimen, the clinical features, treatments, and outcomes data.
There are also many quality metrics that can be measured in the course of biobanking these samples and data that determine the fitness for different research purposes. Some of these metrics are measures of the nature and state of the biospecimen, others of the nature, extent, and completeness of the data. There are also other aspects and features that are mostly extrinsic to but associated with each biospecimen, that in some instances are not so easily measured, but that also determine the degree to which it can be used or valued for an intended research purpose.
Complex qualities of biospecimens can be considered in several categories:
Governance of the use for research. This feature is mostly encompassed by the simple determinants applied by the donor (i.e., yes/no and narrow/broad) or a representative of the donor (i.e., waiver of consent from an ethics review board). But the nature of consent can also be conditional (involve a set of specific conditions) and complex (e.g., the basis and ability to recontact). There are also complex determinants applied by other individuals acting on behalf of the donor. For example, pathologists often play a role by determining which surgical tissues must be kept for future clinical use and which tissue or portions of tissues can be safely released to researchers. Some of these determinants are very important in influencing the degree to which the biospecimen is representative of the disease.
Health data relationships concerning the individual that are not directly related to the condition represented by the sentinel biospecimen. In general, after biobanking a biospecimen, limited data are gathered about other concurrent or future conditions and events that are not directly related to the initial disease. But, for example, as cancer therapies become more successful, research into the relationship between an initial event such as a tumor and other conditions and their treatments and adverse events needs to be better supported.
Host relationships between the individual biospecimen and other biospecimens representing the same disease from the same individual (i.e., linkage of biospecimens preserved in different formats, or across space and time, or across clinical events). An obvious feature that some biobanks have already incorporated into their operations is enhanced processing and annotation. This is more work but is a way to add a type of “complex quality.” For example, a biobank can extend the value of a biospecimen by linking it to several aliquots that are preserved in different ways so that questions can be asked of linked samples using different assays or by the addition of classifying annotation to improve “selectability.” Some biobanks routinely convert existing cases into complex biospecimens by additional processing to create tissue microarrays (TMAs) and addition of current and topical biomarker annotation because this enhances the ability to apply current selection criteria. Another complex feature of a sentinel biospecimen is linkage to other biospecimens across locations and clinical events. But although collection of multiple blood or tissue samples over time after diagnosis or treatment is a standard component in many clinical cancer studies and trials, it is not a focus for most institutional biobanks.
Population relationships between the individual biospecimen and other biospecimens from other comparable individuals (i.e., the degree to which a biospecimen is representative of a given population). A less obvious but very important feature of complex quality is representation of the population. Many disease-oriented biobanks have operated for years on the premise that collecting what is feasible, and then selecting and offering this in cohorts from their inventories, is good enough for research. But this means that our biobank inventories are very biased. There was no really good way to address this before, but now technical advances have largely unlocked the ability of researchers to apply complex assays to the formalin-fixed paraffin-embedded (FFPE) block, and so enhancing access to the relatively unbiased FFPE samples that exist in clinical pathology archives is a new opportunity that established biobanks should not ignore. For example, population cohort studies are maturing and creating linkages with pathology archives that contain specimens associated with later events, and population-based cancer registries are creating linkages to create virtual tissue repositories.
Evidence for Increased Need for Complex Biospecimens
We have recently experienced a change in the nature of enquiries and applications from cancer researchers to access the existing inventory of our poly-user institutional tumor biobank. There has been a decline in requests for simple biospecimens (as categorized earlier) and instead an increase in requests for complex biospecimens. In many instances the latter requests could not be fulfilled or could only be addressed by redirecting researchers to the local pathology department archives as our tumor bank inventory previously held mostly biospecimens associated with simple qualities. This has stimulated us to transform our biobank operations and to investigate if this is a wider overall trend in cancer research.
Our preliminary studies using an indirect approach and surrogate indicators suggest that such a trend toward increased biospecimen complexity can be detected in the cancer research literature. 5 To further explore and characterize the extent to which demand for complex biospecimens may be increasing, we have now considered some new selection and analysis strategies and pursued several additional literature review studies.
One strategy is to replicate but extend our previous literature review strategy. 5 The challenge of this indirect method is that there is no standardized terminology or keywords for the biospecimen complexity factors described earlier. For example, the term “biospecimens” is commonly used by biobankers but is not necessarily used by researchers using the specimens.
In addition, simply presenting the total number of articles with some type of biospecimen complexity does not provide a level of detail on what types of complexity are prevalent. Nevertheless, to broadly determine research trends this same strategy can be replicated using research topics that are likely to contain complex biospecimens. Research on tumor heterogeneity, for example, would involve comparison between matched samples to determine intertumoral differences, whereas studies of circulating protein biomarkers and tumor DNA (circulating tumor DNA [ctDNA]) generally involve the collection of multiple liquid biopsies or blood samples over time and may also involve comparison with a primary and metastatic tumor specimen.
As shown in Figure 2, a second study using the same indirect literature search, performed with additional relevant search terms (i.e., “heterogeneity,” “clonal evolution,” “ctDNA,” and “Liquid Biopsy”), shows trends similar to our previous findings. We found that the proportion of PubMed articles using keywords linked to complex biospecimens has increased significantly in the past 3 years (Fig. 2). Usage of ctDNA and Liquid Biopsy, both terms that are linked to the collection of multiple biospecimens over several time points, has grown exponentially.

Figure 2. Growth in biospecimen complexity-related topics as a fraction of all PubMed cancer publications. Search terms “heterogeneity,” “clonal evolution,” “ctDNA,” and “Liquid Biopsy” were used in combination with “Cancer” and filtered by human research. Results by year were divided by the total number of results for “Cancer” alone, and then normalized for each search term to results from 2000 to compare relative growth over time. ctDNA, circulating tumor DNA.
In 2000, ctDNA articles comprised 0.06% of all human cancer publications; by 2017, this share had increased roughly six-fold to 0.39%. Liquid Biopsy articles increased roughly 10-fold, from 0.03% to 0.32%. Similarly, the fraction of heterogeneity articles increased from 0.8% to 2.3% of cancer publications, and clonal evolution articles doubled from 0.1% to 0.2%. This indicates that studies using biospecimens from multiple locations may be increasing in cancer research.
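The normalization described for the indirect literature search (yearly results for a search term divided by the yearly results for “Cancer” alone, then expressed relative to the year 2000) can be sketched as follows. This is a minimal illustration, not the analysis code used for the study: the function name `normalized_growth` is hypothetical, and the raw counts are invented solely so that the resulting fractions match the ctDNA percentages reported in the text (0.06% and 0.39%).

```python
def normalized_growth(term_counts, cancer_counts, base_year=2000):
    """Return each year's term fraction relative to the base-year fraction.

    term_counts / cancer_counts map year -> number of search results.
    """
    # Fraction of all cancer publications that match the search term.
    fractions = {year: term_counts[year] / cancer_counts[year]
                 for year in term_counts}
    base = fractions[base_year]
    # Express every year relative to the base year (base year -> 1.0).
    return {year: fraction / base for year, fraction in fractions.items()}


# Hypothetical counts, chosen only to reproduce the reported fractions:
# 60 / 100000 = 0.06% in 2000 and 780 / 200000 = 0.39% in 2017.
cancer_counts = {2000: 100_000, 2017: 200_000}
ctdna_counts = {2000: 60, 2017: 780}

growth = normalized_growth(ctdna_counts, cancer_counts)
print(growth)
```

Normalizing to the base year puts terms with very different absolute frequencies on a common scale, so their relative growth can be compared on one plot.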
An alternative literature review strategy is to conduct a direct review of recent publications, where individual articles are assessed for the types of biospecimens and data used. One disadvantage of this strategy is the time and effort involved in reviewing sufficient numbers of articles to achieve an adequate sample size to discern trends across several time points. This is especially challenging when the study is conducted across different journals because there is inherent bias in the type and scope of research published in specific journals and variation associated with specialized research topics and also differing publication standards.
Nevertheless, for our third study we used a relatively unbiased direct approach in terms of nature of research and journal to attempt to document increasing biospecimen complexity. We identified the first 50 articles listed in PubMed at each 5-year interval between 2000 and 2015 (n = 200) that met the overall criteria that cancer research data were generated using human biospecimens. Additional criteria were also applied (e.g., English language) to facilitate direct review. The target number (n = 50 per time point) was chosen empirically on the basis that, using similar overall numbers, we have previously been able to identify significant trends in the evolution of different formats of tissue biospecimens and data. 6,7 For each publication, we evaluated reporting criteria pertinent to several of the complexity factors related to categories indicated in Figure 1 as well as impact scores.
We were mostly unable to detect significant trends toward use of complex biospecimens in this data set. However, we found that the overall use of combined FFPE and frozen tissues from the same patient in research has increased significantly in the literature since 2000 (Fig. 3). There has also been an increase in articles that use outcomes data. But the number of articles assessed at each of the time points, for a total of 200 articles, was too low to adequately assess trends for other types of complex data and for articles using tissue collected over time or sourced from different anatomical locations.

Figure 3. Relative proportions of cancer research publications using single- or multiformat combinations of biospecimens and change over time. A set of 200 articles (50 per time point) was studied and categorized on the basis of use of a single format or multiple formats of matched biospecimens from the same cohort of patients. There was a significant change in relative use of single format versus the combination of FFPE + frozen format tissue biospecimens (p = 0.006, chi-squared test for trend). FFPE, formalin-fixed paraffin-embedded.
These results nevertheless indicate that there has been an increase in use of matched FFPE and frozen tissue formats and outcomes data.
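For readers unfamiliar with the chi-squared test for trend cited for the format comparison, the sketch below shows one common formulation (the Cochran-Armitage form, with 1 degree of freedom) using only the Python standard library. It is an illustration under stated assumptions: the source does not specify which software or variant was used, the function name `chi_squared_trend` is hypothetical, and the per-time-point counts are invented placeholders, not the study data.

```python
import math


def chi_squared_trend(events, totals, scores=None):
    """Chi-squared test for linear trend in proportions (1 degree of freedom).

    events[i] of totals[i] articles have the feature at ordered time point i.
    Returns (chi2, p_value).
    """
    if scores is None:
        scores = list(range(len(events)))  # equally spaced time points
    n_total = sum(totals)
    p_bar = sum(events) / n_total  # pooled proportion across time points
    # Score-weighted sum of observed minus expected counts.
    t_stat = sum(x * (r - n * p_bar)
                 for x, r, n in zip(scores, events, totals))
    sum_nx = sum(n * x for x, n in zip(scores, totals))
    sum_nx2 = sum(n * x * x for x, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_nx2 - sum_nx ** 2 / n_total)
    if variance == 0:
        raise ValueError("trend test undefined: no variation in outcome or scores")
    chi2 = t_stat ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (NOT the study data): articles using FFPE + frozen
# tissue out of 50 reviewed at each of four 5-year time points.
events = [5, 8, 14, 18]
totals = [50, 50, 50, 50]
chi2, p_value = chi_squared_trend(events, totals)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Unlike a general chi-squared test of independence, this form uses the ordering of the time points, which gives it more power to detect a steady rise or fall with only 50 articles per time point.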
In our fourth study, we used the same direct approach, but to address bias across journals we selected a single journal, Nature Medicine. We also wished to address the issue created by the delay between initiating a research study and the eventual publication of an article, such that very recent increases in research demand may not yet be reflected in the general literature but may be reflected in articles at the “cutting edge.” This journal was, therefore, selected on the basis that it has frequently published research articles that involve use of human biospecimens and has a high impact factor and reputation such that it might be considered to represent a part of the cutting edge of health research.
We found that compared with 2000, where very few articles involved use of complex biospecimens, in 2015 just over half of all Nature Medicine publications using biospecimens had some measure of complexity associated with the biospecimens used, either in terms of multiple formats (56%) or more than one host location (67%) (Table 1). These results indicate that biospecimen complexity is increasing in articles in a single leading journal.
Table 1. Publications in Nature Medicine Utilizing Biospecimens from Cancer Patients, Categorized by Articles Where Biospecimens Were Associated with Multiple Preservation Formats or Obtained from Multiple Locations from the Same Patient
All primary research articles published in 2000 and each 5-year interval up to 2015 were evaluated and the subset that involved human biospecimens for cancer research (n = 28 articles) was assessed for the use of multiple biospecimen formats and/or multiple sites from the same patient.
In light of this finding of increasing use of complex biospecimens in a high impact journal we also reassessed the journal impact factor (JIF) percentile of each publication in our third study. Using aggregate numbers, articles with complex biospecimens and data had a higher impact on average than articles using simple biospecimens (Fig. 4).

Figure 4. JIF percentile from 2000 to 2015. The JIF was assessed in the set of 200 articles studied in Figure 3 and categorized into articles with multiple biospecimen formats, articles with matched biospecimens (multiple locations or multiple time points), and articles with simple biospecimens (one format, one location, and one time point). JIF percentile score was found to be statistically different between articles with simple versus complex biospecimen features (*p < 0.05, Mann–Whitney test). JIF, journal impact factor.
Articles with complex features such as multiple formats were published in journals with significantly higher JIF percentile scores on average. Altmetric scores are a relatively new metric that quantifies the impact of an individual article in terms of the online activity it generates, and they are complementary to traditional citation-based metrics. 7,8 Because Altmetric scores are a new metric, only Altmetric scores for 2015 articles could be evaluated. Nevertheless, in 2015 articles with multiple formats or with matched samples had higher Altmetric scores than articles without complex biospecimens.
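The three-way grouping used for the impact comparison (multiple formats; matched biospecimens from multiple locations or time points; simple biospecimens with one format, one location, and one time point) can be expressed as a small classification rule. The sketch below is only an illustration of that rule: the `ArticleBiospecimens` record and `categorize` function are hypothetical names, and the assumption that an article can carry both complex labels at once is ours, not stated in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleBiospecimens:
    """Counts extracted from one reviewed article (hypothetical record)."""
    n_formats: int      # e.g., FFPE, frozen
    n_locations: int    # anatomical sites sampled from the same patient
    n_time_points: int  # collections over time from the same patient


def categorize(article):
    """Return the complexity labels that apply to an article."""
    labels = []
    if article.n_formats > 1:
        labels.append("multiple formats")
    if article.n_locations > 1 or article.n_time_points > 1:
        labels.append("matched")
    # One format, one location, one time point: no complex feature.
    return labels or ["simple"]


examples = [
    ArticleBiospecimens(n_formats=1, n_locations=1, n_time_points=1),
    ArticleBiospecimens(n_formats=2, n_locations=1, n_time_points=1),
    ArticleBiospecimens(n_formats=1, n_locations=2, n_time_points=3),
]
for article in examples:
    print(article, categorize(article))
```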
Discussion
We believe that there is a growing trend, at least in cancer research, in research requirements for biospecimens with more complex qualities. We have experienced this recently in the type of research requests made to our tumor biobank. We have sought to confirm the wider impact of this trend by examining the use of complex biospecimens in published cancer research. Some evidence in support of this local observation can be found in a significant increase in publications on key topics that typically involve consideration of geographical and temporal factors and patterns of disease.
We were unable to confirm that this is a broad trend across the cancer research spectrum through direct assessment of a small random selection of publications for the 15-year period up until 2015. However, we did observe increased use of some types of matched tissue biospecimens across journals, an association between complex biospecimen usage and higher JIF, and a significant rise in the use of complex biospecimens linked by format or location in at least one leading journal.
We have already noted several factors that limit the ability to detect changes in use of types of biospecimens through both indirect and direct literature review. An additional limitation is the inherent delay that occurs between the initiation of a study and the publication of results. Studies can take several years to complete, articles several months to draft, and the time from submission to publication can also take many months (e.g., average time of >6 months for the 2015 Nature Medicine articles reviewed). The noticeable increase in articles utilizing complex biospecimens in a leading-edge journal since 2010 and in articles concerning topics such as ctDNA and Liquid Biopsy since 2015 across all journals could suggest that it is premature to expect to be able to fully document this change in demand.
Nevertheless, this potential change in research demand toward a preference for biospecimens associated with complex qualities may be a challenge for biobanks. For the typical disease-oriented or hospital-integrated research biobank, operations revolve around the collection of individual biospecimens that represent a clinical event such as a primary tumor diagnosis. Representative biospecimens are processed, stored, and annotated over time to create an inventory of cases associated with outcomes and that can be selected at a later point in time to support “retrospective” research questions. 9
Aspects of quality that are intrinsic to the biospecimen or the immediate clinical event are, therefore, mostly within the scope of the biobank processes and relatively feasible to address. By contrast aspects of quality that are extrinsic to the individual biospecimen are often more difficult to address and often involve a change or extension in the design of operations. This means that addition of complex qualities usually involves added costs and effort or significant adaptation of processes to redeploy resources within an existing organization, or it is left to new organizations to tackle. This is a challenge to sustainability.
One view is that current biobanks may be focusing too much on improving simple quality features and failing to adapt fast enough to this new trend. For example, the new ISO biobanking standard that has absorbed a lot of effort and attention from the biobank community is important but hardly addresses complex quality features.
And yet, given a choice today between accessing a very large cohort of primary tumor biospecimens collected at very high cost to the most demanding standards and accessing a very small cohort of cases with primary and matching recurrences, perhaps also accompanied by multiple blood samples collected under unknown or just simple standards, the users of our tumor biobank would overwhelmingly choose the latter. This is not to say that creating, disseminating, and harmonizing biobanking standards is unimportant, nor is it to endorse the fact that journals and reviewers pay little attention to the source and quality of the biospecimens from which data in publications are derived, even though this may be a significant contributor to scientific irreproducibility.
But the current focus of many biobanks on only improving simple quality features means that the demand from research for complex quality biospecimens is not being satisfied. This may lead to a shift in research funding from professional biobanks toward supporting the creation of new research biobanks by individual researchers and programs, which may run counter to current efforts in the biobank community to centralize expertise and promote and harmonize standards.
How can currently established poly-user biobanks adapt to this new trend? 10 One solution to free up existing budget resources to support a shift in the focus from simple to complex biospecimens is to carefully consider the balance of costs to the biobank and benefits to their primary users of implementing specific quality standards. Adoption of external quality assurance programs and standards is very important for improving the quality of research conducted on human biospecimens, but different types and levels of quality incur different costs.
The new ISO standard serves as an important reference point and will be valuable for many biobanks and types of research users. But there are other widely adopted external quality assurance programs associated with compatible standards and self-assessment tools that may be less costly to adhere to and, therefore, relevant to some types of research biospecimen collections. 11 Another strategy for biobanks is to transform their operations to concentrate on bespoke prospective cohort collection.
Prioritizing custom cohort collection for research groups over collection for the biobank's own inventory uses the biobank's consent and collection machinery to help researchers efficiently compile their own biobanks. One step in moving in this direction is to initiate a communication plan directed at local researchers to inform them that the biobank can provide a prospective collection service and has more to offer than just an inventory. To enable this shift, the biobank also needs to establish targets for its inventory and cap automatic accrual of some types of biospecimens to free up resources.
Another strategy for a tumor biobank is to refocus enrolment energy into obtaining consent for biobanking from representative populations diagnosed with cancer, independent of the likelihood of being able to secure fresh or frozen materials at the time of surgery. 12 This facilitates enacting processes to access FFPE materials in the clinical pathology archives. Active utilization of population-based cancer registries can identify specialized potential cohorts associated with clinical FFPE biospecimens, for example, cohorts with biospecimens that span clinical events and that can be linked by a dedicated effort by the biobank to compile the cohort for researchers. 13,14
In summary, biobanks are a research platform, research is dynamic and always evolving, and platforms have a life cycle. Compiling and maintaining a stock that allows cohorts of biospecimens with outcomes data and simple quality features to be rapidly and efficiently studied remains important. However, the research appetite for complex biospecimens that allow disease evolution in space and time to be studied is rapidly expanding. Biobanks need to rise to the challenge and adapt.
Acknowledgments
We gratefully acknowledge support for this work by the Biobanking and Biospecimen Research Program at BC Cancer (supported by the Provincial Health Services Authority), the Canadian Tissue Repository Network (funded by grants from the Institute of Cancer Research, Canadian Institutes of Health Research and the Terry Fox Research Institute, and from the Canadian Cancer Research Alliance), and the Office of Biobank Education and Research, University of British Columbia (supported by the Department of Pathology and Laboratory Medicine, University of British Columbia).
Authors' Contributions
All authors contributed to the ideas presented in this study and to the development and preparation of the article. All authors read and approved the final article.
Author Disclosure Statement
No conflicting financial interests exist.
