A Proposed Schema for Classifying Human Research Biobanks

Abstract

Human research biobanks have rapidly expanded in the past 20 years, in terms of both their complexity and utility. To date there is no agreed-upon classification schema for these biobanks. This is an important issue to address for several reasons: to ensure that the diversity of biobanks is appreciated, to assist researchers in understanding what type of biobank they need access to, and to help institutions and funding bodies appreciate the varying levels of support required for different types of biobanks. To capture the degree of complexity, specialization, and diversity that exists among human research biobanks, we propose here a new classification schema developed using a conceptual classification approach. This schema is based on 4 functional biobank “elements” (donor/participant, design, biospecimens, and brand), which we consider most important to the major stakeholder groups (public/participants, members of the biobank community, health care professionals/researcher users, sponsors/funders, and oversight bodies), and multiple intrinsic features or “subelements” (eg, the element “biospecimens” could be further classified based on preservation method into fixed, frozen, fresh, live, and desiccated). We further propose that the subelements relating to design (scale, accrual, data format, and data content) and brand (user, leadership, and sponsor) should be specifically recognized by individual biobanks and included in their communications to the broad stakeholder audience.

Introduction

Human research biobanks are collections of human biospecimens (tissues, blood, and body fluids and their derivatives) and associated data that are utilized for research purposes. These biospecimens and data are in some cases designated for research at the outset of the collection process, but often are collected for diagnostic procedures and later repurposed for research when residual material exists after diagnostic procedures have been completed (eg, clinical pathology archives). Human research biobanks are critical to the current drive for personalized medicine as they provide the necessary biospecimens to fuel the majority of research platforms. There are already numerous examples of medical advances that have relied upon human research biospecimens.¹,² With the emphasis now on development of tools to facilitate personalized medicine, the important role biobanks play in medical research will only increase. This predicted trend is supported by findings that over the past 2 decades there has been a significant expansion in biospecimen usage and, therefore, biobanking as an activity.³

The expansion of biobanking over the past 2 decades is largely due to evolution of research technologies and informatics capabilities. For example, the introduction of widely available antibodies coupled with invention of the tissue microarray enabled high-throughput screening of human tissues for suspected biomarkers.⁴ Technological advances such as these have meant that human biospecimens and data can be technically approached as readily as cell lines and animal models. This in turn has led to demand for biospecimens from the entire research spectrum, so that, whereas biospecimens were once the domain solely of translational research, they are now also critical for discovery and clinical research.

Another factor underlying the expansion in human research biobanking is the increased diversity of biobanks. Biobanks have historically been thought to comprise only formal entities within hospitals and research institutions, whereas it is now recognized that there are a variety of biobank types spanning a broad spectrum: from very small collections aimed at supporting a specific research project (mono-user biobanks), through collections associated with several research groups or clinical trials (oligo-user biobanks), to larger collection programs that formally identify themselves as biobanks or repositories (poly-user biobanks). A good analogy to this spectrum of biobank types is that of “tools in the research workshop” (Fig. 1). Biobanks belong to several general classes, but within these classes they have specialized aspects to their designs, relating to the range of research questions they intend to support. The range of questions in research reflects the necessary incremental steps that lead research from basic discoveries to mature strategies, tools, and therapies that impact clinical care. Cancer research provides a good example of the progression of research questions from the basic to the translational domain: what genes are different between tumors; what cells express a specific gene within tumors; what effect does expression of a gene have on response to therapy?

FIG. 1.

Human research biobanks span a spectrum of types: mono-user, oligo-user, and poly-user biobanks. A color version of this figure is available in the online article at www.liebertpub.com/bio

The Need for Biobank Classification

There is currently no accepted classification system for human research biobanks, largely because of the way that the discipline of biobanking has evolved since its origin. Research biobanks originated from the use of residual clinical samples. However, because this was a secondary, unanticipated use of these materials, the research community eventually became aware of the limitations on what clinical laboratories could offer. This resulted in a rapid, unregulated proliferation of independent research biobanks established to serve specific research interests. This new research-based approach to biobanking has expanded significantly to embrace a range of specialized components including frameworks (privacy and security), equipment (processing, annotation, and storage), operating procedures (biospecimen accrual, processing, annotation, storage, release, distribution, and tracking), clinical informatics (pathology, treatment, and outcome data), database structures (donor consent, inventory management tools, and query tools), policies (priorities and access processes), economic models (funding sources, user fees, and intellectual property), governance models (for strategy and operations), and personnel with specialized roles and training. This has meant that biobanking, which was once a limited activity within clinical pathology, has now evolved into a sophisticated discipline. However, this evolution has largely occurred outside of clinical departments in an uncoordinated way. This has created major constraints in quality and capacity, compounded by the diversity of biobank design in the absence of an accepted system for describing or classifying biobanks.

This absence of a biobank classification system has several consequences that are detrimental to the discipline. First, it makes the diversity of biobanks hard to appreciate, and lack of appreciation is a major factor contributing to the challenges faced by all biobanks in securing adequate resources and funding. Although it has been established that long-term institutional support is a requirement to ensure biobank sustainability,⁵,⁶ many institutional decision-makers fail to recognize the importance, complexity, and operational needs of biobanks within their organizations. Similarly, many scientists believe that effective biobanking amounts to no more than a freezer and a fraction of a technician's time to fill the freezer. Although this type of biobank does exist, the remainder of the biobank spectrum encompasses increasing levels of operational complexity and associated cost. There will undoubtedly continue to be a need for the more basic types of biobanks [eg, formalin-fixed paraffin-embedded (FFPE) tissues collected without standardized protocols and held within pathology archives],³ but many of the newer research technologies, such as genomic analyses, require fresh biospecimens that are processed in a highly standardized way and annotated with extensive clinical data.⁷ If institutions, funding bodies, and the scientific community fail to recognize and support the latter type of biobank, the biospecimens needed for research toward personalized medicine will not exist.

The absence of a classification system may also soon impact research users by hampering the ability to interpret and reproduce research results. Until recently, it has been sufficient to report data on a biomarker with only limited information about the biospecimens used and their source. However, as the recent focus on “biospecimen research” has already proven, knowledge of the preanalytical variables associated with biospecimens is integral to downstream interpretation of research results.⁸,⁹ As the importance of preanalytical variables has become appreciated, the biobank community has also realized that these need to be documented, and this is reflected by several new initiatives. For example, the “Standard PREanalytical Code” (SPREC) recommendations provide a framework to capture the key preanalytical factors impacting biospecimens.¹⁰,¹¹ It is only a matter of time until documentation of preanalytical variables becomes a requirement for research publications. The recently published “Biospecimen Reporting for Improved Study Quality” (BRISQ) recommendations provide a set of reporting parameters that should be included in biospecimen research publications, including preanalytical variables and other aspects of biospecimen-related research, such as clinical characteristics of participants and aspects of the accrual mechanism.¹¹ Related to this, it will also soon be important to define the biobank source to enable increased experimental reproducibility, as shown by a recent initiative to establish a “BioResource Impact Factor” (BRIF) as a quantitative means to document the use and quality of research arising from individual biobanks.¹² These 3 important initiatives, SPREC, BRISQ, and BRIF, all work toward increasing communication between biobankers and researchers to enable better interpretation of research results and increased biospecimen traceability and experimental reproducibility.
The purpose of this article is to make the argument for development of a similar tool with a different focus, intended to help communicate the different and diverse aspects of biobanks, particularly to the broader biobank stakeholder audience (ie, groups other than the biobankers and researchers).

Developing a Classification Schema

Classification is the process whereby ideas and entities, in this case biobanks, are recognized, differentiated, and understood. Determination of the overall entity and the categories requires that the common purpose is defined at each level. Three approaches to categorization have been defined, each considering how components and elements can be grouped within and between categories: “prototype,” “classical,” and “conceptual.”¹³ The prototype approach involves recognition of general prototypes to create the basic stems or concepts, and then the definition of grades of categories related to these. In the classical approach, biobanks would be divided by their differences into discrete classes. In the conceptual approach, biobanks would be grouped by looking first for inherent commonalities.

Human research biobanks have existed in “prototype” forms for more than 50 years. The initial prototype was perhaps the pathology archive, restricted to FFPE materials. The prototypes of other forms of biobanks then evolved in response to the demands for increased numbers of biospecimens and non-FFPE biospecimen formats. These additional biobank types became recognized by the 1980s through the prototypical designations (eg, blood biobanks and fresh-frozen tissue biobanks) (Table 1). Second-generation versions of these prototypes emerged in the early 2000s with improved oversight, access, and more diverse operational designs to address biospecimen needs in terms of format and quality, through standardized processes and annotation.⁸ Third-generation versions that incorporate knowledge from biospecimen science and better defined preanalytical variables are also now emerging.

Table 1.

A Prototypical Biobank Classification Schema

Classification	Categories	Characteristics
By prototype	Pathology archive	Collection consisting of clinical FFPE materials. These were the earliest forms of human biobanks.
	Blood biobank	Collection consisting of blood and its derivatives. These biobanks began following the advent of clinical blood tests and clinical drug trials.
	Fresh-frozen tissue biobank	Collection consisting of fresh-frozen tissue. These biobanks began in association with the introduction of clinical biomarkers (eg, estrogen receptor testing) and increased with the advent of newer molecular biology-based research methodologies (eg, polymerase chain reaction).

FFPE, formalin-fixed paraffin-embedded.

The experience gathered and the examples provided by these new generations of biobanks provide the elements to consider a new approach to classification. Currently, biobanks are most often described by a classical classification schema based on the type of research they intend to support: (1) population study biobank, (2) basic research biobank, (3) translational study biobank, (4) clinical trial biobank, and (5) pathology archive biobank (Table 2). An example of the application of this type of classification method is the Biobanking and Biomolecular Resources Research Infrastructure (BBMRI) division of its catalog of European biobanks into population-based biobanks and disease-based biobanks.¹⁴ As argued above, the increasing complexity and specialization within the discipline of biobanking and the need for recognition of the diversity of biobanks mean that classification of biobanks by other approaches would be more useful for the many stakeholder groups other than researchers.

Table 2.

A Classical Biobank Classification Schema Based on Type of Research a Biobank Intends to Support

Classification	Categories	Characteristics
By type of research they support	Population study	Focus of collection is on donors representing a defined population (normal, at risk, or with disease). Biospecimen focus is often blood/DNA samples. Usually corresponds to “large” size and “oligo” or “poly” user categories.
	Research project	Focus of collection is variable (donor and/or patients with disease type). Biospecimen focus is variable and often involves tissue selected to represent normal and/or diseased tissue. Accrual through retrospective or prospective collection. Usually corresponds to “small” size and “mono” user categories.
	Translational study	Focus of collection is on patients representing specific disease types and categories of disease accrued through either prospective or retrospective collection. Biospecimen focus is typically FFPE and/or blood samples with treatment and outcomes data. Usually corresponds to “moderate” size and “oligo” user categories.
	Clinical trials	Focus of collection is on patients with disease representing types and enrolled prospectively into clinical trials to test new therapeutic approaches. Biospecimen focus is mostly FFPE and/or blood samples linked to the trial treatment and outcomes data. Usually corresponds to “large” size and “poly” user categories.
	Pathology archive	Focus of collection is on cases treated by surgical approaches. Biospecimen focus is mostly FFPE samples. Usually corresponds to “large” size and “poly” user categories.

As a starting point, we can define the root class, “biobank,” stemming from the collection of human research biospecimens and associated data. All such biobanks have many shared elements around the activities of accrual, processing, storage, and release and the rules, laws, and standards for operations that involve external oversight bodies. There are also important shared elements across many biobanks around connecting with participants, standardization of quality, oversight of biospecimens, and linkage to clinical and outcomes data. These elements provide a rich set of features to serve as a basis for classification by a new classical or conceptual approach. However, the assumption of classical classification is that all major classes exist already and that all classes are mutually exclusive. As the recent emergence of live cell biobanking suggests, biobanks are still evolving in step with the science that they support, and so the conceptual approach is more appropriate.

The Elements of a New Conceptual Classification Schema

We propose that biobanks are classified by a conceptual approach based on 4 functional elements—donor/participant, design, biospecimens, and brand—and multiple subelements as outlined in Fig. 2. Each element will be discussed.

FIG. 2.

Proposed conceptual classification schema for biobanks based on 4 elements (donor/participant, design, biospecimens, and brand) and multiple subelements.

Donor/participant

This relates to the participant category, status, group, and disease.

The focus (category) can be on healthy control populations or diseased test subjects/participants. The participants may be alive or deceased (status), may belong to recognizable groups such as pediatric or adult (group), and may be healthy, at risk for disease (preclinical state), or exhibit disease (eg, noncancer, benign, or cancer).

Design

This relates to the accrual plan, scale, and the data format and data content.

Accrual can be prospective or retrospective, and the source can be direct from the participant or indirect via a medical procedure, surgery, or autopsy. In prospective biobanks, the biospecimens and data can be collected once or at multiple time points. The collection can also involve participants attending or deceased at a single health institution or multiple clinical centers. The scale of collection can vary from very small collections in single laboratories related to single studies, to moderate- and large-sized collections related to single and multiple research studies or collections. Scale is of course a relative term, but a recent review of biospecimen cohort size used in cancer research publications suggests that the average size has risen from 50 to 200 over the past 2 decades.³ Our own experience within the tumor biobanking field is that many current single studies and basic research laboratories host collections of <200 cases, whereas formal biobanks designed to support multiple studies with cohorts selected using several criteria from the biobank stock typically host >1000 cases. On this basis we propose to define small, moderate, and large as <200, 200–1000, and >1000 cases, respectively (circa 2011). The data format within the database can comprise elements that are linked or unlinked to the participant and biospecimen (linked/identifiable elements can be coded or uncoded, vs. irreversibly unlinked/anonymized biospecimens and data). The data content also varies in extent of annotation and can include elements that relate to participant, biospecimen, disease, treatment, and outcome. With unlinked designs, data collection typically occurs once and in association with biospecimen accrual, whereas with linked designs collection can occur before, during, or long after biospecimen accrual.
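The proposed scale thresholds can be illustrated with a minimal Python sketch. The function name `classify_scale` is our own illustrative choice and is not part of the schema; the cutoffs are the circa-2011 values given above and, as noted, scale is a relative term, so they would need revisiting over time.

```python
def classify_scale(n_cases: int) -> str:
    """Map a collection's case count to the proposed scale category.

    Thresholds follow the text: small (<200), moderate (200-1000),
    large (>1000). The function name is hypothetical.
    """
    if n_cases < 0:
        raise ValueError("n_cases must be non-negative")
    if n_cases < 200:
        return "small"
    if n_cases <= 1000:
        return "moderate"
    return "large"


if __name__ == "__main__":
    # Print the category for a few illustrative collection sizes.
    for n in (50, 200, 1000, 1500):
        print(n, classify_scale(n))
```

Note that the boundary values 200 and 1000 both fall in the moderate category, matching the stated range of 200–1000.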

Biospecimens

This relates to the type, target, and method of preservation of biospecimens.

The type of biospecimen includes blood, tissue, fluid, or cells; the target might be healthy or diseased tissues; and the preservation method might be formalin or other fixation, or cold storage or freezing at 4°C, −20°C, or −190°C. Alternatively, the collection might be focused on fresh materials (usually to obtain cell or fluid fractions), live cell preservation (usually through freezing), or desiccation.

Brand

This relates to the leadership, sponsors/custodians, and the user base associated with the biobank.

Biobanks are created by individuals (often expert researchers), groups, and institutions. This occurs with the support of dedicated sponsors, and as in nonresearch fields, the biobank often becomes commonly known by these brand labels, which have clear value in communication and recognition to key sponsors, stakeholders, and communities of users. The term “user” refers to the intended spectrum of users. This can be a single user (mono-user), several users such as in a research group (oligo-user), or multiple users (poly-user). The “user base” categories define many of the design features.
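The 4 elements and their subelements described above could be recorded for an individual biobank as a simple structured profile. The following Python sketch is one possible representation only: all class and field names are our own assumptions, the free-text fields stand in for subelements whose permitted values the schema does not enumerate, and the example biobank is entirely hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class UserBase(Enum):
    MONO = "mono-user"
    OLIGO = "oligo-user"
    POLY = "poly-user"


class Accrual(Enum):
    PROSPECTIVE = "prospective"
    RETROSPECTIVE = "retrospective"


class Preservation(Enum):
    FIXED = "fixed"
    FROZEN = "frozen"
    FRESH = "fresh"
    LIVE = "live"
    DESICCATED = "desiccated"


@dataclass(frozen=True)
class Donor:
    category: str  # eg, "healthy control" or "diseased"
    status: str    # "alive" or "deceased"
    group: str     # eg, "pediatric" or "adult"
    disease: str   # eg, "noncancer", "benign", or "cancer"


@dataclass(frozen=True)
class Design:
    accrual: Accrual
    scale: str                     # "small", "moderate", or "large"
    data_format: str               # "linked" or "unlinked"
    data_content: Tuple[str, ...]  # annotation domains held


@dataclass(frozen=True)
class Biospecimens:
    types: Tuple[str, ...]  # eg, blood, tissue, fluid, cells
    target: str             # "healthy" or "diseased"
    preservation: Tuple[Preservation, ...]


@dataclass(frozen=True)
class Brand:
    leadership: str
    sponsor: str
    user_base: UserBase


@dataclass(frozen=True)
class BiobankProfile:
    donor: Donor
    design: Design
    biospecimens: Biospecimens
    brand: Brand

    def short_label(self) -> str:
        """Summarize the user, accrual, and scale subelements in one phrase."""
        return (f"{self.brand.user_base.value}, "
                f"{self.design.accrual.value}, {self.design.scale} biobank")


# Hypothetical biobank used purely for illustration.
example = BiobankProfile(
    donor=Donor("diseased", "alive", "adult", "cancer"),
    design=Design(Accrual.PROSPECTIVE, "large", "linked",
                  ("participant", "biospecimen", "disease", "treatment", "outcome")),
    biospecimens=Biospecimens(("tissue", "blood"), "diseased",
                              (Preservation.FIXED, Preservation.FROZEN)),
    brand=Brand("Example Cancer Centre", "Example Foundation", UserBase.POLY),
)

if __name__ == "__main__":
    print(example.short_label())
```

Enumerations are used only where the text lists a closed set of values (user base, accrual direction, preservation method); the other subelements are left as strings so that new categories, such as those arising from live cell biobanking, can be added without redefining the structure.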

Considerations in the Application of a Conceptual Classification Schema

The elements and subelements outlined earlier provide a common terminology for building a classification schema for biobanks. We have recently applied the above schema, elements, and subelements as a basis for formulating a plan for certification of Canadian tumor biobanks. Inherent to this certification plan we needed to functionally classify biobanks to refine the different education and information needs around standardization of biospecimen quality. However, this is a relatively narrow context (ie, to guide development of a detailed program plan for tumor biobanks and mostly from the biobank stakeholder perspective) in which to test application of this schema. To be more widely applicable, a new biobank classification should be simple, flexible, and general. Classification should also serve as a communication tool with the various biobank stakeholders and therefore must relate to the need to communicate the elements and subelements that are most important and of most interest to all the major stakeholder groups. We propose an example of one approach describing how to apply this schema to an individual biobank to stimulate broader discussion.

Example of How to Apply a Conceptual Classification Schema

To apply this conceptual classification schema to our own biobank, the British Columbia Cancer Agency's Tumour Tissue Repository (BCCA-TTR), we created a matrix (Fig. 3) to display the different biobank stakeholders and elements. The stakeholders we considered comprised 5 groups: (1) public/participants, (2) members of the biobank community, (3) health care professionals/researcher users, (4) sponsors/funders, and (5) oversight bodies. All elements and their subelements were then organized into rows, and the subelements that we considered to represent the principal concerns for each of these stakeholder groups were identified with an “X.” This enabled a score to be calculated for each subelement, equal to the number of stakeholders for which each subelement was important (0–5). Using this approach, we identified the most important common elements for classification of our biobank to be brand (user, leadership, and sponsor) and design (scale, accrual, data format, and data content).
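The scoring step described above amounts to counting, for each subelement, how many of the 5 stakeholder groups are marked with an “X.” The Python sketch below shows one way this could be computed. The function names are our own, and the marks in `EXAMPLE_MARKS` are placeholders chosen only to exercise the code: they are not the values recorded in Fig. 3.

```python
from typing import Dict, Iterable, List, Tuple

STAKEHOLDERS = (
    "public/participants",
    "biobank community",
    "health care professionals/researcher users",
    "sponsors/funders",
    "oversight bodies",
)

Subelement = Tuple[str, str]  # (element, subelement)


def score_subelements(marks: Dict[Subelement, Iterable[str]]) -> Dict[Subelement, int]:
    """Score each subelement as the number of stakeholder groups marked for it (0-5)."""
    scores = {}
    for subelement, interested in marks.items():
        interested_set = set(interested)
        unknown = interested_set - set(STAKEHOLDERS)
        if unknown:
            raise ValueError(f"unknown stakeholder(s): {sorted(unknown)}")
        scores[subelement] = len(interested_set)
    return scores


def rank_subelements(scores: Dict[Subelement, int]) -> List[Tuple[Subelement, int]]:
    """Order subelements from highest to lowest score, ties broken by name."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Placeholder marks for illustration only; NOT the contents of Fig. 3.
EXAMPLE_MARKS = {
    ("brand", "user"): STAKEHOLDERS,
    ("design", "scale"): STAKEHOLDERS[1:],
    ("biospecimens", "preservation"): STAKEHOLDERS[1:3],
    ("donor/participant", "status"): (),
}

if __name__ == "__main__":
    for (element, subelement), score in rank_subelements(score_subelements(EXAMPLE_MARKS)):
        print(f"{element:18s} {subelement:14s} {score}")
```

Because each mark is a judgment about what a stakeholder group cares about, the resulting ranking is only as good as those judgments; the calculation itself simply makes them explicit and comparable across subelements.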

FIG. 3.

Matrix of biobank elements and subelements grouped into areas of interest for the proposed stakeholder groups. As an example of application of this matrix, those elements and subelements applicable to the BC Cancer Agency's Tumour Tissue Repository (BCCA-TTR) have been recorded (“X”). The column on the far right shows the score for each subelement, equal to the number of stakeholders for which each subelement was important (0–5).

Conclusion

The adoption of a more refined classification system is important for the future of human research biobanks. Given the wide spectrum of biobank stakeholders, a single classification schema may not serve all biobanks optimally, but it may be valuable and possible to agree upon the elements of a primary classification schema and secondary terms for common use. We propose that a conceptual approach based on shared functional elements should be the basis for such classification. Unlike other recent proposals to classify biospecimens for research users (SPREC),¹⁰ standardize biospecimen-related research reporting (BRISQ),¹¹ and report biobank impact factors (BRIF),¹² the approach presented here would allow for detailed biobank classification with common terminology that is of most value for operational communication between biobanks and their nonbiobank stakeholders. However, to be adopted, a classification method must be simple, broadly applicable, and use elements that are the least likely to change over the life of the biobank. Therefore, we conclude that brand and its subelements should be the core basis for classification to communicate with all stakeholders and that design, biospecimen, and donor elements should be selectively used as secondary classifiers. As an example, our own biobank name (BC Cancer Agency's Tumour Tissue Repository) incorporates brand and disease subelements, but for consistency with this proposed schema and to communicate more broadly across our stakeholders, we might consider a future modification of the title to “BC Cancer Agency Poly-user Tumour Repository” and adoption of the secondary terms “prospective” and “large” biobank when appropriate for the target audience.

Acknowledgments

This work was conducted as part of the BC Cancer Agency's Tumour Tissue Repository Program (supported by the BC Cancer Foundation), the BC BioLibrary (supported by a grant from the Michael Smith Foundation for Health Research), and the Canadian Tumour Repository Network (supported by a grant from the Institute of Cancer Research, Canadian Institutes of Health Research).

Author Contributions

Both authors contributed to the ideas presented here and to the development and preparation of the manuscript. Both authors read and approved the final manuscript.

Disclosure Statement

There are no conflicts of interest to disclose.

References

1. Zwick, Wallasch, Ullrich. HER2/neu: a target for breast cancer therapy. Breast Dis, 2000; 11:7–18.

2. Topol, Murray, Frazer. The genomics gold rush. JAMA, 2007; 298:218–221.

3. Hughes, Barnes, Watson. Biospecimen use in cancer research over two decades. Biopreserv Biobank, 2010; 8:89–97.

4. Kononen, Bubendorf, Kallioniemi et al. Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat Med, 1998; 4:844–847.

5. Hewitt. Biobanking: the foundation of personalized medicine. Curr Opin Oncol, 2011; 23:112–119.

6. Riegman, Morente, Betsou et al. Biobanking for better healthcare. Mol Oncol, 2008; 2:213–222.

7. Ginsburg, Burke, Febbo. Centralized biorepositories for genetic and genomic research. JAMA, 2008; 299:1359–1361.

8. Barnes, Parisien, Murphy, Watson. Influence of evolution in tumor biobanking on the interpretation of translational research. Cancer Epidemiol Biomarkers Prev, 2008; 17:3344–3350.

9. Barnes, Grom, Griffin et al. Gene expression profiles from peripheral blood mononuclear cells are sensitive to short processing delays. Biopreserv Biobank, 2010; 8:153–162.

10. Betsou, Lehmann, Ashton et al. Standard preanalytical coding for biospecimens: defining the sample PREanalytical code. Cancer Epidemiol Biomarkers Prev, 2010; 19:1004–1011.

11. Moore, Kelly, Jewell et al. Biospecimen reporting for improved study quality (BRISQ). Cancer Cytopathol, 2011; 119:92–102.

12. Cambon-Thomsen. BRIF: bio-resource impact factor. An International Working Group towards An Operational Index to Promote and Recognize Sharing of Biological Samples and Associated Data. Biopreserv Biobank, 2011; 9:98.

13. http://en.wikipedia.org/wiki/Categorization.

14. Riegman, de Jong, Llombart-Bosch. The Organization of European Cancer Institute Pathobiology Working Group and its support of European biobanking infrastructures for translational cancer research. Cancer Epidemiol Biomarkers Prev, 2010; 19:923–926.