Abstract
The importance of stimulating greater sharing of data for use and reuse in health research is widely recognized. To this end, the findable, accessible, interoperable, and reusable (FAIR) principles for data have been developed and widely accepted in the research community. Research biospecimens are a resource that leads to much of this health research data but are also a form of data. Therefore, the FAIR principles should apply to biospecimens. Nevertheless, there is a widespread problem of not sharing biospecimen resources that is clearly visible within the research arena. The impacts of this are likely to include diversion of precious research funds into compiling duplicate biospecimen cohorts, detraction from research productivity as researchers compete for and create duplicate resources, and deterrence of attempts to assess research reproducibility. This article explores some of the barriers that may limit availability of FAIR biospecimens. These barriers relate to the type of biospecimen collections and the characteristics of the custodians that influence their intention and interest in sharing. Barriers also relate to the ethical, legal, and social issues concerning collections, the research context of the collections, and cost and expertise involved in repurposing collections to enable sharing. Several solutions to increase sharing are identified. Some have recently been implemented, including enhancing biospecimen locators with tools to guide researchers and facilitating transfer of research collections to centralized biobank infrastructures at the conclusion of projects. New proposed solutions include improving search capabilities within publication databases, and introduction of evidence-based justifications for all new collections into peer-reviewed grant competition processes. It is recognized that there are both scientific factors and practical reasons that can impose limits to sharing biospecimens. 
However, funding availability, productivity, and progress in health research all stand to benefit from improved sharing of research biospecimen collections.
Introduction
In health research a major focus is to discover, translate, and validate new biomarkers. Biomarkers are commonly discovered and measured in biospecimens. Many biospecimens are generated through medical/surgical procedures performed in the context of clinical care when remnant portions of biospecimens become available for research. Other biospecimens are obtained directly from patients. Biospecimens are either used for immediate analysis or stored for future research. Research biospecimens represent an important data source by themselves but gain most value through linkage to annotating data.
In the data field, the concept of findable, accessible, interoperable, and reusable (FAIR) data has been developed to enhance sharing and reuse. 1 The FAIR principles have since been promoted by international implementation initiatives, 2 adopted by journals, 3 and strongly encouraged by national research funders.4,5
The FAIR principles are intended to apply not only to “data” in the conventional sense, but also to the algorithms, tools, and workflows that led to those data. 2 In this sense, it can be argued that biospecimens are not only a form of tool or resource that leads to data but are also a form of data. 6
Given the fuzzy distinction and overlap between biospecimens and data, we and others6,7 propose that the FAIR principles should apply equally to human biospecimens, in the same way as other forms of data. For example, after biospecimens have been deployed for the initial intended purpose, any remaining biospecimens should be FAIR. To be findable, available biospecimens need to be indexed by locators as individual specimens and/or cohorts. To be accessible, mechanisms should be established for custodians to provide available biospecimens to qualified researchers and access processes should be standardized and harmonized between biobanks. To be interoperable, those biospecimens that are fit for new purposes should have sufficient annotating data.
These data should describe the biobanking processes, including the standards under which the biospecimens were collected, processed, stored, and managed. Annotating data should also include information about intrinsic biospecimen qualities to allow biospecimens to be selectable for specific purposes. To be reusable, biospecimens should be associated with donor consent that enables sharing. Biospecimens should also be processed and stored in a format such that portions of intact biospecimens remain available to ensure that future technical and scientific advances can be applied to raw materials.
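As an illustration only, the four requirements described above could be expressed as a simple readiness check on a collection record. The following is a minimal sketch: the class name, field names, and messages are hypothetical and are not drawn from any existing biobank information system or standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BiospecimenRecord:
    """Hypothetical minimal record; all field names are illustrative assumptions."""
    specimen_id: str
    listed_in_locator: bool = False            # findable: indexed by a locator
    access_process_documented: bool = False    # accessible: access mechanism exists
    processing_standard: Optional[str] = None  # interoperable: standard/SOP followed
    quality_annotations: Dict[str, str] = field(default_factory=dict)  # interoperable
    consent_permits_sharing: bool = False      # reusable: donor consent enables sharing
    intact_portion_remaining: bool = False     # reusable: raw material still available


def fair_gaps(record: BiospecimenRecord) -> List[str]:
    """Return the FAIR requirements that the record does not yet meet."""
    gaps = []
    if not record.listed_in_locator:
        gaps.append("findable: not indexed by a locator")
    if not record.access_process_documented:
        gaps.append("accessible: no documented access process")
    if record.processing_standard is None:
        gaps.append("interoperable: biobanking standard not recorded")
    if not record.quality_annotations:
        gaps.append("interoperable: no intrinsic quality annotations")
    if not record.consent_permits_sharing:
        gaps.append("reusable: consent does not enable sharing")
    if not record.intact_portion_remaining:
        gaps.append("reusable: no intact portion remaining")
    return gaps


# Example: a specimen that is listed and consented, but otherwise undocumented.
example = BiospecimenRecord(
    specimen_id="S-001",
    listed_in_locator=True,
    consent_permits_sharing=True,
)
for gap in fair_gaps(example):
    print(gap)
```

A check of this kind would only flag missing documentation; it would not replace the governance and ethics review that sharing requires.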
As argued by Holub et al., 6 the original FAIR principles need to be modified further in the context of health research using biospecimens to address not just wider sharing, but also improved reproducibility and privacy protection. Holub et al. 6 proposed a number of specific additional FAIR-health principles to address these other aspects of sharing in the context of biospecimens, categorized into quality, incentive, and privacy principles.
The actions to attain FAIR status for biospecimens have, therefore, been clearly delineated. The FAIR-health extended principles also provide a valuable set of additional detailed requirements to enhance the process and value of sharing biospecimen collections. 6 However, there has been relatively little progress around implementing these actions to apply FAIR principles to biospecimens. This is despite the growing awareness and commitments made for other forms of data.8–12 For example, a search of the PubMed database reveals that <3% of >700 articles published since 2015 that mention the FAIR principles also include the terms “biospecimens” or “specimens.”
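A comparison of this kind could be approximated with two count queries against the NCBI E-utilities ESearch service. The sketch below only builds the request URLs and does not contact the service. The search terms shown are assumptions chosen for illustration and are not the exact search strategy used for the figures quoted above.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_query(term: str, min_year: int) -> str:
    """Build an ESearch URL that requests only the number of matching PubMed records."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",        # filter on publication date
        "mindate": str(min_year),
        "maxdate": "3000",         # open-ended upper bound
        "rettype": "count",
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Hypothetical search terms, for illustration only.
all_fair = build_count_query('"FAIR principles"', 2015)
fair_specimens = build_count_query(
    '"FAIR principles" AND (biospecimens OR specimens)', 2015
)

print(all_fair)
print(fair_specimens)
```

Dividing the count returned by the second query by the count returned by the first would give the proportion of FAIR-related articles that also mention biospecimens, under whatever terms are chosen.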
Impact of Not Adopting FAIR Principles for Biospecimens
The widespread problem of not adopting FAIR principles and encouraging more sharing of biospecimen resources is clearly visible within the research arena. 13 At the same time, the impact of failing to share has not been well explored and needs research. 14 Just as with the issue of scientific reproducibility, 15 there will be a range of viewpoints, 16 but we believe that there are many possible impacts from lost opportunities to share biospecimens. These include the financial cost of cohort duplication detracting from research funding available for individuals, the energy costs of fostering competition for infrastructure resources detracting from research productivity, the patient donor confidence costs of not maximizing use of generous donations, and the progress costs of hindering reproducibility efforts that would be enabled by collections being openly available.
Some of these impacts may increase as health research places a greater emphasis on real-world data sources17,18 and higher “complexity” biospecimens.19–21 For example, it will become more difficult to rely on combining multiple small biospecimen cohorts. It will also become more difficult to meet research demand for “complex” quality biospecimens.19–21 “Complex” qualities of biospecimens are features that are mostly extrinsic to the biospecimens, such as linkage to other biospecimens from the same donor from different anatomic locations or at different timepoints.19–21 These types of biospecimens are most commonly associated with biobanks created in the context of research studies, and their utilization in health research is increasing.19–21
Further consideration of the impact of not adopting FAIR principles for biospecimens and the specific factors and barriers to current sharing of biospecimen collections is warranted. Attention to these issues may help to support wider implementation of existing solutions and to formulate additional solutions to expanding the availability of FAIR biospecimens for research.
Barriers That May Limit Availability of FAIR Biospecimens
To identify the most important barriers to the availability of FAIR biospecimens, it may be useful to consider the types of collections that exist, the relationship between collection types and sharing, and the reasons that some types of collections are less frequently shared.
Types of collection
There are some biospecimen collections to which FAIR principles are routinely applied in the course of regular operations. 22 Nonetheless, there are many biospecimen collections to which FAIR principles are often not applied.
The spectrum and classification of types of research biospecimen collections have been discussed previously,22–24 where the simple three-type classification with respect to intended use is most relevant. 25 These collection types are designated poly-, oligo-, and monouser biospecimen collections.23,25 Polyuser research collections (most commonly identified as “biobanks/biorepositories”) are created with the intention of supporting multiple research studies by multiple external users. 24 So these collections usually strive to be “FAIR” and are usually accessible and to a greater or lesser extent findable.
But these collections are not necessarily interoperable or reusable. Oligo-user collections (frequently created as “core” research group resources) are often created in the context of clinical studies with the intention of supporting multiple future studies by a defined group of users. 24 So these collections are often shared within a closed group, but are also often not FAIR. Monouser collections (often created as individual project/laboratory collections) are usually created through a single study or group of related studies, with the intention of supporting a single user. 24 Creation of monouser collections in the context of new research programs first became common in the 1990s as biospecimen use in research increased.26,27
Although these collection types are sometimes reused (i.e., repurposed for future related studies by the user or provided to the user's collaborators), opportunities for transition to openly shared FAIR collections are increasingly common. This can occur with retirement of original research custodians, subject to participant consent preferences and appropriate institutional governance arrangements. If either of these aspects is not appropriately planned for in advance, this can lead to the loss of potentially valuable biospecimen cohorts.
Characteristics of the custodian
In all types of health research collections, biospecimens are obtained, used, and stored for research under the auspices of an individual or group of individuals identified as the custodian(s).7,28,29 Although there are national variations on the definition of custodian, it is often considered as an ethical term, 29 rather than a legal or regulatory term. 28 The concept of custodianship is also complex and the characteristics and biospecimen sharing behavior of custodians may be influenced by the institutional framework within which they work, among other forces. 7
Expanding sharing and implementing mechanisms to achieve each component of FAIR ultimately require custodians to embrace the principle of biospecimen sharing, and for custodians to either intend, or be required, to share. Sharing is implicit in the characteristics of custodians who choose to devote research effort to create polyuser research collections. In contrast, the concept of wider sharing of biospecimen resources can be more challenging for custodians associated with mono- and oligouser collections.
Research is a highly competitive arena, in terms of both pursuing avenues of investigation and securing funding support. The challenges of securing research funding to create collections that lead to publication output mean that it can be difficult for custodians to relinquish what could be useful research assets in the future. 13 The sense of ownership that can develop when creating any form of collection also contributes to selective pressure for competitive individuals to not share what could be viewed as hard-earned resources. 24
This can occur even where these resources ultimately derive from generous patient donors who may anticipate wider use of their donations. 30 The concept of biospecimens as forms of data and thus subject to sharing, rather than as private research resources and thus subject to intellectual property laws, is also new for some custodians. This means that the acceptance of the principle of sharing among such custodians is often low.
The ethical, legal, and social issues context of the collection
The custodian of a collection is generally identified by the relevant ethics review board approval.28,29,31 Sharing biospecimens is not a requirement for such approval, but if sharing is proposed, this can trigger an expanded review process.32,33 The overall principles for addressing legal and ethical frameworks for biospecimen or data sharing have been well delineated. 31 However, it should be noted that the ethical, legal, and social issues (ELSIs) and review board considerations are complex, vary greatly from country to country, and can only be summarized here.
If the custodian has a plan for biospecimen or data sharing at the outset, then there will usually be a requirement to obtain broad consent for research use from participant donors and to establish robust governance processes. If the plan for sharing is developed after initial ethics approval, then the ethics approval, including participant consent, will need to be revisited. Where the original participant consent does not allow for sharing, participants may need to be approached for reconsent. The case for a waiver of consent could also be developed, and may be granted if reconsent is not possible. Navigating and addressing these ethics requirements involves time and effort on the part of the custodian and can be viewed as a barrier to rendering collections FAIR.
Research context of the collection
Mono- and oligouser research biospecimen collections represent important resources within the research landscape, 24 with the main recent increase in biospecimen utilization for health research attributable to these types of collections. 21 Biospecimens in mono- and oligouser collections are often obtained directly from patients (e.g., blood samples) or from clinical pathology archives (e.g., Formalin Fixed Paraffin Embedded blocks). 34 Prospective collection directly from patients through clinical studies also often results in creation of higher demand “complex” quality biospecimen collections, as opposed to lower demand “simple” quality biospecimen collections. 21
Mono- and oligouser research collections are often created in the context of a specific project or study, to address a particular research question. These collections can be supported by time-limited grant funds that are secured by a principal investigator, where the projects are conducted by research trainees and supported by laboratory staff. However, specific features of these collections (e.g., specific patient groups, biospecimens with complex qualities, unpublished raw data generated by the laboratory, and feasibility of obtaining ongoing outcomes data) lend themselves to initial high expectation for reuse by the original research team in future investigations.
Given the significant initial investment in time and resources (that are often underestimated by researchers), combined with strong anticipation of potential future uses by the research custodian, there are often no plans for sharing. This is reasonable in the short term; however, plans for sharing are rarely revisited over time, until funding or storage resources become limiting factors. At this point, it may be challenging to address some of the basic interoperability and reuse requirements to render the biospecimens FAIR.
Investments in collection maintenance and repurposing
Although the initial purpose for most mono- and oligouser collections is usually predetermined, the documentation of collections is often nonstandardized. The organization of collections is also often undertaken by research trainees and staff who can be unfamiliar with biobanking processes. Furthermore, after collections are created through a funded project, there can be no dedicated plans or resources for maintaining the collection. As a result, continuity of management often fails through trainee and staff turnover.
Similarly, there is usually no consideration given to the costs of ongoing storage until either freezer space becomes limiting or freezers fail. Expenditure for replacement or additional freezers is rarely subjected to impartial valuation and utilization assessments and this has contributed to a proliferation of hallway freezers in many research institutions. 35
The presence of ongoing storage space is, therefore, often uncoupled from the existence of expertise and resources required for collections to be made available to external users. This means that the easiest default position continues to be to not share. Even when sharing is considered, there may be limited mechanisms to advertise the collection with sufficient detail. There may also be limited capacity to respond to enquiries from potential users and to review applications, to organize shipping of biospecimens and data, and/or to implement partial cost recovery.
There are also no disincentives introduced by the peer review system or by funders to duplication of existing collections. Faced with no effective marketplace to post and acquire such collections, and a common underestimation of the resources needed to create and maintain high-quality collections, researchers will continue to propose the creation of new collections even when adequate materials may already exist. 36
Solutions to Stimulate Availability of FAIR Biospecimens
Develop mechanisms to overcome practical barriers to finding research collections
The biobanking community, predominantly custodians involved in polyuser biobanks, has dedicated significant efforts in the form of national and international locators 37 to enhance researcher capacity to find biospecimen collections. However, mono- and oligouser collections remain very underrepresented on these locators.37,38 There are also no requirements in many jurisdictions for research collections to be listed on a public locator, and as a result, it has been reported that between 15% and 35% of biobanks choose not to be listed. 37 More importantly, our own experience is that collection locators are underutilized by researchers.
The visibility, usability, and purpose of collection locators, therefore, need to be carefully reassessed, especially in view of data suggesting researcher preferences for accessing local sources. 39 Two approaches to enhancing existing locators by connecting them with additional tools have been described.40,41 The first approach is the development of a complementary locator tool that guides researchers in their decision making on specific biospecimen needs. The tool then guides them in how best to obtain them. 40 Another approach to enhance the effectiveness of locators is to introduce a tool to help researchers to communicate and guide negotiations with biobanks after potential biospecimens have been identified within a locator. 41
An alternative form of locator that is already familiar to researchers is publication databases such as PubMed. We and others 42 have advocated for improved documentation of the sources and characteristics of biospecimen collections associated with publications to address variable quality in the end research. 43 Tools to identify sources and detailed features and meta-data of biospecimen collections such as Bioresource Research Impact Factor, 44 Biospecimen Reporting for Improved Study Quality,45,46 Minimum Information About BIobank data Sharing, 47 and Standard PREanalytical Code 48 have been created to be incorporated into articles. Although some journals have endorsed these tools, widespread adoption has not occurred.
One barrier to adoption may be the search capabilities within the publication databases. For example, in PubMed no single term or even set of terms can be used to effectively search for publications in a topic area associated with biospecimen research collections.19,20 Addition of relevant PubMed MeSH terms in combination with more detailed information about biospecimens used in generating publications might provide a more effective and familiar search tool than biobank locators.
Make research collections easier to share
When individual researchers create new biospecimen collections, there are often no requirements provided by funders or institutions to enhance future interoperability. Requirements or recommendations could include the need to adopt specific biobanking standards, collection protocols, or standards for the annotating and generated data. Although efforts have been made in some of these areas, the institution of at least recommendations or requirements in all these areas could lead to better harmonization of collections and improve interoperability.
Similarly, fostering utilization of local biobanking services that are also provided by centralized biobank infrastructures would facilitate the management of many collections. It would be important to maintain ongoing governance of collections by the original custodians, particularly while collections are actively supporting the original research project. At the same time, custodians can take steps to make research collections open access and/or to transfer governance of the collection, particularly at the end of the funded research project.
This could involve placing the collection under a new scientific advisory committee that could include the original custodian or would otherwise involve transfer of custody. 7 The early stages of this transition could include discounted storage costs for collections that adopt FAIR principles from the outset and/or that demonstrate clear plans for reuse of their biospecimens. In the later stages after transfer of governance, there could also be a grace period wherein the original custodian or research team has first choice of biospecimens. In this way, central biobank infrastructures can act as hubs for long-term storage and support continued storage and distribution of the highest value/demand research collections.7,24
The broader use of existing collections may not always be possible (e.g., when funder and regulatory requirements impose restrictions), or desirable (e.g., when provenance and quality details are not sufficiently well documented). It would, therefore, be necessary for biobanks serving as centralized infrastructures to develop criteria and guidelines to identify the highest value collections. We have previously proposed some high-level criteria encompassing ethical factors, collection factors, operational issues, and research value, which could be developed further. 34
Modify biobank and researcher behaviors in favor of sharing
Funders and institutions should stimulate more discussion and promotion of FAIR and FAIR-health principles where biospecimens are concerned. As part of these activities, it is important to educate researchers on the real costs/efforts required to establish new collections and alternative options to obtain the necessary biospecimens. 40 At the same time, institutional biobanks should focus more on providing services to support research collections with the types of data that researchers require, or are likely to require, in the future. 36
Previous studies have demonstrated researcher preferences for “local collections,” partly because of the ability to directly link to clinical data. 39 This should be a (further) incentive for institutions to provide support for mechanisms that enable locally collected and administered biospecimen cohorts to be made available and shared.
An alternative that is available to funders is to promote the use of existing biobank infrastructure services and to raise the bar for approving funding to create new collections. Peer grant review panels scrutinize the importance of projects and the need for and the details of the methods of analysis of a biospecimen collection to address the research question. However, limited consideration is typically given to alternative biospecimen sources when new collections are proposed. This is partly because the question of biospecimen provenance is typically not part of the peer review mandate. But this is also because these alternative biospecimen sources are typically difficult to identify.
In some cases, a new biospecimen collection will be integral and essential to the research, such as where projects seek to identify biomarkers associated with responses to new or experimental therapies. However, in other cases, justifications for creating new collections can be less compelling. Funders should introduce requirements for evidence-based justifications for all new collections into peer-reviewed grant competition processes. These requirements should be combined with supporting better visibility and findability of existing collections and encouraging utilization of existing collections and biobank infrastructures.
Limitations to Sharing Biospecimens
As we have outlined, there may be widespread health research benefits in improving the extent of biospecimen sharing. However, we recognize that there is a limit to the value of sharing all biospecimens. There will be ongoing requirements for fit-for-purpose biospecimens that cannot be met by existing biospecimen collections. One example of this is biospecimens associated with new treatments that are collected in association with clinical trials to identify biomarkers of response to therapy. Furthermore, not all existing biospecimens will be suitable for reuse for future research. In addition, whereas data sharing has no practical limit on reuse, and some biospecimen products such as nucleic acids have fewer limits, biospecimen reuse is limited by the amount of physical tissue remaining.
Although every effort should be made by researchers, biobankers, funders, and other stakeholders to optimize the use of locators, standardize mechanisms to access biospecimens, develop annotating data, and strategically plan for sharing, it is inevitable that there will not be 100% overlap between past research collections and future research needs. Therefore, the purpose of advocating for FAIR biospecimens is to optimize use of biospecimens under known system constraints.
Conclusions
The FAIR principles encapsulate a set of features and characteristics that promote the action and value of sharing, enhancing ability and capacity to share resources. The main barriers that may limit availability of FAIR biospecimens relate to mono- and oligo-type biospecimen collections and the characteristics of the custodians with respect to their intention and interest in sharing. Barriers also relate to the ELSIs concerning collections, and the cost and expertise issues involved in repurposing collections for sharing.
Recently implemented solutions to increase sharing include enhancing biospecimen locators with tools to improve their function, beyond initial identification of biospecimen collections. Facilitating transfer of research collections to centralized biobank infrastructures at the conclusion of projects may also help to overcome cost and expertise issues. In addition, we propose some new solutions, such as enhancing findability of biospecimens through improving search capabilities for biospecimen cohorts within existing publication databases.
Furthermore, the introduction of evidence-based justifications for all new collections into peer-reviewed grant competition processes could be considered by funders. National research funders have already taken similar steps to promote adoption of FAIR principles for data, with recent addition of requirements to provide data management plans as part of grant submissions. 49
Failure to adopt FAIR principles for research use of biospecimens may have many negative impacts, just as has already been recognized in the context of data.14,50 Although both scientific factors and practical reasons can impose limits to sharing biospecimens, funding availability, productivity, and progress of research would greatly benefit from improved sharing of research biospecimen collections. Broader adoption of FAIR principles for biospecimens would be an important step in the right direction.
Acknowledgments
P.H.W. gratefully acknowledges support for this study by the Biobanking and Biospecimen Research Services Program at BC Cancer (supported by the Provincial Health Services Authority) and the Canadian Tissue Repository Network (funded by grants from the Institute of Cancer Research, Canadian Institutes of Health Research, and the Terry Fox Research Institute). J.A.B. gratefully acknowledges support of the NSW Health Statewide Biobank from the NSW Office of Health and Medical Research, NSW Health Pathology, and the Sydney Local Health District. A.R. gratefully acknowledges the support of the Menzies Centre for Health Policy and Economics at the University of Sydney.
Authors' Contributions
A.R. and J.A.B. contributed to writing—review and editing and P.H.W. was involved in conceptualization and writing—original draft.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This study was supported, in part, by grants from the Institute of Cancer Research, Canadian Institutes of Health Research (Grant# 374111), and the Terry Fox Research Institute (Grant# 1066), to P.H.W.
