Abstract
Background:
Establishing targets for case accrual is an important component of a strategic plan for a biobank. We have previously assessed overall patterns of biospecimen use in cancer research publications in selected journals. Here we extend this analysis to consider patterns of biospecimen use in relation to cancer research programs developed by individual investigators.
Methods:
We selected three individual cancer research investigators whose independent research programs began circa 1986, have been characterized by extensive use of human tumor biospecimens, and have primarily involved translational research in breast, lung, and ovarian cancer. We analyzed biospecimen and data usage in their career publications, categorized by number, type, and format, and assessed the accompanying annotating data for conformance with BRISQ reporting criteria and ethics-related criteria.
Results:
Biospecimens were used in 313/474 (66%) of the publications analyzed. The average number of biospecimens used by these research programs increased six-fold, from less than 1000 in 2001–2003 to greater than 6000 in 2010–2012, and the average cohort size per article also increased, from approximately 50 to 200 cases per study, over the same period in most biospecimen categories (p<0.05). The relative proportions of the different biospecimen formats used have varied significantly and continue to change with the emergence of digital biospecimen-derived data. In these three translational research programs, BRISQ elements relating to ‘Biobank’ categories were significantly less well reported for biospecimens used in publications than data corresponding to ‘Clinical chart’ categories (p<0.0001).
Conclusions:
This study shows that overall use of biospecimens in cancer research has increased significantly and that the relative use of different biospecimen formats has varied dynamically. It also confirms our previous findings on patterns of biospecimen use and on incomplete reporting of relevant data elements, which has not improved in the past decade.
Introduction
These first two phases of biobank development have been predominantly biobank-centric, focused on product-oriented areas and internal operations. As the pressure to be sustainable grows and costs increase,9,10 biobanks also need to focus on enhancing their value and measuring their impact from the perspective of external stakeholders (e.g., the public, funders, and a broader spectrum of researchers).11
Increased orientation of biobanks towards a customer focus requires better data about the market served in order to formulate strategies and to match the projected future demand for both biospecimens and data.11 Examining historical patterns of biospecimen use, and of the data used in reporting on biospecimens, may provide relevant knowledge to biobanks. This type of information is needed to plan future collection activities and to predict both the level of quality required12 and the extent of annotating data needed13 with the biospecimens.
We have previously examined these questions by conducting literature surveys of publications in cancer research journals across 2 decades.14,15 We have assessed overall patterns of biospecimen use in cancer research publications in these selected journals and followed this by assessing the quality of reporting of biospecimen-related data. However, we recognize that these analyses may be subject to biases including the specific scope and changing impact factors of the limited number of journals selected for review. Here we seek to confirm and extend these analyses by considering patterns of biospecimen use in relation to the evolution of three cancer research programs developed by individual investigators and their publications in a wide range of journals over a similar time period.
Materials and Methods
Three Canadian cancer investigators were selected for review of their publications on the basis of their similar professional trajectories, known emphasis on translational (as opposed to basic or clinical) research, close association with biobanks as both creators and users, and distinct focus on three defined tumor systems (breast, lung, and ovarian). Although each investigator followed a different training period, all became established in faculty positions and initiated an independent research laboratory program in the period 1986–1988; we therefore restricted our analysis to publications generated between 1986 and 2012. Of the 474 papers assessed, 313 (66%, published in more than 40 journals) used biospecimens or digital biospecimen data originally derived from biospecimens. Digital data included data generated by the coauthors in a previous study, but often comprised data from other laboratories and investigators who were not authors of the article, such as results of clinical laboratory analyses (e.g., BRCA1 testing results from blood samples) and data from past studies or web databases (e.g., microarray analysis of fresh and frozen tumor tissue samples).
We excluded articles relating to collaborations in clinical trials and publications to which we did not have full access. For each article, the number and format of biospecimens and/or digital biospecimen data used were documented. Biospecimens were classified by type as ‘hematological’ or ‘tissue’, and tissue biospecimens were further subclassified by format as “fresh” tissue, “frozen” tissue, “FFPE” tissue, and “TMA” (FFPE tissue processed into a microarray format), as previously defined.14 In general, the number of cases used for TMA construction was taken as the biospecimen number, except when it was clearly stated that more than one biospecimen per case was sampled to create the TMA (e.g., separate TMA sampling of the normal and tumor components of a case).
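The TMA counting rule can be expressed as a small function. The Python sketch below is illustrative only: the record layout (`ArticleUse`), its field names, and the functions `biospecimen_count` and `article_total` are hypothetical and are not the authors' tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Biospecimen categories as defined in Methods: one hematological type plus
# four tissue formats (TMA = FFPE tissue processed into a microarray).
CATEGORIES = {"hematological", "fresh", "frozen", "FFPE", "TMA"}


@dataclass
class ArticleUse:
    """Hypothetical record: one biospecimen category used in one article."""
    category: str
    # Number documented for the article; for TMAs, the number of cases arrayed.
    number: int
    # Hypothetical field, set only when the article clearly states that more
    # than one biospecimen per case (e.g., separate tumor and normal
    # components) was sampled to create the TMA.
    stated_biospecimens_per_case: Optional[int] = None


def biospecimen_count(use: ArticleUse) -> int:
    """Apply the counting rule: TMA cases count once unless stated otherwise."""
    if use.category not in CATEGORIES:
        raise ValueError(f"unknown category: {use.category!r}")
    if use.category == "TMA" and use.stated_biospecimens_per_case:
        return use.number * use.stated_biospecimens_per_case
    return use.number


def article_total(uses: list[ArticleUse]) -> int:
    """Total biospecimens documented for a single article."""
    return sum(biospecimen_count(u) for u in uses)
```

For example, an article using 50 frozen samples and a TMA of 100 cases with separately sampled tumor and normal components would total 50 + 100 × 2 = 250 biospecimens under this rule.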
In addition, we assessed the frequency and reporting status of accompanying data using the 15 Tier 1 BRISQ criteria that are recommended for reporting as a data standard. We also added three non-BRISQ elements related to the reporting of research ethics status. For analysis of BRISQ data, articles using biospecimens generated by these three investigators since 1993 were considered. A subset of 119 articles published since 1994 in which the investigator was first or senior author was selected, to restrict this subanalysis to articles from the last 18-year period in which the investigator was most likely to have been directly involved with the datasets used in the research. The reporting status of each Tier 1 element and of the three non-BRISQ ethics elements was categorized using previously described definitions (see Table 2 in reference 8) as follows: 1) Not reported (i.e., no mention of the data element even though relevant); 2) Unclear, inferred, or assumed (i.e., information relating to the data element is unclear, can only be indirectly inferred by an expert, or requires an assumption by the reader); 3) Reported (i.e., the data element was clearly stated); or 4) Not applicable (i.e., the data element was not applicable to the biospecimen type or application). Statistical analyses were performed using GraphPad Prism software and included chi-squared and ANOVA tests where appropriate.
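The four reporting-status categories lend themselves to a simple frequency calculation. The Python sketch below assumes, for illustration only, that reporting frequency is the share of applicable articles (those not marked “Not applicable”) in which an element was “Reported”, with an option to count the “Unclear, inferred, or assumed” category as reported, loosely analogous to combining ‘assumed’ with ‘reported’ for storage duration in Fig. 5. The denominator choice and the names `Status` and `reporting_frequency` are hypothetical, not the authors' stated method.

```python
from collections import Counter
from enum import Enum
from typing import Optional, Sequence


class Status(Enum):
    """Reporting-status categories from Methods (definitions from reference 8)."""
    NOT_REPORTED = 1
    UNCLEAR = 2  # unclear, inferred, or assumed
    REPORTED = 3
    NOT_APPLICABLE = 4


def reporting_frequency(statuses: Sequence[Status],
                        count_unclear_as_reported: bool = False) -> Optional[float]:
    """Fraction of applicable articles reporting one data element.

    Assumption: 'Not applicable' articles are excluded from the denominator.
    Returns None when the element is not applicable to any article.
    """
    counts = Counter(statuses)
    applicable = len(statuses) - counts[Status.NOT_APPLICABLE]
    if applicable == 0:
        return None
    reported = counts[Status.REPORTED]
    if count_unclear_as_reported:
        reported += counts[Status.UNCLEAR]
    return reported / applicable
```

Excluding “Not applicable” articles keeps an element from appearing under-reported merely because it does not apply to a given biospecimen type.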
Results
Overall trends in biospecimen use
Overall, we reviewed the biospecimens used in 313 of 474 articles (66%) from three investigators, published in more than 40 different journals. The articles arising from these research programs described analysis of data from 43,249 biospecimens and another 6,255 digital biospecimens. Data were grouped into nine 3-year periods from 1986 to 2012 to allow comparison with our earlier analyses. The numbers and types of biospecimens used by each research program are detailed in Table 1, and Figure 1 compares the overall totals used by each program (Fig. 1A) and in aggregate (Fig. 1B). Each research program showed a similar pattern, with a significant increase over time in the overall number of biospecimens used. The average total number of biospecimens used by these three research programs in each 3-year period increased six-fold, from less than 1000 in 2001–2003 to greater than 6000 in 2010–2012. In addition, the use of digital biospecimen data emerged after 2003 and increased over the past decade from 10% to 20% of total biospecimen-derived data (Fig. 1B).
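Grouping publication years into nine 3-year periods and computing a fold change between two periods can be sketched as follows. This Python is a hypothetical illustration: the function names and the (year, count) input format are assumptions, and the counts in the usage note are invented, not data from this study.

```python
from collections import defaultdict
from typing import Iterable, Tuple

FIRST_YEAR, LAST_YEAR, WIDTH = 1986, 2012, 3  # 27 years -> nine 3-year periods


def period_label(year: int) -> str:
    """Map a publication year to its 3-year period, e.g. 2011 -> '2010-2012'."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"{year} is outside {FIRST_YEAR}-{LAST_YEAR}")
    start = FIRST_YEAR + WIDTH * ((year - FIRST_YEAR) // WIDTH)
    return f"{start}-{start + WIDTH - 1}"


def totals_by_period(records: Iterable[Tuple[int, int]]) -> dict:
    """Sum biospecimen counts from (publication_year, count) pairs per period."""
    totals = defaultdict(int)
    for year, count in records:
        totals[period_label(year)] += count
    return dict(totals)


def fold_change(totals: dict, earlier: str, later: str) -> float:
    """Ratio of the later period's total to the earlier period's total."""
    return totals[later] / totals[earlier]
```

With invented totals of 900 biospecimens in 2001-2003 and 5,400 in 2010-2012, `fold_change` returns 6.0, a change of the magnitude described above.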

Figure 1. Total biospecimen numbers used from 1986 to 2010 increased similarly for all three investigators. The upper panel compares the numbers of direct biospecimens (tissues and blood) used by each investigator, and the lower panel compares the aggregate numbers of direct biospecimens (black bars) and indirect digital biospecimens (gray bars) used overall by the three investigators.
See Methods for full descriptions of categories.
Biospecimen use in different categories and formats
We next considered biospecimen use in relation to tissue format. The total numbers of biospecimens used in all five categories of tissues and formats showed broadly similar patterns of change, the most notable being an apparently steady upward trend over the past decade (Fig. 2). Comparisons between 2001–2003 and 2010–2012 showed that this increase in total specimen numbers was significant for all formats except the FFPE and Fresh formats. While some of this increase was due to growth in individual laboratory programs and more publications per year, the average cohort size per article also increased, from approximately 50 to 200 cases, over the same period in all biospecimen categories (Fig. 3). Comparisons between 2001–2003 and 2010–2012 showed that this increase in cohort size was again significant for the Frozen, TMA, and Hematological categories. We also analyzed the relative proportions of the different tissue biospecimen preservation formats used (Fig. 4). From the start of the period analyzed (1986–1988) up to the mid-1990s, frozen biospecimens rose to become the dominant format, while the fresh and pure FFPE formats declined. Over the following decade, up to the mid-2000s, the FFPE format grew in importance (if the TMA, as a processed FFPE format, is combined with the pure FFPE format). Since then, there has again been an upward trend in the frozen and perhaps fresh formats.
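The relative-proportion analysis, including the option of pooling TMA with pure FFPE as a processed FFPE format, can be sketched as below. This is a hypothetical helper, not the authors' code; whether the input counts are articles or biospecimens per format is an assumption left to the caller.

```python
from typing import Dict


def format_proportions(counts: Dict[str, int],
                       combine_tma_with_ffpe: bool = False) -> Dict[str, float]:
    """Relative proportion of each tissue format within one period.

    `counts` maps a format name ("fresh", "frozen", "FFPE", "TMA") to a count
    (articles or biospecimens, at the caller's choice). When
    combine_tma_with_ffpe is True, TMA counts are pooled into "FFPE",
    treating the TMA as a processed FFPE format.
    """
    pooled = dict(counts)  # copy so the caller's mapping is not modified
    if combine_tma_with_ffpe:
        pooled["FFPE"] = pooled.get("FFPE", 0) + pooled.pop("TMA", 0)
    total = sum(pooled.values())
    if total == 0:
        return {}
    return {fmt: n / total for fmt, n in pooled.items()}
```

Pooling changes the picture noticeably: in a period where frozen, FFPE, and TMA counts were 2, 1, and 1, FFPE alone would account for a quarter of use but FFPE plus TMA for half.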

Figure 2. The average number of biospecimens used increased in all categories over the two and a half decades assessed. Panels show aggregate numbers of different biospecimen types (tissues and blood) and tissue formats (frozen, FFPE, TMA, TMA+FFPE, fresh) used from 1986 to 2012. Bars correspond to means±standard deviations.

Figure 3. The average cohort sizes of biospecimens used increased over the two and a half decades assessed. Panels show the average cohort sizes for different biospecimen types (tissues and blood) and tissue formats (frozen, FFPE, TMA, TMA+FFPE, fresh) from 1986 to 2012. Bars correspond to means±standard deviations.

Figure 4. The relative proportions of different tissue biospecimen formats used per article varied significantly over the more than two and a half decades assessed. Formats of biospecimens in the tissue category included fresh, TMA, FFPE, and frozen tissues.
Reporting of biospecimen data
The BRISQ criteria were assessed for all first- or senior-author articles that used biospecimens, and were grouped into data fields typically derived from either the clinical chart or a biobank, as described previously.8 Overall, there was a significant difference in reporting frequency between the clinical chart and biobank data field groups (Fig. 5A, p<0.0001). The criteria least often reported (frequency <25%) included “Stabilization,” “Storage Duration,” “Shipping Temperature(s),” and “Storage Temperature.” To assess changes over time, articles were grouped into three intervals from 1994–2012. This analysis showed that reporting frequency had improved for only some data fields and had decreased for others. All clinical chart fields were reported at frequencies >50%, and 3/8 were reported at >90% (Fig. 5A); 2/8 of these fields showed an increase over the time period analyzed (“Clinical Diagnosis” and “Pathology Diagnosis,” p<0.05), while 6/8 showed no overall change. Many biobank data fields were reported at frequencies <50%, and 3/8 were reported at <25% (Fig. 5A); over the time period analyzed, 2/8 of these fields improved (“Selection criteria” and “Constituency of preservative,” p<0.01), but 1/8 deteriorated (“Storage Temperature,” Fig. 5B, p=0.0056). As a control for BRISQ field reporting, we also assessed three ethics data reporting criteria: “Documentation of Consent,” “Documentation of Research Ethics Board (REB) Approval,” and “Name of Institutional Review Board (IRB)/Research Ethics Board (REB).” All three showed a significant increase in reporting frequency between the time intervals from 1994 to 2012 (e.g., “Name of IRB/REB,” p<0.0001), rising from <10% to more than 60%–70%.
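The clinical chart versus biobank comparison amounts to a test on a 2×2 table of reported versus not-reported counts. The Methods state only that chi-squared tests were run in GraphPad Prism; the Python below is a stand-alone sketch of a Pearson chi-squared test without continuity correction, in which the table layout and the function name `chi_squared_2x2` are assumptions. For one degree of freedom, the upper-tail p-value equals erfc(√(χ²/2)).

```python
import math
from typing import Tuple


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) on [[a, b], [c, d]].

    Hypothetical layout: rows are field groups (clinical chart, biobank);
    columns are (reported, not reported) counts. Returns (statistic, p_value)
    for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

With hypothetical counts [[30, 10], [10, 30]], the statistic is 20.0 and p < 0.001; identical rows such as [[10, 10], [10, 10]] give a statistic of 0.0 and p = 1.0.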

Figure 5. The reporting frequency of Tier 1 BRISQ criteria was low for elements typically obtained from biobanks. The upper panel shows overall reporting over the entire period for all BRISQ Tier 1 categories, sorted into two source categories (‘Clinical chart’ and ‘Biobank’) based on the most common source of data for these elements, as previously described,8 together with the three additional ‘Ethics’-related categories assessed. The lower panel shows changes in reporting frequency over three 6-year periods from 1995 to 2012 for four example categories. The numbers of articles assessed were n=40, 41, and 38 in the three time periods shown, and for the storage duration category, ‘assumed’ status was combined with ‘reported’ status.
Discussion
We have assessed biospecimen use in publications from three investigators whose independent laboratory-based cancer research programs began around 1986, have spanned over 25 years, and have focused predominantly on breast, lung, and ovarian cancer. Total biospecimen use by each investigator increased initially, as might be expected, in the first 10 years analyzed. Once all three programs had become well established, there was a substantial further increase in biospecimen use: in the last decade (2001–2012), overall use of biospecimens increased almost six-fold and average cohort sizes increased almost four-fold. This rising trend was apparent for all biospecimen types and formats assessed. Among the recently recommended BRISQ annotating data elements, those commonly obtained from the “Clinical chart” were significantly better reported than those typically obtained from the “Biobank.”
These findings are similar to those of our three previous related studies, in which we demonstrated comparable increases in overall biospecimen use and cohort sizes, and relatively poor documentation of BRISQ data elements in cancer research publications reviewed over a similar time period (from 1988 to 2010).8,14,15 Our previous studies were subject to several potential biases. These relate to potential changes over the two decades in relative impact factors (e.g., reviewers of higher-impact journals often require larger datasets and additional validation datasets), editorial policy (e.g., the journal Clinical Cancer Research includes larger clinical studies and trials than journals such as Cancer Research), changing demographics, and publication bias (e.g., funding agencies have promoted increased ‘translation’ of research from laboratory models to human samples and data, which has encouraged new researchers to train in applied ‘translational research’ fields and established researchers to increasingly include a ‘translational’ component in their studies). Our current study also has some biases. These relate to changes in experience (i.e., the evolution from new to senior investigators) and funding (i.e., the increased scale of funding acquired over time, enabling larger studies), as well as changing demographics and publication bias. In the latter case, an individual laboratory program can often evolve from discovery using cell lines and animal models at the outset to validation involving translational studies using biospecimens. There is also an inherent trend to validate findings using larger or multiple cohorts in order to publish new findings, and to publish them in higher-impact journals. Given these biases, it is notable that the overall pattern of increasing biospecimen use was similar for all three investigators.
It is also reassuring that the two different approaches used in our previous and current studies (i.e., assessment of many investigators publishing in a limited set of four journals versus assessment of a limited number of investigators publishing in a wide range of over 40 journals) have given very similar results.
We have previously noted increased cohort sizes over the two decades up to 2008 for both Frozen and FFPE format tissues, but not for Fresh tissues. At the same time, we noted an increase in the relative proportion of studies using FFPE tissues and a decrease in those using frozen or fresh tissues alone.14,15 Our data here show a similar pattern; however, with the period of observation extended from 2008 to 2012, we can tentatively conclude that the proportion of studies using Frozen tissues may no longer be declining, and may even be rising. We have not previously assessed the use of digital biospecimen-derived data, but in many cases the digital data were either reported directly as, or can be assumed to have been, generated from Frozen tissues. Since digital biospecimen use clearly increased steadily over the last decade, this would add further to the recent upward trend in relative Frozen tissue use. Our observations here confirm that, despite the previously noted dramatic rise in the use of FFPE and TMA formatted tissues, the relative use of different biospecimen formats may remain dynamic, and frozen tissue biospecimens remain an important format.
A relatively low frequency of reporting of ‘biobank’ category BRISQ elements in publications has been noted by ourselves and others.8,16 In one report, fewer than half of a set of 125 ‘biomarker discovery’ articles contained information about the biospecimens used.17 The proportion of articles reporting at least some biospecimen-related data was higher here, which may reflect the involvement of all three investigators in biobank activities alongside their research programs; nevertheless, the results here are surprising. We speculate that in many instances detailed biospecimen-related data may have been omitted because of manuscript length restrictions, at least until the past decade, when submission of online supplementary data became possible. However, no significant improvement has occurred in the past decade, despite the marked improvement in reporting of ethics-related fields. It is likely that the reviewer community and journal requirements are the main drivers for encouraging reporting of complete datasets.
We conclude that many factors drive the increasing use of biospecimens in cancer research, but perhaps the strongest of these are new enabling technologies (i.e., the changing ability to measure and analyze the same factor, such as mRNA, in smaller samples and with higher throughput). While digital data may well become even more prominent in the next decade, we predict that the need to study primary biospecimens that are fully annotated and reflect new approaches to treatment will continue to drive demand.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
This work was supported by the Tumour Tissue Repository Program at the BC Cancer Agency (a part of the Canadian Tumour Repository Network, which is funded by a grant from the Institute of Cancer Research, Canadian Institutes of Health Research) and the Office of Biobank Education and Research, University of British Columbia (which is supported by the Department of Pathology and Laboratory Medicine, University of British Columbia).
