Abstract
Background:
Building or acquiring research data management (RDM) capacity is a major challenge for health and medical researchers and academic institutes alike. Considering that RDM practices influence the integrity and longevity of data, targeting RDM services and support in recognition of needs is especially valuable in health and medical research.
Objective:
This project sought to examine the current RDM practices of health and medical researchers from an academic institution in Australia.
Method:
A cross-sectional survey was used to collect information from a convenience sample of 81 members of a research institute (68 academic staff and 13 postgraduate students). A survey was constructed to assess selected data management tasks associated with the earlier stages of the research data life cycle.
Results:
Our study indicates that RDM tasks associated with creating, processing and analysis of data vary greatly among researchers and are likely influenced by their level of research experience and RDM practices within their immediate teams.
Conclusion:
Evaluating the data management practices of health and medical researchers, contextualised by tasks associated with the research data life cycle, is an effective way of shaping RDM services and support in this group.
Implications:
This study recognises that institutional strategies targeted at tasks associated with the creation, processing and analysis of data will strengthen researcher capacity, instil good research practice and, over time, improve health informatics and research data quality.
Keywords
Introduction
Research data represent a complex ecosystem with substantial value to scientific enquiry. As a national priority firmly embedded in Australian policy and funding targets (Australian Government and Department of Education and Training, 2015; Innovation and Science Australia, 2017), initiatives aimed at better access to, and sharing of, research data have followed suit. In fact, the estimated value of data to Australia’s public research is $1.9 billion and possibly up to $6 billion per annum (at 2012–2013 levels of expenditure and activity) (Houghton and Gruen, 2014). Coupled with the growing capabilities of data analytics and our increasing capacity to link discrete datasets, the enhanced use of research data presents a remarkable opportunity (Holman et al., 2008; Jutte et al., 2011; Kelman and Bass, 2002). Some exemplars include linking research data with childhood education outcomes to determine optimal birthweight percentile in Aboriginal and non-Aboriginal Australian children (McEwen et al., 2018); prison medical records of incarcerated adults to explore the association between self-harm history and future self-harm risk (Borschmann et al., 2017); genome-wide association analysis to gain insight into the genetic architecture of complex traits (Nagy et al., 2017); or census data to explore population-wide ethnic variations in the incidence of acute myocardial infarction and survival rates (Fischbacher et al., 2007).
Given the potential for data reuse and linking and the growing complexity of data-driven research, data stewardship is of paramount importance. Caring for data is essential to their integrity, especially since data often have a longer lifespan than the research project that created them (Gonzalez and Peres-Neto, 2015; Vines et al., 2014). Recognising the significance of data requires planning and investment of effort throughout and beyond the course of the project. If data are well organised, documented and preserved, they may be invaluable to advancing scientific inquiry and increasing opportunities for learning and innovation (Chung et al., 2006; Knottnerus, 2016).
The enduring result of research data management
Research data management (RDM) is an essential part of the research process that involves the creation and stewardship of research materials to enable their use for as long as they retain value (Digital Curation Centre, 2018). RDM best practice is of particular interest for higher academic institutions involved in the development of training programs supporting researchers (Cox et al., 2017; Surkis et al., 2017; Whitmire et al., 2015), as well as stakeholders such as funding bodies (McFadden et al., 1995; Margolis et al., 2014; OECD, 2007; Read et al., 2015a). Primarily, RDM best practice ensures research integrity and transparency and maximises impact and reach of research data (Neylon, 2017). In addition, the publication and re-usability of research data bring great benefits such as enhancing the reputation of researchers and institutions, meeting obligations of funders and compliance with the open science agenda (Borgman, 2012; Universities Australia, 2016).
Researchers and their institutions have a responsibility to ensure that research data are well managed. In Australia, this is defined by the Code for the Responsible Conduct of Research (National Health and Medical Research Council, 2018), which recognises that good RDM practice covers the ownership, storage, retention and accessibility of data. While academic libraries are traditionally the key support providers for RDM (Brown et al., 2015; Cox et al., 2017), their role in information management and engagement with the research community has expanded to include contemporary aspects such as policies surrounding data management and preservation, data management training and online resources, establishment of local data storage capacity and data literacy education (Akers and Doty, 2013; Wolski et al., 2017; Yu et al., 2017). A recent international study of RDM activities, services and capabilities in higher education libraries highlights major opportunities for librarians to engage more deeply with RDM practices in new ways and to extend support infrastructure to meet the demands of data-intensive research environments (Cox et al., 2017). The study concludes that while academic libraries have provided leadership in data management advocacy and policy development to date, these activities are not developed in isolation and should be a collaborative effort between key stakeholders such as IT departments and research directorates (Akers et al., 2014; Cox et al., 2017; Si et al., 2015).
RDM as contextualised by the research data life cycle
Using the research data life cycle framework provides the opportunity to understand where researchers are focusing their efforts in studying RDM. It is critical that RDM is achieved in each stage of the research data life cycle (Box 1), a concept that is often used to help researchers understand the scope and meaning of data management tasks (Surkis and Read, 2015; UK Data Archive, 2012–2018).
Box 1. Stages of the research data life cycle and associated RDM tasks (UK Data Archive, 2012–2018).
RDM: research data management.
Academic services modelled on the research data life cycle and based on researcher feedback and needs have been developed to better support research activities in institutions such as the University of North Carolina (Vaughn et al., 2013) and the University of Queensland (Yu et al., 2017). These services are tailored to meet the needs of their researchers and integrated into relevant activities that span different phases of the research data life cycle. A scoping review that examined the literature on RDM in academic institutions identified that the largest share of the included articles (31.0%) aligned with tasks associated with giving access to data, including the distribution, promotion and control of data (Perrier et al., 2017). This was followed by articles that addressed the preservation (27.0%) and reuse (17.0%) of data, which conform with global trends in funded research towards implementing policies on data, materials management and open access requirements (European Commission, 2017; National Institutes of Health, 2003; OECD, 2007; Universities Australia, 2016; Wilkinson et al., 2016). This review highlights a significant gap in the literature examining the earlier stages of the research data life cycle (specifically creating, processing and analysing data), which have a direct impact on the integrity of the research because they influence the quality and/or usability of data.
Data management practices of researchers
Since an important step in providing the right RDM support is to understand researchers’ current practices, it is surprising that limited empirical research has been published from the researcher’s perspective to better understand data practices, influences and needs (Perrier et al., 2017).
One study by Yu et al. (2017) reported that researchers did not regard RDM as an important part of the research process and often viewed it as an administrative burden. There was a lack of understanding about the significance of RDM across the research data life cycle, a finding which is also supported by similar survey studies of academic researchers (Buys and Shaw, 2015; Schumacher and VandeCreek, 2015; Weller and Monroe-Gulick, 2014). The diverse nature of data collected, created, used and managed by university-based researchers is explored by Kennan and Markauskaite (2015), who conclude that researchers faced challenges managing their data, particularly after project completion. A mixed-methods paper by Fear (2011) argues that data management is part of a continuum of practice and that separating data management from other research activities is confusing to researchers and potentially counterproductive. While not all researchers exercise good practice in managing their data, the challenge is how to get them to understand the importance of assessing their current practices and, if necessary, adopt or learn new practices (Research Data Alliance, 2015). While this challenge is not unique among academic institutions, contextual evidence that demonstrates the true data management practices of researchers is limited, and more is required.
Understanding RDM practices will help articulate planning strategies for institutional services and support and outline essential areas for future research endeavours in data management practices. The evidence to date identifies the complexities in developing RDM programs and services, and in particular, evidence is lacking that specifically addresses tasks aligned with the earlier stages of the research data life cycle. Therefore, the primary objective of this study was to examine the RDM practices of health and medical researchers from an Australian academic institution and to identify gaps in skills, needs and competencies. This study will inform effective methods of RDM training and development for health and medical researchers. RDM good practice will encourage greater integration and use of research data.
Method
Participants
This project employed a cross-sectional, observational study design and a convenience sampling technique. Participants were drawn from a research institute at Griffith University, Australia, and invited by a number of internal broadcast emails to complete an anonymous online survey.
Griffith University is a comprehensive research-intensive institution with 4 academic groups and 11 research institutes. The Menzies Health Institute Queensland (referred to in this article as the Institute) was established in 2015 through a partnership between Griffith University and the Menzies Foundation. It comprises four key programs (health economics, disability and rehabilitation, healthcare practice and infectious diseases) that focus on translatable and multidisciplinary research.
Members of the Institute who were employed at the University (permanent, temporary or casual appointments) or higher degree by research (HDR) candidates (PhD or masters students) were eligible to participate. In addition, participants had to be actively involved in research. This was assessed by a screening question which asked whether respondents had been involved in research that collects or uses data in the past 12 months. Eligible individuals (n = 401) were also identified via the Institute’s membership database and directly emailed participant information about the study.
Development of the survey
A 37-item survey was constructed based on an iterative process between the study investigators, in consultation with Institute research leaders, and a review of similar published literature (Anderson et al., 2007; Buys and Shaw, 2015; Bardyn et al., 2012; Henty et al., 2008; Johnson et al., 2012; Read et al., 2015b). Survey questions were categorised into five sections: (i) researcher characteristics, (ii) RDM practices, (iii) data storage and retention, (iv) data-sharing practices and (v) RDM training and development. Questions used in this article examine RDM practices associated with earlier research data life cycle tasks (see Online Appendix). While information was collected about employment status and research experience, no names or identifying information were collected. Responses were recorded against a number of multiple-choice options and skip logic was utilised to ensure that questions relevant to preceding responses were presented only as appropriate. The average time to complete the survey was 13.5 ± 8.3 min.
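The skip logic mentioned above can be pictured as a display condition attached to each question and evaluated against earlier answers. The following is a minimal Python sketch of that idea only; it is not how LimeSurvey implements skip logic, and the question identifiers, the `show_if` key and the function name `visible_questions` are hypothetical.

```python
def visible_questions(questions, answers):
    """Return the ids of questions whose display condition is met.

    Each question is a dict with an "id" and an optional "show_if"
    callable that receives the answers given so far. A question
    without "show_if" is always shown.
    """
    shown = []
    for question in questions:
        condition = question.get("show_if")
        if condition is None or condition(answers):
            shown.append(question["id"])
    return shown


# Hypothetical items for illustration; not the actual survey wording.
questions = [
    {"id": "has_dmp"},
    {"id": "reasons_for_dmp", "show_if": lambda a: a.get("has_dmp") == "yes"},
    {"id": "reasons_no_dmp", "show_if": lambda a: a.get("has_dmp") == "no"},
]

print(visible_questions(questions, {"has_dmp": "no"}))
```

A respondent who answers "no" to the hypothetical `has_dmp` item is therefore never shown the follow-up intended for those with a plan.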
Survey administration
The survey was pretested with six research academics from the Institution. Feedback was incorporated, and the final version of the survey was distributed via an open-source survey platform, LimeSurvey GmbH (v1.9X; Hamburg, Germany), hosted and secured by Griffith University, available for four consecutive weeks (17 July to 11 August 2017).
Data analysis
Data analysis was undertaken using Statistical Package for the Social Sciences (SPSS) (IBM version 25.0), managed in a specific research file and stored in an area with access limited to members of the research team. Frequency analysis (descriptive statistics) was used to provide a profile of participants by researcher demographics, and multiple-choice questions were represented as counts and percentages of the total.
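The frequency analysis described above amounts to tallying each response option and expressing it as a share of all responses. As an illustrative sketch in Python (not the SPSS procedure used in this study), the calculation might look as follows; the function name `frequency_table` is an assumption, and the example input uses the appointment counts reported in the abstract.

```python
from collections import Counter


def frequency_table(responses):
    """Return {option: (count, percent)} with percent rounded to 1 d.p.

    Percentages are taken over the total number of responses supplied.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        option: (n, round(100 * n / total, 1))
        for option, n in counts.most_common()
    }


# Illustrative input: 68 academic staff and 13 HDR candidates (81 in total).
appointments = ["academic staff"] * 68 + ["HDR candidate"] * 13
table = frequency_table(appointments)
for option, (n, pct) in table.items():
    print(f"{option}: n = {n} ({pct}%)")
```

For questions that allow several selections per respondent, the denominator would need to be the number of respondents rather than the number of selections, which this sketch does not handle.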
Ethical considerations
Participation in the study was voluntary. The study was approved by the Human Research Ethics Committee (GU/HREC#2017/457) of Griffith University.
Results
Participant characteristics
A total of 81 individuals participated in this study: 84.0% (n = 68) were identified as members of academic staff and 16.0% (n = 13) were HDR candidates. The majority of academic staff (55.9%, n = 38) had 10 or more years of experience in research, 29.4% (n = 20) had 5–10 years and the remainder had less than 5 years’ experience (14.7%, n = 10). Just over one-quarter (26.4%, n = 18) identified as early career researchers, with an average of ±2.3 years since being awarded their PhD. Participants were located across four Griffith University sites and were from faculty groups that spanned biomedical and medical sciences (36.0%), social sciences (27.0%), allied health (15.0%), nursing (15.0%) and medicine and pharmacology (8.0%).
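For context, a crude response rate can be derived from figures already given, assuming that the 401 eligible individuals identified through the membership database form the denominator and that all 81 participants came from that group. This is an illustrative calculation, not a figure reported by the study:

```latex
\[
\text{response rate} \approx \frac{81}{401} \times 100\% \approx 20.2\%
\]
```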
Data characteristics
Participants were asked to indicate where they primarily sourced data and in what format. Data were sourced from a wide variety of places, with the most common being surveys (66.7%), interviews (54.3%) and experimental studies (44.4%). The top data formats were statistical data (.sav, .sdq, .spv) (84.0%), data from spreadsheets (.wks, .xls) (77.8%) and text file data (.doc, .docx, .log, .pdf, .rtf, .txt) (66.7%). Additionally, research data were most commonly stored on personal storage devices such as desktop or laptop computers and removable media devices, at both the creation (50.0%) and analysis (55.0%) stages of the research data life cycle (Figure 1).

Figure 1. Key places research data are stored at the creation and analysis stages of the research data life cycle.
When asked to estimate the volume of data held for current research and likely to be held for prospective research (+12 months’ time), the largest proportion of participants selected 1 GB to 1 TB (41.2 and 50.0%, respectively). Between the current and 12-month estimates, the proportion of participants selecting volumes less than 1 GB fell by 11.3%, while the proportion selecting volumes greater than 1 TB rose by 6.3%.
Management and ownership of data
When asked who was responsible for the day-to-day management of their research data, the majority of participants indicated that they were responsible (75.3%, n = 61). A specific person, such as a research manager, was also commonly identified (60.5%, n = 49), and one person indicated that someone external to the university was responsible. Only two individuals were unsure who was responsible for managing their data.
In response to a question about the ownership of data, almost half (49.3%) believed that the research data were owned by the research team. Equal proportions attributed ownership to the university and to themselves (37.0% each), and 18.5% were unsure. Other responses from five participants placed data ownership with their research or funding partner, a student working on the project or the health service. When asked how they knew who owned the data, most indicated that it was just understood (44.8%), identified in contracts or study agreements (41.4%), advised by someone who owned the data (17.0%) or covered in their employment contract (15.5%).
Data management planning
Of the 73 participants who responded to the question asking whether they had a data management plan (DMP), 43.9% did not, 26.0% were unsure and 30.1% did. Participants were asked to indicate the criteria that influenced their decision whether or not to include a DMP in their present research (Table 1). The most common reason attributed to having a DMP was “good research practice” (95.4%), and the most common reason for not having a DMP was “not understanding what a DMP is or what it should look like” (56.2%).
Table 1. Top reasons attributed to DMP status.
DMP: data management plan.
DMP status was further stratified by appointment type and research experience. Figure 2(a) shows that almost half of academic staff did not have a DMP and 70.0% of HDR candidates were unsure whether they had a DMP, signifying a potential lack of awareness or understanding among less experienced participants. This was evaluated further by examining years of research experience and DMP status. Figure 2(b) illustrates that participants with more than 10 years’ experience most commonly had a DMP (38.0%), those with less than 1 year of experience were mostly unsure (64.0%) and early career researchers (i.e. 1–5 years’ experience) were more likely not to have a DMP. Again, this evaluation suggests that HDR candidates are likely to have less than 1 year of research experience and are largely uninformed about DMP practices. Although not significant (p = 0.61), the likelihood of having a DMP tended to increase with the level of research experience.

Figure 2. Participant DMP status as distributed by: (a) appointment type and (b) years of research experience. DMP: data management plan.
Planning consent for future use of data
Consent practices were explored for research that involves human participants and the collection and reuse of their data. Classifications of informed consent according to the National Statement on Ethical Conduct in Human Research were reviewed (National Health and Medical Research Council, 2007 (updated 2018)). Of the 63 participants who indicated that their research involved collecting participant informed consent, the majority obtained specific consent (65.5%, n = 38) and the majority collected data in a non-identifiable format (69.0%, n = 40; Figure 3). Consent is categorised according to the National Statement on Ethical Conduct in Human Research, where specific consent limits the use of data to the project under consideration; extended consent allows use of data in future research that is an extension of, closely related to or in the same area as the original research; and unspecified consent allows the use of data in any future research.
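The three consent categories described above form an ordered scale from narrowest to broadest permitted reuse. A minimal Python sketch of that ordering follows; it is a simplification for illustration only, since real reuse decisions rest with ethics committees and the wording of the consent obtained, and the names `Consent`, `Reuse` and `reuse_permitted` are hypothetical.

```python
from enum import Enum


class Consent(Enum):
    SPECIFIC = "specific"        # only the project under consideration
    EXTENDED = "extended"        # extensions of, or closely related, research
    UNSPECIFIED = "unspecified"  # any future research


class Reuse(Enum):
    SAME_PROJECT = 1
    RELATED_RESEARCH = 2
    ANY_RESEARCH = 3


# Broadest kind of reuse each consent category allows (illustrative mapping).
BROADEST_REUSE = {
    Consent.SPECIFIC: Reuse.SAME_PROJECT,
    Consent.EXTENDED: Reuse.RELATED_RESEARCH,
    Consent.UNSPECIFIED: Reuse.ANY_RESEARCH,
}


def reuse_permitted(consent: Consent, proposed: Reuse) -> bool:
    """Return True if the proposed reuse falls within the consent category."""
    return proposed.value <= BROADEST_REUSE[consent].value


print(reuse_permitted(Consent.SPECIFIC, Reuse.RELATED_RESEARCH))
print(reuse_permitted(Consent.EXTENDED, Reuse.RELATED_RESEARCH))
```

Under this reading, data collected with specific consent cannot be carried into a related follow-up project, which is why the type of consent chosen at the planning stage constrains later sharing and reuse.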

Figure 3. The consent to future use of data in research, as distributed by (a) type of consent and (b) format of patient data collected.
Analysing data
Participants were also asked which software applications they used to analyse and process data. The most common were analysis and statistical software programs, such as SPSS, Stata and SAS (34.0%), followed by database and spreadsheet programs (e.g. Excel and Access; 33.0%) and biostatistics and scientific graphing applications (e.g. GraphPad Prism and SigmaPlot; 21.0%).
Discussion
While health and medical research is a field that is heavily informed by best practice policy and guidelines, there is an evident gap in the literature that examines data management activities associated with the early phases of the research data life cycle: creation, analysis and processing (Cox et al., 2017). In this study, our insight into the data management practices of health and medical researchers is contextualised by particular tasks associated with the research data life cycle. These initial findings are an effective way of informing institutional RDM strategy and provide clear articulation of professional competencies and foundational knowledge of the researcher community.
While the majority of respondents in our study did not have a DMP currently in place, of those who did (30.1%), most were likely to be in a research professional role such as a research assistant and have more than 10 years of experience. This is comparable to an international study by Tenopir et al. (2011) of researcher data practices and perceptions, indicating that 29% of researchers had a DMP, and another study by Kennan and Markauskaite (2015) indicating 28.0% of researchers. Further, Buys and Shaw (2015) surveyed researchers from Northwestern University, showing that 45% of respondents had a DMP, although there was no association between having a DMP and appointment type. The uncertainty about DMP status among HDR candidates (mostly PhD candidates) in our study suggests that students may not be getting the appropriate support and/or guidance from their faculty or supervisors related to DMPs and possibly other RDM practices. This is compounded by 60.5% of respondents indicating that the day-to-day management of their data is the responsibility of a specific person such as a research manager. If research assistants and research managers are primarily conducting the data management tasks, then it is likely that senior researchers (who traditionally supervise and train PhD candidates) may not recognise or have the skills to apply RDM best practice. We therefore suggest that RDM training and/or support should be offered to researchers regardless of their career stage, and that assessment by competency, as opposed to years of experience in research, would be most suitable.
Reports exploring data management practices of academics advise that RDM can often be viewed as a “burden” to researchers (Bell et al., 2009) or a technical skill that is frequently considered to be an overhead and not really research (Anderson et al., 2007). Schumacher and VandeCreek (2015) argue that researchers are “largely unaware of the basic principles of…data management” and receive little to no training, other than what they learn doing research (Fear, 2011), and are therefore left to create their own ad hoc approach to RDM. While the Institution in this study has a best practice guideline for researchers related to the management of research data and primary materials, and in addition provides tools and support for researchers (Wolski et al., 2017), these guidelines are optional and neither mandated nor monitored. But if RDM tasks were considered an essential part of academic training or made an institutional requirement, would this lead to a culture where RDM tasks are viewed as a “tick-box” exercise that has no real value, or would it help to influence and motivate best practice among researchers and eventually become part of standard practice? Without clear direction, researchers are left to create their own guidelines. In a study by Hickson et al. (2016), a behavioural framework was applied to further understand the data management practices of researchers and the behaviours that influence decision-making. Findings suggest that attitude is the predominant deterrent to good data management, which should be addressed when designing intervention strategies to modify behaviour. RDM infrastructure is an essential platform for cutting-edge research, so there is value in understanding the underlying institutional culture and how it affects behaviours and attitudes towards data management.
Institutions that have championed the successful integration of RDM practices and policies have treated it as a multi-faceted issue that requires cross-disciplinary effort and collaboration, encouraged a sense of shared responsibility and had recognition and prioritisation from senior management. Understanding RDM practices within an organisation therefore makes it possible to identify and target future RDM requirements. This approach will not only strengthen researcher capacity, but also instil good research practice and, over time, improve health informatics and research data quality (Kowalczyk, 2017). Decisions made at the creation of data are of utmost importance because they can influence all subsequent outcomes (Perrier et al., 2017; Willoughby et al., 2014). For example, for data to be reused, early planning is required to consider correct data formats, data integrity and perhaps tools to build and manage data (Gardner et al., 2003; Ray et al., 2014). Additionally, for institutions to support and implement data management policies and to manage the risk of file format obsolescence or degradation of information storage, they need to better understand where data are sourced, their format and how they are stored. For clinical research involving human participants, this includes the collection of appropriate participant consent that would allow use of the data beyond the study for which they were collected. In our study, the majority (65.5%) of participants indicated that they only collected “specific” consent, which limits use of the data to the specific project under consideration. Almost one quarter had collected “extended” consent, which allows the use of the data for future research projects that are closely related or in the same area of research, and only 10.0% collected “unspecified” consent, allowing the use of data in any future research.
In light of the open science movement, it is now more important than ever to educate researchers about planning their research to enable sharing and reuse. This involves the incorporation of considered and appropriate patient consent (O’Keefe and Connolly, 2010).
Within the Australian context, the two main government research funders, the Australian Research Council (ARC) and the National Health and Medical Research Council, neither enforce nor monitor policy frameworks for data management, so it is difficult to determine the level of uptake or compliance nationally. However, aside from RDM practices being good research practice, institutions in Australia should be preparing researchers for future directions in RDM as changes in requirements from Australian funding agencies relating to data management are implemented. For example, the ARC recently released a policy on the publication and dissemination of research outputs with which all ARC-funded projects must comply. Outputs arising from the project must be deposited into an open access institutional repository within 12 months of the date of publication. In 2017, the Australian Government commissioned a broad-ranging investigation into the benefits and costs of options for improving the availability and use of data (Australian Government, 2017). This comprehensive report recommends the creation of a new, broad-reaching data framework that delivers benefits to the community, increases the availability and usefulness of data and engenders community trust and confidence. Responsibility for infrastructure and centralised leadership in Australia is primarily driven through the National Collaborative Research Infrastructure Strategy, a collaboration between researchers, government and industry, who are leading this national data reform process.
While some researchers recognise the importance of RDM, others might not be willing to invest the time or are resistant to the idea. Participants in our study reported a generally positive desire for further training and support in RDM practices, with the majority indicating a need for training/information in data management best practice, specifically related to the ethical, governance and legal requirements of data management. This was closely followed by support with writing a DMP and information about services for data storage and backup during active projects. Akin to similar studies (Buys and Shaw, 2015; Tenopir et al., 2011), our study also demonstrates the importance of support and education, with particular interest in long-term data management. This is not a surprising outcome and certainly reinforces the growing awareness of the significance of research planning and conduct.
What will be the enduring result?
Learning all aspects of how to create, process and analyse research data is not typically part of undergraduate or postgraduate training, but according to the National Code, it is the responsibility of the researcher and Institution (National Health and Medical Research Council, 2018). Anderson et al. (2007) hypothesise that the lack of support appears to be a combination of social, technical and fiscal factors that could be part of the tradition of biomedical researchers who express reluctance to seek out and collaborate with experts. Navigating RDM practices within the context of the health and medical research environment is a complex task, and it is essential to recognise the importance of researcher skills and behaviours. It is therefore important to implement RDM support in the context of tasks associated with the earlier stages of the research data life cycle, which are important to the integrity of all research and have downstream effects on the ability to share and reuse data, both highly topical and valuable outcomes of research.
Limitations
There are several limitations to note related to the work presented in this article. Firstly, we acknowledge that participants self-selected out of a shared interest in data management practices, inadvertently introducing some biases into our analyses. Additionally, the self-selected convenience sample of participants may not be representative of common perceptions of the wider community of health and medical researchers. As such, we intend to replicate this study in other disciplines to better understand the ingrained culture of RDM practices and compare across the organisation. Further research at our Institution will help to identify likely solutions and a strategy to address data management impediments. An evaluation of RDM practices from project teams, as opposed to individuals, would also provide additional insight to further articulate services.
Conclusion
The outcomes presented in this study may contribute to the greater understanding of RDM practices of health and medical researchers, by presenting the perspective from one research institution in Australia. We report on the application and use of RDM practices related to the creation, processing and analysis of research data and argue that if researchers are not applying the basic principles of data management, and are perhaps creating their own ad hoc approach, this will cause problems and potentially compromise the research. Even then, there are multiple aspects of RDM best practice that need to be followed in order for research data to be suitable for data-sharing activities. Tasks associated with these activities are synonymous with good research practice behaviours and are fundamental to downstream processes, such as quality preservation, curation and data-sharing activities. Acknowledging this gap in practice is especially important, given that national investment is primarily focussed on downstream activities, such as building sophisticated data storage and access facilities, e-research tools and high-performance supercomputing.
In light of this, our institution is considering the importance of addressing researcher capacity in tasks associated with the stages of the research data life cycle, and in addition, we are designing intervention strategies to modify attitudes and behaviours around particular RDM practices. Clearer articulation of foundational knowledge and professional competencies related to RDM, as contextualised by the research data life cycle, could be included in institutional strategies aimed towards building researcher capacity and good research practice. This study provides new insight into the RDM practices of health and medical researchers from one university to inform development of institutional services, support and training strategies.
Supplemental material
Supplemental Material, Supplemental_Material_App1 for Research data management in practice: Results from a cross-sectional survey of health and medical researchers from an academic institution in Australia by Michelle A Krahe, Julie Toohey, Malcolm Wolski, Paul A Scuffham and Sheena Reilly in Health Information Management Journal
Footnotes
Authors’ contributions
All authors made a substantial contribution to the conceptualisation and design of this article. MAK and JT designed the study and acquired the data. MAK conducted the analysis and led the manuscript preparation and writing. JT, MW, PAS and SR provided substantial guidance, feedback and edits during the research and editing process. All authors contributed to the interpretation of the results and revision of the manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
References
