Abstract
This study assesses the needs of researchers for data-related assistance and investigates their research data management behavior. A survey was conducted, and 186 valid responses were collected from faculty, researchers, and graduate students across different disciplines at a research university. The services for which researchers perceive the greatest need include assistance with quantitative analysis and data visualization. Overall, the need for data-related assistance is relatively higher among health scientists, while humanities researchers demonstrate the lowest need. This study also investigated the data formats used, data documentation and storage practices, and data-sharing behavior of researchers. We found that researchers rarely use metadata standards, but rely more on a standard file-naming scheme. As to data sharing, respondents are likely to share their data personally upon request or as supplementary materials to journal publications. The findings of this study will be useful for planning user-centered research data services in academic libraries.
Keywords
Introduction
Jim Gray (2007) outlined what he labeled the fourth paradigm of scientific research. The fourth paradigm points to increasingly collaborative, computational, and data-intensive scientific discovery. The growth of computational research during the intervening years has resulted in the rapid proliferation of complex data, the management of which requires advanced data management skills, techniques, and tools that researchers often require assistance with. Many academic libraries have responded to this need by developing a suite of research data services (RDS) (Tenopir et al., 2014). RDS may include informational services (e.g. consulting with researchers on data management plans, providing reference support for finding and citing data sets, providing finding aids for data), as well as technical services (e.g. preparing data sets for deposit into a repository, creating or transforming metadata for data or datasets, directly participating with researchers on a project). Although the number of libraries offering RDS appears to have flattened, RDS was still listed as a top trend in academic libraries in 2016 (ACRL Research Planning and Review Committee, 2016).
To develop user-centric RDS, it is important for libraries to understand researchers’ data management practices and to discern their various RDM-related needs. A needs assessment is one way of gathering this information, and such assessments have been carried out in a variety of ways. In light of the NSF data management plan (DMP) mandate that was initiated in 2011, Peters and Dryden (2011) interviewed NSF and NIH grant recipients about their data management needs. Williams (2013) distributed a survey to early career Agriculture faculty members following a presentation that was given as part of a program aimed at helping them succeed in the tenure process. Wiley and Mischo (2016) interviewed Engineering and Atmospheric Science faculty at the University of Illinois at Urbana-Champaign (UIUC) to learn more about their data management needs in the context of their research activities. This study intends to contribute to this line of research by conducting a comprehensive survey on the need for RDS in a research institution.
The library at the institution that is the focus of this study provides RDS primarily in the form of consultation services, particularly for data management plans, and education for graduate students on data management best practices. More specific assistance, e.g. metadata support, is provided on an ad hoc basis as time and expertise within the library allow. Because there is no one designated data librarian in this library system, RDS are managed by a Research Data & Scholarly Communication Committee that is comprised of individuals with data-related expertise from across divisions within the library.
Literature review
As research data services have emerged in academic libraries, studies have increasingly explored the research data management behavior of academic researchers across different disciplines and the types of support that they need in their activities that generate and make use of research data. Williams (2013) surveyed early-career faculty members in the discipline of Agriculture to learn how well-informed and prepared they feel about the data requirements of various funding agencies, what data challenges they perceive, and how the library might help. She found that Agriculture faculty see value in data management training for graduate students and research assistants, as well as the compilation of a library of data management plans from successful grant proposals. Wiley and Mischo (2016) conducted interviews with 21 Engineering and Atmospheric Sciences researchers at the University of Illinois at Urbana-Champaign (UIUC) to determine their data management practices and needs. They found that researchers consider data management to be one part of the research lifecycle, of which they take a holistic view. The faculty interviewed indicated heavy reliance on graduate students for many aspects of research data management associated with their projects, but they were unsure about whether or not they followed data management best practices. They also indicated a need for assistance with storage, back-up, long-term preservation, and archiving of data. In spite of these realizations, none of the researchers interviewed had made use of research data management services within the library, even though they were aware of these services. Keil (2014), an Associate Professor in the Department of Microbiology and Immunology at Montana State University, recognizes the value of libraries in providing data management support in the sciences. She sees value in library support with the management and curation of data throughout the research lifecycle, in particular highlighting the role of libraries in data sharing through support with institutional and disciplinary repositories. Parham et al. (2012) conducted a survey study that assessed faculty data curation needs at Georgia Tech. They investigated the types of data assets created by faculty researchers and associated methods for managing and storing research data. Their findings revealed that most of the respondents were not likely to make a specific plan for data management, partly because researchers lacked knowledge of data management.
Studies have shown that RDS needs are not limited to researchers in the United States. Tripathi et al. (2017) interviewed 40 scholars at research institutions in India about their perceptions toward raw data. They identified a need for enhanced research data services in relation to the organization, archiving, and preservation of raw data. The interview subjects demonstrated uncertainty about data sharing, problems with data storage, and significant interest in library support with these aspects of data management. Similarly, Renwick et al. (2017) conducted a pilot study of 100 researchers across disciplines at the University of the West Indies in the Caribbean and found from the 65 valid responses a lack of knowledge in data management best practices. For example, 66% of respondents used flash-drives as a data storage method. They concluded that academic libraries can play an important role in supporting researchers in the management of their data and by providing technical assistance with data storage.
In the UK, there has been an array of projects that have evaluated research data management practice in higher education institutions, particularly focusing on data curation, long-term archiving, and data preservation. An early effort, the Data Asset Framework (DAF), originally called the Data Audit Framework (Jones et al., 2008), has served as a methodological tool for assessing research data management in academic institutions (Humanities Advanced Technology & Information Institute (HATII), 2009). The purpose of the DAF was to help higher education institutions evaluate the status of their data management and to support effective data management practices. According to HATII (2009), the DAF refers to a set of data assessment methods that covers different aspects of data management practices, such as what data assets are being created and held by researchers; how data are stored, managed, shared, and reused; any related risks, data loss or irretrievability; researchers’ attitudes towards data creation and sharing; and ways to improve data management. The DAF instrument involves the assessment of different stages of data management, including data format, amount of data, data storage, data sharing, and others (Kaye, 2016). The DAF was specifically designed to help identify gaps and barriers to RDM practices in academic institutions and to support building an effective strategy for long-term preservation of research data (Jones et al., 2009; Nassiri and Worthington, 2012). As a result, it has been widely adopted and used by higher education institutions in the UK in the past decade, including Northampton, Southampton, Hertfordshire, Nottingham, London School of Hygiene and Tropical Medicine, Sheffield, and others.
Alexogiannopoulos et al. (2010) adopted the DAF instrument to assess research data management within the University of Northampton Research Data Project. Their project investigated the types of data created and held by researchers and the associated data management practices, with the aim of providing evidence to inform a possible new data management policy and relevant services. From the DAF-based survey, they found that data storage needs and behavior varied throughout the research lifecycle, with different storage devices being preferred at the data collection, analysis, and project completion stages respectively. In addition, their report revealed that a shared server would be an effective method of sharing research data, but that email was most frequently used among researchers. They also identified potential challenges in research data management, such as data ownership issues, outdated data formats, data not being used again after project completion, an under-exploited university data server, and several others.
Wilson (2013) reported the results of the DAF-based survey study conducted with 314 researchers in Oxford. Their findings indicated that more than half of the participants regarded research data management as essential to their research cycles, but their awareness of the institution’s data management infrastructures was low. Knight (2013) also adopted the DAF to assess the application of data management practices among researchers in the London School of Hygiene and Tropical Medicine. According to his analysis of 117 responses, more than two-thirds of the respondents handled personally identifiable information at some stage of the research lifecycle. The results also indicated that researchers used multiple types of data storage, particularly their school servers, local disk drive on a laptop, or portable storage devices. He identified a range of challenges in RDM, such as uncertainty on practices for data archiving, issues in production of data sharing agreements, and uncertainty on documentation standards.
Nassiri and Worthington (2012) conducted a comprehensive audit of research data management in the University of Hertfordshire, based on the DAF method. Their results highlighted that the respondents were not knowledgeable in data documentation. They also found that sharing data for reuse might not be common practice, and a large number of the respondents were not aware of the existing services in their institution designed for data management and sharing. Their study revealed that a lack of guidelines, policies, and facilities for data preservation could be an issue in research data management. Similarly, Parsons et al. (2013) conducted an RDM assessment using the DAF instrument at the University of Nottingham. Their findings showed that researchers utilized multiple types of storage for research projects (e.g. hard drives of campus computers or laptops, external storage, and shared drives), and they used an external hard drive most often for data back-up. They also found that a majority of the respondents did not record metadata for their datasets and only a small number of the respondents adopted existing standards or guidelines for data documentation. Also, approximately two-thirds of the respondents answered that they did not develop a data management plan for their project.
Cox and Pinfield (2014) also adopted the DAF method to conduct a comprehensive national survey of UK universities to understand in what ways academic libraries were involved in research data management and the extent to which RDM services were a strategic priority for them. Their survey showed that limited RDM services were available in general in 2012, and that the major challenges in RDM services included skills gaps, limited resources, and cultural changes. In spite of this, they found that academic libraries recognized the importance of RDM, and many of the participant libraries were involved in developing new policies and services related to RDM. From 26 semi-structured interviews of library staff from a number of different UK institutions, Pinfield et al. (2014) analyzed the contribution to and roles of academic libraries in research data management based on the analysis of main components and major drivers for research data management. From the interview study, they constructed a model that defines different aspects of research data management, including multiple layers of activities, multiple stakeholders and drivers, and associated factors related to implementation of research data management initiatives.
In a study on perceptions of research data management and RDS in higher education environments among academic institutions in the UK, Johnson et al. (2016) reported results from a comprehensive survey that explored the management of active data, data preservation, data sharing, and institutional RDS. They found that awareness of institutional services to support data management and sharing remained low and most respondents were not using available services. Respondents did indicate interest, however, in a number of topics for training, including long-term storage of data, developing DMPs for funding applications, and collaboration and sharing of data. This suggests that better marketing may increase receptivity of RDS at participating institutions.
Cox et al. (2017) conducted a survey study that involved academic libraries in Australia, Canada, Germany, Ireland, the Netherlands, New Zealand, and the UK. They found that while libraries in these countries have provided leadership in research data management advocacy and policy development, they lag behind in service development, especially in the area of technical services. This is consistent with trends in the US that are mentioned above. Based on their findings, they propose an RDM Maturity Model that suggests a range of research data management activities that reflects current and planned data services and a practice landscape.
Studies have also been conducted on the perceptions of academic librarians on the role of university libraries in the realm of research data support. Tenopir and associates (2012; 2014; 2015) conducted a series of large-scale studies aimed at defining and identifying types of research data services, and they traced the growth of these and related services. Tenopir et al. (2012) surveyed academic library members from the Association of College and Research Libraries (ACRL) in order to assess the status of research data services in academic library communities. They found that only a small number of academic libraries offered research data services in 2011, but a significant number of libraries planned to offer some type of data services in the near future. Tenopir et al. (2014) found that research data services were still not widely employed in libraries, but that many libraries were in the process of planning to offer these types of services. They noted that technical research data services were offered less often than informational services. In spite of efforts on the part of library leadership to provide training in the area of research data management, librarians do not perceive these opportunities to be sufficient. Tenopir et al. (2015) continued to find hesitance on the part of academic libraries to develop research data services, but interest in doing so. This is due, in part, to lack of institutional and administrative support, as well as lack of sufficient technical expertise to offer technical research data services and uncertainty about who in the library will be providing the services ultimately offered.
These studies demonstrate the efforts of academic libraries to develop research data services based on patron needs. This study contributes to this research by expanding the scope of services to include those offered by other support units on campus. Our research questions are as follows:
RQ1: What types of data-related assistance have researchers received from the university?
RQ2: What types of data-related resources do researchers need for their research activities? To what extent are they willing to use these resources?
RQ3: How do researchers currently manage their data, e.g. data documentation and sharing?
Methods
In this study, a survey was distributed to faculty, researchers, post-docs, and graduate students in a public R1 research university. Undergraduates were excluded from the survey. Prior to distribution, the survey was evaluated by a group of four external experts comprised of two librarians and two library science scholars. The librarians both have experience in data management services in academic libraries, while the library science scholars have expertise in data science and data management. Based on their feedback, the survey was updated and pretested with 10 potential subjects. The final survey was distributed online to 1950 targeted faculty, researchers, and graduate students across a variety of disciplines. In total, 186 valid responses were collected for a response rate of 9.54%.
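The response rate reported above follows directly from the two counts given in this section. A minimal sketch of the arithmetic is shown below; the function name `response_rate` is ours and is used only for illustration.

```python
# Counts reported in the Methods section.
surveys_distributed = 1950
valid_responses = 186


def response_rate(valid: int, distributed: int) -> float:
    """Return the response rate as a percentage of distributed surveys."""
    if distributed <= 0:
        raise ValueError("distributed must be positive")
    return 100 * valid / distributed


rate = response_rate(valid_responses, surveys_distributed)
print(f"Response rate: {rate:.2f}%")  # prints "Response rate: 9.54%"
```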
Table 1 provides the demographic information of participants. Approximately 55% of the respondents were graduate students, 40% of whom were doctoral students. A further 35% were regular titled assistant, associate, and full professors.
Demographic information.
Results
Experience with research assistance on campus
The survey first investigated what types of data-related assistance respondents have received on campus (Table 2). The first question was: “Have you received assistance on campus with [type of assistance]?” This was followed by the question: “If yes, assistance from people in which area?” Respondents were asked if they had received any help in nine distinct areas from various units on campus. The results demonstrate that data analysis (39.8%) and data collection (34.4%) are the areas where help is most commonly received. Support in these areas came predominantly from affiliated departments and colleges, with some reliance on research centers, campus IT, and the university library.
Experience with research data services on campus.
Multiple responses were allowed.
A cross-tabulation (Table 3) was created to explore patterns of research assistance by discipline. The percent value in each cell indicates the proportion of each type of assistance experience by discipline. Overall, researchers in the humanities were less likely to benefit from research assistance offered by units on campus. For data management planning, social sciences (23.8%), science and engineering (20.4%), and health sciences (21.1%) tend to make the most use of related services on campus. In regard to data collection, health science researchers (52.6%) make the most use of campus resources, followed by agriculture (38.9%), and science and engineering (35.2%). Support for data analysis is high among researchers in health sciences (63.2%), agriculture (61.1%) and social science disciplines (49.2%). Health scientists benefit most from assistance finding existing datasets (36.5%), followed by social scientists (30.2%) and agriculture researchers (16.7%). Social scientists appear to receive the most assistance with long-term storage (22.2%), while health scientists (21.1%), social scientists (19%) and agriculture researchers (16.7%) receive the most assistance with meeting funder mandates.
Research assistance experience by discipline – percentages.
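The cell percentages in a cross-tabulation such as Table 3 are the share of respondents within each discipline who reported receiving a given type of assistance. The sketch below illustrates that calculation with a handful of hypothetical records. The records, the function name `crosstab_percent`, and the resulting values are illustrative assumptions and are not the survey data.

```python
from collections import Counter

# Hypothetical records: (discipline, received data analysis assistance?).
# These rows are invented for illustration and are not the survey responses.
records = [
    ("health sciences", True),
    ("health sciences", True),
    ("health sciences", False),
    ("humanities", False),
    ("humanities", False),
    ("humanities", True),
    ("humanities", False),
]


def crosstab_percent(rows):
    """Percent of respondents per discipline who received the assistance."""
    totals = Counter(discipline for discipline, _ in rows)
    received = Counter(discipline for discipline, got in rows if got)
    return {d: round(100 * received[d] / totals[d], 1) for d in totals}


table = crosstab_percent(records)
for discipline, pct in sorted(table.items()):
    print(f"{discipline}: {pct}%")
```

Because each percentage uses the discipline's own respondent count as the denominator, disciplines with few respondents can show large swings from a single answer.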
Need for research data services
The perceived level of support needed for different types of data-related research activities was investigated using a 5-point Likert scale where 1=‘not at all needed’ and 5=‘very needed’ (Table 4). Results show that respondents most want to receive assistance with quantitative analysis (2.92), followed by assistance with data visualization (2.91). This was followed by a perceived need for support with finding existing datasets (2.84); data management plans (2.79); meeting funder mandates for sharing data (2.73); data collection (2.72); data analysis (2.69); and assistance with data refinement/cleaning (2.68). The two services for which there was the least perceived need for support are data documentation (2.67) and finding existing data sets (2.62).
Perceived needs for research data services (Responses to the question: “Rate the level of support that you need for the following research processes and activities.”).
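The mean ratings reported in this section are averages of individual 5-point responses, which are then ranked from highest to lowest perceived need. The sketch below shows one way to do this. The response lists are hypothetical and deliberately small; they are not the survey data, and the resulting means differ from those in Table 4.

```python
from statistics import mean

# Hypothetical 5-point Likert responses
# (1 = 'not at all needed', 5 = 'very needed'); not the survey data.
responses = {
    "quantitative analysis": [4, 3, 2, 5, 1],
    "data visualization": [3, 3, 2, 4, 2],
    "data documentation": [2, 3, 1, 4, 2],
}

# Mean rating per service, rounded to two decimals as in the tables.
mean_need = {service: round(mean(r), 2) for service, r in responses.items()}

# Rank services from highest to lowest mean perceived need.
ranked = sorted(mean_need.items(), key=lambda item: item[1], reverse=True)

for service, score in ranked:
    print(f"{service}: {score:.2f}")
```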
An analysis of perceived needs for research support by discipline (Table 5) demonstrates significant differences across disciplines. The most significant mean differences exist between the health sciences and other disciplines. This is reflected by the relatively high level of need across different types of data-related research support in the health sciences, while humanities researchers exhibit relatively low perceived need for research support. For example, researchers in the health sciences demonstrate the highest need for assistance with quantitative data analysis (3.68), while researchers in the humanities demonstrate the lowest need for this type of service (2.00). Agriculture and social science researchers also exhibit moderately high needs for research support across different data-related research activities. In particular, they indicated a high need for quantitative data analysis and visualization support – agriculture (3.28) and social science (3.37).
Perceived needs for research data services by discipline.
**p<0.01; *p<0.05.
Intention to use research data services
Potential intention to use research data services if they are offered on campus was measured with a 5-point Likert scale where 1=‘never’ and 5=‘very often’ (Table 6). Results reveal that researchers would be likely to use data visualization (2.90) and quantitative data analysis (2.85) if those services were offered. This was followed by data collection (2.70); finding existing datasets (2.68); services related to data documentation (2.61); and assistance meeting funder mandates for sharing research data (2.58).
Perceived intention to use research data services on campus (Responses to the question: “To what extent would you use the following services if they were offered on campus?”).
**p<0.01; *p<0.05.
Perceived intention to use research data services by discipline can be found in Table 7. This question was rated on a 5-point Likert scale that ranges from 1=‘never’ to 5=‘very often’. As with perceived needs for data-related research support, intention to use research data services is significantly higher among health science researchers than among humanities researchers. For example, health science researchers report a relatively high intention to use quantitative data analysis services (3.47), while researchers in the humanities report a low intention (1.81). Looking at each discipline, researchers in agriculture showed high intention to use services for data collection (3.24), quantitative data analysis (3.44), and data visualization (3.47), which are directly related to their research process. Humanities researchers showed low intention across the different types of services, particularly for quantitative data analysis (1.81), data refinement/cleaning (1.84), and funder mandates for sharing research data (1.75). Social scientists showed relatively high intention to use quantitative data analysis (3.22) and data visualization (3.10) services. Science and engineering researchers’ ratings were all below three on the five-point scale. Health scientists showed the highest ratings for most services, in particular data collection (3.42), finding existing datasets (3.37), and quantitative data analysis (3.47).
Perceived intention to use research data services by discipline.
**p<0.01; *p<0.05.
Data format, storage, documentation, and sharing
Next, this study investigates the research behavior of researchers in the areas of data collection, storage, documentation, and sharing. First, we inquired about the format of data that researchers typically use in their research (see Table 8). Unsurprisingly, tabular or spreadsheet format data are the most widely used by researchers (76%). This is followed by textual formats (52.7%), image files (41.9%), audio files (25.3%), video files (18.3%), geospatial data (15.6%), and artifacts/samples/specimens (13.4%).
Data format that researchers use in their research (Responses to the question: “Which format of data do you usually generate in your research?”).
Data formats vary by discipline (Table 9). For all disciplines except the humanities, tabular/spreadsheet formats are the most prevalent, e.g., agriculture (94.4%), social science (85.7%), and science and engineering (81.5%). Researchers in the humanities are more likely to use textual format data (65.6%). Image format data is used widely across disciplines, particularly in agriculture (66.7%) and science and engineering disciplines (55.6%). Audio recordings are frequently used by social scientists (42.9%) and researchers in the humanities (31.3%). This could be due to the frequency of interviews conducted in these fields.
Data format by discipline – percentages.
In addition to file format, the survey explored the type of data storage that researchers use for their research data. Table 10 includes data storage types that researchers selected. As expected, computers and external storage devices are the most widely used (85.5%). Cloud storage is the second most widely used resource (49.5%), which reflects the recent upsurge in cloud storage services in academia. This is followed by storage options in the lab (33.9%), department/college (25.3%), and at the university level (11.3%). While external data repositories are used by 10.8% of respondents, the library institutional repository is currently only rarely used by campus researchers (1.6%).
Data storage used by researchers (Responses to the question: “Where do you store your research data?”).
When comparing storage usage by discipline (Table 11), it is apparent that computers and external storage devices are widely used across disciplines, with over 80% of respondents in every discipline indicating that they regularly use these options. Cloud storage is the next most commonly used storage option, with usage ranging from a high of 54% among social scientists to a low of 31.6% among researchers in the health sciences. By contrast, health science researchers depend heavily on department and/or college storage options (63.2%), while those in science and engineering and agriculture use these resources less often, at 31.5% and 22.2% respectively. Lab computers and storage options are frequently used by researchers in science and engineering (59.3%) and agriculture (55.6%). Social scientists also use departmental and college storage resources (19%), as well as the university server (12.7%). Health scientists rely more on the university server than researchers in any other discipline (26.3%). Reliance on external data repositories is lower than that for other storage options, with the heaviest use coming from researchers in science and engineering (16.7%) and agriculture (11.1%). The resource least commonly used is the library institutional repository, with researchers in the social sciences (3.2%) and the humanities (3.1%) making the most use of this resource.
Data storage use by discipline – percentages.
In regard to the amount of storage needed for research data (Table 12), 17.2% of respondents indicated that they need less than 1 GB of storage, 28% indicated that they need between 1 and 10 GB, 19.9% require 10 to 100 GB, and 11.8% need between 1 and 10 TB of storage. Only three participants indicated that they need more than 10 TB of storage. When analyzed by discipline (Table 13), it is apparent that science and engineering researchers and health scientists require more storage capacity than those in other disciplines, with 22.6% and 23.5% respectively requiring between 1 and 10 TB. Only in science and engineering disciplines did any researchers indicate a need for between 10 and 50 TB of storage space. This result is consistent with the finding of several prior studies (Akers and Doty, 2013; Cox and Williamson, 2015; Johnson et al., 2016; Nassiri and Worthington, 2012; Parsons et al., 2013) that a majority of researchers manage less than 100 GB of research data.
Digital storage needs for research data (Responses to the question: “Which of these options approximately describe your current digital storage needs for research data and related materials?”).
Digital storage needs for research data by discipline – percentages.
Percentage was calculated with only valid respondents.
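The statement that a majority of researchers manage less than 100 GB of data can be checked by summing the shares reported for Table 12 in the text above. The sketch below does so using only those reported percentages. The band labels are ours, and the band between 100 GB and 1 TB is left out because the text does not report a share for it.

```python
# Shares of respondents per storage band, as reported in the text (percent).
# No share is reported for the band between 100 GB and 1 TB, so it is omitted.
reported_shares = [
    ("< 1 GB", 17.2),
    ("1-10 GB", 28.0),
    ("10-100 GB", 19.9),
    ("1-10 TB", 11.8),
]

# Bands that fall entirely below 100 GB.
bands_under_100_gb = {"< 1 GB", "1-10 GB", "10-100 GB"}

share_under_100_gb = round(
    sum(pct for band, pct in reported_shares if band in bands_under_100_gb), 1
)
print(f"Respondents needing less than 100 GB: {share_under_100_gb}%")
# prints "Respondents needing less than 100 GB: 65.1%"
```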
Data documentation behavior is broken down into the various methods that researchers adopt to keep a record of annotations, metadata, or other information describing their research data. More than half of the respondents use standard file naming to organize their research data (59.1%). Researchers also keep data documentation information in research notebooks, including field notebooks (27.4%) and lab notebooks (26.3%). Lab notebooks are more heavily used in science, engineering and agriculture. Codebooks are used by 23.1% of respondents with use being heaviest among social scientists. README files and separate data dictionaries are less widely used with 13.4% and 11.3% respectively. Standard metadata schemas are used the least often, i.e. all-purpose schemas (1.6%) and discipline-specific schemas (1.1%).
The last area studied was researchers’ intention to share their research data (Table 14). About half of respondents indicated that they are willing to share their research data with others (52.2%). When broken down by discipline (Table 15), science and engineering researchers are the most willing to share (62.3%), followed by those in social science (57.4%), agriculture (50.0%), and the health sciences (47.4%). Humanists are the least willing to share their research data (36.7%).
Intention to share research data (Responses to the question: “Do you intend to share the data sets associated with your research?”).
Intention to share research data by discipline – percentages.
For those who responded “willing to share”, we further asked which methods they would use for data sharing (Table 16). The most common method indicated was sharing by personal request only (68%). This was followed by a willingness to share research data as supplementary materials as part of a journal publication (43.3%) or to post the data to a website (20.6%). Intention to use external, disciplinary repositories and institutional repositories was much lower at 17.5% and 15.5% respectively. This finding is consistent with Akers and Doty’s (2013) study, which found that individual sharing via email is the most frequent method, followed by supplementary data linked to journal articles. Similarly, Cox and Williamson (2015) found that data repositories were not widely used as a means of data sharing.
Data sharing methods (Responses to the question: “If yes, which methods of sharing your research data do you currently use?”).
Multiple responses were allowed.
Percentage out of 97 respondents who answered “willing to share”.
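As a rough illustration of how to read these percentages, the following Python sketch back-calculates approximate respondent counts from the reported shares and shows why the shares sum to more than 100% when multiple responses are allowed. The counts are estimates obtained by rounding, not figures reported in the survey, and the function and variable names are illustrative assumptions.

```python
# Reported shares for each sharing method, as a percentage of the
# 97 respondents who answered "willing to share".
# Multiple responses were allowed, so the shares need not sum to 100.
REPORTED_SHARES = {
    "personal request only": 68.0,
    "journal supplementary materials": 43.3,
    "website": 20.6,
    "external disciplinary repository": 17.5,
    "institutional repository": 15.5,
}
BASE_RESPONDENTS = 97


def implied_count(percent: float, base: int) -> int:
    """Estimate the respondent count behind a reported percentage.

    This is a back-calculation by rounding, not a reported figure.
    """
    return round(percent / 100 * base)


def total_share(shares: dict[str, float]) -> float:
    """Sum the shares; a total above 100 reflects multi-select answers."""
    return round(sum(shares.values()), 1)


if __name__ == "__main__":
    for method, percent in REPORTED_SHARES.items():
        count = implied_count(percent, BASE_RESPONDENTS)
        print(f"{method}: about {count} of {BASE_RESPONDENTS}")
    print(f"total share: {total_share(REPORTED_SHARES)}%")
```

Because respondents could select several methods, the total share exceeds 100%, so the percentages describe how often each method was chosen rather than a partition of the respondents.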
For those respondents who indicated an unwillingness to share their data (Table 17), the most frequently chosen reason was that the data included confidential, proprietary, or classified information (39.3%). Lack of expertise and the time and effort involved in sharing data were also often selected as impediments, at 31.0% and 25% respectively. Other reasons include intellectual property concerns (20.2%), lack of tools for sharing or publishing data (17.5%), concern about losing research advantage (15.5%), and concern over possible data misinterpretation (14.3%).
Reasons behind researchers’ unwillingness to share data (Responses to the question: “If no, please tell us why?”).
Multiple responses were allowed.
Percentage out of 84 respondents who answered “not willing to share”.
Discussion and conclusion
Using a survey method, this study comprehensively assesses the need of researchers for data services and investigates researchers’ data management and sharing behavior. The findings from this study will serve as a foundational understanding of researchers’ potential need for data services, which will inform the design of research data services in academic libraries.
The findings of this study reveal that researchers benefit from different types of support for their research activities, ranging from data management to data collection and refinement to data analysis and visualization, among others. While most prior studies investigated research data services in an academic library context (e.g. Akers and Doty, 2013; Keil, 2014; Williams, 2013), this study surveys research data services more broadly, across different departments or units on a university campus. The bulk of such support is likely to come from departments and/or colleges, whose services tend to be tailored to researchers’ specific areas of research and which typically have separate funding or staff to support the diverse research activities of affiliated faculty and students. The results suggest that departments and colleges where support personnel have a certain level of domain knowledge in the discipline and well-established relationships with researchers may well be the ideal place for support of data collection and analysis.
Researchers have very little experience utilizing research data services offered by the library. This result aligns with a prior study by Williams (2013), which reveals that researchers have little intention to request data-related assistance from libraries. The experience of researchers with library RDS support is related primarily to assistance with data management plans, finding datasets, and finding repositories for their data. It is for this reason that the university library on the campus that is the focus of this study currently provides general data management support and assistance finding repositories for research data, but not technical assistance, e.g. data analysis. This is consistent with research data services that are offered in libraries throughout the country that focus on informational and consulting services (ACRL Research Planning and Review Committee, 2016). This study affirms that researchers benefit from research data services offered by other units on campus, highlighting the importance of cross-campus collaboration, particularly in regard to information technology services (Wittenberg and Elings, 2017).
According to this study, the services for which researchers perceive the greatest need include assistance with quantitative analysis and data visualization. This conveys the need for more direct, embedded help in analyzing research data during the research process and reveals a gap between the services currently offered and researcher needs. This finding, which is unique to this study, also points to a gap between the perspectives of librarians and researchers with regard to assistance with data analysis. The survey conducted by Cox et al. (2014) showed that librarians perceived data analysis as the least important aspect of RDM. This study, however, revealed that assistance with data analysis is, in fact, what researchers need most.
According to this study, the perceived need for research data services varies significantly by discipline. Overall, the need for these services was highest in the health, social, and agricultural sciences and lowest in the humanities. This is unsurprising given the likelihood that researchers in the sciences work with large, numerical data, e.g. clinical or genomic data. Researchers who work with this type of data tend to seek more assistance throughout the research process (Joo et al., 2017). This is likely due, at least in part, to the fact that research in the sciences is more likely to be externally funded, and researchers often need assistance with the data management plan portion of the grant proposal and other funder mandates (Mutz et al., 2014). Humanists, on the other hand, indicate the least need for research data services. This may be explained by their tendency to work individually with small sets of qualitative data, often without external funding. This may change with the steady growth of digital humanities.
In addition to examining the perceived research data needs of respondents, this study delved into the data formats typically generated and used, data documentation and storage practices, and data-sharing behavior of researchers. Respondents in all fields except for the humanities most commonly use tabular data that is managed in spreadsheets. This finding affirms several previous studies that show spreadsheet files and text document files are most commonly created and used by researchers (Alexogiannopoulos et al., 2010; Nassiri and Worthington, 2012; Parham et al., 2012; Parsons et al., 2013; Wilson, 2013). This is an educational opportunity for libraries given the many data management problems that can arise from improper use of spreadsheets, e.g. inconsistent metadata, multiple data types in a single column, and lack of version control.
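One of the spreadsheet problems mentioned above, multiple data types in a single column, can be checked automatically once a sheet is exported to CSV. The following Python sketch is a minimal illustration under stated assumptions: the function names, the three-way classification of cells, and the sample data are hypothetical and are not drawn from the survey.

```python
import csv
import io


def classify(value: str) -> str:
    """Classify a raw cell as 'empty', 'number', or 'text'.

    Note: float() also accepts strings such as 'nan' and 'inf',
    so those are treated as numbers in this sketch.
    """
    text = value.strip()
    if not text:
        return "empty"
    try:
        float(text)
        return "number"
    except ValueError:
        return "text"


def mixed_type_columns(csv_text: str) -> dict[str, set[str]]:
    """Return the columns that mix numeric and textual cells."""
    kinds: dict[str, set[str]] = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header row; ignored in this sketch.
                continue
            kind = classify(value or "")
            if kind != "empty":
                kinds.setdefault(column, set()).add(kind)
    return {column: found for column, found in kinds.items() if len(found) > 1}


# Hypothetical sample: 'height_cm' mixes numbers with the placeholder 'n/a'.
SAMPLE = "plot_id,height_cm\nA1,12.5\nA2,n/a\nA3,14\n"

if __name__ == "__main__":
    print(mixed_type_columns(SAMPLE))
```

A check like this only flags candidate columns; deciding whether a value such as "n/a" should become an empty cell or a documented missing-value code remains a judgment for the researcher.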
In regard to data documentation, our findings reveal that respondents rarely use metadata standards, relying instead on a standard file-naming scheme, followed by documentation in field and lab notebooks and codebooks. This finding is consistent with Akers and Doty’s (2013) survey, which found that most researchers across disciplines are not familiar with data documentation practices and/or creating metadata for their data. Given the inconsistencies that arise when using these various techniques for documenting data, this is an area where libraries can draw upon their extensive expertise in metadata to offer assistance and training (Johnston et al., 2012; Kim, 2014; Yoon and Schultz, 2017). Johnson et al. (2016) affirmed that lab books and notebooks are among the most commonly used data types. Thus, it is important to provide guidance on how to use field and lab notebooks effectively for data documentation. Previous studies, such as the survey reports by Knight (2013) and Parsons et al. (2013), indicated that training on the creation and management of data documentation is a topic of interest to researchers. Academic libraries should therefore offer training opportunities that help researchers better organize the metadata for their research data. The challenge is educating researchers about the benefits of using metadata standards when non-standard documentation methods have long sufficed (Akers and Doty, 2013). Highlighting the importance of metadata for preservation and access functions, such as data archiving, sharing, and reuse, is a good place to start (Kim, 2014). Providing an easy-to-use, self-generating metadata option when designing data management systems, such as data repositories, can also facilitate the adoption of metadata standards.
As to data-sharing practices, researchers are likely to share their data personally upon request or in supplementary materials to journal publications. This reaffirms Akers and Doty’s (2013) finding that individual email is the most prevalent data-sharing method, followed by journal publication supplements. Johnson et al. (2016) found that cloud storage services are another widely used method for data sharing, but this study did not investigate the use of cloud storage as a tool for data sharing. This study also found that science and engineering researchers are more willing to share than researchers in other disciplines. This finding is similar to those of previous survey studies. For example, Akers and Doty (2013) found that researchers in the basic sciences were more likely to be involved in data sharing than those in other disciplines. Similarly, Wilson’s (2013) report on a survey of Oxford University researchers indicated that researchers in the mathematical, physical, and life sciences were the most open to data sharing. In addition, this study reaffirmed the findings of several prior studies (e.g. Akers and Doty, 2013; Cox and Williamson, 2015) that data repositories have not yet been widely adopted in data-sharing practices. The hindering factors that we found were consistent with prior literature, especially confidentiality and permission issues (e.g. Johnson et al., 2016; Williams, 2013; Wilson, 2013) as well as sensitive information (Akers and Doty, 2013). Confidentiality was a major concern for participants. In particular, health scientists show relatively high concern about sharing data, as they tend to deal with medical data that includes personal information.
As data sharing can inspire researchers to initiate new or additional research (Wilson, 2013), it is important to create an environment that facilitates data sharing among researchers. Institutional repositories can serve as an important tool for libraries to support researchers’ data-sharing behavior, since the availability of a data repository can positively influence data sharing and reuse among researchers (Yoon and Kim, 2017). Academic libraries need to actively promote their data repositories and help researchers find appropriate repositories to facilitate data openness.
This study has several limitations. First, even though we tried to draw a representative sample from a large research university, a single institution cannot represent the entire community of researchers in the United States. Research data needs may differ across universities of different sizes or regions. In addition, graduate students comprise approximately 54% of the sample, so the sample may not adequately represent the perspectives of faculty researchers. Moreover, this study did not investigate the need for training or education related to research data management, although prior literature has emphasized the importance of user education and training in this area (Cox and Williamson, 2015; Williams, 2013). Finally, this study relies only on descriptive analysis of questionnaire responses and thus could not investigate the in-depth reasons behind participants’ responses. For example, we did not examine in detail why researchers would or would not use specific kinds of research assistance on campus. These limitations point to future research with a larger sample that better represents different regions and different sizes of research institutions. In particular, it would be useful to investigate the differences in data service needs between research-oriented and teaching-oriented institutions. In addition, we plan a follow-up study to further analyze the reasons behind researchers’ needs for different types of data services.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Institute of Museum and Library Services (RE-32-16-0140-16).
