Abstract
Research data management is an important topic for funding agencies, universities and researchers. In this context, the main aim of this study is to collect preliminary information for Aperta, which is being developed by the Scientific and Technological Research Council of Turkey, to fulfil the following goals: determine the research data management awareness levels of researchers in Turkey; understand current research data management practices in their research environments; and find out their experiences of policy issues. For this, a questionnaire was distributed to 37,223 researchers, with 1577 researchers completing it. The results indicated that researchers who spend more time with data have more concerns about data management issues. The levels of experience of creating a data management plan were quite low. The importance of this study lies in how it is able to show the current research data management practices of Turkish scholars during the new repository’s foundational development stage.
Introduction
The deluge of data in recent decades has created not only great opportunities for scientists but also the challenge of managing enormous amounts of information. Discovering, accessing, storing, migrating, integrating and reusing data have become easier and more common. Hence, research data management (RDM), which concerns the organization and dissemination of data (Whyte and Tedds, 2011), has become an important topic for all stakeholders involved in research. It includes ‘the organization, storage, preservation, and sharing of data’ (University of Pittsburgh, 2020). It has become both an opportunity (Royal Society, 2012) and a challenge (McAfee and Brynjolfsson, 2012; Pinfield et al., 2014) for research organizations.
Many researchers, funders and countries that understand the importance of RDM are taking important steps in this direction. For researchers and funding agencies in Turkey, the biggest problem concerns the awareness of and attitudes towards RDM practices. Up until 2019, national funding agencies offered researchers little encouragement and showed little drive to enforce guidelines regarding RDM. However, researchers who are involved in the Horizon 2020 grant scheme are required to submit RDM plans within six months of their project being funded, like any other grantee in Europe. Even though some researchers are aware of the benefits of RDM, they lack institutional support, have limited knowledge and technical skills, or are scared of falling victim to unethical practices, such as being scooped (Allard and Aydinoglu, 2012; Aydinoglu et al., 2017; Ünal and Kurbanoğlu, 2018). Much needs to be done to address the problem, but at least those at the highest level are now showing commitment.
The head of the main funding agency in Turkey, the Scientific and Technological Research Council of Turkey (TÜBİTAK), announced its commitment to open data during the 6th Turkey Open Science Summit, held in 2018. In 2019, TÜBİTAK announced its ‘Open science policy’, covering publications and research data that are produced through TÜBİTAK funding (TÜBİTAK, 2019). It recommends ‘establishing a research data management plan for open access to research data’ and ‘providing open access to publications along with research data’, and also commits to preparing templates and guidelines for data management plans (TÜBİTAK, 2019: 3–4). Moreover, an institutional repository, named Aperta, has been created (Aperta TÜBİTAK, 2019) and an RDM training portal (Araştırma Verileri Yönetimi, 2019) has been prepared. TÜBİTAK aims to provide open access to the output of all kinds of funded projects, including research data and publications funded by TÜBİTAK’s Support Program of International Scientific Publications, TÜBİTAK-addressed publications, articles published in TÜBİTAK academic journals and related research data (Aperta TÜBİTAK, 2019). The training portal, for its part, provides templates, guidelines and videos to researchers in need of training on RDM (Araştırma Verileri Yönetimi, 2019). The active initial steps that were taken by the country’s main funding body may function as an accelerator regarding RDM and open research data in Turkish universities. In such an important process, this study aims to provide information regarding the RDM behaviours of Turkish scholars, which would help in the design of not only Aperta but also other data repositories.
Literature review
RDM offers several opportunities not only to researchers but also to the scientific enterprise and society in general. For researchers, because ‘proper data management is also a key prerequisite for effective data sharing’ (Sesartic and Dieudé, 2017), one advantage is increased visibility for their research and, consequently, greater opportunities for their publications to receive more citations (Herold, 2015). Evidence indicates that studies whose data are made available receive more citations than similar studies whose data are not (Fecher et al., 2015; Ioannidis et al., 2009; Piwowar et al., 2007; Piwowar and Vision, 2013; Spires-Jones et al., 2016). As for science in general, RDM helps in identifying questionable research ethics and dealing with the reproducibility crisis that has plagued science (John et al., 2012; Roettger et al., 2019). The integration and reuse of data spells the introduction of new methods and better science (Fecher et al., 2015; Tenopir et al., 2011; Xia et al., 2017).
However, challenges remain. Research suggests that, despite all the efforts on the part of the scientific community, data-sharing still tends to occur through personal exchanges (Fecher et al., 2015; Ferguson et al., 2014; MacMillan, 2014; Wallis et al., 2013). Policymakers have been encouraging researchers to share their research data and, in general, to have better data management practices. For instance, different funding agencies in the USA have adopted RDM policies into their grant schemes (NASA, 2011; National Institutes of Health, 2008; National Science Foundation, 2010), and Europe has followed (European Commission, 2016). Higman and Pinfield (2015) looked at the adoption of data management practices in higher education institutes and found that successful cases could be attributed to large research funders. Despite enviable open access data-sharing practices, challenges persist. The fMRI Data Center for the neuroimaging community is a good example (Mennes et al., 2013; Poldrack et al., 2017). Sociocultural problems in ecology continue to hinder scientific advancement as well (Hardisty et al., 2013). Dental research is plagued by similar concerns, despite an environment of positive attitudes towards data-sharing (Spallek et al., 2019). A recent survey conducted among earth and planetary geophysicists found that scientists are aware of the benefits of sharing their research data; however, they are concerned about data misuse and the risk of not receiving credit for their data sets (Tenopir et al., 2018). Data misuse as a concern for sharing data has appeared in different surveys (Aydinoglu et al., 2017; Bertzky and Stoll-Kleemann, 2009; Cragin et al., 2010; Elsayed and Saleh, 2018).
The management of research data creates an opportunity for library and information science practitioners and scholars. For academic institutions, university libraries play a critical role not only in managing research data but also in training the next generation of researchers to be well versed in data management practices (Koltay, 2019; Tenopir et al., 2016). There exists a knowledge gap among scientists that can easily be addressed by these professionals (Steeleworthy, 2014; Strasser and Hampton, 2012; Tenopir et al., 2017; Verbakel et al., 2013).
Several researchers are of the opinion that incentives for researchers can be a solution to the sociocultural barriers preventing data-sharing in the sciences (Ioannidis et al., 2014; Koole and Lakens, 2012; Michener, 2015; Nosek et al., 2012). Data sets can be cited and counted in National Science Foundation grant applications as long as they are citable and accessible (Piwowar, 2013). Attitudes towards data-sharing differ among scientists depending on their academic discipline (Tenopir et al., 2011, 2015a) or which sectors they belong to, such as academia or industry (Pollock, 2016). Even within a discipline, developments over time may have changed the atmosphere and attitudes of scientists, such as in the case of medical science, which has traditionally been conservative regarding data-sharing (Tenopir et al., 2011) but is now moving in the opposite direction (Yegros-Yegros and Van Leeuwen, 2019).
Data citation is not the only thing researchers consider when it comes to RDM. Tenopir et al. (2011) conducted an international survey of researchers from different disciplines to explore current practices in, and perceptions towards, data-sharing and found that scientists do not make their research data available or receive institutional support, and are willing to share their data only if certain conditions are met. A follow-up study (Tenopir et al., 2015b) to observe changes (if any) regarding data-sharing was conducted with regard to how many funding agencies made data management plans mandatory for their grantees and how awareness had increased. As well as an increase in acceptance and willingness to engage in data-sharing, actual data-sharing behaviours also saw a spike. However, with greater awareness, risk perception among the participants had also spiked, as barriers remained. Other surveys have found similar results (Aydinoglu et al., 2014, 2017; Grootveld et al., 2018; Kratz and Strasser, 2015; Whitmire et al., 2015).
There have been a limited number of studies investigating RDM practices in Turkey. A nationwide survey found that the concept of RDM did not exist in open access policy papers (Tonta, 2012, 2013). The studies so far have focused on researchers’ attitudes towards and practices of RDM and data-sharing: Allard and Aydinoglu (2012) investigated data-sharing practices among environmental scientists; Aydinoglu et al. (2017) examined the RDM behaviours of academics at research-intensive universities in Turkey; and Ünal and Kurbanoğlu (2018) researched attitudes towards RDM. A recent study by Ünal et al. (2019) compared scholars from Turkey to those in France and the UK and found that there are big differences in data behaviours such as ‘the use of data from outside sources’, ‘expectations for funding for data storage and open access’ and ‘concerns for sharing their data’. It is important to have a better understanding of how scholars’ data behaviours compare with each other because they collaborate frequently and decisions about what to do with the data make up a large part of collaboration. Considering that the TÜBİTAK ‘Open science policy’ is in effect and Aperta has been introduced for researchers, it has become imperative to establish a baseline for Turkish scholars’ RDM behaviours in order to understand the mid- and long-term effects of the policy and archive. Therefore, this study is designed to understand the RDM behaviours of the target audience of TÜBİTAK funding – active researchers who have received funding from TÜBİTAK – by addressing the following research questions:
What is the level of use, production and citing of research data for researchers in Turkey? Do these practices differ by academic title and field of study?
What are the most common data types and data formats used in research? Is there a difference between fields according to the data types or formats used?
What is the size of the data used in research? Is the field of study a determinant of the data size used?
Where is the research data stored? Does the environment for data storage differ by academic title and field of study?
Do researchers support open access to research data? What are their experiences in preparing a data management plan?
How do researchers view TÜBİTAK making open data and data management plans mandatory for funded projects? Does the approach of researchers differ by field and title?
What is the need for training on RDM? Does the need differ by title and field of study?
Methodology
TÜBİTAK ULAKBİM (the Turkish Academic Network and Information Center) sent a questionnaire to 37,223 researchers registered with ARBİS (TÜBİTAK’s researcher information system) to gather the preliminary information for Aperta, mainly on RDM-related issues (see Appendix 1). ARBİS, which has been designed and developed by TÜBİTAK, is an up-to-date database that holds information about researchers. The researchers who have registered with ARBİS are able to apply for TÜBİTAK scholarship and support programmes, or serve in the evaluation and monitoring phases of submitted proposals (ARBİS, 2019). ARBİS is one of the biggest researcher repositories in Turkey. Since Aperta is aimed at researchers who conduct or evaluate projects for TÜBİTAK, only scholars registered with ARBİS are included in the research. The questionnaire was distributed online by TÜBİTAK ULAKBİM through LimeSurvey and remained open for three weeks between 27 April and 17 May 2018.
The questionnaire, which consisted of 19 questions, was answered by 1577 researchers during this period. The researchers were asked questions, for example, about their research data usage and production; the types, formats and size of the data they used or produced; and the environment in which they stored their research data. The aim was to reveal trends and the behaviours and thoughts of the researchers, as well as to determine the knowledge levels and educational needs of the researchers.
The participants were asked three initial questions about whether they had used, produced and cited research data before. Since these three questions were not answered in 269 of the returned questionnaires, the analysis was based on 1308 returned questionnaires (1577 − 269), where the participants answered these initial questions.
According to Formula 1 and Formula 2 (Cochran, 1963: 75), a population of 37,223 can be represented by a sample of 996 for e = 0.04 and by a sample of 1736 for e = 0.03, both at the 99% confidence level (z = 2.56; p = 0.5; q = 0.5). Based on this information, it is possible to say that our sample of 1308 analysed questionnaires represents the population at a 99% confidence level with a precision of e = 0.04.
In Formula 1 and Formula 2, N is the population size; n₀ is the initial sample size; n is the corrected sample size; z is the Z-table score for the selected confidence level; p is the estimated proportion of the attribute in the population; q is 1 – p; and e is the desired level of precision.
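The sample-size calculation can be retraced with a short sketch. Formula 1 and Formula 2 are not reproduced in the text, so the code assumes the standard form of Cochran's formulas: an initial size n0 = z²pq/e², followed by the finite-population correction n = n0 / (1 + (n0 − 1)/N). The function name `cochran_sample_size` is hypothetical.

```python
def cochran_sample_size(N, z, p, e):
    """Return (n0, n): initial and finite-population-corrected sample sizes.

    Assumes the standard Cochran formulas; the paper's own Formula 1 and
    Formula 2 are not shown in the text, so this is an illustration only.
    """
    q = 1.0 - p
    n0 = (z ** 2) * p * q / (e ** 2)   # sample size for a very large population
    n = n0 / (1.0 + (n0 - 1.0) / N)    # correction for a population of size N
    return n0, n


if __name__ == "__main__":
    # Parameters stated in the text: N = 37,223; z = 2.56; p = 0.5
    for e in (0.04, 0.03):
        n0, n = cochran_sample_size(N=37223, z=2.56, p=0.5, e=e)
        print(f"e = {e}: n0 = {n0:.1f}, n = {n:.1f}")
```

With the stated parameters, this gives a corrected sample size of about 996.6 for e = 0.04 and about 1735.6 for e = 0.03.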
SPSS (version 21.0) was used for the analysis of the results, and Excel was used to generate the graphs. In addition to descriptive statistics, cross-tables and chi-squared tests were used to reveal the differences according to fields and titles. Interpretations were made based on percentages (row percentages) by field and title to prevent the results from being affected by frequency by field or title. In addition to the comparisons made based on the fields and titles, several comparisons were made to determine whether there was a difference between those who used/produced and did not use/produce research data according to their approach to open data and RDM.
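The row-percentage comparison and chi-squared test described above can be sketched in plain Python. This is a minimal illustration rather than the SPSS procedure that was actually run; the function names and the counts are hypothetical and are not taken from the survey.

```python
def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    return [[100.0 * cell / sum(row) for cell in row] for row in table]


def chi_squared(table):
    """Return Pearson's chi-squared statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts (NOT survey data): rows are two groups, columns are yes / no.
counts = [
    [80, 20],
    [60, 40],
]

percentages = row_percentages(counts)
statistic, dof = chi_squared(counts)
```

For these made-up counts the row percentages are 80/20 and 60/40, and the statistic is about 9.52 with 1 degree of freedom. Comparing row percentages rather than raw counts keeps a large group from dominating the interpretation simply because it has more members.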
Findings
Respondents’ general information
The distribution of the 1308 respondents covered in the analysis according to their field of study showed that 27% were in engineering, 24% in science, 21% in medical and health sciences, 18% in the social sciences and humanities, and 9% in agricultural sciences. Fifteen respondents did not specify their field of study. Figure 1 shows the distribution by title. The largest group – around 40% – were Professors, followed by Assistant Professors/Lecturers with a PhD (24%) and Associate Professors (22.5%). These three groups accounted for approximately 87% of the respondents, with the remaining 13% consisting of Lecturers without a PhD (3.4%), Research Assistants with a PhD (5.1%) and Research Assistants in the process of postgraduate education (2.8%).

Figure 1. Distribution of respondents by title (%).
RDM awareness levels of Turkish scholars
The research data usage rate of the participants was about 83%. Approximately 73% of the participants said that they had cited research data before, and the option ‘I did not know that data could be cited’ was not selected by anyone. Approximately 71% of the participants said that they produced their research data (Figure 2). The tendencies of the researchers to use, produce and cite research data were also examined based on the field of study and title of the researcher. While the percentage of researchers who stated that they used research data was higher than 77% for all five fields of study, the highest percentages were observed in the medical and health sciences (91%) and agricultural sciences (90%). A similar trend was observed in producing and citing research data. Medical and health sciences (83% and 76%, respectively) and agricultural sciences (76% and 83%, respectively) were the two fields of study that exhibited the highest rates of producing and citing research data. The social sciences and humanities had the lowest rates of producing (63%) and citing (68%) research data. The titles of the researchers were divided into four groups – (1) Professor; (2) Associate Professor; (3) Assistant Professor and Lecturer with or without a PhD; and (4) Research Assistant with or without a PhD and Other – and comparisons by title showed that there was no statistically significant difference by title in any of the three aspects of the researchers’ involvement with research data: using research data: χ²(3) = 6.121, p = 0.106; citing research data: χ²(3) = 7.706, p = 0.052; and producing research data: χ²(3) = 1.012, p = 0.798. The percentages of researchers who stated that they produced (70%–74%), used (79%–85%) and cited research data (68%–76%) were found to be close for all four title groups.
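The p-values reported alongside each chi-squared statistic can be checked from the statistic and its degrees of freedom alone. The sketch below is an illustration, not the SPSS computation: the function name `chi2_sf` is hypothetical, and it uses the closed-form upper-tail expressions that hold for integer degrees of freedom, with only the standard `math` module.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable.

    Valid for integer df >= 1; uses closed forms instead of a statistics library.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Statistics reported in the text for the comparison of title groups (df = 3)
p_using = chi2_sf(6.121, 3)
p_citing = chi2_sf(7.706, 3)
p_producing = chi2_sf(1.012, 3)
```

Evaluated this way, the three statistics give p-values of roughly 0.106, 0.052 and 0.798, in line with the reported figures.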

Figure 2. Use of xls and txt data formats by most commonly used data types (%).
Commonly used data types and formats
The most commonly produced or used data type was found to be experimental data (50%), indicated by one in every two participants. The next most commonly used or produced data type was text data (24%), followed closely by survey data (22.5%) and graphical data (22%). Each of these three data types was produced or used by approximately one in four participants (Table 1). In exploring how the data types used differed according to field, it was found that the most important difference was in the use of experimental data (χ²(4) = 139.577, p < 0.001). The main reason for this difference is that the use of experimental data was seen in 67% of researchers in the medical and health sciences and 54%–58% of researchers in engineering, science and agricultural sciences, whereas it was used by only 17.4% of researchers in the social sciences and humanities. Contrary to experimental data use, the highest rate of raw data use, although not of the same significance, was found in the social sciences and humanities (21%). The usage of survey data by field showed that the highest usage rate was in the social sciences and humanities (46.4%) and then medical and health sciences (37%), with the lowest level of use being in engineering (8.2%) and science (10.2%) (χ²(4) = 181.747, p < 0.001). The two most prominent fields in terms of the use of graphical data were science (25%) and engineering (30%). It is worth noting that data models were used more in engineering (13%) compared to other fields. The use of lab books, which were not used in the social sciences and humanities, was found to be higher in medical and health sciences (21%) and science (16.6%) than in the other two fields (χ²(4) = 70.901, p < 0.001). Audio recordings and video data, both of which had a low usage in general, were used more in the social sciences and humanities (15% and 17.7%, respectively) than in the other fields.
Table 1. Distribution of used or produced data types.
Note: Multiple options could be selected for this question.
According to 4% of the participants, the type of data they used was different from the options in Table 1, but only 10 of them specified the data type as abiotic data – that is, data from non-living physical and chemical elements in the ecosystem (five participants) or observational data (five participants). Another striking finding about the data types that were used or produced was that 36% of the participants responded that they did not use any of the 10 data types listed in Table 1 and did not specify any other data type in the ‘other’ option, which is a higher figure than for those who stated that they did not use (17%) or did not produce (29%) research data.
Based on Table 2, which shows the percentages of data formats produced or used, the two data formats that stood out were xls (49.5%) and txt (34.8%), which were used by approximately one in two and one in three participants, respectively. The social sciences and humanities had the lowest rate of use of xls (39.6%) and txt (25.5%) data formats. The use of csv was found to be higher in engineering (19.4%) and agricultural sciences (18.6%). The free-text data format was used at a higher rate in medical and health sciences (23.4%) compared to its use in other fields. The researchers’ use of the sav data format, which was about 8% in engineering, science and agricultural sciences, was 29% in medical and health sciences and 25% in the social sciences and humanities. This finding may be related to the fact that survey data is frequently used in these two fields. The most commonly used data formats for survey data were found to be xls (75%) and sav (50%). On examining how the data formats differ according to the other data types used, xls and txt stood out as being the most used data formats (Table 1 and Figure 2).
Table 2. Distribution of used or produced data formats.
Note: Multiple options could be selected for this question.
When the data written in the ‘Other’ option was examined, which was marked by 6% of the participants, it was found that 16 researchers (approximately 1%) seemingly considered word/doc as a type of data format. Moreover, although the data format of the statistical software SPSS is sav, two researchers specified SPSS as a data format in the ‘Other’ option. Additionally, pdf was specified as a data format. Another interesting point with regard to the ‘Other’ option was the picture/image and video formats (jpeg, tiff, png, mp4, etc.).
Average data size
It is thought that the 36% of the participants who did not mark any of the options related to the average size of the data they used or produced in their most recent studies did not have a clear idea of the size of their data. The percentage of researchers who did not indicate the size of the data they used or produced was highest in the social sciences and humanities (45%) and lowest in medical and health sciences (24%). This information should be evaluated together with the percentage of those who stated that they did not produce or use research data in their related fields. The percentages of participants who did not produce or use research data for the social sciences and humanities were 37% and 19%, respectively, which were higher compared to the rate in medical and health sciences (17% and 9%, respectively). The percentage of participants who produced or used data larger than 10 GB (gigabytes) was around 7%. In general, it can be said that the data used or produced was not ‘big data’. The size of the data used or produced by 39% of the participants was less than 1 GB. In the social sciences and humanities, 70% of the researchers replied that the amount of data they used or produced was less than 1 GB. The rate ranged from 58% to 62% for the other four fields (see Figure 3).

Figure 3. Distribution of the data used or produced by the participants in their latest study according to average data size.
Figure 4 presents the findings on where the participants stored the most recent data they had produced or used. The most preferred medium for storing data was local computers (61.5%), with the highest rate (73.4%) seen in medical and health sciences and the lowest in the social sciences and humanities (52.3%). The participants’ use of cloud storage was 7% and 9.6% for agricultural sciences and science, respectively. Although there was a ‘cloud’ option among the options, it is interesting that ‘Drive’ and ‘Dropbox’ were put (as other media in which data was stored) under the ‘Other’ option by some of the participants (5%). The storage medium specified by the vast majority of the participants who selected the ‘Other’ option was ‘external disk’. It can also be seen from Figure 4 that the use of institutional repositories or open access archives, for example, for data storage has not yet become widespread in Turkey. There were only 11 participants who stated that they used commercial databases such as the Data Citation Index for data storage. Although 73% of the respondents stated that they cited research data, they seemed hesitant to store their data in institutional repositories, open access archives, data archives and other storage options which they believed could facilitate or make it possible for other researchers to cite their data. The use of open access archives, institutional repositories and commercial databases for data storage was quite low in all areas.

Figure 4. Storage environments of the most recently used or produced data. Note: Multiple options could be selected for this question.
With regard to whether the data-storage environment differed according to the title of the participant, there was a statistically significant difference in terms of cloud usage at a 95% confidence level (χ²(3) = 32.114, p < 0.001). The difference was driven by the Professor group (7.4% cloud usage) and the Research Assistant group (25% cloud usage).
Data management plans
To help interpret the answers to the questions about data management plans, the participants were asked whether they had previously conducted a project, with about 76% of the 1205 respondents replying yes. Approximately half (454 participants) of the 913 participants who answered the question on whether they had previously conducted a TÜBİTAK project answered in the affirmative. To the question of whether a data management plan was mandatory (at least once) for the funding of (at least one of) their previous projects, 121 of the participants (approximately 9% of the participants and approximately 14% of the respondents to this question) answered affirmatively. This finding is supported by Ünal and Kurbanoğlu’s (2018: 299) study, which found that 13% of their participants had previously prepared data management plans, and 16% had data management plans in their current projects. Researchers in medical and health sciences exhibited the highest rate (14.2%), while those in the social sciences and humanities showed the lowest rate (6.4%) in terms of prior preparation of data management plans. It is thought that the 427 participants (approximately 33% of the participants) who did not answer this question had no idea about data management plans. Based on these findings, it would not be wrong to say that approximately 91% of the participants did not know what a data management plan is and/or had never prepared a data management plan before. Approximately half of the participants (49%) thought that open data and data management plans, which are mandatory in Horizon 2020 projects, should also be mandatory for TÜBİTAK projects. Only 7% of the participants expressed a negative opinion on this issue. It is thought that the 31% who did not express their opinion did not have any knowledge of the subject. Whether or not the researchers used research data did not affect their opinion (χ²(5) = 6.603, p = 0.252) on this issue.
Whether or not the researchers produced research data had at most a marginal effect (χ²(5) = 11.196, p = 0.048), with those who did not produce research data holding a more positive opinion about data management plans (77.3%) than those who produced research data (68.4%). The distribution of the 903 participants who gave their opinion on the necessity of mandatory data management plans for TÜBİTAK projects by field showed that the positive opinion ratio for each field ranged from 69.2% to 73.9%, and that there was no statistically significant difference between the fields (χ²(20) = 7.687, p = 0.994). Similarly, there was no statistically significant difference between titles (χ²(15) = 11.340, p = 0.728), although the Research Assistant group (79%) had the highest rate of positive opinions on making data management plans mandatory for TÜBİTAK.
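Percentages in this section are quoted against two different bases: all 1308 analysed participants, and only those who answered a given question. A small sketch makes the distinction concrete for the data management plan question. The variable and function names are hypothetical, and the count of respondents is derived here by subtracting the 427 non-answers from the 1308 participants.

```python
def share(count, base):
    """Percentage of `count` within `base`."""
    return 100.0 * count / base


participants = 1308                        # questionnaires included in the analysis
no_answer = 427                            # did not answer the data management plan question
respondents = participants - no_answer     # derived: answered the question
mandatory_plan = 121                       # answered that a plan had been mandatory at least once

pct_of_participants = share(mandatory_plan, participants)   # roughly 9%
pct_of_respondents = share(mandatory_plan, respondents)     # roughly 14%
pct_no_answer = share(no_answer, participants)              # roughly 33%
```

The same count therefore appears as about 9% or about 14% depending on the base, which is why each figure in the text names its denominator.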
Training needs
When asked for their opinions on funding institutions’ (TÜBİTAK’s) checking of the research data produced during projects, 40% of the participants offered their opinions, which were predominantly positive. The majority believed that the person responsible for the control and supervision of the data should be the researcher who had produced or was producing the data. It was also stated that TÜBİTAK should have control of research data under the supervision of experts, and it was clear that researchers had many reservations about data security and ethical violations.
The study also found that 67% of the participants and 90.5% of the respondents to the related question stated that they were willing to participate in data management training by TÜBİTAK and to use open access resources and portals created by TÜBİTAK. The existence of a group of 371 researchers who did not express either a positive or negative opinion on the subject is remarkable (approximately 28% of the participants). Although the difference in training needs according to title was only marginal (χ²(3) = 8.138, p = 0.043), it can be said that demand for training was lower for Associate Professors (90%) and Professors (88%) in comparison to the other groups. The willingness for training in the different fields ranged from 87% to 93% (χ²(4) = 3.311, p = 0.507). Moreover, those who used and/or produced research data and those who stated that they did not were equally eager for training. One possible explanation for this is the presence of participants who declared that they used or produced research data without knowing it.
The majority (59%) of the participants (76% of the respondents to the related question) thought that data produced by public sources is a public good and should be made available to the public as open data. The number of participants who expressed negative opinions in response to this question was very low (8.2% of the participants and 10.6% of the respondents to the related question). The opinions of research data users and non-users about open data were found to be similar (χ²(5) = 5.658, p = 0.341), but the question of producing data affected the opinions of the participants. An overwhelming majority of the researchers believed that data generated by public resources should be open to the public; this applied to both those who produced research data and those who did not (72.4% and 85%, respectively). The researchers’ opinions on this subject were found to differ according to field (χ²(20) = 38.287, p = 0.008). The reason for this difference is that the rate of supporting an open data approach was approximately 86% in the social sciences and humanities but between 71% and 76.5% in the other four fields of study. Although the participant’s title was not an important factor in their opinion about open data (χ²(15) = 11.187, p = 0.739), the rate of those who gave positive opinions was relatively higher in the Research Assistant group (80.4%). Examining the opinions expressed by the 121 participants showed that the most prominent opinion could be considered the basis of the philosophy of open access, which is that all scientific outputs produced by public resources belong to the public. It was stated that data-sharing would prevent the wastage of resources, would expand comprehensive research and could have a widespread impact.
The participants also expressed concerns about several issues, including the use of data to advance personal instead of public interests, data with a high degree of confidentiality, the anonymization of data, guaranteeing the control and governance of data, how to determine the conditions relating to the use of data, and protection of the rights of researchers.
Discussion
This study utilized survey data with the aim of collecting preliminary information for Aperta, the institutional repository of TÜBİTAK, which is being developed by TÜBİTAK ULAKBIM, and obtaining information about ARBIS-registered researchers’ levels of knowledge and the current situation with regard to research data, RDM, open data and related subjects.
RDM awareness was found to be high among the survey participants compared to earlier studies conducted in Turkey (Allard and Aydinoglu, 2012; Aydinoglu et al., 2017). It seems that two domains – medical and health sciences and agricultural sciences – led the efforts in utilizing, producing and citing research data. Although the social sciences showed the least effort among the five domains, researchers in the social sciences expressed the highest level of support for the argument that publicly funded research data should be made open. Furthermore, those scholars who did not generate research data were more in favour of open data policies. Taken together, these findings suggest that researchers who spend more time working with research data have more reservations and concerns about open data.
The most surprising finding of the study concerns the measure of data citation. Even though previous studies (Allard and Aydinoglu, 2012; Aydinoglu et al., 2017; Ünal and Kurbanoğlu, 2018) found that the notion and practice of data citation were not common among researchers, three out of every four respondents stated that they had previously engaged in data citation and knew about it. This can be interpreted as a positive sign that the efforts made by TÜBİTAK, the main funding agency in Turkey, to increase awareness of RDM and data-sharing have finally yielded some results. However, we also suspect that the respondents were referring to something else when they thought about ‘data’, such as a table or a graph in an article they had read.
As for data types, formats and size, experimental data, which was used by one out of every two participants, was used extensively in medical and health sciences, science, agricultural sciences and engineering. Not surprisingly, its usage in the social sciences and humanities was quite low compared to the other fields. Moreover, data models were used more in engineering, which may be a result of using big data. On the other hand, it was found that survey data was mostly used in the social sciences and humanities, and that the medical and health sciences had the second-highest usage rate. Aydinoglu et al. (2017: 276–277) found that the most commonly used data types were the same as for this study, but there were significant differences in the usage rate for data types other than experimental data (53%), text data (47%) and survey data (41%).
The sav data format, which is the main data format of SPSS and is widely used for analysing survey results, was used most in two fields: medical and health sciences, and social sciences and humanities. As the main subject of study in the social sciences and humanities is human beings, audio recordings and video data were used more in this field. In general, the xls and txt data formats are the most commonly used, but they have the lowest usage rate in the social sciences and humanities. These two data formats are the most commonly used data formats for each of the most used data types. In Aydinoglu et al. (2017: 278), the two most commonly used data formats have a similar rate of use. The third (free text at 30%) and fourth (sav at 27.4%) most commonly used data formats are also the same in both studies, but the findings of Aydinoglu et al. (2017: 278) show a higher usage rate. The usage of sav files may indicate a problem, as they are not recommended for archiving and publishing because they hinder the interoperability of research data.
Ünal and Kurbanoğlu (2018: 296) and Aydinoglu et al. (2017: 279) obtained generally similar results in terms of data-storage environments and researchers' data usage and production in terabytes. Both studies found a high use of participants' own devices for data storage (96% and 71.6%, respectively) and of cloud storage (39% and 46%, respectively). The cloud storage preference among early career researchers was double that of Professors, which corresponds to Aydinoglu et al.'s (2017) study. Similar to the results of this study, Ünal and Kurbanoğlu (2018: 296) found that the use of university institutional archives (9%) and external institutional archives (6%) for data-storage purposes was also very low. That being said, the Council of Higher Education (2019) has initiated an open academic archive system to increase the use of university institutional archives and indirectly help researchers adopt better data behaviours.
Although a direct question was asked about the need for training within the scope of the study, the answers given to some questions also showed the level of knowledge of the participants about the subject, and hence the need for training. In fact, previous studies have also identified the need for training in RDM (Aydinoglu et al., 2017; Ünal and Kurbanoğlu, 2018) and documented the demand from their participants for such training (Allard and Aydinoglu, 2012). Approximately 4 out of every 10 participants did not specify the data type and data format they used or produced, or the size of the data they used in their most recent study. It is natural for the participants who did not use or produce research data not to respond to these questions, but it is understood that a considerable number of the participants who produced and/or used research data also did not respond to these questions. For example, in the social sciences and humanities, although 19% indicated that they did not use research data and 37% indicated that they did not produce research data, almost one in two participants (45%) did not (could not) state the size of the data they used in their most recent study. Some of the participants who answered the question about the data type they used erroneously answered the data-format question with ‘Word’, ‘doc’, ‘SPSS’ or ‘pdf’, and the data-storage question with ‘Drive’ or ‘Dropbox’, for example, instead of choosing the ‘cloud’ option; they may also be added to the group that needs training. On the other hand, although some participants engaged in using or producing research data, they indicated their need or willingness to undergo training.
Not only technical solutions such as Aperta but also sociocultural issues have to be taken into consideration. Despite an attitude of willingness, there is a serious training need for the scientific community in Turkey. A great majority of the participants expressed their desire to attend training on RDM. Based on some of the responses, we suspect that knowledge of concepts such as data, metadata and interoperable data formats, for example, was not very clear in the minds of the participants. A small percentage of researchers in Turkey have been managing research data, but incentives and mechanisms could be designed to disseminate their best practices. The training needs are twofold. First, the information professionals in Turkey need training. There has been no study on how much they know about RDM (which needs to be investigated in a future study); however, information professionals in other countries need training to support the research community better (Tenopir et al., 2015a, 2017; Wittenberg et al., 2018), and this probably would be the case for Turkey as well. Second, the researchers themselves need training to take better care of their research data, as the participants of this study expressed, and there are plenty of RDM training experiences that can be drawn on for inspiration (Bishop et al., 2020; Leaders Activating Research Network, 2017; Sesartic and Dieudé, 2017).
The responses to questions related to data management plans revealed that researchers in Turkey have limited experience in preparing data management plans – even less than their international partners (Tenopir et al., 2015a). The results of this study and other similar studies (Aydinoglu et al., 2017: 280; Ünal and Kurbanoğlu, 2018: 299) demonstrate that researchers in Turkey do not have sufficient knowledge about data management plans, despite TÜBİTAK's initiative to make data management plans mandatory for TÜBİTAK-funded projects. Ünal and Kurbanoğlu (2018: 299) found that 73% of their participants did not know whether their institution had a data management plan, and that more than 50% were unaware of what a data management plan was. However, 84% of them stated that universities should have data management plans. Despite these results, almost half of the respondents were positive about data management plans being mandatory for TÜBİTAK-funded projects.
There was also a contradiction between what was expressed and what has been done. When statements (such as support for making publicly funded research data open, expressed by 6 out of every 10 respondents) were compared with the actual amount of research data that has been deposited or shared in organizational archives, open access archives and commercial repositories, huge discrepancies were found. These findings reveal the confusion in the scientific community on how to deal with RDM. There exist not only awareness and goodwill, but also trust issues and poor data habits.
Conclusion
Policies and tools designed to promote RDM and data-sharing should consider the issues discussed above. Incentives (or the lack thereof) affect how the scientific community adopts data-sharing but, unfortunately, there have been no incentives at all. However, the TÜBİTAK ‘Open science policy’ promises to change things at the funding-agency level. DergiPark, the online journal hosting system for academic journals in Turkey with approximately 2000 journals, can enforce data-sharing practices for the articles it publishes. TÜBİTAK can nudge DergiPark in this direction. Furthermore, TÜBİTAK can ask for data management plans in its grant applications and award additional points for the desired RDM behaviours. Along the same lines, the Council of Higher Education, the central body that governs the recruitment and promotion of academics in Turkey, can include data citation in its promotion scheme to encourage data-sharing practices. We expect that such incentives and encouragement would have a huge positive impact on research data handling and management.
As for the awareness and training needs of Turkish researchers, library and information science practitioners and scholars would appear to be the critical group. They understand the basic concepts of RDM but lack experience with real-life RDM and expertise in the domains the research data comes from. Train-the-trainer sessions would be the first step to address their lack of knowledge and experience. The second step would be linking library and information science professionals to research projects, where they could provide support to active researchers so that their RDM needs are addressed in real time. Simultaneously, RDM or data management plan training could be provided as a general course within the curriculum, such as statistics courses for final-year students or first-year graduate students – that is, those students who are more likely to deal with research data. Another option would be seminars or workshops targeted at active researchers; they could be discipline-specific, as different disciplines have different data habits. Information science professionals and domain scientists are needed to design such seminars and workshops. TÜBİTAK could support such activities, as it did with Aperta.
There is great potential in Turkey – it has a single funding agency (TÜBİTAK, which has influence over scientists), decent research output (Turkish-affiliated researchers produce more than 30,000 Web of Science-indexed articles annually), and awareness and goodwill (as shown by the results of the current study). Well-thought-out organization of these resources would not only benefit Turkish researchers by supporting RDM activities, but could also be an example for other countries.
As for the future, two studies are planned. First, a survey of DergiPark journals’ awareness of and attitudes towards RDM is being designed. DergiPark’s journals can support healthy RDM habits. Second, a detailed analysis of the training needs of scholars from different disciplines is needed, as different disciplines have different approaches to data.
Footnotes
Acknowledgment
We would like to thank the people working for the Scientific and Technological Research Council of Turkey (TÜBİTAK) and the Turkish Academic Network and Research Center (ULAKBIM) for their support; Filiz Mengüç and Ebru Aydın for their work on designing the survey; Murat Köreke for the distribution of the survey via LimeSurvey; and M. Mirat Satoğlu, director of ULAKBIM, for his support on matters relating to administrative processes.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
