Abstract
While there is a rich literature reporting the prevalence of data sharing in many academic disciplines, particularly STEM-related ones, the extent of data sharing in Social Science journals has so far been subject to little empirical enquiry. Focusing on one Social Science discipline, Education, this research examines empirically two related issues associated with data sharing.
First, journal data sharing policies were scrutinized via a search of the websites of 47 randomly selected Education journals. Over half of the journals in the representative sample had issued statements on their websites encouraging authors to make the data underlying published research generally available to the academic community, though only a handful of journals make such sharing mandatory. Thus, while the importance of data sharing is well recognized by journals in the Education field, a sizeable minority seems not to have taken a stand on this issue.
The second issue concerns the efficacy of the positive stance taken by journals in eliciting the desired response from authors, namely the sharing of their data. This was probed in a limited, mainly qualitative, survey of the authors of papers published in journals that encouraged data sharing through their websites. It was found that not a single author had made data available – indeed, some authors were even unaware of the journal’s policy on this matter. Thus, journals’ well-intentioned procedures to encourage greater data sharing are seen to be markedly ineffective. Two main sets of reasons were offered to justify authors’ reluctance to share data: either authors did not regard it as being in their interest, or data sharing was seen to be inappropriate or not possible for the data set in question. However, these fears relating to engaging in data sharing may not necessarily present insurmountable barriers to its wider adoption, as measures are available to circumvent, at least partially, or to ameliorate their effect.
Introduction
Data sharing may be defined as “making the data which underpins empirical research papers available to the researcher community” (Rushby, 2013). While there is a rich literature reporting the prevalence of data sharing in many academic disciplines, and particularly STEM-related ones, the extent of data sharing in Social Science journals has not been the subject of extensive empirical enquiry, hitherto. A contribution of the present research is to make an initial attempt to explore the extent to which data sharing has been adopted by journals in one particular Social Science discipline, that of Education.
The digital sharing of research data is increasingly recognized as an important research integrity norm. Data sharing may be promoted through different avenues. The requirement to data share, increasingly imposed by major research funding authorities, constitutes one case in point. However, the scholarly publication process itself can function as a powerful promotion and enforcement tool for data sharing: journal editors and publishers can strongly recommend sharing academic research data that support the results published in academic papers or mandate it as a condition for publication.
Background
Data sharing emerged in journals in the Biomedical Sciences in the 1980s and is now widespread in these fields. Among the Social Sciences, only in Economics is there a clear trend of journals adopting explicit data sharing policies, though initial efforts are also evident in Political Science and Sociology (Pienta et al., 2010; Zenk-Möltgen & Lepthien, 2014). A recent paper by Tal-Socher and Ziderman (2020) examined interdisciplinary differences in data sharing policies among journals drawn from 15 disciplines; this was accomplished through an inspection of journal websites for any declared policies on data sharing. The results reiterated the primacy of the Biomedical Sciences in the implementation of data sharing norms and showed a lagging implementation in the Arts and Humanities. In the three Social Science disciplines in the journal sample (Economics, Social Psychology and Political Science/International Relations), on average 47 percent of journals had adopted positive data sharing policies. Since it appears that no comparable information is available on the extent and type of data sharing in the Education field, a first objective of the current research is to begin to fill this lacuna. In the first phase of the research, a parallel exercise was performed, scrutinizing the websites of Education journals for information on data sharing policies.
The second phase of the research focused more narrowly on the authors themselves. A synoptic statement of the importance of archiving research data, summarizing the conventional viewpoint on this matter, stated that data sharing provides
“… an indispensable resource for the scientific community, making possible future replications and secondary analyses, in addition to the importance of verifying the dependability of published research findings” (Rushby, 2013).
This statement, however, addresses the benefits of data sharing from the general perspective of the scientific community as a whole. How is data sharing seen from the narrower viewpoint of the individual authors and researchers themselves? Do they regard this process as being in their interest or otherwise? Are there barriers – real or imagined – that militate against a willingness to share data? Would authors readily share data if not required to do so? A second objective of the research is to probe such data sharing issues from the standpoint of the authors themselves.
The paper is structured as follows. In Section 3, the results of a search of journal websites, exploring journal data sharing policies, are presented. Section 4 reports the findings of a follow-up internet survey addressed to authors. Section 5 discusses lessons to be learned from the research results and implications for practice. The paper ends with a short section on limitations, followed by conclusions.
Phase 1 – Journal data sharing policies
Method
The general methodology employed by Tal-Socher and Ziderman (2020) was adopted for this phase of the research. The websites of a sample of Education journals were examined for information on the journal’s policy with regard to data sharing. This is defined as the sharing of academic paper-related research data on a digital platform principally accessible to the general public or an entire research community and linkable to the paper. Two definitions of data sharing were adopted for this purpose: Enabling data sharing and Strong data sharing. ‘Enabling data sharing’ refers to policies where data sharing is possible or encouraged but is not mandatory. Neither the sharing of data with individuals upon private request nor sharing only for the sake of the review process is included in this definition. ‘Strong data sharing’ is in place where at least some types of data must be deposited for open sharing as a condition for publication.
In order to set up a representative sample of journals for the study, the SJR (SCImago Journal & Country Rank) prestige metric was chosen. The SJR website shows both the journal’s academic discipline and its academic prestige and status (Q rating). First, all journals for which Education was indicated as a principal or subsidiary field were selected: 1025 such journals were identified. Second, journals were sorted and listed according to four hierarchical Q rank categories. The highest Q ranking (Q1) contained 256 journals; 255 journals were in rank Q2, 252 journals in rank Q3 and 262 journals in rank Q4. Third, probabilistic sampling was employed whereby every 17th journal was selected from each of the Q rank categories. In selecting each sample journal, three criteria needed to be satisfied: (a) the journal published a minimum of 30 articles in 2018 (the normal distribution assumption); (b) the journal published empirical research papers (quantitative or qualitative articles); (c) the journal’s primary field was Education – in many cases Education was only a subsidiary field of interest. If a journal did not satisfy these three criteria, the next listed journal was selected. In the event, only two of the Q4 rated journals met the threshold requirements. Thus the final sample comprised 47 journals: 16, 15, 14 and 2 in categories Q1, Q2, Q3 and Q4, respectively (Fig. 1). The Appendix Table provides details of each of the 47 journals comprising the final sample.
Journal sample: Q rank categories.
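The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' actual procedure: the `Journal` record, its field names, the starting position and the handling of replacements are all assumptions introduced for the example, and the journal list is synthetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journal:
    # Hypothetical record; field names are illustrative, not taken from SJR.
    title: str
    articles_2018: int            # articles published in 2018
    publishes_empirical: bool     # quantitative or qualitative empirical papers
    education_is_primary: bool    # Education is the primary, not a subsidiary, field


def is_eligible(journal: Journal, min_articles: int = 30) -> bool:
    """Apply the three inclusion criteria (a)-(c) described in the text."""
    return (
        journal.articles_2018 >= min_articles
        and journal.publishes_empirical
        and journal.education_is_primary
    )


def systematic_sample(journals: list[Journal], step: int = 17, start: int = 0) -> list[Journal]:
    """Take every `step`-th journal; if it fails the criteria, use the next listed one."""
    chosen: list[Journal] = []
    used: set[int] = set()
    for position in range(start, len(journals), step):
        i = position
        # Skip forward past ineligible (or already chosen) journals.
        while i < len(journals) and (i in used or not is_eligible(journals[i])):
            i += 1
        if i < len(journals):
            used.add(i)
            chosen.append(journals[i])
    return chosen


# Synthetic Q1 list of 256 journals, all eligible except the one at index 17.
q1 = [Journal(f"Journal {n}", 40, True, True) for n in range(256)]
q1[17] = Journal("Journal 17", 12, True, True)
sample = systematic_sample(q1)
```

In this synthetic run the ineligible journal at index 17 is replaced by its successor in the list. When few journals in a category meet the criteria, as was the case for Q4, the number of selected journals can fall well below the number of sampling positions.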
During the months March–September 2019, the website of each sample journal was searched for text informing authors of the possibility of sharing the data on which the research presented in the submitted paper was based. Such texts are usually found in the Instructions for Authors section; sometimes they are incorporated into an Ethical Policy section or given a space of their own. If a data sharing policy was indicated, it was recorded and the nature of the guidelines noted – whether sharing is only encouraged (data sharing enabled) or required (strong data sharing). For each journal, the publisher (or specific publisher’s imprint) was also recorded. Data coding and tabular analysis were performed during October–December 2019.
Data sharing policies: Overall results.
Overall results are shown in Fig. 2.
While the majority of the 47 sample journals did publish a positive policy towards data sharing (26), a substantial number of journals did not express any stand in this regard (21). Typical positive data sharing statements are:
“Where relevant, the Journal encourages authors to share their research data in a suitable public repository subject to ethical considerations and where data is included, to add a data accessibility statement in their manuscript file.” “Authors are encouraged to deposit the dataset(s) in a recognized data repository that can mint a persistent digital identifier, preferably a digital object identifier (DOI) and recognizes a long-term preservation plan.”
Data sharing was a prerequisite for publication (strong data sharing) for only two of the 26 journals, both being in the Q1 category. The journals state this requirement as follows:
The Journal “……requires, as a condition for publication, that the data supporting the results in the paper will be archived in an appropriate public repository.” and “Authors may either upload a suitable data file ……that would enable readers to replicate their findings, or agree to share such data with other researchers upon request. Authors using proprietary data that are owned by another entity or restricted-use data that cannot be released to other researchers- are exempt from this data sharing requirement.”
These results are consistent with those reported in Tal-Socher and Ziderman (2020) relating to a group of social science disciplines (not including Education); about half of the journals reported positive policies relating to data sharing, while only a handful made this mandatory.
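The headline shares follow directly from the counts reported above (26 journals with a positive policy, 21 with none, 2 with mandatory sharing, out of 47). The short sketch below reproduces them and, as an illustrative addition that is not part of the original analysis, attaches a Wilson score interval to each share to indicate how much uncertainty a sample of 47 carries. The function name and the choice of interval are assumptions made for the example.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


total = 47  # journals in the final sample
counts = {
    "any positive policy": 26,
    "no declared policy": 21,
    "mandatory (strong) sharing": 2,
}

for label, k in counts.items():
    low, high = wilson_interval(k, total)
    print(f"{label}: {k}/{total} = {k / total:.1%} "
          f"(approx. 95% interval {low:.1%} to {high:.1%})")
```

The interval for the share of journals with a positive policy spans well over twenty percentage points, which is in keeping with treating the results as indicative rather than precise.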
Data sharing policies in Education journals
Source: journal websites.
Education journal publishers: Data sharing policies
*Scopus top publishers. Source: journal websites.
Data sharing policy according to Q ranking.
Detailed results by Q rank category are shown in Table 1. A declared (positive) policy in relation to data sharing is more prevalent in the two higher Q rank categories (Fig. 3).
The information derived from journal websites was also grouped according to journal publisher (Table 2). Two findings of interest emerge. First, positive data sharing policies are found, essentially, only among the major journal publishers (and notably among Scopus top publishers); such policies are absent among the small and independent journal publishers. Second, there appears to be a lack of consistency across the journals of some of the major publishers; some journals of a given publisher encourage or require data sharing, while others do not express a policy in this regard.
In sum, we note that while a small majority of journals in the sample did express a policy allowing or encouraging authors to share the data underlying their research, a substantial number of journals did not declare any policy on this issue. But how did authors respond to those journals that had adopted a positive stance towards data sharing, but did not require authors to do so? Did authors react positively by sharing their data? And if not, what were the reasons for failing to do so? These issues were probed in a short survey of the authors of papers published in the sample journals; this is reported in the following section.
Method
The follow-up study was addressed to a randomly selected sample of 45 authors of empirical articles which had appeared in journals identified in the website search as having a positive policy on data sharing. The research questionnaire was sent to each corresponding author via email in two rounds: an initial mailing in August 2022 and a reminder in September 2022. The questionnaire, attached to the email message, comprised two sections. The first related to background variables such as gender, age and academic status, followed by general information relating to the article’s data, as follows:
Was your article data-based? What was the data source? Was your article based on qualitative or quantitative research?
The second section addressed the specifics of data sharing practice:
Did the journal refer to the issue of making your data set available? Did you share your data set?
This was followed by an open-ended question, which was the central concern of the survey:
If you did not submit your data, why did you not do so?
Results
General findings
Thirty-one authors responded. The response was fairly well spread through the academic ranks: Full Professor, 11; Associate Professor, 2; Reader/Senior lecturer, 4; Assistant Professor, 3; Lecturer, 5; and the rest in other, miscellaneous job categories. There were twelve men and nineteen women in the sample, with an average age of 44.6 years (standard deviation of 10.6).
Two of the papers were not data-based; for the rest, 15 conducted research that was based on qualitative data methodologies, while 14 employed quantitative data. All of the journals, in which the authors had published, had issued a positive policy statement regarding data sharing. Thus a surprising finding is the large number of cases where authors state that the journals had not communicated this policy to them. Only 16 of the 29 authors were aware of the journal’s positive stand in this matter. In the event, none of the authors had shared the data set; however, there were two partial exceptions:
One author claimed that the data set was already available in the public domain:
“I did not submit my data set, but I make my outputs of Mplus available through Open Science Framework. In that way the data is also available.”
The second author did submit the data set, but only to facilitate the review process; it was therefore not made generally available:
“The editor requested raw data materials during the review proces; however the raw data were not included in publication.”
For the rest, authors adduced a variety of reasons to explain failure to share data, despite journals’ encouragement to do so. These qualitative responses, which constitute the core of the survey, are presented in the following section.
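The counts reported in the general findings can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text; the variable names are introduced for illustration, and the derived rates are simple ratios rather than results reported by the authors.

```python
invited = 45        # authors approached
respondents = 31    # authors who replied

# Respondents by academic rank, as listed in the text.
ranks = {
    "Full Professor": 11,
    "Associate Professor": 2,
    "Reader/Senior Lecturer": 4,
    "Assistant Professor": 3,
    "Lecturer": 5,
}
other_ranks = respondents - sum(ranks.values())  # "miscellaneous job categories"

not_data_based = 2
qualitative, quantitative = 15, 14
data_based = qualitative + quantitative

aware_of_policy = 16   # authors aware of the journal's positive stance
shared_data = 0        # authors who made their data generally available

# Internal consistency: data-based papers plus the two others equal all respondents.
assert data_based + not_data_based == respondents

response_rate = respondents / invited
awareness_rate = aware_of_policy / data_based

print(f"Response rate: {response_rate:.1%}")
print(f"Aware of journal policy: {awareness_rate:.1%} of data-based papers")
print(f"Respondents in other job categories: {other_ranks}")
```

The figures are internally consistent: the 29 data-based papers and the two that were not data-based account for all 31 respondents.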
Reasons for not sharing data
Reasons for failing to share data sets fell almost equally into two very different general categories.
The first is fairly straightforward and not unexpected. Many authors simply did not regard it as being in their interest to share their data with others. Some wished to reserve the data for their own and colleagues’ use, clearly fearing that their intended, continuing research might be undertaken by competing researchers. Others regarded the process as too time consuming or saw no personal benefit from the process. It will be argued below that while author resistance of this kind is understandable, it may not always be warranted.
Within the second category of reasons adduced for not sharing data, authors claimed that this was either inappropriate or not possible for the data set or the research project in question. This was the case where limitations on sharing data were imposed by a funding body or data source, as in the following case:
“I was prevented from publishing the data due to restrictions by the Chief Scientist at the Ministry of Education.”
Others quoted ethical reasons for non-compliance:
“We simply cannot share data without ethical approval. If my participants knew their data would be shared, they would be likely not to participate in the study.” “We have very strict ethics. We cannot share data to groups beyond those we list in ethics application. That would be very unethical.”
Thirdly, authors of qualitative research emphasized the impracticality, in many cases, of sharing qualitative data sets:
“Maybe it works with large scale anonymous surveys, but no one in the world can interpret my qualitative data without understanding the CONTEXT it was derived in. It would lead to very poor quality, inaccurate and unethical work as no one can understand how the data was collected, the nuances of the people and places involved and the knowledge generated as a result.” “I think quantitative data sharing makes an immense amount of sense ……The purpose is to be generalizable, so the implications can be very profound. However, for qualitative data, there are significant issues with confidentiality associated with sharing as interviews are entire stories filled with details relevant to the research question but also unique to the participant’s experience/identity. Because qualitative data are rich and thick, an external reader will not understand them in the same way that an immersed researcher will. Also, the results do not purport to be generalizable. The point is to give insight into a part of the human experience that readers of the study must then take and consider how it does or does not apply in their situation.” “ …the data included classroom video (not to be shared beyond the research team as per ethical clearance) and teacher recall interview transcripts (do not make much sense without the video).”
While the results of the survey are essentially qualitative in nature, they may be summarized in tabular form (Table 3).
Author’s reason for not sharing data
Source: author survey.
Our examination of journal data sharing policies via a search of journal websites has shown that the importance of data sharing is widely recognized by Education journals (and publishers). Thus over half of Education journals in our sample have issued statements on their websites encouraging authors to make the data underlying published research generally available to the academic community, though only a handful of journals make such sharing mandatory. The question arises: how effective is this positive stance of journals, in eliciting the desired response from authors, thus leading to the sharing of their research data?
This issue was probed in a limited survey of the authors of papers published in journals that encouraged data sharing through statements on their websites. It was found that not a single author had made data available – indeed, some authors were even unaware of the journal’s policy on this matter. We may conclude from these findings that journals’ well-intentioned procedures to encourage greater data sharing have been markedly ineffective.
Given the importance of fostering the adoption of data sharing on a broader scale within the Education sector and the limited success evidently achieved thus far, our results raise three questions that have major implications for policy and practice. First, how may Education journals communicate their affirmative stance on data sharing more effectively to authors, and encourage authors to respond positively? Second, what steps might be taken, on a broader front, to encourage authors to share data? Third, how may the many journals that have not declared a policy on data sharing be induced to adopt a positive stance in this area? These issues are now dealt with in turn.
On the first issue, journals attempting to promote data sharing do so mainly through notices to this effect on the journal website, frequently in the Notes to Authors section. Such procedures have been shown to be largely ineffective; indeed, many submitting authors remain unaware of the journal’s data sharing policy, presumably because they do not consult the journal’s website prior to article submission. Journals could convey their policies more effectively by communicating more directly with authors, either at a suitable stage in the peer review process or on final acceptance. However, some journals may wish to move beyond this and adopt a more pro-active stance, offering incentives to share data. For example, including the willingness to share data in the criteria for judging submissions offers a possible way forward. Other journals may feel ready to move even further, by making data sharing mandatory. This practice is fairly common in some disciplines, particularly in the Biomedical Sciences, but is infrequent in the Social Sciences. While some Education journals (such as the British Journal of Educational Technology) have already moved towards mandatory data sharing, its wider adoption in the discipline may be largely contingent on a greater acceptance of data sharing by the community of academic researchers, an issue now addressed.
Secondly, our survey pointed to strong author resistance to data sharing. When questioned on the reasons for failing to share data, authors offered two general sets of reasons: either it was not in their interest to share data, or data sharing was seen to be inappropriate or not possible. The desire to keep research data private is understandable in many situations, particularly when potential author benefits from publication are threatened or where reasonable ethical and practical arguments are present. Indeed, these concerns are echoed authoritatively in an editorial on data sharing in the journal of the Comparative and International Education Society. The editor, while recognizing that “data sharing has come to stay”, nevertheless expressed “mixed feelings” on engaging with the issue of data sharing (Nordtveit, 2018). He notes:
“Many scholars may wish to come back to their own data and further explore them for future publications before making them available to the public. Data sharing may also lead to various ethical and moral dilemmas and additionally put impractical new demands on authors in the preparation of their research data for publication.”
However, while author resistance to data sharing is understandable, there are possible benefits stemming from data sharing, notably in the form of subsequent career advantages. Author resistance is often based on a lack of knowledge of the potential benefits that can accrue at the individual researcher level (see Popkin, 2019). As noted by Logan et al. (2021), a number of studies have shown data sharing to be associated with an increased citation rate (e.g. Drachen et al., 2016; Piwowar & Vision, 2013; Colavizza et al., 2020). Another benefit of data sharing may lie in due weight being accorded to it in research funding applications and academic promotion procedures. In recent years, acting as a peer reviewer for academic journals has been recognized, alongside research publications, as a valid element in a researcher’s CV. It is probable that, increasingly over time, similar status will be accorded to the creation and sharing of data sets deposited in approved repositories. There is the justifiable fear of being scooped if other researchers can utilize data sets in a data repository for research projects that are on the author’s future research agenda; however, it is possible in many cases to place a time limit (say, of 1–3 years) on the use by others of posted data sets. There remain practical issues of data set preparation prior to sharing and uploading to data repositories, for which many researchers are both technically and psychologically ill prepared. University Education departments could include data sharing technicalities and guidance in standard doctoral courses on the preparation of papers for journal publication. In this connection, the recent paper by Logan et al. (2021) provides a valuable step-by-step guide to data sharing preparation in the Education sciences.
Many survey respondents presented practical arguments against data sharing, claiming that it was not appropriate or possible for the data set in question, referring to ethical considerations or the qualitative nature of the data set. In our author sample, half of the papers in question had employed qualitative methods, reflecting no doubt an ongoing trend in recent years for qualitative-based data sets to constitute an increasing feature of educational research. While this raises challenges for data sharing, there is now a sizeable literature offering guidelines and hints on both how to generate and to report qualitative data research in ways consistent with ethical standards (see particularly Tsai et al., 2016; also Antes et al., 2018; Mannheimer et al., 2019). Ethical issues may also be present in the sharing of quantitative data sets, notably relating to the identity of survey respondents, when the full data set is made generally available. Meyer (2018) offers practical tips on preserving ethical standards when sharing quantitative data, including procedures to anonymize sensitive personal information.
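To make the idea of de-identifying a quantitative data set before sharing more concrete, the sketch below shows three common steps: dropping direct identifiers, replacing respondent IDs with keyed pseudonyms, and coarsening age into bands. It is a generic illustration under stated assumptions and is not drawn from Meyer (2018); the column names, the sample record and the key are all hypothetical. Strictly speaking this is pseudonymization rather than full anonymization, since combinations of the remaining variables may still identify individuals.

```python
import hashlib
import hmac

# Hypothetical column names for fields that identify a respondent directly.
DIRECT_IDENTIFIERS = {"name", "email", "school"}


def pseudonym(respondent_id: str, secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256): IDs stay linkable across files but cannot be
    recomputed by anyone who does not hold the key."""
    digest = hmac.new(secret_key, respondent_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record that is safer to deposit in a repository."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["respondent_id"] = pseudonym(str(record["respondent_id"]), secret_key)
    cleaned["age"] = age_band(int(record["age"]))
    return cleaned


# Hypothetical survey record.
record = {
    "respondent_id": "R-017",
    "name": "A. Teacher",
    "email": "a.teacher@example.org",
    "school": "Example Primary",
    "age": 44,
    "score": 3.5,
}
shared = deidentify(record, secret_key=b"keep-this-key-out-of-the-repository")
```

The key must be stored separately from the deposited data. Whether the remaining variables are safe to release still has to be judged against the ethical approval and consent terms of the study in question.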
In conclusion, it is recognized that author resistance, a multi-faceted and sensitive issue, is understandable and, in many situations, appropriate. Yet fears relating to engaging in data sharing may not always justify failure to do so; we have noted that a number of measures are available to circumvent, at least partially, or to ameliorate many of the perceived barriers to sharing data.
And thirdly, as noted above, very many Education journals have not declared a policy on data sharing, indicating indifference or even, perhaps, disapproval. The result is that there is no incentive or encouragement for researchers to share the data underlying the empirical papers that they submit to these journals.
There may be a crucial role to be played here by the small number of major journal publishers, under whose imprint most Education journals are published. There is a general impression that, the declared stand taken by individual publishers notwithstanding, a fairly free hand is extended to their journal editors in deciding on the appropriateness or level of data access in the journal in question. However, these journal publishers are well positioned to open a dialogue with these editors on the desirability of data sharing. One persuasive argument that could be presented in this regard stems from the increasing requirement, emanating from major research funding organizations, that the data generated by the research they support be made generally available; journal editors would be interested in securing such submissions, which are likely to be of high quality. An example is provided by new guidelines for research grant applicants issued by the Education and Human Resources Directorate of the National Science Foundation (NSF), which include the requirement that: “unless otherwise restricted by policy or regulation, access to data and products should be provided, and data and the products of research shared, as soon as is reasonably possible”.
In a similar vein, Education professional/research associations can make a contribution through such activities as promoting special sessions on data sharing at international conferences and at doctoral workshops, as well as requiring data sharing in journals within their purview. Notably, the American Educational Research Association (AERA) has been a leader in this regard. The Association has issued guidelines for the reporting of empirical research (including data sharing), has engaged in workshop organization (in collaboration with the NSF), and has mandated data access in its journals (such as AERA Open).
Limitations
The samples employed in the research are admittedly small; thus, the results presented in this paper, though highly informative, should be seen as indicative rather than fully representative. However, future research employing larger samples and in other social science disciplines may be expected to substantiate the findings reported in this paper.
The time gap between the data collection phase and eventual publication is substantial; this results from delays in data analysis and in writing up the research, due to academic pressures arising during that period in the wake of Covid-19.
The present research has been limited to a consideration of data sharing that is encouraged or required by academic journals. We note that data sharing may also be promoted through other avenues, such as by major research funding authorities. Beyond these formal avenues, a recent qualitative research study based on five disciplines (albeit not including Education) has suggested that “far more data-sharing is occurring in scientific practice than seems to be apparent from a concept of open data alone” (Barlosius, 2023). Such data sharing takes the form of voluntary peer-to-peer cooperative arrangements, in which researchers make their data accessible to other researchers. While, in practice, there may be considerable “hidden” data-sharing of this type, its extent seems to have remained unresearched hitherto.
Conclusions
While the importance and practice of data sharing are now well established in many academic disciplines, the present research has confirmed that, in common with most other social science fields, data sharing is rare in practice in the empirical research published in Education journals. While some encouraging initiatives are in evidence in a number of journals, these remain exceptional. There is still a long way to go before the Education academic community can benefit from the broader availability of shared research data. This paper has offered some steps that may be taken to come closer to this desirable outcome.
