Abstract
Over the past twenty years the normative framework that underpins social science research has undergone major shifts. Among the most salient changes is the growing incentive to archive, share and reuse research data. Today, many governments, funding agencies, research infrastructures and editors are pushing what is commonly known as Open Research Data (ORD). By reflecting on concrete experiences of data sharing, the contributions to this issue point to the ethical challenges posed by this new trend. Through a detailed objectivation of archiving work, they call for distancing ourselves from the bureaucratic framework imposed by the new ethics and ORD policies, and for thinking of data sharing as a situated, contextual and dynamic process. The cost of the exercise, as well as the sensitivity of certain data and subjects, argues for flexible approaches that leave researchers a degree of autonomy and freedom of judgment.
Introduction
Over the past twenty years the normative framework that underpins social science research has undergone major shifts. Among the most salient changes is the growing incentive to archive, share and reuse research data. The 1990s saw the emergence of an important international movement promoting the extension of open access principles, previously reserved for scientific publications, to the empirical materials that underlie them (Mauthner and Parry, 2013). Today, a growing number of governments, funding agencies, research infrastructures and editors are pushing what is commonly known as Open Research Data (ORD) (Pasquetto et al., 2017). Researchers applying for public funding are commonly required to develop a Data Management Plan (DMP) in which they must describe the life cycle of their research materials, from production to dissemination. Far from being a mere administrative formality, this new requirement from funders constitutes a highly performative tool which explicitly aims to make data findable, accessible, interoperable and re-usable, in line with the so-called FAIR principles. 1 At the European level, all projects funded by the European Research Council (ERC) under the Horizon 2020 program participate by default in an ORD pilot, which aims ‘to improve and maximize access to and re-use of research data’ 2 and which is ‘monitored (…) with a view to further developing the Commission’s policy on open science’. 3
The enthusiasm of research governance institutions for ORD has been instrumental in giving it the appearance of a ‘common cause’ with clear principles and a coherent rationale (Chartron and Schöpfel, 2017). However, ORD constitutes a ‘boundary object’ (Flichy, 2003) whose definition varies considerably from one context to another and which, since its beginnings, has been the object of hegemonic struggles (Moore, 2017). Initiated by a network of activist researchers and librarians, ORD was initially animated by a libertarian spirit that enshrined ‘openness and collaboration as doctrines for the elaboration of goods and knowledge’ (Ibekwe-Sanjuan et al., 2015: 18). The idea was then as follows: ‘Research data is a public good, produced in the public interest, and should be openly available to the maximum extent possible’ (Pryor, 2012: 47). The ways in which this universalist credo was applied were, however, largely voluntary and collaborative, the underlying motivations being ‘exchange and visibility decided by the researchers themselves’ (Chartron, 2018: 183). While this ‘pioneering’ vision prevailed for a long time, it has been progressively challenged by a much more managerial discourse, widely endorsed by institutional actors and pushed through top-down policies (Chartron, 2016). Over the last few decades, the ideal of science as a universal common good has been marginalized in favor of approaches framed in terms of efficiency – scientific and budgetary – and promoted by research managers, publishers, information professionals and politicians through vertical incentives such as ‘conditionality’: in some countries, public funders make the final installment of funding conditional on data deposit (Scot, 2006).
As Chartron sums up: ‘In the evolution of the open access/open science movement, a great difference is evident between the freedom to open up one’s work in the 1990s and the current political injunction to open up everything, which profoundly affects the researcher’s autonomy of decision’ (Chartron, 2018: 183). By shifting control over data dissemination away from researchers towards research governance institutions, ORD policies have indeed significantly eroded the sovereignty of scientists (Mauthner and Parry, 2013: 61). What started as a community-driven project seems to have turned into a global binding policy driven by ‘institutional interests, commercial benefits and neoliberal ideology’ (Schöpfel, 2015: 322) and implemented through coercive instruments.
The shift from a bottom-up grassroots movement to an institutional injunction has not been without opposition, particularly from researchers working with qualitative methodologies. The British case is emblematic in this respect. The creation of the pioneering data bank Qualidata in 1995, and its influence on the decision of the Economic and Social Research Council (ESRC) to make archiving of the data produced with its funding compulsory, met with considerable resistance from ‘recalcitrant researchers’ who were ‘wary of the implications of depositing data, and the possibilities of reusing data’ (Moore, 2007: 1). These ‘resisters’ did not hesitate to express their dissatisfaction publicly, sparking lively controversies in the literature. In fact, supporters and critics of Qualidata have for years engaged in a (mostly) theoretical duel, through successive publications, over the very possibility and relevance of reusing qualitative data, the excessive decontextualization of qualitative materials and the potentially harmful standardization of research practices (Duchesne and Noûs, 2019). At the international level, the general introduction of binding data-sharing policies has had the same kind of polarizing effect, although the terms of the debate differ from one country to another – in the United States, for example, the issue of transparency occupies a more important place than that of data reuse. This has resulted not only in a large increase in the literature on ORD, but also in its structuring around a fault line between advocacy and critical stances. As Duchesne points out, while the initial scientific opposition between supporters and opponents of secondary analysis has tended to fade, opposition to the potential standardization of scientific activity is only growing (Duchesne, 2017: 9).
Introducing the Debate
Since the early 2000s, a growing number of researchers have set out to demonstrate the scientific, historical and economic benefits of ORD (Bishop, 2005; Corti, 2006; Corti et al., 1995; Corti and Thompson, 1998; Corti et al., 2005; Cragin et al., 2010; Davenport and Patil, 2012; Duchesne and Garcia, 2014; Fielding, 2004; Wallis et al., 2013; Wicherts and Bakker, 2012). From a scientific point of view, data sharing is credited with four main benefits. First, it is said to increase the robustness and ‘generalizability’ of research results by broadening the available empirical basis and increasing the number of case studies (Duchesne and Garcia, 2014). An oft-cited benefit of data sharing is, indeed, ‘the possibility of applying questioning to a larger population than would have been possible with a single (…) survey’ (Le Roux and Vidal, 2000: 63). Second, by allowing the (re)use of data produced in other contexts, ORD is thought to foster ‘neutrality’ and ‘epistemological distance’ towards research materials, thus leading to more ‘objective’ analysis and therefore to ‘better science’ (Bornat, 2005; Bishop, 2009; Hammersley, 2010). More precisely, several authors argue that ‘whilst secondary users lack immediate knowledge of the research settings, there is no prima facie reason why the primary researcher has a uniquely privileged awareness of the situatedness of the research endeavour (…). It may be that distance itself sheds analytic or critical light’ (Irwin, 2013: 298). Third, ORD is expected to stimulate innovation by facilitating the reanalysis of data from new perspectives and with new methodologies (Mauthner and Parry, 2013). The use of computer-assisted qualitative data analysis software, for example, is often considered an innovative way of approaching old datasets from new angles (Rioufreyt, 2019).
Fourth, ORD is expected to increase transparency by opening the ‘black box’ of data analysis, thereby facilitating verification of the scientific validity of research through replication or revisits (Kaye et al., 2018; Pisani and AbouZahr, 2010; Royal Society, 2012; Schmidt et al., 2016; Watson, 2015). Hammersley sums up this position well: ‘One is rarely, if ever, in a position to provide all the recorded data relevant to a particular point, and there are few parallels in qualitative research to the data reduction techniques available in quantitative work that facilitate the summarising of large amounts of data (…). It is in this context that qualitative data archiving could play an important role. In principle, it allows those for whom the evidence presented in a research report is insufficient to gain access to whatever further data they require’ (Hammersley, 1997: 133–134).
From a historical point of view, ORD is seen as having two major advantages. First, it is said to contribute to the history of science by enabling the archiving of, and future access to, the ‘kitchens’ of major research projects whose materials would otherwise be permanently destroyed (Mouton, 2008). Access to these materials can indeed help us understand how researchers of the past positioned themselves ‘in relation to theoretical, epistemological, methodological and substantive issues of the time of the research’ (Mauthner et al., 1998: 743). This interest is all the greater since little is known about the conditions under which many classic works in the social sciences were produced. For example, surprisingly little is known about how great sociologists of organizations such as Michel Crozier, whose work is today considered essential, gained access to their fieldwork (Bourrier, 2013). The creation of Qualidata responded in part to this desire to preserve research material threatened by imminent destruction, as Corti and Backhouse point out: ‘There was an urgency concerning the acquisition of material from earlier social studies as it was discovered that many important datasets were already lost’ (Corti and Backhouse, 2005: 2). A national ‘data hunt’ was even launched in the early 2000s among prestigious British scholars in the hope of saving the original collections of social science research archives dating back to the 1950s and 1960s (Scot, 2006: 54). Second, ORD is said to contribute more broadly to social history by providing access to ‘highly descriptive information about the (…) historical attributes, attitudes and behaviour of individuals, societies, groups or organizations’ (Corti and Thompson, 1998: 88).
In the United States, institutions such as the National Anthropological Archives have both a scientific and a heritage mission, as the materials they hold are accessible not only to researchers but also to the descendants of the communities from which they were gathered (Schmid, 2008). In this regard, it is interesting that the early supporters of qualitative data archiving – at least in the UK – were mainly (oral) historians, who were keen to ensure that the ‘voices of the past’ (Thompson, 1978), particularly those of ordinary people, could continue to speak.
Finally, from an economic perspective, open data is expected to optimize public research funding by enabling more extensive use of existing materials, thus limiting the ‘unnecessary’ production of new data. Grounded in neoliberal notions of performance, quality and efficiency, this vision rests on the principle that ‘those who spend taxpayers’ money should be accountable to the public’ (Shore and Wright, 1999: 557). To quote Arzberger et al., ORD aims to ‘ensure that both researchers and the public receive optimum returns on the public investments in research’ (2004: 135). In this regard, it is interesting to note that major archives such as Qualidata emerged precisely ‘at a time of radical change in higher education (…), a time of a turn to neo-managerialism and audit culture, and an emphasis on cost-effectiveness and value for money’ (Moore, 2007: 9).
Generally speaking, pro-ORD literature rests – more or less explicitly – on ‘positivist’ postulates (Feldman and Shaw, 2019). Data tends to be seen as ‘objective’ and ‘factual’ observations that can be easily communicated, re-used and replicated. As Mauthner and Parry state, ‘open access data sharing policies embody an instrumental view of data in which data are seen as free-floating public commodities, openly available to anyone wishing to tap into their inherent meanings’ (Mauthner and Parry, 2013: 62). This vision seems to be largely dominant within major data archiving infrastructures, which owe much to the legacy of their founders (Scot, 2006). This excerpt from an interview with Paul Thompson, Qualidata’s co-founder and first director, is particularly illuminating in this regard: ‘I came out of history where there’s no subjective tradition. I then moved into sociology where, again, at that time, there was no post-modern interest in subjectivity. It was a tremendously strong tradition of doing social research to establish facts, and I was trying to relate to that. I don’t think, for the time, it was particularly positivistic, there just wasn’t very much of an alternative vision’ (Thompson, 2019: 30). From this perspective, data sharing is considered a mainly technical issue that can be addressed through adequate data management tools (Sansone and Rocca-Serra, 2012; Schumacher and VandeCreek, 2015) and appropriate policies (Charbonneau, 2013; Davenport and Patil, 2012; Guedon, 2015). Researchers themselves are regarded as ‘interchangeable data collectors who perform the technical task of amassing information’ (Mauthner and Parry, 2013: 60). As Borgman sums up: ‘much of the scholarship on data practices attempts to understand the socio-technical barriers to sharing, with goals to design infrastructures, policies, and cultural interventions that will overcome these barriers’. 4
Since its inception, the ‘advocacy’ literature has been constantly challenged by researchers who criticize its excessive ‘universalism’. For several authors, data sharing needs to be addressed case by case, in a contextualized and relational way, rather than through standard policies and procedures (Moore, 2007). Broadly speaking, three major criticisms have been leveled at ORD as it is implemented today. First, a number of scholars express serious doubts about the very possibility of reusing data, arguing that while some material – e.g. survey data – can easily be shared and reanalyzed, other data – such as field observations – suffer from excessive decontextualization when archived and disseminated (Duchesne, 2017). Many researchers using ethnographic methodologies to conduct in-depth case studies (Béliard and Eideliman, 2008) are reluctant to share data, if not openly hostile to it, since reflexivity, which is considered to be ‘essential in order to reuse data and therefore reusing qualitative data’, cannot, in their own words, ‘be archived’ (Moore, 2007). For them, ‘without having experienced the investigative situation, it is not possible to understand and give meaning to the material’ (Huyghe et al., 2018: 9). More generally, several researchers express concerns ‘about the limits of using data without full access to the original conditions of the creation of the data; the irreproductibility of the original interview, the face-to-face encounter with an interviewee; the insufficiency of a transcript against the ethnographic moment of the interview; and the impossibility of ever adequately archiving the context of data’ (Geiger et al., 2010: 8). This vision is often based on an understanding of the researcher him/herself as a medium that internalizes – and even ‘incorporates’ – the contextual and substantive information needed to understand his/her object of study.
From this perspective, materials such as fieldnotes are not ‘data’ but tools whose function is purely mnemonic, namely to stimulate the researcher’s memory: ‘A newspaper clipping can be interpreted. The clipping has more validity of its own, but it can be a fieldnote if it needs to be read by me…It’s what I remember: the notes mediate the memory and the interaction’ (Jackson, 1990: 20). Some authors even speak of ‘headnotes’ to describe all the crucial information that researchers use without ever putting it into words (Ottenberg, 1990). In summary, some of the criticisms of ORD denounce the implementation of archiving policies without serious reflection on the ‘value’ and real usefulness of materials (Chartron, 2018). This position is reinforced by the low rates of reuse of data made available by major qualitative archives (Duchesne and Noûs, 2019) and by the growing number of accounts from researchers ‘who have embarked on secondary analysis only to find it fraught with more difficulties than perhaps were anticipated’ (Moore, 2007: 2).
The second set of criticisms of ORD comes from authors who stress the ethical issues raised by data sharing, such as the difficulty of obtaining informed consent in certain situations, the sensitivity of certain issues and data, the vulnerability of certain populations, and the particularity of some approaches that rest on a degree of confidentiality (Anderson and Schonfeld, 2009; Both and Garcia, 2014; Bull et al., 2015; Cooper, 2007; Feldman and Shaw, 2019; Haddow et al., 2011; Harding et al., 2013; Kostkova, 2018; Kowalczyk and Shankar, 2011; Mbuagbaw et al., 2017; Mennes et al., 2013; Prost and Schöpfel, 2015; Sheather, 2009; Takashima et al., 2018). Researchers who express such concerns generally base their work on a contextual, situational and relational ethics that is oriented less towards the general public – as in the libertarian universalist approach of ORD – than towards the participants in research projects. As Mauthner and Parry point out, ‘researchers commonly uphold a moral imperative to honour relationships of trust they have developed with respondents who have entrusted them with personal information and confidences (…). They may use the data for the benefit of science and society, and to further their own careers, but many will do so only in a context where they feel they can safeguard respondents’ moral interests. This is why many researchers want to retain some personal control over their data, and over respondent protection. Open access policies, and their privileging of universal others, can be experienced as a violation of these personal and specific trust-based relationships and moral responsibilities’ (Mauthner and Parry, 2013: 59–60). These concerns are long-standing among American anthropologists, as this interview excerpt from a well-known article by Jackson illustrates: ‘The people being observed forget you’re there. There is something unethical about that: they go on about their business, and you’re still observing. So to have fieldnotes that reflect your direct observations become public property is (…) a betrayal of trust’ (Jackson, 1990: 22). In France, anthropologists such as Florence Weber have expressed similar views: ‘the ethnographer considers that he has a personal contract of inquiry with his respondents and that this contract does not include the question of reuse. Therefore, we emphasize confidentiality, anonymity, the fact that it will not come out of our hands and that no one else will be able to use it. This is the other side of the coin: the ethnographic investigative relationship is not, or at least not always, a public one; we must protect our respondents from the inquisitive gaze, this is the condition under which we can obtain something other than an official truth’ (Müller, 2006: 107). To address such problems, most data archives recommend, if not require, the anonymization of materials. In many data protection laws and regulations, anonymity is even ‘taken-for-granted as an ethical necessity’ (Moore, 2012: 331). That said, several authors point out that research data – especially qualitative data – is not only difficult to anonymize, but that the process of de-identification can also compromise its integrity, and thus its scientific value (Parry and Mauthner, 2004). More precisely, many researchers argue that ‘one effect of anonymisation is a process of abstraction, tacitly acknowledged in the concern that anonymised data have lost their situatedness or context, and this may prevent their (re)use’ (Moore, 2012: 336). Some authors also point to the potentially ‘unethical’ nature of anonymization, or at least to its role in reinforcing power asymmetries between participants and researchers, the latter always being named and therefore never ‘forgotten’ (Moore, 2012). The reluctance to share data for ethical reasons constitutes a major obstacle to ORD. A survey conducted by the Swiss Centre of Expertise in the Social Sciences (FORS) in 2016 on the attitudes and practices of Swiss social scientists towards data sharing showed, for example, that 42% of researchers feel that their data is ‘too sensitive to share’ (Heers et al., 2017).
The third major type of criticism of ORD is more political in nature. More specifically, several authors object to an injunction to share data that would not only be unsuitable for certain approaches – notably ethnographic ones – but would also explicitly aim to control and standardize scientific practices according to positivist criteria. This quote from Laferté perfectly encapsulates this position: ‘The archiving of the social sciences is (…) a positivist approach that (…) appears to be an extension of a scientificist will (…) that has been running for two decades to ‘better fix’ the practices of ethnography’ (Laferté, 2006: 33). From this perspective, the data quality control carried out by large archives has been widely criticized for its potentially negative performative effects, i.e. the uniformization of data practices around positivist principles. Indeed, data infrastructures are often given the mission to ‘verify’ and ‘validate’ deposited materials and are therefore free to reject them if they do not meet their ‘scientistic’ standards, such as sufficiently representative samples – a logic that often makes no sense to qualitative researchers (Hammersley, 1997). As Scot states, in data archives ‘the construction and presentation of raw data takes the place of proof, experimentation or validation in the laboratory’ (Scot, 2006: 58). Critics of this potential standardization of research practices point in particular to its possible effects on creativity and innovation, since risk-taking could give way to the mere reproduction of ‘ways of doing things validated by the community’ (Duchesne, 2017: 12). Questions have been raised, for example, about the potentially ‘sclerosing’ effects of writing field notes in the knowledge that they will be made public: ‘Theoretical and methodological notes, not to mention diaries and personal logs, are currently written by researchers for themselves. What would be the implications of these being written with a wider audience in mind? Are we in danger of finding researchers producing documentation that bears only a remote relationship to how the research was done, in the way that bureaucracy sometimes produce official records that have little direct relation to how they actually operate?’ (Hammersley, 1997: 136). For these different reasons, some authors, such as Samuel A. Moore, argue for ‘messiness’ and ‘diversity’ in ORD implementation, notably so as not to discourage researchers with bureaucratic and exclusionary procedures. As Moore puts it: ‘Such messiness, if promoted as valuable in itself, would provide a space for diversity and the more marginalised voices and elements of academic research to be heard. It would also allow open access to not be so easily captured by dominant and/or neoliberal approaches’ (Moore, 2017: 12). He concludes: ‘The important thing here is for funders, institutions and governments to back away from implementing restrictive mandates and instead facilitate experimentation governed by communities themselves’ (2017: 13).
The advocacy and critical literature on research data sharing has long remained essentially ‘political’. Indeed, ORD advocates have generally proved to be close allies of institutional repositories – such as Qualidata – that sought to promote a new culture of sharing, while opponents were mainly researchers concerned about the potentially negative consequences of the new top-down policies for their careers and positions. Although the debate took on the appearance of a purely scientific controversy – because it was mainly conducted in the academic literature – it was nevertheless underpinned by very material concerns related to the gain, preservation or loss of legitimacy in the field of knowledge production. Like any policy, ORD generates ‘winners’ and ‘losers’: by carrying a particular vision of science – in this case a positivist one – it necessarily discredits others, in particular the more ‘constructivist’ ones, whose proponents are therefore forced to carry out a ‘reverse scientific update’ (Laferté, 2006: 32). Determined to promote a new culture of sharing, the Qualidata team has been very proactive in framing the debate through ‘scientific and institutional animation’ around the reuse of qualitative data, which resulted in the organization of workshops, the publication of special issues and support for the creation of qualitative databases in different countries (Duchesne, 2017: 9). As a consequence, ‘much of the debate (…) has focused on the possibility, feasibility and desirability of creating and using qualitative research archives in the social sciences’ (Mauthner and Parry, 2009: 294) and has therefore remained at a very high level of abstraction. This was reinforced by the fact that most authors had no concrete experience of archiving or reuse and therefore adopted a rather ‘hypothetical’ tone. The discussions have thus remained mainly theoretical and have tended to neglect more ‘material’ considerations, such as career strategies, and psychological ones, such as emotional attachment to materials, which have hitherto remained largely implicit or even ‘taboo’. These factors, which can strongly shape researchers’ practices, nevertheless need to be explored in greater depth, as a sparser and more nuanced body of literature on data sharing suggests.
Going More in Depth: Disciplines, Careers and Personal Factors
In recent years, the literature on ORD has taken a slight empirical turn. A number of studies have directly – and more ‘objectively’ – examined researchers’ attitudes towards data sharing (Jeng et al., 2016; Kim and Stanton, 2013; Tenopir et al., 2011; Van den Eynden et al., 2016). They show that the scientific community generally expresses rather positive views on data sharing. However, these same studies have revealed a significant gap between declared attitudes and actual practices: the proportion of researchers who consider a data sharing culture important is much higher than the proportion who have actually shared data. In other words, the prediction made by some authors that pressure from funders and publishers would lead to increased data sharing (Kim and Stanton, 2013) has so far not been supported by significant evidence (Jeng et al., 2016). A survey conducted by the staff of the journal Science, for example, showed that only 7.6 percent of researchers put their data in community repositories (Science Staff, 2011). A study of research papers in 50 high-impact journals found that the full data had been deposited online for only nine percent of the papers, even when the journal required it (Alsheikh-Ali et al., 2011). These discrepancies between declared attitudes and practices have led some authors to interpret dissenting voices on ORD as a sign of ambivalence rather than opposition: ‘[Researchers] are supportive of the principle of data archiving and reuse, but cautious, hesitant and uneasy about how this principle is being put into practice’ (Mauthner and Parry, 2009: 292). That said, little systematic, in-depth work has been done on the structural, material and personal factors that inhibit the implementation of ORD in its dominant conception.
Among the avenues to be explored in studying the barriers to the implementation of a global ORD policy are the following: disciplinary affiliation, career issues and personal factors. Regarding disciplines, it is now well known that scientists are strongly ‘influenced by disciplinary traditions and group norms, which can be deliberately enforced, prompted or activated as subtle cues through observation of how others important to them behave (…) [and] that exert social pressure on [their] data sharing and reuse behaviour’ (Curty et al., 2017). The contrast between historical disciplines – especially oral history – and the social sciences is particularly illustrative in this regard. While oral historians ‘archive their data as a matter of course’, social scientists tend to consider their empirical materials as ‘personal resources’ (Parry and Mauthner, 2004: 148), especially in anthropology, which is governed by strong rules about the ‘intensely private nature of field notes’ (Jackson, 1990: 9). This is partly because in oral history ‘a main purpose of data collection is to secure an historical record for current and future access’, while in the social sciences ‘data are seen mainly as a potential resource to generate new hypotheses, findings and theories’ (Parry and Mauthner, 2004: 148). More precisely, oral historians generally regard their data – especially interviews – as direct ‘testimonies’ whose value increases with publication: ‘The historian, given the requirements of his discipline, needs the testimony to be made public, to be accessible, identified, verified, controlled, etc. The more the testimony is exploited, the more it is used publicly and by name, in the end, the better’ (Müller, 2006: 108).
In contrast, social scientists – and especially those working with ethnographic methods – tend to regard what research participants say as something close to ‘confessions’ whose interest is less factual than situational: ‘Ethnography could be compared to confession. (…) On the protective side, it is a question of not revealing anything about the secret of instruction, we might say; on the scientific side, we are not interested in saying: ‘it really happened like that’, but rather in saying: ‘here are the mechanisms and processes at work in such and such a social situation’’ (Müller, 2006: 107–108). The contributions to this special issue, which are presented more fully in the next section, illustrate these points well. For example, Bizeul’s article not only recalls the extent to which ethnography takes for granted the confidential nature of field notes, accessible only to ‘trusted friends’, but also shows how this type of approach rests on an implicit but nevertheless well-internalized norm of confidentiality. By contrast, Corriveau et al.’s article puts more emphasis on the benefits, if not the necessity, of sharing collected materials. This stance can be explained in part by the socio-historical perspective of the authors, whose professional backgrounds span sociology, criminology and history, and whose logic is precisely that of preserving first-hand testimonies as heritage with a view to their subsequent, albeit controlled, exploitation.
Besides the issue of archiving itself, it is also interesting to note that while some disciplines focus almost exclusively on data ‘production’, others are more oriented towards data analysis. For example, many ‘qualitative’ approaches are characterized by a lack of precision regarding the methods used to analyze the data: ‘While data collection methods are the subject of manuals and teaching materials and are the subject of more or less detailed indications in the publications, the analysis most often constitutes a ‘black box’’ (Duchesne, 2015: 17; Thorne, 2000). As Gupta and Ferguson put it for anthropology: ‘It is fieldwork that makes one a ‘real anthropologist’ (…). Indeed (…) the single most significant factor determining whether a piece of research will be accepted as (that magical word) ‘anthropological’ is the extent to which it depends on experience ‘in the field’’ (Gupta and Ferguson, 1997: 1). In this regard, the reactions among some anthropologists to Street Addicts in the Political Economy, published in 1993 by Alisse Waterston – an American anthropologist whose early work was precisely based on ‘secondary’ data – are enlightening: ‘Methodologically, Waterston’s work is innovative if, perhaps, questionable. It is an exercise in secondary analysis of qualitative data, collected between 1984 and 1987 on the Lower East Side of Manhattan by a multidisciplinary team of social science researchers directed by sociologist Paul Goldstein of Narcotic and Drug Research, Inc. (…) Waterston’s book is a tribute to the rich and multifaceted data collected by Goldstein and his team. Nevertheless, one obvious limitation of Waterston’s approach is that she was constrained by the research agenda and strategies of a project designed to answer different questions than hers. Another limitation is that she had no access to nonlinguistic experimental learning, which usually is a strength of ethnographic research and often distinguishes anthropology from other disciplines’ (Claeson, 1995: 516). It therefore seems highly likely that researchers whose professional identity rests primarily on their ability to produce data would be more reluctant to share and reuse data than those whose legitimacy as researchers derives from their ability to analyze phenomena ‘by means of mathematically-based methods, especially statistics’ (Yilmaz, 2013: 311). The question of ‘delegation’ in ethnographic work is enlightening in this regard, as illustrated by this excerpt from a debate with Florence Weber: ‘We are (…) so sensitive to the question of delegation that if, at times, an ethnographer delegates part of his work, he does not admit it’ (Müller, 2006: 104–105). This issue, which is largely overlooked in the literature, was a major motivation behind the conception of this special issue, even though it was ultimately set aside. Indeed, in the course of my work at an institution that offers archiving of social science research data (FORS), I was led to reflect in depth on the barriers to and drivers of sharing. Quite quickly, my own research experience led me to recognize how little ‘data analysis’ had contributed to the construction of my professional identity. More specifically, trained as a political sociologist – specialized in the study of bureaucratic organizations and political elites – I came to understand methodological writing as aiming more at describing the conditions under which materials are produced – field negotiations, interview and observation modalities, etc. – than at their subsequent analysis.
That said, my experience contrasts with that of some colleagues working with ‘secondary’ – usually quantitative – data, for whom ‘methodology’ refers exclusively to analysis tools and models. From this perspective, it is understandable why some researchers are reluctant to reveal the behind-the-scenes aspects of their projects – as the criteria for the scientific validity of their analytical methods are not always formalized – and ask their peers to ‘trust them’ (Bishop, 2009).
Regarding career issues, junior and senior researchers do not relate to data in the same way. Early-career scholars tend, for example, ‘to be fully engaged in every research stage of their projects, including data collection, processing, and analysis, whereas senior researchers focus more on constructing ideas and interpreting data’ (Jeng et al., 2016: 19), which generally makes the former more personally attached to their data than the latter. Moreover, data-sharing behaviour has been observed to increase significantly with age (Tenopir et al., 2015), a pattern that can be explained ‘by the higher degree of competition for tenure and professional success that younger researchers face’ (Fecher et al., 2015: 4). Indeed, young researchers are more sensitive to the occupational risks of sharing, which leads them to reject principles considered ‘beneficial to science’, such as ‘replication’ or ‘falsification’ (Costello, 2009; Acord and Harley, 2012; Pearce and Smith, 2011). As Moore points out, ‘the level of openness associated with a publication is often controlled by the author who is incentivised to strategically reveal results and data for maximum career benefit. Keeping raw data secret while publishing only a description of the results, publishing rapidly to avoid being scooped, etc. are all ways to establish priority over a research result (…). Consequently, science and scholarly research more broadly (…) are (…) inherently connected with the careers of authors’ (Moore, 2017: 3). In this respect, it is worth noting that the survey conducted by FORS among Swiss social scientists shows that one of researchers’ main concerns regarding data sharing is precisely to ‘publish first’ (Heers et al., 2017).
In this issue, Daniel Bizeul’s contribution tackles this question head-on by explaining how his age gives him greater freedom to share his materials, since he is now ‘out of competition’, while warning against the excesses of the growing injunction to deposit data. His position contrasts, however, with that of Stewart and Shaffer, who are at an earlier stage in their careers but whose data deposit was imposed by their funder – namely the ESRC.
Finally, a number of important personal factors come into play when addressing the issue of sharing. Research work involves a significant investment of time, energy, effort and emotion, with the result that ‘researchers become intimately entangled, inseparable from, and committed to the human and non-human ‘things’ they study’ (Mauthner and Parry, 2013: 59). It is therefore quite logical that many researchers ‘understand their data as part of their intellectual property; have strong emotional attachments to their data; feel they have a moral right to determine whether and how their data should be used, by whom, and for what purposes; and want recognition for the investments they have made in data collection’ (2013: 59). This personal attachment can be accentuated by the presence of personal elements in the empirical materials, which largely depends on the degree of the researcher’s involvement in their (co)production. The contributions to this special issue shed an interesting – albeit indirect – light on this question. The logics at work in ethnographic fieldwork such as Bizeul’s, whose materials are full of intimate elements that make the data inextricably linked to his own person, are quite different from those at work in a historical research project whose materials are ‘naturalistic’ and ‘raw’, as in the project of Corriveau et al., and which therefore bear less of the researcher’s personal imprint. This question is far from trivial: to look critically at data in which the researcher is intrinsically present is to criticize the researcher as a person, which can clearly constitute an obstacle to the dissemination of data. That said, despite their differences, all the contributors to this issue testify to a certain attachment to their data – if only in the form of moral responsibility – that leads them to advocate a controlled, selective or chosen form of sharing.
While highlighting the diversity of barriers to and drivers of data sharing is enlightening, it is not sufficient for a fine-grained understanding of the challenges raised by ORD. The literature lacks empirically grounded accounts showing how the different factors presented above – i.e. epistemological and political postures, institutional and disciplinary constraints, career strategies, psychological dispositions, etc. – are (or are not) activated, deployed, articulated, prioritized and implemented in practice. This gap has led some researchers to set up experimental research programs, such as the French national project ‘RéAnalyse’, whose aim was to carry out concrete experiments in the reuse of qualitative materials and which was based on the following observation: ‘There has been little discussion of what this practice actually mobilizes in terms of methods and know-how. There are few examples of reuse in comparison to the general reflections on the expectations of secondary analysis and little discussion of the necessary conditions and know-how for secondary analysis’ (RéAnalyse, 2010: 6). That said, while the downstream end of the process (data reuse) is now the subject of a more robust literature, the upstream end (data management, archiving, dissemination, etc.) remains largely neglected – with the notable exception of some works in anthropology (Cliggett, 2015; Zeitlyn, 2000; Silverman and Parezo, 1995). There is therefore a real need to turn away from policy considerations and look more deeply at researchers’ concrete practices. Epistemological, ethical, political and practical strategies need to be addressed less in principle than through the – preferably inductive – analysis of their concrete implementation. In a context of increasing standardization of scientific practices, it seems essential to point out the inevitably ‘tailor-made’ nature of data management practices.
This special issue was conceived precisely to help fill this gap by collecting and reporting concrete archiving experiences and reflections. More specifically, the idea was to gather reflexive feedback that allows an empirically based approach to the various issues raised, at different levels, by the archiving of qualitative data. The idea emerged in the context of my activities at FORS, which were initially aimed at encouraging a cultural shift towards ORD among social scientists, especially those working with qualitative data. However, my own sensitivities as well as my daily contacts with the research community gradually led me to revise my approach so as to take better account of existing research cultures in my efforts to promote data sharing. Indeed, top-down injunctions in favor of data sharing sometimes contradict the norms and practices in force in certain research communities. This is why I felt it was essential to address the debate less through abstract principles than through concrete experiences. Such an approach seems promising to me because it can help research governance institutions better understand the constraints that researchers encounter ‘in the field’ and thus set up adapted and modular strategies. While openness and transparency are important values, the academic field cannot be reduced to them, as it is crossed by a multitude of norms and power relations. By giving researchers a voice, I hope to highlight – from the bottom up – the factors that really matter to them at the time of opening their materials and, afterwards, at the time of reflection.
Contributions to this Special Issue
The three contributions to this special issue provide empirical feedback on the archiving and (semi-)public disclosure of research data. The first, by Daniel Bizeul, reports on the archiving of an ethnography of the Front National, a former French extreme right-wing party, conducted between 1996 and 1999; the second, co-authored by Emma Stewart and Marnie Shaffer, offers a reflexive look back at the archiving of refugee accounts produced as part of two research projects on forced migration in the United Kingdom; the third, co-authored by Patrice Corriveau, Jean-François Cauchie, Annie Lyonnais and Isabelle Perreault, discusses the challenges of creating a qualitative data bank on suicide in Quebec, including coroner records dating back to 1763. These contributions analyze contrasting experiences driven by different logics – ranging from obligation to willingness to share – but all shaped by a threefold ethical, epistemological and practical concern. In particular, they all reflect on the specific challenges posed by the public disclosure of sensitive data and/or data on socially and politically delicate subjects. The articles offer concrete input on the sharing of research materials by addressing the trade-offs, hierarchizations, choices and sacrifices such an exercise requires, which inevitably depend on the scientific, social, political and personal sensitivities of each individual. Taking an a posteriori reflexive look at these experiences also allows the authors to draw a number of lessons for other researchers and for future research on archiving sensitive data.
A cross-cutting issue in all the articles is the anticipation of data (re)use. This theme is approached mainly from two angles: the need to constitute a scientifically robust and (re)usable corpus while respecting ethical principles, and the concern to avoid political or social misappropriation of the materials. On the first point, all the authors mention the importance (and difficulty) of striking a balance between ethical imperatives – i.e. protection of individuals – and scientific requirements – i.e. data (re)usability. For example, all the authors agree on the potentially negative effects of disclosing the identities of individuals whose data are archived: giving a deplorable or accusatory image of a person, institution or population for Bizeul; endangering participants in the case of Stewart and Shaffer; disturbing the peace of family members for Corriveau et al. That said, there is also general agreement that anonymization, although sometimes necessary, diminishes the scientific value of materials and therefore limits their reuse. The three articles therefore discuss the trade-offs and practical choices that did – or did not – have to be made in order to protect participants while maintaining the scientific relevance of the data. On this subject, the approaches of Bizeul and of Stewart and Shaffer, who engaged in rigorous and costly anonymization and ‘self-censorship’ at the risk of rendering entire sections of their investigations inaccessible, contrast with that of Corriveau et al., who on the contrary set up a nominative database. This difference can be explained in part by the perspectives adopted: ethnographic and sociological on the one hand, socio-historical on the other.
In all cases, it appears that maintaining scientific value while preserving the ethical integrity of the data is a matter of ad hoc risk management, or ‘micro-ethics’ in the words of Stewart and Shaffer, which depends on multiple factors: the public nature of the data (as with statements made by political figures or official public reports), the sensitivity of the materials, the real and perceived risks incurred by participants, the historical value of the materials, the promises made during field negotiations, and the epistemic position and moral sensitivities of the researchers. The authors also point out the researcher’s power to silence participants’ words, and thus to leave them in the shadows of history. In practice, these evaluations usually result in layered protection mechanisms that combine anonymization, informed consent and, as we will see, access control.
Concerning data misappropriation, the three contributions highlight potentially harmful uses of the archived materials. This sensitivity to the issue of ‘dual use’ is explained in particular by the fact that all the authors report on archiving data related to topics that could be considered ‘sensitive’. Thus, Bizeul points out that his materials can take on the appearance of ‘intelligence sheets’ potentially useful to ‘Antifa’ groups for activist purposes, which would run counter to his own approach. Interestingly, he also points out the dangers to which he exposes himself – and has exposed himself – by revealing field notes that look like a ‘diary’ and might suggest ideological affinity – or at least complacency – with a socially disapproved party that is rejected in French academic circles. Stewart and Shaffer, for their part, underline the very sensitive nature of issues related to asylum and thus the potentially harmful consequences of revealing the identities of refugees seeking UK citizenship. Corriveau et al. point out the potentially performative effects of making ‘farewell letters’ publicly available, in the sense that they could trivialize if not encourage suicide. One solution put forward, notably by Stewart and Shaffer and by Corriveau et al., is to have research teams control access to the data: they can not only assess the requesters’ motivations but sometimes even modulate access selectively according to needs. This gatekeeping mechanism also has scientific virtues, since it allows a dialogue with the original researchers, who know the data better and can thus be of great help to those requesting them. However, as Bizeul points out, whether such access controls are implemented depends on the likelihood that ill-intentioned individuals will access the data.
In addition to the potential difficulties of archiving qualitative data – and their solutions – the various contributions to this issue point to the concrete benefits of doing so, and for some even the reasons for it. In response to criticism of the possible encouragement of suicide, Corriveau et al. mention, for example, that making this type of data available can help to ‘dedramatize’ the issue. Stewart and Shaffer note that sharing data on refugees can bring reciprocal benefits for both participants and researchers, in the sense that the data can serve to challenge negative prejudice about stigmatized communities. Bizeul, finally, emphasizes the historical interest of his materials, which are a first-hand testimony of a party whose ideas are now widely popular. He also emphasizes the heuristic benefits of the archiving process itself, which can allow the researcher and his ‘public’ to make visible the situated and erratic construction of knowledge.
Finally, the three articles present a significant number of practical considerations regarding archiving – and more broadly data management – procedures. In particular, they discuss techniques for digitization, coding, anonymization, obtaining consent, filing documents, securing data and establishing access policies. Several recommendations are made regarding de-identification strategies – careful rereading of transcripts, triangulation of variables, not outsourcing the deposit process, removal of information unrelated to the research problem, etc. – as well as consent – decoupling participation from archiving, having participants reread the transcripts, etc. The authors also highlight the very costly and time-consuming nature of archiving and recommend anticipating it as far as possible.
The contributions to this special issue invite us to think of archiving as a situated, contextual and dynamic process. Although they recount concrete experiences of making empirical materials public, they call for distance from the bureaucratic framework imposed by the new ethics and ORD policies. Indeed, the cost of the exercise as well as the sensitivity of certain data and subjects suggest opting for flexible approaches that leave researchers a certain autonomy and freedom of appraisal. Since the risks of standardization are great, it is important to leave room for a diversity of viewpoints and practices. Like research projects themselves, archiving is a living, non-linear process, made of comings and goings, pauses, doubts and choices that the authors invite us to (re)live in their company.
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
