Abstract
The creation of socially and technically robust biobank privacy regimes presupposes knowledge of and compliance with legal rules, professional standards of the biomedical community, and state-of-the-art data safety and security measures. The strategies in privacy management and data protection presented in this review show a trend that goes beyond searching for compromises or balancing scientific demands for efficiency against societal demands for effective privacy regimes. Instead, they focus on developing synergies that facilitate cooperative use of biomaterials and data and enhance sample search efficiency for researchers on the one hand, and protect the rights and interests of donors and citizens on the other. Among the issues covered are: a) ethical sensitivities and public perceptions of privacy in biobanking, b) tools and procedures that maintain the rights and dignity of donors without jeopardizing the legitimate information needs of researchers or the autonomy of biobanks, and c) a privacy-sensitive framework for sharing data and biomaterials in the research context.
Introduction
Obviously, the creation of socially and technically robust biobank privacy regimes presupposes knowledge of and compliance with legal rules, professional standards of the biomedical community, and state-of-the-art data safety and security measures. In a framework of responsible innovation, 1 compliance with legal rules is a basis but not necessarily sufficient for the long-term success of large-scale research infrastructures. In such a framework, norms and values ought to become design principles for technological development. Responsible innovation means that science policy frameworks increasingly evaluate research projects not only in terms of scientific excellence but also with a view to normative anchor points such as privacy impacts. 1 Mindsets that aim at optimization and innovation in purely technical areas but meet only minimal standards with respect to ethical values and societal interaction are becoming increasingly outdated. In that sense, ethics and social responsiveness are not merely constraints on technological advances; they are gaining momentum as crucial ingredients of policy strategies that aim to foster technical and scientific progress in the innovation society. To be accepted and supported by society, ICT tools and working procedures need to be fine-tuned to cultural norms and values.
However, technology also tends to influence society's attitudes and values. Millions of people readily set aside their privacy concerns when posting private information on social media, and hundreds of thousands willingly disclose health-related personal data on the World Wide Web. 2 With the advent of translational medical research and personalized medicine, which will increasingly combine genotypic and phenotypic/clinical data in cloud-computing database architectures, technical scenarios might soon emerge for which one cannot predict today whether technology will shape societal norms and values or vice versa.
Against this background, the twin sessions on the ethical, legal and social issues (ELSI) and the informatics aspects of privacy and data protection at the 2012 conference of the European, Middle Eastern & African Society for Biopreservation & Biobanking (ESBB) in Granada assembled a wide variety of expertise. Together, the contributions portray the state of the art of biobank privacy, data protection, safety and security regimes, and provide an outlook on developments in computing technology and architecture. The issues covered are: a) ethical sensitivities and public perceptions of privacy in biobanking, b) tools and procedures that maintain the rights and dignity of donors without jeopardizing the legitimate information needs of researchers or the autonomy of biobanks, and c) a privacy-sensitive framework for sharing data and biomaterials in the research context.
The strategies in privacy management and data protection presented here show a trend to go beyond searching for compromises or balancing scientific demands for efficiency against societal demands for effective privacy regimes. They focus on developing synergies that facilitate cooperative use of biomaterials and data and enhance sample search efficiency for researchers on the one hand, and protect the rights and interests of donors and citizens on the other. 3 Although they start from a diverse range of disciplinary specialities, the authors aim to encourage transdisciplinary cooperation leading to a new generation of solutions that bridge the gaps between ethical reflection, sociological analysis, IT development and biobank-based research.
Public Perception of Biobanks: Towards Acknowledging the Privacy Reciprocity Connection
Data protection systems that allow data processing in principle but establish explicit procedural norms that make the terms of use conditional are an important, but not necessarily sufficient, means of coming to grips with privacy concerns. Quantitative survey data and qualitative focus group data have highlighted the connection between privacy concerns and expectations of reciprocity in the context of population-based research biobanks.
In the context of biobank initiatives, individuals are expected to contribute different types of data and information: lifestyle information, medical history, physiological measurements, samples of tissue, blood, urine and saliva, and other potentially health-related records. Participants regarded providing this type of information as giving up something both personal and private. 4 In addition, a 2010 Eurobarometer survey on Europeans and biotechnology revealed that respondents' willingness to participate in biobank research is negatively correlated with privacy concerns: inhabitants of European countries that score high on a privacy concern scale are more hesitant to provide biomaterials and health-related data to biobank research infrastructures. 5 However, another Eurobarometer study, which surveyed attitudes towards data protection in European populations, showed that privacy concerns are highly context sensitive. Even in Germany and Austria, where 65% and 79% of respondents, respectively, were very concerned about private and public organizations keeping personal information about people, more than 80% trust medical services and doctors to handle their personal information with care. 6 Arguably, the health care context signals trustworthiness. Given these ambiguous quantitative results, studies using qualitative sociological approaches have tried to shed light on the most important factors that influence opinion formation.
Two recent qualitative sociological studies 5,7 revealed that issues of privacy and data protection are indeed important for participants in the context of population-based and clinical biobanks. Interestingly, expectations about what data protection can achieve were relatively low. Adequate data protection measures were regarded as a necessary precondition, but not a sufficient reason, for donating private information and biological materials. Furthermore, the perceived personal risk of being disadvantaged by a privacy violation was not considered the most important issue. Respondents felt that in an information society their data “are already everywhere,” that they “have nothing to hide,” and that this particular type of data is rather unproblematic.
The studies partly confirmed the importance of notions like “altruism” and “solidarity,” which have received attention in recent bioethical reflections. 8 One of the most important reasons for citizens to participate was indeed a desire to contribute to the common good of scientific progress and improved population health. Tensions are, however, created by what could be called a “promissory gap”: individuals have to waive certain privacy concerns and rights in the present, whereas scientific results and potential improvements in health care will only appear in a distant future. In this context, attention turns to trust in the scientific community. 9 This trust, however, appears alongside questions about personal and collective possibilities to retain some level of control over donated data and materials. 5
One of the most interesting insights is that there is a connection between privacy and reciprocity that needs to be understood in order to explain under which conditions people are willing to give up privacy and donate bioinformation. 7 Biobank participants maintain a complex relationship with their donated specimens and personal information. 10 Their interest in the donation does not end once the initial conditions have been accepted, and they tend to attach conditions to their gift of giving up privacy for research purposes. An analysis of the group discussions once again confirmed a fundamental insight of Mauss' anthropological studies: something like a completely free gift does not exist. Gifts create obligations. 11 For many focus group participants, their conditional gift constituted an ongoing, long-term mutual exchange of benefits between biobanks, society and the individual. They perceived the biobank as holding a duty of care and expected medical return gifts in the form of more comprehensive individual feedback and a certain degree of insight into the research process. Participants expressed the opinion that the general public should be actively informed about the general outcomes of research projects that made use of biobank infrastructures. They liked the idea of seeing what has been achieved with the help of their donations. Moreover, such a feedback mechanism would force scientists to communicate their results in an understandable fashion, and it was seen as a means of controlling the future development of biobanks by enabling participants to reassess their decision to participate. 5,7
Biobanks should be encouraged to assume their duty to preserve the spirit of the gift. People need to feel that their donation feeds into a respectful and mutually beneficial relationship. Pointing towards potentially significant medical benefits that might result from biobank research at some unspecified point in the future is only one element of a system that incentivizes biobank participation. Individual reciprocity serves to incentivize participation and emphasizes participants' feeling of being valuable to the project. Meeting demands for transparency, for regular updates and for ongoing information on the research performed on samples and data requires the vast resources of Web 2.0, which many biobanks have yet to integrate and use. Self-contained Web pages will not be sufficient. What is missing are interactive features, such as possibilities to alter consent settings, online opportunities to learn about research projects that use donated materials, and possibilities to engage in an open dialogue about the aims and procedures of a biobank. 12 Involving citizens, and in particular potentially affected persons, will remain an essential part of proactive biobank governance strategies.
Information Technology Solutions to Reconcile Research Needs and Privacy Demands
As noted earlier, biobanks collect human biological material (tissue, blood, cell cultures, etc.) 12 and data describing these materials and their donors (e.g., patient records, questionnaires).
Efficient use of biobank content in biomedical studies and protection of the rights and dignity of the donors, in particular their privacy, seem to be conflicting demands. It will be shown that information technology (IT) solutions and organizational design can balance both requirements: they can support researchers in gathering relevant material for their studies quickly and effectively without violating privacy requirements and other usage restrictions stemming from regulations (laws, ethical bylaws, court rules, etc.) or from the explicit intention of the donor. 13 For clinical and population-based biobanks alike, this can be achieved by an individualized informed consent represented in computerized form in a disclosure model. The intention of the donor, captured in the informed consent, could be individualized in the sense that a donor chooses from a menu of usages or circumstances under which he/she permits the data to be used. For example, the donor might restrict the research area (e.g., cancer research), the type of research (e.g., basic research, drug development), the type of research organization (e.g., public, non-profit), the country where the research is performed, the type of control the research institution guarantees, or the certification of the research project, researcher or research organization. Certainly, for research, and in particular for biobanks, it is desirable that informed consents be as broad as possible, but the possibility of imposing restrictions might attract additional donors.
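A computerized disclosure model of this kind can be sketched as a set of per-dimension restrictions that is checked against a description of the requesting project. The Python sketch below is illustrative only: the class names `DisclosureModel` and `RequestProfile`, the four dimensions, and the convention that `None` means "no restriction" are assumptions, not a data model from the source. Restrictions stemming from laws or bylaws are modelled, again as an assumption, with the same structure and must be satisfied together with the donor's consent.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class RequestProfile:
    """Hypothetical description of a requester and the planned project."""
    research_area: str       # e.g. "cancer"
    research_type: str       # e.g. "basic", "drug_development"
    organization_type: str   # e.g. "public", "non_profit", "commercial"
    country: str             # e.g. "AT"

@dataclass(frozen=True)
class DisclosureModel:
    """Hypothetical individualized consent; None means 'no restriction'."""
    research_areas: Optional[FrozenSet[str]] = None
    research_types: Optional[FrozenSet[str]] = None
    organization_types: Optional[FrozenSet[str]] = None
    countries: Optional[FrozenSet[str]] = None

    def permits(self, req: RequestProfile) -> bool:
        # Pair each restriction with the corresponding attribute of the request.
        checks = [
            (self.research_areas, req.research_area),
            (self.research_types, req.research_type),
            (self.organization_types, req.organization_type),
            (self.countries, req.country),
        ]
        # A restricted dimension must list the requested value explicitly.
        return all(allowed is None or value in allowed for allowed, value in checks)

def admissible(donor: DisclosureModel, rules: DisclosureModel,
               req: RequestProfile) -> bool:
    """Admissible only if the donor's consent AND the biobank's legal/bylaw
    restrictions (modelled the same way, by assumption) both permit the request."""
    return donor.permits(req) and rules.permits(req)

donor = DisclosureModel(research_areas=frozenset({"cancer"}),
                        organization_types=frozenset({"public", "non_profit"}))
rules = DisclosureModel(countries=frozenset({"AT", "DE"}))

print(admissible(donor, rules, RequestProfile("cancer", "basic", "public", "AT")))      # True
print(admissible(donor, rules, RequestProfile("cancer", "basic", "commercial", "AT")))  # False
```

Keeping the two checks separate mirrors the text's distinction between the donor's own intention and the regulatory restrictions that apply to the queried biobank; a real system would need many more dimensions (e.g., certification or guaranteed control) than this sketch.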
The following section of this review concentrates entirely on the exchange of data between a biobank and a researcher. The basis of all the following considerations is of course that the data is thoroughly secured by the usual means and that only authorized access to the data is possible. Here we will focus on the question: Which data can be passed to a requesting researcher and when to prohibit the disclosure of personal data, 14 an important aspect for the design of information systems for biobanks. 15 We analyze the process in preparing and performing a biomedical study and discuss the appropriate measures for each step.
In the first step, a researcher formulates a hypothesis and needs to gather relevant data (cases) for performing the study. Here, the researcher's intention is to rapidly find relevant biobanks, and biospecimen cases within them, that could be used for the study. In this step, there is no need to pass personal data to the requester (i.e., data that can be associated with a single individual, directly or indirectly, by combining them with other data that could be known to the requester).
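One way to keep personal data out of this first step is to answer search queries with aggregate counts only. The sketch below is an assumption for illustration, not the method of the cited work: it rounds counts to a multiple of 5 and suppresses counts below a threshold, a simple form of statistical disclosure control. The field names, the diagnosis codes in the example, and the threshold and rounding base are all hypothetical choices.

```python
from typing import Callable, Iterable, Optional

def feasibility_count(cases: Iterable[dict],
                      predicate: Callable[[dict], bool],
                      threshold: int = 5,
                      base: int = 5) -> Optional[int]:
    """Return a coarsened count of matching cases, or None if suppressed.

    threshold: counts below this are not reported (small cells can single out
               individuals when combined with other knowledge).
    base:      reported counts are rounded to the nearest multiple of base.
    """
    n = sum(1 for case in cases if predicate(case))
    if n < threshold:
        return None
    return base * round(n / base)

# Illustrative cases only; field names and codes are hypothetical.
cases = ([{"diagnosis": "C50", "material": "tissue"}] * 12
         + [{"diagnosis": "C61", "material": "blood"}] * 3)

print(feasibility_count(cases, lambda c: c["diagnosis"] == "C50"))  # 10
print(feasibility_count(cases, lambda c: c["diagnosis"] == "C61"))  # None
```

Rounding and suppression alone do not protect against differencing attacks over many overlapping queries; the sketch only illustrates that the search step can work without any record-level output.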
The second step is the formulation of a project proposal including in particular the request for the material and data from one or more biobanks that were identified in the first step.
The third step is the examination of the proposal by a research ethics committee, which checks (a) whether the intended results justify the use of the data and material, (b) whether the requested data are needed for the proposed study, (c) whether the purpose and circumstances of the planned project are covered by the informed consent of the donors, and (d) whether the data will be sufficiently secured against misuse during and after the project. It depends on national legislation how state authorities are involved in this process, in particular whether they are represented in the ethics committee or whether they approve and audit the procedures.
If the review of the proposed project is positive, detailed data (and material) is handed over to the requester and the research project is initiated.
IT solutions can support these different steps. In the first step (identification of relevant cases), there is typically no need to pass very detailed data to the researcher, so different techniques can be applied, such as posing the queries to k-anonymous 16 and l-diverse 17 data sets generated from the original database, or statistically blurring the query results. Novel query techniques 18 make the search for relevant cases much more efficient and effective. Further support comes from automatically matching, during query processing, the informed consent and the restrictions from laws, bylaws, etc. (disclosure model) against an explicit representation of the specifications of the requester and the planned project (request profile). With this information, the answers to a query can be restricted to those cases that might be admissible for the project according to the donors' intentions. This makes the search for relevant cases much more efficient, as it prevents researchers from chasing data and material that will not be available for the intended project due to donors' restrictions or to laws, bylaws, etc. relevant for the queried biobank. Experience shows that collecting material and data for research projects is costly and time-consuming and a serious bottleneck for medical progress. 19
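The two anonymity criteria named above can be checked on an already generalized data set by grouping records on their quasi-identifiers. In the sketch below, which assumes the generalization has already been done and uses hypothetical column names, a table is k-anonymous if every group of records sharing the same quasi-identifier values has at least k members, and it is l-diverse in the simplest ("distinct") sense if every such group contains at least l different values of the sensitive attribute. Stronger variants of l-diversity (e.g., entropy-based) exist and are not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]

def equivalence_classes(rows: Sequence[Row],
                        quasi_ids: Sequence[str]) -> Dict[Tuple[str, ...], List[Row]]:
    """Group records that are indistinguishable on the quasi-identifiers."""
    groups: Dict[Tuple[str, ...], List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_ids)].append(row)
    return groups

def is_k_anonymous(rows: Sequence[Row], quasi_ids: Sequence[str], k: int) -> bool:
    return all(len(g) >= k for g in equivalence_classes(rows, quasi_ids).values())

def is_l_diverse(rows: Sequence[Row], quasi_ids: Sequence[str],
                 sensitive: str, l_div: int) -> bool:
    # "Distinct" l-diversity: at least l_div different sensitive values per group.
    return all(len({r[sensitive] for r in g}) >= l_div
               for g in equivalence_classes(rows, quasi_ids).values())

# Hypothetical, already generalized extract (age bands, truncated postal codes).
rows = [
    {"age": "40-49", "zip": "80**", "diagnosis": "C50"},
    {"age": "40-49", "zip": "80**", "diagnosis": "C61"},
    {"age": "40-49", "zip": "80**", "diagnosis": "C50"},
    {"age": "50-59", "zip": "81**", "diagnosis": "C18"},
    {"age": "50-59", "zip": "81**", "diagnosis": "C18"},
]
qi = ["age", "zip"]
print(is_k_anonymous(rows, qi, k=2))                 # True
print(is_l_diverse(rows, qi, "diagnosis", l_div=2))  # False
```

The example table is 2-anonymous, yet everyone in the second group has the same diagnosis, so knowing that a person falls into that group reveals the diagnosis; this homogeneity problem is the gap that l-diversity is meant to close.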
While anonymization techniques like k-anonymity and l-diversity can help in the first of the steps outlined above, they do not solve all security questions. For performing the study (after step 3), for example, k-anonymous and l-diverse data sets are frequently insufficient because information is lost during anonymization. Access control and access layers are needed, providing open access only to metadata and coarse-grained data after consideration of risks (e.g., of re-identification and of statistical inference of membership). Access to further layers, which may use k-anonymity, will typically be restricted by access control mechanisms and data use agreements. It should also be noted that there is neither a clear consensus in the discussion of consent concepts nor widespread use of IT solutions supporting individualized consent: currently, broad consent is predominantly used by major population biobanks, 20 and a relevant argument in favor of broad consent is the need to consent to future, unspecified research. 21
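The layered access described above can be sketched as an ordered set of access levels with a rule that determines the highest level a requester may reach. Everything in this sketch is a hypothetical illustration: the three layer names and the conditions attached to them (registration, a signed data use agreement, ethics approval) are assumptions chosen to mirror the text, not a scheme prescribed by the source.

```python
from dataclasses import dataclass
from enum import IntEnum

class Layer(IntEnum):
    """Hypothetical access layers, ordered from least to most detailed."""
    OPEN = 0        # metadata and coarse-grained data, released after risk assessment
    ANONYMIZED = 1  # e.g. k-anonymous extracts
    DETAILED = 2    # record-level data for an approved project

@dataclass(frozen=True)
class Requester:
    registered: bool          # authenticated, known researcher
    data_use_agreement: bool  # signed data use agreement
    ethics_approval: bool     # project approved by an ethics committee

def highest_layer(r: Requester) -> Layer:
    if r.registered and r.data_use_agreement and r.ethics_approval:
        return Layer.DETAILED
    if r.registered and r.data_use_agreement:
        return Layer.ANONYMIZED
    return Layer.OPEN

def can_access(r: Requester, requested: Layer) -> bool:
    # IntEnum members compare as integers, so access to a layer implies
    # access to every less detailed layer.
    return requested <= highest_layer(r)

anonymous_visitor = Requester(False, False, False)
approved_project = Requester(True, True, True)
print(can_access(anonymous_visitor, Layer.ANONYMIZED))  # False
print(can_access(approved_project, Layer.DETAILED))     # True
```

Ordering the layers as integers keeps the policy monotone: granting a more detailed layer can never accidentally withhold a coarser one.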
The query results cannot be connected to individual persons or cases, but pseudonyms can be generated so that data and material requests can be formulated concretely in the project proposal.
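Such pseudonyms can, for example, be derived with a keyed hash, so that only the holder of the key can link a pseudonym back to a case. The sketch below assumes that the biobank keeps a secret key and a private lookup table; HMAC-SHA256 from Python's standard library is used as one reasonable choice, not as the mechanism of the cited work. Including a project identifier in the hashed input (also an assumption) yields different pseudonyms for the same case in different projects, which hinders linking across proposals.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Iterable

def pseudonym(key: bytes, project_id: str, case_id: str) -> str:
    """Project-specific pseudonym; cannot be linked to case_id without the key."""
    message = f"{project_id}\x1f{case_id}".encode("utf-8")  # \x1f separates the fields
    # Truncated to 64 bits for readability; keep the full digest if collisions matter.
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:16]

def pseudonymize(key: bytes, project_id: str,
                 case_ids: Iterable[str]) -> Dict[str, str]:
    """Return pseudonym -> case_id. This table stays inside the biobank;
    only its keys are released to the requesting researcher."""
    return {pseudonym(key, project_id, c): c for c in case_ids}

key = secrets.token_bytes(32)  # held by the biobank (or a trusted third party)
lookup = pseudonymize(key, "PROJ-A", ["case-001", "case-002"])
released = sorted(lookup)      # what the researcher sees in the query result
```

When the approved proposal later lists pseudonyms, the biobank resolves them through `lookup`; without the key, the researcher cannot recompute or reverse them.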
The disclosure model and the request profile can also be used to support the work of ethics committees for checking whether the request is in conformance with the restrictions imposed by the donors and the biobanks.
To summarize: The outlined organizational and technical design for the processing of requests for material and data secures both the privacy and the intention of the donors and will improve the collection of material and data for research projects.
Privacy By Design via Secure Computations in Bioinformatics and Biostatistics
As evidenced by an increasing number of publications, organizational, legal and technological solutions need to go hand in hand to meet privacy demands and thus enable new research involving biobanks and other sensitive data sources. 15,16,17,22,23
While several steps (in particular statistical ones 16,24) can be taken to improve privacy by, e.g., strengthening anonymity, such approaches at the same time reduce the usefulness of data for studies in epidemiology, biomedicine, etc., because information is retained within the organization holding the data and not released to the outside researcher, potentially hiding important features from scrutiny. Such cases arise quite frequently, e.g., in the field of rare diseases, where expression markers and features need to be discarded because of their low probability of occurrence within a cohort of participants.
Therefore, it is desirable to enable the analysis of and computations on such data, and potentially even to promote a mandatory release for some subgroups and their health problems. But how can we avoid the re-identification of persons with rare features? The idea of k-anonymity 22 is one way to attack this problem. In the most pressing cases described above, however, this concept fails to make such rare features available for outside expertise, and in particular for distributed data analysis by a multidisciplinary team of experts.
At the same time, traditional models of communications security, which are a prerequisite for the distributed handling of sensitive data, only model situations in which the malevolent party is an outsider; here, the so-called attack model will typically focus on (un)intended information leakage in the communication alone. Such scenarios are well researched, and solutions exist. These days, however, the cloud computing paradigm is on the rise, and the aforementioned attack model does not cover all aspects of the cloud setting. In the cloud, a malevolent party can not only eavesdrop on the communication between legally authorized parties in an analysis procedure; the attacker could also be the cloud provider itself or someone who has cracked the cloud infrastructure, thereby rendering the infrastructure itself untrustworthy. Storing and analyzing such sensitive data “under occupation” 25 adds to the complexity of the traditional problem of handling data in biobanking.
When dealing with encrypted data (the so-called ciphertext of a message), one can convincingly claim ignorance of the real information (the so-called plaintext). Therefore, storing encrypted data solves one of the problems raised by cloud computing and by collaborative efforts in biobanking, statistics and informatics.
Including the analysis step, however, requires another new approach: secure multi-party computation. 26 While the original idea dates back to the 1980s, 27 only recently has progress allowed the first implementations of pattern matching algorithms. 28 A series of studies 23,25,28 has since demonstrated how to use such computations for statistical models and machine learning approaches. This new field of secure bioinformatics ultimately allows one to work with and analyze data in the encrypted domain, thus assuring zero knowledge 29 about the real data and thereby enforcing privacy-preserving procedures by technical means. Such a procedure allows the design of communication and statistical data analysis protocols that are privacy preserving by design, since the data are processed for research purposes without the analyzing party knowing them.
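The core idea of secure multi-party computation can be illustrated with additive secret sharing, which is one of several techniques in the field and not necessarily the one used in the cited studies. In the hypothetical sketch below, several biobanks each hold a private count of cases with a rare feature. Each splits its count into random shares that individually reveal nothing, the computing parties add up only the shares they receive, and only the total is reconstructed. All parties are simulated in one process, and the sketch assumes honest-but-curious participants and secure channels between them.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # a Mersenne prime; all arithmetic is done modulo PRIME

def share(value: int, n_parties: int) -> List[int]:
    """Split value into n_parties random shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: List[int]) -> int:
    return sum(shares) % PRIME

def secure_sum(private_inputs: List[int]) -> int:
    n = len(private_inputs)
    # Input owner i sends share j of its value to computing party j.
    dealt = [share(v, n) for v in private_inputs]
    # Party j only ever sees uniformly random numbers and adds them locally.
    partial = [sum(dealt[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Combining the partial sums reveals the total, but not the individual inputs.
    return reconstruct(partial)

# Hypothetical counts of a rare feature at three biobanks.
print(secure_sum([3, 0, 5]))  # 8
```

The result is exact as long as the true total stays below `PRIME`; for simplicity, the number of computing parties here equals the number of data holders, which real deployments need not require.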
At present, we have established numerical accuracy within typical application regimes, such as the IEEE floating-point standard, 21 as well as sufficient efficiency to run such protocols on the commodity hardware commonly found in wet labs and biobanks. 23 Notably, a secure computational framework acts as an intermediate abstraction layer on top of which a bioinformatician or statistician can easily develop models and implement algorithms for a particular research question. Programming languages such as Python, R or C++ are often used to develop statistical frameworks. A secure computational framework would just need to provide secure implementations of the basic algebraic operators (addition, multiplication, and so on) and of relevant special functions (such as the logarithm). Algorithms already implemented in C++, for example, could then easily be transferred to the secure domain via operator overloading. 30
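The operator-overloading idea can be sketched in Python, one of the languages named above, with a small class that stores additive secret shares and overloads `+` and `*`, so that a function written for ordinary numbers runs unchanged on shared values. This is a toy under explicit assumptions: all parties are simulated in one process, a trusted dealer supplies Beaver multiplication triples, parties are honest-but-curious, and only integers modulo a prime are supported. Real frameworks also need fixed-point or floating-point encodings, network communication and special functions such as the logarithm, none of which are shown.

```python
import secrets
from typing import List, Tuple

P = 2**61 - 1   # prime modulus
N_PARTIES = 3   # number of simulated computing parties

def _split(value: int) -> List[int]:
    shares = [secrets.randbelow(P) for _ in range(N_PARTIES - 1)]
    return shares + [(value - sum(shares)) % P]

class Shared:
    """An integer mod P held as additive shares, one per simulated party."""

    def __init__(self, shares: List[int]):
        self.shares = list(shares)

    @classmethod
    def secret(cls, value: int) -> "Shared":
        return cls(_split(value % P))

    def reveal(self) -> int:
        return sum(self.shares) % P

    def __add__(self, other):
        if isinstance(other, int):  # public constant: added by party 0 only
            return Shared([(self.shares[0] + other) % P] + self.shares[1:])
        return Shared([(a + b) % P for a, b in zip(self.shares, other.shares)])

    __radd__ = __add__

    def __sub__(self, other: "Shared") -> "Shared":
        return Shared([(a - b) % P for a, b in zip(self.shares, other.shares)])

    def __mul__(self, other):
        if isinstance(other, int):  # public constant: every party scales locally
            return Shared([(other * s) % P for s in self.shares])
        # Beaver's trick: open the masked differences d = x - a and e = y - b,
        # then x*y = c + d*b + e*a + d*e, where c = a*b.
        a, b, c = _beaver_triple()
        d = (self - a).reveal()
        e = (other - b).reveal()
        return c + b * d + a * e + (d * e) % P

    __rmul__ = __mul__

def _beaver_triple() -> Tuple[Shared, Shared, Shared]:
    # ASSUMPTION: a trusted dealer produces the triple in advance.
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return Shared.secret(a), Shared.secret(b), Shared.secret(a * b % P)

def sum_of_squares(xs):
    """Written for ordinary numbers; never mentions secret sharing."""
    total = 0
    for x in xs:
        total = total + x * x
    return total

print(sum_of_squares([3, 4, 5]))                                        # 50
print(sum_of_squares([Shared.secret(v) for v in [3, 4, 5]]).reveal())  # 50
```

`sum_of_squares` is unaware of secret sharing, which is the property the text attributes to operator overloading; the price is that every secret-by-secret multiplication consumes one triple and two openings.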
Discussion and Outlook
The ESBB conference twin sessions on privacy and data protection highlighted that while societal values inspire and influence the development of technological means, technological developments in turn tend to influence the perceptions of societies. Peter Dabrock placed the developing biobank privacy regimes in the context of the concept of the European innovation society. Herbert Gottweis and Georg Lauss highlighted that while privacy is still a very important value for many individuals, those individuals associate the concept at least as much with wishes for reciprocity and control as with data protection. Johann Eder presented bioinformatic solutions that safeguard donors' privacy without jeopardizing research efficiency, and Kay Hamacher showed how approaches from the field of secure bioinformatics promise to resolve the dilemma concerning research efficiency in multicenter studies. Their approaches provide accurate data protection standards even in rare disease research, where the criterion of k-anonymity is typically difficult to achieve. Klaus Kuhn showed how security mechanisms are used in current systems and how they will be further developed in EU projects; he argued for risk and threat analyses as the foundation of security concepts and presented examples. Together, these contributions pointed towards promising possibilities to develop biobank privacy regimes in the context of contemporary innovation societies in a transdisciplinary fashion.
The discussions after the two panels highlighted three main topics. First, concerns were raised about the necessity and feasibility of fully anonymizing biomaterials and information in every scenario. It was proposed to distinguish between situations involving controlled data and a tissue chain for specific research projects, and cases in which these conditions are not in place. Second, there was interest in ways to talk to and engage the lay public in biobank developments. Third, participants discussed the need for funding mechanisms that take into account that developing and implementing privacy-sensitive tools and procedures is a time- and resource-consuming task. Since scientific self-governance increasingly means more than mere compliance with legal rules and basic ethical principles, it is important to convince policy makers to address this issue and to find ways of covering these costs in their funding schemes.
