Abstract
After the failure of the care.data programme, a revised opt-out system was introduced in 2018 to let British citizens protect their health data. However, there are several exemptions from the previous and the revised opt-out systems, some of which are overly broad. For instance, the opt-outs may be completely ignored in the case of ‘anonymised’ data. The data protection terminology in the United Kingdom is slightly different from that in the European Union, and the key issue is that the terms are not used consistently, even in the most important documents and guidelines. This situation may lead to a weak opt-out system with transparency issues, which might erode public trust and lead to a repeat of the care.data failure. Furthermore, the United Kingdom intends to comply with the General Data Protection Regulation after Brexit, so these differences may cause compatibility issues in the future.
Introduction
In 2016, the world spent 7.5 trillion USD on health, representing close to 10% of global gross domestic product, and these costs continue to increase. 1 Globally, countries are actively seeking to enable efficient healthcare systems with the aim of improving the quality of care and reducing expenditures. One mechanism for furthering these goals lies in the secondary use of health data, which refers to the processing of data collected during direct care for new purposes, such as research and policy planning. 2 This secondary analysis offers opportunities to improve healthcare experiences, expand knowledge about diseases, increase healthcare system effectiveness, support public health and strengthen security, while also generating concerns of a complex ethical, political, technical and social nature. 3 As part of these efforts, countries have launched various programmes to accelerate research and improve the efficiency of their healthcare services. Within these programmes, cooperation and data sharing with private companies become necessary, since governments do not possess the required knowledge or technical background to translate their data into efficient care. Information technology and pharmaceutical companies, on the other hand, rely on these data to improve their products and services. However, the sensitivity of the data makes cooperation between governments and corporations challenging, since trust and respect for patients’ privacy are crucial in care services.
This article focuses on the issues and solutions experienced in England since the failure of its ‘care.data programme’, an initiative which was intended to translate citizen health data into improved care and services. In recent years, data sharing policies in England’s health and social care systems have been through several critical changes. The administrative structure of the National Health Service (NHS) was reorganised with the aim of providing better public health delivery. 4 As part of these reforms, the Health and Social Care Act 2012 (HSCA) has been used as the legal framework for centralising data sharing. It allows a special health authority – the Health and Social Care Information Centre (HSCIC, later renamed NHS Digital) – to collect and use health data to benefit patient care. 5 The HSCA changed the way that individuals’ confidential data are processed. According to the Act, NHS Digital could acquire personal confidential data from General Practitioners (GPs) without seeking patient consent. 6 Alongside these changes, the ‘care.data’ programme was initiated in 2013. The programme aimed to extract anonymised patient data from GPs’ records to form a central nationwide database. 7 While the scheme promised to use health data to improve healthcare, public trust and confidence are key factors in making data sharing possible. 8 The fact that the database was designed to be accessed by third-party users, including pharmaceutical companies and private entities, 9 raised serious public concern. 10
The care.data programme was deeply controversial and was halted several times in response to widespread criticism. 11 In June 2015, the programme was revived but stopped 3 months later when the National Data Guardian, Dame Fiona Caldicott, launched an investigation of the project and developed a model for patient consent and opt-out to data sharing. 12 Caldicott recognised that data sharing can be beneficial for research and healthcare delivery but asserted that patients themselves must be allowed to assess the possible risks and benefits. 13 The Caldicott Review, published in July 2016, recommended an eight-point model for the secondary use of health data. 14 After publication of the report, NHS England decided to terminate the care.data programme. 15 The most widely held view was that the failure of the programme was caused by insufficient public communication, confusion about the scope of some core concepts such as ‘direct care’, ‘anonymous’ and ‘pseudonymous’ health data, and the controversial execution of the patient opt-out provisions. 16 The Caldicott Review offered numerous suggestions for the government to regain public trust, such as new opt-out models and stronger protection of anonymised data. 17
After the termination of the care.data programme and the implementation of several suggestions from the Caldicott Review, patients were offered a revised system to state their national data opt-out preferences 18 as of May 2018. 19 However, the revised opt-out system featured only minor revisions to the most important issues, such as the de-identification exemptions. 20 De-identified data are still exempted from the new opt-out system. However, in the new system, what is deemed as being an adequate level of de-identification is based on the Anonymisation Code of Practice, 21 which was not designed for this purpose. This fact was identified as meriting concern by the Information Commissioner. 22 In fact, the original care.data model used a ‘type 1 opt-out’, which prevented information from being shared outside the GP practice for purposes other than direct care, but this more robust opt-out facility will be removed in 2020. 23 These developments have raised fundamental questions: firstly, what is the role of de-identification techniques in cases of secondary use of health data? Secondly, how can the public exercise effective control over the further use of their sensitive data?
This article argues that there has been no significant change in the secondary use of health data following the introduction of the new national data opt-out mechanism in England. It further asserts that ‘de-identification’ cannot in itself justify setting aside patient opt-outs without demonstrating compelling public interest. In order to shed more light on these issues, this article first discusses the legal background (in the European Union (EU) and United Kingdom) concerning the secondary use of health data and the transition to the new opt-out system in England. What then follows is an analysis of the most important exemptions from the opt-out system, namely anonymisation and direct care. The article concludes by elaborating on the connection between the public interest and de-identification techniques by evaluating three data processing scenarios. These scenarios will demonstrate the difficulty of finding a proper balance between individual privacy claims and societal interests.
The legal background: The secondary use of health data and changes to the opt-out system
Legal background
In England and Wales, the use of confidential patient data is subject to complex legal requirements. Section 251 of the National Health Service Act 2006 allows the Secretary of State for Health to make regulations that bypass the common law duty of confidentiality for defined medical purposes. 24 This power was deemed necessary as, in some cases, anonymised data are useless and seeking consent impractical, mainly due to the high costs and insufficient technology available. 25
The EU General Data Protection Regulation (GDPR) encourages innovation and technological developments; thus scientific research has a privileged role in the Regulation with several broad exemptions. 26 The GDPR acknowledges ‘it is often not possible to fully identify the purpose of personal data processing for scientific research purposes at the time of data collection’. 27 As a general rule, researchers may use health data with the data subject’s explicit consent. 28 However, the GDPR intends to ease restrictions on the processing of sensitive data by explicitly permitting its use for scientific research without consent, when appropriate safeguards are satisfied and the processing is based on Member State law. 29 The GDPR provides strong rights for data subjects, such as the right to be forgotten, the right to object and the right to data portability. However, the Regulation allows Member States to decide where these rights can be applied in the case of processing for public health and scientific research purposes. 30 The main reason for not requiring greater harmonisation on this issue is the lack of conferred competency of the EU in this field, which is primarily regulated by the Member States. 31 The GDPR has direct effect across all EU Member States. However, the GDPR gives Member States limited opportunities to make provisions for how it should apply at a domestic level. The Data Protection Act 2018 (DPA 2018) came into effect in the United Kingdom in May 2018. It states the United Kingdom’s position on areas of the GDPR that are left for each Member State to decide. Furthermore, the DPA 2018 adds requirements that fall outside the GDPR’s scope, such as data processing by law enforcement bodies. Another role of the DPA 2018 is to implement and retain the GDPR requirements, so that the flow of personal data with the EU might be smoother after Brexit.
The DPA 2018 defines health-related research 32 and requires more conditions than the GDPR. These additional conditions include the need for explicit approval by authorities to carry out research. Furthermore, when sensitive data are processed for research purposes, the research purpose has to be ‘in the public interest’, appropriate safeguards must be in place, and the processing needs to be necessary to achieve the aims of the research. 33 The GDPR and DPA 2018 provide several rights to data subjects, such as the right to rectification, objection and restriction of processing. However, the DPA 2018 takes advantage of Article 89 of the GDPR, which allows Member States to derogate from certain rights (e.g. the data subject’s right to object) when the personal data are processed for scientific research purposes. Thus, these rights cannot be applied if they would prevent or seriously impair the achievement of the purposes of the scientific research. 34 Furthermore, the DPA 2018 introduces new offences to protect data subjects. For instance, knowingly or recklessly obtaining or re-identifying personal data without consent would be an offence. 35 With these rules in place, the DPA 2018 has stricter safeguards compared to those of the GDPR. Even with the GDPR in effect, the legal position concerning the secondary use of health data in the United Kingdom is unique. This is especially the case in England, where a national data opt-out framework is in use.
The revised opt-out system
Choice architecture refers to the design of the context in which people make choices. The conscious arrangement of these choices by policymakers can have a significant impact on how we make decisions. For instance, research suggests that people tend to choose default options, a phenomenon known as the ‘default effect’. 36 One of the most commonly cited examples of this behaviour is post-mortem organ donation. Two main ‘default systems’ exist globally: (i) the opt-in system, which requires explicit consent from the deceased; and (ii) the opt-out system, whereby consent is automatically assumed. 37 In an attempt to improve organ donation rates, many countries have moved from opt-in systems, where citizens must express their willingness to be an organ donor, to opt-out systems, where consent is implied unless individuals have expressed otherwise by opting out. 38 This change has resulted in significant differences in donation rates. 39 For example, countries such as Spain, Austria, France, Hungary, Poland and Portugal have all implemented opt-out systems and the numbers of organ donations are reputed to have increased exponentially. 40 This compares favourably to other countries such as Denmark and the Netherlands 41 which have opt-in systems. 42 As with organ donation, making the secondary use of health data the default option is likely to increase the number of citizens involved. As of March 2019, only 2.74% of the population in England had chosen to opt out from the secondary use of their health data, 43 which means that the data of the remaining 97.26% could in theory be shared for secondary use outside direct care.
With sharing as the default position, the purpose of the opt-out system is to enable the use of confidential data for research and policy planning without requiring explicit consent, while nonetheless respecting the data subject’s autonomy. The previous opt-out system provided patients with two choices: ‘type 1’ and ‘type 2’ opt-outs. Whereas the type 1 opt-out prevents information from being shared beyond the GP for purposes other than direct care, type 2 opt-out prevents data from being shared outside NHS Digital for purposes beyond the individual’s direct care. 44 The revised system was introduced in response to public dissatisfaction and the Caldicott Review. In this new system, ‘national data opt-out’ refers to the patients’ decision to opt out from the use of their data for research or planning purposes. 45 Patients were able to state their health data sharing preferences from May 2018. Existing type 1 opt-outs will be respected until 2020, whereas type 2 opt-outs have been automatically converted to national data opt-outs. 46 Although the previous opt-out system 47 has been revised, the new system 48 has similar drawbacks and exemptions.
Similar to the previous type 2 opt-out, the revised national data opt-out 49 will apply only where personally identifiable (not de-identified) data are shared for purposes beyond direct care, namely for research studies or to help in managing the efficient and safe operation of the health and care system. 50 Sharing confidential personal information for purposes beyond direct care will still be subject to data protection laws and the common law duty of confidentiality. What has changed is that patients will have more options through which to state their preferences. The revised opt-out system allows individuals to register online, by telephone or in person at the office of their GP. Type 1 opt-outs will cease to be available from 2020.
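The scope rule described above can be expressed as a simple decision function. The following Python sketch is purely illustrative: the names `Purpose`, `DisclosureRequest` and `opt_out_blocks_disclosure` are hypothetical, and the logic deliberately ignores the other exemptions and legal bases that apply in practice. It only shows why a registered opt-out has no effect once the data are treated as de-identified or the purpose is direct care.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    DIRECT_CARE = "direct care"
    RESEARCH = "research"
    PLANNING = "planning"


@dataclass(frozen=True)
class DisclosureRequest:
    """Hypothetical description of a proposed data disclosure."""
    purpose: Purpose
    identifiable: bool  # False when the data are treated as de-identified


def opt_out_blocks_disclosure(request: DisclosureRequest,
                              patient_opted_out: bool) -> bool:
    """Return True if a national data opt-out would stop this disclosure.

    Simplified reading of the rule in the text: the opt-out applies only
    to identifiable data shared for purposes beyond direct care.
    """
    if not patient_opted_out:
        return False  # no opt-out registered, nothing to apply
    if request.purpose is Purpose.DIRECT_CARE:
        return False  # direct care is exempt from the opt-out
    if not request.identifiable:
        return False  # de-identified data are exempt from the opt-out
    return True
```

Under these assumptions, an opted-out patient's record would still be released for research whenever `identifiable` is set to `False`, which is the exemption the article goes on to criticise.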
The national data opt-out 51 is communicated to patients in several ways: by healthcare staff, leaflets, posters, online 52 and via telephone service. However, the type 1 opt-out is not communicated via these publicly available materials. NHS information 53 on the revised opt-out system states that the opt-out will focus on how the data are being used rather than on the type of organisation 54 using the data. This rather vague statement requires further clarification, particularly as both the DPA 1998 55 and DPA 2018 clearly distinguish between data controllers and processors. 56 NHS organisations, NHS Digital and entities outside the healthcare system are separate ‘data controllers’ under this legislation. This means that each organisation needs a legal basis to share personal data. This may not be the case if the organisations receiving personal data act as ‘data processors’, as they would simply process data on behalf of the data controllers. Furthermore, not focusing on the type of organisations receiving personal data may not satisfy public expectations. Public dissatisfaction with the Google DeepMind patient data deal 57 and several corroborative studies 58 indicate that citizens might have concerns about whether their sensitive data are shared with private or public organisations.
The most important exemptions – concerning direct care and anonymisation (which can also involve pseudonymisation in UK terminology, see ‘The legal background: The secondary use of health data and changes to the opt-out system’ section below) – have not been substantially changed compared to the previous opt-out system. 59 Furthermore, it appears that the type 1 opt-out, which was the only option for individuals to refuse the secondary use of their health data, will be removed in 2020. Overall, patients cannot prevent the secondary use of their health data, since the previous and now the revised opt-out systems allow the NHS to share de-identified data with public and private entities.
De-identification, anonymisation and pseudonymisation in the United Kingdom
Anonymisation exists as a broad exemption under the opt-out system in England. 60 On the one hand, an appropriate security measure such as anonymisation is crucial to protect personal data. On the other hand, it is questionable whether anonymisation can provide a legal basis for the further processing of personal data. Furthermore, the relevant laws and guidelines are inconsistent in their use of the term. As anonymisation allows opt-out choices to be set aside and data to be used for secondary purposes with significantly fewer limitations, this wide exemption requires consistent definition and application, which is currently lacking in the United Kingdom. In addition, the GDPR became applicable in the EU on 25 May 2018, and the United Kingdom is therefore required to comply with the Regulation. At the time of writing, the United Kingdom plans to exit the EU, after which it will no longer be subject to the GDPR. 61 However, since leaving the EU might hinder the flow of personal data, the United Kingdom intends to comply with the GDPR, 62 which means its definitions and data protection terminology need to be consistent with the Regulation.
De-identification methods represent a broad spectrum of tools and techniques to protect data subjects’ privacy. The two ends of this spectrum are clear: at one end are personal data 63 without de-identification, which directly identify the data subject; at the other end are anonymous or aggregated data, which cannot identify particular individuals. Between these two positions lies a wide range of methods and techniques, which need further clarification. Pseudonymisation is a ‘middle ground’ and involves the separation of data from the direct identifiers (e.g. name, address, NHS number), so that re-identification is not possible without additional information (the ‘key’) which is held separately. Differentiating among these techniques and methods is challenging in the United Kingdom, as the term ‘anonymisation’ is overly broad: it can be applied across much of the spectrum, and it is not consistently defined or employed in the official guidelines, such as the Information Commissioner’s Office (ICO) Code of Practice on Anonymisation. 64 Furthermore, the term ‘anonymisation’ has a different meaning in the law of the United Kingdom than under the GDPR. In our article, we use the terms ‘anonymisation’ 65 and ‘pseudonymisation’ 66 in a manner consistent with the GDPR and Article 29 Working Party 67 decisions. According to these sources, ‘anonymisation’ means that the data subject is (theoretically) no longer identifiable; thus anonymised data cannot be linked to an individual and fall outside data protection law. 68 For instance, when statistics are made from a group of people (e.g. patients registered with lung cancer in the past decade in London), and the group is large enough, it is not possible to identify the individuals. Hence, statistics authorities can publish reports and findings about the population without posing a risk to citizens’ privacy.
Anonymisation is also defined by international standards such as the ISO (International Organization for Standardization), with the same meaning as in the GDPR. 69
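The idea that statistics are safe to publish only when ‘the group is large enough’ can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not a method taken from the GDPR, the ISO standard or the ICO Code: the function name `aggregate_counts` and the threshold `MIN_GROUP_SIZE = 5` are hypothetical choices made here, and suppressing small groups does not by itself guarantee anonymity.

```python
from collections import Counter

# Hypothetical threshold: groups smaller than this are not published.
MIN_GROUP_SIZE = 5


def aggregate_counts(records, key, min_group_size=MIN_GROUP_SIZE):
    """Count records per group and drop groups that are too small.

    `records` is an iterable of dictionaries and `key` names the field
    to group by. Only the counts leave this function; no individual
    record is returned.
    """
    counts = Counter(record[key] for record in records)
    return {group: n for group, n in counts.items() if n >= min_group_size}
```

For example, given six fictional records for one area and two for another, only the count for the first area would be released; the second would be withheld because two patients could be singled out too easily.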
On the other hand, ‘pseudonymisation’ is a useful security measure which reduces the linkability of a data set to the data subject. In this case, personal data cannot be linked to a person without the use of additional information. For instance, instead of a full name, a citizen’s code name might be ‘536b45’, so it is not possible to recognise him or her without knowing how the code was generated. To re-identify the data subject, a ‘key’ would be necessary; pseudonymised data can therefore be linked back to the individual with the proper tools. However, the possibility of re-identification means that processing pseudonymised data poses a higher risk than using truly anonymous data. In the following, the authors refer to both of these measures as ‘de-identification’ techniques.
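One common way to generate such code names is a keyed hash, in which the secret key plays the role of the separately held ‘additional information’. The Python sketch below is an assumption-laden illustration, not the method used by NHS Digital or prescribed by any of the guidelines discussed here: the function names, the field name `nhs_number` and the 16-character pseudonym length are all hypothetical.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, key: bytes, length: int = 16) -> str:
    """Derive a pseudonym from a direct identifier using HMAC-SHA256.

    Without `key`, the pseudonym cannot be recomputed from the identifier.
    Truncating the digest shortens the code but raises collision risk.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def pseudonymise_records(records, key, id_field="nhs_number"):
    """Replace the direct identifier in each record with its pseudonym."""
    shared = []
    for record in records:
        row = {k: v for k, v in record.items() if k != id_field}
        row["pseudonym"] = pseudonymise(record[id_field], key)
        shared.append(row)
    return shared


def reidentify(pseudonym, known_identifiers, key):
    """Key holder only: find which known identifier maps to a pseudonym."""
    for identifier in known_identifiers:
        candidate = pseudonymise(identifier, key)
        if hmac.compare_digest(candidate, pseudonym):
            return identifier
    return None
```

The sketch makes the risk discussed above concrete: whoever holds the key and a list of candidate identifiers can reverse the pseudonymisation, which is why pseudonymised data remain personal data under the GDPR.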
The data protection terminology in the United Kingdom differs from that in the GDPR, since ‘anonymisation’ is mostly used as a synonym for ‘de-identification’ in the United Kingdom, as an umbrella term for both anonymisation and pseudonymisation. 70 However, the key issue is that the terms are not used consistently in the United Kingdom, even in the most important documents and guidelines. As Table 1 shows, the terms are used inconsistently in the ICO Anonymisation Code of Practice, which was published in 2012 to help data controllers manage data protection risks, and in the Caldicott Review, which proposed data security standards to clarify how people’s health data can be used for secondary purposes and under what circumstances data subjects can opt out. The Caldicott Review, 71 government documents and information websites reference the ICO Code of Practice on Anonymisation as a standard of anonymisation for both public and private organisations. The Department of Health also made it clear that the opt-out should not apply to anonymised information, 72 which is in line with the ICO Code of Practice on Anonymisation. The ICO Code defines ‘anonymisation’ as a broad term covering various techniques used to convert personal data into de-identified data. The Code draws a distinction between the anonymisation techniques used to produce aggregated information, and those, such as pseudonymisation, that produce anonymised data on an individual-level basis. 73 Difficulties arise, however, when an attempt is made to find consistency in the data protection terminology across the whole text of the Code. The Caldicott Review has similar issues, as illustrated in Table 1. The first row of Table 1 describes the meaning of ‘de-identification’ provided in the ICO Code of Practice and the Caldicott Review. The ICO Code uses the term de-identification in ways similar to the use of the term by the Article 29 Working Party; it can mean both anonymisation and pseudonymisation. In contrast, in the Caldicott Review, de-identification only means pseudonymisation. The second row compares how these two documents use the term ‘anonymisation’. The ICO Code of Practice applies it as a synonym for three concepts: complete anonymisation, de-identification and pseudonymisation. On the other hand, the Caldicott Review uses the term anonymisation mostly in the sense of complete anonymisation.
Table 1. Data protection terminology in the ICO Code of Practice and the Caldicott Review.
The following example further demonstrates the confusion that exists around the concept of anonymisation (not just in the whole system, but within one Code). The ICO Code of Practice states that: ‘There is clear legal authority for the view that where an organization converts personal data into an anonymised form and discloses it, this will not amount to a disclosure of personal data. This is the case even though the organisation disclosing the data still holds the other data that would allow re-identification to take place. This means that the DPA (Data Protection Act) no longer applies to the disclosed data.’ 74
It is also concerning that whereas the Code is mostly consistent with the idea that anonymisation is equal to de-identification (which involves pseudonymisation), it invokes one of the EU data protection rules, which applies only to anonymous (completely anonymised) data: ‘This code was also published with Recital 26 and Article 27 of the European Data Protection Directive (95/46/EC) in mind. These provisions make it clear that the principles of data protection do not apply to anonymised data and open the way for a code of practice on anonymisation.’ 75
Moreover, the process of de-identification is itself a form of data processing, 84 which requires a legal basis. However, the GDPR and DPA 2018 require organisational and technical safeguards, especially in the case of processing sensitive data, so de-identification can be seen as an obligation for data controllers to secure data. 85 Data controllers need to apply anonymisation and pseudonymisation techniques as security-enhancing measures even when their purpose is not to later re-identify the data subject. Hence, consent is not necessary to protect personal data with de-identification techniques. However, according to the DPA 2018 86 and GDPR, 87 there must be a legitimate reason for re-identification, since personal data must be collected for specific purposes and not further processed in a manner that is incompatible with those purposes. Therefore, re-identification must be an integral part of the purpose of the processing. 88 For instance, in biobanking, re-identifying and re-contacting participants might be necessary to obtain additional data and samples, or to involve them in new medical research. 89 In these cases, patients have given their consent to be re-contacted in the future 90 and ethical approval 91 might also be needed in order to re-contact them.
Hence, anonymisation and pseudonymisation of personal data are required by the GDPR and DPA 2018; yet re-identification needs a legal basis, such as consent or public interest. Despite the shortcomings of de-identification techniques, they can play an essential role in risk mitigation. However, these methods cannot justify the secondary use of health data while bypassing the choice of individuals to opt out. In the United Kingdom, the lack of coherent standards for de-identification is a significant hurdle to responsible data processing.
Direct care
Both ‘direct care’ 92 and ‘individual care’ constitute exemptions in the previous and revised opt-out systems. However, their scope and definition need further clarification under the relevant laws and guidelines. The definition of direct care is not regulated by UK law and can be found only in the National Data Guardian’s Review on information governance in the health and care system (Caldicott Review 2013). 93 The Review is not a legally binding instrument; it comprises recommendations in relation to the public’s health for all the jurisdictions of the United Kingdom. It defines direct care as: ‘A clinical, social or public health activity concerned with the prevention, investigation and treatment of illness and the alleviation of suffering of individuals. It includes supporting individuals’ ability to function and improve their participation in life and society. It includes the assurance of safe and high quality care and treatment through local audit, the management of untoward or adverse incidents, person satisfaction including measurement of outcomes undertaken by one or more registered and regulated health or social care professionals and their team with whom the individual has a legitimate relationship for their care.’
The public interest and de-identification techniques
Each society has different privacy expectations. In some countries, citizens’ trust in their government is strong, and they may accept the secondary use of their data with fewer limitations. In other countries, people might be more concerned about sharing their most sensitive information. Thus, drafting policies and frameworks for the secondary use of health data on an international level is problematic. One size cannot fit all; even the GDPR operates with broad opening clauses, leaving room for Member States to further process health data for scientific research. The rejection of the care.data programme and public consultations indicated that people in England do have concerns regarding the secondary use of their health data outside the NHS ‘family’ 95 and they value the ability to opt out. 96 As a result, it is crucial to balance privacy risks, the public interest and de-identification techniques when opt-outs are not applied to the secondary use of health data in England.
It might be the case in the future that all citizens will have equal access to technology and the Internet. In such a scenario, it may be easier to gain the (re-)consent of people to use their health data, and personalised (anonymous or not), digital communication interfaces 97 may connect researchers and participants. 98 However, currently, it seems impossible to gain the permission of the whole, or at least a significant part, of society for secondary data processing purposes. 99 As a result, finding the proper balance is challenging, especially as the meanings of public interest and privacy risk are context-specific. Moreover, this balancing process is complicated by the fact that the efficacy of anonymisation and pseudonymisation methods is constantly being challenged by data scientists. 100
A further reason to examine the role of de-identification techniques in the secondary use of health data is the revised opt-out system in England, which permits patient opt-outs to be ignored in cases of anonymised or pseudonymised data. The processing of health data may provide means of enhancing individuals’ healthcare outcomes directly and indirectly through biomedical research. 101 However, the further use of health data poses complex ethical, legal and technical challenges. 102 Due to the sensitivity of health data, respecting and protecting individual autonomy and privacy is of crucial importance. Apart from cases in which the data subject has given direct consent, the law may also allow for the further processing of health data for secondary purposes. When this is the case, it is vital that the processing serves the public interest at least to a minimum extent, because such secondary use places the public good above individual consent. The notion of presumed consent leads to the reasonable expectation that the data subject’s opt-out has to be respected, except in special cases. In these special cases, at least a general level of public interest needs to be apparent and stronger safeguards need to be put in place. The safeguards may involve de-identification techniques. 103
Below, three data processing scenarios will be evaluated to further illustrate the dynamic connection between the public interest and de-identification techniques. The first data processing situation involves low-level perceived public interest; the second scenario has a medium-level interest; and the final data processing situation presupposes high-level perceived public interest.
Low-level public interest
Most data processing happens without any (or with relatively low) public interest. For instance, when a smartphone application counts a user’s steps, the public interest is not immediately apparent. The data are valuable for the user to measure their daily routine. However, knowing the sport and commuting habits of society as a whole can be beneficial for the public. For instance, governments may use the data to promote public sports programmes and organise public transportation more efficiently. This raises the question of whether these sports programmes and slight public transport improvements justify the secondary use of data without consent, or even while ignoring the data subject’s opt-out. When the secondary use of data benefits only a small portion of society, or mostly serves private interests, the balance of privacy, de-identification techniques and research interests must be carefully reconsidered. Either the risk of re-identifying the individual must be minimised (e.g. by using aggregated data), or the data controller should have to acquire explicit permission from the data subject.
As Figure 1 illustrates, the data subject’s privacy prevails in data processing scenarios with lower levels of public interest, thus personal data can be processed only with explicit consent. Furthermore, the data subject’s autonomy is not affected by de-identification techniques, since collecting anonymised information also requires the data subject’s consent. Even a fitness tracking smartphone application should require the user’s consent before collecting anonymous information, as there is a small risk that even anonymous data may pose threats to the data subject, as has been demonstrated in numerous cases. 104 However, aggregated statistical data 105 may be an exception to this rule, as this type of data is based on large numbers of data subjects, so that individuals are not identifiable. Hence, it is safe enough to provide information while protecting the privacy of the citizens behind the statistics. 106 Overall, in the case of a low level of public interest, de-identification techniques cannot justify overriding the data subject’s consent, except when dealing with aggregated statistical data.

Figure 1. The ethical secondary use of de-identified health data.
General (medium) level public interest
When a smartphone application is used to measure a user’s activity, it does not place any burden on society. However, when the citizen enters into a public hospital, there is an apparent public interest in providing a high-level service in a cost-efficient way. Managing a public health system is challenging for every country; moreover, the ageing population significantly burdens many countries. As a result of the public interest at stake here, many countries have started looking for ways to use their health data for secondary purposes. Since aggregated data has limited value for research, pseudonymisation has become a widely applied data security measure.
If there is a general level of perceived public interest, appropriate de-identification techniques and safeguards may provide more space for data controllers to further process health data. However, the secondary purpose needs to be lawful and compatible with the original one. In the case of a public interest scenario, the secondary use of health data has the potential to improve care for the entire population or for a subset which needs special care and help (e.g. patients with cancer). Furthermore, many treatments lack proof of their efficacy and may, in fact, cause harm. 107 The secondary use of data may help to achieve these goals by identifying ‘at-risk’ patients for participation in ongoing research and by data mining in search of clinical patterns not previously known. 108
The new data protection laws tend towards recognising that scientific research may be regarded as a compatible new purpose for the secondary use of data. 109 The GDPR permits the processing of sensitive data for scientific research without consent, when appropriate safeguards are satisfied. 110 The Regulation does not require Member States to provide opt-out provisions to citizens for the secondary use of their health data. However, the scope of secondary use is still not clear. 111 Processing without consent may be more acceptable in situations in which acquiring permission from the data subjects would be impossible or would require disproportionate effort. A classic example is the use of data from public registries. The frequency of these situations is higher in the age of big data because electronic health records are used by researchers to reveal findings which would be impossible to glean without the use of vast data repositories. 112 This type of analysis has great potential for health-improving discoveries through the repurposing of already processed data, but reaching citizens to obtain their consent is often impossible or highly impractical. In such cases of data processing, the law may presume the data subject’s consent and interest because the scientific research can be expected to also serve it through actions taken in the public interest. However, what best serves the population may not always be in the interest of all individuals. 113 The choice to opt out should be respected, as consent may only be presumed until the data subject withdraws from data processing.
As Figure 1 indicates, a focus on the public interest may change the role of de-identification techniques as these methods reduce the risk of privacy invasions. The number of people who were diagnosed with diabetes in England in 2017 is an example of aggregated data. This information does not have implications for citizen privacy, and there is a public interest in knowing this number to prepare the healthcare system accordingly. Producing anonymous (completely anonymised) data may justify the inclusion of people since the risk of privacy violations can be decreased to a level which may be balanced by the public interest. By contrast, pseudonymisation presents a questionable situation. When each patient’s data are necessary to understand the reasons behind the growing number of people with diabetes, then the real balancing process starts. The pseudonymisation of their data constitutes a strong safeguard, and yet the data remain personal after de-identification. Thus, the individual is still identifiable, but only indirectly. 114 However, the public is more open to secondary processing without consent if their data are protected via this method. 115 Two issues have to be discussed: first, the further processing of pseudonymised data without consent; and second, whether or not opt-outs may be ignored in these cases. Pseudonymisation, combined with at least a general level of public interest and appropriate safeguards, can make further data processing possible in an increasing number of situations, as is underscored by the fact that regulations are developing in this direction worldwide. 116 Still, broadly speaking, pseudonymisation with only a general level of public interest does not suffice to justify ignoring the opt-outs. 117 There can, however, be unique situations in which opt-outs may seriously impair researchers’ ability to achieve their objectives. 118 In these cases, their research may not yield reliable results. 119 Biased research may lead to less optimal patient care, as the evidence of the scientific research may be invalid or misleading (failure to capture an important association). It is also possible that the research cannot be started or completed because of the prohibitive costs and administrative burden of opt-outs. 120 However, consent bias needs to be carefully evaluated and not overstated. 121 In these special cases, the opt-out may be compromised when a general level of public interest is perceived, the application is reviewed by institutional review boards and other committees, and appropriate safeguards are in place. If any of these conditions are missing, the opt-out must be upheld and respected.
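The observation above that pseudonymised data remain personal, with the individual still indirectly identifiable, can be illustrated with a minimal sketch. It assumes keyed hashing (HMAC-SHA256) as the pseudonymisation method, which is only one of several possible techniques and is not prescribed by the guidelines discussed in this article. The function name, the key and the patient identifier are hypothetical and chosen purely for illustration.

```python
import hashlib
import hmac


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (assumed technique)."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key, held by the data controller or a trusted third party.
key = b"illustrative-secret-key"

# The same patient always receives the same pseudonym, so records stay linkable.
first_visit = pseudonymise("patient-0001", key)
second_visit = pseudonymise("patient-0001", key)
other_patient = pseudonymise("patient-0002", key)

# Whoever holds the key can recompute the pseudonym for a known identifier,
# which is why the individual remains indirectly identifiable.
can_be_relinked = pseudonymise("patient-0001", key) == first_visit
```

Because the link between pseudonym and person can be restored by the key holder, the output of such a function is a safeguard rather than anonymisation.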
High-level public interest
There are situations in which the data subject’s autonomy has to be put aside for the benefit of the whole society or humanity more generally. For instance, in the case of a pandemic, the pressing interest in research to halt the outbreak is more important than the individual right to privacy. Since a high-level public interest may result in even the deprivation of fundamental rights, through, for example, quarantining people, 122 the further use of health data without any de-identification technique may be appropriate (see Figure 1). In such a situation, individuals would have no right to opt out of these essential methods of data processing, in order to fulfil a substantial and high-level public interest.
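The three scenarios above can be summarised as a simple decision rule. The following sketch is a deliberate simplification under stated assumptions: the enumerations, the function name and the boolean flags are hypothetical, and a real assessment would involve legal and ethical judgement rather than fixed categories.

```python
from enum import Enum


class PublicInterest(Enum):
    LOW = 1
    GENERAL = 2
    HIGH = 3


class DataForm(Enum):
    AGGREGATED = 1
    ANONYMISED = 2
    PSEUDONYMISED = 3
    IDENTIFIABLE = 4


def may_process_without_consent(
    interest: PublicInterest,
    form: DataForm,
    *,
    opted_out: bool = False,
    safeguards: bool = False,
    ethics_review: bool = False,
    opt_out_impairs_research: bool = False,
) -> bool:
    """Simplified reading of the balancing framework described in this section."""
    # High-level public interest: processing may proceed even without
    # de-identification, and there is no right to opt out.
    if interest is PublicInterest.HIGH:
        return True
    # Aggregated statistics are treated as the exception at every level.
    if form is DataForm.AGGREGATED:
        return True
    # Low-level public interest: anything else requires explicit consent.
    if interest is PublicInterest.LOW:
        return False
    # General (medium) level from here on.
    if form is DataForm.ANONYMISED:
        return True
    if form is DataForm.PSEUDONYMISED:
        if not safeguards:
            return False
        if not opted_out:
            return True
        # An opt-out is overridden only in the special cases: the opt-out would
        # seriously impair the research and the application has been reviewed.
        return ethics_review and opt_out_impairs_research
    # Directly identifiable data are not covered at the general level.
    return False
```

For example, under these assumptions a pseudonymised record of a patient who has opted out is excluded at the general level unless safeguards, ethics review and a demonstrated impairment of the research are all present.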
Conclusion
The opt-out system was revised in England after much planning and public debate. However, the revised system did not significantly change the secondary use of health data as the opt-outs do not apply to anonymous and pseudonymised data. Since de-identified data constitute a broad exemption from the previous and revised systems, standardised definitions and guidelines are essential. As this article points out, this is not currently the case in England as the definitions in the most relevant guidelines remain confusing. Further, the type 1 opt-out, which truly stopped the secondary use of health data at the GP, will not be an option from 2020.
A considerable volume of literature already exists on the balancing of public interest and patient privacy for the secondary use of health data. This work aimed to contribute to the growing body of literature by adding a third dimension, de-identification techniques, to the balancing process to highlight new issues. As has been demonstrated, there are numerous types of de-identification techniques. Applying them as a legal basis for the secondary use of health data and as an exemption from the opt-out system raises concerns, since these methods would require further legal clarification and industry standards. Since the United Kingdom intends to comply with the GDPR after Brexit, conformity in terminology would be essential to avoid potential barriers for scientific research.
The public interest in data processing needs to be balanced with the risks posed to the data subjects when their data are processed without consent. In cases of low public interest, only anonymous data can be processed. In these cases, the strongest safeguards are necessary; for instance, researchers should not be allowed to access individual data, only aggregated information. With general or medium public interest, such as improving healthcare quality and personalised medicine programmes, the secondary use of pseudonymised data is acceptable. In the case of high-level public interest, such as epidemic diseases, processing data without de-identification might be necessary, since the risk for the whole society outweighs individual privacy. However, there are challenges in communicating this balancing process to the public. If citizens are empowered to exercise control over their data, it is crucial to clarify the limits of their decisions, since transparency and control together can build trust, which is the foundation of effective health and care services.
Footnotes
Authors’ note
Janos Meszaros and Chih-hsing Ho are joint first authors.
Author contributions
Janos Meszaros and Chih-hsing Ho contributed equally to this article.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is funded by the Multidisciplinary Health Cloud Research Programme. Academia Sinica, Taipei, Taiwan.
