Abstract
In the age of big data, many countries are actively establishing and implementing de-identification policies. There are many efforts to institutionalize the de-identification of personal information in order to protect privacy while enabling the use of personal information. Even with such efforts, however, de-identification policy always carries the risk that de-identified information can be re-identified by being combined with other information. Therefore, it is necessary to consider a mechanism for managing these risks as well as a mechanism for distributing responsibilities and liabilities in the event of incidents involving the invasion of privacy. So far, most countries implementing de-identification policies have focused on defining what de-identification is and on the exemption requirements that allow free use of de-identified personal information. By contrast, there has been little discussion of how to distribute responsibility for the risks and liabilities involved in the process of de-identifying personal information.
The purpose of this study is to compare the de-identification policies of the European Union, the United States, Japan, and Korea, all of which are now actively pursuing such policies. Additionally, this study examines these policies from the perspective of risk society and risk-liability theory. The constituencies of the de-identification policies are identified in order to analyze the roles and responsibilities of each, thereby providing a theoretical basis on which to initiate discussions on the distribution of the burdens and responsibilities arising from de-identification policies.
Keywords
Introduction
During the World Economic Forum (WEF) held in January 2016, the new concept of the ‘fourth industrial revolution’1
The term ‘fourth industrial revolution’ first appeared to describe the stage of converging of manufacturing and ICT industries from the Industry 4.0 project, one of the 10 projects of the High-tech Strategy 2020 announced by Germany in 2010.
U.S. News, “Donald Trump Will Win, Says AI System That Correctly Predicted the Last 3 Elections”, 2016.10.28,
Money News, “Schwab ‘robo adviser’ grows to $5.3 billion in its debut year”, 2016.01.06,
On the other hand, because big data may contain personal and private information, the way big data analysis is performed will need to change under the current personal information protection policy and framework. Today, most countries have adopted the opt-in method for the use of personal information, meaning that whoever uses such information must first notify each individual and obtain his or her permission for a limited duration and a specified purpose and scope. However, given the amount of data being collected and amassed in the age of big data, the opt-in method faces serious difficulties: it is hard to determine which data belong to whom, acquiring the permission of each individual takes exorbitant cost and time, and the collected personal information may be used only for the purpose notified at the time of collection. These difficulties create the need for a new system to replace such an inefficient and cumbersome mechanism.
A good alternative to the opt-in method is the ‘personal information de-identification’ procedure, which provides data usable for analysis while lowering the risk of invasion of personal privacy by removing specific information that can be used to identify an individual. Many countries already regard this methodology as a key enabler of the age of big data and are implementing it as national policy for personal information protection or for big data analysis.
One issue this alternative must address is the possibility that individuals are re-identified when de-identified information is used together with other relevant information acquired separately. It is therefore very important to understand the risks and limitations of de-identification policy; unfortunately, to date, research in most countries has been limited to determining the exact definition of de-identification and its scope of applicability. There is an urgent need to identify the constituencies of de-identification policy, establish the role of each, and clearly define the liabilities that each should shoulder in case of accidents and incidents, in order to minimize the risk of re-identification and provide a sense of security to the owners of personal information.
This study aims to propose ways to use personal information safely by appropriately applying de-identification policy in the age of big data. To do this, we carried out research according to the following procedure.
First, we explored the significance and value of de-identification policy in order to understand the current social phenomenon more deeply and to grasp the essence of the discussion, and we discussed the concerns and risks of de-identification policy. Next, we looked into the trend of implementation of de-identification policies in the EU, the United States, Japan, and Korea. We compared the de-identification policies of these regions in terms of the definitions of terms related to de-identification, the principles of de-identification, and its legal treatment, and we identified the main elements of each policy to derive common de-identification procedures.
Following this analysis, this study looked at the de-identification policies from the perspective of risk and liability based on risk society theory. We described the need to approach the policy in terms of risk management, given the nature of de-identification itself, and described the implications of de-identification policy in light of the theory of the risk society of modernity. Furthermore, we identified the constituents involved in de-identification policies, derived their roles, and explored how risk responsibilities should be distributed among the constituents in the operation of de-identification policies.
In particular, we conducted a case study on Korea in order to apply the proposed structure of responsibility in de-identification policy to actual cases and to analyze the issues involved. Korea provides the most specific guidelines for the de-identification of personal information, and it is a country where the debate on de-identification policy has progressed to a great extent and the relevant issues are coming to light. Moreover, Korea has experienced rapid industrialization, and it exhibits both the benefits of the information society and the social contradictions and conflicts of a risk society. Therefore, the case study of Korea has important value at the intersection of the characteristics of risk society theory and those of the information society (Lee, 2005).
As a result, this study presents a policy proposal on how to approach the issue of responsibility distribution in the de-identification policy and provides valuable guidance in designing related policies in various countries in the future.
De-identification of personal information
Meaning and value of de-identification of personal information
As the significance of Big Data technology increases and affects the growth of the economy and the competitiveness of companies, governments and companies invest an enormous amount of time and money to extract meaningful data from the Big Data domain. For example, FICO, a US data analytics company specializing in credit rating services, measures consumer credit risk based on the use of mobile devices and electricity and water consumption as well as rent or mortgage payments. Kreditech, a German online lender that provides loan services to customers based on their credit history, analyses customer behaviour patterns on Facebook, eBay, and Amazon to estimate how thoroughly they read personal loan terms and conditions (Financial Times, 2016). Although personal information is very useful to government and business institutions, access to these data is regulated by personal information protection and personal privacy acts. To date, prior consent has been the means of satisfying these regulations and accessing personal information. However, prior consent is not entirely effective, as it is very difficult to identify the individuals behind an enormous quantity of new data and obtain their consent, and the costs of identifying an individual and gaining his or her consent are extremely high.
Therefore, these problems need to be addressed, and one of the most effective alternatives is the de-identification of personal information. As will be discussed later, the de-identification of personal information is a process that makes it difficult to identify a given individual by separating identifying information from the rest of the collected data. If the individual cannot be identified from the data, the data are not subject to consent regulations, and the de-identified data can be used freely in Big Data analysis. In previous studies, this technique, described as a method of technical circumvention to avoid being subject to the regulations, was often expressed as a removal of the protection capsule of personal information (Oh, 2015).
In most of the leading countries, the scope of utilisation of personal information is limited by prior consent; however, if the de-identification technique is used, these regulations do not apply. In that case, depending on the interpretation of the laws, de-identified personal information can be used for purposes other than those described in the prior consent, and the information can be provided to a third party without additional consent. Thus, de-identification of personal information can be applied to international data transfers, and it is expected that this technique would have a positive effect on the global utilisation of personal information and Big Data.
Concerns and risks of de-identification
Currently, within the countries studied herein, there are many concerns about whether de-identification policies are effective. Recently in South Korea, the government introduced legislation and administrative rules that address de-identification, and the National Assembly is in the process of advancing this legislation. These measures propose a new exception for de-identified information to the rules on the use of personal information under the current personal information protection law and data communication network law. The current laws require that an individual provide his or her prior consent, giving the individual control over the utilisation of his or her personal information as guaranteed by the national constitution and international human rights standards. If government institutions and companies present an authoritative interpretation of de-identification that focuses only on the utilisation of personal information, then they may obtain broad access to and unlimited use of personal information. This exception will overturn the fundamental legal principles of the current law that protects the privacy of the individual.
Furthermore, current laws within several of the countries allow the individual's consent to be dispensed with once information has been de-identified, thereby allowing personal information managers to collect, store, and analyse personal information and to provide it to a third party. Given the current state of de-identification technologies and privacy protection models, it is not possible to entirely eliminate the risk that personal information is re-identified when it is combined with other types of data and information. In other words, de-identified information is always exposed to the risk of being combined, possibly by automated means, with other data that enable re-identification, thereby allowing the current personal information security system to be bypassed in the near future.
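The combination risk described above can be made concrete with a small sketch. The Python example below is an illustration only, not a method taken from any of the policies discussed: the records, field names, and the `link` helper are hypothetical. It shows how a data set with names removed can still be matched against a separately obtained register when both share quasi-identifiers such as postcode, birth year, and sex.

```python
# Toy linkage sketch. All records and field names are hypothetical.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

# "De-identified" data: names removed, quasi-identifiers retained.
deidentified = [
    {"zip": "78701", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "78701", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "78705", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
]

# Auxiliary data obtained separately, containing names.
public_register = [
    {"name": "Alice Example", "zip": "78701", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "78701", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "zip": "78705", "birth_year": 1975, "sex": "F"},
    {"name": "Dana Example", "zip": "78705", "birth_year": 1975, "sex": "F"},
]


def key(record):
    """Build the quasi-identifier tuple used to compare records."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def link(deidentified_rows, auxiliary_rows):
    """Return (name, row) pairs whose quasi-identifiers match exactly one person."""
    candidates = {}
    for aux in auxiliary_rows:
        candidates.setdefault(key(aux), []).append(aux["name"])
    matches = []
    for row in deidentified_rows:
        names = candidates.get(key(row), [])
        if len(names) == 1:  # a unique match amounts to re-identification
            matches.append((names[0], row))
    return matches


reidentified = link(deidentified, public_register)
for name, row in reidentified:
    print(name, "->", row["diagnosis"])
```

In this toy data, two of the three records are matched to a single name, while the third remains ambiguous because two people in the register share its quasi-identifiers. The fewer people who share a given combination of attributes, the higher the re-identification risk.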
There are many cases in which de-identified information has been re-identified, thus losing its protection and exposing the myth of perfect de-identification. In 2006, America Online (AOL) collected 20 million search queries entered into the AOL search engine by 650,000 users over a period of three months and published these data in the public domain for academic research. A journalist from the New York Times, accessing these published data, identified AOL user ‘4417749’ as Thelma Arnold, her age as 62 years, and her marital status as widowed (New York Times, 2006). In 2006, Netflix, the largest online movie rental provider, published the histories of 500,000 users in order to sponsor a competition for the creation of algorithms that use Netflix customer profiles to suggest appropriate movies for these users. By accessing the published data, two researchers at the University of Texas at Austin, Arvind Narayanan and Vitaly Shmatikov, successfully re-identified a subset of the Netflix users by combining the published data with users' openly posted reviews (Wall Street Journal, 2010). On another occasion, the Electronics and Telecommunications Research Institute (ETRI), located in South Korea, collected 6.67 million Korean Facebook accounts and 2.27 million Korean Twitter accounts and analysed these data to identify the specific owners of the accounts. The owners of more than 3% of the accounts were identified (Digital Daily, 2013).
From these cases, it is clear that the risks of premature legislation and policy enactment must be seriously discussed and debated, because such measures may pose a severe risk to the protection of the personal information of citizens. Addressing this exposure should be given higher priority so that policies and laws on the de-identification of personal information can be passed without undue delay.
The de-identification policy frameworks of major countries
De-identification policies for personal information in different regions/countries
European Union (EU)
The personal information protection policy of the EU provides fundamental instruments such as charters, conventions, guidelines, and regulations, on the basis of which each EU member state protects personal information through its own laws and legislation. In order to protect privacy related to personal information and foster the unobstructed flow of personal information between EU countries, the EU in 1995 created rules that govern the protection of personal information, ‘EU Directive 95/46/EC of the European Parliament and of the Council’ (European Parliament and Council, 1995). Recently, on April 14, 2016, the EU adopted the General Data Protection Regulation (GDPR), enforcement of which will begin two years later (European Parliament and Council, 1995). This Regulation includes stricter personal information security measures and strengthens the protections guaranteed by the earlier EU Directive. The EU also redefined the concepts related to personal information de-identification that were stated in the EU Directive.
To analyse the de-identification policies implemented by the EU, it is necessary to carefully study the definitions of personal information-related terms in the EU GDPR. The GDPR defines personal data as ‘any information relating to an identified or identifiable natural person (data subject)’ (European Parliament and Council, 1995). In other words, if information does not relate to ‘an identified or identifiable natural person’, or is insufficient to identify the person, then it falls outside the scope of the protection regulations and can be used freely. Recital 26 of the EU Directive states that the principles of protection do not apply to data rendered anonymous in such a way that the data subject is no longer identifiable (European Parliament and Council, 1995). However, since neither the EU Directive nor the GDPR specifies the technique to be used for de-identification, the boundary for the utilisation of de-identified data is ambiguous. Specific criteria and processing principles have been discussed within the EU Data Protection Working Party (Article 29 Data Protection Working Party, 2014).
There have been many discussions and studies regarding the grey area, or the boundary, between personal and anonymous data. As a result, the EU GDPR introduced the concept of ‘pseudonymisation’. The GDPR defines pseudonymisation as a procedure by which data can no longer be attributed to a specific individual without the use of additional information. Such additional information should therefore be kept separately and be subject to technical and administrative measures in order to prevent re-identification (European Parliament and Council, 1995). Thus, if pseudonymous information is combined with the additional information, re-identification becomes possible and the individual could be recognized.
The EU classifies de-identified information as pseudonymous and anonymous information. Pseudonymous information, which is treated as personal information, can be used under technical and administrative actions, whereas anonymous information, which does not include personal information and is treated as non-personal information, is a candidate for public use.
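As an illustration of pseudonymisation as described above, the following Python sketch replaces a direct identifier with a keyed hash (HMAC-SHA256) from the standard library. This is one possible technique chosen for illustration; the GDPR does not prescribe it. The key plays the role of the ‘additional information’ that must be kept separately, and the key value, field names, and helper functions are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex encoded)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, id_field: str, secret_key: bytes) -> dict:
    """Return a copy of the record with the identifier field pseudonymised."""
    out = dict(record)
    out[id_field] = pseudonymise(record[id_field], secret_key)
    return out


# Hypothetical key. In practice it would be stored apart from the data,
# under separate technical and administrative controls.
SECRET_KEY = b"example-key-kept-in-a-separate-system"

record = {"user_id": "alice@example.com", "purchases": 3}
pseudonymous = pseudonymise_record(record, "user_id", SECRET_KEY)
print(pseudonymous)
```

Whoever holds the key can recompute the same pseudonym for a known identifier and so link the record back to a person, which is why pseudonymous data of this kind remain personal data under the EU classification. Without the key, the pseudonym alone does not reveal the identifier, although the remaining attributes may still permit linkage.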
United States (US)
There is no United States statute within the United States Code (USC) that defines or regulates personal information. Rather, the regulation of personal information varies by state and, in some cases, by counties within a state. Most state regulation regarding personal information is based on case law precedent, under which personal information can be accessed by the public in accordance with local state and county laws.
In October 2015, the United States National Institute of Standards and Technology (NIST), an agency that defines measurement standards, reviewed the discussions related to de-identification collected from various individuals and institutes over a period of 20 years and issued a final report of its review (Garfinkel, 2015). This report presents the most current discussions and trends regarding de-identification standards in the United States. According to this report, personally identifiable information (PII) is the personal information that is subject to protection, and it is defined as ‘any information about an individual maintained by an agency’. Furthermore, PII is ‘any information that can be used to distinguish or trace an individual’s identity and other information that is linked or linkable to an individual’ (Garfinkel, 2015). Based on this definition, PII includes not only distinguishable information but also traceable information that can be linked to an individual, which places a heavy burden on companies that use Big Data. NIST defines de-identified information as information that contains no reasonable or traceable information that can distinguish any two identities (Erika, 2010). Therefore, information is not classified as PII once actions supporting de-identification have been carried out (White House, 2015).
More thorough and systematic approaches to the formulation of de-identification within the United States can be found in the Health Insurance Portability and Accountability Act (HIPAA). Although HIPAA is restricted to the regulation of personal medical information in the medical field and cannot be uniformly applied to other fields, we present it as an appropriate example for understanding the general approaches and principles of de-identification in the United States.
First, Section 164.514 of the HIPAA privacy rule states that ‘health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information’ (Bradley, 2012). This definition identifies the domain of PII. In other words, a de-identified personal health record is not classified as personal information; thus, de-identified information can be used by the public without limitation.
The HIPAA privacy rule describes two methods by which personal information can be treated as de-identified information: the expert-determination method and the safe-harbor method. Under the expert-determination method, an expert uses statistical and scientific techniques to determine the risk that de-identified information can be converted back into personal information. If the risk is low, then the information is considered to be de-identified information and is excluded from regulation under the privacy rule. The safe-harbor method, on the other hand, removes 18 types of data specified by the HIPAA privacy rule. If, after removal of these 18 types from the information set, the individual cannot be identified using the remainder of the data, then the data are excluded from regulation under the privacy rule. If 16 of the 18 types are removed, that is, all except ‘date information’ and ‘other identifying numbers or codes’, then partial regulation applies. To summarize, in the US, actions defined by various individual laws can allow the public to utilise de-identified information as non-personal information without further consent.
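The safe-harbor approach described above can be sketched as a field-removal step. The Python example below is a simplified illustration under stated assumptions, not a compliance tool: the field names are hypothetical, only a handful of the 18 identifier types are represented, and dates are assumed to be ISO strings (`YYYY-MM-DD`) that are reduced to the year.

```python
# Hypothetical field names standing in for a few of the 18 identifier types.
IDENTIFIER_FIELDS = {
    "name",
    "street_address",
    "phone",
    "email",
    "ssn",
    "medical_record_number",
}

# Date fields are assumed to hold ISO strings; only the year is retained.
DATE_FIELDS = {"birth_date", "admission_date"}


def safe_harbor_sketch(record: dict) -> dict:
    """Drop listed identifier fields and reduce listed dates to the year."""
    cleaned = {}
    for field, value in record.items():
        if field in IDENTIFIER_FIELDS:
            continue  # identifier removed entirely
        if field in DATE_FIELDS:
            cleaned[field.replace("_date", "_year")] = str(value)[:4]
            continue
        cleaned[field] = value
    return cleaned


patient = {
    "name": "Jane Example",
    "phone": "555-0100",
    "birth_date": "1961-04-12",
    "admission_date": "2015-10-03",
    "state": "TX",
    "diagnosis": "asthma",
}
print(safe_harbor_sketch(patient))
```

A rule-based list of this kind is simple to apply and audit, which is its main appeal compared with expert determination. It does not by itself guarantee that the remaining attributes cannot be combined to single out an individual.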
Japan
In Japan, the Personal Information Protection Act was approved in 2003 and became effective on April 1, 2005. Different laws are applied to the private sector and the public sector; the Personal Information Protection Act regulates only the private sector, whereas the public sector is regulated by the Personal Information Protection Acts for administrative institutions and independent administrative agencies. In order to keep pace with advances in information and communication technologies and the Big Data era, the IT Comprehensive Strategy Headquarters in Japan began to improve the de-identification legislation; the amendment was completed on September 3, 2015, and is to take effect in 2017.
The main focus of the amended act on the protection of personal information is to introduce the concept of de-identified information, which can be accessed by the public without the need to obtain consent. The act defines personal information as either of the following: (i) name, date of birth, or other information that can identify individuals, or (ii) any information that contains a personal identification code (IT Strategy Headquarters, 2016). The act also defines de-identified information as information produced by processing personal information so that it cannot be restored, through partial removal of the information in (i) or complete removal of the information in (ii) (IT Strategy Headquarters, 2016). Since de-identified information is defined as non-re-identifiable information, it is treated as non-personal information, and the technical and administrative duties to manage these data are assigned to the business operator handling the de-identified information (IT Strategy Headquarters, 2016).
Like the United States, Japan has published de-identification guidelines based on the amended act for the field of medicine, which utilises personal information and advanced technologies. The guideline is the ‘De-identification Technical Guide for the Use of Medical Information ver. 1.0’, published by the Japan Medical Imaging and Radiological Systems Industries Association (JIRA). This guide claims that complete de-identification is impossible and classifies de-identification into three levels as follows: (a) connectible de-identified data (pseudonymisation), (b) so-called de-identified data, and (c) highly de-identified data. It is expected that the boundary that determines whether personal information is assigned to level (b) will be discussed and determined by the Personal Information Protection Commission, which was established in January 2016.
Korea
In Korea, the personal information policy consists of the public and private sectors, each of which was separately regulated prior to 2011 by two distinct acts. As the demand for a general law regulating the protection of personal information increased, the current law was legislated on March 29, 2011, and became effective on September 30, 2011 (Ministry of Government Administration and Home Affairs, 2011). After this, guidelines for the de-identification of personal information were published by various government departments as shown in Table 1 to protect personal information and promote investment in Big Data institutes.
Guidelines for de-identification in Korea
De-identification of personal information in Korea and proper actions at each level. Source: Cabinet Office et al. (2016).
Most recently, on June 30, 2016, the relevant departments, including the Cabinet Office; the Ministry of Government Administration and Home Affairs; the Korea Communications Commission; the Financial Services Commission; the Ministry of Science, ICT and Future Planning; and the Ministry of Health and Welfare published the ‘Guideline for the De-identification of Personal Information’. It is reported in the guideline that all previous guidelines would be replaced beginning July 1, 2016; therefore, this latest guideline will be discussed to examine the details of the de-identification policies in Korea.
The Personal Information Protection Act in Korea defines personal information as any information that can be used to distinguish identities such as name, social security number, and personal images, or any information that can be combined with other information to constitute identifiable information. In order for such information to be utilised, the guideline defines de-identification as an action to make the information indistinguishable by partially, or even completely, deleting or replacing distinguishing factors. Furthermore, properly de-identified information is not considered as personal information since the information is not distinguishable, and therefore, that information can be utilised by any person or be provided to a third party.
Moreover, the guideline suggests four de-identification steps: (1) preliminary review, (2) de-identification, (3) appropriateness evaluation, and (4) follow-up management. The first action, preliminary review, examines whether the information to be used is personal information according to the Personal Information Protection Act. The second action is de-identification, in which identifiers4
An identifier is defined as the unique number or name that is assigned to an individual or an object of an individual; there are 14 values in total including name, address, date, and phone number, according to HIPAA privacy rules (Cabinet Office, 2016).
An attribute value is defined as any information that can be used to identify individuals by being combined with other information; it is categorized into individual, physical, credit, electronic, and family properties (Cabinet Office, 2016).
Applying de-identification at each of these steps might not be practical for small and mid-sized businesses and start-ups, and the details of the application can vary across fields; therefore, specialised agencies in each field have been designated and operated in Korea since August 2016, and a separate support centre has been established and operated within the main centre for personal information protection, the Korea Internet and Security Agency (KISA).
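To give a sense of what the third step, appropriateness evaluation, might involve, the following Python sketch computes k-anonymity over a set of quasi-identifier columns. The choice of k-anonymity as the measure, the threshold, and all column names and records are assumptions made for illustration; the text above does not specify how the evaluation is carried out.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def passes_evaluation(rows, quasi_identifiers, k_threshold):
    """Hypothetical adequacy rule: every group must contain at least k_threshold rows."""
    return k_anonymity(rows, quasi_identifiers) >= k_threshold


# Hypothetical data after generalising age into bands and address into regions.
rows = [
    {"age_band": "30-39", "region": "Seoul", "purchase": "A"},
    {"age_band": "30-39", "region": "Seoul", "purchase": "B"},
    {"age_band": "40-49", "region": "Busan", "purchase": "A"},
    {"age_band": "40-49", "region": "Busan", "purchase": "C"},
    {"age_band": "40-49", "region": "Busan", "purchase": "B"},
]

k = k_anonymity(rows, ("age_band", "region"))
print("k =", k)
print("meets k >= 3:", passes_evaluation(rows, ("age_band", "region"), 3))
```

A data set that fails such a check would go back to the de-identification step for further generalisation or suppression before release, which matches the iterative character of the four-step procedure.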
Definitions of terms related to de-identification
The definitions of the terms for de-identification of personal information in each of the different regions and countries are summarized in Table 2 according to their relevant laws and guidelines.
These terms differ from region to region. For example, ‘pseudonymisation’ is used in the EU, ‘de-identified information’ in the US, Japan, and Korea, and ‘anonymised information’ in the EU and US. All regions and countries define de-identified information as information that cannot be used to distinguish individuals, in contrast with personal information, which can distinguish individuals. The EU defines pseudonymous data as data that cannot be attributed to an individual; this classification differs somewhat from that used by the other countries, but its meaning in the larger context is similar.
Definitions of terms related to de-identification of personal information in different regions/countries
A distinctive feature of the definitions used in the EU and US is that they classify the terms for de-identification into two groups according to the level of de-identification. The two terms used in the EU are pseudonymisation and anonymisation, whereas the two terms used in the US are de-identified information and anonymised information. For data to be classified as de-identified, the EU requires that identification be impossible ‘without the use of additional information’, and the US requires only that it be impossible with ‘the remaining information’. This means that the de-identified information could be re-identified if additional information is provided and combined with it. Furthermore, in the definitions of anonymisation (EU) and anonymised information (US), the data must satisfy the criteria of ‘irreversibly preventing identification’ (EU) and of any type of ‘code or other association for re-identification no longer exists’ (US), confirming that the criterion distinguishing the two terms for de-identification is the possibility of re-identification.
On the other hand, Japan and Korea do not categorize the terms according to the level of de-identification but define a single term. Because de-identified information in Japan is defined as data that ‘do not allow restoring of said personal information’, it denotes information for which there is no possibility of re-identification. Japan nevertheless aims to foster a discussion on categorizing de-identified information into three levels according to the level of de-identification documented within the guideline (JIRA, 2015). Similarly, Korea has no separate definition for anonymisation other than the term ‘de-identified information’, and the ‘De-identification Guideline for Personal Information’ states that the ‘de-identification process’ is essentially identical to anonymisation in the EU. Therefore, as with Japan, the definition of de-identified information in Korea covers anonymisation as defined in the EU.
Although the terms de-identification and anonymisation were used interchangeably, their definitions were not clear until a few years ago. The ambiguous use of these terms means that ‘trying to achieve anonymity’ can be misunderstood as having ‘achieved anonymity’ (Ohm, 2010). Fortunately, the definitions of the relevant terms are now clearer as each country works to amend laws related to personal information and publishes guidelines that propose very specific criteria. As the Big Data era has begun and international exchanges of data have become very active, it is expected that these countries will work towards a common set of standard guidelines and definitions.
Once a country has defined a vocabulary of de-identification terms, it must then define the principles that will guide the creation of clear, enforceable laws and regulations to be applied to the proper management of de-identified information.
Table 3 summarizes the principles for handling de-identified information adopted by each country discussed herein, based on the laws, regulations, and guidelines of each country. Principles that must be followed are labelled with an ‘O’, and principles that are not required or are not yet determined by a guideline are left blank.
Principles for handling de-identified information in different regions/countries
As shown in Table 3, all regions and countries are consistent in the prohibition of re-identification of de-identified information (Principle 1) and the requirements for an administrative and technical safety management system to prevent any type of data exposure or re-identification of de-identified information (Principle 2). By adopting these two principles, it is clear that each country correctly understands the risk of re-identification of de-identified information as well as the side effects and loss from re-identification.
Apart from these two principles, each country defines different principles for managing de-identified information. For example, generation of irreversible data (Principle 3) is required only in Japan when de-identified information is generated. This principle is a direct result of the amended Personal Information Protection Act enacted in Japan, which requires that de-identified information handled by the personal information user be provided in such a way that it is not recoverable, consistent with Japan’s definition of de-identified information. Furthermore, Japan has adopted seven of the nine handling principles, the highest number among the regions and countries compared. However, it should be noted that differences are to be expected, since the handling principles in Table 3 are based on the GDPR for the EU, on the Personal Information Protection Act for Japan, and on guidelines as well as laws for both the US and Korea. More detailed criteria are therefore expected to be set by the laws of each country in the EU and by detailed guidelines in Japan.
Table 4 summarizes the re-identification potential and the legal treatment of de-identified information in each country handled according to the previously discussed handling principles. The analysis shows that this potential is similar in some countries and completely different in others; however, the interpretation of the results can differ depending on the de-identification regulations within each country. Consider the EU, which focuses more on protection than on the utilisation of personal information. Because there is a possibility that pseudonymous data will be re-identified, the data are considered personal data and are regulated by the Protection Act. However, even though pseudonymous data are considered personal data, there are exceptions under which they may be used, e.g. for public interest, statistical, and historical purposes, subject to certain technical and administrative measures. Therefore, the EU attempts to strike a balance between protection and utilisation when it comes to policymaking regarding de-identification of personal information (Hintze, 2017).
Comparison of the legal treatment and the potential for re-identification of de-identified information in different regions/countries
On the other hand, the US focuses more on the utilisation than on the protection of personal information. Although the US acknowledges that any de-identification of information may be reversible, de-identified information is treated as non-personal information with respect to the requirements for active utilisation of the data.
As shown in Table 4, the de-identified information of Japan requires many handling principles according to the Personal Information Protection Act, but if the de-identified information meets the handling principles and is processed as irreversible, then it will be treated as non-personal information and can be provided to the public and utilised without regard to existing regulations. Although companies may be burdened by the large number of handling processes that they would need to carry out to use the personal information, once an institution complies with the required principles, the institution can use personal information as non-personal information without any concerns about potential issues and liability.
Finally, the guideline for the de-identification of personal information in Korea considers even de-identified information that has passed the three-step de-identification process (prior review, de-identification, and appropriateness evaluation) not merely as potential personal information but also as information that can potentially be re-identified. However, in order to actively assist companies in the utilisation of de-identified information, the de-identified information is officially treated as non-personal information. Additionally, Korea is adopting policies by which this de-identified information is then monitored through the follow-up step, the fourth step in the guideline, to minimize possible damage due to re-identification. Moreover, if the goal of the utilisation of the de-identified information is accomplished or the information is no longer required, then the de-identified information is destroyed (Principle 9); thus, its entire life cycle from generation to destruction is managed. To summarize, although Korea’s de-identification policies can minimize the risk of re-identification throughout the life of the de-identified information, these policies assign sole responsibility for any issues that occur during the follow-up management to the de-identified information manager.
As seen above, many countries have been making efforts to implement de-identification policy in their personal information protection frameworks by either revising relevant legislation or providing guidelines. Upon close examination of such efforts, five common steps or stages were identified in the procedures for the de-identification of personal information.
The first step is the determination of information to be de-identified. Because of the amount and diversity of data being accumulated in the age of big data, it is not easy to distinguish data that can be used freely from data that needs to be protected. Prior to executing the de-identification measures, it is important to determine the scope of personal information that will be de-identified. To this end, the legislation and guidelines of each country clearly set out not only the definitions related to de-identification and the procedures for de-identification but also the definition of personal information and the standard for determining which information is in fact personal information. The policies of the select countries provide the standard for the determination of information to be de-identified and only allow de-identification of the information determined to be personal information.
The data identifiability spectrum. Source: Recomposition of S. Garfinkel’s diagram (Garfinkel, 2015).
The second step is the de-identification measure. During this step, any characteristics that can be used to identify an individual are removed so that the processed information is no longer considered to be personal information and therefore, safe for use. While the specific methodology for de-identification can be outlined such as the Safe-Harbor6
If 18 types of data such as name, phone number and e-mail are removed and the remaining data cannot be combined with other data to identify individuals, information can be collected and processed outside of the HIPAA Act.
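The removal step described in this note can be illustrated with a minimal sketch. The field names below are hypothetical, and only the three identifier types named in the note (name, phone number, e-mail) are listed rather than all 18. The sketch covers only the removal of listed fields; it does not check the second condition, namely that the remaining data cannot be combined with other data to identify individuals.

```python
# Hypothetical field names for illustration; only three of the 18
# identifier types are listed here (those named in the note above).
DIRECT_IDENTIFIER_FIELDS = {"name", "phone_number", "email"}


def remove_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record without the listed identifier fields."""
    return {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIER_FIELDS
    }


# Example record with made-up values.
patient = {
    "name": "Jane Doe",
    "phone_number": "555-0100",
    "email": "jane@example.com",
    "diagnosis": "asthma",
    "visit_year": 2015,
}

deidentified = remove_direct_identifiers(patient)
print(deidentified)
```

The original record is left unchanged, so the party holding the source data and the party receiving the processed copy can be kept separate.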
The third step is the evaluation of appropriateness. That is, once the de-identification has been performed, a process for reviewing the effectiveness of the de-identification is carried out by relevant experts. The expert-determination method7
Based on evaluation by relevant experts with the statistical, scientific knowledge and experience needed to review the de-identification methods, if the de-identified information is deemed difficult to re-identify, such information is free from the compliance of the HIPAA Act.
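The note above does not say how experts judge whether information is difficult to re-identify. As one illustration only, the sketch below uses k-anonymity, a measure chosen here as an assumption and not named in the text: it reports the size of the smallest group of records sharing the same combination of indirectly identifying attributes. The attribute names and records are hypothetical.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records that share the same values
    for the given attributes (the 'k' in k-anonymity). Returns 0 when
    there are no records."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical de-identified records.
records = [
    {"age_band": "30-39", "region": "Seoul", "diagnosis": "asthma"},
    {"age_band": "30-39", "region": "Seoul", "diagnosis": "flu"},
    {"age_band": "40-49", "region": "Busan", "diagnosis": "flu"},
    {"age_band": "40-49", "region": "Busan", "diagnosis": "asthma"},
    {"age_band": "40-49", "region": "Busan", "diagnosis": "diabetes"},
]

k = smallest_group_size(records, ["age_band", "region"])
print(k)
```

A larger value means every individual is hidden among more look-alike records; what threshold counts as acceptable is a judgement left to the evaluator and is not fixed by this sketch.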
The fourth step is the security management. The select countries mentioned above do not simply allow de-identified information to be used freely but require certain security measures to be met before allowing the use of de-identified information. This may arise from concerns over re-identification of personal information from the de-identified information combined with other information.
The last step is the use of de-identified information. If the previous four steps have been properly carried out and fulfilled, the personal information processed into de-identified information can be used freely for big data analysis and other purposes free from the compliance issues of personal information protection. However, pseudonymized information as defined by the EU GDPR can only be used for public benefit such as record keeping or scientific, historical and statistical purposes (Council of the European Union, 2016). In the case of Korea, post management responsibilities such as monitoring for possible re-identification of the de-identified information are mandatory requirements.
As can be seen from the above, the de-identification policy of the select countries has five common steps and while only in its infancy, it looks as if the framework above will become the trend in the process of de-identification of personal information.
Characteristics of de-identification, and necessity of review from a risk management perspective
As yet, there is no known way to completely remove the risks involved in de-identification (Article 29 Data Protection Working Party, 2014). Even if all the steps described in the previous sections are perfectly executed under a systematically established de-identification process, it is impossible to completely remove all the risks inherent in each of the steps of the de-identification process due to the intrinsic and fundamental characteristics of de-identification itself.
As can be seen from Fig. 2, there exists a trade-off between the level of de-identification and the value of data for use. For example, a high level of de-identification can be achieved by completely removing any information that can be used to identify an individual, thereby greatly lowering the risk of privacy issues, but such information will be practically useless; if much of the information capable of identifying an individual is retained, the low level of de-identification will mean a higher risk of privacy issues but more valuable information for use. Given this trade-off, the risk of re-identification cannot be completely removed but should be managed at the minimum acceptable level once personal information has been de-identified.
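The trade-off can be made concrete with a small numerical sketch. Everything in it is an illustrative assumption rather than a method taken from the policies discussed: ages are coarsened into bands of increasing width, the share of records whose coarsened value is unique stands in for re-identification risk, and the number of distinct values that remain stands in for the value of the data.

```python
from collections import Counter


def generalize_age(age: int, width: int) -> int:
    """Replace an age with the lower bound of its band of the given width."""
    return (age // width) * width


def unique_fraction(values) -> float:
    """Share of records whose value appears exactly once (risk proxy)."""
    counts = Counter(values)
    return sum(1 for value in values if counts[value] == 1) / len(values)


# Hypothetical ages of eight data subjects.
ages = [23, 27, 31, 34, 38, 42, 45, 51]

results = {}
for width in (1, 10, 100):
    generalized = [generalize_age(age, width) for age in ages]
    risk = unique_fraction(generalized)   # higher means easier to single out
    detail = len(set(generalized))        # higher means more useful detail
    results[width] = (risk, detail)
    print(f"band width {width}: risk proxy {risk}, distinct values {detail}")
```

With exact ages every record is unique and all eight distinct values survive; with a single band covering all ages no record is unique but only one value remains, which is the practically useless extreme described above.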
The theory of risk society of modernity and risk distribution
The modern society of today is full of risks. The risk society as explained by Ulrich Beck is a concept that was developed to analyze the characteristics of the modern society and is defined as the increase in inherent risks of a society as it goes through industrial development and modernization to achieve material wealth based on technological development.
Beck defines the risk society as a society that goes through (1) the stage of ‘manufactured risks’, which arise as new risks from technological advances and development, (2) the stage of ‘risk definition’ to determine what these risks are, and (3) the stage of ‘risk distribution’, which concerns how such risks are distributed within the society (Beck, 1986). First, looking into the ‘manufactured risk’ stage, although risk itself is certainly not desirable, a certain amount of risk is inevitable in the process of achieving our goals and objectives (Wildavsky, 1988). As such, we have to live with these risks, and in the age of big data, ‘manufactured risks’ such as the leakage of personal information and the re-identification of de-identified information are all but unavoidable should we continue to make use of information at all (Róbert, 2006). The ‘risk definition’ stage is where the causes of the manufactured risks are identified and clarified to assess and estimate the scope and influence of the risks. The efforts made by countries to clearly define the legal concept of de-identification and the research on ways to provide adequate levels of de-identification can all be said to belong to this stage. Lastly, the ‘risk distribution’ stage concerns how a society will socially distribute such risks. This issue is the most important aspect of risk management and encompasses the discussions on how to assess and assign liabilities when such risks become a reality through re-identification of de-identified information.
Ulrich Beck also argued that a social invention called a ‘risk contract’ is needed to proactively mitigate the side effects and costs that will arise when new technologies are developed or new markets are opened and to properly distribute the risks therefrom. A risk contract is a guarantee that the government controls, distributes and compensates for the risks that arise within the system (Triantis, 2000; Beck, 2008). The responsibilities for risk can include comprehensive empowerment to perform any and all necessary actions to fulfill the ultimate goal of risk prevention or be limited to minimal measures to stop the risks from causing damage. Currently, without a properly established de-identification policy and framework, the governments of the select countries have chosen to place extensive responsibilities on the organization that executes the de-identification procedures and measures while ensuring that such an organization will take all the precautionary and safety measures in the broadest terms; at the same time, a certain amount of leniency for good performance is given to make this type of regulatory compliance feasible. However, due to the excessive burden placed on the de-identifier, this type of regulatory compliance may end up suppressing the profit-generating activities of businesses. Therefore, considering the global trend towards the strengthening of punitive liability, it may be better to place minimal responsibilities on de-identification by limiting the requirements for safety but follow up with sufficient liability for punitive damages to ensure autonomous compliance by the de-identifiers of personal information.
It should also be noted that the discussions on the risk distribution may easily gravitate towards how much liability is placed on whom. However, it is suggested that more research on clearly establishing the roles and responsibilities during the initial stage of risk management is needed before reasonable risk distribution can be agreed upon.
Implications of de-identification policy and the theory of risk society of modernity
In the existing personal information protection framework, only the leak of personal information is of major concern. But with the concept of de-identification added to the equation, additional obligations to protect the de-identified information and the duty of care have expanded the scope of risk. With the addition of more handlers of information during the de-identification process, the distribution of roles for risk management becomes more complex; moreover, a process that creates additional points of management can only mean a higher chance of risk. The de-identification of personal information itself therefore increases risk and makes risk management all the more difficult.
Most of the countries implementing the de-identification policy assign fault based on the principle of negligence determined by causation, where both cause-in-fact and proximate-cause conditions need to be fulfilled. Therefore, in order to assign liability for a specific event, it has to be clear beyond doubt that a specific action or lack thereof caused the resulting event. Unfortunately, with the implementation of de-identification policy, such causality has become even more difficult to prove, making it practically impossible to single out an individual as the cause of an injury or damage. This kind of uncertainty between cause and effect brought about by de-identification may lead to dereliction and evasion of duty.
Constituents of de-identification and their relationships.
In consideration of such possible risks arising from the implementation of the de-identification policy, it becomes important to make certain that the risk provider takes responsibility from a risk management perspective. While it is difficult to unilaterally implement a general regulation on liability risks since most countries acknowledge the principle of negligence, a regulatory mechanism that allows a certain amount of leniency for management is needed together with the placement of duty based on risks for liabilities associated with these risks. Also, the constituents that are part of the risk distribution need to be clearly defined and their roles and responsibilities clarified to facilitate the distribution of liabilities.
Constituents of de-identification
All duty and responsibility stem from the existence of a person or organization that can bear such a burden, and should there not exist such a person or organization, the concept of duty and responsibility becomes meaningless (Jin, 2010). Therefore, in order to distribute the liabilities of risks of de-identification, the constituents of the de-identification process must be identified and their relationships analyzed. In the de-identification framework, many actors such as the Risk Provider, Risk Accepter, Regulator, Insurer and Third Party exist to fulfil their roles; we describe the roles of each of these constituents and their relationships with each other.
A. Risk provider
The Risk Provider provides dangerous or risky goods and services to earn a certain amount of profit (Jin, 2010). The Risk Provider is the beneficiary of the benefits earned from the risks transferred to the Risk Accepter and is one of the core constituents of the risk theory. The Risk Provider also forms diverse relationships with other constituents to manage risk and, in the de-identification framework, gathers and processes personal information into de-identified information. Private companies or other types of organizations that manage de-identified information are typical examples of the Risk Provider. While the Risk Provider gains profit and benefits from collecting and processing personal information, the Provider also manufactures risks, since the de-identified information may be leaked or used to re-identify individuals; the management of such risks is the main role of the Risk Provider.
B. Risk accepter
The Risk Accepter accepts the risks that the Risk Provider transfers to benefit from the goods and services provided based on mutual contract. Specifically, the Risk Accepter provides his/her personal information in return for services rendered by the Provider and is the data subject. In such case, based on the permission provided by the Risk Accepter, the Risk Provider manages and uses the information provided, thereby creating a risk for the Risk Accepter. Because of such risk, the Risk Provider will always seek to remove the link between the personal information provided and the data subject through the de-identification measures but because the possibility of re-identification always exists, the contractual risk relationship between the Risk Provider and the Risk Accepter will always be valid.
C. Regulator
When a risk becomes a reality, and begins to cause economical
D. Assistant
During the de-identification process, much work and many roles exist to manage the risks created for the Risk Provider. But because the goal of the Risk Provider is to make a profit through the use of de-identified information, if the cost of risk management is greater than the profit that can be gained by making use of the de-identified information, the de-identification policy will simply not work. To ensure that this does not happen, the Assistant is needed to help the Risk Provider fulfill its role. The Assistant can be various specialized organizations that provide technical and legal consultation as well as specialized expertise and human resources pertaining to the process of de-identification to lessen the burden of the Risk Provider.
E. Insurer
Looking at the EU General Data Protection Regulation (GDPR) (Council of the European Union, 2016), which was passed by the EU Assembly in April 2016, the recently amended HIPAA Act of the US and the Personal Information Protection Act of Korea, there is a definitive worldwide trend towards strengthening the managerial responsibilities for protecting personal information and a rise in fines for punitive damages. That is, the Regulator is strengthening the liability of the Risk Provider for risks involved in the de-identification process, but from a risk theory perspective, there exists the possibility of burdening the Risk Provider simply because the Risk Provider supplied the risk, even when the Provider made every effort it could to prevent such risks. Thus, it may become necessary for the Risk Provider to consider offloading some of the liabilities of the risks to a third party by making use of cyber insurance. Unfortunately, except for a few insurance companies in the US insurance market capable of accepting liabilities that can occur in cyberspace, the cyber insurance market has not developed much; moreover, given the uncertainty over whether the damages caused by re-identified information can be included in the damages from a leak of personal information, more long-term study and research is needed.
A*. Third party: Risk provider
One of the advantages of the de-identification of personal information is that the de-identified information can be freely transferred to a third party for processing and provision without the consent of the data subjects, thereby raising the usability of the information. Because of this advantage, many Risk Providers commission third parties to process personal information into de-identified information or provide de-identified information to third parties. But because many countries require a contractual agreement for risk management in such cases, the third parties effectively become Risk Providers because they are obligated to carry out the necessary precautionary and safety measures of the Risk Provider based on a legally binding contract. Because the commissioning and transfer of de-identified information to third parties may lead to the transfer of this information outside the originating country, which further increases risk and liability issues, deeper discussions on the roles and responsibilities between the constituents are needed.
Assigned roles of the constituents of de-identification of select countries
We will take a look at the actual roles applied to the constituents of de-identification in the select countries. First, the governments of most countries act as the Regulator (C) by defining the rules and compliance requirements for the Risk Provider (A) to follow. The HIPAA Act of the US, the GDPR of the EU, the amended act on the protection of personal information of Japan, and the Guideline on Personal Information De-Identification Measures of Korea are examples of such legislation. The de-identification policies are still in the phase of policy implementation and related technologies will keep on developing, which is why most of the responsibilities and duties of the Risk Provider (A) laid out in these legislations tend to be principle-oriented abstract clauses. For example, Japan stipulates the duty to create unrecoverable information, the duty to prevent the leak of information, the prohibition of attempts to decipher information and the duty to take safety measures (IT Strategy Headquarters, 2016). On the other hand, because Korea’s legislation is an administrative regulation which controls the Risk Provider (A) through guidelines, it does offer more detailed measures on the duties and responsibilities of the Risk Provider (A). These technical and managerial safety measures include designating a manager for de-identified information files, prohibiting information sharing between the organization that provided the original information and the organization that manages the de-identified information, and applying various access controls to the de-identified information.
A role is also needed in the relationship between the Risk Provider (A) and the Risk Accepter (B). Basically, the Risk Provider (A) gathers and uses the personal information of the Risk Accepter (B) based on the consent provided, but in the case of de-identification, because the de-identification process unlinks personal information from the data subject, no consent is needed from the Risk Accepter (B). However, in the case of the US, it is recommended (Federal Trade Commission, 2012) to notify and promise the Risk Accepter (B) publicly that information will be kept and used in the de-identified form so that the Risk Accepter (B) becomes aware of the possibility of re-identification. Japan is also strengthening the right to know for the Risk Accepter (B) by requiring the publication of the items of personal information included in the de-identified information (IT Strategy Headquarters, 2016).
The Assistant (D) is equally important in the de-identification framework. For areas that require highly specialized expertise which the Risk Provider (A) cannot process alone, the expertise of the Assistant (D) must be utilized for better efficiency. For example, the HIPAA Privacy Rule of the US emphasizes the expert-determination method for evaluating the appropriateness of de-identified information by appointing relevant experts with scientific knowledge and vast experience to determine the possibility of re-identification. In Korea, measures are being prepared to allow the competent government ministries to designate specialized organizations by each industry or sector to support the Risk Provider (A) in their de-identification process.
Because of possible ambiguity between the Risk Provider (A) and the Third Party (A*) in the case of commissioning and provision of de-identified information, it is important to establish the roles of both of these constituents clearly. To do so, the US government (Federal Trade Commission, 2012) is recommending the stipulation of a clause in the contract to forbid the re-identification of de-identified information by the Third Party (A*). And in the case of Japan, de-identified information provided to the Third Party (A*) must be publicly notified in accordance with the regulations set by the Personal Information Protection Committee.
Lastly, regarding the transfer of liability for the risk of de-identification from the Risk Provider (A) to the Insurer (E), no country has as yet made cyber insurance mandatory or recommended such a measure for de-identified information. However, the market size of cyber insurance in the US is expected to grow to 7.5 billion US dollars by 2020 (Pwc, 2016), which will lead to more cyber insurance coverage; in the near future, once the de-identification policy is well established, the current cyber insurance market, which is limited to providing coverage for leaks of personal information, may extend to de-identified information.
Risk distribution among the constituents of de-identification
In this section, we will discuss the matter of risk distribution based on the constituents of de-identification identified in the previous section and the roles assigned to these constituents by their respective countries. First, if damage to the Risk Accepter (B) occurs in the process of de-identification of the personal information provided or through re-identification of the de-identified information, the Risk Accepter (B) may hold the Risk Provider (A) accountable for such damages. As long as the Risk Provider (A) is not charged with negligence during the de-identification process and has faithfully complied with the regulations of the Regulator (C), the liabilities for the damages incurred may be mitigated or even exempted. Unfortunately, the terms ‘gross’ negligence and ‘sufficient’ precautionary actions are subjective and debatable at best, making it difficult to properly and reasonably distribute the liabilities incurred. For example, when re-identification occurs due to the advent of new advanced technology, it is difficult to gauge the level of responsibility of the Risk Provider to monitor such advances in technology. Details such as these need to be deliberated on and discussed by a relevant committee to help the determination of accountability as well as to provide detailed guidelines for reference by the judiciary branch.
Such activities and roles should be carried out by the Regulator (C), the government. While the Regulator (C) cannot directly hold the Risk Provider (A) accountable for the liabilities incurred, the Regulator (C) is capable of providing diverse support in dispersing the liabilities of the Risk Provider (A) for management responsibilities. That is, by providing various programs to support the risk management activities such as training for de-identification, evaluation of the appropriateness of the de-identification process, policy and managerial consultation for de-identification, etc., the Regulator (C) should carry out the roles of the Assistant (D).
Among the diverse activities and actions of the Assistant (D), the support for evaluation and consultation on the appropriateness of de-identification process can cause problems in the distribution of risks. For example, if an outside expert during the evaluation process gives his positive opinion on the appropriateness of the de-identification process being evaluated but in fact, is found to be lacking and allows re-identification, this expert may be held accountable for his evaluation. But the problem of appropriateness is an issue between the Risk Provider (A) and the Assistant (D), meaning any liability for damages to the data subject should ultimately be placed on the Risk Provider (A). The actual content of an expert’s evaluation and consultation can only serve as reference material in a court to determine whether or not the Risk Provider (A) faithfully fulfilled its obligations in the de-identification process, and not the evidence to implicate or hold the Assistant (D) accountable. On the other hand, it is possible to distribute such legal accountability should the Assistant (D) agree based on a legal contract.
The issue of accountability caused by the commissioning and provision of de-identified information to the Third Party (A*) will become an important issue to resolve in the process of de-identification. As can be seen from the previous discussions, most countries regulate the commissioning and provision of de-identified information to the Third Party (A*) by stipulating the accountability of a certain amount of liability to the Third Party (A*) in question through the contract that is concluded between the two constituents. In the case where the Third Party (A*) is commissioned to carry out the process of de-identification or to utilize the personal information and de-identified information by the Risk Provider (A), the Risk Provider (A) must be held fully accountable for any and all liabilities pertaining to the information thusly transferred. However, if the actual rights to the information was also transferred as well, it is not reasonable to place the full obligation of the risk on the Risk Provider (A). The fundamental purpose of the de-identification policy is to reduce the burden and risk in the use of information to make full use of big data but there is not much to be gained if the risk is not distributed to the Third Party (A*). Therefore, when de-identified information is provided to the Third Party (A*), managerial accountability should also follow to allow more use of information by spreading out the risk from the Risk Provider (A).
Case study: Analyzing the de-identification policy of Korea
In Korea, many regulations, case studies and other information have been published on de-identification methods and cases by each government ministry as the demand for big data began to emerge from the private sector. And on June 30
However, criticism also exists regarding the lack of legal basis for the four de-identification stages that govern the current Guideline in Korea (Citizens’ Coalition for Economic Justice Center for Consumer Justice
There is also another issue regarding the distribution of post-management responsibility in cases of commissioning and provision to the Third Party (A*). As discussed previously, Korea’s current Guideline differs from those of other countries in that it greatly emphasizes the post-management responsibility of the Risk Provider (A), in other words, management responsibility after the de-identification process. Companies or service providers seeking to use de-identified information or provide it to the Third Party (A*) must monitor the possibility of its re-identification, and if any one of the inspection items applies, they must devise and implement an additional de-identification process. Monitoring for possible re-identification therefore imposes a significant burden of manpower and budget on the Risk Provider (A). Even when all technical and managerial measures that are realistically possible have been implemented during de-identification, the Risk Provider (A) continues to bear the risk of re-identification, which can occur at any time at the Third Party (A*). Furthermore, if these burdens of responsibility outweigh the benefits of the de-identification process, the Risk Provider (A) may opt to abandon the processing, commissioning and provision of de-identified information. To prevent a decrease in big data exchanges, there is a need to establish a framework that reduces the Risk Provider’s responsibility for monitoring re-identification possibility to a reasonable level and transfers the rest of the duties and responsibilities to the Third Party (A*).
Korea’s de-identification support & management system
Korea has a de-identification support and management system to share the preliminary management and responsibility burden of the Risk Provider (A). Specialized agencies are defined and operated for each sector under the supervision of each relevant ministry, and a personal information de-identification support center has been established and operated by the Korea Internet and Security Agency (KISA), which is a specialized agency dedicated to personal information protection. Support roles of the sectoral expert agencies and the Personal Information De-identification Support Center are as shown above in Table 5.
Sectoral expert agencies and the Personal Information De-identification Support Center can be classified as the Assistant (D) among the de-identification constituents, and they perform functions that are difficult for the Risk Provider (A) to perform by itself. It is greatly encouraging that the government, which as the Regulator (C) imposes a variety of responsibilities on the Risk Provider (A), is making these efforts to reduce the preliminary management responsibility of the Risk Provider (A).
In much the same way, the distribution of responsibility for the ‘de-identification appropriateness evaluator’, which performs the role of the Assistant (D), can also become an issue. According to the current Guideline of Korea, the Risk Provider (A) must establish a ‘de-identification measure appropriateness evaluator group’ (hereinafter “evaluator group”) with outside expert participation under the responsibility of the personal information protection officer, and must conduct a strict evaluation after the de-identification measures. The evaluator group is composed of three or more experts drawn from inside and outside the organization (more than half being outside experts), and uses the k-anonymity model to evaluate the appropriateness of de-identification measures and levels. In this process, the inside experts belong to the Risk Provider (A) and the outside experts are the Assistant (D). The current Guideline sets the completion criterion of a de-identification measure as an “appropriate” evaluation result from the evaluator group. If, after an “appropriate” evaluation result has been received, the de-identified information can be re-identified by an easy method, or if it is deemed not to have gone through sufficient de-identification, those involved can be held responsible for negligence. While the ultimate responsibility for the composition of the evaluator group and the whole evaluation process lies with the Risk Provider (A) and its inside evaluators, it may not be reasonable to place all the responsibility on the Risk Provider (A) when it has made sufficient efforts. At the least, in areas that depend on expert judgment, it is necessary to place part of the responsibility on the Assistant (D). For example, conditions for responsibility sharing could be written into the contracts signed with outside experts in the evaluator group for certain evaluations, or responsibility for negligence could be specified for cases in which a problem arises.
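The k-anonymity criterion mentioned above can be illustrated with a minimal sketch. It assumes the standard definition of the model, under which a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The function names, field names, and sample records below are hypothetical and are not taken from the Guideline; the Guideline's actual evaluation procedure and threshold values are not reproduced here.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count the records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def achieved_k(records, quasi_identifiers):
    """Return the size of the smallest equivalence class (0 for an empty dataset)."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """Check whether every equivalence class contains at least k records."""
    return achieved_k(records, quasi_identifiers) >= k


# Hypothetical de-identified records: age and region have been generalized,
# and the diagnosis column is the sensitive attribute.
records = [
    {"age_band": "30-39", "region": "Seoul", "diagnosis": "A"},
    {"age_band": "30-39", "region": "Seoul", "diagnosis": "B"},
    {"age_band": "30-39", "region": "Seoul", "diagnosis": "A"},
    {"age_band": "40-49", "region": "Busan", "diagnosis": "C"},
    {"age_band": "40-49", "region": "Busan", "diagnosis": "A"},
]
QI = ["age_band", "region"]  # assumed quasi-identifiers
```

In this sample the smallest group (Busan, 40-49) contains two records, so `is_k_anonymous(records, QI, 2)` holds while `is_k_anonymous(records, QI, 3)` does not. A check of this kind captures only the quantitative part of an appropriateness evaluation. It says nothing about what other information a recipient may hold, which is why the residual re-identification risk and its allocation remain an issue even after an “appropriate” result.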
This paper explored the trends in personal information de-identification policy implementation of major countries and discussed the problem of responsibility distribution among constituents relevant to de-identification policies based on Ulrich Beck’s risk society and risk liability theory.
Through this discussion, common process steps of de-identification policies of the select countries and the roles of de-identification constituents were identified; directions were laid out for what would be the most desirable way to distribute responsibilities for each constituent. The recommendations of this paper can be summarized as follows.
First, in order to maintain the utilization value of data, it is necessary to understand that the de-identification institution always carries the possibility that de-identified data will be re-identified, and thus the fundamental goal of de-identification policy must be ‘risk management’, not ‘risk removal’. Information cannot be classified in a binary manner as either personal information or anonymous information. Rather, it should be understood as existing in a grey area that inherently carries the possibility of re-identification. Policy instruments should therefore reflect the perspective of ‘risk-liability theory’, which stipulates that the subject who accepts a state of risk must also bear the responsibilities associated with that risk.
In addition, in order to realize the ultimate goal of the de-identification institution, which is the expansion of big data exchanges, there is a need to change from a preemptive regulatory system to a post regulatory system. Currently, most countries have opted for policies that seek to prevent re-identification risk by strengthening the preventative management responsibilities of the Risk Provider (A) that uses de-identified information, but the resulting excessive burden can hinder the profit-generating activities of businesses. On top of this, the introduction of de-identification institutions expands the scope of risk management and makes the risk harder to control, which further increases the burden. As such, it is recommended to move towards a system that imposes a narrow concept of responsibility, requiring minimal safety measures before de-identified data is used, and that assigns compensation responsibility precisely when a problem occurs. In this process, a safety mechanism that provides sustained management and is procedurally feasible should be prepared in parallel. A policy framework that reflects the risk management perspective, and that does not rely on a one-time de-identification measure, should be established.
The result of this paper’s discussion shows that, when responsibilities are distributed among the relevant constituents on the basis of risk-liability theory, risk responsibility has to be focused on the company or organization (Risk Provider) that performs de-identification measures and seeks to utilize the de-identified information. In this case, many companies cannot help but become hesitant to enter the de-identification market, which decreases the practical benefits that can be gained through this institution. In light of this, a number of methods are proposed to decrease the burden of responsibility on the Risk Provider (A).
First, the roles to be performed by the Risk Provider (A) must be clearly defined. Mandating a sufficient level of safety measures through ambiguous and abstract regulations only serves to increase the burden of responsibility on the risk management subject and to create institutional inefficiency. To prevent this, there is a need to establish clear definitions and principles regarding de-identification by legislating or amending the relevant laws, and to distribute legally authorized administrative rules or guidelines that encourage risk management subjects to carry out clearly defined roles. Despite these mechanisms, questions may still remain. It is recommended to resolve these remaining issues by establishing a personal information protection control tower, such as a personal information protection committee.
Second, the government, which acts as the Regulator (C), should strengthen its role as the Assistant (D). In order to reduce the roles and responsibilities concentrated on the Risk Provider (A), a government-led de-identification support organization should be established to provide supportive services such as appropriateness evaluation, surveys and training. Through such an organization, the preliminary management burden can be shared, and liabilities can also be shared, albeit only in the specific areas that pertain to the expertise of the organization.
Third, the managerial accountability placed on the Risk Provider (A) when de-identified information is transferred to the Third Party (A*) should be limited to a reasonable level. Currently, most countries demand an identical level of accountability from the Risk Provider (A) for de-identified information transferred to the Third Party (A*). But where the de-identified information is fully transferred to the Third Party (A*) rather than commissioned, the rights to this information have also been transferred. Placing the same accountability burden on the Risk Provider (A) in this case will only hinder the objective of de-identification policy, which is to promote the use of de-identified information for big data. Therefore, the accountability associated with managerial safety measures, across the wide range of responsibilities from supervision to re-identification monitoring, should be limited in a reasonable manner so that accountability is ultimately dispersed to the Third Party (A*).
Fourth, there is a need to implement a cyber insurance scheme in the long term to transfer the liability borne by the Risk Provider (A). The Risk Provider (A) should be required to operate with insurance coverage for its liabilities. Even with the diverse mechanisms that can reduce and disperse the responsibilities of the Risk Provider (A), it is ultimately the Risk Provider (A) that has to bear such liabilities, and they will remain considerable. A cyber insurance system that can mitigate this burden must therefore be established and promoted. In addition, many countries provide coverage only for leaks of personal information. This coverage should be extended to the risks of leakage of de-identified information and its re-identification. More detailed research is needed on defining the damages, assessing the damages and formulating the insurance payment structure.
The coming of the age of big data is inevitable. De-identification policy has contradictory characteristics: it aims to make full use of information and to protect that information at the same time. As mentioned above, we find it extremely dangerous to discuss policies or laws that aim only at increasing information availability without providing adequate protection against the risk that de-identified information can be re-identified at any time. On this account, this paper emphasizes the need for policy makers to change the basic goal of de-identification policy to the ‘risk management’ paradigm instead of ‘risk elimination’.
De-identification policy should cover several key elements: sources and information base, legislation and policies relevant to the topic, and the structure for processes, people, and technologies. In addition to these factors, the data sources and critical repositories should be considered as the basis for the analysis, since their treatment can generate significant improvements in the application of de-identification policy. In particular, this research has dealt in detail with the important elements that affect de-identification legislation and policies. We suggested a starting point for the discussion of how to distribute the risks inherent in the policy among the related constituents in order to manage those risks. In addition, by applying this framework to Korea, we examined the direction of concrete policy realization and the debates that are emerging as issues. However, since the legal system and the policy environment of each country are quite different, the proposal of this paper is limited in that it cannot provide a common and concrete policy direction. In a few years, as each country’s de-identification policy becomes established, further research will be needed on which country’s policy direction has most effectively achieved its policy goals. We will also expand and deepen the discussion by applying other factors that affect de-identification policy.
