Abstract
Under European Union (EU) law, population-based cohort studies have the right to collect environmental data and to access geospatial data, at street level, on the web, from a host of public sources. As to geospatial information, they should be able to avail themselves of Member States' networks of services for geospatial data sets and services (discovery, viewing, downloading) via the Internet. On the other hand, linkage of the health data of biobank participants to environmental data, using geospatial data, is limited, as it must satisfy the provisions of the EU Directive on the Protection of Personal Data, pursuant to which geospatial data regarding biobank participants are likely to qualify as personal data. Hence, we submit that the consents of biobank participants should be reviewed to assess whether they cover the generation and linkage of geospatial data. We also submit that biobanks must have measures in place to prevent the re-identification of participants through their geospatial data. We present a model Geographic Information Systems (GIS) Toolkit as an example of the measures that could be taken to that effect.
Introduction
Hence, a key objective of WP7 was the development of a GIS-based Toolkit, specifically for use with data collected as a part of cohort studies (see Supplementary Materials; Supplementary Materials can be found online at www.liebertpub.com/bio). 1 This toolkit was designed to derive estimates of environmental exposures at the individual or residential address level of population-based cohort participants. In working toward the BioSHaRE-EU-GIS Toolkit, WP7 envisaged the following steps:
Step 1: The GIS environment and health toolkit will include the capacity to estimate exposures to a number of key environmental variables in a standardized way, which can then be applied to enhance the phenotype of individual cohort participants in biobanking studies.
Step 2: These variables include: air pollution from traffic-related sources or industrial point sources (e.g., incinerators, power stations); electromagnetic fields (EMFs) from mobile phone masts, power lines, and/or TV transmitters; and noise pollution from transport (e.g., air, road, rail).
Step 3: Cohorts/biobanks will be able to use their own source data (e.g., road network, mobile phone masts) with the standardized methods. Where possible, existing freeware or open source software (e.g., AERMOD, US EPA dispersion model) will be implemented in the toolkit.
Step 4: The addresses and postcodes of the cohort participants will be geo-referenced (e.g., to the European standard or the national coordinate system, as appropriate for the cohort under investigation) and integrated into the toolkit as point data.
Step 5: Exposures will then be estimated by using the toolkit, either at a specific point location (e.g., EMFs estimated at the cohort member's address) or as a continuous surface (e.g., air pollution). For the latter, GIS overlay techniques, such as intersect and extract values to points, will be used to assign exposure estimates to participants.
Step 6: With environmental exposures provided at the level of each cohort member, they can then be linked to other individual-level data and biobank samples for further epidemiological analysis.
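To make Steps 4 and 5 more concrete, the following is a minimal sketch of a point-location exposure proxy: the distance from each geo-referenced participant address to the nearest source (for example, a mobile phone mast). It is an illustration only and not the toolkit's own method; the participant identifiers, the coordinates, and the function names `haversine_m` and `nearest_source_distance` are hypothetical, and a simple proximity measure stands in for a real exposure or dispersion model.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_source_distance(participants, sources):
    """Return {participant id: distance in metres to the nearest source}."""
    return {
        pid: min(haversine_m(lat, lon, s_lat, s_lon) for s_lat, s_lon in sources)
        for pid, (lat, lon) in participants.items()
    }


# Hypothetical geo-referenced participant addresses (Step 4): id -> (lat, lon)
participants = {"P001": (52.3700, 4.8900), "P002": (52.0900, 5.1200)}
# Hypothetical source locations supplied by the cohort (Step 3)
masts = [(52.3710, 4.8900), (51.9200, 4.4800)]

# Step 5: one exposure proxy per participant, at a specific point location
distances = nearest_source_distance(participants, masts)
for pid, metres in distances.items():
    print(pid, round(metres))
```

The resulting per-participant values are what Step 6 would link to other individual-level data, which is the point at which the data protection questions discussed below arise.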
As part of the ELSI work of WP7, we reviewed what legal issues were triggered by the development and use of the GIS Toolkit. The first issue was that a population biobank must be able and permitted to procure environmental data (Steps 1 and 2). The second issue stemmed from the requirement that, for a GIS Toolkit to be built, a population biobank would need access to geospatial data to enable the geo-referencing of the addresses and postcodes of the cohort participants (e.g., to the European standard or the national coordinate system, as appropriate for the cohort under investigation; Steps 3 and 4). Third, exposures had to be estimated by using the toolkit and provided at the level of each cohort member, so that they can be linked to his or her other individual-level data and biobank samples for analysis (Steps 5 and 6).
As to procurement of environmental data (Steps 1 and 2), EU law in fact mandates that environmental information be readily available to the public, at street level, on the web, from a host of public sources. As to access to geospatial data (Steps 3 and 4), the EU Infrastructure for Spatial Information in the European Community (INSPIRE) Directive provides that Member States shall establish and operate a network of services for spatial data sets and services, and that these services shall be available to the public and accessible via the Internet. However, as linking exposure data to other individual-level data (Steps 5 and 6) potentially involves the processing of personal data, this part of the process is restricted by EU data protection laws.
Given the European nature of the BioSHaRE-EU consortium, the legal analysis was limited to Directives of the European Union (EU), Opinions of the EU Data Protection Article 29 Working Party, and, for reasons of available expertise, the case law of the Dutch Data Protection Agency. Obviously, a full legal assessment of all pertinent issues would require additional analysis of national laws and biobank consent forms. Although the problem about access to public data, potential identification, and data linkage from various sources deals with all data, not only environmental, our focus was on environmental data, as the key objective of our work was to develop a toolkit to help enrich the nature and nurture data typically contained in a population-based cohort or biobank with environmental exposure data. The outcome of the assessment of the EU legal aspects of procuring environmental data, accessing geospatial data, and protecting any resultant personal data informed our design of the BioSHaRE-EU-GIS Toolkit, which is presented here as well.
Discussion
Procuring exposure data: the right to access environmental information
To access and obtain environmental data for use in the GIS Toolkit, biobanks and cohorts could avail themselves of their rights under the EU Directive on Public Access to Environmental Information (AEI Directive). 2 The AEI Directive was adopted by the EU to give effect to the Access to Information pillar of the 1998 UNECE (United Nations) Convention on Access to Information, Public Participation in Decision-making and Access to Justice in Environmental Matters (the Aarhus Convention). Its provisions are designed to align legislation in EU Member States with the Convention. 3 The objectives of this Directive are to guarantee the public's right of access to environmental information held by or for public authorities and to ensure that environmental information is progressively made available and disseminated to the public to achieve the widest possible systematic availability and dissemination thereof. 4
In the AEI Directive, “environmental information” has been defined broadly, to include a variety of types of data, including, for instance, “(a) the state of the elements of the environment, such as air and atmosphere, water, soil, land, landscape and natural sites including wetlands, coastal and marine areas, biological diversity and its components, including genetically modified organisms, and the interaction among these elements; (b) factors, such as substances, energy, noise, radiation or waste, including radioactive waste, emissions, discharges and other releases into the environment, affecting or likely to affect the elements of the environment referred to in (a); […] and (f) the state of human health and safety, including the contamination of the food chain, where relevant, conditions of human life, cultural sites and built structures […].” 5 This broad definition has been interpreted equally broadly by the Court of Justice of the EU. 6
Under the Directive, public authorities must organize the environmental information held by or for them, with a view to its active and systematic dissemination to the public, in particular by means of computer telecommunication and/or electronic technology, where available. 7 They must keep the information up to date and accurate. 8 “Public authority” has been defined as either a government or other public administration, including public advisory bodies, at national, regional, or local levels; any natural or legal person performing public administrative functions under national law, including specific duties, activities, or services in relation to the environment; or any natural or legal person having public responsibilities or functions, or providing public services, related to the environment. As a sidenote, this definition brings up the issue of whether a population-based cohort itself qualifies as a “public authority” and would, therefore, fall under the obligations of the Directive. It follows, however, from the definition of “public authority,” that population-based cohort studies set up by universities and academic institutions for research purposes, do not qualify as “public authority,” as they typically have not been bestowed with public or administrative power and do not provide public or administrative services. In addition, they typically have not been created on the basis of a statute or other legislative instrument and they typically lack funding from the exchequer. For similar reasons, cohorts that are maintained by public health authorities are also unlikely to qualify as “public authority”; they too are typically set up for research purposes and typically do not perform a public or an administrative function. 
The fact that research using the data from such a cohort could help inform public policy by public health authorities does not render the cohort itself a public authority, although this may be different in various EU member states according to their national definition of “public authority.” As to hospitals that have their collections of medical data and samples converted into a “biobank,” these biobanks are unlikely to qualify as public authority as well, for the reasons cited earlier; a hospital itself is usually not considered as having public or administrative power, even though it may be considered in some jurisdictions as performing public services.
Pursuant to the Directive, public authorities in the EU member states must make available environmental information held by or for them, to any applicant at his or her request and without him or her having to state the objective of his or her application. The term “Applicant” is defined as any natural or legal person requesting environmental information. It follows that the applicant could include biobanks, cohorts, and biobank researchers. It also follows that biobanks do not have to submit a protocol when requesting access. The Directive further prescribes that environmental information shall be made available to an applicant as soon as possible or, at the latest, within 2 months after the receipt by the public authority of the applicant's request. Access to any public registers and examination in situ of the information requested shall be free of charge; public authorities may make a reasonable charge for supplying any environmental information.
In addition, to help the public exercise their right of access to environmental information, EU Member States must ensure that officials be required to support the public in seeking such access, that information officers be designated, that facilities for the examination of the information requested be established and maintained, and that registers with environmental information be held by public authorities, with clear indications of where such information can be found. 9
Of particular relevance to the BioSHaRE-EU-GIS Toolkit is that upon request, and for certain kinds of environmental information (e.g. energy, noise, radiation or waste, including radioactive waste, emissions, discharges, and other releases into the environment), public authorities must report on the place where the information can be found. 10 The public authority must also indicate the measurement procedures, including methods of analysis, sampling, and pre-treatment of samples, used in compiling the information, or refer to a standardized procedure used, where requested. 10 This type of metadata about the provenance of the information could significantly help make scientific analysis by population biobanks and their researchers of these data more robust.
Requests for access to environmental information may only be refused by the public authorities on a limited, exhaustive number of explicitly listed grounds, which are to be interpreted restrictively. 11 One noteworthy ground for refusal is when disclosure of the information would adversely affect the confidentiality of personal data and/or files related to a natural person where that person has not consented to the disclosure of the information to the public, and where such confidentiality is provided for by national or Community law. 12 However, there exists an exception to this principle whereby the confidentiality of this information may not be invoked as a ground for refusal in cases where the request relates to information on emissions into the environment (the “emissions rule”). 13 The emissions rule creates a legal presumption by which the public interest served by the disclosure prevails over the confidentiality of such information, in the case of information related to emissions into the environment. 6
Accessing geospatial data: the right to access the spatial data infrastructure
To link their existing data to environmental data by using the GIS Toolkit, biobanks could avail themselves of the EU Directive establishing an Infrastructure for Spatial Information in the European Community (INSPIRE). 14
The INSPIRE directive aims at creating an EU spatial data infrastructure, to enable the sharing of spatial information among public sector organizations and to facilitate public access to spatial information across Europe. The primary aim of the INSPIRE Directive is to solve problems related to EU and Member States cross-border environmental policies due to the (lack of) availability, quality, organization, accessibility, and sharing of spatial information. 15 It aims at solving these problems by implementing measures that address the exchange, sharing, access, and use of interoperable spatial data and spatial data services across the various levels of public authority and across different sectors. To that end, the Directive establishes an infrastructure to enable the sharing of spatial information in the European Community. Also, it sets out provisions to ensure that spatial information infrastructures in the Member States be designed to ensure spatial data from different sources across the Community be combined in a consistent way and shared between several users and applications. 16
The Directive requires that Member States establish and operate a network of services for the spatial data sets and services for which metadata have been created. It then provides that those services be made available to the public and accessible via the Internet or by any other appropriate means of telecommunication. More specifically, the Directive applies to, inter alia: discovery services, making it possible to search for spatial data sets and services on the basis of the content of the corresponding metadata and to display the content of the metadata; view services, making it possible, as a minimum, to display, navigate, zoom in/out, pan, or overlay viewable spatial data sets and to display legend information and any relevant content of metadata; download services; and transformation services, enabling spatial data sets to be transformed with a view to achieving interoperability. 17
The Directive allows the Member States to limit public access to spatial data sets and services where such access would adversely affect any of a number of specific interests, such as public security, national defense, intellectual property rights, or, notably and of relevance to the linking of exposure data to biobank data using GIS, the confidentiality of personal data, where the person to whom the data relate has not consented to the disclosure of the information to the public. 18 However, the grounds for limiting access are interpreted in a restrictive way, and Member States may not limit access to information on emissions into the environment, in line with the “emissions rule” discussed earlier. According to the INSPIRE Directive, Member States must also ensure that the services for access to spatial data are available to the public free of charge, subject to certain exceptions.
Protecting exposure data as personal data
Having identified EU legislation governing the collection of environmental and geospatial data (Steps 2 and 4 of the GIS Toolkit), the question remaining is whether the actual processing and linking of the exposure data (Steps 5 and 6) somehow or at some stage involve the processing of personal data and hence are subject to data protection laws. The BioSHaRE-EU-GIS Toolkit proposes to geocode the addresses and postcodes of cohort participants and to integrate this information into the toolkit as point data. Geocoding, also known as address matching, is the process of assigning geographical coordinates (e.g., latitude and longitude) to geographical location data, such as home addresses of individuals. These geographical coordinates can then be used to assign an exposure to that particular location or to individuals residing or working at that location. 19 This proposed methodology raises the question of whether it qualifies as the processing of personal data and whether the resultant geographical coordinates are to be considered “personal data,” as defined in the EU Data Protection Directive (95/46/EC). 20
The Directive defines personal data as any information related to an identified or identifiable natural person (“data subject”). An identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors that are specific to his or her physical, physiological, mental, economic, cultural, or social identity. 21 Recital 26 of the Directive provides that “whereas to determine whether a person is identifiable account should be taken of all the means likely reasonably to be used either by the controller or by any other person to identify the said person.” The term “personal data” has been interpreted by the EU's Article 29 Working Party, 22 which generally summarizes that the Directive contains a broad notion of personal data, and that the objective of the rules contained in the Directive is to protect individuals. 23 Therefore, flexibility is embedded in the text to provide an appropriate legal response to the circumstances at stake. 24
Furthermore, even data that do not concern a specific person can nevertheless, in some cases, reveal information about a specific person; telephone numbers, license plates, and, notably, postal codes with house numbers are examples. 25 Along these lines, the geocoding of addresses and zip codes can be interpreted as constituting the processing of personal data, and it is therefore subject to the obligations laid down in the Directive. Whether the resultant geo-coordinates constitute personal data is less clear and requires some further interpretation. According to the Article 29 Working Party, the concept of personal data covers information available in whatever form, in whatever format (e.g., alphabetical, numerical, graphical, photographical, or acoustic), and in whatever medium. 26 This definition seems sufficiently broad to include information in the form of coordinates, such as N40°42′46.021″ W74°0′21.388″.
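As a worked illustration of how such a coordinate is simply numerical information, the sketch below converts the degrees-minutes-seconds example given above into decimal degrees (degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres). The function name `dms_to_decimal` and the accepted input pattern are assumptions made for this example only.

```python
import re

# Hemisphere letter, degrees, minutes, seconds; accepts typographic or ASCII marks.
_DMS = re.compile(r"""([NSEW])\s*(\d+)°\s*(\d+)[′']\s*(\d+(?:\.\d+)?)[″"]""")


def dms_to_decimal(text):
    """Convert one coordinate such as N40°42′46.021″ to signed decimal degrees."""
    match = _DMS.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    hemisphere, degrees, minutes, seconds = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value


lat = dms_to_decimal("N40°42′46.021″")
lon = dms_to_decimal("W74°0′21.388″")
print(round(lat, 6), round(lon, 6))
```

A coordinate quoted to a thousandth of an arc-second, as here, is far more precise than a street address, which is part of why the question of identifiability arises.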
In its Opinion on geo-location services on smart mobile devices, the Article 29 Working Party indicated that location data, for instance, from smart mobile devices, or from base stations and GPS technology, relate to an identified or identifiable natural person, and, consequently, are subject to the provisions of the Data Protection Directive. 27 In reaching its opinion, the Article 29 Working Party considered that this indirect identifiability also applies to WiFi access points. 24 More specifically, the Media Access Control (MAC) address of a WiFi access point, in combination with its calculated location, is inextricably linked to the location of the owner of the access point and, hence, the data controller should treat all data about WiFi routers as personal data.
The Article 29 Working Party further considered that the providers of geo-location-based services can gain an intimate overview of habits and patterns of the owners of devices collecting geo-location data, and they can build extensive user profiles. A behavioral pattern may also include special categories of perhaps more sensitive data, if it were to reveal, for example, visits to hospitals and religious places, presence at political demonstrations, or presence at other specific locations, thus revealing data about, for example, sex life. These profiles can be used to take decisions that significantly affect the owner of the device.
The Article 29 Working Party also stressed that, as with other new technologies, a major risk with the use of location data is “function creep,” which is essentially the eventual development of new uses for the data, which were not anticipated at the time of the original collection. 28 Much of this rationale also applies to the linking of environmental exposure data to the data of individual biobank participants. Although not intended, such proposed linkage may also likely enable the detailed profiling of participants. Indeed, some linkages could include historical (past residential) address data of participants, enabling “lifetime epidemiology” with the help of GIS. Furthermore, the fact that such profiling would occur in the context of not-for-profit (academic) research rather than in a commercial context (for example, through the use of smart mobile devices) does not matter, as the EU definition of personal data does not distinguish between the (alleged) purposes for the use of data.
Case law of Dutch Data Protection Agency
The Dutch Data Protection Agency (DPA) has provided interesting analyses of the question of whether geo-information can constitute personal data.
In the case regarding Google Street View, the DPA rejected the argument made by Google that the possibility to locate a WiFi router in a building does not entail the identification of the holder, as more than one person could use the router. 29 The DPA reasoned that in most cases, the location of the WiFi router can be determined at house door level. Subsequently, the holder of the WiFi router could be identified by using public registries, such as the land registry or the phonebook. It is true that in some cases the WiFi router is shared by multiple people, but in other cases it is used by only one person. Moreover, shared use of a WiFi router does not mean that there is no longer identifiability. Phone numbers, license plates, and zip codes with house numbers can also be shared by multiple people, yet they too qualify as personal data. The mere fact that a category of data is not always traceable to one person does not prevent this category, in general, from being qualified as personal data. The DPA further reasoned that neither the lack of intent to identify nor the costs of identification were relevant, as the conclusion was that Google could establish location and, subsequently, identification without disproportionate effort, which is the legal test. Finally, referring to the Article 29 Working Party Opinion on Geolocation Services regarding smart phones, the DPA stressed that it is the combination of data that collectively qualifies as personal data: data that originally relate to an object, coupled with other data, result in personal data. In view of this case law, the geo-referencing of addresses and postcodes of the cohort participants that is a part of the GIS Toolkit (Steps 4 and 5) is likely to qualify as the processing of personal data (even if done on a no-names basis).
Our qualification (“likely”) stems from the fact that geo-location data pose different challenges in urban and rural areas, since location information by street in Manhattan is very different from that in rural areas. Indeed, environmental exposures within an urban environment vary greatly due to the density and mix of sources. In rural areas, however, sources are sparser and thus exposures are more constant. The challenge, therefore, is to optimize the geo-location data within urban areas, as this is where the variation in environmental exposures is the greatest. This optimization could also take into account that not all studies need location information down to the arc-second (latitude, longitude) or to the meter. The spatial resolution that is ideally needed depends on the exposure being studied: some pollutants decline rapidly with distance from the source (for example, some road traffic–related air pollutants and EMFs), whereas others are more similar over larger areas (for example, aircraft noise). It also depends on what spatial scale can be obtained in population health data (e.g., area-level census data versus address-level cohort data). One avenue, then, to avoid the application of the Data Protection Directive would be to lower the spatial resolution in the exposure study concerned. Whether this result (the geo-data used no longer qualifying as personal data) could be achieved must be established on a study-by-study basis.
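A minimal sketch of what lowering the spatial resolution could look like in practice is given below: coordinates are rounded to fewer decimal places, and the smallest number of participants sharing one rounded location is counted. This is an illustrative assumption, not a method prescribed by the Directive or by the toolkit; the coordinates and the names `coarsen` and `smallest_cell_size` are hypothetical, and whether a given cell size suffices legally remains a study-by-study question.

```python
from collections import Counter


def coarsen(lat, lon, decimals):
    """Round a coordinate pair to the given number of decimal places."""
    return (round(lat, decimals), round(lon, decimals))


def smallest_cell_size(points, decimals):
    """Smallest number of participants sharing one coarsened location."""
    cells = Counter(coarsen(lat, lon, decimals) for lat, lon in points)
    return min(cells.values())


# Hypothetical participant coordinates in one urban neighbourhood
points = [
    (52.37012, 4.89021),
    (52.37048, 4.89043),
    (52.37391, 4.89377),
    (52.36655, 4.88612),
]

# Four decimals is roughly 11 m in latitude; two decimals is roughly 1.1 km.
for decimals in (4, 2):
    print(decimals, smallest_cell_size(points, decimals))
```

The trade-off mirrors the text: coarser cells make it harder to single out a household, but they also blur exposures that vary over short distances, such as traffic-related air pollution.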
To the extent that the EU Data Protection Directive applies, measures to protect privacy must be taken. These are built into the GIS Toolkit, as described next.
The BioSHaRE-EU-GIS Toolkit: linking address data to environmental data
Taking the earlier considerations of the legal aspects into account, the Integrated BioSHaRE-EU-GIS Toolkit was delivered (see Supplementary materials; Supplementary materials are available online at www.liebertpub.com/bio). The kit gives a comprehensive overview of several procedures and methods that can be used to assist the integration and analysis of cohort/biobank data on environmental exposures and human health. Schematically, the Toolkit can be presented as follows (Fig. 1).

FIG. 1. BioSHaRE-EU GIS Toolkit: procedures for linking environmental exposure data to individuals participating in cohorts/biobanks, while safeguarding privacy. GIS, Geographic Information Systems; BioSHaRE-EU, Biobank Standardisation and Harmonisation for Research Excellence in the European Union.
The biobank's participant's individual address and health data (based on questionnaires and biological samples) are usually held in a secure database. The following steps describe the procedure:
1. Addresses are exported together with a unique identifier (ID1);
2. Each address is geocoded (for example, assigned to an x, y coordinate);
3. Environmental exposures are modeled and assigned to each individual record;
4. An enriched dataset, including x, y coordinates and environmental exposures, is handed back to the cohort/biobank for integration into the secure database on the basis of the unique identifier ID1;
5. Under a second unique identifier (ID2, which should differ from ID1 to prevent linkage), the environmental exposure and health data are then securely transferred to the researcher conducting health analyses. The researcher has health and exposure information but not address information, which makes the dataset non-identifiable. 1
Where a mediator, or an authority serving as a mediator, links the information, this process would, in the United Kingdom, be subject to the usual ethics and governance approvals. For more details, including a method to document the accuracy of the geocoding procedure, please refer to the full description of the BioSHaRE-EU Integrated EU GIS Toolkit. 1
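The procedure described above can be sketched in a few lines of code. This is a simplified illustration under stated assumptions, not the toolkit's implementation: the records, the `geocode` lookup table, and the `model_exposure` formula are hypothetical placeholders, and the sketch assumes that x, y coordinates count as address information and are therefore withheld from the researcher.

```python
import secrets


def geocode(address):
    """Hypothetical stand-in for a geocoder: a fixed lookup table."""
    table = {"1 Example Street": (100.0, 200.0), "2 Sample Road": (300.0, 400.0)}
    return table[address]


def model_exposure(x, y):
    """Hypothetical placeholder for an exposure model; returns a made-up value."""
    return round((x + y) / 100.0, 2)


def run_linkage(secure_db):
    # Step 1: export addresses under a random identifier ID1 (no names, no health data).
    id1_of = {pid: secrets.token_hex(8) for pid in secure_db}
    pid_of_id1 = {id1: pid for pid, id1 in id1_of.items()}
    export = [{"id1": id1_of[pid], "address": rec["address"]}
              for pid, rec in secure_db.items()]

    # Steps 2 and 3: geocode each address and assign a modeled exposure.
    enriched = []
    for row in export:
        x, y = geocode(row["address"])
        enriched.append({"id1": row["id1"], "x": x, "y": y,
                         "exposure": model_exposure(x, y)})

    # Step 4: integrate the enriched records into the secure database via ID1.
    for row in enriched:
        rec = secure_db[pid_of_id1[row["id1"]]]
        rec.update(x=row["x"], y=row["y"], exposure=row["exposure"])

    # Step 5: release under a fresh ID2, without address or coordinates.
    return [{"id2": secrets.token_hex(8), "health": rec["health"],
             "exposure": rec["exposure"]} for rec in secure_db.values()]


# Hypothetical secure database held by the cohort/biobank
secure_db = {
    "participant-A": {"address": "1 Example Street", "health": {"asthma": True}},
    "participant-B": {"address": "2 Sample Road", "health": {"asthma": False}},
}
release = run_linkage(secure_db)
print(release)
```

Because ID2 is generated independently of ID1, the party that performed the geocoding cannot match its address file against the researcher's dataset without going back through the secure database.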
Conclusion
This article examines the legal aspects under EU law of linking population biobank data to environmental exposure data of individual biobank participants by using GIS. Our analysis of the pertinent EU legal framework reveals, on the one hand, that EU law actually facilitates the collection of both environmental data and geospatial data. Indeed, under EU law, biobanks and researchers have the right to collect environmental data and to access geospatial data. The EU Directive on Access to Environmental Information mandates that environmental information should be readily available to the public, at street level, on the web, from a host of public sources, and it does not reveal specific limitations to linking health and GIS data. A similar conclusion is reached in our review of the INSPIRE Directive, which provides that Member States shall establish and operate a network of services for spatial data sets and services, and that these services (discovery, viewing, downloading) shall be available to the public and accessible via the Internet. Indeed, both Directives even provide that the right to access information on emissions into the environment may override the confidentiality of personal data, even where the person concerned has not consented to the disclosure of the information to the public (the “emissions rule”). On the other hand, however, linkage of health and environmental data, using GIS data, does pose a problem under the EU Directive on the Protection of Personal Data, as the geographic coordinates resulting from geocoding zip codes and street addresses of biobank participants are likely to qualify as personal data. One avenue to avoid the application of the Data Protection Directive would be to lower the spatial resolution in the exposure study concerned. Whether this result (the geo-data used no longer qualifying as personal data) could be achieved must be established on a study-by-study basis.
To the extent that the EU Data Protection Directive does apply, the processing of biobank-geo data would typically require participants' consent. In addition, measures must be in place to protect their privacy. We present a model GIS Toolkit that incorporates such measures.
Acknowledgments
The research leading to these results was supported by the Biobank Standardisation and Harmonisation for Research Excellence in the European Union (BioSHaRE-EU) program that received funding from the European Union Seventh Framework Program (FP7/2007–2013) under grant agreement No. 261433.
Author Disclosure Statement
No conflicting financial interests exist.