Data Classification

Introduction

The type of data in question is crucial, as it determines the applicable legal norms. If data are not personal data, no particular data protection issue arises, as the General Data Protection Regulation (GDPR), the main binding legal text on data protection in the European Union (EU), only applies to personal data. The concept of personal data thus plays a key role in the GDPR, defining its material scope. The provisions of the GDPR do not apply to data that:

do not concern humans (e.g., data on natural phenomena); or
though concerning humans, do not refer to particular individuals because they have been anonymised, that is, they have lost their connection to specific individuals (even though the effectiveness of anonymisation techniques remains to be demonstrated).

Within the field of personal data, there is a special category of data, referred to in Article 9 of the GDPR as ‘special categories of personal data’ and commonly known as “sensitive data”, which notably includes genetic data and data concerning health. These data require additional protection because they can go to the very core of a human being, so their unauthorised disclosure may lead to various forms of discrimination and to violations of fundamental rights. The special nature of sensitive data is reflected in the requirements for their processing: any processing of personal data requires a legal ground, as set forth in Article 6(1) of the GDPR, but the processing of sensitive data additionally requires one of the grounds listed in Article 9(2) of the GDPR.

On 3 May 2022, the European Commission published its Proposal for a Regulation on the European Health Data Space (EHDS), intended to be a central piece of the European Health Union. Considering the value of health data today, but also the significant constraints the GDPR imposes on their use within the EU, this legal text aims to facilitate the use, reuse and sharing of electronic health data, both personal and non-personal.

The EHDS Regulation Proposal would govern every matter involving ‘electronic health data’, a concept with a very broad scope, as it also covers fitness and lifestyle data. The Proposal rests on a broad conception of the purposes for which data can be reused (i.e., secondary uses): healthcare, obviously, but also research, the development and testing of artificial intelligence systems, and the drafting of health policies. Note, however, that certain uses are expressly prohibited, such as advertising or marketing activities and decisions excluding individuals from insurance contracts or modifying their premiums (Article 35 EHDS Regulation Proposal).

As regards personal data, the future EHDS Regulation must be read in conjunction with the GDPR in order to avoid conflicts between the two, but for the time being the relationship between them remains unclear.

In the development of gene and cell therapies, all these types of data may be used: non-personal data, personal data and, more specifically, sensitive data such as genetic or health data, as well as electronic health data.

Main actors

Under the GDPR the main stakeholders are the data controllers, the data processors, and the data subjects.

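A data controller is the natural person (a human being) or legal person (e.g., a company or institution), public authority, agency or other body that, alone or jointly with others, determines the purposes and means of the processing of personal data, that is, which data will be processed, for which purpose and by which means. (Article 4(7) GDPR)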

A data processor is the natural or legal person, public authority, agency or other body that processes personal data on behalf of the controller. There is not always a data processor, as the controller may carry out all processing tasks itself. (Article 4(8) GDPR)

The data subject is the natural person (therefore, only human beings, not companies or any other legal persons) to whom the personal data refer. (Article 4(1) GDPR)

The data processing activities of data controllers and data processors are monitored by national and regional data protection authorities, who have the power to impose the heavy fines provided for in the GDPR.

The EHDS Regulation Proposal would introduce new categories of relevant stakeholders: data recipients, data holders, data users and health data access bodies.

Data recipient would be the natural or legal person that receives data from another controller in the context of the primary use of electronic health data. (Article 2(2)(k) EHDS Regulation Proposal)

A data holder would be the natural or legal person operating in the healthcare domain (as a healthcare provider, researcher, developer of AI medical tools, or entity or institution in charge of monitoring the activity of another player) that has electronic health data under its control, to be transmitted to data recipients for primary use or made available to data users for secondary use. (Article 2(2)(y) EHDS Regulation Proposal)

A data user would be the natural or legal person who has lawful access to electronic health data for secondary use. (Article 2(2)(z) EHDS Regulation Proposal)

Health data access bodies would be administrative authorities to be designated in the Member States to perform the tasks listed in Article 37 of the EHDS Regulation Proposal, notably issuing the data permits that allow data users to access electronic health data for secondary purposes, arguably in a transparent, simplified and secure way. When a data permit is issued, the health data access body and the data user are deemed joint controllers of the data processed under it (Article 51 EHDS Regulation Proposal).

Definitions

Personal data: any information relating to an identified or identifiable natural person (the data subject). The category of personal data defines the scope of the GDPR, as its strict rules only apply to personal data. (Article 4(1) GDPR)

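Identifiable person: a natural person who has not been identified but who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data or an online identifier, or to one or more factors specific to his or her physical, physiological, genetic, mental, economic, cultural or social identity. (Article 4(1) GDPR)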

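Data processing is any operation or set of operations performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, storage, consultation, use, disclosure or erasure. (Article 4(2) GDPR)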

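Pseudonymised data are data that can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures ensuring that the data are not attributed to an identified or identifiable natural person. The GDPR does not define this concept as such, but it defines ‘pseudonymisation’ in Article 4(5). Pseudonymised data remain personal data and therefore fall within the scope of the GDPR (Recital 26 GDPR).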
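To illustrate the separation between pseudonymised data and the ‘additional information’, the following Python sketch replaces a direct identifier with a keyed hash (HMAC-SHA256). This is only one possible pseudonymisation technique, assumed here for illustration; the field names (`patient_id`, `variant`) and the records are hypothetical. Whoever holds the key can recompute the link, which is why the output remains personal data, and the genetic content of a record may itself still identify the person.

```python
import hashlib
import hmac
import secrets


def make_pseudonym(identifier: str, key: bytes) -> str:
    """Derive a stable pseudonym from a direct identifier with HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records
    can still be linked; without the key, the pseudonym cannot practically
    be traced back to the identifier.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise(records: list[dict], key: bytes) -> list[dict]:
    """Return copies of the records with 'patient_id' replaced by a pseudonym."""
    result = []
    for record in records:
        copy = dict(record)  # leave the original record untouched
        copy["pseudonym"] = make_pseudonym(copy.pop("patient_id"), key)
        result.append(copy)
    return result


# The key plays the role of the 'additional information' in Article 4(5) GDPR:
# it must be kept separately (e.g. by another team or in a key-management system).
SECRET_KEY = secrets.token_bytes(32)

# Hypothetical example records.
records = [
    {"patient_id": "PT-0001", "variant": "example-variant-1"},
    {"patient_id": "PT-0002", "variant": "example-variant-2"},
]
pseudonymised = pseudonymise(records, SECRET_KEY)
```

Using a secret key rather than a plain hash is the design choice that matters here: a plain hash of a predictable identifier could be reversed by hashing candidate identifiers, whereas a keyed hash ties re-identification to access to the separately stored key.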

Anonymised data are, a contrario sensu from Article 4(1) of the GDPR, data that do not allow the identification of the person to whom they refer (for example, aggregated data). For data to become anonymised, they must undergo an anonymisation process. It has been said that genetic data are never truly anonymous. More generally, according to the authors of a paper on re-identification published in 2019 in Nature Communications, ‘even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model’.[1]

Genetic data are personal data ‘relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question’. (Article 4(13) GDPR)

Health data (or, in the wording of Article 4(15) GDPR, ‘data concerning health’) are ‘personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status’.

According to Article 9(1) of the GDPR, sensitive data are personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, as well as genetic data, biometric data processed for the purpose of uniquely identifying a natural person, data concerning health and data concerning a natural person’s sex life or sexual orientation. Sensitive data are a special category of personal data whose legal protection is more stringent due to the intimate connection between these data and the human person. Because of this connection, undue processing of these data, or a data breach involving them, may pose severe threats to the fundamental rights of the data subject. Therefore, the processing of this type of data is, as a rule, forbidden by Article 9(1) of the GDPR, unless one of the conditions provided for in Article 9(2) of the GDPR is met in the specific case.
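The classification logic described so far can be summarised as a decision procedure. The Python sketch below is a simplification under stated assumptions: the difficult legal question of identifiability is reduced to a boolean that a prior assessment must supply, the category labels are hypothetical, and, as a conservative choice, allegedly anonymised genetic data are kept in scope, in line with the EDPB advice discussed under ‘Challenges’.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    NOT_PERSONAL = "outside the material scope of the GDPR"
    PERSONAL = "personal data: an Article 6(1) ground is needed"
    SPECIAL = "special category data: Article 6(1) and Article 9(2) grounds are needed"


# Hypothetical labels for the categories listed in Article 9(1) GDPR.
ARTICLE_9_CATEGORIES = frozenset({
    "racial_or_ethnic_origin",
    "political_opinions",
    "religious_or_philosophical_beliefs",
    "trade_union_membership",
    "genetic",
    "biometric_for_unique_identification",
    "health",
    "sex_life_or_sexual_orientation",
})


@dataclass
class DataItem:
    relates_to_natural_person: bool
    # Outcome of a prior legal assessment; pseudonymised data count as identifiable.
    identifiable: bool
    categories: frozenset = field(default_factory=frozenset)


def classify(item: DataItem) -> Category:
    if not item.relates_to_natural_person:
        return Category.NOT_PERSONAL  # e.g. data on natural phenomena
    # Conservative rule: genetic data are treated as personal data
    # even when allegedly anonymised.
    if "genetic" in item.categories:
        return Category.SPECIAL
    if not item.identifiable:
        return Category.NOT_PERSONAL  # anonymised data
    if item.categories & ARTICLE_9_CATEGORIES:
        return Category.SPECIAL
    return Category.PERSONAL
```

For example, `classify(DataItem(True, False, frozenset({"genetic"})))` returns `Category.SPECIAL`, whereas the same call with `"health"` instead of `"genetic"` returns `Category.NOT_PERSONAL`.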

‘Personal electronic health data’ is a concept introduced by the EHDS Regulation Proposal. It refers to data concerning health and genetic data as defined in Regulation (EU) 2016/679, as well as data referring to determinants of health or data processed in relation to the provision of healthcare services, processed in electronic form. (Article 2(2)(a) EHDS Regulation Proposal)

[1] Rocher, L., Hendrickx, J.M. & de Montjoye, YA. Estimating the success of re-identifications in incomplete datasets using generative models. Nat Commun 10, 3069 (2019). https://doi.org/10.1038/s41467-019-10933-3.

Challenges

The GDPR governs the processing of personal data. In the context of health and human genetic data processing, determining when data are personal data is of fundamental importance and, at the same time, a challenge.

As discussed in the subentry on ‘Data Protection Main Principles’, the intrinsic nature and characteristics of human genetic data raise complex challenges not only for data classification but also for the interpretation and application of data protection principles and rules as a whole.

Concerning the processing of human genetic data within the clinical, health and pharmaceutical ecosystems, including for research purposes, it is becoming increasingly difficult to determine when genomic data are to be considered personal data. As technical and scientific developments unfold, the analysis of human genetic data becomes increasingly specialised and complex. This may fragment human genetic data to such an extent that it becomes difficult to classify them as genetic information, health information, or personal or non-personal data.

Furthermore, human genetic data are often pseudonymised or processed through automated means that seek to remove their personal or identifying elements. Very often, such measures do not ensure irreversible anonymisation, because a link, albeit indirect, with the data subject to whom the data relate subsists.

At the same time, genomic data may be seen as inherently identifying personal data (even where there are no further links to or impacts on the data subject), since they are, in themselves, an inextricable part of a person’s identity.

Another challenge is the interpretation of the legal concept of human genetic data. The scientific and legal ambiguities surrounding this kind of data, together with the current tendency towards legal fragmentation and the lack of a homogeneous approach, stand in the way of a clear definition. This creates risks for the application of existing data protection principles and requirements, and undermines the prospects for specific regulations, guidelines or best practices on the processing of human genetic data across the EU.

Although the concepts of genetic data and health data overlap, genetic data differ in that they may provide insight into a data subject’s future health condition. Genetic data may also reveal information about groups of individuals (such as family members). This complicates the application of data protection principles and provisions, as most legal frameworks take an individualistic approach to data protection rights.

Any processing of personal data must comply with the GDPR, often described as the most stringent data protection law in the world. When the data at stake are sensitive data, such as health and genetic data, compliance becomes even more complicated, first of all because two cumulative legal grounds are needed for the processing to be lawful: a general legal ground, required for every kind of personal data processing and set forth in Article 6(1) of the GDPR, and a specific legal ground, required only for sensitive data and established in Article 9(2) of the GDPR.

Failure to comply with the GDPR may lead to heavy administrative fines. Article 83 of the GDPR provides for two tiers of fines. For less serious infringements, fines can reach €10 million or, in the case of an undertaking, 2% of its total worldwide annual turnover in the preceding financial year, whichever is higher (Article 83(4) GDPR). For the most serious infringements, fines can reach €20 million or, in the case of an undertaking, 4% of its total worldwide annual turnover in the preceding financial year, whichever is higher (Article 83(5) GDPR). In addition, data controllers and/or data processors whose infringement of the GDPR caused material or non-material damage to a natural person may be required to pay compensation for that damage, under Article 82 of the GDPR.
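The ‘whichever is higher’ rule in Article 83(4) and (5) amounts to a simple calculation. The Python sketch below computes only the statutory maximum; the actual amount is set case by case by the supervisory authority under the criteria of Article 83(2). The tier keys and the function name are hypothetical.

```python
from typing import Optional

# Upper limits in Article 83(4) and 83(5) GDPR: (fixed amount in EUR, % of turnover).
FINE_CAPS = {
    "art_83_4": (10_000_000, 2),
    "art_83_5": (20_000_000, 4),
}


def max_fine_eur(tier: str, worldwide_turnover_eur: Optional[int] = None) -> float:
    """Return the maximum fine for a tier (not the fine actually imposed)."""
    fixed_cap, percent = FINE_CAPS[tier]
    if worldwide_turnover_eur is None:
        # Not an undertaking: only the fixed amount applies.
        return fixed_cap
    turnover_cap = worldwide_turnover_eur * percent / 100
    return max(fixed_cap, turnover_cap)  # "whichever is higher"


print(max_fine_eur("art_83_5", 2_000_000_000))  # 80000000.0
print(max_fine_eur("art_83_5", 100_000_000))    # 20000000
```

For an undertaking with a €2 billion turnover, 4% (€80 million) exceeds the fixed €20 million, so the higher figure is the cap; for a €100 million turnover, 4% is only €4 million and the fixed amount prevails.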

The GDPR also allows Member States to lay down criminal penalties for its infringement in their national laws (Recital 149 GDPR).

One way to avoid falling under the GDPR, and thus its severe sanctions, is to work with anonymised data instead. However, full anonymisation is extremely difficult to achieve, as even the smallest detail can reveal a natural person’s identity.[2] The challenges posed by anonymisation are especially acute for health data and genetic data. Some have argued that anonymisation is impossible to achieve, as it is always possible to reverse the process and identify the data subject. This is especially true for genetic data, which relate to core identifying features of the human person.
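Why ‘even the smallest detail’ can single out a person can be shown with k-anonymity, a simple re-identification risk measure that the text does not mention and that is used here purely as an illustration. The Python sketch counts how many records share each combination of quasi-identifiers (attributes such as year of birth or postcode that are not names but can identify in combination). The attributes and records are hypothetical. A high k is not proof of anonymisation under the GDPR, and the check says nothing about genomic sequences, which can be identifying on their own.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group; k == 1 means at least one record is unique."""
    return min(class_sizes(records, quasi_identifiers).values())


def unique_records(records, quasi_identifiers):
    """Records that no other record matches on the quasi-identifiers."""
    sizes = class_sizes(records, quasi_identifiers)
    return [r for r in records if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]


# Hypothetical records from which names and IDs have already been removed.
records = [
    {"birth_year": 1980, "postcode": "1000", "sex": "F", "diagnosis": "A"},
    {"birth_year": 1980, "postcode": "1000", "sex": "F", "diagnosis": "B"},
    {"birth_year": 1975, "postcode": "1100", "sex": "M", "diagnosis": "A"},
]
qi = ["birth_year", "postcode", "sex"]
print(k_anonymity(records, qi))  # 1
```

Here k is 1 because the third record is the only one born in 1975: anyone who knows those three attributes about a person can learn that person's diagnosis, even though no name appears in the data.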

Recital 34 of the GDPR gives genetic data a broad meaning, covering not only chromosomal, DNA or RNA analysis but also the analysis of any other element enabling equivalent information to be obtained. It may be suggested that (i) not all genetic information is genetic data; (ii) not all genetic information is personal data; (iii) a genetic sample is not personal data until its analysis and the production of data allow conclusions to be drawn; and (iv) data resulting from the analysis of genetic data only constitute personal data to the extent that such genetic data are associated with an identifiable person. Nevertheless, the GDPR’s definition of personal data is broad enough to encompass any identification by reference to factors specific to the genetic identity of an individual, regardless of the means of analysis involved.

Consequently, in practice, the processing of human genetic data (which include core genetic markers uniquely related to a person), as well as the results stemming from such processing, will most likely involve personal data. Moreover, the processing of special categories of personal data, in particular genetic data, must comply with Article 9 of the GDPR.

Additionally, all results stemming from the analysis of genetic data which are linked to a specific biological sample are usually personal data, even if the results themselves are not unique to the individual, because the sample is by its nature specific to an individual and provides the link back to his/her specific genetic identity.

Nevertheless, in some cases genetic data may not constitute personal data. This would be the case, for instance, of anonymised information (such as genetic testing results) that can no longer, in any circumstances, be associated with a specific person, provided that no further records of genetic identity or any other identifiers exist. However, as seen above, this scenario is uncommon in practice.

In a 2021 document, the European Data Protection Board (EDPB) acknowledged the fallibility of anonymisation techniques for genetic data and recommended that data controllers always treat such data as personal data, even when allegedly anonymised: "The EDPB points out that the possibility to anonymise genetic data remains an unresolved issue. As yet, it remains open to be demonstrated whether any combination of technical and organisational means can be effectively employed to remove genetic information from the material scope of the GDPR (...) it is strongly advised that such genetic data is treated as personal data and that the processing thereof is conducted with the implementation of appropriate technical and organisational measures to ensure compliance with the Regulation" [3]

[2] Finck, M., & Pallas, F. They who must not be identified—distinguishing personal from non-personal data under the GDPR. International Data Privacy Law, 10(1), 11-36, 2020.

[3] European Data Protection Board, EDPB Document on response to the request from the European Commission for clarifications on the consistent application of the GDPR, focusing on health research, 2 February 2021, https://edpb.europa.eu/sites/default/files/files/file1/edpb_replyec_questionnaireresearch_final.pdf

Opportunities and incentives

The European Commission has noted that an enormous volume of data is not being used in the EU (possibly due to the limitations imposed by the GDPR and the fear of breaching its requirements), which has resulted in huge financial losses and undermined the EU’s technological development. The EHDS Regulation Proposal aims to respond to these needs by creating a more flexible regime for electronic health data. Unlike the GDPR, which strongly restricts the use, reuse and sharing of personal data, the EHDS Regulation Proposal aims to boost all these activities. One of its promising provisions is Article 34, which sets out the purposes for which data users may process electronic health data for secondary use, pursuant to a data permit issued by a health data access body (data access applications targeting multiple data holders) (Articles 45-46 EHDS Regulation Proposal) or by the data holder itself (data access applications addressed to a single data holder in a single Member State) (Article 49 EHDS Regulation Proposal).

Another possible way to process data without being bound by the demanding criteria of the GDPR is to use synthetic data, that is, artificially generated data, typically produced by artificial intelligence models.[4] Some experts claim that synthetic data are not personal data and, as such, fall outside the GDPR.[5] However, this assessment is still open to debate. A publication by the European Data Protection Supervisor[6] notes that synthetic data may allow the identification of the individuals to whom the data used to create them refer. If this proves correct, such synthetic data would still have to be considered personal data.

[4] Synthetic Data - What Is It and What You Need to Know About It, Hyperight, Feb. 22, 2022

[5] What Privacy Officers Need to Know About Synthetic Data, WireWheel, July 13, 2021

[6] Synthetic Data | European Data Protection Supervisor

Practical steps

Before initiating any processing of personal data, the data controller must ensure that its organisation complies with the principles of data protection by design and by default (Article 25 of the GDPR). This prior assessment may prevent future lawsuits for failure to adequately protect the personal data being processed. Relevant technical safeguards include procedures for automatically classifying personal data, pseudonymisation and encryption of data, and cybersecurity measures.

Appointing a Data Protection Officer (DPO) is always recommended to help ensure that data processing complies with the GDPR. In some scenarios, however, it is mandatory: when the processing is carried out by a public authority or body; when the core activities of the organisation consist of processing sensitive data (or data relating to criminal convictions and offences) on a large scale; or when those core activities require regular and systematic monitoring of individuals on a large scale (Article 37(1) of the GDPR).
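The Article 37(1) triggers can be written as a simple check. The Python sketch below is an assumption-laden simplification: each trigger is a boolean that requires its own legal assessment (what counts as ‘core activities’ or ‘large scale’ is not computed here), the class and field names are hypothetical, and Union or Member State law may require a DPO in further cases (Article 37(4) GDPR).

```python
from dataclasses import dataclass


@dataclass
class Organisation:
    # Article 37(1)(a): public authority or body (courts acting in their
    # judicial capacity are excluded).
    public_authority: bool = False
    # Article 37(1)(b): core activities require regular and systematic
    # monitoring of data subjects on a large scale.
    core_large_scale_monitoring: bool = False
    # Article 37(1)(c): core activities consist of large-scale processing of
    # Article 9 data or of Article 10 (criminal convictions) data.
    core_large_scale_special_data: bool = False


def dpo_mandatory(org: Organisation) -> bool:
    """True if any Article 37(1) GDPR trigger applies."""
    return (
        org.public_authority
        or org.core_large_scale_monitoring
        or org.core_large_scale_special_data
    )


# Hypothetical private clinic whose core activity is large-scale
# processing of patients' health and genetic data.
clinic = Organisation(core_large_scale_special_data=True)
```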

Each data processing activity requires an appropriate legal ground to be lawful, as required by Article 5(1)(a) of the GDPR, and that ground must be one of those listed in Article 6(1). The data controller must analyse which of the situations listed in Article 6(1) is most appropriate in light of the particular circumstances.

Consent might seem the most intuitive option (Article 6(1)(a) of the GDPR), but in fact it is extremely difficult to meet the demanding conditions laid down for consent as a legal ground: it must be informed, freely given, specific and expressed by a clear affirmative action. Therefore, consent cannot be inferred from mere behaviour, such as accepting a pre-ticked box (it must be ‘opt in’ and not ‘opt out’).

When personal data are also sensitive data, an additional legal ground is required, to be found in Article 9(2) of the GDPR. The grounds in Article 6(1) and Article 9(2) of the GDPR do not fully correspond to each other, which makes the search for a double legal ground a complex task.
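The cumulative structure can be made explicit in code. In the Python sketch below, the enum members paraphrase the grounds of Article 6(1)(a)-(f) and Article 9(2)(a)-(j); their names and the function are hypothetical. The two lists do not mirror each other: for instance, consent under Article 6(1)(a) does not by itself satisfy Article 9(2)(a), which requires explicit consent. The check only verifies that both grounds are present, not that their own conditions are met.

```python
from enum import Enum
from typing import Optional


class Art6Ground(Enum):
    CONSENT = "6(1)(a)"
    CONTRACT = "6(1)(b)"
    LEGAL_OBLIGATION = "6(1)(c)"
    VITAL_INTERESTS = "6(1)(d)"
    PUBLIC_TASK = "6(1)(e)"
    LEGITIMATE_INTERESTS = "6(1)(f)"


class Art9Ground(Enum):
    EXPLICIT_CONSENT = "9(2)(a)"
    EMPLOYMENT_SOCIAL_SECURITY = "9(2)(b)"
    VITAL_INTERESTS = "9(2)(c)"
    NOT_FOR_PROFIT_BODY = "9(2)(d)"
    MADE_PUBLIC_BY_SUBJECT = "9(2)(e)"
    LEGAL_CLAIMS = "9(2)(f)"
    SUBSTANTIAL_PUBLIC_INTEREST = "9(2)(g)"
    HEALTH_OR_SOCIAL_CARE = "9(2)(h)"
    PUBLIC_HEALTH = "9(2)(i)"
    ARCHIVING_RESEARCH_STATISTICS = "9(2)(j)"


def has_required_grounds(
    special_category: bool,
    art6: Optional[Art6Ground],
    art9: Optional[Art9Ground] = None,
) -> bool:
    """Check that the cumulative grounds are present (not that they are valid)."""
    if art6 is None:
        return False  # every processing needs an Article 6(1) ground
    if special_category:
        return art9 is not None  # sensitive data also need an Article 9(2) ground
    return True
```

A real assessment must also confirm that each ground's own conditions are satisfied and, for several Article 9(2) grounds, that Union or Member State law provides the required legal basis.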

A Data Protection Impact Assessment (DPIA) must be carried out whenever a type of processing, in particular one using new technologies, is likely to result in a high risk to the rights and freedoms of natural persons (Article 35(1) of the GDPR). This is in particular the case for a systematic and extensive evaluation of personal aspects based on automated processing, including profiling, on which decisions producing legal or similarly significant effects are based; large-scale processing of sensitive data (as listed in Article 9(1) of the GDPR) or of data relating to criminal convictions and offences; and systematic monitoring of a publicly accessible area on a large scale (Article 35(3) of the GDPR).

This would certainly be the case for a biotechnology company offering genetic tests directly to consumers in order to assess and predict health risks or, similarly, for a healthcare provider or hospital processing patients’ genetic and health data through its hospital information system.

Among other triggers, the processing of human genetic data often involves sensitive data or data of a highly personal nature, data concerning vulnerable data subjects and/or data processed on a large scale. Stakeholders should bear in mind that, whenever these factors apply, a DPIA should be undertaken.
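The factors above echo the Article 29 Working Party guidelines on DPIAs (WP248 rev.01, endorsed by the EDPB), which list nine criteria and suggest, as a rule of thumb, that processing meeting two or more of them will usually require a DPIA. The Python sketch below combines that rule of thumb with the mandatory Article 35(3) cases; all labels are hypothetical paraphrases, and the result is an indication for further assessment, not a legal conclusion.

```python
# Article 35(3) GDPR cases in which a DPIA is always required (paraphrased).
ARTICLE_35_3 = frozenset({
    "systematic_extensive_automated_evaluation",  # incl. profiling, with legal or similar effects
    "large_scale_special_or_criminal_data",       # Article 9(1) or Article 10 data
    "large_scale_public_area_monitoring",
})

# The nine WP248 rev.01 criteria, paraphrased as hypothetical labels.
WP248_CRITERIA = frozenset({
    "evaluation_or_scoring",
    "automated_decision_with_significant_effect",
    "systematic_monitoring",
    "sensitive_or_highly_personal_data",
    "large_scale",
    "matching_or_combining_datasets",
    "vulnerable_data_subjects",
    "innovative_technology",
    "prevents_exercise_of_rights",
})


def dpia_indicated(features: set) -> bool:
    """True if an Article 35(3) case applies or two or more WP248 criteria are met."""
    unknown = features - ARTICLE_35_3 - WP248_CRITERIA
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return bool(features & ARTICLE_35_3) or len(features & WP248_CRITERIA) >= 2


# Hypothetical direct-to-consumer genetic testing service.
dtc_genetic_test = {"sensitive_or_highly_personal_data", "large_scale", "innovative_technology"}
```

The direct-to-consumer example meets three criteria, so `dpia_indicated(dtc_genetic_test)` is `True`; rejecting unknown labels keeps a typo from silently lowering the count.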

In some cases, the data controller will be required to maintain records of the processing activities (Article 30 of the GDPR):

When the organisation employs 250 or more persons;
When the processing is likely to result in a risk to the rights and freedoms of data subjects;
When the processing is not occasional (‘not occasional’ is not the same as ‘systematic’); or
When the processing involves sensitive data (in the sense of Article 9(1) of the GDPR) and/or data about criminal convictions and criminal offences (as per Article 10 of the GDPR).
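Article 30(5) of the GDPR frames these conditions as an exemption for organisations employing fewer than 250 persons, which is lost if any one of the other conditions applies. The Python sketch below, with hypothetical parameter names, encodes that structure.

```python
def records_required(
    employees: int,
    likely_risk: bool,
    not_occasional: bool,
    special_or_criminal_data: bool,
) -> bool:
    """Sketch of Article 30(5) GDPR: is a record of processing activities required?"""
    if employees >= 250:
        return True  # the exemption only covers organisations with fewer than 250 persons
    # A smaller organisation loses the exemption if any one condition is met.
    return likely_risk or not_occasional or special_or_criminal_data
```

In practice, a small organisation that regularly processes health or genetic data meets two of the conditions at once (non-occasional processing and sensitive data), so the exemption will rarely apply in the gene and cell therapy context.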

Interactions with regulators

Any processing of personal data is subject to the supervision and, in some cases, the prior consultation of national (and in some countries also regional) data protection authorities, called ‘supervisory authorities’ under the GDPR. These authorities, in turn, cooperate closely within the European Data Protection Board, which ensures the consistent application of the GDPR within the EU.

The EHDS Regulation Proposal would introduce two new regulatory bodies: the national health data access bodies and the European Health Data Space Board.

European Union Legislation

Charter of Fundamental Rights of the European Union, 26 October 2012, OJ C 326, 26.10.2012, p. 391-407, CELEX number: 12012P/TXT

Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance), OJ L 119, 4.5.2016, p. 1-88, CELEX number: 32016R0679

Directive (EU) 2016/680 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data by competent authorities for the purposes of the prevention, investigation, detection or prosecution of criminal offences or the execution of criminal penalties, and on the free movement of such data, and repealing Council Framework Decision 2008/977/JHA, OJ L 119, 4.5.2016, p. 89-131, CELEX number: 32016L0680

Proposal for a Regulation of the European Parliament and of the Council on European data governance (Data Governance Act), 25 November 2020, COM/2020/767 final, CELEX number: 52020PC0767

Proposal for a Regulation of the European Parliament and of the Council on the European Health Data Space, 3 May 2022, COM/2022/197 final, CELEX number: 52022PC0197

European Union Guidance

European Data Protection Board:

Relevant literature

They who must not be identified-distinguishing personal from non-personal data under the GDPR - Michele Finck and Frank Pallas. International Data Privacy Law, 2020, Vol. 10, No. 1, 11-36
The EU General Data Protection Regulation (GDPR): A Commentary - Christopher Kuner, Lee A Bygrave, Christopher Docksey, Laura Drechsler and Luca Tosoni (editors). New York, Oxford Academic online edition (2020).
The EU’s General Data Protection Regulation (GDPR) in a Research Context - Christopher F. Mondschein, Cosimo Monda. Fundamentals of Clinical Data Science (2019): 55-71.
Big Data in medical research and EU data protection law: challenges to the consent or anonymise approach - Mostert, M., Bredenoord, A., Biesaart, M. et al. European Journal of Human Genetics 24, (2016): 956–960.
Estimating the success of re-identifications in incomplete datasets using generative models - Rocher, L., Hendrickx, J.M. & de Montjoye, YA. Nature Communications 10, 3069 (2019).
Good Privacy Protection Practice in Clinical Research: Principles of Pseudonymization and Anonymization - Heinz Schriever and Markus Schröder. Berlin, München, Boston: De Gruyter, (2014).
GDPR and Biobanking - Santa Slokenberga, Olga Tzortzatou and Jane Reichel (editors). Springer book (2021).
The Patient, Data Protection and Changing Healthcare Models: The Impact of e-Health on Informed Consent, Anonymisation and Purpose Limitation - Verhenneman G., Intersentia (05/2021).

Acknowledgements

Published: 14/03/2023

Updated: 01/07/2024

Authors:

Vera Lúcia Raposo, Associate Professor, NOVA School of Law - NOVA University of Lisbon; FutureHealthlaw / WhatNext.Law

Tomás de Brito Paulo, LL.M Law & Technology at Tilburg University, Tilburg, the Netherlands; Associate at Vieira de Almeida & Associados, Lisbon, Portugal

Reviewed by:

Aurélie Mahalatchimy, EuroGCT WP4 Convenor, UMR 7318 DICE CERIC, Aix-Marseille University, CNRS, Aix-en-Provence, France
