Data sharing is an important activity in scientific research: it enables a variety of analyses and maximises the potential of data to advance health knowledge and innovation. It allows the initial research purposes to be achieved and the data to be reused for other research purposes, including replication studies.
Collaborative research and scientific publication require data to be shared as part of the duty of research integrity.
Data sharing can take place at different times in the research, either during a research project (research data sharing) or after it (research results sharing). While both are subject to similar rules, open data practices are a specific form of data sharing concerning research results and are dealt with in the dedicated section “Open data”. The two sections are complementary and should be read together.
Of note, the sharing of (human) biological samples for research analyses shall be considered data sharing. The rules governing the sharing of such biological material operate alongside data protection regulations.
Data sharing is usually organised between different legal entities. Parties to the sharing could be stakeholders in a single research project (data sharing as collaboration). The sharing could also take place with stakeholders that are not involved in a collaborative project (data sharing as a service provision). This includes data sharing with or amongst different private companies (so-called “business-to-business” or "B2B" sharing), as well as from a private company to the public sector (so-called “business-to-government” or "B2G") and from a public sector entity to a private company (so-called “government-to-business” or "G2B").
Any data sharing activity necessitates data processing which can be subject to privacy protection or intellectual property (IP) laws and institutional policies. Data sharing practices and attached governance mechanisms should be adapted to the level of sensitivity attached to the data type (Please see the entry on Data Classification) and to the purpose of use.
Data sharing practices can consist of:
the provision of access to the data to a third party (a natural or legal person) for a defined purpose of use, without moving the data under analysis, or
the provision of data, that is, the sending (moving) of data from one legal entity to another for an identified purpose of use. This option is not recommended where sensitive personal data are concerned.
Different situations can be envisaged:
Sharing personal data as research data
Within a project consortium (as a collaboration)
With third parties (as a service provision)
Sharing non-personal data as research data
Sharing research data for publication (cf. Open data)
Sharing research results (cf. Open data)
As a paramount principle, the legal and ethical rules applicable to the type of data being shared aim to ensure that the level of protection of the data (in particular personal data) is not undermined when the data are shared. The data recipient or user shall afford the same level of protection as the data holder that allows the sharing, as a guarantee of a trustworthy relationship.
Data sharing requires adequate data management capacities, including IT infrastructure and a data governance framework. The way in which the data are organised and structured is essential for efficient and responsible data use.
Data sharing operations must be designed in the early stages of the project, and data flows must be detailed in a data management plan (DMP). Where relevant given the nature of the data at stake, sharing activities and stakeholders must be assessed within a data protection impact assessment (DPIA), as required under the GDPR.
Researchers and their institutional directorate must be involved in the design of the data sharing activities, whatever the category of data at stake.
Legal advisory services, including the Data Protection Officer(s) (DPO) of each institution involved in the sharing, must be involved in the planning and contracting stages of data sharing.
Data managers and IT professionals providing data access and/or storage capacities must be involved where practical, scientific or technical processing is required.
Data sharing or sharing of data means the act of providing data or access to data from a data holder to a data user, usually a third legal entity, directly or through an intermediary, for a defined processing purpose, subject to applicable technical, financial, legal, or organisational use requirements, based on voluntary agreements.
Data access or access to data means the act of providing data access for processing by a data user, subject to specific technical, legal, or organisational requirements, without data transmission or downloading outside the access platform perimeter.
Data holder refers to an organisation or individual who, according to applicable laws or regulations, is competent to decide on granting access to or sharing data under their control, regardless of whether or not such data are managed by that organisation or individual or by an agent on their behalf.
Data producer refers to an organisation or an individual that creates, co-creates, generates, or co-generates data, including as a by-product of their social and economic activities, and can therefore be considered a primary data source.
Data intermediary refers to a service provider that facilitates data access and sharing under commercial or non-commercial agreements between data holders, data producers, and/or users.
Personal data refers to information relating to an identified or identifiable individual (data subject).
Personal Data Transfer means the transfer of personal data to a third (non-EU) country or to an international organisation specifically regulated under Chapter V of the EU GDPR.
Pseudonymisation means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.
Encryption is a transformation applied to data that renders them unintelligible to any person who is not authorised to access them, so that only an authorised entity holding the decryption key can access the data.
Data Sharing Agreement (DSA) means a legally binding agreement between two or more legal entities (or individuals) concerning the sharing of data or information of any kind between these legal entities (or individuals). This covers a broad range of arrangements and documents between two or more organisations or different parts of an organisation. It is not intended to cover contractual relationships with natural persons in their capacity as consumers or data subjects.
Data Management Plan (DMP) describes the data management life cycle for the data to be collected, processed and/or generated by a research project. As part of making research data Findable, Accessible, Interoperable and Re-usable (FAIR), the DMP is a key element of good data governance and is made mandatory by the vast majority of public research funders, such as the EU for its funding programmes.
Research Data Repository: a database designed to host, store, make visible and accessible research data. Its role is to allow data to be deposited or collected, described, accessed and shared for reuse. Each repository generally has a policy for the deposit, description and dissemination of data. These infrastructures are part of an approach to sharing and opening up data in accordance with the FAIR principles so that data is "Findable, Accessible, Interoperable and Reusable".
Non-personal data means data other than personal data as defined in point (1) of Article 4 of the GDPR, including anonymous data and data that have been appropriately anonymised according to state-of-the-art techniques.
Secure processing environment means the physical or virtual environment and organisational means to provide the opportunity to reuse data in a manner that allows the operator of the secure processing environment to determine and supervise all data processing actions, including the display, storage, download and export of the data and the calculation of derivative data through computational algorithms.
Standard Operating Procedure (SOP) means a set of written instructions describing the step-by-step process that must be followed to properly perform a routine activity. SOPs should be followed in exactly the same way every time to ensure that the organisation remains consistent and compliant with industry regulations and business standards.
Research data means factual records (numerical scores, textual records, images and sounds or audiovisual records) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings. A research data set constitutes a systematic, partial representation of the subject being investigated. This term does not cover the following: laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, personal communications with colleagues, or physical objects (e.g. laboratory samples, strains of bacteria and test animals such as mice). (OECD, Principles and Guidelines for Access to Research Data from Public Funding, 2007) Research data can consist of primary or secondary data.
Primary data refers to first-hand data gathered by the researchers themselves, also known as raw data. The data can be collected through various methods such as surveys, observations, physical testing, mailed questionnaires, questionnaires filled in and sent by enumerators, personal interviews, telephone interviews, focus groups, case studies, etc.
Secondary data means data collected earlier by someone else. It is second-hand information that has already been collected and recorded by a person other than the user, for a purpose unrelated to the current research problem. It is readily available from various sources such as government publications, internal records of the organisation, reports, books, journal articles, websites, etc.
Open access refers to the practice of providing online access to scientific information that is free of charge to the end-user and reusable. 'Scientific' refers to all academic disciplines. In the context of research and innovation, 'scientific information' can mean:
peer-reviewed scientific research articles (published in scholarly journals), or
research data (data underlying publications, curated data and/or raw data – secondary and/or primary data).
Open access to research data refers to the right to access and reuse digital research data under the terms and conditions set out in the Grant Agreement.
Open data refers to making available research data or publications without restrictions regarding the reuses.
Even though a majority of researchers agree with the principle of data sharing, the practice is often challenging and raises potentially difficult issues, notably regarding:
Organization of data sharing;
Decision-making about sharing (when and with whom);
Legal architecture and conflicting laws applied in different countries where the parties to the sharing are established or operating;
Practical organisation of data sharing activities;
Planning of a budget for ensuring data sharing and sustainable activities;
Selection of appropriate third-party services or software for performing data sharing.
This list is not exhaustive, and there are cases where sharing data as open data may not be advisable (cf. Open Data below in Practical steps).
Opportunities and incentives
Researchers are increasingly aware of the importance of data sharing and open research data in contemporary research settings. Data sharing in research should be approached as a good practice supporting high-level collaborations, research quality and the public interest.
Interactions with regulators
There is no specific European authority in charge of regulating and overseeing data sharing practices. Nevertheless, the funding agencies attached to the European Commission that implement the EU research and technological development framework programmes promote data sharing and open data among grant-receiving institutions, in line, notably, with the objectives of the EU Open Data Directive (ODD) and the Data Governance Act (DGA), and subject to national laws.
Data sharing/open data involves authorities and regulators at national levels:
Supervisory Authorities for personal data protection and Data Protection Officer(s);
Funding agencies at national level.
Within their remits, they can establish specific rules for data sharing and provide technical and non-technical support to researchers.
Data Management Plan (DMP)
A DMP is either required for projects receiving public funding or strongly recommended on a voluntary basis as a good governance tool.
A Data Management Plan is a concise document that helps to organise and anticipate all the stages of the data lifecycle. It explains, for each data set, how the data of a project will be managed, from creation or collection to sharing and archiving. Preparing the DMP involves working with all the partners to ensure that the research data within the project will be processed responsibly. This requires collaboration between partners and, in many cases, acceptance by funders.
According to the EU standards (Horizon Europe 2021-2027), a DMP should include information on:
the handling of research data during & after the end of the project;
what data will be collected, processed and/or generated;
which methodology & standards will be applied;
whether data will be shared/made open access; and
how data will be curated & preserved (including after the end of the project).
The DMP shall be tailored to the data sharing operations envisaged and kept up to date throughout the research project.
Within the DMP, data sharing can be organised by identifying the types of data that will be shared, the purposes of sharing, legal bases, specific roles in the sharing, data recipients, technical safeguards and security measures, format requirements for shared data, and specific data sharing SOPs. The DMP can serve to assign a level of sensitivity to the data or data sets, together with the corresponding organisational and technical measures that will ensure proper data protection in data sharing. Developing a common data model will be essential for providing standardised, good-quality data.
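The items a DMP can record for each shared data set can be kept as a structured entry, which makes gaps easy to spot. The following Python sketch is only an illustration under stated assumptions: the class name, the field names and the example values are hypothetical and do not come from any funder's DMP template.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class SharingEntry:
    """Hypothetical DMP entry describing how one data set will be shared."""
    dataset: str
    data_types: List[str] = field(default_factory=list)
    purposes: List[str] = field(default_factory=list)
    legal_basis: str = ""
    roles: Dict[str, str] = field(default_factory=dict)   # e.g. institution -> role
    recipients: List[str] = field(default_factory=list)
    safeguards: List[str] = field(default_factory=list)   # technical and security measures
    data_format: str = ""
    sops: List[str] = field(default_factory=list)
    sensitivity: str = ""                                  # level assigned in the DMP


def missing_items(entry: SharingEntry) -> List[str]:
    """Return the names of the fields that are still empty, in declaration order."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# Placeholder values, for illustration only.
entry = SharingEntry(
    dataset="cohort-A-genomics",
    data_types=["genetic data"],
    purposes=["replication study"],
    legal_basis="consent",
    sensitivity="high",
)

print(missing_items(entry))
# ['roles', 'recipients', 'safeguards', 'data_format', 'sops']
```

A check of this kind only shows which items have not yet been filled in; whether the content of each item is adequate remains a matter for the project partners, the DPO and, where applicable, the funder.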
The databases to be shared can be stored locally at the source institution and/or pooled (e.g., in an ad hoc project database or in an existing secure research data repository). Cloud computing services could provide suitable solutions for storing data and making them available, but they will require guarantees and safeguards with regard to data management. Where a specific data sharing system or platform is developed or used for research data sharing, the DMP should detail the responsibilities, rights and obligations related to the IT system governance, and the SOPs used for performing data sharing through such a system.
The governance of such data access and sharing platforms is an important element for creating a trustworthy environment satisfying both data subjects' interests and researchers' legitimate needs.
Governance of data sharing must be scrutinised, especially where third parties provide data management services, in particular as to the following aspects:
Data controllership: who will have decision-making powers regarding data sharing, access and use? Are third-party rights involved in the sharing (e.g. regarding secondary data), and how is compliance ensured?
Data sharing purposes: define the specific research domain(s) for which data can be accessed by thirds, where applicable in compliance with data subject consent
Data access modes and modalities: can the primary data be accessed remotely through a dedicated platform? Downloading restrictions? Uploading/Enrichment functionalities? Traceability?
Data security policy
Data Access Committee (DAC): including relevant scientific, methodological, ethical and legal expertise and related rules of proceedings
Contracts including adequate and legally binding data protection clauses, such as an obligation not to violate data anonymity through the processing; IP clauses, taking the form of Data Transfer Agreements in case of international data sharing (including access to a database from a legal entity established outside the EU territory) with supplementary conditions (see below section on Personal data transfer).
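Some of the governance aspects listed above (permitted research domains, download restrictions) lend themselves to an automated pre-screening step before a request reaches the Data Access Committee. The following Python sketch is illustrative only: the class names, field names and domain labels are hypothetical and are not taken from any existing platform or controlled vocabulary.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class DatasetConditions:
    """Hypothetical use conditions attached to a data set."""
    permitted_domains: FrozenSet[str]   # research domains covered, e.g. by consent
    download_allowed: bool              # False means remote access only


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical access request submitted by a third party."""
    requester: str
    research_domain: str
    wants_download: bool


def screen_request(request: AccessRequest, conditions: DatasetConditions) -> List[str]:
    """Return the reasons why a request conflicts with the use conditions.

    An empty list only means that no conflict was detected at this stage;
    the decision itself stays with the Data Access Committee.
    """
    reasons = []
    if request.research_domain not in conditions.permitted_domains:
        reasons.append("purpose outside the permitted research domains")
    if request.wants_download and not conditions.download_allowed:
        reasons.append("download requested but only remote access is permitted")
    return reasons


conditions = DatasetConditions(frozenset({"cardiology", "diabetes"}), download_allowed=False)
compatible = AccessRequest("lab-a", "cardiology", wants_download=False)
conflicting = AccessRequest("company-b", "marketing", wants_download=True)

print(screen_request(compatible, conditions))   # []
print(screen_request(conflicting, conditions))  # lists both conflicts
```

Such a filter can reduce the committee's workload, but it does not replace the scientific, ethical and legal review carried out by the DAC.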
DMP templates are usually made available by the research funding agency. For example, for EU-funded projects, the Horizon 2020 DMP template was designed to be applicable to any Horizon 2020 project that produces, collects or processes research data. Templates can also be provided by another organisation or a trusted party. Tools also exist to support the building of the DMP, such as DMP-OPIDOR, established by Inist-CNRS (Institut de l'Information Scientifique et Technique), which facilitates data entry and exchanges between project partners, in addition to providing DMP templates from funders and examples of public DMPs. See the resources provided at the end of the section in “More”.
Data Protection Impact Assessment (DPIA)
The DPIA is made mandatory by Article 35 of the GDPR for certain categories of personal data processing that could result in a high risk to data subjects’ rights and freedoms. Most EU countries require a DPIA for the processing of health research data, in particular where genetic data and new technologies are at stake. The DPIA is a key element in projects using or generating personal data: it structures the project in terms of privacy-protective measures for data subjects as research participants.
The DPIA will necessitate strong collaboration between partners and their respective legal officers (Data Protection Officers - DPOs - in the EU Member States) before the first collection of personal data to be processed for the research purposes.
Data sharing issues and the appropriate technical and organisational safeguards for personal data protection must be included and detailed in the assessment, and must comply with Article 89 GDPR and applicable national laws. The DPIA complements the DMP with regard to personal data protection. It ensures respect for data subjects’ rights and for the essential personal data protection principles (e.g. fairness, data minimisation, purpose limitation, storage limitation, accountability), and it facilitates joint personal data sharing practices.
Technical services involved in the processing and related guarantees with regard to personal data protection and governance will also be specified within the DPIA at general level and at stakeholders level if appropriate.
The establishment and/or the respect of specific SOPs for personal data sharing could be a useful tool for facilitating data sharing practices.
The organisational and technical measures planned to ensure accountable personal data sharing shall be described in the DPIA, which will also define the stakeholders’ roles, duties and rights.
Data subjects information on personal data sharing
Data subjects should have been duly informed of the data sharing and should not have rejected or objected to it (regardless of whether opt-in consent or opt-out is used for data collection). In order to ensure a high level of transparency and fairness in data sharing, data subjects should receive clear, explicit and intelligible information on the identity of the data controller or of its representative, on the purpose(s) of the sharing and its legal basis, as well as on the categories of persons able to access and use the shared data.
Information about the respect of confidentiality duty and technical measures ensuring privacy protection in the data sharing (e.g. pseudonymisation or anonymisation), on the inclusion of the data into a database or biobank managing the sharing operations, as well as the involvement of private actors or foreseeable commercial collaborations including data sharing should be provided for completing elements of information mentioned under Article 13 or 14 of the GDPR. (Please see the entry on "Data collection, processing, controlling")
Genetic or genomic data sharing in research requires specific attention. Research analyses could reveal clinically useful individual results related to the health conditions studied, or incidental findings unrelated to the initial health conditions or research purpose. Such information is potentially eligible for individual feedback and for medical care of the research participant (prevention, including genetic counselling, diagnostic or treatment strategies). The procedure for identifying and validating such eligible information and for returning it to the research participant shall be described, for example in the DPIA, and a corresponding SOP shall be developed. The data subject should be informed that such a situation may occur and of the corresponding communication procedures.
Where data are collected under a research protocol (e.g., clinical trial), the consent obtained should allow for publication of the data analysis as research results and deposition in a repository of the underlying data reported in a study. This type of consent will usually not cover the sharing of individual participant data unless it has been adequately anonymized.
In any case, data subjects must always be able to exercise their rights over the data shared. This means that the data controller and the data user (whether another controller or a processor) are jointly responsible for ensuring that data subjects can effectively exercise their rights, regardless of where the data are located.
In any case, the data subject shall have the possibility to refuse or object to the sharing at any time. Where personal data sharing is fundamentally necessary to perform the research and is thus made a mandatory criterion for participating in the research project, the data subject shall be informed that refusing to share personal data for the project purpose would mean refusing to participate in the project. In certain contexts, options such as systematic data anonymisation prior to sharing, or a layered approach allowing the data subject to refuse sharing only for certain purposes or parts of the research project (Recital 33 GDPR), could be envisaged.
Information duty prior to the sharing applies also where the controller intends to further process the personal data for a purpose other than that for which the personal data were collected. The controller shall provide the data subject prior to that further processing with information on that other purpose and with any relevant further information mentioned here-above and in Article 13 or 14 of the GDPR.
Legal exemptions to the information duty exist under the GDPR and should be checked (cf. section Data processing and Art. 13(4), 14(5) of the GDPR). As a reminder, the application of such legal exemptions does not preclude the application of the other data protection requirements imposed on the data controller and processor.
As a good practice, data subjects shall also be informed about anonymisation practices and their consequences for the exercise of their rights, such as the impossibility of receiving feedback on potential individual results or clinically validated and useful incidental findings.
Of note, non-personal data fall outside the scope of the GDPR and can be shared freely, with no restriction, in accordance with EU Regulation 2018/1807 on a framework for the free flow of non-personal data in the EU. This does not preclude the possibility of concluding data sharing agreements specifying, for example, the location of data storage, provided that this is justified and proportionate with regard to a legitimate objective to be safeguarded (e.g. public security, protection of anonymity) and that such provisions enhance legal certainty, so that they do not constitute an undue barrier to sharing.
Personal data transfer to non-EU country
Specific rules exist for sharing personal data with non-EU partners or users.
Determine whether your situation constitutes a personal data transfer regulated by Chapter V of the GDPR.
The European Data Protection Board (EDPB) Guidelines 05/2021 further specify the definition of personal data transfer by establishing three cumulative criteria characterising such a situation:
A controller or a processor is subject to the GDPR for the given processing;
The controller or processor (“exporter”) discloses by transmission or otherwise makes personal data subject to the processing available to another controller, joint controller or processor (“importer”);
The importer is in a third country or is an international organisation, irrespective of whether or not this importer is subject to the GDPR in respect of the given processing in accordance with Article 3 GDPR.
Consequently, the notion of “transfer” under the GDPR excludes data sharing between entities located in EU Member States, as well as the import of personal data from non-EU countries or international organisations by controllers or processors in the EU (international and European research ethics principles still apply, however, as recalled in the EC guidelines on ethics and data protection). Similarly, it does not cover personal data passed directly by a data subject located within the EU to an organisation in a non-EU country, where this is done on the data subject’s own initiative.
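Because the three criteria are cumulative, the test amounts to a logical conjunction. The following Python sketch shows this under stated assumptions: the class and field names are hypothetical, and establishing each fact (for instance, whether a party is subject to the GDPR) remains a legal assessment that the code does not perform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharingOperation:
    """Hypothetical summary of one sharing operation; field names are illustrative."""
    exporter_subject_to_gdpr: bool               # criterion 1
    data_made_available_to_importer: bool        # criterion 2
    importer_in_third_country_or_intl_org: bool  # criterion 3


def is_chapter_v_transfer(op: SharingOperation) -> bool:
    """Return True only when all three cumulative criteria are met."""
    return (
        op.exporter_subject_to_gdpr
        and op.data_made_available_to_importer
        and op.importer_in_third_country_or_intl_org
    )


# Sharing between two entities located in EU Member States: criterion 3 is not met.
intra_eu = SharingOperation(True, True, False)
# An EU controller making data available to a processor located in a third country.
to_third_country = SharingOperation(True, True, True)

print(is_chapter_v_transfer(intra_eu))          # False
print(is_chapter_v_transfer(to_third_country))  # True
```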
Where the 3 criteria are met, the situation is a transfer subject to Chapter V of the GDPR, and the following rules must be respected.
In principle, personal data transfers are prohibited by law, except under certain circumstances.
Data subject’s specific information
The data subject must have been clearly informed about the intention of the data controller to transfer personal data for the purpose of the processing.
According to Articles 13(1)(e)-(f) and 14(1)(e)-(f) of the GDPR, the data subject must be informed about the recipients (“importer”) or categories of recipients of the personal data, and about the existence or absence of an adequacy decision by the European Commission (see below Specific rules for framing the personal data transfer) or, in the case of transfers referred to in Article 46 or 47 or the second subparagraph of Article 49(1) of the GDPR, be given a reference to the appropriate or suitable safeguards and the means by which to obtain a copy of them or where they have been made available.
Transfers shall not prevent data subjects from exercising the rights afforded to them by the EU GDPR and national laws.
Specific rules for framing the personal data transfer
Personal data transfer must be framed in compliance with the GDPR and with any national sector-specific regulations applying to health data sharing.
This means that:
The data transfer must be lawful. For data transfers to non-EU countries to be lawful they must be predicated on at least one of the following grounds:
An “adequacy decision” adopted by the European Commission in respect of the recipient country in question (see the list of countries concerned); or
Binding corporate rules (BCR) applied to group of undertakings pursuant to Article 47 GDPR that cover both sender and recipient organisations and approved by a national supervisory authority; or
An approved code of conduct pursuant to Article 40 GDPR together with binding and enforceable commitments of the controller or processor in the third country to apply the appropriate safeguards, including as regards data subjects' rights; or
An approved certification mechanism pursuant to Article 42 GDPR together with binding and enforceable commitments of the controller or processor in the third country to apply the appropriate safeguards, including as regards data subjects' rights; or
Another exceptional legal ground mentioned in Article 49 GDPR, including:
The explicit consent of the data subject, which requires them to be clearly and explicitly informed in advance of such transfers, of the absence of the adequate safeguards mentioned above, and of the mechanisms in place to respect their privacy (while legally possible, this should not be the preferred option); or
The transfer is necessary for important reasons of public interest, in the absence of the safeguards mentioned above.
The data exporter located within the EU must be authorised by the competent national data protection authority (the ‘supervisory authority’) to perform such transfer activities, where required by applicable law.
The data exporter must check that the recipients of the data are able to ensure the same level of data protection as is required under EU law.
The research project in which samples/data are being shared and/or used must have received ethics approval from competent research ethics committees. The research sponsor must always be accountable with regard to data sharing activities and be able to provide ethico-legal documentation supporting its activities. These should be checked by the data exporter before implementing the sharing.
Any personal data transfer must be registered and documented within the processing registry maintained by the DPO.
In case of sharing with another data controller established outside the EU, the latter must designate a representative within the EU territory according to Article 27 GDPR.
Data security provisions, provided for by law or other regulations, and contained in reference frameworks (e.g., ISO standards), should result in regularly reviewed, state-of-the-art technical and organisational measures so as to protect personal health-related data from any illegal or accidental destruction, any loss or any alteration, and to guard against any unauthorised access, or unavailability or inaccessibility.
As part of the measures to implement in data sharing, the following shall be considered:
Of note, the processing activity that produces anonymised data is a processing of personal data, which can be considered to be compatible with the original purposes of processing from which the data are obtained. The anonymised data set is outside the scope of the GDPR only if it is possible to objectively demonstrate that there is no material ability to associate the anonymised data with a certain natural person, directly or indirectly, whether through the use of other data sets, information, or technical and material measures which may be available to third parties.
The Art. 29 DPWP (now EDPB) examined several different methods of data anonymisation and clarified what measures data processors and controllers have to take. It specifically states that “removing directly identifying elements in itself is not enough to ensure that identification of the data subject is no longer possible. It will often be necessary to take additional measures to prevent identification, once again depending on the context and purposes of the processing for which the anonymised data are intended.” This is a reminder that the US Health Insurance Portability and Accountability Act (HIPAA) notion of de-identification by the simple removal of 18 identifiers does not reach the level of anonymity required under the GDPR without further risk analysis and mitigation measures.
Description of data / data set use conditions (e.g. GA4GH DUO, the Data Use Ontology, for setting up machine-readable data use conditions and restrictions).
Traceability of actions carried out on the data (communication, changes made to or deletion of data).
Other relevant measures which could be applied regarding the physical security of the data and communication technologies used for the sharing. For further details, check whether your supervisory authority has published guidance; see, for example, the French CNIL Guide on personal data security.
These data control measures should be detailed within a legally binding document and adhered to by the data sharing parties through contracts (Data Processing Contracts; Data Transfer Agreements).
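To illustrate the anonymisation point made above, the following minimal Python sketch shows why removing direct identifiers alone may not be enough. It counts how many records share the same combination of indirect (quasi-)identifiers, a simple k-anonymity style check that is used here for illustration and is not prescribed by the text. The records, field names and the function smallest_group_size are hypothetical, and such a check is only one input to the risk analysis, not a demonstration of anonymity under the GDPR.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same combination of quasi-identifier values (the k of k-anonymity)."""
    if not records:
        raise ValueError("no records to assess")
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Hypothetical records: direct identifiers (name, address...) already removed.
records = [
    {"birth_year": 1980, "postcode": "75001", "sex": "F", "diagnosis": "asthma"},
    {"birth_year": 1980, "postcode": "75001", "sex": "F", "diagnosis": "diabetes"},
    {"birth_year": 1975, "postcode": "69002", "sex": "M", "diagnosis": "rare disease"},
]

k = smallest_group_size(records, ["birth_year", "postcode", "sex"])
print(k)  # 1: one person is still unique on these three attributes
```

A result of 1 means that at least one individual could be singled out by anyone who knows these three attributes, so further measures (generalisation, suppression, access restrictions) would need to be considered before treating the data set as anonymised.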
Violations of security measures in personal data sharing are considered as personal data protection breaches.
Data protection, including privacy protection measures, data licensing and intellectual property rights, are important issues to be resolved in preparation for the sharing. As a good practice in scientific research, in particular where sensitive personal data are at stake (health data; genomics/genetic data), the signature of a specific contract framing any research data sharing operation is of utmost importance.
Types of Data Sharing Agreement (DSA)
Contracts can take the form of a Data Processing Agreement (DPA), a Data Access Agreement (DAA), a Data Transfer Agreement (DTA) or, where human biological samples are shared together with the data, a Material Transfer Agreement (MTA). Whatever the type of contract used, it shall ensure adequate data protection and legal certainty for the data controller and processor, while respecting the rights and freedoms of the data subject.
Content of the contracts
These contracts detail the parties, the context, object and purposes of the sharing, the type of data, the duration of the sharing, the rights and obligations of the data controller and users, the practical modalities of sharing, storage and protection of the data or data set, including in case of sub-processing, as well as details about intellectual property rights and rules, applicable laws and dispute resolution mechanisms (extra-judicial or judicial).
Conditions of personal data uses, and of anonymised data exploitations, are usually further detailed within these legally binding contracts.
Where personal data sharing is subject to the GDPR, contracts between the data controller and data processors shall respect the conditions fixed in Article 28 GDPR in terms of content and attached guarantees.
Restrictions on access rights to the personal data shared could be envisaged due to the inherent privacy risks related to personal data sharing. In principle, access to identifiable data shall be restricted to professionals who act solely for the research purposes and who are subject to a duty of professional secrecy. Sharing personal data requires security measures, contractual guarantees and governance mechanisms to be planned. Sub-processing by the data processor to which data are made available shall be approached with caution, in particular where genomic data are at stake. Such practice can be contractually forbidden or specifically restricted.
Of note, identifiable personal data are not subject to intellectual or other types of legal property regimes under EU laws. Protections against personal data misappropriation shall be ensured by the data controller. Nevertheless, databases can be subject to IP law (Directive 96/9/EC on the legal protection of databases), which creates different rights, first for the database author(s) and, second, for the database producer(s). These rights shall be respected in the sharing and can make use of the database conditional on prior authorisations or contractual arrangements (see the section below on databases).
Generally, Creative Commons (CC) licences are used for protecting intellectual property rights in data sharing. Other types of licences exist. The CC0 licence (the most open to data reuse) is usually used for metadata associated with a data set. Creative Commons licences are also used for data derived from physical samples such as biological material. Several degrees of openness can be chosen through Creative Commons licences depending on the sensitivity of the data and potential existing research consortium arrangements (e.g. Grant Agreement, DMP). The CC licences are legally binding on the user, who shall respect their rules when reusing the data and visibly mention the licence in communications.
Contracts can condition access and sharing of the data with the use of technological and organisational environments or methods that shall impose conditions that preserve the integrity of the data but also, where accessed remotely through an IT platform, conditions that preserve the functioning of the technical systems of the secure processing environment used. Where protected information (confidential business information, personal data, data/database subject to intellectual property) are made available, their reuse shall be conditional on the adherence by the reuser to a confidentiality obligation that prohibits the disclosure of any information jeopardising the rights and interests of third parties the reuser may have acquired despite the safeguards put in place.
Who is involved in the contract preparation and implementation?
The researcher’s institutional and legal representative;
If relevant, intermediaries such as a biobank; possibly cloud computing services
Defining the legal protection applicable to the data and data sets to be shared should systematically involve the competent legal departments, head of the office or unit, or other competent persons acting on behalf of the researcher’s institution. The same precautions apply where the sharing is envisaged through third-party services such as online cloud computing services. Indeed, the cloud service provider and the user enter into a contractual relationship setting forth responsibilities, usually through terms of service agreed by the client (the user). In practice, the client is often invited to agree to pre-set contractual clauses which may not always be fit for purpose.
In any case, the contract will be signed by the persons authorised to approve legal documents on behalf of the institutions concerned (the parties: those who share the data, those who receive and use them, and potential intermediaries). In the event of a breach of the agreement, the parties could be held liable.
In any case, the legal qualification of the biobank in the data sharing shall be established with regard to its role (e.g. data controller or joint-controller; data processor).
For an overview of existing human biobanks in Europe and samples made available for research purposes see BBMRI-ERIC Website and search tool Directory; for human pluripotent stem cells lines, see the hPSCReg.
The use of cloud computing services (SaaS, IaaS, PaaS) for sharing data in research can be attractive for data controllers necessitating specific computational means and can present advantages in terms of cost savings and scalability. Nevertheless, this triggers specific issues that the data controller shall consider before using these services in order to ensure accountable data protection practices.
Activity entailing hosting health-related data externally and making them available to users should comply with the security reference framework and principles of personal data protection applied in EU.
Whatever the type of service envisaged in research, it is preferable to look first for tools offered by your institution, which would already be tailored to the research setting and legal requirements, before considering other service offers.
Where no relevant institutional means are made available, data controllers can envisage using third party cloud computing services provided that they ensure accountable data protection governance. This means that the data controller shall assess the data protection appropriateness of a cloud service, including with regard to IT security measures, and ensure that the right terms and conditions of use can be contracted (via so-called Service Level Agreements - SLA). Legal issues relating to potential restrictions imposed by the grant agreement or by your institution regarding the use of cloud computing services for managing sensitive personal data, as well as issues regarding the management of intellectual property rights with the service provider and applicable laws, shall also be scrutinised before use. The institutional DPO shall always be consulted.
In any case, a private cloud solution (also known as an internal or corporate cloud system) shall be envisaged for managing personal data in research, in particular where sensitive personal data (health, genetic, biometric…) are at stake. Private cloud shall be preferred as a tailored system in which cloud computing services are offered over the Internet or a private internal network to selected users only, for specific purposes. It combines many of the benefits of cloud computing with the security and control of on-premises IT infrastructures and additional guarantees for regulatory compliance. By contrast, public clouds are less suited to sensitive data processing as they are cloud computing services delivered over an infrastructure shared by multiple customers for multiple purposes (see information provided by IBM).
Any data sharing contract shall ensure that stakeholders are held accountable in taking responsibility, according to their roles, for the quality of the data they share and for the systematic implementation of risk management measures throughout the data lifecycle.
Specific support available in Europe with the European Research Infrastructure Consortia (ERIC) services:
European Infrastructures provide helpful services for researchers seeking specialised data sharing resources in life-sciences research projects, in particular ELIXIR-ERIC (bioinformatics and research data management) and its Federated Human Data Community (FHDC) and BBMRI-ERIC (human biosamples and attached data management for research uses).
Database sharing and IP rights
Database sharing and data reuses shall be allowed only in compliance with intellectual property rights.
First, the database author(s) can hold copyright in the database structure provided that the database, by reason of the selection or arrangement of its contents, constitutes the author's own original intellectual creation.
In respect of the expression of the database which is protectable by copyright, the author of a database shall have the exclusive right to carry out or to authorize (Article 5 Directive 96/9/EC):
Temporary or permanent reproduction by any means and in any form, in whole or in part;
Translation, adaptation, arrangement and any other alteration;
Any form of distribution to the public of the database or of copies thereof. The first sale in the Community of a copy of the database by the right holder or with his consent shall exhaust the right to control resale of that copy within the Community;
Any communication, display or performance to the public;
Any reproduction, distribution, communication, display or performance to the public of results of the acts referred to in (b).
However, there are exceptions to these restrictions on the use of the database content (see below Rights and obligations of lawful users).
Second, the database maker or producer has sui generis rights to ensure protection of any investment in obtaining, verifying or presenting the contents of a database for the limited duration of the right. The maker of a database is the person who takes the initiative and the risk of investing, this excluding subcontractors in particular from the definition of maker. Whereas such investment may consist in the deployment of financial resources and/or the expending of time, effort and energy, the objective of the sui generis right is to give the maker of a database the option of preventing the unauthorized extraction and/or re-utilization of all or a substantial part of the contents of that database.
The right provided to the maker of the database (Article 7 of Directive 96/9/EC) shall run from the date of completion of the making of the database or of its making available to the public. It shall expire 15 years from the first of January of the year following the date of completion or of the date when the database was first made available to the public.
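The term computation described above can be sketched in a few lines of Python. This is an illustration only: the function name sui_generis_expiry is hypothetical, the caller is assumed to pass whichever reference date applies (completion of the database or first making available to the public), and the sketch does not cover any renewal of the term.

```python
from datetime import date

TERM_YEARS = 15  # duration of the sui generis right stated above


def sui_generis_expiry(reference_date: date) -> date:
    """Return the date on which the term described above ends.

    reference_date is the date of completion of the database or the date
    on which it was first made available to the public, as applicable.
    """
    # The term is counted from 1 January of the year following the reference date.
    start_of_count = date(reference_date.year + 1, 1, 1)
    return date(start_of_count.year + TERM_YEARS, 1, 1)


print(sui_generis_expiry(date(2010, 6, 15)))  # 2026-01-01
```

For a database completed on 15 June 2010, the count starts on 1 January 2011 and the term ends on 1 January 2026, so the exact day of completion within the year does not change the result.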
Again, the protection afforded to database author(s) and producer(s) shall not be considered as a property right over the data contained within the database, which can be subject to the rights of third parties (e.g., personal data), nor over computer programs used in the making or operation of databases accessible by electronic means. It is closer to data custodianship.
Rights and obligations of lawful users
The maker of a database which is made available to the public in whatever manner may not prevent a lawful user of the database from extracting and/or re-utilizing insubstantial parts of its contents, evaluated qualitatively and/or quantitatively, for any purposes whatsoever. Where the lawful user is authorized to extract and/or re-utilize only part of the database, this paragraph shall apply only to that part.
A lawful user of a database which is made available to the public in whatever manner may not perform acts which conflict with normal exploitation of the database or unreasonably prejudice the legitimate interests of the maker of the database.
A lawful user of a database which is made available to the public in any manner may not cause prejudice to the holder of a copyright or related right in respect of the works or subject matter contained in the database.
Finally, exceptions to sui generis right of the database maker can be set forth by National laws, in particular in the case of extraction for the purposes of illustration for teaching or scientific research, as long as the source is indicated and to the extent justified by the non-commercial purpose to be achieved (see Article 9(b) of the Directive 96/9/EC). This shall be clarified through contract.
As specified under the DGA, the right of the maker of a database as provided for in Article 7(1) of Directive 96/9/EC shall not be exercised by public sector bodies in order to prevent the reuse of data or to restrict reuse beyond the limits set by the DGA.
Of note, database makers and authors can either plan specific licensing terms and conditions or even waive their rights through contracts in order to simplify data access and sharing where the risks of misuse are deemed low compared to the advantages of sharing for scientific uses. The rights holders can also entrust a lawfully established third-party data repository with managing such rights through adapted accountable governance.
Data sharing governance
Organisational measures such as the constitution of a Data Access Committee (DAC), which will receive, review and assess data access requests before granting access to the data, should be envisaged, in particular for personal data sharing activities. Such a DAC shall involve stakeholders in data custodianship and complementary relevant expertise, including the data controller / data producer, legal/ethical expertise (e.g., DPOs), disease or research field specific experts, methodologists etc. The DAC shall act according to a clear policy and internal rules of procedure that can be detailed in the DMP. As part of its mission, the DAC should:
Check that the applicant and proposed work complies with the data processing policy and data protection requirements for ensuring data integrity and confidentiality (contracts);
Check whether the data access applicant demonstrates compliance with applicable laws with regard to its research project (including research ethics committees’ approval and other legal authorisations to be acquired);
Provide feedback to applicants advising on the improvements required for their requests if needed;
Approve access applications, or deny access based on a written justification notified to the applicant;
Be informed of / Approve aggregated figures to be publicly published based on the analysis of accessed data;
Document decision-making and help maintain traceability of data accesses.
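As a rough illustration of how the DAC tasks listed above (checking requests, justifying refusals in writing, documenting decisions) could be supported by a simple tool, here is a minimal Python sketch. All class names, fields and the two checks are hypothetical simplifications. A real DAC applies its own policy and rules of procedure, and human review remains necessary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessRequest:
    applicant: str
    project: str
    ethics_approval: bool    # research ethics committee approval obtained
    agreement_signed: bool   # data sharing contract accepted by the applicant


@dataclass
class Decision:
    request: AccessRequest
    approved: bool
    justification: str       # written justification notified to the applicant
    decided_at: datetime


@dataclass
class DataAccessCommittee:
    log: list = field(default_factory=list)  # audit trail of all decisions

    def review(self, request: AccessRequest) -> Decision:
        # Collect every unmet condition so the applicant gets full feedback.
        missing = []
        if not request.ethics_approval:
            missing.append("ethics committee approval")
        if not request.agreement_signed:
            missing.append("signed data sharing agreement")
        approved = not missing
        justification = (
            "all checks passed" if approved else "missing: " + ", ".join(missing)
        )
        decision = Decision(
            request, approved, justification, datetime.now(timezone.utc)
        )
        self.log.append(decision)  # every decision is recorded, approved or not
        return decision


dac = DataAccessCommittee()
decision = dac.review(AccessRequest("Dr B", "Project 2", True, False))
print(decision.approved, decision.justification)
# False missing: signed data sharing agreement
```

Keeping refusals in the same log as approvals means the record can later support both feedback to applicants and traceability of data accesses.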
Data sharing governance mechanisms using DACs are notably in place in the most renowned open-controlled access data repositories, such as with the European Genome-phenome Archive (EGA) DAC.
Organisations to which the data sharing is entrusted shall respect the provisions fixed under the EU Data Governance Act (DGA - Regulation (EU) 2022/868). The DGA was adopted in May 2022 and applies from September 2023. It aims to promote the sharing of personal and non-personal data by setting up intermediation structures. It also aims to facilitate the reuse of certain categories of protected public-sector data, to increase trust in data sharing among public sector bodies, data users and data holders, and to overcome technical obstacles to the reuse of data. It does so by allowing wider reuse of protected public-sector data, by proposing a new business model for data intermediation services and by promoting data altruism for the common good.
The EU will boost the development of trustworthy data-sharing systems through 4 broad sets of measures:
Mechanisms to facilitate the reuse of certain public sector data that cannot be made available as open data. For example, the reuse of health data could advance research to find cures for rare or chronic diseases.
Measures to ensure that data intermediaries will function as trustworthy organisers of data sharing or pooling within the common European data spaces.
Measures to make it easier for citizens and businesses to make their data available for the benefit of society.
Measures to facilitate data sharing, in particular to make it possible for data to be used across sectors and borders, and to enable the right data to be found for the right purpose.
Guidance and technical and legal assistance to facilitate the reuse of certain categories of protected public sector data (confidential business information, intellectual property, personal data), this including a prohibition of exclusive arrangements (Article 4 DGA) and conditions for data reuse ensuring that the protected nature of data is preserved (Article 5 DGA), fees rules (Article 6 DGA);
Mandatory certification for providers of data intermediation services;
Optional certification for organisations practicing data altruism.
The DGA articulates with other EU regulations on data, in particular with the GDPR and Directive 96/9/EC. It also aims at creating sector-specific data spaces to enable the sharing of data within a specific sector (data spaces for transport, health, energy or agriculture), which will be the object of specific regulations.
Identify the data eligible and the nature of open access model in the DMP
Eligible data for opening
As a general rule, all research data (primary or secondary research data) generated through the project that can be qualified as research results should be provided in open access. Funding agencies apply the recommendation "As open as possible, as closed as necessary" for the dissemination of data produced in the framework of a publicly-funded project.
The DMP shall address open data issues in order to prepare such specific sharing activities. Data sharing and open data will be much easier for projects that have formalised a DMP.
As a common feature with other data sharing in research, opening data to further/large reuses opportunities can be envisaged through 6 questions:
What open data obligations apply? The obligation may be imposed by the project funder, by national, European or international law, by the data policy of certain partners, by the journal in which you publish, etc.
What is the scientific value of the data and its potential for reuse? The current or future scientific, environmental, economic or social interest and usefulness of the data can guide the choice and identify the protections necessary to secure data uses. The strategic or commercial potential of the data may also influence the decision to open, the open access model and the selection of an appropriate data repository.
Do you have the right to make this data public? What rights of others are potentially engaged by the data (privacy, IP, other specific protection)? Are these rights respected? Do the data raise ethical issues such that opening would require validation by an ethics committee?
Have you obtained the agreement of all contributors?
Have you assessed the time and effort needed to format the data and metadata to meet the requirements of the envisaged data repository? Have you identified the person who will carry out the data deposition process and follow-up?
Have you defined the conditions of reuse of the data you have produced? E.g., special restrictions of use? Standard for data citations?
Throughout the questioning, specific cases deserve particular attention and could even result in a decision not to share the data in open access mode:
Where the data are personal data and the sharing would infringe individual rights or has not been authorised by competent research ethics committees or authorities AND these data cannot be properly anonymised - a DPO opinion is always necessary before releasing individual personal data or datasets;
Where data are preliminary data, meaning data that have not been carefully checked and validated. The only exception to this rule would be preliminary data that could potentially benefit the public. A researcher who has strong preliminary indications of a major threat to public health, such as unexpected side effects from a drug or an unrecognized environmental health problem, may have good reason to share this information with the public and other researchers before it is fully validated. Data that have no immediate public benefit, such as the discovery of a basic scientific process that could eventually lead to public benefits, are in most instances best held until the researcher is confident that the results will stand;
Where data are research results, meaning confirmed or validated data, which present certain IP interests. Researchers can withhold these data until they have had time to establish priority for their work through publication or, in rare cases, a public announcement. They are not required to release data on a day-to-day or experiment-to-experiment basis for other researchers to use, even though this might speed the advance of knowledge. Provided no agreement has been made to the contrary, keeping data confidential prior to publication is a practice commonly accepted by researchers and funding agencies;
Once a researcher has published the results of an experiment, it is generally expected that all the information about that experiment, including the final data, should be freely available for other researchers to check and use. Some journals formally require that the data published in articles is made available to other researchers upon request or stored in public databases.
Where the data are classified or present risks of dual uses or significant risks of misuse. Classification of the research results is usually identified from the beginning of the research project and approved by research funders and competent authorities. It can happen that unexpected sensitive research results occur during the research. In such a case, consultation of competent research ethics committees, institutional referees (e.g., security and defence officers) and notification of funders shall be considered for a collegial decision-making about the necessary confidentiality level to ensure, this including possible exceptions to open access policies in sciences. (Please see the entry on Data Misuse)
In view of accountable data management practices, any decision to restrict or not to share openly research data shall be argued, documented and recorded as it could eventually be challenged as a research misconduct by reference to the ethical principle of research integrity or by reference to a breach of given data subject’s consent.
Open access models
The open access models can take several forms which can either be decided by the data producer or imposed by laws or specific agreements such as Grant Agreement (GA). The DMP shall plan which models can/shall be used for a specific project.
All the partners to the project shall be aware of these arrangements and be able to meet them.
Depending on the sensitivity of the data considered, conditions related to the purposes of data use, the profile of potential users, and the technical and organisational conditions or means necessary to ensure proper data processing are important elements to clarify before opening the data. Conditions include the respect of specific measures related to privacy protection, and the respect of IP rights. The conditions attached to the data / datasets resulting from a research activity can be detailed as part of the DMP and related data sharing policy. Different policies can be defined for different data sets.
Regarding research data, there are three models, or levels, of open access that can be envisaged:
Full open access of research data;
Registered access or with authentication;
Controlled access with application systems, assessment procedure and governing body such as Data Access Committee (DAC).
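The three levels listed above can be pictured as increasingly strict gates. The following minimal Python sketch is one possible way to express them; the names AccessModel and may_access are hypothetical, and the assumption that controlled access requires both authentication and a DAC approval is a simplification made for illustration.

```python
from enum import Enum


class AccessModel(Enum):
    OPEN = "full open access"
    REGISTERED = "registered access or with authentication"
    CONTROLLED = "controlled access (application assessed by a DAC)"


def may_access(model: AccessModel,
               authenticated: bool = False,
               dac_approved: bool = False) -> bool:
    """Return True if a user may access a data set under the given model."""
    if model is AccessModel.OPEN:
        return True            # no condition on the user
    if model is AccessModel.REGISTERED:
        return authenticated   # the user must be identified
    # Controlled access: assumed here to need authentication AND DAC approval.
    return authenticated and dac_approved


print(may_access(AccessModel.CONTROLLED, authenticated=True))  # False
```

Different data sets of the same project can be assigned different models, which is consistent with defining the applicable model per data set in the DMP.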
Regarding scientific publications, there are two main models of open access:
Green open access – self-archiving mode
Gold open access – open access service offered by publisher, potentially subject to fees
FAIRification of the data before deposition in open access / research data repositories
Efficient data sharing and valorisation require good quality data. The FAIR principles applied to research data are essential for preparing data sharing in the best conditions as they are intended to produce Findable, Accessible, Interoperable and Reusable data (Please see the entry on Data protection-Main principles).
Reaching FAIR data, in a nutshell, will necessitate specific processing and verifications from the part of the data controller which are summarised in the following table:
The practice of FAIRification can be challenging. Nevertheless, several resources are available for implementing FAIR data requirements, building data management policies and performing FAIRification.
The European project FAIRplus (GA 802750) provided several tools and services for easing such a new standard research practice.
In particular, you can freely access and use the following resources:
FAIR Cookbook that provides information on how to FAIRify datasets, the levels and indicators of FAIRness, the maturity model, the technologies, the standards available, as well as the skills required, and the challenges, to achieve and improve FAIRness. This tool is associated to a search functionality allowing a full overview of existing recipes.
FAIR Tool Discoverer that allows a quick identification of practical publicly available technical software for implementing data FAIRification based on certain search criteria entered by the user.
Services offered by data repositories to data depositors are also generally available for assisting in research data FAIRification through the deposition process.
Other tools facilitating FAIRification:
Data papers, Bioresource papers: A data paper (also known as a data descriptor, data article, data brief, resource announcement or data resource profile) is a peer-reviewed scientific article. It describes a data set, the method used to obtain it and the potential for reuse of the data set. This type of article informs the scientific community of the existence, originality, quality and availability of a dataset. It valorises the work of its authors by explaining the importance of the data produced and its potential for reuse in future research. The data paper does not describe research results and does not contain any discussion or conclusions (Dedieu, L (2022)). Bioresource papers have the same goal for biological materials and attached data made available for research uses. Software papers are the equivalent of data and bioresource papers for publishing code, scripts or software. Publishing such papers in open access can be done in many disciplinary journals or in specialised journals (e.g., for bioresources, the Open Journal of Bioresources).
Researchers persistent identifier (PID): Valorisation of researchers’ work is facilitated by obtaining a persistent and interoperable identifier such as ORCID, which can be easily referenced and attached to scientific publications, databases or other research materials.
Open access platforms for scientific publications
Research institutions to which the authors of a scientific publication belong may have internal policies for providing the publications to specific open access databases, in particular in the context of public research institutions.
Check the editor’s policy regarding open data before deposit on a repository. For this, you can use the tool Sherpa Romeo.
EU dedicated service for open access publishing
Open Research Europe is a platform maintained by the European Commission for facilitating fast and open peer-reviewed publication from Horizon 2020 and Horizon Europe funded projects.
Available support services for FAIRifying research data
OpenAIRE is a service provision platform contributing to the European Open Science Cloud (EOSC). It includes services provided to researchers through national contact nodes in participating countries, as well as training and other tools, in order to practise Open Science and to prepare FAIR data by implementing state-of-the-art techniques and policies. Consult the service catalogue.
Open data repositories for research data
Trusted research data repositories play a fundamental role in modern science where scientists as data producers or stewards have no internal capacities to store, manage and share research data. Selecting trustworthy and adapted open data sharing repositories or platforms is essential.
Scientists must be able to deposit and share data via trusted data repositories that implement FAIR data principles and that ensure long-term sustainability of research data across all disciplines. Selected data repositories should be easy to find and identify, and provide full transparency to users about their services.
Services offered must be adapted to the type of research data you are dealing with in order to guarantee safe data preservation, management and legal compliance (e.g., for personal data protection according to the EU GDPR). Some repositories offer data brokering services which facilitate data deposition through dedicated assistance to the data producer or legitimate data steward. To find more general information about data brokering as a concept, please visit the specific page of the ELIXIR RDMkit.
As a general rule, research data should be submitted in the first instance to discipline-specific repositories, whether internal to your institution or not, before considering deposition in a generalist repository.
Several renowned repositories in life sciences are already in place. Please consult the list of resources ELIXIR Deposition Databases for Biomolecular Data, including notably the following EBI/EMBL Open Data Repositories (offering data brokering, training for depositors and helpdesk services):
EGA, the European Genome-phenome Archive, allows secured and controlled storage and management of human research data, including sensitive personal data categories, in particular genetic data, in compliance with the EU GDPR.
ENA, the European Nucleotide Archive, allows secured storage and management of non-human research data, including genomic data.
For a complementary listing of research data repositories, you can visit:
Data Repository Guidance for more information and links towards some research data repositories identified by their scope.
Most European data repositories are contributing to the ongoing construction of the European Open Science Cloud (EOSC). The EOSC aims to be the single European contact point through which European researchers and innovators can easily access, use and reuse a broad range of data for scientific research. For more information, please visit the EOSC Portal and the EOSC Catalogue and Marketplace. The EOSC also offers training and support services on open science.
As a reminder, opening data does not mean losing opportunities to valorise or exploit the data, as most data repositories offer IP protection through an embargo period and a “grace period” during which the data can be exploited for publication or patenting. For EU-funded projects, the European Commission has a European IP Helpdesk that offers documentation and advice to researchers facing IP concerns.
Ongoing initiatives of interest in the field of Open Science
The Research Data Alliance (RDA): The RDA is an open multidisciplinary forum that aims to address issues, establish and share standards and tools that make open science easier, and formulate recommendations for funders, publishers and policy-makers.
The European Health Data Space (EHDS): The EHDS aims to provide a trustworthy setting for secure access to and processing of a wide range of health data, in full compliance with applicable EU legislation (including the GDPR and the DGA) and national legislation. The infrastructure is currently under development and will be governed by a specific EU Health Data Space Regulation (a proposal is presently under discussion; see the draft EHDSR). In addition to improving patients’ autonomous management of their personal health data, the EHDS infrastructure allows professionals to access health data for:
Medical uses (defined as a part of the ‘primary use’ in the draft EHDSR)
Research uses (defined as a part of the ‘secondary use’ in the draft EHDSR)
The EHDS is a distributed infrastructure based on several data hubs (health data access bodies) in EU Member States (such as the Health Data Hub in France). It will be linked to other data sharing initiatives such as the European Medicines Agency's DARWIN EU project, which shares data to serve regulatory decision-making by: establishing and expanding a catalogue of observational data sources for use in medicines regulation; providing a source of high-quality, validated real-world data on the uses, safety and efficacy of medicines; and addressing specific questions through high-quality, non-interventional studies, including developing scientific protocols, interrogating relevant data sources, and interpreting and reporting study results.
According to the current draft EHDSR, data holders shall make the following categories of electronic data available to Health data access bodies for secondary use in accordance with the provisions of Chapter IV:
EHRs – Electronic Health Records;
data impacting on health, including social, environmental and behavioural determinants of health;
relevant pathogen genomic data, impacting on human health;
health-related administrative data, including claims and reimbursement data;
human genetic, genomic and proteomic data;
person generated electronic health data, including medical devices, wellness applications or other digital health applications;
identification data related to health professionals involved in the treatment of a natural person;
population wide health data registries (public health registries);
electronic health data from medical registries for specific diseases;
electronic health data from clinical trials;
electronic health data from medical devices and from registries for medicinal products and medical devices;
research cohorts, questionnaires and surveys related to health;
electronic health data from biobanks and dedicated databases;
electronic data related to insurance status, professional status, education, lifestyle, wellness and behaviour data relevant to health;
electronic health data containing various improvements such as correction, annotation, enrichment received by the data holder following a processing based on a data permit.
Health data access bodies shall only provide access to electronic health data where the intended purpose of processing pursued by the applicant complies with:
activities for reasons of public interest in the area of public and occupational health, such as protection against serious cross-border threats to health, public health surveillance or ensuring high levels of quality and safety of healthcare and of medicinal products or medical devices;
to support public sector bodies or Union institutions, agencies and bodies including regulatory authorities, in the health or care sector to carry out their tasks defined in their mandates;
to produce national, multi-national and Union level official statistics related to health or care sectors;
education or teaching activities in health or care sectors;
scientific research related to health or care sectors;
development and innovation activities for products or services contributing to public health or social security, or ensuring high levels of quality and safety of health care, of medicinal products or of medical devices;
training, testing and evaluating of algorithms, including in medical devices, AI systems and digital health applications, contributing to the public health or social security, or ensuring high levels of quality and safety of health care, of medicinal products or of medical devices;
providing personalised healthcare consisting in assessing, maintaining or restoring the state of health of natural persons, based on the health data of other natural persons.
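As an illustration of how an applicant or a data steward might pre-screen a data access request against the purposes listed above, the following minimal sketch checks a declared purpose against a fixed set. The short labels in `PERMITTED_PURPOSES` and the `screen_purpose` function are hypothetical names invented for this example; they paraphrase the eight purposes of the draft EHDSR and carry no legal value. A real assessment is made by the health data access body on the full application.

```python
# Hypothetical short labels for the eight purposes listed in the draft EHDSR.
PERMITTED_PURPOSES = frozenset({
    "public_interest_health",      # public and occupational health
    "public_sector_tasks",         # support to public bodies and regulators
    "official_statistics",         # national, multi-national and Union level
    "education_teaching",          # in health or care sectors
    "scientific_research",         # related to health or care sectors
    "development_innovation",      # products or services for public health
    "algorithm_training_testing",  # incl. medical devices and AI systems
    "personalised_healthcare",     # based on health data of other persons
})


def screen_purpose(declared: str) -> bool:
    """Return True if the declared label matches one of the listed purposes."""
    return declared.strip().lower() in PERMITTED_PURPOSES


if __name__ == "__main__":
    for label in ("scientific_research", "marketing"):
        status = "listed" if screen_purpose(label) else "not listed"
        print(f"{label}: {status}")
```

A label that is not in the set only signals that the request needs rewording or is out of scope; it does not replace the review of the application itself.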
European Union Legislation
Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information (recast), PE/28/2019/REV/1, OJ L 172, 26.6.2019, p. 56-83, CELEX number: 32019L1024
Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance), OJ L 119, 4.5.2016, p. 1-88, CELEX number: 32016R0679
Regulation (EU) 2018/1807 of the European Parliament and of the Council of 14 November 2018 on a framework for the free flow of non-personal data in the European Union (Text with EEA relevance), PE/53/2018/REV/1, OJ L 303, 28.11.2018, p. 59-68, CELEX number: 32018R1807
Regulation (EU) 2022/868 of the European Parliament and of the Council of 30 May 2022 on European data governance and amending Regulation (EU) 2018/1724 (Data Governance Act) (Text with EEA relevance), PE/85/2021/REV/1, OJ L 152, 3.6.2022, p. 1-44, CELEX number: 32022R0868
Commission Implementing Decision (EU) 2021/914 of 4 June 2021 on standard contractual clauses for the transfer of personal data to third countries pursuant to Regulation (EU) 2016/679 of the European Parliament and of the Council (Text with EEA relevance), C/2021/3972, OJ L 199, 7.6.2021, p. 31-61