
Datasets

Data Management for an Open Science Policy

"As open as possible, as closed as necessary"

HEI-Lab aligns with the policies on the management and sharing of research data issued by the European Commission and with the recommendations of the Foundation for Science and Technology (FCT). These policies set requirements for data management and protection, specifically through the creation of data management plans, and define data sharing as the default option. Within the framework and requirements established in Horizon 2020 and Horizon Europe, this principle has been formulated as "As open as possible, as closed as necessary".

FAQs

What is a Data Management Plan?
What does a Data Management Plan comprise?
Why create a Data Management Plan?
Requirements of Funding Agencies
When should Data Management Plans be created?
Data Management Plan – Recommended Tools
Where to preserve my data?
What if I cannot share my data?
Is it possible to make only some datasets of the project available?
How can I learn more about DMPs?
Why should I follow these guidelines?

What is a Data Management Plan?

A Data Management Plan (DMP) is a formal document that defines the lifecycle of data generated or collected in a research context. It covers various aspects, from the creation or collection to the processing of data during and after a research project. It identifies how the data will be created and documented, who will have access to it, how it can be (re)used, and where it will be stored and/or preserved. DMPs are dynamic documents that adapt and evolve as the research progresses. They are crucial for efficient research data management as they provide a comprehensive understanding of the data and the circumstances in which it was generated. This approach to data management enables its reuse and replication, thus contributing to a more robust scientific system.

What does a Data Management Plan comprise?

A DMP comprises several essential elements to ensure an efficient and ethical approach to data management throughout the lifecycle of a research project. Some of the fundamental components include:

  • What data will be created or collected? Clear identification of the types of data that will be created or collected during the project.
  • How will the data be created or generated? Detailed description of how the data will be created, generated, or collected.
  • What methods and standards will be adopted in data processing? Definition of methods and standards used in data processing, including cleaning, transformation, and analysis processes.
  • What methods and standards will be adopted for data handling throughout the process? Definition of procedures related to data handling, including all procedures for repository deposit, transfer, or safe and efficient reuse of data throughout the research project.
  • What documentation or metadata will be integrated into the data? Specification of documentation or metadata that will be integrated with the data to facilitate understanding and future reuse.
  • How will ethical issues be addressed? Indication of strategies used to address ethical issues related to data collection, (re)use, and dissemination.
  • How will copyright and intellectual property issues be handled? Detailed information on copyright and intellectual property issues associated with the data.
  • How will data be stored and backed up during the project? Outline of file formats and procedures adopted to ensure data storage security and backup during the project.
  • What are the levels of data access and security? Identification of data access levels (e.g., restricted to the institution, completely open access, etc.) and implemented security measures.
  • How will data be maintained and preserved after the project ends? Detailed plan for data preservation, including file formats and storage strategies.
  • What is the long-term data preservation plan? Detailed plan for long-term data preservation, including identification of curation processes.
  • What data will be made available in Open Access? Determination of which data will be made available in Open Access and how.
  • How will the data be shared? Strategies for data sharing, including platforms and formats.
  • Are there guidelines on data restrictions or open access? Identification of restrictions or open access to data, when applicable.
  • Who is responsible for data management? Clear designation of the person or team responsible for continuous data management (DPO).
  • What resources are needed to implement the DMP? Estimate of human, financial, and technological resources necessary for the implementation of the DMP.
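
The components above can also be tracked as a simple checklist while a DMP is being drafted. The following is a minimal sketch only: the class name, the field names, and the example answers are hypothetical and do not correspond to any funder's template or official schema.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DataManagementPlan:
    """Hypothetical DMP checklist; field names are illustrative only."""
    data_types: str = ""
    collection_methods: str = ""
    processing_standards: str = ""
    metadata: str = ""
    ethics: str = ""
    intellectual_property: str = ""
    storage_and_backup: str = ""
    access_and_security: str = ""
    preservation: str = ""
    sharing: str = ""
    responsible: str = ""
    resources: str = ""

    def missing_sections(self) -> List[str]:
        # A section counts as missing while its answer is empty or whitespace.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Example draft with only two sections answered (made-up content).
draft = DataManagementPlan(
    data_types="Survey responses (CSV)",
    responsible="Principal investigator",
)
missing = draft.missing_sections()
print(f"{len(missing)} sections still to complete: {', '.join(missing)}")
```

Because a DMP evolves during the project, a list of this kind can be re-run at each revision to see which sections still need an answer.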

Why create a Data Management Plan?

Creating a DMP helps ensure that research data follow the FAIR principles. These principles function as guidelines, not standards: they outline the essential qualities or behaviors that optimize data reuse, emphasizing elements such as description and citation.

  • Findable

    • Assignment of a persistent unique identifier to (meta)data
    • Detailed metadata description
    • Registration or indexing of (meta)data in a searchable resource
    • Inclusion of the identifier in the metadata
  • Accessible

    • (Meta)data are retrievable via their identifier through a standardized communication protocol
    • Open, free, and universally implementable communication protocol
    • Communication protocol allows for authentication and authorization procedures when necessary
    • Metadata remain accessible even if the data are no longer available
  • Interoperable

    • (Meta)data use a formal, accessible, shared, and widely applicable language for knowledge representation
    • (Meta)data use vocabularies that follow the FAIR principles
    • (Meta)data include qualified references to other (meta)data
  • Reusable

    • (Meta)data have a plurality of precise and relevant attributes
    • (Meta)data are provided with a clear and accessible data usage license
    • (Meta)data are associated with their provenance
    • (Meta)data comply with relevant community standards
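
A few of the criteria listed above can be illustrated with a simple presence check on a dataset's metadata record. This is a rough sketch under stated assumptions: the record keys, the placeholder identifier, and the `unmet_criteria` helper are hypothetical, and passing such a check does not by itself make a dataset FAIR.

```python
# Hypothetical metadata record; keys and values are illustrative only.
record = {
    "identifier": "doi:10.0000/placeholder",  # placeholder, not a real DOI
    "description": "Anonymised survey responses",
    "license": "CC-BY-4.0",
    "provenance": "",  # not yet documented
}

# Map a few FAIR criteria to the (assumed) metadata field that would satisfy them.
CHECKS = {
    "Findable: persistent identifier": "identifier",
    "Findable: metadata description": "description",
    "Reusable: usage license": "license",
    "Reusable: provenance": "provenance",
}


def unmet_criteria(metadata: dict) -> list:
    """Return the labels of criteria whose metadata field is absent or empty."""
    return [
        label
        for label, key in CHECKS.items()
        if not str(metadata.get(key, "")).strip()
    ]


unmet = unmet_criteria(record)
print(unmet)
```

In this example only the provenance field is empty, so that is the single criterion reported as unmet.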

Requirements of Funding Agencies

Common requirements include the creation of a DMP and the provision of research data in open access whenever possible. This requirement covers the data needed to validate results in scientific publications, as well as other data resulting from the project as specified in the DMP.

  • European Commission – European Data Strategy: The European Commission advocates for access to data that validate scientific publications and the availability of all other data associated with the project to maximize access and reuse of data generated by research projects. However, projects may justify not sharing data either at the proposal stage or during execution within the DMP. This can occur in the following scenarios:

    • If the project does not generate or collect data
    • In cases of conflict with the protection of results, especially if there is an expectation of commercial or industrial exploitation
    • When open data availability compromises the main objective of the project
    • In situations of conflict with confidentiality obligations
    • In situations of conflict with national security obligations
    • When sharing would violate personal data protection rules
  • FCT – Policy on the availability of data and other research results financed by FCT

When should Data Management Plans be created?

When to create a DMP varies depending on the project context:

  • In the context of a funded project:

    • First version: During the funding application process or within the first 6 months of the project, as required by the funder.
    • Updates: Whenever significant changes justify it or new datasets are added, and midway through or in the final stages of the project.
    • Evolutionary nature: The DMP is not static; it evolves, gaining more precision and substance throughout the project, as not all data or potential uses may be clear from the beginning.
  • In the context of research units:

    • First version: When applying for funding, in line with the institutional regulation for research data management.
    • Support for DMP creation and maintenance: The research unit should support the researcher in creating and managing the DMP, providing conditions for the development of DMPs for associated projects.
    • Evolutionary nature and curation: The DMP is not static; it evolves, gaining more precision and substance throughout the projects, as not all data or potential uses are clear from the beginning. The institution should assume responsibility for data management, preservation, and curation even after the project ends, as stipulated in each DMP.

Data Management Plan – Recommended Tools

The following tools are suggested to support the creation and management of DMPs:

  • ARGOS: ARGOS supports automated processes for creating, managing, sharing, and linking DMPs with corresponding research results. It integrates with funders and research projects, allowing the use of predefined templates for creating a DMP. It also offers the flexibility to create new templates according to the specifications of the institution, project, funder, etc.
  • DMPonline: DMPonline integrates a wide variety of project funders, making it easier for researchers to organize data collection processes. It allows editing, updating, and sharing different project versions among researchers and enables exporting in various formats (pdf, docx, csv, html, etc.) at each project stage.

These platforms allow DMPs to be created according to the templates established by funding entities (e.g., FCT, Horizon Europe, etc.). However, since the specific requirements for DMPs may vary between funders and organizations, it is always advisable to consult the funder's own guidelines.

Where to preserve my data?

Data preservation is fundamental to ensure integrity and long-term accessibility. The following repositories are recommended:

  • Institutional Repository: The Lusófona Scientific Repository is a digital service that brings together the scientific works produced in the Lusófona Group, making the Group's scientific production publicly and universally available.

  • Thematic Repositories: Some repositories focus on specific types of data, such as geospatial data, genetic data, etc. These repositories may offer a more suitable infrastructure for certain types of data, making them more accessible to peers (e.g., BioData.pt - biological data; APIS - social information data).

  • Specialized Data Centers: Some specialized data centers offer data preservation services to the scientific community (e.g., Zenodo; POLEN).

When choosing the best repository, it is important to consider the data preservation policy, long-term accessibility, security, metadata requirements, and specific research and community needs.

What if I cannot share my data?

Funding agencies recognize valid reasons for not disclosing research data: the project does not generate data, the data are subject to commercial exploitation, sharing would conflict with confidentiality, privacy, or national security obligations, or sharing would compromise the main objective of the project. This is why the European Commission adopted the principle "As open as possible, as closed as necessary".

Is it possible to make only some datasets of the project available?

Yes. Funding agencies request the disclosure of the datasets that support publications, but there is no obligation to share the remaining datasets. In addition, a dataset that was initially planned for release can later be withheld, especially if it falls under one of the exceptions provided, such as the possibility of commercial exploitation. In such situations, the decision must be justified in the DMP.

How can I learn more about DMPs?

Several free training courses can help you understand why a data management plan is needed and how to create one:

  • Skills4EOSC - FAIR-by-Design Methodology
  • NAU/FCCN - The Essentials of Research Data Management
  • RDNL - Essentials 4 Data Support

Why should I follow these guidelines?

Compliance with these guidelines positively impacts the project evaluation, starting from the proposal phase.