Module 1: FAIR Data
This page is currently under construction
The training curriculum is currently undergoing final revisions and quality checks. All materials will be released shortly. Until the official release, please refrain from using, distributing, or implementing any part of these resources.
Learning Objectives
- Learning Objective 1 (LO1): Apply FAIR principles and share data effectively.
- Learning Objective 2 (LO2): Recommend tools, workflows, and technical strategies to make research data FAIR.
- Learning Objective 3 (LO3): Explain the importance of making data FAIR.
Total Module Duration
4.5–5 hours (without optional activities)
Learning Objective 1
LO1: Apply FAIR principles and share data effectively.
Learning Activities
- Presentation (45 mins): Explain each of the FAIR principles, the benefits of applying them, and how to make data FAIR under each principle. Explain the main tools required, such as data management plans, persistent identifiers, licenses for datasets, metadata and repositories, at a level of detail that makes the workflow within a research project clear.
- Case study analysis (60 mins):
- Case 1: A research project where data was made FAIR (a good example).
- Case 2: A dataset where metadata is missing.
- Case 3: A dataset where the formats are no longer maintained.
Materials to Prepare
- Presentation on FAIR principles and main tools to help apply FAIR principles.
- Case studies to facilitate a discussion activity on research data collection.
Instructor Notes
Presentation on FAIR:
- Introduce the learner to FAIR principles (Findable, Accessible, Interoperable, Reusable) and present the purposes of FAIR principles: not only for sharing, but to produce quality data in general. The aim is to give an overview of data sharing challenges in research and how FAIR principles address them. Use real-world examples of FAIR data implementation to explain the principles.
- Explain one principle at a time. For each principle, cover the benefits of applying it and how to make data FAIR with respect to it. Explain the main tools required, such as data management plans, persistent identifiers, licenses for datasets, metadata and repositories. The instructor can make it clearer for the learners by highlighting where in the research project life cycle FAIR principles can be applied and where support may be provided. Consider including an interactive quiz, such as a Kahoot, to assess the learners' understanding of FAIR and non-FAIR data practices.
- The instructor can cover the following key takeaways:
- "Open" data and FAIR data: Ensure learners grasp the difference between "open" data and FAIR data. Be clear that FAIR principles are guidelines, not strict rules, and that flexibility is essential. FAIR principles are designed to make data reusable and more accessible.
- Sharing data using FAIR principles leads to more reproducible and impactful research.
- FAIR data is not just about open data but making data "as open as possible, as closed as necessary."
- Explain why the FAIR principles are being taught. The main objective of FAIR is to increase data reuse by researchers. The core concepts of the FAIR principles are based on good scientific practice and are intuitively grounded. As data stewards, it will be the learners' job to support good research practice and support FAIR.
- Make sure learners understand that the metadata for a given dataset is often linked to the data repository where it has been published. For instance, in Zenodo, the metadata is very basic, whereas in disciplinary repositories, for instance the Cambridge Structural Database, the metadata is much more detailed. Accordingly, address discipline-specific considerations for implementing FAIR principles in different fields, for example differences in metadata standards or data formats.
Case Studies:
- Prepare the three cases:
- 1) a "good example" dataset that has been made FAIR,
- 2) a dataset where the metadata is missing, and
- 3) a dataset where the format is outdated.
- Prepare questions for the learners to work with, such as: Which metadata would be appropriate to add (Case 2)? How can we ensure that the data remains usable in years to come, and which formats could we recommend (Case 3)?
- Many libraries provide guides to where you can find teachable datasets, such as those linked in Resources 1–3. These datasets may require you to edit the README file or remove some metadata to make interesting cases.
Resources
Input for presentation and case studies:
- Datasets for Teaching and Learning. https://www.lib.ncsu.edu/formats/teaching-and-learning-datasets. Accessed 24 Mar. 2025.
- Huck, Jennifer. LibGuides: Data and Statistics: Find Data for Teaching and Learning. https://guides.lib.virginia.edu/data/teachandlearn. Accessed 24 Mar. 2025.
- Data Education in School. Useful datasets for data education in schools. https://dataschools.education/resource/useful-datasets-for-data-education-in-schools/. Accessed 24 Mar. 2025.
Background reading for slide preparation and discussion with learners. These resources are also suitable for sharing with learners:
- A FAIRy Tale -- A Fake Story in a Trustworthy Guide to the FAIR Principles for Research Data. https://forskningsdata.dk/fairytale/index.html. Accessed 19 Mar. 2025.
Inspiration for instructor:
- "FAIR-Aware: Assess Your Knowledge of FAIR". FAIRsFAIR, 3 Jul. 2020. https://www.fairsfair.eu/fair-aware.
- "FAIR Principles". GO FAIR. https://www.go-fair.org/fair-principles/. Accessed 19 Mar. 2025.
- Adams, Jenni, et al. "Supporting FAIR Data Management Planning Across Different Disciplines at the University of Sheffield". Data Science Journal, vol. 22, no. 1, June 2023. datascience.codata.org. https://doi.org/10.5334/dsj-2023-017.
- FAIR for Beginners | Danish E-Infrastructure Consortium. https://www.deic.dk/en/data-management/instructions-and-guides/FAIR-for-Beginners. Accessed 24 Mar. 2025.
- Jasinska, Agnes, et al. Open Licences for Data. https://doi.org/10.5281/zenodo.14921877.
Please note the Open learning object, which is an interactive checklist to help learners choose a license before sharing data.
Learning Objective 2
LO2: Recommend tools, workflows, and technical strategies to make research data FAIR.
Learning Activities
- Workshop Activity 1 with workshop handout (30 mins): Discussion around tools to make research data FAIR.
- (Optional) Workshop Activity 2 (180 mins): Learners start exploring the technicalities of supporting FAIR data. The workshop aims to familiarise learners with the main technical strategies to make data FAIR and to identify the competencies they need to develop further to support FAIR research data. The learners are introduced to a toolbox that helps them identify the stages of the research project life cycle at which each tool is appropriate.
Materials to Prepare
- Workshop Activity 1: overview of tools to support FAIR data with key takeaways.
- Prepare and distribute the Workshop Handout.
- (Optional) Workshop Activity 2: Ensure the content is tailored to the learners' discipline(s).
Instructor Notes
Workshop Activity Part 1:
- Objective: Familiarise learners with the main technical strategies to make data FAIR.
- Begin with a brief recap of the FAIR principles and the importance of making data FAIR (depending on how the instructor is teaching the module, these can be covered by materials in Learning Objective 1).
- Present and share the handout, which provides a summary of key technical strategies for each FAIR principle (e.g., metadata, PIDs, repositories, formats, licenses). The workshop will give an overview of tools supporting FAIR data (e.g., Zenodo, Figshare, Dataverse). The purpose is to provide a more "hands-on" introduction to workflows for metadata generation, data storage, and archiving (this activity is also a practical exploration of learning material in Module 7 on Metadata and Module 9 on Data Management Plans). It is important to spend time explaining licensing options (for instance, Creative Commons) that align with FAIR principles.
- Key Takeaways are:
- Not all tools are equally suitable for every type of data or research project.
- Many tools and repositories exist to facilitate FAIR data, and some are more complicated than others. Test the appropriateness of a tool before recommending it to learners or including it in the workshop material.
- Choosing the right tool depends on your specific research needs.
- Well-structured metadata is crucial for making data findable and reusable.
- The instructor can highlight why FAIR principles are important for data steward (DS) technical skills. Each FAIR principle plays a crucial role in ensuring that data can be easily discovered, accessed, integrated, and reused by other researchers and stakeholders. Assigning persistent identifiers (PIDs), such as DOIs, to datasets ensures they can be reliably referenced and located over time. PIDs enhance the Findability and credibility of research outputs. Using standardised formats and controlled vocabularies facilitates data Interoperability, allowing datasets to be integrated and analysed alongside other data. This is vital for collaborative research and data sharing across disciplines. Familiarity with the tools and resources (for example metadata generators, repositories, FAIR evaluation tools) is essential for effectively implementing FAIR principles in practice, because these tools help streamline the process of making data FAIR. FAIR principles are also a first step towards reproducible research.
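To make the point about standardised formats and controlled vocabularies concrete, a dataset description can be written with a shared vocabulary. The following is a minimal sketch, assuming the schema.org `Dataset` type serialised as JSON-LD. The helper name `dataset_jsonld` and all example values (including the DOI `10.1234/example.5678`) are hypothetical and chosen only for illustration.

```python
import json


def dataset_jsonld(name, description, doi, licence_url, creators):
    """Describe a dataset with schema.org terms (hypothetical helper)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # The DOI is expressed as a URL so that it can be followed.
        "identifier": f"https://doi.org/{doi}",
        "license": licence_url,
        "creator": [{"@type": "Person", "name": person} for person in creators],
    }


# Fictional example values, for illustration only.
record = dataset_jsonld(
    name="Example stream temperature measurements",
    description="Fictional daily water temperatures from two sites.",
    doi="10.1234/example.5678",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    creators=["A. Researcher"],
)
print(json.dumps(record, indent=2))
```

Because the keys come from a published vocabulary rather than a local convention, other services can interpret the record without project-specific documentation.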
Workshop Handout
The suggested handout can be a list that prompts the learners to think about the tools and workflows for each principle, or it can be developed further as an online resource with links to examples and more information. It can also be used by the learners to assess the FAIRness of repositories and other services.
Prepare a detailed explanation of technical strategies that can be employed for each FAIR principle. The list below can be used as a starting point:
- Findable
- Persistent Identifiers (PIDs): Assign globally unique and persistent identifiers (DOIs, Handles) to your data and metadata so that it can be reliably found and referenced. Examples include:
- DOIs (Digital Object Identifiers) for datasets (DataCite, Zenodo).
- ORCID iDs for author identification.
- Metadata: Create and publish rich metadata using standard schemas (e.g., Dublin Core, DataCite). Ensure metadata includes PIDs and is indexed in searchable databases and repositories.
- Tools: DataCite Metadata Generator, Zenodo, Figshare.
- Data Catalogs/Repositories: Deposit your data in open, FAIR-compliant repositories that ensure data and metadata are indexed and easily searchable.
- Examples: Zenodo, Figshare, Dryad, Dataverse.
- Accessible
- Data Licensing: Use appropriate open licenses (e.g., Creative Commons licenses) to make data accessible within legal and ethical constraints.
- Tools: Creative Commons License Chooser.
- Standardized Protocols: Provide data using open, universally accessible protocols like HTTP or FTP. Ensure metadata remains accessible even if the data itself is restricted or sensitive.
- Repositories like Zenodo and Figshare support these protocols.
- Authentication and Authorisation: For sensitive or restricted data, implement secure access protocols (for instance, OAuth, OpenID) and maintain controlled access to data through repositories that provide these services.
- Example repositories: dbGaP, controlled-access repositories.
- Interoperable
- Standardised Formats: Use open, non-proprietary, and widely recognized file formats that enable interoperability across platforms (for example CSV, JSON, XML, NetCDF). Ensure these formats are compatible with common tools and software used by the research community.
- Example formats: HDF5, NetCDF (for scientific data); RDF (for semantic web).
- Controlled Vocabularies and Ontologies: Use community-endorsed vocabularies and ontologies to describe your data consistently and unambiguously.
- Tools: FAIRsharing, BioPortal (to find vocabularies and ontologies).
- Linked Data and Ontologies: Make data interoperable by using linked data principles (e.g., RDF, SPARQL) and connecting datasets with ontologies.
- Examples: OWL, RDF, SPARQL endpoints.
- Reusable
- Detailed Metadata: Provide comprehensive and rich metadata that describes not just the data but also its context, provenance, and how it can be reused (methodology, instruments, software used).
- Tools: Metadata schemas like DataCite, Dublin Core, and discipline-specific standards like ISO 19115 for geospatial data.
- Data Licensing: Use appropriate licenses (such as Creative Commons licenses) to clearly specify how your data may be reused with regard to legal and ethical constraints.
- Data Provenance: Track and document data provenance (how data was collected, processed, or transformed) so future users can assess the quality and suitability of the data for reuse.
- Tools: ProvONE (for provenance tracking), RO-Crate (for research objects).
- Data Versioning: Use version control to ensure that data changes are tracked and documented, enabling users to reference the exact version of data they are using.
- Tools: Git, Zenodo versioning, Dataverse versioning.
- Citable and Open Documentation: Publish clear, citable documentation about how the data was generated, what standards it follows, and how to reuse it. This may include methods papers, data dictionaries, or readme files.
- Tools: README generator tools, Jupyter Notebooks (for reproducibility).
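As an illustration of the Findable strategies in the handout, the sketch below checks a metadata record for the mandatory properties of the DataCite Metadata Schema (version 4.x) and tests whether an identifier has the shape of a DOI. It is a simplified teaching aid, not a validator: the dictionary keys are a hypothetical simplification of the property names, the functions `missing_properties` and `looks_like_doi` are invented for this example, and the record is fictional.

```python
import re

# Mandatory properties of the DataCite Metadata Schema (version 4.x).
# The key spellings are a simplification assumed for this sketch.
DATACITE_MANDATORY = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)

# Loose DOI shape: "10.", a registrant code of 4 to 9 digits, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def missing_properties(record: dict) -> list[str]:
    """Return the mandatory properties that are absent or empty."""
    return [name for name in DATACITE_MANDATORY if not record.get(name)]


def looks_like_doi(value: str) -> bool:
    """Check only the shape of a DOI, not whether it resolves."""
    return bool(DOI_PATTERN.match(value))


# Fictional record with two gaps for learners to spot.
record = {
    "identifier": "10.1234/example.5678",
    "creators": ["A. Researcher"],
    "titles": ["Example stream temperature measurements"],
    "publisher": "",
    "publicationYear": 2024,
}
print(missing_properties(record))            # ['publisher', 'resourceType']
print(looks_like_doi(record["identifier"]))  # True
```

A check like this can prompt discussion about which gaps matter most for findability, but it does not replace the validation performed by a repository.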
(Optional) Workshop Activity 2
A. Hands-on Exercise – Making a Dataset FAIR (60 mins)
Check that participants have internet access and have brought their computers, so that they can explore the tools later in the workshop.
Objective:
- Learners will apply strategies to make a provided dataset FAIR.
Instructions:
- Split participants into small groups (3–5 people per group).
- Provide each group with a sample dataset (ideally one with minimal FAIR compliance: incomplete metadata, no PIDs, and so on).
- Assign each group the task of improving the dataset's FAIRness by making it:
- Findable: Adding metadata to make the dataset findable using a tool like the DataCite Metadata Generator.
- Accessible: Deciding on an appropriate license using the Creative Commons License Chooser, and selecting a repository like Zenodo to host the dataset.
- Interoperable: Converting the dataset into a standardised format (such as from Excel to CSV or JSON).
- Reusable: Writing a comprehensive README file explaining the data's context, collection process, and potential for reuse. Add detailed metadata to enhance reusability.
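The Interoperable and Reusable tasks above can be sketched with the Python standard library alone. The example assumes the spreadsheet has already been exported to CSV (reading Excel files directly would need a third-party library). The function names `csv_text_to_json` and `build_readme`, the column names, and the sample values are all hypothetical.

```python
import csv
import io
import json


def csv_text_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects.

    All values stay strings; type conversion is left to the data owner.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2, ensure_ascii=False)


def build_readme(title: str, description: str, licence: str, columns: dict) -> str:
    """Assemble a plain-text README from a few descriptive fields."""
    lines = [title, "", description, "", f"License: {licence}", "", "Columns:"]
    lines += [f"- {name}: {meaning}" for name, meaning in columns.items()]
    return "\n".join(lines) + "\n"


# Fictional sample data, for illustration only.
sample_csv = "site,date,temperature_c\nA,2024-05-01,12.5\nB,2024-05-01,13.1\n"

print(csv_text_to_json(sample_csv))
print(
    build_readme(
        title="Example stream temperature measurements",
        description="Fictional daily water temperatures from two sites.",
        licence="CC BY 4.0",
        columns={
            "site": "Site code",
            "date": "Measurement date (ISO 8601)",
            "temperature_c": "Water temperature in degrees Celsius",
        },
    )
)
```

Groups can adapt the column descriptions to their own dataset; the point is that every column and unit is documented next to the data.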
Materials to Prepare:
- A basic dataset in Excel or CSV format (could be real or fictional).
- Access to tools such as:
- DataCite Metadata Generator (for metadata).
- Zenodo or Figshare (for repository selection).
- Creative Commons License Chooser (for licensing).
- README file generator or templates.
- Internet access for groups to access tools.
B. Evaluation and Group Presentations (60 mins)
Objective:
- Evaluate the group's work and learn from each other's approaches.
Instructions:
- Each group presents their FAIRification process, explaining the changes they made to the dataset in relation to each FAIR principle.
- The group should walk through how they improved the findability, accessibility, interoperability, and reusability of the dataset.
- After presentations, facilitate a group discussion focusing on:
- What challenges did they encounter while making the data FAIR?
- Which tools were most useful?
- What additional steps could be taken to further enhance the dataset's FAIRness?
Materials to Prepare:
- Presentation slides or flip charts for each group to summarise their process.
- A rubric or checklist to evaluate the improvements made to the datasets according to the FAIR principles.
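One possible way to turn such a rubric into something groups can fill in and compare is sketched below. The checklist questions, the `Criterion` class, and the `score_by_principle` function are hypothetical; they are not taken from an established FAIR assessment tool, and the simple fraction used as a score is only one of many reasonable choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    principle: str
    question: str


# Hypothetical checklist items; adapt them to the datasets used in the workshop.
CHECKLIST = [
    Criterion("Findable", "Does the dataset have a persistent identifier?"),
    Criterion("Findable", "Is the metadata deposited in a searchable repository?"),
    Criterion("Accessible", "Is a license stated?"),
    Criterion("Interoperable", "Is the data in an open, non-proprietary format?"),
    Criterion("Reusable", "Is there a README describing context and collection?"),
    Criterion("Reusable", "Is provenance documented?"),
]


def score_by_principle(checklist, answers):
    """Return, per principle, the fraction of checklist items answered 'yes'."""
    totals, met = {}, {}
    for item in checklist:
        totals[item.principle] = totals.get(item.principle, 0) + 1
        # Unanswered questions count as 'no'.
        met[item.principle] = met.get(item.principle, 0) + int(
            answers.get(item.question, False)
        )
    return {principle: met[principle] / totals[principle] for principle in totals}


# Fictional answers from one group.
answers = {
    CHECKLIST[0].question: True,
    CHECKLIST[2].question: True,
    CHECKLIST[4].question: True,
    CHECKLIST[5].question: True,
}
scores = score_by_principle(CHECKLIST, answers)
print(scores)
```

The per-principle fractions are meant as a conversation starter during the group discussion, not as a formal FAIR score.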
C. Wrap-up and Reflection (30 mins)
Objective:
- Reflect on the learning experience and reinforce the key takeaways.
Instructions:
Ask participants to reflect on:
- What technical strategies stood out as most important for FAIR data?
- How would they apply these strategies in their own research or work?
- What skills or tools would they like to explore in more detail in the future?
- Conclude with a Q&A session to clarify any outstanding questions about the technical strategies for making data FAIR.
Assessment:
- Each group's dataset should be assessed based on how well they addressed each of the FAIR principles.
- Provide feedback on their use of tools, completeness of metadata, and choice of repositories.
Optional Extension:
- Learners could evaluate the FAIRness of their own datasets and present improvements in a follow-up session or as a homework task.
Materials to Prepare:
- Pre-select a dataset for each group with suboptimal FAIR compliance.
- Set up access to necessary tools (Zenodo, DataCite).
- FAIRness checklist or rubric for the evaluation stage.
- Prepare a slide deck or handout summarising key FAIR principles and strategies.
Resources
Input for tutorials and tools to include in the workshop:
- "RDM Starter Kit". GO FAIR. https://www.go-fair.org/resources/rdm-starter-kit/. Accessed 19 Mar. 2025.
- RDMkit. https://rdmkit.elixir-europe.org/metadata_management. Accessed 19 Mar. 2025.
- "DataCite Training". DataCite Support. https://support.datacite.org/docs/datacite-training. Accessed 19 Mar. 2025.
- Knowledge Exchange Webinar -- Persistent Identifiers (PID's) in Academia: Risk and Trust | Danish e-Infrastructure Consortium. https://www.deic.dk/da/event/knowledge-exchange-PID. Accessed 19 Mar. 2025.
Input for additional tools and platforms that assess the FAIRness of data, such as:
- FAIRsharing. https://fairsharing.org/?lang=en. Accessed 19 Mar. 2025.
This resource helps identify community standards for datasets and metadata.
- Assessment. https://fairplus.github.io/the-fair-cookbook/content/recipes/assessing-fairness.html. Accessed 19 Mar. 2025.
This resource provides services, tools and indicators to assess data against the FAIR principles.
- "F-UJI Automated FAIR Data Assessment Tool". FAIRsFAIR, 22 Sep. 2020. https://www.fairsfair.eu/f-uji-automated-fair-data-assessment-tool.
This resource is an automated tool for assessing FAIR data objects.
- Jasinska, Agnes, et al. Open Licences for Data. https://doi.org/10.5281/zenodo.14921877.
Please note the Open learning object, which is an interactive checklist to help learners choose a license before sharing data.
Background reading on suggested skills for data stewards:
- Wildgaard, Lorna, et al. Milestone: Pilot learning path for Data Stewards. Zenodo, 13 Aug. 2024. https://doi.org/10.5281/zenodo.13309349.
Learning Objective 3
LO3: Explain the importance of making data FAIR.
Learning Activities
- Debate (45 mins): Should all data be FAIR? Divide learners into two groups to argue for and against.
- Role-play (45 mins): Advocate for FAIR data to a funding agency or a sceptical researcher.
- Mind mapping (50 mins): Create a mind map showing the connections between FAIR principles and other key concepts like metadata, data curation, and preservation. Based on the mind mapping activity, identify areas where learners feel less confident about FAIR data and address them.
Materials to Prepare
- Debate: Prepare debate topics and provide examples of ethical and legal frameworks as learner preparation.
- Facilitate the role play.
Instructor Notes
Overall:
- The focus of this learning objective is the ethical, legal, and practical reasons for advocating FAIR data: the why rather than the how, which is addressed in LO2. Address how FAIR data increases transparency and impact in research, as well as common concerns researchers may have about making data FAIR (such as time, cost, privacy issues). Talk about the advantages too, such as how FAIR data helps address the reproducibility crisis (discipline-dependent).
- Focus on balancing benefits with realistic constraints.
- Researchers have common concerns about working FAIR (intellectual property, data security, lack of technical skills), and there are also advantages (for example, making data FAIR can help secure funding and collaborations). The data steward will need to mediate between the two. The concerns are valid, but they can be managed with the right strategies.
- FAIR data benefits the scientific community and increases research visibility.
- The activities encourage the learners to articulate the FAIR agenda and consider FAIR from the perspective of researchers and data stewards.
- In this last lesson, talk about how FAIR principles are interwoven with other data stewardship concepts. Identify, for example, how other modules in the curriculum build on the FAIR principles (metadata, data quality). This will encourage the learners to think critically across the curriculum and to consider, on a deeper level, the complexity of the tasks they work with as data stewards. Make sure learners understand how FAIR connects with broader data stewardship topics and encourage learners to identify their learning needs for future modules.
- FAIR is a foundational concept that informs all aspects of data stewardship. The principles of FAIR will be applied repeatedly throughout the curriculum. Other lessons in the curriculum dive deeper into specific technical strategies for implementing FAIR.
Debate:
- The topic can be: Should all data be FAIR?
- The instructor can provide examples of ethical and legal frameworks as learner preparation. Guidance and tips on preparing for the debate can be found in Resource 1.
Role Play:
- Advocate for FAIR data to a funding agency or a sceptical researcher. The "What I Need From You" (WINFY) activity is an engaging format to structure the role play (Resource 2).
Mind map:
- This activity serves as an assessment. Mind mapping templates can be prepared (Resource 3) and used in the assessment activity. Likewise, prepare to discuss with the learners the areas of FAIR where they feel less confident, the skills they may wish to improve, and where they can learn more.
Resources
How to set up the debate and role-play activity, an instructor guide:
- 'Classroom Debates | Center for Innovative Teaching and Learning'. Northern Illinois University. https://www.niu.edu/citl/resources/guides/instructional-guide/classroom-debates.shtml. Accessed 24 Mar. 2025.
- 'What I Need From You (WINFY)'. SessionLab. https://www.sessionlab.com/methods/what-i-need-from-you-winfy. Accessed 24 Mar. 2025.
Brainstorming tools for assessment activities:
- "Free Online Mind Maps." Canva, https://www.canva.com/graphs/mind-maps/. Accessed 19 Mar. 2025.
Background reading for instructor preparation. These resources are also suitable for sharing with learners:
- "RDM Starter Kit". GO FAIR. https://www.go-fair.org/resources/rdm-starter-kit/. Accessed 19 Mar. 2025.
- Elixir Europe. "FAIRplus Webinar - What is the value of FAIR data?" YouTube, https://www.youtube.com/watch?v=2iWf4XtnzkI. Accessed 19 Mar. 2025.
- Wildgaard, Lorna, et al. Milestone: Pilot learning path for Data Stewards. Zenodo, 13 Aug. 2024. https://doi.org/10.5281/zenodo.13309349.
- 4EU+. "Research Data Management- Introduction to FAIR and Open Data." YouTube, https://www.youtube.com/watch?v=gK5ZPKVk4RA. Accessed 19 Mar. 2025.