Lesson plan 5: File formats

FAIR elements:

Accessible

Once users find the required data, they need to know how the data can be accessed, possibly including authentication and authorisation.

A1. (Meta)data are retrievable by their identifier using a standardised communications protocol

A1.1. The protocol is open, free, and universally implementable

A1.2. The protocol allows for an authentication and authorisation procedure, where necessary

A2. Metadata are accessible, even when the data are no longer available

Interoperable

The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.

I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

I2. (Meta)data use vocabularies that follow FAIR principles

I3. (Meta)data include qualified references to other (meta)data

Reusable

The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.

R1. (Meta)data are richly described with a plurality of accurate and relevant attributes

R1.1. (Meta)data are released with a clear and accessible data usage license

R1.2. (Meta)data are associated with detailed provenance

R1.3. (Meta)data meet domain-relevant community standards

Primary audience(s): Bachelor's, master's, and PhD students

Learning outcomes:

  • Knows which formats support FAIR data

  • Understands the differences between open and proprietary formats

  • Knows about open formats and how/where to check their openness

  • Is able to apply knowledge by exporting/converting files into different formats
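
For tabular data, the export/convert outcome can be practised without special software. The following is a minimal Python sketch, not a prescribed method: it rewrites a tab-separated file as CSV and then stores the original next to the converted copy in a ZIP container. The function names tsv_to_csv and pack_together, the UTF-8 encoding, and all file names are assumptions made for illustration. A real export from a proprietary tool would normally go through that tool's own export function.

```python
import csv
import zipfile
from pathlib import Path


def tsv_to_csv(src: Path, dst: Path) -> None:
    """Rewrite a tab-separated file as comma-separated text.

    Cell values are copied unchanged; only the delimiter and quoting differ.
    UTF-8 is assumed for both files.
    """
    with src.open(newline="", encoding="utf-8") as fin:
        with dst.open("w", newline="", encoding="utf-8") as fout:
            reader = csv.reader(fin, delimiter="\t")
            writer = csv.writer(fout)  # quotes cells that contain commas
            writer.writerows(reader)


def pack_together(original: Path, open_copy: Path, container: Path) -> None:
    """Store the original file and its converted copy in one ZIP container."""
    with zipfile.ZipFile(container, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(original, arcname=original.name)
        zf.write(open_copy, arcname=open_copy.name)
```

For example, tsv_to_csv(Path("survey.tsv"), Path("survey.csv")) followed by pack_together(Path("survey.tsv"), Path("survey.csv"), Path("survey.zip")), where the survey file names are hypothetical. ZIP compression is lossless, so both files come back byte for byte when the container is unpacked.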

Summary of tasks/actions:

  1. Raise awareness about file formats and their standards:

    1. obsolescence

    2. proliferation

    3. lossless vs. lossy formats

    4. significant properties

  2. Show the differences between open and proprietary formats, and explain their role in making data FAIR (documentation, standards):

    1. What are the advantages of open formats?

    2. What are the disadvantages of proprietary formats?

    3. What should you do if you still (need to) use proprietary formats?

      1. How to convert file formats?

      2. How to export files into a different format?

      3. How to save the files in containers to preserve the original (proprietary) format along with a more open option?

  3. Show tools for file format identification (e.g. PRONOM) and validation (e.g. JHOVE)

  4. Application of knowledge in practice (quiz, exercises)

    1. Questionnaire: Open or not? Which of these file formats support FAIR data?

      1. Which of these text formats are suitable for long-term archiving? (Multiple choice)

        1. txt

        2. docx

        3. odt

        4. html

      2. Which of these tabular formats are suitable for long-term archiving? (Multiple choice)

        1. xlsx

        2. csv

        3. tsv

        4. por (SPSS Portable)

      3. Which of these image formats are suitable for long-term archiving? (Multiple choice)

        1. jpg

        2. png

        3. tiff

        4. gif

    2. Learners choose a folder on their own computer, check the file formats it contains against the FAIR principles, and, where necessary, export or convert the files into a more open format.
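
Learners who are comfortable with scripting could start the folder exercise with a short inventory. The following is a minimal Python sketch under stated assumptions, not a replacement for dedicated identification or validation tools: it counts files per extension, guesses a format from the first bytes of a file, and lists files whose extension is not in a set supplied by the learner. The function names sniff, inventory and needs_review are hypothetical, and the signature table is only a small illustrative subset of what a registry such as PRONOM records.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, List

# A few well-known leading byte signatures ("magic numbers").
# Illustrative subset only; format registries hold far more complete data.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"II*\x00", "TIFF image"),
    (b"MM\x00*", "TIFF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP container (also used by docx, xlsx, odt)"),
]


def sniff(path: Path) -> str:
    """Guess a format from the first bytes of a file, ignoring its extension."""
    with path.open("rb") as fh:
        head = fh.read(16)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def inventory(folder: Path) -> Counter:
    """Count files per lower-cased extension, including subfolders."""
    return Counter(
        p.suffix.lower() or "(none)"
        for p in folder.rglob("*")
        if p.is_file()
    )


def needs_review(folder: Path, open_extensions: Iterable[str]) -> List[Path]:
    """List files whose extension is not in the caller-supplied set.

    Which extensions count as open is the learner's decision;
    this function makes no judgement of its own.
    """
    allowed = {ext.lower() for ext in open_extensions}
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() not in allowed
    )
```

A learner might call inventory(Path("my_folder")) to see which formats dominate, then needs_review(Path("my_folder"), {".txt", ".csv"}) with their own list of accepted extensions; the folder name and the extension set here are placeholders. Comparing sniff() with the file extension also shows that an extension alone does not prove what a file contains.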

Materials/Equipment

  • Computer/laptop

  • Internet/browser
