Lesson plan 7: Data standardisation and ontologies

FAIR elements:

Findable

Standardisation of data identifiers makes data easier to find.

F1. (Meta)data are assigned a globally unique and persistent identifier

Interoperable

The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.

Interoperability is made easier through standardised representations of knowledge and by using standard variables that allow linking of data files, e.g. using standardised date and time stamps.
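As a minimal sketch of the point above, the following Python example links two small data files through a shared, standardised time stamp. The readings, the local formats and the UTC offset are hypothetical values chosen for illustration; only the Python standard library is used.

```python
from datetime import datetime, timedelta, timezone


def to_iso_utc(text, fmt, utc_offset_hours=0):
    """Parse a time stamp written in a known local format and return it as ISO 8601 in UTC."""
    local = datetime.strptime(text, fmt).replace(
        tzinfo=timezone(timedelta(hours=utc_offset_hours))
    )
    return local.astimezone(timezone.utc).isoformat()


# Hypothetical file A: temperature, day/month/year, local time assumed to be UTC+1
temperature_raw = [("01/03/2023 13:00", 4.5), ("01/03/2023 14:00", 5.1)]
# Hypothetical file B: wind speed, year-month-day, assumed to be recorded in UTC
wind_raw = [("2023-03-01 12:00", 3.2), ("2023-03-01 14:00", 2.9)]

temperature = {to_iso_utc(t, "%d/%m/%Y %H:%M", 1): v for t, v in temperature_raw}
wind = {to_iso_utc(t, "%Y-%m-%d %H:%M"): v for t, v in wind_raw}

# Once both files use the same time stamp standard, they can be linked on it directly.
linked = {ts: (temperature[ts], wind[ts]) for ts in sorted(temperature.keys() & wind.keys())}
print(linked)
```

Without the conversion step the two files share no common key, even though one pair of readings describes the same moment.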

I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation

I2. (Meta)data use vocabularies that follow FAIR principles

I3. (Meta)data include qualified references to other (meta)data

Reusable

Domain-relevant community standards make data easier to understand and reuse.

R1.3. (Meta)data meet domain-relevant community standards

Primary audience(s): Bachelor's, master's and PhD students (without a knowledge management background)

Learning outcomes:

  • Can explain aspects related to data interoperability and integration (standardisation in data and how data standards are used)

  • Can explain aspects of data preparation and cleaning

  • Can explain the roles of ontologies and vocabularies

  • Can recognise the use of ontologies and vocabularies

  • Can recognise the role of data standards in making data FAIR

  • Understands that different communities use different data standards and ontologies to improve the understanding and interoperability of their research data

  • Can identify a few domain-relevant ontologies

  • Understands usage scenarios of ontologies during data collection, data analysis and when making data available through repositories, APIs, etc.

  • Knows how to act when an ontology does not exist or elements are missing in an existing ontology

Summary of tasks/actions:

  1. Explain with easy and practical examples (from your discipline) how standardisation of data can be applied in research. Standardisation enables interoperability and a common understanding of data, and facilitates (interdisciplinary) reuse. Some simple examples are:

    1. Standard coding structures (e.g. use 1=male, 2=female systematically, and not sometimes 1=female, 2=male, or 0=male, 1=female)

    2. Standard units: degrees Celsius vs. degrees Fahrenheit; wind speed measured in m/s vs. knots; universal date and time stamps (19)

    3. Standard geospatial representations, e.g. WGS84

    4. Statistical Classification of Economic Activities in the European Community: NACE code

    5. Universal system of (binomial) nomenclature and taxonomy to name and classify biodiversity, now also including DNA barcoding (20)

    6. Standards for dates and times (ISO 8601), for countries (ISO 3166), for geographical names (Getty Thesaurus of Geographic Names)

  2. You could also show an example of how not using standards makes things more difficult, or involves more work to clean and translate data, for example:

    1. Survey data where standardised responses are still captured as 'text' rather than numerical codes (dataset with 'male', 'female' rather than numeric codes)

    2. Datasets where units of variables are not defined, so it is not possible to say whether the temperature is in Celsius or Fahrenheit

    3. Any other example listed above where no standard was used in the dataset

  3. Use examples of data standards in different disciplinary communities (see references)

    1. Help define data procedures, standards and guidelines by discipline. For example: are there guidelines for data processing? Are there metadata standards, controlled vocabularies, ontologies and taxonomies? Are there specialised data repositories used by the scientific community?

  4. Describe what ontologies are and their function in the semantic web. Introduce the various types of ontologies.
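The simple examples in tasks 1 and 2 can be turned into a short classroom demonstration. The sketch below, in Python, standardises a small hypothetical dataset: text responses become numeric codes (1=male, 2=female), temperatures are converted to degrees Celsius, and dates are rewritten in ISO 8601. The records, field names and accepted date formats are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical raw records with inconsistent conventions
raw = [
    {"sex": "male", "temp": 68.0, "temp_unit": "F", "date": "03/01/2023"},
    {"sex": "F", "temp": 20.0, "temp_unit": "C", "date": "2023-01-04"},
]

SEX_CODES = {"male": 1, "m": 1, "female": 2, "f": 2}  # coding scheme: 1=male, 2=female
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumption: slash dates are month/day/year


def to_celsius(value, unit):
    """Convert a temperature to degrees Celsius; refuse to guess an undefined unit."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown temperature unit: {unit!r}")


def to_iso_date(text):
    """Rewrite a date in ISO 8601 (YYYY-MM-DD), trying each accepted format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def standardise(record):
    return {
        "sex": SEX_CODES[record["sex"].strip().lower()],
        "temp_celsius": round(to_celsius(record["temp"], record["temp_unit"]), 1),
        "date": to_iso_date(record["date"]),
    }


clean = [standardise(r) for r in raw]
print(clean)
```

The date "03/01/2023" is a useful discussion point: the code has to assume month/day/year, and a dataset that does not document its convention cannot be cleaned reliably. The same holds for a record with no temperature unit, which the sketch rejects rather than guesses.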

Teaching interoperability also touches on the following principles:

  • F2. Data are described with rich metadata – using ontologies is part of good metadata practice

  • R1. (Meta)data are richly described with a plurality of accurate and relevant attributes – using ontologies is part of good practice for rich and precise descriptions

  • R1.3. (Meta)data meet domain-relevant community standards – as with the previous two points, using ontologies is part of good practice
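To make the role of ontology terms in metadata concrete, the following Python sketch annotates free-text labels with term identifiers from a controlled vocabulary and reports the labels that have no matching term. The vocabulary is hypothetical and its example.org identifiers are placeholders, not terms from a real ontology.

```python
# Hypothetical controlled vocabulary: normalised label -> term identifier.
# The example.org IRIs are placeholders, not terms from a real ontology.
VOCABULARY = {
    "homo sapiens": "https://example.org/taxon/0001",
    "mus musculus": "https://example.org/taxon/0002",
}


def annotate(labels, vocabulary):
    """Map each label to a term identifier; collect labels with no matching term."""
    annotated, unmapped = {}, []
    for label in labels:
        key = " ".join(label.lower().split())  # normalise case and whitespace
        if key in vocabulary:
            annotated[label] = vocabulary[key]
        else:
            unmapped.append(label)
    return annotated, unmapped


annotated, unmapped = annotate(["Homo sapiens", "mus  musculus", "Danio rerio"], VOCABULARY)
print(annotated)
print(unmapped)
```

The list of unmapped labels is a natural starting point for discussing what to do when an ontology lacks a term, for example searching another vocabulary or contacting the ontology's maintainers.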

References

Use cases:

Take-home tasks

  • Analyse the existing standards (general and/or by discipline) required by the FAIR principles

  • Study/Analyse the standards that apply in a particular discipline

  • Standardise a dataset: choose a discipline, create or download a dataset and standardise it according to the standards of the relevant scientific community

  • Activities related to data standardisation tools:


(19) A good example of standardising date and time stamps: Data Tree, module 2, topic 4, Data Handling and Formats: Practicalities: Presentation: Data Handling and Formats (datatree.org.uk)

(20) Global Taxonomy Initiative (cbd.int)

