Lesson plan 13: Data access
FAIR elements
Findable
The data access category should not influence the findability of data: all data should be findable irrespective of how they can be accessed. The key requirement is that the metadata are openly accessible, so that the data can be discovered.
F2. Data are described with rich metadata (defined by R1 below)
Accessible
Irrespective of the data access category selected, the metadata should clearly describe how the data can be accessed, and the access protocol should be open, free and universally implementable. If data access is restricted, an authentication and authorisation procedure can be used.
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
A1.1. The protocol is open, free, and universally implementable
A1.2. The protocol allows for an authentication and authorisation procedure, where necessary
A2. Metadata are accessible, even when the data are no longer available
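As an illustration of A1, the sketch below builds an HTTPS request that asks the DOI resolver for machine-readable metadata through content negotiation (DOIs registered with DataCite or Crossref accept an `Accept: application/vnd.citationstyles.csl+json` header). The function names are hypothetical, and restricted data would additionally need the repository's own authentication step (A1.2), which is not shown here.

```python
import urllib.request

# The DOI resolver; HTTPS is an open, free and universally implementable protocol.
DOI_RESOLVER = "https://doi.org/"
# CSL JSON is one of the metadata formats returned via content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Build a GET request that retrieves metadata for a DOI."""
    # Accept bare DOIs as well as doi.org URLs or 'doi:' prefixes.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return urllib.request.Request(DOI_RESOLVER + doi, headers={"Accept": CSL_JSON})


def fetch_metadata(doi: str) -> bytes:
    """Send the request (needs network access and a registered DOI)."""
    with urllib.request.urlopen(build_metadata_request(doi), timeout=10) as resp:
        return resp.read()
```

With network access, `fetch_metadata("10.5281/zenodo.3820472")` (the CESSDA guide listed in the references) should return that record's metadata as CSL JSON, whatever the access level of the underlying files.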
Interoperable
Open data are easier to use as linked data in an interoperable way, especially if they are available through an API. However, interoperability may also require key identifiers to link separate datasets. If these identifiers can identify individual people (e.g. the point coordinates of a house or a person's social security number), access restrictions will be needed if such data are to be linked.
I3. (Meta)data include qualified references to other (meta)data
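A minimal sketch of linking on a key identifier: two hypothetical datasets share a `household_id` field and are joined on it. The field names and the set of identifying fields are assumptions for illustration; the point is that once a linked record carries a field that can identify a person, such as house coordinates, the linked data need restricted access.

```python
from typing import Dict, List, Set

# Hypothetical: fields treated as able to identify a person in this example.
IDENTIFYING_FIELDS: Set[str] = {"name", "national_id", "house_lat", "house_lon"}


def link_on_key(left: List[Dict], right: List[Dict], key: str) -> List[Dict]:
    """Inner-join two lists of records on a shared identifier.

    Assumes the key is unique within `right`.
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def needs_restricted_access(records: List[Dict]) -> bool:
    """True if any linked record contains a field that can identify a person."""
    return any(not IDENTIFYING_FIELDS.isdisjoint(row) for row in records)
```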
Primary audience(s): Bachelor's, master's and PhD students
Learning outcomes:
Can state general requirements on data protection and access control
Understands the different access options that exist for data/digital resources
Understands the criteria that influence/define access conditions
Can apply strategies to decide which access level is suitable for their data
Can implement (alternative) research practices to achieve more open data
Recognises why access matters for making data FAIR (all four letters)
Summary of tasks/actions:
Introduce your audience to the different access options that exist. Research data can be made available in data centres, data repositories, via an API, or on the web, with a range of access options. While open access to data may be ideal, there can be genuine reasons why it is not possible. Data access categories (24, 25) can be:
Open access
Restricted access
Embargo
Closed access
Open data can be defined as 'data that can be freely used, re-used and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike' (26). Restricted access can require a contractual use agreement or a data sharing agreement to be signed. Embargo means that access is closed temporarily. Closed access means that data are not accessible, except possibly to regulators.
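One way to record these four categories in machine-readable metadata is the OpenAIRE access-rights vocabulary (`info:eu-repo/semantics/openAccess`, `restrictedAccess`, `embargoedAccess`, `closedAccess`), which some repositories use. The sketch below maps the categories to those terms and checks whether embargoed data have become openly available; the `Access` enum and `openly_available` function are illustrative assumptions, not a repository API.

```python
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    # Values follow the info:eu-repo access-rights terms used by OpenAIRE.
    OPEN = "info:eu-repo/semantics/openAccess"
    RESTRICTED = "info:eu-repo/semantics/restrictedAccess"
    EMBARGO = "info:eu-repo/semantics/embargoedAccess"
    CLOSED = "info:eu-repo/semantics/closedAccess"


def openly_available(access: Access, embargo_end: Optional[date], today: date) -> bool:
    """Return True if anyone can download the data on the given day."""
    if access is Access.OPEN:
        return True
    if access is Access.EMBARGO:
        # An embargo closes access only temporarily, until its end date.
        if embargo_end is None:
            raise ValueError("embargoed data need an embargo end date")
        return today >= embargo_end
    # Restricted data need an agreement or authorisation; closed data stay closed.
    return False
```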
Explain the criteria that can influence access decisions (27):
Presence of personal information in the dataset which can be used to identify an individual
Sensitivity of information, where the release of the data can adversely affect:
a person, e.g. information on political views, criminal activities;
biodiversity, e.g. the location of rare and endangered species;
a community, e.g. information relating to terrorism; and/or
commercial interests of a company.
Intellectual property, where early release of the data can adversely affect patents or valorisation routes
Confidentiality agreement, where access to and sharing of data is restricted to the contracting parties.
Show how a suitable access level can be decided, for example using a decision tree such as the WUR Data Sharing guidelines.
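To make the idea of a decision tree concrete, here is a hypothetical sketch that turns the criteria listed above into yes/no questions and returns one of the four access categories. The question order and the outcomes are assumptions for teaching purposes; they are not taken from the WUR guidelines or any other published tree.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    # Hypothetical yes/no answers based on the access criteria.
    confidentiality_agreement: bool = False
    ip_release_pending: bool = False   # e.g. a patent not yet filed
    personal_data: bool = False
    anonymised: bool = False
    consent_to_share: bool = False
    sensitive: bool = False            # e.g. locations of endangered species


def suggest_access_level(p: DatasetProfile) -> str:
    """Walk a simple, illustrative decision tree and return an access level."""
    if p.confidentiality_agreement:
        # Sharing is limited to the contracting parties.
        return "closed"
    if p.ip_release_pending:
        # Delay release until patents or valorisation routes are secured.
        return "embargo"
    if p.personal_data and not (p.anonymised or p.consent_to_share):
        return "restricted"
    if p.sensitive:
        return "restricted"
    return "open"
```

The order of the questions matters: contractual obligations are checked first because they override the other criteria. A real institutional decision tree asks more questions and may reach different outcomes.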
Explain that alternative research practices, or adaptations to research practices, could be used to enable more open data. Examples include the following:
Capture data in an anonymous way
Anonymise information in a dataset so individuals (people, animals, etc.) cannot be identified from the information they have contributed during the research
Gain permission from people to make data open, even if the data contain personal or sensitive information (informed consent)
Use citizen science and participatory research methods to co-create data that are then co-owned and can be released as open data
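A simplified sketch of anonymising a single record: remove direct identifiers and generalise quasi-identifiers such as age and location. The field names, the 10-year age bands and rounding coordinates to two decimal places (roughly 1 km in latitude) are assumptions for illustration. Removing obvious identifiers does not by itself guarantee that individuals cannot be re-identified, so real datasets still need a proper re-identification risk assessment.

```python
from typing import Any, Dict

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with a band, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymise(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with identifiers removed and quasi-identifiers coarsened."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        out["age"] = age_band(out["age"])
    for coord in ("lat", "lon"):
        if coord in out:
            # Two decimal places is roughly 1 km in latitude.
            out[coord] = round(out[coord], 2)
    return out
```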
Materials/Equipment
Computer/laptop
Internet/browser
References
Research Data Bootcamp (Bristol) - Repositories for sensitive data: https://data.blogs.bristol.ac.uk/bootcampsd/repositories/
CESSDA Data Management Expert Guide: https://doi.org/10.5281/zenodo.3820472
Open Data Handbook: https://opendatahandbook.org
FOSTER Open Science: The Open Science Training Handbook, Zenodo (p. 18 onwards)
Take-home tasks
Do one of these exercises on data access:
Exercise: Data access and licensing (UK Data Service) (with answer)
Exercise: Licensing and Access Controls (UK Data Service) (with answer)
Data access exercise (FAIRsFAIR)
Lesson plan 13: Additional material – data availability statements
The list below provides some example data availability statements. Please note that data availability statements should be tailored to suit each publication, checking that they meet all funder and publisher requirements.
Statement type | Example statement |
Openly available data | "All data underpinning this publication are openly available from the University of FAIR-Data Repository at http://doi.org/10.15000/a789457" |
Embargoed data | "All data underpinning this publication will be available from the University of FAIR-Data Repository at http://doi.org/10.15002/a1234a56 from 01/02/2019 onwards, following the cessation of an embargo period." |
Restricted data | "Due to ethical/commercial issues, data underpinning this publication cannot be made openly available. Further information about the data and conditions for access are available from the University of FAIR-Data Repository at http://doi.org/10.15000/a1234b56" |
Partially restricted data | "Due to the sensitive nature of this research, only a subset of the participants consented to their anonymised data being retained and shared. Anonymised interview transcripts and survey results from participants who provided consent, other supporting data, and further details relating to the restricted data, are available from the University of FAIR-Data Repository at http://doi.org/10.15129/a1234b56" |
Physical data | "Physical data supporting this publication are stored by the University of FAIR-Data. Details of the data and how they can be accessed are available from the University of FAIR-Data Repository at http://doi.org/10.15129/a1234b56" |
Secondary data | "Pre-existing data underpinning this publication are openly available from UKDS at http://doi.org/10.12345/54321. Further information about data processing, and additional new supporting data are available from the University of FAIR-Data Repository at http://doi.org/10.15129/a1234b56" |
No new data created | "No new data were created during this study. Pre-existing data underpinning this publication were obtained from NPL and are subject to licence restrictions. Full details on how these data were obtained are available in the documentation available from the University of FAIR-Data Repository at http://doi.org/10.15129/a1234b56" |
No data | "This work is entirely theoretical; there are no data underpinning this publication." |
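The statements above follow a small number of patterns, so a research group could keep them as templates and fill in the details for each publication. The sketch below is a hypothetical helper covering four of the statement types; the template keys and placeholder names are assumptions, and any generated text still needs checking against funder and publisher requirements.

```python
from typing import Dict

# Templates adapted from the example statements; placeholders in braces are
# filled per publication. Keys and placeholder names are hypothetical.
TEMPLATES: Dict[str, str] = {
    "open": ("All data underpinning this publication are openly available "
             "from {repository} at {doi_url}"),
    "embargoed": ("All data underpinning this publication will be available from "
                  "{repository} at {doi_url} from {release_date} onwards, "
                  "following the cessation of an embargo period."),
    "restricted": ("Due to {reason} issues, data underpinning this publication cannot "
                   "be made openly available. Further information about the data and "
                   "conditions for access are available from {repository} at {doi_url}"),
    "no_data": ("This work is entirely theoretical; there are no data underpinning "
                "this publication."),
}


def availability_statement(kind: str, **fields: str) -> str:
    """Fill the template for one statement type.

    Raises ValueError for an unknown type and KeyError if a placeholder is missing.
    """
    if kind not in TEMPLATES:
        raise ValueError(f"unknown statement type: {kind!r}")
    return TEMPLATES[kind].format(**fields)
```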
(24) https://data.blogs.bristol.ac.uk/bootcampsd/repositories/
(26) https://opendatahandbook.org/guide/en/what-is-open-data/
(27) https://data.blogs.bristol.ac.uk/bootcampSD/what-counts/