Lesson plan 14: FAIR software/citable code

FAIR elements: All (for details on how the FAIR principles can be applied to research software, see table 1 of Lamprecht, Anna-Lena et al. 2020).

Primary audience(s): Master's and PhD degree students

Learning outcomes

  • Is able to explain how research software differs from other types of software

  • Can understand the modified FAIR principles for software (FAIR4RS)

  • Understands accepted best practices on the basis of FAIR4RS

  • Can apply the principles of software citation

Summary of tasks/actions

  1. Define research software:

    1. Give a definition of research software

    2. Give examples of research software and counterexamples (e.g. word processing software); be sure to include a breadth of examples, including scripts and workflows

    3. Identify similarities and differences between research data and software with regard to application of the FAIR principles

    4. Identify similarities and differences between FAIR software and Free and/or Open Source Software (FOSS)

  2. Explore how the FAIR principles can be applied to software (Chue Hong et al. 2021) (28), in each case providing a concrete example of how to carry out the principle

    1. Findable – F: Software, and its associated metadata, is easy to find for both humans and machines.

      • F1. Software is assigned a globally unique and persistent identifier.

        • F1.1. Different components of the software representing different levels of granularity are assigned distinct identifiers.

        • F1.2. Different versions of the software are assigned distinct identifiers.

      • F2. Software is described with rich metadata.

      • F3. Metadata clearly and explicitly include the identifier of the software they describe.

      • F4. Metadata are FAIR, and are searchable and indexable.

    2. Accessible – A: Software, and its metadata, is retrievable via standardized protocols.

      • A1. Software is retrievable by its identifier using a standardized communications protocol.

        • A1.1. The protocol is open, free, and universally implementable.

        • A1.2. The protocol allows for an authentication and authorization procedure, where necessary.

      • A2. Metadata are accessible, even when the software is no longer available.

    3. Interoperable – I: Software interoperates with other software by exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs), described through standards.

      • I1. Software reads, writes and exchanges data in a way that meets domain-relevant community standards.

      • I2. Software includes qualified references to other objects.

    4. Reusable – R: Software is both usable (can be executed) and reusable (can be understood, modified, built upon, or incorporated into other software).

      • R1. Software is described with a plurality of accurate and relevant attributes.

        • R1.1. Software is given a clear and accessible licence.

        • R1.2. Software is associated with detailed provenance.

      • R2. Software includes qualified references to other software.

      • R3. Software meets domain-relevant community standards.

  3. (Advanced) Explore how software quality goes beyond the FAIR data principles

    1. Quality of the form vs. quality of the function of a research software

    2. Test for code maintainability

    3. Validation of the functional correctness

    4. Security measures

    5. Computational efficiency

  4. Recognise software citation as key to recognising research software as a first-class research output

    1. Software citation principles

    2. Ways to improve citability of own software, e.g. citation file format: CITATION.cff
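As an illustration of task 4.2, the following is a minimal sketch of how a CITATION.cff file could be generated. It assumes the Citation File Format 1.2.0 keys (cff-version, message, title, authors, and the optional type, version, doi and date-released). The function name render_citation_cff, the software title, the author and the DOI are hypothetical placeholders, not real records. The sketch also assumes that the values contain no double quotes, since it does no YAML escaping.

```python
from datetime import date


def render_citation_cff(title, authors, version, doi, released):
    """Return the text of a minimal CITATION.cff file (CFF 1.2.0 keys).

    `authors` is a list of (family_name, given_name) tuples.
    Assumption: values contain no double quotes (no YAML escaping is done).
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        "type: software",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    lines.append(f'version: "{version}"')
    lines.append(f'doi: "{doi}"')
    lines.append(f"date-released: {released.isoformat()}")
    return "\n".join(lines) + "\n"


# Hypothetical example values; replace with the details of your own software.
cff_text = render_citation_cff(
    title="Example Analysis Toolkit",
    authors=[("Doe", "Jane")],
    version="1.0.0",
    doi="10.5281/zenodo.0000000",  # placeholder DOI, not a real record
    released=date(2022, 1, 31),
)
print(cff_text)
```

Saving the resulting text as CITATION.cff in the root of a repository is the usual way to make the citation information available alongside the code.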

References

Definition of research software

Best practices

FAIR for Research Software working group

Software citation

Further resources


Lesson plan 14: Additional material on software citation

It is appropriate to consider software in the context of FAIR due to the close relationship between data and software. Citing software is key to recognising it as a first-class research object in the same way data are. The FAIR4RS Working Group is at present adapting the FAIR principles to research software (29). Providing mechanisms to cite software effectively is still very much a work in progress and has proved to be a complex problem (D.S. Katz et al., arXiv 1905.08674 [cs.CY]). Nevertheless, significant progress has been made over the last five years. The FORCE11 Software Citation Implementation Working Group has developed checklists for (paper) authors and (software) developers, best practices for software repositories and registries (arXiv 2012.13117 [cs.DL]), and guidance for journals (D.S. Katz et al. F1000Research 9:1257, 2021. https://doi.org/10.12688/f1000research.26932.2). The CodeMeta project is developing a minimal metadata schema for science software and code in JSON and XML.

JATS4R (JATS for Reuse), a working group devoted to optimising reusability of scholarly content by developing best-practice recommendations for tagging content in JATS XML, aims to support the various ways in which people can cite software.

Authors are exploring different ways to make their content, source materials, and methodology accessible to readers, and throughout this recommendation, we try to indicate where software citation initiatives are promoting change and development.

The following are the minimum required elements of a software citation, followed by the desirable ones:

Required:

  • Creator(s): the authors or project that developed the software

  • Title: the name of the software

  • Publication venue: the publication venue of the software, ideally an archive or repository that provides persistent identifiers

  • Date: the date on which the software was published

  • Identifier: a resolvable pointer to the software, ideally a PID that resolves to a landing page containing descriptive metadata about the software. This is similar to how a Digital Object Identifier (DOI) for a paper points to a page about the paper rather than directly to a representation of the paper, such as a PDF file. DOIs are preferable; other examples of PIDs include Handles, RRIDs, ASCL IDs, swMath IDs, Software Heritage IDs, and ARKs. If there is no PID for the software, a URL to where the software exists may be the best identifier available

Desirable:

  • Version: the identifier for the version of the software being referenced. If the version is unidentified or unknown, the date of access should be used

  • Type: some citation styles, e.g. APA, require the inclusion of a bracketed description of the citation, e.g. computer software.
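The required and desirable elements above can be sketched as a small data structure with a formatter. This is an illustrative sketch only: the class SoftwareCitation, the function format_citation and the output layout are assumptions made for this example and do not reproduce any particular citation style. The sample values, including the DOI, are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SoftwareCitation:
    # Required elements
    creators: List[str]
    title: str
    venue: str
    date: str
    identifier: str
    # Desirable elements
    version: Optional[str] = None
    accessed: Optional[str] = None   # fallback when the version is unknown
    type_label: Optional[str] = None  # e.g. "Computer software" in APA-like styles


def format_citation(c: SoftwareCitation) -> str:
    """Build a citation string, refusing to proceed if a required element is empty."""
    required = ("creators", "title", "venue", "date", "identifier")
    missing = [name for name in required if not getattr(c, name)]
    if missing:
        raise ValueError("missing required element(s): " + ", ".join(missing))
    parts = [f"{', '.join(c.creators)} ({c.date}).", c.title]
    if c.version:
        parts.append(f"(Version {c.version})")
    elif c.accessed:
        # No version identifier available: use the date of access instead.
        parts.append(f"(accessed {c.accessed})")
    if c.type_label:
        parts.append(f"[{c.type_label}]")
    text = " ".join(parts) + "."
    return f"{text} {c.venue}. {c.identifier}"


# Hypothetical example values.
example = SoftwareCitation(
    creators=["Doe J", "Roe R"],
    title="Example Toolkit",
    venue="Zenodo",
    date="2021",
    identifier="https://doi.org/10.5281/zenodo.0000000",  # placeholder DOI
    version="1.2.0",
    type_label="Computer software",
)
print(format_citation(example))
```

Raising an error for a missing required element, while silently skipping absent desirable ones, mirrors the distinction between the two lists.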

Recommendation

Minimum requirements for a software reference

  1. <mixed-citation> @publication-type="software". Software citations MUST use a value of "software" for the @publication-type attribute. [[Warning when @publication-type is "Software", "SOFTWARE", "softwares" or "software" with anything else in the value]]

    Note: This maps to the DataCite resourceTypeGeneral attribute "Software". JATS4R policy is to use lowercase for attribute values, which in turn requires a crosswalk mapping of "software" to "Software".

  2. <pub-id>. If there is a well-defined identifier for the software, this element should be used, for example a DOI, accession number, or SWHID. As per existing JATS4R recommendations on data citations, this element should hold the repository ID for the software in the element content and, if applicable, the full URL to the software in the @xlink:href attribute.

    Note: GitHub, Bitbucket and GitLab are not considered reliable authorities for providing IDs, so a GitHub git commit ID is not considered a <pub-id>.

  3. @pub-id-type on <pub-id>. In contrast to what is stated in the Tag Library ("Type of publication identifier or the organisation or system that defined the identifier"), this attribute should only be used to state the type of identifier, and not to specify the organisation or system that defined the identifier (for example, doi, SWHID, accession).

  4. @assigning-authority on <pub-id>. When the given type of identifier can be assigned by more than one organisation, e.g. accession numbers (biomodels.db, Docker Hub), and the organisation registering the identifier is known, you should include the @assigning-authority attribute on the <pub-id> element.

    Note: DOIs do not require an assigning-authority because, although there are different DOI registrants, the DOI organisation is a central resolver service.
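The warning rule in recommendation 1 can be sketched as a simple check. This is a minimal sketch, not a JATS4R validator: the function names check_publication_type and audit_references, the three result labels and the sample reference list are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def check_publication_type(value: str) -> str:
    """Classify a @publication-type value against the rule for software citations.

    Returns "ok" for exactly "software", "warning" for near misses such as
    "Software", "SOFTWARE", "softwares" or "software" with anything else in
    the value, and "other" for values unrelated to software.
    """
    if value == "software":
        return "ok"
    if "software" in value.lower():
        return "warning"
    return "other"


def audit_references(xml_text: str):
    """Return (value, result) pairs for every citation element in the XML."""
    root = ET.fromstring(xml_text)
    findings = []
    for element in root.iter():
        if element.tag in ("element-citation", "mixed-citation"):
            value = element.get("publication-type", "")
            findings.append((value, check_publication_type(value)))
    return findings


# Hypothetical reference list used only to exercise the check.
sample = (
    "<ref-list>"
    '<ref id="bib1"><element-citation publication-type="Software"/></ref>'
    '<ref id="bib2"><element-citation publication-type="software"/></ref>'
    '<ref id="bib3"><mixed-citation publication-type="data"/></ref>'
    "</ref-list>"
)
findings = audit_references(sample)
print(findings)
```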

Context

Elements: <element-citation>, <mixed-citation>, <person-group>, <name> / <string-name> / <collab>, <article-title>, <version>, <pub-id>, <ext-link>, <date-in-citation>, <publisher-name>, <source>

Attributes:

@publication-type: Type of Referenced Publication (for example, "book", "letter", "review", "journal", "patent", "report", "standard", "data", "working-paper"),

@person-group-type: Role of the persons being named in <person-group> element (for example, author, editor, curator),

@designator: Used on such elements as edition number (<edition>) and version (<version>) to hold an unadorned numerical or alphabetical value of the edition or version number for machine search, when the number is a phrase or textual value,

@pub-id-type: Type of publication identifier, such as a DOI or a publisher's identifier,

@assigning-authority: Names the authority that assigned or administers an identifier used in this document, for example, Crossref, GenBank, or PDB.

Examples

  1. Example of an accession with an assigning-authority pair, so that a renderer can create the link. This is the preferred option, although many renderers will not create the link:

    <ref id="bib2">
      <element-citation publication-type="software">
        <source>BioModels</source>
        <pub-id assigning-authority="EBI" pub-id-type="accession" xlink:href="https://identifiers.org/biomodels.db:BIOMD0000000156">BIOMD0000000156</pub-id>
      </element-citation>
    </ref>

  2. Example of an accession with an assigning-authority pair and a URL as well (for use if there is a concern that renderers will not generate the link):

    <ref id="bib2">
      <element-citation publication-type="software">
        <pub-id assigning-authority="biomodels.db" xlink:href="https://www.ebi.ac.uk/biomodels/BIOMD0000000156">BIOMD0000000156</pub-id>
      </element-citation>
    </ref>

  3. Example of an identifier as a URL link only (least preferred), using GitHub:

    <ref id="bib2">
      <element-citation publication-type="software">
        <person-group person-group-type="author"></person-group>
        <ext-link ext-link-type="uri" xlink:href="https://github.com/JATS4R/jats-validator-docker">https://github.com/JATS4R/jats-validator-docker</ext-link>
      </element-citation>
    </ref>
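A reference like the first example above could also be produced programmatically, which avoids quoting and attribute-name mistakes. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name build_software_ref and its parameters are assumptions made for this example, and the output is not validated against the JATS DTD.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
# Make the serializer emit the conventional "xlink" prefix.
ET.register_namespace("xlink", XLINK_NS)


def build_software_ref(ref_id, source, pub_id, pub_id_type,
                       authority=None, href=None):
    """Build a <ref> holding a software <element-citation> with a <pub-id>."""
    ref = ET.Element("ref", {"id": ref_id})
    citation = ET.SubElement(
        ref, "element-citation", {"publication-type": "software"}
    )
    ET.SubElement(citation, "source").text = source
    attrs = {"pub-id-type": pub_id_type}
    if authority:
        # Only needed when several organisations can assign this identifier type.
        attrs["assigning-authority"] = authority
    if href:
        attrs[f"{{{XLINK_NS}}}href"] = href
    ET.SubElement(citation, "pub-id", attrs).text = pub_id
    return ref


ref = build_software_ref(
    ref_id="bib2",
    source="BioModels",
    pub_id="BIOMD0000000156",
    pub_id_type="accession",
    authority="EBI",
    href="https://identifiers.org/biomodels.db:BIOMD0000000156",
)
xml_text = ET.tostring(ref, encoding="unicode")
print(xml_text)
```

The serializer declares the xlink namespace on the elements it writes; in a full JATS article that declaration would normally already be present on the root element.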

Additional reading

Software Metadata Recommended Format Guide (SMRF)

Katz DS, Chue Hong NP, Clark T et al. Recognizing the value of software: a software citation guide [version 2; peer review: 2 approved]. F1000Research 2021, 9:1257 (https://doi.org/10.12688/f1000research.26932.2).

Guidance for:

Note on authorship

We recognise that author names are often missing from GitHub READMEs, where only user names and handles are available. Likewise, contributors to code repositories vary over time, and the authors of software may differ from the authors of a research paper associated with the code. This recommendation offers no guidance on how to manage policy decisions associated with these issues. However, it deals with the lack of actual names by allowing user names and handles to be used in author tags.


(28) Draft published in June 2021 by the FAIR4RS RDA working group (Chue Hong et al. 2021): http://doi.org/10.15000/a789457, reserved DOI for revised version currently in press: https://doi.org/10.15497/RDA00068

(29) Revised version in press, reserved DOI: https://doi.org/10.15497/RDA00068

