Top 10 FAIR Data & Software Things

Biomedical Data Producers, Stewards, and Funders


Sprinters:

Lisa Federer (National Library of Medicine), Douglas Joubert (National Institutes of Health Library), Allissa Dillman (National Center for Biotechnology Information), Kenneth Wilkins (National Institute of Diabetes and Digestive and Kidney Diseases), Ishwar Chandramouliswaran (National Institute of Allergy and Infectious Diseases), Vivek Navale (NIH Center for Information Technology), Susan Wright (National Institute on Drug Abuse)

Audience:

Biomedical Data Producers, Stewards, and Funders

Things

Thing 1: Metadata creation and curation

Beginner activity:

  1. Learn about the various types of metadata. DataONE defines metadata as “documentation about the data that describes the content, quality, condition, and other characteristics of a dataset. More importantly, metadata allows data to be discovered, accessed, and reused” - DataONE Education Module.
  2. Work through the DataONE Education Module: Lesson 7 - Metadata.
  3. Explore the use of controlled vocabularies and Common Data Elements (CDEs). A CDE is a “data element that is common to multiple data sets across different studies.” The NIH Common Data Element (CDE) Resource Portal has identified CDEs for use in particular types of research or research domains after a formal evaluation and selection process. A brief illustration of a CDE-style record follows this list.
    • Take the NIH CDE interactive tour to learn how to use the site.
    • Browse the CDEs to explore how these might be used in your discipline.
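To make the idea of a common data element concrete, here is a small sketch (in Python) of a record built from CDE-style fields with controlled values. The element names and permissible values below are hypothetical and are not drawn from the NIH CDE Repository.

```python
# Illustrative only: these field names and permissible values are hypothetical,
# meant to show the shape of a record built from common data elements.
PERMISSIBLE_SMOKING_STATUS = {"Never smoker", "Former smoker", "Current smoker"}

participant_record = {
    "StudyID": "STUDY-001",
    "AgeAtEnrollmentYears": 54,
    "SmokingStatus": "Never smoker",  # a controlled vocabulary term, not free text
}

# A shared element definition lets every study validate values the same way.
assert participant_record["SmokingStatus"] in PERMISSIBLE_SMOKING_STATUS
```

Because every study that adopts the same element records the value the same way, datasets can be pooled or compared without manual re-mapping.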

Intermediate activity:

  1. Think about ways you can standardize minimal/core metadata for use across disciplines (for example, by creating a crosswalk between standards).
  2. Automated metadata creation can “help improve efficiency in time and resource management within preservation systems, and alleviate the problems associated with the ‘metadata bottleneck.’” A minimal sketch of automated metadata capture follows this list.
  3. Review the Digital Curation Centre (DCC) Automated Metadata Generation primer page.
  4. Download the DCC Digital Curation Reference Manual and think about the ways you might be able to automate metadata creation at your organization.
  5. Watch the ALCTS Session 1: Automating Descriptive Metadata Creation: Tools and Workflows webinar, which examines workflows for automating the creation of descriptive metadata.
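As a starting point for item 2, here is a minimal sketch of automated metadata capture using only the Python standard library. The metadata field names are illustrative rather than a formal standard, and the CSV file is a placeholder created by the script itself.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def extract_basic_metadata(path):
    """Capture technical metadata that can be generated automatically from a file."""
    p = Path(path)
    return {
        "fileName": p.name,
        "fileSizeBytes": p.stat().st_size,
        "checksumSHA256": hashlib.sha256(p.read_bytes()).hexdigest(),
        "dateGenerated": datetime.now(timezone.utc).isoformat(),
        "format": p.suffix.lstrip("."),
    }

# Create a small placeholder file so the example is self-contained.
Path("study_measurements.csv").write_text("subject_id,value\nS001,42\n")
print(json.dumps(extract_basic_metadata("study_measurements.csv"), indent=2))
```

Fields like these can be harvested in bulk and then merged with manually curated descriptive metadata, which is the core idea behind relieving the “metadata bottleneck.”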

Thing 2: Use of standard data models

  1. Explore the OMOP Common Data Model (CDM), which allows for the systematic analysis of disparate observational databases.
  2. Review one of the OMOP Community Meeting presentations and think about how this might align with the work of your organization.
  3. Familiarize yourself with one of the Observational Health Data Sciences and Informatics GitHub repositories.
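To see why a common data model matters in practice, here is a hedged sketch in Python that uses an in-memory SQLite database as a stand-in for a real OMOP CDM instance. The two tables are simplified versions of the standard person and condition_occurrence tables, and the concept IDs are examples only.

```python
import sqlite3

# In-memory stand-in for an OMOP CDM database; a real analysis would connect to
# an existing CDM instance rather than creating simplified tables by hand.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER, gender_concept_id INTEGER, year_of_birth INTEGER);
CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
INSERT INTO person VALUES (1, 8532, 1970), (2, 8507, 1964);
INSERT INTO condition_occurrence VALUES (1, 201820), (2, 201820);
""")

# Because every site that adopts the CDM shares these table and column names,
# the same query can run unchanged against any conformant database.
query = """
SELECT p.gender_concept_id, COUNT(DISTINCT co.person_id) AS n_patients
FROM condition_occurrence AS co
JOIN person AS p ON p.person_id = co.person_id
WHERE co.condition_concept_id = 201820  -- example condition concept ID
GROUP BY p.gender_concept_id;
"""
for gender_concept_id, n_patients in conn.execute(query):
    print(gender_concept_id, n_patients)
```

Writing analyses against the shared model, rather than against each source database's local schema, is what enables the systematic analysis of disparate observational databases.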

Thing 3: Exploring unique, persistent identifiers

Beginner activity:

Globally unique and persistent identifiers remove ambiguity in the meaning of your published data by assigning a unique identifier to every element of metadata and every concept/measurement in your dataset (GO FAIR).

  1. Explore the GO FAIR F1 webpage to see examples of globally unique and persistent identifiers.
  2. Learn how a Digital Object Identifier (DOI) can be used to create a unique reference to your data. Watch a video that explains what DOIs are, how they work, and how they benefit managers of digital content.
  3. Read the Digital Preservation Handbook to learn about all of the elements that comprise a persistent identifier.
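To see a persistent identifier in action, the sketch below resolves a DOI through doi.org using content negotiation and prints the machine-readable citation metadata that comes back. The requests library is a third-party dependency, and the example DOI points to the FAIR Guiding Principles article.

```python
import requests  # third-party dependency, assumed to be installed

# Any DOI resolves through https://doi.org/<doi>; asking for CSL JSON via
# content negotiation returns citation metadata instead of the landing page.
doi = "10.1038/sdata.2016.18"  # the FAIR Guiding Principles article
response = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
    timeout=30,
)
response.raise_for_status()

metadata = response.json()
print(metadata.get("title"))
print(metadata.get("DOI"), metadata.get("publisher"))
```

Because the DOI is persistent, this reference keeps working even if the publisher later moves the landing page.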

Intermediate activity:

ORCID provides persistent digital identifiers that distinguish individual researchers and link them to their research outputs.

  1. Create an ORCID iD.
  2. Link your ORCID with CrossRef and DataCite.
  3. Then, go through the steps in the Getting Started with ORCID Integration guide.
  4. Test the ORCID Application Programming Interface (API); a minimal sketch using the public API follows this list.
  5. As a best practice, use ORCIDs from the start of data creation. For example, you can attach the data creator's name/ORCID to a dataset as a metadata field. Include ORCIDs with datasets in repositories (e.g., in the Sequence Read Archive (SRA), include the ORCID of the data creator). This allows your research to be tracked and enables citation of your data.
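A minimal sketch of item 4, assuming the requests library is installed: it reads a public ORCID record through the public API. The ORCID iD shown is ORCID's long-standing example record, and the field paths reflect the v3.0 JSON response format.

```python
import requests  # third-party dependency, assumed to be installed

orcid_id = "0000-0002-1825-0097"  # ORCID's documented example iD (Josiah Carberry)
response = requests.get(
    f"https://pub.orcid.org/v3.0/{orcid_id}/record",
    headers={"Accept": "application/json"},
    timeout=30,
)
response.raise_for_status()

record = response.json()
name = record["person"]["name"]
print(name["given-names"]["value"], name["family-name"]["value"])
```

The same identifier can then be embedded in dataset metadata (item 5), so that a person can be matched to their data unambiguously, regardless of name changes or name collisions.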

Thing 4: Versioning and data “retirement”

Beginner activity:

A source-code repository is a file archive and web hosting facility where source code for software, web pages, and other resources is kept, either publicly or privately; the same version-control approach also works for data and documentation. Advantages of versioning include:

  1. Persistence of identifiers pointing to different/earlier versions.
  2. Maintaining previous versions of code, software, and data.
  3. Sharing various levels of processed data (primary, secondary, raw/clean/processed, etc.).
  4. De-accessioning of data that has reached the end of its life cycle.
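Touching on the first two advantages above, here is a hedged sketch (Python standard library only) that links a version label to a checksum of a data file in a simple manifest. The file names and version label are placeholders; a real project would more often rely on a version-control system or a repository that assigns versioned identifiers.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def record_version(data_file, version, manifest="versions.json"):
    """Append an entry linking a version label to a checksum of the data file."""
    entry = {
        "version": version,
        "file": data_file,
        "sha256": hashlib.sha256(Path(data_file).read_bytes()).hexdigest(),
        "recorded": datetime.now(timezone.utc).isoformat(),
    }
    path = Path(manifest)
    entries = json.loads(path.read_text()) if path.exists() else []
    entries.append(entry)
    path.write_text(json.dumps(entries, indent=2))
    return entry

# Placeholder data file so the example is self-contained.
Path("cohort_data.csv").write_text("subject_id,value\nS001,42\n")
print(record_version("cohort_data.csv", "v1.1.0"))
```

The checksum makes it possible to verify, later, exactly which bytes a given version label referred to.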

Intermediate activity:

  1. GitHub is one of the most popular options for code hosting. Explore alternative options for code hosting, such as GitLab or Bitbucket.
  2. Work through the Library Carpentry Introduction to GitHub module.

Thing 5: Linking research objects

Beginner activity:

  1. Read the following article on managing digital research objects.
  2. Read the CrossRef page on linking data.

Intermediate activity:

  1. Using a code repository (e.g., a GitHub repository) or Zenodo, try to find the data that goes with a published paper. Then answer some of the following questions:
    • Where is the data or code stored (for example, in a GitHub repository or on Zenodo)?
    • Who created the objects (ORCID)?
    • Is there proper documentation? Is license information (e.g., regarding commercial use) provided?
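One way to work through this exercise programmatically is to query the Zenodo REST API and inspect a record's metadata for its creators, license, and related identifiers. The search string below is a placeholder to replace with the paper or dataset you are investigating, requests is assumed to be installed, and the field names reflect the Zenodo response format.

```python
import requests  # third-party dependency, assumed to be installed

# Search Zenodo for records; replace the query with a paper title or DOI.
response = requests.get(
    "https://zenodo.org/api/records",
    params={"q": "FAIR data", "size": 1},
    timeout=30,
)
response.raise_for_status()

for hit in response.json()["hits"]["hits"]:
    meta = hit["metadata"]
    print("Title:", meta.get("title"))
    print("Creators:", [c.get("orcid", "no ORCID listed") for c in meta.get("creators", [])])
    print("License:", meta.get("license"))
    print("Related identifiers:", meta.get("related_identifiers", []))
```

Records whose metadata already answers these questions (creator ORCIDs, a license, and related identifiers pointing to the paper and code) are the ones that are easiest to reuse.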

Thing 6: Human and machine readability

  1. Read about the FAIR principles for making your code and data both human and machine readable, starting with the FAIR Guiding Principles article.
  2. Read the report “Jointly designing a data FAIRPORT” from the Lorentz Center.
  3. Having code and data that are both human and machine readable supports:
    • API access
    • Automatic integration of multiple datasets
    • The use of standard formats widely accepted in the discipline
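As a small illustration of the last point, a dataset stored in a standard, self-describing format can be read by a person in any text editor and parsed by a program without custom code. The file and column names below are hypothetical, and the example uses only the Python standard library.

```python
import csv
from pathlib import Path

# A plain CSV with a header row is human readable (open it in any text editor)
# and machine readable (standard parsers handle it without custom code).
sample = "subject_id,visit_date,systolic_bp\nS001,2024-01-15,121\nS002,2024-01-16,134\n"
Path("measurements.csv").write_text(sample)

with open("measurements.csv", newline="") as handle:
    for row in csv.DictReader(handle):
        print(row["subject_id"], row["systolic_bp"])
```

The same logic extends to richer standard formats (and to APIs that serve them), which is what allows multiple datasets to be integrated automatically.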

Thing 7: Maintaining/preserving the entire research environment (e.g., software)

  1. Familiarize yourself with best practices for scientific computing. Read Good Enough Practices in Scientific Computing and Top 10 Metrics for Life Science Software Good Practices to learn about containers, software preservation, and software emulation.
  2. Read more about the Long-term preservation of biomedical research data.
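Containers and emulation are the more complete solutions discussed in those readings, but even a lightweight environment manifest helps. The sketch below records the Python interpreter version and the exact package versions in use so that an analysis environment can later be reconstructed; the output file name is illustrative.

```python
import platform
import subprocess
import sys
from pathlib import Path

# Record the interpreter version and installed package versions so the
# computational environment can be rebuilt (or containerized) later.
packages = subprocess.run(
    [sys.executable, "-m", "pip", "freeze"],
    capture_output=True, text=True, check=True,
).stdout

manifest = f"# Python {platform.python_version()}\n{packages}"
Path("environment-manifest.txt").write_text(manifest)
print(manifest)
```

A manifest like this is also a useful starting point when writing a container recipe for full preservation of the research environment.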

Thing 8: Indexing repositories to enable findability

  1. re3data.org is a global registry of research data repositories covering repositories from different academic disciplines. Register your data repository with re3data.org.
  2. Explore Schema.org, a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data; a sketch of schema.org Dataset markup follows this list.
  3. Link your ORCID account to fairsharing.org, verify your email address, and create a public profile.
    • Familiarize yourself with their Standards, Databases, Policies, and Collections.
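Schema.org Dataset markup, embedded in a repository's landing pages, is what lets search engines and dataset indexes find and describe the data they host. Below is a hedged sketch that builds a minimal schema.org Dataset description as JSON-LD using Python; every value is a placeholder to replace with your dataset's actual details.

```python
import json

# Minimal schema.org Dataset description, serialized as JSON-LD.
# All values below are hypothetical placeholders.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example cohort blood pressure measurements",
    "description": "Longitudinal blood pressure readings from a hypothetical cohort study.",
    "identifier": "https://doi.org/10.xxxx/example",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {
        "@type": "Person",
        "name": "Jane Researcher",
        "identifier": "https://orcid.org/0000-0000-0000-0000",
    },
}

print(json.dumps(dataset, indent=2))
```

Embedding a block like this in a dataset landing page (inside a script tag of type application/ld+json) is one common way repositories expose structured metadata for indexing.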

Thing 9: Informed consent that enables reuse

Informed consent for human subjects should be broad enough to make reuse possible. See Broad Consent for Research with Biological Samples: Workshop Conclusions. Also see Recommendations for Broad Consent Guidance from the Office for Human Research Protections.

Thing 10: Application of metrics to evaluate the FAIRness of (data) repositories

Beginner activity:

  1. Explore the work of the FAIR Metrics Group, including their proposed FAIR metrics.
  2. Read the following paper: Evaluating FAIR-Compliance Through an Objective, Automated, Community-Governed Framework.
  3. Explore the design framework for exemplar metrics for FAIRness.
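In the spirit of the automated, community-governed framework described in the paper above (though far simpler than the real evaluator), here is a toy sketch that checks two things about a dataset or article identifier: that it resolves, and that machine-readable metadata can be retrieved for it. The requests library is assumed, and the test DOI is the FAIR Guiding Principles article.

```python
import requests  # third-party dependency, assumed to be installed

def toy_fair_check(doi):
    """Two toy checks: the identifier resolves, and metadata is machine-retrievable."""
    url = f"https://doi.org/{doi}"
    resolves = requests.get(url, timeout=30, allow_redirects=True).ok
    meta = requests.get(
        url,
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        timeout=30,
    )
    machine_readable = meta.ok and meta.headers.get("Content-Type", "").startswith(
        "application/vnd.citationstyles.csl+json"
    )
    return {"identifier_resolves": resolves, "machine_readable_metadata": machine_readable}

print(toy_fair_check("10.1038/sdata.2016.18"))  # FAIR Guiding Principles article
```

Real FAIR metrics cover many more dimensions (licensing, provenance, community standards), but the pattern is the same: each metric is a concrete, automatable test rather than a subjective judgment.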

Intermediate activity:

  1. Explore the Make Data Count Project, where you can learn about the COUNTER Code of Practice as well as the Code of Practice for Research Data Usage Metrics.
  2. Learn how Zenodo and DataONE have responded to the Make Data Count recommendations.