Top 10 FAIR Data & Software Things

Australian Government Data/Collections


Sprinters:

Katie Hannan, Data Librarian (CSIRO), Richard Ferrers, Research Data Analyst (ARDC), Keith Russell, Manager Engagements (ARDC)

FAIR data

See the ARDC image summarising what FAIR means; see also the FORCE11 definition.

ARDC Figure 1; FAIR in a nutshell. Image: ARDC 2018 - CC-BY 4.0.

Description:

Governments have a mandate to make non-sensitive data open. For example, the Australian Government Public Data Policy Statement says “Australian Government entities will … make non-sensitive data open by default…make high value data available for use by the public, industry and academia… ensure non-sensitive publicly funded research data is made open for use and reuse… to extend the value of public data for the benefit of the Australian public.” Making data FAIR is one way to extend that value. The G20 group of major economies agreed to make Open Data Principles a priority at its 2015 meeting in Turkey, saying “Transparency… Global transformation, facilitated by technology, fuelled by data and information… Open data is at the center of this global shift.” (p.2).

Audience:

Government data custodians

Goal:

Help government data custodians to understand FAIR data principles

NB: Nomenclature and data:

Where “data” is used here, we also mean collections such as cultural collections, historical collections, documents, artefacts and other valuable collections.

Table of contents

  1. Thing 1 - Why is data important?
  2. Thing 2 - Open data vs FAIR data
  3. Thing 3 - Data discovery (F)
  4. Thing 4 - Describing your data (FAI)
  5. Thing 5 - Identifiers (F)
  6. Thing 6 - Licensing (R)
  7. Thing 7 - Dirty Data (R)
  8. Thing 8 - Sensitive Data (A)
  9. Thing 9 - Vocabularies (I)
  10. Thing 10 - Data Impact (R)

Things

Thing 1: Why is data important?

Read the G20, Australian and State policies on open data.

Figure 2: Data sharing drivers. Source: Katie Hannan, 2018, CC-BY.

Beginner activity:

International
G20: Open Government Forum; G20 Turkey 2015. “Transparency… Global transformation, facilitated by technology, fuelled by data and information… Open data is at the center of this global shift.” (p.2) Read and consider the G20 Open Data Principles.

Familiarise yourself with your State or Territory's data policy. See links in Appendix 1.

Australia

See Appendix 1 for a list of Australian State Open Data Policies.

Intermediate activity:

The following legislation may apply to the management of government data:

Advanced activity:

If your organisation doesn’t have a policy on open data, who are the key stakeholders that you would need to work with to prepare an open data policy?

What main headings would you need to include as part of your data policy?

Thing 2: Open data vs FAIR data

Read https://www.go-fair.org/faq/ask-question-difference-fair-data-open-data/ Can you think of examples of data you deal with that cannot be made Open but can be made FAIR? List some advantages in making this data FAIR.

Does the current wording in the policy for Open Data encourage making the data FAIR? Where do you see gaps?

See slide 14 here https://www.slideshare.net/sjDCC/open-fair-data-and-rdm

Beginner activity:

See how Geoscience Australia implement the FAIR data principles in their work. Geoscience Australia describe themselves as “the nation’s trusted advisor on the geology and geography of Australia” (GA 2018).

Advanced activity:

How FAIR is your data? - https://www.ands-nectar-rds.org.au/fair-tool We suggest using this tool now, then finishing off the modules, making some changes to a data collection, and testing again with the FAIR data tool.

Thing 3: Data discovery

International government data portals:

Thing 4: Describing your data or collection

Some reusable content here - https://ecu.au.libguides.com/10-marine-science-rdm-things/Thing6

Beginner activity:

Read a data description on data.gov.au (e.g. Arts Victoria, ABC) or Research Data Australia (e.g. National Archives of Australia, Australian Antarctic Data Centre, CSIRO (Commonwealth Scientific and Industrial Research Organisation), Geoscience Australia).

Reflection: Could you understand the description? Can you think of someone for whom this data or collection would be useful? Was it clear where to go next to access the data, or to ask for more information about this data or collection? What else would you like to know about this data/collection?

Activity: Post your questions or responses to the reflection above to the data custodian, or in the comments section at data.gov.au.

Intermediate activity:

If you are a data custodian or researcher, consider the five most important datasets that you have contributed to or that you manage. Pick the most important one to describe.

  1. Start with: Title, Author, Year, Institution, Location/URL. This is the minimum description required to get a DOI (a permanent identifier). The URL for a DOI is the home page for the dataset description. If you don’t have one, use a contact person’s details as the URL.
    • (Hint: if you get stuck with the description, copy the abstract of a paper or conference paper or annual report, which uses or references your dataset. Edit the abstract to talk only about the data.)

Q: What type of data identifier does a government data custodian have?

  2. Add richer description to your data description, e.g. subjects and grant IDs (where applicable; RDA, the Australian national data catalogue, has permanent URLs for Australian ARC and NHMRC grants). Include a significance statement explaining why the dataset is important.

  3. Ask a colleague in a related field if they can understand your description. This helps ensure the description is readable by someone who is not deeply knowledgeable in your field.
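
The minimum description in the list above (title, author, year, institution, URL) can be sketched as a small record with a completeness check. This is an illustration only: the `DatasetDescription` class, its field names and the sample values are hypothetical, not a schema required by any DOI service.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DatasetDescription:
    """Hypothetical minimal record holding the five fields named above."""
    title: str = ""
    author: str = ""
    year: str = ""
    institution: str = ""
    url: str = ""  # landing page for the description, or a contact if none exists


def missing_fields(record: DatasetDescription) -> List[str]:
    """Return the names of any fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Illustrative values only; this is not a real dataset.
draft = DatasetDescription(
    title="Example rainfall observations",
    author="A. Custodian",
    year="2018",
)
print(missing_fields(draft))  # ['institution', 'url']
```

A check like this could be run before a description is sent to a data librarian, so that incomplete records are caught early.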

Advanced activity:

Publish your data description on your resume, especially if it is online (e.g. LinkedIn). Send your data description to your data librarian, for addition to your Institutional Repository or Data Portal. Alternatively, post your description to a public cloud service, such as Zenodo, Figshare or Dryad. No data need be included. A description record is valuable in itself, as it reveals the existence of data that was previously unknown and inaccessible.

Thing 5: Identifiers

To make data findable, it has to be uniquely and persistently identified. A digital object identifier (DOI) is a unique, case-insensitive, alphanumeric character sequence and can be very helpful for this purpose. See also [ANDS Guide: Digital Object Identifiers (DOI) System for Research Data](https://www.ands.org.au/__data/assets/pdf_file/0006/715155/Digital-Object-Identifiers.pdf).

See who mints ANDS DOIs, including the NSW Office of Environment and Heritage, Bureau of Meteorology, CSIRO, Geoscience Australia and the Dept of Environment.

Types of persistent identifiers:

Videos

Watch the video Persistent identifiers and data citation explained by Research Data Netherlands - https://youtu.be/PgqtiY7oZ6k

Read about persistent identifiers at a very general level (awareness). A DOI requires five fields: author, title, year, publisher, and the URL of the DOI landing page.

Beginner activity:

Visit http://www.doi.org/ and try resolving these DOIs:

10.26179/5bf63428ea2a1

10.26186/5b76556b396c0
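
Resolving a DOI amounts to appending it to a resolver address. The following is a minimal sketch, assuming the public https://doi.org/ resolver and using the two DOIs above. The helper names are hypothetical and no network request is made. Because DOIs are case-insensitive, the sketch lower-cases them so that differently cased copies compare as equal.

```python
# Prefixes people commonly paste in front of a DOI (assumed list, not exhaustive).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi: str) -> str:
    """Strip an optional resolver or 'doi:' prefix and lower-case the DOI."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def resolver_url(doi: str) -> str:
    """Build the address a browser would visit to resolve the DOI."""
    # Note: DOIs containing unusual characters would also need URL-encoding.
    return "https://doi.org/" + normalise_doi(doi)


for doi in ("10.26179/5bf63428ea2a1", "10.26186/5b76556b396c0"):
    print(resolver_url(doi))
```

Pasting either printed address into a browser should lead to the landing page for that DOI.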

Thing 6: Licensing

See the licensing guide: what is the appropriate licence for data produced by a government agency?

Refer to the Australian Government Public Data Policy Statement: “At a minimum, Australian Government entities will publish appropriately anonymised government data by default: …under a Creative Commons By Attribution licence (i.e. CC BY licence) unless a clear case is made to the Department of the Prime Minister and Cabinet for another open licence.”

Specific CC licence elements that require approval from the Department of the Prime Minister and Cabinet (DPC) include NC (non-commercial), SA (share alike), and the very restrictive ND (no derivatives allowed), which ANDS does not recommend.
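
The rule above can be expressed as a small check. This is a sketch only, assuming licence names are written in a form such as "CC BY-NC-SA". The function names and the parsing are hypothetical and are not part of any official policy tooling.

```python
import re
from typing import Set

# Licence elements that, per the text above, need approval before use.
RESTRICTIVE_ELEMENTS = {"NC", "SA", "ND"}


def restrictive_elements(licence: str) -> Set[str]:
    """Return the NC/SA/ND elements found in a licence name such as 'CC BY-NC-SA'."""
    parts = re.split(r"[\s_\-]+", licence.upper())
    return RESTRICTIVE_ELEMENTS.intersection(parts)


def needs_approval(licence: str) -> bool:
    """True when the licence goes beyond plain attribution (CC BY)."""
    return bool(restrictive_elements(licence))


print(needs_approval("CC BY"))        # False
print(needs_approval("CC BY-NC-SA"))  # True
```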

Examples of licensing statements:

http://www.bom.gov.au/waterdata/index.shtml?selected=Copyright

Thing 7: Dirty data

Why is “clean” data important? Public policy, changes to medical protocols and economic decisions all depend on accurate and complete data. For more, see the ECU resource, which looks at the why and what of “dirty data”:

https://ecu.au.libguides.com/10-marine-science-rdm-things/Thing10
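
Two common symptoms of dirty data, incomplete rows and duplicate rows, can be flagged with a few lines of code. The following is a minimal sketch using only the Python standard library. The column names and sample values are invented for illustration, and real cleaning usually needs many more checks.

```python
import csv
import io
from typing import List, Tuple

# Invented sample: data row 2 has a missing value, data row 3 repeats data row 1.
SAMPLE = """station,date,rainfall_mm
A1,2018-01-01,12.5
A2,2018-01-01,
A1,2018-01-01,12.5
"""


def find_problems(text: str) -> Tuple[List[int], List[int]]:
    """Return (incomplete_rows, duplicate_rows) as 1-based data row numbers."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    incomplete: List[int] = []
    duplicates: List[int] = []
    seen = set()
    for number, row in enumerate(reader, start=1):
        values = tuple(cell.strip() for cell in row)
        # A row is incomplete if it is short or has an empty cell.
        if len(values) < len(header) or "" in values:
            incomplete.append(number)
        # A row is a duplicate if an identical row appeared earlier.
        if values in seen:
            duplicates.append(number)
        seen.add(values)
    return incomplete, duplicates


print(find_problems(SAMPLE))  # ([2], [3])
```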

Beginner activity:

Read this case study. The Data Retriever automates the tasks of finding, downloading, and cleaning up publicly available data, and then stores them in a variety of databases and file formats. This lets data analysts spend less time cleaning up and managing data, and more time analysing it. https://frictionlessdata.io/articles/the-data-retriever/

Advanced activity:

Thing 8: Working with sensitive data

What is sensitive data?

FAIR data doesn’t need to be published as open data. See Thing 2.

Reuse: https://www.ands.org.au/working-with-data/skills/23-research-data-things/10-medical-and-health-things/m-and-h-thing-4

Useful resource: CSIRO Data 61 The De-Identification Decision-Making Framework - https://publications.csiro.au/rpr/download?pid=csiro:EP173122&dsid=DS3

Indigenous Knowledge: Issues for protection and management - https://www.ipaustralia.gov.au/sites/g/files/net856/f/ipaust_ikdiscussionpaper_28march2018.pdf

Additional resources (from Library-Research-Support-Top-10-FAIR-Things_DRAFT)

Thing 9: Vocabularies - Assisting with interoperability

Beginner activity:

Controlled vocabularies for data description

In addition to selecting a metadata standard or schema, whenever possible you should also use a controlled vocabulary. A controlled vocabulary provides a consistent way to describe data - location, time, place name, and subject.

Controlled vocabularies significantly improve data discovery. They make data more shareable with researchers in the same discipline because everyone is ‘talking the same language’ when searching for specific data, e.g. plants, animals, medical conditions, places, etc.
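
Checking description terms against a controlled vocabulary can be automated. The following is a minimal sketch: the vocabulary is a made-up handful of subject terms, not taken from any published scheme, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

# Made-up vocabulary for illustration; a real one would come from a published scheme.
SUBJECT_VOCABULARY = {"plants", "animals", "medical conditions", "places"}


def check_terms(terms: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split terms into (accepted, rejected), ignoring case and extra spaces."""
    accepted: List[str] = []
    rejected: List[str] = []
    for term in terms:
        normalised = " ".join(term.lower().split())
        if normalised in SUBJECT_VOCABULARY:
            accepted.append(normalised)
        else:
            rejected.append(normalised)
    return accepted, rejected


print(check_terms(["Plants", "flora", " Medical  Conditions "]))
# (['plants', 'medical conditions'], ['flora'])
```

A free-text term such as "flora" is rejected here, which shows how a controlled vocabulary keeps descriptions consistent across datasets.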

  1. Start by browsing Controlling your Language: a Directory of Metadata Vocabularies from JISC in the UK. Make sure you scroll down to 5. Conclusion - it’s worth a read.

Advanced activity:

Have a browse around the stunning level of data description and data contained in the Atlas of Living Australia.

Other examples:

Data dictionaries: standardised, accepted terms and protocols used for data collection.

Thing 10: Data impact

Data reuse: reuse is hard to track when data lacks persistent identifiers and there is not much of a data citation culture.

Web stats: selected data.gov.au web analytics - https://search.data.gov.au/dataset/ds-dga-9fa9bfda-96b3-4214-8a09-497af105524b/details?q=data.gov.au

Some old uses of open data: https://data.gov.au/showcase

Use in GovHack(AU) - https://twitter.com/govhackau?lang=en

Tracking identifiers - data citation

Beginner activity:

Looking at the broader impact of how the data has been used and the benefits it has brought to society, industry, economy, etc. is a richer source of impact evidence than just looking at citations.

https://www.ands.org.au/working-with-data/articulating-the-value-of-open-data/data-engagement-and-impact

Postscript: Other topics to consider:

See for example slide 54 in this Data Readiness slideshow as well as the 24th edition of Share (cover shown below).

People in Data

References:

Appendix:

List of Australian state/territory government open data policies:

Australian Federal Government: Refer to the policy at the Dept of Prime Minister and Cabinet. See also the National Data Commissioner, “responsible for implementing a simpler data sharing and release framework”.

Victoria Data Access Policy

“The Victorian Government recognises the benefits from and encourages the availability of Victorian government data for the public good. The DataVic Access Policy has been developed to support this recognition.”

New South Wales Policy (NSW)

“The objectives of this policy are to assist NSW Government agencies to: release data for use by the community, research, business and industry; accelerate the use of data to derive new insights for better public services; embed open data into business-as-usual…”

Queensland Policy

Tasmania Policy

South Australia Policy

Western Australia Policy

Australian Capital Territory Policy

Northern Territory Policy (Darwin)