Research Data Management Support
Lena Karvovskaya, Otto Lange, Iza Witkowska, Jacques Flores (Research Data Management (RDM) support at Utrecht University)
This is an umbrella document with links to various resources. Its aim is to help researchers who want to share their data in a sustainable way. However, we consider the boundary between librarians and researchers to be blurred, because librarians ultimately support researchers who want to share their data. We primarily target researchers and support staff regardless of their experience: both those with limited technical knowledge who want a general understanding of the FAIR principles, and those who are more technically advanced and want to use more technical resources. For each of the 10 FAIR Things, the resources we link to will therefore often be at two levels of technicality.
Our primary audience consists of researchers and support staff at Utrecht University. Therefore, whenever possible, we use resources available at Utrecht University: its institutional repositories and the resources provided on the RDM Support website.
Thing 1: Why bother with FAIR?
Background: The advancement of science thrives on the timely sharing and accessibility of research data. Timely and sustainable sharing is only possible if there are infrastructures and services that enable it.
- Read up on the role of libraries in implementing the FAIR Data Principles. Think about the advantages and opportunities that digitalization makes possible in your research area, and about the challenges. Have you or your colleagues ever experienced data loss? Is falsification or fabrication of data an issue with digital data? How easy is it to figure out whether data you found online are reliable? Say you found a very useful resource online and want to refer to it in your work: can you be sure it will still be there several years later?
- For more information, see this detailed explanation of the FAIR principles developed by the Dutch Techcentre for Life Sciences (DTL).
Thing 2: Metadata
Background: Metadata are information about data. This information makes data findable, and potentially discoverable by machines. Metadata can describe the researchers responsible for the data, when, where and why the data were collected, how the research data should be cited, and so on.
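As a concrete illustration of "information about data" that a machine can work with, here is a minimal Python sketch of a metadata record covering the elements listed above (who, when, where, why, how to cite), plus a keyword search over a collection of such records. The class name, field names and example records are hypothetical and do not follow any particular metadata standard. Real repositories use schemas such as DataCite or Dublin Core.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical, simplified metadata record (not a real metadata standard)."""
    title: str
    creators: list[str]          # who is responsible for the data
    collected_when: str          # when, e.g. an ISO 8601 date range
    collected_where: str         # where
    purpose: str                 # why the data were collected
    citation: str                # how the data should be cited
    keywords: list[str] = field(default_factory=list)


def find(records, keyword):
    """Return titles of records tagged with `keyword` (case-insensitive)."""
    kw = keyword.lower()
    return [r.title for r in records if kw in {k.lower() for k in r.keywords}]


# Hypothetical example records.
records = [
    MetadataRecord(
        title="Interviews on regional dialect use",
        creators=["A. Researcher"],
        collected_when="2019-03/2019-06",
        collected_where="Utrecht, the Netherlands",
        purpose="Study changes in regional dialect use",
        citation="Researcher, A. (2020). Interviews on regional dialect use [Data set].",
        keywords=["dialects", "interviews"],
    ),
    MetadataRecord(
        title="Soil moisture measurements",
        creators=["B. Scientist"],
        collected_when="2021-05/2021-09",
        collected_where="Veluwe, the Netherlands",
        purpose="Monitor drought effects on soil",
        citation="Scientist, B. (2022). Soil moisture measurements [Data set].",
        keywords=["soil", "drought"],
    ),
]

print(find(records, "Drought"))  # ['Soil moisture measurements']
```

A paper catalogue card can only be read by a person standing at the drawer. A structured record like this can be filtered, sorted and exchanged by software, which is what "discoverable by machines" amounts to in practice.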
- If you find the discussion of metadata too abstract, think of a traditional library catalogue record as a form of metadata. A library catalogue card holds information about a particular book in a library, such as its author, title and subject. Library cataloguing, as a form of metadata, helps people find books within the library, and the information it provides about books can be used in various contexts.
Now, reflect on the differences in functionality between a paper catalogue card and a digital metadata file.
- Reflect on your own research data. If someone unfamiliar with your research wanted to find, evaluate, understand and reuse your data, what would they need?
- Watch this video about structural and descriptive metadata and reflect on the example it provides. If the video piqued your interest in metadata, watch a similar video by Utrecht University, Ins and outs of metadata and data documentation.
Thing 3: The definition of FAIR metrics
Background: FAIR stands for Findable, Accessible, Interoperable and Re-usable.
- Take a look at the image above, provided by the Australian Research Data Commons (ARDC). Reflect on the images chosen for the various aspects of the FAIR acronym. Now consider the video already mentioned in Thing 2: how would you describe its photography example in terms of FAIR?
- Go to DataCite and choose the data center “Utrecht University”. Select one of the published datasets and evaluate it against the FAIR metrics. For the evaluation, you can use the FAIR Data self-assessment tool created by the ARDC. What difficulties do you encounter during the evaluation?
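Evaluating a dataset against FAIR metrics essentially means working through a checklist. The Python sketch below scores hypothetical yes/no answers per principle. The questions are loosely inspired by a few of the FAIR principles and are assumptions for illustration. They are not the questions used by the ARDC self-assessment tool, which remains the tool to use for the exercise.

```python
# Hypothetical checklist: each question is loosely inspired by one FAIR principle.
# These are NOT the questions of the ARDC self-assessment tool.
CHECKLIST = {
    "Findable": [
        "has a persistent identifier",
        "is described with rich metadata",
        "is indexed in a searchable resource",
    ],
    "Accessible": [
        "metadata retrievable via the identifier over a standard protocol",
        "metadata remain available even if the data are not",
    ],
    "Interoperable": [
        "uses open, widely used file formats",
        "uses controlled vocabularies",
    ],
    "Reusable": [
        "has a clear licence",
        "has provenance information",
    ],
}


def score(answers):
    """Return, per FAIR principle, the fraction of checklist questions answered True.

    Questions missing from `answers` are counted as not met.
    """
    return {
        principle: sum(answers.get(q, False) for q in questions) / len(questions)
        for principle, questions in CHECKLIST.items()
    }


# Example: answers for a hypothetical dataset.
answers = {"has a persistent identifier": True, "has a clear licence": True}
print(score(answers))
```

Scoring per principle rather than producing a single total makes it easier to see *where* a dataset falls short, which is also how most FAIR self-assessment tools present their results.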
Thing 4: Searchable resources and repositories
Background: To make objects findable, we have to commit to at least two major points: 1) the objects have to be identifiable at a fixed place, and 2) this place should be reasonably visible. When it comes to finding data, this is where repositories come in.
- Utrecht University has its own repository, YODA, short for “YOur DAta”. Datasets can be published in this repository so that they become accessible online. Try searching Google Dataset Search for one of the datasets listed on YODA; take “ChroniclItaly” as an example. Was it difficult to find the dataset? Now try searching Google Dataset Search for one of the databases stored at the Meertens Institute. Why are the results so different?
- Take a look at the storage solutions suggested by Utrecht RDM Support. Identify searchable repositories among these solutions.
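One factor worth considering when comparing the search results is how Google Dataset Search discovers datasets. It relies largely on structured metadata embedded in dataset landing pages, typically schema.org `Dataset` markup in JSON-LD format. The Python sketch below builds such a description with the standard `json` module. The property names (`name`, `description`, `identifier`, `license`, `creator`, `keywords`) are schema.org properties. The dataset, creator and DOI in the example are hypothetical placeholders.

```python
import json


def dataset_jsonld(name, description, identifier, license_url, creators, keywords):
    """Build a minimal schema.org Dataset description, serialised as JSON-LD."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "license": license_url,
        "creator": [{"@type": "Person", "name": c} for c in creators],
        "keywords": keywords,
    }
    return json.dumps(record, indent=2, ensure_ascii=False)


def script_tag(jsonld):
    """Wrap the JSON-LD in the <script> element a landing page would embed in its HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


# Hypothetical dataset; the DOI below is a placeholder, not a real identifier.
example = dataset_jsonld(
    name="Example interview corpus",
    description="Transcribed interviews collected for a hypothetical study, with documentation.",
    identifier="https://doi.org/10.1234/example",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creators=["A. Researcher"],
    keywords=["interviews", "oral history"],
)
print(script_tag(example))
```

A repository that emits markup like this on every dataset page gives search engines a "fairly visible" place to find the data. Without it, a dataset may exist online yet be hard for dataset search engines to pick up.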
Thing 5: Persistent identifiers
Background: A persistent identifier is a permanent and unique reference to an online digital object, independent of (changes in) its actual location. An identifier should have an unlimited lifetime, even if the identified entity ceases to exist. This property of an identifier is called “persistence”.
- Read about the Digital Object Identifier (DOI) System for Research Data provided by the Australian National Data Service (ANDS).
- Watch the video “Persistent identifiers and data citation explained” by Research Data Netherlands. Then read a very general, awareness-level introduction to persistent identifiers.
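The principle that an identifier stays fixed while the location it points to may change can be sketched as a lookup table. The toy resolver below is hypothetical and greatly simplified. It registers an identifier, moves the object to a new location, and, after the object is withdrawn, resolves the identifier to a "tombstone" note, so the identifier itself never stops resolving. The helper `doi_to_url` normalises DOI strings to the public `https://doi.org/` resolver. It relies only on the fact that DOI names start with the prefix `10.`, followed by a `/` and a suffix.

```python
class ToyResolver:
    """Hypothetical, minimal persistent-identifier resolver.

    The identifier never changes; only the location it points to does.
    """

    def __init__(self):
        self._locations = {}   # identifier -> current URL
        self._tombstones = {}  # identifier -> note about a withdrawn object

    def register(self, pid, url):
        if pid in self._locations or pid in self._tombstones:
            raise ValueError(f"{pid} is already registered; identifiers are never reused")
        self._locations[pid] = url

    def move(self, pid, new_url):
        # The object moved: update the location, keep the identifier.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def withdraw(self, pid, note):
        # The object is gone, but the identifier keeps resolving to a note.
        self._locations.pop(pid)  # raises KeyError for unknown identifiers
        self._tombstones[pid] = note

    def resolve(self, pid):
        if pid in self._locations:
            return self._locations[pid]
        if pid in self._tombstones:
            return "tombstone: " + self._tombstones[pid]
        raise KeyError(pid)


def doi_to_url(doi):
    """Normalise 'doi:10.x/y', 'https://doi.org/10.x/y' or '10.x/y' to a resolver URL."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


# Usage with hypothetical identifiers and locations.
resolver = ToyResolver()
resolver.register("pid:0001", "https://old-server.example.org/dataset")
resolver.move("pid:0001", "https://new-server.example.org/dataset")
print(resolver.resolve("pid:0001"))
print(doi_to_url("doi:10.1234/abc"))
```

In the real DOI system, registration agencies such as DataCite maintain these mappings, and the repository holding the data is responsible for updating the registered location when it changes. A citation that uses the identifier therefore keeps working after a move.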
Thing 6: Documentation
- Browse the general overview of data documentation provided by the Consortium of European Social Science Data Archives. Think about the principal differences between object-level documentation of quantitative data and of qualitative data.
Thing 7: Formats and standards
- Take a look at the data formats recommended by DANS. Which of these formats are relevant to your subject area and your data? Do you use any of the non-preferred formats? Why?
- Read the background information on file formats and data conversion provided by the Consortium of European Social Science Data Archives. Reflect on the difference between short-term and long-term oriented formats. Think of a concrete example, relevant to your field, of converting data from a short-term processing format to a long-term preservation format.
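As one possible example of such a conversion, the Python sketch below moves tabular data from a Python pickle file to a plain-text, UTF-8 encoded CSV file. Pickle is a binary format tied to Python and suitable only for short-term processing, while CSV is a format commonly recommended for long-term preservation of simple tables. The choice of pickle as the source format and the file and column names are assumptions for illustration. Check the DANS list for the formats recommended for your own data.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def pickle_to_csv(pickle_path, csv_path):
    """Convert a pickled list of dicts (short-term, Python-only format)
    into a UTF-8 CSV file (plain text, readable without Python).

    Returns the number of data rows written.
    """
    # Only unpickle files you trust: loading a pickle can execute code.
    with open(pickle_path, "rb") as f:
        rows = pickle.load(f)
    # Column order: keys in order of first appearance across all rows.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)  # missing values become ""
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Usage with hypothetical measurement data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "measurements.pkl"
    dst = Path(tmp) / "measurements.csv"
    with open(src, "wb") as f:
        pickle.dump([{"site": "A", "depth_m": 1.5}, {"site": "B", "depth_m": 2.0}], f)
    n_rows = pickle_to_csv(src, dst)
    converted = dst.read_text(encoding="utf-8")

print(n_rows)
print(converted)
```

Recording units in the column name (`depth_m`) is one simple way to keep part of the documentation attached to the data once it leaves the software that produced it.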
Thing 8: Controlled vocabulary
Background: The use of shared terminologies strengthens communities and increases the exchange of knowledge. When researchers use specific terms, they rely on a common understanding of these terms within the relevant community. Controlled vocabularies express a commitment to the terms people use and to standards for managing those terms.
- Browse Controlling your Language: a Directory of Metadata Vocabularies from JISC in the UK. Reflect on possible issues that may arise if there is no agreement on the use of a controlled vocabulary within a research group.
- Consider the following example from earth science research: “to be able to adequately act in the case of major natural disasters such as earthquakes or tsunamis, scientists need to have knowledge of the causes of complex processes that occur in the earth’s crust. To gain necessary insights, data from different research fields are combined. This is only possible if researchers from different applicable sub-disciplines ‘speak the same language’”. Choose a topic within your research interests that requires combining data from different sub-disciplines. Think about some differences in vocabularies between these sub-disciplines.
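A minimal Python sketch shows why agreement matters. Terms from two sub-disciplines can only be combined if they map onto a shared controlled vocabulary, and any term without an agreed mapping is left over. The vocabulary and synonym table here are hypothetical toy examples. Real controlled vocabularies usually give each concept a persistent URI (for example in SKOS) rather than a bare string.

```python
# Hypothetical shared vocabulary agreed on by the project.
CONTROLLED_VOCABULARY = {"earthquake", "tsunami", "fault", "magnitude"}

# Hypothetical mappings from terms used in different sub-disciplines.
SYNONYMS = {
    "quake": "earthquake",
    "seismic event": "earthquake",
    "seismic sea wave": "tsunami",
    "fault line": "fault",
}


def normalise(term):
    """Return the controlled term for `term`, or None if no mapping is agreed."""
    t = term.strip().lower()
    if t in CONTROLLED_VOCABULARY:
        return t
    return SYNONYMS.get(t)


def harmonise(*term_lists):
    """Map terms from several groups onto the shared vocabulary.

    Returns the set of controlled terms found and the list of terms
    that could not be mapped (candidates for discussion).
    """
    mapped, unmapped = set(), []
    for terms in term_lists:
        for term in terms:
            concept = normalise(term)
            if concept is None:
                unmapped.append(term)
            else:
                mapped.add(concept)
    return mapped, unmapped


seismology = ["Quake", "magnitude", "fault line"]
oceanography = ["seismic sea wave", "run-up height"]
mapped, unmapped = harmonise(seismology, oceanography)
print(sorted(mapped))
print(unmapped)
```

The leftover list is the interesting output: each unmapped term is a point where the groups do not yet "speak the same language" and need to agree on a term before their data can be combined.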
Thing 9: Use a license
Background: A license states what a user is allowed to do with your data and creates clarity and certainty for potential users.
- Take a look at the various Creative Commons licences. Which licences impose the fewest restrictions on data? You can use the Creative Commons guide to figure this out.
- Watch this video about Creative Commons licences.
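The Creative Commons licences are built from four elements (BY, SA, NC, ND), which makes them easy to compare as sets of conditions. The Python sketch below ranks the six CC licences, together with the CC0 public domain dedication, by the number of conditions they impose. It also checks two common kinds of reuse. Counting conditions is a deliberately crude proxy, since the conditions restrict different things. Treat it as a starting point for the exercise, not a substitute for reading the licence deeds.

```python
# The four Creative Commons licence elements.
ELEMENTS = {
    "BY": "credit the creator (attribution)",
    "SA": "share adaptations under the same licence",
    "NC": "non-commercial use only",
    "ND": "no adapted material may be shared",
}

# The six CC licences, as sets of elements, plus CC0 for comparison
# (CC0 is a public domain dedication rather than a licence).
LICENCES = {
    "CC0": set(),
    "CC BY": {"BY"},
    "CC BY-SA": {"BY", "SA"},
    "CC BY-NC": {"BY", "NC"},
    "CC BY-ND": {"BY", "ND"},
    "CC BY-NC-SA": {"BY", "NC", "SA"},
    "CC BY-NC-ND": {"BY", "NC", "ND"},
}


def by_restrictions(licences=LICENCES):
    """Licence names ordered by number of conditions (fewest first), then by name."""
    return sorted(licences, key=lambda name: (len(licences[name]), name))


def permits(licence, commercial=False, share_adaptation=False):
    """Rough check of whether a kind of reuse is allowed under `licence`."""
    conditions = LICENCES[licence]
    if commercial and "NC" in conditions:
        return False
    if share_adaptation and "ND" in conditions:
        return False
    return True


print(by_restrictions())
print(permits("CC BY-NC", commercial=True))
```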
Thing 10: FAIR and privacy
Background: The General Data Protection Regulation (GDPR), known in the Netherlands as the Algemene Verordening Gegevensbescherming (AVG), requires parties handling data to provide clarity and transparency where personal data are concerned.
- Take a look at the Handling personal data guide on the Utrecht University RDM website. Reflect on how personal data can be FAIR.
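One technique often discussed when making personal data more shareable is pseudonymisation, which replaces direct identifiers with pseudonyms. The Python sketch below uses a keyed hash (HMAC-SHA256 from the standard library). The same person always receives the same pseudonym, so records remain linkable, while someone without the key cannot reverse pseudonyms simply by hashing candidate names. The field names and the 16-character truncation are assumptions for illustration. Under the GDPR, pseudonymised data are still personal data. Other fields, such as age or postcode, may also still identify people in combination.

```python
import hashlib
import hmac
import secrets


def pseudonymise(records, id_field, key):
    """Replace the direct identifier in `id_field` with a keyed-hash pseudonym.

    Uses HMAC-SHA256, truncated to 16 hex characters (an illustrative choice).
    Input records are not modified; new dicts are returned.
    """
    out = []
    for rec in records:
        rec = dict(rec)  # copy, so the caller's data stay untouched
        raw = str(rec.pop(id_field)).encode("utf-8")
        rec["pseudonym"] = hmac.new(key, raw, hashlib.sha256).hexdigest()[:16]
        out.append(rec)
    return out


# The key must be stored separately from the shared data (e.g. by the data steward).
key = secrets.token_bytes(32)

# Hypothetical survey responses.
responses = [
    {"name": "Jan Jansen", "answer": "yes"},
    {"name": "Jan Jansen", "answer": "no"},
    {"name": "Piet de Vries", "answer": "yes"},
]
for row in pseudonymise(responses, "name", key):
    print(row)
```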