Top 10 FAIR Data & Software Things

Imaging


Top 10 FAIR things for imaging

Authors

Description:

This “10 Things” guide aims to promote the use of the FAIR principles in the bioimaging community. The FAIR principles are described in the context of bioimaging and the activities are optional. This guide aims to empower researchers, scientists and health professionals to incorporate data best practices throughout the research cycle, in order to improve research data quality, reproducibility and reusability of research outputs.

Audience:

Imaging researchers, neuroscientists, clinicians, microscopists, platform engineers, graduate students and computational scientists working on image analysis and processing.

Goals:

To inform data producers and users about the FAIR principles applied to bioimaging and suggest activities to apply to their research.

Table of Contents:

1. What is FAIR?
2. What are publishers and funders saying about data access?
3. Data sharing and discovery
4. Reusable data repositories for the image community
5. Managing and sharing sensitive data
6. Persistent identifiers
7. Describing data: metadata
8. Reusable data best practices
9. Licensing your work
10. Data citation for access and attribution

1. What is FAIR?

The term FAIR, as detailed in 15 principles [1], stands for Findable, Accessible, Interoperable and Reusable. The FAIR principles [3] are guidelines to motivate and enhance the reusability of data by facilitating its discovery, integration and evaluation. In this context, “data” refers to all research-oriented digital objects (including data, metadata, software, workflows and packages) [4]. Wilkinson et al. pioneered the definition of the guiding principles, “emphasising the capacity of computational systems to Find, Access, Interoperate and Reuse data with none or minimal human intervention”, an approach also referred to as the machine-actionable FAIR principles [3]. FAIR is also connected with the open research and data management movements, as Higman et al. describe in Three camps, one destination: the intersection of research data management, FAIR and Open.

“FAIRness is a prerequisite for proper data management and data stewardship”

Communities are motivated to apply the FAIR principles to research activities and to enable people and machines to find, read, use and reuse research data and research outputs. For instance, in 2018, the Enabling FAIR Data Project [5], a coalition of stakeholders representing the international Earth and Space science community, set out to develop standards that will connect researchers, publishers and data repositories in this community to enable FAIR data on a large scale. This project will accelerate scientific discovery and enhance the integrity, transparency and reproducibility of this data. In imaging, on 1 March 2019, Euro-BioImaging [6] and other research infrastructures including ELIXIR-Europe [7] joined forces as part of the European Open Science Cloud [8] project to publish research data via FAIR databases. Community participation from academia, industry, small and medium-sized enterprises (SMEs) and regional bio-clusters is paramount for the success of this four-year project (starting in 2020). The imminent global uptake of the FAIR principles across different scientific domains can only motivate us to move forward, promote and apply them.

Activity 1: In 2018, CODATA (the Committee on Data for Science and Technology) shared news of an important milestone, the “Enabling FAIR Data Project and Commitment Statement” [9]. Take a look at the partners [5]. Do you recognise any partners in the imaging discipline?

Activity 2: Can you think of the benefits of making your data FAIR? And how you can align your current data practices to the FAIR principles? Consider the following resources when addressing the activity above:

2. What are publishers and funders saying about data access?

The following examples and statements are meant to motivate organisations and researchers to take the next steps towards FAIR. As a disclaimer, most of the examples are from Australian stakeholders because the guide is being developed in Australia; nonetheless, international examples have also been included.

“Nature journals require sharing research materials because their core business is ensuring research quality and promoting research to the widest readership” (Nature Genetics, 2004) [14].

In 2014, The Nature Publishing Group welcomed its newest journal, Scientific Data [15] — a peer-reviewed, open-access publication designed to provide a better way to share and explain data. Scientific Data promotes reproducible, collaborative science and due credit to scientists [16].

PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction at the time of publication [17]. PLOS suggests using FAIRsharing [18] to index resources, for example their own PLOS list of recommended resources [19].

The eLife Journal Policy [20] states: “wherever possible, authors should make major datasets available using domain-specific public archives, or generic databases, e.g. FAIRsharing page for eLife recommended repositories and standards” [21].

Funders like the European Commission have drafted Guidelines on FAIR Data Management for the H2020 programme (European Commission, 2016), stating that “those projects funded in this scheme must submit a version of this FAIR Data Management Plan (DMP)” [25].

The Australian 2016 National Research Infrastructure Roadmap [26] has two FAIR highlights: 1. Australia must stay at the forefront of international developments and should continue to engage with internationally recognised initiatives… such as the FAIR guiding principles. 2. The Australian National Data Service (ANDS) (now ARDC) has been a foundational, providing in many cases leading, international policies and practices to support researchers and institutions in making data FAIR.

In 2018, the European Commission Directorate-General for Research and Innovation published the report and action plan Turning FAIR into reality [27] to implement FAIR and provide concrete recommendations and actions for stakeholders in Europe and beyond.

Starting in September 2016, all research papers accepted for publication in Nature and an initial 12 other Nature Research titles were required to include information on whether and how others can access the underlying data (Nature announcement: Where are the data?).

The Australian Research Council (ARC) Open Access Policy Version 2017.1 [28] states “Author(s) should consider selecting publishers and research outlets, which have policies supporting the F.A.I.R. principles, as well as immediate or early availability of Publications via Open Access, in order to maximise the availability and impact of their ARC Funded Research.”

Policy Statement (2017) On FAIR Access To Australia’s Research Outputs https://www.fair-access.net.au/fair-statement [29] Headline: By 2020, Australian publicly funded researchers and research organisations will have in place policies, standards and practices to make publicly funded research outputs findable, accessible, interoperable and reusable to the Australian and international community.

The (Australian) National Health and Medical Research Council (NHMRC) promotes the highest quality in the research that it funds, based on international best practice. The NHMRC lists the FAIR principles under useful resources [30] for publication and reporting of research outcomes.

In late 2017, the Australian Health Research Alliance (AHRA) committed to developing a coordinated national approach to Data Driven Healthcare Improvement, leveraging data registration, linkage, integration, storage, security, access, management and analysis capabilities [31].

In 2018, the Enabling FAIR Data Commitment Statement [32] was formalised by a significant group of stakeholders (repositories, publishers, societies, communities, institutions, funding agencies and organisations, and researchers) to support and promulgate open and FAIR data principles and practices in their core science activities and policies.

Wiley’s data sharing and citation policies and services support the growing movement to make research more open [34], because this leads to a fairer, more efficient and accountable research landscape, driving a more effective and faster pace of discovery (Wiley’s Data sharing and citation [35]).

The Genomics Health Futures Mission (GHFM) - Projects Grant Opportunity guidelines 2019 state that “research projects proposals with plans to manage genomic and/or phenomic data in alignment with the FAIR principles for research data are preferred”.

“All disciplines should follow the geosciences and demand best practice for publishing and sharing data” (Stall et al., 2019) [23].

“Grant makers, professional organisations, research journals, publishers, and other entities in the research field increasingly stress the ethics as well as societal and practical benefits of data sharing, and require researchers to do so within a reasonable time after data collection ends.” (Dijkers, 2019) [24].

3. Data sharing and discovery

Why sharing?

“Both researchers and the broader community stand to benefit from the knowledge produced through publicly funded research” (ARC open access policy). Data sharing is well connected with the concept of reproducibility.

Activity 1: Look at the slides (2-11) on the motivation for neuroimaging reproducibility. What is your opinion about: Data + Workflow specification + Execution environment = Results?

Activity 2: (Infographic) Research data may be discovered (findable) and shared (accessible) in many ways. Start by looking at some data sharing trends across countries and research disciplines. Consider your own current data sharing practices, and those of your project team(s). How FAIR are they?

Activity 3: How can data be shared and discovered? Think about open, mediated and restricted access data repositories. What examples of these types of repositories are you aware of? Discuss your answers with others.

4. Reusable data repositories for the image community

How to walk towards FAIR?

Imagine if you were able to obtain extra datasets for your existing research project, or start a new project reusing publicly available datasets. You can do this by exploring the following resources.

Neurosciences: data repositories recommended by the Scientific Data journal that accept human-derived data. In addition, NeuroMorpho.org and G-Node also accept data from other organisms. Please note that human-subject data submitted to OpenNeuro must be de-identified, while the Functional Connectomes Project International Neuroimaging Data-Sharing Initiative (FCP/INDI) can handle sensitive patient data.

Microscopy

Biomedical sciences

Non-domain specific

Data registries and catalogues: re3data.org is a registry of some 2000 data repositories. FAIRsharing.org offers a catalogue of databases, described according to the BioDBcore guidelines. Other options include Research Data Australia, the OpenAIRE content providers, the European Open Science Cloud, Google Public Data, Google Dataset Search and, for open access publications, Open Knowledge Maps.

This and the previous section intend to show that it is becoming more common for funding agencies and publishers to require research data to be made accessible via appropriate repositories. This list is a starting point for you to find out what data already exists in your research area. If you want to share your data, or find data relevant to your research, take a detailed look at the examples provided; most, if not all, will have guides on how to share data.

Activity 1: (Find a repository) Go to: https://fairsharing.org/biodbcore/?q=imaging and browse or search to find repositories relevant to your research. Try for example, searching on “neuroimaging”. Explore at least one repository you find. How well does it support the FAIR data principles? Tip: look for things such as persistent identifiers, clear descriptions, licence information, download options, file formats.

5. Managing and sharing sensitive data

A clarification: FAIR data is not necessarily “open” data. There are some good reasons why some data should not be open, for example to protect intellectual property, commercialisation, national security, personal privacy or endangered species. However, it may still be possible to provide mediated access to such data, or to publish a description of the data so that others can discover its existence. To align with FAIR principles, your “research data should be as open as possible, as closed as necessary”.

The FAIR principles encourage us to disseminate data as widely as possible, in the most effective manner and at the earliest opportunity. This statement takes into account any restrictions relating to privacy, confidentiality, intellectual property, embargo period, or cultural sensitivities, that need to be addressed, discussed and clarified before sharing any data. In the planning phase of a research project, researchers need to consider at least making project metadata publicly accessible.

If you need examples and more information, check the OpenAIRE sensitive data guide, ANDS publishing and sharing sensitive data, and the Earth Science Information Partners (ESIP) Handling sensitive data tutorial. The Australian Bureau of Statistics (ABS) explains the application of the five safes framework, and Table 2 provides examples at different levels of accessibility.

Activity 1: Promoting FAIR principles in the healthcare field by the Digital Curation Centre (DCC), January 2019. Highlights: The sensitive nature of patient data and additional concerns for these data include security and anonymisation of data subjects and although not the primary concern from a technical aspect, these are a major component considered. For more information visit FAIR4health.eu.

Activity 2: Think about when and how people can share data along the research cycle. Keep in mind that it is strongly recommended to release metadata (a description) of the project to comply with FAIR principles, even if you cannot share the data itself. Institutional repositories or domain-specific repositories should be able to store the metadata of your project and then link that information via registries (see the previous section).

De-identification / Anonymisation

Handling of sensitive data should seek to minimise the risk of exposing confidential information. Sometimes restrictions on sharing can be resolved by de-identification or anonymisation of data. Anonymisation is sometimes used interchangeably with de-identification; ANDS clarifies these terms.
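
As a rough illustration of de-identification applied to image metadata, the sketch below drops direct identifiers from a record and replaces the subject ID with a keyed hash. The field names, the list of identifiers and the `pseudonymise` helper are all hypothetical, and this is pseudonymisation rather than full anonymisation: image content (for example faces in head scans) and free-text fields need separate treatment under your ethics approval.

```python
import hashlib
import hmac

# Hypothetical list of direct identifiers; a real project would follow its
# ethics approval and a recognised de-identification standard instead.
DIRECT_IDENTIFIERS = {"patient_name", "date_of_birth", "address", "phone"}


def pseudonymise(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` without direct identifiers.

    The subject ID is replaced by a keyed hash (HMAC-SHA256), so scans from
    the same subject stay linked without exposing the original ID.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(
        secret_key, str(record["subject_id"]).encode("utf-8"), hashlib.sha256
    ).hexdigest()
    cleaned["subject_id"] = "sub-" + digest[:12]
    return cleaned


# Illustrative record; every value here is made up.
record = {
    "subject_id": "MRN-00123",
    "patient_name": "Jane Example",
    "date_of_birth": "1980-02-14",
    "modality": "MRI",
    "field_strength_tesla": 7,
}
shared = pseudonymise(record, secret_key=b"keep-this-key-private")
print(shared)
```

The key must stay private and outside the shared dataset; anyone holding it could re-create the mapping from known IDs to pseudonyms.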

Activity 3: Look at The Future of Privacy Forum’s visual guide to practical data de-identification.

Optional extra information. 1. Open de-identification tools by Open Brain Consent (Halchenko, Y. et al., 2018). 2. The Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule establishes national standards (US) to protect individuals’ medical records and personal health information, and provides guidance about methods for de-identification. 3. Anonymization of DICOM Electronic Medical Records (Newhauser et al., 2015).

6. Persistent identifiers

Identifiers are essential to human-machine interoperation. Assigning globally unique persistent identifiers “is arguably the most important FAIR principle, because it will be hard to achieve other aspects of FAIR without them” (F1). Persistent identifiers (PIDs) help find and collect data accurately, and they enable proper citation and the collection of citation metrics about the use of a dataset, article or data generator (e.g. instrument, software, workflow). For the researcher, persistent identifiers enable disambiguation of people and linking of existing works.

For individuals:

For digital objects (files, datasets, publications, software, etc.):

Disclaimer: there is a wide range of PIDs available; we cite only two examples for each type.
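
Because PIDs are meant to be machine-actionable, software can check and resolve them. The sketch below is a minimal example under stated assumptions: it validates the check character of an ORCID iD (ORCID documents an ISO 7064 MOD 11-2 checksum) and turns a bare DOI into a resolver link. The function names and the simplified DOI pattern are illustrative choices, not part of any official library.

```python
import re


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 0000-0000-0000-000X layout and the final check character."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into an https://doi.org/ link.

    The pattern is a simplified assumption (prefix "10.", a registrant code,
    a slash and a suffix); it does not cover every DOI ever issued.
    """
    doi = doi.strip()
    if not re.fullmatch(r"10\.\d{4,9}/\S+", doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


print(is_valid_orcid("0000-0002-1825-0097"))  # sample iD used in ORCID documentation
print(doi_to_url("10.17605/OSF.IO/ZKJ4R"))    # DOI of this guide's pre-print
```

A passing checksum only shows that the identifier is well formed; it does not prove that the iD is registered or belongs to a particular person.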

Activity 1: OpenAIRE/FREYA/ORCID guide for researchers “How can identifiers improve the dissemination of your research outputs?”

Activity 2: Six Ways to Make Your ORCID iD Work for You! If you already have an ORCID, check this video to link publications to your ORCID profile.

Activity 3: (Discuss in pairs, 5 min) The Joint Declaration of Data Citation Principles from FORCE11 https://www.force11.org/datacitationprinciples

To learn more about persistent identifiers visit Go-FAIR F1 Principle or the ARDC identifiers examples.

7. Describing data: metadata

“Metadata (information about data) provides means for discovering data objects as well as providing other useful information about the data objects such as experimental parameters, creation conditions, etc.” (Rajasekar & Moore, 2001).

Why is building and using metadata relevant? Because it supports the discovery, understanding and organisation of research data across different communities.

There are some aspects of metadata to keep in mind whether you produce, read or reuse it. Creating, using and reusing metadata requires a standard vocabulary so that the metadata can be interpreted properly by either humans or software. This is why metadata items need to be precisely defined. A defined list of agreed terms constitutes a controlled vocabulary, which is usually led by a user community. Controlled vocabularies help data integration when, for example, ambiguities exist in the terms used in different datasets and across different repositories. If the data are to be re-used outside this community, additional information may be required.
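
To make the role of a controlled vocabulary concrete, here is a small sketch that maps free-text labels from different datasets onto one preferred term and rejects anything outside the agreed list. The vocabulary, its variants and the `to_preferred_term` helper are invented for illustration; a real project would take its terms from a community-maintained vocabulary or ontology.

```python
# Hypothetical controlled vocabulary for an "imaging modality" field:
# each preferred term maps to the free-text variants it should replace.
MODALITY_VOCABULARY = {
    "magnetic resonance imaging": {"mri", "mr", "magnetic resonance imaging"},
    "computed tomography": {"ct", "cat scan", "computed tomography"},
    "confocal microscopy": {"confocal", "clsm", "confocal microscopy"},
}


def to_preferred_term(value: str) -> str:
    """Map a free-text label to its preferred term, or raise if unknown."""
    key = " ".join(value.lower().split())  # normalise case and whitespace
    for preferred, variants in MODALITY_VOCABULARY.items():
        if key in variants:
            return preferred
    raise ValueError(f"{value!r} is not in the controlled vocabulary")


# Two datasets describe the same modality differently.
labels = ["MRI", "Magnetic  Resonance Imaging", "ct"]
print([to_preferred_term(label) for label in labels])
```

Raising an error for unknown terms, rather than passing them through, is a deliberate choice here: it surfaces ambiguous labels at curation time instead of leaving them for the people who later try to integrate the datasets.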

Controlled vocabularies are part of a model called an ontology. An ontology has controlled vocabularies and the glue to link the terms, providing an effective means whereby human and electronic agents can communicate unambiguously about concepts. This ties in with FAIR Interoperability principle I1. The goal of making data interoperable is to enable members of disparate communities to reuse and understand digital information over time.

Metadata for imaging should include a standard terminology and tools for describing physiological, clinical, demographic and genetic changes. The main recommendation is to share metadata per project whenever possible, even if the data is not yet available. Remember that metadata can be stored in general purpose repositories.

Metadata can be grouped into two types: automatically created metadata and manually created metadata.

a. Why ontologies?

“By expressing image annotation in machine computable form as a formal ontology, human knowledge can be brought to bear on effective search and interpretation of image data, especially across multiple disciplines, scales, and modalities” (Eliceiri et al. 2012). Keep in mind that if privacy is an issue, any (meta)data can be listed under embargo.

Implementation, adoption and harvesting of metadata require defined ontologies. Due to increased demand for quantitative analysis and robust curation and sharing of image data, the need for full ontologies and annotations is growing.

More examples: Ontologies for Neuroscience describes three domain-specific ontologies and how they build on top of each other (Larson & Martone, 2009). The authors also note that the ontologies were built from existing domain-specific vocabularies with the help of the Open Biological Ontologies (OBO) community (Smith et al., 2007). For example, a subset of OBO is the EDAM Ontology for bio-imaging (Kalaš et al., 2019). The Neuroscience Information Framework has developed a comprehensive vocabulary, the NIF Standard ontology (NIFSTD), for annotating and searching neuroscience resources. Plant et al. (2011) provide an overview of what is needed to implement metadata that follows domain-specific ontologies, using microscopy cell image data as an example. The National Center for Biomedical Ontology (NCBO) BioPortal provides access to more than 270 biomedical ontologies and controlled terminologies (Musen et al., 2012), including some of those cited above. The Ontology for Biomedical Investigations (OBI) (Bandrowski et al., 2016) enables communication between existing ontologies.

b. Controlled vocabularies

Domain-specific controlled vocabularies are too wide a landscape to cover here, so some more generic vocabulary examples are given instead. Schema.org is widely used to build controlled vocabularies. A more specific example is bioschemas.org, a collection of specifications that provide guidelines to facilitate a more consistent adoption of schema.org within the life sciences. Research Vocabularies Australia is a public database of controlled vocabularies. At the time of writing this guide, no specific bioimaging vocabularies were found there; maybe that is something you can help with?
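
As a hedged illustration of what schema.org-style metadata can look like, the sketch below builds a JSON-LD description of a dataset using the schema.org `Dataset` type. All values (title, DOI, creator) are placeholders, and the selection of properties is an assumption made for this example; it has not been checked against any particular Bioschemas profile.

```python
import json

# Placeholder values throughout: the title, DOI and creator are invented.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example 7T MRI pilot dataset",
    "description": "Structural brain scans acquired for a pilot study.",
    "identifier": "https://doi.org/10.1234/example",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {
        "@type": "Person",
        "name": "A. Researcher",
        "identifier": "https://orcid.org/0000-0002-1825-0097",
    },
    "keywords": ["bioimaging", "MRI"],
    "measurementTechnique": "magnetic resonance imaging",
}

serialised = json.dumps(dataset, indent=2)
print(serialised)
```

A block like this is typically embedded in a dataset landing page so that search engines and harvesters can read the same description that humans see.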

c. Storing and publishing metadata

Where to store and publish metadata? The short answer is that it depends on which institution you are from. Your library, research office or data steward are the best sources of information. Some options are:

  1. Institutional repositories
  2. Domain specific repositories
  3. Generic repositories

Keep in mind FAIR principle A2, “Metadata are accessible, even when the data are no longer available”, which reinforces the need to share at least the metadata. For examples, first look at the section “Reusable data repositories for the image community”. For a broad view, see the FAIRsharing.org databases for imaging. The ARDC’s Research Data Australia (RDA) harvests institutional repositories, hence it can serve as a generic repository. The CSIRO Data Access Portal is available for projects related to CSIRO. The DataCite Metadata Store allows users to register DataCite DOIs and associated metadata. Zenodo provides a DOI and versioning capabilities.

Activity 1: (Discuss in pairs) Have a look at the metadata stored at Research Data Australia for the 7T Magnetom instrument. It contains simple but important public metadata and a PID.

Activity 2: Where to store metadata? (from ARDC).

8. Reusable data best practices

Here is a suggested list of data best practices to apply to your research outputs. These will improve the reusability of data and software by others, which includes yourself in the future. Remember, making data publicly available for others to re-use is the goal, but not all data must be shared with everyone. Adding terms and conditions of accessibility is an option to consider. To share data, you can make use of the public infrastructures already mentioned (section “4. Reusable data repositories”) or use your institutionally provided data repository. To get started, there are a few things you should keep in mind.

a. Provenance - Usually provenance is a manually produced metadata file (it can also be produced automatically). It is important for the future reuse of data and should contain descriptors such as data producer, date history (log of changes) and data dictionary. Primary data ought to be read-only.

b. File formats - Most file formats are defined by the data producer (e.g. instrument or software). Whenever possible, you should try to convert data to formats that are publicly accessible.

  • DICOM (Digital Imaging and Communications in Medicine): mostly used in neurosciences; can be converted to NIfTI (Neuroimaging Informatics Technology Initiative) or BIDS format.
  • Bio-formats.
  • HDF5 (Hierarchical Data Format version 5): an open source file format that supports large, complex, heterogeneous data; used by MINC and Huygens Software.
  • TIFF: extensively used in microscopy.

c. Data structures - Keep consistent file and folder naming conventions across linked projects.

d. Data curation - This should be included in your data quality workflow as part of the process; ideally it will be automated.

e. Data versioning - To keep the provenance of your data you might use data versioning tools: Git or GitHub (for code), Git annex, Datalad.

f. Containerisation - For data processing pipelines, use containers (e.g. Singularity, Docker) or virtual environments such as the Characterisation Virtual Lab.

g. Protocols - Search for publicly shared imaging protocols. For example, Protocol Exchange from Nature Protocols is an open resource where the community of scientists pool their experimental know-how to help accelerate research.

h. Create documentation - A README file helps ensure that your data can be correctly interpreted and reanalysed by others. The DataDryad Readme is an example of minimum documentation.

i. Benchmarks or checksums
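
Two of the practices above, consistent naming (c) and checksums (i), are easy to automate. The following is a minimal sketch, assuming a made-up naming convention loosely inspired by BIDS-style names (`sub-<label>_<suffix>.<extension>`); the pattern and the manifest layout are illustrative, not a standard. It records a SHA-256 checksum per file so that later users can confirm the files they downloaded are unchanged.

```python
import hashlib
import re
import tempfile
from pathlib import Path

# Hypothetical naming convention, e.g. sub-01_T1w.nii (not the BIDS spec).
NAME_PATTERN = re.compile(r"sub-[A-Za-z0-9]+_[A-Za-z0-9]+\.[A-Za-z0-9.]+")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict:
    """Map each file (relative path) to its checksum and a naming check."""
    manifest = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(folder).as_posix()] = {
                "sha256": sha256_of(path),
                "name_ok": NAME_PATTERN.fullmatch(path.name) is not None,
            }
    return manifest


# Demonstration on two throwaway files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "sub-01_T1w.nii").write_bytes(b"example image bytes")
    (folder / "scan final v2.nii").write_bytes(b"example image bytes")
    manifest = build_manifest(folder)

for name, entry in manifest.items():
    print(name, entry["name_ok"], entry["sha256"][:12])
```

Storing such a manifest next to the data also supports provenance (a): re-running it after any processing step shows which files changed.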

Activity 1: A brain imaging case study that provides direct evidence of the impact of open sharing on data use and resulting publications over a seven-year period (2010-2017). “We dispel the myth that scientific findings using shared data cannot be published in high-impact journals and demonstrate rapid growth in the publication of such journal articles” (Milham, M. P. et al., 2018).

Activity 2 (Discussion + Action): What Can You Do?

  • Contribute your data – Previously published datasets.
  • Release some or all of the project metadata – your call, as a simple rule, the more the better!
  • Curate existing datasets to make available in the future - you set the upload schedule.
  • Contribute your scripts/code
  • Have discussions with your team members about licensing and sharing.
  • Create a data management plan.

Activity 3: Go through the questions from the Horizon2020 guide to create a FAIR Data Management Plan and see if you can already answer many of them.

Recommended extra reading: Best Practices in Data Analysis and Sharing in Neuroimaging using MRI, Ten Simple Rules for Creating a Good Data Management Plan, Ten Simple Rules for Reproducible Computational Research and Ten principles for machine-actionable data management plans. These papers will help you connect all the concepts that you have learned so far.

9. Licensing your work

Licensing your work / research outputs to be open access (research output here means data, metadata, code, workflows) allows you as author or contributor to enable reuse and appropriate attribution of the work. If there is no license attached to your work, you are actually preventing anyone from legally reusing it. Did you know that No license = No permissions? Also, if you find research outputs that you want to reuse, you should only reuse them according to their license.

Be aware that you have the right to choose a license that best suits your purpose. There are multiple different licenses, and versions of these, that can be applied to data and software. Some licenses are applicable only in certain countries, so think about applying an international license. Be aware that the data repository that you use might ask you to accept their “terms and conditions”, which affect how you might use or share data by expanding, modifying or limiting the intended purpose of your own license. You can have multiple licenses for different purposes or different audiences. Finally, not every part of your work / research outputs needs to be publicly available or licensed. The more you share, the better.

Activity 1: What if you don’t choose a license? This resource explains the consequences and gives you a few reasons to think about licensing your work. If you are interested in reading about GitHub terms and conditions, take 5 extra minutes.

Activity 2: (flowcharts as a survey) The ARDC has guides about licensing for three specific scenarios: a) Data creator flowchart, b) Data supplier flowchart and c) Data users flowchart. If you want to know more about licensing and copyright for data reuse, visit the ANDS (now joined into ARDC) page.

A few types of licenses:

  • Creative Commons (CC) is, so far, very easy to apply and is broadly reused. It is strongly promoted in the United States, but it is an internationally recognised license creator. CC is good for: a) very simple, factual data sets; b) data to be used automatically. Watch out for the version in use: version 4 or later is recommended. Be aware of attribution stacking and of the conditions Non Commercial (NC), Share Alike (SA) and No Derivatives (ND). The NC condition should only be used with dual licensing. The SA condition reduces interoperability. The ND condition severely restricts reuse. To help you decide, use https://creativecommons.org/choose/.
  • Copyleft is a general method for making a program (or other work) free (in the sense of freedom, not “zero price”), and requiring all modified and extended versions of the program to be free as well.
  • Open Data Commons also provides licenses specifically for open data, good for most databases and datasets, e.g. the Open Data Commons Open Database Licence (ODC-ODbL) or the Open Data Commons Attribution License (ODC-By).
  • Licenses specific to software: Mozilla Public Licence (MPL), MIT Licence, the GNU General Public Licence (GPL) and a list of open source licenses by category. To help you choose a license for software, look at the descriptions at https://choosealicense.com/.

Acknowledgement: most of the licenses cited in this section were first mentioned in License Research Data from the Digital Curation Centre (DCC).

10. Data citation for access and attribution

Citation analysis and citation metrics are important to the academic community because they give recognition to researchers and their work. Data citation continues the tradition of acknowledging other people’s work and ideas. It also helps make research data more findable and accessible. It is now common practice for authors to formally cite the research datasets and associated software that underpin their research findings.

Activity 1: (Video, 12 mins) Responsible Data Use: Citation and Credit.

Activity 2: How to cite data and software? This example from Dryad clearly shows how to cite the dataset that underpins a journal article as well as the article itself. Note that both citations include a Digital Object Identifier (DOI).

Activity 3: What to cite and why? See the ARDC guides for data and for software for more information.
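
If you keep dataset metadata in a structured form, a citation string can be generated from it rather than typed by hand. The sketch below assumes one common pattern (creators, year, title, optional version, publisher, DOI link); repositories and journals each have their own preferred style, so treat the layout, the `format_data_citation` helper and the example values as placeholders.

```python
from typing import Optional, Sequence


def format_data_citation(
    creators: Sequence[str],
    year: int,
    title: str,
    publisher: str,
    doi: str,
    version: Optional[str] = None,
) -> str:
    """Build a dataset citation string that ends with a resolvable DOI link."""
    parts = [f"{'; '.join(creators)} ({year}). {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)


# All values below are invented for illustration.
citation = format_data_citation(
    creators=["Researcher, A.", "Colleague, B."],
    year=2019,
    title="Example 7T MRI pilot dataset",
    publisher="Example Repository",
    doi="10.1234/example",
    version="1.0",
)
print(citation)
```

Including the version matters for reproducibility: a reader can then tell exactly which release of the dataset underpins the published findings.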

Acknowledgements

We acknowledge Chris Erdmann for reviewing the first version of this document, and Jose Manzano Patron for adding important resources to the third version of this document.

Pre-print

This document is also available via the Open Science Framework as a pre-print and is citable with the DOI 10.17605/OSF.IO/ZKJ4R, where versions of it in docx and odt have been saved.

References

  1. GO-FAIR https://www.go-fair.org/fair-principles/ accessed 13 May 2019
  2. Erdmann C. et al. (2019) Top 10 FAIR Data & Software Things https://www.go-fair.org/2019/02/20/top-10-fair-data-software-things-published/ doi:10.5281/zenodo.2555498, accessed 13 May 2019.
  3. Wilkinson, M. D. et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018 doi:10.1038/sdata.2016.18.
  4. Wilkinson, M. D. et al. (2017). Interoperability and FAIRness through a novel combination of Web technologies. PeerJ Computer Science 3:e110 doi:10.7717/peerj-cs.110.
  5. Coalition for Publishing Data in the Earth and Space Sciences (COPDESS) http://www.copdess.org/enabling-fair-data-project/, accessed 20 May 2019.
  6. Global Bioimaging News (1 April, 2019) EOSC-Life: developing an open collaborative space for digital biology in Europe http://www.eurobioimaging.eu/content-news/eosc-life-developing-open-collaborative-space-digital-biology-europe accessed 14 May 2019.
  7. ELIXIR Europe https://elixir-europe.org/, accessed 13 May 2019.
  8. EOSC https://www.eosc-hub.eu/, accessed 13 May 2019.
  9. CODATA News (27 Nov, 2018) Enabling FAIR Data Project and Commitment Statement http://www.codata.org/news/299/62/Enabling-FAIR-Data-Project-and-Commitment-Statement, accessed 13 May 2019.
  10. Australian National Data Service (ANDS) How to make your data FAIR https://www.ands.org.au/working-with-data/fairdata, accessed 18 May 2019.
  11. The Turing Way Community (2019) 7. Research Data Management https://the-turing-way.netlify.com/rdm/rdm.html https://doi.org/10.5281/zenodo.3233986, accessed 17 June 2019.
  12. LIBER (Europe’s Research Library Network) Valentino Cavalli (2018) “What is FAIR data?” https://libereurope.eu/blog/2018/07/13/fairdataconsultation/ accessed 10 June 2019.
  13. The Swiss National Science Foundation (SNSF) Matrix with their interpretation of the FAIR principles and how to apply them accessed 20 June 2019.
  14. Nature Genetics (2004) ‘Good citizenship’ or good business? https://doi.org/10.1038/ng1004-1025 accessed 17 May 2019.
  15. Scientific Data Journal https://www.nature.com/sdata/ accessed 17 May 2019.
  16. Nature Physics (2014) It’s good to share. https://doi.org/10.1038/nphys3033 accessed 17 May 2019.
  17. PLOS ONE. Data Availability. Acceptable Data-Sharing Methods https://journals.plos.org/plosone/s/data-availability#loc-acceptable-data-sharing-methods accessed 18 May 2019.
  18. FAIRSharing https://fairsharing.org/ accessed 18 May 2019.
  19. PLOS recommended repositories and data standards indexed on FAIRSharing (2017). https://fairsharing.org/recommendation/PLOS accessed 29 May 2019.
  20. eLIFE journal policies https://submit.elifesciences.org/html/elife_author_instructions.html#policies accessed 29 May 2019.
  21. FAIRsharing page for eLife recommended repositories and standards (2019) https://fairsharing.org/recommendation/eLifeRecommendedRepositoriesandStandards accessed 15 June 2019.
  22. Stall, S. et al., (2018) Data Sharing and Citations: New Author Guidelines Promoting Open and FAIR Data in the Earth, Space, and Environmental Sciences. Science Editor 41(3), 83-87. https://www.csescienceeditor.org/article/data-sharing-and-citations-new-author-guidelines-promoting-open-and-fair-data-in-the-earth-space-and-environmental-sciences/, accessed 20 June 2019.
  23. Stall, S. et al., (2019) Make scientific data FAIR. Nature 570, 27–29. doi:10.1038/d41586-019-01720-7, accessed 05 June 2019.
  24. Dijkers M. P.(2019) A beginner’s guide to data stewardship and data sharing. Spinal Cord 57, 169-182 https://doi.org/10.1038/s41393-018-0232-6, accessed 17 June 2019.
  25. The European Commission. H2020 Programme Guidelines on FAIR Data Management in Horizon 2020 v3.0 (2016) http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf, accessed 10 June 2019.
  26. Australian Government. (2016) 2016 National Research Infrastructure Roadmap. https://docs.education.gov.au/system/files/doc/other/ed16-0269_national_research_infrastructure_roadmap_report_internals_acc.pdf, accessed 15 June 2019.
  27. The European Commission, Final Report and Action Plan from the European Commission Expert Group on FAIR Data. (2018) Turning FAIR into Reality. https://ec.europa.eu/info/sites/info/files/turning_fair_into_reality_1.pdf, accessed 14 May 2019.
  28. Australian Research Council. (2017) ARC Open Access Policy—Version 2017.1 https://www.arc.gov.au/policies-strategies/policy/arc-open-access-policy-version-20171, accessed 10 May 2019.
  29. FAIR access. (2017) FAIR Access To Australia’s Research Outputs https://www.fair-access.net.au/fair-statement, accessed 10 May 2019.
  30. NHMRC has released its Research Quality Strategy. https://www.nhmrc.gov.au/research-policy/research-quality, accessed 30 May 2019.
  31. Australian Health Research Alliance (AHRA) (2018) Data Driven Healthcare Improvement https://www.wahtn.org/wp-content/uploads/2019/05/Data-Driven-Healthcare-Improvement.pdf, accessed 29 May 2019.
  32. COPDESS. (2018) The Enabling FAIR Data Commitment Statement. http://www.copdess.org/enabling-fair-data-project/commitment-to-enabling-fair-data-in-the-earth-space-and-environmental-sciences/, accessed 29 May 2019.
  33. Springer Nature. Research Data Policy. https://www.springernature.com/gp/authors/research-data-policy, accessed 13 May 2019.
  34. Wiley. Open Research https://authorservices.wiley.com/open-research/index.html, accessed 13 May 2019.
  35. Wiley. Data Sharing and citation. https://authorservices.wiley.com/author-resources/Journal-Authors/open-access/data-sharing-citation/index.html, accessed 13 May 2019.
  36. https://www.nature.com/sdata/policies/repositories
  37. https://www.nature.com/polopoly_fs/1.20541!/menu/main/topColumns/topLeftColumn/pdf/537138a.pdf
  38. https://www.nature.com/news/data-access-practices-strengthened-1.16370
  39. https://www.nature.com/articles/nn1009-1205
  40. https://dx.doi.org/10.1126%2Fscience.1213847
  41. https://osf.io/u28sb/
  42. http://neuromorpho.org/
  43. http://datasets.datalad.org/
  44. https://openneuro.org/
  45. https://github.com/OpenNeuroDatasets
  46. https://central.xnat.org/
  47. http://portal.brain-map.org/
  48. https://www.humanconnectome.org
  49. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3262165/
  50. http://loris.ca/
  51. http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/
  52. http://adni.loni.usc.edu/
  53. http://neuroinformatics.harvard.edu/gsp/
  54. https://brainbase.io/
  55. https://addictionresearch.nih.gov/abcd-study
  56. https://idr.openmicroscopy.org/about/
  57. https://www.openmicroscopy.org/omero/
  58. https://www.ebi.ac.uk/pdbe/emdb/
  59. https://www.cancerimagingarchive.net/
  60. https://www.smir.ch/
  61. https://scicrunch.org
  62. https://www.ebi.ac.uk/biostudies/
  63. https://ada.edu.au/
  64. https://datadryad.org
  65. https://dataverse.org/
  66. https://zenodo.org
  67. https://www.re3data.org/
  68. https://researchdata.ands.org.au
  69. https://www.ands.org.au/online-services/research-data-australia
  70. https://provide.openaire.eu/
  71. https://catalogue.eosc-portal.eu/
  72. https://www.google.com/publicdata/directory
  73. https://toolbox.google.com/datasetsearch
  74. https://openknowledgemaps.org/
  75. https://fairsharing.org/biodbcore/?q=imaging
  76. https://www.fair4health.eu/
  77. https://open-brain-consent.readthedocs.io/en/latest/anon_tools.html
  78. http://latanyasweeney.org/work/identifiability.html
  79. https://www.force11.org/datacitationprinciples
  80. http://www.dcc.ac.uk/resources/curation-reference-manual/completed-chapters/metadata
  81. https://ardc.edu.au/resources/working-with-data/metadata/
  82. http://obi-ontology.org/
  83. https://vocabs.ands.org.au/
  84. https://fairsharing.org/biodbcore/?q=imaging
  85. https://researchdata.ands.org.au/
  86. https://docs.openmicroscopy.org/bio-formats/6.1.0/about/index.html
  87. https://svi.nl/HuygensSoftware
  88. http://bids.neuroimaging.io/
  89. https://doi.org/10.1038/sdata.2016.44
  90. https://fairsharing.org/FAIRsharing.rd1j6t
  91. https://github.com/IDR/idr-metadata
  92. https://github.com/UTS-eResearch/datacrate
  93. https://git-annex.branchable.com/
  94. https://www.datalad.org/
  95. https://protocolexchange.researchsquare.com
  96. https://datadryad.org//pages/readme
  97. https://doi.org/10.1101/183814
  98. https://www.nature.com/articles/s41467-018-04976-1
  99. https://creativecommons.org/choose/
  100. https://choosealicense.com/
  101. https://ardc.edu.au/resources/working-with-data/citation-identifiers/data-citation/
  102. https://datadryad.org/resource/doi:10.5061/dryad.bh78sn5
  103. http://commons.esipfed.org/node/1428
  104. https://zenodo.org/record/1065991#.XO_SmIgzbIU
  105. https://www.nature.com/sdata/policies/repositories
  106. https://www.nature.com/polopoly_fs/1.20541!/menu/main/topColumns/topLeftColumn/pdf/537138a.pdf
  107. https://www.nature.com/news/data-access-practices-strengthened-1.16370
  108. https://www.nature.com/articles/nn1009-1205
  109. https://dx.doi.org/10.1126%2Fscience.1213847
  110. https://osf.io/u28sb/
  111. http://neuromorpho.org/
  112. http://datasets.datalad.org/
  113. https://openneuro.org/
  114. https://github.com/OpenNeuroDatasets
  115. https://central.xnat.org/
  116. http://portal.brain-map.org/
  117. https://www.humanconnectome.org
  118. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3262165/
  119. http://loris.ca/
  120. http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/
  121. http://adni.loni.usc.edu/
  122. http://neuroinformatics.harvard.edu/gsp/
  123. https://brainbase.io/
  124. https://addictionresearch.nih.gov/abcd-study
  125. https://idr.openmicroscopy.org/about/
  126. https://www.openmicroscopy.org/omero/
  127. https://www.ebi.ac.uk/pdbe/emdb/
  128. https://www.cancerimagingarchive.net/
  129. https://www.smir.ch/
  130. https://scicrunch.org
  131. https://www.ebi.ac.uk/biostudies/
  132. https://ada.edu.au/
  133. https://datadryad.org
  134. https://dataverse.org/
  135. https://zenodo.org
  136. https://www.re3data.org/
  137. https://researchdata.ands.org.au
  138. https://www.ands.org.au/online-services/research-data-australia
  139. https://provide.openaire.eu/
  140. https://catalogue.eosc-portal.eu/
  141. https://www.google.com/publicdata/directory
  142. https://toolbox.google.com/datasetsearch
  143. https://openknowledgemaps.org/
  144. https://fairsharing.org/biodbcore/?q=imaging
  145. https://www.fair4health.eu/
  146. https://open-brain-consent.readthedocs.io/en/latest/anon_tools.html
  147. http://latanyasweeney.org/work/identifiability.html
  148. https://www.force11.org/datacitationprinciples
  149. http://www.dcc.ac.uk/resources/curation-reference-manual/completed-chapters/metadata
  150. https://ardc.edu.au/resources/working-with-data/metadata/
  151. http://obi-ontology.org/
  152. https://vocabs.ands.org.au/
  153. https://fairsharing.org/biodbcore/?q=imaging
  154. https://researchdata.ands.org.au/
  155. https://docs.openmicroscopy.org/bio-formats/6.1.0/about/index.html
  156. https://svi.nl/HuygensSoftware
  157. http://bids.neuroimaging.io/
  158. https://doi.org/10.1038/sdata.2016.44
  159. https://fairsharing.org/FAIRsharing.rd1j6t
  160. https://github.com/IDR/idr-metadata
  161. https://github.com/UTS-eResearch/datacrate
  162. https://git-annex.branchable.com/
  163. https://www.datalad.org/
  164. https://protocolexchange.researchsquare.com
  165. https://datadryad.org//pages/readme
  166. https://doi.org/10.1101/183814
  167. https://www.nature.com/articles/s41467-018-04976-1
  168. https://creativecommons.org/choose/
  169. https://choosealicense.com/
  170. https://ardc.edu.au/resources/working-with-data/citation-identifiers/data-citation/
  171. https://datadryad.org/resource/doi:10.5061/dryad.bh78sn5
  172. http://commons.esipfed.org/node/1428
  173. https://zenodo.org/record/1065991#.XO_SmIgzbIU
  174. Rajasekar A.K., Moore R.W. (2001) Data and Metadata Collections for Scientific Applications. In: Hertzberger B., Hoekstra A., Williams R. (eds) High-Performance Computing and Networking. HPCN-Europe 2001. Lecture Notes in Computer Science, vol 2110. Springer, Berlin, Heidelberg doi:10.1007/3-540-48228-8_8, accessed 14 May 2019. See also https://www.nature.com/articles/nn1009-1205 and the Nature Editorial (2014) Data-access practices strengthened https://www.nature.com/news/data-access-practices-strengthened-1.16370, which links to other resources.

Comments + Extra resources

  1. This is an existing resource with a similar aim but a different target audience. We will try not to overlap with this effort, but it is good to keep it in mind. https://librarycarpentry.org/Top-10-FAIR/2018/12/01/biomedical-data-producers/
  2. For anything software-related, we can point to:
    1. https://softdev4research.github.io/4OSS-lesson/
  3. A collaborative Australian Characterisation informatics Strategy https://www.cvl.org.au/__data/assets/pdf_file/0003/1367085/A-Collaborative-Australian-Characterisation-Informatics-Strategy.pdf
  4. https://www.nature.com/articles/nmeth.2073?platform=oscar&draft=collection
  5. Biological Imaging Software Tools, 2012 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3659807/
  6. http://www.repronim.org/module-FAIR-data/ A Carpentries-style lesson about FAIR for the neurosciences
  7. Cool papers about FAIR https://www.nature.com/search?q=FAIR+principles&journal=&order=relevance
  8. http://www.rin.ac.uk/system/files/attachments/NESTA-RIN_Open_Science_V01_0.pdf The case study for neuroimaging wasn’t as interesting as I expected, but I am leaving it here in case anyone wants to read pages 24, 25 and 26.
  9. Super awesome resources: https://github.com/ohbm/hackathon2019/blob/master/Tutorial_Resources.md#documenting-projects-and-code 13 June 2019
  10. How FAIR are your data? by Jones, Sarah; Grootveld, Marjan. A checklist produced for use at the EUDAT (a European collaborative data infrastructure) summer school to discuss how FAIR the participants’ research data were and what measures could be taken to improve FAIRness https://zenodo.org/record/1065991#.XO_SmIgzbIU