Top 10 FAIR Data & Software Things

Nanotechnology


Sprinters:

Contributors:

Description:

This brief guide is based on the FAIR principles (findability, accessibility, interoperability, reusability) and describes data management in nanotechnology. It consists of ten steps (Things) covering data discovery and publication, practices and resources, along with ideas for activities.

Audience:

It is addressed to nanotechnology students, researchers, librarians and research support staff.

Acknowledgments:

OpenAIRE Research Data Management Task Force

Things

Nanotechnology is part of the broader scientific discipline of materials science. In particular, it is the area of research in science, engineering and technology that deals with manipulating and manufacturing materials at the nanometre scale (1 nm = 10⁻⁹ m). Nanotechnology is applied in diverse fields, including but not limited to physics, chemistry, engineering, energy, medicine, space, agriculture, and information and communication technologies.

“Some areas are more mature than others. For example, the entire semiconductor industry is now based on nanotechnology. Transistors with critical dimensions of 30nm are being built today and put together into circuits composed of over one billion devices, on one single chip, roughly the size of a thumbnail. In medicine, nanotechnology has led to the development of drug delivery vehicles and diagnostic devices for the detection and treatment of cancer. It is used in tissue engineering to repair damaged tissue and organs. There are advanced uses of nanotechnology in areas of storage, conversion, and renewal of energy – from LEDs to fuel cells to solar cells. Most high-tech information and communication devices use nanoscale production processes. Nanotechnology is also found in everyday consumer goods, such as stain-resistant fibers for clothes, tennis balls, running shoes, cosmetics, and numerous other day-to-day products” (Source: https://nanohub.org/about/nano)

Nanomaterials have different properties depending on how they are manufactured; however, the main uses of nanoscale materials, structures, devices and systems lie in electricity production and in the production, storage and transmission of binary information. Nanomaterials can be used in mobile devices and emerging flexible electronics, in detectors and sensors for security systems and health monitoring, and in everyday objects to make them more resilient and durable (e.g. baseball bats, tennis rackets).

Current research trends include exploring the surface properties of two-dimensional (2D) materials, such as graphene, for use in flexible electronics and valleytronics. 2D materials are atomically thin and exhibit remarkable surface properties, and their behaviour changes when layers are stacked at different relative angles.

In addition to materials manipulation and manufacturing, nanotechnology research covers complementary aspects such as risk assessment and governance, physicochemical characterization, (eco)toxicity testing, exposure, life-cycle impact assessment and decision support for the sustainability of nanotechnologies. For example, the European project GRACIOUS aims to develop a grouping framework that makes it more efficient to assess risk and obtain safety information for nanomaterials/nanoforms that differ in size, morphology and surface characteristics.

Activity 1 - Discussion

Thing 2 - Workflow and Methods

Nanotechnology research is based on hypotheses (theory) and experiments. Hypotheses propose ways of producing 2D structures, and simulations confirm or refute them [1]. Theoretical and experimental researchers work together to produce new materials, and their communication is bidirectional.

There are different methods for sample preparation and characterization. Preparation methods mainly focus on the thin-film deposition of single crystals and include, but are not limited to, molecular-beam epitaxy (MBE) and chemical vapor deposition (CVD). Characterization methods include Raman spectroscopy, transmission electron microscopy (TEM), atomic force microscopy (AFM) and nonlinear microscopy (second-harmonic generation (SHG) and two-photon photoluminescence (2p-PL)).

In addition, there are software packages for calculating the electronic, optical, mechanical and thermal properties of 2D materials from first-principles calculations. Widely used packages include Quantum ESPRESSO and Yambo, which are open source, and VASP and WIEN2k, which require a license.

An IATA (Integrated Approach to Testing and Assessment) usually defines which methods and hypotheses can be used for a given purpose.

Activity 1 - Discussion
Based on your field of application and your experience, what kind of workflows seem to work best? Why?

Thing 3 - Data types, outputs and formats

As noted above, nanotechnology is multidisciplinary by nature. It can be applied in many disciplines, so the data produced by nanotechnology research range from simulation outputs to chemical data and beyond. Data types include, but are not limited to:

  1. chemical data on intrinsic (measured) properties: aspect ratio, chemical composition, size, surface area, surface properties, coating
  2. chemical data on extrinsic properties: how particles interact with their environment
  3. crystallographic data on the real-space positions of the atoms in 2D crystals such as graphene, transition metal dichalcogenides, hexagonal boron nitride and black phosphorus
  4. data on stacked structures made of the same (homostructures) or different (heterostructures) 2D materials, such as the number of layers and their relative orientation (twist angle, moiré patterns)
  5. output data from density functional theory (DFT) calculations, e.g. band structures, direct and indirect band gaps, excitonic resonances
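The twist angle and the moiré pattern in item 4 are linked by simple geometry: for two identical hexagonal layers with lattice constant a twisted by a small angle θ, the moiré period is λ = a / (2 sin(θ/2)). The sketch below, with a hypothetical function name, computes this derived quantity; it assumes identical lattices (no lattice mismatch, so it does not cover most heterostructures) and uses about 0.246 nm as an input value for graphene.

```python
import math


def moire_period(lattice_constant_nm: float, twist_deg: float) -> float:
    """Moiré period for two identical hexagonal layers twisted by twist_deg.

    Uses lambda = a / (2 * sin(theta / 2)); lattice mismatch is ignored,
    so this does not apply to heterostructures of different materials.
    """
    if not 0 < twist_deg <= 60:
        raise ValueError("twist angle must be in (0, 60] degrees")
    theta = math.radians(twist_deg)
    return lattice_constant_nm / (2 * math.sin(theta / 2))


# Assumed graphene lattice constant of about 0.246 nm.
for angle in (1.1, 5.0, 30.0):
    print(f"{angle:5.1f} deg -> {moire_period(0.246, angle):6.2f} nm")
```

Recording both the twist angle and the lattice constant lets others recompute such derived values instead of relying only on reported numbers.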

Open, standardised file formats are essential for data access because their specifications are freely available, so anyone can write software to open and read the files. The most common file formats in nanotechnology include CIF [2] and UPF [3].
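Because CIF is a documented plain-text format, even a few lines of code can read simple `_tag value` pairs from it. The sketch below is deliberately minimal and is not a full CIF parser: it ignores `loop_` tables and multi-line (semicolon-delimited) fields, and the sample content is illustrative only. For real work, a dedicated CIF library is the safer choice.

```python
# Illustrative CIF content (graphene-like cell with an assumed 15 Å vacuum spacing).
SAMPLE_CIF = """\
data_graphene_example
_cell_length_a    2.46
_cell_length_c    15.0
_cell_angle_gamma 120
_symmetry_space_group_name_H-M 'P 6/m m m'
loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
C1 0.0 0.0 0.0
C2 0.33333 0.66667 0.0
"""


def read_cif_tags(text: str) -> dict[str, str]:
    """Collect single-line `_tag value` pairs from CIF text (minimal sketch)."""
    tags: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Only lines starting with "_" carry tags; blanks, comments,
        # data_ names and loop rows are skipped.
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        # A bare tag with no value is a loop_ column header: skip it.
        if len(parts) == 2:
            tags[parts[0]] = parts[1].strip().strip("'\"")
    return tags


tags = read_cif_tags(SAMPLE_CIF)
print(float(tags["_cell_length_a"]), tags["_symmetry_space_group_name_H-M"])
```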

Activity 1 - Defining your data discussion

Activity 2 - Other uses for your data
Would your data be of use in any other research?

Thing 4 - Describing data: Metadata

Metadata is data about data: an essential set of information, in a machine-readable format, describing scientific outputs that may be physical or digital objects. Metadata can be of different types depending on its intended use. The most common type, which enables discovery and identification, is descriptive metadata. Descriptive metadata contain the key information needed to search for and find a given scientific output, e.g. its title, author/creator, abstract and keywords. Metadata may also describe a service or a scientific instrument.
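A descriptive metadata record is easiest to reuse when it is machine-readable. The sketch below stores one as JSON and checks that the descriptive fields mentioned above are filled in. The field names and the dataset are hypothetical and do not follow any particular metadata schema; the ORCID iD is ORCID's documented example identifier.

```python
import json

# Hypothetical descriptive metadata record; field names are illustrative,
# not taken from a specific metadata standard.
record = {
    "title": "Raman spectra of twisted bilayer graphene (example)",
    "creators": [{"name": "Carberry, Josiah", "orcid": "0000-0002-1825-0097"}],
    "abstract": "Example record showing descriptive metadata fields.",
    "keywords": ["graphene", "Raman spectroscopy", "2D materials"],
    "identifier": None,  # to be filled with a PID (e.g. a DOI) on deposit
}

REQUIRED_FIELDS = ("title", "creators", "abstract", "keywords")


def missing_fields(rec: dict) -> list[str]:
    """List required descriptive fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not rec.get(field)]


print(json.dumps(record, indent=2))
print("missing:", missing_fields(record))
```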

Depending on the area of focus, nanotechnology has a few metadata standards, vocabularies and ontologies that facilitate standardised interpretation.

“NeXus is an international standard for the storage and exchange of neutron, x-ray, and muon experiment data. The structure of NeXus files is extremely flexible, allowing the storage of both simple data sets, such as a single data array and its axes, and highly complex data and their associated metadata, such as measurements on a multi-component instrument or numerical simulations. NeXus is built on top of the container format HDF5, and adds domain-specific rules for organizing data within HDF5 files in addition to a dictionary of well-defined domain-specific field names.” [4]

“ISA-TAB-Nano specifies the format for representing and sharing information about nanomaterials, small molecules and biological specimens along with their assay characterization data (including metadata, and summary data) using spreadsheet or TAB-delimited files.” [5]

NanoParticle Ontology “represents the basic knowledge of physical, chemical and functional characteristics of nanotechnology as used in cancer diagnosis and therapy.” [6]
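ISA-TAB-Nano, described above, relies on spreadsheet or TAB-delimited files, which standard tools can read without special software. The sketch below parses a small TAB-delimited table with Python's `csv` module and converts numeric columns, treating empty cells as missing data. The column names and values are hypothetical and are NOT taken from the ISA-TAB-Nano specification.

```python
import csv
import io

# Hypothetical TAB-delimited assay table; column names are illustrative only.
# An empty cell marks missing data.
ASSAY_TSV = (
    "Sample Name\tMaterial\tMean Diameter (nm)\tZeta Potential (mV)\n"
    "S1\tTiO2\t21.5\t-30.2\n"
    "S2\tAg\t\t-18.7\n"
)

NUMERIC_COLUMNS = ("Mean Diameter (nm)", "Zeta Potential (mV)")


def load_assay(text: str) -> list[dict]:
    """Parse TAB-delimited text into dicts, converting numeric columns to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        for column in NUMERIC_COLUMNS:
            value = row[column].strip()
            row[column] = float(value) if value else None  # None = missing
        rows.append(row)
    return rows


for row in load_assay(ASSAY_TSV):
    print(row)
```

Putting units in the column headings, as here, is one simple way to keep measurements interpretable when the file travels without its documentation.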

Besides metadata records describing datasets, documentation is equally essential for data and code. At a minimum, a README file helps ensure that your data can be correctly interpreted and reanalysed by others. A plain-text README file should contain the following information:

  1. for each filename, a short description of the data it contains, optionally relating it to the tables, figures or sections of the accompanying publication
  2. for tabular data: definitions of column headings and row labels, data codes (including missing data) and measurement units
  3. any data processing steps that may affect interpretation of results, especially if they are not described in the publication or provenance file
  4. a description of associated datasets stored elsewhere, if applicable
  5. whom to contact with questions

Read more: https://datadryad.org/pages/readme.
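A README template can be generated so that every dataset starts with the same sections. The sketch below builds a plain-text skeleton covering the items listed above; the section titles, file names and contact address are hypothetical placeholders, and the TODO lines are meant to be replaced by hand.

```python
def readme_skeleton(files: dict[str, str], contact: str) -> str:
    """Build a plain-text README skeleton covering the recommended sections."""
    lines = ["README", "", "FILES"]
    for name, description in files.items():
        lines.append(f"  {name}: {description}")
    lines += [
        "",
        "TABULAR DATA",
        "  Column headings and row labels: TODO",
        "  Data codes (including missing data): TODO",
        "  Measurement units: TODO",
        "",
        "PROCESSING STEPS",
        "  TODO: steps not described in the publication or provenance file",
        "",
        "RELATED DATASETS STORED ELSEWHERE",
        "  TODO (or 'none')",
        "",
        "CONTACT",
        f"  {contact}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical file names and contact address, for illustration only.
print(readme_skeleton(
    {"raman_S1.csv": "Raman spectra of sample S1 (Figure 2 in the paper)",
     "afm_S1.tiff": "AFM height map of sample S1"},
    "data-contact@example.org",
))
```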

Activity 1
Go to the Research Data Alliance Metadata Directory and search for a metadata standard relevant to the focus of your nanotechnology work. Is there one relevant to your focus?

Activity 2
Have a look at the metadata record for the 7T Magnetom instrument at Research Data Australia. It contains simple but important public metadata and a persistent identifier. How could this record be enhanced to assist you as a researcher?

Activity 3
Go to OpenAIRE and search with a keyword about your research interests. Which nanotechnology projects do you find? How many publications, datasets or software items did you get? What are some of the projects associated with these outputs?

Thing 5 - Identifiers

Persistent identifiers (PIDs) uniquely identify objects, people, organisations and activities, and help ensure that a scientific output remains accessible even when the URL of the website hosting it changes. PIDs can be assigned to research outputs, including publications, data and software/code, as well as to researchers, samples, organisations and projects. A PID may point to a metadata record describing an item rather than to the item itself.

The repository in which you deposit data or software should use a PID service and assign PIDs to its outputs, in line with the FAIR principles. PIDs can be resolved with tools such as the DOI Resolver.
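DOIs are often written in different forms ("doi:10...", "http://dx.doi.org/10...", or bare). The sketch below normalises such strings and builds the corresponding https://doi.org/ resolver URL, which is what a DOI resolver turns into the current location of the object. The syntax check is a simplified assumption (prefix "10." followed by 4 to 9 digits, a slash and a non-empty suffix), not the full DOI specification; the example DOIs are taken from the Notes of this guide.

```python
import re
from urllib.parse import quote

# Simplified DOI syntax check: "10." + 4-9 digit registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/",
                  "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def doi_to_url(doi: str) -> str:
    """Normalise a DOI string and return its https://doi.org/ resolver URL."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in URLs, keeping "/" intact.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("http://doi.org/10.5281/zenodo.56720"))
print(doi_to_url("doi:10.25504/FAIRsharing.njqq5b"))
```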

PIDs used in research include:

To learn more about persistent identifiers visit Go-FAIR F1 Principle.

Activity 1
Search for papers and data in nanotechnology and observe the use of ORCID iDs. Are they widely used in your community? Why do you think that is?
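When collecting ORCID iDs from papers and data records, as in the activity above, it helps to know that the last character of an ORCID iD is a check digit computed with the ISO 7064 MOD 11-2 algorithm, so many typing errors can be caught offline. The sketch below implements that check; the function names are illustrative, and a valid checksum does not prove that the iD is registered.

```python
def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the iD (with or without the https://orcid.org/ prefix) has a valid checksum."""
    compact = orcid.strip().removeprefix("https://orcid.org/").replace("-", "")
    if len(compact) != 16 or not all(c in "0123456789" for c in compact[:15]):
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # ORCID's documented example iD
print(is_valid_orcid("0000-0002-1825-0098"))  # same iD with the last digit altered
```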

Activity 2
(Read 5 min) OpenAIRE/FREYA/ORCID guide for researchers “How can identifiers improve the dissemination of your research outputs?”

Activity 3
(Watch 4 min) Six Ways to Make Your ORCID iD Work for You! If you already have an ORCID iD, watch this video https://www.youtube.com/watch?v=h92bUZ5T_vA to learn how to link publications to your ORCID profile.

Activity 4
(Discuss in pairs 5 min) The Joint Declaration of Data Citation Principles from FORCE11.

Thing 6 - Interoperability

Interoperability enables data and metadata to flow between different systems through standard vocabularies and references to other data and metadata. This is why standardised protocols and formats are important. In nanotechnology, you may find protocols for:

In materials science, as researchers create independent materials databases, much can be gained from retrieving data from several databases at once. However, retrieval is difficult when each database has its own API, which is common. To address this challenge, initiatives such as the Open Databases Integration for Materials Design (OPTiMaDe) consortium aim to make materials databases interoperable by developing a common REST API.
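With a common REST API, the same query can be sent to every compliant database. The sketch below only builds a query URL for a structures endpoint, using the OPTIMADE filter style (`elements HAS ALL ...`, `nelements=...`) as we understand it from the specification; the provider base URL is hypothetical, no network request is made, and the OPTIMADE specification should be checked for the exact endpoint and filter grammar before relying on it.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def optimade_structures_url(base_url: str, elements: list[str],
                            nelements: Optional[int] = None) -> str:
    """Build a /v1/structures query URL filtering on chemical elements."""
    clauses = ["elements HAS ALL " + ",".join(f'"{el}"' for el in elements)]
    if nelements is not None:
        clauses.append(f"nelements={nelements}")
    query = urlencode({"filter": " AND ".join(clauses)})
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


# Hypothetical provider base URL; the same query could target any compliant database.
url = optimade_structures_url("https://example.org/optimade", ["Mo", "S"], nelements=2)
print(url)
print(parse_qs(urlsplit(url).query)["filter"][0])  # decoded filter string
```

Only the base URL changes between providers, which is exactly the benefit a shared API brings over database-specific clients.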

Activity 1
Protocol Exchange from Nature Protocols is an open repository of community-contributed protocols. Search for nanotechnology protocols publicly shared.

Thing 7 - Licenses and provenance of data for reusability

Licenses grant researchers other than the owner specific permissions to use a scientific output, such as a publication, dataset or software, provided they follow the requirements tied to the license each time.

OpenAIRE Guide for Researchers “How do I license my research data?” provides information about licenses for research data and how to apply them. You could also check OpenAIRE Guide for researchers “Can I reuse someone else’s research data?” and “How do I know if my research data is protected?”.

You may find useful information about specific licenses for data and software/code below:

Some tools which assist selection processes when applying licenses to your outputs are:

One of the key challenges in materials science is the need to automatically prepare, execute and monitor workflows of calculations, and to transparently retrieve and store the results in a format that is easy to browse and query. AiiDA addresses this challenge: its design is based on directed graphs that track the provenance of data and ensure preservation and searchability. In addition, complex sequences of calculations can be encoded as scientific workflows. AiiDA's sharing capabilities have greatly helped scientific repositories such as Materials Cloud.
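The idea of tracking provenance with a directed graph can be sketched in plain Python: each result records the inputs it was derived from, and walking the graph backwards recovers everything a result depends on. This is NOT AiiDA's API (AiiDA provides its own storage and query tools); the node names below are hypothetical.

```python
from collections import deque

# Hypothetical provenance records: each node lists the nodes it was derived from.
PROVENANCE = {
    "structure.cif": [],
    "pseudo.UPF": [],
    "scf_calc": ["structure.cif", "pseudo.UPF"],
    "charge_density": ["scf_calc"],
    "bands_calc": ["charge_density", "structure.cif"],
    "band_structure.json": ["bands_calc"],
}


def ancestors(graph: dict[str, list[str]], node: str) -> set[str]:
    """Return every node upstream of `node` (breadth-first traversal)."""
    seen: set[str] = set()
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(graph.get(current, []))
    return seen


print(sorted(ancestors(PROVENANCE, "band_structure.json")))
```

Because every input is recorded, the traversal reaches the original structure and pseudopotential files, which is what makes a result reproducible.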

Activity 1
Select a dataset or code you have been working on lately and choose a license from the list. What did you choose? Why? Discuss licensing and sharing with your team members.

Activity 2
Go to AiiDA and load a calculation that you have recently created or that you are currently working on. What do you see? Make some changes. Can you find the previous version? Can you find the first set of calculations that you did in this sequence?

Thing 8 - Services and tools to store, publish and analyse data

Open science in nanotechnology means open resources, databases and other platforms. Scientific outputs are often hosted in public, interoperable infrastructures, ideally freely available but sometimes at a small, reasonable cost that covers creating, maintaining and publishing the data. Nanotechnology data may therefore be published and stored on a laboratory website or repository, an institutional website or repository, or subject-specific databases, servers or repositories, or they may be deposited alongside the journal article they relate to.

Indicative resources:

Registries:

Hubs:

Databases:

Repositories:

Several platforms integrate centralized data repositories with software frameworks used to compute data such as:

Educational Resources:

Activity 1 - re3data
Go to re3data and search for a data repository relevant to your area and nanotechnology focus. What additional information does re3data provide about the repository and its functions? Is it useful to you? Is it relevant to the FAIR principles?

Activity 2
Looking after your data: Where do you usually store your data? Do you see any major differences from the repositories listed in re3data, for example?

Activity 3
Archiving your data: What data should be kept or destroyed after the end of your project? For how long should data be kept after the end of your project? Where will the data you keep be archived? When will data be moved into the archive? Who is responsible for moving data to the archive and maintaining them?

Activity 4
Sharing your data: Who else has a right to see or use this data during the project? What data should or shouldn’t be shared openly and why? Who should have access to the final dataset and under what conditions? How will you share your final dataset?

Thing 9 - Nanotechnology and High Performance Computing (HPC)

In physics, there are two methods used for manipulation of nano-materials:

The most popular method is first-principles calculation; however, it is computationally intensive and best run in HPC environments.
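A rough estimate shows why first-principles calculations outgrow a workstation. Conventional Kohn-Sham DFT is commonly cited as scaling roughly with the cube of the number of atoms, so doubling the system size costs about eight times more. The sketch below treats the exponent as an adjustable assumption; real costs also depend on the code, basis set, k-point sampling and parallel efficiency.

```python
def relative_cost(n_atoms: int, reference_atoms: int, exponent: float = 3.0) -> float:
    """Estimated cost relative to a reference system, assuming cost ~ N**exponent."""
    return (n_atoms / reference_atoms) ** exponent


# Example: a 10-atom reference cell versus larger supercells
# (such as those needed for moiré or defect studies).
for n in (20, 40, 100):
    print(f"{n:4d} atoms -> ~{relative_cost(n, 10):,.0f}x the 10-atom cost")
```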

In Europe, there are initiatives that aim to tackle big data and intensive analysis in a uniform way, such as HELIX (Hellenic Data Service) in Greece, a convergence e-infrastructure with virtual machines, cloud computing and HPC capabilities that can reduce expected analysis times to just a few minutes. The European Open Science Cloud (EOSC), which is currently under development, will eventually become a complete and trusted environment of such services and infrastructures serving the whole research lifecycle.

Activity 1
Discussion: Have you experienced challenges related to the time your data analysis takes? How did you overcome them?

Thing 10 - More best practices

These are some additional best practices that improve the reusability of data and software by others, including your future self when accessing data generated long ago. When data cannot be shared fully openly, consider adding terms and conditions for access. For sharing data, consult Thing 8 - Services and tools to store, publish and analyse data. To get started, some issues to consider are:

Data structures: Keep consistent file and folder naming conventions across linked projects.
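One way to keep file names consistent across linked projects is to generate them from a fixed pattern instead of typing them by hand. The sketch below uses a hypothetical convention, `YYYYMMDD_project_sample_method_vNN.ext`, with a zero-padded version number that also helps manage file versions; the project, sample and method values are illustrative.

```python
import re
from datetime import date
from typing import Optional


def data_filename(project: str, sample: str, method: str, version: int,
                  ext: str, on: Optional[date] = None) -> str:
    """Build a name like YYYYMMDD_project_sample_method_vNN.ext (hypothetical convention)."""
    on = on or date.today()
    parts = [on.strftime("%Y%m%d"), project, sample, method, f"v{version:02d}"]
    # Replace spaces and other unsafe characters with hyphens so "_" stays a field separator.
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "-", part).strip("-") for part in parts]
    return "_".join(cleaned) + "." + ext.lower().lstrip(".")


print(data_filename("GRAPH2D", "S 01", "Raman", 3, ".CSV", on=date(2024, 5, 17)))
# -> 20240517_GRAPH2D_S-01_Raman_v03.csv
```

Dates in YYYYMMDD form sort chronologically, and zero-padded versions sort correctly up to v99.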

Containerisation: For data processing pipelines

Activity 1
How do you structure and name your folders and files? How do you manage different versions of your files? What additional information is required to understand the data?

Activity 2 - Discussion + Action
What Can You Do?

Activity 3 - Questions
Go through the questions from the Horizon 2020 guide to creating a FAIR Data Management Plan and see how many you can already answer. Then check the NanoCommons Data Management Plan and compare it with your responses. Do you see any differences?

Recommended extra reading: “Ten Simple Rules for Creating a Good Data Management Plan”, “Ten Simple Rules for Reproducible Computational Research” and “Ten principles for machine-actionable data management plans”. These papers will help you connect all the concepts you have learned so far.

Notes

  1. Check also p. 44, “B. Life Cycle of Nanomaterials”, in CODATA-VAMAS Working Group on the Description of Nanomaterials & Rumble, J. (2016, June 30). Uniform Description System for Materials on the Nanoscale, Version 2.0. Zenodo. http://doi.org/10.5281/zenodo.56720

  2. Crystallographic Information Framework (CIF) “A well-established standard file structure for the archiving and distribution of crystallographic information, CIF is in regular use for reporting crystal structure determinations to Acta Crystallographica and other journals. Sponsored by the International Union of Crystallography, the current standard dates from 1997. As of July 2011, a new version of the CIF standard is under consideration.” More information, tools and use cases are available here: https://rd-alliance.github.io/metadata-directory/standards/cif-crystallographic-information-framework.html 

  3. UPF stands for “Unified Pseudopotential Format” and is used to describe pseudopotentials (approximations that replace the effect of the nucleus and core electrons on the valence electrons in simulations of, e.g., crystal structures described in a CIF file). UPF is widely used by the nanotechnology community. Recent developments of the format show efforts to convert UPF to XML: http://www.quantum-espresso.org/pseudopotentials/unified-pseudopotential-format

  4. https://rd-alliance.github.io/metadata-directory/standards/ 

  5. https://doi.org/10.25504/FAIRsharing.njqq5b 

  6. https://doi.org/10.25504/FAIRsharing.vy0p71