Thing 1: What is musical data?
Thing 2: Catalogues and repositories of musical data
Thing 3: Metadata
Thing 4: Persistent identifiers
Thing 5: Standards and protocols
Thing 6: Encoding standards
Thing 7: Ontologies
Thing 8: Linked Data
Thing 9: Licensing and provenance
Thing 10: FAIR policies
This is a brief guide to ten topics relevant to understanding how the FAIR data principles apply to music research. It includes short activities designed for self-paced learning or for use as training ideas. The aim of this document is to help those who wish to find, publish or reuse musical data in adherence with the FAIR data principles.
As noted in this paper on Music Information Retrieval, music is multirepresentational, multicultural, multiexperiential, and multidisciplinary. Musical data therefore encompasses a wide range of types and formats, including symbolic representations such as scores, audio recordings, images of manuscripts, and information about works, performances and composers. In many cases, researchers may not think of such resources as ‘data’, instead referring to primary or secondary sources, reference works, databases, notes or annotations.
Activity 1: Discussion
Activity 2: Go to Digital Resources in Musicology (DRM) and review the top level categories. Which of these are relevant to your work?
Musical data is often organised into meaningful collections: groups of musical resources (recordings, scores, transcriptions, biographies, etc.) that make sense together and revolve around a central topic (a period, musician, genre, instrument, culture, etc.). These collections (or catalogues/repositories) are usually hard to find. This is why libraries have index cards, and databases have metadata: so users can browse and search them in order to reach the data they need.
Activity 1: Discussion
Activity 2: In this paper on Characterising the Landscape of Musical Data on the Web, the authors tried to find, and describe, as many Web music catalogues as possible. These are published in the musoW (Musical Data on the Web) registry. Are catalogues and collections of your domain covered in this table? If not, please add them at the end of the table.
Metadata can be defined as data about data. Metadata commonly describe characteristics such as format, contents, creator and publication date. This information is often captured using a metadata schema, which is designed to capture a common set of information in a structured manner. Whether you are searching for data or depositing a dataset, remember that the quality of the metadata captured influences how easily data can be found and potentially reused. In short, richer metadata increases findability.
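As a rough illustration of what structured metadata can look like, the sketch below builds a description of a collection using a few schema.org properties (name, creator, datePublished, encodingFormat, license). The helper name describe_collection and all the values are hypothetical placeholders; a real record would follow whatever schema the hosting catalogue or repository expects.

```python
import json


def describe_collection(name, creator, date_published, encoding_format, license_url):
    """Build a schema.org Dataset description as a JSON-LD-style dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "creator": {"@type": "Organization", "name": creator},
        "datePublished": date_published,
        "encodingFormat": encoding_format,
        "license": license_url,
    }


# All values below are hypothetical placeholders, not a real collection.
record = describe_collection(
    name="Example Lute Tablature Collection",
    creator="Example Music Library",
    date_published="2020-01-01",
    encoding_format="application/pdf",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

print(json.dumps(record, indent=2))
```

The more of these fields are filled in accurately, the more ways a search service has to match the collection to a query.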
Activity 1: Choose one catalogue from Thing 2. This could be, for example, your favourite PDF score collection. Then, browse schema.org and look for properties you could use to accurately describe that collection’s metadata (author, time, genre, location, etc.). Most of them will be metadata describing datasets. Are there any music-specific properties in schema.org (or elsewhere) you would use?
Activity 2: Go to Google Dataset Search and try to find the dataset you chose from Thing 2. Is it there? Why do you think it is (or is not)?
Persistent identifiers are long-lasting references to a resource, such as a document, webpage, file, or music score. They are designed to identify such resources uniquely and to be actionable: a protocol can typically use the identifier to retrieve the content it represents (see Thing 5).
There are two important issues about persistent identifiers and musical data: object level identification, and persistent identifier providers. Object level identification refers to the granularity and level of detail for the object for which the identifier is being created. Does the identifier represent a whole musical collection, an item inside that collection, a score within that item, a page of that score, a note within that page, an annotation? Persistent identifier providers refer to the institutional service that generates the identifiers and ensures that they will function permanently. Regular URLs (web addresses) can perform this role with adequate maintenance; but institutionally maintained identifiers (such as DOIs and PURLs) typically do this maintenance externally.
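The question of granularity can be made concrete with a small sketch. It assumes a hypothetical identifier provider at https://example.org/id and a hypothetical helper mint_identifier; real providers such as DOI or PURL services have their own syntax and registration steps, which are not modelled here.

```python
BASE = "https://example.org/id"  # hypothetical identifier provider


def mint_identifier(*levels):
    """Join hierarchy levels into one identifier, from collection down to finer objects."""
    if not levels:
        raise ValueError("at least one level is required")
    return "/".join([BASE, *levels])


# The same resource identified at three levels of detail (all names are made up).
collection = mint_identifier("lute-collection")
item = mint_identifier("lute-collection", "ms-12")
page = mint_identifier("lute-collection", "ms-12", "page-3")

for identifier in (collection, item, page):
    print(identifier)
```

Deciding how deep this hierarchy goes (down to a page, a note, or an annotation) is a design choice that determines what others can later cite and link to.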
Activity 1: Discussion
We have seen so far that stable, long-lasting identifiers are useful to name and find musical resources. But how can we use these identifiers to access the data they represent? Accessing the data behind identifiers is what we do, for example, when we physically go to a designated library location, or when we type a URL into our Web browser and press Enter. Interestingly, these things can also be done by automated agents (robots, programs). Both humans and machines need a standard, open, free, universally understood and authenticated protocol (that is, a systematic sequence of steps) to perform this access. On the Web, URLs are preferred for identifying (musical) things, and the protocol to access the content they represent is the Hypertext Transfer Protocol (HTTP). Although it was initially designed to transport HTML pages from servers to Web browsers, HTTP can be used to access Web data of any kind.
Run curl -L -H 'Accept: text/turtle' http://dbpedia.org/resource/The_Beatles in a terminal (substituting your chosen band or song), and observe the results. What are the differences with respect to what was shown in the browser? What are the similarities?
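The same kind of request can be sketched in Python using only the standard library. The helper name build_request is hypothetical; the sketch prepares the request and prints what would be sent, and the lines that actually contact the server are left commented out because they need network access.

```python
import urllib.request


def build_request(resource_url, media_type):
    """Prepare an HTTP GET request that asks the server for a given representation."""
    return urllib.request.Request(resource_url, headers={"Accept": media_type})


request = build_request("http://dbpedia.org/resource/The_Beatles", "text/turtle")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))

# Uncomment to send the request (needs network access); urlopen follows redirects,
# similar to curl's -L flag.
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8")[:500])
```

Changing the Accept value (for example to text/html) asks the server for a different representation of the same identifier, which is why a browser and a program can receive different content from one URL.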
Apart from identifying resources uniquely, a key aspect of sharing them is to make them readable and actionable by other users and applications. This is valid for any relevant resource that is published on the Web. However, a large number of music activities depend on some musical content. When reusing musical objects from the Web, a key problem is the compatibility of the format with the target tool. Therefore, a number of standards have been developed by the community of researchers and practitioners to represent music scores and make them usable across applications. These include (but are not limited to):
Activity 1: Collect scores of the same song in different standards and compare them: do they include the same information?
Activity 2: Choose a tool/application you are familiar with and check which formats are supported and which ones are not. Request the missing feature from the organisation or community that supports the development.
Activity 3: Once a score is encoded according to a particular standard, how is it rendered? One tool for rendering MEI files is Verovio. Go to Verovio’s MEI viewer at https://www.verovio.org/mei-viewer.xhtml and use the navigation menu to turn pages, zoom in/out and switch between examples. Verovio is used in a range of projects, including digital editions of Beethoven and Mozart.
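To give a feel for what an encoded score looks like to a program, the sketch below reads a tiny hand-written MEI-style fragment with Python's standard XML parser. The fragment is deliberately incomplete (a single layer with three notes, not a valid MEI document that Verovio could render), and the helper name list_notes is hypothetical.

```python
import xml.etree.ElementTree as ET

MEI_NS = {"mei": "http://www.music-encoding.org/ns/mei"}

# A hand-written, deliberately incomplete fragment: one layer with three notes.
FRAGMENT = """
<layer xmlns="http://www.music-encoding.org/ns/mei" n="1">
  <note pname="c" oct="4" dur="4"/>
  <note pname="e" oct="4" dur="4"/>
  <note pname="g" oct="4" dur="2"/>
</layer>
"""


def list_notes(mei_xml):
    """Return (pitch name, octave, duration) for each note element in the fragment."""
    root = ET.fromstring(mei_xml)
    return [
        (note.get("pname"), note.get("oct"), note.get("dur"))
        for note in root.iterfind(".//mei:note", MEI_NS)
    ]


for pname, octave, dur in list_notes(FRAGMENT):
    print(pname, octave, dur)
```

Because pitch, octave and duration are explicit attributes, the same file can drive rendering, search or analysis, which is the practical benefit of a shared encoding standard.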
Ontologies are representations of concepts and their relations according to the meaning they have in a specific community. Standard formats like the ones discussed above have the purpose of encoding information in a symbolic form. However, they generally lack details about the meaning of the symbols used, which is specified externally, usually in a documentation manual. Ontologies aim at expressing the meaning of the symbols used with a high degree of formalisation.
Ontologies are defined using Semantic Web standards: URIs, RDF, and OWL. Web ontologies can be useful to publish Linked Data on the Web (see below). Domain ontologies are developed with the purpose of representing concepts which belong to a specific part of the world, such as biology, social media, … or music!
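As a minimal sketch of how an ontology term is used in practice, the example below states two facts as RDF triples and prints them in N-Triples syntax without any external library. It assumes the Music Ontology class mo:MusicGroup and the FOAF property foaf:name as illustrative vocabulary choices; the helper to_ntriples is hypothetical and handles only URIs and plain string literals.

```python
MO = "http://purl.org/ontology/mo/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = [
    ("http://dbpedia.org/resource/The_Beatles", RDF_TYPE, MO + "MusicGroup"),
    ("http://dbpedia.org/resource/The_Beatles", FOAF + "name", '"The Beatles"'),
]


def to_ntriples(statements):
    """Serialise (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for subject, predicate, obj in statements:
        # Objects that start with a quote are literals; everything else is a URI.
        obj_term = obj if obj.startswith('"') else f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {obj_term} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

The class URI is what carries the shared meaning: any application that knows the ontology can interpret the first statement as "this resource is a music group" without consulting a separate manual.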
Music ontologies vary from metadata standards to sophisticated schemas to represent music-related objects. Some examples are:
Activity 1: Find a music ontology on the Web. Which aspect of music does it describe? A starting point for your search could be Linked Open Vocabularies.
Activity 2: Find projects using music ontologies. What is the ontology used for? For example, have a look at JazzCats and its data structures at http://jazzcats.cdhr.anu.edu.au/documentation/. Which classes and properties are from existing ontologies?
Linked Data is a way of representing structured data using the Resource Description Framework (RDF), so that multiple datasets can be easily connected and queried together via the SPARQL Protocol and RDF Query Language (SPARQL). The Web community has so far linked more than 1,200 datasets and 200 billion statements.
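A SPARQL query is usually sent to an endpoint over HTTP. The sketch below only builds the request URL, using the 'query' parameter defined by the SPARQL protocol; it assumes DBpedia's public endpoint at https://dbpedia.org/sparql as an example target, and the helper name sparql_url is hypothetical. No request is actually sent.

```python
from urllib.parse import urlencode

ENDPOINT = "https://dbpedia.org/sparql"  # assumed example endpoint

# Ask for up to ten property/value pairs describing one resource.
QUERY = """
SELECT ?property ?value WHERE {
  <http://dbpedia.org/resource/The_Beatles> ?property ?value .
} LIMIT 10
"""


def sparql_url(endpoint, query):
    """Encode a SPARQL query as a GET URL using the protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = sparql_url(ENDPOINT, QUERY)
print(url)
```

Pasting the printed URL into a browser, or fetching it with an HTTP client, would return the matching statements; replacing the resource URI with another band or song changes what is described.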
Activity 2: Awesome Semantic Web enumerates a large number of Linked Data tools. Which of them do you think would be useful to support linking musical data? Which of them would support FAIR in musical data?
Activity 3: Get to know a few methods and platforms for navigating Linked Data resources:
Licensing is a key topic in music and musicology, since music has historically been a cultural asset with strong ties to industrial exploitation and copyright. At the same time, researchers who investigate music need musical data to be openly available, which calls for a whole spectrum of compromises. The high availability of musical assets also raises questions about the provenance of the data: Who made them and why? When? What instruments and musicians were involved? These questions might be key for trusting musical catalogues and establishing standards of data quality.
Activity 1: Enumerate data licenses that are typically used in your field. What are their limitations? Are there types of musical data for which specific licenses are better suited? Are there needs not covered by any such license? Examples of data licenses are:
Activity 2: Do you need guidance on how to license your research data? Read OpenAIRE’s guide on how to apply licenses to research data.
Activity 3: Discuss practices and standards in recording provenance of musical data in your field. Is provenance recorded automatically, manually, or not at all? In what situations would provenance of musical data be useful or necessary?
The FAIR data principles have gained significant traction since their conception. Statements citing the importance of FAIR data can be found in the policies set by funders, higher education institutions, repositories, journals and publishers. For example, the Enabling FAIR Data initiative of the American Geophysical Union has been endorsed by a large number of publishers and repositories. Signatories to the initiative, such as Nature, aim to promote best practices for data sharing and have implemented policies that assist adherence to the FAIR principles. As research transitions towards a FAIRer future, what other policy developments do you expect to see in the next few years?
Activity 1: Discussion
Activity 3: Read Tuomas Eerola’s blog on Open Data in Music and Science, which includes comments on education and advocacy. What steps could you take to promote good data management and data sharing practices amongst your students and colleagues?