Introduction to OpenRefine


  • OpenRefine is ‘a tool for working with messy data’
  • OpenRefine works best with data in a simple tabular format
  • OpenRefine can help you split data up into more granular parts
  • OpenRefine can help you match local data up to other data sets
  • OpenRefine can help you enhance a data set with data from other sources

Importing data into OpenRefine


  • Use the Create Project option to import data
  • You can control how data imports using options on the import screen
  • Several file types may be imported into OpenRefine

Layout of OpenRefine, Rows vs Records


  • OpenRefine uses rows and columns to display data
  • Most options to work with data in OpenRefine are accessed through a drop-down menu at the top of a data column
  • When you select an option in a particular column (e.g. to make a change to the data), it will affect all the cells in that column
  • OpenRefine has a Records mode which links together multiple rows into a single record
  • Split and join multi-valued cells to modify the individual values within them
  • When creating multi-valued cells in your data, choose a separator that will not appear in the data values
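
The last two points can be illustrated outside OpenRefine. The Python sketch below is an analogy only, not OpenRefine's own implementation; the function names and the sample author strings are made up for illustration. It splits a multi-valued cell, edits the individual values, joins them back, and shows why a separator that also occurs in the data causes trouble.

```python
def split_cell(cell: str, separator: str = "|") -> list[str]:
    """Split one multi-valued cell into its individual values."""
    return cell.split(separator)


def join_values(values: list[str], separator: str = "|") -> str:
    """Join values into one cell; reject a separator that occurs in a value."""
    for value in values:
        if separator in value:
            raise ValueError(f"separator {separator!r} appears in value {value!r}")
    return separator.join(values)


# Hypothetical cell holding three author names separated by "|".
cell = "Smith, J.| Jones, A. |Lee, K."

values = [value.strip() for value in split_cell(cell)]  # edit each value
print(join_values(values))  # Smith, J.|Jones, A.|Lee, K.

# A comma is a poor separator here because the names contain commas:
print(len("Smith, J.,Jones, A.".split(",")))  # 4 pieces instead of 2
```

The check in join_values is a design choice for this sketch: failing early is easier to debug than discovering later that a value was silently split in the wrong place.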

Faceting and filtering


  • You can use facets and filters to explore your data
  • You can use facets and filters to work with a subset of data in OpenRefine
  • You can correct common data issues from a Facet
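
As a rough mental model, a text facet behaves like a count of the distinct values in a column, and selecting a facet value narrows the rows you are working with. The Python sketch below mimics that idea; it is an analogy under that assumption, and the rows, column names and function names are hypothetical.

```python
from collections import Counter

# Hypothetical rows; in OpenRefine these would be the rows of a project.
rows = [
    {"title": "Article A", "publisher": "Acme Press"},
    {"title": "Article B", "publisher": "Acme Press Ltd"},
    {"title": "Article C", "publisher": "Acme Press"},
    {"title": "Article D", "publisher": "Other House"},
]


def text_facet(rows, column):
    """Count how often each distinct value occurs in a column."""
    return Counter(row[column] for row in rows)


def select(rows, column, value):
    """Return only the rows whose column matches the chosen facet value."""
    return [row for row in rows if row[column] == value]


def edit_facet_value(rows, column, old, new):
    """Replace one value everywhere it occurs in the column (edits rows in place)."""
    for row in rows:
        if row[column] == old:
            row[column] = new


print(text_facet(rows, "publisher"))       # explore: which values exist, how often
print(select(rows, "publisher", "Other House"))  # work with a subset
edit_facet_value(rows, "publisher", "Acme Press Ltd", "Acme Press")  # fix a variant
print(text_facet(rows, "publisher"))
```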

Clustering


  • Clustering is a way of finding variant forms of the same piece of data within a dataset (e.g. different spellings of a name)
  • There are several clustering algorithms, which work in different ways and can produce different results
  • The best clustering algorithm to use will depend on the data
  • Using clustering you can replace varying forms of the same data with a single consistent value
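
To make the idea concrete, here is a simplified Python sketch in the spirit of a "fingerprint" key-collision method: values that reduce to the same normalized key are grouped together. It is an approximation for illustration, not OpenRefine's actual code (for example, it only strips ASCII punctuation), and the function names and sample names are hypothetical.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a key that ignores case, accents, punctuation and word order."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))  # drop accents
    text = text.translate(str.maketrans("", "", string.punctuation))   # drop punctuation
    return " ".join(sorted(set(text.split())))                         # sort unique tokens


def cluster(values):
    """Group values sharing a fingerprint; keep groups with more than one distinct form."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


def normalize(values):
    """Replace every variant in a cluster with the cluster's most frequent form."""
    mapping = {}
    for group in cluster(values):
        preferred = Counter(group).most_common(1)[0][0]
        for value in group:
            mapping[value] = preferred
    return [mapping.get(value, value) for value in values]


names = ["Smith, John", "John Smith", "john smith", "John Smith", "Jane Doe"]
print(cluster(names))
print(normalize(names))
```

Picking the most frequent form as the replacement is only one possible policy; in OpenRefine you review each cluster and choose the value to merge into yourself.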

Working with columns and sorting


  • You can reorder, rename and remove columns in OpenRefine
  • Sorting in OpenRefine always sorts all rows
  • The original order of rows in OpenRefine is maintained during a sort until you use the option to Reorder Rows Permanently from the Sort drop-down menu
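
The difference between a sort and a permanent reorder can be pictured with a sorted copy versus the original list. The Python sketch below is an analogy only; the rows and the sorted_view helper are hypothetical.

```python
# Hypothetical rows in their original (import) order.
rows = [
    {"title": "Zoology Notes", "year": 2011},
    {"title": "Algebra Review", "year": 2015},
    {"title": "Marine Biology", "year": 2009},
]


def sorted_view(rows, column):
    """Return a sorted copy; the original list keeps its order."""
    return sorted(rows, key=lambda row: row[column])


view = sorted_view(rows, "title")
print([row["title"] for row in view])  # ['Algebra Review', 'Marine Biology', 'Zoology Notes']
print([row["title"] for row in rows])  # ['Zoology Notes', 'Algebra Review', 'Marine Biology']

# Analogous to Reorder Rows Permanently: the sorted order replaces the original.
rows = view
```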

Introduction to Transformations


  • Common transformations are available through the ‘Edit cells’ > ‘Common transforms’ menu option

Writing Transformations


  • You can alter data in OpenRefine based on specific instructions
  • You can preview the results of your GREL expression

Transformations - Undo and Redo


  • You can use Undo and Redo to retrace your steps
  • You can save and apply a set of steps to a new set of data using the ‘Extract’ and ‘Apply’ features

Transforming Strings, Numbers, Dates and Booleans


  • You can alter data in OpenRefine based on specific instructions
  • You can expand the data editing functions that are built into OpenRefine by building your own

Transformations - Handling Arrays


  • Arrays cannot appear directly in an OpenRefine cell
  • Arrays can be used in many ways using GREL expressions
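
A common pattern that follows from these two points is: split a cell's string into an array, work on the array, then join it back into a string so the result can be stored in a cell. The Python sketch below mirrors that pattern as an analogy; it is not GREL, and the function name and sample values are hypothetical.

```python
def transform_cell(cell: str, separator: str = "|") -> str:
    """Trim, de-duplicate and sort the values held in one multi-valued cell."""
    parts = cell.split(separator)                        # string -> array
    parts = sorted(set(part.strip() for part in parts))  # work on the array
    return separator.join(parts)                         # array -> string for the cell


print(transform_cell("maps| atlases|maps|charts"))  # atlases|charts|maps
```

The final join is the important step: the intermediate array exists only while the expression is evaluated, and a string is what ends up in the cell.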

Exporting data


  • You can export your data in a variety of formats

Looking Up Data


  • OpenRefine can look up custom URLs to fetch data based on what’s in an OpenRefine project
  • Such API calls can be custom built, or one can use existing Reconciliation services to enrich data
  • OpenRefine can be further enhanced by installing extensions
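
Looking up data by URL usually comes down to two steps: build a request URL from a cell value, then pull the wanted field out of the response. The Python sketch below illustrates both steps without making a network call. The endpoint, its parameters and the response shape are hypothetical placeholders, not a real service.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the documented URL of the API you actually use.
BASE_URL = "https://api.example.org/journals"


def build_lookup_url(issn: str) -> str:
    """Build a lookup URL from a cell value, escaping it safely."""
    return f"{BASE_URL}?{urlencode({'issn': issn, 'format': 'json'})}"


def extract_title(response_text: str):
    """Parse a JSON response and return its 'title' field, or None if absent."""
    return json.loads(response_text).get("title")


print(build_lookup_url("1234-5678"))
# https://api.example.org/journals?issn=1234-5678&format=json

# Made-up response body standing in for what such a service might return.
sample_response = '{"issn": "1234-5678", "title": "Example Journal"}'
print(extract_title(sample_response))  # Example Journal
```

Escaping the cell value with urlencode matters because real data often contains spaces or other characters that are not valid in a URL.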