Instructor Notes


General notes on OpenRefine

Common problems

  • If learners are using a browser other than Firefox, or OpenRefine does not open automatically when they click the .exe file, have them point their browser at http://127.0.0.1:3333/ or http://localhost:3333 to open the OpenRefine interface.

  • Mac users with the newest operating system will have to permit OpenRefine to run by changing their security settings to “allow everything”. They can change the setting back after the exercise.

  • Some students will run into issues with:

    • unzipping
    • finding the .exe file once the software has been unzipped
    • finding the data file on their computers after downloading
  • If OpenRefine crashes when launched from a network share drive, do the following:

    • Copy the OpenRefine folder to a local drive not mapped to a network share, e.g. “C:\Users\JaneDoe”
    • Open a Windows Command prompt
    • Change the working directory to the OpenRefine folder at “C:\Users\JaneDoe”
    • Run openrefine.exe
  • If fetching from CrossRef over “https” does not work during Advanced OpenRefine Functions, learners can try “http” instead.

  • If learners need to diagnose a failure to fetch content from a URL, they can check the “Store error” option in the “Add column by fetching URLs” dialogue and consult the common problems listed in the documentation.

  • The data for this lesson was pulled from DOAJ in 2015 and may differ from the data available from DOAJ on the day of your workshop.
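
Instructors who want a quick way to check whether a learner's OpenRefine instance is answering at the local addresses mentioned in the list above could use something like the following. This is a minimal sketch using only the Python standard library; the helper name `first_reachable`, the `opener` parameter, and the 3-second timeout are assumptions made for illustration and are not part of OpenRefine.

```python
import urllib.error
import urllib.request

# Local addresses taken from the note above.
CANDIDATES = ["http://127.0.0.1:3333/", "http://localhost:3333/"]


def first_reachable(urls, opener=urllib.request.urlopen, timeout=3):
    """Return the first URL that responds successfully, or None if none do.

    `opener` is a hypothetical hook so the check can be exercised without a
    network connection; by default it is urllib.request.urlopen.
    """
    for url in urls:
        try:
            with opener(url, timeout=timeout):
                return url
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or an HTTP error: try the next URL.
            continue
    return None

# Example use (needs OpenRefine running locally):
# print(first_reachable(CANDIDATES) or "OpenRefine does not appear to be running")
```

The same helper can be pointed at an “https” and an “http” variant of a URL, in that order, to see which one responds when a fetch fails.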

Powerful transformations

  • In the titlecase exercise, highlight the fact that each transformation can have unintended side effects, and advise that running one cleanup operation too few may sometimes be preferable to one too many.

Introduction to OpenRefine


Importing data into OpenRefine


Instructor Note

This is a good moment to review the points from What Should I Know When Working with OpenRefine?



Instructor Note

Carefully guide learners on how to revisit OpenRefine’s homepage to explore import options when creating new projects or re-opening existing ones: select the large blue diamond in the upper left corner of the browser window.



Layout of OpenRefine, Rows vs Records


Faceting and filtering


Clustering


Working with columns and sorting


Sorting and Reorder Rows Permanently

Do not rush these last two sentences. Repeat them slowly after a pause and allow learners to explore how sorting works for a moment.

Although the “Undo/Redo” tab is not introduced until episode 9, it may be worth noting that applying a sort does not count as a change to the data because removing the sort will restore the data to its original order. However, once you select “Reorder Rows Permanently” this does count as a data change and adds an entry to the Undo/Redo history.
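
If learners have some programming background, a loose analogy may help. The sketch below is plain Python and says nothing about how OpenRefine is implemented; the list values are hypothetical. It contrasts a sorted view, which leaves the stored order alone, with a change to the stored order itself.

```python
# Hypothetical cell values in their original order.
rows = ["Zoology", "Anatomy", "Music"]

# Like a temporary sort: a sorted view is produced, the stored order is untouched.
view = sorted(rows)

# Like "Reorder Rows Permanently": the stored order itself is changed.
permanent = list(rows)
permanent.sort()

print(rows)       # original order is still intact
print(view)       # sorted view
print(permanent)  # stored order now matches the sorted view
```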



Introduction to Transformations


Writing Transformations


Transformations - Undo and Redo


Transforming Strings, Numbers, Dates and Booleans


Transformations - Handling Arrays


Different meanings of ‘transformation’

Ask the students what transformation means to them currently. Many may only know it from Excel, where it converts columns into rows or vice versa. Discuss how, in OpenRefine, transformation refers specifically to the working window; these values are neither stored nor displayed in the cells or output.



Recap on best practice for separators

Recall the previous discussion of the dangers of changing separators, and of the need to avoid a separator character that is already used in the text. A possible question to pose to learners could be: Which subject would be broken if a hyphen were used as a separator?
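
The risk can be shown with a small round trip: join several values with a separator, then split them again. This is a minimal sketch in Python rather than GREL; the subject values and the helper name `round_trip` are hypothetical and were chosen only to illustrate the point.

```python
# Hypothetical subject values; one of them already contains a hyphen.
subjects = ["Socio-economic studies", "History"]


def round_trip(values, separator):
    """Join values with a separator, then split the result on the same separator."""
    return separator.join(values).split(separator)


safe = round_trip(subjects, "|")
broken = round_trip(subjects, "-")

print(safe)    # ['Socio-economic studies', 'History']
print(broken)  # ['Socio', 'economic studies', 'History']
```

With “|” the two subjects survive unchanged, while with a hyphen the first subject is cut in two and the cell appears to hold three values.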



Exporting data


Looking Up Data