Using spreadsheet programs for data organization

  • We will discuss good practices for data entry and formatting
  • We will not discuss analysis or visualisation

Formatting data tables in Spreadsheets

  • Use one column for one variable
  • Use one row for one observation
  • Use one cell for one value
  • Never modify your raw data. Always make a copy before making any changes.
  • Keep all of the steps you take to clean your data in a plain text file.

Formatting problems

  • Don’t use multiple tables in one sheet
  • Don’t use multiple tabs in a file
  • Fill in zero when you mean zero
  • Use an appropriate null value to record missing data
  • Don’t use formatting to convey information or make the spreadsheet look pretty
  • Don’t put units or comments in cells
  • Don’t combine several values in one cell
  • Take care over column names
  • Avoid including special characters in your data file
  • Put metadata (units, legends etc.) in a separate file

Dates as data

  • Excel is notoriously bad at handling dates.
  • Treating dates as multiple pieces of data rather than one makes them easier to handle and exchange between programs.

Basic quality assurance and control

  • Use data validation tools to minimise the possibility of input errors.
  • Use sorting and conditional formatting to identify possible errors.

Exporting data from spreadsheets

  • Use .csv file format for data storage and processing

Caveats of popular data and file formats

  • Be careful when using commas in values