Library Carpentry: Introduction to Git

What is Git/GitHub?


Teaching: 10 min
Exercises: 0 min
  • What is Git?

  • What is GitHub?

  • recognize why version control is useful

  • distinguish between Git and GitHub

What is Version Control

Version control is a name used for software which can help you record changes you make to the files in a directory on your computer. Version control software and tools (such as Git and Subversion/SVN) are often associated with software development, and increasingly, they are being used to collaborate in research and academic environments. Version control systems work best with plain text files such as documents or computer code, but modern version control systems can be used to track changes in any type of file.

At its most basic level, version control software helps us register and track sets of changes made to files on our computer. We can then reason about and share those changes with others. As we build up sets of changes over time, we begin to see some benefits.

Benefits of using version control?

There are many more reasons to use version control, and we’ll explore some of these in the library context, but first let’s learn a bit about a popular version control tool called Git.

What are Git and GitHub?

We often hear the terms Git and GitHub used interchangeably but they are slightly different things.

Git is one of the most widely used version control systems in the world. It is a free, open source tool that can be downloaded to your local machine and used for logging all changes made to a group of designated computer files (referred to as a “git repository” or “repo” for short) over time. It can be used to control file versions locally by you alone on your computer, but is perhaps most powerful when employed to coordinate simultaneous work on a group of files shared among distributed groups of people.

Rather than emailing documents with tracked changes and some comments and renaming different versions of files (example.txt, exampleV2.txt, exampleV3.txt) to differentiate them, we can use Git to save (or in Git parlance, “commit”) all that information with the document itself. This makes it easy to get an overview of all changes made to a file over time by looking at a log of all the changes that have been made. And all earlier versions of each file still remain in their original form: they are not overwritten, should we ever wish to “roll back” to them.

Git was originally developed to help software developers work collaboratively on software projects, but it can be and is used for managing revisions to any file type on a computer system, including text documents and spreadsheets. Once installed, interaction with Git is done through the Command Prompt in Windows, or the Terminal on Mac/Linux. Since Word documents contain special formatting, Git unfortunately cannot version control those, nor can it version control PDFs, though both file types can be stored in Git repositories.

How can understanding Git help with work in libraries?

GitHub on the other hand is a popular website for hosting and sharing Git repositories remotely. It offers a web interface and provides functionality and a mixture of both free and paid services for working with such repositories. The majority of the content that GitHub hosts is open source software, though increasingly it is being used for other projects such as open access journals (e.g. Journal of Open Source Software), blogs, and regularly updated text books.

How can GitHub help with work in libraries?

Uses in a Library Context

Consider these common library world scenarios:

Scenario 1: Local library looking to start a crowdsourcing project

A local librarian is looking to put thousands of historical photographs of the area online so that the community can help identify the people and places they depict. She combs the web for examples of existing crowdsourcing projects, and even though they all appear unique to each institution, she notices quite a few seem to have almost the exact same functionality and structure. Rather than build a whole new version from scratch herself, she wishes there was a way to just copy the code of an existing one, and modify it to reflect her project. She notices the GitHub icon at the bottom of one of the projects she likes, but clicking on the link just brings her to a confusing directory of files and oddly labeled buttons such as “Fork”.

GitHub hosts many open-licensed projects and allows any user to fork any public project. By clicking the “fork” button, any GitHub user can almost instantaneously create their own version of an existing project. That “forked” project can be used as the basis for a new project, or can be used to work out new features that can be merged back into the original. (From: GitHub for Academics )

Scenario 2: Multiple people editing metadata for a collection

A metadata specialist has exported a spreadsheet from a repository for cleaning and editing. She’s working with a group of library workers and students, so they need to make sure edits don’t conflict. They also need to be able to undo any edits and preserve the original metadata. Once edits are complete, the whole group wants to review the changes before re-ingesting the spreadsheet of metadata into the repository.

The team can choose to use Git by itself to track changes and resolve conflicts or they can choose to use GitHub to host the project so that users can collaborate and review changes on the Web. Git will preserve the original metadata as well as all edits. GitHub will facilitate discussion about what changes should be made, who should make them, and why.

Key Points

  • Version control helps track changes to files and projects

  • Git and GitHub are not the same

Getting started with git


Teaching: 20 min
Exercises: 0 min
  • What are repositories and how are they created?

  • What do add and commit mean?

  • How do I check the status of my repository?

  • create a git repository

  • track changes to files using the git repository

  • query the current status of the git repository

Using Git

One of the main barriers to getting started with git is the language. Although some of the language used in git is fairly self-explanatory, other terms are not so clear. The best way to get to learn the language - which consists of a number of verbs such as add, commit and push (preceded by the word ‘git’) - is by using it, which is what we will be doing during this lesson. These commands will be explained as we proceed from setting up a new version-controlled project to publishing our own website.

Creating a repository

A Git repository is a data structure used to track changes to a set of project files over time. Repositories are stored within the same directory as these project files, in a hidden directory called .git. We can create a new git repository either by using GitHub’s web interface, or via the command line. Let’s use the command line to create a git repository for the experiments that we’re going to do today.

First, we will create a new directory for our project and enter that directory. <!explain commands as we go along>

$ mkdir hello-world
$ cd hello-world

We will now create an empty git repository to track changes to our project. To do this we will use the git init command, which is simply short for initialise.

$ git init
Initialized empty Git repository in <your file path>/hello-world/.git/

The hello-world directory is now a git repository.

If we run the ls command now (ls lists the content of the hello-world directory), the repository might seem empty; however, adding the -a flag for all files via ls -a will show all hidden files, which in this case includes the new hidden directory .git. Flags can simply be thought of as command line options that can be added to shell commands.

Note that whenever we use git via the command line, we need to preface each command (or verb) with git, so that the computer knows we are trying to get git to do something, rather than some other program.

Displaying the current project’s status

We can run the git status command to display the current state of a project. Let’s do that now.

$ git status
On branch master
No commits yet
nothing to commit (create/copy files and use "git add" to track)

The output tells us that we are on the master branch (more on this later) and that we have nothing to commit (no unsaved changes).

Adding and committing

We will now create and save our first project file. This is a two-stage process. First, we add any files for which we want to save the changes to a staging area, then we commit those changes to the repository. This two-stage process gives us fine-grained control over what should and should not be included in a particular commit.

Let’s create a new file using the touch command, which is a quick way to create an empty file.

$ touch

The .md extension above signifies that we have chosen to use the Markdown format, a lightweight markup language with plain text formatting syntax. We will explore Markdown a bit later.

Let’s check the status of our project again.

$ git status
On branch master
No commits yet
Untracked files:
  (use "git add <file>..." to include in what will be committed)

nothing added to commit but untracked files present (use "git add" to track)

This status is telling us that git has noticed a new file in our directory that we are not yet tracking. With colourised output, the filename will appear in red. To change this, and to tell Git we want to track any changes we make to, we use git add.

$ git add

This adds our Markdown file to the staging area (the area where git checks for file changes). To confirm this we want to use git status again.

$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:

If we are using colourised output, we will see that the filename has changed colour (from red to green). Git also tells us that there is a new file to be committed but, before we do that, let’s add some text to the file.

We will open the file with any text editor we have at hand (e.g. Notepad on Windows or TextEdit on Mac OSX) and enter # Hello, world!. The hash character is one way of writing a header with Markdown. Now, let’s save the file within the text editor and check if Git has spotted the changes.

$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)


This lets us know that git has indeed spotted the changes to our file, but that it hasn’t yet staged them, so let’s add the new version of the file to the staging area.

$ git add

Now we are ready to commit our first changes. Commit is similar to ‘saving’ a file to Git. However, compared to saving, a commit provides a lot more information about the changes we have made, and this information will remain visible to us later.

$ git commit -m 'Add'
[master (root-commit) e9e8fd3] Add
 1 file changed, 1 insertion(+)
 create mode 100644

We can see that one file has changed and that we made one insertion, which was a line with the text ‘#Hello, world!’. We can also see the commit message ‘Add’, which we added by using the -m flag after git commit. The commit message is used to record a short, descriptive, and specific summary of what we did to help us remember later on without having to look at the actual changes. If we just run git commit without the -m option, Git will launch nano (or whatever other editor we configured as core.editor) so that we can write a longer message.

Having made a commit, we now have a permanent record of what was changed, along with metadata about who made the commit and at what time.

More on the Staging Area

If you think of Git as taking snapshots of changes over the life of a project, git add specifies what will go in a snapshot (putting things in the staging area), and git commit then actually takes the snapshot, and makes a permanent record of it (as a commit). If you don’t have anything staged when you type git commit, Git will prompt you to use git commit -a or git commit --all, which is kind of like gathering everyone for the picture! However, it’s almost always better to explicitly add things to the staging area, because you might commit changes you forgot you made. (Going back to snapshots, you might get the extra with incomplete makeup walking on the stage for the snapshot because you used -a!) Try to stage things manually, or you might find yourself searching for “git undo commit” more than you would like!

The Git Staging Area

At the moment, our changes are only recorded locally, on our computer. If we wanted to work collaboratively with someone else they would have no way of seeing what we’ve done. We will fix that in the next episode by using GitHub to share our work.

Key Points

  • Git repositories contain metadata about files under version control

  • This metadata enables us to track changes to files over time

  • Git uses a two-stage commit process. Changes to files must first be added to the staging area, then committed to the repository

Sharing your work


Teaching: 30 min
Exercises: 0 min
  • How can I use Git and GitHub to share my work?

  • How do I link a local Git repository to GitHub?

  • How do I move changes between a local Git repository and a GitHub repository?

  • How can I see the differences between my current file and my most recent commit?

  • create a remote repository on GitHub

  • link a local Git repository to a remote GitHub repository

  • move changes between the local and remote repositories using push and pull

  • examine the difference between an edited file and the file’s most recently committed version

Create a repository on GitHub

When we have logged in to GitHub, we can create a new repository by clicking the + icon in the upper-right corner of any page then selecting New repository. Let’s do this now.

GitHub will ask if you want to add a, license or a .gitignore file. Do not do any of that for now.

Choosing a license

Choosing a license is an important part of openly sharing your creative work online. For help in wading through the many types of open source licenses, please visit

Connecting your local repository to the GitHub repository

In the previous episode we created a local repository on our own computer. Now we have also created a remote repository on GitHub. But at this point, the two are completely isolated from each other. We want to link them together to synchronize them and share our project with the world.

To do this, we need the GitHub repository URL, which should look something like this (with “some-librarian” replaced with your username):

The repository URL on GitHub

If the URL starts with git@ rather than https://, please click the “HTTPS” button to change it.


We use HTTPS here because it does not require additional configuration, which vary from operating system to operating system. If you start using Git regularly, you would like to set up SSH access, which is a bit more secure and convenient, by following one of the great tutorials from GitHub, Atlassian/BitBucket and GitLab (this one has a screencast).

Notice that GitHub is actually helpful enough to provide instructions for us so we don’t have to remember these commands:

GitHub instructions

You can therefore choose to copy these and paste them on the command line. Or you can choose to type them out to get them into your fingers. I will do that. So we start with the command to link our local repository to the GitHub repository:

$ git remote add origin

where some-librarian should be replaced with your own username.

Why origin?

origin in the git remote add line is just a short name or alias we’re giving to that big long repository URL. It could be almost any string we want, but by convention in git, it is usually called origin, representing where the repo originated.

We can check that it is set up correctly with the command:

$ git remote -v
origin<your_github_username>/hello-world (fetch)
origin<your_github_username>/hello-world (push)

Pushing changes

Now we have established a connection between the two repositories, but we still haven’t synchronized their content, so the remote repository is still empty. To fix that, we will have to “push” our local changes to the GitHub repository. We do this using the git push command:

$ git push -u origin master
Counting objects: 3, done.
Writing objects: 100% (3/3), 226 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
 * [new branch]      master -> master
Branch master set up to track remote branch master from origin.

The nickname of our remote repository is “origin” and the default local branch name is “master”. The -u flag tells git to remember the parameters, so that next time we can simply run git push and Git will know what to do.

Pushing our local changes to the Github repository is sometimes referred to as “pushing changes upstream to Github”. The word upstream here comes from the git flag we used earlier in the command git push -u origin master. The flag -u refers to -set-upstream, so when we say pushing changes upstream, it refers to the remote repository.

You may be prompted to enter your GitHub username and password to complete the command.

When we do a git push, we will see Git ‘pushing’ changes upstream to GitHub. Because our file is very small, this won’t take long but if we had made a lot of changes or were adding a very large repository, we might have to wait a little longer. We can check where we’re at with git status.

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean

This output lets us know where we are working (the master branch). We can also see that we have no changes to commit and everything is in order.

We can use the git diff command to see changes we have made before making a commit. Open with any text editor and enter some text on a new line, for instance “A new line” or something else. We will then use git diff to see the changes we made:

$ git diff
diff --git a/ b/
index aed0629..989787e 100644
--- a/
+++ b/
@@ -1 +1,2 @@
-# Hello, world!
\ No newline at end of file
+# Hello, world!
+A new line

The command produces lots of information and it can be a bit overwhelming at first, but let’s go through some key information here:

  1. The first line tells us that Git is producing output similar to the Unix diff command, comparing the old and new versions of the file.
  2. The second line tells exactly which versions of the file Git is comparing; aed0629 and 989787e are unique computer-generated identifiers for those versions.
  3. The third and fourth lines once again show the name of the file being changed.
  4. The remaining lines are the most interesting; they show us the actual differences and the lines on which they occur. In particular, the + markers in the first column show where we have added lines.

We can now commit these changes:

$ git add
$ git commit -m 'Add another line'

If we are very forgetful and have already forgotten what we changes we have made, git log allows us to look at what we have been doing with our git repository (in reverse chronological order, with the very latest changes first).

$ git log
commit 8e2eb9920eaa0bf18a4adfa12474ad58b765fd06
Author: Your Name <your_email>
Date:   Mon Jun 5 12:41:45 2017 +0100

    Add another line

commit e9e8fd3f12b64fc3cbe8533e321ef2cdb1f4ed39
Author: Your Name <your_email>
Date:   Fri Jun 2 18:15:43 2017 +0100


This shows us the two commits we have made and shows the messages we wrote. It is important to try to use meaningful commit messages when we make changes. This is especially important when we are working with other people who might not be able to guess as easily what our short cryptic messages might mean. Note that it is best practice to always write commit messages in the imperative (e.g. ‘Add’, rather than ‘Adding’).

Pushing changes (again)

Now, let’s have a look at the repository at GitHub again (that is, with some-librarian replaced with your username). We see that the file is there, but there is only one commit:

Only one commit on GitHub

And if you click on you will see that it contains the “Hello, world!” header, but not the new line we just added.

This is because we haven’t yet pushed our local changes to the remote repository. This might seem like a mistake in design but it is often useful to make a lot of commits for small changes so you are able to make careful revisions later and you don’t necessarily want to push all these changes one by one.

Another benefit of this design is that you can make commits without being connected to internet.

But let’s push our changes now, using the git push command:

$ git push
Counting objects: 3, done.
Writing objects: 100% (3/3), 272 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
   e9e8fd3..8e2eb99  master -> master

And let’s check on GitHub that we now have 2 commits there.

Pulling changes

When working with others, or when we’re making our own changes from different machines, we need a way of pulling those remote changes back into our local copy. For now, we can see how this works by making a change on the GitHub website and then ‘pulling’ that change back to our computer.

Let’s go to our repository in GitHub and make a change. Underneath where our file is listed you will see a button to ‘Add a README’. Do this now, entering whatever you like, scrolling to the bottom and clicking ‘Commit new file’ (The default commit message will be ‘Create’, which is fine for our purposes).

The README file

It is good practice to add a README file to each project to give a brief overview of what the project is about. If you put your README file in your repository’s root directory, GitHub will recognize and automatically surface your README to repository visitors

Our local repository is now out of sync with our remote repository, so let’s fix that by pulling the remote changes into our local repository using the git pull command.

$ git pull
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
   8e2eb99..0f5a7b0  master     -> origin/master
Updating 8e2eb99..0f5a7b0
Fast-forward | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644

The above output shows that we have fast-forwarded our local repository to include the file We could confirm this by entering the ls command.

When we begin collaborating on more complex projects, we may have to consider more aspects of git functionality, but this should be a good start. In the next section, we can look more closely at collaborating and using GitHub pages to create a website for our project.

Key Points

  • remote repositories on GitHub help you collaborate and share your work

  • push is a Git verb for sending changes from the local repository to a remote repository

  • pull is a Git verb for bringing changes from a remote repository to the local repository

  • diff is a Git verb for viewing the difference between an edited file and the file’s most recent commit



Teaching: 5 min
Exercises: 20 min
  • How can I cement my understanding of Git’s functions?

  • rephrase Git commands using everyday language

  • demonstrate Git’s functions by drawing or sketching

Let’s review

It is likely that some things won’t have stuck from the last hour. To try to reinforce how things work we can work in groups to develop diagrams to illustrate Git functions and language. This should make carrying out more complicated aspects of Git clearer in our heads.

In groups:

Exercise - visualising git

In group work, spend some time trying to illustrate some of the commands we’ve used with Git:

If you want to practise more feel free to keep practicising making changes to your file and committing the changes. If you want to explore more git commands, search for some more online or follow one of the suggested links below.

Key Points

  • the language of Git can be confusing and intimidating

  • rephrasing commands and drawing concepts can clarify Git’s workflow

GitHub Pages


Teaching: 15 min
Exercises: 20 min
  • What is GitHub Pages?

  • How can I use GitHub Pages to collaborate and share my work?

  • create a GitHub Pages branch and push a file to it

  • with a partner, experiment with collaborating on a GitHub Pages website

  • apply the workflow between local and remote repositories to collaborate on a website

GitHub Pages

GitHub Pages is a simple service to publish a website directly on GitHub from a Git repository. You add some files and folders to a repository and GitHub Pages turns it into a website. You can use HTML directly if you like, but they also provide Jekyll, which renders Markdown into HTML and makes it really easy to setup a blog or a template-based website.

Why GitHub Pages is awesome!

GitHub Pages allows you to version control your website. This is useful for a lot of different reasons. It allows you to keep a record of what changes you have made. It allows people to reference your website at a particular point in time and (if you make your source open) to see what it was like at that particular point in time. This is very useful for academic citations. Most people have had the experience of following up a reference to a website and either getting a 404 error or seeing something completely different. Although using versions on your site doesn’t guarantee this won’t happen, it does make it easier to manage old versions of your site.

GitHub Pages also mean that you can collaborate on a website with a lot of people without everyone having to communicate endlessly back and forwards about what changes need to be made, or have been made already. You can create ‘issues’ (things that need discussing or fixing), list things to do in the future, and allow other people visiting your website to quickly suggest, and help implement changes through pull requests.

Setting up a site

Now we’re all persuaded of how awesome GitHub Pages is (or you’ve identified some fatal flaws in my reasoning), it would be useful to try playing around with some things we can do with it. This will help us cement what we have learned in the previous hour and may help spark discussion for the last section of this session.

There are various options for setting up a GitHub Pages site. Let’s run through a few of them now.

The gh-pages branch

GitHub Pages uses a special branch in your GitHub repository to look for website content, and by default this is the branch with the name ‘gh-pages’. You can actually change this, under repository settings, to use for instance the master branch instead, but let’s stick with the default for now.

It’s possible to create a new branch directly on GitHub, but we will use the command line now. So we will move back to the command line and type

$ git checkout -b gh-pages
$ git push
fatal: The current branch gh-pages has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin gh-pages

Ouch, that didn’t go as we wanted, we got a fatal error! Let’s see what Git tells us. It says that it doesn’t know where it should push the changes. But it’s also friendly enough to tell us what we most likely want to do, which is to push to the gh-pages branch at “origin” (remember that “origin” in our case is just a nickname for our GitHub repository).

So let’s do that:

$ git push --set-upstream origin gh-pages
Total 0 (delta 0), reused 0 (delta 0)
 * [new branch]      gh-pages -> gh-pages
Branch gh-pages set up to track remote branch gh-pages from origin.

You might remember from earlier that we did git push -u origin master to set up the master branch. The -u is a shorthand for --set-upstream, so above you could also have typed git push -u origin gh-pages.

And remember, we only have to do this the first time we push to a new branch. The next time we can just do git push.

View your site

If we now visit, we should see the contents of the file that created earlier. Usually it’s available instantly, but it can take a few seconds and in the worst case a few minutes if GitHub are very busy.

Challenge: Contributing to a page owned by someone else (slightly easier way)

To practice using Git, GitHub pages and Markdown we can contribute to a GitHub pages site. Pair up in groups of two (or more if needed) and do the exercises below together.

  1. Go to, where “some-librarian” is the username of your exercise partner.
  2. Click on “Fork” in the upper right part of the screen to create a copy of the repository on your account. Once you have a fork > of your partner’s repository, you can edit the files in your own fork directly.
  3. Click the “” file, then click the edit pencil icon:

    GitHub edit pencil

  4. Now is good chance to try some Markdown syntax. Try some of the examples at Mastering Markdown. You can preview how it will look before you commit changes.
  5. Once you are ready to commit, enter a short commit message, select “Create a new branch for this commit and start a pull request” and press “Propose file change” to avoid commiting directly to the master branch.

    Commit and create pull request

  6. You can now go to the repository on your account and click “New Pull Request” button, where you can select base branches repositories, review the changes and add an additional explanation before sending the pull request (this is especially useful if you make a single pull request for multiple commits).
  7. Your partner should now see a pull request under the “Pull requests” tab and can accept (“Merge pull request”) the changes there. Try this.

This whole process of making a fork and a pull request might seem a bit cumbersome. Try to think of why it was needed? And why it’s called “pull request”?


We made a fork and a pull request because we did not have permission to edit (or commit) the repository directly. A fork is a copy of the repository that we can edit. By making a pull request we ask the owner of the repository if they would like to accept (pull in) the changes from our fork (our copy) into their version. The owner can then review the changes and choose to accept or reject them.

You can open pull requests on any repository you find on GitHub. If you are a group of people who plan to collaborate closely, on the other hand, it’s more practical to grant everyone access to commit directly instead.

Optional challenge: Contributing to a page owned by someone else (slightly more complicated way)

Instead of making edits on the GitHub website you can ‘clone’ the fork to your local machine and work there.

Try following the rest of the steps under “Time to Submit Your First PR” at this guide:

(If you followed step 1 and 2 in the previous challenge, you already have a fork and you can skip the creation of a new fork if you like. You can submit multiple pull requests using the same fork.)

Optional challenge: Adding an HTML page

GitHub Pages is not limited to Markdown. If you know some HTML, try adding an HTML page to your repository. You could do this on the command line or directly on GitHub. The steps below are for working directly on GitHub:

  1. Make sure you are working on the “gh-pages” branch. Select it from the menu if not:

    Branch selector on GitHub

  2. To add a new file directly on GitHub, press the “Create new file” button.

    Create new file on GitHub

  3. Name it ‘test.html’, add some HTML and click “Commit new file”.
  4. Try opening (replace “some-librarian” with your username). Notice that the HTML extension is not included.

Key Points

  • GitHub Pages offer an automated way to create a website that is version controlled and accessible for collaboration

  • Collaborating on a GitHub Pages website uses the same Git/GitHub workflow you learned for collaborating via a GitHub repository