Version Control

Versions

This aside covers:

Excuses

"My computer crashed and I lost all my work"

That's not an acceptable excuse for a computer scientist

All computer scientists should know about the danger and make sure that their work is backed up

That means putting a copy in a safe place

Memory sticks

Some people make backups on memory sticks

We know, from experience, that memory sticks have a limited life, after which they fail

So that's not a safe place

Second disks

Maybe your computer has a second hard disk

We know, from experience, that if one hard disk crashes, it is likely to make the other hard disk crash

So that's not a safe place, either

The lab

Maybe you work in the lab, or you work on your own computer but make backups of your work in the lab

Over the last 35 years, it has proved to be a reasonably safe place to keep backups

Your files in the lab are backed up for you every night or two, which means you could lose a day or two's work, but not a whole project

The cloud

Another possibility is to store backups in the cloud

Examples are Google Drive, Dropbox, OneDrive, iCloud, Amazon S3, or even Facebook

But after talking about git, we are going to recommend gitlab or github or bitbucket

Versioning

For programs, versioning is important as well as backup

Versioning means keeping track of which versions of your files represent working stages of development

It is quite important when working on your own, and vital when working with other programmers in a team

Recommendation

Our recommendation is to use git for versioning

There are several modern versioning systems, of which git seems to have become the most popular

It will be used or mentioned a lot in your degree, and it is used a lot in industry, so learning it now is a big advantage

Tutorial

One tutorial on Git is Octocat

It is pretty, and worth having a look at, but it neither explains the simplest way to get started, nor gives an in-depth explanation

It is just a survey of some randomly chosen features and, as usual, it lacks the 'why' factor

Other tutorials

Other git tutorials are "Git is simpler than you think" and "Understanding Git Conceptually"

These make it clear that, internally, git is sweet, simple and reliable

It's basically a sturdy hash-table of zipped-up backups of your files, stored cleverly as a filestore-based transactional database

But its command line interface, its jargon, and its error messages, are a bit bizarre and obscure

Getting started

The git command is already installed in the lab, and on your own computer if it runs Linux or macOS

If you have a Windows computer, you may need to install git (e.g. using the Cygwin installer - in Bash it is probably already installed)

We'll describe a very simple way to use git, and you can check the tutorials for more sophisticated things

Creating a project

Make yourself a directory (folder) for your project - let's say it is called imp

Change into that directory, and type git init:

> mkdir imp
> cd imp
> git init

Your repository for the imp project is in a hidden subdirectory called .git

Everything in imp can be tracked, including subdirectories (so don't do git init in a subdirectory)

What git does

What git does for you is to store copies of your files in the .git subdirectory when you tell it to

You can give git commands from anywhere inside your project directory (imp or a subdirectory)

You can use a graphical program to drive git for you, but make sure you understand what it is doing

Because your files are copied into the same filestore, the copies do not count as backups

Checking status

You can check the status of your repository at any time, while sitting in your project directory:

> git status

This lists which files have been changed or deleted, and which have been added, since the last time you saved copies

> git diff filename

This tells you what changes you have made to a file
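
As a rough sketch of these two commands, the script below makes a throwaway repository in a temporary directory, commits one file, edits it, and then asks git what has changed

The file name notes.txt and the demo identity are made up for illustration, and --short is just a compact form of the status report

```shell
set -e
# Hypothetical scratch project in a temporary directory
demo=$(mktemp -d)
cd "$demo"
git init -q
# Placeholder identity, so that committing works on a fresh machine
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'first line\n' > notes.txt
git add notes.txt
git commit -q -m "first version"

# Edit the file, then ask git what has changed since the last commit
printf 'second line\n' >> notes.txt
status=$(git status --short)     # one line per changed file
changes=$(git diff notes.txt)    # added lines start with +, removed lines with -
echo "$status"
echo "$changes"
```

In the short status report, M marks a file that has been modified since the last commit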

Adding files

Now suppose you have created a source file, maybe main.c

> git status
... untracked files ... main.c ...

You need to tell git to track this file - but you only have to do it once for each new file

> git add main.c

Your new file will be tracked, but git hasn't yet saved a copy
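
A minimal sketch of this step, run in a throwaway directory, compares the short status report before and after git add

The contents of main.c here are only a placeholder

```shell
set -e
# Hypothetical scratch project in a temporary directory
demo=$(mktemp -d)
cd "$demo"
git init -q

printf 'int main(void) { return 0; }\n' > main.c
before=$(git status --short)    # ?? marks a file that git is not tracking
git add main.c
after=$(git status --short)     # A marks a new file that is now tracked
echo "before: $before"
echo "after:  $after"
```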

Changing files

Now suppose you have created some more files, added them to the repository, done some editing on your project, and reached a good time to save, where everything works up to some point

> git commit -am "first prototype"

-am is short for -a -m; the -a option tells git to commit all your changes, and -m allows you to add a brief comment about the version you are committing

A complete snapshot copy of all your files is saved at this point
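
To see the snapshots building up, here is a small sketch that commits twice in a throwaway repository and then lists the history

The file contents, the second commit message and the demo identity are invented for illustration

```shell
set -e
# Hypothetical scratch project in a temporary directory
demo=$(mktemp -d)
cd "$demo"
git init -q
# Placeholder identity, so that committing works on a fresh machine
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'int main(void) { return 0; }\n' > main.c
git add main.c                          # needed once, because main.c is new
git commit -q -am "first prototype"

printf '/* more work */\n' >> main.c
git commit -q -am "second version"      # no git add needed for an edit

history=$(git log --format=%s)          # commit messages, newest first
count=$(git rev-list --count HEAD)      # number of commits so far
echo "$history"
echo "commits: $count"
```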

The -m option

Typing git commit -m "message" adds a short message to the version you are committing

If you don't use the -m option, git will put you in an unfamiliar editor, probably vi, and ask you to create an extended message, in which case you need to type :q and Enter to escape (git then abandons the commit, because the message is empty)

The -a option

The -a option tells git to commit all the edits and deletions you have done, but it won't notice any new files you have created without using git add first

Instead of committing all your changes at once, you can use git add repeatedly to add the files that you want to be committed, and then git commit -m "...", without the -a option, to commit just those changes

It is suggested that you use -a and avoid staging, for simplicity, unless you know you need it
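
For comparison, this sketch shows staging in a throwaway repository: two files are edited, but only the one named in git add goes into the next commit

The file names a.txt and b.txt and the demo identity are hypothetical

```shell
set -e
# Hypothetical scratch project in a temporary directory
demo=$(mktemp -d)
cd "$demo"
git init -q
# Placeholder identity, so that committing works on a fresh machine
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'a\n' > a.txt
printf 'b\n' > b.txt
git add a.txt b.txt
git commit -q -m "two files"

# Edit both files, but stage and commit only a.txt
printf 'more a\n' >> a.txt
printf 'more b\n' >> b.txt
git add a.txt
git commit -q -m "change a only"
left=$(git status --short)      # b.txt is still waiting to be committed
echo "left over: $left"

# With -a, the remaining edit is committed without any git add
git commit -q -am "change b as well"
clean=$(git status --short)     # empty when everything is committed
```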

Configuration

By default, git assumes you are going to use staging, so git status shows modified files in red, even though git add isn't required for them when you commit with -a, and you might want to do this:

git config --global color.status.changed green

Then only new files, which do require you to use git add, will appear in red

Ignoring

Suppose you create a file in your project directory, e.g. a compiled program, which you want to keep for a while but don't want to commit, because it is not really part of the project, and which you don't want mentioned in status reports, so that they stay clean when all is well

Create a text file called .gitignore in the imp directory, list the file names in it, one per line, then add and commit the .gitignore file

Each line is a pattern which may match more than you think!
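
As an illustration of how far a pattern can reach, this sketch uses git check-ignore to ask which files a two-line .gitignore covers

The directory layout and file names are invented for the example; the point is that a plain name such as imp matches at every level, including a directory called imp and everything inside it

```shell
set -e
# Hypothetical scratch project in a temporary directory
demo=$(mktemp -d)
cd "$demo"
git init -q

mkdir -p src/imp
touch imp main.o src/util.o src/imp/readme.txt

# Two patterns: the compiled program imp, and all object files
printf 'imp\n*.o\n' > .gitignore

# Ask git, file by file, whether the patterns apply
for f in imp main.o src/util.o src/imp/readme.txt; do
    if git check-ignore -q "$f"; then
        echo "ignored: $f"
    else
        echo "kept:    $f"
    fi
done

visible=$(git status --short)   # what status still reports
echo "$visible"
```

Here src/imp/readme.txt is ignored as well, which is probably not what was intended; writing the pattern as /imp would restrict it to the top level of the project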

Summary

Your normal workflow should be something like:

make some definite progress

git status         check what's changed
git add file(s)    add any new files

add files you don't want tracked to .gitignore

git commit -am "summary of new version"

This should be quite frequent, e.g. every few minutes

The cloud

gitlab and github and bitbucket are commercial services that combine cloud storage with git, so they add backup and collaboration to projects

They are keen to provide free services to students and academics, because it's good for business!

Roughly, gitlab is free with few limits, github is free for public use, and bitbucket is free for small private use

It is suggested that you start with gitlab, and see if you want to join up with the others as well later

Email address

You will need to invent a username (e.g. your Bristol one) and a password, and give an email address

It pays to register with an academic email address, e.g. one that ends with bristol.ac.uk - it may get you more free features (or you may have to ask)

You will get sent an email with a link to follow (probably presented as a button to press) to confirm that the email address is yours

Online repository

Probably the easiest way to get started is to create an empty repository on the gitlab/github/bitbucket web site, then clone it onto your computer

On the website, when you create a repository, you will see it has two addresses, one HTTPS, one SSH (you may see this straight away, or by choosing the clone menu option)

Start with HTTPS, then switch to SSH (which needs more effort to set up, but saves you having to type in a username and password all the time)

Using HTTPS

On your own computer or in the lab, type something like this, using the HTTPS address for your online project, copied and pasted from the website:

git clone https://gitlab.com/user/project.git

Then git will ask for your username and password (possibly using a popup window)

Or type this to save being asked your username:

git clone https://user@gitlab.com/user/project.git

Using SSH

If you don't want to keep typing a password, don't get git to store your password in clear; switch to SSH instead

On your own computer, type something like this, using the SSH address for your online project, copied and pasted from the website:

git clone git@gitlab.com:user/project.git

To make this work, read the instructions on the web site about how to create a key pair, store the private key safely, and give the public key to the web site

Backup

To back up your work, type this just after every git commit:

git push

If you get an obscure error message, one common reason is that you didn't commit first

Two computers

Now suppose you alternate between two computers, each with a cloned copy of a repository

On one computer, when you are done, type

git commit -am "..."
git push

Then on the other, before you start, type

git pull
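
The round trip can be tried out without a network, as in this sketch, where a bare repository in a local directory stands in for the online one

The directory names cloud.git, laptop and lab, and the demo identity, are assumptions made for the example, and git push origin HEAD is used as an explicit form of git push that also works for the very first push to an empty repository

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the online repository (an assumption of this sketch)
git init -q --bare cloud.git

# First computer: clone, commit, push
git clone -q cloud.git laptop 2>/dev/null   # hides the "empty repository" warning
cd laptop
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"
printf 'version 1\n' > main.c
git add main.c
git commit -q -m "first version"
git push -q origin HEAD
cd ..

# Second computer: clone the repository, which now has one commit
git clone -q cloud.git lab

# Back on the first computer: more work, then commit and push
cd laptop
printf 'version 2\n' > main.c
git commit -q -am "second version"
git push -q origin HEAD
cd ..

# On the second computer, before starting work: pull
cd lab
git pull -q
cat main.c
```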

Teams

Now suppose your imp project is to become a team project, with a shared cloud repository

Each team member will have a local copy of the repository on their own computer (or in the lab)

The owner can give online access, and each other team member can type

git clone ...

Summary

Each team member's workflow should be:

git pull       get any changes from the others
...            do a small amount of work
git status
git add ...
git commit -am "..."
git push

Try to make the amount of work so small that nobody else has pushed any changes by the time you do

Cooperation

If only one member of the team makes changes at once, and then pushes them to the shared repository, then everything is fine

But what happens if you are ready to push your changes, and you find that another member has pushed changes in the meantime?

Then your changes are based on an out-of-date version of the shared repository

What to do

Suppose that your push fails because someone else has pushed in the meantime

The first thing to try is another git pull

Then git will try to merge the changes you are pulling with your own changes

If it succeeds, it may put you into the vi editor to edit a message (as if you had done git commit without the -m option)

Type :q to exit vi
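
The situation can be rehearsed locally, as in this sketch, where two clones of a local bare repository play the parts of two team members who change different files

The names cloud.git, alice and bob are hypothetical

Recent versions of git may refuse a plain git pull when both sides have new commits, until they are told whether to merge or rebase, so the sketch passes --no-rebase to ask for a merge, and --no-edit to accept the default message instead of opening an editor

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the shared online repository (an assumption of this sketch)
git init -q --bare cloud.git

# Alice starts the project and pushes it
git clone -q cloud.git alice 2>/dev/null    # hides the "empty repository" warning
cd alice
git config user.name "Alice"                # placeholder identity
git config user.email "alice@example.com"
printf 'shared project\n' > readme.txt
git add readme.txt
git commit -q -m "start"
git push -q origin HEAD
cd ..

# Bob clones the project
git clone -q cloud.git bob

# Alice pushes some more work
cd alice
printf 'alice\n' > a.txt
git add a.txt
git commit -q -m "work by alice"
git push -q origin HEAD
cd ..

# Bob commits his own work without pulling first, so his push is refused
cd bob
git config user.name "Bob"                  # placeholder identity
git config user.email "bob@example.com"
printf 'bob\n' > b.txt
git add b.txt
git commit -q -m "work by bob"
if git push -q origin HEAD 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected
fi
echo "first push: $first_push"

# Pull to merge Alice's changes with his own, then push again
git pull -q --no-rebase --no-edit
git push -q origin HEAD
ls
```

Because the two members changed different files, the merge needs no help; changes to the same lines of the same file would produce a conflict instead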

What if it doesn't work

If the simple attempt at a merge fails, what do you do then?

You consult tutorials, and find out more about merging, rebasing (an alternative which is probably better for simple situations) and resolving conflicts

Submission zips

For a multi-file project, you may want to zip up the project directory in order to submit it on SAFE

The trouble is, if you do that directly, the zip file will contain your repository in .git containing the entire history of the project

That's bad, so instead type this:

git archive -o project.zip HEAD

Now the zip file just contains the current snapshot
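
A quick way to check this is sketched below, in a throwaway repository with a made-up main.c and demo identity

HEAD names the latest commit on the current branch, so the command does not depend on the branch being called master, and the same snapshot is listed in tar format only because tar is more likely to be installed than unzip

```shell
set -e
# Hypothetical scratch project in a temporary directory
demo=$(mktemp -d)
cd "$demo"
git init -q
# Placeholder identity, so that committing works on a fresh machine
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'int main(void) { return 0; }\n' > main.c
git add main.c
git commit -q -m "submission version"

# Zip up the latest committed snapshot, without the .git directory
git archive -o project.zip HEAD

# List what the snapshot contains
contents=$(git archive --format=tar HEAD | tar -tf -)
echo "$contents"
```

Only committed files are included, so commit before making the archive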