"My computer crashed and I lost all my work"
That's not an acceptable excuse for a computer scientist
All computer scientists should know about the danger and make sure that their work is backed up
That means putting a copy in a safe place
Some people make backups on memory sticks
We know, from experience, that memory sticks have a limited life, after which they fail
So that's not a safe place
Maybe your computer has a second hard disk
We know, from experience, that if one hard disk crashes, it is likely to make the other hard disk crash
So that's not a safe place, either
Maybe you work in the lab, or you work on your own computer but make backups of your work in the lab
Over the last 35 years, it has proved to be a reasonably safe place to keep backups
Your files in the lab are backed up for you every night or two, which means you could lose a day or two's work, but not a whole project
Another possibility is to store backups in the cloud
Examples are Google Drive, Dropbox, OneDrive, iCloud, Amazon S3, or even Facebook
But after talking about git, we are going to recommend gitlab or github or bitbucket
For programs, versioning is important as well as backup
Versioning means keeping track of which versions of your files represent working stages of development
It is quite important when working on your own, and vital when working with other programmers in a team
Our recommendation is to use git for versioning
There are several modern versioning systems, of which git seems to have become the most popular
It will be used or mentioned a lot in your degree, and it is used a lot in industry, so learning it now is a big advantage
One tutorial on Git is Octocat
It is pretty, and worth having a look at, but it neither explains the simplest way to get started, nor gives an in-depth explanation
It is just a survey of some randomly chosen features and, as usual, it lacks the 'why' factor
Other git tutorials are "Git is simpler than you think" and "Understanding Git Conceptually"
These make it clear that, internally, git is sweet, simple and reliable
It's basically a sturdy hash-table of zipped-up backups of your files, stored cleverly as a filestore-based transactional database
But its command line interface, its jargon and its error messages are a bit bizarre and obscure
The git command is already installed in the lab, and on your own computer if it runs Linux or macOS
If you have a Windows computer, you may need to install git (e.g. using the Cygwin installer; in Bash it is probably already installed)
We'll describe a very simple way to use git, and you can check the tutorials for more sophisticated things
Make yourself a directory (folder) for your project; let's say it is called imp
Change into that directory, and type git init:
> mkdir imp
> cd imp
> git init
Your repository for the imp project is in a hidden subdirectory called .git
Everything in imp, including subdirectories, comes under this one repository (so don't do git init in a subdirectory)
What git does for you is to store copies of your files in the .git subdirectory when you tell it to
You can give git commands from anywhere inside your project directory (imp or a subdirectory)
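A minimal sketch of the setup so far, written as a shell script and run in a throwaway directory made by mktemp so that nothing real is touched (the src subdirectory is hypothetical). It shows the hidden .git subdirectory that git init creates, and git still finding the repository from a subdirectory, because it searches upwards for .git:

```shell
set -e                      # stop at the first failing command
cd "$(mktemp -d)"           # scratch directory, just for this sketch
mkdir imp
cd imp
git init
ls -a                       # lists . .. and the hidden .git subdirectory
mkdir src                   # a hypothetical subdirectory
cd src
# git searches upwards from here and finds imp/.git
git rev-parse --show-toplevel   # prints the path of the imp directory
git status                      # works from the subdirectory too
```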
You can use a graphical program to drive git for you, but make sure you understand what it is doing
Because your files are copied into the same filestore, the copies do not count as backups
You can check the status of your repository at any time, while sitting in your project directory:
> git status
This lists which files have been changed or deleted, and which have been added, since the last time you saved copies
> git diff filename
This tells you what changes you have made to a file
Now suppose you have created a source file, maybe main.c
> git status
... untracked files ... main.c ...
You need to tell git to track this file, but you only have to do it once for each new file
> git add main.c
Your new file will be tracked, but git hasn't yet saved a copy
Now suppose you have created some more files, added them to the repository, done some editing on your project, and reached a good time to save, where everything works up to some point
> git commit -am "first prototype"
-am is short for -a -m; the -a option tells git to commit all your changes, and -m allows you to add a brief comment about the version you are committing
A complete snapshot copy of all your files is saved at this point
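Putting these steps together, one possible first session looks like this sketch, again in a throwaway directory. The name and email address are placeholders: git refuses to commit until it knows who you are, and normally you would set them once with git config --global rather than per repository as here:

```shell
set -e
cd "$(mktemp -d)"                    # scratch directory for the sketch
mkdir imp && cd imp
git init
# Placeholder identity, stored only in this repository
git config user.name "Your Name"
git config user.email "you@example.com"

# Create a source file; git status lists it as untracked
printf 'int main(void) { return 0; }\n' > main.c
git status
git add main.c                       # track it (once per new file)
git commit -am "first prototype"     # save a snapshot

# Edit the tracked file, inspect the change, save another snapshot
printf '/* entry point */\n' >> main.c
git diff main.c                      # shows the added line
git commit -am "comment main"
git log --oneline                    # one line per saved version
```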
-m option
Typing git commit -m "message" adds a short message to the version you are committing
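If you don't use the -m option, git will put you in an unfamiliar editor, probably vi, and ask you to create an extended message, in which case you need to type :q and Enter to escape (this abandons the commit, because the message is still empty)
To write a message instead, press i, type the message, press Esc, then type :wq and Enter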
-a option
The -a option tells git to commit all the edits and deletions you have done, but it won't notice any new files you have created unless you use git add first
Instead of committing all your changes at once, you can use git add repeatedly to pick out the files that you want to be committed (this is called staging), and then git commit -m "..." without the -a option, to commit just those changes
It is suggested that you use -a and avoid staging, for simplicity, unless you know you need it
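A sketch of staging, with two hypothetical files a.c and b.c and a placeholder identity: both files are edited, git add picks only a.c, and git commit without -a saves just that change, leaving b.c listed as modified:

```shell
set -e
cd "$(mktemp -d)"                        # scratch directory for the sketch
git init -q imp && cd imp
git config user.name "Your Name"         # placeholder identity
git config user.email "you@example.com"
echo 'one' > a.c
echo 'two' > b.c
git add a.c b.c
git commit -qm "two files"

# Edit both files, but stage only a.c
echo 'one, improved' > a.c
echo 'two, half-finished' > b.c
git add a.c
git commit -m "improve a.c"   # no -a: only the staged change is committed
git status --short            # b.c is still listed as modified
```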
By default, git assumes you are going to use staging, so git status shows modified files in red, as if they needed git add, even though -a will commit them anyway
If that bothers you, you might want to do this:
git config --global color.status.changed green
Then only new files, which do require you to use git add, will appear in red
Suppose you create a file in your project directory, e.g. a compiled program, that you want to keep for a while but not commit, because it is not really part of the project; you also don't want it mentioned in status reports, so that they stay clean when all is well
Create a text file called .gitignore in the imp directory, put the file names in it, one per line, then add and commit the .gitignore file
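Each line is a pattern which may match more than you think! For example, a plain file name matches a file of that name in every subdirectory, not just the top one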
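A sketch of that pitfall, using a hypothetical compiled program called prog that exists both at the top of the project and in a build subdirectory. git check-ignore -v reports which line of .gitignore matched, and a leading / restricts a pattern to the top of the project:

```shell
set -e
cd "$(mktemp -d)"                     # scratch directory for the sketch
git init -q imp && cd imp
mkdir build
touch prog build/prog main.c

echo 'prog' > .gitignore
git check-ignore -v prog build/prog   # both are matched by line 1

echo '/prog' > .gitignore
git check-ignore -v prog              # still ignored
git check-ignore -q build/prog || echo "build/prog is no longer ignored"
git status --short                    # build/ now shows up as untracked
```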
Your normal workflow should be something like:
make some definite progress
git status                                check what's changed
git add file(s)                           add any new files
                                          add files you don't want tracked to .gitignore
git commit -am "summary of new version"
This should be quite frequent, e.g. every few minutes
gitlab and github and bitbucket are commercial services that combine cloud storage with git, so they add backup and collaboration to projects
They are keen to provide free services to students and academics, because it's good for business!
Roughly, gitlab is free with few limits, github is free for public and (since 2019) private repositories, and bitbucket is free for small private use; the details change, so check each website
It is suggested that you start with gitlab, and see if you want to join up with the others as well later
You will need to invent a username (e.g. your Bristol one) and a password, and give an email address
It pays to register with an academic email address, e.g. one that ends with bristol.ac.uk; it may get you more free features (or you may have to ask)
You will get sent an email with a link to follow (probably presented as a button to press) to confirm that the email address is yours
Probably the easiest way to get started is to create an empty repository on the gitlab/github/bitbucket web site, then clone it onto your computer
On the website, when you create a repository, you will see it has two names, one HTTPS, one SSH (you may see this straight away, or by choosing the clone menu option)
Start with HTTPS, then switch to SSH (which needs more effort to set up, but saves you having to type in a username and password all the time)
On your own computer or in the lab, type something like this, using the HTTPS address for your online project, copied and pasted from the website:
git clone https://gitlab.com/user/project.git
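Then git will ask for your username and password (possibly using a popup window); github, and gitlab if you have two-factor authentication turned on, expect a personal access token created on the website instead of your account password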
Or type this to save being asked your username:
git clone https://user@gitlab.com/user/project.git
If you don't want to keep typing a password, don't get git to store your password in clear; switch to SSH
On your own computer, type something like this, using the SSH address for your online project, copied and pasted from the website:
git clone git@gitlab.com:user/project.git
To make this work, read the instructions on the web site about how to create a key pair, store the private key safely, and give the public key to the web site
To back up your work, just after every git commit, type:
git push
If you get an obscure error message, one common reason is that you didn't commit first
Now suppose you alternate between two computers, each with a cloned copy of a repository
On one computer, when you are done, type
git commit -am "..."
git push
Then on the other, before you start, type
git pull
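You can try this without a network or an account. In this sketch a local bare repository (one with no working files), shared.git, stands in for the online project, and two clones called laptop and lab stand in for the two computers; these names, the seed repository used to give it a first commit, and the placeholder identity are all assumptions for illustration:

```shell
set -e
cd "$(mktemp -d)"                 # scratch directory for the sketch

# Stand-in for the online repository: one commit, copied as a bare repository
git init -q seed
cd seed
git config user.name "Your Name"
git config user.email "you@example.com"
echo '# imp' > README
git add README
git commit -qm "initial"
cd ..
git clone -q --bare seed shared.git

# Two "computers", each with its own clone
git clone -q shared.git laptop
git clone -q shared.git lab

# On the laptop, when you are done
cd laptop
git config user.name "Your Name"
git config user.email "you@example.com"
echo 'int main(void) { return 0; }' > main.c
git add main.c
git commit -am "work done on the laptop"
git push

# On the lab computer, before you start
cd ../lab
git pull
ls                                # main.c has arrived
```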
Now suppose your imp project is to become a team project, with a shared cloud repository
Each team member will have a local copy of the repository on their own computer (or in the lab)
The owner can give online access, and each other team member can type
git clone ...
Each team member's workflow should be:
git pull                 get any changes from the others
...                      do a small amount of work
git status
git add ...
git commit -am "..."
git push
Try to make the amount of work so small that nobody else has pushed any changes by the time you do
If only one member of the team makes changes at once, and then pushes them to the shared repository, then everything is fine
But what happens if you are ready to commit your changes, and you find that another member has committed changes in the meantime?
Then your changes are based on an out-of-date version of the shared repository
Suppose that your push fails because someone else has pushed in the meantime
The first thing to try is another git pull
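Then git will try to merge the changes you are pulling with your own changes (recent versions of git may instead refuse until you choose: git pull --no-rebase means merge, git pull --rebase means rebase)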
If it succeeds, it may put you into the vi editor to edit a message (as if you had done git commit without the -m option)
Type :q and Enter to exit vi; this time git keeps the message it has already filled in, and completes the merge
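A sketch of this situation, under the same kind of assumptions: a local bare repository stands in for the shared online one, and two hypothetical team members, alice and bob, edit different files, so the merge succeeds. The --no-rebase option asks explicitly for a merge, and --no-edit accepts git's ready-made merge message instead of opening vi:

```shell
set -e
cd "$(mktemp -d)"                 # scratch directory for the sketch
git init -q seed && cd seed
git config user.name "Seed" && git config user.email "seed@example.com"
echo '# imp' > README && git add README && git commit -qm "initial"
cd ..
git clone -q --bare seed shared.git   # stand-in for the shared repository
for who in alice bob; do              # hypothetical team members
  git clone -q shared.git "$who"
  git -C "$who" config user.name "$who"
  git -C "$who" config user.email "$who@example.com"
done

# Alice pushes first
cd alice
echo 'parser' > parse.c && git add parse.c && git commit -qm "add parser"
git push -q

# Bob's work is now based on an out-of-date version
cd ../bob
echo 'lexer' > lex.c && git add lex.c && git commit -qm "add lexer"
git push || echo "push rejected: pull first"
git pull --no-rebase --no-edit    # merge Alice's changes with Bob's
git push                          # now succeeds
git log --oneline --graph
```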
If the simple attempt at a merge fails, what do you do then?
You consult tutorials, find out more about merging, rebasing (an alternative which is probably better for simple situations) and resolving conflicts
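For comparison, a sketch of the rebase route with the same hypothetical setup (local bare repository as the shared one, alice and bob as team members): git pull --rebase replays Bob's unpushed commit on top of Alice's, so the history stays a straight line, with no merge commit and no editor session:

```shell
set -e
cd "$(mktemp -d)"                 # scratch directory for the sketch
git init -q seed && cd seed
git config user.name "Seed" && git config user.email "seed@example.com"
echo '# imp' > README && git add README && git commit -qm "initial"
cd ..
git clone -q --bare seed shared.git   # stand-in for the shared repository
for who in alice bob; do              # hypothetical team members
  git clone -q shared.git "$who"
  git -C "$who" config user.name "$who"
  git -C "$who" config user.email "$who@example.com"
done

# Alice pushes first
cd alice
echo 'parser' > parse.c && git add parse.c && git commit -qm "add parser"
git push -q

# Bob committed locally in the meantime, so his push is rejected
cd ../bob
echo 'lexer' > lex.c && git add lex.c && git commit -qm "add lexer"
git push || echo "push rejected: pull first"
git pull --rebase                 # replay Bob's commit on top of Alice's
git push
git log --oneline                 # a straight line of three commits
```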
For a multi-file project, you may want to zip up the project directory in order to submit it on SAFE
The trouble is, if you do that directly, the zip file will include your repository in .git, with the entire history of the project
That's bad, so instead type this:
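git archive HEAD -o project.zip
HEAD means the latest commit on the branch you are on; you can name a branch instead, e.g. master, or main in most repositories newly created on gitlab or github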
Now the zip file just contains the current snapshot
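A sketch to check the difference, in a throwaway repository with a placeholder identity. HEAD stands for the latest commit on the current branch, so the sketch works whatever the branch is called; git chooses the zip format from the .zip file name, and the tar listing at the end is just a way to inspect the contents without needing unzip:

```shell
set -e
cd "$(mktemp -d)"                        # scratch directory for the sketch
git init -q imp && cd imp
git config user.name "Your Name"         # placeholder identity
git config user.email "you@example.com"
echo 'int main(void) { return 0; }' > main.c
git add main.c
git commit -qm "final version"

git archive -o project.zip HEAD              # zip of the current snapshot only
git archive --format=tar HEAD | tar -tf -    # lists main.c, and no .git directory
```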