Suppose we want to search for things in a list
One possibility is to keep the items in a 'randomly' ordered list, so insertion is O(1), but then a search takes O(n) time
Or, we could keep them in a sorted list, in which case we can use a binary search which takes O(log(n)) time, but then new items would have to be added in the middle, which takes O(n) time
When there is a mixture of search and insert operations, and both operations need to be well below O(n), then the items can usefully be stored in an ordered binary tree
A tree is created out of cells, with each cell having two pointers
We will create ordered binary trees, without worrying about how well balanced the tree is
Balancing techniques include:
Here's a struct for holding one node in a tree of ints:
struct node { struct node *left; int key; struct node *right; }; typedef struct node node;
This is essentially the same as
Tree a = Tip | Node (Tree a) a (Tree a)
in Haskell (using NULL
for Tip
)
Here's a function to create a new node (a one-element tree):
node *new_node(int n) { node *p = malloc(sizeof(node)); *p = (node) { NULL, n, NULL }; return p; }
Here's a recursive insertion function:
node *insert_node(node *p, int n) { if (p == NULL) p = new_node(n); else if (n < p->key) p->left = insert_node(p->left, n); else if (n > p->key) p->right = insert_node(p->right, n); return p; }
It uses p
as a current-node variable
When you call it, it returns a possibly updated node, which you have to put back where you got it
Pointers are shown pointing to the 'middle' of nodes, but that's only for symmetry
Inserting 5
, pointer p
points to nodes, moves down
the nodes, ends as NULL
Here's a version which doesn't return anything, but uses a pointer to a pointer:
void insert_node(node **p, int n) { if (*p == NULL) *p = new_node(n); else if (n < (*p)->key) insert_node(&(*p)->left, n); else if (n > (*p)->key) insert_node(&(*p)->right, n); }
It updates in place, and only does it once
When we reach the right place, we have a pointer to NULL, so we can replace it
Now p
points to a pointer, ends pointing to a box containing
NULL
, and the box can be updated
Here's a complicated iterative version:
node *insert_node(node *p, int n) { bool done = false; if (p == NULL) { p = new_node(n); done = true; } while (! done) { if (n == p->key) done = true; else if (n < p->key) { if (p->left != NULL) p = p->left; else { p->left = new_node(n); done = true; } } else { if (p->right != NULL) p = p->right; else { p->right = new_node(n); done = true; } } } return p; }
The structure is simpler using a pointer to a pointer:
void insert_node(node **p, int n) { bool done = false; while (! done) { if (*p == NULL) { *p = new_node(n); done = true; } else if (n == (*p)->key) done = true; else if (n < (*p)->key) p = &(*p)->left; else p = &(*p)->right; } }
Should you use recursive or iterative insertion, and should you use pointers-to-pointers or not?
You can disregard what anybody says about efficiency - what matters is complexity (and balance)
Use whichever you like - but when you write functions which use both left and right subtrees instead of just one, recursion stays simple while iteration gets nastier, and pointers-to-pointers are complex
So most programmers use recursion, and not pointers-to-pointers
Functions on trees are inconvenient if we force callers to use the nodes directly, either they have to catch the output (or pass a pointer-to-a-pointer)
So we need a wrapping structure for a tree:
struct tree { struct node *root; }; typedef struct tree tree;
This can also be a useful place to put global information about the tree
Here's a reasonable function to create a new tree:
tree *new_tree() { tree *t = malloc(sizeof(tree)); t->root = NULL; return t; }
A wrapped insertion function is:
void insert(tree *t, int n) { t->root = insert_node(t->root, n); }
The insert_node
function is the pointer-to-node recursive
version, but the user can't tell which version we are using
Searching is a bit simpler, and can also be done recursively or iteratively - here's a recursive version:
node *find_node(node *p, int n) { if (p == NULL) { } else if (n < p->key) p = find_node(p->left, n); else if (n > p->key) p = find_node(p->right, n); return p; }
Here's an iterative version:
node *find_node(node *p, int n) { bool done = false; while (! done) { if (p == NULL) done = true; else if (n == p->key) done = true; else if (n < p->key) p = p->left; else p = p->right; } return p; }
Again, we want a wrapper function
It shouldn't export any nodes to the user, it should just return (e.g.) a boolean to say whether the number is in the tree or not
bool contains(tree *t, int n) { return find_node(t->root, n) != NULL; }
A map is a structure which maps keys to values
For example, when counting words, you might want to map word strings as keys, to integer counts as values:
struct node { struct node *left; char word[20]; int count; struct node *right; };
The tree would be structured according to the words, and functions would retrieve or update the counts
There are many types of self-balancing tree, with red-black trees being the most popular in libraries because you only need one extra bit per node
The different types (AVL trees, 2-3 trees, ...) all use the same mechanism but have different policies
The mechanism used for balancing is rotation:
In both cases, A
< P
< B
<
Q
< C
Here's a function to rotate right:
node *rotate_right(node *q) { node *p = q->left; q->left = p->right; p->right = q; return p; }
A policy is an algorithm which decides what rotations to do and when, according to some extra info in each node, and which guarantees O(log(n)) depth