Answers to "questions2" Analysis of Algorithms
---------------------- -----------------------
1. The data structure needed is a queue of nodes. When a node is scanned,
all its children are added to the back of the queue. Then the front item
is removed from the queue to find the next node to be scanned.
Each node is added to the queue once, and removed from the queue once,
so the time taken is O(n) for n nodes.
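As an illustrative sketch (in Python rather than the Pascal used later in these answers, and with nodes represented as dictionaries for brevity), the breadth-first scan might look like this:

```python
from collections import deque

def breadth_first(root):
    """Scan a tree level by level using a queue; each node is
    enqueued once and dequeued once, so the time is O(n)."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()          # remove the front item
        order.append(node["value"])
        queue.extend(node["children"])  # add all children to the back
    return order

# A small example tree: 1 has children 2 and 3; 2 has child 4.
tree = {"value": 1, "children": [
    {"value": 2, "children": [{"value": 4, "children": []}]},
    {"value": 3, "children": []},
]}
```

Running breadth_first on this tree visits 1, then 2 and 3, then 4, confirming the level-by-level order.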
2. To do insert, delete and search, we either want to use a hash table
(average O(1) time), or B-trees (guaranteed O(log n) time). However,
for this problem, we need three of them. It is possible to use three
entirely separate structures, one for each type of search. Then
inserting a record would involve inserting it into each of the three
structures etc. This involves storing what could be lengthy data three
times. Instead, one could have hash tables or B-trees of pointers to the
records. Inserting a record would then involve entering the record data
and updating the pointers in the three hash tables or B-trees. Each
operation would take average O(1) time for hash tables, or guaranteed
O(log n) time for B-trees.
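A minimal Python sketch of the pointer-based variant, using dictionaries as the hash tables (the field names name, ssn and email are made up for illustration):

```python
# Records are stored once; three dictionaries (hash tables) hold
# references to the same record object under three different keys.
records = []
by_name, by_ssn, by_email = {}, {}, {}

def insert(name, ssn, email):
    rec = {"name": name, "ssn": ssn, "email": email}
    records.append(rec)
    # Store pointers to the one record, not three copies of its data.
    by_name[name] = by_ssn[ssn] = by_email[email] = rec

def delete(rec):
    # Remove the three pointers, then the record itself.
    del by_name[rec["name"]], by_ssn[rec["ssn"]], by_email[rec["email"]]
    records.remove(rec)

insert("Ada", "123", "ada@example.org")
```

All three lookup tables share the one stored record, so an update through any of them is seen by the others.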
3. The answer is simply to keep track of both the permutation, and the
position of each number in the permutation in two arrays. For example
the permutation 6 4 1 5 2 3 would have arrays:
             i:  1  2  3  4  5  6
permutation[i]:  6  4  1  5  2  3
   position[i]:  3  5  6  2  4  1
where position[4]=2 since number 4 is at position 2 in the permutation.
Now the questions "what is at position i in the permutation" and
"what is the position of number j in the permutation" can be answered
in O(1) time - the answers are "permutation[i]" and "position[j]".
Swapping the numbers at position i and j is also O(1) since it can be
done by:
temp := permutation[j]
permutation[j] := permutation[i]
permutation[i] := temp
position[permutation[i]] := i
position[permutation[j]] := j
It is also possible to swap the positions of the numbers i and j in O(1)
time. In fact, the position array is just the "inverse permutation" of
the original, so there is a complete symmetry between the arrays.
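The pair of arrays and the O(1) swap can be sketched in Python (using 1-based arrays with index 0 unused, to match the example above):

```python
def make_tables(perm):
    """Build 1-based permutation and position arrays (index 0 unused)."""
    n = len(perm)
    permutation = [0] + list(perm)
    position = [0] * (n + 1)
    for i in range(1, n + 1):
        position[permutation[i]] = i
    return permutation, position

def swap_positions(permutation, position, i, j):
    """Swap the numbers at positions i and j, keeping both arrays consistent."""
    permutation[i], permutation[j] = permutation[j], permutation[i]
    position[permutation[i]] = i
    position[permutation[j]] = j

permutation, position = make_tables([6, 4, 1, 5, 2, 3])
```

After make_tables, position[4] is 2, as in the worked example; a swap updates both arrays in a constant number of assignments.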
4. We need a data structure in which to keep the largest n numbers found so
far. As each number is read in, we need to be able to check to see if
it is larger than any of the numbers in the data structure. If not, it
is discarded, otherwise the smallest number in the data structure is
discarded, and the new number is entered into the data structure.
A suitable data structure for this is a priority queue in which smaller
numbers have higher priority. Then the highest priority item is the
smallest of the saved numbers. We read a number in from the file and
compare it with the highest priority item. If the new number is smaller,
it is discarded. If it is larger, the highest priority item is removed from
the priority queue, and the new number is inserted in its proper place in
the priority queue. Of course, when reading the first n numbers from the
file, no numbers are discarded. The time taken is O(log n) per number in
the file.
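A minimal Python sketch using the standard heapq module, whose heaps are min-heaps, so the smallest saved number is always at the top:

```python
import heapq

def largest_n(numbers, n):
    """Keep the n largest numbers seen so far in a min-heap;
    each number costs at most O(log n) to process."""
    heap = []
    for x in numbers:
        if len(heap) < n:
            heapq.heappush(heap, x)     # first n numbers: keep them all
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # discard the smallest, insert x
    return sorted(heap, reverse=True)
```

heap[0] plays the role of the highest-priority (smallest) saved number in the answer above.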
5. The most natural data structure is the priority queue which takes O(log n)
time per operation. However, if removing the highest priority item is
the commonest operation, it might be better to use an ordinary queue
which is kept sorted by priority with the highest priority item at the
front and the lowest at the back. Removal from a normal queue is an O(1)
operation, whereas insertion may be O(n).
Note that this is a curious question, as it is difficult to see how
removal can be more common than insertion without the queue becoming
empty. However, in a scheduler many processes may have the same priority.
In this case one time slice typically involves removing a process from
the front of the queue, running it for a bit, then returning it to the
back of the queue, both O(1) operations on a normal queue.
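The round-robin behaviour described for equal priorities can be sketched with an ordinary queue (Python's deque stands in for it; process names are made up):

```python
from collections import deque

def round_robin(processes, slices):
    """One time slice: pop a process from the front, 'run' it,
    and return it to the back - both O(1) on an ordinary queue."""
    ready = deque(processes)
    run_order = []
    for _ in range(slices):
        proc = ready.popleft()   # remove from the front
        run_order.append(proc)   # (the process would run here)
        ready.append(proc)       # return to the back
    return run_order
```
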
6. A suitable recursive design for range-search in a binary search tree is:
To search for items in a subtree which are within a range
---------------------------------------------------------
compare the root with the range
if it is smaller than the range then search the right subtree
else if it is larger than the range then search the left subtree
else (if it is within the range) then search both subtrees
The algorithm will follow a path down from the root until it encounters
an item within the range. The algorithm then follows two paths from that
item to the bottom of the tree, searching everything in between. The left
path represents the left boundary of the range, and the right path
represents the right boundary of the range, and all items in between are
in range. The length of the paths is O(log n) and there are R items between
them, so the time taken is O(log n + R).
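The recursive design can be sketched in Python, with a node represented as a (key, left, right) tuple and None for an empty subtree:

```python
def range_search(node, lo, hi, found):
    """Report every key in [lo, hi] from a binary search tree,
    in sorted order."""
    if node is None:
        return
    key, left, right = node
    if key < lo:                        # root below the range: go right
        range_search(right, lo, hi, found)
    elif key > hi:                      # root above the range: go left
        range_search(left, lo, hi, found)
    else:                               # root in range: search both sides
        range_search(left, lo, hi, found)
        found.append(key)
        range_search(right, lo, hi, found)

# An example balanced tree holding 1, 2, 4, 5, 7, 8, 9.
tree = (5, (2, (1, None, None), (4, None, None)),
           (8, (7, None, None), (9, None, None)))
```

Searching this tree for the range [3, 8] reports 4, 5, 7 and 8.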
7. We need to be able to convert from the name of a set (or item) to its
number, and vice versa. This suggests a hash table FINDSET in which to
look up set names and find the corresponding sequence numbers. We also
need an array NAMESET, each entry of which points to a hash table entry,
which allows us to take a set's sequence number and find the corresponding
name. In fact we need two more structures FINDITEM and NAMEITEM to do
the same for items. Now the function "set-name = FIND(item-name)" works
as follows. We look up the item name in the hash table FINDITEM to
find the item's number. Then we use the parent array as usual to follow
parent pointers to the root of the set-tree, giving us the sequence number
of the set. The array NAMESET allows us to find the set's name.
The procedure "UNION(set-name1, set-name2, set-name3)" which forms the
union of set-name1 and set-name2 and gives the union the new name set-name3
works by looking up the first two names in FINDSET, using the parent and
size arrays to make the smaller tree a subtree of the larger, removing
the hash table entries for set-name1 and set-name2, adding a new entry
for set-name3, and updating the NAMESET entry for the union.
The times for the operations have been increased by the times for
accessing hash tables, which are O(1) on average. The times were
O(log n) for FIND and O(1) for UNION, and are now average O(log n) for
FIND and average O(1) for UNION. With path-compression, the time for
FIND increases to "almost O(1) on average", and the hash tables do not
alter this.
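A minimal Python sketch of named FIND and UNION, with dictionaries standing in for the hash tables (no path compression, and MOVE is not handled here):

```python
class NamedSets:
    """Union-find keyed by names: hash tables map set and item names
    to sequence numbers and back; weighted union keeps trees shallow."""
    def __init__(self):
        self.parent, self.size = [], []      # parent[i] == i marks a root
        self.findset, self.nameset = {}, {}  # set name <-> root number
        self.finditem = {}                   # item name -> number

    def make_set(self, set_name, item_name):
        k = len(self.parent)
        self.parent.append(k)
        self.size.append(1)
        self.findset[set_name] = k
        self.nameset[k] = set_name
        self.finditem[item_name] = k

    def find(self, item_name):
        k = self.finditem[item_name]
        while self.parent[k] != k:           # follow parent pointers to the root
            k = self.parent[k]
        return self.nameset[k]

    def union(self, name1, name2, name3):
        a, b = self.findset.pop(name1), self.findset.pop(name2)
        if self.size[a] < self.size[b]:
            a, b = b, a                      # a is the larger tree
        self.parent[b] = a                   # smaller tree under larger
        self.size[a] += self.size[b]
        del self.nameset[b]
        self.findset[name3] = a              # the union gets the new name
        self.nameset[a] = name3

s = NamedSets()
s.make_set("A", "w"); s.make_set("B", "z")
s.union("A", "B", "C")
```

After the union, both w and z are found in the set now named C; the old names A and B no longer appear in the table.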
Now consider implementing "MOVE(item, set1, set2)" which moves an item
from set1 to set2. It would be easy enough to change the parent pointer
for the item to point to the root of set2, but there would be a problem
with all the descendants of the item in set1 which would then also appear
to move to set2. The only reasonable way to cope with this without
losing efficiency is to leave a dummy item in set1. For example, the arrays
              1  2  3  4
parent:       0  3  1  0
size:         3  0  0  1
set names:    A        B
item names:   w  x  y  z
represent two sets A = [w, x, y] and B = [z]. Suppose we want to move y
from set A to set B. In order to avoid upsetting item x (which has y
as parent), we change entry 3 to a dummy entry, and add a new entry for
y as shown:
              1  2  3  4  5
parent:       0  3  1  0  4
size:         2  0  0  2  0
set names:    A        B
item names:   w  x  ?  z  y
Now we can safely follow parent pointers from x and get the right root.
The item y has changed position in the array, but this is OK provided
that we always access it BY NAME as in the first part of the question,
and keep the naming information up to date when we move it.
8. In the simulation, at any time we need a collection of future events,
each described by a record containing (at least) the time at which the
event is due to occur, and the action to be taken when it does occur.
Each step of the simulation consists of finding the event with the earliest
time, and taking the action indicated, which may involve adding new
events to the collection. The ideal data structure for this is a
priority queue in which events with the earliest times have the greatest
priority. Thus it takes O(log n) time to extract the earliest event
and O(1) time on average to add an event to the collection.
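A minimal Python sketch using heapq as the priority queue; the events and actions here (arrivals that schedule departures five time units later) are invented purely for illustration:

```python
import heapq

def simulate(initial_events, limit):
    """Discrete-event loop: a heap of (time, name) records; each
    step extracts the earliest event, whose action may add new events."""
    events = list(initial_events)
    heapq.heapify(events)
    log = []
    while events and len(log) < limit:
        time, name = heapq.heappop(events)   # earliest event first
        log.append((time, name))
        if name == "arrival":                # example action: schedule a departure
            heapq.heappush(events, (time + 5, "departure"))
    return log
```

Because tuples compare by their first element, the heap orders events by time, so the earliest event is always extracted next.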
9. We have a sparse matrix problem - we need only record the live cells. One
possible data structure is a hash table. Each square is known by its
(x,y) coordinates. The coordinates (x,y) can be used as a "name" to look
up the square in the hash table. A hash table entry contains the coordinates
x and y, and the state of the square ("empty" or "contains live cell").
If a square is not found in the hash table, it is assumed to be empty.
Thus we can find the state of a square, or record a birth or death, in
O(1) time on average, regardless of the number of live cells.
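A minimal Python sketch, with a set standing in for the hash table; since only live cells are stored, the "state" field is implicit (present means live, absent means empty):

```python
# The board is a hash table keyed by (x, y) coordinates;
# only live cells are stored, so absent squares are implicitly empty.
live = set()

def is_alive(x, y):
    return (x, y) in live      # O(1) average lookup

def birth(x, y):
    live.add((x, y))           # O(1) average insertion

def death(x, y):
    live.discard((x, y))       # O(1) average deletion

birth(0, 0); birth(1, 2); death(0, 0)
```
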
10. The rough algorithm design is:
Create all permutations of ABC..Y
Put Z in each possible place to produce ZAB.., AZB.., ABZ.. etc.
Now the sequence ABC... is a list, and as Z must be inserted at various
different places, a linked list is indicated:
A     B -> C -> ...
|     ^
v     |
Z ----+
The number of cells will not change, so there is no space allocation
problem, so an array of cells is indicated rather than using records and
pointer types. Each cell has a name and a pointer. We can separate these
into two arrays so that BDAC is represented as
          0  1  2  3  4
next:     2  3  4  0  1
names:       A  B  C  D
Element zero of the next array holds the index of the first item. A zero
entry in the array indicates no next item, i.e. the end of the list.
The array of names can be ignored, except for printing out.
We can now write a procedure permute(k,n) where k is the current size of the
list and n is the desired size. Permute is called initially with k=1 and
an empty list (next[0]=0):
procedure permute (k, n: integer); {called with k=1}
var p: integer;
begin
p := 0;
repeat
next[k] := next[p]; next[p] := k; {insert k after p}
if k=n then print {or whatever}
else permute(k+1,n);
next[p] := next[k]; {remove k}
p := next[p]
until p = 0
end;
This procedure takes only a constant amount of time per insertion outside
the recursive calls. If f(k) is the time taken by "permute(k,n)" then we have
   f(k) = k * f(k+1) + O(k)   (with f(n) = O(n), ignoring printing)
giving
   f(1) = O(n!)
as required.
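The Pascal procedure can be rendered directly in Python (with next_ for the next array and the permutations collected in a list rather than printed):

```python
def permutations(n):
    """Generate all permutations of 1..n by inserting item k after
    every cell p of the current list, as in the procedure above."""
    next_ = [0] * (n + 1)    # next_[0] holds the index of the first item
    results = []

    def permute(k):
        p = 0
        while True:
            next_[k], next_[p] = next_[p], k     # insert k after p
            if k == n:
                out, q = [], next_[0]            # walk the list to record it
                while q != 0:
                    out.append(q)
                    q = next_[q]
                results.append(out)
            else:
                permute(k + 1)
            next_[p] = next_[k]                  # remove k
            p = next_[p]                         # advance to the next cell
            if p == 0:
                break

    permute(1)
    return results
```

For n = 3 this produces all 3! = 6 permutations, each exactly once.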