Answers to "questions2" Analysis of Algorithms
---------------------- -----------------------
1. The data structure needed is a queue of nodes. When a node is scanned,
all its children are added to the back of the queue. Then the front item
is removed from the queue to find the next node to be scanned.
Each node is added to the queue once, and removed from the queue once,
so the time taken is O(n) for n nodes.
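As an illustrative sketch (in Python rather than the Pascal used later in these answers, and with nodes represented as dictionaries for brevity), the breadth-first scan might look like this:

```python
from collections import deque

def breadth_first(root):
    """Scan a tree level by level using a queue; each node is
    enqueued once and dequeued once, so the time is O(n)."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()          # remove the front item
        order.append(node["value"])
        queue.extend(node["children"])  # add all children to the back
    return order

# A small example tree: 1 has children 2 and 3; 2 has child 4.
tree = {"value": 1, "children": [
    {"value": 2, "children": [{"value": 4, "children": []}]},
    {"value": 3, "children": []},
]}
```

Running breadth_first on this tree visits 1, then 2 and 3, then 4, confirming the level-by-level order.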
2. To do insert, delete and search, we either want to use a hash table
(average O(1) time), or B-trees (guaranteed O(log n) time). However,
for this problem, we need three of them. It is possible to use three
entirely separate structures, one for each type of search. Then
inserting a record would involve inserting it into each of the three
structures etc. This involves storing what could be lengthy data three
times. Instead, one could have hash tables or B-trees of pointers to the
records. Inserting a record would then involve entering the record data
and updating the pointers in the three hash tables or B-trees. Each
operation would take average O(1) time for hash tables, or guaranteed
O(log n) time for B-trees.
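A minimal Python sketch of the pointer-based variant, using dictionaries as the hash tables (the field names name, ssn and email are made up for illustration):

```python
# Records are stored once; three dictionaries (hash tables) hold
# references to the same record object under three different keys.
records = []
by_name, by_ssn, by_email = {}, {}, {}

def insert(name, ssn, email):
    rec = {"name": name, "ssn": ssn, "email": email}
    records.append(rec)
    # Store pointers to the one record, not three copies of its data.
    by_name[name] = by_ssn[ssn] = by_email[email] = rec

def delete(rec):
    # Remove the three pointers, then the record itself.
    del by_name[rec["name"]], by_ssn[rec["ssn"]], by_email[rec["email"]]
    records.remove(rec)

insert("Ada", "123", "ada@example.org")
```

All three lookup tables share the one stored record, so an update through any of them is seen by the others.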
3. The answer is simply to keep track of both the permutation, and the
position of each number in the permutation in two arrays. For example
the permutation 6 4 1 5 2 3 would have arrays:
             i:  1  2  3  4  5  6
permutation[i]:  6  4  1  5  2  3
   position[i]:  3  5  6  2  4  1
where position[4]=2 since number 4 is at position 2 in the permutation.
Now the questions "what is at position i in the permutation" and
"what is the position of number j in the permutation" can be answered
in O(1) time - the answers are "permutation[i]" and "position[j]".
Swapping the numbers at position i and j is also O(1) since it can be
done by:
temp := permutation[j]
permutation[j] := permutation[i]
permutation[i] := temp
position[permutation[i]] := i
position[permutation[j]] := j
It is also possible to swap the positions of the numbers i and j in O(1)
time. In fact, the position array is just the "inverse permutation" of
the original, so there is a complete symmetry between the arrays.
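The pair of arrays and the O(1) swap can be sketched in Python (using 1-based arrays with index 0 unused, to match the example above):

```python
def make_tables(perm):
    """Build 1-based permutation and position arrays (index 0 unused)."""
    n = len(perm)
    permutation = [0] + list(perm)
    position = [0] * (n + 1)
    for i in range(1, n + 1):
        position[permutation[i]] = i
    return permutation, position

def swap_positions(permutation, position, i, j):
    """Swap the numbers at positions i and j, keeping both arrays consistent."""
    permutation[i], permutation[j] = permutation[j], permutation[i]
    position[permutation[i]] = i
    position[permutation[j]] = j

permutation, position = make_tables([6, 4, 1, 5, 2, 3])
```

After make_tables, position[4] is 2, as in the worked example; a swap updates both arrays in a constant number of assignments.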
4. We need a data structure in which to keep the largest n numbers found so
far. As each number is read in, we need to be able to check to see if
it is larger than any of the numbers in the data structure. If not, it
is discarded, otherwise the smallest number in the data structure is
discarded, and the new number is entered into the data structure.
A suitable data structure for this is a priority queue in which smaller
numbers have higher priority. Then the highest priority item is the
smallest of the saved numbers. We read a number in from the file and
compare it with the highest priority item. If the new number is smaller,
it is discarded. If it is larger, the highest priority item is removed from
the priority queue, and the new number is inserted in its proper place in
the priority queue. Of course, when reading the first n numbers from the
file, no numbers are discarded. The time taken is O(log n) per number in
the file.
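A minimal Python sketch using the standard heapq module, whose heaps are min-heaps, so the smallest saved number is always at the top:

```python
import heapq

def largest_n(numbers, n):
    """Keep the n largest numbers seen so far in a min-heap;
    each number costs at most O(log n) to process."""
    heap = []
    for x in numbers:
        if len(heap) < n:
            heapq.heappush(heap, x)     # first n numbers: keep them all
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # discard the smallest, insert x
    return sorted(heap, reverse=True)
```

heap[0] plays the role of the highest-priority (smallest) saved number in the answer above.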
5. The most natural data structure is the priority queue which takes O(log n)
time per operation. However, if removing the highest priority item is
the commonest operation, it might be better to use an ordinary queue
which is kept sorted by priority with the highest priority item at the
front and the lowest at the back. Removal from a normal queue is an O(1)
operation, whereas insertion may be O(n).
Note that this is a curious question, as it is difficult to see how
removal can be more common than insertion without the queue becoming
empty. However, in a scheduler many processes may have the same priority.
In this case one time slice typically involves removing a process from
the front of the queue, running it for a bit, then returning it to the
back of the queue, both O(1) operations on a normal queue.
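The round-robin behaviour described for equal priorities can be sketched with an ordinary queue (Python's deque stands in for it; process names are made up):

```python
from collections import deque

def round_robin(processes, slices):
    """One time slice: pop a process from the front, 'run' it,
    and return it to the back - both O(1) on an ordinary queue."""
    ready = deque(processes)
    run_order = []
    for _ in range(slices):
        proc = ready.popleft()   # remove from the front
        run_order.append(proc)   # (the process would run here)
        ready.append(proc)       # return to the back
    return run_order
```
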
6. A suitable recursive design for range-search in a binary search tree is:
To search for items in a subtree which are within a range
---------------------------------------------------------
compare the root with the range
if it is smaller than the range then search the right subtree
else if it is larger than the range then search the left subtree
else (if it is within the range) then search both subtrees
The algorithm will follow a path down from the root until it encounters
an item within the range. The algorithm then follows two paths from that
item to the bottom of the tree, searching everything in between. The left
path represents the left boundary of the range, and the right path
represents the right boundary of the range, and all items in between are
in range. The length of the paths is O(log n) and there are R items between
them, so the time taken is O(log n + R).
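The recursive design can be sketched in Python, with a node represented as a (key, left, right) tuple and None for an empty subtree:

```python
def range_search(node, lo, hi, found):
    """Report every key in [lo, hi] from a binary search tree,
    in sorted order."""
    if node is None:
        return
    key, left, right = node
    if key < lo:                        # root below the range: go right
        range_search(right, lo, hi, found)
    elif key > hi:                      # root above the range: go left
        range_search(left, lo, hi, found)
    else:                               # root in range: search both sides
        range_search(left, lo, hi, found)
        found.append(key)
        range_search(right, lo, hi, found)

# An example balanced tree holding 1, 2, 4, 5, 7, 8, 9.
tree = (5, (2, (1, None, None), (4, None, None)),
           (8, (7, None, None), (9, None, None)))
```

Searching this tree for the range [3, 8] reports 4, 5, 7 and 8.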
7. We need to be able to convert from the name of a set (or item) to its
number, and vice versa. This suggests a hash table FINDSET in which to
look up set names and find the corresponding sequence numbers. We also
need an array NAMESET, each entry of which points to a hash table entry,
which allows us to take a set's sequence number and find the corresponding
name. In fact we need two more structures FINDITEM and NAMEITEM to do
the same for items. Now the function "set-name = FIND(item-name)" works
as follows. We look up the item name in the hash table FINDITEM to
find the item's number. Then we use the parent array as usual to follow
parent pointers to the root of the set-tree, giving us the sequence number
of the set. The array NAMESET allows us to find the set's name.
The procedure "UNION(set-name1, set-name2, set-name3)" which forms the
union of set-name1 and set-name2 and gives the union the new name set-name3
works by looking up the first two names in FINDSET, using the parent and
size arrays to make the smaller tree a subtree of the larger, removing
the hash table entries for set-name1 and set-name2, adding a new entry
for set-name3, and updating the NAMESET entry for the union.
The times for the operations have been increased by the times for
accessing hash tables, which are O(1) on average. The times were
O(log n) for FIND and O(1) for UNION, and are now average O(log n) for
FIND and average O(1) for UNION. With path-compression, the time for
FIND increases to "almost O(1) on average", and the hash tables do not
alter this.
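A minimal Python sketch of named FIND and UNION, with dictionaries standing in for the hash tables (no path compression, and MOVE is not handled here):

```python
class NamedSets:
    """Union-find keyed by names: hash tables map set and item names
    to sequence numbers and back; weighted union keeps trees shallow."""
    def __init__(self):
        self.parent, self.size = [], []      # parent[i] == i marks a root
        self.findset, self.nameset = {}, {}  # set name <-> root number
        self.finditem = {}                   # item name -> number

    def make_set(self, set_name, item_name):
        k = len(self.parent)
        self.parent.append(k)
        self.size.append(1)
        self.findset[set_name] = k
        self.nameset[k] = set_name
        self.finditem[item_name] = k

    def find(self, item_name):
        k = self.finditem[item_name]
        while self.parent[k] != k:           # follow parent pointers to the root
            k = self.parent[k]
        return self.nameset[k]

    def union(self, name1, name2, name3):
        a, b = self.findset.pop(name1), self.findset.pop(name2)
        if self.size[a] < self.size[b]:
            a, b = b, a                      # a is the larger tree
        self.parent[b] = a                   # smaller tree under larger
        self.size[a] += self.size[b]
        del self.nameset[b]
        self.findset[name3] = a              # the union gets the new name
        self.nameset[a] = name3

s = NamedSets()
s.make_set("A", "w"); s.make_set("B", "z")
s.union("A", "B", "C")
```

After the union, both w and z are found in the set now named C; the old names A and B no longer appear in the table.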
Now consider implementing "MOVE(item, set1, set2)" which moves an item
from set1 to set2. It would be easy enough to change the parent pointer
for the item to point to the root of set2, but there would be a problem
with all the descendants of the item in set1 which would then also appear
to move to set2. The only reasonable way to cope with this without
losing efficiency is to leave a dummy item in set1. For example, the arrays
              1  2  3  4
parent:       0  3  1  0
size:         3  0  0  1
set names:    A        B
item names:   w  x  y  z
represent two sets A = [w, x, y] and B = [z]. Suppose we want to move y
from set A to set B. In order to avoid upsetting item x (which has y
as parent), we change entry 3 to a dummy entry, and add a new entry for
y as shown:
              1  2  3  4  5
parent:       0  3  1  0  4
size:         2  0  0  2  0
set names:    A        B
item names:   w  x  ?  z  y
Now we can safely follow parent pointers from x and get the right root.
The item y has changed position in the array, but this is OK provided
that we always access it BY NAME as in the first part of the question,
and keep the naming information up to date when we move it.
8. In the simulation, at any time we need a collection of future events,
each described by a record containing (at least) the time at which the
event is due to occur, and the action to be taken when it does occur.
Each step of the simulation consists of finding the event with the earliest
time, and taking the action indicated, which may involve adding new
events to the collection. The ideal data structure for this is a
priority queue in which events with the earliest times have the greatest
priority. Thus it takes O(log n) time to extract the earliest event
and O(1) time on average to add an event to the collection.
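A minimal Python sketch using heapq as the priority queue; the events and actions here (arrivals that schedule departures five time units later) are invented purely for illustration:

```python
import heapq

def simulate(initial_events, limit):
    """Discrete-event loop: a heap of (time, name) records; each
    step extracts the earliest event, whose action may add new events."""
    events = list(initial_events)
    heapq.heapify(events)
    log = []
    while events and len(log) < limit:
        time, name = heapq.heappop(events)   # earliest event first
        log.append((time, name))
        if name == "arrival":                # example action: schedule a departure
            heapq.heappush(events, (time + 5, "departure"))
    return log
```

Because tuples compare by their first element, the heap orders events by time, so the earliest event is always extracted next.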
9. We have a sparse matrix problem - we need only record the live cells. One
possible data structure is a hash table. Each square is known by its
(x,y) coordinates. The coordinates (x,y) can be used as a "name" to look
up the square in the hash table. A hash table entry contains the coordinates
x and y, and the state of the square ("empty" or "contains live cell").
If a square is not found in the hash table, it is assumed to be empty.
Thus we can find the state of a square, or record a birth or death, in
O(1) time on average, regardless of the number of live cells.
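A minimal Python sketch, with a set standing in for the hash table; since only live cells are stored, the "state" field is implicit (present means live, absent means empty):

```python
# The board is a hash table keyed by (x, y) coordinates;
# only live cells are stored, so absent squares are implicitly empty.
live = set()

def is_alive(x, y):
    return (x, y) in live      # O(1) average lookup

def birth(x, y):
    live.add((x, y))           # O(1) average insertion

def death(x, y):
    live.discard((x, y))       # O(1) average deletion

birth(0, 0); birth(1, 2); death(0, 0)
```
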
10. The rough algorithm design is:
Create all permutations of ABC..Y
Put Z in each possible place to produce ZAB.., AZB.., ABZ.. etc.
Now the sequence ABC... is a list, and as Z must be inserted at various
different places, a linked list is indicated:
A     B -> C -> ...
|     ^
v     |
Z ----+
The number of cells will not change, so there is no space allocation
problem, so an array of cells is indicated rather than using records and
pointer types. Each cell has a name and a pointer. We can separate these
into two arrays so that BDAC is represented as
          0  1  2  3  4
next:     2  3  4  0  1
names:       A  B  C  D
Element zero of the next array holds the index of the first item. A zero
entry in the array indicates no next item, i.e. the end of the list.
The array of names can be ignored, except for printing out.
We can now write a procedure permute(k,n) where k is the current size of the
list and n is the desired size. Permute is called initially with k=1 and
an empty list (next[0]=0):
procedure permute (k, n: integer); {called with k=1}
var p: integer;
begin
p := 0;
repeat
next[k] := next[p]; next[p] := k; {insert k after p}
if k=n then print {or whatever}
else permute(k+1,n);
next[p] := next[k]; {remove k}
p := next[p]
until p = 0
end;
This procedure takes only a constant amount of time per insertion outside
the recursive calls. If f(k) is the time taken by "permute(k,n)" then we have
   f(k) = k * f(k+1) + O(k)   (with f(n) = O(n), ignoring printing)
giving
   f(1) = O(n!)
as required.
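The Pascal procedure can be rendered directly in Python (with next_ for the next array and the permutations collected in a list rather than printed):

```python
def permutations(n):
    """Generate all permutations of 1..n by inserting item k after
    every cell p of the current list, as in the procedure above."""
    next_ = [0] * (n + 1)    # next_[0] holds the index of the first item
    results = []

    def permute(k):
        p = 0
        while True:
            next_[k], next_[p] = next_[p], k     # insert k after p
            if k == n:
                out, q = [], next_[0]            # walk the list to record it
                while q != 0:
                    out.append(q)
                    q = next_[q]
                results.append(out)
            else:
                permute(k + 1)
            next_[p] = next_[k]                  # remove k
            p = next_[p]                         # advance to the next cell
            if p == 0:
                break

    permute(1)
    return results
```

For n = 3 this produces all 3! = 6 permutations, each exactly once.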