CHAPTER 3
ALGORITHM DESIGN
3.0 Recursion
-------------
There is really only one general method of algorithm design - break the problem
down into smaller subproblems. Recursion is just a version of this break-down
technique which breaks a problem into smaller subproblems of the SAME KIND. It
does not mean that the resulting program need have any recursive procedures in
it.
For example, let's use recursion to analyse the sorting problem. The idea is to
reduce the problem of sorting n items to the problem of sorting n-1 items.
There are two approaches. One is to sort the first n-1 items then insert the
n'th item in its correct place, giving the familiar insertion sort:
sort a[1]...a[n]
----------------
if n=1 return
sort a[1]...a[n-1]
insert a[n] in the sorted list a[1]...a[n-1]
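As a concrete illustration, here is a minimal Python sketch of insertion sort (the
function name and 0-based indexing are my own; the notes use 1-based pseudocode):
   def insertion_sort(a):
       # Insert a[i] into its correct place in the already-sorted prefix a[0..i-1].
       for i in range(1, len(a)):
           item = a[i]
           j = i - 1
           while j >= 0 and a[j] > item:   # shift larger items one place right
               a[j + 1] = a[j]
               j -= 1
           a[j + 1] = item
       return a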
Alternatively, we could search for the largest item, put it last, then sort the
first n-1 items, giving a selection sort:
Sort a[1]...a[n]
----------------
if n=1 return
find the largest item and swap it with a[n].
sort a[1]...a[n-1]
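A corresponding Python sketch of selection sort (again the names are my own):
   def selection_sort(a):
       # Repeatedly find the largest remaining item and swap it into the last place.
       for n in range(len(a) - 1, 0, -1):
           largest = 0
           for i in range(1, n + 1):
               if a[i] > a[largest]:
                   largest = i
           a[largest], a[n] = a[n], a[largest]
       return a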
Both these algorithms take time f(n) satisfying
f(n) = f(n-1) + O(n) => f(n) = O(n^2)
Insertion sort is simpler and faster than any of the other famous O(n^2)
algorithms (especially bubble sort). It is fast on small problems, and fast
(linear, i.e. O(n)) on arrays which are almost sorted already. This makes it
suitable for combining with more sophisticated algorithms.
Remember, a recursive design should always begin with a simple case which can
be solved easily.
3.1 Divide And Conquer
----------------------
This is another important technique in algorithm design. It is just a more
sophisticated use of recursion in which we try to break the problem into equal
sized pieces. In other words, there are two features: a breakdown into
(1) SIMILAR PROBLEMS of (2) SIMILAR SIZES.
For example, consider searching for an item in a sorted array a[1]...a[n].
A linear search, looking at each element in turn is O(n). A binary chop is
better. The idea is to break the search into two equal sized searches:
To search a[i]...a[j] for value v
---------------------------------
if i > j return "not found"
look at the middle element k = (i + j) div 2
if v = a[k] return "k"
if v < a[k] then search a[i]...a[k-1]
if v > a[k] then search a[k+1]...a[j]
WARNING - if you try to "optimize" the 3-way test into a 2-way test, you are
liable to produce an incorrect algorithm. The time taken is:
f(n) = f(n/2) + O(1) => f(n) = O(log(n))
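A minimal Python sketch of the binary chop, written as a loop rather than a
recursive procedure but keeping the three-way test:
   def binary_search(a, v):
       # Search the sorted list a for v; return its index, or None if absent.
       i, j = 0, len(a) - 1
       while i <= j:
           k = (i + j) // 2
           if v == a[k]:
               return k
           elif v < a[k]:
               j = k - 1        # continue in a[i..k-1]
           else:
               i = k + 1        # continue in a[k+1..j]
       return None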
Now let's try to break a sorting problem into two equal sized sorting problems.
One approach is to sort the first half, sort the second half and combine the
results, giving a merge sort:
Sort items a[i]...a[j]
----------------------
if there is only one item, return
sort the first half of the items a[i]...a[k] where k = (i+j) div 2
sort the second half of the items a[k+1]...a[j]
merge the two sorted sequences
Assuming that merging of sequences of length n can be done in time O(n),
(see questions) the time f(n) to sort n items using this algorithm is:
f(n) = 2f(n/2) + O(n) => f(n) = O(n*log(n))
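A minimal (not in-place) Python sketch of merge sort, with the O(n) merge written
out so the sketch is self-contained:
   def merge_sort(a):
       # Return a sorted copy of a: sort each half, then merge the two halves.
       if len(a) <= 1:
           return list(a)
       k = len(a) // 2
       left, right = merge_sort(a[:k]), merge_sort(a[k:])
       merged, i, j = [], 0, 0
       while i < len(left) and j < len(right):
           if left[i] <= right[j]:
               merged.append(left[i])
               i += 1
           else:
               merged.append(right[j])
               j += 1
       return merged + left[i:] + right[j:]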
Alternatively, we could try to split the numbers into low numbers and high
numbers first, then sort the low numbers and sort the high numbers. We get the
quicksort algorithm:
Sort items a[i]...a[j]
----------------------
1. if there is only one item, return
2. choose a "medium" value v
3. arrange for a[i]...a[k] to contain values <= v
and for a[k+1]...a[j] to be > v (whatever k happens to be)
4. sort a[i]...a[k]
5. sort a[k+1]...a[j]
The questions remaining in this algorithm are how to choose v at step (2), and
how to do the splitting at step 3. Step 3 is done by another recursive design:
Split a[i]...a[j] into those <= v and those > v
-----------------------------------------------
if i > j then return
if a[i] <= v then split a[i+1]...a[j] and return
if a[j] > v then split a[i]...a[j-1] and return
now we know a[i] and a[j] are mis-placed
swap a[i] and a[j]
split a[i+1]...a[j-1]
This splitting can be implemented with simple loops rather than a recursive
procedure. (Increment i until a[i]>v, decrement j until a[j]<=v, swap a[i]
and a[j], repeat). The splitting algorithm clearly examines each item once, so
is O(n). More formally, the time taken is
f(n) = f(n-1) + O(1) => f(n) = O(n)
As for choosing v, we can choose a random item from a[i]...a[j], or if we
assume the items are in random order to start with, we can just use v = a[i].
A popular compromise is to use the median of a[i], a[j] and a[(i+j) div 2].
On average, half the items will be less than v and half greater, so the time
for quicksort ON AVERAGE is:
f(n) = 2f(n/2) + O(n) => f(n) = O(n*log(n))
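Putting the pieces together, here is a minimal in-place Python sketch of quicksort,
using v = a[i] as the pivot and the two-pointer splitting loop described above (a
production version would choose the pivot more carefully, as noted):
   def quicksort(a, i=0, j=None):
       # Sort a[i..j] in place.
       if j is None:
           j = len(a) - 1
       if i >= j:                     # zero or one item: nothing to do
           return a
       v = a[i]                       # the "medium" value
       lo, hi = i, j
       while lo <= hi:
           while a[lo] < v:           # skip items already on the low side
               lo += 1
           while a[hi] > v:           # skip items already on the high side
               hi -= 1
           if lo <= hi:               # a[lo] and a[hi] are mis-placed: swap them
               a[lo], a[hi] = a[hi], a[lo]
               lo, hi = lo + 1, hi - 1
       quicksort(a, i, hi)            # items <= v
       quicksort(a, lo, j)            # items >= v
       return a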
Thus Divide & Conquer design gives us two O(n*log(n)) algorithms for sorting.
Which sorting algorithm should be used in practice? Mergesort is a useful tool
in sorting large files. In memory, insertion sort is best for small problems
(say up to 10 items) or when speed is unimportant. Otherwise use quicksort.
Either way, take the code from (e.g.) Sedgewick.
There are various ways of optimising quicksort. One problem is that it spends a
lot of its time sorting small pieces of the array, at which insertion sort is
better. Thus a good scheme is to change the first line to "if the number of
items is less than 10 return". After the quicksort, insertion sort can be used
once on the whole array to finish the job - this will be quick since the array
is nearly sorted already. Also, one of the two recursive calls in the quicksort
algorithm can easily be replaced by a loop.
3.2 Divide & Conquer - Multiplying Polynomials
----------------------------------------------
As another example, consider multiplying polynomials, say with n coefficients.
p = 1 + 2*x + 3*x^2 + 4*x^3 + 5*x^4 + 6*x^5
q = 6 + 5*x + 4*x^2 + 3*x^3 + 2*x^4 + 1*x^5
The obvious algorithm involves multiplying each term of one poly by each term
of the other, giving an O(n^2) algorithm. The divide and conquer approach
suggests splitting each polynomial in two and reducing the problem to that of
multiplying polynomials of length n/2:
p = 1+2*x+3*x^2 + x^3*(4+5*x+6*x^2)
    -----------        -----------
         a                  b
q = 6+5*x+4*x^2 + x^3*(3+2*x+1*x^2)
    -----------        -----------
         c                  d
The product p*q is:
a*c + x^3*(a*d + b*c) + x^6*(b*d)
If we have to perform all 4 multiplications a*c, a*d, b*c, b*d then we have
gained nothing. However,
a*d + b*c = (a + b) * (c + d) - a*c - b*d
and so the product p*q can be calculated using only three half-length
multiplications, namely a*c, b*d and (a+b)*(c+d). The additions etc. take O(n)
time as they can be done coefficient by coefficient, and so the time taken is:
f(n) = 3*f(n/2) + O(n) => f(n) = O(n^log(3)) = O(n^1.59)
This is better than O(n^2), but only for large n because of the overheads.
There is a O(n*log(n)*log(log(n))) algorithm for multiplying large polynomials
based on the Fast Fourier Transform, but it is too complex to be practical
except for VERY large problems.
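A Python sketch of the three-multiplication scheme, with polynomials held as lists
of coefficients (lowest power first); the helper names poly_add and poly_mul are my
own:
   def poly_add(p, q):
       # Coefficient-by-coefficient addition.
       n = max(len(p), len(q))
       return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
               for i in range(n)]
   def poly_mul(p, q):
       # Divide & conquer multiplication using three half-length products.
       n = max(len(p), len(q))
       if n <= 1:
           return [p[0] * q[0]] if p and q else []
       k = n // 2
       a, b = p[:k], p[k:]            # p = a + x^k * b
       c, d = q[:k], q[k:]            # q = c + x^k * d
       ac, bd = poly_mul(a, c), poly_mul(b, d)
       mid = poly_mul(poly_add(a, b), poly_add(c, d))   # (a+b)*(c+d)
       # p*q = a*c + x^k*((a+b)*(c+d) - a*c - b*d) + x^(2k)*(b*d)
       result = [0] * (len(p) + len(q) - 1)
       for i, v in enumerate(ac):
           result[i] += v
           result[i + k] -= v
       for i, v in enumerate(bd):
           result[i + k] -= v
           result[i + 2 * k] += v
       for i, v in enumerate(mid):
           result[i + k] += v
       return result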
For matrix calculations it has been shown that the costs of multiplying two
matrices, inverting a matrix, finding the determinant of a matrix etc. are all
the same - a fast algorithm for one results in fast algorithms for the others.
The normal multiplication method for NXN matrices takes O(N^3) time.
There is a divide & conquer method called Strassen's algorithm for multiplying
large NXN matrices in O(N^log7) = O(N^2.81...) based on the fact that 2 by 2
matrices can be multiplied using only seven multiplications instead of the
usual 8. It becomes noticeably better than the standard method at around N=100.
There is a better algorithm still (about O(N^2.795)), but the best possible
power of N is not known.
Note that the problem size is, strictly speaking, n = N^2 and so the standard
method is O(n^(3/2)) = O(n^1.5) and Strassen's is O(n^((log7)/2)) = O(n^1.40...).
3.3 Divide & Conquer - Graphics
-------------------------------
As another example, consider Warnock's algorithm for producing a
two-dimensional picture from 3-dimensional information (the hidden surface
problem). The aim is (for example) to produce a picture of a cube:
-----------------------------------
|                                 |
|        ___________             |
|       /__________/|            |
|       |          ||            |
|       |          ||            |   (six squares)
|       |          ||            |
|       |__________|/            |
|                                 |
|                                 |
|                                 |
-----------------------------------
given a description of the position of the cube, and the viewing position
and direction. Suppose the cube is represented as six squares of different
colours, Front, Back, Top, Bottom, Left, and Right. The viewing rectangle is
split into 4 equal pieces, and the squares are also cut into pieces.
------------------|-----------------
|   (4 pieces)    |   (4 pieces)   |
|          _______|____            |
|         /_______|___/|           |
|         |       |   ||           |
|         |       |   ||           |
------------------|-----------------
|         |       |   ||           |
|         |_______|___|/           |
|                 |                |
|   (1 piece)     |   (4 pieces)   |
|                 |                |
------------------|-----------------
Any piece which is completely covered by another piece can be discarded (see
lower left corner). As this process is repeated, a rectangle will eventually
have only one piece (or none) in it, and so can easily be "coloured in" on
the view screen. If a rectangle becomes very small without becoming simple
enough to colour in a single colour, it can be coloured by some simple
heuristic.
Algorithms of this type are very common and useful in graphics, particularly
as a pre-processing stage, or for use on multiple processors.
3.4 Removing Recursion
----------------------
Recursion can always be removed from an algorithm if desired. This removes
the cost of the procedure calls involved. Advice - always remove recursion
if you can do it without using your own stack. If you have to use your own
stack, you might make the program slower, since the compiler may have access
to a special fast hardware stack. Removing recursion only changes the speed
of an algorithm by a constant factor, so it does not change the analysis.
The easiest kind of recursion to remove is called HEAD- or TAIL-RECURSION,
where there is a recursive call that appears very near the beginning or end
of the procedure. For example the simplest cases are:
proc (n)                          proc (n)
--------                          --------
if n=0 return        becomes      for i := 1 to n do
proc (n-1)                            etc.....
etc.....
proc (n)                          proc (n)
--------                          --------
if n=0 return        becomes      for i := n downto 1 do
etc.....                              etc.....
proc (n-1)
These assume that n is not altered during the procedure. As another example,
the last call in quicksort can be removed as follows:
Sort items a[i]...a[j]
----------------------
1. while i < j (number of items > 1)
2. arrange for a[i]...a[k] to be less than a[k+1]...a[j]
3. sort a[i]...a[k]
4. i := k+1 (instead of: sort a[k+1]...a[j])
More complicated recursions can always be removed in the same way that
a compiler does, i.e. by using a stack of local variables.
This can often be thought of as "things left to be done".
For example Quicksort becomes (using a stack of ranges):
Sort a stack of ranges
----------------------
while the stack is not empty do
take a range i,j from the stack
while i < j
arrange for a[i]...a[k] < a[k+1]...a[j]
add i,k to the stack
i := k+1
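A Python sketch of this stack-based quicksort; for brevity it uses a simple
Lomuto-style split with v = a[j] (so the pivot ends up in its final place at
position k) rather than the two-pointer split given earlier:
   def split(a, i, j):
       # Rearrange a[i..j] around v = a[j]; afterwards a[i..k-1] <= v = a[k] < a[k+1..j].
       v = a[j]
       k = i
       for m in range(i, j):
           if a[m] <= v:
               a[k], a[m] = a[m], a[k]
               k += 1
       a[k], a[j] = a[j], a[k]
       return k
   def quicksort_iterative(a):
       # Recursion removed: a stack of (i, j) ranges records what is left to be done.
       stack = [(0, len(a) - 1)]
       while stack:
           i, j = stack.pop()
           while i < j:
               k = split(a, i, j)
               stack.append((k + 1, j))   # remember the right part for later
               j = k - 1                  # keep working on the left part
       return a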
3.5 Dynamic programming
-----------------------
This is a fancy name for a simple idea, again built on recursion. The idea is
to speed up algorithms at the expense of space by keeping a table of previous
smaller results.
For example, in calculating factorial(n), it helps to keep an array of results,
up to the largest previously requested, say r. When factorial(n) is requested,
if n <= r, then the value in the array is returned. If n > r, then entries r+1
to n are filled in, using one multiplication each. This speeds the algorithm up
from O(n) to an average of O(1).
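A minimal Python sketch of this tabulated factorial:
   _fact = [1]                    # _fact[i] holds factorial(i); grows as needed
   def factorial(n):
       # Fill in any missing entries (one multiplication each), then look up.
       while len(_fact) <= n:
           _fact.append(_fact[-1] * len(_fact))
       return _fact[n]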
Another example is pagination. Suppose there is a paragraph of text which
consists of n words which must be split into lines in the best possible way.
Suppose we have an array w[1]...w[n] of words. We are given a function
LineCost(i,j) which measures the cost of putting words w[i]...w[j] on one line.
The cost is high if the result would be aesthetically un-pleasant, and
(effectively) infinite if it would be impossible. A straightforward recursive
design gives:
To find min cost of paragraphing words w[1]...w[n]
--------------------------------------------------
for i := 1 to n-1 do
find min cost of paragraphing w[1]...w[i]
add LineCost(i+1,n)
return the minimum of the n-1 costs found
In other words, for each i we find the best way of splitting the first i words,
and putting the remaining words on one more line. If it were implemented
directly as a recursive procedure, the complexity would be roughly
f(n) = f(1)+f(2)+f(3)+...+f(n-1) = O(1+1+2+4+8+16+...+2^n) = O(2^n)
If instead we keep a table of smaller results, so that ParaCost[i] becomes a
table lookup instead of a recursive call, we get:
To find min cost of paragraphing words w[1]...w[n]
--------------------------------------------------
for j := 1 to n do
for i := 1 to j-1 do
find ParaCost[i] + LineCost(i+1,j)
ParaCost[j] := minimum of the costs found
return ParaCost[n]
Now f(n) = O(n^3), assuming that LineCost takes time O(n). This is the approach
which the TeX typesetting package takes.
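A Python sketch of the tabulated version; here line_cost(i, j) is an assumed
function giving the cost of putting words[i..j] (0-based, inclusive) on one line:
   def paragraph_cost(words, line_cost):
       # para[j] = minimum cost of paragraphing the first j words; para[0] = 0.
       n = len(words)
       para = [0.0] * (n + 1)
       for j in range(1, n + 1):
           # i = 0 covers the case of all the first j words on a single line.
           para[j] = min(para[i] + line_cost(i, j - 1) for i in range(j))
       return para[n]
Recording, for each j, the i that achieved the minimum would allow the actual line
breaks to be recovered as well as the cost.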
3.6 Dynamic Programming - Knapsack
----------------------------------
Another example is the knapsack problem. Suppose there are items of n types
varying in size and value, and a knapsack of size K, e.g.
                          size each (say cc)    value each (say $)
stack of gold bars                12                     24
heap of silver coins               7                     13
tree of banknotes                  2                      1
knapsack                          28
What is the greatest value of items that can be put in the knapsack? Any number
of items of one type can be used.
An approximate answer can be obtained using a "greedy" algorithm by starting
with the items which are most valuable for their size. In the above example,
gold is best (2 $/cc) then silver (13/7 $/cc) then notes (1/2 $/cc). To fill
the knapsack, we put in as much gold as possible (2 bars) then silver coins
(none) then notes (2) for a total value of 48+2 = 50. Unfortunately, this
algorithm does not give the best answer, as it is better to put in 4 silver
coins for a total value of 4*13 = 52. For an exact answer, we must turn to
recursion. There are n choices of first item, and for each choice we find the
best way of filling the remaining space. Then we take the maximum over the n
choices.
To find the max value of a knapsack of size K
---------------------------------------------
for i := 1 to n do
find value[i] plus
max value of a knapsack of size (K-size[i])
return the maximum of the values found
The time taken f(n) is again at least a sum of n previous values f(i), giving
f(n) = O(2^n). Again if we tabulate the answers as knapsack[j], we get:
To find max value of knapsack of size K
---------------------------------------
for j := 1 to K do
for i := 1 to n do
if size[i] <= j then find value[i] + knapsack[j-size[i]]
knapsack[j] := maximum value found
return knapsack[K]
which takes O(K*n) time. If we keep track of the best item as well as the max
value at each stage, we can construct the optimum contents as well as the max
value of the contents. The method can be adapted to the case when there are a
limited number of items of each type.
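A Python sketch of the table-filling version, which also records the best item at
each size so that the optimum contents can be reconstructed (size and value are
parallel lists):
   def knapsack(K, size, value):
       # best[j] = greatest value that fits in a knapsack of size j.
       best = [0] * (K + 1)
       item = [None] * (K + 1)          # best first item for each size
       for j in range(1, K + 1):
           for i in range(len(size)):
               if size[i] <= j and value[i] + best[j - size[i]] > best[j]:
                   best[j] = value[i] + best[j - size[i]]
                   item[j] = i
       return best[K], item
   # The example above: knapsack(28, [12, 7, 2], [24, 13, 1])[0] == 52
   # (four silver coins).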
Note that this method is proportional to K, and is no good if K is large, say
exponential in n. If the numbers in the problem are real instead of integer,
this is equivalent to K being large.
3.7 Dynamic Programming - Sequences of Matrices
-----------------------------------------------
Another example comes from multiplying matrices of different sizes. Suppose Mab
stands for a matrix with 'a' rows and 'b' columns. The cost of multiplying Mab
by Mbc is a*b*c (since there are 'a' rows of Mab to multiply by 'c' columns of
Mbc, and each row/col has 'b' items in it). The cost of multiplying many
matrices depends on the order. If a=2, b=3, c=1, d=4 then
(Mab * Mbc) * Mcd costs a*b*c + a*c*d = 14
Mab * (Mbc * Mcd) costs b*c*d + a*b*d = 36
so the first order is better. Suppose we know the size of each of n matrices
M1...Mn, and we want to find the minimum cost of multiplying them all together,
i.e. of calculating M1*M2*...*Mn. A recursive design gives:
To find min cost of multiplying M1*M2*...*Mn
--------------------------------------------
for i := 1 to n-1 do
find min cost of multiplying M1*...*Mi
find min cost of multiplying Mi+1*...*Mn
add these to the cost of multiplying (M1*...*Mi) by (Mi+1*...*Mn)
return the minimum of these n-1 costs
A two-dimensional table cost[i,j] is needed to avoid an exponential algorithm.
Cost[i,j] is the cost of multiplying Mi*...*Mj. When calculating cost[i,j], we
need to have available previous values of cost[i,k] and cost[k+1,j].
We can ensure this by filling in cost[i,j] in order of increasing j-i. First we
fill in the diagonal i=j. Second we fill in the diagonal j-i=1. Third, we fill
in the diagonal j-i=2, and so on.
             j -->
           ----------------
       i  |  1  2  3  4  5
       |  |     1  2  3  4
       |  |        1  2  3
       v  |           1  2
          |              1
To find min cost of multiplying M1*...*Mn
-----------------------------------------
initialise cost[i,i] to zero
for L := 1 to n-1 do
for i := 1 to n-L do
calculate cost[i,i+L] as follows
for j := i to i+L-1 do
calculate cost[i,j] + cost[j+1,i+L]
add cost of multiplying (Mi*...*Mj) by (Mj+1*...*Mi+L)
cost[i,i+L] := min of values found
return cost[1,n]
This gives us an O(n^3) algorithm. Again, if we record with cost[i,j] the best
place to split Mi...Mj, we can reconstruct an optimum order of multiplications.
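A Python sketch of this table-filling scheme; the matrix sizes are passed as a
list dims, where Mi has dims[i-1] rows and dims[i] columns:
   def chain_cost(dims):
       # cost[i][j] = minimum cost of multiplying Mi*...*Mj (1-based indices).
       n = len(dims) - 1                       # number of matrices
       cost = [[0] * (n + 1) for _ in range(n + 1)]
       for L in range(1, n):                   # L = j - i, in increasing order
           for i in range(1, n - L + 1):
               j = i + L
               cost[i][j] = min(cost[i][k] + cost[k + 1][j]
                                + dims[i - 1] * dims[k] * dims[j]
                                for k in range(i, j))
       return cost[1][n]
   # The example above: chain_cost([2, 3, 1, 4]) == 14.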
3.8 Dynamic Programming - Optimal Search Trees
----------------------------------------------
Suppose we have a set of strings s[1]...s[n] in order, with known frequencies
of occurrence f[1]...f[n]:
s:    A    B    C    D    E    F    G
f:   .20  .10  .05  .15  .30  .15  .05
We know that there will be a large number of searches for these strings, the
strings appearing with the frequencies given (e.g. 20% of the searches are for
string A). We want to create a binary search tree from them such as
              D           --- 1 comparison
             / \
            /   \
           A     F        --- 2 comparisons
            \   / \
             B E   G      --- 3 comparisons
              \
               C          --- 4 comparisons
before searching begins. We want the tree with minimum average search time.
We measure the cost of a tree by multiplying the number of comparisons needed
to find a string (distance from the root + 1) by the frequency for the string
and summing. We want the tree with smallest cost. (This is like Huffman trees,
except we must preserve the order). All nodes to the left of a node must be
earlier, and all nodes to the right must be later.
If we choose s[i] as root, its left subtree must contain s[1]...s[i-1] and its
right subtree must contain s[i+1]...s[n]. Thus we try each string s[1]...s[n]
as root. Let weight[i,j] be the sum of the frequencies f[i]+...+f[j]. A
recursive design gives:
Find optimum time for s[i]...s[j]
---------------------------------
for k := i to j do
find optimum time t1 for s[i]...s[k-1]
find optimum time t2 for s[k+1]...s[j]
calculate t1 + t2 + weight[i,j]
return the minimum of the times found
Note that every string in the tree takes one more comparison than it does in
the two subtrees, so the total time is t1+(f[i]+...+f[k-1]) for the left subtree,
plus f[k] for the root, plus t2+(f[k+1]+...+f[j]) for the right subtree, giving a
total of t1+t2+weight[i,j].
Again, this gives an exponential algorithm, and we need a two-dimensional table
cost[i,j] = "optimum time for s[i]...s[j]". Again, we must fill it in order of
increasing length j-i.
Find optimum time for s[1]...s[n]
---------------------------------
initialise cost[i,i] to f[i] (one comparison) and treat empty ranges as zero
for L := 1 to n-1 do
for i := 1 to n-L do
find cost[i,i+L] as follows
for j := i to i+L do
t1 := cost[i,j-1]
t2 := cost[j+1,i+L]
find f[j] + t1 + weight[i,j-1] + t2 + weight[j+1,i+L]
cost[i,i+L] := minimum of the costs found
return cost[1,n]
Again, this gives an O(n^3) algorithm.
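A Python sketch of the optimal search tree calculation (0-based indices, with
prefix sums so that each weight(i, j) takes O(1) time; the function names are my
own):
   def optimal_tree_cost(f):
       # cost[i][j] = optimum expected search time for strings s[i]...s[j].
       n = len(f)
       pre = [0.0] * (n + 1)                   # prefix sums of the frequencies
       for i in range(n):
           pre[i + 1] = pre[i] + f[i]
       def weight(i, j):
           return pre[j + 1] - pre[i]
       cost = [[0.0] * n for _ in range(n)]
       for L in range(n):                      # L = j - i, in increasing order
           for i in range(n - L):
               j = i + L
               cost[i][j] = weight(i, j) + min(
                   (cost[i][k - 1] if k > i else 0.0) +
                   (cost[k + 1][j] if k < j else 0.0)
                   for k in range(i, j + 1))
       return cost[0][n - 1]
   # The example frequencies: optimal_tree_cost([.20, .10, .05, .15, .30, .15, .05])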
3.9 Backtracking
----------------
Some problems are intractable, that is, there is no known efficient algorithm,
and we have to resort to an exhaustive search for possible solutions. Such
algorithms are generally exponential, and suffer from "combinatorial
explosion". That is, there is a problem size, say 20 items, beyond which there
is no practical hope of solution. To cope with 21 items would require a
computer 20 times as powerful, and to cope with 22 items would require 400
times the power. The aim in designing exhaustive search algorithms is to put
off the explosion as long as possible. For example, if we can improve an O(2^n)
algorithm to O(2^(n/2)), we may put off the explosion from size 20 to size 40.
The most famous problem of this kind is the Travelling Salesperson Problem
(which always used to be called the Travelling Salesman Problem and is now
sometimes called the Weary Traveller Problem). Given a number of towns and
inter-town distances, find the shortest route which visits each town once.
For example:
        A     B     C     D     E  ....
   A          5     6     7     8
   B                4     3     3
   C                      2     4
   D                            3
   E
...
The statement of the problem says that the traveller must visit each town only
once. There are circumstances where you might think it better to visit a town
more than once. Suppose town T is visited twice, the second time being between
visits to A and B (...T...ATB...). Then this must be because the sum of
distances AT + TB is smaller than distance AB, otherwise it would be better to
leave out the second visit. But that means that the shortest distance from A to
B (which just HAPPENS to go through T) hasn't been recorded properly. Thus we
may assume that the triangle inequality holds (direct routes are always
shortest) and that the traveller visits towns only once.
The most obvious algorithm for this is to generate all the n! permutations of
the towns, calculate the length of each tour, and keep track of the best one so
far. This is O(n!), but remember that success is measured by cutoff point and
not by asymptotic behaviour.
There are obviously gains to be made through symmetry, but exploiting symmetry is
rather specific to this problem. A more general technique is PRUNING. Suppose we generate the
permutations as follows:
begin with each town A,B,C,D... in turn, and for each
generate all permutations of the n-1 remaining items
giving
ABCD...
ABDC...
ACBD...
ACDB...
...
When all possibilities of one type have been exhausted, we "backtrack" and try
the next type. Note this is just another form of recursion. We may as well
keep track of the length of the partial path. Suppose the best tour found so
far is 20 and the length of ACBD is >= 20. Then we can ignore all permutations
beginning ACBD... This is called pruning because it is cutting off a whole
subtree of the search tree:
              .
             / \
            A
           / \
          AC
          /
        ACB
        /
      ACBD          (this subtree is pruned)
      //\\
Pruning is more successful if an answer close to optimal is found early.
Thus it pays to make the order of search one which makes this likely.
For example, having decided on ACBD, the next town chosen could be the one
closest to D. Note that the majority of nodes in a tree are near the bottom,
so pruning can be very effective. It may be worth applying a good heuristic
algorithm first to get a good solution to aid the pruning.
Using very sophisticated techniques, the explosion point for this problem
is around 100-200.
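A Python sketch of the exhaustive search with pruning; dist is assumed to be a
full n-by-n matrix of inter-town distances, and the tour starts and ends at town 0.
Trying nearer towns first makes a good tour likely to turn up early, which helps
the pruning:
   def shortest_tour(dist):
       n = len(dist)
       best = [float('inf')]                   # length of the best tour so far
       def extend(last, length, unvisited):
           if length >= best[0]:               # prune: cannot beat the best tour
               return
           if not unvisited:
               best[0] = min(best[0], length + dist[last][0])
               return
           # try the nearest remaining towns first
           for town in sorted(unvisited, key=lambda t: dist[last][t]):
               extend(town, length + dist[last][town], unvisited - {town})
       extend(0, 0, frozenset(range(1, n)))
       return best[0]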
3.10 Heuristic Methods For Intractable Problems
-----------------------------------------------
If faced with intractable problems as above, with problem sizes over the
explosion point, we have to resort to approximate methods. These are difficult
to design and almost impossible to analyse.
Perhaps the commonest method is to look for a LOCAL OPTIMUM. In the travelling
salesperson case, we could begin with a reasonable solution and make small
changes. When no more small changes can improve the solution, we have a local
optimum - no solution "close by" is any better. One quite successful method is
to split the current solution into (say) 3 paths and investigate all possible
ways of reconnecting the paths:
ABCDEFGHI JKLMNOPQR STUVWXYZ
ABCDEFGHI JKLMNOPQR ZYXWVUTS
ABCDEFGHI RQPONMLKJ STUVWXYZ
ABCDEFGHI RQPONMLKJ ZYXWVUTS
ABCDEFGHI STUVWXYZ JKLMNOPQR
ABCDEFGHI STUVWXYZ RQPONMLKJ
ABCDEFGHI ZYXWVUTS JKLMNOPQR
ABCDEFGHI ZYXWVUTS RQPONMLKJ
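A Python sketch of one reconnection step: cut the tour into three paths at
positions i and j and try the eight reconnections listed above (the two movable
paths taken in either order, each forwards or reversed), keeping the shortest. A
full local search would repeat this for many choices of cut points until no
improvement is found:
   def tour_length(tour, dist):
       # Length of the closed tour (a list of towns), returning to the start.
       return sum(dist[tour[k]][tour[(k + 1) % len(tour)]]
                  for k in range(len(tour)))
   def best_reconnection(tour, i, j, dist):
       # Split into A = tour[:i], B = tour[i:j], C = tour[j:]  (0 < i < j < len(tour)).
       A, B, C = tour[:i], tour[i:j], tour[j:]
       best = tour
       for first, second in ((B, C), (C, B)):
           for s1 in (first, first[::-1]):
               for s2 in (second, second[::-1]):
                   candidate = A + s1 + s2
                   if tour_length(candidate, dist) < tour_length(best, dist):
                       best = candidate
       return best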
It is difficult to say how well the method performs as there is no practical
method of finding the best solution to compare against.