CHAPTER  7

                            MISCELLANEOUS ALGORITHMS



7.0 Testing For Prime Numbers
-----------------------------

There are a number of algorithms where randomness plays an important part in
solving a perfectly precise problem. Perhaps the most interesting is the prime
number problem.

Suppose we want to tell whether a positive integer p is prime.  The sieve of
Eratosthenes is essentially a dynamic programming technique in which we
remember the numbers divisible by small primes while working on the next.  It
requires one bit for each number up to p and is suitable for numbers up to
several million (say). What if p has several hundred digits? The sieve clearly
takes at least O(p) time, in fact about O(p log log p).  The size of this
problem is the amount of
information needed to specify p, i.e. the number of digits of p, i.e.  n = log
p. We want a fast algorithm, i.e. something polynomial in n = log p.  There is
practical interest in this problem in the area of encryption.
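
For such small limits the sieve is easily programmed.  A minimal Pascal
sketch (the program name and the constant "limit" are illustrative only) is:


      { Sieve of Eratosthenes: remember which numbers are divisible by the }
      { primes found so far, one boolean per number up to the limit.       }
      program sieve;
      const limit = 1000;
      var composite: array [2..limit] of boolean;
          i, j: integer;
      begin
         for i := 2 to limit do composite[i] := false;
         for i := 2 to limit do
            if not composite[i] then
            begin
               write(i, ' ');              { i has no smaller factor, so it is prime }
               j := i + i;
               while j <= limit do
               begin
                  composite[j] := true;    { mark the multiples of i }
                  j := j + i
               end
            end;
         writeln
      end.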

One of the fastest known methods is a probabilistic one based on Fermat's
little theorem. See Lehmann's paper in SIAM J. Comput. vol. 11.  If p is prime
and 1 <= a <= p-1 then


      a^(p-1)  =  1 (mod p)

   =>   a^((p-1)/2)  =  +1 or -1 (mod p)


(The second line follows because x = a^((p-1)/2) satisfies x^2 = a^(p-1) = 1
(mod p), and modulo a prime the only square roots of 1 are +1 and -1.)

For example if p = 11 then (p-1)/2 = 5 and:


      a:   1  2  3  4  5  6  7  8  9  10
     a^5:  1 -1  1  1  1 -1 -1 -1  1  -1


If p = 15 then (p-1)/2 = 7 and:


      a:   1  2  3  4  5  6  7  8  9  10  11  12  13  14
     a^7:  1 -7 -3  4  5  6 -2  2 -6  -5  -4   3   7  -1
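
Such a row is easy to produce by machine.  A minimal Pascal sketch, assuming
p is small enough for ordinary integers (set the illustrative constant p to
11 or 15), is:


      { Print a and a^((p-1)/2) mod p for a = 1..p-1, giving the residue }
      { nearest to zero, so that p-1 is reported as -1 and so on.        }
      program halfpowers;
      const p = 11;
      var a, k, r: integer;
      begin
         for a := 1 to p-1 do
         begin
            r := 1;
            for k := 1 to (p-1) div 2 do
               r := (r * a) mod p;        { build up a^((p-1)/2) mod p }
            if r > p div 2 then
               r := r - p;                { e.g. 10 becomes -1 when p = 11 }
            writeln(a:4, r:6)
         end
      end.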


Some nasty number theory ensures that the following probabilistic algorithm is
correct.


      To test if p is prime
      ---------------------
      If p = 2 give answer PRIME
      If p is even give answer COMPOSITE
      Choose 20 numbers (say) at random from 1 to p-1
      For each number a calculate a^((p-1)/2) (mod p)
      If any are not +1 or -1, give answer COMPOSITE
      If all are +1, give answer COMPOSITE (prob. of being wrong 2^-20)
      Otherwise give answer PRIME (prob. of being wrong 2^-20)
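
In Pascal the test might be sketched as follows, assuming p is small enough
that p*p fits in an ordinary integer (the real application needs the
multi-precision arithmetic discussed below) and assuming the dialect supplies
a random(n) function returning a random integer in 0..n-1.  The names power
and probablyprime are illustrative.


      { a^m mod p by repeated squaring - O(log m) multiplications }
      function power(a, m, p: integer): integer;
      var r: integer;
      begin
         r := 1;
         while m > 0 do
         begin
            if odd(m) then r := (r * a) mod p;
            a := (a * a) mod p;
            m := m div 2
         end;
         power := r
      end;

      { the probabilistic prime test with k random trials }
      function probablyprime(p: integer): boolean;
      const k = 20;
      var i, a, r: integer;
          allplusone, witness: boolean;
      begin
         if p = 2 then probablyprime := true
         else if not odd(p) then probablyprime := false
         else
         begin
            allplusone := true;           { did every trial give +1? }
            witness := false;             { did some trial give neither +1 nor -1? }
            for i := 1 to k do
            begin
               a := 1 + random(p-1);      { random a in 1..p-1 }
               r := power(a, (p-1) div 2, p);
               if (r <> 1) and (r <> p-1) then witness := true;
               if r <> 1 then allplusone := false
            end;
            { a witness, or all answers +1, means COMPOSITE; otherwise PRIME }
            probablyprime := not (witness or allplusone)
         end
      end;


The power function uses the repeated squaring idea discussed below.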




If k numbers are used instead of 20, the probability of a wrong answer is
2^-k.  If a theoretically sound random number generator is used, and the
probability of a wrong answer made less than the probability of a bug in the
program, the algorithm is as good as a non-probabilistic one.

All numbers, including intermediate results, can be kept modulo p, so they all
have log p digits.  The question remaining is how to calculate a^m
efficiently.  Using m multiplications is O(m) - too slow.  Divide and conquer
suggests calculating a^(m/2) and squaring.  Indeed a^(2^k) takes k squarings,
i.e. log m squarings.  If m is not a power of 2, it is a sum of powers of 2:


      m = 42
      m = 32 + 8 + 2 = 2^5 + 2^3 + 2^1


In 5 squarings, we can calculate a, a^2, a^4, a^8, a^16, a^32, and with 2
mults we have a^42 = a^(32+8+2) = a^32 * a^8 * a^2.  In general we can use
O(log m) squarings and O(log m) multiplications.  A long multiplication can be
done in O((log p)^2) steps, making O((log p)^3) overall.

Note that the algorithm does not factorise composite numbers.  There is no
known polynomial algorithm for that, and indeed some modern public key
cryptography techniques depend on the fact that you cannot factorise a number
of several hundred digits.

Finally, note that there is an arbitrary precision number package on unix
which can be used in programs, or interactively through the "bc" or "dc"
commands.



7.1 String Searching
--------------------

Suppose we are searching for a word of w characters in a text of n characters
held in memory (e.g. in an editor).  The obvious algorithm is (essentially):


      To find word[1..w] in text[1..n]
      --------------------------------
      for i := 0 to n-w do
         for j := 1 to w do
            compare word[j] with text[i+j]


No better method was found until 1970, and no algorithm published until 1977.
The idea is to scan the text faster, using knowledge from the mismatches
found.  The easiest method to understand is a variant of the Boyer-Moore
algorithm.  The idea is to compare the last letter of the word first.  Suppose
we are looking for STING in a text:


      A STRING SEARCH
      STING
          ^


The R mismatches.  As there is no R in STING, STING cannot possibly match
" STRI", "STRIN", "TRING", or "RING ", so we can skip 5 places forward for the
next comparison:


      A STRING SEARCH
           STING
               ^

The S mismatches, but is contained in STING, so we can only skip 4 places:


      A STRING SEARCH
               STING
                   ^


This leads to a program:


      word: packed array [1..w] of char;
      text: packed array [1..n] of char;
      skip: array [char] of integer;

      procedure initialise;
      var j: integer; c: char;
      begin
         { skip[c] is the distance from the rightmost occurrence of c in }
         { the word to the end of the word (w if c does not occur)       }
         for c := chr(0) to chr(255) do skip[c] := w;
         for j := 1 to w do skip[word[j]] := w-j;
      end;

      function search: integer;
      var i,j: integer;
      begin
         i := w; j := w;
         repeat
            if text[i] = word[j] then
               begin i := i - 1; j := j - 1 end
            else
               begin
                  { slide the word at least one place to the right; further }
                  { if the mismatched text character text[i] allows it      }
                  if skip[text[i]] > w-j+1 then
                     i := i + skip[text[i]]
                  else
                     i := i + (w-j+1);
                  j := w
               end
         until (j < 1) or (i > n);
         search := i + 1                    { a result > n means no match }
      end;


These methods are typically O(w+n) rather than O(wn).  Note that the
preprocessing of the word is worth it.  If the text is fixed, and many
searches are to be done in it, it may be worth preprocessing the text using
some hash-table or binary tree method.  Note that there are variants with less
overhead, and there are variants which never "back up" in the text, suitable
for file-based searching or for parallel processors.
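
As a usage sketch, assuming the declarations, initialise and search given
above, the STING example might be driven as follows (the program name and the
constants are illustrative only):


      program stingdemo;
      const w = 5; n = 15;
      var word: packed array [1..w] of char;
          text: packed array [1..n] of char;
          skip: array [char] of integer;
          pos: integer;

      { initialise and search exactly as above }

      begin
         word := 'STING';
         text := 'A STRING SEARCH';
         initialise;
         pos := search;
         if pos <= n then
            writeln('found at position ', pos)
         else
            writeln('not found')
      end.


Here the word does not occur in the text: the search makes just three
character comparisons, one for each alignment pictured above, and then the
pattern runs off the end of the text.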