CHAPTER  4
                               NETWORK ALGORITHMS
4.0 Definitions
---------------
A network is a set of NODES (or vertices) joined by EDGES (or directed edges or
arcs or links). Each edge goes from one node to another, or possibly from a
node to itself. Each edge may have a LABEL. (For mathematicians, this is a
labelled directed graph with loops).
                            ____
                           /    |
          4         2     /    5|
      A -----> C <----- E <-----
      |        ^        |
      |        |        |
     5|       6|       3|
      |        |        |
      v   3    |    2   v
      B -----> D <----- F
A network may be used to represent airline routes between towns, electronic
circuit layouts, jobs in a manufacturing process where some jobs must be
finished before others can be started, flow through a network of pipes, etc.
The labels on the edges might represent distances, times, capacities etc.  If
edges don't have a direction, e.g. if they represent roads, each road can be
represented as two edges, one in each direction.
A network can be represented as a data structure in two main ways, depending on
whether it is dense or sparse. If it is dense, (most possible edges present),
the LABEL MATRIX (adjacency matrix) is suitable. If the network has N nodes,
this is an N by N square matrix of labels. The i,j entry "label[i,j]"
represents the label on the edge from node i to node j.
Some special value must be chosen to represent edges which do not exist - e.g.
a large (infinite) number if the labels are distances, or zero if the labels
are capacities. If the above network represents capacities, then it has label
matrix:
        A B C D E F
      A 0 5 4 0 0 0
      B 0 0 0 3 0 0
      C 0 0 0 0 0 0
      D 0 0 6 0 0 0
      E 0 0 2 0 5 3
      F 0 0 0 2 0 0
If the network is sparse, then the matrix is sparse, and we can do better by
having a list of the edges out of each node. The list could be an array or
linked list -- we will assume a linked list representation.  For each edge in a
list, we must record its label and its target node.
For example, the above network is represented by the following linked lists.
      A     -> 5B -> 4C
      B     -> 3D
      C     .
      D     -> 6C
      E     -> 2C -> 5E -> 2F
      F     -> 2D
This is just a compact representation of a sparse matrix, and in fact network
theory techniques can be used in the design of sparse matrix algorithms
generally.
What is the size of a network problem for a network with N nodes and E edges?
For a dense network, using the matrix representation, the problem size is
the number of elements in the matrix. For a sparse network, using the linked
list representation, it is the total length of the lists, which is proportional
to the number of edges.
                   DENSE                             SPARSE
                   n = N^2                           n = E
One important special kind of network is the PLANAR NETWORK. This is a network
which can be drawn on a flat piece of paper with no crossings, and is
particularly important for laying out electronic circuits on single-layer
chips. Obviously, the labels and directions of the edges make no difference in
trying to determine whether a network is planar (i.e. can be drawn flat) or
not, and loops make no difference either, so we can remove all these things and
leave ourselves with an ordinary mathematical GRAPH.
The two smallest graphs which are not planar are called "K5" and "K3,3" by
mathematicians. The graph "K5" consists of 5 nodes A,B,C,D,E which are
kompletely konnected -- i.e. every node is connected to every other. The graph
"K3,3" consists of six nodes A,B,C,D,E,F in which the first three A,B,C are all
connected to each of the second three D,E,F (but there are no connections
between A,B,C, or between D,E,F). The graph "K3,3" is sometimes called the
"water, gas, electricity" graph, because the nodes A,B,C can represent the
three mains services, and the nodes D,E,F can represent three houses in a
street to which the services are to be connected. The services cannot be
connected without one service conduit crossing another somewhere.
You should try drawing these two graphs on paper without crossings, and
convince yourself that it can't be done. It turns out ("Kuratowski's theorem")
that any larger graph which can't be drawn flat must "contain" one of these two
simple examples in some way.
In a drawing of a planar graph on a piece of paper, the paper is divided into a
number F of FACES, or regions, including the "outside" face which extends to
the edges of the paper (imagine cutting the paper along all the edges and
counting the pieces). The theory of planar graphs is heavily based on EULER'S
FORMULA which holds for all connected planar graphs (i.e. planar graphs which
are in one piece). Euler's formula is
                       N  -  E  +  F  =  2
which you can remember by noting that the items on the left occur in order of
increasing dimension (numbers of points, lines and surfaces) and that the signs
alternate (and this hints at the n-dimensional generalisation).
One way to see that the formula is true is to imagine measuring the Euler
number (N-E+F) of a graph by simplifying it in easy stages. Imagine removing an
edge between two faces. The two faces become one, and so both E and F drop by
one, leaving the Euler number the same. Repeat this as often as you can. You
are left with edges which have the same face on each side, so that if you
removed one, the graph would fall into two pieces. In fact the graph is now a
TREE. Now you look for a LEAF -- i.e. a node connected by only one edge to only
one other node. Removing the leaf node and its edge doesn't change the Euler
number, so you repeat this as often as possible. You are now left with a single
node, a single face and no edges, so the Euler number is 2.
The most important property of planar graphs (or networks) for us is that they
are sparse. In fact E is O(N) rather than O(N^2). The maximum number of edges
in a planar graph is 3*N-6 (apart from a couple of trivial exceptions - can you
find them?). A planar network can have two directed edges between each two
nodes, and a loop at every node, giving a maximum of 7*N-12 edges.
                PLANAR GRAPH                  PLANAR NETWORK
                E <= 3*N - 6                  E <= 7*N - 12
To prove the planar graph result, first make sure that your planar graph has as
many edges as possible by adding diagonals to any face with more than three
sides, until all the faces have three sides (including the "outside" one). If
we count the total number of edges by multiplying the number of faces by three,
we have actually counted each edge twice (once for the face on either side), so
the total number of edges is E = 3*F/2. Plugging this into Euler's formula
gives E = 3*N-6, so our original graph had E <= 3*N-6.
So we always use the sparse linked-list representation for planar graphs or
networks, with problem size n = N.  For example, for a circuit with 1000 nodes,
the matrix would need 4 M-bytes, whereas the linked list requires 150 K-bytes,
assuming 4 bytes per pointer or integer. Times taken to process planar networks
are similarly shorter, simply because it takes a lot longer to scan 4 M-bytes
of data than to scan 150 K-bytes of data.
4.1 Scanning Networks
---------------------
The simplest way of scanning a network uses a recursive design.  Suppose we
want to find (and mark) all the nodes we can reach from a particular node by
following edges. The following design will do:
   To scan from node x
   -------------------
   mark x
   for each edge out of x, to a node y, do
      if y is unmarked then scan from y
                            ____
                           /    |
          4         2     /    5|
      A -----> C <----- E <-----
      |        ^        |
      |        |        |
     5|       6|       3|
      |        |        |
      v   3    |    2   v
      B -----> D <----- F
To find all nodes reachable from E, we call scan(E). This calls scan(C) which
returns immediately, then calls scan(F). Then scan(F) calls scan(D). As there
are no edges out of D to unmarked nodes (C will have been marked), scan(D)
returns immediately. Then scan(F) returns, and scan(E) returns. The result is
that E,C,F and D have been marked.
We can estimate the time taken with a label matrix representing the network as
follows. Only unmarked nodes are scanned, and then they are immediately marked.
Thus no node is scanned more than once, so O(N) nodes are scanned.  In the case
of a label matrix, scanning the edges out of a node involves looking at all N
elements of the corresponding row of the matrix. Thus the algorithm takes at
most O(N^2) steps, i.e. O(n) time -- linear time.
We can estimate the time for a linked-list representation as follows.  Again,
no node is scanned twice. To scan the edges out of a node, we just run through
the list of edges out of it. The total time is at most the total length of all
the lists, so the time taken is O(E) which is O(n) -- linear time again.
Using a stack gives a DEPTH FIRST scan in which we follow a path as far into
the network as possible before backtracking. If we had used a queue (so we scan
the oldest node first), we would get a BREADTH FIRST scan in which all the
neighbours of the initial node are scanned, then all their neighbours etc.
4.2 Shortest Path Problems
--------------------------
Suppose the labels in a network represent distances, and the object is to find
the shortest distance from one node to another.  There seems to be no better
way of finding the shortest distance from A to B other than finding all the
shortest distances from A to other nodes.
The overall strategy for doing this is as follows. For each node we keep track
of the shortest known distance to it so far. Then we scan the edges and update
the shortest known distances as follows. Suppose we examine an edge from node X
to node Y, and suppose the distances are 5 & 8, and the edge length 2. Then we
update the distance to Y to be 7, since we have found a shorter route.
        5    2    8                         5    2    7
        X ------> Y        ===>             X ------> Y
There is a problem. Consider the following network:
        0     7     00    2     00
        A --------> B --------> D
        |           ^
        |           |
       1|          4|
        |           |
        v          /
        C --------
        00
We start with distances from A to A,B,C,D of 0,00,00,00 (00 means infinite).
After scanning the edges from A, we get distances 0,7,1,00. After scanning the
edges from B, we get 0,7,1,9. Then we scan the edges from C getting 0,5,1,9.
Since the distance to D is actually 7, something has gone wrong. If we had
scanned the edges from C before the edges from B, we would have succeeded.
To fix this, we keep a list of nodes to be scanned, but this time at each stage
we take from the list the node with smallest (current) distance. The most
suitable data structure for the list is a priority queue, nodes with the
shortest distance having the highest priority.  Initially, the priority queue
contains all the nodes, the starting node having distance 0, all others having
infinite distance (maxint, say).  The algorithm is:
   To find shortest distances
   --------------------------
   initialise distances and priority queue
   while priority queue not empty do
      remove node of shortest (current) distance from priority queue, say x
      mark it
      for each edge out of x to an unmarked node y
         if (distance to x) + (length of edge xy) < distance to y
            update distance to y
            shuffle y up to a new position in the priority queue
We can prove that the algorithm works as follows. At each stage, the marked
nodes have their final correct distances recorded. The distances of the rest
are the shortest distances via marked nodes only. When we mark a new node,
we know its shortest distance via previously marked nodes. There can be no
shorter route via unmarked nodes, as we have chosen it to be the closest one.
Thus the node we are marking has its final correct distance recorded.
We can estimate the time taken as follows. For a sparse representation, we can
use the usual "heap" structure in an array to represent the priority queue.
The priority queue has length O(N), so removing or re-positioning an item takes
at most O(log(N)) time. There is at most one removal per node, and one
re-positioning per edge, giving O(N*log(N) + E*log(N)) = O(E*log(N)) time.
Since we can take N to be O(E), this gives us O(E*log(E)) = O(n*log(n)) time.
For a matrix representation Length[1..N,1..N], the above algorithm would repeat
the inner loop N^2 times, giving an O(N^2*log(N)) algorithm, but we can do
better as follows.  We use an array Distance[1..N] which we use as priorities,
a boolean array Marked[1..N] and variables QSize and FrontOfQ. The item at the
front of the queue can be removed from the queue by setting Marked[FrontOfQ] :=
TRUE, but then FrontOfQ needs updating by scanning the queue entries to find
the one with shortest distance. This would take O(N) time giving an O(N^3)
overall time for the algorithm, but the scan can be combined with the scan of
edges out of a node, giving an O(N^2) algorithm. As before, all distances start
out infinite, except Distance[1] = 0, and all nodes are in the queue, with
FrontOfQ = 1.
   To find shortest distances (dense network variation)
   --------------------------
   initialise Distance[1..N], IsInQ[1..N], QSize, FrontOfQ
   while QSize not zero do
      take node x from front of queue   ( x := FrontOfQ; QSize := QSize-1 )
      mark node x                       ( Marked[x] := TRUE )
      for each node y, if y is unmarked then
         if Distance[x] + Length[x,y] < Distance[y] then
            update Distance[y]
         update FrontOfQ                ( if FrontOfQ=x
                                          or Distance[y]<Distance[FrontOfQ]
                                          then FrontOfQ := y )
This is clearly O(N^2) = O(n), so priority first searching is essentially no
more expensive than depth first searching in dense networks, but in sparse
networks it involves an extra factor of log(n).
We can find the shortest paths to each node as well as the shortest distances
by recording for each node not just the current distance to that node, but also
the "predecessor" node which caused that distance to be recorded. When the
algorithm is over, the predecessor nodes can be traced back to the initial node
to find the path.
This general technique of using a priority queue while searching is called
PRIORITY FIRST searching. It generalises breadth first and depth first search,
and is used in a variety of situations. For example in chess playing programs,
possible moves are often searched in order of the strength of the position so
that more promising lines of play are examined first.
4.3 Longest Path Problems
--------------------------
To find the longest path from one node to another in a general network is an
INTRACTABLE problem connected with the Travelling Salesperson problem (see
later).  Often however, the network involved is acyclic (a directed acyclic
graph, or DAG for short), i.e. it has no cycles:
      A -----> C <----- E -----> G
      |        ^        |        ^
      |        |        |        |
      |        |        |        |
      |        |        |        |
      v        |        v        |
      B -----> D <----- F -----> H
Efficient algorithms on DAG's rely on the following fact. A DAG can be
TOPOLOGICALLY SORTED, i.e. sorted into an order in which all the edges "point
forwards".  For mathematicians, a DAG represents a partial order which can be
embedded in a total order.
        ----------------------------------------------->
       |       --------------------------------->       |
       |      |                                  |      |
       E ---> F ---> H ---> G      A ---> B ---> D ---> C
       |                    |      |                    |
        ------------------>         ------------------->
The topological sort algorithm works as follows. Start anywhere and follow
edges until you find a node with no edges out of it (like C). Make this node
the last one in the sorted list, remove it and the edges into it, and
topologically sort the remaining nodes. This crude algorithm is O(N^2). To
improve it, we arrange to take up each search for a node where we left off the
previous search. This gives us an ordinary depth first search in which we add
a node to the sorted list when we backtrack from it. However, not all nodes may
be reachable from the first node we choose, so we may have to do several depth
first searches. The algorithm is:
   Topologically sort a DAG
   ------------------------
   Initialise the sorted list to be empty
   While there is an unmarked node u in the DAG do
      depth first scan from u
   Depth first scan from x
   -----------------------
   mark x
   for each edge out of x, to a node y, do
      if y is unmarked then scan from y
   add x to the front of the sorted list
Check that in the above example, with edges out of a node listed in
alphabetical order, this algorithm causes two depth first scans (from A and E)
and gives the sorted list shown above.  The algorithm is clearly linear, O(n).
To find the longest path from A to Z in a DAG, it is just as quick to find all
the longest paths from other nodes to Z (finding all longest paths from A can
be done best by changing the representation of the network so that each node
has a list of the edges coming in).
   To find the longest paths to Z in a DAG
   ----------------------------------------
   Topologically sort the DAG
   Consider each node x in REVERSE topological order
   For each edge xy out of x calculate (length of xy) + (distance of y)
   Set distance of x to be largest of these
It works because the ordering ensures that y is to the right of x, and so must
have been dealt with already. There is one calculation per edge, so this is
O(E). Variations on this are used in Critical Path Analysis for job scheduling.
4.4 The Network Flow Algorithm
-------------------------------
Suppose we have a network representing flow through pipes, or traffic flow.
Each edge has a label representing the maximum flow along it. There is a source
node A and a sink node Z. We want to achieve the maximum flow from A to Z.
The idea is to start with any feasible flow (i.e. one which respects the
capacities). In this problem, a zero flow will do. Then we find a path from A
to Z along which flows can be increased.
   (Pictures from Sedgewick 436-7: note misprints)
             0/8      2/6     2/4
           -----> B -----> C ----->
          /        \     /         \
         /          \  |/ 0/2       \
        /            \ /--           \
      A               \               F      After ADEBCF
        \            / \__           /
         \          /  |\ 2/3       /
          \        /     \         /
           -----> D -----> E ----->
           2/2       2/5     0/5
The first number is the current flow, the second is the capacity (max flow).
Now after ABCDEF, we get:
           2/8     4/6     2/4
                   2/2
                   2/3
           2/2     4/5     2/5
After ABCF we get:
           4/8     6/6     4/4
                   2/2
                   2/3
           2/2     4/5     2/5
Now there are no paths of the above type along which the flow can be
increased. However, the flow can still be increased along ABEF
as decreasing the flow along EB is equivalent to increasing the flow
along BE. We get:
           6/8     6/6     4/4
                   2/2
                   0/3
           2/2     4/5     4/5
Thus we define a "flow-augmenting path" to be a path of forward and backward
edges from A to Z in which the forward edges are not saturated (do not have
maximum flow) and the backward edges have non-zero flow.  It is easy to work
out the maximum amount by which the flow can be increased along such a path
(min of amounts by which the flow can be increased along each edge). It is now
fairly easy to show that when this procedure cannot be repeated further, we
have a maximum flow. (A search for paths will divide the nodes into "visited"
and "unvisited". The set of edges from the visited to the unvisited nodes forms
a barrier across which no more can flow - called a "min cut".)
There is still the question of what order the paths should be searched for.  If
we were to use a depth first method, we would be likely to find long paths from
A to Z which could be disastrous:
            1000   1000
             --> B -->
           /     |    \
         A       | 1   C
           \     v    /
             --> D -->
            1000   1000
If the paths found were alternately ABDC ADBC, the flow would only be increased
by 1 each time. It is much better to do a PRIORITY FIRST search looking for
paths producing the greatest increase. The number of flow augmenting paths
needed is hard to analyse, but is small in practice with these methods. Each
path needs one scan of the network, O(N^2) for dense networks, O(E*log(N)) for
sparse ones.
4.5 Problems related to Network Flow.
--------------------------------------
Suppose that we have a flow problem which also has capacities on the nodes,
i.e. there is a maximum flow possible through each node as well as each edge.
This can be converted into a standard flow problem by introducing dummy
edges:
             6                                    6
     --->--- X --->---            ---->--- X1 ---->---- X2 ---->----
A problem with multiple sources and sinks can be converted into a
standard problem by adding a "supersource" and "supersink":
                                          10 A ... Y 10
     A --10--> ... --10->Y              /               \
     B --7-->  ... --12->Z           A1 - 7  B ... Z 12 - Z1
     C --5-->                           \
                                          5  C
A problem with undirected edges (flow in either direction) can be represented
by replacing each undirected edge by a pair of directed edges (as usual in
network algorithms).
Suppose each edge has a minimum flow as well as a maximum capacity. This can be
handled in the same way, except that a zero flow is not feasible, and an
initial feasible flow may be hard to find. There is a separate (similar)
algorithm for finding an initial flow, again by changing the network and
applying the flow algorithm to the new network.  (non-exam: not in the books).
In the MATCHING problem, we have a number of applicants and a number of jobs.
Each applicant is qualified only for certain jobs. The problem is to match up
as many pairs as possible. This is a multiple source/sink flow problem:
   (Picture of bipartite network turned into flow problem)
   (Not the only algorithm)
A generalization is the ASSIGNMENT problem in which there is a cost associated
with each edge (representing suitability of applicant for job).  A further
generalization is the TRANSPORTATION problem as follows.  We have a number of
factories each producing a maximum amount called its supply. We have a number
of warehouses, each with a (minimum) requirement or demand. There is a cost
associated with transporting a unit of product from a particular factory to
particular warehouse. (Assignment is transportation with unit supplies and
demands). This is again a multiple source/sink flow problem in which we want a
minimum-cost maximum-flow. The flow algorithm can be improved to cope, or there
is a "Hungarian" algorithm which is better.
The MIN CUT problem is as follows. Suppose we have an electronic circuit which
we need to divide into two parts (to go on two sides of a circuit board or two
layers of a chip). We want the number of wires connecting the two parts to be a
minimum. If we treat this as a flow problem (unit flow along unit-capacity
wires between components - NOT electrical flow), then when we have found the
maximum flow, we will have found a minimum cut as described above.
4.6 Pattern Matching With Regular Expressions
---------------------------------------------
Regular expressions are a way of specifying patterns.  Suppose we only have
patterns of letters, other symbols having special meanings. The special
operations on patterns are CONCATENATION, UNION (+) and CLOSURE (*), and
brackets may be used. Concatenation means "followed by", union means
"either/or" and closure means "any number of (including zero)". We also allow
the abbreviation "." meaning any single letter (short for a+b+...+z).  For
example:
   Pattern     Meaning                        Matches
   -------     -------                        -------
   a           matches the letter a           a
   ab          matches a followed by b        ab
   a+b         matches an a or a b            a, b
   a*          any number of a's (maybe 0)    "", a, aa, aaa, aaaa ...
   ba*b        some a's between two b's       bb, bab, baab, baaab ...
   (a+b)b      an a or b followed by a b      ab, bb
   (a+b)*      any number of a's or b's       "", a, b, aa, ab, ba, bb, aaa ...
   (ab)*       any number of ab's             "", ab, abab, ababab ...
   aa*         any positive number of a's     a, aa, aaa, aaaa ...
   a(b+c)d     b or c between a and d         abd, acd
   .           any single letter              a, b, c, d, e ...
   .*          any sequence of letters        "", a, be, sea, xyzzy ...
   .*cat.*     anything containing cat        cat, scat, catch, zzcatzzzz ...
   .*(ie+ei).* anything containing ie or ei
Exercise: Find an expression matching any sequence of a's and b's NOT
containing two consecutive a's.
Note that there is a convention that where there are no brackets, closure is
done before concatenation which is done before union (like exponents,
multiplication and addition). Also note that the regular expressions in the
UNIX editors "ed", "vi" have no union, work with any characters and not just
letters, use \( and \) for brackets and have some useful extra abbreviations.
These expressions are called REGULAR expressions and form a regular algebra
which is an interesting one with a differential calculus etc. (see "Regular
algebra and finite machines" by Conway).
Regular expressions are important not so much for their convenience for the
user, but more because they correspond EXACTLY to patterns which can be
recognised by a single scan without using any memory. Thus there is no regular
expression which matches every palindrome since there is no alternative but to
remember the first half in order to match it with the second half. Similarly
there is no regular expression matching every pattern with an equal number of
a's and b's because there is no alternative to keeping a count.
We can build a network to represent a pattern as follows. The network will have
a start node A and a finish node Z. The networks for patterns "a", "a+b", "ab"
and "a*" are:
                  a
            A -------> Z
                          a
              ------> B ----->
             /                \
            A                  Z
             \            b   /
              ------> C ----->
                 a        b
            A ------> B -----> Z
            A ------> Z
           / \
          ^   |
         a|   |
          |   v
           \ /
            B
Each node is either a matching node with one edge out labelled with a letter,
or a choice node with two unlabelled edges out. Such a network has a sparse
representation without needing linked lists.  A network for an arbitrary
pattern is built up out of networks for smaller subexpressions.  For union, we
use a new start node with edges to the old start nodes, and join the old finish
nodes together. Thus (a+b)+a* gives:
                                    a
                        ------> C ----->
                       /                \
              ------> B                  \
             /         \             b    \
            /           ------> D ------>-\\
           A                                Z
            \                              /
             \                            /
              ------> E ----------------->
                     / \
                    ^   |
                   a|   |
                    |   v
                     \ /
                      F
For concatenation, the finish node of the first network is joined to the start
node of the second network, so (a+b)c* gives:
                          a
              ------> B ----->
             /                \
            A                    D ------> Z
             \            b   / / \
              ------> C ----->  |  |
                               c^  |
                                |  v
                                \ /
                                 E
Finally, for closure, we put the new start node at the old finish node, add a
new finish node, and add two new edges (from the new start node to the old
start node and to the new finish node). Thus (a+b)* gives:
           A ------> Z
          / \\
         ^  ^ |
        a| b| |
         |  | |
         B  C |
         ^  ^ |
         |  | v
          \ //
           D
For example, suppose we want a network to represent (ac+a*b)d. We get a network
similar to the one in Sedgewick:
                   a        c
               B -----> C ----->
              /                 \    d
             A                   D -----> Z
              \             b   /
               E -----> F ----->
              / \
             ^   |
            a|   |
             |   v
              \ /
               G
We can use a network like this to scan a string to see if it matches the
pattern as follows. At any time, there is a set of active nodes.  To begin
with, only the start node is active.  For each character in the string, we do
the following.  Each active choice node is first replaced by its two successor
nodes until no active choice nodes remain. Then the current character is
compared to the edge-labels out of the active nodes. Nodes which do not match
simply become inactive. Nodes which match are replaced by their successors.  If
the finish node winds up active, the string matches the pattern.
For example, suppose we want to know if "aabd" matches the pattern (ac+a*b)d.
We use the network above. We keep a list of active nodes, with the choice nodes
first.
      Step 1
      ------
      A  ->  E,B  ->  F,G,B  ->  E,C
      (replace A by its successors, then same for E, then match first "a")
      Step 2
      ------
      E,C  ->  F,G,C  ->  E
      (replace E by its successors, then match second "a")
      Step 3
      ------
      E  ->  F,G  ->  D
      (replace E, then match "b")
      Step 4
      ------
      D  ->  Z
      (match the "d")
The finish node Z is active at the end, so the string aabd does match the
pattern. If we try with abc, we get:
      Step 1
      ------
      A  ->  E,B  ->  F,G,B  ->  E,C
      Step 2
      ------
      E,C  ->  F,G,C  ->  D
      Step 3
      ------
      D  ->    (no states left active)
If we run out of active states like this, or if the finish node Z is not active
at the end, then the string does not match. We now know abc does not match the
pattern.
In order to implement this pattern matching algorithm, we need to be able to
extract the first item from the list, and add items to the beginning or end of
the list (according to whether they are choice nodes or matching nodes.  Such a
list is often called a "deque" or "double ended queue" and can be implemented
in an array in the same way as a queue. No node ever appears twice on the list,
so an array of length N can be used for an N-node network.
There are many variations on these networks. The ones in Sedgewick are almost
identical, but with extra unlabelled edges which make the building of the
network simpler.
It is possible to design simple building rules which eliminate choice nodes.
However, a node can then have any number of edges out of it, the work done in
each step of the matching algorithm is not reduced, so it is not worth it.
It is also possible to design networks which not only eliminate choice nodes,
but also have only one active node at any time. However, there are no simple
building rules for these. Such a network for (ac+a*b)d is:
                     b
          --------------------->
         /              -->     \
        /  a        a   \a/  b   \    d
       A -----> B -----> C -----> D -----> E
                \\       b       //
                 \-------------->/
                  \      c      /
                   ------------>
Networks like these are often better, but may sometimes yield very large (in
fact disastrous) numbers of nodes. Also note that there can be more than one
edge from one node to another (as B to D above). This is no problem in a sparse
representation.