Answers to "questions2" Analysis of Algorithms ---------------------- ----------------------- 1. The data structure needed is a queue of nodes. When a node is scanned, all its children are added to the back of the queue. Then the front item is removed from the queue to find the next node to be scanned. Each node is added to the queue once, and removed from the queue once, so the time taken is O(n) for n nodes. 2. To do insert, delete and search, we either want to use a hash table (average O(1) time), or B-trees (guaranteed O(log n) time). However, for this problem, we need three of them. It is possible to use three entirely separate structures, one for each type of search. Then inserting a record would involve inserting it into each of the three structures etc. This involves storing what could be lengthy data three times. Instead, one could have hash tables or B-trees of pointers to the records. Inserting a record would then involve entering the record data and updating the pointers in the three hash tables or B-trees. Each operation would take average O(1) time for hash tables, or guaranteed O(log n) time for B-trees. 3. The answer is simply to keep track of both the permutation, and the position of each number in the permutation in two arrays. For example the permutation 6 4 1 5 2 3 would have arrays: i: 1 2 3 4 5 6 permutation[i]: 6 4 1 5 2 3 position[i]: 3 5 6 2 4 1 where position[4]=2 since number 4 is at position 2 in the permutation. Now the questions "what is at position i in the permutation" and "what is the position of number j in the permutation" can be answered in O(1) time - the answers are "permutation[i]" and "position[j]". Swapping the numbers at position i and j is also O(1) since it can be done by: temp := permutation[j] permutation[j] := permutation[i] permutation[i] := temp position[permutation[i]] := i position[permutation[j]] := j It is also possible to swap the positions of the numbers i and j in O(1) time. In fact, the position array is just the "inverse permutation" of the original, so there is a complete symmetry between the arrays. 4. We need a data structure in which to keep the largest n numbers found so far. As each number is read in, we need to be able to check to see if it is larger than any of the numbers in the data structure. If not, it is discarded, otherwise the smnallest number in the data structure is discarded, and the new number is entered into the data structure. A suitable data structure for this is a priority queue in which smaller numbers have higher priority. Then the highest priority item is the smallest of the saved numbers. We read a number in from the file and compare it with the highest priority item. If the new number is smaller, it is discarded. If it is larger, the highest priority item is removed from the priority queue, and the new number is inserted in its proper place in the priority queue. Of course, when reading the first n numbers from the file, no numbers are discarded. The time taken is O(log n) per number in the file. 5. The most natural data structure is the priority queue which takes O(log n) time per operation. However, if removing the highest priority item is the commonest operation, it might be better to use an ordinary queue which is kept sorted by priority with the highest priority item at the front and the lowest at the back. Removal from a normal queue is an O(1) operation, whereas insertion may be O(n). 
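   As a concrete illustration, here is a minimal sketch of answer 4 in Pascal (matching the style of the code in answer 10), using a min-heap as the priority queue. The names (heap, siftDown, data) and the use of Free Pascal extensions (typed constant arrays, break) are our own illustrative choices; for simplicity the numbers come from a built-in array rather than a file.

      program LargestN;
      const
         n = 3;                                   { how many of the largest to keep }
         total = 8;
         data: array [1..total] of integer = (5, 1, 9, 3, 7, 2, 8, 6);
      var
         heap: array [1..n] of integer;           { min-heap: heap[1] is the
                                                    smallest of the saved numbers }
         i, k: integer;

      procedure siftDown(i: integer);             { restore heap order below i }
      var c, temp: integer;
      begin
         while 2*i <= n do begin
            c := 2*i;                             { left child }
            if (c < n) and (heap[c+1] < heap[c]) then
               c := c + 1;                        { pick the smaller child }
            if heap[i] <= heap[c] then break;
            temp := heap[i]; heap[i] := heap[c]; heap[c] := temp;
            i := c
         end
      end;

      begin
         for i := 1 to n do heap[i] := data[i];   { keep the first n numbers }
         for i := n div 2 downto 1 do siftDown(i);{ turn them into a heap }
         for i := n+1 to total do
            if data[i] > heap[1] then begin       { beats the smallest kept? }
               heap[1] := data[i];                { discard the smallest and }
               siftDown(1)                        { insert the new number    }
            end;
         for k := 1 to n do
            writeln(heap[k])                      { the n largest, in heap order }
      end.

   Each number from the file costs one comparison against heap[1], plus a siftDown of O(log n) when it is kept, as claimed.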
5. The most natural data structure is the priority queue, which takes O(log n) time per operation. However, if removing the highest priority item is the commonest operation, it might be better to use an ordinary queue which is kept sorted by priority, with the highest priority item at the front and the lowest at the back. Removal from a normal queue is an O(1) operation, whereas insertion may be O(n).

   Note that this is a curious question, as it is difficult to see how removal can be more common than insertion without the queue becoming empty. However, in a scheduler many processes may have the same priority. In this case one time slice typically involves removing a process from the front of the queue, running it for a bit, and then returning it to the back of the queue - both O(1) operations on a normal queue.

6. A suitable recursive design for range search in a binary search tree is:

      To search for items in a subtree which are within a range
      ---------------------------------------------------------
      compare the root with the range
      if the root is smaller than the range, search the right subtree
      else if it is larger than the range, search the left subtree
      else (the root is within the range) report it and search both subtrees

   (If the root is smaller than the range, everything in its left subtree is smaller still, so only the right subtree can contain items in range, and vice versa.) The algorithm follows a path down from the root until it encounters an item within the range. It then follows two paths from that item to the bottom of the tree, searching everything in between. The left path represents the left boundary of the range, the right path represents the right boundary, and all items in between are in range. The length of the paths is O(log n), assuming the tree is reasonably balanced, and there are R items in between, so the time taken is O(log n + R). A sketch of this search in code follows.
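   Here is a minimal sketch of the range search in Pascal. The node layout, the insert routine and the sample keys are our own illustrative choices, not part of the question:

      program RangeSearch;
      type
         link = ^node;
         node = record
            key: integer;
            left, right: link
         end;
      const
         nKeys = 7;
         keys: array [1..nKeys] of integer = (50, 30, 70, 20, 40, 60, 80);
      var
         root: link;
         i: integer;

      function insert(t: link; k: integer): link;    { ordinary BST insertion }
      begin
         if t = nil then begin
            new(t); t^.key := k; t^.left := nil; t^.right := nil
         end
         else if k < t^.key then t^.left := insert(t^.left, k)
         else t^.right := insert(t^.right, k);
         insert := t
      end;

      procedure rangeSearch(t: link; lo, hi: integer);
      begin
         if t = nil then exit;
         if t^.key < lo then
            rangeSearch(t^.right, lo, hi)            { root below the range }
         else if t^.key > hi then
            rangeSearch(t^.left, lo, hi)             { root above the range }
         else begin                                  { root within the range: }
            rangeSearch(t^.left, lo, hi);            { search both subtrees   }
            writeln(t^.key);                         { and report the root    }
            rangeSearch(t^.right, lo, hi)
         end
      end;

      begin
         root := nil;
         for i := 1 to nKeys do root := insert(root, keys[i]);
         rangeSearch(root, 35, 65)                   { prints 40, 50, 60 }
      end.

   Since the in-range subtrees are visited in left-to-right order, the items are reported in ascending order as a bonus.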
7. We need to be able to convert from the name of a set (or item) to its number, and vice versa. This suggests a hash table FINDSET in which to look up set names and find the corresponding sequence numbers. We also need an array NAMESET, each entry of which points to a hash table entry, allowing us to take a set's sequence number and find the corresponding name. We need two more structures, FINDITEM and NAMEITEM, to do the same for items.

   Now the function "set-name = FIND(item-name)" works as follows. We look up the item name in the hash table FINDITEM to find the item's number. Then we use the parent array as usual to follow parent pointers to the root of the set-tree, giving us the sequence number of the set. The array NAMESET then allows us to find the set's name.

   The procedure "UNION(set-name1, set-name2, set-name3)", which forms the union of set-name1 and set-name2 and gives the union the new name set-name3, works by looking up the first two names in FINDSET, using the parent and size arrays to make the smaller tree a subtree of the larger, removing the hash table entries for set-name1 and set-name2, adding a new entry for set-name3, and updating the NAMESET entry for the union.

   The times for the operations have been increased by the times for accessing hash tables, which are O(1) on average. The times were O(log n) for FIND and O(1) for UNION; they are now average O(log n) for FIND and average O(1) for UNION. With path compression, the time for FIND improves to "almost O(1) on average", and the hash tables do not alter this.

   Now consider implementing "MOVE(item, set1, set2)", which moves an item from set1 to set2. It would be easy enough to change the parent pointer for the item to point to the root of set2, but there would be a problem with all the descendants of the item in set1, which would then also appear to move to set2. The only reasonable way to cope with this without losing efficiency is to leave a dummy item in set1. For example, the arrays

                     1  2  3  4
      parent:        0  3  1  0
      size:          3  0  0  1
      set names:     A        B
      item names:    w  x  y  z

   represent two sets A = [w, x, y] and B = [z]. Suppose we want to move y from set A to set B. In order to avoid upsetting item x (which has y as parent), we change entry 3 to a dummy entry, and add a new entry for y, as shown:

                     1  2  3  4  5
      parent:        0  3  1  0  4
      size:          2  0  0  2  0
      set names:     A        B
      item names:    w  x  ?  z  y

   Now we can safely follow parent pointers from x and get the right root. The item y has changed position in the array, but this is fine provided that we always access it BY NAME, as in the first part of the question, and that we keep the naming information up to date when we move it.

8. In the simulation, at any time we need a collection of future events, each described by a record containing (at least) the time at which the event is due to occur and the action to be taken when it does occur. Each step of the simulation consists of finding the event with the earliest time and taking the action indicated, which may involve adding new events to the collection. The ideal data structure for this is a priority queue in which events with the earliest times have the highest priority. It then takes O(log n) time to extract the earliest event, and O(1) time on average to add an event to the collection.

9. We have a sparse matrix problem - we need only record the live cells. One possible data structure is a hash table. Each square is known by its (x,y) coordinates, which can be used as a "name" to look up the square in the hash table. A hash table entry contains the coordinates x and y, and the state of the square ("empty" or "contains live cell"). If a square is not found in the hash table, it is assumed to be empty. Thus we can find the state of a square, or record a birth or death, in O(1) time on average, and the space required is O(n), where n is the number of live cells.

10. The rough algorithm design is:

      Create all permutations of ABC..Y
      Put Z in each possible place to produce ZAB.., AZB.., ABZ.. etc.

   Now the sequence ABC... is a list, and as Z must be inserted at various different places, a linked list is indicated:

      A    B -> C -> ...
      |    ^
      v    |
      +--Z-+

   The number of cells will not change, so there is no space-allocation problem, and so an array of cells is indicated rather than records and pointer types. Each cell has a name and a pointer. We can separate these into two arrays, so that BDAC is represented as

                0  1  2  3  4
      next:     2  3  4  0  1
      names:       A  B  C  D

   Element zero of the next array holds the number of the first item, and a zero entry indicates no next item, i.e. the end of the list. The array of names can be ignored except for printing out. We can now write a procedure permute(k,n), where k is the current size of the list and n is the desired size. Permute is called initially with k=1 and an empty list (next[0]=0):

      procedure permute (k, n: integer);      {called with k=1}
      var p: integer;
      begin
         p := 0;
         repeat
            next[k] := next[p]; next[p] := k; {insert k after p}
            if k = n then print               {or whatever}
            else permute(k+1, n);
            next[p] := next[k];               {remove k}
            p := next[p]
         until p = 0
      end;

   This procedure takes only a constant amount of time per insertion position outside the recursive calls. If f(k) is the time taken by permute(k,n), then the loop tries k insertion positions (after the header and after each of the k-1 existing items), so f(k) = k * f(k+1), with f(n) = O(n) for the n positions tried at the last level, giving f(1) = O(n!) as required.
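   For completeness, here is one way to wrap the procedure in a runnable (Free Pascal style) program; the print procedure, the names array and the fixed value n = 3 are our own scaffolding, not part of the original design:

      program Permutations;
      const
         n = 3;                            { size of permutations to generate }
      var
         next: array [0..n] of integer;    { next[0] holds the head of the list }
         names: array [1..n] of char;      { names[i] is the name of cell i }
         i: integer;

      procedure print;                     { walk the list, printing the names }
      var p: integer;
      begin
         p := next[0];
         while p <> 0 do begin
            write(names[p]);
            p := next[p]
         end;
         writeln
      end;

      procedure permute(k, n: integer);    { as in the answer above }
      var p: integer;
      begin
         p := 0;
         repeat
            next[k] := next[p]; next[p] := k;   { insert k after p }
            if k = n then print
            else permute(k+1, n);
            next[p] := next[k];                 { remove k }
            p := next[p]
         until p = 0
      end;

      begin
         for i := 1 to n do
            names[i] := chr(ord('A') + i - 1);  { name the cells A, B, C, ... }
         next[0] := 0;                          { start with an empty list }
         permute(1, n)                          { prints all n! permutations }
      end.

   With n = 3 this prints the 3! = 6 permutations of ABC, one per line.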