In this, the last chapter of the introductory part of this book, we illustrate the use of SETL by giving a variety of programs which exhibit its features and can serve as useful models ol style. Some of the smaller programs present significant algorithms; the larger examples show how more substantial programming problems and applications can be addressed.
A graph is a collection of nodes, pairs of which are connected by edges (see Section XXX). Graphs come in two varieties, directed graphs, each of whose edges has a specified starting node and target node, and undirected graphs, whose edges can be traversed in either direction. The most natural SETL representation of a directed graph G is a set of ordered pairs [x, y], each such pair representing an edge with starting node x and target node y. It is convenient to represent an undirected graph G in the same way, but in this case the reversed edge [y, x] belongs to G whenever [x, y] belongs to G. This representation also allows us to regard G as a multivalued map: G{x} is the set of nodes connected to x by some edge. The following algorithm makes use of this fact.
Given an undirected graph G, the Eulerian path problem, named after the famous mathematician Leonhard Euler (1707-83), "who calculated as other men breathe," is to traverse all the edges of G exactly once by a single unbroken path p which starts at some node x of the graph and ends at some other node y (which might be the same as x). We can think of such a path, called a Eulerian path, as "using up" edges as it traverses them. Euler used the following argument to determine which graphs contain paths p of this kind. If a node z along p is different from the starting and ending nodes x and y of p, then immediately after p has reached z along one edge p will leave it along some other edge, and thus p will always use up an even number of the edges which touch any node z of p not equal to x or y. The same remark applies to the starting node x if x = y, but if x and y are different then p must use up an odd number of the edges touching x and an odd number of the edges touching y. It follows that a Eulerian path p which traverses all the edges of G just once can only exist if G is connected and either has no nodes x touched by an odd number of edges or has exactly two such nodes x, y; and in this latter case every Eulerian path p must start at one of x, y and end at the other.
Suppose, conversely, that G has either no nodes or exactly two nodes which are touched by an odd number of edges. Then we can construct an Eulerian path p as follows. If every node of G is touched by an even number of edges of G, let x be any node of G; otherwise let x be one of the two nodes x, y of G touched by an odd number of edges. Start the path p at x, and extend p as long as possible by stepping from its endpoint along any edge of G that has not been traversed before. Since we consider an edge to be used up as soon as it is traversed, the construction of p uses up more and more edges of G and therefore must eventually stop. Hence p must be finite. Suppose that p ends at a node y. Clearly all the edges touching y must have been traversed by p, since otherwise p could be extended by some edge. Thus, if the starting node x of p is touched by an odd number of edges, p must end at some other node y which is also touched by an odd number of edges, whereas if x is touched by an even number of edges, then p must return to x and end there. In either case, removing all edges traversed by p from G will leave behind a graph G' each of whose nodes is touched by an even number of edges. If p does not already traverse all the edges of G, then some node z along p will be touched by some untraversed edge. In this case, one can construct a path q by starting from z with this edge and extending q along untraversed edges as long as possible. Since the remarks concerning p apply to q as well, and since q can be regarded as a path in the graph G', and since all of the nodes preceding G are touched by an even number of edges, the path q must both begin and end at z; i.e., q must be a circuit. Hence we can insert q into p, thereby constructing a path which first follows p to z, then follows q until q finally returns to z, and then follows the remainder of p to its end. Call this extended path by the same name p. Repeating the construction and insertion of circuits like q as often as possible, we must eventually build up a path p which traverses all the edges of the original graph G.
The two following procedures realize the Eulerian path construction described in the preceding paragraphs. Procedure build_path starts a new path and extends it as far as possible, deleting (from G) the edges traversed by this path; Procedure Euler_path installs the path sections returned by build_path into the overall Eulerian path that it constructs and returns.
program Euler; -- Eulerian path construction graph := {[1,2], [2,3], [3,4], [4,1], [4,2]}; -- a small graph print(euler_path(graph + {[y, x]: [x, y] in graph})); -- which is undirected. procedure Euler_path(G); -- constructs Eulerian path for graph G nodes := domain(G); -- all nodes in the graph. if #(odds := {x in nodes | odd(#G{x})}) > 2 then return OM; -- since more than two nodes are end if; -- touched by an odd number of edges -- odds is the set of all nodes of G that -- are touched by an odd number of edges x := arb(odds)?arb(nodes); -- pick a node of odds if possible -- otherwise pick any node of G path := [x] + build_path(x,G); while exists z = path(i) | G{z} /= {} loop new_p := build_path(z, G); -- insert new section into path G -:= ({[y,x]: [x,y] in new_p} + {e: e in new_p}); path := path(i..i - 1) + new_p + path(i..); end loop; return path; end Euler_path; procedure build_path(x, rw G); -- builds maximal path section starting at x, -- and deletes all edges traversed p := [ ]; while (y := arb G{x}) /= OM loop -- while there exists an edge leaving the last point reached p with:= y; -- extend path to traverse the edge G -:= {[x, y], [y, x]}; -- delete the edge just traversed x := y; -- step to y end loop; return p; end build_path; end euler;
Certain problems, of which scheduling problems are typical, require one to arrange the nodes n of a graph G in a list such that every edge of G goes from a node n1 to a node n2 coming later in the list. This is called the problem of topological sorting. Suppose, for example, that a student must choose the order in which he or she will take the courses required to qualify as a computer science major, some of which have other courses as prerequisites. Suppose also that we represent the prerequisite relationship as a set G of pairs, agreeing that whenever course n1 is a prerequisite of course n2, we will put the pair [n1,n2] into G. Then, mathematically speaking, G is a graph; in heuristic terms, G{n1} is the set of all courses for which n1 is a prerequisite. (Note the connection of the topological sorting problem with the transitive computation of prerequisites described in Section 4.3.8.1.)
To sort a collection of courses topologically is simply to arrange them in any order in which they could actually be taken, given that all the prerequisites of each course n must be taken before n is taken. To do this we can simply find some course n which has no (unfulfilled) prerequisites, put n first in the list L, drop all edges [n, n1] from G (since n is no longer an unfulfilled prerequisite), and then continue recursively as long as courses without unfulfilled prerequisites remain. Written as a recursive SETL routine, this is short and direct:
procedure top_sort1(G,nodes); -- topological sorting procedure, recursive form return if exists n in nodes | n notin range(G) then [n] + top_sort1(G lessf n, nodes less n) else [] end if; end top_sort1;Invocation of top_sort1(G) will return a tuple t consisting of some or all of the nodes of G. If it is possible to sort nodes of G topologically, then every node of G will appear in t. This will be the case if and only if G admits no cycle of nodes such that
n1 is prerequisite to n2 is prerequisite to n3 is prerequisite to ... is prerequisite to nk is prerequisite to n1.To see this, note that it is clear that when such a cycle of mutually prerequisite nodes exists, no node in the cycle can ever be put into the tuple t returned by top_sort1. Conversely, if a node n0 belongs to no such cycle, then eventually top_sort1 will have processed all the predecessors (i.e., prerequisites) of n0, and after this top_sort1 must eventually put n0 into the tuple t it returns. This shows that the set of all nodes belonging to any cycle like (n1,n2,...,nk,n1) is simply
nodes - {x in top_sortl(G,nodes)},so that (1) can also be used to test a graph G for the presence of cycles.
Like many other tail recursions, i.e., recursive procedures which only call themselves immediately before returning, the topological sort procedure seen above can be rewritten as an iteration (see Section 5.4). Written in this way, the topological sort procedure becomes:
procedure top_sort2(G); -- first iterative form of topological sort nodes := domain(G) + range(G); -- Here we calculate the set of all nodes; this makes it unnecessary to -- pass the set of nodes as an additional parameter. t := [ ]; -- initialize the tuple to be returned while exists n in nodes | n notin range(G) loop t with:= n; G lessf:= n; nodes less:= n; end loop; return t; end top_sort2;
It is possible to improve the efficiency of (3) very substantially by keeping the current value of the set
{n in nodes | n notin range G}available at all times. To do this, we proceed as follows:
procedure top_sort3(G); -- second iterative form of the topological sorting procedure nodes := (domain G) + (range G); count := {}; -- initialize the count function described previously ready := nodes; -- The following loop will remove elements that have -- any predecessors from ready for [x, y] in G loop count(y) := (count(y)?0) + 1; ready less := y; -- since y has a predecessor end loop; -- At this point 'ready' is the set of all nodes without predecessors t := [ ]; -- t is the tuple being built up while ready /= {} loop n from ready; t with:= n; for n1 in G{n} loop if (count(n1) -:= 1) = 0 then ready with:= n1; end if; end loop; end loop; return t; end top_sort3;It is not hard to see that the preceding code examines each edge of the graph G just twice. Thus the time needed to execute this code is linearly proportional to #G.
Suppose that the members of a population of n students are applying to a collection of m colleges. We suppose also that each student finds a certain collection of colleges acceptable, and that he/she ranks these colleges in order of decreasing preference. Finally we suppose that each college c can admit only a given quota Q(c) of the students who apply to it, and that it is able to rank all the students in order of decreasing preference. We do not suppose that any of these preferences is necessarily related to any other; that is, different students can rank colleges in radically different orders, and different colleges may find quite different types of students preferable. The problem we consider is that of assigning students to colleges in such a way as to satisfy the following three conditions:
This problem was studied by David Gale and Lloyd Shapley (American Mathematical Monthly, 1962, pp. 9-15), who gave a simple algorithm for finding an assignment satisifying conditions (a), (b), and (c). The algorithm is just this: Each student applies to his first-choice college. Then each college c puts the topmost-ranked Q(c) students who have applied to it on an active list and notifies the others that they have been rejected. All rejected students now apply to their second-choice colleges. Then all colleges rerank their applicants, keep the first Q(c) of these applicants, and again notify the others that they have been rejected. This cycle of reapplication and reranking continues until no rejected students have any more colleges on their list of acceptable colleges. It is clear that the assignment produced by this procedure satisfies condition (a). Condition (b) is also satisfied, since if s2 finds college c acceptable, he/she will eventually apply to college c and can then bump any student s1 whom c finds less acceptable, but will never subsequently be bumped except by a student whom c finds more acceptable. Finally, condition (c) is satisfied, since if s1 prefers c2 to c1 he/she must have applied to c2 before c1 but been bumped from c2's active list by a student that c2 prefers to s1. But when this happened c2's active list could not have contained any student that c2 does not prefer to s1. Therefore, since the students on college c2's active list never grow any less attractive from c2's point of view, c2 will never regard any student on its final active list as less desirable than s2.
The Gale-Shapley iteration only continues as long as some student has just been rejected by the latest college to which they have applied. Since this rejection 'uses up' one of the items on that student's list of acceptable colleges, the nmber of iteerations required can be no more than the sum of the lengths of all these lists, plus 1. The code which follows reflects this fact.
Programmed in SETL, the Gale-Shapley algorithm is as follows.